How we saved millions on AWS - Part 3

Part 3: Making a Cultural Shift

In early 2022, as the world was emerging from the Covid-19 pandemic, inflation surged to multi-decade highs, prompting central banks to raise interest rates. The stock market declined, and tech companies around the world began to shift gears: after the high-growth mindset of 2021, efficiency and profitability became the focus.

At this point, Forter was an eight-year-old start-up. In our early days, and especially during the Covid days, we grew rapidly, both in revenue and customers. It was time to optimize our unit economics, and focus more of our attention on our cloud spend.

Join us on a multi-year journey, as we transitioned cloud cost budget ownership from Finance to Engineering, generated millions of dollars in savings, improved the company’s gross margins and valuation, and created a cultural shift that saw us adopt cost best practices across the organization.

This is Part 3 of a 3-part series. Read parts 1 and 2 here:

  1. Part 1: Understand how to launch your cost optimization journey, setting targets, understanding Gross Margins, and more.
  2. Part 2: Learn from dozens of real-world examples that allowed us to save millions of dollars in cloud costs, going in depth on S3, EC2, Elasticsearch, and more.

We created a cloud strategy and built a cost model and budget controls to inform the business. We managed to drastically optimize our cloud costs. Now only one piece was missing:

A cost-efficiency-minded culture is critical to maintaining an efficient engineering operation

Here’s what we wanted to avoid:

● Only FinOps or DevOps teams caring about the price of a workload, with application teams being unaware of their application costs

● Engineers acting in a wasteful manner: launching instances and leaving them on when they are not needed, using huge EBS storage for no reason, not deleting unused data, etc.

● Cloud costs being ignored as a KPI when planning new projects

To be fair, this is pretty much where we were in 2021. We needed this to change, and fast. Below are the learnings we’ve implemented.

Alerting and monitoring costs

Cost dashboards are important, and we do recommend having them (we use a third-party tool, as Cost Explorer’s dashboarding capabilities are non-existent). But dashboards are useless if no one is looking at them. When we see resources being needlessly wasted, we now get a strong impulse to immediately call it out. But how can we automate this impulse?

AWS provides a cost anomaly detection feature. While it doesn’t catch everything, we found it very useful and not too noisy. We routed its alerts to Slack, so that sudden cost increases grab people’s attention. This makes it much harder to miss an accidental cost increase.
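To make the routing idea concrete, here is a minimal sketch of the formatting step that sits between an anomaly alert and a Slack channel. It is not our production code. The input field names (`service`, `account`, `total_impact_usd`, `details_url`) are hypothetical and are not the AWS notification schema, and the `min_impact_usd` threshold is an assumption for illustration. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field; actually posting the payload is left out.

```python
from typing import Optional


def build_slack_alert(anomaly: dict, min_impact_usd: float = 50.0) -> Optional[dict]:
    """Build a Slack incoming-webhook payload for a cost anomaly.

    `anomaly` uses hypothetical, simplified field names (not the AWS schema).
    Returns None when the impact is below the assumed noise threshold.
    """
    impact = float(anomaly["total_impact_usd"])
    if impact < min_impact_usd:
        return None  # too small to be worth interrupting the channel

    text = (
        f"Cost anomaly in {anomaly['service']} ({anomaly['account']}): "
        f"about ${impact:,.2f} above expected spend. "
        f"Details: {anomaly['details_url']}"
    )
    return {"text": text}


if __name__ == "__main__":
    example = {
        "service": "Amazon EC2",
        "account": "production",
        "total_impact_usd": 1234.5,
        "details_url": "https://example.com/anomaly/123",
    }
    print(build_slack_alert(example))
```

Dropping small anomalies before they reach Slack is one way to keep the channel quiet enough that people keep reading it.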

Because Slack is so ingrained in our day-to-day, we further leveraged it to send monthly cost reports directly into teams’ channels. This way, each team is confronted with its production costs every month, making it hard to forget that some service is costly and inefficient.
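A monthly report like this boils down to summing tagged costs per team and comparing against the previous month. The sketch below shows that calculation under stated assumptions: the line-item shape (`team`, `month`, `cost_usd`) and the team name are made up for illustration, and in practice the data would come from whatever cost export or third-party tool you use.

```python
from collections import defaultdict


def monthly_team_totals(line_items):
    """Sum cost per (team, month) from tagged line items (hypothetical shape)."""
    totals = defaultdict(float)
    for item in line_items:
        totals[(item["team"], item["month"])] += item["cost_usd"]
    return totals


def format_team_report(team, totals, month, previous_month):
    """One-line summary for a team's channel, with month-over-month change."""
    current = totals.get((team, month), 0.0)
    previous = totals.get((team, previous_month), 0.0)
    if previous > 0:
        change = (current - previous) * 100 / previous
        trend = f"{change:+.1f}% vs {previous_month}"
    else:
        trend = f"no spend recorded in {previous_month}"
    return f"{team}: ${current:,.0f} in {month} ({trend})"


if __name__ == "__main__":
    items = [
        {"team": "data-pipeline", "month": "2024-01", "cost_usd": 1000.0},
        {"team": "data-pipeline", "month": "2024-02", "cost_usd": 700.0},
        {"team": "data-pipeline", "month": "2024-02", "cost_usd": 500.0},
    ]
    totals = monthly_team_totals(items)
    print(format_team_report("data-pipeline", totals, "2024-02", "2024-01"))
```

Showing the trend next to the absolute number matters: a team may not know whether its total is high, but it will notice a jump from last month.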

Here’s one example:

Cloud Cost as a First Class Citizen in Design Reviews

At Forter, engineers present design reviews when building big new projects. The purpose is for everyone to be aligned on why/what/how we are going to build something, and to get critical feedback before the project is developed.

To put things bluntly, back when the company was focused on growth, cloud spend was not on the agenda. While it was occasionally mentioned in discussions, it was usually a very casual and shallow cost analysis, neglecting a forward looking growth assessment.

In recent years we have grown up a lot in that regard. Every new project design discusses the cost implication of the new or changed system. We also got the hang of using spreadsheets to predict the future cost. Cost trade-offs are often discussed and impact design choices.

As more people were involved in cost optimization projects, Design Reviews tended to include a cloud costs discussion. After that, it became second nature. And as a natural progression, we updated our DR template to include some questions about cloud spend, to remind builders to think about the $$ cost of their projects.
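The spreadsheet forecasts mentioned above are usually simple compound-growth projections. Here is a rough sketch of one, assuming that cost scales linearly with traffic and that traffic grows at a fixed monthly rate. Both are simplifications, and the function name and the numbers are illustrative only.

```python
def project_monthly_costs(current_monthly_cost, monthly_growth_rate, months):
    """Projected cost for each of the next `months` months (first entry = next month).

    Assumes cost grows at a constant compound monthly rate.
    """
    return [
        current_monthly_cost * (1 + monthly_growth_rate) ** m
        for m in range(1, months + 1)
    ]


if __name__ == "__main__":
    # Hypothetical service: $1,000/month today, 10% monthly traffic growth.
    projection = project_monthly_costs(1000.0, 0.10, 12)
    print(f"Cost in month 12: ${projection[-1]:,.0f}")
    print(f"Total over 12 months: ${sum(projection):,.0f}")
```

Even a projection this crude tends to change design discussions, because a cost that looks negligible today can look very different after a year of growth.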

Celebrating Efficiency Wins

Engineering orgs tend to not celebrate successes often enough, and ours wasn’t an exception. We celebrated new product launches or the signings of a big new customer, but efficiency and scale optimizations that aren’t immediately visible to customers were often left behind. This needed to change, as we wanted folks to feel proud of their work.

Make sure to stop and recognize great optimization work

First, we started recognizing both big and small successes in our #engineering-wins Slack channel:

 

While it looks like a small thing, people love the channel. As a project owner you want to feel valued for your work. As a reader, you get to know about all the cool things happening around you and feel proud of your team.

For bigger projects, managers make sure to set up a small event with cake, an executive to give a pat on the back, and maybe a few speeches.

The culture change has been significant for us. We see more optimization projects initiated by engineers, and according to internal surveys, people feel more engaged and valued than before.

Encourage End to End Thinking

At Forter we have this massive data pipeline that processes events from many producers. It looks something like this:

Now, this data pipeline is big and expensive, so the Data Pipeline team invested a lot in its optimization, managing to reduce costs by over 80%. Great, right?

However, it turns out that 40% of the events were coming from a single producer—let's call it Producer 2. The Data Pipeline team was aware of this, but didn't give it much thought, assuming these updates were necessary. Meanwhile, the team responsible for Producer 2 didn't realize they were generating significant costs for the Data Pipeline, as these costs weren't assigned to them, but to the Pipeline team.

Once we noticed it, we cut Producer 2’s events by 90% with 3 lines of code, saving 36% of the entire Data Pipeline’s costs (90% of the 40% of events that Producer 2 generated). Why didn’t we notice this before? Because:

  1. Teams are used to optimizing their own black box. The Data Pipeline team wanted to process the events as cheaply as possible, and didn't stop to ask whether the updates were even necessary.
  2. Proper cost allocation in shared systems is difficult. If the Data Pipeline costs had been spread among the teams that own the Producers, it's very likely the Producer 2 team would have noticed the high cost they were generating.
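To illustrate the second point, here is a small sketch of allocating a shared pipeline's cost by each producer's share of events. The 40% share and the 90% reduction come from the story above. The other producers' volumes, the total cost, and the assumption that pipeline cost scales linearly with event volume are all made up for the example.

```python
def allocate_by_event_share(total_cost, events_by_producer):
    """Split a shared cost across producers in proportion to their event counts."""
    total_events = sum(events_by_producer.values())
    return {
        producer: total_cost * events / total_events
        for producer, events in events_by_producer.items()
    }


def savings_fraction(events_before, events_after):
    """Fraction of pipeline cost saved, assuming cost is linear in event volume."""
    before = sum(events_before.values())
    after = sum(events_after.values())
    return (before - after) / before


if __name__ == "__main__":
    # Event volumes in millions per day (hypothetical, except Producer 2's 40% share).
    before = {"producer-1": 35, "producer-2": 40, "producer-3": 25}
    after = dict(before, **{"producer-2": 4})  # Producer 2 cut by 90%

    shares = allocate_by_event_share(100_000, before)
    print(f"Producer 2's allocated cost: ${shares['producer-2']:,.0f}")
    print(f"Pipeline savings after the fix: {savings_fraction(before, after):.0%}")
```

With a chargeback like this showing up in the Producer 2 team's own report, the 40% share would have been hard to overlook.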

Educate engineers on the system as a whole, to ensure they are able to prioritize global optimizations over local ones.

Final Words

Well, this was a long journey, both in real life and in the blog post!

While the road was difficult, the results for us were amazing:

● Over 80% improvement to our cloud costs, saving us many millions of $$, while also drastically improving the company’s gross margin and valuation

● Ownership of cloud cost budget and planning transitioned from Finance to Engineering, resulting in much better alignment and far fewer mid-year budget surprises!

● A real culture shift has taken place. Cloud costs are now considered in every project plan, people are actually aware of how much their systems really cost, and efficiency gains are always celebrated.

Sounds cool? If you want to work with us on these things do drop us your CV at our careers page!
