Best Practices for Infrastructure Monitoring Cost Management

By: Gedalyah Reback

How to Create Cost-Effective Infrastructure Monitoring

FinOps isn’t limited to cloud cost management; it also covers observability and infrastructure monitoring. Out-of-the-box monitoring solutions are getting more and more difficult to implement. In addition, they may not have the features you need, and they keep getting more expensive. It’s no wonder you’re trying to build a cost-effective solution on your own, combining your in-house development talent with a few open source libraries.

In this article, you’ll learn how to balance homegrown and software as a service (SaaS) monitoring solutions in your cloud of choice, how to make these solutions cost-effective, and how to manage their associated hidden costs.

Building Up an Infrastructure Monitoring Stack

At its core, a monitoring application consists of three parts. The web tier is the interface that administrators use to interact with the system. The application tier is where all of the data handling and processing occurs. Essentially, this is where the monitor “listens” for all of the goings-on in your environment. Finally, the database tier securely stores persistent data so that it can be easily retrieved for running reports at a later time.

As you create the solution that works best for you, you’ll realize that you don’t need to build everything from the ground up. There are a lot of technologies in the cloud and on-premises that can easily be repurposed. One example of a SaaS offering you’ll want to incorporate is an event queue. AWS calls this technology the Simple Queue Service (SQS). In Azure, Azure Event Hubs fills a similar role. Alternatively, you could spin up, manage, and orchestrate your own on-premises implementation of Apache Kafka or RabbitMQ to provide the same functionality. However, if you’re implementing your own infrastructure monitoring, offloading the hard part to a cloud native SaaS solution is a better option than adding complexity.

We’ve discussed the benefits of cloud native monitoring before. The bottom line is that choosing to spin up an ELK stack in the cloud will save you time, money, and resources that could be better used elsewhere.

FinOps & Infrastructure Monitoring Cost Management

FinOps philosophy demands efficient use of resources, something you can’t get without good visibility into your tools and their metrics. Building your monitoring infrastructure effectively is key to making a cost-efficient solution. You must avoid overspend by ensuring that your solution is appropriately sized and scaled for your needs. If you don’t, you’re leaving money on the table.

Cloud native architecture analysis can help you determine areas of overspend or underspend. FinOps services such as AWS Well-Architected Framework and Azure Advisor show you which of your resources are provisioned improperly. They can help you right-size your implementation to avoid overspend in areas like storage and IOPS.

Additionally, you can host your own ELK stack to monitor and report on AWS billing. To do this, enable AWS Billing reports and ship the logs with AWS Lambda. Once they’ve been exported, the logs can be queried and analyzed for areas of improvement. Finally, tools like AWS Well-Architected Framework, a custom ELK stack, or Azure Advisor can also be pointed at your monitoring infrastructure itself to track what it costs.
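As a minimal sketch of the analysis step, assume the exported billing data has already been flattened into rows with two hypothetical fields, `service` and `cost_usd`. Real billing exports use different column names that depend on the report format, so treat these names as placeholders. Summing spend per service is then enough to show where the money goes:

```python
from collections import defaultdict


def cost_by_service(rows):
    """Sum cost per service and return (service, total) pairs, highest first.

    `rows` is an iterable of dicts with the hypothetical keys 'service'
    and 'cost_usd'; real billing exports use other column names.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["service"]] += float(row["cost_usd"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative, made-up records (not real billing data)
sample_rows = [
    {"service": "compute", "cost_usd": "120.50"},
    {"service": "storage", "cost_usd": "30.25"},
    {"service": "compute", "cost_usd": "79.50"},
]

for service, total in cost_by_service(sample_rows):
    print(f"{service}: ${total:.2f}")
```

In an ELK setup, the same grouping would more likely be expressed as an aggregation query over the indexed billing logs; the Python version only illustrates the idea.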

Multicloud Cost Management

Managing costs across multiple vendors adds complexity and challenges. Unfortunately, there’s no way to manage your Azure spending with a native AWS service, or vice versa. When it comes to multicloud spending, you have to get the fundamentals right by following the three guidelines described below.

1. Spin Down Idle Servers and Services

Although this may seem like a basic point, it’s important to analyze your autoscaled and automatically provisioned cloud resources. If you’re aggressively scaling your platform, and none of your endpoints are running above 25% capacity, you have a great opportunity for cost savings.

Similarly, if you have a minimum of 50 servers defined in an autoscaling group, and your application only needs 20, there’s no need for those additional 30 servers to be chewing up your resources and your budget. Sizing the group to your app’s actual workload requirements goes a long way towards keeping costs down.
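To put rough numbers on this, here is a small sketch of both checks: whether every endpoint sits below the 25% mark, and what the surplus servers cost per month. The hourly rate and the 730-hour month are assumptions for illustration, not real prices, and the function names are hypothetical.

```python
def all_underused(utilizations, threshold=0.25):
    """True when there is at least one endpoint and all run below the threshold."""
    utilizations = list(utilizations)
    return bool(utilizations) and all(u < threshold for u in utilizations)


def idle_capacity_cost(min_servers, needed_servers, hourly_rate_usd, hours=730):
    """Estimated monthly cost of servers kept running above actual need.

    `hours` defaults to roughly one month; the rate is supplied by the caller.
    """
    surplus = max(min_servers - needed_servers, 0)
    return surplus * hourly_rate_usd * hours


# Utilization as fractions of capacity (made-up sample values)
print(all_underused([0.12, 0.20, 0.08]))

# 50-server minimum, 20 actually needed, at an assumed $0.10 per server-hour
print(f"${idle_capacity_cost(50, 20, 0.10):,.2f} per month")
```

Even at a modest assumed rate, 30 surplus servers add up to a recurring line item worth reviewing.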

2. Spin Down Over-provisioned Resources

Sometimes it feels good to have a fleet of 100 idle servers ready to handle a sudden burst in traffic. However, maintaining this fleet dramatically increases your spending. A better way to handle an increase in activity is by using containers or orchestration to quickly spin up new servers when you notice the change. DevOps can assist you here.
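The scale-on-demand idea can be sketched as a sizing rule: compute how many servers the current load needs at a target utilization, then clamp the result between a floor and a ceiling. This is an illustrative simplification, not any provider’s autoscaling algorithm, and every name and default below is an assumption.

```python
import math


def desired_servers(current_load, capacity_per_server,
                    target_utilization_pct=70, min_servers=1, max_servers=100):
    """Servers needed to serve `current_load` at the target utilization.

    `current_load` and `capacity_per_server` share one unit (for example,
    requests per second). The result is clamped to [min_servers, max_servers].
    """
    usable_capacity = capacity_per_server * target_utilization_pct
    needed = math.ceil(current_load * 100 / usable_capacity)
    return max(min_servers, min(needed, max_servers))


# Quiet period: the fleet shrinks to the floor instead of idling at 100 servers
print(desired_servers(current_load=0, capacity_per_server=100))

# Traffic burst: capacity grows with demand, up to the configured ceiling
print(desired_servers(current_load=1000, capacity_per_server=100))
```

The ceiling matters as much as the floor: it caps what a runaway burst can cost you.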

3. Regularly Reevaluate Your Non-production Environments to Identify Overspend

Although it’s recommended to have your non-production environments mirror production, you may want to consider the cost this carries. You can scale your non-production environment back with minimal impact to your production resources. For instance, you can raise the thresholds that must be met before your non-production autoscaling groups expand, so that you avoid running several hundred full-fledged servers for a test load.

Managing multicloud costs isn’t impossible. You just don’t have the automation that you have in a native cloud environment. Being aware of what you deploy and managing costs on a service-by-service level will save you time, money, and resources.

Manage the Hidden Costs of Infrastructure Monitoring

So far, we’ve covered “known known” costs such as the expense of provisioned cloud infrastructures. What we haven’t discussed yet are the “known unknown” costs: the costs of data retrieval, autoscaling, and maintenance. When you deploy a custom monitoring solution in the cloud, there are a few “gotchas” to watch out for.

The first involves data in transit. You already know that there are costs associated with storing data in the cloud at rest. Costs can be accrued for data in transit as well. Most cloud providers allow you to put data into the cloud for free. This is great if you want to create a data lake, securely store your company files, or even open up a pipe for all of your monitoring data to flow through. The cloud gets expensive when you want to pull the data out. Both AWS and Azure charge for running queries and retrieving data.
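A back-of-the-envelope estimate makes the retrieval cost concrete. The sketch below assumes a flat per-GB price and an optional free monthly allowance; both are placeholders, since real pricing is tiered and varies by provider, region, and destination.

```python
def monthly_egress_cost(gb_out_per_day, price_per_gb_usd,
                        free_gb_per_month=0.0, days=30):
    """Estimated monthly charge for data pulled out of the cloud.

    Assumes a single flat rate; real provider pricing is usually tiered.
    """
    billable_gb = max(gb_out_per_day * days - free_gb_per_month, 0.0)
    return billable_gb * price_per_gb_usd


# 50 GB of monitoring data retrieved per day, at an assumed $0.09 per GB
# with an assumed 100 GB free allowance
print(f"${monthly_egress_cost(50, 0.09, free_gb_per_month=100):.2f} per month")
```

Running the numbers before you design dashboards and reports helps you decide how much raw data really needs to leave the cloud.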

The next place you might get tripped up is in solution maintenance. It’s understandable to think that support is built-in when you deploy several open source solutions. This couldn’t be further from the truth. One of the benefits of using open source monitoring software is that it can be acquired for free. The maintenance, support, and custom development of the solution will have its own unique costs, however.
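One way to keep this visible is to fold staff time into the monthly total. The sketch below compares a free-to-license, self-maintained stack with a paid service; all figures are invented for illustration, and the function name is hypothetical.

```python
def monthly_total_cost(license_usd, infrastructure_usd,
                       maintenance_hours, hourly_rate_usd):
    """Monthly total: licensing plus infrastructure plus staff time."""
    return license_usd + infrastructure_usd + maintenance_hours * hourly_rate_usd


# Made-up example figures
open_source = monthly_total_cost(license_usd=0, infrastructure_usd=1500,
                                 maintenance_hours=40, hourly_rate_usd=75)
paid_service = monthly_total_cost(license_usd=3000, infrastructure_usd=0,
                                  maintenance_hours=5, hourly_rate_usd=75)

print(f"Open source stack: ${open_source:,}")
print(f"Paid service:      ${paid_service:,}")
```

With these invented inputs the "free" stack is the more expensive one; with different inputs it may not be. The point is that the comparison only means something once maintenance hours are counted.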

Summary

Custom infrastructure monitoring solutions are starting to make more sense, given the ever-changing monitoring landscape. For the most part, open source solutions can compete with many big names, but remember there are still hidden costs.

Consequently, cost management is vital to running your infrastructure monitoring stack. With adequate preparation and appropriate expectations, you can create a monitoring solution that will deliver exactly what you need at a cost you can predict. Implementing best practices for native cloud, hybrid cloud, and multicloud solutions, as modern FinOps would demand, will enable you to take full advantage of all of the technologies available.