Cost optimization has been one of the hottest topics in observability (and beyond!) lately. Everyone is striving to be efficient, spend money wisely, and get the most out of every dollar invested. At Logz.io, we recently embarked on a very interesting and fruitful data volume optimization journey, reducing our own internal log volume by a whopping 50%. In this article, I’ll tell you how exactly we achieved this result.
We always strive to use the observability tools we’ve developed ourselves, i.e. we ‘eat our own dog food’ 🐶. Logging, of course, is no exception. All internal system logs end up in one of the Logz.io accounts that we use every day to monitor the health of dozens of microservices and perform all kinds of troubleshooting in complex distributed system environments.
From a cost perspective, logging costs scale linearly with log volume and hot retention time: O(m × n), where m is the daily log volume (GB) and n is the number of days we need to retain these logs. The retention time is often determined by business requirements rather than technical ones, so we started by focusing on the log volume.
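To make that relationship concrete, here is a minimal sketch of the cost model in Python; the price per GB-day is purely illustrative and not an actual Logz.io rate:

```python
def daily_logging_cost(daily_volume_gb: float, retention_days: int,
                       price_per_gb_day: float) -> float:
    # Cost grows linearly in both factors: volume (m) and hot retention (n)
    return daily_volume_gb * retention_days * price_per_gb_day

# Hypothetical example: 3 TB/day, 14 days of hot retention, $0.01 per GB-day
print(daily_logging_cost(3_000, 14, 0.01))   # 420.0
# Halving the volume halves the cost:
print(daily_logging_cost(1_500, 14, 0.01))   # 210.0
```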
Before we could tackle log volume optimization, we needed to know the current situation. To do this, answering the following questions was a good starting point:
For us, the answer to the first question was a whopping 2.7 TB to 3.7 TB daily in rainy November 2022:
That seemed a bit too much, so we decided to get to the bottom of it.
Not all logs are of equal value: some are used very rarely, some become completely obsolete and irrelevant over time, while others are used on a daily basis. The Data Optimization Hub was a very handy tool to look through the piles of logs and understand which types of logs were taking up the most space and had little or no value:
We treated the different categories of logs as follows:
Using a table like the one above, we could easily identify the biggest log consumers (both in terms of total log size and number of logs) per log type.
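To illustrate what this analysis boils down to, here is a minimal Python sketch that groups log records by type and sums their sizes; the field names (`type`, `size_bytes`) are placeholders for illustration, not the actual Data Optimization Hub schema:

```python
from collections import defaultdict

def volume_by_log_type(logs):
    """Aggregate total size and count per log type to spot the biggest consumers."""
    totals = defaultdict(lambda: {"bytes": 0, "count": 0})
    for log in logs:
        totals[log["type"]]["bytes"] += log["size_bytes"]
        totals[log["type"]]["count"] += 1
    # Sort by total size, largest first
    return sorted(totals.items(), key=lambda kv: kv[1]["bytes"], reverse=True)

sample = [
    {"type": "nginx-access", "size_bytes": 5_000},
    {"type": "payment-service", "size_bytes": 1_200_000},
    {"type": "nginx-access", "size_bytes": 4_800},
]
for log_type, stats in volume_by_log_type(sample):
    print(log_type, stats["bytes"], stats["count"])
```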
At this point, we were left with the logs we were actively using. Playing around with different visualizations and searching through the Log Size field (you can easily enable it in the account settings) brought us to the realization that some of the logs were much heavier than others. We had logs that were 5 KB each, and others that were over 1 MB. That’s quite a difference! Very heavy logs are usually a sign of an issue and should be investigated.
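In practice, spotting these outliers means looking at the distribution of per-log sizes rather than just totals. Here is a rough sketch, assuming you have per-record sizes available (for example, via the Log Size field mentioned above); the threshold multiplier is an arbitrary assumption:

```python
import statistics

def flag_heavy_logs(sizes_bytes, threshold_multiplier=10):
    """Flag log sizes far above the median as candidates for investigation."""
    median = statistics.median(sizes_bytes)
    return [s for s in sizes_bytes if s > median * threshold_multiplier]

sizes = [5_000, 6_200, 4_900, 1_300_000, 5_500]   # one 1.3 MB outlier
print(flag_heavy_logs(sizes))  # [1300000]
```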
While checking the heavier types of logs, we found that we generally did not need all of the information: some of it was repetitive, and some of it simply was not that useful. We worked with the responsible teams to change the way the heavier logs are generated, which also had an extremely positive effect.
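The exact change depends on the service, but the general pattern looks like the following sketch: dropping fields that repeat on every record or carry payloads nobody queries before the log is emitted. The field names here are made up for illustration and are not from our actual services:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("payment-service")

# Hypothetical fields that turned out to be repetitive or unused
REDUNDANT_FIELDS = {"full_request_body", "environment_dump", "raw_headers"}

def log_event(event: dict):
    """Emit a slimmed-down log record instead of the full event payload."""
    slim = {k: v for k, v in event.items() if k not in REDUNDANT_FIELDS}
    logger.info(json.dumps(slim))
```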
Once all of the above steps were completed, we recommended setting up ongoing log volume monitoring processes to keep the volume under control. For example:
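One such process could be a simple daily check that compares ingested volume against an agreed budget and alerts when it is exceeded. The sketch below uses assumed numbers and is not our actual alerting configuration:

```python
DAILY_BUDGET_GB = 1_800  # assumed budget, a bit above our new 1.5 TB average

def check_daily_volume(ingested_gb: float):
    """Flag days when log volume exceeds the agreed budget."""
    if ingested_gb > DAILY_BUDGET_GB:
        # In practice this would trigger an alert (Slack, PagerDuty, etc.)
        print(f"ALERT: {ingested_gb} GB ingested today, budget is {DAILY_BUDGET_GB} GB")
    else:
        print(f"OK: {ingested_gb} GB ingested today")

check_daily_volume(2_100)
```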
By following these simple steps, we were able to reduce log volume by 50% to an average of 1.5 TB daily:
We were able to significantly reduce the total log volume and, as a result, substantially reduce the costs. But the journey never ends: we have to stay attentive and monitor the situation on a regular basis to prevent usage spikes.
Learn more about our Data Optimization Hub and how Logz.io can help transform your observability strategy by signing up for a free trial today.