Considering the scale of log data that modern cloud environments generate, it’s oftentimes prohibitively expensive to index all of it. For monitoring and logging, cost management is just as important as in other parts of the business. Whether sudden spikes of log data overwhelm databases or good business generates more activity in your environment, teams should anticipate and mitigate the steep costs that result from high log volumes.
But isn’t solving this problem as simple as filtering out log data using a Fluentd processor or a syslog server?
We don’t think so. What if you need all of your log data tomorrow to troubleshoot a production incident? It’s easy to be torn between filtering out log data for the sake of cost and keeping all of it for the sake of observability.
With Logz.io, you won’t have to make those tradeoffs. In this blog, we’ll explore how three Logz.io features work together to identify and filter out costly log data that isn’t always essential, while keeping the option to reingest and analyze that data if it becomes indispensable.
Logz.io Archive & Restore: Keep the option to analyze all your logs
Once we filter out logs, we can just forget about them, right?
Throwing away log data can leave a gap in your system’s observability. What if there is a production incident tomorrow, and you’re missing the data you need to figure out what happened? Filtering out data helps us manage logging costs, but we still need access to that data in case we need it later.
Before we filter out our logs, we have to make sure we can still access and reingest those logs if they’re needed later. We just need to keep them somewhere with cheap storage, like an S3 bucket. Logz.io’s Archive & Restore allows us to archive our log data in an S3 bucket so we can access it again without indexing everything, which is far less expensive.
First, create an S3 bucket in AWS, along with an IAM role that can access it. Then, navigate to ‘Archive & Restore’ under the Logz.io ‘Tools’ tab and enter the bucket name, AWS access key, and AWS secret key in the ‘Archive’ tab.
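Whatever AWS identity you use, it needs permission to work with the archive bucket. The sketch below builds a least-privilege IAM policy document scoped to one bucket. The bucket name `my-logzio-archive` and the helper `archive_bucket_policy` are hypothetical, and the action list (write for archiving, read and list for restoring) is our assumption rather than Logz.io’s documented requirement, so confirm it against the Archive & Restore documentation.

```python
import json

# Hypothetical bucket name; replace it with the bucket you created.
BUCKET = "my-logzio-archive"


def archive_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document scoped to a single S3 bucket.

    The action list is an assumption: PutObject for archiving, plus
    GetObject and ListBucket, which restoring would plausibly need.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: list the archived objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level permissions: write archives, read them back.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(archive_bucket_policy(BUCKET), indent=2))
```

Splitting bucket-level and object-level actions into separate statements matters because S3 checks `ListBucket` against the bucket ARN but object actions against the `/*` object ARN.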
Connect to Your S3 Bucket
You can test the connection to make sure Logz.io can reach your new S3 bucket. From there, hit ‘Archive’ to begin sending your data to the bucket for storage.
You’ll get a popup in the app confirming that Logz.io has successfully connected to your S3 bucket. Alongside it, a message under the page title will indicate that logs are currently being sent to the archive in S3.
Now that our logs are being archived, we will have the option to reingest, index, and analyze all of our logs – including those that have been filtered out – if they’re ever needed later.
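Conceptually, a restore selects archived logs from a time window and sends them back for indexing. The sketch below models the archive as an in-memory list of JSON-style records with an `@timestamp` field; the record layout, the sample data, and the `restore_window` helper are all hypothetical and say nothing about how Logz.io actually stores archives in S3.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Accept ISO 8601 timestamps that use a trailing "Z" for UTC.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def restore_window(archive, start, end):
    """Return archived records whose timestamp falls in [start, end)."""
    return [r for r in archive if start <= parse_ts(r["@timestamp"]) < end]


# Hypothetical archived records.
archive = [
    {"@timestamp": "2021-03-01T09:59:00Z", "message": "Proxy request services send"},
    {"@timestamp": "2021-03-01T10:15:00Z", "message": "Proxy request services send"},
    {"@timestamp": "2021-03-01T11:30:00Z", "message": "upstream timeout"},
]

# Pull back only the hour around a hypothetical incident.
start = datetime(2021, 3, 1, 10, 0, tzinfo=timezone.utc)
end = datetime(2021, 3, 1, 11, 0, tzinfo=timezone.utc)
restored = restore_window(archive, start, end)
print(len(restored))  # 1
```

Restoring a narrow window rather than the whole archive keeps the reindexed volume, and therefore the cost of the investigation, small.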
Log Patterns: Identify Noisy Logs to Manage Costs
Costly logs are, in effect, surplus logs: they go beyond your bare essentials, yet take up a lot of space in your indices, and therefore your budget. It’s dangerous to simply throw them away. As we’ve discussed, the last thing you want is to hit a production incident and find that the log data you need to troubleshoot was deleted.
Of course, you don’t want to pay for log data you rarely look at. Shipping and indexing data that seldom gets analyzed wastes time and money by:
- Impacting the performance and stability of your system by clogging up databases.
- Requiring users to search through more data when exploring logs, metrics, or traces.
- Racking up unnecessary infrastructure costs if you run open-source, on-premises solutions.
- Resulting in steep up-front costs from proprietary vendors.
To identify the less useful data, the first steps are to ship, index, and analyze it. Then you can decide whether you actually need it for day-to-day operations.
An easy way to pinpoint noisy data is with Log Patterns, which clusters similar logs into smaller, manageable groups and orders them by frequency. This can turn millions of records into tens of patterns, making it clear which log data is most common.
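To make the clustering idea concrete, here is a simplified stand-in: mask the tokens that vary between otherwise identical lines (IP addresses, long hex IDs, numbers), then count the resulting templates. This is not Logz.io’s Log Patterns algorithm; the masking rules, placeholders, and sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# Masks for tokens that typically vary between otherwise identical lines.
# Order matters: IPs are masked before plain numbers so they stay whole.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b[0-9a-fA-F]{8,}\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_pattern(line: str) -> str:
    """Replace variable tokens so similar lines collapse to one template."""
    for regex, placeholder in MASKS:
        line = regex.sub(placeholder, line)
    return line


def cluster(lines):
    """Group lines by template and order the groups by frequency."""
    return Counter(to_pattern(line) for line in lines).most_common()


logs = [
    "Proxy request to 10.0.0.12 took 35 ms",
    "Proxy request to 10.0.0.7 took 120 ms",
    "User 4821 logged in",
    "Proxy request to 10.0.0.9 took 8 ms",
]
for pattern, count in cluster(logs):
    print(count, pattern)
# 3 Proxy request to <IP> took <NUM> ms
# 1 User <NUM> logged in
```

Even this crude masking collapses four lines into two templates; on real traffic the same effect is what shrinks millions of records into a short, ranked list.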
In the screenshot below, notice the columns labeled ‘Count’, ‘Ratio’, and ‘Pattern’. Together, they show which logs have the greatest impact on our costs.
We immediately see that our ‘Proxy request to services send’ logs make up 18.26% of our logs, and therefore a significant share of our logging costs. If we rarely look at them, they are strong candidates for removal.
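Assuming ‘Ratio’ is each pattern’s share of the total count, the table can be reproduced from per-pattern counts. The sketch below ranks patterns and flags those above a cut-off for review. The counts are made up so that the proxy pattern lands at the 18.26% mentioned above, and the 10% `threshold` is a hypothetical choice, not a Logz.io setting.

```python
def pattern_table(counts: dict, threshold: float = 10.0):
    """Return (count, ratio %, pattern, candidate) rows, most frequent first.

    `threshold` is a hypothetical cut-off: patterns at or above it are
    flagged as candidates to review for dropping, not dropped automatically.
    """
    total = sum(counts.values())
    rows = []
    for pattern, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        ratio = 100.0 * count / total
        rows.append((count, round(ratio, 2), pattern, ratio >= threshold))
    return rows


# Made-up counts for illustration (10,000 logs in total).
counts = {
    "Proxy request to services send": 1826,
    "User <NUM> logged in": 500,
    "Cache miss for key <HEX>": 7674,
}
for row in pattern_table(counts):
    print(row)
```

A high ratio alone doesn’t make a pattern droppable; the flag only marks it for the judgment call the text describes, namely how often you actually look at those logs.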
Logz.io Drop Filters: Filter out the costly logs
Once you know which logs you’ll seldom use, the next step is to filter them out. Logz.io bases its pricing on how much log data it indexes, not how much you ship. This means that log data filtered out before indexing doesn’t count toward your costs.
When you add a Drop Filter, Logz.io discards the matching log data instead of indexing it.
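Because billing follows indexed rather than shipped volume, the effect of a drop filter is simple arithmetic. The sketch below assumes a hypothetical 50 GB/day shipped and assumes the dropped pattern’s share by size matches its 18.26% share by count; real log lines vary in size, so treat the result as a rough estimate.

```python
def indexed_gb(shipped_gb: float, dropped_share: float) -> float:
    """Daily volume that still gets indexed after drop filters.

    `dropped_share` is the fraction (0 to 1) of shipped data that
    matches a drop filter; dropped data is never indexed.
    """
    if not 0.0 <= dropped_share <= 1.0:
        raise ValueError("dropped_share must be between 0 and 1")
    return shipped_gb * (1.0 - dropped_share)


# Hypothetical: 50 GB/day shipped, 18.26% of it matches the drop filter.
print(round(indexed_gb(50.0, 0.1826), 2))  # 40.87
```

In this example, roughly 9 GB a day would no longer count toward indexed volume, while still being shipped (and archived) as usual.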
Adding a Drop Filter is easy. Simply go to the ‘Tools’ tab under the cog wheel in the top right corner and hit ‘Drop filters’.
From there, we can add a filter by specifying:
- The log type that the filter will look at
- The field that the filter will look at
- The value of the field that you’d like to drop
In this case, we pointed the new Drop Filter at the ‘service-10’ log type, told it to watch the ‘message’ field, and pasted our ‘Proxy request services send’ message into the value field.
From this point on, Logz.io will filter out all logs with this message before they are indexed. If these logs were to spike, they wouldn’t add to our Logz.io costs. And if they’re needed later, they can be reingested from the S3 archive with Archive & Restore.
Say you have a big release for ‘service-10’ and want to analyze all of the logs it generates over the next day or two to make sure nothing breaks. You can deactivate the Drop Filter by toggling it off. That ensures all future ‘Proxy request services send’ logs are indexed and available for analysis until you turn the Drop Filter back on.
Being able to toggle each Logz.io Drop Filter turns the list into a sort of control panel, letting users regulate incoming log streams and avoid indexing unnecessary data.
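The three settings (log type, field, value) amount to a simple predicate over each incoming log. The sketch below models that predicate; the `DropFilter` class, the `split` helper, and the use of a `type` key for the log type are hypothetical, and exact-match semantics is our assumption, since the real matching rules may differ.

```python
from dataclasses import dataclass


@dataclass
class DropFilter:
    """Hypothetical model of a drop filter: log type + field + value."""
    log_type: str
    field: str
    value: str

    def matches(self, log: dict) -> bool:
        # Drop only when the log type matches AND the chosen field equals
        # the configured value exactly (an assumed matching rule).
        return log.get("type") == self.log_type and log.get(self.field) == self.value


def split(logs, drop_filter):
    """Separate logs into (indexed, dropped) lists."""
    indexed, dropped = [], []
    for log in logs:
        (dropped if drop_filter.matches(log) else indexed).append(log)
    return indexed, dropped


f = DropFilter(log_type="service-10", field="message", value="Proxy request services send")
logs = [
    {"type": "service-10", "message": "Proxy request services send"},
    {"type": "service-10", "message": "upstream timeout"},
    {"type": "service-11", "message": "Proxy request services send"},
]
indexed, dropped = split(logs, f)
print(len(indexed), len(dropped))  # 2 1
```

Scoping the filter by log type means the same message from a different service (here `service-11`) is still indexed.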
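The toggle can be pictured as an `active` flag on each filter: an inactive filter never drops anything, so matching logs flow into the index again. As before, the class, the `type` key, and the exact-match rule are hypothetical modeling choices, not Logz.io’s implementation.

```python
from dataclasses import dataclass


@dataclass
class DropFilter:
    """Hypothetical drop filter with an on/off switch."""
    log_type: str
    field: str
    value: str
    active: bool = True

    def drops(self, log: dict) -> bool:
        # An inactive filter never drops anything, so matching logs are indexed.
        return (
            self.active
            and log.get("type") == self.log_type
            and log.get(self.field) == self.value
        )


def should_index(log: dict, filters) -> bool:
    """Index a log unless at least one active filter drops it."""
    return not any(f.drops(log) for f in filters)


panel = [DropFilter("service-10", "message", "Proxy request services send")]
log = {"type": "service-10", "message": "Proxy request services send"}

print(should_index(log, panel))  # False: the filter is active
panel[0].active = False          # e.g. during a big release of service-10
print(should_index(log, panel))  # True: matching logs are indexed again
```

Toggling only changes what happens to logs arriving afterwards; anything dropped while the filter was on has to come back through Archive & Restore.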
Summing it up
Many modern DevOps teams are caught between cutting costs and maintaining full observability.
You don’t have to choose between cost management and full observability. With Logz.io features like Log Patterns, Drop Filters, and Archive & Restore, you can manage costs without sacrificing your data. Together, these features help you identify costly data, filter it out before it is indexed, and reingest it later if necessary.
Check out this page to learn more about how Logz.io can help you make logging more cost efficient.