Considering the scale of log data that modern cloud environments generate, it’s oftentimes prohibitively expensive to index all of it. For monitoring and logging, cost management is just as important as in other parts of the business. Whether sudden spikes of log data overwhelm databases or good business generates more activity in your environment, teams should anticipate and mitigate the steep costs that result from high log volumes.
But isn’t solving this problem as simple as filtering out log data using a Fluentd processor or a syslog server?
We don’t think so. What if you need all of your log data tomorrow to troubleshoot a production incident? It’s easy to be torn between filtering out log data for the sake of cost and keeping all of it for the sake of observability.
With Logz.io, you won’t have to make those tradeoffs. In this blog, we’ll explore how three Logz.io features work together to easily identify and filter out the costly log data that isn’t always essential, while maintaining the option to reingest and analyze that data if it becomes indispensable.
Log Patterns: Perfect for Logging Cost Management
“Costly logs” are logs beyond your bare essentials that take up a lot of space in your indices, and therefore your budget. It’s dangerous to simply throw these logs away: the last thing you want is to hit a production incident and find that the log data you need for troubleshooting has been deleted (more on this later).
Of course, you don’t want to pay for log data you rarely look at. Sending data that will seldom be analyzed wastes time and money by:
- Impacting the performance and stability of your system by clogging up databases.
- Requiring users to search through more data when exploring logs, metrics, or traces.
- Racking up unnecessary infrastructure costs if using open source, on-premises solutions.
- Resulting in steep up-front costs from proprietary vendors.
To identify the less useful data, the first steps are to ship, index, and analyze it. Then you can decide whether you need it for day-to-day operations.
An easy way to pinpoint the noisy data is with Log Patterns, which clusters similar logs into smaller, manageable groups and orders them by frequency. This can turn millions of records into tens of patterns, making it clear which log data is most common.
In the screenshot below, notice the columns labeled ‘Count’, ‘Ratio’, and ‘Pattern’. These show us which logs have the greatest impact on our costs.
We immediately see that our ‘Proxy request to services send’ logs are eating up 18.26% of our logging costs. If we look at them infrequently, they are strong candidates for removal.
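To make the idea of pattern clustering concrete, here is a minimal Python sketch. It is not Logz.io’s actual algorithm: the masking rules and the function names `to_pattern` and `summarize` are our own assumptions for illustration. It replaces variable-looking tokens with placeholders, groups identical results, and reports a count and ratio per pattern, most frequent first.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order. They are illustrative
# assumptions only, not how Logz.io derives its patterns.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),   # IPv4-like tokens
    (re.compile(r"\b[0-9a-fA-F]{8,}\b"), "<HEX>"),          # long hex-like IDs
    (re.compile(r"\b\d+\b"), "<NUM>"),                      # remaining numbers
]


def to_pattern(message: str) -> str:
    """Collapse a log message into a pattern by masking variable parts."""
    for regex, placeholder in MASKS:
        message = regex.sub(placeholder, message)
    return message


def summarize(messages):
    """Return (count, ratio, pattern) rows, most frequent pattern first."""
    counts = Counter(to_pattern(m) for m in messages)
    total = sum(counts.values())
    return [(n, n / total, p) for p, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample messages, only to show the shape of the output.
    sample = [
        "Request 1 completed in 20 ms",
        "Request 2 completed in 35 ms",
        "Connection from 10.0.0.1 refused",
    ]
    for count, ratio, pattern in summarize(sample):
        print(f"{count:>5}  {ratio:6.2%}  {pattern}")
```

The rows mirror the ‘Count’, ‘Ratio’, and ‘Pattern’ columns described above: the patterns at the top of the list are the first candidates to review.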
Logz.io Drop Filters: Filter out the costly logs
Once you know which logs you’ll seldom use, the next step is to filter them out. Logz.io’s pricing is based on how much log data is indexed, not how much is shipped. This means that if you filter out log data before indexing, it won’t count toward your costs.
When you add a Drop Filter, Logz.io filters out the defined log data rather than indexing it.
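As a rough back-of-the-envelope illustration, the short Python sketch below estimates how much data would still be indexed after dropping one pattern. The 18.26% ratio comes from the Log Patterns example above; the 100 GB/day shipped volume and the function name `indexed_volume_gb` are hypothetical.

```python
def indexed_volume_gb(shipped_gb: float, dropped_ratio: float) -> float:
    """Volume still indexed after dropping a share of the shipped logs."""
    if not 0.0 <= dropped_ratio <= 1.0:
        raise ValueError("dropped_ratio must be between 0 and 1")
    return shipped_gb * (1.0 - dropped_ratio)


# 100 GB/day is an assumed volume; 0.1826 is the ratio from the example above.
remaining = indexed_volume_gb(shipped_gb=100.0, dropped_ratio=0.1826)
print(f"{remaining:.2f} GB/day still indexed")  # prints 81.74 GB/day still indexed
```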
Adding a Drop Filter is easy. Simply go to the ‘Tools’ tab under the cog wheel in the top right corner and hit ‘Drop filters’.
From there, we can add a filter by specifying:
- The log type that the filter will look at
- The field that the filter will look at
- The value of the field that you’d like to drop
In this case, we pointed the new Drop Filter towards ‘service-10’, asked it to watch the ‘message’ field, and pasted our ‘Proxy request services send’ message into the value field.
From this point on, Logz.io will filter out all logs with this message before they are indexed. If these logs were to spike, they wouldn’t count toward the Logz.io cost. And if they’re needed later, they can be re-ingested for analysis (see next section).
Say you have a big release for ‘Service-10’, and want to analyze all of the logs it generates in the next day or two to make sure nothing breaks. You can deactivate the Drop Filter by toggling it. All future ‘Proxy request services send’ logs will then be indexed and available for analysis until you turn the Drop Filter back on.
The ability to toggle a list of Logz.io Drop Filters offers a sort of control panel, which allows users to manage and regulate the incoming streams of logs to avoid indexing unnecessary data.
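The behavior described in this section can be sketched in a few lines of Python. This is a simplified model, not Logz.io’s implementation: the `DropFilter` class, the `logs_to_index` function, the `type` key on each log, and the exact-match rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DropFilter:
    """Hypothetical model of a drop filter: log type, field, and value."""
    log_type: str
    field: str
    value: str
    active: bool = True  # toggling this mimics switching the filter on or off

    def matches(self, log: dict) -> bool:
        # Assumption: a log is dropped when the filter is active, its type
        # matches, and the watched field equals the configured value exactly.
        return (
            self.active
            and log.get("type") == self.log_type
            and log.get(self.field) == self.value
        )


def logs_to_index(logs, filters):
    """Keep only the logs that no active filter matches."""
    return [log for log in logs if not any(f.matches(log) for f in filters)]


if __name__ == "__main__":
    noisy = DropFilter("service-10", "message", "Proxy request services send")
    incoming = [
        {"type": "service-10", "message": "Proxy request services send"},
        {"type": "service-10", "message": "Order created"},
    ]
    print(len(logs_to_index(incoming, [noisy])))  # filter on: noisy log dropped
    noisy.active = False                          # e.g. during a big release
    print(len(logs_to_index(incoming, [noisy])))  # filter off: everything kept
```

Keeping the on/off state on the filter itself, rather than deleting and recreating it, is what makes the list of filters work like a control panel.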
Logz.io Archive & Restore: Keep the option to analyze all your logs
At this point, we’ve identified and filtered out the logs that are impacting our costs. Now we can forget about them, right?
Not quite. Throwing away log data can leave a gap in your system’s observability. What if there is a production incident tomorrow, and you’re missing the data that is key to figuring out what happened? While filtering out data helps our logging cost management, we still need access to that data in case we need it at another time.
Logz.io’s Archive & Restore allows us to archive our log data so we can access it again, which is far less expensive than indexing all of it.
First, you’ll need to create an IAM role in AWS and an S3 bucket. Then, navigate to ‘Archive & Restore’ in the Logz.io tools tab. Enter the S3 bucket name, AWS access key, and AWS secret key in the ‘Archive’ tab.
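When setting up the AWS side, it is good practice to scope access to the archive bucket only. The Python sketch below builds such an IAM policy document. The list of actions (`s3:ListBucket`, `s3:PutObject`, `s3:GetObject`) is our assumption of what archiving and restoring typically require, so check the Logz.io documentation for the exact permissions; the bucket name `my-log-archive` and the function name are hypothetical.

```python
import json


def archive_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading and writing apply to the objects inside the bucket.
                # Assumed action list; verify against the vendor's docs.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-log-archive" is a placeholder bucket name.
    print(json.dumps(archive_bucket_policy("my-log-archive"), indent=2))
```

The printed JSON can be pasted into the AWS console when creating the policy.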
Connect to Your S3 Bucket
You can test the connection to make sure Logz.io can reach your new S3 bucket. From there, hit ‘Archive’ to begin sending your data to the S3 bucket for storage.
You’ll get a popup in the app telling you that Logz.io has successfully connected to your S3 bucket. You’ll also see a message under the page title indicating that logs are currently being archived to S3.
We’re now archiving our data, but how can we restore it? We can start on the same page in the ‘Restore’ tab.
Simply enter the name of the AWS account and the time range of the data you’d like to index, then hit ‘Restore’.
This will begin pulling data from your S3 bucket into Logz.io so you can analyze it in Kibana. Once the restore has started, you can check its status on the next tab to the right: ‘Restored accounts’.
As we can see, Logz.io is in the process of re-ingesting the data.
We’ll get a notification once Logz.io has fully restored the data for the desired time frame, which is especially helpful for larger restores that can take a little longer.
When you’re looking for your restored logs, make sure you’ve set the time range in Kibana to the timeframe in which the logs were originally shipped!
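The reason the Kibana time range matters is that restored logs keep their original timestamps. A minimal Python sketch of that idea follows. It is a conceptual model only: the `select_for_restore` function, the `timestamp` key, and the choice of an end-exclusive window are assumptions, not a description of how Logz.io restores data internally.

```python
from datetime import datetime, timezone


def select_for_restore(archived_logs, start, end):
    """Return archived logs whose original timestamp is in [start, end)."""
    return [log for log in archived_logs if start <= log["timestamp"] < end]


if __name__ == "__main__":
    # Made-up archive entries with their original, timezone-aware timestamps.
    archive = [
        {"timestamp": datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc), "message": "a"},
        {"timestamp": datetime(2020, 1, 2, 9, 0, tzinfo=timezone.utc), "message": "b"},
    ]
    window_start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    window_end = datetime(2020, 1, 2, tzinfo=timezone.utc)
    for log in select_for_restore(archive, window_start, window_end):
        # The restored log still carries its original timestamp, so the
        # search window has to cover that time, not the time of the restore.
        print(log["timestamp"].isoformat(), log["message"])
```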
Summing it up
Many modern DevOps teams are caught in a pickle between cutting costs and maintaining full observability.
You don’t have to choose between cost management and full observability. Logz.io’s Log Patterns, Drop Filters, and Archive & Restore features work together to improve cost efficiency. They are designed to make it easy to identify costly data, filter it out before indexing, and reingest it later if necessary.
Check out this page to learn more about how Logz.io can help you make logging more cost efficient.