Cost-Efficient Data Management with Logz.io Smart Tiering™

Logz.io Smart Tier gives you the flexibility to place data on different tiers to optimize cost, performance and availability. Unlike the traditional hot-warm-cold paradigm, Logz.io Smart Tier gives you a “hot” real-time experience while on a “warm” tier, at a reduced cost.

Not all data is made equal

Keeping log data comes in handy when troubleshooting operational issues. Keeping data, however, comes at a cost: the storage cost. This is especially evident with logs, which are inherently a verbose textual type of telemetry. Sometimes this drives painful decisions to cut retention time short of the desired period, just to keep costs manageable.

What if there were a middle ground?

Well, there is. But to understand it, you need to first understand your data. Ask yourself this:

Do you use all your log data the same way?

Do you need all your logs with the same availability?

When we look at common operations practices, we see that as data ages, it is accessed far less often, and typically in a less time-sensitive manner. Just think about it: in day-to-day operations, you frequently need the logs of the last couple of days, and you need them in real time to monitor and resolve urgent production issues. Looking back a week or two in logs is far less common.

And then there are cases where you need to fetch older data, even from a few months back, following customer claims, a security investigation or a compliance audit. These are usually longer processes that are far less sensitive to data retrieval times.

You can think about your data in terms of how frequently and how urgently you use it:

  • Critical data is frequently used
  • Active data is infrequently used
  • Historical data is seldom used

The data is always there, and there is no compromise on data integrity. The question is how highly available the data is for immediate visualization and exploration in the tool, and how that trades off against the cost of keeping it highly available.
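The three usage categories above can be sketched as a simple age-based rule. This is a minimal illustration in Python, not a Logz.io API: the function name and the 3-day and 30-day thresholds are hypothetical, picked only to mirror the "last couple of days" and "a few months back" examples in the text.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds, chosen only for illustration.
CRITICAL_MAX_AGE = timedelta(days=3)   # "the last couple of days"
ACTIVE_MAX_AGE = timedelta(days=30)    # older than this counts as historical


def classify_by_age(log_time: datetime, now: datetime) -> str:
    """Return 'critical', 'active' or 'historical' based on the log's age."""
    age = now - log_time
    if age <= CRITICAL_MAX_AGE:
        return "critical"
    if age <= ACTIVE_MAX_AGE:
        return "active"
    return "historical"


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
for age in (timedelta(hours=6), timedelta(days=10), timedelta(days=90)):
    print(age, "->", classify_by_age(now - age, now))
```

In practice the right thresholds depend on how your own team actually queries its logs, so treat the numbers as placeholders.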

Not all data should cost equal

Not all our data is critical. In fact, most of our data probably isn’t critical. So why do we need to pay for the highest level of availability for all our data? Could we reduce costs by storing the non-critical data in lower tiers with lower availability?

Yes, we can!

A common pattern in data architecture is known as Hot-Warm-Cold (some even extend to Frozen). In essence, this architecture defines different data tiers, each based on different storage infrastructure with a different cost-performance tradeoff. For example, it may be that

  • the Hot tier runs on SSD (solid-state drives),
  • the Warm tier runs on HDD (spinning disks), and
  • the Cold tier runs on tape, or offloads to a cheap external storage service.

According to the classic paradigm, you will experience greater latency when interacting with your data on the warm tier than on the hot tier, but because the storage is cheaper, it will cost you significantly less (and similarly for cold compared to warm).

Having the flexibility to handle different data tiers in different ways is great. But do you really have to sacrifice your query latency to enjoy it?

We believe not.

We at Logz.io change the traditional paradigm. With our smart storage as a second tier, you can reduce costs for the Warm data, while keeping the query performance you’re used to from your normal Hot tier service.

Logz.io Smart Tiering™ for Flexible Cost-Performance Balance

Welcome to Logz.io Smart Tiering:

  • Real-Time Tier for your critical data, with the top real-time performance.
  • Smart Tier for your active data, offering the same real-time performance as above, with reduced replication.
  • Historical Tier for your historical data, with archiving to your cloud object storage of choice.

Logz.io Data Tiering, different tier options

How do we do that?

As your logs come in, they are stored in the Real-Time Tier.

In the Real-Time Tier, your critical log data is kept with redundant hot replicas of the same level, so that upon a failure event the system seamlessly switches over to the replica without any impact on your query latency.

As your data ages beyond the first days, when high availability and performance are critical, you may choose to move it to the Smart Tier.

Logz.io Smart Tier uses the exact same storage type as the Real-Time Tier, so you enjoy the same query latency you are used to. We bring the price down by replacing the hot replicas with cheaper storage replicas. The impact is noticeable only when you query data exactly at the time that specific data segment fails, and before the recovery has been completed. Aside from these highly infrequent cases, you will enjoy at least 97% availability with the same experience as the standard Real-Time Tier, while saving up to 25% of your data retention costs.
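To get a feel for what this can mean for a retention budget, here is a back-of-the-envelope sketch in Python. It assumes steady-state ingestion, a flat price per GB per day of retention, and that the discount applies only to the days data spends in the Smart Tier. The discount is set to the "up to 25%" figure as a best case. The function name, volumes and unit price are illustrative assumptions, not Logz.io pricing.

```python
def retention_cost(daily_gb: float, retention_days: int, realtime_days: int,
                   smart_discount: float, unit_cost: float = 1.0):
    """Return (all_realtime_cost, tiered_cost) in arbitrary cost units.

    Assumes a flat unit_cost per GB per day of retention, and that data
    spends realtime_days in the Real-Time Tier and the rest of its
    retention in the Smart Tier at (1 - smart_discount) of the price.
    """
    if not 0 <= realtime_days <= retention_days:
        raise ValueError("realtime_days must be between 0 and retention_days")
    smart_days = retention_days - realtime_days
    baseline = daily_gb * retention_days * unit_cost
    tiered = daily_gb * unit_cost * (realtime_days + smart_days * (1 - smart_discount))
    return baseline, tiered


# Illustrative numbers only: 100 GB/day, 30 days retention, 3 days in Real-Time.
baseline, tiered = retention_cost(daily_gb=100, retention_days=30,
                                  realtime_days=3, smart_discount=0.25)
savings = (baseline - tiered) / baseline
print(f"all Real-Time: {baseline:.0f}, tiered: {tiered:.0f}, saved: {savings:.1%}")
```

With these illustrative inputs the blended saving is 22.5% of the all-Real-Time cost. The longer data stays in the Real-Time Tier, the smaller the saving becomes.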

On the Historical Tier, you have data that can be months and even years old, which can be archived with Logz.io Archive to your AWS S3 or Azure Blob Storage. Logz.io Archive is meant for optimal cost efficiency, and is currently offered free of charge, so you just bear the cost of your cloud storage for the archived data.

When you need to investigate archived data, you can restore it with Logz.io Restore, which will re-ingest and index the data so it’s accessible in your Kibana. Logz.io Restore is currently offered free of charge for a limited log volume and retention, so you don’t have to sacrifice some of your operations quota to perform off-stream investigation.

You can augment your data management strategy with Logz.io Drop Filters, which enable you to automatically exclude certain logs from being indexed and counted against your log volume quota. One of the great things about Logz.io Drop Filters is that you don’t have to give up data integrity even for the logs you filtered out: while the dropped logs aren’t indexed in Elasticsearch, they will still be archived (if the archive feature is enabled). And, just like any archived log, you will be able to restore and explore them in Kibana for ad-hoc investigations.
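The interplay between drop filters and archiving can be pictured with a small conceptual model. The Python sketch below is not the Logz.io API or implementation: the function name, the record shape and the filter predicate are all hypothetical, and it only mirrors the behavior described above, where dropped logs skip indexing but still reach the archive when archiving is enabled.

```python
from typing import Callable, Dict, List, Tuple

LogRecord = Dict[str, str]


def route_logs(records: List[LogRecord],
               drop_filter: Callable[[LogRecord], bool],
               archive_enabled: bool = True) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Split records into (indexed, archived) lists.

    Records matching drop_filter are not indexed. When archiving is
    enabled, every record is archived, whether dropped or not.
    """
    indexed: List[LogRecord] = []
    archived: List[LogRecord] = []
    for record in records:
        if archive_enabled:
            archived.append(record)
        if not drop_filter(record):
            indexed.append(record)
    return indexed, archived


# Hypothetical example: drop DEBUG logs from the index but keep them archived.
sample = [
    {"level": "DEBUG", "message": "cache warm-up"},
    {"level": "ERROR", "message": "payment failed"},
]
indexed, archived = route_logs(sample, lambda r: r["level"] == "DEBUG")
print(len(indexed), "indexed,", len(archived), "archived")
```

Only the indexed list would count against the log volume quota in this model, while the archived list remains available for a later restore.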

Summary

Not all data is made equal. And not all data should cost equal.

Logz.io breaks the traditional hot-warm-cold paradigm to give you a “hot” real-time experience on the second tier as well, at a reduced cost.

Logz.io Smart Tiering gives you the flexibility to define a data management policy that divides your data across different tiers based on the desired balance between cost, performance and availability:

  • Real-Time Tier for operational data available in Kibana in real time with redundant hot replicas.
  • Smart Tier for infrequent querying, enjoying the same real-time performance as the top tier, backed up with cold replicas.
  • Historical Tier for compliance and auditing, with archiving to AWS S3 or Azure Blob, supporting reindexing into Elasticsearch for investigation in Kibana.

In combination with Drop Filters for filtering out low-importance logs, you can achieve optimal log management, with longer overall data retention and lower costs.

Start planning your data tiering now, try out archiving, or contact us for more details.

 
