The Complete Guide to Sampling in Distributed Tracing

Understanding Sampling

In the realm of DevOps observability, distributed tracing shines as a beacon illuminating the intricate pathways within complex, distributed systems and microservices architectures. Yet, capturing every trace can quickly become a resource-intensive and cost-prohibitive task.

Enter sampling, a pivotal technique offering observability without drowning in data. Sampling strategically selects traces for recording, capturing a representative subset that mirrors the system’s behavior and provides meaningful insights without overwhelming the system.

As organizations become more cost-conscious in the current financial climate, employing an efficient sampling strategy becomes essential. While sampling can be applied to other types of telemetry, it is most commonly used for tracing. Here, we’ll focus on the tracing use case.

Benefits of Sampling

Applying sampling to your telemetry offers several important benefits:

  • Resource Optimization: Reduces storage and computational needs by focusing on essential traces.
  • Performance Impact Mitigation: Prevents the overhead associated with capturing excessive trace data.
  • Cost Efficiency: Lowers costs related to storage, processing, and network bandwidth.

The Mechanics of Generating Traces

Let’s first understand how traces are generated and constructed, as this system view will help us understand the performance implications and the different tradeoffs.

A Trace, comprised of spans, illustrated as a dependency graph and a Gantt chart

Traces represent the complete execution sequence for a given request coming into the system. A trace consists of spans, where each span captures one operation in the sequence and is emitted as soon as that operation completes, independently of any other spans.

The trace is assigned a unique identifier, the trace ID, which is then set on each span to denote which trace the span belongs to, together with the ID of the span that triggered the current one (sometimes referred to as the “parent span”) and additional context that captures the causality.

The trace context is propagated with the execution flow via tracing SDKs or instrumentation agents, such as OpenTelemetry’s client libraries or Jaeger’s tracers. 
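
To make this concrete, here’s a minimal sketch using the OpenTelemetry Python SDK (the span and instrumentation names are hypothetical, and exporter setup is omitted for brevity):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Register a tracer provider; a real setup would also configure a span
# processor and exporter that ship spans to a collector or backend.
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer("checkout-service")  # hypothetical instrumentation name

# The outer span becomes the root span of the trace; the inner span automatically
# receives the same trace ID and the outer span's ID as its parent span ID.
with tracer.start_as_current_span("handle_request"):
    with tracer.start_as_current_span("query_database"):
        pass  # each span is emitted as soon as its operation completes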

These emitted spans are typically sent to an observability agent, collector, or backend, such as the open source OpenTelemetry Collector or the Logz.io Telemetry Collector. Once all the spans reach that agent, the full trace is constructed from them, based on the aforementioned trace ID and context.

Collecting traces – system view, with OpenTelemetry and Jaeger as examples

As we can see, traces are generated in a two-phase process: 

  1. Span Emission: Spans are emitted for individual operations, typically by tracing SDKs or client libraries.
  2. Trace Construction: Spans are collected at a trace collector or agent, and full traces are constructed from them based on their trace ID and causality.
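
Conceptually, the construction step boils down to grouping the emitted spans by their shared trace ID (real collectors additionally use the parent span IDs to rebuild the causal tree). A minimal, framework-agnostic sketch, assuming each span is a dict carrying a trace_id field:

from collections import defaultdict

def build_traces(spans):
    """Group emitted spans into traces based on their shared trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return traces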

At its core, a sampling engine makes the decision of whether to keep a trace or to drop it. In accordance with the two-phase trace generation process, the sampling decision can be made upon span emission, or upon trace construction. 

What’s the right answer, then? It depends on your sampling strategy. 

At a very high level, sampling strategies can be divided into two categories: head-based and tail-based sampling. Let’s look at what each means, how they differ, and what kinds of sampling policies can be achieved with each.

Head-Based Sampling

If a trace were a book (“Tale of a Request”), then Head-Based Sampling would be judging the book by its cover (or the first page). A swift but shallow judgment: it doesn’t take much time or effort to decide, but we may skip interesting books.

In head-based sampling, the decision of which trace to keep is made upfront, typically at span emission time. At that point, the full trace is not yet available, and the decision is based on the individual span at hand. Typically, this decision is made on the root span, the span that initiates the trace, for example due to a client request coming into the system.

Once a decision is made on whether to keep (namely, to sample) or to drop the root span, the same decision is applied to all subsequent spans of that trace, to ensure that entire traces are kept or dropped as a whole.

Technically, this can be achieved by propagating the decision along with the trace context (sometimes referred to as a parent-span-based sampling policy), or by applying the same deterministic computation to each span, such as a hash calculated on the trace ID.
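
As an illustration, here’s a minimal sketch with the OpenTelemetry Python SDK: a ParentBased sampler wraps a trace-ID-ratio sampler, so the decision on the root span is computed deterministically from the trace ID, and every child span honors the decision propagated from its parent (the 25% ratio is an arbitrary example figure):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Root spans: keep ~25% of traces, decided deterministically from the trace ID.
# Child spans: follow the decision propagated in the parent's trace context.
sampler = ParentBased(root=TraceIdRatioBased(0.25))
trace.set_tracer_provider(TracerProvider(sampler=sampler))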

Head-based vs. Tail-based sampling

Tail-Based Sampling

Going back to our trace book, Tail-Based Sampling would be reading the whole book to the end to pass judgment. A thorough but time-consuming judgment: it takes time and effort to read the whole book, but we’re sure to spot the interesting books.

In tail-based sampling, the decision of which trace to keep is made after the trace construction phase is completed: after all the spans have been generated and emitted by the application, and have been collected and reconstructed in the collector agent based on causality.

In tail-based sampling, the full trace is available for review, and the sampling decision can be made based on the full trace and its context. This opens the door to a variety of elaborate policies, including stratified sampling, in which we segment (or stratify, as it’s termed in statistics) the traces so that we keep more samples of events that are interesting, high-profile, less common, or altogether new.

Let’s look at an example, addressing traces that originate from HTTP-based requests, and applying different probabilistic policies for different segments:

  • 100% sampling of traces with errors
  • 100% sampling of traces with HTTP status code of 400s or 500s
  • 10% sampling for traces of customers in the Premium plan
  • 0.1% sampling of readiness/liveness probes
  • 0.5% sampling of a noisy endpoint “/v1/aa/bb” of Service1
  • 1% sampling of all other traces

As we can see, errors or HTTP status codes can mark traces as interesting. Similarly, we can set rules based on the URL, the RESTful verb (PUT, GET, etc.) or other elements. We also see reduced sampling of non-interesting or noisy events.

In the example we also incorporate our own business logic, in this case the customer’s assigned plan, where we attribute higher importance to the premium plan.

Tail sampling is becoming more commonplace in tracing tools, with varying capabilities. For example, OpenTelemetry’s Tail Sampling Processor supports sampling policies (a.k.a. samplers) based on status code, trace duration (latency), span count (minimum/maximum number of spans in the trace), and various supported attribute types (boolean, string, numeric), as well as composite policies, combining multiple samplers. Here’s an example policy configuration:

processors:
  tail_sampling:
    policies:
      [
        {
          name: error-in-policy,
          type: status_code,
          status_code: {status_codes: [ERROR]}
        },
        {
          name: slow-traces-policy,
          type: latency,
          latency: {threshold_ms: 500}
        },
        {
          name: probability-policy,
          type: probabilistic,
          probabilistic: {sampling_percentage: 10}
        }
      ]

Once the decision has been made, the spans of traces that aren’t chosen for sampling can be cleared from memory, while the sampled traces are further processed and then forwarded to the analytics backend, such as the open source Jaeger tool or the Logz.io managed service.

Head vs. Tail Sampling: Tradeoffs and Considerations

Let’s go back to our trace-as-a-book metaphor. We need to decide on keeping or skipping the book. In Head-Based Sampling we judge the book by its cover or first page, which is quick and low effort but overlooks anything past that point (interesting twists in the plot? damaged pages?). In Tail-Based Sampling we invest in reading the whole book in order to make a fully informed decision with the complete story in mind.

As you can see, there is a tradeoff between performance and context-informed decisions. Let’s look into these respective aspects.

Performance and Scalability

Head Sampling is more efficient for the monitored application than Tail Sampling, as the decision is made upfront and the application does not need to generate and emit spans for traces that are not sampled. This saves compute time and resources in the application, on the critical path of serving the request.

In addition, head sampling is more efficient on compute and storage resources in the collector agent, as it only needs to store and process spans for the traces designated to be kept (namely, only the sampled traces). In the case of Tail Sampling, all the spans need to be collected and processed into full traces, and the sampling decision is made only afterwards.

Head sampling also lends itself better to parallel processing of incoming spans with multiple collectors. Tail sampling may impact the scalability of the trace ingestion pipeline, as all the spans of a given trace need to go through the same collector instance that runs the tail processing logic.

Context-Based and Differentiated Policy

Head Sampling only has the first span in sight at decision time, which limits the sampling policies it can support to rather arbitrary decisions, such as probabilistic or rate-limiting policies. The problem with arbitrary decisions is that rare events, which happen with low probability but are significant for incident flagging and investigation, may not be sampled.

For example, traces with errors in a downstream service or with exceedingly long latency are probably more interesting for us, and we may want to keep them at higher rates than the “all’s good” traces. Similarly, we may place more importance on certain critical microservices or operations (think login or purchase checkout) and would like to keep higher rates of traces originating from them.

Tail sampling enables sampling policies based on the full trace context, which gives rise to sophisticated policies according to the nature of the request and its various attributes, as well as the response to that request and any error, failure, or latency patterns that occurred during its execution.

With Tail sampling we can identify the “interesting” traces, those that typically happen much less frequently, and sample them at higher rates so we have enough data to investigate.

Combining Head and Tail Sampling

As you can see, there are different tradeoffs between Head and Tail Sampling, and the decision of which to use depends on the system requirements and constraints. You can also combine head and tail-based sampling to optimize performance. For example, you can use head sampling to drop non-interesting and trivial traces, then use tail-based sampling to determine different probabilities among the interesting traces.
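
For instance, here’s a framework-agnostic sketch of that split (the endpoint names and the 50% ratio are hypothetical): a cheap head decision drops health-probe traces and thins the rest at the SDK, and whatever survives is left for the collector’s tail-based policies, such as the tail_sampling configuration shown earlier, to weigh with full-trace context:

import hashlib

def head_decision(root_span_name: str, trace_id: str, keep_ratio: float = 0.5) -> bool:
    """Cheap, upfront decision made at root-span creation time (head sampling)."""
    # Drop trivial, noisy traces outright -- hypothetical health-probe endpoints.
    if root_span_name in ("/healthz", "/readyz"):
        return False
    # Thin the remaining traffic deterministically by hashing the trace ID,
    # so every span of the same trace reaches the same decision.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < keep_ratio * 10_000

# Traces that pass this head decision still go through the collector's tail-based
# policies (errors, latency, probabilistic), applied once the full trace is available.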

Let’s look at some common sampling policies which offer a way to define the volume of sampled traces, and can be applied over head and tail based sampling mechanisms.

Probabilistic Sampling: Fixed Probability Policies

Probabilistic Sampling randomly selects traces based on predetermined probability, ensuring a fair representation of the whole tracing data, while reducing computational load. For example, a probabilistic sampling policy of 0.1% means one trace out of every 1,000 traces will be kept. 

This policy is easily implemented over head-based sampling, and is widely supported by tracing libraries such as OpenTelemetry’s Probabilistic Sampling Processor or Jaeger’s Probabilistic Sampler, typically as simple as configuring a percentage or probability figure. 

Some systems offer Always-Sample and Never-Sample policies, which are effectively the same as using 100% or 0% probability, respectively.
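
As a minimal sketch with the OpenTelemetry Python SDK (the 0.1% figure mirrors the example above):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Keep roughly 1 in every 1,000 traces (0.1% probability), derived from the trace ID.
trace.set_tracer_provider(TracerProvider(sampler=TraceIdRatioBased(0.001)))

# Always-Sample and Never-Sample correspond to the SDK's built-in ALWAYS_ON and
# ALWAYS_OFF samplers, i.e. 100% and 0% probability, respectively.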

Rate Limiting Sampling Policy: Fixed or Maximum Rate Policies

Rate Limiting Sampling samples traces at a certain fixed rate or threshold, typically stated in spans or traces per second. For example, a policy can sample 10 traces per second. This policy is also easily implemented over head-based sampling and is widely supported by tracing libraries.

In certain cases, Rate Limiting Sampling also sets a threshold for the maximum number of spans to be sampled per second, to avoid spikes from overloading the system. Such a threshold can be configured in combination with probabilistic sampling, to cap the probabilistic sampling at a maximum volume.
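
Here’s a framework-agnostic sketch of such a rate cap, using a simple token bucket (the limit of 10 traces per second is just the example figure above); in practice you would typically rely on the rate-limiting samplers built into your tracing library rather than rolling your own:

import time

class RateLimitingSampler:
    """Keep at most max_per_second traces per second, using a token bucket."""

    def __init__(self, max_per_second: float = 10.0):
        self.max_per_second = max_per_second
        self.tokens = max_per_second
        self.last_refill = time.monotonic()

    def should_sample(self) -> bool:
        now = time.monotonic()
        # Refill tokens proportionally to the elapsed time, up to the cap.
        self.tokens = min(self.max_per_second,
                          self.tokens + (now - self.last_refill) * self.max_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True  # within the rate budget: sample this trace
        return False     # budget exhausted for now: drop this trace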

Adaptive Sampling Policy: Dynamic Probability Policies

Using fixed probability figures, as in the above Probabilistic Sampling, may not adequately fit the actual traffic volumes and distribution, and too often engineers are not even aware of the distribution of their own traffic upfront.

Moreover, even if the distribution is known upfront, and static probability has been adequately calculated, traffic patterns may change over time and may call for adjusting the probabilities, such as during an incident. 

Adaptive Sampling addresses that by dynamically adjusting the sampling rate based on system load or specific conditions. In this case, a sliding window of the recent trace data is observed, and the policy engine recalculates the sampling probabilities based on that recent traffic. 

For example, an adaptive sampling policy can observe each service endpoint for its traffic pattern. This example policy can be implemented over head-based sampling, as the service and endpoint information is available at the root span.
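
As a rough, framework-agnostic sketch of the idea (the endpoint names and target figure are hypothetical): observe the recent traffic per service endpoint over a sliding window, and assign each endpoint a probability that keeps roughly a target number of traces per window:

from collections import Counter

def recalculate_probabilities(recent_counts: Counter, target_per_endpoint: float) -> dict:
    """Recompute per-endpoint sampling probabilities from a sliding window of traffic.

    recent_counts maps each service endpoint to the number of traces observed in the
    window; target_per_endpoint is how many traces per window we aim to keep for each.
    """
    return {
        endpoint: min(1.0, target_per_endpoint / count)
        for endpoint, count in recent_counts.items()
    }

# Example: a noisy endpoint gets a low probability, a quiet one keeps a higher rate.
window = Counter({"Service1 /v1/aa/bb": 50_000, "Service2 /checkout": 200})
print(recalculate_probabilities(window, target_per_endpoint=100))
# {'Service1 /v1/aa/bb': 0.002, 'Service2 /checkout': 0.5}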

More elaborate adaptive sampling may call for a tail-based sampling mechanism, such as collecting more of the exceedingly high-latency traces, or traces that result in a cache miss in the downstream backend caching layer.

While more flexible, adaptive sampling requires more storage and processing resources to buffer the sliding window data and calculate its distribution. It’s also important to note that this advanced policy is not supported by all tools.

The Jaeger project introduced support of adaptive sampling in 2022, based on a feature that has been in production at Uber for several years.

Best Practices for Effective Sampling

The sampling policies we’ve seen offer diverse options to choose from. Here are some guidelines on how to choose the options that are right for you:

  • Clear Observability Objectives: Define specific insights sought from traces to set appropriate sampling strategies.
  • Performance and Cost Constraints: Perform adequate load testing to estimate the compute and storage needs for sampling at your expected scale.
  • Blend Sampling Methods: Employ a mix of techniques based on different scenarios for optimal results.

A common best practice for tail-based sampling is:

  • Keep all traces whose status code is ERROR
  • Keep all the slow traces, where latency exceeds a predefined static threshold
  • Keep a percentage of all other traces, in probabilistic fashion.  

Determine the Right Sampling Policy for You

Sampling in distributed tracing stands as a powerful ally, offering observability while preventing data inundation. By intelligently selecting traces, teams gain insights into system behavior while optimizing resources. Understanding sampling nuances and implementing them effectively is vital for achieving comprehensive observability in distributed systems.

Different sampling policies can be combined to create more elaborate sampling logic, such as falling back to a less compute- or storage-intensive policy when reaching resource saturation, or placing a preventive rate limit.

Strike the right balance between capturing enough traces for meaningful insights and not overwhelming the system. Embrace sampling as a strategic asset in your pursuit of robust observability!
