Instrumenting Microservices with Istio for Distributed Tracing

Istio Tracing Instrumentation
  • Why? Less code. Istio’s tracing integration reduces the work required to instrument your services for tracing. It is not entirely code-free, however: developers must still propagate tracing context.
  • How? OpenTracing API or manual copy. To achieve this, either leverage OpenTracing APIs or write the code to manually forward the relevant Zipkin headers. We find that manually forwarding Zipkin headers is the simplest and most transparent option.
  • Why not OpenTelemetry? At the time of writing, positions held in OpenTelemetry’s specification prevent its B3 propagators from propagating the headers required for well-formed traces.
  • W3C Trace Context. Adopting W3C Trace Context instead of Zipkin is currently being discussed in the Istio community. Existing OpenCensus support may be sufficient.

Previously, I wrote a Beginner’s Guide to Jaeger + OpenTracing Instrumentation for Go providing guidance on manually instrumenting Go services. 

This is useful for cases where we want fine-grained tracing of specific functions. However, what if all we want is to trace a service’s inbound and outbound calls with little to no additional code?

Service mesh-based instrumentation holds the greatest promise thus far towards this goal:

  • A side-car deployment intercepting inbound and outbound requests is likely what a code-free tracing instrumentation solution looks like.
    • However, the side-car would also need to provide the discovery and routing mechanisms for communicating with other services; precisely what service meshes offer.
  • Many of the best-known service meshes, such as Istio, Consul and Linkerd, provide distributed tracing instrumentation out of the box.

Though service meshes do much of the heavy lifting in terms of instrumentation, they share one requirement that involves some manual coding in order to produce well-formed end-to-end traces: propagating trace context.

Put simply, this is the forwarding of incoming request trace headers to outgoing request headers.

Full working examples with the Istio service mesh, which you can run locally, are available on GitHub. It is worth noting that Istio uses Zipkin headers to propagate trace context, though there are discussions in the Istio community about adopting W3C Trace Context.
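
As a rough illustration of that forwarding step, here is a minimal Go sketch using only the standard library. The header names are the ones Istio relies on (x-request-id plus the Zipkin B3 headers). The names forwardTraceHeaders and demo, the service URLs, and the sample trace ID are assumptions made for illustration and are not taken from the linked examples.

```go
package main

import (
	"fmt"
	"net/http"
)

// traceHeaders lists the headers to forward: Istio's x-request-id plus
// the Zipkin B3 headers.
var traceHeaders = []string{
	"x-request-id",
	"x-b3-traceid",
	"x-b3-spanid",
	"x-b3-parentspanid",
	"x-b3-sampled",
	"x-b3-flags",
}

// forwardTraceHeaders (hypothetical helper) copies every trace header
// that is present on the inbound request onto the outbound request.
// Headers that are not in traceHeaders are left alone.
func forwardTraceHeaders(inbound, outbound *http.Request) {
	for _, name := range traceHeaders {
		if value := inbound.Header.Get(name); value != "" {
			outbound.Header.Set(name, value)
		}
	}
}

// demo builds an inbound request carrying a sample trace ID and an
// unrelated Cookie header, forwards the trace headers to a new outbound
// request, and returns what the outbound request ends up carrying.
func demo() (string, string, error) {
	inbound, err := http.NewRequest(http.MethodGet, "http://service-a/ping", nil)
	if err != nil {
		return "", "", err
	}
	inbound.Header.Set("x-b3-traceid", "463ac35c9f6413ad48485a3953bb6124")
	inbound.Header.Set("Cookie", "session=abc")

	outbound, err := http.NewRequest(http.MethodGet, "http://service-b/ping", nil)
	if err != nil {
		return "", "", err
	}
	forwardTraceHeaders(inbound, outbound)
	return outbound.Header.Get("x-b3-traceid"), outbound.Header.Get("Cookie"), nil
}

func main() {
	traceID, cookie, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("forwarded trace ID: %q, forwarded cookie: %q\n", traceID, cookie)
}
```

Only the listed headers cross over to the outbound request; the Cookie header stays behind, which is the behaviour you want when the goal is to pass along trace context and nothing else.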

To cover two widely adopted languages, I offer one approach implemented in Go and another implemented in Java:

  • A Go example of two services: service-a and service-b. This example demonstrates the use of OpenTracing APIs to propagate trace data in request headers.
  • A Java example of two services: service-a and service-b. This example demonstrates the use of manually forwarding HTTP headers to propagate trace data.

Context Propagation: OpenTracing API

When leveraging the OpenTracing API to propagate tracing context, the main steps required are:

  1. Initialize the tracer with a Zipkin propagator.
  2. Extract trace context from the inbound HTTP headers, ensuring x-request-id is extracted.
  3. Inject trace context into the outbound HTTP headers, ensuring x-request-id is injected.

Initialize Tracer

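import (
	"io"

	opentracing "github.com/opentracing/opentracing-go"
	jaeger "github.com/uber/jaeger-client-go"
	"github.com/uber/jaeger-client-go/zipkin"
)

// Init creates a Jaeger tracer that injects and extracts Zipkin B3
// headers, and registers it as the global OpenTracing tracer.
func Init() (opentracing.Tracer, io.Closer) {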
	zipkinPropagator := zipkin.NewZipkinB3HTTPHeaderPropagator()
	tracer, closer := jaeger.NewTracer(
		...
		jaeger.TracerOptions.Injector(opentracing.HTTPHeaders, zipkinPropagator),
		jaeger.TracerOptions.Extractor(opentracing.HTTPHeaders, zipkinPropagator),
	)
	opentracing.SetGlobalTracer(tracer)
	return tracer, closer
}

Extract Inbound Trace Context

// Extract reads x-request-id and the span context from the inbound
// request's headers.
func Extract(r *http.Request) (string, opentracing.SpanContext, error) {
	requestID := r.Header.Get("x-request-id")
	spanCtx, err := opentracing.GlobalTracer().Extract(
		opentracing.HTTPHeaders,
		opentracing.HTTPHeadersCarrier(r.Header))
	return requestID, spanCtx, err
}

Inject Outbound Trace Context

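// Inject writes x-request-id and the span context into the outbound
// request's headers.
func Inject(spanContext opentracing.SpanContext, request *http.Request, requestID string) error {
	// Set replaces any existing value, so the header is never duplicated.
	request.Header.Set("x-request-id", requestID)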
	return opentracing.GlobalTracer().Inject(
		spanContext,
		opentracing.HTTPHeaders,
		opentracing.HTTPHeadersCarrier(request.Header))
}

Context Propagation: Manually Propagating Headers

When manually forwarding HTTP headers to propagate tracing context, the main steps required are:

  1. Define the Zipkin headers to propagate, along with the Istio-specific x-request-id.
  2. Copy those headers from the inbound HTTP request to the outbound HTTP request.

Define Zipkin Headers to Propagate

static final String[] headersToPropagate = {
	// All applications should propagate x-request-id. This header is
	// included in access log statements and is used for consistent trace
	// sampling and log sampling decisions in Istio.
	"x-request-id",

	// b3 trace headers. Compatible with Zipkin, OpenCensusAgent, and
	// Stackdriver Istio configurations.
	"x-b3-traceid",
	"x-b3-spanid",
	"x-b3-parentspanid",
	"x-b3-sampled",
	"x-b3-flags",
};

Copy Headers

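// Build the outbound request, then copy each trace header that is
// present on the inbound request (held in headers) onto it.
Request.Builder requestBuilder = new Request.Builder().url(url);

for (String header : ServiceA.headersToPropagate) {
	String value = headers.get(header);
	if (value != null) {
		requestBuilder.header(header, value);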
	}
}
Request request = requestBuilder.build();
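
A copy loop like this has to be repeated at every outbound call site. One way to centralize it, sketched here in Go with only the standard library, is to capture the headers once in server middleware and apply them in an http.RoundTripper. This is an illustrative alternative, not the approach used in the linked examples: captureHeaders, propagatingTransport, savedHeadersKey, roundTripFunc and runDemo are hypothetical names, and service-b is replaced by an in-memory stand-in so the sketch runs without a network.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// propagatedHeaders mirrors the header list defined above.
var propagatedHeaders = []string{
	"x-request-id",
	"x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags",
}

type savedHeadersKey struct{} // context key for the captured headers

// captureHeaders is server middleware that stores the inbound trace
// headers in the request context.
func captureHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		saved := http.Header{}
		for _, name := range propagatedHeaders {
			if value := r.Header.Get(name); value != "" {
				saved.Set(name, value)
			}
		}
		ctx := context.WithValue(r.Context(), savedHeadersKey{}, saved)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// propagatingTransport adds the captured headers to outbound requests.
type propagatingTransport struct{ base http.RoundTripper }

func (t propagatingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	saved, ok := req.Context().Value(savedHeadersKey{}).(http.Header)
	if !ok {
		return t.base.RoundTrip(req)
	}
	// A RoundTripper must not modify the caller's request, so clone it.
	out := req.Clone(req.Context())
	for _, name := range propagatedHeaders {
		if value := saved.Get(name); value != "" {
			out.Header.Set(name, value)
		}
	}
	return t.base.RoundTrip(out)
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(req *http.Request) (*http.Response, error) { return f(req) }

// runDemo sends one request through service-a and returns the trace ID
// that the stand-in for service-b received.
func runDemo() (string, error) {
	var seenByServiceB string
	fakeServiceB := roundTripFunc(func(req *http.Request) (*http.Response, error) {
		seenByServiceB = req.Header.Get("x-b3-traceid")
		return &http.Response{StatusCode: http.StatusOK, Header: http.Header{}, Body: http.NoBody, Request: req}, nil
	})
	client := &http.Client{Transport: propagatingTransport{base: fakeServiceB}}

	serviceA := captureHeaders(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Passing r.Context() is what carries the captured headers along.
		out, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "http://service-b/ping", nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := client.Do(out)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		resp.Body.Close()
	}))

	inbound := httptest.NewRequest(http.MethodGet, "/", nil)
	inbound.Header.Set("x-b3-traceid", "463ac35c9f6413ad48485a3953bb6124")
	recorder := httptest.NewRecorder()
	serviceA.ServeHTTP(recorder, inbound)
	if recorder.Code != http.StatusOK {
		return "", fmt.Errorf("service-a returned status %d", recorder.Code)
	}
	return seenByServiceB, nil
}

func main() {
	fmt.Println(runDemo())
}
```

The trade-off is that propagation now depends on every outbound request being built with the inbound request’s context. A call that uses a fresh context silently drops the trace headers, which is less transparent than an explicit copy loop.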

Context Propagation: OpenTelemetry

OpenTelemetry is a fast-moving project, having reached v1 in its specification just a few months ago. The per-language SDKs are all striving towards the v1 milestone, with some, such as opentelemetry-java-instrumentation, already crossing this line.

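At the time of writing, leveraging OpenTelemetry’s Go SDK to propagate B3 trace headers is not possible: positions held in the OpenTelemetry specification prevent its B3 propagators from correctly propagating the context for this use case.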

Conclusion

We have seen how service mesh solutions offer a less-involved option for instrumenting services, though they still require some coding effort to propagate context. Two approaches to context propagation were covered:

  • Leveraging OpenTracing APIs
  • Manual HTTP header forwarding

Leveraging OpenTracing’s APIs has the benefit of not needing to know which headers to copy, deferring it to the Zipkin HTTP header propagator. 

We also explored using OpenTelemetry’s Go SDK. However, there are some fundamental positions held in the OpenTelemetry specification that prevent its B3 propagators from correctly propagating the context for our use case.

In the end, however, manually copying HTTP headers is more transparent and arguably simpler than leveraging OpenTracing.
