Distributed Tracing Tools and New Industry Standards

The Rise of New Distributed Tracing Tool Standards

Metrics and logs have been around for a long time, yet we haven’t adopted common standards for them. Sure, there have been attempts on the metric side with OpenMetrics. Similarly, tracing only got a standardization effort with OpenTracing just a few years ago. There was no unified effort to standardize all observability signals until OpenTelemetry began a little less than two years ago. And there has been a need. Open source distributed tracing tools have proliferated of late: first the pioneer Zipkin, then Jaeger, and most recently Skywalking.

A representative of each project from the latter trio joined our CTO Jonah Kowall at OpenObservability to cover a few pressing subtopics on traces. Jose Carlos Chavez (Expedia) represented Zipkin; Wu Sheng (Tetrate) came as the founder of Apache Skywalking; and Yuri Shkuro (Uber) came as the creator and project lead for Jaeger.

They spoke extensively about the challenges of charting tracing’s path through the industry: the evolving landscape, which began with two distinct solutions, OpenTracing and OpenCensus, that are now being unified under OpenTelemetry; the road taken so far; and the next steps for the project (and the space).

OpenTelemetry and Standardization

The discussion started with a focus on Google’s OpenCensus and the Lightstep-associated OpenTracing. Google open-sourced the former years ago, while the CNCF has taken over stewardship of the latter.

Chavez thought that OpenTracing and OpenCensus were important initiatives, but ultimately stepping stones to OpenTelemetry.

“My perception is that OpenTelemetry is trying to restart the grounds around observability. It’s trying to do in lots of new abstractions,” Chavez declared. “Sometimes it could feel ambitious…(with)…so many wheels turning at the same time.”

“I think it’s not trying to build bridges or unify paths. It’s more like…resetting the whole thing so it’s going to be interesting. Still there are other alternatives around life is not like OpenTelemetry or nothing so it would be interesting to see what happens,” Chavez added.

Distributed Tracing Tools & Instrumentation

Jaeger, inspired by Zipkin and Dapper, was developed and later open-sourced by Uber. Shkuro, who is also a co-founder of OpenTracing and OpenTelemetry, noted the similarity between the two big projects. He also described the indecisive state of working with tracing technology while simultaneously trying to satisfy both standards.

[Figure: System architecture of the distributed tracing tool Jaeger]

Bringing the two standards together, as best developers could, would therefore suit the users of distributed tracing tools.

“You don’t know which one will win and they took slightly different approaches to how they were trying to solve the problem. As we said, OpenTelemetry took kind of the best parts of both of those approaches,” Shkuro said.

That way, “you have both 1) a very abstract API that is vendor neutral that you can implement any way you want.” That part comes from OpenTracing. But, at the same time “2) it comes with a very standard SDK so that you don’t have to re-implement the thing.” The lack of such an SDK was the problem with OpenTracing.
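The split Shkuro describes can be pictured with a minimal Python sketch. Everything here is hypothetical and for illustration only: `Tracer`, `NoopTracer`, `RecordingTracer`, and `handle_request` are made-up names, not the OpenTelemetry API. The point is only that instrumented code depends on an abstract interface, while a ready-made implementation can be plugged in without rewriting that code.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
import time


class Tracer(ABC):
    """Hypothetical vendor-neutral API: instrumented code depends only on this."""

    @abstractmethod
    def start_span(self, name: str):
        """Return a context manager that wraps one unit of work."""


class NoopTracer(Tracer):
    """API-only default: the instrumentation runs but nothing is recorded."""

    @contextmanager
    def start_span(self, name: str):
        yield


class RecordingTracer(Tracer):
    """Stand-in for a standard SDK: a ready-made implementation of the API."""

    def __init__(self):
        self.finished = []  # list of (span name, duration in seconds)

    @contextmanager
    def start_span(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span even if the wrapped work raised an exception.
            self.finished.append((name, time.perf_counter() - start))


def handle_request(tracer: Tracer, user_id: int) -> str:
    # Application or library code is written once, against the abstract API.
    with tracer.start_span("handle_request"):
        return f"hello user {user_id}"
```

Calling `handle_request(NoopTracer(), 7)` and `handle_request(RecordingTracer(), 7)` returns the same string; only the second leaves a recorded span behind. Under the assumptions of this sketch, an API with no shipped implementation leaves every vendor to write its own `RecordingTracer`, which is the re-implementation burden the quote refers to.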

OpenCensus + OpenTracing = OpenTelemetry’s New Standards

By standardizing data and propagation formats, OpenTelemetry’s SDK also lays the groundwork for standardizing instrumentation, something Shkuro sees as part of the “original promise” of OpenTracing.
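To make "propagation format" concrete, here is a small sketch of one such format, the W3C Trace Context `traceparent` header. The panel did not name a specific format, so the choice of this one is an assumption made for illustration. The helper names (`SpanContext`, `inject`, `extract`, `child_of`) are hypothetical, and the parser handles only the version `00` layout.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: dict) -> None:
    """Write the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: dict) -> Optional[SpanContext]:
    """Read the context from incoming headers, or return None if absent or invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are defined as invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 1))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a new span in the same trace: same trace id, fresh span id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

When every service agrees on a header like this, a trace can cross process boundaries regardless of which tracing backend each service reports to, since the receiving side only needs `extract` to continue the same trace.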

At that point, fully standardized telemetry will let developers refocus on more complex tools instead of, as Shkuro puts it, “instrumenting a particular framework.”

Skywalking’s approach is unique, as it primarily uses automated instrumentation via agents to collect traces, making it more similar to APM solutions. But Skywalking’s Wu Sheng, who knows the benefits of standardization, also sees some benefit to tracing if standardization takes a while to arrive.

“From my perspective a project like this may not provide a consistent API that covers all of the [possible] cases, but they provide a very good case to educate people,” Wu Sheng told us. It will also, though inconvenient, force people to become familiar with multiple options.

“If you have some kind of specification and implementation, [they] could say ‘No, we don’t have that. Please use this one because this is our implementation.’”

Distributed Tracing Tools’ Approaches to Agents, Libraries and APIs

Chavez made a similar point from his perspective at Zipkin. “We truly believe that provides learning for users, which is also something very important for us. When you do instrument your app, although it’s quite time-consuming, you get to know what you are doing right.”

Zipkin doesn’t use agents, although auto-instrumentation agents are available for many languages. The project instruments directly through libraries, largely because it is an all-volunteer project. “Starting an engine, at this point,” when the libraries are already well built-up, “would be very time-consuming.”

[Figure: System architecture of the distributed tracing tool Zipkin]

Shkuro felt agents would help adoption as people continue to familiarize themselves with tracing in general.

“Personally, I feel that the agent approach is definitely very powerful, especially as a way to get people in very quickly because…it opens up a path for people to quickly get up and running with distributed tracing or telemetry.”

“That is, unless you learn a very special language like Go or C where you can’t really do agents (which was the case at Uber so we couldn’t even go with that approach),” he added, pointing to how hard agents are to build for those two languages compared with languages like Java. “However, with more JVM or like VM-based languages, it’s much easier to do agents.”

OpenTelemetry is developing its own agents which will be both compliant with their new standards and vendor-neutral. They are also creating a collector which standardizes traces, metrics, and logs. These are all API-driven, allowing custom instrumentation if agents can’t do things your traces require, Shkuro explained.

“If it emits a new metric that is not part of the agent, then you have that same standard API in your hands. So, it kind of opens up a path for power developers to go deeper than just a standard instrumentation.”
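A rough Python sketch can show how agent-style and manual instrumentation might coexist. Nothing here is the OpenTelemetry agent or API: `auto_instrument`, `record_span`, `add_metric`, and `PaymentService` are hypothetical names invented for this example. The "agent" wraps an existing method at runtime without touching its source, while the application code uses the same shared functions to emit a custom metric the agent knows nothing about.

```python
import functools
import time
from collections import Counter

spans = []           # (name, seconds) pairs recorded by the hypothetical agent
metrics = Counter()  # counters recorded through the hypothetical shared API


def record_span(name, seconds):
    spans.append((name, seconds))


def add_metric(name, value=1):
    metrics[name] += value


def auto_instrument(cls, method_name):
    """Agent-style: wrap an existing method at runtime; the class source is untouched."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            record_span(f"{cls.__name__}.{method_name}", time.perf_counter() - start)

    setattr(cls, method_name, wrapper)


class PaymentService:
    def pay(self, amount):
        # Manual instrumentation: a business metric no generic agent would emit.
        if amount > 100:
            add_metric("large_payments")
        return f"paid {amount}"


# The "agent" attaches itself after the fact, without editing PaymentService.
auto_instrument(PaymentService, "pay")
```

After `PaymentService().pay(150)`, `spans` holds one entry named `PaymentService.pay` and `metrics["large_payments"]` is 1. Runtime patching of this kind is straightforward in a dynamic language like Python, which loosely mirrors Shkuro's remark that agents are easier in VM-based languages than in Go or C.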

Conclusion

Agreeing on tracing standards will involve juggling differences on APIs, instrumentation, and the like. There is also no telling if newer distributed tracing tools are in the offing. But the community is certainly thriving, as evidenced not only by the gathering here but also by all the work each project has accomplished.

To watch the entire discussion, see the video below:

 
