Zen and the Art of Kubernetes Monitoring

By: Matt Hines

October 20, 2022


The way to solve the conflict between human values and technological needs is not to run away from technology. That’s impossible. The way to resolve the conflict is to break down the barriers of dualistic thought that prevent a real understanding of what technology is – not an exploitation of nature, but a fusion of nature and the human spirit into a new kind of creation that transcends both.
Robert Pirsig, “Zen and the Art of Motorcycle Maintenance”

The real beauty of the modern, cloud-fueled, DevOps-driven world we are living in is that it's so highly composable. In so many ways, we've been freed from the limitations and structures of earlier eras of software and technology to build things however we choose.

At the same time, monitoring these systems, specifically cloud applications and infrastructure, requires some common, centralized means of visualization and analysis.

That's not to say that everyone needs to use the same monitoring tools; clearly there are countless approaches and products built for the job. But at this point in the cloud revolution, we've landed on a set of best practices that most people recognize as the standard for what monitoring needs to cover.

For fellow followers of Robert Pirsig, the writer and philosopher best known for his seminal work "Zen and the Art of Motorcycle Maintenance" (at least those who, like me, actually ride motorbikes), his notions of consistent scrutineering and hands-on troubleshooting of one's machine serve as a useful analogy.

Admittedly, I'll never be one to rebuild my own clutch, as Pirsig implies a rider should. But at the very least I have to do the work of regularly checking the air pressure in my tires and adjusting and lubricating the chain, along with periodically changing the oil and brake pads, if I'm to assume that the bike is in fit running condition. Without keeping a close eye on all of these variables, there's no way I can reasonably expect it to be reliable.

For today's cloud practitioners, the obvious analogue to these mechanical routines is the centralized monitoring and analysis of logs, metrics and traces, also known as "full stack" observability. To ensure that they have all the right parts and information to do this work, practitioners also need support for universal data collection, in particular from popular open source telemetry tools and standards such as Fluentd, Prometheus and OpenTelemetry.

All of this should be integrated and orchestrated to foster the unified observability practices that today's organizations need to break down traditional monitoring silos and gain a top-down understanding of the entire software machine. Teams increasingly require a unified view of these key indicators if they expect optimal cloud performance.

The Consistent Challenge of Kubernetes Monitoring

Among all the critical data streams that today's cloud monitoring experts need to keep tabs on, few matter more than those that reveal how well their Kubernetes container orchestration systems are performing. A recent conversation with a leading observability analyst at Gartner reaffirmed that they are seeing ever-increasing use of the technology, which has been a prime-time cloud enabler for over five years.

But to put it lightly, while Kubernetes is a tremendous enabler of application innovation and flexibility, it is not altogether easy to use. It's no surprise to anyone in the DevOps world that today's leading challenges in running and monitoring Kubernetes come down to scaling the associated observability tooling and making sense of the huge volumes of telemetry data these systems throw off.

Looking back at our annual DevOps Pulse survey (the all-new iteration is now open, so please participate in this year's study), monitoring Kubernetes and microservices remains among teams' most difficult practices. In last year's survey, over 52% of respondents cited Kubernetes, microservices and serverless among their primary challenges in attaining effective observability.

It's not a major leap to infer that a significant contributor to this issue is the sheer number of tools being used to monitor these systems. Another finding of the 2022 DevOps Pulse Report was persistent observability tool sprawl: 90% of respondents use multiple tools, with 66% using two to four observability systems and roughly 24% employing anywhere from five to ten. By comparison, in 2020 roughly 20% were using only one tool, and only 10% were using more than five.

According to the experts and our own research, day-to-day Kubernetes monitoring also poses numerous challenges. When asked to highlight their greatest obstacles when running Kubernetes in production, DevOps Pulse respondents cited a litany of related issues including security (34%), monitoring and troubleshooting (31%), networking (30%) and cluster management (27%), among others.

A Call for High Quality Kubernetes Observability

Beyond encouraging riders to maintain their own bikes, at least to the limits of their abilities, Pirsig is widely known for his thinking on the nature of quality. Judging by widely held insights and the research above, many teams are still seeking the most effective and efficient approach to monitoring Kubernetes.

It should come as no surprise that, as an observability vendor focused on extending the power of open source (Kubernetes is an open source technology, after all), we here at Logz.io feel that organizations should demand better tooling for this task: bringing together logs, metrics and traces in a unified manner to monitor Kubernetes in a truly optimized fashion.

At the upcoming KubeCon North America Conference, we’ll be excited to tell you more about the unique ways that we are bringing massively improved quality, consistency and efficiency to the practice of observing Kubernetes.

To see for yourself, visit us at Booth S1, and be sure to catch our CTO Jonah Kowall and Principal Developer Advocate Dotan Horovits, who will be speaking at the event.

And don’t forget to check your tire pressure before making the drive; the ride’s a lot smoother if you’re willing to look!