A Practical Guide to Kubernetes Logging
Kubernetes has become the de-facto industry standard for container orchestration. It provides the required abstraction for efficiently managing large-scale containerized applications with declarative configurations, an easy deployment mechanism, and both scaling and self-healing capabilities.
As with any system, logs give engineers observability into containers and the Kubernetes clusters they run on, and their key role is evident in many incidents involving Kubernetes failures. Yet Kubernetes poses a set of unique logging challenges.
Kubernetes is a highly distributed and dynamic environment. In production, you’ll most likely be running dozens of machines with hundreds of containers that can be terminated, restarted, or rescheduled at any point in time. This transient and dynamic nature of the system is a challenge in itself.
Kubernetes clusters also consist of multiple layers that need monitoring, each producing different types of logs.
Worried? Don’t be. Thankfully, there is a lot of literature available on how to gain visibility into Kubernetes. There are also various logging tools that integrate natively with Kubernetes to make the task easier. In this article, we’ll review some of these tools, as well as the Kubernetes logging architecture.
A Simple Example: Containerized application logging with Kubelet
Logging to the stdout and stderr standard streams
The first layer of logs that can be collected from a Kubernetes cluster are those being generated by your containerized applications. The best practice is to write your application logs to the standard output (stdout) and standard error (stderr) streams. You shouldn’t worry about losing these logs, as kubelet, Kubernetes’ node agent, will collect these streams and write them to a local file behind the scenes, so you can access them with Kubernetes.
Let’s take a look at an example pod manifest that runs a single container logging to stdout:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: example
      image: busybox
      args: [/bin/sh, -c, 'while true; do echo $(date); sleep 1; done']
To apply the manifest, run:
kubectl apply -f example.yaml
To take a look at the logs for this container:
kubectl logs example
The command calls the kubelet service on that node to retrieve the logs. As you can see, the logs are collected and presented with Kubernetes. This is done for each container in a pod, across your cluster. Using kubectl for retrieving logs saves you from needing to access individual nodes in the cluster.
By default, kubectl shows the logs of a single pod at a time. If you need to aggregate many pods into a single stream, you can use the kubetail command, a label selector (as shown below), or the higher-level log aggregation and management tools that we will discuss later in this article.
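If your pods share a common label, kubectl itself can stream logs from all of them at once. Here’s a minimal sketch, assuming a hypothetical app=example label on the pods you want to follow:
kubectl logs -l app=example --all-containers=true --prefix=true -f --max-log-requests=10
The --prefix flag adds the pod and container name to each line, which makes it easier to tell the interleaved streams apart.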
Using a sidecar for logging
If your application does not output to stdout and stderr, you can deploy a sidecar container alongside your application that picks up the application logs and streams them to its own stdout and stderr.
The sidecar pattern also enables some log manipulation, such as aggregating several log streams on the node into one, or splitting one application log stream into several logical streams (each handled by a dedicated sidecar instance).
For persisting container logs, the common approach is to have the application write its logs to a file and then use a sidecar container to stream them:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: example
      image: busybox
      args:
        - /bin/sh
        - -c
        - >
          while true;
          do
            echo "$(date)" >> /var/log/example.log;
            sleep 1;
          done
      volumeMounts:
        - name: varlog
          mountPath: /var/log
    - name: sidecar
      image: busybox
      args: [/bin/sh, -c, 'tail -f /var/log/example.log']
      volumeMounts:
        - name: varlog
          mountPath: /var/log
  volumes:
    - name: varlog
      emptyDir: {}
As seen in the pod configuration above, the sidecar container runs in the same pod as the application container, mounting the same volume and processing the logs separately.
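To read each stream separately, you can point kubectl logs at a specific container in the pod with the -c flag; for example, to see what the sidecar is echoing back to stdout:
kubectl logs example -c sidecar
The application container’s own output can be read the same way with -c example.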
Kubernetes logging architecture
As mentioned, one of the main challenges of logging Kubernetes is understanding which logs are generated and how to use them. In the following sections, I will look at logging at the node level and at the cluster level.
Kubernetes Node logging
When a container running on Kubernetes writes its logs to the stdout or stderr streams, they are picked up by the kubelet service running on that node and handled by the container engine according to the logging driver configured in Kubernetes. Note that you can view these logs with the kubectl logs command (see the command list at the end of this post).
In most cases, Docker container logs will end up in the /var/log/containers directory on your host. Docker supports multiple logging drivers but, unfortunately, the Kubernetes API does not support driver configuration.
Once a container terminates or restarts, kubelet keeps its logs on the node. To prevent these files from consuming all of the host’s storage, a log rotation mechanism should be set up on the node.
Kubernetes doesn’t provide built-in log rotation, but this functionality is available in many tools, such as Docker’s log-opt settings, standard file shippers, or even a simple custom cron job. When a container is evicted from the node, so are its corresponding log files.
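If you’re using Docker’s default json-file logging driver, rotation can be configured on each node via /etc/docker/daemon.json. The values below are illustrative; restart the Docker daemon after changing them:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}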
Depending on the operating system and additional services running on your host machine, you may need to look at additional logs. For example, on Linux, journald logs can be retrieved using the journalctl command:
$ journalctl -u docker
-- Logs begin at Wed 2019-05-29 10:59:24 CEST, end at Mon 2019-07-15 10:55:17 CEST. --
jul 29 10:59:35 thinkpad systemd[1]: Starting Docker Application Container Engine...
jul 29 10:59:35 thinkpad dockerd[2172]: time="2019-05-29T10:59:35.285765854+02:00" level=info msg="libcontainerd: started new docker-containerd process" p
jul 29 10:59:35 thinkpad dockerd[2172]: time="2019-05-29T10:59:35.286021587+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
As you can see in the above example, the Docker container runtime writes its logs to journald. Other important Kubernetes system processes at the node level are kubelet, which also logs to journald, and kube-proxy, the network proxy that runs on each node, which logs to the /var/log directory.
Logging kernel events might also be required in some scenarios. For example, you might use the Unix dmesg command to print the kernel’s message buffer when debugging device driver issues:
$ dmesg
[ 0.000000] microcode: microcode updated early to revision 0xb4, date = 2019-04-01
[ 0.000000] Linux version 4.15.0-54-generic (buildd@lgw01-amd64-014) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 (Ubuntu 4.15.0-54.58-generic 4.15.18)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-54-generic root=UUID=6e228d30-6415-4b41-b992-172d6899693e ro quiet splash vt.handoff=1
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
New in Kubernetes 1.27: Query node logs using the kubelet API
Kubernetes 1.27 introduced a new feature called Node log query that allows viewing logs of services running on the node. This is an alpha feature, gated behind a feature gate. To use it, you need to enable the NodeLogQuery feature gate for that node, and set the kubelet configuration options enableSystemLogHandler and enableSystemLogQuery to true.
This feature essentially provides a shim that shells out to journald on Linux nodes, assuming the service logs are available via journald and that journalctl is installed. On Windows nodes, it assumes that service logs are available in the application log provider (the Get-WinEvent cmdlet). On both operating systems, logs are also available by reading files within /var/log/.
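Putting the pieces together, here is a minimal kubelet configuration sketch for enabling the feature, based on the field names in the Kubernetes 1.27 documentation; adjust it to your own kubelet config file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  NodeLogQuery: true       # alpha feature gate, must be enabled explicitly
enableSystemLogHandler: true
enableSystemLogQuery: true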
The Node log query feature enables querying logs with the kubectl get --raw command. For example, here’s how to fetch kubelet logs from a node named node-1.example:
kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet"
Here’s how to fetch a log from a specific file in the /var/log folder of a given node:
kubectl get --raw "/api/v1/nodes/<insert-node-name-here>/proxy/logs/?query=/<insert-log-file-name-here>"
You can also use filters to compose more complex queries. For example, here’s how to fetch kubelet logs from node-1.example that contain the word “error”:
kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet&pattern=error"
Note that this is an alpha feature; follow the documentation for details and updates.
Kubernetes system components logging
In addition to the kubelet and kube-proxy node services we covered earlier, there are control plane components at the level of the Kubernetes cluster itself that can be logged, as well as additional data types that can be used (events, audit logs). Together, these different types of data can give you visibility into how Kubernetes is performing as a system.
The following are the main system components of the Kubernetes control plane:
- kube-apiserver – the API server serving as the access point to the cluster
- kube-scheduler – the element that determines where to run containers
- etcd – the key-value store used as Kubernetes’ cluster configuration storage
Some of these components run in a container, and some of them run at the operating system level (in most cases, as a systemd service).
The systemd services write to journald, and components running in containers write logs to the /var/log directory, unless the container engine has been configured to stream logs differently.
Kubernetes’ system components use the Kubernetes logging library, klog, to generate their log messages. Historically, these system logs did not follow a uniform structure, which made them difficult to parse, query, and analyze. However, the Kubernetes v1.19 release introduced a new klog option for structured logging, in plain text as well as in JSON format.
Structured logging provides a well-defined structure in the native klog text format, with a list of key-value pairs for the variable part of the payload. Using the --logging-format=json flag enables JSON output.
It’s important to note that structured logging (both the text and JSON options) is still in alpha as of v1.19 and is being adopted incrementally, so you may encounter early-stage issues such as system logs that are still unstructured, changes to log formatting, or klog flags that aren’t supported with JSON output. Check the documentation for updated feature status and information here.
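To give a sense of the difference, a JSON-formatted klog message looks roughly like this (an illustrative example; field names beyond ts, v, and msg depend on the component emitting the log):
{"ts":1580306777.04728,"v":4,"msg":"Pod status updated","pod":{"name":"nginx-1","namespace":"default"},"status":"ready"}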
Kubernetes events
Kubernetes events record resource state changes and errors, such as exceeded resource quotas or pending pods, as well as informational messages.
The command kubectl get events -n <namespace> returns all events within a specific namespace:
LAST SEEN TYPE REASON OBJECT MESSAGE
4m22s Normal ExternalProvisioning persistentvolumeclaim/mysql-pv-claim waiting for a volume to be created, either by external provisioner "docker.io/hostpath" or manually created by system administrator
4m22s Normal Provisioning persistentvolumeclaim/mysql-pv-claim External provisioner is provisioning volume for claim "default/mysql-pv-claim"
4m22s Normal ProvisioningSucceeded persistentvolumeclaim/mysql-pv-claim Successfully provisioned volume pvc-b5419197-f122-4263-9c78-e9fb457db630
4m22s Warning FailedScheduling pod/wordpress-57b89f8b5b-gt6bv pod has unbound immediate PersistentVolumeClaims
4m20s Normal Scheduled pod/wordpress-57b89f8b5b-gt6bv Successfully assigned default/wordpress-57b89f8b5b-gt6bv to docker-desktop
4m18s Normal Pulled pod/wordpress-57b89f8b5b-gt6bv Container image "wordpress:4.8-apache" already present on machine
4m18s Normal Created pod/wordpress-57b89f8b5b-gt6bv Created container wordpress
4m18s Normal Started pod/wordpress-57b89f8b5b-gt6bv Started container wordpress
4m22s Normal SuccessfulCreate replicaset/wordpress-57b89f8b5b Created pod: wordpress-57b89f8b5b-gt6bv
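You can also narrow the output down with a field selector, for example to see only warnings related to a specific pod (pod name taken from the output above):
kubectl get events -n default --field-selector involvedObject.name=wordpress-57b89f8b5b-gt6bv,type=Warning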
Using kubectl describe pod <pod-name> provides a lot of useful information about the pod, including a section listing the latest events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m44s default-scheduler persistentvolumeclaim "mysql-pv-claim" not found
Warning FailedScheduling 9m44s (x2 over 9m44s) default-scheduler pod has unbound immediate PersistentVolumeClaims
Normal Scheduled 9m42s default-scheduler Successfully assigned default/wordpress-mysql-694777bb76-tqn55 to docker-desktop
Normal Pulled 9m40s kubelet, docker-desktop Container image "mysql:5.6" already present on machine
Normal Created 9m40s kubelet, docker-desktop Created container mysql
Normal Started 9m40s kubelet, docker-desktop Started container mysql
Kubernetes audit logs
Audit logs can be useful for compliance, as they should help you answer the questions of what happened, who did it, and when.
Kubernetes provides flexible auditing of kube-apiserver requests based on policies. These help you track all activities in chronological order.
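Auditing is driven by a policy file passed to kube-apiserver. As a sketch, the following minimal policy records every request at the Metadata level (request metadata only, no request or response bodies); the flag paths shown in the comments are illustrative:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata   # log who did what and when, but not payloads

# kube-apiserver flags (illustrative paths):
#   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
#   --audit-log-path=/var/log/kubernetes/audit.log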
Here is an example of an audit log:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1beta1",
  "metadata": {
    "creationTimestamp": "2019-08-22T12:00:00Z"
  },
  "level": "Metadata",
  "timestamp": "2019-08-22T12:00:00Z",
  "auditID": "23bc44ds-2452-242g-fsf2-4242fe3ggfes",
  "stage": "RequestReceived",
  "requestURI": "/api/v1/namespaces/default/persistentvolumeclaims",
  "verb": "list",
  "user": {
    "username": "user@example.org",
    "groups": [
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "172.12.56.1"
  ],
  "objectRef": {
    "resource": "persistentvolumeclaims",
    "namespace": "default",
    "apiVersion": "v1"
  },
  "requestReceivedTimestamp": "2019-08-22T12:00:00Z",
  "stageTimestamp": "2019-08-22T12:00:00Z"
}
For more information on monitoring Kubernetes logs for anomalies, as well as for threat detection, check out this post.
Kubernetes logging tools
Hopefully, you’ve now got a better understanding of the different logging layers and log types available in Kubernetes. The logging tools reviewed in this section play an important role in putting all of this together to build a Kubernetes logging pipeline.
Kubernetes doesn’t provide log aggregation of its own. However, the Kubernetes release contains optional logging agents for Elasticsearch and for Stackdriver Logging (for use with Google Cloud Platform), with Fluentd as the node agent. In the following sections, I’ll look into each of them.
The general architecture for cluster log aggregation is to run a local agent (such as Fluentd or Filebeat, discussed below) that gathers the data and sends it to a central log management system. The agent is usually deployed per node as a DaemonSet (see the sketch below) to collect all the logs on that node, though it can also be deployed per pod for finer granularity. The agent can also perform some filtering and manipulation of the logs before sending them, to improve log ingestion and analysis or to reduce log volume. I highly recommend adding metadata from the node (which is accessible to the local logging agent), such as the pod name, cluster ID, and region, as it greatly helps with analysis and troubleshooting.
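As a rough sketch of this pattern, here is what a per-node agent DaemonSet can look like; the image name is a placeholder for whichever shipper you use, and the downward API is used to pass node metadata to the agent:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: logging
spec:
  selector:
    matchLabels:
      name: log-agent
  template:
    metadata:
      labels:
        name: log-agent
    spec:
      containers:
        - name: log-agent
          image: example/log-agent:latest   # placeholder image
          env:
            - name: NODE_NAME               # expose node metadata for enrichment
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log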
Fluentd
Fluentd is a popular open-source log aggregator that allows you to collect various logs from your Kubernetes cluster, process them, and then ship them to a data storage backend of your choice.
Kubernetes-native, Fluentd integrates seamlessly with Kubernetes deployments. The most common method for deploying Fluentd is as a DaemonSet, which ensures a Fluentd pod runs on each node. Like other log forwarders and aggregators, Fluentd appends useful metadata fields to logs, such as the pod name and Kubernetes namespace, which helps provide more context.
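As an illustration of what such a deployment configures, here is a trimmed-down Fluentd configuration that tails container log files, enriches them with Kubernetes metadata, and ships them to Elasticsearch. It assumes the fluent-plugin-kubernetes_metadata_filter and fluent-plugin-elasticsearch plugins are installed (the official fluentd-kubernetes-daemonset images bundle them) and that the runtime writes JSON-formatted log files; the host and port values are illustrative:
# Tail container log files written on the node
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

# Enrich records with pod, namespace and label metadata
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

# Ship to the storage backend (Elasticsearch in this example)
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  logstash_format true
</match>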
ELK Stack
The ELK Stack (Elasticsearch, Logstash and Kibana) is another very popular open-source tool used for logging Kubernetes, and actually comprises four components:
- Elasticsearch – provides a scalable, RESTful search and analytics engine for storing Kubernetes logs
- Kibana – the visualization layer, providing a user interface for querying and visualizing logs
- Logstash – the log aggregator used to collect and process the logs before sending them into Elasticsearch
- Beats – Filebeat and Metricbeat are ELK-native lightweight data shippers used for shipping log files and metrics into Elasticsearch
ELK can be deployed on Kubernetes as well, on-prem or in the cloud. While Beats is Elasticsearch’s native shipper, a common alternative for Kubernetes installations is to use Fluentd to send logs to Elasticsearch (sometimes referred to as the EFK stack).
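If you go the Beats route, a minimal Filebeat configuration for Kubernetes looks roughly like this (a sketch based on Filebeat’s container input and the add_kubernetes_metadata processor; the NODE_NAME variable is assumed to be injected via the downward API, and the Elasticsearch host is illustrative):
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log

processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"

output.elasticsearch:
  hosts: ["elasticsearch.logging.svc:9200"]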
Together, these components provide Kubernetes users with an end-to-end logging solution. As effective as it is, deploying and managing ELK at scale is a challenge unto itself.
The ELK Stack grew into an open source success story. Unfortunately, Elastic – the company that launched and maintains the ELK Stack – decided to close-source it in early 2021 by switching the license from Apache 2.0 to proprietary licensing.
In an effort to keep these popular projects open source, Amazon teamed up with Logz.io and other industry leaders and forked the open source Elasticsearch and Kibana, creating the OpenSearch and OpenSearch Dashboards projects, respectively, under the Apache 2.0 license. There is no OpenSearch equivalent of Logstash because it is largely obsolete in modern implementations – it can be replaced by log collectors like Fluentd or Fluent Bit, which have powerful log processing capabilities.
OpenSearch and OpenSearch Dashboards
OpenSearch and OpenSearch Dashboards are fast-growing open source projects that fill the original role of the ELK Stack: a truly open source log management stack for the engineering community. Both are licensed under the open source Apache 2.0 license.
Since these projects were forked from Elasticsearch and Kibana, OpenSearch is still relatively similar to the ELK components, while offering a rich roadmap beyond that. Among other things, OpenSearch and OpenSearch Dashboards provide some key features that are only available in the paid versions of Elasticsearch and Kibana, including:
- AI/ML to automatically highlight trends and anomalies in the data worthy of human attention, like data spikes indicating a potential production issue.
- A full suite of security features, including encryption, authentication, access control, and audit logging and compliance.
- User access control to define user roles across organizations and limit access to sensitive data.
Check out our OpenSearch guide to learn more about OpenSearch features, capabilities, and differences from Elasticsearch.
Logz.io
Getting started with open source logging and observability tools is generally straightforward with small Kubernetes clusters.
However, as more services are added and telemetry data volumes grow, you may find yourself overspending resources on maintaining your open source logging data infrastructure. Plus, you may need to evolve your observability strategy to unify and correlate different telemetry types on a single platform.
This is why we built Logz.io, which unifies the leading open source Kubernetes monitoring tools on a SaaS observability platform – including OpenSearch for logs, Prometheus (using an M3DB back end) for metrics, and OpenTelemetry and Jaeger for traces.
Logz.io builds additional capabilities on top of OpenSearch, such as our own ML to surface critical log patterns and exceptions, alerting, RBAC, and cost-reducing capabilities like data filtering and cold data storage.
If you’re looking for an enhanced open source logging and observability experience for Kubernetes, try Logz.io’s free trial.
Google Stackdriver
Stackdriver is another Kubernetes-native logging tool that provides users with a centralized logging solution. If you’re using GKE, Stackdriver can be easily enabled using the following command:
gcloud container clusters create [CLUSTER_NAME] \
    --zone [ZONE] \
    --project [PROJECT_ID] \
    --enable-stackdriver-kubernetes \
    --cluster-version=latest
For more information on using Stackdriver to log Kubernetes, check out Logging Using Stackdriver.
Endnotes
Once a cluster is up and running with logging in place, you can make sure your workloads and the underlying infrastructure stay healthy. Logging also helps you prepare for issues that may arise during the deployment of a new production release, and stop them before they affect the customer experience.
Kubernetes’ kubectl and kubetail commands provide a useful manual way to inspect logs, but monitoring clusters in production calls for a cluster-wide log aggregation and analysis tool such as the ELK Stack. In production, it’s recommended to keep your logs separate from the Kubernetes cluster running your monitored application, so that your logs remain accessible for troubleshooting even (and especially) during cluster outages and other issues.
It takes time to implement production-ready logging for your services, as well as to set up alerts and tune them appropriately. However, an effective logging solution allows you to focus on monitoring your key business metrics, which, in turn, increases the reliability of your products and your company’s revenue.
To learn more, contact us or visit our blog.
kubectl logs and other useful kubectl commands
Some useful kubectl commands are listed below:
kubectl logs <pod-name> -f # stream logs
kubectl logs <pod-name> --since=1h # return logs newer than a relative duration
kubectl logs <pod-name> --since-time=2020-08-13T10:46:00.000000000Z # return logs after a specific date (RFC3339)
kubectl logs <pod-name> --previous # print the logs for the previous instance of the container
kubectl logs <pod-name> -c <container-name> # print the logs of a specific container in the pod
kubectl logs -l <label-key>=<label-value> --all-containers # print logs from all containers in pods matching the label
kubectl get events --sort-by='.metadata.creationTimestamp' # print all events in chronological order
kubectl describe pod <pod-name> # print pod details like status or recent events
You can find more information on these and other commands in the reference documentation here.