Many enterprises have already adopted Kubernetes or have a Kubernetes migration plan in place, making it clear that the platform is here to stay. While it provides a lot of benefits to its users, to take advantage of them, you need to thoroughly learn Kubernetes and how it works in production. Typically, the most difficult aspects of Kubernetes are learned through experience solving real-world problems. This post will focus on providing resources to help you do exactly that, while also explaining some of the core concepts behind Kubernetes.
1. Cluster Administration
If you have to start somewhere, learn Kubernetes cluster concepts. The major cloud providers (Google, AWS, Azure, etc.) all provide Kubernetes clusters as a service. This is a great place to get started. With just a couple of clicks in their cloud consoles, you can create a cluster.
Furthermore, the cloud providers manage the cluster, performing routine tasks like certificate rotation and version patching. If, however, you have to set up your own Kubernetes clusters, you’ll need to manage everything yourself, making the situation a bit more complex.
Described below are a few of the scenarios you might come across when managing clusters on your own, as well as some resources that can help you become more proficient at this process.
Bootstrap Kubernetes from Scratch
While you can easily use a Kubernetes bootstrapping tool like kubeadm to set up clusters, it’s important to know how a cluster is bootstrapped from scratch. That knowledge will help you tackle production issues that may arise later on. The following two sources explain this well:
Bootstrap Kubernetes Using kubeadm
Kubeadm will automatically perform the configuration you had to do on your own in the previous section. As a result, it’s a pretty popular tool to bootstrap Kubernetes.
There are multiple architectures to choose from, and, during your setup process, you’ll have to answer questions like, “Should the architecture be highly available?” and, “Is a single master cluster sufficient?” Below are a few guides that will help you make these decisions:
- Options for Highly Available topology, Kubernetes Documentation
- Creating a single control-plane cluster with kubeadm, Kubernetes Documentation
- Kubernetes on CentOS 7 with Firewalld, Platformer Cloud, Nilesh Jayanandana
- Demystifying High Availability in Kubernetes Using Kubeadm, Velotio Technologies
- Adding Windows nodes, Kubernetes Documentation
You still need to maintain clusters after they are configured. It’s recommended that you update Kubernetes soon after new stable versions are released, and these updates need to be done incrementally, one minor version at a time. In other words, if your current version is 1.15 and the latest version is 1.18, you can’t update your servers directly to 1.18; you have to step through 1.16 and 1.17 first. The following guides will help you to update Kubernetes with minimum disruption to your running workloads:
- Upgrading kubeadm clusters, Kubernetes Documentation
- Upgrading Windows nodes, Kubernetes Documentation
- Kubernetes Upgrade: The Definitive Guide to Do-It-Yourself, Platform9
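The incremental rule can be sketched in a few lines of Python. This is only an illustration of the "one minor version at a time" idea; the function name `upgrade_path` is hypothetical and not part of any Kubernetes tool.

```python
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """Return the minor versions to step through, one at a time."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target version is older than the current version")
    # Every intermediate minor version becomes its own upgrade step.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.15", "1.18"))  # ['1.16', '1.17', '1.18']
```

Each entry in the returned list corresponds to one full upgrade pass over the cluster, as described in the guides above.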
Prevent Resource Overuse
When you deploy your workloads in Kubernetes clusters, there’s a chance you’ll end up overusing the cluster node resources, a situation which can eventually lead to system crashes.
You can minimize these risks by learning how Kubernetes pod resource limits and namespace resource quotas work, topics that the following resources tackle in depth:
- Managing Compute Resources for Containers, Kubernetes Documentation
- Assigning Pods to Nodes, Kubernetes Documentation
- Configure Memory and CPU Quotas for a Namespace, Kubernetes Documentation
- Kubernetes Resource Quotas, Alibaba Cloud
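As a rough sketch of what limits and quotas look like in practice, the Python snippet below builds a Pod with per-container requests and limits plus a ResourceQuota for its namespace, and prints both as JSON. All names, the image tag, and the numbers (`demo-app`, `team-a`, `nginx:1.25`, `250m`, and so on) are placeholder assumptions; choose values that match your own workloads.

```python
import json


def container_with_limits(name, image, cpu_request, mem_request, cpu_limit, mem_limit):
    """Build a container spec with explicit resource requests and limits."""
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {"cpu": cpu_request, "memory": mem_request},
            "limits": {"cpu": cpu_limit, "memory": mem_limit},
        },
    }


# Placeholder pod: one container that may use at most 500m CPU and 256Mi memory.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-app", "namespace": "team-a"},
    "spec": {
        "containers": [
            container_with_limits("web", "nginx:1.25", "250m", "128Mi", "500m", "256Mi")
        ]
    },
}

# Placeholder quota: caps the total requests and limits of the whole namespace.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "team-a-quota", "namespace": "team-a"},
    "spec": {
        "hard": {
            "requests.cpu": "2",
            "requests.memory": "4Gi",
            "limits.cpu": "4",
            "limits.memory": "8Gi",
        }
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [pod, quota]}
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -` in a test cluster.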
Backup and Restore
Every system should have a backup and restore plan in place, and Kubernetes is no exception. The resources listed below describe how to envision and set up an effective plan. Note that the persistence layer in Kubernetes is a key value store called etcd.
- Backup Kubernetes – how and why (with examples for etcd), Elastisys
- Operating etcd clusters for Kubernetes, Kubernetes Documentation
- Creating etcd backup, CoreOS
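To make the etcd part concrete, here is a hedged sketch that assembles an `etcdctl snapshot save` command in Python without running it. The endpoint and certificate paths are assumptions based on a default kubeadm layout, and the backup location is a placeholder; adjust all of them for your cluster.

```python
import shlex


def etcd_snapshot_command(snapshot_path,
                          endpoint="https://127.0.0.1:2379",
                          pki_dir="/etc/kubernetes/pki/etcd"):
    """Build (but do not run) an etcdctl command that saves a snapshot."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",    # assumed kubeadm default path
        f"--cert={pki_dir}/server.crt",  # assumed kubeadm default path
        f"--key={pki_dir}/server.key",   # assumed kubeadm default path
        "snapshot", "save", snapshot_path,
    ]


cmd = etcd_snapshot_command("/var/backups/etcd-snapshot.db")
# ETCDCTL_API=3 selects the v3 API, which the snapshot subcommand belongs to.
print("ETCDCTL_API=3 " + " ".join(shlex.quote(part) for part in cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on a control-plane node you would run the printed line on a schedule and copy the snapshot file off the machine.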
2. Kubernetes Networking
Networking-related issues are common in misconfigured Kubernetes systems. Networking is a core layer in Kubernetes, and, early in the process of bootstrapping a cluster, you need to decide which Container Network Interface (CNI) plugin you want to use in your cluster.
Depending on the CNI you choose, you may gain access to additional features like network policy support and encryption of traffic on the network. Network policy support may be a critical requirement for you if you want to control ingress and egress traffic for the namespaces or pods inside the Kubernetes cluster.
Choosing the correct CNI depends on your security policies, performance targets, and scalability, as well as the hardware running in your data center. These resources can help you to select the best CNI for your use case:
- Container Network Interface (CNI) Providers, Rancher
- Comparing Kubernetes Networking Providers, Rancher
- Benchmark results of Kubernetes network plugins (CNI) over 10Gbit/s network (Updated: April 2019), ITNext
- Kubernetes Multi-Cluster Networking-Cilium Cluster Mesh, ITNext
- Flannel vs Calico : A battle of L2 vs L3 based networking, Shashank Jain
- Introducing Weave Net, Weaveworks
Kubernetes allows you to configure network policies in the cluster network. These act as a virtual firewall inside the cluster and allow you to fine-tune and control traffic between pods and namespaces in Kubernetes.
In order to do this, you’ll need to use a CNI that supports network policies. The most common CNI used in tutorials, Flannel, doesn’t support them. The resources below examine the details of working with network policies:
- Network Policies, Kubernetes Documentation
- Exploring Network Policies in Kubernetes, Banzai Cloud
- Secure Your Kubernetes Application With Network Policies, Bitnami
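The "virtual firewall" idea can be sketched as a single NetworkPolicy object. The Python helper below is a hypothetical illustration: the policy name, namespace, labels, and port are all made-up values, and the policy only has an effect when the cluster's CNI enforces network policies.

```python
import json


def allow_ingress_from(name, namespace, target_labels, source_labels, port):
    """Allow TCP ingress to the target pods only from pods with source_labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods the policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods matching these labels (same namespace) may connect.
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policy = allow_ingress_from(
    "allow-frontend-to-api", "shop", {"app": "api"}, {"app": "frontend"}, 8080
)
print(json.dumps(policy, indent=2))
```

Once a pod is selected by a policy like this, ingress traffic that the policy does not explicitly allow is dropped, which is what makes it behave like a firewall rule.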
Service discovery in Kubernetes relies on the CoreDNS component, which requires your CNI to be configured properly. CoreDNS provides other functionality as well, allowing you to configure a wide array of DNS plugins to suit your specific requirements.
- Debugging DNS Resolution, Kubernetes Documentation
- Customizing DNS Service, Kubernetes Documentation
- CoreDNS Plugins, CoreDNS
- How to Add Plugins to CoreDNS, CoreDNS
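For a feel of what DNS-based service discovery means for application code, the sketch below builds the DNS name under which a Service is normally reachable and shows how a pod might resolve it. The helper names are hypothetical, `cluster.local` is assumed as the cluster domain (a common default that can be changed), and `resolve` is only expected to succeed from inside a cluster.

```python
import socket


def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build the conventional DNS name for a Service in a namespace."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(hostname):
    """Return the addresses for hostname, or an empty list if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


print(service_fqdn("payments", "shop"))  # payments.shop.svc.cluster.local
```

An empty result from `resolve` for a Service that exists is a typical first symptom of the DNS or CNI misconfigurations that the debugging guide above walks through.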
3. Kubernetes Storage
Kubernetes is a platform that allows you to run workloads as containers. More often than not, these workloads will need to persist their state.
K8s supports a wide range of storage drivers natively, and the option to use external drivers exists as well. When choosing a storage driver, you need to consider performance, volume access modes, availability and scalability.
Kubernetes Volume Access Modes
Kubernetes supports three main volume access modes: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany. Be careful when choosing a volume driver, since some may not support all three modes.
Many drivers don’t support ReadWriteMany. If that mode is important to you, the most commonly used driver is NFS. More information about volume access modes can be found at these links:
- Persistent Volumes, Kubernetes Documentation
- Storage Classes, Kubernetes Documentation
- Dynamic Volume Provisioning, Kubernetes Documentation
- NFS Persistent Volumes with Kubernetes on GKE — A Case Study, Nilesh Jayanandana
- Storage on Kubernetes: OpenEBS vs Rook (Ceph) vs Rancher Longhorn vs StorageOS vs Robin vs Portworx vs Linstor, Vito Botta
- Configuring NFS Storage for Kubernetes, Docker
- Using overlay mounts with Kubernetes, Amartey Pearson
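As an illustration, the Python sketch below builds a PersistentVolumeClaim that requests a given access mode, using the spellings the Kubernetes API expects (ReadWriteOnce, ReadOnlyMany, ReadWriteMany). The claim name, the `nfs-client` storage class, and the size are assumptions; whether the claim can actually be bound depends on the driver behind the storage class.

```python
import json

# Access modes as spelled in the Kubernetes API. Newer releases also define
# ReadWriteOncePod, which this sketch leaves out.
VALID_ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"}


def pvc(name, storage_class, size, access_mode):
    """Build a PersistentVolumeClaim manifest for a single access mode."""
    if access_mode not in VALID_ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,  # hypothetical class name
            "resources": {"requests": {"storage": size}},
        },
    }


shared_claim = pvc("shared-uploads", "nfs-client", "10Gi", "ReadWriteMany")
print(json.dumps(shared_claim, indent=2))
```

A claim like `shared_claim` stays pending if no available volume or provisioner supports ReadWriteMany, which is one quick way to discover that a driver lacks the mode.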
Persistent Volume Backup and Restore
After you’ve configured storage for Kubernetes, you need a backup and restore procedure for your persistent volumes. There are multiple tools out there to support this, and some storage drivers already have backup systems implemented. Here are a few resources addressing possible solutions:
- Kubernetes: Backup your Stateful apps, Maud Laurent
- Volume Snapshots, Kubernetes Documentation
- Kubernetes Snapshots and Backups, Portworx
- Stash by AppsCode, AppsCode
4. Learn Kubernetes Security
Whether your Kubernetes cluster is running on-prem or in the cloud, its security is of utmost importance. An ill-configured Kubernetes cluster will be vulnerable to attacks.
As a rule of thumb, it’s best to avoid exposing the Kubernetes API to the public in order to reduce the surface area of potential attacks.
Of course, there are other ways to attack a cluster, and corresponding ways to prevent those attacks, many of which we describe here.
Service accounts are the resources associated with Kubernetes’ authentication mechanism. They let you authenticate to and use the Kubernetes API. From there, you can create multiple roles with specific permissions in Kubernetes and bind these roles to service accounts.
The majority of developers use the default kubectl config file from kubeadm instead of creating a separate user and a separate config file for every kubectl user. However, this isn’t advisable, since that default file grants full administrative access to the cluster and sharing it opens up security vulnerabilities.
The resources below can help you navigate service accounts:
- Kubernetes RBAC and TLS certificates – Kubernetes security guide (part 1), Sysdig
- Securing Kubernetes components: kubelet, Kubernetes etcd and Docker registry – Kubernetes security guide (part 3), Sysdig
- How to Secure Kubernetes Clusters from Pod to Network, Logz
- Using RBAC Authorization, Kubernetes Documentation
- OPA Gatekeeper: Policy and Governance for Kubernetes, Kubernetes Blog
- Auditing, Kubernetes Documentation
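The service account, role, and binding relationship can be sketched as three small objects. In the Python snippet below, the names (`ci-deployer`, `pod-reader`, `team-a`) and the chosen verbs are illustrative assumptions, not a recommended permission set.

```python
import json

NAMESPACE = "team-a"  # placeholder namespace

service_account = {
    "apiVersion": "v1",
    "kind": "ServiceAccount",
    "metadata": {"name": "ci-deployer", "namespace": NAMESPACE},
}

# A Role lists what may be done; here, read-only access to pods.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "pod-reader", "namespace": NAMESPACE},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}
    ],
}

# A RoleBinding says who gets the Role; here, the service account above.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "ci-deployer-pod-reader", "namespace": NAMESPACE},
    "subjects": [
        {
            "kind": "ServiceAccount",
            "name": service_account["metadata"]["name"],
            "namespace": NAMESPACE,
        }
    ],
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": role["metadata"]["name"],
    },
}

rbac_manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [service_account, role, role_binding],
}
print(json.dumps(rbac_manifest, indent=2))
```

Keeping each role this narrow, and binding it to a dedicated account, is the alternative to handing every user the same all-powerful config file.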
Avoiding Root Containers
It’s best to avoid running containers as root. You can enforce this with the security contexts available in Kubernetes, and, if you’re running a multi-tenant system in a single Kubernetes cluster, you can add container sandboxing on top. The resources below describe these processes:
- Configure a Security Context for a Pod or Container, Kubernetes Documentation
- Pod Security Policies, Kubernetes Documentation
- Kubernetes security context, security policy, and network policy – Kubernetes security guide (part 2), Sysdig
- How to Implement Secure Containers Using Google’s gVisor, Karthikeyan Shanmugam
- Kata Containers on Kubernetes and Kata Firecracker VMM support, Gokul Chandra
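A security context that keeps a container from running as root might look like the sketch below. The pod name, the image reference `registry.example.com/app:1.0`, and the user ID 1000 are placeholders; the image itself must be able to run as a non-root user for the pod to start.

```python
import json

non_root_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "non-root-demo"},
    "spec": {
        # Pod-level settings apply to every container in the pod.
        "securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
        "containers": [
            {
                "name": "app",
                "image": "registry.example.com/app:1.0",  # placeholder image
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }
        ],
    },
}

print(json.dumps(non_root_pod, indent=2))
```

With `runAsNonRoot` set, the kubelet refuses to start a container whose effective user would be root, so a misbuilt image fails at startup instead of running with more privileges than intended.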
Many enterprises use Active Directory or some other tool for user management. When you set up Kubernetes clusters, you must federate users to these clusters and configure access using LDAP or SAML protocols. Here are a few guides that can help you get started with this process:
- Single Sign-On for Kubernetes: An Introduction, The New Stack
- kube-oidc-proxy: A proxy to consistently authenticate to managed Kubernetes clusters, on multi-cloud, using OIDC, Jet Stack
- Role Based Access Control | Kubernetes Authentication in Rancher, Rancher
Your Kubernetes cluster will almost always have a few services exposed to the internet. To have more control over your cluster’s ingress traffic, send it through an API gateway.
- Getting Started With Ambassador, DZone
- Kong Gateway in Kubernetes, ITNext
- Three Strategies for Managing APIs, Ingress, and the Edge with Kubernetes, ITNext
- API Gateway as an Ingress Controller for Amazon EKS, AWS
- Patterns for deploying Kubernetes APIs at scale with Apigee, Google Cloud
The secrets you store in Kubernetes are just base64-encoded strings, not encrypted ones. Consequently, just about anyone who can read them can easily decode them. This also means you cannot commit configuration files containing secrets to a repository without exposing those secrets to third parties.
The resources in this section provide best practices and tools for storing your sensitive data in Kubernetes securely.
- Protecting Kubernetes Secrets: A Practical Guide, AquaSec
- Injecting Vault Secrets Into Kubernetes Pods via a Sidecar, Vault
- Managing secrets deployment in Kubernetes using Sealed Secrets, AWS
- Encrypting Secret Data at Rest, Kubernetes Blog
- Securely Keeping Kubernetes Secrets in Git, VictorOps
- Kubernetes Secrets Management, DZone
- Securing Secrets With HashiCorp Vault and Logz.io Security Analytics, Logz
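A short Python sketch shows why base64 offers no protection: it is a reversible encoding that needs no key. The password value is made up for the example.

```python
import base64

plaintext = "s3cr3t-password"  # made-up example value

# This is the transformation applied to the value stored in a Secret's data field.
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

# Anyone who can read the manifest can reverse it without any key.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```

Because decoding requires nothing but the encoded string, a Secret manifest committed to Git is effectively plain text, which is the problem the tools listed above set out to solve.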
Kubernetes Security Best Practices
Securing a Kubernetes cluster takes many separate measures. This section’s resources explain best practices for Kubernetes clusters.
Remember to always verify and benchmark your Kubernetes cluster to make sure that it meets the security standards specified by your organization.
- Kubernetes Best Security Practices, Logz
- Find Security Vulnerabilities in Kubernetes Clusters, Rancher
- The top Kubernetes security best practices, Sqreen Blog
- Hardening your cluster’s security, Google Cloud
- 9 Kubernetes Security Best Practices Everyone Must Follow, CNCF
- K8s security guide, Sysdig
- Kube-hunter – an open source tool for Kubernetes penetration testing, AquaSec
- Kubernetes Audit: Making Log Auditing a Viable Practice Again, CNCF
- Kubernetes Audit Logging, Sysdig
5. Kubernetes Observability
Logging and monitoring (metrics) complete the setup of your production-grade Kubernetes cluster.
Metrics, also known as monitoring (though that term is sometimes extended to other aspects of observability), allow you to observe and act upon cluster behavior over time. Logging, meanwhile, lets you debug running workloads and observe their status.
There are many monitoring stacks, but the most popular combine Prometheus with Grafana and/or the ELK Stack. Both provide similar features (though Grafana is designed with metrics in mind; Elastic with a focus on logs).
Fortunately, if you’re using a cloud-managed Kubernetes solution, your cloud vendor will have a dedicated monitoring stack available to you. For example, GKE uses Stackdriver.
Below are a few guides describing how to set up monitoring in Kubernetes:
- Kubernetes Monitoring: Best Practices, Methods, and Solutions, Logz
- Kubernetes Monitoring with Prometheus -The ultimate guide (part 1), Sysdig
- Kubernetes Monitoring with Prometheus: AlertManager, Grafana, PushGateway (part 2), Sysdig
- Kubernetes monitoring with Prometheus – Prometheus operator tutorial (part 3), Sysdig
- Configure Liveness, Readiness and Startup Probes, Kubernetes Documentation
- Kubernetes Monitoring, ELK Stack
Containers are stateless. Their logs are streamed to standard output, and the container engine saves them as log files in its log folder on the host operating system.
However, looking for these logs manually is incredibly tedious, so logs are typically aggregated and pushed to a centralized logging provider where users can easily look into them. The following list of links should help you navigate the details of logging:
- An Introduction to Kubernetes Logging, Logz
- Logging Architecture, Kubernetes Documentation
- Logging Using Elasticsearch and Kibana, Kubernetes Documentation
- Monitor and Troubleshoot Your IT Environment, Logz
- Fluentd vs. Logstash: A Comparison of Log Collectors, Logz
- How To Set Up an Elasticsearch, Fluentd and Kibana (EFK) Logging Stack on Kubernetes, Hanif Jetha
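On the application side, this usually means writing logs to standard output in a format a collector can parse. The sketch below uses only Python's standard `logging` module to emit one JSON object per line; the field names and the `demo-service` logger name are assumptions, not a format required by any particular collector.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        return json.dumps(
            {
                "time": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def build_logger(name="demo-service", stream=sys.stdout):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info("order %s processed", 42)
```

Because the output goes to stdout, the container engine captures it on the node, and an agent such as Fluentd or Logstash can forward the JSON lines to the central store without custom parsing rules.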
In a microservice environment, where you have hundreds if not thousands of microservices running, debugging gets complicated. A single business request in your application may hop across many of these microservices before it completes.
Tracing manages this complexity by monitoring and analyzing request payloads and request hops between your services, allowing you to get a clearer picture of how your services are running in your environment.
- Top 11 Open Source Monitoring Tools for Kubernetes, Logz
- Jaeger Operator for Kubernetes – JaegerTracing, Jaeger
- Using Kubernetes Pod Metadata to Improve Zipkin Traces, SoundCloud
- Zipkin or Jaeger? The Best Open Source Tools for Distributed Tracing, Epsagon
- When Istio Meets Jaeger – An Example of End-to-end Distributed Tracing.md, Stevenc81
- A guide to distributed tracing with Linkerd, Linkerd
- Distributed Tracing with Java “MicroDonuts”, Kubernetes and the Ambassador API Gateway, Ambassador
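The core mechanism behind following request hops is context propagation: every service forwards a shared trace ID and adds its own span ID. The sketch below illustrates this with the W3C Trace Context `traceparent` header layout, which is an assumption made for the example; the tracers listed above may use their own header formats, and the function names are hypothetical.

```python
import secrets


def new_traceparent():
    """Start a trace: version 00, random trace ID and span ID, sampled flag 01."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent):
    """Derive the header for the next hop: same trace ID, fresh span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()              # created by the first service
outgoing = child_traceparent(incoming)    # sent along to the next service
print(incoming)
print(outgoing)
```

Because the trace ID stays constant while the span ID changes at each hop, a tracing backend can stitch the spans reported by separate services into one end-to-end view of the request.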
Getting started with Kubernetes is easy; doing things the right way requires practice. To master it fully, you need to have hands-on experience using it to solve real world problems.
Sometimes, you need a little guidance from an expert on where to start looking and how to get going. There are a lot of different opinions out there regarding how best to achieve the outcomes we discussed above, so the resources in this post have been compiled in hopes of helping you sort through them and giving you that initial direction.
It’s up to you to dive into these articles and build a Kubernetes strategy that suits your requirements. With a little time and effort, you, too, can be a pro Kubernetes administrator!