At Logz.io we’re always keeping tabs on the latest and greatest in the DevOps world, for the benefit of both our own engineering team and the teams that use our products. As the days get shorter and colder, we decided to look back and share the top trends we’ve seen in 2019 so far. The acronym “CALMS” (Culture, Automation, Lean, Measurement, Sharing) is a helpful way to structure thinking about DevOps tools and techniques. As we move from 2019 into 2020, the ten DevOps trends in this article certainly exemplify these principles.
1. Pipeline Automation
The tendency to automate tasks where possible and practical is a consistent trend throughout DevOps. The concept of automated pipelines for software has become ubiquitous. For example, one can see the number of continuous integration and continuous delivery (CI/CD) tools continue to grow since GitHub introduced GitHub Actions, their seamless integration offering packaged with the GitHub Enterprise service that many organizations already use for source control.
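As an illustration, a minimal GitHub Actions workflow might look something like the sketch below. The `actions/checkout` and `actions/setup-python` steps are GitHub’s official actions; the job layout and test command are hypothetical examples, not a recommendation.

```yaml
# .github/workflows/ci.yml -- a minimal CI pipeline sketch
name: ci
on: [push, pull_request]    # run on every push and pull request

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2       # fetch the repository
      - uses: actions/setup-python@v2   # install a Python toolchain
        with:
          python-version: '3.8'
      - run: pip install -r requirements.txt
      - run: pytest                     # fail the build on any test failure
```

Because the workflow lives in the repository itself, pipeline changes are reviewed and versioned alongside the code they build.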
2. Infrastructure as Code
Hand-in-hand with the popularity of automation comes the continuing rise of “infrastructure as code” tooling. Tools such as Terraform, AWS CloudFormation, Azure Resource Manager, and GCP’s Deployment Manager allow environments to be spun up and torn down at will as part of the development process, in CI pipelines, or even in delivery and production. These tools continue to mature. Notably, Terraform version 0.12 was released in 2019, offering a number of new features that make it an even more powerful and expressive tool.
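To give a flavor of infrastructure as code, here is a minimal Terraform sketch in the 0.12 syntax. The region, AMI ID, and tag values are placeholders for illustration only.

```hcl
# Minimal Terraform (0.12) sketch: one throwaway EC2 instance.
# Region, AMI ID, and tags below are hypothetical placeholders.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "ci_runner" {
  ami           = "ami-0123456789abcdef0"  # hypothetical image ID
  instance_type = "t3.micro"

  tags = {
    Environment = "ci"   # spun up for a pipeline run, destroyed afterwards
  }
}
```

A `terraform apply` creates the environment and a `terraform destroy` removes it, which is what makes ephemeral per-pipeline environments practical.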
3. Kubernetes
It feels like Kubernetes is everywhere in 2019. Since its 1.0 release in 2015, this immensely popular container orchestrator has held the most mindshare in the DevOps community, despite competition from products like Mesos and Docker’s Swarm. Major software vendors like Red Hat and VMware are fully committed to supporting Kubernetes, and an increasing number of software vendors are also delivering their applications on Kubernetes by default.
In addition, the core Kubernetes API continues to grow, with several releases in 2019. Features like Custom Resources and Admission Webhooks are going into general availability, and the Container Storage Interface is going into beta.
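As a sketch of what this looks like now that CustomResourceDefinitions are generally available (the `apiextensions.k8s.io/v1` API), here is a minimal definition; the group, kind, and `schedule` field are hypothetical examples.

```yaml
# A minimal CustomResourceDefinition sketch using the GA v1 API.
# The example.com group and Backup kind are hypothetical.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:   # a structural schema is required in v1
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string   # e.g. a cron-style expression
```

Once applied, `kubectl` can create and query `Backup` objects just like built-in resources, which is what makes the API genuinely extensible.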
Kubernetes adoption is still growing. While the platform has yet to prove itself for all classes of workloads, the momentum behind it seems to be strong enough to carry it through for a good while.
4. Service Meshes
Conversations about implementing Kubernetes increasingly go hand-in-hand with conversations about service meshes. “Service mesh” is a loose term that covers any software that handles service-to-service communication within a platform.
Service meshes can take care of a number of standard application tasks that application teams have traditionally had to solve in their own code and setups, such as load balancing, encryption, authentication, authorization, and proxying. Making these features configurable parts of the application platform frees development teams to work on improvements to their code rather than on standard patterns of service management in a distributed environment.
The biggest names in the service mesh arena are Istio, Consul, and Linkerd. Istio, which is backed by Google and IBM, is most commonly associated with Kubernetes deployments and has a reputation for both complexity and difficult maintenance. Consul is a HashiCorp product with a simpler design that is nonetheless quite feature-rich. Linkerd is less feature-rich, and the original product was relatively heavyweight; it has recently been rewritten in Go and Rust (as “Linkerd2”) specifically for Kubernetes. It remains to be seen whether it can compete in that space.
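As an example of the kind of work a mesh takes off an application team’s hands, the following illustrative Istio VirtualService shifts 10% of traffic to a new version of a hypothetical `reviews` service purely through mesh configuration, with no application code involved:

```yaml
# Illustrative Istio VirtualService (networking.istio.io/v1alpha3):
# a 90/10 traffic split between two versions of a hypothetical
# "reviews" service, handled by the mesh's proxies.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90   # most traffic stays on the stable version
        - destination:
            host: reviews
            subset: v2
          weight: 10   # canary traffic to the new version
```

Rolling the canary forward or back is then a configuration change rather than a redeployment.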
5. Observability
Another trend in DevOps is to talk about observability in applications. Observability is often confused with monitoring, but the two are distinct concepts. A good way to understand the difference is to think of monitoring as an activity and observability as an attribute of a system. Observability comes from real-world engineering and control theory: a system is said to possess observability when its internal state can easily be inferred from its outputs. In practice, this means it should be easy to work out what is going on inside an application at any given time from the data it exposes. As applications become more distributed in nature, determining why one part is failing (and thereby affecting the system as a whole) becomes more difficult.
This is where the associated concept of cardinality, which refers to the number of discrete items of time-series data a system stores, comes in. As a rule, the higher the cardinality, the more likely a system is to be observable, since you have more pieces of data to look over when trying to troubleshoot it. Of course, the data gathered still needs to be pertinent to the system’s potential points of failure, and a mental map is also still required to effectively troubleshoot.
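A small Python sketch makes the cardinality arithmetic concrete: every unique combination of label values becomes its own time series, so cardinality grows multiplicatively with each label added. The label names and values below are hypothetical.

```python
from itertools import product

# Hypothetical metric labels: each unique combination of values
# is stored as a separate time series.
labels = {
    "service": ["checkout", "payments", "search"],
    "region":  ["us-east", "eu-west"],
    "status":  ["2xx", "4xx", "5xx"],
}

# Enumerate every label combination to count the distinct series.
series = [dict(zip(labels, combo)) for combo in product(*labels.values())]
cardinality = len(series)

print(cardinality)  # 3 * 2 * 3 = 18 distinct time series
```

Adding one more label with, say, ten values would multiply that figure by ten, which is why high-cardinality data is both powerful for troubleshooting and expensive to store.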
6. DevSecOps
While the DevOps portmanteau has been a standard part of IT discussions for some time, other neologisms are coming to the fore. DevSecOps is one of them. The concept is gaining traction as teams aim to get security “baked in” to their pipelines from the outset rather than trying to bolt it on after development is complete. Security thus increasingly becomes a responsibility of DevOps, SRE, and development teams, and tools are springing up to help them shoulder it.
“Compliance as code” tools like InSpec have gotten popular as automated continuous security becomes a priority for organizations buckling under the weight of the numerous applications, servers, and environments they track simultaneously.
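For a flavor of compliance as code, InSpec controls are written in a Ruby-based DSL against built-in resources such as `sshd_config` and `port`. The control ID, title, and specific checks below are illustrative, not a hardening baseline.

```ruby
# Illustrative InSpec control (hypothetical ID and checks):
# run with `inspec exec` as part of a pipeline.
control 'baseline-01' do
  impact 1.0
  title 'SSH password logins disabled, plain HTTP off'

  describe sshd_config do
    its('PasswordAuthentication') { should eq 'no' }
  end

  describe port(80) do
    it { should_not be_listening }   # plain HTTP should be off
  end
end
```

Because the control is code, it can run on every build and against every environment, turning a point-in-time audit into a continuous check.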
Automated scanning of container images and other artifacts is also becoming the norm as applications proliferate. Products like Aqua and Sysdig are fighting for market share in the continuous security space.
You may also hear DevSecNetQAGovOps mentioned as more and more pieces of the application lifecycle seek to make themselves part of automated pipelines. However, DevSecOps remains the most common extension of the by-now somewhat-classic DevOps pairing.
7. The Rise of SRE
Site Reliability Engineering is an engineering discipline that originated at Google in 2003 (before the word “DevOps” was even coined!) and is described at length in Google’s eponymous book, Site Reliability Engineering. Eschewing traditional approaches to the support and maintenance of running applications, Google elevated operations staff to a level considered equivalent to its engineering function. Within this paradigm, site reliability engineers are tasked with ensuring that live issues are monitored and fixed, sometimes by writing fresh software to aid reliability. In addition, their feedback on architecture and rework pertaining to reliability and stability is taken on by the development team.
SRE works at the scale of Google’s operations, where a division between development and operations (normally an anti-pattern for DevOps) is arguably required because of the infrastructure’s size. Having a team responsible for an entire application from development to production (a more traditional DevOps approach) is difficult to achieve when the platform is large and standardized across hundreds of data centers.
DevOps companies are more frequently advertising for “SRE Engineers” than “DevOps Engineers” in 2019. This may be in recognition of SRE’s specific engineering focus, as opposed to DevOps’ company-wide one.
8. Artificial Intelligence
There is increasing speculation about the role artificial intelligence (and, specifically, machine learning) can play in aiding or augmenting DevOps practices. Products such as ScienceLogic’s SL1 and the Cognitive Insights feature in Logz.io’s Log Analytics product are starting to trickle into the market and gain traction, although they are still in the early stages of adoption. These products use machine learning to detect anomalous behavior in applications based on previously observed or normative behavior.
In addition to traditional monitoring activities, AI can be used to optimize test cases, determining which to run and not run on each build. This can reduce the length of time it takes to get an application into production without taking unnecessary risks with the stability of the system.
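As a toy illustration of the idea (and emphatically not any particular product’s algorithm), one could rank tests by how often they have failed in the past when the same files changed. All file and test names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical history: (changed_file, test_name) -> past failure count.
failure_history = {
    ("billing.py", "test_invoice_total"): 7,
    ("billing.py", "test_tax_rounding"): 3,
    ("search.py",  "test_query_parser"): 5,
}

def prioritize_tests(changed_files, history, limit=2):
    """Return up to `limit` tests most likely to fail for this change set."""
    scores = defaultdict(int)
    for (path, test), failures in history.items():
        if path in changed_files:
            scores[test] += failures
    # Highest historical failure count first.
    return sorted(scores, key=scores.get, reverse=True)[:limit]

print(prioritize_tests({"billing.py"}, failure_history))
# ['test_invoice_total', 'test_tax_rounding']
```

Real products learn these associations from build telemetry rather than a hand-written table, but the payoff is the same: run the riskiest tests first and fail fast.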
On the more theoretical side, Google has published information about their use of machine learning algorithms to predict hardware failures before they occur. As machine learning becomes more mainstream, expect more products like these to arrive in the DevOps space.
9. Serverless
Serverless has been a buzzword since AWS introduced AWS Lambda in 2014. Things have been heating up since then, as other providers and products have followed suit.
The term “serverless computing” can be confusing—in part because servers still have to be involved at some level. Essentially, it describes a situation where the deployer of the application need not be concerned with where the code runs. It’s “serverless” in the sense that providing the servers is not something the developer needs to deal with. Typically, serverless applications are tightly coupled with their underlying computing platforms, so you need to be sure that you’re comfortable with that level of lock-in.
Following Lambda’s introduction, Azure Functions was released. Google Cloud Platform also got in on the act with Cloud Run, a service which allows you to bring your own containers to the platform rather than requiring you to upload code targeting the approved runtimes that Lambda or Functions currently support.
On the Kubernetes side, Knative is the best-supported offering at present, but there are several other serverless options, including Apache OpenWhisk and OpenFaaS. More innovation is expected in this space as adoption grows and more use cases are covered.
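To show how little scaffolding serverless code needs, here is a minimal AWS Lambda-style Python handler. The event shape is hypothetical; in practice it depends on the trigger (API Gateway, SQS, and so on).

```python
import json

def handler(event, context):
    """A minimal Lambda-style handler sketch.

    `event` carries the trigger payload; `context` carries runtime
    metadata (unused here). The return shape mimics an API Gateway
    proxy response.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Invoking the handler directly, as a local smoke test would.
print(handler({"name": "devops"}, None))
```

Everything else, provisioning, scaling, patching the underlying hosts, is the platform’s problem, which is the entire appeal and the source of the lock-in noted above.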
10. “Shifting Left and Right” in CI/CD
The concepts of “shifting left” and, to a lesser extent, “shifting right” in CI/CD are gaining visibility this year. As release cycles get shorter and shorter, “shifting left” means making efficiency gains by failing builds earlier in the release cycle: not just with standard application testing, but also with code linting, QA/security checks, and any other checks that can alert developers to issues with their code as early in the process as possible.
“Shift-right” testing takes place in production (or production-like) environments. It is intended to surface problems before they show up in monitoring or are reported by users. One example of this trend is “log-driven development,” which is used internally at Logz.io.
These are just ten of the more noteworthy trends we’ve been watching amidst the maelstrom of activity in the DevOps world in 2019. Here at Logz.io, we strive to help our customers tackle the challenges they face around these trends with an observability platform that provides unified monitoring, troubleshooting, and security designed with DevOps teams in mind. Stay tuned for more from us as we keep our ears to the ground and our finger on the pulse of DevOps trends for 2020.