Everyone’s talking about Platform Engineering these days. Even Gartner recently featured it in its Hype Cycle for Software Engineering 2022. But what is Platform Engineering really about? Is it the next stage in the evolution of DevOps? Is it just a fancy rebrand for DevOps or SRE?
As someone who worked in the PaaS (Platform as a Service) discipline about a decade ago, and a DevOps enthusiast today, I decided to delve into this topic, peel off the hype, and see what it’s about in practice. This also inspired my recent episode of OpenObservability Talks, in which I hosted George Hantzaras, Director, Cloud Platform Engineering at Citrix Systems.
What is Platform Engineering?
Platform Engineering is a DevOps approach in which an organization builds a shared platform to improve developer experience and productivity, providing self-service capabilities with automated infrastructure operations.
Platform Engineering can address many different needs. The core platform capabilities typically cover foundations such as the runtime environment, Kubernetes infrastructure, and the software release pipeline. On top of that, platforms provide a variety of capabilities such as secrets management, certificate management, automated DR (disaster recovery), and Chaos Engineering drills.
Some advanced organizations even take Platform Engineering up the stack towards shared SaaS services (sometimes referred to as a SaaS Platform), offering reusable business-related services such as billing, metering, cloud account management, and cloud authorization.
Where do you draw the line between the platform and the application? George put it nicely: “if you can take the service and give it to another product team, or even give it to another company, and they can use it right away, then that belongs in the platform.”
Does Platform Engineering replace DevOps?
As we know, DevOps uses tools to streamline deployment, management, and monitoring through automation and visualization. Platform Engineering takes these tools, processes, and best practices and productizes them as reusable services and tools for use across the different engineering teams and use cases in the organization.
Imagine that each product team implements its own certificate rotation mechanism. This is a common need, and having a central service to provide it is a clear benefit. This boils down to repeatability, a foundational element in DevOps maturity. Lack of repeatability at scale is a clear signal that a platform is needed. As I always say:
Don’t reinvent, when you can reuse and adapt.
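To make the certificate rotation example concrete, here is a minimal sketch of the kind of check that every team would otherwise write for itself. The function name `needs_rotation` and the 30-day renewal window are assumptions for illustration, not something taken from George's platform.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical shared helper: one implementation, reused by every team,
# instead of each team maintaining its own copy.
def needs_rotation(
    not_after: datetime,
    now: datetime,
    renew_before: timedelta = timedelta(days=30),  # assumed renewal window
) -> bool:
    """Return True once the certificate is inside its renewal window."""
    return now >= not_after - renew_before


now = datetime(2023, 1, 1, tzinfo=timezone.utc)
expiring_soon = datetime(2023, 1, 20, tzinfo=timezone.utc)
expiring_later = datetime(2023, 6, 1, tzinfo=timezone.utc)

print(needs_rotation(expiring_soon, now))   # True: expires in 19 days
print(needs_rotation(expiring_later, now))  # False: well outside the window
```

The logic is trivial, which is the point: the cost is not in writing it once, but in writing, testing, and maintaining it separately in every team.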
Platform Engineering emerges as DevOps matures and scales. To understand that, let’s look at the DevOps maturity model that maps the DevOps transformation journey. Common DevOps maturity models go through these stages:
Initial → Managed → Defined → Measured → Optimized
In the early stages of Managed and Defined, each team typically creates its own DevOps practice to fit its needs. Example deliverables could be Terraform templates or Terraform modules that engineers clone and then add their own configuration to. Once cloned, however, the copies lose their connection to the origin. These solutions are typically localized within individual teams, with little correlation to other parts of the organization.
As the organization matures and scales, it moves into the Measured and Optimized stages. In these stages, the organization starts collecting data points and understanding the impact of its DevOps tools and practices. This exposes pockets of inefficiency, where different teams solve the same problem separately. The need for consolidation with a shared platform then becomes apparent.
Is it PaaS (Platform as a Service) all over again?
Less than a decade ago, the common paradigm neatly divided the software stack into IaaS/PaaS/SaaS (infrastructure/platform/software as a service). The Platform as a Service (PaaS) paradigm yielded tools such as Heroku and cloud services such as AWS Elastic Beanstalk and Google App Engine, which provided a unified off-the-shelf platform for managing applications, in a way that abstracts much of the complexity of the underlying infrastructure.
If you haven’t heard of these, it’s not your fault. These PaaS solutions took a highly opinionated approach to achieve that simplicity, one which proved not flexible enough to serve the different practices and preferences of many organizations. The Cloud Platform Engineering team at Citrix went through this learning curve internally. George shared, “we initially tried to build what we thought were the best practices and try to build our product with our own roadmap.” They learned the hard way that it didn’t work.
In larger organizations, you often find that different engineering teams employ different runtimes, different deployment methods, different observability stacks, and so on. The platform typically comes in at a later stage, when these are already in place, and meeting this diversity calls for treating the platform differently.
We don’t need Platform as a Service. We need Platform as a Product.
Platform as a Product
Platform Engineering, unlike PaaS, offers a more flexible approach. Firstly, it is not a general-purpose off-the-shelf product provided by a third-party vendor, but rather an in-house development. This typically means shorter feedback loops to get new capabilities and modifications into the platform, and tighter alignment with the organization’s tech stack, methods, practices, and compliance needs.
But it’s more than that. Platform Engineering embraces flexibility as a guiding principle, letting different engineering teams work in different ways without mandating one way for all. It doesn’t mean, however, that the platform can’t endorse certain ways. In fact, Platform Engineering oftentimes chooses a certain golden path and promotes it as the best-supported option.
Treating your platform as a product starts with treating the engineers as your users: understanding what problems they have, the landscape in the community and within the company, and how the platform can serve them. When you spot a common problem, you can validate the need across teams and quantify the benefit a solution would give them.
Let’s look at an example that George shared, of a service that rotates certificates and secrets. In that case: “I look in the backlog of the team, everything that they have implemented in the past. And try to see what toil they have invested in order to rotate secrets plus all the toil and all the manual work, you have to track secrets in Excel files.”
In this case, if you can show the team they’ve been investing 300 story points a year on this feature, while integrating the platform’s service would involve only 50 story points, the value case is made very clear.
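The arithmetic behind that value case can be sketched in a few lines. This is only an illustration using the figures quoted above; the function name is hypothetical, and it makes the simplifying assumption that a year of in-house toil is compared against the integration effort for that same year.

```python
def platform_value_case(in_house_points: int, integration_points: int) -> dict:
    """Compare in-house effort with platform integration effort (story points)."""
    saved = in_house_points - integration_points
    return {
        "points_saved": saved,
        "percent_saved": round(100 * saved / in_house_points, 1),
    }


# Figures from the example: 300 points a year in-house vs. 50 to integrate.
case = platform_value_case(300, 50)
print(case)  # {'points_saved': 250, 'percent_saved': 83.3}
```

Presented this way, the team sees a saving of 250 story points, or roughly 83% of what it spends today.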
A data-driven approach always resonates best, especially with engineers. Therefore, make sure to put metrics in place, just like any other product, to quantify the success of the platform’s capabilities you develop. These could be developer productivity-related, such as reducing PR lead time, or time to first code review, or developer experience-related, measured with engagement scores or similar metrics. George mentioned that his team uses both leading and lagging indicators for best feature KPI coverage.
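As a rough illustration of one such metric, the sketch below computes a median PR lead time. It assumes lead time is measured from when a pull request is opened to when it is merged, and it uses made-up sample timestamps; neither the definition nor the data comes from George's team.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(opened: datetime, merged: datetime) -> float:
    """Hours between opening and merging a pull request (assumed definition)."""
    return (merged - opened).total_seconds() / 3600


def median_lead_time_hours(prs: list[tuple[datetime, datetime]]) -> float:
    """Median lead time across (opened, merged) timestamp pairs."""
    return median(lead_time_hours(opened, merged) for opened, merged in prs)


# Made-up sample data, for illustration only.
sample_prs = [
    (datetime(2023, 1, 2, 9, 0), datetime(2023, 1, 2, 15, 0)),   # 6 hours
    (datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 4, 10, 0)),  # 24 hours
    (datetime(2023, 1, 5, 8, 0), datetime(2023, 1, 5, 20, 0)),   # 12 hours
]

print(median_lead_time_hours(sample_prs))  # 12.0
```

The median is used here rather than the mean so that a single long-lived PR does not dominate the number. Tracking the same figure before and after a platform capability ships gives a simple lagging indicator of its effect.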
The user research will also help you find the right level of abstraction in your platform, and strike a good balance between simplicity and flexibility.
Platform Engineering emerges as DevOps matures and scales
“Platform Engineering is what happens when DevOps engineers start talking to each other.” This joke by George nails it. Different teams solving the same DevOps problem again and again is wasteful, but when they join forces, they can come up with a joint solution. When you need to scale that solution across your organization, look into Platform Engineering.
Want to learn more? Check out the OpenObservability Talks episode: Platform Engineering: DevOps evolution or a fancy rename?