The numbers show that hybrid infrastructures are the new normal:
- 69% of businesses of all sizes have embraced hybrid clouds (Flexera).
- 88% of cloud-based apps share data and services with on-premises apps (MicroFocus).
- 45% of enterprises see hybrid solutions as their top priority for 2019 (Flexera).
As discussed in our blog post The Rise of the Hybrid Cloud, the dramatic growth of hybrid infrastructures is being driven by benefits such as enhanced flexibility, cost optimization opportunities, and support for the agile DevOps culture. But hybrid clouds also come with their challenges, such as determining how to consistently apply security and compliance processes and how to avoid performance issues resulting from the differences between private and public cloud SLAs. Perhaps the biggest challenge of them all, however, is implementing seamless management and monitoring across complex hybrid architectures.
This blog post explores the challenges of monitoring hybrid cloud environments and suggests some best practices that can mitigate their risks.
Why It’s Hard to Monitor Hybrid Clouds
Recent surveys show that one of the main concerns today in moving workloads to the public cloud is achieving effective end-to-end monitoring of software and hardware stacks deployed across on-premises infrastructures and private and public clouds. According to Keysight Technologies, 61% of the IT professionals they surveyed are concerned about their ability to securely deliver cloud traffic to on-premises monitoring systems. Similarly, 93% believe that packet-level visibility is a critical component of security monitoring, yet only 19% think that they have comprehensive, real-time access to network packets in a hybrid cloud infrastructure.
In addition to the security vulnerabilities they present, hybrid cloud monitoring challenges can also impact business-critical metrics related to performance and availability. With poor visibility and fragmented monitoring stacks, it takes longer to troubleshoot and resolve issues, leading to unacceptably high levels of latency and downtime.
In short, there is a lot at stake in overcoming the obstacles to effective hybrid cloud monitoring. One of the major stumbling blocks is that the different components of a hybrid infrastructure require different monitoring approaches and tooling, but it is unlikely that the organization’s existing monitoring frameworks can effectively track both on-premises and cloud environments. In fact, it is estimated that only 15% of legacy network security tools or appliances have been fully “cloudified,” i.e., all of their on-premises capabilities have been modified to work on the cloud. To make things even more complicated, most of these enterprise-grade monitoring systems underwent significant customization when they were deployed in customer environments. Getting so many disparate and complex systems to work together to provide meaningful, real-time insight into the health of a hybrid cloud infrastructure is close to impossible.
Another major hybrid cloud monitoring obstacle is IT’s limited visibility into the public cloud components of the infrastructure. Public cloud providers do offer monitoring and logging services, such as AWS CloudTrail, Amazon CloudWatch, Azure Monitor, and GCP’s Stackdriver Monitoring. However, the Cavirin-sponsored 2019 AWS Cloud Security Report reveals that only 21% of respondents believe that their organizations effectively use the monitoring, logging, and alerting capabilities of AWS CloudTrail or Amazon CloudWatch.
Other issues that make it difficult to monitor hybrid clouds include:
- The difficulty of discovering, creating and maintaining an up-to-date topology for the hybrid environment.
- The sheer scale of metrics that are generated across a complex hybrid environment.
- The siloed nature of cloud provider tools. They are good, but even if they are being used effectively, they cannot provide a full picture across hybrid/multicloud environments.
- The challenge of implementing agent-based legacy monitoring solutions in the highly dynamic and often ephemeral cloud environment. In general, it is exceedingly difficult to instrument modern apps and services for logging and monitoring.
- The dearth of IT personnel skilled in configuring and managing hybrid cloud environments.
Hybrid Cloud Monitoring Best Practices
Monitoring By Design
There are a number of ways that organizations can work to overcome the obstacles to effective hybrid cloud monitoring. For one thing, monitoring requirements and processes must be addressed in the hybrid cloud design stages. The relevant security and operations teams must be given a chance to raise tough monitoring questions as the hybrid infrastructure is being modeled. It is very difficult to implement effective hybrid cloud monitoring as an afterthought.
A corollary of hybrid cloud monitoring by design is identifying and focusing on the most important metrics to be tracked and monitored. As noted above, one of the challenges of hybrid cloud monitoring is the massive quantity and variety of data that is collected and logged. Careful thought has to be put into which layers and elements are most important to the organization’s business KPIs. Those layers and elements should be given priority in the hybrid cloud monitoring system. In addition to the usual suspects such as CPU, memory, disk I/O, and network bandwidth metrics, other layers to be prioritized include various types of servers (web, Java application, database), application performance, UX, and so on.
It is good practice to integrate as fully as possible, both across all layers of the hybrid cloud monitoring system itself and with other relevant systems. Some examples include:
- Standardizing backend monitoring procedures and processes to the greatest extent possible.
- Aggregating a useful subset of monitoring data from the various cloud platforms into a single monitoring tier from which you can then present the most appropriate alerts or reports to the right people and teams.
- Creating dashboards that compare and display logically related metrics in order to quickly get actionable monitoring and troubleshooting insights. For example, if a particular transaction is CPU-intensive, you can create a dashboard that correlates the application metrics with CPU utilization.
- Integrating the monitoring system with a collaboration management tool, such as Slack, in order to capture how teams resolved issues identified by the hybrid cloud monitoring system. This creates a knowledge base for future reuse.
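To make the aggregation and dashboard examples above a little more concrete, here is a minimal Python sketch. It assumes two hypothetical record formats (one from an on-premises agent, one from a cloud metrics export), normalizes both into a shared `Sample` schema, and then correlates an application metric with CPU utilization. The field names, metric names, and source labels are all illustrative assumptions, not the format of any particular product.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Sample:
    """One normalized data point in the shared monitoring tier."""
    source: str      # hypothetical label, e.g. "on_prem" or "public_cloud"
    metric: str      # normalized metric name
    timestamp: int   # seconds since epoch
    value: float


def normalize_on_prem(record):
    # Hypothetical on-premises agent format: {"name", "ts", "val"}
    return Sample("on_prem", record["name"], int(record["ts"]), float(record["val"]))


def normalize_cloud(record):
    # Hypothetical cloud export format: {"MetricName", "Timestamp", "Value"}
    return Sample("public_cloud", record["MetricName"],
                  int(record["Timestamp"]), float(record["Value"]))


def series(samples, metric):
    """Return {timestamp: value} for one metric, regardless of source."""
    return {s.timestamp: s.value for s in samples if s.metric == metric}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate(samples, metric_a, metric_b):
    """Correlate two metrics over the timestamps they have in common."""
    a, b = series(samples, metric_a), series(samples, metric_b)
    shared = sorted(a.keys() & b.keys())
    return pearson([a[t] for t in shared], [b[t] for t in shared])


# Illustrative data: application throughput reported on-premises,
# CPU utilization reported by a cloud platform.
tier = [normalize_on_prem(r) for r in [
    {"name": "checkout_tx_per_sec", "ts": 1, "val": 10},
    {"name": "checkout_tx_per_sec", "ts": 2, "val": 20},
    {"name": "checkout_tx_per_sec", "ts": 3, "val": 30},
]] + [normalize_cloud(r) for r in [
    {"MetricName": "cpu_utilization", "Timestamp": 1, "Value": 25.0},
    {"MetricName": "cpu_utilization", "Timestamp": 2, "Value": 45.0},
    {"MetricName": "cpu_utilization", "Timestamp": 3, "Value": 65.0},
]]

r = correlate(tier, "checkout_tx_per_sec", "cpu_utilization")
print(f"correlation between transactions and CPU: {r:.2f}")
```

A coefficient close to 1 would suggest that the transaction is indeed CPU-bound and that the two metrics belong on the same dashboard. In practice the samples would need to be aligned to common time buckets before being compared.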
Hybrid environments are too complicated for manual processes and procedures. Every aspect of the monitoring system should be as automated as possible. This starts with automated, end-to-end discovery of the components that comprise the hybrid environment and a topology that maps the hierarchies among the components.
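As a rough illustration of what a topology that maps hierarchies can look like in code, the Python sketch below takes the output of a discovery pass and builds a parent-to-children map. The discovery itself is out of scope here; the `(component_id, parent_id)` pairs and the component names are hypothetical stand-ins for whatever your discovery tooling actually returns.

```python
from collections import defaultdict


def build_topology(components):
    """Map each parent id to the sorted ids of its direct children.

    `components` is a list of (component_id, parent_id) pairs, where
    parent_id is None for a top-level component.
    """
    known = {cid for cid, _ in components}
    children = defaultdict(list)
    for cid, parent in components:
        if parent is not None and parent not in known:
            raise ValueError(f"{cid!r} references undiscovered parent {parent!r}")
        children[parent].append(cid)
    return {parent: sorted(kids) for parent, kids in children.items()}


def descendants(topology, root):
    """Return every component beneath `root`, in depth-first order."""
    found, seen = [], set()
    stack = list(reversed(topology.get(root, [])))
    while stack:
        node = stack.pop()
        if node in seen:          # guard against accidental cycles
            continue
        seen.add(node)
        found.append(node)
        stack.extend(reversed(topology.get(node, [])))
    return found


# Hypothetical discovery output spanning both sides of a hybrid environment.
discovered = [
    ("on_prem_dc", None),
    ("public_cloud", None),
    ("db-01", "on_prem_dc"),
    ("vm-web-1", "public_cloud"),
    ("vm-web-2", "public_cloud"),
    ("container-api", "vm-web-1"),
]

topology = build_topology(discovered)
print(descendants(topology, "public_cloud"))
```

Rebuilding this map on every discovery run, rather than editing it by hand, is what keeps the topology current as ephemeral cloud components come and go.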
Artificial intelligence (AI) and machine learning (ML) methods should be used to establish thresholds for acceptable performance. These thresholds then become the basis for automatically identifying anomalies and raising smart alerts.
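The idea of learning thresholds from data can be sketched in a few lines of Python. Note that this is a deliberately simple statistical baseline (mean plus or minus `k` standard deviations), used here as a stand-in for a real ML model; the function names, the value of `k`, and the sample latencies are all assumptions made for illustration.

```python
from statistics import mean, stdev


def learn_threshold(history, k=3.0):
    """Derive (lower, upper) bounds from past values as mean +/- k * stdev."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    center, spread = mean(history), stdev(history)
    return center - k * spread, center + k * spread


def find_anomalies(values, bounds):
    """Return (index, value) pairs for values outside the learned bounds."""
    lower, upper = bounds
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical response times in milliseconds observed during normal operation.
history = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
bounds = learn_threshold(history)

# New observations; the third one is far outside the learned range.
anomalies = find_anomalies([101.0, 99.0, 250.0, 100.0], bounds)
print(bounds, anomalies)
```

A production system would typically account for seasonality and trends, which a fixed band like this one cannot capture, but the flow is the same: learn what normal looks like, then alert on deviations from it.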
Similarly, AI and ML can be leveraged to automate troubleshooting analytics and processes at the system level. Determining root causes in real time and triggering corrective actions dramatically shortens mean time to remediation, a critical KPI in almost all organizations.
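To give a feel for the shape of automated root-cause analysis, here is a small rule-based Python sketch. It is not an AI or ML method: it simply assumes you have a dependency map and a set of currently alerting components, and treats any alerting component whose direct dependencies are all healthy as a likely root cause. The service names, the playbook entries, and the `page_on_call` fallback are hypothetical.

```python
def root_cause_candidates(depends_on, alerting):
    """Return alerting components none of whose direct dependencies are alerting."""
    return sorted(
        component for component in alerting
        if not any(dep in alerting for dep in depends_on.get(component, ()))
    )


def plan_remediation(candidates, playbook, default="page_on_call"):
    """Pick a corrective action per candidate, falling back to a human."""
    return {component: playbook.get(component, default) for component in candidates}


# Hypothetical dependency chain: frontend -> API -> database.
depends_on = {
    "web-frontend": ["checkout-api"],
    "checkout-api": ["orders-db"],
    "orders-db": [],
}
alerting = {"web-frontend", "checkout-api", "orders-db"}
playbook = {"orders-db": "fail_over_to_replica"}

candidates = root_cause_candidates(depends_on, alerting)
actions = plan_remediation(candidates, playbook)
print(actions)
```

Even this simple rule collapses three simultaneous alerts into one actionable item, which is the effect that shortens mean time to remediation. A learned model would replace the hand-written rule, not the overall flow.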
Choosing the Right Hybrid Monitoring Solution
Many organizations seek to deploy a third-party solution that can orchestrate all of these complex monitoring processes and maintain the health and security of their hybrid cloud. Some of the criteria that should be used to evaluate different solutions include good integration with the relevant public cloud targets, a rich API and ample webhooks so that the solution can be customized, and, of course, strong automation capabilities.
In the oft-cited Dimensional Research March 2018 global survey, Hybrid Cloud Usage Poses New Challenges For Monitoring Solutions, IT professionals defined the following key capabilities required for a hybrid monitoring solution:
Configuring public and private clouds and on-premises infrastructures to work together as one hybrid platform with unified management lets companies enhance operational efficiency, break down technical barriers, optimize resource utilization, and lower overall TCO. However, as discussed in this article, special attention must be paid to monitoring challenges if the hybrid cloud infrastructure is going to perform well and securely.
Logz.io’s fully managed, ELK-based monitoring analytics solution aggregates and analyzes disparate monitoring data to provide real-time visibility into the health of applications and the infrastructures on which they run.