For companies in the technology sector, keeping data flowing between their systems and maintaining the ability to observe disruptions in that flow is critical for keeping the lights on.
Egress is a cybersecurity company providing cloud email security software. For observability, they rely on Logz.io’s Open 360™ platform.
Joe Brailsford, director of architecture at Egress, is responsible for the company’s technical strategy and architectural decision-making. For Egress, there’s always a great deal to consider when ensuring the security, reliability, and performance of its services, and the key to that is ensuring that engineering teams have access to best-in-class observability tooling.
“I help make sure the right conversations happen, and put the right opportunities and content in front of our engineers, so that they’re empowered to make the right decisions for the business to succeed,” Joe says, adding that he homes in on security, scalability, maintainability and observability as critical factors for success.
Egress must operate their products at scale, while utilizing both public clouds and maintaining their own internal infrastructure.
“We have a lot to observe, so centralization is paramount,” Joe says. “We need to be able to see what has — and is — happening, in a secure and timely manner in order to be able to execute on our deliverables.”
Like most organizations, Egress has a unique set of observability requirements. Their Site Reliability Engineering (SRE) team is focused on the stability and optimal delivery of services, ensuring they meet user expectations. For engineering and product teams, observability is more focused on engagement with their platform, journeys through systems, and the behavior of their software. Underpinning all of that is security: the nature of the services Egress operates, and the different jurisdictions it operates in, require that the observability solution respect borders, store data securely, and process it in line with contractual and legislative obligations.
“Between products, the variance in the kind and volume of data is significant,” Joe says. “The ability to have per-product retention, granular access control, and multi-tiered data ingestion filters is a differentiator. We want to collect meaningful, usable data—no one wants a firehose of information that you can’t actually do anything with.”
With operational spending a top priority for most businesses these days, controlling observability costs is critical for Egress. A major way to do this is by closely monitoring and managing the data sent to its observability platform.
“We need a system that enables us to quantify the cost of each log and metric we ship to the platform, and identify what role that data plays in our monitoring and alerting story, so we can see what the cost footprint is and directly correlate that with value added, not just that we have an observability platform that costs X amount a month,” Joe says.
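One way to picture the per-signal cost accounting Joe describes is a simple proportional split of the monthly bill. The following is a minimal sketch only: the `Signal` type, the `attribute_cost` function, the volume figures, and the assumption that cost scales linearly with shipped volume are all hypothetical illustrations, not Egress’s method or a Logz.io API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A log stream or metric shipped to the platform (hypothetical model)."""
    name: str
    monthly_volume: float        # assumed common unit, e.g. GB shipped per month
    consumers: tuple[str, ...]   # alerts or dashboards that rely on this signal


def attribute_cost(signals: list[Signal], monthly_bill: float) -> dict[str, float]:
    """Split the bill across signals in proportion to shipped volume.

    Assumes cost scales linearly with volume, which real pricing may not do.
    """
    total = sum(s.monthly_volume for s in signals)
    if total == 0:
        return {}
    return {s.name: monthly_bill * s.monthly_volume / total for s in signals}


# Made-up example data, for illustration only.
signals = [
    Signal("auth-service-logs", 30.0, ("login-failure-alert",)),
    Signal("http_request_duration", 10.0, ("latency-dashboard",)),
    Signal("debug-trace-logs", 60.0, ()),
]
costs = attribute_cost(signals, monthly_bill=1000.0)

# Signals with a cost but no consumer are candidates for review.
unjustified = [s.name for s in signals if not s.consumers]
```

Pairing each signal’s share of the bill with the alerts and dashboards that consume it is what lets cost be weighed against value, rather than reported as a single monthly total.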
Egress runs its products as a distributed system, and following data journeys across those products is crucial. As a result, anything that makes it easier for Egress to investigate and maintain their infrastructure is just as critical.
Logz.io’s Open 360™ allows Egress to maintain detailed visibility into their logging and metrics data in a flexible way. Egress doesn’t have to alter their product systems, write new code or change the way they work to integrate with Logz.io.
Like most organizations, the company ideally requires a platform that provides unified visibility across all of its observability data, without requiring significant customization, coding, or manual integration.
“We’re finding a very flexible interface, with very flexible controls, that we can easily connect with and make it work for our use case rather than having the product dictate what we do,” Joe says.
Before buying Logz.io, Egress was going through what Joe describes as a “very traditional Infrastructure-as-a-Service migration.” A lot changed in that process, including their observability strategy, which evolved from a lack of centralized, queryable, and reportable logs to having all their logs in one place thanks to Logz.io.
Before Logz.io, Egress focused mostly on consuming and analyzing log data for its observability needs. In an effort to reduce the cost, noise, and maintenance overhead that comes with relying on raw log data for alerting and monitoring, Egress started to adapt their observability story to lean more into metric collection.
“We built an integration with Prometheus, which is what underpins the Logz.io metrics offering,” Joe says. “We found it worked great. But we decided to send through every metric we monitor straight away. What we realized very quickly is all that adds up.”
Thanks to Logz.io’s Data Optimization Hub, Egress can quickly see whether a metric is being used and what cost or storage footprint it carries.
“We can actually see if we’re using the metric,” Joe says. “Is it an alert? Is it in a dashboard? Has it been queried? We’re able to question if a metric is needed. We have not only the cost efficiency of what we’re doing in Logz.io, but also the operational efficiency around what metrics we need or don’t need.”
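The three questions Joe asks of each metric (is it in an alert, is it in a dashboard, has it been queried) amount to a simple usage check. Below is a minimal sketch of that reasoning, assuming usage counts have already been gathered somehow; the `MetricUsage` fields, the 30-day query window, and the sample data are hypothetical and do not reflect how the Data Optimization Hub works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricUsage:
    """Usage facts for one metric (all fields are assumed inputs)."""
    name: str
    alert_count: int       # alerts that reference the metric
    dashboard_count: int   # dashboard panels that reference the metric
    recent_queries: int    # ad hoc queries in an assumed 30-day window
    series_count: int      # stored time series, a rough footprint proxy


def is_used(metric: MetricUsage) -> bool:
    """A metric counts as used if anything alerts on, displays, or queries it."""
    return (
        metric.alert_count > 0
        or metric.dashboard_count > 0
        or metric.recent_queries > 0
    )


def removal_candidates(metrics: list[MetricUsage]) -> list[MetricUsage]:
    """Unused metrics, largest footprint first, as a list for human review."""
    unused = [m for m in metrics if not is_used(m)]
    return sorted(unused, key=lambda m: m.series_count, reverse=True)


# Made-up example data, for illustration only.
metrics = [
    MetricUsage("http_requests_total", 2, 3, 40, 1200),
    MetricUsage("gc_pause_seconds", 0, 0, 0, 300),
    MetricUsage("queue_depth", 0, 1, 0, 50),
    MetricUsage("legacy_worker_heartbeat", 0, 0, 0, 4500),
]
candidates = [m.name for m in removal_candidates(metrics)]
```

Sorting by footprint puts the costliest unused metrics at the top of the list, so the review starts where dropping a metric would save the most.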
Implementing metrics has been a mindset shift for Egress. There have been questions about the level of granularity needed and the team’s capacity to look through text logs to find meaningful information. That kind of manual work can reach a point where it is no longer effective and it becomes impossible to write meaningful reports from the logs.
Historically, Egress has used more traditional monitoring solutions to deliver this kind of functionality, described by Joe as “brittle” and as relying heavily on niche knowledge and a great deal of toil.
“We looked for a metrics solution that would allow us to not only push data but push the data we wanted to push,” Joe says. “We just want to send data and then query it and then understand what our use case is.”
Egress began with familiar tools, Grafana and Prometheus, but on their own they weren’t enough for their use case. Using the alerting service in Logz.io, they can choose which alerts they want to use and see which alerts should apply to certain data points.
“We define the alerting once, it’s active and immediately applied everywhere it needs to be,” Joe says. “We’re covered. We have all the monitoring in place. There’s no after-action to think about.”
Egress is investigating moving its tracing solution to Logz.io based on the fact they’re using Open 360 for logging and metrics and the potential for cost consolidation. They’re also considering using Kubernetes 360 for observability for their Kubernetes-backed platform.
Customer service is another significant factor in Egress’ success in working alongside Logz.io.
“I can, hand on heart, say the replies from Logz.io to our questions have been nearly instant every time, and are immensely positive,” Joe says. “The support team working on this platform clearly, and deeply, understand it and their customers’ use cases.”
The ability to get a rapid response from Logz.io is critical given the many time-sensitive tasks that Joe and his team oversee for Egress.
“The support experience with Logz.io is probably the best I’ve experienced in terms of vendor relationships,” Joe says.
The idea of “essential observability” is one that closely resonates with Egress. Joe says the company knows what’s required to reach the level of observability needed across its complex environment. Specifically, Egress must be able to send data flexibly, query it, and alert on it, whether it’s logging, metrics or tracing data. When it comes to other solutions on the market today, Joe says he views many of them with healthy skepticism because of the potential for vendor lock-in.
“We’re wary of how additional bells and whistles beyond what we need are going to change the way we work, to such an extent that we’re stuck,” Joe says. “If we’re shipping software with custom code to integrate with an observability vendor, we’ve made a mistake somewhere. We could do that in a way that’s very open and compatible with other providers, but vendor lock-in isn’t for us.”
The use of open source software (such as Prometheus) not only gives Egress confidence about the longevity of the service, but also opens the door to a much more cohesive developer experience. It’s important for Egress that the observability platform they use doesn’t become a limiting factor for their product delivery or operational process; the flexibility of Logz.io ensures value without vendor lock-in.
One of the things Joe pushes with Egress’ engineering team is the importance of building differentiators into their products. Observability, however, should enable that innovation rather than being an area where Egress spends its own effort innovating.
“Observability is not a differentiator for our business or our products,” Joe says. “It’s something that enables our success. We don’t build it. We don’t want to write custom things for it. We just want something that works for us and for our use case.”