Eliminating ELK Downtime and Improving Productivity

Industry: Digital Media

Company Size: 500-1000

Founded: 2007

HQ: New York, New York

Logz.io Products: Log Management

Company Profile: Web Analytics

Cloud Infrastructure: AWS

SimilarWeb

SimilarWeb is a staple of the modern internet: its software evaluates website traffic statistics and compares sites against one another. Based in Tel Aviv with over 700 employees, SimilarWeb provides insights that are valuable to internet marketers who want to see where their websites are falling short and where they are doing well.

SimilarWeb’s Monitoring Team

For a company that develops software that monitors the internet, it is imperative that their production environment is highly available to ensure accuracy and performance for users.

The company’s infrastructure and engineering team had eight on-prem servers, mainly to account for service availability, which ran “at the brink of cost-effectiveness.” With that sensitivity, monitoring needed to be seamless. Like many others, the company’s engineering organization looked to the ELK stack to make that happen. Unfortunately, managing the ELK stack diverted resources away from the entire engineering and development operation.

“We would have at least one monthly crash with our ELK stack,” says Or Tzabary, SimilarWeb’s VP R&D, Production Engineering. “It just wasn’t able to reliably process things. It would sometimes take about 3 hours just to do log ingestion. The time gap was significant when those delays correlated with actual incidents we had to investigate.”

Why OSS?

But why go with ELK in the first place? Or, for that matter, with open source at all?

Well, Or puts it simply: “We like open source. You have the ability to develop and patch things the way we want.” Moreover, the sense of collaboration is motivating. There are so many contributors who can consider what you need from a specific tool and provide feedback on those ideas.

“I like to have the ability to develop and that sense of community.”

Time is Money

Cost was also a driving factor, and here it was even more palpable than in a lot of other cases. When the team compared the cost of self-maintained ELK to that of managed ELK, the two were similar. But the time spent on provisioning and scaling ELK was itself ‘over-budget,’ as it were.

“The pricing was the same when considering the cost of maintaining the open-source ELK Stack. But the savings with Logz.io are very clear in time commitment. It frees a lot of our time to actually maintain and develop the system instead of just the monitoring infrastructure.”

Prior to the shift to Logz.io, there would be at least one incident per month in which an Elasticsearch server failed and then took hours to recover. These outages sometimes resulted in three or four hours of downtime and limited access to new Elasticsearch logs. With Logz.io, the number of incidents per month has been reduced from one to zero: “Once we migrated to Logz.io, we were able to focus more time on delivering impact instead of fixing the same problems over and over again.”

The Logz.io Incentive and Performance

SimilarWeb’s primary use case is to ship and centralize its logs to help debug and troubleshoot issues before they impact end users.

Moreover, Logz.io’s support and overall maintenance are not only foundational but also intuitive. Anomaly detection and drop filters have played a major role in the productivity and efficiency of SimilarWeb’s development team. Beyond that, Sub Accounts are playing a crucial role in supporting the productivity of the company’s many segmented teams.

“Some Sub Accounts are overloaded, so we’re trying to shift ownership to the individual R&D groups themselves who will manage their own data. Each one of them has the specific volume of logs that they need to ship and they own it. This flexibility is very helpful.”

With efficient log management, engineering resources can be redirected to the work at the center of SimilarWeb’s niche. This drives a tangible business impact, as skills are now aligned with delivering value to product development (and to customers).

“We noticed Logz has a big incentive to do things right, to do them efficiently,” Or told us. And that put Logz.io’s expertise at an advantage. “We could always migrate back to the OSS version of ELK if we needed to. But there’s no benefit [to that] when Logz manages the monitoring stack so well.”

“In terms of support and engagement, Logz.io are very dedicated towards customer-support. It feels like we’ve chosen a great partner for this relationship.”
