How Tango Analyzes 2 TB of Log Data Using Logz.io

About Tango

Tango is a leading mobile messaging service with more than 390 million registered members around the world. Having evolved from its beginnings in 2009 as a cross-platform video and voice call app, Tango is now an established way to stay connected with close friends and family during everyday moments.

Logging with the ELK Stack

Tango’s engineers use logs to monitor and troubleshoot the company’s application. While millions of users are using Tango for messaging their friends and family, engineering teams are using application and server logs to make sure the service is behaving as expected.

With increased usage, log data began to explode in scale and reached daily volumes of a few terabytes. The old-school logging method of sshing into machines and grepping for specific log messages was simply no longer feasible. The need for a centralized logging solution was clear, and the Tango team went through a number of iterations implementing various logging solutions.

The ELK Stack was eventually selected because it answered both the need for a comprehensive platform that could handle all the stages of the logging pipeline (collection and aggregation, storage and analysis) and the team’s preference for open source technologies.

Feeling the Pain

Tango set up their ELK Stack on an AWS-based infrastructure but began to encounter some technological challenges mainly stemming from the scale involved.

The number of instances provisioned to deal with the growing volume of log data exacted a steep price from the company, in terms of both money and manpower. Building and sustaining a large-scale ELK deployment required ever more resources, which the company wanted and needed to divert elsewhere.

It was this consideration that motivated the team to look for alternative solutions. None of these solutions, however, had the required functionality and capabilities. First and foremost, the team preferred an ELK-based solution that would make any future transition as smooth as possible. The team also needed a platform that would not cave under pressure and that could cope with the huge volumes of data Tango was generating.

Offering scalability, security, and high availability, Logz.io’s hosted ELK service was found to fit the bill.

Playing on Familiar Ground

Transitioning to Logz.io was a fast process for Tango because of the team’s familiarity with the ELK Stack and the existing logging architecture already in place.

For example, configuring log shipping to Logz.io was simple: Filebeat forwards the different log files, and parsing was configured with the help of the Logz.io support team. AWS VPC Flow Logs are shipped via S3 buckets using Logz.io’s built-in AWS support.

Monitoring Application Usage and Behavior

Tango ships Java application and Apache Tomcat access logs that are used together for monitoring how the Tango application is performing.

Tomcat access logs show how people are accessing and using the application, and they are the basis for a Kibana dashboard the team built to monitor response times and return codes for user requests. Any spike in traffic or in error response codes is easily identified.
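
As a rough illustration of the kind of aggregation such a dashboard performs, the sketch below parses access log lines and summarizes return codes and response times. It is not Tango's actual setup: it assumes Tomcat's "common" access log pattern with %D (request processing time in milliseconds) appended, and the function names and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line format (Tomcat "common" pattern plus %D, time in ms):
#   %h %l %u %t "%r" %s %b %D
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) (?P<millis>\d+)'
)


def parse_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "time": m.group("time"),
        "request": m.group("request"),
        "status": int(m.group("status")),
        # Tomcat writes "-" when no bytes were sent.
        "bytes": 0 if m.group("bytes") == "-" else int(m.group("bytes")),
        "millis": int(m.group("millis")),
    }


def summarize(lines):
    """Count requests per status class (2xx, 4xx, ...) and average the response time."""
    by_status_class = Counter()
    total_millis = 0
    parsed = 0
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that are not access log entries
        parsed += 1
        by_status_class[f"{record['status'] // 100}xx"] += 1
        total_millis += record["millis"]
    avg_millis = total_millis / parsed if parsed else 0.0
    return {
        "requests": parsed,
        "by_status_class": dict(by_status_class),
        "avg_millis": avg_millis,
    }


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2016:13:55:36 +0000] "GET /api/v1/profile HTTP/1.1" 200 512 40',
    '10.0.0.2 - - [10/Oct/2016:13:55:37 +0000] "POST /api/v1/message HTTP/1.1" 500 - 160',
    'not an access log line',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In a hosted ELK setup the same grouping would normally be done by Elasticsearch aggregations behind the Kibana visualizations; the script only shows which fields matter.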

Tango has identified and mitigated numerous spam attacks using these access logs, identifying anomalous bandwidth usage and suspicious request activity.
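
One simple way to surface anomalous bandwidth usage from access log fields is to total the bytes sent per client and flag clients far above the typical value. The sketch below is an assumption-laden illustration rather than the method Tango used: the median-based rule, the `factor` threshold, and the function name are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_heavy_clients(records, factor=10.0):
    """Return client IPs whose total bytes exceed `factor` times the median total.

    `records` is an iterable of (client_ip, bytes_sent) pairs, for example
    taken from the host and bytes fields of parsed access log lines.
    """
    totals = defaultdict(int)
    for client_ip, bytes_sent in records:
        totals[client_ip] += bytes_sent
    if not totals:
        return []
    typical = median(totals.values())
    return sorted(ip for ip, total in totals.items() if total > factor * typical)


# Hypothetical traffic, for illustration only.
RECORDS = [
    ("10.0.0.1", 500),
    ("10.0.0.2", 700),
    ("10.0.0.3", 600),
    ("10.0.0.9", 90000),
    ("10.0.0.1", 100),
]

if __name__ == "__main__":
    print(flag_heavy_clients(RECORDS))
```

A fixed multiple of the median is crude, and a production setup would more likely rely on alerts defined over the indexed data, but it shows the shape of the check.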

When an error is logged and identified, the team drills down into the application level by analyzing the Java application logs shipped into Logz.io. This gives the team a more granular view, allowing the engineers to drill down to the class level and troubleshoot the root cause of the error.
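
The class-level drill-down can be pictured as grouping error entries by the logger that emitted them. The sketch below assumes a log4j-style layout of the form "date time LEVEL [logger class] message"; that layout, the function name, and the `com.example` class names are hypothetical, not taken from Tango's logs.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <LEVEL> [<logger class>] <message>"
LOG_RE = re.compile(
    r'^\S+ \S+ (?P<level>[A-Z]+)\s+\[(?P<logger>[\w.$]+)\] (?P<message>.*)$'
)


def errors_by_class(lines):
    """Count ERROR entries per logger class, most frequent first."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Stack trace continuation lines start with whitespace and do not match.
        if m and m.group("level") == "ERROR":
            counts[m.group("logger")] += 1
    return counts.most_common()


# Hypothetical log lines, for illustration only.
LOG_LINES = [
    "2016-10-10 13:55:37,123 ERROR [com.example.chat.MessageStore] Write timed out",
    "2016-10-10 13:55:38,001 INFO  [com.example.chat.Gateway] Client connected",
    "2016-10-10 13:55:39,450 ERROR [com.example.chat.MessageStore] Write timed out",
    "2016-10-10 13:55:40,010 ERROR [com.example.auth.TokenService] Token expired",
    "    at com.example.chat.MessageStore.write(MessageStore.java:88)",
]

if __name__ == "__main__":
    for logger, count in errors_by_class(LOG_LINES):
        print(f"{count:3d}  {logger}")
```

In Kibana the equivalent view would typically be a terms aggregation on the logger field, filtered to the error level.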

End Result

Both the Operations and Server Engineering teams at Tango use Logz.io to analyze over 2 TB of log data a day, shipped from approximately 100 servers running the company’s most important services.

Using Logz.io has helped Tango transform what was once a resource-intensive and expensive process into an easy and cost-effective one. Ultimately, outsourcing the heavy lifting of maintaining and running their own ELK Stack to Logz.io has helped Tango focus on improving the application’s user experience instead of on the logging infrastructure.
