Bleacher Report is the premier digital destination for millennial sports fans creating and collaborating on the culture of sports for the next generation of fans. Reaching more than 250 million fans each month, its vision for making sports bigger than games has led to unmatched engagement on social media, where it consistently ranks as a top publisher on every platform. Bleacher Report also provides an industry-leading fan experience on mobile devices through a top-rated smartphone and tablet app.
Providing the best possible reading experience for users is a top priority for Bleacher Report. To mitigate performance issues and minimize their effect on end users, an elaborate monitoring system was built for identifying and troubleshooting production issues in real time.
Bleacher Report has long understood the importance of centralized logging for monitoring a distributed and cloud-based environment. With multiple servers backing application services, effectively troubleshooting any issues that may be occurring depends on being able to easily access, analyze and visualize log data.
When the company began building a centralized logging architecture, there were two main considerations. First, the ideal solution would be frictionless and involve a minimal amount of time to get up and running. Second, considering the large amount of data being logged, Bleacher Report required a solution that was going to be efficient in terms of resources.
A number of platforms and technologies were evaluated, and the team felt that the open source ELK Stack was the right path to take. However, the team also realized that setting up and maintaining a large, long-term deployment of the ELK Stack would be a challenge.
Following a recommendation from Turner Broadcasting – B/R’s parent company – Bleacher Report evaluated and ultimately selected Logz.io, in large part due to its scalable and fully managed ELK Stack.
Speed and ease of setup were key considerations when selecting Logz.io, and the initial integration with the service was smooth. Over time, Bleacher Report added further logging components to make the process even more efficient.
Bleacher Report uses Filebeat to ship logs into Logz.io. An AWS Elastic Beanstalk extension was developed to handle the shipping with Filebeat as well as log rotation.
The team also developed plug_logger_json, an Elixir plug that formats an HTTP log as JSON and results in log messages that contain fields with all the required information on requests such as the request duration, method, and path.
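To illustrate the kind of structured log line described here, the following is a minimal Python sketch, not the plug itself (which is written in Elixir). The function name `format_request_log` and the field names `timestamp`, `status`, and `duration_ms` are assumptions made for this example; only the request duration, method, and path are named in the text above.

```python
import json
from datetime import datetime, timezone


def format_request_log(method, path, status, duration_ms, now=None):
    """Return one JSON log line describing a single HTTP request.

    Field names are illustrative assumptions, not the plug's actual schema.
    """
    now = now or datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),
        "method": method,
        "path": path,
        "status": status,
        # Round so that log lines stay compact and comparable.
        "duration_ms": round(duration_ms, 3),
    }
    # One JSON object per line lets a shipper forward each line as a
    # self-contained event that can be parsed into separate fields.
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(format_request_log("GET", "/articles/123", 200, 12.3456))
```

Emitting one JSON object per request means each field can be searched and aggregated on its own, rather than being extracted from free-form text after the fact.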
Bleacher Report closely monitors requests being routed to the company’s various applications and services. Incoming HTTP traffic to the AWS EC2 instances hosting these services is logged and shipped into Logz.io. The team is primarily interested in error response codes, so requests outside of the 400-599 range are filtered out.
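The filtering rule can be sketched in a few lines of Python. This is a hedged illustration only: the case study does not say where in the pipeline the filter runs, and the `status` field name and both helper functions are hypothetical.

```python
import json


def is_error_response(status):
    """True for HTTP client and server error codes (400-599 inclusive)."""
    return 400 <= status <= 599


def keep_error_lines(lines):
    """Parse JSON log lines and keep only records with an error status.

    Assumes each line is a JSON object with an integer "status" field;
    anything else is skipped rather than raising.
    """
    kept = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON, ignore
        if not isinstance(record, dict):
            continue  # valid JSON but not an object
        status = record.get("status")
        if isinstance(status, int) and is_error_response(status):
            kept.append(record)
    return kept


if __name__ == "__main__":
    sample = [
        '{"status": 200, "path": "/ok"}',
        '{"status": 404, "path": "/missing"}',
        '{"status": 503, "path": "/down"}',
    ]
    for record in keep_error_lines(sample):
        print(record["status"], record["path"])
```

Dropping successful responses before they are indexed is one way to keep log volume focused on the requests the team actually needs to investigate.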
Since Bleacher Report services are tagged with a host name, the team is able to use the ‘beat.hostname’ field logged by Filebeat to easily differentiate across the logs. For example, Bleacher Report can differentiate between logs for requests made to production instances and those made to the staging instances.
Based on this host differentiation and the parsed JSON logs, Bleacher Report created a series of Kibana visualizations and dashboards that enable the team to quickly identify and subsequently troubleshoot issues in production.
For example, one visualization enables the team to view 95th-percentile response times for any given environment, service, host or controller action.
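As a rough illustration of what such a visualization computes, the sketch below groups request durations by a chosen field and takes the 95th percentile of each group using the nearest-rank method. It is an assumption-laden example: the `duration_ms` and `host` field names are hypothetical, and Kibana relies on Elasticsearch's percentiles aggregation, which may use an approximation rather than this exact calculation.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest value such that at least pct percent
    # of the observations are less than or equal to it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def p95_by(records, key):
    """Group records by `key` and compute each group's p95 duration.

    Each record is assumed to be a dict with a "duration_ms" value.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["duration_ms"])
    return {
        name: percentile_nearest_rank(durations, 95)
        for name, durations in groups.items()
    }


if __name__ == "__main__":
    records = [{"host": "prod-1", "duration_ms": d} for d in range(1, 21)]
    print(p95_by(records, "host"))
```

The 95th percentile is usually preferred over the mean for response times because it shows what the slowest requests look like without being dominated by a single extreme outlier.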
Another example is a visualization that allows Bleacher Report to trace requests across microservices to find where an error may have occurred.
Additional visualizations monitor CDN response times, high-traffic endpoints (articles), database queries, requests per environment and service, and more.
Logz.io is now used across the back-end services team at Bleacher Report, with approximately 80GB of HTTP request logs shipped daily to Logz.io.
Using Logz.io, the team has built a standardized, centralized logging infrastructure for the company’s applications. As a result, it has cut time-to-resolution for issues in production and put an effective monitoring solution in place.
“Having centralized logging with Logz.io has been a tremendous time saver for debugging issues that would otherwise take days to find the source of.”
John Kelly, Senior Backend Engineer for Bleacher Report