Located in the heart of Silicon Valley, Zenfolio is the premier global photography service used by professionals and enthusiasts to display and sell their work online. Zenfolio’s robust set of state-of-the-art features is available via monthly or annual subscription plans and includes everything photographers need to showcase their portfolio, attract clients, and grow their business.
Zenfolio’s platform is built upon a combination of .NET applications served by IIS web servers and Microsoft SQL Server databases as well as an IBM private cloud solution for object storage. Logging for all of these components was traditionally handled by storing log data in databases and fetching relevant information when necessary.
As the business grew, the company began searching for a more centralized solution that would make troubleshooting more efficient and reduce MTTR. Zenfolio’s initial attempts to run an in-house ELK deployment did not succeed because the sheer amount of data being shipped put too much load on the stack.
Quickly realizing that maintaining an in-house ELK deployment would eventually consume too many resources, both human and machine, Zenfolio began looking for a reliable hosted ELK solution. As Brian Tomlin, Senior Director of Development at Zenfolio, put it: “We wanted a hosted ELK solution that would scale automatically and not break as the data grew. We simply did not want to have to worry about handling it. Logz.io fit the bill.”
Since Zenfolio had already been shipping logs into their ELK Stack, the process of moving to Logz.io was easy.
Application logs are written to a JSON file, which is then shipped into Logz.io using Filebeat. IIS web server access and error logs are also shipped using Filebeat for tracking web transactions.
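As an illustration of the JSON logging step, here is a minimal sketch in Python of an application writing one JSON object per log line, a layout that a line-oriented shipper such as Filebeat can pick up. It is not Zenfolio's implementation (their applications are .NET); the logger name, the field names, and the in-memory buffer standing in for the log file are all assumptions made for the example.

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        # Field names are hypothetical; pick whatever the team queries on.
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Attach a JSON-lines handler to a (hypothetical) application logger."""
    logger = logging.getLogger("app.example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# An in-memory buffer stands in for the JSON log file on disk.
buffer = io.StringIO()
log = build_logger(buffer)
log.error("upload failed for gallery %s", "g-123")
```

Keeping each event on its own line with named fields is what later makes it possible to drill down into specific messages and fields when querying.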
Zenfolio uses Logz.io extensively for querying the log data. These queries give the company the visibility that they once lacked, helping the various teams quickly identify and troubleshoot issues by drilling down into specific log messages and fields.
When users of the platform report an issue, the Customer Support team conducts research into the specific issue being reported, identifying patterns within the data or specific requests, before escalating to the development team. With this data, developers can quickly identify the root cause of the issue and resolve it.
For example, when a third-party application using the Zenfolio API reported that users were complaining of issues with the application, a dashboard was created in Logz.io to identify and resolve the specific errors being thrown by both the third-party application and the Zenfolio API.
Zenfolio uses Logz.io to monitor not only the general health of the application but also suspicious activity.
Earlier this year, shortly after starting to use Logz.io, the operations team identified performance issues with the platform. A look at the logs quickly revealed the root cause: an exceptionally high number of requests to the servers was causing the bottleneck.
Drilling down into the log data revealed a pattern in the requests, with specific IPs sending similarly constructed requests. It was obvious to the team that this meant a DDoS attack was taking place. Since the pattern revealed a specific country of origin for the requests, the steps to mitigate the attack involved both blocking specific IPs and making some firewall configuration changes.
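The kind of drill-down described here can be sketched as a simple aggregation over parsed access-log records: count requests per client IP and collect the request paths each IP used, so that a few addresses sending near-identical requests stand out. This is a hedged illustration only, not the query the team ran in Logz.io; the field names `client_ip` and `path`, the threshold, and the sample data (documentation-range IP addresses) are assumptions.

```python
from collections import Counter, defaultdict


def summarize_requests(records, min_requests=3):
    """Return {ip: (request_count, sorted_distinct_paths)} for busy IPs.

    `records` is an iterable of dicts with hypothetical keys
    "client_ip" and "path", e.g. parsed web server access-log entries.
    """
    counts = Counter()
    paths = defaultdict(set)
    for rec in records:
        ip = rec["client_ip"]
        counts[ip] += 1
        paths[ip].add(rec["path"])
    return {
        ip: (n, sorted(paths[ip]))
        for ip, n in counts.items()
        if n >= min_requests
    }


# Made-up sample: two IPs repeat the same request, one is ordinary traffic.
sample = (
    [{"client_ip": "203.0.113.7", "path": "/search"}] * 4
    + [{"client_ip": "203.0.113.9", "path": "/search"}] * 3
    + [{"client_ip": "198.51.100.2", "path": "/home"}]
)

suspects = summarize_requests(sample, min_requests=3)
```

In this sample, `suspects` contains only the two addresses that repeatedly requested `/search`; a list like that would be the input to the IP-blocking step described above.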
Zenfolio has also been successful in using Logz.io to identify crawling activity by bots and attenuate the resulting loads on the platform. In one case, log data revealed bot activity instigated by Yandex, the Russian search engine company. Zenfolio subsequently managed to follow the crawler documentation and instruct the Yandex bots to throttle their crawl rates, thus reducing traffic loads on the Zenfolio servers.
Today, the customer support team, the operations team, and the entire development team at Zenfolio use Logz.io on a daily basis to log over 60 GB of data generated by 40-50 servers.
The main benefit of using Logz.io has been 24/7 visibility into the platform’s log data. Whereas in the past the process of fetching the correct logs, analyzing them, and eventually identifying errors was extremely time-consuming and inefficient, log aggregation and analysis are now faster and more reliable. This efficiency has proved itself in the past and continues to do so, helping the team resolve issues more quickly, monitor the health of the services, and identify security incidents in time.
Brian Tomlin sums it up:
“What Logz.io has allowed us to do is reduce time to resolution, improve customer support, and ultimately make our customers happier. The amount of work needed to identify and solve issues has been reduced by five times at least.”