Last week we held our first ELK Meetup in Tel Aviv. Asaf, co-founder of Logz.io, gave a talk on Kibana 4 and shared Logz.io's experience testing Kibana 4 and using it internally to monitor our infrastructure. It was super-exciting to see people openly share their experiences with ELK and discuss both the advances in ELK and the challenges of running it in production.
For me it was a good opportunity not only to share our (usually...) positive experience but also to learn about the challenges users have with Kibana and ELK in general. If I had to sum up the feedback from the meetup, I would say it's clear that people like ELK and use it across companies of varying sizes and verticals. It was also apparent that people face challenges with ELK.
Here is my quick braindump on that:
First, how to find the cause of a bug or error in my logs. I can definitely relate to this one. As a company that analyzes massive amounts of data, we also generate massive amounts of logs internally. Our Ops team and developers struggle to find what's really important inside billions of log lines. So they create smart queries, visualizations, dashboards and more dashboards in order to eventually find that one log line that's causing the mess.
Second, scalability and security. Configuring ELK to work reliably and securely while processing growing amounts of logs is a challenge, no doubt, from scaling Logstash and clustering Elasticsearch to upgrades and troubleshooting. This came up in almost every conversation, and it all makes sense. Processing large amounts of data is not an easy thing to do. Making sure data is secure and highly available, and that the system scales up and down, is even harder.
At our next meetup, I hope we'll be able to share our experience tackling these.
We'll share how our system works under the hood: how we've developed a cloud architecture that can process billions and billions of log(z) a minute, and how we use ELK internally to monitor our own infrastructure.
If you’re around, don’t forget to register here