We at Logz.io are an open source analytics company that develops Big Data technology around log data, and some of the challenges that we face are the same challenges that all next-generation, DevOps-driven organizations face.
How do we manage our source code? How do we do continuous deployment? How do we monitor and test everything so that it will work as expected? These are the questions that everyone asks, and the answers become more confusing as one grows and scales from a couple of servers to hundreds or thousands of servers across dozens of different services that all need to be managed, secured, monitored, and updated constantly.
To answer those questions, many organizations today are practicing an “open source first” strategy. We are the same way. Ninety-nine percent of our tools, from coding to monitoring to backup to security tools, are open source or based on open source. As a company that is building a solution on top of the open-source ELK Stack, our team is heavily involved in the open-source community, contributing to multiple projects such as Camel and Kafka (see our GitHub repo) as well as customizing tools to fit our needs.
While building our SaaS company, we have invested a great deal of time in researching and deciding which tools to include in our DevOps toolkit. We’ve based these decisions on our years of experience in the IT industry, dealing with infrastructure for the most part. As we built a petabyte-scale data analytics infrastructure, our architecture, tools, and processes became key components of our technology and operations. We’ve taken great care in selecting, benchmarking, and constantly improving our selection of tools.
By sharing the toolset that we’ve collected and honed over time, we hope to foster a discussion within the DevOps community on what further improvements can be made.
1. Mesos
Mesos is a DevOps tool that abstracts CPU, memory, storage, and other resources away from virtual or physical machines to help DevOps teams build and run fault-tolerant, elastic distributed systems easily.
We’re just starting to test Mesos (along with Marathon) to run our entire software stack. While I do not have much to report at the moment, the tool does look very promising and we’re very happy with the results so far. I will certainly update this post as we have more data on it.
2. Kubernetes
Kubernetes is a tool that allows one to manage multiple Docker containers as a single unit to speed up development and simplify operations overall. Essentially, it is an open-source orchestration system that handles scheduling onto nodes in a single cluster, manages workloads, and groups containers into logical units for simplified management and discovery. We have been testing it for a small part of our environment and comparing it to a stack built on Mesos and Marathon. The jury is still out on this one.
3. Nagios (& Icinga)
It feels like tools such as Nagios and Zabbix have been here for ages. They are not cutting edge, but we love them and heavily use Nagios. It is still a very popular choice because of the large number of plugins that the open source community has made for the tool. However, Nagios did not have all of the features that we needed, so we had to find our own workarounds with homegrown monitoring and ELK (see the next item).
Icinga was originally created as a fork of Nagios that was intended to add new features and create a modern user interface. The user community has continuously debated the pros and cons of Nagios versus Icinga, so I will leave it for the community to decide. (For the record, we are continuing to use Nagios, but we may always change our mind in the future.)
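Much of that plugin ecosystem rests on a simple convention: a Nagios plugin is any executable that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Here is a minimal sketch of such a check in Python. The disk-usage metric, the thresholds, and the function names are illustrative assumptions, not a description of our own checks.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a (status code, status line) pair."""
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    # Text after the "|" is performance data that Nagios can graph.
    line = (
        f"DISK {LABELS[code]} - {used_percent:.1f}% used"
        f" | used={used_percent:.1f}%;{warn};{crit}"
    )
    return code, line


def check_disk(path="/"):
    """Measure usage of `path`; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total)


def main():
    """Entry point for use as a plugin: print the status line, return the exit code."""
    status, message = check_disk()
    print(message)
    return status
```

To use this as an actual plugin, the script would end with `sys.exit(main())` so that Nagios receives the status as the process exit code.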
4. ELK – Elasticsearch, Logstash, Kibana – via Logz.io
Elastic's ELK Stack is the most common log management platform in the modern IT world, and it is composed of three open-source software packages: Elasticsearch, Logstash, and Kibana. Elasticsearch is a NoSQL database that is based on the Lucene search engine. Logstash is a log pipeline tool that accepts inputs from various sources, executes different transformations, and exports the data to various targets. Kibana is a visualization layer that works on top of Elasticsearch.
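To make the input, transformation, and output stages concrete, here is a toy Python model of that flow. It is not Logstash's configuration language or API. The log format, field names, and function names are assumptions chosen purely for illustration.

```python
import json
import re

# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def read_input(lines):
    """Input stage: yield raw lines, skipping blank ones."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def transform(raw_lines):
    """Filter stage: parse each line into a structured event."""
    for raw in raw_lines:
        match = LINE_PATTERN.match(raw)
        if match:
            event = match.groupdict()
        else:
            # Keep unparseable lines, but tag them so they can be found later.
            event = {"message": raw, "tags": ["parse_failure"]}
        yield event


def write_output(events):
    """Output stage: serialize events as JSON documents, ready for indexing."""
    return [json.dumps(event, sort_keys=True) for event in events]


def run_pipeline(lines):
    """Chain the three stages: input -> transform -> output."""
    return write_output(transform(read_input(lines)))
```

Each stage only consumes what the previous one yields, which is what lets a pipeline tool swap sources and targets without touching the transformations in between.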
This is what our company (Logz.io) uses and offers as a cloud-based service, so it’s obviously a core tool in our monitoring and troubleshooting infrastructure. You’re welcome to check it out here (along with our free library of ELK Apps) or install the ELK Stack on-premises yourself. For the latter purpose, here are some of our informational resources:
- How to Install ELK on Amazon Web Services
- How to Deploy ELK in Production
- The Complete Guide to the ELK Stack
5. Chatops and Hubot
ChatOps is a new collaboration model that connects DevOps people, tools, and processes into a transparent, automated workflow that occurs through a group chat room such as Slack. GitHub coined the term and created Hubot, which (as the company dryly notes) has the potential both to improve and to decrease DevOps engineer efficiency. We rely heavily on Slack and Hubot to automate critical processes because they empower individuals to automate mundane tasks quickly and easily. We just love it.
6. Consul
Consul is used to assign DNS names to services. For example, DevOps engineers can give a single name to a cluster of several machines so that they need to access only that one entity, which makes work easier and more efficient. As such, Consul is useful for service discovery and configuration, particularly in applications that are built from microservices. Still, we imagine that Consul could be used for a lot more, and we look forward to seeing what the community will come up with next.
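The idea of one name standing in for a whole cluster can be sketched with a small in-memory registry. This is a toy model, not Consul's client API: the class, the service name `web`, and the addresses are all hypothetical. The `<service>.service.consul` form follows Consul's default DNS naming.

```python
class ServiceRegistry:
    """Toy stand-in for a service catalog: one name, many instances."""

    def __init__(self, domain="consul"):
        self.domain = domain
        self._instances = {}  # service name -> {address: is_healthy}

    def register(self, service, address, healthy=True):
        """Add an instance of `service`, or update its health status."""
        self._instances.setdefault(service, {})[address] = healthy

    def dns_name(self, service):
        # Consul's DNS interface exposes services as <service>.service.<domain>.
        return f"{service}.service.{self.domain}"

    def resolve(self, name):
        """Return the addresses of healthy instances behind a DNS-style name."""
        suffix = f".service.{self.domain}"
        if not name.endswith(suffix):
            raise ValueError(f"not a service name: {name}")
        service = name[: -len(suffix)]
        instances = self._instances.get(service, {})
        return sorted(addr for addr, ok in instances.items() if ok)


registry = ServiceRegistry()
registry.register("web", "10.0.0.1:8080")
registry.register("web", "10.0.0.2:8080")
registry.register("web", "10.0.0.2:8080", healthy=False)  # instance goes down
```

Callers only ever ask for `web.service.consul`; which machines answer, and whether one of them has dropped out, stays behind the name.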
7. Jenkins
Jenkins might not be the flashiest tool out there, but the ecosystem of plugins and add-ons is very simple to use and can be customized easily. At Logz.io, we use Jenkins to run tests, create Docker containers, build code, and push to staging and then production.
8. Docker
By now, everyone probably knows Docker. It makes configuration management, issue control, and scaling much easier through the use of containers that can be moved from place to place.
In our environment, for example, our ELK-as-service solution has a data processing pipeline that consists of twelve layers. We also use Docker containers to run a full pipeline through all of the layers on one Mac machine.
Still, it is not the simplest tool out there. For any users who have problems using Docker, we have written a short guide on how to solve the common problems that arise when migrating to Docker.
9. Puppet
Puppet is a DevOps tool that helps to automate the entire software deployment pipeline. First, users define the structure of their IT infrastructure, and then Puppet makes sure that the structure is met during the provisioning of physical and virtual machines, the orchestration and reporting, the initial code development, and the testing and release. We started our journey with Ansible, but as we grew in scale and complexity, we have migrated most of our components to Puppet.
10. Ansible
Ansible was a great and easy way to get started with config management. However, as we scaled out to many hundreds of servers with multiple microservices, it became too cumbersome. We still use Ansible for quick automation needs such as running a command across multiple nodes.
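For that quick, run-one-command-everywhere case, an Ansible ad hoc invocation is usually enough. The sketch below builds such a command line from Python. The inventory path `hosts.ini`, the host pattern, and the function names are placeholders for illustration, and the helper defaults to a dry run so nothing is executed by accident.

```python
import shlex
import subprocess


def ad_hoc_command(pattern, command, inventory="hosts.ini", forks=10):
    """Build an Ansible ad hoc invocation that runs `command` on matching hosts."""
    return [
        "ansible", pattern,
        "-i", inventory,   # inventory file listing the nodes (placeholder path)
        "-m", "command",   # built-in module that runs a single command
        "-a", command,     # the command line to execute on each node
        "-f", str(forks),  # how many hosts to contact in parallel
    ]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if dry_run:
        return None
    return subprocess.run(argv, check=False)
```

Calling `run(ad_hoc_command("webservers", "uptime"))` prints the command that would be run against the hypothetical `webservers` group; passing `dry_run=False` actually invokes Ansible.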
11. Collectd and Collectl
Collectd and Collectl collect various performance statistics — and the bonus is that they are more flexible than other similar tools. Most log collection tools are designed to measure specific parameters, but these two can monitor different ones in parallel.
In one example use case, we use Collectd and Collectl to collect customer performance data and then ship that information to our ELK SaaS platform. For more information on this specific process at our company, we have written a guide on how to use ELK to monitor platform performance.
12. Git (GitHub)
Git, which is now a decade old, is one of the most common source management tools, and it was created when the Linux community realized that it needed SCM software that could support distributed development. In our environment, we moved from plain Git to GitHub because of the latter’s wonderful forking and pull request features as well as its plugins that are able to connect with Jenkins.
Git is certainly not new, but it is important enough to include in any list of DevOps tools.
13. JMeter
JMeter is an open-source Java application that was originally designed to test the functional behavior and measure the performance of web applications. Today, it can also be used to simulate a heavy load on a server, a group of servers, a network, or an object to test its strength or to analyze overall performance under different load types. These functions make it one of the important continuous integration tools in the modern software delivery process. We didn’t want to spend time hosting it on our own, so we’re using BlazeMeter.
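The core idea of load simulation, many concurrent callers plus latency measurement, fits in a few lines of Python. This is not JMeter and is no substitute for it. The `target` callable stands in for a real request, and all names here are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(target):
    """Run one request and return (succeeded, elapsed seconds)."""
    start = time.perf_counter()
    try:
        target()
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def load_test(target, requests=100, concurrency=10):
    """Issue `requests` calls (at least one) from `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(target), range(requests)))
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "errors": sum(1 for ok, _ in results if not ok),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }
```

Raising `concurrency` while watching the median and maximum latency gives a rough picture of how the target behaves under different load types.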
14. Riemann
Riemann is a tool that aggregates server and application events with a stream processing language. It can be used to send an email for specific events, to track the latency distribution of a web app, to see the top processes on a host by memory and CPU, and to do many other monitoring activities. DevOps teams can also combine statistics from every single Riak node in a cluster and forward them to a visualization tool such as Kibana as well as track user activity in real time.
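Two of those use cases, tracking a latency distribution and alerting on specific events, can be sketched in Python. Riemann itself is configured in Clojure, so this only mirrors the idea. The event fields, the threshold, and the `notify` callback are assumptions made for the example.

```python
import math


def percentile(sorted_values, percent):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(percent * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def process(events, notify, threshold_ms=500.0):
    """Consume events; alert on slow or failed ones and summarize latency."""
    latencies = []
    for event in events:
        latencies.append(event["latency_ms"])
        # Alert on anything that is not healthy or is slower than the threshold.
        if event["state"] != "ok" or event["latency_ms"] > threshold_ms:
            notify(event)
    if not latencies:
        return {}
    latencies.sort()
    return {p: percentile(latencies, p) for p in (50, 95, 99)}
```

In a real setup, `notify` would send an email or post to a chat room; here it can be as simple as `alerts.append` on a list.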
Did I Miss Anything?
The community offers many open-source DevOps tools. One list, of course, cannot contain every single one, but these are the ones that I think are the best.
Of course, I might be wrong. Are there other tools that should replace some of these or are there newer ones that we may have missed? I invite you to comment below with your own favorite tools.