Production environment stability and high availability are the holy grail of every SaaS company. R&D organizations put a lot of effort into achieving these goals by implementing different monitoring and alert methodologies and by utilizing a variety of systems and tools. Mean-time-to-detect (MTTD) and mean-time-to-repair (MTTR) are two crucial KPIs that help R&D management personnel determine the efficiency and proficiency of their teams’ responses to production incidents. DevOps and SRE teams are always looking for monitoring tools that can improve their MTTD. This article compares two such tools: Prometheus and Nagios.
Prometheus is a metric collection tool that works with time series data. Many R&D organizations choose Prometheus as their main monitoring data source because it easily fits into most software architectures, integrates swiftly with most modern technologies, and is convenient to set up and maintain. Prometheus comes with a built-in database for collected time series data, a designated query language (PromQL) for leveraging this database’s multi-dimensionality, and a service discovery ability that helps to monitor new components and services as soon as they deploy as part of the application stack. Prometheus exporters allow for the collection of data from services that Prometheus cannot instrument and automatically identify, and the Prometheus Alertmanager pushes notifications about threshold breaches to external collaboration and on-call tools.
Prometheus users generally tend to choose Grafana as their preferred tool for visualizing the data Prometheus collects, since Prometheus’ user interface is considered somewhat primitive. Grafana’s dashboards and graphs make it possible to query and display metrics from Prometheus as well as to integrate Prometheus’ data with data from other sources.
Nagios is an industry leader in IT infrastructure monitoring. It offers multiple solutions to meet R&D needs, addressing both business and technical challenges. Nagios facilitates the high availability of applications by providing information about database performance. It can also help with capacity planning and cost management. Nagios has four different products to choose from: Nagios XI, Nagios Log Server, Nagios Network Analyzer, and Nagios Fusion. See the features’ descriptions below.
Nagios XI is an enterprise-ready server and network monitoring system that supplies data to track the health, performance, and availability of the components, protocols, and services in an app or network infrastructure. It has a user-friendly interface that supports configuration through the UI, customized visualizations, and alert preferences.
Nagios Log Server
While Nagios XI is mostly for monitoring 1) application or infrastructure metrics and 2) thresholds, the Nagios Log Server is for log management and analysis of user scenarios. It has the ability to correlate logged events across different services and servers in real time, which helps with the investigation of incidents and the performance of root cause analyses.
Because Nagios Log Server’s design is specifically for network security and audits, it lets users generate alerts for suspicious operations and commands. Log Server retains historical data from all events, supplying organizations with everything they need to pass a security audit.
Nagios Network Analyzer
Nagios Network Analyzer is a tool for collecting and displaying metrics and additional information about an application network. It identifies which IPs are communicating with the application servers and what requests they’re sending. The Network Analyzer maintains a record of all server traffic, including who connected to a specific server, on which port, and with what request.
This helps with planning server and network capacity, and with understanding various kinds of security incidents such as unauthorized access, data leaks, DDoS attacks, and viruses or malware on servers.
Nagios Fusion brings together the other three tools Nagios offers. It provides a complete solution that assists businesses in satisfying their monitoring requirements. It is designed for scalability and for visibility into the application and all of its dependencies.
Prometheus vs Nagios: Comparing the Tools
Prometheus and Nagios offer different functionalities. Primarily, Nagios focuses more on application network traffic and security, while Prometheus focuses on the application itself and its infrastructure.
Prometheus collects data by periodically scraping (pulling) metrics over HTTP from endpoints that applications or exporters expose. Nagios uses agents that are installed on both the network elements and the components that it monitors; they also collect data using a pull methodology.
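To make the collection model concrete, here is a minimal sketch of an application that exposes a metric over HTTP in the Prometheus text exposition format, so that a Prometheus server could scrape it. It uses only the Python standard library rather than an official client library; the metric name `demo_requests_total`, the `/metrics` path, and port 8000 are assumptions chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would update these as it works.
METRICS = {"demo_requests_total": 0}


def render_metrics(metrics):
    """Render counters as lines of the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the metrics only on the path the scraper is assumed to request.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrape requests on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A Prometheus server configured with this host and port as a scrape target would then fetch `/metrics` on its own schedule, which is the pull behavior described above.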
Nagios can also use the Simple Network Management Protocol (SNMP) to query the status of network switches and other components. For Windows-based tools, Nagios uses Windows Management Instrumentation (WMI) for communication and data collection.
As previously mentioned, the graphs and dashboards Prometheus provides don’t meet today’s DevOps needs. As a result, users resort to other visualization tools to display metrics collected by Prometheus, often Grafana.
Nagios comes with a set of dashboards that fit the requirements of monitoring networks and infrastructure components. Yet, it still lacks graphs for application-level issues.
Setup and Maintenance
Nagios comes as a downloadable bundle with dedicated packages for every product with Windows or Linux distributions. After downloading and installing the tool, a set of first-time configurations is required. Once you’ve installed the Nagios agents, data should start streaming into Nagios and its generic dashboards.
However, Prometheus deployment is simpler since there is a Docker image that can spin up on every machine type. Additionally, Prometheus’ maintenance requires only storage upkeep and the deployment of the exporters for non-instrumented services and tools.
Prometheus’ integrations are practically boundless. The long list of existing exporters combined with the user’s ability to write new exporters allows integration with any tool, and PromQL allows users to query Prometheus data from any visualization tool that supports it.
Nagios has a very limited list of official integrations. Most of them are operating systems which use the agents to monitor other network components. Others include MongoDB, Oracle, Selenium, and VMware.
Prometheus handles alerting with Alertmanager: users define thresholds as alerting rules in Prometheus, and Alertmanager, a simple companion service, pushes notifications when breaches occur.
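The idea of a threshold breach can be sketched in a few lines. The example below is not Prometheus or Alertmanager code; it is a simplified illustration, under assumed names (`ThresholdRule`, `hold_seconds`), of a rule that only fires once a value has stayed above its threshold for a set time, which helps avoid alerting on brief spikes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire when a value stays above `threshold` for `hold_seconds`."""
    threshold: float
    hold_seconds: float
    breached_since: Optional[float] = None  # time at which the current breach began

    def evaluate(self, value: float, now: float) -> str:
        # Back under the threshold: forget any breach in progress.
        if value <= self.threshold:
            self.breached_since = None
            return "inactive"
        # First sample over the threshold: remember when the breach began.
        if self.breached_since is None:
            self.breached_since = now
        # Fire only once the breach has lasted long enough.
        if now - self.breached_since >= self.hold_seconds:
            return "firing"
        return "pending"
```

For example, a rule created as `ThresholdRule(threshold=0.9, hold_seconds=60)` reports `pending` on the first sample above 0.9 and `firing` if the value is still above 0.9 sixty seconds later.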
Nagios uses a variety of media channels for alerts, including email, SMS, and audio alerts. Because its integration with the operating system is swift, Nagios even knows to generate a WinPopup message with the alert details.
On a side note, if you’re curious, there is a Nagios plugin that raises alerts based on Prometheus query results.
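To illustrate how such a bridge could work, here is a rough sketch of a Nagios-style check that runs a PromQL query against the Prometheus HTTP API and maps the result to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is not the plugin mentioned above; the function names, the thresholds, and the assumption that the query returns an instant vector are all illustrative.

```python
import json
import urllib.parse
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value, warn, crit):
    """Map a numeric value to a Nagios status; higher values are assumed to be worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def first_sample(payload):
    """Return the first sample value of an instant-vector response, or None."""
    if payload.get("status") != "success":
        return None
    result = payload.get("data", {}).get("result", [])
    if not result:
        return None
    # Each vector sample carries "value": [timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def query_prometheus(base_url, promql, timeout=10):
    """Run an instant query through the /api/v1/query endpoint."""
    query = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check(base_url, promql, warn, crit):
    """Return (exit_code, message) in the shape a Nagios plugin would report."""
    try:
        value = first_sample(query_prometheus(base_url, promql))
    except Exception as exc:  # network or parsing failure
        return UNKNOWN, f"UNKNOWN - query failed: {exc}"
    if value is None:
        return UNKNOWN, "UNKNOWN - query returned no data"
    status = classify(value, warn, crit)
    label = ["OK", "WARNING", "CRITICAL"][status]
    return status, f"{label} - value={value}"
```

A wrapper script would call `check(...)` with the address of a Prometheus server (for instance `http://localhost:9090`, assuming the default port), print the message, and exit with the returned code so that Nagios can interpret the result.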
Nagios Core is an open-source tool. It provides basic monitoring and comes with a limited list of agents. The community isn’t actively updating it, and it doesn’t have many contributors, watchers, or forks. On GitHub, Prometheus has been forked about 4,900 times, while Nagios Core has been forked just over 300 times.
On the other hand, Prometheus is one of the biggest open-source projects in existence, with hundreds of contributors maintaining it. The tool keeps pace with contemporary and popular apps, extending its list of exporters and responding to requests.
There is also a specific Prometheus Monitoring Community on GitHub that works on a number of projects.
Pros and Cons
Prometheus has two main advantages: 1) its ability to integrate with nearly every system in the industry, and 2) its ease of use. Nonetheless, it has a massive Achilles’ heel: scaling. As an application scales (along with its monitoring framework), Prometheus’ real-time time series data is affected, resulting in an increase in maintenance efforts.
This is where the underdog has an advantage in the Prometheus vs Nagios battle. One of Nagios’ main pros is its ability to scale out of the box. Additionally, Nagios is simple to maintain and highly customizable, making it a flexible fit for a wide range of application and network infrastructures.
Runtastic Migrated from Nagios to Prometheus
In his PromCon 2019 talk, Niko Dominkowitsch, a lead Infrastructure Engineer at Runtastic, explained that his company decided to move from Nagios, which was their leading monitoring system, to Prometheus. The main reasons for the migration were the degree of configuration effort Nagios required and the many false positive alerts that it had generated.
Setting up a battle of Prometheus vs Nagios is a contest of the most popular guy in the neighborhood against someone who stands out among his own clique. Prometheus and Nagios are very different in their designs, their audiences, and their capabilities.
Prometheus is useful for monitoring app functionality, while Nagios is a very powerful platform for application networks and security. However, Prometheus has the edge in performance metrics. Because the two tools play different roles in DevOps monitoring stacks, the data each provides is only part of the whole application status picture. Integrating and coordinating both of these tools might be one way to go.
Together, they can help DevOps teams monitor real-time app status, enhancing their ability to react quickly.