10 Essential Monitoring Interview Questions to Prep

By: Gedalyah Reback

February 3, 2020

You might be on the move. You might be breaking into the field. You might be trying to level up with your next gig. You might be ready for a manager’s role. Regardless of the reason, you are wondering what you need to re-study for your job interviews and “homework” assignments. Even with years of experience, every new gig expects you to answer different kinds of DevOps interview questions. You’re in luck: Logz.io can help a bit on logging and network performance monitoring interview questions.

We can give you a good perspective on what the log management world is seeking, but our teams have experience beyond logging and metrics. While there’s a high probability questions will be more specific, these will give you a good idea of what to expect. You’ll also find examples of follow-up or related questions below:

1. How do you know when you have sufficient amounts of logs?

Interviewers will gauge your understanding of what makes logs efficient. If you define a goal and then focus on specific data to include, you minimize the costs of storing that data. It also makes parsing the information easier—and quicker.

Interviewers might go into more detail, asking questions like “Around which events would you set up alerts?” or “What should the frequency of those alerts be?”
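One way to talk through the alert-frequency question is with a cooldown: fire an alert the first time an event occurs, then suppress repeats of that same event for a fixed window. The sketch below is only an illustration of that idea. The class name `AlertThrottle`, the event key format, and the 300-second window are all assumptions made for the example, not something a particular monitoring tool prescribes.

```python
import time
from typing import Callable, Dict


class AlertThrottle:
    """Suppress repeat alerts for the same event key within a cooldown window."""

    def __init__(self, cooldown_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_sent: Dict[str, float] = {}

    def should_alert(self, event_key: str) -> bool:
        """Return True if an alert for event_key should fire now."""
        now = self._clock()
        last = self._last_sent.get(event_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown window for this key
        self._last_sent[event_key] = now
        return True


# Hypothetical usage: at most one "disk full" alert per host every 5 minutes.
throttle = AlertThrottle(cooldown_seconds=300)
if throttle.should_alert("disk_full:web-01"):
    print("ALERT: disk full on web-01")
```

Keying the cooldown per event (rather than globally) means a noisy host does not hide a first-time alert from a different one. Real alerting systems usually add grouping and escalation on top of this.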

This article on logging best practices will provide more insight into the issue.

Possible follow-up questions:

1. How do you use the ELK Stack for application performance monitoring?
2. What is the role of NIDS in network monitoring? (Also, what are HIDS?)
3. How do you go about cloud monitoring at scale?

2. What should a SIEM tool provide?

There are certain consensus requirements for SIEM systems.

The idea here is to get a sense of your security instincts. Be aware of what companies expect from a SIEM: for instance, handling tool sprawl, managing data inputs, and filtering false positives.

If you have experience managing SIEMs, go into more detail to show your ability to navigate the tool. SIEMs, especially their initial deployment, are complicated.

There are also certain problems that characterize the modern challenge of making SIEM work: more data than ever, increasing integrations and APIs between platforms, and dealing with issues of scale in different ways than in non-security contexts.

Common follow-up questions:

1. What SIEM tools are available for the cloud?
2. What should you look for in a threat intelligence feed?
3. What is “next-generation SIEM”?

3. How do you use logs to troubleshoot Kubernetes?

Kubernetes has two main log types: application logs and cluster logs. Cluster logs will provide insight into events such as deployment errors.
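To make the two log types concrete, here is a small sketch that builds the `kubectl` commands you might reach for first: `kubectl logs` for a pod's application output, and `kubectl get events` as one view of cluster-level activity such as failed deployments. The helper names, the default namespace, and the tail length are assumptions for illustration. Events are also only one part of cluster logging (component logs from the control plane live elsewhere), and `run` needs `kubectl` installed and configured for a cluster.

```python
import subprocess
from typing import List, Optional


def app_logs_cmd(pod: str, namespace: str = "default",
                 container: Optional[str] = None,
                 previous: bool = False, tail: int = 100) -> List[str]:
    """Build a `kubectl logs` command for one pod's application logs."""
    cmd = ["kubectl", "logs", pod, "--namespace", namespace, f"--tail={tail}"]
    if container:
        cmd += ["--container", container]  # needed for multi-container pods
    if previous:
        cmd.append("--previous")  # logs from the last terminated container instance
    return cmd


def cluster_events_cmd(namespace: str = "default") -> List[str]:
    """Build a `kubectl get events` command, sorted by creation time."""
    return ["kubectl", "get", "events", "--namespace", namespace,
            "--sort-by=.metadata.creationTimestamp"]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if kubectl exits non-zero."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For a crash-looping pod, something like `run(app_logs_cmd("my-pod", previous=True))` followed by `run(cluster_events_cmd())` is a reasonable first pass, where `my-pod` is a placeholder name.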

This can lead to a range of related questions, from basics like “What are microservices?” and “What are the different parts of a Kubernetes cluster?” to “How do you run apps from Docker containers?” and “How would you debug Docker?”

Common follow-up questions:

1. How do you set up cluster-level logs in Kubernetes?
2. How do sidecar containers help organize log streams?
3. What is log rotation in Kubernetes?

4. How does X work with Elasticsearch? How do you run Y with the ELK Stack?

Integrations will be critical, especially in the monitoring world. In our case, we’re focusing on the ELK Stack (of course, there are other tools like Prometheus and Grafana to consider). Organizations differ in how they prefer to import logs or metrics from different sources into Elasticsearch or an alternative time-series database (TSDB), and then pass them on to a visualizer, which might not necessarily be Kibana or Grafana.
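As a concrete illustration of getting data from another source into Elasticsearch, the sketch below builds a request body for the `_bulk` API, which expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end. The function name, the index name `app-logs`, and the sample records are assumptions for the example. Actually sending the body (an HTTP POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json
from typing import Any, Dict, Iterable


def build_bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Return an NDJSON body that indexes each doc into `index` via _bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document source
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical rows pulled from some other database or message broker.
rows = [
    {"service": "checkout", "level": "error", "message": "payment timeout"},
    {"service": "search", "level": "info", "message": "reindex complete"},
]
body = build_bulk_body("app-logs", rows)
```

In practice a shipper such as Logstash or Filebeat usually does this work for you. Building the payload by hand is mostly useful for explaining what those tools send.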

Possible follow-up questions:

1. How would you index another database with Elasticsearch?
2. What is the best way to send data from a non-Elasticsearch DB to Kibana?
3. How would you send data from multiple message brokers to a single database?

5. Tell me which tools you’ve often worked with.

This is similar to points about Kubernetes above, but it’s somewhat more general. Different companies work with different programs with varying approaches to making them work together. Of course, all these platforms need monitoring.

HR and engineering leads don’t expect every candidate to have experience explicitly with those platforms. Rather, they want to see that you can learn the ins and outs of a specific system. If you say you have experience with Grafana but not Kibana, then you should expect a lot of pinpoint, deep questions about Grafana.

Knowing your way around one environment shows interviewers you can learn your way around another environment in a similar manner.

Example follow-up questions:

If you talk about Kibana:
1. How do you create a new Kibana dashboard?
If you talk about Elasticsearch:
2. What’s “split brain” among nodes in Elasticsearch?
If you talk about Logstash:
3. What’s the difference between the grok filter and mutate filter?

6. Describe a site outage and how you dealt with it.

Interviewers often ask candidates how they handle network outages or downtime.

There isn’t one definitive way to deal with an outage, but the interviewer in our case clearly wants to hear how you navigate system logs and metrics. Even if you haven’t dealt with a severe situation before, you will want to have a procedure in place.

In addition, you can demonstrate your awareness of the problem with how much you have prepared for that scenario. If you can show you don’t discount the possibility of a catastrophic outage, then the interviewer will see you take the job seriously.

Common follow-up questions:

1. Have you caused a site outage before?
2. What are some things you can do to prevent server outages?
3. What logs might contain data on the duration of downtime on Linux? (/var/log/boot.log)

7. Your database is running too slow. How do you speed it up?

There are some standard places to check that you can list in this answer (for example, network latency and app or SQL processing time), but this is also an important moment to demonstrate your awareness of logs and monitoring. While the first instinct might be to check queries, you can offer a detailed answer by distinguishing between different kinds of troubleshooting methods depending on the database. This is also a chance to go into detail about your own experience with a specific database.
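If you want to show how logging fits into this answer, a minimal sketch is to time each query on the application side and log any that cross a threshold. The example uses Python's built-in `sqlite3` only so that it runs anywhere. The function name `timed_query`, the logger name, and the 0.5-second default are assumptions. Time measured this way includes both database processing and any network round trip, so it shows that a query is slow, not yet why.

```python
import logging
import sqlite3
import time
from typing import Any, List, Sequence, Tuple

logger = logging.getLogger("slow_queries")


def timed_query(conn: sqlite3.Connection, sql: str,
                params: Sequence[Any] = (),
                threshold_seconds: float = 0.5) -> Tuple[List[tuple], float]:
    """Run a query, returning (rows, elapsed seconds); log it if it was slow."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed >= threshold_seconds:
        logger.warning("slow query (%.3fs): %s", elapsed, sql)
    return rows, elapsed


# Hypothetical usage against an in-memory database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
demo.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,)])
rows, elapsed = timed_query(demo, "SELECT id, total FROM orders WHERE total > ?", (10,))
```

Server-side features such as a database's own slow query log give the same signal without touching application code. The application-side version is handy when you cannot change the database configuration.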

Possible follow-up questions:

1. What are MySQL slow query logs, and how do you configure them?
2. How do you troubleshoot MongoDB when it comes to slow queries?
3. How do you monitor for DDoS attacks?

8. How do you debug Kubernetes or Docker?

What is Docker? How do Kubernetes and Docker work together?

This is obviously related to one of the sections above: debugging Kubernetes pods or Docker containers might make up the bulk of your future department’s headaches. Engineers prefer a number of different approaches. Providing detail about a certain method will demonstrate experience and competence, but it still might not match your interviewer’s or prospective employer’s preferred methods. That is okay. Emphasize your versatility in using different methods (especially when one doesn’t work). For something like this, be aware of two or three different options that you can describe in brief. Logz.io has some content on the subject of searching for debugging information in Kubernetes logs.

Common follow-up questions:

1. Describe Kubernetes architecture.
2. How do you monitor Kubernetes?
3. What is kubectl?

9. Are you familiar with this AWS product?

This one persists. AWS and cloud architecture in general are a shifting sandbox for developers. With new releases and demands on Amazon (or other services like Azure or GCP), what concerns interviewers and team leaders will change year-to-year, role-to-role, and industry-to-industry.

The biggest questions to look out for based on trends for AWS in the coming year are things like “Have you worked in multi-cloud infrastructure?” and “Are you familiar with Lambda / serverless computing?”

Common follow-up questions:

1. What types of load balancers does AWS support?
2. Explain the difference between CloudTrail and CloudWatch.
3. What is AWS Kinesis?
4. How would you monitor a multi-cloud architecture?

10. In your experience, what was the most important aspect of the logs you kept?

[Figure: Log Management Dashboard in Logz.io]

Every app—and every industry—has its own particular needs. Each app will have its own predefined goals that will inform how you structure your logs. For more info on this, including log frameworks, deciding what to (and not to) log, and formatting, check out our logging best practices.
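To ground the point about structuring and formatting logs, here is a minimal sketch of a JSON formatter built on Python's standard `logging` module. Emitting one JSON object per line is a common way to make logs easy to parse downstream. The class name `JsonFormatter` and the extra fields `order_id` and `user_id` are assumptions chosen for the example, and a real app would pick fields that match its own goals.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    # Hypothetical app-specific fields to copy over when present.
    EXTRA_FIELDS = ("order_id", "user_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


# Hypothetical usage: attach the formatter to a stream handler.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
app_logger = logging.getLogger("checkout")
app_logger.addHandler(handler)
app_logger.warning("payment retry", extra={"order_id": "A-1001"})
```

Listing the extra fields explicitly is a deliberate choice here: it keeps the log schema predictable, which is part of deciding what to log and what to leave out.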

Common follow-up questions:

1. How do you display systemd logs?
2. What is the difference between Filebeat and Logstash?
3. What are ingest nodes in Elasticsearch?

Conclusion

These ten topics clearly overlap in places, but each one could still be explored in much greater depth on its own. Keeping up with the flurry of new platforms and where they fit into each company’s DevOps strategy is a struggle, even more so for prospective employees than for current ones.