Implementing Synthetic Monitoring with Telegraf and Logz.io

By: Doron Bargo

In my previous blog post, we explored key questions about Synthetic Monitoring, such as what it is, why it’s important, how it works, and how it compares to Real-User Monitoring. Synthetic Monitoring is becoming an increasingly popular way to continuously monitor the uptime of applications and the critical flows within them, so that DevOps, IT, and engineering teams are quickly alerted when issues arise.

Unfortunately, a good Synthetic Monitoring tool can be expensive. There are two common pricing units: per endpoint monitored or per transaction run. As your cloud workloads grow, you will have more endpoints to monitor and, therefore, more to pay. Let’s look at a few options for cost-efficient Synthetic Monitoring.

Implementing Synthetic Monitoring with Telegraf

For internal use, behind a firewall, you can run a Telegraf agent for free to collect Synthetic Monitoring data. Telegraf is an open source agent that is part of the TICK stack – a collection of associated technologies consisting of Telegraf, InfluxDB, Chronograf, and Kapacitor.

Telegraf boasts an impressive collection of API integrations that run the gamut of the DevOps toolshed. It is written in Golang and can also export metrics to many of the same tools it pulls metrics from.

To monitor the availability of internal sites, use the ping input plugin – you can integrate it with almost any observability solution you have.

[[inputs.ping]]
  ## Hosts to send ping packets to.
  urls = ["example.org"]
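Beyond the host list, the ping plugin accepts a few tuning options. The following is a minimal sketch rather than a recommended setup: the hostnames are hypothetical placeholders, and the values are assumptions you should adjust to your own network.

```toml
[[inputs.ping]]
  ## Hypothetical internal hosts; replace with your own.
  urls = ["intranet.example.internal", "10.0.0.10"]

  ## Number of ping packets to send per collection interval.
  count = 3

  ## Seconds to wait for each ping reply.
  timeout = 2.0

  ## Use Telegraf's built-in pinger instead of the system ping binary.
  method = "native"
```

With a configuration like this, the plugin should report fields such as percent_packet_loss and average_response_ms for each host, which are convenient to alert on.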

For internal API services, you can use the HTTP response input plugin, which supports multiple request methods, username and password authentication, and response payload checks.

[[inputs.http_response]]
  ## address is Deprecated in 1.12, use 'urls'

  ## List of urls to query.
  # urls = ["http://localhost"]

  ## Set http_proxy (telegraf uses the system wide proxy settings if it is not set)
  # http_proxy = "http://localhost:8888"

  ## Set response_timeout (default 5 seconds)
  # response_timeout = "5s"

  ## HTTP Request Method
  # method = "GET"

  ## Whether to follow redirects from the server (defaults to false)
  # follow_redirects = false

  ## Optional file with Bearer token
  ## file content is added as an Authorization header
  # bearer_token = "/path/to/file"

  ## Optional HTTP Basic Auth Credentials
  # username = "username"
  # password = "pa$$word"

  ## Optional HTTP Request Body
  # body = '''
  # {'fake':'data'}
  # '''

  ## Optional name of the field that will contain the body of the response.
  ## By default it is set to an empty String indicating that the body's content won't be added
  # response_body_field = ''

  ## Maximum allowed HTTP response body size in bytes.
  ## 0 means to use the default of 32MiB.
  ## If the response body size exceeds this limit a "body_read_error" will be raised
  # response_body_max_size = "32MiB"

  ## Optional substring or regex match in body of the response (case sensitive)
  # response_string_match = "\"service_status\": \"up\""
  # response_string_match = "ok"
  # response_string_match = "\".*_status\".?:.?\"up\""

  ## Expected response status code.
  ## The status code of the response is compared to this value. If they match, the field
  ## "response_status_code_match" will be 1, otherwise it will be 0. If the
  ## expected status code is 0, the check is disabled and the field won't be added.
  # response_status_code = 0

  ## Optional TLS Config
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false
  ## Use the given name as the SNI server name on each URL
  # tls_server_name = ""

  ## HTTP Request Headers (all values must be strings)
  # [inputs.http_response.headers]
  #   Host = "github.com"

  ## Optional setting to map response http headers into tags
  ## If the http header is not present on the request, no corresponding tag will be added
  ## If multiple instances of the http header are present, only the first value will be used
  # http_header_tags = {"HTTP_HEADER" = "TAG_NAME"}

  ## Interface to use when dialing an address
  # interface = "eth0"
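Most of the options in the sample above are commented-out defaults. As a sketch, a working check against a hypothetical internal health endpoint might enable only a handful of them. The URL, the 60-second interval, and the expected body are assumptions for illustration.

```toml
[[inputs.http_response]]
  ## Hypothetical internal health endpoint; replace with your own.
  urls = ["http://localhost:8080/health"]

  ## Run this check once a minute, regardless of the agent-wide interval.
  interval = "60s"

  method = "GET"
  response_timeout = "5s"

  ## Sets the "response_status_code_match" field to 1 when the API returns 200.
  response_status_code = 200

  ## Also verify the payload, not just the status code.
  response_string_match = "\"service_status\": \"up\""
```

Checking both the status code and the payload catches the case where an API returns 200 while reporting itself as degraded.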

Last but not least, if you are working in a secure environment, your internal traffic is probably protected by using a certificate. Although thousands of SSL certificates expire on a daily basis, in most cases, these seemingly insignificant moments go unnoticed. Our websites stay secure, our servers keep humming, and it’s business as usual until it is not.

For example: Epic Games – maker of fan favorites such as Fortnite, Rocket League, and Houseparty – experienced a massive outage due to (you guessed it) an expired SSL certificate.

“On April 6, 2021, we had a wildcard TLS certificate unexpectedly expire. It is embarrassing when a certificate expires, but we felt it was important to share our story here in hopes that others can also take our learnings and improve their systems. If you or your organization are using certificate monitoring, this may be a good reminder to check for gaps in those systems.”

To check for these gaps with Telegraf, you can use the x509 certificate input plugin:

[[inputs.x509_cert]]
  ## List certificate sources, support wildcard expands for files
  ## Prefix your entry with 'file://' if you intend to use relative paths
  sources = ["tcp://example.org:443", "https://influxdata.com:443",
            "udp://127.0.0.1:4433", "/etc/ssl/certs/ssl-cert-snakeoil.pem",
            "/etc/mycerts/*.mydomain.org.pem", "file:///path/to/*.pem"]
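To try the certificate check locally before wiring it into a real backend, a small standalone configuration is enough. This is a sketch under assumptions: the endpoint is a hypothetical placeholder, the hourly interval is an arbitrary choice, and the file output is used only to print metrics to the terminal.

```toml
## Sketch of a standalone telegraf.conf for a quick local trial.
[agent]
  ## Certificates change slowly, so an hourly check is assumed to be enough.
  interval = "1h"

[[inputs.x509_cert]]
  ## Hypothetical internal endpoint; replace with your own.
  sources = ["https://intranet.example.internal:443"]

  ## Give up on a source that does not answer in time.
  timeout = "5s"

[[outputs.file]]
  ## Print collected metrics to stdout instead of shipping them anywhere.
  files = ["stdout"]
```

Running telegraf with --config pointing at this file and the --test flag should gather one round of metrics and print it. As far as I know, the plugin reports an expiry field holding the number of seconds until the certificate expires, which is the natural value to alert on.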

Synthetic Monitoring can be critical for identifying and resolving a production event before it causes widespread customer impact – making it an essential safeguard against poor customer experiences and lost revenue. However, it doesn’t need to be expensive. Unsurprisingly, the open source community has risen to the occasion with tools such as Telegraf to make this business-critical function free.

Public Synthetic Monitoring Using Logz.io Lambda Solutions

Running a Telegraf agent behind a firewall is a good fit for those who need quick, cheap Synthetic Monitoring.

However, this might be challenging when you run your apps on public infrastructure and your architecture is mostly serverless. An agent for Synthetic Monitoring needs a server to run on, and you need to monitor this server as well. To solve this problem, Logz.io released three Lambda solutions that are based on Golang to guarantee the best performance for serverless Synthetic Monitoring.

These functions collect and send data to your Logz.io account, which you can create here if you don’t already have one.

Monitor Site Availability

To monitor site availability, you can use our Lambda ping solution and deploy it in multiple regions for better coverage. The deployment is very easy – just press the “Launch stack” button and set the relevant parameters.

This will collect and send site reliability data to Logz.io. You can monitor your site availability with our prebuilt dashboard and alerts.

Monitor API Availability

For API availability you can use our API status Lambda, which, similar to Telegraf, supports multiple request methods, authentication and payload validation. This deployment is also very easy – just press the “Launch stack” button and fill in the relevant parameters.

This will collect and send API availability data to your Logz.io account. You can monitor your API availability with our prebuilt dashboard and alerts.

Monitor Certificate Availability

To monitor your public certificates, you can use our x509 certificate Lambda.

Here, too, the deployment is very easy: just press the “Launch stack” button and fill in the relevant parameters.

This will collect and send certificate availability data to your Logz.io account. You can monitor your certificate availability with our prebuilt dashboard and alerts.

Try it yourself!

Logz.io’s Synthetic Monitoring is a new use case customers can implement with our Cloud-Native Observability Platform. Specifically, those who sign up for Infrastructure Monitoring will be able to monitor their synthetics data on customizable dashboards.

At Logz.io, we live and breathe open source. Our entire stack is based on popular open source technologies: OpenSearch and OpenSearch Dashboards for log storage and visualization, M3DB for metrics storage (fully compatible with Prometheus), and OpenSearch and Jaeger to store and visualize traces. That is why all our telemetry collection code is public as well.

You can fork it for your own use cases or use it to send data to the Logz.io platform. To get started, sign up for a free trial here.