ELK Stack and Twitter

Yesterday, we shared the Logz.io 2016 U.S. Election Real-Time Dashboard, a series of Kibana visualizations that depict public sentiment towards the presidential candidates and the topics being hotly debated in the media. These visualizations were created based on data streamed via the Twitter API and then analyzed by Logz.io, our ELK-as-a-service platform.

It’s no secret that the ELK Stack is the world’s most popular open-source log analysis platform. But a fact that is less well-known is that companies worldwide are using ELK to do a lot more than just log analysis. In fact, not a week goes by without us hearing stories from our customers about new use cases, whether for technical SEO, log-driven development, or business intelligence.

This article describes the technical story of how we created the election dashboard. Specifically, we will show how to ship Twitter data to Logz.io with the Logstash Twitter plugin and then create visualizations in Kibana. Please note that to follow the steps outlined here, you need both a Twitter and a Logz.io account.

Creating a Twitter App

To establish a connection with Twitter and extract data, we will need Twitter API keys. To get your hands on these keys, you will first need to create a Twitter app.

Go to the Twitter apps page, and create a new app. You will need to enter a name, description, and website URL for the app. Don’t worry about the particulars; your entries here will not affect how the data is shipped into Elasticsearch.

Once created, open the app’s Keys and Access Tokens tab, and click the button at the bottom of the page to generate a new access token:

(Screenshot: generating a new access token for the Twitter app)

Keep this page open in your browser because we will need these keys and tokens to set up the feed in Logstash.

Installing Logstash

Our next step is to install Logstash.

Logstash, the “L” in the “ELK Stack,” is used at the beginning of the log pipeline to ingest and collect logs before sending them on to Elasticsearch for indexing. Log analysis is the most common use case, but any type of event can be forwarded into Logstash and parsed using plugins.

To install Logstash, first download and install the public signing key:

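# Download and add Elastic's public signing key
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -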

Then, add the repository definition to your /etc/apt/sources.list file:

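For example, for the Logstash 2.3 packages (the version here is illustrative, so substitute the one you want to install):

# Append the Elastic repository definition to the apt sources list
echo "deb https://packages.elastic.co/logstash/2.3/debian stable main" | sudo tee -a /etc/apt/sources.list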

Finally, update your system so the repository is ready for use, and install Logstash with:

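sudo apt-get update && sudo apt-get install logstash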

Configuring Logstash

Now that Logstash is installed, we need to configure it to receive input from Twitter and then forward it to the Logz.io-hosted Elasticsearch.

Logstash configuration files are written in a JSON-like format and reside in /etc/logstash/conf.d. The configuration consists of three sections: inputs, filters, and outputs.

Let’s create a configuration file called ‘twitter.conf’:

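sudo vi /etc/logstash/conf.d/twitter.conf   # any text editor will do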

First, enter the input:

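Here is a sketch of the input section, with placeholder credentials and example keywords (track whatever terms you like):

input {
  twitter {
    # OAuth credentials from the Keys and Access Tokens tab of your Twitter app
    consumer_key => "<YOUR_CONSUMER_KEY>"
    consumer_secret => "<YOUR_CONSUMER_SECRET>"
    oauth_token => "<YOUR_ACCESS_TOKEN>"
    oauth_token_secret => "<YOUR_ACCESS_TOKEN_SECRET>"
    # Example keywords: candidate hashtags and Twitter handles
    keywords => ["#hillaryclinton", "#donaldtrump", "#berniesanders", "#tedcruz",
                 "@HillaryClinton", "@realDonaldTrump", "@BernieSanders", "@tedcruz"]
    # Optional: ship the complete tweet object rather than a trimmed version
    full_tweet => true
  }
}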

Be sure to update the consumer_key, consumer_secret, oauth_token, and oauth_token_secret values with the values from the Twitter app that you created in the first step.

You can choose any keywords you like, but you must maintain this specific syntax. For the U.S. Election dashboard, we used the most common hashtags for the leading candidates in both parties (Clinton, Trump, Sanders, Cruz) as well as their Twitter handles, but you could, of course, enter any keyword that you want to track.

Next, enter a filter as follows (enter your Logz.io token in the relevant placeholder):

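A minimal sketch, using a mutate filter to stamp each event with your account token (a common pattern for shipping to Logz.io from Logstash):

filter {
  mutate {
    # Add your Logz.io account token to every event so that the
    # listener can route the data to your account
    add_field => { "token" => "<YOUR_LOGZIO_TOKEN>" }
  }
}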

Last but not least, define the output to the Logz.io ELK Stack as follows:

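A sketch of the output section (the listener host and port shown here are illustrative, so confirm the current endpoint in the Logz.io documentation):

output {
  tcp {
    host => "listener.logz.io"   # Logz.io listener endpoint
    port => 5050                 # illustrative port; verify in the Logz.io docs
    codec => json_lines
  }
}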

If you’re shipping to a local instance of Elasticsearch, your Logstash configuration would look like this:

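output {
  elasticsearch {
    hosts => ["localhost:9200"]  # local Elasticsearch instance
    index => "twitter"           # an index name of your choice
  }
}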

Once done, restart Logstash:

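sudo service logstash restart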

Data from Twitter should start showing up in the Kibana interface integrated into Logz.io almost immediately.

Logstash Configuration Options

There are additional options that you can apply to the Logstash Twitter input to tweak the data flowing into Elasticsearch. For example, you can configure Logstash to exclude retweets by setting ignore_retweets to true:

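input {
  twitter {
    # ... consumer key/secret, tokens, and keywords as before ...
    ignore_retweets => true   # drop retweets from the stream
  }
}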

All of the available configuration options are listed in the documentation for the Logstash Twitter input plugin.

Analyzing Trends

While we can query Elasticsearch as soon as data begins to appear, it’s best to allow the feed from Twitter to run for a day or two to have a larger pool of data from which to pull.

Searching

You can begin to use the Kibana instance integrated into the Logz.io user interface to search for the data you’re looking for. If you’re tracking public sentiment regarding your company’s brand, for example, you could query the brand name itself and check its correlation with expressions of sentiment.

Querying options in Kibana are varied — you can start with a free-text search for a specific string or use a field-level search. Field-level searches allow you to search for specific values within a given field with the following search syntax:

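field:value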

For example, you could search all the ingested tweets for mentions of your brand using the Twitter ‘text’ field (this field represents the actual tweet text):

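text:yourbrandname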

Or, you could try something slightly more advanced using logical statements or proximity searches. We cover these search options in this Kibana tutorial.

For the U.S. Election dashboard, we used free-text searches, regular expression searches, and logical statements to draw correlations between the various candidates and specific sentiments.

For example, we used a regular expression query to create the Animosity Index (the number of times that people tweet a certain phrase at each candidate).
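As an illustrative sketch (not the actual query we ran), a regexp search against the tweet ‘text’ field might look like this:

text:/hate(s|ful)?/

Keep in mind that regexp queries match individual terms in the analyzed text field, so single-word patterns are the most reliable; to attribute matches to a particular candidate, combine the pattern with a filter on that candidate’s name or handle.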

Visualizing

Once you have narrowed the available data down to the information that interests you, the next step is to create a graphical depiction of the data so that you can identify trends over time.

As an example, I’ll describe how we created the Mentions Over Time visualization, showing mentions of the four candidates over time.

For this visualization, we used the Line Chart visualization type.

Using the entire data pool as our search base, we configured the following settings:

  • Y-Axis – Count aggregation
  • X-Axis – Date Histogram using the @timestamp field with an hourly interval, together with Split Lines using four filters (trump, clinton, cruz, sanders)

(Screenshot: the Mentions Over Time line chart)

Another example is creating a map depicting the geographic locations of tweeps.

Using a saved search, open the TileMap visualization and then select the Geo Coordinates bucket type and ‘coordinates.coordinates’ in the field drop-down:

(Screenshot: tile map showing the geographic locations of tweets)

It’s important to point out that the data ingested into Elasticsearch via the Twitter API is not 100% complete. Some fields have null values, and the values of others depend on how the original tweets were composed. In this case, the ‘coordinates.coordinates’ field reflects only those Twitter users who used Twitter’s location feature. (Note: This last step requires some manual mapping. If you’re using Logz.io, please contact Support for help.)

These are just simple examples of what can be done with your Twitter data in Kibana. We would love to hear in the comments below what you thought of this article and what additional ways you are using the ELK Stack.
