Note: Following the official nominations, we’ve removed both Ted Cruz and Bernie Sanders from our analysis and the dashboard.
There is now little doubt that the 2016 U.S. presidential election campaign has been one of the most controversial and heated campaigns to date. There are a number of reasons for this, including the candidates themselves (some more colorful than others), the rhetoric they use, and the overall political atmosphere and circumstances in which the campaign is taking place.
With emotions running high, it’s no surprise that passionate political debate is rampant on all social media outlets. Thanks (or no thanks) to the digital social revolution, opinions that used to be bottled up within confined media channels or the four walls of one’s private life are out there for all to see.
Wouldn’t it be intriguing to get insights into this public sentiment?
We thought so. To see the overall trends in what people were saying about the candidates and the issues that matter to them, Logz.io created the 2016 U.S. Election Real-Time Dashboard by ingesting Twitter data into our machine-learning log analysis platform. The data was ingested into Elasticsearch and then displayed in a Kibana dashboard.
Here is one of the visualizations in the dashboard:
Aggregating more than 4.6 million tweets in one month and then analyzing them to ascertain where tweeps stand is a technological, social, and political exercise that should interest political pundits, the campaigns themselves, and, of course, techies such as ourselves.
After opening the dashboard, just refresh the window to see updated metrics!
How did we gather the data?
Any Big Data analysis requires a fully secure, scalable, and reliable storage and indexing platform.
Our tool of choice is the ELK Stack (Elasticsearch, Logstash, Kibana) — the world’s most popular open-source log analysis platform. But instead of ingesting log files, we fed the system with tweets using Twitter’s streaming API. On top of the aggregated data, we created a series of graphic visualizations that best depict the Twitter trends.
A more technical description of how we executed the data aggregation and analysis will be explored in a future article.
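Until then, here is a minimal sketch of the general idea of turning tweets into documents that Elasticsearch can index. It is not the pipeline behind the dashboard (Logstash can handle this step on its own); it only shows the newline-delimited format that the Elasticsearch Bulk API expects. The index name `election-tweets` and the choice of fields to keep are assumptions made for illustration.

```python
import json

def to_bulk_payload(tweets, index_name="election-tweets"):
    """Build an Elasticsearch Bulk API body (NDJSON) from tweet dicts.

    `index_name` and the selected fields are illustrative assumptions,
    not the settings used for the dashboard.
    """
    lines = []
    for tweet in tweets:
        # Action line: index the document, reusing the tweet id as the document id
        lines.append(json.dumps({"index": {"_index": index_name, "_id": tweet["id_str"]}}))
        # Source line: keep only a few fields from the tweet object
        lines.append(json.dumps({
            "text": tweet["text"],
            "created_at": tweet["created_at"],
            "user": tweet["user"]["screen_name"],
        }))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

The returned string could then be sent to the `_bulk` endpoint of an Elasticsearch cluster with any HTTP client.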
What data did we analyze?
The graphs in the dashboard depict public sentiment, as expressed on Twitter, regarding the four leading presidential candidates (in alphabetical order): Hillary Clinton, Ted Cruz, Bernie Sanders, and Donald Trump. As we approach the general election and the nominated candidates are officially selected, we will update the dashboard.
The Legend in the dashboard explains the various indexes that we are measuring and how they are being measured. The selected keywords tracked include both the personal Twitter handles of these candidates and their most commonly used hashtags. We cross-referenced the tweets with additional relevant keywords for each visualization.
For example, for the “Animosity Index,” we cross-referenced tweets to each of the candidates with the words “f— you” (we’re not kidding!).
The timeframe of the analysis is the prior seven days.
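In Kibana, a rolling window like this is normally set with the time picker. For readers who want to reproduce the same filter outside the dashboard, here is a rough sketch that assumes each tweet carries the `created_at` string in the format returned by Twitter's API. The function name and the `days` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Format of Twitter's `created_at` field, e.g. "Mon May 09 10:00:00 +0000 2016"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def in_analysis_window(created_at, now, days=7):
    """Return True if the tweet was posted within the `days` before `now`."""
    posted = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return now - timedelta(days=days) <= posted <= now

# Example: keep only the tweets from the seven days before a fixed reference time
now = datetime(2016, 5, 10, 12, 0, tzinfo=timezone.utc)
timestamps = ["Mon May 09 10:00:00 +0000 2016", "Sun May 01 10:00:00 +0000 2016"]
recent = [ts for ts in timestamps if in_analysis_window(ts, now)]
```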
What data are we showing?
So, what information are you actually looking at?
- Mentions Over Time — The number of mentions over time
- The Lying Index — The percentage of times the words “lying” and “liar” are mentioned in conjunction with each candidate
- The Animosity Index — The percentage of times the words “f— you” are mentioned in conjunction with each candidate
- Top Election Topics — The percentages of tweets that are talking about a given topic. We selected the top five trending topics according to the Associated Press.
- The Honesty Index — The percentage of times the words “honest” and “honesty” are mentioned in conjunction with each candidate
- The Trump Geo Index — The locations of people who are mentioning either Trump’s name or his handle in tweets (due to Twitter API limitations, not all tweets are shown)
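To make the percentage-based indexes above concrete, here is a minimal sketch of one way such a number could be computed. It assumes the percentage is taken over all tweets that mention a given candidate, which is our reading of the descriptions rather than a documented formula. The keyword lists in the example are placeholders, not the full set of handles and hashtags tracked in the dashboard.

```python
import re

def keyword_index(tweets, candidate_keywords, index_terms):
    """Percentage of each candidate's tweets that also contain an index term.

    tweets: iterable of tweet texts
    candidate_keywords: {candidate: [lowercase handles/hashtags]} (placeholders)
    index_terms: lowercase words to cross-reference, e.g. ["lying", "liar"]
    """
    totals = {name: 0 for name in candidate_keywords}
    hits = {name: 0 for name in candidate_keywords}
    for text in tweets:
        # Split into whole tokens so that "flying" does not count as "lying"
        tokens = set(re.findall(r"[@#]?\w+", text.lower()))
        has_term = any(term in tokens for term in index_terms)
        for name, keywords in candidate_keywords.items():
            if any(keyword in tokens for keyword in keywords):
                totals[name] += 1
                if has_term:
                    hits[name] += 1
    # Candidates with no matching tweets get 0.0 rather than a division error
    return {
        name: (100.0 * hits[name] / totals[name]) if totals[name] else 0.0
        for name in totals
    }
```

Called with the terms "lying" and "liar", this would approximate the Lying Index; swapping in "honest" and "honesty" would approximate the Honesty Index. Multi-word phrases would need phrase matching instead of single-token matching.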
We’re always testing and refining our parameters to return the most accurate visualizations possible. Feel free to comment below with feedback on what you would like to see!
Everyone is entitled to their own interpretation, and hey, we’re not political experts. We just collect and analyze big data. Still, there are some pretty obvious conclusions one can draw:
- Donald Trump is the candidate most associated with both “honesty” and “lying,” likely meaning that his supporters view him as being the most honest while his opponents think he is a liar.
- The greatest number of tweets is almost always about Donald Trump, which suggests that he is the most effective at generating publicity (or trolling, if you prefer). As a result, he also generates the most animosity.
- The top election issue being discussed on Twitter is immigration, which is Trump’s biggest issue, followed closely by the economy. Terrorism is a more distant third.
What do you see in the data? We invite your comments below. And as the election continues through the ongoing news coverage, the conventions, and the debates, check our real-time dashboard to get more insights into what is occurring.
Editor’s note: Here is our follow-up technical documentation on how we built the dashboard.