Experiments

Google’s recent release of its Cloud Platform Natural Language API was well timed with the 2016 Republican National Convention and Democratic National Convention. Curious as we are, we took a day internally to learn the API and see what we could find through sentiment analysis of speeches by the Republicans in Cleveland and the Democrats in Philadelphia.


The Problem

It’s pretty evident that our team at Wildebeest is very interested in Natural Language Processing technologies. So interested that we built our own sentiment analysis API, Better Status. When we heard that the new Google Cloud Platform Natural Language API had endpoints to extract entities and to analyze sentiment, we had to see how it compared. Hopes were high.
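For context, here’s roughly what those two endpoints look like from today’s google-cloud-language Python client. This is a minimal sketch for illustration; the current client library postdates the original 2016 experiment.

```python
# Minimal sketch: the two Natural Language API endpoints,
# via Google's Python client (pip install google-cloud-language).
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

text = "Donald Trump accepted the nomination in Cleveland."
document = language_v1.Document(
    content=text, type_=language_v1.Document.Type.PLAIN_TEXT
)

# Endpoint 1: extract entities (people, places, things) with salience scores.
entities = client.analyze_entities(request={"document": document}).entities

# Endpoint 2: analyze sentiment, document-wide and per sentence.
sentiment = client.analyze_sentiment(request={"document": document})

for entity in entities:
    print(entity.name, entity.salience, len(entity.mentions))
print(sentiment.document_sentiment.score)  # roughly -1.0 (negative) to +1.0 (positive)
```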

Our Solution

After analyzing the sentiment of Republican speeches from Donald Trump, Mike Pence, Paul Ryan, Rudy Giuliani, and Melania Trump, we knew we were on the right track. The missing piece was how to make this data interesting. Our idea was to mash up the two endpoints: first extract entities (people, places, things), then analyze the sentiment of the sentences they appear in to get contextual speech sentiment.
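In code, that mash-up can look roughly like this: analyze_sentiment already returns per-sentence scores, so each extracted entity can be matched against the sentences that mention it and those scores averaged. This is a sketch with a naive substring match, not the exact pipeline we ran.

```python
from collections import defaultdict
from google.cloud import language_v1

def entity_sentiment(text: str) -> dict:
    """Rough sketch of the mash-up: per-entity sentiment derived from
    the sentences each entity appears in. Illustrative only."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )

    entities = client.analyze_entities(request={"document": document}).entities
    sentences = client.analyze_sentiment(request={"document": document}).sentences

    scores = defaultdict(list)
    for sentence in sentences:
        content = sentence.text.content
        for entity in entities:
            # Naive containment check; a sturdier version would use the
            # character offsets in entity.mentions instead.
            if entity.name in content:
                scores[entity.name].append(sentence.sentiment.score)

    return {name: sum(s) / len(s) for name, s in scores.items()}
```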

Once the Democrats got their chance, it all came together. We created a bubble chart that shows each entity as a bubble: its diameter represents the number of mentions, its Y-axis position shows its sentiment, and its X-axis position shows how salient that topic was to the entire speech. Keep in mind, this was very much an experiment, so some liberties were taken. For example, in nearly every speech the entities “United States” and “Americans” were outliers that skewed the entire chart to the point of being unreadable. Additionally, we found that the words “Trump” and “Hillary” came back as inherently negative sentiment from the Google Cloud Platform Natural Language API, but that’s a whole other post…
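The chart itself boils down to a scatter plot. The sketch below uses matplotlib with made-up per-entity numbers (name, salience, sentiment, mentions) purely to show the encoding; note that matplotlib’s s parameter sizes bubbles by area rather than diameter.

```python
import matplotlib.pyplot as plt

# Hypothetical per-entity data: (name, salience, sentiment, mentions).
entities = [
    ("America", 0.31, 0.6, 24),
    ("jobs", 0.12, -0.2, 9),
    ("Cleveland", 0.05, 0.4, 3),
]

names, salience, sentiment, mentions = zip(*entities)

plt.scatter(
    salience,                      # X: how salient the topic was to the speech
    sentiment,                     # Y: sentiment, roughly -1 to +1
    s=[m * 40 for m in mentions],  # bubble size scales with mention count
    alpha=0.5,
)
for name, x, y in zip(names, salience, sentiment):
    plt.annotate(name, (x, y))

plt.xlabel("salience")
plt.ylabel("sentiment")
plt.show()
```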

Within the results we found some similarities as well as several differences. The biggest challenge was normalizing the data. With such long speeches, it was difficult to find much salience between any one topic and the overall speech. It was also tough to find much variation in sentiment using Google’s API: most speeches came back as completely positive or completely negative (“+1” or “-1”) with very little gray area. After a significant amount of experimentation, we decided that, in the spirit of the experiment, we should use our own sentiment API, Better Status, to get the range of sentiment we were looking for.
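On the salience side, one common fix is plain min-max rescaling, which spreads the tightly clustered salience scores of a long speech across the chart’s X-axis. The sketch below is one illustrative approach, not necessarily the exact normalization we ended up with.

```python
def rescale(values: list[float]) -> list[float]:
    """Min-max normalize to [0, 1] so tightly clustered salience
    scores still spread across the chart. Illustrative only."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical; avoid dividing by zero
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# e.g. salience values diluted by a long speech:
print(rescale([0.012, 0.018, 0.011, 0.030]))  # -> [0.052..., 0.368..., 0.0, 1.0]
```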