This is a small project I whipped together by pulling tweets from Twitter's API during the 2011 State of the Union Address and running them through a sentiment analysis to gauge audience reaction throughout the speech. The result:
This graph represents the ratio of positive words to negative words over time during the SOTU.
For reference, Pres. Obama's speech began at approximately 9:11PM and ended at approximately 10:13PM. For comparison, here's a timeline of selected quotes and topics discussed during the speech:
- 21:18 - Economy, international competition
- 21:20 - Need for innovation.
- 21:24 - "This is our generation's Sputnik moment."
- 21:24 - Government investment in biotechnology, energy
- 21:26 - "I'm asking Congress to eliminate the billions in taxpayer dollars we currently give to oil companies."
- 21:28 - Education.
- 21:33 - "We've ended the unwarranted taxpayer subsidies that went to banks, and used the savings to make college affordable for millions of students."
- 21:36 - Immigration reform as it relates to education.
- 21:36 - Tax reform.
- 21:45 - "If a bill comes to my desk with earmarks inside, I will veto it."
- 21:54 - Salmon joke.
- 21:59 - al-Qaeda.
- 22:01 - Foreign affairs, Obama's planned trips.
- 22:03 - Independence of South Sudan.
- 22:04 - Uprising in Tunisia.
- 22:06 - DADT, military recruiting on college campuses.
If you're interested in comparing the data to the speech in more detail, I recommend the White House video of the SOTU on YouTube, which has a full transcription timeline. Simply add 9 hours and 11 minutes to the video time to estimate the time in EST.
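Converting video time to EST is just a fixed offset. A minimal sketch in Python, assuming the video's 0:00 mark lines up with the speech's 9:11PM start; `SPEECH_START` and `video_to_est` are hypothetical names used only for illustration:

```python
from datetime import datetime, timedelta

# Assumption: the video's 0:00 mark corresponds to 9:11PM EST on 1/25/2011,
# which is the same as "add 9 hours and 11 minutes to the video time".
SPEECH_START = datetime(2011, 1, 25, 21, 11)

def video_to_est(minutes: int, seconds: int = 0) -> datetime:
    """Convert a position in the video (mm:ss) to an estimated EST time."""
    return SPEECH_START + timedelta(minutes=minutes, seconds=seconds)

# Under this offset, 13 minutes into the video maps to 21:24, the time of the
# "Sputnik moment" line in the timeline above.
print(video_to_est(13).strftime("%H:%M"))  # 21:24
```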
The data includes 240,000 tweets containing the terms "sotu", "#sotu", "stateoftheunion", or "state of the union", sent between 9:00PM and 10:30PM EST on 1/25/2011.
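The selection criteria above can be expressed as a simple filter. This is only a sketch: it assumes case-insensitive substring matching and timestamps already converted to EST, and the names `TERMS` and `is_relevant` are hypothetical. How matching was actually done on the API side isn't described here. Note that with substring matching, "sotu" already covers "#sotu".

```python
from datetime import datetime

# Search terms from the post. Case-insensitive substring matching is an
# assumption, not necessarily how the API collection worked.
TERMS = ("sotu", "#sotu", "stateoftheunion", "state of the union")

# Collection window: 9:00PM to 10:30PM EST on 1/25/2011 (timestamps assumed EST).
WINDOW_START = datetime(2011, 1, 25, 21, 0)
WINDOW_END = datetime(2011, 1, 25, 22, 30)

def is_relevant(text: str, sent_at: datetime) -> bool:
    """Return True if a tweet mentions a search term and falls inside the window."""
    lowered = text.lower()
    in_window = WINDOW_START <= sent_at <= WINDOW_END
    return in_window and any(term in lowered for term in TERMS)

print(is_relevant("Watching the #SOTU tonight", datetime(2011, 1, 25, 21, 30)))  # True
print(is_relevant("State of the Union recap", datetime(2011, 1, 25, 23, 0)))     # False
```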
Every tweet was compared against a list of words with predetermined sentiment scores (from the University of Pittsburgh's OpinionFinder subjectivity lexicon). Each tweet was given two scores: the number of matched positive words and the number of matched negative words. The scores were then summed over 10-second intervals, and a sentiment ratio (total positive words / total negative words) was computed for each interval. The ratio is represented on the graph above as the gray line. The magenta line is a 60-second moving average, and the navy line is a 5-minute moving average.
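The scoring, binning, and averaging steps can be sketched in Python as follows. This is a reconstruction under stated assumptions, not the code behind the graph: it assumes the lexicon's `key=value` line format, simple lowercase word tokenization with exact matching (the lexicon's `stemmed1=y` flag is ignored), trailing moving averages of the per-bin ratios, and `None` for bins with no negative words, where the ratio is undefined. All function names are hypothetical, and the lexicon lines in the usage example are toy entries rather than lines copied from the real file.

```python
import re
from collections import defaultdict

def parse_lexicon(lines):
    """Split OpinionFinder-style entries into positive and negative word sets.

    Assumes lines like:
    'type=weaksubj len=1 word1=abandoned pos1=adj stemmed1=n priorpolarity=negative'
    Entries with any other polarity (e.g. neutral, both) are ignored.
    """
    positive, negative = set(), set()
    for line in lines:
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        word, polarity = fields.get("word1"), fields.get("priorpolarity")
        if polarity == "positive":
            positive.add(word)
        elif polarity == "negative":
            negative.add(word)
    return positive, negative

def score_tweet(text, positive, negative):
    """Count matched positive and negative words (exact match, lowercased)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos, neg

def ratio_series(tweets, positive, negative, bin_seconds=10):
    """Sum scores per time bin and return one positive/negative ratio per bin.

    `tweets` is an iterable of (seconds_since_start, text) pairs. Bins with
    no negative matches (including empty bins) get None.
    """
    pos_totals, neg_totals = defaultdict(int), defaultdict(int)
    for seconds, text in tweets:
        b = int(seconds // bin_seconds)
        pos, neg = score_tweet(text, positive, negative)
        pos_totals[b] += pos
        neg_totals[b] += neg
    last = max(pos_totals, default=-1)
    return [
        pos_totals[b] / neg_totals[b] if neg_totals[b] else None
        for b in range(last + 1)
    ]

def moving_average(values, window):
    """Trailing simple moving average over `window` bins, skipping None values."""
    out = []
    for i in range(len(values)):
        recent = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(recent) / len(recent) if recent else None)
    return out

# Toy usage with illustrative lexicon entries.
lexicon = [
    "type=strongsubj len=1 word1=great pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=fail pos1=verb stemmed1=y priorpolarity=negative",
]
positive, negative = parse_lexicon(lexicon)
tweets = [(3, "great speech"), (7, "great great, but jobs fail"), (15, "fail")]
ratios = ratio_series(tweets, positive, negative)  # 10-second bins
print(ratios)  # [3.0, 0.0]
smooth_60s = moving_average(ratios, 6)   # 60 s = 6 bins of 10 s
smooth_5m = moving_average(ratios, 30)   # 5 min = 30 bins of 10 s
```

Averaging the per-bin ratios is one reading of "moving average"; summing positive and negative counts over the whole window before dividing would be an alternative that weights busier intervals more heavily.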
This methodology is nearly identical to that of a study by Carnegie Mellon (O’Connor et al. 2010), which was featured on Mashable and other blogs. As noted in that study, other lexicons may work much better with Twitter, since this one was developed for proper English, which is something of a rarity on Twitter. ("#zomg4reels.")
Again, this isn't a rigorous, peer-reviewed scientific analysis. Although I followed a methodology very similar to the CMU study's, I wouldn't advise using the results to make political decisions or sweeping statements about how the American public feels about particular political issues.
Feel free to drop me any questions or comments at mark [dot] lemunyon [at] gmail [dot] com.