We’ve made a lot of progress since last week. Most of our work has been devoted to sentiment analysis and network analysis.
To examine themes in SNAP/food stamp coverage, we first scraped articles from the past month that included the words “food stamp” or “food stamps” in the title and calculated how often each stemmed word appeared. To visualize this information, we made word clouds. The word cloud below shows the most common words in conservative articles about food stamps:
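The counting step behind a word cloud can be sketched roughly as follows. This is only an illustration, not our actual pipeline: the `crude_stem` function is a made-up stand-in for a real stemmer, and the stop-word list and sample headlines are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "is", "are"}

def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stemmed_counts(texts):
    """Count how often each stemmed, non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[crude_stem(word)] += 1
    return counts

# Invented sample headlines, not scraped data.
sample_articles = [
    "Food stamps cuts proposed in new budget",
    "Budget proposes cutting food stamp benefits",
]
counts = stemmed_counts(sample_articles)
print(counts.most_common(5))
```

The resulting counts are what a word cloud library would take as its input. Note that crude suffix stripping leaves "cuts" and "cutting" as different stems, which is one reason to prefer an established stemmer.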
To further analyze the content of the news articles and social media posts that we’ve scraped, we’re doing sentiment analysis on the text. So far, we have examined several metrics for these texts, such as the complexity of the words, the reading level, the punctuation, and whether the sentences in the articles are positive or negative.
To quantify the tone of the articles, we used Vader, a sentiment analysis tool included in the Natural Language Toolkit, as well as AFINN, another sentiment lexicon. Vader scores range from -1 (negative) to 1 (positive). AFINN rates individual words from -5 (negative) to 5 (positive). For both metrics, a score above 0 indicates that the sentence has a positive sentiment. We placed each article in a category (e.g., Economy, Opinion, News) and averaged the scores within each category. The graph below shows the average total AFINN score vs. the average total Vader score for each category. The size of each bubble reflects the number of articles.
Interestingly, the Vader score suggests that all article categories had a positive sentiment (all > 0), while the AFINN score suggests that only the local, international, and opinion categories were positive, on average.
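To make the per-category averaging concrete, here is a minimal sketch of lexicon-based scoring. It is not the Vader or AFINN implementation: the `TOY_LEXICON` words and valences are invented, and the scoring is a plain sum of word valences with none of the extra rules a real tool applies. The sample articles and category labels are also made up.

```python
import re
from collections import defaultdict

# Toy lexicon: words and integer valences are invented for illustration only.
TOY_LEXICON = {"help": 2, "support": 2, "good": 3, "cut": -1, "fraud": -4, "crisis": -3}

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Total score of a text: the sum of the valences of its known words."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def average_by_category(articles):
    """Map each category to (average total score, number of articles)."""
    scores = defaultdict(list)
    for category, text in articles:
        scores[category].append(lexicon_score(text))
    return {cat: (sum(vals) / len(vals), len(vals)) for cat, vals in scores.items()}

# Invented (category, text) pairs, not our scraped articles.
articles = [
    ("Opinion", "Food stamps help families and deserve support"),
    ("Opinion", "A good program facing a cut"),
    ("Economy", "Fraud claims fuel a budget crisis"),
]
summary = average_by_category(articles)
for category, (average, n_articles) in summary.items():
    print(category, average, n_articles)
```

In a bubble chart like ours, the average would set a bubble's position and the article count would set its size. Because the two lexicons cover different words with different scales, it is not surprising that they can disagree on the sign for a category.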
When we talked with the food bank last week, they expressed interest in an analysis of how Georgia politicians speak about SNAP on Twitter. Our research suggests that Georgia politicians do not speak on the issue frequently enough for us to have sufficient data to analyze. Instead, we are considering creating a visualization that tracks how representatives have voted on legislation regarding SNAP. We are doing this using the Open States API, which has data on bills, legislators, and events in state governments, and the ProPublica Congress API, which has national data.
We meet with the Atlanta Community Food Bank again tomorrow, and will consult with them to better understand how they currently follow food policy and how we could use these APIs to analyze and present this information for them.
Another strategy we are using to analyze news articles is a Term Frequency-Inverse Document Frequency (TF-IDF) network analysis in Gephi on Washington Post articles about food stamps. In the above graph, the bigger, darker circles are more connected. Words are connected if they appear in the same sentence. We were unsure why Perdue and Southerland were so connected. After researching these names, we learned that Steve Southerland is a former Florida congressman who sought to impose work requirements on those who get SNAP, and Sonny Perdue is the Secretary of Agriculture.
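One way to build such a network before loading it into Gephi is sketched below. The details are assumptions made for the example rather than a description of our exact method: TF-IDF uses raw counts and `log(N / df)`, terms with a TF-IDF of zero in every document are dropped, and the sample documents are invented.

```python
import math
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf(documents):
    """Return one {term: tf-idf} dict per document (tf = raw count, idf = log(N / df))."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return scores

def cooccurrence_edges(documents, keep_terms):
    """Count, for each pair of kept terms, the sentences in which both appear."""
    edges = Counter()
    for doc in documents:
        for sentence in re.split(r"[.!?]+", doc):
            terms = sorted(set(tokenize(sentence)) & keep_terms)
            for a, b in combinations(terms, 2):
                edges[(a, b)] += 1
    return edges

# Invented sample documents, not Washington Post text.
documents = [
    "Lawmakers debate work requirements. Work requirements affect benefits.",
    "Families use benefits at markets. Markets accept benefits.",
]
scores = tfidf(documents)
# Keep only terms that are distinctive (tf-idf > 0) in at least one document.
keep_terms = {t for doc_scores in scores for t, s in doc_scores.items() if s > 0}
edges = cooccurrence_edges(documents, keep_terms)

# Weighted degree: how strongly connected each word is overall.
degree = Counter()
for (a, b), weight in edges.items():
    degree[a] += weight
    degree[b] += weight
print(degree.most_common(3))
```

Here "benefits" appears in every document, so its TF-IDF is zero and it drops out of the network. The `edges` counter could then be written to a CSV edge list with source, target, and weight columns for import into Gephi.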
Going forward, we hope to do a more granular sentiment analysis that can help us extract arguments from our text. We also need to clean the data that we’ve scraped from Facebook, and we are starting to learn how Google Trends can be a useful tool for us.