This week, along with the mid-program presentation, the team members were invited for a tour of the Atlanta Food Bank. Beyond the advanced operations of the warehouse facilities, what caught my attention was how passionate everyone working there was about what they do. Lauren Waits, one of our main stakeholders and the director of governmental affairs at ACFB, gave us a tour of the facility; not only did she know almost all the people working in the warehouse and in the office, she stopped to talk with them and ask how they were doing. Some of the staff explained their work as they did it, and although the tour seemed informal, they spoke with passion and taught us a great deal. Since background on the food bank, and the Atlanta Community Food Bank in particular, was covered in a previous blog post, I will just say that the fight against food insecurity struck me as an issue that shouldn't be overlooked, and it made me think about how our tool can help our stakeholders with their media agenda.
To touch upon our work, our team has been working on three major components: sentiment analysis and text mining, census data and map overlays, and a visualization of legislators' voting records. Because we had our mid-program presentation this week, we focused mostly on the presentation slides and a poster. This process helped us decide what to focus on moving forward. With less than four weeks left in the program, our team hopes to finalize the tool first before adding anything else to it.
To talk about my personal part of the project, sentiment analysis and topic modeling: beyond the general idea of weighting and aggregating the sentiment scores of the articles, it has proven very difficult to devise a formula that corresponds even roughly with our gut estimate of public sentiment. Our approach is top-down, in that we guess which factors affect public opinion: site traffic, readability, and the media bias of each website. Because of these roadblocks, I plan to review more social science papers in the coming week for relevant work on estimating public sentiment. This would be especially beneficial since the paper for Bloomberg Data Science for Social Good is due on July 8th.
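To make the weighting idea concrete, here is a minimal sketch of one possible aggregation, assuming each article already carries a sentiment score and pre-computed traffic, readability, and bias values; the field names and the specific weighting formula are my illustrative assumptions, not the team's finalized method.

```python
def aggregate_sentiment(articles):
    """Combine article-level sentiment into one public-sentiment estimate.

    Each article is a dict with a sentiment score in [-1, 1] and three
    weighting factors (all assumed to be computed elsewhere): site
    traffic, readability in [0, 1], and a media-bias penalty in [0, 1].
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for article in articles:
        # Larger audiences and more readable articles count more;
        # strongly biased outlets are down-weighted.
        w = article["traffic"] * article["readability"] * (1.0 - article["bias"])
        weighted_sum += w * article["sentiment"]
        total_weight += w
    return weighted_sum / total_weight if total_weight else 0.0

articles = [
    {"sentiment": 0.6, "traffic": 1000, "readability": 0.9, "bias": 0.1},
    {"sentiment": -0.2, "traffic": 300, "readability": 0.7, "bias": 0.5},
]
print(aggregate_sentiment(articles))
```

The multiplicative form is just one choice; a linear combination with learned coefficients would be another, which is exactly the kind of question the literature review should help answer.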
On the topic modeling side, a clever aggregation method is also needed to make sense of the data. LDA, TF-IDF, n-grams, and NER (all mentioned in a previous blog post) give a lot of information about which words are (supposedly) most relevant to the text. However, given the sheer volume of words and text, it is difficult to weight the words from each article. Using an aggregation technique similar to the one for sentiment, these words will be weighted so that the keywords used across the documents can be visualized quickly.
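As a rough illustration of that aggregation, the sketch below sums per-document TF-IDF scores so that corpus-wide keywords surface; this toy implementation and its example sentences are mine, not the project's actual pipeline, which uses the richer LDA/NER signals mentioned above.

```python
import math
from collections import Counter

def tfidf_keywords(documents, top_n=3):
    """Rank words across tokenized documents by summed TF-IDF.

    documents: list of token lists. Returns the top_n words whose
    total TF-IDF across all documents is highest.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    scores = Counter()
    for doc in documents:
        tf = Counter(doc)
        for word, count in tf.items():
            idf = math.log(n_docs / df[word]) + 1.0  # smoothed IDF
            scores[word] += (count / len(doc)) * idf
    return [word for word, _ in scores.most_common(top_n)]

docs = [
    "food insecurity affects many families".split(),
    "the food bank fights food insecurity".split(),
    "legislators vote on food policy".split(),
]
print(tfidf_keywords(docs, top_n=2))
```

Summing scores this way favors words that are both frequent and spread across articles, which is closer to "keywords for the whole collection" than ranking any single article alone.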
To make this tool more sustainable for the user, automating the gathering, cleaning, and analysis of articles is our next step, so that we can track sentiment trends for individual news outlets as well as for locations. After the paper submission, I will focus on making the tool much more user-friendly so that it is ready to be deployed as the end of the program nears.
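The gather-clean-analyze loop described above could be structured along these lines; every function here is a placeholder standing in for the real scraping, cleaning, and modeling steps, sketched only to show how the stages would chain together for automation.

```python
def fetch_articles(source):
    """Placeholder: pull recent articles from a news outlet or feed."""
    return [{"outlet": source, "text": "Food insecurity rises in Atlanta."}]

def clean(article):
    """Placeholder cleaning step: normalize whitespace and casing."""
    article["text"] = " ".join(article["text"].split()).lower()
    return article

def analyze(article):
    """Placeholder analysis: attach a dummy sentiment score."""
    article["sentiment"] = 0.0  # the real pipeline would run the model here
    return article

def run_pipeline(sources):
    """Run every source through fetch -> clean -> analyze."""
    results = []
    for source in sources:
        for article in fetch_articles(source):
            results.append(analyze(clean(article)))
    return results

print(run_pipeline(["example-outlet"]))
```

Scheduling a loop like this (e.g. as a daily job) is what would let the tool show sentiment trends over time without manual re-runs.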