RatWatch: Week 6

We are nearing the end of our data collection period and now have a total of about 60 reports. Our plan for this week and next is to analyze how the app is being used and how people are interacting with the questions. An initial look at the database and at the text message exchanges with users shows some inputs we were not expecting, such as free-form descriptions of the rats and users typing the name of an option instead of the number associated with it. Once we take a deeper look at the reports we have gathered so far, we will be able to make the necessary improvements to the app. We are also close to finishing our webpage! We are still prototyping some designs, but deployment is close, and we cannot wait to show you what we've made!
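
One way to handle replies where users type an option's name instead of its number is to accept either form before falling back to treating the reply as free text. The sketch below is a minimal illustration, not the chatbot's actual code; the function name and the option list are hypothetical.

```python
def match_option(reply, options):
    """Return the 1-based number of the option a reply refers to, or None.

    Accepts the option's number ("2"), its full name in any case ("alley"),
    or an unambiguous prefix of its name ("ins"). Anything else, such as a
    free-form description, returns None so it can be kept as free text.
    """
    text = reply.strip().lower().rstrip(".!").strip()
    if not text:
        return None
    # The user typed a number: accept it only if it is in range.
    if text.isdecimal():
        n = int(text)
        return n if 1 <= n <= len(options) else None
    names = [opt.lower() for opt in options]
    # The user typed an option's full name.
    if text in names:
        return names.index(text) + 1
    # The user typed the start of exactly one option's name.
    hits = [i for i, name in enumerate(names) if name.startswith(text)]
    return hits[0] + 1 if len(hits) == 1 else None


# Hypothetical options for a "where did you see the rat?" question.
LOCATION_OPTIONS = ["Street", "Alley", "Yard", "Inside a building"]
print(match_option("alley", LOCATION_OPTIONS))  # → 2
```

Returning None instead of guessing keeps ambiguous or free-form replies available for manual review.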

Work on the modeling side is currently focused on improving the model that predicts the baseline probability of seeing rats. The city of Atlanta is divided into a grid of smaller squares, and for each square we compute the total count of rat sightings over time, the intersection areas with different environmental layers, and the number of restaurants. We are currently testing different models, such as Poisson regression, zero-inflated Poisson regression, and a generalized boosted regression model, to see which provides the most reasonable and accurate predictions.
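
For reference, a Poisson regression models each grid square's sighting count y as Poisson with mean exp(b0 + b1 * x), where x is a feature of the square such as its restaurant count. The minimal pure-Python sketch below fits one feature by Newton-Raphson and also gives the zero-inflated Poisson probability of a count, which mixes in a probability pi of an extra "structural" zero (for example, a square where rats are never reported). The function names are illustrative and this is not the project's code; in practice a library implementation such as statsmodels' would be used.

```python
import math

def fit_poisson(x, y, iters=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; return (b0, b1).

    Assumes at least one positive count so the starting value is defined.
    """
    # Start from the intercept-only model: log of the mean count.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - mu).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T diag(mu) X (the negative Hessian).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g for the Newton step.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with rate lam and zero-inflation pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson
```

Because the fitted mean is exp(b0 + b1 * x), exp(b1) is the multiplicative change in expected sightings per additional unit of x. Comparing how much probability each model assigns to the many zero-count squares is one way to judge whether the zero-inflated variant is worth its extra parameter.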

Seeing Like a Bike: Week 6

Last week, our task was very straightforward: all we needed to do was start collecting data on our designated route in Piedmont Park. On Monday, we got the air quality sensors set up with the Raspberry Pi and Arduino, which allowed us to collect data from the sensors every three seconds. On Tuesday, we spent most of our time testing our data collection process by recording test runs with both the PMS air quality sensors and the GRIMM simultaneously and comparing the two sets of values. The PMS sensors give us data for both PM values and particle counts. However, the PM values are heavily rounded, so they are not as useful for our purposes, and we decided to stick with particle counts for now.

For particle counts, both sensors report data at sizes of 0.3 µm, 0.5 µm, 1 µm, 5 µm, etc. However, our task was made much more difficult because the PMS gives us counts less than each value, while the GRIMM gives us counts greater than each value. For example, at 0.5 µm, the PMS gives us the number of particles smaller than 0.5 µm, but the GRIMM gives us the number of particles larger than 0.5 µm. This makes our comparison much harder and less straightforward. Further complicating the issue, the PMS sensors do not give very consistent values from second to second, meaning that we will likely need to aggregate the data differently depending on the type of route. Additionally, the two PMS sensors we have are not very consistent with each other, with values differing by as much as 25%. Overall, these factors will require more careful analysis after our initial data collection and calibration.
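
Assuming, as described above, that one instrument reports cumulative counts below each size threshold and the other reports cumulative counts above it, both can be converted to counts within each size interval by differencing neighboring thresholds; only the interior intervals (for example 0.3 to 0.5 µm) are then directly comparable. The sketch below uses hypothetical function names and made-up counts. It also assumes both sensors use the same thresholds and ignores any difference in the air volume each count refers to, which would need a separate scale factor.

```python
def intervals_from_greater_than(thresholds, counts):
    """Per-interval counts from 'particles larger than threshold' data.

    thresholds must be ascending. Returns {(lower, upper): count}, with
    upper = None for the open-ended top interval.
    """
    bins = {}
    for i, lo in enumerate(thresholds):
        hi = thresholds[i + 1] if i + 1 < len(thresholds) else None
        above_hi = counts[i + 1] if hi is not None else 0
        bins[(lo, hi)] = counts[i] - above_hi
    return bins

def intervals_from_less_than(thresholds, counts):
    """Per-interval counts from 'particles smaller than threshold' data.

    thresholds must be ascending. Returns {(lower, upper): count}, with
    lower = None for the open-ended bottom interval.
    """
    bins = {}
    for i, hi in enumerate(thresholds):
        lo = thresholds[i - 1] if i > 0 else None
        below_lo = counts[i - 1] if i > 0 else 0
        bins[(lo, hi)] = counts[i] - below_lo
    return bins

def common_bins(a, b):
    """Pair up the intervals present in both conversions."""
    return {k: (a[k], b[k]) for k in a if k in b}


thresholds = [0.3, 0.5, 1.0]
print(common_bins(intervals_from_greater_than(thresholds, [100, 60, 25]),
                  intervals_from_less_than(thresholds, [10, 50, 85])))
# → {(0.3, 0.5): (40, 40), (0.5, 1.0): (35, 35)}
```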

On Wednesday, we focused on getting multiple sensors connected to a single Raspberry Pi. Our original goal was to start data collection that day, but we were still waiting on various bike parts from Amazon, so we decided to work on a related and useful task. Connecting multiple sensors would let us combine and aggregate twice as much data on a single run, giving us more accurate data. We spent all day, into the late evening, working on this task. However, due to inconsistencies in the text the sensors send, we never managed to connect multiple sensors to one Pi with our code collecting data from both sensors at startup, meaning the code would run by itself without any human input. This was especially important because, when we are out in the field collecting data, we won't have access to a keyboard or mouse to start the program manually.
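
A common pattern for reading several serial sensors on one Pi is to give each port its own reader thread and funnel every line into a single queue, decoding defensively so unexpected bytes don't crash the logger. The sketch below is an assumption about how this could be structured, not the team's code, and the sensor names are hypothetical. It works with any object that has a readline() method, such as a pyserial serial.Serial port opened with timeout=None so reads block, or io.BytesIO in a test. To run without a keyboard, a script like this could be started at boot, for example from a cron @reboot entry or a systemd service.

```python
import csv
import queue
import threading
import time

DONE = object()  # sentinel each reader puts on the queue when its stream ends

def read_stream(name, stream, out):
    """Push (unix_time, sensor_name, line) tuples from one stream onto a queue."""
    try:
        while True:
            raw = stream.readline()
            if not raw:  # empty read: the stream has ended
                break
            if isinstance(raw, bytes):
                # Serial ports yield bytes; replace undecodable noise instead of crashing.
                raw = raw.decode("ascii", errors="replace")
            line = raw.strip()
            if line:  # skip blank lines
                out.put((time.time(), name, line))
    finally:
        out.put(DONE)

def log_sensors(streams, csv_file):
    """Read named streams concurrently and write every line to one CSV file.

    Returns the number of data rows written once all streams have ended.
    """
    out = queue.Queue()
    for name, stream in streams.items():
        threading.Thread(target=read_stream, args=(name, stream, out), daemon=True).start()
    writer = csv.writer(csv_file)
    writer.writerow(["unix_time", "sensor", "raw_line"])
    remaining, rows = len(streams), 0
    while remaining:
        item = out.get()  # blocks until some reader produces something
        if item is DONE:
            remaining -= 1
            continue
        writer.writerow(item)
        rows += 1
    return rows
```

Keeping the raw line and parsing it later means a sensor that sends slightly different text formats still gets logged rather than silently dropped.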

Thursday was the big day: we came out bright and early, at 7am, and went out to Piedmont Park with a number of different types of bikes to test air quality. We made 9 bike runs of 2.5 miles each. The route we selected covered a number of diverse environments, including parks, suburban streets, urban side roads, and a major arterial with heavy construction and traffic. It was a beautiful day outside, and we had fun going around the city to gather data. We also borrowed Michael, who helped Urvi gather eye-tracking data, since everyone else on our team wears glasses, which don't work with the eye-tracking visors. Urvi and Michael each collected data on three runs. The rest of us took turns riding the bikes to collect pollution data, so in the end, each of us went on 2-4 rides. In between, we got to relax and explore Piedmont Park!

However, once we got back, we discovered that we had made a huge mistake in our data collection code. Between each run, we needed to reboot the Raspberry Pi by unplugging it from our battery and plugging it back in. Because this was our first time in the field, we completely forgot about this crucial step. As a result, no pollution data was collected that day, as our script never ran correctly. Of the six runs with the eye-tracking visors, we were able to gather data from three, and the data from the other three was mysteriously corrupted. Coincidentally, all three corrupted runs were done by Michael.

On Friday, most of us took the day off for a number of unrelated personal reasons. Urvi completed her research training to officially become part of the research team. April and I came in for a few hours around 4pm to fix some of the mistakes we made on Thursday. I updated the code to create files in a different format, making data analysis simpler. We planned on doing additional test runs on Sunday to make up for the disappointing results from Thursday, but this plan was later scrapped.

Today, we went out again in the morning and collected three runs of pollution data. After these three runs, it started to rain, and we had to end our trials early. We came back to the lab with successful data from each run! However, as we are the bike team, more issues came up. The Raspberry Pi does not have an onboard real-time clock, so when the Pi is powered off, time isn't kept, and it appears that the clock is even reset. To fix this, we have purchased real-time clocks online, and when they arrive, we will connect them to the Pi.
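
Until the real-time clocks arrive, one possible workaround is to log a monotonic elapsed-time counter with every reading and attach absolute times afterwards from one trusted reference, such as the known start time of a run. Python's time.monotonic() never jumps backwards while the Pi is running, but its starting point is arbitrary, so only differences between readings from the same boot are meaningful. The helper below is a hypothetical sketch of that post-processing step, and the date in the example is made up.

```python
from datetime import datetime, timedelta

def absolute_times(elapsed_seconds, first_reading_time):
    """Attach wall-clock times to readings logged with time.monotonic().

    elapsed_seconds: monotonic timestamps recorded with each reading during a
        single boot (the Pi is rebooted between runs, so use one list per run).
    first_reading_time: a trusted datetime for the first reading, taken from an
        independent source such as a phone or the GRIMM's log.
    """
    t0 = elapsed_seconds[0]
    return [first_reading_time + timedelta(seconds=e - t0) for e in elapsed_seconds]


start = datetime(2000, 1, 1, 7, 0, 0)  # hypothetical run start time
print(absolute_times([512.0, 515.0, 518.5], start)[-1])  # → 2000-01-01 07:00:06.500000
```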

For the next week, our plan is to integrate the real-time clocks into our setup and start preliminary analysis of today's data. We will not have access to the GRIMM for the rest of the week, so we will not be able to gather any more data, but by the end of this week, we will have a robust data collection setup, as well as a solid understanding of the data we are working with. From there, it should hopefully be much easier to proceed with the project and have meaningful results by the end of the summer.

Atlanta Map Room: Week 5 (part 2)

Muniba: This week, I continued sifting through code shared with us by the St. Louis Map Room in the hope that it could be repurposed for the Atlanta Map Room. However, I soon realized that given current issues in their applications (like the reliance on Mapzen, which has since shut down) and the considerable modifications our concept would require, it may be in our best interest to create our own, simpler proof-of-concept application. In considering which mapping API to use, I ultimately moved forward with MapBox GL, because it supports map rotation, which is important for creating dynamic visualizations of the BeltLine. With the projector (relatively) set up, I had the opportunity to experiment with different base maps to see what displays best for our use. I created a basic map in MapBox GL with toggleable data and a BeltLine outline. In the upcoming week, I'm planning to continue developing this interface with new types of data and to create an interface for selecting a portion of the map.

Atlanta Map Room: Week 5

Annabel: This week I focused on obtaining the datasets to use for our layers. I'm working with data from Trees Atlanta, building permits, and demographics for the City of Atlanta, restaurants around the BeltLine, and the tax assessment data I started working on last week. My main focus right now is geolocating the data, since much of it has incomplete addresses and there's a fairly large number of data points, and working with the APIs for both geolocation and Google Places to get the restaurant locations. My struggles with the APIs right now mostly relate to working within their limits, especially on the number of queries, because there's so much data. I'm also starting to think about how to visualize these data points in more unified ways. For example, there are thousands of building permits, and I'm trying to think of ways to show how the points in the dataset evolve over time, which is challenging when there are multiple factors at play but realistically only one point on the map.
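
With large address lists and query quotas, two cheap savings are to avoid re-querying addresses that differ only in case, commas, or spacing, and to keep results in a cache that survives between runs. The sketch below assumes a hypothetical geocode_fn wrapper around whichever geocoding service is used; actual quotas and rate limits depend on the provider and are not assumed here beyond a configurable minimum interval between calls.

```python
import time

def normalize_address(addr):
    """Collapse case, commas, and whitespace so trivial variants share a cache entry."""
    return " ".join(addr.upper().replace(",", " ").split())

def batch_geocode(addresses, geocode_fn, min_interval=0.1, cache=None, sleep=time.sleep):
    """Geocode addresses while spending as few API queries as possible.

    geocode_fn: hypothetical callable, address -> (lat, lon) or None.
    min_interval: minimum seconds between API calls (an assumed throttle).
    cache: dict of normalized address -> result; pass in a persisted dict to
        reuse results across runs.
    Returns results aligned with the input addresses.
    """
    cache = {} if cache is None else cache
    results, last_call = [], None
    for addr in addresses:
        key = normalize_address(addr)
        if key not in cache:
            # Throttle only when we actually hit the API.
            if last_call is not None:
                wait = min_interval - (time.monotonic() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = time.monotonic()
            cache[key] = geocode_fn(addr)
        results.append(cache[key])
    return results
```

Saving the cache to disk (for example as JSON) after each batch would let an interrupted run resume without repeating queries.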

My current inspiration for the demographic data (Yanni's suggestion) is one of László Moholy-Nagy's abstract paintings:

Image – An abstract painting by László Moholy-Nagy

Image credit: Wiki Art

Electric Vehicle Infrastructure: Week 5

This week we focused on two major items: finishing our survey IRB application and continuing to make progress on sentiment analysis. We've gotten a lot done and are happy to have submitted our IRB application along with our survey! After long discussions, we have concluded that we will most likely proceed with either the nudge experiment, aimed at detecting bias against electric vehicle owners, or the topic classification. This will primarily depend on how well the topic classification can be done, which we are still investigating. At this point, we are waiting to hear back from the IRB, and also from a large panel of EV drivers, to see whether we will be able to use them as a data source.

We've made a lot of progress with sentiment analysis this week. We've spent a lot of time learning about different kinds of neural nets and have done a lot of literature review to determine which models are likely to work best for our particular problem. Currently, we have trained a recurrent neural net using the Keras API with TensorFlow. It seems to be outperforming our past bag-of-words approach, and we are investigating it further to determine just how well it works and how we can improve on it. One method we plan to use to improve its performance is Bayesian optimization for hyper-parameter tuning. We will also build a convolutional neural network to see which of the two is a better fit for our sentiment task. While improving our sentiment classification, we've also started analyzing sentiment based on our SVM's predictions. The idea is that we can work on both aspects of the problem at the same time, and once our sentiment predictions improve, we'll update the inputs to our analyses and get a more accurate picture of EV infrastructure.
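
A recurrent model in Keras typically consumes each review as a fixed-length sequence of integer token ids, which an Embedding layer (optionally with mask_zero=True so padding is ignored) turns into vectors. The pure-Python sketch below shows that preprocessing step under simple assumptions: a regex tokenizer, id 0 reserved for padding and 1 for out-of-vocabulary words, and padding and truncation at the front. It is an illustration, not the team's pipeline; Keras also ships its own text-preprocessing utilities.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on anything that isn't a letter, digit, or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(texts, max_words=10000):
    """Map the most frequent tokens to ids 2, 3, ...; 0 is padding, 1 is unknown."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}

def to_padded_ids(texts, vocab, maxlen=50):
    """Encode each text as exactly maxlen ids, padding and truncating at the front."""
    rows = []
    for t in texts:
        ids = [vocab.get(tok, 1) for tok in tokenize(t)]
        ids = ids[max(0, len(ids) - maxlen):]  # keep the last maxlen ids
        rows.append([0] * (maxlen - len(ids)) + ids)
    return rows
```

Front padding keeps the end of each review next to the recurrent layer's final state, which is a common choice for sentiment models; the vocabulary should be built from training reviews only.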

We’re very excited to get the survey kicked off, and to see some results from our sentiment analysis!

RatWatch: Week 5

As of last Friday, the chatbot is deployed and fully functional. Five days into the collection period, we already have about 30 reports. However, most of them are from the eastside of Atlanta, not the westside. This has prompted us to rethink our marketing strategy on the westside in order to get more reports from the area. We plan on stepping up our advertising for the project on the westside over the next couple of days to maximize the number of reports. In the meantime, we are continuing to monitor incoming reports in order to address issues with the software. We are also working diligently to make the app even more useful to the community by providing visual maps and statistics about the reports we are gathering. This information will be viewable on our new website in the coming weeks, so stay tuned!

In addition, historical rat sighting information, code violations, and other environmental data are being used to build a model that helps identify areas that may be especially prone to rats. After geocoding this data, we computed the intersection areas between the buffers and other environmental layers, created random dummy samples across the city of Atlanta, and fit a multivariate logistic regression model to assess which features were most important. Currently, the model includes land use, restaurants, and bodies of water, with plans to incorporate real estate, census data, and tree cover. According to the current model, high-density residential and multi-residential land, as well as restaurants, are associated with higher log odds of a rat sighting. This makes sense, although we have to look further to make sure the two effects are not confounded (denser residential areas may have more restaurants).
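
The setup described above, with sightings labeled 1 and random dummy points labeled 0, can be illustrated with a small logistic regression fit by gradient ascent. Each fitted coefficient is the change in the log odds of a sighting per unit increase in its feature. This is a minimal sketch with hypothetical names and uniform sampling over a bounding box (a real analysis would sample within the city boundary and use a library such as statsmodels or scikit-learn). Fitting correlated features such as residential density and restaurant counts together is one way to check whether each still matters once the other is accounted for.

```python
import math
import random

def random_background_points(n, bbox, seed=0):
    """n uniform random (lon, lat) points in bbox = (min_lon, min_lat, max_lon, max_lat)."""
    rng = random.Random(seed)
    min_lon, min_lat, max_lon, max_lat = bbox
    return [(rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat)) for _ in range(n)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Maximize the logistic log-likelihood by batch gradient ascent.

    X: feature rows (an intercept is added internally), ideally standardized
    y: 1 for a sighting location, 0 for a random background point
    Returns [intercept, coef_1, ..., coef_k] on the log-odds scale.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
            err = target - p
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w
```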

Seeing Like a Bike: Week 5

During the last week, we mainly focused on getting the air quality sensor, the PMS5003, to send data every second to the Raspberry Pi via a Wi-Fi chip, the ESP8266. The ESP8266 technically runs Arduino code, so this should have been fairly straightforward, but it definitely did not turn out that way.

We were given firmware directly from Purple Air, so Nic worked on setting up the files on our devices. Meanwhile, I worked on using existing code from the internet that claimed to run on the ESP8266. After four days on these tasks, we had not made any significant progress. We consulted April and Chris and decided to take the drastic step of taking apart the Purple Air and hooking the PMS5003 directly up to an Arduino Uno instead of the ESP8266.

However, the Purple Air uses very non-standard wiring connectors, so we had to wire the PMS5003 to the Arduino using spare wires lying around the lab, sticking them into the pins of the PMS and Arduino, and using electrical tape to hold everything together. After uploading some simple code we found on the internet to the Arduino, we plugged in the sensor, opened the serial monitor on my laptop, and found 3-second resolution readings! While we would ideally have had 1-second resolution, after dealing with 80-second resolution for the last three weeks, 3-second resolution was a godsend. When we attempted to replicate these results with the Raspberry Pi instead of my laptop, the wiring came undone, and we couldn't get it working for the rest of the day.
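
For reference, the PMS5003 itself sends fixed 32-byte binary frames over its serial line (9600 baud according to Plantower's datasheet): a 0x42 0x4D header, a 2-byte frame length of 28, thirteen 16-bit big-endian data words, and a 2-byte checksum equal to the sum of all preceding bytes. If the Pi ever reads the sensor directly rather than through the Arduino's text output, a decoder along these lines would be needed. The sketch is written from the datasheet's frame layout, not taken from the code the team used, and the field names are illustrative.

```python
import struct

FRAME_LEN = 32
FIELDS = [
    "pm1_0_cf1", "pm2_5_cf1", "pm10_cf1",      # ug/m3, "CF=1" standard particle
    "pm1_0_atm", "pm2_5_atm", "pm10_atm",      # ug/m3, atmospheric environment
    "n0_3", "n0_5", "n1_0", "n2_5", "n5_0", "n10",  # counts per 0.1 L at each size threshold
]

def parse_pms5003(frame):
    """Decode one 32-byte PMS5003 frame into a dict, or return None if invalid."""
    if len(frame) != FRAME_LEN or frame[0:2] != b"\x42\x4d":
        return None
    # Checksum: 16-bit sum of every byte before the final two checksum bytes.
    (checksum,) = struct.unpack(">H", frame[30:32])
    if sum(frame[:30]) & 0xFFFF != checksum:
        return None
    (length,) = struct.unpack(">H", frame[2:4])
    if length != 28:
        return None
    values = struct.unpack(">13H", frame[4:30])
    return dict(zip(FIELDS, values[:12]))  # values[12] is reserved

def frames_from_stream(buf):
    """Yield decoded frames from a byte buffer, resynchronizing on the header."""
    i = 0
    while True:
        i = buf.find(b"\x42\x4d", i)
        if i < 0 or i + FRAME_LEN > len(buf):
            return
        parsed = parse_pms5003(buf[i:i + FRAME_LEN])
        if parsed is not None:
            yield parsed
            i += FRAME_LEN
        else:
            i += 1  # false header or corrupt frame: keep scanning
```

Validating the checksum and resynchronizing on the header lets the reader drop a corrupted frame, for example from a loose wire, instead of misreading every value after it.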

This morning, Nic came in early and rewired the entire system using a different set of wires with slightly thicker pins. This proved much sturdier, as the wires stayed in place much longer. Throughout our work today, the wiring came undone only once, and it was a much easier fix than on Friday. However, for our actual placement on the bike, we will need a vastly better and more permanent solution, as biking in the real world can be pretty bumpy.

With the new wiring, we were able to get the system running fairly quickly on the Pi, and after writing some basic Python code, we could translate the output from the serial monitor into a CSV file to compare with the CSV file from the GRIMM, our research-grade air quality sensor. Our next steps for the week are to compare the data from our two sensors and start calibrating the sensor data from our test run to Piedmont Park.
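
Since the PMS and the GRIMM log at their own intervals, comparing their CSV files means pairing each reading with the other instrument's reading closest in time. Below is a minimal pure-Python sketch of that nearest-timestamp join, assuming both files have already been parsed into time-sorted (seconds, value) lists and that both clocks agree; pandas offers the same operation as merge_asof with direction="nearest" and a tolerance.

```python
import bisect

def align_nearest(reference, other, tolerance):
    """Pair each reference reading with the nearest-in-time reading from other.

    reference, other: lists of (timestamp_seconds, value), each sorted by time.
    tolerance: maximum allowed time gap in seconds; farther matches are dropped.
    Returns a list of (timestamp, reference_value, other_value).
    """
    other_times = [t for t, _ in other]
    pairs = []
    for t, v in reference:
        i = bisect.bisect_left(other_times, t)
        # The nearest neighbor is either just before t or at/after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(other_times[k] - t))
        if abs(other_times[j] - t) <= tolerance:
            pairs.append((t, v, other[j][1]))
    return pairs
```

The tolerance keeps gaps (for example while one logger is restarting) from being paired with stale readings.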

As an aside, we added a new member to our team on Friday morning. Welcome to Urvi Latnekar, a Computer Science student at Bennett University near Delhi, India! She will be working with us for the rest of the summer and brings to the team her experience working with Arduinos and with air pollution data in the farmlands of India.

Map Room: Week 4 (part 2)

We worked on two different components of the Map Room this week –

Muniba: I primarily considered the design of the interface that will let Map Room users choose and project different areas of the BeltLine and data layers. Initially, we hoped to unwind the BeltLine into a flat strip map so the user could draw the path as one would walk it. However, after further consideration, we decided instead to let the user select a fixed-size rectangular area to zoom in on and map. After looking into different libraries, I ultimately decided to move forward with the MapBox API and p5 for creating maps and drawing, respectively. To display a user's selection of the map, we will need the coordinates of the central point, the window's dimensions, and the rotation angle of the rectangular box. After considering these design questions and potential tools, I started looking at the repositories shared with us by the St. Louis Map Room for their projection interface.

Image – Example of a rotated map of the BeltLine I created using the MapBox API.
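
Given the three inputs named above (center, dimensions, and rotation), the corners of the selection can be computed by rotating the box's half-width and half-height offsets and converting meters to degrees. The sketch below assumes a spherical Earth with a local flat approximation (adequate over a few kilometers) and measures rotation as a compass bearing clockwise from north, so a bearing of 90 means the box's top edge faces east. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius (spherical approximation)

def selection_corners(center_lon, center_lat, width_m, height_m, bearing_deg):
    """Corners of a rotated rectangle as (lon, lat) pairs.

    Returned in order: top-left, top-right, bottom-right, bottom-left,
    where "top" is the edge facing bearing_deg (clockwise from north).
    """
    theta = math.radians(bearing_deg)
    m_per_deg_lat = EARTH_RADIUS_M * math.pi / 180
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(center_lat))
    corners = []
    for u, v in [(-1, 1), (1, 1), (1, -1), (-1, -1)]:
        # Offsets in meters along the box's own right (x) and up (y) axes.
        x, y = u * width_m / 2, v * height_m / 2
        # Rotate clockwise by the bearing into east/north offsets.
        east = x * math.cos(theta) + y * math.sin(theta)
        north = -x * math.sin(theta) + y * math.cos(theta)
        corners.append((center_lon + east / m_per_deg_lon,
                        center_lat + north / m_per_deg_lat))
    return corners
```

The resulting polygon could then be drawn on the map or used to set the view that gets projected.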

Map Room: Week 4

We worked on two different components of the Map Room this week –

Annabel: My focus this week has been creating a prototype of the Map Room to model the interaction between the participant input layer and the historical/social commentary data we're overlaying on it via the projector. I'm using the 2010-2017 tax assessment data for Fulton County for the prototype, focusing on the recently completed section of the Southwest trail near Adair Park. Cleaning the dataset was a big chunk of my week. I've also been working on a guide to the tax assessment data, covering how to obtain the data, the standards I've used in working with it, and a variety of ethical questions related to the dataset, and I'm pulling some of the most pertinent information from that guide to contextualize the projected points for visitors to the Map Room. I'm doing this by creating two context panels, one above and one below the projected layer, to discuss these considerations. My current plan is to place a timeline exploring the data over the collection period above, and an explanation of the appeals system below, especially its lack of transparency depending on access level.

(the prototype)

Electric Vehicle Infrastructure: Week 4

While week three was filled with quick wins, week four has been a slow crawl toward progress on critical objectives. Much of this week was spent realigning with our faculty mentor, Professor Asensio, on what is necessary for our review categorization ML training set. After long discussions, we've decided to pivot away from MTurk to two different tools: Qualtrics and PlugInsights. Qualtrics offers a crowdsourcing platform with higher participant demographic fidelity than MTurk. PlugInsights is a crowdsourcing platform spun off from PlugShare, the company that provided us with the original dataset. The bonus of PlugInsights is that all participants on the platform are EV drivers, which immediately lends them credibility in understanding the nuances of the reviews we would ask them to classify.

In terms of sentiment analysis, after additional feature augmentation and hyper-parameter tuning, we've reached the peak of feasible performance with SVMs. This week we spent time exploring neural-network-based learning algorithms and understanding how one could be properly implemented for our domain-specific problem.

We also tried using an SVM for the review classification problem on a small training set of 1,300 reviews. We reached around 50% accuracy, which is reasonable considering the difficulty of multi-label data and the size of our training set, but here again, we've decided to look toward other methods. We're excited to see where our new plans lead us!
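
Exact-match accuracy is a strict yardstick for multi-label data: a review with two true categories counts as wrong if the model gets only one of them. The post doesn't say which accuracy measure produced the figure above, so the sketch below simply contrasts exact-match (subset) accuracy with two per-label measures, Hamming loss and Jaccard similarity, on label sets. The category names in the example are hypothetical.

```python
def subset_accuracy(true_sets, pred_sets):
    """Fraction of reviews whose predicted label set matches the true set exactly."""
    return sum(t == p for t, p in zip(true_sets, pred_sets)) / len(true_sets)

def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of individual (review, label) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(labels))

def mean_jaccard(true_sets, pred_sets):
    """Average |T & P| / |T | P| per review; two empty sets count as a match."""
    scores = [len(t & p) / len(t | p) if (t | p) else 1.0
              for t, p in zip(true_sets, pred_sets)]
    return sum(scores) / len(scores)


true = [{"price"}, {"location", "broken"}, {"busy"}, {"broken"}]
pred = [{"price"}, {"location"}, {"busy", "price"}, {"broken"}]
labels = {"price", "location", "broken", "busy"}
print(subset_accuracy(true, pred), hamming_loss(true, pred, labels), mean_jaccard(true, pred))
# → 0.5 0.125 0.75
```

The same predictions score 50% on exact match but get seven of every eight individual label decisions right, which is why the choice of metric matters when judging multi-label results.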