Seeing Like a Bike: Iterations for Better Data Quality

Our team has been refining the sensor boxes as we collect data. Our friendly colleagues volunteered to ride a bike, and each time they came back with a handful of data along with a list of points to fix in the hardware, the software, and the data itself. Even though the technical challenges caused by physical shocks and vibrations can hardly be eliminated entirely, given the nature of the electrical parts we use, many issues have been resolved by making small changes in an iterative way.

Trial and Error: Data Collection and Sensor Box Refinement

Without the help of our awesome colleagues, this would not have been possible.

Whenever the LEDs indicated a sensor malfunction or we found wrong data, we unpacked the box and examined the flaws. The minor (?) issues that we identified and fixed are as follows:

  • Occasional hiccups in the communication between the Pi and the Arduino -> resolved by implementing timeout and reset functionality.
  • Impedance issue: all of a sudden, an Arduino board stopped working, sent out “NACK” signals, and never came back to normal even after resetting the board -> This was resolved by removing some of the solder from our custom bridge. Too much solder on the PCB creates a low-impedance path that blocks the weak signals going in and out of the Arduino.
  • Cable order: on some cables for the Sonar sensor, the order of the pins was reversed. This did not raise an error, but the data was wrong -> We examined all the cables, bridges, and sensor pins.
  • Broken wires: some cables looked fine on the outside, but we found that a wire was broken inside the socket. This can be prevented in the future by using stronger cables and sockets that can withstand bike shocks.
  • Hardware errors: some Arduino boards, USB-to-TTL connectors, and sensors turned out to be damaged and out of order -> this was the hardest kind of failure to identify. After finding the faulty parts, we had to replace them.
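The timeout-and-reset fix for the Pi–Arduino hiccups can be sketched as follows. This is a minimal illustration rather than our actual firmware/driver code: the `read_sensor` and `reset_arduino` callbacks and the timing constants are assumptions.

```python
import time

def read_with_recovery(read_sensor, reset_arduino, timeout_s=2.0, max_retries=3):
    """Try to read one sensor frame; on timeout, reset the Arduino and retry.

    read_sensor:   callable returning a frame, or None until one is available
                   (hypothetical; in practice this would poll the serial link).
    reset_arduino: callable that resets the board (e.g. toggles its reset pin).
    """
    for attempt in range(max_retries):
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            frame = read_sensor()
            if frame is not None:
                return frame
            time.sleep(0.01)
        reset_arduino()  # link went silent: reset and try again
    return None          # give up; mark this sample as missing in the log
```

The key idea is simply that a silent link is treated as a fault after a bounded wait, instead of blocking the whole data-collection loop forever.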

Gas Calibration Data Collected

With the help of Raj, a Ph.D. student from the department of Environmental Science, we were able to co-locate our gas sensors at the official gas sensing station that is 10 minutes away from Georgia Tech. By comparing our sensor data with the official data from the station, we expect to calibrate the gas sensors to some degree. Since the temporal resolution of the official data is one hour, it will be hard to calibrate them very precisely. Even so, this should increase our gas sensor accuracy greatly.
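As an illustration of what such a co-location calibration could look like: average the raw sensor readings to the station's hourly resolution, then fit a linear map from the raw values to the official ones. This is a hedged sketch with made-up numbers, not our actual calibration procedure.

```python
def fit_linear_calibration(raw, reference):
    """Least-squares fit: reference ≈ a * raw + b, over co-located hourly averages."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
         / sum((x - mean_x) ** 2 for x in raw))
    b = mean_y - a * mean_x
    return a, b

# Hypothetical hourly averages: raw sensor counts vs. official station values.
raw_hourly = [210, 250, 300, 340]
official   = [0.021, 0.025, 0.030, 0.034]
a, b = fit_linear_calibration(raw_hourly, official)
# Calibrated reading for a new raw value:
calibrated = a * 280 + b
```

With only hourly reference points, a simple linear fit like this is about as much as the data supports; a finer model would need higher-resolution reference data.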


Environmental Signatures and Ground Truth Data

If we can identify what objects are around the bike only by looking at the sensor data, it becomes possible to use the sensor data for semantic-level analyses. Without establishing the connection between the sensor data and real-world objects, modeling environmental factors from sensor data would hardly be convincing to an audience, given the noisy nature of the sensors. Our strategy for analyzing the sensor data begins with creating semantic-level signatures and classifying each segment of the streaming data from the bikes. To do that, we recorded environmental information as video and audio using a GoPro and voice recorders. These qualitative data provide ground-truth information for the sensor data.

Based on the Level of Traffic Stress (LTS) model, we listed possible obstacles and objects along the biking routes. After aligning the GoPro video and the sensor streams by time, we qualitatively tagged each segment of the video (only when the circumstances were not too complex). For example, when a vehicle passes the bike and there are no other objects around in a video segment, we assume that the corresponding sensor data is a typical signature of a vehicle passing by. After a test ride in downtown Atlanta, we collected ground-truth data.
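The alignment step amounts to mapping each tagged video interval onto the sensor timestamps on a shared clock. A minimal sketch (the tag names and timestamps here are invented for illustration):

```python
def label_sensor_stream(timestamps, tagged_intervals):
    """Assign each sensor timestamp the tag of the video interval it falls in.

    tagged_intervals: list of (start, end, tag) tuples, in seconds on the
    clock shared with the sensor stream. Returns one tag per timestamp,
    or None where the moment was untagged (e.g. too complex a scene).
    """
    labels = []
    for t in timestamps:
        label = None
        for start, end, tag in tagged_intervals:
            if start <= t < end:
                label = tag
                break
        labels.append(label)
    return labels

# Hypothetical example: a car passes between t=10s and t=14s of the ride.
tags = label_sensor_stream([9.5, 11.0, 13.9, 15.0],
                           [(10.0, 14.0, "vehicle_passing")])
```

Once labeled this way, each contiguous run of identically tagged readings becomes one candidate segment for building a signature.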


Here are two examples of creating signatures: (1) a narrow street with cars parked in parallel, and (2) a city road with a car passing by the rider.


The temporal pattern of the Lidar data corresponding to this video segment is as follows:


Since the frequency content of the proximity values may indicate objects better than the temporal pattern itself, we converted this signal into the frequency domain using the Discrete Cosine Transform.
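For reference, the DCT (type II) can be computed directly; a naive pure-Python version, fine for short ride segments, looks like this. This is a generic sketch of the transform, not our exact analysis pipeline.

```python
import math

def dct_ii(x):
    """Naive DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k)
                for n in range(N))
            for k in range(N)]

# A steady Lidar distance (nothing passing) puts all energy at frequency 0;
# regularly spaced parked cars would instead show up at mid-range frequencies.
spectrum = dct_ii([1.2] * 8)
```

In practice a fast library routine (e.g. an FFT-based DCT) would replace this O(N²) loop, but the output is the same.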

This frequency signature can be used to classify similar environmental factors in the data. Similarly, the case where a car passes by the rider is as follows.

The two cases show distinct patterns to some degree. The graph for the street with cars parked in parallel shows a regular oscillation of Lidar values, which results in high mid-range frequency components (around 4 to 7). Meanwhile, the case where a car passes by the rider shows a higher value at low frequencies (around 2-3), since the Lidar value changes radically at a single moment. Of course, these are exploratory signatures, and more ground-truth data and other sensors need to be aggregated to provide robust signatures.

We are working on generating more ground-truth data. The classification performance for data segments depends on (1) the quality of the signatures, (2) the quality of the ground-truth data, and (3) the prediction model (feature engineering). We hope to finish the first round of classifications of the sensor data in a few days with high prediction performance.
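Once each tagged segment has a frequency signature, a first-round classification could be as simple as nearest-centroid matching in the frequency domain. A hedged sketch (the reference signatures below are fabricated for illustration, echoing the mid-frequency vs. low-frequency contrast described above):

```python
def nearest_signature(spectrum, signatures):
    """Return the tag whose reference spectrum is closest (Euclidean) to `spectrum`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(signatures, key=lambda tag: dist(spectrum, signatures[tag]))

# Fabricated reference signatures: parked cars -> mid-frequency energy,
# a passing car -> low-frequency energy.
signatures = {
    "parked_cars":     [0.1, 0.0, 0.1, 0.2, 0.9, 0.8, 0.7, 0.1],
    "vehicle_passing": [0.2, 0.9, 0.8, 0.3, 0.1, 0.1, 0.0, 0.0],
}
tag = nearest_signature([0.1, 0.1, 0.2, 0.2, 0.8, 0.9, 0.6, 0.2], signatures)
```

A proper model would add feature engineering and a trained classifier, but a centroid baseline like this is a reasonable first pass while the ground-truth set is still small.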

We will report our final results at the DSSG final presentation on Monday (July 24th, 2017).