We are at the halfway point of the program, so we thought we would post a quick update.
Data: We are pleased to announce that we have finally secured data! We can now begin analyzing it and working to understand migration patterns according to WiFi data. In more detail, we currently have authentication information for anonymized users in the CULC building for the first days of June. This gives us the anonymized user ID, the time and date of authentication, and the MAC and IP addresses for both the router and user device.
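To make the shape of the data concrete, here is a minimal Python sketch of how authentication records like these could be loaded and counted. The column names (`user_id`, `timestamp`, `ap_mac`, and so on), the timestamp format, and the sample rows are all made up for illustration; the real export may look quite different.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: column names, dates, and addresses are placeholders.
SAMPLE = """user_id,timestamp,ap_mac,ap_ip,device_mac,device_ip
u1,2000-06-01 09:05:00,aa:aa:aa:aa:aa:01,10.0.0.1,dd:dd:dd:dd:dd:01,10.0.1.11
u2,2000-06-01 09:40:00,aa:aa:aa:aa:aa:01,10.0.0.1,dd:dd:dd:dd:dd:02,10.0.1.12
u1,2000-06-01 10:15:00,aa:aa:aa:aa:aa:02,10.0.0.2,dd:dd:dd:dd:dd:01,10.0.1.11
"""

def load_events(text):
    """Parse CSV text into a list of dicts with a real datetime in 'timestamp'."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        events.append(row)
    return events

def users_per_ap_hour(events):
    """Count distinct anonymized users seen at each access point in each hour."""
    seen = defaultdict(set)
    for event in events:
        hour = event["timestamp"].replace(minute=0, second=0)
        seen[(event["ap_mac"], hour)].add(event["user_id"])
    return {key: len(users) for key, users in seen.items()}

counts = users_per_ap_hour(load_events(SAMPLE))
```

Counting distinct user IDs rather than raw rows keeps a device that re-authenticates several times in an hour from being counted more than once.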
IRB: Although our data contains no personally identifiable information, it might be possible (in theory) to deduce a user’s identity. Therefore, to be safe, we submitted a preliminary draft of an institutional review board (IRB) application for approval. The application details our methods of data collection and intended uses in order to ensure the highest level of transparency possible.
Tools: We have been using Excel extensively for initial data analysis. Over the last two days, however, we have relied more heavily on Python and R to develop models for the data. We will also be using several data mining tools like Weka (http://www.cs.waikato.ac.nz/ml/weka/) to build graphical models.
Moving Forward: Now that we have the data in a manageable form, we intend to first cluster the data by access point. A key challenge we noticed in the data is that a device may be connected to an access point in another room or on another floor. This means that the number of devices connected to a room’s access point may not reflect the number of people actually in that room. To address this problem, we plan to cluster the access points into groups that will hopefully be more indicative of a person’s occupancy zone. Another approach would be to group paths together instead of access points. For example, there is a Starbucks in the CULC. Under this model, the paths of people going to buy coffee would be grouped together even if their devices connected to access points in other areas.
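As a rough illustration of the access point grouping idea, here is one possible Python sketch. It is not our settled method: it assumes that two access points belong to the same zone when devices frequently hop between them within a short time, and then merges such pairs with a simple union-find. The function name, the five-minute window, and the `min_handoffs` threshold are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def cluster_access_points(events, window=timedelta(minutes=5), min_handoffs=2):
    """Group access points that share frequent quick handoffs.

    events: iterable of (user_id, timestamp, ap_id) tuples (assumed layout).
    Returns a sorted list of sorted access point groups.
    """
    by_user = defaultdict(list)
    aps = set()
    for user, ts, ap in events:
        by_user[user].append((ts, ap))
        aps.add(ap)

    # Count how often a user moves between two different APs within the window.
    handoffs = defaultdict(int)
    for visits in by_user.values():
        visits.sort()
        for (t1, a1), (t2, a2) in zip(visits, visits[1:]):
            if a1 != a2 and t2 - t1 <= window:
                handoffs[frozenset((a1, a2))] += 1

    # Union-find: merge AP pairs with enough handoffs into one group.
    parent = {ap: ap for ap in aps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for pair, count in handoffs.items():
        if count >= min_handoffs:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for ap in aps:
        groups[find(ap)].add(ap)
    return sorted(sorted(group) for group in groups.values())
```

With real data, the window and threshold would need tuning, and a proper clustering method from Weka or R might well replace this simple merging rule.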