After ten busy weeks, we have completed our work for the UN Data for Climate Action Challenge. Here is a wrap-up of the research questions we focused on and the solutions we developed.
Climate change has the potential to raise the risk of flooding in coastal countries such as Senegal. Because a large proportion of the roads in Senegal are unpaved, flooding could damage the road network and reduce accessibility for residents. Given the funding deficit for infrastructure development in African countries, it is critical to identify which roads should be prioritized in preparing for the possible damage brought by climate change. We propose two steps to identify these roads.
First, we need to evaluate the flood risk, under climate change, of the areas that roads pass through. To achieve this, we build a flood risk model based on the topographic features and historical weather data of the study area.
The second step is to analyze the contribution of each road segment to regional connectivity. Roads that are critical to accessibility and under flood risk should be prioritized for weatherproofing.
Applying optimization techniques, we can then determine explicit plans for allocating road maintenance funds. Multiple sustainable development objectives can be explored within this framework, such as maximizing rural connectivity or minimizing the expected number of people isolated due to flooding. This approach has the potential to minimize the long-term cost of establishing a reliable road network while helping to buffer vulnerable populations from extreme weather events.
Flood Risk Prediction Model
For flood risk prediction, we collected data from multiple sources: flooding maps of Senegal from NASA, daily weather data from NOAA, land cover data from the Food and Agriculture Organization of the UN, and several types of maps from OpenStreetMap. With this rich information on topography, hydrology, and weather, we are able to build machine learning models that evaluate the flood risk of each 1 km × 1 km analysis unit. The framework below shows the features we use, the targets, the algorithms we use to build models, and the evaluation methods.
A critical step is to join the target flooded area with all the features so that they share the same spatial scale. For raster files, we mainly use the zonal statistics method to get values for each grid cell. For land cover and water area data, we calculate the area of intersection between the feature polygons and each grid cell. For daily weather, we use a weighted average, where the weights are determined by the distance from the grid cell to two weather stations.
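As a rough illustration of the distance-weighted weather step, here is a minimal sketch. It assumes inverse-distance weights and planar coordinates; the post does not specify the exact weighting scheme, and the function name `weighted_weather` and the `power` parameter are hypothetical.

```python
import math

def weighted_weather(cell_xy, stations, power=1.0):
    """Distance-weighted average of station readings for one grid cell.

    cell_xy:  (x, y) of the grid cell centroid.
    stations: list of ((x, y), value) pairs, e.g. two weather stations.
    power:    exponent of the inverse-distance weight (assumed, not from the post).
    """
    weighted = []
    for (sx, sy), value in stations:
        d = math.hypot(cell_xy[0] - sx, cell_xy[1] - sy)
        if d == 0:
            # The cell sits exactly on a station: use its reading directly.
            return value
        weighted.append((1.0 / d ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight
```

For a cell at (0, 0) with stations at distances 1 and 3 reporting 10 and 30, the closer station dominates and the result lies nearer to 10 than to 30.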
First, we train regression models that use the proportion of each grid cell flooded during each biweekly time period as the target. We choose three machine learning models to train on the data: Support Vector Machines (SVM), Random Forest (RF), and XGBoost. The best RF model achieves promising performance, with an R-squared (the proportion of variance in the target explained by the model) of about 0.7056 on the test set and a root mean square error (RMSE) of about 0.1041. The top 10 most important features of the model show that dynamic historical weather features, especially historical temperature and precipitation, affect the change in flooded area.
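For readers who want to see how the two regression metrics are defined, here is a small sketch in plain Python. The function names are ours for illustration; in practice a library implementation would normally be used.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted flooded proportions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the target scores an R-squared of 0, and a perfect model scores 1.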
However, the regression results do not reflect how badly a road passing through an area may be affected by flooding. This is challenging to quantify, because the change in flooded area of a grid cell is not directly related to the probability of a road becoming flooded. Therefore, we set a threshold to determine whether a grid cell is flooded during a particular biweekly time period, which turns the task into a classification problem. Each sample is labeled as flooded or not based on the percentage of flooded area in its grid cell. To be conservative, we set the threshold at 0.5: if at least 50% of a grid cell's area is flooded during a biweekly time period, the sample is labeled as flooded; otherwise it is labeled as not flooded. The table shows the model evaluation and performance on the test dataset.
A visualization of the historical flood risk map and the predicted map shows that we can capture areas with high flood risk, such as #1, #2, and #3. Meanwhile, for some areas with historically low flood risk (#4), our model can overestimate the flood risk. According to our model, such areas may have flooded infrequently in the past but probably face a risk of flooding in the future. The predicted results offer suggestive information for future preparation.
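The labeling step can be sketched in a few lines. This assumes a cell counts as flooded when its flooded fraction is at least the threshold; the function name `label_flooded` is hypothetical.

```python
def label_flooded(flooded_fractions, threshold=0.5):
    """Turn flooded-area fractions (0 to 1) into binary labels.

    Returns 1 (flooded) when the fraction reaches the threshold, else 0.
    """
    return [1 if fraction >= threshold else 0 for fraction in flooded_fractions]
```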
Road Network Optimization
We use telecommunication data from Orange to estimate the traffic flow on road segments. We began by generating the Voronoi diagram of the cellular network towers, computed from the Delaunay triangulation of the tower locations, and assigning road intersections to each Voronoi region. We then assigned population flow to the edges by checking whether a user was in transition. We say a user is in transition if the tower corresponding to their cell phone use changes from one time stamp to the next. If a user is in transition, we calculate the shortest path between two randomly chosen roads, one in the origin region and one in the destination region. After the path is calculated, we increment the population of each edge in the path by one for the date of the destination's time stamp.
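The flow assignment described above can be sketched as follows. This is a simplified illustration, not our production code: the record layout `(user, date, tower)`, the `region_nodes` mapping from tower to road intersections, and the function names are all assumptions, and records are assumed to be sorted by time for each user.

```python
import heapq
import random
from collections import defaultdict

def shortest_path(graph, source, target):
    """Dijkstra on an undirected graph given as {node: {neighbor: length}}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None  # destination not reachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def assign_flows(records, region_nodes, graph, rng=None):
    """Count trips per (date, road edge) from time-ordered (user, date, tower) records."""
    rng = rng or random.Random(0)
    flows = defaultdict(int)
    last_tower = {}
    for user, date, tower in records:
        origin = last_tower.get(user)
        last_tower[user] = tower
        if origin is None or origin == tower:
            continue  # first sighting, or the user did not change tower
        # Pick one random road intersection in each Voronoi region.
        start = rng.choice(region_nodes[origin])
        end = rng.choice(region_nodes[tower])
        path = shortest_path(graph, start, end)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            # Credit the trip to the date of the destination record.
            flows[(date, tuple(sorted((u, v))))] += 1
    return dict(flows)
```

Keying the counts by date and by an order-independent edge tuple keeps the daily flow on a road segment the same regardless of travel direction.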
The second task was to determine which edges in our graph were at most risk of being flooded. Using the 14-day composite flood map from NASA, we calculate the amount of flooding on a road at a particular time period. This is calculated by summing the flooded areas in one road segment at a specific time period and dividing this sum by the length of the entire road segment. The assumption is that if a road segment is frequently flooded, or a large proportion of the road is flooded, then this road segment has a higher risk of being broken. Therefore, we define the flood risk of a road as the sum of its flooded proportions over all the time periods.
The third task was to determine the overall importance of each road segment, so that repairs or preemptive fortifications can be made based on the value of the road. We define road importance as how much of an impact the removal of a road may have on accessibility to the surrounding regions. This is computed by finding the distance traveled by all inhabitants on two separate paths and taking the difference. The first path is the original intact path. The second is the alternate route taken if one of the roads in the original path is damaged. We subtract the length of the first path from the length of the second. The bigger the difference, the worse the new route is, and thus the more impact the flooding of the chosen road will have on accessibility. We calculated the importance of the top 20 riskiest roads.
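The two road-level measures can be illustrated with a short sketch. Several details here are assumptions made for the example: the graph is a dictionary of `{node: {neighbor: length}}`, `trips` maps origin and destination pairs to traveler counts, and trips with no alternate route are counted separately as isolated rather than given an infinite detour.

```python
import heapq
import math

def flood_risk(flooded_by_period, road_length):
    """Sum, over all time periods, of the flooded proportion of one road segment."""
    return sum(flooded / road_length for flooded in flooded_by_period)

def shortest_distance(graph, source, target, removed=None):
    """Dijkstra distance, optionally ignoring one undirected edge `removed`."""
    blocked = set(removed) if removed is not None else None
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if blocked is not None and {u, v} == blocked:
                continue  # the damaged road cannot be used
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

def road_importance(graph, trips, road):
    """Extra distance traveled by all travelers if `road` is removed.

    Returns (extra_distance, isolated_travelers).
    """
    extra = 0.0
    isolated = 0
    for (origin, destination), travelers in trips.items():
        base = shortest_distance(graph, origin, destination)
        detour = shortest_distance(graph, origin, destination, removed=road)
        if math.isinf(detour):
            if not math.isinf(base):
                isolated += travelers  # reachable before, cut off after
        else:
            extra += travelers * (detour - base)
    return extra, isolated
```

Ranking roads by flood risk first and then computing importance only for the riskiest ones keeps the number of shortest-path computations manageable.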
In conclusion, we address the road optimization problem by building a flood risk model, estimating road traffic from mobility behaviors extracted from cell phone records, and combining the two to assess road importance. We hope these models can help decision makers develop more efficient strategies for climate mitigation in transportation.
We thank our mentor Bistra Dilkina, Caleb Robinson, and Amrita Gupta for useful advice.