Greetings! It has been an awesome week working in the Data Science for Social Good internship program (DSSG). I am Pikmai Hui from the University of Minnesota, Twin Cities. I am delighted to have the chance to work with the 15 brilliant interns in DSSG to demonstrate the power of data science in public affairs.
I was born in Hong Kong, a city where one can experience a good mixture of Western and Eastern culture. I came to the US four years ago on a high school exchange program in Kentucky. After I graduated, I applied to undergraduate programs at different colleges in the US. I ended up enjoying the spectacular view of snow-covered streets in Minneapolis for three years. The first year was great, because I had never seen that much snow in my life. Later on… well, the pain.
Why DSSG for Me?
You may wonder how I came to know about an internship program in Atlanta. The answer is simple: the program is an exact fit for my interests! I major in Computer Science, Mathematics, and Physics. I like seeking correlations and building prediction models, which overlaps heavily with what is usually called data science. This summer I decided that I’d like to use my skills for something that matters. What matters most in a civilized society? The social good. If you add the two keywords up and give them to Google, some magic happens and you find the “Data Science for Social Good” program.
Who do I work with in DSSG?
In the program I am working on the Community Court project with the Red Hook Community Justice Center. The Center handles low-level neighborhood criminal cases from three police precincts that would otherwise be sent to Civil, Family, and Criminal Courts. Its contribution to the community is clear, including increases in safety and public trust in the neighborhood. The Center will provide the project team with a set of anonymized records from the community court, along with other support during the program.
My teammates are Aditi and Juwon. They are both civil engineers. Aditi is a PhD student at GT proposing her thesis on bike route modelling, while Juwon has just finished his second year of undergrad at UIUC. In meetings, Aditi is always the smiling sunshine and Juwon is always the ice breaker. There is no better team than mine (sorry to say this when my friends on other teams are reading).
What do I do in DSSG?
Our high-level project goal is to provide a benchmark method for evaluating how effectively the correctional programs for young offenders reduce recidivism. This requires a consistent definition of recidivism and a controlled set of sample data. In addition, we can look for correlations in the data among geographic, demographic, or even latent factors that contribute to juvenile recidivism. Interesting maps or trees can be built once the correlations are well established. Statistical analysis may then allow us to say something about how effective the correctional program is.
One obstacle is that there is no commonly agreed definition of recidivism. The definition varies from state to state and from court to court. For example, some may count any arrest after a conviction as recidivism, while others may count only a new conviction after a previous conviction.
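To see how much the choice of definition can matter, here is a small Python sketch. Everything in it is made up for illustration: the `Event` record, the function names, and the three people are assumptions, not the Center's data or our actual method. It only shows that the same records can give different answers under the two definitions mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Event:
    person_id: str
    day: int    # days since an arbitrary start date (hypothetical field)
    kind: str   # "arrest" or "conviction"


def first_conviction_day(events: List[Event]) -> Dict[str, int]:
    """Map each person to the day of their earliest conviction."""
    first: Dict[str, int] = {}
    for e in events:
        if e.kind == "conviction":
            first[e.person_id] = min(e.day, first.get(e.person_id, e.day))
    return first


def recidivists(events: List[Event], kind: str) -> Set[str]:
    """People with an event of the given kind after their first conviction."""
    first = first_conviction_day(events)
    return {
        e.person_id
        for e in events
        if e.kind == kind
        and e.person_id in first
        and e.day > first[e.person_id]
    }


# Made-up records for three people.
events = [
    Event("A", 1, "arrest"), Event("A", 10, "conviction"),
    Event("A", 50, "arrest"), Event("A", 80, "conviction"),
    Event("B", 2, "arrest"), Event("B", 5, "conviction"),
    Event("B", 40, "arrest"),
    Event("C", 15, "arrest"), Event("C", 20, "conviction"),
]

by_rearrest = recidivists(events, "arrest")          # arrested after a conviction
by_reconviction = recidivists(events, "conviction")  # convicted again

print(sorted(by_rearrest), sorted(by_reconviction))
```

In this toy example, person B counts as a recidivist under the re-arrest definition but not under the re-conviction definition, so the measured recidivism rate changes with the definition alone.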
Another big problem arises when we compare offenders who successfully complete the program with those who do not. We may be looking at people with one kind of personality in one group and a completely different kind in the other. For instance, the completer group may be more hard-working than the other group on average. Hard workers get long-term employment more often, and hence they recidivate less frequently. This phenomenon is what we call “selection bias”. Selection bias makes our analysis less decisive about whether the correctional program has any effect on the offenders. Of course, this line of argument applies to other properties too, such as family structure, neighborhood quality, anxiety, and self-esteem.
We are here to solve problems. We will try to unify the definition of recidivism in the way that makes the most sense for evaluating correctional programs. To reduce the selection bias, we can perhaps cross-reference the geographical information to turn family location and neighborhood into controlled variables. We will discuss these ideas this Friday or early next week.
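As a rough illustration of what controlling for a variable can look like, here is a Python sketch using stratification, which is one standard technique and not necessarily what we will end up doing. The numbers, the field names, and the two neighborhoods "X" and "Y" are all invented. The sketch compares completers and non-completers overall, and then again within each neighborhood.

```python
from collections import defaultdict


def make_rows(neighborhood, completed, n, n_recid):
    """Build n made-up records, the first n_recid of which recidivated."""
    return [
        {"neighborhood": neighborhood, "completed": completed,
         "recidivated": i < n_recid}
        for i in range(n)
    ]


def rate(rows):
    """Fraction of the given records that recidivated."""
    return sum(r["recidivated"] for r in rows) / len(rows)


def naive_difference(rows):
    """Completer rate minus non-completer rate, ignoring everything else."""
    done = [r for r in rows if r["completed"]]
    not_done = [r for r in rows if not r["completed"]]
    return rate(done) - rate(not_done)


def stratified_difference(rows, key):
    """Same difference computed within each stratum, weighted by stratum size."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[key]].append(r)
    total, n = 0.0, 0
    for group in strata.values():
        done = [r for r in group if r["completed"]]
        not_done = [r for r in group if not r["completed"]]
        if not done or not not_done:
            continue  # no comparison is possible inside this stratum
        total += len(group) * (rate(done) - rate(not_done))
        n += len(group)
    if n == 0:
        raise ValueError("no stratum contains both groups")
    return total / n


# Invented data: completers mostly live in the lower-risk neighborhood "X".
rows = (
    make_rows("X", True, 8, 2) + make_rows("X", False, 4, 1)
    + make_rows("Y", True, 4, 2) + make_rows("Y", False, 8, 4)
)

naive = naive_difference(rows)
adjusted = stratified_difference(rows, "neighborhood")
print(round(naive, 4), round(adjusted, 4))
```

With these invented numbers, completers look better overall only because more of them live in the lower-risk neighborhood; within each neighborhood the two groups recidivate at the same rate. Stratifying on neighborhood removes that particular bias, though it does nothing about factors we cannot observe, such as how hard-working someone is.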
This is it. Now we are rolling like a rock!