This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (winter term 2020/21).
The goal of our TechLabs Journey was to gain insight into the quality of life in German cities by creating indicators and comparing them. We analyzed data from eleven different cities covering the years 2008 to 2018, looking for potential relationships between variables and possible indicators.
We used data from the “Bertelsmann Stiftung” (Bertelsmann Foundation), which provided us with many different variables for measuring the quality of life in the municipalities over this period. Our aim was to determine factors indicating the quality of life that could be applied to different cities in Germany.
Ideally, our journey would have ended with a single indicator, derived from our different variables, that identifies the most liveable city in Germany. We also wanted to use these variables to find out which aspects of urban life would need to be boosted to improve a city's overall liveability.
The most useful parts of the DataCamp course were probably the lessons on R’s ggplot2 package. Visualizing data with ggplot2 made it much easier to see which relationships did or did not exist in the data. Handling missing data and cleaning the dataset were also important parts of the course that helped us at the beginning of our data science journey.
However, DataCamp courses and Kaggle exercises were usually only the starting point when working with our own dataset and running the analyses we had in mind. A lot of googling and scrolling through Stack Overflow and other data forums was definitely necessary to get R to do what we wanted. Fighting error messages, tolerating some frustration and living with the occasional big question mark are also part of the learning process.
Working on one R script with four people was another challenge we had not anticipated. GitHub is a useful tool once you know how to use it well, but with four people who did not know each other beforehand, working remotely on varying schedules, it was quite challenging to work on the project simultaneously and solve problems together.
Our first milestone was definitely developing our central idea and “research question”. With different academic and personal backgrounds, we all had many ideas and, in the end, a variety of good options to choose from. The next milestone, and therefore the first part of our solution, was finding, cleaning and shaping our dataset. We learned that a bigger dataset is not always better. Our dataset, for example, contained many missing values, which forced us to delete variables with too much missing data. Hence, we had to shrink our dataset considerably for our analysis to be informative, leaving only a fraction of the possible indicators in our analyses.
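Dropping variables with too much missing data comes down to a simple column filter. We did this step in R, so the Python sketch below only illustrates the idea. The function names, the 50% threshold and the toy values are hypothetical and are not taken from our actual analysis or from the Bertelsmann data. Missing values are represented as `None`, playing the role of R's `NA`.

```python
def missing_share(values):
    """Return the fraction of entries in a column that are missing (None)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(table, max_missing=0.5):
    """Keep only columns whose share of missing values is at most max_missing.

    `table` maps a variable name to its list of yearly values. The 0.5
    threshold is a hypothetical default, not the one used in the project.
    """
    return {
        name: values
        for name, values in table.items()
        if missing_share(values) <= max_missing
    }


# Hypothetical toy data: three variables for one city over five years.
toy = {
    "daycare_rate": [30.1, 31.4, None, 33.0, 34.2],
    "unemployment": [None, None, None, 7.9, None],
    "green_space": [12.0, 12.1, 12.1, None, 12.3],
}
kept = drop_sparse_columns(toy)
print(sorted(kept))  # ['daycare_rate', 'green_space']
```

Choosing the threshold is a trade-off: a stricter limit leaves cleaner variables but fewer of them, which is exactly why our set of possible indicators shrank.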
Our final analyses gave us insight into the relationships between several urban indicators, as shown in the pictures. Our goal was to create or find a usable indicator for the quality of life in a city. This goal guided us through the entire process, but reaching it was more difficult than we had anticipated, and we tried several strategies. The data source already grouped variables into categories, but none of them were explicitly designed to measure the quality of life.
To reduce the complexity of the data, we planned a factor analysis (FA) on the variables used as indicators for the 17 Sustainable Development Goals (SDGs). While preparing the factor analysis, we ran into some obstacles: not all indicators were present, not enough data was available, and the results of data-suitability checks such as Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin (KMO) criterion were not satisfactory. The preparatory steps of the FA did, however, reveal some interesting correlations, which we visualized with ggplot2.
In the last weeks, Clara also invested time in learning the basics of supervised machine learning. We did not have enough time to apply it to our specific data, but understanding the theoretical mechanisms behind it was nonetheless a valuable take-away from the journey.
Our project could be used to compare cities based on different indicators, for example when choosing a city to move to or looking for the most sustainable city to work in. Examining the data further could also provide insight into the relationships between the SDGs or other indicators.
What could be gained by analyzing one specific aspect of quality of life rather than another, and which goals for a better world might actually be negatively related? Some results might not be very surprising, and you might think, “well, of course, I would have assumed that more women work if more kids are in daycare”. But such a conclusion is much more powerful when it is supported and visualized with data, and in this case it shows that gender equality is still an issue to consider.
Besides learning new skills such as project management, R coding and working as a team with “strangers”, we gained valuable experience and learned to act and communicate flexibly as a team. We started out as a team of six and ended up as a team of four, which demanded a lot of flexibility from us in project management.
Last but not least, we would like to thank the TechLabs team, especially Lisa, for this incredible and incomparable experience, including the events, support, motivation and so much more!