Forseer

Inside.TechLabs
7 min read · Jan 28, 2021


This project was part of “the Digital Shaper Program” hosted by the Techlabs Copenhagen in autumn semester 2020/2021.

Introduction

The project started as something quite different. The initial idea was an algorithm that could predict the likelihood of a successful IPO, to help investors find the right companies to invest in. We quickly abandoned it once we realized how many competitors such an algorithm would face. The new idea was one we thought could help society during the pandemic we are all in: an algorithm that predicts how many people will be on a given street at a given time, with the help of weather data. With this project, we hoped to help people avoid crowds and thereby minimize the risk of spreading Covid-19. The great thing about our idea is that the algorithm can also be applied to cases unrelated to Covid-19 crowd minimization. It uses footfall data from the street we selected together with weather data from the same location. If, for example, data from a clothing store were fed into our algorithm, store managers could see how many people will enter their store next week and use that information to set a schedule for their employees.

Methodology

For this project, we have been working with theories from several subfields: data science, AI, and web development skills were all used to reach the final target. A large dataset was cleaned by filling in missing values and adding engineered features. The data was then explored by plotting the features against each other and against the target variable; the plots revealed clear patterns, for instance between time of day and the total people count. We used those insights to further develop our machine learning model. The prediction is based on a Random Forest Regressor that reached an accuracy of 87.5%. Random Forest is an ensemble method that combines several Decision Trees, where the feature and threshold of each node split are chosen based on the Gini index:
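The cleaning and feature-engineering steps described above might look roughly like the following sketch. The column names (`timestamp`, `temperature`, `total_count`) are assumptions for illustration; the project's actual dataset schema may differ.

```python
import pandas as pd

# Toy footfall/weather frame; the real column names are assumptions.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-11-06 10:00", "2020-11-06 14:00", "2020-11-07 12:00"]
    ),
    "temperature": [8.0, None, 5.5],
    "total_count": [420, 610, 980],
})

# Fill missing numeric values, e.g. with the column median
df["temperature"] = df["temperature"].fillna(df["temperature"].median())

# Derive the time-based features that later turned out to matter most
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.day_name()
df["month"] = df["timestamp"].dt.month
```

The plots of `hour` or `weekday` against `total_count` then come directly from these derived columns.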

Gini = 1 - Σ_{i=1}^{C} p_i², where p_i is the proportion of samples belonging to class i.
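The formula is small enough to compute by hand. As an illustration (not the project's own code), a node's Gini impurity from its class counts:

```python
def gini_impurity(counts):
    """Gini = 1 - sum_i p_i^2, where p_i is the share of class i in the node."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# A pure node has impurity 0; a 50/50 two-class node has impurity 0.5
print(gini_impurity([10, 0]))  # 0.0
print(gini_impurity([5, 5]))   # 0.5
```

A split is good when it produces child nodes with lower impurity than the parent.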

In our preprocessing steps, we one-hot encoded the categorical features so that the final dataset contains only numeric values, ready for machine learning. The model’s hyperparameters were found with some basic for-loops. The Random Forest implementation was imported from scikit-learn. Finally, we used scikit-learn’s built-in Random Forest attribute, “feature_importances_”, which is calculated from the Gini index mentioned above and yields each feature’s relative importance. The list showed that hour, day, and month had the highest importance for our machine learning model. Once the model was done, we started building a website to deploy it on, ready for input and prediction online. We built the website with “Streamlit”, which, in contrast to popular frameworks such as Django and Flask, is based on pure Python code.
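Putting those pieces together, the training loop could be sketched as below. The data here is synthetic stand-in data, and the hyperparameter grid is illustrative; only the general shape (for-loop search, `feature_importances_`) follows the description above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed, fully numeric dataset
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "hour": rng.integers(0, 24, 500),
    "day": rng.integers(1, 32, 500),
    "month": rng.integers(1, 13, 500),
    "temperature": rng.normal(8, 5, 500),
})
y = 50 * X["hour"] + 10 * X["month"] + rng.normal(0, 20, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Basic for-loop hyperparameter search, as described above
best_score, best_n = -np.inf, None
for n_estimators in [50, 100, 200]:
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)  # R^2 on held-out data
    if score > best_score:
        best_score, best_n = score, n_estimators

# Per-feature importances sum to 1 across all features
final = RandomForestRegressor(n_estimators=best_n, random_state=0)
final.fit(X_train, y_train)
importances = dict(zip(X.columns, final.feature_importances_))
```

With the synthetic target above, `hour` dominates the importances, mirroring the project's finding that time features mattered most.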

One of the biggest challenges in handling user input was the preprocessing. The input had to be one-hot encoded exactly as it was done on the training data, so we wrote our own small algorithm from scratch to perform that task.
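One common way to solve this (a sketch, not necessarily the project's exact approach) is to one-hot encode the single input row and then reindex it against the column list saved from training, so that categories the user did not select become 0:

```python
import pandas as pd

# Columns produced by one-hot encoding the training data (illustrative subset)
train_columns = ["hour", "weekday_Friday", "weekday_Saturday", "weekday_Sunday"]

def encode_user_input(raw: dict) -> pd.DataFrame:
    """One-hot encode one user input and align it with the training columns,
    filling categories the user did not select with 0."""
    row = pd.get_dummies(pd.DataFrame([raw]))
    return row.reindex(columns=train_columns, fill_value=0)

encoded = encode_user_input({"hour": 14, "weekday": "Saturday"})
```

The resulting frame has exactly the columns, and column order, that the trained model expects.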

Project Results and Purposes

Our first results came from analyzing and visualizing the initial data once we had finished cleaning it: historical footfall data for York City and weather data for the same location. At this point, we got a first idea of what influences the number of people on a given street. The weather data showed, as expected, that more people go out in better weather and at lower wind speeds, and, interestingly, when the temperature is between 0 and 10 degrees; below 0 and above 10 degrees, the number of people on the street decreases. The historical footfall data told us at what times of day and on which days of the week people go out: foot traffic peaked between 10 am and 4 pm during the week, and Fridays and Saturdays were the days with the most people on the streets.

Figure 1: Correlation between temperature and total count
Figure 2: Correlation between wind speed and total count

With all preprocessing steps finished, we moved on to finalizing the model. Once the model was done, we could start working on the website to deploy it. We were now able to predict the number of people in York City from online user input.

Figure 3: Correlation between weekday and total count
Figure 4: Correlation between time and total count

Solution and goals achieved

For now, our final product only asks the user for a time and place; no weather data has to be entered. The product then predicts how many people will be in York City at the given time. Ideally, more locations will be added in the future as we gain access to more footfall data for the model to base its predictions on, which would widen the range of purposes our product can serve. To signal that more locations are coming, we have already added ‘’Strøget, Copenhagen’’ as a second choice in the drop-down menu.

Our final product can be found here: http://techlabsproject.herokuapp.com

Purposes of our product

As already mentioned, this kind of prediction can serve many purposes, for example:

Safety cases

  • Find the right time to go out and reduce the risk of getting a Covid infection
  • Find the right time to go out for dinner, shopping and using public transport
  • Help people with psychological conditions such as social anxiety
  • Adjust restrictions

Business cases

  • Predict demand for shops located in the area and optimize cost structures
  • Let shops know how many employees to schedule on a specific day
  • Decide how much stock is needed on a specific day
  • Predict revenue on a given day for future budgeting

Conclusion & Learning

We faced multiple challenges throughout the project phase. The biggest one was finding a dataset to start working with: even though a huge amount of data is available, the barriers to building a working model, and subsequently an actual application, on top of it are quite high. After we had found two useful datasets of weather and footfall data for York City, the next challenge was to figure out how to turn them into a useful project serving a social, environmental, or business purpose. These were the biggest obstacles at the very beginning of the project phase, and we overcame them together as a team.

During the development of Forseer, there were a few issues with accessing and applying the footfall data. Examples include employing the pedestrian data from a government database, augmenting the weather data with meteorological terms and units, and adapting the project to the current Covid situation. In every case, our collective effort helped us find a solution.

Along with the challenges, there have been many highlights and learnings that we all take away from this experience. Most of the team, especially those on the data science track, were going through this kind of project for the first time: highly curious, but inexperienced. We successfully compensated for the lack of experience with plenty of energy, creativity, and ambition.

On top of individually completing our respective tracks, studying data science with Python, artificial intelligence, or web development, we learned how to cooperate as a team working towards a common goal, with everyone contributing to the best of their abilities.

The most valuable learning experience for the team was seeing the possibilities and opportunities of artificial intelligence, machine learning, and data science, and how they are connected and related. Being part of TechLabs and watching an idea unfold into a useful application from beginning to end opened this new area for us and motivates us to continue learning and developing our skills.

The Team

Artificial Intelligence

Shakir Maytham Shaker — (LinkedIn)

Data Science

Albert Lené — (LinkedIn)

Axel Persson — (LinkedIn)

Johann Schreiber — (LinkedIn)

Karoline Poulsen — (LinkedIn)

Max Fällström — (LinkedIn)

Sarah Nagel — (LinkedIn)

Web Development

Jana Michelke — (LinkedIn)

Mentors

Andrei Sabau

Johan Dybkjær-Knudsen

Github

GitHub Repository Link

For any further questions, feel free to reach out!
