A new Hope for Personenfernverkehr!

This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (winter term 2020/21).

Abstract

Anyone who travels frequently with Deutsche Bahn (DB) is probably familiar with this scenario: You access DB’s ticket booking platform, enter all the desired filter parameters and choose the route that best fits your requirements. So far so good.

For this project, we:

  • used the libraries NumPy, Pandas, and scikit-learn for data cleaning and preparation.
  • used Plotly/Dash to develop the frontend as a web app.
  • used TensorFlow and Keras to build our backend, a neural network.

Initial Situation

Focusing on just one connection makes the problem with DB’s ticket system even more obvious. On any given day, there are around 40 possible connections for a trip from Muenster to Berlin alone, with prices ranging from 26.90 € to 139.90 €. These prices depend on various factors: 1st or 2nd class, number of transfers, travel time, fare type (Super Sparpreis, Sparpreis, Flexpreis, Flexpreis Plus), and more.

Web scraping, data cleaning & team setup

We started with a web scraper and spent weeks screening DB’s website for connections and price parameters. This analysis was limited to connections from Muenster central station to the main stations of Hamburg, Cologne, Dortmund, Berlin, Duisburg, Aachen, Dresden, Nuremberg, Munich, and Stuttgart. We then cleaned the collected travel data, primarily with the help of the Python libraries NumPy and Pandas.
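
To give an idea of what such a cleaning step can look like, here is a minimal sketch with Pandas. The column names (`destination`, `price`, `transfers`) and the raw price format are assumptions made for illustration and are not taken from our actual data set.

```python
import pandas as pd


def clean_connections(raw: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate scraped rows and turn price strings into floats."""
    # Scraping the same connection twice produces identical rows.
    df = raw.drop_duplicates().copy()

    # Assumed raw format: "26,90 €" (German decimal comma plus currency sign).
    price_text = (
        df["price"]
        .str.replace("€", "", regex=False)
        .str.replace(",", ".", regex=False)
        .str.strip()
    )
    # Anything that cannot be parsed becomes NaN and is dropped below.
    df["price_eur"] = pd.to_numeric(price_text, errors="coerce")
    df = df.dropna(subset=["price_eur"])

    df["transfers"] = df["transfers"].astype(int)
    return df.drop(columns=["price"]).reset_index(drop=True)


# Hypothetical sample rows, including one duplicate and one missing price.
raw = pd.DataFrame(
    {
        "destination": ["Berlin", "Berlin", "Hamburg", "Cologne"],
        "price": ["26,90 €", "26,90 €", "139,90 €", None],
        "transfers": [1, 1, 0, 2],
    }
)
cleaned = clean_connections(raw)
print(cleaned)
```

In this sketch the duplicate Berlin row and the Cologne row without a price are removed, leaving two usable connections.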

Frontend

Because of our chosen TechLabs track (Data Science with Python), our main focus was the usability of the frontend rather than its design. We therefore had a basic web application in mind, in which the user enters the parameters of the desired connection: departure and arrival station as well as travel date and time. While looking for a suitable development tool, we came across Plotly/Dash, a popular open-source framework for building data science web apps that tie directly into our Python code. Although Plotly/Dash was not part of our TechLabs course, familiarizing ourselves with it did not take long, as we found it easy and intuitive to use.

Backend

After cleaning, our data set was well structured but not yet ready to be used in a neural network. Since the parameters of each train connection are mainly categorical values, we applied categorical encoding from the scikit-learn library. This allowed us to convert these parameters into numerical values.
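
As an illustration of categorical encoding with scikit-learn, the sketch below uses `OneHotEncoder`. This is one possible choice and not necessarily the encoder we used; the example rows and the three columns (destination, class, fare type) are assumptions for the sake of the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical connections: destination, travel class, fare type.
rows = np.array(
    [
        ["Berlin", "2nd class", "Sparpreis"],
        ["Hamburg", "1st class", "Flexpreis"],
        ["Berlin", "2nd class", "Super Sparpreis"],
    ],
    dtype=object,
)

# handle_unknown="ignore" encodes categories unseen during fitting as all zeros
# instead of raising an error at prediction time.
encoder = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(rows).toarray()

print(encoder.categories_)  # the categories learned for each column
print(encoded.shape)        # one output column per learned category
```

Each input column is expanded into one binary column per category, so every row contains exactly one 1 per original column. A neural network can consume this representation without assuming an artificial order between categories such as "Sparpreis" and "Flexpreis".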

Results of the project

Our current system, which connects the frontend and the backend, works as follows. A user chooses a DB connection from Muenster to one of the above-mentioned cities on a certain day and at a certain time. After the user has entered these parameters in our frontend, the current price of this connection is retrieved from DB’s website via a web scraper. Once acquired, this price and the parameters of the specific connection are automatically fed into the already trained neural network that acts as our backend.
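
The flow described above can be sketched roughly as follows. All names here (`ConnectionQuery`, `run_pipeline`, and the three stand-in functions) are hypothetical placeholders: the real system uses our web scraper, our fitted encoder, and the trained Keras network in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ConnectionQuery:
    """Parameters entered in the frontend (field names are assumed)."""

    destination: str
    date: str  # e.g. "2021-02-01"
    time: str  # e.g. "08:30"


def run_pipeline(
    query: ConnectionQuery,
    scrape_price: Callable[[ConnectionQuery], float],
    encode: Callable[[ConnectionQuery], Sequence[float]],
    model: Callable[[List[float]], object],
) -> object:
    """Scrape the current price, build the feature vector, query the model."""
    price = scrape_price(query)               # step 1: current price from DB's website
    features = list(encode(query)) + [price]  # step 2: encoded parameters plus price
    return model(features)                    # step 3: hand over to the trained network


# Stand-ins so the sketch runs without network access or a trained model.
def fake_scraper(query: ConnectionQuery) -> float:
    return 26.90


def fake_encoder(query: ConnectionQuery) -> List[float]:
    return [1.0 if query.destination == "Berlin" else 0.0]


def fake_model(features: List[float]) -> List[float]:
    # Returns its input unchanged so the assembled feature vector is visible.
    return features


result = run_pipeline(
    ConnectionQuery("Berlin", "2021-02-01", "08:30"),
    fake_scraper,
    fake_encoder,
    fake_model,
)
print(result)
```

Passing the scraper, encoder, and model in as functions keeps the three steps independent, which makes each of them easy to replace or test on its own.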

Concluding thoughts

Finally, we would like to thank the entire TechLabs team, especially the dedicated folks in Muenster. Thanks to you, we were given the opportunity to work on this exciting project and perhaps make the world a tiny bit fairer. To conclude, we would like to share our most important lessons (in no specific order), hoping that they will help you with your next Tech4Good project:

  • To motivate the entire team over the course of the project, it is important not to lose sight of the goal and to find relevant tasks for everyone depending on their level of knowledge.
  • Data cleaning and preparation should not be underestimated: it takes a lot of time and is the essential foundation for the success of a coding project.
  • A lot of helpful information is freely available on the internet, e.g. via GitHub, Stack Overflow, or YouTube.
  • Experienced mentors are especially helpful for difficult technical as well as interpersonal challenges.

The team

Daniel Schlegel, Data Science: Python

Mentor

Justin Hellermann
