“Doing Good with Bad Habits” Project Report

Inside.TechLabs
10 min read · May 21, 2020

Karoline Brugger, Marie-Sophie Dumont, Antonia Kellerwessel, Mahnoor Samee, Jan Hähl

Project Introduction

Have you ever stood outside smoking in the cold and rain while all your friends were inside, warm and having fun, and thought: ‘Wouldn’t it be nice to be with them right now?’ Or woken up the morning after a night out, surrounded by Big Mac wrappers, regretting the feast you indulged in on the way home from the club? Well, we have, and we tried to come up with a solution that helps us get rid of our bad habits while at the same time helping those in need. Our solution is the “Doing Good with Bad Habits” application.

We developed a solution where we define an action we consider a bad habit, which is then recognised based on the details of our bank transactions. Once the bad habit is executed and detected by our system, a predefined amount of money is sent to an NGO of our choice. This application makes it possible to donate money to hungry kids whenever we overindulge in fast food, donate to education when we watch too much Netflix, or support cancer research whenever we buy cigarettes. By coupling the bad habit with a financial contribution to a good cause, we hoped to encourage ourselves to cut back on our bad habits by punishing ourselves financially, while also aiding those in need whenever we feel a little overindulgence cannot be avoided.

The main challenges while developing this program were gaining access to highly protected banking data, developing a model that automatically detects predefined bad habits, and connecting it to the application so that it automatically triggers another transaction.

Methodology, Learnings & Results

In the following, we describe our methodology and the process of building the web application “Doing Good with Bad Habits”. We decided to create a web application instead of a mobile application, which would actually have been more suitable for our project, because we as the web development people wanted to build on the skills we had learned in the TechLabs track. We created a GitHub repository so everyone on the team could access the code and work on it separately.

Our project started with data collection, followed by data cleaning, development of the web application and training of the AI algorithm. Throughout these steps, the tools we learned in the online track provided by TechLabs enabled us to complete the tasks successfully.

Backend

The first step proved to be the most difficult one. Because the data we were trying to collect was subject to privacy concerns, we were not able to obtain any data from online sources. Thus, the data collection was mainly done internally by contributing our own banking transactions as .csv files. During the training of the algorithm, however, this proved to be insufficient input for the AI to recognise the transactions. So we turned to the TechLabs community and asked its members to contribute their anonymised banking transactions. This gave us 2935 additional data points, with which the first hurdle could be overcome.

In the next step, a universally applicable data cleaning process was developed. The main challenge was to build a script that transforms very diverse data into a format readable by the AI algorithm. This problem was compounded by the international nature of our team: with members from both Germany and Norway, the data, too, came in different languages and with different peculiarities. The number format in Germany, where decimals are separated by commas instead of periods, proved especially challenging. As part of the solution, we decided to prompt the user of our system for the name of their bank and clean the banking statement accordingly. We then built separate cleaning scripts for each bank and replaced commas with periods for German banks. We also accounted for other peculiarities in the banking data, such as repeated and random special characters in the transaction label, and deleted them. Overall, this process transformed the uploaded .csv file into a cleaned one, which in turn served as input for the AI team.
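The per-bank cleaning step can be sketched roughly as follows. This is a minimal illustration, not our actual scripts: the column layout, sample labels and the helper name are hypothetical.

```python
import csv
import io
import re

def clean_german_csv(raw_text):
    """Clean a German-style bank statement: semicolon-separated columns,
    comma as decimal separator, stray special characters in labels."""
    cleaned_rows = []
    reader = csv.reader(io.StringIO(raw_text), delimiter=";")
    for date, label, amount in reader:
        # German numbers use "." for thousands and "," for decimals:
        # "-1.234,50" becomes "-1234.50"
        amount = amount.replace(".", "").replace(",", ".")
        # Collapse repeated/random special characters in the label
        label = re.sub(r"[^A-Za-zÄÖÜäöüß0-9 ]+", " ", label).strip()
        cleaned_rows.append((date, label, float(amount)))
    return cleaned_rows

raw = "21.05.2020;MCDONALDS//BERLIN///;-8,99\n20.05.2020;REWE*MARKT;-23,10"
print(clean_german_csv(raw))
# [('21.05.2020', 'MCDONALDS BERLIN', -8.99), ('20.05.2020', 'REWE MARKT', -23.1)]
```

In the real pipeline, one such function exists per supported bank, and the user's bank choice decides which one runs.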

After some trials with different methods, the AI team decided to use a pretrained language model from the fast.ai library to approach the classification problem. After assessing different learning rates and trying different numbers of epochs, we finally decided to unfreeze the model and fit it for two cycles, which brought accuracy up to approx. 97%. This was a success, as previous iterations had jumped between accuracies of 50% to 80%, occasionally even delivering 100%, which was bound to be overfit. That said, we assume the final model is also quite overfit to the data provided.

Nonetheless, in order to reach this result, some hurdles had to be overcome. Firstly, the data did not cover “bad habits” to the extent we had hoped for, so we decided to use other transactions. The most frequent transactions in our banking data were those of supermarkets, the university canteen and purchases at 7-eleven. In order to build a data set the learner could appropriately learn from, we decided to focus on four classes: netto, lidl, spisestuerne and 7-eleven. The model was supposed to predict these classes from a data set containing the relevant banking transactions; other banking transactions were excluded for now. Because the data we received was not labelled, we labelled it manually, and to obtain a larger data set, we duplicated it several times.

Nonetheless, in the first iterations of running the learner, the results were very inaccurate. Many transactions were misclassified as ‘spisestuerne’, which we could analyse by assessing the confusion matrix and the top losses of the trained learner. After attempting to improve this by trying different learning rates and checking the labels and the structure of the data objects, we realised that the data we had provided to the model was completely imbalanced: there were approximately twice as many data objects for spisestuerne as for any of the other classes, and they were a lot more homogeneous and evenly distributed within the class. To make the learner better, we thus began to rebalance the data in a very manual process. We balanced the classes so that each contained a similar number of data objects, and we made sure that even within each class there was a similar number of transactions with the same syntax.

After the very poor accuracies at the beginning of training, the accuracy improved to the approx. 97% mentioned above, and the classifier could now classify most transactions correctly.
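Our rebalancing was done by hand, but the idea can be sketched in a few lines: downsample every class to the size of the smallest one. The helper and the sample strings below are illustrative, not our actual data.

```python
import random
from collections import defaultdict

def rebalance(samples, seed=42):
    """Downsample every class to the size of the smallest class,
    so the learner sees a similar number of objects per label."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, target))
    return balanced

# Imbalanced toy data: twice as many 'spisestuerne' objects as any other class
data = (
    [("Spisestuerne DTU 55,00", "spisestuerne")] * 40
    + [("NETTO Koebenhavn", "netto")] * 20
    + [("LIDL FILIALE 123", "lidl")] * 20
    + [("7-ELEVEN NOERREPORT", "7-eleven")] * 20
)
balanced = rebalance(data)
print(len(balanced))  # 80: four classes x 20 objects each
```

With the classes evened out this way, the learner no longer defaults to the majority class.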

Frontend

For the frontend, we wanted to create an appealing, minimalistic landing page inspired by the look of the N26 bank, which invites users to sign up / log in to our application or simply get more information about the organisation (see navbar on top).

We then linked this landing page to three other pages, namely a sign-up page, a file upload page and, lastly, the choice-of-charity page. The latter two are meant to be the personal pages users see once they are logged in. Regarding the integration of the backend, the file upload page is the most important, as it enables users to upload their banking data as a CSV file, look at the labelled data, and define a transaction as a bad habit.

To have a predefined framework to work with in our code, we chose the Bootstrap library, which was explained in detail in the TechLabs track. Besides the HTML structure and CSS to style our pages, we used JavaScript to enable animations on our sign-up page.

The animated sign-up / log-in page, which switches between the two views, was really difficult for us to create due to the complexity of the code. We used a tutorial to come up with the idea of switching between the two windows; however, we really needed to understand the code in order to write our own version of it.

On the file upload page, we created a placeholder table of exemplary bank transactions to get an idea of how the uploaded and labelled data should look. As this is only a first prototype, in the final version of our application this table will be generated automatically when a banking data file is uploaded. After the connection to the backend succeeded, we added a submit button and an option to choose the relevant bank, so that the model knows which data-cleaning template to use.

The choice-of-charity page enables users to look for a charity with a search bar or choose one from a predefined library of NGOs. If their desired charity is not in there, they should be able to add it themselves.

Regarding the frontend, our project has certain limitations. Due to our limited knowledge from the e-learning track and the scope of the project, we were not able to come up with a completely working solution where users can sign up and log in with their stored credentials, upload a banking file and have it displayed the way we designed it in the placeholder. In a fully working solution we would also add a personal dashboard for each user, so they can track the development of their bad-habit expenditures, as well as options to configure the application to their personal needs, e.g. inviting friends and earning rewards, donating to a friend for a certain time instead of a charity, or location-based notifications when they approach a location associated with their bad habits.

Connecting & Integrating

Putting together the frontend and backend is a key challenge in modern web development and also posed some difficulties for us. We developed the two parts independently of each other; this separation of concerns made sense, as both teams could work in parallel. To connect the frontend (written in HTML, CSS and JavaScript) and the backend (which consists of two Python scripts, one to clean and one to predict), we used the Flask framework. Flask is very easy to get started with but still provides all the functionality required to handle the routing and the data upload.
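The glue layer can be sketched roughly as below. This is a hedged illustration, not our actual code: the route name, form fields and the two placeholder helpers stand in for our real cleaning and prediction scripts.

```python
from flask import Flask, request

app = Flask(__name__)

def clean_statement(raw_csv, bank):
    # Placeholder for the bank-specific cleaning script
    return raw_csv.strip().splitlines()

def predict_labels(rows):
    # Placeholder for the trained classifier
    return [(row, "unlabelled") for row in rows]

@app.route("/upload", methods=["POST"])
def upload():
    # The frontend form sends the chosen bank plus the CSV file;
    # the backend cleans, classifies and returns the labelled rows.
    bank = request.form.get("bank", "unknown")
    raw_csv = request.files["statement"].read().decode("utf-8")
    rows = clean_statement(raw_csv, bank)
    labelled = predict_labels(rows)
    return {"bank": bank, "transactions": labelled}
```

The frontend's upload form simply posts to this route, and the returned JSON is used to fill the transactions table.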

Conclusion

Even though we believe in our project and are proud of the prototype we have developed, the road to the end product was not an easy one, as we faced multiple challenges.

One administrative challenge we all faced was the time management needed to keep up with the TechLabs track. The tracks comprised up to 55 hours of online courses aimed at supplying us with the tools needed to complete such a project, and while they were incredibly useful, they still called for superior time management to finish the project alongside studying and working. Overall, the tracks also took much more time than stated in the outline, as you need additional time to understand the concepts, take notes and do the exercises.

During the data cleaning process we faced some challenges that are core to programming. At times, it seemed like the process was never-ending, with more and more problems coming up. The moment of success for the Data Science team came when, after combining the data cleaning processes for the different banking statements, the Python script successfully cleaned the data regardless of the bank. The multiple challenges of dealing with those heterogeneous data sources were finally overcome, and the work we had put in paid off at last.

Regarding the frontend, the main challenge for our project team was connecting the backend to the frontend so that a file could be uploaded, cleaned and analysed by the model, because none of us had covered this topic in our tracks. With the help of the TechLabs mentors we figured out a solution; however, it would have been nice to get extra e-learning on this topic, as it is one of the most important features of our application.

Although the result does not match the original scope of the project, due to the issues we faced and the fact that the initial goal was very ambitious, we are happy with what we have learned and glad to present a working web app that can accurately predict the four provided classes. For the future, a catch-all class for other transactions has to be added, the range of transactions extended, many more transactions classified, and different types of banking data included. Data from supermarkets, app usage and other sources would also have to be integrated to reliably identify whether a bad habit was followed. Moreover, an actual connection to banking data and transactions would have to be established, both to eliminate the slightly cumbersome upload of a CSV file and to execute the money transfers to the relevant NGOs.

Overall, the project was a good way to practice the things we have learned in our individual tracks and the whole TechLabs semester was a highly useful experience we all learned a lot from!

Project Team

Our team is made up of the following five people with their respective TechLabs tracks:

Karoline Brugger: Python Data Science Track

LinkedIn

Marie-Sophie Dumont: Web Design

LinkedIn

Jan Hähl: AI Track

LinkedIn

Antonia Kellerwessel: AI Track

LinkedIn

Mahnoor Samee: Web Design

LinkedIn

Luckily, every member of the team is currently studying in an IT-related field, such as E-Business and IT Digitalization. This helped us find common ground and enabled us to get into the topic more easily and quickly. Overall, the many hackathons and group meetings proved both productive and enjoyable, which we hope is also reflected in our results.

