Boom and Bust — Is XGBoost the new Nostradamus of financial bubbles?
This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (winter term 2021/22).
Ever since the first stock exchange opened in the 16th century, investors have been trying to predict price developments and profit from them. However, there is little empirical evidence that common statistical time series models can predict bubbles and crashes. We therefore implemented a relatively new machine learning algorithm, XGBoost, to predict the daily adjusted closing prices of a single stock based on the data of the last N days. The implementation is written in Python using standard packages such as NumPy, Pandas, Matplotlib, Seaborn, scikit-learn and XGBoost. We successfully replicated a common stock price prediction approach and achieved good results on a selected Yahoo Finance dataset.
Ever since the first stock exchange opened in the 16th century, investors (private individuals as well as institutions) have been trying to predict price developments and profit from them. The desire for precise predictions is particularly strong in times of large price movements. However, there is little empirical evidence that common time series models, such as the ARIMA model, can predict bubbles and crashes.
Common statistical models therefore cannot end the search for the Holy Grail. Machine learning (ML) algorithms such as XGBoost could be considerably more precise. Here, the goal is to predict the daily adjusted closing price of a stock index or an individual stock from the data of the previous N days. In simple terms, the model looks for patterns in the past that could carry over into the future. To evaluate the performance of XGBoost properly, a forecast at a single point in time is not enough. Instead, predictions are made at several different points in time within the dataset, and the results are averaged.
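To make "based on the data of the previous N days" concrete, here is a minimal sketch (our own illustration, not code from the replicated tutorial) of turning a price series into a supervised learning problem. Each feature row holds the previous N adjusted closing prices and the target is the next day's price. The function name `make_lag_features` and the use of raw lagged prices as the only features are assumptions; the tutorial's actual feature engineering may differ.

```python
import numpy as np


def make_lag_features(prices, n_lags):
    """Build (X, y) where each row of X holds the n_lags prices preceding y.

    Illustrative sketch only: raw lagged prices are used as features.
    """
    prices = np.asarray(prices, dtype=float)
    if n_lags < 1 or n_lags >= len(prices):
        raise ValueError("n_lags must be between 1 and len(prices) - 1")
    # Row i covers prices[i : i + n_lags]; its target is prices[i + n_lags].
    X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
    y = prices[n_lags:]
    return X, y


X, y = make_lag_features([10.0, 11.0, 12.0, 13.0, 14.0], n_lags=3)
# X has rows [10, 11, 12] and [11, 12, 13]; y is [13, 14]
```

A regressor trained on such (X, y) pairs learns a mapping from the last N prices to the next one. This is what "looking for patterns in the past" means in practice.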
In our project, we experimented with the XGBoost algorithm to forecast stock prices. Since our project team shrank from five to two members, we closely followed these resources to still achieve our project goals:
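Forecasting Stock Prices using XGBoost: https://towardsdatascience.com/forecasting-stock-prices-using-xgboost-a-detailed-walk-through-7817c1ff536a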
One member of the final team took part in the Python Data Science track and the other in the Artificial Intelligence track, so Python was our main programming language. We used several Python packages to process our datasets and carry out our analyses: NumPy for mathematical operations; Pandas for reading and saving the datasets and for handling the data in the notebooks in general; Matplotlib for the first visualizations; Seaborn (for example its heatmaps) for the more advanced analyses and visualizations; scikit-learn (Sklearn) for preprocessing the data and calculating the evaluation metrics; and the XGBoost package to implement the XGBoost algorithm itself.
As development environment we used Jupyter Notebook, which provides an intuitive, easy-to-use interface for data analysis. The code and the notebooks were managed in a GitHub repository. Every team member cloned the repository locally, worked independently on their own computer and pushed their changes, which the other team member could then pull into their local environment before continuing with their tasks. The initial project management was done in Notion with templates from the TechLabs management team. For all other communication and coordination, as well as for the regular online team meetings, we used our team channel in the TechLabs community Slack workspace.
For the project we used a dataset from Yahoo Finance: https://finance.yahoo.com/quote/VTI/. In a first step, we conducted an exploratory data analysis. After loading the data with the Python package Pandas, it looked like this:
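Loading such a file with Pandas takes only a few lines. The sketch below assumes the data was downloaded as a CSV from the "Historical Data" tab on Yahoo Finance, which uses the columns Date, Open, High, Low, Close, Adj Close and Volume. The file name "VTI.csv", the function name `load_prices` and the snake_case renaming are our own illustrative choices, and the two sample rows are synthetic numbers, not real VTI prices.

```python
import io

import pandas as pd


def load_prices(source):
    """Read a Yahoo Finance historical-data CSV into a date-sorted DataFrame.

    `source` can be a file path (e.g. the hypothetical "VTI.csv") or any
    file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source, parse_dates=["Date"])
    # Lowercase snake_case names ("Adj Close" -> "adj_close") are easier to type.
    df.columns = [c.lower().replace(" ", "_") for c in df.columns]
    # Make sure the rows are in chronological order for time series work.
    return df.sort_values("date").reset_index(drop=True)


# Synthetic rows for illustration only (deliberately out of order).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2021-01-05,100.0,102.0,99.0,101.0,100.5,1000\n"
    "2021-01-04,98.0,100.0,97.0,99.0,98.5,1200\n"
)
df = load_prices(sample_csv)
print(df.head())
```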
Below is a plot of the adjusted closing price in the entire data set:
The exploratory data analysis showed that the dataset was already of good quality, so only a few preprocessing steps were necessary:
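Whether a price dataset is "of good quality" can be checked with a few generic tests: missing values, duplicate dates, non-positive prices and rows where the daily high is below the low. The sketch below is a hypothetical helper (`quality_report`) assuming lowercase snake_case column names (`date`, `open`, `high`, `low`, `close`, `adj_close`). It is a generic example, not necessarily the checks the project performed, and the small DataFrame contains synthetic values.

```python
import numpy as np
import pandas as pd


def quality_report(df, date_col="date",
                   price_cols=("open", "high", "low", "close", "adj_close")):
    """Count common data-quality issues in a daily price table."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_dates": int(df[date_col].duplicated().sum()),
        # NaN compares as False here, so missing values are not double-counted.
        "non_positive_prices": int((df[list(price_cols)] <= 0).sum().sum()),
        "high_below_low": int((df["high"] < df["low"]).sum()),
    }


# Synthetic example with one missing value and one duplicated date.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-05"]),
    "open": [98.0, 100.0, 100.0],
    "high": [100.0, 102.0, 102.0],
    "low": [97.0, 99.0, 99.0],
    "close": [99.0, 101.0, 101.0],
    "adj_close": [98.5, np.nan, 100.5],
    "volume": [1200, 1000, 1000],
})
report = quality_report(df)

# One simple cleanup strategy: drop incomplete rows, then duplicate dates.
clean = df.dropna().drop_duplicates(subset=date_col) if (date_col := "date") else df
```

Dropping rows is only one option. For price series, forward-filling missing values (`df.ffill()`) is a common alternative when gaps are short.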
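To perform the forecast with the XGBoost algorithm, the dataset was split into training and validation data. The model was trained on the training set, and the hyperparameters were tuned on the validation set using moving-window validation as explained here. In moving-window validation, the model is repeatedly trained on a window of past observations and evaluated on the days immediately following that window. The window is then shifted forward in time, and the errors of all windows are averaged.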
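The following is a minimal sketch of moving-window (walk-forward) validation combined with a simple grid search, as we understand the technique. It is not the replicated tutorial's code. `X` holds one feature row per day (for example the previous N prices), and `y` holds the corresponding next-day prices, both in chronological order. The helper names, window sizes and parameter grid are hypothetical. `xgb_fit_predict` uses the real `xgboost.XGBRegressor` estimator, but imports it lazily so that the remaining helpers also run without XGBoost installed.

```python
import numpy as np


def moving_window_splits(n_samples, train_size, horizon, step):
    """Yield (train_idx, test_idx) pairs for moving-window validation.

    Each split trains on `train_size` consecutive samples and tests on the
    `horizon` samples that directly follow them; the window then moves
    forward by `step` samples. Samples must be in chronological order.
    """
    start = 0
    while start + train_size + horizon <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + horizon)
        yield train_idx, test_idx
        start += step


def moving_window_score(X, y, fit_predict, metric, train_size, horizon, step):
    """Refit on every window and return the metric averaged over all windows."""
    scores = []
    for train_idx, test_idx in moving_window_splits(len(y), train_size, horizon, step):
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(metric(y[test_idx], y_pred))
    if not scores:
        raise ValueError("series too short for the chosen window sizes")
    return float(np.mean(scores))


def xgb_fit_predict(X_train, y_train, X_test, **params):
    """Fit an XGBoost regressor on one window and predict the following days."""
    # Imported lazily so the helpers above also work without xgboost installed.
    from xgboost import XGBRegressor

    model = XGBRegressor(objective="reg:squarederror", **params)
    model.fit(X_train, y_train)
    return model.predict(X_test)


def tune(X, y, param_grid, metric, train_size, horizon, step,
         fit_predict=xgb_fit_predict):
    """Return the parameter set with the lowest average moving-window error."""
    best_params, best_score = None, np.inf
    for params in param_grid:
        score = moving_window_score(
            X, y,
            lambda X_tr, y_tr, X_te: fit_predict(X_tr, y_tr, X_te, **params),
            metric, train_size, horizon, step,
        )
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With XGBoost installed, a grid could look like `[{"n_estimators": 100, "max_depth": 3}, {"n_estimators": 300, "max_depth": 5}]`. These values, like the window sizes, are placeholders rather than the project's settings. Because the model is refit for every window, the averaged error reflects forecasts made at many different points in time instead of a single one.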
To evaluate the effectiveness of the XGBoost model, we used the root mean squared error (RMSE), the mean absolute percentage error (MAPE) and the mean absolute error (MAE) as evaluation metrics. As in the resources mentioned above that we replicated, our predictions achieved good results:
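For reference, the three metrics can be written in a few lines of NumPy. RMSE and MAE are expressed in the units of the price, while MAPE is a percentage and is undefined when a true value is zero. This is a generic sketch of the standard definitions, not the project's code.

```python
import numpy as np


def _as_arrays(y_true, y_pred):
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more strongly."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors in price units."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero prices)."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Two forecasts that are each off by 10: RMSE and MAE are 10, MAPE is about 7.5%.
print(rmse([100, 200], [110, 190]), mae([100, 200], [110, 190]), mape([100, 200], [110, 190]))
```

scikit-learn offers `mean_squared_error`, `mean_absolute_error` and (from version 0.24) `mean_absolute_percentage_error` in `sklearn.metrics`. Note that the latter returns a fraction rather than a percentage.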
In summary, we were able to replicate the XGBoost approach from https://towardsdatascience.com/forecasting-stock-prices-using-xgboost-a-detailed-walk-through-7817c1ff536a for our first stock price forecast, and this approach already led to good results. In the future, further machine learning algorithms should be developed and tested on different stock price datasets to improve stock price prediction even further.
Philipp Fukas (Artificial Intelligence)
M. S. (Data Science: Python)
Roles inside the team
Philipp Fukas was mainly responsible for the technical setup of the GitHub repository, writing this blog post, and the technical implementation and evaluation of the XGBoost model.
M. S. was mainly responsible for the presentation and communication of the interim project status and the project closing as well as for data acquisition and preprocessing.