Reference Class Forecasting in M&A

Inside.TechLabs
8 min read · Jul 9, 2020


Robertas Kazbaras, Sebastian Buhl, Edward Ahlzén Markai, Andres Uhre Guldfeldt

Introduction

We set out to build a project that can find and evaluate reference classes of previous M&A deals for a given company. During the due diligence process in mergers and acquisitions, companies do not effectively use historical data about similar cases to inform the decision. Professional accounting and advisory firms working in M&A agree that more than half of all acquisitions "fail", with reported failure rates ranging between 60% and 90%. Our project addresses this issue with a web application that lets the user enter the key information about the acquisition target. Using a database of historical M&A deals, the application then builds a Reference Class of the 30 deals most similar to the target company based on the entered information. Finally, we present the summary statistics of the Reference Class and list all of its deals so the user can inspect them further.

After the project kick-off, we started with a planning phase. We spent a considerable amount of time reading about Reference Classes and M&A deals, learning about their structure, selection, and evaluation. Once we had a fairly clear vision, we started looking for a database of M&A deals and discussing which deal variables mattered most for our project.

Database

The first challenge we came across was where to find the required data. We knew from the start that this project would require a lot of data points to create relevant Reference Classes and to evaluate them. While searching for a user-friendly interface suited to efficient web scraping, we came across CBS' databases. Among them we found Zephyr, an external program for which CBS holds a license. The database offers comprehensive M&A data with detailed integrated company information. It seemed to be the best choice for our project, as it contains around 2 million M&A deals to search through. When entering Zephyr, you have to agree to use it only for academic research and not to expose the data to third parties. We therefore contacted CBS and received their permission to use the database for the project. With that, the scraping of the data began.

Scraping Data

Since we had decided to scrape Zephyr, we knew the page requires logging in through CBS. While building the scraper, we struggled to get past the password prompt guarding access to Zephyr. To make the best use of our time, we pursued two approaches in parallel: Selenium and Scrapy. Using Selenium's web driver and its get method, we managed to automate the login and gain access to Zephyr. The next step was to scrape the right data.
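
A minimal sketch of what that login automation can look like. The URL, element IDs, and credentials below are placeholders, since the real CBS/Zephyr login form uses its own field names:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Placeholder URL and element IDs -- the actual CBS/Zephyr login form differs.
driver = webdriver.Chrome()
driver.get("https://zephyr.bvdinfo.com")  # hypothetical entry URL

driver.find_element(By.ID, "username").send_keys("cbs_user")      # placeholder credentials
driver.find_element(By.ID, "password").send_keys("cbs_password")  # placeholder credentials
driver.find_element(By.ID, "login-button").click()

# From here the driver holds an authenticated session and can navigate
# to Zephyr's deal search pages.
```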

The purpose of the Selenium scraper was to retrieve all the deals from the Zephyr database and then clean the data to achieve our desired output. There were over 2 million deals to scrape. Our initial plan was to scrape each individual deal page containing the relevant information. However, with that number of pages, the scraping process would simply take too long. We therefore changed the approach and acquired the data in the table-like format Zephyr also offers. In addition, we used two filters provided by Zephyr to reduce the number of unwanted deals: we scraped only completed deals with an acquisition stake of at least 20%.
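
Reading the table view row by row looks roughly like the sketch below, continuing with the driver from the login sketch. The CSS selector is illustrative; Zephyr's real markup differs:

```python
from selenium.webdriver.common.by import By

# Pull deals from the results table instead of opening every deal page.
deals = []
rows = driver.find_elements(By.CSS_SELECTOR, "table.results tr")
for row in rows[1:]:  # skip the header row
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    deals.append(cells)

# In practice this loop repeats per results page until the pagination
# control runs out.
```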

Even though Zephyr holds 2 million deals, many of them are missing information that Reference Class forecasting needs. We therefore had to choose our variables carefully, including only those that are both useful and available in most deals. The variables we chose to scrape were: Deal type, Deal status, Deal Value, Completion date, Acquiror name, Acquiror Country Code, Target name, Target Country Code, Target Operating Revenue/Turnover, Target EBITDA, Target Business Description, Target Net Profit, and Deal Enterprise value.

Data Cleaning and Preprocessing

To speed up the scraping, the data was gathered by running a few scrapers simultaneously, each starting at a different place in the Zephyr dataset. The first step was therefore to load the data with Pandas, concatenate the dataframes, and drop the deals that several scrapers had picked up twice. We then made sure variables such as Deal type, Deal status, Deal Value, and Target Business Description contained no missing values. The next step was to filter Deal status down to the values "Completed" and "Completed Assumed", and Deal type down to "Acquisition", "Merger", and "Joint-venture". Afterwards, a few aspects of the dataframe were tweaked to prepare it for further use. We had underestimated the number of missing values in the database: after cleaning we were left with around 40k deals, which was still enough to create relevant Reference Classes.
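
The cleaning pipeline condenses to a few Pandas calls. The file names below are illustrative; the column labels follow the scraped variables listed above:

```python
import pandas as pd

# Combine the outputs of the parallel scrapers and drop overlap.
parts = ["zephyr_part1.csv", "zephyr_part2.csv", "zephyr_part3.csv"]  # illustrative names
df = pd.concat([pd.read_csv(p) for p in parts], ignore_index=True)
df = df.drop_duplicates()  # deals picked up by more than one scraper

# Require the fields that the Reference Class logic depends on.
df = df.dropna(subset=["Deal type", "Deal status", "Deal Value",
                       "Target Business Description"])

# Keep only the statuses and deal types we care about.
df = df[df["Deal status"].isin(["Completed", "Completed Assumed"])]
df = df[df["Deal type"].isin(["Acquisition", "Merger", "Joint-venture"])]
```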

Creating Reference Classes

To find the list of deals most similar to the target company, we created two new variables: Keyword count and Variable difference (%). For Keyword count, the user enters a number of keywords that describe the business most accurately, similar to the company's industry. The program takes these keywords and checks, one by one, whether each deal's Target Business Description contains them. Variable difference (%) compares the key measures of the deal's performance to the target company. Four variables are used for this: Target Operating Revenue/Turnover, Target EBITDA, Target Net Profit, and Deal Enterprise value. For each value the user fills in, the program compares it with the corresponding value in the deal database and assigns a weight to the measure; the contributions of all user-entered variables are then summed into the new measure, Variable difference (%). Overall, a two-level sorting mechanism is used: deals are first sorted in descending order by Keyword count, then in ascending order by Variable difference (%). The first 30 deals are taken from the dataframe, which concludes the Reference Class for the target company.
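
The helper below is our illustration of that ranking rather than the project's exact code; the column names follow the scraped variables:

```python
def build_reference_class(df, keywords, inputs, n=30):
    """Rank deals by Keyword count (descending), then Variable
    difference (%) (ascending), and return the top n.

    keywords : list of strings describing the target business
    inputs   : dict mapping a column name such as "Target EBITDA" to the
               user's value; contains only the variables the user filled in
    """
    df = df.copy()
    desc = df["Target Business Description"].str.lower()

    # Keyword count: how many of the user's keywords appear in the description.
    df["Keyword count"] = sum(
        desc.str.contains(kw.lower(), regex=False).astype(int) for kw in keywords
    )

    # Variable difference (%): summed relative distance on the entered measures.
    diff = 0.0
    for col, value in inputs.items():
        diff = diff + (df[col] - value).abs() / abs(value) * 100
    df["Variable difference (%)"] = diff

    # Two-level sort, then keep the closest deals.
    ranked = df.sort_values(["Keyword count", "Variable difference (%)"],
                            ascending=[False, True])
    return ranked.head(n)
```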

Visuals

As all members of the team had taken the data science track, none of us had any experience in web development. After consulting the TechLabs mentors, we decided that Flask was the best option for creating a decent frontend, as it offers a set of powerful tools while keeping the approach simple. First, we made an input page where the main information about the target company can be typed in: keywords for the target business description, Target Operating Revenue/Turnover, Target EBITDA, Target Net Profit, and Deal Enterprise value.
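
A sketch of how such a Flask input route can hand the form over to the ranking logic. The template names, form fields, and file name are placeholders, and build_reference_class is the helper sketched earlier:

```python
import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)
deals_df = pd.read_csv("deals_clean.csv")  # the cleaned Zephyr extract (illustrative name)

FIELDS = {"revenue":    "Target Operating Revenue/Turnover",
          "ebitda":     "Target EBITDA",
          "net_profit": "Target Net Profit",
          "ev":         "Deal Enterprise value"}

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        keywords = [k.strip() for k in request.form["keywords"].split(",")]
        inputs = {}
        for field, column in FIELDS.items():
            value = request.form.get(field, type=float)
            if value is not None:  # compare only the measures the user entered
                inputs[column] = value
        ref_class = build_reference_class(deals_df, keywords, inputs)
        return render_template("results.html", table=ref_class.to_html())
    return render_template("input.html")
```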

After filling in the desired information about the target company and clicking "submit", the application redirects the user to the result page. As a first outcome, the page shows a boxplot of the five main numeric variables: Deal Value and the four previously mentioned variables used to create the Reference Class. To make it easier to read, we used the describe method on the Pandas dataframe to compute summary statistics of the variables in the boxplot and portrayed them in a table. Finally, we list the 30 companies, so that the user can handpick which ones to focus on for further analysis and see the individual measures of the deals that make up the Reference Class and feed the boxplot and statistical overview. The table includes all the variables mentioned in the scraping part of this report.
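
The result-page visuals boil down to a pandas boxplot and a describe() table, roughly as below, assuming ref_class is the dataframe returned by the ranking helper:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display on the server
import matplotlib.pyplot as plt

# The five numeric variables shown in the boxplot and summary table.
numeric_cols = ["Deal Value", "Target Operating Revenue/Turnover",
                "Target EBITDA", "Target Net Profit", "Deal Enterprise value"]

ref_class[numeric_cols].plot(kind="box")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.savefig("static/boxplot.png")  # served from Flask's static folder

# count, mean, std, min, quartiles, max -- embedded in the results template.
summary_html = ref_class[numeric_cols].describe().to_html()
```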

Limitations

Our first idea was considerably more ambitious: distributions and football field charts illustrating the estimated cost of the target based on the historical deals. As we moved on, we realized this was too complicated, partly because of the complexity of the Zephyr website, but mainly because much of the deal data was simply not available, which made the original idea hard to complete. The plan included a statistical model estimating, at a given significance level, the price range in which the target would be sold based on the values we chose. We built and tested a regression, but it was not useful, so we decided not to go that way. After numerous trials and discussions, we chose to use the data differently. We started by cleaning all the data and assessing what would be possible within our timeframe. Instead of a regression, we arrived at a boxplot and a table of the thirty most similar deals, keeping our filtering of the data by the Keyword count and Variable difference variables. After consulting a few M&A specialists, we concluded that a more advanced analysis of the deals is not essential: specialists in different firms often have their own approaches to evaluating historical deals, and building a program that performs such complex analysis for them would be beyond the scope of our knowledge. The construction and presentation of the Reference Class is therefore in itself the most valuable part of the project, making the deal selection process considerably more efficient.

Conclusion

Our finished project is a Flask web application with a simple setup: an input page and an output page. It compares the entered information with Zephyr data on prior deals to match the target with a similar "Reference Class", a list of 30 deals. The output page shows descriptive statistics on key values, illustrating how previous deals have been valued in terms of Enterprise value and the other measures. The program is best applied after the financial analysis and forecast, to indicate how much decision-makers should trust that forecast; this approach is also known as "the outside view". As touched upon in the limitations, a Reference Class that selects the deals most similar to the target is a substantial contribution to efficiency in M&A analysis, because it automates the selection of comparable deals.

Throughout this project the team members learned new things on a daily basis and overcame difficult challenges as the project progressed. Reference Class forecasting gave us a great opportunity to learn and navigate Python within data science, and we gained both theoretical and practical knowledge. The journey has inspired all of us to achieve more in programming; several members are already implementing automation at their firms, which shows our continued interest in this area.

GitHub repository

Team members' LinkedIn profiles:
