SEA vs SEO Analysis — Quantification of Cannibalization Effects

This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (summer term 2021).


Search engine marketing (SEM) is taking up an increasingly large share of many companies’ online marketing budgets. The growing costs of SEM are also driving a greater need for optimization, especially with respect to the search engine advertising (SEA) budget. In our project we cooperated with a medium-sized German company in the pharmaceutical and cosmetics industry.

The main focus of our project was to quantify the so-called search engine cannibalization effect for our project partner’s main brand and, based on that, to deliver optimization recommendations. Search engine cannibalization describes the scenario where users click on a costly paid link even though they would have clicked on the organic link if the paid ad had not been displayed.

Every day Google receives more than 3.5 billion search queries and helps users find answers to all kinds of questions, ranging from “how to save a life” to “how to remove fat stains” and “when does my local hairdresser open”. Hence, it is no surprise that search engine marketing has become one of the most important advertising channels. Search engine marketing (SEM) includes search engine advertising (SEA) and search engine optimization (SEO).

While SEA describes paying for ads to be positioned in the ad section of the search results page, SEO covers the optimization of a website to improve its position among the organic results. Unlike SEA, SEO does not result in any extra monetary cost, so companies, including our project partner, might wonder: “Does it make sense to engage in both SEO and SEA?” The answer to that question depends on how many of the people who click on the ad would also click on the organic link if the ad were not present, i.e. how many clicks are cannibalized, and how many of them would not click at all, i.e. how many SEA clicks would be lost.

These questions are highly relevant because they offer a large savings potential, but it is less obvious how they can be answered. Consequently, the main focus of our project was to find out how the cannibalization effect can be detected and quantified in order to optimize the SEA budget allocation based on the results. As it is relevant not only how large the cannibalization effect is but also why it might be larger for some keywords than for others, another goal of our project was to find out what the main drivers of cannibalization are.

As a first step towards answering our questions, we determined what kind of data is necessary for our analysis. To quantify the cannibalization effect and make a clean comparison, there has to be performance data for each keyword from a period with paid ads and from a period without them.

Unfortunately, such data was not yet available for the main brand of our project partner. Therefore, our second step was to set up a data generation process for them, and we suggested a so-called A/B test. The A setting, i.e. the control setting, is the scenario where both SEA ads and SEO links are present, while B is the treatment setting where only SEO links are present. To quantify cannibalization, one can compare the results from setting A and setting B. However, in order to get the desired results, it has to be ensured that the populations in both settings are as similar as possible to reduce the influence of confounding variables such as age or gender. An ideal solution to this problem is a repeated measures design, in which the population always consists of the same people but the control and treatment settings are alternated weekly.
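The weekly alternation described above can be sketched in a few lines of Python. This is only an illustration: the start date and the number of weeks are hypothetical, and the helper name `alternating_schedule` is ours, not part of the actual setup.

```python
from datetime import date, timedelta


def alternating_schedule(start, n_weeks):
    """Return (week_start, setting) pairs that alternate weekly.

    "A" is the control setting (SEA + SEO), "B" the treatment setting (SEO only).
    """
    schedule = []
    for week in range(n_weeks):
        setting = "A" if week % 2 == 0 else "B"
        schedule.append((start + timedelta(weeks=week), setting))
    return schedule


# Hypothetical start date and duration, for illustration only.
schedule = alternating_schedule(date(2021, 5, 3), n_weeks=4)
for week_start, setting in schedule:
    print(week_start.isoformat(), setting)
```

Alternating the settings week by week, rather than running one long block of each, keeps both settings exposed to similar seasonal conditions.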

Finally, it is also crucial to choose the right sample of keywords to test. Utilizing the available data from our project partner, we selected a sample of keywords based on several characteristics. First, it made sense to choose the keywords with the highest SEA spend in the past, as those are the ones with the highest potential savings. This resulted in a sample of 67 keywords. Since identifying the drivers of cannibalization requires an adequate distribution of keyword characteristics, we also evaluated the distribution of our sample with respect to the variables that, according to research, influence cannibalization. These include the SEO and SEA position, the type of keyword, i.e. whether a keyword contains the brand name (branded keyword) or not (generic keyword), and the competitiveness, e.g. the number of companies advertising for the same keyword or how frequently they do so. As all these characteristics were adequately distributed, we settled on the 67 keywords as the sample to test.
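As a rough sketch of this selection step, the snippet below ranks keywords by past SEA spend and then checks how the keyword types are distributed in the resulting sample. The keyword records, the spend figures and the sample size of 3 are invented for illustration and are not our project partner’s data.

```python
from collections import Counter

# Hypothetical records: (keyword, past SEA spend in EUR, keyword type).
keywords = [
    ("brand x cream", 1200.0, "branded"),
    ("dry skin cream", 950.0, "generic"),
    ("brand x lotion", 400.0, "branded"),
    ("hand cream", 80.0, "generic"),
    ("brand x shop", 30.0, "branded"),
]


def top_spend_sample(records, n):
    """Return the n records with the highest past SEA spend."""
    return sorted(records, key=lambda record: record[1], reverse=True)[:n]


sample = top_spend_sample(keywords, n=3)

# Check whether both keyword types are represented in the sample.
type_counts = Counter(kw_type for _, _, kw_type in sample)
print(sample)
print(type_counts)
```

The same kind of count can be repeated for the other characteristics, such as binned SEO and SEA positions, before fixing the sample.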

After setting up our experiment, the next question was how to evaluate the results. How can we quantify the cannibalization effect, and how can we base managerial recommendations on the results? To illustrate this, let us take a look at a made-up example for a single keyword.

In this example, the total number of clicks decreased only from 375 (200 SEO clicks + 175 SEA clicks) in the control period to 315 in the treatment period, even though the SEA clicks dropped from 175 to zero because we stopped bidding on the keyword in the treatment period.

This is due to the sharp increase in the number of SEO clicks from 200 to 315, which indicates that a significant number of the 175 SEA clicks from the control period have migrated to the organic link. However, a small fraction of this increase in SEO clicks can also be attributed to the slight increase in the number of organic impressions.

To compute the cannibalization effect, we take the difference in SEO clicks between the two periods and control for the effect induced by the increase in organic impressions. Of the 115 additional SEO clicks, 15 are attributed to the higher number of impressions, so 100 out of the 175 SEA clicks (about 57 %) have been captured by the organic search result. The remaining 75 SEA clicks are lost.
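The calculation for the made-up example can be sketched as follows. Note the assumptions: the text does not spell out how the impression effect is controlled for, so this sketch simply scales the control-period SEO clicks by the ratio of organic impressions (i.e. it assumes a constant organic click-through rate). The impression counts of 4,000 and 4,300 are hypothetical and chosen so that the result matches the numbers of the example.

```python
def cannibalization(seo_clicks_a, sea_clicks_a, seo_clicks_b,
                    seo_impressions_a, seo_impressions_b):
    """Estimate cannibalized and lost SEA clicks for one keyword.

    A = control period (SEA + SEO), B = treatment period (SEO only).
    Assumption: without cannibalization, SEO clicks would grow in
    proportion to organic impressions.
    """
    # SEO clicks we would expect in B from the impression change alone.
    expected_seo_b = seo_clicks_a * seo_impressions_b / seo_impressions_a
    # Extra SEO clicks beyond that are treated as migrated SEA clicks.
    cannibalized = seo_clicks_b - expected_seo_b
    lost = sea_clicks_a - cannibalized
    rate = cannibalized / sea_clicks_a
    return cannibalized, lost, rate


# Click numbers from the example; impression numbers are hypothetical.
cannibalized, lost, rate = cannibalization(
    seo_clicks_a=200, sea_clicks_a=175, seo_clicks_b=315,
    seo_impressions_a=4000, seo_impressions_b=4300,
)
print(cannibalized, lost, round(rate, 3))  # 100.0 75.0 0.571
```

Other ways of controlling for impressions are possible; the proportional scaling used here is merely the simplest one.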

Having computed the cannibalization effect in both absolute and relative terms, the remaining question is how to optimize the SEA budget based on these results. Should our project partner stop bidding on keywords where a large number of organic clicks are cannibalized by the paid links, or is it profitable to continue bidding on such keywords?

To make a decision, we need to quantify how much money we would have to pay to bring back the 75 lost clicks. If we started bidding on the keyword again, we would again have to pay 0.50 € per click for all 175 SEA clicks, i.e. 175 * 0.50 € = 87.50 € in total. Therefore, we would have to pay 87.50 € / 75 ≈ 1.17 € for each click recovered. This KPI is particularly useful because the search engine marketing team can make decisions based on the same metric that is used to decide on the bid amount for individual keywords: how much are we willing to pay for each customer to reach our landing page given a specific search term?
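The KPI can be written as a small helper. The function name and the willingness-to-pay threshold of 1.00 € are hypothetical and only serve to show how the KPI could feed into a bidding decision.

```python
def cost_per_recovered_click(sea_clicks, cost_per_click, lost_clicks):
    """Cost of bidding again, divided by the clicks that bidding wins back."""
    if lost_clicks <= 0:
        # No clicks were lost, so bidding recovers nothing.
        return float("inf")
    total_cost = sea_clicks * cost_per_click
    return total_cost / lost_clicks


# Numbers from the example above.
cprc = cost_per_recovered_click(sea_clicks=175, cost_per_click=0.50, lost_clicks=75)

# Hypothetical threshold: the most we are willing to pay per visitor.
max_value_per_click = 1.00
keep_bidding = cprc <= max_value_per_click

print(round(cprc, 2), keep_bidding)  # 1.17 False
```

With a higher willingness to pay, for example 1.50 € per click, the same keyword would remain worth bidding on.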

A typical question that arises after this analysis is: “Is it possible to apply the results of the evaluation of our field experiment to keywords that were not included in the experiment, for example to keywords of other brands?” As mentioned, it is known from search engine marketing research that there are various influencing factors that affect the cannibalization effect. The magnitude and direction of their impact can be modeled, for example, using a regression tree. This type of model can not only be visualized clearly and interpreted well, but also allows us to transfer the results to other keywords.

Regression trees are constructed in such a way that all data points are initially contained in the root node. This node is then split along the dimension and at the threshold that explain the most variance within the data. In our case, this is the keyword type. In our exemplary regression tree, branded keywords, with an average cannibalization effect of 28 %, are cannibalized much more strongly than generic keywords, with only 16 % on average. Splitting dimensions that are used in the upper levels of the tree are the most important predictors of keyword cannibalization.

Regression trees can be easily implemented using the scikit-learn library in Python. For keywords whose cannibalization effect we cannot calculate directly because we do not have data from an A/B comparison, we can estimate the approximate effect by using the regression tree.

Of course, there are many other ways and models to identify cannibalization drivers and enable predictions, such as regression models or neural networks, to name only a few alternatives. In practice, we always recommend testing several models, using the same loss function to determine how precise the predictions of the different models are, and then choosing a model on the basis of further selection criteria as well, such as interpretability. In our case, for example, it was important to get a good understanding of what the important drivers are and not only accurate results. So, despite neural networks delivering more accurate results, we decided to use regression trees, as the results of neural networks are much harder to interpret.
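A minimal sketch of such a tree with scikit-learn could look like the following. The data is a toy data set invented for illustration (it is not our experiment data); it is constructed so that branded keywords average 28 % and generic keywords 16 % cannibalization, mirroring the exemplary tree described above. The feature names are our own choice.

```python
from sklearn.tree import DecisionTreeRegressor, export_text

feature_names = ["is_branded", "seo_position"]

# Toy data, invented for illustration: [is_branded, seo_position].
X = [
    [1, 1], [1, 3], [1, 2], [1, 4],   # branded keywords
    [0, 2], [0, 4], [0, 1], [0, 3],   # generic keywords
]
# Made-up cannibalization rates (share of SEA clicks captured by SEO).
y = [0.30, 0.26, 0.29, 0.27, 0.15, 0.17, 0.18, 0.14]

# A depth of 1 keeps only the root split, i.e. the most important driver.
model = DecisionTreeRegressor(max_depth=1, random_state=0)
model.fit(X, y)

# Text rendering of the fitted tree; the root split is on is_branded.
print(export_text(model, feature_names=feature_names))

# Estimate the effect for keywords that were not part of the experiment.
new_keywords = [[1, 2], [0, 2]]  # one branded, one generic keyword
predictions = model.predict(new_keywords)
print([round(p, 2) for p in predictions])  # [0.28, 0.16]
```

On real data one would allow a deeper tree and compare it with alternative models on held-out keywords using the same loss function.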

The team

Lukas Heidemann Data Science: R

Pia Klein Data Science: R


Fabian Kraut
