User Experience and Customer Journey Analysis of a German cosmetic and medicine company

This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (summer term 2021).


Our project was a cooperation between TechLabs, a German cosmetic and medicine company, and the Marketing Center Münster. Our task was to take a closer look at the website of one of the company’s sub-brands. We analyzed the clickstream behaviour of the website visitors using descriptive analysis, Markov chains, clustering and machine learning in order to find optimization potential that helps reduce the bounce rate and increase user engagement.


User experience describes all aspects of a user’s impressions and experience when interacting with a product, service, environment, or facility, while customer journey analysis helps a company see its products or services through its customers’ eyes. Since our project was a cooperation with a partner, we aimed to see the partner’s website through the eyes of its visitors. With the R skills we acquired through TechLabs, especially via the learning platform DataCamp, we planned to

  1. identify the most successful entry channels
  2. analyze the most common exit channels
  3. cluster common customer journeys
  4. predict future user behaviour


We received three datasets from the company, describing the sessions of each user. We downloaded them with Google BigQuery and used this platform to get a first impression of the variables with the help of SQL. For data cleaning, we used RStudio. After reviewing which variables were important for the further analysis, we merged all three datasets into one. Overall, cleaning reduced the original dataset by 30% to 1,209,827 data points. To do so, we first created a German subset, computed new variables, checked for duplicates, and removed sessions with a duration of zero or above the 95th percentile. Furthermore, we removed sessions with more than 100 page views and those where the city was the company’s headquarters, since we did not want to work with a biased dataset.
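The filtering steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the R code used in the project. The field names (`session_id`, `country`, `duration`, `page_views`, `city`) are hypothetical, and two further assumptions are made: that the "German subset" means sessions from Germany, and that the 95th percentile is computed after deduplication but before the other filters. The post does not state either.

```python
from statistics import quantiles

def clean_sessions(sessions, hq_city, max_page_views=100):
    """Apply the described filters to a list of session dicts (hypothetical schema)."""
    # Keep only the German subset (assumption: identified by a country field).
    german = [s for s in sessions if s["country"] == "Germany"]

    # Drop duplicate session IDs, keeping the first occurrence.
    seen, unique = set(), []
    for s in german:
        if s["session_id"] not in seen:
            seen.add(s["session_id"])
            unique.append(s)

    # 95th percentile of session duration over the deduplicated German sessions.
    durations = [s["duration"] for s in unique]
    p95 = quantiles(durations, n=100, method="inclusive")[94]

    # Remove zero-duration and very long sessions, sessions with more than
    # max_page_views page views, and sessions from the headquarters city.
    return [
        s for s in unique
        if 0 < s["duration"] <= p95
        and s["page_views"] <= max_page_views
        and s["city"] != hq_city
    ]
```

The order of the filters affects where the percentile cut-off lands, so a real pipeline should fix that order deliberately.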


For the analysis, three events were important: Purchase Interest (when a website visitor clicks on the “Buy Now” button), Purchase Outclick (when the visitor selects a preferred distributor) and Form Submit (when they subscribe to the newsletter). Since the goal of the website is to get visitors to engage further, we used these events as the dependent variables. We started with some descriptive analysis in order to gain a better understanding of the website visitors. Afterwards, we took a closer look at the entry and exit channels, meaning how visitors enter the website and at which point they leave it.

To analyze the user journey, we decided to apply a Markov chain model. Markov chains are commonly used to describe user journeys: they model a sequence of events in which the probability of the next event depends only on the current one, expressed as a transition probability. The result can be visualized with a Sankey diagram (shown below). However, since it was difficult to gain relevant insights about the click behavior of the website visitors from the Sankey diagram, we focused our analysis on the transition probability matrix, visualized as a heatmap.

Sankey diagram of the customer journey transitions within the website

The heatmap helped us take a closer look at specific events, gather interesting insights and better understand how the website visitors behave. The following graphic shows the probability with which a website visitor transitions from one page to another. The darker the color, the higher the transition probability.

Transition probability heatmap of one cluster
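The numbers behind such a heatmap can be estimated by counting page-to-page transitions and normalizing each row. The following is a minimal Python sketch of a first-order transition matrix, not the R code used in the project. The page names and the artificial `(start)` and `(exit)` states are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def transition_matrix(sessions):
    """Estimate first-order Markov transition probabilities from page sequences."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Add explicit start and exit states so entries and exits become visible.
        path = ["(start)"] + list(pages) + ["(exit)"]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    # Normalize each row so the outgoing probabilities of a state sum to 1.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

# Toy clickstreams with made-up page names.
sessions = [
    ["home", "product", "buy_now"],
    ["home", "article", "product"],
    ["home"],
    ["article", "product", "buy_now"],
]
matrix = transition_matrix(sessions)
```

Each row of `matrix` corresponds to one row of the heatmap. A high probability of moving from a page straight to `(exit)` would point to a page where many visitors leave.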

After gathering these general insights, we aimed to cluster website visitors with similar clickstreams using the k-means method. Based on the number of events, session duration, scroll behavior and the number of page views, we found three clusters, which we labeled the “Quick Checker”, the “Bouncer” and the “Engaged” segment. Lastly, we wanted to predict whether a website visitor triggers an event using machine learning models. After comparing multiple methods, we selected the Random Forest as the best-performing algorithm, as it had the lowest mean misclassification error. Based on the marginal effects, we gained further insights into the value of each feature at which the likelihood of an event is highest.
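To illustrate the clustering step, here is a minimal k-means sketch in plain Python. It is not the R code used in the project. It assumes the four features (events, session duration, scroll depth, page views) have already been scaled to a comparable range. It also uses a deterministic farthest-first initialization for reproducibility, which is a choice made for this sketch and not something the post describes. The toy values are invented.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centroids(points, k):
    """Farthest-first initialization: start at the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Invented, pre-scaled sessions: (events, duration, scroll depth, page views).
points = [
    (0.0, 0.0, 0.1, 0.0), (0.1, 0.1, 0.0, 0.0),
    (0.5, 0.5, 0.5, 0.5), (0.5, 0.6, 0.5, 0.4),
    (1.0, 0.9, 1.0, 1.0), (0.9, 1.0, 0.9, 1.0),
]
labels, centroids = kmeans(points, k=3)
```

Inspecting the centroids is one way to interpret and name segments such as the three described above, since each centroid summarizes the typical feature values of its cluster.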

Summary of results

To stay within the scope of this post, we only present the results and managerial implications of our analysis in aggregated form. We condensed the entry channels into two major ones and found that the paid entry channel leads to more triggered events. We therefore recommend allocating further budget to paid media while also optimizing organic search.

With the help of the transition probability matrix and descriptive analysis, we were also able to see on which pages many users bounce. We recommend placing better product links between specific pages and publishing more articles with interesting content, since articles are popular and lead website visitors to the products. In addition, website visitors tend to stay at the top of a page, so interesting content should be positioned there. Regarding the exit channels, we identified a high bounce rate. To reduce it, we recommend simplifying the navigation of the website, displaying less text-heavy content and improving cross-links between certain pages. For better conversion, we developed several recommendations to track purchases more accurately and to remove hurdles that keep website visitors from converting into buyers.

We also identified three major clusters, one of which yields the highest probability of an engaged website visitor. From our prediction model, we learned which variables are most likely to help increase user engagement. Based on our entire analysis, we can now describe what an optimally engaged user looks like. In the future, our machine learning model can be developed further to enable dynamic targeting on the website by including more information that can be gathered at the beginning of a visitor’s journey.


All in all, we were able to achieve all the goals we set at the beginning. This project was challenging yet fun. We enjoyed our project work very much and would like to thank TechLabs and the MCM for the organization and support.

The team

Madleen Banze, Data Science: R (LinkedIn)

Eva Schragen, Data Science: R (LinkedIn)

Olivia Henk, Data Science: R (LinkedIn)

Anna Schuller, Data Science: R (LinkedIn)


Marcus Cramer
