User Experience & Customer Journey Analysis — Gaining valuable insights into the behavior of website visitors for a beauty and personal care brand

This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (summer term 2021).


Our project aimed to analyze the user experience and customer journey of a beauty and care brand website. We used a Markov chain analysis to identify the most common entries, exits, and webpage transitions. Furthermore, we relied on descriptive statistics to understand which website content performs best and which webpages still have room for improvement. Additionally, we performed a cluster analysis to determine segments of website visitors. As a last step, we worked out managerial implications and recommendations for website optimization.

Disclaimer: We worked closely with a company that provided us with sensitive real business data and therefore prefers to stay anonymous. Unfortunately, this means we cannot share specific results, which is why we kept the insights from our analysis rather general for the purpose of this blog post.

Despite digitalization, 69% of products in the beauty and personal care sector are still bought offline (Statista 2021). However, 81% of consumers research products online before buying (Morrison 2014). Hence, for a beauty and care brand, it seems particularly important to deliver an outstanding offline and online experience to stay competitive. Focusing on the online part, we analyzed the user experience and customer journey for our cooperation partner’s website.

The goal of our analysis was to answer five key questions:

  • Where do users come from?
  • How do website users move around?
  • Where do users exit the website?
  • Which content performs best?
  • How can the users be clustered?

We started our project work with the preparation and cleaning of the data. Our first challenge was to merge three datasets into one. Since all three datasets were very large, we focused on the last three months and one country. Next, we cleaned the data by removing irrelevant and duplicated observations, outliers, and missing values.

To analyze where users come from, how they move around, and where they exit the website, we used two methods: a Markov chain analysis and sequential pattern mining.

A Markov chain is a stochastic process describing the probability of reaching one state from another (here and in the following, Scholz 2016). We relied on higher-order Markov chains, where the next state depends on two or more preceding ones. We applied this method in the context of our website analysis to estimate the probability of transitioning from one webpage to another.

Before using the method, we had to generate clickstreams from our data. A clickstream is a sequence of click events for one particular session of a website user (Scholz 2016). In our case, we classified the webpages into categories (e.g., information pages) and then generated sequences of viewed webpage categories for each session and user.
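To make the clickstream idea concrete, here is a minimal sketch in Python (our project itself was done in R). The event layout, the URL prefixes, and the category names are assumptions made up for illustration and do not come from the real data.

```python
from collections import defaultdict

# Hypothetical mapping from URL path prefixes to page categories.
CATEGORY_BY_PREFIX = {
    "/products": "product",
    "/guides": "information",
    "/cart": "checkout",
}

def categorize(path):
    """Return the category for a URL path, or 'other' if no prefix matches."""
    for prefix, category in CATEGORY_BY_PREFIX.items():
        if path.startswith(prefix):
            return category
    return "other"

def build_clickstreams(events):
    """Group (session_id, timestamp, path) events into ordered category sequences."""
    by_session = defaultdict(list)
    for session_id, timestamp, path in events:
        by_session[session_id].append((timestamp, categorize(path)))
    # Sorting the (timestamp, category) pairs restores the click order per session.
    return {
        session_id: [category for _, category in sorted(hits)]
        for session_id, hits in by_session.items()
    }

# Made-up events, deliberately out of order to show the sorting step.
events = [
    ("s1", 2, "/products/shampoo"),
    ("s1", 1, "/guides/hair-care"),
    ("s2", 1, "/products/cream"),
    ("s1", 3, "/cart"),
]
clickstreams = build_clickstreams(events)
print(clickstreams)
```

Each session ends up as one sequence of page categories, which is the input format a Markov chain or pattern mining step expects.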

The results indicated the start and end probabilities as well as the probabilities of transitioning from one webpage category to another. We illustrated the transition probabilities in a heatmap. One specific result is that users are likely to view the same type of page again, for example an information page after an information page (see figure).
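The following Python sketch shows how such probabilities can be estimated by simple counting. Note the assumptions: it fits a first-order chain (the next page depends only on the current one), whereas we used higher-order chains in R, and the example sessions are invented.

```python
from collections import Counter, defaultdict

def fit_first_order(clickstreams):
    """Estimate start, end, and transition probabilities from category sequences."""
    starts, ends = Counter(), Counter()
    transitions = defaultdict(Counter)
    for sequence in clickstreams:
        if not sequence:
            continue
        starts[sequence[0]] += 1
        ends[sequence[-1]] += 1
        # Count every pair of consecutive page categories.
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {state: count / total for state, count in counter.items()}

    return (
        normalize(starts),
        normalize(ends),
        {state: normalize(nexts) for state, nexts in transitions.items()},
    )

# Invented sessions, not taken from the project data.
clickstreams = [
    ["home", "information", "information", "product"],
    ["home", "product"],
    ["information", "information"],
    ["home", "information", "product"],
]
start_p, end_p, transition_p = fit_first_order(clickstreams)
print(start_p)
print(end_p)
print(transition_p)
```

Each row of `transition_p` sums to one, so it can be plotted directly as one row of a heatmap.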

Moreover, to identify the most frequent clickstream paths, we performed sequential pattern mining. We applied the cSPADE algorithm, which extracts all patterns that reach a given minimum support (Scholz 2016).
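The notion of minimum support can be illustrated with a much simpler stand-in than cSPADE. The Python sketch below only counts contiguous paths of limited length and keeps those that occur in a large enough share of sessions; it is not the cSPADE algorithm, and the sessions and the threshold are invented.

```python
from collections import Counter

def frequent_paths(clickstreams, min_support, max_length=3):
    """Return contiguous paths whose share of sessions is at least min_support."""
    session_counts = Counter()
    for sequence in clickstreams:
        seen = set()
        for length in range(2, max_length + 1):
            for start in range(len(sequence) - length + 1):
                seen.add(tuple(sequence[start:start + length]))
        session_counts.update(seen)  # each session counts a path at most once
    total = len(clickstreams)
    return {
        path: count / total
        for path, count in session_counts.items()
        if count / total >= min_support
    }

# Invented sessions, not taken from the project data.
clickstreams = [
    ["home", "information", "product"],
    ["home", "information", "information", "product"],
    ["home", "product"],
    ["information", "product"],
]
patterns = frequent_paths(clickstreams, min_support=0.5)
print(patterns)
```

Support here is the fraction of sessions that contain a path at least once, so raising `min_support` leaves only the most common paths.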

To find out which content performs best, we used descriptive statistics to identify the best and worst performing webpages. We also created a new variable indicating the level of engagement, which combines different indicators such as mean scroll, bounce rate, and newsletter subscription. With the help of our results, we derived guidelines on which content performs well and which content still needs improvement (see figure).
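One way to build such a combined variable is sketched below in Python. The equal weighting, the 0 to 1 scaling of every indicator, and the page values are assumptions for illustration only; this is not the formula we used in the project.

```python
def engagement_score(mean_scroll, bounce_rate, newsletter_rate):
    """Average three indicators on a 0-1 scale; a high bounce rate lowers the score."""
    for value in (mean_scroll, bounce_rate, newsletter_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all indicators must be fractions between 0 and 1")
    # Bounce rate is inverted so that higher always means more engagement.
    return (mean_scroll + (1.0 - bounce_rate) + newsletter_rate) / 3.0

# Hypothetical pages: (mean scroll, bounce rate, newsletter subscription rate).
pages = {
    "page_a": (0.80, 0.20, 0.10),
    "page_b": (0.30, 0.70, 0.00),
}
ranking = sorted(pages, key=lambda page: engagement_score(*pages[page]), reverse=True)
print(ranking)
```

Sorting pages by this score gives a simple best-to-worst ranking that can be compared against the individual indicators.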

Having examined where website visitors come from, how they typically transition from page to page, which content performs best, and where visitors leave, we wanted to dive deeper and investigate what specific groups of website visitors are looking for and how they differ from each other. Ultimately, we wanted to find out how the website can be improved for each of those groups. For this purpose, we applied a cluster analysis that divides the website visitors into meaningful segments, from which we could derive targeted marketing implications.

Clustering is a classical unsupervised machine learning task that aims to find groups of data with as much homogeneity as possible within a cluster and, at the same time, as much heterogeneity as possible between clusters (here and in the following, Lantz 2013). In our case, it helps to segment customers into groups with similar behavior and thereby simplifies large data sets. In practice, the k-means algorithm is used most of the time. In brief, it randomly assigns a cluster to each observation and then reassigns observations until the cluster solution is optimized. We chose k-means as our cluster algorithm because it is rather simple yet highly flexible and efficient. Since k-means relies on distances and is therefore sensitive to the scale of the variables, we standardized the cluster variables. We chose nine marketing-oriented cluster variables, such as page type and session duration. Finally, we checked for multicollinearity to make sure that it does not affect our analysis.
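For readers who want to see the mechanics, here is a compact Python sketch of standardization followed by k-means. It is an illustration under assumptions: it starts from k randomly sampled observations instead of a random assignment, it uses two invented variables instead of our nine, and it is not the R code from the project.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(rows, k, seed=0, max_iterations=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # start from k sampled observations
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels, centroids

# Invented sessions: (session duration in seconds, pages viewed).
sessions = [(30, 1), (45, 2), (40, 1), (600, 9), (640, 11), (580, 10)]
labels, centroids = k_means(standardize(sessions), k=2)
print(labels)
```

Without the standardization step, the duration column would dominate the distance simply because its numbers are larger.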

Next, we used an elbow plot to determine the optimal number of clusters, which is six in our case (see figure).
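An elbow is usually read off the plot by eye, but the idea can also be sketched in code. The Python example below uses one possible heuristic: rescale both axes to the range 0 to 1 and pick the k that lies farthest below the straight line between the first and the last point. The heuristic, the function name, and the within-cluster sums of squares are assumptions for illustration and do not reproduce our actual values.

```python
def elbow_point(wcss_by_k):
    """Pick the k that lies farthest below the chord between the end points."""
    ks = sorted(wcss_by_k)
    k_min, k_max = ks[0], ks[-1]
    w_first, w_last = wcss_by_k[k_min], wcss_by_k[k_max]

    def gap_below_chord(k):
        # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. x + y = 1.
        x = (k - k_min) / (k_max - k_min)
        y = (wcss_by_k[k] - w_last) / (w_first - w_last)
        return 1.0 - x - y

    return max(ks, key=gap_below_chord)

# Made-up within-cluster sums of squares for k = 1..6.
wcss_by_k = {1: 100.0, 2: 60.0, 3: 30.0, 4: 25.0, 5: 22.0, 6: 20.0}
best_k = elbow_point(wcss_by_k)
print(best_k)
```

The sketch assumes at least two values of k and a curve that decreases from the first to the last point, which is what a k-means elbow plot normally looks like.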

We ran the analysis, inspected the six clusters in depth, and discussed how our cooperation partner can best approach each of those customer groups. As a robustness check, we split the data into halves and reran the analysis, which did not lead to substantially different results. This supports the validity of our cluster solution.

To wrap it up, our results showed …

  • … how website users move around and where they are likely to start and end their session.
  • … which content on the website performs best and which specific categories and webpages still need optimization.
  • … that the website visitors can be clustered into six meaningful segments based on their specific needs.

Thank you for taking this journey with us.


Lantz, Brett (2013), Machine learning with R. Birmingham, UK: Packt Publishing Ltd.

Morrison, Kimberlee (2014), “81% of Shoppers Conduct Online Research Before Buying,” (accessed August 20, 2021), [available at].

Scholz, Michael (2016), “R Package clickstream: Analyzing Clickstream Data with Markov Chains,” Journal of Statistical Software, 74 (4), 1–17.

Statista (2021), “Beauty & Personal Care,” (accessed August 20, 2021), [available at].

The team

Sabina Almaschij Data Science: R

Alla Devichinskaya Data Science: R

Katrin Heß Data Science: R

Nina Nauß Data Science: R


Marcus Cramer
