User Generated Content as a Brand Insight Tool to Reveal the Drivers of User Engagement and Evaluate which Topics are Perceived Positive and which Negative

This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (summer term 2021).


The goal of this project was to gain brand insights by analyzing how social media users feel about topics connected to a brand. Topics that are perceived positively can then be promoted, while insights into negatively perceived topics help to shape communication.


We chose two approaches for our analysis. Firstly, we looked at the engagement of consumers with the brands:

  • What do the brands post and do the topics differ across brands?
  • Which factors (e.g. topics) drive engagement with brand posts?
  • How does it compare to the perception of the competitors?
  • What topics are criticized or seen as favorable?

Data Basis

Regarding the scope of our analysis, we identified six brands in the FMCG segment that we investigated more closely. As the platform, we chose Instagram and looked at its user generated content (UGC), as it is probably the most prominent platform for the target group of the chosen brands.

Approaches and Derived Recommendation for Action

Regression Analysis

In the first part of our analysis, we aimed to identify what influences users’ engagement with brand-owned content using a regression analysis. User engagement is a form of user generated content and a very basic way to measure sentiment, where more engagement is taken to be more positive.
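To make the idea concrete, here is a minimal sketch of an ordinary least squares regression of engagement on post characteristics. It is not the project's actual model: the predictors (`is_giveaway`, `is_video`), the engagement numbers, and the function names are made up for illustration, and Python is used here although the team section lists R.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest absolute pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_ols(features: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Return least squares coefficients [intercept, b1, b2, ...]."""
    # Prepend a constant 1.0 column so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in features]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical posts: (is_giveaway, is_video) and their likes plus comments.
posts = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 1)]
engagement = [100.0, 150.0, 120.0, 170.0, 100.0, 170.0]

intercept, giveaway_effect, video_effect = fit_ols(posts, engagement)
```

Each coefficient is read as the expected change in engagement when that post characteristic is present, holding the others fixed. In practice a statistics package would also report standard errors and significance, which this sketch leaves out.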

Sentiment Analysis

Our second analytical approach uses sentiment analysis to gain insights into how the brand is perceived by Instagram users. Building on these sentiments, we then derive topics that are more likely to be perceived as positive or negative. The goal is to give managerial advice on which topics to focus communication on in order to improve consumers’ perception of the brand. The data for this approach were comments under brand-owned posts and captions of pictures that users uploaded themselves and that either tagged the brand’s account or used the brand’s hashtag.

  1. Secondly, we removed punctuation from the texts. This will become important in a second.
  2. Then, we tokenized the texts. This means that each case containing a text (a combination of multiple words) was split into multiple cases, each containing only one word. Those cases still shared the same identification variable, so we were able to restore the original text.
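The cleaning and tokenization steps can be sketched roughly as follows. This is an illustration in Python, not the project's code: the function names and example comments are hypothetical, and lowercasing is an added assumption that the text does not mention.

```python
import string
from typing import Dict, List, Tuple


def clean_text(text: str) -> str:
    """Lowercase the text (assumption) and remove ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def tokenize(comments: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Turn (comment_id, text) pairs into one (comment_id, word) row per word."""
    rows = []
    for comment_id, text in comments:
        for word in clean_text(text).split():
            # Every token keeps the id of the comment it came from.
            rows.append((comment_id, word))
    return rows


def restore(rows: List[Tuple[int, str]]) -> Dict[int, str]:
    """Rebuild the cleaned text of each comment from its token rows."""
    words_by_id: Dict[int, List[str]] = {}
    for comment_id, word in rows:
        words_by_id.setdefault(comment_id, []).append(word)
    return {cid: " ".join(words) for cid, words in words_by_id.items()}


# Hypothetical example comments.
comments = [(1, "Love it!!!"), (2, "Too sweet, sadly.")]
token_rows = tokenize(comments)
restored = restore(token_rows)
```

Because `string.punctuation` only covers ASCII punctuation, emojis survive this cleaning step and remain available for later scoring.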

Topic Modeling

After running the sentiment analysis, we split the underlying data set into two for each brand: one for positive sentiment and one for negative sentiment. For the split, we used the overall sentiment score, aggregated from the sentiment analysis of text and emojis, with a cutoff value of 0.
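A minimal sketch of such a split is shown below. Several details are assumptions rather than the project's method: the tiny lexicons are invented, the text and emoji scores are aggregated by simple addition, and comments that score exactly 0 are left out of both subsets.

```python
from typing import List, Tuple

# Hypothetical toy lexicons; a real analysis would use published ones.
TEXT_LEXICON = {"love": 1.0, "great": 1.0, "bad": -1.0, "expensive": -0.5}
EMOJI_LEXICON = {"😍": 1.0, "🤮": -1.0}


def overall_score(tokens: List[str]) -> float:
    """Add up text and emoji scores; unknown tokens count as 0 (assumption)."""
    text_score = sum(TEXT_LEXICON.get(t, 0.0) for t in tokens)
    emoji_score = sum(EMOJI_LEXICON.get(t, 0.0) for t in tokens)
    return text_score + emoji_score


def split_by_sentiment(
    comments: List[Tuple[int, List[str]]], cutoff: float = 0.0
) -> Tuple[List[int], List[int]]:
    """Return (positive_ids, negative_ids) relative to the cutoff."""
    positive, negative = [], []
    for comment_id, tokens in comments:
        score = overall_score(tokens)
        if score > cutoff:
            positive.append(comment_id)
        elif score < cutoff:
            negative.append(comment_id)
        # Scores equal to the cutoff are treated as neutral and skipped.
    return positive, negative


# Hypothetical tokenized comments for one brand.
brand_comments = [
    (1, ["love", "it", "😍"]),
    (2, ["too", "expensive", "🤮"]),
    (3, ["ok"]),
]
positive_ids, negative_ids = split_by_sentiment(brand_comments)
```

The two resulting subsets would then be modeled separately, so that topics can be compared between positively and negatively scored content.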

  1. The negative side effects could be addressed by offering a money-back guarantee, increasing interaction with dissatisfied customers, and offering free samples.

The team

Lucas Fischenich, Data Science: R


Fabian Kraut
