BOTHUNTING

Inside.TechLabs
7 min read · Nov 2, 2020


This project was carried out as part of the TechLabs “Digital Shaper Program” in Aachen (Summer Term 2020).

Abstract

BotHunting enables us to uncover fake Twitter accounts and bot-controlled profiles. Using an algorithm based on machine-learning methods, BotHunting correctly identified bots in our test data with about 94 % accuracy.

INTRODUCTION

Social platforms are becoming increasingly harmful to our society. We face the challenge of disinformation and political opinion-making driven by the spread of fake news, and we are frequently exposed to spam content. On Twitter in particular, spam and fake news are widespread. This annoying and misleading content is, in most cases, spread or even created by bots. It is estimated that 15 % of Twitter accounts are bots [1]. We wanted to detect them with our software solution.

METHOD

In the following, we explain how we developed the BotHunting software solution and which milestones and obstacles we had to tackle to reach the project goal.

Getting started

  1. Gathering information about Twitter Bots:

We started by collecting more detailed information about Twitter bots. During the research phase, we learned to differentiate between several kinds of bots. These can be broadly grouped into spam bots (which we later labeled as traditional bots) and social bots. Social bots are more harmful to society, since they mostly follow a political intention and agenda by spreading disinformation, fake news, and populist content, whereas spam bots mostly focus on amplifying the reach of existing messages (advertising, etc.).

Dataset

  1. Searching for Datasets

We searched for labeled datasets and came upon the source page of a like-minded project called “Botometer”, which makes all the datasets used for its creation accessible [3]. We narrowed our selection down to the datasets that were still usable and then picked the one with the most data points. We decided to go on with the Cresci 2017 dataset, which is also cited in the academic paper “Supervised Machine Learning Bot Detection Techniques to Identify Social Twitter Bots” [2].

2. Dataset cleaning

Our dataset at that time consisted of 11,017 data points. After we removed deleted Twitter accounts, data points with incomplete feature sets, and languages written from right to left, our data basis consisted of 8,089 data points.
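A minimal pandas sketch of this cleaning step. The column names here are hypothetical; the actual dataset uses its own schema:

```python
import pandas as pd

# Toy stand-in for the raw dataset (hypothetical columns).
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "lang": ["en", "ar", "de", "en"],
    "followers_count": [10, None, 25, 300],
})

RTL_LANGS = {"ar", "he"}  # right-to-left languages we dropped

cleaned = (
    df.dropna()                                   # incomplete feature sets
      .loc[lambda d: ~d["lang"].isin(RTL_LANGS)]  # right-to-left languages
      .reset_index(drop=True)
)
```

Deleted Twitter accounts were identified separately via the API and removed the same way.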

3. Data extension

We downloaded all information about the accounts in our dataset and their tweets using the Twitter API and Tweepy [4]. Getting access to this API meant creating a developer account with Twitter, which grants access to their data stream. We computed the features for the classification based on that data by iterating over all user IDs in our dataset and requesting the information via Tweepy. Due to the download limits of the Twitter API, we had to run this multiple times in a while loop until all the information needed to compute the features was gathered. Finally, we calculated our own features for each row with self-written functions.
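The retry loop can be sketched as a small helper that keeps re-requesting the users that failed in the previous round. The Tweepy usage in the comment is an assumption about how the fetch function would be wired up, not our exact code:

```python
import time

def fetch_all(user_ids, fetch_one, max_rounds=10, pause=0.0):
    """Repeatedly retry failed lookups until every user is fetched,
    mirroring the while-loop used to work around API download limits.
    `fetch_one` may raise (e.g. a Tweepy error on a rate limit)."""
    remaining, results = list(user_ids), {}
    for _ in range(max_rounds):
        if not remaining:
            break
        still_missing = []
        for uid in remaining:
            try:
                results[uid] = fetch_one(uid)
            except Exception:
                still_missing.append(uid)  # try again next round
        remaining = still_missing
        if remaining:
            time.sleep(pause)  # back off before the next round

    return results

# With Tweepy (assumed setup), fetch_one could look like:
#   api = tweepy.API(auth, wait_on_rate_limit=True)
#   fetch_one = lambda uid: api.get_user(user_id=uid)._json
```

Tweepy's `wait_on_rate_limit` option can also sleep through rate limits automatically, which reduces how often the outer loop is needed.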

4. Define the feature sets

After gathering more information about the bot types, we could define characteristics for each one. We defined eight features to categorize Twitter accounts into social bots, traditional bots, and genuine users:

  1. is_protected: Protected means that not everyone can see the contents of the account. First, one has to follow the account, and the owner has to accept the follow request.
  2. time_of_existence: Time passed since the account’s creation in days
  3. average_daily_tweets: Average number of tweets per day
  4. inactive_days: Number of days since the account’s creation on which the user did not tweet
  5. has_default_image: Boolean that indicates whether the user uses the default profile image or not
  6. bio_is_empty: Boolean that indicates whether the bio/description of the user is empty or not.
  7. friends_followers_ratio: Quotient of the number of friends and number of followers.
  8. is_verified: Twitter grants verified status to accounts whose owners can prove their identity. However, Twitter usually only verifies accounts of people who are known to the public.

(For insight into the functions used to create these features, have a look at our Git repository.)
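The eight features above can be sketched as one function over a raw user record. The field names follow the Twitter API's user object, but treat them (and the fixed reference date) as assumptions for illustration; the real functions live in the repository:

```python
from datetime import datetime, timezone

def user_features(user, tweet_days):
    """Compute the eight features from a raw user record.
    `user` is a dict with (assumed) Twitter API field names;
    `tweet_days` is the set of distinct days the account tweeted on."""
    now = datetime(2020, 11, 2, tzinfo=timezone.utc)  # fixed for reproducibility
    age_days = max((now - user["created_at"]).days, 1)
    return {
        "is_protected": user["protected"],
        "time_of_existence": age_days,
        "average_daily_tweets": user["statuses_count"] / age_days,
        "inactive_days": age_days - len(tweet_days),
        "has_default_image": user["default_profile_image"],
        "bio_is_empty": not user["description"],
        # guard against division by zero for accounts with no followers
        "friends_followers_ratio": user["friends_count"] / max(user["followers_count"], 1),
        "is_verified": user["verified"],
    }
```

Pinning `now` to a fixed date keeps the features reproducible; the time-dependence of `time_of_existence` comes up again under Future Work.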

Model training

We trained all models based on a 75/25 train-test split, using 75 % of the data for training. The remaining 25 % was used to test how precisely our trained models can detect Twitter bots. We tried out ten different classifiers on our data:

  1. K Nearest Neighbours Classifier
  2. SVC (Support Vector Machines)
  3. Gaussian Process Classifier (dropped because of its long runtime; we stopped it after 15 minutes)
  4. RBF (Radial Basis Function Network) (dropped after it produced an AttributeError: “‘RBF’ object has no attribute ‘fit’” — in scikit-learn, RBF is a kernel, not an estimator)
  5. Decision Tree Classifier
  6. Random Forest Classifier (Was used for the actual classification)
  7. Ada Boost Classifier
  8. Gaussian Naive Bayes Classifier
  9. Quadratic Discriminant Analysis (Quadratic Classifier)
  10. MLP (Multilayer Perceptron) Classifier
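The comparison can be sketched as a single loop over scikit-learn estimators. The data here is a synthetic stand-in with the same shape as our eight features and three classes; only a few of the ten classifiers are shown:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 8 features, 3 classes (social bot / traditional bot / user).
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # the 75/25 split described above

classifiers = {
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}
scores = {name: f1_score(y_test, clf.fit(X_train, y_train).predict(X_test),
                         average="weighted")
          for name, clf in classifiers.items()}
```

Precision and recall are computed the same way via `precision_score` and `recall_score`.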

RESULTS

1. Results of all classifiers

We calculated the F1, precision, and recall scores for all the classifiers mentioned above, both with and without scaling the data. The best results were reached with the Random Forest Classifier, which managed an F1 score of about 95 %. The following charts show the results for the different classifiers: the left diagram presents the models’ results without a scaled feature set, and the right one visualizes the predictions of all our classifiers with a scaled feature set.

The reason we did it both ways is that we first scaled all our data and noticed an error in our predictions, where every target was identified as a genuine user (even guaranteed bots from the training set). We think the reason for this error was that some of our features are binary, so the scaling process made them inconclusive.
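One way to avoid that pitfall is to scale only the continuous columns and pass the binary flags through untouched, e.g. with scikit-learn's `ColumnTransformer`. The column layout below is illustrative, not our actual one:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Columns 0-3 continuous, columns 4-7 binary flags (illustrative layout).
X = np.array([[10.0, 1.0, 400.0, 2.0, 0.0, 1.0, 0.0, 1.0],
              [300.0, 5.0, 120.0, 9.0, 1.0, 0.0, 0.0, 0.0]])

scale_only_continuous = ColumnTransformer(
    [("scale", StandardScaler(), [0, 1, 2, 3])],
    remainder="passthrough")  # leave the binary flags untouched
X_scaled = scale_only_continuous.fit_transform(X)
```

The transformed output keeps the scaled columns first and appends the passthrough columns unchanged.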

2. Adjustment of the parameters to improve classifiers results

We adjusted our code with hyperparameter tuning methods to improve the results of the Random Forest and MLP classifiers. We used two methods for this: Randomized Parameter Search and Parameter Grid Search.

Randomized Parameter Search was used to get a rough idea of which range of values creates positive changes. The subsequent Parameter Grid Search then tried out different values around the area found by the Randomized Parameter Search, resulting in the best improvements for the classification.
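This coarse-then-fine pattern maps directly onto scikit-learn's `RandomizedSearchCV` and `GridSearchCV`. The parameter ranges and synthetic data below are placeholders, not the values we actually searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Step 1: random search over a wide range to find a promising region.
coarse = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100, 150, 200], "max_depth": [None, 5, 10]},
    n_iter=5, cv=3, random_state=0).fit(X, y)

# Step 2: grid search around the values the random search found.
best_n = coarse.best_params_["n_estimators"]
fine = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [max(best_n - 50, 10), best_n, best_n + 50]},
    cv=3).fit(X, y)
```

`fine.best_params_` then holds the tuned settings, and `fine.best_estimator_` is the refitted classifier.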

After the tuning process, we settled on the Random Forest Classifier (RFC), since it consistently had the highest scores, followed by the K Nearest Neighbours and MLP classifiers.

The following pictures show the results of applying the hyperparameter tuning to the RFC.

As you can see, the differences in the scores are minimal, and tuning sometimes even had a negative effect. Further research revealed that this seems to be a common occurrence when trying to tune the RFC.

Challenges we faced

Every person in our group had a different knowledge base and device to work on. Setting up the work environment for every member of the group took some time, and in the end we settled for exchanging notebooks instead of using the project structure one of our mentors had created in the beginning.

Even getting used to Git proved to be quite a challenge, but after some time everyone got the hang of it.

On the technical side, one of the first problems we came across was the limit the Twitter API puts on basic accounts regarding data access. At first, we thought about working around it using multiple developer accounts, but it turned out that the amount of data we used was still within the range of the basic data plan.

There was also the issue of languages using Arabic and Hebrew lettering, and other formatting problems concerning the UTF-8 standard. We fixed the UTF-8 problem quickly after finding out that Tweepy has built-in support for it, and we manually translated the Arabic and Hebrew parts of the dataset, since there were only a few of them.

Future Work

The most basic goal for the future of our project is to increase the accuracy of our predictions by including more and better features.

There is also the idea of creating a more detailed bot classification that takes the goal of the bot into consideration (e.g. “Which political party does the bot support?”).

When testing our program about three weeks after its completion, we noticed that the F1, precision, and recall scores had all dropped. After thinking about it, we realized that the issue seems to lie in the fact that some of our features are time-dependent (e.g. time_of_existence), thus degrading the performance of our classifier over time. In the future, we want to either incorporate non-time-dependent features or write a protocol that retrains our classifier on current data at fixed intervals.

Afterword

The last couple of months with TechLabs were both challenging and fun. We want to thank everyone at TechLabs, and especially our two mentors Thomas Salzmann and Abdullah Shams, for making this possible and supporting us throughout the process.

- Adrian Kasner, Angelica D’Souza, Arjun Sahni, Jan Beecken, Roman Zipfel, Serkan Aygültekin

References

  1. https://en.wikipedia.org/wiki/Twitter_bot
  2. https://scholar.smu.edu/cgi/viewcontent.cgi?article=1019&context=datasciencereview
  3. https://botometer.osome.iu.edu/bot-repository/
  4. https://www.tweepy.org/

Repository: https://github.com/thomas-salzmann/bothunting

TechLabs Aachen e.V. reserves the right not to be responsible for the topicality, correctness, completeness or quality of the information provided. All references are made to the best of the authors’ knowledge and belief. If, contrary to expectation, a violation of copyright law should occur, please contact aachen@techlabs.org so that the corresponding item can be removed.
