Using Random Forest and Naïve Bayes Algorithms in Detection of Cyberbullying on Twitter
Abstract
The internet has permeated every aspect of human life, making it simpler to connect people all
over the world and to share information with a wider audience. The purpose of this research
is to carry out a comparative analysis and performance evaluation of two machine learning
algorithms for the detection of bullying tweets. Despite its significant importance, the
cyberworld has numerous negative impacts on individuals. One of the most
dangerous threats in the cyberworld is cyberbullying, as it damages individuals' reputation or
privacy, threatens or harasses them, and sometimes leads to suicidal acts. Therefore, an
effective and automatic detection model is proposed so that bullies' abusive tweets or
insulting comments can be identified using machine learning and natural
language processing. Two machine learning algorithms, namely Naïve Bayes
(NV) and Random Forest (RF), were used for cyberbullying detection. The datasets
retrieved from Kaggle were used to train and test the model to classify tweets as
either bullying or non-bullying (a binary classification model), and the features were
extracted using Term Frequency-Inverse Document Frequency (TF-IDF).
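
The pipeline described above (TF-IDF features feeding a Naïve Bayes and a Random Forest classifier for a binary bullying/non-bullying decision) can be sketched roughly as follows. This is only an illustrative sketch using scikit-learn, not the implementation used in this study: the toy tweets and labels are invented for the example, the choice of `MultinomialNB` as the Naïve Bayes variant is an assumption, and the hyperparameters (`n_estimators=100`, `random_state=0`) are placeholders rather than the study's settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (not the Kaggle dataset): 1 = bullying, 0 = non-bullying.
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you, just leave",
    "you are a worthless loser",
    "had a great day at the park",
    "congratulations on the new job",
    "looking forward to the weekend",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["you are a stupid loser", "great weekend at the park"]


def build_models():
    """Return one TF-IDF + classifier pipeline per algorithm being compared."""
    return {
        # MultinomialNB is an assumed Naive Bayes variant; it accepts TF-IDF weights.
        "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        # Placeholder hyperparameters; random_state fixes the seed for repeatability.
        "random_forest": make_pipeline(
            TfidfVectorizer(),
            RandomForestClassifier(n_estimators=100, random_state=0),
        ),
    }


predictions = {}
for name, model in build_models().items():
    # Each pipeline fits its own TF-IDF vocabulary on the training tweets only.
    model.fit(train_texts, train_labels)
    predictions[name] = model.predict(test_texts).tolist()
    print(name, predictions[name])
```

Wrapping the vectorizer and the classifier in a single pipeline keeps the TF-IDF vocabulary fitted on training tweets only, so the two algorithms are compared on identical features without leaking test data into feature extraction.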