A Holistic Approach to Cyberbullying Mitigation through Advanced Sentiment Analysis and Comment Classification

CyberSentinel

DeepLearning Final Project

Dec 2023

Author: Udaya Singh Rai

Student ID: 4129328

Index

01. Introduction

02. Goals

03. Methodology

04. Data Description

05. Data Preparation

06. Modeling

07. Evaluation

08. Deployment

09. Conclusion

01. Introduction

Cyberbullying is the use of electronic communication to harass, intimidate, or harm individuals through the spread of harmful content, threats, or targeted attacks online.

90% of teens in the US believe cyber harassment is a problem. 15% of young cyberbullying victims would prefer to keep the issue a secret. Students are almost twice as likely to attempt suicide if they have been cyberbullied. 80% of teens say that others cyberbully because they think it is funny. 37% of bullying victims develop social anxiety. 59% of US teenagers have experienced bullying or harassment online. 14.5% of children between the ages of 9 and 12 have been cyberbullied. 66.3% of tweens tried to help the victim of cyberbullying.

Source: https://dataprot.net/statistics/cyberbullying-statistics/

02. Goals

Develop and deploy robust sentiment analysis models, including Multinomial Naive Bayes, LSTM, BiLSTM, Random Forest, and Logistic Regression, to accurately categorize offensive or harmful comments in order to combat cyberbullying and contribute to a safer online environment.

03. Methodology

Data Collection & Preparation

The initial stages involve gathering and preparing data for the project, encompassing data collection and preprocessing.

Model Development & Deployment

Next, the model is constructed, trained on the dataset to equip it for combating cyberbullying, and eventually deployed in real-world scenarios.

04. Data Description

The project uses a curated dataset of 6010 instances of cyberbullying text data, comprising Bengali comments collected from prominent social media platforms such as YouTube, Facebook, and Twitter. The data is evenly distributed among five categories, and statistical analysis provides insights into text lengths, frequency of specific descriptions, comment lengths, linguistic patterns, and unique words across different classes. The dataset's linguistic diversity and potential data quality issues are explored, leading to the identification of repeated comments. Visualization techniques, including word clouds, provide an overview of comment descriptions in each class. An assessment of comment plausibility and a z-score analysis inform the decisions on which data to keep, contributing to the project's data-driven approach.
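The z-score analysis mentioned above could look roughly like the sketch below. It is only an illustration: measuring length in whitespace-separated words, the cut-off of 3 standard deviations, and the function names `length_z_scores` and `flag_length_outliers` are assumptions, not details taken from the project.

```python
from statistics import mean, pstdev


def length_z_scores(comments):
    """Return the z-score of each comment's word count (assumed length measure)."""
    if not comments:
        return []
    lengths = [len(c.split()) for c in comments]
    mu = mean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        # All comments have the same length, so none deviates from the mean.
        return [0.0] * len(lengths)
    return [(n - mu) / sigma for n in lengths]


def flag_length_outliers(comments, threshold=3.0):
    """Return comments whose length z-score exceeds the (assumed) threshold."""
    scores = length_z_scores(comments)
    return [c for c, z in zip(comments, scores) if abs(z) > threshold]
```

Flagged comments can then be inspected by hand before deciding whether to keep or drop them.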

05. Data Preparation

In the data preparation phase, strategic considerations guide data selection and rigorous cleaning, including the removal of stopwords. Labels are encoded, and stemming is applied for further refinement. A specialized function identifies duplicate comments. Because removing duplicates can leave the classes imbalanced, class weights are allocated during model training to ensure fair representation across classes. The final dataset undergoes TF-IDF vectorization and one-hot encoding, setting the stage for model training and evaluation.
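A minimal sketch of the duplicate removal and class-weighting steps, under stated assumptions: duplicates are matched after collapsing whitespace and case-folding, and the weights follow the common "balanced" heuristic n_samples / (n_classes * class_count). The helper names `drop_duplicates` and `balanced_class_weights` are hypothetical, and the project's own function may work differently.

```python
from collections import Counter


def drop_duplicates(comments, labels):
    """Keep the first occurrence of each comment (assumed matching rule:
    whitespace-normalized, case-folded text)."""
    seen = set()
    kept = []
    for text, label in zip(comments, labels):
        key = " ".join(text.split()).casefold()
        if key not in seen:
            seen.add(key)
            kept.append((text, label))
    return kept


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}
```

The resulting dictionary gives rarer classes a larger weight, so each class contributes comparably to the training loss after duplicates are removed.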

06. Modeling

The modeling phase encompasses five techniques for text classification: Multinomial Naive Bayes, Logistic Regression (LR), LSTM, BiLSTM, and One-vs-All LSTM. Each model is outlined with its approach, its architecture (for LSTM and BiLSTM), and its evaluation metrics. Naive Bayes serves as a baseline, LSTM captures sequential patterns, BiLSTM enhances contextual understanding, Random Forest (RFR) predicts classes from the text data, and LR excels in classification. Evaluation includes visual aids and insights into performance across diverse classes, laying the groundwork for subsequent analysis and model selection.
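To make the baseline concrete, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with Laplace smoothing. It is not the project's implementation (which may rely on a library): the class name `TinyMultinomialNB`, the whitespace tokenization, and the smoothing value `alpha=1.0` are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, docs):
        return [self._predict_one(doc) for doc in docs]

    def _predict_one(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for word in doc.split():
                if word in self.vocab:
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on labeled comments, `predict` returns the class with the highest posterior score for each input, which is the reference point the neural models are compared against.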

07. Evaluation & Selection

Following a thorough evaluation of performance metrics, LSTM emerged as the preferred model for final deployment. It was then fine-tuned using semi-supervised learning; the final graph is shown in Figure 2.

Fig. 01

Table 01

Fig. 02

08. Deployment

The comment classification model was deployed using Flask in Google Colab, offering a user-friendly interface and secure access with Ngrok. The setup, involving essential libraries and Pickle for model loading, prioritized scalability, error handling, and security. The deployment plan aligns with best practices, making the model ready for real-world implementation.
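A deployment along these lines might be sketched as follows. The route `/predict`, the JSON field `comment`, the file name `model.pkl`, and the helpers `load_model` and `create_app` are all assumptions for illustration; the Ngrok tunnel is not shown, and the model is assumed to expose a `predict` method that accepts a list of texts.

```python
import pickle

from flask import Flask, jsonify, request


def load_model(path):
    """Load a pickled classifier. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(model):
    """Build a Flask app around any object exposing predict(list_of_texts)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        comment = payload.get("comment") if isinstance(payload, dict) else None
        if not isinstance(comment, str) or not comment.strip():
            # Basic error handling: reject missing or empty input.
            return jsonify(error="field 'comment' must be a non-empty string"), 400
        label = model.predict([comment])[0]
        return jsonify(comment=comment, label=str(label))

    return app


# Hypothetical usage:
#   app = create_app(load_model("model.pkl"))
#   app.run(port=5000)
```

Building the app through a factory keeps the model loading separate from the request handling, so the endpoint can be exercised with a stand-in model before the real one is attached.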

Thanks for your attention

Any questions? If not, click here to use CyberSentinel :)