DL CW2

Udaya Singh Rai

Created on December 5, 2023


CyberSentinel

A Holistic Approach to Cyberbullying Mitigation through Advanced Sentiment Analysis and Comment Classification

Author: Udaya Singh Rai
Student ID: 4129328

Deep Learning Final Project

Dec 2023

Index

01. Introduction

02. Goals

03. Methodology

04. Data Description

05. Data Preparation

06. Modeling

07. Evaluation

08. Deployment

09. Conclusion

01. Introduction

Cyberbullying is the use of electronic communication to harass, intimidate, or harm individuals through the spread of harmful content, threats, or targeted attacks online.

90% of teens in the US believe cyber harassment is a problem.
15% of young cyberbullying victims would prefer to keep the issue a secret.
Students are almost twice as likely to attempt suicide if they have been cyberbullied.
80% of teens say that others cyberbully because they think it is funny.
37% of bullying victims develop social anxiety.
59% of US teenagers have experienced bullying or harassment online.
14.5% of children between the ages of 9 and 12 have been cyberbullied.
66.3% of tweens tried to help the victim of cyberbullying.

https://dataprot.net/statistics/cyberbullying-statistics/

02. Goals

Develop and deploy robust sentiment analysis models, including Multinomial Naive Bayes, LSTM, BiLSTM, Random Forest, and Logistic Regression, to accurately categorize offensive or harmful comments in order to combat cyberbullying and contribute to a safer online environment.

03. Methodology

Data Collection & Preparation

The initial stages involve gathering and preparing data for the project, encompassing data collection and preprocessing.

Model Development & Deployment

Next, the model is constructed, trained on the dataset to equip it for combating cyberbullying, and eventually deployed in real-world scenarios.

04. Data Description

The project involves a meticulously curated dataset of 6010 instances of cyberbullying text data, comprising Bengali comments collected from prominent social media platforms such as YouTube, Facebook, and Twitter. The data is evenly distributed among five categories, and statistical analysis provides insights into text lengths, frequency of specific descriptions, comment lengths, linguistic patterns, and unique words across different classes. The dataset's linguistic diversity and potential data quality issues are explored, leading to the identification of repetitions. Visualization techniques, including word clouds, provide a comprehensive overview of comment descriptions in various classes. The assessment of comment plausibility and z-score analysis influences decisions on data preservation strategies, contributing to the project's data-driven approach.
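The z-score analysis of comment lengths mentioned above could look roughly like the sketch below. It is only an illustration: the function names, the whitespace tokenization, and the cutoff of 3.0 are assumptions, not details taken from the project.

```python
from statistics import mean, pstdev


def length_z_scores(comments):
    """Return the z-score of each comment's word count."""
    lengths = [len(c.split()) for c in comments]
    mu = mean(lengths)
    sigma = pstdev(lengths)  # population standard deviation
    if sigma == 0:
        # All comments have the same length, so none deviates from the mean.
        return [0.0 for _ in lengths]
    return [(n - mu) / sigma for n in lengths]


def flag_outliers(comments, threshold=3.0):
    """Return comments whose |z| exceeds the threshold (3.0 is an assumed cutoff)."""
    scores = length_z_scores(comments)
    return [c for c, z in zip(comments, scores) if abs(z) > threshold]
```

Flagged comments are candidates for manual review; whether they are kept or dropped is the data preservation decision the slide refers to.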

05. Data Preparation

In the data preparation phase, strategic considerations guide data selection and rigorous cleaning, including the removal of stopwords. Labels are encoded, and stemming is applied for further refinement. A specialized function identifies duplicate comments. Because removing duplicates can leave the classes imbalanced, class weights are assigned during model training to ensure fair representation across classes. The final dataset undergoes TF-IDF vectorization and one-hot encoding, setting the stage for effective model training and evaluation.
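As a rough sketch of the duplicate removal and class weighting steps, assuming comments and labels are held in parallel lists: the helper names are hypothetical, and the weighting formula is the common "balanced" heuristic (the same one scikit-learn uses for `class_weight="balanced"`), which may differ from the weights actually used in the project.

```python
from collections import Counter


def drop_duplicates(comments, labels):
    """Keep the first occurrence of each comment (compared after trimming whitespace)."""
    seen = set()
    kept_comments, kept_labels = [], []
    for text, label in zip(comments, labels):
        key = text.strip()
        if key in seen:
            continue
        seen.add(key)
        kept_comments.append(text)
        kept_labels.append(label)
    return kept_comments, kept_labels


def balanced_class_weights(labels):
    """Assumed heuristic: weight(c) = n_samples / (n_classes * count(c))."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}
```

Rarer classes receive larger weights, so a misclassified comment from a class that shrank during deduplication costs more during training.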

06. Modeling

The modeling phase encompasses five techniques for text classification: Multinomial Naive Bayes, Logistic Regression (LR), LSTM, BiLSTM, and One-vs-All LSTM. Each model is outlined with its approach, architecture (for LSTM and BiLSTM), and evaluation metrics. Naive Bayes serves as a baseline, LSTM captures sequential patterns, BiLSTM enhances contextual understanding by reading each comment in both directions, Random Forest (RFR) predicts labels for the text data, and LR excels in classification. Evaluation includes visual aids and insights into performance across diverse classes, laying the groundwork for subsequent analysis and model selection.
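To make the baseline concrete, here is a minimal from-scratch Multinomial Naive Bayes with add-one (Laplace) smoothing. It is a teaching sketch, not the project's implementation: the class name is hypothetical, tokens are split on whitespace, and the English toy comments and the labels "bully" and "neutral" stand in for the Bengali data and its five categories.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc.split()}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        class_counts = Counter(labels)
        # Log prior: share of training comments in each class.
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict(self, doc):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc.split():
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                numerator = self.word_counts[c][tok] + 1
                denominator = self.total_words[c] + vocab_size
                score += math.log(numerator / denominator)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy data for illustration only.
docs = ["you are stupid", "stupid idiot", "nice video", "great nice song"]
labels = ["bully", "bully", "neutral", "neutral"]
model = TinyMultinomialNB().fit(docs, labels)
print(model.predict("stupid video idiot"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.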

07. Evaluation & Selection

Table 01
Fig. 01
Fig. 02

Following a thorough evaluation of performance metrics, LSTM emerged as the preferred model for final deployment. It was then fine-tuned using semi-supervised learning; the final graph is shown in Fig. 02.
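The slides do not list which metrics were compared, so the following is only an assumed example of a per-class comparison: precision, recall, and F1 for each class, plus the macro-averaged F1 that treats all classes equally. Function names are hypothetical.

```python
def per_class_f1(y_true, y_pred):
    """Return precision, recall, and F1 for every class seen in either list."""
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    report = per_class_f1(y_true, y_pred)
    return sum(r["f1"] for r in report.values()) / len(report)
```

Running the same functions on each candidate model's predictions gives a like-for-like table from which a model can be selected.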

08. Deployment

The comment classification model was deployed using Flask in Google Colab, offering a user-friendly interface and secure access with Ngrok. The setup, involving essential libraries and Pickle for model loading, prioritized scalability, error handling, and security. The deployment plan aligns with best practices, making the model ready for real-world implementation.
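A minimal sketch of what a Flask endpoint around a pickled classifier might look like. The route name `/predict`, the JSON field `comment`, and the `KeywordModel` stand-in are all assumptions for illustration; the actual app, its model file, and the Ngrok tunnel setup are not shown in the slides. The sketch assumes the loaded object exposes a scikit-learn style `predict(list_of_texts)` method.

```python
import pickle

from flask import Flask, jsonify, request


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(model):
    """Build a Flask app that classifies one comment per request."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        comment = payload.get("comment") if isinstance(payload, dict) else None
        if not isinstance(comment, str) or not comment.strip():
            # Basic error handling: reject missing or empty input.
            return jsonify(error="field 'comment' must be a non-empty string"), 400
        label = model.predict([comment])[0]
        return jsonify(comment=comment, label=str(label))

    return app


class KeywordModel:
    """Hypothetical stand-in for the trained classifier, used only for this demo."""

    def predict(self, texts):
        return ["bully" if "stupid" in t.lower() else "neutral" for t in texts]


# Exercise the endpoint in-process with Flask's test client (no server started).
client = create_app(KeywordModel()).test_client()
print(client.post("/predict", json={"comment": "you are stupid"}).get_json())
```

In a Colab deployment the app would instead be created from `load_model(...)` with the real model file and started with `app.run(...)`, with Ngrok exposing the local port.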

Thanks for your attention

Any questions? If not, click here to use CyberSentinel :)