DL CW2
Udaya Singh Rai
Created on December 5, 2023
A Holistic Approach to Cyberbullying Mitigation through Advanced Sentiment Analysis and Comment Classification
CyberSentinel
DeepLearning Final Project
Dec 2023
Author: Udaya Singh Rai
Student ID: 4129328
Index
01. Introduction
02. Goals
03. Methodology
04. Data Description
05. Data Preparation
06. Modeling
07. Evaluation
08. Deployment
09. Conclusion
01. Introduction
Cyberbullying is the use of electronic communication to harass, intimidate, or harm individuals through the spread of harmful content, threats, or targeted attacks online.
90% of teens in the US believe cyber harassment is a problem.
15% of young cyberbullying victims would prefer to keep the issue a secret.
Students are almost twice as likely to attempt suicide if they have been cyberbullied.
80% of teens say that others cyberbully because they think it is funny.
37% of bullying victims develop social anxiety.
59% of US teenagers have experienced bullying or harassment online.
14.5% of children between the ages of 9 and 12 have been cyberbullied.
66.3% of tweens tried to help the victim of cyberbullying.
Source: https://dataprot.net/statistics/cyberbullying-statistics/
02. Goals
Develop and deploy robust sentiment analysis models, including Multinomial Naive Bayes, LSTM, BiLSTM, Random Forest, and Logistic Regression, to accurately categorize offensive or harmful comments in order to combat cyberbullying and contribute to a safer online environment.
03. Methodology
Data Collection & Preparation
The initial stages involve gathering and preparing data for the project, encompassing data collection and preprocessing.
Model Development & Deployment
Next, the model is constructed, trained on the dataset to equip it for combating cyberbullying, and eventually deployed in real-world scenarios.
04. Data Description
The project uses a curated dataset of 6010 Bengali comments containing cyberbullying text, collected from social media platforms such as YouTube, Facebook, and Twitter. The data is evenly distributed among five categories. Statistical analysis covers text and comment lengths, the frequency of specific descriptions, linguistic patterns, and unique words across classes. Exploring the dataset's linguistic diversity and potential data quality issues led to the identification of repeated comments. Word clouds and other visualizations give an overview of the comments in each class. An assessment of comment plausibility, together with z-score analysis, informs decisions about which data to keep.
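The source does not say which feature or threshold the z-score analysis used. As a minimal sketch, assuming it was applied to comment lengths counted in whitespace-separated tokens, the following flags comments that are unusually long or short. The function names, the threshold of 2.0, and the placeholder comments are illustrative assumptions, not taken from the project.

```python
from statistics import mean, pstdev


def length_z_scores(comments):
    """Return the z-score of each comment's token count."""
    lengths = [len(c.split()) for c in comments]
    mu = mean(lengths)
    sigma = pstdev(lengths)  # population standard deviation
    if sigma == 0:
        # All comments have the same length, so none is an outlier.
        return [0.0 for _ in lengths]
    return [(n - mu) / sigma for n in lengths]


def flag_outliers(comments, threshold=2.0):
    """Return comments whose length z-score exceeds the threshold in absolute value."""
    scores = length_z_scores(comments)
    return [c for c, z in zip(comments, scores) if abs(z) > threshold]


# Placeholder comments (assumption): five short ones and one 40-token comment.
sample = ["a b", "c d e", "f g", "h i j", "k l", "m " * 40]
outliers = flag_outliers(sample)
```

Flagged comments can then be inspected by hand before deciding whether to keep them.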
05. Data Preparation
In the data preparation phase, data is selected and rigorously cleaned, including the removal of stopwords. Labels are encoded, and stemming is applied for further refinement. A dedicated function identifies duplicate comments. Because removing duplicates can leave the classes imbalanced, class weights are assigned during model training to ensure fair representation across classes. The final dataset undergoes TF-IDF vectorization and one-hot encoding, setting the stage for model training and evaluation.
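The duplicate handling and class weighting can be sketched as follows. This is a minimal illustration, not the project's own function: it assumes duplicates are exact matches after whitespace normalization, and it uses the common "balanced" heuristic, n_samples / (n_classes * class_count), which is the same formula scikit-learn applies for `class_weight="balanced"`. The labels `class_a`, `class_b`, and `class_c` are hypothetical stand-ins for the dataset's five categories.

```python
from collections import Counter


def drop_duplicates(comments, labels):
    """Keep the first occurrence of each comment, comparing whitespace-normalized text."""
    seen = set()
    kept_comments, kept_labels = [], []
    for text, label in zip(comments, labels):
        key = " ".join(text.split())
        if key in seen:
            continue
        seen.add(key)
        kept_comments.append(text)
        kept_labels.append(label)
    return kept_comments, kept_labels


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Placeholder data (assumption): two of the six comments are repeats.
comments = ["x y", "x  y", "z", "w", "z", "v"]
labels = ["class_a", "class_a", "class_b", "class_a", "class_b", "class_c"]

clean_comments, clean_labels = drop_duplicates(comments, labels)
weights = balanced_class_weights(clean_labels)
```

Rarer classes receive larger weights, so their errors count more in the training loss.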
06. Modeling
The modeling phase encompasses five techniques for text classification: Multinomial Naive Bayes, Logistic Regression (LR), LSTM, BiLSTM, and One-vs-All LSTM. Each model is outlined with its approach, its architecture (for LSTM and BiLSTM), and its evaluation metrics. Naive Bayes serves as a baseline, LSTM captures sequential patterns, BiLSTM enhances contextual understanding, Random Forest is also applied to the text data, and LR excels in classification. Evaluation includes visual aids and insights into performance across the classes, laying the groundwork for subsequent analysis and model selection.
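To make the baseline concrete, here is a minimal from-scratch sketch of Multinomial Naive Bayes over raw token counts with add-one (Laplace) smoothing. It is an illustration only: the project's actual model, features (TF-IDF), and library are not shown in the source, and the class name `TinyMultinomialNB`, the English placeholder comments, and the two labels are assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc.split()}
        # Per-class token frequencies.
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc.split())
        class_counts = Counter(labels)
        total = len(labels)
        self.log_prior = {c: math.log(class_counts[c] / total) for c in self.classes}
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, cls):
        num = self.token_counts[cls][token] + self.alpha
        den = self.total_tokens[cls] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in doc.split() if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Placeholder training data (assumption); the real dataset is Bengali with five classes.
docs = ["you are stupid", "stupid idiot", "nice video", "great video thanks"]
labels = ["bully", "bully", "neutral", "neutral"]
nb = TinyMultinomialNB().fit(docs, labels)
```

A baseline like this gives a reference score against which the LSTM and BiLSTM models can be compared.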
07. Evaluation & Selection
Following a thorough evaluation of performance metrics, LSTM emerged as the preferred model for final deployment. Subsequently, it underwent fine-tuning using semi-supervised learning, culminating in the final graph depicted in Figure 2.
Table 01
Fig. 01
Fig. 02
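The source does not list which performance metrics drove model selection. As a minimal sketch, assuming per-class precision, recall, and F1 with a macro average (a common choice for multi-class comment classification), the comparison could be computed as below. The function names and the labels `a`, `b`, and `c` are hypothetical.

```python
def per_class_report(y_true, y_pred):
    """Compute precision, recall, and F1 for every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in report.values()) / len(report)


# Placeholder labels (assumption): one "a" comment is misclassified as "b".
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
report = per_class_report(y_true, y_pred)
score = macro_f1(report)
```

Macro averaging treats every class equally, which matters when duplicate removal has left the classes imbalanced.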
08. Deployment
The comment classification model was deployed using Flask in Google Colab, offering a user-friendly interface and secure access with Ngrok. The setup, involving essential libraries and Pickle for model loading, prioritized scalability, error handling, and security. The deployment plan aligns with best practices, making the model ready for real-world implementation.
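A deployment along these lines might look like the sketch below. It is an assumption-laden illustration, not the project's app: the `/predict` route, the JSON field `comment`, and the `classify` helper are hypothetical, and a small pickled dictionary stands in for the trained model file. Exposing the local port through an Ngrok tunnel is left out.

```python
import pickle

from flask import Flask, jsonify, request

# Stand-in for loading the pickled model from disk (assumption): a keyword-to-label lookup.
_model_bytes = pickle.dumps({"stupid": "bully"})
model = pickle.loads(_model_bytes)


def classify(comment):
    """Hypothetical predictor: return the label of the first known keyword."""
    for token in comment.lower().split():
        if token in model:
            return model[token]
    return "not bully"


app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    comment = payload.get("comment", "")
    if not isinstance(comment, str) or not comment.strip():
        return jsonify(error="field 'comment' must be a non-empty string"), 400
    return jsonify(label=classify(comment))


# To serve locally, call app.run(port=5000); a tunnel can then forward to that port.
```

Validating the input before calling the model keeps malformed requests from reaching it and returns a clear error to the client.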
Thanks for your attention
Any questions? If not, click here to use CyberSentinel :)