WHAT DOES REDDIT REVEAL ABOUT THE USA 2024 TRUMP VS. HARRIS ELECTION?
Mo_amin Safari
Created on November 27, 2024
Transcript
Advisor: Dr. Valizadeh
WHAT DOES REDDIT REVEAL ABOUT THE USA 2024 HARRIS VS. TRUMP ELECTION?
Group members: Shayan Kebriti, Mahan Veisi, Parmiss Yousefi, Mohammad Amin Safari
TABLE OF CONTENTS
01. Introduction
02. Objectives
03. Data Collection
04. Methodology
05. Challenges
06. Overview
INTRODUCTION
01
January 2024
June 2024
- Global Economic Impact
- Foreign Policy
- Human Rights & Democracy
- Cultural Influence
The Global Importance of the U.S. Election
Biden
July 21, 2024
Endorsements
Events
https://en.wikipedia.org/wiki/2024_United_States_presidential_election
Graph of the opinion polling between Harris and Trump taken during 2024. The dashed line is when Harris became the presumptive Democratic nominee.
Election Result
OBJECTIVES
The goals or intended outcomes of the project.
02
02. Objectives
How does sentiment shift over time in response to specific events during campaigns?
On which days are the differences in sentiment between each candidate's supporters the greatest and the smallest?
DATA COLLECTION
HOW WE COLLECT THE DATA
03
03. Data Collection
Where to find the data?
How to gather useful data?
What is the challenge of collecting?
How much data do we need?
03. Data Collection
How much data do we need?
Cochran's formula for sample size
Assume (worst case):
- Daily posts = 1,000
- Average comments per post = 5,000
- Daily total comments = 1,000 × 5,000 = 5,000,000
- p = 0.5
- Confidence level = 95%
- Margin of error = 5%
→ Cochran sample size = 385 comments/day (the same value as for an infinite population)
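The sample-size calculation above can be checked with a few lines of Python; the Z value for 95% confidence is the standard 1.96.

```python
import math

# Cochran's formula for sample size: n0 = Z^2 * p * (1 - p) / e^2,
# with the worst-case proportion p = 0.5, 95% confidence (Z = 1.96),
# and a 5% margin of error, as assumed on the slide.
def cochran_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    return math.ceil(z**2 * p * (1 - p) / e**2)

print(cochran_sample_size())  # 385
```

This reproduces the 385 comments/day figure; for very large (effectively infinite) populations the finite-population correction changes nothing, which is why the slide notes the same value.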
Methodology
Analyzing Reddit comments using sentiment analysis
04
04. Methodology
PREREQUISITE
large language model (LLM)
A deep learning algorithm designed to summarize, translate, predict, and generate human-like text, enabling it to convey ideas and concepts effectively
Natural Language Processing (NLP)
NLP is a technology that enables machines to analyze and understand human language, used here to assess sentiment and extract insights from text data.
04. Methodology
steps
1. Preprocessing
Preparing the data by cleaning and organizing Reddit comments for analysis.
2. Sentiment Analysis
Analyzing the emotional tone of the comments to identify positive, negative, or neutral sentiments.
3. Data Analysis
Examining trends and correlations between sentiment shifts and key campaign events.
04. Methodology
Preprocess
- Filter comments by campaign dates
- Detect and remove non-English comments
- Lowercasing
- Remove URLs
- Handle user mentions and subreddit mentions
- Handle repetitions
- Remove excess whitespace
- Remove non-ASCII characters
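The cleaning steps above can be sketched in Python with the standard `re` module; the exact regex patterns and their ordering are our assumptions, not the project's actual pipeline.

```python
import re

def preprocess(comment: str) -> str:
    """Clean a Reddit comment: a sketch of the steps listed above."""
    text = comment.lower()                            # lowercasing
    text = re.sub(r"https?://\S+", " ", text)         # remove URLs
    text = re.sub(r"/?u/[A-Za-z0-9_-]+", " ", text)   # user mentions
    text = re.sub(r"/?r/[A-Za-z0-9_]+", " ", text)    # subreddit mentions
    text = text.encode("ascii", "ignore").decode()    # drop non-ASCII characters
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)        # collapse repetitions ("sooo" -> "soo")
    text = re.sub(r"\s+", " ", text).strip()          # remove excess whitespace
    return text

print(preprocess("Sooo COOL!!! see https://example.com /u/bob"))  # soo cool!! see
```

Date filtering and language detection are omitted here since they depend on the dataset's metadata and an external language-ID library.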
These are pre-trained machine learning models that have been further trained (fine-tuned) on specific datasets to improve their performance for tasks like sentiment analysis. (QR-Codes)
Sentiment Analysis
04. Methodology
Sentiment analysis uses NLP to detect and categorize emotions in text, identifying whether the sentiment is positive, negative, or neutral.
Models fine-tuned for these purposes:
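As a minimal stand-in for the fine-tuned models described above, the positive/negative/neutral labeling can be illustrated with a simple lexicon-based scorer; the word lists here are illustrative assumptions, not the project's actual resources.

```python
# Toy lexicon-based sentiment classifier (illustrative only; the project
# uses fine-tuned transformer models for this task).
POSITIVE = {"great", "win", "love", "good", "strong"}
NEGATIVE = {"bad", "lose", "hate", "weak", "corrupt"}

def classify_sentiment(comment: str) -> str:
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("a strong win"))  # positive
```

A fine-tuned model replaces the hand-written lexicon with learned weights, which is what allows it to handle context the word lists cannot.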
challenges
limits and restrictions
05
05. Challenges
Lack of Context Understanding
Models may struggle with sarcasm, slang, or ambiguous language.
Maintenance
Collecting high-quality, labeled data can be costly, especially if specialized data is needed.
Scalability
Handling large datasets efficiently without compromising model speed or accuracy.
Noise and Irrelevant Features
Presence of irrelevant or noisy data can reduce model accuracy.
https://openrouter.ai/openai/chatgpt-4o-latest
05. Challenges
MAINTENANCE
Assuming each comment averages 200 tokens and the total number of comments is 150,000, the cost estimate for this method is:
150,000 × [200 × (5 / 10⁶) + 15 / 10⁶] = $152.25
This is in the range of 10 to 11 million Iranian toman.
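The arithmetic above can be verified with a short script; the per-token input price and the small per-comment output allowance are the figures assumed on the slide.

```python
# Token-cost estimate using the slide's assumptions:
# 150,000 comments, 200 tokens each, $5 per 1M input tokens,
# plus a $15-per-1M allowance per comment.
def estimate_cost(comments: int = 150_000,
                  tokens_per_comment: int = 200,
                  input_price_per_token: float = 5 / 1e6,
                  output_price_per_comment: float = 15 / 1e6) -> float:
    return comments * (tokens_per_comment * input_price_per_token
                       + output_price_per_comment)

print(f"${estimate_cost():.2f}")  # $152.25
```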
goal
SIMILAR RESEARCH
The referenced article analyzed news data and applied sentiment analysis with GPT-4 to identify news trends. We aim to conduct a similar analysis on Reddit comments, and to take it a step further with something that, to our knowledge, hasn't been done before: examining when trends shifted and identifying the significant events that occurred on those specific days.
goal
SIMILAR RESEARCH
H0 (Null Hypothesis): There are no significant changes in Reddit users' comments on any of the examined days.
H1 (Alternative Hypothesis): There are significant changes in Reddit users' comments on some of the examined days.
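One way H0 could be tested day by day is a z-test comparing a day's mean sentiment score against the campaign-wide baseline; the choice of test, the scores, and the baseline values below are illustrative assumptions, not project results.

```python
import math

# z-statistic for one day's mean sentiment versus a campaign-wide baseline.
# Scores are assumed to lie in [-1, 1]; |z| > 1.96 rejects H0 at the 5% level.
def z_statistic(day_scores: list[float],
                baseline_mean: float,
                baseline_std: float) -> float:
    n = len(day_scores)
    day_mean = sum(day_scores) / n
    return (day_mean - baseline_mean) / (baseline_std / math.sqrt(n))

# Example with made-up daily scores:
z = z_statistic([0.4, 0.5, 0.3, 0.6], baseline_mean=0.1, baseline_std=0.2)
print(abs(z) > 1.96)  # True -> a significant shift on this day
```

Days where |z| exceeds the critical value are the candidates to cross-reference against campaign events.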
overview
06
Data Collection
Data Preprocessing and Cleaning
Analysis and Modeling
Results and Visualization
Report Writing and Final Submission