
WHAT DOES REDDIT REVEAL ABOUT THE USA 2024 TRUMP VS. HARRIS ELECTION?

Mo_amin Safari

Created on November 27, 2024


Transcript

Advisor: Dr. Valizadeh

WHAT DOES REDDIT REVEAL ABOUT THE USA 2024 HARRIS VS. TRUMP ELECTION?

Group members: Shayan Kebriti, Mahan Veisi, Parmiss Yousefi, Mohammad Amin Safari

TABLE OF CONTENTS

01. Introduction

02. Objectives

03. Data Collection

04. Methodology

05. Challenges

06. Overview

INTRODUCTION

01

January 2024

June 2024

  • Global Economic Impact
  • Foreign Policy
  • Human Rights & Democracy
  • Cultural Influence

The Global Importance of the U.S. Election

Biden

July 21, 2024

Endorsements

Events

https://en.wikipedia.org/wiki/2024_United_States_presidential_election

Graph of opinion polling between Harris and Trump during 2024. The dashed line marks when Harris became the presumptive Democratic nominee.

Election Result

OBJECTIVES

The goals or intended outcomes of the project.

02

02. Objectives

How does sentiment shift over time in response to specific events during campaigns?

On which days are the differences in sentiment between each candidate's supporters greatest, and on which days are they smallest?

DATA COLLECTION

HOW WE COLLECT THE DATA

03

03. data collection

Where to find the data?

How to gather useful data?

What is the challenge of collecting?

How much data do we need?



How much data do we need?

Cochran's formula for sample size: n₀ = z²p(1−p)/e²


Assume (worst case):
  Daily posts = 1,000
  Average comments per post = 5,000
  Daily total comments = 1,000 × 5,000 = 5,000,000
  p = 0.5
  Confidence level = 95%
  Margin of error = 5%
→ Cochran sample size = 385 comments/day (the same value as for an infinite population)
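The arithmetic above can be reproduced in a few lines of Python. This is an illustrative sketch (not the project's code), applying Cochran's formula n₀ = z²p(1−p)/e² with an optional finite-population correction:

```python
import math

def cochran_sample_size(p=0.5, z=1.96, e=0.05, population=None):
    """Cochran's formula: n0 = z^2 * p * (1 - p) / e^2.
    If a population size is given, apply the finite-population correction."""
    n0 = (z ** 2) * p * (1 - p) / e ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# Worst-case assumptions from the slide:
daily_posts = 1_000
avg_comments_per_post = 5_000
daily_total_comments = daily_posts * avg_comments_per_post  # 5,000,000

print(cochran_sample_size())                                 # infinite population
print(cochran_sample_size(population=daily_total_comments))  # finite correction
```

With 5,000,000 daily comments the finite-population correction is negligible, which is why the slide reports the same 385 comments/day as for an infinite population.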


Methodology

analyzing Reddit COMMENTS using sentiment analysis

04

04. Methodology

PREREQUISITE

large language model (LLM)

A deep learning algorithm designed to summarize, translate, predict, and generate human-like text, enabling it to convey ideas and concepts effectively


Natural Language Processing (NLP)

NLP is a technology that enables machines to analyze and understand human language, used here to assess sentiment and extract insights from text data.

04. Methodology

steps

1. Preprocessing

Preparing the data by cleaning and organizing Reddit comments for analysis.

2. Sentiment Analysis

Analyzing the emotional tone of the comments to identify positive, negative, or neutral sentiments.

3. Data Analysis

Examining trends and correlations between sentiment shifts and key campaign events.

04. Methodology

Preprocess

  • Filter comments by campaign dates
  • Detect and remove non-English comments
  • Lowercasing
  • Remove URLs
  • Handle user mentions and subreddit mentions
  • Handle repetitions
  • Remove excess whitespace
  • Remove non-ASCII characters
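Most of the cleaning steps above can be sketched with the standard-library `re` module. The function below is illustrative, not the project's actual code, and omits the date filtering and language-detection steps:

```python
import re

def preprocess_comment(text: str) -> str:
    """Clean a raw Reddit comment following the listed preprocessing steps."""
    text = text.lower()                                 # lowercasing
    text = re.sub(r"https?://\S+|www\.\S+", "", text)   # remove URLs
    text = re.sub(r"/?u/[\w-]+|/?r/[\w-]+", "", text)   # user/subreddit mentions
    text = re.sub(r"[^\x00-\x7F]+", "", text)           # remove non-ASCII characters
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)          # collapse repetitions: "sooo" -> "soo"
    text = re.sub(r"\s+", " ", text).strip()            # remove excess whitespace
    return text

print(preprocess_comment("Sooo GOOD!! see r/politics or https://example.com"))
```

The order matters: URLs and mentions are stripped before whitespace is collapsed, so the gaps they leave behind are cleaned up in the final step.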

These are pre-trained machine learning models that have been further trained (fine-tuned) on specific datasets to improve their performance for tasks like sentiment analysis. (QR-Codes)


Sentiment Analysis

04. Methodology

Sentiment analysis uses NLP to detect and categorize emotions in text, identifying whether the sentiment is positive, negative, or neutral.

Models fine-tuned for these purposes:
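The project relies on fine-tuned transformer models for this labeling. Purely to illustrate what positive/negative/neutral categorization means in practice, here is a toy lexicon-based scorer; the word lists are made up for the example and are not part of the project:

```python
# Toy lexicon scorer illustrating positive/negative/neutral labeling.
# The actual analysis uses fine-tuned transformer models, not word lists.
POSITIVE = {"great", "win", "hope", "support", "love"}
NEGATIVE = {"bad", "lose", "fear", "corrupt", "hate"}

def label_sentiment(comment: str) -> str:
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_sentiment("I love this great win"))  # positive
```

A real model replaces the word-count score with a learned function of the whole sentence, which is what lets it handle negation and context that a lexicon cannot.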

challenges

limits and restrictions

05

05. Challenges

Lack of Context Understanding

Models may struggle with sarcasm, slang, or ambiguous language.

Maintenance

Collecting high-quality, labeled data can be costly, especially if specialized data is needed.

Scalability

Handling large datasets efficiently without compromising model speed or accuracy.

Noise and Irrelevant Features

Presence of irrelevant or noisy data can reduce model accuracy.

https://openrouter.ai/openai/chatgpt-4o-latest

05. Challenges

MAINTENANCE

Assuming each comment averages 200 tokens and the total number of comments is 150,000, the cost estimate for this method is:

150,000 × [200 × (5 / 10⁶) + 15 / 10⁶] = $152.25

This is in the range of 10 to 11 million Iranian toman.
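The estimate can be checked in a couple of lines. The $5-per-million-token input price and the additional 15/10⁶ per-comment term are taken from the slide's assumptions, not verified against current OpenRouter pricing:

```python
# Reproducing the slide's cost estimate under its stated assumptions.
comments = 150_000
tokens_per_comment = 200
per_token_usd = 5 / 10**6       # assumed $5 per 1M input tokens
per_comment_usd = 15 / 10**6    # assumed fixed per-comment term from the slide

cost_usd = comments * (tokens_per_comment * per_token_usd + per_comment_usd)
print(f"${cost_usd:.2f}")
```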

goal

SIMILAR RESEARCH

This article analyzed news data and applied sentiment analysis using GPT-4 to identify news trends. Similarly, we aim to conduct such an analysis on Reddit comments. However, we want to take it a step further by doing something that hasn't been done before: examining when trends shifted and identifying the significant events that occurred on those specific days.


H0 (Null Hypothesis): There are no significant changes in Reddit users' comments on any of the examined days.
H1 (Alternative Hypothesis): There are significant changes in Reddit users' comments on some of the examined days.
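One way such a day-by-day comparison could be carried out (a sketch under assumed counts, not the project's actual procedure) is a two-proportion z-test on the share of positive comments, rejecting H0 at the 5% significance level when |z| > 1.96:

```python
import math

def two_proportion_z(pos_a, n_a, pos_b, n_b):
    """z-statistic for the difference between two sample proportions."""
    p1, p2 = pos_a / n_a, pos_b / n_b
    p = (pos_a + pos_b) / (n_a + n_b)  # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p1 - p2) / se

# Hypothetical counts: 240/385 positive comments on the day in question
# vs. 180/385 positive in a baseline window (385 = the daily sample size).
z = two_proportion_z(240, 385, 180, 385)
print(abs(z) > 1.96)  # reject H0 at the 5% level?
```

The 385-comment samples here echo the Cochran sample size computed earlier; the positive counts are invented for illustration.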

overview

06

06. overview

Data Collection

Data Preprocessing and Cleaning

Analysis and Modeling

Results and Visualization

Report Writing and Final Submission

Thank you for your attention!