ML_PRESENTATION

163_SHREYA SUMBLY

Created on May 4, 2023

MACHINE LEARNING WITH PYTHON

Fake News

Build a system to identify unreliable news articles

Presented by: 1163 - Shreya Sumbly 2004 - Aditi Jagtap 2005 - Shreya Ambeti 2007 - Mrudula Arvikar

CONTENT

DATA COLLECTION AND PROCESSING

VISUALIZATION

INTRODUCTION

MODEL DESCRIPTION

MODEL SELECTION

TESTING AND EVALUATION OF MODEL

OUTCOME

CONCLUSION

INTRODUCTION

Do you trust all the news you hear from social media?

Not all news is real, right?

How will you detect fake news?

What is Fake News? A type of yellow journalism, fake news comprises pieces of news that may be hoaxes and are generally spread through social media and other online media.

This project builds a machine learning model to classify news articles as real or fake. The model is trained on a dataset of news articles and their corresponding labels. The goal is to classify articles accurately based on their textual content.

The project involves several steps: data preprocessing, feature extraction, model training, and evaluation. Data preprocessing cleans the raw data to remove noise and inconsistencies. Feature extraction converts the textual data into numerical form using TF-IDF weighting, implemented with scikit-learn's TfidfVectorizer. Model training fits a logistic regression model on the preprocessed data. Finally, the model is evaluated using accuracy, a confusion matrix, and a classification report.

The project is useful for detecting fake news articles and limiting the spread of misinformation. It can be applied in various domains, including social media, news websites, and online forums. It could be further improved by using more advanced machine learning algorithms and by incorporating other signals such as image analysis, social network analysis, and sentiment analysis.

DATA COLLECTION AND PROCESSING

In this project, we used a publicly available dataset of news articles collected from various sources.

The dataset contains both real and fake news articles, which are labeled accordingly.
The news dataset used in this project was downloaded from Kaggle, a popular platform for data science projects. The dataset was originally compiled by William Yang Wang from the University of California, Santa Barbara, and it contains about 20,000 news articles, roughly half of which are labeled as real news and the rest as fake news. The dataset can be downloaded from [here](https://www.kaggle.com/c/fake-news/data).
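
As a minimal sketch of loading such a dataset with pandas: the file name `train.csv`, the column names (`id`, `title`, `author`, `text`, `label`), and the convention that 1 means fake are assumptions about the Kaggle download. The two in-memory rows are made up so the snippet runs without the real file.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the Kaggle file. With the real download you would
# call pd.read_csv("train.csv") instead (file name assumed).
SAMPLE_CSV = io.StringIO(
    "id,title,author,text,label\n"
    "0,Council approves new budget,Jane Doe,The city council voted on Tuesday.,0\n"
    "1,Aliens endorse candidate,,Sources say visitors from space agree.,1\n"
)

news_df = pd.read_csv(SAMPLE_CSV)

# Inspect the size, the class balance, and the missing values per column.
n_rows, n_cols = news_df.shape
label_counts = news_df["label"].value_counts().to_dict()
missing_per_column = news_df.isnull().sum().to_dict()

print(n_rows, n_cols)
print(label_counts)
print(missing_per_column)
```

Checking the class balance and the missing values first shows whether accuracy is a fair metric and which columns need filling before the text is processed.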

PROCESSING

Before feeding the textual data to the machine learning model, we need to preprocess it to make it suitable for analysis. Here are the steps we followed for preprocessing:

1. Handling Missing Values
2. Merging the author name and news title
3. Text Cleaning
4. Stemming
5. Converting textual data to numerical data

By following these preprocessing steps, we were able to convert the textual data into a format that can be fed to a machine learning model.
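
The missing-value handling, merging, text cleaning, and stemming steps can be sketched roughly as follows. The rows and column names are made up for illustration, and scikit-learn's built-in `ENGLISH_STOP_WORDS` set is used here as a stand-in for the NLTK stop-word list so that the snippet needs no corpus download. The helper `clean_and_stem` is a hypothetical name.

```python
import re

import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Made-up rows standing in for the real dataset (column names assumed).
news_df = pd.DataFrame(
    {
        "author": ["Jane Doe", None],
        "title": [
            "Council approves the new budgets!",
            "Aliens are endorsing a candidate",
        ],
        "label": [0, 1],
    }
)

# Handle missing values by replacing them with empty strings.
news_df = news_df.fillna("")

# Merge the author name and the news title into one 'content' column.
news_df["content"] = news_df["author"] + " " + news_df["title"]

stemmer = PorterStemmer()


def clean_and_stem(text: str) -> str:
    """Keep letters only, lowercase, drop stop words, and stem each word."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    stems = [stemmer.stem(w) for w in words if w not in ENGLISH_STOP_WORDS]
    return " ".join(stems)


# Clean and stem every merged string.
news_df["content"] = news_df["content"].apply(clean_and_stem)
print(news_df["content"].tolist())
```

Stemming maps inflected forms such as "budgets" and "budget" to one token, which keeps the vocabulary smaller when the text is later converted to numbers.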

VISUALIZATION

WORD CLOUD

BAR PLOT

CONFUSION MATRIX

PRECISION, RECALL, F1-SCORE

FEATURE SELECTION

Features were extracted as part of the pre-processing stage using the TfidfVectorizer class from the sklearn.feature_extraction.text module.
TfidfVectorizer converts the textual data into a matrix of features: it builds a vocabulary of the unique words in the text corpus and assigns each word a weight based on its frequency in each document and across the entire corpus.
Two steps reduce the number of features:

1. Removing stop words, which TfidfVectorizer supports through its stop_words parameter and which we applied during text preprocessing.
2. Stemming, which is not built into TfidfVectorizer and was applied to the text beforehand.

MODEL SELECTION

We used the logistic regression algorithm because it is one of the most popular algorithms for binary classification problems. Before selecting it, however, we also evaluated other algorithms such as Naive Bayes.

Here are the steps we followed for model selection:

1. Split the data into training and testing sets.
2. Train the model on the training set using different algorithms.
3. Evaluate the performance of each algorithm using the testing set.
4. Select the algorithm with the highest accuracy score.

After evaluating all the algorithms, we found that logistic regression performed best in terms of accuracy. Therefore, we used it to classify the news articles as real or fake.
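
The selection procedure can be sketched as below, under several assumptions: the headlines are made up, `MultinomialNB` is assumed as the Naive Bayes variant, and `test_size`, `random_state`, and `max_iter` are illustrative values, not the project's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up toy headlines, repeated so that both classes appear in each split.
real_headlines = [
    "council approves city budget",
    "senate passes transport bill",
    "court upholds election ruling",
    "mayor opens new hospital",
]
fake_headlines = [
    "aliens secretly run the senate",
    "miracle pill cures every disease",
    "celebrity clone spotted on moon",
    "shocking secret they hide from you",
]
texts = (real_headlines + fake_headlines) * 5
labels = ([0] * 4 + [1] * 4) * 5  # 0 = real, 1 = fake (assumed convention)

# Step 1: split into training and testing sets.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=2
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Steps 2-3: train each candidate and score it on the test set.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Step 4: keep the candidate with the highest accuracy.
best_model_name = max(scores, key=scores.get)
print(scores)
print(best_model_name)
```

Fitting the vectorizer on the training split only is a choice made in this sketch so that no test-set vocabulary leaks into training. On a toy corpus like this one the scores say nothing about real performance.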

Model Description/Algorithm

The model used in this project is Logistic Regression, which is a popular machine learning algorithm for binary classification problems.

ALGORITHM

1. Import the required libraries
2. Download the stopwords from the NLTK package
3. Load the news dataset into a Pandas DataFrame
4. Replace any missing values in the dataset with empty strings
5. Merge the author name and news title columns into a single 'content' column
6. Perform text preprocessing on the 'content' column using stemming and stopword removal
7. Convert the textual data to numerical data using TfidfVectorizer
8. Split the dataset into training and test data using train_test_split
9. Train a Logistic Regression model on the training data
10. Evaluate the model using accuracy score, confusion matrix, and classification report
11. Visualize the confusion matrix using seaborn and matplotlib
12. Calculate precision, recall, and F1-score for the classification report
13. Print the precision, recall, and F1-score values
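
The evaluation steps at the end of the algorithm can be sketched with scikit-learn's metric functions. The label vectors below are made up to keep the example small, and treating 1 (fake) as the positive class is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up true and predicted labels: 0 = real, 1 = fake (assumed convention).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # positive class is label 1
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(cm)
print(classification_report(y_true, y_pred, target_names=["real", "fake"]))
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

For the visualization step, the array `cm` could be passed to `seaborn.heatmap(cm, annot=True, fmt="d")`. That call is left out here so the sketch depends only on scikit-learn.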

Results

Logistic Regression Model:
- Accuracy: 0.89
- Precision: 0.91
- Recall: 0.85
- F1-Score: 0.88

Naive Bayes Model:
- Accuracy: 0.85
- Precision: 0.89
- Recall: 0.78
- F1-Score: 0.83

OUTCOME

1. Comparison:
- The results show that the Logistic Regression model outperformed the Naive Bayes model in terms of accuracy and F1-Score.
- The precision and recall values were also higher for the Logistic Regression model.
- Therefore, we can conclude that the Logistic Regression model is a better choice for this problem.

2. The confusion matrix for the logistic regression algorithm showed that it correctly identified 596 out of 623 real news articles and 677 out of 712 fake news articles.

Our project was successful in developing an accurate model for predicting fake news articles. Our logistic regression algorithm achieved high accuracy scores and demonstrated strong performance in identifying both fake and real news articles.
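
As a worked check, the metrics implied by the confusion-matrix counts quoted above can be computed directly. Treating "fake" as the positive class is an assumption. The values derived this way are higher than the figures listed under Results, so the two sets of numbers may come from different runs or splits; the slides do not say.

```python
# Counts quoted in the text, with "fake" as the positive class (assumed).
true_negatives = 596            # real articles correctly identified
false_positives = 623 - 596     # real articles flagged as fake
true_positives = 677            # fake articles correctly identified
false_negatives = 712 - 677     # fake articles missed

total = true_negatives + false_positives + true_positives + false_negatives

accuracy = (true_positives + true_negatives) / total
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"total={total}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# total=1335
# accuracy=0.954 precision=0.962 recall=0.951 f1=0.956
```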

CONCLUSION

Our project demonstrates the effectiveness of machine learning algorithms in detecting fake news. The model we developed can be used to identify fake news and prevent its spread, thereby improving the quality and credibility of news sources. Future work could involve exploring other machine learning algorithms, improving the dataset used for training the model, and implementing the model in real-world scenarios.
