Machine Learning

Quentin Saguer

Created on November 27, 2024

Machine Learning

Team Members : SAGUER Quentin / GUESSOUS Samy / CARRE Pablo / PONTHIEU Gabriel / TALLARON Matéo

Dataset Overview

  • Source : Cyber Threat Detection document
  • This dataset contains 1430 rows and 23 columns
  • Problem statement :
    • Goal : Classify network activities as either malicious or benign
    • Target : Label column where 1 = malicious and 0 = benign
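
A first look at the dataset could be sketched as below. The real file is not reproduced in this transcript, so the frame is a random stand-in: only the row count and the column names Label, Packet_Length and Bytes_Sent come from the slides, and the value ranges and class proportions are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (values are random, not real traffic).
# The slides report 1430 rows and 23 columns; only three columns are mocked here.
rng = np.random.default_rng(seed=0)
n_rows = 1430
df = pd.DataFrame({
    "Packet_Length": rng.integers(40, 1500, size=n_rows),
    "Bytes_Sent": rng.integers(0, 100_000, size=n_rows),
    "Label": rng.choice([0, 1], size=n_rows, p=[0.8, 0.2]),  # assumed imbalance
})

# Basic size check, comparable to the "rows and columns" figure on the slide.
n_samples, n_columns = df.shape
print(f"{n_samples} rows, {n_columns} columns")

# Share of each class in the target: 0 = not malicious, 1 = malicious.
class_share = df["Label"].value_counts(normalize=True).sort_index()
print(class_share)
```

Looking at the class shares early is useful because a strongly skewed Label column changes how accuracy should be read later on.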

Exploring the Dataset

Data Cleaning

  • Tasks performed :
    • Removed irrelevant columns
    • Verified there were no missing values in the dataset
    • Checked for duplicates : "No duplicate rows found"
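
The three cleaning tasks above can be sketched with pandas as follows. The small frame and the column name Session_ID are made up for illustration; the slides do not say which columns were actually removed.

```python
import pandas as pd

# Tiny made-up frame; "Session_ID" is a hypothetical irrelevant column.
df = pd.DataFrame({
    "Session_ID": ["a1", "b2", "c3", "d4"],
    "Packet_Length": [60, 1500, 512, 60],
    "Bytes_Sent": [1200, 98000, 4300, 800],
    "Label": [0, 1, 0, 0],
})

# 1. Remove columns judged irrelevant for classification.
irrelevant_columns = ["Session_ID"]  # assumption, not the authors' actual list
df = df.drop(columns=irrelevant_columns)

# 2. Verify there are no missing values.
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())
print(f"Missing values: {total_missing}")

# 3. Check for fully duplicated rows and drop them if any exist.
n_duplicates = int(df.duplicated().sum())
if n_duplicates == 0:
    print("No duplicate rows found")
else:
    df = df.drop_duplicates()
```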

Data Visualization

  • Charts to include :
    • Bar chart : Distribution of Label values
    • Histogram : Distribution of Packet_Length or another numeric feature
    • Heatmap : Correlation between features (use seaborn or similar)

Splitting features and target

  • Features : All relevant columns except Label
  • Target : The Label column
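
The three charts listed for the visualization step could be produced roughly as shown here. This is a sketch on random stand-in data, using plain matplotlib for the heatmap (seaborn's `heatmap` would be an equivalent choice); the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data; only the column names come from the slides.
rng = np.random.default_rng(seed=0)
n_rows = 500
packet_length = rng.integers(40, 1500, size=n_rows)
df = pd.DataFrame({
    "Packet_Length": packet_length,
    "Bytes_Sent": packet_length * 50 + rng.integers(0, 5000, size=n_rows),
    "Label": rng.choice([0, 1], size=n_rows, p=[0.8, 0.2]),
})

fig, (ax_bar, ax_hist, ax_heat) = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: number of rows per Label value.
label_counts = df["Label"].value_counts().sort_index()
ax_bar.bar([str(v) for v in label_counts.index], label_counts.to_numpy())
ax_bar.set_title("Distribution of Label")

# Histogram: distribution of one numeric feature.
ax_hist.hist(df["Packet_Length"], bins=30)
ax_hist.set_title("Distribution of Packet_Length")

# Heatmap: pairwise Pearson correlation between the numeric columns.
corr = df.corr()
image = ax_heat.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns, rotation=45, ha="right")
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
ax_heat.set_title("Feature correlation")
fig.colorbar(image, ax=ax_heat)

fig.tight_layout()
fig.savefig("eda_charts.png")  # hypothetical output file name
```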

Splitting Data

  • Process :
    • Split dataset into training (80%) and testing (20%)
    • Use the Python code shown on the slide
  • Why ?
    • To ensure the model is tested on unseen data
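
A minimal sketch of separating features from the target and making the 80/20 split with scikit-learn follows. The data is a random stand-in, and the `random_state` and `stratify` settings are assumptions added for reproducibility and to keep the class ratio similar in both parts; the slides do not state which options were used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in data with the row count reported on the slides.
rng = np.random.default_rng(seed=0)
n_rows = 1430
df = pd.DataFrame({
    "Packet_Length": rng.integers(40, 1500, size=n_rows),
    "Bytes_Sent": rng.integers(0, 100_000, size=n_rows),
    "Label": rng.choice([0, 1], size=n_rows, p=[0.8, 0.2]),
})

# Features: every column except Label. Target: the Label column.
X = df.drop(columns=["Label"])
y = df["Label"]

# Hold out 20% of the rows for testing; the model never sees them in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,  # assumption: fixed seed so the split is repeatable
    stratify=y,       # assumption: preserve the class ratio in both parts
)

print("Train:", X_train.shape, "Test:", X_test.shape)
```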

Training Models

  • Models used :
    • Logistic Regression
    • Random Forest Classifier
    • Support Vector Machine (SVM)
  • Process :
    • Train each model using the training data
    • Use the Python code shown on the slide
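
Training the three listed models on the training data might look like the sketch below. The data comes from scikit-learn's `make_classification` as a stand-in, and the hyperparameters and the scaling step for Logistic Regression and SVM are assumptions, not the team's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the real features and labels.
X, y = make_classification(
    n_samples=1430, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Logistic Regression and SVM are sensitive to feature scale, so they are
# wrapped in a pipeline with standardization (an assumption of this sketch).
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model on the training portion only.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: trained on {len(X_train)} samples")
```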

Model Evaluation
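
One way the three models could be compared on the held-out test set is sketched here. The choice of metrics (accuracy, F1 on the malicious class, confusion matrix) is an assumption, since the transcript does not list the metrics used, and the data is again a synthetic stand-in, so the printed scores say nothing about the real results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data.
X, y = make_classification(
    n_samples=1430, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # predictions on unseen data only
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # F1 on class 1 (malicious) is more informative than accuracy
        # when the classes are imbalanced.
        "f1_malicious": f1_score(y_test, y_pred, pos_label=1),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, "
          f"F1 (malicious)={scores['f1_malicious']:.3f}")
    print(scores["confusion_matrix"])
```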

Conclusion

  • Key Insights :
    • Class imbalance is a challenge but can be managed with Random Forest
    • Key features such as Packet_Length and Bytes_Sent are critical for classification
    • Visualizing the data helped in understanding distributions and feature importance, leading to a better model
  • Best Model :
    • Random Forest showed the best overall performance for this classification task
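
The points about class imbalance and key features can be illustrated with a Random Forest that reweights classes and reports feature importances. This is a sketch under stated assumptions: the data is synthetic and deliberately built so that Label depends on Packet_Length and Bytes_Sent, the Noise column is hypothetical, and `class_weight="balanced"` is one common option rather than the setting confirmed by the slides.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: Label is constructed from Packet_Length and Bytes_Sent,
# while "Noise" (a hypothetical column) carries no signal.
rng = np.random.default_rng(seed=0)
n_rows = 1430
X = pd.DataFrame({
    "Packet_Length": rng.integers(40, 1500, size=n_rows),
    "Bytes_Sent": rng.integers(0, 100_000, size=n_rows),
    "Noise": rng.normal(size=n_rows),
})
y = ((X["Packet_Length"] > 1000) & (X["Bytes_Sent"] > 50_000)).astype(int)
print("Share of malicious rows:", round(float(y.mean()), 3))

# class_weight="balanced" weights each class inversely to its frequency,
# so the minority (malicious) class is not drowned out during training.
forest = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
forest.fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

On real data, impurity-based importances can favor high-cardinality features, so they are best read as a rough ranking rather than a precise measure.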

Thank you for listening

Any questions?