Getting Started with Data Cleaning and Analysis Using Python in Google Colab
Rohan K
Created on March 11, 2025
Step 1: Introduction
Brief explanation of Python and Google Colab
Step 2: Setting Up Google Colab
How to create your own notebook
Step 3: Import your dataset
Opening a dataset in Colab and modifying it with pandas
Step 4: Perform Data Analysis
Analyze your dataset
Step 5: Final Assessment
Let's test your knowledge
Step 6: Future Work
Get resources to apply your knowledge to real datasets
Introduction
Brief explanation of Data Science and Google Colab
Google Colab is Google’s free Jupyter notebook service that runs in the cloud. It gives you an instant Python environment with optional GPUs/TPUs, access to Google Drive, and real-time collaboration, so you can write, execute, and share code without installing anything locally.
Google Colab is like a data-science workbench in your browser. You can:
- Import datasets from Drive, the web, or BigQuery, then explore and clean them with Python libraries such as pandas and NumPy
- Visualize results on the fly with matplotlib or Plotly
- Train machine-learning models, using GPUs for heavier deep-learning jobs
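To see the "visualize on the fly" idea in practice, here is a minimal sketch that generates 200 made-up exam scores (hypothetical data drawn from a normal distribution with a fixed seed) and plots a histogram and a box plot with matplotlib. Colab ships with NumPy, pandas and matplotlib preinstalled, so the cell should run as-is in a new notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data: 200 exam scores, mean 70, standard deviation 10
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"score": rng.normal(loc=70, scale=10, size=200)})

# Two side-by-side panels showing the same column in different ways
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how the scores are distributed across 20 bins
ax_hist.hist(df["score"].to_numpy(), bins=20, edgecolor="black")
ax_hist.set_title("Histogram of scores")
ax_hist.set_xlabel("score")
ax_hist.set_ylabel("count")

# Box plot: median, quartiles, and points beyond the whiskers
ax_box.boxplot(df["score"].to_numpy())
ax_box.set_title("Box plot of scores")

fig.tight_layout()
plt.show()  # in Colab the figure renders directly below the cell
```

The histogram shows the overall shape of the distribution, while the box plot summarizes it in five numbers and makes unusual values stand out.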
Starting a Google Colab Notebook
Google Colab link: colab.research.google.com
Open the link, create a new notebook, and follow the steps below.
Activity
1. Create a new code cell in your Google Colab notebook by clicking the + Code button.
2. Type the following Python code into the newly created cell:

    name = input("What's your name? ")
    print(f"Hello, {name}! Welcome to your first Python activity.")

3. Run the cell by pressing Shift + Enter or clicking the play icon ▶️ in the cell.
4. When prompted, type your name and press Enter.
5. Try adding notes in a text cell: click the + Text button and write a short comment about what your code does.
Importing your dataset
Learn how to import and edit a dataset
Depending on your familiarity with pandas, feel free to either practice data cleaning using your own dataset or follow the guided tutorial on the right for an introductory walkthrough of essential data cleaning techniques.
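For a feel of two of the most common cleaning steps before opening the tutorial, here is a minimal sketch on a small made-up table (the columns `city` and `temp_c` are hypothetical). It counts missing values with `isna().sum()`, removes exact duplicate rows with `drop_duplicates()`, drops rows missing a city, and fills the remaining missing temperature with the column median. Filling with the median is only one option; dropping the row or using another statistic may suit your data better.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: a repeated "Denver" row, a missing city, a missing temperature
raw = pd.DataFrame({
    "city": ["Boston", "Denver", "Denver", None, "Austin"],
    "temp_c": [12.0, 18.0, 18.0, 20.0, np.nan],
})

# 1. Count missing values per column
print(raw.isna().sum())

# 2. Remove exact duplicate rows (the first occurrence is kept)
clean = raw.drop_duplicates()

# 3. Drop rows where the city is unknown, since it cannot be filled sensibly
clean = clean.dropna(subset=["city"])

# 4. Fill the remaining missing temperature with the median of the known values
median_temp = clean["temp_c"].median()
clean = clean.assign(temp_c=clean["temp_c"].fillna(median_temp)).reset_index(drop=True)

print(clean)
```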
Two ways to import your own dataset
1. Upload a file directly from your computer
In the left sidebar, click Files ▸ Upload (or run from google.colab import files; files.upload() in a code cell). Choose the file(s) on your computer. Colab stores them in the ephemeral working directory /content/, so they are deleted when the runtime resets. Access them in code exactly like a local path:

    import pandas as pd
    df = pd.read_csv('/content/your_dataset.csv')
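Because `pd.read_csv` only needs a path, you can rehearse this step anywhere. The sketch below writes a tiny hypothetical CSV into a temporary folder (a stand-in for `/content/`, assumed here so the code also runs outside Colab) and reads it back, then runs three usual first checks: `shape`, `head()` and `dtypes`.

```python
import os
import tempfile

import pandas as pd

# A temporary folder stands in for Colab's /content/ directory (assumption for local testing)
work_dir = tempfile.mkdtemp()
csv_path = os.path.join(work_dir, "your_dataset.csv")

# Write a small hypothetical CSV, as if it had just been uploaded
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("name,age,city\nAna,34,Lisbon\nBen,28,Oslo\nCara,41,Quito\n")

# Load it exactly as you would a file in /content/
df = pd.read_csv(csv_path)

print(df.shape)   # (rows, columns)
print(df.head())  # first five rows
print(df.dtypes)  # column types that pandas inferred
```

In Colab, replace `csv_path` with `'/content/your_dataset.csv'` (or your real file name) and skip the writing step.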
Guided pandas tutorial
2. Mount your Google Drive
In a code cell, run:

    from google.colab import drive
    drive.mount('/content/drive')

A popup asks for OAuth permission; choose the Google account that owns the Drive. After the mount completes, your Drive appears under /content/drive/MyDrive/. Read or write files normally (the ! prefix runs a shell command from a notebook cell):

    import pandas as pd
    !ls '/content/drive/MyDrive/data'
    df = pd.read_csv('/content/drive/MyDrive/data/your_dataset.csv')
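Writing works the same way. This sketch saves a DataFrame with `to_csv(..., index=False)` and reads it back to confirm the round trip. A temporary folder stands in for `/content/drive/MyDrive/data` so the code runs outside Colab, and the file name `cleaned.csv` is hypothetical. In Colab, point `out_dir` at your mounted Drive folder instead, so the file persists after the runtime resets.

```python
import os
import tempfile

import pandas as pd

# Stand-in for '/content/drive/MyDrive/data' (assumption: use your Drive path in Colab)
out_dir = os.path.join(tempfile.mkdtemp(), "data")
os.makedirs(out_dir, exist_ok=True)  # create the folder if it does not exist yet

df = pd.DataFrame({"product": ["A", "B", "C"], "units": [5, 3, 8]})

# index=False avoids writing the row index as an extra unnamed column
out_path = os.path.join(out_dir, "cleaned.csv")
df.to_csv(out_path, index=False)

# Read the file back to check that the columns and values survived
reloaded = pd.read_csv(out_path)
print(reloaded)
```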
Perform Data Analysis
Learn how to find patterns in your data
GitHub link to the notebook
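As a small illustration of "finding patterns", the sketch below uses hypothetical data (weekly study hours, exam score, and class section for eight students) to compute summary statistics, a correlation, and a group-level average. These are standard pandas calls; the data and column names are made up for the example.

```python
import pandas as pd

# Hypothetical data for eight students
df = pd.DataFrame({
    "hours":   [2, 4, 5, 7, 8, 10, 11, 13],
    "score":   [55, 60, 62, 70, 74, 80, 83, 90],
    "section": ["A", "B", "A", "B", "A", "B", "A", "B"],
})

# Count, mean, std, min, quartiles and max for the numeric columns
print(df.describe())

# The median is the 50% row of describe(); it can also be requested directly
print(df[["hours", "score"]].median())

# Pearson correlation: values near +1 indicate a strong positive linear relationship
print(df[["hours", "score"]].corr())

# Group-level pattern: average score per section
print(df.groupby("section")["score"].mean())
```

A high correlation here suggests that more study hours go with higher scores in this sample, but correlation alone does not show that one causes the other.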
Learning Outcomes
Basic Data Analysis Tasks:
- Data Cleaning and Preprocessing:
  - Identify and handle missing values
  - Remove duplicates
  - Detect and correct outliers
  - Standardize and normalize data
- Summarize key statistics (mean, median, standard deviation, min, max)
- Identify patterns and distributions
- Examine correlations between variables
- Execute data analysis and visualization using pandas and matplotlib:
  - Create simple visualizations (such as histograms, bar charts, and box plots) to effectively represent your data using matplotlib.
  - Write and execute Python code in Google Colab to analyze data systematically.
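Outlier handling and scaling are easiest to understand with numbers. This minimal sketch uses made-up delivery times, flags outliers with the 1.5 × IQR rule (a common convention, not the only one), caps them at the IQR fences, and then shows z-score standardization and min-max normalization side by side. Note that pandas' `std()` defaults to the sample standard deviation (`ddof=1`).

```python
import pandas as pd

# Hypothetical delivery times in minutes; 95 looks suspicious
times = pd.Series([22, 25, 24, 27, 23, 26, 95, 24], name="minutes")

# IQR rule: flag values more than 1.5 * IQR below Q1 or above Q3
q1, q3 = times.quantile(0.25), times.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = times[(times < lower) | (times > upper)]
print("Outliers:", outliers.tolist())

# One way to "correct" outliers: cap (clip) values at the fences instead of deleting rows
capped = times.clip(lower=lower, upper=upper)

# Standardize (z-score): subtract the mean, divide by the sample standard deviation
z = (capped - capped.mean()) / capped.std()

# Normalize (min-max): rescale to the range [0, 1]
minmax = (capped - capped.min()) / (capped.max() - capped.min())

print(pd.DataFrame({"capped": capped, "z_score": z.round(2), "min_max": minmax.round(2)}))
```

Capping keeps every row but changes extreme values. Deleting the row or investigating it first can be better choices, depending on whether 95 is a data-entry error or a genuinely late delivery.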
Final Assessment
Additional Resources
Congrats
Harvard Certification
This is the end of the course. Now get out there and let's solve real-world problems!