IBM Data Science Capstone Project Space X

Jennyfer WAN

Created on February 15, 2022

Winning Space Race with Data Science

Jennyfer WAN

Outline

Executive Summary

Introduction

Methodology

Results

Conclusion

Appendix

Executive Summary

Summary of methodologies

Data Collection through API
Data Collection with Web Scraping
Data Wrangling
Exploratory Data Analysis with SQL
Exploratory Data Analysis with Data Visualization
Interactive Visual Analytics with Folium
Machine Learning Prediction

Summary of all results

Exploratory Data Analysis results
Interactive analytics in screenshots
Predictive Analytics results

Introduction

Project background and context

SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars, whereas other providers cost upward of 165 million dollars each. Much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch. In this capstone, we predict whether the Falcon 9 first stage will land successfully using the machine learning pipeline we created.

Problems we want to answer

What factors determine whether a rocket will land successfully?
Which features are most correlated with a successful landing?
What conditions does SpaceX need to meet to achieve the best landing success rate?

Methodology

Data collection methodology:

SpaceX REST API.
Web scraping from Wikipedia.

Perform data wrangling:

One-hot encoding applied to categorical features (transforming data for machine learning).

Perform exploratory data analysis (EDA) using visualization and SQL:

Scatter and bar graphs to show patterns in the data.

Perform interactive visual analytics using Folium and Plotly Dash.

Perform predictive analysis using classification models:

How to build, tune and evaluate classification models.
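One-hot encoding, listed above under data wrangling, turns each category of a categorical column into its own 0/1 column. The sketch below shows this in pandas. It assumes toy rows with Orbit and LaunchSite columns; the values are hypothetical and do not come from the real dataset.

```python
import pandas as pd

# Hypothetical rows (not the real dataset) with two categorical columns.
features = pd.DataFrame({
    "FlightNumber": [1, 2, 3],
    "Orbit": ["LEO", "GTO", "LEO"],
    "LaunchSite": ["CCAFS SLC 40", "KSC LC 39A", "VAFB SLC 4E"],
})

# One-hot encode the categorical columns: each category becomes its own
# 0/1 column named "<column>_<category>"; numeric columns are left unchanged.
features_one_hot = pd.get_dummies(features, columns=["Orbit", "LaunchSite"], dtype=float)

print(features_one_hot.columns.tolist())
```

The encoded table keeps FlightNumber as-is and adds one float column per orbit and per launch site. Most scikit-learn classifiers need this numeric form.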

Data collection - SpaceX API

Data collection is the process of gathering data to provide the information needed to answer questions, analyze business performance or other outcomes, and predict future trends or actions to take.

1. Request the SpaceX API
2. Decode the response content as JSON using .json()
3. Convert it into a dataframe using pd.json_normalize()
4. Apply lists to a dictionary
5. Create a dataframe from the dictionary
6. Clean the data: check for and fill missing values
7. Filter columns, then export to CSV
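The request, decode and normalize steps can be sketched in Python as follows. This is a minimal illustration, not the project's exact notebook code, and several parts are assumptions:

- the endpoint URL;
- the column names kept;
- the sample records, which are hypothetical so that the sketch runs offline.

requests is only imported when fetch_launches() is actually called.

```python
import pandas as pd

# Assumed endpoint: SpaceX REST API (v4), list of past launches.
SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_launches(url: str = SPACEX_URL) -> list:
    """Request the API and decode the response content as JSON with .json()."""
    import requests  # imported lazily so the offline part of the sketch runs without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def launches_to_dataframe(records: list) -> pd.DataFrame:
    """Flatten the JSON records into a DataFrame and keep a few columns."""
    data = pd.json_normalize(records)
    keep = ["flight_number", "date_utc", "rocket", "launchpad"]  # assumed field names
    return data[[c for c in keep if c in data.columns]]


# Offline usage with hypothetical records shaped like an API response.
sample = [
    {"flight_number": 1, "date_utc": "2020-01-01T00:00:00.000Z",
     "rocket": "rocket-id", "launchpad": "pad-id", "cores": [{"core": "core-1"}]},
    {"flight_number": 2, "date_utc": "2020-02-01T00:00:00.000Z",
     "rocket": "rocket-id", "launchpad": "pad-id", "cores": [{"core": "core-2"}]},
]
df = launches_to_dataframe(sample)
print(df)
```

With network access, launches_to_dataframe(fetch_launches()) would run the same steps against the live API.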

Data collection - web scraping

1. Request the Falcon 9 launch HTML page
2. Create a BeautifulSoup object
3. Find all tables
4. Extract column names one by one
5. Create a dictionary
6. Append data to keys
7. Convert the dictionary to a dataframe, then export to CSV
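The scraping steps above map onto BeautifulSoup calls roughly as in this sketch. It parses a small hypothetical HTML table instead of the real Wikipedia page, which would first be downloaded, e.g. with requests.get(url).text. The table layout, the column names and the output filename are also assumptions.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML fragment standing in for the Wikipedia launch table.
html = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>Payload A</td></tr>
  <tr><td>2</td><td>VAFB</td><td>Payload B</td></tr>
</table>
"""

# Create a BeautifulSoup object and find all tables.
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")
table = tables[0]

# Extract column names one by one from the header cells.
column_names = [th.get_text(strip=True) for th in table.find_all("th")]

# Create a dictionary with one empty list per column name.
launch_dict = {name: [] for name in column_names}

# Append each row's cell values to the matching key.
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) != len(column_names):
        continue  # skips the header row (no <td>) and malformed rows
    for name, cell in zip(column_names, cells):
        launch_dict[name].append(cell.get_text(strip=True))

# Convert the dictionary to a DataFrame, then export to CSV.
df = pd.DataFrame(launch_dict)
df.to_csv("spacex_web_scraped.csv", index=False)
print(df)
```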

Data wrangling

Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis.

Calculate the number of launches on each site

Calculate the number and occurrence of each orbit

Calculate the number and occurrence of mission outcomes per orbit type

We perform exploratory data analysis (EDA) and determine the training labels: the landing outcomes are converted into training labels, where 1 means the booster successfully landed and 0 means it was unsuccessful.

Create a landing outcome label from the Outcome column

Export the dataframe to CSV
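The counting and labeling steps can be sketched with pandas. The rows are hypothetical. Treating every outcome that starts with "False" or "None" as a failed landing is an assumption about how the Outcome strings are encoded.

```python
import pandas as pd

# Hypothetical rows; Outcome strings follow an assumed "<result> <landing type>" pattern.
df = pd.DataFrame({
    "LaunchSite": ["CCAFS SLC 40", "CCAFS SLC 40", "KSC LC 39A", "VAFB SLC 4E"],
    "Orbit": ["LEO", "GTO", "ISS", "PO"],
    "Outcome": ["None None", "False ASDS", "True RTLS", "True ASDS"],
})

# Number of launches on each site, and occurrences of each orbit.
launches_per_site = df["LaunchSite"].value_counts()
orbit_counts = df["Orbit"].value_counts()

# Number and occurrence of each landing outcome.
landing_outcomes = df["Outcome"].value_counts()

# Assumption: outcomes whose first word is "False" or "None" are failed landings.
bad_outcomes = {o for o in landing_outcomes.index if o.split()[0] in {"False", "None"}}

# Training label: 1 = booster landed successfully, 0 = unsuccessful.
df["Class"] = [0 if outcome in bad_outcomes else 1 for outcome in df["Outcome"]]

print(launches_per_site)
print(df[["Outcome", "Class"]])
```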

EDA with Data Visualization

Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics using simple statistical and plotting tools.

Scatter Graphs

We use a scatter plot to determine whether two variables have a relationship or correlation.

FlightNumber and PayloadMass
Flight Number and Launch Site
Payload and Launch Site
FlightNumber and Orbit type
Payload and Orbit type

Bar Graph

Most audiences know how to read a bar graph and can quickly grasp the information. From this graph, we can easily see which orbits have the highest success rate.

Success rate and Orbit type

Line Graph

Line graphs work well for showing trends over time. Moreover, they let us see changes in the data at a glance.

Launch Success Yearly Trend
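Behind the bar and line graphs, the success rate is simply the mean of the 0/1 landing label within each group. The pandas sketch below uses hypothetical rows and assumes a Date column and a Class label column.

```python
import pandas as pd

# Hypothetical rows: launch date, orbit type and the 0/1 landing label "Class".
df = pd.DataFrame({
    "Date": ["2013-01-01", "2013-06-01", "2014-01-01", "2014-06-01", "2015-01-01"],
    "Orbit": ["LEO", "GTO", "LEO", "GTO", "SSO"],
    "Class": [0, 0, 1, 0, 1],
})

# Bar graph data: success rate per orbit type (mean of a 0/1 column = share of 1s).
success_by_orbit = df.groupby("Orbit")["Class"].mean()

# Line graph data: success rate per launch year.
df["Year"] = pd.to_datetime(df["Date"]).dt.year
yearly_trend = df.groupby("Year")["Class"].mean()

print(success_by_orbit)
print(yearly_trend)
# With matplotlib installed, success_by_orbit.plot.bar() and
# yearly_trend.plot.line() would draw the two charts.
```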

EDA with SQL (Structured Query Language)

SQL is the standard language for interacting with relational databases, and one of the most important tools a data analyst uses to manipulate data and gain insights from it.

We loaded the SpaceX dataset into the corresponding table in a Db2 database directly from a Jupyter notebook. We then performed EDA with SQL queries to gather the following information:

Display the names of the unique launch sites in the space mission
Display 5 records where launch sites begin with the string 'CCA'
Display the total payload mass carried by boosters launched by NASA (CRS)
Display the average payload mass carried by booster version F9 v1.1
List the date when the first successful landing outcome on a ground pad was achieved
List the names of the boosters which have success on a drone ship and have a payload mass greater than 4000 but less than 6000
List the total number of successful and failed mission outcomes
List the names of the booster_versions which have carried the maximum payload mass, using a subquery
List the failed landing_outcomes on a drone ship, their booster versions, and launch site names for the year 2015
Rank the count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between the dates 2010-06-04 and 2017-03-20, in descending order
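The last query in the list can be illustrated with Python's built-in sqlite3 module standing in for Db2, using a few hypothetical rows. The table and column names are assumptions. Dates are stored as ISO text (YYYY-MM-DD) so that BETWEEN compares them chronologically.

```python
import sqlite3

# In-memory stand-in for the Db2 table (hypothetical rows; names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Date TEXT, Landing__Outcome TEXT)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [
        ("2010-06-04", "Failure (parachute)"),
        ("2015-12-22", "Success (ground pad)"),
        ("2016-04-08", "Success (drone ship)"),
        ("2016-05-06", "Success (drone ship)"),
        ("2018-01-01", "Success (ground pad)"),  # outside the date range
    ],
)

# Count each landing outcome inside the date range and rank by count, descending.
query = """
    SELECT Landing__Outcome, COUNT(*) AS Outcome_Count
    FROM SPACEXTBL
    WHERE Date BETWEEN '2010-06-04' AND '2017-03-20'
    GROUP BY Landing__Outcome
    ORDER BY Outcome_Count DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```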

Build an Interactive Map with Folium

Folium makes it easy to visualize data manipulated in Python on an interactive Leaflet map, which makes it an excellent tool for plotting maps.

To perform interactive visual analytics:

We marked all launch sites on a map and added map objects such as:

folium.Marker()
folium.Circle()

We assigned the feature "launch_outcome" to easily visualize marker colors on the map based on the class value:

1 (Success) = Green
0 (Failure) = Red

We marked the successful/failed launches for each site on the map with folium.Icon():

The color-labeled marker clusters allow us to easily identify which launch sites have relatively high success rates.

We calculated the distances between a launch site and its proximities, using MousePosition to get the coordinates of the mouse pointer over a point on the map.

Then we answered some questions:

Are launch sites in close proximity to railways, highways and coastlines?
Do launch sites keep a certain distance away from cities?
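The distance between a launch site and a nearby point (coastline, highway, railway, city) can be computed from the two coordinate pairs with the haversine formula. The sketch below is one common way to do it, assuming a spherical Earth with a radius of 6371 km. The coordinates in the usage lines are hypothetical values of the kind read off the map with MousePosition.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates: a launch site and a coastline point read with MousePosition.
launch_site = (28.56, -80.58)
coastline_point = (28.56, -80.57)
print(f"{haversine_km(*launch_site, *coastline_point):.2f} km")
```

The resulting distance could then be displayed on the map, e.g. as a folium.Marker label next to a folium.PolyLine joining the two points.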

Build a Dashboard with Plotly Dash

Dash is a Python framework created by Plotly for building interactive web applications.

Pie Chart:

- Add a Launch Site Dropdown Input Component to show the total launches for all sites or for a specific site.
- Pie charts show how a total is divided into parts and are easy to understand thanks to their distinct portions and color coding.

Range Slider:

- Add a Range Slider to select the Payload Mass (kg) range, in order to find out whether payload mass is correlated with mission outcome.

Scatter Plot:

- Add a Scatter Plot to show the relationship between outcome and payload mass for the different booster versions.
- A scatter plot is used to determine whether two variables have a relationship or correlation.
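The data logic behind the pie chart and the range slider / scatter plot can be kept separate from the Dash layout, as in this pandas sketch. The following are assumptions:

- the column names ("Launch Site", "Payload Mass (kg)", "class");
- the "ALL" dropdown value;
- the rows themselves.

In the dashboard, these helpers would be called from Dash callbacks that build the Plotly figures.

```python
import pandas as pd

# Hypothetical launch records with the columns the dashboard filters on.
spacex_df = pd.DataFrame({
    "Launch Site": ["Site A", "Site A", "Site B", "Site B", "Site B"],
    "Payload Mass (kg)": [500, 4000, 2500, 6000, 9000],
    "class": [1, 0, 1, 1, 0],
})


def pie_data(df: pd.DataFrame, site: str) -> pd.Series:
    """'ALL': successful launches per site; one site: counts of class 0 and 1."""
    if site == "ALL":
        return df.groupby("Launch Site")["class"].sum()
    return df.loc[df["Launch Site"] == site, "class"].value_counts().sort_index()


def scatter_data(df: pd.DataFrame, site: str, payload_range: tuple) -> pd.DataFrame:
    """Rows whose payload lies within the range slider's bounds (inclusive)."""
    low, high = payload_range
    mask = df["Payload Mass (kg)"].between(low, high)
    if site != "ALL":
        mask &= df["Launch Site"] == site
    return df[mask]


print(pie_data(spacex_df, "ALL").tolist())                # [1, 2]
print(len(scatter_data(spacex_df, "Site B", (0, 7000))))  # 2
```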

Predictive Analysis (Classification)

Predictive analytics is the use of various statistical and machine learning algorithms to predict the likelihood of future outcomes based on historical data. The goal is to provide the best assessment of what will happen in the future, in order to suggest a course of action or strategy for decisions from the immediate to the long term.

1. Building Models

Load our dataset using NumPy and Pandas
Transform our data
Split our data into training and test sets
Check how many test samples have been created
Select the different machine learning algorithms to be trained
Set the hyperparameters and algorithms in a GridSearchCV object
Fit the data into GridSearchCV to find the best parameters
Train our dataset

2. Evaluating Model

Use the accuracy metric for our model
Improve our model by using feature engineering and algorithm tuning to find the best hyperparameters for each type of algorithm
Plot a Confusion Matrix

3. Find the Best Performing Classification Model

Select the model with the best accuracy score
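The build, evaluate and select loop above can be sketched with scikit-learn. This is an illustrative version, not the project's notebook, and it differs in several ways:

- it uses synthetic data from make_classification instead of the SpaceX features;
- it trains only two of the four algorithms;
- its hyperparameter grids are small and hypothetical;
- the 80/20 split and cv=10 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature matrix X and the 0/1 landing label Y.
X, Y = make_classification(n_samples=90, n_features=10, random_state=2)

# Split into training and test sets (20% held out for testing).
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
print("test samples:", len(Y_test))

# Candidate algorithms with small, illustrative hyperparameter grids.
candidates = {
    "logreg": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.01, 0.1, 1.0]},
    ),
    "tree": (
        DecisionTreeClassifier(random_state=2),
        {"criterion": ["gini", "entropy"], "max_depth": [2, 4, 8]},
    ),
}

searches, results = {}, {}
for name, (model, grid) in candidates.items():
    # 10-fold cross-validated grid search on the training set only.
    search = GridSearchCV(model, grid, cv=10, scoring="accuracy")
    search.fit(X_train, Y_train)
    test_acc = accuracy_score(Y_test, search.predict(X_test))
    searches[name] = search
    results[name] = (search.best_score_, test_acc)
    print(name, search.best_params_, f"cv={search.best_score_:.3f}", f"test={test_acc:.3f}")

# Select the model with the best cross-validated accuracy and inspect its errors.
best = max(results, key=lambda n: results[n][0])
cm = confusion_matrix(Y_test, searches[best].predict(X_test), labels=[0, 1])
print("best:", best)
print(cm)  # rows = true class (0, 1), columns = predicted class (0, 1)
```

Reporting both the cross-validated score and the test-set score makes a gap like "high during training, lower on the test set" visible for each model.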

Results

Exploratory data analysis results

Interactive analytics demo in screenshots

Predictive analysis results

EDA with Visualization

Flight Number vs. Launch Site
Payload vs. Launch Site
Success Rate vs. Orbit Type
Flight Number vs. Orbit Type
Payload vs. Orbit Type
Launch Success Yearly Trend

Flight Number vs. Launch Site

As we can see from this scatter plot, the success rate increases as the number of flights increases.

Payload vs. Launch Site

For example, for the CCAFS SLC 40 launch site, the greater the payload mass, the higher the landing success rate. At this stage, however, we cannot yet determine whether there is a correlation between these two variables.

Success Rate vs. Orbit Type

From the bar plot, we can see that 'ES-L1', 'GEO', 'HEO' and 'SSO' had the highest success rates.

Flight Number vs. Orbit Type

For the 'LEO' orbit, we can clearly observe that success is related to the number of flights, unlike 'GTO', where there seems to be no correlation.

Payload vs. Orbit Type

With heavy payloads, successful landings are more frequent for 'PO', 'ISS' and 'LEO' orbits, unlike 'GTO', where heavy payloads seem to have a negative impact.

Launch Success Yearly Trend

From this line chart, we can observe that the success rate kept increasing from 2013 to 2020.

EDA with SQL

All Launch Site Names
Launch Site Names Begin with 'CCA'
Total Payload Mass
Average Payload Mass by F9 v1.1
First Successful Ground Landing Date
Successful Drone Ship Landing with Payload between 4000 and 6000
Total Number of Successful and Failure Mission Outcomes
Boosters Carried Maximum Payload
2015 Launch Records
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20

All Launch Site Names

SQL QUERY

We used the keyword DISTINCT to show the unique values of the LaunchSite column in the SpaceXTbl dataset.

RESULT

Launch Site Names Begin with 'CCA'

SQL QUERY

SELECT * FROM = Query all (*) rows and columns from the SpaceXTbl dataset
WHERE ... LIKE ... = Condition that only queries rows whose Launch_Site matches a pattern (% is a wildcard)
LIMIT = Return the first n rows matching the SELECT criteria

➜ Display first 5 records where column Launch_Site values must start with 'CCA'
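This pattern-matching query can be run with Python's sqlite3 module and hypothetical rows in place of the Db2 table. The table and column names are assumptions. SQLite's LIKE is case-insensitive for ASCII letters by default, whereas Db2's is case-sensitive.

```python
import sqlite3

# In-memory stand-in for the SpaceXTbl table (hypothetical rows; names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Launch_Site TEXT, Booster_Version TEXT)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [
        ("CCAFS LC-40", "Booster A"),
        ("VAFB SLC-4E", "Booster B"),
        ("CCAFS SLC-40", "Booster C"),
        ("KSC LC-39A", "Booster D"),
    ],
)

# '%' matches any sequence of characters, so 'CCA%' means "starts with CCA";
# LIMIT 5 caps the result at 5 matching rows.
rows = conn.execute(
    "SELECT * FROM SPACEXTBL WHERE Launch_Site LIKE 'CCA%' LIMIT 5"
).fetchall()
print(rows)
```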

RESULT

Total Payload Mass

SQL QUERY

SELECT SUM( ) = Return the total of the payload_mass__kg_ column
AS = Rename the resulting column
WHERE ... = ... = Condition that only queries rows whose Customer column has the value 'NASA (CRS)'

➜ Display the total payload mass carried by boosters launched by NASA (CRS)

RESULT

Average Payload Mass by F9 v1.1

SQL QUERY

SELECT AVG( ) = Return the average of the payload_mass__kg_ column
WHERE ... = ... = Condition that only queries rows whose Booster_version column has the value 'F9 v1.1'

➜ Display average payload mass carried by booster version F9 v1.1

RESULT

First Successful Ground Landing Date

SQL QUERY

SELECT MIN( ) = Return the earliest (minimum) value of the Date column
WHERE ... = ... = Condition that only queries rows whose Landing__outcome column has the value 'Success (ground pad)'

➜ Display the date when the first successful landing outcome on a ground pad was achieved.
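The MIN() query can be run with sqlite3 and hypothetical rows; the table and column names are assumptions. MIN() works on dates here because ISO-formatted YYYY-MM-DD strings sort chronologically.

```python
import sqlite3

# In-memory stand-in for the table (hypothetical rows; names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Date TEXT, Landing__Outcome TEXT)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [
        ("2017-01-01", "Success (ground pad)"),
        ("2016-01-01", "Success (drone ship)"),
        ("2016-06-01", "Success (ground pad)"),
    ],
)

# Earliest date among rows with a successful ground-pad landing.
(first_date,) = conn.execute(
    "SELECT MIN(Date) FROM SPACEXTBL WHERE Landing__Outcome = 'Success (ground pad)'"
).fetchone()
print(first_date)  # 2016-06-01
```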

RESULT

Successful Drone Ship Landing with Payload between 4000 and 6000

SQL QUERY

SELECT = Query only data from the booster_version column
WHERE ... = ... = Condition that only queries rows whose Landing__outcome column has the value 'Success (drone ship)'
AND = Requires that the additional conditions (payload mass greater than 4000 and less than 6000) are also true

➜ Display the names of the boosters which have success in drone ship and have payload mass greater than 4000 but less than 6000

RESULT

Total Number of Successful and Failure Mission Outcomes

SQL QUERY

SUM(CASE WHEN ... THEN ... ELSE ... END) = A CASE statement is used to count both successes and failures in a single query instead of running multiple COUNT() queries

➜ Display the total number of successful and failure mission outcomes
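The CASE-based count can be run with sqlite3 and hypothetical Mission_Outcome values. The column name and the 'Success%'/'Failure%' prefixes are assumptions.

```python
import sqlite3

# In-memory stand-in for the table (hypothetical rows; names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Mission_Outcome TEXT)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?)",
    [("Success",), ("Success",), ("Failure (in flight)",), ("Success (payload status unclear)",)],
)

# One pass over the table: each CASE yields 1 or 0 per row, and SUM adds them up.
query = """
    SELECT
        SUM(CASE WHEN Mission_Outcome LIKE 'Success%' THEN 1 ELSE 0 END) AS Successes,
        SUM(CASE WHEN Mission_Outcome LIKE 'Failure%' THEN 1 ELSE 0 END) AS Failures
    FROM SPACEXTBL
"""
successes, failures = conn.execute(query).fetchone()
print(successes, failures)  # 3 1
```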

RESULT

Boosters Carried Maximum Payload

SQL QUERY

Here, we used a subquery: WHERE ... = (SELECT MAX( ) FROM ...) = Return the maximum value of the payload_mass__kg_ column

➜ Display the names of the booster_versions which have carried the maximum payload mass.
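The subquery can be run with sqlite3 and hypothetical rows; the table and column names are assumptions. Because the outer query compares against the maximum, every booster tied at that payload is returned.

```python
import sqlite3

# In-memory stand-in for the table (hypothetical rows; names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Booster_Version TEXT, PAYLOAD_MASS__KG_ REAL)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [("Booster A", 500.0), ("Booster B", 9000.0), ("Booster C", 9000.0), ("Booster D", 3000.0)],
)

# The inner query computes the maximum payload; the outer query keeps matching rows.
query = """
    SELECT Booster_Version
    FROM SPACEXTBL
    WHERE PAYLOAD_MASS__KG_ = (SELECT MAX(PAYLOAD_MASS__KG_) FROM SPACEXTBL)
    ORDER BY Booster_Version
"""
boosters = [row[0] for row in conn.execute(query)]
print(boosters)  # ['Booster B', 'Booster C']
```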

RESULT

2015 Launch Records

SQL QUERY

MONTHNAME(...) = Return the month name from the Date column
WHERE ... AND ... = Requires all conditions to be true. Here, we only select Failure (drone ship) outcomes from 2015

➜ Display the failed landing_outcomes on a drone ship, their booster versions, and launch site names for the year 2015
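MONTHNAME() is a Db2 function that SQLite does not provide. The sketch below runs the same filter with Python's sqlite3 module, using strftime() to extract the month number and the year from ISO dates. The rows, table and column names are hypothetical.

```python
import sqlite3

# In-memory stand-in for the table (hypothetical rows; names assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPACEXTBL (Date TEXT, Landing__Outcome TEXT, Booster_Version TEXT, Launch_Site TEXT)"
)
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?, ?)",
    [
        ("2015-01-10", "Failure (drone ship)", "Booster X", "Site 1"),
        ("2015-06-28", "Precluded (drone ship)", "Booster Y", "Site 1"),
        ("2016-03-04", "Failure (drone ship)", "Booster Z", "Site 1"),
    ],
)

# strftime('%m', ...) gives the month number and strftime('%Y', ...) the year.
query = """
    SELECT strftime('%m', Date) AS Month, Landing__Outcome, Booster_Version, Launch_Site
    FROM SPACEXTBL
    WHERE Landing__Outcome = 'Failure (drone ship)'
      AND strftime('%Y', Date) = '2015'
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('01', 'Failure (drone ship)', 'Booster X', 'Site 1')]
```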

RESULT

LAUNCH SITES PROXIMITIES analysis

All Launch Sites on folium map
Color labels for each site on the map
Launch Sites distances from railways / highways / cities / coastlines

All Launch Sites on folium map

Florida

California

➜ Launch sites are in close proximity to the coast for safety reasons

Color labels for each site on the map

Green Markers = Successful Launches
Red Markers = Failed Launches

➜ KSC LC-39A is the launch site with the highest probability of success

Launch Sites distances from railways / highways / cities / coastlines

Closest_Coastline

Closest_Highway

Closest_City

Closest_Railway

➜ Are launch sites in close proximity to railways? Launch sites are close to railways so that materials and cargo can be transported and received more easily. This also minimizes the distance employees must travel, saving time, money and effort.

➜ Are launch sites in close proximity to highways? Launch sites are also close to highways, for the same reasons as railways. However, since highways are also used by the public, launch sites must keep a safe distance from them to avoid any injuries.

➜ Are launch sites in close proximity to coastlines? Launch sites are close to coastlines for several reasons:
- As we saw in previous notebooks, the launch success rate may depend on many factors, such as the location and proximities of a launch site, i.e., the initial position of rocket trajectories.
- Launching over the ocean makes it possible to abort a launch at any time in case of problems.
- It prevents human and material damage in case of failure.

➜ Do launch sites keep a certain distance away from cities? Yes. Launch sites are located far from cities and densely populated areas to protect the population.

Build a Dashboard with Plotly Dash

Success Count for all Launch Sites with pie chart
Pie chart with highest success ratio
Scatter plot of Payload vs launch outcome for all sites

Success Count for all Launch Sites with pie chart

As we saw in the Folium map section, KSC LC-39A had the most successful launches of all launch sites.

pie chart with highest success ratio

KSC LC-39A had a 76.9% success rate and a 23.1% failure rate.

scatter plot of Payload vs launch outcome for all sites

(with different Payload selected in Range slider) PART I

Payload range (kg): 0 to 5,600 kg

Payload range (kg): 0 to 10,000 kg

Low payload masses (kg) have a HIGHER success rate than heavy payload masses (kg)

scatter plot of Payload vs launch outcome for all sites

(with different Payload selected in Range slider) PART II

Booster version with the highest success rate

Payload range (kg) with the highest success rate

Payload range (kg) with the lowest success rate

The payload range with the lowest success rate is between 362 kg and 475 kg. The payload range with the highest success rate is between 1,952 kg and 5,300 kg. FT is the booster version with the highest launch success rate.

predictive analysis (classification)

Classification Accuracy
Confusion Matrix

Classification Accuracy

We trained 4 different models. The Decision Tree has the highest classification accuracy, 0.90, although on the test set it scored 0.83, the lowest score.

Confusion matrix

False Negative

The Decision Tree got a higher result with a TP (true positive) count of 3, against a TP of 5 for the other models. On the other hand, it produces more FN (false negatives): 3 against 1.

Same Confusion Matrix for KNN, Decision Tree and Logistic Regression

Conclusions

The more flights a launch site has, the higher its success rate.

Orbits 'ES-L1', 'GEO', 'HEO' and 'SSO' had the highest success rates.

The success rate kept increasing from 2013 to 2020.

KSC LC-39A had the most successful launches of all sites.

Low payload masses (kg) perform better than heavier payloads.

The payload range with the highest success rate is between 1,952 kg and 5,300 kg.

FT is the Booster Version with highest launch success rate.

The Decision Tree classifier is the best machine learning algorithm for this project with provided dataset.

appendix

LINKS

Interactive Plotly: https://plotly.com/python-api-reference/
Dash Plotly: https://dash.plotly.com
Dashboarding Tools: https://pyviz.org/dashboarding/

THANKS!
