Business understanding
Data understanding
Data Preparation
Modeling
Data
Deployment
Evaluation
Business Understanding
Other objectives of the Business understanding phase include:
- Determining the success criteria associated with the knowledge discovery exercise.
- Assessing the current state of the business and determining whether the business is ready to conduct a knowledge discovery exercise.
- Defining the relevant stakeholders and the legal requirements or obligations that need to be addressed, such as government regulations and organizational rules.
- Determining the ethical implications for the business and society; the contingencies that need to be in place with regard to the outcomes of knowledge discovery; a cost-benefit analysis for investing in such an exercise; and a timeline for completion of the project.
- Establishing a clear statement of the problem or the opportunity that knowledge discovery presents to the business, which is the main outcome of this phase. For example, "increase online sales to existing customers by 7%."
- Including the problem statement in a project plan, along with a timeline and a summary of the resources (people, skills, capital, and technologies) required to conduct the knowledge discovery process.
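A success criterion such as the 7% example above is easiest to check later if it is written down as a measurable rule. The sketch below is only an illustration: the function name, the sales figures, and the choice to measure growth against a single baseline figure are assumptions, not part of the original material.

```python
def meets_sales_target(baseline_sales, current_sales, target_increase=0.07):
    """Return True when sales grew by at least the target fraction."""
    if baseline_sales <= 0:
        raise ValueError("baseline_sales must be positive")
    growth = (current_sales - baseline_sales) / baseline_sales
    return growth >= target_increase


# Hypothetical figures, for illustration only.
print(meets_sales_target(100_000, 108_000))  # growth of 8%
print(meets_sales_target(100_000, 105_000))  # growth of 5%
```

Agreeing on the baseline period and the measurement window up front is what makes a criterion like this usable in the evaluation phase.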
Data Understanding
During the Data understanding phase, we explore the data through exploratory data analysis and correlation analysis, and we examine data quality and metadata.
- Exploratory data analysis (EDA) is conducted by looking at distributions, histograms, measures of central tendency, and so on.
- Correlation analysis is conducted to determine whether there are any relationships between different features in the data.
- Data quality is examined to assess whether the data has problems such as missing data, outliers, or duplicate records, and to decide how to deal with those problems.
- Metadata, which means "data about data", is also examined in this phase.
- Metadata enables us to understand the business meaning of the features in the dataset.
- Metadata may also provide clues about integrity problems in the data and about previously known data quality problems that need to be addressed before we can move to other steps in the knowledge discovery process.
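The checks described in this phase can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed procedure: the records, the field layout (customer_id, age, monthly_spend), and the helper names are all hypothetical.

```python
import math
import statistics

# Hypothetical records: (customer_id, age, monthly_spend); None marks a missing value.
records = [
    (1, 34, 120.0),
    (2, 45, 150.0),
    (3, 29, 100.0),
    (3, 29, 100.0),   # exact duplicate of the previous row
    (4, None, 90.0),  # missing age
    (5, 52, 170.0),
]


def profile(rows):
    """Return basic data-quality counts for a list of tuples."""
    missing = sum(1 for row in rows if any(value is None for value in row))
    duplicates = len(rows) - len(set(rows))
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


quality = profile(records)

# Use only complete, distinct rows for the summary statistics.
complete = sorted({row for row in records if None not in row})
ages = [row[1] for row in complete]
spend = [row[2] for row in complete]

summary = {
    "age_mean": statistics.fmean(ages),
    "age_median": statistics.median(ages),
    "spend_stdev": statistics.stdev(spend),
}
age_spend_corr = pearson(ages, spend)

print(quality)
print(summary)
print(round(age_spend_corr, 3))
```

In practice a data-frame library would usually do this work; the plain-Python version is used here only to keep each step visible.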
Data Preparation
- The data preparation phase involves cleaning the data (such as removing duplicates, and adjusting for missing data, outliers, and noise). There will also be cases where you will need to construct new data, discretize data, and impute data.
- Cleaning means correcting the data quality issues found in the data. Any duplicates that exist in the data will have to be removed. If there are missing values and outliers in the data, then we need to determine the best approach to deal with these issues. We also have to remove any noise in the data, i.e., inconsistent or unexpected data that should not be present in the dataset in the first place.
- There will be cases where you may have to construct new data. For example, each item at a grocery store has its own price. When you go shopping, the total price that you pay at checkout is a sum of all the individual item prices. This total price that you pay is a derived attribute based on the individual prices of all items that you purchased and is an example of constructed data.
- Similarly, you may have to discretize data or impute data. Other activities in this phase may include integrating data that resides in different locations, and reformatting data depending on the type of analytical approach you plan to use.
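The preparation activities above (removing duplicates, imputing missing values, constructing a derived attribute, and discretizing) can be sketched as a small pipeline. Everything specific here is an assumption made for illustration: the basket records, the use of basket_id to identify duplicates, median imputation, and the age bin edges.

```python
import statistics

# Hypothetical checkout data: each basket lists its individual item prices.
baskets = [
    {"basket_id": "A", "item_prices": [2.50, 4.00, 1.25], "customer_age": 31},
    {"basket_id": "B", "item_prices": [10.00, 3.50], "customer_age": None},
    {"basket_id": "A", "item_prices": [2.50, 4.00, 1.25], "customer_age": 31},  # duplicate
    {"basket_id": "C", "item_prices": [0.99], "customer_age": 67},
]


def remove_duplicates(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Replace missing values of `field` with the median of the known values."""
    known = [row[field] for row in rows if row[field] is not None]
    fill = statistics.median(known)
    return [{**row, field: fill if row[field] is None else row[field]} for row in rows]


def add_total_price(rows):
    """Construct a derived attribute: the sum of the individual item prices."""
    return [{**row, "total_price": round(sum(row["item_prices"]), 2)} for row in rows]


def discretize_age(age):
    """Map a numeric age to a category; the bin edges are illustrative only."""
    if age < 35:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


prepared = add_total_price(
    impute_median(remove_duplicates(baskets, "basket_id"), "customer_age")
)
prepared = [{**row, "age_group": discretize_age(row["customer_age"])} for row in prepared]

for row in prepared:
    print(row["basket_id"], row["total_price"], row["customer_age"], row["age_group"])
```

The order of the steps matters: removing duplicates before imputing keeps the repeated basket from being counted twice when the median is computed.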
Modeling
- The modeling phase involves selecting the appropriate modeling technique to work with the data. Examples of modeling techniques include decision trees, clustering algorithms, classification models, etc. Depending on the type of problem we are trying to solve, we have to know the appropriate modeling technique to use. In addition, we need to know the underlying assumptions associated with each modeling technique. For instance, some models might require that the data is normally or uniformly distributed. Some modeling techniques allow missing data and some do not, and some approaches have specific data format requirements. We also need to know how to tune hyper-parameters to improve the performance of the models.
- The modeling phase also includes developing a test design. Steps in the test design include partitioning data into training, test, and validation datasets. It also includes selecting and identifying appropriate performance measures for the models.
- Finally, in the modeling phase, we have to conduct an assessment of the models. We have to look at how well the model performs with regard to the stated business objectives and the success criteria. We will also need to look at the quality of the generated model to determine whether the performance measures are satisfactory.
If we create multiple models, then we need to decide which model is the best.
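The test design and model selection described above can be sketched as follows. This is a toy illustration under stated assumptions: the data is synthetic, the split proportions (60/20/20) are arbitrary, and a one-dimensional k-nearest-neighbours classifier with k as its hyper-parameter stands in for whatever modeling technique a real project would choose.

```python
import random
from collections import Counter


def partition(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle and split rows into training, validation, and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def knn_predict(train_rows, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_rows, eval_rows, k):
    """Performance measure: share of eval_rows that are predicted correctly."""
    hits = sum(1 for x, label in eval_rows if knn_predict(train_rows, x, k) == label)
    return hits / len(eval_rows)


# Synthetic data: one numeric feature; the label is True when the feature is 50 or more.
data = [(x, x >= 50) for x in range(100)]
train_set, val_set, test_set = partition(data)

# Tune the hyper-parameter k on the validation set, then assess once on the test set.
candidates = [1, 3, 5]
best_k = max(candidates, key=lambda k: accuracy(train_set, val_set, k))
test_accuracy = accuracy(train_set, test_set, best_k)

print(len(train_set), len(val_set), len(test_set))
print(best_k, test_accuracy)
```

Keeping the test set out of the tuning loop is the point of the three-way split: the validation set decides which model is best, and the test set gives an estimate of performance that the selection step has not influenced.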
Evaluation
In this phase, we assess the degree to which the model meets the stated business objectives.
- First, review the process that was used to build the models. By this, we mean that we take a critical look at how we went through the process of building and testing the models. This involves reviewing whether all relevant factors were considered in the modeling process, and analyzing whether any crucial aspects were overlooked that might compromise the results of the model.
- Then conduct quality assurance (QA) to make sure that the models were built correctly, that the features were correctly chosen, and that no features were accidentally omitted. QA is also required to ensure that the features used in modeling will be available if you want to operationalize the models in the future, that there are no inherent biases in the model, and that the modeling exercise is repeatable. Repeatability means that the model will generate the same results if we build it again. This is very important in any modeling exercise.
- The next step is to decide whether to finish the project and move to the deployment phase, to go back and revisit what was done thus far if there are problems, or to move on to a new KDDA initiative.
- We will also have to evaluate the resources available if we are to actually deploy and operationalize the model.
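Two of the QA checks above, repeatability and feature availability, lend themselves to simple automated tests. The sketch below is a hedged illustration: `build_model` is a hypothetical stand-in for a real model-building step (here just the mean of a seeded random subsample), and the feature names are invented.

```python
import random


def build_model(rows, seed):
    """Toy stand-in for model building: the mean of a seeded random subsample."""
    rng = random.Random(seed)
    sample = rng.sample(rows, k=len(rows) // 2)
    return sum(sample) / len(sample)


def is_repeatable(rows, seed, runs=3):
    """Rebuild the model several times with the same seed and compare the results."""
    results = {build_model(rows, seed) for _ in range(runs)}
    return len(results) == 1


def missing_features(model_features, production_features):
    """List features the model needs that production data does not provide."""
    return sorted(set(model_features) - set(production_features))


rows = list(range(100))
print(is_repeatable(rows, seed=7))
print(missing_features(["age", "spend", "region"], ["age", "spend"]))
```

Fixing the random seed is one common way to make a stochastic modeling step repeatable; a model that passes this check can still differ across library versions or hardware, which the sketch does not cover.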
Deployment
In this phase, we plan the deployment of the models in the production environment, which means that the organization can start to use the models for knowledge discovery. We also need to monitor the performance of the models while they are in production. Finally, we wrap up the project by creating a project report and reviewing the project outcomes based on feedback from the end users.
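Monitoring a model in production can be as simple as comparing recent performance with the level measured at evaluation time. The sketch below assumes accuracy as the performance measure and a fixed tolerance of 0.05; both choices, along with the function name and the numbers, are illustrative only.

```python
def needs_review(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Flag the model when mean recent accuracy falls below baseline minus tolerance."""
    if not recent_accuracies:
        raise ValueError("recent_accuracies must not be empty")
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return recent_mean < baseline_accuracy - tolerance


# Hypothetical weekly accuracy figures from production.
print(needs_review(0.90, [0.91, 0.89, 0.90]))  # close to the baseline
print(needs_review(0.90, [0.80, 0.82, 0.78]))  # well below the baseline
```

A flagged model would typically send the project back to an earlier phase, such as data understanding or modeling, rather than being patched in place.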
Knowledge discovery
BCD Learning Design
Created on February 11, 2022