How Unsupervised Learning Uses Features in mHealth
Review: Unsupervised Learning
In unsupervised learning, the dataset contains features but no labels, meaning we don’t know the “correct” categories or outcomes (like stressed vs calm). The algorithm instead discovers hidden structure or patterns in the data.
How Do Models Use These Features?
Unsupervised learning algorithms discover relationships among feature values, relying on properties such as:
- Variance
- Distance
- Density
K-Means Clustering
- Each user’s record is a point in multi-feature space: X = [HRV, Steps, Sleep]
- The algorithm measures distances (usually Euclidean) between points.
- Points that are close together (similar HRV, Steps, Sleep) are grouped into the same cluster.
- It uses features to find “who looks like whom” in terms of patterns.
In mHealth, K-Means is often used to group users by wellness profile:
- Cluster 1 → Low HRV, Low Steps → High Stress
- Cluster 2 → High HRV, High Steps → Active, Healthy
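These profile clusters can be reproduced with a minimal pure-Python K-Means sketch. All values below are invented and assumed to be already standardized (z-scores); a real analysis would use a library such as scikit-learn with proper initialization.

```python
import math

# Invented daily summaries, already standardized (z-scores) for
# [HRV, Steps, Sleep]. Negative values mean "below average".
users = [
    [-1.2, -1.0, -0.8],   # low HRV, low steps: "high stress" profile
    [-0.9, -1.1, -1.0],
    [-1.0, -0.8, -0.9],
    [ 1.1,  1.0,  0.9],   # high HRV, high steps: "active, healthy" profile
    [ 0.8,  1.2,  1.0],
    [ 1.0,  0.9,  1.1],
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # Naive init: spread seeds across the data. Real K-Means uses
    # random restarts or k-means++ initialization.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

centroids, clusters = kmeans(users, k=2)
print(centroids)  # roughly [-1.03, -0.97, -0.90] and [0.97, 1.03, 1.00]
```

Each centroid is the "average member" of its cluster, which is what lets us read the clusters as wellness profiles.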
Problem with K-Means Clustering
K-Means requires you to predefine the number of clusters (K). Too few clusters → distinct groups get merged. Too many clusters → the model overfits noise. We need a way to find the “just right” number.
The Elbow Method helps us find the balance between under- and over-clustering.
(Figure: example clusterings with K = 2, K = 3, and K = 4.)
Within-Cluster Sum of Squares (WCSS)
- Each data point belongs to a cluster with a cluster center (centroid).
- K-Means calculates a metric called WCSS (Within-Cluster Sum of Squares).
- WCSS measures how far points are from their centroid.
You can compute WCSS for several values of K: K = 1, 2, 3, 4, 5, …
Then you plot K (x-axis) vs WCSS (y-axis). As K increases:
- WCSS always decreases
- Improvement gets smaller after a certain point.
The Elbow Point
When you look at the curve:
- The WCSS drops sharply at first,
- Then levels off gradually.
The point where this drop starts to flatten, forming an “elbow” shape, is considered the optimal K. At this point, adding more clusters doesn’t significantly improve the fit, so it’s a good balance between simplicity and accuracy.
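The whole elbow procedure fits in a short sketch. The step-count data below is invented 1-D data with three obvious groups, and the K-Means inside is deliberately naive (fixed seeds, fixed iteration count):

```python
# Invented 1-D "daily step count in thousands" data with three obvious groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]

def run_kmeans(points, k, iters=25):
    centroids = [points[i * len(points) // k] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            groups[nearest].append(p)
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids, groups

def wcss(points, k):
    # Within-Cluster Sum of Squares: squared distance of every point
    # to its own cluster centroid.
    centroids, groups = run_kmeans(points, k)
    return sum((p - centroids[i]) ** 2 for i, g in enumerate(groups) for p in g)

curve = {k: round(wcss(data, k), 1) for k in range(1, 6)}
print(curve)  # -> {1: 548.0, 2: 156.0, 3: 6.0, 4: 4.5, 5: 3.0}
# WCSS keeps shrinking as K grows, but the drop flattens sharply after
# K = 3 (the true number of groups): that flattening is the elbow.
```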
What is Principal Component Analysis (PCA)?
Principal Component Analysis (PCA) is a mathematical technique that helps you summarize and simplify complex data without losing the most important information. It does this by finding new “summary features” (called principal components or PCs) that capture how the data varies the most. In simple terms: PCA finds the main directions or patterns in your data, such as the “axes of biggest change.”
Why PCA Matters in mHealth
In mobile or wearable health data, we often collect many overlapping signals:
- HRV (Heart Rate Variability)
- Steps
- Sleep duration
- SpO₂
- Skin temperature
- Stress score
Many of these could be related. For example, someone who takes more steps also tends to have higher HRV and better sleep. Instead of analyzing all these correlated features separately, PCA combines them into a smaller set of new features that represent the main trends.
For example:
- PC1 = HRV, Steps, Sleep
- PC2 = Stress Score, Sleep
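One quick way to see this redundancy before running PCA is a correlation check. Here is a pure-Python Pearson correlation on invented HRV and Steps values:

```python
import math

# Invented daily values for five users: more steps tends to mean higher HRV.
hrv   = [35.0, 42.0, 50.0, 58.0, 65.0]     # ms
steps = [2000, 4500, 7000, 9500, 12000]    # steps/day

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

r = pearson(hrv, steps)
print(round(r, 4))  # -> 0.9997: the two features carry largely overlapping information
```

A correlation this close to 1 means the two features mostly repeat each other, which is exactly the redundancy PCA folds into a single component.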
Example
Imagine you track 100 people wearing smartwatches. Each person has 10 health features. That's a 100 × 10 data matrix (lots of numbers!). PCA might discover that:
- PC1 (Principal Component 1) = combination of HRV, Steps, and Sleep → “Activity & Recovery Axis”
- PC2 (Principal Component 2) = variation in Sleep and Stress → “Rest Pattern Axis”
Now you can represent each person by just these two components, instead of all ten original features, making the data easier to visualize and interpret.
How Does PCA Work?
Data Preparation
Subtract Mean from the Data
Calculate Covariance Matrix
Calculate Eigenvectors & Eigenvalues
Select Principal Components
Reduce Data Dimension
What is DBSCAN?
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It’s an unsupervised learning algorithm that groups together data points that are close and dense, while identifying outliers (points that don’t belong anywhere). In summary, DBSCAN finds clusters in your data based on how tightly packed the points are.
Key Terms to Remember
- ε (eps): the neighborhood radius; two points are neighbors if they are within ε of each other.
- minPts: the minimum number of neighbors a point needs to count as a core point.
- Core point: has at least minPts neighbors within ε.
- Border point: within ε of a core point, but with fewer than minPts neighbors of its own.
- Noise point: neither core nor border; treated as an outlier.
How Does DBSCAN Differ from K-Means?
- You don’t choose the number of clusters in advance; density determines it.
- Clusters can have any shape, not just roughly spherical groups around centroids.
- Points that fit nowhere are labeled noise instead of being forced into a cluster.
Example: DBSCAN for Smartwatch Users
Let’s say we’re clustering daily summaries from 100 smartwatch users using 3 features:
- HRV (Heart Rate Variability)
- Steps
- Sleep Hours
Each user → [HRV, Steps, Sleep]
Prepare the Data
What it means:
- Pick the features & units
- Handle missing values
- De-noise / detrend (if time-series)
- Detect & tame outliers
- Scale/standardize features
- (Optional) Log-transform skewed variables (Steps, Calories) to reduce skew.
- (Optional) Feature screening
Why this matters: Without Step 0, PCA might be hijacked by whichever feature has the largest numeric scale or by a few outliers. Example: Steps (0–20,000) would overwhelm HRV (~20–100 ms) unless you standardize.
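The scaling step can be sketched as a plain z-score transform (invented Steps and HRV values):

```python
import math

# Two features on wildly different numeric scales (invented values).
steps = [2000.0, 5000.0, 8000.0, 11000.0, 14000.0, 17000.0]
hrv   = [28.0, 41.0, 55.0, 62.0, 74.0, 90.0]   # ms

def standardize(values):
    """Return z-scores: zero mean, unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

z_steps, z_hrv = standardize(steps), standardize(hrv)

# After this transform both features have mean 0 and variance 1, so PCA
# weighs them by how they co-vary, not by their raw numeric range.
```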
Subtract Mean from the Data
What it means: Each feature (like HRV, Steps, or Sleep) is centered by removing its average value. Why we do it: We want every feature to start from the same reference point (zero mean), so differences reflect variation and not absolute levels. Example: If average HRV = 50, and one user’s HRV = 55 → we use +5 as their adjusted value.
Calculate Covariance Matrix
What it means: We measure how each pair of features changes together — called covariance. Why we do it: It helps us see which features are related.
- Positive covariance → they increase together (e.g., Steps & HRV).
- Negative covariance → when one goes up, the other goes down (e.g., Stress & Sleep).
Example: If HRV and Steps both increase together, PCA will group them into one direction of variation.
Calculate Eigenvectors & Eigenvalues
What it means: This is the math step where we find the directions (eigenvectors) and strengths (eigenvalues) of variance in the data. Why we do it:
- Each eigenvector = a principal component direction.
- Each eigenvalue = how much variance that component explains.
Example:
- PC1 might explain 70% of total variation (activity–recovery).
- PC2 might explain 20% (sleep variation).
Select Principal Components
What it means: We keep only the top few components that explain most of the information. Why we do it: We want to simplify the dataset without losing important patterns. This is called dimensionality reduction. Example: If PC1 and PC2 explain 90% of the variance (the meaningful patterns and differences in the original data), we drop the rest.
Suppose you measure HRV, Steps, Sleep, and SpO₂. Most differences between people come from:
- How active (steps) and recovered (HRV) they are (PC1)
- How well they rest (Sleep) (PC2)
Together, those two axes already describe 90% of how users differ — so you don’t need the rest (SpO₂) to get the main picture.
Reduce Data Dimension
What it means: We transform the original data into this new, smaller coordinate system (PC1, PC2, …). Why we do it: Now the data is easier to visualize, interpret, and use in machine learning. Example: Instead of 10 wearable features, we might now analyze just 2 components:
- PC1 → Activity & Recovery
- PC2 → Rest Patterns
Answer
Each user is represented by features such as HRV and Steps. The algorithm doesn’t know who is stressed or calm. Instead, it groups users based on similarity in these feature values. The resulting clusters may correspond to behavioral types (e.g., active, balanced, sedentary) or physiological patterns (e.g., high-stress vs. low-stress groups).
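Under the simplifying assumption of only two features, the whole PCA pipeline (center, covariance, eigendecomposition, component selection, projection) fits in a short pure-Python sketch, because a symmetric 2×2 covariance matrix has a closed-form eigendecomposition. All numbers are invented; with more features you would use a linear-algebra library.

```python
import math

# Centered (mean already subtracted) toy values for two correlated features,
# e.g. standardized HRV and Steps, for six users. Numbers are invented.
x = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
y = [-1.4, -1.1, -0.4, 0.6, 0.9, 1.4]
n = len(x)

# Covariance matrix [[sxx, sxy], [sxy, syy]] of the centered data.
sxx = sum(a * a for a in x) / (n - 1)
syy = sum(b * b for b in y) / (n - 1)
sxy = sum(a * b for a, b in zip(x, y)) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix via the quadratic formula.
tr, det = sxx + syy, sxx * syy - sxy * sxy
disc = math.sqrt(tr * tr / 4 - det)
lam1, lam2 = tr / 2 + disc, tr / 2 - disc      # lam1 >= lam2

# Unit eigenvector for the leading eigenvalue = direction of PC1
# (this closed form assumes sxy != 0).
norm = math.hypot(sxy, lam1 - sxx)
v1 = (sxy / norm, (lam1 - sxx) / norm)

# Select components: share of total variance explained by PC1 alone.
explained = lam1 / (lam1 + lam2)

# Reduce dimension: project each user onto PC1 -> one score per user.
pc1_scores = [a * v1[0] + b * v1[1] for a, b in zip(x, y)]
print(round(explained, 3))  # PC1 alone explains ~99.8% of the variance here
```

Because the two features move together so tightly, one component captures almost everything: each user is now described by a single PC1 score instead of two numbers.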
How Does DBSCAN Work?
Instead of asking “How many clusters (K) do we want?” like K-Means, DBSCAN asks two simple questions for each point:
- How many neighbors are nearby?
- Are those neighbors close enough (within a certain distance)?
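Those two questions are enough to sketch DBSCAN in pure Python. The 2-D points below are invented (two dense groups plus one outlier); a real analysis would use a library implementation such as scikit-learn's DBSCAN.

```python
import math

# Invented 2-D daily summaries (already standardized): two dense groups
# plus one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2),   # dense group A
    (3.0, 3.0), (3.1, 3.1), (3.0, 3.2), (3.2, 3.0),   # dense group B
    (8.0, 8.0),                                        # isolated -> noise
]

EPS, MIN_PTS = 0.5, 3   # "close enough" radius and required neighbor count

def neighbors(i):
    # Indices within EPS of point i (includes i itself).
    return [j for j in range(len(points))
            if math.dist(points[i], points[j]) <= EPS]

labels = [None] * len(points)   # None = unvisited, -1 = noise
cluster = 0
for i in range(len(points)):
    if labels[i] is not None:
        continue
    nbrs = neighbors(i)
    if len(nbrs) < MIN_PTS:     # too few neighbors: not a core point
        labels[i] = -1
        continue
    labels[i] = cluster         # start a new cluster from this core point
    queue = list(nbrs)
    while queue:
        j = queue.pop()
        if labels[j] == -1:
            labels[j] = cluster  # noise reachable from a core point -> border
        if labels[j] is not None:
            continue
        labels[j] = cluster
        j_nbrs = neighbors(j)
        if len(j_nbrs) >= MIN_PTS:   # j is also a core point: keep expanding
            queue.extend(j_nbrs)
    cluster += 1

print(labels)  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Cluster labels 0 and 1 emerge from density alone; the isolated point is labeled -1 (noise) instead of being forced into a cluster.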
Example: DBSCAN in mHealth
DBSCAN uses feature patterns to detect natural groupings without labels.
- Features with larger differences → increase distance → separate clusters.
- Features with small differences → close distance → same cluster.
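Because these distances drive cluster membership, feature scale matters here just as it did for PCA. A tiny sketch with invented values (and an assumed "typical spread" of 10 ms for HRV and 3000 for Steps):

```python
import math

# Two users who differ a lot in Steps but little in HRV (invented values).
u1 = {"hrv": 60.0, "steps": 4000.0}
u2 = {"hrv": 62.0, "steps": 9000.0}

# Raw Euclidean distance: the Steps gap (5000) swamps the HRV gap (2).
raw = math.hypot(u1["hrv"] - u2["hrv"], u1["steps"] - u2["steps"])

# After dividing each feature by a typical spread (assumed here: 10 ms for
# HRV, 3000 for Steps), both features contribute on comparable scales.
scaled = math.hypot((u1["hrv"] - u2["hrv"]) / 10.0,
                    (u1["steps"] - u2["steps"]) / 3000.0)

print(round(raw, 1), round(scaled, 2))  # -> 5000.0 1.68
```

Without scaling, DBSCAN's ε-neighborhoods would effectively cluster on Steps alone.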
Beenish Chaudhry
Created on October 21, 2025