STATISTICS
Luisamaria Castano V
Created on March 18, 2021
STATISTICS
What are we going to learn about statistics?
- Types of variables
- Levels of measure
- Graphical representations
- Frequency tables
- Central tendency
- Measures of location
- Measures of dispersion
- Measures of correlation
WHAT IS STATISTICS FOR?
COLLECTION OF DATA
DESCRIPTION
ANALYSIS
CONCLUSIONS
TYPES OF DATA
Quantitative data can either be...
SCALE OF MEASURE
RATIO DATA
Equal distances between values, with a true zero. EX: WEIGHT
INTERVAL DATA
Equal distances between values, but no absolute zero. EX: CELSIUS TEMPERATURE SCALE
ORDINAL DATA
Ordered categories (rankings, scales). EX: SOCIOECONOMIC STATUS
NOMINAL DATA
Categories (no order or direction). EX: MARITAL STATUS
LET'S PRACTICE
TYPES OF STATISTICAL ANALYSIS
Not as reliable but faster!
More detailed and accurate!
GRAPHICAL REPRESENTATIONS
RATIO DATA
INTERVAL DATA
ORDINAL DATA
NOMINAL DATA
Bar Charts!
- Ideal to represent categories (nominal)
- Very good to show relative size
- It is better to leave gaps in between bars
Pie Charts!
- Effective to represent categories (nominal or ordinal)
- Every slice shows a proportion of a whole
- Cannot show categories with zero or negative values (a zero gets no slice, and a negative value cannot be drawn)
Dot Plots!
- Effective for both categorical and quantitative variables
- Best used with small data sets
Line Graphs!
- Can only be used with quantitative data
- Perfect to show changes over time
- Good to compare different sets of data changing together over the same period of time
Histogram!
- Can only be used with quantitative data
- Groups the values into number intervals (ranges) and shows how many values fall in each
- The horizontal axis is a continuous number line, so it can include zero and negative values
Once we have decided what type of statistical analysis to carry out and have collected the data, we classify it, because organized data is easier to work with...
How do we classify the data?
1. Group the information according to the levels of measure.
2. Count the frequency of each category.
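The two steps above can be sketched in Python for a nominal variable. This is a minimal illustration, not the slide's own example: the marital-status values are made up, and `collections.Counter` does the counting in step 2.

```python
from collections import Counter

# Hypothetical nominal data: marital status of 10 people (made-up values)
marital_status = [
    "single", "married", "single", "divorced", "married",
    "single", "widowed", "married", "single", "married",
]

# Step 2: count the frequency of each category
frequencies = Counter(marital_status)

# Print a simple frequency table: category, absolute frequency, relative frequency
total = len(marital_status)
for category, count in frequencies.most_common():
    print(f"{category:<10} {count:>3} {count / total:>6.0%}")
```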
EXAMPLE OF CLASSIFICATION
HOW DO WE CLASSIFY DATA IN GROUPS?
EX:
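For quantitative data, grouping usually means splitting the values into intervals of equal width and counting how many fall into each. A minimal sketch, assuming made-up ages and a hypothetical helper `group_into_intervals`, where each interval includes its lower limit but not its upper limit:

```python
from collections import Counter

# Hypothetical quantitative data: ages of 12 people (made-up values)
ages = [23, 35, 41, 19, 52, 38, 27, 45, 31, 60, 22, 48]

def group_into_intervals(values, start, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    # Interval index 0 is [start, start + width), index 1 the next one, and so on
    counts = Counter((v - start) // width for v in values)
    table = {}
    for k in sorted(counts):
        lower = start + k * width
        table[(lower, lower + width)] = counts[k]
    return table

for (lower, upper), count in group_into_intervals(ages, start=10, width=10).items():
    print(f"[{lower}, {upper}): {count}")
```

Intervals with no values are simply left out in this sketch; a full frequency table would usually list them with a frequency of 0.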
Once we have organized the data into groups and counted their frequencies, we can compute measures that give us relevant information about the data...
CENTRAL TENDENCY
MEAN, MEDIAN AND MODE
Mean: add up the values and divide by the total number of values!
Median: order the numbers and choose the one in the middle! With an even count, take the average of the two middle numbers.
Mode: choose the value that repeats the most!
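The three rules above translate directly into short functions. This sketch uses made-up ages (not the party data from the slides) and hypothetical function names; Python's `statistics` module also offers ready-made `mean`, `median` and `mode`.

```python
from collections import Counter

def mean(values):
    # Add up the values and divide by how many values there are
    return sum(values) / len(values)

def median(values):
    # Order the numbers and take the one in the middle;
    # with an even count, average the two middle numbers
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    # The value(s) that repeat the most; a data set can have more than one mode
    counts = Counter(values)
    highest = max(counts.values())
    return [v for v, c in counts.items() if c == highest]

ages = [5, 6, 7, 7, 8, 10, 13]  # hypothetical ages, made up for illustration
print(mean(ages), median(ages), modes(ages))
```

Here the mean is 8.0, the median is 7 and the only mode is 7. `modes` returns a list because a data set can have several equally frequent values.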
CENTRAL TENDENCY
The Mean
The mean age of the kids attending the party is 7.5 years
CENTRAL TENDENCY
The Median
The median age is 13 years
CENTRAL TENDENCY
The Mode
The mode age is 13
LOCATION MEASUREMENTS
PERCENTILES
Percentile: the value below which a given percentage of the data falls.
DATA NEEDS TO BE ORDERED
QUARTILES: The data is divided into 4 groups with 25% of the data each. The middle quartile (Q2) is the median.
QUINTILES: The data is divided into 5 groups with 20% of the data each.
DECILES: The data is divided into 10 groups with 10% of the data each.
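Python's `statistics.quantiles` (Python 3.8+) returns the cut points that split the data into equal groups: `n=4` gives the 3 quartile cut points, `n=5` the 4 quintile cut points and `n=10` the 9 decile cut points. The scores below are made up for illustration.

```python
import statistics

# Hypothetical test scores (made-up values); quantiles() sorts them internally
scores = [78, 55, 90, 62, 71, 84, 68, 95, 75, 70, 80]

quartiles = statistics.quantiles(scores, n=4)   # 3 cut points: Q1, Q2, Q3
quintiles = statistics.quantiles(scores, n=5)   # 4 cut points
deciles = statistics.quantiles(scores, n=10)    # 9 cut points

print("Quartiles:", quartiles)
print("Median:", statistics.median(scores))     # equals the middle quartile Q2
```

By default `quantiles` uses its "exclusive" interpolation method, so the cut points can differ slightly from textbooks or calculators that follow another convention; the middle quartile still matches the median.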
WHAT IF DATA IS GROUPED?
LOCATION MEASUREMENTS
What if you are asked for the best estimate of the percentile of a certain observation in a data set?
EX:
The dot plot shows the number of hours of daily driving time for 14 different MKS school bus drivers; each dot represents one driver. What is the best estimate of the percentile for the driver with a daily driving time of 6 hours?
LOCATION MEASUREMENTS
What if you are given a percentile rank and asked for the value of the corresponding observation in a data set?
EX:
A total of 10,000 people attended the StereoPicnic music festival in Bogotá. The table shows the number of people who arrived each hour. Which interval contains the 45th percentile, that is, the moment by which 45% of the festival-goers had arrived?
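The slide's arrival table is not reproduced here, so the sketch below uses made-up hourly counts that only match the stated total of 10,000 people. The idea is to accumulate the arrivals hour by hour and stop at the first interval where the running total reaches 45% of the attendees; `interval_for_percentile` and the hour labels are hypothetical.

```python
# Hypothetical arrivals per hour (NOT the slide's data); they add up to 10,000
arrivals = [
    ("12:00-13:00", 800),
    ("13:00-14:00", 1500),
    ("14:00-15:00", 2000),
    ("15:00-16:00", 2700),
    ("16:00-17:00", 1800),
    ("17:00-18:00", 1200),
]

def interval_for_percentile(table, percentile):
    """Return the first interval where the cumulative count reaches the percentile."""
    total = sum(count for _, count in table)
    target = total * percentile / 100   # e.g. 45% of 10,000 = 4,500 people
    cumulative = 0
    for label, count in table:
        cumulative += count
        if cumulative >= target:
            return label
    raise ValueError("percentile must be at most 100")

print(interval_for_percentile(arrivals, 45))
```

With these invented counts, 4,300 people have arrived by 15:00 and 7,000 by 16:00, so the 45th percentile (the 4,500th arrival) falls in the 15:00-16:00 interval.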
LOCATION MEASUREMENTS
WHAT IF DATA IS GROUPED?
In the previous test:
* 12% of the group got D
* 50% of the group got C
* 30% of the group got B
* 8% of the group got A
If you got a B, what percentile are you in?
Add up all the percentages below your grade, plus half the percentage at your grade. Taking half of the B's means we assume neither the best B nor the worst B, just an average B.
12% + 50% + 0.5(30%) = 77%
You are in the 77th percentile: you did as well as or better than 77% of the class.
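The same rule (all the percentages below the grade, plus half the percentage at the grade) as a small Python function, using the grade distribution from this test; `grouped_percentile` is a hypothetical name.

```python
# Grade distribution from the test, ordered from the lowest to the highest grade
grade_percentages = [("D", 12), ("C", 50), ("B", 30), ("A", 8)]

def grouped_percentile(distribution, grade):
    """Percentages below the grade, plus half the percentage at the grade."""
    below = 0
    for label, pct in distribution:
        if label == grade:
            return below + pct / 2
        below += pct
    raise KeyError(grade)

print(grouped_percentile(grade_percentages, "B"))  # 77.0
```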
LOCATION MEASUREMENTS
Let's practice!
LOCATION MEASUREMENTS: SABER QUESTIONS
Answer the following question taking into account all the information provided.
LOCATION MEASUREMENTS: SABER QUESTIONS
1. A man and a woman, given their weight and height, both fall in the mild obesity range of the BMI, that is, their index is between 30 and 34.9. What can be said about these individuals when they are compared with the population of their own sex aged 26 to 60?
A. Both individuals are more obese than 90% of the population of their own sex.
B. The woman is more obese than 88% of the population of her sex, while the man is more obese than 95% of his population.
C. The woman is more obese than 92% of her population, while the man is more obese than only 88% of his population.
D. Both the man and the woman are part of the least obese 80% of their respective populations.
DISPERSION
MEAN DEVIATION
VARIANCE
STANDARD DEVIATION
RANGE
DISPERSION
Range: the difference between the highest and lowest values (highest - lowest)
DISPERSION
Mean Deviation: how far, on average, the values are from the mean
1. Find the mean of all values.
2. Find the distance of each value from that mean: subtract the mean from each value and ignore minus signs.
3. Find the mean of those distances.
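A minimal Python sketch of the range and of the three mean-deviation steps, using made-up example values; the function names are hypothetical.

```python
def data_range(values):
    # Range: highest value minus lowest value
    return max(values) - min(values)

def mean_deviation(values):
    # 1. Find the mean of all values
    mean = sum(values) / len(values)
    # 2. Distance of each value from the mean, ignoring minus signs
    distances = [abs(v - mean) for v in values]
    # 3. The mean of those distances
    return sum(distances) / len(distances)

values = [3, 6, 6, 7, 8, 11, 15, 16]  # example values, made up for illustration
print(data_range(values), mean_deviation(values))  # 13 3.75
```

For these values the mean is 9, the distances from it add up to 30, and 30 / 8 = 3.75.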
DISPERSION
EX:
DISPERSION
Standard Deviation: measures the amount of variability among the numbers in a data set. Its symbol is the Greek letter σ.
- It calculates the typical distance of a data point from the mean of the data
- If the standard deviation is relatively large, it means the data is quite spread out away from the mean
- If the standard deviation is relatively small, it means the data is concentrated near the mean
DISPERSION
Variance: The average of the squared differences from the mean.
Variance = Standard Deviation² (equivalently, Standard Deviation = √Variance)
1. Calculate the mean: the simple average of the numbers.
2. Then, for each number, subtract the mean and square the result (the squared difference).
3. Calculate the average of those squared differences.
We square the differences so that positive and negative differences do not cancel each other out.
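The three variance steps, plus the square root that gives the standard deviation, as a Python sketch. It divides by the number of values N, exactly as the steps describe (the population variance, which `statistics.pvariance` also computes); the sample variance in `statistics.variance` divides by N - 1 instead. The function names are hypothetical.

```python
import math

def variance(values):
    # 1. The mean (simple average)
    mean = sum(values) / len(values)
    # 2. The squared difference of each value from the mean
    squared_diffs = [(v - mean) ** 2 for v in values]
    # 3. The average of those squared differences (divides by N)
    return sum(squared_diffs) / len(squared_diffs)

def standard_deviation(values):
    # The standard deviation is the square root of the variance
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # example values, made up for illustration
print(variance(data), standard_deviation(data))  # 4.0 2.0
```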
DISPERSION
EX:
These five dogs' heights (at the shoulders) are, from left to right: 600mm, 470mm, 170mm, 430mm and 300mm.
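Working through the variance steps for these five heights (dividing by N) gives the following; each step is written out so it can be checked:

```latex
\begin{aligned}
\mu &= \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394\ \text{mm} \\
\sigma^2 &= \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} \\
&= \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704\ \text{mm}^2 \\
\sigma &= \sqrt{21704} \approx 147\ \text{mm}
\end{aligned}
```

So one standard deviation around the mean runs from about 247 mm to 541 mm: the 600 mm dog is unusually tall and the 170 mm dog unusually short.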
So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small. Rottweilers are tall dogs. And Dachshunds are a bit short, right?
HOW CAN WE REPRESENT THIS GRAPHICALLY?
Many things closely follow a Normal Distribution:
- heights of people
- size of things produced by machines
- errors in measurements
- blood pressure
- marks on a test
NORMAL DISTRIBUTION
STANDARD DEVIATION'S PERCENTAGES
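The percentages for 1, 2 and 3 standard deviations can be checked with Python's `statistics.NormalDist` (Python 3.8+). The share within k standard deviations of the mean is the same for every normal distribution, so the standard normal curve (mean 0, standard deviation 1) is enough. This is a sketch; `share_within` is a hypothetical helper.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    # Area under the normal curve between k SDs below and k SDs above the mean
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

It gives roughly 68%, 95% and 99.7% for 1, 2 and 3 standard deviations.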
EXAMPLE:
Students pass a test if they score 50/100 or more. The marks of a large number of students were sampled, and the mean and standard deviation were calculated as 42/100 and 8/100, respectively.
1. Sketch the bell curve using this information.
2. Assuming the data is normally distributed, what percentage of students pass the test?
(Figure: bell curve marking the mean and 1, 2 and 3 standard deviations on each side)
We know that about 68% of the scores lie within 1 standard deviation (SD) of the mean, and the score 50 is exactly 1 SD above the mean (42 + 8 = 50). So OUTSIDE that region there are 32% (100% - 68%) of the scores, half above and half below. Therefore, only 32%/2 = 16% of students scored 50 or more and actually passed the test!
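The 16% answer uses the rounded 68% figure. As a cross-check, `statistics.NormalDist` can compute the exact area above 50 for a normal curve with mean 42 and standard deviation 8; like the question, this sketch assumes the marks really are normally distributed.

```python
from statistics import NormalDist

marks = NormalDist(mu=42, sigma=8)  # mean 42/100, standard deviation 8/100

# Exact share of students scoring 50 or more (area to the right of 50)
pass_share = 1 - marks.cdf(50)

# Rule-of-thumb estimate: half of the 32% that lies outside ±1 SD
rule_of_thumb = (1 - 0.68) / 2

print(f"exact: {pass_share:.1%}, rule of thumb: {rule_of_thumb:.0%}")
```

The exact value is just under 16% (about 15.9%), so the rule-of-thumb answer is a good estimate.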
DO IT YOURSELF:
The mean June midday temperature in Desertville is 36°C and the standard deviation is 3°C. Assuming this data is normally distributed, how many days in June would you expect the midday temperature to be between 30°C and 42°C?
DO IT YOURSELF:
The heights of male adults are Normally distributed with mean 1.7 m and standard deviation 0.2 m. In a population of 400 male adults, how many would you expect to have a height between 1.5 and 1.9 m?
practice
https://www.mathopolis.com/questions/q.html?id=2619&t=mif&qs=2619_2620_2621_2622_2623_2624_2625_2626_3844_3845&site=1&ref=2f646174612f7374616e646172642d6e6f726d616c2d646973747269627574696f6e2e68746d6c&title=4e6f726d616c20446973747269627574696f6e#
CORRELATION
Correlation measures how strongly two sets of values are related.
- POSITIVE: both values increase together
- NEGATIVE: as one value increases, the other decreases
The correlation coefficient ranges from -1 to 1:
- 1 is a perfect positive correlation
- 0 is no correlation (the values don't seem linked at all)
- -1 is a perfect negative correlation
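One common way to compute this number is Pearson's correlation coefficient r, which measures how closely two sets of values follow a straight-line relationship. The slides do not name a specific coefficient, so treat this choice as an assumption; the function name and the example data below are hypothetical.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation r: 1 perfect positive, 0 none, -1 perfect negative."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its own mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Positive when deviations tend to share a sign, negative when they tend to differ
    co_variation = sum(a * b for a, b in zip(dx, dy))
    # Scale by the spread of each variable so the result lies between -1 and 1
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return co_variation / spread

hours_studied = [1, 2, 3, 4, 5]      # hypothetical data
test_scores = [52, 60, 61, 70, 75]   # hypothetical data
print(round(pearson_correlation(hours_studied, test_scores), 2))  # 0.98
```

A value this close to 1 indicates a strong positive correlation: in this made-up data, more hours studied go together with higher scores.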