Dr. Ji Son

ANOVA with Independent Samples

Table of Contents

Section 1: Introduction
Descriptive Statistics vs. Inferential Statistics

25m 31s

Intro
0:00
Roadmap
0:10
Roadmap
0:11
Statistics
0:35
Statistics
0:36
Let's Think About High School Science
1:12
Measurement and Find Patterns (Mathematical Formula)
1:13
Statistics = Math of Distributions
4:58
Distributions
4:59
Problematic… but also GREAT
5:58
Statistics
7:33
How is It Different from Other Specializations in Mathematics?
7:34
Statistics is Fundamental in Natural and Social Sciences
7:53
Two Skills of Statistics
8:20
Description (Exploration)
8:21
Inference
9:13
Descriptive Statistics vs. Inferential Statistics: Apply to Distributions
9:58
Descriptive Statistics
9:59
Inferential Statistics
11:05
Populations vs. Samples
12:19
Populations vs. Samples: Is it the Truth?
12:20
Populations vs. Samples: Pros & Cons
13:36
Populations vs. Samples: Descriptive Values
16:12
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:10
Putting Together Descriptive/Inferential Stats & Populations/Samples
17:11
Example 1: Descriptive Statistics vs. Inferential Statistics
19:09
Example 2: Descriptive Statistics vs. Inferential Statistics
20:47
Example 3: Sample, Parameter, Population, and Statistic
21:40
Example 4: Sample, Parameter, Population, and Statistic
23:28
Section 2: About Samples: Cases, Variables, Measurements
About Samples: Cases, Variables, Measurements

32m 14s

Intro
0:00
Data
0:09
Data, Cases, Variables, and Values
0:10
Rows, Columns, and Cells
2:03
Example: Aircraft
3:52
How Do We Get Data?
5:38
Research: Question and Hypothesis
5:39
Research Design
7:11
Measurement
7:29
Research Analysis
8:33
Research Conclusion
9:30
Types of Variables
10:03
Discrete Variables
10:04
Continuous Variables
12:07
Types of Measurements
14:17
Types of Measurements
14:18
Types of Measurements (Scales)
17:22
Nominal
17:23
Ordinal
19:11
Interval
21:33
Ratio
24:24
Example 1: Cases, Variables, Measurements
25:20
Example 2: Which Scale of Measurement is Used?
26:55
Example 3: What Kind of a Scale of Measurement is This?
27:26
Example 4: Discrete vs. Continuous Variables.
30:31
Section 3: Visualizing Distributions
Introduction to Excel

8m 9s

Intro
0:00
Before Visualizing Distribution
0:10
Excel
0:11
Excel: Organization
0:45
Workbook
0:46
Column x Rows
1:50
Tools: Menu Bar, Standard Toolbar, and Formula Bar
3:00
Excel + Data
6:07
Excel and Data
6:08
Frequency Distributions in Excel

39m 10s

Intro
0:00
Roadmap
0:08
Data in Excel and Frequency Distributions
0:09
Raw Data to Frequency Tables
0:42
Raw Data to Frequency Tables
0:43
Frequency Tables: Using Formulas and Pivot Tables
1:28
Example 1: Number of Births
7:17
Example 2: Age Distribution
20:41
Example 3: Height Distribution
27:45
Example 4: Height Distribution of Males
32:19
Frequency Distributions and Features

25m 29s

Intro
0:00
Roadmap
0:10
Data in Excel, Frequency Distributions, and Features of Frequency Distributions
0:11
Example #1
1:35
Uniform
1:36
Example #2
2:58
Unimodal, Skewed Right, and Asymmetric
2:59
Example #3
6:29
Bimodal
6:30
Example #4a
8:29
Symmetric, Unimodal, and Normal
8:30
Point of Inflection and Standard Deviation
11:13
Example #4b
12:43
Normal Distribution
12:44
Summary
13:56
Uniform, Skewed, Bimodal, and Normal
13:57
Sketch Problem 1: Driver's License
17:34
Sketch Problem 2: Life Expectancy
20:01
Sketch Problem 3: Telephone Numbers
22:01
Sketch Problem 4: Length of Time Used to Complete a Final Exam
23:43
Dotplots and Histograms in Excel

42m 42s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Previously
1:02
Data, Frequency Table, and Visualization
1:03
Dotplots
1:22
Dotplots Excel Example
1:23
Dotplots: Pros and Cons
7:22
Pros and Cons of Dotplots
7:23
Dotplots Excel Example Cont.
9:07
Histograms
12:47
Histograms Overview
12:48
Example of Histograms
15:29
Histograms: Pros and Cons
31:39
Pros
31:40
Cons
32:31
Frequency vs. Relative Frequency
32:53
Frequency
32:54
Relative Frequency
33:36
Example 1: Dotplots vs. Histograms
34:36
Example 2: Age of Pennies Dotplot
36:21
Example 3: Histogram of Mammal Speeds
38:27
Example 4: Histogram of Life Expectancy
40:30
Stemplots

12m 23s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
What Sets Stemplots Apart?
0:46
Data Sets, Dotplots, Histograms, and Stemplots
0:47
Example 1: What Do Stemplots Look Like?
1:58
Example 2: Back-to-Back Stemplots
5:00
Example 3: Quiz Grade Stemplot
7:46
Example 4: Quiz Grade & Afterschool Tutoring Stemplot
9:56
Bar Graphs

22m 49s

Intro
0:00
Roadmap
0:05
Roadmap
0:08
Review of Frequency Distributions
0:44
Y-axis and X-axis
0:45
Types of Frequency Visualizations Covered so Far
2:16
Introduction to Bar Graphs
4:07
Example 1: Bar Graph
5:32
Example 1: Bar Graph
5:33
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:07
Do Shapes, Center, and Spread of Distributions Apply to Bar Graphs?
11:08
Example 2: Create a Frequency Visualization for Gender
14:02
Example 3: Cases, Variables, and Frequency Visualization
16:34
Example 4: What Kind of Graphs are Shown Below?
19:29
Section 4: Summarizing Distributions
Central Tendency: Mean, Median, Mode

38m 50s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
Central Tendency 1
0:56
Way to Summarize a Distribution of Scores
0:57
Mode
1:32
Median
2:02
Mean
2:36
Central Tendency 2
3:47
Mode
3:48
Median
4:20
Mean
5:25
Summation Symbol
6:11
Summation Symbol
6:12
Population vs. Sample
10:46
Population vs. Sample
10:47
Excel Examples
15:08
Finding Mode, Median, and Mean in Excel
15:09
Median vs. Mean
21:45
Effect of Outliers
21:46
Relationship Between Parameter and Statistic
22:44
Type of Measurements
24:00
Which Distributions to Use With
24:55
Example 1: Mean
25:30
Example 2: Using Summation Symbol
29:50
Example 3: Average Calorie Count
32:50
Example 4: Creating an Example Set
35:46
Variability

42m 40s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Variability (or Spread)
0:45
Variability (or Spread)
0:46
Things to Think About
5:45
Things to Think About
5:46
Range, Quartiles and Interquartile Range
6:37
Range
6:38
Interquartile Range
8:42
Interquartile Range Example
10:58
Interquartile Range Example
10:59
Variance and Standard Deviation
12:27
Deviations
12:28
Sum of Squares
14:35
Variance
16:55
Standard Deviation
17:44
Sum of Squares (SS)
18:34
Sum of Squares (SS)
18:35
Population vs. Sample SD
22:00
Population vs. Sample SD
22:01
Population vs. Sample
23:20
Mean
23:21
SD
23:51
Example 1: Find the Mean and Standard Deviation of the Variable Friends in the Excel File
27:21
Example 2: Find the Mean and Standard Deviation of the Tagged Photos in the Excel File
35:25
Example 3: Sum of Squares
38:58
Example 4: Standard Deviation
41:48
Five Number Summary & Boxplots

57m 15s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Summarizing Distributions
0:37
Shape, Center, and Spread
0:38
5 Number Summary
1:14
Boxplot: Visualizing 5 Number Summary
3:37
Boxplot: Visualizing 5 Number Summary
3:38
Boxplots on Excel
9:01
Using 'Stocks' and Using Stacked Columns
9:02
Boxplots on Excel Example
10:14
When are Boxplots Useful?
32:14
Pros
32:15
Cons
32:59
How to Determine Outlier Status
33:24
Rule of Thumb: Upper Limit
33:25
Rule of Thumb: Lower Limit
34:16
Signal Outliers in an Excel Data File Using Conditional Formatting
34:52
Modified Boxplot
48:38
Modified Boxplot
48:39
Example 1: Percentage Values & Lower and Upper Whisker
49:10
Example 2: Boxplot
50:10
Example 3: Estimating IQR From Boxplot
53:46
Example 4: Boxplot and Missing Whisker
54:35
Shape: Calculating Skewness & Kurtosis

41m 51s

Intro
0:00
Roadmap
0:16
Roadmap
0:17
Skewness Concept
1:09
Skewness Concept
1:10
Calculating Skewness
3:26
Calculating Skewness
3:27
Interpreting Skewness
7:36
Interpreting Skewness
7:37
Excel Example
8:49
Kurtosis Concept
20:29
Kurtosis Concept
20:30
Calculating Kurtosis
24:17
Calculating Kurtosis
24:18
Interpreting Kurtosis
29:01
Leptokurtic
29:35
Mesokurtic
30:10
Platykurtic
31:06
Excel Example
32:04
Example 1: Shape of Distribution
38:28
Example 2: Shape of Distribution
39:29
Example 3: Shape of Distribution
40:14
Example 4: Kurtosis
41:10
Normal Distribution

34m 33s

Intro
0:00
Roadmap
0:13
Roadmap
0:14
What is a Normal Distribution
0:44
The Normal Distribution As a Theoretical Model
0:45
Possible Range of Probabilities
3:05
Possible Range of Probabilities
3:06
What is a Normal Distribution
5:07
Can Be Described By
5:08
Properties
5:49
'Same' Shape: Illusion of Different Shape!
7:35
'Same' Shape: Illusion of Different Shape!
7:36
Types of Problems
13:45
Example: Distribution of SAT Scores
13:46
Shape Analogy
19:48
Shape Analogy
19:49
Example 1: The Standard Normal Distribution and Z-Scores
22:34
Example 2: The Standard Normal Distribution and Z-Scores
25:54
Example 3: Sketching and Normal Distribution
28:55
Example 4: Sketching and Normal Distribution
32:32
Standard Normal Distributions & Z-Scores

41m 44s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
A Family of Distributions
0:28
Infinite Set of Distributions
0:29
Transforming Normal Distributions to 'Standard' Normal Distribution
1:04
Normal Distribution vs. Standard Normal Distribution
2:58
Normal Distribution vs. Standard Normal Distribution
2:59
Z-Score, Raw Score, Mean, & SD
4:08
Z-Score, Raw Score, Mean, & SD
4:09
Weird Z-Scores
9:40
Weird Z-Scores
9:41
Excel
16:45
For Normal Distributions
16:46
For Standard Normal Distributions
19:11
Excel Example
20:24
Types of Problems
25:18
Percentage Problem: P(x)
25:19
Raw Score and Z-Score Problems
26:28
Standard Deviation Problems
27:01
Shape Analogy
27:44
Shape Analogy
27:45
Example 1: Deaths Due to Heart Disease vs. Deaths Due to Cancer
28:24
Example 2: Heights of Male College Students
33:15
Example 3: Mean and Standard Deviation
37:14
Example 4: Finding Percentage of Values in a Standard Normal Distribution
37:49
Normal Distribution: PDF vs. CDF

55m 44s

Intro
0:00
Roadmap
0:15
Roadmap
0:16
Frequency vs. Cumulative Frequency
0:56
Frequency vs. Cumulative Frequency
0:57
Frequency vs. Cumulative Frequency
4:32
Frequency vs. Cumulative Frequency Cont.
4:33
Calculus in Brief
6:21
Derivative-Integral Continuum
6:22
PDF
10:08
PDF for Standard Normal Distribution
10:09
PDF for Normal Distribution
14:32
Integral of PDF = CDF
21:27
Integral of PDF = CDF
21:28
Example 1: Cumulative Frequency Graph
23:31
Example 2: Mean, Standard Deviation, and Probability
24:43
Example 3: Mean and Standard Deviation
35:50
Example 4: Age of Cars
49:32
Section 5: Linear Regression
Scatterplots

47m 19s

Intro
0:00
Roadmap
0:04
Roadmap
0:05
Previous Visualizations
0:30
Frequency Distributions
0:31
Compare & Contrast
2:26
Frequency Distributions Vs. Scatterplots
2:27
Summary Values
4:53
Shape
4:54
Center & Trend
6:41
Spread & Strength
8:22
Univariate & Bivariate
10:25
Example Scatterplot
10:48
Shape, Trend, and Strength
10:49
Positive and Negative Association
14:05
Positive and Negative Association
14:06
Linearity, Strength, and Consistency
18:30
Linearity
18:31
Strength
19:14
Consistency
20:40
Summarizing a Scatterplot
22:58
Summarizing a Scatterplot
22:59
Example 1: Gapminder.org, Income x Life Expectancy
26:32
Example 2: Gapminder.org, Income x Infant Mortality
36:12
Example 3: Trend and Strength of Variables
40:14
Example 4: Trend, Strength and Shape for Scatterplots
43:27
Regression

32m 2s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Linear Equations
0:34
Linear Equations: y = mx + b
0:35
Rough Line
5:16
Rough Line
5:17
Regression - A 'Center' Line
7:41
Reasons for Summarizing with a Regression Line
7:42
Predictor and Response Variable
10:04
Goal of Regression
12:29
Goal of Regression
12:30
Prediction
14:50
Example: Servings of Milk Per Year Shown By Age
14:51
Interpolation
17:06
Extrapolation
17:58
Error in Prediction
20:34
Prediction Error
20:35
Residual
21:40
Example 1: Residual
23:34
Example 2: Large and Negative Residual
26:30
Example 3: Positive Residual
28:13
Example 4: Interpret Regression Line & Extrapolate
29:40
Least Squares Regression

56m 36s

Intro
0:00
Roadmap
0:13
Roadmap
0:14
Best Fit
0:47
Best Fit
0:48
Sum of Squared Errors (SSE)
1:50
Sum of Squared Errors (SSE)
1:51
Why Squared?
3:38
Why Squared?
3:39
Quantitative Properties of Regression Line
4:51
Quantitative Properties of Regression Line
4:52
So How do we Find Such a Line?
6:49
SSEs of Different Line Equations & Lowest SSE
6:50
Carl Gauss' Method
8:01
How Do We Find Slope (b1)
11:00
How Do We Find Slope (b1)
11:01
How Do We Find Intercept
15:11
How Do We Find Intercept
15:12
Example 1: Which of These Equations Fit the Above Data Best?
17:18
Example 2: Find the Regression Line for These Data Points and Interpret It
26:31
Example 3: Summarize the Scatterplot and Find the Regression Line.
34:31
Example 4: Examine the Mean of Residuals
43:52
Correlation

43m 58s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Summarizing a Scatterplot Quantitatively
0:47
Shape
0:48
Trend
1:11
Strength: Correlation (r)
1:45
Correlation Coefficient ( r )
2:30
Correlation Coefficient ( r )
2:31
Trees vs. Forest
11:59
Trees vs. Forest
12:00
Calculating r
15:07
Average Product of z-scores for x and y
15:08
Relationship between Correlation and Slope
21:10
Relationship between Correlation and Slope
21:11
Example 1: Find the Correlation between Grams of Fat and Cost
24:11
Example 2: Relationship between r and b1
30:24
Example 3: Find the Regression Line
33:35
Example 4: Find the Correlation Coefficient for this Set of Data
37:37
Correlation: r vs. r-squared

52m 52s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
R-squared
0:44
What is the Meaning of It? Why Squared?
0:45
Parsing Sum of Squares (Parsing Variability)
2:25
SST = SSR + SSE
2:26
What is SST and SSE?
7:46
What is SST and SSE?
7:47
r-squared
18:33
Coefficient of Determination
18:34
If the Correlation is Strong…
20:25
If the Correlation is Strong…
20:26
If the Correlation is Weak…
22:36
If the Correlation is Weak…
22:37
Example 1: Find r-squared for this Set of Data
23:56
Example 2: What Does it Mean that the Simple Linear Regression is a 'Model' of Variance?
33:54
Example 3: Why Does r-squared Only Range from 0 to 1
37:29
Example 4: Find the r-squared for This Set of Data
39:55
Transformations of Data

27m 8s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Why Transform?
0:26
Why Transform?
0:27
Shape-preserving vs. Shape-changing Transformations
5:14
Shape-preserving = Linear Transformations
5:15
Shape-changing Transformations = Non-linear Transformations
6:20
Common Shape-Preserving Transformations
7:08
Common Shape-Preserving Transformations
7:09
Common Shape-Changing Transformations
8:59
Powers
9:00
Logarithms
9:39
Change Just One Variable? Both?
10:38
Log-log Transformations
10:39
Log Transformations
14:38
Example 1: Create, Graph, and Transform the Data Set
15:19
Example 2: Create, Graph, and Transform the Data Set
20:08
Example 3: What Kind of Model would You Choose for this Data?
22:44
Example 4: Transformation of Data
25:46
Section 6: Collecting Data in an Experiment
Sampling & Bias

54m 44s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Descriptive vs. Inferential Statistics
1:04
Descriptive Statistics: Data Exploration
1:05
Example
2:03
To tackle Generalization…
4:31
Generalization
4:32
Sampling
6:06
'Good' Sample
6:40
Defining Samples and Populations
8:55
Population
8:56
Sample
11:16
Why Use Sampling?
13:09
Why Use Sampling?
13:10
Goal of Sampling: Avoiding Bias
15:04
What is Bias?
15:05
Where does Bias Come from: Sampling Bias
17:53
Where does Bias Come from: Response Bias
18:27
Sampling Bias: Bias from 'Bad' Sampling Methods
19:34
Size Bias
19:35
Voluntary Response Bias
21:13
Convenience Sample
22:22
Judgment Sample
23:58
Inadequate Sample Frame
25:40
Response Bias: Bias from 'Bad' Data Collection Methods
28:00
Nonresponse Bias
29:31
Questionnaire Bias
31:10
Incorrect Response or Measurement Bias
37:32
Example 1: What Kind of Biases?
40:29
Example 2: What Biases Might Arise?
44:46
Example 3: What Kind of Biases?
48:34
Example 4: What Kind of Biases?
51:43
Sampling Methods

14m 25s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Biased vs. Unbiased Sampling Methods
0:32
Biased Sampling
0:33
Unbiased Sampling
1:13
Probability Sampling Methods
2:31
Simple Random
2:54
Stratified Random Sampling
4:06
Cluster Sampling
5:24
Two-stage Sampling
6:22
Systematic Sampling
7:25
Example 1: Which Type(s) of Sampling was this?
8:33
Example 2: Describe How to Take a Two-Stage Sample from this Book
10:16
Example 3: Sampling Methods
11:58
Example 4: Cluster Sample Plan
12:48
Research Design

53m 54s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Descriptive vs. Inferential Statistics
0:51
Descriptive Statistics: Data Exploration
0:52
Inferential Statistics
1:02
Variables and Relationships
1:44
Variables
1:45
Relationships
2:49
Not Every Type of Study is an Experiment…
4:16
Category I - Descriptive Study
4:54
Category II - Correlational Study
5:50
Category III - Experimental, Quasi-experimental, Non-experimental
6:33
Category III
7:42
Experimental, Quasi-experimental, and Non-experimental
7:43
Why CAN'T the Other Strategies Determine Causation?
10:18
Third-variable Problem
10:19
Directionality Problem
15:49
What Makes Experiments Special?
17:54
Manipulation
17:55
Control (and Comparison)
21:58
Methods of Control
26:38
Holding Constant
26:39
Matching
29:11
Random Assignment
31:48
Experiment Terminology
34:09
'True' Experiment vs. Study
34:10
Independent Variable (IV)
35:16
Dependent Variable (DV)
35:45
Factors
36:07
Treatment Conditions
36:23
Levels
37:43
Confounds or Extraneous Variables
38:04
Blind
38:38
Blind Experiments
38:39
Double-blind Experiments
39:29
How Categories Relate to Statistics
41:35
Category I - Descriptive Study
41:36
Category II - Correlational Study
42:05
Category III - Experimental, Quasi-experimental, Non-experimental
42:43
Example 1: Research Design
43:50
Example 2: Research Design
47:37
Example 3: Research Design
50:12
Example 4: Research Design
52:00
Between and Within Treatment Variability

41m 31s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Experimental Designs
0:51
Experimental Designs: Manipulation & Control
0:52
Two Types of Variability
2:09
Between Treatment Variability
2:10
Within Treatment Variability
3:31
Updated Goal of Experimental Design
5:47
Updated Goal of Experimental Design
5:48
Example: Drugs and Driving
6:56
Example: Drugs and Driving
6:57
Different Types of Random Assignment
11:27
All Experiments
11:28
Completely Random Design
12:02
Randomized Block Design
13:19
Randomized Block Design
15:48
Matched Pairs Design
15:49
Repeated Measures Design
19:47
Between-subject Variable vs. Within-subject Variable
22:43
Completely Randomized Design
22:44
Repeated Measures Design
25:03
Example 1: Design a Completely Random, Matched Pair, and Repeated Measures Experiment
26:16
Example 2: Block Design
31:41
Example 3: Completely Randomized Designs
35:11
Example 4: Completely Random, Matched Pairs, or Repeated Measures Experiments?
39:01
Section 7: Review of Probability Axioms
Sample Spaces

37m 52s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
Why is Probability Involved in Statistics
0:48
Probability
0:49
Can People Tell the Difference between Cheap and Gourmet Coffee?
2:08
Taste Test with Coffee Drinkers
3:37
If No One can Actually Taste the Difference
3:38
If Everyone can Actually Taste the Difference
5:36
Creating a Probability Model
7:09
Creating a Probability Model
7:10
D'Alembert vs. Necker
9:41
D'Alembert vs. Necker
9:42
Problem with D'Alembert's Model
13:29
Problem with D'Alembert's Model
13:30
Covering Entire Sample Space
15:08
Fundamental Principle of Counting
15:09
Where Do Probabilities Come From?
22:54
Observed Data, Symmetry, and Subjective Estimates
22:55
Checking whether Model Matches Real World
24:27
Law of Large Numbers
24:28
Example 1: Law of Large Numbers
27:46
Example 2: Possible Outcomes
30:43
Example 3: Brands of Coffee and Taste
33:25
Example 4: How Many Different Treatments are there?
35:33
Addition Rule for Disjoint Events

20m 29s

Intro
0:00
Roadmap
0:08
Roadmap
0:09
Disjoint Events
0:41
Disjoint Events
0:42
Meaning of 'or'
2:39
In Regular Life
2:40
In Math/Statistics/Computer Science
3:10
Addition Rule for Disjoint Events
3:55
If A and B are Disjoint: P (A and B)
3:56
If A and B are Disjoint: P (A or B)
5:15
General Addition Rule
5:41
General Addition Rule
5:42
Generalized Addition Rule
8:31
If A and B are not Disjoint: P (A or B)
8:32
Example 1: Which of These are Mutually Exclusive?
10:50
Example 2: What is the Probability that You will Have a Combination of One Heads and Two Tails?
12:57
Example 3: Engagement Party
15:17
Example 4: Home Owner's Insurance
18:30
Conditional Probability

57m 19s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
'or' vs. 'and' vs. Conditional Probability
1:07
'or' vs. 'and' vs. Conditional Probability
1:08
'and' vs. Conditional Probability
5:57
P (M or L)
5:58
P (M and L)
8:41
P (M|L)
11:04
P (L|M)
12:24
Tree Diagram
15:02
Tree Diagram
15:03
Defining Conditional Probability
22:42
Defining Conditional Probability
22:43
Common Contexts for Conditional Probability
30:56
Medical Testing: Positive Predictive Value
30:57
Medical Testing: Sensitivity
33:03
Statistical Tests
34:27
Example 1: Drug and Disease
36:41
Example 2: Marbles and Conditional Probability
40:04
Example 3: Cards and Conditional Probability
45:59
Example 4: Votes and Conditional Probability
50:21
Independent Events

24m 27s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Independent Events & Conditional Probability
0:26
Non-independent Events
0:27
Independent Events
2:00
Non-independent and Independent Events
3:08
Non-independent and Independent Events
3:09
Defining Independent Events
5:52
Defining Independent Events
5:53
Multiplication Rule
7:29
Previously…
7:30
But with Independent Events
8:53
Example 1: Which of These Pairs of Events are Independent?
11:12
Example 2: Health Insurance and Probability
15:12
Example 3: Independent Events
17:42
Example 4: Independent Events
20:03
Section 8: Probability Distributions
Introduction to Probability Distributions

56m 45s

Intro
0:00
Roadmap
0:08
Roadmap
0:09
Sampling vs. Probability
0:57
Sampling
0:58
Missing
1:30
What is Missing?
3:06
Insight: Probability Distributions
5:26
Insight: Probability Distributions
5:27
What is a Probability Distribution?
7:29
From Sample Spaces to Probability Distributions
8:44
Sample Space
8:45
Probability Distribution of the Sum of Two Dice
11:16
The Random Variable
17:43
The Random Variable
17:44
Expected Value
21:52
Expected Value
21:53
Example 1: Probability Distributions
28:45
Example 2: Probability Distributions
35:30
Example 3: Probability Distributions
43:37
Example 4: Probability Distributions
47:20
Expected Value & Variance of Probability Distributions

53m 41s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Discrete vs. Continuous Random Variables
1:04
Discrete vs. Continuous Random Variables
1:05
Mean and Variance Review
4:44
Mean: Sample, Population, and Probability Distribution
4:45
Variance: Sample, Population, and Probability Distribution
9:12
Example Situation
14:10
Example Situation
14:11
Some Special Cases…
16:13
Some Special Cases…
16:14
Linear Transformations
19:22
Linear Transformations
19:23
What Happens to Mean and Variance of the Probability Distribution?
20:12
n Independent Values of X
25:38
n Independent Values of X
25:39
Compare These Two Situations
30:56
Compare These Two Situations
30:57
Two Random Variables, X and Y
32:02
Two Random Variables, X and Y
32:03
Example 1: Expected Value & Variance of Probability Distributions
35:35
Example 2: Expected Values & Standard Deviation
44:17
Example 3: Expected Winnings and Standard Deviation
48:18
Binomial Distribution

55m 15s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Discrete Probability Distributions
1:42
Discrete Probability Distributions
1:43
Binomial Distribution
2:36
Binomial Distribution
2:37
Multiplicative Rule Review
6:54
Multiplicative Rule Review
6:55
How Many Outcomes with k 'Successes'
10:23
Adults and Bachelor's Degree: Manual List of Outcomes
10:24
P (X=k)
19:37
Putting Together # of Outcomes with the Multiplicative Rule
19:38
Expected Value and Standard Deviation in a Binomial Distribution
25:22
Expected Value and Standard Deviation in a Binomial Distribution
25:23
Example 1: Coin Toss
33:42
Example 2: College Graduates
38:03
Example 3: Types of Blood and Probability
45:39
Example 4: Expected Number and Standard Deviation
51:11
Section 9: Sampling Distributions of Statistics
Introduction to Sampling Distributions

48m 17s

Intro
0:00
Roadmap
0:08
Roadmap
0:09
Probability Distributions vs. Sampling Distributions
0:55
Probability Distributions vs. Sampling Distributions
0:56
Same Logic
3:55
Logic of Probability Distribution
3:56
Example: Rolling Two Dice
6:56
Simulating Samples
9:53
To Come Up with Probability Distributions
9:54
In Sampling Distributions
11:12
Connecting Sampling and Research Methods with Sampling Distributions
12:11
Connecting Sampling and Research Methods with Sampling Distributions
12:12
Simulating a Sampling Distribution
14:14
Experimental Design: Regular Sleep vs. Less Sleep
14:15
Logic of Sampling Distributions
23:08
Logic of Sampling Distributions
23:09
General Method of Simulating Sampling Distributions
25:38
General Method of Simulating Sampling Distributions
25:39
Questions that Remain
28:45
Questions that Remain
28:46
Example 1: Mean and Standard Error of Sampling Distribution
30:57
Example 2: What is the Best Way to Describe Sampling Distributions?
37:12
Example 3: Matching Sampling Distributions
38:21
Example 4: Mean and Standard Error of Sampling Distribution
41:51
Sampling Distribution of the Mean

1h 8m 48s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Special Case of General Method for Simulating a Sampling Distribution
1:53
Special Case of General Method for Simulating a Sampling Distribution
1:54
Computer Simulation
3:43
Using Simulations to See Principles behind Shape of SDoM
15:50
Using Simulations to See Principles behind Shape of SDoM
15:51
Conditions
17:38
Using Simulations to See Principles behind Center (Mean) of SDoM
20:15
Using Simulations to See Principles behind Center (Mean) of SDoM
20:16
Conditions: Does n Matter?
21:31
Conditions: Does Number of Simulation Matter?
24:37
Using Simulations to See Principles behind Standard Deviation of SDoM
27:13
Using Simulations to See Principles behind Standard Deviation of SDoM
27:14
Conditions: Does n Matter?
34:45
Conditions: Does Number of Simulation Matter?
36:24
Central Limit Theorem
37:13
SHAPE
38:08
CENTER
39:34
SPREAD
39:52
Comparing Population, Sample, and SDoM
43:10
Comparing Population, Sample, and SDoM
43:11
Answering the 'Questions that Remain'
48:24
What Happens When We Don't Know What the Population Looks Like?
48:25
Can We Have Sampling Distributions for Summary Statistics Other than the Mean?
49:42
How Do We Know whether a Sample is Sufficiently Unlikely?
53:36
Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?
54:40
Example 1: Mean Batting Average
55:25
Example 2: Mean Sampling Distribution and Standard Error
59:07
Example 3: Sampling Distribution of the Mean
1:01:04
Sampling Distribution of Sample Proportions

54m 37s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Intro to Sampling Distribution of Sample Proportions (SDoSP)
0:51
Categorical Data (Examples)
0:52
Wish to Estimate Proportion of Population from Sample…
2:00
Notation
3:34
Population Proportion and Sample Proportion Notations
3:35
What's the Difference?
9:19
SDoM vs. SDoSP: Type of Data
9:20
SDoM vs. SDoSP: Shape
11:24
SDoM vs. SDoSP: Center
12:30
SDoM vs. SDoSP: Spread
15:34
Binomial Distribution vs. Sampling Distribution of Sample Proportions
19:14
Binomial Distribution vs. SDoSP: Type of Data
19:17
Binomial Distribution vs. SDoSP: Shape
21:07
Binomial Distribution vs. SDoSP: Center
21:43
Binomial Distribution vs. SDoSP: Spread
24:08
Example 1: Sampling Distribution of Sample Proportions
26:07
Example 2: Sampling Distribution of Sample Proportions
37:58
Example 3: Sampling Distribution of Sample Proportions
44:42
Example 4: Sampling Distribution of Sample Proportions
45:57
Section 10: Inferential Statistics
Introduction to Confidence Intervals

42m 53s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Inferential Statistics
0:50
Inferential Statistics
0:51
Two Problems with This Picture…
3:20
Two Problems with This Picture…
3:21
Solution: Confidence Intervals (CI)
4:59
Solution: Hypothesis Testing (HT)
5:49
Which Parameters are Known?
6:45
Which Parameters are Known?
6:46
Confidence Interval - Goal
7:56
When We Don't Know μ but Know σ
7:57
When We Don't Know
18:27
When We Don't Know μ nor σ
18:28
Example 1: Confidence Intervals
26:18
Example 2: Confidence Intervals
29:46
Example 3: Confidence Intervals
32:18
Example 4: Confidence Intervals
38:31
t Distributions

1h 2m 6s

Intro
0:00
Roadmap
0:04
Roadmap
0:05
When to Use z vs. t?
1:07
When to Use z vs. t?
1:08
What is z and t?
3:02
z-score and t-score: Commonality
3:03
z-score and t-score: Formulas
3:34
z-score and t-score: Difference
5:22
Why not z? (Why t?)
7:24
Why not z? (Why t?)
7:25
But Don't Worry!
15:13
Gosset and t-distributions
15:14
Rules of t Distributions
17:05
t-distributions are More Normal as n Gets Bigger
17:06
t-distributions are a Family of Distributions
18:55
Degrees of Freedom (df)
20:02
Degrees of Freedom (df)
20:03
t Family of Distributions
24:07
t Family of Distributions: df = 2, 4, and 60
24:08
df = 60
29:16
df = 2
29:59
How to Find It?
31:01
'Student's t-distribution' or 't-distribution'
31:02
Excel Example
33:06
Example 1: Which Distribution Do You Use? Z or t?
45:26
Example 2: Friends on Facebook
47:41
Example 3: t Distributions
52:15
Example 4: t Distributions, Confidence Interval, and Mean
55:59
Introduction to Hypothesis Testing

1h 6m 33s

Intro
0:00
Roadmap
0:06
Roadmap
0:07
Issues to Overcome in Inferential Statistics
1:35
Issues to Overcome in Inferential Statistics
1:36
What Happens When We Don't Know What the Population Looks Like?
2:57
How Do We Know whether a Sample is Sufficiently Unlikely?
3:43
Hypothesizing a Population
6:44
Hypothesizing a Population
6:45
Null Hypothesis
8:07
Alternative Hypothesis
8:56
Hypotheses
11:58
Hypotheses
11:59
Errors in Hypothesis Testing
14:22
Errors in Hypothesis Testing
14:23
Steps of Hypothesis Testing
21:15
Steps of Hypothesis Testing
21:16
Single Sample HT (When Sigma Available)
26:08
Example: Average Facebook Friends
26:09
Step 1
27:08
Step 2
27:58
Step 3
28:17
Step 4
32:18
Single Sample HT (When Sigma Not Available)
36:33
Example: Average Facebook Friends
36:34
Step 1: Hypothesis Testing
36:58
Step 2: Significance Level
37:25
Step 3: Decision Stage
37:40
Step 4: Sample
41:36
Sigma and p-value
45:04
Sigma and p-value
45:05
One-Tailed vs. Two-Tailed Hypotheses
45:51
Example 1: Hypothesis Testing
48:37
Example 2: Heights of Women in the US
57:43
Example 3: Select the Best Way to Complete This Sentence
1:03:23
Confidence Intervals for the Difference of Two Independent Means

55m 14s

Intro
0:00
Roadmap
0:14
Roadmap
0:15
One Mean vs. Two Means
1:17
One Mean vs. Two Means
1:18
Notation
2:41
A Sample! A Set!
2:42
Mean of X, Mean of Y, and Difference of Two Means
3:56
SE of X
4:34
SE of Y
6:28
Sampling Distribution of the Difference between Two Means (SDoD)
7:48
Sampling Distribution of the Difference between Two Means (SDoD)
7:49
Rules of the SDoD (similar to CLT!)
15:00
Mean for the SDoD Null Hypothesis
15:01
Standard Error
17:39
When can We Construct a CI for the Difference between Two Means?
21:28
Three Conditions
21:29
Finding CI
23:56
One Mean CI
23:57
Two Means CI
25:45
Finding t
29:16
Finding t
29:17
Interpreting CI
30:25
Interpreting CI
30:26
Better Estimate of s (s pool)
34:15
Better Estimate of s (s pool)
34:16
Example 1: Confidence Intervals
42:32
Example 2: SE of the Difference
52:36
Hypothesis Testing for the Difference of Two Independent Means

50m

Intro
0:00
Roadmap
0:06
Roadmap
0:07
The Goal of Hypothesis Testing
0:56
One Sample and Two Samples
0:57
Sampling Distribution of the Difference between Two Means (SDoD)
3:42
Sampling Distribution of the Difference between Two Means (SDoD)
3:43
Rules of the SDoD (Similar to CLT!)
6:46
Shape
6:47
Mean for the Null Hypothesis
7:26
Standard Error for Independent Samples (When Variance is Homogenous)
8:18
Standard Error for Independent Samples (When Variance is not Homogenous)
9:25
Same Conditions for HT as for CI
10:08
Three Conditions
10:09
Steps of Hypothesis Testing
11:04
Steps of Hypothesis Testing
11:05
Formulas that Go with Steps of Hypothesis Testing
13:21
Step 1
13:25
Step 2
14:18
Step 3
15:00
Step 4
16:57
Example 1: Hypothesis Testing for the Difference of Two Independent Means
18:47
Example 2: Hypothesis Testing for the Difference of Two Independent Means
33:55
Example 3: Hypothesis Testing for the Difference of Two Independent Means
44:22
Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means

1h 14m 11s

Intro
0:00
Roadmap
0:09
Roadmap
0:10
The Goal of Hypothesis Testing
1:27
One Sample and Two Samples
1:28
Independent Samples vs. Paired Samples
3:16
Independent Samples vs. Paired Samples
3:17
Which is Which?
5:20
Independent SAMPLES vs. Independent VARIABLES
7:43
Independent SAMPLES vs. Independent VARIABLES
7:44
T-tests Always…
10:48
T-tests Always…
10:49
Notation for Paired Samples
12:59
Notation for Paired Samples
13:00
Steps of Hypothesis Testing for Paired Samples
16:13
Steps of Hypothesis Testing for Paired Samples
16:14
Rules of the SDoD (Adding on Paired Samples)
18:03
Shape
18:04
Mean for the Null Hypothesis
18:31
Standard Error for Independent Samples (When Variance is Homogenous)
19:25
Standard Error for Paired Samples
20:39
Formulas that go with Steps of Hypothesis Testing
22:59
Formulas that go with Steps of Hypothesis Testing
23:00
Confidence Intervals for Paired Samples
30:32
Confidence Intervals for Paired Samples
30:33
Example 1: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
32:28
Example 2: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
44:02
Example 3: Confidence Intervals & Hypothesis Testing for the Difference of Two Paired Means
52:23
Type I and Type II Errors

31m 27s

Intro
0:00
Roadmap
0:18
Roadmap
0:19
Errors and Relationship to HT and the Sample Statistic?
1:11
Errors and Relationship to HT and the Sample Statistic?
1:12
Instead of a Box…Distributions!
7:00
One Sample t-test: Friends on Facebook
7:01
Two Sample t-test: Friends on Facebook
13:46
Usually, Lots of Overlap between Null and Alternative Distributions
16:59
Overlap between Null and Alternative Distributions
17:00
How Distributions and 'Box' Fit Together
22:45
How Distributions and 'Box' Fit Together
22:46
Example 1: Types of Errors
25:54
Example 2: Types of Errors
27:30
Example 3: What is the Danger of the Type I Error?
29:38
Effect Size & Power

44m 41s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Distance between Distributions: Sample t
0:49
Distance between Distributions: Sample t
0:50
Problem with Distance in Terms of Standard Error
2:56
Problem with Distance in Terms of Standard Error
2:57
Test Statistic (t) vs. Effect Size (d or g)
4:38
Test Statistic (t) vs. Effect Size (d or g)
4:39
Rules of Effect Size
6:09
Rules of Effect Size
6:10
Why Do We Need Effect Size?
8:21
Tells You the Practical Significance
8:22
HT can be Deceiving…
10:25
Important Note
10:42
What is Power?
11:20
What is Power?
11:21
Why Do We Need Power?
14:19
Conditional Probability and Power
14:20
Power is:
16:27
Can We Calculate Power?
19:00
Can We Calculate Power?
19:01
How Does Alpha Affect Power?
20:36
How Does Alpha Affect Power?
20:37
How Does Effect Size Affect Power?
25:38
How Does Effect Size Affect Power?
25:39
How Does Variability and Sample Size Affect Power?
27:56
How Does Variability and Sample Size Affect Power?
27:57
How Do We Increase Power?
32:47
Increasing Power
32:48
Example 1: Effect Size & Power
35:40
Example 2: Effect Size & Power
37:38
Example 3: Effect Size & Power
40:55
Section 11: Analysis of Variance
F-distributions

24m 46s

Intro
0:00
Roadmap
0:04
Roadmap
0:05
Z- & T-statistic and Their Distribution
0:34
Z- & T-statistic and Their Distribution
0:35
F-statistic
4:55
The F Ratio (the Variance Ratio)
4:56
F-distribution
12:29
F-distribution
12:30
s and p-value
15:00
s and p-value
15:01
Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity?
18:33
Example 2: F-distributions
19:29
Example 3: F-distributions and Heights
21:29
ANOVA with Independent Samples

1h 9m 25s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
The Limitations of t-tests
1:12
The Limitations of t-tests
1:13
Two Major Limitations of Many t-tests
3:26
Two Major Limitations of Many t-tests
3:27
Ronald Fisher's Solution… F-test! New Null Hypothesis
4:43
Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)
4:44
Analysis of Variance (ANoVA) Notation
7:47
Analysis of Variance (ANoVA) Notation
7:48
Partitioning (Analyzing) Variance
9:58
Total Variance
9:59
Within-group Variation
14:00
Between-group Variation
16:22
Time out: Review Variance & SS
17:05
Time out: Review Variance & SS
17:06
F-statistic
19:22
The F Ratio (the Variance Ratio)
19:23
S²bet = SSbet / dfbet
22:13
What is This?
22:14
How Many Means?
23:20
So What is the dfbet?
23:38
So What is SSbet?
24:15
S²w = SSw / dfw
26:05
What is This?
26:06
How Many Means?
27:20
So What is the dfw?
27:36
So What is SSw?
28:18
Chart of Independent Samples ANOVA
29:25
Chart of Independent Samples ANOVA
29:26
Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users?
35:52
Hypotheses
35:53
Significance Level
39:40
Decision Stage
40:05
Calculate Samples' Statistic and p-Value
44:10
Reject or Fail to Reject H0
55:54
Example 2: ANOVA with Independent Samples
58:21
Repeated Measures ANOVA

1h 15m 13s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
The Limitations of t-tests
0:36
Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?
0:37
ANOVA (F-test) to the Rescue!
5:49
Omnibus Hypothesis
5:50
Analyze Variance
7:27
Independent Samples vs. Repeated Measures
9:12
Same Start
9:13
Independent Samples ANOVA
10:43
Repeated Measures ANOVA
12:00
Independent Samples ANOVA
16:00
Same Start: All the Variance Around Grand Mean
16:01
Independent Samples
16:23
Repeated Measures ANOVA
18:18
Same Start: All the Variance Around Grand Mean
18:19
Repeated Measures
18:33
Repeated Measures F-statistic
21:22
The F Ratio (The Variance Ratio)
21:23
S²bet = SSbet / dfbet
23:07
What is This?
23:08
How Many Means?
23:39
So What is the dfbet?
23:54
So What is SSbet?
24:32
S² resid = SS resid / df resid
25:46
What is This?
25:47
So What is SS resid?
26:44
So What is the df resid?
27:36
SS subj and df subj
28:11
What is This?
28:12
How Many Subject Means?
29:43
So What is df subj?
30:01
So What is SS subj?
30:09
SS total and df total
31:42
What is This?
31:43
What is the Total Number of Data Points?
32:02
So What is df total?
32:34
So What is SS total?
32:47
Chart of Repeated Measures ANOVA
33:19
Chart of Repeated Measures ANOVA: F and Between-samples Variability
33:20
Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability
35:50
Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos?
40:25
Hypotheses
40:26
Significance Level
41:46
Decision Stage
42:09
Calculate Samples' Statistic and p-Value
46:18
Reject or Fail to Reject H0
57:55
Example 2: Repeated Measures ANOVA
58:57
Example 3: What's the Problem with a Bunch of Tiny t-tests?
1:13:59
Section 12: Chi-square Test
Chi-Square Goodness-of-Fit Test

58m 23s

Intro
0:00
Roadmap
0:05
Roadmap
0:06
Where Does the Chi-Square Test Belong?
0:50
Where Does the Chi-Square Test Belong?
0:51
A New Twist on HT: Goodness-of-Fit
7:23
HT in General
7:24
Goodness-of-Fit HT
8:26
Hypotheses about Proportions
12:17
Null Hypothesis
12:18
Alternative Hypothesis
13:23
Example
14:38
Chi-Square Statistic
17:52
Chi-Square Statistic
17:53
Chi-Square Distributions
24:31
Chi-Square Distributions
24:32
Conditions for Chi-Square
28:58
Condition 1
28:59
Condition 2
30:20
Condition 3
30:32
Condition 4
31:47
Example 1: Chi-Square Goodness-of-Fit Test
32:23
Example 2: Chi-Square Goodness-of-Fit Test
44:34
Example 3: Which of These Statements Describe Properties of the Chi-Square Goodness-of-Fit Test?
56:06
Chi-Square Test of Homogeneity

51m 36s

Intro
0:00
Roadmap
0:09
Roadmap
0:10
Goodness-of-Fit vs. Homogeneity
1:13
Goodness-of-Fit HT
1:14
Homogeneity
2:00
Analogy
2:38
Hypotheses About Proportions
5:00
Null Hypothesis
5:01
Alternative Hypothesis
6:11
Example
6:33
Chi-Square Statistic
10:12
Same as Goodness-of-Fit Test
10:13
Set Up Data
12:28
Setting Up Data Example
12:29
Expected Frequency
16:53
Expected Frequency
16:54
Chi-Square Distributions & df
19:26
Chi-Square Distributions & df
19:27
Conditions for Test of Homogeneity
20:54
Condition 1
20:55
Condition 2
21:39
Condition 3
22:05
Condition 4
22:23
Example 1: Chi-Square Test of Homogeneity
22:52
Example 2: Chi-Square Test of Homogeneity
32:10
Section 13: Overview of Statistics
Overview of Statistics

18m 11s

Intro
0:00
Roadmap
0:07
Roadmap
0:08
The Statistical Tests (HT) We've Covered
0:28
The Statistical Tests (HT) We've Covered
0:29
Organizing the Tests We've Covered…
1:08
One Sample: Continuous DV and Categorical DV
1:09
Two Samples: Continuous DV and Categorical DV
5:41
More Than Two Samples: Continuous DV and Categorical DV
8:21
The Following Data: OK Cupid
10:10
The Following Data: OK Cupid
10:11
Example 1: Weird-MySpace-Angle Profile Photo
10:38
Example 2: Geniuses
12:30
Example 3: Promiscuous iPhone Users
13:37
Example 4: Women, Aging, and Messaging
16:07

Lecture Comments (4)

0 answers

Post by kabongo mpoyi on February 1, 2017

I think since you are using Asian, Latino, it is also better to use African and European instead of black and white. Using the term black or white is not a good terminology.
Thanks

0 answers

Post by Jethro Buber on October 8, 2014

Is this one way or two way ANOVA? I yet to learn the differences if any.

0 answers

Post by George Kumar on May 15, 2012

Latinos can be of different colors (e.g. Brazil)
Asians can be of different colors (e.g. India)
Suggest using the term Cacausians and African-Americans instead of White and Black.

0 answers

Post by samer hanna on November 4, 2011

good job Dr. Jo

ANOVA with Independent Samples

Lecture Slides are screen-captured images of important points in the lecture. Students can download and print out these lecture slide images to do practice problems as well as take notes while watching the lecture.

  • Intro 0:00
  • Roadmap 0:05
    • Roadmap
  • The Limitations of t-tests 1:12
    • The Limitations of t-tests
  • Two Major Limitations of Many t-tests 3:26
    • Two Major Limitations of Many t-tests
  • Ronald Fisher's Solution… F-test! New Null Hypothesis 4:43
    • Ronald Fisher's Solution… F-test! New Null Hypothesis (Omnibus Test - One Test to Rule Them All!)
  • Analysis of Variance (ANoVA) Notation 7:47
    • Analysis of Variance (ANoVA) Notation
  • Partitioning (Analyzing) Variance 9:58
    • Total Variance
    • Within-group Variation
    • Between-group Variation
  • Time out: Review Variance & SS 17:05
    • Time out: Review Variance & SS
  • F-statistic 19:22
    • The F Ratio (the Variance Ratio)
  • S²bet = SSbet / dfbet 22:13
    • What is This?
    • How Many Means?
    • So What is the dfbet?
    • So What is SSbet?
  • S²w = SSw / dfw 26:05
    • What is This?
    • How Many Means?
    • So What is the dfw?
    • So What is SSw?
  • Chart of Independent Samples ANOVA 29:25
    • Chart of Independent Samples ANOVA
  • Example 1: Who Uploads More Photos: Unknown Ethnicity, Latino, Asian, Black, or White Facebook Users? 35:52
    • Hypotheses
    • Significance Level
    • Decision Stage
    • Calculate Samples' Statistic and p-Value
    • Reject or Fail to Reject H0
  • Example 2: ANOVA with Independent Samples 58:21

Transcription: ANOVA with Independent Samples

Hi, welcome to educator.com.

We are going to talk about ANOVA with independent samples today.

So first we need to talk a little bit about why we need to introduce the ANOVA.

We have been doing so well with t-tests so far.

Well, there are some limitations of the t-test, and that is why we are going to need an ANOVA here.

ANOVA stands for analysis of variance, and the analysis of variance can also be thought of as the omnibus hypothesis test.

So it is still a hypothesis test, just like the t-test, but it is the omnibus hypothesis test, and we are going to talk about what that means.

We are going to need to go over a little bit of notation in order to break down the ANOVA in detail.

And then we are really going to get to the nitty-gritty of partitioning or analyzing variance, breaking variance apart into its component parts.

Then we are going to build up the F statistic, made up of those bits and pieces of variance, and then finally talk about how that relates to the F distribution and hypothesis testing.

Okay, so first thing: the limitations of the t-test.

Well, here is a common problem. Say I want to know the answer to this question.

Who uploads more pictures to Facebook?

The Latino users, white users, Asian users, or black Facebook users?

Which of these racial or ethnic groups uploads more pictures to Facebook?

Well, let us see what would happen if we used independent samples t-tests.

What would we have to do?

Well, we would have to compare Latinos to whites, Latinos to Asians, Latinos to blacks, whites to Asians, whites to blacks, and Asians to blacks.

All of a sudden we have to do 6 different independent samples t-tests.

That is a lot of tiny, tiny little t-tests, and really, the more t-tests you do, the more you increase your likelihood of a type 1 error.

Previously, to calculate the type 1 error rate we looked at one minus the probability that you would be correct, and that would be something like .05, let's say, right?

But now that we want to calculate the probability of a type 1 error across six t-tests, we have to think back to our probability principles, and it is going to look something like this.

One minus your correct rate raised to the sixth power, that is, 1 - (correct rate)^6, and that is going to be a much, much higher type 1 error rate than you really want.

So the problem is that the more t-tests you have, the bigger the chance of a type 1 error, and you can think about this even non-mathematically.

Any time you do a t-test you could reject the null, and every time you reject the null you have the possibility of making a type 1 error. So if you have six chances to reject the null, you have increased your type 1 error rate, because you are just testing more null hypotheses.
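
To put a number on that, here is a minimal sketch of the arithmetic. It assumes a per-test alpha of .05 and treats the six tests as independent, which is only an approximation when the same samples are reused across comparisons; the variable names are illustrative.

```python
from math import comb

alpha = 0.05               # per-test type 1 error rate (".05, let's say")
groups = 4                 # Latino, white, Asian, black
tests = comb(groups, 2)    # number of pairwise t-tests among 4 groups

# Chance of at least one false alarm across all the tests,
# assuming (as a simplification) that the tests are independent.
familywise = 1 - (1 - alpha) ** tests

print(tests)                  # 6
print(round(familywise, 3))   # 0.265
```

So under these assumptions, six little t-tests at .05 each carry roughly a 26% chance of at least one type 1 error.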

So you should know there are two major limitations of having many, many tiny little t-tests.

Say you have six separate t-tests. One problem is the increased likelihood of a type 1 error, and that is bad.

We do not want a false alarm. But there is a second problem: you are not using the full set of data in order to estimate S.

Remember how before we talked about how S is an estimate of the population standard deviation?

Well, it would be nice if we had a good estimate of the population standard deviation. And you know when you have a better estimate of the population standard deviation?

When you have more data, right? When you do a t-test with, for instance, Latinos and white people, you are ignoring your luscious and totally usable data from your Asian and black American samples.

So that is a problem: you are ignoring some of your data in order to estimate S, and you are estimating S a bunch of different little times instead of having one sort of giant estimate of S, which would be a better way to go. So both of these are major limitations of using many, many little t-tests.

So back in the day, statisticians knew that there was this problem. Ronald Fisher came up with a solution, and his solution is called the F test, F for Fisher.

If you think of a new statistic, you can name it after yourself.

So he thought of something called an F test, but this F test also includes a new way of thinking about hypotheses. The F test can also be thought of as an omnibus test, and the way you can think about it is like the ring in The Lord of the Rings.

It is one test to rule them all: instead of doing many, many tiny little tests, you do one test to decide once and for all if there is a difference.

And because you have this one test, you need one null hypothesis, and here is what that null hypothesis is.

You need to test whether all the samples belong to the same population or whether at least one belongs to a different population. Remember, the null hypothesis and the alternative hypothesis have to be like two sides of the same coin, so your null hypothesis is that they are all equal.

The mu's are all equal.

They all came from exactly the same population.

The other hypothesis, the alternative hypothesis, is that they are not all equal, but let us think about what that means.

That means at least two of them are different from each other. It does not mean that all four of them are different from each other; it means at least one guy is different from one of these other guys.

That is it, that is all it means, that is all you can find out.

So let us consider this situation. Let us say you have these three samples.

Your null hypothesis would be that they all came from the same population.

A1, A2, and A3 all come from the same population A. But if we reject that null hypothesis, what have we found out?

What we have found out is that at least two of them differ. All three of them could differ from each other, or it could just be 2.

It could be that A1 and A2 are the same, but A3 is different.

It could be that A2 and A3 are the same, but A1 is different.

It could mean that A1 is totally different from A2, and that is totally different from A3.

Any of those is a possibility. So here is the good thing.

The good thing about the omnibus hypothesis is that you can test all of these things at once.

That they all come from the same population: you can test that big hypothesis at once.

The bad thing about it is that if you reject the null, it still does not tell you which populations differ.

It only tells you that at least one of the populations is different.

So when you reject the null, it is not quite as informative, but still it is a very useful test.

So we need to know some notation before we go on.

ANOVA stands for analysis of variance, and sometimes you might see it written with a little o, as ANoVA. We want to analyze the variance, and when we want to analyze the variance we have to really think hard about what variance means.

Variance is sort of the average spread around some mean, so how much spread you have.

Are you really tightly clustered around the mean, or are you really dispersed around the mean?

Okay, so first things first: consider all the data that we get from all the different groups.

What we have to do is group all the data from all the different groups and look at the variance around the grand mean, and the grand mean is a new idea.

The grand mean is not just the mean of one sample; the grand mean is the mean of everybody lumped together.

Pretend there are not three groups; pretend there is just one giant group that all three data sets have been sort of poured into.

What is the mean of that giant group?

That is called the grand mean. So for instance, here are our samples.

Our sample from A1, our sample from A2, our sample from A3, and when you have sample means, here is what the notation looks like.

It should be pretty familiar: X bar sub A1, X bar sub A2, X bar sub A3.

Now when we have a grand mean, we do not have three of them; we just have one, because remember, they are all lumped together, right?

How do we distinguish the grand mean? If we just say X bar, we might confuse it for a sample mean instead of the grand mean, right? So in order to signify the grand mean, which is the mean of all the means, the mean of all the samples, we call it X double bar, and that is how we know that it is the grand mean. So that is definitely one of the things you need to know.
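
As a small illustration of the notation, here is a sketch that computes the three sample means and the grand mean. The data are made up for the example and are not from the lecture; the names `samples`, `pooled`, and `weighted` are illustrative.

```python
from statistics import mean

# Made-up scores for three hypothetical samples A1, A2, A3.
samples = {
    "A1": [2, 4, 6],
    "A2": [5, 7, 9, 11],
    "A3": [10, 12],
}

# X-bar for each sample.
sample_means = {name: mean(xs) for name, xs in samples.items()}

# X-double-bar: lump every data point together, then take one mean.
pooled = [x for xs in samples.values() for x in xs]
grand_mean = mean(pooled)

# When the groups have unequal sizes, the grand mean equals the
# size-weighted average of the sample means, not their simple average.
weighted = sum(len(xs) * sample_means[name] for name, xs in samples.items()) / len(pooled)

print(sample_means)
print(grand_mean, weighted)
```

With equal group sizes the simple "mean of the means" and the pooled mean coincide; with unequal sizes, as here, only the weighted version matches.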

So now let us talk about partitioning or analyzing the variance.

When we are analyzing variance, what we want to start with is the total amount of variance.

First, we have to have the big thing before we chop it apart.

So what is the big thing? The big variance in the room is the total variance, and this is the variance of every single data point in our giant pool around the grand mean.

And we can actually work out how to write this as a formula just by knowing the grand mean as well as the variance formula, right? Variance is always the squared distance away from the mean divided by however many data points you have, to get the average squared distance from the mean.

Now we want the distance away from the grand mean, so I am going to go ahead and put that there: instead of X bar I have X double bar, and I put in my data points, so that would be X sub i.

And we want to get the sum of all of those and then divide by however many data points we have.

Usually N means the number of data points in a sample.

How do we write the N of everybody, of all your data points added together?

Here is how: you call it N sub total.

And this says it is not just the N of one of our little samples, because we have three little samples; I mean the N of everybody, the total number in your data set.

And so even with this X sub i, I do not really mean just the X's in sample 1; I mean every single data point, so I would say i goes from 1 all the way up to N total.

Sorry this is a little small, that is N sub total up here. So this will cycle through every single X, every single data point in your entire sample lumped together.

Get their distance away from the grand mean, square it, add those squared distances together, and divide by N total. So this is just the general idea of variance.

Average squared distance from the mean.

In this case, around the grand mean. And so how do we say total variance?

Well, it would be nice if we could say something like, oh, this is something sub total, right?
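
Written out, the formula described in words above would look roughly like this. It is a reconstruction from the spoken description, not a copy of the slide; $\bar{\bar{X}}$ is the grand mean and $N_{total}$ is the number of data points across all samples.

```latex
\text{total variance}
  = \frac{\displaystyle\sum_{i=1}^{N_{total}} \left( X_i - \bar{\bar{X}} \right)^2}{N_{total}}
```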

Before we go on to variance, though, I just want to stop here. Before we go into average variance, I just want to talk about this thing. What is this thing?

So let us talk about the sum of squares. Variance is always going to be the sum of squared distances, the sum of squares, divided by N, or if you are talking about S, S squared is the sum of squared distances over N minus 1, and another way of saying that is SS over degrees of freedom.

So we are just going to stop here for a second and just talk about this sum of squares, and we are going to call it sum of squares total.

So that is sum of squares total, and that is going to be important to us because later we are going to use these different sums of squares to then talk about variance.

Sum of squares is very much related to the idea of variance.

Now we have this total variance, because this is really the idea of how much you are varying.

We have this total variance, and we are going to partition it into two types of variance.

One is within-group variation and the other is between-group variation.

So we have 3 groups. The between-group variance is going to look at how different they are from each other.

The within-group variance is just going to look at how different the data are from their own group, and that is going to be important because this sum of squares total is actually made of sum of squares within plus sum of squares between.

So because of this idea we can really now see that we are taking all the variance and partitioning it into within-group variance and between-group variance, or between-sample variance.
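
The partition can be checked numerically. The sketch below uses made-up data for three groups (not from the lecture) and the size-weighted between-group sum of squares that the lecture builds up later on; it simply confirms that the two pieces add back up to the total.

```python
from math import isclose
from statistics import mean

# Made-up data for three hypothetical groups (not from the lecture).
groups = [[2, 4, 6], [5, 7, 9, 11], [10, 12]]

pooled = [x for g in groups for x in g]
grand_mean = mean(pooled)

# Total: every data point around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in pooled)

# Within: every data point around its own group's mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Between: each group mean around the grand mean, weighted by group size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

print(ss_total, ss_within, ss_between)
print(isclose(ss_total, ss_within + ss_between))   # True
```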

So first things first: within-group variance.

How do we get an idea of how different each sample is from itself?

Well, the idea is just like what we have been talking about before.

This is each sample's variance around its own mean, and we already know the notation for this mean.

So that would be something like: how much does everybody in sample A1 differ from the mean of A1? We take that difference, square it, and get the sum of squares.

And then we get the sum of squares for everybody in A2, and the same thing for everybody in A3.

So this is sort of the regular use of variance that we have used before, the regular use of sum of squares that we have used before.

Just looking at each sample's variance from its own sample mean.

Now how do we get between-group variance?

Between-group variance is going to be each sample's mean: how much does it vary from the grand mean? The squared difference from the grand mean. So there is some grand mean, and how much does each sample mean differ from that grand mean?

And so that is going to be between-group variation.

How much do the groups differ from that grand mean?

So first of all, let us just review variance and sum of squares.

Sum of squares is the idea that we are going to use over and over again, and it is just this idea that you are summing, that is the sigma sign, X minus X bar, squared.

So it is basically: take the squared distances away from the mean and add them up.

That is sum of squares.

Now what we are doing is sort of swapping out this idea of the mean for things like the grand mean or the sample mean, and we are also swapping out what our data points are.

Is this from N total, is it from all of the data points, is it just the N from the sample, is it the group means?

So we are swapping out these two ideas in order to get our sum of squares total, sum of squares between, or sum of squares within, but it is always the same idea.

Distances, squared, add them up.

Okay, now, so what is variance in relation to that?

Well, variance is the average squared distance, and so to get it we always take the sum of squares and divide by however many data points we have.

But often we are using S as an estimate instead of actually having the population standard deviation.

So we are going to be using degrees of freedom instead of just N, and we have different kinds of degrees of freedom for between- and within-group variation, so watch out for them.
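
To make the review concrete, here is a short sketch with made-up numbers showing one sum of squares divided two ways: by N for a population variance, and by the degrees of freedom N - 1 for the estimate S squared. The comparison against Python's `statistics.pvariance` and `statistics.variance` is only a sanity check.

```python
from math import isclose
from statistics import mean, pvariance, variance

data = [2, 4, 6, 8]    # made-up data points
m = mean(data)

# Sum of squares: squared distances from the mean, added up.
ss = sum((x - m) ** 2 for x in data)

n = len(data)
pop_var = ss / n            # SS / N: population variance
sample_var = ss / (n - 1)   # SS / df: the estimate S squared

# The standard library uses the same two divisors.
print(isclose(pop_var, pvariance(data)))      # True
print(isclose(sample_var, variance(data)))    # True
```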

Okay, now let us go back to the idea of the F statistic.

Now that we have broken it down a little bit in terms of what kinds of different variances there are, hopefully the F statistic makes a little more sense.

The idea is that you want to take the ratio of the between-group or between-sample variance over the within-group variance. The reason we want this particular ratio is that we are actually very interested in the between-group difference; that is what our hypothesis test is all about, whether the groups are different or the same.

The within-group variation, we cannot account for.

It is variation that is just inherent in the system, and so we need to compare the between-group variation, which we care about, with the within-group variation, which we cannot explain. We do not have any explanation for it, at least not in this hypothesis test; we would have to do other tests to figure that out.

Okay, so now that we are here, what we need to do is replace these conceptual ideas with some of the things that we have been learning about.

In particular, the variance between and the variance within. For variance we are going to use S squared, so it is S squared between over S squared within.

So variance between over variance within, but now we have refreshed a little bit: what is variance about, and how can we break it down in terms of sum of squares?

Well, that is what we are going to do.

We are going to double-click on this guy, and here is what we see inside.

We see the sum of squares between divided by the degrees of freedom between, all over the sum of squares within divided by the degrees of freedom within, and this is how we are going to actually calculate our F statistic.

Now, we will write out the formulas for each of these, but it is good to know where the F statistic comes from, its conceptual root. You always want to be able to go back there.

Because ultimately when we have a large F, we want to be able to say, this means there is a larger between-group variation relative to the within-group variation.

A larger difference in the thing that we are interested in over the variance that we have no explanation for.
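
In symbols, the ratio just described can be written as follows. The subscripts follow the slide titles (bet, w); the layout is a reconstruction from the spoken description.

```latex
F = \frac{s^2_{bet}}{s^2_{w}}
  = \frac{SS_{bet} \,/\, df_{bet}}{SS_{w} \,/\, df_{w}}
```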

Okay, so now let us figure out how to break down this idea, and remember, this idea is really the breakdown of the variance between.

So we are breaking down the broken-down thing.

So conceptually, what is this?

Well, conceptually this is the difference of each sample mean from the grand mean. So imagine our little groups, and there is some grand mean that all of these guys contributed to, but they each have a little sample mean of their own.

What I want to do is take the differences between these, square them, then add them up.

That is the idea behind this.

So first of all, how many means do we have? How many data points do we have?

Well, we have a data point for every sample that we have, so how many means do we have?

Or, how many samples do we have?

We actually have a term for that.

The letter that we reserve for how many samples there are is K, the number of samples.

And so you can think, okay, if that is the number of samples, then what might be the degrees of freedom here?

Well, it is just going to be K - 1, and here is why.

In order to get the grand mean we could take a weighted average of these means, and since there are three of them, if we knew the grand mean and two of them in advance, the third one would not be free to vary; that third one is locked down.

So the degrees of freedom is K - 1.

Okay so what is the actual sum of squares between and know you need to take into1454

consideration how many actual data points are in each group.1463

For instance, group one might have a lot of data point or two might only have a few data points which means should matter more.1468

Well that can be taken into account.1476

So first things first, how do we tell it get the difference between this mean and this mean.1479

That is going to be this.1486

X bar minus X double bar so get them the difference between the mean and the grand mean.1489

Now we several means here so I am going to put an I for index and in my sum of squares my I is1497

going to go from one up through K so for each group that I have I want you to get this distance and square it.1507

I am not going to stop there; I also want you to make it count a lot if it has a lot of data1515

points, so if this guy has a lot of data points he should get more votes; his difference from the1526

grand mean should count more than this guy's difference, and that is what we get by multiplying1531

by N: if N is very large, this distance is going to count a lot; if N is very small, this distance is not going to count as much.1538

And this is the sum of squares between, so that is the idea.1546
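The weighted sum just described can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used in the lesson; the function name `ss_between` and the toy numbers are made up.

```python
from statistics import mean

def ss_between(groups):
    """Sum over groups of n_i * (group mean - grand mean) ** 2."""
    all_points = [x for g in groups for x in g]
    grand_mean = mean(all_points)            # mean of every data point
    # Each squared distance is weighted by that group's size n_i.
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Hypothetical toy data: three groups of unequal size.
toy = [[1, 2, 3], [4, 5, 6, 7], [10, 12]]
k = len(toy)              # number of samples, K
df_between = k - 1        # degrees of freedom between
print(ss_between(toy), df_between)
```

Because the squared distance is multiplied by the group size, the four-point group pulls harder on the total than the two-point group would at the same distance.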

Okay, so now we actually know this and this, so we could actually create this guy by putting these two together.1554

Now let us talk about sum of squares within now that we know sum of squares between pretty well.1563

Well, first thing we need to know is that this idea sum of squares within divided by degrees of1582

freedom within is actually going to give us the variance within.1587

Let us talk about what this means conceptually.1593

This means the spread of all the data points from their own sample mean.1596

So this is the picture I want you to think of.1604

So every group has its own little sample mean, its own X bar, and here are my1610

little data points, and I want to get the distance of each point away from its own set's mean.1620

This is going to give me the within group variation.1629

Well, we need to think about first, how many data points do we have?1635

Well, we have N total, because you need to count all of these data points; you need to add them all up.1643

That is the total number of data points.1652

So what are the degrees of freedom?1656

Well, it is not just N total - 1.1659

How many means did we find?1661

We found three means, and each time we calculate a mean we lose a degree of freedom, so it is1663

really N total minus the number of means that we calculated, and here it is 3 because we have three groups.1674

Remember, we have a letter for how many groups we have, and that is K, so it is really going to1684

be N total minus K, the number of groups, and that is going to give us the degrees of freedom within.1689

So what is the sum of squares within?1698

The sum of squares within is really going to be the sum of squares here, plus the sum of squares here, plus the sum of squares here.1701

So for each group just get the sum of squares.1713

That is a pretty easy idea, so the sum of squares within is just adding up all the sums of squares.1718

Now what does this I mean?1728

It means the sum of squares for each group, with I going from one to K, so for however many1730

groups you have, get that group's sum of squares, add it to the next group's sum of squares, add it to1740

the next group's sum of squares; these are general formulas that work for two groups, three1746

groups, four groups. So that is the sum of squares within, and now that we know this and this, we could calculate this.1751
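As a rough sketch of that idea, the within-group sum of squares is each group's own sum of squares, added together. The function name `ss_within` and the toy data are assumptions for illustration only.

```python
from statistics import mean

def ss_within(groups):
    """Add up each group's sum of squared distances from its own mean."""
    total = 0.0
    for g in groups:
        m = mean(g)                               # this group's own sample mean
        total += sum((x - m) ** 2 for x in g)     # this group's sum of squares
    return total

# Hypothetical toy data: three groups of unequal size.
toy = [[1, 2, 3], [4, 5, 6, 7], [10, 12]]
n_total = sum(len(g) for g in toy)    # count every data point
k = len(toy)                          # one mean was calculated per group
df_within = n_total - k               # degrees of freedom within
print(ss_within(toy), df_within)
```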

So now let us put it all together all at once.1764

My apologies because this may look a little bit tiny on your screen, but hopefully you could sort of1770

reconstruct it from what you have seen before, because I am writing the same formulas just in a1781

different format to show you how they all relate to each other.1786

So first, conceptually; this is always important because you can forget the formula, but do not1789

forget the concept, because from the concept you could reconstruct the formula.1796

It does take a little bit of mental work, but you can do it.1800

So first things first, the whole idea of the F is the between group variation over the within group variation.1803

So that is the whole idea right there and in order to get that we are going to get the variation between over the variability within.1817

Actually, I wrote this in the wrong place, should have written it down in the formula section.1831

So F equals the variability between divided by the variability within.1839

So that is the F.1852

Now for the F you cannot just calculate the sums of squares, because really, the F is made up of a1856

bunch of squares, and for F you actually need two different degrees of freedom, and those are going to be1861

determined by the between group degrees of freedom and the within group degrees of freedom.1865

So these I am just going to leave empty.1871

Now let us talk about between group variability.1873

The big idea of this is the spread of the sample means around the grand mean.1876

So I am going to put S there: the spread of X bars around the grand mean.1891

That is what we are really looking for, that idea of the spread of all the sample means around the grand mean.1897

However, the within group variability is the spread of data points from their own sample mean.1904

So for each little group, what is the spread there?1920

So that is the idea of these two things.1923

Now in order to break it down into the formula, you first want to get into what S squared1928

between is, so if you double-click on that, it takes you here, and if you double-click on this one, it takes you here, to S squared within.1935

So the variance between, the between group variability, is going to be just the very basic idea of variance.1943

Sum of squares over degrees of freedom.1955

Same thing here, sum of squares over degrees of freedom.1958

That stuff you already know; the only difference is a little "between" here and a little "within" here.1963

Once you get there, you could break this down and say sum of squares1973

between, and if you forget what the formula is, you can look up here: spread of X bars around the grand mean.1978

So X bar minus grand mean.1988

You know you have a whole bunch of them, so it is a sum of squares, and you are going to go from 1 up1990

through K; that is how many sample means you have.2000

And you want it to be weighted.2005

Your distance counts more if you have more data points in your2008

sample, and then the degrees of freedom is fairly straightforward.2016

It is the number of means minus 1, because when you find your grand mean it is going to limit2021

one of those guys, so your degrees of freedom is lessened by one.2030

So for sum of squares within, let us go back to this idea of the spread of all the data points away2034

from their own sample mean, and that is just going to be all those sums of squares for each little2043

group, and you already know the formula for that, added together.2051

So I goes from one up to K.2055

And the degrees of freedom is really just this idea that you have all these data2058

points, N total, minus however many means you found, because that is going to limit the2072

degrees of freedom for those data points, and that is K.2080
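Written out in one place, the relationships described in this part of the lecture look like the following. The notation ($n_i$ for the size of group $i$, $SS_i$ for that group's own sum of squares) is a reconstruction from the spoken description, not a copy of the slide.

```latex
\begin{aligned}
F &= \frac{s^2_{\text{between}}}{s^2_{\text{within}}} \\[4pt]
s^2_{\text{between}} &= \frac{SS_{\text{between}}}{df_{\text{between}}},
\qquad
SS_{\text{between}} = \sum_{i=1}^{K} n_i \left(\bar{X}_i - \bar{\bar{X}}\right)^2,
\qquad
df_{\text{between}} = K - 1 \\[4pt]
s^2_{\text{within}} &= \frac{SS_{\text{within}}}{df_{\text{within}}},
\qquad
SS_{\text{within}} = \sum_{i=1}^{K} SS_i,
\qquad
df_{\text{within}} = N_{\text{total}} - K
\end{aligned}
```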

One other thing I want to say right here is that you might see in your2083

textbook or in a statistics package this idea called mean squared error.2091

So this term right here is sometimes going to be called the mean squared error term, so that is a common thing that you might see.2099

This may be called mean squared between, or you might just see the mean square between2112

groups or something like that, so "between groups" might be written out.2126

But almost always this denominator is going to be called mean squared error.2130

The reason I want to mention it here is not only to connect this lesson with whatever is going on2135

in your classes but also because mean squared error will be an important term later on when2142

we get to other kinds of ANOVA.2148

So now let us get to examples.2151

So first, who uploads more photos:2156

people of unknown ethnicity, Latino, Asian, black, or white Facebook users?2158

So what is the null hypothesis? And sorry, you might be like, how will I ever know?2164

This data set is found in your downloads.2172

The download looks like this, and it shows how many photos each person has uploaded, so here is2176

uploaded photos: this person has uploaded 892 photos and their ethnicity is zero.2185

And zero is just a stand-in for the term unknown or blank, so they may have left it blank.2191

So the Latino sample is 1, the Asian sample is 2, the black or African-American sample is 3, and the white or European-American sample is 4.2198

And so you can look through that data set; I kind of recoded that just so that we can easily see where we are.2210

Okay let us start off with our hypotheses.2217

The omnibus hypothesis, the hypothesis to rule them all, right: the null hypothesis should say that all2222

of these ethnicities, even unknown, are all the same when it comes to uploading photos.2231

So our mu of ethnicity zero: I call this zero, and I call these 1, 2, 3, 4, only because that is what is in the data set.2239

The mu of ethnicity 0 equals the mu of ethnicity 1, equals the mu of ethnicity 2, equals the mu of ethnicity 3, equals the mu of ethnicity 4.2251

So we could say this in order to say, look, they are all the same mathematically.2265

So this is how you write out that idea of they are all the same, they all came from the same population.2276

The reason we want to use E0, E1, E2 is just that it is going to make it a lot easier for us to write2281

the alternative hypothesis, and this also helps us keep in mind why we are comparing the different groups.2291

What is the variable they differ on? The variable is ethnicity, and they all differ on that2298

variable; they have different values of it, and that is the between-subjects variable, so at least in2307

our sample people are either Latino or Asian or black or white, although they can be more than one, just not in our sample.2315

So the alternative hypothesis is that the mu sub E are not all the same, not all equal.2323

We do not actually put "does not equal", because we do not know whether it is these two that are not equal2346

or these two that are not equal or this one and this one that are not equal, right?2363

So we do not make those claims, and that is why you do not want to write those not-equal signs;2367

you want to just write a sentence that the means are not all the same.2371
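In symbols, the pair of hypotheses just described could be written as follows. The subscripts $E0$ through $E4$ follow the coding used in the data set; the $H_0$ and $H_1$ labels are the usual convention and are assumed here.

```latex
\begin{aligned}
H_0 &: \mu_{E0} = \mu_{E1} = \mu_{E2} = \mu_{E3} = \mu_{E4} \\
H_1 &: \text{the } \mu_{E} \text{ are not all equal}
\end{aligned}
```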

Now to decide on a significance level: just like before, let us decide on a significance level of .05; it is commonly accepted.2376

And because we are going to be calculating an F statistic, we are going to be comparing it to this alpha.2384

So it is always one tail, always only on the positive tail, and so this is the F distribution.2397

Okay now let us talk about the decision stage so in the decision stage you want to draw the F distribution, just like I did so here is alpha, here is zero.2404

We need to find the critical F, but in order to find the critical F we actually need to know the two2419

different degrees of freedom, because this distribution is going to be different based on those two degrees of freedom.2434

So we need to know the degrees of freedom in the numerator, which in this case is the degrees of2441

freedom between, and the degrees of freedom in the denominator, and that is going to be the2448

degrees of freedom within; we could actually calculate those.2457

The degrees of freedom between is K - 1, and here our K is 1, 2, 3, 4, 5, so K equals 5, five groups, so that is going to2460

be a degrees of freedom of 4, and the degrees of freedom within is going to be N total minus K.2473

And so let us see how many we have total.2484

So we could just do a count. If you go down here, I have actually sort of filled it in for2488

you a little bit just so that it is nice and formatted; I used E1, 2, 3, 4, 5, but really one of them should be E0.2500

So K is five, we have five different groups, and the degrees of freedom between is going to be 5 - 1.2511

For the degrees of freedom within, we are going to need to know the total number of data points we2520

have, so we need to count all the data points that we have.2527

All these different data points, minus K, so here is K.2531

So that is 94, so apparently we have 99 people in our sample.2541

So then we can find the critical F.2547

Now once we have the degrees of freedom between and the degrees of freedom within, and here just2550

to remind you, this is the numerator and this is the denominator degrees of freedom.2555

Once we have that, you can look it up in the back of your book.2561

Look for the F distribution chart or table; usually the columns and2564

rows will say degrees of freedom numerator and degrees of freedom2574

denominator, and then you could use both to look up your critical F at 5%, or you can look it up in Excel.2580

And the way we do that is by using FINV, because FDIST will give you the probability, whereas in FINV you put in the probability and get the F value.2594

So the probability is .05, only one tail, so we do not have to worry about that.2607

The first degrees of freedom we are looking for is the numerator one and the second degrees of2611

freedom we are looking for is the denominator one.2615

And so when we look at that we see 2.47 to be the critical F.2620

So your critical F is 2.47, and so we need an F value greater than that, or a P value less than .05, in2633

order to reject our null hypothesis that they are all the same, all come from the same population.2644
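For readers without Excel, here is a minimal standard-library sketch of the same two lookups. It is not what the lecture uses: the upper-tail probability is computed from the regularized incomplete beta function (a textbook continued-fraction method), and the critical value is found by bisection. The names `f_sf` and `f_critical` are made up for this sketch.

```python
from math import exp, lgamma, log

TINY = 1e-300

def _guard(v):
    """Keep a continued-fraction term away from exact zero."""
    return TINY if abs(v) < TINY else v

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even term
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd term
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f, df1, df2):
    """Upper-tail probability P(F >= f), the role FDIST plays in the lecture."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def f_critical(alpha, df1, df2):
    """F value with upper-tail area alpha, the role FINV plays in the lecture."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:     # widen until the tail area drops below alpha
        hi *= 2.0
    for _ in range(100):                  # bisection on the decreasing tail area
        mid = (lo + hi) / 2.0
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df_between, df_within = 5 - 1, 99 - 5
print(round(f_critical(0.05, df_between, df_within), 2))   # about 2.47
```

The same `f_sf` function gives a P value once a sample F has been calculated.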

Okay so step 4 in our same question.2650

We need to calculate the sample statistic as well as the P value, so in order to calculate the2658

sample statistic we need to calculate F, because F is the only test statistic that will help us rule out our omnibus hypothesis.2666

Remember that is going to be the variance between over the variance within.2675

And once we get our F, then we can find the P value at that F.2681

So what is the probability of getting an F value that big or bigger given that the null hypothesis is true.2688

And we want that P value to be very small.2697

So let us go ahead and go to our example.2700

Example 1, and here I have already put in these formulas for you, but one thing that I like to do for2706

myself is tell myself what I need, and so I need this, and then I break it down one2715

row at a time: the next row is going to be SS between over the degrees of freedom2722

between, and then I can find each one of those things separately, and then I also am going to2730

break down the variance within into the sum of squares within and degrees of freedom within, and I break those down.2736

Okay so first things first, I want to find the variance between, but in order to do that I need to find the2743

sum of squares between, and that is this idea that I get every mean, so I need the mean for every2750

single one of these groups, the mean for unknown, the mean for Latino users, for Asian users, and2758

so on and so forth, and I need to find the grand mean.2764

I need to find the squared distances between those guys.2768

Okay so first, I need to know how many people are in each particular sample.2770

So let us find the count of E0.2781

That is our zero ethnicity, for unknown people.2785

So I am going to count those people, and then I am also going to count E1, and also going to count2791

E2, I am also going to count my E3, and finally I am going to count my E4.2807

Now these are the same data points that I am going to be using over and over again, so what I am2830

going to do is lock down my data points.2845

Say, use this data whenever I am talking about E0.2848

Use this data whenever I am talking about E1.2854

Use this data whenever I talk about E2, use this data whenever I talk about E3, and use this data when I talk about E4.2862

Now the nice thing about this is that you could see that they almost all have 20 data points in each sample.2879

The only one that differs is the unknown ethnicity sample, and it is just off by one.2891

So, what is the mean of each sample?2900

One thing I could do is just copy and paste across, but2904

I do not want to get the count anymore, I want to get the average.2915

So once I do that I could just type in average instead of count, which saves me a little bit of work, and I find all these X bars, the X bars for 0, 1, 2, 3, 4.2918

Now let us find the grand mean.2941

The grand mean is going to be the mean for everybody, so that is going to be the average of every single data point that we have.2944

And we really only need to find the grand mean once.2951

If you want you could just point to the grand mean and copy and paste that down; it should be the2962

same grand mean over and over again, or you could just refer to this top one every single time.2972

So now let us put together our N times the distance squared before we add them all up.2978

So we have N times the distance, X bar minus the grand mean, squared, and that is a huge2990

number, and now we are going to sum them all up.3004

Equal sign, sum, and I want to sum all of this up.3011

I get this giant number, 8 million something.3019

So huge number.3023

So once I have that I can just put a pointer here.3025

I just put equal sign and point to this sum.3031

And that is really the sum of squares between.3035

What about degrees of freedom between, have I already found that?3039

Yes I have, I found it up here.3047

So I am not going to calculate that again I am just going to point to it.3049

Once I have these two, now I can get the variance between groups.3054

So it is the sum of squares divided by the degrees of freedom.3060

We get a giant number, and that makes sense: if you take 8 million something and divide by 4, you get 2 million something.3067

It is still a giant number but is it more giant than the variance within?3072

I do not know, let us see.3080

So in order to find the variance within then I need to find the sum of squares within as well as the degrees of freedom within.3082

So how do I find sum of squares within?3087

Well, one thing I could do is go to each data point, subtract its group's mean from3093

it, square it, and add them all up, or I could use a little trick.3101

I might use a little trick.3107

So just to remind you.3113

So here is my little trick.3113

So remember, the variance of anything is going to be the sum of squares divided by N - 1.3116

So if I find the variance and I multiply it by N - 1, I get my sum of squares; I could do variance times N - 1.3129

I could use that trick if I use Excel.3146

So here is what I am going to do.3152

I am going to find the variance.3157

First it might be easier if I copied these.3159

Just so that I do not have to go and select those.3166

If I find the variance and then I multiply it by N - 1, I get my sum of squares.3171

I am just working backwards from what I know about variance.3186

So I am going to do that same thing here and get my variance and multiply it by N - 1.3189

Get my variance multiplied by N - 1, and finally variance multiplied by N - 1.3199
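The shortcut can be checked on a small made-up sample: the sample variance (the version that divides by N - 1) multiplied back by N - 1 should equal the sum of squares computed the long way. The numbers below are invented for illustration.

```python
from statistics import mean, variance

sample = [3, 7, 8, 12, 15]       # hypothetical data, not from the lesson
n = len(sample)
m = mean(sample)

# Long way: squared distance of each point from the sample mean, added up.
ss_direct = sum((x - m) ** 2 for x in sample)

# Shortcut: statistics.variance divides by n - 1, so multiply it back.
ss_shortcut = variance(sample) * (n - 1)

print(ss_direct, ss_shortcut)
```

The shortcut only works with the N - 1 version of the variance; a population variance that divides by N would have to be multiplied by N instead.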

Obviously you do not have to do this; you could go ahead and actually compute the sum of squares3234

for each set of data, but that would take up a lot of room and typically more time, so if Excel is3243

handy to you then I really highly recommend the shortcut, and then we will just want to sum all these guys up.3251

That is the sum of all the sums of squares, and we get this giant number.3258

We get 42 million, a really large number.3263

But our degrees of freedom within is also a larger number than our degrees of freedom between.3279

And so if I find my variance within, then let us see.3287

Is this smaller or bigger?3295

Well, we see that this number, 450,000, is a smaller number than 2 million, so that is looking good for our F statistic.3297

So our F statistic is the variance between divided by the variance within, and we get 4.48, and3312

that is quite a bit larger than our critical F of 2.47. I have forgotten to put a place for the P value,3323

but let us calculate the P value here; in order to calculate the P value we use FDIST, and we put in3334

the F value and the degrees of freedom for the numerator as well as the degrees of freedom for the denominator.3343

And we get P = .002, quite a bit smaller than .05.3353

So that is a good thing, so in step five we reject the null.3362

But which group is different? Or are multiple groups different from each other?3366

We just know that the groups are not all the same; that is all we know.3374

Okay so we got a P value equals .002 so we rejected the null hypothesis.3378

Here is the thing, remember at the end of this, we still do not know who is who, we just know that somebody is different.3390

At the end of this, what you want to do is something like little t-tests on pairs of groups.3398

They are often called contrasts, and you want to do that in order to figure out3405

which group actually differs from which other group, not just whether some group differs from3414

some other group, and so you want to do a little bit more after you do this.3420

These are called post hoc tests.3425

And in a lot of ways they are very similar to t-tests where you look at pairs.3427

There is one change: they change the P value that you are looking for. But you want3439

to do the post hoc tests afterwards to do all the little comparisons so that you can figure out who is different from who.3446
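The lecture does not say which adjustment the post hoc tests use. As one common possibility, assumed here purely for illustration, a Bonferroni correction divides the overall alpha by the number of pairwise comparisons. The group labels follow Example 1.

```python
from itertools import combinations

groups = ["E0", "E1", "E2", "E3", "E4"]    # the five samples in Example 1
pairs = list(combinations(groups, 2))      # every pair of groups, each once

alpha = 0.05
# Bonferroni correction (an assumption, not stated in the lecture):
# each individual comparison is held to a stricter threshold.
alpha_per_test = alpha / len(pairs)

print(len(pairs), alpha_per_test)
```

With five groups there are ten pairs, which is why the threshold for each little comparison has to be tighter than .05.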

But you are only allowed to do a post hoc test if you rejected the null hypothesis.3452

So you are not allowed to do a post hoc test if you have not rejected the null hypothesis; that is why3457

we cannot just jump to that step from the very beginning.3464

So the first thing we need to do if we reject is post hoc tests.3468

The second thing you need to do is find the effect size.3472

In the case of an F test, you are not going to find something like Cohen's d or Hedges' g.3475

You are not going to find that kind of effect size.3486

You are going to find what is called eta squared.3488

Eta squared looks like an N, squared.3490

And eta squared is going to give you an idea of the effect size.3495

Now let us go to example 2.3499

So again the data is provided in your download.3504

A pharmaceutical company wants to know whether a new drug has the side effect of causing patients to become jittery.3508

Three randomly selected samples of patients were given three mild doses of the drug:3513

0, 100, and 200 mg, and they were also given a finger tapping exercise.3518

Does this drug affect finger tapping behaviour?3523

Well, this one I did not format really nicely for you because I want you to sort of figure it out as you go, but do not worry, I will do this with you.3527

So first things first, the omnibus hypothesis.3536

That is that all three dosages are the same, so mu of dosage 0 = mu of dosage 100 = mu of dosage 200.3538

And the alternative hypothesis is that the mu of the dosages are not all the same.3563

Okay step 2.3575

Alpha = .05.3579

Step three, the decision stage: how do we make our decision to reject or fail to reject?3581

First you want to draw that F distribution and colour in that alpha = .05; that is the error rate we are willing to tolerate.3591

Now what is our critical F?3603

In order to find our critical F we need to know the degrees of freedom between and the degrees of freedom within.3607

So if you go to the worksheet for example 2, then you can see this data set.3615

Now usually this is not the way data is set up, especially if you use SPSS or some of these other statistics packages.3627

Usually you will see the data for one person on one line, just like this.3635

Just like example 1: the data for one person, their ethnicity and their photos, are on one line.3641

You will rarely see this layout, but you may see it in textbooks, so I do want you to pay attention3649

to that. Here, in this problem, different people were given the different dosages, so you3655

could assume each square to be a different person.3660
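To make the two layouts concrete, here is a small sketch that turns one-person-per-line data into one list per dosage group. The tap counts are invented; only the dosage levels come from the example.

```python
from collections import defaultdict

# Hypothetical "one person per line" rows: (dosage in mg, finger taps).
rows = [(0, 10), (100, 12), (200, 15), (0, 11), (100, 13), (200, 14)]

# Regroup into one list of scores per dosage, like the columns on the worksheet.
groups = defaultdict(list)
for dosage, taps in rows:
    groups[dosage].append(taps)

k = len(groups)                                    # number of groups
n_total = sum(len(v) for v in groups.values())     # total number of data points
print(dict(groups), k, n_total)
```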

So, we are on step three, the decision stage, and in order to figure out our critical F, we need to know3663

the degrees of freedom between and degrees of freedom within. This is not so pretty anymore;3677

it takes a long time to put all the little fancy things in there, but it is very easy.3683

So for degrees of freedom between, it would be really helpful if we knew K,3695

how many groups, and there are three groups, three different drug dosages.3699

So it is K - 1, a degrees of freedom of 2.3705

In order to find degrees of freedom within we need to know N total.3710

How many total data points do we have?3716

And we could easily find that in Excel using count and selecting all our data points, so there are 30 people, 10 people in each group.3719

So that is going to be N total minus K.3730

That should be 27.3736

Once we know that we can find our critical F using FINV: the probability is .05, the degrees of freedom3738

for the numerator is going to be the degrees of freedom between, the degrees of freedom for the3747

denominator is the degrees of freedom within, and we get 3.35 as our critical F.3752
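This particular lookup can be checked by hand, because when the numerator degrees of freedom is exactly 2 the upper tail of the F distribution has a simple closed form, P(F > f) = (1 + 2f / df_within) ** (-df_within / 2). Solving that for f gives the sketch below; the function name is made up, and the formula does not apply to other numerator degrees of freedom.

```python
def f_critical_df1_is_2(alpha, df_within):
    """Critical F when the numerator df is 2 (closed form, special case only)."""
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)

k, n_total = 3, 30                # three dosages, 30 people in total
df_between = k - 1                # 2
df_within = n_total - k           # 27
f_crit = f_critical_df1_is_2(0.05, df_within)
print(round(f_crit, 2))           # 3.35
```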

Note that this is a larger critical F than before, when we had more data points.3760

We had over 90 data points in the other example, and that brought down our critical F.3767

Now let us go to step 4, step 4 we need to calculate the sample F as well as the P value.3772

Let us talk about how you do F.3784

Here we need the variance, variance between divided by the variance within.3786

How do we find the variance between?3795

Well that is going to be the sum of squares between divided by the degrees of freedom between.3797

How do we find sum of squares between?3805

Well remember, the idea of it is going to be the means for each group, distance from that mean3809

to the grand mean, square that distance, weight that distance by how many N we have, and then add them all up.3816

So in order to get that, I will put it down here, and we will put in the other stuff too.3826

The variance within is going to be the sum of squares within divided by the degrees of3837

freedom within, just so I know how much room I have to work with.3844

Okay so first it might be helpful to know which means we are talking about: the dosages, so it is3849

D0, D100, and D200; those are three different groups.3857

What is the N for each of these groups, what is the X-bar for each of these groups, what is the3865

grand mean and then we want to look at N times X bar minus the grand mean, we want to3872

square that and then once we have that, now we want to add these up and so I will put sum here3883

just so that I can remember to add them up.3894

Okay so the N for all of these is 10, we already know that, so let us find the X-bar.3897

So this is the average of this column, and then the next one is the same3906

thing, the average, except for column B, and the next one is the average again, for column C, for 200.3922

How do we find the grand mean?3934

We find the average of everything, and we could put a little pointer so that they all have the same grand mean.3937

Now we could calculate the weighted distance squared for each of these group means.3952

So it is N times X bar minus the grand mean, squared.3962

And once you have that you could just drag it all the way down, and here we sum these all up.3970

We sum these weighted differences up and we get a sum of squares of 394.3983

And we already know the degrees of freedom between groups, so we could put this in, divided by this number.3994

We get 197.4006

Now let us see.4009

Is it going to be bigger or smaller than the variance within? In order to find the4011

variance within, it helps to just conceptually remember, okay, what is the sum of squares4018

within? It is the sum of squares for each of these groups from their own means.4022

And so the sum of squares for each of these dosages is going to be, and I am just going to use4029

that shortcut, the variance for this set multiplied by nine, that is N - 1.4041

And I am just going to take that and do it for the second column as well as the third column.4058

And once I do that I just want to sum these all up, and I get 419.4073

So now I have my sum of squares within.4082

I divide that by my degrees of freedom within and I get 15.53, and even before we do anything4085

we could see, wow, that variance between is a lot bigger than the variance within.4095

So we divide and we get 12.69, about 12.7, and that is much larger than the critical F that we set.4099

What is the P value for this?4111

We use FDIST: we put in the F value we got, the degrees of freedom between, and the degrees of freedom within, and we get a P value of .0001.4114
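The arithmetic for this example can be retraced from the rounded sums of squares quoted above. Because 394 and 419 are rounded, the intermediate values differ slightly from the worksheet (15.52 here versus 15.53). The P value uses a closed form that holds only when the numerator degrees of freedom is 2, which happens to be the case here.

```python
# Rounded values quoted in the lecture for Example 2.
ss_between, df_between = 394.0, 2
ss_within, df_within = 419.0, 27

ms_between = ss_between / df_between     # variance between: 197
ms_within = ss_within / df_within        # variance within: roughly 15.5
f_stat = ms_between / ms_within          # roughly 12.7

# Upper-tail probability for an F with 2 numerator df (special case only):
# P(F > f) = (1 + 2 f / df_within) ** (-df_within / 2)
p_value = (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)

print(round(f_stat, 2), p_value)
```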

So we are pretty sure that there is a difference between these 3 groups in terms of finger tapping.4131

We just do not know what that difference is.4138

So step five would be to reject the null, and once we have decided to reject the null, we would go on to4140

do post hoc tests as well as calculate the effect size.4153
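The lecture names eta squared but does not give its formula. A common definition, assumed here, is the between-group sum of squares as a share of the total sum of squares. Using the rounded sums of squares from this example:

```python
ss_between = 394.0     # rounded value quoted in the lecture
ss_within = 419.0      # rounded value quoted in the lecture

# Assumed definition: eta squared = SS between / (SS between + SS within).
ss_total = ss_between + ss_within
eta_squared = ss_between / ss_total

print(round(eta_squared, 3))
```

Under that definition, a little under half of the variability in finger tapping in this sample is associated with dosage.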

So that is one-way ANOVA with independent samples.4157

Thanks for using educator.com.4163
