Chapter 9 Analysis of Variance

Analysis of Variance (ANOVA) evaluates whether the means of two or more groups are statistically different from each other.

9.1 Overview

You have learned about inferential statistics, the steps in hypothesis testing, and the role of probability in quantifying uncertainty when drawing conclusions about a population from information available in only a sample. In the next three chapters, you will be introduced to specific inferential tools that apply these inferential principles to evaluate associations between different types of variables (i.e., those measured categorically and/or quantitatively). Analysis of Variance (ANOVA) evaluates whether the means of two or more groups are statistically different from each other. This test is appropriate whenever you want to compare the means of a quantitative response variable across the groups of a categorical explanatory variable.

9.2 Lesson

Learn about the ANOVA F test. Consider the null and alternative hypotheses. See how variation among the sample means (between groups) and variation within the groups play a role in our evaluation of statistical significance. Learn how to run an ANOVA and interpret the results within the context of real data. Understand the critical role of post hoc tests for interpreting ANOVA results when the explanatory variable includes more than two groups or levels. See how post hoc tests allow us to avoid inflating the likelihood of rejecting the null hypothesis when it is actually true, a problem known as Type I error. Video lessons are available for SAS, R, Python, Stata, and SPSS.
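
The comparison of between-group and within-group variation behind the F test can be sketched by hand in a few lines of Python. This is a minimal illustration rather than the routine any of the packages actually runs; the function name one_way_anova_f and the three small groups of scores are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k

    # F is the ratio of the two mean squares
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical scores for three groups (illustration only)
group_a = [4, 5, 6]
group_b = [6, 7, 8]
group_c = [8, 9, 10]

f_value, df_between, df_within = one_way_anova_f([group_a, group_b, group_c])
print(f_value, df_between, df_within)  # 12.0 2 6
```

A large F value means the group means differ by more than the spread within the groups would suggest. Whether a given F is statistically significant still depends on the F distribution with the two degrees of freedom shown, which the software packages report as a p value.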


9.3 Syntax

SAS

proc anova;
    class CategExplanatoryVar;
    model QuantResponseVar=CategExplanatoryVar;
    means CategExplanatoryVar;
run;

*with Duncan post hoc test;
proc anova;
    class CategExplanatoryVar;
    model QuantResponseVar=CategExplanatoryVar;
    means CategExplanatoryVar /duncan;
run;

R

myAnovaResults <- aov(QuantResponseVar ~ CategExplanatoryVar, data = myData)
summary(myAnovaResults)

# with post hoc test
myAnovaResults <- aov(QuantResponseVar ~ CategExplanatoryVar, data = myData)
TukeyHSD(myAnovaResults)

Python

import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# myData is assumed to be a pandas DataFrame that has already been loaded
model1 = smf.ols(formula='QuantResponseVar ~ C(CategExplanatoryVar)', data=myData)
results1 = model1.fit()
print(results1.summary())

# with post hoc test
model1 = smf.ols(formula='QuantResponseVar ~ C(CategExplanatoryVar)', data=myData)
results1 = model1.fit()
print(results1.summary())
# drop rows with missing values before running the pairwise comparisons
sub1 = myData[['QuantResponseVar', 'CategExplanatoryVar']].dropna()
mc1 = multi.MultiComparison(sub1['QuantResponseVar'], sub1['CategExplanatoryVar'])
res1 = mc1.tukeyhsd()
print(res1.summary())

Stata

oneway QuantResponseVar CategExplanatoryVar, tabulate 
// with post hoc test
oneway QuantResponseVar CategExplanatoryVar, sidak

SPSS

UNIANOVA QuantResponseVar BY CategExplanatoryVar.
*with post hoc test.
UNIANOVA QuantResponseVar BY CategExplanatoryVar
/POSTHOC=CategExplanatoryVar (TUKEY)
/PRINT=ETASQ DESCRIPTIVE.

9.4 Assignment

Run an ANOVA using a quantitative response variable and a categorical explanatory variable. State your research question, the null and alternative hypotheses, and whether the ANOVA F value is statistically significant. Submit your output/results and describe them. When your original statistical test is significant and you are examining more than two groups (i.e., more than two levels of a categorical explanatory variable), you will also need to analyze and interpret post hoc paired comparisons.
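
To see why post hoc comparisons are used instead of a series of unadjusted two-group tests, the sketch below estimates how quickly the chance of at least one Type I error grows with the number of groups. It is a rough illustration only: the function name familywise_error_rate is hypothetical, and the formula assumes the pairwise comparisons are independent, which is not strictly true when they share groups.

```python
def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one false rejection across all pairwise tests.

    Assumes independent comparisons, so the result is an approximation.
    """
    # Number of distinct pairs of groups
    n_comparisons = n_groups * (n_groups - 1) // 2
    # Probability that at least one of the unadjusted tests rejects a true null
    rate = 1 - (1 - alpha) ** n_comparisons
    return n_comparisons, rate

for n_groups in (2, 3, 4, 5):
    n_comparisons, rate = familywise_error_rate(n_groups)
    print(n_groups, "groups:", n_comparisons, "comparisons, error rate about", round(rate, 3))
```

With two groups there is a single comparison and the rate stays at 0.05, which is why post hoc tests are only needed when the explanatory variable has more than two levels.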