Chapter 10 Chi-Square Test of Independence

A Chi-Square Test of Independence evaluates whether two categorical variables are statistically independent of one another by comparing the conditional proportions of one variable across the levels of the other.

10.1 Overview

The last statistical test that we studied (Analysis of Variance or ANOVA) involved examining the relationship between a categorical explanatory variable and a quantitative response variable. Next, we will consider inference in the context of relationships between two categorical variables, corresponding to case C->C. The Chi-Square Test of Independence allows us to examine our observed data and to evaluate whether we have enough evidence to conclude, with a reasonable level of certainty (p<0.05), that two categorical variables are related.
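
To make the comparison of observed data against what independence would predict more concrete, here is a minimal sketch in plain Python. It assumes the standard definitions (expected count = row total × column total / grand total, and the statistic as the sum of (observed − expected)² / expected over all cells). The 2x3 table of counts and the function names are hypothetical, chosen only for illustration.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows = response levels, columns = explanatory levels.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
# Degrees of freedom = (number of rows - 1) * (number of columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(statistic, 3), degrees_of_freedom)
```

The larger the gap between observed and expected counts, the larger the statistic. The statistical packages in the Syntax section turn the statistic and its degrees of freedom into a p value.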

10.2 Lesson

Learn about the Chi-Square Test of Independence. Consider the null and alternative hypotheses when using this test. See how the fit between the observed data and the data expected under the null hypothesis plays a role in our evaluation of statistical significance. Learn how to run a Chi-Square Test of Independence and interpret results within the context of real data. Understand the critical role of post hoc tests for interpreting Chi-Square results when our explanatory variable includes more than two groups or levels. See how adjusted post hoc tests keep multiple comparisons from increasing the likelihood of rejecting the null hypothesis when it is actually true, a problem known as Type I error. Practice controlling for Type I error using the Bonferroni adjustment.


10.3 Syntax

SAS

proc freq;
    tables CategResponseVar*CategExplanatoryVar / chisq;
run;

For post hoc tests, select pairs of explanatory-variable levels one at a time and compare each p value against a Bonferroni-adjusted significance level.

*at the end of the data step; 
if (CategExplanatoryVar = 1) or (CategExplanatoryVar = 3);

*following the data step;
proc freq;
    tables CategResponseVar*CategExplanatoryVar / chisq;
run;

R

myChi <- chisq.test(myData$CategResponseVar,
myData$CategExplanatoryVar)
myChi
myChi$observed # for actual, observed cell counts
prop.table(myChi$observed, 2) # for column percentages
prop.table(myChi$observed, 1) # for row percentages

# with post hoc test
library(fifer) # provides chisq.post.hoc()
myChi <- chisq.test(myData$CategResponseVar,
                    myData$CategExplanatoryVar)
observed_table <- myChi$observed
chisq.post.hoc(observed_table, popsInRows=FALSE,
               control="bonferroni")[,1:2]

Python

import pandas
import scipy.stats

# contingency table of observed counts
ct1 = pandas.crosstab(myData['CategResponseVar'],
                      myData['CategExplanatoryVar'])
print('chi-square value, p value, degrees of freedom, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)
# column percentages
colsum=ct1.sum(axis=0)
colpct=ct1/colsum
print(colpct)

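# with post hoc test
# repeat for each pair of explanatory-variable levels (here 1 and 3),
# then compare each p value against the Bonferroni-adjusted alpha
sub1 = myData[myData['CategExplanatoryVar'].isin([1, 3])]
ct1 = pandas.crosstab(sub1['CategResponseVar'],
                      sub1['CategExplanatoryVar'])
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)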
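
The Bonferroni adjustment used with these paired comparisons can be sketched in plain Python. This is an illustration rather than part of the course syntax: the function name, the four levels, and the p values are all hypothetical, and the family-wise error calculation assumes the paired tests are independent.

```python
from itertools import combinations


def bonferroni_pairs(levels, alpha=0.05):
    """Return every pair of levels and the Bonferroni-adjusted alpha."""
    pairs = list(combinations(levels, 2))
    adjusted_alpha = alpha / len(pairs)
    return pairs, adjusted_alpha


# Hypothetical explanatory variable with four levels -> six paired comparisons.
levels = [1, 2, 3, 4]
pairs, adjusted_alpha = bonferroni_pairs(levels)

# Chance of at least one false rejection if every pair were tested at 0.05
# without adjustment (assumes independent tests).
familywise_error = 1 - (1 - 0.05) ** len(pairs)

# Hypothetical p values, one per paired Chi-Square test (illustrative only).
p_values = {
    (1, 2): 0.20, (1, 3): 0.004, (1, 4): 0.03,
    (2, 3): 0.01, (2, 4): 0.60, (3, 4): 0.009,
}
significant = [pair for pair in pairs if p_values[pair] < adjusted_alpha]

print(len(pairs), round(adjusted_alpha, 4), round(familywise_error, 3))
print(significant)
```

In this made-up example, several pairs fall below 0.05, but only one falls below the adjusted threshold of 0.05 divided by the number of comparisons.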

Stata

tab CategResponseVar CategExplanatoryVar , chi2 row col

//with post hoc test
tab CategResponseVar CategExplanatoryVar if ///
CategExplanatoryVar==1 | CategExplanatoryVar==3, chi2

SPSS

CROSSTABS/TABLES= CategResponseVar by
CategExplanatoryVar /STATISTICS=CHISQ.

*with post hoc test.
TEMPORARY.
SELECT IF CategExplanatoryVar=1 OR CategExplanatoryVar=3.
CROSSTABS TABLES= CategResponseVar by CategExplanatoryVar
/STATISTICS=CHISQ.

10.4 Assignment

Run a Chi-Square Test of Independence using a categorical response variable and a categorical explanatory variable (your response variable must have only two levels). State your research question, the null and alternative hypotheses, and whether the Chi-Square value is statistically significant. Submit your output/results and describe them. If your original test was significant and your explanatory variable has more than two levels (i.e., you were comparing more than two groups), you will also need to analyze and interpret post hoc paired comparisons.