Chapter 10 Chi-Square Test of Independence
A Chi-Square Test of Independence evaluates whether two categorical variables are statistically independent of one another, that is, whether the conditional proportions of one variable are the same across the levels of the other.
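To make "conditional proportions" concrete, here is a minimal Python sketch using a made-up 2 x 3 table of counts (the numbers are hypothetical and chosen only for illustration). Dividing each column by its column total gives the distribution of the response within each explanatory group. If the two variables were independent, every column would match the overall (marginal) distribution of the response.

```python
import numpy as np

# Hypothetical 2 x 3 table of counts:
# rows = two response levels, columns = three explanatory levels.
observed = np.array([[30, 45, 25],
                     [70, 55, 75]])

# Conditional proportions of the response within each explanatory group
# (column proportions): divide each column by its column total.
col_totals = observed.sum(axis=0)
conditional = observed / col_totals
print(conditional)

# Under independence, every column would equal this marginal distribution
# of the response (row totals divided by the grand total).
marginal = observed.sum(axis=1) / observed.sum()
print(marginal)
```

Here the first response level makes up 30%, 45%, and 25% of the three groups, compared with one third overall; the test asks whether gaps like these are larger than chance alone would produce.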
10.1 Overview
The last statistical test that we studied (Analysis of Variance, or ANOVA) examined the relationship between a categorical explanatory variable and a quantitative response variable. Next, we will consider inference in the context of relationships between two categorical variables, corresponding to case C->C. The Chi-Square Test of Independence allows us to examine our observed data and to evaluate whether we have enough evidence to conclude, with a reasonable level of certainty (typically p < 0.05), that two categorical variables are related.
10.2 Lesson
Learn about the Chi-Square Test of Independence. Consider the null and alternative hypotheses when using this test. See how the fit between the observed data and the data expected under the null hypothesis plays a role in our evaluation of statistical significance. Learn how to run a Chi-Square Test of Independence and interpret results within the context of real data. Understand the critical role of post hoc tests for interpreting Chi-Square results when our explanatory variable includes more than two groups or levels. See how adjusted post hoc tests keep us from inflating the likelihood of rejecting the null hypothesis when the null hypothesis is actually true, a problem known as Type I error. Practice controlling for Type I error using the Bonferroni adjustment.
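The fit between observed and expected data can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical 2 x 3 table, not the course's own code: each expected count is (row total x column total) / grand total, the Chi-Square value sums (observed - expected)^2 / expected over all cells, and the p value is the upper-tail area of a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom. The result is cross-checked against `scipy.stats.chi2_contingency`, the function used in the Python syntax section.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical observed counts laid out like a crosstab:
# rows = response levels, columns = explanatory levels.
observed = np.array([[30, 45, 25],
                     [70, 55, 75]])

# Marginal totals (keepdims so the shapes broadcast into a 2 x 3 grid)
row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
grand_total = observed.sum()

# Expected count in each cell if the two variables were independent
expected = row_totals * col_totals / grand_total

# Chi-Square value: squared gaps between observed and expected, scaled by expected
chi_sq = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom = (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p value: area in the upper tail of the chi-square distribution
p_value = chi2.sf(chi_sq, dof)
print(f"chi-square = {chi_sq:.3f}, df = {dof}, p = {p_value:.4f}")

# Cross-check with SciPy (its continuity correction only applies when df = 1)
stat, p, df, expected_scipy = chi2_contingency(observed)
print(f"scipy: chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```

For this made-up table the Chi-Square value is 9.75 with 2 degrees of freedom (p of about 0.0076), so we would reject the null hypothesis of independence at p < 0.05.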
10.3 Syntax
SAS
proc freq;
tables CategResponseVar*CategExplanatoryVar / chisq;
run;
*with post hoc test: select pairs one at a time (use Bonferroni);
*at the end of the data step;
if (CategExplanatoryVar = 1) or (CategExplanatoryVar = 3);
*following the data step;
proc freq;
tables CategResponseVar*CategExplanatoryVar / chisq;
run;
R
myChi <- chisq.test(myData$CategResponseVar, myData$CategExplanatoryVar)
myChi # for chi-square value, degrees of freedom, and p value
myChi$observed # for actual, observed cell counts
prop.table(myChi$observed, 2) # for column percentages
prop.table(myChi$observed, 1) # for row percentages
# with post hoc test
library(fifer)
myChi <- chisq.test(myData$CategResponseVar, myData$CategExplanatoryVar)
observed_table <- myChi$observed
chisq.post.hoc(observed_table, popsInRows=FALSE, control="bonferroni")[,1:2]
Python
import pandas
import scipy.stats
ct1 = pandas.crosstab(myData['CategResponseVar'], myData['CategExplanatoryVar'])
print('chi-square value, p value, degrees of freedom, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)
# column percentages
colsum = ct1.sum(axis=0)
colpct = ct1 / colsum
print(colpct)
# with post hoc test
# for each Chi Sq pair data subset (here levels 1 and 3, as in the other examples)
subset = myData[myData['CategExplanatoryVar'].isin([1, 3])]
ct1 = pandas.crosstab(subset['CategResponseVar'], subset['CategExplanatoryVar'])
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)
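The Python syntax above runs the post hoc test one pair at a time. A minimal sketch of looping over every pair and applying the Bonferroni adjustment follows; the data set is hypothetical, and the loop is an illustration rather than the course's prescribed method. With three explanatory levels there are three pairwise comparisons, so each p value is compared with 0.05 / 3 (about 0.0167) instead of 0.05.

```python
import itertools

import pandas
import scipy.stats

# Hypothetical data: a two-level response (1/0) and a three-level explanatory
# variable, 100 observations per group (30%, 45%, and 25% respond 1).
myData = pandas.DataFrame({
    'CategResponseVar': [1] * 30 + [0] * 70 + [1] * 45 + [0] * 55 + [1] * 25 + [0] * 75,
    'CategExplanatoryVar': [1] * 100 + [2] * 100 + [3] * 100,
})

levels = sorted(int(v) for v in myData['CategExplanatoryVar'].unique())
pairs = list(itertools.combinations(levels, 2))  # (1, 2), (1, 3), (2, 3)

# Bonferroni adjustment: divide the overall alpha by the number of comparisons
alpha = 0.05
adjusted_alpha = alpha / len(pairs)

results = {}
for a, b in pairs:
    # Keep only the two groups being compared, then rebuild the 2 x 2 table
    subset = myData[myData['CategExplanatoryVar'].isin([a, b])]
    ct = pandas.crosstab(subset['CategResponseVar'], subset['CategExplanatoryVar'])
    chi_sq, p, dof, expected = scipy.stats.chi2_contingency(ct)
    results[(a, b)] = p
    verdict = 'significant' if p < adjusted_alpha else 'not significant'
    print(f'{a} vs {b}: chi-square = {chi_sq:.3f}, p = {p:.4f} '
          f'({verdict} at adjusted alpha = {adjusted_alpha:.4f})')
```

With these made-up data, only the 2 vs 3 comparison is significant after adjustment; 1 vs 2 (p of about 0.04) would pass at 0.05 but not at the Bonferroni-adjusted threshold. SciPy applies Yates' continuity correction to 2 x 2 tables by default, so these p values can differ slightly from uncorrected Pearson chi-square p values.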
STATA
tab CategResponseVar CategExplanatoryVar , chi2 row col
//with post hoc test
tab CategResponseVar CategExplanatoryVar if ///
CategExplanatoryVar==1 | CategExplanatoryVar==3, chi2
SPSS
CROSSTABS/TABLES= CategResponseVar by
CategExplanatoryVar /STATISTICS=CHISQ.
*with post hoc test.
TEMPORARY.
SELECT IF CategExplanatoryVar=1 OR CategExplanatoryVar =3.
CROSSTABS /TABLES=CategResponseVar BY CategExplanatoryVar
/STATISTICS=CHISQ.
10.4 Assignment
Run a Chi-Square Test of Independence using a categorical response variable and a categorical explanatory variable (your response variable must have only two levels). State your research question, the null and alternative hypotheses, and whether the Chi-Square value is statistically significant. Submit your output and describe your results. If your original statistical test was significant and you were examining more than two groups (i.e., more than two levels of a categorical explanatory variable), you will also need to analyze and interpret post hoc paired comparisons.