Chapter 12 Moderation

“The world is one big data problem.”    — Andrew McAfee

12.1 Overview

You have learned about specific inferential tools (i.e., ANOVA, Chi-Square Test of Independence, and Pearson Correlation) that allow us to use inferential principles to evaluate associations between variables, whether those variables are measured categorically or quantitatively. You will now learn to examine a bivariate relationship to determine whether it depends on a third variable (i.e., whether the relationship differs across population subgroups). This is known as moderation (or statistical interaction). In statistics, moderation occurs when the relationship between two variables depends on a third variable, which is referred to as the moderating variable or simply the moderator. The effect of a moderating variable is often characterized statistically as an interaction; that is, the third variable affects the direction and/or strength of the relationship between your explanatory and response variables.
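
To make the idea concrete, here is a minimal Python sketch using made-up data. The data frame `demo` and its columns `x`, `y`, and `group` are hypothetical and exist only for illustration. The relationship between `x` and `y` is built to be positive in one subgroup and negative in the other, so the overall correlation looks weak even though each subgroup shows a strong association.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data (illustrative only): the x-y relationship is positive in
# group "A" and negative in group "B", so "group" acts as a moderator.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=2 * n)
group = np.repeat(["A", "B"], n)
slope = np.where(group == "A", 1.0, -1.0)
y = slope * x + rng.normal(scale=0.5, size=2 * n)
demo = pd.DataFrame({"x": x, "y": y, "group": group})

# Correlation when the third variable is ignored
overall_r, overall_p = stats.pearsonr(demo["x"], demo["y"])
print("overall: r = %.2f" % overall_r)

# Correlation within each level of the third variable
subgroup_r = {}
for level, sub in demo.groupby("group"):
    r, p = stats.pearsonr(sub["x"], sub["y"])
    subgroup_r[level] = r
    print("group %s: r = %.2f, p = %.4f" % (level, r, p))
```

When the subgroup correlations differ in direction or strength like this, the third variable is a candidate moderator.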

12.2 Lesson

Learn about moderation. Consider how we can test the bivariate relationships that we examined with ANOVA, Chi-Square Test of Independence, and Pearson Correlation to determine whether the relationship differs across levels of a categorical third variable. Consider the null and alternative hypotheses when testing for moderation. Understand how examining moderation (i.e., statistical interaction) is a way to better understand your data. Click on a video lesson below.


SAS                     R                     Python                     Stata                     SPSS


12.3 Syntax

12.3.1 moderation for ANOVA

SAS

proc sort;
    by CategThirdVar;

proc anova;
    class CategExplanatoryVar;
    model QuantResponseVar=CategExplanatoryVar;
    means CategExplanatoryVar;
    by CategThirdVar;

R

by(myData, myData$CategThirdVar, function(x)
list(aov(QuantResponseVar ~ CategExplanatoryVar, data = x), summary(aov( QuantResponseVar ~ CategExplanatoryVar, data = x))))

Python

# subset by categorical 3rd variable
import statsmodels.formula.api as smf

sub2 = myData[myData['CategThirdVar'] == 'Group 1']
sub3 = myData[myData['CategThirdVar'] == 'Group 2']

# one-way ANOVA (fit as an OLS model) within each subgroup
model2 = smf.ols(formula='QuantResponseVar ~ C(CategExplanatoryVar)', data=sub2).fit()
print(model2.summary())
model3 = smf.ols(formula='QuantResponseVar ~ C(CategExplanatoryVar)', data=sub3).fit()
print(model3.summary())

STATA

bys CategThirdVar: oneway QuantResponseVar CategExplanatoryVar, tab

SPSS

SORT CASES BY CategThirdVar.
SPLIT FILE LAYERED BY CategThirdVar.
ONEWAY QuantResponseVar BY CategExplanatoryVar
/ STATISTICS DESCRIPTIVES
/ POSTHOC = BONFERRONI ALPHA (0.05).
SPLIT FILE OFF.
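
As a Python alternative to writing one subset per group by hand, the sketch below loops over every level of the third variable and runs a one-way ANOVA within each. It assumes `scipy.stats.f_oneway` in place of the OLS model shown in the Python template above. The helper name `anova_by_group` and the data frame `toy` are hypothetical, and the numbers are made up.

```python
import pandas as pd
from scipy import stats


def anova_by_group(data, response, explanatory, third):
    """One-way ANOVA of `response` on `explanatory` within each level of `third`."""
    results = {}
    for level, sub in data.groupby(third):
        # One array of response values per category of the explanatory variable
        samples = [g[response].dropna().to_numpy()
                   for _, g in sub.groupby(explanatory)]
        f_stat, p_value = stats.f_oneway(*samples)
        results[level] = (f_stat, p_value)
    return results


# Made-up data: the group means differ in "Group 1" but not in "Group 2"
toy = pd.DataFrame({
    "CategThirdVar": ["Group 1"] * 6 + ["Group 2"] * 6,
    "CategExplanatoryVar": ["a", "a", "a", "b", "b", "b"] * 2,
    "QuantResponseVar": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0,
                         4.0, 5.0, 6.0, 4.5, 5.5, 5.0],
})

anova_results = anova_by_group(toy, "QuantResponseVar",
                               "CategExplanatoryVar", "CategThirdVar")
for level, (f_stat, p_value) in anova_results.items():
    print("%s: F = %.2f, p = %.4f" % (level, f_stat, p_value))
```

Comparing the F statistics and p values across levels shows whether the association holds in every subgroup or only in some.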

12.3.2 moderation for chi-square test of independence

SAS

proc sort;
    by CategThirdVar;

proc freq;
    tables CategResponseVar*CategExplanatoryVar/chisq;
    by CategThirdVar;

R

by(myData, myData$CategThirdVar, function(x)
list(chisq.test(x$CategResponseVar, x$CategExplanatoryVar),
chisq.test(x$CategResponseVar, x$CategExplanatoryVar)$observed,
prop.table(chisq.test(x$CategResponseVar,
x$CategExplanatoryVar) $observed, 2))) # column %s

Python

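# subset by categorical 3rd variable
import pandas
import scipy.stats

sub2 = myData[myData['CategThirdVar'] == 'Group 1']
sub3 = myData[myData['CategThirdVar'] == 'Group 2']

# contingency table and chi-square test within each subgroup
ct2 = pandas.crosstab(sub2['CategResponseVar'], sub2['CategExplanatoryVar'])
print(ct2)
print(scipy.stats.chi2_contingency(ct2))  # chi-square, p value, df, expected counts
ct3 = pandas.crosstab(sub3['CategResponseVar'], sub3['CategExplanatoryVar'])
print(ct3)
print(scipy.stats.chi2_contingency(ct3))  # chi-square, p value, df, expected counts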

STATA

bys CategThirdVar: tab CategResponseVar CategExplanatoryVar, chi2 row

SPSS

CROSSTABS/TABLES=CategResponseVar by CategExplanatoryVar
by CategThirdVar
/ CELLS = COUNT ROW
/ STATISTICS = CHISQ.
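
The Python sketch below gathers the pieces that the R template reports (observed counts, column proportions, and the chi-square test) for every level of the third variable in one loop. The helper names `chi_square_by_group` and `make_rows` and the data frame `toy` are hypothetical, and the counts are made up. Note that `scipy.stats.chi2_contingency` applies a continuity correction to 2x2 tables by default.

```python
import pandas as pd
from scipy import stats


def chi_square_by_group(data, response, explanatory, third):
    """Chi-square test of independence within each level of `third`."""
    results = {}
    for level, sub in data.groupby(third):
        counts = pd.crosstab(sub[response], sub[explanatory])
        col_pct = counts / counts.sum(axis=0)  # proportions within each column
        chi2, p_value, dof, expected = stats.chi2_contingency(counts)
        results[level] = {"counts": counts, "col_pct": col_pct,
                          "chi2": chi2, "p": p_value, "dof": dof}
    return results


def make_rows(third, explanatory, response, n):
    """Build n identical observations (used only to create made-up data)."""
    return [{"CategThirdVar": third,
             "CategExplanatoryVar": explanatory,
             "CategResponseVar": response}] * n


# Made-up counts: strong association in "Group 1", none in "Group 2"
rows = (make_rows("Group 1", "a", "yes", 18) + make_rows("Group 1", "a", "no", 2)
        + make_rows("Group 1", "b", "yes", 2) + make_rows("Group 1", "b", "no", 18)
        + make_rows("Group 2", "a", "yes", 10) + make_rows("Group 2", "a", "no", 10)
        + make_rows("Group 2", "b", "yes", 10) + make_rows("Group 2", "b", "no", 10))
toy = pd.DataFrame(rows)

chi_results = chi_square_by_group(toy, "CategResponseVar",
                                  "CategExplanatoryVar", "CategThirdVar")
for level, res in chi_results.items():
    print(level)
    print(res["col_pct"])
    print("chi-square = %.2f, df = %d, p = %.4f"
          % (res["chi2"], res["dof"], res["p"]))
```

Reading the column proportions side by side for each level makes it easier to describe how the association differs between subgroups.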

12.3.3 moderation for Pearson correlation

SAS

proc sort;
    by CategThirdVar;

proc corr;
    var QuantResponseVar QuantExplanatoryVar;
    by CategThirdVar;

R

by(myData, myData$CategThirdVar, function(x)
cor.test(x$QuantResponseVar, x$QuantExplanatoryVar))

Python

# subset by categorical 3rd variable
import scipy.stats

sub1 = myData[myData['CategThirdVar'] == 1]
sub2 = myData[myData['CategThirdVar'] == 2]
sub3 = myData[myData['CategThirdVar'] == 3]

# Pearson correlation (r, p value) within each subgroup
print(scipy.stats.pearsonr(sub1['QuantResponseVar'], sub1['QuantExplanatoryVar']))
print(scipy.stats.pearsonr(sub2['QuantResponseVar'], sub2['QuantExplanatoryVar']))
print(scipy.stats.pearsonr(sub3['QuantResponseVar'], sub3['QuantExplanatoryVar']))

STATA

bys CategThirdVar: corr QuantResponseVar QuantExplanatoryVar

SPSS

SORT CASES BY CategThirdVar.
SPLIT FILE LAYERED BY CategThirdVar.
CORRELATIONS
/ VARIABLES= QuantResponseVar QuantExplanatoryVar
/ STATISTICS DESCRIPTIVES.
SPLIT FILE OFF.

12.4 Assignment

Test whether a categorical third variable moderates your association of interest (i.e., your research question) using ANOVA, Chi-Square Test of Independence, or Pearson Correlation. Submit the program along with the corresponding output for both the bivariate relationship and the moderation analysis. Describe any similarities or differences in the association between your explanatory and response variables across the levels of your third variable (i.e., different population subgroups).