Chapter 14 Multivariate Modeling
“All models are wrong, but some are useful.” — George E. P. Box
14.1 Overview
You have learned that a significant association or correlation does not imply causation. Next you will learn to determine statistically whether a third variable is a confounder, that is, whether it can account for a bivariate relationship you found to be significant using ANOVA, the Chi-Square Test of Independence, or Pearson correlation. Multivariate models allow us to statistically control for additional variables that may account for (confound) a significant bivariate association.
14.2 Lesson
Learn about confounding and multivariate models. Testing for likely confounding variables brings us slightly closer to establishing a cause-and-effect relationship in an observational study. Decide which model you need: multiple regression if your response variable is quantitative, or logistic regression if your response variable is categorical with 2 levels. Consider the evidence for whether a third variable is, or is not, a confounder. Click on a video lesson below.
SAS – R – Python – Stata – SPSS
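The idea of checking for a confounder can be sketched with simulated data. This is a minimal illustration, not part of the lesson materials: the variable names (age, exercise, blood_pressure) and all numbers are hypothetical, and the data are built so that age drives both of the other variables while exercise has no direct effect on blood pressure. The sketch fits the bivariate model first, then adds the third variable and compares the coefficient for the explanatory variable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data: age influences both exercise and blood
# pressure; exercise has no direct effect on blood pressure.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(50, 10, n)
exercise = 10 - 0.1 * age + rng.normal(0, 1, n)
blood_pressure = 90 + 0.8 * age + rng.normal(0, 5, n)
sim = pd.DataFrame({"age": age,
                    "exercise": exercise,
                    "blood_pressure": blood_pressure})

# Step 1: bivariate model (explanatory variable only)
bivariate = smf.ols("blood_pressure ~ exercise", data=sim).fit()

# Step 2: multivariate model that also controls for the third variable
adjusted = smf.ols("blood_pressure ~ exercise + age", data=sim).fit()

# Step 3: compare the coefficient and p-value for exercise in both models
for label, fit in [("bivariate", bivariate), ("adjusted", adjusted)]:
    print(label,
          "coefficient:", round(fit.params["exercise"], 3),
          "p-value:", round(fit.pvalues["exercise"], 4))
```

If the coefficient for the explanatory variable shrinks toward zero and loses significance once the third variable is in the model, that is evidence the third variable confounds the bivariate association. If the coefficient stays significant, the third variable does not account for it.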
14.3 Syntax
14.3.1 multiple regression
SAS
Code binary variables as yes = 1 and no = 2.
proc glm;
class CategExplanatoryVar CategThirdVar;
model QuantResponseVar=CategExplanatoryVar CategThirdVar QuantThirdVar
/solution;
run;
R
my.lm <- lm(QuantResponseVar ~ ExplanatoryVar + ThirdVar1 + ThirdVar2, data = myData)
summary(my.lm)
Python
import statsmodels.api
import statsmodels.formula.api as smf
# note that categorical explanatory/third variables have to be entered
# as C(CategVar)
lm1 = smf.ols('QuantResponseVar ~ ExplanatoryVar + C(CategThirdVar1) + QuantThirdVar',
              data=myData).fit()
print(lm1.summary())
Stata
reg QuantResponseVar ExplanatoryVar ThirdVar1 ThirdVar2
SPSS
REGRESSION
/DEPENDENT QuantResponseVar
/METHOD ENTER ExplanatoryVar ThirdVar1 ThirdVar2.
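To make the C(CategVar) note in the Python syntax concrete, here is a small sketch on simulated data. The data frame and its columns (hours_studied, school, exam_score) are hypothetical and exist only for illustration. It shows that wrapping a categorical variable in C() produces one coefficient per non-reference level, each compared against the reference level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data: a quantitative explanatory variable and a
# categorical third variable with three levels.
rng = np.random.default_rng(1)
n = 300
demo = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, n),
    "school": rng.choice(["A", "B", "C"], n),
})
school_effect = demo["school"].map({"A": 0.0, "B": 5.0, "C": -5.0})
demo["exam_score"] = (60 + 2 * demo["hours_studied"]
                      + school_effect + rng.normal(0, 3, n))

# C(school) tells the formula interface to treat school as categorical.
lm_demo = smf.ols("exam_score ~ hours_studied + C(school)", data=demo).fit()

# One row per term: school A is the reference level, so only B and C
# get their own coefficients.
print(lm_demo.params)
print(lm_demo.pvalues)
```

In the output, terms such as C(school)[T.B] are read as the difference in the response between that level and the reference level, holding the other variables constant.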
14.3.2 logistic regression
SAS
Code binary variables as yes = 1 and no = 2.
proc logistic;
class CategExplanatoryVar CategThirdVar;
model BinaryResponseVar=CategExplanatoryVar CategThirdVar QuantThirdVar;
run;
R
my.logreg <- glm(BinaryResponseVar ~ ExplanatoryVar + ThirdVar1 + ThirdVar2,
                 data = myData, family = "binomial")
summary(my.logreg) # for p-values
exp(my.logreg$coefficients) # for odds ratios
exp(confint(my.logreg)) # for confidence intervals on the odds ratios
Python
import numpy
import statsmodels.api
import statsmodels.formula.api as smf
# logistic regression
lreg1 = smf.logit(formula = 'BinaryResponseVar ~ ExplanatoryVar + ThirdVar1 + ThirdVar2',
                  data = myData).fit()
print(lreg1.summary())
# odds ratios with 95% confidence intervals
params = lreg1.params
conf = lreg1.conf_int()
conf['OR'] = params
conf.columns = ['Lower CI', 'Upper CI', 'OR']
print(numpy.exp(conf))
Stata
logistic BinaryResponseVar ExplanatoryVar ThirdVar1 ThirdVar2
SPSS
LOGISTIC REGRESSION BinaryResponseVar with ExplanatoryVar
ThirdVar1 ThirdVar2.
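The same confounding check can be sketched for a binary response. The example below uses simulated data with hypothetical variables (coffee, smoker, disease), constructed so that smoking raises the odds of disease and is more common among coffee drinkers, while coffee itself has no effect. The helper odds_ratios is an illustrative function, not part of statsmodels. The sketch compares the odds ratio for the explanatory variable before and after controlling for the third variable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data: disease depends on smoking only, and
# coffee drinking is more common among smokers.
rng = np.random.default_rng(2)
n = 5000
smoker = rng.binomial(1, 0.4, n)
coffee = rng.binomial(1, np.where(smoker == 1, 0.8, 0.3))
log_odds = -2.0 + 1.5 * smoker
disease = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))
sim = pd.DataFrame({"coffee": coffee, "smoker": smoker, "disease": disease})

def odds_ratios(result):
    """Return odds ratios with 95% confidence intervals for a fitted logit."""
    table = result.conf_int()
    table.columns = ["Lower CI", "Upper CI"]
    table["OR"] = result.params
    return np.exp(table)

# Model without, then with, the potential confounder
unadjusted = smf.logit("disease ~ coffee", data=sim).fit(disp=False)
adjusted = smf.logit("disease ~ coffee + smoker", data=sim).fit(disp=False)

print(odds_ratios(unadjusted))
print(odds_ratios(adjusted))
```

An odds ratio whose confidence interval excludes 1 in the first model but includes 1 after the third variable is added suggests that the third variable confounds the original association.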
14.4 Assignment
Run a multiple regression model (quantitative response variable) or a logistic regression model (binary, categorical response variable). Submit the program that tests for confounding along with corresponding output. Describe in a few sentences what you found.