Chapter 14 Multivariate Modeling

“All models are wrong, but some are useful.”     — George E. P. Box

14.1 Overview

You have learned that a significant association or correlation does not imply causation. Next you will learn to determine statistically whether a third variable is a confounder, that is, whether it can account for a bivariate relationship you found to be significant using ANOVA, the Chi-Square Test of Independence, or Pearson Correlation. Multivariate models allow us to statistically control for additional variables that may account for (confound) a significant bivariate association.

14.2 Lesson

Learn about confounding and multivariate models. Understand that testing for likely confounding variables helps us get slightly closer to establishing a cause-and-effect relationship when conducting an observational study. Determine whether you need multiple regression because your response variable is quantitative, or logistic regression because your response variable is categorical with two levels. Consider the evidence for when a third variable is, or is not, a confounder. Video lessons are available for SAS, R, Python, Stata, and SPSS; a minimal code sketch of the confounding check follows.
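
As a minimal sketch of that confounding check, the Python code below (using statsmodels, consistent with the syntax section) compares the original bivariate model with a model that adds the potential confounder. The data frame myData, the file name, and the variable names are placeholders, not references to any particular data set.

import pandas
import statsmodels.formula.api as smf

# placeholder data set and variable names
myData = pandas.read_csv('mydata.csv')

# Step 1: the original bivariate model
fit1 = smf.ols('QuantResponseVar ~ ExplanatoryVar', data=myData).fit()
print(fit1.summary())

# Step 2: the multivariate model that statistically controls for the third variable
fit2 = smf.ols('QuantResponseVar ~ ExplanatoryVar + PotentialConfounder',
               data=myData).fit()
print(fit2.summary())

# If ExplanatoryVar is no longer significantly associated with the response after
# adjustment, the third variable is a plausible confounder; if the association
# persists, that variable does not account for it.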


14.3 Syntax

14.3.1 Multiple Regression

SAS (code binary variables as yes = 1 and no = 2)

proc glm;
    class CategExplanatoryVar CategThirdVar;
    model QuantResponseVar = CategExplanatoryVar CategThirdVar QuantThirdVar
        / solution;
run;

R

my.lm <- lm(QuantResponseVar ~ ExplanatoryVar + ThirdVar1 + ThirdVar2, data = myData)
summary(my.lm)

Python

import statsmodels.api
import statsmodels.formula.api as smf

# note that categorical explanatory/third variables have to be entered as C(CategVar)
lm1 = smf.ols('QuantResponseVar ~ ExplanatoryVar + C(CategThirdVar1) + QuantThirdVar',
              data=myData).fit()
print(lm1.summary())

Stata

reg QuantResponseVar ExplanatoryVar ThirdVar1 ThirdVar2

SPSS

REGRESSION
/DEPENDENT QuantResponseVar
/METHOD=ENTER ExplanatoryVar ThirdVar1 ThirdVar2.

14.3.2 Logistic Regression

SAS (code binary variables as yes = 1 and no = 2)

proc logistic;
    class CategExplanatoryVar CategThirdVar;
    model BinaryResponseVar = CategExplanatoryVar CategThirdVar QuantThirdVar;
run;

R

my.logreg <- glm(BinaryResponseVar ~ ExplanatoryVar + ThirdVar1 + ThirdVar2,
                 data = myData, family = "binomial")
summary(my.logreg) # for p-values
exp(my.logreg$coefficients) # for odds ratios
exp(confint(my.logreg)) # for confidence intervals on the odds ratios

Python

import numpy
import statsmodels.api
import statsmodels.formula.api as smf

# logistic regression
lreg1 = smf.logit(formula='BinaryResponseVar ~ ExplanatoryVar + ThirdVar1 + ThirdVar2',
                  data=myData).fit()
print(lreg1.summary())

# odds ratios with 95% confidence intervals
params = lreg1.params
conf = lreg1.conf_int()
conf['OR'] = params
conf.columns = ['Lower CI', 'Upper CI', 'OR']
print(numpy.exp(conf))

Stata

logistic BinaryResponseVar ExplanatoryVar ThirdVar1 ThirdVar2

SPSS

LOGISTIC REGRESSION BinaryResponseVar WITH ExplanatoryVar ThirdVar1 ThirdVar2.

14.4 Assignment

Run a multiple regression model (quantitative response variable) or a logistic regression model (binary, categorical response variable). Submit the program that tests for confounding along with the corresponding output, and describe in a few sentences what you found.
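
For a binary response variable, the same confounding check can be sketched with logistic regression in Python. As above, the file name and variable names are placeholders, and the response is assumed to be coded 0/1 as statsmodels expects.

import numpy
import pandas
import statsmodels.formula.api as smf

# placeholder data set; BinaryResponseVar assumed coded 0 = no, 1 = yes
myData = pandas.read_csv('mydata.csv')

# bivariate logistic model
lreg_biv = smf.logit('BinaryResponseVar ~ ExplanatoryVar', data=myData).fit()
print(lreg_biv.summary())
print(numpy.exp(lreg_biv.params))  # odds ratio before adjustment

# model that adds the potential confounder
lreg_adj = smf.logit('BinaryResponseVar ~ ExplanatoryVar + PotentialConfounder',
                     data=myData).fit()
print(lreg_adj.summary())
print(numpy.exp(lreg_adj.params))  # odds ratios after adjustment

# Compare the explanatory variable's p-value and odds ratio across the two models
# to decide whether the third variable confounds the association.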