Chapter 7 Bivariate Graphing
“The greatest value of a picture is when it forces us to notice what we never expected to see.” — John Tukey
7.1 Overview
So far, we have examined data obtained from one variable at a time (either categorical or quantitative) and learned how to describe the features and distribution of the variable using the appropriate visual displays and numerical measures. Now we will consider two variables simultaneously and explore the relationship between them using, as before, visual displays and numerical summaries. When graphing your variables, it is important that each graph provides clear and accurate summaries of the data that do not mislead.
7.2 Lesson
Understand why we impose a causal model on our research question even though causation cannot be directly evaluated from observational data.
Assign a role to each of your variables: which will be the explanatory variable and which will be the response variable?
Learn to use the graphing decision flow chart to determine, based on variable types, the appropriate graph for visualizing each relationship.
Consider which types of bivariate graphs will help you visualize your research question.
Understand how to interpret bivariate graphs.
Note that in some cases the bivariate association may differ across population subgroups. Graphing the subgroups by adding a third variable helps show visually whether such differences exist.
Click on a video lesson below.
SAS – R – Python – Stata – SPSS
Graphing Flow Chart:
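The choice that the flow chart encodes can be sketched in Python for the cases covered in Section 7.3. This is a minimal sketch rather than the chart itself: the function name `choose_graph` and the text labels are hypothetical, and only the three variable-type combinations that appear in Sections 7.3.1 to 7.3.3 are mapped.

```python
# Hypothetical helper: maps (explanatory type, response type) to the graph
# types used in Sections 7.3.1-7.3.3. Names and labels are illustrative only.
GRAPH_CHOICES = {
    ("categorical", "categorical"): ["bar chart (mean of a 0/1-coded response)"],
    ("categorical", "quantitative"): ["bar chart of means", "box plot", "density plot"],
    ("quantitative", "quantitative"): ["scatterplot"],
}


def choose_graph(explanatory_type, response_type):
    """Return the candidate graph types for a pair of variable types."""
    key = (explanatory_type, response_type)
    if key not in GRAPH_CHOICES:
        # Combinations not shown in this chapter's syntax are left unmapped.
        raise ValueError(f"No graph listed in Section 7.3 for {key}")
    return GRAPH_CHOICES[key]


print(choose_graph("categorical", "quantitative"))
```

A dictionary lookup keeps the mapping easy to read and extend if further combinations are added.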
7.3 Syntax
7.3.1 bivariate graph (categorical explanatory, categorical response)
SAS
proc sgplot;
vbar CategExplanatoryVar / response=CategResponseVar stat=mean;
title "Title Here";
run;
R
ggplot(data=myData)+
stat_summary(aes(x=CategExplanatoryVar, y=CategResponseVar),
fun=mean, geom="bar")+ # fun.y was renamed fun in ggplot2 3.3.0
ggtitle("Descriptive Title Here")
Python
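import seaborn
import matplotlib.pyplot as plt
# catplot replaces the older factorplot; newer seaborn uses errorbar=None in place of ci=None
seaborn.catplot(x="CategExplanatoryVar", y="CategResponseVar",
                data=myData, kind="bar", ci=None)
plt.xlabel('Label for CategExplanatoryVar')
plt.ylabel('Label for CategResponseVar')
plt.title('Descriptive Title Here')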
STATA
graph bar (mean) CategResponseVar, over(CategExplanatoryVar)
SPSS
Use graphical user interface (GUI)
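Taking the mean of a categorical response only makes sense when that response is coded numerically. Assuming a two-level response coded 0/1 (an assumption, since the syntax above does not state the coding), each bar's height is the proportion of cases coded 1 within that explanatory category. The sketch below computes those bar heights with the standard library; the data and the function name `mean_by_group` are made up for illustration.

```python
from collections import defaultdict

# Made-up illustrative records: (explanatory category, response coded 0/1).
records = [
    ("A", 1), ("A", 0), ("A", 1), ("A", 1),
    ("B", 0), ("B", 0), ("B", 1), ("B", 0),
]


def mean_by_group(pairs):
    """Return {group: mean of the 0/1 response}, i.e. the bar heights."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for group, value in pairs:
        totals[group][0] += value
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


bar_heights = mean_by_group(records)
print(bar_heights)
```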
7.3.2 bivariate graph (categorical explanatory, quantitative response)
SAS
proc sgplot;
vbar CategExplanatoryVar / response=QuantResponseVar stat=mean;
title "Title Here";
run;
R
##Below is code for bar graph
ggplot(data=myData)+
stat_summary(aes(x=CategExplanatoryVar,
y=QuantResponseVar), fun=mean, geom="bar") # fun.y was renamed fun in ggplot2 3.3.0
##Below is code for boxplots
ggplot(data=myData)+
geom_boxplot(aes(x=CategExplanatoryVar,
y=QuantResponseVar))+ ggtitle("Descriptive Title Here")
##Below is code for density plots
ggplot(data=myData)+
geom_density(aes(x=QuantResponseVar,
color=CategExplanatoryVar))+ggtitle("Descriptive Title Here")
Python
import seaborn
# catplot replaces the older factorplot; column names are passed as strings
scat1 = seaborn.catplot(x="CategExplanatoryVar", y="QuantResponseVar",
                        data=myData, kind="bar", ci=None)
print(scat1)
STATA
graph box QuantResponseVar, over(CategExplanatoryVar)
SPSS
Use graphical user interface (GUI)
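When reading a box plot of a quantitative response across categories, it can help to see the numbers behind each box. The sketch below computes a five-number summary per category with the standard library. The data are made up, the name `five_number_summary` is hypothetical, and the quartile rule (`method="inclusive"`) is only one of several conventions, so the values may differ slightly from what SAS, R, Stata, or SPSS draw.

```python
from statistics import quantiles

# Made-up illustrative data: quantitative response values per category.
groups = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


summaries = {name: five_number_summary(vals) for name, vals in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Comparing the medians and the spread between Q1 and Q3 across categories mirrors what the side-by-side boxes show visually.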
7.3.3 bivariate graph (quantitative explanatory, quantitative response)
SAS
proc sgplot;
scatter x=QuantExplanatoryVar y=QuantResponseVar;
title "Title Here";
run;
R
ggplot(data=myData)+
geom_point(aes(x=QuantExplanatoryVar, y=QuantResponseVar))+
geom_smooth(aes(x=QuantExplanatoryVar, y=QuantResponseVar),
method="lm")
## Note, adding 3rd variable by using the color argument
ggplot(data=myData)+
geom_point(aes(x=QuantExplanatoryVar, y=QuantResponseVar,
color=CategThirdVar))+
geom_smooth(aes(x=QuantExplanatoryVar, y=QuantResponseVar,
color=CategThirdVar), method="lm")
Python
import seaborn
import matplotlib.pyplot as plt
seaborn.regplot(x="QuantExplanatoryVar", y="QuantResponseVar",
                fit_reg=False, data=myData)
plt.xlabel('Label for QuantExplanatoryVar')
plt.ylabel('Label for QuantResponseVar')
plt.title('Descriptive Title Here')
STATA
twoway (scatter QuantResponseVar QuantExplanatoryVar) ///
(lfit QuantResponseVar QuantExplanatoryVar) // adds reg line
SPSS
Use graphical user interface (GUI)
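The straight line that `geom_smooth(method="lm")` in R and `lfit` in Stata overlay on the scatterplot is the ordinary least-squares line. As a rough illustration of what that line is, the sketch below computes its intercept and slope by hand. The function name `least_squares_line` and the data are made up for this example.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative data lying exactly on y = 1 + 2x.
quant_explanatory = [1, 2, 3, 4]
quant_response = [3, 5, 7, 9]
intercept, slope = least_squares_line(quant_explanatory, quant_response)
print(intercept, slope)
```

A positive slope corresponds to an upward-trending cloud of points, a negative slope to a downward trend.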
7.3.4 3rd variable graph (categorical explanatory, categorical response, categorical 3rd variable)
SAS
proc sgplot;
vbar ExplVar /response=RespVar group=ThirdVar groupdisplay=cluster stat=mean;
title "Title Here";
run;
R
tab1 <- ftable(myData$CategResponseVar,
myData$CategExplanatoryVar,
myData$CategThirdVar)
tab1_colProp <- prop.table(tab1, 2)
tab1_colProp
Python
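import seaborn
import matplotlib.pyplot as plt
# catplot replaces the older factorplot; newer seaborn uses errorbar=None in place of ci=None
seaborn.catplot(x="CategExplanatoryVar", y="CategResponseVar", hue="CategThirdVar",
                data=myData, kind="bar", ci=None)
plt.xlabel('Label for CategExplanatoryVar')
plt.ylabel('Label for CategResponseVar')
plt.title('Descriptive Title Here')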
STATA
ssc install catplot
catplot CategResponseVar CategExplanatoryVar, percent(CategExplanatoryVar) over(CategThirdVar)
SPSS
Use graphical user interface (GUI)
7.3.5 3rd variable graph (categorical explanatory, quantitative response, categorical 3rd variable)
SAS
proc sgplot;
vbar ExplVar /response=RespVar group=ThirdVar groupdisplay=cluster stat=mean;
title "Title Here";
run;
R
ftable(by(myData$QuantResponseVar,
list(myData$CategExplanatoryVar, myData$CategThirdVar),
mean, na.rm = TRUE))
Python
import seaborn
import matplotlib.pyplot as plt
# newer seaborn versions use errorbar=None in place of ci=None
seaborn.barplot(x="CategExplanatoryVar", y="QuantResponseVar",
                hue="CategThirdVar", data=myData, ci=None)
plt.xlabel('Label for CategExplanatoryVar')
plt.ylabel('Label for QuantResponseVar')
plt.title('Descriptive Title Here')
STATA
graph bar (mean) QuantResponseVar, over(CategExplanatoryVar) over(CategThirdVar) ///
blabel(bar) // displays mean value for each bar
SPSS
Use graphical user interface (GUI)
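The grouped bar graphs in this subsection display one mean per combination of the explanatory variable and the third variable. A minimal standard-library sketch of that table of means follows. The rows, the category labels, and the function name `cell_means` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative rows: (explanatory category, third-variable category, response).
rows = [
    ("A", "group1", 10), ("A", "group1", 14), ("A", "group2", 20),
    ("B", "group1", 8), ("B", "group2", 12), ("B", "group2", 16),
]


def cell_means(data):
    """Return {(explanatory, third): mean response} for each observed cell."""
    cells = defaultdict(list)
    for explanatory, third, response in data:
        cells[(explanatory, third)].append(response)
    return {key: mean(values) for key, values in cells.items()}


means = cell_means(rows)
for (explanatory, third), value in sorted(means.items()):
    print(explanatory, third, value)
```

If the difference between categories A and B looks similar within each level of the third variable, the bivariate association is similar across subgroups. If it changes size or direction, the subgroups differ.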
7.4 Assignment
Submit a graph showing the association between your explanatory variable and your response variable. Describe what the graph shows. Include an additional graph(s) of your bivariate relationship by a third variable. Describe any similarities or differences in the association between your explanatory and response variable for different levels of your third variable (i.e. different population subgroups).