Chapter 6 Univariate Graphing

“The purpose of visualization is insight, not pictures.”     — Ben Shneiderman

6.1 Overview

You have chosen a research question, selected several variables, and implemented data management decisions that will help you answer your research question. Next, you will learn how to graph your newly managed variables one at a time (i.e., univariate graphing) in order to better visualize their distributions.

6.2 Lesson

There are a variety of conventional ways to visualize data: tables, histograms, bar graphs, etc. Learn how to visualize a frequency distribution as a graphical display. See how a bar chart that graphs one categorical variable contains bars representing each category on the x-axis and the number or percent of observations on the y-axis. Create a histogram for quantitative variables that plots quantitative intervals on the x-axis and the number or percentage of observations on the y-axis. Understand how to describe a histogram’s pattern with shape, center, and spread, as well as deviations from that pattern (i.e., outliers). Learn how we use the mean and median to describe the center of the distribution. See how we use measures of spread to describe the distribution’s variability. Click on a video lesson below.


SAS                     R                     Python                     Stata                     SPSS
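As a small illustration of the numerical measures mentioned in this lesson, the sketch below computes the mean and median (center) and the range and standard deviation (spread) for a made-up quantitative variable. The data values and the function name `describe_center_and_spread` are assumptions for illustration only, and the code uses Python's standard library rather than any particular course dataset.

```python
import statistics

def describe_center_and_spread(values):
    """Return basic measures of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),        # center: arithmetic average
        "median": statistics.median(values),    # center: middle value
        "range": max(values) - min(values),     # spread: max minus min
        "stdev": statistics.stdev(values),      # spread: sample standard deviation
    }

# Hypothetical data with one unusually large value (a possible outlier)
quant_var = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

summary = describe_center_and_spread(quant_var)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
# The mean is 6.60 and the median is 4.00.
```

In this made-up example the single large value pulls the mean well above the median, which is one reason both measures are worth reporting when a distribution is skewed or contains outliers.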


6.3 Syntax

6.3.1 univariate graph (categorical variable)

SAS

proc sgplot;
    vbar CategVar / stat=percent;
    title "Title Here";
run;

R

library(ggplot2)
ggplot(data=myData)+
geom_bar(aes(x=CategVar))+
ggtitle("Descriptive Title Here")

Python

import seaborn
import matplotlib.pyplot as plt
seaborn.countplot(x="CategVar", data=myData)
plt.xlabel("Label for CategVar")
plt.title("Descriptive Title")

Stata

graph bar, over(CategVar)

SPSS

Use the graphical user interface (GUI).

6.3.2 univariate graph (quantitative variable)

SAS

proc sgplot;
    histogram QuantVar;
    title "Title Here";
run;

R

ggplot(data=myData)+
geom_histogram(aes(x=QuantVar))+
ggtitle("Descriptive Title Here")

Python

import seaborn
import matplotlib.pyplot as plt
# histplot replaces distplot, which is deprecated as of seaborn 0.11
seaborn.histplot(myData["QuantVar"].dropna())
plt.xlabel("Label for QuantVar")
plt.title("Descriptive Title Here")

Stata

histogram QuantVar

SPSS

Use the graphical user interface (GUI).
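To make concrete what a histogram displays, the following sketch counts how many observations fall into each quantitative interval, which is the number a histogram plots on the y-axis. It is a simplified illustration using only the Python standard library. The function name `histogram_counts`, the bin width of 5, and the data values are all assumptions, and the plotting libraries above choose their own bins by default.

```python
def histogram_counts(values, bin_width):
    """Count observations in each interval [left, left + bin_width)."""
    counts = {}
    for v in values:
        # Left edge of the interval containing v (a multiple of bin_width)
        left = (v // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical quantitative variable
quant_var = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

bins = histogram_counts(quant_var, bin_width=5)
# bins is {0: 6, 5: 3, 30: 1}

# Print a rough text histogram; empty intervals are not listed
for left, n in bins.items():
    print(f"[{left}, {left + 5}): {'#' * n}")
```

The gap between the interval starting at 5 and the one starting at 30 is the kind of feature to mention when describing possible outliers.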

6.4 Assignment

Submit univariate graphs for your two main constructs (i.e., data managed variables). Write a few sentences under each graph describing what it shows. For categorical variables, which response category is the modal (i.e., most common) category, and which is the least common? For quantitative variables, describe the shape (i.e., symmetry/skewness and modality), center (i.e., midpoint), and spread (i.e., range) of the distribution, as well as any outliers. Some categorical variables are ordered (e.g., 1=strongly agree, 2=agree, 3=neither agree nor disagree, 4=disagree, 5=strongly disagree). For ordered variables, what is the modal category, and is there anything that can be said about the shape, center, and spread of the distribution?
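When writing up a categorical variable, a frequency table can help identify the modal and least common categories. The sketch below is one possible way to do this with the Python standard library. The responses are made up, and `frequency_table` is a hypothetical helper name, not part of the course materials.

```python
from collections import Counter

def frequency_table(responses):
    """Return (category, count, percent) rows sorted by category."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]

# Hypothetical ordered responses (1=strongly agree ... 5=strongly disagree)
categ_var = [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5]

table = frequency_table(categ_var)
for cat, n, pct in table:
    print(f"category {cat}: n={n} ({pct:.1f}%)")

# Pick the rows with the largest and smallest counts
# (if categories tie, max and min return the first tied row)
modal = max(table, key=lambda row: row[1])
least = min(table, key=lambda row: row[1])
print("modal category:", modal[0])         # category 2
print("least common category:", least[0])  # category 5
```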