Chapter 4 Working with Data
“When you have mastered numbers, you will no longer be reading numbers, any more than you read words when reading books. You will be reading meanings.”— W.E.B. Du Bois
4.1 Overview
You have learned about developing a research question based on existing data by using a code book to guide you. You refined your research question by conducting a literature review based on primary source journal articles. Now you are ready to use statistical software to work with the data. Exploratory data analysis is the process of converting raw data into a more useful form so that we can begin to discover important features and patterns in the data.
4.2 Lesson
Learn to examine frequency distributions for each of the variables you have selected. Determine what values a variable takes and how often it takes those values. Write the code or take the steps required to generate frequency distributions using your statistical software program. As you engage with your data, learn how to consider whether or not you want to create a subset of the larger sample in order to answer your question. Click on a video lesson below.
SAS – R – Python – Stata – SPSS
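As a rough illustration of the subsetting idea in this lesson, the Python sketch below keeps only the observations that match a condition and then checks how many remain. The data frame and the variable names AGE and SMOKER are hypothetical and are not taken from any particular code book; substitute the variables that matter for your own research question.

```python
import pandas

# Hypothetical data: AGE and SMOKER are made-up variable names
myData = pandas.DataFrame({
    'AGE': [19, 24, 31, 22, 45, 18],
    'SMOKER': [1, 0, 1, 1, 0, 1],
})

# Keep only observations aged 18 to 25 who are coded as smokers (SMOKER == 1)
sub1 = myData[(myData['AGE'] >= 18) &
              (myData['AGE'] <= 25) &
              (myData['SMOKER'] == 1)]

# Work on an independent copy so later changes do not touch myData
sub2 = sub1.copy()

# Compare the size of the full sample with the size of the subset
print(len(myData), len(sub2))

# Frequency distribution of AGE within the subset
print(sub2['AGE'].value_counts(sort=False))
```

Comparing the row counts before and after subsetting is a quick way to confirm that the condition selected the observations you intended.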
4.3 Syntax
4.3.1 loading a data set
SAS
libname mydata "C:/foldername-including-path";
data new;
set mydata.filename;
run;
R
load("filename-including-path.Rdata")
myData <- name-of-object-loaded-in-your-workspace
Python
import pandas
import numpy
myData = pandas.read_csv('nesarc_pds.csv')
STATA
use "C:\path-and-folder-name\filename", clear
SPSS
GET FILE='C:\path-and-folder-name\filename.sav'.
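After loading a data set, it can help to confirm that the number of observations and variables matches what the code book describes. The Python sketch below does this on a small made-up CSV held in memory so that it runs without any file on disk; the column names and values are hypothetical, and with a real file you would pass its path to pandas.read_csv instead.

```python
import io
import pandas

# Hypothetical stand-in for a CSV file on disk (three rows, one missing value)
csv_text = "unique_id,VAR1,VAR2\n3,1,2\n1,2,\n2,1,9\n"

# low_memory=False asks pandas to read the whole file before inferring types
myData = pandas.read_csv(io.StringIO(csv_text), low_memory=False)

print(len(myData))          # number of observations (rows)
print(len(myData.columns))  # number of variables (columns)
print(myData.dtypes)        # storage type of each variable
```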
4.3.2 sorting data
SAS
proc sort;
by unique_id;
run;
R
myData <- myData[order(myData$unique_id, decreasing = FALSE),]
Python
myData = myData.sort_values(by='unique_id')
STATA
sort unique_id
SPSS
SORT CASES BY unique_id.
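To see what sorting does to the rows, the Python sketch below sorts a tiny made-up data frame by unique_id. The values are hypothetical. The reset_index step is optional; it is included here on the assumption that you want the row labels renumbered to follow the new order.

```python
import pandas

# Hypothetical data with the identifiers out of order
myData = pandas.DataFrame({'unique_id': [3, 1, 2], 'VAR1': [1, 2, 1]})

# Sort in ascending order of unique_id, then renumber the row labels
myData = myData.sort_values(by='unique_id').reset_index(drop=True)

print(myData)
```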
4.3.3 displaying frequency tables
SAS
proc freq;
tables VAR1 VAR2 VAR3;
run;
R
library(descr)
freq(as.ordered(myData$VAR1))
freq(as.ordered(myData$VAR2))
freq(as.ordered(myData$VAR3))
Python
c1 = myData['VAR1'].value_counts(sort=False, dropna=False)
print(c1)
STATA
tab1 VAR1 VAR2 VAR3
SPSS
FREQUENCIES VARIABLES=var1 var2 var3
  /ORDER=ANALYSIS.
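A frequency table usually reports percentages alongside counts. The Python sketch below builds both for one variable, using a small made-up data frame with one missing value; the data are hypothetical, and the column names count and percent are chosen only for this illustration.

```python
import pandas

# Hypothetical data: VAR1 takes the values 1 and 2, with one missing value
myData = pandas.DataFrame({'VAR1': [1, 2, 1, None, 2, 1]})

# Counts for each value; dropna=False keeps missing values as their own row
counts = myData['VAR1'].value_counts(sort=False, dropna=False)

# Put the counts in a table and add each value's share of all observations
freq_table = counts.to_frame(name='count')
freq_table['percent'] = (freq_table['count'] / freq_table['count'].sum() * 100).round(1)

print(freq_table)
```

Keeping the missing values in the table makes it easy to see how much of the sample has no usable response before you describe the distribution.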
4.4 Assignment
Submit your program and the corresponding results that display at least three of your variables as frequency tables. Write a few sentences that describe what you see in each frequency table.