Data Analysis and Interpretation: Pearson Correlation
I am using the ‘Outlook on Life’ survey data to investigate whether age or education plays a role in whether people think blacks and other minorities are treated the same as whites in the criminal justice system (W1_K4 in the codebook) or in whether they trust the legal system (W1_K1_C in the codebook). Both of these categorical response variables have multiple levels, as do the explanatory variables. Since the Pearson correlation coefficient measures the linear relationship between two quantitative variables, X and Y, I am going to complete this assignment using the GapMinder data instead.
For this exercise, I will examine the relationship between a country’s wealth and its life expectancy. In the GapMinder data, a country’s wealth can be measured by GDP per capita (variable: incomeperperson) and life expectancy in years by a variable of the same name (variable: lifeexpectancy).
The first step is to visualize the data with a scatter plot. The graph below indicates a positive relationship between GDP per capita and life expectancy. However, the relationship appears to follow a curvilinear (nonlinear) pattern, so a very large correlation coefficient, r, is not expected: the Pearson correlation coefficient only measures how well the data fit a straight line.
SAS Output: Scatter Plot, GDP per capita vs. Life Expectancy. Source: S. Prestridge
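Because the plot looks curved, one way to gauge how much the linearity assumption matters is to rerun PROC CORR with the rank-based Spearman coefficient alongside Pearson, and to also correlate life expectancy with the natural log of GDP per capita, a transformation that often straightens this kind of pattern. This is a sketch that was not part of the original analysis: it assumes the same course library as the main program, and `trans` and `logincome` are hypothetical names introduced only for illustration.

```sas
/* Assumes the same course library used by the main program */
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA trans;
  set mydata.gapminder;
  /* logincome is a hypothetical helper variable; log() needs a positive
     argument, so missing or non-positive incomes leave it missing */
  if incomeperperson > 0 then logincome = log(incomeperperson);
run;

/* PEARSON measures linear association; SPEARMAN works on ranks and
   picks up any monotonic association, curved or straight */
proc corr data=trans pearson spearman;
  var lifeexpectancy incomeperperson logincome;
run;
```

If the Spearman value or the correlation with `logincome` comes out noticeably larger than the raw Pearson r, that would support reading the pattern as strong but nonlinear.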
The output tables show that the Pearson correlation coefficient, r, is 0.60 (moderate strength) with a p-value less than 0.0001, so the correlation is statistically significant: it is very unlikely that the true population correlation is zero. r-squared is 0.36, which means that 36% of the variability in life expectancy can be explained by the variation in GDP per capita.
SAS Output: Pearson correlation coefficient tables, GDP per capita vs. Life Expectancy. Source: S. Prestridge
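For reference, the sample Pearson coefficient for n paired observations, here with x = incomeperperson and y = lifeexpectancy, and the r-squared arithmetic quoted above are:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
r^2 = 0.60^2 = 0.36
```

Since r is reported to two decimals, the 36% figure is itself a rounded value.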
Full results available as a PDF
SAS Program Code
Full code of the SAS program is shown below.
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

/* Copy the GapMinder data and attach readable labels */
DATA new;
  set mydata.gapminder;
  label incomeperperson="GDP per capita"
        lifeexpectancy="Life expectancy";
run;

proc sort data=new; by country; run;

/* Scatter plot: life expectancy (y axis) vs. GDP per capita (x axis) */
proc gplot data=new;
  axis1 label=(f="verdana" h=1.9 "GDP per capita" justify=c f="verdana" h=1.2 "US$")
        order=(0 to 60000 by 10000)
        value=(f="verdana" h=1.2 "0" "10,000" "20,000" "30,000" "40,000" "50,000" "60,000");
  axis2 label=(a=90 f="verdana" h=1.9 "Life expectancy" justify=c a=90 f="verdana" h=1.2 "Years");
  plot lifeexpectancy*incomeperperson / haxis=axis1 vaxis=axis2;
run;
quit;

/* Pearson correlation between the two quantitative variables */
proc corr data=new;
  var lifeexpectancy incomeperperson;
run;
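PROC CORR prints r but not r-squared. A minimal sketch for computing r-squared in SAS rather than by hand uses the OUTP= option, which writes the correlation matrix to a TYPE=CORR data set whose rows with _TYPE_ = "CORR" hold the coefficients. It assumes the same course library as the program above; `corrout` and `rsq` are hypothetical data set names.

```sas
/* Assumes the same course library used by the main program */
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

/* Write the Pearson correlation matrix to a TYPE=CORR data set */
proc corr data=mydata.gapminder outp=corrout noprint;
  var lifeexpectancy incomeperperson;
run;

/* Keep the life expectancy row of the matrix: its incomeperperson
   column is r for the pair, and squaring it gives r-squared */
data rsq;
  set corrout;
  where _TYPE_ = "CORR" and upcase(_NAME_) = "LIFEEXPECTANCY";
  r = incomeperperson;
  r_squared = r**2;
  keep r r_squared;
run;

proc print data=rsq;
run;
```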
Bonus material: Outlook on Life analysis
Just to see what the results would be, I examined Pearson correlation coefficients for several of the Outlook on Life variables. The first two are the original questions I was interested in.

W1_K4: “Are blacks & other minorities treated the same as whites in the criminal justice system?”
W1_K1_C: “Do You trust the legal system?”
W1_M5: “Attendance of religious services”
W1_P9: “Has anyone in your household been arrested”
W1_P20: “Personal Income”
W1_N1d: “Rate White people”
W2_J1_B: “Rate people in jail”
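The program behind the bonus table is not shown in this post. A sketch of how such a correlation matrix could be produced follows, under loudly stated assumptions: the library path, the data set name `ool_pds`, and the age and education variable names `PPAGE` and `PPEDUC` are all hypothetical here and should be checked against the codebook, as should the assumption that negative codes (such as -1 for a refused answer) mark non-answers. Keep in mind that Pearson r on coded survey responses treats the codes as equally spaced numbers, so results like these are exploratory.

```sas
/* Hypothetical: library path, data set name ool_pds, and the
   variables PPAGE (age) and PPEDUC (education) are assumptions */
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA ool;
  set mydata.ool_pds;
  array items{*} W1_K4 W1_K1_C W1_M5 W1_P9 W1_P20 W1_N1d W2_J1_B;
  do i = 1 to dim(items);
    /* Assumption: negative codes (e.g. -1 = refused) are not real
       answers, so set them to missing before correlating */
    if items{i} < 0 then items{i} = .;
  end;
  drop i;
run;

/* Pairwise Pearson correlations and p-values for every listed variable */
proc corr data=ool;
  var W1_K4 W1_K1_C W1_M5 W1_P9 W1_P20 W1_N1d W2_J1_B PPAGE PPEDUC;
run;
```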
See the results table below.
General findings: although several pairs of variables have p-values less than 0.0001, indicating statistically significant correlations, few of them are strongly correlated as measured by r, the Pearson correlation coefficient. The largest r-value, 0.45 for personal income and education, indicates a moderate relationship and yields an r-squared of about 0.20, which means only about 20% of the variability in personal income can be explained by the variation in education. Regarding the original two questions, the belief that blacks and other minorities are treated the same as whites in the criminal justice system and general trust in the legal system, neither appears to be strongly correlated with age, education, income, having a household member who has been arrested, feelings toward White people or people in jail, or attendance of religious services.
SAS Output: Pearson correlation coefficient tables, Outlook on Life. Source: S. Prestridge