
Calculating Correlation

Author: Sophia

what's covered
This tutorial will discuss two methods for calculating the correlation coefficient. Our discussion breaks down as follows:

1. Calculating Correlation Coefficients
2. Calculating Correlation Coefficients with Spreadsheet Functions

1. Calculating Correlation Coefficients

Correlation is measured with a numerical value called the correlation coefficient. The correlation coefficient is denoted r and is unitless. It is always a number between -1 and +1, and it indicates both the strength and the direction of the linear association between two variables.

Values close to -1 or +1 indicate a strong linear association between the two variables: a value near +1 indicates a strong positive association, and a value near -1 indicates a strong negative association. Values near 0 indicate little or no linear relationship.

Correlation Chart

The correlation coefficient is essentially the average of the products of the z-scores for the x's and the y's; the only difference from an ordinary average is that the sum of the products is divided by n - 1 instead of n. The z-score of an x value is that value minus the mean of x, divided by the standard deviation of x. The z-scores for y are found the same way.

formula to know
Correlation Coefficient
$$r = \frac{1}{n-1}\sum z_x z_y = \frac{1}{n-1}\sum \left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right)$$
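The formula translates almost term for term into code. This is a minimal sketch in Python, assuming the standard-library `statistics` module; `correlation_from_z` is a hypothetical helper name. It uses `stdev` because that function computes the sample standard deviation (dividing by n - 1), which matches s_x and s_y in the formula.

```python
from statistics import mean, stdev  # stdev() is the sample standard deviation

def correlation_from_z(xs, ys):
    """Hypothetical helper: r = (1 / (n - 1)) * sum(z_x * z_y)."""
    n = len(xs)
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    # Convert each pair to z-scores, multiply them, and add up the products
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Small made-up example
print(round(correlation_from_z([1, 2, 3, 4], [1, 3, 2, 4]), 4))  # 0.8
```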

EXAMPLE

The table below lists destinations you could fly to from Minneapolis-Saint Paul, along with each destination's distance from Minneapolis in miles and the airfare to fly there.
Destination      Miles   Airfare
Kansas City        460       379
Los Angeles      1,870       377
Milwaukee          338       158
New York City    1,167       283
Philadelphia     1,141       323

step by step

Step 1: Calculate the z-scores of the x variable. Here, miles is x, the explanatory variable, because greater distance is believed to cause airfare to rise. That makes airfare the response variable, y. Both variables will be converted into z-scores, starting with miles.

To do this, you need the mean and the standard deviation of each variable. Recall from Unit 3 that Microsoft Excel can find these values easily: use the function "=AVERAGE" for the mean and "=STDEV.S" for the sample standard deviation, and select the range of cells in the column you are summarizing. We will do this for both Miles and Airfare.
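Outside Excel, the same summary values take only a few lines to compute. A minimal Python sketch, assuming the standard-library `statistics` module: `mean` plays the role of =AVERAGE, and `stdev` (the sample standard deviation) plays the role of =STDEV.S. The list names `miles` and `airfare` are illustrative.

```python
from statistics import mean, stdev  # stdev() is the sample SD, like Excel's STDEV.S

miles = [460, 1870, 338, 1167, 1141]
airfare = [379, 377, 158, 283, 323]

print(mean(miles), round(stdev(miles), 2))      # 995.2 619.35
print(mean(airfare), round(stdev(airfare), 2))  # 304 90.93
```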

Finding the Mean and Standard Deviation of the Explanatory and Response Variables

The Mean and Standard Deviation of the Explanatory and Response Variables

Next, to calculate each z-score, subtract the mean from the value and divide by the standard deviation. For the first value in Miles, take 460 minus the mean, 995.2, and divide by the standard deviation, 619.35: (460 - 995.2) / 619.35 ≈ -0.864.

Calculating the Z-Score For The Explanatory Variable

Do the same for the 1,870 miles to Los Angeles and for each of the other cities.
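Computing a z-score for every city by hand is repetitive, which is exactly the kind of work a short loop handles well. This Python sketch assumes the standard-library `statistics` module and uses the hypothetical names `miles` and `z_miles`; it standardizes each distance with the sample mean and sample standard deviation.

```python
from statistics import mean, stdev

miles = [460, 1870, 338, 1167, 1141]  # Kansas City, Los Angeles, Milwaukee, NYC, Philadelphia
x_bar, s_x = mean(miles), stdev(miles)

# z = (x - mean) / standard deviation, for every city
z_miles = [(x - x_bar) / s_x for x in miles]
print([round(z, 3) for z in z_miles])  # [-0.864, 1.412, -1.061, 0.277, 0.235]
```

The z-scores of any variable add up to zero (apart from rounding), which is a handy check on hand calculations.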

Step 2: Repeat this process to calculate the z-scores for the y values. In this scenario, the response variable is airfare, which has a mean of 304 and a standard deviation of 90.93. Starting with Kansas City, (379 - 304) / 90.93 ≈ 0.825.
Calculating the Z-Score For The Response Variable

Do the same with the rest of the airfares.
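The same loop works for the response variable. A minimal Python sketch, again assuming the standard-library `statistics` module, with the hypothetical names `airfare` and `z_airfare`:

```python
from statistics import mean, stdev

airfare = [379, 377, 158, 283, 323]  # same city order as the table
y_bar, s_y = mean(airfare), stdev(airfare)

# z = (y - mean) / standard deviation, for every city
z_airfare = [(y - y_bar) / s_y for y in airfare]
print([round(z, 3) for z in z_airfare])  # [0.825, 0.803, -1.606, -0.231, 0.209]
```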

Step 3: Multiply the corresponding z-scores and add the products. Starting with the first row, multiply -0.864 by 0.825; repeat this for every row, then add up all of the products.

Multiplying Z-Scores

The sum here ends up being positive 2.11. We can substitute this value into the correlation formula.
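Steps 1 through 3 can be sketched together in Python, assuming the standard-library `statistics` module. The helper `z_scores` is a hypothetical name introduced here for illustration; it standardizes a list with its sample mean and sample standard deviation.

```python
from statistics import mean, stdev

miles = [460, 1870, 338, 1167, 1141]
airfare = [379, 377, 158, 283, 323]

def z_scores(values):
    """Hypothetical helper: standardize a list using its sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Multiply corresponding z-scores row by row, then add the products
products = [zx * zy for zx, zy in zip(z_scores(miles), z_scores(airfare))]
total = sum(products)
print(round(total, 2))  # 2.11
```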

$$\begin{aligned} r &= \frac{1}{n-1}\sum z_x z_y \\ r &= \frac{1}{n-1}(2.11) \end{aligned}$$

Step 4: Finally, divide by the number of observations minus 1. There are five observations, so the denominator is 5 - 1 = 4.

$$\begin{aligned} r &= \frac{1}{n-1}(2.11) \\ r &= \frac{1}{5-1}(2.11) \\ r &= \frac{1}{4}(2.11) \\ r &= 0.5275 \end{aligned}$$
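Step 4 is a single division. The Python sketch below, assuming the standard-library `statistics` module, repeats the earlier steps compactly so that it runs on its own, then divides the sum of the products by n - 1.

```python
from statistics import mean, stdev

miles = [460, 1870, 338, 1167, 1141]
airfare = [379, 377, 158, 283, 323]
n = len(miles)  # 5 observations

mx, sx = mean(miles), stdev(miles)
my, sy = mean(airfare), stdev(airfare)
# Sum of z_x * z_y over all rows (about 2.11)
total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(miles, airfare))

r = total / (n - 1)  # divide by 5 - 1 = 4
print(round(r, 4))  # 0.5275
```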

Dividing by 4 yields a correlation of r ≈ 0.5275. This value indicates a positive but fairly weak linear association between miles and airfare. We can also see this in the scatter plot:

Scatterplot Showing Miles vs Airfare


2. Calculating Correlation Coefficients with Spreadsheet Functions

Working through these steps by hand is cumbersome, so in practice the correlation coefficient is almost always found with technology. In Excel, once the miles and airfare data are entered, type "=CORREL(" (CORREL is short for correlation), select the range of x-values (Miles), type a comma, and select the range of y-values (Airfare). Close the parentheses and press Enter.

Finding the Correlation Using Excel

Sure enough, it returns about 0.5275, the same value you found by hand.

hint
The CORREL function takes the same inputs in other spreadsheet software, such as Google Sheets. Using one of these tools saves time and reduces the chance of arithmetic errors.

think about it
In the real world, correlation coefficients are almost always calculated with a tool like Excel. Why might performing the calculations "by hand" in this course have value, even though it is not something you would do as a professional?

summary
Correlation measures the strength and direction of a linear relationship between two variables, as seen on a scatter plot. Now that you are familiar with what the correlation coefficient is and how it is calculated, you can calculate it with a tool such as a calculator, an internet applet, or a spreadsheet. Because of the way correlation coefficients are calculated, r is the same regardless of which variable is explanatory and which is response.

Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Formulas to Know
Correlation
$$r = \frac{1}{n-1}\sum z_x z_y = \frac{1}{n-1}\sum \left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right)$$