This lesson examines the following components of statistical analysis: mean, median, mode, descriptive statistics, inferential statistics, correlation, cause and effect, spurious (false) correlation, and the Hawthorne effect.
The world is full of endless streams of data, and you're constantly receiving new data from new methods of measurement. Raw, unsorted data, such as a bare set of numbers, isn't useful to a sociologist unless you have a way to organize it to say something about the social world.
One way to organize raw data is to use the mean, median, and mode to describe a group. This is known as descriptive statistics: using statistics to say something about the typical member of a population. "Typical" here does not necessarily mean "average."
Descriptive Statistics
A form of statistical analysis that seeks to make statements about the most "typical" member of a group.
IN CONTEXT
Consider this set of numbers:
The mean, which is another word for the arithmetic average, is 5.9. If you added these 11 numbers together and divided the total by 11, you'd get 5.9.
The median, which is the middle number in a set, is 7. Once you have your numbers organized from least to greatest, the one right in the middle is the median. You know that 7 is the median of this set, because it is in the exact middle of the set--you have five numbers on the left side of the 7, and five numbers on the right side.
The mode of this set of numbers is 8. Mode is the most frequently occurring number in the set, and in this set of numbers, the number 8 occurs more than any other number--three times.
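The three calculations above can be sketched with Python's standard `statistics` module. The lesson's original list of numbers is not reproduced here, so this sketch uses a hypothetical 11-value list, chosen only so that it yields the same summary values described above (mean of about 5.9, median 7, mode 8 occurring three times). It is not the lesson's actual data.

```python
import statistics

# Hypothetical data: 11 answers to "How many times did you go out to eat last month?"
# Chosen to reproduce the summary values in the text, NOT the lesson's original numbers.
meals_out = [1, 2, 3, 4, 5, 7, 8, 8, 8, 9, 10]

mean = statistics.mean(meals_out)      # sum of the values (65) divided by the count (11)
median = statistics.median(meals_out)  # middle value once the list is sorted
mode = statistics.mode(meals_out)      # most frequently occurring value

print(f"mean   = {mean:.1f}")  # 65 / 11 = 5.909..., displayed as 5.9
print(f"median = {median}")    # five values below 7, five values above it
print(f"mode   = {mode}")      # 8 appears three times, more than any other value
```

Running it prints a mean of 5.9, a median of 7, and a mode of 8.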
Mean
The arithmetic average of a set of numbers.
Median
The middle point in a set of numbers.
Mode
The most frequently occurring number in a set.
IN CONTEXT
How can you use descriptive statistics to make sense of these numbers? Suppose they represent how many times a person went out to eat in a month: you asked eleven people on the street how many times they went out to eat last month, and these were their answers.
It's important to note that descriptive statistics describe the most typical member of the population. "Typical" is not the same as "average," because the average may differ from what is typical. There are several valid ways to represent what is "typical" in a population.
For example, you could say that the typical person went out to eat an average of 5.9 times last month. However, no one actually went out to eat 5.9 times; you can't go out to eat 0.9 of a time. Alternatively, you could say that the typical person went out to eat 8 times last month, because 8 was the most common answer (the mode), which is another statistically valid description of the typical person. Both numbers, 5.9 and 8, describe the typical person, and both are statistically valid.
These are the kinds of statements about the social world that descriptive statistics let you make.
Inferential statistics is another way to use statistical data to say something interesting about the social world. It involves applying data from a group, or sample, to the larger population as a whole.
Inferential Statistics
Applying statistical data from a group or sample to the larger population as a whole.
IN CONTEXT
If you took the data you gathered by asking those 11 people on the street how many times they went out to eat last month and applied it to American society as a whole, you would be using inferential statistics.
If you say that the average American goes out to eat 5.9 times a month, you are making an inference about the population as a whole based on the data that you gathered.
When using inferential statistics, you need a representative sample that justifies claims about the population as a whole. Suppose you believed wealthy people were more likely to carry briefcases, so you surveyed only people who were already carrying briefcases, assuming they were wealthy because they carried them.
This would not be an accurate way to make an inference about the population as a whole, because your selection process targeted a specific kind of person rather than producing a representative sample.
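The effect of a non-representative sample can be sketched with a small simulation. Everything in it is an assumption for illustration: a hypothetical population of 10,000 people in which briefcase carriers go out to eat 12 times a month and everyone else 4 times. The sketch compares a random sample, where everyone has the same chance of being picked, with a sample drawn only from briefcase carriers.

```python
import random
import statistics

# Hypothetical population (invented numbers, not real data):
# 2,000 briefcase carriers who go out to eat 12 times a month,
# 8,000 other people who go out to eat 4 times a month.
population = [("briefcase", 12)] * 2_000 + [("no briefcase", 4)] * 8_000
true_mean = statistics.mean(meals for _, meals in population)  # (24,000 + 32,000) / 10,000 = 5.6

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Representative approach: every person has the same chance of being picked.
random_sample = rng.sample(population, 500)
random_estimate = statistics.mean(meals for _, meals in random_sample)

# Biased approach: only people already carrying briefcases are asked.
carriers = [person for person in population if person[0] == "briefcase"]
biased_sample = rng.sample(carriers, 500)
biased_estimate = statistics.mean(meals for _, meals in biased_sample)

print(f"population mean:         {true_mean:.2f}")
print(f"random-sample estimate:  {random_estimate:.2f}")
print(f"briefcase-only estimate: {biased_estimate:.2f}")
```

The random sample's estimate lands near the population mean of 5.6, while the briefcase-only sample reports 12, because the selection process only ever reaches one kind of person.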
Correlation is a relationship in which two or more variables change together, at the same time. Cause and effect, by contrast, is a relationship in which a change in one variable causes a change in another variable. You may have heard the expression "Correlation is not causation": two variables can move together without one causing the other.
Correlation
A close and simultaneous relationship between two or more variables.
Cause and Effect
A relationship where change in one variable causes the other variable to change.
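One common way to put a number on a correlation, not named in this lesson, is Pearson's correlation coefficient, r, which ranges from -1 (the variables move in opposite directions) through 0 (no linear relationship) to +1 (they move together perfectly). A minimal sketch follows, using small hypothetical lists. Note that r measures only how closely the variables change together; it says nothing about which one, if either, causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far each pair sits from the means, multiplied together and summed.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for two variables measured on the same five people.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # rise together perfectly
print(pearson_r(x, [10, 8, 6, 4, 2]))  # one rises as the other falls
print(pearson_r(x, [2, 1, 4, 3, 5]))   # rise together, but not perfectly
```

The three calls print 1.0, -1.0, and 0.8. In Python 3.10 and later, `statistics.correlation` computes the same coefficient.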
IN CONTEXT
Recall Durkheim's study of suicide. See if you can come up with the same correlations that Durkheim did.
When the Berlin Wall fell, the countries of the former Soviet Union, the group of countries that had been under Soviet rule, became free and made the transition to capitalism. Suppose you looked at suicide rates over that period: for the 10 years before the transition, during the transition, and for the 10 years after it.
Suppose you find that as these economies started to produce more and gross domestic product (GDP) went up, and as these countries became more free after years of repression, suicide rates ticked up with the instability. This would be consistent with Durkheim's original findings.
You could argue, then, that there is a correlation between higher suicide rates and the fall of the Berlin Wall, along with the growing GDP, the improving economy, and the greater freedom people experienced.
What you could not argue, though, is that these things caused suicide rates to increase. A great deal was happening during this period, and other factors you did not measure likely contributed as well. Social life is very complex, so a cause-and-effect statement in this case would be tenuous.
The goal of this kind of statistical research is to establish cause-and-effect relationships. Often, though, you have to be satisfied with simply demonstrating a correlation.
In statistical research, when looking for correlations and cause-and-effect relationships, you need to watch out for what is called spurious correlation. A spurious correlation occurs when two variables appear to be related but are not actually related to each other: something else that the research missed, often an unmeasured third variable, makes them look correlated.
Spurious Correlation
A false correlation that is often the result of an unaccounted variable.
When you were researching suicide rates in the former Soviet Union, could a third variable you didn't measure have made it look as though suicide rates were correlated with rising GDP? What if that third variable was debt?
Suppose personal debt was rising along with GDP, so the two were moving together. If debt was what actually drove suicide rates up, focusing on GDP would mask the real problem, and the apparent link between GDP and suicide would be a spurious, or false, correlation.
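The debt scenario can be sketched as a simulation with invented numbers (not data from the former Soviet Union). The sketch assumes debt drives both GDP and the suicide rate, while GDP has no direct effect on suicide. GDP and suicide still come out strongly correlated. One standard way to check for a third variable is a partial correlation, which measures the GDP–suicide relationship after statistically holding debt fixed; here it drops to roughly zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation between xs and ys after statistically controlling for zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 200  # hypothetical observations, e.g. region-years

# Assumed data-generating process (standardized units, invented for illustration):
# debt pushes up BOTH GDP and the suicide rate; GDP never appears in the suicide formula.
debt = [rng.gauss(0, 1) for _ in range(n)]
gdp = [d + rng.gauss(0, 0.5) for d in debt]
suicide = [d + rng.gauss(0, 0.5) for d in debt]

print(f"r(GDP, suicide)             = {pearson_r(gdp, suicide):.2f}")
print(f"r(GDP, suicide | debt held) = {partial_r(gdp, suicide, debt):.2f}")
```

Under these assumptions the first correlation comes out close to 0.8 and the second close to 0. A researcher who never measured debt would see only the first number and might wrongly conclude that GDP and suicide are directly linked.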
A final pitfall to avoid in statistical research is the Hawthorne effect, in which subjects of a study change their behavior in response to being studied.
Hawthorne Effect
An effect that occurs when subjects of a study change their behavior in response to being studied.
Today you learned about statistics, exploring both descriptive statistics and inferential statistics. You also learned about the statistical concepts of mean, median, mode, correlation, causation, and spurious correlation as well as the Hawthorne effect.
Source: This work is adapted from Sophia author Zach Lamb.