Source: Intro Music by Mark Hannan; Public Domain
Hello, and welcome to sociological studies. As always, thank you for taking the time out of your busy day to study society. The topic of today's lesson is going to be statistics. We're going to discuss inferential statistics and descriptive statistics, as well as some statistical concepts like mean, median, and mode. Statistics can be kind of intimidating for people, so I hope we can break it down very simply and easily. So I hope I've done that for you.
But before we start, let's discuss mean, median, and mode. Well, the world is full of endless streams of data, and we're constantly getting new data from new ways to measure things. Unsorted data in raw form really isn't useful for the sociologist. Like this set of numbers really isn't useful, unless we have a way to put it together to say something about the social world.
One of the ways we do this-- one of the ways we just put raw data together-- is to use mean, median, and mode to say something descriptive about somebody. And that's what we call descriptive statistics, which is the process of using statistics to say something about the typical member of a population-- not the average, just the typical.
So now, look at this set of numbers. One, two, two-- sorry. I got a little carried away there, and I accidentally erased the bottom of the two. So 1, 2, 2, 4, 6, 7, 8, 8, 8, 9, 10-- I've taken the liberty of just organizing them in a row here.
So first, the mean, which is 5.9-- the way to think of mean is to think of average. If you added these 11 numbers together, you'd get 65, and if you divided that by 11, you'd get about 5.9. That is mean. It's the arithmetic average.
So now median, which is seven-- median is the middle number in a set. So once you have your numbers organized, the one right in the middle is the median. And we know seven is in the middle because you have five numbers on this side and five numbers on that side. So that's why seven is the median. It's in the middle.
And mode, then, is eight. We have eight as the mode because mode is the most frequently occurring number in the set, and that's definitely eight. We have three eights.
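To check the three numbers above, here is a minimal sketch using Python's standard `statistics` module. The variable names are just for illustration; the data is the same set of 11 numbers from the lesson.

```python
from statistics import mean, median, mode

# The 11 numbers from the lesson, already organized in a row.
responses = [1, 2, 2, 4, 6, 7, 8, 8, 8, 9, 10]

average = mean(responses)      # add the values, then divide by how many there are
middle = median(responses)     # the value with five numbers on either side
most_common = mode(responses)  # the value that occurs most often

print(round(average, 1), middle, most_common)  # prints: 5.9 7 8
```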
So to get a sense of how we can use this information and make sense of something using descriptive statistics, let's suppose that these numbers represent the number of times a person goes out to eat within a month. Suppose I was on the street and I asked 11 people that walked by how many times did you go out to eat last month. And these are what they said. These are the numbers they responded with.
So descriptive statistics is, again, a way to describe the most typical member of the population-- typical, not average, because average might be different. Typical could be the mean, the median, or the mode, and any of those numbers makes a statistically valid statement. So that's why we say typical and not average. So then the typical person went out to eat an average of 5.9 times last month.
But really, nobody went out to eat 5.9 times. In fact, you can't even go out to eat 0.9 of a time. So another way to say it is that the most common answer was eight times last month, and that's another statistically valid statement about the typical person. So both numbers, 5.9 and 8, describe the typical person, and are statistically valid. These are the kinds of statements we want to make about the social world using descriptive statistics.
Another way to use statistical data to say something interesting is inferential statistics, which is applying data from a group to the larger population as a whole. I would be doing inferential statistics if I took my data set here that I obtained on the street by asking those 11 people how many times they went out to eat last month-- if I then applied that to American society as a whole. So I would say then that the average American goes out to eat 5.9 times a month. I would be making an inference about the population as a whole based on the data that I gathered.
So when doing inferential statistics, we need to be careful that we have a representative sample that allows us to make these kinds of claims about the population as a whole. What if I asked only really wealthy people who were carrying briefcases? I decided to only ask them. I thought they were wealthy because they had their briefcase. This might not be a very accurate way to make an inference about the population as a whole. So this is a concern with inferential statistics.
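The briefcase problem can be sketched in a few lines of Python. Everything here is an assumption made up for illustration: the ten-person population, the briefcase flag, and the counts are invented, not real survey data. The point is only that a sample drawn from one kind of person can give a very different answer than the population as a whole.

```python
from statistics import mean

# Hypothetical population, invented for illustration. Each entry is
# (carries_briefcase, times_eaten_out_last_month).
population = [
    (True, 12), (True, 10), (True, 14),
    (False, 4), (False, 3), (False, 5), (False, 2),
    (False, 6), (False, 4), (False, 2),
]

# The average for everyone in the population.
population_mean = mean([times for _, times in population])

# A non-representative sample: only the people carrying briefcases.
briefcase_sample = [times for has_briefcase, times in population if has_briefcase]
biased_estimate = mean(briefcase_sample)

print("whole population:", population_mean)
print("briefcase-only sample:", biased_estimate)
```

In this made-up data, the briefcase-only sample suggests people eat out about twice as often as the population actually does, which is why the inference would be unreliable.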
Next, I want to talk about the difference between correlation and causation. We can define correlation as a relationship between variables where two or more variables change together. They change at the same time, whereas cause and effect is a relationship where change in one variable causes change in another variable. And you may have heard the expression correlation is not causation. So cause and effect is, in fact, different from a correlation. I'll explain that now.
So I was an economics major as an undergraduate, and I had to do a thesis to finish economics. I was interested in sociology, so I looked at that study by Durkheim on suicide, and I wanted to see if I could modernize it and come up with the same correlations that Durkheim did. So I looked at the former Soviet Union countries as they became capitalist-- that group of countries that were under Russian rule, part of the Soviet Union-- when the Berlin Wall fell and they all became free. So they made the transition to capitalism.
I looked at suicide rates over that period, for 10 years before the transition, and during, and 10 years after the transition. And what I actually found was that as the economy started to produce more, as the gross domestic product went up, and at the same time as these countries became more free-- they were more repressed before-- suicide rates ticked up with the instability. Just as Durkheim originally found and predicted, my research confirmed what he found.
So correlation, then-- higher suicide rates, I argued, were correlated with the fall of the Berlin Wall and the growing gross domestic product-- the bettering of the economy-- and the more freedom you experienced. So I argued that this was a correlation.
I couldn't have argued, though, that these things caused suicide rates to increase. I couldn't have argued this because there was so much happening. There was so much going on. There were things that maybe I didn't measure that contributed to this as well. Social life, like we've talked about, is very complex. So for me to make a cause and effect statement about that-- it was very tenuous.
But that is our goal when we're doing this kind of research. We want to generate cause and effect statements. But often, we're happy with proving or showing a correlation.
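One common way to put a number on how closely two variables change together is the Pearson correlation coefficient, which runs from -1 to 1. The lesson does not say which measure the thesis used, so treat this as a sketch of one standard option. The function name `pearson_r` and both data series are invented for illustration; they are not the actual figures from the research.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far each pair sits from the two means, multiplied and summed.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)

# Hypothetical yearly values, invented for illustration only.
gdp_index = [100, 104, 109, 115, 122]
suicide_rate = [20.0, 20.5, 21.5, 22.0, 23.5]

r = pearson_r(gdp_index, suicide_rate)
print(round(r, 2))  # close to 1: the two series rise together
```

A value near 1 says the variables move together. It says nothing about whether one causes the other.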
When doing this kind of statistical research, when looking for correlations and cause and effect relationships, something that the researcher needs to look out for is what's called spurious correlation. Spurious correlation happens when variables appear to be related, but in actuality are not. There's something else going on that the researcher missed that makes it look like these variables are correlated, but they're not.
So for example, when I was doing my research on suicide, what if there's a third variable I didn't measure that made it seem like suicide rates were correlated with rising GDP? What if there's a third variable then, like debt? Say personal debt was on the increase with the increasing GDP. Those things move together. But me putting GDP in there masked the fact that debt was really the problem. So this would be a spurious correlation. I would have arrived at a false correlation.
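The debt example can be sketched with a small made-up data set. All of the records, the "low" and "high" debt labels, and the helper names are assumptions invented for illustration. The idea is to compare the correlation across all records with the correlation inside each debt group: if the relationship disappears once debt is held fixed, the overall correlation was spurious.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)

# Hypothetical records, invented for illustration:
# (debt_level, gdp_index, suicide_rate)
records = [
    ("low", 1, 10), ("low", 2, 12), ("low", 3, 10),
    ("high", 5, 20), ("high", 6, 22), ("high", 7, 20),
]

def r_for(rows):
    """Correlation between GDP and the suicide rate within the given rows."""
    return pearson_r([gdp for _, gdp, _ in rows], [rate for _, _, rate in rows])

overall_r = r_for(records)
low_debt_r = r_for([row for row in records if row[0] == "low"])
high_debt_r = r_for([row for row in records if row[0] == "high"])

print("all records:", round(overall_r, 2))
print("low debt only:", round(low_debt_r, 2))
print("high debt only:", round(high_debt_r, 2))
```

In this invented data, GDP and the suicide rate look strongly correlated overall, but within each debt group the correlation is zero. Debt, the variable that was left out, is what makes the two appear related.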
Along with spurious correlation, then, a final thing we want to make sure we avoid and look out for is what's called the Hawthorne effect, which is where subjects of a study will change their behavior in response to being studied.
Well, I hope you enjoyed learning about statistics, descriptive statistics, and inferential statistics, and mean, median, and mode, correlation, cause and effect, and the Hawthorne effect, as well as spurious correlation. Thank you very much for joining me, and have a great rest of your day.
Cause and Effect: A relationship where change in one variable causes the other variable to change.
Correlation: A close and simultaneous relationship between two or more variables.
Descriptive Statistics: A form of statistical analysis that seeks to make statements about the most "typical" member of a group.
Hawthorne Effect: An effect that occurs when subjects of a study change their behavior in response to being studied.
Inferential Statistics: Applying statistical data from a group or sample to the larger population as a whole.
Mean: The arithmetic average of a set of numbers.
Median: The middle point in a set of numbers.
Mode: The most frequently occurring number in a set.
Spurious Correlation: A false correlation that is often the result of an unaccounted variable.