The world is full of endless streams of data, and new methods of measurement constantly produce more of it. Raw, unsorted data, such as a list of numbers, isn't useful to a sociologist unless there is a way to organize it to say something about the social world.
One way to organize raw data is to calculate the mean, median, and mode, which together describe the typical individual in a group. This is known as descriptive statistics: using statistics to say something about the typical member of a population, which is not necessarily the same thing as the average.
Consider this set of numbers:
The mean, which is another word for the arithmetic average, is 5.9. If you added these 11 numbers together and divided the total by 11, you'd get about 5.9.
The median, which is the middle number in a set, is 7. Once your numbers are ordered from least to greatest, the one right in the middle is the median. You know that 7 is the median of this set because it sits in the exact middle of the ordered set: five numbers fall to the left of the 7, and five fall to the right.
The mode of this set is 8. The mode is the most frequently occurring number in a set, and here the number 8 occurs more often than any other number (three times).
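The three measures are simple enough to compute by hand, which makes the definitions above concrete. The Python sketch below is a minimal illustration, not part of the original lesson. Because the original set of numbers is not reproduced in this text, it uses a hypothetical set of 11 whole numbers chosen so that its mean, median, and mode match the values described above (about 5.9, 7, and 8). The helper names `mean`, `median`, and `mode` are illustrative.

```python
from collections import Counter


def mean(values):
    """Arithmetic average: add the values, then divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value once the data are sorted from least to greatest."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count (like 11): exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequently occurring value (the first one encountered if tied)."""
    value, _count = Counter(values).most_common(1)[0]
    return value


# HYPOTHETICAL responses: times each of 11 people ate out last month.
# This is NOT the original data set, which is not reproduced in the text.
meals_out = [8, 3, 7, 10, 1, 8, 5, 9, 2, 8, 4]

print(round(mean(meals_out), 1))  # 5.9  (65 / 11 = 5.909...)
print(median(meals_out))          # 7
print(mode(meals_out))            # 8
```

Python's standard `statistics` module provides `statistics.mean`, `statistics.median`, and `statistics.mode`, which return the same values for this list; the hand-written versions are shown only to make each definition visible.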
IN CONTEXT
How can you use this information to make sense of something with descriptive statistics?
Suppose these numbers represent how many times a person goes out to eat in a month. You asked 11 random people on the street how many times they went out to eat last month, and these are their answers.
For example, you could say that the typical person went out to eat an average of 5.9 times last month. However, no one actually went out to eat 5.9 times; in fact, you can’t go out to eat a fraction of a time.
Another way to describe the typical person is to say that the most common answer was 8 times last month (the number that occurred most often). This is another statistically valid statement, even though only three of the 11 people gave that answer.
Both numbers, 5.9 and 8, are statistically valid descriptions of the typical person. These are the kinds of statements descriptive statistics let you make about the social world.
Inferential statistics is another way to use statistical data to say something about the social world. It involves taking data gathered from a smaller group (a sample) and applying it to the larger population as a whole.
IN CONTEXT
If you took the data you gathered by asking those 11 people on the street how many times they went out to eat last month and applied it to American society as a whole, you would be using inferential statistics.
If you say that the average American goes out to eat 5.9 times a month, you are making an inference about the population as a whole based on the data that you gathered.
Correlation is a relationship in which two or more variables change together, at the same time. Cause and effect, by contrast, is a relationship in which a change in one variable produces a change in another.
You may have heard the expression “Correlation is not causation.” Cause and effect is, in fact, different from correlation: two variables can change together without either one causing the other.
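One way to see why correlation alone does not establish cause and effect is a small hypothetical example in which two variables are both driven by a third. The Python sketch below computes Pearson's correlation coefficient r, a common measure of how strongly two variables change together (it ranges from -1 to 1). The helper `pearson_r`, the variable names, and all numbers are invented for illustration. Neither outcome is computed from the other; both follow the same hidden factor, yet they are strongly correlated.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL hidden factor, e.g. weekly summer temperature over 8 weeks.
hidden = [10, 12, 15, 18, 21, 24, 26, 28]

# Two invented outcomes computed ONLY from the hidden factor (plus small
# fixed deviations so the pattern is not exact), never from each other.
ice_cream_sales = [3 * h + d for h, d in zip(hidden, [2, -1, 0, 1, -2, 1, 0, -1])]
pool_visits = [5 * h + d for h, d in zip(hidden, [-3, 2, 1, -2, 0, 3, -1, 1])]

r = pearson_r(ice_cream_sales, pool_visits)
print(f"r = {r:.2f}")  # close to 1, although neither variable causes the other
```

Python 3.10 and later also include `statistics.correlation`, which computes Pearson's r; the formula is written out here so each step is visible.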
IN CONTEXT
Recall Durkheim’s study of suicide. See if you can identify the same kind of correlations that Durkheim did.
When the Berlin Wall fell, the countries of the former Soviet Union (the group of countries that had been under Russian rule) became free and made the transition to capitalism. Suppose you looked at suicide rates across that period: for the 10 years before the transition, during the transition, and for the 10 years after it.
You find that as the economy began to produce more and gross domestic product rose, and as these countries became freer than they had been under repression, suicide rates ticked up along with the instability. This fits with Durkheim’s original findings.
You could argue, then, that higher suicide rates are correlated with the fall of the Berlin Wall, with the growing gross domestic product and improving economy, and with the greater freedom people experienced.
What you could not argue, though, is that these things caused suicide rates to increase. So much was happening during this period, and other factors you did not measure likely contributed as well. Social life is very complex, so a cause-and-effect claim in this case would be tenuous.
A final pitfall to watch for in statistical research is the Hawthorne effect, in which the subjects of a study change their behavior because they know they are being studied.
Source: This work is adapted from Sophia author Zach Lamb.