To begin, think back to the first four steps of the experimental method. Step one is choosing two different things you think might have some sort of cause-and-effect relationship. Step two is identifying how the two things might tie together, or how the cause might produce the effect on the particular variable.
Step three is predicting how changing one variable will affect the other. The fourth step is experimenting with or testing the prediction by trying to determine if what you’re looking at has the predicted cause-and-effect relationship.
IN CONTEXT
Say you’re interested in seeing the impact that studying has on test scores. You would determine the relationship you think exists between studying and test scores, and make a prediction about the effect of changes in studying. In step four, you test the prediction. What you’re doing is establishing that you think the more hours you study for an exam, the better you’re going to do on it. You’re trying to see if the actual outcome is as predicted.
Maybe you follow the number of hours that students study. Then, you’d look at their resulting exam scores and see if the number of hours spent studying made any difference. That would be an example of how you’d actually apply step four of the experimental method.
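Here is a minimal sketch of that kind of comparison. The `records` list is made-up data, not results from any real study, and the 5-hour cutoff is an arbitrary assumption chosen only for illustration.

```python
from statistics import mean

# Hypothetical (hours studied, exam score) pairs; not real data.
records = [
    (1, 62), (2, 65), (3, 70), (4, 68),
    (6, 78), (7, 85), (8, 83), (9, 90),
]

CUTOFF_HOURS = 5  # arbitrary split point, assumed for illustration


def mean_score(rows):
    """Average exam score for a list of (hours, score) pairs."""
    return mean(score for _, score in rows)


# Split the students into those who studied less and those who studied more.
fewer_hours = [r for r in records if r[0] < CUTOFF_HOURS]
more_hours = [r for r in records if r[0] >= CUTOFF_HOURS]

low_avg = mean_score(fewer_hours)
high_avg = mean_score(more_hours)

print(f"Studied under {CUTOFF_HOURS} hours: average score {low_avg}")
print(f"Studied {CUTOFF_HOURS}+ hours: average score {high_avg}")
```

A higher average in the second group would be consistent with the prediction, though a comparison like this does not by itself establish cause and effect.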
In statistics, the goal is often to determine information about an entire group of individuals or items. The problem is that this type of information is often difficult to obtain. For this reason, it is easier to test and gather data from a small set of individuals or items that have characteristics similar to those in the larger group. Doing this enables us to make predictions about how the larger group will behave.
A population is an entire group of individuals or items that a researcher is interested in. Depending on the circumstances, the population could be an entire country or even the entire world. It could even be all plants or animals. Often, the population is a group of individuals or items with a specific characteristic of interest, such as all people with a specific level of education or all houses that have four bedrooms.
Generally speaking, populations are very large, making it impossible or very costly to obtain information about each individual in the population. Because of this, a researcher typically chooses a much smaller group from the population. This group is called a sample.
The researcher obtains the data from the sample, not the population. However, the sample should be representative of the population as a whole. If the sample is not representative of the population, then any conclusions drawn from the sample cannot be applied to the population, and the data is useless for estimating a specific aspect of the population.
IN CONTEXT
To get an idea of this, think about the polling that takes place around elections. The idea behind polling is to take a sample of people who are registered or likely voters and get a sense of how they feel about a particular issue or candidate. A poll looks at a small sample that is meant to reflect the population as a whole.
You might ask 700 registered voters how they feel about a particular issue or which candidate they prefer in this election. You would take that information from the sample and use it to make estimates about what the population itself actually looks like.
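The polling idea can be sketched with a small simulation. Everything here is assumed for illustration: the population of 100,000 voters, the 54% level of support, and the name "candidate A" are hypothetical. Only the sample size of 700 comes from the example above.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

POPULATION_SIZE = 100_000  # hypothetical number of registered voters
TRUE_SUPPORT = 0.54        # assumed share who prefer candidate A
SAMPLE_SIZE = 700          # number of voters polled, as in the example

# Build the hypothetical population: True means "prefers candidate A".
supporters = round(POPULATION_SIZE * TRUE_SUPPORT)
population = [True] * supporters + [False] * (POPULATION_SIZE - supporters)

# Draw a simple random sample without replacement.
sample = random.sample(population, SAMPLE_SIZE)

# Share of supporters in the whole population and in the sample.
population_share = sum(population) / len(population)
sample_share = sum(sample) / len(sample)

print(f"Share in the full population: {population_share:.3f}")
print(f"Share in the sample of {SAMPLE_SIZE}: {sample_share:.3f}")
```

In a real poll you would only ever see the second number. The simulation shows both so you can compare the estimate from the sample with the value it is trying to estimate.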
In the world of statistics, there are specific terms used when speaking about data for a population or a sample. A parameter is a measurable quantity or characteristic about the population.
IN CONTEXT
Say you wanted to estimate the average value of houses across the entire United States. The population parameter, the piece of information about the population you’re interested in, is the average value of homes across the entire country.
Because it’s generally very difficult to obtain information about every individual in a population, the exact values of parameters are usually unknown. There are, however, some exceptions, and the United States Census is one of them. The census is required by law to count the population of the United States every 10 years, and one such count took place in 2010.
IN CONTEXT
What if you were interested in the total number of children under the age of 18 in the year 2010? According to the US Census Bureau, there is actually a number for that, because everyone was counted. The number that the bureau reports, according to their website, is 74,181,467. That is the actual number of children under the age of 18 in the year 2010. These results are parameters, because they come from the entire population.
You don’t often have access to an entire population. What if a researcher only has a sample? Any measurable quantity or characteristic of a sample is called a statistic.
IN CONTEXT
If a pollster contacts registered voters to ask how they plan on voting in an upcoming election, the results of the survey are a statistic. The value of a statistic can be significantly different from that of a parameter, depending on how the sample was obtained and how many observations are in the sample.
A larger sample will generally give better information about a population than a smaller sample, assuming that both samples were collected in the same way. The larger the sample is, the more likely it is that data from it will represent the actual population.
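A rough way to see the effect of sample size is to draw many samples of different sizes from the same population and measure how far each sample average lands from the population average. The population below is made-up data, and the sample sizes and the number of trials are arbitrary choices for this sketch.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 made-up values between 0 and 40.
population = [random.uniform(0, 40) for _ in range(50_000)]
population_mean = fmean(population)


def average_error(sample_size, trials=200):
    """Average distance between the sample mean and the population mean."""
    errors = []
    for _ in range(trials):
        # Every sample is drawn the same way: simple random sampling.
        sample = random.sample(population, sample_size)
        errors.append(abs(fmean(sample) - population_mean))
    return fmean(errors)


errors_by_size = {n: average_error(n) for n in (10, 100, 1000)}

for n, err in errors_by_size.items():
    print(f"sample size {n:>4}: average error {err:.2f}")
```

Because every sample is drawn in the same way, the only thing that changes between rows is the sample size, which matches the condition stated above.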
IN CONTEXT
Say you were interested in determining the average number of hours of television that American adults watch per week. Now, that’s a situation where the parameter would actually be the average for every adult in the entire United States. However, that’s not necessarily something we can easily measure.
Instead, you decide to survey a representative sample of the population and ask people how many hours of television they watch. The population parameter would be the average for the entire United States. However, in this case, the statistic would be the average of your sample.
Remember, a population consists of all the individuals or items of interest, often those that share a certain characteristic. A sample consists of individuals or items drawn from that population.
To see the difference between a statistic and a parameter, consider average annual snowfall. You could look at the average snowfall in one particular city. Then you could look at the average snowfall in a city in a completely different state. You’d probably find a significant difference, because the snowfall at one location may not be representative of the population as a whole. That is the key distinction between average annual snowfall across the entire country and the average within a particular area. It is the same as the distinction between the population and the sample, or between the parameter and the statistic.
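The snowfall comparison can be sketched with a tiny, invented population. The location names and snowfall figures below are assumptions for illustration only, not real weather records.

```python
import random
from statistics import fmean

random.seed(2)  # fixed seed so the sketch is repeatable

# Made-up average annual snowfall (inches) for hypothetical locations.
snowfall_by_location = {
    "mountain_town": 120.0,
    "northern_city": 60.0,
    "midwest_city": 35.0,
    "coastal_city": 10.0,
    "southern_city": 1.0,
    "desert_city": 0.0,
}

# Parameter: the average over every location in this small population.
parameter = fmean(snowfall_by_location.values())

# Statistic from a non-representative sample: one snowy location.
one_city_statistic = snowfall_by_location["mountain_town"]

# Statistic from a random sample of three locations.
sampled_names = random.sample(list(snowfall_by_location), 3)
random_statistic = fmean(snowfall_by_location[name] for name in sampled_names)

print(f"Parameter (all locations): {parameter:.1f}")
print(f"Statistic (one snowy town): {one_city_statistic:.1f}")
print(f"Statistic (random sample of 3): {random_statistic:.1f}")
```

The single snowy town overstates the average for the whole group. The random sample's result depends on which locations happen to be drawn, and with only three locations it can still miss the parameter.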
Source: This work is adapted from Sophia author Dan Laub.