Source: Image of exam, PD, http://bit.ly/1Ja9ux7; Image of man with book, PD, http://bit.ly/1NvFJbY; Image of earth, PD, http://bit.ly/1QsRl5B; Image of ballot box, PD, http://bit.ly/1NvGlOz; Image of vote sticker, PD, http://bit.ly/1Mk7mTl; Image of house, PD, http://bit.ly/1mbmuxe; Image of Census Seal, PD, http://bit.ly/1TT3K0m; Image of TV, PD, http://bit.ly/1Qr5CiM; Image of snowfall, PD, http://bit.ly/1UGHZlg; Image of snow map, PD, http://bit.ly/1OauDZ5
[SOUND EFFECTS] Hi, Dan Laub, here. And in this lesson, we're going to talk about classifying data by who you are testing. The objectives for this lesson are as follows. First, we want to understand the difference between a population and a sample. Second, we want to realize the difference between parameters and statistics.
So just as a brief reminder, let's talk about the first four steps of the experimental method. The first step is choosing two different things we think might have some sort of cause and effect relationship. The second step is making a guess as to how they actually might tie together, so how the cause might actually have the effect on the particular variable. The third step is predicting what we think will happen once we change one thing and how it affects the other. And the fourth step is basically testing or experimenting with the prediction by trying to determine if what we're looking at has the cause and effect relationship that was predicted.
As a simple example, what if we're interested to see the impact that studying has on test scores? And so if we refer, here, to step four, we're testing with the prediction. So what we're doing is establishing the fact that we think that the more hours you study for an exam, the better you're going to do on it. And we're trying to see if the actual outcome is as predicted.
Maybe we follow the number of hours that students study. And then, we'd look at their resulting exam scores and see if it made any difference. And that would be an example of how we'd actually apply step four, here, of the experimental method.
In statistics, the goal is to determine information about an entire group of individuals or items. But the problem is that this type of information is often difficult to obtain. This being the case, it is easier to test and gather data from a small set of individuals or items that have similar characteristics to the larger group. Doing this enables us to make predictions regarding how the larger group will behave.
A population is an entire group of individuals or items that a researcher is interested in. Depending on the circumstances, this population could be an entire country or even the entire world. It could even be all plants or animals. But often, the population is a group of individuals or items with a specific characteristic of interest-- for example, all people with a specific level of education or all houses that have four bedrooms.
Generally speaking, populations are likely very large. So it is usually impossible or very costly to obtain information about each individual in the population. Since populations are generally so large, a researcher typically chooses a much smaller group from the population. This group is called a sample.
The researcher obtains their data from the sample, not the population. However, the sample should be representative of the population as a whole. If the sample is not representative of the population, then any conclusions drawn from the sample cannot be applied to the population, and the data is useless for estimating a specific aspect of the population.
So as an example, let's talk about polling data that takes place around elections. And so the idea behind polling data is to take a sample of people that are registered voters, or likely voters, and to get a sense for how they feel about a particular issue or particular candidate. And the idea, here, is to basically look at a small sample, representative of the population, that reflects the overall population as a whole. So in this case, we might ask, say, 700 registered voters, how do you feel about this particular issue, or which candidate do you prefer in this election? And what we do is take that information from the sample, and then use it to make estimates about what the population itself actually looks like.
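To make the polling idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the population of 100,000 voters, the 52% who favor candidate A, the `poll` function, and the seed. It simply draws 700 voters at random and compares the share in the sample with the share in the whole group.

```python
import random

# Hypothetical population: 100,000 registered voters, 52% of whom favor
# candidate A. Both numbers are made up for illustration.
POPULATION_SIZE = 100_000
SUPPORTERS = 52_000

# 1 = favors candidate A, 0 = does not.
population = [1] * SUPPORTERS + [0] * (POPULATION_SIZE - SUPPORTERS)


def poll(voters, sample_size, seed):
    """Ask a simple random sample of voters; return the share favoring A."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(voters, sample_size)
    return sum(sample) / sample_size


# The share in the whole population is what the poll is trying to estimate.
population_share = sum(population) / len(population)
sample_share = poll(population, sample_size=700, seed=1)

print(f"Share in the whole population: {population_share:.3f}")
print(f"Share in the sample of 700:    {sample_share:.3f}")
```

In a real poll we would never see the first number, only the second. The sketch can show both because we built the population ourselves.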
In the world of statistics, there are specific terms used when speaking about data for a population or a sample. For instance, a parameter is a measurable quantity or characteristic about the population. Let's look at the example of houses. Say, for instance, we wanted to estimate the average value of houses across the entire United States. Well, the population parameter, or the piece of information about the population we're interested in, is the average value of homes across the entire country.
Because it's generally very difficult to obtain information about every individual in a population-- in this case, houses throughout the United States-- the exact values of parameters are usually unknown. There are, however, some exceptions, and the United States Census is one of them. So the Census is required by law, every 10 years, to count the population in the United States. And the last census we had was in the year 2010.
So as an example, what if we were interested in the total number of children under the age of 18 in the year 2010? Well, according to the Census Bureau, we actually have a number for that, because what they did was they went out and counted everyone. They're required to do so. Now, obviously, it's a very lengthy and expensive process, but it's legally required, and therefore, it was done.
So the number that the Census Bureau reports, according to their website, is 74,181,467. That is the actual number of children under the age of 18 in the year 2010. Results like this are parameters, because they come from the entire population.
Now, we don't often have access to an entire population. What if a researcher only has a sample? Well, any measurable quantity or characteristic related to the sample would be called a statistic.
For example, if a pollster contacts registered voters to ask how they plan on voting in an upcoming election, the results of the survey are a statistic. The value of a statistic can be significantly different than that of a parameter, depending on how the sample was obtained and how many observations are in the sample. A larger sample will generally give better information about a population than a smaller sample, assuming that both were conducted in an identical manner. The larger the sample is, the more likely any data that comes from it will represent the actual population.
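The effect of sample size can be sketched with a small simulation. This is only an illustration under stated assumptions: the population of 50,000 numbers between 0 and 100, the sample sizes of 25 and 2,500, the `typical_error` function, and the seed are all made up. It repeats the sampling many times and measures how far, on average, the sample mean lands from the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 50,000 measurements spread between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(50_000)]
parameter = statistics.fmean(population)  # the population mean


def typical_error(sample_size, repeats=200):
    """Average distance between a sample mean and the population mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        statistic = statistics.fmean(sample)  # the sample mean
        errors.append(abs(statistic - parameter))
    return statistics.fmean(errors)


small_error = typical_error(25)
large_error = typical_error(2_500)

print(f"Typical error with samples of 25:    {small_error:.2f}")
print(f"Typical error with samples of 2,500: {large_error:.2f}")
```

Both sets of samples are drawn the same way, so the only thing that differs is how many observations each one contains.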
So as an example of a statistic, let's say we were interested in determining the average number of hours of television that American adults watch per week. Now, that's a situation where the parameter would actually be the average for every adult in the entire United States. However, that's not necessarily something we can easily measure.
So what if we decided to take a survey of a representative sample of the population, and we were to ask them a general question, how many hours of television do you watch per week? In a case like this, the population parameter would be the average for the entire United States. However, in this case, the statistic would be, well, what is the average of our sample? Of all the people that we asked in our sample, what was the average number of hours of television they watched per week?
Remember, a population consists of all individuals, or a group of individuals, who possess a certain characteristic-- in this case, they happen to watch television, and they happen to be adults-- while a sample consists of individuals selected from that population. In this case, the sample would be the people that we actually surveyed in order to find out the average number of hours of television they watched.
As another example, let's look at the average annual snowfall across the entire United States. So that would be the population parameter. Now, how would we go about determining that? Well, we could look at sample statistics.
So we could look at the average snowfall in one particular city. We could look at the average snowfall in a completely different state. And we would look at that and we'd realize there's a significant difference. And the reason is that the snowfall at one location may very well not be representative of the population as a whole. And so that would be the key distinction there between average annual snowfall across the entire country versus the average within a particular area-- so in other words, the sample versus the population, or the statistic versus the parameter.
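The snowfall example can be sketched the same way. The numbers here are invented, not real weather data: 1,000 hypothetical locations, half of them snowy (around 60 inches a year) and half mild (around 2 inches). The sketch compares a sample taken only from the snowy locations with a sample taken at random from all of them.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical annual snowfall, in inches, at 1,000 locations.
# Negative draws are clipped to zero, since snowfall cannot be negative.
snowy = [max(0.0, rng.gauss(60, 10)) for _ in range(500)]
mild = [max(0.0, rng.gauss(2, 1)) for _ in range(500)]
all_locations = snowy + mild

# Parameter: the average over every location in the population.
parameter = statistics.fmean(all_locations)

# Statistic from an unrepresentative sample: snowy locations only.
one_region_stat = statistics.fmean(rng.sample(snowy, 100))

# Statistic from a random sample drawn across all locations.
random_stat = statistics.fmean(rng.sample(all_locations, 100))

print(f"Population average (parameter):  {parameter:.1f}")
print(f"Snowy-only sample (statistic):   {one_region_stat:.1f}")
print(f"Random sample (statistic):       {random_stat:.1f}")
```

The two samples are the same size, so the gap between them comes from where the sample was taken, not from how many locations it holds.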
So let's discuss the objectives and see if we actually covered them. The first objective was to understand the difference between a population and a sample. We did. A population is the entire group that we're interested in, whereas a sample is a smaller group selected from that population.
The second objective was to realize the difference between parameters and statistics. Parameters are simply values that describe the population, whereas statistics are values that describe the sample.
So again, my name is Dan Laub. And hopefully, you got some value from today's lesson.
Parameter: A piece of information about the population.
Population: The entire group of individuals that a researcher is interested in.
Sample: A smaller group a researcher selects from the population.
Statistic: A piece of information about the sample.