This lesson will introduce the collection and evaluation of data, including available data, raw data, and bias.
Data is the pieces of information that we use in order to answer some statistical question. It could be a number. It could be an attribute.
But ultimately, it's the pieces of information that we use in order to get a more accurate picture of a scenario. Every piece of data helps us to get a more accurate description, which raises the question: how do you obtain data? Where does it come from? Do you just make it up? Where is data?
There are two types of data to serve your purposes. It's possible that the easier route is to go with something someone else has already done. Available data is data that has already been collected by somebody.
Now, who collects data? Well, a lot of places collect data, such as:
The vast majority of sources are trustworthy. However, when using available data, it's important to think critically about what the information is trying to convey. It's important to break apart the information and ask yourself these questions:
So, how do you know when you need to gather the information yourself? Data that you gather yourself is called raw data. Obviously, if the population in the available data doesn't match your topic of interest, then that data is of no value to you, so you need to gather it yourself.
But what about less obvious characteristics such as whether or not a source has an agenda? This is a key point here. Having an agenda, whether intentional or not, can introduce what's called bias.
Oftentimes, polling organizations, news organizations, and government entities try to do the best job they can to get relevant information. Bias is not usually introduced intentionally. But sometimes it is, when a source is trying to push some kind of agenda.
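To make the idea of bias more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the population, the exercise hours, and the 20% share of gym members are assumptions, not real data. It compares a sample in which everyone has the same chance of being chosen with a sample collected only at a gym.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly exercise hours for 10,000 people.
# Assumption for illustration only: 20% are gym members who exercise more.
gym_members = [max(0.0, random.gauss(6.0, 1.5)) for _ in range(2000)]
non_members = [max(0.0, random.gauss(2.0, 1.0)) for _ in range(8000)]
population = gym_members + non_members

true_mean = statistics.mean(population)

# Well-collected data: every person has the same chance of being chosen.
random_sample = random.sample(population, 200)

# Poorly collected data: only people found at the gym are surveyed.
gym_only_sample = random.sample(gym_members, 200)

random_estimate = statistics.mean(random_sample)
biased_estimate = statistics.mean(gym_only_sample)

print(f"population mean:         {true_mean:.2f}")
print(f"random sample estimate:  {random_estimate:.2f}")
print(f"gym-only sample estimate: {biased_estimate:.2f}")
```

In this made-up scenario, the gym-only estimate should land well above the population mean, because the way the data was collected systematically favors people who exercise more. Nobody had to intend that result for the bias to appear.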
If you choose to collect your own data, you must think critically and ask yourself these questions:
Collecting data is important because it's the source of statistics. Think about data as the raw means of creating something useful. If you collect your data well, the statistics are going to be accurate. If you collect your data poorly, then your statistics will be poor. There's no rescuing that.
You can't make good statistics out of poor data. Thinking critically will help you determine which type of data should be used for your purposes.
This tutorial defined data as “information used in a study to answer a statistical question.” We discussed how to evaluate types of data, available or raw, and how questions focusing on the who, what, why, and how should be posed to help identify bias. When gathering your own data, it’s important to understand your audience and consider how they will gain access to all your hard work.
Source: Adapted from Sophia tutorial by Jonathan Osters.
Available data: Data collected by some other entity, such as a government organization or private company.
Raw data: Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available.
Bias: The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a study.