This lesson introduces the collection and evaluation of data, including what data is, where it comes from (available data and raw data), and how bias can affect it.
Data is information used to answer a statistical question. It could be a number, or it could be an attribute.
Ultimately, data is the collection of pieces of information we use to get a more accurate picture of a scenario, and every piece helps sharpen that description. This raises the question: how do you obtain data? Where does it come from? Do you just make it up?
Data: Information used in a study to answer a statistical question
There are two types of data to serve your purposes: available data and raw data.
Often the easier route is to use what someone else has already done. Available data is data that has already been collected by somebody else.
Available Data: Data collected by some other entity - a government organization or private company
Now, who collects data? Many organizations do, such as government agencies, private companies, polling organizations, and news organizations.
The vast majority of sources are trustworthy. However, when using available data, it's important to think critically about what the information is trying to convey. Break the information apart and ask yourself who collected it, what was collected, why it was collected, and how it was collected.
So how do you know when you need to gather the information yourself? If the population the data describes doesn't match your topic of interest, the data is of no value to you.
But what about less obvious characteristics, such as whether a source has an agenda?
This is a key point. An agenda, whether intentional or not, can introduce what's called bias.
Bias: The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a study.
Polling organizations, news organizations, and government entities usually try to do the best job they can to gather relevant information, so bias is usually not introduced on purpose. Sometimes, though, it is, when a source is trying to push an agenda. That is why you have to be careful.
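To make "systematic favoring of certain outcomes" concrete, here is a minimal Python sketch using simulated numbers only. The population values, the sample size, and the biased_sample function are hypothetical assumptions for illustration: the flawed method simply makes members with larger values more likely to be selected, as if they were more eager to respond.

```python
import random
import statistics

# Hypothetical population of 10,000 simulated values (e.g., weekly exercise
# hours). These are made-up numbers for illustration, not real data.
rng = random.Random(42)
population = [rng.uniform(0, 10) for _ in range(10_000)]

def random_sample(pop, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(pop, n)

def biased_sample(pop, n, rng):
    """Hypothetical flawed method: members with larger values are more
    likely to be selected, so higher outcomes are systematically favored."""
    return rng.choices(pop, weights=[x + 0.1 for x in pop], k=n)

true_mean = statistics.mean(population)
fair = statistics.mean(random_sample(population, 1_000, rng))
biased = statistics.mean(biased_sample(population, 1_000, rng))

print(f"population mean: {true_mean:.2f}")
print(f"random sample:   {fair:.2f}")
print(f"biased sample:   {biased:.2f}")
```

The random sample lands close to the population mean, while the biased sample lands noticeably higher. Nothing about the arithmetic is wrong; the error comes entirely from how the data was collected.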
If data is not available, or if you don't trust the sources, you can collect it yourself. Data you collect this way is called raw data. Collecting it is a lot more difficult than using data that is already available, but sometimes it's necessary.
Raw Data: Unorganized, unprocessed and not summarized. Typically, this is data that is not already available.
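As a rough illustration of the difference between raw data and a summary of it, the Python sketch below starts from a hypothetical set of survey answers recorded exactly as people typed them, then processes them into counts and proportions. The survey question, the responses, and the summarize function are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical raw answers to "What is your favorite season?", recorded
# exactly as typed: unorganized, unprocessed, and not summarized.
raw_responses = ["Summer", "fall", " Winter", "summer", "Spring",
                 "FALL", "summer ", "winter", "Fall", "summer"]

def summarize(responses):
    """Turn raw text answers into (count, proportion) per category."""
    cleaned = [r.strip().lower() for r in responses]  # unify spacing and case
    counts = Counter(cleaned)
    total = len(cleaned)
    # most_common() lists categories from most to least frequent
    return {season: (n, n / total) for season, n in counts.most_common()}

for season, (n, share) in summarize(raw_responses).items():
    print(f"{season:<7} {n:>2}  {share:.0%}")
```

Here summer appears 4 times (40%), fall 3 times, winter twice, and spring once. Note that the cleaning step is itself a choice made by whoever collects the data.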
Now, if you choose to collect your own data, then you must think critically and ask yourself these questions:
Collecting data well matters because data is the source of statistics. Think of data as the raw material for creating something useful. If you collect your data well, your statistics will be accurate. If you collect your data poorly, your statistics will be poor, and there's no rescuing that.
You can't make good statistics out of poor data. Thinking critically will help you determine which type of data to use for your purposes.
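One way to see why poorly collected data can't be rescued is that collecting more of it the same way does not fix it. The Python sketch below is a simulation under assumed conditions: a made-up population and a hypothetical flawed method (in biased_mean) that selects each member with probability proportional to its value. As the sample grows, the error settles near a fixed offset instead of shrinking toward zero.

```python
import random
import statistics

# Hypothetical simulated population (e.g., weekly exercise hours);
# these values are invented for illustration only.
rng = random.Random(7)
population = [rng.uniform(0, 10) for _ in range(50_000)]
true_mean = statistics.mean(population)

def biased_mean(n):
    """Mean of a sample drawn with a flawed method that over-selects
    larger values (selection weight proportional to the value)."""
    sample = rng.choices(population, weights=population, k=n)
    return statistics.mean(sample)

# Larger samples collected the same flawed way: the error does not vanish.
errors = {n: biased_mean(n) - true_mean for n in (100, 1_000, 10_000)}
for n, err in errors.items():
    print(f"n={n:>6}: error = {err:+.2f}")
```

A larger sample only makes the estimate more precise around the wrong value; it cannot correct for how the data was gathered.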
This tutorial defined data as “information used in a study to answer a statistical question.” We discussed how to evaluate the two types of data, available and raw, and how asking about the who, what, why, and how of a data source can help identify bias. When gathering your own data, it’s important to understand your audience and consider how they will gain access to all your hard work.
Source: This tutorial is adapted from the work of Sophia author, Jonathan Osters.