Author: Jonathan Osters

This lesson will introduce the collection and sorting of data.

Video Transcription

This tutorial is going to explain exactly what we mean by data. Now, you've probably heard this term before: data. But what does it really mean? Well, data are the pieces of information that we use to answer some statistical question. A piece of data could be a number. It could be an attribute. But ultimately, data are the pieces of information we use to get a more accurate picture of the scenario. Every piece of data helps us get a more accurate description, which raises the question: how do you obtain data? Where does it come from? Do you just make it up? Where do you get it?

Well, there are two ways you can get data to serve your purposes. One way is sort of the easy route: go with something someone else has already done. That's called available data, meaning you use data that's already been collected by somebody else. Now, who collects data? Well, a lot of places do. There are government organizations. There are polling organizations and news sources. Government entities collect lots and lots and lots of data, and the vast majority of it is very trustworthy and available to the public. Private entities also collect very reliable data.

The other way, if the data aren't available or you don't trust the sources, is to go and collect it yourself. Grab your clipboard, grab your pen, and collect it yourself. That's called raw data. Now, that's a lot more difficult than using data that's already out there and available. But sometimes, that's exactly how we need to do it.

If you are using available data, it's very important that you think critically about what the information is trying to convey. Who collected it? Are they reputable? Do you trust them? When was it collected? If it's out of date, it might not be very useful to you.

How was it collected? This is key. If you're not getting it from the population that you want to talk about, then it's not of any use to you at all. And finally, why did they collect it? Look at it with a critical eye. Do they have an agenda? Are they trying to push some type of agenda on you? If they are, then maybe the data's not really all that trustworthy and you should think about gathering it yourself.

Those last two questions, how the data were collected and why, are very, very key, because they can introduce what's called bias. Bias is a systematic favoring of certain outcomes in a study. Oftentimes, polling organizations, news organizations, and government entities are trying to do the best job they can to get relevant information, so bias usually isn't introduced intentionally. But sometimes it is, when someone is trying to push some kind of agenda. So you have to be very careful.

Now, if you choose to collect your own data, that is, to make your own raw data, think ahead about these things. Who are you going to get this data from? How are you going to get it? We're going to learn about good ways to collect data in lots of different tutorials. And what question do you wish to answer? That's going to determine who you ask, how you ask, and how you collect your data. All of these are considerations as you go forward, and answering these questions before you start really helps in determining how you proceed.

Collecting data is important because it's the source of statistics. Think about data as the raw ingredients for creating something useful. If you collect your data well, your statistics can be accurate. If you collect your data poorly, imagine potatoes with eyes and mold all over them. There's no rescuing that. You can't make good statistics out of poor data, and that's just a fact.

So to recap, data can be collected in several different ways, but you have to examine the who, what, when, why, and how. All of those questions are very important to settle before you start collecting data. And the terms we've used are data, available data, raw data, and big bad bias. Good luck, and we'll see you next time.

  • Available Data

    Data collected by some other entity - a government organization or private company.

  • Raw Data

    Data that are unorganized, unprocessed, and not summarized. Typically, this is data that is not already available.

  • Bias

    The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a study.