This is a tutorial on data and how it's acquired. The word data refers to pieces of information. Now, these pieces of information can be recorded with numbers or with qualities. Examples of data include a person's age or hair color, the cost of milk, or the store the milk was bought from. Now in order to have a list of people's ages, or a list of hair colors, or the cost of milk, or the stores it was purchased from, you need to acquire those pieces of information.
There are two main ways of acquiring data. The first way is to use data that's already been made available. It's been collected previously for some other purpose. The second way is to collect new data.
With available data-- that is data that was previously collected-- there are five main questions used to evaluate it. You're probably familiar with these five questions from a variety of other applications. They are, who, what, when, where, why?
Now who means who collected this data? What means what kind of information did they collect?
When did they do this? Was it too long ago to really be applicable any longer? Or is it recent enough that it would work for your study?
Where did they get the information from? And then, why did they do it? What was their purpose or intent?
Now a great source of already acquired data is governments. Governments are a great source because they collect a lot of data. If you think about the census, if you think about any kind of Bureau of Labor Statistics data, all of that was collected by the government. Additionally, it is typically reliable, and it's also easy enough to access.
Another source of data is new data. With new data, you're collecting the information yourself. When you collect new pieces of information, this is called raw data. It's unprocessed. It's unorganized. It's just a list of the numbers or qualities that you found from your study.
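To make the idea of raw data a little more concrete, here is a minimal sketch in Python. The hair colors in the list are made up for illustration, and the function name tally is just a name chosen for this example. The list is the raw data, written down in the order it was observed; the tally is one simple way of organizing it.

```python
from collections import Counter

# Hypothetical raw data: hair colors written down in the order they were observed.
# These values are invented for illustration only.
raw_hair_colors = ["brown", "black", "brown", "blonde", "red", "brown", "black"]


def tally(raw_values):
    """Organize raw qualitative data into a count per category."""
    return Counter(raw_values)


counts = tally(raw_hair_colors)

# Print each category with its count, most common first.
for color, n in counts.most_common():
    print(color, n)
```

Notice that the tally doesn't add any new information. It only organizes what was already in the raw list.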
When you are looking at data, whether it is available data or it is new data, you need to be really concerned about bias. Bias is like the bad guy of statistics.
Bias is the systematic favoring of certain outcomes. Sometimes it's intentional, sometimes it's not. Whether it's intentional or not, it's still a bad thing.
It affects the reliability of your study. It leads to inaccuracy. And there are many different ways that bias can come in. We'll learn more about those in a later tutorial.
Some examples of bias. If you were trying to study people's favorite colors and you only looked at a group of women, they'd be much more likely to say pink than a group of men would. So you'd be introducing bias by looking at the wrong group.
Similarly, if you were trying to look at the cost of milk and you accidentally only went to gourmet stores, the prices would be a lot higher than everywhere else. And so you'd be introducing bias into your survey by going to the wrong places.
Or if you're trying to determine the average height of the students at a high school, but you only measure the basketball team, that wouldn't be a good group to measure to know how tall the average person in that high school is. So you'd be introducing bias into your study and introducing unreliability.
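The basketball team example can be sketched as a small simulation in Python. Everything here is an assumption made for illustration: the heights are randomly generated rather than real measurements, the school has 1,000 students, and the "team" is simply taken to be the 15 tallest students. The function names are hypothetical as well.

```python
import random
from statistics import mean


def tallest_group(heights, k):
    """Biased selection: keep only the k tallest people (a stand-in for the team)."""
    return sorted(heights)[-k:]


def simple_random_sample(heights, k, rng):
    """Unbiased selection: every student is equally likely to be chosen."""
    return rng.sample(heights, k)


# Made-up heights in inches for a hypothetical school of 1,000 students.
rng = random.Random(0)
school = [rng.gauss(66, 3) for _ in range(1000)]

team = tallest_group(school, 15)
sample = simple_random_sample(school, 15, rng)

print("whole school average: ", round(mean(school), 1))
print("basketball team only: ", round(mean(team), 1))
print("random sample of 15:  ", round(mean(sample), 1))
```

The team average comes out above the school average every time, because of how the group was chosen and not because of chance. That is what systematic favoring of certain outcomes looks like. The random sample can still miss the school average, but it isn't pushed in any particular direction.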
In summary, when you're looking at data, you're looking at pieces of information. There are two main ways of acquiring the data-- through already available data that was collected by someone else, particularly governments, or by looking at new data that you collect yourself and is raw because it is unorganized and unprocessed.
No matter where your data comes from, you need to be aware of bias. There are many ways of introducing unreliability into your study and we'll look at those in other tutorials. Thank you.