This tutorial is all about data. So let's start by looking at two separate research questions. Question 1, how tall is the tallest player on the 2011-2012 University of Minnesota women's basketball team? And question number 2, how tall is the tallest student in my son's third grade class? So two questions.
So what is needed to answer both questions? The answer, of course, is data. So let's define data before we get back to the questions. Data is a particular value of a measurable factor, characteristic, or attribute of an individual or a system. So it can be a measurable factor, it can be a characteristic, or it can be an attribute.
And we have a couple of ways of acquiring data. The first way is to use available data. The internet is a great resource for available data. The second way is to collect your own. So if we go back to that first research question, chances are we're not going to be able to collect our own data to answer research question number one. But there may be some available data that we can use.
So what I did is I went to the University of Minnesota athletics web page and pulled up the roster for that 2011-2012 University of Minnesota women's basketball team. This roster gives me a whole bunch of data. So if we look at this F here, that's a piece of data that tells me the position of this specific player, Shonte Clay. This SO means sophomore. That tells me that Micaella Riche is a sophomore.
So the data that I'm specifically interested in is the height data. So that's all listed here. And I was looking for how tall is the tallest player. And if I scan this data, I can see that the tallest player is 6 foot 7 inches. And that ends up being Amber Dvorak. So I was able to use this available data to answer my research question.
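Here is a minimal sketch in Python of that same scan for the tallest player. It assumes the roster has been copied into a small list and that heights are written as feet-inches strings like "6-7". That format is an assumption, and every row is a made-up placeholder except the 6-7 entry for Amber Dvorak, which is the value read off the roster above.

```python
# Hypothetical roster rows. Only the Amber Dvorak entry (6 ft 7 in) comes
# from the roster discussed above; the other rows are placeholders.
roster = [
    {"name": "Player A", "height": "5-9"},
    {"name": "Player B", "height": "6-2"},
    {"name": "Amber Dvorak", "height": "6-7"},
    {"name": "Player C", "height": "5-11"},
]


def height_in_inches(height: str) -> int:
    """Convert a 'feet-inches' string such as '6-7' to total inches."""
    feet, inches = height.split("-")
    return int(feet) * 12 + int(inches)


# Compare on total inches, not on the raw strings.
tallest = max(roster, key=lambda row: height_in_inches(row["height"]))
print(tallest["name"], tallest["height"])
```

The conversion to inches matters because comparing the strings directly would rank "5-9" above "5-11".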
Now, when you're using available data, you always want to make sure that it's good data. A few questions you can ask yourself are: why was it collected? Who collected it? And how was it collected?
So if I'm thinking about my available data, the basketball roster, why was it collected? Well, it was probably collected because fans and supporters of the team are interested in the players. They want to know how tall they are. They want to know what hometown they're from. So that's probably why it was collected: to provide more information for fans.
Who collected it? Well, I don't know specifically who collected it. But I can probably assume that somebody from the University of Minnesota collected the data and then compiled it. How was it collected? Again, I'm not really sure. But I'd imagine that the height data probably came from some sort of physical exam where the players were actually measured, and their height was recorded that way.
Ultimately, you want to be certain that the data is reliable. So I went through those three questions. Chances are they're not trying to give me bad data. Or they're not trying to misrepresent the players. So I'm fairly confident the data that was available to me was good data.
Another place where you can generally get good, reliable, available data is through governments. So the US government provides a lot of available data on a wide range of topics and variables. Individual states usually have great, big data sets that are available. City governments often do the same thing.
And a little fun fact for you is that many think that the word statistics at some point came from a state's need for data in order to make demographic and economic decisions. So the STAT in statistics comes from the STAT in state. So just a little fun fact for you.
Now, if we think back again to our research questions, we answered question 1. So now, let's think about question 2. How tall is the tallest student in my son's third grade class? Well, chances are that that data is not going to be available.
So what I would need to do to answer that question is collect the data myself. I need to come up with some self-collected data. So chances are I would need to go into my son's classroom, measure all the kids, and then, once I have that big list of data, determine who is the tallest and how tall that person is.
So when I collect that data, what I start with, my big list of data, is called raw data. Self-collected data generally starts as raw data. Raw data is data that is unorganized, unprocessed, and not summarized. So you haven't taken any averages. You haven't made it into a table. It's just a big list of data. So that's what we call raw data.
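To make the difference between raw and processed data concrete, here is a small Python sketch. The heights are invented for illustration (no real classroom was measured), and the variable names are just placeholders.

```python
from statistics import mean

# Hypothetical raw data: heights in inches, listed in the order the
# students happened to be measured. Nothing is sorted or summarized yet.
raw_heights = [51, 48, 54, 50, 49, 53, 52, 50]

# Processing is what turns the raw list into something organized.
sorted_heights = sorted(raw_heights)   # organized from shortest to tallest
tallest = max(raw_heights)             # answers "how tall is the tallest?"
average = mean(raw_heights)            # a summary of the whole class

print(sorted_heights)
print(tallest, average)
```

The first list is the raw data. Everything computed after it is a processed or summarized view of that same list.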
So once I've collected my data, I'll be able to answer that second research question. In that case, it's going to be pretty easy to collect the data. In many other instances, good self-collected data is going to be pretty hard to get. And one thing you want to do when you're collecting data is avoid what's called bias. Sometimes, bias can affect the data collection process.
Now, to define that word: bias is the tendency for collected data to differ from what is expected in a systematic way. Biased data can often favor a specific group of those studied. So we do want to avoid bias. Bias can slant the results one way or another. So as we're collecting data, and learning how to collect it in the future, it's going to be important to think of ways of collecting it that will minimize that bias.
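One way to picture "differs in a systematic way" is a measurement that is off by the same amount every time. The Python sketch below assumes, purely for illustration, that every student was measured with shoes on and that shoes add one inch. The heights and the one-inch offset are both invented.

```python
from statistics import mean

# Hypothetical true heights in inches (invented for illustration).
true_heights = [51, 48, 54, 50]

# Assumed systematic error: every student is measured with shoes on,
# which adds the same amount to every measurement.
SHOE_OFFSET = 1

measured_heights = [h + SHOE_OFFSET for h in true_heights]

# The gap between the measured and true averages is the bias. It does not
# cancel out across students because it pushes every value the same way.
bias = mean(measured_heights) - mean(true_heights)
print(bias)
```

Because every value is pushed in the same direction, measuring more students would not make this kind of error shrink.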
So how to collect good data while avoiding bias will be the subject of future tutorials. So stay tuned. And that's all for this tutorial. Thanks for watching. And we'll see you soon.