In this activity, you will use inferential statistics: the systems and techniques for making probability-based decisions and predictions from incomplete data, that is, from a sample rather than from the whole population. Data come in many different types, and data analysis is an essential skill when you are learning to use statistics. Ideally, you should already understand basic statistical terms such as population, parameter, sample, statistic, mean, and standard deviation. You should also know how to create a sampling distribution and have practiced with confidence intervals (both calculating them and understanding what they mean).
In this activity, you will create and deploy a three-question survey. Along the way, you will encounter issues of sampling, sampling variability, population, and parameter. As you work through them, you will produce visuals showing confidence intervals and see what happens to a confidence interval when the confidence level changes. The goal is to “visualize” statistical data in summary form, get a sense of what the values are, and make a thoughtful interpretation of what they mean.
SELF CHECK:
Did your project show how fundamental elements of statistical knowledge are applied to solve real-world problems?
Did you explain how your survey results, although a useful basis for making a decision, still carry uncertainty that others could reasonably question? In other words, did you identify the sources of uncertainty and explain, in a way that is easy for others to understand, how statistical science addresses them?
(NOTE: All of these sites offer free access to statistics educational content; however, some of them also offer services for a fee. The author does not intend to encourage users to purchase services from these vendors.)
This exercise was developed under the direction of Nathan Koebcke, a graduate student and teaching assistant in the Department of Statistics at the University of Kentucky. A similar activity has been used in his course, STA210 “Introduction to Statistical Reasoning,” which is part of the University’s general education program, UK Core (see more at http://www.uky.edu/UGE/documents/Templates/Statistical.pdf). Students who successfully complete this course at UK should be able to articulate how statistical science can be used to address uncertainty in many of our decisions and to judge whether a statistical argument (for example, one presented in the mainstream media) is valid.
The word statistics comes from the New Latin statisticum collegium, meaning “a lecture on the state of affairs,” and from the German word Statistik, which means “collection of data involving the State.”
A sample survey is a type of data collection in which you study a particular subset of a population (the sample) in order to estimate a value for the population as a whole. A sample may not give results as accurate as a survey of the entire population (a census), but it is still useful for decision-making. If it is well designed, a sample survey lets you infer how the attributes you measured apply to everyone in that population (the population parameter), even though you did not survey everyone directly. However, remember that the result is an estimate, and be careful about how widely you apply it to a problem’s solution. A sample statistic is an estimate based on sample data, and its quality (its accuracy, precision, and representativeness) depends on how the sample observations are chosen: the sampling method.
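To see why a sample statistic is only an estimate, the following minimal Python sketch draws several random samples of 20 from a hypothetical population of 350 people in which, by assumption, exactly 40% would answer “yes.” It then prints each sample proportion next to the population parameter. The population size, the 40% share, and the function names are illustrative assumptions, not survey data.

```python
import random


def make_population(size, yes_share, seed=0):
    """Build a hypothetical population of True ("yes") / False ("no") answers."""
    rng = random.Random(seed)
    n_yes = round(size * yes_share)
    population = [True] * n_yes + [False] * (size - n_yes)
    rng.shuffle(population)
    return population


def sample_proportion(population, n, rng):
    """Draw a simple random sample of n people (no repeats) and return the share answering yes."""
    sample = rng.sample(population, n)
    return sum(sample) / n


if __name__ == "__main__":
    population = make_population(350, 0.40)
    true_p = sum(population) / len(population)  # the population parameter
    rng = random.Random(42)
    print(f"population parameter p = {true_p:.3f}")
    for i in range(1, 6):
        p_hat = sample_proportion(population, 20, rng)  # a sample statistic
        print(f"sample {i}: p_hat = {p_hat:.2f}")
```

The sample proportions typically differ from one another and from 0.40. That spread is sampling variability, and it is the uncertainty a confidence interval tries to quantify.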
EXAMPLES OF SAMPLE SURVEYS
In the business world, surveying a sample instead of the whole population is a way to save costs. However, decisions based on the collected data should be made only after a confidence interval has been calculated and judged precise enough for the decision at hand. A confidence interval is a range of values, with a lower and an upper bound calculated from the sample, that is likely to contain the true population parameter. Its confidence level, most commonly 95% or 99%, describes how often intervals built this way would capture the parameter over many repeated samples. A higher level produces a wider interval. In this way, a confidence interval lets you estimate how the population as a whole behaves. See for example how stock brokers use a confidence interval to predict returns on investments ("Confidence Interval," Investopedia) or how health care professionals use it in designing chronic disease and injury programs ("Confidence Intervals," New York State Department of Health).
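The phrase “95% confidence” is easy to misread, and a simulation makes it concrete. This minimal Python sketch assumes a hypothetical population whose true proportion is p = 0.40. It draws many samples of size n (each answer is drawn independently, which approximates a very large population) and builds a normal-approximation (Wald) interval from each sample. It then counts how often the interval contains the true p. The long-run share should land near the stated confidence level. The values of p, n, and the number of trials are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, confidence):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Critical value z*, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, confidence, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent independently says "yes" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n, confidence)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for level in (0.95, 0.99):
        rate = coverage(true_p=0.40, n=100, confidence=level, trials=2000)
        print(f"{level:.0%} intervals captured p in {rate:.1%} of samples")
```

The confidence level is a property of the method across repeated samples. It is not a guarantee about any single interval.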
Choose a topic on which you want to survey a group of friends. For example, you might want to learn the percentage of voters who favor more regulatory control over a controversial issue such as coal mining or the placement of homeless shelters.
Create three different kinds of questions on your chosen topic. Before you begin writing your questions, write down exactly what you need to know and what additional information would be nice to know. Keep that list beside you as you write your survey questions. Be sure to keep your questions simple: get to the point and avoid jargon and acronyms.
Review this “Guide to Writing Survey Questions: Things to Think About Before You Start” (from Management Analysis & Development, Minnesota).
Next, write an introductory narrative that identifies the topic, the purpose of the survey, and the date by which it should be returned. Make it clear that the survey is anonymous. Then pre-test your survey with a few members of your target audience to find glitches, such as unexpected interpretations of a question or confusing answer choices. Your survey is most likely to succeed if you keep it anonymous, short, simple, precise, clear, and focused.
Conduct your survey and gather your data. For example, you can launch a survey on Facebook (see this ehow.com tutorial), or you can get a free account on SurveyMonkey (your data will be completely private) and send the survey to the people in your sample.
Pay attention to the size of your population. Suppose you have 350 Facebook friends in total, all of whom could in theory be sent the survey to gather all the information, as in a census. How many people end up in your sample depends on how you select them. For example, selecting every fifth friend in the message selection box yields 70 of the 350. Keep in mind that a rule such as including only people whose profile pictures show a person is a filter, not a random selection, so it may not give a representative sample. Ideally, you should sample at least 20 people to get enough data to work with in this activity.
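Here is a minimal Python sketch of two ways to choose friends at random. It uses a hypothetical list of 350 placeholder names and shows a systematic sample (a random starting point, then every 5th friend) and a simple random sample of 20. The names and function names are illustrative assumptions. In practice, you would apply the same logic to your actual friend list.

```python
import random


def systematic_sample(frame, k, rng):
    """Pick a random start among the first k entries, then take every k-th entry."""
    start = rng.randrange(k)
    return frame[start::k]


def simple_random_sample(frame, n, rng):
    """Choose n entries so that every group of n is equally likely (no repeats)."""
    return rng.sample(frame, n)


if __name__ == "__main__":
    # Hypothetical sampling frame: placeholder names, not real data.
    friends = [f"friend_{i:03d}" for i in range(1, 351)]
    rng = random.Random(7)
    every_fifth = systematic_sample(friends, 5, rng)
    srs = simple_random_sample(friends, 20, rng)
    print(len(every_fifth), "friends in the systematic sample")  # 70
    print(len(srs), "friends in the simple random sample")       # 20
```

With k = 5 and 350 friends, the systematic sample always contains 70 people. Choosing the starting point at random gives every friend the same 1-in-5 chance of being selected.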
When you write up your analysis of the larger population your sample represents, you must report a confidence interval (http://davidmlane.com/hyperstat/A29494.html) and its confidence level. Doing so helps you avoid misleading others by misusing the statistics from your sample. You don’t want to be accused of “statisticulation” (HINT: it’s not a disease)!
Now learn how to create a confidence interval by working through the StatTrek Tutorial at (see also their essay on Estimation in Statistics), or read the HyperStat Online textbook chapter on confidence intervals and practice with the exercises there.
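For a yes/no survey question, the parameter is usually a population proportion p. Below is a sketch of one common method, the normal-approximation (Wald) interval; the tutorials may present other methods as well. It assumes that x of your n respondents answered “yes.” Here \hat{p} is the sample proportion, and z^{*} is the critical value for the chosen confidence level: about 1.282 for 80%, 1.960 for 95%, and 2.576 for 99%.

```latex
\hat{p} = \frac{x}{n}, \qquad
\hat{p} \;\pm\; z^{*}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```

The term after \pm is the margin of error. A common rule of thumb is to use this approximation only when the sample contains at least about 10 “yes” and 10 “no” answers (some texts use 5).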
Next, create a 95% confidence interval based on the answers you received from your sample. Then experiment with the analysis by creating an 80% confidence interval from the same answers. Finally, take a random three-quarters of your sample and create a 95% confidence interval from that smaller sample. Compare the three intervals.
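The three intervals described above can be sketched in Python. This minimal example assumes the normal-approximation interval for a proportion. It also assumes a hypothetical set of 40 answers to one yes/no question (22 “yes,” 18 “no”), which stands in for your real responses. The function name is illustrative.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(responses, confidence):
    """Normal-approximation CI for the share of True ("yes") responses."""
    n = len(responses)
    p_hat = sum(responses) / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # critical value z*
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


if __name__ == "__main__":
    # Hypothetical answers to one yes/no question: 40 respondents, 22 "yes".
    responses = [True] * 22 + [False] * 18

    print("95% CI:", proportion_ci(responses, 0.95))
    print("80% CI:", proportion_ci(responses, 0.80))

    # Randomly keep three-quarters of the sample (30 of 40 respondents).
    rng = random.Random(1)
    subsample = rng.sample(responses, k=len(responses) * 3 // 4)
    print("95% CI, 3/4 sample:", proportion_ci(subsample, 0.95))
```

Replace the hypothetical list with your own responses (True for “yes,” False for “no”) to compute the intervals for your survey.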
Source: Derived from the UK Core assessment rubric for Statistical Inferential Reasoning: http://www.uky.edu/ukcore/sites/www.uky.edu.ukcore/files/SIR%20Rubric.pdf
