
# A Statistics Problem: Creating a Good Survey and Analyzing the Results

##### Description:

In this activity, you will use inferential statistics: the systems and techniques for making probability-based decisions and accurate predictions based on incomplete, or sample, data. Data come in many different types, and analyzing them is an important skill when learning how to use statistics. Ideally, you should already understand basic statistical terms such as population, parameter, sample, statistic, mean, and standard deviation. In addition, you will need to know how to create a sampling distribution and have practiced with confidence intervals (calculating them and understanding what they mean).

In this activity, you will create and deploy a three-question survey, and you will encounter problems of sampling, sampling variability, population, and parameter. In solving these problems, you will produce visuals showing confidence intervals and see what happens to a confidence interval when the confidence level is changed. The goal is to “visualize” statistical data in summary form and get a sense of the values so that you can make a thoughtful statement interpreting what they mean.



## Acknowledgements:

This exercise was developed under the direction of Nathan Koebcke, a graduate student and teaching assistant in the Department of Statistics at the University of Kentucky. A similar kind of activity has been used in his course, STA210 “Introduction to Statistical Reasoning,” which is a component of the University’s general education program, UK Core (see more at http://www.uky.edu/UGE/documents/Templates/Statistical.pdf).  Successful students who complete this course at UK should be able to articulate how statistical science can be used to address uncertainty in many of our decisions and decide whether a statistical argument (that is used for example in the mainstream media) is valid.

## Something to ponder:

The word statistics comes from the New Latin statisticum collegium, meaning “a lecture on the state of affairs,” and from the German word Statistik, which means “collection of data involving the State.”

## Using a Sample Survey to Gather Data:

A sample survey collects data from a subset of a population so that you can estimate the value of some characteristic for the population as a whole. A sample might not give results as accurate as a survey of the whole population (i.e., a census), but it still helps in decision-making. If it is well designed, a sample survey lets you infer how the attributes you measured apply to everyone in the population (the population parameter), even though you did not survey everyone directly. However, we must remember that the result is an estimate and be wary of how widely it can be applied to a problem’s solution. A sample statistic is an estimate based on sample data, and its quality (that is, its accuracy, precision and representativeness) relies on the way that we choose sample observations: the sampling method.

EXAMPLES OF SAMPLE SURVEYS

• Systematic Sample: Contact every third person who submitted an online response to your blog to ask them how many times they typically reply to social media outlets per day.
• Cluster Sample: Mark 20 square blocks on a map of your neighborhood association’s boundaries, then randomly pick two of the squares and interview every household living in them.
• Stratified Sample: In a wild horse herd that is 80% female and 20% male, randomly select a small group to study, choosing within each sex so that the sample is also about 80% female and 20% male.
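The three sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the commenter, household and horse names are invented, the function names are hypothetical, and a fixed random seed is used so that the result can be repeated.

```python
import random


def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def stratified_sample(strata, total, rng):
    """Sample each stratum in proportion to its share of the population."""
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(total * len(members) / population_size)
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: 30 blog commenters, 20 blocks of 5 households, 100 horses.
commenters = [f"commenter_{i}" for i in range(1, 31)]
blocks = {f"block_{b}": [f"house_{b}_{h}" for h in range(1, 6)] for b in range(1, 21)}
herd = {
    "female": [f"mare_{i}" for i in range(1, 81)],
    "male": [f"stallion_{i}" for i in range(1, 21)],
}

every_third = systematic_sample(commenters, step=3, start=2)
two_blocks = cluster_sample(blocks, n_clusters=2, rng=rng)
ten_horses = stratified_sample(herd, total=10, rng=rng)

print(len(every_third), len(two_blocks), len(ten_horses))
```

Note that the stratified sample keeps the herd's 80/20 split by construction, while the cluster sample depends entirely on which two blocks happen to be drawn.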

In the business world, sampling is a way to save costs, but decisions based on the data collected should come only after a confidence interval has been calculated and judged narrow enough to be useful. A confidence interval is a range, with a lower and an upper boundary, that is likely to contain the true population parameter. The confidence level (most commonly 95% or 99%) describes how often intervals built this way would capture the parameter if the sampling were repeated many times. A confidence interval is therefore a way to estimate a characteristic of the population as a whole. See for example how stock brokers use a confidence interval to predict returns on investments ("Confidence Interval," Investopedia) or how health care professionals use it in designing chronic disease and injury programs ("Confidence Intervals," New York State Department of Health).
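For a Yes/No question, one common textbook approximation (not the only method, and a rough one for small samples) builds the interval around the sample proportion. The symbols here are assumptions introduced for illustration: $\hat{p}$ is the proportion of "yes" answers in the sample, $n$ is the sample size, and $z^{*}$ is the critical value for the chosen confidence level (about 1.96 for 95% and about 1.28 for 80%).

$$
\hat{p} \;\pm\; z^{*}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
$$

As a made-up example, if 12 of 20 respondents answer "yes," then $\hat{p} = 0.6$ and the 95% interval is roughly $0.6 \pm 1.96\sqrt{0.6 \times 0.4 / 20} \approx 0.6 \pm 0.21$.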

## Step 1. Develop a 3-question sample survey:

Choose a topic on which you want to survey a group of friends.  For example, you might want to learn the percentage of voters who favor more regulatory control over a controversial issue such as coal mining or the placement of homeless shelters.

Create three different kinds of questions on your chosen topic. Before you begin writing your questions, write down exactly what you need to know and what additional information would be nice to know. Keep that list beside you as you write your survey questions. Be sure to keep your questions simple: get to the point and avoid the use of jargon or acronyms.

1. Yes/No – These kinds of questions are really hard to write, but easy to analyze. Be sure to keep the question focused on behavior and not on attitude. For example, “Have you ever ridden a horse?” is better than “Do you think horseback riding is dangerous?” See more on how to write good Yes/No questions at the Vovici Resource Center’s The Listening Post.
2. Rating Scale – Rating scales are a great way to measure and compare sets of variables. Be sure to use an odd number of points in your scale, with two opposite extremes for your answers (e.g., “very interested” scales down to “not interested”), to make data analysis easier. For example, “What has your experience been working with the Help Desk?” can be improved with a more direct question: “How satisfied are you with the response time of the Help Desk?” Find more examples at the WISCO Survey tutorial on "Numeric Rating Scale Survey Questions."
3. Open Ended – This kind of question can provide useful qualitative information and insights on a topic. Use this chart to practice writing good open-ended questions: Question Creation Chart (adapted from educationoasis.com)

Review this “Guide to Writing Survey Questions: Things to Think About Before You Start” (from Management Analysis & Development, Minnesota).

Now write an introductory narrative to identify the topic, the purpose of the survey and the date by which it needs to be returned. Make it clear that the survey is anonymous. Then pretest your survey with a few members of your target audience to find glitches, such as unexpected question interpretations or confusing answer choices.  You will succeed if you keep your survey anonymous, short, simple, precise, clear and focused.

Conduct your survey and gather your data.  For example, you can launch a survey on Facebook (see this ehow.com tutorial) or you can get a free account on SurveyMonkey (your data will be completely private) and send the survey to your selected sample population.

Pay attention to the size of your population.  Let’s imagine you have 350 Facebook friends in total who could, in theory, all be sent the survey, as in a census.  How many you sample depends on how you select them (e.g., choosing only people whose profile pictures are humanoid, or selecting the first of every 5 friends in the message selection box).  Ideally, you should sample at least 20 people to get enough data to work with in this activity.
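As a rough sketch of how the selection could be done, the Python below draws from a hypothetical list of 350 friends. The names, the seed, and the target of 20 are assumptions taken from the example numbers above, not a required procedure.

```python
import random

# Hypothetical population: 350 friends, identified only by number.
friends = [f"friend_{i}" for i in range(1, 351)]
rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Simple random sample: every friend has the same chance of being picked.
simple_random = rng.sample(friends, 20)

# "The first of every 5 friends" is a systematic sample, but it yields 70 people.
every_fifth = friends[::5]

# To land on 20 people, widen the step and begin at a random position.
step = len(friends) // 20        # 350 // 20 = 17
start = rng.randrange(step)      # random starting index from 0 to 16
systematic = friends[start::step][:20]

print(len(simple_random), len(every_fifth), len(systematic))
```

Choosing only friends with a certain kind of profile picture is not covered here, because that rule is based on a trait of the person rather than on chance.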

## Step 2. Determine the Confidence Level of your Survey Results:

When you are writing up your analysis about your group’s larger population, you must indicate a confidence interval (http://davidmlane.com/hyperstat/A29494.html) and a confidence level. You want to avoid misleading others with a misuse of your statistics from your sample.  You don’t want to be accused of “statisticulation” (HINT: it’s not a disease)!

Now learn how to create a confidence interval by working through the StatTrek tutorial (see also their essay on Estimation in Statistics), or read the HyperStat Online textbook chapter on confidence intervals and practice with the exercises there.

Next, create a 95% confidence interval based on the answers you received from your sample. Then experiment with the analysis and create an 80% confidence interval from the same answers.  Finally, randomly select three-quarters of your sample and create a 95% confidence interval based on that smaller sample.
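The three intervals described above could be computed along the following lines for a Yes/No question. This is a minimal sketch, assuming the normal-approximation interval for a proportion (one common textbook method, and only a rough one for samples as small as 20); the answers are invented and the function name `proportion_ci` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(answers, level):
    """Normal-approximation confidence interval for the share of 1s in `answers`."""
    n = len(answers)
    p_hat = sum(answers) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)   # critical value, e.g. about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical responses: 1 = "yes", 0 = "no" (12 yes out of 20).
answers = [1] * 12 + [0] * 8

ci_95 = proportion_ci(answers, 0.95)
ci_80 = proportion_ci(answers, 0.80)

# Three-quarters of the sample, chosen at random (15 of the 20 answers).
rng = random.Random(1)
subsample = rng.sample(answers, (3 * len(answers)) // 4)
ci_95_sub = proportion_ci(subsample, 0.95)

for label, (low, high) in [("95%", ci_95), ("80%", ci_80), ("95%, 3/4 sample", ci_95_sub)]:
    print(f"{label}: {low:.3f} to {high:.3f}")
```

Comparing the widths of the first two intervals shows the effect of the confidence level; comparing the first and third shows the effect of sample size, although the third also shifts with whichever answers happen to be drawn.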

## Step 3. What happened? Write up your results and determine what they mean:

1. Define the following (in BOTH words and numbers):
   1. Population
   2. Sample
   3. Parameter
   4. Statistic
2. How did you come up with your sample? How did you deal with the sampling and non-sampling errors that we have discussed in class? Is this truly a “random sample”?
3. Summarize your survey responses in a way that is meaningful and informative.
4. This analysis report should include:
   1. The 95% and 80% confidence intervals from the original sample, and the 95% confidence interval based on the second sample.
   2. Do these intervals include the parameter? Why would they not?
   3. What happened as you changed the confidence level?
   4. What happened as you changed the sample size?
   5. An interpretation of each interval.
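One way to think about why an interval might not include the parameter is to simulate many surveys from a population whose true value is known. The sketch below assumes a made-up population in which 60% would answer "yes," samples of 20, and the same normal-approximation interval for a proportion; the function name `coverage` and all numbers are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def coverage(true_p, n, level, trials, rng):
    """Fraction of simulated surveys whose interval contains the true proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # Simulate one survey: each respondent says "yes" with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


rng = random.Random(7)  # fixed seed so the sketch is repeatable
cov_95 = coverage(true_p=0.6, n=20, level=0.95, trials=2000, rng=rng)
cov_80 = coverage(true_p=0.6, n=20, level=0.80, trials=2000, rng=rng)

print(f"95% intervals that caught the parameter: {cov_95:.1%}")
print(f"80% intervals that caught the parameter: {cov_80:.1%}")
```

Some intervals miss the parameter purely because of sampling variability, and the 80% intervals miss more often than the 95% ones. With a sample this small, the observed fractions may also fall somewhat short of the stated confidence levels, because the approximation is rough.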

## Self Check:

Did your project show how fundamental elements of statistical knowledge are applied to solve real-world problems?

Did you explain how your survey results, even though a good resource for making a decision, can still be debated by users as uncertain? In other words, did you address the sources of uncertainty and how statistical science deals with them, in a way that is easy for others to understand?

Source: Derived from the UK Core assessment rubric for Statistical Inferential Reasoning: http://www.uky.edu/ukcore/sites/www.uky.edu.ukcore/files/SIR%20Rubric.pdf

(NOTE: All of these sites offer free access to statistics educational content; however, some of the sites listed also include offers of services for a fee. It is not the intent of this author to encourage users to purchase services from these vendors.)

James H. Baird, “How Statistics Can Lie,” Green Section Record  (May/June 2003): 21-23. http://turf.unl.edu/extpresentationspdf/BairdStats.pdf

Joel Best, Stat-Spotting: A Field Guide to Identifying Dubious Data. University of California Press, 2008.

“College Statistics,” Ask Dr. Math College Archive, The Math Forum @ Drexel University, http://mathforum.org/library/drmath/sets/college_statistics.html

HyperStat Online Statistics Textbook, http://davidmlane.com/hyperstat

“Probability and Statistics,” free course from the Open Learning Initiative of Carnegie Mellon University, http://oli.cmu.edu/courses/free-open/statistics-course-details

StatSoft, Inc. (2012). Electronic Statistics Textbook. Tulsa, OK: StatSoft. http://www.statsoft.com/textbook

“Stat Trek: Teach Yourself Statistics,” http://stattrek.com