Statistics Challenge

Overview

The goal of this project is to solve a statistics problem in the shortest time. You will carry out the project as part of a team. The project involves a number of tasks common to data analysis, such as figuring out what needs to be done, checking for data errors, transforming data prior to analysis, and running statistics.

Rules

Rule #1. Since you will be in a team, you are advised to develop a strategy in which different team members have different tasks. Some suggested tasks are:

_______________   organizing tasks (i.e., team leader)
_______________   documenting results of each step
_______________   typing in the master file
_______________   data entry
_______________   regression
_______________   correlation
_______________   ANOVA

Each member of the team should sign up for one to three of the tasks above. It is highly recommended that you do NOT do the tasks in strict sequential order. That is, don't wait for a team member to finish their task before you begin yours; rather, try to do as much of your task as you can right away. When you have some results, provide them to the person who is typing in the master file.

Rule #2. You will receive a data file. You must use this data file to carry out the analyses. That is, you cannot just type the data in; rather, you need to use the data provided and check it for errors.

Rule #3. You must use the statistical software program to carry out the data manipulation steps (such as forming an index that combines 3 raw scores); you cannot just add the scores for each subject and type that number in the data set. This assignment is intended to simulate a "real world" data analysis problem in which there are typically hundreds of cases such that the computer would carry out the data manipulation.

Experimental Design

A researcher investigates four groups (Control 1, Control 2, Experimental 1 and Experimental 2). Each of the four groups is measured on a dependent variable across four blocks; there is a pretest block and three test blocks. The researcher would like to know if Control differs from Experimental on the dependent variable.

For the independent variable, the researcher would like to end up with just two groups and compare Control (consisting of Control 1 and Control 2) with Experimental (i.e., Experimental 1 and Experimental 2); that is, do an analysis that combines the two control groups into one group and the two experimental groups into another. For the dependent variable, the researcher would like to form an index using either all four block scores (i.e., the pretest and test blocks) or just the last three block scores (i.e., the test blocks). He will use all four scores if the pretest has a "substantial" correlation with the total of the three test block scores; he will use the total of the last three scores if the correlation is "not substantial".
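The choice between the two indexes can be sketched in Python as follows. This is only an illustration, not the statistical software the challenge requires. It assumes the index is a simple sum of block scores, and the cutoff `threshold=0.5` is a hypothetical stand-in, since the text does not define "substantial". The function names and the demo rows are made up for the example and are not the challenge data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def build_index(rows, threshold=0.5):
    """Return (r, index) for rows of (pretest, test1, test2, test3).

    threshold is a hypothetical cutoff for a "substantial" correlation;
    the handout does not give one.
    """
    pretest = [row[0] for row in rows]
    test_total = [row[1] + row[2] + row[3] for row in rows]
    r = pearson_r(pretest, test_total)
    if abs(r) >= threshold:
        # Substantial: index is the sum of all four blocks.
        return r, [p + t for p, t in zip(pretest, test_total)]
    # Not substantial: index is the sum of the three test blocks only.
    return r, test_total


# Illustrative rows only, not the challenge data.
demo_rows = [(1, 2, 3, 4), (2, 3, 4, 6), (3, 5, 5, 7), (4, 6, 8, 9)]
demo_r, demo_index = build_index(demo_rows)
print(round(demo_r, 3), demo_index)
```

Passing the cutoff as a parameter keeps the decision rule visible, so a team can record which value it used when documenting this step.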

The last subject's data is missing for Test3. Before doing the other statistical analyses, the researcher would like to replace this missing value with a score estimated from the data. The researcher predicts the missing value from a formula that uses predetermined values for j and k, together with values of A and B taken from a regression line predicting Test3 from Test2 using the other three experimental subjects' data. The researcher uses the formula:

Test3 = j[A] + k[B(Test2)]

where

j = -0.375
k = 1.050
A = ordinate intercept
B = slope
Test2 = score for subject with missing data

Note that the value for Test3 when rounded will give a whole number. Replace the missing value with this whole number in the data set.
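A minimal Python sketch of this estimation step is given here, assuming an ordinary least-squares line for the regression. The values of j and k come from the text. The function names and the three demo points are hypothetical and are deliberately not the challenge data, so the printed number is not the answer to this step.

```python
J = -0.375  # predetermined weight on the intercept (from the handout)
K = 1.050   # predetermined weight on the slope term (from the handout)


def fit_line(xs, ys):
    """Least-squares line ys = A + B * xs; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_test3(test2_known, test3_known, test2_missing):
    """Apply Test3 = j[A] + k[B(Test2)] for the subject with missing data."""
    a, b = fit_line(test2_known, test3_known)
    return J * a + K * (b * test2_missing)


# Illustrative points only, not the challenge data.
demo_test2 = [2, 4, 6]
demo_test3 = [3, 7, 11]
demo_estimate = estimate_test3(demo_test2, demo_test3, test2_missing=5)
print(demo_estimate, round(demo_estimate))
```

In the challenge itself, the regression would be fitted on the Test2 and Test3 scores of the three experimental subjects with complete data, and the rounded estimate would then replace the missing value.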

Data

The correct data is shown below. You will receive a data file that was "hastily" typed up. You must use this file (i.e., you cannot simply enter the data below) and you should carefully check it for errors.

 
                  pretest    test1    test2    test3
control 1               3        5        1        4
                        5       23        6        1
control 2               4       17        3       10
                        6        3        5        2
experimental 1          4       12        6        2
                        3       20       12        8
experimental 2          6       12       12       16
                        5        6        4  missing

You may correct the DV data values if you find errors, but do not change any IV scores in the data set. That is, you need to use the appropriate command in the statistical software to convert the 4 groups into 2 groups; you can't simply change the data values.
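The idea of recoding through a command instead of overwriting the IV values can be sketched in Python as a mapping that produces a new variable. This is an illustration only. It assumes the groups are stored as the text labels used in the table above; the actual data file may use numeric codes, and the names `RECODE` and `recode_groups` are hypothetical.

```python
# Mapping from the four original groups to the two combined groups.
RECODE = {
    "control 1": "control",
    "control 2": "control",
    "experimental 1": "experimental",
    "experimental 2": "experimental",
}


def recode_groups(groups):
    """Return a new list of two-level labels; the input list is not modified."""
    return [RECODE[g] for g in groups]


# Two subjects per group, in the order shown in the data table.
original_iv = [
    "control 1", "control 1",
    "control 2", "control 2",
    "experimental 1", "experimental 1",
    "experimental 2", "experimental 2",
]
combined_iv = recode_groups(original_iv)
print(combined_iv)
```

Because the original variable is left untouched, the recoding step can be checked against it and documented as a separate step.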

Goal

Your goal is to provide the F ratio for the final analysis; write it down and document how it was derived. There is a 10-minute penalty for each incorrect F ratio provided. In the event that no group comes up with the correct answer, the winner will be determined on the basis of which group has advanced the farthest. As such, be sure that you document your answers to the various steps and can provide this documentation at the end of the class period. There will be no time extensions.
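For documenting how an F ratio is derived, the following Python sketch shows the arithmetic of a one-way between-groups ANOVA. It assumes the final analysis is a one-way ANOVA on the index scores for the two combined groups, which the handout implies but does not spell out. The function name and the demo scores are hypothetical and are not the challenge data.

```python
def one_way_f(groups):
    """One-way ANOVA on a list of score lists.

    Returns (F, df_between, df_within).
    """
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: squared deviations from each group's mean.
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Illustrative scores only, not the challenge data.
demo_control = [1, 2, 3]
demo_experimental = [4, 5, 6]
demo_f, demo_df_b, demo_df_w = one_way_f([demo_control, demo_experimental])
print(demo_f, demo_df_b, demo_df_w)
```

Writing down the sums of squares and degrees of freedom alongside the F ratio gives the kind of step-by-step record the tie-breaking rule relies on.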

 

Instructor notes

Some students may attempt to type the data in directly instead of using the data file with the errors. You can prevent this by either monitoring that they don't do so, or providing them with the file already on their computer. If you are using SPSS, for example, give it to them as chal.sav on a floppy or set up on the desktop.

Since this exercise is on the internet, you should probably not announce the assignment ahead of time. A clever student could easily track it down and find the correct answer without working through the problem.

When I first carried this task out, students could "buy" a hint. This didn't turn out to be a successful rule as no team was interested, so I dropped it. I did, however, find that giving hints was a good teaching technique. This prevents a team from getting hopelessly lost or frustrated. I tried to balance the value of the hints for each side. An example of a hint would be "you know there are some characters that look a lot alike." This could be used if a team failed to notice that the 10 (the number ten) was miscoded as 1O (the number 1 followed by the letter O) in the data set.
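A miscoding of this kind can be caught mechanically by scanning for cells that are not made up solely of digits. The Python sketch below is one way to do it, assuming the scores have been read in as text. The function name is hypothetical, and the two demo rows are the control 1 and control 2 first-subject rows from the data table with the miscoded value in place.

```python
def find_bad_cells(rows):
    """Return (row_index, column_index, value) for every non-integer cell.

    rows is a list of lists of strings, one inner list per subject.
    """
    bad = []
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            text = cell.strip()
            # A valid score here is a run of ASCII digits and nothing else.
            if not (text.isascii() and text.isdigit()):
                bad.append((i, j, cell))
    return bad


demo_rows = [
    ["3", "5", "1", "4"],
    ["4", "17", "3", "1O"],  # letter O typed in place of the digit zero
]
bad_cells = find_bad_cells(demo_rows)
print(bad_cells)
```

A check like this also flags the legitimately missing Test3 value, and it cannot detect a wrong digit, so the file still needs to be compared against the correct data.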

 

Terms of Use: You are free to use this challenge as long as this "terms of use" notice and the citation are kept intact. On a website please cite using:
<a href="http://symynet.com/fb">Statistics Challenge </a>

The student handout in a printer-friendly format is here.

The data file is available here.