
Schools of Thought

Translating education research into usable knowledge

Question by question

A final thought on the way test scores are reported and evaluated: Not only do states engage in masking the truth, and not only does the media commit a fallacy by reporting only comparative data, but passage on a test is reported as a single static number instead of looking at how many kids answer each type of question correctly. In other words, a child can pass a math test on the strength of his division skills while lacking multiplication.

The problem is that the vast majority of states don't release old tests along with the percentage of students who got each question right. Only two that I've found, Washington and Wyoming, do this. The Boston Globe also managed to get its hands on some data for an article today.

So what do Washington and Wyoming demonstrate? Primarily that there is a tremendous amount of variation in the number of students who answer different questions correctly. We'll look at 4th grade math tests for this exercise, since multiple-choice items are the simplest unit of analysis.

In Washington, for example, 80% of 4th graders correctly answered a question about how they would go about collecting data on their classmates' pets, while only 47% could answer a problem involving whole numbers and fractions.

Wyoming is an even better case study because it has far more released multiple-choice items and its test is closely aligned with its state NAEP results. On the 2003 test, the one we're looking at, 37% of students scored proficient.

In Wyoming we see, on the low end, 24% correct on a question involving measurement and arithmetic, 31% on a question combining area and arithmetic, 42% on a question about time, 47% on a question about digits and numbers, and, my personal favorite, 30% correctly answering "which of the figures below is not a rectangle?"

On the high end, 62% correctly responded to a question on estimation, 63% on patterns, 66% on graphing, 61% on simple multiplication, and 68% on probability (we'll set aside for a moment that having a third of kids miss questions even on the high end is atrocious).

A 40-point spread does not an accurate average make. Once again, it means that most data points are likely not clustered around the middle (of the 15 released items in Wyoming, precisely one was within 5 points of the state average of 37%). Especially when you take into account the automatic advantage of multiple-choice questions, where guessing among a universe of only four answer choices yields an expected 25% correct, the results suddenly seem a bit more complex. The media needs to start challenging state departments of education to release tests with the percent getting each question correct, and then some serious analysis needs to happen to see what's really there.
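To make the clustering point concrete, here's a minimal sketch in Python. It uses only the ten Wyoming item percentages quoted above (the state released 15 items in all; the other five aren't reproduced in this post), so the numbers are illustrative rather than a full reanalysis:

```python
# Percent-correct for the ten Wyoming 4th-grade math items quoted above.
# (Wyoming released 15 items; only these ten appear in this post.)
items = [24, 31, 42, 47, 30,   # low end
         62, 63, 66, 61, 68]   # high end

state_average = 37  # percent scoring proficient on the 2003 test

# How far apart the easiest and hardest items sit.
spread = max(items) - min(items)

# How many items land within 5 points of the state average --
# i.e., how badly the average misrepresents individual items.
near_average = [p for p in items if abs(p - state_average) <= 5]

print(f"spread: {spread} points")
print(f"items within 5 points of {state_average}%: {len(near_average)}")

# With four answer choices, blind guessing alone is expected to
# yield 25% correct, so a 24%-correct item is statistically
# indistinguishable from students guessing at random.
```

Among just these ten items the spread is 44 points and only one item (the 42% time question) falls within 5 points of the 37% average, which tracks the claim above about the full set of 15.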

Here's my question: If states can screw around with the structure of a test to inflate the results, if no one questions the absolute numbers and instead focuses solely on trends, and if results are reported and accountability is determined by averages and scale scores that don't reflect the wild variation inside a test, in what way are they legitimate measures with which to force states to improve their education systems? And are there avenues available to rectify these problems, or are they crippling execution flaws in an otherwise admirable policy goal? Finally, what are the alternatives? Stay tuned: These latter two questions will occupy many of my subsequent posts.