
Schools of Thought

Translating education research into usable knowledge

Five simple rules for fudging test numbers

With newspaper headlines blaring such missives as "Nearly all Michigan school districts meet improvement standards" or "Scores slump on most Md. tests," August is unquestionably the Month of the Test Result. But as the majority of states begin to release their state assessment and AYP results, one issue that gets lost in the shuffle is just how easy it is for states to manipulate the numbers.

Under NCLB, states have an enormous amount of discretion when it comes to designing their tests. There are five major avenues for chicanery: content standards, test format, achievement standards, cut scores, and n-sizes. I apologize in advance to those for whom this will be Psychometrics 101, but it's not talked about nearly enough.

Content standards
Simply put, these say “what every kid in our state should know in a given grade.” They are the basis of test questions, and any change to the content standards will fundamentally alter what knowledge the test is assessing.

Test format
This is self-explanatory. Is the test entirely multiple-choice? Half multiple-choice and half constructed response? How many questions are there? Is the test timed? Changing the format changes how kids will perform on it. There is also an important distinction between norm-referenced tests, which tell you how students do relative to one another, and criterion-referenced tests, which tell you how students do against a fixed benchmark. The two are fundamentally different measures.

Achievement standards
This is the first of two facets that deal with scoring. Achievement standards are the levels of proficiency – e.g. below basic, basic, proficient, advanced. States decide which achievement standard is the target; usually, it is proficient or above. Adding new achievement standards can change the results semantically. Suddenly, kids who were only 10% below proficient are “almost proficient,” and the “basic” group just got 10% smaller.
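To make the relabeling game concrete, here is a toy sketch in Python. The band names, thresholds, and the example score are invented for illustration only, not taken from any state's actual scale; the point is simply that adding a band just below "proficient" gives the same student a friendlier label without changing a single answer sheet.

# Toy illustration: inserting an "approaching proficient" band just below the
# proficient threshold relabels the same students without changing any scores.
# All thresholds and band names here are hypothetical.

def label(score, bands):
    """Return the highest band whose threshold the score meets (bands sorted low to high)."""
    name = None
    for threshold, band in bands:
        if score >= threshold:
            name = band
    return name

old_bands = [(0, "below basic"), (400, "basic"), (500, "proficient"), (600, "advanced")]
new_bands = [(0, "below basic"), (400, "basic"), (450, "approaching proficient"),
             (500, "proficient"), (600, "advanced")]

print(label(470, old_bands))  # basic
print(label(470, new_bands))  # approaching proficient -- same kid, friendlier label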

Cut scores
What score do you need to pass? Most states now employ criterion-referenced tests, which assign each student a scale score based on how many questions they answered correctly. Altering the scale score required to be deemed proficient dramatically changes the number of students in each category.
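Here is the same idea as a quick Python sketch. The scale scores and the two cut scores below are made-up numbers, but the arithmetic is the point: identical student performance, very different pass rates.

# Toy illustration: the same ten (hypothetical) scale scores produce very
# different "percent proficient" figures depending on where the cut sits.

scale_scores = [412, 435, 448, 460, 472, 489, 495, 503, 518, 530]

def percent_proficient(scores, cut_score):
    """Share of students scoring at or above the proficiency cut."""
    return 100 * sum(1 for s in scores if s >= cut_score) / len(scores)

print(percent_proficient(scale_scores, 500))  # 30.0 -- the "tough" cut
print(percent_proficient(scale_scores, 450))  # 70.0 -- lower the cut, watch proficiency soar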

N-sizes
The n-size is the minimum number of students belonging to a certain group that must be present in a school for the group to be counted. So, if the n-size is 20, a school with 20 black students has to report and be held accountable for that subgroup's performance; if the same school has only 17 Hispanic students, it doesn't have to disaggregate for them. This applies not only to racial groups but also to groupings by socioeconomic status, English language proficiency, and disability. States have nearly complete autonomy in setting their n-sizes, and those n-sizes have been creeping upward, excluding more and more of the subgroups that traditionally struggle.
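As a rough sketch of the rule (the enrollment counts and n-size values below are hypothetical), the accountability decision boils down to a simple threshold check:

# Toy illustration: which subgroups a school must report under a given n-size.
# Enrollment figures and n-size values are hypothetical.

subgroup_counts = {"black": 20, "hispanic": 17, "low_income": 45, "ell": 12}

def reported_subgroups(counts, n_size):
    """Subgroups large enough that the school must disaggregate their results."""
    return [group for group, n in counts.items() if n >= n_size]

print(reported_subgroups(subgroup_counts, 20))  # ['black', 'low_income']
print(reported_subgroups(subgroup_counts, 30))  # ['low_income'] -- raise the n-size, lose a subgroup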

The incredible number of permutations wouldn't be so bad if states designed their tests in good faith. But they don't. Paul Peterson and Frederick Hess compared the pass rate on every state test with the state's equivalent pass rate on the NAEP, and found that, on average, 36% more kids were passing their state assessments than were passing the more rigorous NAEP.

The problem that stems from this isn't so much that some states, like Texas, continue to report trend data even after changing their cut scores (Texas changed its cut scores between '02-'03 and '03-'04); it's that results are reported with no context at all. Arizona is an instructive example. This year, Arizona completely reformatted its AIMS test: new, lower cut scores, sample test items released for the first time, an untimed administration, and so on. As a consequence, test scores in Arizona jumped 20 to 30 percent in most subjects.

What's interesting is how transparent the charade was. The Arizona Department of Education said in a press release that the scores weren't historically comparable, and the Arizona Republic ran an article on how the changes were artificially inflating test scores. But the point is this -- from here on out, scores on the AIMS will not reflect any semblance of authentic knowledge!

52% of Arizona 5th graders passed the reading AIMS in 2004; 71% did in 2005 under the reformatted test. Come next year, when, say, 73% pass, that's the number that will be reported. That 73% will be compared to the 71%, completely ignoring the fact that both figures are, in an absolute sense, ludicrously inflated. Thousands more parents will be told that their children are proficient when, in reality, those children are no more proficient than they were last year, when they were labeled not proficient. In an exceedingly public manner, Arizona has successfully masked the number of its students who can't read or do math well.

Standards and test scores are supposed to highlight problem areas so they can be fixed, and to hold schools accountable for fixing them. How can that laudable goal possibly function when states have five powerful tools at their disposal with which to whitewash the truth?