The Center on Education Policy (CEP) recently issued a report titled “State Test Score Trends Through 2008-09, Part 1, Rising Scores on State Tests and NAEP,” lauding a supposed discovery that scores on both state public school tests and the National Assessment of Educational Progress (NAEP) are moving in the same direction and that the trend looks much better than previously reported.
Unfortunately, the report turns the mathematics of statistical sampling on its ear while attempting to claim that the performance of most states is improving on the NAEP reading assessments.
The facts: Between 2005 and 2009, the period of major concern in the CEP report, NAEP’s own Reading Report Card for 2009 clearly shows that a strong majority of the states did not post statistically significant improvements in either fourth- or eighth-grade reading. That general lack of progress is found both for the percentages of students scoring at the level NAEP calls “Basic” and at the level NAEP calls “Proficient.” Trying to claim otherwise, which the CEP report attempts to do, simply is not statistically defensible.
There are more issues with the CEP report, such as the selection of an unrealistically low target performance level – NAEP Basic – as a suitable comparison level for state assessment programs.
First, let’s expand on the conclusions the CEP report incorrectly draws from the NAEP reading data.
The statistics matter
The NAEP is a sampled assessment. Only a fraction of the students in each state take each test. Thus, NAEP results include statistical sampling errors, just like reports of voter polling always include margins of error.
NAEP’s sampling errors make it impossible to confidently detect small changes in a state’s performance, just as it is impossible to call a close election before all the votes are counted. Unless a performance change on the NAEP is fairly large, a claim that real improvement has occurred cannot be supported statistically.
For instance, consider Table A-18 in the 2009 NAEP Report Card. This table shows the percentage of students scoring at or above the “NAEP Basic” level in various states for a number of different years.
Table A-18 shows that while the exact figure varies from state to state, in some cases (e.g. New Mexico) states may need changes of more than 4 points before NAEP can confidently detect genuine improvement.
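To see why a change of several points can still fall inside the statistical blur, here is a minimal Python sketch. It assumes a simplified two-sided test at the 95 percent level on two independent samples. That is a stand-in for illustration only, since NAEP's own significance procedure is more involved. The function names and the standard errors of 1.5 points are hypothetical and are not taken from Table A-18.

```python
import math


def min_detectable_change(se_earlier: float, se_later: float, z: float = 1.96) -> float:
    """Smallest gap, in percentage points, that clears a two-sided 95% test.

    Assumes the two years are independent samples, so their standard
    errors combine as the square root of the sum of squares.
    """
    return z * math.sqrt(se_earlier ** 2 + se_later ** 2)


def is_significant(pct_earlier: float, se_earlier: float,
                   pct_later: float, se_later: float, z: float = 1.96) -> bool:
    """True only when the observed change exceeds the detectable threshold."""
    change = abs(pct_later - pct_earlier)
    return change > min_detectable_change(se_earlier, se_later, z)


# Hypothetical standard errors of 1.5 points in each year (not NAEP figures).
print(round(min_detectable_change(1.5, 1.5), 2))   # 4.16

# A 3-point rise is inside the sampling blur; a 5-point rise is not.
print(is_significant(62.0, 1.5, 65.0, 1.5))        # False
print(is_significant(62.0, 1.5, 67.0, 1.5))        # True
```

Under these assumed standard errors, a state would need to move more than about 4.2 points before the change could be called real, which is the kind of threshold described above.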
That leads to a serious methodological problem in the CEP report. On Page 6 the authors of the CEP study write, “We did not constrain comparisons by limiting NAEP data to statistically significant changes.” Instead, the CEP report treats any numerical increase in NAEP numbers as a valid and true performance increase.
That choice violates basic statistical practice and contradicts very clear guidance in the NAEP’s own documentation and reports.
For example, this comment appears on Page 6 in the 2009 NAEP Reading Report Card:
“Only those differences that are found to be statistically significant are discussed as higher or lower.”
The important point is that the experts who run the NAEP obviously do not consider small changes in either direction to be meaningful. It’s statistically invalid to claim that a true increase in NAEP performance occurred when the blur caused by statistical sampling error won’t support such an assertion. When score changes on the NAEP are small, about the best we can confidently say is that performance is flat.
How does the CEP methodology impact the report’s conclusions?
The impact of the CEP’s methodological error on the report’s findings becomes quite apparent when you examine the percentages of eighth grade students who scored at or above NAEP Basic in reading in 2005 and 2009. Those rates are listed in Table A-18 in the NAEP 2009 Reading Report Card.
Here is an extract of that table with some added highlights for the states in question.
In this NAEP table, all of the 2005 figures that are statistically significantly different from the 2009 rates are marked with an asterisk. Where no asterisk appears next to a state’s 2005 score, the limitations in NAEP’s sampling statistics do not allow us to claim that score is different from the 2009 score.
Now, consider what the CEP report claims. CEP shows in its Table 1-A that 17 states had a gain in NAEP Basic reading performance for their eighth grade students between 2005 and 2009. However, as highlighted by the arrows in the figure above, the NAEP itself only supports such a claim for six of those 17 states (CA, FL, MD, PA, TX and UT).
The other 11 states CEP lists as having made gains in eighth grade reading (AK, AL, AZ, CO, MT, ND, NM, NV, OH, TN and WI) actually have 2005 results that NAEP itself declines to classify as either higher or lower than their 2009 results. The best conclusion we can draw from the NAEP is that those 11 states had flat results between 2005 and 2009 and should be listed in the CEP’s Table 1-A under the category of “# of states with no change.”
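The reclassification is simple set arithmetic, and this short Python sketch makes the tally explicit. The state lists come straight from the discussion of eighth grade reading in this post. The variable names are just illustrative.

```python
# States CEP counts as having gained on eighth grade NAEP Basic reading, 2005 to 2009.
cep_gain_states = {
    "AK", "AL", "AZ", "CA", "CO", "FL", "MD", "MT", "ND",
    "NM", "NV", "OH", "PA", "TN", "TX", "UT", "WI",
}

# States whose 2005 figure NAEP marks as significantly different from 2009.
naep_significant = {"CA", "FL", "MD", "PA", "TX", "UT"}

# Gains that NAEP's own significance marks confirm.
confirmed_gains = sorted(cep_gain_states & naep_significant)

# Gains too small for NAEP to distinguish from no change.
no_change = sorted(cep_gain_states - naep_significant)

print(len(cep_gain_states), len(confirmed_gains), len(no_change))  # 17 6 11
print(no_change)
```

Only the six states in the intersection belong in the “gain” column. The remaining 11 belong under “no change.”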
Thus, while the CEP’s report shows that 20 states had gains in their Proficiency rates on their own state eighth-grade reading tests, the NAEP only confirms that six of those states really improved. In the other states that CEP claims also improved – if there is improvement at all – it’s too small for the NAEP to reliably detect. That is a very different picture from that offered in the CEP report.
A similar situation exists with the fourth grade NAEP reading data. To briefly summarize, while CEP claims that 16 states showed improvement against the fourth grade NAEP Basic standard, the NAEP Report Card for 2009 clearly can only support that claim for five states.
Is NAEP Basic a suitable target for state assessment evaluation? Evidence from Kentucky
It is also important to note that the CEP’s improvement claims are based on comparing state tests to the watered-down target score of NAEP Basic. NAEP documents make it clear that NAEP Basic denotes only partial mastery of material, a fact that even the CEP report admits.
Aside from the clear evidence in NAEP documents that NAEP Basic isn’t a good comparison for state testing, additional evidence from other testing in Kentucky disputes the CEP’s assertion that “Proficient” level scoring from state tests should be compared to NAEP Basic, not NAEP Proficient results.
A small study I assembled in the freedomkentucky.org Wiki site using Kentucky’s EXPLORE test results indicates that if we are interested in finding out how well schools are preparing students for postsecondary education and living wage occupations in the workforce, then NAEP Proficient – not NAEP Basic – truly is the appropriate target for state test comparisons.
For example, this graph, taken from that report, shows surprisingly good agreement between NAEP Proficient results and the percentage of students in the same student cohorts who reached or exceeded the EXPLORE Benchmark Score that indicates students are on track for college and career readiness.
You can learn more by clicking the link above to the complete study, which also looks at math performance. Certainly, the evidence from Kentucky’s EXPLORE testing supports the idea that NAEP Proficient is much better aligned to what students really need than NAEP Basic is.
By the way, the Kentucky 2008-09 Interim Performance Report (access from menu here) shows Kentucky Core Content Test results of 68.05 percent proficiency in eighth grade reading. Thus, both NAEP and EXPLORE tell us the Kentucky Core Content Tests are seriously inflated.
That inflation got worse when Kentucky reduced the rigor of scoring of the Kentucky Core Content Tests in 2007. That same scoring inflation, once it became obvious, played a role in the Kentucky Legislature’s vote in 2009 to disband the Kentucky Core Content Tests as soon as decent replacements – ones that will correlate to college preparedness – can be brought on line.
Similar testing changes are in the wind nationwide. Virtually every state has committed to adopting the new Common Core State Standards and to revising its own state assessments. Pressure to do this comes from widespread understanding that state tests generally provide seriously inflated information about real student proficiency.
Thus, while the CEP report attempts to downplay inflation in existing state assessments, it appears those assessments are unlikely to remain, anyway, at least not in their currently undemanding format. For sure, new tests are coming to Kentucky.