Why Are So Many Texas Students Struggling With STAAR?

A report from the Texas Education Agency this month showed nearly 105,000 students in the fifth and eighth grades couldn’t pass the reading portion of the State of Texas Assessments of Academic Readiness exam (STAAR) in three tries, putting their promotion to the next grade in jeopardy. But if that number sounds high to you, it only scratches the surface of a long history of Texas students’ consistently subpar performance on standardized testing.

Let’s back up for a second. Although the number sounds extreme, an existing provision that allows individual committees to exempt students from the STAAR requirement means that, regardless of test scores, it’s likely most of these fifth- and eighth-graders will end up being promoted to the next grade. The exemption, designed for less-than-stellar test takers, allows school districts to evaluate students on their achievements other than STAAR test scores.

For example, the San Antonio Express-News reported that in 2014 San Antonio’s largest district held back only 13 of about 1,300 students who didn’t meet the STAAR requirement. The latest legislative session copied this exemption model for high school seniors, and, with the backing of many education advocacy organizations, the measure flew through the Senate and House and across the governor’s desk.

But 105,000 is still a huge number of students to evaluate individually, even when it’s divided at a district level. So that brings us back to a bigger question that lawmakers, educators, students, and parents have been grappling with for years: why are so many students failing?

The short answer? Nobody knows. In the four years since STAAR replaced the Texas Assessment of Knowledge and Skills (TAKS), student success rates on the different sections of the test have hovered around 70 percent. When STAAR was introduced, it was easy enough to chalk these stagnant numbers up to a necessary adjustment period for the new format. The expectation, however, was that student scores would begin to steadily increase — they didn’t.

Walter Stroup, an associate professor at the UT Austin College of Education, says the test format is to blame. He captured the attention of legislators and education advocacy organizations two legislative sessions ago when he found that data from the 2007-2008 TAKS results suggested that the test format may not be conducive to student improvement.

The TAKS data turned out to be a good predictor of the STAAR results: student improvement was nonexistent. According to Stroup, this is because the tests are designed using an item-response theory (IRT) model that winds up scoring students based on the performance of others rather than what they’ve actually learned.

His data suggest the tests are “insensitive to instruction,” meaning there’s no indication that the results demonstrate what students are learning in the classroom. Rather, they suggest a student’s ability to take the test itself, which has all kinds of implications for the conversation surrounding standardized testing in the state.

“It’s pernicious to shift the burden to the kids when we know full well that the tests aren’t sensitive to what happens during the school year,” Stroup says, while also insisting he isn’t anti-testing or anti-accountability, but that he’s simply “concerned the tests aren’t measuring what they need to measure.”

His testimony before the House Committee on Public Education in 2012 led to sharp criticism from Pearson, the corporate testing giant behind STAAR, and the Texas Education Agency. Pearson turned its attention toward discrediting Stroup’s research, but the idea that item-response theory is perpetuating this problem is beginning to spread. The American Statistical Association, for one, has since come out condemning the assessment of teachers based on standardized testing scores, saying that “teachers account for about 1% to 14% of the variability in test scores.”

But TEA stands by the test, with officials suggesting that low scores were due to the quality of instruction. With the release of the most recent test scores, they seemed more hopeless than anything. “It appears that we just aren’t going to see large gains from year to year,” TEA spokeswoman Debbie Ratcliffe told the Dallas Morning News.

And the agency isn’t the only one defending STAAR. Others have emphasized their belief that the school system and instruction are at fault, and that the state’s effort should be spent improving schools rather than changing the test. Todd Williams, Dallas Mayor Mike Rawlings’s education advisor, wrote in D Magazine:

“In some ways, my analogy is the individual who desperately wants to lose weight. Despite knowing that the key fundamental drivers are exercise and diet, they instead want to question the integrity and appropriateness of the scale.”

Pearson lost its contract with the state for the next four years, and the Educational Testing Service (ETS), the company behind the SAT, will step in. But Stroup doubts that will change much, considering the test will still use the current IRT design he’s decried. Ask any parent, teacher, or child in the system, and they will tell you that something needs to change, but nobody’s 100 percent sure what that is.