I took my first standardized test thirty years ago. I was in third grade at Seele Elementary in New Braunfels, and it was the Texas Assessment of Academic Skills, or TAAS, test. Our teacher, Bunny Hollis—she and her cardigans were straight out of a heartwarming midcentury children’s novel—called for our attention one afternoon and briefed us on the upcoming exam with not a hair of her perfectly stationary coiffure askew. I don’t remember exactly what she said, but I do remember the nonchalance. It was something to this effect: “There’s going to be a test tomorrow. Don’t worry about it; I’m your teacher, and I’m confident you know far more than you’ll need to know to pass. Please eat a big breakfast and don’t be late.”

And then we took the test, with no more novelty than the smell of new pencils. Standardized testing was nothing new—Texas started testing third, fifth, and ninth graders in 1980, and by 1993 we were on the third iteration of the exam. We’d had TABS, TEAMS, and now TAAS, and no one cared all that much about any of them or felt they were the truest measure of our nine-year-old academic capacity. Public confidence in the usefulness of the tests is still shaky—the Charles Butt Foundation surveyed parents and teachers in 2023 and found that 45 percent of public school parents and 81 percent of current and former public school teachers polled are not confident in the current test’s ability to measure student learning.

You’d never guess that if you visited a Texas elementary school at any point in the last two weeks.

I’ve been reporting on education in Texas since 2013, the second year of the State of Texas Assessments of Academic Readiness, or STAAR test (which replaced the TAKS test, which replaced the TAAS test). On one of my first campus visits I saw fourth graders in T-shirts with a Star Wars–themed play on the STAAR acronym. I thought it was a little overkill. In San Antonio, testing coincides with Fiesta season, so I assumed that some of the STAAR-themed decorations were unique to our papel picado–clad city. It turns out they are not unique, nor are they the most ebullient celebration of the standardized tests. Scrolling through social media this week, I saw posts from across the state. Marching bands and celebrity guests at pep rallies, reality TV–themed contests, special dress days, and more, all dedicated to the STAAR test.

This was also my first year to live the hype in my own home, to see it all in one place, as my third-grade daughter took the STAAR test after several months of escalating intensity. My usually unflappable, honor roll–earning nine-year-old had been talking since February about how nervous she was, how her number one goal this year was to pass the test. On the day before, third graders were allowed to wear pajamas and the school hosted a STAAR pep rally. My first grader came home with one of the signs his class had made to encourage the test takers. A brief glance at social media suggested that across the state, these signs are waved by the younger children as the test takers walk through the halls in a sort of parade of tributes being marched to the Colosseum.

My daughter’s mounting anxiety didn’t surprise Phyllis Fagell, a school-based licensed clinical professional counselor, Washington Post contributor, and author of the books Middle School Matters and Middle School Superpowers. What did surprise her was the fanfare itself. She’s not based in Texas, so she needed a little context when I asked her about “test hype.” I explained the bit about pep rallies and parades and Fagell was, frankly, astounded. “I can understand the well-meaning intention behind it, trying to bring some joy to a dreary experience,” she said, “but amping it up and having a lot of ritualistic activity, for some kids, is going to have the opposite effect.”

Fagell sees plenty of elementary and middle school kids with test anxiety. When she does, she said, she tries to calm them down, not rev them up. She gives them tools to get them out of their amygdala—the part of the brain that registers fear—and into their prefrontal cortex, where critical thinking happens. She encourages them to have a small item in hand going into the test, a good luck charm of sorts, and before the test to think of three words to describe the object. This requires the prefrontal cortex to activate and quiets the amygdala. But trying to do that or other calming practices after weeks of buildup could send a mixed message.

That confusion bothers Tiffani Leavitt, a high school teacher with a fifth grader in San Antonio. It’s not just the festivities, though those do make the cognitive dissonance worse, she said. It’s the months of buildup. “Schools prep kids for this test all year long. They give tutoring around how to answer the questions, they give kids strategies on how to highlight and make charts and graphs in the margins of the tests, all the while telling the kids, ‘Don’t worry!’ For a developing brain to try and sort through whether to put this in the category of ‘important’ or not is too much!”

Kids are no fools. The tests are a big deal. The Texas Education Agency has taken steps to alleviate some of the pressure this year by giving individual kids less to worry about. Fifth and eighth graders no longer have to pass the test to advance to the next grade. Some of the multiple-choice questions have been changed to short-answer, or “claim, evidence, reasoning,” which my daughter informs me are “the best questions.” But, lo and behold, the pressure remains. Kids are not just picking up on the importance of the test itself, Fagell suggested. They are picking up on the adults’ anxiety. They don’t want to let anyone down.

The increased razzmatazz has coincided with rising stakes not just for the students, but for their schools. Bunny Hollis could be nonchalant because the test really wasn’t a big deal. It was 1993, and our TAAS scores were not yet factored into any form of public campus accountability. Weeks later, Governor Ann Richards would sign Senate Bill 7 into law, creating a statewide method of rating and evaluating campuses with standardized test scores as one of the criteria. These tests have been part of every iteration of accountability law since then. I say “part,” but that’s really only for high schools, where graduation rates and college and military readiness count toward the score, currently an A–F rating. In elementary and middle school, STAAR scores are the only data considered in the rating. The stakes got even higher when Richards’s successor, Governor George W. Bush, took Senate Bill 7 with him to Washington and gave us No Child Left Behind, which tied federal funding to school performance.

Before we get too deep into this morass, I do want to say that I understand, and even support, the use of standardized testing to check for weak spots and gaps in student learning. Because teacher quality is the biggest factor in student success, I have been convinced by data that teacher accountability should include student outcomes, especially baseline knowledge measured by tests. As a parent and a reporter, I support the public having access to test scores and other clear, equitable information about school quality. Educators largely agree that there’s some value in testing, said Kelli Moulton, former superintendent of Galveston ISD and chair of Raise Your Hand Texas’s Measure What Matters Assessment and Accountability Council. “We absolutely believe that the academic test has a place.”

The details of how best to hold schools and teachers accountable are worth the debate they’ve engendered. But for all the tinkering we do to try to perfect the way we use test scores, the process of getting them seems to be increasingly unhealthy. There’s a slope between the high-level people trying to close the gaps in education and the tiny test takers generating the data, and anxiety snowballs as it rolls downhill.

Pediatric psychiatrist Theresa Treviño first got involved in test reform when she noticed the effects of high-stakes testing on parents who were worried that their developmentally normal five-year-old would be too squirmy for the standardized tests three years later. The prospect of high-stakes testing was shaping how parents judged their children’s fitness for school, Treviño said. She could put these parents’ minds at ease about the wiggliness of kindergartners, but she couldn’t alleviate the pressure the family clearly felt.

When Texas students began taking the STAAR test in 2012, Treviño joined with others to form Texans Advocating for Meaningful Student Assessment. The organization regularly looks for examples of accountability systems in other states, such as New York, where assessments take more criteria into account to create a more accurate picture of what is happening inside a school. Her organization advocates at the Texas Legislature for inclusion of criteria other than testing, such as allowing students to submit portfolios of work or including extracurricular offerings as an additional measure of school quality.

Proponents of high-stakes testing claim that emphasizing curriculum and instruction improves both students’ classroom experience and test scores; that’s the idea behind the whole testing enterprise. By limiting the accountability measures to academic knowledge, the tests prevent schools from masking illiteracy with other factors and, in theory, incentivize better instruction, not just test prep or school spirit. In 2019, concerns were raised about whether the STAAR was even grade-appropriate, but 2019 data from the TEA show that the content of the test is actually a good measure of the curricula taught in school. But at the end of the day, all of that teaching and instruction comes down to one day, one test, one pressure point.

Beginning in October 2021, Raise Your Hand Texas, a nonprofit organization that advocates for Texas public education policies, talked to more than 15,600 Texas parents, teachers, students, and community members about accountability in schools and learned that parents would like to see ratings that consider three other measurable factors: students’ feelings of safety and engagement, teacher quality, and enrichment activities. “We could build these indicators,” senior director of policy Libby Cohen said. “We have the data.”

The current A–F rating seems simple and straightforward, Cohen said, until parents try to understand the scores that created it, which are measured against other schools and against a school’s scores in previous years. The complexity is born of an effort to make the accountability system more equitable, because raw test scores tend to track most closely with one variable: income. In the first years TAAS scores carried their newfound weight, Bunny Hollis still probably didn’t need to worry too much. She was a phenomenal teacher, sure, but more saliently, only 38 percent of New Braunfels ISD students were economically disadvantaged in 1995. From 1995 to 2002 Seele Elementary carried either “acceptable” or “recognized” status. Meanwhile, less affluent districts fell into year after year of penalties. While the scoring has been recalibrated recently to consider student growth, not just overall scores, the correlation between test scores and income persists, and schools that fall short one year often double down the next year trying to get higher scores. “Any time you [publicize a poor rating], you place a stigma on those schools and classes. That causes them to dig in deeper,” Moulton said.

Lawmakers regularly tell Cohen and Moulton that schools don’t have to make test prep the all-consuming theme of the spring semester. “But the reality of saying for an elementary or middle school that your public-facing rating is going to be one hundred percent based on these STAAR scores creates unavoidable pressure,” Cohen said.

Facilities managers are not allowed to mow on test days, kindergarten teachers are pulled from their classes to proctor tests, and high school seniors are sent home for the week during STAAR tests, Moulton said. The whole school is reorganized to create a pristine testing environment—a sharp contrast to the marching bands and pep rallies. Kids feel the weight placed on the day. On them.

Again, this isn’t irrational behavior as much as it is inevitable in a competitive landscape. As schools try to market themselves to parents with lots of choices, an A works much better than a C. Whether it’s the flight to the suburbs, the proliferation of charter schools, or the proposed voucher schemes of current legislative agendas, the battle for better test results has become nothing short of existential. That’s before you consider the threat of closure and state intervention, both of which depend largely on test scores. Only one group of people can deliver high test scores, and those people go to bed at 8:30 p.m. and know more about Pokémon than they do about pedagogy.

If they are treated like tributes marching into battle through halls lined with cheering villagers, waving banners and shouting praise, the fate of the kingdom resting on their tiny shoulders, it’s because, in many ways, they are.