Nearly 300 middle school mathematics teachers in Nashville, Tenn., voluntarily took part in the Project on Incentives in Teaching, a three-year randomized experiment conducted by researchers affiliated with the National Center on Performance Incentives at Vanderbilt University. It was designed to study the hypothesis that a large monetary incentive would cause teachers to seek ways to be more effective and boost student scores as a result.
But it yielded only two small positive findings, limited to 5th graders in the second and third years of the experiment. No effects were seen for students in grades 6-8 in any year of the study.
At the same time, however, participating teachers did not report finding the pay program’s goals for students out of reach or its impact on school culture damaging, two concerns that have been among those voiced by opponents of performance pay.
The implementation of the pay program “did not set off significant negative reactions of the kind that have attended the introduction of merit pay elsewhere,” the study’s authors write. “But neither did it yield consistent and lasting gains in test scores. It simply did not do much of anything.”
The findings arrive in a highly charged teacher-quality policy environment, in which many states and districts, with support from the Obama administration, are overhauling current practices for preparing, evaluating, and compensating teachers.
And they come at a particularly inopportune time for the U.S. Department of Education, which is scheduled to announce a fresh slate of grantees this month under a federal program designed to seed merit-pay programs for teachers and principals.
The study, known as POINT for the Project on Incentives in Teaching, was designed by the researchers, with the input of the 76,000-student school district and the support of the local teachers’ union affiliate and the Tennessee Education Association. Matthew G. Springer, the director of the Nashville-based center, cited the unions’ cooperation as a crucial factor in the study’s successful implementation.
The executive director of the Tennessee Education Association said the reputation of the researchers played an important role in the union’s decision to sign on. “We thought it was a chance to work with researchers whose processes and reputation we trust, and they were coming at this question with no particular ideology,” said Al Mance. “We said, ‘OK, this is something we really want to know. We won’t have a better opportunity than this.’ ”
The program was instituted in Nashville between 2006-07 and 2008-09 and covered 296 middle school math teachers in grades 5-8.
Participating teachers, all volunteers, were assigned to either a treatment group eligible to receive significant pay bonuses or a control group earning normal wages. Those in the treatment group were rewarded with bonuses between $5,000 and $15,000 based on whether their students’ achievement rose by a specified amount over the course of a year. The gains were calculated using a value-added methodology designed to filter out other aspects that could have influenced the scores.
The teachers were also randomized in clusters, so that there was at least one treatment and one control teacher in every middle school. And the program contained no quotas, so all teachers whose students performed at the specified targets earned the additional pay.
Over the course of the study, attrition reduced the number of participating teachers to only 148, and researchers carefully tracked that pattern over time to make sure it did not change the equivalence of the two groups in such a way as to skew the results. Only one teacher withdrew from the study; most of the attrition occurred because teachers were reassigned or left the district.
On average, students taught by the teachers taking part in the program did not make larger academic gains than those taught by teachers in the normal wage group. The sole exception was in grade 5 in the second and third years of study.
In those years, the incentive pay was linked to statistically significant increases in student scores—an increase, the report states, equal to between a third and a half year of learning. But the effect did not appear to persist.
“By the end of 6th grade,” the study states, “it does not matter whether a student had a treatment teacher in grade 5.”
The researchers performed a number of tests to try to make sense of the grade 5 findings, including checking for evidence of a reallocation of time from other subjects to math, or of cheating on the exams. But none of them turned up any firm explanation.
“It really is puzzling,” said Mr. Springer. “It just raises questions about what’s different about 5th grade and what factors played a role. Was it student development? The curriculum? Teaching or classroom structures?”
In interviews, scholars who study performance-based pay and teacher incentives, and who were familiar with the POINT findings but not involved in the experiment, widely praised its rigorous design.
“It’s a really well-designed study, and it’s really important because a lot of the debate about performance pay has been evidence-free,” said Steven N. Glazerman, a principal researcher at Mathematica Policy Research, a Princeton, N.J.-based evaluation firm.
The existing empirical research literature on incentive pay has been limited in scope, size, and relevance. Much of the experimental research concerns programs in other countries.
What’s more, many of the existing performance-pay programs studied in the United States award far smaller bonuses, and scholars have questioned whether those amounts were enough to effect a change in teacher behavior.
But the POINT findings, said some researchers and advocates, appear to put to rest the idea that incentive pay in and of itself is enough to spur better teacher performance.
“A lot of the discussion about performance pay is based on a faulty assumption that the reason we don’t have higher test scores is that teachers are shirking their responsibilities,” said Helen F. Ladd, a professor of public policy and economics at Duke University in Durham, N.C., about the findings.
Ms. Ladd added, however, that she was “a little surprised” that the findings were not more mixed. She anticipated that teachers might work even harder over the short term to win bonuses. But that supposition was not borne out by the study.
Mr. Mance of the Tennessee Education Association said the study confirms what many teachers and unions have long believed: that teachers are already hardworking. For this study to show positive results, he said, “you’d have to have teachers who were saving their best strategies for an opportunity to get paid for them, and that is an absurd proposition.”
Researchers cautioned, however, that the Nashville experiment does not provide answers to many other questions about incentive pay. For instance, it wasn’t designed to test the hypotheses that pay incentives might serve as a draw to a different population of teacher-candidates or as an incentive for other candidates to stay in the profession—thus potentially changing the quality of the teacher workforce.
“I personally believe that the biggest role of incentives has to do with selection of who enters and who stays in teaching—how incentives change the teaching corps through entrance and exits,” said Eric A. Hanushek, a professor of economics at the Hoover Institution at Stanford University. “The study has nothing to say about this.”
And because the study looks at an incentive program strictly as pay, it remains unclear how far the findings can be extrapolated to incentives with more features, such as professional development, differentiated roles, or a new teacher-evaluation system. Many well-known incentive-pay models, including Denver’s ProComp system and the popular Teacher Advancement Program, sponsored by the Santa Monica, Calif.-based National Institute for Excellence in Teaching, contain such elements.
The use of test scores as a measure of student learning and teacher effectiveness remains a top concern for teachers. Surveys of participants for POINT found that a majority generally supported higher pay for teachers whose students made achievement gains. Yet in 2009, about 85 percent said they felt the test-based criteria for determining effectiveness were too narrow.
That lack of buy-in, the study’s authors postulated, might have contributed to the finding of no differences in how the control and treatment groups affected instruction.
From a policy perspective, performance pay has experienced a type of renaissance over the past six years, following the introduction in 2004 of ProComp and in 2006 of the federal Teacher Incentive Fund, or TIF, a program established under the administration of President George W. Bush to seed performance-pay systems.
Since 2008, the Obama administration has embraced TIF and has put its own stamp on performance pay through the Race to the Top competition, which encouraged states to institute new systems for evaluating teachers and for using the results of those evaluations to inform pay decisions.
“While this is a good study, it only looked at the narrow question of whether more pay motivates teachers to try harder,” a spokeswoman for the U.S. Department of Education said in an e-mail. “What we are trying to do is change the culture of teaching by giving all educators the feedback they need to get better while rewarding and incentivizing the best to teach in high need schools and hard-to-staff subjects.”
The effects of the report on that policy agenda are not clear, but in the short run at least, proponents of merit pay are likely to steer clear of replicating the features of the Nashville program.
“Anyone about to implement a performance-based pay system will want to pay very close attention to this study, to learn from the POINT program’s successes, but especially its shortcomings,” said Mr. Glazerman of Mathematica. “These groups bear a heavy burden to figure out how their own programs can demonstrate a greater impact than what we’ve seen so far.”
“I think most people today agree that the existing compensation structure for teachers is broken, but we don’t know what a better way is,” added Mr. Springer of the Vanderbilt center. “This experiment is one step in the right direction in terms of building our knowledge base, but we need to continue to build that base and test other program designs.”