Measuring Teachers' Effectiveness: A Report from Phase 3 of Pennsylvania's Pilot of the Framework for Teaching

Published: Apr 23, 2015

Publisher: Washington, DC: Mathematica Policy Research

Associated Project

Pennsylvania Teacher and Principal Evaluation Pilot

Time frame: 2010-2015

Prepared for:

Team Pennsylvania Foundation

The Gates Foundation

Authors

Stephen Lipscomb

Jeffrey Terziev

Duncan Chaplin

Key Findings

Teacher performance, as captured by the Framework for Teaching (FFT), was generally rated in the top two possible performance categories (distinguished or proficient) in both 2011–2012 and 2012–2013 in the Pennsylvania districts covered in our study.
Less than 0.1 percent of the teachers in our study were rated in the bottom category (failing).
The FFT scores were internally consistent, meaning that the domains and the components within each domain appear to be measuring similar concepts.
The correlations of the FFT scores with value-added measure (VAM) scores were all positive and generally statistically significant, ranging from 0.19 to 0.22 by domain.

In this report we analyzed data on two measures of teacher performance: the Framework for Teaching (FFT), which is based largely on classroom observations, and value-added measures (VAM), which are based on student test scores. The data we analyzed cover 6,676 teachers from 269 districts in Pennsylvania, including Pittsburgh Public Schools. The observation-based data describe teacher performance on the 22 components of Charlotte Danielson's FFT, each of which is designed to capture a distinct teaching practice. We used these data to estimate four domain scores and one overall Professional Practice Rating (PPR) score, and we merged these scores with data on teachers' estimated contributions to student achievement growth.

Based on these pilot data from the 2012–2013 school year, we estimate that, although less than 13 percent of teachers received the top rating (distinguished) for the overall PPR score, almost 85 percent were rated in the second-highest category (proficient). Less than 0.1 percent were rated in the bottom category (failing), and the remaining teachers (around 2.6 percent) received needs improvement ratings. FFT scores were internally consistent, meaning that the domains, and the components within each domain, appear to be measuring similar concepts. Teachers with higher FFT scores tended to produce greater student achievement growth: the correlations of the FFT scores with VAM scores were all positive and generally statistically significant, ranging from 0.19 to 0.22 by domain.

We compared the results based on the 2012–2013 data with results based on 2011–2012 data from a previous pilot phase, and for the most part the findings were similar. More than 90 percent of teachers were rated in the top two performance categories in both phases, although the fraction of ratings in the top two categories decreased somewhat in Pittsburgh (which contributed more teachers to the pilot than any other district). The levels of internal consistency were in the acceptable to good ranges in both phases, with the overall PPR score having higher consistency than any of the domain scores. The correlations between parts of the FFT and VAM scores were almost always positive but below 0.30 in both phases, and the lowest correlations in 2011–2012 improved slightly in 2012–2013.

In sum, although FFT scores are overwhelmingly concentrated in the top two performance categories, the positive correlations with VAM suggest that the FFT provides some meaningful differentiation and captures aspects of teacher skills related to student achievement growth.
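This summary does not state which statistics were used, so the following is only a hedged illustration of the two ideas it relies on. It assumes internal consistency is measured with Cronbach's alpha (a common choice, but an assumption here) and that the FFT-VAM association is a Pearson correlation. The function names, the toy scores, and the treatment of the PPR as a simple unweighted mean of the four domains are all hypothetical and are not the study's data or method.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(components):
    """Cronbach's alpha for k components, each a list of scores for the same n teachers."""
    k = len(components)
    if k < 2:
        raise ValueError("need at least two components")
    n = len(components[0])
    if any(len(c) != n for c in components):
        raise ValueError("every component must score the same teachers")
    # Sum of the variances of the individual components.
    component_var = sum(pvariance(c) for c in components)
    # Variance of each teacher's summed score across all components.
    totals = [sum(c[i] for c in components) for i in range(n)]
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - component_var / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data (not from the study): four domain scores on a 1-4
# scale for five teachers, plus a made-up VAM estimate for each teacher.
domains = [
    [3, 3, 2, 4, 3],
    [3, 4, 2, 4, 3],
    [2, 3, 2, 4, 3],
    [3, 3, 3, 4, 2],
]
vam = [0.10, 0.05, -0.20, 0.30, 0.00]

# Assumption: treat the overall score as an unweighted mean of the domains.
ppr = [mean(teacher) for teacher in zip(*domains)]

print(f"alpha = {cronbach_alpha(domains):.2f}")
print(f"r(PPR, VAM) = {pearson(ppr, vam):.2f}")
```

Alpha approaches 1 when the components rise and fall together across teachers, which is the sense in which the domains "appear to be measuring similar concepts." A correlation in the 0.19 to 0.22 range, as reported by domain, indicates a positive but modest relationship between the two measures.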

More like this from Mathematica

Evaluation of the Networks for School Improvement (NSI) Initiative

Early Childhood Educator Pay Equity Fund (PEF): Do Economic Returns Change Over Time?

The Impact of Registered and Unregistered Apprenticeship

Principles and Promising Practices for Hiring and Retaining Young Autistic Workers
