Strengthening Research Methods

Our highly trained and experienced staff develop new approaches for the design and analysis of rigorous implementation, outcome, and impact evaluations. Mathematica also helps clients understand the evidence they already have or the research they want to fund by applying transparent, scientific standards to assess the quality of study reports and plans. Using sound science as a guide, Mathematica helps policymakers, funders, and other decision makers navigate their options. Examples of our work on strengthening research methods include:

  • "Smarter, Better, Faster: The Potential for Predictive Analytics and Rapid-Cycle Evaluation to Improve Program Development and Outcomes." Scott Cody and Andrew Asher, June 2014. Public administrators have always been interested in identifying cost-effective strategies for managing their programs. As government agencies invest in data warehouses and business intelligence capabilities, it becomes feasible to employ analytic techniques more commonly used in the private sector. Predictive analytics and rapid-cycle evaluation are analytic approaches that do more than describe the current status of programs: in both the public and private sectors, they give decision makers guidance on what to do next.
  • "Recognizing and Conducting Opportunistic Experiments in Education: A Guide for Policymakers and Researchers." Alexandra Resch, Jillian Berk, and Lauren Akers, April 2014. Opportunistic experiments are a type of randomized controlled trial that studies the effects of a planned intervention or policy change with minimal added disruption and cost. This guide defines opportunistic experiments and provides examples, discusses issues to consider when identifying potential opportunistic experiments, and outlines the critical steps for completing them. It concludes with a discussion of the potentially low cost of conducting opportunistic experiments and the potentially high cost of not conducting them. Readers will also find a checklist of key questions to consider when conducting opportunistic experiments.
  • "Orthogonal Design: A Powerful Method for Comparative Effectiveness Research with Multiple Interventions." Issue Brief. Jelena Zurovac and Randy Brown, April 2012. Orthogonal design affords an opportunity to design interventions in real-world settings and to study intervention components that can be implemented in various ways. This issue brief introduces orthogonal design, describes key design and implementation considerations, and illustrates how it can be applied in comparative effectiveness research studies.
  • "Using State Tests in Education Experiments: A Discussion of the Issues." Technical Methods Report. Henry May, Irma Perez-Johnson, Joshua Haimson, Samina Sattar, and Phil Gleason, November 2009. Securing data on students' academic achievement is typically one of the most important and costly aspects of conducting education experiments. As state assessment programs have become practically universal and more uniform in terms of grades and subjects tested, the relative appeal of using state tests as a source of study outcome measures has grown. However, the variation in state assessments—in both content and proficiency standards—complicates decisions about whether a particular state test is suitable for research purposes and poses difficulties when planning to combine results across multiple states or grades. This discussion paper aims to help researchers evaluate and make decisions about whether and how to use state test data in education experiments. It outlines the issues that researchers should consider, including how to evaluate the validity and reliability of state tests relative to study purposes; factors influencing the feasibility of collecting state test data; how to analyze state test scores; and whether to combine results based on different tests. It also highlights best practices to help inform ongoing and future experimental studies. Many of the issues discussed are also relevant for nonexperimental studies.
  • "Do Typical RCTs of Education Interventions Have Sufficient Statistical Power for Linking Impacts on Teacher Practice and Student Achievement Outcomes?" Technical Methods Report. Peter Z. Schochet, October 2009. For randomized controlled trials (RCTs) of education interventions, it is often of interest to estimate associations between student and mediating teacher practice outcomes, to examine the extent to which the study's conceptual model is supported by the data, and to identify specific mediators that are most associated with student learning. This paper develops statistical power formulas for such exploratory analyses under clustered school-based RCTs using ordinary least squares (OLS) and instrumental variable (IV) estimators, and uses these formulas to conduct a simulated power analysis. The power analysis finds that for currently available mediators, the OLS approach will yield precise estimates of associations between teacher practice measures and student test score gains only if the sample contains about 150 to 200 study schools. The IV approach, which can adjust for potential omitted variables and simultaneity biases, has very little statistical power for mediator analyses. For typical RCT evaluations, these results may have design implications for the scope of the data collection effort for obtaining costly teacher practice mediators.
  • "The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions." Technical Methods Report. Peter Z. Schochet, August 2009. This paper examines the estimation of two-stage clustered RCT designs in education research using the Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for the study population (the finite-population model) or randomly selected from a vaguely defined universe (the super-population model). Appropriate estimators are derived and discussed for each model. Using data from five large-scale clustered RCTs in the education area, the empirical analysis estimates impacts and their standard errors using the considered estimators. For all studies, the estimators yield identical findings concerning statistical significance. However, standard errors sometimes differ, suggesting that policy conclusions from RCTs could be sensitive to the choice of estimator. Thus, a key recommendation is that analysts test the sensitivity of their impact findings using different estimation methods and cluster-level weighting schemes.
  • "Estimation and Identification of the Complier Average Causal Effect Parameter in Education RCTs." Technical Methods Report. Peter Z. Schochet and Hanley Chiang, April 2009. In RCTs in the education field, the complier average causal effect (CACE) parameter is often of policy interest because it pertains to intervention effects for students who receive a meaningful dose of treatment services. This report uses a causal inference and instrumental variables framework to examine the identification and estimation of the CACE parameter for two-level clustered RCTs. The report also provides simple asymptotic variance formulas for CACE impact estimators measured in nominal and standard deviation units. In the empirical work, data from 10 large RCTs are used to compare significance findings using correct CACE variance estimators and commonly used approximations that ignore the estimation error in service receipt rates and outcome standard deviations. The key finding is that the variance corrections have very little effect on the standard errors of standardized CACE impact estimators. Across the examined outcomes, the correction terms typically raise the standard errors by less than one percent and change p-values at the fourth or higher decimal place.
  • "The Late Pretest Problem in Randomized Control Trials of Education Interventions." Technical Methods Report. Peter Z. Schochet, October 2008. This report addresses pretest-posttest experimental designs that are often used in RCTs in the education field to improve the precision of the estimated treatment effects. For logistical reasons, however, pretest data are often collected after random assignment, so including them in the analysis could bias the posttest impact estimates. Thus, the decision whether to collect and use late pretest data in RCTs involves a variance-bias tradeoff. The report examines this tradeoff both theoretically and empirically for several commonly used impact estimators, using a loss function approach grounded in the causal inference literature. The key finding is that for RCTs of interventions that aim to improve student test scores, estimators that include late pretests will typically be preferred to estimators that exclude them or that instead include uncontaminated baseline test score data from other sources. This result holds as long as test score impacts do not grow very quickly early in the school year.
  • "Statistical Power for Regression Discontinuity Designs in Education Evaluations." Technical Methods Report. Peter Z. Schochet, August 2008. This report examines theoretical and empirical issues related to the statistical power of impact estimates under clustered regression discontinuity (RD) designs. The theory is grounded in the causal inference and hierarchical linear modeling literature, and the empirical work focuses on commonly used designs in education research to test intervention effects on student test scores. The main conclusion is that three to four times larger samples are typically required under RD than experimental clustered designs to produce impacts with the same level of statistical precision. Thus, the viability of using RD designs for new impact evaluations of educational interventions may be limited, and will depend on the point of treatment assignment, the availability of pretests, and key research questions.
  • "Guidelines for Multiple Testing in Impact Evaluations." Technical Methods Report. Peter Z. Schochet, May 2008. This report presents guidelines to help education researchers address the multiple comparisons problem in impact evaluations. The problem arises because evaluation studies typically conduct a large number of hypothesis tests across outcomes and subgroups, which can lead to spurious findings of significant impacts.
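To make the idea behind orthogonal design (Zurovac and Brown, 2012) concrete, the sketch below builds one standard orthogonal design, a two-level 2^(k-1) fractional factorial, and checks that its component columns are balanced and mutually orthogonal. The choice of a half-fraction factorial, the function names, and the mapping of -1/+1 to "standard" and "enhanced" versions of each intervention component are illustrative assumptions, not the brief's method.

```python
"""Sketch: a two-level 2^(k-1) fractional factorial, one type of orthogonal design.

Assumption (not from the brief): each intervention component has two versions,
coded -1 (standard) and +1 (enhanced). Names here are hypothetical.
"""
from itertools import product
from typing import List, Tuple

Run = Tuple[int, ...]


def half_fraction(k: int) -> List[Run]:
    """Full factorial in the first k-1 components; the last is their product."""
    if k < 3:
        raise ValueError("need at least 3 components for orthogonal main effects")
    runs = []
    for levels in product((-1, 1), repeat=k - 1):
        last = 1
        for x in levels:
            last *= x
        runs.append(levels + (last,))
    return runs


def is_orthogonal(runs: List[Run]) -> bool:
    """True if every column is balanced and every pair of columns has zero inner product."""
    k = len(runs[0])
    balanced = all(sum(r[i] for r in runs) == 0 for i in range(k))
    pairwise = all(
        sum(r[i] * r[j] for r in runs) == 0
        for i in range(k)
        for j in range(i + 1, k)
    )
    return balanced and pairwise


if __name__ == "__main__":
    # Hypothetical study with three components, each in two versions.
    design = half_fraction(3)
    for run in design:
        print(run)
    print("orthogonal:", is_orthogonal(design))
```

With three components, this design uses 4 study arms instead of the 8 a full factorial would require. The price is that each component's main effect is aliased with the interaction of the other two, so the design suits settings where such interactions are expected to be small.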
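The CACE parameter discussed by Schochet and Chiang (2009) is commonly estimated by scaling the intent-to-treat impact by the treatment-control difference in service receipt rates (the Bloom adjustment, a special case of the instrumental variables estimator). The minimal sketch below computes that point estimate from hypothetical arm-level summaries and converts it to standard deviation units. It deliberately omits the variance formulas and the two-level clustering that are the report's focus, and the class and function names are assumptions for illustration.

```python
"""Sketch: CACE point estimate via the Bloom (Wald/IV) adjustment.

Assumptions (not from the report): arm-level summaries are already computed,
control-group crossover is allowed, and no variance estimate is produced.
"""
from dataclasses import dataclass


@dataclass
class ArmSummary:
    mean_outcome: float  # mean outcome for the research group
    service_rate: float  # share of the group that received a meaningful dose


def cace_estimate(treatment: ArmSummary, control: ArmSummary) -> float:
    """Intent-to-treat impact divided by the difference in service receipt rates."""
    itt_impact = treatment.mean_outcome - control.mean_outcome
    receipt_difference = treatment.service_rate - control.service_rate
    if receipt_difference <= 0:
        raise ValueError("treatment group must have a higher service receipt rate")
    return itt_impact / receipt_difference


def standardize(impact: float, outcome_sd: float) -> float:
    """Express an impact in standard deviation (effect size) units."""
    return impact / outcome_sd


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    t = ArmSummary(mean_outcome=52.0, service_rate=0.8)
    c = ArmSummary(mean_outcome=50.0, service_rate=0.0)
    cace = cace_estimate(t, c)
    print(f"CACE: {cace:.2f} points, {standardize(cace, 10.0):.2f} SD")
```

In this hypothetical example, a 2-point intent-to-treat impact with 80 percent service receipt implies a 2.5-point effect for compliers. Because the receipt rate and the outcome standard deviation are themselves estimated, the report examines whether their estimation error needs to enter the variance.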
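To illustrate the multiple comparisons problem described in "Guidelines for Multiple Testing in Impact Evaluations" (Schochet, 2008), the sketch below applies two widely used adjustments to one hypothetical family of p-values. The Bonferroni correction controls the familywise error rate, while the Benjamini-Hochberg step-up procedure controls the false discovery rate. These are common adjustment methods and are not necessarily the ones the report recommends for any particular situation. The function names and p-values are illustrative.

```python
"""Sketch: two common multiple-comparison adjustments for a family of p-values."""
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0 only where p <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Step-up procedure controlling the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for impacts on four outcome domains.
    pvals = [0.01, 0.02, 0.03, 0.20]
    print("Bonferroni:", bonferroni(pvals))
    print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

With these four p-values, all but the last would be significant at 0.05 if tested one at a time. Bonferroni keeps only the first (threshold 0.0125), while Benjamini-Hochberg keeps the first three. This shows how the choice of error rate to control can change which impact findings are reported as significant.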