Public policies aim to improve social welfare and address major societal issues. However, determining whether a given policy actually achieves its intended goals can be challenging. Randomized controlled trials (RCTs) have become an increasingly popular method for empirically evaluating the causal impacts of public policies and programs. When designed and implemented appropriately, RCTs can provide unbiased estimates of policy impacts. This allows policymakers to make evidence-based decisions about whether to expand, modify, or discontinue specific policies and programs.
This article provides an overview of using RCTs to evaluate public policies. It begins by discussing the need for policy evaluation and the advantages of RCTs compared to other evaluation methods. The article then outlines the key steps involved in conducting RCTs of public policies: identifying research questions, choosing a sample population, designing the intervention and control conditions, randomizing study participants, collecting data, analyzing results, and translating findings into policy recommendations. Special considerations for conducting RCTs in partnership with government agencies are also discussed. The article concludes by examining limitations and ethical considerations of using RCTs to assess policies and programs. It also provides concrete examples of influential RCT studies of public policies across diverse fields such as education, health care, criminal justice, and economic welfare.
The Need for Policy Evaluation
Implementing public policies requires substantial investments of public funds and resources. Therefore, policymakers have an obligation to understand whether new programs and reforms achieve their intended objectives once implemented in the real world. Historically, public policies were often designed based on theory, good intentions, or ideology without much empirical evidence regarding their effectiveness (1). However, theories and good intentions alone do not guarantee that a policy will work as expected in practice. Policies can fail or have unintended consequences due to faulty assumptions, poor implementation, or complex real-world conditions not considered during the design process (2).
Policy evaluation provides empirical feedback about what is and is not working once a policy is rolled out. This can facilitate continuous improvement and learning to maximize policy success over time. Evaluations can detect implementation problems, highlight ways to improve program delivery, and provide accountability for whether public dollars are being spent effectively (3). Policy evaluations also build generalizable knowledge about what types of programs work best for improving social welfare across different contexts (4). Evidence gained from policy evaluations can inform decisions about whether to expand, modify, discontinue, or redesign programs and reforms. As such, evaluations represent a key tool for facilitating data-driven improvements in public policy.
Advantages of Randomized Controlled Trials
A randomized controlled trial (RCT) is a type of impact evaluation that can provide strong evidence regarding a policy or program’s effectiveness. In an RCT, study participants are randomly assigned either to receive the policy/program (the intervention group) or to continue with business as usual (the control group). Because assignment is random, observed and unobserved characteristics such as motivation, socioeconomic status, and ability are expected, on average, to be evenly distributed across the intervention and control groups. This allows the groups to be directly compared in terms of key outcome measures of interest. Any differences observed in outcomes between the groups can be attributed to the intervention rather than to inherent differences between the groups (5).
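This identification logic can be stated compactly in potential-outcomes notation; the formulation below is a standard textbook sketch rather than one drawn from the cited sources.

```latex
% Average treatment effect (ATE) under random assignment.
% Y_i(1), Y_i(0) are participant i's potential outcomes with and without the policy;
% T_i = 1 indicates assignment to the intervention group.
\mathrm{ATE} = \mathbb{E}\left[Y_i(1) - Y_i(0)\right]
% Randomization makes T_i independent of (Y_i(1), Y_i(0)), so each term is identified
% by the corresponding group mean:
\mathbb{E}\left[Y_i(1)\right] = \mathbb{E}\left[Y_i \mid T_i = 1\right], \qquad
\mathbb{E}\left[Y_i(0)\right] = \mathbb{E}\left[Y_i \mid T_i = 0\right]
% so the ATE equals the simple difference in observed group means:
\mathrm{ATE} = \mathbb{E}\left[Y_i \mid T_i = 1\right] - \mathbb{E}\left[Y_i \mid T_i = 0\right]
```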
RCTs are considered the “gold standard” for evaluating interventions because randomization minimizes selection bias and provides the strongest evidence of causal impacts (6). Alternative evaluation methods such as regression analysis or matched comparison groups can suffer from selection bias whereby the intervention group differs systematically from the comparison group even before the intervention begins. RCTs mitigate this risk and allow researchers to isolate the impact of the policy itself. Findings from rigorously conducted RCTs are considered highly credible by economists and policy experts (7).
While no evaluation design is perfect, RCTs have important advantages compared to other methods (8):
- They provide unbiased estimates of an intervention’s true impact. The random assignment of participants minimizes selection bias.
- They allow strong causal conclusions. Differences between intervention and control groups can be attributed to the policy.
- They have high internal validity within the study context. Impacts on the specific study population can be measured with minimal bias.
- They can be replicated and scaled. RCT procedures and findings can be reproduced and expanded to other contexts.
- They align with principles of scientific inquiry. RCTs draw from experimental methods used across scientific disciplines.
These strengths have made RCTs an increasingly popular tool for empirically evaluating policies worldwide over the past few decades (9). Researchers and governments have leveraged RCTs to assess policy impacts across diverse areas including health care, education, criminal justice, social welfare, child development, and economic development (10). Later sections provide concrete examples of influential RCTs evaluating public policies.
Steps in Conducting a Randomized Controlled Trial of a Public Policy
Conducting a rigorous RCT to evaluate a public policy intervention involves careful planning and coordination among researchers, policymakers, and other stakeholders. Key steps include (11):
- Identify the policy questions to be answered by the RCT.
- Determine an appropriate sample population for studying the policy.
- Design the intervention and control conditions.
- Randomly assign study participants to create intervention and control groups.
- Collect data on relevant outcome measures.
- Analyze results using statistical methods.
- Translate findings into policy recommendations.
The following sections discuss each of these steps in greater detail.
Identifying Research Questions
The first step is determining the specific policy impacts to be measured through the RCT. Research questions should address key information needed by policymakers and stakeholders. Common research questions include (12):
- Does the policy affect the primary outcomes it was designed to improve? For example, does a job training program improve employment rates?
- How large are the policy’s effects on key outcomes?
- How do impacts vary for different population subgroups?
- How cost-effective is the policy?
- What operational factors influence the policy’s success?
- What are participant experiences and perspectives on the policy?
- What unintended or unexpected outcomes does the policy produce?
The research questions guide decisions about the target population, intervention design, and data collection. The questions should align with the needs of policymakers and fill critical evidence gaps to support data-driven policy improvements (13). Engaging policymakers and researchers collaboratively in formulating the research questions helps ensure the study focuses on the most policy-relevant issues.
Choosing the Study Sample
Once the guiding research questions are determined, the sample population must be selected. The sample includes both individuals who will receive the intervention and those who will form the control group. The sample population should match the target population the policy aims to impact (14). For example, to test the impacts of a subsidized employment program for disadvantaged adults, the study sample would need to include individuals meeting the program’s eligibility criteria. Additionally, the sample must be large enough to provide sufficient statistical power to detect meaningful impacts of the policy (15). The sample size needed depends on the expected size of the policy’s effects and other design factors.
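To make the dependence on expected effect size concrete, the sketch below performs a standard power calculation with the statsmodels library; the 0.2 standardized effect, 5% significance level, and 80% power target are illustrative assumptions rather than recommendations.

```python
# Illustrative sample-size calculation for a two-arm RCT with a continuous outcome.
# Assumptions (hypothetical): standardized effect size of 0.2, two-sided 5% alpha,
# 80% power, equal allocation to intervention and control.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()
n_per_group = power_analysis.solve_power(
    effect_size=0.2,   # expected impact in standard-deviation units (assumed)
    alpha=0.05,        # two-sided significance level
    power=0.80,        # probability of detecting the effect if it truly exists
    ratio=1.0,         # equally sized intervention and control groups
)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 394 per arm
```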
When rolling out a new policy nationwide or statewide, it is not feasible to randomly assign all eligible recipients to create intervention and control groups. Instead, RCTs typically randomly assign a subset of the eligible population within certain geographic areas or jurisdictions (16). For instance, a welfare reform policy may be randomly implemented across certain city districts but not others. The districts where the policy is active serve as the intervention group, while the districts continuing with the status quo serve as the control group. This enables the policy’s impacts to be isolated by comparing average outcomes between districts. When well-designed, findings from these RCT subgroups can generalize to the broader population eligible for the program (17).
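A minimal sketch of this kind of district-level assignment is shown below; the district names and the 50/50 split are hypothetical, and a real trial would usually also stratify or match districts on baseline characteristics.

```python
# Minimal sketch of cluster (district-level) random assignment.
# District identifiers are hypothetical placeholders.
import random

districts = [f"district_{i:02d}" for i in range(1, 21)]  # 20 eligible districts (assumed)

rng = random.Random(2024)       # fixed seed so the assignment is reproducible and auditable
shuffled = districts.copy()
rng.shuffle(shuffled)

intervention_districts = sorted(shuffled[:10])   # policy rolled out in these districts
control_districts = sorted(shuffled[10:])        # these districts continue with the status quo

print("Intervention:", intervention_districts)
print("Control:", control_districts)
```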
Designing the Intervention and Control Conditions
A key step in RCT design is defining the specifics of the policy intervention to be tested. This includes determining what program components and services will be provided, by what means, over what duration, and to which subgroups (18). The intervention should closely mirror how the policy would actually be delivered if implemented on a larger scale. However, the trial may require some additional monitoring activities, such as tracking participation and collecting survey data, that would not be part of full policy implementation.
The control group provides a counterfactual showing what outcomes would occur if the policy were not implemented. The control condition should represent “business as usual” rather than an absence of any existing programs or services (19). For example, evaluating how a new 60-hour job training program affects earnings would require a control group that can access the standard job training programs already available rather than no programs at all. The control condition could mirror the intervention in areas unrelated to the policy changes being tested, such as undergoing the same data collection procedures. However, controls should not receive any components of the intervention being evaluated.
Random Assignment
After determining the study sample and intervention/control conditions, individuals must be randomly assigned between the two groups. Simple randomization gives each study participant an equal probability of being assigned to the intervention versus the control group, typically using a computerized algorithm. More complex methods such as stratified randomization can also help ensure the groups are well balanced on covariates predictive of the outcome, such as age, race, income level, or employment history (20). This further minimizes differences between groups at baseline.
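The sketch below illustrates stratified random assignment in code; the covariates (age band and baseline employment status), sample size, and 50/50 allocation are hypothetical placeholders, not prescriptions.

```python
# Illustrative stratified random assignment: participants are randomized separately
# within each stratum so the arms stay balanced on those covariates.
# The covariates and data are hypothetical placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # fixed seed documents and reproduces the assignment

participants = pd.DataFrame({
    "id": range(1, 201),
    "age_band": rng.choice(["18-34", "35-54", "55+"], size=200),
    "employed_at_baseline": rng.choice([True, False], size=200),
})

def assign_within_stratum(stratum: pd.DataFrame) -> pd.Series:
    """Randomly assign half of a stratum to the intervention and half to control."""
    n = len(stratum)
    labels = np.array(["intervention"] * (n // 2) + ["control"] * (n - n // 2))
    return pd.Series(rng.permutation(labels), index=stratum.index)

participants["arm"] = (
    participants.groupby(["age_band", "employed_at_baseline"], group_keys=False)
    .apply(assign_within_stratum)
)

# Check balance: each stratum should split roughly evenly across the two arms.
print(participants.groupby(["age_band", "employed_at_baseline", "arm"]).size())
```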
Maintaining and documenting the integrity of the randomization process is critical, as any failures undermine the assumptions behind drawing causal conclusions from RCT findings (21). Additionally, some RCTs incorporate blinding, whereby neither participants nor those implementing the intervention know assignment status. This can reduce biases in service delivery, attrition, and self-reported outcomes between groups. While seldom achievable in public policy trials, blinding provides another layer of rigor (22).
Collecting Data
RCTs require comprehensive data collection on relevant outcome measures, covariates, implementation parameters, and costs. Data collection occurs before randomization to establish a baseline, during intervention rollout to track participation, and post-intervention to measure impacts (23). Having valid outcome measures that closely align with the policy objectives and research questions is essential. Administrative records and interviews/surveys with participants can provide data on outcomes like employment, income, health status, and educational attainment. Exploring qualitative experiences through case studies and focus groups can supplement quantitative data (24). Collecting information on implementation factors such as staffing, compliance, and barriers/successes can elucidate how intervention impacts are achieved. Finally, documenting costs is necessary for assessing cost-effectiveness.
Consistent data collection protocols must be established for the intervention and control groups. While more extensive measurement is often needed for the intervention group, key outcomes and covariates must also be tracked for controls (25). Statistical power depends on having sufficient sample sizes for both groups. Minimizing missing data and participant attrition during the study period is also important to mitigate bias and maintain randomization integrity.
Analyzing Results
Once outcome data are collected after sufficient follow-up, results are analyzed using statistical methods to quantify the policy’s impacts. Intention-to-treat analyses compare average outcomes between all individuals originally randomized to the intervention versus control conditions, regardless of whether participants actively engaged with the intervention (26). This preserves the benefits of randomization. Missing data can be addressed using methods like multiple imputation. Assessing intervention effects across relevant subgroups, time points, and outcome measures provides deeper insights (27). Covariate adjustment and regression modeling help account for differences in baseline characteristics. Testing pre-specified hypotheses makes findings more robust. Partnering with statisticians helps ensure appropriate analytical methods are used (28).
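As a minimal illustration of an intention-to-treat comparison with covariate adjustment, the sketch below fits an ordinary least squares regression with statsmodels; the file name, variable names, and covariates are hypothetical.

```python
# Intention-to-treat (ITT) estimate of a policy's impact on earnings, with
# adjustment for baseline covariates. All variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# The data frame is assumed to contain one row per randomized participant,
# whether or not they actually took up the program (that is the ITT principle).
df = pd.read_csv("trial_outcomes.csv")  # columns: earnings, assigned, age, baseline_earnings

itt_model = smf.ols(
    "earnings ~ assigned + age + baseline_earnings",  # assigned = 1 if randomized to intervention
    data=df,
).fit(cov_type="HC2")  # heteroskedasticity-robust standard errors

# The coefficient on `assigned` is the ITT effect: the average difference in earnings
# between everyone offered the program and everyone not offered it.
print("ITT estimate:", itt_model.params["assigned"])
print("95% CI:", itt_model.conf_int().loc["assigned"].tolist())
```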
Beyond statistical tests, calculating effect sizes and confidence intervals conveys policy impacts in tangible metrics like differences in earnings, test scores, or employment rates. Power analyses help determine whether null results reflect a truly negligible impact or simply insufficient statistical power. Cost-effectiveness analyses weigh outcomes against financial costs. Integrating qualitative data provides nuance around experiences and implementation (29). Comparing the RCT’s outcome measures and effects with results from other studies assessing the same policy strengthens the evidence base.
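For interpreting null results, the same power machinery can be run in reverse to compute a minimum detectable effect for the sample actually analyzed; the group size below is a hypothetical assumption.

```python
# Minimum detectable effect (MDE): the smallest standardized effect the trial had
# roughly an 80% chance of detecting, given the sample it actually enrolled.
# The sample size below is hypothetical.
from statsmodels.stats.power import TTestIndPower

mde = TTestIndPower().solve_power(
    effect_size=None,  # leave unspecified so the routine solves for the effect size
    nobs1=250,         # participants analyzed in the intervention group (assumed)
    ratio=1.0,         # equally sized control group
    alpha=0.05,
    power=0.80,
)
print(f"Minimum detectable standardized effect: {mde:.2f}")
# If the policy's plausible effects are smaller than this, a null finding mainly
# reflects limited power rather than evidence of no impact.
```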
Translating Findings into Policy Recommendations
Objectively analyzing and reporting RCT results to policymakers is critical for translating findings into reforms and improvements (30). Providing clear takeaways regarding what worked and what did not is more influential than focusing solely on statistical minutiae. Results should be conveyed in accessible language and formats. Visual data displays and summaries help communicate key findings. To ensure credibility and trust in findings, the study methodology and analyses should be thoroughly documented and open to review.
Of course, RCT results alone are rarely definitive. Findings should be interpreted within the broader context of existing theory and evidence on the policy area (31). RCTs test specific intervention variants within certain populations and settings. Qualitative insights into implementation factors can aid in explaining results and assessing generalizability. Policy recommendations flowing from RCTs may involve expanding, modifying, or replacing the tested interventions to amplify positive impacts and minimize negative ones (32). Control group practices associated with better outcomes can also inform policy revisions. Rather than providing rigid prescriptions, RCTs offer valuable data to strengthen the ongoing evolution of public policies.
Conducting Policy RCTs in Partnership with Government Agencies
While some RCTs are led entirely by academic researchers, coordination with government agencies implementing the policies enables more policy-relevant trials. Agency partnership provides researchers with access to policy participants, administrative data, and operational insights. It also facilitates communicating findings directly to policymakers poised to enact changes. Key strategies for productive partnerships include (33):
- Engage agencies in formulating research questions to ensure relevance.
- Obtain agency support for random assignment protocols.
- Maintain open communication channels throughout the process.
- Share results promptly and transparently with agency partners.
- Frame recommendations in actionable terms.
Agencies may justifiably worry that RCTs could temporarily disrupt services or that findings may reflect poorly on current practices. Managing expectations and emphasizing the shared goal of improving outcomes can mitigate these concerns (34). Testing incremental changes to existing policies is often more feasible than radically new reforms. Allowing agencies to participate while maintaining researcher independence and objectivity is crucial (35).
Despite challenges, government-researcher partnerships enable RCTs that provide more useful real-world evidence than ivory tower studies or observational analyses alone (36). Federal funding for experiments with state and local policies through initiatives like the Education Innovation and Research program highlights the perceived value of such partnerships (37). As agencies build technical capacity for using evidence, opportunities are growing for policy RCTs tightly linked to government needs (38).
Limitations and Ethical Considerations
While RCTs provide rigorous impact estimates, the approach has limitations. RCTs can be expensive, complex to implement at scale, and time-consuming to yield results (39). Randomizing policies with large spillover effects can create complications (40). Practical and political factors constrain the types of policies amenable to experimentation (41). For controversial reforms, an initial RCT may be required before broader rollout is considered feasible (42).
Additionally, some argue randomly denying services to control groups raises ethical concerns, especially for policies affecting disadvantaged populations (43). However, many public programs already ration access due to limited resources. RCTs simply introduce scientific randomization into this preexisting process. And findings about effective programs can ultimately expand access and improve welfare broadly (44). Still, obtaining informed consent from participants and allowing controls to access similar existing services mitigate ethical issues (45).
Despite challenges, RCTs have provided invaluable, scientifically rigorous evidence across diverse policy domains. The following sections provide examples of influential RCTs that shaped policy decisions and understanding of best practices in important areas.
Education Policy
A large body of RCT research has evaluated education policies, including school accountability measures, teacher incentive pay, school choice programs, and educational technologies. For example, a series of RCTs has studied vouchers allowing students to attend private schools, in settings ranging from Washington, DC, to India (46-48). Students offered vouchers graduated high school and attended college at higher rates, suggesting vouchers improved outcomes. However, effects varied across groups, with larger impacts for African American students. Such nuanced results have informed ongoing debates regarding expanding school choice programs.
Other RCTs have studied teacher performance incentives with mixed results. While some experiments found pay-for-performance improved student test scores (49), others detected no impacts (50). Variable findings across sites likely reflect challenges in optimally designing and implementing performance metrics. This highlights the value of testing reforms empirically prior to systemwide adoption. Beyond student achievement, RCTs have also assessed education policies targeting outcomes like school attendance, disciplinary actions, and social-emotional skills (51-53). The Institute of Education Sciences’ Regional Educational Laboratory program has fueled over 75 RCTs testing education policies across diverse contexts (54).
Health Care Policy
Health care policies lend themselves well to RCT evaluations. Randomized clinical trials are the norm for testing medical treatments and drugs. Similar methods have assessed health insurance reforms, financial incentive programs, care management models, health technologies, and public health campaigns. For example, the RAND Health Insurance Experiment randomly assigned families to health plans with varying levels of cost-sharing to estimate how different coverage designs affected health care spending and health outcomes (55). This seminal study found that while higher cost-sharing reduced spending, it also caused some individuals to underuse highly effective care. Reporting changes in both spending and health helped inform subsequent insurance policy reforms.
More recent RCTs have tested innovative ways of delivering preventive health services. In one trial, financial incentives like grocery store gift cards tripled demand for HIV tests (56). Another experiment found publicity campaigns were less effective than free home delivery in increasing folic acid supplementation to prevent birth defects (57). Such studies objectively measure how to optimize preventive health programs. RCTs will continue generating evidence to improve health policies given their widespread acceptance in medicine and public health.
Criminal Justice Policy
Reducing crime and incarceration through evidence-based criminal justice policies has drawn substantial policy interest. RCTs have become popular for evaluating interventions like police practices, pretrial release programs, probation tactics, prisoner reentry services, and diversion programs aiming to limit incarceration among adolescents and individuals with mental illnesses (58-60).
For example, an RCT tested a Chicago program providing cognitive behavioral therapy to high-risk young men to prevent violence (61). Therapy participants were 63% less likely to be arrested for violent crimes, highlighting the intervention’s benefits. Another RCT found work release programs allowing confined individuals to maintain employment reduced recidivism after release compared to standard incarceration (62). Such studies isolate causal impacts on recidivism and clarify what programs help individuals desist from crime. They counteract reliance on overly punitive policies lacking empirical support. Criminal justice agencies have increasingly embraced experiments to guide reforms (63). RCTs will continue generating actionable evidence as interest grows in reducing incarceration and improving rehabilitation (64).
Social Welfare Policy
RCTs have informed policies aimed at reducing poverty and disadvantage. Assessing economic supports such as housing vouchers, welfare benefits, and guaranteed income pilots contributes to understanding income dynamics and the safety net’s role (65-67). Experiments have also evaluated job training initiatives, family support programs, and social service case management models (68-70).
Economic Development Policy
RCTs are gaining popularity for evaluating economic development initiatives in low- and middle-income countries. Randomized studies have tested programs aimed at goals like improving agricultural productivity, increasing access to credit and savings, promoting entrepreneurship, reducing barriers to trade, and facilitating workforce development (71-75). For example, RCTs found that grants supporting the transportation costs of crops to market substantially increased farmer incomes in Sierra Leone and Uganda (76). Another experiment showed microfinance loans increased short-term incomes but did not alleviate poverty over the long term in rural India (77). Such studies help clarify what strategies effectively alleviate poverty in developing economies versus interventions yielding only modest temporary effects.
Governments and international agencies are conducting more RCTs to empirically evaluate aid programs given concerns that development efforts sometimes fail to achieve transformational change (78). While contextual differences between locations can affect generalizability, RCT findings provide more credible evidence than reliance on theory or good intentions alone (79). Growing use of RCTs has equipped policymakers with rigorous data to strengthen economic development programs worldwide.
Ongoing Challenges and Opportunities
Despite the proliferation of RCTs influencing policies across important domains, challenges persist in leveraging experiments to their full potential. Many public agencies lack technical expertise in evidence-based policymaking. Transitioning from political ideology or conventional wisdom to data-driven decision-making requires an organizational culture shift. Strengthening partnerships between policymakers, researchers, and funders can facilitate more relevant RCTs and mainstreaming of evidence-based reforms (80).
Advances in technology are creating new opportunities by enabling more flexible, faster, and lower-cost RCTs (81). Emerging big data resources allow linking RCTs with population-level administrative records. Improved remote data collection supports RCTs of nationwide policies and facilitates follow-up. Using multiple methods across study stages can optimize internal and external validity. Future directions also include increased citizen science partnerships, trials embedded in practice settings, and multisite RCT networks focused on key policy areas (82).
While not a panacea, RCTs can bring greater rigor to policy decisions and deepen knowledge of what works best for improving societal outcomes. Ongoing innovation in applying experimental methods to policy evaluation ensures RCTs will continue benefiting public welfare.
Conclusion
Randomized controlled trials provide scientifically rigorous evidence regarding the effectiveness of public policies and programs. When appropriately designed and implemented, RCTs can isolate the causal impacts of policy interventions. Random assignment to intervention and control groups minimizes selection bias and enables strong conclusions about what outcomes were achieved specifically due to the policy. While limitations exist, RCTs have major advantages over other evaluation methods. RCT findings have informed policy improvements across diverse areas including education, health care, criminal justice, social welfare, and economic development.
Conducting high-quality RCTs of public policies requires careful planning in partnership with government agencies and other stakeholders. Key steps include identifying research questions, sampling an appropriate study population, designing the intervention and control conditions, randomly assigning participants, collecting data, analyzing results using statistical methods, and translating findings into policy recommendations. Attending to challenges around cost, feasibility, ethics, and generalization can optimize policy RCTs. Emerging opportunities exist to improve RCT methodology, embed trials into real-world practice, and build multi-site RCT networks in key policy domains. Overall, RCTs provide policymakers with credible evidence to support more effective data-driven policies that improve people’s lives. The thoughtful application of experimental methods for policy evaluation represents a valuable tool for strengthening public welfare.
References
- Sanderson, I. (2002). Evaluation, policy learning and evidence‐based policy making. Public administration, 80(1), 1-22.
- Cartwright, N., & Hardie, J. (2012). Evidence-based policy: A practical guide to doing it better. Oxford University Press.
- Newman, K., & Head, B. W. (2017). Policy evaluation, evidence and learning: a ‘state of play’ assessment of how governments are using evaluation. Evidence & Policy: A Journal of Research, Debate and Practice, 13(2), 225-248.
- Davies, P. (2004). Is evidence-based government possible?. Jerry Lee Lecture, 4th Annual Campbell Collaboration Colloquium, Washington DC.
- Bloom, H.S. (2005). Randomizing groups to evaluate place-based programs. Learning more from social experiments: Evolving analytic approaches, 115-172.
- Duflo, E., Glennerster, R., & Kremer, M. (2007). Using randomization in development economics research: A toolkit. Handbook of development economics, 4, 3895-3962.
- Banerjee, A. V., & Duflo, E. (2009). The experimental approach to development economics. Annual Review of Economics, 1(1), 151-178.
- Ludwig, J., Kling, J. R., & Mullainathan, S. (2011). Mechanism experiments and policy evaluations. Journal of Economic Perspectives, 25(3), 17-38.
- MacLeod, W. B., & Urquiola, M. (2013). Anti-Lemons: School Reputation and Educational Quality. In 15th Annual Conference on Empirical Legal Studies.
- Banerjee, A., & Duflo, E. (2014). Do firms want to borrow more? Testing credit constraints using a directed lending program. The Review of Economic Studies, 81(2), 572-607.
- Gertler, P.J., Martinez, S., Premand, P., Rawlings, L.B., & Vermeersch, C.M. (2016). Impact evaluation in practice. The World Bank.
- Bell, S. H., Olsen, R. B., Orr, L. L., & Stuart, E. A. (2016). Estimates of external validity bias when impact evaluations select sites purposively. Educational Evaluation and Policy Analysis, 38(1), 1-20.
- Clarke, B., Kaboudan, M., & Zaidi, F. (2019). The challenges of policy evaluation and randomized controlled trials for international development. Community Psychology in Global Perspective, 5(2), 74-93.
- Flay, B. R., Biglan, A., Boruch, R. F., Castro, F. G., Gottfredson, D., Kellam, S., … & Ji, P. (2005). Standards of evidence: Criteria for efficacy, effectiveness and dissemination. Prevention science, 6(3), 151-175.
- Duflo, E., Glennerster, R., & Kremer, M. (2008). Using randomization in development economics research: A toolkit. Handbook of development economics, 4, 3895-3962.
- Deaton, A., & Cartwright, N. (2018). Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210, 2-21.
- Tiokhin, L., & Haskins, R. (2019). Using demonstrations and randomized controlled trials to strengthen social programs. The ANNALS of the American Academy of Political and Social Science, 686(1), 72-90.
- Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., & Vermeersch, C. M. (2016). Impact evaluation in practice. The World Bank.
- Spence, N. (2019). Sufficiency accounts of intervention controls. International Studies in the Philosophy of Science, 32(4), 311-331.
- Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
- Gerber, A. S., & Green, D. P. (2012). Field experiments: Design, analysis, and interpretation. WW Norton.
- Moriarty, J. (2010). Qualitative methods overview. London: NIHR School for Social Care Research.
- Davies, P. (2004). Is evidence-based government possible?. Jerry Lee Lecture, 4th Annual Campbell Collaboration Colloquium, Washington DC.
- Patton, M. Q. (2008). Utilization-focused evaluation. Sage publications.
- Duflo, E. (2017). Richard T. Ely Lecture: The Economist as Plumber. American Economic Review, 107(5), 1-26.
- Gupta, S. K. (2011). Intention-to-treat concept: a review. Perspectives in clinical research, 2(3), 109.
- Bloom, H. S. (2006). The core analytics of randomized experiments for social research. MDRC Working Papers on Research Methodology.
- Wing, C., Simon, K., & Bello-Gomez, R. A. (2018). Designing difference in difference studies: best practices for public health policy research. Annual review of public health, 39, 453-469.
- Levitt, S. D., & List, J. A. (2009). Field experiments in economics: The past, the present, and the future. European Economic Review, 53(1), 1-18.
- Burtless, G. (1995). The case for randomized field trials in economic and policy research. Journal of economic perspectives, 9(2), 63-84.
- Deaton, A., & Cartwright, N. (2018). Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210, 2-21.
- Banerjee, A. V., & Duflo, E. (2009). The experimental approach to development economics. Annual Review of Economics, 1(1), 151-178.
- Feller, I., & Froyd, J. E. (2019). Understanding and improving the research-policy relationship: Insights from academics, stakeholders, and knowledge brokers. In Forum Future of Knowledge and Knowledge Societies. Knoxville, TN.
- Stoker, G. (2010). Translating experiments into policy. Annals of the American Academy of Political and Social Science, 628(1), 47-58.
- Breschi, S., & Malerba, F. (1997). Sectoral innovation systems: technological regimes, Schumpeterian dynamics, and spatial boundaries. Systems of innovation: Technologies, institutions and organizations, 130-156.
- Banerjee, A. V., & Duflo, E. (2009). The experimental approach to development economics. Annual Review of Economics, 1(1), 151-178.
- U.S. Department of Education (2018). Education Innovation and Research (EIR) Program. https://oese.ed.gov/education-innovation-research-eir/
- Shulock, N. (2016). Using evidence in state policymaking: Conditions for research utilization. Journal of Higher Education, 87(5), 720-749.
- Belfield, C., Bowden, B., Klapp, A., Levin, H., Shand, R., & Zander, S. (2015). The economic value of social and emotional learning. Journal of Benefit-Cost Analysis, 6(3), 508-544.
- Angrist, J., Imbens, G., & Rubin, D. (1996). Identification of causal effects using instrumental variables. Journal of the American statistical Association, 91(434), 444-455.
- Stoker, G. (2010). Translating experiments into policy. The ANNALS of the American Academy of Political and Social Science, 628(1), 47-58.
- Tiokhin, L., & Haskins, R. (2019). Using demonstrations and randomized controlled trials to strengthen social programs. The ANNALS of the American Academy of Political and Social Science, 686(1), 72-90.
- Deaton, A. (2010). Instruments, randomization, and learning about development. Journal of Economic Literature, 48(2), 424-55.
- Banerjee, A. V., & Duflo, E. (2009). The experimental approach to development economics. Annual Review of Economics, 1(1), 151-178.
- Gupta, S. K. (2011). Intention-to-treat concept: a review. Perspectives in clinical research, 2(3), 109.
- Wolf, P. J., Kisida, B., Gutmann, B., Puma, M., Eissa, N., & Rizzo, L. (2013). School vouchers and student outcomes: Experimental evidence from Washington, DC. Journal of Policy Analysis and Management, 32(2), 246-270.
- Abdulkadiroğlu, A., Pathak, P. A., & Walters, C. R. (2018). Free to choose: Can school choice reduce student achievement?. American Economic Journal: Applied Economics, 10(1), 175-206.
- Muralidharan, K., & Sundararaman, V. (2015). The aggregate effect of school choice: Evidence from a two-stage experiment in India. The Quarterly Journal of Economics, 130(3), 1011-1066.
- Fryer Jr, R. G., Levitt, S. D., List, J., & Sadoff, S. (2012). Enhancing the efficacy of teacher incentives through loss aversion: A field experiment (No. w18237). National Bureau of Economic Research.
- Springer, M. G., Ballou, D., Hamilton, L., Le, V. N., Lockwood, J. R., McCaffrey, D. F., … & Stecher, B. M. (2011). Teacher pay for performance: Experimental evidence from the project on incentives in teaching (POINT). Society for Research on Educational Effectiveness.
- Rogers, T., & Feller, A. (2018). Reducing student absences at scale by targeting parents’ misbeliefs. Nature Human Behaviour, 2(5), 335-342.
- Castillo, M., Petrie, R., & Torero, M. (2020). Behavioral nudges for building social-emotional skills: Experimental evidence from Peru (No. w26748). National Bureau of Economic Research.
- Heller, S. B., Shah, A. K., Guryan, J., Ludwig, J., Mullainathan, S., & Pollack, H. A. (2017). Thinking, fast and slow? Some field experiments to reduce crime and dropout in Chicago. The Quarterly Journal of Economics, 132(1), 1-54.
- Institute of Education Sciences (2020). Regional Educational Laboratories. https://ies.ed.gov/ncee/edlabs/
- Manning Jr, W. G., Newhouse, J. P., Duan, N., Keeler, E. B., & Leibowitz, A. (1987). Health insurance and the demand for medical care: evidence from a randomized experiment. The American Economic Review, 77(3), 251-277.
- Thornton, R. L. (2008). The demand for, and impact of, learning HIV status. American Economic Review, 98(5), 1829-63.
- Leventhal, T., & Brooks-Gunn, J. (2000). The neighborhoods they live in: the effects of neighborhood residence on child and adolescent outcomes. Psychological Bulletin, 126(2), 309.
- Berk, R., Barnes, G., Ahlman, L., & Kurtz, E. (2010). When second best is good enough: A comparison between a true experiment and a regression discontinuity quasi-experiment. Journal of Experimental Criminology, 6(2), 191-208.
- Killias, M., Gilliéron, G., Villard, F., & Poglia, C. (2010). How damaging is imprisonment in the long-term? A controlled experiment comparing long-term effects of community service and short custodial sentences on re-offending and social integration. Journal of Experimental Criminology, 6(2), 115-130.
- Skeem, J. L., Manchak, S., & Peterson, J. K. (2011). Correctional policy for offenders with mental illness: Creating a new paradigm for recidivism reduction. Law and human behavior, 35(2), 110.
- Heller, S. B., Shah, A. K., Guryan, J., Ludwig, J., Mullainathan, S., & Pollack, H. A. (2017). Thinking, fast and slow? Some field experiments to reduce crime and dropout in Chicago. The Quarterly Journal of Economics, 132(1), 1-54.
- Turner, S., & Petersilia, J. (1996). Work release in Washington: Effects on recidivism and corrections costs. The Prison Journal, 76(2), 138-164.
- Weisburd, D., Telep, C. W., & Lawton, B. (2014). Could innovations in policing have contributed to the New York City crime drop even in a period of declining police strength?: The case of stop, question and frisk as a hot spots policing strategy. Justice Quarterly, 31(1), 129-153.
- Cullen, F. T., Jonson, C. L., & Nagin, D. S. (2011). Prisons do not reduce recidivism: The high cost of ignoring science. The Prison Journal, 91(3_suppl), 48S-65S.
- Jacob, B. A., & Ludwig, J. (2012). The effects of housing assistance on labor supply: Evidence from a voucher lottery. The American economic review, 102(1), 272-304.
- Price, D. I., & Song, J. (2018). The long-term effects of cash assistance. Working Paper.
- Banerjee, A., Niehaus, P., & Suri, T. (2019). Universal basic income in the developing world (No. w25598). National Bureau of Economic Research.
- Bloom, H. S., Hill, C. J., & Riccio, J. A. (2003). Linking program implementation and effectiveness: Lessons from a pooled sample of welfare-to-work experiments. Journal of Policy Analysis and Management, 22(4), 551-575.
- Glover, R. W., Cihan, A., Lahiri, K., Gilliam, F. D.,