Synthetic Control Method: A-Z

Henam Singla
10 min read · Sep 3, 2024

Understanding all about the SCM in Econometrics

Athey and Imbens (2017) describe synthetic controls as “arguably the most important innovation in the policy evaluation literature in the last 15 years”. The synthetic control method (SCM) offers substantial advantages as a research design in the social sciences.

Image generated by the author using Dall.E depicting the increase in immigration leading to higher unemployment

In this article, we are going to discuss the SCM in detail, how it works, pros, cons, validity, and much more. Stay tuned!

What is the Synthetic Control Method?

SCM aims to estimate the effects of aggregate interventions, that is, interventions that are implemented at an aggregate level and affect a small number of large units (such as cities, regions, or countries), on some aggregate outcome of interest. The method was first used by Abadie and Gardeazabal (2003) to study the economic effect of terrorist conflict in the Basque Country.

Dall-E 2 generated by the author: Fictional visual ocean with immigrants in the United States

To make this concrete, I will use one example throughout the article. In 1980 there was a large inflow of Cubans into the United States, particularly into Miami. This was known as the Mariel boatlift, and its labor market effects were first studied by David Card (1990). Miami is clearly the treated unit, but it is not obvious which untreated cities make a good comparison group. Since we don't know which cities to compare it with, we resort to a synthetic control for Miami, which I will discuss later.

The SCM was useful for studying the Mariel Boatlift because it allowed for the creation of a synthetic control group composed of several other cities. This synthetic control group served as a comparison to isolate the effect of the Boatlift on Miami’s employment, providing a clearer picture of its impact.

The synthetic control method is a comparative case study approach designed to evaluate the effects of an intervention or treatment. It constructs a synthetic version of the treatment group (often referred to as the “treated unit”) by combining a weighted average of control units (untreated groups) from the donor pool. This synthetic version serves as a counterfactual to estimate what would have happened to the treated unit in the absence of the intervention.

How Does the SCM Work?

SCM uses a data-driven approach to select the comparison group, rather than leaving the choice of comparison units to the researcher's judgment.

  1. Selection of Units and Variables:

  • Treated Unit: The entity that receives the intervention.
  • Control Units: Entities that do not receive the intervention and are used to construct the synthetic control.
  • Predictor Variables: Pre-intervention characteristics and outcomes used to construct the synthetic control.

  2. Construction of the Synthetic Control:

  • The synthetic control is a weighted average of the control units.
  • The weights are chosen so that the treated unit and its synthetic control are as similar as possible in pre-intervention characteristics and trends.
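To see what "as similar as possible" means in practice, here is a minimal sketch that compares two candidate sets of weights by how closely the resulting synthetic control tracks the treated unit before the intervention. All numbers, the donor labels, and the helper names `synthetic_series` and `rmspe` are made up for illustration; this is not taken from any SCM library.

```python
from math import sqrt


def synthetic_series(weights, donor_series):
    """Weighted average of the donors' outcomes, period by period."""
    n_periods = len(donor_series[0])
    return [
        sum(w * series[t] for w, series in zip(weights, donor_series))
        for t in range(n_periods)
    ]


def rmspe(treated, synthetic):
    """Root mean squared gap between the treated unit and its synthetic control."""
    squared_gaps = [(y - s) ** 2 for y, s in zip(treated, synthetic)]
    return sqrt(sum(squared_gaps) / len(squared_gaps))


# Hypothetical pre-intervention outcomes: one list per unit, one entry per period.
treated_pre = [6.30, 6.37, 6.54]
donors_pre = [
    [5.0, 5.2, 5.4],  # donor A
    [6.0, 6.1, 6.2],  # donor B
    [7.0, 7.0, 7.2],  # donor C
]

uniform = [1 / 3, 1 / 3, 1 / 3]  # every donor counts equally
tuned = [0.2, 0.3, 0.5]          # hand-picked here; SCM would estimate these

fit_uniform = rmspe(treated_pre, synthetic_series(uniform, donors_pre))
fit_tuned = rmspe(treated_pre, synthetic_series(tuned, donors_pre))
print(f"uniform weights, pre-intervention RMSPE: {fit_uniform:.3f}")
print(f"tuned weights, pre-intervention RMSPE:   {fit_tuned:.3f}")
```

The tuned weights reproduce the treated unit's pre-intervention path almost exactly, while the uniform weights leave a visible gap. SCM searches for weights of the first kind.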

A few use cases of SCM

Here are a few examples where SCM was used to study the effect of intervention.

Photo by Val Tievsky on Unsplash
  1. California’s Tobacco Control Program (Abadie, Diamond, and Hainmueller, 2010):

This paper studies the effects of Proposition 99, a large-scale tobacco control program that California implemented in 1988. The authors showed that following Proposition 99, tobacco consumption fell markedly in California relative to a comparable synthetic control region.

2. Economic Liberalization Episodes (Billmeier and Nannicini, 2013):

This paper uses the SCM to assess the impact of economic liberalization on real GDP per capita across a global sample of countries. The study finds that economic liberalization generally had a positive effect on GDP per capita by comparing the post-liberalization GDP trajectories of liberalizing countries with those of a synthetic control group of similar but non-liberalizing countries.

3. Economic Impact of Terrorism in the Basque Country (Abadie and Gardeazabal, 2003):

The paper used SCM to estimate the economic costs of terrorism in the Basque Country by creating a synthetic control from other Spanish regions. The authors find that after the outbreak of terrorism in the late 1960s, per capita GDP in the Basque Country declined about 10 percentage points relative to a synthetic control region without terrorism.

4. Comparative Politics and the Synthetic Control Method (Abadie, Diamond, and Hainmueller, 2014):

The authors apply the SCM to study the economic effects of the 1990 German reunification on West Germany. They construct a synthetic West Germany as a weighted average of other advanced industrialized countries, chosen to resemble the values of economic growth predictors for West Germany prior to the reunification.

Formal Definitions

If you are just here to understand the concept, you may skip this section. This is going to be all math to try to make the concept clearer. For each formal definition, I will tie in the analogous concept from the Mariel Boatlift study.

Created by the author using DALL.E
  1. Setup

Suppose there are J+1 units, and unit 1 gets the intervention after time T_0 and the other J units never get the intervention. Y_jt is the outcome for unit j at time t.

In the Mariel boatlift example:

  • Units: These could be cities, with Miami being unit 1 (the treated unit) and the other cities (control units) not affected by the boatlift.
  • Outcome Y_jt: This could be the unemployment rate in city j at time t.

2. Effect of the Intervention

Then, the effect of the intervention of interest for the affected unit in period t (with t > T_0) is:

τ_1t = Y_1t^I − Y_1t^N

where:

1. Y_1t^I is the outcome for unit 1 at time t with the intervention (we observe this)

2. Y_1t^N is the outcome for unit 1 at time t in the absence of intervention (we don’t observe this)

For Miami:

  • Y_1t^I is the unemployment rate in Miami after the Mariel boatlift (observed).
  • Y_1t^N is the hypothetical unemployment rate in Miami had the Mariel boatlift not occurred (unobserved).

3. How Synthetic control helps

In the synthetic control method, we form an estimate of Y_1t^N by using a weighted average of Y_jt of the non-treated units. Using this estimate of Y_1t^N we get an estimate of the treatment effect:

τ̂_1t = Y_1t − Σ_{j=2}^{J+1} w_j* Y_jt

where w_j* are the weights assigned to the control units to form the synthetic control.

In our example:

Image generated using Dall.E showing the donor pool (synthetic city)
  • We use a weighted average of the unemployment rates of other cities (control units) to estimate what Miami’s unemployment rate would have been without the Mariel boatlift.
  • For instance, if the synthetic control is composed of 20% Atlanta, 30% Houston, and 50% Tampa, we calculate the weighted average unemployment rate of these cities to estimate Y_1t^N. The set of candidate cities from which these weights are drawn is known as the “donor pool”.

4. Estimating Treatment Effect

Using this estimate of Y_1t^N, we get an estimate of the treatment effect.

In our example:

  • If the observed unemployment rate in Miami is 8% and the estimated rate without the boatlift is 6%, the treatment effect is 8% − 6% = 2 percentage points.

This method allows us to estimate the causal impact of the Mariel boatlift on Miami’s unemployment rate by comparing it with a synthetic control group of similar cities that did not experience the boatlift.
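The calculation above can be repeated for every post-intervention period. The sketch below uses the 20% / 30% / 50% weights from the example; the unemployment figures are hypothetical numbers chosen so that the first period reproduces the 8% versus 6% comparison, not real data from the Mariel boatlift studies.

```python
# Hypothetical post-boatlift unemployment rates (percent), one entry per period.
miami = [8.0, 7.5, 6.5]
donors = {
    "Atlanta": [5.0, 5.5, 5.0],
    "Houston": [6.0, 6.0, 5.0],
    "Tampa": [6.4, 6.2, 6.0],
}
# Weights from the running example: non-negative and summing to 1.
weights = {"Atlanta": 0.2, "Houston": 0.3, "Tampa": 0.5}

# Synthetic Miami: the estimate of Y_1t^N in each period.
synthetic_miami = [
    sum(weights[city] * donors[city][t] for city in donors)
    for t in range(len(miami))
]

# Estimated effect in each period: observed Miami minus synthetic Miami.
gaps = [observed - counterfactual
        for observed, counterfactual in zip(miami, synthetic_miami)]
average_gap = sum(gaps) / len(gaps)

for t, (s, g) in enumerate(zip(synthetic_miami, gaps), start=1):
    print(f"period {t}: synthetic = {s:.2f}, estimated effect = {g:.2f}")
print(f"average estimated effect: {average_gap:.2f}")
```

Reporting the gap period by period, rather than as a single number, shows whether the estimated effect grows, fades, or stays flat after the intervention.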

The next natural question is where the weights come from. Don’t worry, that's what I am going to talk about next!

How do we get weights?

Photo by Piret Ilver on Unsplash

As an example, a synthetic control that assigns equal weights, w_j* = 1 / J, to each of the units in the control group results in the following estimator for τ_1t:

τ̂_1t = Y_1t − (1 / J) Σ_{j=2}^{J+1} Y_jt

Need Rich Pre-Intervention Data

  • Predictor Variables: Denoted X (made up of covariates Z and lagged values of the outcome Y).
  • Many Time Periods: Collect data over numerous time periods to ensure robustness.

Method-1: Synthetic Control Method

The goal is to create a synthetic control group that closely resembles the treated unit (unit 1) in terms of the predictor variables (X’s).

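Using these data, minimize the difference between unit 1’s X’s and the synthetic control’s X’s. Let:

  • X_1 be the k×1 vector of predictor values for the treated unit (unit 1).
  • X_0 be the k×J matrix of predictor values for the J donor units.

w* is chosen to minimize the distance between the (eventual) treated unit and the non-treated units. Specifically, minimize:

∥X_1 − X_0 W∥ = ( Σ_{h=1}^{k} v_h (X_h1 − w_2 X_h2 − … − w_{J+1} X_{h,J+1})^2 )^{1/2}

over the donor weights W = (w_2, …, w_{J+1}), for given predictor weights v_1, …, v_k.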

Photo by Isaac Smith on Unsplash

Choosing Weights (w*):

  • There are J units in the donor pool (units that are not treated).
  • Assign weights (w_2, w_3, …, w_J+1) to each of these J donor units.
  • The weights should be non-negative and add up to 1.

Predictor Variables Weights (v’s):

  • Assign weights (v_1, v_2, …, v_k) to each predictor variable.
  • These weights reflect how important each predictor variable is.
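Below is a minimal sketch of how the donor weights could be computed for a fixed set of predictor weights. It uses plain projected gradient descent on the weighted squared distance, projecting onto the set of non-negative weights that sum to 1 after every step. This is one simple way to solve the problem, not the routine that any particular SCM package uses. The function names, the predictor values, and the iteration count are all assumptions made for illustration.

```python
def project_to_simplex(values):
    """Euclidean projection onto {w : w_j >= 0 and sum_j w_j = 1}."""
    ordered = sorted(values, reverse=True)
    running_sum, shift = 0.0, 0.0
    for count, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / count
        if value - candidate > 0:
            shift = candidate
    return [max(value - shift, 0.0) for value in values]


def fit_donor_weights(x_treated, x_donors, v, iterations=2000):
    """Minimize sum_h v_h * (X_h1 - sum_j w_j * X_hj)^2 over the simplex.

    x_treated: k predictor values for the treated unit.
    x_donors:  k rows, each holding that predictor's value for the J donors.
    v:         k non-negative predictor weights.
    """
    n_donors = len(x_donors[0])
    # Conservative step size from an upper bound on the curvature.
    bound = 2.0 * sum(v_h * sum(x * x for x in row) for v_h, row in zip(v, x_donors))
    step = 1.0 / bound
    w = [1.0 / n_donors] * n_donors  # start from equal weights
    for _ in range(iterations):
        residuals = [
            x1 - sum(w_j * x for w_j, x in zip(w, row))
            for x1, row in zip(x_treated, x_donors)
        ]
        gradient = [
            -2.0 * sum(v_h * r * row[j] for v_h, r, row in zip(v, residuals, x_donors))
            for j in range(n_donors)
        ]
        w = project_to_simplex([w_j - step * g for w_j, g in zip(w, gradient)])
    return w


# Hypothetical standardized predictors; columns are Atlanta, Houston, Tampa.
x_donors = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.5],
]
x_miami = [0.45, 0.55, 1.0]  # built as 20% Atlanta + 30% Houston + 50% Tampa
v = [1.0, 1.0, 1.0]          # all predictors treated as equally important

w_star = fit_donor_weights(x_miami, x_donors, v)
print([round(w_j, 3) for w_j in w_star])
```

Because the toy treated unit was built as an exact mix of the three donors, the routine recovers that mix. With real data the fit is usually imperfect, and several donors often end up with a weight of exactly zero.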

Method-2: Choosing the Optimal Predictor Weights (V)

The goal is to find the weights (V) for the predictor variables that minimize the difference between the observed outcome (Y_1t) and the synthetic control’s estimate of the outcome (Y_1t^N).

Choose the V that minimizes the mean squared prediction error (MSPE).

Steps:

Step 1: Minimize Differences in Predictor Variables (X’s):

  • Given a V, choose W(V) to minimize ∥X_1 − X_0 W(V)∥.

Here, X_1 represents the predictor variables for the treated unit, and X_0 W(V) represents the weighted average of the predictor variables for the donor units.

Step 2: Compute Weights for Each V:

  • There are many possible choices of V. Which one to use? For each possible choice of V, compute W(V) via the above minimization process.

Step 3: Select the Optimal Bundle:

  • Among these possible V and W(V) bundles, select the one with the lowest MSPE. The goal is to minimize the difference between the observed Y_1t and the particular synthetic control method’s estimate of Y_1t^N.

Mathematically, choose V so that W(V) minimizes:

Σ_{t ∈ 𝒯_0} (Y_1t − w_2(V) Y_2t − … − w_{J+1}(V) Y_{J+1,t})^2

for some set 𝒯_0 ⊆ {1, 2, …, T_0} of pre-intervention periods.
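The three steps can be sketched with a deliberately tiny setup: two donors, two predictors, and a short list of candidate V's. With only two donors, W(V) has a closed form (a single weight clipped to the interval from 0 to 1), which keeps the example short. Real applications need a numerical optimizer and a search over V instead of a hand-written list. All numbers and function names below are hypothetical.

```python
def two_donor_weight(x_treated, x_a, x_b, v):
    """Closed-form W(V) with two donors: weight a on donor A and 1 - a on donor B."""
    numerator = sum(
        v_h * (x1 - xb) * (xa - xb)
        for v_h, x1, xa, xb in zip(v, x_treated, x_a, x_b)
    )
    denominator = sum(v_h * (xa - xb) ** 2 for v_h, xa, xb in zip(v, x_a, x_b))
    return min(max(numerator / denominator, 0.0), 1.0)


def mspe(a, y_treated, y_a, y_b):
    """Mean of squared pre-period gaps between the treated unit and its synthetic control."""
    gaps = [y1 - (a * ya + (1.0 - a) * yb) for y1, ya, yb in zip(y_treated, y_a, y_b)]
    return sum(g * g for g in gaps) / len(gaps)


# Hypothetical predictors: the first points to a = 0.8, the second to a = 0.2.
x_treated, x_a, x_b = [0.8, 0.8], [1.0, 0.0], [0.0, 1.0]
# Hypothetical pre-intervention outcomes.
y_treated, y_a, y_b = [5.6, 6.4, 7.4], [6.0, 7.0, 8.0], [4.0, 4.0, 5.0]

candidate_vs = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

# Steps 1 and 2: compute W(V) for every candidate V.
results = [(v, two_donor_weight(x_treated, x_a, x_b, v)) for v in candidate_vs]
# Step 3: keep the bundle with the lowest pre-intervention MSPE.
best_v, best_a = min(results, key=lambda pair: mspe(pair[1], y_treated, y_a, y_b))
best_mspe = mspe(best_a, y_treated, y_a, y_b)

for v, a in results:
    print(f"V = {v}: weight on donor A = {a:.2f}, MSPE = {mspe(a, y_treated, y_a, y_b):.3f}")
print(f"selected V = {best_v}")
```

In this toy data the first predictor is the one that actually tracks the outcome, so putting all predictor weight on it gives the best pre-intervention fit.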

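I wrote an article earlier on difference-in-differences (DiD), so I would like to mention some key differences in how SCM and DiD build the comparison group:

  1. No intercept (DiD: unconstrained intercept, so a constant level gap between the treated and control units is allowed)
  2. Non-negative weights chosen from the data (DiD: uniform weights, 1/J on every control unit)
  3. Weights sum to 1 (DiD: the uniform weights are also non-negative and sum to 1, but they are fixed in advance rather than estimated)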
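A small numerical sketch can make the contrast concrete. The data are hypothetical, and the synthetic control weights are hand-picked rather than optimized. The treated unit sits above every donor before the intervention, so no non-negative weights that sum to 1 can match its level, while the DiD intercept absorbs the gap.

```python
from statistics import mean


def did_effect(treated_pre, treated_post, donors_pre, donors_post):
    """DiD-style comparison: uniform donor weights plus an intercept (level shift)."""
    avg_pre = [mean(period) for period in zip(*donors_pre)]
    avg_post = [mean(period) for period in zip(*donors_post)]
    intercept = mean(y - c for y, c in zip(treated_pre, avg_pre))
    return mean(y - (intercept + c) for y, c in zip(treated_post, avg_post))


def sc_effect(weights, treated_post, donors_post):
    """Synthetic-control-style comparison: weighted donors and no intercept."""
    synthetic = [
        sum(w * y for w, y in zip(weights, period)) for period in zip(*donors_post)
    ]
    return mean(y - s for y, s in zip(treated_post, synthetic))


# Hypothetical outcomes: one list per donor, one entry per period.
treated_pre, treated_post = [7.0, 7.2, 7.4], [9.5, 9.7]
donors_pre = [[5.0, 5.2, 5.4], [6.0, 6.2, 6.4]]
donors_post = [[5.6, 5.8], [6.6, 6.8]]

did_estimate = did_effect(treated_pre, treated_post, donors_pre, donors_post)
# Best the simplex allows here: all weight on the donor closest to the treated unit.
sc_estimate = sc_effect([0.0, 1.0], treated_post, donors_post)

print(f"DiD-style estimate: {did_estimate:.2f}")
print(f"SC-style estimate:  {sc_estimate:.2f}")
```

The SC-style number is larger because it still contains the pre-existing level gap of 1.0 between the treated unit and its closest donor. This is why a poor pre-intervention fit is a warning sign for SCM.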

Limitations and Considerations

Photo by Sam Xu on Unsplash
  • Data Requirements: Requires rich pre-intervention data for accurate estimation.
  • Selection of Donor Pool: Ensure control units are comparable to the treated unit to avoid bias.
  • Sensitivity to Specifications: Results can be sensitive to predictor variables and donor pool choices, requiring robustness checks.
  • Subjectivity in Variable Selection: The choice of predictor variables and control units can introduce subjectivity, potentially affecting the results.
  • Limited Applicability: It is best suited for cases with a single or few treated units, and its applicability diminishes with larger numbers of treated units.

Extensions

I would like to make a special mention of two amazing papers that relax some assumptions of SCM and enhance its applicability.

Photo by David Thielen on Unsplash

  • Augmented Synthetic Control Method (Ben-Michael, Feller, and Rothstein, 2021):

The augmented SCM can allow negative weights on some donor units and employs regularization to avoid overfitting. It shows significant improvements in simulations and is applied to analyze the impact of the 2012 Kansas tax cuts on economic growth.

  • Generalized Synthetic Control Method (Xu, 2017):

The GSCM extends the SCM to handle multiple treated units and variable treatment periods by combining SCM with interactive fixed effects models. An empirical example is provided, illustrating the impact of Election Day Registration on voter turnout in the United States.

Conclusion

The synthetic control method is a powerful tool for causal inference, offering a robust alternative to traditional methods in comparative case studies. By constructing a synthetic version of the treated unit from a combination of control units, SCM provides a transparent and intuitive way to estimate the causal impact of an intervention. While it has its limitations, the method’s flexibility and ability to reduce bias make it an invaluable asset for researchers and policymakers aiming to understand the true effects of their actions.

Whether you’re evaluating the impact of a new policy, assessing the effectiveness of a public health intervention, or analyzing economic shocks, the synthetic control method offers a rigorous and insightful approach to uncovering causal relationships in observational data.

References

[1] Article on Synthetic Control by Scott Cunningham

[2] Best paper in my opinion, covering feasibility and data requirements in depth: Abadie, Alberto. 2021. “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature, 59 (2): 391–425.

[3] Athey, Susan, and Guido W. Imbens. 2017. “The State of Applied Econometrics: Causality and Policy Evaluation.” Journal of Economic Perspectives, 31 (2): 3–32.

[4] Use of SCM by Microsoft

[5] Dig deeper into my example of the Mariel boatlift: Peri, Giovanni, and Vasil Yasenov. 2019. “The Labor Market Effects of a Refugee Wave: Synthetic Control Method Meets the Mariel Boatlift.” Journal of Human Resources, 54 (2): 267–309.

Thank you for reading! 🤗

If you enjoyed this post and want to see more, consider following me. You can also follow me on LinkedIn. I plan to write blogs about causal inference and data analysis, always aiming to keep things simple.

A small disclaimer: I write to learn, so mistakes might happen despite my best efforts. If you spot any errors, please let me know. I also welcome suggestions for new topics!
