SMM and Indirect Inference

A Practical Introduction

Jay Kahn

Better data and computing facilities have made sensible things simple.
- Pakes (2003)

Many interesting models do not have convenient closed forms

For instance, any interesting dynamic model in corporate finance. Also many models in macrofinance and asset pricing.

But the moment and likelihood techniques you know from econometrics rely on closed forms.

  • Example: \(\mathbb{E}\left[ \beta \left(\dfrac{c_{t+1}}{c_t}\right)^{-\alpha} x_t-p_t \right]= 0\).

The techniques I'm going to show you today are designed to get around this.

  1. SMM

  2. Indirect inference

Key concept: the right parameters get the model to look like the data along a few dimensions.

What if you wanted to estimate the mean of \(x_i\sim \mathcal{N}(\mu,1)\)?

You could use closed-form moments estimation:

\[ \hat\mu=\dfrac{1}{n}\sum x_i \]

What if you didn't know the closed form, but knew how to calculate a sample average and how to sample from a normal?

Consider a candidate mean, \(\tilde \mu\). Using a random number generator, draw a sample \(\tilde x_i\) of size \(m \gg n\) from \(\mathcal{N}(\tilde\mu, 1)\). Take:

\[ d(\tilde x_{i}, x_{i}|\tilde\mu) = \dfrac{1}{m}\sum \tilde x_{i} - \dfrac{1}{n} \sum x_{i} \]

If you pick the right mean, \(d(\tilde x_{i}, x_{i}|\mu) \approx 0\).

So you can get to the right \(\mu\) by making \(\frac{1}{m}\sum \tilde x_i\) "look like" \(\frac{1}{n}\sum x_i\).

  • This is the essence of moment estimation, except that we simulate \(\frac{1}{m}\sum \tilde x_i\) instead of relying on a formula for the model-implied mean. That's all SMM is!
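A minimal sketch of this toy example, using only the Python standard library. The function names, the grid of candidate means, the sample sizes, and the seeds are all illustrative assumptions, and a grid search stands in for a real minimizer.

```python
import random
import statistics

def simulated_mean(mu_tilde, shocks):
    # Reuse the same N(0,1) shocks for every candidate mean (common random numbers).
    return statistics.fmean(mu_tilde + e for e in shocks)

def estimate_mean_by_simulation(data, m=5000, seed=0):
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(m)]   # m >> n simulated draws
    data_mean = statistics.fmean(data)
    # Hypothetical grid of candidate means on [0, 4] with step 0.01.
    grid = [i / 100 for i in range(0, 401)]
    # Pick the candidate whose simulated average "looks like" the data average.
    return min(grid, key=lambda mu: abs(simulated_mean(mu, shocks) - data_mean))

rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]       # "observed" sample, true mu = 2
mu_hat = estimate_mean_by_simulation(data)
print(mu_hat, statistics.fmean(data))
```

The estimate should land close to the ordinary sample average, which is the point: the simulation replaces the formula, not the idea.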

SMM is just a more general version of the same concept.

Take any moment, \(h(x_i)\).

Given \(\theta\), suppose you can simulate \(\tilde x_i (\theta)\) from the model.

Simulate \(S\) sets of size \(n\), \(\tilde x_{is}\) and form the vector:

\[ g_{n}\left( \theta\right) =n^{-1}\sum_{i=1}^{n}\left[ h\left( x_{i}\right) -S^{-1}\sum_{s=1}^{S}h\left( \tilde x_{is}\left( \theta\right) \right) \right] . \]

We can estimate \(\theta\):

\[ \hat{\theta}=\arg\min_{\theta} Q(\theta,n) \equiv g_{n}\left( \theta\right)^{\prime}\hat{W}_{n}g_{n}\left(\theta\right) \]

for a positive semi-definite \(\hat W_n\).

This is the simulated method of moments estimator.

In general, the mapping from \(\theta\) to \(\tilde x_i\) will come from solving a dynamic model.
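As a rough illustration of \(g_n(\theta)\) and \(Q(\theta,n)\), the sketch below uses a model that needs no solving: \(x_i \sim \mathcal{N}(\mu,\sigma^2)\) with \(\theta=(\mu,\sigma)\), and \(h(x)=(x, x^2)\). The model, the moment choice, the identity weight matrix, and every function name are assumptions made for illustration only.

```python
import random

def h(x):
    # Moment function: first and second raw moments (an illustrative choice).
    return (x, x * x)

def mean_moments(sample):
    n = len(sample)
    sums = [0.0, 0.0]
    for x in sample:
        hx = h(x)
        sums[0] += hx[0]
        sums[1] += hx[1]
    return [s / n for s in sums]

def g_n(theta, data, shocks):
    mu, sigma = theta
    data_mom = mean_moments(data)
    # shocks holds S lists of n fixed N(0,1) draws; each list yields one simulated data set.
    sim_moms = [mean_moments([mu + sigma * e for e in draw]) for draw in shocks]
    S = len(shocks)
    sim_mom = [sum(m[j] for m in sim_moms) / S for j in range(2)]
    return [d - s for d, s in zip(data_mom, sim_mom)]

def Q(theta, data, shocks, W):
    g = g_n(theta, data, shocks)
    k = len(g)
    # Quadratic form g' W g.
    return sum(g[i] * W[i][j] * g[j] for i in range(k) for j in range(k))

rng = random.Random(0)
n, S = 400, 5
data = [rng.gauss(1.0, 2.0) for _ in range(n)]
shocks = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(S)]
W = [[1.0, 0.0], [0.0, 1.0]]
q_true = Q((1.0, 2.0), data, shocks, W)    # objective at the true parameters
q_wrong = Q((0.0, 1.0), data, shocks, W)   # objective at a wrong guess
print(q_true, q_wrong)
```

The objective is small near the true parameters and large away from them; a minimizer over \(\theta\) would then deliver \(\hat\theta\).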

Inference is very similar to what you're used to in GMM

SMM estimator is asymptotically normal:

\[ \sqrt{n}\left( \hat{\theta}-\theta\right) \overset{d}{\longrightarrow}\mathcal{N}\left( 0,\operatorname*{avar}(\hat{\theta})\right) \]


\[ \operatorname*{avar}(\hat{\theta})\equiv\left( 1+\frac{1}{S}\right) \left[ \frac{\partial g_{n}\left( \theta\right) }{\partial \theta}W\frac{\partial g_{n}\left( \theta\right) }{\partial \theta^{\prime}}\right] ^{-1}. \]

The \(\frac{1}{S}\) term accounts for simulation error in the estimates.

A test of overidentifying restrictions is still available:

\[ \frac{nS}{1+S} Q(\hat\theta,n) \overset{d}{\longrightarrow} \chi ^{2}(j-k) \]

where \(j\) is the number of moments and \(k\) is the number of parameters. As with GMM, this \(J\)-stat equivalent can be used to make nested model comparisons so long as \(W\) is kept constant.
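The arithmetic behind the simulation correction and the overidentification test can be sketched as follows. The helper names and the example numbers are hypothetical, and the closed-form \(\chi^2\) tail probability used here only covers even degrees of freedom; a statistics library would be needed for the general case.

```python
import math

def avar_scale(S):
    # Simulation inflates the asymptotic variance by (1 + 1/S).
    return 1.0 + 1.0 / S

def j_statistic(Q_hat, n, S):
    # Test statistic nS/(1+S) * Q, with Q evaluated at the estimate.
    return n * S / (1.0 + S) * Q_hat

def chi2_sf_even(x, df):
    # P(chi2(df) > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical numbers: 6 moments, 2 parameters, n = 1000, S = 10, minimized Q = 0.02.
j, k, n, S, Q_hat = 6, 2, 1000, 10, 0.02
J = j_statistic(Q_hat, n, S)
p_value = chi2_sf_even(J, j - k)
se_inflation = math.sqrt(avar_scale(S))   # factor applied to standard errors
print(J, p_value, se_inflation)
```

With \(S=10\) the standard errors grow by only about 5 percent, which is why a modest number of simulated data sets is usually enough.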

How to do SMM in practice: the objective function

For a given model, the SMM objective function has a few inputs:

  1. The data moment vector, \(h(x_i)\).
  2. The parameter vector, \(\theta\).
  3. The weight matrix, \(W\).
  4. A set of random numbers.

Given these inputs, \(\theta\) allows you to SOLVE THE MODEL.

The random numbers allow you to SIMULATE DATA SETS.

The data moments and weight matrix allow you to COMPUTE DISTANCE.
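One reason the random numbers are listed as an input is that they should be drawn once and held fixed while \(\theta\) varies. The sketch below, with a deliberately trivial "model" and hypothetical names, shows that redrawing inside the objective makes \(Q\) change even when \(\theta\) does not, which is likely to confuse any minimizer.

```python
import random
import statistics

def objective(theta, data_moment, shocks):
    # Toy "model": x = theta + shock, so the simulated moment is a sample mean.
    sim_moment = statistics.fmean(theta + e for e in shocks)
    return (sim_moment - data_moment) ** 2

def draw_shocks(rng, size):
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

data_moment = 2.0            # hypothetical data moment
rng = random.Random(0)

# Draw once and reuse: the objective is a deterministic function of theta.
fixed = draw_shocks(rng, 1000)
q_fixed_a = objective(1.5, data_moment, fixed)
q_fixed_b = objective(1.5, data_moment, fixed)

# Fresh draws on every call: the objective moves at the same theta.
q_fresh_a = objective(1.5, data_moment, draw_shocks(rng, 1000))
q_fresh_b = objective(1.5, data_moment, draw_shocks(rng, 1000))

print(q_fixed_a == q_fixed_b)   # same draws, same value
print(q_fresh_a == q_fresh_b)   # different draws, different value
```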

How to do SMM in practice: pseudo-code

import numpy as np  # assumed source of rand and the matrix product

def SMM(theta_0, W, true_data, S):
  # moments, model_solution, model_simulation, update and converge are model-specific placeholders
  mom_data    = moments(true_data)  # Grab the true moments from the data and store them
  random_nums = np.random.rand(true_data.size, S)  # Generate random draws ONCE and store them
  theta_new   = theta_0
  stop        = False
  while not stop:
    policies       = model_solution(theta_new)  # SOLVE THE MODEL
    # This eats up ALL YOUR TIME
    simulated_data = model_simulation(policies, random_nums)  # SIMULATE DATA SETS
    mom_sim        = moments(simulated_data)  # Calculate moments from simulated data
    d_mom          = mom_sim - mom_data  # Get the difference between simulated and true
    Q              = d_mom.T @ W @ d_mom  # COMPUTE DISTANCE
    theta_old      = theta_new  # Store the current parameter
    theta_new      = update(theta_old, Q)  # Update using minimization algorithm
    stop           = converge(Q, theta_old, theta_new)  # Stop if condition is met
  return theta_new

Identification requires moments which are monotonic in parameters

What we did with SMM was actually pretty deep.

[Image: picture of spaghetti]

Indirect inference generalizes the techniques behind SMM.

Due to Gourieroux, Monfort and Renault (1993).

Not all interesting features of the data can be written as moments. Medians, for instance.

  1. True model: \(x_i \sim f(\theta)\). Parameters, \(\theta\), difficult to estimate. Example: DSGE.
  2. Auxiliary model: \(x_i \sim g(b)\). Parameters, \(b\), easy to estimate. Example: VAR.
    • \(\hat b = \arg\max_b J(b,x_i)\)

A likelihood describes the DGP. In the case of a DSGE model this is nearly impossible to calculate.

But a VAR is easy. It's an auxiliary model because it does not describe the true DGP.

Think of this as the logic of the Cowles Commission formalized: reversing the mapping from a structural model to a reduced form in order to estimate parameters.

Indirect inference depends on a choice of metric:

Wald: Really just SMM.

\[ \hat\theta = \arg \min_\theta \left[\left(\tilde b(\theta) - \hat b\right)^\prime W \left(\tilde b(\theta) - \hat b\right)\right] \]

Likelihood ratio: compare the auxiliary likelihood of the data at the data parameter and at the simulated parameter.

\[ \hat\theta = \arg \min_\theta \sum \log g(x_t, \hat b) - \sum \log g(x_t,\tilde b(\theta)) \]

Lagrange multiplier: compare data parameter to simulated score.

\[ \hat\theta = \arg \min_\theta G(\theta)^\prime W G(\theta) \]

  • With: \[ G(\theta) = \sum \dfrac{\partial}{\partial b} \log g\left(\tilde x_t(\theta),\hat b\right) \]
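A small sketch of the Wald metric, under assumptions that are not from the slides: the "true" model is an MA(1), \(x_t = e_t + \theta e_{t-1}\), and the auxiliary model is an AR(1) fit by least squares. The auxiliary coefficient is easy to estimate on both real and simulated data, and \(\theta\) is chosen so the two coefficients agree. With a single auxiliary parameter \(W\) is a scalar, so it is dropped. A grid search stands in for a proper minimizer.

```python
import random

def simulate_ma1(theta, shocks):
    # "True" model (illustrative assumption): x_t = e_t + theta * e_{t-1}
    return [shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]

def ar1_coef(x):
    # Auxiliary model: least-squares slope of x_t on x_{t-1}
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(v * v for v in x[:-1])
    return num / den

rng = random.Random(0)
T = 2000
data = simulate_ma1(0.5, [rng.gauss(0.0, 1.0) for _ in range(T + 1)])  # true theta = 0.5
b_hat = ar1_coef(data)                                                 # data auxiliary estimate

# Simulation shocks are drawn once and reused for every candidate theta.
sim_shocks = [rng.gauss(0.0, 1.0) for _ in range(5 * T + 1)]

def wald_distance(theta):
    b_tilde = ar1_coef(simulate_ma1(theta, sim_shocks))   # simulated auxiliary estimate
    return (b_tilde - b_hat) ** 2

grid = [i / 50 for i in range(0, 46)]    # hypothetical grid: theta in [0, 0.9]
theta_hat = min(grid, key=wald_distance)
print(b_hat, theta_hat)
```

Note that \(\hat b\) itself is not an estimate of \(\theta\); the AR(1) is misspecified, and it is the simulation step that maps it back to the structural parameter.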

Which auxiliary estimator do I choose?

As we've discussed before, moments must be monotonic in parameters.

But what kind of moments do we want?

  1. The score of an extremely flexible likelihood (Gallant and Tauchen, 1996).

  2. The shape of extremely flexible policy functions (Bazdresch, Kahn and Whited, 2014).

In both cases the intuition is that you want to get as close as possible to the actual DGP.

More generally, the key is to have moments that reflect the essential mechanism of the model.

  • Not average leverage, average consumption, or the equity premium.
  • Instead correlation between leverage and investment, correlation between consumption and the equity premium.

As a test of model fit, this helps to determine if a channel is reasonable. If an author neglects all moments related to the mechanism, they haven't shown you what their model adds.

Calculating the weighting matrix

It turns out that \(W\) need not depend on any of the parameters of the model directly.

Under the null that the model is true, the optimal \(W\) is just the inverse of the variance-covariance matrix of the moments in the data.

In general, a mixture of different moments makes it difficult to determine \(W\). A few options:

  1. Bootstrap this matrix: this is wrong, don't do it.
  2. Estimate one big GMM system.
  3. Use influence functions.
    • These are super easy to use, and allow you to calculate the covariance of pretty much anything.
    • Obscure in applied work, notes on my website if you're interested.
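A minimal sketch of the influence-function approach for two moments of different types, the mean and the variance. It uses the standard influence functions \(x_i-\mu\) and \((x_i-\mu)^2-\sigma^2\), stacks them observation by observation, and takes their sample outer product as the covariance of the moments; \(W\) is then its inverse. The function names and the simulated data are assumptions for illustration, and this is not taken from the notes mentioned above.

```python
import random

def influence_mean_var(data):
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    # One row per observation: (influence on the mean, influence on the variance).
    return [(x - mu, (x - mu) ** 2 - var) for x in data]

def moment_covariance(psi):
    # Sample average of psi_i psi_i', the covariance matrix of the stacked moments.
    n = len(psi)
    k = len(psi[0])
    return [[sum(p[i] * p[j] for p in psi) / n for j in range(k)] for i in range(k)]

def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

rng = random.Random(0)
data = [rng.gauss(0.0, 2.0) for _ in range(5000)]   # stand-in for real data
omega = moment_covariance(influence_mean_var(data))
W = inverse_2x2(omega)                              # weight matrix
print(omega)
print(W)
```

For normal data with \(\sigma=2\) the diagonal of `omega` should be near \(\sigma^2=4\) and \(2\sigma^4=32\). The same stacking works for any moment whose influence function you can write down.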

There are two big practical problems with simulation estimators.

  1. The indirect inference objective function is almost never well behaved in the sense needed by many minimization routines.
    • Gradient-based methods will not work. Nelder-Mead only helps if you're very close.
    • I generally rely on either simulated annealing or differential evolution. One or both is available in your favorite language.
  2. Solving dynamic models takes a long time.
    • Get really efficient at coding.
    • Use a compiled language. MATLAB, R and Python will not get you through this.
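To give a feel for the first point, here is a bare-bones simulated annealing loop on a one-dimensional objective with many local minima. The objective `bumpy`, the cooling schedule, and all tuning constants are hypothetical stand-ins; a real application would use a tested library routine and a compiled model solver.

```python
import math
import random

def bumpy(x):
    # Hypothetical stand-in for a simulated objective: a bowl with many local minima.
    return (x - 1.0) ** 2 + 0.3 * math.sin(25.0 * x)

def simulated_annealing(f, x0, n_iter=5000, step=0.5, t0=1.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, step)      # random proposal around the current point
        f_cand = f(cand)
        # Always accept improvements; accept uphill moves with probability exp(-delta / temp).
        if f_cand <= fx or rng.random() < math.exp(-(f_cand - fx) / temp):
            x, fx = cand, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling                      # geometric cooling schedule
    return best_x, best_f

x_star, f_star = simulated_annealing(bumpy, x0=-3.0)
print(x_star, f_star)
```

Accepting occasional uphill moves is what lets the search climb out of local minima that would trap a gradient-based routine.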

Indirect inference is a lot like calibration, but there are key differences.

In both methods, you have some moments in the data that you try to match your simulated model to. Both allow you to be quantitative and discuss counterfactuals.


Indirect inference:

  • Takes micro estimates of parameters and compares to macro data.

  • Have to trust the model, but no priors needed.

  • The best fit of the model.

  • Actual maximization.

  • A direct way to calculate standard errors.


Calibration:

  • Infers micro parameters from observed micro data and allows comparison.

  • Don't have to put so much trust in the model, but need strong priors.

  • A particular version of the model.

  • Heuristic maximization ("fiddling around").

  • Takes a lot less time.

I don't know why these methods aren't more widespread.