Jay Kahn

Better data and computing facilities have made sensible things simple. - Pakes (2003)

For instance, *any interesting dynamic model in corporate finance*. Also **many models in macrofinance and asset pricing**.

But the moment and likelihood techniques you know from econometrics rely on **closed forms**.

- Example: \(\mathbb{E}\left[ \beta \left(\dfrac{c_{t+1}}{c_t}\right)^{-\alpha} x_{t+1}-p_t \right]= 0\).

The techniques I'm going to show you today are designed to get around this.

- SMM
- Indirect inference

**Key concept:** **the right parameters get the model to look like the data along a few dimensions.**

You could use closed-form moment estimation:

\[ \hat\mu=\dfrac{1}{n}\sum x_i \]

What if you didn't know the closed form, but knew how to **calculate a sample average** and how to **sample from a normal**?

Consider a candidate mean, \(\tilde \mu\). Using a random number generator, draw a sample of size \(m \gg n\), \(\tilde x_i\), from \(\mathcal{N}(\tilde\mu, 1)\). Take:

\[ d(\tilde x_{i}, x_{i}|\tilde\mu) = \dfrac{1}{m}\sum \tilde x_{i} - \dfrac{1}{n} \sum x_{i} \]

If you pick the right mean, \(d(\tilde x_{i}, x_{i}|\mu) \approx 0\).

**So you can get to the right \(\mu\) by making \(\frac{1}{m}\sum \tilde x_i\) "look like" \(\frac{1}{n}\sum x_i\).**

- This is the essence of moment estimation, but we no longer have a formula for \(\frac{1}{m}\sum \tilde x_i\).

**That's all SMM is!**
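A minimal sketch of this idea in Python. Everything here is synthetic and illustrative: the "data" are draws from a normal with mean 2, and we recover that mean purely by matching sample averages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the "data" are n draws from N(2, 1); in practice
# x would be your observed sample and the true mean would be unknown.
n, m = 500, 50_000
x = rng.normal(2.0, 1.0, size=n)

# Fix the simulation draws once, so d is a deterministic function of mu_tilde.
z = rng.standard_normal(m)

def d(mu_tilde):
    """Difference between the simulated and observed sample means."""
    x_tilde = mu_tilde + z            # a sample of size m from N(mu_tilde, 1)
    return x_tilde.mean() - x.mean()

# Search for the candidate mean that drives d to (approximately) zero.
grid = np.linspace(0.0, 4.0, 401)
mu_hat = grid[np.argmin([d(mu) ** 2 for mu in grid])]
```

Here the closed form (\(\hat\mu = \bar x\)) is of course available; the point is that the simulation route recovers it without ever writing it down.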

Take **any** moment, \(h(x_i)\).

Given \(\theta\), **suppose you can simulate \(\tilde x_i (\theta)\) from the model**.

Simulate \(S\) sets of size \(n\), \(\tilde x_{is}\), and form the vector:

\[ g_{n}\left( \theta\right) =n^{-1}\sum_{i=1}^{n}\left[ h\left( x_{i}\right) -S^{-1}\sum_{s=1}^{S}h\left( \tilde x_{is}\left( \theta\right) \right) \right] . \]

We can estimate \(\theta\):

\[ \hat{\theta}=\arg\min_{\theta} Q(\theta,n) \equiv g_{n}\left( \theta\right)^{\prime}\hat{W}_{n}g_{n}\left(\theta\right) \]

for a positive semi-definite \(\hat W_n\).

This is the **simulated method of moments estimator**.

In general, the mapping from \(\theta\) to \(\tilde x_i\) will come from solving a dynamic model.
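A minimal sketch of \(g_n(\theta)\) and the objective \(Q\) for a toy normal model with \(h(x) = (x, x^2)\). All data below are synthetic and the weight matrix is the identity, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, S = 800, 5
x = rng.normal(1.0, 2.0, size=n)       # synthetic "data"
z = rng.standard_normal((S, n))        # fixed draws, reused for every theta

def h(x):
    """Two moments per observation: the level and the square."""
    return np.column_stack([x, x ** 2])

def g_n(theta):
    """n^{-1} sum_i [h(x_i) - S^{-1} sum_s h(x_tilde_is)]."""
    mu, sigma = theta
    x_sim = mu + sigma * z             # S simulated samples of size n
    return h(x).mean(axis=0) - h(x_sim.reshape(-1)).mean(axis=0)

def Q(theta, W=np.eye(2)):
    g = g_n(theta)
    return g @ W @ g
```

At parameters near the truth, \(Q\) is close to zero; at wrong parameters it is large, which is what the minimization exploits.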

**SMM estimator is asymptotically normal:**

\[ \sqrt{n}\left( \hat{\theta}-\theta\right) \overset{d}{\longrightarrow}\mathcal{N}\left( 0,\operatorname*{avar}(\hat{\theta})\right) \]

where:

\[ \operatorname*{avar}(\hat{\theta})\equiv\left( 1+\frac{1}{S}\right) \left[ \frac{\partial g_{n}\left( \theta\right)^{\prime} }{\partial \theta}W\frac{\partial g_{n}\left( \theta\right) }{\partial \theta^{\prime}}\right] ^{-1}. \]

The \(\frac{1}{S}\) term accounts for simulation error in the estimates.
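In code, the sandwich above amounts to one numerical Jacobian. A sketch under the assumption that `g` computes the \(j\)-vector \(g_n(\theta)\); the function names are hypothetical:

```python
import numpy as np

def smm_avar(g, theta_hat, W, S, h=1e-5):
    """(1 + 1/S) * inv(G' W G), where G is the j x k Jacobian of the
    moment vector, computed here by central finite differences."""
    k = theta_hat.size
    G = np.column_stack([
        (g(theta_hat + h * e) - g(theta_hat - h * e)) / (2 * h)
        for e in np.eye(k)
    ])
    return (1 + 1 / S) * np.linalg.inv(G.T @ W @ G)
```

Standard errors follow as \(\sqrt{\operatorname{diag}(\operatorname{avar}(\hat\theta))/n}\).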

**A test of overidentifying restrictions is still available:**

\[ \frac{nS}{1+S} Q(\theta,n) \overset{d}{\longrightarrow} \chi ^{2}(j-k) \]

where \(j\) is the number of moments and \(k\) is the number of parameters. As with GMM, this \(J\)-stat equivalent can be used to make nested model comparisons so long as \(W\) is kept constant.
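A quick numeric illustration of the test; all the values below are made up, and `scipy.stats.chi2` is assumed available:

```python
from scipy.stats import chi2

# Hypothetical values from a converged SMM run, for illustration only.
n, S = 1_000, 10      # sample size and number of simulated sets
j, k = 5, 3           # number of moments and number of parameters
Q_min = 0.004         # value of the SMM objective at the estimate

J_stat = (n * S / (1 + S)) * Q_min
p_value = chi2.sf(J_stat, df=j - k)   # small p-value -> reject the overidentifying restrictions
```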

**For a given model, the SMM objective function has a few inputs:**

- The data moment vector, \(h(x_i)\).
- The parameter vector, \(\theta\).
- The weight matrix, \(W\).
- A set of random numbers.

Given these inputs, \(\theta\) allows you to **SOLVE THE MODEL**.

The random numbers allow you to **SIMULATE DATA SETS**.

The data moments and weight matrix allow you to **COMPUTE DISTANCE**.

```python
def SMM(theta_0, W, true_data, S):
    mom_data = moments(true_data)             # Grab the true moments from the data and store them
    random_nums = rand(true_data.size, S)     # Generate random draws once and store them
    theta_new, stop = theta_0, False
    while not stop:
        policies = model_solution(theta_new)  # SOLVE THE MODEL
        # This eats up ALL YOUR TIME
        simulated_data = model_simulation(policies, random_nums)  # SIMULATE DATA SETS
        mom_sim = moments(simulated_data)     # Calculate moments from simulated data
        d_mom = mom_sim - mom_data            # Difference between simulated and true moments
        Q = d_mom.T @ W @ d_mom               # COMPUTE DISTANCE
        theta_old = theta_new                 # Store the current parameter
        theta_new = update(theta_old, Q)      # Update using a minimization algorithm
        stop = converge(Q, theta_old, theta_new)  # Stop if the convergence condition is met
    return theta_new
```

Indirect inference is due to Gourieroux, Monfort and Renault (1993).

Not all interesting features of the data can be written as moments. Medians, for instance.

**True model**: \(x_i \sim f(\theta)\). Parameters, \(\hat\theta\), difficult to estimate. Example: DSGE.

**Auxiliary model**: \(x_i \sim g(b)\). Parameters, \(\hat b\), easy to estimate. Example: VAR.

- \(\hat b = \arg\max_b J(b, x_i)\)

A likelihood describes the DGP. In the case of a DSGE model this is nearly impossible to calculate.

But a VAR is easy. It's an auxiliary model because it does not describe the true DGP.

**Think of this as the logic of the Cowles Commission formalized: reversing the mapping from a structural model to a reduced form in order to estimate parameters.**

**Wald**: Really just SMM.

\[ \hat\theta = \arg \min_\theta \left[\left(\tilde b(\theta) - b\right)^\prime W \left(\tilde b(\theta) - b\right)\right] \]

**Likelihood ratio**: compare simulated parameter to data score.

\[ \hat\theta = \arg \min_\theta \sum \log g(x_t, b) - \sum \log g(x_t,\tilde b(\theta)) \]

**Likelihood maximization**: compare data parameter to simulated score.

\[ \hat\theta = \arg \min_\theta G(\theta)^\prime W G(\theta) \]

- With: \[ G(\theta) = \sum \dfrac{\partial}{\partial b} \log g\left(\tilde x_t(\theta),b\right) \]
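A toy illustration of the Wald form (pure NumPy, all numbers illustrative): take an MA(1) as the "true" model, whose likelihood we pretend is unavailable, and an AR(1) slope as the auxiliary parameter \(b\), which is trivial to estimate on both real and simulated data.

```python
import numpy as np

rng = np.random.default_rng(2)
T, theta_true = 2000, 0.5

def simulate_ma1(theta, eps):
    """True model: x_t = eps_t + theta * eps_{t-1}."""
    return eps[1:] + theta * eps[:-1]

def ar1_coef(x):
    """Auxiliary model: OLS slope of x_t on x_{t-1}."""
    c = np.cov(x[1:], x[:-1])
    return c[0, 1] / c[1, 1]

# "Observed" data, then the data estimate of the auxiliary parameter.
x = simulate_ma1(theta_true, rng.standard_normal(T + 1))
b_hat = ar1_coef(x)

# Fixed simulation draws; Wald-form objective over a grid of candidate thetas.
eps_sim = rng.standard_normal((10, T + 1))

def distance(theta):
    b_tilde = np.mean([ar1_coef(simulate_ma1(theta, e)) for e in eps_sim])
    return (b_tilde - b_hat) ** 2

grid = np.linspace(0.0, 0.95, 96)
theta_hat = grid[np.argmin([distance(t) for t in grid])]
```

The auxiliary slope is roughly \(\theta/(1+\theta^2)\), monotone in \(\theta\) on \([0,1)\), which is what makes the match informative.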

As we've discussed before, moments must be monotonic in parameters.

**But what kind of moments do we want?**

The score of an extremely flexible likelihood (Gallant and Tauchen, 1996).

The shape of extremely flexible policy functions (Bazdresch, Kahn and Whited, 2014).

**In both cases the intuition is that you want to get as close as possible to the actual DGP.**

More generally, **the key is to have moments that reflect the essential mechanism of the model.**

- Not average leverage, average consumption, or the equity premium.
- Instead, the correlation between leverage and investment, or between consumption and the equity premium.

As a test of model fit, this helps to determine if a channel is reasonable. **If an author neglects all moments related to the mechanism, they haven't shown you what their model adds.**

It turns out that \(W\) need not depend on any of the parameters of the model directly.

Under the null that the model is true, the optimal \(W\) is just the inverse of the variance-covariance matrix of the moments in the data.

In general, a mixture of different moments makes it difficult to determine \(W\). A few options:

- Bootstrap this matrix. This is wrong; don't do it.
- Estimate one big GMM system.
- Use **influence functions**.
  - These are **super easy to use**, and allow you to calculate the covariance of **pretty much anything**.
  - Obscure in applied work; notes on my website if you're interested.
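A tiny illustration of the influence-function route, for two moments (the mean and the variance) whose influence functions are known in closed form. This is a generic sketch, not the specific method from the notes mentioned above:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=2000)   # synthetic data

# Influence functions, one row per observation, one column per moment:
# mean: psi_i = x_i - mu_hat;  variance: psi_i = (x_i - mu_hat)^2 - sigma2_hat.
psi = np.column_stack([x - x.mean(), (x - x.mean()) ** 2 - x.var()])

vcov = psi.T @ psi / x.size           # variance-covariance of the moment vector
W = np.linalg.inv(vcov)               # candidate weight matrix for SMM
```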

- The indirect inference objective function is almost never well behaved in the sense needed by many minimization routines.
- Gradient-based methods will not work. Nelder-Mead only helps if you're very close.

**I generally rely on either simulated annealing or differential evolution.** One or both is available in your favorite language.
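For example, SciPy ships `differential_evolution` in `scipy.optimize`. Here is a toy run on a two-parameter SMM objective; the data and moment choices are purely illustrative:

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=1000)     # synthetic "data" with unknown mean and sd
z = rng.standard_normal((1000, 10))     # fixed simulation draws, S = 10 sets

def smm_objective(theta):
    mu, sigma = theta
    sim = mu + sigma * z                # simulated data sets under the candidate theta
    # Two moments, identity weight matrix: match the mean and the standard deviation.
    g = np.array([x.mean() - sim.mean(), x.std() - sim.std()])
    return g @ g

res = differential_evolution(smm_objective, bounds=[(-5, 5), (0.1, 5)], seed=0)
mu_hat, sigma_hat = res.x
```

Because the optimizer only ever evaluates the objective, it is robust to the kinks and flat regions that defeat gradient-based routines.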

**Solving dynamic models takes a long time.**

- Get really efficient at coding.
- Use a compiled language. Matlab, R and Python will not get you through this.

In both methods, you have some moments in the data that you try to match your simulated model to. Both allow you to be quantitative and discuss counterfactuals.

| **Estimation** | **Calibration** |
| --- | --- |
| Takes micro estimates of parameters and compares to macro data. | Infers micro parameters from observed micro data and allows comparison. |
| Have to trust the model, but no priors needed. | Don't have to put so much trust in the model, but need strong priors. |
| The best fit of the model. | A particular version of the model. |
| Actual maximization. | Heuristic maximization ("fiddling around"). |
| **A direct way to calculate standard errors.** | **Takes a lot less time.** |

They're easy and they're useful.

More references: