# Bootstrap (Build 4)
This document describes the parametric bootstrap on the IO-HMM-fitted
population, the warning-shift sweep that rides on top of it, and how to read
the resulting Fig. 6 (failed-evacuation count vs. shift, with quantile
bands).
## What's bootstrapped, what's swept
* **Bootstrap dimension (B replicates):** resample households *with
replacement* from the baseline simulation output, refit the IO-HMM on each
resample, yielding `B` parameter sets `θ̂⁽¹⁾, …, θ̂⁽ᴮ⁾`.
* **Shift dimension (S shifts):** vary the warning-lead-time shift `δ` in
hours relative to baseline. A shift of `δ` means the voluntary order fires
at `t = 60 + δ` and the mandatory at `t = 84 + δ`. Defaults:
`δ ∈ {-24, -16, -8, 0, +8, +16, +24}`, so `S = 7`.
* **For each `(θ̂⁽ᵇ⁾, δ)` cell:** simulate trajectories using `θ̂⁽ᵇ⁾` as
the (DGP-equivalent) parameter set with the shifted timeline, then compute
the four network metrics. The *failed-evacuation count* is the headline
metric for Fig. 6.
The `(B × S)` grid is the long-format `shift_sweep.parquet`. Aggregating
along `B` at each `δ` gives the median, the 25–75% inter-quartile band, and
the 5–95% outer band that the figure plots.
## Why this captures the right uncertainty
The bootstrap quantifies **parameter uncertainty** in `θ̂`. Each `δ` column
of the grid asks: *"what's the failed-evac count under this policy, given
our uncertainty about the true behavioral parameters?"* The bands directly
answer the chapter's policy question — *"is the shift in the failed-evac
curve under each policy distinguishable from sampling noise?"*
The simulation step is itself stochastic (each call uses an RNG), but we
hold the simulator seed *deterministic* given `(b, δ)` so this noise does
not enter the bands. Concretely the cell-level seed is:
```
seed(b, s_index) = base_seed * 10000 + replicate_id * 100 + shift_index
```
This isolates parameter uncertainty from simulation noise. The chapter only
needs the parameter-uncertainty story; if a future extension wants to also
display simulation noise, it can sweep multiple seeds per `(b, δ)`.
## Note on terminology
Strictly, this is a **parametric bootstrap on the IO-HMM-fitted population**:
we resample the *fitted-data* households, refit, and simulate from each
refit. This is the most direct way to propagate parameter-estimation
uncertainty into downstream policy outcomes. The chapter can call it either
"bootstrap" or "parametric bootstrap"; the procedure documented here is the
authoritative reference.
## Why warm-starting
Each replicate's EM problem is a small perturbation of the original
full-data fit. Cold (random) starts converge in ~20–30 iterations on the
baseline data; warm-starting from the original full-data `theta.toml`
typically converges in 5–10. Since EM dominates the per-replicate runtime,
warm-starting halves wall-clock time for the production 50-replicate run.
`--warm-start` is optional. Pass `./output/fit/` (the directory containing
the existing single-fit `theta.toml`); leave it off to fit each replicate
cold from a random init (works, but slower).
## Substituting `θ̂⁽ᵇ⁾` into the simulator
Each `θ̂⁽ᵇ⁾` is an IO-HMM `FitParameters` object. The IO-HMM's transition
logits have the same multinomial-logit form as the DGP's transition rows
(`α + β · u`), but the IO-HMM input vector
```
u = [vol, mand, ρ, r, v, τ]
```
is a strict subset of the DGP's: the endogenous features `π` (peer share),
`c` (congestion), and `tir` (time-in-ER) are absent by design — the IO-HMM
sees only exogenous inputs. The bootstrap simulator
(`simulate_with_iohmm_transitions` in
`src/iohmm_evac/bootstrap/shift_sweep.py`) consequently:
1. Reuses the DGP's population, timeline, emissions, and feedback machinery
from the supplied `SimulationConfig`.
2. At each step, builds `u_{i,t}` exactly as `inference.data.bundle_to_fit_data`
does, so the simulator and the IO-HMM agree on the input vector.
3. Samples the next state from `softmax(α[k, :] + β[k, :, :] @ u_{i,t})`
using `θ̂⁽ᵇ⁾`'s `(α, β)`. SH remains absorbing.
4. Continues to track `tir`, `π`, `c` because the *emission* model and the
network metrics depend on them — but they no longer enter the
*transition* probabilities.
This isolates the IO-HMM's view of policy effects (warning shifts) from any
DGP-specific feedback wiring.
## Worked example: reading Fig. 6
A typical Fig. 6 (failed-evac count vs. `δ`) shows:
* Median curve sloping upward from `δ = -24` to `δ = +24` — later warnings
⇒ more failed evacuations.
* 25–75% inner band giving the typical replicate-to-replicate spread.
* 5–95% outer band giving the worst-case range a reader should expect.
To answer *"is δ = -8 distinguishably better than baseline (δ = 0)?"*,
compare the medians at `δ = -8` and `δ = 0`. If the 25–75% interval at
`δ = -8` does not overlap with the 25–75% interval at `δ = 0`, the
improvement is robust to the IO-HMM's parameter uncertainty.
The `bootstrap summary` CLI prints the same information as a table:
```
shift failed_evacuation_count ...
+0 120.00 [115.50,123.00] (109.20,128.10)
-8 102.50 [99.00,106.00] (95.40,110.20)
...
```
Each cell is `median [P25,P75] (P5,P95)`.
## Resampling-only bootstrap, not nonparametric on trajectories
The household-level resample changes which trajectories enter the EM fit,
so each replicate's `θ̂` differs. We do **not** resample at the
trajectory-realization level (i.e. drawing fresh DGP trajectories per
replicate). That's a different kind of bootstrap which mixes
parameter-estimation and DGP-realization uncertainty; we want only the
former for the chapter's policy story.
## Production run
```bash
uv run iohmm-evac bootstrap fit \
--input ./output/baseline.parquet \
--warm-start ./output/fit/ \
--output-dir ./output/bootstrap/fits/ \
--n-replicates 50 --jobs -1 --seed 0
uv run iohmm-evac bootstrap shift-sweep \
--bootstrap-dir ./output/bootstrap/fits/ \
--output ./output/bootstrap/shift_sweep.parquet --seed 0
uv run iohmm-evac report bootstrap-bands \
--input ./output/bootstrap/shift_sweep.parquet \
--output ./output/figures/bootstrap_bands.png
```
The 5-replicate smoke test (used by the test suite and the acceptance
gate) substitutes `--n-replicates 5` for the full 50.