docs/development.md

# Development

## Setup

```bash
# install uv if you don't have it (see https://docs.astral.sh/uv/)
uv sync
```

`uv sync` reads `pyproject.toml`, creates a `.venv/`, and installs both
runtime and dev dependencies (declared under `[dependency-groups].dev`).

## Tasks

```bash
uv run ruff check src tests        # lint
uv run ruff format --check src tests # format check
uv run ruff format src tests       # reformat in place
uv run mypy src tests              # strict type-check
uv run pytest                      # tests
uv run pytest --cov=src/iohmm_evac --cov-report=term-missing  # with coverage
```

The acceptance gate is:

```bash
uv run ruff check && uv run ruff format --check && uv run mypy src tests && \
    uv run pytest --cov=src/iohmm_evac --cov-fail-under=85
```

## Project layout

```
src/iohmm_evac/
├── __init__.py
├── cli.py            # argparse-driven entry point
├── io.py             # parquet + TOML sidecar writers
├── params.py         # frozen dataclasses for every parameter
├── scenarios.py      # named scenarios → SimulationConfig
├── types.py          # FloatArray / IntArray / BoolArray, State, Zone, EvacPath
├── dgp/              # the simulator
│   ├── population.py
│   ├── timeline.py
│   ├── feedback.py
│   ├── emissions.py
│   ├── transitions.py
│   └── simulator.py  # the per-step loop
├── inference/        # IO-HMM fitting (Build 2)
│   ├── data.py            # SimulationBundle → IO-HMM input arrays
│   ├── log_space.py       # logsumexp / log_softmax helpers
│   ├── fit_params.py      # FitParameters and DGP→fit projection
│   ├── forward_backward.py
│   ├── m_step.py          # closed-form + L-BFGS-B
│   ├── em.py              # EM loop with monotonicity tracking
│   ├── initialization.py  # random / kmeans / truth init strategies
│   ├── fit.py             # multi-restart driver
│   ├── io.py              # fit-bundle Parquet/TOML serialization
│   └── cli.py             # `iohmm-evac fit` subcommand
├── diagnostics/      # recovery metrics (Build 2)
│   ├── decoding.py        # Viterbi + posterior mode
│   ├── alignment.py       # Hungarian state alignment
│   ├── recovery.py        # state and parameter recovery
│   └── cli.py             # `iohmm-evac diagnose recovery`
├── report/           # diagnostic plots (Builds 1.5–1.7)
│   ├── plots.py
│   ├── recovery_plots.py  # recovery scatters / confusion / LL trace (Build 2)
│   ├── recovery_cli.py    # report subcommands for fit-side diagnostics
│   ├── summary.py
│   ├── loader.py
│   ├── constants.py
│   └── cli.py
└── bootstrap/        # parametric bootstrap + shift sweep (Build 4)
    ├── resample.py        # household-level resampling with replacement
    ├── runner.py          # parallel B-fit driver (joblib + loky)
    ├── shift_sweep.py     # apply each θ̂ to each warning-shift δ
    └── aggregate.py       # collect per-(b, δ) metrics into the band grid
```

Bootstrap source files stay ≤ 300 lines per Build 4's acceptance gate.

## Adding a new scenario

1. Add a builder function to `src/iohmm_evac/scenarios.py`. Pattern: start
   from `SimulationConfig()`, then `dataclasses.replace` the bundle(s) you
   want to change.
2. Register it in the `SCENARIO_BUILDERS` dict.
3. Add a row to [`scenarios.md`](scenarios.md) describing what changes and
   why.
4. Add a test in `tests/test_scenarios.py` that asserts the post-condition
   you care about (e.g., a parameter landed where you expected, or the
   resulting simulation is shifted in the expected direction).

The CLI will pick up the new name automatically — the `--scenario` choices
are derived from `list_scenarios()`.
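
The builder pattern can be sketched as below. The dataclasses here are illustrative stand-ins, not the real schema — the actual `SimulationConfig`, its parameter bundles, and `SCENARIO_BUILDERS` live in `src/iohmm_evac/params.py` and `scenarios.py`:

```python
# Sketch only: `WarningParams` and `lead_time_hours` are made-up field
# names standing in for the real frozen parameter bundles.
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class WarningParams:  # stand-in for one parameter bundle
    lead_time_hours: float = 12.0


@dataclass(frozen=True)
class SimulationConfig:  # stand-in for the real frozen config
    warning: WarningParams = field(default_factory=WarningParams)


def build_late_warning() -> SimulationConfig:
    """Start from the defaults, then replace only the bundle that changes."""
    base = SimulationConfig()
    return replace(base, warning=replace(base.warning, lead_time_hours=4.0))


SCENARIO_BUILDERS = {"late-warning": build_late_warning}  # registry sketch
```

Because the config is frozen, `dataclasses.replace` is the only way to derive a variant, which is what makes each scenario's delta from the defaults explicit and testable.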

## Reproducibility hygiene

* Never call `np.random.<x>` directly. All randomness flows through a
  `numpy.random.Generator` derived from the user-supplied seed.
* When sampling inside helper functions, accept `rng: Generator` as a
  parameter rather than constructing one internally. This keeps the
  randomness chain auditable.
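
A minimal sketch of the convention (the helper name and distribution are illustrative, not real project code):

```python
# One Generator is created from the user-supplied seed; every helper
# receives it as a parameter and never constructs its own.
import numpy as np
from numpy.random import Generator


def sample_departure_delays(rng: Generator, n: int) -> np.ndarray:
    """Illustrative helper: draws from the caller's rng, never its own."""
    return rng.exponential(scale=2.0, size=n)


def simulate(seed: int) -> np.ndarray:
    rng = np.random.default_rng(seed)  # the single root of all randomness
    return sample_departure_delays(rng, n=5)
```

Same seed, same stream: two calls to `simulate(123)` are bit-for-bit identical, and every draw can be traced back to the one `default_rng(seed)` call.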

## Performance

The full DGP at `N=10_000, T=120` runs in well under a second on a typical
laptop. If you make a change and notice the runtime jump, the most likely
culprits (in decreasing order of probability) are:

1. A Python-level loop replaced what used to be a NumPy reduction. Profile
   with `uv run python -m cProfile -s cumulative -m iohmm_evac.cli simulate ...`.
2. Allocating a new `(N, T+1)` array per step instead of reusing one row.
3. Using `np.append` / `np.concatenate` inside the per-step loop.
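
A before/after sketch of culprits 2 and 3 (the update rule here is a dummy stand-in, not the real per-step logic):

```python
import numpy as np

N, T = 1_000, 120
rng = np.random.default_rng(0)

# Fast pattern: allocate the (N, T+1) array once, then write each step's
# result into column t + 1 in place.
states = np.zeros((N, T + 1))
for t in range(T):
    states[:, t + 1] = states[:, t] + rng.standard_normal(N)  # dummy step

# Slow anti-pattern (don't do this): growing the array inside the loop
# reallocates and copies everything accumulated so far on every step.
# cols = np.zeros((N, 1))
# for t in range(T):
#     cols = np.concatenate([cols, new_col[:, None]], axis=1)
```

The fast version does O(N·T) work total; the concatenate version does O(N·T²) copying alone.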

## Joblib parallelism and BLAS thread pinning

The bootstrap runner uses joblib's **loky** backend
(`Parallel(n_jobs=..., backend="loky")`). Loky spawns subprocesses (rather
than threads or forks), which lets each worker run an independent EM fit
without GIL contention.
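
The fan-out shape looks roughly like this; `fit_one` is a stand-in for the per-replicate EM fit, not the runner's actual code:

```python
from joblib import Parallel, delayed


def fit_one(b: int) -> int:
    """Stand-in for one independent EM fit on bootstrap replicate b."""
    return b * b


# Each replicate runs in its own loky subprocess, so the fits never
# contend for the GIL; results come back in submission order.
results = Parallel(n_jobs=2, backend="loky")(
    delayed(fit_one)(b) for b in range(8)
)
```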

To stop NumPy/SciPy's BLAS from over-subscribing the box when N workers
are already running in parallel, `iohmm_evac.bootstrap.runner` calls
`os.environ.setdefault(...)` for `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`,
`MKL_NUM_THREADS`, `BLIS_NUM_THREADS`, `VECLIB_MAXIMUM_THREADS`, and
`NUMEXPR_NUM_THREADS` *at module import time, before NumPy is imported*.
Loky workers re-import the module fresh, so the worker-side BLAS sees
single-thread settings.

If you need to override this (e.g. on a 64-core box where you want 8 outer
workers × 8 inner BLAS threads), set the env vars explicitly before
launching `iohmm-evac`:

```bash
OMP_NUM_THREADS=8 uv run iohmm-evac bootstrap fit --jobs 8 ...
```

`setdefault` will respect the explicit value and skip the override.

## The test-only clean DGP

[`tests/_clean_dgp.py`](../tests/_clean_dgp.py) is a deliberately simpler
DGP used by the inference test suite. It matches the IO-HMM exactly: no
endogenous feedback (`pi`, `c`, `tir`), no `evac_path` bookkeeping — just
state-conditional emissions over a fixed exogenous input sequence.
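
To make "state-conditional emissions over an exogenous input" concrete, here is a toy version of that structure. It is not the code in `tests/_clean_dgp.py` — the parameter values and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
T, K = 50, 2                    # time steps, hidden states
u = rng.standard_normal(T)      # fixed exogenous input: never fed back

w = np.array([[0.0, 1.5], [0.5, -1.0]])  # per-state input weights
mu = np.array([-1.0, 2.0])               # state-conditional emission means

z = np.zeros(T, dtype=int)
y = np.zeros(T)
for t in range(1, T):
    logits = w[z[t - 1]] * u[t]          # transitions depend only on u[t]
    p = np.exp(logits - logits.max())
    z[t] = rng.choice(K, p=p / p.sum())
for t in range(T):
    y[t] = rng.normal(mu[z[t]], 1.0)     # emission depends only on z[t]
```

The key property: nothing about `y` or `z` ever flows back into `u` or the transition weights, so the IO-HMM's conditional-independence assumptions hold exactly.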

It exists *separately from* the production DGP in `iohmm_evac.dgp` because
the production DGP is intentionally mis-specified relative to the IO-HMM
(the chapter's narrative point about real-world model fit). Recovery tests
need a setting where the IO-HMM is the *correct* model — that's the role
of the clean DGP.

The clean DGP is test-only and is not part of the public package. The
leading underscore in `_clean_dgp.py` marks it as a private helper
module; since the filename also doesn't match pytest's default
`test_*.py` pattern, it is never collected as a test file itself.