The numbers below are wall-clock timings on commodity hardware. Models that a given tool cannot fit at all are marked accordingly. Treat the figures as order-of-magnitude estimates rather than precise micro-benchmarks: they will shift with hardware, BLAS choice, and convergence tolerance.
Representative workloads
| Model | N | fastsem | OpenMx | lavaan | Mplus |
|---|---|---|---|---|---|
| Bivariate FIML, ordinal | 5,000 | ~3 s | ~45 s | DWLS only | ~20 s |
| ACE twin model (FIML) | 1,000 pairs | ~1 s | ~10 s | no ACE | ~4 s |
| RI-CLPM, 4 waves, ordinal | 3,000 | ~8 s | ~90 s | no FIML ordinal | ~30 s |
| Gen-SEM scan | 500k SNPs | ~50 min | ~20 h | not supported | not supported |
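To make the table easier to read at a glance, the approximate timings can be converted into ratios relative to fastsem. The short Python sketch below does only that arithmetic; the names `TIMINGS_S` and `speedup_over` are illustrative and not part of fastsem, and the inputs are the rounded figures from the table, so the ratios inherit the same order-of-magnitude caveat.

```python
# Approximate wall-clock timings in seconds, transcribed from the table above.
# Tools that cannot fit a model are left out of that row.
TIMINGS_S = {
    "Bivariate FIML, ordinal": {"fastsem": 3, "OpenMx": 45, "Mplus": 20},
    "ACE twin model (FIML)": {"fastsem": 1, "OpenMx": 10, "Mplus": 4},
    "RI-CLPM, 4 waves, ordinal": {"fastsem": 8, "OpenMx": 90, "Mplus": 30},
    "Gen-SEM scan": {"fastsem": 50 * 60, "OpenMx": 20 * 3600},
}


def speedup_over(timings, baseline="fastsem"):
    """Return {model: {tool: tool_time / baseline_time}} for every other tool."""
    ratios = {}
    for model, by_tool in timings.items():
        base = by_tool[baseline]
        ratios[model] = {
            tool: seconds / base
            for tool, seconds in by_tool.items()
            if tool != baseline
        }
    return ratios


RATIOS = speedup_over(TIMINGS_S)
for model, by_tool in RATIOS.items():
    summary = ", ".join(f"{tool} {ratio:.0f}x" for tool, ratio in by_tool.items())
    print(f"{model}: {summary}")
```

On these rounded figures, the OpenMx-to-fastsem ratio falls between roughly 10× and 24×, and the Mplus-to-fastsem ratio between roughly 4× and 7×.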
Why fastsem is fast
- Analytical gradients. Exact ML and FIML gradients are computed via the RAM weight matrix, not by finite differences.
- OpenCL float32 objective. The objective function is evaluated on the GPU when available; the float64 analytical gradient runs on CPU.
- Multi-threaded CPU fallback. When no GPU is detected, the same objective runs on multiple cores via OpenMP-style parallelism.
- Warm-starting. For Gen-SEM-style sweeps, each SNP fit reuses the previous solution, yielding roughly a 20× speed-up over cold starts.
- L-BFGS with preconditioner and mini-batch warmup. Faster convergence on stiff likelihoods than naive quasi-Newton.
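The warm-starting idea can be illustrated with a deliberately small toy problem. The Python sketch below is not fastsem's optimizer (which, per the list above, is L-BFGS on SEM likelihoods); it assumes a one-parameter Poisson log-rate model fitted by Newton's method with an analytic gradient, and a sweep of 50 closely related fits standing in for consecutive SNPs. All function names and the choice of model are hypothetical. It only shows the mechanism: starting each fit from the previous solution takes fewer optimizer steps than restarting from a fixed initial value.

```python
import math


def fit_log_rate(mean_count, start, tol=1e-10, max_iter=100):
    """Minimise f(x) = exp(x) - mean_count * x with Newton's method.

    f is the Poisson negative log-likelihood (up to a constant) in the
    log-rate x. Returns (estimate, number_of_newton_steps).
    """
    x = start
    for steps in range(max_iter):
        grad = math.exp(x) - mean_count  # analytic gradient of f
        if abs(grad) < tol:
            return x, steps
        x -= grad / math.exp(x)  # Newton step; the Hessian of f is exp(x)
    raise RuntimeError("Newton iteration did not converge")


def sweep(mean_counts, warm_start):
    """Fit every problem in order; optionally reuse the previous solution."""
    estimates, total_steps = [], 0
    start = 0.0  # cold-start value
    for mean_count in mean_counts:
        estimate, steps = fit_log_rate(mean_count, start)
        estimates.append(estimate)
        total_steps += steps
        start = estimate if warm_start else 0.0
    return estimates, total_steps


# Toy stand-in for a scan: 50 problems whose optima drift slowly.
mean_counts = [5.0 + 0.1 * k for k in range(50)]
cold_estimates, cold_steps = sweep(mean_counts, warm_start=False)
warm_estimates, warm_steps = sweep(mean_counts, warm_start=True)
print(f"cold starts: {cold_steps} Newton steps, warm starts: {warm_steps}")
```

Both sweeps reach the same estimates; only the amount of work differs. The size of the saving in this toy says nothing about the roughly 20× figure quoted above, which depends on the real model, data, and optimizer.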
Reproducing these numbers
Reproduction scripts live in the fastsem engine
repository under benchmarks/. The R-side counterparts
(using run_fastsem() against equivalent OpenMx and lavaan
models) will be packaged in a future revision of fastsemR.
For everyday R workloads, the easiest way to see the speed-up is to
take an existing umx/OpenMx model and swap mxRun() for
run_fastsem(); see the getting started article.