Benchmarks • fastsemR

The numbers below are wall-clock timings on commodity hardware. Models that a given tool cannot fit at all are marked accordingly. Treat the figures as order-of-magnitude rather than precise micro-benchmarks — they will move with hardware, BLAS choice, and convergence tolerance.

Representative workloads

Model	N	fastsem	OpenMx	lavaan	Mplus
Bivariate FIML, ordinal	5,000	~3 s	~45 s	DWLS only	~20 s
ACE twin model (FIML)	1,000 pairs	~1 s	~10 s	no ACE	~4 s
RI-CLPM, 4 waves, ordinal	3,000	~8 s	~90 s	no FIML ordinal	~30 s
Gen-SEM scan	500k SNPs	~50 min	~20 h	not supported	not supported

Why fastsem is fast

Analytical gradients. Exact ML and FIML (full-information maximum likelihood) gradients are computed via the RAM (reticular action model) weight matrix, not by finite differences.
OpenCL float32 objective. The objective function is evaluated on the GPU when one is available; the float64 analytical gradient runs on the CPU.
Multi-threaded CPU fallback. When no GPU is detected, the same objective runs on multiple cores via OpenMP-style parallelism.
Warm-starting. For Gen-SEM-style sweeps, each SNP fit reuses the previous solution, yielding roughly a 20× speed-up over cold starts.
L-BFGS with preconditioner and mini-batch warmup. This combination converges faster on stiff likelihoods than a naive quasi-Newton method.
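
As a rough illustration of two of the points above (analytic gradients and warm-starting), the following base R sketch runs a toy per-SNP sweep with optim(). It is not fastsem's implementation: the Gaussian regression objective, the helper names nll, nll_grad, and run_sweep, the simulated data, and the choice of method = "BFGS" are all assumptions made for the example.

```r
# Toy stand-in for a per-SNP model fit (NOT fastsem's objective):
# Gaussian regression y ~ a + b * snp with theta = (a, b, log_sigma).
# Returns the per-observation negative log-likelihood, up to a constant.
nll <- function(theta, y, x) {
  r <- y - theta[1] - theta[2] * x
  theta[3] + 0.5 * mean(r^2) * exp(-2 * theta[3])
}

# Analytic gradient of nll with respect to (a, b, log_sigma).
nll_grad <- function(theta, y, x) {
  r <- y - theta[1] - theta[2] * x
  w <- exp(-2 * theta[3])
  c(-mean(r) * w, -mean(r * x) * w, 1 - mean(r^2) * w)
}

# Fit every SNP column in turn. With warm = TRUE each fit starts from the
# previous solution; with warm = FALSE each fit starts from the same cold start.
run_sweep <- function(y, snps, warm) {
  start <- c(0, 0, 0)
  theta <- start
  evals <- 0
  slopes <- numeric(ncol(snps))
  for (j in seq_len(ncol(snps))) {
    init <- if (warm) theta else start
    fit <- optim(init, nll, nll_grad, y = y, x = snps[, j],
                 method = "BFGS", control = list(reltol = 1e-10))
    theta <- fit$par
    evals <- evals + fit$counts[["function"]]
    slopes[j] <- theta[2]
  }
  list(slopes = slopes, evals = evals)
}

# Simulated data (hypothetical): 500 individuals, 20 SNPs coded 0/1/2.
set.seed(1)
n <- 500
m <- 20
snps <- matrix(rbinom(n * m, 2, 0.3), n, m)
y <- 2 + 0.2 * snps[, 1] + rnorm(n, sd = 1.5)

cold <- run_sweep(y, snps, warm = FALSE)
warm <- run_sweep(y, snps, warm = TRUE)

# Total objective evaluations for each strategy, and the largest
# disagreement between the two sets of slope estimates.
print(c(cold = cold$evals, warm = warm$evals))
print(max(abs(cold$slopes - warm$slopes)))
```

On a toy problem like this, the warm-started sweep would be expected to need fewer objective evaluations in total, because most parameters barely move from one SNP to the next. The size of the gain in a real Gen-SEM scan depends on the model and the data.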

Reproducing these numbers

Reproduction scripts live in the fastsem engine repository under benchmarks/. The R-side counterparts (using run_fastsem() against equivalent OpenMx and lavaan models) will be packaged in a future revision of fastsemR.

For everyday R workloads, the easiest way to spot the speedup is to take an existing umx/OpenMx model and swap mxRun() for run_fastsem() — see the getting started article.
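
A minimal timing sketch for that swap, under stated assumptions: time_fit is a hypothetical helper (not part of fastsemR), run_fastsem() is assumed to be exported by fastsemR and to accept an MxModel as its first argument in the same way mxRun() does, and the one-factor model is the standard OpenMx demoOneFactor example rather than one of the benchmark workloads above.

```r
# Hypothetical helper: time a single fitting function on a model and
# return both the fitted object and the elapsed wall-clock seconds.
time_fit <- function(fit_fun, model) {
  elapsed <- system.time(fit <- fit_fun(model))[["elapsed"]]
  list(fit = fit, seconds = elapsed)
}

# Only run the comparison when both packages are installed.
if (requireNamespace("OpenMx", quietly = TRUE) &&
    requireNamespace("fastsemR", quietly = TRUE)) {
  library(OpenMx)
  data(demoOneFactor)
  manifests <- names(demoOneFactor)

  # Standard one-factor RAM model from the OpenMx examples.
  model <- mxModel(
    "OneFactor", type = "RAM",
    manifestVars = manifests, latentVars = "G",
    mxPath(from = "G", to = manifests),
    mxPath(from = manifests, arrows = 2),
    mxPath(from = "G", arrows = 2, free = FALSE, values = 1),
    mxData(cov(demoOneFactor), type = "cov", numObs = nrow(demoOneFactor))
  )

  ref  <- time_fit(mxRun, model)
  # ASSUMPTION: run_fastsem() takes the MxModel as its first argument.
  fast <- time_fit(fastsemR::run_fastsem, model)

  print(c(OpenMx = ref$seconds, fastsem = fast$seconds))
}
```

A model this small fits almost instantly in either tool, so any difference is more likely to show up on larger FIML or ordinal models like those in the table above.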