Nous Research — Lighthouse Attention speeds up long-context pretraining 1.4–1.7×
Nous Research published Lighthouse Attention, a training-only hierarchical mechanism that pools queries, keys, and values symmetrically across a multi-resolution pyramid.
Tested on a 530M Llama-3 model at 98K context, it cuts attention compute from O(N·S·d) to O(S²·d) and delivers 1.40–1.69× wall-clock speedup against cuDNN SDPA with matching or lower final loss.
The wrapper is discarded after pretraining, so inference stays standard—no new runtime cost.