Defeating Nondeterminism in LLM Inference
Horace He and Thinking Machines tackle a core problem: LLM inference can produce different outputs on identical inputs, which breaks reproducibility and complicates debugging.
The post walks through the sources of nondeterminism (floating-point rounding, kernel scheduling, batch ordering) and practical techniques for making inference deterministic in production systems.
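To make the floating-point rounding point concrete, here is a minimal sketch in plain Python. It is not code from the post. The helper `chunked_sum` and the sample values are hypothetical, chosen only to show that floating-point addition is not associative, so the grouping of a reduction can change the result even though each grouping is itself repeatable.

```python
def chunked_sum(values, chunk_size):
    """Sum values by reducing fixed-size chunks first, then adding the partial sums.

    Hypothetical helper: it stands in for a reduction whose grouping
    depends on how the work happens to be split up.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total


# Illustrative values only: a small number next to two huge ones that cancel.
values = [0.1, 1e20, -1e20, 0.1]

# Same three operands, two groupings, two different answers.
left = (0.1 + 1e20) - 1e20   # 0.1 is lost when added to 1e20
right = 0.1 + (1e20 - 1e20)  # the large terms cancel first, so 0.1 survives
print(left, right)           # 0.0 0.1

# Same data, different chunk sizes: the result depends on the grouping,
# but repeating a given chunk size always gives the same result.
for chunk_size in (1, 2, 4):
    print(chunk_size, chunked_sum(values, chunk_size))
```

With these values, chunk sizes 1 and 4 return 0.1 while chunk size 2 returns 0.0. Each configuration is deterministic on its own; the answers differ only when the grouping changes, which is the kind of effect a reduction strategy that varies with workload could introduce.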