How mixture-of-experts models such as Mixtral, DeepSeek, and (reportedly) GPT-4 activate only a fraction of their parameters for each token. Routers, experts, and sparse activation explained.