Design an LLM Inference Serving System
Asked mainly at OpenAI and other AI companies. Tests understanding of GPU infrastructure, token-by-token streaming of inference output, load balancing across stateful GPU workers, and caching strategies for LLM workloads.
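As a rough illustration of why load balancing and caching interact here, the sketch below routes requests that share a prompt prefix to the same worker, so that worker's cached state for the prefix can be reused, and falls back to the least-loaded worker when the preferred one is full. Everything in it is an assumption made for illustration: the `Worker` class, the `max_requests` cap, the `prefix_chars` cutoff, and the use of plain modulo hashing (a production router would more likely use consistent hashing and token-level prefixes).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical GPU worker; fields are illustrative, not from any real API."""
    name: str
    active_requests: int = 0
    max_requests: int = 8  # assumed per-worker concurrency cap


def prefix_key(prompt: str, prefix_chars: int = 256) -> int:
    """Hash the leading characters of the prompt (a stand-in for a token prefix)."""
    digest = hashlib.sha256(prompt[:prefix_chars].encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_worker(prompt: str, workers: list[Worker]) -> Worker:
    """Prefer the worker likely to hold this prefix's cache; else the least loaded."""
    preferred = workers[prefix_key(prompt) % len(workers)]
    if preferred.active_requests < preferred.max_requests:
        chosen = preferred
    else:
        # Preferred worker is saturated: give up cache affinity to protect latency.
        chosen = min(workers, key=lambda w: w.active_requests)
    chosen.active_requests += 1
    return chosen


if __name__ == "__main__":
    pool = [Worker("gpu-0"), Worker("gpu-1"), Worker("gpu-2")]
    system_prompt = "You are a helpful assistant. " * 12  # longer than 256 chars
    first = pick_worker(system_prompt + "What is a KV cache?", pool)
    second = pick_worker(system_prompt + "Explain batching.", pool)
    print(first.name == second.name)
```

The trade-off this sketch makes is the one an interviewer is likely to probe: affinity improves cache reuse but can create hot spots, so the router needs some load-based escape hatch.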