System Design Questions | Chill Interview Learn

Design a Reproducible ML Configuration System

You are designing a configuration system for machine learning experiments. Researchers use this system to define training runs, evaluation jobs, model settings, dataset choices, optimizer parameters, runtime options, and experiment metadata. The same config system should work for quick local experiments and for scheduled jobs on remote training machines.
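
One possible starting point for the reproducibility requirement is to give every fully resolved config a stable fingerprint, so a local run and a remote run can be checked for identical settings. The sketch below is illustrative only: the class names, field names, and default values are hypothetical and do not come from the question.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


# All class and field names here are hypothetical, chosen for illustration.
@dataclass(frozen=True)
class OptimizerConfig:
    name: str = "adamw"
    learning_rate: float = 3e-4


@dataclass(frozen=True)
class ExperimentConfig:
    model: str
    dataset: str
    seed: int = 0
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)


def config_fingerprint(config: ExperimentConfig) -> str:
    """Return a SHA-256 hex digest of the fully resolved config."""
    # Sorting keys and fixing separators makes the JSON text canonical,
    # so equal configs always serialize to the same bytes.
    canonical = json.dumps(asdict(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


local_run = ExperimentConfig(model="tiny-transformer", dataset="toy-corpus")
remote_run = ExperimentConfig(model="tiny-transformer", dataset="toy-corpus")
changed_run = ExperimentConfig(model="tiny-transformer", dataset="toy-corpus", seed=1)

# Identical settings share a fingerprint; changing any field changes it.
print(config_fingerprint(local_run) == config_fingerprint(remote_run))
print(config_fingerprint(local_run) == config_fingerprint(changed_run))
```

Frozen dataclasses keep a config from being mutated after it is resolved, and storing the fingerprint with the experiment metadata gives a cheap way to confirm that two runs used the same settings.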

Review a Real-World Design Proposal

You are given an existing technical design document for a system the team is considering building. The document may describe a backend service, data platform, ML infrastructure component, internal tool, or product-facing system. Your task is not to design from a blank page, but to review the proposal like a senior engineer or engineering manager would in a real planning process.

Design a Scalable Data Platform for Multiple Teams

You are designing a data infrastructure platform for a company that collects large volumes of operational, product, and business data. Different teams need to use this data in different ways: some want near-real-time dashboards, some run offline analytics, some build derived datasets, and some need access to sensitive fields under strict controls.

Distribute a Large Model Checkpoint Across a GPU Cluster

You are building an internal deployment service that needs to roll out a new model checkpoint to every machine in a compute cluster. The checkpoint is very large, often hundreds of GB, and initially exists only in a single model repository. Before workers can serve traffic, each machine must have a complete and verified local copy of the model weights.
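
The "complete and verified local copy" requirement can be sketched as a per-machine check against a manifest of expected digests. This is a minimal sketch under stated assumptions: the manifest format (relative shard path mapped to a SHA-256 hex digest), the function names, and the chunk size are all hypothetical, and the question does not prescribe any of them.

```python
import hashlib
from pathlib import Path

# Assumed read size; files are hashed in chunks so that a shard far
# larger than memory can still be verified.
CHUNK_SIZE = 8 * 1024 * 1024


def sha256_of_file(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file incrementally and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the shards that are missing or whose digest does not match.

    `manifest` is a hypothetical mapping of relative shard path to the
    expected SHA-256 hex digest. An empty result means the local copy
    is complete and verified.
    """
    bad_shards = []
    for relative_path, expected_digest in manifest.items():
        shard = root / relative_path
        if not shard.is_file() or sha256_of_file(shard) != expected_digest:
            bad_shards.append(relative_path)
    return bad_shards
```

A worker would report itself ready to serve only when `verify_checkpoint` returns an empty list. Checking per shard rather than per checkpoint means a failed transfer only requires re-fetching the shards that were reported.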

Design a Reliable 1-on-1 Messaging Service

Design a web-based direct messaging system where users can exchange private 1-on-1 messages in real time. The product does not need group chats, public channels, reactions, media attachments, or multi-device sync in the initial version. Each user is assumed to use one active web client.

Design a High-Throughput LLM Inference Gateway

You are designing the serving layer for a large language model product. Users send prompts to an API and expect generated text back, either as a complete response or as a streamed sequence of tokens. Behind the API, the system runs multiple model replicas on GPU machines and must keep those expensive GPUs highly utilized without making user-facing latency unpredictable.
