Choose your model
All models run locally in your browser. No data leaves your device.
The first load downloads the model weights to your browser; this happens only once. After that, the model loads from cache in seconds.
WebLLM and MediaPipe models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web, which prefers WebGPU but can fall back to WebAssembly.
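The backend choice can be sketched as a simple feature check, the way ONNX Runtime Web decides between WebGPU and WebAssembly. This is a minimal illustration, not the runtime's actual selection logic; `navigator.gpu` is the standard WebGPU entry point and is undefined on unsupported browsers.

```typescript
// Pick an execution backend: prefer WebGPU, fall back to WebAssembly.
type Backend = "webgpu" | "wasm";

function pickBackend(nav: { gpu?: unknown }): Backend {
  // navigator.gpu exists only in WebGPU-capable browsers.
  return nav.gpu !== undefined ? "webgpu" : "wasm";
}

// In a browser you would call pickBackend(navigator).
```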
Three paths to in-browser AI
WebLLM + MLC
HuggingFace (MLC-format weights) → TVM / MLC compiler (ahead-of-time compilation) → WebGPU compute shaders (pre-optimized GPU kernels) → your GPU
Models are compiled ahead-of-time using Apache TVM / MLC. The compiler transforms model weights into optimized WebGPU compute shaders that run directly on your GPU.
Used by: SmolLM2, Qwen3 4B, Phi-3.5, Llama 3.2
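The WebLLM path above can be sketched with the `@mlc-ai/web-llm` package. The model ID below is one example from WebLLM's prebuilt list, not necessarily one this page uses; the code runs in a WebGPU-capable browser, not in Node.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function run() {
  // Downloads (or loads from cache) the pre-compiled weights and shaders.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (p) => console.log(p.text),
  });

  // OpenAI-style chat API, running entirely on the local GPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Say hello in five words." }],
  });
  console.log(reply.choices[0].message.content);
}
```

The OpenAI-compatible surface is a deliberate design choice in WebLLM: existing chat-completion code can be pointed at the local engine with minimal changes.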
Transformers.js + ONNX
HuggingFace (ONNX-format model) → ONNX Runtime Web (builds execution plan) → WebGPU, with WASM fallback → your GPU / CPU
ONNX Runtime Web interprets the model graph and executes it via WebGPU, or falls back to WebAssembly.
Used by: Qwen3.5 0.8B, 2B, 4B
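A minimal sketch of the Transformers.js path, assuming the v3 package name `@huggingface/transformers` and an example ONNX-converted model repo; both are illustrative, not this page's exact configuration.

```typescript
import { pipeline } from "@huggingface/transformers";

async function run() {
  // ONNX Runtime Web executes the graph on WebGPU, or WASM if unavailable.
  const generate = await pipeline(
    "text-generation",
    "onnx-community/Qwen2.5-0.5B-Instruct",
    { device: "webgpu", dtype: "q4" },
  );

  const out = await generate(
    [{ role: "user", content: "Write a haiku about caching." }],
    { max_new_tokens: 64 },
  );
  console.log(out);
}
```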
MediaPipe + LiteRT
HuggingFace (LiteRT model file) → MediaPipe GenAI (LLM Inference API) → WebGPU compute (multimodal: text + images) → your GPU
Google's MediaPipe LLM Inference API loads Gemma models and supports multimodal input (text and images).
Used by: Gemma 3n E2B, Gemma 3n E4B
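The MediaPipe path can be sketched with the `@mediapipe/tasks-genai` package. The WASM path and model file name below are placeholders, not this page's actual assets; the `.task` file would be a Gemma LiteRT bundle you host yourself.

```typescript
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

async function run() {
  // Load the GenAI task runtime (WASM path shown is a placeholder).
  const genai = await FilesetResolver.forGenAiTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm",
  );

  // Point at a hosted Gemma LiteRT bundle (placeholder file name).
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: "/models/gemma.task" },
    maxTokens: 512,
  });

  // Text-only call; see the MediaPipe docs for image (multimodal) input.
  const answer = await llm.generateResponse("Summarize WebGPU in one line.");
  console.log(answer);
}
```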
All three paths use WebGPU for GPU acceleration (Transformers.js can additionally fall back to WebAssembly). Model weights are cached in your browser after the first download.