Choose your model

All models run locally in your browser. No data leaves your device.

The first load downloads the model weights to your browser (a one-time download). After that, the model loads from cache in seconds.
WebLLM and MediaPipe models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web, which can fall back to WebAssembly where WebGPU is unavailable.
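A page can check for WebGPU before choosing a runtime. A minimal sketch, assuming only the standard `navigator.gpu` feature test:

```javascript
// Detect whether this environment exposes WebGPU.
// `navigator.gpu` is defined only in WebGPU-capable browsers
// (e.g. Chrome/Edge 113+); elsewhere this evaluates to false.
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;

// Pick a backend accordingly: WebGPU-only engines (WebLLM, MediaPipe)
// need the real thing; Transformers.js can fall back to WASM.
const backend = hasWebGPU ? "webgpu" : "wasm";
```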
How do these models run in your browser? ↓

Three paths to in-browser AI

WebLLM + MLC

HuggingFace (MLC-format weights) → TVM / MLC compiler (ahead-of-time compilation) → WebGPU compute shaders (pre-optimized GPU kernels) → Your GPU

Models are compiled ahead of time with Apache TVM / MLC. The compiler transforms the model into optimized WebGPU compute shaders that run directly on your GPU.
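A hedged sketch of loading one of these models with the WebLLM package (`@mlc-ai/web-llm`). The model ID shown is an assumption and must match an entry in WebLLM's prebuilt model list; nothing runs until the function is called in a WebGPU-capable browser:

```javascript
// Sketch only: requires a WebGPU-capable browser and the
// @mlc-ai/web-llm package (imported lazily below).
async function chatWithWebLLM(userText) {
  const webllm = await import("@mlc-ai/web-llm");
  // First call downloads and compiles the weights; later calls hit the cache.
  const engine = await webllm.CreateMLCEngine(
    "Llama-3.2-1B-Instruct-q4f16_1-MLC", // assumed model ID
    { initProgressCallback: (p) => console.log(p.text) }
  );
  // WebLLM exposes an OpenAI-style chat completions API.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: userText }],
  });
  return reply.choices[0].message.content;
}
```

In a browser you would call `chatWithWebLLM("Hello")` after the WebGPU check above succeeds.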

Used by

SmolLM2, Qwen3 4B, Phi-3.5, Llama 3.2

Transformers.js + ONNX

HuggingFace (ONNX-format model) → ONNX Runtime Web (builds execution plan) → WebGPU, with WASM fallback → Your GPU / CPU

ONNX Runtime Web interprets the model graph and executes it via WebGPU, falling back to WebAssembly when WebGPU is unavailable.
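A hedged sketch using the Transformers.js package (`@huggingface/transformers`). The model repo name is an assumption; nothing runs until the function is called:

```javascript
// Sketch only: requires the @huggingface/transformers package
// (imported lazily below).
async function generateWithTransformersJs(prompt) {
  const { pipeline } = await import("@huggingface/transformers");
  // device: "webgpu" requests GPU execution; ONNX Runtime Web runs
  // on WASM instead if you pass "wasm" or WebGPU is unavailable.
  const generator = await pipeline(
    "text-generation",
    "onnx-community/Qwen2.5-0.5B-Instruct", // assumed model repo
    { device: "webgpu" }
  );
  const out = await generator(prompt, { max_new_tokens: 128 });
  return out[0].generated_text;
}
```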

Used by

Qwen3.5 0.8B, 2B, 4B

MediaPipe + LiteRT

HuggingFace (LiteRT model file) → MediaPipe GenAI (LLM Inference API) → WebGPU compute (multimodal: text + images) → Your GPU

Google's MediaPipe LLM Inference API loads Gemma models and supports multimodal input (text and images).
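A hedged sketch using the MediaPipe GenAI package (`@mediapipe/tasks-genai`). The CDN path and model file path are assumptions; nothing runs until the function is called:

```javascript
// Sketch only: requires the @mediapipe/tasks-genai package and a
// LiteRT model file (both assumed; imported lazily below).
async function askGemma(prompt) {
  const { FilesetResolver, LlmInference } = await import(
    "@mediapipe/tasks-genai"
  );
  // Load the MediaPipe WASM runtime files (CDN path is an assumption).
  const genaiFileset = await FilesetResolver.forGenAiTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
  );
  const llm = await LlmInference.createFromOptions(genaiFileset, {
    baseOptions: { modelAssetPath: "/models/gemma-3n-E2B.task" }, // assumed path
    maxTokens: 512,
  });
  return llm.generateResponse(prompt);
}
```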

Used by

Gemma 3n E2B, Gemma 3n E4B

All three methods use WebGPU for GPU acceleration. All model weights are cached in your browser after the first download.
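Cached weights live in the browser's storage. A sketch for inspecting named caches via the standard Cache Storage API, assuming the runtime stores weights there (the exact cache names used by each engine are an implementation detail):

```javascript
// Sketch only: the Cache Storage API (`caches`) exists in browsers;
// in other environments this resolves to an empty list.
async function listModelCaches() {
  if (typeof caches === "undefined") return [];
  return await caches.keys(); // cache names; contents are engine-specific
}
```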
