Choose your model

All models run locally in your browser. No data leaves your device.

The first load downloads the model weights to your browser. This is a one-time download; after that, the model loads from cache in seconds.
WebLLM models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web.
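A page can check the WebGPU requirement above before offering a model. The following is a minimal sketch, not this page's actual code: `pickBackend`, `NavigatorLike`, and `GpuLike` are hypothetical names, and the only real browser API assumed is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Returns "webgpu" only if the API exists AND an adapter is actually granted.
// Any missing API, null adapter, or thrown error falls back to "wasm".
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```

In a browser this would be called as `pickBackend(navigator)`. Checking for an adapter, rather than only for the presence of `navigator.gpu`, covers browsers that expose the API but have no usable GPU.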

Two paths to in-browser AI

WebLLM + MLC

Cloudflare R2: MLC-format weights
TVM / MLC Compiler: Ahead-of-time compilation
WebGPU Compute Shaders: Pre-optimized GPU kernels
Your GPU

Models are compiled ahead of time with Apache TVM / MLC. The compiler turns the model's computation graph into optimized WebGPU compute shaders that run directly on your GPU; the weights are converted to MLC format and downloaded separately.
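In practice, an ahead-of-time compiled model is described by two artifacts: the MLC-format weights and the compiled library containing the GPU kernels. The sketch below illustrates that pairing. The field names `model`, `model_id`, and `model_lib` are assumed to mirror WebLLM's per-model config entries, and `ModelRecordSketch`, `validateRecord`, and all URLs are hypothetical placeholders rather than values used by this page.

```typescript
// Hypothetical mirror of the per-model fields in a WebLLM-style app config.
// Field names are an assumption; check the library's own types before relying on them.
interface ModelRecordSketch {
  model: string;     // URL of the folder holding the MLC-format weight shards
  model_id: string;  // identifier the app uses to select this model
  model_lib: string; // URL of the ahead-of-time compiled library (.wasm with WebGPU kernels)
}

// Placeholder values: these are not real endpoints.
const exampleRecord: ModelRecordSketch = {
  model: "https://example.com/models/SmolLM2-1.7B-MLC/",
  model_id: "SmolLM2-1.7B-example",
  model_lib: "https://example.com/libs/SmolLM2-1.7B-webgpu.wasm",
};

// Basic sanity check before handing a record to the runtime.
// Returns a list of problems; an empty list means the record looks usable.
function validateRecord(r: ModelRecordSketch): string[] {
  const problems: string[] = [];
  if (!r.model_id.trim()) problems.push("model_id is empty");
  if (!/^https:\/\//.test(r.model)) problems.push("model must be an https URL");
  if (!/^https:\/\/.+\.wasm$/.test(r.model_lib)) {
    problems.push("model_lib must be an https URL ending in .wasm");
  }
  return problems;
}
```

Keeping the weights and the compiled library as separate URLs means the same kernels can, in principle, be reused across fine-tunes that share an architecture.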

Used by

SmolLM2 1.7B, Mistral 7B, Llama 3.2

Transformers.js + ONNX

Cloudflare R2: ONNX-format model
ONNX Runtime Web: Builds execution plan
WebGPU, with WASM fallback
Your GPU / CPU

ONNX Runtime Web interprets the model graph and executes it via WebGPU, or falls back to WebAssembly.
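The WebGPU-then-WebAssembly fallback described above can be sketched as a loop over preferred devices. This is an illustration under assumptions, not the page's implementation: `loadWithFallback` and its `load` callback are hypothetical, with `load` standing in for whatever creates the session (for example, a Transformers.js `pipeline` call with a `device` option, or an ONNX Runtime Web session with an `executionProviders` list).

```typescript
type Device = "webgpu" | "wasm";

// Tries each device in order and returns the first session that loads.
// `load` is a stand-in for the real session/pipeline factory.
async function loadWithFallback<T>(
  load: (device: Device) => Promise<T>,
  devices: Device[] = ["webgpu", "wasm"],
): Promise<{ device: Device; session: T }> {
  const errors: string[] = [];
  for (const device of devices) {
    try {
      return { device, session: await load(device) };
    } catch (err) {
      // Remember why this device failed, then try the next one.
      errors.push(`${device}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`No backend could load the model (${errors.join("; ")})`);
}
```

Returning the device alongside the session lets the UI report whether the model ended up on the GPU or the CPU.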

Used by

Qwen3.5 0.8B, 2B, 4B · Gemma 4 E2B · Gemma 4 E4B

Both methods use WebGPU for GPU acceleration. All model weights are cached in your browser after the first download.
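The download-once, load-from-cache behavior can be sketched as a cache-first fetch. The page does not say which storage mechanism is used, so this assumes something shaped like the browser Cache API (`match` and `put`); `cachedFetch`, `CacheLike`, and `ResponseLike` are hypothetical names introduced for illustration.

```typescript
// Structural types so the sketch does not depend on DOM typings.
interface ResponseLike<R> {
  ok: boolean;
  clone(): R;
}
interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Returns the cached response if present; otherwise downloads it,
// stores a copy in the cache, and returns the original.
async function cachedFetch<R extends ResponseLike<R>>(
  url: string,
  cache: CacheLike<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<{ response: R; fromCache: boolean }> {
  const hit = await cache.match(url);
  if (hit) return { response: hit, fromCache: true };

  const response = await fetchFn(url);
  if (!response.ok) throw new Error(`Download failed: ${url}`);
  await cache.put(url, response.clone()); // store a copy; the caller consumes the original
  return { response, fromCache: false };
}
```

In a browser, the `cache` argument could be the result of `caches.open(...)` with a cache name of your choosing, and `fetchFn` could be the global `fetch`. Failed downloads are not cached, so a retry starts clean.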