Choose your model
All models run locally in your browser. No data leaves your device.
The first load downloads the model weights to your browser; this happens only once. After that, the model loads from cache in seconds.
WebLLM and MediaPipe models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web, which prefers WebGPU but can fall back to WebAssembly.
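The backend choice can be sketched as a simple feature check, the way ONNX Runtime Web decides between WebGPU and WebAssembly. This is a minimal illustration, not the runtime's actual selection logic; `navigator.gpu` is the standard WebGPU entry point and is undefined on unsupported browsers.

```typescript
// Pick an execution backend: prefer WebGPU, fall back to WebAssembly.
type Backend = "webgpu" | "wasm";

function pickBackend(nav: { gpu?: unknown }): Backend {
  // navigator.gpu exists only in WebGPU-capable browsers.
  return nav.gpu !== undefined ? "webgpu" : "wasm";
}

// In a browser you would call pickBackend(navigator).
```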
Three paths to in-browser AI
WebLLM + MLC
HuggingFace (MLC-format weights) → TVM / MLC compiler (ahead-of-time compilation) → WebGPU compute shaders (pre-optimized GPU kernels) → your GPU
Models are compiled ahead-of-time using Apache TVM / MLC. The compiler transforms model weights into optimized WebGPU compute shaders that run directly on your GPU.
Used by: SmolLM2, Qwen3 4B, Phi-3.5, Llama 3.2
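The WebLLM path above can be sketched with the `@mlc-ai/web-llm` package. The model ID below is one example from WebLLM's prebuilt list, not necessarily one this page uses; the code runs in a WebGPU-capable browser, not in Node.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function run() {
  // Downloads (or loads from cache) the pre-compiled weights and shaders.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (p) => console.log(p.text),
  });

  // OpenAI-style chat API, running entirely on the local GPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Say hello in five words." }],
  });
  console.log(reply.choices[0].message.content);
}
```

The OpenAI-compatible surface is a deliberate design choice in WebLLM: existing chat-completion code can be pointed at the local engine with minimal changes.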
Transformers.js + ONNX
HuggingFace (ONNX-format model) → ONNX Runtime Web (builds execution plan) → WebGPU, with WASM fallback → your GPU / CPU
ONNX Runtime Web interprets the model graph and executes it via WebGPU, or falls back to WebAssembly.
Used by: Qwen3.5 0.8B, 2B, 4B
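A minimal sketch of the Transformers.js path, assuming the v3 package name `@huggingface/transformers` and an example ONNX-converted model repo; both are illustrative, not this page's exact configuration.

```typescript
import { pipeline } from "@huggingface/transformers";

async function run() {
  // ONNX Runtime Web executes the graph on WebGPU, or WASM if unavailable.
  const generate = await pipeline(
    "text-generation",
    "onnx-community/Qwen2.5-0.5B-Instruct",
    { device: "webgpu", dtype: "q4" },
  );

  const out = await generate(
    [{ role: "user", content: "Write a haiku about caching." }],
    { max_new_tokens: 64 },
  );
  console.log(out);
}
```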
MediaPipe + LiteRT
HuggingFace (LiteRT model file) → MediaPipe GenAI (LLM Inference API) → WebGPU compute (multimodal: text + images) → your GPU
Google's MediaPipe LLM Inference API loads Gemma models and supports multimodal input (text and images).
Used by: Gemma 3n E2B, Gemma 3n E4B
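The MediaPipe path can be sketched with the `@mediapipe/tasks-genai` package. The WASM path and model file name below are placeholders, not this page's actual assets; the `.task` file would be a Gemma LiteRT bundle you host yourself.

```typescript
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

async function run() {
  // Load the GenAI task runtime (WASM path shown is a placeholder).
  const genai = await FilesetResolver.forGenAiTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm",
  );

  // Point at a hosted Gemma LiteRT bundle (placeholder file name).
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: "/models/gemma.task" },
    maxTokens: 512,
  });

  // Text-only call; see the MediaPipe docs for image (multimodal) input.
  const answer = await llm.generateResponse("Summarize WebGPU in one line.");
  console.log(answer);
}
```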
All three paths use WebGPU for GPU acceleration (Transformers.js can additionally fall back to WebAssembly). Model weights are cached in your browser after the first download.