🔒 The AI runs on your device: messages never leave your browser. No server, no API key, no cost.

A private AI chat that runs in your browser

A real language model, downloaded once and run entirely on your own device with WebGPU. No server, no API key, no per-message cost, plus personas, a prompt library, markdown answers, saved chats and export. Your conversation stays 100% private.

⚙ checking WebGPU… · 🔒 fully private · 💸 zero cost · 📴 offline after load
Pick a model and press Load. First load downloads it once, then caches.

⚙ Generation controls (Pro)

Adjust how the model behaves: set creativity (temperature), focus (top-p) and answer length, and save a setup as a preset.
Temperature (creativity): 0.70
Top-p (focus): 0.95
Max response length (tokens): 800
Preset
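
As a rough illustration of what these controls do, here is a minimal TypeScript sketch of temperature scaling followed by top-p (nucleus) sampling for a single generation step. It is not PocketLLM's own code: `SamplingOptions` and `sampleToken` are hypothetical names, and the real runtime may sample differently. Max response length is simply a cap on how many times a step like this repeats.

```typescript
// Hypothetical names throughout; a sketch, not PocketLLM's implementation.
interface SamplingOptions {
  temperature: number; // > 0; lower values make output more predictable
  topP: number;        // in (0, 1]; lower values narrow the choice of tokens
}

// Convert raw scores (logits) to probabilities, scaled by temperature.
function softmax(logits: number[], temperature: number): number[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((l) => Math.exp((l - max) / temperature));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick one token index. `random` is injectable so results can be reproduced.
function sampleToken(
  logits: number[],
  opts: SamplingOptions,
  random: () => number = Math.random,
): number {
  if (logits.length === 0) throw new Error("logits must not be empty");
  if (opts.temperature <= 0) throw new Error("temperature must be > 0");

  const ranked = softmax(logits, opts.temperature)
    .map((p, index) => ({ p, index }))
    .sort((a, b) => b.p - a.p);

  // Keep the smallest set of top tokens whose total probability reaches topP.
  const nucleus: { p: number; index: number }[] = [];
  let mass = 0;
  for (const entry of ranked) {
    nucleus.push(entry);
    mass += entry.p;
    if (mass >= opts.topP) break;
  }

  // Draw from the nucleus in proportion to each token's probability.
  let remaining = random() * mass;
  let chosen = -1;
  for (const entry of nucleus) {
    chosen = entry.index;
    remaining -= entry.p;
    if (remaining <= 0) break;
  }
  return chosen;
}
```

With a low top-p only the most likely tokens survive the cut, which is why the control reads as "focus"; a low temperature sharpens the probabilities before that cut is made.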

✨ Prompt library

Tap a prompt to drop it into the box, then edit and send.

Unlock 24+ more prompts across writing, coding, study, business and everyday life.

🔓 Unlock PocketLLM Pro

Pro adds custom personas, generation controls, the full prompt library, and TXT/JSON export with full backup & restore. Everything still runs 100% on your device. Pro is just more control: no accounts, no tracking.

Try it now with demo code AV-POCKETLLM-DEMO.

Load a model above, pick a persona, then say hello 👋

Real AI, no cloud and no cost

PocketLLM loads a compact open-source language model and runs it directly on your computer's graphics hardware through WebGPU. Because the model lives in your browser, there is no server to pay for, no API key to manage, and nothing to leak: every message you type is processed locally and then kept only on your own device. It is a working demonstration of where browser AI has reached in 2026: capable models, running for free, on hardware you already own.
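
For readers curious how a page can tell whether WebGPU is usable, the sketch below shows one common approach: check that the browser exposes `navigator.gpu`, then ask it for an adapter with `requestAdapter()`, which resolves to `null` when no suitable GPU is found. The function name `checkWebGPU`, the status strings and the small `NavigatorLike` type are assumptions made for illustration, not PocketLLM's actual code.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical status values for a "checking WebGPU" badge.
type WebGPUStatus = "ok" | "no-api" | "no-adapter";

// Resolves to "ok" only when the API exists and an adapter is available.
async function checkWebGPU(nav: NavigatorLike): Promise<WebGPUStatus> {
  if (!nav.gpu) return "no-api"; // this browser does not expose WebGPU
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "ok" : "no-adapter"; // null means no suitable GPU
  } catch {
    return "no-adapter"; // treat unexpected failures as "not usable"
  }
}

// In a browser, pass the global `navigator` object:
//   checkWebGPU(navigator).then((status) => console.log(status));
```

Taking the navigator as a parameter keeps the check easy to exercise outside a browser, where no GPU object exists.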

Personas that steer the assistant

A raw chatbot is fine, but the same model becomes far more useful when you tell it who to be. PocketLLM ships with ready-made personas: a Coding Helper that answers in clean code blocks, an Email Writer that drafts in a natural tone, a Brainstorm Partner that fires off many short ideas, a Study Tutor that explains step by step, and a faithful Translator. Switch persona at any time, or build and save your own with a custom system prompt so the assistant always starts in exactly the mindset you want.
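
Under the hood, a persona is usually nothing more than a system prompt placed at the start of the conversation. The sketch below assumes the widely used system/user/assistant message format; the type names, `buildMessages`, and the wording of the example prompt are all invented for illustration and are not taken from PocketLLM.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface Persona {
  name: string;
  systemPrompt: string;
}

// Example persona; the prompt text is made up for this sketch.
const codingHelper: Persona = {
  name: "Coding Helper",
  systemPrompt: "You are a coding assistant. Answer with short, clean code blocks.",
};

// Build the message list handed to the model for its next reply.
function buildMessages(
  persona: Persona,
  history: ChatMessage[],
  userText: string,
): ChatMessage[] {
  return [
    { role: "system", content: persona.systemPrompt },
    // Drop earlier system messages so switching persona takes effect at once.
    ...history.filter((m) => m.role !== "system"),
    { role: "user", content: userText },
  ];
}
```

Because the persona is applied each time the list is built, switching mid-chat only changes the first message; the rest of the history is passed through untouched.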

A prompt library so you never face a blank box

The hardest part of using any AI is knowing what to ask. The built-in prompt library gives you tested starting points across writing, coding, study, business and everyday life. Tap one and it drops straight into the message box ready to edit. It turns "I'm not sure how to phrase this" into a one-tap head start, and it is a big reason small on-device models punch above their size: a good prompt does a lot of the work.

Markdown answers, saved chats and export

Responses render as proper markdown (headings, lists, and syntax-friendly code blocks with one-tap copy), so answers are easy to read and reuse. Every conversation is saved in your browser so you can rename it, return to it, or delete it later, and a live tokens-per-second readout shows exactly how fast your hardware is generating. When you want to keep something, export the current chat to Markdown for free; Pro adds plain-text and JSON export plus a full backup and restore of every saved conversation, so your data is always portable and never locked in.
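
To make the export and the speed readout concrete, here is a small sketch of how a saved chat could be turned into a Markdown document and how tokens per second can be computed from a token count and elapsed time. The `SavedChat` shape, the "You"/"Assistant" labels and both function names are assumptions for illustration; PocketLLM's real export layout may differ.

```typescript
interface ExportMessage {
  role: "user" | "assistant";
  content: string;
}

interface SavedChat {
  title: string;
  messages: ExportMessage[];
}

// Render a chat as Markdown: a title heading, then one block per message.
function chatToMarkdown(chat: SavedChat): string {
  const parts: string[] = [`# ${chat.title}`];
  for (const m of chat.messages) {
    const speaker = m.role === "user" ? "You" : "Assistant";
    parts.push(`**${speaker}:**\n\n${m.content}`);
  }
  return parts.join("\n\n") + "\n";
}

// Generation speed: tokens produced divided by elapsed time in seconds.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  return elapsedMs > 0 ? (tokenCount * 1000) / elapsedMs : 0;
}
```

For example, 50 tokens generated in 2,000 ms works out to 25 tokens per second.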

Private by design

Nothing is uploaded and there is no account. Once the model has cached you can disconnect from the internet entirely and keep chatting. These small models (0.5 to 1.5 billion parameters) are quick and genuinely useful for everyday tasks such as questions, drafting, brainstorming, rewriting and summarising, while heavy reasoning or large-scale coding still belongs to big cloud models. For fast, private, zero-cost help, on-device is hard to beat.

Frequently asked questions

Where does the AI run?
Entirely on your own device, via WebGPU, inside the browser tab. No server, no API key: your messages never leave your computer.
Is it really free?
Yes. The model runs on your GPU, so there is no per-message fee. It downloads once, caches, and runs offline thereafter.
What do I need?
A modern desktop browser with WebGPU (Chrome, Edge or Brave) on a reasonably recent computer. Larger models need more GPU memory. The smallest model can run on capable phones.
Are my conversations saved or uploaded?
Nothing is uploaded. Chats are stored only in your browser (localStorage) so you can return to them. Rename, delete or export them anytime; clearing browser data removes them.
Can I change the AI's personality or settings?
Yes. Pick a persona such as Coding Helper or Study Tutor, or create your own. Pro adds temperature, top-p and length controls and the full prompt library.
How does it compare to ChatGPT?
These are small models, so they are great for quick questions, drafting and summarising, but not as strong as large cloud models on complex reasoning. The upside is total privacy and zero cost.
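
For the technically curious: the answer above on saved chats mentions localStorage. A minimal sketch of that kind of browser-only store could look like the following. The storage key, the `StoredChat` shape and the function names are hypothetical; only `getItem` and `setItem` are real Web Storage methods, and `window.localStorage` satisfies the small `KeyValueStore` interface used here.

```typescript
// Subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface StoredChat {
  id: string;
  title: string;
  messages: { role: string; content: string }[];
}

const STORAGE_KEY = "pocketllm.chats"; // hypothetical key name

// Read all chats; missing or corrupted data yields an empty list.
function loadChats(store: KeyValueStore): StoredChat[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as StoredChat[]) : [];
  } catch {
    return [];
  }
}

// Insert a chat, or replace the existing one with the same id (e.g. a rename).
function saveChat(store: KeyValueStore, chat: StoredChat): void {
  const others = loadChats(store).filter((c) => c.id !== chat.id);
  store.setItem(STORAGE_KEY, JSON.stringify([...others, chat]));
}

// Remove a chat by id; unknown ids are ignored.
function deleteChat(store: KeyValueStore, id: string): void {
  const remaining = loadChats(store).filter((c) => c.id !== id);
  store.setItem(STORAGE_KEY, JSON.stringify(remaining));
}

// In a browser: saveChat(window.localStorage, chat);
```

Everything lives under a single key in the browser's own storage, which is why clearing browser data removes the chats and why nothing needs to be sent anywhere.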
Note: requires a desktop browser with WebGPU (Chrome/Edge/Brave). First load downloads the model (~350 MB to ~1.1 GB depending on choice); this is a one-time download that then caches. Responses are AI-generated by a small model and can be wrong or made-up, so verify anything important. Not for medical, legal or financial advice.