A real language model, downloaded once and run entirely on your own device with WebGPU. No server, no API key and no per-message cost, plus personas, a prompt library, markdown answers, saved chats and export. Your conversations stay 100% private.
Tap a prompt to drop it into the message box, then edit it and send.
Pro adds custom personas, generation controls, the full prompt library, and TXT/JSON export with complete backup and restore. Everything still runs 100% on your device: Pro simply gives you more control, with no accounts and no tracking.
Try it now with demo code AV-POCKETLLM-DEMO.
PocketLLM loads a compact open-source language model and runs it directly on your computer's graphics hardware through WebGPU. Because the model lives in your browser, there is no server to pay for, no API key to manage and nothing to leak. Every message you type is processed locally and stored only on your own device. It is a working demonstration of how far browser AI has come by 2026: capable models, running for free, on hardware you already own.
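Because everything depends on WebGPU, a page like this has to check for it before downloading any model weights. The sketch below shows one way to do that in TypeScript, relying only on the standard `navigator.gpu.requestAdapter()` entry point. The `hasWebGPU` name and the small local `GpuLike` type are illustrative assumptions, not PocketLLM's actual code. The local type lets the snippet compile without the `@webgpu/types` package.

```typescript
// Minimal local shape of navigator.gpu so this compiles without @webgpu/types.
// (Assumption for illustration; real projects usually install @webgpu/types.)
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// Hypothetical helper: resolves true only if the browser exposes WebGPU
// AND can hand back a usable GPU adapter.
async function hasWebGPU(): Promise<boolean> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  const gpu = nav?.gpu;
  if (!gpu) {
    return false; // WebGPU API not exposed (older browser, or not a browser at all)
  }
  try {
    const adapter = await gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU adapter was found
  } catch {
    return false; // treat unexpected errors as "not available"
  }
}

// Usage: gate the (large) model download on the check.
hasWebGPU().then((ok) => {
  console.log(ok ? "WebGPU available: safe to load the model" : "WebGPU unavailable");
});
```

Checking for an adapter, not just the presence of `navigator.gpu`, matters: a browser can expose the API yet still find no usable GPU.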
A raw chatbot is fine, but the same model becomes far more useful when you tell it who to be. PocketLLM ships with ready-made personas: a Coding Helper that answers in clean code blocks, an Email Writer that drafts in a natural tone, a Brainstorm Partner that fires off many short ideas, a Study Tutor that explains step by step, and a faithful Translator. Switch personas at any time, or build and save your own with a custom system prompt so the assistant always starts in exactly the mindset you want.
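Under the hood, a persona is typically just a system prompt placed ahead of the conversation. The sketch below assumes an OpenAI-style chat message format (`system`/`user`/`assistant` roles), which many in-browser runtimes accept. The `Persona` type, the `buildMessages` helper and the persona wording are hypothetical examples, not PocketLLM's built-in prompts.

```typescript
// Chat roles in the common OpenAI-style format (an assumption here).
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical persona shape: a display name plus the system prompt it injects.
interface Persona {
  name: string;
  systemPrompt: string;
}

// Illustrative persona; the wording is made up for this example.
const codingHelper: Persona = {
  name: "Coding Helper",
  systemPrompt: "You are a concise programming assistant. Answer with code in fenced blocks.",
};

// Put the persona's system prompt first, then the saved history, then the new message.
function buildMessages(persona: Persona, history: ChatMessage[], userText: string): ChatMessage[] {
  return [
    { role: "system", content: persona.systemPrompt },
    ...history,
    { role: "user", content: userText },
  ];
}

const messages = buildMessages(codingHelper, [], "Reverse a string in TypeScript.");
console.log(messages.length); // 2: the system prompt plus the user's message
```

With this design, switching personas just means rebuilding the list with a different `Persona`. The chat history itself does not need to change.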
The hardest part of using any AI is knowing what to ask. The built-in prompt library gives you tested starting points across writing, coding, study, business and everyday life: tap one and it drops straight into the message box, ready to edit. It turns "I'm not sure how to phrase this" into a one-tap head start, and it is a big reason small on-device models punch above their weight, since a good prompt does a lot of the work.
Responses render as proper markdown (headings, lists, and syntax-friendly code blocks with one-tap copy), so answers are easy to read and reuse. Every conversation is saved in your browser, so you can rename it, return to it or delete it later, and a live tokens-per-second readout shows exactly how fast your hardware is generating. When you want to keep something, export the current chat to Markdown for free; Pro adds plain-text and JSON export plus a full backup and restore of every saved conversation, so your data is always portable and never locked in.
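Both the speed readout and the Markdown export come down to simple bookkeeping. The sketch below makes three assumptions: the app counts generated tokens, it timestamps the start and end of each reply, and it stores each chat as a list of role-tagged messages. `tokensPerSecond`, `chatToMarkdown` and the heading layout are hypothetical choices, not PocketLLM's actual export format.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Tokens per second = tokens generated / seconds elapsed.
// Returns 0 for a zero or negative interval instead of dividing by zero.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0;
  return tokenCount / (elapsedMs / 1000);
}

// Turn a saved chat into a Markdown document: a title, then one section per message.
// Message bodies are already Markdown, so they are copied through unchanged.
function chatToMarkdown(title: string, messages: ChatMessage[]): string {
  const parts = [`# ${title}`];
  for (const m of messages) {
    const speaker = m.role === "user" ? "You" : "Assistant";
    parts.push(`## ${speaker}\n\n${m.content}`);
  }
  return parts.join("\n\n") + "\n";
}

// Usage: 120 tokens generated in 4 seconds.
console.log(tokensPerSecond(120, 4000)); // 30

const md = chatToMarkdown("Trip ideas", [
  { role: "user", content: "Suggest a weekend trip." },
  { role: "assistant", content: "- Hike a coastal trail\n- Visit a local museum" },
]);
console.log(md);
```

In a browser, the resulting string could then be offered as a file by wrapping it in a `Blob` with type `text/markdown` and linking to it via `URL.createObjectURL`.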
Nothing is uploaded and there is no account. Once the model is cached, you can disconnect from the internet entirely and keep chatting. These small models (0.5 to 1.5 billion parameters) are quick and genuinely useful for everyday tasks such as questions, drafting, brainstorming, rewriting and summarising, while heavy reasoning and large-scale coding still belong to big cloud models. For fast, private, zero-cost help, on-device is hard to beat.