Watchfire

8 billion parameters. One bit per weight.
Running on your phone. No cloud. No compromise.

Your AI lives on your device. It never phones home. It works on the subway, in airplane mode, in the middle of nowhere. It's yours.

8B
Parameters
1-bit
Per weight
1.1 GB
Total size
0
Server calls
Capabilities

Everything runs on your device

No API keys. No rate limits. No terms of service changes. Full language model inference, entirely local.

1-Bit Inference

PrismML Bonsai models. Each weight is a single bit — a direction, not a magnitude. ARM NEON sign-flip XOR. No multiplication.

Lodestone Crystals

Domain knowledge pre-crystallized and compressed 7.2×. The model arrives already understanding your field.

Modulum Armor

Six optimization pieces working in concert. Attention shaping. Active vocabulary. Null cone acceleration. Compression. Intelligent eviction. Spectral coverage.

Fully Private

Zero telemetry. Zero server calls. Zero data collection. Works in airplane mode. Your conversations never leave your phone.

Domain Library

Fiction and medical lodestones. PRISMA screenplay. EMS protocols from 5,622 clinical encounters. Switch domains in one tap.

Fire & Ice

Dark mode with warm ember accents. Light mode with cool arctic tones. Adapts to you. Persists across sessions.

Getting started

Running in under a minute

1

Get the app

10 MB download. The full inference engine ships inside.

2

Pick a model

1.7B for speed. 8B for depth. Downloads with resume support.

3

Load a lodestone

Pre-crystallized domain knowledge. Fiction or medical. The model starts warm.

4

Ask anything

Streaming responses. Conversation history. Stop anytime. No internet needed.

Knowledge domains

Crystallized expertise

Lodestones are pre-computed attention states — the model's understanding of a domain, compressed and frozen. Load one and the model already knows.

Fiction

PRISMA

Anime screenplay analysis. Characters, arcs, themes, lore, and scene-by-scene knowledge of the complete script.

1,964
Tokens
7.2x
Compressed
60 MB
1.7B model
78 MB
8B model
Medical

Emergency Medicine

NAEMSP protocols, ACLS/PALS guidelines. Built from 5,622 clinical encounters across three datasets.

2K-16K
Context
309
Encounters
65 MB
Smallest
660 MB
Largest

The industry says: bigger models, bigger datacenters, bigger budgets. We proved the intelligence lives in the conversation, not the weights. A 1-bit model with a lodestone outperforms a cold model with 100× the parameters. The EMT on hour 36, the nurse at 3 AM, the inventor at 2 AM — each one grows their own crystal from their own experience. No server. No subscription. No landlord.

Under the hood

Technical specifications

Built from first principles. Every byte accounted for.

Models: Bonsai 1.7B & 8B (PrismML)
Quantization: Q1_0_g128 — 1 bit, 128-element groups
Inference: ARM NEON, sign-flip XOR, zero multiply
Compression: Cape PCA + 4-bit residual (7.2×)
Optimization: Modulum — 6 active pieces
Tokenizer: Qwen BPE, 151,665 entries
Architecture: Qwen3 — QK norm, YaRN RoPE, tied embeddings
Platform: iOS 17+ — iPhone & iPad
Privacy: 100% on-device, zero telemetry
Medical data: 5,622 encounters (NAEMSP + MIMIC-IV + Kaggle)