8 billion parameters. One bit per weight.
Running on your phone. No cloud. No compromise.
Your AI lives on your device. It never phones home. It works on the subway, in airplane mode, in the middle of nowhere. It's yours.
No API keys. No rate limits. No terms of service changes. Full language model inference, entirely local.
PrismML Bonsai models. Each weight is a single bit — a direction, not a magnitude. ARM NEON sign-flip XOR. No multiplication.
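The sign-flip trick above can be pictured in a few lines: with every weight constrained to +1 or -1, a dot product needs no multiplies, only a conditional flip of the activation's IEEE-754 sign bit via XOR. This is an illustrative scalar sketch, not PrismML's kernel; the real implementation runs on packed ARM NEON lanes with per-group scales not shown here.

```python
import struct

def sign_flip_dot(weight_bits, activations):
    """Toy 1-bit dot product. Each weight is +1 or -1, stored as one
    bit (1 = negative). Instead of multiplying, XOR the activation's
    IEEE-754 sign bit when the weight bit is set, then accumulate."""
    acc = 0.0
    for i, x in enumerate(activations):
        bit = (weight_bits >> i) & 1
        u = struct.unpack('<I', struct.pack('<f', x))[0]
        u ^= bit << 31                      # sign flip via XOR, no multiply
        acc += struct.unpack('<f', struct.pack('<I', u))[0]
    return acc

sign_flip_dot(0b10, [2.0, 3.0])  # weights [+1, -1] → 2.0 - 3.0 = -1.0
```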
Domain knowledge pre-crystallized and compressed 7.2×. The model arrives already understanding your field.
Six optimization pieces working in concert. Attention shaping. Active vocabulary. Null cone acceleration. Compression. Intelligent eviction. Spectral coverage.
Zero telemetry. Zero server calls. Zero data collection. Works in airplane mode. Your conversations never leave your phone.
Fiction and medical lodestones. PRISMA screenplay. EMS protocols from 5,622 clinical encounters. Switch domains in one tap.
Dark mode with warm ember accents. Light mode with cool arctic tones. Adapts to you. Persists across sessions.
10 MB download. The full inference engine ships inside.
1.7B for speed. 8B for depth. Downloads with resume support.
Pre-crystallized domain knowledge. Fiction or medical. The model starts warm.
Streaming responses. Conversation history. Stop anytime. No internet needed.
Lodestones are pre-computed attention states — the model's understanding of a domain, compressed and frozen. Load one and the model already knows.
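Conceptually, a lodestone is a save/restore of attention state: run the domain corpus through the model once, freeze the resulting per-layer key/value state, and reload it later so generation starts warm. A minimal Python sketch, assuming a hypothetical `model.prefill` / `model.kv_cache` interface; PrismML's actual file format and compression are not shown.

```python
import pickle

def build_lodestone(model, corpus_tokens, path):
    # Run the domain corpus through the model once, then freeze the
    # resulting per-layer attention (key/value) state to disk.
    # `model.prefill` is an assumed name, not PrismML's actual API.
    kv_state = model.prefill(corpus_tokens)  # per-layer (keys, values)
    with open(path, "wb") as f:
        pickle.dump(kv_state, f)

def load_lodestone(model, path):
    # Restore the frozen state so the model starts already "warm".
    with open(path, "rb") as f:
        model.kv_cache = pickle.load(f)
```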
Anime screenplay analysis. Characters, arcs, themes, lore, and scene-by-scene knowledge of the complete script.
NAEMSP protocols, ACLS/PALS guidelines. Built from 5,622 clinical encounters across three datasets.
The industry says: bigger models, bigger datacenters, bigger budgets. We proved the intelligence lives in the conversation, not the weights. A 1-bit model with a lodestone outperforms a cold model with 100× the parameters. The EMT on hour 36, the nurse at 3 AM, the inventor at 2 AM — each one grows their own crystal from their own experience. No server. No subscription. No landlord.
Built from first principles. Every byte accounted for.
| Spec | Detail |
| --- | --- |
| Models | Bonsai 1.7B & 8B (PrismML) |
| Quantization | Q1_0_g128 — 1 bit per weight, one scale per 128-element group |
| Inference | ARM NEON, sign-flip XOR, zero multiply |
| Compression | Cape PCA + 4-bit residual (7.2×) |
| Optimization | Modulum — 6 active pieces |
| Tokenizer | Qwen BPE, 151,665 entries |
| Architecture | Qwen3 — QK norm, YaRN RoPE, tied embeddings |
| Platform | iOS 17+ — iPhone & iPad |
| Privacy | 100% on-device, zero telemetry |
| Medical data | 5,622 encounters (NAEMSP + MIMIC-IV + Kaggle) |
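The compression row pairs a PCA projection with a 4-bit residual. Here is a toy Python sketch of the residual half only: signed 4-bit codes with one float scale per group, which is the kind of byte budget that, combined with the PCA term, lands near the stated ~7× ratio. The group size and packing are assumptions, not Cape's actual layout.

```python
def quantize_residual_4bit(residual, group=16):
    """Quantize a residual vector to signed 4-bit codes in [-7, 7],
    one float scale per group of `group` values."""
    out = []
    for g in range(0, len(residual), group):
        chunk = residual[g:g + group]
        scale = max(abs(v) for v in chunk) / 7 or 1.0  # avoid zero scale
        codes = [max(-7, min(7, round(v / scale))) for v in chunk]
        out.append((scale, codes))
    return out

def dequantize_residual_4bit(packed):
    """Reconstruct the residual: code times its group's scale."""
    return [c * s for s, codes in packed for c in codes]
```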