8 billion parameters. One bit per weight.
Running on your phone. No cloud. No compromise.
Your AI lives on your device. It never phones home. It works on the subway, in airplane mode, in the middle of nowhere. It's yours.
No API keys. No rate limits. No terms of service changes. Full language model inference, entirely local.
PrismML Bonsai models. Each weight is a single bit — a direction, not a magnitude. ARM NEON sign-flip XOR. No multiplication.
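The sign-flip trick above can be pictured in a few lines: with every weight constrained to +1 or -1, a dot product needs no multiplies, only a conditional flip of the activation's IEEE-754 sign bit via XOR. This is an illustrative scalar sketch, not PrismML's kernel; the real implementation runs on packed ARM NEON lanes with per-group scales not shown here.

```python
import struct

def sign_flip_dot(weight_bits, activations):
    """Toy 1-bit dot product. Each weight is +1 or -1, stored as one
    bit (1 = negative). Instead of multiplying, XOR the activation's
    IEEE-754 sign bit when the weight bit is set, then accumulate."""
    acc = 0.0
    for i, x in enumerate(activations):
        bit = (weight_bits >> i) & 1
        u = struct.unpack('<I', struct.pack('<f', x))[0]
        u ^= bit << 31                      # sign flip via XOR, no multiply
        acc += struct.unpack('<f', struct.pack('<I', u))[0]
    return acc

sign_flip_dot(0b10, [2.0, 3.0])  # weights [+1, -1] → 2.0 - 3.0 = -1.0
```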
Domain knowledge pre-crystallized and compressed 7.2×. The model arrives already understanding your field.
Six optimization pieces working in concert. Attention shaping. Active vocabulary. Null cone acceleration. Compression. Intelligent eviction. Spectral coverage.
Zero telemetry. Zero server calls. Zero data collection. Works in airplane mode. Your conversations never leave your phone.
Fiction and medical lodestones. PRISMA screenplay. EMS protocols from 5,622 clinical encounters. Switch domains in one tap.
Dark mode with warm ember accents. Light mode with cool arctic tones. Adapts to you. Persists across sessions.
10 MB download. The full inference engine ships inside.
1.7B for speed. 8B for depth. Downloads with resume support.
Pre-crystallized domain knowledge. Fiction or medical. The model starts warm.
Streaming responses. Conversation history. Stop anytime. No internet needed.
Lodestones are pre-computed attention states — the model's understanding of a domain, compressed and frozen. Load one and the model already knows.
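Conceptually, a lodestone is a save/restore of attention state: run the domain corpus through the model once, freeze the resulting per-layer key/value state, and reload it later so generation starts warm. A minimal Python sketch, assuming a hypothetical `model.prefill` / `model.kv_cache` interface; PrismML's actual file format and compression are not shown.

```python
import pickle

def build_lodestone(model, corpus_tokens, path):
    # Run the domain corpus through the model once, then freeze the
    # resulting per-layer attention (key/value) state to disk.
    # `model.prefill` is an assumed name, not PrismML's actual API.
    kv_state = model.prefill(corpus_tokens)  # per-layer (keys, values)
    with open(path, "wb") as f:
        pickle.dump(kv_state, f)

def load_lodestone(model, path):
    # Restore the frozen state so the model starts already "warm".
    with open(path, "rb") as f:
        model.kv_cache = pickle.load(f)
```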
Anime screenplay analysis. Characters, arcs, themes, lore, and scene-by-scene knowledge of the complete script.
NAEMSP protocols, ACLS/PALS guidelines. Built from 5,622 clinical encounters across three datasets.
The industry says: bigger models, bigger datacenters, bigger budgets. We proved the intelligence lives in the conversation, not the weights. A 1-bit model with a lodestone outperforms a cold model with 100× the parameters. The EMT on hour 36, the nurse at 3 AM, the inventor at 2 AM — each one grows their own crystal from their own experience. No server. No subscription. No landlord.
Built from first principles. Every byte accounted for.
| Spec | Detail |
| --- | --- |
| Models | Bonsai 1.7B & 8B (PrismML) |
| Quantization | Q1_0_g128 — 1 bit per weight, one scale per 128-element group |
| Inference | ARM NEON, sign-flip XOR, zero multiply |
| Compression | Cape PCA + 4-bit residual (7.2×) |
| Optimization | Modulum — 6 active pieces |
| Tokenizer | Qwen BPE, 151,665 entries |
| Architecture | Qwen3 — QK norm, YaRN RoPE, tied embeddings |
| Platform | iOS 17+ — iPhone & iPad |
| Privacy | 100% on-device, zero telemetry |
| Medical data | 5,622 encounters (NAEMSP + MIMIC-IV + Kaggle) |
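The compression row pairs a PCA projection with a 4-bit residual. Here is a toy Python sketch of the residual half only: signed 4-bit codes with one float scale per group, which is the kind of byte budget that, combined with the PCA term, lands near the stated ~7× ratio. The group size and packing are assumptions, not Cape's actual layout.

```python
def quantize_residual_4bit(residual, group=16):
    """Quantize a residual vector to signed 4-bit codes in [-7, 7],
    one float scale per group of `group` values."""
    out = []
    for g in range(0, len(residual), group):
        chunk = residual[g:g + group]
        scale = max(abs(v) for v in chunk) / 7 or 1.0  # avoid zero scale
        codes = [max(-7, min(7, round(v / scale))) for v in chunk]
        out.append((scale, codes))
    return out

def dequantize_residual_4bit(packed):
    """Reconstruct the residual: code times its group's scale."""
    return [c * s for s, codes in packed for c in codes]
```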