Applied AI Research
We research cognitive training for small local models, specialising in on-device reasoning and context optimisation for embedded systems.
SOTA LLM Inference Guide (April 2026)
Curated recommendations by VRAM tier for local inference (Q4 Quantisation).
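The tier placements roughly track a back-of-envelope rule: at Q4, weights cost a little over half a byte per parameter, so parameter count largely decides which VRAM tier a model lands in. A minimal sketch of that arithmetic in Python, assuming an illustrative 4.8 bits per weight (real GGUF quants vary, roughly 4.5 to 5 bits):

def q4_weight_footprint_gb(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate in-VRAM size of the quantised weights alone (no KV cache)."""
    # bits_per_weight = 4.8 is an illustrative assumption, not a property of
    # any specific model in this guide.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 14, 32):
    print(f"{params}B -> ~{q4_weight_footprint_gb(params):.1f} GB")
# 7B  -> ~4.2 GB  (8 GB tier)
# 14B -> ~8.4 GB  (16 GB tier)
# 32B -> ~19.2 GB (32 GB tier)

The headroom left in each tier goes to the KV cache and runtime buffers, which is why a ~19 GB model sits in the 32 GB tier rather than being squeezed into 16 GB.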
ENTERPRISE (64GB+ VRAM)
Massive scale: system-wide refactoring & agentic swarms
deepseek-v4-pro ~42 GB 93.5% LCB. Best for system-wide refactoring.
qwen3.6-plus ~42 GB 1M Context. Optimized for agentic swarms.
llama4-maverick ~40 GB Meta's flagship for multilingual & image tasks.
LARGE (32GB VRAM)
Large: gold standard logic & engineering
gemma4:31b ~18 GB 80% LCB; 256K context. The local logic gold standard.
qwen3.5-coder:32b ~19 GB Deep specialized engineering logic.
deepseek-v4-flash ~19 GB 1M context; high-efficiency MoE.
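The long windows quoted in this tier (256K and 1M tokens) cost VRAM on top of the weights, because the KV cache grows linearly with context length. A rough sizing sketch; the layer count, GQA head count, head dimension, and fp16 cache precision are illustrative assumptions, not the specs of any listed model:

def kv_cache_gb(context_len: int,
                n_layers: int = 64,
                n_kv_heads: int = 8,
                head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """KV cache size: keys and values stored for every layer, KV head, and token."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token_bytes / 1e9

print(f"{kv_cache_gb(32_768):.1f} GB")   # ~8.6 GB for 32K of context
print(f"{kv_cache_gb(262_144):.1f} GB")  # ~68.7 GB for a full 256K window

In practice, filling an advertised window on local hardware usually relies on KV-cache quantisation or a reduced context setting, so budget for context as well as weights when picking a tier.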
MID-RANGE (16GB VRAM)
Mid-range: balanced intelligence & context
gemma4-moe:26b ~15 GB 4B active parameters; high-speed with 30B+ intelligence.
phi4:14b ~9 GB SOTA reasoning-per-parameter for complex math/code.
mistral-small:24b ~14 GB Best balanced generalist with a 128K context.
EFFICIENT (8GB VRAM)
Efficient: real-time local intelligence
qwen3.5-coder:7b ~4 GB Best for real-time local IDE completions (see the sketch after this tier).
gemma4-e4b ~6 GB Native Audio/Vision for visual debugging.
deepseek-r1:8b ~5 GB Best for logical debugging and 'thinking' tasks.
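The real-time completion use case above typically runs against a local HTTP endpoint rather than a cloud API. A minimal sketch using only the Python standard library; it assumes an Ollama-compatible server on the default port 11434 and assumes the qwen3.5-coder:7b tag has already been pulled locally, both of which are assumptions about your setup rather than guarantees:

import json
import urllib.request

# Assumed local endpoint and model tag; adjust both for your own setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen3.5-coder:7b"

def complete(prompt: str, max_tokens: int = 64) -> str:
    """Request a short, low-temperature completion suitable for inline suggestions."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.2, "num_predict": max_tokens},
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(complete("def fibonacci(n):"))

Keeping the temperature low and the token budget small is what keeps latency inside the interactive range that inline completions need.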
LOW-POWER (4GB VRAM)
Small: edge instruction following
llama4-scout:3b ~2 GB SOTA sub-5GB instruction following.
qwen3.5-coder:3b ~2 GB Local 'Ghost Text' and shell scripts.
gemma3:4b ~2.5 GB Efficient summarization and baseline assistant.
EDGE/MICRO (500MB - 2GB VRAM)
Ultra-small: specialized agentic tasks
gemma4-e2b ~1.5 GB Agent-ready; native tool-use and vision.
qwen3.6:0.6b ~0.5 GB SOTA sub-1GB logic for boilerplate and regex.
deepseek-r1:0.5b ~0.5 GB 'Thinking' minis for CLI and automation.
smollm3:500m ~0.5 GB Best for data cleaning, formatting, and JSON tasks.
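For the data cleaning and JSON tasks called out for the smallest models, constraining the output format does much of the work. A sketch against the same assumed local endpoint as above, additionally assuming the smollm3:500m tag is available and that the server supports a JSON output mode such as Ollama's "format": "json"; the messy record is invented for illustration:

import json
import urllib.request

# Assumed local endpoint and model tag; the record below is a made-up example.
OLLAMA_URL = "http://localhost:11434/api/generate"

messy_record = "name: Ada Lovelace | e-mail ada[at]example.org | joined 2024-03-12"

payload = {
    "model": "smollm3:500m",
    "prompt": (
        "Convert this record to JSON with keys name, email, joined. "
        f"Record: {messy_record}"
    ),
    "stream": False,
    "format": "json",              # ask the server to constrain output to valid JSON
    "options": {"temperature": 0},
}
req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    cleaned = json.loads(json.loads(resp.read())["response"])
print(cleaned)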