Stable Transit Labs

Applied AI Research

We research cognitive training for small local models, specialising in on-device reasoning and context optimisation for embedded systems.

SOTA LLM Inference Guide (April 2026)

Curated recommendations by VRAM tier for local inference (Q4 Quantisation).
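
The tier sizes follow from a simple rule of thumb, sketched below under assumed figures: roughly 0.6 bytes per parameter for Q4-style quants (about 4.8 bits/weight, typical of Q4_K_M-class formats), excluding the KV cache, which grows with context length.

```python
# Rule-of-thumb VRAM estimator for Q4-quantised weights. The
# 0.6 bytes/parameter figure is an assumption, not a benchmark,
# and excludes KV cache and runtime buffers.

def q4_vram_gb(params_billion: float) -> float:
    """Approximate weight memory in GB at ~4-bit quantisation."""
    return params_billion * 0.6

TIERS = [(2, "EDGE/MICRO"), (4, "LOW-POWER"), (8, "EFFICIENT"),
         (16, "MID-RANGE"), (32, "LARGE"), (float("inf"), "ENTERPRISE")]

def tier_for(params_billion: float) -> str:
    """Smallest VRAM tier whose budget fits the estimated weights."""
    need = q4_vram_gb(params_billion)
    return next(name for budget, name in TIERS if need <= budget)

print(tier_for(32))  # 32B -> ~19.2 GB -> LARGE
print(tier_for(7))   # 7B  -> ~4.2 GB  -> EFFICIENT
```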

tags: general, coding, reasoning, multimodal, subagents

ENTERPRISE (64GB+ VRAM)

Massive scale: system-wide refactoring & agentic swarms.

deepseek-v4-pro (70B+, ~42 GB)
  93.5% LCB. Best for system-wide refactoring.
  tags: coding, general

qwen3.6-plus (72B, ~42 GB)
  1M context. Optimised for agentic swarms.
  tags: coding, subagents

llama4-maverick (70B+, ~40 GB)
  Meta's flagship for multi-language & images.
  tags: general, multimodal

LARGE (32GB VRAM)

Large: gold-standard logic & engineering.

gemma4:31b (31B, ~18 GB)
  80% LCB; 256K context. The local logic gold standard.
  tags: coding, reasoning

qwen3.5-coder:32b (32B, ~19 GB)
  Deep specialised engineering logic.
  tags: coding

deepseek-v4-flash (32B, ~19 GB)
  1M context; high-efficiency MoE.
  tags: coding, subagents
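
The long contexts advertised here (256K, even 1M) are dominated by KV cache rather than weights. A rough sketch of the arithmetic, using assumed architecture numbers for a hypothetical 32B-class model (64 layers, 8 KV heads via GQA, head dim 128; real models differ):

```python
# KV cache size sketch. All architecture figures are assumptions for
# a hypothetical 32B-class model; the formula itself is standard:
# K and V tensors per layer, per KV head, per token.

def kv_cache_gib(seq_len: int, n_layers: int = 64, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Total K+V cache across all layers, in GiB."""
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len  # 2 = K and V
    return elems * bytes_per_elem / 2**30

print(kv_cache_gib(256 * 1024))                    # fp16 at 256K -> 64.0 GiB
print(kv_cache_gib(256 * 1024, bytes_per_elem=1))  # 8-bit KV     -> 32.0 GiB
```

Under these assumptions a full fp16 cache at 256K tokens dwarfs the ~18 GB of Q4 weights, which is why GQA and KV quantisation matter so much for long-context local inference.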

MID-RANGE (16GB VRAM)

Mid-range: balanced intelligence & context.

gemma4-moe:26b (26B, ~15 GB)
  4B active; high-speed with 30B+ intelligence.
  tags: coding, general

phi4:14b (14B, ~9 GB)
  SOTA reasoning-per-parameter for complex math/code.
  tags: reasoning, coding

mistral-small:24b (24B, ~14 GB)
  Best balanced 128K-context generalist.
  tags: general
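
The "4B active" note on gemma4-moe:26b is the key to MoE speed: all 26B parameters occupy VRAM, but each token only runs through the active experts. A sketch using the standard ~2 FLOPs per active parameter approximation for a dense forward pass (the 26B/4B split is from the card above):

```python
# Memory vs. compute for MoE: you pay VRAM for every expert, but
# per-token compute only for the active parameters. The ~2 FLOPs per
# parameter figure is the usual dense forward-pass approximation.

def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token."""
    return 2 * active_params_b * 1e9

dense = flops_per_token(26)  # hypothetical dense 26B model
moe = flops_per_token(4)     # 26B-total MoE with 4B active
print(f"compute ratio: {dense / moe:.1f}x")  # -> 6.5x less work per token
```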

EFFICIENT (8GB VRAM)

Efficient: real-time local intelligence.

qwen3.5-coder:7b (7B, ~4 GB)
  Best for real-time local IDE completions.
  tags: coding

gemma4-e4b (11B, ~6 GB)
  Native audio/vision for visual debugging.
  tags: multimodal, subagents

deepseek-r1:8b (8B, ~5 GB)
  Best for logical debugging and 'thinking' tasks.
  tags: reasoning

LOW-POWER (4GB VRAM)

Small: edge instruction following.

llama4-scout:3b (3B, ~2 GB)
  SOTA sub-5GB instruction following.
  tags: general, subagents

qwen3.5-coder:3b (3B, ~2 GB)
  Local 'ghost text' completions and shell scripts.
  tags: coding, subagents

gemma3:4b (4B, ~2.5 GB)
  Efficient summarisation and baseline assistant.
  tags: general

EDGE/MICRO (500 MB - 2GB VRAM)

Ultra-small: specialised agentic tasks.

gemma4-e2b (2B, ~1.5 GB)
  Agent-ready; native tool use and vision.
  tags: multimodal, subagents

qwen3.6:0.6b (0.6B, ~0.5 GB)
  SOTA sub-1GB logic for boilerplate and regex.
  tags: coding, subagents

deepseek-r1:0.5b (0.5B, ~0.5 GB)
  'Thinking' minis for CLI and automation.
  tags: reasoning, subagents

smollm3:500m (0.5B, ~0.5 GB)
  Best for data cleaning, formatting, and JSON tasks.
  tags: subagents
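
Models tagged subagents (smollm3's JSON tasks, for example) are usually driven programmatically rather than chatted with. A minimal sketch, assuming a local Ollama server on its default port; the prompt is illustrative:

```python
# Driving a micro model as a JSON subagent via Ollama's native API.
# Assumes an Ollama server at localhost:11434; the prompt below is
# illustrative, not from the guide.
import json

def json_task_payload(model: str, prompt: str) -> dict:
    """Request body for POST http://localhost:11434/api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",  # constrain the model to emit valid JSON
        "stream": False,   # return one complete response object
    }

payload = json_task_payload(
    "smollm3:500m",
    "Extract {name, email} from: 'Ada Lovelace <ada@example.com>'",
)
print(json.dumps(payload, indent=2))
# Send with e.g. requests:
#   requests.post("http://localhost:11434/api/generate", json=payload)
```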