Liquid AI ships LFM2.5-8B-A1B, an 8B on-device MoE trained on 38T tokens

The on-device 8B MoE adds a 128K-token context window, a 128K-entry vocabulary, and scaled-up pretraining to improve tool calling on laptops, with base and post-trained weights on Hugging Face.

Why it matters

Edge AI is shifting from demos to deployables. An 8B MoE that chains tools, runs in llama.cpp/MLX, and handles 128K context lowers latency, cuts cloud costs, and makes private, on-device agents viable.

In a blog post on May 28, Liquid AI released LFM2.5-8B-A1B, a new 8B-parameter mixture-of-experts model for on-device tool use that expands the context window to 128,000 tokens and scales pretraining to 38T tokens.

What is it

LFM2.5-8B-A1B is positioned as an edge model for fast, reliable tool calling on consumer hardware. Liquid AI says it is a "reasoning-only" model that produces an explicit chain of thought before the final answer, aiming to make agentic tasks like multi-step tool use more robust in compute-constrained settings. Both base and post-trained variants are available on Hugging Face, and the model can be tried in the Liquid Playground through a one-click chat UI; setup details are in the docs.

What changed since LFM2-8B-A1B

The release builds on October 2025's LFM2-8B-A1B with several upgrades:

  • Context window grows from 32,768 to 128,000 tokens.
  • Pretraining scale increases from 12T to 38T tokens, followed by large-scale reinforcement learning.
  • The tokenizer vocabulary nearly doubles, from 65,536 to 128,000 entries. Liquid AI reports compression gains for non-Latin languages, with larger improvements in Hindi, Thai, Vietnamese, Indonesian, and Arabic.
  • Architecture continues the prior blend of MoE, grouped-query attention, and gated short convolution blocks.
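
To put those upgrades on a common scale, the short Python sketch below computes the growth factor for each figure quoted in the list. The numbers are taken from this article; the dictionary and function names are illustrative only and do not come from Liquid AI.

```python
# Figures quoted in the list above: (LFM2-8B-A1B, LFM2.5-8B-A1B).
UPGRADES = {
    "context_tokens": (32_768, 128_000),
    "pretraining_tokens_trillions": (12, 38),
    "vocab_size": (65_536, 128_000),
}


def growth_factors(upgrades):
    """Return the new/old ratio for each metric, rounded to two decimals."""
    return {name: round(new / old, 2) for name, (old, new) in upgrades.items()}


factors = growth_factors(UPGRADES)
for name, factor in factors.items():
    print(f"{name}: {factor}x")
```

By these figures, context grows a little under 4x, pretraining data a little over 3x, and the vocabulary just under 2x.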

Liquid AI says the model is optimized for edge inference with day-one support in common runtimes, including llama.cpp, MLX, vLLM, and SGLang.

Performance claims

Liquid AI frames LFM2.5-8B-A1B as "compressed performance" that can compete with larger dense and MoE models on instruction-following and agentic tasks while fitting on laptops. In its own benchmark results, Liquid AI reports notable gains versus the prior 8B model:

  • AA-Omniscience Index improves from -78.42 to -24.70, driven by a jump in non-hallucination rate from 7.46 to 63.47.
  • IFEval rises from 79.44 to 91.84; AIME25 from 20.00 to 42.53; and MATH500 from 74.80 to 88.76.
  • Specialized agentic tests (IFBench, Multi-IF) and industry-flavored Tau^2 evaluations show double-digit point improvements.
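
For readers who want the size of each jump, here is a small Python sketch that turns the before/after scores quoted above into point gains. Only the numbers reported in this article are used; the variable and function names are illustrative. The IFBench, Multi-IF, and Tau^2 results are left out because the article gives no exact figures for them.

```python
# Scores quoted above: (LFM2-8B-A1B, LFM2.5-8B-A1B).
SCORES = {
    "AA-Omniscience Index": (-78.42, -24.70),
    "Non-hallucination rate": (7.46, 63.47),
    "IFEval": (79.44, 91.84),
    "AIME25": (20.00, 42.53),
    "MATH500": (74.80, 88.76),
}


def point_gains(scores):
    """Return new minus old for each benchmark, rounded to two decimals."""
    return {name: round(new - old, 2) for name, (old, new) in scores.items()}


gains = point_gains(SCORES)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f} points")
```

The AA-Omniscience Index can be negative, so its gain is a difference between two index values and should not be read as a percentage.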

Liquid AI links to the AA-Omniscience methodology at Artificial Analysis and characterizes throughput as "fastest in its size class" on both CPU and GPU, though it does not publish latency numbers in the post.

Liquid AI's chart highlights gains on hallucination-sensitive and instruction-following tests

Developer notes and open questions

For builders, the practical hooks are in place: weights on Hugging Face, a hosted Playground to try the model, and support for llama.cpp and MLX for local runs across CPUs and consumer GPUs. The post also details how Liquid AI expanded the tokenizer in place to a 128K vocabulary to better handle non-Latin scripts while preserving compatibility with prior token IDs.
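
Tokenizer "compression" is commonly measured as how many bytes of text each token covers, or as the ratio of token counts between two tokenizers on the same text. The Python sketch below shows one way to compute both. It is a minimal illustration and not Liquid AI's methodology: the two tokenizers are toy stand-ins (one token per character and one token per whitespace-separated word), and every function name is hypothetical. To compare real tokenizers, pass in any callable that maps a string to a list of tokens.

```python
def bytes_per_token(text, tokenize):
    """Average UTF-8 bytes covered by one token; higher means better compression."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer returned no tokens")
    return len(text.encode("utf-8")) / len(tokens)


def compression_gain(text, old_tokenize, new_tokenize):
    """Old token count divided by new token count; above 1 means fewer tokens."""
    new_tokens = new_tokenize(text)
    if not new_tokens:
        raise ValueError("new tokenizer returned no tokens")
    return len(old_tokenize(text)) / len(new_tokens)


# Toy stand-ins, NOT the LFM2 or LFM2.5 tokenizers.
def char_tokenize(text):
    return list(text)


def word_tokenize(text):
    return text.split()


sample = "नमस्ते दुनिया"  # Hindi sample text, chosen only for illustration
gain = compression_gain(sample, char_tokenize, word_tokenize)
print(f"bytes/token (char-level): {bytes_per_token(sample, char_tokenize):.2f}")
print(f"bytes/token (word-level): {bytes_per_token(sample, word_tokenize):.2f}")
print(f"token-count ratio: {gain:.2f}")
```

A larger vocabulary tends to raise bytes per token for scripts that were previously split into many small pieces, which means more text fits into the same context window.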

Some details are not specified in the blog: license terms for the weights, expert/routing counts and active parameters, the training compute budget or data mixture for the 38T-token run, and concrete throughput metrics. Still, the release fits a pattern for Liquid AI in 2026: a steady cadence of on-device models (from a 1.2B "Thinking" variant up to 24B-class LFM2) and partnerships to embed models outside the data center, including a Mercedes-Benz collaboration announced April 23, 2026.

For teams betting on local agents, the combination of 128K context, explicit reasoning traces, and out-of-the-box support in standard edge inference stacks is the headline.
