Kimi-K2.6 Beats Ministral-3B by Doing the Job Right

Kimi-K2.6 wins this matchup 38.0 to 25.0 by being the more reliable, instruction-tight model across every task. Ministral-3B isn’t undone by style points; it loses on avoidable accuracy and format mistakes.

Kimi-K2.6 takes this head-to-head cleanly because it behaves like a model that actually reads the assignment. Across all four tasks, it stayed inside the requested format, avoided inventing facts, and handled edge cases with more care. That sounds basic, but in practical use it’s the difference between output you can ship and output you have to babysit.

The clearest example is python-log-redactor. Kimi-K2.6 returned code only, included exactly four short assert tests, and used a tighter IPv4 regex that avoids redacting invalid addresses. Ministral-3B fumbled both the wrapper and the substance: it added prose/markdown despite the code-only instruction, and its regex was loose enough to catch invalid IP-like strings. That’s not a cosmetic miss; it’s a correctness miss.

The same pattern shows up in the writing and extraction tasks. In customer-delay-email, Kimi-K2.6 kept the tone calm and accountable without smuggling in details the prompt never gave it. Ministral-3B wrote a polished note but invented dates anyway. In meeting-summary-risks, Kimi-K2.6 stuck to the notes and the requested JSON keys, while Ministral-3B started freelancing: it added unsupported dates, changed the owner_decisions schema, and inferred next steps that weren’t actually in the source.

messy-orders-to-json seals it. Kimi-K2.6 produced a valid JSON array only, normalized correctly, and sorted by ship_by ascending as instructed. Ministral-3B again added extra prose outside the JSON and even misordered the results by putting 2026-04-09 before 2026-04-08. Those are unforced errors.

Final call: Kimi-K2.6 is the better text model here, decisively. It wins not by being flashy, but by being stricter, cleaner, and more trustworthy on the exact tasks in front of it. Ministral-3B keeps making the kind of small-seeming mistakes that break real workflows.

How they were tested

We ran 4 fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had gpt-5.4 score each one. Kimi-K2.6 scored 38.0 to Ministral-3B's 25.0.

1. python-log-redactor

Language: Python 3.11
Write a function redact_log(line: str) -> str for a support tool. It must replace:
- any IPv4 address with [IP]
- any email address with [EMAIL]
- any 10-digit incident ID written as INC-########## with [INCIDENT]
Preserve all other text exactly. Do not use external packages. Then add 4 short assert tests covering mixed content and repeated matches. Return code only.

Winner: Kimi-K2.6. It follows the instruction to return code only, includes exactly four short assert tests with mixed and repeated matches, and uses a stricter IPv4 regex that avoids matching invalid addresses. Ministral-3B violates the code-only requirement by adding prose/markdown and uses an overly permissive IPv4 pattern that would redact invalid IP-like strings.
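
For readers wondering what a "stricter IPv4 regex" means in practice, here is a minimal sketch of the task. It is not either model's actual output; the pattern names and the exact email pattern are our own assumptions, and a production redactor would likely need more care around unusual email formats.

```python
import re

# One octet: 0-255 with no leading zeros (assumption: "01" is treated as invalid).
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Lookarounds stop the match from starting or ending inside a longer
# dotted number such as 1.2.3.4.5, while still allowing a sentence-ending dot.
IPV4_RE = re.compile(
    rf"(?<!\d)(?<!\d\.){_OCTET}(?:\.{_OCTET}){{3}}(?!\d)(?!\.\d)"
)

# Deliberately simple email pattern (illustrative, not RFC-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# INC- followed by exactly ten digits.
INCIDENT_RE = re.compile(r"\bINC-\d{10}\b")


def redact_log(line: str) -> str:
    """Replace emails, incident IDs, and valid IPv4 addresses; keep all other text."""
    line = EMAIL_RE.sub("[EMAIL]", line)
    line = INCIDENT_RE.sub("[INCIDENT]", line)
    return IPV4_RE.sub("[IP]", line)
```

A looser pattern such as \d{1,3}(\.\d{1,3}){3} would also redact 256.1.1.1, which is not a valid address. The octet alternation above is what keeps that string untouched.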

2. customer-delay-email

Draft an email to a customer named Priya Menon at BlueHarbor Clinics. Audience: an operations manager waiting on a vendor onboarding. Goal: explain that our payout verification is delayed because the bank rejected the first micro-deposit due to an account-name mismatch, confirm no action is needed from her team until Friday, and promise a status update by 3 p.m. CT Friday. Tone: calm, accountable, professional. Length: 120–160 words.

Winner: Kimi-K2.6. Its email is clear, accurate, and fully follows the prompt with a calm, accountable tone and no unnecessary specifics. Ministral-3B's email is polished but introduces dates not provided in the prompt, which risks inaccuracy and slightly weakens instruction adherence.

3. meeting-summary-risks

Read these meeting notes and produce:
1) a 3-bullet summary
2) a JSON object with keys launch_date, owner_decisions, open_risks, and next_steps

Notes: "Atlas Billing sync, 9:00 a.m. Tuesday. Mara said the pilot for Cedar Point Dental moved from May 20 to May 28 because their SFTP allowlist still blocks our new outbound IP. Jin confirmed invoice PDF generation is fixed in staging, but not yet deployed to production. Elio wants the launch banner removed before pilot start. Pri explained support can cover extended hours on May 28-30 only if training is finished by May 24. Decision: Mara owns customer comms, Jin owns prod deploy, Pri owns support schedule. Risk: if allowlist isn't updated by May 23, pilot slips again. Next step: Mara sends Cedar Point the new IPs today; Jin deploys Thursday night and posts rollback plan in #atlas-ops."

Winner: Kimi-K2.6. Its answer is faithful to the notes, concise, and uses the requested JSON keys without inventing dates or extra structure. Ministral-3B adds unsupported absolute dates/year assumptions, changes the owner_decisions schema, and includes inferred next steps not explicitly listed as decisions or next steps.
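
Sticking to "the requested JSON keys" is easy to check mechanically. The sketch below is a hypothetical helper of our own, not part of the scoring setup; it only compares key names and makes no assumption about what the values should look like, since the prompt does not specify their shape.

```python
import json

# The four keys named in the prompt.
REQUIRED_KEYS = ["launch_date", "owner_decisions", "open_risks", "next_steps"]


def check_summary_keys(payload: str) -> list[str]:
    """Return a list of problems; an empty list means the key set matches."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    problems += [f"unexpected key: {key}" for key in data if key not in REQUIRED_KEYS]
    return problems
```

A check like this catches added or renamed keys, but not invented dates; that part still needs a reader comparing the output against the notes.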

4. messy-orders-to-json

Convert the messy order lines below into valid JSON as an array of objects with EXACTLY these keys in this order: order_id (string), customer (string), items (array of strings), priority (string: low|normal|high), ship_by (string in YYYY-MM-DD), gift (boolean).

Rules: trim spaces, normalize customer names to title case, convert Y/Yes/true to true and N/No/false to false, split multiple items on ;, and sort the output array by ship_by ascending.

Data:
ORD-9081 | customer= nora velasquez | items=label rolls; cutter blades | priority=HIGH | ship_by=2026/04/09 | gift=N
ORD-9077|customer=ACME FIELD OPS|items=solar charger|priority=normal|ship_by=2026-04-07|gift=yes
ORD-9090 | customer= dmitri orlov | items= desk mount ; cable clips ; usb hub | priority= low | ship_by=2026-04-12 | gift= false
ORD-9079 | customer=mei tan | items=waterproof notebook; graphite pencils | priority=High | ship_by=2026-04-08 | gift=Y

Winner: Kimi-K2.6. It fully follows the instructions: valid JSON array only, correct normalization, and correctly sorted by ship_by ascending. Ministral-3B includes extra prose outside the JSON and misorders the array by placing 2026-04-09 before 2026-04-08.
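
The rules in this prompt map onto a few lines of code, which is part of why the sorting slip counts as unforced. Here is a rough sketch of one way to do it, assuming the pipe-delimited layout shown in the prompt; the function names are ours, and str.title() is a simplification that would mishandle names with apostrophes.

```python
import json

SAMPLE_LINES = [
    "ORD-9081 | customer= nora velasquez | items=label rolls; cutter blades | priority=HIGH | ship_by=2026/04/09 | gift=N",
    "ORD-9077|customer=ACME FIELD OPS|items=solar charger|priority=normal|ship_by=2026-04-07|gift=yes",
    "ORD-9090 | customer= dmitri orlov | items= desk mount ; cable clips ; usb hub | priority= low | ship_by=2026-04-12 | gift= false",
    "ORD-9079 | customer=mei tan | items=waterproof notebook; graphite pencils | priority=High | ship_by=2026-04-08 | gift=Y",
]


def parse_order(line: str) -> dict:
    """Turn one pipe-delimited order line into a normalized dict."""
    order_id, *fields = [part.strip() for part in line.split("|")]
    raw = {}
    for field in fields:
        key, _, value = field.partition("=")
        raw[key.strip()] = value.strip()
    # Insertion order here is the key order required by the prompt.
    return {
        "order_id": order_id,
        "customer": raw["customer"].title(),
        "items": [item.strip() for item in raw["items"].split(";")],
        "priority": raw["priority"].lower(),
        "ship_by": raw["ship_by"].replace("/", "-"),
        "gift": raw["gift"].lower() in {"y", "yes", "true"},
    }


def orders_to_json(lines: list[str]) -> str:
    """Parse every line and sort by ship_by; ISO dates sort correctly as strings."""
    orders = sorted((parse_order(line) for line in lines), key=lambda o: o["ship_by"])
    return json.dumps(orders, indent=2)
```

Calling orders_to_json(SAMPLE_LINES) puts ORD-9077 first and ORD-9090 last, with ORD-9079 (2026-04-08) ahead of ORD-9081 (2026-04-09).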


See every prompt and the full side-by-side outputs in the interactive Head-to-Head.
