DeepSeek V4 Flash routs Xiaomi MiMo-V2.5

DeepSeek V4 Flash wins 34.0 to 17.0 by being usable, complete, and more faithful to the prompts. MiMo-V2.5 repeatedly looked polished while dropping facts, inventing details, or failing outright.

By RuntimeWire · Published Jun 4, 2026, 9:10am CT


DeepSeek V4 Flash won this matchup where it matters: execution. In the TypeScript room-conflict task, it delivered an essentially working implementation with correct same-room overlap detection, ISO timestamps, ordered IDs, and sorted output. MiMo-V2.5 was not just weaker; it was truncated and syntactically incomplete.

The writing tasks showed the same split. On the customer delay email, DeepSeek named the real problem, a finishing booth that failed inspection, while keeping the tone calm and accountable. Xiaomi softened that into a vague “equipment issue” and attached an unsupported specific date to the promised next-Friday update. That is exactly the kind of helpful-sounding slippage that causes trouble.

In extraction and summarization, DeepSeek again kept more of the record intact. Its support-thread summary recorded that older date ranges still export rows and that no errors appear in the UI. It also pulled the reimbursement claims, dates, categories, receipt flags, and total into JSON accurately, although it wrapped that JSON in a code fence when the prompt asked for JSON only. Xiaomi's support summary had stronger next actions but omitted key facts. On the reimbursement task, it produced incomplete, invalid JSON and appears to have misidentified an employee.

Final call: DeepSeek V4 Flash wins decisively. MiMo-V2.5 has surface polish, but DeepSeek was the model you could actually hand the work to without spending the next hour repairing missing code, softened facts, and broken structured output.

How they were tested

We ran four fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had OpenAI's GPT-5.5 score each one. DeepSeek V4 Flash scored 34.0 to Xiaomi MiMo-V2.5's 17.0.

1. TypeScript room conflict detector

Practical coding — TypeScript. Implement export function findRoomConflicts(bookings: Booking[]): Conflict[] for a clinic scheduler. Use these types: type Booking = { id: string; room: string; start: string; end: string }; type Conflict = { room: string; a: string; b: string; overlapStart: string; overlapEnd: string };. start and end are ISO-8601 strings with offsets. Two bookings conflict only if they are in the same room and their time ranges overlap by a positive duration; touching endpoints are not conflicts. Return one Conflict for each conflicting pair, with a and b as the two booking ids in lexicographic order, overlapStart/overlapEnd as ISO strings from Date.toISOString(), and the result sorted by room, overlapStart, then a, then b. Do not mutate the input. Return code only.

Winner: DeepSeek: DeepSeek V4 Flash — Model A provides a complete, essentially correct implementation that detects positive overlaps within the same room, formats overlap times with toISOString, orders ids, and sorts results as requested, with only a minor markdown/code-only issue. Model B is truncated and syntactically incomplete, so it cannot satisfy the task.
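For readers who want a reference point, here is a minimal TypeScript sketch of one way to meet the prompt's rules. It is our own illustration, not either model's output. It assumes per-room booking counts are modest, so it compares every pair within a room instead of using a sweep line. It parses times with Date.parse and treats a conflict as a strictly positive overlap. Ids are ordered with plain string comparison, which is our reading of "lexicographic". Results are sorted on the fixed-width UTC strings that toISOString produces.

```typescript
export type Booking = { id: string; room: string; start: string; end: string };
export type Conflict = { room: string; a: string; b: string; overlapStart: string; overlapEnd: string };

// Plain code-unit string comparison (locale-independent), used for rooms, times, and ids.
const cmp = (p: string, q: string): number => (p < q ? -1 : p > q ? 1 : 0);

export function findRoomConflicts(bookings: Booking[]): Conflict[] {
  // Group parsed bookings by room into a new structure; the input array is never mutated.
  const byRoom = new Map<string, { id: string; start: number; end: number }[]>();
  for (const booking of bookings) {
    const list = byRoom.get(booking.room) ?? [];
    list.push({ id: booking.id, start: Date.parse(booking.start), end: Date.parse(booking.end) });
    byRoom.set(booking.room, list);
  }

  const conflicts: Conflict[] = [];
  for (const [room, list] of byRoom) {
    // Compare every pair in the room: O(n^2) per room, assumed fine for a clinic schedule.
    for (let i = 0; i < list.length; i++) {
      for (let j = i + 1; j < list.length; j++) {
        const x = list[i];
        const y = list[j];
        const overlapStart = Math.max(x.start, y.start);
        const overlapEnd = Math.min(x.end, y.end);
        // Strictly positive overlap only: touching endpoints give overlapEnd === overlapStart.
        if (overlapEnd > overlapStart) {
          const a = x.id < y.id ? x.id : y.id;
          const b = x.id < y.id ? y.id : x.id;
          conflicts.push({
            room,
            a,
            b,
            overlapStart: new Date(overlapStart).toISOString(),
            overlapEnd: new Date(overlapEnd).toISOString(),
          });
        }
      }
    }
  }

  // Sort by room, overlapStart, a, then b. toISOString output is fixed-width UTC,
  // so string order matches chronological order.
  return conflicts.sort(
    (p, q) =>
      cmp(p.room, q.room) ||
      cmp(p.overlapStart, q.overlapStart) ||
      cmp(p.a, q.a) ||
      cmp(p.b, q.b),
  );
}
```

Back-to-back appointments produce an overlap of zero length, so the strict greater-than check is what keeps them out of the result.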

2. Customer delay email

Professional writing — Draft a customer email from a small furniture company to 42 customers whose walnut desk orders are delayed because a finishing booth failed inspection. Audience: customers who already paid deposits. Tone: accountable, calm, not defensive. Length: 140–180 words. Include: new estimated ship window of July 18–24, option to keep order with free white-glove delivery, option to cancel for full deposit refund by replying before July 10, and a promise of another update next Friday. Do not mention legal liability or blame inspectors.

Winner: DeepSeek: DeepSeek V4 Flash — Model A more directly and transparently states that the finishing booth failed inspection while maintaining an accountable, calm tone and including all required options and dates. Model B is polished but softens the cause into an 'equipment issue' and adds an unsupported specific date for next Friday.

3. Support thread summary extraction

Summarization & extraction — Read the support-thread excerpt and return a concise JSON object with keys summary (max 35 words), customer_impact, confirmed_facts (array), open_questions (array), and next_actions (array of objects with owner and task). Excerpt: "Tue 09:12, Priya (Acme Labs): Since yesterday evening, CSV exports from the Trial Balance page finish but contain only headers. PDF export still works. We tested Chrome and Edge. Tue 09:40, Leo (Support): I reproduced on Acme sandbox for date ranges ending after 2026-03-31. Older ranges export rows. Tue 10:05, Marta (Eng): Likely caused by the nullable department_code added in report API v4.2. Not seeing errors in UI; worker logs show 17 failed row-serialization warnings. Tue 10:20, Priya: Month-end close is Friday; we need CSVs for auditors. Tue 10:35, Marta: Patch ready, needs review; workaround is selecting 'Group by: Account' instead of 'Department'."

Winner: DeepSeek: DeepSeek V4 Flash — Model A is more complete on confirmed facts, including older ranges exporting rows and no UI errors, while keeping the required structure and summary length. Model B has stronger next actions but omits some key facts and slightly overstates/conflates reproduction details and customer impact.
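The extraction prompt is easy to check mechanically, which is part of why structural slips stand out. Below is a hedged TypeScript sketch of a validator for the requested shape. checkThreadSummary is a hypothetical helper. Treating summary and customer_impact as plain strings is our assumption, because the prompt lists the keys without giving types. The sketch checks only structure and the 35-word limit. Judging whether the confirmed facts are complete, which is where the two models actually differed, still takes a reader or a grader.

```typescript
// Shape inferred from the prompt's key list. Treating summary and customer_impact
// as strings is an assumption; the prompt names the keys but not their types.
export type NextAction = { owner: string; task: string };
export type ThreadSummary = {
  summary: string;
  customer_impact: string;
  confirmed_facts: string[];
  open_questions: string[];
  next_actions: NextAction[];
};

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

const isNextAction = (v: unknown): v is NextAction => {
  if (typeof v !== "object" || v === null) return false;
  const { owner, task } = v as Record<string, unknown>;
  return typeof owner === "string" && typeof task === "string";
};

// Hypothetical helper: returns a list of problems. An empty list means the output
// passed these structural checks; it says nothing about factual completeness.
export function checkThreadSummary(raw: string): string[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return ["not valid JSON"];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["top level is not a JSON object"];
  }
  const { summary, customer_impact, confirmed_facts, open_questions, next_actions } =
    data as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof summary !== "string") {
    problems.push("summary must be a string");
  } else {
    // Count whitespace-separated words against the prompt's 35-word cap.
    const words = summary.trim().split(/\s+/).filter((w) => w.length > 0);
    if (words.length > 35) problems.push(`summary has ${words.length} words (max 35)`);
  }
  if (typeof customer_impact !== "string") problems.push("customer_impact must be a string");
  if (!isStringArray(confirmed_facts)) problems.push("confirmed_facts must be an array of strings");
  if (!isStringArray(open_questions)) problems.push("open_questions must be an array of strings");
  if (!Array.isArray(next_actions) || !next_actions.every(isNextAction)) {
    problems.push("next_actions must be an array of { owner, task } objects");
  }
  return problems;
}
```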

4. Messy reimbursements to JSON

Data wrangling / structured output — Convert the messy reimbursement notes into valid JSON only. Schema: { "claims": [ { "employee": string, "date": "YYYY-MM-DD", "category": "meals"|"taxi"|"lodging"|"supplies", "amount_usd": number, "receipt": boolean } ], "total_usd": number }. Round amounts to 2 decimals. Interpret rcpt, receipt, scan, or photo as receipt present; no slip as false. Notes: J. Osei - 4/6/26 taxi from LGA $38.40 rcpt; Mira Patel Apr 7 2026: hotel, USD 214, scan uploaded; osei, Joseph 2026-04-07 lunch with vendor 27.8 no slip; M. Patel 04-08-2026 printer paper + folders supplies $19.36 photo; Jordan Osei Apr 8 dinner $31.25 receipt.

Winner: DeepSeek: DeepSeek V4 Flash — Model A accurately extracts all claims, dates, categories, receipts, and total, though it violates the 'JSON only' instruction by wrapping the result in a code fence. Model B is incomplete/invalid JSON and also appears to misidentify at least one employee.
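Unlike the support summary, the reimbursement prompt spells out a full schema, so much of what the grader weighed can be checked in code. The sketch below is illustrative, and the helper names receiptFromNote and checkReimbursements are our own. It applies the prompt's receipt keywords, validates each claim against the schema, and compares the total in integer cents. It also flags a Markdown code fence as a JSON-only violation, then unwraps the fence so the data can still be checked. Employee-name normalization is left alone: the notes mix "J. Osei", "osei, Joseph", and "Jordan Osei", and resolving those takes judgment the schema cannot encode.

```typescript
// Schema from the prompt.
export type Category = "meals" | "taxi" | "lodging" | "supplies";
export type Claim = {
  employee: string;
  date: string; // YYYY-MM-DD
  category: Category;
  amount_usd: number;
  receipt: boolean;
};

const CATEGORIES: readonly string[] = ["meals", "taxi", "lodging", "supplies"];

// Receipt rule from the prompt: rcpt, receipt, scan, or photo means a receipt is present,
// and "no slip" means it is not. Anything else returns undefined because the prompt is silent.
export function receiptFromNote(note: string): boolean | undefined {
  const s = note.toLowerCase();
  if (s.includes("no slip")) return false;
  if (/\b(rcpt|receipt|scan|photo)\b/.test(s)) return true;
  return undefined;
}

const asRecord = (v: unknown): Record<string, unknown> | null =>
  typeof v === "object" && v !== null && !Array.isArray(v) ? (v as Record<string, unknown>) : null;

// True when n has at most two decimal places, allowing a small float tolerance.
const hasTwoDecimalsAtMost = (n: number): boolean =>
  Number.isFinite(n) && Math.abs(n * 100 - Math.round(n * 100)) < 1e-6;

// Hypothetical helper: returns a list of problems; an empty list means the output passed.
export function checkReimbursements(raw: string): string[] {
  const problems: string[] = [];
  let body = raw.trim();
  // The prompt asked for JSON only, so flag a Markdown code fence, then unwrap it.
  const fenced = /^`{3}[a-z]*\n([\s\S]*?)\n`{3}$/i.exec(body);
  if (fenced) {
    problems.push("wrapped in a code fence (prompt asked for JSON only)");
    body = fenced[1];
  }
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return [...problems, "not valid JSON"];
  }
  const top = asRecord(data);
  const claims = top?.claims;
  const total = top?.total_usd;
  if (!Array.isArray(claims) || typeof total !== "number") {
    return [...problems, "missing claims array or numeric total_usd"];
  }
  let sumCents = 0;
  claims.forEach((item: unknown, i: number) => {
    const c = asRecord(item);
    if (!c) {
      problems.push(`claim ${i}: not an object`);
      return;
    }
    const { employee, date, category, amount_usd, receipt } = c;
    if (typeof employee !== "string" || employee.trim() === "") problems.push(`claim ${i}: employee`);
    if (typeof date !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(date)) problems.push(`claim ${i}: date`);
    if (typeof category !== "string" || !CATEGORIES.includes(category)) problems.push(`claim ${i}: category`);
    if (typeof receipt !== "boolean") problems.push(`claim ${i}: receipt`);
    if (typeof amount_usd !== "number" || !hasTwoDecimalsAtMost(amount_usd)) {
      problems.push(`claim ${i}: amount_usd`);
    } else {
      sumCents += Math.round(amount_usd * 100);
    }
  });
  // Compare in integer cents to avoid floating-point drift in the total.
  if (Math.round(total * 100) !== sumCents) problems.push("total_usd does not match the sum of amount_usd");
  return problems;
}
```

For the five notes in the prompt, a correct answer should total 330.81 (38.40 + 214.00 + 27.80 + 19.36 + 31.25). That assumes the undated "Apr 8" dinner falls in 2026 like the other claims.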

See every prompt and the full side-by-side outputs in the interactive Head-to-Head.
