OpenRouter: Fusion beats DeepSeek-V4-Pro on substance

Fusion takes the match 34.6 to 32.3 by winning the harder precision tests, while DeepSeek-V4-Pro looks better on presentation and instruction-following in narrower spots. The split is clear: Fusion is the safer model when correctness matters; DeepSeek-V4-Pro is the cleaner stylist when the task is mostly packaging.

By · Published

Two abstract computational forms, representing Fusion and DeepSeek-V4-Pro, in a state of comparative thermal analysis. (Infrared / thermal render with scientific instrument readout overlays.)

OpenRouter: Fusion wins this head-to-head because it was better where mistakes actually cost you. In the Go rate-limit parser, Fusion delivered a complete, compilable snippet and got the edge cases right: case-insensitive header lookup, trimming, rejecting signed and fractional numerics, and parsing both RFC1123 and RFC1123Z. DeepSeek-V4-Pro stumbled on basics here — no package declaration, an unnecessary net/http import, and a real logic bug in treating a valid integer value of "0" as invalid.

Fusion also took the meeting-notes task, and that result matters more than it might look. It preserved the key nuance that the Sept. 24 launch was tentative, correctly framed legal approval as the launch gate, and produced a stronger summary plus action items. DeepSeek-V4-Pro was serviceable, but it flattened important uncertainty and overstated the CSV fix as a pre-launch requirement when the notes did not actually say that.

DeepSeek-V4-Pro's wins were narrower but legitimate. On the vendor outage status update, it wrote the more client-ready note: clearer, less technical, and better tuned to the audience. Fusion's version covered the facts, but the opening line — I'll draft this status update email for you. — is exactly the kind of meta filler that makes copy feel unshipped.

And in the messy-orders extraction task, DeepSeek-V4-Pro won by simply following directions. Both models structured the orders correctly, but Fusion blew the valid JSON only requirement by adding explanatory text before the payload. That's not a small formatting nit; in automation, that kind of lapse breaks pipelines.

Final call: OpenRouter: Fusion is the better text model overall. DeepSeek-V4-Pro is polished and often better at audience-facing prose, but Fusion was more dependable on the two tasks that demanded precise reasoning and faithful execution. If I need one model to be right rather than merely presentable, I take Fusion.

How they were tested

We ran 4 fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had gpt-5.4 score each one. OpenRouter: Fusion scored 34.6 to DeepSeek-V4-Pro's 32.3.

1. Go rate-limit parser

Language: Go Return code only. Implement a function: func RetryAfterSeconds(headers map[string]string, now time.Time) int It should read an HTTP Retry-After value from headers, checking keys case-insensitively. If the value is an integer number of seconds, return that integer. Otherwise, try to parse it as an HTTP date using time.RFC1123 or time.RFC1123Z and return the positive whole-number difference in seconds from now. If the date is in the past, parsing fails, the header is missing, or the value is empty, return 0. Requirements: - Do not mutate the input map. - Trim surrounding spaces on the header value. - Fractional or signed numbers like "1.5" or "+20" are invalid and should return 0 unless they parse as a date. - Include any imports needed, but no explanation text.

Winner: OpenRouter: Fusion — A is a complete, compilable Go snippet that correctly handles case-insensitive lookup, trimming, invalid signed/fractional numerics, and RFC1123/RFC1123Z parsing. B is not self-contained because it lacks a package declaration, adds an unnecessary net/http dependency, and incorrectly rejects a valid integer value of "0", which should be returned as 0 rather than treated as invalid.

2. Vendor outage status update

Draft a status update email to retail pharmacy clients. Scenario: Yesterday from 14:07 to 15:18 Central Time, our prescription routing platform "NexLane Rx" delayed some refill messages because a queue worker in the Omaha region stopped acknowledging jobs after a config rollout. New prescriptions were unaffected. We rolled back at 14:41, drained the backlog by 15:18, and added an alert on unacked jobs. No data loss or security issue occurred. Affected clients saw refill processing delays of 12–46 minutes. Write for pharmacy operations managers. Tone: accountable, calm, non-technical. Length: 140–180 words. Include: what happened, who was affected, resolution, and one concrete prevention step. Do not use bullets.

Winner: DeepSeek-V4-Pro — Both are strong, but B better matches the requested audience and tone with clearer, non-technical phrasing while still covering what happened, who was affected, the resolution, and a concrete prevention step. A is also solid, but its opening line is awkward ('I'll draft this status update email for you.') and slightly less polished for direct client use.

3. Meeting notes summary and action items

Read these meeting notes and produce: 1) a 3-bullet summary 2) a JSON object with keys: decision, launch_date, blockers, owners Notes: - Tuesday sync for the "ParcelLens" returns dashboard. - Mina: the carrier-status widget is accurate enough now; false "in transit" labels dropped from 7.8% to 1.9% after remapping FalconExpress event 214. - Jorge: support still gets complaints from beta users about CSV export timing out on workspaces over 80k rows. - Priya: if we cut PDF export from v1, design can finish by Sept 18 instead of Sept 27. - Team agreed v1 will launch without PDF export. - Evan: legal needs final vendor DPA signed before launch; target review complete by Sept 12. - Mina will own carrier widget QA sign-off. - Jorge will coordinate a fix for CSV export timeouts with data infra. - Tentative launch date kept at Sept 24 unless legal slips. For the JSON, make owners an object mapping each open work item to its owner.

Winner: OpenRouter: Fusion — A is more complete and faithful to the notes: it includes the tentative nature of the Sept 24 launch, correctly ties legal to the launch gate, and provides a stronger summary. B is solid but omits the launch contingency in both the summary and JSON and overstates that the CSV fix is required before launch, which the notes do not explicitly say.

4. Messy orders to JSON

Transform the messy order lines below into valid JSON only. Output schema: { "orders": [ { "order_id": "string", "customer": "string", "ship_country": "string", "items": [{"sku":"string","qty":number}], "priority": true, "notes": "string or null" } ] } Rules: - One object per unique order_id, preserving first appearance order. - Combine repeated order lines into the same items array. - qty must be a number. - priority is true only if the line says rush=yes. - notes should be null if blank or "-". - Trim spaces everywhere. Data: ORD-881 | customer: Hale Bioworks | country: CA | sku=MX-44 | qty 3 | rush=yes | notes: deliver before 5 pm ORD-882|customer: Northline Studio|country:US|sku=TB-9|qty 1|rush=no|notes:- ORD-881| customer:Hale Bioworks| country:CA | sku=QZ-2 | qty 7 | rush=yes | notes: deliver before 5 pm ORD-883 | customer: Parnell Civic Theatre | country: IE | sku = STAGE-1 | qty 2 | rush = no | notes: back dock ORD-884|customer:Umber & Ash Cafe|country:AU|sku=GR-11|qty 12|rush=yes|notes:

Winner: DeepSeek-V4-Pro — Both outputs extract and structure the orders correctly, but Model A violates the instruction to return valid JSON only by adding explanatory text before the JSON block. Model B follows the content requirements and formatting instruction more closely.


See every prompt and the full side-by-side outputs in the interactive Head-to-Head.

Reader comments

Conversation for this story loads after sign-in.