DeepSeek V4 Pro beats GPT-5.5 Pro on precision

DeepSeek: DeepSeek V4 Pro vs OpenAI: GPT-5.5 Pro

DeepSeek V4 Pro wins this head-to-head by being more exact where it matters: following instructions, matching schemas, and solving edge cases cleanly. GPT-5.5 Pro is still strong, but it gave away points with avoidable deviations.

DeepSeek V4 Pro takes this matchup 38.0 to 33.0, and the margin feels earned. Across the scored tasks, the pattern is simple: DeepSeek V4 Pro was tighter, more literal, and more reliable under constraints, while GPT-5.5 Pro was good but a little too willing to improvise.

The clearest technical win came in **python-log-redactor**. DeepSeek handled overlapping patterns the right way: one regex, one replacer, correct priority, no dropped matches. GPT-5.5 Pro split the work across separate regexes, which opens the door to ordering bugs, and its email pattern had small but real flaws around boundaries and over-matching. That is the difference between code that merely looks plausible and code you would actually trust.

DeepSeek also won the instruction-following tasks by not getting cute. In **vendor-delay-update**, it did exactly what the prompt asked: have the VP ask managers to send daily shortage counts by 4 p.m. local time, in a calm and accountable tone, without bolting on extra process. GPT-5.5 Pro wrote a solid note, but it drifted, adding shift-handoff and escalation details and even routing the daily counts to "Operations Planning." In **meeting-notes-summary**, the gap was even cleaner: DeepSeek matched the schema exactly, while GPT-5.5 Pro broke it with conditional text in `launch_date` and an array for `blocked_by` where a single value was required.

The only draw was **messy-orders-to-json**, where both models did the unglamorous work correctly: valid JSON, preserved order, correct schema, normalized fields. But a tie on the easy cleanup task does not erase the misses on precision work.

**Final call: DeepSeek V4 Pro is the better model here.** It was more disciplined, more exact, and more dependable on the tasks where small deviations turn into real failures.

python-log-redactor

Language: Python 3. Write code only.

Implement a function `redact_log(line: str) -> str` for an internal support tool. It must mask:

- email addresses -> replace the whole address with `[EMAIL]`
- IPv4 addresses -> replace with `[IP]`
- ticket IDs of the form `INC-` followed by 6 digits -> replace with `[TICKET]`

Preserve all other text exactly. Do not mask invalid IPs like `999.1.2.3`. Assume no multiline input. Include any imports needed and nothing else besides the code.

DeepSeek: DeepSeek V4 Pro:

```python
import re

def redact_log(line: str) -> str:
    # Combined regex: three capturing groups: (email) or (ip) or (ticket)
    pattern = re.compile(
        r'([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})'
        r'|(\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|0)\.){3}'
        r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|0)\b)'
        r'|(\bINC-\d{6}\b)'
    )
    def repl(match: re.Match) -> str:
        if match.group(1):
            return '[EMAIL]'
        if match.group(2):
            return '[IP]'
        return '[TICKET]'
    return pattern.sub(repl, line)
```

OpenAI: GPT-5.5 Pro:

```python
import re

_EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
_TICKET_RE = re.compile(r"\bINC-\d{6}\b")
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)"
_IP_RE = re.compile(rf"(?<![\d.])(?:{_OCTET}\.){{3}}{_OCTET}(?![\d.])")

def redact_log(line: str) -> str:
    line = _EMAIL_RE.sub("[EMAIL]", line)
    line = _TICKET_RE.sub("[TICKET]", line)
    line = _IP_RE.sub("[IP]", line)
    return line
```

Model A (DeepSeek V4 Pro) correctly handles overlapping patterns with a single regex and replacer function, ensuring proper replacement priority and no missed matches. Model B (GPT-5.5 Pro) uses separate regexes, which risk ordering bugs, and its email regex has minor flaws such as missing word boundaries and potential over-matching.
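To make the ordering concern concrete, here is a minimal sketch of how sequential passes can interfere with each other while a single alternation cannot. The function names, the simplified patterns, and the ticket-before-email ordering are all assumptions chosen for illustration; this is not the ordering GPT-5.5 Pro used (it ran the email pass first), and it is not either model's submitted code.

```python
import re

# Simplified illustrative patterns, not the models' exact regexes.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
TICKET = r"\bINC-\d{6}\b"


def redact_sequential(line: str) -> str:
    # Hypothetical ordering: tickets first, then emails.
    # The ticket pass can rewrite the local part of an email,
    # after which the email pattern no longer matches it.
    line = re.sub(TICKET, "[TICKET]", line)
    return re.sub(EMAIL, "[EMAIL]", line)


# One alternation: at each position the email branch is tried first,
# so priority is decided in one place and each span is replaced once.
COMBINED = re.compile(rf"(?P<email>{EMAIL})|(?P<ticket>{TICKET})")


def redact_single_pass(line: str) -> str:
    def repl(match: re.Match) -> str:
        return "[EMAIL]" if match.lastgroup == "email" else "[TICKET]"

    return COMBINED.sub(repl, line)


sample = "contact INC-123456@example.com about INC-654321"
print(redact_sequential(sample))   # contact [TICKET]@example.com about [TICKET]
print(redact_single_pass(sample))  # contact [EMAIL] about [TICKET]
```

With separate passes, correctness depends on remembering the right order everywhere the patterns are applied; with one combined pattern, the order of the alternatives is the only place priority is expressed.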

vendor-delay-update

Draft a workplace status update for the VP of Operations to send to regional warehouse managers. Situation: our barcode scanner vendor, North Quay Devices, delayed shipment of 420 replacement units from May 12 to May 19 because of a failed battery certification batch. We have enough spare scanners to cover only the Memphis and Reno sites; Tulsa and Allentown will need to share devices for one week. Ask managers to pause nonessential inventory recounts, prioritize outbound picking, and send daily shortage counts by 4 p.m. local time. Tone: calm, accountable, and practical. Length: 140–180 words.

DeepSeek: DeepSeek V4 Pro:

Subject: Update on Scanner Replacement Shipment – Temporary Adjustments Required

Team,

I want to make you aware of a delay in our barcode scanner replacement rollout. Our vendor, North Quay Devices, has informed us that 420 replacement units originally scheduled for delivery on May 12 will now arrive on May 19. The postponement stems from a failed battery certification batch; they are reworking the units to meet safety standards.

We’ve assessed our spare inventory. Memphis and Reno have sufficient backup scanners to operate normally. However, Tulsa and Allentown will need to share devices for the coming week.

To manage this, please take the following steps immediately:

- Pause all nonessential inventory recounts.
- Prioritize outbound picking to keep shipments on schedule.
- Send a daily shortage count (number of scanners needed to cover your operation) to me by 4 p.m. local time.

I recognize this creates a pinch, and I appreciate your flexibility. We’ll provide further updates if anything changes. Thank you for keeping our customers front and center.

Best,
[VP Name]
OpenAI: GPT-5.5 Pro:

Subject: Barcode Scanner Availability – Week of May 12

Team,

North Quay Devices has informed us that shipment of 420 replacement barcode scanners, originally due May 12, is now expected May 19 after a failed battery certification batch. We have confirmed the revised date and are holding the vendor accountable for daily progress updates.

Current spares are sufficient to keep Memphis and Reno fully covered. Tulsa and Allentown will need to share devices for the week of May 12, so please adjust shift handoffs and charging routines accordingly.

Until the replacements arrive, pause all nonessential inventory recounts and direct scanner availability first to outbound picking, then receiving and cycle-count activity tied to customer commitments. If you see a risk to service levels, escalate early.

Please send a daily scanner shortage count to Operations Planning by 4:00 p.m. local time, beginning May 12 and continuing until further notice.

Thank you for keeping teams focused and customers protected.

Model A (DeepSeek V4 Pro) adheres to the prompt more closely: it asks managers to send daily shortage counts by 4 p.m. local time, directly to the VP, without adding unprompted details such as shift handoffs or escalation instructions, and it keeps a calm, accountable, and practical tone. Model B (GPT-5.5 Pro) introduces minor extras and routes the counts to 'Operations Planning,' a slight deviation from the instructions. Both notes are high-quality and within the word limit.

meeting-notes-summary

Read the meeting notes below, then provide:

1) a 2-sentence summary
2) a JSON object with keys `launch_date`, `owner`, `blocked_by`, `open_questions` (array), and `decisions` (array)

Meeting notes:

- Project: Cedar Lane tenant portal refresh
- Maya said legal approved the new lease-upload wording after changing “instant approval” to “faster review.”
- Andre confirmed the frontend is done except for the maintenance banner behavior on iPad Mini.
- Priya wants launch on 2026-03-18, but only if payment autofill passes final QA by the 14th.
- Blocker: finance sandbox is still returning duplicate receipt IDs for ACH retries.
- Decision: remove dark mode from this release and revisit in Q3.
- Decision: keep SMS login, but make email login the default option.
- Open question: should users be able to delete stored bank accounts without calling support?
- Open question: do we localize the late-fee explainer for Quebec French now or after launch?
- Owner for launch checklist: Priya.

DeepSeek: DeepSeek V4 Pro:

1) The Cedar Lane tenant portal refresh is on track for a March 18, 2026 launch, contingent on payment autofill passing final QA by March 14, but progress is blocked by a finance sandbox bug that returns duplicate receipt IDs for ACH retries. The team decided to drop dark mode from this release and make email login the default while keeping SMS login, and open questions remain about self-service bank account deletion and the timing of Quebec French localization for the late-fee explainer.

2)

```json
{
  "launch_date": "2026-03-18",
  "owner": "Priya",
  "blocked_by": "finance sandbox returning duplicate receipt IDs for ACH retries",
  "open_questions": [
    "should users be able to delete stored bank accounts without calling support?",
    "do we localize the late-fee explainer for Quebec French now or after launch?"
  ],
  "decisions": [
    "remove dark mode from this release and revisit in Q3",
    "keep SMS login, but make email login the default option"
  ]
}
```
OpenAI: GPT-5.5 Pro:

The Cedar Lane tenant portal refresh is nearly ready: legal approved the revised lease-upload wording, and the frontend is complete except for iPad Mini maintenance banner behavior. The target launch is 2026-03-18, conditional on payment autofill passing final QA by 2026-03-14, with finance sandbox duplicate ACH receipt IDs still blocking progress.

```json
{
  "launch_date": "2026-03-18, conditional on payment autofill passing final QA by 2026-03-14",
  "owner": "Priya",
  "blocked_by": [
    "Finance sandbox is returning duplicate receipt IDs for ACH retries"
  ],
  "open_questions": [
    "Should users be able to delete stored bank accounts without calling support?",
    "Do we localize the late-fee explainer for Quebec French now or after launch?"
  ],
  "decisions": [
    "Remove dark mode from this release and revisit in Q3.",
    "Keep SMS login, but make email login the default option."
  ]
}
```

Model A (DeepSeek V4 Pro) follows the requested schema exactly and provides a clear 2-sentence summary plus correctly typed JSON fields. Model B's (GPT-5.5 Pro's) summary is good, but its JSON does not adhere to the specified structure: `launch_date` includes extra conditional text and `blocked_by` is an array instead of a single value.
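The kind of check behind this verdict can be sketched in a few lines. The helper names below are hypothetical, and the rules encode the judge's reading of the prompt rather than anything the prompt states outright: `launch_date` is assumed to be a bare `YYYY-MM-DD` string, and `owner` and `blocked_by` are assumed to be single strings. Only `open_questions` and `decisions` are explicitly required to be arrays.

```python
import json
import re
from datetime import date


def is_iso_date(value) -> bool:
    # Assumption: a valid launch_date is exactly YYYY-MM-DD and a real date.
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def schema_problems(raw: str) -> list[str]:
    """Return a list of human-readable schema deviations (empty if none)."""
    obj = json.loads(raw)
    problems = []
    if not is_iso_date(obj.get("launch_date")):
        problems.append("launch_date is not a bare YYYY-MM-DD date")
    for key in ("owner", "blocked_by"):
        if not isinstance(obj.get(key), str):
            problems.append(f"{key} is not a single string")
    for key in ("open_questions", "decisions"):
        value = obj.get(key)
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            problems.append(f"{key} is not an array of strings")
    return problems
```

Run against the two responses above, a checker like this would return an empty list for the DeepSeek V4 Pro object and flag `launch_date` and `blocked_by` for the GPT-5.5 Pro object.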

messy-orders-to-json

Convert the messy order lines below into valid JSON as an array of objects. Use exactly this schema for each object and preserve input order:

`{"order_id": string, "customer": string, "items": [{"sku": string, "qty": integer}], "priority": boolean, "ship_by": string|null}`

Rules:

- Normalize `priority` to true/false.
- Normalize missing ship date words like `none`, `tbd`, `-` to null.
- Trim spaces around values.
- `items` are separated by `;` and each item is `SKU xQTY`.

Data:

Order=QX-1042 | customer: Larkspur Clinic | items: AB-9 x2; TT-41 x1 | priority=YES | ship_by=2026-07-02

Order=QX-1043|customer: Mira & Son Catering|items: P-88 x12 |priority=no|ship_by=none

Order = QX-1044 | customer: Oak Route Studio | items: ZK-2 x3; MN-7 x5; MN-7 x1 | priority = true | ship_by = TBD

Order=QX-1045 | customer: Heliotrope Labs | items: R1 x1 | priority = false | ship_by = 2026-07-05

DeepSeek: DeepSeek V4 Pro: [ { "order_id": "QX-1042", "customer": "Larkspur Clinic", "items": [ { "sku": "AB-9", "qty": 2 }, { "sku": "TT-41", "qty": 1 } ], "priority": true, "ship_by": "2026-07-02" }, { "order_id": "QX-1043", "customer": "Mira & Son Catering", "items": [ { "sku": "P-88", "qty": 12 } ], "priority": false, "ship_by": null }, { "order_id": "QX-1044", "customer": "Oak Route Studio", "items": [ { "sku": "ZK-2", "qty": 3 }, { "sku": "MN-7", "qty": 5 }, { "sku": "MN-7", "qty": 1 } ], "priority": true, "ship_by": null }, { "order_id": "QX-1045", "customer": "Heliotrope Labs", "items": [ { "sku": "R1", "qty": 1 } ], "priority": false, "ship_by": "2026-07-05" } ]
OpenAI: GPT-5.5 Pro: [ { "order_id": "QX-1042", "customer": "Larkspur Clinic", "items": [ { "sku": "AB-9", "qty": 2 }, { "sku": "TT-41", "qty": 1 } ], "priority": true, "ship_by": "2026-07-02" }, { "order_id": "QX-1043", "customer": "Mira & Son Catering", "items": [ { "sku": "P-88", "qty": 12 } ], "priority": false, "ship_by": null }, { "order_id": "QX-1044", "customer": "Oak Route Studio", "items": [ { "sku": "ZK-2", "qty": 3 }, { "sku": "MN-7", "qty": 5 }, { "sku": "MN-7", "qty": 1 } ], "priority": true, "ship_by": null }, { "order_id": "QX-1045", "customer": "Heliotrope Labs", "items": [ { "sku": "R1", "qty": 1 } ], "priority": false, "ship_by": "2026-07-05" } ]

Both outputs are valid JSON, preserve input order, match the required schema exactly, and correctly normalize priority and ship_by values. There are no substantive differences in quality or correctness between them.
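For reference, the normalization rules in this task are mechanical enough to express as a short parser. The sketch below is one possible reading of the rules, not what either model ran; `parse_order`, `NULL_WORDS`, and `TRUE_WORDS` are hypothetical names, and treating any unrecognized `priority` value as false is an assumption the prompt does not settle.

```python
import json
import re

NULL_WORDS = {"none", "tbd", "-", ""}   # ship_by values treated as missing
TRUE_WORDS = {"yes", "true"}            # assumption: anything else is false


def parse_order(line: str) -> dict:
    # Split "key = value" or "key: value" fields separated by "|".
    fields = {}
    for part in line.split("|"):
        key, value = re.split(r"[=:]", part, maxsplit=1)
        fields[key.strip().lower()] = value.strip()

    # Each item looks like "SKU xQTY"; split on the last " x".
    items = []
    for chunk in fields["items"].split(";"):
        sku, qty = chunk.strip().rsplit(" x", 1)
        items.append({"sku": sku.strip(), "qty": int(qty)})

    ship_by = fields["ship_by"]
    return {
        "order_id": fields["order"],
        "customer": fields["customer"],
        "items": items,
        "priority": fields["priority"].lower() in TRUE_WORDS,
        "ship_by": None if ship_by.lower() in NULL_WORDS else ship_by,
    }


lines = [
    "Order=QX-1043|customer: Mira & Son Catering|items: P-88 x12 |priority=no|ship_by=none",
    "Order = QX-1044 | customer: Oak Route Studio | items: ZK-2 x3; MN-7 x5; MN-7 x1 | priority = true | ship_by = TBD",
]
print(json.dumps([parse_order(line) for line in lines], indent=2))
```

Note that the duplicate `MN-7` entries are kept as separate items, matching both models' outputs; the prompt asks to preserve the input, not to merge quantities.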

Matchup powered by OpenRouter.