grok-4.3 edges gpt-5.4-nano on execution, not flash

grok-4.3 vs gpt-5.4-nano

This was close on aggregate, but grok-4.3 wins because it made the fewer costly mistakes in structured-output work. gpt-5.4-nano was sharper on tone and regex edge cases, yet it gave back those gains by breaking instructions where precision mattered more.

The score says nail-biter — 33.8 to 33.4 — but the split is revealing. gpt-5.4-nano took both writing-adjacent tasks: **python-log-redaction-fix** and **release-delay-customer-email**. In the log-redaction task, B was simply more careful: it preserved separators and existing quotes better, and it dealt more explicitly with quoted JSON-style values. In the customer email, B also had the stronger editorial instinct, matching the requested candid tone and laying out options more cleanly. But grok-4.3 won the task that mattered most for trust: **meeting-notes-summary-extract**. A stuck to the requested format, delivered the required two-sentence summary, and produced a valid JSON array without getting cute. B, by contrast, added markdown/code fences and invented absolute dates for relative phrases like “today” and “before next Wednesday.” That is exactly the kind of unnecessary improvisation that turns a usable model into one you have to babysit. The **messy-orders-to-json** tie reinforces the point. When the job was straightforward normalization, both models were competent: valid JSON, correct coercions, malformed `quantity: "two"` sensibly set to `null`. No daylight there. So this verdict comes down to error profile. gpt-5.4-nano is the nicer stylist and the more polished fixer on narrow tasks, but grok-4.3 was more disciplined when the instructions were brittle and the output format actually mattered. **Final call: grok-4.3 wins because it was the model less likely to create new problems while solving the prompt.**

python-log-redaction-fix

Language: Python 3.11. Fix the function below so it safely redacts sensitive values from app logs. It must replace the values for keys api_key, token, and ssn with "[REDACTED]" whether they appear as JSON-style pairs or query-string-style pairs, case-insensitively, without changing unrelated text. Preserve the original separators and quotes when possible. Return code only. Broken code: import re def redact_sensitive(line: str) -> str: pattern = r'(api_key|token|ssn)\s*[:=]\s*([^,\s}]+)' return re.sub(pattern, r'\1=[REDACTED]', line) # desired behavior examples: # 'POST /sync?token=abC9z&user=mina' -> 'POST /sync?token=[REDACTED]&user=mina' # '{"ssn": "903-44-1209", "name": "Ivo"}' -> '{"ssn": "[REDACTED]", "name": "Ivo"}' # 'api_key : ZX-77-PL, status=ok' -> 'api_key : [REDACTED], status=ok'

grok-4.3: ```python import re def redact_sensitive(line: str) -> str: pattern = r'(?i)(["\']?)(api_key|token|ssn)(["\']?)(\s*[:=]\s*)(["\']?)([^"\'&,\s}]+)(["\']?)' return re.sub(pattern, r'\1\2\3\4\5[REDACTED]\7', line) ```
gpt-5.4-nano: ```python import re def redact_sensitive(line: str) -> str: keys = r'(api_key|token|ssn)' value = r'([^,\s}"\'\]]+|"[^"]*"|\'[^\']*\')' # JSON-style: "key": value (optional spaces, preserve separators/quotes as much as possible) json_pattern = re.compile( rf'("{keys}")(\s*)(:)(\s*)({value})', flags=re.IGNORECASE ) # Query-string / form-style: key=value or key: value, preserving spaces around separators query_pattern = re.compile( rf'({keys})(\s*)([:=])(\s*)({value})', flags=re.IGNORECASE ) def _replace_value(match: re.Match) -> str: groups = match.groups() key = groups[0] # includes quotes for JSON pattern, otherwise raw sep_after_key = groups[1] sep = groups[2] sep_after_sep = groups[3] _old_value = groups[4] # Preserve quotes around string values when possible if _old_value and (_old_value[0] == _old_value[-1]) and _old_value[0] in ('"', "'"): quoted = _old_value[0] + "[REDACTED]" + _old_value[-1] return f"{key}{sep_after_key}{sep}{sep_after_sep}{quoted}" return f"{key}{sep_after_key}{sep}{sep_after_sep}[REDACTED]" # Apply JSON redaction first, then general key/value redaction line = json_pattern.sub(_replace_value, line) line = query_pattern.sub(_replace_value, line) return line ```

B better preserves separators and existing quotes around values, and more explicitly handles quoted JSON-style values as required. A is concise and mostly works, but its single regex is less robust for JSON cases and can mishandle edge cases around quoting and value boundaries.

release-delay-customer-email

Write an email to customers about a delayed shipment of the EmberLane Studio Desk in walnut. Audience: customers who preordered from our small furniture company. Tone: candid, calm, and professional; do not sound legalistic. Length: 140-180 words. Facts to include: a varnish supplier in Brno delivered a batch that failed our finish test; we will not ship desks with that defect; the new estimated ship window is 18-24 July; customers can keep their order, switch to oak for immediate dispatch, or get a full refund; as an apology, include a 12% refund on the desk price for anyone who keeps the walnut order. Include a clear subject line.

grok-4.3: Subject: Update on Your EmberLane Studio Desk in Walnut Hello, Thank you for preordering the EmberLane Studio Desk in walnut. We wanted to let you know about a short delay we’ve run into with the finish. A recent batch of varnish from our supplier in Brno failed our quality test. We won’t ship desks that don’t meet our standards, so we’re waiting for a fresh supply. The new estimated ship window is July 18–24. To thank you for your patience, anyone who keeps the walnut order will receive a 12% refund on the desk price. You can also switch to oak for immediate dispatch or request a full refund—whatever works best for you. Please reply to this email and let us know your preference. We appreciate your understanding and look forward to getting your desk to you soon. Best regards, EmberLane Studio
gpt-5.4-nano: **Subject:** EmberLane Studio Desk (Walnut) Update: New Ship Window + Options Hello EmberLane customer, We’re reaching out with a candid update about your preordered **EmberLane Studio Desk in walnut**. While preparing this batch, our **varnish supplier in Brno** delivered a batch that **failed our finish test**. We’re not going to ship any desks with that defect, even if it would mean sending them sooner. Our new estimated shipping window is **18–24 July**. If you’d like to keep moving, you have three options: 1) **Keep your walnut order** — we’ll ship it during the new window. 2) **Switch to oak for immediate dispatch**. 3) **Request a full refund**. As a sincere apology, customers who **keep the walnut order** will receive a **12% refund on the desk price**. If you tell us which option you prefer, we’ll confirm the next steps right away. Thank you for your patience and trust. Warmly, [Your Name] [Small Furniture Company Name]

Both emails are clear, professional, and include all required facts within the length range. B is slightly better because it more directly mirrors the requested candid tone, states the defect issue more explicitly, and presents the customer options in a cleaner, easier-to-scan format.

meeting-notes-summary-extract

Read these meeting notes and then do two things: (1) write a 2-sentence summary, and (2) extract the action items as a JSON array where each item has owner, task, and due_date. Notes: - Monday ops check-in for the Northline clinic rollout. - Priya: kiosk tablets at Glenferry arrived Friday; 11 of 12 enrolled correctly, last one stuck on setup screen. - Mateo: traced the setup issue to an expired MDM token in the staging profile, not hardware. - Decision: postpone Glenferry patient self-check-in launch from 6 May to 8 May. - Juno will renew the MDM token and re-enroll the failed tablet by 3 May. - Priya will notify clinic manager Elise Tan about the 2-day delay today. - Mateo to update the deployment runbook with the corrected staging-profile step before next Wednesday. - Separate note: Southport site remains on track for 13 May; no blockers. - Risk raised: courier labels for spare chargers still show old helpdesk number. - Anika to send corrected label artwork to vendor LumaPack by 2 May.

grok-4.3: The Monday ops check-in addressed tablet enrollment problems at Glenferry stemming from an expired MDM token, which prompted a decision to delay the patient self-check-in launch by two days to 8 May while Southport stayed on track for 13 May. Action items were assigned to fix the issue, communicate the delay, update documentation, and mitigate a labeling risk. [ {"owner": "Juno", "task": "renew the MDM token and re-enroll the failed tablet", "due_date": "3 May"}, {"owner": "Priya", "task": "notify clinic manager Elise Tan about the 2-day delay", "due_date": "today"}, {"owner": "Mateo", "task": "update the deployment runbook with the corrected staging-profile step", "due_date": "before next Wednesday"}, {"owner": "Anika", "task": "send corrected label artwork to vendor LumaPack", "due_date": "2 May"} ]
gpt-5.4-nano: **(1) 2-sentence summary:** During the Northline clinic ops check-in, the Glenferry kiosk tablet setup issue was traced to an expired MDM token in the staging profile rather than hardware, and the patient self-check-in launch was postponed from 6 May to 8 May. Juno will renew the MDM token and re-enroll the failed tablet by 3 May, while Priya and Mateo will handle customer/clinic communications and update the deployment runbook, and Anika will correct courier label artwork to resolve the helpdesk-number risk. **(2) Action items (JSON):** ```json [ { "owner": "Juno", "task": "Renew the MDM token and re-enroll the failed Glenferry tablet.", "due_date": "2026-05-03" }, { "owner": "Priya", "task": "Notify clinic manager Elise Tan about the 2-day delay (launch moved to 8 May).", "due_date": "2026-06-10" }, { "owner": "Mateo", "task": "Update the deployment runbook with the corrected staging-profile step.", "due_date": "2026-06-17" }, { "owner": "Anika", "task": "Send corrected courier label artwork to vendor LumaPack to remove old helpdesk number.", "due_date": "2026-05-02" } ] ```

A follows the requested format closely with a correct 2-sentence summary and a valid JSON array preserving the source due-date wording. B’s summary is fine, but it adds markdown/code fences and incorrectly invents absolute dates for relative due dates like “today” and “before next Wednesday,” reducing adherence and accuracy.

messy-orders-to-json

Convert the messy order lines below into valid JSON as an array of objects. Use exactly this schema for each object and no extra fields: {"order_id": string, "customer": string, "sku": string, "quantity": integer, "unit_price": number, "ship_country": string, "priority": boolean} Rules: trim spaces; order_id must stay a string; quantity is an integer; unit_price is a number without currency symbols; priority is true only for Y/yes/TRUE (case-insensitive), otherwise false; normalize country names to title case. Data: ORD-9081 | customer= Nabila Hart | sku: QN-44B | qty 3 | unit price $19.90 | country=canada | priority=Y ORD-9082|customer=R. Velasco|sku:LM-2|qty=1|unit price=7|country=UNITED STATES|priority=no ORD-9083 | customer = Teo Mirk | sku : AX-9Z | qty= 12 | unit price= €3.50 | country = germany | priority = TRUE ORD-9084 | customer= "Aya Song" | sku: PT-88 | qty two | unit price $14.00 | country=Japan | priority=N If a field is malformed and cannot be safely coerced, set it to null for that field only.

grok-4.3: [ {"order_id": "ORD-9081", "customer": "Nabila Hart", "sku": "QN-44B", "quantity": 3, "unit_price": 19.9, "ship_country": "Canada", "priority": true}, {"order_id": "ORD-9082", "customer": "R. Velasco", "sku": "LM-2", "quantity": 1, "unit_price": 7, "ship_country": "United States", "priority": false}, {"order_id": "ORD-9083", "customer": "Teo Mirk", "sku": "AX-9Z", "quantity": 12, "unit_price": 3.5, "ship_country": "Germany", "priority": true}, {"order_id": "ORD-9084", "customer": "Aya Song", "sku": "PT-88", "quantity": null, "unit_price": 14, "ship_country": "Japan", "priority": false} ]
gpt-5.4-nano: [ { "order_id": "ORD-9081", "customer": "Nabila Hart", "sku": "QN-44B", "quantity": 3, "unit_price": 19.9, "ship_country": "Canada", "priority": true }, { "order_id": "ORD-9082", "customer": "R. Velasco", "sku": "LM-2", "quantity": 1, "unit_price": 7, "ship_country": "United States", "priority": false }, { "order_id": "ORD-9083", "customer": "Teo Mirk", "sku": "AX-9Z", "quantity": 12, "unit_price": 3.5, "ship_country": "Germany", "priority": true }, { "order_id": "ORD-9084", "customer": "Aya Song", "sku": "PT-88", "quantity": null, "unit_price": 14, "ship_country": "Japan", "priority": false } ]

Both outputs are valid JSON arrays matching the exact schema, correctly normalize and coerce fields, and appropriately set the malformed quantity 'two' to null. There are no substantive differences in correctness or instruction adherence.

Matchup powered by OpenRouter.