Head to head: Anthropic: Claude Opus 4.8 vs Kimi K2.7 Code

Anthropic: Claude Opus 4.8 vs Kimi K2.7 Code

Claude Opus 4.8 sweeps three of four tasks with sharper regex engineering, more polished prose, and cleaner structured output—Kimi K2.7 Code only manages a tie on the JSON normalization task.

This wasn't a nail-biter. Claude Opus 4.8 took a decisive 37–31 win over Kimi K2.7 Code, and the per-task breakdown tells a clear story: Opus simply operates with more precision and better judgment across the board. The only task Kimi didn't lose outright was `messy-orders-to-json`, where both models produced identical, rule-compliant output. A tie on a rigid data-transformation task is fine, but it's not where the real differentiation happens. On `python-log-redactor`, the gap was technical and unambiguous. Opus deployed lookahead/lookbehind assertions to avoid mangling invalid IPs and to preserve surrounding punctuation—exactly the kind of defensive regex craftsmanship you want in production code. Kimi's simpler regex, by contrast, risked matching partial numbers or invalid IPs and lacked email boundary checks. One model wrote a surgical instrument; the other wrote a blunt tool. The `customer-delay-email` task exposed a softer but equally important weakness in Kimi's output. Opus delivered a warmer, more polished email with a genuine-sounding apology, clearer reassurance, and a proactive offer to schedule a call. Kimi's version wasn't broken, but it read as more abrupt and transactional—fine for an internal memo, not for a customer you're trying to retain. Tone matters, and Opus nailed it. Then came `meeting-notes-summary`, where structured output discipline separated the two. Opus produced clean JSON with a properly formatted `launch_date` and a self-contained `budget_change` description. Kimi embedded conditional text inside the date field and omitted the purpose of the budget change entirely. That's not a stylistic difference—it's a data-integrity problem. If you're piping model output into downstream systems, Kimi's sloppiness here would break things. **Claude Opus 4.8 is the clear winner. It writes better regex, better prose, and better JSON. Kimi K2.7 Code is competent when the task is purely mechanical, but the moment judgment, tone, or structural rigor enters the picture, Opus pulls ahead and doesn't look back.**

python-log-redactor

Language: Python 3.11. Write a function `sanitize_log(line: str) -> str` for an API gateway. Replace any IPv4 address with `[IP]` and any email address with `[EMAIL]`, but leave surrounding punctuation untouched. Treat emails case-insensitively. Do not modify invalid IPv4s like `999.1.2.3`. Examples: `"Alert from 10.24.8.9 by Nia.Ross@BlueHarbor.io."` -> `"Alert from [IP] by [EMAIL]."`; `"cc:<ops+night@acme-example.org>, src=192.168.0.14"` -> `"cc:<[EMAIL]>, src=[IP]"`. Return code only.

Anthropic: Claude Opus 4.8: ```python import re def sanitize_log(line: str) -> str: octet = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])" ipv4_re = re.compile( r"(?<![\w.])" + octet + r"(?:\." + octet + r"){3}" + r"(?![\w.])" ) email_re = re.compile( r"(?<![\w.+-])" r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}" r"(?![\w.])", re.IGNORECASE, ) line = email_re.sub("[EMAIL]", line) line = ipv4_re.sub("[IP]", line) return line ```
Kimi K2.7 Code: ```python import re _IPV4_RE = re.compile( r'(?<!\d)(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3}(?!\d)' ) _EMAIL_RE = re.compile( r'[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}', re.IGNORECASE, ) def sanitize_log(line: str) -> str: line = _EMAIL_RE.sub('[EMAIL]', line) line = _IPV4_RE.sub('[IP]', line) return line ```

Model A correctly uses lookahead/lookbehind assertions to avoid modifying invalid IPs and to preserve surrounding punctuation, while Model B's simpler regex could match partial numbers or invalid IPs and lacks boundary checks for emails.

customer-delay-email

Draft an email to a business customer, Harbor & Finch Retail, from the account manager at Northline Analytics. Situation: their custom weekly inventory export is delayed because a validation bug was found during final checks. We now expect delivery by Thursday 3:00 PM Central instead of Tuesday morning. Include a brief apology, reassure them no source data was lost, mention we will send a sample file by Wednesday noon, and offer a 20-minute call if helpful. Audience: operations director. Tone: calm, accountable, professional. Length: 130-170 words.

Anthropic: Claude Opus 4.8: **Subject: Update on Your Weekly Inventory Export — Revised Delivery Thursday 3:00 PM Central** Dear [Operations Director Name], I want to give you a direct update on your custom weekly inventory export. During our final checks, we identified a validation bug that we need to resolve before delivering the file. As a result, the export will now be ready by Thursday at 3:00 PM Central, rather than Tuesday morning. I apologize for the delay and any disruption this causes to your planning. I also want to reassure you that none of your source data was affected or lost—the issue is limited to a validation step in our export process, and the underlying data remains intact. To keep you informed along the way, we will send you a sample file by Wednesday at noon so you can review the format and confirm it meets your needs. If it would be helpful, I'm happy to set up a brief 20-minute call to walk through the details and answer any questions. Thank you for your patience. Best regards, [Your Name] Account Manager, Northline Analytics
Kimi K2.7 Code: **Subject:** Update on this week’s inventory export delivery Dear [Name], I’m writing to let you know that this week’s custom inventory export for Harbor & Finch Retail will be delayed. During our final checks we discovered a validation bug that must be resolved before delivery. We now expect to deliver the file by **Thursday at 3:00 PM Central**, rather than Tuesday morning. I apologize for the inconvenience. Please know that **no source data has been lost or compromised**; the issue is limited to the export validation step. To help you prepare, we will send a representative **sample file by Wednesday at noon Central** for your review. If a brief **20-minute call** would be useful to walk through the updated timing or sample, please let me know and I’ll set it up. Best regards, [Your Name] Account Manager Northline Analytics

Model A is more polished and professional, with a warmer apology, clearer reassurance, and a more proactive offer for the call, while Model B is slightly more abrupt and less customer-focused.

meeting-notes-summary

Read these meeting notes and produce: 1) a 2-sentence summary 2) a JSON object with keys `decision`, `owner`, `launch_date`, `blocked_by`, and `budget_change` Notes: - PulsePeak mobile app release review, 9 May - Mira: crash rate dropped from 2.8% to 0.6% after the image-cache patch - Devlin: Android build passed; iOS still waiting on legal approval for the new heart-zone disclaimer text - Team agreed not to ship on 13 May; target moved to 20 May if legal signs off by 15 May - Budget: +$4,200 for an extra week of contractor QA - Owner for legal follow-up: Sana - Open concern: translated screenshots for Polish are nice-to-have, not a blocker

Anthropic: Claude Opus 4.8: **1) Summary** The PulsePeak team confirmed the image-cache patch reduced the crash rate from 2.8% to 0.6% and that the Android build passed, but the iOS launch is held up pending legal approval of the heart-zone disclaimer text. They pushed the release from 13 May to a 20 May target—contingent on legal sign-off by 15 May—and approved an extra $4,200 for a week of contractor QA. **2) JSON** ```json { "decision": "Do not ship on 13 May; move launch target to 20 May if legal signs off by 15 May", "owner": "Sana (legal follow-up)", "launch_date": "2024-05-20", "blocked_by": "iOS legal approval for the new heart-zone disclaimer text", "budget_change": "+$4,200 for an extra week of contractor QA" } ```
Kimi K2.7 Code: 1) The PulsePeak mobile app will not launch on 13 May; the team set a new target of 20 May, contingent on legal approval of the iOS heart-zone disclaimer text by 15 May. Crash rates improved significantly after the image-cache patch, and the launch is blocked only by legal sign-off, while Polish translated screenshots remain a non-blocking nice-to-have. 2) ```json { "decision": "Do not ship on 13 May; target launch moved to 20 May if legal signs off by 15 May", "owner": "Sana", "launch_date": "20 May (conditional on legal sign-off by 15 May)", "blocked_by": "iOS legal approval for the new heart-zone disclaimer text", "budget_change": "+$4,200" } ```

Model A provides a cleaner, more structured JSON with a properly formatted launch_date and a clear, self-contained budget_change description, while Model B embeds conditional text in the date field and omits the purpose of the budget change.

messy-orders-to-json

Convert the order notes below into valid JSON as an array of objects. Use exactly this schema for each object: `{"order_id": string, "customer": string, "items": [{"sku": string, "qty": number}], "ship_method": "ground"|"air"|"pickup", "rush": boolean}` Rules: trim spaces, uppercase SKUs, numeric qty only, normalize ship methods (`gnd`->`ground`, `airmail`->`air`, `pick up`->`pickup`), and set `rush` true only if the note explicitly says rush/urgent. Order notes: - id=Q-1041 | customer: Elm St. Deli | items: mx-44 x2; ab-9 x 1 | ship=gnd | note: urgent for Friday - customer=RivetLab, id:Q-1042, ship: pick up, items= zz-200 x4 - Q-1043 / customer "Sola Marine" / items: kp-7 x 12 ; rt-2 x3 / ship=airmail / note=standard - id = Q-1044 | customer = Tansy Clinic | items: n5-1 x1 | ship = air |

Anthropic: Claude Opus 4.8: ```json [ { "order_id": "Q-1041", "customer": "Elm St. Deli", "items": [ {"sku": "MX-44", "qty": 2}, {"sku": "AB-9", "qty": 1} ], "ship_method": "ground", "rush": true }, { "order_id": "Q-1042", "customer": "RivetLab", "items": [ {"sku": "ZZ-200", "qty": 4} ], "ship_method": "pickup", "rush": false }, { "order_id": "Q-1043", "customer": "Sola Marine", "items": [ {"sku": "KP-7", "qty": 12}, {"sku": "RT-2", "qty": 3} ], "ship_method": "air", "rush": false }, { "order_id": "Q-1044", "customer": "Tansy Clinic", "items": [ {"sku": "N5-1", "qty": 1} ], "ship_method": "air", "rush": false } ] ```
Kimi K2.7 Code: ```json [ { "order_id": "Q-1041", "customer": "Elm St. Deli", "items": [ { "sku": "MX-44", "qty": 2 }, { "sku": "AB-9", "qty": 1 } ], "ship_method": "ground", "rush": true }, { "order_id": "Q-1042", "customer": "RivetLab", "items": [ { "sku": "ZZ-200", "qty": 4 } ], "ship_method": "pickup", "rush": false }, { "order_id": "Q-1043", "customer": "Sola Marine", "items": [ { "sku": "KP-7", "qty": 12 }, { "sku": "RT-2", "qty": 3 } ], "ship_method": "air", "rush": false }, { "order_id": "Q-1044", "customer": "Tansy Clinic", "items": [ { "sku": "N5-1", "qty": 1 } ], "ship_method": "air", "rush": false } ] ```

Both outputs are identical in content, correctly applying all rules: trimmed spaces, uppercased SKUs, numeric quantities, normalized ship methods, and accurately set rush flags.

Matchup powered by OpenRouter.