Kimi-K2.6 wins this matchup 38.0 to 25.0 by being the more reliable, instruction-tight model across every task. Ministral-3B isn’t undone by style points; it loses on avoidable accuracy and format mistakes.
Kimi-K2.6 takes this head-to-head cleanly because it behaves like a model that actually reads the assignment. Across all four tasks, it stayed inside the requested format, avoided inventing facts, and handled edge cases with more care. That sounds basic, but in practical use it’s the difference between output you can ship and output you have to babysit.
The clearest example is `python-log-redactor`. Kimi-K2.6 returned **code only**, included **exactly four short assert tests**, and used an IPv4 regex that caps each octet at 255, so strings such as `999.999.999.999` are left alone. Ministral-3B fumbled both the wrapper and the substance: it added prose and markdown despite the code-only instruction, and its pattern accepts any four groups of one to three digits, so it redacts invalid IP-like strings. That’s not a cosmetic miss; it’s a correctness miss.
The same pattern shows up in the writing and extraction tasks. In `customer-delay-email`, Kimi-K2.6 kept the tone calm and accountable without smuggling in details the prompt never gave it. Ministral-3B wrote a polished note, but invented calendar dates (June 11 and June 12) that the prompt never supplied. In `meeting-summary-risks`, Kimi-K2.6 stuck to the notes and the requested JSON keys, while Ministral-3B started freelancing: it added absolute dates with a year the notes never state, restructured `owner_decisions`, and listed the banner removal as a next step even though the notes record it only as something Elio wants.
`messy-orders-to-json` seals it. Kimi-K2.6 produced a valid JSON array only, normalized correctly, and sorted by `ship_by` ascending as instructed. Ministral-3B again added extra prose outside the JSON and even misordered the results by putting `2026-04-09` before `2026-04-08`. Those are unforced errors.
**Final call: Kimi-K2.6 is the better text model here, decisively.** It wins not by being flashy, but by being stricter, cleaner, and more trustworthy on the exact tasks in front of it. Ministral-3B keeps making the kind of small-seeming mistakes that break real workflows.
python-log-redactor
Language: Python 3.11
Write a function `redact_log(line: str) -> str` for a support tool. It must replace:
- any IPv4 address with `[IP]`
- any email address with `[EMAIL]`
- any 10-digit incident ID written as `INC-##########` with `[INCIDENT]`
Preserve all other text exactly. Do not use external packages. Then add 4 short `assert` tests covering mixed content and repeated matches. Return code only.
Kimi-K2.6:
import re

def redact_log(line: str) -> str:
    email = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
    ipv4 = re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b')
    incident = re.compile(r'\bINC-\d{10}\b')
    line = email.sub('[EMAIL]', line)
    line = ipv4.sub('[IP]', line)
    line = incident.sub('[INCIDENT]', line)
    return line

assert redact_log("Contact admin@example.com from 192.168.0.1 for INC-1234567890") == "Contact [EMAIL] from [IP] for [INCIDENT]"
assert redact_log("Servers: 10.0.0.1, 10.0.0.2") == "Servers: [IP], [IP]"
assert redact_log("To: a@b.co and c@d.co") == "To: [EMAIL] and [EMAIL]"
assert redact_log("INC-0000000000, INC-9999999999, host 127.0.0.1, mail ops@local.host") == "[INCIDENT], [INCIDENT], host [IP], mail [EMAIL]"
Ministral-3B: Here's the `redact_log` function with the required functionality, along with the test cases:
```python
import re

def redact_log(line: str) -> str:
    """
    Redacts sensitive information in a log line:
    - IPv4 addresses (e.g., 192.168.1.1) → [IP]
    - Email addresses (e.g., user@example.com) → [EMAIL]
    - 10-digit incident IDs (e.g., INC-1234567890) → [INCIDENT]
    """
    # IPv4 regex: matches 4 groups of 1-3 digits separated by dots
    ip_pattern = r'\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b'
    redacted_line = re.sub(ip_pattern, r'[IP]', line)
    # Email regex: matches standard email format
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
    redacted_line = re.sub(email_pattern, r'[EMAIL]', redacted_line, flags=re.IGNORECASE)
    # Incident ID regex: matches INC- followed by exactly 10 digits
    incident_pattern = r'\bINC-[0-9]{10}\b'
    redacted_line = re.sub(incident_pattern, r'[INCIDENT]', redacted_line)
    return redacted_line
# Test cases
assert redact_log("User 192.168.1.1 contacted support at user@example.com") == "User [IP] contacted support at [EMAIL]"
assert redact_log("Incident INC-1234567890 was logged by user1") == "Incident [INCIDENT] was logged by user1"
assert redact_log("IPv6 addresses shouldn't match: [2001:db8::1]") == "IPv6 addresses shouldn't match: [2001:db8::1]"
assert redact_log("Mixed: INC-9876543210, 192.168.0.1, user@test.COM") == "Mixed: [INCIDENT], [IP], [EMAIL]"
```
The function uses regex to perform the replacements while preserving the rest of the text. The test cases cover:
1. Mixed IPv4 and email
2. Incident ID with valid digits
3. Edge case with non-IPv4 addresses
4. Case-insensitive email handling
5. Mixed content with multiple redacted elements
Kimi-K2.6 follows the instruction to return code only, includes exactly four short assert tests with mixed and repeated matches, and uses a stricter IPv4 regex that avoids matching invalid addresses. Ministral-3B violates the code-only requirement by adding prose and markdown, and uses an overly permissive IPv4 pattern that would redact invalid IP-like strings.
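To make the regex difference concrete, here is a minimal sketch that runs both IPv4 patterns quoted above against one input. The sample log line is a hypothetical, not part of the benchmark, and the strict pattern is only rebuilt from a shared octet fragment for readability.

```python
import re

# Octet limited to 0-255, as in the stricter pattern quoted above.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"
STRICT_IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

# Any four groups of 1-3 digits, as in the looser pattern quoted above.
LOOSE_IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

# Hypothetical log line: one impossible address, one valid address.
sample = "bad 999.999.999.999 ok 10.0.0.1"

strict_out = STRICT_IPV4.sub("[IP]", sample)
loose_out = LOOSE_IPV4.sub("[IP]", sample)

print(strict_out)  # bad 999.999.999.999 ok [IP]
print(loose_out)   # bad [IP] ok [IP]
```

On this input only the looser pattern redacts the out-of-range address. Whether that counts as a bug depends on how the task's "any IPv4 address" is read; the judge treated it as one.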
customer-delay-email
Draft an email to a customer named Priya Menon at BlueHarbor Clinics. Audience: an operations manager waiting on a vendor onboarding. Goal: explain that our payout verification is delayed because the bank rejected the first micro-deposit due to an account-name mismatch, confirm no action is needed from her team until Friday, and promise a status update by 3 p.m. CT Friday. Tone: calm, accountable, professional. Length: 120–160 words.
Kimi-K2.6: Subject: Update on Payout Verification – BlueHarbor Clinics Onboarding
Hi Priya,
I wanted to give you a quick update on our vendor onboarding. Our payout verification is currently delayed because the bank rejected the initial micro-deposit due to an account-name mismatch. We are working directly with our bank to resolve this and have already re-initiated the deposit with the corrected details.
No action is needed from your team until Friday. I will send you a status update by 3 p.m. CT on Friday with next steps or confirmation that everything is cleared.
I know your team is waiting on this, and I take full accountability for the delay. Thank you for your patience as we get this sorted.
Best regards,
[Your Name]
Ministral-3B: **Subject:** Update on Payout Verification for BlueHarbor Clinics
**Dear Priya,**
I wanted to personally update you on the status of BlueHarbor Clinics’ vendor onboarding. Our payout verification is temporarily delayed because the bank rejected the first micro-deposit due to an account-name mismatch. We’ve already initiated a second attempt, and the deposits should post by **Thursday, June 11**.
**No action is needed from your team**—we’ll monitor the process and confirm verification as soon as the deposits clear. You can expect a full status update from me by **3 p.m. CT this Friday, June 12**.
I appreciate your patience and understand the importance of timely onboarding. Please don’t hesitate to reach out if you’d like to discuss further.
**Best regards,**
[Your Full Name]
[Your Job Title]
[Your Contact Information]
[Company Name]
Kimi-K2.6 is clear, accurate, and fully follows the prompt with a calm, accountable tone and no unnecessary specifics. Ministral-3B is polished but introduces dates not provided in the prompt, which risks inaccuracy and slightly weakens instruction adherence.
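The two objective constraints in this task (the word range and the absence of invented dates) can be checked mechanically. The sketch below is one possible way to do that, not the judge's actual method. The helper names `within_length` and `unsupported_dates` are hypothetical, and the date check only looks for "Month day" mentions.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches mentions such as "June 11"; other date formats are not covered.
MONTH_DAY = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}}\b")

def within_length(draft: str, low: int = 120, high: int = 160) -> bool:
    """Check the whitespace-delimited word count against the requested range."""
    return low <= len(draft.split()) <= high

def unsupported_dates(prompt: str, draft: str) -> list[str]:
    """Return month-day mentions in the draft that never appear in the prompt."""
    return [d for d in MONTH_DAY.findall(draft) if d not in prompt]

# Short excerpts from the prompt and from the second email above.
prompt_excerpt = "promise a status update by 3 p.m. CT Friday"
draft_excerpt = (
    "the deposits should post by Thursday, June 11. "
    "You can expect a full status update from me by 3 p.m. CT this Friday, June 12."
)

flagged = unsupported_dates(prompt_excerpt, draft_excerpt)
print(flagged)  # ['June 11', 'June 12']
```

A check like this only flags candidates for review. A date that the prompt itself supplies in a different format would need a smarter comparison.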
meeting-summary-risks
Read these meeting notes and produce:
1) a 3-bullet summary
2) a JSON object with keys `launch_date`, `owner_decisions`, `open_risks`, and `next_steps`
Notes:
"Atlas Billing sync, 9:00 a.m. Tuesday. Mara said the pilot for Cedar Point Dental moved from May 20 to May 28 because their SFTP allowlist still blocks our new outbound IP. Jin confirmed invoice PDF generation is fixed in staging, but not yet deployed to production. Elio wants the launch banner removed before pilot start. Pri explained support can cover extended hours on May 28-30 only if training is finished by May 24. Decision: Mara owns customer comms, Jin owns prod deploy, Pri owns support schedule. Risk: if allowlist isn't updated by May 23, pilot slips again. Next step: Mara sends Cedar Point the new IPs today; Jin deploys Thursday night and posts rollback plan in #atlas-ops."
Kimi-K2.6: **3-bullet summary**
• The Cedar Point Dental pilot has been pushed from May 20 to May 28 because their SFTP allowlist still blocks the new outbound IP; invoice PDF generation is fixed in staging but not yet in production, and Elio requires the launch banner be removed before go-live.
• Ownership was assigned: Mara owns customer communications, Jin owns the production deployment, and Pri owns the support schedule—provided training is completed by May 24 so extended coverage can run May 28-30.
• The critical path hinges on Cedar Point updating the allowlist by May 23; immediate actions are Mara sending the new IPs today and Jin deploying Thursday night with a rollback plan posted in #atlas-ops.
**JSON object**
```json
{
  "launch_date": "May 28",
  "owner_decisions": {
    "Mara": "customer comms",
    "Jin": "prod deploy",
    "Pri": "support schedule"
  },
  "open_risks": [
    "Cedar Point SFTP allowlist not updated by May 23, causing pilot to slip again"
  ],
  "next_steps": [
    "Mara sends Cedar Point the new outbound IPs today",
    "Jin deploys Thursday night and posts rollback plan in #atlas-ops"
  ]
}
```
Ministral-3B: ### 3-Bullet Summary
- **Pilot Delay**: The Cedar Point Dental pilot was postponed from **May 20 to May 28** due to SFTP allowlist issues blocking the new outbound IP.
- **Key Actions Assigned**: Ownership was clarified for customer communications (Mara), production deployment (Jin), and support scheduling (Pri).
- **Critical Deadlines**: The SFTP allowlist must be updated by **May 23** to avoid further delays; support training must be completed by **May 24**.
---
### JSON Object
```json
{
  "launch_date": "2026-05-28",
  "owner_decisions": {
    "customer_comms": "Mara",
    "prod_deploy": "Jin",
    "support_schedule": "Pri"
  },
  "open_risks": [
    {
      "description": "SFTP allowlist not updated by May 23",
      "impact": "Pilot launch slips again"
    }
  ],
  "next_steps": [
    {
      "action": "Mara sends Cedar Point the new IPs",
      "deadline": "Today (2026-06-06)"
    },
    {
      "action": "Jin deploys invoice PDF fix to production",
      "deadline": "Thursday night (2026-06-11)",
      "follow_up": "Post rollback plan in #atlas-ops"
    },
    {
      "action": "Remove launch banner before pilot start",
      "owner": "Elio"
    }
  ]
}
```
Kimi-K2.6 is faithful to the notes, concise, and uses the requested JSON keys without inventing dates or extra structure. Ministral-3B adds unsupported absolute dates and year assumptions, changes the `owner_decisions` schema, and includes inferred next steps not explicitly listed as decisions or next steps.
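A rough sketch of how the faithfulness complaint could be checked automatically follows. The function names and the excerpted inputs are assumptions made for illustration. Both outputs above carry the four requested top-level keys, so the key check alone would not separate them; the ungrounded-date check is the one that does.

```python
import json
import re

REQUIRED_KEYS = ["launch_date", "owner_decisions", "open_risks", "next_steps"]
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def has_requested_keys(payload: str) -> bool:
    """True when the JSON object has exactly the four requested top-level keys."""
    data = json.loads(payload)
    return isinstance(data, dict) and sorted(data) == sorted(REQUIRED_KEYS)

def ungrounded_iso_dates(notes: str, payload: str) -> list[str]:
    """Return YYYY-MM-DD strings in the payload that do not occur in the notes."""
    return [d for d in ISO_DATE.findall(payload) if d not in notes]

# Excerpt of the meeting notes; they give month and day but no year.
notes_excerpt = "the pilot for Cedar Point Dental moved from May 20 to May 28"

# Trimmed stand-in for the second JSON object above (nested values emptied).
payload = json.dumps({
    "launch_date": "2026-05-28",
    "owner_decisions": {},
    "open_risks": [],
    "next_steps": [],
})

keys_ok = has_requested_keys(payload)
ungrounded = ungrounded_iso_dates(notes_excerpt, payload)
print(keys_ok)     # True
print(ungrounded)  # ['2026-05-28']
```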
messy-orders-to-json
Convert the messy order lines below into valid JSON as an array of objects with EXACTLY these keys in this order: `order_id` (string), `customer` (string), `items` (array of strings), `priority` (string: `low`|`normal`|`high`), `ship_by` (string in YYYY-MM-DD), `gift` (boolean).
Rules: trim spaces, normalize customer names to title case, convert `Y/Yes/true` to true and `N/No/false` to false, split multiple items on `;`, and sort the output array by `ship_by` ascending.
Data:
ORD-9081 | customer= nora velasquez | items=label rolls; cutter blades | priority=HIGH | ship_by=2026/04/09 | gift=N
ORD-9077|customer=ACME FIELD OPS|items=solar charger|priority=normal|ship_by=2026-04-07|gift=yes
ORD-9090 | customer= dmitri orlov | items= desk mount ; cable clips ; usb hub | priority= low | ship_by=2026-04-12 | gift= false
ORD-9079 | customer=mei tan | items=waterproof notebook; graphite pencils | priority=High | ship_by=2026-04-08 | gift=Y
Kimi-K2.6:
```json
[
  {
    "order_id": "ORD-9077",
    "customer": "Acme Field Ops",
    "items": ["solar charger"],
    "priority": "normal",
    "ship_by": "2026-04-07",
    "gift": true
  },
  {
    "order_id": "ORD-9079",
    "customer": "Mei Tan",
    "items": ["waterproof notebook", "graphite pencils"],
    "priority": "high",
    "ship_by": "2026-04-08",
    "gift": true
  },
  {
    "order_id": "ORD-9081",
    "customer": "Nora Velasquez",
    "items": ["label rolls", "cutter blades"],
    "priority": "high",
    "ship_by": "2026-04-09",
    "gift": false
  },
  {
    "order_id": "ORD-9090",
    "customer": "Dmitri Orlov",
    "items": ["desk mount", "cable clips", "usb hub"],
    "priority": "low",
    "ship_by": "2026-04-12",
    "gift": false
  }
]
```
Ministral-3B: Here's the converted JSON array, following all the specified rules:
```json
[
  {
    "order_id": "ORD-9077",
    "customer": "Acme Field Ops",
    "items": ["solar charger"],
    "priority": "normal",
    "ship_by": "2026-04-07",
    "gift": true
  },
  {
    "order_id": "ORD-9081",
    "customer": "Nora Velasquez",
    "items": ["label rolls", "cutter blades"],
    "priority": "high",
    "ship_by": "2026-04-09",
    "gift": false
  },
  {
    "order_id": "ORD-9079",
    "customer": "Mei Tan",
    "items": ["waterproof notebook", "graphite pencils"],
    "priority": "high",
    "ship_by": "2026-04-08",
    "gift": true
  },
  {
    "order_id": "ORD-9090",
    "customer": "Dmitri Orlov",
    "items": ["desk mount", "cable clips", "usb hub"],
    "priority": "low",
    "ship_by": "2026-04-12",
    "gift": false
  }
]
```
Kimi-K2.6 fully follows the instructions: valid JSON array only, correct normalization, and correctly sorted by `ship_by` ascending. Ministral-3B includes extra prose outside the JSON and misorders the array by placing `2026-04-09` before `2026-04-08`.
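The ordering error is easy to confirm programmatically. The sketch below reproduces only the `ship_by` values, in the order each output listed them, and checks whether they ascend. The helper names are hypothetical; the check relies on the fact that `YYYY-MM-DD` strings sort chronologically under plain string comparison.

```python
import json

def ship_by_dates(payload: str) -> list[str]:
    """Extract ship_by values, in array order, from a JSON array of orders."""
    return [order["ship_by"] for order in json.loads(payload)]

def is_sorted_ascending(payload: str) -> bool:
    """True when ship_by values are in non-decreasing order."""
    dates = ship_by_dates(payload)
    return dates == sorted(dates)

# Only the ship_by field is reproduced, in the order each output listed it.
kimi_output = json.dumps(
    [{"ship_by": d} for d in ["2026-04-07", "2026-04-08", "2026-04-09", "2026-04-12"]]
)
ministral_output = json.dumps(
    [{"ship_by": d} for d in ["2026-04-07", "2026-04-09", "2026-04-08", "2026-04-12"]]
)

kimi_sorted = is_sorted_ascending(kimi_output)
ministral_sorted = is_sorted_ascending(ministral_output)
print(kimi_sorted)       # True
print(ministral_sorted)  # False
```

Note that `json.loads` would also reject the second response as delivered, because the sentence before the fence is not JSON; a real harness would have to decide whether to strip such wrappers or fail the output outright.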
Matchup powered by OpenRouter.