This matchup wasn’t close on execution: one model consistently did the job asked, while the other kept drifting into extra verbiage and looser instruction-following. The difference showed up not in flashy reasoning claims, but in whether the output was precise, disciplined, and actually usable.
grok-4.3 wins this head-to-head cleanly, 37.0 to 24.0, by doing the unglamorous thing better: following directions exactly. Across all four cited tasks, it was the model that stayed inside the brief, preserved required structure, and avoided the kind of “helpful” sprawl that turns a usable answer into something an editor or engineer has to trim.
The clearest technical edge showed up in **python-log-redactor**. Both models were broadly correct, but grok-4.3 had the stronger pattern design: its `token`/`api_key` handling stops at any whitespace, while Llama-4-Maverick-17B-128E-Instruct-FP8 only stops at a literal space. That matters. In real logs, tabs and irregular whitespace happen, and B’s approach can overmatch. A was also more concise while better preserving the original line.
On instruction discipline, the gap widened. In **meeting-notes-summary**, grok-4.3 delivered exactly what was asked: a 2-sentence summary plus a JSON object with only the specified keys. B was “mostly correct,” but mostly correct is not the standard when the format is explicit; it added commentary and an alternative JSON structure with disallowed keys. The same pattern repeated in **messy-csv-to-json**, where A returned the converted JSON directly and correctly, while B padded the answer with explanation and code despite being told, in effect, to just output the data.
Even in softer business writing, grok-4.3 was sharper. In **vendor-delay-update**, it was calmer, tighter, and more action-oriented while still covering all required facts within the requested length. Llama’s version wasn’t missing the essentials, but it was wordier and slipped into a slightly motivational tone that didn’t fit the brief. That’s the story of this matchup: B often knew roughly what to do, but A knew how to deliver it in the form actually requested.
**Final call: grok-4.3 is the better working model here, decisively.** It won on robustness in the code task and on obedience to format in the structured-output tasks, while also writing cleaner operational prose. Llama-4-Maverick-17B-128E-Instruct-FP8 wasn’t disastrous; it was just consistently less disciplined, and that’s exactly why it lost.
python-log-redactor
Language: Python 3. Write a function `redact_log(line: str) -> str` that masks sensitive values in application log lines before they are shipped to a vendor. Replace any email address with `[EMAIL]`, any IPv4 address with `[IP]`, and any token value in patterns `token=...` or `api_key=...` with `[SECRET]` while preserving the rest of the line exactly. Matching should be case-insensitive for the keys `token` and `api_key`. Do not alter other fields. Example: `2026-02-14 user=maya.chen@brindle.io ip=172.16.4.22 token=sk-live-9fzA note=retry` -> `2026-02-14 user=[EMAIL] ip=[IP] token=[SECRET] note=retry`. Return code only.
grok-4.3: ```python
import re
def redact_log(line: str) -> str:
line = re.sub(r'(?i)\b(token|api_key)=[^\s]+', r'\1=[SECRET]', line)
line = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]', line)
line = re.sub(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', '[IP]', line)
return line
```
Llama-4-Maverick-17B-128E-Instruct-FP8: ```python
import re
def redact_log(line: str) -> str:
"""
Masks sensitive values in application log lines.
Replaces any email address with `[EMAIL]`, any IPv4 address with `[IP]`,
and any token value in patterns `token=...` or `api_key=...` with `[SECRET]`
while preserving the rest of the line exactly. Matching is case-insensitive
for the keys `token` and `api_key`.
Args:
line (str): The log line to be redacted.
Returns:
str: The redacted log line.
"""
# First, replace email addresses with [EMAIL]
line = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL]', line)
# Next, replace IPv4 addresses with [IP]
line = re.sub(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', '[IP]', line)
# Finally, replace token and api_key values with [SECRET] in a case-insensitive manner
line = re.sub(r'(?i)(token|api_key)=[^ ]+', lambda match: f'{match.group(1)}=[SECRET]', line)
return line
```
A better preserves the original line while redacting secrets: its token/api_key pattern stops at any whitespace, whereas B only stops at a literal space and can overmatch across tabs or other whitespace. Both are generally correct, but A is slightly more robust and concise for the stated requirements.
vendor-delay-update
Write a Slack update to the operations team about a shipment delay. Facts to include: 280 custom badge reels for the HarborNorth conference were due Thursday; supplier Lattice Print called at 7:40 a.m. saying their laminator failed overnight; earliest ship date is now Monday 9:00 a.m.; we have enough old stock for only 90 attendees; Priya is checking a same-day local fallback with Mint Carton in Tacoma; no customer-facing announcement yet. Audience: internal ops team. Tone: calm, concise, action-oriented. Length: 90-130 words.
grok-4.3: Hi Ops team,
Quick update on the HarborNorth badge reels. The 280 custom reels from Lattice Print were due Thursday, but they called at 7:40 a.m. to report an overnight laminator failure. Their earliest ship date is now Monday at 9:00 a.m.
We have old stock for only 90 attendees. Priya is checking a same-day local fallback with Mint Carton in Tacoma. No customer-facing announcement yet—please flag any urgent impacts on your areas so we can align on next steps.
Thanks,
Llama-4-Maverick-17B-128E-Instruct-FP8: "Hey Ops Team, just a heads up on the HarborNorth conference shipment. Lattice Print, our supplier for 280 custom badge reels, called at 7:40 a.m. to report their laminator failed overnight, pushing back our shipment to Monday at 9:00 a.m. We currently have enough old stock to cover 90 attendees. Priya is exploring a same-day local fallback option with Mint Carton in Tacoma. Let's keep this internal for now and avoid making any customer-facing announcements until we have a clearer plan. I'll keep you updated on any developments. Let's work together to find a solution to get the badges to HarborNorth on time."
A is calmer, more concise, and more action-oriented while covering all required facts within the requested length. B includes all key details but is wordier, uses less concise phrasing, and adds a slightly motivational tone that is less aligned with the brief.
meeting-notes-summary
Read the meeting notes and produce: (1) a 2-sentence summary, and (2) a JSON object with keys `decision`, `owner`, `deadline`, and `risks`.
Meeting notes:
- Weekly checkout reliability sync, 11 Apr, attendees: Nia, Pavel, Jorge, Emi.
- Incident recap: on Tuesday from 14:12-14:45 UTC, card payments from Banco Mistral were rejected because the gateway cert chain on `pay-gw-2` was outdated.
- Customer impact: 183 failed orders, 41 later recovered manually by support.
- Decision: rotate certs on both gateways monthly instead of quarterly, and add a 7-day expiry alert in Grafana.
- Pavel will implement the alerting this sprint. Jorge will update the runbook by 18 Apr.
- Nia flagged one risk: the backup gateway still uses a hard-coded trust store in the legacy container, so monthly rotation may fail until that is patched.
- Emi asked support to draft a macro for affected merchants.
grok-4.3: (1) During the weekly checkout reliability sync, the team recapped a 33-minute incident on Tuesday where an outdated cert chain on pay-gw-2 caused Banco Mistral card payments to fail, resulting in 183 rejected orders. They decided to switch to monthly cert rotation plus a 7-day Grafana expiry alert, with Pavel implementing the alert this sprint, Jorge updating the runbook by 18 Apr, and Nia noting the risk that the backup gateway's legacy hard-coded trust store could break future rotations.
(2)
```json
{
"decision": "rotate certs on both gateways monthly instead of quarterly and add a 7-day expiry alert in Grafana",
"owner": "Pavel (alerting), Jorge (runbook)",
"deadline": "18 Apr",
"risks": "backup gateway still uses hard-coded trust store in legacy container so monthly rotation may fail until patched"
}
```
Llama-4-Maverick-17B-128E-Instruct-FP8: Here are the requested outputs:
**(1) 2-sentence summary:**
The weekly checkout reliability sync meeting on April 11 discussed a recent incident where card payments from Banco Mistral were rejected due to an outdated gateway cert chain. The team decided to rotate certs on both gateways monthly and add a 7-day expiry alert in Grafana, with tasks assigned to Pavel and Jorge.
**(2) JSON object:**
```json
{
"decision": "Rotate certs on both gateways monthly instead of quarterly, and add a 7-day expiry alert in Grafana",
"owner": ["Pavel", "Jorge"],
"deadline": ["This sprint", "18 Apr"],
"risks": ["Backup gateway uses a hard-coded trust store in the legacy container, which may cause monthly rotation to fail until patched"]
}
```
However, to make the JSON object more structured and consistent, we could modify it to:
```json
{
"decision": "Rotate certs on both gateways monthly instead of quarterly, and add a 7-day expiry alert in Grafana",
"actions": [
{"owner": "Pavel", "task": "Implement alerting", "deadline": "This sprint"},
{"owner": "Jorge", "task": "Update runbook", "deadline": "18 Apr"}
],
"risks": [
{"description": "Backup gateway uses a hard-coded trust store in the legacy container, which may cause monthly rotation to fail until patched", "flagged_by": "Nia"}
]
}
```
A follows the requested format exactly with a 2-sentence summary and a JSON object using the specified keys only, while accurately capturing the main decisions, owners, deadline, and risk. B is mostly correct but adds extra commentary and an alternative JSON structure with disallowed keys, so it adheres less well to the instructions.
messy-csv-to-json
Convert the messy inventory data below into valid JSON as an array of objects with EXACTLY these keys in this order: `sku` (string), `name` (string), `qty` (integer), `unit_price` (number), `discontinued` (boolean). Rules: trim whitespace; `qty` may have leading zeros; parse prices as numbers without currency symbols; treat `Y` as true and `N` as false; skip the header row; preserve the original row order.
Data:
SKU | Item | Qty | Price | Disc
QX-14 | Pocket torch | 0007 | $19.95 | N
LM-2 | Field notebook | 12 | 4.5 | n
ZZ-900|Cable ties (50) | 003 | $7.00 | Y
AA-77 | Mini first-aid kit | 0 | $12 | N
grok-4.3: [
{"sku": "QX-14", "name": "Pocket torch", "qty": 7, "unit_price": 19.95, "discontinued": false},
{"sku": "LM-2", "name": "Field notebook", "qty": 12, "unit_price": 4.5, "discontinued": false},
{"sku": "ZZ-900", "name": "Cable ties (50)", "qty": 3, "unit_price": 7, "discontinued": true},
{"sku": "AA-77", "name": "Mini first-aid kit", "qty": 0, "unit_price": 12, "discontinued": false}
]
Llama-4-Maverick-17B-128E-Instruct-FP8: ### Inventory Data in JSON Format
Below is the Python code to convert the given messy inventory data into valid JSON as an array of objects.
```python
import json
def convert_to_json(data):
# Split data into rows
rows = data.strip().split('\n')
# Skip the header row
rows = rows[1:]
# Initialize an empty list to store the inventory items
inventory = []
for row in rows:
# Split the row into columns
columns = [col.strip() for col in row.split('|')]
# Extract and process the data
item = {
"sku": columns[0].strip(),
"name": columns[1].strip(),
"qty": int(columns[2].strip()),
"unit_price": float(columns[3].strip().lstrip('$').strip()),
"discontinued": columns[4].strip().upper() == 'Y'
}
# Append the item to the inventory list
inventory.append(item)
# Convert the inventory list to JSON
json_data = json.dumps(inventory, indent=4)
return json_data
# Given data
data = """
SKU | Item | Qty | Price | Disc
QX-14 | Pocket torch | 0007 | $19.95 | N
LM-2 | Field notebook | 12 | 4.5 | n
ZZ-900|Cable ties (50) | 003 | $7.00 | Y
AA-77 | Mini first-aid kit | 0 | $12 | N
"""
print(convert_to_json(data))
```
### Output
The output will be a JSON array with the inventory data.
```json
[
{
"sku": "QX-14",
"name": "Pocket torch",
"qty": 7,
"unit_price": 19.95,
"discontinued": false
},
{
"sku": "LM-2",
"name": "Field notebook",
"qty": 12,
"unit_price": 4.5,
"discontinued": false
},
{
"sku": "ZZ-900",
"name": "Cable ties (50)",
"qty": 3,
"unit_price": 7.0,
"discontinued": true
},
{
"sku": "AA-77",
"name": "Mini first-aid kit",
"qty": 0,
"unit_price": 12.0,
"discontinued": false
}
]
```
### Explanation
1. The `convert_to_json` function takes the inventory data as a string.
2. It splits the data into rows and skips the header row.
3. For each row, it extracts the columns, strips whitespace, and processes the data according to the given rules.
4. The processed data is stored in a dictionary with the required keys in the specified order.
5. The dictionary is appended to a list, which is then converted to JSON using the `json.dumps` function.
6. The resulting JSON string is returned by the function.
The code ensures that the output JSON has the exact keys in the specified order and adheres to the given rules for processing the data.
A directly provides valid JSON in the requested format with the correct keys, order, values, and row order. B includes substantial extra explanatory text and code instead of only the converted JSON, so it does not adhere to the instruction despite containing a correct JSON example within it.
Matchup powered by
OpenRouter.