Head to head: grok-4.3 vs Ministral-3B


This wasn’t a close stylistic split; it was a clean execution gap. grok-4.3 won every task by being more disciplined about instructions, format, and the small details that make outputs usable in the real world.

grok-4.3 takes this matchup 38.0 to 25.0, and the scoreline flatters Ministral-3B. Across all four tasks, the pattern is the same: grok-4.3 actually does the job asked, while Ministral-3B keeps introducing avoidable problems, such as mutating inputs, drifting from the requested tone, or breaking output constraints.

The coding task is the clearest example. In **python-merge-interval-bugfix**, grok-4.3 gets the whole brief right: it sorts without mutating the input, merges overlapping and touching intervals, and keeps the required function name intact. Ministral-3B repairs the merge logic, but then blows the non-mutation requirement by calling `windows.sort()`. That’s not a cosmetic miss; it’s the kind of bug that causes downstream surprises.

The writing and summarization tasks tell the same story. In **vendor-delay-status-update**, grok-4.3 matches the requested calm, direct, professional internal Slack tone, while Ministral-3B leans on emojis and alert-style formatting that make a routine ops update sound theatrical. In **meeting-notes-summary-actions**, grok-4.3 follows the requested structure with a real two-sentence summary and a clean JSON object, and it captures the important 8-store pilot scope decision. Ministral-3B misses that owner decision, uses bullets instead of the requested summary format, and freelances on the JSON schema.

Then there’s **messy-orders-to-json**, where grok-4.3 simply behaves like a model you can trust in production: valid JSON only, correct preservation of the customer name as "Noah Bennet," no extra decoration. Ministral-3B adds Markdown fences, despite being told not to, and silently changes the name to "Noah Bennett." That’s exactly the kind of seemingly small alteration that breaks pipelines and corrupts records.

**Final call: grok-4.3 wins decisively because it is more exact, more compliant, and more reliable on the details that matter. Ministral-3B isn’t losing on brilliance; it’s losing on discipline.**

python-merge-interval-bugfix

Practical coding: Python. Return code only. Fix the bug in this function so it merges overlapping maintenance windows for the same server. Input: a list of tuples (start_minute, end_minute) where end_minute is exclusive. The function should sort if needed, merge any intervals that overlap OR touch, and return a new list. Example: [(30,45),(10,20),(20,30),(50,55)] -> [(10,45),(50,55)].

Buggy code:

```python
def merge_windows(windows):
    if not windows:
        return []
    merged = [windows[0]]
    for start, end in windows[1:]:
        last_start, last_end = merged[-1]
        if start < last_end:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged
```

Requirements:
- Keep the function name `merge_windows`
- Do not mutate the input list
- Handle unsorted input
- Merge touching intervals, e.g. (100,120) and (120,125)
- Assume all tuples are valid with start_minute < end_minute

grok-4.3:

```python
def merge_windows(windows):
    if not windows:
        return []
    sorted_windows = sorted(windows)
    merged = [sorted_windows[0]]
    for start, end in sorted_windows[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged
```

Ministral-3B:

```python
def merge_windows(windows):
    if not windows:
        return []
    windows.sort()
    merged = [windows[0]]
    for start, end in windows[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged
```

grok-4.3 correctly sorts without mutating the input, merges overlapping and touching intervals, and preserves the required function name. Ministral-3B fixes the merge logic but violates the requirement not to mutate the input list by calling `windows.sort()`.
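The mutation difference is easy to observe directly. The sketch below is illustrative only: `merge_copy` and `merge_in_place` are hypothetical names standing in for the two submitted answers (both were submitted as `merge_windows`), `_merge_sorted` is a helper introduced here to avoid repeating the shared loop, and the sample input is the example from the prompt.

```python
def _merge_sorted(ordered):
    """Shared merge loop; expects a non-empty list already sorted by start."""
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:  # <= so touching intervals merge too
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def merge_copy(windows):
    """Mirrors the grok-4.3 approach: sorted() builds a new list."""
    if not windows:
        return []
    return _merge_sorted(sorted(windows))


def merge_in_place(windows):
    """Mirrors the Ministral-3B approach: list.sort() reorders the caller's list."""
    if not windows:
        return []
    windows.sort()
    return _merge_sorted(windows)


original = [(30, 45), (10, 20), (20, 30), (50, 55)]

a_input = list(original)
a_result = merge_copy(a_input)

b_input = list(original)
b_result = merge_in_place(b_input)

print(a_result == b_result)  # both approaches return the same merged list
print(a_input == original)   # the copy-based version leaves its input alone
print(b_input == original)   # the in-place version has reordered its input
```

Both versions return the same merged result; the only observable difference is what happens to the list the caller passed in, which is exactly the requirement the prompt spelled out.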

vendor-delay-status-update

Professional writing. Write a status update for our internal operations Slack channel about a vendor delay. Audience: warehouse supervisors and customer support leads. Tone: calm, direct, no blame. Length: 110-140 words.

Facts to include:
- Vendor: North Spoke Plastics
- Delayed item: 28 mm tamper seals for the Haviland cold-brew bottles
- PO: 44187-K
- Original dock date: Tuesday, 14 May
- New ETA: Friday, 17 May by 3:00 p.m.
- Current stock on hand covers packing through Wednesday morning only
- We are prioritizing wholesale orders for Juniper Market and Vale & Pine
- Customer support should avoid promising same-day shipment for cold-brew bottle orders until replenishment lands
- Next update will be posted by 4:30 p.m. today

grok-4.3: Team, We wanted to give everyone a quick update on the recent delay from our vendor North Spoke Plastics. This involves the 28 mm tamper seals for the Haviland cold-brew bottles under purchase order 44187-K. These were originally scheduled to arrive on Tuesday, 14 May, but we now expect them by Friday, 17 May at 3:00 p.m. With the stock we currently have on hand, packing can continue through Wednesday morning only. In light of this, we are prioritizing all wholesale orders for Juniper Market and Vale & Pine to keep those moving. Customer support should avoid promising same-day shipment for cold-brew bottle orders until the new stock is in. We will post the next status update by 4:30 p.m. today.
Ministral-3B: **Status Update:** 📢 **Vendor Delay Alert** 📢 We're experiencing a delay with our order from North Spoke Plastics. The 28 mm tamper seals for Haviland cold-brew bottles (PO: 44187-K) are running behind schedule. 📅 **Original Dock Date:** Tuesday, 14 May 📅 **New ETA:** Friday, 17 May by 3:00 p.m. Our current stock covers packing through Wednesday morning. We're prioritizing wholesale orders for Juniper Market and Vale & Pine. 📞 **Customer Support:** Please avoid promising same-day shipment for cold-brew bottle orders until replenishment arrives. We'll provide an update by 4:30 p.m. today. Thank you for your understanding and cooperation.

grok-4.3 better matches the requested calm, direct, professional Slack update and includes all required facts in a clear internal-ops style. Ministral-3B is accurate but less aligned in tone: the emojis, alert-style formatting, and unnecessary closing line make it feel less professional and more dramatic.

meeting-notes-summary-actions

Summarization & extraction. Read these meeting notes and produce:
1) a 2-sentence summary
2) a JSON object with keys `launch_date`, `owner_decisions`, `risks`, and `action_items`.

Meeting notes:
"""
OrbitLane weekly launch sync - 6 Feb
- Priya said the Android build 2.8.1 fixed the duplicate receipt bug, but the offline refund flow still fails if the cashier added more than 12 line items.
- Mateo wants the pilot to start on 19 Feb, limited to 8 stores, not the original 20.
- Lena confirmed legal approved the revised loyalty disclaimer text this morning.
- Risk: training guides are still in review with Retail Ops; if not signed off by 12 Feb, store managers won't have time to brief weekend staff.
- Decision: keep iOS in the pilot; remove kiosk mode from scope entirely.
- Action: Priya to deliver a hotfix candidate by 9 Feb.
- Action: Jonah to send the final store list by end of day Thursday.
- Action: Lena to publish the approved disclaimer in the help center.
- Open question: whether gift-card top-ups should be enabled during pilot; no decision yet.
"""

grok-4.3:

1) The OrbitLane team reviewed Android build fixes and legal sign-off on the loyalty disclaimer while deciding to launch a limited pilot on 19 Feb with only 8 stores, keeping iOS but dropping kiosk mode entirely. A key risk is delayed training-guide approval by Retail Ops, and owners were assigned hotfix, store-list, and help-center publication tasks, leaving gift-card top-ups undecided.

2)
```json
{
  "launch_date": "19 Feb",
  "owner_decisions": [
    "Limit pilot to 8 stores instead of 20",
    "Keep iOS in the pilot",
    "Remove kiosk mode from scope entirely"
  ],
  "risks": [
    "Training guides still in review with Retail Ops; missing 12 Feb sign-off would leave store managers no time to brief weekend staff"
  ],
  "action_items": [
    "Priya to deliver hotfix candidate by 9 Feb",
    "Jonah to send final store list by end of day Thursday",
    "Lena to publish approved disclaimer in the help center"
  ]
}
```

Ministral-3B:

### Summary
- The Android build 2.8.1 has fixed a bug but the offline refund flow still fails with more than 12 line items. The pilot will start on 19 Feb with 8 stores, and the loyalty disclaimer text has been approved.
- Risks include the delayed training guides, and decisions include keeping iOS in the pilot and removing kiosk mode from scope. Actions include delivering a hotfix candidate by 9 Feb, sending the final store list by end of day Thursday, and publishing the approved disclaimer in the help center.

### JSON Object
```json
{
  "launch_date": "19 Feb",
  "owner_decisions": [
    "Keep iOS in the pilot",
    "Remove kiosk mode from scope entirely"
  ],
  "risks": [
    "Training guides are still in review with Retail Ops; if not signed off by 12 Feb, store managers won't have time to brief weekend staff"
  ],
  "action_items": [
    {
      "action": "Deliver a hotfix candidate",
      "responsible": "Priya",
      "due_date": "9 Feb"
    },
    {
      "action": "Send the final store list",
      "responsible": "Jonah",
      "due_date": "End of day Thursday"
    },
    {
      "action": "Publish the approved disclaimer in the help center",
      "responsible": "Lena",
      "due_date": "N/A"
    }
  ]
}
```

grok-4.3 follows the requested format more closely, with a true 2-sentence summary and a JSON object whose fields cleanly capture the meeting decisions, including the 8-store pilot scope change. Ministral-3B is mostly correct but omits the 8-store scope change from `owner_decisions`, uses bullet points instead of a plain 2-sentence summary, and changes the `action_items` structure beyond what was asked.
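A small shape check can make the structural complaint concrete. The sketch below assumes a stricter reading than the prompt states, namely that the three list fields hold plain strings; the prompt only names the four keys, so `check_shape` and that string-only rule are hypothetical. The two sample payloads are abbreviated stand-ins for the two styles of answer, not the full outputs.

```python
import json

REQUIRED_KEYS = {"launch_date", "owner_decisions", "risks", "action_items"}
LIST_KEYS = ("owner_decisions", "risks", "action_items")


def check_shape(raw):
    """Return a list of problems found; an empty list means the shape matches."""
    data = json.loads(raw)
    problems = []
    if set(data) != REQUIRED_KEYS:
        problems.append("keys differ from the four requested")
    for key in LIST_KEYS:
        # Assumption (not stated in the prompt): entries should be plain strings.
        if not all(isinstance(entry, str) for entry in data.get(key, [])):
            problems.append(f"{key} contains non-string entries")
    return problems


# Abbreviated sample in the flat, string-list style.
flat = json.dumps({
    "launch_date": "19 Feb",
    "owner_decisions": ["Limit pilot to 8 stores instead of 20"],
    "risks": ["Training guides still in review with Retail Ops"],
    "action_items": ["Priya to deliver hotfix candidate by 9 Feb"],
})

# Abbreviated sample in the nested-object style.
nested = json.dumps({
    "launch_date": "19 Feb",
    "owner_decisions": ["Keep iOS in the pilot"],
    "risks": ["Training guides are still in review with Retail Ops"],
    "action_items": [
        {"action": "Deliver a hotfix candidate", "responsible": "Priya", "due_date": "9 Feb"}
    ],
})

print(check_shape(flat))
print(check_shape(nested))
```

A check like this only covers structure. Whether a decision such as the 8-store limit is missing from `owner_decisions` still needs a content-level comparison against the notes.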

messy-orders-to-json

Data wrangling / structured output. Transform the messy order notes below into valid JSON only. Output an object with one key, `orders`, whose value is an array of order objects sorted by `order_id` ascending.

Each order object must have exactly these keys:
- `order_id` (string)
- `customer` (string, title case)
- `items` (array of objects with keys `sku` string and `qty` integer)
- `ship_method` (one of `ground`, `air`, `pickup`)
- `priority` (boolean)

Rules:
- Normalize customer names to title case
- If an item appears twice within the same order, combine quantities
- Ignore comments in parentheses
- `priority=yes/true` => true, `priority=no/false` => false

Messy notes:
"""
#A-104 | customer: marta iverson | items: QX-4 x2, LM-9 x1, QX-4 x3 | ship=ground | priority=yes
A-102 / customer=ELI TRAN / ship:pickup / items:[RN-1*1; TK-7*2] / priority=false
order A-103 ; items ZP-8=4, RN-1=2 ; customer "noah bennet" ; PRIORITY: TRUE ; ship method = air
A-101, customer: reem al-hadi, items: LM-9 x 2, BC-3 x1, ship=ground, priority=no (hold gift wrap)
"""

grok-4.3:

{
  "orders": [
    {
      "order_id": "A-101",
      "customer": "Reem Al-Hadi",
      "items": [
        { "sku": "LM-9", "qty": 2 },
        { "sku": "BC-3", "qty": 1 }
      ],
      "ship_method": "ground",
      "priority": false
    },
    {
      "order_id": "A-102",
      "customer": "Eli Tran",
      "items": [
        { "sku": "RN-1", "qty": 1 },
        { "sku": "TK-7", "qty": 2 }
      ],
      "ship_method": "pickup",
      "priority": false
    },
    {
      "order_id": "A-103",
      "customer": "Noah Bennet",
      "items": [
        { "sku": "ZP-8", "qty": 4 },
        { "sku": "RN-1", "qty": 2 }
      ],
      "ship_method": "air",
      "priority": true
    },
    {
      "order_id": "A-104",
      "customer": "Marta Iverson",
      "items": [
        { "sku": "QX-4", "qty": 5 },
        { "sku": "LM-9", "qty": 1 }
      ],
      "ship_method": "ground",
      "priority": true
    }
  ]
}

Ministral-3B:

```json
{
  "orders": [
    {
      "order_id": "A-101",
      "customer": "Reem Al-Hadi",
      "items": [
        { "sku": "LM-9", "qty": 2 },
        { "sku": "BC-3", "qty": 1 }
      ],
      "ship_method": "ground",
      "priority": false
    },
    {
      "order_id": "A-102",
      "customer": "Eli Tran",
      "items": [
        { "sku": "RN-1", "qty": 1 },
        { "sku": "TK-7", "qty": 2 }
      ],
      "ship_method": "pickup",
      "priority": false
    },
    {
      "order_id": "A-103",
      "customer": "Noah Bennett",
      "items": [
        { "sku": "ZP-8", "qty": 4 },
        { "sku": "RN-1", "qty": 2 }
      ],
      "ship_method": "air",
      "priority": true
    },
    {
      "order_id": "A-104",
      "customer": "Marta Iverson",
      "items": [
        { "sku": "QX-4", "qty": 5 },
        { "sku": "LM-9", "qty": 1 }
      ],
      "ship_method": "ground",
      "priority": true
    }
  ]
}
```

grok-4.3 fully follows the instruction to output valid JSON only and correctly preserves the source customer name as "Noah Bennet". Ministral-3B adds Markdown code fences, violating the output-format requirement, and also changes the customer name to "Noah Bennett," which is not supported by the input.
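Both failure modes are the kind a downstream consumer would hit right away. The sketch below is a minimal illustration, not the grader's actual method: `parses_as_json` and `combine_items` are hypothetical helpers, the payload is a trimmed stand-in for the real outputs, and `str.title()` is used as a simple title-casing stand-in that happens to handle the names in this prompt but is not a general-purpose name normalizer.

```python
import json

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block


def parses_as_json(text):
    """True if the whole string is valid JSON with nothing wrapped around it."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def combine_items(pairs):
    """Sum quantities per SKU, keeping first-seen order."""
    totals = {}
    for sku, qty in pairs:
        totals[sku] = totals.get(sku, 0) + qty
    return [{"sku": sku, "qty": qty} for sku, qty in totals.items()]


payload = '{"orders": []}'
fenced = f"{FENCE}json\n{payload}\n{FENCE}"

print(parses_as_json(payload))  # bare JSON parses
print(parses_as_json(fenced))   # the Markdown fence makes the same payload fail

# Title-casing changes letter case only; it never adds a letter.
print("noah bennet".title())
print("reem al-hadi".title())

# Order A-104 lists QX-4 twice, so the quantities are combined.
print(combine_items([("QX-4", 2), ("LM-9", 1), ("QX-4", 3)]))
```

A strict `json.loads` on the raw response is a reasonable first gate for a "valid JSON only" instruction, since any wrapper text, including a code fence, fails it.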

Matchup powered by OpenRouter.