Head to head: grok-4.3 vs Ministral-3B
grok-4.3 vs Ministral-3B
This wasn’t a close stylistic split; it was a clean execution gap. grok-4.3 won every task by being more disciplined about instructions, format, and the small details that make outputs usable in the real world.
grok-4.3 takes this matchup 38.0 to 25.0, and the scoreline flatters Ministral-3B. Across all four tasks, the pattern is the same: grok-4.3 actually does the job asked, while Ministral-3B keeps introducing avoidable problems—mutating inputs, drifting from requested tone, or breaking output constraints. The coding task is the clearest example. In **python-merge-interval-bugfix**, grok-4.3 gets the whole brief right: it sorts without mutating the input, merges overlapping and touching intervals, and keeps the required function name intact. Ministral-3B repairs the merge logic, but then blows the non-mutation requirement by calling `windows.sort()`. That’s not a cosmetic miss; it’s the kind of bug that causes downstream surprises. The writing and summarization tasks tell the same story. In **vendor-delay-status-update**, grok-4.3 matches the requested calm, direct, professional internal Slack tone, while Ministral-3B leans on emojis and alert-style formatting that make a routine ops update sound theatrical. In **meeting-notes-summary-actions**, grok-4.3 follows the requested structure with a real two-sentence summary and a clean JSON object, and it captures the important 8-store pilot scope decision. Ministral-3B misses that owner decision, uses bullets instead of the requested summary format, and freelances on the JSON schema. Then there’s **messy-orders-to-json**, where grok-4.3 simply behaves like a model you can trust in production: valid JSON only, correct preservation of the customer name as "Noah Bennet," no extra decoration. Ministral-3B adds Markdown fences—despite being told not to—and silently changes the name to "Noah Bennett." That’s exactly the kind of seemingly small alteration that breaks pipelines and corrupts records. **Final call: grok-4.3 wins decisively because it is more exact, more compliant, and more reliable on the details that matter. Ministral-3B isn’t losing on brilliance; it’s losing on discipline.**
python-merge-interval-bugfix
Practical coding — Python. Return code only. Fix the bug in this function so it merges overlapping maintenance windows for the same server. Input: a list of tuples (start_minute, end_minute) where end_minute is exclusive. The function should sort if needed, merge any intervals that overlap OR touch, and return a new list. Example: [(30,45),(10,20),(20,30),(50,55)] -> [(10,45),(50,55)]. Buggy code: ```python def merge_windows(windows): if not windows: return [] merged = [windows[0]] for start, end in windows[1:]: last_start, last_end = merged[-1] if start < last_end: merged[-1] = (last_start, max(last_end, end)) else: merged.append((start, end)) return merged ``` Requirements: - Keep the function name `merge_windows` - Do not mutate the input list - Handle unsorted input - Merge touching intervals, e.g. (100,120) and (120,125) - Assume all tuples are valid with start_minute < end_minute
A correctly sorts without mutating the input, merges overlapping and touching intervals, and preserves the required function name. B fixes the merge logic but violates the requirement not to mutate the input list by calling windows.sort().
vendor-delay-status-update
Professional writing. Write a status update for our internal operations Slack channel about a vendor delay. Audience: warehouse supervisors and customer support leads. Tone: calm, direct, no blame. Length: 110-140 words. Facts to include: - Vendor: North Spoke Plastics - Delayed item: 28 mm tamper seals for the Haviland cold-brew bottles - PO: 44187-K - Original dock date: Tuesday, 14 May - New ETA: Friday, 17 May by 3:00 p.m. - Current stock on hand covers packing through Wednesday morning only - We are prioritizing wholesale orders for Juniper Market and Vale & Pine - Customer support should avoid promising same-day shipment for cold-brew bottle orders until replenishment lands - Next update will be posted by 4:30 p.m. today
A better matches the requested calm, direct, professional Slack update and includes all required facts in a clear internal-ops style. B is accurate but less aligned in tone due to emojis, alert-style formatting, and the unnecessary closing line, which make it feel less professional and more dramatic.
meeting-notes-summary-actions
Summarization & extraction. Read these meeting notes and produce: 1) a 2-sentence summary 2) a JSON object with keys `launch_date`, `owner_decisions`, `risks`, and `action_items`. Meeting notes: """ OrbitLane weekly launch sync — 6 Feb - Priya said the Android build 2.8.1 fixed the duplicate receipt bug, but the offline refund flow still fails if the cashier added more than 12 line items. - Mateo wants the pilot to start on 19 Feb, limited to 8 stores, not the original 20. - Lena confirmed legal approved the revised loyalty disclaimer text this morning. - Risk: training guides are still in review with Retail Ops; if not signed off by 12 Feb, store managers won't have time to brief weekend staff. - Decision: keep iOS in the pilot; remove kiosk mode from scope entirely. - Action: Priya to deliver a hotfix candidate by 9 Feb. - Action: Jonah to send the final store list by end of day Thursday. - Action: Lena to publish the approved disclaimer in the help center. - Open question: whether gift-card top-ups should be enabled during pilot; no decision yet. """
A follows the requested format more closely with a true 2-sentence summary and a JSON object whose fields cleanly capture the meeting decisions, including the 8-store pilot scope change. B is mostly correct but misses that owner decision, uses bullet points instead of a plain 2-sentence summary, and changes the action_items structure beyond what was asked.
messy-orders-to-json
Data wrangling / structured output. Transform the messy order notes below into valid JSON only. Output an object with one key, `orders`, whose value is an array of order objects sorted by `order_id` ascending. Each order object must have exactly these keys: - `order_id` (string) - `customer` (string, title case) - `items` (array of objects with keys `sku` string and `qty` integer) - `ship_method` (one of `ground`, `air`, `pickup`) - `priority` (boolean) Rules: - Normalize customer names to title case - If an item appears twice within the same order, combine quantities - Ignore comments in parentheses - `priority=yes/true` => true, `priority=no/false` => false Messy notes: """ #A-104 | customer: marta iverson | items: QX-4 x2, LM-9 x1, QX-4 x3 | ship=ground | priority=yes A-102 / customer=ELI TRAN / ship:pickup / items:[RN-1*1; TK-7*2] / priority=false order A-103 ; items ZP-8=4, RN-1=2 ; customer "noah bennet" ; PRIORITY: TRUE ; ship method = air A-101, customer: reem al-hadi, items: LM-9 x 2, BC-3 x1, ship=ground, priority=no (hold gift wrap) """
A fully follows the instruction to output valid JSON only and correctly preserves the source customer name as "Noah Bennet". B adds Markdown code fences, violating the output-format requirement, and also changes the customer name to "Noah Bennett," which is not supported by the input.
Matchup powered by OpenRouter.