Nemotron-3 Ultra crushes Gemma-4 31B by 6 points

NVIDIA's 550B beast wins four straight tasks with cleaner code, sharper reasoning, and stricter instruction following.

Nemotron-3 Ultra-550B-A55B dominated this matchup. It delivered the more robust rate-limit parser: it normalizes header names case-insensitively in a single pass, trims retry-after values before parsing, and avoids a per-lookup find() scan. Gemma-4's version worked, but it did not trim values, so it could fail on headers with stray whitespace.

In the vendor-delay update, Nemotron stayed calm, precise, and actionable: it named the technical cause (key-based auth rejection) and gave non-technical teammates explicit ticketing instructions. Gemma-4 was less specific about both the root cause and the support process. The meeting-notes summary exposed the same gap: Nemotron's two bullets captured the decision, the action item, the open questions, and the fact that no carrier change had been decided. Gemma-4's summary mostly restated the surcharge issue and left out key elements such as the open questions.

On the strict "Output JSON only" test, both models parsed every field correctly, but Nemotron obeyed the format rule while Gemma-4 wrapped its answer in markdown. Every task told the same story.

Verdict: nemotron-3-ultra-550b-a55b wins.

How they were tested

We ran four fresh text tasks, generated on the fly for this matchup so neither model could have seen them in advance, and had grok-4-1-fast-non-reasoning score each one. Final score: nemotron-3-ultra-550b-a55b 38.0, gemma-4-31b-it 32.0. In the judge's notes below, Model A is Nemotron-3 Ultra and Model B is Gemma-4.

1. js-rate-limit-parser

Language: JavaScript. Return code only. Implement a function parseRetryAfter(headers, nowMs) for an HTTP client. It should read the retry delay from either headers['retry-after-ms'] (milliseconds, preferred) or headers['retry-after'] (either integer seconds or an HTTP-date string). Header names are case-insensitive. Return the number of milliseconds to wait as a non-negative integer. If both headers are missing or invalid, return 0. If the computed delay is negative, return 0. Fractional values should be rounded up to the next integer millisecond. Do not use any external packages. Examples:
- parseRetryAfter({'Retry-After-Ms':'2500'}, 0) -> 2500
- parseRetryAfter({'retry-after':'3'}, 1700000000000) -> 3000
- parseRetryAfter({'Retry-After':'Wed, 21 Oct 2037 07:28:00 GMT'}, 2140000000000) returns the ms difference between that date and nowMs
- parseRetryAfter({'retry-after-ms':'bad','retry-after':'-2'}, 0) -> 0
Include only the function definition.

Winner: nemotron-3-ultra-550b-a55b — Model A correctly normalizes all header keys into a single object for case-insensitive lookup and trims the retry-after value before parsing, ensuring robustness against whitespace; Model B uses find() which works but is less efficient and skips trimming, potentially failing on spaced values, while Date.parse is fine but less explicit than new Date.
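
The judge credits Model A with normalizing header names once and trimming values before parsing. The sketch below applies those two ideas to the prompt's rules. It is our own illustration, not either model's actual output. Accepting fractional seconds in retry-after is an assumption on our part (the prompt only promises integer seconds), and the date branch relies on Date.parse, which accepts more formats than strict HTTP-dates.

```javascript
function parseRetryAfter(headers, nowMs) {
  // Plain decimal numbers only: rejects '', 'bad', '1e3', '0x10', 'Infinity'.
  // Assumption: fractional values are allowed for both headers.
  const numeric = /^-?\d+(?:\.\d+)?$/;

  // Normalize header names once so every lookup is case-insensitive,
  // and trim values so stray whitespace cannot break parsing.
  const normalized = {};
  for (const [name, value] of Object.entries(headers || {})) {
    normalized[name.toLowerCase()] = value == null ? '' : String(value).trim();
  }

  // Preferred header: retry-after-ms, a millisecond count.
  const msValue = normalized['retry-after-ms'];
  if (msValue !== undefined && numeric.test(msValue)) {
    // Round fractions up, clamp negatives to 0.
    return Math.max(0, Math.ceil(Number(msValue)));
  }

  // Fallback header: retry-after, either seconds or a date string.
  const raValue = normalized['retry-after'];
  if (raValue !== undefined && raValue !== '') {
    if (numeric.test(raValue)) {
      return Math.max(0, Math.ceil(Number(raValue) * 1000));
    }
    // Anything Date.parse accepts, which includes HTTP-dates.
    const dateMs = Date.parse(raValue);
    if (!Number.isNaN(dateMs)) {
      return Math.max(0, Math.ceil(dateMs - nowMs));
    }
  }

  // Both headers missing or invalid.
  return 0;
}
```

One wrinkle worth knowing: 21 Oct 2037 07:28:00 GMT is 2,139,722,880,000 ms since the epoch, which is about 3.2 days before the third example's nowMs of 2140000000000. A parser that follows the "negative delay returns 0" rule therefore returns 0 for that example rather than a positive difference.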

2. vendor-delay-update

Draft a workplace status update for our internal product and support teams. Context: Our payroll export feature for customers in Chile is delayed because vendor Solvanta's SFTP endpoint started rejecting key-based auth after their maintenance window last night. We have a temporary manual workaround and no customer data loss. Impact: 14 customers cannot run their scheduled 06:00 CLT exports. Engineering has a fix in staging and expects production rollout by 13:30 CLT if final validation passes. Support should tell customers we are prioritizing the issue and can manually generate today's export on request. Audience: non-technical coworkers. Tone: calm, accountable, concise. Length: 120-150 words.

Winner: nemotron-3-ultra-550b-a55b — Model A is more precise and accountable, clearly explaining the technical cause (key-based auth rejection) without unnecessary vagueness, while providing explicit, actionable support guidance including ticketing instructions. Model B is slightly less detailed on the root cause and support process, making A better adhere to the calm, accountable, and concise tone for non-technical coworkers.

3. meeting-notes-summary

Read the meeting notes below, then produce:
1) a 2-bullet summary
2) a JSON object with keys decision, owner, deadline, open_questions
Meeting notes:
- Weekly operations sync, 09:00 Tuesday
- Marina: Freight carrier BlueRidge added a new "oversize handling" surcharge on pallet shipments from Reno starting 1 Sep.
- Dev checked 38 August orders: 11 would have been affected; average added cost about $14.20.
- Decision proposal: keep customer prices unchanged for now; absorb surcharge until we have 30 days of real cost data.
- Leo worried margin impact could exceed forecast for the TrailPro XL rack.
- Agreed action: Marina to send a pricing review with actual September data by 3 Oct.
- Open question: should the website show a shipping disclaimer for oversize items before checkout?
- Open question: does BlueRidge apply the fee to replacement shipments under warranty?
- No decision yet on changing carrier.

Winner: nemotron-3-ultra-550b-a55b — Model A's 2-bullet summary is more comprehensive, capturing the decision, action item, open questions, and lack of carrier decision, whereas Model B's restates the issue and omits key elements like open questions. Both JSONs are strong and correct, but A's integrates the action into the decision more fully while precisely matching details.

4. messy-orders-to-json

Convert the messy order lines below into valid JSON as an array of objects. Use exactly this schema per object: {"order_id":string,"customer":string,"sku":string,"qty":integer,"unit_price":number,"rush":boolean}
Rules:
- Trim spaces.
- order_id should preserve leading zeros.
- qty is an integer.
- unit_price should be a number with no currency symbol.
- rush is true only if the line contains RUSH or expedite; otherwise false.
- Output JSON only.
Data:
00172 | Northline Clinic | KM-441-B | qty 3 | $19.95 | normal
00173| Alder & Peak | QX-9 | 1 unit | USD 204.00 | RUSH
00174 | Miri's Bakery | PAN-88 | Qty: 12 | 3.5 | standard
00175 | Solera Home | LMP-220 | 02 | $48 | expedite
00176| Hightide School| BK-CVR-mini | qty=7 | $11.25 | -
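
The rules above map onto a small parser. This is our own sketch, not either model's output: parseOrders, firstNumber, and messyOrders are hypothetical names, and matching RUSH or expedite case-insensitively as whole words is our reading of the rush rule.

```javascript
// Hypothetical helper: pull the first number out of a messy field.
function firstNumber(field, pattern) {
  const match = field.match(pattern);
  if (!match) throw new Error(`No number found in field: "${field}"`);
  return Number(match[0]);
}

// Sketch: turn "id | customer | sku | qty | price | flag" lines into objects.
function parseOrders(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '')
    .map((line) => {
      // Trim every pipe-separated field; order_id stays a string,
      // which keeps its leading zeros.
      const [orderId, customer, sku, qtyField, priceField] = line
        .split('|')
        .map((field) => field.trim());
      return {
        order_id: orderId,
        customer,
        sku,
        // "qty 3", "1 unit", "Qty: 12", "02", "qty=7" all reduce to integers.
        qty: firstNumber(qtyField, /\d+/),
        // Drops "$" and "USD" by taking only the numeric part.
        unit_price: firstNumber(priceField, /\d+(?:\.\d+)?/),
        // Assumption: case-insensitive, whole-word match on the full line.
        rush: /\b(?:rush|expedite)\b/i.test(line),
      };
    });
}

const messyOrders = `
00172 | Northline Clinic | KM-441-B | qty 3 | $19.95 | normal
00173| Alder & Peak | QX-9 | 1 unit | USD 204.00 | RUSH
00174 | Miri's Bakery | PAN-88 | Qty: 12 | 3.5 | standard
00175 | Solera Home | LMP-220 | 02 | $48 | expedite
00176| Hightide School| BK-CVR-mini | qty=7 | $11.25 | -
`;

// Print JSON only, as the prompt requires.
console.log(JSON.stringify(parseOrders(messyOrders), null, 2));
```

Keeping order_id as a string rather than converting it is what stops 00172 from collapsing to 172, while qty and unit_price are deliberately converted to numbers.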

Winner: nemotron-3-ultra-550b-a55b — Model A outputs pure JSON as required by 'Output JSON only', while Model B adds a markdown wrapper. Both correctly parse all fields, including trimming, data types, leading zeros, and rush logic.
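
Why a markdown wrapper counts against a model is easy to see mechanically: any consumer that passes the raw reply straight to JSON.parse rejects anything that is not bare JSON. The check below is a hypothetical illustration of that failure mode (isJsonOnly and the sample replies are ours), not part of the judge's scoring.

```javascript
// Hypothetical check: does a reply consist of a JSON value and nothing else?
function isJsonOnly(reply) {
  try {
    // JSON.parse tolerates surrounding whitespace, but no other text.
    JSON.parse(reply);
    return true;
  } catch {
    return false;
  }
}

const fence = '`'.repeat(3);
const bare = '[{"order_id":"00172","qty":3,"rush":false}]';
const wrapped = fence + 'json\n' + bare + '\n' + fence;

console.log(isJsonOnly(bare));    // true
console.log(isJsonOnly(wrapped)); // false
```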
