Expanse launches resource predictor for GPU and HPC clusters
The founders say their software reads code, job scripts and cluster telemetry before SLURM or Kubernetes runs a workload, aiming to cut wasted compute without becoming the scheduler.
By Ryan Merket
Why it matters
AI infrastructure spending has made GPU waste a real target for founders. Expanse is betting enterprises can get more capacity from hardware they already own, but that only works if researchers and cluster operators trust its predictions enough to request less.

Expanse, a startup built by Ismaeel, Eren, Yafet and Nikodem, has launched with a narrower and more technical pitch than idle-GPU pooling: predict what HPC and GPU jobs actually need before a scheduler sees them.
The company says its software reads source code, job submission scripts and hardware context for workloads headed into SLURM or Kubernetes, then recommends resource requests, flags likely failures and surfaces line-level optimizations for researchers. The product is aimed at clusters where users routinely over-request GPUs, memory, CPUs or wall time because the downside of under-requesting is severe: a crashed job can erase days of work.
Expanse is pitching that asymmetry as a software opening. In the founders' launch post, they said data centers run at roughly 30% to 40% effective utilization and that users often request two to three times the resources they need. They also said they measured one national-scale HPC cluster for a month and found that, across 122,000 jobs, 59% of compute was wasted. At on-demand cloud rates for the same hardware, they estimated that waste at roughly $8.5 million in one month on one cluster.
Those are founder-supplied figures, not independently verified benchmarks. But they point to a specific operating problem in AI labs, quant funds, research facilities and manufacturing environments: capacity can be wasted inside allocations, not just left idle at the cluster level.
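For readers who want to see how a figure like "59% of compute was wasted" could be derived, here is a minimal sketch. It assumes waste is measured as reserved GPU-hours minus the GPU-hours actually used, then priced at a flat on-demand rate. The `JobRecord` fields, the sample jobs and the $2 rate are all hypothetical; the founders have not published their exact methodology.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job accounting record (not Expanse's schema)."""
    gpus_requested: int
    hours_reserved: float        # wall time the allocation was held
    mean_gpu_utilization: float  # 0.0 to 1.0, averaged over the run


def reserved_gpu_hours(jobs):
    """GPU-hours the scheduler set aside, whether or not they were used."""
    return sum(j.gpus_requested * j.hours_reserved for j in jobs)


def wasted_gpu_hours(jobs):
    """Reserved GPU-hours minus the share that did useful work."""
    used = sum(
        j.gpus_requested * j.hours_reserved * j.mean_gpu_utilization
        for j in jobs
    )
    return reserved_gpu_hours(jobs) - used


def wasted_fraction(jobs):
    """Share of reserved GPU-hours that went unused."""
    reserved = reserved_gpu_hours(jobs)
    return 0.0 if reserved == 0 else wasted_gpu_hours(jobs) / reserved


def wasted_cost(jobs, usd_per_gpu_hour):
    """Price the unused GPU-hours at an assumed on-demand rate."""
    return wasted_gpu_hours(jobs) * usd_per_gpu_hour


# Illustrative data only: three made-up jobs.
jobs = [
    JobRecord(gpus_requested=8, hours_reserved=10.0, mean_gpu_utilization=0.25),
    JobRecord(gpus_requested=4, hours_reserved=5.0, mean_gpu_utilization=0.50),
    JobRecord(gpus_requested=1, hours_reserved=20.0, mean_gpu_utilization=1.00),
]
print(f"wasted share: {wasted_fraction(jobs):.1%}")
print(f"wasted cost:  ${wasted_cost(jobs, usd_per_gpu_hour=2.0):,.2f}")
```

Under the same accounting, a 59% waste share worth $8.5 million would imply roughly $14.4 million of reserved capacity at on-demand prices for the month, though the founders did not break the figure down that way.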
Expanse is not selling a scheduler
The startup's current product sits in the prediction and intelligence layer, according to the founders. Replying to a reader of the launch post, Ismaeel said Expanse does not grant, sell or schedule capacity. It tells the scheduler and the user what a job is likely to need.
That distinction matters. The earlier version of the story around Expanse sounded like a shared pool for unused enterprise GPUs. The founder materials describe something different: software that installs on every node, hooks into SLURM or the Kubernetes scheduler, ingests live telemetry, and builds cluster-specific models that get sharper as more workloads run.
Expanse says it uses signals including NVIDIA's Data Center GPU Manager (DCGM) and CUDA Profiling Tools Interface (CUPTI), Linux cgroups, network and I/O monitoring, source code, submission scripts, hardware telemetry and cluster metadata. The company then fine-tunes models for a given cluster and returns estimates for GPU VRAM, GPU utilization, memory, CPUs and wall time, with uncertainty estimates and p90 (90th-percentile) values so users can choose how much risk they are willing to take.
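To make the risk trade-off concrete, the sketch below turns a prediction with uncertainty into a resource request. It assumes, purely for illustration, that the predicted peak memory follows a normal distribution. Expanse has not said what form its uncertainty estimates take, and `request_for_risk` is a hypothetical helper, not part of the product.

```python
from statistics import NormalDist


def request_for_risk(mean_gb, std_gb, risk=0.10):
    """Memory request (GB) that the job exceeds with probability `risk`.

    Assumes a normal predictive distribution, which is an illustration,
    not Expanse's disclosed model. risk=0.10 corresponds to a p90 value.
    """
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must be strictly between 0 and 1")
    return NormalDist(mu=mean_gb, sigma=std_gb).inv_cdf(1.0 - risk)


# A job predicted to peak at 40 GB, give or take 5 GB.
p90 = request_for_risk(40.0, 5.0, risk=0.10)  # cautious user
p50 = request_for_risk(40.0, 5.0, risk=0.50)  # accepts a coin flip
print(f"p90 request: {p90:.1f} GB, p50 request: {p50:.1f} GB")
```

A user who accepts a 10% chance of running out asks for about 46 GB here, while one who accepts 50% asks for 40 GB. The gap between those numbers is the price of safety.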
The founders say the models are intentionally trained to over-provision rather than under-provision because the cost of a crashed job is usually higher than the cost of leaving some slack.
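One standard way to bias a model toward over-provisioning is the quantile (or "pinball") loss, which charges more for guessing too low than for guessing too high. The sketch below shows the idea. It is a textbook technique offered as an illustration, and the founders have not said this is the objective Expanse uses.

```python
def pinball_loss(actual, predicted, quantile=0.9):
    """Quantile loss for a single prediction.

    With quantile=0.9, under-predicting (the job needed more than
    forecast) costs nine times as much per unit as over-predicting.
    """
    error = actual - predicted
    if error >= 0:
        return quantile * error        # under-prediction: heavy penalty
    return (quantile - 1.0) * error    # over-prediction: light penalty


# The job needed 50 GB but the model said 40 GB (risk of a crash).
under = pinball_loss(actual=50.0, predicted=40.0)
# The job needed 40 GB but the model said 50 GB (some slack wasted).
over = pinball_loss(actual=40.0, predicted=50.0)
print(f"under-prediction loss: {under:.2f}, over-prediction loss: {over:.2f}")
```

A model trained to minimize this loss settles near the 90th percentile of actual usage, which matches the p90 values the company says it reports.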
The founder claim: custom model, not an LLM wrapper
Ismaeel said in a follow-up answer that Expanse's core model is not an LLM, but a custom architecture built to accept multimodal inputs such as source code, submission scripts and hardware topology.
The founders framed that as the company's technical wedge. They said Ismaeel previously worked at EPCC, the University of Edinburgh's supercomputing centre, under Adrian Jackson, where he built a multimodal HPC resource predictor that ingested code, submission scripts, hardware telemetry and cluster metadata. On EPCC workloads, they said that model scored 34% better than baseline methods and outperformed frontier general-purpose LLMs prompted on the same resource-prediction task by roughly 8x.
Expanse is using that argument to separate itself from three common approaches in clusters today: historical per-user averages from SLURM accounting data, hand-written heuristics, and general-purpose coding agents. The founders argue that averages break when workloads change, heuristics are brittle, and LLMs reason about code without a native view of how a particular node topology actually performs.
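The weakness of the first approach is easy to show. The sketch below builds a per-user average from past jobs and applies it to a new, larger workload. The user name and memory figures are invented, and the history is a plain list rather than real SLURM accounting output.

```python
from collections import defaultdict
from statistics import mean


def per_user_average(history):
    """Average peak memory (GB) per user from (user, peak_gb) pairs."""
    by_user = defaultdict(list)
    for user, peak_gb in history:
        by_user[user].append(peak_gb)
    return {user: mean(peaks) for user, peaks in by_user.items()}


# Hypothetical history: one user running small preprocessing jobs.
history = [("alice", 10.0), ("alice", 12.0), ("alice", 11.0)]
baseline = per_user_average(history)

# The same user now submits a training job that peaks at 60 GB.
new_job_peak_gb = 60.0
shortfall_gb = new_job_peak_gb - baseline["alice"]
print(f"baseline suggests {baseline['alice']:.0f} GB, "
      f"job needs {new_job_peak_gb:.0f} GB, "
      f"shortfall {shortfall_gb:.0f} GB")
```

The average knows who submitted the job but nothing about what the job does, which is the gap the founders say reading source code and submission scripts is meant to close.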
That is also where the proof burden sits. Buyers will want to know whether Expanse's cluster-specific model keeps improving across new hardware, new teams and new workload patterns, and whether its recommendations are trusted enough for researchers to stop padding every job request.
What users get
Expanse describes three user-facing capabilities.
First, resource prediction at submit time. The product estimates the resources a job needs and can warn about likely out-of-memory failures and other memory-related issues before the job runs.
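A submit-time check of this kind can be sketched as a comparison between what the user asked for and what the model expects. The function name, labels and thresholds below are hypothetical. The 2x cutoff for flagging an over-request simply echoes the founders' claim that users often ask for two to three times what they need.

```python
def check_memory_request(requested_gb, predicted_p50_gb, predicted_p90_gb):
    """Classify a memory request against predicted usage.

    Hypothetical rules for illustration:
      - below the median prediction: more likely than not to run out
      - below the p90 prediction: more than a 10% chance of running out
      - above twice the p90 prediction: probably padded
    """
    if requested_gb < predicted_p50_gb:
        return "likely-oom"
    if requested_gb < predicted_p90_gb:
        return "at-risk"
    if requested_gb > 2.0 * predicted_p90_gb:
        return "over-requested"
    return "ok"


# A job predicted to need 40 GB at the median and 46 GB at p90.
for requested in (32.0, 42.0, 48.0, 128.0):
    verdict = check_memory_request(requested, 40.0, 46.0)
    print(f"{requested:>6.0f} GB requested -> {verdict}")
```

In a real deployment the warning would have to be surfaced before the scheduler accepts the job, which is the integration point the company says it hooks into.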
Second, live observability. While a job runs, Expanse shows hardware telemetry and code-stack profiling in a dashboard. The founders say the profiling overhead is in the low single-digit percentages.
Third, failure diagnosis. If a workload fails, Expanse correlates stack profiling with hardware telemetry and produces short, solution-oriented logs that explain what happened, why it happened and what code-level change might fix it.
For researchers and infrastructure teams, the appeal is not just saving money. It is reducing the penalty for asking for a tighter allocation. If the prediction is credible, users can request fewer resources without feeling like they are gambling a multi-day run.
Paid pilots, with a trust problem to solve
Expanse says it is onboarding customers through paid pilots. Pricing is per-cluster. The company is offering a two-week measurement window in which it installs, ingests cluster data and reports recoverable capacity to data center operators, followed by a paid pilot in one department at a fixed monthly fee.
The stated target is operators of SLURM or Kubernetes HPC/GPU clusters with more than 100 GPUs.
That buyer knows the pain already. The challenge is trust. Cluster users over-request because it is rational: wasting someone else's capacity is less immediately painful than losing a long run near completion. Expanse's opportunity is to make the safer behavior and the efficient behavior the same behavior. Its launch gives a more concrete product than an idle-capacity marketplace, but the evidence still has to come from live clusters where users actually act on the recommendations.