CrankGPT is a hand-cranked AI project with real edge-computing numbers
CrankGPT runs speech recognition, a small language model, and text-to-speech locally on a Raspberry Pi 5 with no battery or cloud.
By Ryan Merket
Why it matters
CrankGPT is a joke with a useful benchmark inside: it turns the abstract debate over AI energy use into concrete numbers for local, private inference.

Katrin Tomanek and Alex Kauffmann have built CrankGPT, a hand-cranked, fully offline AI box that is part product satire and part serious edge-computing benchmark.

The technical documentation published by Squeez Labs makes clear that the gag has working hardware underneath it: a Raspberry Pi 5, a 20W hand-crank generator, a custom capacitor board, local speech recognition, a local language model, and local text-to-speech. The pitch page sells the joke hard, with tiers that run from a 20W hand-cranked voice assistant to a fictional 2,000W-plus "Singularity" tier for agent swarms. The build log is the useful part: it shows where the power, latency, boot time, and model-size constraints actually land when AI leaves the data center and has to survive on a human wrist.
That is on-brand for Squeez Labs, which describes itself as a San Francisco research and innovation lab focused on making AI smaller, cheaper, and private enough to run anywhere. Tomanek brings the model-side credibility: Squeez says she has a PhD in machine learning and spent more than a decade at Google working on natural language processing, automatic speech recognition, and neural machine translation. Kauffmann brings the physical interface and prototyping side: his portfolio says he spent 12 years at Google, prototyped early Glass work at X, shipped experimental computer-vision apps, led design work on Google Cardboard at Google Research, and later ran an embedded computation and sensing team at ATAP.
CrankGPT is a compact expression of that pairing. Tomanek's edge voice agent repository is open on GitHub and describes a CPU-only stack that can run on a Raspberry Pi 5, with Moonshine for automatic speech recognition, Piper for text-to-speech, and a language model hosted through llama.cpp. The documentation says the CrankGPT version runs every component locally on CPU, with no accelerator, no battery, and no cloud service.
The hardware choice is intentionally constrained. Squeez used a stock Raspberry Pi 5 with 8GB of RAM and a cooling fan HAT, plus a KEYESTUDIO ReSpeaker 2-Mic Pi HAT for audio I/O. Power comes from an off-the-shelf switchable-voltage 20W hand-crank generator. That generator is not enough by itself to make the system feel reliable: the Pi normally draws about 1.5A, but under CPU inference the current can rise sharply, with observed momentary spikes up to 5A. Squeez says those spikes can pull voltage below the Pi's required 4.8V or trigger the generator's overcurrent protection, causing brownouts.
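The mismatch between the generator and the spikes is easy to check with arithmetic. The sketch below assumes a nominal 5V supply rail, which the documentation does not state (it gives currents and a 4.8V floor), and treats the generator's 20W rating as a hard ceiling.

```python
# Figures quoted above, except the rail voltage, which is an assumption.
SUPPLY_VOLTAGE_V = 5.0      # assumed nominal rail, not stated in the source
GENERATOR_RATING_W = 20.0   # hand-crank generator rating
TYPICAL_CURRENT_A = 1.5     # normal draw
PEAK_CURRENT_A = 5.0        # observed momentary spike under CPU inference


def power_w(current_a: float, voltage_v: float = SUPPLY_VOLTAGE_V) -> float:
    """Electrical power in watts for a given current and voltage."""
    return current_a * voltage_v


typical_w = power_w(TYPICAL_CURRENT_A)
peak_w = power_w(PEAK_CURRENT_A)
shortfall_w = peak_w - GENERATOR_RATING_W

print(f"typical draw: {typical_w:.1f} W")
print(f"peak draw:    {peak_w:.1f} W")
print(f"shortfall at peak vs. generator rating: {shortfall_w:.1f} W")
```

Under that assumption a 5A spike asks for more than the generator is rated to deliver, which is consistent with the brownouts Squeez describes.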
The practical fix is the most revealing part of the build. Squeez built a custom capacitor board that smooths the generator output and acts as a roughly 20-second power reservoir. The result is not a free lunch: the documentation says the crank gets harder to turn when LLM inference and speech synthesis run together. Power draw is about 4W at idle, about 8W during speech recognition, and about 15W during LLM plus text-to-speech inference.
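Those figures imply a rough energy budget for the reservoir. The documentation does not say which load the 20-second figure refers to, so this sketch simply computes the energy needed to cover 20 seconds at each reported draw, using energy = power × time.

```python
# Reported figures; which load the 20 s applies to is not stated in the source.
RESERVOIR_SECONDS = 20.0
LOAD_W = {
    "idle": 4.0,
    "speech recognition": 8.0,
    "LLM + text-to-speech": 15.0,
}


def energy_j(load_w: float, seconds: float = RESERVOIR_SECONDS) -> float:
    """Energy in joules needed to sustain a load for a given time."""
    return load_w * seconds


energy_by_load = {name: energy_j(watts) for name, watts in LOAD_W.items()}

for name, joules in energy_by_load.items():
    print(f"{name}: {joules:.0f} J to cover {RESERVOIR_SECONDS:.0f} s")
```

The spread matters: a reservoir sized for 20 seconds of idle would last only a few seconds at full inference load.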
The software stack is a survey of what is small enough to matter today. Squeez says the preferred language models are Liquid AI's LFM2 variants, including 350M and 1.2B parameter models, along with Gemma 3 in its 1B form, all running through llama.cpp. On a Raspberry Pi 5, Squeez measured the LFM2.5 350M Q4_K_M model at about 48.86 generated tokens per second, LFM2.5 1.2B at about 15.01 tokens per second, and Gemma 3 1B at about 14.31 tokens per second. The same documentation says an Orange Pi 5 Pro, helped by DDR5 memory bandwidth, improved generation rates by 29% to 58% across the same tested models.
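To put the Orange Pi comparison in concrete terms, the sketch below applies the reported 29% to 58% improvement range to each Raspberry Pi 5 measurement. The documentation as summarized here does not say which model saw which gain, so the output is an envelope of possible rates, not a set of measured results.

```python
# Raspberry Pi 5 generation rates reported by Squeez (tokens per second).
PI5_TOKENS_PER_S = {
    "LFM2.5 350M Q4_K_M": 48.86,
    "LFM2.5 1.2B": 15.01,
    "Gemma 3 1B": 14.31,
}
# Reported improvement range on the Orange Pi 5 Pro across the tested models.
SPEEDUP_LOW, SPEEDUP_HIGH = 0.29, 0.58


def projected_range(rate: float) -> tuple[float, float]:
    """Lower and upper projected rate if the gain fell anywhere in the range."""
    return rate * (1 + SPEEDUP_LOW), rate * (1 + SPEEDUP_HIGH)


projections = {name: projected_range(r) for name, r in PI5_TOKENS_PER_S.items()}

for name, (low, high) in projections.items():
    print(f"{name}: {low:.1f} to {high:.1f} tokens/s (projected envelope)")
```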
That comparison is the real lesson. Local AI hardware discussions often collapse into chip branding or model leaderboard claims. CrankGPT shows the operational bottleneck more plainly: autoregressive decoding is constrained by memory bandwidth, response time is governed by token generation, and a voice interface exposes every delay. Squeez reports typical time-to-first-byte of about 0.8 seconds with LFM2.5 350M, about 1.5 seconds with LFM2.5 1.2B, and about 2.9 seconds with Gemma 3 1B.
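A simple model shows how those two numbers combine. The sketch below assumes the reported time-to-first-byte is a fixed delay before output starts and that the rest of the reply is generated at the measured Raspberry Pi 5 rate. The 50-token reply length is a hypothetical chosen for illustration, and a streaming pipeline can begin speaking before generation finishes, so this estimates time until the last token, not perceived latency.

```python
# (time-to-first-byte in seconds, generated tokens per second), as reported.
MODELS = {
    "LFM2.5 350M": (0.8, 48.86),
    "LFM2.5 1.2B": (1.5, 15.01),
    "Gemma 3 1B": (2.9, 14.31),
}
REPLY_TOKENS = 50  # hypothetical reply length, not from the source


def time_to_last_token_s(ttfb_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Fixed startup delay plus decode time for n_tokens at a constant rate."""
    return ttfb_s + n_tokens / tokens_per_s


estimates = {
    name: time_to_last_token_s(ttfb, rate, REPLY_TOKENS)
    for name, (ttfb, rate) in MODELS.items()
}

for name, seconds in estimates.items():
    print(f"{name}: about {seconds:.1f} s for a {REPLY_TOKENS}-token reply")
```

Even at this short reply length, decode time rather than startup delay accounts for most of the wait on all three models.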
Startup time is harder to fix. The documentation says a user needs to crank for about 30 seconds before having a conversation. That includes about 10 to 15 seconds for the Pi 5 cold boot and firmware sequence, roughly 3 seconds for DietPi to reach userspace, and another 10 to 15 seconds for Python imports and model loading. Squeez tried NVMe to speed random reads and found it was the wrong optimization for this use case: the Pi 5 bootloader's PCIe and controller setup added roughly 10 seconds before Linux, erasing the runtime gains. For a device that cold-starts every session, an SD card was faster end to end.
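The startup stages can be added up as a quick sanity check on the 30-second figure. This sketch uses only the ranges quoted above; the stage labels are shorthand.

```python
# (low, high) seconds for each startup stage, as reported.
STAGES = {
    "cold boot and firmware": (10, 15),
    "DietPi to userspace": (3, 3),
    "Python imports and model loading": (10, 15),
}

total_low = sum(low for low, _ in STAGES.values())
total_high = sum(high for _, high in STAGES.values())

print(f"time to first conversation: {total_low} to {total_high} s")
```

The total brackets the quoted 30 seconds, and it also shows why an extra 10 seconds of bootloader setup for NVMe is a large penalty relative to the whole budget.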
The marketing page wraps this in a joke about climate, privacy, tech CEOs, and getting in shape. Squeez writes that CrankGPT is meant to "take the power back" and keep prompts local rather than handing them to large AI providers. Those claims are intentionally theatrical. The underlying engineering argument is more grounded: many useful interactions do not require frontier models, always-on cloud services, or kilowatts of server-side capacity.
CrankGPT is not evidence that human-powered AI is commercially practical. It is evidence that the floor for useful local inference keeps moving downward. The machine can answer questions, hold simple conversations, and run as a voice agent on a board-class computer with a small stack of models. That makes it a provocation with a benchmark attached: if a Raspberry Pi 5 can do this at roughly 15W under inference, the next useful edge-AI product may not look like a chatbot subscription at all. It may look like a cheap box that knows one job, keeps data local, and only needs enough intelligence to do that job well.