A Linux LUKS suspend regression left encryption keys in memory for two years

Ingo Blechschmidt traced the bug to a Linux 6.9 kernel refactor and proposed tests and warnings to stop silent failures.


Why it matters

The bug shows how laptop encryption can fail at the boundary between kernel, cryptsetup, systemd, and distro integration, where no single project owns the full security promise.


Ingo Blechschmidt spent June chasing the kind of security bug that open-source systems are good at creating and bad at surfacing: a regression that did not crash, did not throw a visible error, and quietly invalidated a protection users thought they had.

In a June 18 post on Mastodon, Blechschmidt said a suspend-time LUKS locking workflow had been silently failing since Linux 6.9, the kernel release that arrived in May 2024. The narrow claim matters. Blechschmidt is not saying ordinary LUKS full-disk encryption stopped working. He is saying a specific workflow meant to wipe or lock LUKS volume keys during suspend-to-RAM stopped achieving that goal, leaving keys resident in memory on still-powered laptops.

That distinction is the story. Full-disk encryption protects data at rest. Suspend-to-RAM keeps the machine alive. The gap between those two facts is where cold-boot and other memory-extraction attacks live, and where users often overestimate what encryption buys them after they close a laptop lid.

The man page for cryptsetup luksSuspend says the command suspends an active device and wipes the encryption key from kernel memory. Debian's cryptsetup-suspend documentation is explicit about the threat model: suspending LUKS devices means removing their encryption keys from system memory, protecting against attacks that read memory from a suspended system. It also warns that the package protects only LUKS encryption keys, not other sensitive data that may be in RAM.

Blechschmidt's finding is that, on kernels since 6.9, this promise no longer held: after luksSuspend, a copy of the volume key could remain in kernel memory.

The bug was hiding in a sensible refactor

Blechschmidt attributed the regression to Linux kernel commit a28d893eb3270cf62c10dd8777af0d8452cdc072, titled "md: port block device access to file." The refactor changed how the device-mapper code opens its underlying block devices, moving them to file-based access.

According to Blechschmidt, that change interacted unexpectedly with cryptsetup's use of kernel keyrings. The caller's thread keyring could stay alive after the calling process had exited, preserving a copy of the LUKS volume key that cryptsetup luksSuspend was expected to wipe.

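His proposed kernel patch, posted to the Linux kernel mailing lists as "dm: avoid leaking the caller's thread keyring via the table device file," is small. Its title names the channel: on Linux, an open file holds a reference to the credentials of the task that opened it, and those credentials include the thread keyring.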
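The lifetime problem can be illustrated with a toy reference-counting model. This is a minimal sketch, not kernel code: the class and function names are hypothetical, and it assumes, as the patch title suggests, that the table device file keeps the caller's credentials alive, and that those credentials in turn keep the thread keyring, and the volume key linked into it, alive. An object is freed only when its last reference is dropped.

```python
class RefCounted:
    """Hypothetical stand-in for a reference-counted kernel object (not real kernel code)."""

    def __init__(self, name, holds=()):
        self.name = name
        self.refs = 0
        self.freed = False
        self.holds = list(holds)  # objects this one keeps alive
        for obj in self.holds:
            obj.get()

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True
            for obj in self.holds:  # freeing an object drops everything it held
                obj.put()


def key_survives_exit(file_pins_creds):
    """Return True if the volume key is still allocated after the caller exits."""
    volume_key = RefCounted("volume key")
    thread_keyring = RefCounted("thread keyring", holds=[volume_key])
    creds = RefCounted("caller credentials", holds=[thread_keyring])
    creds.get()  # the running caller holds its own credentials
    if file_pins_creds:
        creds.get()  # assumed: an open table device file keeps the opener's credentials
    creds.put()  # the caller exits and drops its reference
    return not volume_key.freed


print(key_survives_exit(file_pins_creds=False))  # False
print(key_survives_exit(file_pins_creds=True))   # True
```

The only difference between the two runs is one extra reference. In the model nothing fails and nothing reports an error; the key simply stays allocated, which is the shape of a silent failure.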

Hardening also moved into userspace. The cryptsetup project opened merge request 937, which proposes loading volume keys in an intermediary keyring linked to the thread keyring. The merge request says it works around the issue Blechschmidt reported and aims to avoid similar future mistakes in the kernel.

The quiet failure is the operational problem

Blechschmidt also opened cryptsetup merge request 936, framed as an RFC, to print a warning when wiping the volume key is impossible. The merge request states that the kernel path triggered by cryptsetup luksSuspend wipes dm-crypt's own data structures, but not copies of the volume key held in kernel keyrings. It identifies three cases in which the key can remain in a keyring; one of them is any kernel from 6.9 onward.
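A warning of the kind merge request 936 proposes could, in principle, start from a simple pre-flight check. The sketch below is hypothetical and is not the merge request's code: it covers only the one case named here (kernels from 6.9 onward), omits the other two cases, and takes whether the volume key sits in a kernel keyring as an input flag rather than detecting it. It also cannot tell a patched or backported kernel from an affected one, so it errs toward warning.

```python
import platform
import re

AFFECTED_SINCE = (6, 9)  # the one case named in the article; the other two are omitted


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '6.9.3-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def suspend_warning(release, key_in_keyring):
    """Return a warning string, or None, before a luksSuspend-style lock (hypothetical helper)."""
    if key_in_keyring and parse_kernel_version(release) >= AFFECTED_SINCE:
        return ("warning: the volume key may remain in a kernel keyring; "
                "suspending will not remove every copy from memory")
    return None


if __name__ == "__main__":
    # key_in_keyring=True is an assumption for illustration, not detected here.
    print(suspend_warning(platform.release(), key_in_keyring=True))
```

The design choice is to be loud rather than precise: a false warning on a fixed kernel costs a user a moment of doubt, while a missing warning recreates the silent failure.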

The practical consequence, according to that merge request, is that Debian's cryptsetup-suspend, and distributions or private setups built around the same idea, can silently fail to protect the volume during suspend. Silent failure is the hard part for operators. A laptop that refuses to sleep is annoying. A laptop that sleeps and reports no problem after failing its security objective is a different class of bug.

Blechschmidt's NixOS work treats that as a testing problem as much as a patching problem. In NixOS/nixpkgs pull request 532499, opened June 16, he proposed an integration test to verify that cryptsetup luksSuspend actually wipes the volume key from memory. His PR description says the command, and the corresponding libcryptsetup path, are used by systemd-homed and Debian's cryptsetup-suspend to lock an encrypted volume on laptop suspend.
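The core of such a test is a search of guest memory for the key bytes. This is a minimal sketch, assuming the test already knows the volume key (because it created the volume) and has a raw dump of guest RAM on disk; the function names and any file names are hypothetical, and this is not the code in the nixpkgs pull request.

```python
import mmap


def find_key_offsets(buf, key):
    """Return every offset where `key` occurs in `buf` (bytes or mmap)."""
    if not key:
        raise ValueError("key must be non-empty")
    offsets = []
    pos = buf.find(key)
    while pos != -1:
        offsets.append(pos)
        pos = buf.find(key, pos + 1)  # +1 so overlapping matches are found too
    return offsets


def scan_dump_file(path, key):
    """Map a raw memory dump read-only and search it without loading it all into RAM."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return find_key_offsets(mm, key)
```

In a test script, the pass condition after suspend is simply `assert not scan_dump_file("guest-ram.raw", volume_key)`. A verbatim search only finds byte-for-byte copies, so an empty result does not rule out derived material such as an expanded cipher key schedule; a clean scan is evidence, not proof.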

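The failure crosses project boundaries: kernel, cryptsetup, systemd, and distribution glue. Each project's own tests can pass while the combination fails, so an end-to-end test at the distribution level is the natural place to check the property users actually care about: after suspend, no copy of the volume key is left in RAM.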

Secure suspend for NixOS is the next experiment

Blechschmidt followed the disclosure with a June 21 announcement of experimental secure suspend-to-RAM for NixOS. In that post, he credited an older kernel patch by Pali Rohar and described the new work as a NixOS-focused attempt to wipe LUKS encryption keys on suspend, re-ask for them on resume, and avoid a race condition he associated with Debian's approach.

He also said the kernel patch and userspace tooling could be adapted to other Linux distributions. That is a plausible path, although the public material reviewed here does not establish that the kernel patch has landed upstream, that the NixOS PR has merged, or that cryptsetup has shipped the warning and keyring changes in a release.

For users, the safest reading is narrower than the viral headline version of the story. LUKS did not stop encrypting disks. A suspend-time key-wiping workflow that some security-conscious Linux users relied on appears to have stopped wiping all relevant key material after a Linux 6.9-era kernel change. A full shutdown still removes power from RAM. Ordinary suspend-to-RAM, with no key-wiping step, keeps the keys in memory by design, which is a different threat model altogether.

For maintainers, the finding is a reminder that security properties need end-to-end tests. Documentation that says a key is wiped, code that wipes one copy, and a kernel that keeps another reference alive can all be individually understandable while the system fails its promise.

Blechschmidt's contribution was to verify the promise rather than trust the shape of the code. He bisected the regression, dumped VM memory with QEMU, filed the kernel patch, proposed a NixOS regression test, and pushed cryptsetup toward warning users instead of failing silently. That is slow, unglamorous security work. It is also the work that turns a scary discovery into something the next release process can catch.
