Malware authors use nuclear and biological weapons language to evade scanners
A Hades supply-chain wave hid weapons-policy bait in non-executing code comments to jam LLM-first malware triage.
By Ryan Merket ยท Published
Why it matters
The incident turns an AI safety feature into a predictable security failure mode: a model that refuses too early can become the blind spot attackers route around.

John Scott-Railton (@jsrailton), the Citizen Lab spyware researcher, flagged a practical failure mode for AI security tooling this week: malware authors are putting nuclear and biological weapons language into malicious code so safety-tuned models refuse to analyze it.
Scott-Railton wrote in a June 10 thread on X that the goal was to trigger LLM refusals before an AI-assisted scanner reached the spyware itself. His point was not that the trick defeats conventional malware detection. It was sharper than that: attackers have found a content-based denial-of-analysis path in systems that hand untrusted files to models trained to stop when they see certain categories of dangerous content.
The underlying research came from Socket, whose threat research team reported on June 8 that newer PyPI artifacts tied to the Mini Shai-Hulud, Miasma, and Hades campaign include a fake prompt-injection header at the top of an _index.js payload. Socket said the header sits inside a JavaScript block comment, so the runtime ignores it. An AI analysis system that ingests the start of the file without treating it as hostile source text may not.
Scott-Railton is not a random commentator on the AI safety debate. He is a senior researcher at the Citizen Lab and leads its Targeted Threats work, a beat centered on spyware, phishing, and digital attacks against high-risk people. That background matters because the episode is less a philosophical argument about model refusals than a field report from the oldest pattern in malware: if defenders build a predictable gate, attackers will test what makes it stop.
The payload still runs after the model stops
Socket described the campaign as a moving supply-chain attack rather than a single bad package. The newer wave added 23 PyPI package-version artifacts beyond an earlier set of 37 malicious PyPI wheels. It targeted bioinformatics packages, AI and MCP-themed packages, and typo-style names such as rsquests, tlask, and rlask.
The execution paths varied. Some malicious wheels used .pth startup hooks to run a JavaScript payload through Bun. Some bioinformatics packages hid the malicious path in compiled .abi3.so native extensions that execute when Python imports the module. A langchain-core-mcp variant did not bundle the expected _index.js payload at all; Socket said it searched Python's sys.path for the payload and tried to run it with Bun.
That last pattern is the important tell. The campaign is iterating around how scanners work, not merely around what payload to drop. A scanner looking for a loader and payload in the same wheel could miss a split-delivery setup. A reviewer scanning only visible Python source could miss malicious code launched from a native extension. An LLM-first pipeline reading the first chunk of a file could get stuck on safety-policy bait before it reaches the obfuscated stealer.
Socket said the stealer targets developer workstations and CI/CD environments for GitHub, npm, PyPI, RubyGems, JFrog, cloud credentials, Kubernetes service account material, SSH keys, Docker configuration, shell histories, .env files, package registry credentials, and AI developer tool configuration. Socket's live campaign tracker listed the broader Miasma Mini Shai-Hulud activity as ongoing, first discovered on June 1, with last activity on June 11 and 475 affected package artifacts across 146 unique packages.
This is a safety boundary used as an attack surface
The AI-specific evasion is simple. The malicious _index.js begins with a non-executing comment containing fake system instructions and policy-triggering content. Socket wrote that the header appears designed for AI-mediated analysis, not for Node, Bun, or Python. In weak pipelines, it can cause refusal behavior, prompt confusion, context pollution, or a premature classification before the scanner reaches the malware.
Scott-Railton framed the same mechanism as a second-order consequence of aggressive refusals. The first-order rule is that the LLM should refuse to provide weapons assistance. The second-order attack is that a malware author maps what the model refuses and places that text where a scanner will see it first.
In the thread, Scott-Railton said the trick does not defeat YARA and similar detection paths. Socket made the same point: YARA rules, entropy checks, AST parsing, string extraction, deobfuscation, and behavioral rules still work. The weakness is narrower and more operational: AI triage systems that put an LLM at the front of the analysis path, feed it hostile content without isolation, and treat a refusal as the end of the scan.
A test in the thread showed why researchers are paying attention. Scott-Railton asked whether someone could check the sample in Fable, then later wrote that Tal Be'ery (@TalBeerySec) had found a refusal on Fable 5 in a linked X post. That is not a benchmark, and it does not establish broad scanner failure. It does show the failure mode is not theoretical.
The pressure point is developer trust
The campaign's choice of targets explains the timing. Bioinformatics, Python, MCP, and AI developer ecosystems are dependency-heavy and automation-heavy. They also sit close to credentials that can turn one compromised install into more compromised packages, workflow changes, or access to production-adjacent infrastructure.
That makes AI-assisted scanning attractive to both defenders and attackers. Defenders want fast triage across package registries, pull requests, CI logs, and unfamiliar source files. Attackers see a new parser in the pipeline, one with language-model behavior that can be manipulated through the same file it is supposed to inspect.
The lesson for AI security startups and internal platform teams is not to remove safety controls. It is to stop treating refusal as a valid terminal state for malware analysis. Untrusted code has to be parsed as data, not obeyed as instruction. Safety filtering has to be separated from the question the scanner is actually answering: what does this file do, what could execute, and what secrets could it reach?
Socket's defensive guidance stays close to that reality. Review Python environments for executable .pth files, unexpected _index.js files, Bun download logic, and new .abi3.so extensions. In CI/CD environments, inspect runners for unusual workflow changes, Docker socket abuse, poisoned /etc/hosts entries, unexpected privileged containers, and package publishing credentials.
Scott-Railton's broader warning is that the market is early in discovering how model safety behavior changes security tooling. Attackers do not need to defeat every detector. They need one automated gate in one build path to stop looking before the payload starts.