In late 2025, researchers at CISA documented a ransomware campaign in which every payload delivered to a new victim was functionally identical but had a unique hash: same logic, different bytes every time. No two samples shared a signature. The threat actor was using an LLM-based mutation engine to rewrite the malware shell between deployments, and traditional AV caught none of the initial drops.
This is AI-generated polymorphic malware in the wild. Here is exactly how it works, and what defenders can do about it.
How the Mutation Engine Works
Classic polymorphic malware carried a built-in mutation engine: a chunk of code that shuffled instructions and re-encrypted the payload on each execution. Because that engine was static, it gave defenders a target. AI changes that. The mutation engine now lives outside the malware. It is a fine-tuned code-generation model that produces a semantically equivalent but syntactically distinct version of the malicious code before it ever reaches a target.
The workflow looks like this. An attacker feeds a base payload to an LLM, either a locally hosted model (Mistral, WizardCoder) or a jailbroken API endpoint. The model receives a prompt like “Rewrite this function using different variable names, control flow, and encoding. Preserve behavior, change structure.” The output is a fresh variant. Rinse. Repeat per target.
Here is a stripped-down Python example showing what a mutated shellcode loader looks like next to its predecessor. Both do the same thing: allocate executable memory, copy shellcode into it, and start a thread there.
# Variant A — original loader
import ctypes, base64
buf = base64.b64decode("SHELLCODE_B64")
rwx = ctypes.windll.kernel32.VirtualAlloc(0,len(buf),0x3000,0x40)
ctypes.memmove(rwx,buf,len(buf))
ctypes.windll.kernel32.CreateThread(0,0,rwx,0,0,0)
# Variant B — AI-mutated, same behavior
import ctypes, codecs
_d = codecs.decode(b"SHELLCODE_HEX", "hex")
_sz = len(_d)
_mem = ctypes.windll.kernel32.VirtualAlloc(None, _sz, 12288, 64)
ctypes.memmove(_mem, _d, _sz)
ctypes.windll.kernel32.CreateThread(None, 0, _mem, None, 0, None)
Same API calls. Same outcome. Different token sequences. A signature built on Variant A will miss Variant B entirely. Here is what changed:

- Variable names.
- Encoding method: base64.b64decode swapped for a hex decode via codecs.decode.
- How null arguments are written: 0 in Variant A, None in Variant B.
- Constant representation: hex 0x3000 and 0x40 in Variant A, their decimal equivalents 12288 and 64 in Variant B.

The mutation took an LLM about four seconds to generate.
A defender looking at this needs to shift from string matching to behavioral detection. Both variants call VirtualAlloc with page protection 0x40 (PAGE_EXECUTE_READWRITE, i.e. RWX). They then copy bytes into that region and call CreateThread with the region as the thread's start address. That sequence is the signal, not the variable names.
Detecting It: Behavioral Rules and Entropy Analysis
Because each variant is syntactically unique, your YARA signatures will fail if they target surface features. You need two layers: behavioral telemetry from your EDR and static entropy analysis to flag suspicious files before execution.
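Start with entropy. AI-mutated loaders frequently embed encoded payloads (base64, hex, XOR blobs) that raise a file's Shannon entropy above what ordinary source code produces.

Mind the encoding when you pick a threshold:

- Base64 text carries at most 6 bits per byte.
- Hex carries at most 4 bits per byte.
- A file made only of printable ASCII and whitespace tops out near 6.6 bits per byte.

So a score above 6.5 on a script usually points to raw or XOR-encoded binary bytes rather than a plain base64 or hex string. Run binwalk -E or a quick Python script against files in your quarantine folder.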
# Entropy scan on a suspicious file caught by EDR on host web01.corp (192.0.2.44)
# User: jsmith triggered an unusual Python process at 03:12 UTC
$ python3 -c "
import math, collections, sys
data = open(sys.argv[1],'rb').read()
freq = collections.Counter(data)
entropy = -sum((c/len(data))*math.log2(c/len(data)) for c in freq.values())
print(f'Entropy: {entropy:.4f}')
" suspicious_loader.py
Entropy: 6.8821
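Anything above 6.5 in a Python script is a red flag: legitimate Python source rarely embeds high-entropy blobs. This file scored 6.88, which tells you there is encoded data inside worth extracting. The reverse does not hold. Whole-file entropy dilutes a small blob inside a large script, so a low score does not prove a file is clean.

Next step: pull out the embedded blob and decode it if it is base64 or hex. Then either submit the bytes to a sandbox or run strings against them to look for readable API names, DLL names, or network indicators.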
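The extraction step can be partly automated when the payload sits in a string or bytes literal. The sketch below is one way to do it, not a standard tool. It walks the script's AST and tries hex decoding, then strict base64 decoding, on every literal of at least `MIN_LITERAL_LEN` characters. That cutoff is a hypothetical value chosen to skip ordinary strings. For each literal that decodes, it reports:

- the decoded size;
- the entropy of the decoded bytes;
- any printable runs, roughly what `strings` would show.

Expect some false positives, such as hash digests stored as hex. Payloads assembled at runtime from fragments, or wrapped in an extra XOR layer, will not decode here and still need manual work or a sandbox.

```python
import ast
import base64
import binascii
import math
import re
import sys
from collections import Counter
from typing import Optional, Union

MIN_LITERAL_LEN = 24  # hypothetical cutoff: shorter literals are skipped
HEX_RE = re.compile(r"(?:[0-9a-fA-F]{2})+")
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")  # like `strings`, min length 4


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def try_decode(literal: Union[str, bytes]) -> Optional[bytes]:
    """Try hex first (hex text is often also valid base64), then strict base64."""
    text = literal.decode("ascii", "ignore") if isinstance(literal, bytes) else literal
    text = text.strip()
    if HEX_RE.fullmatch(text):
        return bytes.fromhex(text)
    try:
        # validate=True rejects spaces and punctuation instead of skipping them.
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None


def scan_literals(source: str) -> list:
    """Decode long string/bytes literals and summarize what they contain."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Constant) or not isinstance(node.value, (str, bytes)):
            continue
        if len(node.value) < MIN_LITERAL_LEN:
            continue
        decoded = try_decode(node.value)
        if decoded is None:
            continue
        findings.append({
            "line": node.lineno,
            "size": len(decoded),
            "entropy": round(shannon_entropy(decoded), 4),
            "strings": PRINTABLE_RUN.findall(decoded),
        })
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python decode_literals.py suspicious_loader.py
    with open(sys.argv[1], encoding="utf-8") as fh:
        for hit in scan_literals(fh.read()):
            print(hit)
```

Hex is tried before base64 because a string of hex digits with a length divisible by four is also valid base64 and would otherwise decode to garbage.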
Layer two is your EDR behavioral rule. In Microsoft Defender for Endpoint KQL, hunting for the VirtualAlloc-then-CreateThread pattern on script interpreters looks like this:
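// MDE Advanced Hunting (KQL)
// Hunt: script interpreter allocating RWX memory, then creating a thread
// Verify both ActionType values and the AdditionalFields format against the
// DeviceEvents schema in your tenant; API-event names vary across sensor versions.
let lookback = 24h;
DeviceEvents
| where Timestamp > ago(lookback)
| where ActionType == "CreateRemoteThreadApiCall"
| where InitiatingProcessFileName in~ ("python.exe", "powershell.exe", "wscript.exe")
| join kind=inner (
    DeviceEvents
    | where Timestamp > ago(lookback)
    | where ActionType == "VirtualAllocApiCall"
    | where AdditionalFields contains "Protection\":\"PAGE_EXECUTE_READWRITE\""
    | project DeviceId, InitiatingProcessId, InitiatingProcessCreationTime, AllocTime = Timestamp
  ) on DeviceId, InitiatingProcessId, InitiatingProcessCreationTime
// Keep only pairs where the RWX allocation came before the thread creation
| where AllocTime <= Timestamp
| project Timestamp, AllocTime, DeviceId, DeviceName, InitiatingProcessFileName,
    InitiatingProcessCommandLine, InitiatingProcessAccountName
| order by Timestamp desc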
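This query joins two API events from the same process: a VirtualAlloc call requesting RWX pages and a subsequent CreateRemoteThread call. One caveat: MDE documents CreateRemoteThreadApiCall as covering threads created in a remote process. A loader that calls CreateThread on its own allocation, like the two variants above, may therefore not produce it. Confirm which local allocation and thread-creation events your EDR records before you trust an empty result.

If python.exe on host web01.corp shows up here at 03:12 UTC under jsmith's account, you have an incident. Isolate the host, capture a memory dump of the process, and extract the shellcode for further analysis.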
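If your EDR can export raw API events but lacks a join-capable query language, the same correlation can be run offline. This is a minimal sketch under stated assumptions:

- Each event is a dict with hypothetical keys `device`, `pid`, `time`, `action` and `protect`. You would map these from your platform's export format.
- `WINDOW` is a hypothetical limit on how far apart the allocation and the thread creation may be.

The sketch keys events by device and process ID. It keeps only pairs where an RWX allocation precedes a thread creation within the window, which is the ordering the hunt is meant to capture.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical export schema; map your EDR's field names onto these keys.
RWX = "PAGE_EXECUTE_READWRITE"
WINDOW = timedelta(seconds=5)  # hypothetical max gap between alloc and thread


def correlate(events: List[dict]) -> List[Tuple[str, int, datetime, datetime]]:
    """Return (device, pid, alloc_time, thread_time) for every thread creation
    that follows an RWX allocation in the same process within WINDOW."""
    last_rwx_alloc: Dict[Tuple[str, int], datetime] = {}
    hits = []
    for ev in sorted(events, key=lambda e: e["time"]):
        key = (ev["device"], ev["pid"])
        if ev["action"] == "alloc" and ev.get("protect") == RWX:
            last_rwx_alloc[key] = ev["time"]
        elif ev["action"] == "thread" and key in last_rwx_alloc:
            alloc_time = last_rwx_alloc[key]
            if ev["time"] - alloc_time <= WINDOW:
                hits.append((ev["device"], ev["pid"], alloc_time, ev["time"]))
    return hits


if __name__ == "__main__":
    # Hypothetical sample events for illustration only.
    t0 = datetime(2025, 1, 1, 3, 12, 0)
    sample = [
        {"device": "web01.corp", "pid": 4242, "time": t0, "action": "alloc", "protect": RWX},
        {"device": "web01.corp", "pid": 4242, "time": t0 + timedelta(seconds=1), "action": "thread"},
    ]
    print(correlate(sample))
```

The check only remembers the most recent RWX allocation per process. That keeps memory flat on large exports, at the cost of ignoring older allocations in a long-lived interpreter.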
What To Do Now
AI-mutation engines lower the cost of evasion to nearly zero. Signature-based AV is not going to save you here. The architecture of your detection stack has to change: behavioral rules, memory telemetry, and entropy-based pre-screening working together.
Try This: Right now, open your EDR console and run a version of the KQL query above against the last 7 days of telemetry, widening the lookback from 24 hours to 7 days. Filter on python.exe, powershell.exe, and mshta.exe making VirtualAlloc calls with RWX protection.

If you get zero results, verify that your EDR is actually collecting API-level telemetry; some platforms only record these events after an explicit policy setting is enabled. If you get hits, start triaging by account name and time of day. Outliers at odd hours are your highest-priority leads.
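A first pass at that triage can be scripted. The sketch below makes two assumptions: hits are exported as (timestamp, account, host) tuples in UTC, and business hours are a hypothetical 08:00 to 18:00 UTC window. Adjust both to your environment.

It ranks the hits in this order:

1. Off-hours hits.
2. Hits from rarely seen accounts.
3. Newest first within each group.

```python
from collections import Counter
from datetime import datetime, time
from typing import List, Tuple

Hit = Tuple[datetime, str, str]  # (timestamp in UTC, account, host)

BUSINESS_START = time(8, 0)  # hypothetical business hours, UTC
BUSINESS_END = time(18, 0)


def is_off_hours(ts: datetime) -> bool:
    """True if the hit falls outside the business-hours window."""
    return not (BUSINESS_START <= ts.time() < BUSINESS_END)


def triage(hits: List[Hit]) -> List[Hit]:
    """Order hits: off-hours first, then rarer accounts, then newest first."""
    per_account = Counter(account for _, account, _ in hits)
    newest_first = sorted(hits, key=lambda h: h[0], reverse=True)
    # sorted() is stable, so newest-first order survives within each group.
    return sorted(newest_first, key=lambda h: (not is_off_hours(h[0]), per_account[h[1]]))


if __name__ == "__main__":
    # Hypothetical sample hits for illustration only.
    sample = [
        (datetime(2025, 1, 6, 10, 30), "svc_build", "ci01"),
        (datetime(2025, 1, 6, 11, 45), "svc_build", "ci01"),
        (datetime(2025, 1, 7, 3, 12), "jsmith", "web01.corp"),
    ]
    for ts, account, host in triage(sample):
        print(ts.isoformat(), account, host)
```

With the sample above, jsmith's 03:12 UTC hit sorts ahead of the two daytime hits from the hypothetical svc_build account.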
