CVE-2026-44439 is a server-side request forgery in PlaywrightCapture, a Python wrapper that orchestrates headless-browser page capture on top of Playwright. The bug lets attacker-controlled HTML pivot the renderer to file:// paths, loopback services, link-local cloud metadata endpoints, and RFC1918 private networks during page rendering. It is fixed in PlaywrightCapture 1.39.6 via a new only_global_lookup default. The bigger story is that this same attack pattern reaches every HTML-to-PDF service that accepts user HTML and renders it server-side. Treat the CVE as a wake-up call about a whole class, not a single library.
What CVE-2026-44439 actually does
CVE-2026-44439 is a server-side request forgery (SSRF) primitive in PlaywrightCapture, the Python orchestration library that wraps Playwright to "safely" capture web pages. The GitLab advisory classifies it as CWE-918 (SSRF), and the DailyCVE summary lists Medium severity. The fix shipped in version 1.39.6 with a new only_global_lookup flag, defaulting to True, that filters resolved IPs to public-routable addresses only.
The attack mechanism is direct. PlaywrightCapture loads a target URL, waits for the page to settle, and serializes the rendered DOM. Before 1.39.6, the library did not interpose on in-page navigations once rendering started. An attacker-controlled page could ship JavaScript that redirected the renderer to a forbidden target after the initial check had passed.
A minimal proof-of-concept payload:
<!DOCTYPE html>
<html>
<head>
<title>Looks innocent</title>
</head>
<body>
<h1>Hello world</h1>
<script>
// Pivot the renderer to AWS instance metadata
window.location.href =
'http://169.254.169.254/latest/meta-data/iam/security-credentials/';
</script>
</body>
</html>
When PlaywrightCapture renders this page, the JavaScript redirect fires inside the Chromium tab. The renderer fetches the metadata endpoint, the response body is rendered as text, and the captured output now lists the IAM role attached to the EC2 instance; one more request to that role's path under the same endpoint returns its temporary credentials. The attacker submits a URL and receives a PDF (or HTML snapshot, in PlaywrightCapture's case) containing data from the metadata service of the host running the capture service.
The same primitive works for file://:
<script>
window.location.href = 'file:///etc/passwd';
</script>
And for iframes (which the redirect-style payload generalizes to):
<iframe src="http://10.0.0.42:8080/admin/health" style="width:100%;height:600px"></iframe>
In each case, the attacker submits HTML, the renderer reaches a destination it was never meant to reach, and the response body comes back to the attacker through the rendered output.
If you operate any HTML-to-PDF service built on Playwright, Puppeteer, or raw headless Chromium and you do NOT explicitly intercept and filter requests at the browser level, this attack pattern works against you today. The PlaywrightCapture CVE is a single library's CVE; the attack pattern is library-agnostic. The fix in 1.39.6 is a defense for one wrapper; the underlying primitive (browsers fetch any URL by default) is unchanged in every other wrapper.
Why this is an HTML-to-PDF whole-class problem
Every HTML-to-PDF API has the same architectural property: it accepts attacker-controlled HTML and feeds it to a full browser. A full browser, by design, fetches from any URL its host network stack can reach. Without explicit filtering, the attack surface is every destination the renderer's host can reach over the network, plus the local filesystem.
The blast radius for an unguarded HTML-to-PDF renderer is roughly the same across vendors:
| Attack vector | Example target | What an attacker reads |
|---|---|---|
| Cloud metadata endpoint (AWS) | http://169.254.169.254/latest/meta-data/iam/security-credentials/ | AWS temporary IAM credentials |
| Cloud metadata endpoint (GCP) | http://metadata.google.internal/computeMetadata/v1/ (Metadata-Flavor header required) | GCP service-account tokens |
| Cloud metadata endpoint (Azure) | http://169.254.169.254/metadata/instance?api-version=2021-02-01 (Metadata header required) | Azure managed identity tokens |
| Loopback service | http://127.0.0.1:8080/admin, http://localhost:6379 | Internal admin UIs, Redis, Memcached, debug ports |
| RFC1918 private range | http://10.0.0.42, http://172.16.0.5, http://192.168.1.100 | Internal services on the VPC |
| file:// scheme | file:///etc/passwd, file:///app/.env | Local files readable by the renderer process |
| IPv6 link-local | http://[fe80::1]/ | Adjacent IPv6 hosts on the link |
| Loopback aliases | http://0.0.0.0:port, http://[::1]:port | Loopback services that naive 127.0.0.1 blocklists miss |
The AWS Instance Metadata Service documentation is explicit that IMDSv1 is unauthenticated, and AWS's hardening guidance specifically calls out SSRF as the typical exploit path. The OWASP SSRF Prevention Cheat Sheet lists the same attack vectors and the same blocklists, written long before this specific CVE landed. RFC 1918 (datatracker.ietf.org/doc/html/rfc1918) defines the private address space that needs to be in any sane blocklist.
The structural property is the part that matters: in every one of those rows, the renderer process is the one issuing the fetch. From the perspective of the destination service, the request looks like it came from the trusted server, with the trusted server's IAM role, on the trusted server's internal network. SSRF turns the renderer into a confused deputy.
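The rows of the table can be sketched as a small classifier. This is an illustrative sketch, not any library's implementation: `classifyTarget`, the labels, and the grouping of ranges are assumptions made for the example, and it relies on Node's built-in `net.BlockList` for the CIDR matching.

```typescript
import { BlockList, isIP } from "node:net";

type Family = "ipv4" | "ipv6";

// Each entry pairs a label (loosely following the table) with the ranges it covers.
const VECTORS: Array<[string, BlockList]> = [];

function vector(label: string, subnets: Array<[string, number, Family]>): void {
  const list = new BlockList();
  for (const [net, prefix, family] of subnets) list.addSubnet(net, prefix, family);
  VECTORS.push([label, list]);
}

vector("loopback", [["127.0.0.0", 8, "ipv4"], ["0.0.0.0", 8, "ipv4"], ["::1", 128, "ipv6"]]);
vector("link-local / cloud metadata", [["169.254.0.0", 16, "ipv4"], ["fe80::", 10, "ipv6"]]);
vector("RFC1918 private range", [
  ["10.0.0.0", 8, "ipv4"],
  ["172.16.0.0", 12, "ipv4"],
  ["192.168.0.0", 16, "ipv4"],
]);

function classifyTarget(rawUrl: string): string {
  const url = new URL(rawUrl);
  if (url.protocol === "file:") return "file:// scheme";
  // URL.hostname keeps the brackets around IPv6 literals; strip them before testing.
  const host = url.hostname.replace(/^\[|\]$/g, "");
  const family = isIP(host);
  // Hostnames cannot be classified without resolving them first.
  if (family === 0) return "hostname (resolve before deciding)";
  for (const [label, list] of VECTORS) {
    if (list.check(host, family === 6 ? "ipv6" : "ipv4")) return label;
  }
  return "public address";
}
```

A hostname such as metadata.google.internal deliberately falls into the "resolve before deciding" bucket: the classification only means something once it is applied to the addresses the name actually resolves to.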
How to test if your HTML-to-PDF pipeline is vulnerable
The fastest way to know is to try the attack against your own service in a controlled environment. Three test payloads cover the redirect-style, iframe-style, and file:// variants.
Test 1: JavaScript redirect to AWS metadata (most common probe).
<!DOCTYPE html>
<html>
<body>
<p>Render started.</p>
<script>
setTimeout(function () {
window.location.href =
'http://169.254.169.254/latest/meta-data/';
}, 200);
</script>
</body>
</html>
Submit that HTML to your /render endpoint and inspect the resulting PDF. A vulnerable pipeline returns a PDF containing the metadata service's directory listing (ami-id, hostname, iam/, etc.). A safe pipeline returns either a PDF showing "Render started." with no metadata content, an explicit error referring to a blocked destination, or a timeout. If you receive the metadata listing, the pipeline is exposed. Note that on hosts outside AWS this path is usually missing or the endpoint is unreachable, so the test can produce a false negative even when the pipeline is unsafe; use the loopback variant below as a second check.
Test 2: Iframe to loopback (works on any host with a loopback service).
<!DOCTYPE html>
<html>
<body>
<p>Render started.</p>
<iframe
src="http://127.0.0.1/"
width="600"
height="400"
></iframe>
</body>
</html>
Submit and inspect. If the iframe area in the PDF shows a response from any service listening on 127.0.0.1 (even an "It works!" default page, an Nginx welcome screen, or a connection-refused error rendered by Chromium with the destination IP visible), the pipeline allows loopback fetches and is exposed.
Test 3: file:// access to a known file.
<iframe src="file:///etc/hostname" width="400" height="100"></iframe>
If the rendered iframe contains the hostname of the renderer container or VM, local-file SSRF is exploitable. The exact contents that come back depend on Chromium's handling of text/plain in iframes, but anything other than a blank iframe or an error referencing a blocked scheme is a finding.
Run these against staging, not production, and only if you operate the service. Probing third-party HTML-to-PDF endpoints without permission is out of scope for this guide and likely violates the vendor's terms of service.
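When the tests run in CI against staging, the pass/fail reading described above can be automated. The following is a rough sketch that assumes the PDF's text has already been extracted by some other tool; `assessRender`, the verdict names, and the marker strings are hypothetical choices for the example (the markers come from the metadata listing mentioned above plus a typical first line of /etc/passwd).

```typescript
type Verdict = "exposed" | "no-leak-observed" | "inconclusive";

// Strings whose presence in the rendered output indicates a leak (illustrative list).
const LEAK_MARKERS: string[] = ["ami-id", "iam/", "security-credentials", "root:x:0:0"];

// The visible text of the harmless part of the test pages.
const SENTINEL = "Render started.";

function assessRender(extractedText: string): Verdict {
  // Any marker means the renderer reached a forbidden destination.
  if (LEAK_MARKERS.some((marker) => extractedText.includes(marker))) return "exposed";
  // The sentinel alone means the page rendered and nothing leaked in this run.
  if (extractedText.includes(SENTINEL)) return "no-leak-observed";
  // Neither: blocked with an error page, timed out, or extraction failed.
  return "inconclusive";
}
```

An "inconclusive" result still deserves a manual look, since an error page that names the blocked destination and a silent extraction failure are very different outcomes.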
The defense-in-depth playbook
No single mitigation catches every variant. Layer them, and assume each layer will eventually be bypassed.
| Layer | What it catches | What it misses |
|---|---|---|
| Chromium request interception via page.route() | All in-browser fetches, including redirects and iframes, before the socket opens. Catches file://, data: top-level, RFC1918 if the handler validates IPs. | Misses if the interception handler trusts hostnames without resolving them (DNS rebinding bypasses it). |
| Network-layer egress filter (iptables, nftables, namespace) | Any fetch that escapes Chromium request interception. Stops the renderer process from opening sockets to RFC1918 and link-local destinations at the kernel. | Misses file:// access entirely (no socket involved). Misses fetches to public IPs that the renderer should not be reaching. |
| DNS resolver hardening | DNS rebinding attacks. Resolve once at intercept time, validate the IP set, pin the resolution for the lifetime of the fetch. | Misses anything that bypasses your resolver (Chromium has its own DNS in some configurations; pin via --host-resolver-rules or run the renderer in a namespace with a controlled resolver). |
| Container isolation (network namespace, seccomp, read-only fs) | Lateral movement after an initial fetch succeeds. Limits what credentials the renderer process has to begin with. | Misses the first read of any destination already on the allowlist. Defense-in-depth, not a primary control. |
| Disabling file:// and risky schemes at the browser | Local-file SSRF specifically. | Misses every network-based SSRF. Needs the other layers. |
The Playwright pattern for the first layer, request interception, is documented in Playwright's network handling guide. The minimal handler looks like this:
import { chromium } from "playwright";
import { isIP } from "net";
import dns from "dns/promises";

const BLOCKED_SCHEMES = ["file:", "data:"];
const PRIVATE_CIDRS = [
  /^0\./, // "this network", including the 0.0.0.0 loopback alias
  /^10\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
  /^192\.168\./,
  /^127\./,
  /^169\.254\./,
  /^::1$/,
  /^fe80:/i,
];

async function isPrivate(hostname: string): Promise<boolean> {
  // URL.hostname keeps the brackets around IPv6 literals; strip them first
  const host = hostname.replace(/^\[|\]$/g, "");
  // If it's already a literal IP, check directly
  if (isIP(host)) return PRIVATE_CIDRS.some((re) => re.test(host));
  // Otherwise look up IPv4 and IPv6 addresses and check every one.
  // Fail closed: a host that does not resolve is treated as blocked.
  const records = await dns.lookup(host, { all: true }).catch(() => []);
  if (records.length === 0) return true;
  return records.some(({ address }) =>
    PRIVATE_CIDRS.some((re) => re.test(address)),
  );
}

const browser = await chromium.launch();
const page = await browser.newPage();
await page.route("**/*", async (route) => {
  const url = new URL(route.request().url());
  if (BLOCKED_SCHEMES.includes(url.protocol)) {
    return route.abort("blockedbyclient");
  }
  if (await isPrivate(url.hostname)) {
    return route.abort("blockedbyclient");
  }
  return route.continue();
});
That handler is the floor, not the ceiling. Its regex list is not exhaustive, and it does not pin the resolved IP, so a determined attacker with DNS rebinding can still slip through. The DNS rebinding section below covers the pin.
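One detail worth checking before any address test runs is what the URL parser hands back as the hostname. As a minimal sketch, assuming Node's WHATWG `URL` implementation: IPv4 literals written in decimal or hex come back normalized to dotted form, while IPv6 literals keep their square brackets, which `isIP` does not accept. The `hostForCheck` helper is a hypothetical name introduced for illustration.

```typescript
import { isIP } from "node:net";

interface CheckedHost {
  host: string; // hostname with brackets and any trailing dot removed
  family: number; // 4 or 6 for IP literals, 0 for names that still need resolving
}

function hostForCheck(rawUrl: string): CheckedHost {
  const { hostname } = new URL(rawUrl);
  const host = hostname
    .replace(/^\[|\]$/g, "") // "[::1]" -> "::1"
    .replace(/\.$/, ""); // "localhost." -> "localhost"
  return { host, family: isIP(host) };
}

// Obfuscated spellings of 127.0.0.1 are normalized by the URL parser itself.
const decimalForm = hostForCheck("http://2130706433/");
const hexForm = hostForCheck("http://0x7f.1/");
const bracketed = hostForCheck("http://[::1]:6379/");
```

Running the blocklist against the value returned here, rather than against the raw string the user submitted, closes the gap where `http://2130706433/` or `http://[::1]/` passes a naive string comparison.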
Network-layer egress filtering is the second layer. A renderer container running with a dedicated network namespace and an iptables egress allowlist is materially harder to attack than the same renderer with default routing. The basic pattern, applied at container startup:
# Drop all egress by default
iptables -P OUTPUT DROP
# Allow loopback only for the renderer's own internal IPC
iptables -A OUTPUT -o lo -j ACCEPT
# Explicitly drop the cloud metadata endpoint and private ranges as a
# belt-and-braces rule. These come before the allowlist because iptables
# stops at the first matching rule.
iptables -A OUTPUT -d 169.254.169.254 -j DROP
iptables -A OUTPUT -d 169.254.0.0/16 -j DROP
iptables -A OUTPUT -d 10.0.0.0/8 -j DROP
iptables -A OUTPUT -d 172.16.0.0/12 -j DROP
iptables -A OUTPUT -d 192.168.0.0/16 -j DROP
# Allow only the explicit destinations the renderer needs
# (DNS resolver, font CDN, telemetry endpoint, ...)
iptables -A OUTPUT -d 8.8.8.8 -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -d <fonts CDN IPs> -p tcp --dport 443 -j ACCEPT
The allowlist (the ACCEPT rules for the DNS resolver and the font CDN) is the part that takes work to get right: the renderer needs to fetch fonts, possibly external images, and sometimes external CSS. Each of those endpoints needs an explicit hole in the egress filter. The explicit DROP rules are insurance for the case where the allowlist is broader than intended, which is why they sit ahead of it in the chain.
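Because iptables evaluates a chain in order and stops at the first match, the relative position of DROP and ACCEPT rules matters as much as their content. One way to keep the order stable is to generate the ruleset from a declared allowlist. The sketch below is an assumption-laden illustration: `buildEgressRules` and `AllowRule` are hypothetical names, and the output is just a list of command strings to review before applying.

```typescript
interface AllowRule {
  destination: string; // IP or CIDR the renderer is allowed to reach
  protocol: "tcp" | "udp";
  port: number;
}

// Ranges that are dropped regardless of what the allowlist says.
const ALWAYS_DROP: string[] = [
  "169.254.0.0/16", // link-local, including cloud metadata
  "10.0.0.0/8",
  "172.16.0.0/12",
  "192.168.0.0/16",
];

function buildEgressRules(allow: AllowRule[]): string[] {
  return [
    "iptables -P OUTPUT DROP", // default deny
    "iptables -A OUTPUT -o lo -j ACCEPT", // renderer's own IPC
    // Insurance drops come first so an over-broad allow entry cannot shadow them.
    ...ALWAYS_DROP.map((cidr) => `iptables -A OUTPUT -d ${cidr} -j DROP`),
    ...allow.map(
      (r) =>
        `iptables -A OUTPUT -d ${r.destination} -p ${r.protocol} --dport ${r.port} -j ACCEPT`,
    ),
  ];
}

const exampleRules = buildEgressRules([
  { destination: "8.8.8.8", protocol: "udp", port: 53 },
]);
```

One trade-off of this ordering: if the DNS resolver itself lives in a private or link-local range, as is common inside a VPC, it needs its own narrowly scoped ACCEPT ahead of the drops.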
DNS rebinding is the bypass to watch
Request interception based on hostname only is not enough. An attacker who controls a domain can serve different DNS responses to consecutive queries: the first response returns a public IP that passes your interception check, the second response (seconds later, when Chromium actually fetches) returns 10.0.0.42. The blocklist never matched because the IP it saw was different from the IP that got fetched.
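The race described above can be reproduced without any network at all. The following toy model uses a fake resolver that answers differently on consecutive calls; every name in it (`rebindingResolver`, `checkThenFetch`, `resolveOncePinned`, the example addresses) is hypothetical and exists only to show the difference between validating one resolution and connecting with another.

```typescript
type Resolver = (host: string) => string;

// Mimics a rebinding domain: the first answer is public, later answers are private.
function rebindingResolver(publicIp: string, privateIp: string): Resolver {
  let calls = 0;
  return () => (calls++ === 0 ? publicIp : privateIp);
}

// Deliberately simplified private-range test for the demonstration.
const isPrivateToy = (ip: string): boolean =>
  ip.startsWith("10.") || ip.startsWith("127.") || ip.startsWith("169.254.");

// Vulnerable pattern: the check and the fetch each trigger their own resolution.
function checkThenFetch(host: string, resolve: Resolver): string | null {
  if (isPrivateToy(resolve(host))) return null; // time of check
  return resolve(host); // time of use: the answer may have changed
}

// Pinned pattern: resolve once, validate that answer, connect to that answer.
function resolveOncePinned(host: string, resolve: Resolver): string | null {
  const ip = resolve(host);
  return isPrivateToy(ip) ? null : ip;
}

const viaCheckThenFetch = checkThenFetch(
  "evil.example",
  rebindingResolver("203.0.113.10", "10.0.0.42"),
);
const viaPinned = resolveOncePinned(
  "evil.example",
  rebindingResolver("203.0.113.10", "10.0.0.42"),
);
```

In this model `viaCheckThenFetch` ends up pointing at the private address even though the check passed, while `viaPinned` stays on the address that was actually validated.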
The defense is to make the IP your blocklist validates the same IP the fetch uses. Three ways to do it:
Resolve in the interception handler and rewrite the URL. Inside page.route(), resolve the hostname yourself, validate every resolved A and AAAA record against the private-address blocklist, and rewrite the request URL to use the resolved IP literal. The fetch then connects to a fixed IP, not a hostname Chromium is free to re-resolve.
await page.route("**/*", async (route) => {
  const req = route.request();
  const url = new URL(req.url());
  if (BLOCKED_SCHEMES.includes(url.protocol)) {
    return route.abort("blockedbyclient");
  }
  // Resolve once (IPv4 and IPv6), validate every address, and pin
  const host = url.hostname.replace(/^\[|\]$/g, "");
  const addrs = await dns.lookup(host, { all: true }).catch(() => []);
  if (addrs.length === 0) {
    return route.abort("blockedbyclient");
  }
  if (addrs.some(({ address }) => PRIVATE_CIDRS.some((re) => re.test(address)))) {
    return route.abort("blockedbyclient");
  }
  // Pin the request to the first validated public IP, keeping port, path, and query
  const { address, family } = addrs[0];
  const pinned = new URL(url.href);
  pinned.hostname = family === 6 ? `[${address}]` : address;
  return route.continue({
    url: pinned.href,
    headers: { ...req.headers(), host: url.host },
  });
});
The Host header preservation matters: many target services route based on the original hostname, so the request still needs to look right at the application layer even though it connects to a pinned IP. One caveat: for HTTPS targets, connecting by IP literal generally breaks certificate validation, so this rewrite fits plain-HTTP fetches best, and the proxy approach below is usually the better fit for TLS.
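The URL rewrite is easy to get subtly wrong (dropping a non-default port, or forgetting the brackets an IPv6 literal needs), so it can help to isolate it in a small pure function that is testable without a browser. This is a sketch under that assumption; `pinUrl` and `PinnedRequest` are hypothetical names, not part of Playwright.

```typescript
import { isIP } from "node:net";

interface PinnedRequest {
  url: string; // same scheme, port, path, and query, with the host swapped for the IP
  hostHeader: string; // original host (plus non-default port) for the Host header
}

function pinUrl(original: string, ip: string): PinnedRequest {
  const url = new URL(original);
  // Capture the original host before overwriting it.
  const hostHeader = url.host;
  // IPv6 literals must be bracketed inside a URL.
  url.hostname = isIP(ip) === 6 ? `[${ip}]` : ip;
  return { url: url.toString(), hostHeader };
}

const pinnedV4 = pinUrl("http://example.com:8080/a?b=1", "93.184.216.34");
const pinnedV6 = pinUrl("https://example.com/x", "2001:db8::1");
```

The returned `url` and `hostHeader` map directly onto the `url` and `headers.host` fields that a `route.continue()` override would carry.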
Run the renderer behind a forward proxy that pins DNS. A small proxy in front of Chromium (squid, or a custom Node proxy) does the resolve-and-pin once, and Chromium fetches through it. Chromium never resolves the hostname itself; the proxy resolves, validates, and connects.
Use Chromium's --host-resolver-rules flag to override DNS resolution for specific hostnames inside the renderer. Useful for tests, less useful for general SSRF defense because it requires knowing the hostname set in advance.
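For the cases where the hostname set is known in advance, the flag value can be built from a map of pre-validated pins. The sketch below assumes the comma-separated `MAP <host> <ip>` rule syntax and the `~NOTFOUND` catch-all target, and assumes rules are matched in order; both are worth verifying against the Chromium version in use. `buildHostResolverRules` is a hypothetical helper.

```typescript
function buildHostResolverRules(
  pins: Record<string, string>,
  denyOthers = false,
): string {
  // One MAP rule per pinned hostname, in insertion order.
  const rules = Object.entries(pins).map(([host, ip]) => `MAP ${host} ${ip}`);
  // Optional catch-all: every hostname not pinned above fails to resolve.
  if (denyOthers) rules.push("MAP * ~NOTFOUND");
  return `--host-resolver-rules=${rules.join(", ")}`;
}

const resolverFlag = buildHostResolverRules(
  { "fonts.example.com": "203.0.113.7" },
  true,
);
```

With Playwright, a flag like this would be passed through the `args` array of `chromium.launch()`. The hostname and address shown are placeholders.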
All three patterns share the same invariant: the IP your check validates is the IP your fetch uses. Anything else leaves a window open.
How PDF4.dev handles this
PDF4.dev intercepts every Chromium request via page.route(), rejects file:// and data: top-level navigations, blocks all RFC1918 ranges plus 169.254.0.0/16 and fe80::/10 at the interception layer, runs each renderer container in a network namespace with a whitelist-only egress, and resolves hostnames once at intercept time with the resolved IP pinned for the fetch. Managed APIs in this category should ship these defaults; raw Playwright, Puppeteer, and self-hosted Gotenberg do not, and the operator is responsible for adding them.
Vendor due diligence question: ask your HTML-to-PDF provider whether their renderer blocks file://, RFC1918, and cloud metadata endpoints by default, and how they handle DNS rebinding. A clear written answer takes them ten minutes. If they cannot answer, assume the answer is no.
Frequently asked questions
The FAQs above (mirrored in this article's structured data) cover the most common follow-up questions: Playwright versus Puppeteer scope, why browsers resolve private IPs at all, the limits of blocking 169.254.169.254 alone, defeating DNS rebinding, evaluating managed PDF APIs, the hardened-render pattern, and whether file:// needs an explicit block even with network filters.
The durable takeaway for developers shipping HTML-to-PDF pipelines: every renderer is a full browser, every browser fetches from any URL by default, and SSRF in this category is a whole-class problem that needs a layered fix. CVE-2026-44439 is the latest single point on a long curve; the architectural response is the same one the OWASP cheat sheet has recommended for years, applied at the request-interception layer that Playwright, Puppeteer, and raw CDP all expose.