Converting a webpage to PDF means loading the URL in a headless Chromium browser, rendering its JavaScript and CSS, and exporting the result as a PDF. The free PDF4.dev tool at pdf4.dev/tools/webpage-to-pdf does this in seconds with no software to install. For automation and authenticated pages, Playwright gives full control.
How webpage-to-PDF works
A webpage-to-PDF conversion is not a screenshot. It uses Chromium's print engine, the same one behind Chrome's "Save as PDF" feature, running in headless mode:
- A headless Chromium instance opens the URL
- The browser loads HTML, CSS, JavaScript, images, and web fonts
- Scripts execute: React hydration, data fetching, dynamic rendering
- The browser waits for the page to reach a stable state
- page.pdf() renders the full document using Chromium's paged media engine
The result matches what you would see if you opened the URL in Chrome and printed to PDF.
URL-to-PDF vs HTML-to-PDF
| Feature | URL-to-PDF | HTML-to-PDF |
|---|---|---|
| Input | A live URL | A raw HTML string |
| Authentication | Requires public URL (or custom cookie handling) | Full control — pass any HTML |
| Dynamic content | Loaded by the browser at runtime | Static; you control the markup |
| Network dependency | Yes | No |
| Best for | Capturing existing pages | Generating documents from templates |
For generating invoices, reports, and certificates from data, HTML-to-PDF is more reliable. URL-to-PDF is the right choice when the page already exists and renders itself.
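To make the HTML-to-PDF column concrete, here is a minimal sketch of the template approach. The `Invoice` shape and the `escapeHtml` and `renderInvoiceHtml` helpers are hypothetical names chosen for illustration. The only Playwright calls involved are `page.setContent()` and `page.pdf()`, shown in the trailing comment.

```typescript
// Hypothetical data shape for illustration; not part of any PDF4.dev or Playwright API.
interface Invoice {
  number: string;
  customer: string;
  lines: { label: string; amount: number }[];
}

// Escape data values so they cannot inject markup into the template.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a complete, static HTML document from data. No network access is needed.
function renderInvoiceHtml(invoice: Invoice): string {
  const rows = invoice.lines
    .map((l) => `<tr><td>${escapeHtml(l.label)}</td><td>${l.amount.toFixed(2)}</td></tr>`)
    .join("");
  const total = invoice.lines.reduce((sum, l) => sum + l.amount, 0);
  return `<!doctype html><html><body>
<h1>Invoice ${escapeHtml(invoice.number)}</h1>
<p>${escapeHtml(invoice.customer)}</p>
<table>${rows}</table>
<p>Total: ${total.toFixed(2)}</p>
</body></html>`;
}

// With a Playwright page, the HTML string takes the place of page.goto():
//   await page.setContent(renderInvoiceHtml(invoice), { waitUntil: "load" });
//   const pdf = await page.pdf({ format: "A4", printBackground: true });
```

Because the markup is produced locally, the output does not depend on a remote server being reachable or on client-side scripts finishing in time.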
Method 1: Free browser tool (no install)
PDF4.dev's webpage-to-PDF tool converts any public URL to PDF without installing anything.
- Go to pdf4.dev/tools/webpage-to-pdf
- Paste the URL (must be publicly accessible)
- Choose paper format (A4 or Letter) and margins
- Click Convert to PDF and download
The server fetches the page with Playwright, auto-scrolls to trigger lazy loading, injects CSS to disable animations, and returns a downloadable PDF. Files are not stored after delivery.
Works well for:
- JavaScript-rendered pages (React, Vue, Angular, Next.js)
- Pages with web fonts
- Full-page capture including content below the fold
Limitations:
- Pages behind authentication cannot be accessed
- Infinite scroll content only loads to the depth the auto-scroll reaches
- Pages that actively block headless browsers will fail
Method 2: Playwright in Node.js or Python
For automation, CI/CD pipelines, or pages that require authentication, Playwright gives direct control over every step.
Install Playwright
npm install playwright
npx playwright install chromium
Basic URL-to-PDF
import { chromium } from "playwright";
import fs from "fs";
async function urlToPdf(url: string, outputPath: string): Promise<void> {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle" });
    const pdf = await page.pdf({
      format: "A4",
      margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
      printBackground: true,
    });
    fs.writeFileSync(outputPath, pdf);
  } finally {
    // Close the browser even if navigation or rendering throws
    await browser.close();
  }
}
urlToPdf("https://example.com/report", "report.pdf").catch(console.error);
waitUntil: "networkidle" waits until there have been no network connections for at least 500 ms. In most cases this means client-side rendered content (React, Next.js, SWR data fetching) has finished loading before the PDF is captured.
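`networkidle` can time out on pages that keep making requests (polling, periodic analytics calls), and Playwright's documentation discourages relying on it. One alternative is to wait for a specific element that only appears once the content has rendered. The sketch below assumes a hypothetical `gotoAndWaitFor` helper and a caller-supplied `readySelector`. `PageLike` is a minimal structural type standing in for Playwright's `Page`, so the example needs no imports. The 15-second default timeout is an arbitrary choice.

```typescript
// Minimal structural type covering the two Playwright Page methods used here.
// A real Playwright `Page` is expected to satisfy it.
interface PageLike {
  goto(
    url: string,
    options?: { waitUntil?: "load" | "domcontentloaded" | "networkidle"; timeout?: number }
  ): Promise<unknown>;
  waitForSelector(
    selector: string,
    options?: { state?: "attached" | "visible"; timeout?: number }
  ): Promise<unknown>;
}

// Navigate, then wait for an element that signals "content is rendered".
// `readySelector` is an assumption: pick one that only exists after data has loaded.
async function gotoAndWaitFor(
  page: PageLike,
  url: string,
  readySelector: string,
  timeoutMs = 15000
): Promise<void> {
  // "load" fires once the document and its static resources have loaded
  await page.goto(url, { waitUntil: "load", timeout: timeoutMs });
  // Rejects with a timeout error if the element never becomes visible
  await page.waitForSelector(readySelector, { state: "visible", timeout: timeoutMs });
}
```

In `urlToPdf`, this would take the place of the `page.goto()` call, for example `await gotoAndWaitFor(page, url, "#report-table")`, where `#report-table` is a placeholder selector.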
For Python, install with:
pip install playwright
playwright install chromium
Handling lazy-loaded images
Pages using loading="lazy" or IntersectionObserver defer image loading until the element is near the viewport. Playwright's default viewport is 1280×720, so images far below the fold may not load.
Scroll through the full page height before capturing to trigger them:
// Call this after page.goto(), before page.pdf()
await page.evaluate(async () => {
  await new Promise<void>((resolve) => {
    let scrolled = 0;
    const step = 300;
    const timer = setInterval(() => {
      window.scrollBy(0, step);
      scrolled += step;
      if (scrolled >= document.body.scrollHeight) {
        clearInterval(timer);
        window.scrollTo(0, 0); // scroll back to top before PDF capture
        resolve();
      }
    }, 100);
  });
});
// Wait for any newly triggered network requests to settle
await page.waitForLoadState("networkidle");
Disabling animations
Animations freeze at their current frame in a PDF. To get a clean, static render, inject CSS that stops all transitions and animations:
await page.addStyleTag({
  content: `
    *, *::before, *::after {
      animation-duration: 0s !important;
      animation-delay: 0s !important;
      transition-duration: 0s !important;
      transition-delay: 0s !important;
    }
  `,
});
This disables CSS animations and transitions. It does not stop libraries such as GSAP and Framer Motion, which animate from JavaScript rather than through CSS animation or transition rules; for those, wait until the animation has finished before capturing. Inject the style tag before calling page.pdf().
Avoiding bot detection
Many sites block headless Chromium by checking for signals that identify automated browsers. The most common signals are a user agent string containing HeadlessChrome, the navigator.webdriver property set to true, and headless-specific Chromium flags.
const browser = await chromium.launch({
  args: ["--disable-blink-features=AutomationControlled"],
});
const context = await browser.newContext({
  userAgent:
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  viewport: { width: 1280, height: 800 },
});
const page = await context.newPage();
// Remove the webdriver flag
await page.addInitScript(() => {
  Object.defineProperty(navigator, "webdriver", { get: () => undefined });
});
await page.goto("https://example.com");
These adjustments cover most basic detection. For sites with advanced anti-bot systems (Cloudflare Turnstile, Akamai Bot Manager), a stealth browser service provides pre-configured evasion and is more reliable than manual flag tuning.
Only convert pages you are authorized to access and archive. Bypassing detection on sites that explicitly block automated access may violate their terms of service.
Handling cookie consent banners
Consent dialogs block the content behind them and appear in the PDF. Dismiss them before capturing:
await page.goto(url, { waitUntil: "networkidle" });
// Try to click the accept button (common class/id patterns)
try {
  await page.click(
    '[id*="accept"], [class*="accept-all"], button:has-text("Accept")',
    { timeout: 3000 }
  );
  await page.waitForLoadState("networkidle");
} catch {
  // No banner found, continue
}
const pdf = await page.pdf({ format: "A4", printBackground: true });
Controlling screen vs print rendering
By default, page.pdf() renders the page with print CSS media, so any @media print rules on the page (hiding navigation, resetting backgrounds) apply to the PDF. To capture the page as it appears in a browser window instead, switch to screen media before capture:
await page.emulateMedia({ media: "screen" });
Screen rendering is usually the right choice when the goal is a faithful copy of what visitors see. Keep the default print media when the target site has dedicated print styles. The CSS print styles guide covers how to write print CSS that produces clean PDFs from HTML.
Reusing a browser instance for batch conversion
Launching a new browser process for each URL is slow. For batch jobs, launch one browser and reuse it across pages:
const browser = await chromium.launch();
const urls = ["https://example.com/page1", "https://example.com/page2"];
for (const url of urls) {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" });
  const pdf = await page.pdf({ format: "A4", printBackground: true });
  fs.writeFileSync(`output-${Date.now()}.pdf`, pdf);
  await page.close(); // close the page, keep the browser
}
await browser.close();
Each page takes roughly 1-3 seconds with a warm browser, depending on the target page's complexity.
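Timestamp-based filenames make it hard to tell afterwards which PDF came from which URL. One option is to derive the name from the URL and its position in the batch. `pdfFileNameForUrl` below is a hypothetical helper written for this article, not part of Playwright, and the 80-character cap is an arbitrary choice.

```typescript
// Turn a URL into a readable, filesystem-safe PDF filename.
// The numeric prefix preserves batch order and keeps names unique
// even when two URLs differ only in their query string.
function pdfFileNameForUrl(url: string, index: number): string {
  const slug = url
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "") // drop the scheme, e.g. "https://"
    .replace(/[?#].*$/, "") // drop query string and fragment
    .replace(/[^a-zA-Z0-9]+/g, "-") // replace runs of unsafe characters with "-"
    .slice(0, 80) // keep names reasonably short
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  const prefix = String(index + 1).padStart(3, "0");
  return `${prefix}-${slug || "page"}.pdf`;
}
```

In the batch loop, iterating with `for (const [i, url] of urls.entries())` makes the index available, so the write becomes `fs.writeFileSync(pdfFileNameForUrl(url, i), pdf)`.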
Common problems and fixes
| Problem | Cause | Fix |
|---|---|---|
| Blank PDF | JS rendering incomplete | Use waitUntil: "networkidle" or add page.waitForSelector() |
| Missing images | Lazy loading not triggered | Scroll the full page before page.pdf() |
| Fonts not rendering | Web font not loaded | Wait for networkidle before capture |
| Layout broken | Print CSS overriding screen styles | Use page.emulateMedia({ media: "screen" }) |
| Page cut off mid-content | Fixed-height container in CSS | Inject html, body { height: auto !important; overflow: visible !important; } |
| 403 / access denied | Bot detection | Set realistic user agent, remove navigator.webdriver |
| Consent banner in PDF | Modal not dismissed | Click accept button before capture |
| Images from CDN missing | Authentication-gated assets | Run the browser with the correct cookies for that domain |
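For authenticated pages or cookie-gated assets, one approach is to copy the session cookies from a logged-in browser and add them to the Playwright context with `context.addCookies()` before navigating. The sketch below assumes a hypothetical `cookiesFromHeader` helper that converts a raw `Cookie` header string into the object form `addCookies()` accepts. The `CookieInput` type lists only the fields used here, and the domain is a placeholder.

```typescript
// Subset of the cookie fields accepted by Playwright's context.addCookies().
// Each cookie needs either `url` or both `domain` and `path`; this sketch uses the latter.
interface CookieInput {
  name: string;
  value: string;
  domain: string;
  path: string;
}

// Convert a "name=value; other=value" Cookie header string into cookie objects.
function cookiesFromHeader(header: string, domain: string): CookieInput[] {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes("=")) // skip empty or malformed segments
    .map((pair) => {
      const eq = pair.indexOf("="); // split on the first "=" only; values may contain "="
      return { name: pair.slice(0, eq), value: pair.slice(eq + 1), domain, path: "/" };
    });
}

// With a Playwright browser (cookieHeader holds the copied header string):
//   const context = await browser.newContext();
//   await context.addCookies(cookiesFromHeader(cookieHeader, "example.com"));
//   const page = await context.newPage();
```

Session cookies are credentials, so load them from an environment variable or a secrets store instead of committing them to source control.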
Generating PDFs from templates (for structured documents)
URL-to-PDF is best for capturing existing pages. For generating structured documents (invoices, reports, certificates) from data, a template-based HTML-to-PDF approach gives more control over layout and produces leaner files. The PDF4.dev API handles this with a single POST request: pass a template ID and data, receive a PDF.
For a detailed comparison of the two libraries, see Playwright vs Puppeteer for PDF generation.
Related tools: convert webpage to PDF · HTML to PDF · compress PDF