PDF generation best practices for production

An operational guide to running PDF generation at scale: caching, retries, timeouts, browser pool lifecycle, memory, observability, and cost. Real numbers from the PDF4.dev production stack.

benoitded · 15 min read

PDF generation looks like a stateless function until you put it in production. Then you discover Chromium leaks, fonts go missing, the browser disconnects mid-request, and one stuck network call brings down the whole worker. This is the operational guide we wish existed when we built PDF4.dev.

The short answer: cache aggressively, retry with idempotency, never trust networkidle, recycle workers on a counter, and measure everything at p99.

What does production-grade PDF generation actually mean?

Production-grade means the system holds together under the full failure surface: cold starts, OOM kills, slow upstream assets, retries, malformed input, and traffic spikes. A prototype that renders an invoice in 300 ms is not production-grade. A service that holds 99.9 percent uptime with a p99 under 800 ms across 200k renders per day is.

Concern           | Prototype                 | Production
Browser lifecycle | launch() per request      | Singleton + worker recycle
Asset loading     | waitUntil: networkidle    | waitUntil: load + 30 s cap
Errors            | try/catch, log to stdout  | Sanitized, typed, retried
Caching           | None                      | Render cache by content hash
Observability     | Console logs              | p50/p95/p99, queue depth, RSS
Cost              | Free                      | $/1000 renders + on-call
Cold start        | 1.2 s                     | under 100 ms (warm pool)

This guide walks through each row of that table.

How do I cache PDF renders without serving stale documents?

Cache on a content hash of the inputs, never on the URL or the user. A PDF render is a pure function of HTML plus data plus format. Hash all three (SHA-256 over a canonical JSON representation) and use that as the cache key. Two requests with the same inputs return the same bytes; one byte difference produces a new key and a fresh render.

import { createHash } from "node:crypto";
 
function renderKey(html: string, data: object, format: object): string {
  const payload = JSON.stringify({
    html,
    data: sortKeys(data),
    format: sortKeys(format),
  });
  return "render_" + createHash("sha256").update(payload).digest("hex").slice(0, 32);
}
 
function sortKeys<T>(obj: T): T {
  if (Array.isArray(obj)) return obj.map(sortKeys) as T;
  if (obj && typeof obj === "object") {
    return Object.fromEntries(
      Object.entries(obj as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, sortKeys(v)]),
    ) as T;
  }
  return obj;
}

Two layers of cache work well in practice:

  1. Render cache. Stores the actual PDF bytes for 24 hours, keyed by renderKey(). Disk-backed for cheap, S3-backed for shared. PDF4.dev uses disk under data/renders/ with a 2000-entry soft cap and an HMAC-signed URL for delivery.
  2. Template cache. Stores the compiled Handlebars function in memory, keyed by the template ID and version. Skips the parse step entirely. A Handlebars compile takes 5-15 ms, which adds up across thousands of renders per minute.

Cache invalidation happens automatically: any change to HTML, data, or format produces a new hash. You never need to call cache.delete() manually.

Do not cache by user ID or template name alone. Two users with the same invoice template and the same data will get the same PDF, but if you key by user the cache hit rate drops to zero. Key by content, not by identity.
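
To make the first layer concrete, here is a minimal sketch of a disk-backed render cache keyed by renderKey(). The function names are illustrative, not the PDF4.dev internals; a production version would add the 2000-entry eviction cap and the HMAC-signed delivery URL on top, but the get/set shape stays the same.

import { promises as fs } from "node:fs";
import { join } from "node:path";
 
const CACHE_DIR = "data/renders";           // same directory as described above
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;   // 24-hour TTL
 
// Return cached PDF bytes for a renderKey(), or null on a miss or expiry.
export async function cacheGet(key: string): Promise<Buffer | null> {
  const file = join(CACHE_DIR, `${key}.pdf`);
  try {
    const stat = await fs.stat(file);
    if (Date.now() - stat.mtimeMs > CACHE_TTL_MS) return null; // stale entry
    return await fs.readFile(file);
  } catch {
    return null; // not cached yet
  }
}
 
// Store the rendered bytes under the content-hash key.
export async function cacheSet(key: string, pdf: Buffer): Promise<void> {
  await fs.mkdir(CACHE_DIR, { recursive: true });
  await fs.writeFile(join(CACHE_DIR, `${key}.pdf`), pdf);
}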

How do I implement retries and idempotency?

Accept an Idempotency-Key header on the render endpoint. Store the response keyed by that header for 24 hours. On retry, return the original response (same bytes, same HTTP status, same Content-Type). This is the Stripe pattern and it is the only retry strategy that holds under network failures.

import { NextRequest, NextResponse } from "next/server";
 
const idempotencyStore = new Map<string, { status: number; body: Buffer }>();
 
export async function POST(req: NextRequest) {
  const idempotencyKey = req.headers.get("idempotency-key");
  if (idempotencyKey && idempotencyStore.has(idempotencyKey)) {
    const cached = idempotencyStore.get(idempotencyKey)!;
    return new NextResponse(new Uint8Array(cached.body), {
      status: cached.status,
      headers: { "Content-Type": "application/pdf", "Idempotency-Replay": "true" },
    });
  }
 
  const { html, data } = await req.json();
  const pdf = await renderPdf(html, data);
 
  if (idempotencyKey) {
    idempotencyStore.set(idempotencyKey, { status: 200, body: pdf });
    setTimeout(() => idempotencyStore.delete(idempotencyKey), 24 * 60 * 60 * 1000);
  }
 
  return new NextResponse(new Uint8Array(pdf), {
    status: 200,
    headers: { "Content-Type": "application/pdf" },
  });
}

The in-memory Map is fine for a single instance. For multi-instance, swap it for Redis with a 24-hour TTL.
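
If you do swap the Map for Redis, the same get/set shape carries over. A minimal sketch using the ioredis client follows; the idem: key prefix and helper names are assumptions for illustration, not the PDF4.dev implementation.

import Redis from "ioredis";
 
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const TTL_SECONDS = 24 * 60 * 60; // 24-hour expiry, matching the in-memory version
 
// Store the response bytes under the idempotency key and let Redis expire them.
export async function idempotencySet(key: string, body: Buffer): Promise<void> {
  await redis.set(`idem:${key}`, body.toString("base64"), "EX", TTL_SECONDS);
}
 
// Return the original bytes on a replay, or null if the key is unknown.
export async function idempotencyGet(key: string): Promise<Buffer | null> {
  const cached = await redis.get(`idem:${key}`);
  return cached ? Buffer.from(cached, "base64") : null;
}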

The retry policy on the client side should use exponential backoff with jitter, capped at 3 attempts. Anything more is masking a real problem.

async function renderWithRetry(
  payload: object,
  attempt = 0,
  // Generate the key once and reuse it on every retry; a fresh key per
  // attempt would make each retry look like a brand-new request to the server.
  idempotencyKey: string = crypto.randomUUID(),
): Promise<Buffer> {
  try {
    const res = await fetch("https://pdf4.dev/api/v1/render", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.PDF4_API_KEY}`,
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey,
      },
      body: JSON.stringify(payload),
    });
    if (res.status >= 500 && attempt < 2) throw new Error("retryable");
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return Buffer.from(await res.arrayBuffer());
  } catch (err) {
    if (attempt >= 2) throw err;
    const delay = 200 * 2 ** attempt + Math.random() * 100;
    await new Promise((r) => setTimeout(r, delay));
    return renderWithRetry(payload, attempt + 1, idempotencyKey);
  }
}

Why does waitUntil matter for PDF timeouts?

The waitUntil value you pass to page.setContent(html, { waitUntil: ... }) is the most consequential single setting in the Playwright API for PDF work. Pick the wrong value and your worker hangs on a slow CDN or fires the PDF before the fonts arrive.

Value            | Fires when                                  | When to use
domcontentloaded | DOM tree parsed, no subresources loaded     | Fully self-contained HTML (no <img>, no @font-face)
load             | window.load event, all subresources loaded  | Default for inlined assets and embedded fonts
networkidle      | No network activity for 500 ms              | Heavy JS apps that wait on XHR. Hangs if any asset 404s slowly.
commit           | Navigation committed                        | Almost never useful for PDFs

Pair the waitUntil with an explicit hard timeout on the page operation:

const RENDER_TIMEOUT_MS = 30_000;
 
await page.setContent(html, { waitUntil: "load", timeout: RENDER_TIMEOUT_MS });
const pdf = await page.pdf({
  format: "A4",
  printBackground: true,
  timeout: RENDER_TIMEOUT_MS,
});

The 30-second cap is the safety net, not the target. Most PDFs render in 200-400 ms. Anything over 5 seconds is almost always a stuck <link rel="stylesheet"> to a slow CDN or a JavaScript animation that never settles.

For fonts specifically, prefer base64 data URIs over remote URLs. The CSS @font-face { src: url("data:font/woff2;base64,...") } pattern guarantees the font is "loaded" the moment the HTML is parsed, so waitUntil: load works without extra coordination.
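
One way to produce that rule is to inline the font at build time. The sketch below assumes a local WOFF2 file; the path, family name, and helper name are placeholders.

import { readFileSync } from "node:fs";
 
// Read a WOFF2 file and return a @font-face rule with the font inlined as a
// base64 data URI, so no network fetch happens during the render.
function inlineFontFace(family: string, woff2Path: string): string {
  const base64 = readFileSync(woff2Path).toString("base64");
  return `
    @font-face {
      font-family: "${family}";
      src: url("data:font/woff2;base64,${base64}") format("woff2");
      font-weight: 400;
      font-style: normal;
    }
  `;
}
 
// Inject the rule into the document head before calling page.setContent().
const fontCss = inlineFontFace("Inter", "./fonts/Inter-Regular.woff2");
const html = `<!doctype html><html><head><style>${fontCss}</style></head><body>Invoice #1042</body></html>`;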

How do I run a browser pool that does not crash?

One Chromium per worker process. One worker process per CPU core. One page per concurrent render. The browser stays alive forever, the page is created on demand and closed in a finally block.

import { chromium, type Browser } from "playwright";
 
let browser: Browser | null = null;
let renderCount = 0;
const MAX_RENDERS_PER_BROWSER = 5000;
 
async function getBrowser(): Promise<Browser> {
  if (browser && browser.isConnected() && renderCount < MAX_RENDERS_PER_BROWSER) {
    return browser;
  }
  if (browser) {
    await browser.close().catch(() => {});
  }
  browser = await chromium.launch({
    args: ["--no-sandbox", "--disable-dev-shm-usage", "--font-render-hinting=none"],
  });
  renderCount = 0;
 
  browser.on("disconnected", () => {
    browser = null;
  });
 
  return browser;
}
 
export async function renderPdf(html: string): Promise<Buffer> {
  const b = await getBrowser();
  const page = await b.newPage();
  try {
    await page.setContent(html, { waitUntil: "load", timeout: 30_000 });
    const pdf = await page.pdf({ format: "A4", printBackground: true });
    renderCount++;
    return pdf;
  } finally {
    await page.close().catch(() => {});
  }
}

The three things this pattern does right:

  1. Recycles the browser after 5000 renders to bound memory growth.
  2. Listens for disconnect so the next request creates a fresh browser instead of throwing on a dead handle.
  3. Closes the page in finally so a render error does not leak the page object.

Why not multiple browsers? Because Chromium uses ~250 MB of RSS at idle and concurrent pages inside one browser are essentially free. A single browser handles 30+ concurrent pages comfortably. Spawning multiple browsers per worker just multiplies the memory cost.
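
The one-page-per-render rule is easier to hold under a traffic spike with an explicit cap, so excess renders queue instead of opening hundreds of pages. This is not part of the pool code above; it is a sketch of a minimal in-process limiter, with the 30-page ceiling borrowed from the figure in the previous paragraph.

const MAX_CONCURRENT_PAGES = 30; // roughly what one Chromium handles comfortably
let activePages = 0;
const waiting: Array<() => void> = [];
 
// Acquire a slot before opening a page; callers beyond the cap wait in line.
export async function acquirePage(): Promise<void> {
  if (activePages < MAX_CONCURRENT_PAGES) {
    activePages++;
    return;
  }
  // The slot is handed over directly in releasePage(), so no increment here.
  await new Promise<void>((resolve) => waiting.push(resolve));
}
 
// Release the slot in the same finally block as page.close().
export function releasePage(): void {
  const next = waiting.shift();
  if (next) {
    next(); // hand the slot straight to the next waiting render
  } else {
    activePages--;
  }
}
 
// Queue depth for the metrics section below: how many renders are waiting.
export function queueDepth(): number {
  return waiting.length;
}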

On Linux containers, --disable-dev-shm-usage is mandatory. The default /dev/shm is 64 MB and Chromium uses it as scratch space. Without that flag you will get random Page crashed! errors on the second or third concurrent render. Increasing /dev/shm to 1 GB also works (--shm-size=1g in Docker), but the flag is simpler.

How do I avoid cold starts on serverless?

You don't. Cold starts on serverless platforms with bundled Chromium take 1.2-2.5 seconds because Lambda or Vercel must extract a 300 MB layer, mount it, and spawn the binary. Even with @sparticuz/chromium, the warm path is fast but the cold path is brutal.

Three strategies actually work:

  1. Provisioned concurrency (AWS Lambda) keeps N instances warm. Cost scales linearly with N, but cold start drops to zero.
  2. Long-lived workers (ECS, Fly Machines, Railway) run a singleton browser for hours or days. No cold starts after the first request.
  3. Hosted API keeps the warm pool on someone else's machine. PDF4.dev's /api/v1/render endpoint runs against a permanently-warm Chromium pool, so the only latency is the network round-trip plus the actual render.

The right answer depends on traffic shape. Bursty traffic with long idle periods favours option 3. Steady traffic favours option 2. If you need option 1, you are usually two months away from outgrowing serverless anyway.

How do I detect and fix memory leaks?

Memory leaks in PDF workers come from three places: forgotten pages, cached compiled templates that never expire, and Chromium itself leaking 200-500 KB per render even when you do everything right. The first two are bugs. The third is a property of the engine and you have to plan around it.

The detection pattern is straightforward: log RSS every minute and chart it. A healthy worker oscillates between 280 MB and 320 MB. A leaking worker climbs by ~100 KB per render, so 1000 renders later you are at 380 MB and rising.

setInterval(() => {
  const rss = process.memoryUsage().rss / 1024 / 1024;
  console.log(`[mem] rss=${rss.toFixed(0)}MB renders=${renderCount}`);
  if (rss > 1500) {
    console.warn("[mem] RSS over 1.5GB, exiting for restart");
    process.exit(0); // process manager restarts us
  }
}, 60_000);

The process.exit(0) line is the trick. Let your process manager (PM2, systemd, Kubernetes, the Railway runner) restart the worker. Kubernetes does this for you with a memory limit on the pod. PM2 does it with --max-memory-restart 1500M. Either way, the recycle is cheap because incoming requests just route to the other workers in the pool.
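
The checklist at the end of this guide also calls for a graceful shutdown that closes the browser before exit. A minimal sketch, assuming the browser singleton from the pool code above is in scope:

// Close the Chromium singleton before the process manager replaces us,
// so no zombie browser processes linger after the worker exits.
async function shutdown(signal: string): Promise<void> {
  console.log(`[shutdown] received ${signal}, closing browser`);
  if (browser) {
    await browser.close().catch(() => {});
    browser = null;
  }
  process.exit(0);
}
 
process.on("SIGTERM", () => void shutdown("SIGTERM"));
process.on("SIGINT", () => void shutdown("SIGINT"));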

What metrics should I track?

Render-time histograms, queue depth, error rate, and worker memory. Everything else is secondary. The three queries you should be able to answer in 30 seconds:

  1. What is the p99 render duration over the last hour?
  2. How many renders are queued right now?
  3. Which error types are climbing today?

A minimal Prometheus setup:

import { Histogram, Counter, Gauge } from "prom-client";
 
const renderDuration = new Histogram({
  name: "pdf_render_duration_seconds",
  help: "PDF render duration",
  buckets: [0.1, 0.2, 0.3, 0.5, 1, 2, 5, 10, 30],
});
 
const renderErrors = new Counter({
  name: "pdf_render_errors_total",
  help: "PDF render errors",
  labelNames: ["error_type"],
});
 
const browserPages = new Gauge({
  name: "pdf_browser_pages",
  help: "Number of open browser pages",
});
 
export async function renderPdfInstrumented(html: string): Promise<Buffer> {
  const end = renderDuration.startTimer();
  browserPages.inc();
  try {
    return await renderPdf(html);
  } catch (err) {
    renderErrors.inc({ error_type: classify(err) });
    throw err;
  } finally {
    end();
    browserPages.dec();
  }
}
 
function classify(err: unknown): string {
  const msg = err instanceof Error ? err.message : String(err);
  if (msg.includes("timeout")) return "timeout";
  if (msg.includes("Page crashed")) return "page_crash";
  if (msg.includes("Target closed")) return "browser_disconnect";
  return "unknown";
}

Alert thresholds we run in production: p99 above 2 seconds for 5 minutes, error rate above 1 percent for 5 minutes, queue depth above 50 for 1 minute, RSS above 1.4 GB on any single worker.

How do I sanitize errors before logging or returning them?

PDF rendering errors leak filesystem paths, container hostnames, and stack traces. Strip them before they hit logs or response bodies. The PDF4.dev pipeline runs every error through a sanitizeErrorMessage() helper that removes absolute paths, source locations, and stack tails:

export function sanitizeErrorMessage(input: unknown): string {
  let msg = input instanceof Error ? input.message : String(input);
 
  msg = msg
    .replace(/\/(?:app|home|workspaces|root|usr)\/[\w./-]+/g, "[path]")
    .replace(/[A-Z]:\\[\w\\.-]+/g, "[path]")
    .replace(/file:\/\/\/[^\s)]+/g, "[file]")
    .replace(/:\d+:\d+/g, "")
    .replace(/\n\s+at\s.+/g, "");
 
  msg = msg.split("\n").find((line) => line.trim()) ?? "";
  return msg.slice(0, 500);
}

Apply it at three points: writing to the log table, sending to your error tracker, and serializing the error response to the client. Skipping any of those leaks infrastructure detail to a customer or, worse, to another tenant in a multi-tenant deployment.
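
At the response-serialization point, the sanitized message pairs with the typed error shape from the checklist (error.type and error.code). A sketch of what that could look like; the type and code values here are illustrative, not the PDF4.dev wire format.

import { NextResponse } from "next/server";
 
// Map a render failure to a typed, sanitized JSON error response.
export function errorResponse(err: unknown): NextResponse {
  const message = sanitizeErrorMessage(err);
  const type = message.includes("timeout") ? "render_timeout" : "render_error";
  return NextResponse.json(
    { error: { type, code: "RENDER_FAILED", message } },
    { status: 500 },
  );
}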

How much does it cost to run at scale?

Self-hosting cost is dominated by RAM, not CPU. A single worker handles 30+ concurrent renders inside one Chromium and uses 300-400 MB of RSS. At 8 workers per machine you need 4 GB just for browsers, plus headroom for Node and the OS. A typical $40/month VPS with 8 GB of RAM handles roughly 1 million renders per month before saturation.

Volume / month | Self-host        | PDF4.dev API   | Engineering time
10k            | $40 (1 VPS)      | $0 (free tier) | 1-2 days setup, ongoing on-call
100k           | $40 (1 VPS)      | $20-50         | 3-5 days hardening, ongoing on-call
500k           | $80 (2 VPS + LB) | $100-200       | Dedicated browser-pool work
5M             | $400 (cluster)   | $800-1500      | Dedicated SRE work

The engineering-time column is the one that matters. The first 100k renders are cheap. The transition from "it works" to "it holds at p99 under 800 ms with 99.9 percent uptime" costs weeks. The transition from "it holds" to "it scales horizontally with on-call rotation" costs months. Most teams underestimate both.

Production-readiness checklist

A 24-item table you can paste into a runbook. Every item is a real failure mode we have hit or seen.

Area          | Item                                                   | Done?
Browser       | Singleton Chromium per worker                          | ☐
Browser       | --no-sandbox and --disable-dev-shm-usage flags         | ☐
Browser       | Worker recycle at 5000 renders or 1.5 GB RSS           | ☐
Browser       | disconnected event handler that nulls the singleton    | ☐
Browser       | page.close() in finally on every render                | ☐
Rendering     | waitUntil: "load" not networkidle                      | ☐
Rendering     | 30-second hard timeout on setContent and pdf           | ☐
Rendering     | Embedded fonts via base64 @font-face                   | ☐
Rendering     | Print stylesheet authored against @page                | ☐
Caching       | Content-hash render cache (24 h TTL)                   | ☐
Caching       | Compiled template cache (in-memory)                    | ☐
Reliability   | Idempotency-Key header support on POST                 | ☐
Reliability   | Exponential backoff with jitter, max 3 retries         | ☐
Reliability   | Sanitized error messages (no paths or stack tails)     | ☐
Reliability   | Typed error responses with error.type and error.code   | ☐
Observability | Render duration histogram (p50/p95/p99)                | ☐
Observability | Error counter labelled by error type                   | ☐
Observability | RSS gauge per worker, alerting at 1.4 GB               | ☐
Observability | Queue depth gauge                                      | ☐
Operations    | Process manager configured to restart on exit          | ☐
Operations    | Health check endpoint that calls browser.isConnected() | ☐
Operations    | Graceful shutdown closes browser before exit           | ☐
Security      | Render endpoint behind API key auth, never anonymous   | ☐
Security      | Per-tenant rate limits with Retry-After headers        | ☐

If any item is unchecked when you ship, it will eventually wake someone up at 3 a.m.

How does the PDF4.dev architecture handle all of this?

We built PDF4.dev because we wanted to stop solving these problems on every project. The production stack uses a singleton Chromium per Next.js process, a 24-hour disk-backed render cache with HMAC-signed delivery URLs, sanitized errors persisted to SQLite, per-org rate limits, and the same Idempotency-Key pattern shown above. The internals are documented in how we render PDFs in under 300 ms and how we built an MCP server for PDF generation.

The point of the architecture is not that it is novel. It is that all 24 checklist items are already done. You wire your application to one HTTPS endpoint and stop thinking about browser pools.

FAQ

How do I prevent Chromium memory leaks in long-lived workers?

Close the page in a finally block, never the browser. Recycle the worker after a fixed number of renders (5000) or at an RSS threshold (1.5 GB). Run one Chromium per worker.

What is the right waitUntil value?

load. Use domcontentloaded only for fully self-contained HTML. Avoid networkidle because it hangs on slow asset 404s.

How do I implement idempotency for a render endpoint?

Accept Idempotency-Key, store the response keyed by that header for 24 hours, return the cached bytes on retry.

What metrics should I track?

Render duration p50/p95/p99, queue depth, browser-page count, RSS per worker, error rate by type.

One browser per worker or one shared browser?

One browser per worker. Sharing across processes is error-prone and the second browser only costs 250 MB of RSS.

How long should a render take before timeout?

30 seconds hard cap. Most PDFs finish in 200-500 ms.

Self-host or use an API?

The crossover is around 100-200k renders per month, dominated by engineering time, not infra cost.

Try it without the operational burden

The fastest way to validate the patterns in this guide is to run them against a service that already implements them. The free HTML to PDF tool runs the same Playwright pipeline with the same browser pool, the same timeouts, and the same error sanitization. When you are ready to wire it into your application, the render endpoint takes the same HTML.

