What is the fastest HTML to PDF library in 2026?

Playwright with a persistent warm browser pool. In our Q3 2026 fixture, warm-mode latency on the same simple invoice template stayed in the single-digit millisecond range, matching what we measured in Q1 2026. Bare-metal Chromium remains the fastest engine; the choice is whether you operate the pool yourself or use a hosted runner.

Did Playwright get faster between Q1 and Q3 2026?

Marginally. The warm-pool path moved a few percent in our re-run on the same machine and template. The Chromium 142 to 148 perf work helped slightly on `page.pdf()`, but the dominant cost in warm mode is page lifecycle, not the render itself. Cold start was essentially unchanged.

Is chrome-headless-shell ready for production PDF generation?

Yes for clean HTML and CSS templates where you can diff output against full Chromium. The shell binary is smaller, ships official ARM64 Linux builds, and starts faster. It is not a drop-in for templates that lean on subtle print CSS features. Run a regression suite before flipping the default.

How much faster is Gotenberg with HTTP/2 streaming?

The HTTP/2 streaming path added in Gotenberg 8.30 lowers time to first byte when callers and Gotenberg sit in different regions. In our LAN test the gain was small; over a simulated WAN the TTFB dropped by a noticeable margin because the PDF starts flushing before rendering completes server-side. Render time itself is unchanged.

Does WeasyPrint still trail Playwright on cold start?

Yes. WeasyPrint 64 improved table rendering, which helped the complex fixture, but every render still spawns a Python process. There is no warm-pool equivalent. Cold-start Playwright remains faster on the simple doc, and warm Playwright is faster on every doc by a wide margin.

What is the RAM footprint of a warm Playwright pool?

Roughly 150 MB per browser instance plus about 30 MB per active page in our fixture. Under 50 concurrent renders against a 5-browser pool, peak RSS sat near 1.2 GB. The shape is stairstep, not linear, because pages release memory back to the browser only on close.

Should I migrate from Puppeteer to Playwright for PDF speed?

If you operate your own pool and PDF latency is the bottleneck, yes. The warm-path gap remained wide in Q3, with Playwright in the single digits and Puppeteer typically tens of milliseconds. If you are already in production on Puppeteer and your numbers are acceptable, migration is not urgent.

Are PDF4.dev benchmark numbers reproducible?

The engine numbers are. Playwright, Puppeteer, WeasyPrint, Gotenberg, and chrome-headless-shell are open source, and we publish the fixture and methodology. PDF4.dev numbers include network and our pool orchestration, so they are reproducible only against our public API with the same payload size and region.

How often should I re-benchmark my PDF stack?

Every two quarters covers most upstream changes. Chromium ships every four weeks and library maintainers chase. Two quarters is the smallest window where you can expect a measurable Chromium delta without drowning in noise. Annual is too coarse.

Is bare Playwright cheaper than a hosted API at low volume?

At very low volume, yes, since you already pay for the host. The cross-over is operational cost, not compute cost: browser crashes, Docker image bloat, ARM64 builds, and warm-pool restart logic. A hosted API trades a per-render fee for not maintaining any of that.


HTML to PDF benchmark Q3 2026: refreshed numbers, new contenders

Refreshed HTML to PDF benchmark for Q3 2026: Playwright, Puppeteer, chrome-headless-shell, WeasyPrint, Gotenberg, PDF4.dev. Cold start, warm pool, RAM.

benoitded · June 11, 2026 · 11 min read

Six months after the Q1 2026 HTML to PDF benchmark, we re-ran the same fixture with refreshed runners and two new contenders. The headline: Playwright with a warm browser pool still leads on steady-state latency at a median of roughly 12ms on the complex fixture, Chromium got a few percent faster on page.pdf() between versions 142 and 148, Gotenberg 8.30 added HTTP/2 streaming that pays off across regions, and chrome-headless-shell is now the leanest-RAM Chromium option for cold-start sensitive workloads.

If you are choosing a stack today and have not read the parent article, start there. This one is the delta.

Why a Q3 refresh

PDF rendering is a moving target. Chromium ships every four weeks. Playwright and Puppeteer chase. WeasyPrint and Gotenberg ship on their own cadence. None of that is loud. The release notes call out the API surface; the rendering hot path improves quietly at the bottom of the fold.

Re-running the same benchmark every six months catches those quiet improvements. It also catches regressions, which happen too. In Q3 2026 the picture is mostly small wins, plus two structural changes worth a closer look: the maturation of chrome-headless-shell as a smaller-footprint Chromium and the new HTTP/2 streaming path in Gotenberg.

The Q1 article is the methodology document. Treat this article as an addendum, not a replacement.

The unchanged methodology

The fixture, the warm-up procedure, and the hardware are identical to the Q1 article.

Recap. Same 10-page A4 invoice template, same Handlebars data set, same MacBook Pro M-series, same Node v22.22.0. Each runner gets a 1000-iteration warm-up, then a 1000-iteration measurement loop. We report median, p95, and peak RSS during a 50-concurrent-renders burst. The full methodology lives in the parent article.
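The warm-up and measurement loop described above can be sketched roughly as follows. This is a minimal illustration, not the harness from the repo: `percentile`, `summarize`, and `measure` are hypothetical names, and the nearest-rank percentile is an assumption about how median and p95 are derived.

```typescript
// Hypothetical helper names. Nearest-rank percentiles are an assumption,
// not necessarily what the published harness uses.

/** Nearest-rank percentile of a list of samples (p between 0 and 100). */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

function summarize(samples: number[]): { median: number; p95: number } {
  return { median: percentile(samples, 50), p95: percentile(samples, 95) };
}

/** Runs `render` unmeasured `warmup` times, then times `iterations` runs. */
async function measure(
  render: () => Promise<unknown>,
  warmup: number,
  iterations: number,
): Promise<{ median: number; p95: number }> {
  for (let i = 0; i < warmup; i++) await render();
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await render();
    samples.push(performance.now() - start);
  }
  return summarize(samples);
}
```

Keeping the warm-up outside the timed loop is what separates warm numbers from cold ones; the iteration counts are parameters so the same sketch fits whatever counts a run uses.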

The only deliberate change is the runner lineup, which we expanded from four to six. The fixture HTML is byte-identical to Q1 so any delta you see is the runner, not the input.

We re-ran the Q1 lineup on the new versions too. That gives us a clean Q1-versus-Q3 delta per runner without comparing across templates or hardware.

The contenders, Q3 2026 lineup

Runner	Version	Engine	New since Q1?
Playwright	1.58	Bundled Chromium 145	No, version bump only
Puppeteer	Recent line	System Chrome 148	No, version bump only
chrome-headless-shell	145	Standalone headless binary	Yes, added
WeasyPrint	64	Pure Python, no browser	No, version bump only
Gotenberg	8.30	Chromium service over HTTP/2	Yes, HTTP/2 streaming added
PDF4.dev	Current	Managed Playwright pool	No, included for completeness

Two runners are new since Q1. chrome-headless-shell matured into a real option as Chrome 132 removed the old headless mode from the main browser and the standalone shell took over that role. See the chrome-headless-shell deep dive for the migration story. Gotenberg 8.30 shipped HTTP/2 streaming for PDF responses, which is a structural change we want measured directly.

Refreshed numbers, Q3 vs Q1 delta

Numbers below come from the same fixture re-run on the new versions. Treat single-digit percentages as within measurement noise. The point is the shape, not the third decimal.

Runner	Q1 warm median	Q3 warm median	Delta	Q1 peak RSS (50 concurrent)	Q3 peak RSS (50 concurrent)
Playwright (warm pool)	13ms complex	approximately 12ms complex	within noise	~1.3 GB	~1.2 GB
Puppeteer (warm pool)	58ms complex	approximately 55ms complex	small improvement	~1.4 GB	~1.3 GB
chrome-headless-shell	not measured in Q1	approximately 14ms complex	first measurement	not measured in Q1	~0.9 GB
WeasyPrint 64 (cold only)	629ms complex	approximately 540ms complex	meaningful, table path	flat ~250 MB	flat ~240 MB
Gotenberg 8.30	not measured in Q1	approximately 220ms LAN, 180ms WAN TTFB with HTTP/2	first measurement	not measured in Q1	~1.2 GB
PDF4.dev	30ms LAN end to end	30ms LAN end to end	unchanged	n/a (hosted)	n/a (hosted)

A few sentences are unchanged on purpose. Playwright warm-pool latency in our fixture is essentially flat. That is not a finding we are hiding; it is the honest result. Chromium's perf work between 142 and 148 helped page.pdf() itself a touch, but the warm-path cost is dominated by page lifecycle, which did not change.

WeasyPrint moved the most. WeasyPrint 64 release notes call out table rendering improvements, and the complex fixture is table-heavy. Simple-document latency moved very little.
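To make the noise rule concrete, here is a small sketch that computes the Q1-to-Q3 change for the three runners measured in both quarters, using the medians from the table above. The 10% cutoff is our reading of "single-digit percentages are noise", and `percentDelta` and `classify` are hypothetical names.

```typescript
// Illustrative only: the 10% threshold mirrors the "single-digit percentages
// are within noise" rule of thumb, and the function names are hypothetical.

type Verdict = 'within noise' | 'improvement' | 'regression';

/** Signed percentage change from Q1 to Q3 (negative means faster). */
function percentDelta(q1Ms: number, q3Ms: number): number {
  return ((q3Ms - q1Ms) / q1Ms) * 100;
}

function classify(q1Ms: number, q3Ms: number, noisePct = 10): Verdict {
  const delta = percentDelta(q1Ms, q3Ms);
  if (Math.abs(delta) < noisePct) return 'within noise';
  return delta < 0 ? 'improvement' : 'regression';
}

// Medians on the complex fixture, copied from the table above.
const rows = [
  { runner: 'Playwright (warm pool)', q1: 13, q3: 12 },
  { runner: 'Puppeteer (warm pool)', q1: 58, q3: 55 },
  { runner: 'WeasyPrint 64 (cold only)', q1: 629, q3: 540 },
];

for (const { runner, q1, q3 } of rows) {
  const delta = percentDelta(q1, q3).toFixed(1);
  console.log(`${runner}: ${delta}% (${classify(q1, q3)})`);
}
```

Under that strict cutoff only WeasyPrint, at roughly 14% faster, clears the noise band. The Playwright and Puppeteer changes, at roughly 8% and 5%, both sit inside it.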

Steady-state latency: warm pool wins, still

The top of the warm-pool ranking is unchanged: Playwright leads. chrome-headless-shell enters as a close second, Puppeteer follows, then everyone else.

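import { chromium, type Browser } from 'playwright';

// One shared browser stays warm across renders.
let browser: Browser | null = null;
// In-flight launch, shared so concurrent callers do not start extra browsers.
let launching: Promise<Browser> | null = null;

// Launch lazily, and relaunch if the browser crashed or disconnected.
async function getBrowser(): Promise<Browser> {
  if (browser?.isConnected()) return browser;
  if (!launching) {
    launching = chromium
      .launch({ headless: true })
      .then((b) => {
        browser = b;
        return b;
      })
      .finally(() => {
        launching = null;
      });
  }
  return launching;
}

export async function renderPdf(html: string): Promise<Buffer> {
  const b = await getBrowser();
  const page = await b.newPage();
  try {
    await page.setContent(html, { waitUntil: 'load' });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    return Buffer.from(pdf);
  } finally {
    // Always close the page so its memory is released back to the browser.
    await page.close();
  }
}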

The warm-pool pattern is what makes the single-digit numbers possible. Launch a fresh browser for every render and you pay 40-200ms of browser launch per call. The Q1 article covers this in detail; the Q3 numbers do not move that conclusion.

What did shift between Q1 and Q3 is the ceiling at which Playwright stays flat. We pushed the burst test to 100 concurrent renders. Q1 numbers degraded past 60. Q3 numbers held to 80 before the same degradation pattern appeared. We attribute this to Chromium's parallel printing path improvements in the 142-148 window; the Chromium release notes describe related work in the print pipeline.
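A burst sweep of this kind could look roughly like the sketch below. `runBurst` and `sweep` are hypothetical helpers, `render` stands in for whatever produces one PDF, and reporting only the slowest render per burst is a simplification of the median and p95 figures used in this article.

```typescript
// Hypothetical burst helpers; `render` stands in for one PDF render call.

/** Fires `concurrency` renders at once and returns each latency in ms. */
async function runBurst(
  render: () => Promise<unknown>,
  concurrency: number,
): Promise<number[]> {
  const timed = async (): Promise<number> => {
    const start = Date.now();
    await render();
    return Date.now() - start;
  };
  return Promise.all(Array.from({ length: concurrency }, () => timed()));
}

/** Runs bursts at increasing sizes and reports the slowest render of each. */
async function sweep(
  render: () => Promise<unknown>,
  levels: number[],
): Promise<Array<{ concurrency: number; worstMs: number }>> {
  const report: Array<{ concurrency: number; worstMs: number }> = [];
  for (const concurrency of levels) {
    const latencies = await runBurst(render, concurrency);
    report.push({ concurrency, worstMs: Math.max(...latencies) });
  }
  return report;
}
```

Sweeping levels such as 20, 40, 60, 80, and 100 and looking for the first level where latency jumps is one way to locate the ceiling described above.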

Cold start: chrome-headless-shell makes inroads

chrome-headless-shell is the surprise of the Q3 run on cold start.

The shell binary is smaller than full Chromium because it omits the UI, the extension subsystem, and several platform integrations. The chrome-headless-shell repo describes it as purpose-built for headless automation, with fewer system dependencies. In our fixture, cold start was approximately 25-30% faster than full Chromium for the simple document. The complex document delta was smaller, because the render itself dominates once HTML gets non-trivial.

Two caveats before you migrate.

First, Playwright does not expose chrome-headless-shell as a named channel. You have to point executablePath at a manually downloaded binary, and the Playwright docs caution that the bundled Chromium is the version Playwright is tested with. Use Puppeteer with headless: 'shell' if you want the lean path without going off-piste.

Second, rendering parity is not guaranteed. If your templates depend on subtle print CSS, custom fonts, or complex layouts, run a regression diff against full Chromium before switching. The chrome-headless-shell deep dive covers the decision in detail.
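The Puppeteer route from the first caveat can be sketched as below. It assumes a Puppeteer version where `headless: 'shell'` selects chrome-headless-shell (v22 or later, as far as we know). The launcher is injected so the function can be exercised without a browser; `ShellLauncher` and `renderWithShell` are hypothetical names, and the interfaces list only the calls used here.

```typescript
// A sketch, assuming Puppeteer's `headless: 'shell'` launch option.
// The interfaces below are hypothetical and cover only the calls we use.

interface PdfPage {
  setContent(html: string, options?: { waitUntil?: 'load' }): Promise<unknown>;
  pdf(options?: { format?: string; printBackground?: boolean }): Promise<Uint8Array>;
}

interface ShellBrowser {
  newPage(): Promise<PdfPage>;
  close(): Promise<void>;
}

interface ShellLauncher {
  launch(options: { headless: 'shell' }): Promise<ShellBrowser>;
}

/** Cold path: launches the shell, renders one PDF, then shuts it down. */
async function renderWithShell(
  launcher: ShellLauncher,
  html: string,
): Promise<Uint8Array> {
  const browser = await launcher.launch({ headless: 'shell' });
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: 'load' });
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    // Closing the browser also disposes of the page.
    await browser.close();
  }
}
```

With Puppeteer installed, passing its default export as `launcher` should fit this shape, though we have not pinned that against a specific version. Running the same HTML through full Chromium and comparing the output is the regression diff the second caveat calls for.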

RAM under load: Chromium-based runners still hog memory

Peak resident set size during a 50-concurrent-renders burst, same fixture:

Runner	Peak RSS at 50 concurrent	Curve shape
chrome-headless-shell	~0.9 GB	stairstep
Playwright warm pool	~1.2 GB	stairstep
Puppeteer warm pool	~1.3 GB	stairstep
Gotenberg 8.30	~1.2 GB	stairstep
WeasyPrint 64	~240 MB	flat

WeasyPrint is the outlier. Because every render is its own Python process, RAM stays flat across the burst. The trade-off is the cold-start latency every call. For high-volume use cases where you want to bound RAM strictly, WeasyPrint is the easiest answer; for latency-bound workloads, the Chromium-based runners win.

The stairstep shape on Chromium-based runners is what you would expect. Pages release memory back to the browser only on page.close(), so peak RSS climbs in steps as the burst opens new pages and plateaus once the pool reaches its ceiling.
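Since each open page adds a step, capping how many pages are open at once caps the plateau. Below is a minimal sketch of such a cap. `createLimiter` is a hypothetical helper, not part of Playwright, and the limit you pick is workload-specific.

```typescript
// Minimal sketch of a cap on concurrently open pages.
// `createLimiter` is a hypothetical helper, not a Playwright API.

function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Park until a finishing task hands over its slot.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // the slot transfers directly, so `active` is unchanged
      } else {
        active--;
      }
    }
  };
}
```

Wrapping each render, for example `limit(() => renderPdf(html))` with `const limit = createLimiter(10)`, keeps at most ten pages open regardless of burst size. Extra requests wait in the queue instead of adding another step to peak RSS.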

Gotenberg's HTTP/2 streaming change: when it matters

Gotenberg 8.30 added HTTP/2 streaming for PDF responses. The Gotenberg release notes describe this as a TTFB optimization for callers that fetch large PDFs over the network. Render time on the server is unchanged. What changes is how soon the first byte arrives at the client.

In our test fixture, the practical effect splits in two.

LAN scenario: caller in the same network as Gotenberg. The TTFB improvement was small (low double-digit milliseconds). At LAN latency, the render dominates and streaming or buffering looks the same.

WAN scenario: caller in a different region. We simulated cross-region by adding 80ms of round-trip latency between caller and Gotenberg. Here the streaming path helped meaningfully: TTFB dropped from approximately 260ms to 180ms because the first PDF bytes started flushing before rendering completed.

If you run Gotenberg in a single region and call it from other regions, the upgrade is worth it. If you co-locate Gotenberg with the caller, the win is marginal.
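To check which side of that line your own deployment falls on, you can time the first chunk and the last chunk of a streamed response separately. The sketch below uses the first body chunk as a stand-in for TTFB, which is an approximation. `timeStream` is a hypothetical helper that accepts any async iterable of byte chunks.

```typescript
// Sketch only: `timeStream` is a hypothetical helper. First-chunk time is
// used as a proxy for TTFB, which ignores when the headers arrived.

interface StreamTiming {
  ttfbMs: number; // time until the first body chunk arrived
  totalMs: number; // time until the stream ended
  bytes: number; // total bytes received
}

async function timeStream(
  open: () => Promise<AsyncIterable<Uint8Array>>,
): Promise<StreamTiming> {
  const start = Date.now();
  const stream = await open();
  let ttfbMs = -1;
  let bytes = 0;
  for await (const chunk of stream) {
    if (ttfbMs < 0) ttfbMs = Date.now() - start;
    bytes += chunk.length;
  }
  const totalMs = Date.now() - start;
  // An empty stream has no first chunk, so fall back to the total.
  return { ttfbMs: ttfbMs < 0 ? totalMs : ttfbMs, totalMs, bytes };
}
```

In Node.js 18 or later the body of a `fetch()` response is async iterable, so `open` can return it (a type cast may be needed depending on your TypeScript lib settings). Whether the connection actually negotiates HTTP/2 depends on the HTTP client you use, so that part is left out. A large gap between `ttfbMs` and `totalMs` suggests bytes are being flushed before the response completes.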

PDF4.dev numbers (transparent disclosure)

PDF4.dev runs managed Playwright with warm browser pools. Same engine as bare Playwright, same warm-pool pattern, same Chromium build. The difference is operational: someone else manages the pool, the crashes, the Docker image, and the ARM64 builds.

End-to-end p95 in our Q3 measurement:

LAN (same Railway region as the API): approximately 30ms
WAN (cross-region, ~80ms RTT): approximately 80ms

Compare like-for-like. Bare Playwright benchmark numbers (3ms warm, 13ms warm complex) measure the engine only and skip network. PDF4.dev numbers include the network round-trip. If you compare 3ms against 30ms, you are comparing engine-only to end-to-end. Subtract your own network cost from the PDF4.dev number to get an apples-to-apples engine view.

This is the same disclosure as Q1. We make it again because the most common misread of the parent article was treating bare-Playwright numbers as production numbers. They are not. Bare Playwright omits queueing, network, and pool warm-up cost. Your real production p95 will be closer to PDF4.dev's number than to the 3ms figure, regardless of who runs the pool.

What this means for your pipeline

The decision matrix is roughly unchanged from Q1, with two new entries.

Situation	Q3 2026 recommendation
Already running Playwright with a warm pool	Stay. Numbers are essentially flat.
Operating Gotenberg in one region, called from many	Upgrade to 8.30, enable HTTP/2 streaming.
Cold-start sensitive (Lambda, Cloud Run scale-to-zero)	Test chrome-headless-shell. Smaller binary, faster cold start, regression-diff your templates.
Python stack, server-rendered HTML, no JS in templates	WeasyPrint 64. Table rendering is faster. Still cold every render.
Latency-bound and want to skip ops	Hosted API (PDF4.dev or equivalent).
Mixed templates with JS-driven content	Playwright. WeasyPrint does not execute JS at all, and chrome-headless-shell output should be regression-tested against full Chromium first.
Existing Puppeteer codebase, acceptable numbers	Stay. Playwright is still faster, but the gap closed slightly.

The honest takeaway after two quarters: the engine layer is stable. Most of the wins to be had in your PDF pipeline are not engine choice. They are warm-pool management, queue sizing, network locality, and template quality. Re-running this benchmark every six months catches the engine drift; the rest is on you.

Sources and reproducibility

The fixture HTML, the runner scripts, and the raw measurements are in the same repo as the Q1 article. Run them on your hardware; your numbers will not match ours, but the deltas should.

Frequently asked questions

The FAQ above answers the most common questions about the Q3 update. If you want the full library of decisions (Playwright vs Puppeteer, Playwright vs WeasyPrint, chrome-headless-shell), the parent benchmark, the Playwright vs Puppeteer comparison, and the chrome-headless-shell deep dive cover them in depth.

If you want to skip the pool management entirely, PDF4.dev runs the warm Playwright pool for you with the same numbers as the bare engine plus network. Try it free, no credit card required.


Developer Guides

HTML to PDF benchmark 2026 (Playwright vs Puppeteer vs WeasyPrint)

Playwright vs Puppeteer vs WeasyPrint: real HTML-to-PDF latency and file size, Node.js and Python usage, macOS and Linux, plus the production gotchas inside.

Mar 17, 2026 · 14 min read

Comparisons

Playwright vs Puppeteer for PDF generation: a practical comparison (2026)

Playwright vs Puppeteer for PDF generation: API differences, CSS support, performance benchmarks, and when to use a managed PDF API instead.

Mar 20, 2026 · 11 min read

Comparisons

Playwright vs WeasyPrint: PDF generation in Python (2026 comparison)

Playwright vs WeasyPrint for Python PDF generation: real performance numbers, CSS coverage, JavaScript support, and how to pick for Django, Flask, or FastAPI in 2026.

Apr 24, 2026 · 12 min read
