PDF generation in serverless environments: AWS Lambda, Vercel, and Cloudflare

Why Playwright and serverless are a painful match. Cold starts, size limits, Cloudflare Workers, @sparticuz/chromium, and when to call an API instead.

benoitde · 15 min read

Serverless and Chromium are a forced marriage. Lambda was built for stateless functions that init in milliseconds; Chromium is a 300 MB binary that takes seconds to launch and idles at half a gigabyte of RAM. Every PDF generation tutorial that says "just deploy this to Lambda" hides the cost: cold starts that wreck the user experience, deployment artifacts that barely fit, and concurrency caps that turn a small spike into a queue of timeouts.

This guide covers what actually works in each major serverless environment, what does not, and when to admit defeat and call an external API instead.

Why is serverless a bad fit for headless browser PDF rendering?

A serverless function is supposed to be cheap and fast to spawn. A headless Chromium is neither. It takes 500ms to 3 seconds to launch, allocates 200-400 MB of RAM the moment it starts, and is too large to fit inside the default Lambda or Vercel deployment package. Every serverless PDF stack works around these constraints, but none make them disappear.

The mismatch shows up in five places: package size, cold start latency, memory footprint, concurrent execution caps, and an ephemeral filesystem. The next sections cover each one, starting with the constraint that bites first.

What are the deployment size limits for serverless PDF generation?

Stock Playwright ships with a 300-400 MB Chromium that exceeds every major serverless package limit. You either strip the browser to fit, push it as a Lambda layer, or move to a container deployment. There is no fourth option that uses the standard npm install playwright flow.

| Platform | Package limit | Stock Playwright fits? | Workaround |
|---|---|---|---|
| AWS Lambda zip | 250 MB unzipped, 50 MB zipped | No | @sparticuz/chromium (~50 MB) |
| AWS Lambda container | 10 GB | Yes | Container image with Chromium baked in |
| Vercel Functions | 250 MB unzipped, 50 MB zipped | No | @sparticuz/chromium |
| Vercel Edge Functions | 1 MB | No | Not possible, no native binaries |
| Cloudflare Workers | 1 MB (3 MB paid) | No | Browser Rendering API |
| Netlify Functions | 250 MB unzipped, 50 MB zipped | No | @sparticuz/chromium |
| Google Cloud Functions | 100 MB source, 500 MB deployed | Sometimes | @sparticuz/chromium recommended |

The standard answer for Lambda and Vercel is @sparticuz/chromium, a fork of chrome-aws-lambda that ships a pre-stripped Chromium binary just under the size limit. It removes locales, optional codecs, and headers Chromium does not need in headless mode. The result is a binary that boots, but boots slower because the size reduction comes from removing optimization data alongside dead code.

How do you run Playwright on AWS Lambda with @sparticuz/chromium?

Use @sparticuz/chromium to ship the stripped binary, plus playwright-core (not playwright) so you do not pull the full launcher. Wire the chromium executable path into the Playwright launch options. Memory must be set to at least 1024 MB, and the function timeout to at least 30 seconds.

// handler.js
import chromium from '@sparticuz/chromium';
import playwright from 'playwright-core';
 
let browser = null;
 
async function getBrowser() {
  if (browser && browser.isConnected()) return browser;
 
  browser = await playwright.chromium.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  return browser;
}
 
export const handler = async (event) => {
  try {
    // Parse inside the try so a malformed body returns a 500 instead of an unhandled error
    const { html } = JSON.parse(event.body);
    const b = await getBrowser();
    const page = await b.newPage();
 
    try {
      await page.setContent(html, { waitUntil: 'networkidle' });
      await page.evaluate(() => document.fonts.ready);
      const pdf = await page.pdf({ format: 'A4', printBackground: true });
 
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/pdf' },
        body: pdf.toString('base64'),
        isBase64Encoded: true,
      };
    } finally {
      await page.close();
    }
  } catch (err) {
    return { statusCode: 500, body: JSON.stringify({ error: err.message }) };
  }
};

The let browser = null outside the handler is the key trick. Lambda keeps the function instance warm for several minutes between invocations. By stashing the browser in module scope, the second invocation reuses the warm Chromium and skips the 1-2 second launch cost. The first invocation still pays it.

package.json:

{
  "dependencies": {
    "@sparticuz/chromium": "^131.0.0",
    "playwright-core": "^1.49.0"
  }
}

Configure the function with at least 1024 MB of memory and a 30-second timeout. Below 1024 MB, Chromium runs out of headroom and routinely OOMs mid-render; below 30 seconds, a cold start plus the render can hit the timeout on a complex page. Go to 2048 MB if your templates pull external assets or render more than a couple of pages. Lambda allocates CPU proportionally to memory, so doubling the memory also roughly halves render time.
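Because CPU scales with memory, the per-render cost stays roughly flat across memory tiers while latency drops. A back-of-envelope sketch, where the per-GB-second price and the render durations are assumptions to verify against current AWS pricing:

```javascript
// Lambda bills per GB-second. The price below is the published us-east-1 rate
// at the time of writing; treat it as an assumption, not a constant.
const PRICE_PER_GB_SECOND = 0.0000166667;

function costPerRender(memoryMb, durationMs) {
  return (memoryMb / 1024) * (durationMs / 1000) * PRICE_PER_GB_SECOND;
}

// CPU allocation doubles with memory, so render time roughly halves:
const at1024 = costPerRender(1024, 2000); // 2 s render at 1 GB
const at2048 = costPerRender(2048, 1000); // ~1 s render at 2 GB
// Same cost per render; the user just waits half as long.
```

Within this model, bumping the memory tier buys latency, not a bigger bill, which is why 2048 MB is usually the right default for render-heavy functions.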

What does a Lambda cold start with Chromium actually cost?

A cold Lambda invocation that loads @sparticuz/chromium runs through five sequential phases before your render starts. Each phase has a measurable cost, and they cannot be parallelized.

| Phase | Time on first invocation | Notes |
|---|---|---|
| Lambda init (Node runtime) | 100-300 ms | Standard Node cold start |
| import of @sparticuz/chromium | 200-500 ms | Decompresses the binary to /tmp |
| playwright-core import | 50-100 ms | Smaller than full playwright |
| chromium.launch() | 800-1500 ms | Actual browser process spin-up |
| newPage() + first setContent() | 150-300 ms | First page is slower than subsequent ones |
| Total cold start | 1.3-2.7 s | Before any template work |

A user clicking "download invoice" waits roughly 2 seconds for nothing useful, then another 200-400ms for the actual render. On a warm invocation, the import and launch phases drop to zero and total time falls to about 250ms. The painful part is that "warm" is fragile: Lambda evicts idle instances after a few minutes, so a low-traffic endpoint pays the cold cost on most requests.

Can you run PDF generation on Vercel?

Vercel Functions run on AWS Lambda under the hood, so the same @sparticuz/chromium workaround applies. Vercel Edge Functions, on the other hand, run in V8 isolates on Cloudflare's network and cannot run native binaries at all. Edge is a dead end for PDF generation; you need the standard Node.js runtime.

// app/api/render-pdf/route.ts
import chromium from '@sparticuz/chromium';
import playwright from 'playwright-core';
import type { Browser } from 'playwright-core';
import { NextResponse } from 'next/server';
 
export const runtime = 'nodejs';
export const maxDuration = 30;
 
let browser: Browser | null = null;
 
async function getBrowser() {
  if (browser && browser.isConnected()) return browser;
  browser = await playwright.chromium.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  return browser;
}
 
export async function POST(request: Request) {
  const { html } = await request.json();
  const b = await getBrowser();
  const page = await b.newPage();
 
  try {
    await page.setContent(html, { waitUntil: 'networkidle' });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    return new NextResponse(new Uint8Array(pdf), {
      headers: { 'Content-Type': 'application/pdf' },
    });
  } finally {
    await page.close();
  }
}

export const runtime = 'nodejs' is mandatory. Without it, Next.js may default to Edge for route.ts files in some configurations, which silently fails because Edge cannot import @sparticuz/chromium.

Do not put export const runtime = 'edge' on a route that uses Playwright or @sparticuz/chromium. The build will succeed, the deploy will succeed, and the function will fail at runtime with a confusing "module not found" error. Edge runtime cannot execute native binaries.

Vercel's 50 MB zipped function limit is tight. With @sparticuz/chromium, playwright-core, and a small handler, you land at about 45 MB. There is no room for additional native dependencies (sharp, canvas, libxml). If your template processing needs anything else native, switch to a Lambda container deployment.
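A related Next.js pitfall: the bundler can try to inline @sparticuz/chromium and lose the compressed binary assets it ships. Marking the package as external usually avoids that. A sketch against Next.js 15 (on Next 14 the option lives under experimental.serverComponentsExternalPackages):

```javascript
// next.config.mjs -- keep @sparticuz/chromium out of the server bundle so its
// Chromium binary is deployed as-is instead of being mangled by the bundler
const nextConfig = {
  serverExternalPackages: ['@sparticuz/chromium'],
};

export default nextConfig;
```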

What about Cloudflare Workers?

Cloudflare Workers cannot run Playwright, Puppeteer, or any native binary. Workers run in V8 isolates, not Node containers. There is no filesystem to unpack a Chromium binary to, no syscalls to launch a process, no shared memory to communicate with one. The standard answer is Cloudflare's Browser Rendering API, which runs a managed Chromium pool that you call from your Worker.

import puppeteer from '@cloudflare/puppeteer';
 
export default {
  async fetch(request, env) {
    const browser = await puppeteer.launch(env.MYBROWSER);
    const page = await browser.newPage();
 
    await page.setContent('<h1>Hello from Cloudflare</h1>');
    const pdf = await page.pdf({ format: 'A4' });
 
    await browser.close();
 
    return new Response(pdf, {
      headers: { 'Content-Type': 'application/pdf' },
    });
  },
};

wrangler.toml:

name = "pdf-worker"
main = "src/index.js"
compatibility_date = "2025-01-01"
 
browser = { binding = "MYBROWSER" }

This works, but Browser Rendering is a separate paid product with its own pricing (per session and per duration) on top of Workers. The cold start moves from your Worker to the Cloudflare-managed browser pool, which helps, but you are now operating two services instead of one. For most teams, paying for Browser Rendering is harder to justify than paying for a dedicated PDF API.

How do you scale PDF generation past the serverless cold start?

Three strategies, in order of complexity. The right answer depends on whether your renders are user-facing (someone is waiting) or batch (a job that can take minutes).

Strategy 1: warm pings. Schedule a CloudWatch event to invoke the Lambda every 5 minutes with a no-op payload. This keeps at least one warm instance alive at all times, eliminating the cold start for the first user request. The cost is roughly $0.20/month per warmed instance. This works for low-traffic endpoints with one or two concurrent requests at a time. It does not help if traffic spikes past your warm capacity, because new instances spawning to handle the spike are all cold.
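The ping needs a guard in the handler, otherwise the no-op payload hits JSON.parse and launches Chromium for nothing. A sketch, assuming the ping comes from an EventBridge schedule (those events carry source: 'aws.events'):

```javascript
// EventBridge scheduled events arrive with source 'aws.events' and no HTTP
// body. Detect them and return early, before any Chromium work: the invocation
// alone keeps the instance (and its module-scope browser) warm.
function isWarmPing(event) {
  return Boolean(event) && event.source === 'aws.events';
}

// At the top of the handler:
//   if (isWarmPing(event)) return { statusCode: 204 };
```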

Strategy 2: queue-based background jobs. Push render requests onto SQS, process them with a worker Lambda, and store the PDF in S3. The user gets a job ID and polls or receives a webhook when the PDF is ready. This decouples the user from the cold start: the user is not blocked on the render, and the worker can take 5 seconds to start up without anyone noticing. Use this for batch jobs (monthly invoices, end-of-day reports, anything triggered by cron).

// queue-render.js (called from your API)
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';
 
const sqs = new SQSClient({});
 
export async function queueRender(html, userId) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.RENDER_QUEUE_URL,
    MessageBody: JSON.stringify({ html, userId, requestedAt: Date.now() }),
  }));
}
 
// worker-handler.js (Lambda triggered by SQS)
export const handler = async (event) => {
  for (const record of event.Records) {
    const { html, userId } = JSON.parse(record.body);
    const pdf = await renderPdf(html); // your @sparticuz/chromium render
    await uploadToS3(pdf, `${userId}/${Date.now()}.pdf`);
    await notifyUser(userId);
  }
};
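One caveat with the worker loop above: if a single message's render throws, the whole batch returns to the queue and the already-rendered PDFs get rendered again. SQS partial batch responses fix this. The sketch below takes the per-record step as a parameter (processRecord stands in for the render-and-upload logic) and assumes ReportBatchItemFailures is enabled on the event source mapping:

```javascript
// Report failed messages individually so SQS retries only those, not the
// whole batch. Requires ReportBatchItemFailures on the event source mapping.
async function handleBatch(event, processRecord) {
  const batchItemFailures = [];
  for (const record of event.Records) {
    try {
      await processRecord(JSON.parse(record.body));
    } catch (err) {
      // Only this message goes back on the queue for retry.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```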

Strategy 3: external PDF API with a warm browser pool. Offload the render entirely to a service that keeps Chromium warm permanently. The PDF4.dev architecture is a worked example: a long-running Node server with a singleton Chromium instance, pages closed after each render but the browser kept alive for the next request. Cold starts do not exist because the browser never starts cold. The render server reads from a persistent volume (data/) and serves dozens of concurrent requests against the same warm browser. Average render time is ~300ms because there is no spin-up cost on any request.

| Strategy | Cold start cost | Latency on warm path | Best for |
|---|---|---|---|
| Warm pings | Eliminated for 1 instance | 200-400 ms | Low-traffic, single-region |
| Queue + background worker | Hidden from user | Job time depends on queue | Batch jobs, async UX |
| External PDF API | None (always warm) | ~300 ms | Interactive, user-facing |

How does a persistent server compare to serverless for PDF rendering?

A persistent server with a warm browser pool wins on every metric except minimum monthly cost. Serverless wins only when the workload is so low that per-millisecond billing beats a fixed server price. The crossover sits around 1,000 renders per day on Lambda: below that, serverless is cheaper; above it, a small VPS with a warm browser is both cheaper and faster.

| Metric | Serverless (Lambda + Sparticuz) | Persistent server (warm pool) |
|---|---|---|
| Cold start latency | 1.3-2.7 s | 0 |
| Warm render latency | 200-400 ms | 200-300 ms |
| Memory cost | 1024-2048 MB allocated even when idle | One process, shared across requests |
| Concurrency model | One browser per Lambda instance | One browser, many pages |
| Failure isolation | Per-invocation | Crash recovery on the singleton |
| Cost at 100 renders/day | Lower (free tier) | Higher (always-on VPS) |
| Cost at 10,000 renders/day | Higher (per-ms billing) | Lower (fixed) |
| Operational complexity | Lambda + layer + warm pings | One process, a watchdog, restart logic |

The persistent-server pattern is what every dedicated PDF API uses internally. PDF4.dev runs a singleton Chromium per render server, recycles pages after each render, and keeps the browser alive across thousands of requests. The cold start cost happens once per deploy, not once per user.
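The relaunch logic behind a warm singleton is small enough to sketch. To keep it testable, this version takes the launch function as a parameter (in a real server it would be playwright.chromium.launch); the names are illustrative, not PDF4.dev's actual code:

```javascript
// Warm-singleton sketch: launch once, reuse until the browser disconnects
// (crash or OOM), then relaunch lazily on the next request.
function createBrowserPool(launchFn) {
  let browser = null;
  let launches = 0;

  return {
    async get() {
      if (browser && browser.isConnected()) return browser;
      launches += 1; // launch cost paid once per deploy or per crash
      browser = await launchFn();
      return browser;
    },
    launchCount: () => launches, // exposed so a watchdog can track crash frequency
  };
}
```

Pages are still opened and closed per request; only the browser process is shared, which is what keeps the warm path at a few hundred milliseconds.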

When should you call an external PDF API instead?

When the cost of operating Chromium yourself exceeds the cost of paying someone else to do it. Three concrete signals: your cold start latency hurts user experience, your Lambda bill is climbing because of memory tier creep, or you spent a Friday afternoon debugging a Chromium crash on Lambda that you could not reproduce locally. All three are common, and all three argue for offloading.

// Replace your @sparticuz/chromium handler with a one-line API call
import { NextResponse } from 'next/server';
 
export async function POST(request: Request) {
  const { html } = await request.json();
 
  const response = await fetch('https://pdf4.dev/api/v1/render', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ html, format: { preset: 'a4' } }),
  });
 
  if (!response.ok) {
    return NextResponse.json({ error: 'render failed' }, { status: 500 });
  }
 
  const pdf = Buffer.from(await response.arrayBuffer());
  return new NextResponse(new Uint8Array(pdf), {
    headers: { 'Content-Type': 'application/pdf' },
  });
}

Your function deployment drops from 45 MB back to under 1 MB. Cold start drops from ~2 seconds to ~150ms. The render itself takes ~300ms instead of 1.5-3 seconds on a cold path. The cost is the API call price, which on most workloads is lower than the Lambda memory premium you were paying to keep Chromium fed.

For PDFs larger than 1 MB, use delivery: 'url' to get back a signed URL instead of a base64 payload. This avoids inflating the response body through Lambda, which has its own 6 MB sync invocation limit.

const response = await fetch('https://pdf4.dev/api/v1/render', {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.PDF4_API_KEY}` },
  body: JSON.stringify({
    template_id: 'invoice',
    data: { /* ... */ },
    delivery: 'url',
  }),
});
 
const { url, expires_at } = await response.json();
// Return the URL to the user. They download from PDF4.dev directly.
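The arithmetic behind that limit: base64 encodes every 3 raw bytes as 4 characters, so a payload grows by roughly a third. A quick sketch of the check:

```javascript
// Lambda's synchronous invocation response cap is 6 MB, and it applies to the
// base64-encoded body, not the raw PDF size.
const LAMBDA_SYNC_LIMIT = 6 * 1024 * 1024;

function base64Size(rawBytes) {
  return Math.ceil(rawBytes / 3) * 4; // 3 bytes -> 4 chars, padded
}

function fitsInSyncResponse(pdfBytes) {
  return base64Size(pdfBytes) <= LAMBDA_SYNC_LIMIT;
}

// A 4 MB PDF encodes to ~5.3 MB and fits; a 5 MB PDF encodes to ~6.7 MB and does not.
```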

FAQ

Can you run Playwright in AWS Lambda?

Yes, but only with @sparticuz/chromium, a pre-stripped Chromium binary that fits inside Lambda's 250 MB unzipped layer limit. Stock Playwright ships a 300+ MB Chromium that does not fit. Cold starts run 800ms to 3 seconds depending on Lambda memory configuration.

Can you generate PDFs on Cloudflare Workers?

Not with Playwright or Puppeteer directly. Workers cannot run native binaries. Use Cloudflare's Browser Rendering API instead, which runs a managed Chromium pool you call over HTTP from your Worker.

Why are serverless cold starts a problem for PDF generation?

Chromium itself takes 500ms to 3 seconds to spin up cold. On every Lambda cold start, you pay that cost on top of your function init. A user clicking "download PDF" can wait 4-5 seconds for a single render before any template work starts.

How big is a serverless Playwright deployment?

Stock Playwright with Chromium is 300-400 MB unzipped, which exceeds Lambda's 250 MB layer limit and Vercel's 50 MB function limit. The @sparticuz/chromium package strips this to about 50 MB compressed.

Is it cheaper to run PDF generation on Lambda or on a dedicated server?

Dedicated, for any consistent workload. Lambda charges per millisecond including the Chromium startup time, and Chromium idles at 300+ MB of memory which forces you into the higher Lambda memory tiers. A small always-on server with a warm browser pool is cheaper above ~1000 renders per day.

Should I use a queue for PDF generation in serverless?

For batch jobs, yes. SQS plus a background Lambda decouples the user request from the render time. For interactive renders where the user is waiting, an external API with a warm browser pool gives a better user experience because there is no cold start at all.

What is the cheapest way to run PDF generation in production?

A persistent server with a warm singleton Chromium plus a page pool, like the architecture PDF4.dev runs internally. Above a few hundred renders per day, this beats Lambda on both cost and latency.

Skip the cold-start tax

If you do not want to debug @sparticuz/chromium deployment errors, OOMs at 512 MB, and 2-second cold starts on every low-traffic Lambda invocation, try the free HTML to PDF tool. For automated workloads, the PDF4.dev API runs a permanently warm browser pool, so a render from your Lambda or Vercel function completes in ~300ms with no cold start, no Chromium binary in your bundle, and no memory tier creep on your serverless bill.

Start generating PDFs

Build PDF templates with a visual editor. Render them via API from any language in ~300ms.