Serverless and Chromium are a forced marriage. Lambda was built for stateless functions that init in milliseconds; Chromium is a 300 MB binary that takes seconds to launch and idles at half a gigabyte of RAM. Every PDF generation tutorial that says "just deploy this to Lambda" hides the cost: cold starts that wreck the user experience, deployment artifacts that barely fit, and concurrency caps that turn a small spike into a queue of timeouts.
This guide covers what actually works in each major serverless environment, what does not, and when to admit defeat and call an external API instead.
Why is serverless a bad fit for headless browser PDF rendering?
A serverless function is supposed to be cheap and fast to spawn. A headless Chromium is neither. It takes 500ms to 3 seconds to launch, allocates 200-400 MB of RAM the moment it starts, and is too large to fit inside the default Lambda or Vercel deployment package. Every serverless PDF stack works around these constraints, but none make them disappear.
The mismatch shows up in five places: package size, cold start latency, memory footprint, concurrent execution caps, and the ephemeral filesystem. The next sections cover each one, starting with the constraint that bites first.
What are the deployment size limits for serverless PDF generation?
Stock Playwright ships with a 300-400 MB Chromium that exceeds every major serverless package limit. You either strip the browser to fit, push it as a Lambda layer, or move to a container deployment. There is no fourth option that uses the standard npm install playwright flow.
| Platform | Package limit | Stock Playwright fits? | Workaround |
|---|---|---|---|
| AWS Lambda zip | 250 MB unzipped, 50 MB zipped | No | @sparticuz/chromium (~50 MB) |
| AWS Lambda container | 10 GB | Yes | Container image with Chromium baked in |
| Vercel Functions | 250 MB unzipped, 50 MB zipped | No | @sparticuz/chromium |
| Vercel Edge Functions | 1 MB | No | Not possible, no native binaries |
| Cloudflare Workers | 1 MB (3 MB paid) | No | Browser Rendering API |
| Netlify Functions | 250 MB unzipped, 50 MB zipped | No | @sparticuz/chromium |
| Google Cloud Functions | 100 MB source, 500 MB deployed | Sometimes | @sparticuz/chromium recommended |
The standard answer for Lambda and Vercel is @sparticuz/chromium, a fork of chrome-aws-lambda that ships a pre-stripped Chromium binary just under the size limit. It removes locales, optional codecs, and headers Chromium does not need in headless mode. The result is a binary that boots, but boots slower because the size reduction comes from removing optimization data alongside dead code.
How do you run Playwright on AWS Lambda with @sparticuz/chromium?
Use @sparticuz/chromium to ship the stripped binary, plus playwright-core (not playwright) so you do not pull the full launcher. Wire the chromium executable path into the Playwright launch options. Memory must be set to at least 1024 MB, and the function timeout to at least 30 seconds.
```javascript
// handler.js
import chromium from '@sparticuz/chromium';
import playwright from 'playwright-core';

// Module scope: survives between invocations while the instance stays warm.
let browser = null;

async function getBrowser() {
  if (browser && browser.isConnected()) return browser;
  browser = await playwright.chromium.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  return browser;
}

export const handler = async (event) => {
  try {
    const { html } = JSON.parse(event.body);
    const b = await getBrowser();
    const page = await b.newPage();
    try {
      await page.setContent(html, { waitUntil: 'networkidle' });
      await page.evaluate(() => document.fonts.ready); // wait for web fonts before printing
      const pdf = await page.pdf({ format: 'A4', printBackground: true });
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/pdf' },
        body: pdf.toString('base64'),
        isBase64Encoded: true,
      };
    } finally {
      await page.close(); // close the page, keep the browser for the next invocation
    }
  } catch (err) {
    return { statusCode: 500, body: JSON.stringify({ error: err.message }) };
  }
};
```

The `let browser = null` outside the handler is the key trick. Lambda keeps the function instance warm for several minutes between invocations. By stashing the browser in module scope, the second invocation reuses the warm Chromium and skips the 1-2 second launch cost. The first invocation still pays it.
package.json:
```json
{
  "dependencies": {
    "@sparticuz/chromium": "^131.0.0",
    "playwright-core": "^1.49.0"
  }
}
```

Configure Lambda with at least 1024 MB memory and 30 second timeout. Below 1024 MB, Chromium runs out of headroom and the page crashes during render. Below 30 seconds, the cold start plus the actual render can hit the timeout on a complex page.
Lambda functions running Chromium routinely OOM at 512 MB. Set memory to 1024 MB minimum, and 2048 MB if your templates pull external assets or render more than a couple of pages. Lambda allocates CPU proportionally to memory, so a larger memory setting also shortens render time for CPU-bound work.
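If you manage the function with the AWS CDK, that sizing is two properties. A minimal sketch, assuming CDK v2; the stack name, handler path, and asset path are placeholders for your setup:

```javascript
// cdk-app.js — sizing a render Lambda for Chromium (sketch, CDK v2).
import { App, Stack, Duration } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const app = new App();
const stack = new Stack(app, 'PdfRenderStack'); // placeholder name

new lambda.Function(stack, 'RenderPdf', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'handler.handler',          // the handler.js above
  code: lambda.Code.fromAsset('dist'), // placeholder bundle path
  memorySize: 2048,                    // 1024 MB is the floor; 2048 MB for asset-heavy templates
  timeout: Duration.seconds(30),       // cold start + render can blow a shorter limit
});

app.synth();
```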
What does a Lambda cold start with Chromium actually cost?
A cold Lambda invocation that loads @sparticuz/chromium runs through five sequential phases before your render starts. Each phase has a measurable cost, and they cannot be parallelized.
| Phase | Time on first invocation | Notes |
|---|---|---|
| Lambda init (Node runtime) | 100-300ms | Standard Node cold start |
| import of @sparticuz/chromium | 200-500ms | Decompresses the binary to /tmp |
| playwright-core import | 50-100ms | Smaller than full playwright |
| chromium.launch() | 800-1500ms | Actual browser process spin-up |
| newPage() + first setContent() | 150-300ms | First page is slower than subsequent |
| Total cold start | 1.3-2.7 seconds | Before any template work |
A user clicking "download invoice" waits roughly 2 seconds for nothing useful, then another 200-400ms for the actual render. On a warm invocation, the import and launch phases drop to zero and total time falls to about 250ms. The painful part is that "warm" is fragile: Lambda evicts idle instances after a few minutes, so a low-traffic endpoint pays the cold cost on most requests.
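These numbers vary by region and memory tier, so it is worth measuring your own. A rough probe, assuming the same @sparticuz/chromium setup as the handler above; dynamic import() is used so module load cost is captured inside the handler:

```javascript
// cold-start-probe.js — log per-phase timings on a cold invocation.
export const handler = async () => {
  let t = Date.now();
  const { default: chromium } = await import('@sparticuz/chromium');
  console.log('import @sparticuz/chromium:', Date.now() - t, 'ms');

  t = Date.now();
  const { chromium: pw } = await import('playwright-core');
  console.log('import playwright-core:', Date.now() - t, 'ms');

  t = Date.now();
  const browser = await pw.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
  });
  console.log('chromium.launch():', Date.now() - t, 'ms');

  t = Date.now();
  const page = await browser.newPage();
  await page.setContent('<p>probe</p>');
  console.log('newPage() + setContent():', Date.now() - t, 'ms');

  await browser.close();
  return { statusCode: 204 };
};
```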
Can you run PDF generation on Vercel?
Vercel Functions run on AWS Lambda under the hood, so the same @sparticuz/chromium workaround applies. Vercel Edge Functions, on the other hand, run on Cloudflare Workers, which cannot run native binaries at all. Edge is a dead end for PDF generation; you need the standard (Node.js) runtime.
```typescript
// app/api/render-pdf/route.ts
import chromium from '@sparticuz/chromium';
import playwright from 'playwright-core';
import type { Browser } from 'playwright-core';
import { NextResponse } from 'next/server';

export const runtime = 'nodejs';
export const maxDuration = 30;

// Module scope: same warm-reuse trick as on raw Lambda.
let browser: Browser | null = null;

async function getBrowser() {
  if (browser && browser.isConnected()) return browser;
  browser = await playwright.chromium.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  return browser;
}

export async function POST(request: Request) {
  const { html } = await request.json();
  const b = await getBrowser();
  const page = await b.newPage();
  try {
    await page.setContent(html, { waitUntil: 'networkidle' });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    return new NextResponse(new Uint8Array(pdf), {
      headers: { 'Content-Type': 'application/pdf' },
    });
  } finally {
    await page.close();
  }
}
```

`export const runtime = 'nodejs'` is mandatory. Without it, Next.js may default to Edge for route.ts files in some configurations, which silently fails because Edge cannot import @sparticuz/chromium.
Do not put export const runtime = 'edge' on a route that uses Playwright or @sparticuz/chromium. The build will succeed, the deploy will succeed, and the function will fail at runtime with a confusing "module not found" error. Edge runtime cannot execute native binaries.
Vercel's 50 MB zipped function limit is tight. With @sparticuz/chromium, playwright-core, and a small handler, you land at about 45 MB. There is no room for additional native dependencies (sharp, canvas, libxml). If your template processing needs anything else native, switch to a Lambda container deployment.
What about Cloudflare Workers?
Cloudflare Workers cannot run Playwright, Puppeteer, or any native binary. Workers run in V8 isolates, not Node containers. There is no filesystem to unpack a Chromium binary to, no syscalls to launch a process, no shared memory to communicate with one. The standard answer is Cloudflare's Browser Rendering API, which runs a managed Chromium pool that you call from your Worker.
```javascript
// src/index.js — Worker calling the managed Browser Rendering pool
import puppeteer from '@cloudflare/puppeteer';

export default {
  async fetch(request, env) {
    // env.MYBROWSER is the browser binding declared in wrangler.toml
    const browser = await puppeteer.launch(env.MYBROWSER);
    const page = await browser.newPage();
    await page.setContent('<h1>Hello from Cloudflare</h1>');
    const pdf = await page.pdf({ format: 'A4' });
    await browser.close();
    return new Response(pdf, {
      headers: { 'Content-Type': 'application/pdf' },
    });
  },
};
```

wrangler.toml:
name = "pdf-worker"
main = "src/index.js"
compatibility_date = "2025-01-01"
browser = { binding = "MYBROWSER" }This works, but Browser Rendering is a separate paid product with its own pricing (per session and per duration) on top of Workers. The cold start moves from your Worker to the Cloudflare-managed browser pool, which helps, but you are now operating two services instead of one. For most teams, paying for Browser Rendering is harder to justify than paying for a dedicated PDF API.
How do you scale PDF generation past the serverless cold start?
Three strategies, in order of complexity. The right answer depends on whether your renders are user-facing (someone is waiting) or batch (a job that can take minutes).
Strategy 1: warm pings. Schedule an EventBridge rule (formerly CloudWatch Events) to invoke the Lambda every 5 minutes with a no-op payload. This keeps at least one warm instance alive at all times, eliminating the cold start for the first user request. The cost is roughly $0.20/month per warmed instance. This works for low-traffic endpoints with one or two concurrent requests at a time. It does not help if traffic spikes past your warm capacity, because new instances spawning to handle the spike are all cold.
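The wiring is the scheduled rule plus a guard at the top of the handler. A sketch that reuses getBrowser() from the Lambda example above; the warmup field name is an assumption, match whatever payload your rule sends:

```javascript
// Short-circuit scheduled warm pings before doing any render work.
// The EventBridge rule invokes the function every 5 minutes with
// { "warmup": true } (field name is an assumption, not a convention).
export const handler = async (event) => {
  if (event.warmup) {
    await getBrowser(); // pre-launch Chromium so the next real request skips it too
    return { statusCode: 204 };
  }
  // ... the normal render path from the earlier handler ...
};
```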
Strategy 2: queue-based background jobs. Push render requests onto SQS, process them with a worker Lambda, and store the PDF in S3. The user gets a job ID and polls or receives a webhook when the PDF is ready. This decouples the user from the cold start: the user is not blocked on the render, and the worker can take 5 seconds to start up without anyone noticing. Use this for batch jobs (monthly invoices, end-of-day reports, anything triggered by cron).
```javascript
// queue-render.js (called from your API)
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

export async function queueRender(html, userId) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.RENDER_QUEUE_URL,
    MessageBody: JSON.stringify({ html, userId, requestedAt: Date.now() }),
  }));
}
```

```javascript
// worker-handler.js (Lambda triggered by SQS)
export const handler = async (event) => {
  for (const record of event.Records) {
    const { html, userId } = JSON.parse(record.body);
    const pdf = await renderPdf(html); // your @sparticuz/chromium render
    await uploadToS3(pdf, `${userId}/${Date.now()}.pdf`);
    await notifyUser(userId);
  }
};
```

Strategy 3: external PDF API with a warm browser pool. Offload the render entirely to a service that keeps Chromium warm permanently. The PDF4.dev architecture is a worked example: a long-running Node server with a singleton Chromium instance, pages closed after each render but the browser kept alive for the next request. Cold starts do not exist because the browser never starts cold. The render server reads from a persistent volume (data/) and serves dozens of concurrent requests against the same warm browser. Average render time is ~300ms because there is no spin-up cost on any request.
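For contrast, the warm-pool pattern itself is small. A minimal sketch of a persistent render server, assuming full Playwright on an always-on box; this illustrates the pattern, not PDF4.dev's actual code:

```javascript
// server.js — a long-running render server with one warm Chromium.
import http from 'node:http';
import { chromium } from 'playwright'; // full playwright: no size limit on a VPS

let browser = null;

async function getBrowser() {
  if (!browser || !browser.isConnected()) {
    browser = await chromium.launch(); // paid once per deploy, not per request
  }
  return browser;
}

http.createServer(async (req, res) => {
  let body = '';
  for await (const chunk of req) body += chunk;

  try {
    const { html } = JSON.parse(body);
    const page = await (await getBrowser()).newPage(); // pages are cheap; the browser is not
    try {
      await page.setContent(html, { waitUntil: 'networkidle' });
      const pdf = await page.pdf({ format: 'A4', printBackground: true });
      res.writeHead(200, { 'Content-Type': 'application/pdf' });
      res.end(pdf);
    } finally {
      await page.close(); // recycle the page, keep the browser warm
    }
  } catch (err) {
    res.writeHead(500, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: err.message }));
  }
}).listen(3000);
```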
| Strategy | Cold start cost | Latency on warm path | Best for |
|---|---|---|---|
| Warm pings | Eliminated for 1 instance | 200-400ms | Low-traffic, single-region |
| Queue + background worker | Hidden from user | Job time depends on queue | Batch jobs, async UX |
| External PDF API | None (always warm) | ~300ms | Interactive, user-facing |
How does a persistent server compare to serverless for PDF rendering?
A persistent server with a warm browser pool wins on every metric except minimum monthly cost. Serverless wins only when the workload is so low that the per-millisecond billing beats a fixed server price. The crossover point sits around 1000 renders per day on Lambda; below that, serverless is cheaper, and above it, a small VPS plus a warm browser is both cheaper and faster.
| Metric | Serverless (Lambda + Sparticuz) | Persistent server (warm pool) |
|---|---|---|
| Cold start latency | 1.3-2.7s | 0 |
| Warm render latency | 200-400ms | 200-300ms |
| Memory cost | 1024-2048 MB per invocation, whether the render needs it or not | One process, shared across requests |
| Concurrency model | One browser per Lambda instance | One browser, many pages |
| Failure isolation | Per-invocation | Crash recovery on the singleton |
| Cost at 100 renders/day | Lower (free tier) | Higher (always-on VPS) |
| Cost at 10,000 renders/day | Higher (per-ms billing) | Lower (fixed) |
| Operational complexity | Lambda + layer + warm pings | One process, a watchdog, restart logic |
The persistent-server pattern is what every dedicated PDF API uses internally. PDF4.dev runs a singleton Chromium per render server, recycles pages after each render, and keeps the browser alive across thousands of requests. The cold start cost happens once per deploy, not once per user.
When should you call an external PDF API instead?
When the cost of operating Chromium yourself is bigger than the cost of paying someone else to do it. Three concrete signals: your cold start latency hurts user experience, your Lambda bill is climbing because of memory tier creep, or you spent a Friday afternoon debugging a Chromium crash on Lambda that you could not reproduce locally. All three are common, all three argue for offloading.
```typescript
// Replace your @sparticuz/chromium handler with a one-line API call
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { html } = await request.json();
  const response = await fetch('https://pdf4.dev/api/v1/render', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ html, format: { preset: 'a4' } }),
  });
  if (!response.ok) {
    return NextResponse.json({ error: 'render failed' }, { status: 500 });
  }
  const pdf = Buffer.from(await response.arrayBuffer());
  return new NextResponse(new Uint8Array(pdf), {
    headers: { 'Content-Type': 'application/pdf' },
  });
}
```

Your function deployment drops from 45 MB back to under 1 MB. Cold start drops from ~2 seconds to ~150ms. The render itself takes ~300ms instead of 1.5-3 seconds on a cold path. The cost is the API call price, which on most workloads is lower than the Lambda memory premium you were paying to keep Chromium fed.
For PDFs larger than 1 MB, use delivery: 'url' to get back a signed URL instead of a base64 payload. This avoids inflating the response body through Lambda, which has its own 6 MB sync invocation limit.
```javascript
const response = await fetch('https://pdf4.dev/api/v1/render', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    template_id: 'invoice',
    data: { /* ... */ },
    delivery: 'url',
  }),
});
const { url, expires_at } = await response.json();
// Return the URL to the user. They download from PDF4.dev directly.
```

FAQ
Can you run Playwright in AWS Lambda?
Yes, but only with @sparticuz/chromium, a pre-stripped Chromium binary that fits inside Lambda's 250 MB unzipped layer limit. Stock Playwright ships a 300+ MB Chromium that does not fit. Cold starts run 800ms to 3 seconds depending on Lambda memory configuration.
Can you generate PDFs on Cloudflare Workers?
Not with Playwright or Puppeteer directly. Workers cannot run native binaries. Use Cloudflare's Browser Rendering API instead, which runs a managed Chromium pool you call over HTTP from your Worker.
Why are serverless cold starts a problem for PDF generation?
Chromium itself takes 500ms to 3 seconds to spin up cold. On every Lambda cold start, you pay that cost on top of your function init. A user clicking "download PDF" can wait 4-5 seconds for a single render before any template work starts.
How big is a serverless Playwright deployment?
Stock Playwright with Chromium is 300-400 MB unzipped, which exceeds Lambda's 250 MB layer limit and Vercel's 50 MB function limit. The @sparticuz/chromium package strips this to about 50 MB compressed.
Is it cheaper to run PDF generation on Lambda or on a dedicated server?
Dedicated, for any consistent workload. Lambda charges per millisecond including the Chromium startup time, and Chromium idles at 300+ MB of memory which forces you into the higher Lambda memory tiers. A small always-on server with a warm browser pool is cheaper above ~1000 renders per day.
Should I use a queue for PDF generation in serverless?
For batch jobs, yes. SQS plus a background Lambda decouples the user request from the render time. For interactive renders where the user is waiting, an external API with a warm browser pool gives a better user experience because there is no cold start at all.
What is the cheapest way to run PDF generation in production?
A persistent server with a warm singleton Chromium plus a page pool, like the architecture PDF4.dev runs internally. Above a few hundred renders per day, this beats Lambda on both cost and latency.
Skip the cold-start tax
If you do not want to debug @sparticuz/chromium deployment errors, OOMs at 512 MB, and 2-second cold starts on every low-traffic Lambda invocation, try the free HTML to PDF tool. For automated workloads, the PDF4.dev API runs a permanently warm browser pool, so a render from your Lambda or Vercel function completes in ~300ms with no cold start, no Chromium binary in your bundle, and no memory tier creep on your serverless bill.