Generating PDFs from HTML in Python is a solved problem, but the solution you pick at prototype time often becomes a production bottleneck. This guide covers three approaches: WeasyPrint (pure Python), Playwright (headless Chromium), and a REST API. You will see working code for each, plus a clear breakdown of when to switch.
Approach comparison
Before diving into code, here is a direct comparison of the three main options for Python PDF generation.
| Approach | CSS accuracy | Install complexity | Async support | Docker size | Best for |
|---|---|---|---|---|---|
| WeasyPrint | Good (CSS 2.1, partial CSS3) | Medium (needs Pango/Cairo) | No (blocks event loop) | ~150MB extra | Most Python projects |
| Playwright | Excellent (full Chromium) | High (Chromium binary ~300MB) | Yes (async API) | ~300MB extra | Pixel-perfect, complex CSS |
| PDF REST API | Excellent (hosted Chromium) | None (HTTP client only) | Yes | 0 | Production, serverless, teams |
| pdfkit/wkhtmltopdf | Poor (deprecated engine) | High (C binary) | No | ~200MB extra | Legacy projects only |
| ReportLab | n/a (not HTML-based) | Low | No | ~15MB | Pure programmatic docs |
Note: pdfkit wraps wkhtmltopdf, whose repository was archived in 2023 and is no longer maintained. Avoid it for new projects.
Option 1: WeasyPrint
WeasyPrint is a Python library that converts HTML and CSS to PDF. It uses the Pango text layout engine; releases before version 53 also depended on the Cairo graphics library. It supports CSS 2.1 plus most commonly used CSS3 properties.
Installation
pip install weasyprint
macOS (Homebrew):
brew install pango
Debian/Ubuntu:
apt-get install libpango-1.0-0 libcairo2 libgdk-pixbuf2.0-0 libffi-dev
Basic conversion
from weasyprint import HTML
html_string = """
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; margin: 20mm; }
h1 { color: #111; font-size: 24px; }
.total { font-size: 18px; font-weight: bold; }
</style>
</head>
<body>
<h1>Invoice #001</h1>
<p>Client: Acme Corp</p>
<p class="total">Total: $1,500.00</p>
</body>
</html>
"""
pdf_bytes = HTML(string=html_string).write_pdf()
# Save to file
with open("invoice.pdf", "wb") as f:
    f.write(pdf_bytes)
Dynamic templates with Jinja2
In practice, you almost always need dynamic data. Jinja2 is the standard Python templating engine for this.
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
# Load template from file
env = Environment(loader=FileSystemLoader("templates/"))
template = env.get_template("invoice.html")
# Render with data
data = {
    "invoice_number": "INV-0042",
    "client_name": "Acme Corp",
    "items": [
        {"description": "Consulting", "qty": 10, "unit_price": 150.00},
        {"description": "Setup fee", "qty": 1, "unit_price": 200.00},
    ],
    "total": 1700.00,
}
html_string = template.render(**data)
pdf_bytes = HTML(string=html_string).write_pdf()
templates/invoice.html:
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; margin: 20mm; color: #333; }
table { width: 100%; border-collapse: collapse; }
th, td { padding: 8px 12px; border-bottom: 1px solid #eee; text-align: left; }
.total-row { font-weight: bold; font-size: 16px; }
</style>
</head>
<body>
<h1>Invoice {{ invoice_number }}</h1>
<p>Client: {{ client_name }}</p>
<table>
<thead>
<tr><th>Description</th><th>Qty</th><th>Unit Price</th><th>Subtotal</th></tr>
</thead>
<tbody>
{% for item in items %}
<tr>
<td>{{ item.description }}</td>
<td>{{ item.qty }}</td>
<td>${{ "%.2f"|format(item.unit_price) }}</td>
<td>${{ "%.2f"|format(item.qty * item.unit_price) }}</td>
</tr>
{% endfor %}
</tbody>
<tfoot>
<tr class="total-row"><td colspan="3">Total</td><td>${{ "%.2f"|format(total) }}</td></tr>
</tfoot>
</table>
</body>
</html>
Flask endpoint
from flask import Flask, make_response
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
app = Flask(__name__)
env = Environment(loader=FileSystemLoader("templates/"))
@app.route("/invoice/<invoice_id>.pdf")
def generate_invoice(invoice_id):
    # Fetch data (replace with your DB query)
    data = {
        "invoice_number": invoice_id,
        "client_name": "Acme Corp",
        "total": 1700.00,
        "items": [],
    }
    template = env.get_template("invoice.html")
    html_string = template.render(**data)
    pdf_bytes = HTML(string=html_string).write_pdf()
    response = make_response(pdf_bytes)
    response.headers["Content-Type"] = "application/pdf"
    response.headers["Content-Disposition"] = f"inline; filename=invoice-{invoice_id}.pdf"
    return response
FastAPI endpoint
WeasyPrint is synchronous and will block the FastAPI event loop if called directly. Use asyncio.to_thread() to run it in a thread pool.
import asyncio
from fastapi import FastAPI
from fastapi.responses import Response
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
app = FastAPI()
env = Environment(loader=FileSystemLoader("templates/"))
@app.get("/invoice/{invoice_id}.pdf")
async def generate_invoice(invoice_id: str):
    data = {"invoice_number": invoice_id, "client_name": "Acme Corp", "total": 1700.00, "items": []}
    template = env.get_template("invoice.html")
    html_string = template.render(**data)
    # Run WeasyPrint in a thread pool to avoid blocking
    pdf_bytes = await asyncio.to_thread(lambda: HTML(string=html_string).write_pdf())
    return Response(
        content=pdf_bytes,
        media_type="application/pdf",
        headers={"Content-Disposition": f"inline; filename=invoice-{invoice_id}.pdf"},
    )
Option 2: Playwright
Playwright for Python is the Python binding for Microsoft's browser automation library. It drives a real Chromium browser, so it renders modern CSS the way Chrome does, including CSS Grid, Flexbox, custom properties, and @font-face.
Installation
pip install playwright
playwright install chromium
Basic PDF generation
import asyncio
from playwright.async_api import async_playwright
async def html_to_pdf(html_string: str) -> bytes:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_string, wait_until="networkidle")
        pdf_bytes = await page.pdf(
            format="A4",
            margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
            print_background=True,
        )
        await browser.close()
        return pdf_bytes
# Usage
html = "<html><body><h1>Hello PDF</h1></body></html>"
pdf = asyncio.run(html_to_pdf(html))
Singleton browser pattern for production
Launching a new browser for every request adds 200-400ms of startup time. Keep a single browser instance open across requests.
from playwright.async_api import async_playwright, Browser
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.responses import Response
browser: Browser | None = None
@asynccontextmanager
async def lifespan(app: FastAPI):
    global browser
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        yield
        await browser.close()
app = FastAPI(lifespan=lifespan)
@app.get("/invoice/{invoice_id}.pdf")
async def generate_invoice(invoice_id: str):
    html = f"<html><body><h1>Invoice {invoice_id}</h1></body></html>"
    page = await browser.new_page()
    try:
        await page.set_content(html, wait_until="load")
        pdf_bytes = await page.pdf(format="A4", print_background=True)
    finally:
        # Close the page even if rendering fails, so pages do not leak
        await page.close()
    return Response(content=pdf_bytes, media_type="application/pdf")
This pattern brings PDF generation down to ~150-300ms per request on a warm browser.
When the DIY approach starts to hurt
Both WeasyPrint and Playwright work well for low-to-medium volume. At production scale, several pain points appear.
WeasyPrint pain points
No concurrency. WeasyPrint is a synchronous Python library with no built-in worker pool. At 10+ concurrent requests, you need a process pool or task queue (Celery, RQ), which adds infrastructure.
System dependencies. Pango and Cairo are C libraries. Every deployment target (Docker, Lambda, CI) needs them. Debugging "libpango not found" in production is time you could spend building features.
CSS compatibility gaps. WeasyPrint does not support JavaScript, CSS animations, or some newer Grid/Flexbox features. Complex designs that work in a browser may render differently.
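One way to surface the missing-library problem at startup rather than on the first PDF request is a small preflight check. The sketch below uses only the standard library; the library names in `REQUIRED_LIBS` are assumptions and should be adjusted to whatever your WeasyPrint version actually loads, and `missing_native_libs` and `check_native_libs` are hypothetical helper names. Note that `ctypes.util.find_library` is a heuristic and can miss libraries in very minimal containers.

```python
from ctypes.util import find_library
from typing import Callable, Iterable, List, Optional

# Assumed library names; adjust to what your WeasyPrint version loads.
REQUIRED_LIBS = ("pango-1.0", "pangoft2-1.0", "gobject-2.0")


def missing_native_libs(
    names: Iterable[str] = REQUIRED_LIBS,
    finder: Callable[[str], Optional[str]] = find_library,
) -> List[str]:
    """Return the library names the finder cannot locate."""
    return [name for name in names if finder(name) is None]


def check_native_libs() -> None:
    """Raise at startup instead of failing on the first PDF request."""
    missing = missing_native_libs()
    if missing:
        raise RuntimeError(f"Missing native libraries: {', '.join(missing)}")
```

Calling `check_native_libs()` from your application's startup hook or a CI smoke test turns a production-time import error into an early, readable failure.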
Playwright pain points
Docker image size. A minimal Python image with Playwright and Chromium weighs ~550MB. Cold starts on AWS Lambda or Google Cloud Run can take 5-10 seconds.
Concurrency management. Managing a browser pool, page lifecycle, and graceful shutdown under load requires careful code. Memory leaks from unclosed pages are a common production issue.
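A full browser pool is not always necessary. A minimal sketch of the page-lifecycle part, assuming a shared Playwright `browser` object like the singleton shown earlier: an `asyncio.Semaphore` caps how many pages are open at once, and `try/finally` guarantees each page is closed. The function name `render_pdf_limited` and the limit of 4 in the usage note are illustrative choices, not recommendations from Playwright.

```python
import asyncio
from typing import Any


async def render_pdf_limited(browser: Any, html: str, limit: asyncio.Semaphore) -> bytes:
    """Render one PDF while holding a semaphore slot, and always close the page."""
    async with limit:
        page = await browser.new_page()
        try:
            await page.set_content(html, wait_until="load")
            return await page.pdf(format="A4", print_background=True)
        finally:
            # Runs on success and on failure, so pages cannot pile up.
            await page.close()
```

In a FastAPI app you could create `limit = asyncio.Semaphore(4)` once in the lifespan handler and pass it to every request; requests beyond the limit wait for a slot instead of opening more pages.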
Serverless limitations. AWS Lambda limits a deployment package to 250MB unzipped, and layers count toward that limit. A stock Chromium build does not fit, so you need either a slimmed-down Chromium packaged as a layer or a custom container image (up to 10GB), both of which are non-trivial to maintain.
The break-even point
The table below summarizes when the operational overhead of self-hosted PDF generation stops being worth it.
| Signal | Self-hosted (WeasyPrint/Playwright) | REST API (PDF4.dev) |
|---|---|---|
| PDFs per day | Under 500 | Any volume |
| Team size | Solo or 1-2 devs | Any |
| Deployment target | Traditional server | Serverless, Lambda, edge |
| Docker size budget | No constraint | Size-constrained |
| CSS complexity | Simple documents | Complex HTML/CSS |
| Multi-language (fonts, RTL) | Needs custom setup | Handled by API |
| SLA requirement | DIY monitoring | Managed |
Option 3: PDF REST API (PDF4.dev)
PDF4.dev is an HTML-to-PDF REST API. You send an HTTP request with your HTML and data, and get a PDF back. The rendering is done by a managed Chromium instance with no infrastructure on your side.
Installation
No system dependencies. Just an HTTP client.
pip install httpx # or use requests
Generate a PDF with raw HTML
import httpx
API_KEY = "p4_live_your_api_key"
def generate_pdf(html: str, data: dict | None = None) -> bytes:
    response = httpx.post(
        "https://pdf4.dev/api/v1/render",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"html": html, "data": data or {}},
        timeout=30.0,  # httpx defaults to 5 seconds, which can be too short for rendering
    )
    response.raise_for_status()
    return response.content
# Example
html = """
<html>
<head>
<style>
body { font-family: Inter, sans-serif; margin: 20mm; }
h1 { color: #111; }
</style>
</head>
<body>
<h1>Invoice {{invoice_number}}</h1>
<p>Client: {{client_name}}</p>
<p>Total: {{total}}</p>
</body>
</html>
"""
pdf_bytes = generate_pdf(html, {
"invoice_number": "INV-0042",
"client_name": "Acme Corp",
"total": "$1,700.00",
})
with open("invoice.pdf", "wb") as f:
    f.write(pdf_bytes)
The API uses Handlebars syntax ({{variable}}) for templating, with built-in helpers for formatting dates, numbers, and currencies.
Use a saved template
PDF4.dev lets you save HTML templates in the dashboard and reference them by slug. This separates template design from application code.
import httpx
API_KEY = "p4_live_your_api_key"
def render_template(template_id: str, data: dict) -> bytes:
    response = httpx.post(
        "https://pdf4.dev/api/v1/render",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"template_id": template_id, "data": data},
    )
    response.raise_for_status()
    return response.content
pdf_bytes = render_template("invoice", {
    "invoice_number": "INV-0042",
    "client_name": "Acme Corp",
    "items": [
        {"description": "Consulting", "qty": 10, "unit_price": 150},
        {"description": "Setup fee", "qty": 1, "unit_price": 200},
    ],
    "total": 1700,
})
Async FastAPI with httpx
import httpx
from fastapi import FastAPI
from fastapi.responses import Response
app = FastAPI()
API_KEY = "p4_live_your_api_key"
@app.get("/invoice/{invoice_id}.pdf")
async def generate_invoice(invoice_id: str):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://pdf4.dev/api/v1/render",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "template_id": "invoice",
                "data": {
                    "invoice_number": invoice_id,
                    "client_name": "Acme Corp",
                    "total": 1700.00,
                },
            },
            timeout=30.0,
        )
    response.raise_for_status()
    return Response(
        content=response.content,
        media_type="application/pdf",
        headers={"Content-Disposition": f"inline; filename=invoice-{invoice_id}.pdf"},
    )
No browser to manage, no Pango/Cairo, no Docker image bloat.
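What a hosted renderer does add is a network hop, so transient failures are worth handling. The following is a generic retry-with-backoff sketch, not part of any PDF API client: `call_with_retry` is a hypothetical helper, and the attempt count and delays are arbitrary starting values.

```python
import time
from typing import Callable, Tuple, Type


def call_with_retry(
    send: Callable[[], bytes],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With httpx you might wrap a render call as `call_with_retry(lambda: generate_pdf(html), retry_on=(httpx.TransportError,))`, which retries connection errors and timeouts but not 4xx responses raised by `raise_for_status()`.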
Choosing the right approach for your project
Use this checklist to pick the right tool.
Use WeasyPrint if:
- You want a pure-Python solution with no external HTTP calls
- Your documents use standard CSS (tables, simple layouts, no Grid/complex Flexbox)
- You can afford to add Pango/Cairo to your Docker image
- Volume is under a few hundred PDFs per day
Use Playwright if:
- You need pixel-perfect CSS rendering (full Grid, CSS custom properties, JS-rendered content)
- You are already using Playwright for end-to-end testing
- You are comfortable managing a browser pool
Use a PDF API if:
- You are deploying to serverless (Lambda, Cloud Functions, Vercel)
- You want zero system dependencies in your Docker image
- You need reliable concurrency without building a worker pool
- You want designers to edit PDF templates in a UI without changing Python code
Font handling in Python PDF generation
Fonts are a frequent source of rendering differences between development and production.
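WeasyPrint uses Pango for text rendering. To use a custom font, declare it via @font-face in your CSS, pass a base_url so WeasyPrint can resolve the font file, and share one FontConfiguration between the stylesheet and write_pdf():
from weasyprint import HTML, CSS
from weasyprint.text.fonts import FontConfiguration  # WeasyPrint 53+
html = "<html><body><p>Hello</p></body></html>"
font_config = FontConfiguration()
css = CSS(
    string="""
    @font-face {
        font-family: 'Inter';
        src: url('Inter-Regular.ttf');
    }
    body { font-family: 'Inter', sans-serif; }
    """,
    base_url="file:///path/to/",  # directory that contains Inter-Regular.ttf
    font_config=font_config,
)
pdf = HTML(string=html).write_pdf(stylesheets=[css], font_config=font_config)
Playwright loads fonts the same way a browser does. page.set_content() does not take a base URL, so reference fonts in @font-face with absolute URLs or add a <base href="..."> element to the HTML. Google Fonts work as long as you wait for networkidle when setting content.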
PDF4.dev supports Google Fonts via the google_fonts_url field in the format options, and supports @font-face with any URL.
PDF format options: A4, letter, custom sizes
All three approaches support A4, Letter, and custom page sizes.
| Format | WeasyPrint | Playwright | PDF4.dev (format.preset) |
|---|---|---|---|
| A4 portrait | @page { size: A4; } | format="A4" | "a4" |
| A4 landscape | @page { size: A4 landscape; } | format="A4", landscape=True | "a4-landscape" |
| Letter | @page { size: letter; } | format="Letter" | "letter" |
| Custom | @page { size: 150mm 200mm; } | width="150mm", height="200mm" | preset="custom", width="150mm", height="200mm" |
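For the WeasyPrint column, the `@page` rule can be generated from settings instead of being hard-coded in each template. This is a small sketch with a hypothetical helper name, `page_size_rule`; it only builds a CSS string and makes no WeasyPrint calls itself.

```python
def page_size_rule(size: str = "A4", landscape: bool = False) -> str:
    """Build an @page rule for a named size (A4, letter) or explicit dimensions.

    `landscape` is only meaningful for named sizes; for custom dimensions
    such as "150mm 200mm", pass width and height in the order you want.
    """
    orientation = " landscape" if landscape else ""
    return f"@page {{ size: {size}{orientation}; }}"


print(page_size_rule("A4", landscape=True))
print(page_size_rule("150mm 200mm"))
```

The resulting string can be passed to WeasyPrint as an extra stylesheet, for example `HTML(string=html).write_pdf(stylesheets=[CSS(string=page_size_rule("letter"))])`.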
For custom margins in WeasyPrint:
@page {
    size: A4;
    margin: 20mm 15mm 20mm 15mm;
}
Production checklist
Before deploying PDF generation to production, verify these points regardless of which approach you use.
| Checklist item | WeasyPrint | Playwright | PDF API |
|---|---|---|---|
| System libs installed in Docker | Pango, Cairo required | Chromium required | None needed |
| Concurrency handled | Process pool or task queue | Browser page pool | Handled by API |
| Timeouts configured | Thread timeout | page.pdf() timeout | HTTP client timeout (30s) |
| Error handling for malformed HTML | Try/except around write_pdf() | Try/except around page.pdf() | Check HTTP status code |
| Logging render duration | time.time() around call | time.time() around call | Check response headers |
| Fonts available in container | Font files copied in Docker | System fonts or @font-face | Embedded or Google Fonts |
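The "logging render duration" row can be covered by one context manager that works with any of the three backends. This is a sketch using only the standard library; `timed_render` and the logger name `pdf` are illustrative names, and `time.perf_counter()` is used here instead of `time.time()` because it is monotonic.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator, List, Optional

logger = logging.getLogger("pdf")


@contextmanager
def timed_render(label: str, durations: Optional[List[float]] = None) -> Iterator[None]:
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s rendered in %.1f ms", label, elapsed * 1000)
        if durations is not None:
            durations.append(elapsed)  # optional hook for metrics or tests
```

Usage is the same for every backend, for example `with timed_render("invoice"): pdf_bytes = HTML(string=html_string).write_pdf()`.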
Summary
Generating PDFs from HTML in Python has three solid paths. WeasyPrint is the quickest to add to an existing Python project. Playwright gives the most accurate rendering. A REST API eliminates all system dependencies and is the only realistic option for serverless environments.
For most Flask or FastAPI projects, start with WeasyPrint. If CSS accuracy becomes a problem or you need to scale beyond a few hundred PDFs per day, switching to a managed API can be as small as changing one function call, as long as you keep rendering the Jinja2 template locally and send the finished HTML.
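One way to keep that switch small is to hide the backend behind a single callable. The sketch below is an assumption about how you might structure this, not a prescribed design: `Renderer`, `render_with_weasyprint`, `render_with_api`, `invoice_pdf`, and the `PDF4_API_KEY` environment variable are all hypothetical names, while the endpoint and payload mirror the earlier examples in this guide.

```python
import os
from typing import Callable

# A renderer is any callable that turns finished HTML into PDF bytes.
Renderer = Callable[[str], bytes]


def render_with_weasyprint(html: str) -> bytes:
    # Imported lazily so an API-only deployment does not need WeasyPrint.
    from weasyprint import HTML

    return HTML(string=html).write_pdf()


def render_with_api(html: str) -> bytes:
    # Imported lazily so a WeasyPrint-only deployment does not need httpx.
    import httpx

    response = httpx.post(
        "https://pdf4.dev/api/v1/render",  # endpoint as used earlier in this guide
        headers={"Authorization": f"Bearer {os.environ['PDF4_API_KEY']}"},  # hypothetical variable name
        json={"html": html, "data": {}},
        timeout=30.0,
    )
    response.raise_for_status()
    return response.content


def invoice_pdf(invoice_id: str, render: Renderer) -> bytes:
    """Build the HTML once, then hand it to whichever backend was injected."""
    html = f"<html><body><h1>Invoice {invoice_id}</h1></body></html>"
    return render(html)
```

Application code calls `invoice_pdf(invoice_id, render_with_weasyprint)` today and `invoice_pdf(invoice_id, render_with_api)` later; tests can pass a fake renderer and skip PDF generation entirely.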
You can try PDF generation with the HTML to PDF converter or sign up at PDF4.dev to use the full API with saved templates, Handlebars variables, and a visual editor.