
Generate PDFs from HTML in Python: WeasyPrint, Playwright, and PDF APIs compared

Generate PDFs from HTML in Python using WeasyPrint, Playwright, or a REST API. Covers Flask, FastAPI, dynamic templates, fonts, and production tips.

benoitded · March 17, 2026 · 11 min read

Generating PDFs from HTML in Python is a solved problem, but the solution you pick at prototype time often becomes a production bottleneck. This guide covers three approaches: WeasyPrint (pure Python), Playwright (headless Chromium), and a REST API. You will see working code for each, plus a clear breakdown of when to switch.

Approach comparison

Before diving into code, here is a direct comparison of the three main options for Python PDF generation, with two older alternatives (pdfkit and ReportLab) included for context.

| Approach | CSS accuracy | Install complexity | Async support | Docker size | Best for |
| --- | --- | --- | --- | --- | --- |
| WeasyPrint | Good (CSS 2.1, partial CSS3) | Medium (needs Pango) | No (blocks event loop) | ~150MB extra | Most Python projects |
| Playwright | Excellent (full Chromium) | High (Chromium binary ~300MB) | Yes (async API) | ~300MB extra | Pixel-perfect, complex CSS |
| PDF REST API | Excellent (hosted Chromium) | None (HTTP client only) | Yes | 0 | Production, serverless, teams |
| pdfkit/wkhtmltopdf | Poor (deprecated engine) | High (C binary) | No | ~200MB extra | Legacy projects only |
| ReportLab | n/a (not HTML-based) | Low | No | ~15MB | Pure programmatic docs |

Note: pdfkit wraps wkhtmltopdf, which stopped active development in 2023. Avoid it for new projects.

Option 1: WeasyPrint

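WeasyPrint is a Python library that converts HTML and CSS to PDF using the Pango text layout engine. Releases before version 53 also required the Cairo graphics library; current releases write the PDF themselves and only need Pango. It supports CSS 2.1 plus most commonly used CSS3 properties.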

Installation

pip install weasyprint

macOS (Homebrew):

brew install pango

Debian/Ubuntu:

apt-get install libpango-1.0-0 libpangoft2-1.0-0

Basic conversion

from weasyprint import HTML
 
html_string = """
<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: Arial, sans-serif; margin: 20mm; }
    h1 { color: #111; font-size: 24px; }
    .total { font-size: 18px; font-weight: bold; }
  </style>
</head>
<body>
  <h1>Invoice #001</h1>
  <p>Client: Acme Corp</p>
  <p class="total">Total: $1,500.00</p>
</body>
</html>
"""
 
pdf_bytes = HTML(string=html_string).write_pdf()
 
# Save to file
with open("invoice.pdf", "wb") as f:
    f.write(pdf_bytes)

Dynamic templates with Jinja2

In practice, you almost always need dynamic data. Jinja2 is the standard Python templating engine for this.

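from jinja2 import Environment, FileSystemLoader, select_autoescape
from weasyprint import HTML

# Load template from file; autoescape HTML so user data cannot inject markup
env = Environment(
    loader=FileSystemLoader("templates/"),
    autoescape=select_autoescape(),
)
template = env.get_template("invoice.html")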
 
# Render with data
data = {
    "invoice_number": "INV-0042",
    "client_name": "Acme Corp",
    "items": [
        {"description": "Consulting", "qty": 10, "unit_price": 150.00},
        {"description": "Setup fee", "qty": 1, "unit_price": 200.00},
    ],
    "total": 1700.00,
}
 
html_string = template.render(**data)
pdf_bytes = HTML(string=html_string).write_pdf()

templates/invoice.html:

<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: Arial, sans-serif; margin: 20mm; color: #333; }
    table { width: 100%; border-collapse: collapse; }
    th, td { padding: 8px 12px; border-bottom: 1px solid #eee; text-align: left; }
    .total-row { font-weight: bold; font-size: 16px; }
  </style>
</head>
<body>
  <h1>Invoice {{ invoice_number }}</h1>
  <p>Client: {{ client_name }}</p>
  <table>
    <thead>
      <tr><th>Description</th><th>Qty</th><th>Unit Price</th><th>Subtotal</th></tr>
    </thead>
    <tbody>
      {% for item in items %}
      <tr>
        <td>{{ item.description }}</td>
        <td>{{ item.qty }}</td>
        <td>${{ "%.2f"|format(item.unit_price) }}</td>
        <td>${{ "%.2f"|format(item.qty * item.unit_price) }}</td>
      </tr>
      {% endfor %}
    </tbody>
    <tfoot>
      <tr class="total-row"><td colspan="3">Total</td><td>${{ "%.2f"|format(total) }}</td></tr>
    </tfoot>
  </table>
</body>
</html>
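
The `total` passed to the template above is hard-coded next to the line items, so the two can drift apart. A minimal sketch of deriving it instead, assuming items carry the `qty` and `unit_price` keys used in the data dict (the helper name `invoice_total` is illustrative, not part of any library):

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_total(items: list[dict]) -> Decimal:
    """Sum qty * unit_price over all line items, rounded to cents."""
    total = sum(
        # Go through str() so floats like 150.00 become exact decimals
        (Decimal(str(item["qty"])) * Decimal(str(item["unit_price"])) for item in items),
        Decimal("0"),
    )
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


items = [
    {"description": "Consulting", "qty": 10, "unit_price": 150.00},
    {"description": "Setup fee", "qty": 1, "unit_price": 200.00},
]
print(invoice_total(items))  # 1700.00
```

The result can be passed as `total` in the render data; the template's `"%.2f"|format(total)` filter accepts a `Decimal`.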

Flask endpoint

from flask import Flask, make_response
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
 
app = Flask(__name__)
env = Environment(loader=FileSystemLoader("templates/"))
 
@app.route("/invoice/<invoice_id>.pdf")
def generate_invoice(invoice_id):
    # Fetch data (replace with your DB query)
    data = {
        "invoice_number": invoice_id,
        "client_name": "Acme Corp",
        "total": 1700.00,
        "items": [],
    }
 
    template = env.get_template("invoice.html")
    html_string = template.render(**data)
    pdf_bytes = HTML(string=html_string).write_pdf()
 
    response = make_response(pdf_bytes)
    response.headers["Content-Type"] = "application/pdf"
    response.headers["Content-Disposition"] = f"inline; filename=invoice-{invoice_id}.pdf"
    return response
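
In the endpoint above, `invoice_id` comes straight from the URL and is interpolated into the Content-Disposition header. A small sketch of sanitizing it first, assuming invoice IDs only need letters, digits, dashes, and underscores (`safe_pdf_filename` is a hypothetical helper):

```python
import re


def safe_pdf_filename(invoice_id: str) -> str:
    """Build a header-safe filename from an untrusted invoice ID."""
    # Keep letters, digits, dash and underscore; drop everything else
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", invoice_id)
    return f"invoice-{cleaned or 'unknown'}.pdf"


print(safe_pdf_filename("INV-0042"))               # invoice-INV-0042.pdf
print(safe_pdf_filename('x"\r\nSet-Cookie: a=b'))  # invoice-xSet-Cookieab.pdf
```

The header line would then read `f'inline; filename="{safe_pdf_filename(invoice_id)}"'`.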

FastAPI endpoint

WeasyPrint is synchronous and will block the FastAPI event loop if called directly. Use asyncio.to_thread() to run it in a thread pool.

import asyncio
from fastapi import FastAPI
from fastapi.responses import Response
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
 
app = FastAPI()
env = Environment(loader=FileSystemLoader("templates/"))
 
@app.get("/invoice/{invoice_id}.pdf")
async def generate_invoice(invoice_id: str):
    data = {"invoice_number": invoice_id, "client_name": "Acme Corp", "total": 1700.00, "items": []}
    template = env.get_template("invoice.html")
    html_string = template.render(**data)
 
    # Run WeasyPrint in a thread pool to avoid blocking
    pdf_bytes = await asyncio.to_thread(lambda: HTML(string=html_string).write_pdf())
 
    return Response(
        content=pdf_bytes,
        media_type="application/pdf",
        headers={"Content-Disposition": f"inline; filename=invoice-{invoice_id}.pdf"},
    )
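
`asyncio.to_thread()` keeps the event loop free, but it does not bound how long a render may take. One possible way to add a deadline is `asyncio.wait_for()`. The sketch below uses a stand-in `fake_render` function in place of `HTML(string=...).write_pdf`; `render_with_timeout` is an illustrative name.

```python
import asyncio
import time
from typing import Callable


async def render_with_timeout(render: Callable[[], bytes], seconds: float) -> bytes:
    """Run a blocking render in a worker thread, giving up after `seconds`."""
    # On timeout the caller stops waiting, but the worker thread cannot be
    # interrupted: it still runs to completion in the background.
    return await asyncio.wait_for(asyncio.to_thread(render), timeout=seconds)


def fake_render() -> bytes:
    # Stand-in for a real, slower WeasyPrint call
    time.sleep(0.05)
    return b"%PDF-fake"


async def main() -> None:
    print(await render_with_timeout(fake_render, seconds=1.0))
    try:
        await render_with_timeout(fake_render, seconds=0.01)
    except asyncio.TimeoutError:
        print("render timed out")


if __name__ == "__main__":
    asyncio.run(main())
```

Because the thread keeps running after a timeout, this protects response latency rather than CPU; a hard limit needs a separate process that can be killed.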

Option 2: Playwright

Playwright for Python provides bindings for Microsoft's browser automation library. It drives a real Chromium browser, so it renders modern CSS the way Chrome does, including CSS Grid, Flexbox, custom properties, and @font-face.

Installation

pip install playwright
playwright install chromium

Basic PDF generation

import asyncio
from playwright.async_api import async_playwright
 
async def html_to_pdf(html_string: str) -> bytes:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_string, wait_until="networkidle")
        pdf_bytes = await page.pdf(
            format="A4",
            margin={"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
            print_background=True,
        )
        await browser.close()
        return pdf_bytes
 
# Usage
html = "<html><body><h1>Hello PDF</h1></body></html>"
pdf = asyncio.run(html_to_pdf(html))

Singleton browser pattern for production

Launching a new browser for every request adds 200-400ms of startup time. Keep a single browser instance open across requests.

from playwright.async_api import async_playwright, Browser
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.responses import Response
 
browser: Browser | None = None
 
@asynccontextmanager
async def lifespan(app: FastAPI):
    global browser
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        yield
        await browser.close()
 
app = FastAPI(lifespan=lifespan)
 
@app.get("/invoice/{invoice_id}.pdf")
async def generate_invoice(invoice_id: str):
    html = f"<html><body><h1>Invoice {invoice_id}</h1></body></html>"
    page = await browser.new_page()
    try:
        await page.set_content(html, wait_until="load")
        pdf_bytes = await page.pdf(format="A4", print_background=True)
    finally:
        # Close the page even if rendering fails, so pages do not leak
        await page.close()
    return Response(content=pdf_bytes, media_type="application/pdf")

This pattern brings PDF generation down to ~150-300ms per request on a warm browser.


When the DIY approach starts to hurt

Both WeasyPrint and Playwright work well for low-to-medium volume. At production scale, several pain points appear.

WeasyPrint pain points

No concurrency. WeasyPrint is a synchronous Python library with no built-in worker pool. At 10+ concurrent requests, you need a process pool or task queue (Celery, RQ), which adds infrastructure.

System dependencies. Pango and the libraries it pulls in are native C libraries. Every deployment target (Docker, Lambda, CI) needs them. Debugging "libpango not found" in production is time you could spend building features.

CSS compatibility gaps. WeasyPrint does not support JavaScript, CSS animations, or some newer Grid/Flexbox features. Complex designs that work in a browser may render differently.
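
The process-pool option mentioned under "No concurrency" can be sketched with the standard library alone. This is one possible shape, not the only one: `render_pdf` and `render_many` are illustrative names, and the worker count in the usage comment is arbitrary.

```python
import asyncio
from concurrent.futures import Executor
from typing import Callable, Iterable


def render_pdf(html_string: str) -> bytes:
    """Blocking render. Defined at module top level so it can be pickled."""
    from weasyprint import HTML  # imported lazily, inside the worker process

    return HTML(string=html_string).write_pdf()


async def render_many(
    pages: Iterable[str],
    pool: Executor,
    render: Callable[[str], bytes] = render_pdf,
) -> list[bytes]:
    """Render every page in the pool without blocking the event loop."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(pool, render, page) for page in pages]
    return await asyncio.gather(*futures)


# Usage, inside an async function:
#     from concurrent.futures import ProcessPoolExecutor
#     with ProcessPoolExecutor(max_workers=4) as pool:
#         pdfs = await render_many(pages, pool)
```

A process pool sidesteps the GIL for CPU-bound rendering, at the cost of pickling the HTML in and the PDF bytes out. In a web app the pool would normally be created once at startup rather than per request.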

Playwright pain points

Docker image size. A minimal Python image with Playwright and Chromium weighs ~550MB. On AWS Lambda or Google Cloud Run, cold starts with an image that size can take 5-10 seconds.

Concurrency management. Managing a browser pool, page lifecycle, and graceful shutdown under load requires careful code. Memory leaks from unclosed pages are a common production issue.

Serverless limitations. AWS Lambda has a 250MB unzipped deployment package limit. Chromium alone exceeds that. You need a Lambda layer or a custom container image, both of which are non-trivial to maintain.
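
For the concurrency and page-lifecycle issue described above, a common building block is a semaphore that caps how many pages are open at once and guarantees each one is closed. A minimal sketch, assuming a shared `browser` object with an async `new_page()` method as in the singleton example (`PageLimiter` is a hypothetical helper, not a Playwright class):

```python
import asyncio
from contextlib import asynccontextmanager


class PageLimiter:
    """Cap the number of pages open at once on a shared browser."""

    def __init__(self, browser, max_pages: int = 4):
        self._browser = browser
        self._slots = asyncio.Semaphore(max_pages)

    @asynccontextmanager
    async def page(self):
        async with self._slots:  # wait here until a slot is free
            page = await self._browser.new_page()
            try:
                yield page
            finally:
                # Always close the page, even if rendering raised
                await page.close()


# Usage, inside a request handler:
#     async with limiter.page() as page:
#         await page.set_content(html, wait_until="load")
#         pdf_bytes = await page.pdf(format="A4", print_background=True)
```

Requests beyond `max_pages` queue on the semaphore instead of opening more pages, which keeps Chromium's memory use roughly bounded.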

The break-even point

The table below summarizes when the operational overhead of self-hosted PDF generation stops being worth it.

| Signal | Self-hosted (WeasyPrint/Playwright) | REST API (PDF4.dev) |
| --- | --- | --- |
| PDFs per day | Under 500 | Any volume |
| Team size | Solo or 1-2 devs | Any |
| Deployment target | Traditional server | Serverless, Lambda, edge |
| Docker size budget | No constraint | Size-constrained |
| CSS complexity | Simple documents | Complex HTML/CSS |
| Multi-language (fonts, RTL) | Needs custom setup | Handled by API |
| SLA requirement | DIY monitoring | Managed |

Option 3: PDF REST API (PDF4.dev)

PDF4.dev is an HTML-to-PDF REST API. You send an HTTP request with your HTML and data, and get a PDF back. The rendering is done by a managed Chromium instance with no infrastructure on your side.

Installation

No system dependencies. Just an HTTP client.

pip install httpx  # or use requests

Generate a PDF with raw HTML

import httpx
 
API_KEY = "p4_live_your_api_key"
 
def generate_pdf(html: str, data: dict | None = None) -> bytes:
    response = httpx.post(
        "https://pdf4.dev/api/v1/render",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"html": html, "data": data or {}},
        timeout=30.0,  # httpx defaults to 5 seconds, which can be short for rendering
    )
    response.raise_for_status()
    return response.content
 
# Example
html = """
<html>
<head>
  <style>
    body { font-family: Inter, sans-serif; margin: 20mm; }
    h1 { color: #111; }
  </style>
</head>
<body>
  <h1>Invoice {{invoice_number}}</h1>
  <p>Client: {{client_name}}</p>
  <p>Total: {{total}}</p>
</body>
</html>
"""
 
pdf_bytes = generate_pdf(html, {
    "invoice_number": "INV-0042",
    "client_name": "Acme Corp",
    "total": "$1,700.00",
})
 
with open("invoice.pdf", "wb") as f:
    f.write(pdf_bytes)
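
The examples here keep the API key in a module-level constant for brevity. In a deployed service the key would more typically come from the environment. A minimal sketch, where the variable name `PDF4_API_KEY` is an assumption chosen for illustration and not something the API prescribes:

```python
import os


def load_api_key(env_var: str = "PDF4_API_KEY") -> str:
    """Read the API key from the environment instead of source code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail at startup rather than on the first render request
        raise RuntimeError(f"Set the {env_var} environment variable")
    return key
```

`API_KEY = load_api_key()` can then replace the hard-coded string.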

The API uses Handlebars syntax ({{variable}}) for templating, with built-in helpers for formatting dates, numbers, and currencies.

Use a saved template

PDF4.dev lets you save HTML templates in the dashboard and reference them by slug. This separates template design from application code.

import httpx
 
API_KEY = "p4_live_your_api_key"
 
def render_template(template_id: str, data: dict) -> bytes:
    response = httpx.post(
        "https://pdf4.dev/api/v1/render",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"template_id": template_id, "data": data},
        timeout=30.0,  # allow more than httpx's 5-second default for rendering
    )
    response.raise_for_status()
    return response.content
 
pdf_bytes = render_template("invoice", {
    "invoice_number": "INV-0042",
    "client_name": "Acme Corp",
    "items": [
        {"description": "Consulting", "qty": 10, "unit_price": 150},
        {"description": "Setup fee", "qty": 1, "unit_price": 200},
    ],
    "total": 1700,
})

Async FastAPI with httpx

import httpx
from fastapi import FastAPI
from fastapi.responses import Response
 
app = FastAPI()
API_KEY = "p4_live_your_api_key"
 
@app.get("/invoice/{invoice_id}.pdf")
async def generate_invoice(invoice_id: str):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://pdf4.dev/api/v1/render",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "template_id": "invoice",
                "data": {
                    "invoice_number": invoice_id,
                    "client_name": "Acme Corp",
                    "total": 1700.00,
                },
            },
            timeout=30.0,
        )
        response.raise_for_status()
 
    return Response(
        content=response.content,
        media_type="application/pdf",
        headers={"Content-Disposition": f"inline; filename=invoice-{invoice_id}.pdf"},
    )

No browser to manage, no Pango or other system libraries, no Docker image bloat.
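
Moving rendering behind HTTP does introduce network failures as a new error class. A generic retry wrapper with exponential backoff is one way to absorb transient ones. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, the delays are illustrative, and which exceptions are worth retrying depends on your client.

```python
import random
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], bytes],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send(), retrying on the given exceptions with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, plus jitter to avoid synchronized retries
            sleep(base_delay * 2**attempt + random.uniform(0, base_delay / 2))
    raise AssertionError("unreachable")
```

With the `generate_pdf` function above, a call might look like `call_with_retry(lambda: generate_pdf(html, data), retry_on=(httpx.TransportError,))`. Restricting `retry_on` to transport errors avoids retrying 4xx responses, which will not succeed on a second try.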


Choosing the right approach for your project

Use these criteria to pick the right tool.

Use WeasyPrint if:

  • You want a pure-Python solution with no external HTTP calls
  • Your documents use standard CSS (tables, simple layouts, no Grid/complex Flexbox)
  • You can afford to add Pango and its system libraries to your Docker image
  • Volume is under a few hundred PDFs per day

Use Playwright if:

  • You need pixel-perfect CSS rendering (full Grid, CSS custom properties, JS-rendered content)
  • You are already using Playwright for end-to-end testing
  • You are comfortable managing a browser pool

Use a PDF API if:

  • You are deploying to serverless (Lambda, Cloud Functions, Vercel)
  • You want zero system dependencies in your Docker image
  • You need reliable concurrency without building a worker pool
  • You want designers to edit PDF templates in a UI without changing Python code
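
The criteria above can be read as a rough decision function. The sketch below is one interpretation, not a rule: the parameter names are invented for illustration, and the 500 PDFs/day threshold is taken from the break-even table.

```python
def choose_approach(
    serverless: bool,
    needs_full_css: bool,
    pdfs_per_day: int,
    can_manage_browser_pool: bool,
) -> str:
    """Map the article's selection criteria to a suggested tool."""
    if serverless:
        # Chromium and system libraries are awkward to ship to serverless targets
        return "PDF API"
    if needs_full_css:
        # Full Grid/JS rendering needs Chromium, self-hosted or managed
        return "Playwright" if can_manage_browser_pool else "PDF API"
    if pdfs_per_day < 500:
        return "WeasyPrint"
    return "PDF API"


print(choose_approach(False, False, 200, False))  # WeasyPrint
print(choose_approach(False, True, 200, True))    # Playwright
print(choose_approach(True, True, 200, True))     # PDF API
```

Real decisions weigh more factors (team size, SLA, template ownership), so treat the output as a starting point.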

Font handling in Python PDF generation

Font rendering is a frequent source of rendering inconsistencies between development and production.

WeasyPrint uses Pango for text rendering. To use a custom font, embed it via @font-face in your CSS and pass a base_url so WeasyPrint can resolve the file path:

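from weasyprint import HTML, CSS
from weasyprint.text.fonts import FontConfiguration

# Share one FontConfiguration so the @font-face rule is registered for rendering
font_config = FontConfiguration()

html = "<html><body><p>Hello</p></body></html>"
css = CSS(
    string="""
  @font-face {
    font-family: 'Inter';
    src: url('Inter-Regular.ttf');
  }
  body { font-family: 'Inter', sans-serif; }
""",
    base_url="/path/to/",  # directory that relative font URLs resolve against
    font_config=font_config,
)

pdf = HTML(string=html).write_pdf(stylesheets=[css], font_config=font_config)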

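Playwright loads fonts the same way a browser does. Note that page.set_content() takes no base_url argument, and content set this way has no base URL for relative paths to resolve against, so reference fonts in @font-face by absolute URL or inline them as data: URIs. Google Fonts work as long as you wait for networkidle when setting content.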

PDF4.dev supports Google Fonts via the google_fonts_url field in the format options, and supports @font-face with any URL.
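
One way to make a font available regardless of how URLs resolve is to inline the file in the stylesheet as a data: URI. A minimal sketch, assuming a TrueType font file on disk (`font_face_css` is an illustrative helper name):

```python
import base64
from pathlib import Path


def font_face_css(family: str, font_path: str) -> str:
    """Return an @font-face rule with the font file inlined as a data: URI."""
    encoded = base64.b64encode(Path(font_path).read_bytes()).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('data:font/ttf;base64,{encoded}') format('truetype');\n"
        "}"
    )
```

The returned rule can be placed inside the `<style>` block of the HTML before rendering. The trade-off is size: base64 adds roughly a third to the font's byte count, and the font is re-sent with every document.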


PDF format options: A4, letter, custom sizes

All three approaches support A4, Letter, and custom page sizes.

| Format | WeasyPrint | Playwright | PDF4.dev (format.preset) |
| --- | --- | --- | --- |
| A4 portrait | @page { size: A4; } | format="A4" | "a4" |
| A4 landscape | @page { size: A4 landscape; } | format="A4", landscape=True | "a4-landscape" |
| Letter | @page { size: letter; } | format="Letter" | "letter" |
| Custom | @page { size: 150mm 200mm; } | width="150mm", height="200mm" | preset="custom", width="150mm", height="200mm" |

For custom margins in WeasyPrint:

@page {
  size: A4;
  margin: 20mm 15mm 20mm 15mm;
}

Production checklist

Before deploying PDF generation to production, verify these points regardless of which approach you use.

| Checklist item | WeasyPrint | Playwright | PDF API |
| --- | --- | --- | --- |
| System libs installed in Docker | Pango required | Chromium required | None needed |
| Concurrency handled | Process pool or task queue | Browser page pool | Handled by API |
| Timeouts configured | Thread timeout | page.pdf() timeout | HTTP client timeout (30s) |
| Error handling for malformed HTML | Try/except around write_pdf() | Try/except around page.pdf() | Check HTTP status code |
| Logging render duration | time.time() around call | time.time() around call | Check response headers |
| Fonts available in container | Font files copied in Docker | System fonts or @font-face | Embedded or Google Fonts |
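
For the "logging render duration" item, a small context manager keeps the timing code out of the endpoint body. This sketch uses `time.perf_counter()`, which is better suited to measuring intervals than `time.time()`; the logger name and `log_render_duration` are illustrative.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("pdf")


@contextmanager
def log_render_duration(label: str):
    """Log how long the wrapped block took, whether or not it raised."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("rendered %s in %.0f ms", label, elapsed_ms)


# Usage:
#     with log_render_duration(f"invoice-{invoice_id}"):
#         pdf_bytes = HTML(string=html_string).write_pdf()
```

The same wrapper works around `page.pdf()` or an HTTP call, so durations stay comparable if you switch approaches.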

Summary

Generating PDFs from HTML in Python has three solid paths. WeasyPrint is the quickest to add to an existing Python project. Playwright gives the most accurate rendering. A REST API eliminates all system dependencies and is the most practical option for serverless environments, where shipping Chromium means maintaining a layer or a custom container image.

For most Flask or FastAPI projects, start with WeasyPrint. If CSS accuracy becomes a problem or you need to scale beyond a few hundred PDFs per day, switching to a managed API mostly comes down to replacing the render call, plus porting Jinja2 templates to Handlebars syntax if you want the API to do the templating.

You can try PDF generation with the HTML to PDF converter or sign up at PDF4.dev to use the full API with saved templates, Handlebars variables, and a visual editor.
