Generating accessible tagged PDFs from HTML: WCAG 2.2 and PDF/UA-2

Developer guide to producing WCAG 2.2 and PDF/UA-2 compliant tagged PDFs from HTML for the European Accessibility Act, with veraPDF in CI and audit-passing patterns.

The European Accessibility Act has applied since 28 June 2025. Germany sent its first batch of warning letters to non-compliant operators in autumn 2025, and France's enforcement body has signaled it will follow in the second half of 2026. Most digital products that fall under the act look fine on screen. The PDFs they generate are usually the weakest point: invoices, statements, contracts, certificates, tickets. A standard "render HTML to PDF" pipeline produces a file that is technically tagged but fails almost every strict PDF/UA-2 audit. This article walks through what the rules actually require, why headless Chromium falls short, and three production-grade paths to a compliant tagged PDF, including the veraPDF check that should fail your CI build before a broken document ever reaches a customer.

What the EAA actually requires for PDFs

The European Accessibility Act is Directive (EU) 2019/882. Article 4 obligates economic operators placing covered products and services on the EU market to make them accessible according to the functional accessibility requirements in Annex I. Article 30 obliges member states to lay down penalties that are "effective, proportionate and dissuasive." Article 31 sets the application date at 28 June 2025.

The directive does not name a specific PDF standard. It points at functional outcomes: perceivable, operable, understandable, robust. In practice, the only ISO standard a regulator can point a non-compliant operator at for PDF documents is PDF/UA, mapped onto WCAG 2.2 success criteria.

Covered services include retail e-commerce, consumer banking, transport ticketing, e-books, audiovisual services, telephony, and the consumer hardware and operating systems those services run on. If your product emits a PDF that an end consumer in the EU can download, the PDF is in scope.

Penalty regimes vary by member state because each transposed the directive into local law:

| Member state | Transposing law | Maximum fine per violation |
|---|---|---|
| Germany | BFSG (Barrierefreiheitsstärkungsgesetz) | 100,000 EUR |
| France | Loi 2023-171 + decree 2023-1006 | 250,000 EUR (repeat) |
| Spain | Real Decreto 193/2023 | 1,000,000 EUR (very serious) |
| Italy | D.Lgs. 27 maggio 2022 n. 82 | 40,000 EUR |
| Netherlands | Implementatiewet toegankelijkheidsvoorschriften | 22,500 EUR |
| Belgium | Code de droit économique livre XV.131/3 | 80,000 EUR |

Fines are rare. Warning letters and demands to publish remediation timelines are not. The German Bundesfachstelle Barrierefreiheit confirmed in October 2025 that the first round of enforcement letters had been sent to e-commerce operators whose checkout PDFs were not tagged.

WCAG 2.2 vs PDF/UA-2: pick both

WCAG 2.2 is the W3C standard that defines accessibility at the content level. It is written for web content, but the success criteria apply to any electronic document. WCAG is what the EAA functional requirements map to.

PDF/UA-2, published as ISO 14289-2:2024, is the technical PDF specification that says how a PDF file must be structured to satisfy those criteria. The two are not alternatives. A compliance audit checks WCAG at the content level and PDF/UA at the format level.

The PDF Association's announcement of ISO 14289-2 describes PDF/UA-2 as the gold standard for accessibility in PDF 2.0. It updates PDF/UA-1 with the new tag namespace, role mapping, and structure rules introduced by ISO 32000-2 (PDF 2.0).

A short mapping of the WCAG criteria most relevant to PDF generation:

| WCAG 2.2 success criterion | What it means in a PDF | PDF/UA-2 mechanism |
|---|---|---|
| 1.1.1 Non-text content | Every image needs alt text or artifact marking | /Figure tag with /Alt entry, or /Artifact |
| 1.3.1 Info and relationships | Headings, lists, tables must be structurally marked | /H1../H6, /L /LI, /Table /TR /TH /TD |
| 1.3.2 Meaningful sequence | Reading order must match logical order | Order of children in the structure tree |
| 1.4.3 Contrast (minimum) | 4.5:1 for normal text, 3:1 for large text | Checked on rendered output, not in tags |
| 2.4.6 Headings and labels | Headings must describe the section | Heading text content |
| 3.1.1 Language of page | Document language must be declared | /Lang in the document catalog |
| 3.1.2 Language of parts | Inline language switches must be marked | /Lang on the parent /Span |
| 4.1.2 Name, role, value | Form fields must expose name and role | /Form annotations with /T and /TU |

PDF/UA-2 adds machine-checkable rules around tag nesting (a /TH only inside a /TR, a /LI only inside an /L, etc.) that veraPDF can verify deterministically. WCAG 2.2 contains criteria that no automated tool can judge: whether an alt text is actually meaningful, whether the reading order matches the visual intent, whether form field labels make sense in context. Both layers matter.
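To make "machine-checkable" concrete, here is a minimal sketch of a nesting check over a tag tree expressed as plain `{ type, children }` objects. The `ALLOWED_PARENTS` table and the function name are illustrative assumptions covering only a handful of rules; veraPDF implements the full rule set from ISO 14289-2, and this sketch is not a substitute for it.

```javascript
// Allowed parents for a few structure types. This is a small illustrative
// subset (an assumption), not the full containment rules of ISO 14289-2.
const ALLOWED_PARENTS = {
  TH: ["TR"],
  TD: ["TR"],
  TR: ["Table", "THead", "TBody", "TFoot"],
  LI: ["L"],
  LBody: ["LI"],
};

// Walk a { type, children } tree and collect every node whose parent
// is not in its allowed-parent list.
function findNestingViolations(node, parentType = null, path = "") {
  const here = `${path}/${node.type}`;
  const violations = [];
  const allowed = ALLOWED_PARENTS[node.type];
  if (allowed && !allowed.includes(parentType)) {
    violations.push(`${here}: ${node.type} not allowed inside ${parentType ?? "root"}`);
  }
  for (const child of node.children ?? []) {
    violations.push(...findNestingViolations(child, node.type, here));
  }
  return violations;
}

// Example: a /TH placed directly under /Table, and a well-formed list.
const broken = {
  type: "Document",
  children: [
    { type: "Table", children: [{ type: "TH", text: "Amount" }] },
    { type: "L", children: [{ type: "LI", text: "One" }] },
  ],
};
console.log(findNestingViolations(broken));
// [ '/Document/Table/TH: TH not allowed inside Table' ]
```

Because the check is a pure tree walk, it gives the same answer on every run, which is the property that makes this class of rule suitable for a CI gate.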

Why headless Chromium PDFs fail audits

Chromium has emitted Tagged PDF since version 85 (August 2020), and the tagging code has improved every year since. As of May 2026 the output is still not good enough to clear a strict PDF/UA-2 audit on a real-world document.

Three failure patterns appear in almost every audit of a Chromium-rendered PDF:

Heading levels guessed from font size. When a page uses CSS classes like .title and .subtitle instead of <h1> and <h2>, Chromium falls back to font-size heuristics to decide heading levels. The output frequently nests /H4 inside /H1 with no intermediate levels, which fails the strict heading hierarchy check.

Lists rendered as flat blocks. A "list" built with <div class="bullet"> plus a CSS ::before pseudo-element produces no list structure in the tag tree. The output is a series of /P tags with bullet characters in the text content. A screen reader announces "bullet, item one, bullet, item two" instead of "list of three items, item one of three."

Tables missing scope attributes. Even with a real <table>, omitting scope="col" on <th> elements makes Chromium emit /TH tags with no header-cell relationship, breaking screen-reader announcements like "row two, column total, value 1500".
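The heading failure is the easiest of the three to detect mechanically. As a rough sketch, assuming you have already extracted the heading levels from the structure tree in reading order (the extraction step and the function name are hypothetical), a skipped level shows up as a jump of more than one. The sketch also assumes a strict policy in which the first heading must be level 1.

```javascript
// Given heading levels in reading order (1 for /H1, 2 for /H2, ...),
// report each place where a heading jumps down more than one level.
function findHeadingSkips(levels) {
  const skips = [];
  let previous = 0; // 0 = no heading seen yet, so the first must be level 1
  levels.forEach((level, index) => {
    if (level > previous + 1) {
      skips.push({ index, from: previous, to: level });
    }
    previous = level; // moving back up to a shallower level is always fine
  });
  return skips;
}

console.log(findHeadingSkips([1, 2, 3, 2, 3])); // []
console.log(findHeadingSkips([1, 4, 2]));       // [ { index: 1, from: 1, to: 4 } ]
```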

A Tailwind-styled invoice page tagged by Chromium will typically produce a structure tree that looks roughly like this:

/Document
  /Sect
    /P "INVOICE"           ← was <h1>, demoted to /P
    /P "Invoice #INV-001"
    /P "Issued: 2026-06-01"
    /P "Acme Corp"
    /P "-"
    /Figure                ← decorative logo, no /Alt, not marked as artifact
    /Table
      /TR
        /TD "Description"  ← was <th> but no scope, downgraded to /TD
        /TD "Amount"
      /TR
        /TD "Consulting"
        /TD "$1,500"

The same page tagged correctly looks like this:

/Document /Lang=en
  /H1 "Invoice"
  /Sect
    /P "Invoice #INV-001"
    /P "Issued: 2026-06-01"
    /Artifact "Acme Corp logo"   ← decorative
    /Table
      /Caption "Line items"
      /THead
        /TR
          /TH scope=col "Description"
          /TH scope=col "Amount"
      /TBody
        /TR
          /TD "Consulting"
          /TD "$1,500"

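The second tree passes the PDF/UA-2 structure-tree rules. The first fails on four counts: the heading is tagged /P, the logo is a /Figure with neither /Alt nor artifact marking, the header cells are tagged /TD, and /Document carries no /Lang.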

Path 1: author for tagging from the start

The biggest under-investment in most PDF generation pipelines is not in tooling, it is in the source HTML. A clean semantic template removes most of the failure modes before any rendering engine touches it.

Concrete rules for HTML destined for PDF generation:

  • Use real semantic tags: <main>, <nav>, <aside>, <article>, <section>, <header>, <footer>, <h1> through <h6> in correct nesting order.
  • Use real list elements: <ul>, <ol>, <li>. Never simulate a list with <div> and CSS.
  • Use real table elements: <table>, <caption>, <thead>, <tbody>, <th scope="col"> or <th scope="row">, <td>.
  • Set lang on the root <html> element. Add lang on inline spans when a phrase switches language.
  • Add alt on every <img>. Use empty alt (alt="") for decorative images so they are marked as artifact in the PDF.
  • Use aria-label only when visible text is genuinely missing. Visible text is always better.
  • Avoid layout <table> elements. Tables in PDFs mean tabular data.

Chromium honors more of this than developers expect. Switch a template from class-based headings to real <h1> through <h3> and the structure tree improves immediately. Adding scope="col" to existing table headers takes 30 seconds and fixes one of the most common audit failures.
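Some of the rules above can be checked on the template before any rendering happens. The following is a deliberately crude sketch that uses regular expressions and therefore only catches the obvious cases; a production lint would parse the HTML properly. The function name and the choice of three checks are assumptions for illustration.

```javascript
// Crude template lint: flags a missing lang on <html>, <img> tags without
// an alt attribute, and <th> tags without a scope attribute.
// Regex matching is an approximation; it does not parse HTML.
function lintTemplate(html) {
  const problems = [];
  if (!/<html\b[^>]*\slang\s*=/i.test(html)) {
    problems.push("missing lang on <html>");
  }
  for (const [tag] of html.matchAll(/<img\b[^>]*>/gi)) {
    if (!/\salt\s*=/i.test(tag)) problems.push(`<img> without alt: ${tag}`);
  }
  // \b after "th" keeps <thead> from matching.
  for (const [tag] of html.matchAll(/<th\b[^>]*>/gi)) {
    if (!/\sscope\s*=/i.test(tag)) problems.push(`<th> without scope: ${tag}`);
  }
  return problems;
}

const sample = `<html><body>
<img src="logo.png">
<img src="chart.png" alt="Revenue grew 12% in Q2">
<table><thead><tr><th>Region</th><th scope="col">Revenue</th></tr></thead></table>
</body></html>`;

console.log(lintTemplate(sample));
// [ 'missing lang on <html>',
//   '<img> without alt: <img src="logo.png">',
//   '<th> without scope: <th>' ]
```

An empty `alt=""` passes this lint on purpose, since it is the correct marking for a decorative image.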

Path 2: Chromium plus post-process with veraPDF and PAC 2024

This is the most common production path in 2026 because Chromium is fast, scales horizontally, and renders CSS the way developers expect. The trick is to render, then validate, then fail the build on any non-passed check.

veraPDF is the open-source reference validator for the PDF/A and PDF/UA ISO standards. It is maintained by the PDF Association and the Open Preservation Foundation, and it is the validator the ISO working group itself uses to confirm conformance. veraPDF supports PDF/UA-1 (--flavour ua1) and PDF/UA-2 (--flavour ua2).

# Install veraPDF
brew install --cask verapdf      # macOS
# Or download from verapdf.org/releases
 
# Validate a PDF against PDF/UA-2
verapdf --flavour ua2 --format json invoice.pdf

A compliant file returns isCompliant: true in the JSON report. A non-compliant file lists every failed check with the rule from ISO 14289-2.

Wiring veraPDF into CI as a hard gate looks like this:

- name: Render PDF
  run: node scripts/render-invoice.mjs sample-data.json invoice.pdf
 
- name: Validate PDF/UA-2 with veraPDF
  run: |
    verapdf --flavour ua2 --format json invoice.pdf > report.json
    node scripts/check-verapdf.mjs report.json
 
- name: Upload veraPDF report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: verapdf-report
    path: report.json
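The workflow calls scripts/check-verapdf.mjs without showing it. A minimal sketch of the decision logic follows. The field names are assumptions: `report.jobs[].validationResult.isCompliant` mirrors the shape this article uses elsewhere, while `compliant`, an array-valued `validationResult`, and `itemDetails.name` are defensive guesses, so check them against the JSON your veraPDF version actually emits.

```javascript
// Summarize a parsed veraPDF JSON report.
// ASSUMPTIONS: the report shape (report.jobs[].validationResult) and the
// flag names (isCompliant / compliant) may differ between veraPDF versions.
function summarizeReport(parsed) {
  const jobs = parsed?.report?.jobs ?? [];
  const failures = [];
  for (const job of jobs) {
    const file = job.itemDetails?.name ?? "unknown";
    // Accept either a single result object or an array of results.
    const results = [].concat(job.validationResult ?? []);
    if (results.length === 0) {
      failures.push({ file, reason: "no validation result" });
    }
    for (const result of results) {
      const ok = result.isCompliant ?? result.compliant ?? false;
      if (!ok) failures.push({ file, reason: "not compliant" });
    }
  }
  // A report with no jobs is treated as a failure, not a pass.
  return { ok: jobs.length > 0 && failures.length === 0, failures };
}

const sampleReport = {
  report: {
    jobs: [
      { itemDetails: { name: "invoice.pdf" }, validationResult: { isCompliant: false } },
    ],
  },
};
const summary = summarizeReport(sampleReport);
console.log(summary.ok, summary.failures);
// false [ { file: 'invoice.pdf', reason: 'not compliant' } ]
```

In the real script you would read the path from `process.argv[2]`, parse the file with `JSON.parse`, and call `process.exit(1)` when `summary.ok` is false. Treating an empty or unparseable report as a failure keeps a broken validator run from silently passing the gate.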

PAC 2024 is the free PDF accessibility checker published by axes4 and originally developed with the Swiss foundation Access for All. It automates the machine-checkable Matterhorn Protocol checks and adds a screen reader preview and structure view that support the checks needing a human eye, like whether a reading order makes logical sense or whether alt text actually describes the image. Run veraPDF in CI for the machine-checkable rules. Run PAC 2024 manually before every release for the rest.

Path 3: WeasyPrint or Prince for high-fidelity tagging

WeasyPrint and Prince are HTML-to-PDF engines built specifically for paged media. Both produce significantly better tagged PDF output than Chromium today.

WeasyPrint is open source, written in Python, and supports PDF/UA-1 output via the --pdf-variant pdf/ua-1 flag (added in version 60). WeasyPrint 68, released in early 2026, ships CMYK color support per the project's changelog, closing one of the last gaps with commercial tools. WeasyPrint's PDF/UA-2 work is in progress as of May 2026 and tracked in the project's open issues.

Prince is commercial. It supports PDF/UA-1 and PDF/UA-2 directly and produces some of the cleanest tag trees in the industry. Prince licenses are not cheap, but for organizations that need PDF/UA-2 by next quarter and cannot wait for Chromium tooling to catch up, Prince is the fastest path.

Trade-offs vs Chromium:

| Concern | Chromium | WeasyPrint | Prince |
|---|---|---|---|
| Tag fidelity out of the box | Mediocre | Good | Excellent |
| CSS coverage | Reference browser | Good, minor gaps | Good, some Prince-specific extensions |
| Render speed | Sub-second | Few seconds | Sub-second |
| Memory per render | ~150 MB | ~80 MB | ~60 MB |
| License | Free | Free (BSD) | Commercial |
| PDF/UA-2 today | Requires post-process | PDF/UA-1 native, UA-2 in progress | Native |

PDF4.dev currently runs on Chromium for its CSS coverage and speed. For pipelines where PDF/UA-2 conformance is required on every render, the practical advice in 2026 is: prototype with WeasyPrint, validate with veraPDF, and switch to Prince if the rendering quality on your specific templates is not good enough.

Tag tree anatomy

Every PDF/UA-compliant document has a structure tree rooted at /Document. The standard tag types map closely to HTML semantics:

| HTML | PDF tag | Notes |
|---|---|---|
| <html> | /Document | Carries /Lang |
| <section>, <article> | /Sect | Generic grouping |
| <h1> through <h6> | /H1 through /H6 | PDF/UA-2 allows /H with /HeadingLevel |
| <p> | /P | Paragraph |
| <ul>, <ol> | /L | List |
| <li> | /LI | List item with optional /Lbl + /LBody |
| <table> | /Table | With optional /Caption |
| <thead>, <tbody>, <tfoot> | /THead, /TBody, /TFoot | Optional row groups |
| <tr> | /TR | Table row |
| <th> | /TH | Needs scope attribute |
| <td> | /TD | Data cell |
| <img> with alt | /Figure + /Alt | Required alt text |
| <img> decorative | /Artifact | Skipped by AT |
| <a> | /Link | With /Contents for screen reader |
| <form> | /Form | With annotation reference |

A correctly tagged invoice line items table in JSON-equivalent shape:

{
  "type": "Table",
  "children": [
    { "type": "Caption", "children": [{ "type": "P", "text": "Line items" }] },
    {
      "type": "THead",
      "children": [{
        "type": "TR",
        "children": [
          { "type": "TH", "scope": "col", "text": "Description" },
          { "type": "TH", "scope": "col", "text": "Amount" }
        ]
      }]
    },
    {
      "type": "TBody",
      "children": [{
        "type": "TR",
        "children": [
          { "type": "TD", "text": "Consulting" },
          { "type": "TD", "text": "$1,500" }
        ]
      }]
    }
  ]
}

A screen reader navigating this tree announces "Table, line items, two columns, header row Description Amount, row one Consulting fifteen hundred dollars."
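If you keep tag trees in this JSON-equivalent shape for tests or fixtures, a small builder avoids hand-writing the nesting each time. The sketch below is illustrative only: `buildTableTag` is a hypothetical helper, and the shape it produces is the simplified notation used in this article, not a PDF serialization.

```javascript
// Build the JSON-equivalent tag tree for a simple table with one header row.
// Throws if a data row does not have one cell per column header.
function buildTableTag(caption, headers, rows) {
  rows.forEach((cells, i) => {
    if (cells.length !== headers.length) {
      throw new Error(`row ${i} has ${cells.length} cells, expected ${headers.length}`);
    }
  });
  return {
    type: "Table",
    children: [
      { type: "Caption", children: [{ type: "P", text: caption }] },
      {
        type: "THead",
        children: [{
          type: "TR",
          children: headers.map((text) => ({ type: "TH", scope: "col", text })),
        }],
      },
      {
        type: "TBody",
        children: rows.map((cells) => ({
          type: "TR",
          children: cells.map((text) => ({ type: "TD", text })),
        })),
      },
    ],
  };
}

const lineItems = buildTableTag(
  "Line items",
  ["Description", "Amount"],
  [["Consulting", "$1,500"]],
);
console.log(JSON.stringify(lineItems, null, 2));
```

The call above reproduces the line items tree shown earlier. The row-length check is there because a ragged row is one of the quickest ways to produce header-cell relationships that no longer line up.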

Reading order and the artifact marker

Content order in the tag tree is what assistive technology users hear. Visual position on the page is irrelevant. A footer placed at the bottom of the visual page but listed first in the tag tree will be read first.

Two practical consequences:

Decorative content goes under /Artifact. Backgrounds, decorative icons, running page numbers, header rules, and footer chrome should all be marked as artifacts. Screen readers skip artifacts entirely. The PDF/UA-2 standard makes artifact marking mandatory for decorative content.

In HTML, the closest signal is role="presentation" plus aria-hidden="true":

<svg role="presentation" aria-hidden="true" width="24" height="24">
  <circle cx="12" cy="12" r="10" fill="#7c3aed" />
</svg>

Chromium and WeasyPrint both honor this and emit the SVG as /Artifact.

Logical order must match visual order. Multi-column layouts are the worst offender: a two-column print layout that places "continued from previous column" text in the source HTML before the column-one paragraph will be read in the wrong order. The fix is to write the HTML in logical reading order and let CSS columns handle the visual flow.
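Both consequences fall out of how assistive technology consumes the tree: a depth-first walk in child order that ignores artifacts. The sketch below models that walk on the simplified `{ type, text, children }` notation used in this article, where artifacts appear as nodes. That notation is an assumption for illustration and not how a PDF file stores them.

```javascript
// Linearize a tag tree the way a screen reader would encounter it:
// depth-first, in child order, skipping anything marked as an artifact.
function readingOrder(node, out = []) {
  if (node.type === "Artifact") return out; // skipped, including its children
  if (node.text) out.push(node.text);
  for (const child of node.children ?? []) readingOrder(child, out);
  return out;
}

const page = {
  type: "Document",
  children: [
    { type: "P", text: "Footer: Acme Corp" }, // visually last, but first in the tree
    { type: "H1", text: "Invoice" },
    { type: "Artifact", text: "decorative rule" },
    { type: "P", text: "Issued: 2026-06-01" },
  ],
};
console.log(readingOrder(page));
// [ 'Footer: Acme Corp', 'Invoice', 'Issued: 2026-06-01' ]
```

The footer is read first because it is first in the tree, regardless of where it sits on the page, and the decorative rule never appears in the output.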

Alt text rules that actually pass audits

Alt text is the most-failed PDF/UA criterion in our reviews. Two rules cover most of the problem space, with one caveat about automation:

Decorative images get empty alt (alt=""). This signals "image present but not meaningful" and produces /Artifact in the tag tree. The image is silently skipped by screen readers.

Content images get a meaningful alt. "Meaningful" means it describes what the image conveys in context. A chart needs alt text summarizing the data point, not "bar chart". A logo at the top of an invoice needs alt text "Acme Corp logo" only if the company name is not also visible elsewhere on the page. The W3C WCAG 2.2 understanding document for 1.1.1 gives the canonical decision tree.
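The two rules translate into a small decision function. This is a sketch under stated assumptions: `classifyImage` and the `PLACEHOLDER_ALTS` list are hypothetical, and the placeholder check can only flag text for human review, since no code can decide whether an alt is meaningful in context.

```javascript
// Hypothetical placeholder words that usually signal a non-descriptive alt.
const PLACEHOLDER_ALTS = new Set(["image", "picture", "graphic", "chart", "bar chart", "logo"]);

// Decide how an <img> should be tagged, based on its attributes.
function classifyImage(attrs) {
  if (typeof attrs.alt !== "string") {
    // No alt attribute at all: neither rule is satisfied.
    return { tag: "Figure", problem: "missing alt attribute" };
  }
  const alt = attrs.alt.trim();
  if (alt === "") return { tag: "Artifact" }; // rule 1: decorative image
  if (PLACEHOLDER_ALTS.has(alt.toLowerCase())) {
    return { tag: "Figure", alt, problem: "alt looks generic, needs human review" };
  }
  return { tag: "Figure", alt }; // rule 2: content image with its own alt
}

console.log(classifyImage({ src: "divider.png", alt: "" }));
// { tag: 'Artifact' }
console.log(classifyImage({ src: "q2.png", alt: "Revenue grew 12% in Q2" }));
// { tag: 'Figure', alt: 'Revenue grew 12% in Q2' }
console.log(classifyImage({ src: "q2.png", alt: "Bar chart" }).problem);
// alt looks generic, needs human review
```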

Never auto-generate alt text with a vision model for compliance. AI-generated alt is fine as a starting point that a human reviews, but a fully automated pipeline that ships AI alt text directly to production will not pass an audit. Auditors flag AI alt because it is generic ("an image of a chart") and frequently wrong. Success criterion 1.1.1 asks for a text alternative that serves the equivalent purpose, and judging that takes a human who understands the content.

Color, contrast, and the 4.5:1 rule

WCAG 2.2 success criterion 1.4.3 requires a contrast ratio of at least 4.5:1 between text and background for normal text, and 3:1 for large text (18 point or 14 point bold). This is checked on the rendered output, not in the tag tree.
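The ratio itself comes from the WCAG definitions of relative luminance and contrast ratio, and it is short enough to compute directly. The sketch below assumes six-digit hex colors and fully opaque text on a solid background; the function names are illustrative.

```javascript
// Convert one 8-bit sRGB channel to its linear value (WCAG 2.x formula).
function channelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a six-digit hex color such as "#7c3aed".
function relativeLuminance(hex) {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 up to about 21.
function contrastRatio(foreground, background) {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const [lighter, darker] = a >= b ? [a, b] : [b, a];
  return (lighter + 0.05) / (darker + 0.05);
}

// Level AA thresholds from success criterion 1.4.3.
function passesAA(foreground, background, isLargeText = false) {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}

console.log(contrastRatio("#000000", "#ffffff").toFixed(2)); // 21.00
console.log(passesAA("#767676", "#ffffff"));                 // true
console.log(passesAA("#777777", "#ffffff"));                 // false
console.log(passesAA("#777777", "#ffffff", true));           // true (large text)
```

The one-step difference between `#767676` and `#777777` shows why gray body text is worth checking in code: the threshold is sharp, and the two colors look identical on screen.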

The cheapest way to enforce this is to audit the HTML before rendering with axe-core or pa11y. Both run in Node and can be wired into the same CI step that renders the PDF:

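import { AxeBuilder } from "@axe-core/playwright";
import { chromium } from "playwright";

// `html` is the rendered template string that will later be printed to PDF.
const browser = await chromium.launch();
// @axe-core/playwright expects a page created from an explicit browser context.
const context = await browser.newContext();
const page = await context.newPage();
await page.setContent(html, { waitUntil: "load" });

// axe-core tags the color-contrast rule as wcag2aa, and tags are not cumulative,
// so "wcag22aa" on its own would skip it. List every level you want checked.
const results = await new AxeBuilder({ page })
  .withTags(["wcag2a", "wcag2aa", "wcag21aa", "wcag22aa"])
  .analyze();

const contrast = results.violations.filter((v) => v.id === "color-contrast");
if (contrast.length > 0) {
  console.error("Contrast failures:", JSON.stringify(contrast, null, 2));
  await browser.close();
  process.exit(1);
}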

Run this before page.pdf() and the rest of the pipeline runs only on HTML that already passed the contrast check.

Tables: the audit killer

Tables fail audits more often than any other content type. A "fake table built with divs" fails three success criteria at once:

  • 1.3.1 Info and relationships (no header-cell relationships)
  • 1.3.2 Meaningful sequence (CSS grid reordering breaks reading order)
  • 4.1.2 Name, role, value (no /Table role)

A real <table> with <caption>, <thead>, <th scope="col">, and <tbody> passes all three on most rendering engines. The minimum compliant pattern:

<table>
  <caption>Q2 2026 revenue by region</caption>
  <thead>
    <tr>
      <th scope="col">Region</th>
      <th scope="col">Revenue (EUR)</th>
      <th scope="col">Growth</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">EMEA</th>
      <td>1,250,000</td>
      <td>+12%</td>
    </tr>
    <tr>
      <th scope="row">APAC</th>
      <td>820,000</td>
      <td>+8%</td>
    </tr>
  </tbody>
</table>

scope="row" on the first cell of each data row is what lets a screen reader announce "EMEA, revenue 1.25 million euros, growth 12 percent" instead of three unrelated cell values. Most templates skip it.
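Because the pattern is so regular, it is usually safer to generate it from data than to hand-write it per template. A minimal sketch, assuming the first value in each row is the row header; `renderTable` and `escapeHtml` are hypothetical helpers and not part of any library named in this article.

```javascript
// Escape text for safe insertion into HTML element content and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a data table with a caption, column headers, and a row header
// in the first cell of every data row.
function renderTable({ caption, columns, rows }) {
  const head = columns
    .map((label) => `<th scope="col">${escapeHtml(label)}</th>`)
    .join("");
  const body = rows
    .map(([first, ...rest]) => {
      const cells = rest.map((v) => `<td>${escapeHtml(v)}</td>`).join("");
      return `<tr><th scope="row">${escapeHtml(first)}</th>${cells}</tr>`;
    })
    .join("");
  return (
    `<table><caption>${escapeHtml(caption)}</caption>` +
    `<thead><tr>${head}</tr></thead><tbody>${body}</tbody></table>`
  );
}

const tableHtml = renderTable({
  caption: "Q2 2026 revenue by region",
  columns: ["Region", "Revenue (EUR)", "Growth"],
  rows: [
    ["EMEA", "1,250,000", "+12%"],
    ["APAC", "820,000", "+8%"],
  ],
});
console.log(tableHtml);
```

Putting the scope attributes in one helper means no individual template can forget them.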

Complex tables with merged cells need headers attributes referring to id values on the header cells, but the simpler the table, the better the audit result. When in doubt, split a complex table into two simple ones.

How AI agents benefit from tagged PDFs too

A tagged PDF is a free annotation for downstream LLM ingestion. RAG pipelines that parse PDFs face the same problem screen readers do: an untagged PDF is a flat stream of glyph positions, and the pipeline has to reconstruct headings, paragraphs, lists, tables, and reading order from layout heuristics. The reconstruction is slow, expensive, and frequently wrong on multi-column or table-heavy documents.

A tagged PDF gives the pipeline an explicit reading order from the tag tree, explicit heading boundaries to chunk on, and explicit table structure with row and column relationships intact. The same investment that makes a document accessible to a blind user makes it 5 to 10 times more accurate to ingest into a vector store.
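As an illustration of chunking on heading boundaries, here is a minimal sketch. It assumes a PDF parser has already produced a flat, reading-order list of `{ type, text }` nodes from the tag tree; that extraction step, the node shape, and the function name are all assumptions.

```javascript
// Group a flat, reading-order list of tagged nodes into chunks that each
// start at a heading. Artifacts and empty nodes are dropped.
function chunkByHeadings(nodes) {
  const chunks = [];
  let current = null;
  for (const node of nodes) {
    if (/^H[1-6]$/.test(node.type)) {
      current = { heading: node.text, level: Number(node.type.slice(1)), text: [] };
      chunks.push(current);
    } else if (node.type !== "Artifact" && node.text) {
      if (!current) {
        // Content before the first heading gets its own untitled chunk.
        current = { heading: null, level: 0, text: [] };
        chunks.push(current);
      }
      current.text.push(node.text);
    }
  }
  return chunks;
}

const taggedNodes = [
  { type: "H1", text: "Invoice" },
  { type: "P", text: "Invoice #INV-001" },
  { type: "Artifact", text: "logo" },
  { type: "P", text: "Issued: 2026-06-01" },
  { type: "H2", text: "Line items" },
  { type: "P", text: "Consulting: $1,500" },
];
console.log(chunkByHeadings(taggedNodes));
// [ { heading: 'Invoice', level: 1, text: [ 'Invoice #INV-001', 'Issued: 2026-06-01' ] },
//   { heading: 'Line items', level: 2, text: [ 'Consulting: $1,500' ] } ]
```

Each chunk carries its heading and level, so a retriever can store them as metadata instead of inferring section boundaries from font sizes.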

This is the strongest internal business case for tagging: it pays off twice. Once for compliance, once for every AI feature that consumes the document later.

Testing in CI

The minimum viable pipeline is render, validate, fail on regression:

#!/usr/bin/env bash
set -euo pipefail
 
# 1. Render the PDF with PDF4.dev (or your engine of choice)
curl -sS -X POST https://pdf4.dev/api/v1/render \
  -H "Authorization: Bearer $PDF4_API_KEY" \
  -H "Content-Type: application/json" \
  -d @sample-data.json \
  -o invoice.pdf
 
# 2. Validate against PDF/UA-2 with veraPDF
verapdf --flavour ua2 --format json invoice.pdf > report.json
 
# 3. Fail the build on any non-passed check
node -e "
  const r = require('./report.json').report.jobs[0].validationResult;
  if (!r.isCompliant) {
    console.error(\`PDF/UA-2 audit failed: \${r.totalAssertions} checks\`);
    process.exit(1);
  }
"

PDF4.dev logs every render in the dashboard logs page with status, duration, and template id, so regressions caught by veraPDF in CI can be cross-referenced against the exact render that produced the failing file.

For documents generated on demand in production (one-off invoices, statements, certificates), run veraPDF as a sidecar service and reject any render whose validation fails before the file ever reaches the customer. The veraPDF CLI takes around 200 to 400 ms per page on a modern server, which fits inside most acceptable render budgets.

When to ship PDF/UA-2 plus PDF/A-3 together

Accessibility and archival are orthogonal but compatible. A PDF can be both PDF/UA-2 and PDF/A-3 at the same time, and most regulated archives now expect both. PDF/A-3 covers the long-term reproducibility requirement: every font embedded, every color profile embedded, no JavaScript, no encryption, valid XMP metadata. PDF/UA-2 covers the accessibility requirement on top.

The combination is the right default for any document that needs to be retained for years and made available to end users in the EU: e-invoices, statements, contracts, government correspondence. The PDF/A compliance guide covers the archival side in detail. The two standards do not conflict, and the tooling (Ghostscript for PDF/A, veraPDF for both) overlaps cleanly in a CI pipeline.

What's next: PDF/UA-2 audits ramp up in 2026 to 2027

The German enforcement body has signaled that 2026 is the year warning letters convert into fines for repeat non-compliance, and the French body is expected to start enforcement in the second half of 2026. Industry-specific regulators (banking, insurance, transport) are layering their own accessibility requirements on top of the EAA baseline. The EU's Web Accessibility Directive (EU) 2016/2102 has applied accessibility requirements to public-sector documents, PDFs included, since 2018, and the experience there is that the first two years bring warnings and the third year brings fines.

The compliance path is technical. Author semantic HTML, render with a tagging-aware engine, embed PDF/UA-2 XMP metadata, mark decorative content as artifact, validate with veraPDF in CI, inspect with PAC 2024 before launch, and test with a real screen reader before shipping. Each step is small. Skipping any of them is what makes audits fail.

PDF4.dev is investing in this pipeline through 2026: tagged-PDF improvements in the rendering engine, veraPDF as an opt-in validation step on every render, and a PDF/UA-2 conformance badge in the dashboard logs. If you want to be notified when these ship, create an account and watch the changelog.
