AI document generation combines LLMs with PDF rendering APIs to create invoices, reports, contracts, and certificates from natural language prompts or structured data. The LLM writes or populates the content; a headless browser or PDF API converts it to a file. This guide covers the architecture, code patterns, and production considerations for building AI-powered document workflows in 2026.
What AI document generation actually is
AI document generation uses a large language model (LLM) at some point in the document creation pipeline. This ranges from simple (LLM writes the text, human pastes it into a template) to fully agentic (LLM receives a prompt, calls a PDF API via MCP, and returns a download link).
The key distinction from traditional document automation: traditional systems require you to define every field and template in advance. AI systems can infer structure, write variable content, or create templates dynamically from a description.
There are three main patterns, ordered by LLM involvement:
| Pattern | LLM role | Data source | Best for |
|---|---|---|---|
| Template generation | Creates the HTML/CSS template once | Your database | Consistent branded documents |
| Content generation | Writes the body text for each document | LLM inference | Reports, summaries, analysis |
| Full agentic | Selects template, fills data, renders PDF | Mixed | Conversational workflows, AI assistants |
Pattern 1: LLM generates the template, you render the data
The most production-safe approach. Use the LLM to write a Handlebars HTML template once, then pass real data at render time. The LLM never touches customer data — it only creates the structure.
Why this works: LLMs are excellent at writing HTML and CSS. Generating a pixel-perfect invoice layout from a plain-English description takes seconds. Doing it by hand takes hours. The LLM's creative role is separated from the data integrity requirement.
Example: generate an invoice template with Claude
Send this prompt to Claude or GPT-4o:
Create an HTML invoice template using Handlebars syntax.
Variables: {{company.name}}, {{company.logo}}, {{invoice.number}},
{{invoice.date}}, {{invoice.dueDate}}, {{customer.name}},
{{customer.address}}, {{#each items}}{{name}}, {{qty}}, {{unitPrice}}, {{total}}{{/each}},
{{subtotal}}, {{tax}}, {{grandTotal}}.
Design: clean, professional, A4 width, CSS print styles, page-break-inside: avoid on table rows.
Output: single HTML file with embedded CSS, no external dependencies.
The LLM returns a complete HTML template in under 10 seconds. Save it as a template in PDF4.dev, then call the render API with your data:
import fetch from "node-fetch";
import fs from "node:fs";
const response = await fetch("https://pdf4.dev/api/v1/render", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
templateId: "tpl_invoice_001",
variables: {
company: { name: "Acme Corp", logo: "https://acme.com/logo.png" },
invoice: {
number: "INV-2026-0042",
date: "2026-04-02",
dueDate: "2026-05-02",
},
customer: {
name: "Globex Ltd",
address: "742 Evergreen Terrace, Springfield",
},
items: [
{ name: "API access (Pro)", qty: 1, unitPrice: 99, total: 99 },
{ name: "Overage (1,200 renders)", qty: 1200, unitPrice: 0.002, total: 2.4 },
],
subtotal: 101.4,
tax: 20.28,
grandTotal: 121.68,
},
}),
});
const pdf = await response.arrayBuffer();
await fs.promises.writeFile("invoice.pdf", Buffer.from(pdf));

The LLM created the template; your database provides the data. No hallucinated totals, no missing fields.
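If you want to preview a generated template locally before saving it to the dashboard, a minimal placeholder filler is enough to check the variable wiring. This sketch is a stand-in for the real Handlebars engine, not a replacement: it resolves only dotted `{{path}}` placeholders and ignores `{{#each}}` blocks.

```typescript
// Minimal stand-in for the template engine: resolves dotted {{path}}
// placeholders against a data object. Illustrative only; the real
// render step uses full Handlebars on the server.
function fillPlaceholders(template: string, data: Record<string, unknown>): string {
  return template.replace(/\{\{([\w.]+)\}\}/g, (_, path: string) => {
    const value = path
      .split(".")
      .reduce<unknown>((obj, key) => (obj as Record<string, unknown>)?.[key], data);
    return value == null ? "" : String(value);
  });
}

const preview = fillPlaceholders(
  "<h1>Invoice {{invoice.number}}</h1><p>Bill to: {{customer.name}}</p>",
  {
    invoice: { number: "INV-2026-0042" },
    customer: { name: "Globex Ltd" },
  }
);
// preview === "<h1>Invoice INV-2026-0042</h1><p>Bill to: Globex Ltd</p>"
```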
Pattern 2: LLM generates the document content
For content-heavy documents — monthly reports, analysis summaries, meeting minutes — the LLM writes the body text. A fixed HTML shell wraps the generated content and renders to PDF.
When to use this: when the document content is unique per render (a market analysis, a personalized report, a legal summary) and cannot be populated from a database alone. The LLM's language generation is the primary value.
Production pattern for AI-generated reports
import Anthropic from "@anthropic-ai/sdk";
import fetch from "node-fetch";
// Step 1: generate content with the LLM
const claude = new Anthropic();
const contentResponse = await claude.messages.create({
model: "claude-opus-4-5",
max_tokens: 2000,
messages: [
{
role: "user",
content: `Write a monthly performance summary for ${customer.name}.
Data: ${JSON.stringify(metrics)}
Format: HTML fragments only (no <html>/<body> tags).
Include: executive summary (2 sentences), key metrics table, 3 highlights, 2 risks.
Tone: professional, specific, no filler words.`,
},
],
});
const generatedHtml = contentResponse.content[0].text;
// Step 2: wrap in branded shell and render to PDF
const htmlShell = `
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<style>
body { font-family: Inter, sans-serif; color: #111; max-width: 680px; margin: 40px auto; }
h1 { font-size: 24px; color: #1a1a2e; }
table { width: 100%; border-collapse: collapse; }
th, td { padding: 8px 12px; border: 1px solid #e5e5e5; text-align: left; }
th { background: #f5f5f5; }
@page { margin: 20mm; }
</style>
</head>
<body>
<h1>${customer.name} — Monthly Report (${month})</h1>
${generatedHtml}
<footer style="margin-top:40px;font-size:11px;color:#666;">
Generated ${new Date().toISOString().split("T")[0]} · Confidential
</footer>
</body>
</html>
`;
// Step 3: render with PDF4.dev
const pdfResponse = await fetch("https://pdf4.dev/api/v1/render", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ html: htmlShell }),
});
const pdf = await pdfResponse.arrayBuffer();

The LLM inference runs in roughly 2-4 seconds for a 2,000-token response. The PDF render adds another 1-2 seconds. Total end-to-end time for an AI-generated, branded, print-ready PDF: under 6 seconds.
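Because the LLM's HTML fragment is interpolated directly into the shell, it is worth stripping anything executable before rendering. Here is a minimal illustrative sanitizer (the function is our own; a production system should use a vetted library such as DOMPurify or sanitize-html):

```typescript
// Minimal illustrative sanitizer for LLM-generated HTML fragments:
// removes <script> blocks and inline on* event handlers before the
// fragment is embedded in the branded shell.
function sanitizeFragment(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/\son\w+\s*=\s*("[^"]*"|'[^']*'|\S+)/gi, "");
}
```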
Pattern 3: fully agentic via MCP
The Model Context Protocol (MCP) lets an AI agent call external APIs mid-conversation. PDF4.dev provides an MCP server that exposes PDF generation as a tool. The agent can list templates, select one, fill variables, and render the PDF — all in a single conversation turn.
When to use this: in AI assistants, Slack bots, or customer-facing chatbots where a user asks "generate my invoice" and the system handles everything without a developer writing orchestration code.
Set up PDF4.dev MCP in Claude Desktop
Add this to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"pdf4dev": {
"url": "https://pdf4.dev/api/mcp/sse",
"headers": {
"Authorization": "Bearer YOUR_API_KEY"
}
}
}
}

After restarting Claude Desktop, Claude can:
- List your saved templates
- Render a template with variables you describe in natural language
- Return a PDF download link
Example conversation:
User: Generate an invoice for Globex Ltd for 3 months of API access at $99/month.
Claude: I'll create that invoice for you. [calls `pdf4dev_render` with template variables] Here's your invoice: [download link]
No backend code. No orchestration layer. The agent handles the data extraction, template selection, and API call.
Full MCP setup for multiple AI clients (Cursor, ChatGPT, Windsurf) is covered in the MCP protocol explained guide.
Preventing hallucinations in document generation
LLMs hallucinate. In documents, a hallucinated invoice total or wrong customer name is a serious problem. Three patterns prevent this:
1. Separate structure from data. Let the LLM create the Handlebars template (structure). Pass verified data from your database at render time. The LLM never sees or generates numerical values.
2. Validate before rendering. For content-heavy documents, parse the LLM output before passing it to the PDF API. Check that required fields exist, that numbers are plausible, that no fields are empty. Reject and retry if validation fails.
3. Use constrained output. Ask the LLM for JSON, not free-form HTML. Define a schema:
import { z } from "zod";

const schema = z.object({
executiveSummary: z.string().max(300),
highlights: z.array(z.string()).length(3),
risks: z.array(z.string()).length(2),
metrics: z.record(z.number()),
});
// Parse LLM output against schema before using it
const validated = schema.parse(JSON.parse(llmOutput));

A structured output failure rate under 1% is achievable with GPT-4o and Claude Sonnet with the right prompting. For critical documents (contracts, tax invoices), always add a human review step before delivery.
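The "validate before rendering" step can also include a numeric plausibility check: reject any document whose line items do not sum to the stated subtotal. An illustrative helper (the function name and tolerance are our own choices):

```typescript
// Illustrative plausibility check: the stated subtotal must match the
// sum of line-item amounts to within one cent.
function subtotalIsPlausible(
  items: { qty: number; unitPrice: number }[],
  statedSubtotal: number,
  toleranceCents = 1
): boolean {
  const computed = items.reduce((sum, i) => sum + i.qty * i.unitPrice, 0);
  return Math.abs(computed - statedSubtotal) * 100 <= toleranceCents;
}
```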
Choosing an AI model for document generation
Different documents need different models. The tradeoff is speed vs. quality vs. cost.
| Document type | Recommended model | Latency | Cost per doc |
|---|---|---|---|
| Invoices (template fill) | GPT-4o-mini / Claude Haiku | ~0.5s | ~$0.0003 |
| Monthly reports (1,500 words) | Claude Sonnet 4.5 / GPT-4o | ~3s | ~$0.005 |
| Legal summaries (complex analysis) | Claude Opus 4.5 / o3 | ~8s | ~$0.02 |
| Certificates (short text) | GPT-4o-mini | ~0.3s | ~$0.0001 |
| Batch (>1,000 docs/day) | Claude Haiku / GPT-4o-mini | ~0.5s | ~$0.0001 |
For most business document use cases, a mid-tier model (Claude Sonnet, GPT-4o) hits the right quality/cost balance. Reserve frontier models (Opus, o3) for documents where quality errors have real consequences (legal, financial).
AI document generation at scale: production concerns
Generating 10 documents a day is easy. Generating 10,000 requires engineering decisions.
Concurrency. LLM APIs have rate limits (tokens per minute). PDF rendering APIs have concurrent request limits. Design your queue to stay within both. PDF4.dev handles concurrent renders gracefully — multiple simultaneous requests to /api/v1/render are queued server-side.
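A simple in-process limiter is often enough to stay under both ceilings. This sketch (our own helper, not part of any SDK) processes a batch with at most `limit` documents in flight at once:

```typescript
// Illustrative concurrency limiter: process items with at most `limit`
// tasks in flight, so the pipeline stays within both the LLM token
// rate limit and the PDF API's concurrent-request limit.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // synchronous claim of the next index is race-free in JS
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Each item here would be one document: an LLM call followed by a render call, wrapped in a single `fn`.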
Caching templates. If the LLM generates a template once per document type, cache it. Don't regenerate the same invoice template 1,000 times. Store it in your PDF4.dev dashboard and reference it by templateId.
Retry logic. LLM calls fail. PDF renders fail. Build exponential backoff with a max of 3 retries. Log failures separately from successes so you can identify patterns (a specific template that always fails, an LLM prompt that consistently returns malformed HTML).
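The retry wrapper itself can be a dozen lines. A sketch (our own helper) with exponential backoff and a configurable base delay:

```typescript
// Illustrative retry helper: exponential backoff, up to maxRetries
// re-attempts with delays of base, 2x base, 4x base, and so on.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Wrap the LLM call and the render call separately so the failure logs tell you which stage is flaky.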
Cost control. At 10,000 documents/day with GPT-4o for content generation at $0.005/doc, you're spending $50/day on LLM inference alone. Profile your actual token usage before scaling. Use smaller models for simpler document types.
Audit trail. For invoices, contracts, and compliance documents, log the exact inputs (variables, template version) and the resulting PDF hash. This lets you reproduce any document exactly and prove what was generated.
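An audit record can be as small as two SHA-256 hashes, computed with Node's built-in crypto module. The record shape below is our own illustration, not a prescribed format:

```typescript
import { createHash } from "node:crypto";

// Illustrative audit record: hash the exact render inputs and the
// resulting PDF bytes so any document can be reproduced and verified.
function auditRecord(
  templateId: string,
  templateVersion: string,
  variables: unknown,
  pdf: Buffer
) {
  const inputHash = createHash("sha256")
    .update(JSON.stringify({ templateId, templateVersion, variables }))
    .digest("hex");
  const pdfHash = createHash("sha256").update(pdf).digest("hex");
  return {
    templateId,
    templateVersion,
    inputHash,
    pdfHash,
    generatedAt: new Date().toISOString(),
  };
}
```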
Comparison: AI document generation approaches
| Approach | Setup | Flexibility | Hallucination risk | Cost |
|---|---|---|---|---|
| LLM template + data API | Low | High | Very low | Low |
| LLM content + fixed shell | Medium | High | Medium | Medium |
| Full agentic (MCP) | Low | Very high | Medium | Medium |
| Traditional template only | High | Low | None | Very low |
| Manual document creation | None | Unlimited | None | High (human time) |
The right choice depends on your document type. For invoices and certificates, use LLM template + data API. For personalized reports, use LLM content + fixed shell. For conversational AI products, use MCP.
Building a complete AI invoice generator
Here is a complete, production-ready example combining all three steps: LLM template generation, data injection, and PDF rendering.
import Anthropic from "@anthropic-ai/sdk";
import fetch from "node-fetch";
import { z } from "zod";
import fs from "node:fs";
const INVOICE_TEMPLATE_ID = "tpl_invoice_ai_001"; // pre-created in PDF4.dev
const invoiceSchema = z.object({
number: z.string(),
date: z.string(),
dueDate: z.string(),
customer: z.object({ name: z.string(), address: z.string(), email: z.string() }),
items: z.array(z.object({ name: z.string(), qty: z.number(), unitPrice: z.number() })),
});
// Step 1: extract structured data from natural language
async function extractInvoiceData(prompt: string) {
const claude = new Anthropic();
const response = await claude.messages.create({
model: "claude-haiku-4-5",
max_tokens: 500,
messages: [
{
role: "user",
content: `Extract invoice data from this request and return JSON matching the schema.
Schema: {number, date, dueDate, customer: {name, address, email}, items: [{name, qty, unitPrice}]}
Request: ${prompt}
Today: ${new Date().toISOString().split("T")[0]}
Return only valid JSON.`,
},
],
});
return invoiceSchema.parse(JSON.parse(response.content[0].text));
}
// Step 2: calculate totals (never let LLM do math)
function calculateTotals(items: { qty: number; unitPrice: number }[]) {
const subtotal = items.reduce((sum, i) => sum + i.qty * i.unitPrice, 0);
const tax = subtotal * 0.2;
return { subtotal, tax, grandTotal: subtotal + tax };
}
// Step 3: render PDF via PDF4.dev
async function renderInvoice(data: z.infer<typeof invoiceSchema>) {
const totals = calculateTotals(data.items);
const response = await fetch("https://pdf4.dev/api/v1/render", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
templateId: INVOICE_TEMPLATE_ID,
variables: {
...data,
items: data.items.map((i) => ({
...i,
total: (i.qty * i.unitPrice).toFixed(2),
})),
...totals,
},
}),
});
return response.arrayBuffer();
}
// Main
async function generateInvoiceFromPrompt(prompt: string) {
const data = await extractInvoiceData(prompt);
const pdf = await renderInvoice(data);
return pdf;
}
// Example usage
const pdf = await generateInvoiceFromPrompt(
"Invoice for TechCorp ([email protected], 100 Main St) for 2 months of Pro plan at $199/month. Invoice #INV-2026-007, due in 30 days."
);
await fs.promises.writeFile("output.pdf", Buffer.from(pdf));

Note that totals are calculated in application code, not by the LLM. This is the single most important guard against document errors in AI generation pipelines.
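One refinement to the totals helper above: IEEE-754 floats can drift on money math, so for high-volume billing it is safer to compute in integer cents. A variant of `calculateTotals` under that assumption (flat 20% tax rate, as in the example):

```typescript
// Variant of the article's totals helper that works in integer cents
// to avoid floating-point drift; assumes a flat 20% tax rate.
function calculateTotalsInCents(items: { qty: number; unitPriceCents: number }[]) {
  const subtotal = items.reduce((sum, i) => sum + i.qty * i.unitPriceCents, 0);
  const tax = Math.round(subtotal * 0.2);
  return { subtotal, tax, grandTotal: subtotal + tax };
}
```

Convert to dollars only at the formatting boundary, when the values are injected into the template.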
For more invoice generation patterns, see the programmatic invoice generation guide and the AI-powered invoice generation deep dive.
What comes next: document agents
The next evolution of AI document generation is multi-step document agents: systems that can reason about which document to create, retrieve relevant data from external sources, draft, review, and revise — all without human intervention.
The building blocks exist today: LLMs with tool use (function calling), MCP for API access, vector databases for template retrieval, and reliable PDF APIs for rendering. What's missing is orchestration tooling mature enough for production document workflows at scale.
If you are building in this space, the generate PDFs with AI agents guide covers the MCP setup and agentic patterns in detail.
To get started with AI document generation in your application, sign up for a PDF4.dev account. The free tier includes 50 renders per month, full API access, and the MCP server for AI agent integration.
Start generating PDFs
Build PDF templates with a visual editor. Render them via API from any language in ~300ms.