PDF conversion is the process of transforming a PDF file into another format (JPG, PNG, Word, HTML) or converting another format into PDF. It is one of the most common document processing tasks, with over 290 billion PDFs created annually and constant demand to exchange content between formats. This guide covers every major conversion direction with working code examples, recommended tools, and practical parameters for each use case.
PDF conversion: direction and format overview
The conversion direction determines which tools and techniques apply. There is no universal "convert PDF" tool because the underlying requirements differ completely.
| Conversion | Common use cases | Key constraint |
|---|---|---|
| PDF to JPG | Thumbnails, email, social media | Raster output, no text layer |
| PDF to PNG | Diagrams, transparency, archives | Larger files than JPG |
| PDF to Word (DOCX) | Editing existing documents | Layout fidelity varies |
| PDF to plain text | Data extraction, indexing | Loses all formatting |
| HTML to PDF | Document generation, invoices | Requires headless browser |
| Image to PDF | Packaging files, scanning | Rasterized PDF (no text layer) |
| PDF to PDF/A | Long-term archiving | ISO 19005 compliance |
The most common use cases are PDF to image (for previewing and sharing) and HTML to PDF (for programmatically generating documents). We cover both in depth below.
How to convert PDF to JPG
Converting a PDF to JPEG produces one image file per page. JPG is the right format when file size matters: email attachments, CMS thumbnails, social media previews.
Browser tool (no upload required)
The fastest approach for one-off conversions:
- Open PDF to JPG
- Drop your PDF onto the upload zone
- Choose quality: Low (60%), High (85%), or Maximum (100%)
- Download each page as a JPG
The tool runs entirely in your browser using pdfjs-dist: your files are never uploaded to any server. This matters in 2026: the FBI issued a warning in March 2025 about criminals distributing malware through fake online converters.
Python with pdf2image
pdf2image wraps Poppler and is the most widely used Python library for PDF-to-image conversion:
pip install pdf2image
# Requires Poppler: brew install poppler (macOS) or apt install poppler-utils (Ubuntu)
from pdf2image import convert_from_path
images = convert_from_path("document.pdf", dpi=150, fmt="jpeg")
for i, img in enumerate(images):
    img.save(f"page_{i + 1}.jpg", "JPEG", quality=85)
print(f"Converted {len(images)} pages")
Node.js with pdfjs-dist
PDF.js (Mozilla's open-source PDF engine) works in Node.js via the pdfjs-dist package:
npm install pdfjs-dist canvas
import { getDocument } from "pdfjs-dist/legacy/build/pdf.mjs";
import { createCanvas } from "canvas";
import fs from "fs";
async function pdfToJpg(pdfPath, dpi = 150, quality = 85) {
  const scale = dpi / 72; // PDF uses 72 points per inch (ISO 32000)
  const data = new Uint8Array(fs.readFileSync(pdfPath));
  const pdf = await getDocument({ data }).promise;
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale });
    const canvas = createCanvas(viewport.width, viewport.height);
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
    fs.writeFileSync(`page_${i}.jpg`, canvas.toBuffer("image/jpeg", { quality: quality / 100 }));
  }
}
await pdfToJpg("document.pdf");
Command line with Poppler (pdftoppm)
# All pages, 150 DPI, JPEG at quality 85
pdftoppm -jpeg -jpegopt quality=85 -r 150 document.pdf page
# Output: page-1.jpg, page-2.jpg, ...
JPEG quality and DPI reference
JPEG quality settings are not standardized across tools, but the underlying trade-off is consistent: higher quality means larger files. At 85%, the quality difference from 100% is nearly invisible to the human eye for most documents, but the file is roughly 3-5x smaller.
| DPI | A4 output size | Best for |
|---|---|---|
| 72 | 595 x 842 px | Web thumbnails (small) |
| 150 | 1240 x 1754 px | General use, screen, presentations |
| 300 | 2480 x 3508 px | Professional print output |
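The pixel sizes in the table follow directly from the page size and the DPI. As a minimal sketch (the function name page_pixels and the constant names are ours, chosen for illustration), the arithmetic for an A4 page looks like this:

```python
# Pixels = inches * DPI. A4 is 210 x 297 mm, and one inch is 25.4 mm.
A4_MM = (210.0, 297.0)
MM_PER_INCH = 25.4


def page_pixels(dpi: int, size_mm: tuple[float, float] = A4_MM) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    width_mm, height_mm = size_mm
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


for dpi in (72, 150, 300):
    print(dpi, page_pixels(dpi))
```

Passing a different size_mm (for example US Letter at 215.9 x 279.4 mm) gives the equivalent figures for other page formats.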
See the full walkthrough in our PDF to JPG guide.
How to convert PDF to PNG
PNG conversion follows the same process as JPG but uses lossless compression. This produces larger files but preserves every pixel exactly, with support for transparency (alpha channel). Use PNG when converting text-heavy PDFs, technical diagrams, or any content you plan to edit further.
When to choose PNG over JPG
| | JPG | PNG |
|---|---|---|
| Compression | Lossy | Lossless |
| Transparency | Not supported | Supported |
| Text legibility | Slight artifacts at low quality | Pixel-perfect |
| File size (A4 at 150 DPI) | ~400-800 KB | ~1-3 MB |
Choose JPG for photos, gradients, and social media sharing. Choose PNG for diagrams, screenshots, text documents, and anything requiring a transparent background.
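If you pick the format in code, the rules of thumb above reduce to a small decision function. This is only a sketch: the function name and its three boolean inputs are hypothetical, and a real pipeline would have to decide how to detect those properties.

```python
def choose_image_format(has_photos: bool, needs_transparency: bool, text_heavy: bool) -> str:
    """Pick "png" or "jpeg" following the rules of thumb above."""
    if needs_transparency:
        return "png"   # JPEG has no alpha channel
    if text_heavy and not has_photos:
        return "png"   # lossless compression keeps glyph edges sharp
    return "jpeg"      # photos and gradients compress much better as JPEG


print(choose_image_format(has_photos=True, needs_transparency=False, text_heavy=False))
```

The returned strings match the fmt values used in the pdf2image examples in this guide.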
Python
from pdf2image import convert_from_path
images = convert_from_path("document.pdf", dpi=150, fmt="png")
for i, img in enumerate(images):
    img.save(f"page_{i + 1}.png", "PNG")
Command line
# PNG output via pdftoppm
pdftoppm -png -r 150 document.pdf page
# Output: page-1.png, page-2.png, ...
For the full PNG conversion guide, including the browser tool, see how to convert PDF to PNG.
How to convert PDF to Word (DOCX)
Converting PDF to an editable Word document is the most technically challenging PDF conversion direction. PDFs store content as positioned elements (text runs, images, paths), not as structured paragraphs. Extracting this into Word's OOXML format requires heuristic reconstruction of layout, paragraphs, tables, and fonts.
Limitation: no tool perfectly converts all PDFs. Complex layouts with columns, tables, custom fonts, and embedded images typically require manual cleanup after conversion.
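To see why this reconstruction is heuristic, consider the first step a converter has to take: deciding which positioned text runs belong to the same line. The sketch below is a simplified illustration, not how any particular converter works. The TextRun type and the 2-point tolerance are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points (larger y is higher on the page)


def group_into_lines(runs: list[TextRun], y_tolerance: float = 2.0) -> list[str]:
    """Cluster runs with nearly equal baselines, then read each cluster left to right."""
    lines: list[list[TextRun]] = []
    for run in sorted(runs, key=lambda r: -r.y):  # top of the page first
        if lines and abs(lines[-1][0].y - run.y) <= y_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])
    return [" ".join(r.text for r in sorted(line, key=lambda r: r.x)) for line in lines]


runs = [
    TextRun("World", 120, 700.4),
    TextRun("Hello", 72, 700.0),
    TextRun("Second line", 72, 686.0),
]
print(group_into_lines(runs))  # ['Hello World', 'Second line']
```

Multi-column pages, tables, and rotated text all break this simple baseline rule, which is where real converters need more elaborate heuristics and where manual cleanup tends to be required.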
LibreOffice (free, open-source)
LibreOffice is the most capable free option for PDF to DOCX conversion:
# Install LibreOffice
# macOS: brew install --cask libreoffice
# Ubuntu: apt install libreoffice
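# Convert PDF to DOCX (the writer_pdf_import filter opens the PDF in Writer rather than Draw)
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx --outdir ./output document.pdf
Batch conversion of a folder:
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx --outdir ./output *.pdf
Python with python-docx and pdfplumber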
For text extraction without layout reconstruction, pdfplumber provides reliable character-level text extraction:
pip install pdfplumber python-docx
import pdfplumber
from docx import Document
def pdf_to_docx(pdf_path, output_path):
    doc = Document()
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if text:
                doc.add_paragraph(text)
            doc.add_page_break()
    doc.save(output_path)
pdf_to_docx("document.pdf", "output.docx")
This approach extracts raw text without formatting. For layout-preserving conversion, LibreOffice or commercial APIs (Adobe PDF Services, Microsoft Graph) produce significantly better results.
Microsoft Graph API (programmatic, layout-aware)
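For applications that need PDF to Word conversion in production, the Microsoft Graph API can convert files stored in OneDrive through the format query parameter. Microsoft's documentation for this endpoint lists a limited set of target formats (chiefly pdf), so confirm that format=docx is accepted for your tenant before relying on the request below:
// Requires Microsoft 365 tenant and Azure app registration
const response = await fetch(
  `https://graph.microsoft.com/v1.0/me/drive/items/${itemId}/content?format=docx`,
  { headers: { Authorization: `Bearer ${accessToken}` } }
);
const docxBuffer = await response.arrayBuffer();
How to convert HTML to PDF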
HTML to PDF is the reverse direction: generating a PDF document from web content. This is the standard method for programmatic document generation (invoices, reports, certificates, contracts).
Why HTML and CSS are the best PDF template language
HTML and CSS give you complete control over page layout, typography, and visual design. Headless Chromium renders HTML with the same CSS engine as Chrome, so the PDF matches what you see in the browser's print preview. The alternative approaches (LaTeX, PDF libraries like ReportLab or jsPDF) require learning a separate markup language or a library-specific API, and they lack the ecosystem of web design tools.
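In practice, "HTML as a template language" means filling placeholders in an HTML string and handing the result to a renderer. The sketch below uses Python's standard string.Template as a stand-in for a full template engine. The template text and the function name are illustrative assumptions, not part of any product's API.

```python
from html import escape
from string import Template

# Hypothetical invoice template; a real one would also carry CSS.
INVOICE_TEMPLATE = Template(
    "<h1>Invoice $invoice_number</h1>"
    "<p>Billed to: $company_name</p>"
    "<p>Total: $total</p>"
)


def render_invoice_html(data: dict[str, str]) -> str:
    """Escape each value, then substitute it into the HTML template."""
    safe = {key: escape(value) for key, value in data.items()}
    return INVOICE_TEMPLATE.substitute(safe)


html = render_invoice_html(
    {"invoice_number": "INV-2026-001", "company_name": "Acme & Sons", "total": "$1,500.00"}
)
print(html)
```

Escaping the values before substitution keeps user-supplied data such as company names from injecting markup into the document.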
Method 1: PDF4.dev API (production-ready)
PDF4.dev is an HTML-to-PDF API that handles the headless browser infrastructure for you. Create a template in the dashboard with Handlebars variables, then call the API with your data:
const response = await fetch("https://pdf4.dev/api/v1/render", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    template_id: "invoice",
    data: {
      company_name: "Acme Corp",
      invoice_number: "INV-2026-001",
      total: "$1,500.00",
    },
  }),
});
const pdfBuffer = await response.arrayBuffer();
You can also pass raw HTML directly without a template:
const response = await fetch("https://pdf4.dev/api/v1/render", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    html: "<h1>Hello World</h1><p>This is a PDF generated from HTML.</p>",
    format: { preset: "a4" },
  }),
});
Try the free HTML to PDF converter to generate a PDF from any HTML snippet without an API key.
Method 2: Playwright (self-hosted)
Playwright is Microsoft's browser automation library. It generates PDFs directly via page.pdf(), which is supported only in Chromium:
npm install playwright
npx playwright install chromium
import { chromium } from "playwright";
async function htmlToPdf(html, outputPath) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.setContent(html, { waitUntil: "load" });
  await page.pdf({
    path: outputPath,
    format: "A4",
    margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
    printBackground: true,
  });
  await browser.close();
}
await htmlToPdf("<h1>Invoice</h1>", "invoice.pdf");
The self-hosting trade-off: Playwright works well for development and low-volume use. At scale, you face Docker image bloat (Chromium adds ~300 MB), concurrency limits (each render needs a browser page), cold start issues in serverless environments, and occasional browser crashes under memory pressure. An API like PDF4.dev uses the same Chromium engine but manages the infrastructure layer for you.
For a detailed comparison and production considerations, see our Node.js PDF generation guide.
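The concurrency limit is the part of self-hosting that is easiest to get wrong, because every in-flight render holds a browser page in memory. One common way to cap it is a semaphore around the render call. The sketch below shows the pattern in Python with asyncio. The fake_render coroutine is a placeholder we made up to keep the example self-contained, so swap in your real render call.

```python
import asyncio


async def fake_render(html: str) -> bytes:
    """Placeholder for a real render call (for example, one headless browser page)."""
    await asyncio.sleep(0.01)
    return f"%PDF-stub {len(html)}".encode()


async def render_all(htmls: list[str], max_concurrent: int = 4) -> list[bytes]:
    """Render every document while never running more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(html: str) -> bytes:
        async with semaphore:  # waits here when max_concurrent renders are in flight
            return await fake_render(html)

    return await asyncio.gather(*(bounded(h) for h in htmls))


if __name__ == "__main__":
    pdfs = asyncio.run(render_all([f"<h1>Doc {i}</h1>" for i in range(10)]))
    print(len(pdfs))
```

Results come back in the same order as the inputs, so each PDF can be matched to its source document by index.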
CSS for PDF: page control
When generating PDFs from HTML, use CSS @page and @media print to control pagination:
@page {
  size: A4;
  margin: 20mm 15mm;
}
/* Force a page break before an element */
.page-break {
  page-break-before: always;
}
/* Prevent breaks inside an element */
.keep-together {
  page-break-inside: avoid;
}
/* Repeat table headers across pages */
thead {
  display: table-header-group;
}
How to convert images to PDF
Converting images (JPG, PNG, TIFF) to PDF packages them into a single document for sharing, archiving, or printing. The resulting PDF is a rasterized document (images only, no text layer).
Browser tool
The Image to PDF converter accepts JPG, PNG, and WebP files, allows reordering, and outputs a single merged PDF without any server upload.
Python with Pillow and reportlab
pip install pillow reportlab
from PIL import Image
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Image as RLImage
def images_to_pdf(image_paths, output_path):
    margin = 20  # points
    doc = SimpleDocTemplate(output_path, pagesize=A4, leftMargin=margin,
                            rightMargin=margin, topMargin=margin, bottomMargin=margin)
    # Usable area: the frame inside the margins, minus its default 6 pt padding per side
    max_w, max_h = doc.width - 12, doc.height - 12
    story = []
    for img_path in image_paths:
        with Image.open(img_path) as img:
            img_w, img_h = img.size
        # Scale to fit the usable area while preserving the aspect ratio
        scale = min(max_w / img_w, max_h / img_h)
        story.append(RLImage(img_path, width=img_w * scale, height=img_h * scale))
    doc.build(story)
images_to_pdf(["scan_1.jpg", "scan_2.jpg", "scan_3.jpg"], "scans.pdf")
Command line with ImageMagick
# Combine multiple images into one PDF (on ImageMagick 7, use "magick" instead of "convert")
convert scan_1.jpg scan_2.jpg scan_3.jpg output.pdf
# With DPI metadata preserved
convert -density 300 scan_1.jpg scan_2.jpg output.pdf
For the detailed guide, including file reordering and multi-format support, see how to convert images to PDF.
Batch PDF conversion
For converting multiple files at once, command-line tools are the most efficient approach.
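When the batch is driven from a script rather than an interactive shell, the same command-line tools can be called in parallel. The following is a minimal Python sketch that assumes pdftoppm is installed and on PATH. The helper names and the default of four workers are our choices for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pdftoppm_command(pdf: Path, dpi: int = 150, quality: int = 85) -> list[str]:
    """Build the pdftoppm argument list for one PDF (output prefix: <name>_page)."""
    prefix = pdf.with_name(f"{pdf.stem}_page")
    return ["pdftoppm", "-jpeg", "-jpegopt", f"quality={quality}",
            "-r", str(dpi), str(pdf), str(prefix)]


def convert_folder(folder: Path, workers: int = 4) -> int:
    """Convert every PDF in `folder` in parallel and return how many were processed."""
    pdfs = sorted(folder.glob("*.pdf"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # check=True raises CalledProcessError if pdftoppm fails on a file
        list(pool.map(lambda p: subprocess.run(pdftoppm_command(p), check=True), pdfs))
    return len(pdfs)


if __name__ == "__main__":
    print(convert_folder(Path(".")), "PDFs converted")
```

Threads are sufficient here because the heavy work happens in the pdftoppm child processes, not in Python.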
Batch PDF to JPG (shell)
# Convert all PDFs in the current directory to JPG pages
for pdf in *.pdf; do
  name="${pdf%.pdf}"
  pdftoppm -jpeg -jpegopt quality=85 -r 150 "$pdf" "${name}_page"
done
Batch HTML to PDF (API)
When generating multiple PDFs (invoices, reports, certificates), send parallel requests to the PDF4.dev API. Keep the number of concurrent requests within your plan's rate limit (many APIs allow somewhere in the range of 10-50):
const documents = [
  { template_id: "invoice", data: { invoice_number: "INV-001", total: "$100" } },
  { template_id: "invoice", data: { invoice_number: "INV-002", total: "$200" } },
  { template_id: "invoice", data: { invoice_number: "INV-003", total: "$300" } },
];
const results = await Promise.all(
  documents.map((doc) =>
    fetch("https://pdf4.dev/api/v1/render", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(doc),
    }).then((r) => r.arrayBuffer())
  )
);
PDF conversion format comparison
| Format | Quality | File size | Text selectable | Best for |
|---|---|---|---|---|
| JPG | Lossy (adjustable) | Small | No | Photos, sharing, thumbnails |
| PNG | Lossless | Medium | No | Diagrams, text, transparency |
| TIFF | Lossless | Large | No | Archival, professional scanning |
| DOCX | Variable | Small | Yes (extracted) | Editing existing documents |
| Plain text | N/A | Tiny | Yes | Data extraction, indexing |
| PDF/A | Lossless | Medium | Yes | Long-term archiving (ISO 19005) |
| HTML | N/A | Variable | Yes | Web display, templating |
Choosing the right tool
The right conversion tool depends on your volume, environment, and privacy requirements.
| Scenario | Recommended tool | Why |
|---|---|---|
| One-off conversion, privacy required | PDF4.dev browser tools | Client-side, no upload |
| Scripting and automation | pdftoppm / LibreOffice CLI | Free, reliable, batch-capable |
| Production application (PDF generation) | PDF4.dev API | Managed browser, scales automatically |
| Python data pipeline | pdf2image + pdfplumber | Pythonic, well-maintained |
| Node.js application | pdfjs-dist or PDF4.dev SDK | Same engine as browsers |
| Maximum Word fidelity | LibreOffice or Microsoft Graph | Best layout reconstruction |
For document generation (HTML to PDF), using an API is almost always the better choice over self-hosting Playwright in production. The infrastructure overhead (Docker, concurrency, browser crashes, memory) compounds quickly as volume grows.
Privacy and security considerations
PDF conversion tools have become a significant security vector. The FBI and CISA both issued warnings in 2025 about malware distributed via fake online converters, specifically targeting file conversion search queries. Malwarebytes identified specific converter domains distributing ransomware through Google Ads.
For sensitive documents (contracts, financial records, medical files), use:
- Local tools: command-line tools (pdftoppm, LibreOffice) running on your own machine
- Client-side browser tools: tools that process files in the browser with JavaScript or WebAssembly (for example, PDF.js) so files never leave your device. All PDF4.dev tools work this way
- Trusted API providers: for programmatic use, verify that the API does not retain PDF content after processing
Avoid uploading sensitive documents to unknown online converters. When in doubt, run a quick scan of any downloaded conversion software before installing it.
PDF conversion performance benchmarks
| Method | Approximate time (10-page PDF to JPG unless noted) | Notes |
|---|---|---|
| pdftoppm (CLI) | ~0.5-2s | Fastest, C-based Poppler |
| pdf2image (Python) | ~1-3s | Wraps pdftoppm |
| pdfjs-dist (Node.js) | ~3-8s | JavaScript rendering |
| Browser tool | ~5-15s | Includes UI overhead |
| PDF4.dev API (HTML to PDF) | ~200-500ms | Warm browser pool |
These are approximate ranges for a typical text-heavy A4 document on modern hardware. Actual performance varies significantly by document complexity, image content, and available CPU.
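Because the numbers depend so heavily on the document and the machine, it is worth measuring your own workload. Below is a small timing harness as a sketch. The benchmark helper is our own, and the workload shown is a stand-in that you would replace with a real conversion call.

```python
import statistics
import time
from typing import Callable


def benchmark(task: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Run `task` several times and report min and median wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return {"min": min(timings), "median": statistics.median(timings)}


# Stand-in workload; replace with e.g. lambda: convert_from_path("document.pdf", dpi=150)
result = benchmark(lambda: sum(range(100_000)))
print(f"min {result['min']:.4f}s, median {result['median']:.4f}s")
```

Reporting the median alongside the minimum helps separate steady-state speed from one-off effects such as a cold disk cache on the first run.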
Common PDF conversion errors
"Error: no such file or directory" (Poppler not installed)
pdftoppm requires Poppler to be installed. On macOS: brew install poppler. On Ubuntu: apt install poppler-utils.
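A script can check for the Poppler executables up front and fail with a clearer message. A minimal sketch using only the standard library (the function name missing_tools is ours):

```python
import shutil


def missing_tools(names: list[str]) -> list[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(["pdftoppm", "pdfinfo"])
if missing:
    print("Install Poppler first; missing:", ", ".join(missing))
```

Both pdftoppm and pdfinfo ship with Poppler, so checking for the pair covers the command-line examples as well as pdf2image, which calls them under the hood.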
Blank output pages
Usually caused by an encrypted PDF. Run the file through PDF unlock first, then retry the conversion.
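To confirm the diagnosis before retrying, you can look for the /Encrypt entry that encrypted PDFs carry in their trailer dictionary. The check below is a rough heuristic under that assumption, not a full parser. It can report a false positive if the same bytes happen to appear elsewhere in the file.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Rough check: encrypted PDFs reference an /Encrypt dictionary in the trailer."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Convenience wrapper that reads the file from disk."""
    return looks_encrypted(Path(path).read_bytes())


print(looks_encrypted(b"trailer << /Root 1 0 R /Encrypt 5 0 R >>"))  # True
```

For a definitive answer, use a PDF library or Poppler's pdfinfo, which reports whether a file is encrypted.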
Blurry images
Low DPI setting. Increase from 72 DPI (one pixel per PDF point) to 150 DPI for general use or 300 DPI for print quality.
Missing fonts in Word output
LibreOffice substitutes fonts not installed on the system. For consistent output, install the required fonts before running the conversion, or accept the substitution for body text.
Page breaks in wrong places (HTML to PDF)
Add CSS page-break-inside: avoid to elements that should not be split across pages (tables, cards, section blocks). Use page-break-before: always to force a new page before an element.
Summary
PDF conversion covers two fundamentally different problems: extracting content from existing PDFs (to JPG, PNG, or Word) and generating new PDFs from structured content (HTML to PDF).
For extraction, Poppler's pdftoppm is the fastest and most reliable command-line tool. For generation, headless Chromium (via Playwright or an API) is the standard approach. For one-off conversions where privacy matters, browser-based tools that process files locally are the safest and most convenient option.
The free tools on PDF4.dev handle the most common cases entirely in your browser:
- PDF to JPG — export each page as JPEG
- PDF to PNG — export each page as PNG
- HTML to PDF — convert HTML snippets to PDF
- Image to PDF — package images into a PDF
- Compress PDF — reduce file size after conversion
- Merge PDF — combine multiple PDFs into one
For programmatic document generation in production, PDF4.dev's API removes the browser infrastructure layer while giving you full control over templates, variables, and PDF format.