
Complete guide to PDF conversion: every format, every method (2026)

PDF conversion explained: convert PDF to JPG, PNG, Word, HTML and more. Covers free browser tools, Python, Node.js, and command-line methods. Updated 2026.

benoitded · March 12, 2026 · 14 min read

PDF conversion is the process of transforming a PDF file into another format (JPG, PNG, Word, HTML) or converting another format into PDF. It is one of the most common document processing tasks, with over 290 billion PDFs created annually and constant demand to exchange content between formats. This guide covers every major conversion direction with working code examples, recommended tools, and practical parameters for each use case.

PDF conversion: direction and format overview

The conversion direction determines which tools and techniques apply. There is no universal "convert PDF" tool because the underlying requirements differ completely.

| Conversion | Common use cases | Key constraint |
| --- | --- | --- |
| PDF to JPG | Thumbnails, email, social media | Raster output, no text layer |
| PDF to PNG | Diagrams, transparency, archives | Larger files than JPG |
| PDF to Word (DOCX) | Editing existing documents | Layout fidelity varies |
| PDF to plain text | Data extraction, indexing | Loses all formatting |
| HTML to PDF | Document generation, invoices | Requires headless browser |
| Image to PDF | Packaging files, scanning | Rasterized PDF (no text layer) |
| PDF to PDF/A | Long-term archiving | ISO 19005 compliance |

The most common use cases are PDF to image (for previewing and sharing) and HTML to PDF (for programmatically generating documents). We cover both in depth below.

How to convert PDF to JPG

Converting a PDF to JPEG produces one image file per page. JPG is the right format when file size matters: email attachments, CMS thumbnails, social media previews.

Browser tool (no upload required)

The fastest approach for one-off conversions:

  1. Open PDF to JPG
  2. Drop your PDF onto the upload zone
  3. Choose quality: Low (60%), High (85%), or Maximum (100%)
  4. Download each page as a JPG

The tool runs entirely in your browser using pdfjs-dist, so your files are never uploaded to a server. That matters: in March 2025 the FBI warned about criminals distributing malware through fake online converters.

Python with pdf2image

pdf2image wraps Poppler and is the most widely used Python library for PDF-to-image conversion:

pip install pdf2image
# Requires Poppler: brew install poppler (macOS) or apt install poppler-utils (Ubuntu)

from pdf2image import convert_from_path
 
images = convert_from_path("document.pdf", dpi=150, fmt="jpeg")
 
for i, img in enumerate(images):
    img.save(f"page_{i + 1}.jpg", "JPEG", quality=85)
 
print(f"Converted {len(images)} pages")

Node.js with pdfjs-dist

PDF.js (Mozilla's open-source PDF engine) works in Node.js via the pdfjs-dist package:

npm install pdfjs-dist canvas

import { getDocument } from "pdfjs-dist/legacy/build/pdf.mjs";
import { createCanvas } from "canvas";
import fs from "fs";
 
async function pdfToJpg(pdfPath, dpi = 150, quality = 85) {
  const scale = dpi / 72; // PDF uses 72 points per inch (ISO 32000)
  const data = new Uint8Array(fs.readFileSync(pdfPath));
  const pdf = await getDocument({ data }).promise;
 
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale });
    const canvas = createCanvas(viewport.width, viewport.height);
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
    fs.writeFileSync(`page_${i}.jpg`, canvas.toBuffer("image/jpeg", { quality: quality / 100 }));
  }
}
 
await pdfToJpg("document.pdf");

Command line with Poppler (pdftoppm)

# All pages, 150 DPI, JPEG at quality 85
pdftoppm -jpeg -jpegopt quality=85 -r 150 document.pdf page
 
# Output: page-1.jpg, page-2.jpg, ...
# (page numbers are zero-padded when the PDF has 10 or more pages, e.g. page-01.jpg)
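
If you call pdftoppm from a script, it can help to build the argument list in one place. The following is a minimal Python sketch; the function name `pdftoppm_command` and its defaults are illustrative assumptions, while the flags (`-jpeg`, `-png`, `-r`, `-jpegopt quality=N`) are the same ones used in the command above.

```python
def pdftoppm_command(pdf_path, prefix, dpi=150, fmt="jpeg", quality=85):
    """Build the pdftoppm argument list for a JPEG or PNG conversion.

    Hypothetical helper: only the flags come from pdftoppm itself.
    """
    if fmt not in ("jpeg", "png"):
        raise ValueError("fmt must be 'jpeg' or 'png'")
    cmd = ["pdftoppm", f"-{fmt}", "-r", str(dpi)]
    if fmt == "jpeg":
        # JPEG quality only applies to JPEG output
        cmd += ["-jpegopt", f"quality={quality}"]
    cmd += [str(pdf_path), str(prefix)]
    return cmd


print(" ".join(pdftoppm_command("document.pdf", "page")))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with file names that contain spaces.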

JPEG quality and DPI reference

JPEG quality settings are not standardized across tools, but the underlying trade-off is consistent: higher quality means larger files. At 85%, the quality difference from 100% is nearly invisible to the human eye for most documents, but the file is roughly 3-5x smaller.

| DPI | A4 output size | Best for |
| --- | --- | --- |
| 72 | 595 x 842 px | Web thumbnails (small) |
| 150 | 1240 x 1754 px | General use, screen, presentations |
| 300 | 2480 x 3508 px | Professional print output |
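
The pixel sizes in the table follow from one formula: pixels = points x DPI / 72, since a PDF point is 1/72 inch. Here is a short Python sketch of that arithmetic; the constant `A4_POINTS` and the function name are illustrative, and individual renderers may round slightly differently.

```python
# A4 is 210 mm x 297 mm, which is about 595.276 x 841.890 PDF points
A4_POINTS = (595.276, 841.890)


def page_pixels(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    scale = dpi / 72  # one PDF point is 1/72 inch
    return round(width_pt * scale), round(height_pt * scale)


for dpi in (72, 150, 300):
    print(dpi, page_pixels(*A4_POINTS, dpi))
# 72 (595, 842)
# 150 (1240, 1754)
# 300 (2480, 3508)
```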

See the full walkthrough in our PDF to JPG guide.

How to convert PDF to PNG

PNG conversion follows the same process as JPG but uses lossless compression. This produces larger files but preserves every pixel exactly, with support for transparency (alpha channel). Use PNG when converting text-heavy PDFs, technical diagrams, or any content you plan to edit further.

When to choose PNG over JPG

| Property | JPG | PNG |
| --- | --- | --- |
| Compression | Lossy | Lossless |
| Transparency | Not supported | Supported |
| Text legibility | Slight artifacts at low quality | Pixel-perfect |
| File size (A4 at 150 DPI) | ~400-800 KB | ~1-3 MB |

Choose JPG for photos, gradients, and social media sharing. Choose PNG for diagrams, screenshots, text documents, and anything requiring a transparent background.

Python

from pdf2image import convert_from_path
 
images = convert_from_path("document.pdf", dpi=150, fmt="png")
 
for i, img in enumerate(images):
    img.save(f"page_{i + 1}.png", "PNG")

Command line

# PNG output via pdftoppm
pdftoppm -png -r 150 document.pdf page
# Output: page-1.png, page-2.png, ...

For the full PNG conversion guide including browser tool, see how to convert PDF to PNG.

How to convert PDF to Word (DOCX)

Converting PDF to an editable Word document is the most technically challenging PDF conversion direction. PDFs store content as positioned elements (text runs, images, paths), not as structured paragraphs. Extracting this into Word's OOXML format requires heuristic reconstruction of layout, paragraphs, tables, and fonts.

Limitation: no tool perfectly converts all PDFs. Complex layouts with columns, tables, custom fonts, and embedded images typically require manual cleanup after conversion.
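
To make "heuristic reconstruction" concrete, here is a deliberately simplified Python sketch of one such heuristic: grouping positioned text runs into lines by baseline, then into paragraphs by vertical gap. The `Run` type, the function name, and both thresholds are assumptions for illustration. Real converters also have to handle columns, tables, fonts, and reading order.

```python
from dataclasses import dataclass


@dataclass
class Run:
    text: str
    x: float  # left edge, in points
    y: float  # baseline, measured from the top of the page, in points


def runs_to_paragraphs(runs, line_tol=2.0, para_gap=18.0):
    """Group runs into lines (similar y), then lines into paragraphs (small gaps)."""
    lines = []  # each entry: [baseline_y, [runs on that line]]
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        if lines and abs(run.y - lines[-1][0]) <= line_tol:
            lines[-1][1].append(run)
        else:
            lines.append([run.y, [run]])

    paragraphs, current, prev_y = [], [], None
    for y, line_runs in lines:
        text = " ".join(r.text for r in sorted(line_runs, key=lambda r: r.x))
        if prev_y is not None and y - prev_y > para_gap:
            # A large vertical gap starts a new paragraph
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    Run("world", 60, 100), Run("Hello", 10, 100.5), Run("again", 10, 114),
    Run("New", 10, 150), Run("paragraph", 40, 150),
]
print(runs_to_paragraphs(sample))
# ['Hello world again', 'New paragraph']
```

Every threshold here is a guess about the document's typography, which is why output quality varies so much between PDFs.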

LibreOffice (free, open-source)

LibreOffice is the most capable free option for PDF to DOCX conversion:

# Install LibreOffice
# macOS: brew install --cask libreoffice
# Ubuntu: apt install libreoffice
 
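# Convert PDF to DOCX
# PDFs open in Draw by default, which has no DOCX export; the
# writer_pdf_import filter loads the file into Writer instead
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx --outdir ./output document.pdf

Batch conversion of a folder:

libreoffice --headless --infilter="writer_pdf_import" --convert-to docx --outdir ./output *.pdf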

Python with python-docx and pdfplumber

For text extraction without layout reconstruction, pdfplumber provides reliable character-level text extraction:

pip install pdfplumber python-docx

import pdfplumber
from docx import Document
 
def pdf_to_docx(pdf_path, output_path):
    doc = Document()
 
    with pdfplumber.open(pdf_path) as pdf:
        for index, page in enumerate(pdf.pages):
            if index > 0:
                doc.add_page_break()  # break between pages, not after the last one
            text = page.extract_text()
            if text:
                doc.add_paragraph(text)
 
    doc.save(output_path)
 
pdf_to_docx("document.pdf", "output.docx")

This approach extracts raw text without formatting. For layout-preserving conversion, LibreOffice or commercial APIs (Adobe PDF Services, Microsoft Graph) produce significantly better results.

Microsoft Graph API (programmatic, layout-aware)

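For applications that need layout-aware conversion in production, Microsoft Graph exposes a file-conversion endpoint for OneDrive items. Note that Microsoft's documentation for the format parameter lists pdf, html, jpg, and glb as targets and does not list docx, so confirm that the request below is supported for your tenant before building on it: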

// Requires Microsoft 365 tenant and Azure app registration
const response = await fetch(
  `https://graph.microsoft.com/v1.0/me/drive/items/${itemId}/content?format=docx`,
  { headers: { Authorization: `Bearer ${accessToken}` } }
);
const docxBuffer = await response.arrayBuffer();

How to convert HTML to PDF

HTML to PDF is the reverse direction: generating a PDF document from web content. This is the standard method for programmatic document generation (invoices, reports, certificates, contracts).

Why HTML and CSS are the best PDF template language

HTML and CSS give you complete control over page layout, typography, and visual design. Headless Chromium renders HTML with the same engine as Chrome, so the PDF matches what Chrome's print preview shows. The alternatives (LaTeX, or PDF libraries such as ReportLab and jsPDF) require learning a separate language or a library-specific API, and they lack the ecosystem of web design tools.

Method 1: PDF4.dev API (production-ready)

PDF4.dev is an HTML-to-PDF API that handles the headless browser infrastructure for you. Create a template in the dashboard with Handlebars variables, then call the API with your data:

const response = await fetch("https://pdf4.dev/api/v1/render", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    template_id: "invoice",
    data: {
      company_name: "Acme Corp",
      invoice_number: "INV-2026-001",
      total: "$1,500.00",
    },
  }),
});
 
const pdfBuffer = await response.arrayBuffer();

You can also pass raw HTML directly without a template:

const response = await fetch("https://pdf4.dev/api/v1/render", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    html: "<h1>Hello World</h1><p>This is a PDF generated from HTML.</p>",
    format: { preset: "a4" },
  }),
});

Try the free HTML to PDF converter to generate a PDF from any HTML snippet without an API key.
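
To picture what the template variables in Method 1 do before rendering, here is a small Python sketch of Handlebars-style `{{variable}}` substitution with HTML escaping. It is an illustration only, not PDF4.dev's template engine and not a full Handlebars implementation; the function name `render_template` is an assumption.

```python
import re
from html import escape

_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, data):
    """Replace {{name}} placeholders with HTML-escaped values from `data`."""
    def substitute(match):
        key = match.group(1)
        if key not in data:
            raise KeyError(f"missing template variable: {key}")
        return escape(str(data[key]))

    return _VARIABLE.sub(substitute, template)


html = render_template(
    "<h1>Invoice {{invoice_number}}</h1><p>{{ company_name }}: {{total}}</p>",
    {"invoice_number": "INV-2026-001", "company_name": "Acme & Co", "total": "$1,500.00"},
)
print(html)
# <h1>Invoice INV-2026-001</h1><p>Acme &amp; Co: $1,500.00</p>
```

Failing loudly on a missing variable is a design choice for this sketch: a blank field on an invoice is usually worse than a failed render.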

Method 2: Playwright (self-hosted)

Playwright is Microsoft's open-source browser automation library. It generates PDFs directly via page.pdf(), which is supported in Chromium only:

npm install playwright
npx playwright install chromium

import { chromium } from "playwright";
 
async function htmlToPdf(html, outputPath) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.setContent(html, { waitUntil: "load" });
  await page.pdf({
    path: outputPath,
    format: "A4",
    margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
    printBackground: true,
  });
  await browser.close();
}
 
await htmlToPdf("<h1>Invoice</h1>", "invoice.pdf");

The self-hosting trade-off: Playwright works well for development and low-volume use. At scale, you face Docker image bloat (Chromium adds ~300 MB), concurrency limits (each render needs a browser page), cold start issues in serverless environments, and occasional browser crashes under memory pressure. An API like PDF4.dev uses the same Chromium engine but manages the infrastructure layer for you.

For a detailed comparison and production considerations, see our Node.js PDF generation guide.

CSS for PDF: page control

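When generating PDFs from HTML, use CSS @page and @media print to control pagination. The page-break-* properties shown here are legacy aliases that Chromium still honors; the current standard names are break-before and break-inside: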

@page {
  size: A4;
  margin: 20mm 15mm;
}
 
/* Force a page break before an element */
.page-break {
  page-break-before: always;
}
 
/* Prevent breaks inside an element */
.keep-together {
  page-break-inside: avoid;
}
 
/* Repeat table headers across pages */
thead {
  display: table-header-group;
}
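
When sizing template content, it helps to know how much room the @page rule above leaves. The Python sketch below works out the content box for A4 with `margin: 20mm 15mm` (20 mm top and bottom, 15 mm left and right). The function names and the `PAGE_SIZES_MM` table are illustrative assumptions.

```python
MM_PER_INCH = 25.4
PAGE_SIZES_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}


def content_box_mm(size, margin_top_bottom, margin_left_right):
    """Return the (width, height) in mm left for content after page margins."""
    width, height = PAGE_SIZES_MM[size]
    return width - 2 * margin_left_right, height - 2 * margin_top_bottom


def mm_to_pt(mm):
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm * 72 / MM_PER_INCH


box = content_box_mm("A4", margin_top_bottom=20, margin_left_right=15)
print(box)                                   # (180.0, 257.0)
print([round(mm_to_pt(v), 1) for v in box])  # [510.2, 728.5]
```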

How to convert images to PDF

Converting images (JPG, PNG, TIFF) to PDF packages them into a single document for sharing, archiving, or printing. The resulting PDF is a rasterized document (images only, no text layer).

Browser tool

The Image to PDF converter accepts JPG, PNG, and WebP files, allows reordering, and outputs a single merged PDF without any server upload.

Python with Pillow and reportlab

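pip install pillow reportlab

from PIL import Image
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Image as RLImage
 
MARGIN = 20        # page margin, in points
FRAME_PADDING = 6  # default padding ReportLab adds inside the page frame
 
def images_to_pdf(image_paths, output_path):
    # SimpleDocTemplate defaults to 72 pt margins, so set them explicitly
    doc = SimpleDocTemplate(
        output_path, pagesize=A4,
        leftMargin=MARGIN, rightMargin=MARGIN,
        topMargin=MARGIN, bottomMargin=MARGIN,
    )
    page_w, page_h = A4
    # Largest area an image can occupy without overflowing the frame
    max_w = page_w - 2 * (MARGIN + FRAME_PADDING)
    max_h = page_h - 2 * (MARGIN + FRAME_PADDING)
    story = []
 
    for img_path in image_paths:
        with Image.open(img_path) as img:
            img_w, img_h = img.size
        # Scale to fit both width and height, preserving the aspect ratio
        scale = min(max_w / img_w, max_h / img_h)
        story.append(RLImage(img_path, width=img_w * scale, height=img_h * scale))
 
    doc.build(story)
 
images_to_pdf(["scan_1.jpg", "scan_2.jpg", "scan_3.jpg"], "scans.pdf")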

Command line with ImageMagick

# Combine multiple images into one PDF
# (ImageMagick 7 renames the command to `magick`; `convert` is the legacy name)
convert scan_1.jpg scan_2.jpg scan_3.jpg output.pdf
 
# With DPI metadata preserved
convert -density 300 scan_1.jpg scan_2.jpg output.pdf
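
The resolution assigned to an image determines how large its PDF page is, because page sizes are expressed in points (1/72 inch). This Python sketch shows the arithmetic; the function name is an illustrative assumption. It is not a description of ImageMagick's internals.

```python
def image_page_size_pt(width_px, height_px, dpi):
    """Return the (width, height) in PDF points of an image placed at `dpi`."""
    return width_px * 72 / dpi, height_px * 72 / dpi


# A 2480 x 3508 px scan treated as 300 DPI fills an A4 page (about 595 x 842 pt)
print([round(v, 1) for v in image_page_size_pt(2480, 3508, 300)])  # [595.2, 841.9]

# The same pixels treated as 150 DPI produce a page twice as wide and tall
print([round(v, 1) for v in image_page_size_pt(2480, 3508, 150)])  # [1190.4, 1683.8]
```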

For the detailed guide including file reordering and multi-format support, see how to convert images to PDF.

Batch PDF conversion

For converting multiple files at once, command-line tools are the most efficient approach.

Batch PDF to JPG (shell)

# Convert all PDFs in the current directory to JPG pages
for pdf in *.pdf; do
  name="${pdf%.pdf}"
  pdftoppm -jpeg -jpegopt quality=85 -r 150 "$pdf" "${name}_page"
done
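
The same loop can be written in Python when the batch is part of a larger script. This is a minimal sketch, assuming pdftoppm is on the PATH; `plan_batch` and `convert_batch` are hypothetical helper names, and the output prefix mirrors the `${name}_page` pattern used in the shell version.

```python
import subprocess
from pathlib import Path


def plan_batch(directory):
    """Return (pdf_path, output_prefix) pairs for every PDF in `directory`."""
    folder = Path(directory)
    return [(pdf, folder / f"{pdf.stem}_page") for pdf in sorted(folder.glob("*.pdf"))]


def convert_batch(directory, dpi=150, quality=85):
    """Run pdftoppm once per PDF; raises CalledProcessError if a conversion fails."""
    for pdf, prefix in plan_batch(directory):
        subprocess.run(
            ["pdftoppm", "-jpeg", "-jpegopt", f"quality={quality}",
             "-r", str(dpi), str(pdf), str(prefix)],
            check=True,
        )
```

Calling `convert_batch(".")` converts every PDF in the current directory. Separating the planning step from the execution step makes it easy to preview what will be written before anything runs.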

Batch HTML to PDF (API)

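When generating multiple PDFs (invoices, reports, certificates), send requests to the PDF4.dev API in parallel. Rate limits differ between providers and plans (10-50 concurrent requests is a common range), so check yours before sending a large batch. Promise.all starts every request at once, which suits small batches like this one: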

const documents = [
  { template_id: "invoice", data: { invoice_number: "INV-001", total: "$100" } },
  { template_id: "invoice", data: { invoice_number: "INV-002", total: "$200" } },
  { template_id: "invoice", data: { invoice_number: "INV-003", total: "$300" } },
];
 
const results = await Promise.all(
  documents.map((doc) =>
    fetch("https://pdf4.dev/api/v1/render", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.PDF4_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(doc),
    }).then((r) => r.arrayBuffer())
  )
);
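
For larger batches, starting every request at once can exceed a rate limit. One common remedy is to cap the number of requests in flight. The Python sketch below shows that pattern with asyncio and a semaphore; `render_all` and the `render` callable are hypothetical names, and the HTTP call itself is left to whatever client you use.

```python
import asyncio


async def render_all(jobs, render, max_concurrency=5):
    """Run `render(job)` for every job, at most `max_concurrency` at a time.

    Results come back in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(job):
        async with semaphore:  # wait here while the limit is reached
            return await render(job)

    return await asyncio.gather(*(bounded(job) for job in jobs))
```

In use, `render` would be a coroutine that posts one document to the API and returns the PDF bytes, for example `asyncio.run(render_all(documents, render_one, max_concurrency=10))` with a `render_one` you define.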

PDF conversion format comparison

| Format | Quality | File size | Text selectable | Best for |
| --- | --- | --- | --- | --- |
| JPG | Lossy (adjustable) | Small | No | Photos, sharing, thumbnails |
| PNG | Lossless | Medium | No | Diagrams, text, transparency |
| TIFF | Lossless | Large | No | Archival, professional scanning |
| DOCX | Variable | Small | Yes (extracted) | Editing existing documents |
| Plain text | N/A | Tiny | Yes | Data extraction, indexing |
| PDF/A | Lossless | Medium | Yes | Long-term archiving (ISO 19005) |
| HTML | N/A | Variable | Yes | Web display, templating |

Choosing the right tool

The right conversion tool depends on your volume, environment, and privacy requirements.

| Scenario | Recommended tool | Why |
| --- | --- | --- |
| One-off conversion, privacy required | PDF4.dev browser tools | Client-side, no upload |
| Scripting and automation | pdftoppm / LibreOffice CLI | Free, reliable, batch-capable |
| Production application (PDF generation) | PDF4.dev API | Managed browser, scales automatically |
| Python data pipeline | pdf2image + pdfplumber | Pythonic, well-maintained |
| Node.js application | pdfjs-dist or PDF4.dev SDK | Same engine as browsers |
| Maximum Word fidelity | LibreOffice or Microsoft Graph | Best layout reconstruction |

For document generation (HTML to PDF), using an API is almost always the better choice over self-hosting Playwright in production. The infrastructure overhead (Docker, concurrency, browser crashes, memory) compounds quickly as volume grows.

Privacy and security considerations

PDF conversion tools have become a significant security vector. The FBI and CISA both issued warnings in 2025 about malware distributed via fake online converters, specifically targeting file conversion search queries. Malwarebytes identified specific converter domains distributing ransomware through Google Ads.

For sensitive documents (contracts, financial records, medical files), use:

  1. Local tools: command-line tools (pdftoppm, LibreOffice) running on your own machine
  2. Client-side browser tools: tools that process files in the browser with JavaScript or WebAssembly (for example PDF.js), so files never leave your device. All PDF4.dev tools work this way
  3. Trusted API providers: for programmatic use, verify that the API does not retain PDF content after processing

Avoid uploading sensitive documents to unknown online converters. When in doubt, run a quick scan of any downloaded conversion software before installing it.

PDF conversion performance benchmarks

| Method | 10-page PDF to JPG | Notes |
| --- | --- | --- |
| pdftoppm (CLI) | ~0.5-2s | Fastest, C-based Poppler |
| pdf2image (Python) | ~1-3s | Wraps pdftoppm |
| pdfjs-dist (Node.js) | ~3-8s | JavaScript rendering |
| Browser tool | ~5-15s | Includes UI overhead |
| PDF4.dev API (HTML to PDF) | ~200-500ms | Warm browser pool |

These are approximate ranges for a typical text-heavy A4 document on modern hardware. Actual performance varies significantly by document complexity, image content, and available CPU.
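
Because results depend so heavily on the document and the machine, it is worth timing the methods on your own files. Here is a small Python timing harness, as a sketch; the `benchmark` function is an illustrative helper, and the callable you pass in would wrap whichever conversion you want to measure.

```python
import statistics
import time


def benchmark(func, *args, repeats=5):
    """Call `func(*args)` `repeats` times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Stand-in workload; replace with e.g. a function that converts one PDF
print(benchmark(sum, range(100_000), repeats=3))
```

Reporting the median alongside the minimum and maximum makes it easier to spot a slow first run caused by cold caches.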

Common PDF conversion errors

"Error: no such file or directory" (Poppler not installed)

pdftoppm requires Poppler to be installed. On macOS: brew install poppler. On Ubuntu: apt install poppler-utils.
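
A script can check for the Poppler binaries up front and fail with a clearer message. The following Python sketch uses `shutil.which` from the standard library; the helper name `missing_tools` is an assumption, and the default tool list reflects the two Poppler commands pdf2image relies on.

```python
import shutil


def missing_tools(names=("pdftoppm", "pdfinfo")):
    """Return the names of required command-line tools not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing), "(install Poppler first)")
else:
    print("Poppler tools found")
```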

Blank output pages

Usually caused by an encrypted PDF. Run the file through PDF unlock first, then retry the conversion.
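
Before converting, a script can make a rough check for encryption. Encrypted PDFs carry an /Encrypt entry in their trailer, so searching the raw bytes for that key is a cheap heuristic. The Python sketch below does only that; the function name is an assumption, and the check can give false positives if the string happens to appear elsewhere in the file.

```python
def looks_encrypted(pdf_bytes):
    """Heuristic: True if the file has a PDF header and mentions /Encrypt."""
    # Readers tolerate a little leading junk, so look in the first 1024 bytes
    if b"%PDF-" not in pdf_bytes[:1024]:
        raise ValueError("not a PDF file (missing %PDF- header)")
    return b"/Encrypt" in pdf_bytes


plain = b"%PDF-1.7\ntrailer << /Root 1 0 R >>\n%%EOF"
locked = b"%PDF-1.7\ntrailer << /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
print(looks_encrypted(plain), looks_encrypted(locked))
# False True
```

For a real file, pass the result of `open(path, "rb").read()`. A proper PDF library gives a more reliable answer when one is available.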

Blurry images

Low DPI setting. Increase from 72 DPI (one pixel per PDF point; PDFs themselves have no fixed resolution) to 150 DPI for general use or 300 DPI for print quality.

Missing fonts in Word output

LibreOffice substitutes fonts not installed on the system. For consistent output, install the required fonts before running the conversion, or accept the substitution for body text.

Page breaks in wrong places (HTML to PDF)

Add CSS page-break-inside: avoid to elements that should not be split across pages (tables, cards, section blocks). Use page-break-before: always to force a new page before an element.

Summary

PDF conversion covers two fundamentally different problems: extracting content from existing PDFs (to JPG, PNG, or Word) and generating new PDFs from structured content (HTML to PDF).

For extraction, Poppler's pdftoppm is the fastest and most reliable command-line tool. For generation, headless Chromium (via Playwright or an API) is the standard approach. For one-off conversions where privacy matters, browser-based tools that run locally are the safest and most convenient option.

The free tools on PDF4.dev handle the most common cases entirely in your browser.

For programmatic document generation in production, PDF4.dev's API removes the browser infrastructure layer while giving you full control over templates, variables, and PDF format.

Free tools mentioned: PDF to JPG, PDF to PNG, HTML to PDF, Image to PDF, Compress PDF, Merge PDF.
