How to extract images from a PDF (free online and programmatic methods)

Extract embedded images from any PDF, free online or with Python and JavaScript. Pull out JPEGs and PNGs at full quality, or render pages to images.

Extracting images from a PDF means pulling the picture content out of the file and saving it as standalone image files. There are two distinct operations behind that phrase: recovering the original embedded images at their native resolution, and rendering whole pages to images at a resolution you pick.

This guide covers both, with a free browser tool for page rendering and full Python and JavaScript code for embedded extraction.

Extract embedded images vs render pages to images

These are two different jobs, and picking the wrong one gives disappointing results. Extracting embedded images recovers the exact JPEG or PNG objects the author placed in the document. Rendering a page rasterizes everything on it (text, vectors, and images flattened together) into one picture.

Use embedded extraction when you want the original photos back at full quality. Use page rendering when you want a faithful snapshot of how each page looks.

| Property | Extract embedded images | Render pages to images |
| --- | --- | --- |
| What you get | Original photos as separate files | One image per page |
| Resolution | Native (whatever was embedded) | You choose the DPI |
| Includes text and vectors | No, images only | Yes, everything flattened |
| Quality loss | None (byte-exact copy) | Depends on chosen DPI |
| Best tool | PyMuPDF, pikepdf (Python) | pdf-to-png tool, pdfjs-dist |
| Typical use | Recover author photos, scans | Thumbnails, previews, page snapshots |

A quick test tells you which you need. If you want the same photo that was dropped into the document, extract embedded images. If you want a screenshot of the page, render it.

How to extract images from a PDF online (free, no signup)

The fastest way to turn PDF pages into images is PDF4.dev's free converters: PDF to PNG for lossless page snapshots and PDF to JPG for smaller files. Upload your PDF, convert, and download one image per page. Both run entirely in your browser using PDF.js, so your file never reaches a server.

Steps:

  1. Open pdf4.dev/tools/pdf-to-png
  2. Drop your PDF onto the upload area or click to browse
  3. Choose the resolution if you need higher detail
  4. Click Convert and download the images

No account is required. The free tier allows 3 conversions per week.

One thing to know: these tools render each page to an image, so the output is a picture of the whole page, not the individual embedded photos. To recover the original embedded images at their native resolution, use the Python code below. For pulling out specific pages first, the extract pages tool lets you isolate the pages you care about before converting.

How to extract embedded images from a PDF in Python

PyMuPDF (imported as fitz, or as pymupdf in recent releases) is one of the most dependable libraries for embedded image extraction. It lists every image reference on a page and returns the image data plus a matching file extension for each one.

import fitz  # PyMuPDF
 
def extract_images(pdf_path: str, out_dir: str = "images") -> int:
    import os
    os.makedirs(out_dir, exist_ok=True)
 
    doc = fitz.open(pdf_path)
    seen = set()
    count = 0
 
    for page_index in range(doc.page_count):
        page = doc[page_index]
        for img in page.get_images(full=True):
            xref = img[0]
            if xref in seen:  # same image reused on several pages
                continue
            seen.add(xref)
 
            base = doc.extract_image(xref)
            image_bytes = base["image"]
            ext = base["ext"]  # "jpeg", "png", etc.
 
            with open(f"{out_dir}/img_{xref}.{ext}", "wb") as f:
                f.write(image_bytes)
            count += 1
 
    doc.close()
    return count
 
n = extract_images("document.pdf")
print(f"Extracted {n} images")

Install with: pip install pymupdf

page.get_images(full=True) returns one tuple per image reference, where img[0] is the xref (the object number inside the PDF). Deduplicating by xref matters because a logo placed on every page points to a single stored image, and you do not want fifty copies of it.
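The tuple carries more than the xref. The sketch below names its leading fields, assuming the field order PyMuPDF documents for get_images(full=True): xref, smask, width, height, bits per component, colorspace, alternate colorspace, name, and filter. The helper names describe_image_ref and list_image_refs are hypothetical and not part of PyMuPDF.

```python
# Assumed field order of a page.get_images(full=True) tuple (trailing fields ignored).
FIELDS = ("xref", "smask", "width", "height", "bpc",
          "colorspace", "alt_colorspace", "name", "filter")


def describe_image_ref(img: tuple) -> dict:
    """Map the leading fields of a get_images() tuple to readable names."""
    return dict(zip(FIELDS, img))


def list_image_refs(pdf_path: str) -> list[dict]:
    """Return one dict per image reference, tagged with its 1-based page number."""
    import fitz  # PyMuPDF; imported here so describe_image_ref works without it

    refs = []
    with fitz.open(pdf_path) as doc:
        for page_index, page in enumerate(doc):
            for img in page.get_images(full=True):
                info = describe_image_ref(img)
                info["page"] = page_index + 1
                refs.append(info)
    return refs
```

Printing list_image_refs("document.pdf") before extracting is a cheap way to see how many images a file holds, how large they are, and whether any of them carry a soft mask (a non-zero smask value).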

Fix CMYK and transparency before saving

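doc.extract_image() hands back the image data without any color conversion. That preserves the stored encoding where possible, but the result can still be CMYK (print color), and any transparency stays behind in a separate mask object. Load the image into a Pixmap and convert it to RGB to get a PNG that displays correctly. The script below handles the CMYK case; merging a soft mask is covered under the black-background problem later in this guide.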

import fitz
 
def extract_as_png(pdf_path: str, out_dir: str = "images") -> int:
    import os
    os.makedirs(out_dir, exist_ok=True)
 
    doc = fitz.open(pdf_path)
    seen, count = set(), 0
 
    for page in doc:
        for img in page.get_images(full=True):
            xref = img[0]
            if xref in seen:
                continue
            seen.add(xref)
 
            pix = fitz.Pixmap(doc, xref)
 
            # More than 3 color channels (alpha excluded) means CMYK:
            # convert to RGB, because PNG has no CMYK mode
            if pix.n - pix.alpha >= 4:
                pix = fitz.Pixmap(fitz.csRGB, pix)
 
            pix.save(f"{out_dir}/img_{xref}.png")
            count += 1
 
    doc.close()
    return count
 
print(extract_as_png("print-ready.pdf"))

This writes a PNG that standard RGB viewers can open, at the cost of re-encoding (so it is not byte-exact). Use the first script when you need the original image data, and this one when you need a PNG that displays correctly.

pikepdf (alternative)

pikepdf wraps the QPDF library and exposes images through page.images and a PdfImage helper that handles extraction and color conversion for you.

import pikepdf
from pikepdf import PdfImage
 
def extract_with_pikepdf(pdf_path: str, out_dir: str = "images") -> int:
    import os
    os.makedirs(out_dir, exist_ok=True)
 
    pdf = pikepdf.open(pdf_path)
    count = 0
 
    for page_num, page in enumerate(pdf.pages, start=1):
        for name, raw in page.images.items():
            image = PdfImage(raw)
            # Keys look like "/Im0"; drop the slash so it is not read as a path separator.
            # extract_to picks the right extension automatically
            image.extract_to(fileprefix=f"{out_dir}/p{page_num}_{str(name).lstrip('/')}")
            count += 1
 
    pdf.close()
    return count
 
print(extract_with_pikepdf("document.pdf"))

Install with: pip install pikepdf

pikepdf is a strong choice when you also need to inspect or edit the PDF structure, since it gives direct access to the underlying objects.

How to extract images from a PDF in JavaScript and Node.js

The practical approach in JavaScript is to render each page to a canvas with pdfjs-dist and export the canvas as a PNG or JPEG. This produces page images rather than the original embedded objects, which is enough for thumbnails, previews, and most web workflows.

import * as pdfjsLib from "pdfjs-dist";
 
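// The worker file must come from the same pdf.js version as the installed
// pdfjs-dist package, otherwise getDocument() fails with a version mismatch.
pdfjsLib.GlobalWorkerOptions.workerSrc =
  "//cdnjs.cloudflare.com/ajax/libs/pdf.js/4.0.379/pdf.worker.min.mjs";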
 
async function renderPagesToImages(file: File, scale = 2): Promise<Blob[]> {
  const data = await file.arrayBuffer();
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const blobs: Blob[] = [];
 
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale }); // scale 2 = ~144 DPI
 
    const canvas = document.createElement("canvas");
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    const context = canvas.getContext("2d")!;
 
    await page.render({ canvasContext: context, viewport }).promise;
 
    const blob = await new Promise<Blob>((resolve) =>
      canvas.toBlob((b) => resolve(b!), "image/png")
    );
    blobs.push(blob);
  }
 
  return blobs;
}

Raise scale for sharper output: scale: 2 is roughly 144 DPI, scale: 3 roughly 216 DPI. Higher values produce larger files and slower rendering.
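Those DPI figures follow from the PDF default of 72 units per inch, so scale times 72 gives the effective DPI. A small sketch of the arithmetic, in Python for consistency with the extraction scripts; the function names are illustrative only, and a renderer may round pixel sizes slightly differently:

```python
POINTS_PER_INCH = 72  # default PDF user-space unit


def scale_to_dpi(scale: float) -> float:
    """Effective DPI of a page rendered at the given scale factor."""
    return scale * POINTS_PER_INCH


def dpi_to_scale(dpi: float) -> float:
    """Scale factor needed to hit a target DPI."""
    return dpi / POINTS_PER_INCH


def rendered_size(width_pt: float, height_pt: float, scale: float) -> tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at the given scale."""
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points.
print(scale_to_dpi(2))                  # 144
print(rendered_size(612, 792, 2))       # (1224, 1584)
print(round(dpi_to_scale(300), 2))      # 4.17
```

Pixel count grows with the square of the scale, which is why file size and render time climb quickly past scale 3.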

For byte-exact recovery of the original embedded objects in a Node.js pipeline, the most dependable route is to shell out to a Python step using PyMuPDF, since pdfjs-dist does not expose embedded image objects as cleanly. Note that pdf-lib, the popular JavaScript library, embeds images into new PDFs but does not extract them from existing ones.
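The Python step in such a pipeline can be a small command-line script that Node.js runs as a child process. The sketch below is one possible shape, not a prescribed interface: the --out flag, the JSON output format, and the function names are all assumptions made for illustration.

```python
import argparse
import json
import os
import sys


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Extract embedded images from a PDF")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--out", default="images", help="output directory")
    return parser.parse_args(argv)


def extract(pdf_path: str, out_dir: str) -> list[str]:
    """Write each distinct embedded image to out_dir and return the file paths."""
    import fitz  # PyMuPDF; imported lazily so argument parsing works without it

    os.makedirs(out_dir, exist_ok=True)
    written, seen = [], set()
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for img in page.get_images(full=True):
                xref = img[0]
                if xref in seen:  # image reused on several pages
                    continue
                seen.add(xref)
                base = doc.extract_image(xref)
                path = os.path.join(out_dir, f"img_{xref}.{base['ext']}")
                with open(path, "wb") as f:
                    f.write(base["image"])
                written.append(path)
    return written


def main(argv: list[str]) -> int:
    args = parse_args(argv)
    # A single JSON document on stdout is easy to parse on the Node.js side.
    json.dump({"files": extract(args.pdf, args.out)}, sys.stdout)
    return 0
```

To run it as a script, call sys.exit(main(sys.argv[1:])) under an if __name__ == "__main__": guard, then invoke it from Node.js with child_process and parse stdout as JSON.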

Comparison: PDF image extraction libraries

| Library | Language | Embedded extraction | Page rendering | Best for |
| --- | --- | --- | --- | --- |
| PyMuPDF (fitz) | Python | Excellent | Excellent | Most extraction jobs |
| pikepdf | Python | Very good | No | Structure-aware editing |
| pypdf | Python | Basic | No | Simple standard PDFs |
| pdfjs-dist | JavaScript | Limited | Excellent | Browser previews |
| pdf-lib | JavaScript | No (embed only) | No | Building new PDFs |

For pure embedded extraction, PyMuPDF is the default recommendation. For browser-side work where you only need page snapshots, pdfjs-dist is the right tool.

Common image extraction problems

Problem: extracted image looks inverted or wrong color

This usually means the image is stored in a CMYK color space, which is common in print-ready PDFs. PNG has no CMYK mode and many viewers mishandle CMYK JPEGs, so the raw data displays with inverted or shifted colors. Convert with a PyMuPDF Pixmap (fitz.Pixmap(fitz.csRGB, pix)) before saving, as shown in the CMYK example above.

Problem: image has a black or missing background

The image uses a soft mask, where transparency is stored as a separate grayscale image object. In PyMuPDF, the second item of each page.get_images() tuple is the xref of that mask (0 when there is none). Load the base image and the mask as two Pixmaps, then combine them with fitz.Pixmap(base, mask) to get an image with an alpha channel.
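A minimal sketch of that merge, assuming a PyMuPDF version whose Pixmap constructor accepts a (source, mask) pair and treating img[1] as the soft-mask xref. The names choose_strategy and load_image are hypothetical helpers, and falling back to the unmasked image when the merge fails is a design assumption, not library behavior.

```python
def choose_strategy(smask_xref: int, n_components: int, has_alpha: bool) -> str:
    """Pick the processing an image needs before it can be saved as PNG."""
    if smask_xref > 0:
        return "merge-mask"       # transparency lives in a separate mask image
    if n_components - int(has_alpha) >= 4:
        return "convert-rgb"      # CMYK: PNG cannot store it
    return "save-as-is"


def load_image(doc, img: tuple):
    """Return a PNG-ready Pixmap for one page.get_images(full=True) tuple."""
    import fitz  # PyMuPDF; imported here so choose_strategy runs without it

    xref, smask = img[0], img[1]
    pix = fitz.Pixmap(doc, xref)
    strategy = choose_strategy(smask, pix.n, bool(pix.alpha))

    if strategy == "merge-mask":
        # The base pixmap must not carry its own alpha channel.
        base = fitz.Pixmap(pix, 0) if pix.alpha else pix
        if base.n >= 4:
            base = fitz.Pixmap(fitz.csRGB, base)  # CMYK base: convert to RGB first
        mask = fitz.Pixmap(doc.extract_image(smask)["image"])
        try:
            pix = fitz.Pixmap(base, mask)  # base + mask -> pixmap with alpha
        except Exception:
            pix = base  # assumption: keep the unmasked image if the merge fails
    elif strategy == "convert-rgb":
        pix = fitz.Pixmap(fitz.csRGB, pix)
    return pix
```

Inside the extraction loop, load_image(doc, img).save(f"{out_dir}/img_{img[0]}.png") would then replace the direct Pixmap construction.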

Problem: one logical image comes out as many tiles

Some tools split large images into a grid of smaller tiles to fit memory limits. Each tile is a separate object, so naive extraction yields fragments. Detect this by checking image positions with page.get_image_info(), then stitch tiles that are adjacent on the page.
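One way to approach this is to group image bounding boxes that touch, then render each merged region as a single picture. Note the swap in technique: the sketch renders the combined area from the page instead of pasting the tile bytes together, so anything drawn over the tiles is included too. The helper names and the 1-point tolerance are assumptions, and bounding-box grouping is only a heuristic.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page points


def touches(a: Box, b: Box, tol: float = 1.0) -> bool:
    """True when two boxes overlap or lie within tol points of each other."""
    return (a[0] <= b[2] + tol and b[0] <= a[2] + tol and
            a[1] <= b[3] + tol and b[1] <= a[3] + tol)


def group_tiles(boxes: list[Box], tol: float = 1.0) -> list[Box]:
    """Merge touching boxes and return one bounding box per group."""
    groups: list[Box] = []
    for box in boxes:
        merged = box
        changed = True
        while changed:  # re-check: a grown box may now reach other groups
            changed = False
            for g in groups:
                if touches(merged, g, tol):
                    merged = (min(merged[0], g[0]), min(merged[1], g[1]),
                              max(merged[2], g[2]), max(merged[3], g[3]))
                    groups.remove(g)
                    changed = True
                    break
        groups.append(merged)
    return groups


def render_stitched(pdf_path: str, page_number: int, out_prefix: str, dpi: int = 300) -> list[str]:
    """Render each group of adjacent image tiles on a page to one PNG."""
    import fitz  # PyMuPDF; imported here so the grouping helpers run without it

    written = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_number - 1]
        boxes = [tuple(info["bbox"]) for info in page.get_image_info()]
        for i, box in enumerate(group_tiles(boxes)):
            path = f"{out_prefix}_{i}.png"
            page.get_pixmap(clip=fitz.Rect(box), dpi=dpi).save(path)
            written.append(path)
    return written
```

Pick a dpi close to the tiles' native resolution so the rendered result is neither upsampled nor noticeably downsampled.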

Problem: the picture you want is not extracted at all

It is probably a vector graphic (a logo or chart drawn with lines and curves), not a raster image, so image extraction skips it. To keep it as vectors, export the page to SVG with page.get_svg_image(). To get a raster copy, crop the region and render it to a PNG at high DPI.
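Both routes can be sketched with PyMuPDF as follows. The function names, the output file naming, and the 300 DPI default are assumptions for illustration; the clip rectangle is given in page points and is trimmed to the page before rendering.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page points


def clamp_to_page(clip: Box, page: Box) -> Box:
    """Intersect a requested clip rectangle with the page rectangle."""
    x0, y0 = max(clip[0], page[0]), max(clip[1], page[1])
    x1, y1 = min(clip[2], page[2]), min(clip[3], page[3])
    if x0 >= x1 or y0 >= y1:
        raise ValueError("clip does not overlap the page")
    return (x0, y0, x1, y1)


def export_graphic(pdf_path: str, page_number: int, clip: Box,
                   out_prefix: str, dpi: int = 300) -> None:
    """Save a page as SVG (vector) and one region of it as PNG (raster)."""
    import fitz  # PyMuPDF; imported here so clamp_to_page runs without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_number - 1]

        # Vector copy: the whole page as SVG markup.
        with open(f"{out_prefix}.svg", "w", encoding="utf-8") as f:
            f.write(page.get_svg_image())

        # Raster copy: only the requested region, at the chosen DPI.
        box = clamp_to_page(clip, tuple(page.rect))
        page.get_pixmap(clip=fitz.Rect(box), dpi=dpi).save(f"{out_prefix}.png")
```

For example, export_graphic("report.pdf", 1, (72, 72, 300, 200), "logo") would write logo.svg for the full first page and logo.png for the clipped region; the file name and coordinates here are placeholders.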

Generating image-rich PDFs with PDF4.dev

PDF4.dev focuses on the other direction: generating clean, image-rich PDFs from HTML rather than parsing existing files. When you place an <img> in your template, Playwright embeds it as a proper image object, so the resulting PDF stays text-searchable and the images stay at the resolution you supplied.

This matters for round-trip workflows. A PDF generated from well-structured HTML extracts cleanly later, because each image is a single, correctly-encoded object rather than a flattened page raster. For details on embedding images and fonts, see the guide on adding custom fonts to a PDF and the complete HTML to PDF guide.

To verify the image objects in any PDF you generate, run it through the PDF to PNG tool for a page-level check, or the PyMuPDF script above for an object-level audit.

Which approach should you use?

| Goal | Use |
| --- | --- |
| Recover original author photos at full quality | PyMuPDF embedded extraction |
| Save each page as a picture | PDF to PNG or PDF to JPG |
| Get images out of a scanned PDF | Embedded extraction (one image per page) |
| Read the text inside a scan | OCR, not image extraction |
| Keep a logo or chart as vectors | PyMuPDF SVG export |
| Generate a new image-rich PDF | PDF4.dev HTML to PDF API |

Summary

Extracting images from a PDF is two jobs, not one. To recover the original embedded photos at native quality, use PyMuPDF in Python: list references with page.get_images(), then pull bytes with doc.extract_image(), deduplicating by xref and converting CMYK through a Pixmap. pikepdf is a solid alternative.

To save whole pages as images, render them: use PDF4.dev's free PDF to PNG and PDF to JPG tools in the browser, or pdfjs-dist for programmatic page rendering. Scanned PDFs extract cleanly as one image per page, but reading their text needs OCR instead.

Free tools mentioned:

PDF to PNG, PDF to JPG, Extract Pages
