How do I extract images from a PDF for free?

To save the visible pages of a PDF as images, use a browser-based tool like PDF4.dev's free PDF to PNG converter at pdf4.dev/tools/pdf-to-png. Upload your PDF, convert, and download each page as a PNG. The conversion runs in your browser, so the file never leaves your device. To pull out the original embedded photos at full resolution, use a script with PyMuPDF (Python).

What is the difference between extracting embedded images and rendering pages to images?

Extracting embedded images pulls the original raster objects (the JPEGs and PNGs the author placed in the document) at their native resolution. Rendering pages to images rasterizes the whole page, text and graphics together, into a single picture at a resolution you choose. Use embedded extraction to recover original photos, and page rendering to get a snapshot of each page.

Can I extract images from a PDF in Python?

Yes. PyMuPDF (imported as fitz) is the most reliable option. Call page.get_images() to list the image references on a page, then doc.extract_image(xref) to get the raw bytes and file extension of each embedded image. pikepdf is a good alternative that exposes images through page.images and a PdfImage helper.

Does extracting images from a PDF reduce their quality?

No, if you extract the embedded objects. PyMuPDF and pikepdf copy JPEG data out unchanged, so a 4000px JPEG comes out as a 4000px JPEG with no recompression. Images stored in other encodings are rewritten losslessly, typically as PNG. Quality only drops when you render a page to an image, because that step rasterizes the page at a fixed DPI.

How do I extract images from a PDF in JavaScript or Node.js?

The practical path in JavaScript is to render each page to a canvas with pdfjs-dist (page.render) and export the canvas as a PNG or JPEG. True extraction of the original embedded objects is harder in pdfjs-dist than in Python, so for byte-exact embedded image recovery, use PyMuPDF or pikepdf.

How do I extract all images from a PDF at once?

Loop over every page, collect the image references, and write each one to disk. With PyMuPDF, loop over range(doc.page_count), call page.get_images() per page, deduplicate by xref so shared images are saved once, and write the bytes returned by doc.extract_image(). The full script is in this guide.

Can I extract images from a scanned PDF?

Yes, and it is usually straightforward. A scanned PDF typically stores one full-page image per page, so extracting that image gives you back the original scan. If you instead want the text inside the scan, you need OCR, not image extraction.
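One way to check whether a page is a scan before extracting is to test whether a single image covers almost the whole page. The sketch below works on plain rectangles; the function name looks_like_scan and the 90% threshold are illustrative assumptions, and in practice the rectangles would come from a library call such as PyMuPDF's page.get_image_info().

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def area(rect: Rect) -> float:
    """Area of a rectangle, treating inverted rectangles as empty."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def looks_like_scan(page_rect: Rect, image_rects: Iterable[Rect],
                    min_coverage: float = 0.9) -> bool:
    """Return True when exactly one image covers most of the page.

    Hypothetical heuristic: the 0.9 threshold is an assumption, not a standard.
    """
    rects = list(image_rects)
    if len(rects) != 1 or area(page_rect) == 0:
        return False
    return area(rects[0]) / area(page_rect) >= min_coverage


letter = (0, 0, 612, 792)
print(looks_like_scan(letter, [(0, 0, 612, 792)]))    # one full-page image
print(looks_like_scan(letter, [(50, 50, 250, 200)]))  # small photo on a text page
```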

How do I extract vector graphics (logos, charts) from a PDF?

Vector graphics are drawing instructions, not raster images, so image extraction tools skip them. To keep a logo or chart as vectors, extract the page as SVG with PyMuPDF (page.get_svg_image()). To get a raster copy, render the page or crop region to a PNG at high DPI.

Is there an API to extract images from PDFs programmatically?

PDF4.dev focuses on generating PDFs from HTML rather than parsing existing ones, so image extraction is best done with the open-source libraries in this guide (PyMuPDF, pikepdf). PDF4.dev's free PDF to PNG and PDF to JPG tools cover the page-rendering case directly in the browser.

How to extract images from a PDF (free online and programmatic methods)

Why are some extracted images distorted or wrongly colored?

The most common cause is a CMYK color space. PDFs prepared for print often store images in CMYK, but PNG cannot hold CMYK and many viewers mishandle CMYK JPEGs. Convert with a PyMuPDF Pixmap (fitz.Pixmap(fitz.csRGB, pix)) before saving. Other causes are soft masks (transparency stored as a separate object) and images split into tiles, which need to be recombined.

Extract embedded images from any PDF, free online or with Python and JavaScript. Pull out JPEGs and PNGs at full quality, or render pages to images.

benoitded · June 27, 2026 · 10 min read

Extracting images from a PDF means pulling the picture content out of the file and saving it as standalone image files. There are two distinct operations behind that phrase: recovering the original embedded images at their native resolution, and rendering whole pages to images at a resolution you pick.

This guide covers both, with a free browser tool for page rendering and full Python and JavaScript code for embedded extraction.

Extract embedded images vs render pages to images

These are two different jobs, and picking the wrong one gives disappointing results. Extracting embedded images recovers the exact JPEG or PNG objects the author placed in the document. Rendering a page rasterizes everything on it (text, vectors, and images flattened together) into one picture.

Use embedded extraction when you want the original photos back at full quality. Use page rendering when you want a faithful snapshot of how each page looks.

Property	Extract embedded images	Render pages to images
What you get	Original photos as separate files	One image per page
Resolution	Native (whatever was embedded)	You choose the DPI
Includes text and vectors	No, images only	Yes, everything flattened
Quality loss	None (byte-exact copy)	Depends on chosen DPI
Best tool	PyMuPDF, pikepdf (Python)	pdf-to-png tool, pdfjs-dist
Typical use	Recover author photos, scans	Thumbnails, previews, page snapshots

A quick test tells you which you need. If you want the same photo that was dropped into the document, extract embedded images. If you want a screenshot of the page, render it.
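To make "you choose the DPI" concrete, here is a small sketch of the arithmetic behind page rendering. PDF page sizes are measured in points (72 per inch), so the pixel size of a rendered page follows directly from the DPI. The helper names are made up for illustration, and the 612 x 792 point page is a US Letter example.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def raw_rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an 8-bit RGB bitmap (3 bytes per pixel)."""
    return width_px * height_px * 3


for dpi in (72, 150, 300):
    w, h = rendered_size(612, 792, dpi)
    mb = raw_rgb_bytes(w, h) / 1e6
    print(f"{dpi} DPI -> {w} x {h} px, about {mb:.1f} MB uncompressed")
```

Doubling the DPI quadruples the pixel count, which is why high-DPI page renders grow quickly while embedded extraction keeps whatever size the author stored.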

How to extract images from a PDF online (free, no signup)

The fastest way to turn PDF pages into images is PDF4.dev's free converters: PDF to PNG for lossless page snapshots and PDF to JPG for smaller files. Upload your PDF, convert, and download one image per page. Both run entirely in your browser using PDF.js, so your file never reaches a server.

Steps:

1. Open pdf4.dev/tools/pdf-to-png
2. Drop your PDF onto the upload area or click to browse
3. Choose the resolution if you need higher detail
4. Click Convert and download the images

No account is required. The free tier allows 3 conversions per week.

One thing to know: these tools render each page to an image, so the output is a picture of the whole page, not the individual embedded photos. To recover the original embedded images at their native resolution, use the Python code below. For pulling out specific pages first, the extract pages tool lets you isolate the pages you care about before converting.

How to extract embedded images from a PDF in Python

PyMuPDF (imported as fitz) is the most reliable library for embedded image extraction in 2026. It lists every image reference on a page and returns the raw bytes plus the correct file extension for each one.

PyMuPDF (recommended)

import os

import fitz  # PyMuPDF


def extract_images(pdf_path: str, out_dir: str = "images") -> int:
    """Save every embedded image once and return how many were written."""
    os.makedirs(out_dir, exist_ok=True)

    doc = fitz.open(pdf_path)
    seen = set()  # xrefs that have already been written
    count = 0

    for page_index in range(doc.page_count):
        page = doc[page_index]
        for img in page.get_images(full=True):
            xref = img[0]
            if xref in seen:  # same image reused on several pages
                continue
            seen.add(xref)

            base = doc.extract_image(xref)
            image_bytes = base["image"]
            ext = base["ext"]  # "jpeg", "png", etc.

            with open(os.path.join(out_dir, f"img_{xref}.{ext}"), "wb") as f:
                f.write(image_bytes)
            count += 1

    doc.close()
    return count


n = extract_images("document.pdf")
print(f"Extracted {n} images")

Install with: pip install pymupdf

page.get_images(full=True) returns one tuple per image reference, where img[0] is the xref (the object number inside the PDF). Deduplicating by xref matters because a logo placed on every page points to a single stored image, and you do not want fifty copies of it.
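The deduplication step can be tried without a PDF at hand. The sketch below runs on made-up tuples shaped like the ones page.get_images(full=True) returns; the field order (xref, smask, width, height, ...) follows my reading of the PyMuPDF documentation, and both the helper name unique_images and the min_side filter for skipping tiny icons are assumptions added for illustration.

```python
from typing import Dict, List, Sequence


def unique_images(pages: Sequence[Sequence[tuple]], min_side: int = 32) -> Dict[int, dict]:
    """Collapse per-page image tuples into one entry per xref.

    pages: one list of image tuples per page, in page order.
    min_side: images narrower or shorter than this are skipped (hypothetical filter).
    """
    found: Dict[int, dict] = {}
    for page_number, images in enumerate(pages, start=1):
        for img in images:
            xref, smask, width, height = img[0], img[1], img[2], img[3]
            if min(width, height) < min_side:
                continue  # bullets, spacer pixels, tiny icons
            entry = found.setdefault(
                xref, {"size": (width, height), "smask": smask, "pages": []}
            )
            entry["pages"].append(page_number)
    return found


# Made-up sample: a logo (xref 12) on both pages, a photo, and an 8x8 bullet.
sample: List[List[tuple]] = [
    [(12, 0, 400, 120, 8, "DeviceRGB", "", "Im0", "DCTDecode", 0),
     (15, 16, 1600, 1200, 8, "DeviceRGB", "", "Im1", "FlateDecode", 0)],
    [(12, 0, 400, 120, 8, "DeviceRGB", "", "Im0", "DCTDecode", 0),
     (20, 0, 8, 8, 8, "DeviceGray", "", "Im2", "FlateDecode", 0)],
]

for xref, info in unique_images(sample).items():
    print(xref, info["size"], "on pages", info["pages"])
```

Tracking the pages alongside each xref is optional, but it makes it easy to name files by first appearance or to report where a shared logo was used.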

Fix CMYK and transparency before saving

doc.extract_image() keeps the image data as stored, which preserves quality but means the result can still be CMYK (print color) or rely on a separate transparency mask. Convert through a Pixmap to get a clean RGB or RGBA PNG.

import os

import fitz  # PyMuPDF


def extract_as_png(pdf_path: str, out_dir: str = "images") -> int:
    """Save every embedded image once as a PNG, converting CMYK to RGB."""
    os.makedirs(out_dir, exist_ok=True)

    doc = fitz.open(pdf_path)
    seen, count = set(), 0

    for page in doc:
        for img in page.get_images(full=True):
            xref = img[0]
            if xref in seen:
                continue
            seen.add(xref)

            pix = fitz.Pixmap(doc, xref)

            # Four or more color components (for example CMYK): convert to RGB.
            # pix.alpha is subtracted so an alpha channel is not counted as a color.
            if pix.n - pix.alpha >= 4:
                pix = fitz.Pixmap(fitz.csRGB, pix)

            pix.save(os.path.join(out_dir, f"img_{xref}.png"))
            count += 1

    doc.close()
    return count


print(extract_as_png("print-ready.pdf"))

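This writes a viewable RGB PNG even when the source image is CMYK, at the cost of re-encoding (so it is not byte-exact). It does not merge soft masks; those need the extra step described under common problems. Use the first script when you need the original file, and this one when you need a PNG that displays correctly.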
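Choosing between the two scripts can be done per image rather than per document. The following is a minimal sketch of that decision, assuming the dictionary returned by doc.extract_image() carries a colorspace entry (the number of color components) and an smask entry (the xref of a separate mask, 0 if none), as I understand the PyMuPDF documentation. The helper name and the sample dictionaries are made up.

```python
def needs_png_conversion(info: dict) -> bool:
    """Decide whether an extracted image should go through the Pixmap route.

    info: a dict shaped like the result of doc.extract_image(xref).
    """
    is_cmyk = info.get("colorspace", 0) >= 4  # 4 components usually means CMYK
    has_soft_mask = info.get("smask", 0) > 0  # transparency stored separately
    return is_cmyk or has_soft_mask


# Made-up metadata for three images (the "image" bytes are omitted here).
samples = {
    "photo.jpeg": {"ext": "jpeg", "colorspace": 3, "smask": 0},
    "print_ad.jpeg": {"ext": "jpeg", "colorspace": 4, "smask": 0},
    "logo.png": {"ext": "png", "colorspace": 3, "smask": 41},
}

for name, info in samples.items():
    route = "convert via Pixmap" if needs_png_conversion(info) else "write raw bytes"
    print(f"{name}: {route}")
```

With this check in the loop, ordinary RGB images keep their original bytes and only the problematic ones are re-encoded.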

pikepdf (alternative)

pikepdf wraps the QPDF library and exposes images through page.images and a PdfImage helper that handles extraction and color conversion for you.

import os

import pikepdf
from pikepdf import PdfImage


def extract_with_pikepdf(pdf_path: str, out_dir: str = "images") -> int:
    """Extract the images referenced by each page and return how many were written."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0

    with pikepdf.open(pdf_path) as pdf:
        for page_num, page in enumerate(pdf.pages, start=1):
            for name, raw in page.images.items():
                image = PdfImage(raw)
                # Resource names look like "/Im0"; drop the slash so it is
                # not treated as a directory separator in the file path.
                safe_name = str(name).lstrip("/")
                # extract_to picks the right extension automatically
                image.extract_to(
                    fileprefix=os.path.join(out_dir, f"p{page_num}_{safe_name}")
                )
                count += 1

    return count


print(extract_with_pikepdf("document.pdf"))

Install with: pip install pikepdf

pikepdf is a strong choice when you also need to inspect or edit the PDF structure, since it gives direct access to the underlying objects.

How to extract images from a PDF in JavaScript and Node.js

The practical approach in JavaScript is to render each page to a canvas with pdfjs-dist and export the canvas as a PNG or JPEG. This produces page images rather than the original embedded objects, which is enough for thumbnails, previews, and most web workflows.

import * as pdfjsLib from "pdfjs-dist";
 
pdfjsLib.GlobalWorkerOptions.workerSrc =
  "//cdnjs.cloudflare.com/ajax/libs/pdf.js/4.0.379/pdf.worker.min.mjs";
 
async function renderPagesToImages(file: File, scale = 2): Promise<Blob[]> {
  const data = await file.arrayBuffer();
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const blobs: Blob[] = [];
 
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale }); // scale 2 = ~144 DPI
 
    const canvas = document.createElement("canvas");
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    const context = canvas.getContext("2d")!;
 
    await page.render({ canvasContext: context, viewport }).promise;
 
    const blob = await new Promise<Blob>((resolve) =>
      canvas.toBlob((b) => resolve(b!), "image/png")
    );
    blobs.push(blob);
  }
 
  return blobs;
}

Raise scale for sharper output: scale: 2 is roughly 144 DPI, scale: 3 roughly 216 DPI. Higher values produce larger files and slower rendering.
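The relationship between scale and DPI is plain arithmetic, sketched here in Python to match the other examples in this guide. It assumes that scale 1 corresponds to 72 DPI (one pixel per PDF point), which is what the figures above imply; the helper names are made up.

```python
POINTS_PER_INCH = 72  # scale 1 maps one PDF point to one pixel


def scale_for_dpi(dpi: float) -> float:
    """Viewport scale that yields the requested DPI."""
    return dpi / POINTS_PER_INCH


def scale_for_width(page_width_pt: float, target_width_px: int) -> float:
    """Viewport scale that makes the rendered page target_width_px wide."""
    return target_width_px / page_width_pt


print(scale_for_dpi(144))          # 2.0
print(scale_for_dpi(216))          # 3.0
print(scale_for_width(612, 1224))  # 2.0 for a US Letter page
```

Working backwards from a target width is handy for thumbnails, where the output size matters more than the nominal DPI.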

For byte-exact recovery of the original embedded objects in a Node.js pipeline, the most dependable route is to shell out to a Python step using PyMuPDF, since pdfjs-dist does not expose embedded image objects as cleanly. Note that pdf-lib, the popular JavaScript library, embeds images into new PDFs but does not extract them from existing ones.

Comparison: PDF image extraction libraries

Library	Language	Embedded extraction	Page rendering	Best for
PyMuPDF (fitz)	Python	Excellent	Excellent	Most extraction jobs
pikepdf	Python	Very good	No	Structure-aware editing
pypdf	Python	Basic	No	Simple standard PDFs
pdfjs-dist	JavaScript	Limited	Excellent	Browser previews
pdf-lib	JavaScript	No (embed only)	No	Building new PDFs

For pure embedded extraction, PyMuPDF is the default recommendation. For browser-side work where you only need page snapshots, pdfjs-dist is the right tool.
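Since the table lists PyMuPDF for page rendering as well, here is a minimal sketch of the Python counterpart to the pdfjs-dist snippet. It assumes a PyMuPDF version that provides page.get_pixmap() and fitz.Matrix; the function name, output naming scheme, and default DPI are illustrative choices, not something prescribed by the library.

```python
from pathlib import Path
from typing import List


def render_pages(pdf_path: str, out_dir: str = "pages", dpi: int = 150) -> List[Path]:
    """Render every page of a PDF to a PNG file and return the written paths."""
    import fitz  # PyMuPDF, imported lazily so this file loads without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []

    with fitz.open(pdf_path) as doc:
        zoom = dpi / 72  # PDF points are 1/72 inch, so this maps to the target DPI
        for page in doc:
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
            target = out / f"page_{page.number + 1:03d}.png"
            pix.save(str(target))
            written.append(target)

    return written


# Example call (assumes a file named document.pdf exists):
# paths = render_pages("document.pdf", dpi=200)
```

This produces page snapshots, not the embedded originals, so it belongs in the "Render pages to images" column of the comparison.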

Common image extraction problems

Problem: extracted image looks inverted or wrong color

This is usually a CMYK color space, common in print-ready PDFs. PNG cannot store CMYK at all, and many viewers mishandle CMYK JPEGs, so the extracted file displays with wrong or inverted colors. Convert with a PyMuPDF Pixmap (fitz.Pixmap(fitz.csRGB, pix)) before saving, as shown in the CMYK example above.
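To see why CMYK data looks wrong when treated as RGB, it helps to look at the conversion itself. The sketch below is the naive formula without any ICC color profile, so the colors are approximate; it is not what PyMuPDF does internally. The inverted flag is an assumption covering CMYK JPEGs that store inverted channel values, which is one way the "inverted" look arises.

```python
from typing import Tuple


def cmyk_to_rgb(c: int, m: int, y: int, k: int, inverted: bool = False) -> Tuple[int, int, int]:
    """Naive conversion of 0-255 CMYK values to 0-255 RGB (no ICC profile)."""
    if inverted:
        # Some files store 255 - value for every channel.
        c, m, y, k = 255 - c, 255 - m, 255 - y, 255 - k
    r = round(255 * (1 - c / 255) * (1 - k / 255))
    g = round(255 * (1 - m / 255) * (1 - k / 255))
    b = round(255 * (1 - y / 255) * (1 - k / 255))
    return r, g, b


print(cmyk_to_rgb(0, 255, 255, 0))                   # (255, 0, 0)
print(cmyk_to_rgb(0, 0, 0, 255))                     # (0, 0, 0)
print(cmyk_to_rgb(255, 0, 0, 255, inverted=True))    # (255, 0, 0)
```

A CMYK pixel has four values where RGB has three, so a viewer that ignores the color space reads the wrong channels entirely; converting before saving avoids that.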

Problem: image has a black or missing background

The image uses a soft mask, where transparency is stored as a separate grayscale object. In PyMuPDF, the second item of each page.get_images() tuple is the xref of that mask (0 when there is none). Build a Pixmap for the base image and another for the mask, then combine them with fitz.Pixmap(base, mask). page.get_image_info() helps you inspect how images are placed before extracting.
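Conceptually, merging a soft mask means using the mask's gray values as the alpha channel of the base image. The sketch below shows that idea on raw bytes only; it assumes 8-bit samples and a mask with the same pixel dimensions as the image, which real PDFs do not guarantee, and it is not PyMuPDF's implementation.

```python
def apply_soft_mask(rgb: bytes, mask: bytes) -> bytes:
    """Interleave RGB pixels with an 8-bit mask to produce RGBA bytes."""
    if len(rgb) != 3 * len(mask):
        raise ValueError("mask must contain exactly one byte per RGB pixel")
    out = bytearray()
    for i, alpha in enumerate(mask):
        out += rgb[3 * i: 3 * i + 3]  # copy R, G, B
        out.append(alpha)             # mask value becomes the alpha channel
    return bytes(out)


rgb = bytes([255, 0, 0, 0, 0, 255])  # two pixels: red, blue
mask = bytes([255, 0])               # first opaque, second fully transparent
print(list(apply_soft_mask(rgb, mask)))  # [255, 0, 0, 255, 0, 0, 255, 0]
```

Without the mask, the transparent pixels keep whatever color the base image stores underneath, which is often black; that is the black background in the extracted file.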

Problem: one logical image comes out as many tiles

Some tools split large images into a grid of smaller tiles to fit memory limits. Each tile is a separate object, so naive extraction yields fragments. Detect this by checking image positions with page.get_image_info(), then stitch tiles that are adjacent on the page.
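The detection step can be sketched on bounding boxes alone. The code below groups rectangles that overlap or share an edge and reports the box each group spans. It assumes the rectangles come from something like the bbox entries of page.get_image_info(); the half-point tolerance and the helper names are illustrative, and the actual pixel stitching is left out.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def touches(a: Rect, b: Rect, tol: float = 0.5) -> bool:
    """True when two rectangles overlap or are within tol points of each other."""
    return (a[0] <= b[2] + tol and b[0] <= a[2] + tol and
            a[1] <= b[3] + tol and b[1] <= a[3] + tol)


def group_tiles(rects: Sequence[Rect], tol: float = 0.5) -> List[List[int]]:
    """Return lists of indexes; each list is one cluster of adjacent rectangles."""
    groups: List[List[int]] = []
    for i, rect in enumerate(rects):
        linked = [g for g in groups if any(touches(rect, rects[j], tol) for j in g)]
        merged = [i]
        for g in linked:  # the new rectangle may bridge several groups
            merged.extend(g)
            groups.remove(g)
        groups.append(sorted(merged))
    return groups


def bounding_box(rects: Sequence[Rect]) -> Rect:
    """Smallest rectangle containing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


# Made-up layout: a 2x2 grid of tiles plus one unrelated image.
boxes = [(0, 0, 100, 100), (100, 0, 200, 100),
         (0, 100, 100, 200), (100, 100, 200, 200),
         (300, 300, 350, 350)]

for group in group_tiles(boxes):
    print(group, bounding_box([boxes[i] for i in group]))
```

A group with more than one member is a candidate for stitching; the bounding box gives the canvas size and each tile's offset within it.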

Problem: the picture you want is not extracted at all

It is probably a vector graphic (a logo or chart drawn with lines and curves), not a raster image, so image extraction skips it. To keep it as vectors, export the page to SVG with page.get_svg_image(). To get a raster copy, crop the region and render it to a PNG at high DPI.
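Both options can be combined in one helper. This is a minimal sketch, assuming a PyMuPDF version that provides page.get_svg_image() and page.get_pixmap() with a clip argument; the function name, file naming, and the default 300 DPI are illustrative choices, and the clip rectangle is in PDF points.

```python
from pathlib import Path
from typing import Optional, Tuple


def export_page_graphics(pdf_path: str, page_index: int = 0,
                         clip: Optional[Tuple[float, float, float, float]] = None,
                         out_dir: str = "vectors", dpi: int = 300) -> Tuple[Path, Path]:
    """Write one page as SVG, plus a PNG of the whole page or of a cropped region."""
    import fitz  # PyMuPDF, imported lazily so this file loads without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with fitz.open(pdf_path) as doc:
        page = doc[page_index]

        # Vector copy: drawing instructions stay as paths in the SVG.
        svg_path = out / f"page_{page_index + 1}.svg"
        svg_path.write_text(page.get_svg_image(), encoding="utf-8")

        # Raster copy: render only the requested region at high DPI.
        zoom = dpi / 72
        region = fitz.Rect(*clip) if clip else page.rect
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=region)
        png_path = out / f"page_{page_index + 1}_crop.png"
        pix.save(str(png_path))

    return svg_path, png_path


# Example call (assumes document.pdf exists and the logo sits in this region):
# export_page_graphics("document.pdf", page_index=0, clip=(50, 50, 250, 150))
```

The SVG contains the whole page, so isolating a single logo still means trimming the SVG afterwards or relying on the cropped PNG.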

Generating image-rich PDFs with PDF4.dev

PDF4.dev focuses on the other direction: generating clean, image-rich PDFs from HTML rather than parsing existing files. When you place an <img> in your template, Playwright embeds it as a proper image object, so the resulting PDF stays text-searchable and the images stay at the resolution you supplied.

This matters for round-trip workflows. A PDF generated from well-structured HTML extracts cleanly later, because each image is a single, correctly-encoded object rather than a flattened page raster. For details on embedding images and fonts, see the guide on adding custom fonts to a PDF and the complete HTML to PDF guide.

To verify the image objects in any PDF you generate, run it through the PDF to PNG tool for a page-level check, or the PyMuPDF script above for an object-level audit.

Which approach should you use?

Goal	Use
Recover original author photos at full quality	PyMuPDF embedded extraction
Save each page as a picture	PDF to PNG or PDF to JPG
Get images out of a scanned PDF	Embedded extraction (one image per page)
Read the text inside a scan	OCR, not image extraction
Keep a logo or chart as vectors	PyMuPDF SVG export
Generate a new image-rich PDF	PDF4.dev HTML to PDF API

Summary

Extracting images from a PDF is two jobs, not one. To recover the original embedded photos at native quality, use PyMuPDF in Python: list references with page.get_images(), then pull bytes with doc.extract_image(), deduplicating by xref and converting CMYK through a Pixmap. pikepdf is a solid alternative.

To save whole pages as images, render them: use PDF4.dev's free PDF to PNG and PDF to JPG tools in the browser, or pdfjs-dist for programmatic page rendering. Scanned PDFs extract cleanly as one image per page, but reading their text needs OCR instead.
