What is PDF metadata?

PDF metadata is a set of key-value pairs stored inside a PDF file that describe the document itself rather than its visible content. Standard fields include Title, Author, Subject, Keywords, Creator, Producer, CreationDate, and ModDate. PDF readers display these fields in the document properties panel (File > Properties in most viewers).

What is the difference between document info dictionary and XMP metadata in a PDF?

The document info dictionary is the original PDF metadata format defined in the PDF 1.0 specification. It stores simple string fields like Title and Author in a flat dictionary. XMP (Extensible Metadata Platform) is an Adobe-designed XML format embedded as a metadata stream, supporting namespaces, arrays, and richer schemas. PDF 2.0 (ISO 32000-2) deprecates the document info dictionary in favor of XMP, but most tools still read and write both for backward compatibility.

How do I edit PDF metadata without Adobe Acrobat?

Use the PDF4.dev edit metadata tool at /tools/metadata-pdf for a free browser-based option. Files are processed locally with pdf-lib and never uploaded to a server. For programmatic use, pdf-lib (Node.js), PyMuPDF (Python), ExifTool, and Ghostscript all edit metadata from the command line or a script.

Does editing PDF metadata change the visible content?

No. Metadata fields are stored separately from the page content stream. Changing the title, author, or keywords does not alter any text, image, or layout on the pages. The file size change is typically a few hundred bytes at most.

How do I remove all metadata from a PDF for privacy?

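To strip metadata, clear every field in the document info dictionary and remove the XMP metadata stream. With pdf-lib, call setTitle(""), setAuthor(""), and so on, then delete the XMP stream from the catalog. With ExifTool, run exiftool -all= file.pdf to remove all metadata tags at once. ExifTool edits PDFs by incremental update, so the old values stay recoverable until the file is rewritten, for example with qpdf --linearize in.pdf out.pdf. Stripping metadata is a common step for GDPR compliance when sharing documents externally.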

Can I edit metadata on multiple PDFs at once?

Yes. ExifTool natively supports batch operations across entire directories: exiftool -Title="X" -Author="Y" *.pdf. With pdf-lib or PyMuPDF, loop over files in a script and apply the same changes to each document.

How do I add keywords to a PDF for search?

Keywords are stored in the Keywords field of the document info dictionary, typically as a comma-separated string. With pdf-lib, call doc.setKeywords(["invoice", "2026", "acme"]). With PyMuPDF, include a "keywords" key in the metadata dictionary passed to doc.set_metadata(). Search engines and document management systems index these keywords when crawling or ingesting PDFs.

Does Google index PDF metadata?

Google reads the Title field from PDF metadata and may use it as the page title in search results when the PDF has no HTML wrapper. The Author, Subject, and Keywords fields are not confirmed as direct ranking signals, but they help Google understand the document topic. Setting a descriptive Title is the single most impactful metadata change for PDF SEO.

How to edit PDF metadata (title, author, keywords)

Edit PDF metadata online for free, or automate it with pdf-lib, PyMuPDF, Ghostscript, and ExifTool. Covers document info dictionary and XMP metadata.

benoitded · April 22, 2026 · 10 min read

PDF metadata is a set of properties stored inside a PDF file that describe the document: its title, author, subject, keywords, creation date, and the software that produced it. Editing metadata lets you fix incorrect author names, add search keywords, strip private information before sharing, or tag files for document management systems. Use the PDF4.dev edit metadata tool for a free browser-based editor, or the code examples below for batch automation in Node.js, Python, and the command line.

What PDF metadata contains

A PDF stores metadata in two places. The document info dictionary is the original format from PDF 1.0, holding eight standard fields as simple strings:

Field	Description	Example
Title	Document title shown in browser tabs and search results	"Q1 2026 Invoice"
Author	Person or organization that created the content	"Acme Corp"
Subject	Topic or summary of the document	"Quarterly billing"
Keywords	Comma-separated search terms	"invoice, billing, Q1, 2026"
Creator	Application that authored the original content	"Google Docs"
Producer	Application that converted the file to PDF	"pdf-lib 1.17.1"
CreationDate	When the document was first created (stored as a PDF date string)	"D:20260415103000Z"
ModDate	When the document was last modified (stored as a PDF date string)	"D:20260415140000Z"

The second format is XMP metadata (Extensible Metadata Platform), an XML stream embedded inside the PDF. XMP supports namespaces, arrays, and structured fields beyond what the document info dictionary allows. PDF 2.0 (ISO 32000-2) deprecates the document info dictionary in favor of XMP, but most tools still write both for backward compatibility with older readers.
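To make the contrast with the flat info dictionary concrete, here is a minimal sketch of what an XMP packet looks like, built with Python's standard library. It only produces the XML text and does not embed it in a PDF. The function name build_xmp_packet is illustrative, and the choice of three properties (dc:title, dc:creator, pdf:Keywords) is an assumption about which fields you want to mirror from the info dictionary.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly used for the XMP equivalents of Title, Author, Keywords
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_xmp_packet(title: str, authors: list[str], keywords: str) -> str:
    """Return a small XMP packet as text (illustrative helper, not a full schema)."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)
    rdf = "{%s}" % NS["rdf"]
    dc = "{%s}" % NS["dc"]

    xmpmeta = ET.Element("{%s}xmpmeta" % NS["x"])
    rdf_root = ET.SubElement(xmpmeta, rdf + "RDF")
    desc = ET.SubElement(rdf_root, rdf + "Description", {rdf + "about": ""})

    # dc:title is a language alternative: an rdf:Alt with an x-default entry
    alt = ET.SubElement(ET.SubElement(desc, dc + "title"), rdf + "Alt")
    ET.SubElement(alt, rdf + "li", {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered array (rdf:Seq), one entry per author
    seq = ET.SubElement(ET.SubElement(desc, dc + "creator"), rdf + "Seq")
    for name in authors:
        ET.SubElement(seq, rdf + "li").text = name

    # pdf:Keywords is a plain text property
    ET.SubElement(desc, "{%s}Keywords" % NS["pdf"]).text = keywords

    body = ET.tostring(xmpmeta, encoding="unicode")
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    return header + "\n" + body + "\n" + '<?xpacket end="w"?>'


print(build_xmp_packet("Q1 2026 Invoice", ["Acme Corp"], "invoice, billing, Q1, 2026"))
```

The array and language-alternative wrappers are the structure that the info dictionary's flat strings cannot express.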

In practice, if you only need Title, Author, Subject, and Keywords, writing the document info dictionary is enough. Every major PDF viewer reads it.

How to edit PDF metadata online (free, no upload)

The PDF4.dev edit metadata tool edits all eight standard metadata fields in your browser using pdf-lib. Files are processed locally and never sent to a server.

1. Open pdf4.dev/tools/metadata-pdf and drop your PDF onto the upload area.
2. The current metadata values appear in editable fields. Change any field you need.
3. Click Save metadata and download the result.

The tool writes to the document info dictionary. The original page content, fonts, and images are untouched.

How to edit PDF metadata with pdf-lib (Node.js)

pdf-lib exposes setter methods for every standard metadata field. Each method accepts a string (or a Date for timestamps, or a string[] for keywords).

npm install pdf-lib

import { PDFDocument } from "pdf-lib";
import { readFileSync, writeFileSync } from "fs";
 
async function editMetadata(inputPath: string, outputPath: string) {
  const bytes = readFileSync(inputPath);
  const doc = await PDFDocument.load(bytes);
 
  doc.setTitle("Q1 2026 Invoice — Acme Corp");
  doc.setAuthor("Acme Corp");
  doc.setSubject("Quarterly billing for January through March 2026");
  doc.setKeywords(["invoice", "billing", "Q1", "2026", "acme"]);
  doc.setCreator("PDF4.dev");
  doc.setProducer("pdf-lib 1.17.1");
  doc.setCreationDate(new Date("2026-04-15T10:30:00Z"));
  doc.setModificationDate(new Date());
 
  const output = await doc.save();
  writeFileSync(outputPath, output);
  console.log(`Metadata updated: ${outputPath}`);
}
 
editMetadata("input.pdf", "output.pdf");

The setKeywords() method accepts an array of strings. pdf-lib joins them with spaces (not commas) and writes the result to the document info dictionary's /Keywords entry. If you need a comma-separated value, pass a single pre-joined string, for example setKeywords(["invoice, billing, Q1"]). To read existing metadata before overwriting, use doc.getTitle(), doc.getAuthor(), and so on.

Strip all metadata with pdf-lib

To remove identifying information before sharing a PDF externally, set every field to an empty string:

doc.setTitle("");
doc.setAuthor("");
doc.setSubject("");
doc.setKeywords([]);
doc.setCreator("");
doc.setProducer("");

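This blanks the string fields in the document info dictionary; the keys stay in place with empty values, and CreationDate and ModDate are not touched. To also remove the XMP metadata stream, delete both the stream object and the catalog entry that points to it. Removing only the catalog entry would leave the stream in the saved file as an unreferenced object.

import { PDFName, PDFRef } from "pdf-lib";

const key = PDFName.of("Metadata");
const xmpRef = doc.catalog.get(key); // reference to the XMP stream object
if (xmpRef instanceof PDFRef) doc.context.delete(xmpRef); // drop the stream itself
doc.catalog.delete(key); // then remove the catalog entry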

How to edit PDF metadata with PyMuPDF (Python)

PyMuPDF provides doc.set_metadata(), which accepts a dictionary of metadata fields and writes them to the document info dictionary. The XMP stream is handled separately, through doc.set_xml_metadata() and doc.del_xml_metadata().

pip install pymupdf

import pymupdf  # pip install pymupdf
 
def edit_metadata(input_path: str, output_path: str) -> None:
    doc = pymupdf.open(input_path)
 
    # Read existing metadata
    old = doc.metadata
    print(f"Current title: {old.get('title', '(none)')}")
    print(f"Current author: {old.get('author', '(none)')}")
 
    # Set new metadata
    doc.set_metadata({
        "title": "Q1 2026 Invoice",
        "author": "Acme Corp",
        "subject": "Quarterly billing",
        "keywords": "invoice, billing, Q1, 2026",
        "creator": "PDF4.dev",
        "producer": "PyMuPDF",
    })
 
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()
    print(f"Metadata updated: {output_path}")
 
edit_metadata("input.pdf", "output.pdf")

set_metadata() updates only the document info dictionary. An existing XMP stream is left as it was, so replace it with doc.set_xml_metadata() or remove it with doc.del_xml_metadata() if the two must agree. The keywords field is a single comma-separated string, not an array.
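Because the keywords value is one string, a list of terms has to be joined before it is passed in. A minimal sketch of such a helper follows; keywords_to_string is a hypothetical name, and the cleanup rules (trim, collapse whitespace, drop case-insensitive duplicates) are assumptions, not anything PyMuPDF requires.

```python
def keywords_to_string(keywords: list[str]) -> str:
    """Join keyword terms into one comma-separated string for a metadata field."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for term in keywords:
        term = " ".join(term.split())  # trim and collapse internal whitespace
        key = term.casefold()          # compare without regard to case
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return ", ".join(cleaned)


print(keywords_to_string(["invoice", " Billing ", "Invoice", "", "Q1  2026"]))
```

The result can be used as the "keywords" value in the dictionary passed to doc.set_metadata().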

To strip all metadata with PyMuPDF:

doc.set_metadata({
    "title": "", "author": "", "subject": "",
    "keywords": "", "creator": "", "producer": "",
})
doc.del_xml_metadata()  # removes the XMP stream entirely

How to edit PDF metadata with Ghostscript (command line)

Ghostscript writes metadata through the pdfmark PostScript operator. This re-distills the PDF, so the output is a new file with the metadata baked in.

gs -dNOPAUSE -dBATCH -dSAFER \
  -sDEVICE=pdfwrite \
  -sOutputFile=output.pdf \
  -c "[/Title (Q1 2026 Invoice) /Author (Acme Corp) /Subject (Quarterly billing) /Keywords (invoice, billing, Q1, 2026) /DOCINFO pdfmark" \
  -f input.pdf

Each field is a PostScript string in parentheses. The /DOCINFO pdfmark operator writes the key-value pairs into the document info dictionary.

For longer metadata, put the pdfmark commands in a separate file:

% metadata.ps
[
  /Title (Q1 2026 Invoice)
  /Author (Acme Corp)
  /Subject (Quarterly billing for January through March 2026)
  /Keywords (invoice, billing, Q1, 2026, acme)
  /Creator (PDF4.dev)
  /DOCINFO pdfmark

Then pass it after the input file:

gs -dNOPAUSE -dBATCH -dSAFER \
  -sDEVICE=pdfwrite \
  -sOutputFile=output.pdf \
  -f input.pdf metadata.ps
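When the pdfmark file is generated from data, string escaping needs some care: a backslash or an unbalanced parenthesis inside a value breaks the PostScript string. Below is a minimal Python sketch that writes the same structure as metadata.ps. The helper names ps_string and build_docinfo_pdfmark are illustrative. Encoding non-ASCII text as a UTF-16BE hex string with a byte-order mark follows the usual PDF text string convention, and it is worth testing against your Ghostscript version.

```python
def ps_string(value: str) -> str:
    """Encode text as a PostScript string suitable for a pdfmark value."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII text: UTF-16BE with a byte-order mark, written as a hex string
        return "<" + ("\ufeff" + value).encode("utf-16-be").hex().upper() + ">"
    # ASCII text: escape backslashes first, then parentheses
    escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def build_docinfo_pdfmark(fields: dict[str, str]) -> str:
    """Return the text of a pdfmark file that sets the given info dictionary keys."""
    lines = ["["]
    lines += ["  /%s %s" % (key, ps_string(val)) for key, val in fields.items()]
    lines.append("  /DOCINFO pdfmark")
    return "\n".join(lines) + "\n"


print(build_docinfo_pdfmark({
    "Title": "Q1 2026 Invoice (final)",
    "Author": "Acme Corp",
}))
```

Write the returned text to metadata.ps and pass that file to gs as in the command above.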

Ghostscript re-renders the entire PDF through its pdfwrite device. This means the output may differ from the input in stream encoding, image quality, or font subsetting. For metadata-only changes without altering the page content, use pdf-lib, PyMuPDF, or ExifTool instead.

How to edit PDF metadata with ExifTool (command line)

ExifTool by Phil Harvey is the most widely used command-line metadata editor. It reads and writes both the document info dictionary and XMP metadata in PDFs without re-rendering the page content.

# Install on macOS
brew install exiftool
 
# Install on Ubuntu/Debian
sudo apt install libimage-exiftool-perl

Set metadata fields

exiftool \
  -Title="Q1 2026 Invoice" \
  -Author="Acme Corp" \
  -Subject="Quarterly billing" \
  -Keywords="invoice, billing, Q1, 2026" \
  -Creator="PDF4.dev" \
  input.pdf

ExifTool edits the file in place and saves a backup as input.pdf_original. To skip the backup, add -overwrite_original.

Read existing metadata

exiftool input.pdf

This prints every metadata field ExifTool can find, including XMP, IPTC, and PDF-specific tags.

Strip all metadata

exiftool -all= input.pdf

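The -all= flag removes every metadata tag ExifTool can write, from every metadata format in the file. Because ExifTool edits PDFs by appending an incremental update, the original values stay in the file and can be recovered. The ExifTool documentation recommends rewriting the file afterwards, for example with qpdf --linearize in.pdf out.pdf, to remove them permanently. Treat exiftool -all= on its own as a quick cleanup, not as complete sanitization for external sharing or GDPR compliance.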
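Whichever tool does the stripping, it can be worth checking the output for leftovers. The sketch below is a rough heuristic in standard-library Python: it scans the raw bytes for a few info dictionary keys and XMP packet markers, and counts %%EOF markers. The function names and the marker list are assumptions, and the scan cannot see values stored inside compressed object streams, so a clean result is not proof that a file is clean.

```python
import re
from pathlib import Path

# Byte patterns that suggest metadata is still present in plain form
MARKERS = {
    "info_author": rb"/Author\s*[(<]",
    "info_creator": rb"/Creator\s*[(<]",
    "info_producer": rb"/Producer\s*[(<]",
    "xmp_packet": rb"<x:xmpmeta|<\?xpacket",
}


def find_metadata_residue(data: bytes) -> dict[str, int]:
    """Count raw-byte occurrences of common metadata markers (non-zero only)."""
    counts = {name: len(re.findall(pattern, data)) for name, pattern in MARKERS.items()}
    return {name: n for name, n in counts.items() if n}


def count_eof_markers(data: bytes) -> int:
    """Each appended incremental update adds another %%EOF marker."""
    return data.count(b"%%EOF")


def report(path: str) -> None:
    data = Path(path).read_bytes()
    print(path, find_metadata_residue(data), "EOF markers:", count_eof_markers(data))
```

Calling report("output.pdf") prints the markers that were found. An EOF count above one can indicate appended revisions that still hold old values, although linearized files also legitimately contain two.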

Batch edit an entire directory

exiftool -Author="Acme Corp" -Creator="PDF4.dev" -overwrite_original *.pdf

ExifTool processes all matching files in sequence. For recursive directory processing, add -r:

exiftool -r -Author="Acme Corp" -overwrite_original ./documents/
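The same batch run can be driven from a script, which helps when the field values come from data. The following is a minimal Python sketch that assumes exiftool is installed and on PATH. build_exiftool_args and run_exiftool are hypothetical helper names. The -ext pdf option, which is not used in the commands above, restricts directory processing to PDF files.

```python
import shutil
import subprocess
from pathlib import Path


def build_exiftool_args(fields: dict[str, str], targets: list[Path],
                        recursive: bool = False) -> list[str]:
    """Build an argument list; passing a list avoids shell quoting problems."""
    args = ["exiftool", "-overwrite_original"]
    if recursive:
        args.append("-r")
    args += ["-ext", "pdf"]  # only touch PDF files when a directory is given
    args += ["-%s=%s" % (tag, value) for tag, value in fields.items()]
    args += [str(target) for target in targets]
    return args


def run_exiftool(args: list[str]) -> str:
    """Run the command and return ExifTool's summary output."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("exiftool is not on PATH")
    done = subprocess.run(args, capture_output=True, text=True, check=True)
    return done.stdout


args = build_exiftool_args(
    {"Author": "Acme Corp", "Creator": "PDF4.dev"},
    [Path("documents")],
    recursive=True,
)
print(args)  # inspect first, then call run_exiftool(args) to apply
```

Printing the argument list before running it is a cheap dry run, which is useful because -overwrite_original leaves no backup copies.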

Metadata editing methods compared

Method	Modifies page content	XMP support	Batch support	Speed
PDF4.dev metadata tool	No	No (info dict only)	No	Instant
pdf-lib (Node.js)	No	Manual (catalog access)	Via script loop	Fast
PyMuPDF (Python)	No	Manual (set_xml_metadata, del_xml_metadata)	Via script loop	Fast
Ghostscript	Yes (re-renders)	Via pdfmark	Via shell loop	Slow
ExifTool	No	Yes	Native (`*.pdf`, `-r`)	Fast

For one-off edits, the PDF4.dev tool or ExifTool is the fastest path. For Python automation pipelines, PyMuPDF updates the info dictionary with set_metadata() and exposes the XMP stream through separate calls. For Node.js workflows, pdf-lib covers the document info dictionary with clean setter methods. Avoid Ghostscript for metadata-only changes because it rewrites the entire PDF through its pdfwrite device, which can alter image quality and increases processing time.

Common use cases for editing PDF metadata

PDF SEO for hosted documents. Google reads the Title field from PDF metadata and may use it as the page title in search results. According to Google's developer documentation on indexable file types, PDFs are indexed as HTML-equivalent pages. Setting a descriptive, keyword-rich Title is the single highest-impact metadata change for PDFs that are publicly hosted.

Fixing wrong author names. PDFs generated by word processors, design tools, or API services often carry the software name or the machine username as the Author field. Replacing "user@laptop" with the actual author name or company name makes the document look professional in properties panels and search results.

Adding keywords for document management. Enterprise document management systems (SharePoint, Alfresco, M-Files) index the Keywords field when ingesting PDFs. Adding consistent keywords across a batch of documents improves search recall within these systems.

Stripping metadata for privacy and GDPR compliance. The Author, Creator, and Producer fields can expose internal usernames, software versions, and machine identifiers. Under GDPR Article 5(1)(c), personal data must be limited to what is necessary. Stripping metadata before sharing PDFs externally is a common compliance step.

Batch-standardizing metadata across a document library. When merging PDFs from multiple sources, the metadata is inconsistent: different authors, different creators, different titles. A batch script that rewrites Author and Creator to a single organization name keeps the library clean.

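Setting correct dates for archival. The CreationDate and ModDate fields record when the document was authored and last changed. For legal or compliance archives, setting accurate timestamps matters. pdf-lib accepts JavaScript Date objects through setCreationDate() and setModificationDate(). PyMuPDF expects PDF date strings such as "D:20260415103000Z" under the creationDate and modDate keys of the metadata dictionary.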
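Inside the info dictionary, both dates are stored as PDF date strings of the form D:YYYYMMDDHHmmSS followed by Z or a UTC offset. Here is a minimal Python sketch of the conversion; to_pdf_date is an illustrative name, and requiring a timezone-aware datetime is a design assumption that avoids guessing the offset.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("use a timezone-aware datetime")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"
    sign = "+" if offset > timedelta(0) else "-"
    minutes = abs(int(offset.total_seconds())) // 60
    # Offset is written as HH'mm' with apostrophes, per the PDF date syntax
    return "%s%s%02d'%02d'" % (stamp, sign, minutes // 60, minutes % 60)


print(to_pdf_date(datetime(2026, 4, 15, 10, 30, tzinfo=timezone.utc)))
```

The resulting string is the form PyMuPDF uses for the creationDate and modDate keys of its metadata dictionary.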

Summary

PDF metadata lives in two places: the document info dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the XMP metadata stream (XML, richer schemas).
For quick one-off edits, use the PDF4.dev edit metadata tool, which runs in your browser and never uploads files.
For Node.js automation, pdf-lib provides setTitle(), setAuthor(), setKeywords(), and other setter methods on the PDFDocument object.
For Python, PyMuPDF's set_metadata() writes the document info dictionary in a single call; use set_xml_metadata() or del_xml_metadata() to manage the XMP stream.
For command-line batch processing, ExifTool is the fastest option: exiftool -Title="X" -Author="Y" *.pdf edits all files in a directory without re-rendering pages.
Avoid Ghostscript for metadata-only edits because it re-renders the entire PDF through its pdfwrite device.
Set a descriptive Title field on any publicly hosted PDF, as Google may display it as the page title in search results.