How to edit PDF metadata (title, author, keywords)

Edit PDF metadata online for free, or automate it with pdf-lib, PyMuPDF, Ghostscript, and ExifTool. Covers document info dictionary and XMP metadata.

benoitded · 10 min read

PDF metadata is a set of properties stored inside a PDF file that describe the document: its title, author, subject, keywords, creation date, and the software that produced it. Editing metadata lets you fix incorrect author names, add search keywords, strip private information before sharing, or tag files for document management systems. Use the PDF4.dev edit metadata tool for a free browser-based editor, or the code examples below for batch automation in Node.js, Python, and the command line.

What PDF metadata contains

A PDF stores metadata in two places. The document info dictionary is the original format from PDF 1.0, holding eight standard fields as simple strings:

| Field | Description | Example |
| --- | --- | --- |
| Title | Document title shown in browser tabs and search results | "Q1 2026 Invoice" |
| Author | Person or organization that created the content | "Acme Corp" |
| Subject | Topic or summary of the document | "Quarterly billing" |
| Keywords | Comma-separated search terms | "invoice, billing, Q1, 2026" |
| Creator | Application that authored the original content | "Google Docs" |
| Producer | Application that converted the file to PDF | "pdf-lib 1.17.1" |
| CreationDate | When the document was first created | "2026-04-15T10:30:00Z" |
| ModDate | When the document was last modified | "2026-04-15T14:00:00Z" |
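
Viewers and libraries usually present the two date fields in ISO 8601 form, as in the table, but inside the file the info dictionary stores them as PDF date strings of the form D:YYYYMMDDHHmmSS followed by a timezone. A minimal Python sketch of the conversion, assuming timezone-aware datetimes and the PDF 1.7 style with apostrophes in the offset (the helper names are illustrative, not part of any library):

```python
from datetime import datetime, timedelta, timezone

def to_pdf_date(dt: datetime) -> str:
    """Format an aware datetime as a PDF date string, e.g. D:20260415103000Z."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    offset = dt.utcoffset()
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"
    sign = "+" if offset > timedelta(0) else "-"
    minutes = abs(int(offset.total_seconds())) // 60
    return f"{stamp}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"

def from_pdf_date(value: str) -> datetime:
    """Parse the full-length forms: D:YYYYMMDDHHmmSS plus Z or +HH'mm'."""
    body = value[2:] if value.startswith("D:") else value
    dt = datetime.strptime(body[:14], "%Y%m%d%H%M%S")
    tz = body[14:].replace("'", "")
    if tz in ("", "Z"):
        # Assumption: a missing timezone is treated as UTC in this sketch.
        return dt.replace(tzinfo=timezone.utc)
    sign = 1 if tz[0] == "+" else -1
    hours, minutes = int(tz[1:3]), int(tz[3:5] or 0)
    return dt.replace(tzinfo=timezone(sign * timedelta(hours=hours, minutes=minutes)))

print(to_pdf_date(datetime(2026, 4, 15, 10, 30, tzinfo=timezone.utc)))
```

Shortened date strings (for example, year only) are valid in PDF but are not handled here.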

The second format is XMP metadata (Extensible Metadata Platform), an XML stream embedded inside the PDF. XMP supports namespaces, arrays, and structured fields beyond what the document info dictionary allows. PDF 2.0 (ISO 32000-2) deprecates the document info dictionary, apart from CreationDate and ModDate, in favor of XMP, but most tools still write both for backward compatibility with older readers.
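
To make the XMP side concrete, here is a small Python sketch that parses a hand-written XMP packet with the standard library. In XMP, Title maps to dc:title, Author to dc:creator, and Keywords to pdf:Keywords. The sample packet is illustrative; real packets are usually wrapped in xpacket processing instructions and may store simple properties as attributes, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Hand-written sample packet, not taken from a real file.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Q1 2026 Invoice</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Acme Corp</rdf:li></rdf:Seq></dc:creator>
   <pdf:Keywords>invoice, billing, Q1, 2026</pdf:Keywords>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def read_xmp(packet: str) -> dict:
    """Extract title, authors, and keywords from an element-form XMP packet."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    authors = [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)]
    keywords = root.find(".//pdf:Keywords", NS)
    return {
        "title": title.text if title is not None else None,
        "authors": authors,
        "keywords": keywords.text if keywords is not None else None,
    }

print(read_xmp(SAMPLE_XMP))
```

Note how the author is an ordered list and the title is a language alternative, two structures the flat info dictionary cannot express.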

In practice, if you only need Title, Author, Subject, and Keywords, writing the document info dictionary is enough. Every major PDF viewer reads it.

How to edit PDF metadata online (free, no upload)

The PDF4.dev edit metadata tool edits all eight standard metadata fields in your browser using pdf-lib. Files are processed locally and never sent to a server.

  1. Open pdf4.dev/tools/metadata-pdf and drop your PDF onto the upload area.
  2. The current metadata values appear in editable fields. Change any field you need.
  3. Click Save metadata and download the result.

The tool writes to the document info dictionary. The original page content, fonts, and images are untouched.

How to edit PDF metadata with pdf-lib (Node.js)

pdf-lib exposes setter methods for every standard metadata field. Each method accepts a string (or a Date for timestamps, or a string[] for keywords).

npm install pdf-lib

import { PDFDocument } from "pdf-lib";
import { readFileSync, writeFileSync } from "fs";
 
async function editMetadata(inputPath: string, outputPath: string) {
  const bytes = readFileSync(inputPath);
  const doc = await PDFDocument.load(bytes);
 
  doc.setTitle("Q1 2026 Invoice — Acme Corp");
  doc.setAuthor("Acme Corp");
  doc.setSubject("Quarterly billing for January through March 2026");
  doc.setKeywords(["invoice", "billing", "Q1", "2026", "acme"]);
  doc.setCreator("PDF4.dev");
  doc.setProducer("pdf-lib 1.17.1");
  doc.setCreationDate(new Date("2026-04-15T10:30:00Z"));
  doc.setModificationDate(new Date());
 
  const output = await doc.save();
  writeFileSync(outputPath, output);
  console.log(`Metadata updated: ${outputPath}`);
}
 
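// Report failures (missing file, unreadable PDF) instead of leaving the promise unhandled
editMetadata("input.pdf", "output.pdf").catch(console.error);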

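The setKeywords() method accepts an array of strings. In pdf-lib 1.17.1 it joins them with spaces, not commas, and writes the result to the document info dictionary's /Keywords entry, so the call above stores "invoice billing Q1 2026 acme". For a comma-separated value, pass a single pre-joined string: doc.setKeywords(["invoice, billing, Q1, 2026, acme"]). To read existing metadata before overwriting, use doc.getTitle(), doc.getAuthor(), and so on.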

Strip all metadata with pdf-lib

To remove identifying information before sharing a PDF externally, set every field to an empty string:

doc.setTitle("");
doc.setAuthor("");
doc.setSubject("");
doc.setKeywords([]);
doc.setCreator("");
doc.setProducer("");

This blanks the six text fields in the document info dictionary; CreationDate and ModDate are left in place. To also remove the XMP metadata stream, delete it from the document catalog:

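import { PDFName, PDFRef } from "pdf-lib";

// The catalog's /Metadata entry normally points at the XMP stream object.
const metadataKey = PDFName.of("Metadata");
const xmpRef = doc.catalog.get(metadataKey);
if (xmpRef instanceof PDFRef) doc.context.delete(xmpRef); // drop the stream itself
doc.catalog.delete(metadataKey); // drop the catalog entry (no-op if absent)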

How to edit PDF metadata with PyMuPDF (Python)

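PyMuPDF provides doc.set_metadata(), which accepts a dictionary of metadata fields and writes them to the document info dictionary. The XMP stream is handled separately through doc.get_xml_metadata(), doc.set_xml_metadata(), and doc.del_xml_metadata().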

pip install pymupdf

import pymupdf  # recent PyMuPDF releases; older ones use "import fitz"
 
def edit_metadata(input_path: str, output_path: str) -> None:
    doc = pymupdf.open(input_path)
 
    # Read existing metadata
    old = doc.metadata
    print(f"Current title: {old.get('title', '(none)')}")
    print(f"Current author: {old.get('author', '(none)')}")
 
    # Set new metadata
    doc.set_metadata({
        "title": "Q1 2026 Invoice",
        "author": "Acme Corp",
        "subject": "Quarterly billing",
        "keywords": "invoice, billing, Q1, 2026",
        "creator": "PDF4.dev",
        "producer": "PyMuPDF",
    })
 
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()
    print(f"Metadata updated: {output_path}")
 
edit_metadata("input.pdf", "output.pdf")

set_metadata() updates only the document info dictionary. If the file also carries an XMP stream, update it with doc.set_xml_metadata() or remove it with doc.del_xml_metadata() so the two do not disagree. The keywords field is a single comma-separated string, not an array.
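
Because set_metadata() takes plain strings for every field, dates included, it can help to build the dictionary in one place. A hedged sketch: build_metadata() and write_metadata() are hypothetical helper names, and the creationDate key is assumed to match the keys that doc.metadata returns.

```python
from datetime import datetime, timezone

# Text keys as they appear in doc.metadata.
TEXT_KEYS = ("title", "author", "subject", "keywords", "creator", "producer")

def build_metadata(fields: dict, keywords: list[str], created: datetime) -> dict:
    """Return a dict of strings in the shape set_metadata() expects."""
    unknown = set(fields) - set(TEXT_KEYS)
    if unknown:
        raise ValueError(f"unsupported metadata keys: {sorted(unknown)}")
    meta = dict(fields)
    # Keywords go in as one comma-separated string, not a list.
    meta["keywords"] = ", ".join(k.strip() for k in keywords if k.strip())
    # Dates are PDF date strings; 'created' must be timezone-aware.
    meta["creationDate"] = created.astimezone(timezone.utc).strftime("D:%Y%m%d%H%M%SZ")
    return meta

def write_metadata(input_path: str, output_path: str, meta: dict) -> None:
    import pymupdf  # imported lazily so build_metadata() works without it
    doc = pymupdf.open(input_path)
    doc.set_metadata(meta)
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()

meta = build_metadata(
    {"title": "Q1 2026 Invoice", "author": "Acme Corp"},
    ["invoice", "billing", "Q1", "2026"],
    datetime(2026, 4, 15, 10, 30, tzinfo=timezone.utc),
)
print(meta)
```

Calling write_metadata("input.pdf", "output.pdf", meta) would then apply the dictionary to a file.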

To strip all metadata with PyMuPDF:

doc.set_metadata({
    "title": "", "author": "", "subject": "",
    "keywords": "", "creator": "", "producer": "",
})
doc.del_xml_metadata()  # removes the XMP stream entirely

How to edit PDF metadata with Ghostscript (command line)

Ghostscript writes metadata through the pdfmark PostScript operator. This re-distills the PDF, so the output is a new file with the metadata baked in.

gs -dNOPAUSE -dBATCH -dSAFER \
  -sDEVICE=pdfwrite \
  -sOutputFile=output.pdf \
  -f input.pdf \
  -c "[/Title (Q1 2026 Invoice) /Author (Acme Corp) /Subject (Quarterly billing) /Keywords (invoice, billing, Q1, 2026) /DOCINFO pdfmark"

Each field is a PostScript string in parentheses. The /DOCINFO pdfmark operator writes the key-value pairs into the document info dictionary. The pdfmark is placed after the input file so that its values take precedence over any metadata the input already carries.

For longer metadata, put the pdfmark commands in a separate file:

% metadata.ps
[
  /Title (Q1 2026 Invoice)
  /Author (Acme Corp)
  /Subject (Quarterly billing for January through March 2026)
  /Keywords (invoice, billing, Q1, 2026, acme)
  /Creator (PDF4.dev)
  /DOCINFO pdfmark

Then pass it after the input file:

gs -dNOPAUSE -dBATCH -dSAFER \
  -sDEVICE=pdfwrite \
  -sOutputFile=output.pdf \
  -f input.pdf metadata.ps
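
A file like metadata.ps can also be generated from a script, which is a convenient place to escape characters that would otherwise break a PostScript string. A minimal sketch, assuming ASCII-only values (the function names are illustrative):

```python
def ps_string(text: str) -> str:
    """Escape text as a PostScript literal string: backslashes first, then parentheses."""
    if not text.isascii():
        # Assumption: non-ASCII values would need a UTF-16BE hex string instead.
        raise ValueError("this sketch only handles ASCII text")
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"

def build_docinfo_pdfmark(fields: dict) -> str:
    """Render a /DOCINFO pdfmark block from a dict of info dictionary keys."""
    lines = ["["]
    for key, value in fields.items():
        lines.append(f"  /{key} {ps_string(value)}")
    lines.append("  /DOCINFO pdfmark")
    return "\n".join(lines) + "\n"

print(build_docinfo_pdfmark({
    "Title": "Q1 2026 Invoice (draft)",
    "Author": "Acme Corp",
}))
```

Write the returned text to metadata.ps and pass it to gs after the input file, as in the command above.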

Ghostscript re-renders the entire PDF through its pdfwrite device. This means the output may differ from the input in stream encoding, image quality, or font subsetting. For metadata-only changes without altering the page content, use pdf-lib, PyMuPDF, or ExifTool instead.

How to edit PDF metadata with ExifTool (command line)

ExifTool by Phil Harvey is the most widely used command-line metadata editor. It reads and writes both the document info dictionary and XMP metadata in PDFs without re-rendering the page content.

# Install on macOS
brew install exiftool
 
# Install on Ubuntu/Debian
sudo apt install libimage-exiftool-perl

Set metadata fields

exiftool \
  -Title="Q1 2026 Invoice" \
  -Author="Acme Corp" \
  -Subject="Quarterly billing" \
  -Keywords="invoice, billing, Q1, 2026" \
  -Creator="PDF4.dev" \
  input.pdf

ExifTool edits the file in place and saves a backup as input.pdf_original. To skip the backup, add -overwrite_original.

Read existing metadata

exiftool input.pdf

This prints every metadata field ExifTool can find, including XMP, IPTC, and PDF-specific tags.

Strip all metadata

exiftool -all= input.pdf

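The -all= flag removes every metadata tag ExifTool can write. For PDFs, ExifTool applies the change as an incremental update, so the old values remain in the file and can be recovered; ExifTool prints a warning to that effect. To sanitize a PDF for external sharing or GDPR compliance, rewrite the file afterwards with a tool that discards earlier revisions, for example qpdf --linearize input.pdf output.pdf.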
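
ExifTool writes PDF changes as an incremental update appended to the file, so it is worth checking the raw bytes after stripping. The following Python sketch is a rough heuristic, not a forensic check: it counts %%EOF markers and searches for known sensitive strings in plain bytes. Linearized PDFs normally contain two %%EOF markers, and values stored as hex or UTF-16 strings or inside compressed streams will not be found this way.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each incremental save appends another one."""
    return pdf_bytes.count(b"%%EOF")

def leftover_strings(pdf_bytes: bytes, needles: list[str]) -> list[str]:
    """Return the needles still present as plain, uncompressed bytes."""
    return [n for n in needles if n.encode("latin-1") in pdf_bytes]

# Synthetic stand-ins for a file before and after an incremental edit.
original = b"%PDF-1.7\n1 0 obj\n<< /Author (jdoe@laptop) >>\nendobj\n%%EOF\n"
updated = original + b"1 0 obj\n<< >>\nendobj\n%%EOF\n"

print(count_eof_markers(updated))
print(leftover_strings(updated, ["jdoe@laptop"]))
```

On a real file, read the bytes with open(path, "rb").read() and pass in the author names or usernames you expect to have removed.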

Batch edit an entire directory

exiftool -Author="Acme Corp" -Creator="PDF4.dev" -overwrite_original *.pdf

ExifTool processes all matching files in sequence. For recursive directory processing, add -r, plus -ext pdf so that only PDF files in the tree are touched:

exiftool -r -ext pdf -Author="Acme Corp" -overwrite_original ./documents/
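
The batch commands above apply the same values to every file. When each file needs its own Title, one option is a small wrapper script that calls ExifTool once per file. A sketch in Python, assuming exiftool is on the PATH; the filename-to-title convention is hypothetical:

```python
import subprocess
from pathlib import Path

def exiftool_args(pdf: Path, fields: dict) -> list[str]:
    """Build an argument list; passing a list avoids shell quoting problems."""
    args = ["exiftool", "-overwrite_original"]
    args += [f"-{tag}={value}" for tag, value in fields.items()]
    args.append(str(pdf))
    return args

def title_from_filename(pdf: Path) -> str:
    """Hypothetical convention: 'q1-2026-invoice.pdf' becomes 'Q1 2026 Invoice'."""
    return pdf.stem.replace("-", " ").replace("_", " ").title()

def retitle_directory(folder: str, author: str) -> None:
    """Set a per-file Title and a shared Author on every PDF under folder."""
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        fields = {"Title": title_from_filename(pdf), "Author": author}
        subprocess.run(exiftool_args(pdf, fields), check=True)

print(exiftool_args(Path("q1-2026-invoice.pdf"), {"Title": "Q1 2026 Invoice"}))
```

retitle_directory("./documents", "Acme Corp") would then walk the tree and stop at the first file ExifTool fails on, because of check=True.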

Metadata editing methods compared

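| Method | Modifies page content | XMP support | Batch support | Speed |
| --- | --- | --- | --- | --- |
| PDF4.dev metadata tool | No | No (info dict only) | No | Instant |
| pdf-lib (Node.js) | No | Manual (catalog access) | Via script loop | Fast |
| PyMuPDF (Python) | No | Separate calls (set_xml_metadata, del_xml_metadata) | Via script loop | Fast |
| Ghostscript | Yes (re-renders) | Via pdfmark | Via shell loop | Slow |
| ExifTool | No | Yes | Native (*.pdf, -r) | Fast |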

For one-off edits, the PDF4.dev tool or ExifTool is the fastest path. For Python automation pipelines, PyMuPDF updates the document info dictionary with a single set_metadata() call and exposes the XMP stream through separate methods. For Node.js workflows, pdf-lib covers the document info dictionary with clean setter methods. Avoid Ghostscript for metadata-only changes because it re-renders the entire PDF, which can alter image quality and increase processing time.

Common use cases for editing PDF metadata

PDF SEO for hosted documents. Google reads the Title field from PDF metadata and may use it as the page title in search results. According to Google's developer documentation on indexable file types, PDFs are indexed as HTML-equivalent pages. Setting a descriptive, keyword-rich Title is the single highest-impact metadata change for PDFs that are publicly hosted.

Fixing wrong author names. PDFs generated by word processors, design tools, or API services often carry the software name or the machine username as the Author field. Replacing "user@laptop" with the actual author name or company name makes the document look professional in properties panels and search results.

Adding keywords for document management. Enterprise document management systems (SharePoint, Alfresco, M-Files) index the Keywords field when ingesting PDFs. Adding consistent keywords across a batch of documents improves search recall within these systems.

Stripping metadata for privacy and GDPR compliance. The Author, Creator, and Producer fields can expose internal usernames, software versions, and machine identifiers. Under GDPR Article 5(1)(c), personal data must be limited to what is necessary. Stripping metadata before sharing PDFs externally is a common compliance step.

Batch-standardizing metadata across a document library. When merging PDFs from multiple sources, the metadata is inconsistent: different authors, different creators, different titles. A batch script that rewrites Author and Creator to a single organization name keeps the library clean.

Setting correct dates for archival. The CreationDate and ModDate fields record when the document was authored and last changed. For legal or compliance archives, setting accurate timestamps matters. pdf-lib accepts JavaScript Date objects for these fields; PyMuPDF expects PDF date strings such as "D:20260415103000Z" under the creationDate and modDate keys.

Summary

  • PDF metadata lives in two places: the document info dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the XMP metadata stream (XML, richer schemas).
  • For quick one-off edits, use the PDF4.dev edit metadata tool, which runs in your browser and never uploads files.
  • For Node.js automation, pdf-lib provides setTitle(), setAuthor(), setKeywords(), and other setter methods on the PDFDocument object.
  • For Python, PyMuPDF's set_metadata() writes the document info dictionary in a single call; the XMP stream is managed separately with set_xml_metadata() and del_xml_metadata().
  • For command-line batch processing, ExifTool is the fastest option: exiftool -Title="X" -Author="Y" *.pdf edits all files in a directory without re-rendering pages.
  • Avoid Ghostscript for metadata-only edits because it re-renders the entire PDF through its pdfwrite device.
  • Set a descriptive Title field on any publicly hosted PDF, as Google may display it as the page title in search results.
