PDF metadata is a set of properties stored inside a PDF file that describe the document: its title, author, subject, keywords, creation date, and the software that produced it. Editing metadata lets you fix incorrect author names, add search keywords, strip private information before sharing, or tag files for document management systems. Use the PDF4.dev edit metadata tool for a free browser-based editor, or the code examples below for batch automation in Node.js, Python, and the command line.
What PDF metadata contains
A PDF stores metadata in two places. The document info dictionary is the original format from PDF 1.0, holding eight standard fields as simple strings:
| Field | Description | Example |
|---|---|---|
| Title | Document title shown in browser tabs and search results | "Q1 2026 Invoice" |
| Author | Person or organization that created the content | "Acme Corp" |
| Subject | Topic or summary of the document | "Quarterly billing" |
| Keywords | Comma-separated search terms | "invoice, billing, Q1, 2026" |
| Creator | Application that authored the original content | "Google Docs" |
| Producer | Application that converted the file to PDF | "pdf-lib 1.17.1" |
| CreationDate | When the document was first created | "D:20260415103000Z" |
| ModDate | When the document was last modified | "D:20260415140000Z" |
The second format is XMP metadata (Extensible Metadata Platform), an XML stream embedded inside the PDF. XMP supports namespaces, arrays, and structured fields beyond what the document info dictionary allows. PDF 2.0 (ISO 32000-2) deprecates the document info dictionary in favor of XMP, but most tools still write both for backward compatibility with older readers.
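To make the XMP structure concrete, here is a minimal sketch that reads three fields from an XMP packet using only the Python standard library. The sample packet is hand-written for illustration (an assumption, not extracted from a real file), and `read_xmp_fields` is a hypothetical helper name. It relies on the usual schema mapping: `dc:title` for Title, `dc:creator` for Author, and `pdf:Keywords` for Keywords.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the standard XMP schemas used in this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Hand-written sample packet (assumption: illustrative only).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Q1 2026 Invoice</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Acme Corp</rdf:li></rdf:Seq></dc:creator>
      <pdf:Keywords>invoice, billing, Q1, 2026</pdf:Keywords>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp_fields(xmp: str) -> dict:
    """Extract title, authors, and keywords from an XMP packet string."""
    root = ET.fromstring(xmp)
    # dc:title is a language-alternative array; take its first entry.
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    # dc:creator is an ordered array, so a document can list several authors.
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    keywords = root.find(".//pdf:Keywords", NS)
    return {
        "title": title.text if title is not None else None,
        "authors": [c.text for c in creators],
        "keywords": keywords.text if keywords is not None else None,
    }


print(read_xmp_fields(SAMPLE_XMP))
```

The array wrappers (`rdf:Alt`, `rdf:Seq`) are what the document info dictionary cannot express: a title per language, or several authors as separate entries rather than one string.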
In practice, if you only need Title, Author, Subject, and Keywords, writing the document info dictionary is enough. Every major PDF viewer reads it.
How to edit PDF metadata online (free, no upload)
The PDF4.dev edit metadata tool edits all eight standard metadata fields in your browser using pdf-lib. Files are processed locally and never sent to a server.
- Open pdf4.dev/tools/metadata-pdf and drop your PDF onto the upload area.
- The current metadata values appear in editable fields. Change any field you need.
- Click Save metadata and download the result.
The tool writes to the document info dictionary. The original page content, fonts, and images are untouched.
How to edit PDF metadata with pdf-lib (Node.js)
pdf-lib exposes setter methods for every standard metadata field. Each method accepts a string (or a Date for timestamps, or a string[] for keywords).
npm install pdf-lib
import { PDFDocument } from "pdf-lib";
import { readFileSync, writeFileSync } from "fs";

async function editMetadata(inputPath: string, outputPath: string) {
  // Load the existing PDF into memory
  const bytes = readFileSync(inputPath);
  const doc = await PDFDocument.load(bytes);

  // Overwrite the standard document info dictionary fields
  doc.setTitle("Q1 2026 Invoice - Acme Corp");
  doc.setAuthor("Acme Corp");
  doc.setSubject("Quarterly billing for January through March 2026");
  doc.setKeywords(["invoice", "billing", "Q1", "2026", "acme"]);
  doc.setCreator("PDF4.dev");
  doc.setProducer("pdf-lib 1.17.1");
  doc.setCreationDate(new Date("2026-04-15T10:30:00Z"));
  doc.setModificationDate(new Date());

  // Serialize and write the updated file
  const output = await doc.save();
  writeFileSync(outputPath, output);
  console.log(`Metadata updated: ${outputPath}`);
}

editMetadata("input.pdf", "output.pdf");
The setKeywords() method accepts an array of strings. pdf-lib joins them into a single string and writes the result to the document info dictionary's /Keywords entry. In pdf-lib 1.17.1 the separator is a space rather than a comma, so pass a one-element array such as ["invoice, billing, Q1"] if you need comma-separated keywords. To read existing metadata before overwriting, use doc.getTitle(), doc.getAuthor(), and so on.
Strip all metadata with pdf-lib
To remove identifying information before sharing a PDF externally, set every field to an empty string:
doc.setTitle("");
doc.setAuthor("");
doc.setSubject("");
doc.setKeywords([]);
doc.setCreator("");
doc.setProducer("");
This blanks the text fields of the document info dictionary; CreationDate and ModDate keep their values unless you overwrite them as well. To also remove the XMP metadata stream, delete it from the document catalog:
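// Requires: import { PDFName, PDFRef } from "pdf-lib";
const metadataKey = PDFName.of("Metadata");
const metadataRef = doc.catalog.get(metadataKey);
if (metadataRef instanceof PDFRef) {
  doc.context.delete(metadataRef); // drop the XMP stream object itself
}
doc.catalog.delete(metadataKey); // drop the catalog's reference to it
How to edit PDF metadata with PyMuPDF (Python)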
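PyMuPDF provides doc.set_metadata(), which accepts a dictionary of metadata fields and writes them to the document info dictionary. It does not update an existing XMP stream; that is handled separately with doc.set_xml_metadata() and doc.del_xml_metadata().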
pip install pymupdf
import pymupdf  # pip install pymupdf


def edit_metadata(input_path: str, output_path: str) -> None:
    doc = pymupdf.open(input_path)

    # Read existing metadata
    old = doc.metadata
    print(f"Current title: {old.get('title', '(none)')}")
    print(f"Current author: {old.get('author', '(none)')}")

    # Set new metadata
    doc.set_metadata({
        "title": "Q1 2026 Invoice",
        "author": "Acme Corp",
        "subject": "Quarterly billing",
        "keywords": "invoice, billing, Q1, 2026",
        "creator": "PDF4.dev",
        "producer": "PyMuPDF",
    })

    # garbage=4 drops unreferenced objects; deflate=True compresses streams
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()
    print(f"Metadata updated: {output_path}")


edit_metadata("input.pdf", "output.pdf")
set_metadata() updates only the document info dictionary. If the file also carries an XMP stream, replace it with doc.set_xml_metadata() or remove it with doc.del_xml_metadata() so the two formats do not contradict each other. The keywords field is a single comma-separated string, not an array.
To strip all metadata with PyMuPDF:
doc.set_metadata({
    "title": "", "author": "", "subject": "",
    "keywords": "", "creator": "", "producer": "",
})
doc.del_xml_metadata()  # removes the XMP stream entirely
How to edit PDF metadata with Ghostscript (command line)
Ghostscript writes metadata through the pdfmark PostScript operator. This re-distills the PDF, so the output is a new file with the metadata baked in.
gs -dNOPAUSE -dBATCH -dSAFER \
-sDEVICE=pdfwrite \
-sOutputFile=output.pdf \
-c "[/Title (Q1 2026 Invoice) /Author (Acme Corp) /Subject (Quarterly billing) /Keywords (invoice, billing, Q1, 2026) /DOCINFO pdfmark" \
-f input.pdf
Each field is a PostScript string in parentheses. The /DOCINFO pdfmark operator writes the key-value pairs into the document info dictionary.
For longer metadata, put the pdfmark commands in a separate file:
% metadata.ps
[
/Title (Q1 2026 Invoice)
/Author (Acme Corp)
/Subject (Quarterly billing for January through March 2026)
/Keywords (invoice, billing, Q1, 2026, acme)
/Creator (PDF4.dev)
/DOCINFO pdfmark
Then pass it after the input file:
gs -dNOPAUSE -dBATCH -dSAFER \
-sDEVICE=pdfwrite \
-sOutputFile=output.pdf \
-f input.pdf metadata.ps
Ghostscript re-renders the entire PDF through its pdfwrite device. This means the output may differ from the input in stream encoding, image quality, or font subsetting. For metadata-only changes without altering the page content, use pdf-lib, PyMuPDF, or ExifTool instead.
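Because each pdfmark value is a PostScript string, parentheses and backslashes inside a title or author name need escaping, and non-ASCII text is normally written as a UTF-16BE hex string with a byte-order mark. The following is a minimal sketch of a generator for a metadata.ps file like the one above; `ps_string` and `build_docinfo_pdfmark` are hypothetical helper names, not part of Ghostscript.

```python
def ps_string(value: str) -> str:
    """Encode a value as a PostScript string suitable for a pdfmark entry."""
    if value.isascii():
        # Literal string: escape the backslash first, then both parentheses.
        escaped = (
            value.replace("\\", "\\\\")
            .replace("(", "\\(")
            .replace(")", "\\)")
        )
        return f"({escaped})"
    # Non-ASCII text: hex string holding UTF-16BE, prefixed with the FEFF BOM.
    return "<FEFF" + value.encode("utf-16-be").hex().upper() + ">"


def build_docinfo_pdfmark(fields: dict) -> str:
    """Build the text of a DOCINFO pdfmark file from a field-to-value mapping."""
    lines = ["["]
    for key, value in fields.items():
        lines.append(f"  /{key} {ps_string(value)}")
    lines.append("/DOCINFO pdfmark")
    return "\n".join(lines) + "\n"


print(build_docinfo_pdfmark({"Title": "Q1 (draft)", "Author": "Acme Corp"}))
```

Writing the returned text to metadata.ps and passing that file after the input PDF follows the same invocation shown above.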
How to edit PDF metadata with ExifTool (command line)
ExifTool by Phil Harvey is the most widely used command-line metadata editor. It reads and writes both the document info dictionary and XMP metadata in PDFs without re-rendering the page content.
# Install on macOS
brew install exiftool
# Install on Ubuntu/Debian
sudo apt install libimage-exiftool-perl
Set metadata fields
exiftool \
-Title="Q1 2026 Invoice" \
-Author="Acme Corp" \
-Subject="Quarterly billing" \
-Keywords="invoice, billing, Q1, 2026" \
-Creator="PDF4.dev" \
input.pdf
ExifTool edits the file in place and saves a backup as input.pdf_original. To skip the backup, add -overwrite_original.
Read existing metadata
exiftool input.pdf
This prints every metadata field ExifTool can find, including XMP, IPTC, and PDF-specific tags.
Strip all metadata
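exiftool -all= input.pdf
The -all= flag removes every metadata tag ExifTool can write, across every metadata format in the file. Note that ExifTool applies PDF edits as an incremental update, so the original values remain recoverable from the file, and ExifTool prints a warning to that effect. To make the removal permanent, rewrite the file afterwards (for example with qpdf --linearize input.pdf output.pdf) before sharing it externally or relying on it for GDPR compliance.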
Batch edit an entire directory
exiftool -Author="Acme Corp" -Creator="PDF4.dev" -overwrite_original *.pdf
ExifTool processes all matching files in sequence. For recursive directory processing, add -r, and restrict the run to PDFs with -ext pdf so other file types in the tree are left alone:
exiftool -r -ext pdf -Author="Acme Corp" -overwrite_original ./documents/
Metadata editing methods compared
| Method | Modifies page content | XMP support | Batch support | Speed |
|---|---|---|---|---|
| PDF4.dev metadata tool | No | No (info dict only) | No | Instant |
| pdf-lib (Node.js) | No | Manual (catalog access) | Via script loop | Fast |
| PyMuPDF (Python) | No | Manual (set_xml_metadata / del_xml_metadata) | Via script loop | Fast |
| Ghostscript | Yes (re-renders) | Via pdfmark | Via shell loop | Slow |
| ExifTool | No | Yes | Native (*.pdf, -r) | Fast |
For one-off edits, the PDF4.dev tool or ExifTool is the fastest path. For Python automation pipelines, PyMuPDF updates the document info dictionary in a single call and offers separate methods for replacing or deleting the XMP stream. For Node.js workflows, pdf-lib covers the document info dictionary with clean setter methods. Avoid Ghostscript for metadata-only changes because it re-renders the entire PDF, which can alter image quality and increase processing time.
Common use cases for editing PDF metadata
PDF SEO for hosted documents. Google reads the Title field from PDF metadata and may use it as the page title in search results. According to Google's developer documentation on indexable file types, PDFs are indexed as HTML-equivalent pages. Setting a descriptive, keyword-rich Title is the single highest-impact metadata change for PDFs that are publicly hosted.
Fixing wrong author names. PDFs generated by word processors, design tools, or API services often carry the software name or the machine username as the Author field. Replacing "user@laptop" with the actual author name or company name makes the document look professional in properties panels and search results.
Adding keywords for document management. Enterprise document management systems (SharePoint, Alfresco, M-Files) index the Keywords field when ingesting PDFs. Adding consistent keywords across a batch of documents improves search recall within these systems.
Stripping metadata for privacy and GDPR compliance. The Author, Creator, and Producer fields can expose internal usernames, software versions, and machine identifiers. Under GDPR Article 5(1)(c), personal data must be limited to what is necessary. Stripping metadata before sharing PDFs externally is a common compliance step.
Batch-standardizing metadata across a document library. When merging PDFs from multiple sources, the metadata is inconsistent: different authors, different creators, different titles. A batch script that rewrites Author and Creator to a single organization name keeps the library clean.
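One way to script such a batch rewrite is to call ExifTool from Python, passing arguments as a list so that values containing spaces or quotes need no shell escaping. This is a sketch under stated assumptions: ExifTool is installed and on the PATH, and `build_exiftool_command` and `standardize` are hypothetical helper names. The flags used (-r, -ext, -overwrite_original, -TAG=VALUE) are standard ExifTool options.

```python
import subprocess
from pathlib import Path


def build_exiftool_command(folder: str, fields: dict, recursive: bool = True) -> list:
    """Build the argument list for rewriting metadata on every PDF in a folder."""
    cmd = ["exiftool"]
    if recursive:
        cmd.append("-r")  # descend into subdirectories
    # Only touch PDFs, and do not leave *_original backup files behind.
    cmd += ["-ext", "pdf", "-overwrite_original"]
    # One -TAG=VALUE argument per field, e.g. -Author=Acme Corp
    cmd += [f"-{tag}={value}" for tag, value in fields.items()]
    cmd.append(str(Path(folder)))
    return cmd


def standardize(folder: str, fields: dict) -> None:
    """Run ExifTool; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_exiftool_command(folder, fields), check=True)


# Inspect the command before running it against a real library.
print(build_exiftool_command("documents", {"Author": "Acme Corp", "Creator": "PDF4.dev"}))
```

Printing the command first is a cheap dry run; call `standardize()` only once the argument list looks right, since -overwrite_original leaves no backup copies.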
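Setting correct dates for archival. The CreationDate and ModDate fields record when the document was authored and last changed. For legal or compliance archives, setting accurate timestamps matters. pdf-lib accepts JavaScript Date objects through setCreationDate() and setModificationDate(); PyMuPDF expects PDF date strings such as "D:20260415103000Z" under the creationDate and modDate keys of set_metadata().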
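When a library takes the raw info-dictionary value rather than a date object, the timestamp has to be written in PDF date syntax (D:YYYYMMDDHHmmSS followed by Z or a UTC offset). A minimal sketch of that conversion, assuming timezone-aware datetimes and the PDF 1.7 style offset with a trailing apostrophe; `to_pdf_date` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # UTC is written as a literal Z
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    # PDF 1.7 style offset: +HH'mm' (apostrophes instead of a colon)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


created = datetime(2026, 4, 15, 10, 30, 0, tzinfo=timezone.utc)
print(to_pdf_date(created))
```

With PyMuPDF, the resulting string would go under the "creationDate" or "modDate" key of the dictionary passed to set_metadata().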
Summary
- PDF metadata lives in two places: the document info dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the XMP metadata stream (XML, richer schemas).
- For quick one-off edits, use the PDF4.dev edit metadata tool, which runs in your browser and never uploads files.
- For Node.js automation, pdf-lib provides setTitle(), setAuthor(), setKeywords(), and other setter methods on the PDFDocument object.
- For Python, PyMuPDF's set_metadata() updates the document info dictionary in a single call; use set_xml_metadata() or del_xml_metadata() to keep the XMP stream consistent.
- For command-line batch processing, ExifTool is the fastest option: exiftool -Title="X" -Author="Y" *.pdf edits all files in a directory without re-rendering pages.
- Avoid Ghostscript for metadata-only edits because it re-renders the entire PDF through its pdfwrite device.
- Set a descriptive Title field on any publicly hosted PDF, as Google may display it as the page title in search results.


