What is PDF redaction?

PDF redaction is the permanent removal of sensitive text, images, or metadata from a PDF file. True redaction replaces content with black rectangles and removes the underlying data so it cannot be recovered. Simply drawing a black box on top of text does not redact it — the original data remains in the file.

Is highlighting text in black the same as redacting it?

No. Placing a black highlight or annotation on top of text leaves the original text in the PDF file. Anyone can remove the annotation layer or copy the text from the file using a PDF reader or command-line tool. True redaction permanently deletes the underlying content.

How do I redact a PDF for free?

PDF4.dev offers a free browser-based redaction tool at pdf4.dev/tools/redact-pdf that runs entirely in your browser. No file is uploaded to any server. Draw over the content you want to remove, and the tool permanently erases it from the file using pdf-lib.

Can redacted text in a PDF be recovered?

If redaction is done correctly (the underlying text data is removed, not just covered), the content cannot be recovered. However, if redaction was performed by adding an opaque annotation or image layer, the original text may still be extractable. Always verify redaction by running a text extraction check on the output file; opening it in a text editor is less reliable because PDF content streams are usually compressed.

Does PDF redaction remove metadata too?

Basic redaction tools only remove visible content. Metadata (author, creation date, revision history, embedded comments) may still contain sensitive information. For a complete sanitization, also clear PDF metadata using a tool like pdf4.dev/tools/metadata-pdf after redacting content.

What is the difference between redaction and deletion?

Deleting a page removes the entire page from the PDF. Redaction removes specific content (words, phone numbers, images, signatures) from a page while keeping the rest. Use deletion when an entire page is sensitive, and redaction when only parts of a page need to be removed.

Is it legal to redact documents under GDPR?

Redaction is a widely used way to meet GDPR obligations. Under Article 17 (right to erasure) and the data minimisation principle in Article 5(1)(c), organizations may be required to remove personal data from documents before sharing or archiving them. To serve that purpose, the redaction must permanently remove the data, not just visually obscure it.

How do I redact a PDF programmatically?

Use a library with true redaction support, such as PyMuPDF (Python): add redaction annotations over the sensitive regions, then call apply_redactions() to delete the underlying content. Drawing filled rectangles with a general-purpose library like pdf-lib (JavaScript) only covers the content and leaves the original text in the file. For a production pipeline, you can also use the PDF4.dev API to render a pre-redacted HTML template directly as a PDF without embedding sensitive data in the first place.

What formats of content can be redacted from a PDF?

Modern PDF redaction tools can remove text, images, vector graphics, form field values, annotations, and metadata. The PDF4.dev browser tool handles text and image regions. For complex documents with embedded fonts or form data, a library like PyMuPDF or a dedicated redaction workflow is recommended.

Does redacting a PDF reduce its file size?

It depends on the tool and the content. The fill drawn over a redacted region adds only a few bytes, whatever its color. File size drops when the tool actually deletes the underlying objects, so removing images through redaction can reduce file size significantly. To optimize size further after redacting, run the PDF through a compression tool.

PDF Security

How to redact a PDF permanently (and why highlighting isn't enough)

Learn how to permanently redact sensitive text and images in PDFs. Covers free browser tools, Python scripts, and why simple highlighting fails security audits.

benoitded · April 1, 2026 · 9 min read

On this page

What redaction actually means
What stays in the file if you only draw over it
How to redact a PDF in the browser (free, no upload)
Redacting a PDF programmatically with Python
Install PyMuPDF
Redact specific text
Redact by regex pattern (phone numbers, emails, SSNs)
Remove PDF metadata after redaction
Redacting with JavaScript and pdf-lib
Preventing the problem at generation time
Redaction checklist for regulated environments
How redaction compares to other PDF security methods
Summary

PDF redaction permanently removes sensitive content from a document. The most common mistake is placing a black rectangle on top of text — this hides the text visually but leaves the original data fully accessible in the file. This guide explains what true redaction is, how to do it correctly, and when a programmatic approach is the right choice.

What redaction actually means

Redaction is the permanent deletion of content from a PDF file. A properly redacted PDF has the sensitive content removed from the file's data structures, not just hidden behind an opaque overlay. The resulting file should be indistinguishable from a document that never contained the redacted content.

A PDF stores page text in content streams, images as separate image objects, and annotations in a per-page list that viewers draw on top of the page content. A black highlight is an annotation: it sits on top of the content and can be removed by any user who opens the annotation panel in Acrobat, Preview, or another PDF editor.
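To see why an overlay is not redaction, consider a hand-written content stream. The following is a minimal sketch; the stream contents and the overlay_black_box helper are illustrative assumptions, not output from any particular tool. Real content streams are usually compressed, but the same holds once they are decompressed.

```python
# A simplified, uncompressed page content stream (illustrative only).
# BT/ET delimit a text object; Tj shows a string.
PAGE_STREAM = b"BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n"


def overlay_black_box(stream: bytes, x: int, y: int, w: int, h: int) -> bytes:
    """Append operators that paint a filled black rectangle.

    This mimics what "drawing over" content does: new drawing
    instructions are added after the existing ones, and the text
    operators are left untouched.
    """
    box = f"0 0 0 rg {x} {y} {w} {h} re f\n".encode("ascii")
    return stream + box


covered = overlay_black_box(PAGE_STREAM, 70, 695, 130, 16)

# The rectangle hides the text on screen, but the string is still in the data.
print(b"123-45-6789" in covered)  # True
```

True redaction has to rewrite or delete the text operators themselves, which is what a redaction-aware library does.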

What stays in the file if you only draw over it

When you draw a black box without true redaction, the original data remains:

Method	Text removed from file?	Image removed from file?	Copy-paste blocked?
Black highlight annotation	No	No	No
Black image overlay	No	No	No
Comment/markup box	No	No	No
True redaction (PyMuPDF apply_redactions())	Yes	Yes	Yes

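A simple test: run pdftotext file.pdf - in your terminal on any "redacted" PDF. If the sensitive words appear in the output, the redaction is incomplete. Opening the file in a text editor is less reliable, because content streams are usually compressed and the text will not be readable even when it is still present.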
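The same check can be scripted. This is a minimal sketch, assuming PyMuPDF is installed; find_leaks and extract_pdf_text are hypothetical helper names, and the file name and search terms in the usage comment are placeholders.

```python
def find_leaks(extracted_text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that still appear in the extracted text."""
    haystack = extracted_text.casefold()
    return [term for term in sensitive_terms if term.casefold() in haystack]


def extract_pdf_text(path: str) -> str:
    """Extract all page text with PyMuPDF.

    The import is done lazily so that find_leaks() can be used
    (and tested) without PyMuPDF installed.
    """
    import fitz  # PyMuPDF; assumed installed via `pip install pymupdf`

    doc = fitz.open(path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


# Usage (placeholder file name and terms):
#   leaks = find_leaks(extract_pdf_text("contract_redacted.pdf"), ["John Smith"])
#   if leaks:
#       raise SystemExit(f"Redaction incomplete: {leaks}")
```

This only catches text that a parser can extract. It does not detect sensitive content that survives inside images, so scanned pages still need a visual review.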

How to redact a PDF in the browser (free, no upload)

PDF4.dev's redact PDF tool runs entirely in your browser using pdf-lib. No file leaves your device.

1. Open pdf4.dev/tools/redact-pdf.
2. Drop your PDF onto the upload area.
3. Draw rectangles over the content you want to remove. Each rectangle becomes a permanent black block.
4. Click "Apply redactions" to flatten the changes into the file.
5. Download the result.

The tool renders each page using PDF.js, overlays your drawn rectangles as vector shapes, and then uses pdf-lib to permanently replace those regions with opaque black rectangles in the file's content stream. The original text data under each rectangle is not recoverable.

After downloading your redacted file, verify the result: open it, select all text (Cmd+A / Ctrl+A), and check whether any redacted content is selectable. If it is, the redaction did not work correctly.

Redacting a PDF programmatically with Python

For batch redaction or automated pipelines, PyMuPDF (installed from PyPI as pymupdf and imported as fitz in the examples below) is one of the most capable Python libraries for this task. It supports text search, image removal, and metadata scrubbing.

Install PyMuPDF

pip install pymupdf

Redact specific text

This script searches for a literal string (e.g., a person's name) and redacts every occurrence across all pages:

import fitz  # PyMuPDF
 
def redact_text_in_pdf(input_path: str, output_path: str, search_term: str) -> int:
    doc = fitz.open(input_path)
    total_redactions = 0
 
    for page in doc:
        # Find all instances of the search term
        instances = page.search_for(search_term)
        for rect in instances:
            # Add a redaction annotation with a black fill
            page.add_redact_annot(rect, fill=(0, 0, 0))
            total_redactions += 1
 
        # Apply redactions — this permanently removes the underlying text
        page.apply_redactions()
 
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()
    return total_redactions
 
count = redact_text_in_pdf("contract.pdf", "contract_redacted.pdf", "John Smith")
print(f"Redacted {count} instance(s)")

The key step is page.apply_redactions(). Without it, the annotations are added but the underlying content is not removed. In doc.save(), garbage=4 drops unreferenced objects and merges duplicates, and deflate=True compresses the streams.

Redact by regex pattern (phone numbers, emails, SSNs)

import fitz
import re
 
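PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    # [A-Za-z], not [A-Z|a-z]: inside a character class, | is a literal pipe
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    # (?<!\w) instead of a leading \b so numbers starting with "(" or "+" match in full
    "phone": r"(?<!\w)(\+\d{1,3}[\s.-])?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}\b",
}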
 
def redact_patterns(input_path: str, output_path: str) -> dict:
    doc = fitz.open(input_path)
    counts = {k: 0 for k in PATTERNS}
 
    for page in doc:
        # Extract the page text once, then run every pattern against it
        full_text = page.get_text("text")

        for pattern_name, pattern in PATTERNS.items():
            # Deduplicate matches: search_for() already returns every occurrence
            # of a string, so repeated matches would be annotated and counted twice
            matched_strings = {m.group() for m in re.finditer(pattern, full_text)}
            for matched in matched_strings:
                for rect in page.search_for(matched):
                    page.add_redact_annot(rect, fill=(0, 0, 0))
                    counts[pattern_name] += 1

        # Permanently remove the content under every annotation on this page
        page.apply_redactions()
 
    doc.save(output_path, garbage=4, deflate=True)
    doc.close()
    return counts
 
result = redact_patterns("document.pdf", "document_redacted.pdf")
print(result)
# Example output: {'ssn': 3, 'email': 7, 'phone': 2}

For large-scale document processing, run redaction as part of your document ingestion pipeline before storing or sharing files. Redacting at generation time (before the document is ever stored with sensitive data) is more reliable than redacting after the fact.
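Before running pattern-based redaction across a pipeline, it can help to test the patterns on plain text first. The sketch below uses only the standard library; the two patterns, the unique_matches helper, and the sample string are illustrative assumptions and should be tuned to your own data.

```python
import re

# Illustrative patterns; adjust them before relying on them.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def unique_matches(text: str) -> dict[str, set[str]]:
    """Map each pattern name to the set of distinct strings it matched."""
    return {
        name: {m.group() for m in rx.finditer(text)}
        for name, rx in PATTERNS.items()
    }


# The email appears twice in the sample but is reported once.
sample = "Contact jane@example.com or jane@example.com. SSN 123-45-6789."
print(unique_matches(sample))
```

Collecting distinct strings matters because a page-level text search returns every occurrence of a string by itself, so feeding it the same string twice would create duplicate redactions.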

Remove PDF metadata after redaction

Visible content is only part of the problem. PDF metadata can include the original author's name, revision history, and document title. Clear it after redacting:

import fitz
 
def scrub_metadata(input_path: str, output_path: str) -> None:
    doc = fitz.open(input_path)
    doc.set_metadata({
        "author": "",
        "creator": "",
        "producer": "",
        "subject": "",
        "title": "",
        "keywords": "",
        "creationDate": "",
        "modDate": "",
    })
    doc.del_xml_metadata()  # also drop the XMP metadata stream, if present
    doc.save(output_path, garbage=4, deflate=True, clean=True)
    doc.close()

Alternatively, use the PDF4.dev metadata editor tool to clear metadata in the browser without writing code.
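To confirm the scrub worked, the metadata dictionary can be checked for leftover values. This is a rough sketch: leftover_metadata is a hypothetical helper, and treating the "format" and "encryption" keys as non-sensitive is an assumption about PyMuPDF's metadata dictionary.

```python
def leftover_metadata(metadata: dict) -> dict:
    """Return the metadata fields that still hold a non-empty value."""
    # Assumption: these keys describe the file itself, not its authors.
    ignored = {"format", "encryption"}
    return {k: v for k, v in metadata.items() if k not in ignored and v}


# Usage with PyMuPDF (placeholder file name):
#   import fitz
#   doc = fitz.open("document_redacted.pdf")
#   remaining = leftover_metadata(doc.metadata)
#   if remaining:
#       print("Metadata still present:", remaining)
```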

Redacting with JavaScript and pdf-lib

For Node.js environments, pdf-lib can draw filled rectangles over specific regions. This approach requires knowing the coordinates of the content to redact — it does not support text search.

import { PDFDocument, rgb } from "pdf-lib";
import fs from "fs";
 
interface RedactionRect {
  page: number; // 0-indexed
  x: number;
  y: number;
  width: number;
  height: number;
}
 
async function redactPdf(
  inputPath: string,
  outputPath: string,
  redactions: RedactionRect[]
): Promise<void> {
  const pdfBytes = fs.readFileSync(inputPath);
  const pdfDoc = await PDFDocument.load(pdfBytes);
  const pages = pdfDoc.getPages();
 
  for (const rect of redactions) {
    const page = pages[rect.page];
    const { height: pageHeight } = page.getSize();
 
    // PDF coordinate system: origin is bottom-left
    // Convert from top-left coordinates to bottom-left
    page.drawRectangle({
      x: rect.x,
      y: pageHeight - rect.y - rect.height,
      width: rect.width,
      height: rect.height,
      color: rgb(0, 0, 0),
      opacity: 1,
    });
  }
 
  const outputBytes = await pdfDoc.save();
  fs.writeFileSync(outputPath, outputBytes);
}
 
// Example: cover a 200x20 pt rectangle on page 0, 50 pt from the top and 100 pt from the left
await redactPdf("input.pdf", "output.pdf", [
  { page: 0, x: 100, y: 50, width: 200, height: 20 },
]);

pdf-lib's drawRectangle() paints over content; it does not remove the underlying text from the content stream, so the text can still be extracted. For PDFs where the text must be unrecoverable, use PyMuPDF with apply_redactions() or a dedicated redaction service. The PDF4.dev browser tool uses pdf-lib for client-side convenience; for regulated contexts (HIPAA, GDPR, legal discovery), use PyMuPDF or a dedicated solution and verify the output.

Preventing the problem at generation time

If you generate PDFs programmatically (invoices, reports, contracts), the most secure approach is to never embed sensitive data in the final PDF in the first place. Design your template to exclude or abbreviate sensitive fields:

// Instead of embedding a full credit card number in the PDF
const rawInvoiceData = {
  customer_name: "Jane Doe",
  card_number: "4111111111111111", // ❌ don't embed this
};
 
// Mask it before passing to the PDF template
const invoiceData = {
  customer_name: rawInvoiceData.customer_name,
  card_last_four: rawInvoiceData.card_number.slice(-4), // ✅ only embed what's needed
};
 
const response = await fetch("https://pdf4.dev/api/v1/render", {
  method: "POST",
  headers: {
    Authorization: "Bearer p4_live_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    template_id: "invoice",
    data: invoiceData,
  }),
});

This eliminates the need for post-generation redaction entirely. See the guide on generating PDF invoices programmatically for a full example.
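The same masking step can be done in Python before the request is built. This is a minimal sketch: mask_card and build_render_payload are hypothetical helpers, and the payload shape simply mirrors the request body shown above.

```python
import json


def mask_card(card_number: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) < 4:
        raise ValueError("card number too short to mask")
    return digits[-4:]


def build_render_payload(customer_name: str, card_number: str) -> str:
    """Build the JSON request body; the full card number never enters it."""
    data = {
        "customer_name": customer_name,
        "card_last_four": mask_card(card_number),
    }
    return json.dumps({"template_id": "invoice", "data": data})


payload = build_render_payload("Jane Doe", "4111 1111 1111 1111")
print(payload)
```

Masking inside the payload builder means no caller can pass the full number through to the template by accident.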

Redaction checklist for regulated environments

For documents subject to GDPR, HIPAA, CCPA, or legal discovery, use this checklist before sharing a redacted PDF:

Step	Action	Tool
1	Redact visible text and images	PyMuPDF `apply_redactions()` or PDF4.dev tool
2	Clear document metadata (author, title, dates)	PyMuPDF `set_metadata()` or PDF4.dev metadata tool
3	Remove embedded attachments	PyMuPDF `embfile_del()` if applicable
4	Remove JavaScript actions	Manual review or specialized tool
5	Verify: extract text and confirm no sensitive content remains	`pdftotext file.pdf -` or PyMuPDF `get_text()`
6	Verify: check metadata is cleared	`exiftool file.pdf` or PDF properties dialog
7	Password-protect if distributing externally	PDF4.dev protect PDF tool

Step 5 is the most frequently skipped. Run it every time.
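Step 3 can be scripted with PyMuPDF's embedded-file methods. The sketch below is one possible approach; remove_attachments is a hypothetical helper, and the file names in the usage comment are placeholders.

```python
def remove_attachments(doc) -> list[str]:
    """Delete every embedded file from an open PyMuPDF document.

    `doc` is expected to be a fitz.Document; only its embfile_names()
    and embfile_del() methods are used.
    """
    # Copy the names first, because deleting changes the underlying list.
    names = list(doc.embfile_names())
    for name in names:
        doc.embfile_del(name)
    return names


# Usage (placeholder file names):
#   import fitz
#   doc = fitz.open("document_redacted.pdf")
#   removed = remove_attachments(doc)
#   doc.save("document_clean.pdf", garbage=4, deflate=True)
#   print(f"Removed {len(removed)} attachment(s)")
```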

How redaction compares to other PDF security methods

Method	What it does	Reversible?	Use case
Redaction	Permanently removes content	No	Sharing with third parties
Password protection	Restricts access to the file	Yes (if password known)	Controlling who opens the file
Watermark	Marks the document as confidential	Yes	Deterrence, not removal
Permission restrictions	Blocks printing, copying, editing	Partial	Reducing accidental sharing
Page deletion	Removes entire pages	No	Removing wholly sensitive pages

Redaction and password protection solve different problems. Redaction removes the sensitive content; password protection controls who can read the remaining content. For the highest security, apply both. See how to password-protect a PDF for the second step.

Summary

Permanent PDF redaction requires removing content from the file's data structures, not just drawing over it. For browser-based redaction of individual files, the PDF4.dev redact tool handles this without uploading your document. For programmatic pipelines, PyMuPDF's apply_redactions() is the most reliable option for ensuring text cannot be recovered. After redacting, always verify by extracting text from the output file, and clear metadata as a second step.

For teams generating documents programmatically, the most secure approach is designing templates that never include sensitive data beyond what needs to appear in the final document — eliminating the redaction step entirely.

Related tools: redact PDF · protect PDF · edit PDF metadata · watermark PDF
