Real-World PDF Redaction Failures That Exposed Sensitive Data (2023-2025)

2025-04-15

TL;DR

Between 2023 and 2025, multiple documents that were supposed to hide sensitive information were instead released with that information readable or entirely unredacted. Notable cases include state litigation documents that revealed internal corporate research, mass federal archival releases that exposed Social Security numbers, and government bodies accidentally sending unredacted records to journalists.

These failures are avoidable with simple changes to process, tooling, and verification. Below I summarise key incidents, explain common failure modes, and give a pragmatic checklist you can use to avoid repeating the same mistakes.

Selected incidents (2023-2025)

1) Trump administration release of JFK assassination records: personally identifying data exposed (March 2025)

In March 2025 the National Archives published tens of thousands of JFK-era pages with many items either released without redaction or with redactions removed.

The Washington Post reported that the release included Social Security numbers and other personal data for hundreds of living people. The disclosures prompted the Archives and other agencies to offer remediation support and drew criticism for the privacy and safety risk to people named in the files.

2) Kentucky Attorney General filing revealed TikTok internal material via redaction failure (October 2024)

A multi-state investigation into TikTok produced court filings that were heavily redacted in many jurisdictions. A Kentucky filing, however, was prepared in a way that allowed a journalist to copy/paste and read the underlying redacted text.

Kentucky Public Radio/NPR reported that internal research was exposed showing the company had studied harms to young users. Business Insider noted that this was the first large-scale public glimpse into the company's private assessments.

3) Tasmanian government accidentally sent an unredacted issues register to the ABC (July 2025)

In July 2025 the ABC reported that it received a document that was supposed to be redacted but was sent unredacted, revealing internal concerns about detaining children in watch houses.

The department quickly attempted to recall the document, but the unredacted content had already been seen and reported, creating reputational and privacy implications.

4) Repeated court-filing redaction mistakes and copy/paste defeats (ongoing, documented across 2023-2025)

Across the US and other jurisdictions, courts and legal commentators have repeatedly flagged filings that looked redacted but left the underlying text intact.

For example, Above the Law highlighted multiple 2023–24 cases where black boxes were overlaid without removing the text layer. Anyone with a PDF viewer could extract the hidden text by copying or toggling layers. These failures have produced sanctions, re-filings, and public embarrassment when sensitive data was involved.

5) Broader structural failures: metadata, poor tooling, and process gaps

Security researchers and practitioners have repeatedly demonstrated that many common redaction workflows are insecure. Ars Technica documented how supposedly "blacked out" text can be recovered via metadata or OCR layers.

The same errors keep appearing because organisations use the wrong tools, rely on visual concealment, or skip verification.

What went wrong: recurring failure modes

Non-destructive "visual" redaction: black rectangles or overlay annotations that hide the text's appearance but leave the underlying text in the file.
Publishing the wrong file or version: sending an earlier unredacted draft or failing to remove tracked changes (courts guidance).
Metadata and hidden content: document metadata, OCR text layers, or alternate representations retain sensitive data.
Tooling misuse or inadequate tools: general-purpose editors used instead of true redaction-capable software (US courts advisory).
Process and human error: insufficient review or lack of verification checklists (legal commentary).
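
The first failure mode is easier to see in a toy model. The sketch below is not a PDF parser: it assumes a page is nothing more than a list of positioned text runs, and the names TextRun, Box, overlay_redact, and destructive_redact are hypothetical, invented for illustration. It shows why drawing a box over text leaves that text extractable, while deleting the covered runs does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    """A piece of text placed at a point on the page (toy model)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    """A filled black rectangle."""
    x0: float
    y0: float
    x1: float
    y1: float

    def covers(self, run: TextRun) -> bool:
        return self.x0 <= run.x <= self.x1 and self.y0 <= run.y <= self.y1


def extract_text(runs: list[TextRun]) -> str:
    """What copy/paste or a text extractor sees: every run, drawn over or not."""
    return " ".join(run.text for run in runs)


def overlay_redact(runs: list[TextRun], box: Box) -> tuple[list[TextRun], list[Box]]:
    """Visual-only redaction: add a box, leave the text layer untouched."""
    return list(runs), [box]


def destructive_redact(runs: list[TextRun], box: Box) -> tuple[list[TextRun], list[Box]]:
    """Destructive redaction: delete the covered runs, then draw the box."""
    kept = [run for run in runs if not box.covers(run)]
    return kept, [box]


# Fake data for illustration only.
page = [
    TextRun("Name:", 10, 100),
    TextRun("Jane Doe", 60, 100),
    TextRun("SSN:", 10, 80),
    TextRun("123-45-6789", 60, 80),
]
box = Box(50, 70, 200, 90)  # covers only the SSN value

overlay_runs, _ = overlay_redact(page, box)
destructive_runs, _ = destructive_redact(page, box)

print(extract_text(overlay_runs))      # the SSN is still in the output
print(extract_text(destructive_runs))  # the SSN is gone
```

Both versions would look identical on screen. Only the second one has actually removed the data.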

The real harms (concrete consequences)

Identity theft and fraud risk: exposed Social Security numbers can be misused (Washington Post).
Personal safety risk: revealing names of intelligence sources, victims, or witnesses can endanger lives (NPR).
Legal and regulatory fallout: sanctions, re-filings, and potential lawsuits (Above the Law).
Reputational and commercial damage: internal corporate research exposed to the public and regulators (Business Insider).

Practical checklist: how to redact safely

Start from a clean source file: no tracked changes or comments.
Use destructive redaction tools: software that strips text and metadata, not just overlay boxes.
Flatten and export, then test: copy/paste into a text editor to confirm nothing remains.
Strip metadata and OCR layers: use sanitization features.
Version control and access control: restrict who touches unredacted drafts.
Peer review and automation: run automated extraction checks and have a second reviewer.
Training and process documentation: staff need clear procedures and practice.
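
The "flatten and export, then test" and "automated extraction checks" steps can be sketched as a small leak check. This is a minimal sketch under stated assumptions: it takes text that has already been extracted from the redacted output, one string per page (with a PDF library such as pypdf, that would typically come from calling extract_text() on each page of a PdfReader), and the function name find_leaks, the term list, and the SSN-like pattern are all illustrative choices rather than a standard.

```python
import re

# Assumption: US-style SSNs written with dashes. Other formats need other patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_leaks(page_texts, sensitive_terms):
    """Return (page_number, kind, value) for anything that should have been removed.

    page_texts: text extracted from the REDACTED file, one string per page.
    sensitive_terms: names or phrases that the redaction was meant to remove.
    """
    findings = []
    for page_number, text in enumerate(page_texts, start=1):
        # Collapse whitespace and ignore case so line breaks do not hide a match.
        normalized = " ".join(text.split()).casefold()
        for term in sensitive_terms:
            if " ".join(term.split()).casefold() in normalized:
                findings.append((page_number, "term", term))
        for match in SSN_PATTERN.finditer(text):
            findings.append((page_number, "ssn-like", match.group()))
    return findings


# Fake data for illustration only.
pages = [
    "Report on [REDACTED]",
    "Contact Jane  Doe, SSN 123-45-6789",
]
for finding in find_leaks(pages, ["jane doe"]):
    print(finding)
```

An empty result means only that these particular patterns were not found in the text layer. It says nothing about metadata, embedded images, or attachments, so it complements the second reviewer rather than replacing them.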

Final thoughts

Bad redaction is not an obscure edge case: it has real victims. The incidents above show two things clearly: first, releasing documents without rigorous controls can expose people and secrets; second, most exposures are preventable with a few disciplined steps.

For engineers and product people building document pipelines, treat redaction like security: destructive operations, tests, and audits, not ad-hoc black boxes.
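
One way to make "tests and audits" concrete is a release gate that records exactly which file was checked and by whom. The sketch below is an assumption about how such a gate might look, not a description of any particular product: build_release_record and its fields are hypothetical names, and the approval rule (no findings, plus a reviewer who is not the person who did the redaction) is one possible policy.

```python
import hashlib
from datetime import datetime, timezone


def build_release_record(file_bytes, findings, redactor, reviewer):
    """Build an audit record for a redacted file and decide whether it may be released.

    file_bytes: the exact bytes of the file that is about to be published.
    findings: output of an automated leak check (an empty list means nothing was found).
    redactor / reviewer: identifiers of the people involved.
    """
    approved = (not findings) and bool(reviewer) and reviewer != redactor
    return {
        # Hashing the published bytes ties the approval to one specific file,
        # which helps catch "sent the wrong version" mistakes.
        "sha256": hashlib.sha256(file_bytes).hexdigest(),
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "redactor": redactor,
        "reviewer": reviewer,
        "finding_count": len(findings),
        "approved": approved,
    }


record = build_release_record(b"redacted pdf bytes", [], "alice", "bob")
print(record["approved"], record["sha256"][:12])
```

Before sending, the pipeline would recompute the hash of the outgoing file and refuse to publish if it does not match an approved record.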

Ready to redact documents properly? RedactMyPDF provides true content removal with complete metadata scrubbing. No account required, and free for basic use.

When it comes to sensitive information, even one mistake can have serious consequences.

