Redaction guide · why "ask the AI to redact it" backfires

Don't ask an LLM to redact your PDF.

It feels fast. But a chat model can't reliably remove data from a PDF, can miss instances, and may give a different answer on every run. And just to try, you've handed a confidential document to a third party.

Why it doesn't work

It misses instances

An LLM has no guarantee of finding every name, account number or identifier in a long document. Miss one and the redaction has failed — and you won't know which one it missed.

It can't actually remove the data

A chat model produces text or instructions; it does not rewrite a PDF's content streams. Whatever file you end up with can still carry the original data: underneath a black box, in the document metadata, or in other page objects.
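To see why covering text is not the same as removing it, here is a minimal sketch using only the Python standard library. It builds a hypothetical page content stream that draws a line of text and then paints a black rectangle over it, the way a purely visual "redaction" would, and then reads the text back out. The operators (`BT`/`Tj`/`ET` for text, `rg`/`re`/`f` for a filled rectangle) are standard PDF syntax; the stream, the sample name and the `visible_strings` helper are made up for illustration.

```python
import re
import zlib

# Hypothetical page content stream: draw a name, then paint a black box over it.
content = (
    b"BT /F1 12 Tf 72 700 Td (Jane Q. Doe) Tj ET\n"  # text-showing operators
    b"0 0 0 rg 70 695 90 16 re f\n"                   # black filled rectangle on top
)

# PDF writers usually Flate-compress content streams; compression hides nothing.
stored = zlib.compress(content)


def visible_strings(stream: bytes) -> list[str]:
    """Decompress a content stream and return the literal strings shown by Tj."""
    raw = zlib.decompress(stream)
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", raw)]


print(visible_strings(stored))  # ['Jane Q. Doe']
```

The box changes what a viewer renders, not what the file contains: anything that reads the stream still recovers the name. Actually removing it means rewriting the stream so that string is no longer there.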

It's non-deterministic

Run the same prompt twice and you can get two different results. Redaction needs to be exact and repeatable — not a best-effort guess that changes each time.
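By contrast, rule-based redaction is a pure function of its input. The sketch below is an illustration only, assuming two hypothetical rules (an account-number pattern and a fixed name) and operating on plain text rather than on a PDF. It shows the two properties that matter here: repeated runs produce identical output, and a verification pass confirms that nothing the rules target survives.

```python
import re

# Hypothetical rules: an account-number-like pattern and one specific name.
RULES = [
    re.compile(r"\b\d{4}-\d{4}-\d{4}\b"),  # e.g. 1234-5678-9012
    re.compile(r"\bJane Q\. Doe\b"),
]


def redact(text: str) -> str:
    """Replace every match of every rule with a same-length block of characters."""
    for rule in RULES:
        text = rule.sub(lambda m: "█" * len(m.group()), text)
    return text


def verify(text: str) -> bool:
    """Verification pass: no rule may match anything in the redacted output."""
    return not any(rule.search(text) for rule in RULES)


doc = "Pay Jane Q. Doe from account 1234-5678-9012, then Jane Q. Doe again."
out1, out2 = redact(doc), redact(doc)

assert out1 == out2  # deterministic: the same input always yields the same output
assert verify(out1)  # every targeted instance is gone, not just the first one
print(out1)
```

A real tool has to apply the same idea to the PDF's objects rather than to extracted text, but the guarantee is the same: the rules, not a model's sampling, decide the result.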

There's no proof

There's no audit log, no verification pass, no defensible record that a specific region was removed. In legal, compliance or healthcare work, the proof is the point.
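What a tamper-evident record can look like is sketched below with Python's standard `hmac` and `hashlib` modules. The field names, the key and the use of HMAC-SHA256 are all assumptions for illustration, not Omitly's log format; a production tool would more likely use an asymmetric signature so third parties can verify a log without holding the secret key.

```python
import hashlib
import hmac
import json

# Hypothetical key; a real deployment would load this from a key store.
KEY = b"example-signing-key"


def _canonical(entry: dict) -> bytes:
    """Serialize deterministically so the same entry always hashes the same way."""
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()


def sign_entry(entry: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over the canonical entry."""
    tag = hmac.new(KEY, _canonical(entry), hashlib.sha256).hexdigest()
    return {"entry": entry, "hmac": tag}


def check_entry(record: dict) -> bool:
    """Recompute the tag; any edit to the logged entry makes this return False."""
    expected = hmac.new(KEY, _canonical(record["entry"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["hmac"])


# One hypothetical log entry: which file, which page, which rectangle, verified empty.
record = sign_entry({
    "file_sha256": hashlib.sha256(b"example PDF bytes").hexdigest(),
    "page": 3,
    "rect": [70, 695, 160, 711],
    "verified_empty": True,
})
print(check_entry(record))   # True
record["entry"]["page"] = 4  # tamper with the logged entry
print(check_entry(record))   # False
```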

The cost isn't the tokens. It's the breach.

Sending a document to an LLM service means it leaves your control: it is transmitted to a third party, which may log or retain it. For privileged, HIPAA, GDPR or contractually sensitive material, that alone can be the violation.

And if the redaction is incomplete or recoverable, the real bill arrives later: breach notification, sanctions, malpractice exposure, a waived privilege. Any one of those dwarfs the price of doing it properly.

// "redact this" → LLM
upload(confidential.pdf) // left your control ✗
output = guess() // misses + non-deterministic ✗

// omitly (local)
remove(data); verify(gone) // ✓ audited, on-device

Redaction should be deterministic, local and provable.

Omitly removes the underlying text and image data from the PDF itself, verifies nothing survives in any redacted region, and writes a signed audit log — entirely on your machine. No upload, no guessing, no third party.

Frequently asked questions

Can ChatGPT redact a PDF for me?

Not safely. A language model can suggest what to redact, but it cannot reliably remove the underlying text and image data from the PDF file, it can miss instances, and its output is non-deterministic. The original data often remains recoverable, and you've uploaded the document to a third party to get there.

What's the real cost of using an LLM to redact confidential documents?

Two costs. First, confidentiality: the document leaves your control and may be logged or retained by a third-party service, which is a problem for privileged, HIPAA, GDPR or contractually sensitive material. Second, the cost of failure: a single missed or recoverable redaction can mean breach notification, sanctions, malpractice exposure or a privilege waiver, any of which dwarfs the tooling cost.

Why is non-deterministic redaction a problem?

Redaction has to be exact and repeatable. If the same input can produce different outputs, you cannot rely on it, audit it, or defend it. Deterministic, rule-based removal with a verification pass is what gives you a result you can stand behind.

What should I use instead of an LLM to redact a PDF?

A deterministic, local tool that removes the underlying data from the PDF structure and verifies it is gone. Omitly does this entirely on your machine — nothing is uploaded — and produces a signed audit log of every redacted region.