Question 1

Can ChatGPT redact a PDF for me?

Accepted Answer

Not safely. A language model can suggest what to redact, but it has three problems: it cannot reliably remove the underlying text and image data from the PDF file, it can miss instances, and its output is non-deterministic. The original data often remains recoverable, and getting even that far means uploading the document to a third party.
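To make "remains recoverable" concrete, here is a minimal sketch using a hand-written, uncompressed PDF content stream. The stream, the sample SSN and the helper name `still_recoverable` are illustrative assumptions. Real PDFs are usually compressed and may encode text per font, so a plain byte search like this is not a sufficient check on a real file.

```python
# A hand-written, uncompressed PDF content stream (illustrative only).
# BT ... ET draws text; "re" defines a rectangle and "f" fills it.
SECRET = b"123-45-6789"

text_ops = b"BT /F1 12 Tf 72 700 Td (SSN: " + SECRET + b") Tj ET\n"
black_box = b"0 0 0 rg 72 695 200 16 re f\n"  # black rectangle drawn over the text

covered = text_ops + black_box  # looks redacted when rendered
removed = black_box             # the text operator itself has been deleted


def still_recoverable(stream: bytes, secret: bytes) -> bool:
    """Return True if the secret bytes are still present in the stream."""
    return secret in stream


print(still_recoverable(covered, SECRET))
print(still_recoverable(removed, SECRET))
```

The covered version renders as a black bar, yet the text operator is still in the file and can be copied out. Only the version with the operator deleted has actually lost the data.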

Question 2

What's the real cost of using an LLM to redact confidential documents?

Accepted Answer

Two costs. First, confidentiality: you send the document to a third-party service, where it may be transmitted, logged or retained outside your control — a problem for privileged, HIPAA, GDPR or contractually sensitive material. Second, the cost of failure: a single missed or recoverable redaction can mean breach notification, sanctions, malpractice exposure or a privilege waiver, which dwarfs any tooling cost.

Question 3

Why is non-deterministic redaction a problem?

Accepted Answer

Redaction has to be exact and repeatable. If the same input can produce different outputs, you cannot rely on the result, audit it, or defend it. Deterministic, rule-based removal with a verification pass is what gives you a result you can stand behind.
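A rough sketch of what deterministic, rule-based removal with a verification pass can look like, shown on plain text rather than a PDF. The rule set, the `[REDACTED]` marker and the function names are assumptions made for illustration, and the two patterns are far from exhaustive.

```python
import hashlib
import re

# Hypothetical rule set: these patterns are illustrative, not exhaustive.
RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> str:
    """Replace every match of every rule with a fixed marker."""
    for name in sorted(RULES):  # fixed rule order keeps the output stable
        text = RULES[name].sub("[REDACTED]", text)
    return text


def verify(text: str) -> list[str]:
    """Verification pass: return the names of rules that still match."""
    return [name for name in sorted(RULES) if RULES[name].search(text)]


def digest(text: str) -> str:
    """Hash the output so two runs can be compared byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


sample = "Contact jane@example.com, SSN 123-45-6789."
first = redact(sample)
second = redact(sample)

print(first)
print(digest(first) == digest(second))  # same input, same output
print(verify(first))                    # empty list means no rule still matches
```

Because the rules and their order are fixed, two runs over the same input produce identical output, and the matching digests are something you can record and check later.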

Question 4

What should I use instead of an LLM to redact a PDF?

Accepted Answer

A deterministic, local tool that removes the underlying data from the PDF structure and verifies it is gone. Omitly does this entirely on your machine — nothing is uploaded — and produces a signed audit log of every redacted region.
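For a sense of what a tamper-evident audit log of redacted regions could involve, here is a minimal sketch. It is not Omitly's actual format or signing scheme. The entry fields, the key and the function names are hypothetical, and HMAC-SHA256 stands in for whatever signature scheme a real tool would use.

```python
import hashlib
import hmac
import json


def sign_log(entries: list[dict], key: bytes) -> dict:
    """Serialize entries canonically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def check_log(log: dict, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, log["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, log["tag"])


# Hypothetical entries: page number, bounding box in points, and the rule that fired.
entries = [
    {"page": 1, "bbox": [72, 695, 272, 711], "rule": "ssn"},
    {"page": 3, "bbox": [90, 410, 250, 424], "rule": "email"},
]
key = b"demo-key-not-for-production"

log = sign_log(entries, key)
tampered = {"payload": log["payload"].replace('"page":3', '"page":4'), "tag": log["tag"]}

print(check_log(log, key))
print(check_log(tampered, key))
```

Any change to a logged region after signing makes the check fail, which is the property that lets a reviewer trust the record of what was redacted.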

Don't ask an LLM to redact your PDF.

Why it doesn't work

It misses instances

It can't actually remove the data

It's non-deterministic

There's no proof

The cost isn't the tokens. It's the breach.

Redaction should be deterministic, local and provable.

Frequently asked questions