OpenAI has released Privacy Filter, an open-weight model specifically designed to detect and redact Personally Identifiable Information (PII) in unstructured text. Moving beyond traditional, rule-based pattern matching, Privacy Filter leverages deep language and context awareness to identify sensitive information that standard tools often miss.
Frontier Performance in a Compact Package
Privacy Filter is a small, efficient model (1.5B total parameters with 50M active parameters) engineered for high-throughput workflows. Because it is lightweight, it can run locally, so organizations can redact sensitive data on their own machines without sending it to an external server.
Context-Aware Detection: Unlike legacy tools that rely on deterministic regex (e.g., just looking for the "@" in an email), Privacy Filter analyzes the surrounding language to determine whether a string refers to a private individual, contains a secret, or is public information.
Efficient Architecture: It is a bidirectional token-classification model that labels input sequences in a single pass, supporting up to 128,000 tokens of context, which makes it well suited to processing long, complex documents.
High Accuracy: On the PII-Masking-300k benchmark, it achieves an F1 score of 97.43%, showing state-of-the-art capability in identifying sensitive spans.
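To make the token-classification idea concrete, here is a minimal sketch of how per-token labels might be merged into character spans. The `(start, end, label)` token format and the `"O"` marker for non-sensitive tokens are assumptions made for illustration; the model's actual output format and decoding logic may differ.

```python
from typing import List, Tuple

# Hypothetical per-token output: (start_char, end_char, label).
# "O" marks a non-sensitive token. This format is an assumption,
# not the model's documented output.
Token = Tuple[int, int, str]
Span = Tuple[int, int, str]


def merge_token_labels(tokens: List[Token]) -> List[Span]:
    """Merge runs of consecutive tokens that share a label into one span."""
    spans: List[Span] = []
    prev_label = "O"
    for start, end, label in tokens:
        if label != "O":
            if label == prev_label:
                # Same label as the previous token: extend the current span.
                span_start, _, _ = spans[-1]
                spans[-1] = (span_start, end, label)
            else:
                # A new label starts a new span.
                spans.append((start, end, label))
        prev_label = label
    return spans


text = "Email Jane Doe at jane@example.com"
tokens = [
    (0, 5, "O"),
    (6, 10, "private_person"),
    (11, 14, "private_person"),
    (15, 17, "O"),
    (18, 34, "private_email"),
]
spans = merge_token_labels(tokens)
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

One limitation of this simple merge: two distinct entities of the same category that sit directly next to each other would be joined into a single span.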
Privacy Taxonomy & Capabilities
The model is trained to recognize eight specific categories of sensitive information:
private_person
private_address
private_email
private_phone
private_url
private_date
account_number (banking, credit cards)
secret (passwords, API keys)
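As an illustration of how these categories could drive redaction, the sketch below replaces detected spans with typed placeholders. The span format `(start, end, label)` and the `[LABEL]` placeholder style are assumptions for this example, not something the release prescribes; only the eight label names come from the taxonomy above.

```python
from typing import List, Tuple

# The eight categories listed above.
LABELS = {
    "private_person", "private_address", "private_email", "private_phone",
    "private_url", "private_date", "account_number", "secret",
}

Span = Tuple[int, int, str]  # (start_char, end_char, label); assumed format


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a typed placeholder such as [PRIVATE_EMAIL].

    Assumes spans do not overlap. Spans are applied from the end of the
    text backwards so earlier character offsets stay valid.
    """
    out = text
    for start, end, label in sorted(spans, reverse=True):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        out = out[:start] + f"[{label.upper()}]" + out[end:]
    return out


redacted = redact(
    "Email Jane Doe at jane@example.com",
    [(6, 14, "private_person"), (18, 34, "private_email")],
)
print(redacted)  # Email [PRIVATE_PERSON] at [PRIVATE_EMAIL]
```

Typed placeholders keep the redacted text readable for downstream systems, which can still tell that a person or an email address was mentioned without seeing the value.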
Built for Production Environments
OpenAI designed Privacy Filter to be a "component in a broader privacy-by-design system." It is highly configurable, allowing developers to:
Fine-Tune: Adapt the model to domain-specific tasks (e.g., medical or legal documents), where it can quickly saturate accuracy on custom data distributions.
Control Precision/Recall: Tune the model's operating points based on whether a workflow requires stricter redaction or lower false positives.
Integrate Locally: Keep data on local infrastructure to mitigate the risks associated with third-party processing.
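The precision/recall trade-off can be sketched with a simple confidence threshold. This assumes each candidate span comes with a score between 0 and 1, which is a hypothetical interface; the model's actual decoding controls are described in its documentation and may work differently.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label); assumed format


def select_spans(scored: List[Tuple[Span, float]], threshold: float) -> Set[Span]:
    """Keep only spans whose (hypothetical) confidence meets the threshold."""
    return {span for span, score in scored if score >= threshold}


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span metrics: a prediction counts only if it equals a gold span."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {(6, 14, "private_person"), (18, 34, "private_email")}
scored = [
    ((6, 14, "private_person"), 0.95),
    ((18, 34, "private_email"), 0.60),
    ((0, 5, "private_person"), 0.40),  # a false positive
]

strict = precision_recall_f1(select_spans(scored, 0.9), gold)
lenient = precision_recall_f1(select_spans(scored, 0.3), gold)
print("threshold 0.9:", strict)
print("threshold 0.3:", lenient)
```

In this toy data, the high threshold yields perfect precision but misses the email, while the low threshold catches every gold span at the cost of one false positive. A workflow that must never leak data would lean toward the lower threshold.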
Availability
Privacy Filter is available today under the Apache 2.0 license on Hugging Face and GitHub. OpenAI has provided extensive documentation covering its architecture, decoding controls, and known limitations to help developers integrate it safely.
This tool addresses a critical challenge: handling sensitive data in the age of agentic AI. As agents increasingly interact with logs, databases, and emails, a high-performance, local redaction engine is essential for ensuring that personal data is stripped before it reaches a model's context window. You can download the model and begin testing it in your local development environment today.