Text Redaction

This guide will show you how to use regular expression (regex) patterns to remove confidential and sensitive information from PDF documents.

Redaction is a form of information editing in which image and text content are permanently removed from a PDF document.

Redaction is used to remove sensitive information from a document in order to maintain confidentiality during transmission and to comply with privacy regulations such as GDPR.

In the UniPDF SDK, redaction is accomplished in two steps:

  • Content Identification: Redact annotations are used to identify regions where content will be removed.

  • Content Removal: The redact annotations are applied, and the content inside each annotated region is removed. A colored marking is drawn over each region to show that content was redacted there.

In this example, the regex patterns search the input PDF for common use cases such as credit card numbers and email addresses, and permanently redact every match. You can change the regex patterns to suit your own needs.
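
Before pointing a pattern at a PDF, it can help to try it on plain text. The following is a minimal sketch using only Go's standard regexp package. The two patterns are illustrative assumptions and may differ from the ones shipped in redact_text.go; the names cardPattern and emailPattern are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns only; the ones in the example repository may differ.
var (
	// Four groups of four digits separated by a space or a hyphen.
	cardPattern = regexp.MustCompile(`\b\d{4}[ -]\d{4}[ -]\d{4}[ -]\d{4}\b`)
	// A deliberately simple email shape: local part, "@", domain, dot, TLD.
	emailPattern = regexp.MustCompile(`[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
)

func main() {
	text := "Card 4111 1111 1111 1111, contact jane.doe@example.com."
	fmt.Println(cardPattern.FindAllString(text, -1))
	fmt.Println(emailPattern.FindAllString(text, -1))
}
```

Simple patterns like these can both miss valid values and match unrelated text, so it is worth testing them against representative samples of your own documents.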

Sample input

PDF to be redacted

Before you begin

You should get your API key from your UniCloud account.

If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.

Project setup

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the redact folder in the unipdf-examples directory.

cd unipdf-examples/redact

Configure environment variables

Set the UNIDOC_LICENSE_API_KEY environment variable to the API key from your UniCloud account, replacing PUT_YOUR_API_KEY_HERE in the command for your platform.

Linux/Mac

export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

Windows

set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

How it works

Lines 9-18 import the UniPDF packages and other required dependencies.

The init function in lines 20-27 authenticates your request with your UNIDOC_LICENSE_API_KEY.

The main function in lines 29-59 validates your input and passes it as arguments to the redactText function. You can define your regex patterns in the patterns variable and initialize a RectangleProps object to draw a black rectangular box over the position of the redacted text in the output PDF.
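
The input validation can be as simple as checking the argument count. This is a sketch under the assumption that the program takes exactly an input path and an output path, as in the run command of this guide; parseArgs is a hypothetical helper and the real main function may validate differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// parseArgs is a hypothetical helper that expects exactly two arguments
// after the program name: the input PDF path and the output PDF path.
func parseArgs(args []string) (inputPath, outputPath string, err error) {
	if len(args) != 3 {
		return "", "", errors.New("usage: go run redact_text.go input.pdf output.pdf")
	}
	return args[1], args[2], nil
}

func main() {
	inputPath, outputPath, err := parseArgs(os.Args)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("would redact %s and write %s\n", inputPath, outputPath)
}
```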

Lines 62-96 define the redactText function, which accepts inputPath, patterns, rectprops and outputPath as arguments. The function creates a RedactionTerm for each pattern, applies the redaction to the input PDF and saves the redacted document to outputPath.
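
Because the patterns arrive as strings, each one has to be compiled before it can be used, and an invalid pattern should stop the run before any document is touched. The sketch below covers only that compile step with the standard library; compilePatterns is a hypothetical helper, and the UniPDF-specific steps (wrapping each compiled pattern in a redaction term, applying it and writing the file) are left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePatterns is a hypothetical helper: it compiles each pattern string
// and stops at the first one that is not a valid regular expression.
func compilePatterns(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

func main() {
	_, err := compilePatterns([]string{`\d{4}`, `(`})
	fmt.Println(err != nil) // prints "true": "(" is not a valid pattern
}
```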

Run the code

go run redact_text.go input.pdf output.pdf

Sample output

Redacted PDF

Got any questions?

We're here to help you.