Text Redaction
This guide will show you how to use regular expression (regex) patterns to remove confidential and sensitive information from PDF documents.
Redaction is a form of information editing in which image and text content are permanently removed from a PDF document.
Redaction is used to remove sensitive information from a document in order to maintain confidentiality during transmission and to comply with privacy regulations such as GDPR.
In the UniPDF SDK, redaction is accomplished in two steps:
Content Identification: Redact annotations are used to identify regions where content will be removed.
Content Removal: The redact annotations are applied, and the content inside each annotated region is removed. Colored markings are drawn over the region of the removed content to show that it has been redacted.
In this example, regex patterns search the input PDF for common use cases such as credit card numbers and email addresses, and the matches are permanently redacted. You can change the regex patterns to suit your own needs.
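Before changing the patterns, it can help to test them on sample text with Go's standard `regexp` package. The two patterns below are illustrative assumptions and are not necessarily the exact ones in the example's source file: one matches four groups of four digits separated by a space or hyphen, the other matches a simple email shape. `findSensitive` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns only; adjust them to your own data.
var (
	// Four groups of four digits separated by a space or a hyphen.
	creditCard = regexp.MustCompile(`\b\d{4}[ -]\d{4}[ -]\d{4}[ -]\d{4}\b`)
	// Local part, "@", domain, and a top-level domain of 2+ letters.
	email = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
)

// findSensitive returns every substring of text matched by any pattern,
// in pattern order.
func findSensitive(text string, patterns ...*regexp.Regexp) []string {
	var matches []string
	for _, p := range patterns {
		matches = append(matches, p.FindAllString(text, -1)...)
	}
	return matches
}

func main() {
	text := "Pay with 4111-1111-1111-1111 or contact billing@example.com."
	for _, m := range findSensitive(text, creditCard, email) {
		fmt.Println(m)
	}
}
```

Simple patterns like these trade precision for readability; for example, the card pattern does not validate checksums, so it may also match other 16-digit numbers.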
Sample input
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Project setup
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the redact folder in the unipdf-examples directory.
cd unipdf-examples/redact
Configure environment variables
Set the UNIDOC_LICENSE_API_KEY environment variable, replacing PUT_YOUR_API_KEY_HERE with the API key from your UniCloud account.
Linux/Mac
export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
Windows
set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
How it works
Lines 9-18 import the UniPDF packages and other required dependencies.
The init function in lines 20-27 authenticates your request with your UNIDOC_LICENSE_API_KEY.
The main function in lines 29-59 validates the command-line input and passes it as arguments to the redactText function. You can define your regex patterns in the patterns variable. The main function also initializes a RectangleProp object, which is used to draw a black rectangular box over the position of the redacted text in the output PDF.
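The input validation step can be sketched with the standard library alone. This assumes the program is called with an input path and an output path, matching the `go run` command at the end of this guide. `parseArgs` is a hypothetical helper that takes the argument slice as a parameter, which keeps it testable without touching `os.Args`.

```go
package main

import (
	"fmt"
	"os"
)

// parseArgs expects: program name, input PDF path, output PDF path.
func parseArgs(args []string) (inputPath, outputPath string, err error) {
	if len(args) < 3 {
		return "", "", fmt.Errorf("usage: go run redact_text.go input.pdf output.pdf")
	}
	return args[1], args[2], nil
}

func main() {
	inputPath, outputPath, err := parseArgs(os.Args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("redacting %s -> %s\n", inputPath, outputPath)
}
```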
Lines 62-96 define the redactText function, which accepts inputPath, patterns, rectprops, and outputPath as arguments. The function builds RedactionTerm values from the patterns, applies the redaction to the input PDF, and saves the redacted document to outputPath.
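Turning the pattern strings into terms is where an invalid regex would surface, so it is worth doing before any PDF is opened. The sketch below uses a hypothetical `redactionTerm` struct as a stand-in that simply holds one compiled `*regexp.Regexp`; it is not the SDK's `RedactionTerm` type, and `buildTerms` is likewise a hypothetical name. Compiling with `regexp.Compile` instead of `MustCompile` returns an error instead of panicking on a bad pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// redactionTerm is a hypothetical stand-in holding one compiled pattern.
type redactionTerm struct {
	Pattern *regexp.Regexp
}

// buildTerms compiles every pattern and stops at the first invalid one,
// so a typo in a regex is reported before any PDF is processed.
func buildTerms(patterns []string) ([]redactionTerm, error) {
	terms := make([]redactionTerm, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		terms = append(terms, redactionTerm{Pattern: re})
	}
	return terms, nil
}

func main() {
	terms, err := buildTerms([]string{
		`\d{4}[ -]\d{4}[ -]\d{4}[ -]\d{4}`,
		`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`,
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(terms), "terms ready")
}
```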
Run the code
go run redact_text.go input.pdf output.pdf