Search

This guide shows how to search for text in a PDF using the UniPDF library.

Before you begin

Before you start, get your API key from your UniCloud account.

If this is your first time using the UniPDF SDK, follow this guide to set up a local development environment.

Project setup

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the search-and-replace folder in the unipdf-examples directory.

cd unipdf-examples/search-and-replace

Configure environment variables

Set the UNIDOC_LICENSE_API_KEY environment variable to the API key from your UniCloud account, replacing PUT_YOUR_API_KEY_HERE in the commands below.

Linux/Mac

export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

Windows

set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

How it works

Lines 3-29 import the required packages and set the metered license key with license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`)).

In line 31, the main function begins by checking the number of command-line arguments. Lines 39-53 parse the arguments and set the pattern, pageList, and filePath variables. Line 56 creates a new PdfReader with model.NewPdfReaderFromFile, and line 63 instantiates the Editor object used for text searching with editor := extractor.NewEditor(reader).
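The argument handling can be sketched as follows. This is not the repository's code: `parsePages` is a hypothetical helper, and it assumes the page list is plain comma-separated numbers such as "1,2" (page ranges like "1-3" are not handled).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePages converts a comma-separated page list such as "1,2" into page
// numbers. Hypothetical helper: it accepts only positive integers and
// rejects an empty list.
func parsePages(s string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		n, err := strconv.Atoi(part)
		if err != nil || n < 1 {
			return nil, fmt.Errorf("invalid page number %q", part)
		}
		pages = append(pages, n)
	}
	if len(pages) == 0 {
		return nil, fmt.Errorf("no pages given in %q", s)
	}
	return pages, nil
}

func main() {
	// Expected usage: go run search_text.go <pattern> <pages> <input>
	if len(os.Args) < 4 {
		fmt.Fprintln(os.Stderr, "Usage: go run search_text.go <pattern> <pages> <input>")
		os.Exit(1)
	}
	pattern, filePath := os.Args[1], os.Args[3]
	pageList, err := parsePages(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pattern=%q pages=%v file=%s\n", pattern, pageList, filePath)
}
```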

The search itself is performed by calling the Search method of the extractor.Editor object. It returns a value of type map[int]Match, where each key is the number of a page on which matches were found and each value holds the matches for that page. Match is a struct with Pattern, Indexes, and Locations fields:

type Match struct {
	Pattern   string
	Indexes   [][]int
	Locations []Box
}

Pattern is the search pattern that was provided. Indexes holds the start and end offsets of each match. Locations is a list of rectangles giving the position of each match on the page; these can be used for further editing of the PDF, such as highlighting the matched text or drawing rectangles around it.

Line 73 prints the results by calling printSearchResults, a helper function defined in lines 79-112.

Run the code

Run the code with the following command:

go run search_text.go <pattern> <pages> <input>

Example usage:

go run search_text.go "copyright law" "1,2" ./test-data/file1.pdf

Sample output

Page 1:
indexes: [292:305]
locations: {307.08 469.42 362.53 479.42}

Page 2:
indexes: [2459:2472], [2785:2798]
locations: {128.93 231.18 184.54 241.18}, {103.11 183.18 158.88 193.18}
