Search
This guide shows how to search for text in a PDF document using the UniPDF library.
Before you begin
Before you follow along with this guide, get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Project setup
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the search-and-replace folder in the unipdf-examples directory.
cd unipdf-examples/search-and-replace
Configure environment variables
Set the UNIDOC_LICENSE_API_KEY environment variable, replacing PUT_YOUR_API_KEY_HERE with the API key from your UniCloud account.
Linux/Mac
export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
Windows
set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
How it works
In lines 3-29, the code imports the necessary libraries and sets the metered license key using license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`)).
In line 31, the main function starts by checking the number of command line arguments. Lines 39-53 parse the arguments and set the pattern, pageList and filePath variables. In line 56, a new PdfReader is created using model.NewPdfReaderFromFile. The Editor object, which is used for text searching, is instantiated in line 63 using editor := extractor.NewEditor(reader).
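The argument handling described above can be sketched as follows. This is an illustration rather than the code from search_text.go: the helper parsePageList is hypothetical, and the arguments are hard-coded so the snippet runs on its own (the real program reads them from os.Args).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePageList converts a comma-separated string such as "1,2" into a
// slice of page numbers. It is a hypothetical helper, not part of UniPDF.
func parsePageList(s string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate stray commas such as "1,,2"
		}
		n, err := strconv.Atoi(part)
		if err != nil {
			return nil, fmt.Errorf("invalid page number %q: %w", part, err)
		}
		if n < 1 {
			return nil, fmt.Errorf("page numbers start at 1, got %d", n)
		}
		pages = append(pages, n)
	}
	return pages, nil
}

func main() {
	// Stand-in for os.Args so the sketch is self-contained.
	args := []string{"search_text.go", "copyright law", "1,2", "./test-data/file1.pdf"}
	if len(args) < 4 {
		fmt.Println("Usage: go run search_text.go <pattern> <pages> <input>")
		return
	}

	pattern, filePath := args[1], args[3]
	pageList, err := parsePageList(args[2])
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(pattern, pageList, filePath)
}
```

Rejecting non-numeric and non-positive page numbers up front keeps a bad argument from surfacing later as a confusing search error.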
The search itself is performed by calling the search method of the extractor.Editor object. The method returns a map from pages to the match results found on each page. The map has type map[int]Match, where the integer keys are the numbers of the pages on which matches were found. Match is a struct with Pattern, Indexes and Locations fields, as follows.
type Match struct {
Pattern string
Indexes [][]int
Locations []Box
}
The Pattern field holds the provided search pattern. The Indexes field contains the start and end indexes of each match. The Locations field is a list of rectangular coordinates where the matched text is found. These can be used for further editing of the PDF, such as highlighting or drawing rectangles around the matched content.
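To make the Indexes field more concrete, here is a minimal sketch that slices matched text out of a string using start/end pairs. It assumes each pair is a half-open [start, end) byte range, the same convention as regexp.FindAllStringIndex in the Go standard library, which is used here as a stand-in for the UniPDF search. The Box field names and the matchedText helper are assumptions for illustration; check the extractor package for the real definitions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Box and Match mirror the shapes described in this guide.
// The Box field names are assumed, not taken from UniPDF.
type Box struct {
	X1, Y1, X2, Y2 float64
}

type Match struct {
	Pattern   string
	Indexes   [][]int
	Locations []Box
}

// matchedText returns the substring of text covered by each index pair,
// assuming every pair is a half-open [start, end) byte range.
func matchedText(text string, m Match) []string {
	var out []string
	for _, idx := range m.Indexes {
		if len(idx) != 2 || idx[0] < 0 || idx[1] > len(text) || idx[0] > idx[1] {
			continue // skip malformed pairs instead of panicking
		}
		out = append(out, text[idx[0]:idx[1]])
	}
	return out
}

func main() {
	text := "Protected by copyright law. See copyright law for details."
	pattern := "copyright law"

	// Stand-in for the PDF search: a literal match over plain text.
	re := regexp.MustCompile(regexp.QuoteMeta(pattern))
	m := Match{Pattern: pattern, Indexes: re.FindAllStringIndex(text, -1)}

	fmt.Println(m.Indexes)
	fmt.Println(matchedText(text, m))
}
```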
In line 73, the results are printed by calling the printSearchResults function, which is defined in lines 79-112.
Run the code
Run the code using the following command:
go run search_text.go <pattern> <pages> <input>
Example usage:
go run search_text.go "copyright law" "1,2" ./test-data/file1.pdf
Sample output
Page 1:
indexes: [292:305]
locations: {307.08 469.42 362.53 479.42}
Page 2:
indexes: [2459:2472], [2785:2798]
locations: {128.93 231.18 184.54 241.18}, {103.11 183.18 158.88 193.18}