Search
This guide shows how to search for text in a PDF using the UniPDF library.
Before you begin
Before following this guide, get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Project setup
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the search-and-replace folder in the unipdf-examples directory.
cd unipdf-examples/search-and-replace
Configure environment variables
Set the UNIDOC_LICENSE_API_KEY environment variable, replacing PUT_YOUR_API_KEY_HERE with the API key from your UniCloud account.
Linux/Mac
export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
Windows
set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
How it works
In lines 3-29, the code imports the necessary libraries and sets up the metered license key using license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`)). In line 31, the main function starts by checking the number of command line arguments. Lines 39-53 parse the arguments and set the pattern, pageList, and filePath variables. In line 56, a new PDF reader is created using model.NewPdfReaderFromFile. The Editor object, which is used for text searching, is instantiated in line 63 using editor := extractor.NewEditor(reader).
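The page argument is passed on the command line as a comma-separated string such as "1,2". A minimal sketch of how such a string could be turned into a list of page numbers is shown below. parsePageList is a hypothetical name chosen for illustration, and the rule that pages start at 1 is an assumption; the actual parsing in search_text.go may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePageList is a hypothetical helper that converts a comma-separated
// string such as "1,2" into a slice of page numbers. Empty fields are
// skipped; non-numeric fields and values below 1 are rejected
// (1-based page numbering is an assumption made for this sketch).
func parsePageList(s string) ([]int, error) {
	var pages []int
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		n, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid page number %q: %w", field, err)
		}
		if n < 1 {
			return nil, fmt.Errorf("page numbers start at 1, got %d", n)
		}
		pages = append(pages, n)
	}
	return pages, nil
}

func main() {
	pages, err := parsePageList("1,2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pages) // prints [1 2]
}
```

Rejecting bad input early keeps a typo in the page list from surfacing later as a confusing error during the search itself.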
The search is then performed by calling the search method of the extractor.Editor object. The method returns a map from page numbers to the match results found on each page. The map has type map[int]Match, where the integer keys are the numbers of the pages on which matches were found. Match is a struct with Pattern, Indexes, and Locations fields, as follows.
type Match struct {
    Pattern   string
    Indexes   [][]int
    Locations []Box
}
The Pattern field holds the search pattern that was provided. The Indexes field contains the start and end indexes of each match. The Locations field is a list of rectangular coordinates where the matched text is found. These coordinates can be used for further editing of the PDF, such as highlighting or drawing rectangles around the matched content.
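To make the shape of the Indexes field concrete, the sketch below treats each entry as a [start, end] pair and uses it to slice the matched text back out of a string. It does not call UniPDF: the standard regexp package stands in as the source of the offsets, matchedText is a hypothetical helper, and the reading of the pairs as byte offsets with an exclusive end is an assumption (it is consistent with the sample output, where a 13-character pattern yields pairs 13 apart).

```go
package main

import (
	"fmt"
	"regexp"
)

// matchedText is a hypothetical helper that slices text at each
// [start, end) pair in indexes. Pairs that are malformed or fall
// outside the text are skipped rather than causing a panic.
func matchedText(text string, indexes [][]int) []string {
	var out []string
	for _, idx := range indexes {
		if len(idx) != 2 || idx[0] < 0 || idx[1] > len(text) || idx[0] > idx[1] {
			continue
		}
		out = append(out, text[idx[0]:idx[1]])
	}
	return out
}

func main() {
	text := "Protected by copyright law. See copyright law for details."
	pattern := "copyright law"

	// The regexp package is only a stand-in here: it produces offsets
	// with the same [][]int shape as the Indexes field.
	re := regexp.MustCompile(regexp.QuoteMeta(pattern))
	indexes := re.FindAllStringIndex(text, -1)

	fmt.Println(indexes)
	fmt.Println(matchedText(text, indexes))
}
```

The bounds check matters when the offsets come from one extraction of the page text and are applied to another, since the two may not be the same length.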
In line 73, the results are printed by calling the printSearchResults function, which is defined in lines 79-112.
Run the code
Run the code using the following command:
go run search_text.go <pattern> <pages> <input>
Example usage:
go run search_text.go "copyright law" "1,2" ./test-data/file1.pdf
Sample output
Page 1:
indexes: [292:305]
locations: {307.08 469.42 362.53 479.42}
Page 2:
indexes: [2459:2472], [2785:2798]
locations: {128.93 231.18 184.54 241.18}, {103.11 183.18 158.88 193.18}