Text Extraction
This guide shows you how to extract text from every page in a PDF with the extractor package. You can also extract text at a specific location in a PDF.
In addition, you can extract text and its boundary information.
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Project setup
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this example.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the extract folder in the unipdf-examples directory.
cd unipdf-examples/extract
Configure environment variables
Set the UNIDOC_LICENSE_API_KEY environment variable, replacing the PUT_YOUR_API_KEY_HERE placeholder with the API key from your UniCloud account.
Linux/Mac
export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
Windows
set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
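If the variable is not visible to the Go process, the license step in the init function cannot succeed. The following is a minimal sketch of an early check you could run first. The helper licenseKeyFrom is hypothetical: it is not part of UniPDF or the examples repository, and it uses only the Go standard library.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// licenseEnvVar is the variable name this guide asks you to set.
const licenseEnvVar = "UNIDOC_LICENSE_API_KEY"

// placeholder is the literal value shown in the export/set commands.
const placeholder = "PUT_YOUR_API_KEY_HERE"

// licenseKeyFrom is a hypothetical helper (an assumption for illustration,
// not a UniPDF API). It reads the key through getenv and returns an error
// if the variable is unset, blank, or still holds the placeholder.
func licenseKeyFrom(getenv func(string) string) (string, error) {
	key := strings.TrimSpace(getenv(licenseEnvVar))
	switch key {
	case "":
		return "", errors.New(licenseEnvVar + " is not set")
	case placeholder:
		return "", errors.New(licenseEnvVar + " still holds the placeholder value")
	}
	return key, nil
}

func main() {
	// os.Getenv returns "" for an unset variable, which the helper rejects.
	if _, err := licenseKeyFrom(os.Getenv); err != nil {
		fmt.Println("license check failed:", err)
		return
	}
	fmt.Println("license key found")
}
```

Passing the lookup function as a parameter keeps the check easy to exercise without touching the real environment.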
How it works
Lines 9-16 import the UniPDF packages and other required dependencies.
Lines 18-25 authenticate your request with your UNIDOC_LICENSE_API_KEY in the init function.
The main function in lines 27-40 validates your input and passes the argument to the outputPdfText function.
Lines 43-89 define the outputPdfText function, which accepts inputPath as an argument. The extractor package extracts the text for each page in the PDF and prints the output to the terminal.
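The control flow described above (validate the argument, then visit each page and print its text) can be sketched without the SDK. This is an illustration under stated assumptions, not the code in pdf_extract_text.go: the pageTextSource interface, the fakePdf type, and the inputPathFromArgs and collectPdfText helpers are all hypothetical stand-ins for the UniPDF reader and the extractor package, and the "Page N:" header format is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pageTextSource is a hypothetical stand-in for the PDF reader plus the
// extractor package. It is NOT a UniPDF type.
type pageTextSource interface {
	NumPages() (int, error)
	PageText(pageNum int) (string, error) // pageNum is 1-based
}

// inputPathFromArgs mirrors the validation step in main: the program needs
// at least one argument after the program name.
func inputPathFromArgs(args []string) (string, error) {
	if len(args) < 2 {
		return "", errors.New("usage: go run pdf_extract_text.go input.pdf")
	}
	return args[1], nil
}

// collectPdfText mirrors the described per-page loop: visit every page in
// order, extract its text, and place it under a per-page header.
func collectPdfText(src pageTextSource) (string, error) {
	numPages, err := src.NumPages()
	if err != nil {
		return "", fmt.Errorf("counting pages: %w", err)
	}
	var out strings.Builder
	for pageNum := 1; pageNum <= numPages; pageNum++ {
		text, err := src.PageText(pageNum)
		if err != nil {
			return "", fmt.Errorf("extracting page %d: %w", pageNum, err)
		}
		fmt.Fprintf(&out, "Page %d:\n%s\n", pageNum, text)
	}
	return out.String(), nil
}

// fakePdf is an in-memory source used only to make this sketch runnable.
type fakePdf []string

func (f fakePdf) NumPages() (int, error) { return len(f), nil }

func (f fakePdf) PageText(pageNum int) (string, error) {
	if pageNum < 1 || pageNum > len(f) {
		return "", fmt.Errorf("page %d out of range", pageNum)
	}
	return f[pageNum-1], nil
}

func main() {
	// Stand-in for os.Args so the sketch runs without a real PDF.
	args := []string{"pdf_extract_text", "input.pdf"}
	path, err := inputPathFromArgs(args)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Extracting text from", path)

	doc := fakePdf{"first page text", "second page text"}
	text, err := collectPdfText(doc)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Print(text)
}
```

In the real example, the reader and extractor supplied by UniPDF take the place of fakePdf, and any error from opening the file or extracting a page ends the run.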
Run the code
Run this command to extract text from each page in the PDF. The command also fetches the dependencies the program requires.
go run pdf_extract_text.go input.pdf
Sample output
The text of each page in the PDF is printed to the terminal.