Text Extraction

This guide shows you how to extract text from every page in a PDF with the extractor package. You can also extract text at a specific location in a PDF.

In addition, you can extract text and its boundary information.

Sample Input

PDF Page to be extracted

Before you begin

Get your API key from your UniCloud account.

If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.

Project setup

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this example.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the extract folder in the unipdf-examples directory.

cd unipdf-examples/extract

Configure environment variables

Set the UNIDOC_LICENSE_API_KEY environment variable, replacing PUT_YOUR_API_KEY_HERE with the API key from your UniCloud account.

Linux/Mac

export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

Windows

set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE
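A program can check that the variable is present before it does any work, so a missing key is reported clearly. The following is a minimal sketch using only the Go standard library. The helper name licenseKeyFrom is hypothetical; it is not part of UniPDF or the examples repository, and it only checks the value locally without contacting UniCloud.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// licenseKeyFrom is a hypothetical helper (not part of UniPDF). It reads
// UNIDOC_LICENSE_API_KEY through the given lookup function and rejects a
// missing, blank, or placeholder value.
func licenseKeyFrom(lookup func(string) (string, bool)) (string, error) {
	key, ok := lookup("UNIDOC_LICENSE_API_KEY")
	key = strings.TrimSpace(key)
	if !ok || key == "" {
		return "", errors.New("UNIDOC_LICENSE_API_KEY is not set")
	}
	if key == "PUT_YOUR_API_KEY_HERE" {
		return "", errors.New("UNIDOC_LICENSE_API_KEY still holds the placeholder value")
	}
	return key, nil
}

func main() {
	// os.LookupEnv already has the signature the helper expects.
	if _, err := licenseKeyFrom(os.LookupEnv); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("license key found")
}
```

Passing the lookup function as a parameter keeps the check easy to exercise without changing the real environment.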

How it works

Lines 9-16 import the UniPDF packages and other required dependencies.

Lines 18-25 use the init function to authenticate your requests with your UNIDOC_LICENSE_API_KEY.

The main function in lines 27-40 validates your input and passes the argument to the outputPdfText function.
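The input validation described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code from the repository: the helper name inputPathFromArgs and the exact usage message are hypothetical, and the sketch assumes the PDF path is the first command-line argument.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// inputPathFromArgs is a hypothetical helper. args is expected to look like
// os.Args: the program name followed by the path of the PDF to read.
func inputPathFromArgs(args []string) (string, error) {
	if len(args) < 2 || args[1] == "" {
		return "", errors.New("usage: go run pdf_extract_text.go input.pdf")
	}
	return args[1], nil
}

func main() {
	inputPath, err := inputPathFromArgs(os.Args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The real program would hand inputPath to outputPdfText here.
	fmt.Println("would extract text from:", inputPath)
}
```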

Lines 43-89 define the outputPdfText function, which accepts inputPath as an argument. The extractor package extracts the text for each page in the PDF, and the function prints the output to the terminal.
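The page-by-page loop can be illustrated without the SDK installed. In the sketch below, the pageTexter interface, the fakeDoc type, and the collectPageText function are all hypothetical stand-ins for the PDF reader and the extractor package; they show only the control flow (1-based page numbers, one extraction per page, stop on the first error) and make no claim about the UniPDF API or the exact output format of the example program.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// pageTexter is a hypothetical stand-in for the PDF reader plus extractor.
type pageTexter interface {
	NumPages() (int, error)
	PageText(pageNum int) (string, error) // pageNum is 1-based
}

// collectPageText gathers the text of every page in order, with a header
// line before each page. It stops at the first page that fails.
func collectPageText(src pageTexter) (string, error) {
	numPages, err := src.NumPages()
	if err != nil {
		return "", err
	}
	var b strings.Builder
	for pageNum := 1; pageNum <= numPages; pageNum++ {
		text, err := src.PageText(pageNum)
		if err != nil {
			return "", fmt.Errorf("page %d: %w", pageNum, err)
		}
		fmt.Fprintf(&b, "----- Page %d -----\n%s\n", pageNum, text)
	}
	return b.String(), nil
}

// fakeDoc is an in-memory pageTexter used only to make the sketch runnable.
type fakeDoc []string

func (d fakeDoc) NumPages() (int, error) { return len(d), nil }

func (d fakeDoc) PageText(pageNum int) (string, error) {
	if pageNum < 1 || pageNum > len(d) {
		return "", fmt.Errorf("page %d out of range", pageNum)
	}
	return d[pageNum-1], nil
}

func main() {
	out, err := collectPageText(fakeDoc{"Hello", "World"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Returning the text instead of printing inside the loop is a choice made for this sketch so the result can be inspected; the program described above prints to the terminal directly.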

Run the code

Run this command to extract text from each page in the PDF. This will also get all the required dependencies to run the program.

go run pdf_extract_text.go input.pdf

Sample output

The text of each page in the PDF is printed to the terminal or command line.

Extracted Text from a PDF

Got any Questions?

We're here to help you.