Reconstruct Text from PDFs
This guide shows how accurately UniPDF extracts text from a PDF. The extractor package extracts the text from the input PDF, and the creator package reconstructs it by writing the text of each page to a new PDF.
The Reconstruct words from PDF example displays the position of words in the reconstructed text PDF.
Note: Only text in a PDF will be reconstructed.
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the extract folder in the unipdf-examples directory.
Configure environment variables
Set the UNIDOC_LICENSE_API_KEY environment variable to the API key from your UniCloud account.
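Before running the example, it can help to confirm that the variable is visible to Go programs. The following is a minimal sketch that uses only the standard library and does not call UniPDF. The helper name licenseKeyFromEnv is hypothetical and is not part of the examples repository.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// licenseKeyFromEnv is a hypothetical helper. It returns the trimmed value of
// UNIDOC_LICENSE_API_KEY, or an error when the variable is unset or blank.
// The lookup parameter has the same shape as os.LookupEnv, so the check can
// be exercised without touching the real environment.
func licenseKeyFromEnv(lookup func(string) (string, bool)) (string, error) {
	value, ok := lookup("UNIDOC_LICENSE_API_KEY")
	if !ok || strings.TrimSpace(value) == "" {
		return "", errors.New("UNIDOC_LICENSE_API_KEY is not set")
	}
	return strings.TrimSpace(value), nil
}

func main() {
	key, err := licenseKeyFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print only the length so the key itself never reaches the terminal.
	fmt.Printf("license key found (%d characters)\n", len(key))
}
```

If the program exits with an error, export the variable in the same terminal session that you use to run the example.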
How it works
Lines 14-22 import the UniPDF packages and other required dependencies.
Lines 24-31 authenticate your request with your UNIDOC_LICENSE_API_KEY in the init function.
The main function in lines 33-47 validates your input and passes it on as an argument.
Lines 49-103 define the reconstruct function, which takes the inputPath as an argument. The extractor package extracts the text from each page of the PDF, and the text is then written page by page to the output PDF.
Run the code
Run this command to reconstruct the text in a PDF. This will also get all the required dependencies to run the program.
go run reconstruct_text.go input.pdf
The program writes the text of each page of the input file to a new PDF. The output resembles the input PDF, except that it contains text only.