Table Extraction

This guide shows you how to extract tables from one or more PDFs and save each table to a CSV file. It is important to know that PDF does not store tables as tables: text is stored as individual characters (glyphs), each with its own position, and table borders are drawn as separate line shapes.

UniPDF extracts tables from PDF documents by detecting lines, gaps, text proximity, and other properties.
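To make the idea concrete, here is a minimal sketch of how positioned glyphs could be grouped into rows and cells using only text proximity and gaps. It is not UniPDF's algorithm: the `Glyph` type, `groupRows`, `splitCells`, and the tolerance values are hypothetical names and numbers chosen for illustration, and a real detector would also use the drawn lines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Glyph is a hypothetical stand-in for one positioned character on a page.
type Glyph struct {
	Text string
	X, Y float64 // lower-left corner in page units
	W    float64 // advance width
}

// groupRows clusters glyphs whose Y positions lie within yTol of the first
// glyph in the row, and orders rows top to bottom (PDF y grows upward).
func groupRows(glyphs []Glyph, yTol float64) [][]Glyph {
	sorted := append([]Glyph(nil), glyphs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Y > sorted[j].Y })

	var rows [][]Glyph
	for _, g := range sorted {
		n := len(rows)
		if n > 0 && rows[n-1][0].Y-g.Y <= yTol {
			rows[n-1] = append(rows[n-1], g)
			continue
		}
		rows = append(rows, []Glyph{g})
	}
	// Within each row, order glyphs left to right.
	for _, r := range rows {
		sort.SliceStable(r, func(i, j int) bool { return r[i].X < r[j].X })
	}
	return rows
}

// splitCells starts a new cell wherever the horizontal gap between
// consecutive glyphs exceeds minGap.
func splitCells(row []Glyph, minGap float64) []string {
	var cells []string
	var sb strings.Builder
	for i, g := range row {
		if i > 0 && g.X-(row[i-1].X+row[i-1].W) > minGap {
			cells = append(cells, sb.String())
			sb.Reset()
		}
		sb.WriteString(g.Text)
	}
	if sb.Len() > 0 {
		cells = append(cells, sb.String())
	}
	return cells
}

func main() {
	glyphs := []Glyph{
		{"I", 10, 700, 5}, {"D", 15, 700, 5},
		{"Q", 60, 700, 5}, {"t", 65, 700, 5}, {"y", 70, 700, 5},
		{"7", 10, 685, 5}, {"4", 60, 685, 5}, {"2", 65, 685, 5},
	}
	for _, row := range groupRows(glyphs, 2) {
		fmt.Println(strings.Join(splitCells(row, 10), " | "))
	}
}
```

With the sample glyphs above, the two rows come out as `ID | Qty` and `7 | 42`. Fixed tolerances like these are fragile on real documents, which is one reason a library combines several signals rather than relying on gaps alone.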

Sample input

Tables to be extracted from a PDF

Before you begin

Get your API key from your UniCloud account.

If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.

Project setup

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this example.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the extract folder in the unipdf-examples directory.

cd unipdf-examples/extract

Configure environment variables

Set the UNIDOC_LICENSE_API_KEY environment variable, replacing PUT_YOUR_API_KEY_HERE with the API key from your UniCloud account.

Linux/Mac

export UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

Windows

set UNIDOC_LICENSE_API_KEY=PUT_YOUR_API_KEY_HERE

How it works

Lines 9-32 import the UniPDF packages and other required dependencies.

Lines 34-41 define the init function, which authenticates your requests using your UNIDOC_LICENSE_API_KEY.

The main function in lines 47-124 validates your input and passes it as arguments to the extractTables function.
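As a rough illustration of the kind of input validation described here, the sketch below checks a list of paths and an optional page range. `validateInput` and its rules are assumptions made for this example (including the convention that 0 means "not set"); the checks in pdf_tables.go may differ.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// validateInput is a hypothetical helper, not the function used in
// pdf_tables.go. A page value of 0 means "not set".
func validateInput(paths []string, firstPage, lastPage int) error {
	if len(paths) == 0 {
		return errors.New("no input PDF given")
	}
	for _, p := range paths {
		if !strings.EqualFold(filepath.Ext(p), ".pdf") {
			return fmt.Errorf("%q is not a .pdf file", p)
		}
	}
	if firstPage < 0 || lastPage < 0 {
		return errors.New("page numbers must not be negative")
	}
	if firstPage > 0 && lastPage > 0 && firstPage > lastPage {
		return fmt.Errorf("first page %d is after last page %d", firstPage, lastPage)
	}
	return nil
}

func main() {
	fmt.Println(validateInput([]string{"input.pdf"}, 0, 0))
	fmt.Println(validateInput([]string{"notes.txt"}, 0, 0))
	fmt.Println(validateInput([]string{"input.pdf"}, 5, 2))
}
```

The upper bound of the page range cannot be checked here, because the page count is only known once the PDF has been opened.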

Lines 126-459 define the extractTables function, which accepts inputPath and the optional firstPage and lastPage arguments. It uses the extractor package to extract the tables from each page of the PDF and saves each table to a CSV file.

Run the code

Run one of the following commands to extract tables from the PDF. Running the program also downloads all the dependencies it requires.

# Run this to extract tables from all the pages in a PDF
go run pdf_tables.go input.pdf

# Run this to customize the output
go run pdf_tables.go [options] <file1.pdf> <file2.pdf> ...

Sample output

The tables extracted from each page of the PDF are saved in the outcsv folder.

Extracted Tables
