Get All PDF Objects

This guide demonstrates how to list all the PDF objects in a PDF document using UniPDF.

Sample Input

Sample PDF file

Before you begin

You should get your API key from your UniCloud account.

If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the analysis folder in the unipdf-examples directory.

cd unipdf-examples/analysis

How it works

The import section in lines 9-16 imports the necessary UniPDF packages and other Go libraries. The init function loads the API key before the program runs.

The main function in lines 31-58 gets all the PDF objects and prints them. In lines 32-41, the command-line arguments are parsed to obtain inputPath and other options.
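The argument handling described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from the examples repository: the parseArgs helper and the idea of passing the password as an optional second argument are assumptions made for the sketch, and only the pdfPassword field name is taken from this guide.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// cmdOptions holds optional settings. The pdfPassword field name follows the
// opt.pdfPassword reference in this guide; the rest of the type is assumed.
type cmdOptions struct {
	pdfPassword string
}

// parseArgs is a hypothetical helper: it treats the first argument as the
// input path and an optional second argument as the PDF password.
func parseArgs(args []string) (string, cmdOptions, error) {
	if len(args) < 1 {
		return "", cmdOptions{}, errors.New("syntax: go run pdf_all_objects.go input.pdf [password]")
	}
	opt := cmdOptions{}
	if len(args) > 1 {
		opt.pdfPassword = args[1]
	}
	return args[0], opt, nil
}

func main() {
	inputPath, opt, err := parseArgs(os.Args[1:])
	if err != nil {
		// Print the usage hint and stop when no input file was given.
		fmt.Println(err)
		return
	}
	fmt.Printf("Input file: %s (password set: %t)\n", inputPath, opt.pdfPassword != "")
}
```

Keeping the parsing in a small function that returns an error, instead of exiting directly, makes it easy to check the behaviour without running the whole program.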

Lines 47-52 of the main function create a new PdfReader:

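// Configure the reader; the password is only needed for encrypted files.
readerOpts := model.NewReaderOpts()
readerOpts.Password = opt.pdfPassword

// Open the input file and create a reader for it.
pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, readerOpts)
if err != nil {
	return err
}
// Close the underlying file once processing is done.
defer f.Close()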

In line 53, the PrintPdfObjects function of PdfReader is called to inspect the PDF and print every object found to standard output. This function iterates through each object number, gets the corresponding object, and prints its details.
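The iterate, look up, and print pattern described above can be illustrated with a small standalone sketch. It does not use UniPDF and is not how PrintPdfObjects is implemented: toyObject and describeObjects are hypothetical names, and the in-memory map stands in for the object table a real PDF parser would build.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toyObject is a hypothetical stand-in for a parsed PDF object.
// It is not a UniPDF type.
type toyObject struct {
	generation int
	kind       string // for example "dictionary" or "array"
	body       string
}

// describeObjects walks the object numbers in ascending order, looks up each
// object, and formats a running index, the object number, the generation
// number, the kind, and the body.
func describeObjects(table map[int]toyObject) string {
	nums := make([]int, 0, len(table))
	for n := range table {
		nums = append(nums, n)
	}
	// Map iteration order is not defined in Go, so sort for stable output.
	sort.Ints(nums)

	var sb strings.Builder
	fmt.Fprintf(&sb, "%d PDF objects:\n", len(nums))
	for i, n := range nums {
		obj := table[n]
		fmt.Fprintf(&sb, "%3d: %d %d %s\n%s\n", i, n, obj.generation, obj.kind, obj.body)
	}
	return sb.String()
}

func main() {
	table := map[int]toyObject{
		2: {generation: 0, kind: "dictionary", body: `Dict("Type": Outlines, "Count": 0, )`},
		1: {generation: 0, kind: "dictionary", body: `Dict("Type": Catalog, "Pages": Ref(3 0), )`},
	}
	fmt.Print(describeObjects(table))
}
```

The running index and the object number are printed separately because the index starts at 0 while PDF object numbers start at 1, which is why the sample output pairs index 0 with object 1.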

Run the code

Run the code using the following command:

go run pdf_all_objects.go input.pdf

Sample output

Input file: sample.pdf
10 PDF objects:
=========================================================
  0: 1 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Catalog, "Outlines": Ref(2 0), "Pages": Ref(3 0), )
=========================================================
  1: 2 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Outlines, "Count": 0, )
=========================================================
  2: 3 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Pages, "Count": 2, "Kids": [IObject:4, IObject:6], )
=========================================================
  3: 4 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Page, "Parent": IObject:3, "Resources": Dict("Font": Dict("F1": Ref(9 0), ), "ProcSet": Ref(8 0), ), "MediaBox": [0, 0, 612.000000, 792.000000], "Contents": Ref(5 0), )
=========================================================
  4: 5 0 *core.PdfObjectStream
Decoded:
2 J
BT
0 0 0 rg
/F1 0027 Tf
57.3750 722.2800 Td
( A Simple PDF File ) Tj
ET
BT
/F1 0010 Tf
69.2500 688.6080 Td
( This is a small demonstration .pdf file - ) Tj
ET
BT
/F1 0010 Tf
69.2500 664.7040 Td
( just for use in the Virtual Mechanics tutorials. More text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 652.7520 Td
( text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 628.8480 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 616.8960 Td
( text. And more text. Boring, zzzzz. And more text. And more text. And ) Tj
ET
BT
/F1 0010 Tf
69.2500 604.9440 Td
( more text. And more text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 592.9920 Td
( And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 569.0880 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 557.1360 Td
( text. And more text. And more text. Even more. Continued on page 2 ...) Tj
ET

=========================================================
  5: 6 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Page, "Parent": IObject:3, "Resources": Dict("Font": Dict("F1": Ref(9 0), ), "ProcSet": Ref(8 0), ), "MediaBox": [0, 0, 612.000000, 792.000000], "Contents": Ref(7 0), )
=========================================================
  6: 7 0 *core.PdfObjectStream
Decoded:
2 J
BT
0 0 0 rg
/F1 0027 Tf
57.3750 722.2800 Td
( Simple PDF File 2 ) Tj
ET
BT
/F1 0010 Tf
69.2500 688.6080 Td
( ...continued from page 1. Yet more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 676.6560 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 664.7040 Td
( text. Oh, how boring typing this stuff. But not as boring as watching ) Tj
ET
BT
/F1 0010 Tf
69.2500 652.7520 Td
( paint dry. And more text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 640.8000 Td
( Boring.  More, a little more text. The end, and just as well. ) Tj
ET

=========================================================
  7: 8 0 *core.PdfIndirectObject
*core.PdfObjectArray
[PDF, Text]
=========================================================
  8: 9 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Font, "Subtype": Type1, "Name": F1, "BaseFont": Helvetica, "Encoding": WinAnsiEncoding, )
=========================================================
  9: 10 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Creator": Rave (http://www.nevrona.com/rave), "Producer": Nevrona Designs, "CreationDate": D:20060301072826, )
