Get All PDF Objects

This guide will demonstrate how to get all PDF objects in a PDF document using UniPDF.

Sample Input

Sample PDF file

Before you begin

You should get your API key from your UniCloud account.

If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the analysis folder in the unipdf-examples directory.

cd unipdf-examples/analysis

How it works

The import section in lines 9-17 imports the necessary UniPDF packages and other Go libraries. The init function loads the API key before main runs.
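The following is a minimal, standard-library-only sketch of the init pattern described above, not the example's actual code. It assumes the key is supplied through an environment variable; the variable name `UNIDOC_LICENSE_API_KEY` and the helper `loadAPIKey` are assumptions made for illustration. Go runs init before main, so the key is available by the time main starts.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// apiKeyEnvVar is an assumed environment variable name; adjust it to your setup.
const apiKeyEnvVar = "UNIDOC_LICENSE_API_KEY"

// Package-level state filled in by init before main starts.
var (
	apiKey    string
	apiKeyErr error
)

// loadAPIKey is a hypothetical helper. It takes the lookup function as a
// parameter so it can be exercised without touching the real environment.
func loadAPIKey(lookup func(string) (string, bool)) (string, error) {
	key, ok := lookup(apiKeyEnvVar)
	if !ok || key == "" {
		return "", errors.New(apiKeyEnvVar + " is not set")
	}
	return key, nil
}

// init runs automatically before main.
func init() {
	apiKey, apiKeyErr = loadAPIKey(os.LookupEnv)
}

func main() {
	if apiKeyErr != nil {
		fmt.Println("error:", apiKeyErr)
		os.Exit(1)
	}
	// In a real program, the key would be handed to UniPDF's licensing call here.
	fmt.Printf("loaded an API key of %d characters\n", len(apiKey))
}
```

Keeping the lookup separate from init makes a missing key easy to report clearly instead of failing later during PDF processing.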

The main function in lines 31-49 gets all the PDF objects and prints them. In lines 32-41, the command-line arguments are parsed to obtain inputPath and other options. In line 44, the inspectPdf function is called to inspect the PDF and print all objects found.
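As a rough illustration of the argument handling described above, here is a sketch that uses Go's standard flag package. It is not the example's actual code: the `parseArgs` helper, the `inputPath` field, and the `-password` flag are hypothetical, chosen only to show how an input path and an optional password could be collected into one options value.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// cmdOptions groups the settings that main would pass on to inspectPdf.
// The inputPath field and the flag name below are illustrative assumptions.
type cmdOptions struct {
	inputPath   string
	pdfPassword string
}

// parseArgs reads an optional -password flag followed by exactly one
// positional argument, the input PDF path.
func parseArgs(args []string) (cmdOptions, error) {
	var opt cmdOptions
	fs := flag.NewFlagSet("pdf_all_objects", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of the program output
	fs.StringVar(&opt.pdfPassword, "password", "", "password for encrypted PDFs")
	if err := fs.Parse(args); err != nil {
		return opt, err
	}
	if fs.NArg() != 1 {
		return opt, errors.New("expected exactly one input PDF path")
	}
	opt.inputPath = fs.Arg(0)
	return opt, nil
}

func main() {
	opt, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Println("Syntax: go run pdf_all_objects.go [-password pw] input.pdf")
		os.Exit(1)
	}
	fmt.Printf("Input file: %s\n", opt.inputPath)
}
```

Returning an error from the parsing step, rather than exiting inside it, leaves the decision about how to report bad input to main.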

The inspectPdf function, defined in lines 51-86, takes the input file, iterates through each PDF object, and prints the object's information to standard output. In lines 52-59 of this function, a new PdfReader is created using:

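// Configure the reader; the password only matters for encrypted PDFs.
readerOpts := model.NewReaderOpts()
readerOpts.Password = opt.pdfPassword

// Open the input file and create a PdfReader for it.
pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, readerOpts)
if err != nil {
  return err
}
defer f.Close()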

Next, the object numbers are obtained in line 61 using pdfReader.GetObjectNums(). The for loop in lines 64-81 then iterates through the object numbers, gets the corresponding object for each, and prints its details. The PDF object is obtained from the object number using pdfReader.GetIndirectObjectByNumber(objNum). If the object is of type PdfObjectStream, it is decoded using core.DecodeStream(stream) in line 73.
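The loop's type-based branching can be sketched without UniPDF as follows. This is a simplified illustration, not the example's code: `indirectObject`, `streamObject`, `decodeStream`, and `describe` are hypothetical stand-ins for the UniPDF types and calls named above, and hex encoding stands in for a real PDF stream filter.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// indirectObject and streamObject are hypothetical stand-ins for UniPDF's
// indirect object and stream object types, kept minimal for illustration.
type indirectObject struct {
	inner string // printable form of the wrapped object
}

type streamObject struct {
	hexData string // stream payload, hex-encoded for this sketch
}

// decodeStream plays the role of the stream-decoding step: it turns the
// encoded payload back into raw bytes.
func decodeStream(s *streamObject) ([]byte, error) {
	return hex.DecodeString(s.hexData)
}

// describe returns the text printed for one object, choosing the branch by
// the object's concrete type.
func describe(index, objNum int, obj interface{}) (string, error) {
	switch o := obj.(type) {
	case *streamObject:
		decoded, err := decodeStream(o)
		if err != nil {
			return "", err
		}
		return fmt.Sprintf("%3d: %d 0 stream\nDecoded:\n%s", index, objNum, decoded), nil
	case *indirectObject:
		return fmt.Sprintf("%3d: %d 0 indirect\n%s", index, objNum, o.inner), nil
	default:
		return fmt.Sprintf("%3d: %d 0 (other)", index, objNum), nil
	}
}

func main() {
	// objNums and objects mimic the list of object numbers and the
	// lookup of an object by its number.
	objNums := []int{1, 5}
	objects := map[int]interface{}{
		1: &indirectObject{inner: `Dict("Type": Catalog, )`},
		5: &streamObject{hexData: "4254"}, // hex for "BT"
	}
	for i, objNum := range objNums {
		text, err := describe(i, objNum, objects[objNum])
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("=========================================================")
		fmt.Println(text)
	}
}
```

A type switch keeps the stream and non-stream cases side by side, which makes it easy to add handling for further object types later.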

Run the code

Run the code using the following command, replacing input.pdf with the path to your PDF file:

go run pdf_all_objects.go input.pdf

Sample output

Input file: sample.pdf
10 PDF objects:
=========================================================
  0: 1 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Catalog, "Outlines": Ref(2 0), "Pages": Ref(3 0), )
=========================================================
  1: 2 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Outlines, "Count": 0, )
=========================================================
  2: 3 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Pages, "Count": 2, "Kids": [IObject:4, IObject:6], )
=========================================================
  3: 4 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Page, "Parent": IObject:3, "Resources": Dict("Font": Dict("F1": Ref(9 0), ), "ProcSet": Ref(8 0), ), "MediaBox": [0, 0, 612.000000, 792.000000], "Contents": Ref(5 0), )
=========================================================
  4: 5 0 *core.PdfObjectStream
Decoded:
2 J
BT
0 0 0 rg
/F1 0027 Tf
57.3750 722.2800 Td
( A Simple PDF File ) Tj
ET
BT
/F1 0010 Tf
69.2500 688.6080 Td
( This is a small demonstration .pdf file - ) Tj
ET
BT
/F1 0010 Tf
69.2500 664.7040 Td
( just for use in the Virtual Mechanics tutorials. More text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 652.7520 Td
( text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 628.8480 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 616.8960 Td
( text. And more text. Boring, zzzzz. And more text. And more text. And ) Tj
ET
BT
/F1 0010 Tf
69.2500 604.9440 Td
( more text. And more text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 592.9920 Td
( And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 569.0880 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 557.1360 Td
( text. And more text. And more text. Even more. Continued on page 2 ...) Tj
ET

=========================================================
  5: 6 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Page, "Parent": IObject:3, "Resources": Dict("Font": Dict("F1": Ref(9 0), ), "ProcSet": Ref(8 0), ), "MediaBox": [0, 0, 612.000000, 792.000000], "Contents": Ref(7 0), )
=========================================================
  6: 7 0 *core.PdfObjectStream
Decoded:
2 J
BT
0 0 0 rg
/F1 0027 Tf
57.3750 722.2800 Td
( Simple PDF File 2 ) Tj
ET
BT
/F1 0010 Tf
69.2500 688.6080 Td
( ...continued from page 1. Yet more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 676.6560 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 664.7040 Td
( text. Oh, how boring typing this stuff. But not as boring as watching ) Tj
ET
BT
/F1 0010 Tf
69.2500 652.7520 Td
( paint dry. And more text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 640.8000 Td
( Boring.  More, a little more text. The end, and just as well. ) Tj
ET

=========================================================
  7: 8 0 *core.PdfIndirectObject
*core.PdfObjectArray
[PDF, Text]
=========================================================
  8: 9 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Font, "Subtype": Type1, "Name": F1, "BaseFont": Helvetica, "Encoding": WinAnsiEncoding, )
=========================================================
  9: 10 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Creator": Rave (http://www.nevrona.com/rave), "Producer": Nevrona Designs, "CreationDate": D:20060301072826, )
