Get All PDF Objects
This guide will demonstrate how to get all PDF objects in a PDF document using UniPDF.
Sample Input
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the analysis folder in the unipdf-examples directory.
cd unipdf-examples/analysis
How it works
The import section in lines 9-17 imports the necessary UniPDF packages and other Go libraries.
The init function loads the API key before the program runs.
The main function in lines 31-49 gets all the PDF objects and prints them. In lines 32-41, the command-line arguments are parsed to obtain inputPath and other options. In line 44, the inspectPdf function is called to inspect the PDF and print all objects found.
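The argument parsing in lines 32-41 is not reproduced in this guide. The following is a minimal sketch of what it could look like with Go's standard flag package. The cmdOptions struct, the -password flag name, and the parseArgs helper are assumptions inferred from the opt.pdfPassword field used in the reader setup, not necessarily the code in the repository.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// cmdOptions is a hypothetical options struct mirroring the opt.pdfPassword
// field referenced in this guide.
type cmdOptions struct {
	pdfPassword string
}

// parseArgs parses the options and returns the input PDF path plus the options.
// A dedicated FlagSet (instead of the global flag.CommandLine) keeps the
// function testable with arbitrary argument slices.
func parseArgs(args []string) (string, cmdOptions, error) {
	var opt cmdOptions
	fs := flag.NewFlagSet("pdf_all_objects", flag.ContinueOnError)
	fs.StringVar(&opt.pdfPassword, "password", "", "PDF password (empty by default)")
	if err := fs.Parse(args); err != nil {
		return "", opt, err
	}
	if fs.NArg() < 1 {
		return "", opt, errors.New("usage: go run pdf_all_objects.go [-password pass] input.pdf")
	}
	return fs.Arg(0), opt, nil
}

func main() {
	inputPath, opt, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("Input file: %s (password set: %t)\n", inputPath, opt.pdfPassword != "")
	// In the full example, inspectPdf would be called here with inputPath and opt.
}
```

Because the flag package stops parsing at the first non-flag argument, options such as -password must come before the input path, for example go run pdf_all_objects.go -password secret input.pdf.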
The inspectPdf function, defined in lines 51-86, takes the input file, iterates through each PDF object, and prints the object information to standard output. In lines 52-59 of this function, a new PdfReader is created using:
readerOpts := model.NewReaderOpts()
readerOpts.Password = opt.pdfPassword
pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, readerOpts)
if err != nil {
	return err
}
defer f.Close()
Next, the object numbers are obtained in line 61 using pdfReader.GetObjectNums().
Then the for loop in lines 64-81 iterates through each object number, retrieves the corresponding object, and prints its details. The PDF object is obtained from its object number using pdfReader.GetIndirectObjectByNumber(objNum). If the object is of type PdfObjectStream, it is decoded using core.DecodeStream(stream) in line 73.
Run the code
Run the code using the following command:
go run pdf_all_objects.go input.pdf
Sample output
Input file: sample.pdf
10 PDF objects:
=========================================================
0: 1 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Catalog, "Outlines": Ref(2 0), "Pages": Ref(3 0), )
=========================================================
1: 2 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Outlines, "Count": 0, )
=========================================================
2: 3 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Pages, "Count": 2, "Kids": [IObject:4, IObject:6], )
=========================================================
3: 4 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Page, "Parent": IObject:3, "Resources": Dict("Font": Dict("F1": Ref(9 0), ), "ProcSet": Ref(8 0), ), "MediaBox": [0, 0, 612.000000, 792.000000], "Contents": Ref(5 0), )
=========================================================
4: 5 0 *core.PdfObjectStream
Decoded:
2 J
BT
0 0 0 rg
/F1 0027 Tf
57.3750 722.2800 Td
( A Simple PDF File ) Tj
ET
BT
/F1 0010 Tf
69.2500 688.6080 Td
( This is a small demonstration .pdf file - ) Tj
ET
BT
/F1 0010 Tf
69.2500 664.7040 Td
( just for use in the Virtual Mechanics tutorials. More text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 652.7520 Td
( text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 628.8480 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 616.8960 Td
( text. And more text. Boring, zzzzz. And more text. And more text. And ) Tj
ET
BT
/F1 0010 Tf
69.2500 604.9440 Td
( more text. And more text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 592.9920 Td
( And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 569.0880 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 557.1360 Td
( text. And more text. And more text. Even more. Continued on page 2 ...) Tj
ET
=========================================================
5: 6 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Page, "Parent": IObject:3, "Resources": Dict("Font": Dict("F1": Ref(9 0), ), "ProcSet": Ref(8 0), ), "MediaBox": [0, 0, 612.000000, 792.000000], "Contents": Ref(7 0), )
=========================================================
6: 7 0 *core.PdfObjectStream
Decoded:
2 J
BT
0 0 0 rg
/F1 0027 Tf
57.3750 722.2800 Td
( Simple PDF File 2 ) Tj
ET
BT
/F1 0010 Tf
69.2500 688.6080 Td
( ...continued from page 1. Yet more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 676.6560 Td
( And more text. And more text. And more text. And more text. And more ) Tj
ET
BT
/F1 0010 Tf
69.2500 664.7040 Td
( text. Oh, how boring typing this stuff. But not as boring as watching ) Tj
ET
BT
/F1 0010 Tf
69.2500 652.7520 Td
( paint dry. And more text. And more text. And more text. And more text. ) Tj
ET
BT
/F1 0010 Tf
69.2500 640.8000 Td
( Boring. More, a little more text. The end, and just as well. ) Tj
ET
=========================================================
7: 8 0 *core.PdfIndirectObject
*core.PdfObjectArray
[PDF, Text]
=========================================================
8: 9 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Type": Font, "Subtype": Type1, "Name": F1, "BaseFont": Helvetica, "Encoding": WinAnsiEncoding, )
=========================================================
9: 10 0 *core.PdfIndirectObject
*core.PdfObjectDictionary
Dict("Creator": Rave (http://www.nevrona.com/rave), "Producer": Nevrona Designs, "CreationDate": D:20060301072826, )