Inspect PDF Objects

This guide explains how to inspect the object types in a PDF file using UniPDF.

Sample Input

Sample PDF file

Before you begin

Get your API key from your UniCloud account.

If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.

Clone the project repository

In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.

git clone https://github.com/unidoc/unipdf-examples.git

Navigate to the analysis folder in the unipdf-examples directory.

cd unipdf-examples/analysis

How it works

In the example code (pdf_inspect.go), the import section imports the necessary libraries. The init function, defined in lines 17-24, loads the metered license key that authenticates library requests.

The main function, defined in lines 26-40, calls inspectPdf(inputPath) to inspect the provided PDF file.

The inspectPdf function is defined in lines 42-93. It starts by creating a new PdfReader from the provided input path. Line 49 gets the number of pages in the PDF file. In line 56, the Inspect method of the PdfReader retrieves object type information; it returns a map from each object type to the number of times that type occurs in the document. Lines 62-66 sort the keys of this map so that the output order is stable. The object types are then printed in lines 70-72 using:

for _, key := range keys {
	fmt.Printf("- %s: %d instances\n", key, objTypes[key])
}
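The sort-then-print step can be tried on its own, without a PDF. The following is a minimal sketch that uses only the standard library; the helper name formatObjectTypes and the sample counts are assumptions made for illustration and are not taken from pdf_inspect.go.

```go
package main

import (
	"fmt"
	"sort"
)

// formatObjectTypes is a hypothetical helper (not part of pdf_inspect.go).
// It returns one report line per object type, ordered by type name.
// Go does not guarantee map iteration order, so the keys are collected
// and sorted first to keep the report stable between runs.
func formatObjectTypes(objTypes map[string]int) []string {
	keys := make([]string, 0, len(objTypes))
	for key := range objTypes {
		keys = append(keys, key)
	}
	sort.Strings(keys)

	lines := make([]string, 0, len(keys))
	for _, key := range keys {
		lines = append(lines, fmt.Sprintf("- %s: %d instances", key, objTypes[key]))
	}
	return lines
}

func main() {
	// Made-up counts standing in for the map returned by Inspect.
	objTypes := map[string]int{"Page": 1, "Font": 64, "Catalog": 1, "XObject": 14}
	for _, line := range formatObjectTypes(objTypes) {
		fmt.Println(line)
	}
}
```

Returning the lines instead of printing them directly keeps the ordering logic easy to check in a unit test.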

In lines 75-90, potentially malicious objects are identified as follows:

// Identify potentially risky content.
isMalicious := false
if count, has := objTypes["JavaScript"]; has {
	fmt.Printf("! Potentially malicious file - has %d Javascript objects\n", count)
	isMalicious = true
}
if count, has := objTypes["Flash"]; has {
	fmt.Printf("! Potentially malicious file - has %d Flash rich media objects\n", count)
	isMalicious = true
}
if count, has := objTypes["Video"]; has {
	fmt.Printf("! Potentially malicious file - has %d video objects\n", count)
	isMalicious = true
}
if !isMalicious {
	fmt.Printf("Most likely harmless - No javascript or rich media objects.\n")
}
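If more object types need to be flagged later, the three checks can be expressed as a table. The sketch below is one possible restructuring, not the code from the examples repository; the names riskyType, riskyTypes and riskWarnings are hypothetical, and the keys and message wording are copied from the snippet above.

```go
package main

import "fmt"

// riskyType pairs an object type key with the label used in its warning.
// This type is an illustration only and does not exist in pdf_inspect.go.
type riskyType struct {
	key   string
	label string
}

// riskyTypes lists the same three keys checked in the guide's snippet.
var riskyTypes = []riskyType{
	{"JavaScript", "Javascript"},
	{"Flash", "Flash rich media"},
	{"Video", "video"},
}

// riskWarnings returns one warning per risky object type present in
// objTypes, in table order. An empty result means none were found.
func riskWarnings(objTypes map[string]int) []string {
	var warnings []string
	for _, rt := range riskyTypes {
		if count, has := objTypes[rt.key]; has {
			warnings = append(warnings,
				fmt.Sprintf("! Potentially malicious file - has %d %s objects", count, rt.label))
		}
	}
	return warnings
}

func main() {
	// Made-up counts standing in for the map returned by Inspect.
	objTypes := map[string]int{"Catalog": 1, "JavaScript": 2}

	warnings := riskWarnings(objTypes)
	if len(warnings) == 0 {
		fmt.Println("Most likely harmless - No javascript or rich media objects.")
		return
	}
	for _, w := range warnings {
		fmt.Println(w)
	}
}
```

With this layout, flagging another object type means adding one entry to the table. Like the original checks, it only reports whether certain object types are present; it does not analyze what those objects do.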

Run the code

Run the code using the following command:

go run pdf_inspect.go input.pdf

Sample output

Input file: templates/boarding-pass/unipdf-boarding-pass.pdf
PDF Num Pages: 1
Object types:
- Catalog: 1 instances
- Font: 64 instances
- Outlines: 1 instances
- Page: 1 instances
- Pages: 1 instances
- XObject: 14 instances
Most likely harmless - No javascript or rich media objects.
