Inspect PDF Objects
This guide explains how to inspect the object types in a PDF file using UniPDF.
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the analysis folder in the unipdf-examples directory.
cd unipdf-examples/analysis
How it works
In the example code (pdf_inspect.go), the import section imports the necessary libraries. The init function, defined in lines 17-24, loads the metered license key to authenticate library requests. The main function, defined in lines 26-40, calls inspectPdf(inputPath) to inspect the provided PDF file.
The inspectPdf function is defined in lines 42-93. It starts by creating a new PdfReader from the provided input path. Line 49 gets the number of pages in the PDF file. Then, in line 56, the Inspect method of the PdfReader is used to get object type information. This method returns a map containing the frequency of each object type in the document. The keys of this map are sorted in lines 62-66, and the object types are then printed in lines 70-72 using:
    for _, key := range keys {
        fmt.Printf("- %s: %d instances\n", key, objTypes[key])
    }
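The sorting step mentioned above is not shown in the excerpt, so here is a minimal, standard-library-only sketch of how the keys of such a map could be sorted before printing. The helper name sortedKeys and the sample counts are hypothetical and are not taken from pdf_inspect.go; the real example may sort the keys differently.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys is a hypothetical helper (not part of UniPDF or the example file).
// It returns the keys of counts in ascending alphabetical order. Go randomizes
// map iteration order, so sorting keeps the printed report stable between runs.
func sortedKeys(counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for key := range counts {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Made-up counts, shaped like the map of object type frequencies
	// described in this guide.
	objTypes := map[string]int{"Page": 1, "Font": 64, "Catalog": 1}

	for _, key := range sortedKeys(objTypes) {
		fmt.Printf("- %s: %d instances\n", key, objTypes[key])
	}
}
```

Sorting first matters because ranging over a Go map directly yields keys in an unspecified order, which would make the report differ from run to run.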
In lines 75-90, potentially malicious objects are identified as follows:
    // Identify potentially risky content.
    isMalicious := false
    if count, has := objTypes["JavaScript"]; has {
        fmt.Printf("! Potentially malicious file - has %d Javascript objects\n", count)
        isMalicious = true
    }
    if count, has := objTypes["Flash"]; has {
        fmt.Printf("! Potentially malicious file - has %d Flash rich media objects\n", count)
        isMalicious = true
    }
    if count, has := objTypes["Video"]; has {
        fmt.Printf("! Potentially malicious file - has %d video objects\n", count)
        isMalicious = true
    }
    if !isMalicious {
        fmt.Printf("Most likely harmless - No javascript or rich media objects.\n")
    }
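The three checks above follow the same pattern, so they could also be driven by a list of type names. The following is a rough sketch of that idea using only the standard library; riskyTypes and riskFindings are hypothetical names, the message format is illustrative, and the sample map is made up rather than produced by UniPDF.

```go
package main

import "fmt"

// riskyTypes lists the object types this guide flags as potentially risky.
// A slice (not a map) is used so the checks run in a fixed order.
var riskyTypes = []string{"JavaScript", "Flash", "Video"}

// riskFindings is a hypothetical helper: it returns one message for each
// risky object type present in counts. An empty result means none were found.
func riskFindings(counts map[string]int) []string {
	var findings []string
	for _, name := range riskyTypes {
		if count, has := counts[name]; has {
			findings = append(findings, fmt.Sprintf("%s: %d objects", name, count))
		}
	}
	return findings
}

func main() {
	// Made-up counts standing in for the map of object type frequencies.
	counts := map[string]int{"Page": 1, "JavaScript": 2}

	findings := riskFindings(counts)
	if len(findings) == 0 {
		fmt.Println("Most likely harmless - no JavaScript or rich media objects.")
		return
	}
	for _, finding := range findings {
		fmt.Println("! Potentially malicious file -", finding)
	}
}
```

Returning the findings instead of printing them directly makes the check easier to test and lets a caller decide how to report or act on the result.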
Run the code
Run the code using the following command, replacing input.pdf with the path to the PDF file you want to inspect:
go run pdf_inspect.go input.pdf
Sample output
Input file: templates/boarding-pass/unipdf-boarding-pass.pdf
PDF Num Pages: 1
Object types:
- Catalog: 1 instances
- Font: 64 instances
- Outlines: 1 instances
- Page: 1 instances
- Pages: 1 instances
- XObject: 14 instances
Most likely harmless - No javascript or rich media objects.