Detect Scanned PDF Document
This guide explains how to determine whether a given PDF is likely a scanned document.
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the analysis folder in the unipdf-examples directory.
cd unipdf-examples/analysis
How it works
The import section in lines 9-14 imports the necessary UniPDF packages and other libraries.
The init function in lines 16-23 loads the metered license key before the program runs.
The main function is defined in lines 25-37. It first checks the number of command line arguments in lines 26-29. The for loop in lines 31-36 then iterates over each inputPath given on the command line and calls detectScanned(inputPath) to check whether the file is a scanned PDF document.
The detectScanned function in lines 39-64 takes the path to a file and determines whether the file is scanned. In line 46, the number of pages is obtained from the PdfReader using pdfReader.GetNumPages(). In line 51, the count of each object type in the document is obtained using pdfReader.Inspect(). Finally, line 57 checks the number of font objects. If that number is 0 or 1, the document is assumed to contain no text objects and is reported as scanned. Otherwise, it is reported as not scanned.
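The decision rule itself is small enough to isolate. The sketch below assumes the inspection result is a map from object type name to count and that fonts are counted under the key "Font"; both are assumptions about the result's shape, and isLikelyScanned is a hypothetical helper name.

```go
package main

import "fmt"

// isLikelyScanned applies the guide's rule to a map of object type -> count.
// The map shape and the "Font" key are assumptions about the inspection result.
func isLikelyScanned(objTypes map[string]int) bool {
	// A missing key reads as 0 in Go, so a document with no Font entry
	// is treated the same as one with zero fonts.
	return objTypes["Font"] < 2
}

func main() {
	// Hand-written counts standing in for real inspection results.
	samples := []map[string]int{
		{"Page": 1, "XObject": 1},
		{"Page": 3, "Font": 1},
		{"Page": 3, "Font": 4},
	}
	for _, s := range samples {
		fmt.Printf("fonts=%d scanned=%t\n", s["Font"], isLikelyScanned(s))
	}
}
```

This is a heuristic: a scanned document with an OCR text layer embeds fonts and would be reported as not scanned, while a vector drawing without text would be reported as scanned.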
Run the code
Run the code using the following command:
go run pdf_detect_scanned.go input.pdf
Sample output
sample.pdf (1 pages) - SCANNED!