Detect Scanned PDF Document
This guide explains how to determine whether a given PDF is likely a scanned document.
Before you begin
You should get your API key from your UniCloud account.
If this is your first time using UniPDF SDK, follow this guide to set up a local development environment.
Clone the project repository
In your terminal, clone the examples repository. It contains the Go code we will be using for this guide.
git clone https://github.com/unidoc/unipdf-examples.git
Navigate to the analysis folder in the cloned unipdf-examples repository.
How it works
The import section in lines 9-14 imports the necessary UniPDF packages and other libraries.
The init function in lines 16-23 loads the metered license key before the program runs.
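As a rough illustration of the license step, the sketch below reads an API key from an environment variable and fails early when it is missing. It uses only the standard library. The variable name UNIDOC_LICENSE_API_KEY and the helper readLicenseKey are assumptions made for this sketch; check the example source for the exact name and call it uses.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// licenseEnvVar is an assumed variable name for this sketch; the example
// program may read a different one.
const licenseEnvVar = "UNIDOC_LICENSE_API_KEY"

// readLicenseKey is a hypothetical helper, not part of UniPDF. It takes the
// lookup function as a parameter so it can be exercised without touching
// the real environment.
func readLicenseKey(lookup func(string) (string, bool)) (string, error) {
	key, ok := lookup(licenseEnvVar)
	if !ok || key == "" {
		return "", errors.New(licenseEnvVar + " is not set")
	}
	return key, nil
}

func main() {
	key, err := readLicenseKey(os.LookupEnv)
	if err != nil {
		fmt.Println("cannot load license:", err)
		return
	}
	// In the real program, the key would be handed to UniPDF's metered
	// license setup at this point, before any PDF is opened.
	fmt.Printf("found a license key of %d characters\n", len(key))
}
```

Doing this work in init means a missing key is reported before any PDF processing starts.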
The main function is defined next. It checks the number of command line arguments in lines 26-29. Then the for loop in lines 31-36 iterates through each inputPath provided in the command line arguments and checks whether the file is a scanned PDF document by calling the detectScanned function.
The detectScanned function in lines 39-64 takes the path to a file and determines whether the file is scanned. In line 46, the number of pages is obtained from pdfReader.GetNumPages(). Then the count of each object type is obtained using pdfReader.Inspect() in line 51. Finally, line 57 checks the number of font objects. If the document has 0 or 1 font objects, it is assumed to contain no text objects, which means it is likely scanned. Otherwise, the document is reported as not scanned.
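The font-count check can be sketched on its own, without opening a PDF. The snippet below is a minimal illustration using only the standard library. It assumes the inspection result is a map from object type name to count and that fonts are stored under the key "Font". The helper isLikelyScanned and the sample counts are hypothetical.

```go
package main

import "fmt"

// isLikelyScanned is a hypothetical helper, not part of UniPDF. It applies
// the heuristic described above to a map of object type names to counts.
// The "Font" key is an assumption of this sketch.
func isLikelyScanned(objTypes map[string]int) bool {
	// A missing key reads as zero in Go, so both 0 and 1 font objects
	// are treated as "no real text layer".
	return objTypes["Font"] < 2
}

func main() {
	// Made-up counts for illustration only.
	scanned := map[string]int{"Page": 1, "XObject": 1}
	digital := map[string]int{"Page": 3, "Font": 4}

	fmt.Println(isLikelyScanned(scanned)) // true
	fmt.Println(isLikelyScanned(digital)) // false
}
```

This is a heuristic, not a guarantee: a scanned file that has been run through OCR usually carries fonts for its hidden text layer and would be reported as not scanned.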
Run the code
Run the code using the following command:
go run pdf_detect_scanned.go input.pdf
Sample output for an input file named sample.pdf:
sample.pdf (1 pages) - SCANNED!