V4 Migration Guide

A new major version of UniPDF (v4) was released on May 11, 2025, and is now available here. This release brings significant improvements and new features, but also introduces breaking changes that may affect existing codebases.

This guide will help you migrate smoothly to UniPDF v4. It provides step-by-step instructions for updating your code, explains deprecated APIs and their replacements, and gives an overview of the key enhancements in this release.

New Features

1. Text Extraction Enhancements

In UniPDF v4, text extraction has been significantly improved to deliver more accurate results and to fix earlier bugs. The extractor package now provides the following extraction modes:
ExtractionModeLayout: This is the default extraction mode from UniPDF v3, renamed. It remains the default mode in v4.

ExtractionModePlain: This is a newly implemented mode. It follows the internal structure of the PDF content and extracts text in the order in which it appears. Because it bypasses line and paragraph structure, it makes no assumptions about how the page is organized. As a result, the predefined heuristics and constants that have caused extraction issues in many PDFs do not affect the extracted text.

ExtractionModeLayoutNoBreaks: This mode is similar to ExtractionModeLayout, but it extracts text without breaking horizontal lines. In this mode, the extractor assumes that the whole page contains a single column.

2. Grid Component

The Grid component serves a similar purpose to the existing Table component, but it requires the number of columns to be specified explicitly. Like an HTML table, it uses a table → rows → cells hierarchy. A usage example and a step-by-step explanation of this component can be found here.

Deprecated Components

Paragraph Component Deprecation

The Paragraph component is deprecated and will be removed in a future release. Use the more generic and feature-rich StyledParagraph component instead.

Other Improvements

SVG Gradient Handling

The new version adds support for SVG definitions and linear gradients.

API Cleanup

The PdfObject WriteString method has been replaced by the Write method, which returns []byte instead of string.
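Some call sites may still expect the string that WriteString used to return. In that case, a small adapter can keep the change local during migration. This is a minimal sketch that assumes only what is stated above: v4 objects expose a Write() method that returns []byte. The names byteWriter, writeString and fakeName are hypothetical and exist only for this example. Calling string(obj.Write()) inline works just as well.

```go
package main

import "fmt"

// byteWriter matches the shape of the v4 Write method described above
// (Write() []byte). It is a hypothetical interface declared for illustration.
type byteWriter interface {
	Write() []byte
}

// writeString is a hypothetical helper for call sites that still expect the
// string that the v3 WriteString method returned.
func writeString(obj byteWriter) string {
	return string(obj.Write())
}

// fakeName is a stand-in for a PDF name object so the sketch runs without
// UniPDF; it is not a UniPDF type.
type fakeName string

// Write serializes the fake name with a leading slash, like a PDF name.
func (n fakeName) Write() []byte {
	return []byte("/" + string(n))
}

func main() {
	fmt.Println(writeString(fakeName("Type"))) // /Type
}
```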

Migration Guide

This section explains in detail how to migrate existing code to the new version.

1. Installation

You can install the v4.x.x version of UniPDF using the following command.

go get github.com/unidoc/unipdf/v4
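Go modules encode the major version in the import path. Switching to v4 therefore also means changing every github.com/unidoc/unipdf/v3/... import to github.com/unidoc/unipdf/v4/.... The program below is a minimal sketch of a hypothetical helper, rewriteImports, that makes this change in a single source file using only the standard library. It is not a UniPDF tool, and a project-wide search and replace achieves the same result.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

const (
	oldModule = "github.com/unidoc/unipdf/v3"
	newModule = "github.com/unidoc/unipdf/v4"
)

// rewriteImports parses one Go source file and switches every import of the
// UniPDF v3 module (or one of its packages) to the v4 module path. Other
// imports are left untouched, and the result is printed in gofmt style with
// comments preserved.
func rewriteImports(src []byte) ([]byte, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if path == oldModule || strings.HasPrefix(path, oldModule+"/") {
			imp.Path.Value = strconv.Quote(newModule + strings.TrimPrefix(path, oldModule))
		}
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	src := []byte(`package demo

import (
	"fmt"

	"github.com/unidoc/unipdf/v3/creator"
	"github.com/unidoc/unipdf/v3/extractor"
)
`)
	out, err := rewriteImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

After updating the imports, running go mod tidy drops the no-longer-used v3 requirement from go.mod.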

2. Breaking Changes and Deprecated APIs

  1. The WriteString method of PdfObject has been replaced by Write, so any code that uses the older method should be updated accordingly.
Example:

When digitally signing a PDF file using an external signing service, the older usage was:
sigBytes := make([]byte, 8192)
copy(sigBytes, signatureData)

sig := core.MakeHexString(string(sigBytes)).WriteString()
copy(pdfData[byteRange[1]:byteRange[2]], []byte(sig))

In v4, it should be updated to:

sigBytes := make([]byte, 8192)
copy(sigBytes, signatureData)

// Write returns []byte in v4, so the result can be copied directly.
sig := core.MakeHexString(string(sigBytes)).Write()
copy(pdfData[byteRange[1]:byteRange[2]], sig)
Other similar patterns and usages throughout the codebase should be reviewed and updated in the same way.

  2. NewParagraph is deprecated and will be removed in the future, so calls to it should be replaced by NewStyledParagraph.

Example:

p := c.NewParagraph("Hello World")

It should be replaced by:

p := c.NewStyledParagraph()
p.SetText("Hello World")
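The digital-signing example in step 1 writes a hex-encoded signature into the gap reserved by the signature's byte range. The sketch below reproduces that idea with the Go standard library only, so it runs without UniPDF. The helper names hexContents and splice are hypothetical. The layout it assumes is a zero-padded signature, hex-encoded and wrapped in < and > as the PDF format writes hexadecimal strings. This layout is an assumption about what the reserved gap holds, not a description of how UniPDF implements Write.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// hexContents zero-pads the raw signature to reserved bytes, hex-encodes it,
// and wraps it in the angle brackets used for PDF hexadecimal strings.
func hexContents(signature []byte, reserved int) ([]byte, error) {
	if len(signature) > reserved {
		return nil, errors.New("signature larger than reserved space")
	}
	padded := make([]byte, reserved)
	copy(padded, signature)
	out := make([]byte, 0, 2*reserved+2)
	out = append(out, '<')
	out = append(out, hex.EncodeToString(padded)...)
	out = append(out, '>')
	return out, nil
}

// splice copies contents into pdf[start:end], refusing to overflow the gap.
func splice(pdf []byte, start, end int, contents []byte) error {
	if start < 0 || end > len(pdf) || start > end {
		return errors.New("invalid byte range")
	}
	if len(contents) > end-start {
		return errors.New("contents do not fit in the reserved gap")
	}
	copy(pdf[start:end], contents)
	return nil
}

func main() {
	// A fake document with a 10-byte gap reserved between offsets 4 and 14.
	pdf := []byte("AAAA<00000000>BBBB")
	contents, err := hexContents([]byte{0xAB, 0xCD}, 4)
	if err != nil {
		panic(err)
	}
	if err := splice(pdf, 4, 14, contents); err != nil {
		panic(err)
	}
	fmt.Println(string(pdf)) // AAAA<abcd0000>BBBB
}
```

The explicit length check in splice matters because copy silently truncates. A signature larger than the reserved gap would otherwise produce a corrupt document instead of an error.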
Grid Component

A new component called Grid is introduced in v4, and it is recommended over the old Table component. It follows a table → rows → cells structure and is easy to use. Because it simplifies row and column span calculations, it also improves performance. A detailed step-by-step usage guide is documented here, but for a quick glance, here is a simple example:

// Create a grid with n columns, in this case 2.
grid := c.NewGrid(2)
// Create a new row from the grid.
row := grid.NewRow()
// Create a new cell from the row.
cell, err := row.NewCell()
if err != nil {
	log.Fatal(err)
}
// Create a styled paragraph and set its properties.
p := c.NewStyledParagraph()
p.SetText("Cell content")
p.SetMargins(5, 5, 5, 5)
p.SetFontSize(14)
// Set the paragraph as the content of the cell.
cell.SetContent(p)
// Repeat as needed: create more rows and cells, and set their contents.
Extraction Modes

The way to enable simple extraction has changed. Code that creates a new extractor with options for the simpler extraction process should be updated as follows.
Old usage:

ex, err := extractor.NewWithOptions(page, &extractor.Options{
        UseSimplerExtractionProcess: true,
})

New usage:

ex, err := extractor.NewWithOptions(page, &extractor.Options{
        ExtractionMode: extractor.ExtractionModePlain,
})

Note that the new mode is not identical to the old one; it differs in how it extracts content. It extracts text directly from the internal PDF content streams, without building tables or text marks.

If PDF content is still not extracted correctly, there is one more mode to try: ExtractionModeLayoutNoBreaks. It uses the same underlying structure as the default extraction mode (ExtractionModeLayout) but does not break lines. It assumes text lines span from the left side of the page to the right side, so the text is extracted as if the page had a single column.
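To make the difference between ExtractionModePlain and ExtractionModeLayoutNoBreaks concrete, the toy program below models a two-column page whose content stream draws the right column first. It is a conceptual illustration only. The names mark, streamOrder and singleColumn are hypothetical, the positions are invented, and neither function is UniPDF's extraction algorithm. streamOrder keeps drawing order, which is how ExtractionModePlain is described above. singleColumn reads each full-width line left to right, following the single-column assumption of ExtractionModeLayoutNoBreaks.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mark is a toy stand-in for a piece of text drawn on a page (not a UniPDF
// type). y grows downward, so a smaller y means higher on the page.
type mark struct {
	text string
	x, y float64
}

// streamOrder joins marks in the order they were drawn.
func streamOrder(marks []mark) string {
	parts := make([]string, len(marks))
	for i, m := range marks {
		parts[i] = m.text
	}
	return strings.Join(parts, " ")
}

// singleColumn groups marks that share a y position into one full-width line
// and reads each line left to right. It sorts a copy, leaving marks unchanged.
func singleColumn(marks []mark) string {
	sorted := append([]mark(nil), marks...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].y != sorted[j].y {
			return sorted[i].y < sorted[j].y
		}
		return sorted[i].x < sorted[j].x
	})
	var lines, current []string
	for i, m := range sorted {
		if i > 0 && m.y != sorted[i-1].y {
			lines = append(lines, strings.Join(current, " "))
			current = nil
		}
		current = append(current, m.text)
	}
	if len(current) > 0 {
		lines = append(lines, strings.Join(current, " "))
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Two-column page whose content stream draws the right column first.
	marks := []mark{
		{"right-1", 300, 0}, {"right-2", 300, 12},
		{"left-1", 0, 0}, {"left-2", 0, 12},
	}
	fmt.Println(streamOrder(marks))  // right-1 right-2 left-1 left-2
	fmt.Println(singleColumn(marks)) // "left-1 right-1", then "left-2 right-2" on the next line
}
```

Real extractors group text into lines with a tolerance rather than the exact y comparison used in this toy.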

Got any Questions?

We're here to help you.