v25.7

Improved usability of Text Extractor

  • Class TextExtractor: is static and does not require the use of a constructor.
  • Class TextExtractor: improved behavior in evaluation mode; no exception is thrown for documents with 4+ pages.
  • Class TextExtractor: fixed issues in method Process.
  • Class PdfExtractorOptions: removed.
  • Class TextExtractorOptions: refactored.
  • Enum TextFormattingMode: renamed and improved.

Example Usage:

// The example demonstrates how to extract the text content of a PDF document.
// Create a TextExtractorOptions object to set instructions
var options = new TextExtractorOptions(TextFormattingMode.Pure);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = TextExtractor.Process(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();

Improved usability of Html Converter

  • Class HtmlConverter: is static and does not require the use of a constructor.
  • Class HtmlConverter: fixed issues in method Process.
  • Class HtmlConverter: removed IDisposable logic.
  • Class PdfConverterOptions: removed.

Example Usage:

// The example demonstrates how to convert a PDF document to HTML.
// Create a PdfToHtmlOptions object that sets the output data type to a file with embedded resources
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithEmbeddedResources);
// Add input file path
options.AddInput(new FileDataSource("path_to_input.pdf"));
// Set output file path
options.AddOutput(new FileDataSource("path_to_output.html"));
// Perform the process
HtmlConverter.Process(options);

// The example demonstrates how to convert an HTML document to PDF.
// Create an HtmlToPdfOptions object
var options = new HtmlToPdfOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_input.html"));
// Set output file path
options.AddOutput(new FileDataSource("path_to_output.pdf"));
// Perform the process
HtmlConverter.Process(options);

Improved usability of Image Extractor

  • Class ImageExtractor: is static and does not require the use of a constructor.

Example Usage:

// The example demonstrates how to extract images from a PDF document.
// Create an ImageExtractorOptions object to set instructions
var options = new ImageExtractorOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output directory path
options.AddOutput(new DirectoryDataSource("path_to_results_directory"));
// Perform the process
var results = ImageExtractor.Process(options);
// Get the path to the first extracted image
var imageExtracted = results.ResultCollection[0].ToFile();

Minor Fixes

  • Internal fixes.
  • Fixed examples and hints of Tiff Converter.
  • Minimized page optimization duration.
  • Fixed: incorrect output image from PDF to PNG conversion.
  • Fixed: Chinese characters not displaying properly during PDF to PNG conversion.
  • Improved: performance during PDF to HTML conversion.