v26.2

Improved Feature Extract Properties for PDF Extractor

  • Class Extractor: can now extract additional metadata from PDF documents.
  • Class PdfProperties: added the properties FileName, Created, Modified, Application, and PdfProducer.

Example Usage:

The example demonstrates how to extract properties (FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages) from a PDF file.

// Create ExtractPropertiesOptions object to set input file
var options = new ExtractPropertiesOptions("path_to_your_pdf_file.pdf");
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var filename = pdfProperties.FileName;
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

Example Usage:

The example demonstrates how to extract properties (Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages) from a PDF stream.

// Open the input stream; the using declaration disposes it when it goes out of scope
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractPropertiesOptions object to set input stream
var options = new ExtractPropertiesOptions(stream);
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

Improved Feature Extract Text for PDF Extractor

  • Text extraction is now simpler: you pass the input directly to ExtractTextOptions and get the result as the return value of Extract.
  • Class ExtractTextOptions: implements IHaveInput. Accepts exactly one input. Allowed DataTypes: File and Stream.
  • Method Extract(ExtractTextOptions options): returns a string with the extracted text.
  • Object ResultContainer: removed from ExtractTextOptions.
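
Because Extract now returns the text as a string, the result can be passed straight to other APIs. The following is a minimal sketch, not part of the official examples: it extracts text from every PDF in a folder and saves each result as a .txt file. The folder name is a hypothetical placeholder, and the using directive for the library namespace is left out because these notes do not state it.

```csharp
using System.IO;

// Hypothetical input folder, used for illustration only.
var inputFolder = "path_to_your_pdf_folder";

foreach (var pdfPath in Directory.EnumerateFiles(inputFolder, "*.pdf"))
{
    // Extract returns the extracted text as a string (see the notes above).
    var text = PdfExtractor.Extract(new ExtractTextOptions(pdfPath));

    // Save the text next to the source file, for example report.pdf -> report.txt.
    File.WriteAllText(Path.ChangeExtension(pdfPath, ".txt"), text);
}
```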

Example Usage:

The example demonstrates how to extract text content from a PDF file.

// Create ExtractTextOptions object to set input file path
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf");
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Example Usage:

The example demonstrates how to extract text content from a PDF stream.

// Open the input stream; the using declaration disposes it when it goes out of scope
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractTextOptions object to set input stream
var options = new ExtractTextOptions(stream);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Example Usage:

The example demonstrates how to extract text content from a PDF document with a specified TextFormattingMode.

// Create ExtractTextOptions object to set input file path and TextFormattingMode
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Example Usage:

The example demonstrates how to extract text from a PDF file in a single statement.

// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure));

Enhancements

Fixed Bugs