v26.2

Improved property extraction in the PDF Extractor

  • Class Extractor: can now extract new metadata from PDF documents.
  • Class PdfProperties: new properties added: FileName, Created, Modified, Application, PdfProducer.

Example usage:

This example demonstrates how to extract properties (FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, number of pages) from a PDF file.

// Create ExtractPropertiesOptions object to set input file
var options = new ExtractPropertiesOptions("path_to_your_pdf_file.pdf");
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var filename = pdfProperties.FileName;
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;
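
As a follow-up to the example above, here is a minimal sketch of printing the extracted properties as a short report. It uses only the calls shown in these notes plus Console.WriteLine. The notes do not name the library's namespace or the property types, so both are assumptions: the namespace import is left to the reader, and string interpolation is used so the code does not rely on a particular type.

```csharp
using System;
// Assumption: the namespace that provides PdfExtractor and
// ExtractPropertiesOptions is also imported; these notes do not name it.

var options = new ExtractPropertiesOptions("path_to_your_pdf_file.pdf");
var pdfProperties = PdfExtractor.Extract(options);

// String interpolation formats any value, so this does not depend on
// the exact property types (not stated in the notes).
Console.WriteLine($"File name:   {pdfProperties.FileName}");
Console.WriteLine($"Title:       {pdfProperties.Title}");
Console.WriteLine($"Author:      {pdfProperties.Author}");
Console.WriteLine($"Created:     {pdfProperties.Created}");
Console.WriteLine($"Modified:    {pdfProperties.Modified}");
Console.WriteLine($"Application: {pdfProperties.Application}");
Console.WriteLine($"Producer:    {pdfProperties.PdfProducer}");
Console.WriteLine($"Pages:       {pdfProperties.NumberOfPages}");
```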

Example usage:

This example demonstrates how to extract properties (Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, number of pages) from a PDF stream.

// Create ExtractPropertiesOptions object to set input stream
using var stream = File.OpenRead("path_to_your_pdf_file.pdf"); // disposed automatically at end of scope
var options = new ExtractPropertiesOptions(stream);
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

Improved text extraction in the PDF Extractor

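  • Simpler text extraction and extraction options: specifying the input and getting the result is now easier.
  • Class ExtractTextOptions: implements IHaveInput. Accepts exactly one input. Supported input types: File and Stream.
  • Method Extract(ExtractTextOptions options): returns the result as a string.
  • Object ResultContainer: removed from ExtractTextOptions.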

Example usage:

This example demonstrates how to extract the text content of a PDF file.

// Create ExtractTextOptions object to set input file path
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf");
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Example usage:

This example demonstrates how to extract the text content of a PDF stream.

// Create ExtractTextOptions object to set input stream
using var stream = File.OpenRead("path_to_your_pdf_file.pdf"); // disposed automatically at end of scope
var options = new ExtractTextOptions(stream);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Example usage:

This example demonstrates how to extract the text content of a PDF document with a specified TextFormattingMode.

// Create ExtractTextOptions object to set input file path and TextFormattingMode
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Example usage:

This example demonstrates the most concise way to extract text from a PDF file.

// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure));
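
Because Extract now returns the text directly as a string, the result can be passed straight to standard .NET file APIs. The following is a minimal sketch of saving the extracted text next to the input file. The variable names and output location are illustrative, and the library's namespace is not named in these notes, so its import is assumed.

```csharp
using System.IO;
// Assumption: the namespace that provides PdfExtractor, ExtractTextOptions
// and TextFormattingMode is also imported; these notes do not name it.

// Placeholder input path, matching the one used in the examples above.
var inputPath = "path_to_your_pdf_file.pdf";

// Extract returns the extracted text as a string.
string text = PdfExtractor.Extract(new ExtractTextOptions(inputPath, TextFormattingMode.Pure));

// Write the text beside the input file, swapping the extension to .txt.
var outputPath = Path.ChangeExtension(inputPath, ".txt");
File.WriteAllText(outputPath, text);
```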

Bugs fixed

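  • Converting a Jpeg2000 file to PDF failed
  • PDF merge issue
  • PDF page scaling produced blank output
  • PDF to HTML: highlight color was visible but the text was missing
  • PDF to HTML: the generated HTML output was incorrect
  • PDF to HTML: vertical text on the left side was missing
  • PDF to HTML: text in the page header disappeared
  • PDF to HTML: transparent annotation text was not visible
  • PDF to PNG: some Chinese characters failed to render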
February 11, 2026