v26.2

Improved property extraction in PDF Extractor

Class Extractor: can now extract new metadata from PDF documents.
Class PdfProperties: added properties: FileName, Created, Modified, Application, PdfProducer.

Usage example:

This example shows how to extract properties (FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, number of pages) from a PDF file.

// Create ExtractPropertiesOptions object to set input file
var options = new ExtractPropertiesOptions("path_to_your_pdf_file.pdf");
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var filename = pdfProperties.FileName;
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

Usage example:

This example shows how to extract properties (Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, number of pages) from a PDF stream.

// Open the input stream; `using` disposes it when it goes out of scope
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractPropertiesOptions object to set input stream
var options = new ExtractPropertiesOptions(stream);
// Perform the process and get Properties
var pdfProperties = PdfExtractor.Extract(options);
var title = pdfProperties.Title;
var author = pdfProperties.Author;
var subject = pdfProperties.Subject;
var keywords = pdfProperties.Keywords;
var created = pdfProperties.Created;
var modified = pdfProperties.Modified;
var application = pdfProperties.Application;
var pdfProducer = pdfProperties.PdfProducer;
var numberOfPages = pdfProperties.NumberOfPages;

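Improved text extraction in PDF Extractor

Text extraction and its options are now simpler to work with: it is easier to specify the input and to get the result.
Class ExtractTextOptions: implements IHaveInput. Accepts exactly one input. Allowed input types: File and Stream.
Method Extract(ExtractTextOptions options): returns the extracted text as a string.
Object ResultContainer: removed from ExtractTextOptions.

Usage example:

This example shows how to extract the text content of a PDF file.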

// Create ExtractTextOptions object to set input file path
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf");
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

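Usage example:

This example shows how to extract the text content of a PDF stream.

// Open the input stream; `using` disposes it when it goes out of scope
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");
// Create ExtractTextOptions object to set input stream
var options = new ExtractTextOptions(stream);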
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Usage example:

This example shows how to extract the text content of a PDF document with a specific TextFormattingMode.

// Create ExtractTextOptions object to set input file path and TextFormattingMode
var options = new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure);
// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(options);

Usage example:

This example shows the most concise way to extract text from a PDF file.

// Perform the process and get the extracted text
var textExtracted = PdfExtractor.Extract(new ExtractTextOptions("path_to_your_pdf_file.pdf", TextFormattingMode.Pure));
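
Because Extract now returns the extracted text directly as a string, the result can be handed straight to standard .NET APIs. The following is a minimal sketch rather than an official sample. It assumes the library's namespace is already imported (these notes do not name it), and the output path extracted_text.txt is a hypothetical placeholder.

```csharp
using System;
using System.IO;
// Assumption: the library's own using directive goes here; these notes do not name its namespace.

// Open the input as a stream; `using` disposes it when the scope ends.
using var stream = File.OpenRead("path_to_your_pdf_file.pdf");

// Per the notes above, Extract(ExtractTextOptions) returns the text as a string.
string text = PdfExtractor.Extract(new ExtractTextOptions(stream));

// Hypothetical output path, chosen for illustration only.
File.WriteAllText("extracted_text.txt", text);
Console.WriteLine($"Extracted {text.Length} characters.");
```

No ResultContainer is involved: the string returned by Extract is the whole result.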

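Bug fixes

Fixed failing conversion of Jpeg2000 files to PDF
Fixed an issue with merging PDFs
Fixed blank output when resizing PDF pages
Fixed PDF to HTML conversion where the highlight color was rendered but the text was missing
Fixed PDF to HTML conversion producing incorrect HTML output
Fixed PDF to HTML conversion dropping vertical text on the left side
Fixed PDF to HTML conversion where text in the header disappeared
Fixed PDF to HTML conversion where transparent annotation text was not displayed
Fixed PDF to PNG conversion where certain Chinese characters were rendered incorrectly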