How to reduce the size of a merged PDF/A-1b file with PDFBox or another Java library

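Input: A list of (e.g. 14) PDF/A-1b files with embedded fonts.

Processing: A simple merge with Apache PDFBox.

Result: One PDF/A-1b file that is too large (its size is almost the sum of the sizes of all source files).

Question: Is there a way to reduce the file size of the resulting PDF?

Idea: Remove the redundant embedded fonts. But how? And is that the right approach?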

Unfortunately, the following code does not do the job, but it does highlight the obvious problem.

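    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageTree;
    import org.apache.pdfbox.pdmodel.font.PDFont;

    // PDFBox 2.x API; lists the fonts of every page and tries (unsuccessfully) to drop them.
    try (PDDocument document = PDDocument.load(new File("E:/tmp/16189_ZU_20181121195111_5544_2008-12-31_Standardauswertung.pdf"))) {
        List<COSName> collectedFonts = new ArrayList<>();
        PDPageTree pages = document.getDocumentCatalog().getPages();
        int pageNr = 0;
        for (PDPage page : pages) {
            pageNr++;
            Iterable<COSName> names = page.getResources().getFontNames();
            System.out.println("Page " + pageNr);
            for (COSName name : names) {
                collectedFonts.add(name);
                System.out.print("\t" + name + " - ");
                PDFont font = page.getResources().getFont(name);
                System.out.println(font + ", embedded: " + font.isEmbedded());
                // Attempted removal: neither call touches the embedded font data,
                // so the saved file does not get smaller.
                page.getCOSObject().removeItem(COSName.F);
                page.getResources().getCOSObject().removeItem(name);
            }
        }
        document.save("E:/tmp/output.pdf");
    }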

The code produces the following output:

    Page 1
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 2
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 3
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 4
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 5
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 6
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 7
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 8
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 9
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 10
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 11
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F33} - PDTrueTypeFont ArialMT-BoldItalic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 12
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 13
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true
    Page 14
        COSName{F23} - PDTrueTypeFont ArialMT-Bold, embedded: true
        COSName{F25} - PDTrueTypeFont ArialMT-Italic, embedded: true
        COSName{F27} - PDTrueTypeFont ArialMT-Regular, embedded: true

Any help is appreciated…

Answer:

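While debugging the file I realised that the font file of the same font is referenced multiple times, once per merged source document. Replacing the font file item in each font descriptor dictionary with the item already seen for that font name makes all of them point to a single font file, so the duplicates are no longer referenced and the saved file shrinks. With this I was able to shrink a 30 MB file to around 6 MB.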

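    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSDictionary;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    // PDFBox 2.x API; in 3.x use Loader.loadPDF(file) instead of PDDocument.load(file).
    File file = new File("test.pdf");
    try (PDDocument doc = PDDocument.load(file)) {
        // First font file item seen for each font name
        Map<String, COSBase> fontFileCache = new HashMap<>();
        for (int pageNumber = 0; pageNumber < doc.getNumberOfPages(); pageNumber++) {
            final PDPage page = doc.getPage(pageNumber);
            if (page.getResources() == null) {
                continue; // page without resources
            }
            COSBase fonts = page.getResources().getCOSObject().getDictionaryObject(COSName.FONT);
            if (!(fonts instanceof COSDictionary)) {
                continue; // page without font resources
            }
            COSDictionary pageDictionary = (COSDictionary) fonts;
            for (COSName currentFont : pageDictionary.keySet()) {
                COSBase font = pageDictionary.getDictionaryObject(currentFont);
                if (!(font instanceof COSDictionary)) {
                    continue;
                }
                COSDictionary fontDictionary = (COSDictionary) font;
                for (COSName actualFont : fontDictionary.keySet()) {
                    COSBase actualFontDictionaryObject = fontDictionary.getDictionaryObject(actualFont);
                    if (actualFontDictionaryObject instanceof COSDictionary) {
                        // Sub-dictionary of the font, e.g. its font descriptor
                        COSDictionary fontFile = (COSDictionary) actualFontDictionaryObject;
                        if (fontFile.getItem(COSName.FONT_NAME) instanceof COSName
                                && fontFile.containsKey(COSName.FONT_FILE2)) {
                            COSName fontName = (COSName) fontFile.getItem(COSName.FONT_NAME);
                            // Remember the first TrueType font file per name and reuse it afterwards
                            fontFileCache.computeIfAbsent(fontName.getName(), key -> fontFile.getItem(COSName.FONT_FILE2));
                            fontFile.setItem(COSName.FONT_FILE2, fontFileCache.get(fontName.getName()));
                        }
                    }
                }
            }
        }
        doc.save(new File("test_compressed.pdf"));
    }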

Maybe this is not the most elegant way, but it works and preserves PDF/A-1b compliance.
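The cache above is keyed by font name, which assumes that two fonts with the same name always carry identical font data. A possible refinement, not part of the original answer, is to key the cache on a hash of the font bytes instead. The sketch below shows only that idea with plain JDK types so it runs without PDFBox. `FontFileInterner` and its methods are hypothetical names, and feeding it the actual bytes of the embedded font files is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps one canonical byte[] per distinct content. */
final class FontFileInterner {

    // Maps the hex SHA-256 digest to the first array seen with that content.
    private final Map<String, byte[]> canonicalByDigest = new HashMap<>();

    /** Returns the first-seen array whose content equals the given data. */
    byte[] intern(byte[] data) {
        return canonicalByDigest.computeIfAbsent(sha256Hex(data), digest -> data);
    }

    /** Number of distinct contents seen so far. */
    int distinctCount() {
        return canonicalByDigest.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FontFileInterner interner = new FontFileInterner();
        // Placeholder bytes standing in for embedded font files
        byte[] bold = "bold-font-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] boldCopy = "bold-font-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] italic = "italic-font-bytes".getBytes(StandardCharsets.UTF_8);
        interner.intern(bold);
        interner.intern(boldCopy); // same content, so no new entry is added
        interner.intern(italic);
        System.out.println("distinct font files: " + interner.distinctCount());
    }
}
```

Applied to the loop above, the digest would take the place of `fontName.getName()` as the cache key, so a font file would be replaced only when its bytes match one already seen.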

