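Now, how do I manipulate the content and write it back to the PDF? Do I have to construct a whole new PDF document and copy all the content over (in manipulated form), or can I somehow manipulate the read PDF data directly?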
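Essentially, you are looking for a class that does not merely parse a PDF content stream and signal the instructions in it, as PdfCanvasProcessor does (the PdfDocumentContentParser you use is only a very thin wrapper around PdfCanvasProcessor), but that also re-creates the content stream from the instructions you forward back to it.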
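A generic content stream editor class
For iText 5.5.x, a proof of concept for such a content stream editor class can be found in this answer https://stackoverflow.com/a/35915789/1729265 (the Java version is further down in the answer text).
This is a port of that proof of concept to iText 7: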
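// Imports assume the iText 7.0.x package layout; some classes (e.g. PdfException) may live elsewhere in later releases.
import java.io.IOException;
import java.util.List;
import java.util.Set;

import com.itextpdf.kernel.PdfException;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfLiteral;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfOutputStream;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfResources;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.IContentOperator;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

public class PdfCanvasEditor extends PdfCanvasProcessor
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editPage(PdfDocument pdfDocument, int pageNumber) throws IOException
    {
        // Both a reader and a writer are required, i.e. the document must be in stamping mode.
        if ((pdfDocument.getReader() == null) || (pdfDocument.getWriter() == null))
        {
            throw new PdfException("PdfDocument must be opened in stamping mode.");
        }
        PdfPage page = pdfDocument.getPage(pageNumber);
        PdfResources pdfResources = page.getResources();
        // Write the edited instructions into a fresh stream, then make it the page content.
        PdfCanvas pdfCanvas = new PdfCanvas(new PdfStream(), pdfResources, pdfDocument);
        editContent(page.getContentBytes(), pdfResources, pdfCanvas);
        page.put(PdfName.Contents, pdfCanvas.getContentStream());
    }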

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editContent(byte[] contentBytes, PdfResources resources, PdfCanvas canvas)
    {
        this.canvas = canvas;
        processContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * <p>
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     * </p>
     * <p>
     * Override this method to achieve some fancy editing effect.
     * </p>
     */
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        PdfOutputStream pdfOutputStream = canvas.getContentStream().getOutputStream();
        int index = 0;
        // The operands list ends with the operator itself, so writing the list writes the whole instruction.
        for (PdfObject object : operands)
        {
            pdfOutputStream.write(object);
            if (operands.size() > ++index)
                pdfOutputStream.writeSpace();
            else
                pdfOutputStream.writeNewLine();
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to
    //
    public PdfCanvasEditor()
    {
        super(new DummyEventListener());
    }

    //
    // Overrides of PdfCanvasProcessor methods
    //
    @Override
    public IContentOperator registerContentOperator(String operatorString, IContentOperator operator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(operator);
        IContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
        // Hide the wrapper from callers: return the operator they originally registered.
        return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    //
    // member holding the output canvas
    //
    protected PdfCanvas canvas = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper implements IContentOperator
    {
        public IContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(IContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        @Override
        public void invoke(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            // "Do" is not passed to its original operator, so xobjects are not descended into;
            // the instruction itself is still forwarded to write().
            if (originalOperator != null && !"Do".equals(operator.toString()))
            {
                originalOperator.invoke(processor, operator, operands);
            }
            write(processor, operator, operands);
        }

        private IContentOperator originalOperator = null;
    }

    //
    // A dummy event listener to give to the underlying canvas processor to feed events to
    //
    static class DummyEventListener implements IEventListener
    {
        @Override
        public void eventOccurred(IEventData data, EventType type)
        { }

        @Override
        public Set<EventType> getSupportedEvents()
        {
            // null means that all event types are supported
            return null;
        }
    }
}
(PdfCanvasEditor.java https://github.com/mkl-public/testarea-itext7/blob/master/src/main/java/mkl/testarea/itext7/content/PdfCanvasEditor.java#L39)
The explanations from the iText 5 answer https://stackoverflow.com/a/35915789/1729265 still apply; the parsing framework has not changed much from iText 5.5.x to iText 7.0.x.
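To make the wrapping idea behind registerContentOperator easier to see in isolation, here is a minimal stand-alone sketch that uses no iText types at all. The names OperatorWrappingSketch, Operator, register, process, stateLog, and output are hypothetical and exist only for this illustration. Also, unlike the iText API, the sketch passes the operator name separately from its operands.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for the editor's operator wrapping; not iText code. */
public class OperatorWrappingSketch {
    /** Hypothetical miniature of a content operator. */
    interface Operator {
        void invoke(String operator, List<String> operands);
    }

    private final Map<String, Operator> operators = new HashMap<>();
    final List<String> stateLog = new ArrayList<>();   // what the original operators saw
    final StringBuilder output = new StringBuilder();  // the re-created content stream

    /** Registers an operator wrapped so that each invocation is also forwarded to write(). */
    void register(String name, Operator original) {
        operators.put(name, (operator, operands) -> {
            if (original != null) {
                original.invoke(operator, operands);   // keep the state tracking intact
            }
            write(operator, operands);                 // then re-emit the instruction
        });
    }

    /** Default behaviour: copy the instruction unchanged. Override to edit. */
    protected void write(String operator, List<String> operands) {
        for (String operand : operands) {
            output.append(operand).append(' ');
        }
        output.append(operator).append('\n');
    }

    /** Dispatches one instruction; operators nobody registered are copied as they are. */
    void process(String operator, List<String> operands) {
        operators.getOrDefault(operator, this::write).invoke(operator, operands);
    }
}
```

The point of the design is that the original operators still run, so the processor's graphics state stays current, while every instruction passes through a single overridable write hook.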
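Usage examples
Unfortunately, you described only in very vague terms how you want to change the content. Therefore, I simply ported some iText 5 samples that use the original iText 5 content stream editor class:
Watermark removal
These are ports of the use cases from this answer https://stackoverflow.com/a/38572474/1729265.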
testRemoveBoldMTTextDocument
This example drops all text written in a font whose name ends with "BoldMT":
try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBoldMTText.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();
            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                // Drop the instruction if the current font name ends with "BoldMT".
                if (getGraphicsState().getFont().getFontProgram().getFontNames().getFontName().endsWith("BoldMT"))
                    return;
            }
            super.write(processor, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}
(EditPageContent.java https://github.com/mkl-public/testarea-itext7/blob/master/src/test/java/mkl/testarea/itext7/content/EditPageContent.java#L61 test method testRemoveBoldMTTextDocument)
testRemoveBigTextDocument
This example drops all text written with a large font size:
try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBigText.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();
            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                // Drop the instruction if the current font size exceeds 100.
                if (getGraphicsState().getFontSize() > 100)
                    return;
            }
            super.write(processor, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}
(EditPageContent.java https://github.com/mkl-public/testarea-itext7/blob/master/src/test/java/mkl/testarea/itext7/content/EditPageContent.java#L114 test method testRemoveBigTextDocument)
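Both removal examples follow the same pattern: track a bit of state, and skip a text showing instruction when that state meets a condition. The following stand-alone sketch models just that pattern without iText. The class TextRemovalSketch and the method removeBigText are hypothetical names, and the "graphics state" is reduced to the font size taken from the Tf instruction, which is a strong simplification of what PdfCanvasProcessor tracks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of the "drop text under some condition" pattern; not iText code. */
public class TextRemovalSketch {
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

    /**
     * Each instruction is a list of tokens whose last element is the operator,
     * mirroring the operands list passed to write(). Returns the instructions to keep.
     */
    static List<List<String>> removeBigText(List<List<String>> instructions, float maxFontSize) {
        List<List<String>> kept = new ArrayList<>();
        float fontSize = 0; // minimal "graphics state": only the current font size
        for (List<String> instruction : instructions) {
            String operator = instruction.get(instruction.size() - 1);
            if ("Tf".equals(operator)) {
                // In "/F1 120 Tf" the size is the second operand.
                fontSize = Float.parseFloat(instruction.get(1));
            }
            if (TEXT_SHOWING_OPERATORS.contains(operator) && fontSize > maxFontSize) {
                continue; // skip the instruction instead of writing it
            }
            kept.add(instruction);
        }
        return kept;
    }
}
```

Note that only the text showing instruction is skipped; the Tf instruction that selected the large font is kept, just as in the examples above.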
Text color change
This is a port of the use case from this answer https://stackoverflow.com/a/40709845/1729265.
testChangeBlackTextToGreenDocument
This example changes the color of black text to green.
try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-blackTextToGreen.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();
            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                if (currentlyReplacedBlack == null)
                {
                    Color currentFillColor = getGraphicsState().getFillColor();
                    if (Color.BLACK.equals(currentFillColor))
                    {
                        // Remember the original black and switch to green right before the text is drawn.
                        currentlyReplacedBlack = currentFillColor;
                        super.write(processor, new PdfLiteral("rg"), Arrays.asList(new PdfNumber(0), new PdfNumber(1), new PdfNumber(0), new PdfLiteral("rg")));
                    }
                }
            }
            else if (currentlyReplacedBlack != null)
            {
                // Restore black in its original color space before any non-text-showing instruction.
                if (currentlyReplacedBlack instanceof DeviceCmyk)
                {
                    super.write(processor, new PdfLiteral("k"), Arrays.asList(new PdfNumber(0), new PdfNumber(0), new PdfNumber(0), new PdfNumber(1), new PdfLiteral("k")));
                }
                else if (currentlyReplacedBlack instanceof DeviceGray)
                {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                }
                else
                {
                    super.write(processor, new PdfLiteral("rg"), Arrays.asList(new PdfNumber(0), new PdfNumber(0), new PdfNumber(0), new PdfLiteral("rg")));
                }
                currentlyReplacedBlack = null;
            }
            super.write(processor, operator, operands);
        }

        Color currentlyReplacedBlack = null;
        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}
(EditPageContent.java https://github.com/mkl-public/testarea-itext7/blob/master/src/test/java/mkl/testarea/itext7/content/EditPageContent.java#L179 test method testChangeBlackTextToGreenDocument)
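The color example is essentially a small state machine: inject a green fill color right before black text is drawn, and restore black before the next instruction that does not show text. Here is a stand-alone sketch of that logic without iText. TextRecolorSketch, blackTextToGreen, and isBlack are hypothetical names. Two simplifications are assumptions of this sketch: black is detected by plain string comparison, and the restore step re-emits the remembered instruction rather than building one per color space as the example above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of the inject-and-restore color logic; not iText code. */
public class TextRecolorSketch {
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

    /** Instructions are strings such as "0 g" or "(Hi) Tj"; the operator is the last token. */
    static List<String> blackTextToGreen(List<String> instructions) {
        List<String> out = new ArrayList<>();
        String fillColor = "0 g";    // tracked fill color; PDF starts out with black in DeviceGray
        String replacedBlack = null; // non-null while the injected green is in effect
        for (String instruction : instructions) {
            String operator = instruction.substring(instruction.lastIndexOf(' ') + 1);
            if (TEXT_SHOWING_OPERATORS.contains(operator)) {
                if (replacedBlack == null && isBlack(fillColor)) {
                    replacedBlack = fillColor;
                    out.add("0 1 0 rg"); // switch to green just before the text is drawn
                }
            } else if (replacedBlack != null) {
                out.add(replacedBlack);  // restore black before any other instruction
                replacedBlack = null;
            }
            if (operator.equals("g") || operator.equals("rg") || operator.equals("k")) {
                fillColor = instruction; // follow fill color changes made by the document
            }
            out.add(instruction);
        }
        return out;
    }

    /** Simplified check covering only the exact spellings used in this sketch. */
    static boolean isBlack(String fill) {
        return fill.equals("0 g") || fill.equals("0 0 0 rg") || fill.equals("0 0 0 1 k");
    }
}
```

As in the example above, the tracked fill color is not updated by the injected green instruction, which is why black has to be restored explicitly once the text showing instructions are over.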