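In "How do I change color of a particular word document using apache poi?" (https://stackoverflow.com/questions/40318507/how-do-i-change-color-of-a-particular-word-document-using-apache-poi/40327308#40327308) I showed an algorithm for splitting `XWPFRun`s for formatting reasons. That answer only formats a single character and does not clone the run properties, but it already shows the basics. We need to go through the whole paragraph, since the paragraph is the only place that offers a method for inserting runs. And we need to loop through the run text character by character, because any approach that splits the text into words runs into problems with punctuation marks and with reassembling the words into the paragraph afterwards.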
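What is missing is a method for cloning the run properties from the original run to the newly added runs. This can be done by cloning the underlying `w:rPr` element.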
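The whole approach is then to go through all runs in the paragraph. If a run contains the keyword, split the run text into characters, then go through all characters of that run and buffer them. If the buffered character stream ends with the keyword, set all currently buffered characters except the keyword as the text of the current run. Then insert a new run for the formatted keyword and clone the run properties from the original run. Set the keyword as the text of that run and apply the additional formatting. Then insert another new run for the following characters and clone the run properties from the original run again. Continue like this for each run in the paragraph.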
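The buffering step may be easier to follow without the POI types. Here is a minimal sketch on plain strings, assuming each list element stands for the text of one resulting run. The class `SplitSketch` and the method `splitAroundKeyword` are hypothetical names used only for illustration; they are not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

    // Splits text into the pieces that would become separate runs:
    // the text before a keyword, the keyword itself, the text after it, and so on.
    static List<String> splitAroundKeyword(String text, String keyword) {
        List<String> pieces = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (char c : text.toCharArray()) {
            buffer.append(c); // buffer every character
            String buffered = buffer.toString();
            if (buffered.endsWith(keyword)) {
                // everything buffered before the keyword stays in the current run
                pieces.add(buffered.substring(0, buffered.length() - keyword.length()));
                // the keyword gets a run of its own (the one that would be formatted)
                pieces.add(keyword);
                buffer.setLength(0); // start buffering for the next run
            }
        }
        pieces.add(buffer.toString()); // remaining characters form the last run
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitAroundKeyword("The quick brown fox jumps.", "fox"));
    }
}
```

For the sample sentence this yields three pieces: the text before `fox`, the keyword `fox`, and the text after it. Like the approach described above, the sketch can produce empty pieces, for instance when the text starts or ends with the keyword.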
Complete example:
```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import java.util.*;
import java.awt.Desktop;

public class WordFormatWords {

    // Clones the underlying w:rPr element from the source run to the destination run.
    static void cloneRunProperties(XWPFRun source, XWPFRun dest) {
        CTR tRSource = source.getCTR();
        CTRPr rPrSource = tRSource.getRPr();
        if (rPrSource != null) {
            CTRPr rPrDest = (CTRPr) rPrSource.copy();
            CTR tRDest = dest.getCTR();
            tRDest.setRPr(rPrDest);
        }
    }

    static void formatWord(XWPFParagraph paragraph, String keyword, Map<String, String> formats) {
        int runNumber = 0;
        // Go through all runs; we cannot use for-each since we may insert new runs.
        while (runNumber < paragraph.getRuns().size()) {
            XWPFRun run = paragraph.getRuns().get(runNumber);
            XWPFRun run2 = run;
            String runText = run.getText(0);
            if (runText != null && runText.contains(keyword)) { // run contains the keyword

                // This part manages comment ranges.
                // Is there a commentRangeEnd immediately after the run?
                // If so, remember it in a cursor.
                XmlCursor commentRangeEndCursor = null;
                XmlCursor cursor = run.getCTR().newCursor();
                cursor.toEndToken();
                if (cursor.hasNextToken()) {
                    cursor.toNextToken();
                    XmlObject commentRangeEnd = cursor.getObject();
                    if (commentRangeEnd instanceof CTMarkupRange) {
                        commentRangeEndCursor = cursor;
                    }
                }
                if (commentRangeEndCursor == null) {
                    cursor.dispose(); // nothing to remember, release the cursor
                }

                char[] runChars = runText.toCharArray(); // split run text into characters
                StringBuilder sb = new StringBuilder();
                for (int charNumber = 0; charNumber < runChars.length; charNumber++) { // all characters in that run
                    sb.append(runChars[charNumber]); // buffer all characters
                    runText = sb.toString();
                    if (runText.endsWith(keyword)) { // the buffered character stream ends with the keyword
                        // set all currently buffered chars, except the keyword, as the text of the current run
                        run.setText(runText.substring(0, runText.length() - keyword.length()), 0);
                        run2 = paragraph.insertNewRun(++runNumber); // insert new run for the formatted keyword
                        cloneRunProperties(run, run2); // clone the run properties from the original run
                        run2.setText(keyword, 0); // set the keyword in the run
                        for (Map.Entry<String, String> format : formats.entrySet()) { // additional formatting
                            if ("color".equals(format.getKey())) {
                                run2.setColor(format.getValue());
                            } else if ("bold".equals(format.getKey())) {
                                run2.setBold(Boolean.parseBoolean(format.getValue()));
                            }
                        }
                        run2 = paragraph.insertNewRun(++runNumber); // insert a new run for the next characters
                        cloneRunProperties(run, run2); // clone the run properties from the original run
                        run = run2;
                        sb = new StringBuilder(); // empty the buffer
                    }
                }
                run.setText(sb.toString(), 0); // set all currently buffered characters as the text of the current run

                // This part manages comment ranges.
                // If we remembered a commentRangeEnd, move it here now.
                if (commentRangeEndCursor != null) {
                    cursor = run.getCTR().newCursor();
                    cursor.toEndToken();
                    if (cursor.hasNextToken()) {
                        cursor.toNextToken();
                        commentRangeEndCursor.moveXml(cursor);
                    }
                    cursor.dispose();
                    commentRangeEndCursor.dispose();
                }
            }
            runNumber++;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] keywords = new String[] {"fox", "dog"};
        Map<String, String> formats = new HashMap<String, String>();
        formats.put("bold", "true");
        formats.put("color", "DC143C");

        try (XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
             FileOutputStream out = new FileOutputStream("result.docx")) {
            for (XWPFParagraph paragraph : doc.getParagraphs()) { // go through all paragraphs
                for (String keyword : keywords) {
                    formatWord(paragraph, keyword, formats);
                }
            }
            doc.write(out);
        }

        System.out.println("Done");
        Desktop.getDesktop().open(new File("result.docx"));
    }
}
```
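This code also takes care of XML markup range elements such as `commentRangeEnd` that immediately follow the run's `r` element. Such markup range elements mark the start and end of groups of other elements. For example, the group of text run elements to which a comment applies lies between a `commentRangeStart` and a `commentRangeEnd` that have the same `id`.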
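If a `commentRangeEnd` immediately follows the run that needs to be split, we remember it in a cursor. After splitting the run, we move this `commentRangeEnd` to immediately after the last newly inserted run. So the comment should remain correct.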
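The effect of that move can be pictured on the raw XML. The sketch below uses the JDK's DOM API as a stand-in for the `XmlCursor` code; it is not how POI does it. It assumes a simplified, hand-written paragraph fragment and assumes that the new runs end up behind the `commentRangeEnd`, which is the situation the move guards against. The class name `CommentRangeSketch` and the method `splitAndKeepRangeEnd` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CommentRangeSketch {

    // Simplified paragraph: one commented run between range start and range end.
    static final String PARAGRAPH =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
        + "<w:commentRangeStart w:id=\"0\"/>"
        + "<w:r><w:t>The quick brown fox</w:t></w:r>"
        + "<w:commentRangeEnd w:id=\"0\"/>"
        + "</w:p>";

    // Splits the run and returns the names of the paragraph's children afterwards.
    static String splitAndKeepRangeEnd() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(PARAGRAPH.getBytes(StandardCharsets.UTF_8)));
            Element p = doc.getDocumentElement();
            Element run = (Element) p.getElementsByTagName("w:r").item(0);

            // Remember a commentRangeEnd that immediately follows the run.
            Node next = run.getNextSibling();
            Node rangeEnd =
                (next != null && "w:commentRangeEnd".equals(next.getNodeName())) ? next : null;

            // Simulate the split: shorten the original run, then append two new runs
            // (assumption: they land behind the commentRangeEnd).
            run.getFirstChild().setTextContent("The quick brown ");
            Element last = null;
            for (String text : new String[] {"fox", ""}) {
                Element r = doc.createElement("w:r");
                Element t = doc.createElement("w:t");
                t.setTextContent(text);
                r.appendChild(t);
                p.appendChild(r);
                last = r;
            }

            // Move the remembered commentRangeEnd directly behind the last new run.
            if (rangeEnd != null) {
                p.insertBefore(rangeEnd, last.getNextSibling());
            }

            StringBuilder names = new StringBuilder();
            for (Node n = p.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (names.length() > 0) names.append(' ');
                names.append(n.getNodeName());
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitAndKeepRangeEnd());
    }
}
```

With this fragment, the child order after the move is `commentRangeStart`, three runs, then `commentRangeEnd`, so the comment still spans all text of the split run.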
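Of course, even this has disadvantages, since the approach is clumsy. Microsoft Word sometimes splits text across text runs in very strange ways, and there is no single general solution for this as long as Microsoft Word is the source.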
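One such disadvantage can be sketched on plain strings. Assuming Word has stored a keyword across two runs, a per-run `contains` check like the one in `formatWord` does not find it, although the paragraph text as a whole contains it. The class `SplitKeywordSketch`, its methods, and the sample run texts are hypothetical and only illustrate the limitation.

```java
import java.util.List;

class SplitKeywordSketch {

    // True if any single run text contains the keyword (what a per-run check sees).
    static boolean foundPerRun(List<String> runTexts, String keyword) {
        for (String text : runTexts) {
            if (text.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // True if the concatenated paragraph text contains the keyword.
    static boolean foundInParagraph(List<String> runTexts, String keyword) {
        return String.join("", runTexts).contains(keyword);
    }

    public static void main(String[] args) {
        // Assumed run layout: the word "fox" is split across two runs.
        List<String> runTexts = List.of("The quick brown f", "ox jumps.");
        System.out.println(foundPerRun(runTexts, "fox"));
        System.out.println(foundInParagraph(runTexts, "fox"));
    }
}
```

Handling that case would mean matching against the concatenated paragraph text and mapping the match positions back to runs, which the example above does not attempt.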