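My main goal is to use this library to work with docx files: mostly to manipulate paragraphs and similar content, to replace content so that documents can be generated dynamically from a template, and to convert docx to PDF.
A Word document is essentially XML, and docx4j operates on that XML (it maps it to Java objects via JAXB), which makes it fairly easy to reason about. We work with the docx as a DOM-like tree, except that in docx4j the root node is a package. From the root we can reach all of the content, look up collections of elements of a given type within the document, and access an element at a given position by index.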
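To make the "docx is XML in a package" idea concrete, here is a minimal sketch that uses only the JDK, not docx4j. It opens a docx as a zip stream, parses word/document.xml, and counts the w:p (paragraph) elements. The class name DocxXmlPeek and the helper tinyPackage are made up for this illustration, and the tiny in-memory package is not a complete file that Word could open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only; it is not part of docx4j.
final class DocxXmlPeek {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts w:p elements in word/document.xml; returns -1 if that part is missing. */
    static int countParagraphs(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                Document dom = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                return dom.getElementsByTagNameNS(W_NS, "p").getLength();
            }
            return -1;
        } catch (Exception e) {
            throw new IllegalStateException("could not read the package", e);
        }
    }

    /** Builds an in-memory zip holding only word/document.xml (for demonstration). */
    static byte[] tinyPackage(String documentXml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:p/>"
                + "</w:body></w:document>";
        System.out.println(countParagraphs(new ByteArrayInputStream(tinyPackage(xml))));
    }
}
```

To try it on a real file, pass a FileInputStream for the docx to countParagraphs instead of the in-memory package.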
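The package downloaded from the docx4j website does not include the slf4j support jars, and when converting to PDF the fop-2.3 jar conflicts with the docx4j jars. A download link for the cleaned-up docx4j bundle and its dependencies is given at the end of the article.
1. Downloading docx4j
Download it from the official site. The zip contains the jars as well as examples; the examples below are taken from those official examples. Note that the lib directory in the download lacks the log4j and slf4j-log4j jars needed for logging.
Official download page: https://www.docx4java.org/downloads.html
2. Basic usage
0. docx4j.properties sets global defaults for docx4j, including page orientation and paper size. Below is a configuration provided by the official site: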
# Page size: use a value from org.docx4j.model.structure.PageSizePaper enum
# eg A4, LETTER
docx4j.PageSize=LETTER
# Page size: use a value from org.docx4j.model.structure.MarginsWellKnown enum
docx4j.PageMargins=NORMAL
docx4j.PageOrientationLandscape=false
# Page size: use a value from org.pptx4j.model.SlideSizesWellKnown enum
# eg A4, LETTER
pptx4j.PageSize=LETTER
pptx4j.PageOrientationLandscape=false
# These will be injected into docProps/app.xml
# if App.Write=true
docx4j.App.write=true
docx4j.Application=docx4j
docx4j.AppVersion=2.7
# of the form XX.YYYY where X and Y represent numerical values
# These will be injected into docProps/core.xml
docx4j.dc.write=true
docx4j.dc.creator.value=docx4j
docx4j.dc.lastModifiedBy.value=docx4j
#
#docx4j.McPreprocessor=true
# If you haven't configured log4j yourself
# docx4j will autoconfigure it. Set this to true to disable that
docx4j.Log4j.Configurator.disabled=false
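Since the file is in the ordinary Java properties format, a quick way to sanity-check it before handing it to docx4j is to load it with java.util.Properties. The sketch below does only that; it does not call docx4j, and the class name Docx4jPropsCheck and the "(unset)" fallback are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; it is not part of docx4j.
final class Docx4jPropsCheck {

    /** Parses properties text; lines starting with '#' are treated as comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Summarises the page size and orientation keys from the listing above. */
    static String describePage(Properties props) {
        String size = props.getProperty("docx4j.PageSize", "(unset)");
        boolean landscape = Boolean.parseBoolean(
                props.getProperty("docx4j.PageOrientationLandscape", "false"));
        return size + (landscape ? " landscape" : " portrait");
    }

    public static void main(String[] args) {
        String sample = "# Page size\n"
                + "docx4j.PageSize=LETTER\n"
                + "docx4j.PageOrientationLandscape=false\n";
        System.out.println(describePage(parse(sample)));
    }
}
```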
1. Create a new docx document
/**
* Creates a simple docx
*/
private static void createDocx() {
// Create the package
WordprocessingMLPackage wordMLPackage;
try {
wordMLPackage = WordprocessingMLPackage.createPackage();
// Save as a new file
wordMLPackage.save(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
} catch (InvalidFormatException e) {
log.error("createDocx error:InvalidFormatException", e);
} catch (Docx4JException e) {
log.error("createDocx error: Docx4JException", e);
}
}
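Calling WordprocessingMLPackage.createPackage() creates a package, and calling its save(file) method writes a new file.
Note: another commonly used way to save is:
Docx4J.save(wordMLPackage, new File("C:/Users/liqiang/Desktop/docx4j/helloworld_2.docx"));
2. Add paragraphs to a file
/**
* Adds paragraphs. Remember to save afterwards, otherwise the change is not persisted.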
*/
public static void addParagraph() {
WordprocessingMLPackage wordprocessingMLPackage;
try {
wordprocessingMLPackage = WordprocessingMLPackage
.load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
wordprocessingMLPackage.getMainDocumentPart().addParagraphOfText("Hello Word!");
wordprocessingMLPackage.getMainDocumentPart().addStyledParagraphOfText("Title", "Hello Word!");
wordprocessingMLPackage.getMainDocumentPart().addStyledParagraphOfText("Subtitle", " a subtitle!");
wordprocessingMLPackage.save(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
} catch (Docx4JException e) {
log.error("addParagraph to docx error: Docx4JException", e);
}
}
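Call WordprocessingMLPackage.load(file) to load an existing docx, and remember to call save at the end, otherwise the changes are not written to disk.
The resulting file contains the three paragraphs added above: a plain one, one in the Title style, and one in the Subtitle style.
3. A second way to add a paragraph, using the factory class (the factory is a general-purpose approach)
/**
* Adds a paragraph. Remember to save afterwards, otherwise the change is not persisted.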
*/
public static void addParagraph2(String simpleText) {
try {
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage
.load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
org.docx4j.wml.ObjectFactory factory = Context.getWmlObjectFactory();
org.docx4j.wml.P para = factory.createP();
if (simpleText != null) {
org.docx4j.wml.Text t = factory.createText();
t.setValue(simpleText);
org.docx4j.wml.R run = factory.createR();
run.getContent().add(t);
para.getContent().add(run);
}
wordprocessingMLPackage.getMainDocumentPart().getContent().add(para);
wordprocessingMLPackage.save(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
} catch (Docx4JException e) {
log.error("addParagraph2 to docx error: Docx4JException", e);
}
}
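First obtain a factory (the ObjectFactory must be imported from org.docx4j.wml; if you import the wrong one, everything that follows breaks).
R is a run: it groups content that shares the same properties so that it can be handled as a unit. Content is added through its content list, and RPr holds the run's properties (it is a member of R) and is how the run is styled. An R is in turn added to the content of another object. So the usual steps for adding an element B to an element A by way of an R are: (1) create the R; (2) add the content element B to the R; (3) add the R to element A; (4) add element A to the content of mainDocumentPart.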
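The four steps map directly onto the nesting of the underlying XML: text (w:t) sits inside a run (w:r), which sits inside a paragraph (w:p), which sits inside the body. The sketch below builds that nesting with the JDK's DOM API rather than with docx4j, purely to show the shape of the tree. The class name RunStructureSketch is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration using plain DOM; docx4j's P, R and Text classes
// are the typed counterparts of these elements.
final class RunStructureSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Builds w:document > w:body > w:p > w:r > w:t and returns the body element. */
    static Element buildBody(String text) {
        Document dom;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dom = dbf.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // (1) create the run
        Element run = dom.createElementNS(W_NS, "w:r");
        // (2) add the content element (here: text) to the run
        Element t = dom.createElementNS(W_NS, "w:t");
        t.setTextContent(text);
        run.appendChild(t);
        // (3) add the run to its parent element (here: a paragraph)
        Element para = dom.createElementNS(W_NS, "w:p");
        para.appendChild(run);
        // (4) add the parent element to the document body
        Element body = dom.createElementNS(W_NS, "w:body");
        body.appendChild(para);
        Element root = dom.createElementNS(W_NS, "w:document");
        root.appendChild(body);
        dom.appendChild(root);
        return body;
    }

    public static void main(String[] args) {
        Element body = buildBody("Hello Word!");
        // Walk down the first-child chain and print each element name.
        for (org.w3c.dom.Node n = body; n instanceof Element; n = n.getFirstChild()) {
            System.out.println(n.getNodeName());
        }
    }
}
```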
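Note: the factory provides a create method for each element type, for example createP(), createR(), createText(), createTbl(), createTr() and createTc().
4. Read the contents of a file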
private static void readParagraph() {
try {
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage
.load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
String contentType = wordprocessingMLPackage.getContentType();
log.info("contentType -> {}", contentType);
MainDocumentPart mainDocumentPart = wordprocessingMLPackage.getMainDocumentPart();
List<Object> content = mainDocumentPart.getContent();
for (Object ob : content) {
log.info("ob -> {}", ob);
}
} catch (Docx4JException e) {
log.error("readParagraph error: Docx4JException", e);
}
}
Output:
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] contentType -> application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> a subtitle!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> a subtitle!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> a subtitle!
5. Create a table
(1) Create a plain table
public static void addTable() {
try {
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
ObjectFactory factory = Context.getWmlObjectFactory();
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
// Create the table element (no borders are set in this first version)
Tbl table = factory.createTbl();
for (int i = 0; i < 3; i++) {
Tr tr = factory.createTr();
for (int j = 0; j < 3; j++) {
Tc tc = factory.createTc();
P p = mainDocumentPart.createParagraphOfText("---row" + i + "---column" + j + "---");
tc.getContent().add(p);
tr.getContent().add(tc);
}
table.getContent().add(tr);
}
mainDocumentPart.addObject(table);
wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
} catch (Docx4JException e) {
log.error("addTable error: Docx4JException", e);
}
}
Looking at the source of createParagraphOfText(str): (1) create a Text and set its value; (2) create an R and add the Text to it; (3) create a P and add the R to it.
public org.docx4j.wml.P createParagraphOfText(String simpleText) {
org.docx4j.wml.ObjectFactory factory = Context.getWmlObjectFactory();
org.docx4j.wml.P para = factory.createP();
if (simpleText!=null) {
org.docx4j.wml.Text t = factory.createText();
t.setValue(simpleText);
org.docx4j.wml.R run = factory.createR();
run.getContent().add(t); // ContentAccessor
para.getContent().add(run); // ContentAccessor
}
return para;
}
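Result: the table is created, but it has no borders. Next we look at more involved operations, including showing borders, merging cells, and styling cells.
(2) Show the table borders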
public static void addTable() {
try {
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
ObjectFactory factory = Context.getWmlObjectFactory();
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
// 0. Create the table element
Tbl table = factory.createTbl();
// 1. Show the table borders
addBorders(table);
// 2. Add the table content (create rows and cells)
for (int i = 0; i < 3; i++) {
Tr tr = factory.createTr();
for (int j = 0; j < 3; j++) {
Tc tc = factory.createTc();
P p = mainDocumentPart.createParagraphOfText("---row" + i + "---column" + j + "---");
tc.getContent().add(p);
tr.getContent().add(tc);
}
table.getContent().add(tr);
}
// 3. Add the table to the main document content
mainDocumentPart.addObject(table);
wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
} catch (Docx4JException e) {
log.error("addTable error: Docx4JException", e);
}
}
/**
* Sets the border style.
*
* @param table
* the table whose borders are to be set
*/
private static void addBorders(Tbl table) {
table.setTblPr(new TblPr());// A TblPr must be set first, otherwise a NullPointerException is thrown below
CTBorder border = new CTBorder();
border.setColor("auto");
border.setSz(new BigInteger("4"));
border.setSpace(new BigInteger("0"));
border.setVal(STBorder.SINGLE);
TblBorders borders = new TblBorders();
borders.setBottom(border);
borders.setLeft(border);
borders.setRight(border);
borders.setTop(border);
borders.setInsideH(border);
borders.setInsideV(border);
// Set the borders on the table's TblPr
table.getTblPr().setTblBorders(borders);
}
Result: the table is now drawn with borders.
(3) Bold cell text, font size, column widths and similar settings
public static void addTable() {
try {
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
ObjectFactory factory = Context.getWmlObjectFactory();
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
// 0. Create the table element
Tbl table = factory.createTbl();
// 1. Show the table borders
addBorders(table);
// 2. Add the table content (create rows and cells)
for (int i = 0; i < 3; i++) {
Tr tr = factory.createTr();
for (int j = 0; j < 3; j++) {
Tc tc = factory.createTc();
// P p = mainDocumentPart.createParagraphOfText("---row" + i
// + "---column" + j + "---");
// Second way: build the P by hand so that it can be styled
P p1 = factory.createP();
R r = factory.createR();
Text text = factory.createText();
text.setValue("---row" + i + "---column" + j + "---");
r.getContent().add(text);
p1.getContent().add(r);
// 2.1 Set bold, font size and other properties through the R
setRStyle(r);
// 2.2 Set the column width
if (j == 1) {
setCellWidth(tc, 1250);
} else {
setCellWidth(tc, 2500);
}
tc.getContent().add(p1);
tr.getContent().add(tc);
}
table.getContent().add(tr);
}
// 3. Add the table to the main document content (cell merging is not done here)
mainDocumentPart.addObject(table);
wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
} catch (Docx4JException e) {
log.error("addTable error: Docx4JException", e);
}
}
/**
* Sets the column width.
*
* @param tc
* @param width
*/
private static void setCellWidth(Tc tc, int width) {
TcPr tableCellProperties = new TcPr();
TblWidth tableWidth = new TblWidth();
tableWidth.setW(BigInteger.valueOf(width));
tableCellProperties.setTcW(tableWidth);
tc.setTcPr(tableCellProperties);
}
/**
* Styles the run used in the table cells: bold, font size 25 (the value is in half-points, so 12.5 pt).
*
* @param r the run to style
*/
private static void setRStyle(R r) {
// 1. Create an RPr
RPr rpr = new RPr();
// 2. Configure the RPr
// 2.1 Font size (in half-points)
HpsMeasure size = new HpsMeasure();
size.setVal(new BigInteger("25"));
rpr.setSz(size);
// 2.2 Bold
BooleanDefaultTrue bold = new BooleanDefaultTrue();
bold.setVal(true);
rpr.setB(bold);
// 3. Attach the RPr to the R
r.setRPr(rpr);
}
/**
* Sets the border style.
*
* @param table
* the table whose borders are to be set
*/
private static void addBorders(Tbl table) {
table.setTblPr(new TblPr());// A TblPr must be set first, otherwise a NullPointerException is thrown below
CTBorder border = new CTBorder();
border.setColor("auto");
border.setSz(new BigInteger("4"));
border.setSpace(new BigInteger("0"));
border.setVal(STBorder.SINGLE);
TblBorders borders = new TblBorders();
borders.setBottom(border);
borders.setLeft(border);
borders.setRight(border);
borders.setTop(border);
borders.setInsideH(border);
borders.setInsideV(border);
// Set the borders on the table's TblPr
table.getTblPr().setTblBorders(borders);
}
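Result: the cell text is bold at the larger size, and the middle column is half as wide as the other two.
For cell merging and other more complex operations, see: https://www.cnblogs.com/cxxjohnson/p/7886275.html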
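The numbers passed in the listings above are easy to misread because WordprocessingML uses different units for different properties: font size (w:sz on a run) is in half-points, line border width is in eighths of a point, and measurements such as cell width are in twentieths of a point (dxa, also called twips). The helper below is a small sketch for converting them. The class name WordUnits is hypothetical, and treating the cell widths 1250 and 2500 as dxa assumes no other width type applies, since the listing does not set one.

```java
// Hypothetical helper for illustration only; it is not part of docx4j.
final class WordUnits {

    /** Run font size (w:sz) is stored in half-points. */
    static double halfPointsToPoints(long halfPoints) {
        return halfPoints / 2.0;
    }

    /** Line border width (w:sz on a border) is stored in eighths of a point. */
    static double eighthPointsToPoints(long eighths) {
        return eighths / 8.0;
    }

    /** dxa (twips) are twentieths of a point. */
    static double twipsToPoints(long twips) {
        return twips / 20.0;
    }

    /** 1440 twips make one inch, and one inch is 2.54 cm. */
    static double twipsToCentimetres(long twips) {
        return twips / 1440.0 * 2.54;
    }

    public static void main(String[] args) {
        System.out.println("font size 25      -> " + halfPointsToPoints(25) + " pt");
        System.out.println("border size 4     -> " + eighthPointsToPoints(4) + " pt");
        // Assumes the cell width is interpreted as dxa.
        System.out.println("cell width 2500   -> " + twipsToPoints(2500) + " pt");
        System.out.println("cell width 2500   -> " + twipsToCentimetres(2500) + " cm");
    }
}
```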
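6. Read a table's contents (walking the docx4j tree to find elements of a given type)
Table content: three columns with a header row (姓名, 性别, 年龄) followed by five data rows, name0/sex0/age0 through name4/sex4/age4.
Code: (sometimes the objects returned by getContent() are the elements themselves, such as Tr, and can be cast directly; sometimes they are wrapped in JAXBElement and cannot be cast directly, so the value has to be extracted first, which is what getAllElementFromObject does)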
package cn.qlq.docx4j;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import org.docx4j.TraversalUtil;
import org.docx4j.finders.ClassFinder;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Tr;
/**
* Loops over the contents of a table
*
* @author QiaoLiQiang
* @time 2018-10-28 20:51:41
*/
public class ReplaceTable {
public static void main(String[] args) throws JAXBException {
String template = "C:/Users/liqiang/Desktop/docx4j/helloworld_1.docx";
WordprocessingMLPackage wordMLPackage;
try {
wordMLPackage = WordprocessingMLPackage.load(new java.io.File(template));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
// 1. Use a ClassFinder to collect elements of the given type
ClassFinder find = new ClassFinder(Tbl.class);
new TraversalUtil(documentPart.getContent(), find);
Tbl table = (Tbl) find.results.get(0);// the first table element
List<Object> trs = table.getContent();
System.out.println(trs);
System.out.println("=====================");
for (Object obj : trs) {
Tr tr = (Tr) obj;// the row
List<Object> content = tr.getContent();
System.out.println(content);
List<Object> objList = getAllElementFromObject(tr, Tc.class);// all Tc elements in the row
for (Object obj1 : objList) {
Tc tc = (Tc) obj1;
System.out.println(tc.getContent());
}
System.out.println("===============");
}
} catch (Docx4JException e) {
e.printStackTrace();
}
}
private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
List<Object> result = new ArrayList<Object>();
if (obj instanceof JAXBElement)
obj = ((JAXBElement<?>) obj).getValue();
if (obj.getClass().equals(toSearch))
result.add(obj);
else if (obj instanceof ContentAccessor) {
List<?> children = ((ContentAccessor) obj).getContent();
for (Object child : children) {
result.addAll(getAllElementFromObject(child, toSearch));
}
}
return result;
}
}
Output:
[org.docx4j.wml.Tr@234f18c8, org.docx4j.wml.Tr@1de40494, org.docx4j.wml.Tr@64e89fe0, org.docx4j.wml.Tr@64585ee1, org.docx4j.wml.Tr@65bd393e, org.docx4j.wml.Tr@69f949a0]
=====================
[javax.xml.bind.JAXBElement@6d50ddba, javax.xml.bind.JAXBElement@580d1667, javax.xml.bind.JAXBElement@4339f15a]
[姓名]
[性别]
[年龄]
===============
[javax.xml.bind.JAXBElement@11146e31, javax.xml.bind.JAXBElement@544e5bb9, javax.xml.bind.JAXBElement@6467f9ec]
[name0]
[sex0]
[age0]
===============
[javax.xml.bind.JAXBElement@66492873, javax.xml.bind.JAXBElement@4cfeca7b, javax.xml.bind.JAXBElement@6b9f78ba]
[name1]
[sex1]
[age1]
===============
[javax.xml.bind.JAXBElement@32af3289, javax.xml.bind.JAXBElement@c1eda5e, javax.xml.bind.JAXBElement@3d925789]
[name2]
[sex2]
[age2]
===============
[javax.xml.bind.JAXBElement@52b102f3, javax.xml.bind.JAXBElement@6338c9ee, javax.xml.bind.JAXBElement@25515b26]
[name3]
[sex3]
[age3]
===============
[javax.xml.bind.JAXBElement@372eee, javax.xml.bind.JAXBElement@26ea0b5e, javax.xml.bind.JAXBElement@4f905c47]
[name4]
[sex4]
[age4]
===============
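7. Formatting and styles
Sometimes we need to apply styling. Each element has an XXXPr property for this, where Pr stands for Properties; for example R has RPr, Tbl has TblPr and Tc has TcPr.
3. Advanced docx4j usage
1. Convert docx to HTML
See the official sample on GitHub: https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutHtml.java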
package cn.qlq.docx4j;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.samples.AbstractSample;
public class Docx2Html extends AbstractSample {
static {
inputfilepath = "C:/Users/liqiang/Desktop/docx4j/helloworld.docx";
save = true;
nestLists = true;
}
static boolean save;
static boolean nestLists;
public static void main(String[] args) throws Exception {
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath(inputfilepath + "_files");
htmlSettings.setImageTargetUri(inputfilepath.substring(inputfilepath.lastIndexOf("/") + 1) + "_files");
htmlSettings.setWmlPackage(wordMLPackage);
String userCSS = null;
if (nestLists) {
userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img, table, caption, tbody, tfoot, thead, tr, th, td "
+ "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
} else {
userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img, ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td "
+ "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
}
htmlSettings.setUserCSS(userCSS);
OutputStream os;
if (save) {
os = new FileOutputStream(inputfilepath + ".html");
} else {
os = new ByteArrayOutputStream();
}
Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
if (save) {
System.out.println("Saved: " + inputfilepath + ".html ");
} else {
System.out.println(((ByteArrayOutputStream) os).toString());
}
if (wordMLPackage.getMainDocumentPart().getFontTablePart() != null) {
wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
}
htmlSettings = null;
wordMLPackage = null;
}
}
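Wrapped up as a simpler utility class, the code looks like this (userCSS is the stylesheet for the generated HTML; it can be set by hand, which gives flexible control over margins, fonts and so on):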
package cn.qlq.docx4j;
import java.io.File;
import java.io.FileOutputStream;
import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.samples.AbstractSample;
public class Docx2Html extends AbstractSample {
public static void main(String[] args) throws Exception {
String inputfilepath = "C:/Users/liqiang/Desktop/docx4j/helloworld.docx";
boolean nestLists = true;
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath(inputfilepath + "_files");
htmlSettings.setImageTargetUri(inputfilepath.substring(inputfilepath.lastIndexOf("/") + 1) + "_files");
htmlSettings.setWmlPackage(wordMLPackage);
String userCSS = null;