How can I convert a PDF to DOCX? I tried using pdfminer to convert the PDF to HTML and extract the text, but the result still isn't good enough.
pdf2docx
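- Install the pdf2docx package; the project is at https://github.com/dothinking/pdf2docx
Installation
-
Install from PyPI:
pip install pdf2docx
or clone or download the pdf2docx repository and install from source:
# run from the root of the downloaded package
python setup.py install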
-
Option 1
from pdf2docx import Converter
pdf_file = r'C:\Users\ABCD\Desktop\XYZ\Document1.pdf'  # source file
docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample.docx'  # destination file
# convert pdf to docx
cv = Converter(pdf_file)
cv.convert(docx_file, start=0, end=None)
cv.close()
# output
Parsing Page 53: 53/53...
Creating Page 53: 53/53...
--------------------------------------------------
Terminated in 6.258919400000195s.
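As a slightly more defensive variant of Option 1, the sketch below derives the output path from the input path and closes the converter even if the conversion raises. The helper names docx_path_for and convert_pdf are hypothetical and not part of pdf2docx; only Converter, convert(..., start=..., end=...) and close() come from the library, used the same way as above.

```python
from pathlib import Path


def docx_path_for(pdf_path):
    """Return the path of a .docx file next to the given PDF (hypothetical helper)."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf(pdf_path, docx_path=None, start=0, end=None):
    """Convert a PDF to DOCX with the same start/end arguments as Option 1."""
    # Imported lazily so the path helper works even without pdf2docx installed.
    from pdf2docx import Converter

    docx_path = Path(docx_path) if docx_path else docx_path_for(pdf_path)
    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(docx_path), start=start, end=end)
    finally:
        cv.close()  # release the PDF even if the conversion fails
    return docx_path
```

Calling convert_pdf(pdf_file) with the path from Option 1 would write Document1.docx next to the source PDF.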
-
Option 2
from pdf2docx import parse
pdf_file = r'C:\Users\ABCD\Desktop\XYZ\Document2.pdf'  # source file
docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample_2.docx'  # destination file
# convert pdf to docx
parse(pdf_file, docx_file, start=0, end=None)
# output
Parsing Page 53: 53/53...
Creating Page 53: 53/53...
--------------------------------------------------
Terminated in 5.883666100000482s.
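If several PDFs need converting, the parse call from Option 2 can be wrapped in a loop. This is only a sketch under assumptions: pending_pdfs and convert_folder are hypothetical helper names, and the rule of skipping PDFs that already have a .docx of the same name is an illustrative choice, not something pdf2docx does itself.

```python
from pathlib import Path


def pending_pdfs(folder):
    """List PDFs in folder that have no .docx file of the same name yet."""
    return sorted(
        pdf for pdf in Path(folder).glob("*.pdf")
        if not pdf.with_suffix(".docx").exists()
    )


def convert_folder(folder):
    """Convert every pending PDF in folder and return the new .docx paths."""
    from pdf2docx import parse  # imported lazily; requires pdf2docx

    outputs = []
    for pdf in pending_pdfs(folder):
        docx = pdf.with_suffix(".docx")
        # Same call as Option 2: convert all pages of one file.
        parse(str(pdf), str(docx), start=0, end=None)
        outputs.append(docx)
    return outputs
```

For example, convert_folder(r'C:\Users\ABCD\Desktop\XYZ') would convert each PDF in that folder that has not been converted yet.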