处理 PDF 时文件 (2.pdf) https://yadi.sk/i/2vABlTaexZerg使用 pdfminer (pdf2txt.py) 我收到以下错误:
pdf2txt.py 2.pdf
Traceback (most recent call last):
File "/usr/local/bin/pdf2txt.py", line 115, in <module>
if __name__ == '__main__': sys.exit(main(sys.argv))
File "/usr/local/bin/pdf2txt.py", line 109, in main
interpreter.process_page(page)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 843, in render_contents
self.init_resources(resources)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 347, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 195, in get_font
font = self.get_font(None, subspec)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 186, in get_font
font = PDFCIDFont(self, spec)
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 654, in __init__
StringIO(self.fontfile.get_data()))
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 375, in __init__
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
struct.error: unpack requires a string argument of length 16
虽然类似的文件 (1.pdf) https://yadi.sk/i/Z37JK5S9xZeoX不会造成问题。
我找不到有关该错误的任何信息。我添加了一个issue https://github.com/euske/pdfminer/issues/144在 pdfminer GitHub 存储库上,但仍未得到答复。有人可以向我解释为什么会发生这种情况吗?我可以做什么来解析2.pdf https://yadi.sk/i/2vABlTaexZerg?
Update:我遇到类似的错误BytesIO
代替StringIO
after 安装pdfminer https://github.com/euske/pdfminer#how-to-install直接来自 GitHub 存储库。
$ pdf2txt.py 2.pdf
Traceback (most recent call last):
File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 116, in <module>
if __name__ == '__main__': sys.exit(main(sys.argv))
File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 110, in main
interpreter.process_page(page)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 839, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 850, in render_contents
self.init_resources(resources)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 356, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 204, in get_font
font = self.get_font(None, subspec)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
font = PDFCIDFont(self, spec)
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 665, in __init__
BytesIO(self.fontfile.get_data()))
File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 386, in __init__
(name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
struct.error: unpack requires a string argument of length 16