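Table of contents
- 1. Convert PDF to txt
- 2. Check whether a txt file is empty
- 3. Get every line of a txt file
- 4. Get all file names in a folder
- 5. Read and write xlsx spreadsheets
- 6. Iterate over every character in a txt file
- 7. Replace characters in a string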
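1. Convert PDF to txt
import logging
import os

# Import paths below assume the pdfminer.six package.
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox
from pdfminer.pdfdocument import PDFDocument, PDFTextExtractionNotAllowed
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser


def pdf_to_txt(dealPdf, path1, path2):
    # Only show errors, which hides pdfminer's verbose warnings.
    logging.getLogger().setLevel(logging.ERROR)
    # Share one resource manager between the device and the interpreter.
    rsrcmgr = PDFResourceManager()
    device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # Strip only the final extension, so "a.b.pdf" becomes "a.b.txt".
    txt_filename = os.path.splitext(dealPdf)[0] + '.txt'
    # Open the PDF in a with block so the file handle is always closed.
    with open(os.path.join(path1, dealPdf), 'rb') as fp:
        doc = PDFDocument(PDFParser(fp))
        if not doc.is_extractable:
            raise PDFTextExtractionNotAllowed
        with open(os.path.join(path2, txt_filename), 'w', encoding="utf-8") as fw:
            for page in PDFPage.create_pages(doc):
                interpreter.process_page(page)
                layout = device.get_result()
                # Keep only the text boxes of the page layout.
                for x in layout:
                    if isinstance(x, LTTextBox):
                        results = x.get_text()
                        print(results)
                        fw.write(results)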
2. Check whether a txt file is empty
def check(path):
    with open(path, 'r', encoding="utf-8") as f:
        txt = f.read()
    # True when the file contains no characters at all.
    return txt == ''
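For large files, reading the whole content only to test for emptiness is wasteful. The following is a minimal sketch of an alternative that checks the file size instead; the helper name `is_empty_file` is hypothetical and not part of the original snippets.

```python
import os


def is_empty_file(path):
    """Return True if the file at `path` contains zero bytes."""
    # os.path.getsize reads file metadata only, so the content is never loaded.
    return os.path.getsize(path) == 0
```

A file that holds only whitespace or a byte-order mark is not zero bytes long, so this sketch treats it as non-empty.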
3. Get every line of a txt file
def get_lines(path):
    with open(path, 'r', encoding="utf-8") as f:
        # Each returned line keeps its trailing newline character.
        txt = f.readlines()
    return txt
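Lines returned by `readlines()` keep their trailing `'\n'`, which is often unwanted when the lines are compared or printed. A possible variant that strips it is sketched below; the function name `read_lines_stripped` is an assumption made for illustration.

```python
def read_lines_stripped(path):
    """Return the lines of a UTF-8 text file without trailing newlines."""
    with open(path, 'r', encoding="utf-8") as f:
        # Iterating over the file object yields one line at a time.
        return [line.rstrip('\n') for line in f]
```

Empty lines in the file are kept as empty strings, so the list length still matches the number of lines.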
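4. Get all file names in a folder
import os

# Named get_file_names so it does not shadow get_lines from section 3.
def get_file_names(path):
    # os.listdir returns files and subfolders, in arbitrary order.
    files_name = os.listdir(path)
    return files_name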
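`os.listdir` returns every entry in the folder, subfolders included. When only files of one type are wanted, for example the PDFs to convert, a filter along these lines may help. This is a sketch under assumptions: the name `list_files_with_ext` and the case-insensitive extension match are illustrative choices, not part of the original snippets.

```python
import os


def list_files_with_ext(folder, ext):
    """Return sorted names of regular files in `folder` ending with `ext`."""
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Skip subfolders; compare extensions case-insensitively.
        if os.path.isfile(full_path) and name.lower().endswith(ext.lower()):
            names.append(name)
    return sorted(names)
```

Calling `list_files_with_ext(folder, '.pdf')` would then return only the PDF file names, in sorted order.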
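5. Read and write xlsx spreadsheets
import openpyxl

# Open an existing workbook.
one_xlsx = openpyxl.load_workbook('E:/one.xlsx')
sheet_1 = one_xlsx.worksheets[0]
# Requires the workbook to contain at least two sheets.
sheet_2 = one_xlsx.worksheets[1]
col_num = sheet_1.max_column
row_num = sheet_1.max_row
# Rows and columns are numbered from 1.
for i in range(1, row_num + 1):
    for j in range(1, col_num + 1):
        print(sheet_1.cell(row=i, column=j).value)
# Save the workbook under a new file name.
one_xlsx.save('E:/two.xlsx')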
6. Iterate over every character in a txt file
def visit(path):
    with open(path, 'r', encoding="utf-8") as f:
        txt = f.read()
    # A string is iterable, so no index is needed.
    for ch in txt:
        print(ch)
7. Replace characters in a string
# replace returns a new string, so assign the result back.
txt = txt.replace(str1, str2)
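Python strings are immutable, so `str.replace` never changes the original string; it returns a new one. The short example below illustrates this, along with the optional third argument that limits the number of replacements. The sample text and variable names are made up for illustration.

```python
text = "a-b-c-d"

# replace returns a new string; the original is left unchanged.
all_replaced = text.replace("-", "+")
# The optional third argument limits how many occurrences are replaced.
first_only = text.replace("-", "+", 1)

print(text)          # a-b-c-d
print(all_replaced)  # a+b+c+d
print(first_only)    # a+b-c-d
```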