Following on from my previous question https://stackoverflow.com/questions/31742609/how-to-strip-the-leading-unciode-characters-from-a-file/31742694?noredirect=1#comment51595470_31742694, I have written:
    import os
    import sys
    import tempfile
    import traceback
    import unicodedata


    def ConvertFileToAscii(args, filePath):
        try:
            # Firstly, make sure that the file is writable by all, otherwise we can't update it
            os.chmod(filePath, 0o666)
            with open(filePath, "rb") as file:
                contentOfFile = file.read()
            # This is the line that raises the UnicodeDecodeError shown below
            unicodeData = contentOfFile.decode("utf-8")
            # Decompose accented characters (e.g. Ü -> U + combining diaeresis),
            # then drop everything that is not ASCII
            asciiData = unicodedata.normalize('NFKD', unicodeData).encode('ASCII', 'ignore')
            # Write the result to a temporary file that is kept after it is closed
            with tempfile.NamedTemporaryFile(mode='wb', delete=False) as temporaryFile:
                temporaryFile.write(asciiData)
                temporaryFileName = temporaryFile.name
            if args.info or args.diagnostics:
                print(filePath + ' converted to ASCII and stored in ' + temporaryFileName)
            return temporaryFileName
        except KeyboardInterrupt:
            raise
        except Exception as e:
            print('!!!!!!!!!!!!!!!\nException while trying to convert ' + filePath + ' to ASCII')
            print(e)
            exc_type, exc_value, exc_traceback = sys.exc_info()
            print(traceback.format_exception(exc_type, exc_value, exc_traceback))
            if args.break_on_error:
                sys.exit('Break on error\n')
When I run it, I get the following exception:
    ['Traceback (most recent call last):
    ', '  File "/home/ker4hi/tools/xmlExpand/xmlExpand.py", line 99, in ConvertFileToAscii
        unicodeData = contentOfFile.decode("utf-8")
    ', "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 1081: invalid start byte"]
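The error can be reproduced in a few lines. This is a minimal sketch, assuming the file was actually written in Latin-1 (ISO-8859-1) or a similar single-byte encoding, where byte 0xF6 is "ö"; the sample word is made up for illustration. UTF-8 never uses the bytes 0xF5 to 0xFF, so the decoder rejects 0xF6 as an invalid start byte, while Latin-1 decodes the same bytes without complaint.

```python
# Hypothetical sample text, encoded as Latin-1 (ISO-8859-1), where 'ö' is byte 0xF6
sample = "Größe".encode("latin-1")
print(sample)  # b'Gr\xf6\xdfe'

# In UTF-8 the bytes 0xF5-0xFF can never appear, so 0xF6 is rejected
try:
    sample.decode("utf-8")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)
print(error_message)

# Decoding with the codec the bytes were actually written in succeeds
decoded = sample.decode("latin-1")
print(decoded)  # Größe
```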
What am I doing wrong?
I really don't care about losing data when converting to ASCII.
0x9C is Ü, a U with an umlaut; I can live without the umlaut.
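Dropping the umlaut is what the NFKD step in the code is meant to do once decoding succeeds. A minimal sketch of that step on its own (the helper name `strip_accents` is hypothetical): NFKD splits "Ü" into "U" plus U+0308 COMBINING DIAERESIS, and encoding with `errors="ignore"` then discards the combining mark.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # NFKD decomposes accented letters into base letter + combining mark
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" drops the combining marks and any
    # other non-ASCII character; decode back to get a str
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Über Größe"))  # Uber Groe
```

Note that characters with no decomposition, such as "ß", are dropped entirely rather than transliterated, which matches the stated tolerance for data loss.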
How can I convert such files so that they contain only plain ASCII characters? Do I really need to open them as binary and check every byte?
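Byte-by-byte inspection should not be necessary: once the bytes are decoded with the right codec, the NFKD-plus-ignore step handles the rest. A minimal sketch, assuming each file is either valid UTF-8 or Latin-1 (if the files were really written in Windows-1252 or a DOS code page such as CP850, substitute that codec name); `decode_best_effort` and `convert_file_to_ascii` are hypothetical names, not part of the original script.

```python
import os
import tempfile
import unicodedata


def decode_best_effort(raw: bytes) -> str:
    # Try UTF-8 first; if the bytes are not valid UTF-8, assume Latin-1.
    # Latin-1 maps every byte 0x00-0xFF to a code point, so it never raises.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def convert_file_to_ascii(file_path: str) -> str:
    """Write an ASCII-only copy of file_path to a temp file and return its name."""
    with open(file_path, "rb") as f:
        text = decode_best_effort(f.read())
    # Decompose accented letters, then drop anything still outside ASCII
    ascii_bytes = unicodedata.normalize("NFKD", text).encode("ascii", "ignore")
    with tempfile.NamedTemporaryFile(mode="wb", delete=False) as tmp:
        tmp.write(ascii_bytes)
    return tmp.name


if __name__ == "__main__":
    # Usage example: create a Latin-1 file, convert it, show the result
    with tempfile.NamedTemporaryFile(mode="wb", delete=False) as src:
        src.write("Größe Über".encode("latin-1"))
    out_name = convert_file_to_ascii(src.name)
    with open(out_name, "rb") as f:
        print(f.read())  # b'Groe Uber'
    os.remove(src.name)
    os.remove(out_name)
```

The fallback order matters: Latin-1 accepts any byte sequence, so trying it first would silently mis-decode files that are genuinely UTF-8.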