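I'm building a Python utility that maps integers to word strings, where many integers may map to the same string. As I understand it, Python interns short strings and most hard-coded strings by default, which saves memory by keeping one "canonical" copy of each string in a table. I figured I could benefit from this by interning the string values, even though string interning is aimed more at speeding up key hashing. I wrote a quick test that checks string identity for long strings, first with the strings stored in lists, then with the strings stored as dictionary values. The behavior surprised me: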
import sys
top = 10000
non1 = []
non2 = []
for i in range(top):
    s1 = '{:010d}'.format(i)
    s2 = '{:010d}'.format(i)
    non1.append(s1)
    non2.append(s2)
same = True
for i in range(top):
    same = same and (non1[i] is non2[i])
print("non: ", same) # prints False
del non1[:]
del non2[:]
with1 = []
with2 = []
for i in range(top):
    s1 = sys.intern('{:010d}'.format(i))
    s2 = sys.intern('{:010d}'.format(i))
    with1.append(s1)
    with2.append(s2)
same = True
for i in range(top):
    same = same and (with1[i] is with2[i])
print("with: ", same) # prints True
###############################
non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"
with_dict = {}
with_dict[1] = sys.intern("this is a long string")
with_dict[2] = sys.intern("this is another long string")
with_dict[3] = sys.intern("this is a long string")
with_dict[4] = sys.intern("this is another long string")
print("non: ", non_dict[1] is non_dict[3] and non_dict[2] is non_dict[4]) # prints True ???
print("with: ", with_dict[1] is with_dict[3] and with_dict[2] is with_dict[4]) # prints True
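I expected the non-dict check to print "False", but I was clearly mistaken. Does anyone know what is happening, and whether string interning would bring any benefit at all in my case? If I consolidate data from multiple input texts, I could have many, many more keys than distinct values, so I'm looking for a way to save memory. (Maybe I'll have to use a database, but that's outside the scope of this question.)
Thank you in advance!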
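One of the optimizations the bytecode compiler performs, similar to interning but distinct from it, is that it uses the same object for equal constants within the same code block. The string literals here: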
non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"
are in the same code block, so equal strings end up represented by the same string object.
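To see the difference for yourself, here is a minimal sketch, assuming CPython (the constant sharing is an implementation detail, not something the language guarantees). The names `source`, `namespace`, and `build` are made up for this illustration; `build` just mimics the `'{:010d}'.format(i)` calls from the question.

```python
import sys

# Part 1: two equal literals compiled as one code block.
# The compiler stores the constant once, so both names refer to one object.
source = (
    'a = "this is a long string"\n'
    'b = "this is a long string"\n'
)
code = compile(source, "<demo>", "exec")
namespace = {}
exec(code, namespace)

same_literal = namespace["a"] is namespace["b"]
literal_count = code.co_consts.count("this is a long string")
print("literals share one object:", same_literal)
print("copies of the literal in co_consts:", literal_count)


# Part 2: strings built at run time are not compile-time constants,
# so equal results are separate objects unless they are interned.
def build(i):
    # Hypothetical helper standing in for the formatting in the question.
    return '{:010d}'.format(i)


plain_a, plain_b = build(42), build(42)
interned_a, interned_b = sys.intern(build(42)), sys.intern(build(42))

print("run-time strings equal:", plain_a == plain_b)
print("run-time strings identical:", plain_a is plain_b)
print("interned strings identical:", interned_a is interned_b)
```

So the dictionary test in the question passed only because the values were literals in one code block. For values read from input texts at run time, that sharing does not apply, and calling `sys.intern` on each value before storing it is what makes equal values refer to a single object.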