We can do a bit of investigation to figure this out:
>>> import numpy as np
>>> a = np.arange(32)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
>>> a.data
<read-write buffer for 0x107d01e40, size 256, offset 0 at 0x107d199b0>
>>> id(a.data)
4433424176
>>> id(a[0])
4424950096
>>> id(a[1])
4424950096
>>> for item in a:
...     print id(item)
...
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
4424950096
4424950120
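As an aside, the "size 256" in the buffer repr above is easy to verify: it's just the element count times the per-element item size. A small sketch (assuming the platform's default integer dtype is 8 bytes wide, as it was in the session above; it may be 4 bytes on other platforms):

```python
import numpy as np

a = np.arange(32)

# 32 elements times the per-element item size gives the buffer size.
# With an 8-byte default integer dtype that's 256 bytes, matching the
# "size 256" in the buffer repr (it would be 128 with a 32-bit int).
print(a.size, a.itemsize, a.nbytes)
assert a.nbytes == a.size * a.itemsize
```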
So what's going on here? First, I looked at the memory location of the array's buffer. It's at 4433424176. That in itself isn't too enlightening. However, numpy stores its data as a contiguous C array, so the first element in the numpy array should correspond to the memory address of the array itself, yet it doesn't:
>>> id(a[0])
4424950096
It's a good thing it doesn't, since that would break the invariant in python that two objects never have the same id during their lifetimes.
So, how does numpy accomplish this? The answer is that numpy has to wrap the returned object in a python type (e.g. numpy.float64 or numpy.int64 in this case), and that wrapping takes time if you're iterating item-by-item¹. Further proof of this shows up during iteration: we alternate between two separate IDs while iterating over the array, which means python's memory allocator and garbage collector are working overtime to create new objects and then free them.
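The wrapping can be observed directly: pulling an element out of the array hands back a numpy scalar object rather than a plain python int. A quick sketch (the exact scalar type, e.g. numpy.int64, depends on the platform):

```python
import numpy as np

a = np.arange(5)

# Indexing allocates a fresh numpy scalar wrapper around the raw C
# value -- this per-item allocation is the cost described above.
item = a[0]
print(type(item))             # a numpy scalar type, e.g. numpy.int64
print(type(item).__module__)  # 'numpy'
```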
A list doesn't have this memory allocator/garbage collector overhead. The objects in the list already exist as python objects (and they'll still exist after iteration), so neither plays any role in list iteration.
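By contrast, a short sketch shows that iterating a list hands back the same already-existing objects on every pass:

```python
# Values chosen outside CPython's small-int cache so the ids are
# genuinely those of the list's own elements.
lst = list(range(5000, 5005))

# Iterating a list yields the stored objects themselves, so their
# ids are stable across passes -- nothing is allocated per item.
first_pass = [id(x) for x in lst]
second_pass = [id(x) for x in lst]
print(first_pass == second_pass)  # True
```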
Timing methodology:
Also note that your assumptions skew your timings a bit. You assumed that k + 1 should take about the same amount of time in both cases, but it doesn't. Notice what happens if I repeat your timings without doing any addition:
mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k"
1000 loops, best of 3: 233 usec per loop
mgilson$ python -m timeit "for k in range(5000): k"
10000 loops, best of 3: 114 usec per loop
That's only about a factor-of-2 difference. Doing the addition, however, leads to a factor-of-5 or so difference:
mgilson$ python -m timeit "for k in range(5000): k+1"
10000 loops, best of 3: 179 usec per loop
mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1"
1000 loops, best of 3: 786 usec per loop
For fun, let's time just the addition:
$ python -m timeit -s "v = 1" "v + 1"
10000000 loops, best of 3: 0.0261 usec per loop
mgilson$ python -m timeit -s "import numpy; v = numpy.int64(1)" "v + 1"
10000000 loops, best of 3: 0.121 usec per loop
Finally, your timeit also includes the list/array construction time, which isn't ideal:
mgilson$ python -m timeit -s "v = range(5000)" "for k in v: k"
10000 loops, best of 3: 80.2 usec per loop
mgilson$ python -m timeit -s "import numpy; v = numpy.arange(5000)" "for k in v: k"
1000 loops, best of 3: 237 usec per loop
Notice that numpy actually moved further away from the list solution in this case. This shows that the iteration really is slower, and you might get some speedups if you convert the numpy types to standard python types first.
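One way to get that speedup, sketched below with the timeit module (the exact ratio will vary by machine), is to convert once with tolist() before looping:

```python
import timeit

# Looping over the array pays the numpy-scalar wrapping cost per item.
direct = timeit.timeit(
    "for k in v: k + 1",
    setup="import numpy; v = numpy.arange(5000)",
    number=200,
)

# tolist() converts to plain python ints once, up front, so the loop
# body does ordinary int arithmetic with no per-item wrapping.
converted = timeit.timeit(
    "for k in v: k + 1",
    setup="import numpy; v = numpy.arange(5000).tolist()",
    number=200,
)

print("array: %.4fs  list: %.4fs" % (direct, converted))
```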
¹ Note, this doesn't take a lot of time when slicing, because slicing only has to allocate O(1) new objects since numpy returns a view into the original array.
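The view behaviour that footnote relies on is easy to check:

```python
import numpy as np

a = np.arange(32)

# Basic slicing returns a view: no element-by-element copy happens,
# only one new array wrapper object (O(1)) is allocated.
b = a[4:20]
print(b.base is a)  # True: b shares a's memory buffer

# Mutating through the view is visible in the original array.
b[0] = -1
print(a[4])  # -1
```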