I've been trying to get over my fear of Cython (fear because I know literally nothing about C or C++).
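I have a function that takes two arguments: a set (call it testSet) and a list of sets (call it targetSets). The function iterates over targetSets, computes the length of the intersection of each set with testSet, appends that value to a list, and then returns the list.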
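A pure-Python version of this would look roughly like the sketch below. The names compute_overlap_py, test_set and target_sets are only illustrative, and the sketch assumes the same thresholding as the Cython version further down (an overlap of 0 or 1 is recorded as 0).

```python
def compute_overlap_py(test_set, target_sets):
    """Return, for each set in target_sets, the size of its overlap with test_set."""
    obs_overlaps = []
    for target in target_sets:
        n_shared = len(test_set & target)
        # Assumption: overlaps of 0 or 1 count as 0, mirroring the Cython version.
        obs_overlaps.append(n_shared if n_shared > 1 else 0)
    return obs_overlaps


# Small hand-checkable example (made-up data, not the simulated sets).
example_test_set = {1, 2, 3, 4}
example_targets = [{1, 2, 9}, {4, 7}, {5, 6}, {1, 2, 3, 4}]
example_result = compute_overlap_py(example_test_set, example_targets)
```

The work per call is dominated by building one intermediate set for every entry in target_sets, which is the part the Cython version also has to do.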
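Now, this isn't that slow by itself, but the problem is that I need to run simulations of testSet (a large number of them, around 10,000), and targetSets is about 10,000 sets long.
So for a small number of simulations to test, the pure Python implementation took about 50 seconds.
I tried making a Cython function, and it worked: it now runs in about 16 seconds.
If there is anything else anyone can think of that I could do to the Cython function, that would be great (Python 2.7, by the way).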
Here is my Cython implementation, in overlapFunc.pyx:
def computeOverlap(set testSet, list targetSets):
    cdef list obsOverlaps = []
    cdef int i, N
    cdef set overlap
    N = len(targetSets)
    for i in range(N):
        overlap = testSet & targetSets[i]
        if len(overlap) <= 1:
            obsOverlaps.append(0)
        else:
            obsOverlaps.append(len(overlap))
    return obsOverlaps
And the setup.py:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("overlapFunc",
                         ["overlapFunc.pyx"])]
setup(
    name = 'computeOverlap function',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)
And some code to build some random sets for testing and to time the function, in test.py:
import numpy as np
from overlapFunc import computeOverlap
import time

def simRandomSet(n):
    for i in range(n):
        simSet = set(np.random.randint(low=1, high=100, size=50))
        yield simSet

if __name__ == '__main__':
    np.random.seed(23032014)
    targetSet = [set(np.random.randint(low=1, high=100, size=50)) for i in range(10000)]
    simulatedTestSets = simRandomSet(200)
    start = time.time()
    for i in simulatedTestSets:
        obsOverlaps = computeOverlap(i, targetSet)
    print time.time()-start
I tried changing the def at the start of the computeOverlap function, as follows:
cdef list computeOverlap(set testSet, list targetSets):
but I get the following warning message when I run the setup.py script:
'__pyx_f_11overlapFunc_computeOverlap' defined but not used [-Wunused-function]
And then when I run something that tries to use the function, I get an ImportError:
from overlapFunc import computeOverlap
ImportError: cannot import name computeOverlap
Thanks in advance for your help,
Cheers,
Davy