I find myself coding this pattern a lot:
tmp = <some operation>
result = tmp[<boolean expression>]
del tmp
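...where <boolean expression> is understood to be a boolean expression involving tmp. (For the time being, tmp is always a pandas DataFrame, but I suppose the same pattern would show up if I were working with numpy ndarrays; I'm not sure.)
For example: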
tmp = df.xs('A')['II'] - df.xs('B')['II']
result = tmp[tmp < 0]
del tmp
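As one can guess from the del tmp at the end, the only reason for creating tmp at all is so that I can use a boolean expression involving it inside an indexing expression applied to it.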
I would love to eliminate the need for this (otherwise useless) intermediate, but I don't know of any efficient [1] way to do this. (Please, correct me if I'm wrong!)
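As a second best, I'd like to push this pattern off to some helper function. The problem is finding a decent way to pass the <boolean expression> to it. I can only think of indecent ones. E.g.: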
def filterobj(obj, criterion):
    return obj[eval(criterion % 'obj')]
This actually works [2]:
filterobj(df.xs('A')['II'] - df.xs('B')['II'], '%s < 0')
# Int
# 0 -1.650107
# 2 -0.718555
# 3 -1.725498
# 4 -0.306617
# Name: II
...but using eval always makes me feel all dirty... Please let me know if there's some other way.
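The closest non-eval variant I can sketch passes a callable instead of a string. This is only a sketch, assuming the criterion can be applied to the whole object at once, so that the comparison stays vectorized rather than iterating element by element. The Series s below is made-up data for illustration, not the df from this question:

```python
import pandas as pd


def filterobj(obj, criterion):
    # criterion is any callable that maps obj to a boolean mask
    # aligned with obj; it is called once on the whole object.
    return obj[criterion(obj)]


# Hypothetical data, only to make the sketch runnable.
s = pd.Series([1.5, -0.5, 2.0, -3.0], index=list("abcd"))

# Keep only the negative entries, without naming an intermediate.
result = filterobj(s, lambda x: x < 0)
print(result)

# As far as I can tell, .loc also accepts a callable directly,
# which gives the same selection without any helper.
same = s.loc[lambda x: x < 0]
print(same)
```

The lambda receives the whole Series, so x < 0 is evaluated as a single vectorized comparison.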
[1] E.g., any approach I can think of involving the filter built-in is probably inefficient, since it would apply the criterion (some lambda function) by iterating, "in Python", over the pandas (or numpy) object...
[2] The definition of df used in the last expression above would be something like this:
import itertools
import pandas as pd
import numpy as np
a = ('A', 'B')
i = range(5)
ix = pd.MultiIndex.from_tuples(list(itertools.product(a, i)),
                               names=('Alpha', 'Int'))
c = ('I', 'II', 'III')
df = pd.DataFrame(np.random.randn(len(ix), len(c)), index=ix, columns=c)