Option 1
set conversion and difference, using np.where
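import numpy as np

# Convert every cell from list to set (applymap is deprecated in favor of DataFrame.map since pandas 2.1)
df_temp = DF.applymap(set)
# An empty difference (falsy) means every element of X is in Y
DF['x_sub_y'] = np.where(df_temp.X - df_temp.Y, False, True)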
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
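For anyone who wants to run Option 1 end to end, here is a minimal self-contained sketch. The construction of DF is an assumption reconstructed from the output shown above, and the names x_sets, y_sets and has_leftover are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample frame, reconstructed from the printed output.
DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# Convert each list to a set, one column at a time.
x_sets = DF['X'].apply(set)
y_sets = DF['Y'].apply(set)

# Element-wise set difference; a non-empty result means X has items missing from Y.
has_leftover = (x_sets - y_sets).astype(bool)

# Flip the truthiness: no leftover elements means X is a subset of Y.
DF['x_sub_y'] = np.where(has_leftover, False, True)
print(DF)
```

Converting the columns individually avoids calling applymap on the whole frame, which also keeps the snippet re-runnable after x_sub_y has been added.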
Option 2
Faster astype conversion
DF['x_sub_y'] = ~(DF.X.apply(set) - DF.Y.apply(set)).astype(bool)
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
Option 3
Fun with np.vectorize
def foo(x):
    # x is the set difference X - Y; an empty set is falsy
    return not x
v = np.vectorize(foo)
DF['x_sub_y'] = v(DF.X.apply(set) - DF.Y.apply(set))
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
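All three options rest on the same fact: an empty set is falsy, so "not (X - Y)" is True exactly when X is a subset of Y. A small plain-Python sketch of that equivalence, using the two sample rows (the helper name via_difference is made up for illustration):

```python
def via_difference(x, y):
    # Empty difference -> falsy -> `not` turns it into True.
    return not (set(x) - set(y))

rows = [([1, 5], [1, 2, 5]), ([1, 2], [1, 3, 5])]

for x, y in rows:
    by_diff = via_difference(x, y)
    by_subset = set(x).issubset(y)
    # Both tests should agree for every row.
    print(x, y, by_diff, by_subset)
```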
Extending Scott Boston's answer (https://stackoverflow.com/a/46163839/4909087) for a speed improvement, using the same approach:
def foo(x, y):
    return set(x).issubset(y)
v = np.vectorize(foo)
DF['x_sub_y'] = v(DF.X, DF.Y)
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
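Since np.vectorize is documented as a convenience wrapper around a loop rather than a true vectorized operation, a plain list comprehension over zip is a comparable alternative. The sketch below is not part of the original answer; DF is again an assumed reconstruction of the sample data, and no claim is made about which variant is faster on your data.

```python
import numpy as np
import pandas as pd

# Assumed sample frame, reconstructed from the printed output.
DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

def foo(x, y):
    return set(x).issubset(y)

# Approach from the answer: wrap the scalar function with np.vectorize.
via_vectorize = np.vectorize(foo)(DF.X, DF.Y)

# Alternative sketch: the same test as an explicit Python-level loop.
via_listcomp = [set(x).issubset(y) for x, y in zip(DF.X, DF.Y)]

DF['x_sub_y'] = via_listcomp
print(DF)
```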
Small
1000 loops, best of 3: 460 µs per loop # Before
10000 loops, best of 3: 103 µs per loop # After
Large (df * 10000)
1 loop, best of 3: 1.26 s per loop # Before
100 loops, best of 3: 13.3 ms per loop # After
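To re-measure on your own machine, a harness along these lines may help. It is only a sketch: it compares two approaches from this answer (not necessarily the exact Before/After pair timed above), it assumes the larger frame was built by repeating the sample rows, and it uses a smaller repeat factor so it finishes quickly. The function names are illustrative.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed sample frame, reconstructed from the printed output.
DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# Assumption: the "large" frame repeats the sample rows; 1000 copies here for speed.
big = pd.concat([DF] * 1000, ignore_index=True)

def set_difference(df):
    # Option 2: element-wise set difference, then flip its truthiness.
    return ~(df.X.apply(set) - df.Y.apply(set)).astype(bool)

def is_subset(x, y):
    return set(x).issubset(y)

v_is_subset = np.vectorize(is_subset)

def vectorized_issubset(df):
    # np.vectorize applied to the issubset test.
    return v_is_subset(df.X, df.Y)

candidates = [('set difference', set_difference),
              ('np.vectorize + issubset', vectorized_issubset)]

for name, func in candidates:
    # Best of 3 runs, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(big), number=10, repeat=3)) / 10
    print(f'{name}: {best * 1e3:.2f} ms per call')
```

Absolute numbers will vary with hardware and with pandas and NumPy versions, so only the relative ordering is worth reading from the output.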