Assuming you have enough memory to hold the new array plus a boolean mask the shape of the original array, here is one way to do it:
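import numpy as np

def main():
    np.random.seed(1)  # For reproducibility
    data = generate_data((10, 6))
    indices = rightmost_min_col(data)
    new_data = pop_col(data, indices)
    print('Original data...')
    print(data)
    print('Modified data...')
    print(new_data)

def generate_data(shape):
    return np.random.randint(0, 10, shape)

def rightmost_min_col(data):
    """Return, for each row, the column index of the rightmost minimum."""
    nrows, ncols = data.shape[:2]
    # argmin returns the first minimum, so search the column-reversed array...
    min_indices = np.fliplr(data).argmin(axis=1)
    # ...and map the result back to the original column order
    min_indices = (ncols - 1) - min_indices
    return min_indices

def pop_col(data, col_indices):
    """Remove one element per row, at the given column index."""
    nrows, ncols = data.shape[:2]
    col_indices = col_indices[:, np.newaxis]
    row_indices = np.arange(ncols)[np.newaxis, :]
    # Broadcasting gives an (nrows, ncols) mask that is False only at
    # the element to be removed from each row
    mask = col_indices != row_indices
    return data[mask].reshape((nrows, ncols-1))

if __name__ == '__main__':
    main()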
This yields:
Original data...
[[5 8 9 5 0 0]
 [1 7 6 9 2 4]
 [5 2 4 2 4 7]
 [7 9 1 7 0 6]
 [9 9 7 6 9 1]
 [0 1 8 8 3 9]
 [8 7 3 6 5 1]
 [9 3 4 8 1 4]
 [0 3 9 2 0 4]
 [9 2 7 7 9 8]]
Modified data...
[[5 8 9 5 0]
 [7 6 9 2 4]
 [5 2 4 4 7]
 [7 9 1 7 6]
 [9 9 7 6 9]
 [1 8 8 3 9]
 [8 7 3 6 5]
 [9 3 4 8 4]
 [0 3 9 2 4]
 [9 7 7 9 8]]
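Note that when a row contains its minimum more than once (the first and ninth rows above), it is the rightmost occurrence that gets removed. A minimal sketch of why rightmost_min_col reverses the columns before calling argmin, using the ninth row of the output above as a one-row array (the variable names here are only for illustration):

```python
import numpy as np

# The minimum, 0, appears at columns 0 and 4
row = np.array([[0, 3, 9, 2, 0, 4]])

# argmin returns the first (leftmost) occurrence of the minimum
leftmost = row.argmin(axis=1)

# Reversing the columns makes the rightmost occurrence come first;
# (ncols - 1) - index maps it back to the original column order
ncols = row.shape[1]
rightmost = (ncols - 1) - np.fliplr(row).argmin(axis=1)

print(leftmost)   # [0]
print(rightmost)  # [4]
```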
One of the less readable tricks I'm using here is exploiting numpy's broadcasting during array comparisons. As a quick example, consider the following:
import numpy as np
a = np.array([[1, 2, 3]])      # shape (1, 3)
b = np.array([[1],[2],[3]])    # shape (3, 1)
print(a == b)                  # broadcasts to shape (3, 3)
This yields:
[[ True False False]
 [False  True False]
 [False False  True]]
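So, if we know the column index of the item to remove from each row, we can vectorize the operation over an array of column indices, which is what pop_col does.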
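To make the mask concrete, here is a small sketch of the same steps as pop_col on a 2x3 array. The data and the column indices are made up for illustration:

```python
import numpy as np

data = np.array([[5, 8, 0],
                 [1, 7, 6]])
col_indices = np.array([2, 0])  # hypothetical: column to drop from each row

nrows, ncols = data.shape
# An (nrows, 1) array compared with a (1, ncols) array broadcasts to (nrows, ncols)
mask = col_indices[:, np.newaxis] != np.arange(ncols)[np.newaxis, :]
print(mask)
# [[ True  True False]
#  [False  True  True]]

# Boolean indexing returns the kept elements in row-major order,
# so they can be reshaped into an array with one fewer column
result = data[mask].reshape(nrows, ncols - 1)
print(result)
# [[5 8]
#  [7 6]]
```

The reshape only works because exactly one element is dropped from every row, so each row keeps ncols - 1 elements.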