use map http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html#pandas.Series.map执行查找:
In [46]:
df['1st'] = df['1st'].map(idxDict)
df
Out[46]:
1st 2nd
0 a 2
1 b 4
2 c 6
避免出现没有有效密钥可以通过的情况na_action='ignore'
您还可以使用df['1st'].replace(idxDict)
但回答你关于效率的问题:
timings
In [69]:
%timeit df['1st'].replace(idxDict)
%timeit df['1st'].map(idxDict)
1000 loops, best of 3: 1.57 ms per loop
1000 loops, best of 3: 1.08 ms per loop
In [70]:
%%timeit
for k,v in idxDict.items():
df ['1st'] = df ['1st'].replace(k, v)
100 loops, best of 3: 3.25 ms per loop
所以使用map
这里速度快了 3 倍以上
在更大的数据集上:
In [3]:
df = pd.concat([df]*10000, ignore_index=True)
df.shape
Out[3]:
(30000, 2)
In [4]:
%timeit df['1st'].replace(idxDict)
%timeit df['1st'].map(idxDict)
100 loops, best of 3: 18 ms per loop
100 loops, best of 3: 4.31 ms per loop
In [5]:
%%timeit
for k,v in idxDict.items():
df ['1st'] = df ['1st'].replace(k, v)
100 loops, best of 3: 18.2 ms per loop
对于 30K 行 df,map
大约快 4 倍,因此可扩展性比replace
或循环