I suggest using the pandas DataFrame API, which lets you work with DataFrames through operations such as join, merge (http://pandas.pydata.org/pandas-docs/stable/merging.html), groupby (http://pandas.pydata.org/pandas-docs/stable/groupby.html), and so on. My solution is below:
import pandas as pd

df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                    'Column2': ['a', 'b', 'c', 'd', 'e'],
                    'Column3': ['r', 'u', 'k', 'j', 'f']})
df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                    'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

frames = []  # one single-row DataFrame per group
for name, group in df2.groupby('Column1'):
    # Start from the first row of the group, keeping only the key column.
    buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})
    # Add one column per ColumnB value: Column_0, Column_1, ...
    for i, value in enumerate(group['ColumnB']):
        buffer_df['Column_' + str(i)] = value
    frames.append(buffer_df)

# DataFrame.append was removed in pandas 2.0, so the rows are stacked with pd.concat.
dfs = pd.concat(frames)
result = pd.merge(df1, dfs, how='left', on='Column1')
print(result)
The result is:
   Column1 Column2 Column3 Column_0 Column_1 Column_2
0        1       a       r        a        d        e
1        2       b       u        r        w      NaN
2        3       c       k        y        h      NaN
3        4       d       j      NaN      NaN      NaN
4        5       e       f      NaN      NaN      NaN
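For comparison, the same table can probably be built without an explicit Python loop. This is only a sketch of an alternative, not the approach used in this answer: it numbers the rows inside each group with cumcount and then pivots. The names position, wide and result_alt are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                    'Column2': ['a', 'b', 'c', 'd', 'e'],
                    'Column3': ['r', 'u', 'k', 'j', 'f']})
df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                    'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

# Number the rows inside each Column1 group: 0, 1, 2, ...
# ("position" is a hypothetical helper name.)
position = df2.groupby('Column1').cumcount()

# Spread ColumnB across columns, one column per position within the group.
wide = (df2.assign(position=position)
           .pivot(index='Column1', columns='position', values='ColumnB')
           .add_prefix('Column_')
           .reset_index())
wide.columns.name = None  # drop the leftover 'position' label on the columns

result_alt = pd.merge(df1, wide, how='left', on='Column1')
print(result_alt)
```

Whether this is preferable depends on the data: the loop is easier to follow step by step, while the pivot version avoids creating one small DataFrame per group.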
P.S. Some more details:
1) For df2, I produce groups by 'Column1'. A single group is a DataFrame. For example:
   Column1 ColumnB
0        1       a
1        1       d
2        1       e
2) For each group, I build the DataFrame buffer_df:
   Column1 Column_0 Column_1 Column_2
0        1        a        d        e
3) After that, I create the DataFrame dfs:
   Column1 Column_0 Column_1 Column_2
0        1        a        d        e
3        2        r        w      NaN
5        3        y        h      NaN
4) Finally, I perform a left join of df1 and dfs to obtain the desired result.
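One way to check what the left join in step 4 does is the indicator parameter of pd.merge, which records whether each row found a match. The sketch below types dfs in by hand from the table in step 3 instead of rebuilding it, and the name merged is made up for the illustration.

```python
import pandas as pd

nan = float('nan')

df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                    'Column2': ['a', 'b', 'c', 'd', 'e'],
                    'Column3': ['r', 'u', 'k', 'j', 'f']})

# dfs as shown in step 3, written out by hand for this illustration.
dfs = pd.DataFrame({'Column1': [1, 2, 3],
                    'Column_0': ['a', 'r', 'y'],
                    'Column_1': ['d', 'w', 'h'],
                    'Column_2': ['e', nan, nan]},
                   index=[0, 3, 5])

# how='left' keeps every row of df1; indicator=True adds a '_merge' column
# saying whether the key was found in both frames or only in the left one.
merged = pd.merge(df1, dfs, how='left', on='Column1', indicator=True)
print(merged[['Column1', '_merge']])
```

Rows with Column1 equal to 4 and 5 have no counterpart in dfs, which is why their Column_* cells are NaN in the final result.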
2)* The iterations that build buffer_df (shown for the group with Column1 == 3):
step0 (buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})):
   Column1
5        3
step1 (buffer_df['Column_0'] = group['ColumnB'][5]):
   Column1 Column_0
5        3        y
step2 (buffer_df['Column_1'] = group['ColumnB'][6]):
   Column1 Column_0 Column_1
5        3        y        h
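The trace for the last group can be reproduced on its own. The sketch below assumes the same df2 as in the solution and uses get_group to pull out the group with Column1 == 3. Each group keeps the row labels of df2, so the two values sit at labels 5 and 6, and assigning a scalar to a new column fills the single row of buffer_df.

```python
import pandas as pd

df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                    'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

# Take the group for Column1 == 3; it keeps df2's original row labels.
group = df2.groupby('Column1').get_group(3)
print(list(group.index))

# step0: first row of the group, key column only (.iloc makes the
# positional slice explicit).
buffer_df = pd.DataFrame({'Column1': group['Column1'].iloc[:1]})

# step1 and step2: look up each value by label and assign it as a scalar.
buffer_df['Column_0'] = group['ColumnB'].loc[5]
buffer_df['Column_1'] = group['ColumnB'].loc[6]
print(buffer_df)
```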