You can use functools.reduce (https://docs.python.org/3.5/library/functools.html#functools.reduce) to apply pd.merge iteratively to each of the DataFrames:
result = functools.reduce(merge, dfs)
This is equivalent to
result = dfs[0]
for df in dfs[1:]:
    result = merge(result, df)
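To make the equivalence concrete, here is a minimal sketch that needs only the standard library. The function `combine` is a hypothetical stand-in for `merge` (it just joins two dictionaries), and `parts` is made-up data; neither comes from the original answer.

```python
import functools

# Hypothetical stand-in for `merge`: combines two dicts into a new dict.
def combine(left, right):
    return {**left, **right}

# Made-up input data for illustration only.
parts = [{'a': 1}, {'b': 2}, {'c': 3}]

# Fold the list from left to right with functools.reduce.
reduced = functools.reduce(combine, parts)

# The same computation written as an explicit loop.
looped = parts[0]
for part in parts[1:]:
    looped = combine(looped, part)

print(reduced == looped)
```

Both forms call the two-argument function on the running result and the next item, which is why any function with that shape, including a pre-configured pd.merge, can be passed to reduce.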
To pass the on=['org', 'name'] argument, you can use functools.partial to define the merge function:
merge = functools.partial(pd.merge, on=['org', 'name'])
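As a rough check of what functools.partial does here, the sketch below compares the pre-configured `merge` with a direct pd.merge call. The two small frames `left` and `right` are made-up data, not part of the original answer.

```python
import functools
import pandas as pd

# Two tiny frames sharing the key columns 'org' and 'name' (made-up data).
left = pd.DataFrame({'org': [1, 2], 'name': [10, 20], 'items_df1': [5, 6]})
right = pd.DataFrame({'org': [1, 2], 'name': [10, 20], 'items_df2': [7, 8]})

# partial fixes the keyword argument `on`, leaving a two-argument function.
merge = functools.partial(pd.merge, on=['org', 'name'])

via_partial = merge(left, right)
direct = pd.merge(left, right, on=['org', 'name'])

print(via_partial.equals(direct))
```

Because `merge` now takes exactly two positional arguments, it has the shape functools.reduce expects.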
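Specifying the suffixes parameter in functools.partial would allow only one fixed choice of suffixes, and here we need different suffixes for each pd.merge call. So I think it is easiest to prepare the DataFrames' column names before calling pd.merge: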
for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')},
              inplace=True)
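If you would rather not modify the input frames, the same renaming can be done without inplace=True. The following is a minimal sketch under that assumption; the three one-row frames and the name `renamed` are invented for illustration.

```python
import functools
import pandas as pd

# Three tiny frames sharing the key columns 'org' and 'name' (made-up data).
dfs = [
    pd.DataFrame({'org': [1], 'name': [7], 'items': [i], 'spend': [10 * i]})
    for i in (1, 2, 3)
]

# rename() without inplace=True returns a new frame and leaves dfs untouched.
renamed = [
    df.rename(columns={col: '{}_df{}'.format(col, i) for col in ('items', 'spend')})
    for i, df in enumerate(dfs, start=1)
]

# Merge the renamed frames on the shared keys, left to right.
merge = functools.partial(pd.merge, on=['org', 'name'])
result = functools.reduce(merge, renamed)

print(list(result.columns))
```

Since every non-key column already has a unique name, pd.merge never needs to apply its own suffixes.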
For example,
import pandas as pd
import numpy as np
import functools
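np.random.seed(2015)  # fix the seed so the output is reproducible
N = 50
# Nine DataFrames that share the key columns 'org' and 'name'
dfs = [pd.DataFrame(np.random.randint(5, size=(N,4)),
                    columns=['org', 'name', 'items', 'spend']) for i in range(9)]
# Give the non-key columns a per-frame suffix so they do not collide
for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')},
              inplace=True)
# Merge every frame on the shared keys, left to right
merge = functools.partial(pd.merge, on=['org', 'name'])
result = functools.reduce(merge, dfs)
print(result.head())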
yields
   org  name  items_df1  spend_df1  items_df2  spend_df2  items_df3  \
0    2     4          4          2          3          0          1
1    2     4          4          2          3          0          1
2    2     4          4          2          3          0          1
3    2     4          4          2          3          0          1
4    2     4          4          2          3          0          1

   spend_df3  items_df4  spend_df4  items_df5  spend_df5  items_df6  \
0          3          1          0          1          0          4
1          3          1          0          1          0          4
2          3          1          0          1          0          4
3          3          1          0          1          0          4
4          3          1          0          1          0          4

   spend_df6  items_df7  spend_df7  items_df8  spend_df8  items_df9  spend_df9
0          3          4          1          3          0          1          2
1          3          4          1          3          0          0          3
2          3          4          1          3          0          0          0
3          3          3          1          3          0          1          2
4          3          3          1          3          0          0          3