Try this:
I created two DataFrames, then set the 2nd through 4th rows of test_daily to duplicate the rows of test_master:
import numpy as np
import pandas as pd
test_master = pd.DataFrame(np.random.rand(3, 3), columns=['A', 'B', 'C'])
test_daily = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])
test_daily.iloc[1:4] = test_master[:3].values
print(test_master)
print(test_daily)
output:
A B C
0 0.009322 0.330057 0.082956
1 0.197500 0.010593 0.356774
2 0.147410 0.697779 0.421207
A B C
0 0.643062 0.335643 0.215443
1 0.009322 0.330057 0.082956
2 0.197500 0.010593 0.356774
3 0.147410 0.697779 0.421207
4 0.973867 0.873358 0.502973
Then add a MultiIndex level to identify which data came from which DataFrame:
test_master['master'] = 'master'
test_master.set_index('master', append=True, inplace=True)
test_daily['daily'] = 'daily'
test_daily.set_index('daily', append=True, inplace=True)
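To see what the append=True step does to the index, here is a minimal sketch on a hypothetical two-row frame (demo is illustrative only and not part of the data above):

```python
import pandas as pd

# Hypothetical toy frame, used only to show the effect of set_index(..., append=True).
demo = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
demo["master"] = "master"                     # constant label column
demo = demo.set_index("master", append=True)  # keep the row number, add the label as a second level

# Each row is now addressed by a (row number, label) pair.
print(demo.index.tolist())
print(list(demo.index.names))
```

Without append=True the original integer index would be replaced by the label, and the row positions would be lost.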
Now merge and drop duplicates as you suggested:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
merged = pd.concat([test_master, test_daily])
merged = merged.drop_duplicates().sort_index()
print(merged)
output:
A B C
master
0 daily 0.643062 0.335643 0.215443
master 0.009322 0.330057 0.082956
1 master 0.197500 0.010593 0.356774
2 master 0.147410 0.697779 0.421207
4 daily 0.973867 0.873358 0.502973
There you can see the combined DataFrame, with the source of each row labeled in the index. Now just slice out the daily data:
idx = pd.IndexSlice
print(merged.loc[idx[:, 'daily'], :])
output:
A B C
master
0 daily 0.643062 0.335643 0.215443
4 daily 0.973867 0.873358 0.502973
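For reference, the whole approach can be condensed into one reproducible sketch. The fixed values, the variable names master, daily and new_rows, and the shared level name "source" are assumptions made here for illustration; the steps themselves follow the answer above.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible.
master = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
daily = pd.DataFrame({"A": [9.0, 1.0, 2.0, 3.0, 8.0],
                      "B": [90.0, 10.0, 20.0, 30.0, 80.0]})

# Tag each frame with its origin as a second index level.
# A shared level name ("source") is an assumption made for tidier output.
master = master.assign(source="master").set_index("source", append=True)
daily = daily.assign(source="daily").set_index("source", append=True)

# master goes first so drop_duplicates keeps the master copy of shared rows.
merged = pd.concat([master, daily]).drop_duplicates().sort_index()

# Rows that exist only in daily.
idx = pd.IndexSlice
new_rows = merged.loc[idx[:, "daily"], :]
print(new_rows)
```

Note that drop_duplicates compares column values exactly, so this only works when the duplicated rows are bit-for-bit identical, as they are when copied with .values. The sort_index call matters too, since slicing a MultiIndex with pd.IndexSlice expects a sorted index.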