I have two pandas DataFrames that I am trying to combine into a single DataFrame. I set them up like this:
import pandas as pd
a = {'date':['1/1/2015 00:00','1/1/2015 00:15','1/1/2015 00:30'], 'num':[1,2,3]}
b = {'date':['1/1/2015 01:15','1/1/2015 01:30','1/1/2015 01:45'], 'num':[4,5,6]}
dfa = pd.DataFrame(a)
dfb = pd.DataFrame(b)
dfa['date'] = dfa['date'].apply(pd.to_datetime)
dfb['date'] = dfb['date'].apply(pd.to_datetime)
Then I find the earliest and latest timestamps across both DataFrames and create a new DataFrame whose date series runs from earliest to latest:
earliest = min(dfa['date'].min(), dfb['date'].min())
latest = max(dfa['date'].max(), dfb['date'].max())
date_range = pd.date_range(earliest, latest, freq='15min')
dfd = pd.DataFrame({'date':date_range})
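Next I want to merge everything into one DataFrame, using dfd as the base because it contains all the correct timestamps. So I merge dfd and dfa, and everything is fine: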
dfd = pd.merge(dfd, dfa, how = 'outer', on = 'date')
But when I merge the result with dfb, the date series gets messed up, and I can't figure out why.
dfd = pd.merge(dfd, dfb, how = 'outer', on = ['date','num'])
...yields:
                  date  num
0  2015-01-01 00:00:00  1.0
1  2015-01-01 00:15:00  2.0
2  2015-01-01 00:30:00  3.0
3  2015-01-01 00:45:00  NaN
4  2015-01-01 01:00:00  NaN
5  2015-01-01 01:15:00  NaN
6  2015-01-01 01:30:00  NaN
7  2015-01-01 01:45:00  NaN
8  2015-01-01 01:15:00  4.0
9  2015-01-01 01:30:00  5.0
10 2015-01-01 01:45:00  6.0
whereas I expected 4.0 to fill the 2015-01-01 01:15:00 time slot, and so on, without any new rows being created.
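One way to see where the extra rows come from is to ask merge to label each row's origin. The sketch below rebuilds the same frames and passes indicator=True, which adds a _merge column marking each row as left_only, right_only, or both. The name check is only illustrative. It assumes that a row counts as a match only when every column listed in on is equal, so a NaN in num on the dfd side would not pair with 4 on the dfb side.

```python
import pandas as pd

# Same data as in the question.
dfa = pd.DataFrame({'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
                    'num': [1, 2, 3]})
dfb = pd.DataFrame({'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
                    'num': [4, 5, 6]})

# Base frame with every 15-minute timestamp, then the first merge with dfa.
dfd = pd.DataFrame({'date': pd.date_range('2015-01-01 00:00', '2015-01-01 01:45', freq='15min')})
dfd = pd.merge(dfd, dfa, how='outer', on='date')

# Second merge on both columns; _merge records which side each row came from.
check = pd.merge(dfd, dfb, how='outer', on=['date', 'num'], indicator=True)
print(check['_merge'].value_counts())
```

If no row is labeled both, none of the (date, num) pairs in dfb matched a pair in dfd, which would explain why the dfb rows show up as additional rows instead of filling the gaps.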
Or if I try:
dfd = pd.merge(dfd, dfb, how = 'outer', on = 'date')
I get:
                 date  num_x  num_y
0 2015-01-01 00:00:00    1.0    NaN
1 2015-01-01 00:15:00    2.0    NaN
2 2015-01-01 00:30:00    3.0    NaN
3 2015-01-01 00:45:00    NaN    NaN
4 2015-01-01 01:00:00    NaN    NaN
5 2015-01-01 01:15:00    NaN    4.0
6 2015-01-01 01:30:00    NaN    5.0
7 2015-01-01 01:45:00    NaN    6.0
which is not what I want either (I only want a single num column). Any help would be appreciated.
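To make the target concrete, here is a minimal sketch of one way the desired single-column result might be produced. It is not necessarily the idiomatic approach. It assumes dfa and dfb never share a timestamp, and the names combined and result are made up for illustration. The idea is to stack the two frames with pd.concat first, so that only one num column ever exists, and then left-merge that onto the full 15-minute grid.

```python
import pandas as pd

# Same data as in the question.
dfa = pd.DataFrame({'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
                    'num': [1, 2, 3]})
dfb = pd.DataFrame({'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
                    'num': [4, 5, 6]})

# Stack the two frames row-wise so there is a single num column from the start.
combined = pd.concat([dfa, dfb], ignore_index=True)

# Full 15-minute grid from the earliest to the latest timestamp.
date_range = pd.date_range(combined['date'].min(), combined['date'].max(), freq='15min')
dfd = pd.DataFrame({'date': date_range})

# Left merge keeps exactly one row per grid timestamp; gaps become NaN.
result = dfd.merge(combined, how='left', on='date')
print(result)
```

If the two frames could contain the same timestamp, the left merge would produce duplicate rows for it, so that case would need to be resolved before merging.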