df_pairs:
city1 city2
0 sfo yyz
1 sfo yvr
2 sfo dfw
3 sfo ewr
Output of df_pairs.to_dict('records'):
[{'city1': 'sfo', 'city2': 'yyz'},
{'city1': 'sfo', 'city2': 'yvr'},
{'city1': 'sfo', 'city2': 'dfw'},
{'city1': 'sfo', 'city2': 'ewr'}]
data_df:
city 2016-02-02 00:00:00 2016-02-05 00:00:00 2016-02-01 00:00:00 2016-02-04 00:00:00 2016-02-03 00:00:00
0 sfo -33.63 -62.34 -35.70 -31.84 -33.87
1 yyz -24.31 -51.17 -22.07 -31.00 -23.00
2 yvr -24.31 -51.17 -22.07 -31.00 -23.00
3 dfw -32.17 -43.77 -34.84 0.27 -11.49
4 ewr -28.87 -59.66 -28.40 -32.94 -29.06
Output of data_df.to_dict('records'):
[{'city': 'sfo',
Timestamp('2016-02-02 00:00:00'): -33.63,
Timestamp('2016-02-05 00:00:00'): -62.34,
Timestamp('2016-02-01 00:00:00'): -35.7,
Timestamp('2016-02-04 00:00:00'): -31.84,
Timestamp('2016-02-03 00:00:00'): -33.87},
{'city': 'yyz',
Timestamp('2016-02-02 00:00:00'): -24.31,
Timestamp('2016-02-05 00:00:00'): -51.17,
Timestamp('2016-02-01 00:00:00'): -22.07,
Timestamp('2016-02-04 00:00:00'): -31.0,
Timestamp('2016-02-03 00:00:00'): -23.0},
{'city': 'yvr',
Timestamp('2016-02-02 00:00:00'): -24.31,
Timestamp('2016-02-05 00:00:00'): -51.17,
Timestamp('2016-02-01 00:00:00'): -22.07,
Timestamp('2016-02-04 00:00:00'): -31.0,
Timestamp('2016-02-03 00:00:00'): -23.0},
{'city': 'dfw',
Timestamp('2016-02-02 00:00:00'): -32.17,
Timestamp('2016-02-05 00:00:00'): -43.77,
Timestamp('2016-02-01 00:00:00'): -34.84,
Timestamp('2016-02-04 00:00:00'): 0.27,
Timestamp('2016-02-03 00:00:00'): -11.49},
{'city': 'ewr',
Timestamp('2016-02-02 00:00:00'): -28.87,
Timestamp('2016-02-05 00:00:00'): -59.66,
Timestamp('2016-02-01 00:00:00'): -28.4,
Timestamp('2016-02-04 00:00:00'): -32.94,
Timestamp('2016-02-03 00:00:00'): -29.06}]
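To make the example reproducible, both frames can be rebuilt from the output above. This is a minimal sketch: the values are copied from the printed output, and the date columns keep the same (unsorted) order as the printed data_df, with city as an ordinary column.

```python
import pandas as pd

# Pairs of cities to compare (copied from the df_pairs output above)
df_pairs = pd.DataFrame({
    'city1': ['sfo', 'sfo', 'sfo', 'sfo'],
    'city2': ['yyz', 'yvr', 'dfw', 'ewr'],
})

# Date columns in the same order as the printed data_df
dates = pd.to_datetime(['2016-02-02', '2016-02-05', '2016-02-01',
                        '2016-02-04', '2016-02-03'])
rows = [
    ['sfo', -33.63, -62.34, -35.70, -31.84, -33.87],
    ['yyz', -24.31, -51.17, -22.07, -31.00, -23.00],
    ['yvr', -24.31, -51.17, -22.07, -31.00, -23.00],
    ['dfw', -32.17, -43.77, -34.84, 0.27, -11.49],
    ['ewr', -28.87, -59.66, -28.40, -32.94, -29.06],
]
# 'city' is a regular column, followed by one Timestamp-labelled column per date
data_df = pd.DataFrame(rows, columns=['city'] + list(dates))
print(data_df.to_dict('records')[0])
```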
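So I have a DataFrame of city pairs called df_pairs. For each pair in df_pairs, I want to look up city1 and city2 in data_df, subtract one from the other, take the sign of the resulting difference time series, separate the positive and negative sign values, separate the positive and negative differences, and sum each of these over all the columns of data_df.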
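The steps just described can also be done without looping over columns. The sketch below is one possible approach, not the method from the question: it swaps the per-column merge for DataFrame.reindex, assumes data_df is indexed by city (it is rebuilt that way here), and assumes "sum over the columns" means one total per pair across all dates, which matches the expected output shown further down.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with city as the index (an assumption)
df_pairs = pd.DataFrame({'city1': ['sfo'] * 4,
                         'city2': ['yyz', 'yvr', 'dfw', 'ewr']})
dates = pd.to_datetime(['2016-02-02', '2016-02-05', '2016-02-01',
                        '2016-02-04', '2016-02-03'])
data_df = pd.DataFrame(
    [[-33.63, -62.34, -35.70, -31.84, -33.87],
     [-24.31, -51.17, -22.07, -31.00, -23.00],
     [-24.31, -51.17, -22.07, -31.00, -23.00],
     [-32.17, -43.77, -34.84, 0.27, -11.49],
     [-28.87, -59.66, -28.40, -32.94, -29.06]],
    index=pd.Index(['sfo', 'yyz', 'yvr', 'dfw', 'ewr'], name='city'),
    columns=dates)

# Look up both cities of every pair at once; rows follow df_pairs order
a = data_df.reindex(index=df_pairs['city1']).to_numpy()
b = data_df.reindex(index=df_pairs['city2']).to_numpy()
pair_index = pd.MultiIndex.from_frame(df_pairs[['city1', 'city2']])
diff_df = pd.DataFrame(b - a, index=pair_index, columns=data_df.columns)
diff_df_sign = np.sign(diff_df)

# Sum across the date columns to get one value per (city1, city2) pair
diff_df_sign_pos = diff_df_sign.clip(lower=0).sum(axis=1)
diff_df_sign_neg = diff_df_sign.clip(upper=0).sum(axis=1)
diff_df_pos = diff_df.clip(lower=0).sum(axis=1)
diff_df_neg = diff_df.clip(upper=0).sum(axis=1)
print(diff_df_sign_pos)
```

Because the subtraction happens on aligned NumPy arrays, no column labels are involved in the arithmetic, so nothing can be misaligned.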
import numpy as np
import pandas as pd

# right_index=True below assumes data_df is indexed by city
data_df = data_df.set_index('city')

diff_df_sign_pos = diff_df_sign_neg = diff_df_pos = diff_df_neg = 0
for i in range(len(data_df.columns)):
    # Look up column i for city1 (a) and for city2 (b) of every pair
    a = pd.merge(df_pairs[['city1', 'city2']], data_df.iloc[:, [i]], left_on='city1', right_index=True, how='left').set_index(['city1', 'city2'])
    b = pd.merge(df_pairs[['city1', 'city2']], data_df.iloc[:, [i]], left_on='city2', right_index=True, how='left').set_index(['city1', 'city2'])
    diff_df = b - a
    diff_df_sign = np.sign(diff_df)
    # Accumulate the positive/negative signs and positive/negative differences
    diff_df_sign_pos += diff_df_sign.clip(lower=0)
    diff_df_sign_neg += diff_df_sign.clip(upper=0)
    diff_df_pos += diff_df.clip(lower=0)
    diff_df_neg += diff_df.clip(upper=0)
If you run the code above, you will see that the final values of diff_df_sign_pos, diff_df_sign_neg, diff_df_pos and diff_df_neg are NaN.
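A likely contributor to the NaN, assuming data_df is indexed by city so that the merges succeed: each iteration's diff_df has a single column labelled with a different date, and pandas aligns DataFrame arithmetic on column labels as well as row labels. Adding frames whose columns do not overlap therefore gives NaN. The toy frames below (hypothetical values, not from the question) show the effect and one way around it.

```python
import pandas as pd

# Two one-column results with the same row labels but different column labels,
# like diff_df from two different loop iterations (values are made up)
idx = pd.MultiIndex.from_tuples([('sfo', 'yyz'), ('sfo', 'ewr')],
                                names=['city1', 'city2'])
first = pd.DataFrame({pd.Timestamp('2016-02-02'): [1.0, 1.0]}, index=idx)
second = pd.DataFrame({pd.Timestamp('2016-02-05'): [1.0, -1.0]}, index=idx)

total = 0
total += first   # 0 + DataFrame: simply a copy of first
total += second  # columns are aligned by label and do not overlap, so NaN
print(total)

# Taking the single column as a Series drops the column label from the
# arithmetic, so only the (city1, city2) row labels are aligned
fixed = first.iloc[:, 0] + second.iloc[:, 0]
print(fixed)
```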
For example, the final result for diff_df_sign_pos should look like this:
2016-02-03 00:00:00
city1 city2
sfo yyz 5.0
yvr 5.0
dfw 5.0
ewr 4.0
This tells us that all 5 differences between sfo and each of yyz, yvr and dfw are positive, while for ewr only 4 of the 5 are.
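For comparison, here is a small change to the original loop that reproduces those counts (5, 5, 5, 4). It is a sketch under the assumption that data_df is indexed by city: each per-date lookup is taken as a Series, so accumulation aligns only on the (city1, city2) row labels. The result is a Series indexed by pair rather than the one-column frame shown above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with city as the index (an assumption)
df_pairs = pd.DataFrame({'city1': ['sfo'] * 4,
                         'city2': ['yyz', 'yvr', 'dfw', 'ewr']})
dates = pd.to_datetime(['2016-02-02', '2016-02-05', '2016-02-01',
                        '2016-02-04', '2016-02-03'])
data_df = pd.DataFrame(
    [[-33.63, -62.34, -35.70, -31.84, -33.87],
     [-24.31, -51.17, -22.07, -31.00, -23.00],
     [-24.31, -51.17, -22.07, -31.00, -23.00],
     [-32.17, -43.77, -34.84, 0.27, -11.49],
     [-28.87, -59.66, -28.40, -32.94, -29.06]],
    index=pd.Index(['sfo', 'yyz', 'yvr', 'dfw', 'ewr'], name='city'),
    columns=dates)

diff_df_sign_pos = diff_df_sign_neg = diff_df_pos = diff_df_neg = 0
for col in data_df.columns:
    # Look up this date for city1 (a) and city2 (b), keeping it as a Series
    a = df_pairs.merge(data_df[[col]], left_on='city1', right_index=True,
                       how='left').set_index(['city1', 'city2'])[col]
    b = df_pairs.merge(data_df[[col]], left_on='city2', right_index=True,
                       how='left').set_index(['city1', 'city2'])[col]
    diff = b - a  # same (city1, city2) index in every iteration
    sign = np.sign(diff)
    diff_df_sign_pos += sign.clip(lower=0)
    diff_df_sign_neg += sign.clip(upper=0)
    diff_df_pos += diff.clip(lower=0)
    diff_df_neg += diff.clip(upper=0)

print(diff_df_sign_pos)
```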