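I am trying to fill a DataFrame (elist) with cumulative firm returns and cumulative market returns. This can be done by looping over the elist DataFrame with iterrows, see this link: https://stackoverflow.com/questions/42593859/why-cant-iterrows-do-math-and-instead-returns-integer-values-where-these-shou
However, this is slow. I am looking for a more efficient, faster solution.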
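The raw returns that feed the cumulative return calculation come from two DataFrames (ri, rm). The results should be written to the return and vwretd columns of elist. See the example below, which uses the data in this file: https://www.dropbox.com/s/r69b54q2zw1wp7q/cumrets.zip?dl=0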
Before running the iterrows loop, elist looks like this:
   permno      begdat      enddat  return  vwretd
0   11628  2012-03-31  2013-03-31     NaN     NaN
1   11628  2012-06-30  2013-06-30     NaN     NaN
2   11628  2012-09-30  2013-09-30     NaN     NaN
3   11628  2012-12-31  2013-12-31     NaN     NaN
4   11628  2013-03-31  2014-03-31     NaN     NaN
After running the loop, elist should look like this:
   permno      begdat      enddat    return    vwretd
0   11628  2012-03-31  2013-03-31  0.212355  0.133429
1   11628  2012-06-30  2013-06-30  0.274788  0.198380
2   11628  2012-09-30  2013-09-30  0.243590  0.198079
3   11628  2012-12-31  2013-12-31  0.299277  0.304479
4   11628  2013-03-31  2014-03-31  0.303147  0.208454
Here is the code that relies on iterrows, which is slow:
import os, sys
import pandas as pd
import numpy as np

rm = pd.read_csv('rm_so.csv')        # market return
ri = pd.read_csv('ri_so.csv')        # firm return
elist = pd.read_csv('elist_so.csv')  # table to be filled with cumulative returns over a period (begdat to enddat)

for index, row in elist.iterrows():
    # window of dates in (begdat, enddat] for this row
    in_window = (rm['date'] > row['begdat']) & (rm['date'] <= row['enddat'])
    # fill cumulative market return
    elist.loc[index, 'vwretd'] = rm.loc[in_window, 'vwretd'].product() - 1
    # fill cumulative firm return, restricted to this row's permno
    r = ri.loc[ri['permno'] == row['permno']]
    elist.loc[index, 'return'] = r.loc[(r['date'] > row['begdat']) & (r['date'] <= row['enddat']), 'ret'].product() - 1
It would be great to see this process run faster!
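One possible direction, sketched here rather than benchmarked on the linked data: since every row needs a product over a date window, the products can be read off a running cumulative product with two binary searches per row instead of a boolean mask per row. The helper names `window_products` and `fill_cumulative_returns` are made up for this illustration. The sketch assumes that `vwretd` and `ret` hold gross returns (1 + r), as the `.product() - 1` in the loop suggests, that the date columns parse with `pd.to_datetime`, and that no gross return is exactly zero (the ratio of cumulative products would break on a zero).

```python
import numpy as np
import pandas as pd


def window_products(dates, gross, beg, end):
    """Product of `gross` over dates in (beg, end] for each (beg, end) pair.

    Hypothetical helper. `dates` must be sorted ascending, and `gross`
    must contain no zeros because the result is a ratio.
    """
    # cum[k] is the product of the first k gross returns, with cum[0] = 1
    cum = np.concatenate(([1.0], np.cumprod(gross)))
    # side="right" counts the dates <= the searched value
    lo = np.searchsorted(dates, beg, side="right")
    hi = np.searchsorted(dates, end, side="right")
    return cum[hi] / cum[lo]


def fill_cumulative_returns(elist, ri, rm):
    """Return a copy of elist with 'vwretd' and 'return' filled in."""
    out = elist.copy()
    beg = pd.to_datetime(out["begdat"]).to_numpy()
    end = pd.to_datetime(out["enddat"]).to_numpy()

    # Market: one sorted series shared by all rows.
    # fillna(1.0) mirrors Series.product(), which skips NaN by default.
    m = rm.assign(date=pd.to_datetime(rm["date"])).sort_values("date")
    out["vwretd"] = window_products(
        m["date"].to_numpy(), m["vwretd"].fillna(1.0).to_numpy(), beg, end
    ) - 1

    # Firm: one sorted series per permno, so loop over firms, not rows.
    f = ri.assign(date=pd.to_datetime(ri["date"])).sort_values(["permno", "date"])
    permnos = out["permno"].to_numpy()
    # Zeros match the loop's result for a permno with no data (empty product - 1)
    firm = np.zeros(len(out))
    for permno, g in f.groupby("permno"):
        mask = permnos == permno
        if mask.any():
            firm[mask] = window_products(
                g["date"].to_numpy(), g["ret"].fillna(1.0).to_numpy(),
                beg[mask], end[mask],
            ) - 1
    out["return"] = firm
    return out
```

With the DataFrames from the loop above, the call would be `elist = fill_cumulative_returns(elist, ri, rm)`. The results can differ from the loop in the last few floating-point digits, because a ratio of cumulative products is not computed in the same order as a direct product over the window.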