In my tests, this was the fastest, most accurate, and most efficient implementation:
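def HA(df):
    # Written for older pandas: set_value/get_value were removed in pandas 1.0.
    df['HA_Close'] = (df['Open'] + df['High'] + df['Low'] + df['Close']) / 4

    # Switch to a positional index so that i - 1 addresses the previous row
    idx = df.index.name
    df.reset_index(inplace=True)

    for i in range(0, len(df)):
        if i == 0:
            # First row: seed HA_Open from the raw Open and Close
            df.set_value(i, 'HA_Open', ((df.get_value(i, 'Open') + df.get_value(i, 'Close')) / 2))
        else:
            # Later rows: average of the previous row's HA_Open and HA_Close
            df.set_value(i, 'HA_Open', ((df.get_value(i - 1, 'HA_Open') + df.get_value(i - 1, 'HA_Close')) / 2))

    # Restore the original index if it had a name
    if idx:
        df.set_index(idx, inplace=True)

    df['HA_High'] = df[['HA_Open', 'HA_Close', 'High']].max(axis=1)
    df['HA_Low'] = df[['HA_Open', 'HA_Close', 'Low']].min(axis=1)
    return df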
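Since set_value and get_value were deprecated in pandas 0.21 and removed in 1.0, the function above will not run on current versions. Below is a minimal sketch of the same loop using the .at scalar accessor, assuming a recent pandas. The name heikin_ashi_at and the sample data are hypothetical, and I have not benchmarked this variant against the timings reported in this answer.

```python
import pandas as pd


def heikin_ashi_at(df):
    """Same recurrence as HA(), written with .at (hypothetical helper)."""
    out = df.copy()  # unlike HA(), leave the caller's frame untouched
    out['HA_Close'] = (out['Open'] + out['High'] + out['Low'] + out['Close']) / 4

    # Use a positional 0..n-1 index so that i - 1 is always the previous row.
    original_index = out.index
    out = out.reset_index(drop=True)

    out['HA_Open'] = 0.0  # create the float column before filling it
    for i in range(len(out)):
        if i == 0:
            out.at[i, 'HA_Open'] = (out.at[i, 'Open'] + out.at[i, 'Close']) / 2
        else:
            out.at[i, 'HA_Open'] = (out.at[i - 1, 'HA_Open'] + out.at[i - 1, 'HA_Close']) / 2

    out.index = original_index
    out['HA_High'] = out[['HA_Open', 'HA_Close', 'High']].max(axis=1)
    out['HA_Low'] = out[['HA_Open', 'HA_Close', 'Low']].min(axis=1)
    return out


# Made-up prices, for illustration only.
sample = pd.DataFrame(
    {
        'Open': [10.0, 11.0, 12.0],
        'High': [12.0, 13.0, 14.0],
        'Low': [9.0, 10.0, 11.0],
        'Close': [11.0, 12.0, 13.0],
    },
    index=pd.date_range('2020-01-01', periods=3, name='Date'),
)
ha_sample = heikin_ashi_at(sample)
print(ha_sample[['HA_Open', 'HA_Close', 'HA_High', 'HA_Low']])
```

This version works on a copy and restores the original index object directly, so it also handles an unnamed index; whether the copy is worth its cost depends on the size of the data.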
Here is my test code (essentially, I used the algorithm provided in this post as the baseline for the speed benchmark):
import time
from collections import namedtuple

import quandl

df = quandl.get("NSE/NIFTY_50", start_date='1997-01-01')

def test_HA():
    print('HA Test')

    # Variant 1: the HA() function above (set_value / get_value)
    start = time.time()
    HA(df)
    end = time.time()
    print('Time taken by set and get value functions for HA {}'.format(end-start))

    # Variant 2: the algorithm from this post, using .ix
    # (.ix was removed in pandas 1.0, so this needs an older pandas)
    start = time.time()
    df['HA_Close_t'] = (df['Open'] + df['High'] + df['Low'] + df['Close']) / 4
    nt = namedtuple('nt', ['Open', 'Close'])
    previous_row = nt(df.ix[0, 'Open'], df.ix[0, 'Close'])
    i = 0
    for row in df.itertuples():
        ha_open = (previous_row.Open + previous_row.Close) / 2
        df.ix[i, 'HA_Open_t'] = ha_open
        # This variant carries the raw Close forward, not HA_Close
        previous_row = nt(ha_open, row.Close)
        i += 1
    df['HA_High_t'] = df[['HA_Open_t', 'HA_Close_t', 'High']].max(axis=1)
    df['HA_Low_t'] = df[['HA_Open_t', 'HA_Close_t', 'Low']].min(axis=1)
    end = time.time()
    print('Time taken by ix (iloc, loc) functions for HA {}'.format(end-start))

test_HA()
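Here is the output I got on an i7 processor (the absolute numbers will vary with your processor speed, but I expect the relative difference to be similar):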
HA Test
Time taken by set and get value functions for HA 0.05005788803100586
Time taken by ix (iloc, loc) functions for HA 0.9360761642456055
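In my experience with Pandas, functions such as ix, loc, and iloc are slower than the set_value and get_value functions. In addition, computing a column's values from the column itself with the shift function gives wrong results.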
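One way to see why shift goes wrong here: HA_Open is a recurrence, so each row needs the previous row's already computed HA_Open. A single shift expression only reads the values present in the column at the time it is evaluated. The sketch below uses made-up HA_Close values and an assumed seed, and shows just one possible form of a shift attempt, not necessarily the one that was tried originally.

```python
import pandas as pd

# Made-up HA_Close values and seed, for illustration only.
ha_close = pd.Series([10.5, 11.5, 12.5, 13.5])
seed = 10.5  # assumed HA_Open of the first row

# Recurrence: each HA_Open uses the previous *computed* HA_Open.
ha_open_loop = [seed]
for i in range(1, len(ha_close)):
    ha_open_loop.append((ha_open_loop[i - 1] + ha_close.iloc[i - 1]) / 2)

# One-shot shift attempt: when shift() reads the column, it holds only the
# seed, so every row after the second gets NaN instead of a real value.
ha_open = pd.Series([seed] + [float('nan')] * (len(ha_close) - 1))
ha_open_shift = (ha_open.shift(1) + ha_close.shift(1)) / 2
ha_open_shift.iloc[0] = seed

print(ha_open_loop)
print(ha_open_shift.tolist())
```

The loop yields a value for every row, while the shift version is only right for the first two rows, because the later rows depend on results that had not been computed yet.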