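I know there are plenty of resources online for outlier removal, but I have not managed to get exactly what I want, so I am posting here. I have an array (or DataFrame) with 4 columns. I want to remove rows from the DataFrame based on the outliers in one of the columns. Below is what I have tried, but it is not perfect.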
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def outliers2(data2, m=4.5):
    c = []
    data = data2[:, 1]                      # choose the column to test
    d = np.abs(data - np.median(data))      # absolute deviation from the median
    mdev = np.median(d)                     # median absolute deviation (MAD)
    for i in range(len(data)):
        if abs(data[i] - mdev) < m * np.std(data):
            c.append(data2[i])              # keep the whole row
    return c
# b is the original DataFrame holding the four columns named below
x = pd.DataFrame(outliers2(np.array(b)))
column = ['t', 'orig_w', 'filt_w', 'smt_w']
x.columns = column

# Plot
plt.rcParams['figure.figsize'] = [10, 8]
plt.plot(b.t, b.orig_w, 'o', label='Original', alpha=0.8)                   # original
plt.plot(x.t, x.orig_w, '.', c='r', label='Outlier removed', alpha=0.8)   # after outlier removal
plt.legend()
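The figure shows what the result looks like: the red points, after outlier removal, are plotted on top of the blue original points. I would really like to get rid of those vertical groups of points around the x ~ 0 mark. What should I do?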
A link to the full data file is provided here: https://drive.google.com/file/d/1aYPX31zE4P-LW5Hva6fdqNUf4fwYHpJa/view?usp=sharing
The green circles show the kind of points I would like to get rid of.
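
For comparison, here is a minimal sketch of a plain median/MAD filter. The attempt above computes the deviations from the median, but then compares each value against `mdev` and the standard deviation, so the sketch shows the check in its usual form. The function name `mad_filter`, the 1.4826 scaling factor (which makes the MAD comparable to a standard deviation for roughly normal data) and the handling of a zero MAD are my own assumptions, not something from a library. It is a global filter, so it may well leave the vertical groups near x ~ 0 in place if their values lie within the overall spread of the column.

```python
import numpy as np
import pandas as pd


def mad_filter(df, column, m=4.5):
    """Return the rows of df whose `column` value lies within
    m scaled MADs of the column median (hypothetical helper)."""
    values = df[column].to_numpy(dtype=float)
    median = np.median(values)
    abs_dev = np.abs(values - median)       # absolute deviation from the median
    mad = np.median(abs_dev)                # median absolute deviation
    if mad == 0:
        # Assumption: with no spread estimate, nothing is removed.
        return df.copy()
    scaled_mad = 1.4826 * mad               # comparable to a std for normal data
    keep = abs_dev <= m * scaled_mad        # boolean mask, one entry per row
    return df[keep]
```

With the DataFrame from the question this would be called as `x = mad_filter(b, 'orig_w', m=4.5)`, which keeps all four columns and the original column names, so the renaming step is not needed.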