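I am trying to fit a sum of Gaussians using scikit-learn (http://scikit-learn.org/stable/index.html), because the scikit-learn GaussianMixture (http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.fit) seems more robust than using curve_fit.
Problem: It doesn't do a good job of fitting a truncated part of even a single Gaussian peak: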
from sklearn import mixture
import matplotlib.pyplot as plt
from scipy.stats import norm  # stands in for matplotlib.mlab.normpdf, which newer matplotlib no longer provides
import numpy as np

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
data = np.random.randn(10000)
data = [[x] for x in data]  # GaussianMixture expects shape (n_samples, n_features)
clf.fit(data)

data = [item for sublist in data for item in sublist]  # flatten again for plotting
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
h = plt.hist(data, range=(rangeMin, rangeMax), density=True)  # density=True replaces the old normed=True
x = np.linspace(rangeMin, rangeMax)
plt.plot(x, norm.pdf(x, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])))
gives
Now changing data = [[x] for x in data]
to data = [[x] for x in data if x < 0]
in order to truncate the distribution returns
Any ideas how to get the truncation fitted properly?
Note: The distribution isn't necessarily truncated in the middle; anywhere between 50% and 100% of the full distribution may be left.
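To put numbers on the mismatch for different amounts of truncation, here is a minimal sketch. It assumes a standard normal sample cut from above at the quantile that leaves a given fraction of the mass; the helper name `fit_truncated_sample`, the fixed seed, and the chosen fractions are illustrative and not part of the original code.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def fit_truncated_sample(keep_fraction, n_samples=10000, seed=0):
    """Fit a 1-component GaussianMixture to the lower `keep_fraction` of a standard normal sample."""
    rng = np.random.default_rng(seed)
    sample = rng.standard_normal(n_samples)
    # Upper cutoff that leaves `keep_fraction` of the probability mass (inf for 1.0).
    cutoff = norm.ppf(keep_fraction)
    kept = sample[sample < cutoff].reshape(-1, 1)  # shape (n_samples, 1) for scikit-learn
    gmm = GaussianMixture(n_components=1, covariance_type="full", random_state=seed)
    gmm.fit(kept)
    return gmm.means_[0, 0], np.sqrt(gmm.covariances_[0, 0, 0])


for fraction in (0.5, 0.75, 1.0):
    mean, std = fit_truncated_sample(fraction)
    print(f"kept {fraction:.0%}: fitted mean = {mean:+.3f}, fitted std = {std:.3f}")
```

With a single component, the fit essentially reproduces the mean and standard deviation of whatever sample it is given. For a standard normal cut at 0 those approach -sqrt(2/pi) ≈ -0.80 and sqrt(1 - 2/pi) ≈ 0.60, so the fitted curve describes the truncated sample rather than the underlying Gaussian.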
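I would also be happy if anyone could point me to alternatives. I have only tried curve_fit, but couldn't get it to do anything useful as soon as more than two peaks are involved.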
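One possible direction, sketched here under strong assumptions, is to maximize the likelihood of a truncated normal directly instead of fitting an untruncated one. The sketch assumes a single peak, a known upper truncation point, and scipy being available; `fit_upper_truncated_normal` is a hypothetical helper, and this does not address the multi-peak case.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_upper_truncated_normal(sample, cutoff):
    """Estimate (mu, sigma) of a normal that is only observed below a known `cutoff`."""
    sample = np.asarray(sample, dtype=float)

    def negative_log_likelihood(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
        # Density of a normal restricted to x < cutoff is pdf(x) / cdf(cutoff).
        log_pdf = norm.logpdf(sample, loc=mu, scale=sigma)
        log_mass = norm.logcdf(cutoff, loc=mu, scale=sigma)
        return -(log_pdf - log_mass).sum()

    # Start from the naive moments of the truncated sample.
    start = [sample.mean(), np.log(sample.std())]
    result = minimize(negative_log_likelihood, start, method="Nelder-Mead")
    mu, log_sigma = result.x
    return mu, np.exp(log_sigma)


rng = np.random.default_rng(0)
full = rng.standard_normal(10000)
kept = full[full < 0]  # same truncation as in the question
mu, sigma = fit_upper_truncated_normal(kept, cutoff=0.0)
print(f"estimated mu = {mu:.3f}, sigma = {sigma:.3f}")
```

The estimates are noisier than a fit to the full sample, because the mean and width are strongly correlated when only part of the peak is visible. If the truncation point is not known, it would have to be estimated as well, for example from the largest observed value, which is a further assumption.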