The statistic you are looking for is the percentile (http://en.wikipedia.org/wiki/Percentile):
% generate sample data
data = [randn(900,1); randn(50,1)*3 + 5; randn(50,1)*3 - 5];
subplot(121), hist(data)
subplot(122), boxplot(data)
% find 5th, 95th percentiles (range that contains 90% of the data)
limits = prctile(data, [5 95])
% find data in that range
reducedData = data(limits(1) < data & data < limits(2));
There are other ways to detect outliers (http://en.wikipedia.org/wiki/Outlier), such as the IQR outlier test (http://en.wikipedia.org/wiki/Outlier#Identifying_outliers) and the three-standard-deviation rule (http://en.wikipedia.org/wiki/Three_sigma_rule), among others:
%% three standard deviation rule
z = 3;
bounds = z * std(data)
reducedData = data( abs(data-mean(data)) < bounds );
and
%% IQR outlier test
Q = prctile(data, [25 75]);
IQ = Q(2)-Q(1);
%a = 1.5; % mild outlier
a = 3.0; % extreme outlier
bounds = [Q(1)-a*IQ , Q(2)+a*IQ]
reducedData = data(bounds(1) < data & data < bounds(2));
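If you also want to look at the points that were discarded, a logical mask keeps both halves of the split. This is only a sketch: `outlierMask`, `outliers` and the fixed random seed are names and choices introduced here for illustration, and `~outlierMask` keeps points lying exactly on a bound, which the strict inequalities above would drop.

```matlab
% Sketch: split data into outliers and retained points with the IQR test.
rng(0);                                   % fixed seed (assumption, for repeatability)
data = [randn(900,1); randn(50,1)*3 + 5; randn(50,1)*3 - 5];

Q  = prctile(data, [25 75]);              % first and third quartiles
IQ = Q(2) - Q(1);                         % interquartile range
a  = 1.5;                                 % 1.5 = mild outliers, 3.0 = extreme outliers

% true where a point lies outside [Q1 - a*IQ, Q3 + a*IQ]
outlierMask = data < Q(1) - a*IQ | data > Q(2) + a*IQ;

outliers    = data(outlierMask);          % points flagged by the test
reducedData = data(~outlierMask);         % points kept

fprintf('flagged %d of %d points\n', numel(outliers), numel(data));
```

Keeping the mask rather than only `reducedData` also makes it easy to apply the same selection to other variables of the same length.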
By the way, if you want the z value (|X| < z) that corresponds to 90% of the area under the standard normal curve, use:
area = 0.9; % central area, P(|X| < z)
z = norminv(1-(1-area)/2)
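If norminv (Statistics Toolbox) is not available, the same value can be obtained with erfinv from base MATLAB, because for the standard normal distribution norminv(p) equals sqrt(2)*erfinv(2*p - 1). A minimal sketch under that assumption; `p` and `zCheck` are names introduced here for illustration:

```matlab
% Sketch: z value for a given central area without the Statistics Toolbox.
area   = 0.9;                          % central area, P(|X| < z)
p      = 1 - (1 - area)/2;             % cumulative probability at +z (0.95 here)
zCheck = sqrt(2) * erfinv(2*p - 1);    % about 1.6449 for area = 0.9
```

The result can be used as `z` in the three-standard-deviation snippet above if you want the bounds to cover a chosen fraction of a normal distribution instead of a fixed 3 sigma.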