Overview
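I am using the WEKA API 3.7.10 (developer version) to run predictions with my pre-built .model files.
I built 25 models: five outcome variables for each of five algorithms.
- J48 decision tree http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html
- Alternating decision tree
- Random forest
- LogitBoost
- Random subspace
I am having problems with J48, random subspace and random forest.
Necessary files
Below is the ARFF representation of my data after creation: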
@relation WekaData
@attribute ageDiagNum numeric
@attribute raceGroup {Black,Other,Unknown,White}
@attribute stage3 {0,I,IIA,IIB,IIIA,IIIB,IIIC,IIINOS,IV,'UNK Stage'}
@attribute m3 {M0,M1,MX}
@attribute reasonNoCancerSurg {'Not performed, patient died prior to recommended surgery','Not recommended','Not recommended, contraindicated due to other conditions','Recommended but not performed, patient refused','Recommended but not performed, unknown reason','Recommended, unknown if performed','Surgery performed','Unknown; death certificate or autopsy only case'}
@attribute ext2 {00,05,10,11,13,14,15,16,17,18,20,21,23,24,25,26,27,28,30,31,33,34,35,36,37,38,40,50,60,70,80,85,99}
@attribute time2 {}
@attribute time4 {}
@attribute time6 {}
@attribute time8 {}
@attribute time10 {}
@data
65,White,IIA,MX,'Not recommended, contraindicated due to other conditions',14,?,?,?,?,?
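A note on the header above: time2 to time10 are declared with an empty value list ({}), so a nominal class built from such a declaration would have zero class values. That would be consistent with the ArrayIndexOutOfBoundsException: 0 reported under Problems, although the listing alone does not prove it. The following is a minimal sketch of a pre-flight check, using only the Java standard library; ArffHeaderCheck and emptyNominalAttributes are hypothetical names, not part of WEKA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of WEKA): scans ARFF header lines and
// reports nominal attributes that are declared with an empty value list.
// Simplification: attribute names that themselves contain '{' are not handled.
final class ArffHeaderCheck {

    /** Returns the names of attributes declared as "{}" (no nominal values). */
    static List<String> emptyNominalAttributes(List<String> headerLines) {
        List<String> empty = new ArrayList<String>();
        for (String line : headerLines) {
            String trimmed = line.trim();
            if (!trimmed.toLowerCase().startsWith("@attribute")) {
                continue; // skip @relation, @data, comments and data rows
            }
            int open = trimmed.indexOf('{');
            int close = trimmed.lastIndexOf('}');
            if (open < 0 || close < open) {
                continue; // numeric, string or date attribute
            }
            String values = trimmed.substring(open + 1, close).trim();
            if (values.isEmpty()) {
                // The name is the token between "@attribute" and the "{".
                empty.add(trimmed.substring("@attribute".length(), open).trim());
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
            "@attribute ageDiagNum numeric",
            "@attribute m3 {M0,M1,MX}",
            "@attribute time2 {}",
            "@attribute time4 {}");
        System.out.println(emptyNominalAttributes(header));
    }
}
```

Running a check like this on the generated header before loading a model makes an empty class declaration visible early, instead of surfacing as an index error inside the classifier.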
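I need to obtain the binary attributes time2 to time10 from their respective models.
Below is the code snippet I use to get predictions from all the model files: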
    private static Map<String, Object> predict(Instances instances,
            Classifier classifier, int attributeIndex) {
        Map<String, Object> map = new LinkedHashMap<String, Object>();
        int instanceIndex = 0; // do not change, equal to row 1
        double[] percentage = { 0 };
        double outcomeValue = 0;
        AbstractOutput abstractOutput = null;
        if(classifier.getClass() == RandomForest.class || classifier.getClass() == RandomSubSpace.class) {
            // has problems predicting time2 to time10
            instances.setClassIndex(5);
        } else {
            // works as intended in LogitBoost and ADTree
            instances.setClassIndex(attributeIndex);
        }
        try {
            outcomeValue = classifier.classifyInstance(instances.instance(0));
            percentage = classifier.distributionForInstance(instances
                    .instance(instanceIndex));
        } catch (Exception e) {
            e.printStackTrace();
        }
        map.put("Class", outcomeValue);
        if (percentage.length > 0) {
            double percentageRaw = 0;
            if (outcomeValue == new Double(1)) {
                percentageRaw = percentage[1];
            } else {
                percentageRaw = 1 - percentage[0];
            }
            map.put("Percentage", percentageRaw);
        } else {
            // because J48 returns an error if percentage[i] because it's empty
            map.put("Percentage", new Double(0));
        }
        return map;
    }
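In the snippet above, percentage[1] and 1 - percentage[0] are the same number for a two-class distribution, so both branches report the probability of the class at index 1, and either would fail on an array shorter than two elements. The following is a minimal sketch of reading the distribution defensively. DistributionUtil is a hypothetical helper, not part of WEKA; it only assumes a double[] of the kind returned by distributionForInstance.

```java
// Hypothetical helper (not part of WEKA): works on a probability array
// such as the one returned by Classifier.distributionForInstance(...).
final class DistributionUtil {

    /** Index of the largest probability, or -1 for an empty array. */
    static int predictedIndex(double[] distribution) {
        int best = -1;
        for (int i = 0; i < distribution.length; i++) {
            if (best < 0 || distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Probability of the given class, or NaN if the array does not cover it. */
    static double probabilityOf(double[] distribution, int classValueIndex) {
        if (classValueIndex < 0 || classValueIndex >= distribution.length) {
            return Double.NaN; // empty or too-short distribution
        }
        return distribution[classValueIndex];
    }

    public static void main(String[] args) {
        double[] twoClass = { 0.3, 0.7 };
        System.out.println(predictedIndex(twoClass));
        System.out.println(probabilityOf(twoClass, 1));
        // An empty array models a class attribute with no declared values.
        System.out.println(predictedIndex(new double[0]));
    }
}
```

Returning NaN rather than 0 for an empty distribution keeps "no prediction available" distinguishable from "probability zero" in the resulting map.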
These are the models I am using to predict the outcome time2, so we will be using index 6:
    instances.setClassIndex(5);
- ADTree model for predicting time2 https://www.dropbox.com/s/xjdmfvqi3gnal6b/bosom.100k.2.adt.MODEL
- J48 model for predicting time2 https://www.dropbox.com/s/hlk6enuo8xk0e34/bosom.100k.2.j48.MODEL
- RandomForest model for predicting time2 https://www.dropbox.com/s/7enbiuvzi9y0wd7/bosom.100k.2.rf.MODEL
- LogitBoost model for predicting time2 https://www.dropbox.com/s/6k1l3b38k20e0cb/bosom.100k.2.lb.MODEL
- RandomSubSpace model for predicting time2 https://www.dropbox.com/s/8a0ferlorfpjznm/bosom.100k.2.rs.MODEL
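One detail worth checking here: setClassIndex takes a zero-based index, and in the ARFF header above time2 sits at zero-based position 6, while position 5 is ext2. The following is a minimal sketch of resolving the index by name instead of hard-coding it. ClassIndexLookup is a hypothetical stand-in that uses a plain list of the header's attribute names; with WEKA itself, instances.attribute("time2").index() should serve the same purpose.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a name-based lookup; not part of WEKA.
final class ClassIndexLookup {

    // Attribute names in the order declared in the ARFF header above.
    static final List<String> ATTRIBUTES = Arrays.asList(
        "ageDiagNum", "raceGroup", "stage3", "m3", "reasonNoCancerSurg",
        "ext2", "time2", "time4", "time6", "time8", "time10");

    /** Zero-based position of the attribute; throws if the name is unknown. */
    static int indexOf(String name) {
        int idx = ATTRIBUTES.indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown attribute: " + name);
        }
        return idx;
    }

    public static void main(String[] args) {
        System.out.println("time2 -> " + indexOf("time2"));
        System.out.println("ext2  -> " + indexOf("ext2"));
    }
}
```

Looking the index up by name avoids an off-by-one between the one-based column count and the zero-based class index.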
Problems
As I said before, LogitBoost and ADTree have no problems with this straightforward method, unlike the other three, since I followed the "Use WEKA in your Java code" tutorial: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code
- [Solved] With my tweaks, RandomForest http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/RandomForest.html and RandomSubSpace http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/RandomSubSpace.html throw an ArrayIndexOutOfBoundsException when asked to predict time2 to time10.
java.lang.ArrayIndexOutOfBoundsException: 0
at weka.classifiers.meta.Bagging.distributionForInstance(Bagging.java:586)
at weka.classifiers.trees.RandomForest.distributionForInstance(RandomForest.java:602)
at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
The stack trace points to the following line as the root of the error:
outcomeValue = classifier.classifyInstance(instances.instance(0));
Solution: I had some copy-paste errors in the ARFF file creation for the binary variables time2 to time10, in the assignment of FastVector<String>() to the FastVector<Attribute>() object. All ten of my RandomForest and RandomSubSpace models now work fine!
- [Solved] The J48 decision tree http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html now has a new problem. Instead of giving no prediction at all, it now throws an error:
java.lang.ArrayIndexOutOfBoundsException: 11
at weka.core.DenseInstance.value(DenseInstance.java:332)
at weka.core.AbstractInstance.isMissing(AbstractInstance.java:315)
at weka.classifiers.trees.j48.C45Split.whichSubset(C45Split.java:494)
at weka.classifiers.trees.j48.ClassifierTree.getProbs(ClassifierTree.java:670)
at weka.classifiers.trees.j48.ClassifierTree.classifyInstance(ClassifierTree.java:231)
at weka.classifiers.trees.J48.classifyInstance(J48.java:266)
It traces back to this line:
outcomeValue = classifier.classifyInstance(instances.instance(0));
Solution: I actually ran the program at random with J48 and it worked, giving the outcome variable and the associated distribution.
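One possible reading of the J48 trace: DenseInstance.value fails at index 11, while the ARFF above declares 11 attributes (indices 0 to 10), which would suggest the tree was built against a header with at least 12 attributes. The following is a minimal sketch of comparing the training header with the incoming one before classifying. HeaderCompat is a hypothetical helper that compares plain name lists; for real datasets, WEKA's Instances.equalHeaders is the method to look at.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of WEKA): compares two attribute-name lists.
final class HeaderCompat {

    /** Returns null when the headers match, else a description of the first difference. */
    static String firstMismatch(List<String> trained, List<String> incoming) {
        int n = Math.min(trained.size(), incoming.size());
        for (int i = 0; i < n; i++) {
            if (!trained.get(i).equals(incoming.get(i))) {
                return "attribute " + i + ": expected " + trained.get(i)
                        + " but got " + incoming.get(i);
            }
        }
        if (trained.size() != incoming.size()) {
            return "attribute count: expected " + trained.size()
                    + " but got " + incoming.size();
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative names only; the real training header is not shown in the post.
        List<String> trained = Arrays.asList("a", "b", "c");
        List<String> incoming = Arrays.asList("a", "b");
        System.out.println(firstMismatch(trained, incoming));
    }
}
```

A check like this turns a header mismatch into a readable message instead of an index error deep inside the tree.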
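I hope someone can help me with this. I really do not know what is wrong with this code, since I have checked the online Javadocs and examples, and the constant predictions still persist.
(I am currently checking the main WEKA GUI program, but please help me out here :-))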