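As has already been pointed out, for SVM-based classifiers (with y == np.int*) pre-processing is a must; otherwise the ML estimator's predictive power is lost to the influence of skewed features on the decision function.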
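As for the objection about processing times:
- try to get a better view of your AI/ML model's overfitting/generalisation [C,gamma] landscape
- try to add verbosity to the initial AI/ML process tuning
- try to add n_jobs to the number crunching
- try to add grid computing to your computation approach, if the scale requires it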
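# aML_GS is assumed here to alias the module that provides GridSearchCV
# (sklearn.model_selection in current releases, sklearn.grid_search in old ones)
aGrid = aML_GS.GridSearchCV( aClassifierOBJECT,
                             param_grid = aGrid_of_parameters,
                             cv         = cv,
                             n_jobs     = n_JobsOnMultiCpuCores,
                             verbose    = 5 )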
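The tips above can be put together in a small self-contained sketch. Everything specific in it is an assumption made for illustration: the synthetic dataset, the coarse power-of-two grid, the 3-fold split and the names pipe, param_grid, search and landscape are not from the original setup. Scaling is placed inside a Pipeline so that each cross-validation fold is scaled using its own training part only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical; replace with your own X, y).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Scaler and classifier travel together through every CV fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# A coarse, log-spaced [C, gamma] grid; refine around the best cell afterwards.
param_grid = {
    "svc__C": 2.0 ** np.arange(-2, 7, 2),
    "svc__gamma": 2.0 ** np.arange(-8, 1, 2),
}

search = GridSearchCV(
    pipe,
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3),
    n_jobs=-1,   # use all available CPU cores
    verbose=1,   # raise this value for per-fit progress lines
)
search.fit(X, y)

# Mean CV score per grid cell, arranged as rows = C values, columns = gamma values.
landscape = search.cv_results_["mean_test_score"].reshape(
    len(param_grid["svc__C"]), len(param_grid["svc__gamma"])
)
print(search.best_params_)
print(np.round(landscape, 3))
```

Looking at the whole landscape, rather than only at best_params_, shows whether the best cell sits on the edge of the grid, in which case the grid probably needs to be extended in that direction.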
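Sometimes GridSearchCV() can indeed take a huge amount of CPU time / pool of CPU resources, even after all of the above tips have been applied.
So keep calm and do not panic, provided you are sure that the feature engineering, data sanity and feature-domain pre-processing were done correctly.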
[GridSearchCV] ................ C=16777216.0, gamma=0.5, score=0.761619 -62.7min
[GridSearchCV] C=16777216.0, gamma=0.5 .........................................
[GridSearchCV] ................ C=16777216.0, gamma=0.5, score=0.792793 -64.4min
[GridSearchCV] C=16777216.0, gamma=1.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=1.0, score=0.793103 -116.4min
[GridSearchCV] C=16777216.0, gamma=1.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=1.0, score=0.794603 -205.4min
[GridSearchCV] C=16777216.0, gamma=1.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=1.0, score=0.771772 -200.9min
[GridSearchCV] C=16777216.0, gamma=2.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=2.0, score=0.713643 -446.0min
[GridSearchCV] C=16777216.0, gamma=2.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=2.0, score=0.743628 -184.6min
[GridSearchCV] C=16777216.0, gamma=2.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=2.0, score=0.761261 -281.2min
[GridSearchCV] C=16777216.0, gamma=4.0 .........................................
[GridSearchCV] ............... C=16777216.0, gamma=4.0, score=0.670165 -138.7min
[GridSearchCV] C=16777216.0, gamma=4.0 .........................................
[GridSearchCV] ................ C=16777216.0, gamma=4.0, score=0.760120 -97.3min
[GridSearchCV] C=16777216.0, gamma=4.0 .........................................
[GridSearchCV] ................ C=16777216.0, gamma=4.0, score=0.732733 -66.3min
[GridSearchCV] C=16777216.0, gamma=8.0 .........................................
[GridSearchCV] ................ C=16777216.0, gamma=8.0, score=0.755622 -13.6min
[GridSearchCV] C=16777216.0, gamma=8.0 .........................................
[GridSearchCV] ................ C=16777216.0, gamma=8.0, score=0.772114 - 4.6min
[GridSearchCV] C=16777216.0, gamma=8.0 .........................................
[GridSearchCV] ................ C=16777216.0, gamma=8.0, score=0.717718 -14.7min
[GridSearchCV] C=16777216.0, gamma=16.0 ........................................
[GridSearchCV] ............... C=16777216.0, gamma=16.0, score=0.763118 - 1.3min
[GridSearchCV] C=16777216.0, gamma=16.0 ........................................
[GridSearchCV] ............... C=16777216.0, gamma=16.0, score=0.746627 - 25.4s
[GridSearchCV] C=16777216.0, gamma=16.0 ........................................
[GridSearchCV] ............... C=16777216.0, gamma=16.0, score=0.738739 - 44.9s
[Parallel(n_jobs=1)]: Done 2700 out of 2700 | elapsed: 5670.8min finished
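As for the question above about "... a regular svm.SVC().fit": please notice that it uses the default [C,gamma] values and thus has no relevance to the behaviour of your model / problem domain.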
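To see which defaults a plain SVC() would use, one can simply ask the estimator. This is a minimal sketch; the variable name default_params is illustrative, and the default for gamma depends on the installed scikit-learn version, so no value is claimed for it here.

```python
from sklearn.svm import SVC

# Parameters an un-tuned SVC() starts from.
default_params = SVC().get_params()

# C defaults to 1.0 and the kernel to "rbf"; the gamma default varies by version.
print(default_params["C"], default_params["kernel"], default_params["gamma"])
```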
Re: Update
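Oh yes, indeed: normalisation/scaling of the SVM inputs is a mandatory task for this AI/ML tool.
scikit-learn has good instrumentation to produce and re-use aScalerOBJECT, both for a-priori scaling (before aDataSET goes into .fit()) and for ex-post ad-hoc scaling, once you need to re-scale a new example and send it to the predictor to work its magic, via a request to
anSvmCLASSIFIER.predict( aScalerOBJECT.transform( aNewExampleX ) )
(Yes, aNewExampleX may be a matrix, thus asking for a "vectorised" processing of several answers.)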
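A minimal sketch of that fit-once, re-use-later pattern follows. The data are synthetic and hypothetical (two features on deliberately different scales), and StandardScaler is only one possible choice for aScalerOBJECT; the object names follow the ones used above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical training data: the second feature is about 1000x wider than the first.
X_train = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])
y_train = (X_train[:, 0] > 0).astype(int)

# A-priori scaling: the scaler learns its statistics from the training data only.
aScalerOBJECT = StandardScaler().fit(X_train)
anSvmCLASSIFIER = SVC(kernel="rbf").fit(aScalerOBJECT.transform(X_train), y_train)

# Ex-post scaling: new examples (a matrix, one row per example) go through
# the very same scaler before they reach the predictor.
aNewExampleX = np.array([[0.5, 200.0], [-1.2, -50.0], [2.0, 900.0]])
predictions = anSvmCLASSIFIER.predict(aScalerOBJECT.transform(aNewExampleX))
print(predictions)
```

The scaler is never re-fitted on the new examples; re-fitting would shift the feature space the classifier was trained in.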
Performance relief of the O( M^2 · N^1 ) computational complexity
In contrast to the guess posted below, that the problem "width", measured as N == the number of SVM features in matrix X, is to blame for the overall computing time, the SVM classifier with an rbf kernel is by design an O( M^2 · N^1 ) problem, where M is the number of examples.
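So there is a quadratic dependence on the overall number of observations (examples) moved into the training (.fit()) or cross-validation phase, and one can hardly claim that a supervised-learning classifier will gain predictive power if one "reduces" the (only linear) "width" of the features, which themselves carry the inputs from which the SVM classifier's predictive power is built, can one?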
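As a back-of-the-envelope illustration, taking the O( M^2 · N^1 ) estimate above at face value and ignoring all constants, halving the number of features buys far less than halving the number of examples. The function name relative_fit_cost and the sizes used are hypothetical.

```python
def relative_fit_cost(m_examples: int, n_features: int) -> int:
    """Cost in arbitrary units under the M^2 * N estimate (constants ignored)."""
    return m_examples ** 2 * n_features


base = relative_fit_cost(10_000, 50)
half_features = relative_fit_cost(10_000, 25)
half_examples = relative_fit_cost(5_000, 50)

print(base / half_features)  # 2.0 -> halving N halves the estimated cost
print(base / half_examples)  # 4.0 -> halving M quarters the estimated cost
```

This is only a scaling model, not a timing measurement; real run times also depend on C, gamma, the kernel cache and the data themselves.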