I. Objective:
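A decision tree classifier assigns samples, described by their feature attributes, to classes by means of a tree-shaped structure. Typical algorithms include ID3, C4.5, C5.0, and CART. In this experiment, decision tree induction is implemented using the information gain criterion of ID3.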
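The information gain measure can be stated with the standard ID3 definitions. The symbols p_i, m, v and D_j are introduced here for illustration and are not from the original report: p_i is the proportion of tuples in D that belong to class C_i, m is the number of classes, and attribute A partitions D into v subsets D_1, ..., D_v.

```latex
\begin{aligned}
\mathrm{Info}(D)   &= -\sum_{i=1}^{m} p_i \log_2 p_i \\
\mathrm{Info}_A(D) &= \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \\
\mathrm{Gain}(A)   &= \mathrm{Info}(D) - \mathrm{Info}_A(D)
\end{aligned}
```

Because Info(D) is the same for every candidate attribute, choosing the attribute with the largest Gain(A) is equivalent to choosing the one with the smallest Info_A(D).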
II. Software:
RStudio
III. Approach:
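1. Compute the entropy of the class (decision) attribute, Info(D).
2. Compute the conditional entropy Info_A(D) of each attribute: age, income, student, and credit rating.
3. Compute the information gain of each attribute: Gain(A) = Info(D) - Info_A(D).
4. Select the splitting node: partition the data set on the attribute with the largest information gain.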
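The four steps can be sketched in R as a few small helper functions. This is a minimal illustration under stated assumptions, not the report's own code: the names entropy, cond_entropy, info_gain and best_attribute, and the toy data frame used to exercise them, are hypothetical. The sketch evaluates a single node only; full ID3 induction would apply step 4 recursively to each resulting subset.

```r
# Hypothetical helper names; not taken from the original report.

# Step 1: entropy Info(D) of a vector of class labels, -sum(p * log2(p))
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions
  p <- p[p > 0]                        # drop empty classes to avoid 0 * log2(0)
  -sum(p * log2(p))
}

# Step 2: conditional entropy Info_A(D) of the labels after splitting on attribute x
cond_entropy <- function(x, labels) {
  groups  <- split(labels, x)                        # one subset D_j per value of x
  weights <- sapply(groups, length) / length(labels) # |D_j| / |D|
  sum(weights * sapply(groups, entropy))
}

# Step 3: information gain Gain(A) = Info(D) - Info_A(D)
info_gain <- function(x, labels) {
  entropy(labels) - cond_entropy(x, labels)
}

# Step 4: name of the attribute with the largest information gain
best_attribute <- function(df, target) {
  attrs <- setdiff(names(df), target)
  gains <- sapply(attrs, function(a) info_gain(df[[a]], df[[target]]))
  names(which.max(gains))
}

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not
toy <- data.frame(
  outlook = c("a", "a", "b", "b"),
  windy   = c("x", "y", "x", "y"),
  play    = c("no", "no", "yes", "yes")
)

print(info_gain(toy$outlook, toy$play))  # 1
print(info_gain(toy$windy, toy$play))    # 0
print(best_attribute(toy, "play"))       # "outlook"
```

Applied to the report's data frame, the analogous call would be best_attribute(data, target), where target is the name of the class column; that column lies outside this excerpt.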
IV. Source Code:
# Sample data set
data<-data.frame(
Age=c("youth","youth","middle_aged","senior","senior","senior","middle_aged","youth","youth","senior","youth","middle_aged","middle_aged","senior"),
income=c("high","high","high","medium","low","low","low","medium","low","medium","medium","medium","high","medium"),
student=c("no","no","no","no","yes","yes","yes","no","yes","yes","yes","no","yes","no"),
credit_rating=c("fair","excellent","fair","fair","fair","excellent","excellent","fair","fair","fair",