I have a dataset that looks like this:
library(tibble)

df <- tribble(
~id, ~price, ~number_of_book,
"1", 10, 3,
"1", 5, 1,
"2", 7, 4,
"2", 6, 2,
"2", 3, 4,
"3", 4, 1,
"4", 5, 1,
"4", 6, 1,
"5", 1, 2,
"5", 9, 3,
)
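As you can see in the dataset, for id "1" there are 3 books priced at $10 each and 1 book priced at $5. Basically, I want to see the share (%) of the number of books in each price range. This is the dataset I want: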
desired <- tribble(
~id, ~less_than_three, ~`three-five`, ~`five-six`, ~more_than_six,
"1", "0%", "25%", "0%", "75%",
"2", "0%", "40%", "20%", "40%",
"3", "0%", "100%", "0%", "0%",
"4", "0%", "50%", "50%", "0%",
"5", "40%", "0%", "0%", "60%",
)
Now, as a first step, I group the prices into ranges. To do that, I run the following code:
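# Bin each price into a range; intervals are right-closed, e.g. (3, 5]
out <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
           labels = c("<3", "3-5", "5-6", ">6"))
# Share of rows per range across all ids (not weighted by number_of_book)
out <- prop.table(table(out))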
Unfortunately, due to my lack of coding knowledge, I couldn't get any further. Could you help me get the desired data?
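
For reference, one possible way to get from the input to the desired table is sketched below. It is not a definitive solution and rests on a few assumptions: it uses dplyr and tidyr (not mentioned above), it names the columns with underscores (`three_five`, `five_six`) instead of hyphens, and it infers the bin boundaries from the desired table, where prices 3 and 5 count as "three-five" and price 6 counts as "five-six". That differs slightly from the `cut()` call above, which puts a price of exactly 3 into "<3".

```r
library(tibble)
library(dplyr)
library(tidyr)

# Input data copied from the question
df <- tribble(
  ~id, ~price, ~number_of_book,
  "1", 10, 3,
  "1", 5, 1,
  "2", 7, 4,
  "2", 6, 2,
  "2", 3, 4,
  "3", 4, 1,
  "4", 5, 1,
  "4", 6, 1,
  "5", 1, 2,
  "5", 9, 3
)

# Illustrative bin names, in the order the columns should appear
bins <- c("less_than_three", "three_five", "five_six", "more_than_six")

result <- df %>%
  mutate(
    # Assumed boundaries: <3, [3, 5], (5, 6], >6 (inferred from the desired table)
    price_range = case_when(
      price < 3  ~ "less_than_three",
      price <= 5 ~ "three_five",
      price <= 6 ~ "five_six",
      TRUE       ~ "more_than_six"
    ),
    price_range = factor(price_range, levels = bins)
  ) %>%
  # Sum the number of books per id and price range
  count(id, price_range, wt = number_of_book, name = "books") %>%
  # Add explicit zero rows for ranges in which an id has no books
  complete(id, price_range, fill = list(books = 0)) %>%
  group_by(id) %>%
  # Each range's share of the id's total books, formatted as a percentage
  mutate(share = paste0(round(100 * books / sum(books)), "%")) %>%
  ungroup() %>%
  select(-books) %>%
  # One row per id, one column per price range
  pivot_wider(names_from = price_range, values_from = share)

print(result)
```

The key difference from the `table(out) / sum(table(out))` attempt is that the shares here are weighted by `number_of_book` and computed within each `id` rather than over all rows at once.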