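I have a list of roughly 100,000 pairs of items that were ordered together, which I have pasted together into a single column so that I can count how many times each combination occurs.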
4845  Curly Fries          California Burger  1
4846  French Fries         California Burger  1
4847  Hamburger            California Burger  1
4848  $1 Fountain Drinks   Curly Fries        1
4849  $1 Fountain Drinks   Curly Fries        1
4850  California Burger    Curly Fries        1
4851  Curly Fries          Curly Fries        1
I tried the aggregate function, which gave me the following error:
aggregate(t1$count,list(t1$pc), sum)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
I have also tried variations of ddply:
ddply(t1,t1$pc,transform,occurances=sum(t1$count))
but I get this error:
Error in UseMethod("as.quoted") :
no applicable method for 'as.quoted' applied to an object of class "c('matrix', 'list')"
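I assume I am getting this because I am essentially trying to "group by" character values. I have also explored tapply and recast, based on answers to similar questions, but to no avail.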
How can I get this count of combinations?
For reference, here is a sample of the items listed in separate columns (apologies again for the formatting):
    Var1                      Var2                      Var3
2   Onion Rings               Onion Rings               1
3   Pineapple Cheddar Burger  Onion Rings               1
4   Onion Rings               Pineapple Cheddar Burger  1
5   Pineapple Cheddar Burger  Pineapple Cheddar Burger  1
5   Onion Rings               Onion Rings               1
6   Pineapple Cheddar Burger  Onion Rings               1
7   Onion Rings               Pineapple Cheddar Burger  1
8   Pineapple Cheddar Burger  Pineapple Cheddar Burger  1
9   Fountain Soda             Fountain Soda             1
10  French Fries              Fountain Soda             1
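
To make the goal concrete, here is a minimal sketch of the kind of count being asked for. It assumes the two items are kept in separate plain character columns of an ordinary data frame rather than in the pasted column pc. The column names Var1, Var2 and count and the sample rows are hypothetical stand-ins for the real data, and the formula interface of base R's aggregate is one possible approach, not necessarily the best fit for the actual t1.

```r
# Hypothetical sample in the same shape as the listing above:
# two item columns plus a count of 1 per row.
t1 <- data.frame(
  Var1 = c("Onion Rings", "Pineapple Cheddar Burger", "Onion Rings",
           "Onion Rings", "Pineapple Cheddar Burger"),
  Var2 = c("Onion Rings", "Onion Rings", "Pineapple Cheddar Burger",
           "Onion Rings", "Onion Rings"),
  count = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Sum the count column within each distinct (Var1, Var2) pair.
combo_counts <- aggregate(count ~ Var1 + Var2, data = t1, FUN = sum)
print(combo_counts)
```

Note that this treats (Onion Rings, Pineapple Cheddar Burger) and (Pineapple Cheddar Burger, Onion Rings) as different combinations; if the order should not matter, the two items in each row would need to be put into a consistent order first.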