如何查找另一列的不同行中具有多个值的列值的总长度

2024-05-08

有没有办法找到同时有Apple和Strawberry的ID,然后求总长度?和只有苹果的ID,和只有草莓的IDS?

df:

        ID           Fruit
0       ABC          Apple        <-ABC has Apple and Strawberry
1       ABC          Strawberry   <-ABC has Apple and Strawberry
2       EFG          Apple        <-EFG has Apple only
3       XYZ          Apple        <-XYZ has Apple and Strawberry
4       XYZ          Strawberry   <-XYZ has Apple and Strawberry 
5       CDF          Strawberry   <-CDF has Strawberry
6       AAA          Apple        <-AAA has Apple only

期望的输出:

Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1

Thanks!


如果总是所有值都只是Apple or Strawberry在列中Fruit您可以比较每组的组数,然后进行计数ID by sum of True值:

v = ['Apple','Strawberry']
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print (out)
2

编辑:如果有很多值:

s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print (s)
{Apple}                2
{Strawberry, Apple}    2
{Strawberry}           1
Name: Fruit, dtype: int64
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

如何查找另一列的不同行中具有多个值的列值的总长度 的相关文章

随机推荐