将长表转换为宽表并根据行创建列

2024-01-06

我有一个如下所示的数据框:

Customer_ID     Category   Products 
  1               Veg         A
  2               Veg         B
  3              Fruit        A   
  3              Fruit        B
  3               Veg         B 
  1              Fruit        A
  3               Veg         C 
  1              Fruit        C

我想找出每个类别的每个客户 ID 购买了哪些产品,并相应地为每个产品创建一个列。输出如下所示:

Customer_ID     Category    Pro_1    Pro_2     Pro_3
  1               Veg        A        NA         NA
  1              Fruit       A        NA         C
  2               Veg        NA       B          NA
  3               Veg        NA       B          C
  3              Fruit       A        B          NA

Use groupby http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html with unstack http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html,但是如果重复的行数据连接在一起:

df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
   Customer_ID Category Pro_1 Pro_2 Pro_3
0            1    Fruit     A  None     C
1            1      Veg     A  None  None
2            2      Veg  None     B  None
3            3    Fruit     A     B  None
4            3      Veg  None     B     C

另一个带有辅助列的解决方案,三元组必须是唯一的:

#if not unique triples remove duplicates
df = df.drop_duplicates(['Customer_ID','Category','Products'])

df['a'] = df['Products']
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
   Customer_ID Category Pro_1 Pro_2 Pro_3
0            1    Fruit     A  None     C
1            1      Veg     A  None  None
2            2      Veg  None     B  None
3            3    Fruit     A     B  None
4            3      Veg  None     B     C
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

将长表转换为宽表并根据行创建列 的相关文章

随机推荐