I have a DataFrame like the one below:
Customer_ID Category Products
1 Veg A
2 Veg B
3 Fruit A
3 Fruit B
3 Veg B
1 Fruit A
3 Veg C
1 Fruit C
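For each customer ID within each category, I want to find out which products were purchased and create one column per product accordingly. The output should look like this: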
Customer_ID Category Pro_1 Pro_2 Pro_3
1 Veg A NA NA
1 Fruit A NA C
2 Veg NA B NA
3 Veg NA B C
3 Fruit A B NA
Use groupby (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html) with unstack (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html). Note that if the frame contains duplicate rows, their Products values are concatenated together:
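# Group on all three keys; sum() concatenates the strings of duplicate rows
df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
# Rename the product columns to Pro_1, Pro_2, ...
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print(df)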
Customer_ID Category Pro_1 Pro_2 Pro_3
0 1 Fruit A None C
1 1 Veg A None None
2 2 Veg None B None
3 3 Fruit A B None
4 3 Veg None B C
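To make the remark about duplicates concrete, here is a minimal sketch, assuming the sample data from the question plus one hypothetical duplicate row that I add for illustration. It compares sum(), which concatenates the strings, with first(), which is one possible way to keep a single value per group. The names dup, wide_sum and wide_first are mine, not from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

# Hypothetical duplicate: repeat the first row (1, Veg, A)
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

keys = ['Customer_ID', 'Category', 'Products']

# sum() on strings concatenates, so the duplicated cell becomes 'AA'
wide_sum = dup.groupby(keys)['Products'].sum().unstack()

# first() keeps one value per group, so the cell stays 'A'
wide_first = dup.groupby(keys)['Products'].first().unstack()

print(wide_sum.loc[(1, 'Veg'), 'A'])
print(wide_first.loc[(1, 'Veg'), 'A'])
```

Whether first() is appropriate depends on whether repeated purchases carry meaning in your data. If they do not, dropping the duplicates up front works just as well.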
Another solution uses a helper column; here the (Customer_ID, Category, Products) triples must be unique:
# If the triples are not unique, remove the duplicates first
df = df.drop_duplicates(['Customer_ID','Category','Products'])
# Helper column holding the values that will fill the reshaped cells
df['a'] = df['Products']
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print(df)
Customer_ID Category Pro_1 Pro_2 Pro_3
0 1 Fruit A None C
1 1 Veg A None None
2 2 Veg None B None
3 3 Fruit A B None
4 3 Veg None B C
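For convenience, the helper-column approach can be wrapped in a small function that leaves the input frame untouched. This is only a sketch under the assumption that the columns are named as in the question; products_wide is a hypothetical name, and assign() is used in place of the in-place df['a'] = ... assignment.

```python
import pandas as pd

def products_wide(df):
    """Return one row per (Customer_ID, Category) with Pro_1..Pro_n columns."""
    keys = ['Customer_ID', 'Category', 'Products']
    # unstack() needs unique index entries, so drop repeated triples first
    unique = df.drop_duplicates(keys)
    # Helper column 'a' carries the product label into the reshaped cells
    wide = unique.assign(a=unique['Products']).set_index(keys)['a'].unstack()
    wide.columns = ['Pro_{}'.format(i) for i in range(1, len(wide.columns) + 1)]
    return wide.reset_index()

# Sample data from the question
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

out = products_wide(df)
print(out)
```

Note that the rows come back sorted by Customer_ID and Category, so the order differs from the one shown in the question.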