Here is a sample of my pandas DataFrame; the real one has close to 100k rows:
import pandas as pd
df = pd.DataFrame({'cluster': ['5', '5', '5', '5', '5', '5'],
                   'mdse_item_i': ['23627102',
                                   '23627102',
                                   '23627102',
                                   '23627102',
                                   '23627102',
                                   '23627102'],
                   'predPriceQty': ['35.675543',
                                    '33.236678',
                                    '35.675543',
                                    '35.675543',
                                    '35.675543',
                                    '35.675543'],
                   'schedule_i': ['56', '56', '56', '56', '56', '56'],
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1', '2', '1', '1', '1', '1']})
segment_id | cluster | schedule_i | mdse_item_i | wk | predPriceQty
4123       | 5       | 56         | 23627102    | 1  | 35.675543
4123       | 5       | 56         | 23627102    | 2  | 33.236678
4144       | 5       | 56         | 23627102    | 1  | 35.675543
4161       | 5       | 56         | 23627102    | 1  | 35.675543
4295       | 5       | 56         | 23627102    | 1  | 35.675543
4454       | 5       | 56         | 23627102    | 1  | 35.675543
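All six columns in this sample hold strings, so a per-row loop has to call int()/float() on every value again and again. A minimal sketch that converts each column once with DataFrame.astype. It assumes, from the 56.0 and 23627102.0 keys in the desired output, that schedule_i and mdse_item_i should end up as floats; adjust the mapping if the real frame differs.

```python
import pandas as pd

# Sample frame exactly as in the question: every column holds strings
df = pd.DataFrame({'cluster': ['5'] * 6,
                   'mdse_item_i': ['23627102'] * 6,
                   'predPriceQty': ['35.675543', '33.236678', '35.675543',
                                    '35.675543', '35.675543', '35.675543'],
                   'schedule_i': ['56'] * 6,
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1', '2', '1', '1', '1', '1']})

# Convert whole columns at once instead of calling int()/float() per row.
# Assumed target types mirror the desired dict: int for segment_id, cluster
# and wk; float for schedule_i and mdse_item_i (56.0, 23627102.0); float values.
df = df.astype({'segment_id': int, 'cluster': int, 'wk': int,
                'schedule_i': float, 'mdse_item_i': float,
                'predPriceQty': float})

print(df.dtypes)
```

A dict passed to astype converts only the listed columns, so any extra columns in the real frame are left untouched.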
Below is the nested dictionary format I want to produce:
{(4123, 5): {56.0: {23627102.0: {1: 35.6755430505491, 2:33.236678}}},
(4144, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
(4161, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
(4295, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
(4454, 5): {56.0: {23627102.0: {1: 35.6755430505491}}}}
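The code below works for me, but on the full DataFrame building the dictionary takes several hours. I am trying to avoid iterating row by row.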
forecast_dict_all = {}
for _, row in df.iterrows():
    item_agg_id = int(row["segment_id"])  # column name must be a quoted string
    mdse_item_i = row["mdse_item_i"]
    cluster = int(row["cluster"])
    wk = int(row["wk"])
    forecast = float(row["predPriceQty"])
    schedule_id = row["schedule_i"]
    # Create each nesting level on first use, so later rows for an existing
    # (segment_id, cluster) key, e.g. week 2 of (4123, 5), are merged in
    leaf = (forecast_dict_all
            .setdefault((item_agg_id, cluster), {})
            .setdefault(schedule_id, {})
            .setdefault(mdse_item_i, {}))
    leaf[wk] = forecast
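DataFrame.iterrows() builds a full pandas Series for every row, which is likely a large share of the running time. A minimal sketch of the same loop using DataFrame.itertuples(index=False), which yields lightweight namedtuples instead; it assumes the column names are valid Python identifiers (they are in this sample) and keeps the original conversions, leaving schedule_i and mdse_item_i as they are.

```python
import pandas as pd

df = pd.DataFrame({'cluster': ['5'] * 6,
                   'mdse_item_i': ['23627102'] * 6,
                   'predPriceQty': ['35.675543', '33.236678', '35.675543',
                                    '35.675543', '35.675543', '35.675543'],
                   'schedule_i': ['56'] * 6,
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1', '2', '1', '1', '1', '1']})

forecast_dict_all = {}
# itertuples yields one namedtuple per row; fields are accessed by column name
for row in df.itertuples(index=False):
    key = (int(row.segment_id), int(row.cluster))
    # setdefault creates each level on first use, so week 2 of (4123, 5) is kept
    leaf = (forecast_dict_all
            .setdefault(key, {})
            .setdefault(row.schedule_i, {})
            .setdefault(row.mdse_item_i, {}))
    leaf[int(row.wk)] = float(row.predPriceQty)
```

Because schedule_i and mdse_item_i are not converted here, their keys stay strings ('56', '23627102') for this sample.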
What I have tried so far:
dict(df.groupby(['segment_id', 'cluster'], as_index=False).apply(lambda x: x.to_dict()).to_dict())
df.set_index(['segment_id', 'cluster'], inplace=True)
di = df.to_dict(orient='index')
forecast_dict_all = {k: {v['schedule_i']: {v['mdse_item_i']: {v['wk']: v['predPriceQty']}}}
                     for k, v in di.items()}
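to_dict(orient='index') produces one entry per index label, and (segment_id, cluster) is not unique here: (4123, 5) has a row for week 1 and one for week 2. Depending on the pandas version, duplicate labels either raise a ValueError or collapse to a single row. A sketch of one workaround, assuming the columns are already numeric: index on all five key columns so each row has a unique tuple key, take predPriceQty as a Series, and nest its flat tuple-keyed to_dict() result.

```python
import pandas as pd

# Assumed already-converted sample (numeric dtypes rather than strings)
df = pd.DataFrame({
    'segment_id': [4123, 4123, 4144, 4161, 4295, 4454],
    'cluster': [5] * 6,
    'schedule_i': [56.0] * 6,
    'mdse_item_i': [23627102.0] * 6,
    'wk': [1, 2, 1, 1, 1, 1],
    'predPriceQty': [35.675543, 33.236678, 35.675543,
                     35.675543, 35.675543, 35.675543],
})

# (segment_id, cluster) alone repeats (4123, 5), so it is not a unique index
print(df.set_index(['segment_id', 'cluster']).index.is_unique)  # False

# With every key level in the index, each row gets a unique 5-tuple key
s = df.set_index(['segment_id', 'cluster', 'schedule_i',
                  'mdse_item_i', 'wk'])['predPriceQty']

forecast_dict_all = {}
for (seg, clu, sched, item, wk), qty in s.to_dict().items():
    # Split the flat tuple key into the nested levels
    leaf = (forecast_dict_all
            .setdefault((seg, clu), {})
            .setdefault(sched, {})
            .setdefault(item, {}))
    leaf[wk] = qty
```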
df.set_index(['segment_id', 'cluster'], inplace=True)
{k: {grp['schedule_i']: {grp['mdse_item_i']: {grp['wk']: grp['predPriceQty']}}}
 for k, grp in df.groupby(['schedule_i', 'mdse_item_i', 'wk', 'predPriceQty'])}
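In the groupby attempt, grp is a whole sub-DataFrame, so grp['schedule_i'] is a Series rather than a single value, and Series objects cannot be hashed, so they cannot be dict keys. A sketch that instead groups by the outer (segment_id, cluster) key and builds the three inner levels from each group's rows, assuming the columns are already numeric. With many tiny groups the per-group overhead may make this slower than a single flat pass over the frame.

```python
import pandas as pd

# Assumed already-converted sample (numeric dtypes rather than strings)
df = pd.DataFrame({
    'segment_id': [4123, 4123, 4144, 4161, 4295, 4454],
    'cluster': [5] * 6,
    'schedule_i': [56.0] * 6,
    'mdse_item_i': [23627102.0] * 6,
    'wk': [1, 2, 1, 1, 1, 1],
    'predPriceQty': [35.675543, 33.236678, 35.675543,
                     35.675543, 35.675543, 35.675543],
})

forecast_dict_all = {}
# Group by the OUTER key; each group holds every row for one (segment_id, cluster)
for key, grp in df.groupby(['segment_id', 'cluster']):
    inner = {}
    # Walk this group's column values together to build schedule -> item -> wk
    for sched, item, wk, qty in zip(grp['schedule_i'].tolist(),
                                    grp['mdse_item_i'].tolist(),
                                    grp['wk'].tolist(),
                                    grp['predPriceQty'].tolist()):
        inner.setdefault(sched, {}).setdefault(item, {})[wk] = qty
    forecast_dict_all[key] = inner
```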
I also tried using zip, but in every case I was unable to get the desired output.
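zip can work here when it is applied to whole columns rather than to rows. A sketch assuming the string-typed sample frame from the question: convert each column once, turn it into a plain Python list with tolist(), then zip the six lists in a single pass that builds the nesting with dict.setdefault. The float conversion of schedule_i and mdse_item_i is an assumption based on the 56.0 and 23627102.0 keys in the desired output.

```python
import pandas as pd

df = pd.DataFrame({'cluster': ['5'] * 6,
                   'mdse_item_i': ['23627102'] * 6,
                   'predPriceQty': ['35.675543', '33.236678', '35.675543',
                                    '35.675543', '35.675543', '35.675543'],
                   'schedule_i': ['56'] * 6,
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1', '2', '1', '1', '1', '1']})

# Convert whole columns at once, then turn them into plain Python lists
seg = df['segment_id'].astype(int).tolist()
clu = df['cluster'].astype(int).tolist()
sched = df['schedule_i'].astype(float).tolist()
item = df['mdse_item_i'].astype(float).tolist()
wk = df['wk'].astype(int).tolist()
qty = df['predPriceQty'].astype(float).tolist()

forecast_dict_all = {}
# One pass over the zipped lists; no per-row Series objects are created
for s, c, sc, it, w, q in zip(seg, clu, sched, item, wk, qty):
    leaf = (forecast_dict_all
            .setdefault((s, c), {})
            .setdefault(sc, {})
            .setdefault(it, {}))
    leaf[w] = q

print(forecast_dict_all[(4123, 5)])
# {56.0: {23627102.0: {1: 35.675543, 2: 33.236678}}}
```

Using tolist() yields native Python ints and floats, so the keys match the desired output exactly rather than being NumPy scalars.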
EDIT
I am using:
Python: 2.7.13.final.0
pandas: 0.20.1
Any help is appreciated, thanks.