I need to add an extra feature to a dataset that I imported from a .json file.
It looks like this:
f1 = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
print(f1.head())
short_desc
1 [{'when': 1002742486, 'what': 'Usability issue...
10 [{'when': 1002742495, 'what': 'API - VCM event...
100 [{'when': 1002742586, 'what': 'Would like a wa...
10000 [{'when': 1014113227, 'what': 'getter/setter c...
100001 [{'when': 1118743999, 'what': 'Create Help Ind...
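Essentially, I need to use "short_desc" as the column name and fill it with the string value that sits directly beneath it: "Usability issue..."
So far, I have tried the following: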
f1['desc'] = pd.DataFrame([x for x in f1['short_desc']])
Wrong number of items passed 19, placement implies 1
Is there a simple way to do this without using a loop? Could someone point this newbie in the right direction?
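Don't initialize a DataFrame and try to assign it to a column; a column is meant to be a pd.Series.
You should just assign the list comprehension directly, like this: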
f1['desc'] = [x[0]['what'] for x in f1['short_desc']]
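To see why the original attempt fails while the list comprehension works, here is a minimal sketch on a toy frame. The `toy` data and its values are made up for illustration and only mimic the shape of the real dataset (each cell holds a list of dicts).

```python
import pandas as pd

# Toy stand-in for the real data (values are invented for this sketch):
# each cell of "short_desc" is a list of dicts with "when" and "what" keys.
toy = pd.DataFrame(
    {
        "short_desc": [
            [{"when": 1, "what": "first title"}, {"when": 2, "what": "edited title"}],
            [{"when": 3, "what": "second title"}],
        ]
    },
    index=[1, 10],
)

# Building a DataFrame from the cells spreads each list across several columns,
# so the result cannot be assigned to a single column.
expanded = pd.DataFrame([x for x in toy["short_desc"]])
print(expanded.shape)

# A plain list has exactly one value per row, so the assignment works.
toy["desc"] = [x[0]["what"] for x in toy["short_desc"]]
print(toy["desc"].tolist())
```

In the real dataset the longest list apparently has 19 entries, which would explain the "Wrong number of items passed 19" message.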
As an alternative, here is a solution that does not involve any lambda functions, using operator and pd.Series.apply:
import operator
f1['desc'] = f1.short_desc.apply(operator.itemgetter(0))\
.apply(operator.itemgetter('what'))
print(f1.desc.head())
1 Usability issue with external editors (1GE6IRL)
10 API - VCM event notification (1G8G6RR)
100 Would like a way to take a write lock on a tea...
10000 getter/setter code generation drops "F" in ".....
100001 Create Help Index Fails with seemingly incorre...
Name: desc, dtype: object
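Both answers take the first entry of each list. If the lists record a history of edits (an assumption about this dataset, not something the question confirms), the last entry may be the one you want. A small sketch on made-up data, showing that the same `operator.itemgetter` pattern works with index -1:

```python
import operator

import pandas as pd

# Toy stand-in for the real data (values are invented for this sketch).
toy = pd.DataFrame(
    {
        "short_desc": [
            [{"when": 1, "what": "first title"}, {"when": 2, "what": "edited title"}],
            [{"when": 3, "what": "second title"}],
        ]
    },
    index=[1, 10],
)

get_what = operator.itemgetter("what")

# First entry of each list, as in the answer above.
first = toy["short_desc"].apply(operator.itemgetter(0)).apply(get_what)

# Last entry of each list; -1 indexes from the end of a Python list.
last = toy["short_desc"].apply(operator.itemgetter(-1)).apply(get_what)

print(first.tolist())
print(last.tolist())
```

Both variants raise an IndexError if any list is empty, so check for that first if your data may contain empty lists.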