This is not simple.
You first need to convert the values to a list of dicts with replace http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.replace.html (\s+ matches one or more whitespace characters) and then parse them with ast https://docs.python.org/2/library/ast.html.
You can then use the DataFrame constructor together with concat http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html; pop http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pop.html removes the column from df:
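import ast
import pandas as pd

# Rewrite each message as a dict literal; raw strings keep the regex escapes intact.
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# Parse the literal strings into real dicts.
df.message = df.message.apply(ast.literal_eval)
# pop removes the message column from df; each dict becomes one row of df1.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)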
  kids money group   job  money  wife
0  NaN  none   NaN  none    NaN  none
1  NaN   NaN  band   NaN  10000   yes
2  one   NaN  jail  none   none  none
# Attach the expanded columns to the remaining ones, aligned on the index.
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number kids money group   job  money  wife
0   matt    active   12345  NaN  none   NaN  none    NaN  none
1  james    active   23456  NaN   NaN  band   NaN  10000   yes
2   adam  inactive   34567  one   NaN  jail  none   none  none
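To see what the replacement list does to a single value, here is a minimal sketch that applies the same pattern/replacement pairs one after another with the standard library only. The helper name to_dict and the sample strings are hypothetical (the original input is not shown here), and it assumes that applying the pairs in order mirrors what Series.replace does with regex=True.

```python
import ast
import re

# Same (pattern, replacement) pairs as in the answer, applied in order.
RULES = [
    (r":\s+,", '":"none","'),  # empty value -> the string "none"
    (r"\[", '{"'),             # opening bracket -> start of a dict literal
    (r"\]", '"}'),             # closing bracket -> end of a dict literal
    (r":\s+", '":"'),          # key/value separator
    (r",\s+", '","'),          # item separator
]


def to_dict(message):
    """Rewrite a '[key: value, ...]' string as a dict literal and parse it."""
    for pattern, replacement in RULES:
        message = re.sub(pattern, replacement, message)
    return ast.literal_eval(message)


# Hypothetical inputs, not taken from the original question.
parsed = to_dict("[group: band, wife: yes, money: 10000]")
with_empty = to_dict("[job: , wife: none]")
print(parsed)
print(with_empty)
```

Every value comes back as a string, including numbers. In this sketch an empty value leaves the space after its comma in the next key (' wife' instead of 'wife'), which may be why money appears twice in the output above; stripping the keys after parsing would avoid that.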
EDIT:
Another solution is to use yaml (the PyYAML package):
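import yaml

# safe_load parses plain data only; recent PyYAML versions require a Loader argument for yaml.load.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)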
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
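Both solutions end with the same pop / DataFrame / concat step. The sketch below isolates it on a small hypothetical frame whose message column already holds parsed dicts; the sample data and the names parsed, expanded and result are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: 'message' already holds parsed dicts with differing keys.
df = pd.DataFrame({
    "name": ["matt", "james"],
    "message": [{"job": "none"}, {"group": "band", "wife": "yes"}],
})

# pop() removes 'message' from df and returns it as a Series.
parsed = df.pop("message")

# One row per dict; a key missing from a dict becomes NaN in that row.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Join the new columns to the remaining ones, aligned on the index.
result = pd.concat([df, expanded], axis=1)
print(result)
```

Passing index=df.index keeps the expanded rows aligned with the original ones even when df does not use the default 0..n-1 index.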