在尝试使用 Pandas 系列 str.split() 函数分割数据帧的“Actors”列中的值时,我得到的值比我指定的分割更多:
df['Actors'] = df['Actors'].str.split(",",n=3)
1 [timrobbins, morganfreeman, bobgunton, william...
2 [marlonbrando, alpacino, jamescaan, richardsca...
3 [alpacino, robertduvall, dianekeaton, robertde...
4 [christianbale, heathledger, aaroneckhart, mic...
5 [martinbalsam, johnfiedler, leejcobb, egmarshall]
如果我尝试使用下面的代码片段对上述结果进行切片,则 NaN 开始出现在结果中:
df['Actors'] = df['Actors'].str.split(",",n=3)[:3]
df['Actors'].head()
1 [timrobbins, morganfreeman, bobgunton, william...
2 [marlonbrando, alpacino, jamescaan, richardsca...
3 [alpacino, robertduvall, dianekeaton, robertde...
4 NaN
5 NaN
Name: Actors, dtype: object
或者,如果我尝试使用 apply 函数的代码片段(如下所示),则会获得正确的结果:
df['Actors'] = df['Actors'].apply(lambda x: x.split(",")[:3])
df['Actors'].head()
1 [timrobbins, morganfreeman, bobgunton]
2 [marlonbrando, alpacino, jamescaan]
3 [alpacino, robertduvall, dianekeaton]
4 [christianbale, heathledger, aaroneckhart]
5 [martinbalsam, johnfiedler, leejcobb]
Name: Actors, dtype: object
我想知道为什么会发生这种异常以及在这种情况下如何正确使用 str.split() 函数?
要进一步检查数据,您可以使用以下代码片段自行下载数据:
df = pd.read_csv('https://query.data.world/s/uikepcpffyo2nhig52xxeevdialfl7',index_col=0)