Retrieving matched keywords and their count from a column using pandas in Python

2024-01-01

I have a df,

Name      Description
Ram       Ram is one of the good cricketer
Sri       Sri is one of the member
Kumar     Kumar is a keeper

and a list, my_list=["one","good","ravi","ball"]

I am trying to get the rows that contain at least one keyword from my_list.

I tried,

  mask=df["Description"].str.contains("|".join(my_list),na=False)

I got output_df,

Name    Description
Ram     Ram is one of ONe crickete
Sri     Sri is one of the member
Ravi    Ravi is a player, ravi is playing
Kumar   there is a BALL

I would also like to add the keywords present in "Description" and their count in separate columns.

My desired output is,

Name    Description                      pre-keys          keys     count
Ram     Ram is one of ONe crickete         one,good,ONe   one,good    2
Sri     Sri is one of the member           one            one         1
Ravi    Ravi is a player, ravi is playing  Ravi,ravi      ravi        1
Kumar   there is a BALL                    ball           ball        1

Use str.findall http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.findall.html + str.join http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.join.html + str.len http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.len.html:

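# Collect every keyword match per row as a list
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
# Join the matches into one comma-separated string
df['keys'] = extracted.str.join(',')
# Number of matches found in each row
df['count'] = extracted.str.len()
print(df)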
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
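
If the frame also holds rows without any keyword (such as the Kumar row in the question), findall returns an empty list for them, so keys is an empty string and count is 0. A minimal sketch, assuming the three-row frame and the four-item list from the question, that keeps only the matched rows (result is a name introduced here for illustration):

```python
import pandas as pd

# Frame and keyword list as described in the question (assumed here).
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# One list of matches per row; rows without a keyword get an empty list.
extracted = df["Description"].str.findall("|".join(my_list))
df["keys"] = extracted.str.join(",")
df["count"] = extracted.str.len()

# Keep only the rows where at least one keyword was found.
result = df[df["count"] > 0]
print(result)
```

Filtering on count gives the same rows as the str.contains mask from the question, without running the pattern a second time.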

EDIT: for case-insensitive matching, pass flags=re.IGNORECASE:

import re
my_list=["ONE","good"]

# re.IGNORECASE lets "ONE" match "one"; matches keep the casing found in the text
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
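
To also get the pre-keys column from the desired output, with keys de-duplicated regardless of case, one possible sketch follows. The sample frame is an assumption modeled on the question's data, unique_lower is a hypothetical helper, and re.escape is an addition so that keywords containing regex metacharacters are matched literally:

```python
import re
import pandas as pd

# Sample data modeled on the question (an assumption, not the asker's actual frame).
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Ravi", "Kumar"],
    "Description": [
        "Ram is one of the good ONe cricketer",
        "Sri is one of the member",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Escape each keyword so regex metacharacters are treated as plain text.
pattern = "|".join(re.escape(k) for k in my_list)

# Every match as it appears in the text, original casing preserved.
matches = df["Description"].str.findall(pattern, flags=re.IGNORECASE)


def unique_lower(found):
    """Lowercase the matches and drop repeats, keeping first-seen order."""
    return list(dict.fromkeys(m.lower() for m in found))


unique = matches.map(unique_lower)

df["pre-keys"] = matches.str.join(",")  # raw matches, duplicates included
df["keys"] = unique.str.join(",")       # distinct keywords, lowercased
df["count"] = unique.str.len()          # number of distinct keywords
print(df)
```

This matches substrings, so "one" would also hit a word like "money"; wrapping the pattern in \b word boundaries is one way to restrict it to whole words.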