import pandas as pd

dataframe = pd.DataFrame({'Date': ['This 1A1619 person BL171111 the A-1-24',
                                   'dont Z112 but NOT 1-22-2001',
                                   'mix: 1A25629Q88 or A13B ok'],
                          'IDs': ['A11', 'B22', 'C33'],
                          })
Date IDs
0 This 1A1619 person BL171111 the A-1-24 A11
1 dont Z112 but NOT 1-22-2001 B22
2 mix: 1A25629Q88 or A13B ok C33
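I have the dataframe above. My goal is to replace every mixed letter/number combination that does not contain a hyphen (-), e.g. 1A1619, BL171111 or A13B, but not 1-22-2001 or A-1-24, with the letter M.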
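The rule above can be restated as a plain per-token check, which is handy as a reference to test any candidate regex against. This is a sketch under assumptions not stated in the question: tokens are whitespace-delimited, only ASCII letters and digits count, and any punctuation (not just a hyphen) disqualifies a token. The function name is_mixed_token is hypothetical.

```python
def is_mixed_token(token: str) -> bool:
    """Hypothetical helper: True if the token mixes letters and digits and has no hyphen.

    Assumption: a token qualifies only if it consists solely of ASCII letters and
    digits, so a hyphen (or any other punctuation) automatically rules it out.
    """
    if not (token.isascii() and token.isalnum()):
        return False
    has_letter = any(c.isalpha() for c in token)
    has_digit = any(c.isdigit() for c in token)
    return has_letter and has_digit


# The examples from the question: the first three should be replaced, the last two kept.
for tok in ['1A1619', 'BL171111', 'A13B', '1-22-2001', 'A-1-24']:
    print(tok, is_mixed_token(tok))
# 1A1619 True
# BL171111 True
# A13B True
# 1-22-2001 False
# A-1-24 False
```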
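I tried to identify the letter/number combinations with a regex, using the code below adapted from identify letter/number combinations using regex and storing in dictionary https://stackoverflow.com/questions/57650538/identify-letter-number-combinations-using-regex-and-storing-in-dictionary
# regex=True keeps the pattern a regex; since pandas 2.0 the default for str.replace is regex=False
dataframe['MixedNum'] = dataframe['Date'].str.replace(r'(?=.*[a-zA-Z])(\S+\S+\S+)', 'M', regex=True)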
But I get this output:
Date IDs MixedNum
0 This 1A1619 person BL171111 the A-1-24 A11 M M M M M M M
1 dont Z112 but NOT 1-22-2001 B22 M M M M 1-22-2001
2 mix: 1A25629Q88 or A13B ok C33 M M or M ok
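A small diagnostic sketch, running the same pattern through Python's re module, suggests why nearly every word is replaced. The lookahead (?=.*[a-zA-Z]) is evaluated from the current position to the end of the whole string, not just within the current word, so it succeeds whenever any letter appears later in the row; and \S+\S+\S+ simply means "three or more non-space characters". That is why 1-22-2001 (the last token, with no letter after it) survives in row 1, why A-1-24 is replaced in row 0 (its own A satisfies the lookahead), and why the two-letter words or and ok survive. The variable name ATTEMPT is only a label for illustration.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = r'(?=.*[a-zA-Z])(\S+\S+\S+)'

# findall returns the captured group, i.e. every token the attempt would replace with M.
for text in ['dont Z112 but NOT 1-22-2001', 'mix: 1A25629Q88 or A13B ok']:
    print(re.findall(ATTEMPT, text))
# ['dont', 'Z112', 'but', 'NOT']
# ['mix:', '1A25629Q88', 'A13B']
```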
when what I actually want is this output:
Date IDs MixedNum
0 This 1A1619 person BL171111 the A-1-24 A11 This M person M the A-1-24
1 dont Z112 but NOT 1-22-2001 B22 dont M but NOT 1-22-2001
2 mix: 1A25629Q88 or A13B ok C33 mix: M or M ok
I also tried the regex suggested in Regex replace mixed number+strings https://stackoverflow.com/questions/13453999/regex-replace-mixed-numberstrings, but it didn't work for me either.
Can anyone help me change my regex, r'(?=.*[a-zA-Z])(\S+\S+\S+)'?
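One possible direction, offered as a sketch rather than a confirmed answer: instead of a lookahead that scans the rest of the string, anchor the match to a single whitespace-delimited token and require both a digit and a letter inside that token. The pattern name MIXED is hypothetical and not taken from either linked answer. It assumes ASCII letters, and it treats a token as mixed only if it contains nothing but letters and digits, so a hyphen (or any other punctuation, e.g. mix:) excludes it.

```python
import re

# Hypothetical pattern (an assumption, not from the linked answers):
#   (?<!\S)             token starts at the beginning of the string or after whitespace
#   (?=[A-Za-z]*[0-9])  the token contains at least one digit
#   (?=[0-9]*[A-Za-z])  the token contains at least one letter
#   [A-Za-z0-9]+        the token is made only of ASCII letters and digits (so no hyphen)
#   (?!\S)              token ends at the end of the string or before whitespace
MIXED = r'(?<!\S)(?=[A-Za-z]*[0-9])(?=[0-9]*[A-Za-z])[A-Za-z0-9]+(?!\S)'

rows = [
    'This 1A1619 person BL171111 the A-1-24',
    'dont Z112 but NOT 1-22-2001',
    'mix: 1A25629Q88 or A13B ok',
]
for text in rows:
    print(re.sub(MIXED, 'M', text))
# This M person M the A-1-24
# dont M but NOT 1-22-2001
# mix: M or M ok
```

On the dataframe itself, the same pattern could be passed to pandas' Series.str.replace with regex=True, e.g. dataframe['MixedNum'] = dataframe['Date'].str.replace(MIXED, 'M', regex=True). The whitespace lookarounds (?<!\S) and (?!\S) are used instead of \b because \b also treats a hyphen as a word boundary, which would let a fragment after a hyphen match on its own.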