I have a df like this:
import pandas as pd
df = pd.DataFrame(
{'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'],
'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'],
'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'],
'number_C2_E2': [None, None, '3', None, None, None, None, None],
'fruit_C22_E2': [None, None, 'blueberry', None, None, None, None, None],
'name_C222_E2': [None, None, 'anthony', None, None, None, None, None],
'number_C3_E1': [None, None, '3', '4', None, None, None, None],
'fruit_C33_E1': [None, None, 'blueberry', 'strawberry', None, None, None, None],
'name_C333_E1': [None, None, 'anthony', 'terry', None, None, None, None],
}
)
I want to merge some of these columns according to two rules:
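- If two column names become identical after removing `_C{0~9}`, `_C{0~9}{0~9}` or `_C{0~9}{0~9}{0~9}`, those two columns can be combined.
  Take `number_C1_E1`, `number_C2_E2` and `number_C3_E1` as an example: `number_C1_E1` and `number_C3_E1` can be combined, because both become `number_E1` after removing `_C{0~9}`.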
- When two columns are combined, their `None` values should be dropped, so each row keeps the non-`None` value.
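A minimal sketch of the first rule, assuming the `_C…` token is always `_C` followed by one to three digits and then another underscore: compute a merge key with a regular expression and group the column names by it. `merge_key` and `group_columns` are hypothetical helper names, not part of pandas.

```python
import re
from collections import defaultdict

# "_C" plus 1 to 3 digits, only when another "_" follows
# (assumption: the token always sits between two underscores, as in "number_C1_E1").
C_TOKEN = re.compile(r"_C\d{1,3}(?=_)")


def merge_key(col: str) -> str:
    """Column name with its _C{digits} token removed, e.g. number_C1_E1 -> number_E1."""
    return C_TOKEN.sub("", col, count=1)


def group_columns(columns):
    """Group column names that become equal once the _C token is removed.

    Keys and the columns inside each group keep their first-seen order.
    """
    groups = defaultdict(list)
    for col in columns:
        groups[merge_key(col)].append(col)
    return dict(groups)


cols = ["number_C1_E1", "number_C2_E2", "number_C3_E1"]
print(group_columns(cols))
# {'number_E1': ['number_C1_E1', 'number_C3_E1'], 'number_E2': ['number_C2_E2']}
```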
The expected result is:
  number_C1_1_E1 fruit_C11_1_E1 name_C111_1_E1 number_C2_1_E2 fruit_C22_1_E2 name_C222_1_E2
0              1          apple            tom           None           None           None
1              2         banana          jerry           None           None           None
2              3      blueberry        anthony              3      blueberry        anthony
3              4     strawberry          terry           None           None           None
4              5     watermelon           paul           None           None           None
5              6          peach         edward           None           None           None
6              7         orange         reggie           None           None           None
7              8          lemon       nicholas           None           None           None
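One possible way to get from `df` to the expected result, sketched under a few assumptions: columns are grouped by the key from the first rule; within a group, missing cells are filled from the next column with `Series.where`, so if two columns both hold a value in the same row the earlier column wins (the question does not cover that case); and each output name is built by inserting `_1` after the `_C…` token of the group's first column, which is only inferred from the expected header. `merge_columns` is a hypothetical helper.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'],
     'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'],
     'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'],
     'number_C2_E2': [None, None, '3', None, None, None, None, None],
     'fruit_C22_E2': [None, None, 'blueberry', None, None, None, None, None],
     'name_C222_E2': [None, None, 'anthony', None, None, None, None, None],
     'number_C3_E1': [None, None, '3', '4', None, None, None, None],
     'fruit_C33_E1': [None, None, 'blueberry', 'strawberry', None, None, None, None],
     'name_C333_E1': [None, None, 'anthony', 'terry', None, None, None, None]}
)

# "_C" plus 1 to 3 digits followed by "_" (assumed token format).
C_TOKEN = re.compile(r"_C\d{1,3}(?=_)")


def merge_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Group columns whose names match once the _C token is removed, keeping order.
    groups = {}
    for col in df.columns:
        groups.setdefault(C_TOKEN.sub("", col, count=1), []).append(col)

    out = {}
    for cols in groups.values():
        merged = df[cols[0]]
        # Keep existing values; fill missing cells from the next column in the group.
        # If both columns have a value in a row, the earlier column wins (assumption).
        for col in cols[1:]:
            merged = merged.where(merged.notna(), df[col])
        # Hypothetical naming rule matching the expected header:
        # number_C1_E1 -> number_C1_1_E1
        new_name = C_TOKEN.sub(lambda m: m.group(0) + "_1", cols[0], count=1)
        out[new_name] = merged
    return pd.DataFrame(out)


result = merge_columns(df)
print(result)
```

Columns without a partner (the `_E2` group here) are copied unchanged, so their missing cells stay missing.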
Does anyone have a good solution?