我们可以选择行ORG1 != A
and ORG2 == A
with
mask = (df['ORG1'] != 'A')&(df[orgi] == 'A')
mask
那么是一个布尔系列。要将值从 ORG2 复制到 ORG1,我们可以使用
df['ORG1'][mask] = df['ORG2'][mask]
或者,因为我们知道右边的值是A
,我们可以使用
df['ORG1'][mask] = 'A'
复制 AS 列也可以类似地完成。
我们可以找到列值包含某些字符串的行,例如'aureus'
with
df[orgi].str.contains('aureus') == True
str.contains
可以采取任何正则表达式模式作为其参数。
请参阅文档:矢量化字符串方法 http://pandas.pydata.org/pandas-docs/stable/basics.html#vectorized-string-methods.
注意:通常使用就足够了df[orgi].str.contains('aureus')
(没有== True
, 但是由于df[orgi]
可能包含NaN
值,我们还需要映射NaN
s 为 False,所以我们使用df[orgi].str.contains('aureus') == True
.
import pandas as pd
filename = 'data.txt'
df = pd.read_table(filename, delimiter='\s+')
print(df)
# Index ORG1 ORG2 ORG3 AB1 AS101 AS201 AS301 AB2 AS102 AS202 AS302
# 0 1 A NaN NaN pen S NaN NaN dfluc S NaN NaN
# 1 2 A B C pen R S S dfluc S R S
# 2 3 B A B pen S S R dfluc S S R
# 3 4 A NaN NaN pen R NaN NaN dfluc S NaN NaN
# 4 5 A NaN NaN pen R NaN NaN dfluc S NaN NaN
# 5 6 C A A pen S R R dfluc R S R
# 6 7 B NaN A pen R NaN S dfluc S NaN S
# 7 8 A B A pen R R R dfluc R R R
# 8 9 A NaN NaN pen R NaN NaN dfluc S NaN NaN
for i in range(2,4):
orgi = 'ORG{i}'.format(i=i)
# mask = (df['ORG1'] != 'A')&(df[orgi] == 'A')
mask = (df['ORG1'].str.contains('A') == False)&(df[orgi].str.contains('A') == True)
# Move ORGi --> ORG1
df['ORG1'][mask] = df[orgi][mask]
for j in range(1,4):
# Move ASij --> AS1j
source_as = 'AS{i}{j:02d}'.format(i=i, j=j)
target_as = 'AS1{j:02d}'.format(i=i, j=j)
try:
df[target_as][mask] = df[source_as][mask]
except KeyError:
pass
print(df)
yields
Index ORG1 ORG2 ORG3 AB1 AS101 AS201 AS301 AB2 AS102 AS202 AS302
0 1 A NaN NaN pen S NaN NaN dfluc S NaN NaN
1 2 A B C pen R S S dfluc S R S
2 3 A A B pen S S R dfluc S S R
3 4 A NaN NaN pen R NaN NaN dfluc S NaN NaN
4 5 A NaN NaN pen R NaN NaN dfluc S NaN NaN
5 6 A A A pen R R R dfluc S S R
6 7 A NaN A pen S NaN S dfluc S NaN S
7 8 A B A pen R R R dfluc R R R
8 9 A NaN NaN pen R NaN NaN dfluc S NaN NaN
请注意,如果ORG2 == A
and ORG3 == A
,然后是列中的值AS20*
and AS30*
两者都竞争覆盖列中的值AS10*
。我不确定你想赢得哪个价值。在上面的代码中,last列获胜,这将是AS30*
.