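You can first create the DataFrame with read_csv http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html. Use the parameter names to get a single column, Region Name, and pick a separator that does not occur in the values (e.g. ;). The default , would not work because some values contain commas: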
import pandas as pd

df = pd.read_csv('filename.txt', sep=";", names=['Region Name'])
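Here is a self-contained sketch of this step. It assumes filename.txt has one entry per line, with state rows marked [edit]. The sample lines are taken from the output further down, and the file is simulated with io.StringIO. Passing names also means pandas does not use the first line as a header, so Alabama[edit] stays in the data:

```python
import io

import pandas as pd

# Assumed sample input, rebuilt from the printed output; the real file has more lines
text = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
"""

# ";" never occurs in the data, so each line becomes exactly one value;
# names=[...] implies header=None, so the first line is kept as a data row
df = pd.read_csv(io.StringIO(text), sep=";", names=['Region Name'])
print(df)
```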
Then insert http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.insert.html a new column State. Build it with extract http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extract.html from the rows containing the text [edit], then forward-fill it with ffill. Finally, replace http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html everything from ( to the end of the value in column Region Name:
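df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)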
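To see what each string method returns on its own, here is a small sketch on a hand-made Series (sample values assumed, taken from the output below). extract returns NaN on every row without [edit]. ffill then copies the last state name downwards. The replace pattern strips everything from the first " (" to the end:

```python
import pandas as pd

s = pd.Series([
    'Alabama[edit]',
    'Auburn (Auburn University)[1]',
    'Florence (University of North Alabama)',
    'Alaska[edit]',
    'Fairbanks (University of Alaska Fairbanks)[2]',
])

# One capture group + expand=False returns a Series; rows without [edit] become NaN
states = s.str.extract(r'(.*)\[edit\]', expand=False)

# Forward fill propagates each state name onto the rows below it
filled = states.ffill()

# Remove everything from " (" to the end; regex=True is needed in pandas >= 2.0
towns = s.str.replace(r' \(.+$', '', regex=True)

print(pd.DataFrame({'extract': states, 'ffill': filled, 'replace': towns}))
```

The [edit] rows are left untouched by the replace because they contain no " (". That is why they can still be found and dropped afterwards.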
Finally, remove the rows containing the text [edit] by boolean indexing http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing. The mask is created with str.contains http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html:
df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)
print(df)
      State   Region Name
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
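This runnable sketch puts the three steps together on assumed sample data, simulated with io.StringIO instead of filename.txt. The mask is True on the [edit] rows and ~ inverts it. reset_index(drop=True) then renumbers the remaining rows from 0:

```python
import io

import pandas as pd

# Assumed sample input; pass 'filename.txt' instead of io.StringIO(text) for a real file
text = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
Arizona[edit]
Tempe (Arizona State University)
"""

df = pd.read_csv(io.StringIO(text), sep=";", names=['Region Name'])
df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)

# Boolean mask: True for the state header rows, which still end in [edit]
mask = df['Region Name'].str.contains(r'\[edit\]')
df = df[~mask].reset_index(drop=True)
print(df)
```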
If you need the full values (without removing the text in parentheses), the solution is simpler:
df = pd.read_csv('filename.txt', sep=";", names=['Region Name'])
df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)
print(df)
      State                                        Region Name
0   Alabama                      Auburn (Auburn University)[1]
1   Alabama             Florence (University of North Alabama)
2   Alabama    Jacksonville (Jacksonville State University)[2]
3   Alabama         Livingston (University of West Alabama)[2]
4   Alabama           Montevallo (University of Montevallo)[2]
5   Alabama                          Troy (Troy University)[2]
6   Alabama  Tuscaloosa (University of Alabama, Stillman Co...
7   Alabama                  Tuskegee (Tuskegee University)[5]
8    Alaska      Fairbanks (University of Alaska Fairbanks)[2]
9   Arizona         Flagstaff (Northern Arizona University)[6]
10  Arizona                   Tempe (Arizona State University)
11  Arizona                     Tucson (University of Arizona)
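The two versions differ only in the str.replace step, so one option is to wrap them in a single helper. In the sketch below, load_regions and its keep_details flag are hypothetical names for illustration, not pandas API, and the sample data is assumed:

```python
import io

import pandas as pd


def load_regions(source, keep_details=False):
    """Hypothetical helper: parse a state/region list where state rows end in [edit]."""
    df = pd.read_csv(source, sep=";", names=['Region Name'])
    df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
    if not keep_details:
        # Drop everything from " (" to the end, e.g. "Auburn (Auburn University)[1]" -> "Auburn"
        df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)
    # Remove the state header rows and renumber from 0
    return df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)


# Assumed sample input
text = """Alabama[edit]
Auburn (Auburn University)[1]
Arizona[edit]
Tucson (University of Arizona)
"""

print(load_regions(io.StringIO(text)))
print(load_regions(io.StringIO(text), keep_details=True))
```

Each call needs a fresh io.StringIO (or a file path), because read_csv consumes the buffer.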