我正在尝试阅读https://www.whatdotheyknow.com/request/193811/response/480664/attach/3/GCSE%20IGCSE%20results%20v3.xlsx https://www.whatdotheyknow.com/request/193811/response/480664/attach/3/GCSE%20IGCSE%20results%20v3.xlsx使用熊猫。
保存后我的脚本是
import sys
import pandas as pd
inputfile = sys.argv[1]
xl = pd.ExcelFile(inputfile)
# print xl.sheet_names
df = xl.parse(xl.sheet_names[0])
print df.head()
然而,这似乎没有正确处理标头,因为它给出了
GCSE and IGCSE1 results2,3 in selected subjects4 of pupils at the end of key stage 4 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10
0 Year: 2010/11 (Final) NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Coverage: England NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1. Includes International GCSE, Cambridge Inte... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2. Includes attempts and achievements by these... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
所有这些都应该被视为评论。
例如,如果将电子表格加载到 libreoffice 中,您可以看到列标题已正确解析并显示在第 15 行中,并带有下拉菜单,可供您选择所需的项目。
如何让 pandas 像 libreoffice 一样自动检测列标题的位置?
pandas
正在(是?)正确地处理文件,并且完全按照您要求的方式(他们?)。你没有指定一个header
value,这意味着它默认从第0行开始获取列名。前几行单元格在某种程度上并不是注释,它们只是不是您感兴趣的单元格。
简单的告诉parse
你想跳过一些行:
>>> xl = pd.ExcelFile("GCSE IGCSE results v3.xlsx")
>>> df = xl.parse(xl.sheet_names[0], skiprows=14)
>>> df.columns
Index([u'Local Authority Number', u'Local Authority Name', u'Local Authority Establishment Number', u'Unique Reference Number', u'School Name', u'Town', u'Number of pupils at the end of key stage 4', u'Number of pupils attempting a GCSE or an IGCSE', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-G', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-A', u'Number of students achieving 5 A*-A grades or more at GCSE or IGCSE'], dtype='object')
>>> df.head()
Local Authority Number Local Authority Name \
0 201 City of london
1 201 City of london
2 202 Camden
3 202 Camden
4 202 Camden
Local Authority Establishment Number Unique Reference Number \
0 2016005 100001
1 2016007 100003
2 2024104 100049
3 2024166 100050
4 2024196 100051
School Name Town \
0 City of London School for Girls London
1 City of London School London
2 Haverstock School London
3 Parliament Hill School London
4 Regent High School London
Number of pupils at the end of key stage 4 \
0 105
1 140
2 200
3 172
4 174
Number of pupils attempting a GCSE or an IGCSE \
0 104
1 140
2 194
3 169
4 171
Number of students achieving 8 or more GCSE or IGCSE passes at A*-G \
0 100
1 108
2 SUPP
3 22
4 0
Number of students achieving 8 or more GCSE or IGCSE passes at A*-A \
0 87
1 75
2 0
3 7
4 0
Number of students achieving 5 A*-A grades or more at GCSE or IGCSE
0 100
1 123
2 0
3 34
4 SUPP
[5 rows x 11 columns]
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)