我想将 Excel 工作表读入 Pandas DataFrame。但是,存在合并的 Excel 单元格以及空行(完整/部分NaN
填充),如下图所示。需要澄清的是,John H. 已下订单购买从《The Bodyguard》到《Red Pill Blues》的所有专辑。
当我将此 Excel 工作表读入 Pandas DataFrame 时,Excel 数据无法正确传输。 Pandas 将合并的单元格视为一个单元格。数据框如下所示:(注意:()中的值是我想要的值)
请注意,最后一行不包含合并单元格;它只具有以下价值Artist
柱子。
EDIT:
I did try the following to forward-fill in the NaN values:(
Pandas: Reading Excel with merged cells)
df.index = pd.Series(df.index).fillna(method='ffill')
但是,那NaN
价值观依然存在。我可以使用什么策略或方法来正确填充 DataFrame?有没有取消合并单元格并复制相应内容的 Pandas 方法?
您尝试转发的引用链接仅需要填写index柱子。对于您的用例,您需要fillna
for all数据框列。因此,只需前向填充整个数据帧:
df = pd.read_excel("Input.xlsx")
print(df)
# Order_ID Customer_name Album_Name Artist Quantity
# 0 NaN NaN RadioShake NaN NaN
# 1 1.0 John H. The Bodyguard Whitney Houston 2.0
# 2 NaN NaN Lemonade Beyonce 1.0
# 3 NaN NaN The Thrill Of It All Sam Smith 2.0
# 4 NaN NaN Thriller Michael Jackson 11.0
# 5 NaN NaN Divide Ed Sheeran 4.0
# 6 NaN NaN Reputation Taylor Swift 3.0
# 7 NaN NaN Red Pill Blues Maroon 5 5.0
df = df.fillna(method='ffill')
print(df)
# Order_ID Customer_name Album_Name Artist Quantity
# 0 NaN NaN RadioShake NaN NaN
# 1 1.0 John H. The Bodyguard Whitney Houston 2.0
# 2 1.0 John H. Lemonade Beyonce 1.0
# 3 1.0 John H. The Thrill Of It All Sam Smith 2.0
# 4 1.0 John H. Thriller Michael Jackson 11.0
# 5 1.0 John H. Divide Ed Sheeran 4.0
# 6 1.0 John H. Reputation Taylor Swift 3.0
# 7 1.0 John H. Red Pill Blues Maroon 5 5.0
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)