Index-Based Access with the Pandas DataFrame iloc Property

2023-10-20

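The iloc property in the Pandas library stands for "integer location" and provides integer-based indexing for selection by position.

This means you can select rows and columns of a DataFrame by their integer positions, which start at 0.

In this tutorial, we cover the main uses of iloc: selecting single rows, multiple rows, specific columns, and individual cells, as well as setting values. We also look at more advanced techniques such as boolean indexing.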

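Table of Contents

1 Select a Single Row by Integer Index
2 Select Multiple Rows Using a List of Integer Indices
3 Slice Rows Using a Range of Integer Indices
4 Select a Single Column by Integer Index
5 Select Multiple Columns Using a List of Integer Indices
6 Slice Columns Using a Range of Integer Indices
7 Select a Single Cell by Specifying Row and Column Indices
8 Select Rows for Specific Columns Using Lists of Indices
9 Slice Rows and Columns Using Ranges of Integer Indices
10 Set the Value of a Specific Cell
11 Set Values for a Row or a Group of Rows
12 Set Values for a Column or a Group of Columns
13 Set Values for a Range of Cells (Rows and Columns)
14 Boolean Indexing (Use Boolean Arrays/Masks)
- 14.1 Basic Boolean Indexing
- 14.2 Combining Multiple Conditions
15 Error Handling and Common Pitfalls
16 Resource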

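Select a Single Row by Integer Index

You can retrieve an entire row by passing the integer position of that row to iloc. The result is a Series whose index holds the column names.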


import pandas as pd
data = {
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)

# Select the second row
selected_row = df.iloc[1]
print(selected_row)

Output:


Name            Doe
Age              34
City    Los Angeles
Name: 1, dtype: object

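Select Multiple Rows Using a List of Integer Indices

Sometimes you want to retrieve several rows at once based on their positions. With iloc, you can do this by passing a list of integer indices.

This returns a new DataFrame containing only the rows at the specified positions.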


# Select the first and third rows
selected_rows = df.iloc[[0, 2]]
print(selected_rows)

Output:


   Name  Age       City
0  John   28   New York
2  Jane   22    Chicago

By passing a list containing 0 and 2 to iloc, we retrieved the first and third rows of the DataFrame.
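The positions in the list do not have to be sorted, and they can be negative. The sketch below rebuilds the same sample DataFrame so that it runs on its own; the names reordered and last_row are illustrative only and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

# Rows come back in the order the positions are listed
reordered = df.iloc[[2, 0]]
print(reordered)

# Negative positions count from the end, as with Python lists
last_row = df.iloc[[-1]]
print(last_row)
```

Because the positions are wrapped in a list, both results are DataFrames, even when only one row is selected.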

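Slice Rows Using a Range of Integer Indices

iloc also lets you slice rows using range-based indexing.

You specify a start index and a stop index, plus an optional step. With the default step of 1, this returns a run of consecutive rows from the DataFrame.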


# Select the first three rows
sliced_rows = df.iloc[0:3]
print(sliced_rows)

Output:


   Name  Age         City
0  John   28     New York
1   Doe   34  Los Angeles
2  Jane   22      Chicago

In this example, we start at index 0 (inclusive) and go up to, but not including, index 3.
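The optional step is not shown in the example above. As a minimal sketch, assuming the same sample data (rebuilt here so the snippet is self-contained), a step works the same way as in ordinary Python slicing. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

# Every second row, starting from the first (positions 0 and 2)
every_other = df.iloc[::2]
print(every_other)

# Start at position 1, stop before 4, step by 2 (positions 1 and 3)
odd_rows = df.iloc[1:4:2]
print(odd_rows)

# A negative step walks the rows in reverse order
reversed_rows = df.iloc[::-1]
print(reversed_rows)
```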

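Select a Single Column by Integer Index

Columns lie along the second axis (axis=1). With iloc, you can select an individual column by its integer index.

Note that when you extract a single column with an integer index in iloc, the result is a Series object, not a DataFrame.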


# Select the first column
selected_column = df.iloc[:, 0]
print(selected_column)

Output:


0     John
1      Doe
2     Jane
3    Smith
Name: Name, dtype: object

The colon : in the row position means "all rows", and the 0 after the comma selects the first column.
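If you need a one-column DataFrame rather than a Series, one option is to pass the position inside a list. The following sketch, which rebuilds the sample data and uses illustrative variable names, contrasts the two forms.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

# A bare integer in the column position returns a Series
as_series = df.iloc[:, 0]
print(type(as_series).__name__)

# A one-element list in the column position returns a DataFrame
as_frame = df.iloc[:, [0]]
print(type(as_frame).__name__)
print(as_frame.shape)
```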

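Select Multiple Columns Using a List of Integer Indices

Just as you can select multiple rows with a list of indices, iloc supports selecting multiple columns by passing a list of integer column indices.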


# Select the first and third columns
selected_columns = df.iloc[:, [0, 2]]
print(selected_columns)

Output:


    Name         City
0   John     New York
1    Doe  Los Angeles
2   Jane      Chicago
3  Smith      Houston

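In this example, we target the first and third columns by placing the list [0, 2] in the column position.

Slice Columns Using a Range of Integer Indices

You can combine iloc with range-based indexing to select a contiguous group of columns by position: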


# Select the first two columns
sliced_columns = df.iloc[:, 0:2]
print(sliced_columns)

Output:


    Name  Age
0   John   28
1    Doe   34
2   Jane   22
3  Smith   45

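Here we used the range 0:2 in the column position of iloc.

This selects the columns from index 0 (inclusive) up to, but not including, index 2.

Select a Single Cell by Specifying Row and Column Indices

Using iloc, you can pinpoint and extract the value of a single cell by specifying both its row and column integer indices.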


# Select the cell from the second row and first column
cell_value = df.iloc[1, 0]
print(cell_value)

Output:

Doe

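In this snippet, iloc[1, 0] targets the cell in the second row and first column. The result is the name "Doe".

Select Rows for Specific Columns Using Lists of Indices

You can select multiple rows and specific columns at the same time by passing a list of integer indices for each dimension.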


# Select the first and third rows for the first and third columns
subset = df.iloc[[0, 2], [0, 2]]
print(subset)

Output:


   Name       City
0  John   New York
2  Jane    Chicago

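In this example, we passed the same list, [0, 2], for both the row indices and the column indices.

This retrieves the first and third rows and, within those rows, only the first and third columns.

Slice Rows and Columns Using Ranges of Integer Indices

With iloc, you can slice rows and columns with ranges at the same time, which yields a sub-DataFrame.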


# Select the first three rows and first two columns
subset = df.iloc[0:3, 0:2]
print(subset)

Output:


   Name  Age
0  John   28
1   Doe   34
2  Jane   22

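In this code, we combined two ranges: 0:3 for the rows and 0:2 for the columns.

This selects the first three rows and the first two columns.

Set the Value of a Specific Cell

Using iloc, you can set the value of any specific cell by specifying its row and column indices.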


# Set the value of the cell in the second row and first column to 'Alex'
df.iloc[1, 0] = 'Alex'
print(df)

Output:


    Name  Age         City
0   John   28     New York
1   Alex   34  Los Angeles
2   Jane   22      Chicago
3  Smith   45      Houston

Set Values for a Row or a Group of Rows

With the iloc property, you can update the values of an entire row or a group of rows:


# Set values for the third row
df.iloc[2] = ['Ella', 30, 'Seattle']

# Set values for the first and fourth rows
df.iloc[[0, 3]] = [['Bob', 29, 'Boston'], ['Lucas', 47, 'Miami']]
print(df)

Output:


    Name  Age         City
0    Bob   29       Boston
1   Alex   34  Los Angeles
2   Ella   30      Seattle
3  Lucas   47        Miami

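In this example, we first set new values for the third row with df.iloc[2] = ['Ella', 30, 'Seattle'], replacing the data for "Jane".

We then targeted the first and fourth rows and assigned new values to both at once.

Set Values for a Column or a Group of Columns

The iloc property lets you update an entire column, or several columns, at once: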


# Set values for the 'Age' column
df.iloc[:, 1] = [35, 36, 31, 48]

# Set values for the 'Name' and 'City' columns
df.iloc[:, [0, 2]] = [['Mia', 'Atlanta'], ['Liam', 'Dallas'], ['Sophia', 'Denver'], ['Ethan', 'Phoenix']]
print(df)

Output:


     Name  Age      City
0     Mia   35   Atlanta
1    Liam   36    Dallas
2  Sophia   31    Denver
3   Ethan   48   Phoenix

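Here we first targeted the "Age" column with df.iloc[:, 1] and assigned a new list of ages.

Next, we set the values of the "Name" and "City" columns at the same time.

Set Values for a Range of Cells (Rows and Columns)

The iloc property lets you update a block of cells spanning several rows and columns by slicing out exactly the region you want to modify.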


# Set values for the cells in the first two rows and last two columns
df.iloc[0:2, 1:3] = [[40, 'Orlando'], [37, 'Sacramento']]
print(df)

Output:


     Name  Age        City
0     Mia   40     Orlando
1    Liam   37  Sacramento
2  Sophia   31      Denver
3   Ethan   48     Phoenix

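In the demonstration above, we selected the block of cells spanning the first two rows and the last two columns of the DataFrame.

Using df.iloc[0:2, 1:3], we specified this range and set new values in the "Age" and "City" columns of the corresponding rows.

Keep in mind that when you update a range of cells, the shape of the values you assign should match the shape of the range you are targeting. Otherwise the assignment may fail or leave the data inconsistent.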
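To illustrate the shape requirement, the sketch below targets a 2 x 2 block but supplies rows with three values each. It rebuilds the original sample data so that it runs on its own; the raised flag and the 'extra' values are illustrative only. In the pandas versions this sketch was written against, the mismatch surfaces as a ValueError; the exact message may differ between versions, so it is printed rather than assumed.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

raised = False
try:
    # Target block is 2 rows x 2 columns, but each row supplies 3 values
    df.iloc[0:2, 1:3] = [[40, 'Orlando', 'extra'],
                         [37, 'Sacramento', 'extra']]
except ValueError as e:
    raised = True
    print(f"Error: {e}")

print(raised)
```

Checking that the assigned values form a block with the same number of rows and columns as the target slice avoids this failure.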

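Boolean Indexing (Use Boolean Arrays/Masks)

Instead of selecting rows or columns by integer index, you can filter rows on a condition using an array of boolean values (True or False).

Let's look at how to use boolean arrays/masks with iloc to refine your DataFrame selections.

Basic Boolean Indexing

The examples in this section assume df has been re-created with the original data from the first example, since the preceding sections modified it. First, create a boolean mask from a condition: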


# Create a boolean mask for rows where Age is greater than 35
age_mask = df['Age'] > 35

Now use this mask with iloc:


filtered_data = df.iloc[age_mask.values]
print(filtered_data)

Output:


    Name  Age     City
3  Smith   45  Houston

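In this example, we first build a boolean mask, age_mask, that marks the rows where "Age" is greater than 35. When the mask is applied with iloc, only the rows whose mask value is True are kept. The .values attribute converts the boolean Series into a plain NumPy array, which is the form iloc expects for a mask.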
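As a side note, Series.to_numpy() is the accessor the pandas documentation recommends over .values, and label-based loc accepts the boolean Series directly. The sketch below, which assumes the original sample data and uses illustrative variable names, shows that both routes select the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

age_mask = df['Age'] > 35

# iloc takes a positional boolean array, so convert the Series first
by_position = df.iloc[age_mask.to_numpy()]

# loc aligns on the index and accepts the boolean Series as-is
by_label = df.loc[age_mask]

print(by_position)
print(by_position.equals(by_label))
```

For condition-based filtering, loc is often the more direct choice; iloc with a mask is mainly useful when the mask is already a plain array of positions.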

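Combining Multiple Conditions

You can combine multiple conditions with the bitwise operators & (and), | (or), and ~ (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons.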


# Create a mask for rows where Age is greater than 35 and City is 'Houston'
combined_mask = (df['Age'] > 35) & (df['City'] == 'Houston')
filtered_data = df.iloc[combined_mask.values]
print(filtered_data)

Output:


    Name  Age     City
3  Smith   45  Houston

Here we filter for entries where the person is older than 35 and lives in "Houston".
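The example above covers only &. As a minimal sketch of the other two operators, again assuming the original sample data and using illustrative mask names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

# OR: rows where Age is under 25 or City is 'Houston'
either_mask = (df['Age'] < 25) | (df['City'] == 'Houston')
either_rows = df.iloc[either_mask.values]
print(either_rows)

# NOT: rows where Age is not greater than 30
not_mask = ~(df['Age'] > 30)
not_rows = df.iloc[not_mask.values]
print(not_rows)
```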

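Error Handling and Common Pitfalls

Navigating a Pandas DataFrame with iloc is usually smooth and intuitive. However, there are some pitfalls and errors you may run into.

One common mistake is trying to access a position that does not exist in the DataFrame, which raises an IndexError.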


# Attempting to access the fifth row in a DataFrame with only four rows
# will raise an error.
try:
    print(df.iloc[4])
except IndexError as e:
    print(f"Error: {e}")

Output:


Error: single positional indexer is out-of-bounds

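To avoid this, make sure the indices you pass fall within the valid range of the DataFrame: for rows, 0 through len(df) - 1, or the equivalent negative positions.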
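One way to guard against this is to compare the position with len(df) before indexing. The helper get_row_or_none below is hypothetical (it is not part of pandas) and is only a sketch of that check. The snippet also shows that slices, unlike single positions, are clipped to the rows that exist instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

def get_row_or_none(frame, position):
    """Hypothetical helper: return the row at `position`, or None if out of range."""
    if -len(frame) <= position < len(frame):
        return frame.iloc[position]
    return None

# Position 4 does not exist in a four-row DataFrame
print(get_row_or_none(df, 4))

# Position -1 is the last row
print(get_row_or_none(df, -1))

# An out-of-range slice does not raise; it returns the rows that exist
tail = df.iloc[2:10]
print(tail)
```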

Resource

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
