pandas KeyError: index label not found when using floats

2024-02-11

I have the following problem:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))

We can see that 0.47 is in there:

[ 0.      0.0025  0.005   0.0075  0.01    0.0125  0.015   0.0175  0.02
  0.0225  0.025   0.0275  0.03    0.0325  0.035   0.0375  0.04    0.0425
  0.045   0.0475  0.05    0.0525  0.055   0.0575  0.06    0.0625  0.065
  0.0675  0.07    0.0725  0.075   0.0775  0.08    0.0825  0.085   0.0875
  0.09    0.0925  0.095   0.0975  0.1     0.1025  0.105   0.1075  0.11
  0.1125  0.115   0.1175  0.12    0.1225  0.125   0.1275  0.13    0.1325
  0.135   0.1375  0.14    0.1425  0.145   0.1475  0.15    0.1525  0.155
  0.1575  0.16    0.1625  0.165   0.1675  0.17    0.1725  0.175   0.1775
  0.18    0.1825  0.185   0.1875  0.19    0.1925  0.195   0.1975  0.2
  0.2025  0.205   0.2075  0.21    0.2125  0.215   0.2175  0.22    0.2225
  0.225   0.2275  0.23    0.2325  0.235   0.2375  0.24    0.2425  0.245
  0.2475  0.25    0.2525  0.255   0.2575  0.26    0.2625  0.265   0.2675
  0.27    0.2725  0.275   0.2775  0.28    0.2825  0.285   0.2875  0.29
  0.2925  0.295   0.2975  0.3     0.3025  0.305   0.3075  0.31    0.3125
  0.315   0.3175  0.32    0.3225  0.325   0.3275  0.33    0.3325  0.335
  0.3375  0.34    0.3425  0.345   0.3475  0.35    0.3525  0.355   0.3575
  0.36    0.3625  0.365   0.3675  0.37    0.3725  0.375   0.3775  0.38
  0.3825  0.385   0.3875  0.39    0.3925  0.395   0.3975  0.4     0.4025
  0.405   0.4075  0.41    0.4125  0.415   0.4175  0.42    0.4225  0.425
  0.4275  0.43    0.4325  0.435   0.4375  0.44    0.4425  0.445   0.4475
  0.45    0.4525  0.455   0.4575  0.46    0.4625  0.465   0.4675  0.47
  0.4725  0.475   0.4775  0.48    0.4825  0.485   0.4875  0.49    0.4925
  0.495   0.4975  0.5     0.5025  0.505   0.5075  0.51    0.5125  0.515
  0.5175  0.52    0.5225  0.525   0.5275  0.53    0.5325  0.535   0.5375
  0.54    0.5425  0.545   0.5475  0.55    0.5525  0.555   0.5575  0.56
  0.5625  0.565   0.5675  0.57    0.5725  0.575   0.5775  0.58    0.5825
  0.585   0.5875  0.59    0.5925  0.595   0.5975  0.6     0.6025  0.605
  0.6075  0.61    0.6125  0.615   0.6175  0.62    0.6225  0.625   0.6275
  0.63    0.6325  0.635   0.6375  0.64    0.6425  0.645   0.6475  0.65
  0.6525  0.655   0.6575  0.66    0.6625  0.665   0.6675  0.67    0.6725
  0.675   0.6775  0.68    0.6825  0.685   0.6875  0.69    0.6925  0.695
  0.6975  0.7     0.7025  0.705   0.7075  0.71    0.7125  0.715   0.7175
  0.72    0.7225  0.725   0.7275  0.73    0.7325  0.735   0.7375  0.74
  0.7425  0.745   0.7475  0.75    0.7525  0.755   0.7575  0.76    0.7625
  0.765   0.7675  0.77    0.7725  0.775   0.7775  0.78    0.7825  0.785
  0.7875  0.79    0.7925  0.795   0.7975  0.8     0.8025  0.805   0.8075
  0.81    0.8125  0.815   0.8175  0.82    0.8225  0.825   0.8275  0.83
  0.8325  0.835   0.8375  0.84    0.8425  0.845   0.8475  0.85    0.8525
  0.855   0.8575  0.86    0.8625  0.865   0.8675  0.87    0.8725  0.875
  0.8775  0.88    0.8825  0.885   0.8875  0.89    0.8925  0.895   0.8975
  0.9     0.9025  0.905   0.9075  0.91    0.9125  0.915   0.9175  0.92
  0.9225  0.925   0.9275  0.93    0.9325  0.935   0.9375  0.94    0.9425
  0.945   0.9475  0.95    0.9525  0.955   0.9575  0.96    0.9625  0.965
  0.9675  0.97    0.9725  0.975   0.9775  0.98    0.9825  0.985   0.9875
  0.99    0.9925  0.995   0.9975  1.    ]

Now if I try, for example, df[0.47], I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

I don't understand why this happens.


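The problem here is floating-point imprecision: the label actually stored in the index is not exactly 0.47. (Note also that df[0.47] is a column lookup, as the _getitem_column frames in the traceback show; selecting a row by label means df.loc[0.47], which would still fail here for the floating-point reason.) You can use the get_slice_bound method to return the ordinal position of that row: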

In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]

Out[237]:
0    0.854001
Name: 0.47, dtype: float64

We can see the actual value of that index label:

In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
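
To check the mismatch yourself, here is a minimal sketch that compares the literal 0.47 against the stored labels using a tolerance instead of exact equality. The seeded generator and the names mask, matches and stored_label are my own choices for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the example is repeatable (assumption)
df = pd.DataFrame(rng.random(401), index=np.linspace(0, 1, 401))

# Boolean mask: which index labels are within floating-point tolerance of 0.47?
mask = np.isclose(df.index.to_numpy(), 0.47)

# A boolean array passed to df[...] selects rows, not columns
matches = df[mask]
stored_label = matches.index[0]

print(repr(stored_label))    # the value actually stored in the index
print(stored_label == 0.47)  # exact comparison against the literal
print(matches)
```

np.isclose uses its default tolerances here, which are far smaller than the 0.0025 spacing between labels, so at most one label can match.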

Although pandas does support Float64Index, exact label lookups on it are fragile for this reason, so you are better off sticking with the default Int64Index.
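
One way to follow that advice, sketched under the assumption that the data sits on a regular grid with a known spacing: keep the default integer index, store the float coordinate as an ordinary column, and convert a coordinate to its row number. The column names "x" and "value" and the helper row_for are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd

step = 0.0025  # grid spacing of np.linspace(0, 1, 401)
rng = np.random.default_rng(0)  # seeded only for repeatability (assumption)

# Default integer index; the float coordinate lives in a regular column
df = pd.DataFrame({"x": np.linspace(0, 1, 401), "value": rng.random(401)})

def row_for(x, step=step):
    """Hypothetical helper: map a float coordinate to its integer row label."""
    return int(round(x / step))

row = df.loc[row_for(0.47)]  # exact integer lookup, no float comparison
print(row)
```

Rounding the quotient absorbs the tiny floating-point error, so the lookup itself is an exact integer match.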

get_slice_bound is an undocumented method, but the docstring gives you enough information:

Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}
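
A caveat worth spelling out: for a label that is not present in a sorted index, the left slice bound is the position of the first label greater than or equal to it. That lands on the right row here only because the stored label is slightly above 0.47; had it been slightly below, the bound would point at the next row. Below is a rough sketch of a lookup that does not depend on the direction of the rounding error. It uses Index.searchsorted as a stand-in for the slice-bound search, which is my assumption about equivalent behaviour on a sorted index, not something the docstring states.

```python
import numpy as np
import pandas as pd

idx = pd.Index(np.linspace(0, 1, 401))
target = 0.47

# Position of the first label >= target (the index must be sorted)
pos = idx.searchsorted(target, side="left")

# The closest label is either at pos or just before it; pick the nearer one
candidates = [p for p in (pos - 1, pos) if 0 <= p < len(idx)]
best = min(candidates, key=lambda p: abs(idx[p] - target))

print(best, repr(idx[best]))
```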

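You can also use get_loc (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_loc.html) and pass method='nearest' to achieve the same thing (this argument exists in the pandas version used here; it was deprecated in pandas 1.4 and removed in 2.0):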

In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]

Out[240]:
0    0.854001
Name: 0.47, dtype: float64
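
On newer pandas releases, where get_loc no longer accepts method, a similar nearest-label lookup can be sketched with Index.get_indexer, which takes a list of targets plus method and tolerance. The tolerance of 1e-9 is an arbitrary choice for this example; pick one that is well below your grid spacing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for repeatability (assumption)
df = pd.DataFrame(rng.random(401), index=np.linspace(0, 1, 401))

# get_indexer returns one position per target, or -1 when nothing is
# within the tolerance; method="nearest" needs a monotonic index
pos = df.index.get_indexer([0.47], method="nearest", tolerance=1e-9)[0]
if pos == -1:
    raise KeyError(0.47)

row = df.iloc[pos]
print(row)

# A value that is not on the grid is rejected rather than silently snapped
missing = df.index.get_indexer([0.4712], method="nearest", tolerance=1e-9)[0]
print(missing)
```

Passing a tolerance is the main design choice here: without it, nearest would return some row for any query, including values that were never on the grid.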