Pandas: How to slice a mixed-type MultiIndex in Python 3?

2024-01-12

As I noted in this partially related question https://stackoverflow.com/questions/50097704, it is no longer possible to sort sequences of mixed types in Python 3:

# Python3.6
sorted(['foo', 'bar', 10, 200, 3])
# => TypeError: '<' not supported between instances of 'str' and 'int'
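
For reference, the old Python 2 ordering (numbers before strings) can be emulated on a plain list with a sort key. This is only a minimal sketch: `py2_like_key` is a hypothetical helper name, not part of Python or pandas, and sorting a list this way does not by itself make a MultiIndex sliceable.

```python
def py2_like_key(value):
    """Sort key that groups numbers before strings, roughly as Python 2 did."""
    # The first tuple element decides between the groups (False < True),
    # so an int is never compared directly with a str.
    return (isinstance(value, str), value)


mixed = ['foo', 'bar', 10, 200, 3]
ordered = sorted(mixed, key=py2_like_key)
print(ordered)  # [3, 10, 200, 'bar', 'foo']
```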

This affects slicing queries in pandas. The following example illustrates my problem.

import pandas as pd
import numpy as np
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)

idx=pd.IndexSlice
table.loc[idx[:10,:],:]
# The last line will raise an UnsortedIndexError because 
# 'foo' and 'bar' appear in the wrong order.

The exception message is as follows:

UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'

In Python 2.x, I recovered from this exception by lex-sorting the index:

# Python2.x:
table = table.sort_index()

#               0         1
# 2   2  0.020841  0.717178
# 10  1  1.608883  0.807834
#     3  0.566967  1.978718
# bar 5 -0.683814 -0.382024
# foo 4  0.150284 -0.750709

table.loc[idx[:10,:],:]
#              0         1
# 2  2  0.020841  0.717178
# 10 1  1.608883  0.807834
#    3  0.566967  1.978718

In Python 3, however, I end up with the exception I mentioned at the beginning:

TypeError: '<' not supported between instances of 'str' and 'int'

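How can I recover from this? Converting the index to strings before sorting is not an option, because that breaks the proper ordering of the index: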

# Python2/3
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = list(map(lambda x: tuple(map(str,x)), index))
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)
table = table.sort_index()
#               0         1
# 10  1  0.020841  0.717178
#     3  1.608883  0.807834
# 2   2  0.566967  1.978718
# bar 5 -0.683814 -0.382024
# foo 4  0.150284 -0.750709

With this ordering, value-based slicing is broken.

table.loc[idx[:10,:],:]     # Raises a TypeError
table.loc[idx[:'10',:],:]   # Fails to return the rows at index [2,:]
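
The ordering problem can be reproduced without pandas. The sketch below only illustrates how string comparison works: plain `str()` output is compared character by character, whereas zero-padding to a common width (here 3 digits, an assumption that fits the small non-negative numbers in this example) makes text order agree with numeric order.

```python
numbers = [10, 2, 200, 3]

# Plain str() sorts character by character, so '10' lands before '2'.
as_text = sorted(str(n) for n in numbers)
print(as_text)  # ['10', '2', '200', '3']

# Zero-padding to a common width keeps text order equal to numeric order.
padded = sorted('%03d' % n for n in numbers)
print(padded)  # ['002', '003', '010', '200']
```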

How can I recover from this?

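This is the best I could come up with. The workaround has three steps:

1) Stringify the MultiIndex in such a way that lexicographic sorting reproduces the old mixed-type ordering of Python 2. For example, ints can be left-padded with enough zeros.
2) Sort the table.
3) Apply the same stringification when accessing the table with slices.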

The code (complete example):

import numpy as np
import pandas as pd 

# Stringify whatever needs to be converted.
# In this example only ints are stringified. '%03d' assumes
# non-negative ints below 1000; widen it for larger values.
def toString(x):
    if isinstance(x,int):
        x = '%03d' % x
    return x
# Stringify an index tuple.
def idxToString(idx):
    if isinstance(idx, tuple):
        idx = list(idx)
        for i,x in enumerate(idx):
            idx[i] = toString(x)
        return tuple(idx)
    else:
        return toString(idx)
# Replacement for pd.IndexSlice
class IndexSlice(object):
    @staticmethod
    def _toString(arg):
        if isinstance(arg, slice):
            arg = slice(toString(arg.start),
                        toString(arg.stop),
                        toString(arg.step))
        else:
            arg = toString(arg)
        return arg

    def __getitem__(self, arg):
        if isinstance(arg, tuple):
            return tuple(map(self._toString, arg))
        else:
            return self._toString(arg)

# Build the table.
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)
# 1) Stringify the index.
table.index = table.index.map(idxToString)
# 2) Sort the index.
table = table.sort_index()
# 3) Create an IndexSlice that applies the same
#    stringification rules. (Replaces pd.IndexSlice)
idx = IndexSlice()
# Now, the table rows can be accessed as usual.
table.loc[idx[10],:]
table.loc[idx[:10],:]
table.loc[idx[:'bar',:],:]
table.loc[idx[:,:2],:]
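
The fixed width in `'%03d'` stops working once an int has more than three digits. One possible refinement, sketched here under the assumption that all ints are non-negative, derives the padding width from the values themselves. `make_stringifier` is a hypothetical helper, not part of pandas, and any slice bound passed through it must not have more digits than the largest value it was built from.

```python
def make_stringifier(values):
    """Return a function that zero-pads ints to the width of the largest int."""
    # bool is a subclass of int, so exclude it explicitly.
    ints = [v for v in values if isinstance(v, int) and not isinstance(v, bool)]
    width = len(str(max(ints))) if ints else 1

    def to_string(x):
        if isinstance(x, int) and not isinstance(x, bool):
            return str(x).zfill(width)
        return x  # strings and everything else pass through unchanged

    return to_string


level0 = [10, 10, 2, 'foo', 'bar', 12345]
to_string = make_stringifier(level0)
keys = sorted(to_string(v) for v in level0)
print(keys)  # ['00002', '00010', '00010', '12345', 'bar', 'foo']
```

The returned function could stand in for `toString` above, as long as the same instance is used both for the index and for the slice bounds.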

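This is not very pretty, but it restores the slice-based access to my table data that broke after upgrading to Python 3. If you have better suggestions, I would be glad to read them.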
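
One alternative that might be worth trying, if real slicing is not strictly required, is a boolean mask over the first index level. It needs neither sorting nor stringification. This is a sketch rather than a drop-in replacement for `idx[:10,:]`: `int_at_most` is a hypothetical helper, the data values are made up for the example, and rows come back in their original order instead of sorted order.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(10, 3), (10, 1), (2, 2), ('foo', 4), ('bar', 5)])
table = pd.DataFrame(np.arange(10.0).reshape(5, 2), index=index)


def int_at_most(value, upper):
    """True for integer labels not exceeding `upper`; strings never match."""
    return isinstance(value, (int, np.integer)) and value <= upper


# Test every label of level 0 individually, so ints and strs are
# never compared with each other.
level0 = table.index.get_level_values(0)
mask = np.array([int_at_most(v, 10) for v in level0], dtype=bool)
selected = table.loc[mask]
print(selected.index.tolist())
```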
