AssertionError: Gaps in blk ref_locs when unstack()ing a DataFrame

2024-03-29

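I am trying to unstack() data in a Pandas DataFrame, but I keep getting this error and I don't know why. Below are my code so far and a sample of my data. My attempted fix was to drop every row whose voteId is not a number, but that does not work on my actual dataset. The error occurs both in the Anaconda notebook where I develop and in the production environment where the code is deployed.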

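I can't figure out how to reproduce the error with the sample code... possibly because no type-casting problem arises when the DataFrame is instantiated the way I do it in the example?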
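One way to test the type-casting suspicion is to count which Python types actually occur in the two key columns. The sketch below uses the sample data from the question; the helper name `key_type_counts` is hypothetical, and running the same check on the real dataset is only a suggestion, not something the original post did.

```python
import pandas as pd

# Sample data copied from the commented-out block in the question.
d = {'vote': [100, 50, 1, 23, 55, 67, 89, 44],
     'vote2': [10, 2, 18, 26, 77, 99, 9, 40],
     'ballot1': ['a', 'b', 'a', 'a', 'b', 'a', 'c', 'c'],
     'voteId': [1, 2, 3, 4, 5, 'aaa', 7, 'NaN']}
df1 = pd.DataFrame(d)


def key_type_counts(df, key_columns):
    """Hypothetical helper: count the Python types found in each key column."""
    return {col: df[col].map(lambda v: type(v).__name__).value_counts().to_dict()
            for col in key_columns}


type_counts = key_type_counts(df1, ['voteId', 'ballot1'])
print(type_counts)
```

A key column that reports more than one type (for example both `int` and `str`) is mixed, which may be worth comparing against the real data.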

import pandas as pd

# dataset: simulate likely input
# d = {'vote': [100, 50,1,23,55,67,89,44], 
#      'vote2': [10, 2,18,26,77,99,9,40], 
#      'ballot1': ['a','b','a','a','b','a','c','c'],
#      'voteId':[1,2,3,4,5,'aaa',7,'NaN']}
# df1=pd.DataFrame(d)
#########################################################

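# keep only the last row for each (voteId, ballot1) pair
df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')

# pivot ballot1 out of the row index into the columns
s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
# flatten each (value column, ballot1) label into a single string
s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
dflw=pd.DataFrame(s)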
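For reference, here is a minimal sketch of what the same pipeline does on clean input. The frame `clean` is hypothetical data, chosen so that every (voteId, ballot1) pair is unique and consistently typed; it is not the data that triggers the error.

```python
import pandas as pd

# Hypothetical clean input: unique (voteId, ballot1) pairs, consistent types.
clean = pd.DataFrame({'vote': [100, 50, 1],
                      'vote2': [10, 2, 18],
                      'ballot1': ['a', 'b', 'a'],
                      'voteId': [1, 1, 2]})

wide = clean.set_index(['voteId', 'ballot1'], verify_integrity=True).unstack()
# After unstack(), each column label is a (value column, ballot1) tuple;
# the format string flattens that tuple into a single string label.
wide.columns = wide.columns.map('(ballot1={0[1]}){0[0]}'.format)
flat_columns = list(wide.columns)
print(flat_columns)
```

Combinations missing from the input, such as voteId 2 with ballot1 'b', come out as NaN in the wide frame.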

Full error message / stack trace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-0a520180a8d9> in <module>()
     22 df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
     23 
---> 24 s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
     25 s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
     26 dflw=pd.DataFrame(s)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in unstack(self, level, fill_value)
   4567         """
   4568         from pandas.core.reshape.reshape import unstack
-> 4569         return unstack(self, level, fill_value)
   4570 
   4571     _shared_docs['melt'] = ("""

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
    467     if isinstance(obj, DataFrame):
    468         if isinstance(obj.index, MultiIndex):
--> 469             return _unstack_frame(obj, level, fill_value=fill_value)
    470         else:
    471             return obj.T.stack(dropna=False)

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in _unstack_frame(obj, level, fill_value)
    480         unstacker = partial(_Unstacker, index=obj.index,
    481                             level=level, fill_value=fill_value)
--> 482         blocks = obj._data.unstack(unstacker)
    483         klass = type(obj)
    484         return klass(blocks)

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in unstack(self, unstacker_func)
   4349         new_columns = new_columns[columns_mask]
   4350 
-> 4351         bm = BlockManager(new_blocks, [new_columns, new_index])
   4352         return bm
   4353 

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   3035         self._consolidate_check()
   3036 
-> 3037         self._rebuild_blknos_and_blklocs()
   3038 
   3039     def make_empty(self, axes=None):

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
   3127 
   3128         if (new_blknos == -1).any():
-> 3129             raise AssertionError("Gaps in blk ref_locs")
   3130 
   3131         self._blknos = new_blknos

AssertionError: Gaps in blk ref_locs

To get at the real data that triggers the exception, add extra debugging output.

Modify ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py

Add these lines to class BlockManager():

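def __init__(self, ...):  # keep the existing signature and body
    # new debugging lines, placed after self.blocks and self.axes are assigned
    from pprint import pprint
    print('BlockManager blocks')
    pprint(self.blocks)
    print('BlockManager axes')
    pprint(self.axes)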

You will get this data:



_unstack_frame level -1 fill_value None 

                 vote  vote2
ballot1 voteId              
NaN     xx      100.0   10.0
False   aaa      50.1    2.0
-1      \n        1.0   18.0
True    NaN      23.0   26.0
b       False    55.0   77.0
a       \        67.0   99.0
c                89.0    9.0
        8        44.0    NaN
  

Modify ~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py

# keep the existing signature; only the three debugging lines are new
def _unstack_frame(obj, level, fill_value):
    from pprint import pprint
    print('_unstack_frame level {} fill_value {} {}'.format(level, fill_value, type(obj)))
    pprint(obj)

You will see this data:




BlockManager blocks
(FloatBlock: slice(0, 16, 1), 16 x 8, dtype: float64,)
BlockManager axes
[MultiIndex(levels=[[u'vote', u'vote2'], [False, 8, u'\n', u' ', u'\', u'aaa', u'xx']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], [-1, 0, 1, 2, 3, 4, 5, 6, -1, 0, 1, 2, 3, 4, 5, 6]],
           names=[None, u'voteId']),
 Index([nan, -1, False, True, u'', u'a', u'b', u'c'], dtype='object', name=u'ballot1')]
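The index levels printed above mix booleans, integers, strings and NaN. One experiment, which the original post does not confirm as a fix, is to convert both key columns to strings before building the index, so that every label has the same type. The helper `unstack_with_string_keys` and the frame `mixed` below are hypothetical; the values only imitate the debugging output.

```python
import pandas as pd


def unstack_with_string_keys(df, row_key, col_key):
    """Hypothetical helper: stringify both key columns, then unstack col_key."""
    out = df.copy()
    # map(str) turns every label, including NaN and booleans, into a string.
    out[row_key] = out[row_key].map(str)
    out[col_key] = out[col_key].map(str)
    # Stringifying can create new duplicates (e.g. 1 and '1'), so dedupe afterwards.
    out = out.drop_duplicates([row_key, col_key], keep='last')
    return out.set_index([row_key, col_key], verify_integrity=True).unstack()


# Made-up rows with mixed-type keys, loosely modelled on the output above.
mixed = pd.DataFrame({'vote': [100.0, 50.1, 1.0, 23.0],
                      'ballot1': ['a', False, -1, 'a'],
                      'voteId': ['xx', 'aaa', 8, float('nan')]})
wide = unstack_with_string_keys(mixed, 'voteId', 'ballot1')
print(wide)
```

Note that this changes the labels themselves (NaN becomes the string 'nan'), so it is better suited to diagnosis than to production output.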

  

I did manage to trigger the exception with another example:



  File "/usr/lib64/python2.7/site-packages/pandas/core/internals.py", line 2902, in _rebuild_blknos_and_blklocs
    raise AssertionError("Gaps in blk ref_locs")
AssertionError: Gaps in blk ref_locs


  

With the debugging output:



BlockManager blocks
(FloatBlock: [-1, -1, -1], 3 x 2, dtype: float64,)
BlockManager axes
[Index([aaa, bbb, ccc], dtype='object'), Int64Index([0, 1], dtype='int64')]
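If editing files under site-packages is not an option, similar information can be collected from the calling side through the public API only. This is a minimal sketch under that assumption; the wrapper name `unstack_with_diagnostics` is hypothetical, and it expects at least two key columns so that the index is a MultiIndex.

```python
from pprint import pprint

import pandas as pd


def unstack_with_diagnostics(df, keys):
    """Hypothetical wrapper: unstack, and dump the input if pandas asserts."""
    indexed = df.set_index(keys, verify_integrity=True)
    try:
        return indexed.unstack()
    except AssertionError:
        # Report what was being unstacked, then re-raise the original error.
        print('unstack failed; dtypes of the value columns:')
        pprint(indexed.dtypes.to_dict())
        print('index levels:')
        pprint([list(level) for level in indexed.index.levels])
        raise


# Hypothetical clean input, used only to show the call.
clean = pd.DataFrame({'vote': [100, 50, 1],
                      'ballot1': ['a', 'b', 'a'],
                      'voteId': [1, 1, 2]})
ok = unstack_with_diagnostics(clean, ['voteId', 'ballot1'])
print(ok)
```

On clean input the wrapper simply returns the unstacked frame; the diagnostic branch runs only when `unstack()` raises an AssertionError.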

  