Pandas: shifting columns depending on whether values are NaN

2024-05-08

I have a DataFrame like this:

phone_number_1_clean    phone_number_2_clean    phone_number_3_clean
                 NaN                     NaN                 8546987
             8316589                 8751369                     NaN
             4569874                     NaN                 2645981

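I want phone_number_1_clean to be as populated as possible. This requires shifting values from phone_number_2_clean or phone_number_3_clean into phone_number_1_clean, and likewise making phone_number_2_clean as populated as possible once phone_number_1_clean is filled, and so on.

The output should look something like this: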

phone_number_1_clean    phone_number_2_clean    phone_number_3_clean
             8546987                     NaN                     NaN
             8316589                 8751369                     NaN
             4569874                 2645981                     NaN

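I might be able to do this with np.where statements, but that could get messy.

The approach should preferably be vectorized, since it will be applied to fairly large DataFrames.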

Use:

# for each row, drop the NaNs and build a new Series; these become the rows of the final df
df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
# if the result may have fewer columns than the original df, reindex to restore them
df1 = df1.reindex(columns=range(len(df.columns)))
# assign the original column names
df1.columns = df.columns
print (df1)
  phone_number_1_clean phone_number_2_clean  phone_number_3_clean
0              8546987                  NaN                   NaN
1              8316589              8751369                   NaN
2              4569874              2645981                   NaN
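
To reproduce the result above, here is a minimal self-contained sketch. It assumes the sample data from the question is stored as floats (NaN forces numeric columns to float), and the helper name left_pack_apply is only illustrative, not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN makes every column float64.
df = pd.DataFrame({
    "phone_number_1_clean": [np.nan, 8316589, 4569874],
    "phone_number_2_clean": [np.nan, 8751369, np.nan],
    "phone_number_3_clean": [8546987, np.nan, 2645981],
})


def left_pack_apply(frame):
    """Row-wise variant: drop NaNs per row and pad on the right (hypothetical helper)."""
    # Each row becomes a shorter Series holding only its non-NaN values.
    packed = frame.apply(lambda row: pd.Series(row.dropna().to_numpy()), axis=1)
    # Restore any columns that vanished because no row was completely filled.
    packed = packed.reindex(columns=range(frame.shape[1]))
    packed.columns = frame.columns
    return packed


df1 = left_pack_apply(df)
print(df1)
```

Because apply with axis=1 calls a Python function once per row, this version is the easiest to read but also the slowest of the approaches shown here.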

Or:

s = df.stack()
s.index = [s.index.get_level_values(0), s.groupby(level=0).cumcount()]

df1 = s.unstack().reindex(columns=range(len(df.columns)))
df1.columns = df.columns
print (df1)
  phone_number_1_clean phone_number_2_clean  phone_number_3_clean
0              8546987                  NaN                   NaN
1              8316589              8751369                   NaN
2              4569874              2645981                   NaN
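
One caveat: the stack-based snippet relies on stack() dropping NaN values, which has been the long-standing default. As far as I know, the reimplemented stack in newer pandas versions keeps NaNs that were present in the input, so the following sketch adds an explicit dropna() to behave the same either way. The helper name left_pack_stack and the extra index reindex (to keep rows that are entirely NaN) are my own additions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "phone_number_1_clean": [np.nan, 8316589, 4569874],
    "phone_number_2_clean": [np.nan, 8751369, np.nan],
    "phone_number_3_clean": [8546987, np.nan, 2645981],
})


def left_pack_stack(frame):
    """Stack-based variant with an explicit dropna (hypothetical helper)."""
    # Long format: one entry per (row, column); drop NaNs explicitly.
    s = frame.stack().dropna()
    # Replace the column level by the position of each value within its row.
    position = s.groupby(level=0).cumcount()
    s.index = pd.MultiIndex.from_arrays([s.index.get_level_values(0), position])
    # Back to wide format; restore all-NaN rows and missing trailing columns.
    packed = s.unstack().reindex(index=frame.index, columns=range(frame.shape[1]))
    packed.columns = frame.columns
    return packed


df1 = left_pack_stack(df)
print(df1)
```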

Or use a slightly modified version of the justify function (https://stackoverflow.com/a/47898659):

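import numpy as np
import pandas as pd

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as missing (pass np.nan for NaN-padded data)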
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notnull(a) #changed to pandas notnull
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

df = pd.DataFrame(justify(df.values, invalid_val=np.nan),  
                  index=df.index, columns=df.columns)
print (df)
  phone_number_1_clean phone_number_2_clean phone_number_3_clean
0              8546987                  NaN                  NaN
1              8316589              8751369                  NaN
2              4569874              2645981                  NaN
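
Note that justify builds its output with dtype=object, so the resulting columns are object columns. If the data is purely numeric, a similar left-justification can be sketched with a stable argsort on the NaN mask. This is not part of the justify answer linked above; it is an alternative under the assumption that every column can be converted to float, and the name left_pack_argsort is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "phone_number_1_clean": [np.nan, 8316589, 4569874],
    "phone_number_2_clean": [np.nan, 8751369, np.nan],
    "phone_number_3_clean": [8546987, np.nan, 2645981],
})


def left_pack_argsort(frame):
    """Left-justify non-NaN values per row with a stable sort (hypothetical helper)."""
    values = frame.to_numpy(dtype=float)
    # False (not NaN) sorts before True (NaN); a stable sort keeps the
    # original left-to-right order among the non-NaN values of each row.
    order = np.argsort(np.isnan(values), axis=1, kind="stable")
    packed = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(packed, index=frame.index, columns=frame.columns)


df1 = left_pack_argsort(df)
print(df1)
```

The result stays float64, which avoids a later conversion step, but unlike justify this sketch cannot handle string columns or a custom invalid_val.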

Performance:

#3k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [442]: %%timeit
     ...: df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
     ...: #if possible different number of columns like original df is necessary reindex
     ...: df1 = df1.reindex(columns=range(len(df.columns)))
     ...: #assign original columns names
     ...: df1.columns = df.columns
     ...: 
1.17 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [443]: %%timeit
     ...: s = df.stack()
     ...: s.index = [s.index.get_level_values(0), s.groupby(level=0).cumcount()]
     ...: 
     ...: df1 = s.unstack().reindex(columns=range(len(df.columns)))
     ...: df1.columns = df.columns
     ...: 
     ...: 
5.88 ms ± 74.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [444]: %%timeit
     ...: pd.DataFrame(justify(df.values, invalid_val=np.nan),
     ...:           index=df.index, columns=df.columns)
     ...: 
941 µs ± 131 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
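
The %%timeit cells above need IPython. Outside of it, a comparable measurement can be sketched with the standard timeit module. The helper best_time and the choice to time only the stack-based approach (with an explicit dropna) are assumptions for illustration; absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame({
    "phone_number_1_clean": [np.nan, 8316589, 4569874],
    "phone_number_2_clean": [np.nan, 8751369, np.nan],
    "phone_number_3_clean": [8546987, np.nan, 2645981],
})
# 3k rows, mirroring the benchmark setup above.
big = pd.concat([small] * 1000, ignore_index=True)


def stack_approach(frame):
    """Stack-based left-justification, used here only as a timing target."""
    s = frame.stack().dropna()
    s.index = pd.MultiIndex.from_arrays(
        [s.index.get_level_values(0), s.groupby(level=0).cumcount()]
    )
    out = s.unstack().reindex(index=frame.index, columns=range(frame.shape[1]))
    out.columns = frame.columns
    return out


def best_time(func, frame, number=5, repeat=3):
    """Return the best average seconds per call over several repeats."""
    runs = timeit.repeat(lambda: func(frame), number=number, repeat=repeat)
    return min(runs) / number


seconds = best_time(stack_approach, big)
print(f"stack approach: {seconds * 1e3:.2f} ms per call")
```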
