How to check the correlation between matching columns of two data sets?

2024-05-08

Suppose we have the data sets:

import pandas as pd
a = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
b = pd.DataFrame({"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})

How can I create a correlation matrix in which the y-axis represents "a" and the x-axis represents "b"?

The goal is to see the correlations between the matching columns of the two data sets.


If you don't mind a NumPy-based vectorized solution, you can build on this solution post https://stackoverflow.com/a/30143754/3293881 to Computing the correlation coefficient between two multi-dimensional arrays https://stackoverflow.com/q/30143417/3293881:

corr2_coeff(a.values.T,b.values.T).T # func from linked solution post.
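The `corr2_coeff` helper called above is not defined on this page; it comes from the linked post. A minimal sketch of such a helper, assuming it computes the Pearson correlation between every row of its first argument and every row of its second, might look like the following. The body is a reconstruction written for illustration, not a verbatim copy of the linked answer.

```python
import numpy as np
import pandas as pd


def corr2_coeff(A, B):
    """Pearson correlation between every row of A and every row of B.

    Sketch of the helper referenced in the linked post (assumed behavior):
    result[i, j] is the correlation of A[i] with B[j].
    """
    # Center each row by subtracting its own mean
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)

    # Sum of squared deviations for each row
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)

    # Cross products of deviations, normalized by the row norms
    return (A_mA @ B_mB.T) / np.sqrt(np.outer(ssA, ssB))


a = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 25, 82],
                  "C": [56, 78, 0, 14, 13], "D": [0, 23, 72, 56, 14],
                  "E": [78, 12, 31, 0, 34]})
b = pd.DataFrame({"A": [45, 24, 65, 65, 65], "B": [45, 87, 65, 52, 12],
                  "C": [98, 52, 32, 32, 12], "D": [0, 23, 1, 365, 53],
                  "E": [24, 12, 65, 3, 65]})

# Transpose so that each DataFrame column becomes a row for the helper
result = corr2_coeff(a.values.T, b.values.T).T
```

Because the frames are transposed before the call, each DataFrame column is treated as one variable, and the whole 5x5 matrix is produced without a Python loop.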

Sample run:

In [621]: a
Out[621]: 
    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  23  12
2  78  35   0  72  31
3  84  25  14  56   0
4  26  82  13  14  34

In [622]: b
Out[622]: 
    A   B   C    D   E
0  45  45  98    0  24
1  24  87  52   23  12
2  65  65  32    1  65
3  65  52  32  365   3
4  65  12  12   53  65

In [623]: corr2_coeff(a.values.T,b.values.T).T
Out[623]: 
array([[ 0.71318502, -0.5923714 , -0.9704441 ,  0.48775228, -0.07401011],
       [ 0.0306753 , -0.0705457 ,  0.48801177,  0.34685977, -0.33942737],
       [-0.26626431, -0.01983468,  0.66110713, -0.50872017,  0.68350413],
       [ 0.58095645, -0.55231196, -0.32053858,  0.38416478, -0.62403866],
       [ 0.01652716,  0.14000468, -0.58238879,  0.12936016,  0.28602349]])
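The array above carries no labels. Judging from the printed values, its rows follow the columns of `b` and its columns follow the columns of `a`, because of the trailing `.T`. As a cross-check, the sketch below rebuilds the same numbers with `np.corrcoef`, but puts `a` on the rows and `b` on the columns to match the layout asked for in the question. It then uses `DataFrame.corrwith` to pull out only the matching-column pairs. The names `cross` and `matching` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 25, 82],
                  "C": [56, 78, 0, 14, 13], "D": [0, 23, 72, 56, 14],
                  "E": [78, 12, 31, 0, 34]})
b = pd.DataFrame({"A": [45, 24, 65, 65, 65], "B": [45, 87, 65, 52, 12],
                  "C": [98, 52, 32, 32, 12], "D": [0, 23, 1, 365, 53],
                  "E": [24, 12, 65, 3, 65]})

# np.corrcoef stacks the rows of both inputs into one 10x10 correlation matrix;
# the upper-right block holds the a-versus-b correlations.
n = a.shape[1]
full = np.corrcoef(a.values.T, b.values.T)

# Rows (y-axis) are the columns of a, columns (x-axis) are the columns of b
cross = pd.DataFrame(full[:n, n:], index=a.columns, columns=b.columns)

# Correlation of each column of a with the same-named column of b only
matching = a.corrwith(b)

print(cross.round(3))
print(matching.round(3))
```

`matching` equals the diagonal of `cross`, so `corrwith` is enough when only same-named columns matter. The full matrix is needed only when every pairing is of interest.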