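You can use pandas.crosstab to generate a contingency table from a DataFrame. From the documentation:
Compute a simple cross tabulation of two (or more) factors. By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.
Here is a usage example: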
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
# Some fake data.
n = 5 # Number of samples.
d = 3 # Dimensionality.
c = 2 # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])
# Contingency table.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])
# Chi-square test of independence.
chi2, p, dof, expected = chi2_contingency(contingency)
For one random draw, data yields a 2×2 contingency table (the original tables showing data and contingency are not reproduced here).
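Then scipy.stats.chi2_contingency(contingency) returns (0.052, 0.819, 1, array([[1.6, 0.4], [2.4, 0.6]])): the test statistic, the p-value, the degrees of freedom, and the expected frequencies, in that order.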
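Because the example draws random data without a seed, rerunning it will usually give different numbers. As a minimal sketch, the snippet below replaces the random draw with five hand-picked rows. These rows are an assumption, not the original data: they were chosen so that the cross-tabulation is [[1, 1], [3, 0]], which is consistent with the expected frequencies and statistic quoted above.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical observations (not the original random draw), picked so that
# the cross-tabulation of CAT1 against CAT2 is [[1, 1], [3, 0]].
data = pd.DataFrame({
    'CAT1': [0, 0, 1, 1, 1],
    'CAT2': [0, 1, 0, 0, 0],
})

# Frequency table: rows are CAT1 levels, columns are CAT2 levels.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])

# With one degree of freedom (a 2x2 table), chi2_contingency applies
# Yates' continuity correction by default (correction=True).
chi2, p, dof, expected = chi2_contingency(contingency)

print(contingency)
print(round(chi2, 3), round(p, 3), dof)
print(expected)
```

With these rows the rounded statistic, p-value, degrees of freedom, and expected frequencies should agree with the tuple reported above. Note that every expected frequency here is below 5, so the chi-square approximation is rough for a sample this small; the example only illustrates the mechanics.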