I have a data frame similar to the example below (this is a small excerpt of my actual data frame).
frequencies <- data.frame(sex=c("female", "female", "male", "male", "female", "female", "male", "male", "female", "female", "male", "male", "female", "female", "male", "male"),
ecotype=c("Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave"),
contig_ID=c("Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367",
"Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481"),
allele=c("p", "p", "p", "p", "q", "q", "q", "q", "p", "p", "p", "p", "q", "q", "q", "q"),
frequency=c(157, 98, 140, 65, 29, 8, 26, 9, 182, 108, 147, 80, 46, 4, 49, 4))
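For each combination of "contig_ID" and "ecotype", I want to run a separate chi-squared contingency test for association between "sex" and "allele". I then want to summarise the results in a table containing the p-value for each combination of "contig_ID" and "ecotype". For the example data given, I would expect a results table with 4 p-values, like the one below.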
results <- data.frame(ecotype=c("Crab", "Wave", "Crab", "Wave"),
contig_ID=c("Contig100169_2367", "Contig100169_2367", "Contig100169_2481", "Contig100169_2481"),
pvalue=c("pval", "pval", "pval", "pval"))
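Alternatively, simply adding a p-value column to the original table would also be fine, with the p-value for each combination repeated across all of its rows.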
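I have been trying to do this with functions such as lapply() and summarise() combined with chisq.test(), but so far without any luck. I have also tried an approach similar to the one in "R chi squared test (3x2 contingency table) for each row in a table" https://stackoverflow.com/questions/34232869/r-chi-squared-test-3x2-contingency-table-for-each-row-in-a-table, but could not get that to work either.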
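To make the goal concrete, here is a rough base R sketch of the computation described above, not a definitive solution. It assumes that the counts in "frequency" can be cross-tabulated into a 2x2 sex-by-allele table within each contig_ID/ecotype combination. The names `groups`, `tab` and `with_p` are only illustrative. Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default and may warn when expected counts are small, as could happen for some of the Wave groups here.

```r
# Compact rebuild of the example data (same values as the data frame above)
frequencies <- data.frame(
  sex       = rep(c("female", "female", "male", "male"), times = 4),
  ecotype   = rep(c("Crab", "Wave"), times = 8),
  contig_ID = rep(c("Contig100169_2367", "Contig100169_2481"), each = 8),
  allele    = rep(rep(c("p", "q"), each = 4), times = 2),
  frequency = c(157, 98, 140, 65, 29, 8, 26, 9, 182, 108, 147, 80, 46, 4, 49, 4)
)

# One sub-data-frame per contig_ID x ecotype combination
groups <- split(frequencies,
                list(frequencies$contig_ID, frequencies$ecotype),
                drop = TRUE)

# For each group, build the sex x allele table of counts and test it
results <- do.call(rbind, lapply(groups, function(d) {
  tab <- xtabs(frequency ~ sex + allele, data = d)
  data.frame(ecotype   = d$ecotype[1],
             contig_ID = d$contig_ID[1],
             pvalue    = chisq.test(tab)$p.value)
}))
rownames(results) <- NULL

# Alternative output: original table with the p-value repeated per row
with_p <- merge(frequencies, results, by = c("ecotype", "contig_ID"))
```

Here `results` has one row per combination (4 rows for the example), and `with_p` keeps all 16 original rows with a `pvalue` column added. If the small expected counts are a concern, fisher.test(tab) could be swapped in for chisq.test(tab) in the same place.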