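I am looking for feedback on how to correctly specify random effects to account for correlation in a repeated measures design that has multiple levels of correlation (including longitudinal data within each combination of predictors). The outcome is binary, so I will be fitting a logistic mixed model. I was planning to use the glmer() function from the lme4 package. If you are wondering how such data could arise, an eye tracker is one example: people's eyes are "tracked" for, say, 30 seconds under different levels of the predictors to determine whether or not they are looking at a certain object on the screen (hence the binary outcome).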
Study design (which can be seen by running the code under "Dummy dataset" below in R):
- The outcome (Binary_outcome) is binary.
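- There are repeated measures: each subject's binary response is recorded multiple times within each combination of predictors (see "Dummy dataset" below for the structure).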
- There are two predictors of interest (both binary, categorical):
- One between-subject factor, Sex (male/female).
- One within-subject factor, Intervention (pre/post).
- Each subject is measured over six trials (under which there are repeated measures), Trial.
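- Note that there are 12 possible trials a person could be assigned. Thus, not every subject takes part in all 12 trials, but rather in a random set of 6 of them.
- Trial is not a variable of interest. It is merely thought that observations within an individual, within a trial, could be more alike, and thus Trial should also be accounted for as a form of cluster correlation.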
Dummy dataset: This shows the general structure of my data (although it is not the actual dataset):
data01 <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c("A", "A",
"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E",
"F", "F", "F", "G", "G", "G", "E", "E", "E", "D", "D", "D", "A",
"A", "A", "J", "J", "J", "L", "L", "L"), Intervention = c("Pre", "Pre", "Pre", "Pre",
"Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Post",
"Post", "Post", "Post", "Post", "Post", "Pre", "Pre", "Pre",
"Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post",
"Post", "Post", "Post", "Post", "Post", "Post"), Sex = c("Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male"), Binary_outcome = c(1L,
1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -36L))
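To make the partial crossing visible, here is a minimal sketch that cross-tabulates subjects against trials. It rebuilds only the Subject and Trial columns of the dummy dataset so that it runs on its own; the names d, counts and shared are just illustrative.

```r
# Subject and Trial columns copied from the dummy dataset above
# (2 subjects, 6 trials each, 3 observations per trial).
d <- data.frame(
  Subject = rep(1:2, each = 18),
  Trial = rep(c("A", "B", "C", "D", "E", "F",
                "G", "E", "D", "A", "J", "L"), each = 3),
  stringsAsFactors = FALSE
)

# Rows = subjects, columns = trials, cells = number of observations.
counts <- table(d$Subject, d$Trial)
print(counts)

# Trials in which both subjects have observations.
shared <- colnames(counts)[colSums(counts > 0) == 2]
print(shared)
```

In the dummy data, every non-empty cell holds 3 observations, and trials A, D and E are shared by both subjects, while the remaining trials are seen by only one of them.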
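Current code being used: This is what I am currently using, but I do not know whether I should be specifying the random effects differently given the structure of the data (outlined below under "Correctly accounting for correlation").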
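# install.packages("lme4")  # run once if lme4 is not yet installed
library(lme4)
# Fixed effects: Sex, Intervention and their interaction.
# Random effects: separate random intercepts for Trial and for Subject.
logit_model <- glmer(Binary_outcome ~ factor(Sex)*factor(Intervention) +
                       (1 | Trial) +
                       (1 | Subject),
                     data = data01,
                     family = "binomial")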
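Written out, my understanding of the model that this call fits is sketched below. The notation is my own and is only an assumption for illustration: i indexes subjects, j indexes trials, k indexes the repeated observations within a subject and trial, and u_i and v_j are the random intercepts for Subject and Trial.

```latex
\[
\begin{aligned}
\operatorname{logit} P(Y_{ijk} = 1)
  &= \beta_0 + \beta_1\,\mathrm{Sex}_i + \beta_2\,\mathrm{Intervention}_{ij}
     + \beta_3\,\mathrm{Sex}_i \times \mathrm{Intervention}_{ij} + u_i + v_j, \\
u_i &\sim N(0, \sigma_u^2), \qquad v_j \sim N(0, \sigma_v^2).
\end{aligned}
\]
```

In this form, the repeated observations k from the same subject and trial share both u_i and v_j, but there is no term that is specific to a particular subject-by-trial cell.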
Correctly accounting for correlation: This is where my question lies. Comments/concerns:
- I believe the Subject and Trial random effects are crossed (not nested), because Subject 1 is always Subject 1 and Trial A is always Trial A. There is no way to re-number/re-letter them as you could if the design were nested (see, e.g.: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified).
- As can be seen above under "Current code being used," I have included the fixed effects of interest (Sex, Intervention, and Sex*Intervention), and random intercepts for Trial and Subject using + (1 | Trial) + (1 | Subject).
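- Does + (1 | Trial) + (1 | Subject) correctly "tell" the model to account for correlation within a person, within a trial, or does this need to be specified another way? Even though I do not think the random effects are nested, it still feels like there is a "hierarchy," but perhaps this is already accounted for by + (1 | Trial) + (1 | Subject).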
- These data seem unusual in that, even within a trial, each subject has multiple measurements (0s/1s). I am unsure what this implies for model fitting.
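- Do I need to further tell the model to differentiate the within-subject and between-subject fixed effects? Or does the code "automatically" pick up on this with + (1 | Trial) + (1 | Subject)? It does this correctly when you simply specify a random intercept for subject in lme() with + (1 | Subject), or in aov() with + Error(Subject), for instance. This is why I simply used + (1 | Trial) + (1 | Subject) here.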
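- Lastly, I do not know whether it matters that not every subject undergoes every trial (but rather 6 of the 12 possible trials), and whether this affects some aspect of the code.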
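To make the "specified another way" question concrete, here is a minimal sketch on simulated data that fits my current specification next to one candidate alternative, which adds a random intercept for each subject-by-trial cell via (1 | Subject:Trial). Everything here is an assumption for illustration: the sample sizes, the effect sizes, the variance values, and the names sim, m_crossed and m_cell are made up, and I am not claiming that the second model is the correct one. The fitting step is skipped if lme4 is not installed.

```r
# Hypothetical simulation: all sizes and effect values below are made up.
set.seed(1)
n_subj <- 40; n_pool <- 12; n_each <- 6; n_rep <- 3
trial_ids <- LETTERS[1:n_pool]

# Each subject gets a random 6 of the 12 possible trials (partial crossing);
# the first 3 are labelled Pre and the last 3 Post, as in the dummy dataset.
design <- do.call(rbind, lapply(seq_len(n_subj), function(s) {
  data.frame(Subject = s,
             Trial = sample(trial_ids, n_each),
             Intervention = rep(c("Pre", "Post"), each = n_each / 2),
             stringsAsFactors = FALSE)
}))
design$Sex <- ifelse(design$Subject %% 2 == 0, "Male", "Female")

# Repeat each subject-by-trial row n_rep times (the within-trial replicates).
sim <- design[rep(seq_len(nrow(design)), each = n_rep), ]
rownames(sim) <- NULL

# Random intercepts for subject, trial, and subject-by-trial cell.
u_subj  <- rnorm(n_subj, sd = 1)
v_trial <- setNames(rnorm(n_pool, sd = 0.5), trial_ids)
cell    <- interaction(sim$Subject, sim$Trial, drop = TRUE)
w_cell  <- rnorm(nlevels(cell), sd = 0.7)

eta <- -0.2 + 0.5 * (sim$Intervention == "Post") +
  u_subj[sim$Subject] + unname(v_trial[sim$Trial]) + w_cell[as.integer(cell)]
sim$Binary_outcome <- rbinom(nrow(sim), size = 1, prob = plogis(eta))
sim$Subject <- factor(sim$Subject)
sim$Trial   <- factor(sim$Trial)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Current specification: crossed random intercepts only.
  m_crossed <- lme4::glmer(
    Binary_outcome ~ Sex * Intervention + (1 | Trial) + (1 | Subject),
    data = sim, family = binomial)
  # Candidate alternative: extra intercept per subject-by-trial cell.
  m_cell <- lme4::glmer(
    Binary_outcome ~ Sex * Intervention + (1 | Trial) + (1 | Subject) +
      (1 | Subject:Trial),
    data = sim, family = binomial)
  print(lme4::VarCorr(m_cell))      # estimated variance components
  print(anova(m_crossed, m_cell))   # rough comparison of the two fits
}
```

With only two subjects, the dummy dataset is too small for this comparison to be meaningful, which is why the sketch simulates a larger dataset with the same layout. The anova() comparison should be read only as a rough guide.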
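I am looking for your feedback and, preferably, the references (texts, peer-reviewed papers) on which that feedback is based. I have multiple texts on logistic regression, broader categorical data analysis, and mixed models, but as far as I can tell none of them bring together the ideas I have posed here. Thus, knowing of a resource that is particularly useful for this situation would also be helpful.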