How does lmer (from the R package lme4) compute log likelihood?

2024-05-14

I'm trying to understand the function lmer. I've found plenty of information about how to use the command, but not much about what it's actually doing (save for some cryptic comments here: http://www.bioconductor.org/help/course-materials/2008/PHSIntro/lme4Intro-handout-6.pdf). I'm playing with the following simple example:

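library(data.table)
library(lme4)
options(digits=15)

n<-1000  # number of observations
m<-100   # number of groups
data<-data.table(id=sample(1:m,n,replace=TRUE),key="id")
b<-rnorm(m)  # random intercepts B_i, so tau = 1
data$y<-b[data$id]+rnorm(n)*0.1  # residual sd sigma = 0.1
fitted<-lmer(y~(1|id),data=data,verbose=TRUE)
fitted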

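I understand that lmer is fitting a model of the form Y_{ij} = beta + B_i + epsilon_{ij}, where epsilon_{ij} and B_i are independent normals with variances sigma^2 and tau^2 respectively. If theta = tau/sigma is fixed, I computed the estimate for beta with the correct mean and minimum variance to be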

c = sum_{i,j} alpha_i y_{ij}

where

alpha_i = lambda/(1 + theta^2 n_i)
lambda = 1/[\sum_i n_i/(1+theta^2 n_i)]
n_i = number of observations from group i
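As a quick illustration of the weighted estimate c, here is a minimal base-R sketch. The function name `beta_hat` and the toy data are made up for this example; nothing here is taken from lme4 itself.

```r
# Weighted estimate of beta for a fixed theta = tau / sigma.
# beta_hat is a hypothetical helper name, not an lme4 function.
beta_hat <- function(y, id, theta) {
  n_ij <- ave(y, id, FUN = length)   # n_i repeated for every observation in group i
  w <- 1 / (1 + theta^2 * n_ij)      # alpha_i up to the constant lambda
  sum(w * y) / sum(w)                # lambda = 1 / sum(w), so this is sum_ij alpha_i y_ij
}

# Toy data (assumed for illustration): three groups of unequal size.
set.seed(1)
id <- rep(1:3, times = c(2, 3, 5))
y  <- rnorm(10)

beta_hat(y, id, 0)     # theta = 0: every observation gets equal weight, i.e. mean(y)
beta_hat(y, id, 1e6)   # large theta: approaches the unweighted mean of the group means
```

The two limiting cases are a handy sanity check: with no between-group variance the estimate is the overall mean, and with dominant between-group variance every group counts equally regardless of its size.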

~~I also computed the following unbiased estimate for sigma^2:~~

~~s^2 = \sum_{i,j} alpha_i (y_{ij} - c)^2 / (1 + theta^2 - lambda)~~

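These estimates seem to agree with the results from lmer. However, I can't work out how the log likelihood is defined in this context. I calculated the probability density to be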

pd(Y_{ij}=y_{ij}) = \prod_{i,j}[f_sigma(y_{ij}-ybar_i)]
    * prod_i[f_{sqrt(sigma^2/n_i+tau^2)}(ybar_i-beta) sigma sqrt(2 pi/n_i)]

where

ybar_i = \sum_j y_{ij}/n_i (the mean of observations in group i)
f_sigma(x) = 1/(sqrt{2 pi}sigma) exp(-x^2/(2 sigma^2)) (normal density with sd sigma)

But the log of the above is not what lmer produces. How is the log likelihood computed in this case (and, for bonus marks, why)?

Edit: Changed notation for consistency, struck out the incorrect formula for the standard deviation estimate.

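The links in the comments contained the answer. Below I've written out what the formulae simplify to in this simple example, since the results are somewhat intuitive.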

lmer fits a model of the form $Y_{ij} = \beta + B_i + \epsilon_{ij}$, where $\epsilon_{ij}$ and $B_i$ are independent normals with variances $\sigma^2$ and $\tau^2$ respectively. The joint probability distribution of $Y_{ij}$ and $B_i$ is therefore

$$\left(\prod_{i,j} f_{\sigma^2}(y_{ij}-\beta-b_i)\right)\left(\prod_i f_{\tau^2}(b_i)\right)$$

where

$f_{\sigma^2}(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}$ .

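The likelihood is obtained by integrating this with respect to $b_i$ (which isn't observed) to give

$$\left(\prod_{i,j} f_{\sigma^2}(y_{ij}-\bar y_i)\right)\left(\prod_i f_{\sigma^2/n_i+\tau^2}(\bar y_i-\beta)\sqrt{2\pi\sigma^2/n_i}\right)$$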

where $n_i$ is the number of observations from group $i$ , and $\bar y_i$ is the mean of observations from group $i$ . This is somewhat intuitive since the first term captures spread within each group, which should have variance $\sigma^2$ , and the second captures the spread between groups. Note that $\sigma^2/n_i+\tau^2$ is the variance of $\bar y_i$ .
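One way to convince yourself of this within-group/between-group factorisation is to compare it numerically with the multivariate normal density of the whole response vector, whose covariance is $\sigma^2 I + \tau^2 ZZ^T$ (with $Z$ the group indicator matrix). The sketch below uses base R only; the function names and the toy data are made up for illustration.

```r
# Closed-form marginal log-likelihood: within-group term plus between-group term.
# (loglik_closed and loglik_direct are hypothetical helper names.)
loglik_closed <- function(y, id, beta, sigma2, tau2) {
  n_i  <- tapply(y, id, length)              # group sizes
  ybar <- tapply(y, id, mean)                # group means
  within  <- sum(dnorm(y - ave(y, id), sd = sqrt(sigma2), log = TRUE))
  between <- sum(dnorm(ybar - beta, sd = sqrt(sigma2 / n_i + tau2), log = TRUE) +
                 0.5 * log(2 * pi * sigma2 / n_i))
  within + between
}

# Direct evaluation: y ~ N(beta, sigma2 * I + tau2 * 1{same group}).
loglik_direct <- function(y, id, beta, sigma2, tau2) {
  V <- sigma2 * diag(length(y)) + tau2 * outer(id, id, "==")
  R <- chol(V)                                    # V = R'R, R upper triangular
  z <- backsolve(R, y - beta, transpose = TRUE)   # solves R'z = y - beta
  -0.5 * length(y) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

# Toy data (assumed for illustration).
set.seed(2)
id <- sample(1:5, 30, replace = TRUE)
y  <- rnorm(5)[id] + 0.5 * rnorm(30)

loglik_closed(y, id, beta = 0.2, sigma2 = 0.3, tau2 = 1.1)
loglik_direct(y, id, beta = 0.2, sigma2 = 0.3, tau2 = 1.1)
```

The two calls should agree up to floating-point error for any choice of parameters, which is a useful check that the $\sqrt{2\pi\sigma^2/n_i}$ factors have not been dropped.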

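However, by default (REML=T) lmer maximises not the likelihood but the "REML criterion", obtained by additionally integrating this with respect to $\beta$ to give

$$\left(\prod_{i,j} f_{\sigma^2}(y_{ij}-\bar y_i)\right)\left(\prod_i f_{\sigma^2/n_i+\tau^2}(\bar y_i-\hat\beta)\sqrt{2\pi\sigma^2/n_i}\right)\sqrt{\frac{2\pi}{\sum_i 1/(\sigma^2/n_i+\tau^2)}}$$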

where $\hat\beta$ is given below.

Maximising likelihood (REML=F)

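If $\theta=\tau/\sigma$ is fixed, we can explicitly find the $\beta$ and $\sigma$ which maximise likelihood. They turn out to be

$$\hat\beta=\frac{\sum_i \frac{n_i}{1+\theta^2 n_i}\,\bar y_i}{\sum_i \frac{n_i}{1+\theta^2 n_i}},\qquad \hat\sigma^2=\frac{1}{\sum_i n_i}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2+\sum_i\frac{n_i}{1+\theta^2 n_i}(\bar y_i-\hat\beta)^2\right)$$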

Note $\hat\sigma^2$ has two terms for variation within and between groups, and $\hat\beta$ is somewhere between the mean of $y_{ij}$ and the mean of $\bar y_i$ depending on the value of $\theta$ .

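Substituting these into the likelihood, we can express the log likelihood $l$ in terms of $\theta$ only:

$$-2l=\left(\sum_i n_i\right)\left(\log(2\pi\hat\sigma^2)+1\right)+\sum_i\log(1+\theta^2 n_i)$$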

lmer iterates to find the value of $\theta$ which minimises $-2l$. In the output, $-2l$ and $l$ are shown in the fields "deviance" and "logLik" (if REML=F) respectively.
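To see the one-dimensional search in action, here is a rough base-R sketch that profiles out $\beta$ and $\sigma$ for a given $\theta$ and hands the resulting $-2l$ to `optimize`. The name `dev_ml`, the toy data and the search interval are assumptions made for this example; lmer uses its own optimiser, so this imitates the idea rather than reproducing its internals.

```r
# Profiled -2 * log-likelihood as a function of theta = tau / sigma only.
# (dev_ml is a hypothetical helper name.)
dev_ml <- function(theta, y, id) {
  n_i  <- tapply(y, id, length)                  # group sizes
  ybar <- tapply(y, id, mean)                    # group means
  w    <- n_i / (1 + theta^2 * n_i)              # between-group weights
  beta <- sum(w * ybar) / sum(w)                 # ML estimate of beta for this theta
  S    <- sum((y - ave(y, id))^2) +              # within-group sum of squares
          sum(w * (ybar - beta)^2)               # weighted between-group sum of squares
  N    <- length(y)
  N * (log(2 * pi * S / N) + 1) + sum(log(1 + theta^2 * n_i))   # sigma^2 hat = S / N
}

# Toy data (assumed): 20 groups, tau = 1, sigma = 0.1, so theta is around 10.
set.seed(42)
id <- sample(1:20, 200, replace = TRUE)
y  <- rnorm(20)[id] + 0.1 * rnorm(200)

opt <- optimize(dev_ml, interval = c(0, 100), y = y, id = id)
theta_hat <- opt$minimum          # estimated tau / sigma
loglik_ml <- -opt$objective / 2   # maximised log likelihood
```

If lme4 is installed, `fit <- lmer(y ~ (1 | id), data = data.frame(y, id), REML = FALSE)` followed by `logLik(fit)` and `getME(fit, "theta")` should give values close to `loglik_ml` and `theta_hat`, up to optimiser tolerance.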

Maximising restricted likelihood (REML=T)

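Since the REML criterion doesn't depend on $\beta$, we use the same estimate for $\beta$ as above. We estimate $\sigma$ to maximise the REML criterion:

$$\hat\sigma_R^2=\frac{1}{\sum_i n_i-1}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2+\sum_i\frac{n_i}{1+\theta^2 n_i}(\bar y_i-\hat\beta)^2\right)$$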

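The restricted log likelihood $l_R$, with $\hat\sigma_R^2$ denoting the REML estimate of $\sigma^2$, is given by

$$-2l_R=\left(\sum_i n_i-1\right)\left(\log(2\pi\hat\sigma_R^2)+1\right)+\sum_i\log(1+\theta^2 n_i)+\log\left(\sum_i\frac{n_i}{1+\theta^2 n_i}\right)$$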

In the output of lmer, $-2l_R$ and $l_R$ are shown in the fields "REMLdev" and "logLik" (if REML=T) respectively.
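The REML version can be sketched the same way in base R. Compared with the ML case, the divisor changes from $\sum_i n_i$ to $\sum_i n_i - 1$ and there is an extra log term from integrating out $\beta$. The name `dev_reml`, the toy data and the search interval are assumptions for this example, and the search is an imitation of the idea, not lmer's own code.

```r
# Profiled -2 * restricted log-likelihood as a function of theta = tau / sigma.
# (dev_reml is a hypothetical helper name.)
dev_reml <- function(theta, y, id) {
  n_i  <- tapply(y, id, length)                  # group sizes
  ybar <- tapply(y, id, mean)                    # group means
  w    <- n_i / (1 + theta^2 * n_i)              # between-group weights
  beta <- sum(w * ybar) / sum(w)                 # same beta estimate as in the ML case
  S    <- sum((y - ave(y, id))^2) + sum(w * (ybar - beta)^2)
  N    <- length(y)
  (N - 1) * (log(2 * pi * S / (N - 1)) + 1) +    # REML sigma^2 hat = S / (N - 1)
    sum(log(1 + theta^2 * n_i)) +
    log(sum(w))                                  # extra term from integrating over beta
}

# Toy data (assumed): 20 groups, tau = 1, sigma = 0.1.
set.seed(42)
id <- sample(1:20, 200, replace = TRUE)
y  <- rnorm(20)[id] + 0.1 * rnorm(200)

opt_r <- optimize(dev_reml, interval = c(0, 100), y = y, id = id)
theta_reml  <- opt_r$minimum          # REML estimate of tau / sigma
loglik_reml <- -opt_r$objective / 2   # maximised restricted log likelihood
```

With lme4 installed, `logLik(lmer(y ~ (1 | id), data = data.frame(y, id)))` uses REML by default and should land close to `loglik_reml`.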
