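First, since it looks like you are only interested in c('WATER.START', 'WATER.STOP'), subset your first df. Next, for each row (MARGIN=1), we apply an anonymous function \(x) that selects the appropriate columns of df2, using tolower to match case, and cbinds them to the first two values of the row. Inside the function, setNames renames the columns; finally, rbind the resulting list.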
subset(df, Conflict %in% c('WATER.START', 'WATER.STOP')) |>
  apply(MARGIN=1, \(x) {
    mt <- match(tolower(x[2]), tolower(names(df2)))
    cbind(t(x[1:2]), df2[df2$study_id == x[1], c(mt, mt + 1)]) |>
      setNames(c('study_id', 'Conflict', 'initial', 'verification'))
  }) |> do.call(what=rbind)
#   study_id    Conflict initial verification
# 1        1 WATER.START       1            1
# 2        1  WATER.STOP      33           34
# 5        5  WATER.STOP       8            8
You may also use a dictionary a (which might scale better to other levels you may be working with).
a <- c(WATER.START='WATER.start', WATER.STOP='WATER.stop')
subset(df, Conflict %in% c('WATER.START', 'WATER.STOP')) |>
  apply(MARGIN=1, \(x) {
    mt <- match(a[match(x[2], names(a))], names(df2))
    cbind(t(x[1:2]), df2[df2$study_id == x[1], c(mt, mt + 1)]) |>
      setNames(c('study_id', 'Conflict', 'initial', 'verification'))
  }) |> do.call(what=rbind)
#   study_id    Conflict initial verification
# 1        1 WATER.START       1            1
# 2        1  WATER.STOP      33           34
# 5        5  WATER.STOP       8            8
However, I think what you actually need is to reshape your data.
## basic
reshape(df2, direction='long', idvar=1, varying=list(c(2, 4), c(3, 5)))
#     study_id time WATER.start WATER.truestart
# 1.1        1    1           1               1
# 2.1        2    1           1               1
# 3.1        3    1           2               2
# 4.1        4    1          NA              NA
# 5.1        5    1           6              25
# 1.2        1    2          33              34
# 2.2        2    2           3               4
# 3.2        3    2           2               2
# 4.2        4    2          NA              NA
# 5.2        5    2           8               8
## enhanced
reshape(df2, dir='long', idvar='study_id',
        varying=list(c("WATER.start", "WATER.stop"), c("WATER.truestart", "WATER.truestop")),
        timevar='foo', times=c('water.start', 'water.stop'), v.names=c('initial', 'verification'))
#               study_id         foo initial verification
# 1.water.start        1 water.start       1            1
# 2.water.start        2 water.start       1            1
# 3.water.start        3 water.start       2            2
# 4.water.start        4 water.start      NA           NA
# 5.water.start        5 water.start       6           25
# 1.water.stop         1  water.stop      33           34
# 2.water.stop         2  water.stop       3            4
# 3.water.stop         3  water.stop       2            2
# 4.water.stop         4  water.stop      NA           NA
# 5.water.stop         5  water.stop       8            8
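If the goal is still the per-conflict table from the first approach, one possible follow-up is to merge the reshaped data back onto df. This is only a minimal sketch: the toy df and df2 below are assumptions reconstructed from the printed output in this answer (the original data are not shown here), and the times labels are set to the upper-case Conflict levels so that both data frames share a key.

```r
# Toy data reconstructed from the printed output above (an assumption;
# the real df and df2 may contain more rows and columns).
df2 <- data.frame(
  study_id        = 1:5,
  WATER.start     = c(1, 1, 2, NA, 6),
  WATER.truestart = c(1, 1, 2, NA, 25),
  WATER.stop      = c(33, 3, 2, NA, 8),
  WATER.truestop  = c(34, 4, 2, NA, 8)
)
# Only the rows that survive the subset() step are reproduced here.
df <- data.frame(
  study_id = c(1, 1, 5),
  Conflict = c('WATER.START', 'WATER.STOP', 'WATER.STOP')
)

# Same reshape as the enhanced version, but the time variable is named
# 'Conflict' and labelled with the levels used in df.
long <- reshape(df2, direction='long', idvar='study_id',
                varying=list(c('WATER.start', 'WATER.stop'),
                             c('WATER.truestart', 'WATER.truestop')),
                timevar='Conflict', times=c('WATER.START', 'WATER.STOP'),
                v.names=c('initial', 'verification'))

# Keep only the study_id/Conflict pairs that appear in df.
res <- merge(df, long, by=c('study_id', 'Conflict'))
res
```

With these toy inputs, res should hold one row per conflict in df, with the initial and verification columns taken from df2. Compared with apply, this avoids the coercion of each row to character, at the cost of reshaping all of df2 first.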