This probably has a simple solution, but I can't seem to crack it.
For example, suppose I have a table listing purchases and customer details:
library(data.table)
purchase <- setDT(structure(list(Name = c("John", "John", "Mary"), Surname = c("Smith",
"Smith", "Jane"), PurchaseDate = c("2017-01-01", "2015-01-01",
"2017-01-02")), .Names = c("Name", "Surname", "PurchaseDate"), row.names = c(NA,
-3L), class = c("data.table", "data.frame")))
> purchase
Name Surname PurchaseDate
1: John Smith 2017-01-01
2: John Smith 2015-01-01
3: Mary Jane 2017-01-02
I want to know whether these customers held a valid discount card at the time of purchase, by matching against data held in two databases:
df1 <- setDT(structure(list(Name = "John", Surname = "Smith", ValidFrom = "2016-12-31",
ValidTo = "2017-01-02"), .Names = c("Name", "Surname", "ValidFrom",
"ValidTo"), row.names = c(NA, -1L), class = c("data.table", "data.frame")))
df2 <- setDT(structure(list(Name = "Mary", Surname = "Jane", ValidFrom = "2017-01-01",
ValidTo = "2017-01-03"), .Names = c("Name", "Surname", "ValidFrom",
"ValidTo"), row.names = c(NA, -1L), class = c("data.table", "data.frame")))
> df1
Name Surname ValidFrom ValidTo
1: John Smith 2016-12-31 2017-01-02
> df2
Name Surname ValidFrom ValidTo
1: Mary Jane 2017-01-01 2017-01-03
I am adapting this solution, which uses data.table:
library(data.table)
purchase[df1, on=c(Name='Name', Surname='Surname'), Match := 'Yes']
purchase[df2, on=c(Name='Name', Surname='Surname'), Match := 'Yes']
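The result (based on a left join) is saved to the Match variable of the original purchase table. (Importantly, this saves the result into the original object rather than creating a new one, which would get messy.)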
> purchase
Name Surname PurchaseDate Match
1: John Smith 2017-01-01 Yes
2: John Smith 2015-01-01 Yes
3: Mary Jane 2017-01-02 Yes
However, I also need to check that PurchaseDate falls between the ValidFrom and ValidTo dates, and I don't know how to do this.
To do this, I could bring in the ValidFrom and ValidTo dates during the join, and then use ifelse to determine whether the purchase falls between them.
purchase[df1, on=c(Name='Name', Surname='Surname'), `:=`(Match='Yes', VFrom=ValidFrom, VTo=ValidTo)]
purchase[df2, on=c(Name='Name', Surname='Surname'), `:=`(Match='Yes', VFrom=ValidFrom, VTo=ValidTo)]
Great! This brings in the dates:
Name Surname PurchaseDate Match VFrom VTo
1: John Smith 2017-01-01 Yes 2016-12-31 2017-01-02
2: John Smith 2015-01-01 Yes 2016-12-31 2017-01-02
3: Mary Jane 2017-01-02 Yes 2017-01-01 2017-01-03
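The ifelse step mentioned above could look roughly like the following. This is only a sketch: it rebuilds the joined table from the printed output, converts the character dates to IDate so the comparison is a real date comparison, and treats both bounds as inclusive. The column name InRange is made up for illustration.

```r
library(data.table)

# Toy copy of the joined table, with values taken from the printed output above.
purchase <- data.table(
  Name         = c("John", "John", "Mary"),
  Surname      = c("Smith", "Smith", "Jane"),
  PurchaseDate = c("2017-01-01", "2015-01-01", "2017-01-02"),
  VFrom        = c("2016-12-31", "2016-12-31", "2017-01-01"),
  VTo          = c("2017-01-02", "2017-01-02", "2017-01-03")
)

# Convert the character columns to IDate in place.
date_cols <- c("PurchaseDate", "VFrom", "VTo")
purchase[, (date_cols) := lapply(.SD, as.IDate), .SDcols = date_cols]

# InRange is a hypothetical column name; both bounds are assumed inclusive.
purchase[, InRange := ifelse(PurchaseDate >= VFrom & PurchaseDate <= VTo, "Yes", "No")]

print(purchase)
```

With a single card per customer this flags only the 2015 purchase as out of range.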
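However, a problem arises if a customer has two discount cards and a purchase falls within the validity period of only one of them. Suppose Mary has two cards: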
df2 <- setDT(structure(list(Name = structure(c(1L, 1L), .Label = "Mary", class = "factor"),
Surname = structure(c(1L, 1L), .Label = "Jane", class = "factor"),
ValidFrom = structure(1:2, .Label = c("2017-01-01", "1945-01-01"
), class = "factor"), ValidTo = structure(1:2, .Label = c("2017-01-03",
"1946-01-01"), class = "factor")), .Names = c("Name", "Surname",
"ValidFrom", "ValidTo"), row.names = c(NA, -2L), class = c("data.table", "data.frame")))
> df2
Name Surname ValidFrom ValidTo
1: Mary Jane 2017-01-01 2017-01-03
2: Mary Jane 1945-01-01 1946-01-01
Running this
purchase[df2, on=c(Name='Name', Surname='Surname'), `:=`(Match='Yes', VFrom=ValidFrom, VTo=ValidTo)]
brings in only one of these date pairs (apparently the earliest one, regardless of row order):
Name Surname PurchaseDate Match VFrom VTo
1: John Smith 2017-01-01 Yes 2016-12-31 2017-01-02
2: John Smith 2015-01-01 Yes 2016-12-31 2017-01-02
3: Mary Jane 2017-01-02 Yes 1945-01-01 1946-01-01
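To make the multiple-match situation visible, one can count how many card rows share each purchase's Name and Surname. This is a small sketch using plain character columns instead of the factors above; card_counts is a name introduced here for illustration.

```r
library(data.table)

purchase <- data.table(
  Name         = c("John", "John", "Mary"),
  Surname      = c("Smith", "Smith", "Jane"),
  PurchaseDate = c("2017-01-01", "2015-01-01", "2017-01-02")
)

# Mary's two cards, as in the example above (character columns for simplicity).
df2 <- data.table(
  Name      = c("Mary", "Mary"),
  Surname   = c("Jane", "Jane"),
  ValidFrom = c("2017-01-01", "1945-01-01"),
  ValidTo   = c("2017-01-03", "1946-01-01")
)

# For each row of purchase, count the rows of df2 with the same Name and Surname.
card_counts <- df2[purchase, on = .(Name, Surname), .N, by = .EACHI]

print(card_counts)
```

Mary's purchase has two candidate cards, but the `:=` update can store only one VFrom/VTo pair per purchase row, so one of them is necessarily dropped.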
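How can I bring in all the matching rows?
As far as I understand, the X[Y] syntax supports updating the original object (which I need) and the := function (which I also need), but does not support full joins. The alternative, merge, supports full joins but requires creating a new object at each join step (which would get very messy) and does not support :=. Any ideas? Is there a way to use foverlaps somehow?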
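One possible direction, offered only as a sketch and not as a tested answer to the full problem: if the goal is just to flag whether any card was valid on the purchase date, a non-equi join in the `on` clause (available in data.table 1.9.8 and later) can be combined with `:=`, so nothing needs to be copied out of the original table. The sketch assumes the dates are stored as IDate rather than character or factor, that both bounds are inclusive, and that the card tables are first stacked into one table; cards is a name introduced here for illustration.

```r
library(data.table)

purchase <- data.table(
  Name         = c("John", "John", "Mary"),
  Surname      = c("Smith", "Smith", "Jane"),
  PurchaseDate = as.IDate(c("2017-01-01", "2015-01-01", "2017-01-02"))
)

# All cards from both sources stacked into one table (hypothetical name: cards).
cards <- data.table(
  Name      = c("John", "Mary", "Mary"),
  Surname   = c("Smith", "Jane", "Jane"),
  ValidFrom = as.IDate(c("2016-12-31", "2017-01-01", "1945-01-01")),
  ValidTo   = as.IDate(c("2017-01-02", "2017-01-03", "1946-01-01"))
)

# Start from "No", then flip to "Yes" wherever at least one card covers the date.
purchase[, Match := "No"]
purchase[cards,
         on = .(Name, Surname, PurchaseDate >= ValidFrom, PurchaseDate <= ValidTo),
         Match := "Yes"]

print(purchase)
```

Because the date conditions are part of the join itself, Mary's 1945 card simply fails to match and cannot overwrite the result from her 2017 card. If the matching card's dates are also needed, this sketch would have to be extended.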