Expand ranges defined by "from" and "to" columns

2023-11-21

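This problem is also known as "converting a start-end dataset to a panel dataset".

I have a data frame containing the "name" of US presidents and the years in which their terms started and ended (the "from" and "to" columns). Here is a sample: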

presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to = c(2001, 2009, 2012)
)
presidents
#>             name from   to
#> 1   Bill Clinton 1993 2001
#> 2 George W. Bush 2001 2009
#> 3   Barack Obama 2009 2012

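I want to create a data frame with two columns ("name" and "year"), with one row for each year a president was in office. So I need to build a regular sequence of years from "from" to "to" for each president. Here is my expected output: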

name           year
Bill Clinton   1993
Bill Clinton   1994
...
Bill Clinton   2000
Bill Clinton   2001
George W. Bush 2001
George W. Bush 2002
... 
George W. Bush 2008
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012

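I know I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand a single president, but I can't figure out how to iterate over every president.

How do I do this? I feel I should know this, but I'm drawing a blank.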

Update 1

OK, I've tried both solutions, and I'm getting an error:

library(plyr)  # provides ddply() and summarise()
foo <- structure(
  list(
    name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
    from = c(1885, 1889, 1893),
    to = c(1889, 1893, 1897)
  ),
  .Names = c("name", "from", "to"),
  row.names = 22:24,
  class = "data.frame"
)
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
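
The error most likely arises because "Grover Cleveland" appears in two rows, so grouping by name alone hands seq() two "from" values at once. A minimal base R check, assuming the same three rows rebuilt under the hypothetical name `terms`:

```r
# Rebuild the three rows from the example above (row names omitted).
terms <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Count rows per name: any count above 1 means seq(from, to) would
# receive vectors of that length inside that group.
rows_per_name <- table(terms$name)
rows_per_name[rows_per_name > 1]
```

Any name listed by the last line needs a finer grouping key than the name alone.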

Here is a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:

library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
#               name year
#  1:   Bill Clinton 1993
#  2:   Bill Clinton 1994
#  ...
#  ...
# 21:   Barack Obama 2011
# 22:   Barack Obama 2012

Edit: To handle presidents with non-consecutive terms, use this instead:

dt[, list(year = seq(from, to)), by = c("name", "from")]
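
For readers who would rather not depend on data.table, the same idea can be sketched in base R by expanding one row at a time. This is not part of the answer above; `expand_terms` is a hypothetical helper name, and the sketch assumes every row has from <= to.

```r
# Same data as the `presidents` example in the question.
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Build one small data frame per input row, then stack them.
# Working row by row means a repeated name never puts two "from"
# values into a single seq() call.
expand_terms <- function(df) {
  pieces <- Map(
    function(name, from, to) data.frame(name = name, year = seq(from, to)),
    df$name, df$from, df$to
  )
  # unname() drops the list names so rbind() assigns plain 1..n row names.
  do.call(rbind, unname(pieces))
}

expanded <- expand_terms(presidents)
nrow(expanded)  # 22 rows, the same count as the data.table output above
```

Because each input row is expanded independently, this should also cope with a president who has two separate terms, at the cost of being slower than data.table on large inputs.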