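I am working with time-series data. I have 2 datetime columns and 1 fiscal week column. Below is an example of my situation, where I need to get the row with the maximum EditDate.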
EditDate <- c("2015-04-01 11:40:13", "2015-04-03 02:54:45","2015-04-07 11:40:13")
ID <- c("DL1X8", "DL1X8","DL1X8")
Avg <- c(38.1517, 38.1517, 38.1517)
Sig <- c(11.45880000, 11.45880000, 11.45880000)
InsertDate <- c("2015-04-03 9:40:00", "2015-04-03 9:40:00", "2015-04-10 9:40:00")
FW <- c("39","39","40")
df1 <- data.frame(EditDate , ID, Avg, Sig, InsertDate, FW)
This returns:
+---------------------+-------+---------+-------------+--------------------+----+
| EditDate | ID | Avg | Sig | InsertDate | FW |
+---------------------+-------+---------+-------------+--------------------+----+
| 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | 2015-04-03 9:40:00 | 39 |
| 2015-04-03 02:54:45 | DL1X8 | 38.1517 | 11.45880000 | 2015-04-03 9:40:00 | 39 |
| 2015-04-07 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | 2015-04-10 9:40:00 | 40 |
+---------------------+-------+---------+-------------+--------------------+----+
The output I want is:
+---------------------+-------+---------+-------------+--------------------+----+
| EditDate | ID | Avg | Sig | InsertDate | FW |
+---------------------+-------+---------+-------------+--------------------+----+
| 2015-04-07 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | 2015-04-10 9:40:00 | 40 |
+---------------------+-------+---------+-------------+--------------------+----+
I tried sqldf together with library(RH2), but it takes a long time to run.
df2 <- sqldf("SELECT * FROM df1
WHERE (EditDate = (SELECT MAX(EditDate) FROM df1))
ORDER BY EditDate ASC")
I am not sure whether this can be done with the dplyr package. Can someone suggest how to optimize this using dplyr or any other alternative?
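For reference, here is a minimal sketch of what the same filter might look like in dplyr and in base R. It rebuilds the example data so it runs on its own. It assumes EditDate should be compared as a timestamp, so the column is parsed with as.POSIXct; the "UTC" time zone and the names df2_dplyr and df2_base are assumptions for illustration only. Like the SQL query, it takes the maximum over the whole table, not per ID.

```r
library(dplyr)

# Rebuild the example data (values copied from the question)
df1 <- data.frame(
  EditDate   = c("2015-04-01 11:40:13", "2015-04-03 02:54:45", "2015-04-07 11:40:13"),
  ID         = c("DL1X8", "DL1X8", "DL1X8"),
  Avg        = c(38.1517, 38.1517, 38.1517),
  Sig        = c(11.4588, 11.4588, 11.4588),
  InsertDate = c("2015-04-03 9:40:00", "2015-04-03 9:40:00", "2015-04-10 9:40:00"),
  FW         = c("39", "39", "40"),
  stringsAsFactors = FALSE
)

# Assumption: parse EditDate as a timestamp (time zone chosen arbitrarily as UTC)
# so that max() compares points in time rather than character strings
df1$EditDate <- as.POSIXct(df1$EditDate, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# dplyr: keep every row whose EditDate equals the overall maximum
df2_dplyr <- df1 %>%
  filter(EditDate == max(EditDate))

# Base R equivalent, with no extra packages
df2_base <- df1[df1$EditDate == max(df1$EditDate), ]

print(df2_dplyr)
```

If the latest row is wanted per ID instead of for the whole table, adding group_by(ID) before filter() would be the usual dplyr pattern.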