Disaggregate in the context of a time series

2023-12-12

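I have a dataset that I want to visualize both overall and disaggregated by a few different variables. I created a flexdashboard with a Shiny app to select the type of disaggregation, along with working code to plot the correct subset.

My approach is repetitive, which suggests to me that I'm missing a better way to do this. What trips me up is the need to count by date and expand the matrix. I'm not sure how to do grouped counts by week in a single pipe, so I do it in several steps and combine the results.

Thoughts?

(P.S. I asked this question on RStudio Community, but I think it may be more of an "SO question". I don't have permission to delete it from RSC, so apologies for cross-posting.)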

---
title: "test"
output: 
  flexdashboard::flex_dashboard:
    theme: bootstrap
runtime: shiny
---

```{r setup, include=FALSE}
  library(flexdashboard)
  library(tidyverse)
  library(tibbletime)
  library(dygraphs)
  library(magrittr)
  library(xts)
```

```{r global, include=FALSE}
  set.seed(1)
  dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                               as.Date("2018-06-30"), 
                               "days"),
                    sex = sample(c("male", "female"), 181, replace=TRUE),
                    lang = sample(c("english", "spanish"), 181, replace=TRUE),
                    age = sample(20:35, 181, replace=TRUE))
  dat <- sample_n(dat, 80)
```

Sidebar {.sidebar}
=====================================

```{r}
  radioButtons("diss", label = "Disaggregation",
    choices = list("All" = 1, "By Sex" = 2, "By Language" = 3), 
    selected = 1)
```

Page 1
=====================================

```{r}
# all
  all <- reactive(
  dat %>%  
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>% # convert to tibble time object
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total = 0)) 
  )

# males only
  males <- reactive(
  dat %>%  
    filter(sex=="male") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_m = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_m = 0)) 
  )

# females only
  females <- reactive(
  dat %>%  
    filter(sex=="female") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_f = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_f = 0)) 
  )

# english only
  english <- reactive(
  dat %>%  
    filter(lang=="english") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_e = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_e = 0)) 
  )

# spanish only
  spanish <- reactive(
  dat %>%  
    filter(lang=="spanish") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_s = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_s = 0)) 
  )

# combine

  totals <- reactive({

  all <- all()
  females <- females()
  males <- males()
  english <- english()
  spanish <- spanish()

  all %>%
    select(date, total) %>%
    full_join(select(females, date, total_f), by = "date") %>%
    full_join(select(males, date, total_m), by = "date") %>%
    full_join(select(english, date, total_e), by = "date") %>%
    full_join(select(spanish, date, total_s), by = "date") 
  })

# convert to xts
  totals_ <- reactive({
    totals <- totals()
    xts(totals, order.by = totals$date)
  })

# plot
  renderDygraph({

  totals_ <- totals_()

  if (input$diss == 1) {
  dygraph(totals_[, "total"],
          main= "All") %>%
    dySeries("total", label = "All") %>%
    dyRangeSelector() %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else if (input$diss == 2) {
    dygraph(totals_[, c("total_f", "total_m")],
            main = "By sex") %>%
    dyRangeSelector() %>%
    dySeries("total_f", label = "Female") %>%
    dySeries("total_m", label = "Male") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else {
    dygraph(totals_[, c("total_e", "total_s")],
            main = "By language") %>%
    dyRangeSelector() %>%
    dySeries("total_e", label = "English") %>%
    dySeries("total_s", label = "Spanish") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE)
  }
  })
```

Update:

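@Jon Spring suggested writing a function to reduce some of the repetition (applied below), which is a great improvement. The basic approach is the same, though: split, compute, combine, plot. Is there a way to do this without breaking the data apart and putting it back together?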

---
title: "test"
output: 
  flexdashboard::flex_dashboard:
    theme: bootstrap
runtime: shiny
---

```{r setup, include=FALSE}
  library(flexdashboard)
  library(tidyverse)
  library(tibbletime)
  library(dygraphs)
  library(magrittr)
  library(xts)
```

```{r global, include=FALSE}
# generate data
  set.seed(1)
  dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                               as.Date("2018-06-30"), 
                               "days"),
                    sex = sample(c("male", "female"), 181, replace=TRUE),
                    lang = sample(c("english", "spanish"), 181, replace=TRUE),
                    age = sample(20:35, 181, replace=TRUE))
  dat <- sample_n(dat, 80)

# Jon Spring's function
  prep_dat <- function(filtered_dat, col_name = "total") {
  filtered_dat %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
    select(date, new) %>%
    tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
    group_by(date) %>%
    mutate(total = sum(new, na.rm = TRUE)) %>%
    distinct(date, .keep_all = TRUE) %>%
    ungroup() %>%
    # expand matrix to include weeks without data
    complete(
      date = seq(date[1], date[length(date)], by = "1 week"),
      fill = list(total = 0)
    )
  }
```

Sidebar {.sidebar}
=====================================

```{r}
  radioButtons("diss", label = "Disaggregation",
    choices = list("All" = 1, "By Sex" = 2, "By Language" = 3), 
    selected = 1)
```

Page 1
=====================================

```{r}
# all
  all <- reactive(
  prep_dat(dat) 
  )

# males only
  males <- reactive(
  prep_dat(
    dat %>% 
    filter(sex == "male")
  ) %>% 
    rename("total_m" = "total")
  )

# females only
  females <- reactive(
  prep_dat(
    dat %>% 
    filter(sex == "female")
  ) %>% 
    rename("total_f" = "total")
  )

# english only
  english <- reactive(
  prep_dat(
    dat %>% 
    filter(lang == "english")
  ) %>% 
    rename("total_e" = "total")
  )

# spanish only
  spanish <- reactive(
  prep_dat(
    dat %>% 
    filter(lang == "spanish")
  ) %>% 
    rename("total_s" = "total")
  )

# combine

  totals <- reactive({

  all <- all()
  females <- females()
  males <- males()
  english <- english()
  spanish <- spanish()

  all %>%
    select(date, total) %>%
    full_join(select(females, date, total_f), by = "date") %>%
    full_join(select(males, date, total_m), by = "date") %>%
    full_join(select(english, date, total_e), by = "date") %>%
    full_join(select(spanish, date, total_s), by = "date") 
  })

# convert to xts
  totals_ <- reactive({
    totals <- totals()
    xts(totals, order.by = totals$date)
  })

# plot
  renderDygraph({

  totals_ <- totals_()

  if (input$diss == 1) {
  dygraph(totals_[, "total"],
          main= "All") %>%
    dySeries("total", label = "All") %>%
    dyRangeSelector() %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else if (input$diss == 2) {
    dygraph(totals_[, c("total_f", "total_m")],
            main = "By sex") %>%
    dyRangeSelector() %>%
    dySeries("total_f", label = "Female") %>%
    dySeries("total_m", label = "Male") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else {
    dygraph(totals_[, c("total_e", "total_s")],
            main = "By language") %>%
    dyRangeSelector() %>%
    dySeries("total_e", label = "English") %>%
    dySeries("total_s", label = "Spanish") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE)
  }
  })
```

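Thanks for explaining your goal in more detail. I think the approach suggested by @simon-s-a would simplify things. If we can run the grouping dynamically, and structure it so that we don't need to know in advance which components might exist in those groups, it will be much easier to maintain.

Here's a minimal viable product that rebuilds the plotting function to include the grouping logic inside it.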

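  1. Once the data is grouped by date and by whatever our grouping variable is, it counts how many rows are in each group, then spreads those counts so that each group gets its own column.

  2. Then I use padr::pad to fill in any missing time rows in between, and replace all the NAs with zeros.

  3. Finally, that data frame is converted to an xts object and fed into dygraph, which seems to handle the multiple columns automatically.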

Here:

---
title: "test"
output: 
  flexdashboard::flex_dashboard:
    theme: bootstrap
runtime: shiny
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```

```{r global, include=FALSE}
# generate data
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                             as.Date("2018-06-30"), 
                             "days"),
                  sex = sample(c("male", "female"), 181, replace=TRUE),
                  lang = sample(c("english", "spanish"), 181, replace=TRUE),
                  age = sample(20:35, 181, replace=TRUE))
dat <- dplyr::sample_n(dat, 80)
```

Sidebar {.sidebar}
=====================================

```{r}

radioButtons("diss", label = "Disaggregation",
             choices = list("All" = "Total",
                            "By Sex" = "sex",
                            "By Language" = "lang"), 
             selected = "Total")
```

Page 1
=====================================

```{r plot}

renderDygraph({
  grp_col <- rlang::sym(input$diss) # This converts the input selection to a symbol

  dat %>%
    mutate(Total = 1) %>% # This is a hack to let us "group" by Total -- all one group

    # Here's where we unquote the symbol so that dplyr can use it 
    #   to refer to a column. In this case I make a dummy column 
    #   that's a copy of whatever column we want to group
    mutate(my_group = !!grp_col) %>%

    # Now we make a group for every existing combination of week 
    #   (using lubridate::floor_date) and level of our grouping column,
    #   count how many rows in each group, and spread that to wide format.
    group_by(date = lubridate::floor_date(date, "1 week"), my_group) %>%
    count() %>% spread(my_group, n) %>% ungroup() %>%

    # padr::pad() fills in any missing weeks in the sequence with new rows
    #   Then we replace all the NA's with zeroes.
    padr::pad() %>% replace(is.na(.), 0) %>%

    # Finally we can convert to xts and feed the wide table into dygraph.
    xts::xts(order.by = .$date) %>%
    dygraph() %>%
    dyRangeSelector() %>%
    dyOptions(
      useDataTimezone = FALSE, stepPlot = TRUE,
      drawGrid = FALSE, fillGraph = TRUE
    )
})
```
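
To check the wide table outside of Shiny, the count-and-spread steps can be sketched as a plain function. This is only an illustration, not the code used in the dashboard: `weekly_counts`, `grp_level`, `week_start` and the `demo` data are hypothetical names, and it swaps in `tidyr::complete()` and `tidyr::pivot_wider()` for `padr::pad()` and `spread()`, and `.data[[grp]]` for `rlang::sym()` plus `!!`.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical helper: weekly row counts, one column per level of `grp`.
# With grp = NULL every row falls into a single "Total" group.
weekly_counts <- function(df, grp = NULL) {
  df <- if (is.null(grp)) {
    mutate(df, grp_level = "Total")
  } else {
    mutate(df, grp_level = as.character(.data[[grp]]))
  }

  df %>%
    # Count rows per (week, group level); weeks start on Sunday here.
    count(week_start = floor_date(date, "week", week_start = 7), grp_level) %>%
    # Add the missing week/level combinations with a count of zero.
    complete(
      week_start = seq(min(week_start), max(week_start), by = "week"),
      grp_level,
      fill = list(n = 0L)
    ) %>%
    # One column per group level.
    pivot_wider(names_from = grp_level, values_from = n)
}

# Tiny made-up data set, just to exercise the function.
demo <- data.frame(
  date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-20")),
  sex  = c("male", "female", "male")
)

weekly_counts(demo, "sex")
weekly_counts(demo)
```

Completing the grid before pivoting means the zero fill only has to name the `n` column, so the function does not need to know the group levels in advance.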