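I have tried literally hundreds of permutations of this code, over literally days, trying to get a function that does what I want, and I have finally given up. It feels like this is definitely doable, and that I'm very close!
I've tried to cut things back to the core of the problem in the reprex below.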
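Basically, I have a single-row data frame in which one column contains a list of strings ("concepts"). I want to create an additional column for each of those strings using `mutate`, ideally with each column taking its name from the string, and then populate the column with the result of a function call. (Which function doesn't matter for now; I just need the infrastructure around the function to work.)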
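As usual, I feel like I must be missing something obvious... maybe just a syntax error.
I also wonder whether I need `purrr::map` at all; perhaps a simpler vectorised map would work fine.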
I suspect the fact that the new columns are named `..1` rather than the concept name is a bit of a clue about what is going wrong.
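One possible explanation, offered as a hedged sketch rather than a definitive diagnosis: inside a purrr formula lambda, `.x` is an argument whose default is `..1`, so `{{ .x }}` captures that expression rather than the string it holds. Unquoting the value with `!!` on the left-hand side of `:=` should name the column after the string instead. The toy tibble and concept vector below are stand-ins (assumptions), not the real nomisr output.

```r
library(dplyr)
library(purrr)

# Stand-in data (assumed values): a one-row tibble and a hard-coded concept vector
toy_df <- tibble(dataset_id = "NM_2002_1")
concepts <- c("time", "gender")

# `!!.x` unquotes the string held in .x, so each new column is named after the concept
named <- map(concepts, ~ mutate(toy_df, !!.x := toupper(.x)))

names(named[[1]])
#> [1] "dataset_id" "time"
names(named[[2]])
#> [1] "dataset_id" "gender"
```

This still returns one tibble per concept, so it only addresses the naming, not the goal of a single wide tibble.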
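I can create the data frame I want by calling each concept manually (see the end of the reprex), but because the list of concepts differs between data frames, I want to functionalise this with pipes and tidyverse techniques rather than do it by hand.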
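The goal described above could be sketched as follows, assuming `toupper()` as a placeholder for the real function and a hard-coded concept vector in place of `get_concept_list()`. Naming the vector with `purrr::set_names()` lets `map()` return a named list, which `as_tibble()` turns into one column per concept. This is one option among several, not necessarily the best fit.

```r
library(dplyr)
library(purrr)
library(tibble)

# Stand-in data (assumed values), mirroring the shape of the real tibble
toy_df <- tibble(
  dataset_title = "Population estimates",
  dataset_id = "NM_2002_1"
)
concepts <- c("time", "gender", "c_age", "measures")

new_cols <- concepts %>%
  purrr::set_names() %>%  # name each element after itself
  map(toupper) %>%        # named list, one element per concept
  as_tibble()             # one-row tibble, one column per concept

# Attach the new columns to the original single-row tibble
wide_df <- bind_cols(toy_df, new_cols)
names(wide_df)
```

Because the function is applied to the concept names directly, no `mutate()` call (and so no dynamic column naming) is needed in this variant.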
I have read the following questions looking for help:
- How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs https://stackoverflow.com/questions/49816669/how-to-use-map-from-purrr-with-dplyrmutate-to-create-multiple-new-columns-base
- How to mutate multiple columns with dynamic variable using purrr:map function? https://stackoverflow.com/questions/57183024/how-to-mutate-multiple-columns-with-dynamic-variable-using-purrrmap-function
- (R) Cleaner way to use map() with list-columns https://stackoverflow.com/questions/53938745/r-cleaner-way-to-use-map-with-list-columns
- Add multiple output variables using purrr and a predefined function https://stackoverflow.com/questions/51978138/add-multiple-output-variables-using-purrr-and-a-predefined-function
- Creating new variables with purrr (how does one go about that?) https://stackoverflow.com/questions/52607511/creating-new-variables-with-purrr-how-does-one-go-about-that
- How to compute multiple new columns in a R dataframe with dynamic names https://stackoverflow.com/questions/58641327/how-to-compute-multiple-new-columns-in-a-r-dataframe-with-dynamic-names
But none of them has quite helped me crack the problem I'm having. [edit: added the last question to that list, which may be the technique I need].
<!-- language-all: lang-r -->
# load packages -----------------------------------------------------------
library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
# set up initial list of tibbles ------------------------------------------
df <- list(
  district_population = tibble(
    dataset_title = "Population estimates - local authority based by single year",
    dataset_id = "NM_2002_1"
  ),
  jsa_claimants = tibble(
    dataset_title = "Jobseeker's Allowance with rates and proportions",
    dataset_id = "NM_1_1"
  )
)
# just use the first tibble for now, for testing --------------------------
# ideally I want to map across dfs through a list -------------------------
df <- df[[1]]
# nitty gritty functions --------------------------------------------------
get_concept_list <- function(df) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id,
                 select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension") %>%
    filter(!concept == "geography") %>%
    pull("concept")
}
# get_concept_list() returns the strings I need:
get_concept_list(df)
#> [1] "time" "gender" "c_age" "measures"
# Here is a list of examples of types of map* that do various things,
# none of which is what I need it to do
# I'm using toupper() here for simplicity - ultimately I will use
# get_concept_info() to populate the new columns
# this creates four new tibbles
get_concept_list(df) %>%
  map(~ mutate(df, {{.x}} := toupper(.x)))
#> [[1]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#>
#> [[2]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 GENDER
#>
#> [[3]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 C_AGE
#>
#> [[4]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this throws an error
get_concept_list(df) %>%
  map_chr(~ mutate(df, {{.x}} := toupper(.x)))
#> Error: Result 1 must be a single string, not a vector of class `tbl_df/tbl/data.frame` and of length 3
# this creates three extra rows in the tibble
get_concept_list(df) %>%
  map_df(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this does the same as map_df
get_concept_list(df) %>%
  map_dfr(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this creates a single tibble 12 columns wide
get_concept_list(df) %>%
  map_dfc(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 1 x 12
#> dataset_title dataset_id ..1 dataset_title1 dataset_id1 ..11 dataset_title2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Population e~ NM_2002_1 TIME Population es~ NM_2002_1 GEND~ Population es~
#> # ... with 5 more variables: dataset_id2 <chr>, ..12 <chr>,
#> # dataset_title3 <chr>, dataset_id3 <chr>, ..13 <chr>
# function to get info on each concept (except geography) -----------------
# this is the function I want to use eventually to populate my new columns
get_concept_info <- function(df, concept_name) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id) %>%
    filter(name == "dimensions") %>%
    pluck("value", 1, "dimension") %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    nest(data = everything()) %>%
    as.list() %>%
    pluck("data")
}
# individual mutate works, for comparison ---------------------------------
# I can create the kind of table I want manually using a line like the one below
# df %>% map(~ mutate(., measures = get_concept_info(., concept_name = "measures")))
df %>% mutate(., measures = get_concept_info(df, "measures"))
#> # A tibble: 1 x 3
#> dataset_title dataset_id measures
#> <chr> <chr> <list>
#> 1 Population estimates - local authority based by sin~ NM_2002_1 <tibble [2 x ~
<sup>Created on 2020-02-10 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
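To extend the manual `mutate()` call to every concept, and then to every tibble in the original list, one approach that might work is to fold over the concept names with `purrr::reduce()`, adding one list-column per step. In the sketch below, `add_concept_columns()` and `fake_concept_list()` are hypothetical helpers, the concept names are made up for illustration, and `toupper()` stands in for `get_concept_info()` so the example runs without calling the Nomis API.

```r
library(dplyr)
library(purrr)

# Stand-in for the original list of tibbles (titles omitted for brevity)
dfs <- list(
  district_population = tibble(dataset_id = "NM_2002_1"),
  jsa_claimants = tibble(dataset_id = "NM_1_1")
)

# Hypothetical stand-in for get_concept_list(); the returned names are invented
fake_concept_list <- function(df) {
  if (df$dataset_id == "NM_2002_1") c("time", "gender") else c("time", "item")
}

# Hypothetical helper: add one list-column per concept, filled by f(df, concept)
add_concept_columns <- function(df, f) {
  reduce(
    fake_concept_list(df),
    function(acc, nm) mutate(acc, !!nm := list(f(df, nm))),
    .init = df
  )
}

# Apply the helper to each tibble; the placeholder f just upper-cases the name
result <- map(dfs, add_concept_columns, f = function(df, nm) toupper(nm))
names(result$district_population)
names(result$jsa_claimants)
```

Wrapping the value in `list()` keeps each new column a list-column, matching the `<list>` column produced by the manual call, so the placeholder could be swapped for a function that returns a tibble.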