I have a data frame ("my_file") in R that looks like this:
NAME Address_Parse
1 name1 [('372', 'StreetNumber'), ('river', 'StreetName'), ('St', 'StreetType'), ('S', 'StreetDirection'), ('toronto', 'Municipality'), ('ON', 'Province'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]
2 name2 [('208', 'StreetNumber'), ('ocean', 'StreetName'), ('St', 'StreetType'), ('E', 'StreetDirection'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('J8N', 'PostalCode'), ('1G8', 'PostalCode')]
In case the structure is confusing, here is the same data as a reproducible object:
my_file = structure(list(NAME = c("name1", "name2"), Address_Parse = c("[('372', 'StreetNumber'), ('river', 'StreetName'), ('St', 'StreetType'), ('S', 'StreetDirection'), ('toronto', 'Municipality'), ('ON', 'Province'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]",
"[('208', 'StreetNumber'), ('ocean', 'StreetName'), ('St', 'StreetType'), ('E', 'StreetDirection'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('J8N', 'PostalCode'), ('1G8', 'PostalCode')]"
)), class = "data.frame", row.names = c(NA, -2L))
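Goal: for each row, I want to take each "element" (e.g. "StreetNumber", "StreetName", "StreetType", etc.) and turn it into a new column, with the two "PostalCode" pieces joined into one value. The result would look like this: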
   name StreetNumber StreetName StreetType StreetDirection Municipality Province PostalCode
1 name1          372      river         St               S      toronto       ON     A1C9R7
2 name2          208      ocean         St               E      Toronto       ON     J8N1G8
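To me, the address field looks like JSON (I could be wrong), so I looked into different ways of parsing JSON. For example, I tried applying the answer given here (R: Convert nested JSON in a data frame column to additional columns in the same data frame https://stackoverflow.com/questions/49633803/r-convert-nested-json-in-a-data-frame-column-to-addtional-columns-in-the-same-d):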
library(dplyr)
library(tidyr)
library(purrr)
library(jsonlite)
final = my_file %>%
  mutate(
    json_parsed = map(Address_Parse, ~ fromJSON(., flatten=TRUE))
  ) %>%
  unnest(json_parsed)
However, this gives me the following error:
Error in `mutate()`:
! Problem while computing `json_parsed = map(Address_Parse, ~fromJSON(., flatten = TRUE))`.
Caused by error:
! lexical error: invalid char in json text.
[('372', 'StreetNumber'), ('rive
(right here) ------^
Run `rlang::last_error()` to see where the error occurred.
I then tried another approach:
final <- my_file %>%
  rowwise() %>%
  do(data.frame(fromJSON(.$Address_Parse , flatten = T))) %>%
  ungroup() %>%
  bind_cols(my_file %>% select(-Address_Parse ))
But now I get a new error:
Error: lexical error: invalid char in json text.
[('372', 'StreetNumber'), ('rive
(right here) ------^
Can someone please show me how to solve this problem?
Thank you!
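One possible direction, offered as a minimal sketch rather than a definitive answer: the string uses parentheses and single quotes, neither of which is valid JSON syntax, which would explain why `fromJSON()` stops at the first `(`. The base R code below instead pulls out each `('value', 'label')` pair with a regular expression and pastes together values that share a label. The helper name `parse_address` is hypothetical, and the sketch assumes that no value or label contains an apostrophe.

```r
# Example data copied from the question
my_file <- data.frame(
  NAME = c("name1", "name2"),
  Address_Parse = c(
    "[('372', 'StreetNumber'), ('river', 'StreetName'), ('St', 'StreetType'), ('S', 'StreetDirection'), ('toronto', 'Municipality'), ('ON', 'Province'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]",
    "[('208', 'StreetNumber'), ('ocean', 'StreetName'), ('St', 'StreetType'), ('E', 'StreetDirection'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('J8N', 'PostalCode'), ('1G8', 'PostalCode')]"
  ),
  stringsAsFactors = FALSE
)

# Pattern for one ('value', 'label') pair.
# Assumption: neither part contains a single quote.
pair_pattern <- "\\('([^']*)', '([^']*)'\\)"

# Hypothetical helper: turn one address string into a named character vector
parse_address <- function(x) {
  pairs  <- regmatches(x, gregexpr(pair_pattern, x))[[1]]
  values <- sub(pair_pattern, "\\1", pairs)
  labels <- sub(pair_pattern, "\\2", pairs)
  # Group values by label (keeping first-seen order) and join repeats,
  # so the two PostalCode pieces become one string
  groups <- split(values, factor(labels, levels = unique(labels)))
  vapply(groups, paste, character(1), collapse = "")
}

parsed <- lapply(my_file$Address_Parse, parse_address)

# Align every row on the full set of labels; a label missing from a row gives NA
all_labels <- unique(unlist(lapply(parsed, names)))
wide <- do.call(rbind, lapply(parsed, function(p) unname(p[all_labels])))
colnames(wide) <- all_labels

final <- data.frame(name = my_file$NAME, wide, stringsAsFactors = FALSE)
print(final)
```

All resulting columns are character; if numeric street numbers are needed, `final$StreetNumber` could be converted afterwards with `as.integer()`.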