I am using R and the "elastic" package to query an Elasticsearch database that contains Twitter data in JSON format. The query works fine and I get the output I expect (out).
class(out)
[1] "list"
and out$hits$hits returns
> out$hits$hits
[[1]]
[[1]]$`_index`
[1] "twitter_all_geo-2014-11-01"
[[1]]$`_type`
[1] "ctweet"
[[1]]$`_id`
[1] "ubicity-twitter-160f0964-6fc7-43ef-af2a-0e1b8c8184c7"
[[1]]$`_version`
[1] 1
[[1]]$`_score`
[1] 2.10757
[[1]]$`_source`
[[1]]$`_source`$id
[1] "528330489049120770"
[[1]]$`_source`$created_at
[1] "2014-10-31T23:39:39+0000"
[[1]]$`_source`$user
[[1]]$`_source`$user$name
[1] "afterlifetemis"
[[1]]$`_source`$place
[[1]]$`_source`$place$geo_point
[[1]]$`_source`$place$geo_point[[1]]
[1] 30.4529
[[1]]$`_source`$place$geo_point[[2]]
[1] 50.61104
[[1]]$`_source`$place$city
[1] "Ukraine"
[[1]]$`_source`$place$country
[1] "Ukraine"
[[1]]$`_source`$place$country_code
[1] "UA"
[[1]]$`_source`$msg
[[1]]$`_source`$msg$text
[1] "u had one job artemis\none"
[[1]]$`_source`$msg$lang
[1] "EN"
[[1]]$`_source`$msg$hash_tags
list()
[[2]]
[[2]]$`_index`
[1] "twitter_all_geo-2014-11-01"
[[2]]$`_type`
[1] "ctweet"
...
...
Basically I want to save the data as a .csv file, so I typed
> write.csv(out$hits$hits,'out.csv')
Error in data.frame(text = "u had one job artemis\none", lang = "EN", : arguments imply differing number of rows: 1, 0
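The error can be reproduced in isolation, which hints at the cause: judging from the printed structure, the empty `hash_tags` list contributes zero rows while the other fields contribute one each. A minimal sketch, assuming a hand-built `msg` list that mirrors the element shown above (it is not the real query result):

```r
# Hand-built stand-in for one tweet's `msg` element (illustrative data only).
msg <- list(text = "u had one job artemis\none", lang = "EN", hash_tags = list())

# data.frame() needs every column to have the same number of rows;
# the empty list yields 0 rows, the two strings yield 1 row each.
err <- tryCatch(
  {
    data.frame(msg)
    NA_character_          # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)
print(err)
```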
I figured it needs to be converted to a data.frame first, so I tried:
> df <- ldply (out, data.frame)
Error in data.frame(text = "u had one job artemis\none", lang = "EN", :
arguments imply differing number of rows: 1, 0
(I also made a few other optimistic attempts, such as this one:)
> t(sapply(out$hits$hits, '[', 1:max(sapply(out$hits$hits, length))))
_index _type _id _version _score _source
[1,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-160f0964-6fc7-43ef-af2a-0e1b8c8184c7" 1 2.10757 List,5
[2,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-ba071fff-cafb-4d3f-947d-13c934905c1b" 1 2.10757 List,5
[3,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-dd64af32-4d59-4008-a3db-74471ad269d1" 1 2.10757 List,5
[4,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-4ba0d3d0-642d-4f9f-aaf9-c55929c35dc4" 1 2.10757 List,5
[5,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-d7b8cbbc-87b3-44b5-8c9c-91c7b62f1458" 1 2.10757 List,5
[6,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-76353a7c-44c9-4863-a59d-adb16716ca18" 1 2.10757 List,5
[7,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-2aec0798-9918-4b66-9b2a-ef5a4d1f3711" 1 2.10757 List,5
[8,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-c9e7637d-358a-40ee-a06c-85af04c22191" 1 2.10757 List,5
[9,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-8928c1ef-f46a-4682-99c4-4dbc55270b03" 1 2.10757 List,5
[10,] "twitter_all_geo-2014-11-01" "ctweet" "ubicity-twitter-d6b19975-b310-46c4-af11-af56971b7c4b" 1 2.10757 List,5
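This looks good at first, but the actual tweet messages are no longer in the matrix: the nested `_source` element only shows up as "List,5".
I was also optimistic that it might work to first convert it (back) to JSON (using the rjson package):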
> toJSON(out)
Error in toJSON(out) : unable to escape string. String is not utf8
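One way to narrow down this kind of failure is to check which strings are not valid UTF-8. The sketch below works on a small made-up character vector rather than on the real result, and it assumes that the offending bytes are Latin-1, which is only a guess that would need checking against the actual data:

```r
# Illustrative strings only: the second one holds a Latin-1 byte (\xe9)
# that is not valid UTF-8 on its own.
x <- c("plain ascii", "caf\xe9")

# validUTF8() (base R >= 3.3.0) flags strings that are not valid UTF-8.
bad <- !validUTF8(x)
print(bad)

# ASSUMPTION: the offending bytes are Latin-1; adjust `from` to the real encoding.
fixed <- x
fixed[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
print(validUTF8(fixed))
```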
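So in the end I have a list that I cannot save and cannot convert to JSON, a data.frame or a data.table (because it is not uniform). Can anyone give me a hint on how to a) convert it to JSON, b) save the list to a .csv file, or c) put it into a data.frame?
Thanks a lot, I think I am just not quite getting it.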
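For illustration, one possible direction is to flatten every hit with `unlist()` and align the results on the union of their field names. This is a minimal base-R sketch, not a confirmed solution: the `hits` list is hand-built to resemble `out$hits$hits` with trimmed fields, the second hit is invented, and `csv_path` is a hypothetical output location.

```r
# Hand-built stand-in for out$hits$hits (illustrative data, trimmed fields).
hits <- list(
  list(`_id` = "tweet-1", `_score` = 2.10757,
       `_source` = list(
         place = list(geo_point = list(30.4529, 50.61104), country_code = "UA"),
         msg = list(text = "u had one job artemis\none", lang = "EN",
                    hash_tags = list()))),
  list(`_id` = "tweet-2", `_score` = 2.10757,
       `_source` = list(
         place = list(geo_point = list(16.37, 48.21), country_code = "AT"),
         msg = list(text = "second tweet", lang = "EN",
                    hash_tags = list("rstats"))))
)

# unlist() flattens each nested hit into a named character vector;
# empty lists (such as hash_tags = list()) simply drop out.
rows <- lapply(hits, unlist)

# Hits can end up with different fields, so align them on the union of names;
# fields missing from a hit become NA.
cols <- unique(unlist(lapply(rows, names)))
mat <- do.call(rbind, lapply(rows, function(r) unname(r[cols])))
colnames(mat) <- cols

df <- as.data.frame(mat, stringsAsFactors = FALSE)  # all columns are character

csv_path <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(df, csv_path, row.names = FALSE)
```

With this approach the nested names are joined with dots (for example `_source.msg.text`), and a field holding several values, such as multiple hash tags, is spread over numbered columns. Every column comes out as character, so numeric fields like the score would need converting back afterwards.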
-Tobias