I'm doing some web scraping.
Below is the code I'm using.
I've left a few notes in the code comments.
library(httr)
library(rvest)
library(stringr)
# Bulletin board URL
List.of.questions.url <- 'http://kin.naver.com/qna/list.nhn?m=noanswer&dirId=70108'
# Vector to store the scraped post text
answers <- c()
# Get the posts from page 1 to page 2.
for (i in 1:2) {
  url <- modify_url(List.of.questions.url, query = list(page = i))
  list <- read_html(url, encoding = 'utf-8') # I think I set the encoding, but I'm getting an error.
  # Get the URL of each post.
  # TLS = title.links, CLS = content.links
  TLS <- html_nodes(list, '.basic1 dt a')
  CLS <- html_attr(TLS, 'href')
  CLS <- paste0("http://kin.naver.com", CLS)
  # Get the required fields.
  for (link in CLS) {
    h <- read_html(link)
    # answer
    answer <- html_text(html_nodes(h, '#contents_layer_1'))
    answer <- str_trim(repair_encoding(answer)) # I think I set the encoding, but I'm getting an error.
    answers <- c(answers, answer)
    print(link)
  }
}
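However, the following error occurs while scraping.
It may be related to encoding.
(But as I wrote in the comments, I believe I set the encoding correctly.)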
[1] "http://kin.naver.com/qna/detail.nhn?d1id=7&dirId=70111&docId=280474910"
Error: No guess has more than 50% confidence
In addition: There were 43 warnings (use warnings() to see them)
> warnings()
1: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
2: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
3: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
4: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
5: In stringi::stri_conv(x, from = from) :
#All the same contents, so omitted
How can I fix this?
Thanks for any advice.
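One direction that might be worth trying, offered as a sketch under assumptions rather than something verified against this site: every warning mentions U+00A0 (a non-breaking space), and the "No guess has more than 50% confidence" error appears to come from the encoding guess inside repair_encoding() rather than from read_html(). If the text returned by html_text() is already valid UTF-8, it may be enough to skip repair_encoding() and only normalise the non-breaking spaces. The helper name clean_text below is hypothetical and uses base R only.

```r
# Hypothetical helper (not part of rvest or stringr): replaces non-breaking
# spaces (U+00A0) with ordinary spaces, then trims leading/trailing whitespace.
# Assumes the input is already valid UTF-8 text.
clean_text <- function(x) {
  x <- gsub("\u00a0", " ", x, fixed = TRUE)
  trimws(x)
}

# Small self-contained check on a string that contains U+00A0
sample_text <- "\u00a0hello\u00a0world\u00a0"
print(clean_text(sample_text))
```

In the loop above, this would mean trying answer <- clean_text(answer) in place of the str_trim(repair_encoding(answer)) line, assuming the page really is served as UTF-8.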