I am pulling some data from a SQL database through ODBC, and the columns are automatically set to factor. It looks something like this:
library(RODBC)
library(data.table)
data <- data.table(sqlQuery(channel, query))
My data looks like the following, only with many more columns:
data <- data.table("C1"=as.factor(c(letters[1:4], "NULL", letters[5])),
"C2"=as.factor(c(rnorm(3), "NULL", rnorm(2))),
"C3"=as.factor(c(letters[1], "NULL", letters[2:4], "NULL")))
> data
C1 C2 C3
1: a -0.190200079604691 a
2: b 0.310548914832963 NULL
3: c 0.0153099116493453 b
4: d NULL c
5: NULL 0.157187027626419 d
6: e 0.118537540781528 NULL
> str(data)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ C1: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 6 5
$ C2: Factor w/ 6 levels "-0.190200079604691",..: 1 5 2 6 4 3
$ C3: Factor w/ 5 levels "a","b","c","d",..: 1 5 2 3 4 5
- attr(*, ".internal.selfref")=<externalptr>
How can I replace "NULL" with NA? I want R to treat these SQL "NULL" strings as missing values (NA). I tried the following, but assigning NA seems to cause a problem:
for (col in names(data)) {
set(data, which(data[[col]]=="NULL"), col, NA)
}
> Error in set(data, which(data[[col]] == "NULL"), col, NA) :
Can't assign to column 'C1' (type 'factor') a value of type 'logical' (not character, factor, integer or numeric)
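One possible workaround, sketched here under the assumption that every affected column is a factor, is to leave the cells alone and instead set the "NULL" level itself to NA. Base R then marks every value carrying that level as missing and drops the level. The sample data mirrors the example above; `set.seed(1)` is added only to make the sketch reproducible.

```r
library(data.table)

set.seed(1)  # only so the illustrative data is reproducible
data <- data.table(
  C1 = as.factor(c(letters[1:4], "NULL", letters[5])),
  C2 = as.factor(c(rnorm(3), "NULL", rnorm(2))),
  C3 = as.factor(c(letters[1], "NULL", letters[2:4], "NULL"))
)

for (col in names(data)) {
  x <- data[[col]]
  if (is.factor(x)) {
    # Setting a level to NA turns every value with that level into NA
    # and removes "NULL" from the set of levels.
    levels(x)[levels(x) == "NULL"] <- NA
    # Write the modified factor back into the table by reference.
    set(data, j = col, value = x)
  }
}

print(data)
```

Note that C2 is still a factor after this step. If it is meant to be numeric, it would need a further conversion such as as.numeric(as.character(data$C2)).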
RODBC solution
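Thanks to a suggestion from @user20650, you can control how sqlQuery handles missing values by doing data <- data.table(sqlQuery(channel, query, na.strings=c("NA", "NULL"))). However, the problem can still arise if your data source is badly formatted, so this is not a general solution to the question in this post.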
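For data that has already been loaded, a similar effect can be sketched without a database connection. Base R's type.convert also accepts an na.strings argument, so each column can be converted to character and re-parsed with "NULL" treated as missing. This is only an illustration on made-up sample data, not a description of what sqlQuery does internally; with as.is = TRUE, text columns come back as character rather than factor.

```r
library(data.table)

# Small made-up sample standing in for an already-loaded query result.
data <- data.table(
  C1 = factor(c("a", "b", "NULL")),
  C2 = factor(c("1.5", "NULL", "2.5"))
)

cols <- names(data)
data[, (cols) := lapply(.SD, function(x) {
  # Re-parse each column, treating both "NA" and "NULL" as missing.
  # as.is = TRUE keeps text columns as character instead of factor.
  type.convert(as.character(x), na.strings = c("NA", "NULL"), as.is = TRUE)
}), .SDcols = cols]

str(data)
```

With this approach, a column such as C2 that holds only numbers and "NULL" strings ends up numeric, which avoids a separate factor-to-numeric conversion afterwards.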