由于这是 Microsoft 提供的 ODBC 驱动程序的限制,因此在他们对驱动程序进行更改之前几乎没有什么可做的。 @zozlak 在您链接到的 GitHub 问题中解释了原因。
我倾向于在需要时使用存储过程来解决这个问题,但这通常需要为每个特定实例编写一个存储过程。在某些时候,我可能会想出一种更通用的方法来在存储过程中执行此操作,但我发现在存储过程中构造查询的过程非常乏味且令人沮丧。
因此,我只是花了一些时间构建一个函数,该函数将执行涉及 VARCHAR(MAX) 变量的有限查询。这是一种蛮力方法,对于 17000 个字符的变量,将其导出为三个变量并将它们粘贴到 R 中。它很粗糙,可能不是很有效,但这是我迄今为止想到的最佳解决方案。
另一个限制是它不允许您在查询中重命名变量。您将被数据库中命名的变量所困扰。如果您只涉及几个表,那可能不是问题。在非常复杂的数据库中,这可能会出现问题。但是,至少有了这个,您可以只查询 VARCHAR(MAX) 变量,其中包含一些必要的 ID,以便在 R 中执行合并。
正如 GitHub 问题中所讨论的,最好尽可能避免使用 VARCHAR(MAX)。如果确实需要未知长度,则 VARBINARY(MAX) 更容易查询。
Example
source("https://gist.githubusercontent.com/nutterb/d2e050dada608bb6213e61d0f8471b65/raw/be8717f318b3e3087e7c26c9a8f9d0a582a5daef/query_varchar_max"
channel <- odbcDriverConnect(...)
query_varchar_max(channel = channel,
id = c("idvar"),
varchar_max = c("varchar_max_var", "varchar_max_var2"),
from = "FROM dbo.table_name WHERE group = ?",
data = list(group = "A"))
功能码
#' @name query_varchar_max
#' @title Query a VARCHAR(MAX) Variable from SQL Server
#'
#' @description The RODBC driver to SQL Server (SQL Server Native Client 11.0)
#' reports the lenght of a VARCHAR(MAX) variable to be zero. This presents
#' difficulties in extracting long text values from the database. Often, the
#' ODBC will assume a length of 255 characters and truncate the text to that
#' many characters. The approach taken here searches the VARCHAR(MAX) variables
#' for the longest length, and extracts the data in segments to be pasted
#' together in R.
#'
#' @param channel A valid ODBC channel to a SQL Server database.
#' @param id A character vector of ID variables that may be used to merge the
#' data from this query into another dataset.
#' @param varchar_max a character vector of variable names that are to be
#' treated as if they are VARCHAR(MAX) variables.
#' @param from A single character string providing the remainder of the query
#' to be run, beginning with the \code{FROM} statement.
#' @param stringsAsFactors \code{logical(1)}. Should character strings returned
#' from the database be converted to factors?
#' @param ... Additional arguments to \code{sqlExecute} when running the full
#' query.
#'
#' @details \code{query_varchar_max} operates by determining how many columns of up to
#' 8000 characters each are required to export a complete VARCHAR(MAX) variable.
#' It then creates the necessary number of intermediate variables and queries the
#' data using the SQL Server \code{SUBSTRING} command, extracting the VARCHAR(MAX)
#' variable in increments of 8000 characters. After completing the query,
#' the intemediary variables are concatenated and removed from the data.
#'
#' The function makes accommodation for multi-part queries as far as [TABLE].[VARIABLE]
#' formats are concerned. It is not intended for use in [SCHEMA].[TABLE].[VARIABLE]
#' formats. This at least allows \code{from} to include joins for more complex
#' queries. Parameterized queries are also supported through \code{sqlExecute}.
#'
#' @export
query_varchar_max <- function(channel, id, varchar_max, from,
stringsAsFactors = FALSE, ...)
{
coll <- checkmate::makeAssertCollection()
checkmate::assert_class(x = channel,
classes = "RODBC",
add = coll)
checkmate::assert_character(x = id,
add = coll)
checkmate::assert_character(x = varchar_max,
add = coll)
checkmate::assert_character(x = from,
len = 1,
add = coll)
checkmate::assert_logical(x = stringsAsFactors,
len = 1,
add = coll)
checkmate::reportAssertions(coll)
varchar_max_len <-
paste0(
sprintf("MAX(LEN(%s)) AS len_%s",
varchar_max,
sub("[.]", "_", varchar_max)),
collapse = ", "
)
varchar_len <-
unlist(
RODBCext::sqlExecute(
channel = channel,
query = sprintf("SELECT %s %s",
varchar_max_len,
from),
fetch = TRUE
)
)
varchar_max_cols <-
unlist(
mapply(expand_varchar_max,
varchar_max,
varchar_len,
SIMPLIFY = FALSE)
)
Prelim <-
RODBCext::sqlExecute(
channel = channel,
query = sprintf("SELECT %s, %s %s",
paste0(id, collapse = ", "),
paste0(varchar_max_cols, collapse = ", "),
from),
fetch = TRUE,
stringsAsFactors = stringsAsFactors,
...
)
var_stub_to_combine <-
unique(
sub(
"(part)(\\d{1,3})",
"\\1",
sub(".+AS ", "", varchar_max_cols)
)
)
col_to_combine <-
lapply(var_stub_to_combine,
grep,
names(Prelim))
Prelim[sub(".+[.]", "", varchar_max)] <-
lapply(col_to_combine,
function(col) apply(Prelim[col], 1, paste0, collapse = ""))
Prelim[-unlist(col_to_combine)]
}
expand_varchar_max <- function(varchar_max, varchar_len)
{
nvar <- varchar_len %/% 8000 + 1
var_list <- vector("character", length = nvar)
for (i in seq_along(var_list))
{
var_list[i] <-
sprintf("SUBSTRING(%s, %s, %s) AS %s_part%s",
varchar_max,
1 + (i - 1) * 8000,
8000,
paste0(sub("[.]", "_", varchar_max)),
i)
}
var_list
}