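Short version: I load() data in a package. Previously the package's tests passed; now they fail because the output of sort() has changed.
Here is a minimal reproducible example (details below):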
y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")
sort(y)
# OLD 3.5.2 [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
# NEW 4.0.0 [1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"
# Update 4.0.2 see comment:
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
# From jay.sf's comment
sort.int(y, method="radix")
# [1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"
sort.int(y, method="shell")
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
# From Henrik's comment:
data.table::fsort(y)
# [1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"
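The only related reported change I found was:
CHANGES IN R 4.0.0
NEW FEATURES
...
When loading data sets via read.table(), data() now uses LC_COLLATE=C to ensure locale-independent results for possible string-to-factor conversions.
But I am not even sure whether that explains what I am seeing.
Since I want to minimize the number of imported packages and would like to understand what is going on, I am not sure how to proceed. Am I missing something?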
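One way to check whether collation is involved is to switch LC_COLLATE inside a single session and compare the results. This is a minimal sketch that assumes the collation locale is the relevant variable; it does not establish why the 4.0.0 session behaved differently. The variable names old_collate and sorted_c are only illustrative.

```r
# Sketch: sort the same vector under the "C" collation locale.
y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")

# Remember the current collation setting so it can be restored.
old_collate <- Sys.getlocale("LC_COLLATE")

# The "C" locale compares strings byte by byte, so the uppercase "R" in
# "SRZ" sorts before the lowercase "c" in "Schaffhausen".
Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(y)

# Restore the previous collation so the rest of the session is unaffected.
Sys.setlocale("LC_COLLATE", old_collate)

print(old_collate)
print(sorted_c)
```

If print(old_collate) already shows "C" in the session that produces the unexpected order, the locale rather than sort() itself would be the first thing to look at.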
(Changing to sort.int() with method = "radix" would do the job, but still: why did it change? Is that really better?)
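A sketch of that workaround without any extra imports, wrapped in a small helper so tests do not depend on the session locale. The name sort_c is hypothetical and not part of base R; the only assumption is that C-locale ordering is acceptable for the package.

```r
# Hypothetical helper (not part of base R): sort in C-locale order
# regardless of the session's LC_COLLATE setting.
sort_c <- function(x, decreasing = FALSE) {
  # method = "radix" always collates character vectors in the C locale.
  sort.int(x, decreasing = decreasing, method = "radix")
}

y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")
expected <- c("SRZ", "Schaffhausen", "Schwyz", "Seespital")

asc  <- sort_c(y)
desc <- sort_c(y, decreasing = TRUE)
print(asc)
print(desc)
```

The trade-off is that the result is stable across machines but no longer follows the user's language conventions, which may or may not matter for the data at hand.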
I just realised (thanks to Roland) that in my case sort() ends up calling sort.int():
function (x, decreasing = FALSE, na.last = NA, ...)
{
    if (is.object(x))
        x[order(x, na.last = na.last, decreasing = decreasing)]
    else sort.int(x, na.last = na.last, decreasing = decreasing,
        ...)
}
From ?sort.int:
Method "auto" selects "radix" for short (less than 2^31 elements) numeric vectors, integer vectors, logical vectors and factors; otherwise, "shell".
According to the documentation, sort.int() did not change between 4.0.0 and 4.0.2.
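Read literally, the quoted rule means a character vector falls through to "shell", which is the locale-dependent method. A minimal sketch to confirm this in a given session (the variable names are illustrative):

```r
# Sketch: compare the default ("auto") method with the explicit ones
# for a character vector.
y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")

auto_result  <- sort.int(y)                    # method = "auto"
shell_result <- sort.int(y, method = "shell")  # follows LC_COLLATE
radix_result <- sort.int(y, method = "radix")  # always C-locale

# For character input "auto" resolves to "shell", so these two agree.
print(identical(auto_result, shell_result))

# Whether this one is TRUE depends on the session's collation locale.
print(identical(auto_result, radix_result))
```

If the second comparison is TRUE, the session is effectively collating as in the C locale.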
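From ?data.table::setorder:
data.table always reorders in "C-locale". As a result, the ordering may be different to that obtained by base::order. In English locales, for example, sorting is case-sensitive in C-locale. Thus, sorting c("c", "a", "B") returns c("B", "a", "c") in data.table but c("a", "B", "c") in base::order. Note this makes no difference in most cases of data; both return identical results on ids where only uppercase or lowercase letters are present ("AB123"
Using C-locale makes the behaviour of sorting in data.table more consistent across sessions and locales. The behaviour of base::order depends on assumptions about the locale of the R session. In English locales, "america"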
(Related questions: "Language dependent sorting with R" and "Best practice: Should I try to change to UTF-8 as locale or is it safe to leave it as is?")
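The c("c", "a", "B") example from the quoted help page can be reproduced in base R alone, using order() with method = "radix" as a stand-in for data.table's C-locale ordering. This is only a sketch: it assumes the radix method matches what data.table does for this input, which holds for plain ASCII strings like these.

```r
# Sketch: C-locale versus locale-dependent ordering in base R.
x <- c("c", "a", "B")

# Radix ordering is always C-locale: uppercase "B" comes before
# lowercase "a" because of its lower byte value.
c_order <- x[order(x, method = "radix")]
print(c_order)

# Default ordering follows LC_COLLATE, so the result depends on the session.
locale_order <- x[order(x)]
print(locale_order)
```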
Details
R.version # old _
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 5.2
year 2018
month 12
day 20
svn rev 75870
language R
version.string R version 3.5.2 (2018-12-20)
nickname Eggshell Igloo
y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")
sort(y)
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
stringr::str_sort(y)
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
stringr::str_sort(y, locale = "C")
# [1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"
# =======
R.version # new after upgrade
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.0
year 2020
month 04
day 24
svn rev 78286
language R
version.string R version 4.0.0 (2020-04-24)
nickname Arbor Day
y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")
sort(y)
# [1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"
stringr::str_sort(y)
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
stringr::str_sort(y, locale = "C")
#[1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"
# ==== Test with new 4.0.2
R.version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.2
year 2020
month 06
day 22
svn rev 78730
language R
version.string R version 4.0.2 (2020-06-22)
nickname Taking Off Again
y <- c("Schaffhausen", "Schwyz", "Seespital", "SRZ")
sort(y)
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
stringr::str_sort(y)
# [1] "Schaffhausen" "Schwyz" "Seespital" "SRZ"
stringr::str_sort(y, locale = "C")
# [1] "SRZ" "Schaffhausen" "Schwyz" "Seespital"