My console uses a font that can display these "characters":
txt <- "Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃"
Encoding(txt)
#[1] "UTF-8"
dput(txt)
#"Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"
> tvec <- scan(text=txt, what="")
Read 13 items
> dput(tvec)
c("Person,", "Message,", "A,", "\U0001f609,", "A,", "How", "are",
"you?,", "B,", "\U0001f64d", "Alright!,", "A,", "\U0001f483\U0001f483"
)
> which(tvec == '\U0001f609,')
[1] 4
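When I read that text with scan using a comma as sep, the leading spaces kept the equality test from succeeding, but the two-character version (a space followed by the emoji) did match: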
> tvec <- scan(text=txt, what="", sep=",")
Read 14 items
> which(tvec == '\U0001f609')
integer(0)
> dput(tvec)
c("Person", " Message", "", "A", " \U0001f609", "", "A", " How are you?",
"", "B", " \U0001f64d Alright!", "", "A", " \U0001f483\U0001f483"
)
> which(tvec == " 😉")
[1] 5
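If the leading spaces are unwanted, one possible workaround (a sketch only, assuming a UTF-8 locale and the same txt as above) is to let scan strip whitespace from each field with strip.white=TRUE, or to trim afterwards with trimws. quiet=TRUE is added here only to suppress the "Read ... items" message.

```r
# Same text as above, written with \U escapes so the code is font-independent
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

# Option 1: strip leading/trailing whitespace from each field while reading
tvec <- scan(text = txt, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)
which(tvec == "\U0001f609")

# Option 2: read as before, then trim the fields before comparing
raw <- scan(text = txt, what = "", sep = ",", quiet = TRUE)
which(trimws(raw) == "\U0001f609")
```

Both calls should report the same position as the two-character test above, since only the surrounding whitespace changes and the empty fields from the trailing commas are still kept.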
This was with Courier New as the console/editor font on a Mac. For an explanation of the Unicode escape notation, see ?Quotes {base}.
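To make the escape notation a little more concrete, here is a small sketch (the variable name wink is just for illustration) that uses base R's utf8ToInt, intToUtf8 and nchar to show that "\U0001f609" is a single character whose code point is U+1F609, and why the string with the leading space counts as two characters:

```r
wink <- "\U0001f609"                   # \U followed by up to 8 hex digits

utf8ToInt(wink)                        # code point as an integer
sprintf("U+%X", utf8ToInt(wink))       # the same code point in hex notation
identical(intToUtf8(0x1F609), wink)    # build the string from the code point

nchar(wink)                            # counted as one character
nchar(wink, type = "bytes")            # but several bytes in UTF-8
nchar(" \U0001f609")                   # leading space + emoji
```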