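When I ran the code I got somewhat different results, because set.seed() had not been specified. Instead of the variable name "letter", I used "letter_" so that the underscore could serve as a convenient marker for splitting: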
> fnew <- rownames(myfit$beta)[which(myfit$beta != 0)]
> fnew
[1] "letter_c" "letter_d" "letter_e" "letter_f" "letter_h" "letter_k" "letter_l"
[8] "letter_o" "letter_q" "letter_r" "letter_s" "letter_t" "letter_u" "letter_v"
[15] "letter_w"
Then I split the names and packaged the pieces into a character matrix:
> fnewmtx <- cbind( lapply(sapply(fnew, strsplit, split="_"), "[[", 2),
+ lapply(sapply(fnew, strsplit, split="_"), "[[", 1))
> fnewmtx
         [,1] [,2]
letter_c "c"  "letter"
letter_d "d"  "letter"
letter_e "e"  "letter"
letter_f "f"  "letter"
(snipped the rest)
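As a side note, the same split can be sketched with a single strsplit() call and do.call(rbind, ...), which keeps the result an ordinary character matrix rather than a matrix of list elements. The three names below and the object names parts and fnewmtx2 are hypothetical stand-ins, not taken from the fit above:

```r
# Hypothetical stand-in for the selected coefficient names
fnew <- c("letter_c", "letter_d", "letter_e")

# strsplit() returns one character vector per name; rbind them into a matrix
parts <- do.call(rbind, strsplit(fnew, split = "_", fixed = TRUE))
rownames(parts) <- fnew

# Same column order as fnewmtx: level first, variable name second
fnewmtx2 <- parts[, 2:1, drop = FALSE]
fnewmtx2
```

This assumes every name contains exactly one underscore; a variable name that itself contains "_" would need a different split rule.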
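I then wrapped the output of the paste() calls in as.formula(), which is half of the answer to how to "convert between formulas and their character representations" (the other half is as.character()):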
form <- as.formula( paste("~",
          paste(
            paste(" I(", fnewmtx[,2], "_ ==", "'", fnewmtx[,1], "') ", sep=""),
            sep="", collapse="+")
          )
        ) # edit: needed to add back the underscore
Now the output is an object of the proper class:
> class(form)
[1] "formula"
> form
~I(letter_ == "c") + I(letter_ == "d") + I(letter_ == "e") +
I(letter_ == "f") + I(letter_ == "h") + I(letter_ == "k") +
I(letter_ == "l") + I(letter_ == "o") + I(letter_ == "q") +
I(letter_ == "r") + I(letter_ == "s") + I(letter_ == "t") +
I(letter_ == "u") + I(letter_ == "v") + I(letter_ == "w")
I find it interesting that the as.formula() conversion turned the single quotes around the letters into double quotes.
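The round trip between text and formula can be sketched on a shortened two-term formula as follows. The string and the names form_chr and form_txt are made up for illustration. The quote change is expected: the string is parsed into a language object that stores only the character value, and printing or deparsing always writes character constants with double quotes.

```r
# Text -> formula
form_chr <- "~ I(letter_ == 'c') + I(letter_ == 'd')"
form <- as.formula(form_chr)
class(form)

# Formula -> text: deparse() returns the printed form; collapsing guards
# against long formulas being split across several elements
form_txt <- paste(deparse(form), collapse = " ")
form_txt

# as.character() on a one-sided formula returns its pieces:
# the "~" operator and the right-hand side
as.character(form)

# Individual terms can be recovered without any string handling
attr(terms(form), "term.labels")
```

For a formula as long as the fifteen-term one above, deparse() returns several strings, which is why the sketch collapses them before reuse.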
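Edit: Now that the question has gained an extra dimension or two, my suggestion is to skip recreating the formula altogether. Note that the row names of myfit$beta are exactly the same as the column names of X, so use the row names with non-zero coefficients as an index to select columns of the X matrix: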
> str(X[ , which( colnames(X) %in% rownames(myfit$beta)[which(myfit$beta != 0)] )] )
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:429] 9 54 91 157 166 37 55 68 117 131 ...
..@ p : int [1:61] 0 5 13 20 28 36 42 50 60 68 ...
..@ Dim : int [1:2] 200 60
..@ Dimnames:List of 2
.. ..$ : chr [1:200] "1" "2" "3" "4" ...
.. ..$ : chr [1:60] "letter_b" "letter_c" "letter_e" "letter_f" ...
..@ x : num [1:429] 1 1 1 1 1 1 1 1 1 1 ...
..@ factors : list()
> myfit2 <- glmnet(X[ , which( colnames(X) %in% rownames(myfit$beta)[which(myfit$beta != 0)] )] ,as.vector(y),lambda=.05)
> myfit2
Call: glmnet(x = X[, which(colnames(X) %in% rownames(myfit$beta)[
which(myfit$beta != 0)])],
y = as.vector(y), lambda = 0.05)
Df %Dev Lambda
[1,] 60 0.9996 0.05
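The column selection itself does not depend on glmnet, so it can be sketched on a small dense matrix. Everything below (X_toy, beta_toy, keep, X_sel and the coefficient values) is a hypothetical stand-in for X and myfit$beta, chosen only to show the indexing:

```r
# Toy stand-ins (hypothetical): a design matrix and a one-column coefficient matrix
set.seed(1)
X_toy <- matrix(rbinom(40, 1, 0.5), nrow = 10,
                dimnames = list(NULL, paste0("letter_", c("a", "b", "c", "d"))))
beta_toy <- matrix(c(0, 1.2, 0, -0.7), ncol = 1,
                   dimnames = list(colnames(X_toy), "s0"))

# Names of the predictors with non-zero coefficients
keep <- rownames(beta_toy)[beta_toy[, 1] != 0]

# Index columns by name; drop = FALSE keeps a matrix even if one column survives
X_sel <- X_toy[, keep, drop = FALSE]
colnames(X_sel)
```

Indexing by name directly also makes the which(... %in% ...) wrapper unnecessary, provided every selected name is in fact a column of the matrix.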