Error in scale.default: length of 'center' must equal the number of columns of 'x'

2024-03-18

I am using the mboost package to do some classification. Here is the code:

library('mboost')
load('so-data.rdata')
model <- glmboost(is_exciting~., data=training, family=Binomial())
pred <- predict(model, newdata=test, type="response")

But when predicting, R complains:

Error in scale.default(X, center = cm, scale = FALSE) : 
  length of 'center' must equal the number of columns of 'x'

The data (training and test) can be downloaded here (7z: https://dl.dropboxusercontent.com/u/1335302/so-data.7z, zip: https://dl.dropboxusercontent.com/u/1335302/so-data.zip). What causes the error, and how can I get rid of it? Thanks.

UPDATE:

> str(training)
'data.frame':   439599 obs. of  24 variables:
 $ is_exciting                           : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_state                          : Factor w/ 52 levels "AK","AL","AR",..: 15 5 5 23 47 5 44 42 42 5 ...
 $ school_charter                        : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_magnet                         : Factor w/ 2 levels "f","t": 1 1 1 1 2 1 1 1 1 1 ...
 $ school_year_round                     : Factor w/ 2 levels "f","t": 1 1 1 1 1 2 1 1 1 2 ...
 $ school_nlns                           : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_charter_ready_promise          : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_prefix                        : Factor w/ 6 levels "","Dr.","Mr.",..: 5 5 3 5 6 5 6 6 5 6 ...
 $ teacher_teach_for_america             : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 2 1 2 1 ...
 $ teacher_ny_teaching_fellow            : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ primary_focus_subject                 : Factor w/ 28 levels "","Applied Sciences",..: 19 17 18 18 10 4 17 17 18 17 ...
 $ primary_focus_area                    : Factor w/ 8 levels "","Applied Learning",..: 6 5 5 5 5 4 5 5 5 5 ...
 $ secondary_focus_subject               : Factor w/ 28 levels "","Applied Sciences",..: 28 18 17 19 26 18 18 28 24 25 ...
 $ secondary_focus_area                  : Factor w/ 8 levels "","Applied Learning",..: 7 5 5 6 8 5 5 7 7 4 ...
 $ resource_type                         : Factor w/ 7 levels "","Books","Other",..: 4 4 2 5 5 2 2 5 5 5 ...
 $ poverty_level                         : Factor w/ 4 levels "high poverty",..: 2 2 4 2 1 2 2 1 2 1 ...
 $ grade_level                           : Factor w/ 5 levels "","Grades 3-5",..: 5 5 2 5 5 2 3 2 4 2 ...
 $ fulfillment_labor_materials           : num  30 35 35 30 30 35 30 35 35 35 ...
 $ total_price_excluding_optional_support: num  1274 477 892 548 385 ...
 $ total_price_including_optional_support: num  1499 562 1050 645 453 ...
 $ students_reached                      : int  31 20 250 36 19 28 90 21 60 56 ...
 $ eligible_double_your_impact_match     : Factor w/ 2 levels "f","t": 1 2 1 2 1 2 1 1 1 1 ...
 $ eligible_almost_home_match            : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 2 2 1 1 ...
 $ essay_length                          : int  236 285 194 351 383 273 385 437 476 159 ...


> str(test)
'data.frame':   44772 obs. of  23 variables:
 $ school_state                          : Factor w/ 51 levels "AK","AL","AR",..: 22 35 11 46 5 35 11 28 28 10 ...
 $ school_charter                        : Factor w/ 2 levels "f","t": 1 1 1 1 2 1 1 1 1 1 ...
 $ school_magnet                         : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_year_round                     : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_nlns                           : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_charter_ready_promise          : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_prefix                        : Factor w/ 6 levels "","Dr.","Mr.",..: 3 5 6 6 3 5 5 5 3 5 ...
 $ teacher_teach_for_america             : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_ny_teaching_fellow            : Factor w/ 2 levels "f","t": 1 2 1 1 1 1 1 1 1 1 ...
 $ primary_focus_subject                 : Factor w/ 28 levels "","Applied Sciences",..: 5 16 17 17 18 11 16 17 2 17 ...
 $ primary_focus_area                    : Factor w/ 8 levels "","Applied Learning",..: 2 4 5 5 5 2 4 5 6 5 ...
 $ secondary_focus_subject               : Factor w/ 28 levels "","Applied Sciences",..: 25 1 19 1 17 9 17 11 1 1 ...
 $ secondary_focus_area                  : Factor w/ 8 levels "","Applied Learning",..: 4 1 6 1 5 6 5 2 1 1 ...
 $ resource_type                         : Factor w/ 7 levels "","Books","Other",..: 5 5 5 2 5 6 4 5 5 4 ...
 $ poverty_level                         : Factor w/ 4 levels "high poverty",..: 1 2 4 4 1 2 2 2 1 2 ...
 $ grade_level                           : Factor w/ 5 levels "","Grades 3-5",..: 4 3 3 5 4 5 5 4 3 5 ...
 $ fulfillment_labor_materials           : num  30 30 30 30 30 30 30 30 30 30 ...
 $ total_price_excluding_optional_support: num  2185 149 1017 156 860 ...
 $ total_price_including_optional_support: num  2571 175 1197 183 1012 ...
 $ students_reached                      : int  200 110 10 22 180 51 30 15 260 20 ...
 $ eligible_double_your_impact_match     : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ eligible_almost_home_match            : Factor w/ 2 levels "f","t": 2 1 1 1 1 1 1 1 2 1 ...
 $ essay_length                          : int  221 137 313 243 373 344 304 431 231 173 ...


> summary(model)

     Generalized Linear Models Fitted via Gradient Boosting

Call:
glmboost.formula(formula = is_exciting ~ ., data = training,     family = Binomial())


     Negative Binomial Likelihood 

Loss function: { 
     f <- pmin(abs(f), 36) * sign(f) 
     p <- exp(f)/(exp(f) + exp(-f)) 
     y <- (y + 1)/2 
     -y * log(p) - (1 - y) * log(1 - p) 
 } 


Number of boosting iterations: mstop = 100 
Step size:  0.1 
Offset:  -1.197806 

Coefficients: 

NOTE: Coefficients from a Binomial model are half the size of coefficients
 from a model fitted via glm(... , family = 'binomial').
See Warning section in ?coef.mboost

                       (Intercept)                     school_stateDC 
                     -0.5250166130                       0.0426909965 
                    school_stateIL                    school_chartert 
                      0.0084191638                       0.0729272310 
                teacher_prefixMrs.                  teacher_prefixMs. 
                     -0.0181489492                       0.0438425925 
        teacher_teach_for_americat                 resource_typeBooks 
                      0.2593005345                       0.0046126706 
           resource_typeTechnology        fulfillment_labor_materials 
                     -0.0313904871                       0.0120086140 
eligible_double_your_impact_matcht        eligible_almost_home_matcht 
                     -0.0316376431                      -0.0522717398 
                      essay_length 
                      0.0004993224 
attr(,"offset")
[1] -1.197806

Selection frequencies:
       fulfillment_labor_materials         teacher_teach_for_americat 
                              0.24                               0.15 
                      essay_length                    school_chartert 
                              0.15                               0.09 
                 teacher_prefixMs.            resource_typeTechnology 
                              0.08                               0.07 
eligible_double_your_impact_matcht        eligible_almost_home_matcht 
                              0.07                               0.07 
                teacher_prefixMrs.                     school_stateDC 
                              0.04                               0.02 
                    school_stateIL                 resource_typeBooks 
                              0.01                               0.01

I also tried glm, but it says:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor teacher_prefix has new levels

But I don't see any new levels in the teacher_prefix variable:

> levels(training$teacher_prefix)
[1] ""           "Dr."        "Mr."        "Mr. & Mrs." "Mrs."       "Ms."       
> levels(test$teacher_prefix)
[1] ""           "Dr."        "Mr."        "Mr. & Mrs." "Mrs."       "Ms."

In fact, the glmboost and glm problems are related: both come from your teacher_prefix variable.

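As the glm example indicates, test contains levels that are not in training (sort of). Although both factors have the same levels(), the training set has no observations where teacher_prefix == "", but the test set does. Compare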

table(test$teacher_prefix)
table(training$teacher_prefix)
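To run the same comparison over every factor at once, here is a small base-R sketch. unseen_levels() is a hypothetical helper (not part of mboost or base R): for each factor column present in both data frames, it lists the values that actually occur in the test data but never occur in the training data, regardless of what levels() reports. The tiny train and test data frames are made-up stand-ins for the real training and test sets.

```r
# Hypothetical helper: for each factor column shared by train and test, list the
# values observed in test that are never observed in train. It compares the
# values actually present, not levels(), so declared-but-empty levels are caught.
unseen_levels <- function(train, test) {
  shared <- intersect(names(train), names(test))
  out <- list()
  for (nm in shared) {
    if (is.factor(test[[nm]])) {
      seen_train <- unique(as.character(train[[nm]]))
      seen_test  <- unique(as.character(test[[nm]]))
      extra <- setdiff(seen_test, seen_train)
      if (length(extra) > 0) out[[nm]] <- extra
    }
  }
  out
}

# Toy data (hypothetical): both factors declare the same four levels,
# but only the test data actually uses "".
lev <- c("", "Dr.", "Mr.", "Ms.")
train <- data.frame(
  y      = c(0, 1, 0, 1, 0, 1),
  prefix = factor(c("Dr.", "Mr.", "Ms.", "Dr.", "Mr.", "Ms."), levels = lev)
)
test <- data.frame(prefix = factor(c("", "Mr.", "Ms."), levels = lev))

identical(levels(train$prefix), levels(test$prefix))  # TRUE: same declared levels
table(train$prefix)         # the count for "" is 0
table(test$prefix)          # the count for "" is 1
unseen_levels(train, test)  # list(prefix = "")
```

On the real data, unseen_levels(training, test) would run the same check over every factor, not only teacher_prefix.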

So glm actually gives the more accurate and more helpful error message. glmboost has the same problem; it just doesn't say so as directly.
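The base-R sketch below reproduces both messages on toy data. It assumes (we have not checked mboost's source) that glmboost builds its training design matrix after dropping unused factor levels, and that predict() builds a matrix from newdata with all declared levels and then centers it with the training column means; the error in the question shows mboost calling scale(X, center = cm, scale = FALSE), which the scale() call here imitates. The sketch also suggests why the glm message in the question shows nothing after "new levels": the unseen level is the empty string.

```r
# Toy data (hypothetical): the "" level is declared in both data frames,
# but only the test data actually contains it.
lev <- c("", "Dr.", "Mr.", "Ms.")
train <- data.frame(
  y      = c(0, 1, 0, 1, 0, 1),
  prefix = factor(c("Dr.", "Mr.", "Ms.", "Dr.", "Mr.", "Ms."), levels = lev)
)
test <- data.frame(prefix = factor(c("", "Mr.", "Ms."), levels = lev))

# 1) glm drops the unused "" level when fitting, so at prediction time ""
#    counts as a new level. Its name is empty, so nothing follows "new levels".
fit <- glm(y ~ prefix, data = train, family = binomial())
fit$xlevels$prefix  # "Dr." "Mr." "Ms."
glm_msg <- tryCatch(predict(fit, newdata = test), error = conditionMessage)

# 2) Design matrices built the two ways have different numbers of columns.
X_train <- model.matrix(y ~ prefix, data = droplevels(train))  # 3 columns
X_test  <- model.matrix(~ prefix, data = test)                 # 4 columns
cm <- colMeans(X_train)

# Centering the test matrix with the training means fails with the same
# message that glmboost's predict() produced.
scale_msg <- tryCatch(scale(X_test, center = cm, scale = FALSE),
                      error = conditionMessage)
glm_msg
scale_msg
```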

Doing this seems to "fix" it:

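# keep only the test rows whose teacher_prefix is one of these four levels
test2 <- subset(test, teacher_prefix %in% c("Dr.","Mr.","Mrs.","Ms."))
# remove the factor levels that no longer occur in test2
test2$teacher_prefix <- droplevels(test2$teacher_prefix)
pred <- predict(model, newdata=test2, type="response")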

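We drop the test rows whose teacher_prefix is not one of the four listed levels, drop the factor levels that are now unused, and then predict as usual.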
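The subset above hard-codes the levels to keep. A more general sketch, under the same assumption as that fix (test rows with levels never seen in training can simply be discarded), is the hypothetical helper align_to_training() below. For every shared factor it drops the test rows whose value never occurs in training, then re-declares the factor with exactly the levels observed in training, so the test design matrix should get the same columns even if the test data happens to lack some level. The demo uses glm so it runs without mboost; with glmboost you would pass the result to predict(model, newdata = ..., type = "response") as in the fix above. Dropped rows get no prediction, so whether this is acceptable depends on the task.

```r
# Hypothetical helper: make every factor in `test` use exactly the levels that
# are actually observed in `train`, dropping test rows with unseen values
# (rows with NA in that factor are dropped too).
align_to_training <- function(train, test) {
  for (nm in intersect(names(train), names(test))) {
    if (is.factor(train[[nm]]) && is.factor(test[[nm]])) {
      seen <- levels(droplevels(train[[nm]]))
      keep <- as.character(test[[nm]]) %in% seen
      test <- test[keep, , drop = FALSE]
      test[[nm]] <- factor(as.character(test[[nm]]), levels = seen)
    }
  }
  test
}

# Toy data (hypothetical): only the test data uses the declared "" level.
lev <- c("", "Dr.", "Mr.", "Ms.")
train <- data.frame(
  y      = c(0, 1, 0, 1, 0, 1),
  prefix = factor(c("Dr.", "Mr.", "Ms.", "Dr.", "Mr.", "Ms."), levels = lev)
)
test <- data.frame(prefix = factor(c("", "Mr.", "Ms."), levels = lev))

test2 <- align_to_training(train, test)
nrow(test2)           # 2: the "" row was dropped
levels(test2$prefix)  # "Dr." "Mr." "Ms." (the levels seen in training)

fit <- glm(y ~ prefix, data = train, family = binomial())
pred <- predict(fit, newdata = test2, type = "response")
```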
