How do I create an sklearn Pipeline that includes feature selection and a KerasClassifier? Problem with input_dim changing during GridSearchCV

2024-04-15

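I created an sklearn Pipeline that uses SelectPercentile(f_classif) for feature selection and pipes the result into a KerasClassifier. The percentile used by SelectPercentile is a hyperparameter in the grid search. This means the input dimension changes during the grid search, and I have not managed to set the KerasClassifier's input_dim so that it follows this parameter.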

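I don't think there is a way, inside sklearn's GridSearchCV, to access the reduced data dimension that is piped into the KerasClassifier. Perhaps there is a way to share a single hyperparameter between SelectPercentile and KerasClassifier in the Pipeline (so that the percentile hyperparameter determines input_dim)? One possible solution, I think, is to build a custom classifier that wraps the two pipeline steps into one, so that the percentile hyperparameter can be shared.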
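As a rough sketch of a lighter variant of that idea: instead of merging the two steps, the classifier can build its model inside fit(), where the width of the already reduced X is known. Everything named here is hypothetical (ShapeAwareClassifier, build_stand_in, the C parameter), and LogisticRegression is only a stand-in for the network so the sketch runs without Keras; a real builder would be a function like create_baseline that actually uses its input_dim argument.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ShapeAwareClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: builds its model in fit(), once X's width is known."""

    def __init__(self, build_fn=None, C=1.0):
        self.build_fn = build_fn
        self.C = C

    def fit(self, X, y):
        X = np.asarray(X)
        self.classes_ = np.unique(y)
        # Width of the data as it arrives from the previous pipeline step.
        self.input_dim_ = X.shape[1]
        self.model_ = self.build_fn(input_dim=self.input_dim_, C=self.C)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(np.asarray(X))


def build_stand_in(input_dim, C):
    # Stand-in builder (assumption): a Keras builder would size its first
    # layer from input_dim here. LogisticRegression does not need it.
    return LogisticRegression(C=C, max_iter=1000)


# Synthetic data, only to exercise the pipeline.
X, y = make_classification(n_samples=120, n_features=30, random_state=0)

pipe = Pipeline([
    ("anova", SelectPercentile(f_classif)),
    ("clf", ShapeAwareClassifier(build_fn=build_stand_in)),
])
grid = GridSearchCV(pipe, {"anova__percentile": [20, 60]}, cv=3)
grid.fit(X, y)

best = grid.best_estimator_
best_width = best.named_steps["clf"].input_dim_
print(best_width, int(best.named_steps["anova"].get_support().sum()))
```

The two printed numbers should agree, since the classifier sizes itself from whatever the selector passed on, with no shared hyperparameter and no global state.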

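So far, the error is always some variant of "ValueError: Error when checking input: expected dense_1_input to have shape (112,) but got array with shape (23,)", raised while fitting the model.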
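To make the mismatch concrete, here is a minimal sketch of how the output width of SelectPercentile moves with the percentile. The data is synthetic; only the 112-feature width is taken from the error message, and the sample count and random seed are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

# Synthetic stand-in for the real data: 112 features as in the error message.
X, y = make_classification(n_samples=200, n_features=112, random_state=0)

widths = {}
for percentile in [20, 40, 60, 80]:
    selector = SelectPercentile(f_classif, percentile=percentile)
    # Number of columns that survive the selection at this percentile.
    widths[percentile] = selector.fit_transform(X, y).shape[1]
    print(percentile, widths[percentile])
```

Each grid candidate hands the network a different number of columns, so a first layer sized from the full 112-column matrix cannot match any of them. The (23,) in the error appears consistent with percentile=20 applied to 112 features.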

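# Imports inferred from the names used below; the Keras paths assume the
# legacy scikit-learn wrapper that accepts build_fn.
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

def create_baseline(input_dim=10, init='normal', activation_1='relu', activation_2='relu', optimizer='SGD'):
    # Create model
    model = Sequential()
    # Note: the input_dim argument is ignored here; the first layer is sized from
    # the full-width X_train, not from the reduced output of SelectPercentile.
    model.add(Dense(50, input_dim=np.shape(X_train)[1], kernel_initializer=init, activation=activation_1))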
    model.add(Dense(25, kernel_initializer=init, activation=activation_2))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=["accuracy"])
    return model

tuned_parameters = dict(
                            anova__percentile = [20, 40, 60, 80],
                            NN__optimizer = ['SGD', 'Adam'],
                            NN__init = ['glorot_normal', 'glorot_uniform'],
                            NN__activation_1 = ['relu', 'sigmoid'],
                            NN__activation_2 = ['relu', 'sigmoid'],
                            NN__batch_size = [32, 64, 128, 256]
                        )

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
for train_indices, test_indices in kfold.split(data, labels):
    # Split data
    X_train = [data[idx] for idx in train_indices]
    y_train = [labels[idx] for idx in train_indices]
    X_test = [data[idx] for idx in test_indices]
    y_test = [labels[idx] for idx in test_indices]

    # Pipe feature selection and classifier together
    anova = SelectPercentile(f_classif)
    NN = KerasClassifier(build_fn=create_baseline, epochs=1000, verbose=0)
    clf = Pipeline([('anova', anova), ('NN', NN)])      

    # Train model
    clf = GridSearchCV(clf, tuned_parameters, scoring='balanced_accuracy', n_jobs=-1, cv=kfold)
    clf.fit(X_train, y_train)
    # Test model
    y_true, y_pred = y_test, clf.predict(X_test)

The solution I found was to declare a global variable holding the transformed X inside ANOVASelection, and then read that variable when defining input_dim in create_model.

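from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_selection import SelectPercentile, f_classif

# Custom class to allow shape of transformed x to be known to classifier
# (mixin listed first, as scikit-learn recommends for custom estimators)
class ANOVASelection(TransformerMixin, BaseEstimator):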
    def __init__(self, percentile=10):
        self.percentile = percentile
        self.m = None
        self.X_new = None
        self.scores_ = None

    def fit(self, X, y):
        self.m = SelectPercentile(f_classif, percentile=self.percentile)
        self.m.fit(X,y)
        self.scores_ = self.m.scores_
        return self

    def transform(self, X):
        global X_new
        self.X_new = self.m.transform(X)
        X_new = self.X_new
        return self.X_new


# Define neural net architecture 
def create_model(init='normal', activation_1='relu', activation_2='relu', optimizer='SGD', decay=0.1):
    clear_session()
    # Determine nodes in hidden layers (Huang et al., 2003)
    m = 1 # number of output neurons
    N = np.shape(data)[0] # number of samples
    hn_1 = int(np.sum(np.sqrt((m+2)*N)+2*np.sqrt(N/(m+2))))
    hn_2 = int(m*np.sqrt(N/(m+2)))
    # Create layers
    model = Sequential()

    if optimizer == 'SGD':
        model.add(Dense(hn_1, input_dim=np.shape(X_new)[1], kernel_initializer=init,
                        kernel_regularizer=regularizers.l2(decay/2), activation=activation_1))
        model.add(Dense(hn_2, kernel_initializer=init, kernel_regularizer=regularizers.l2(decay/2),
                        activation=activation_2))
    elif optimizer == 'AdamW':
        model.add(Dense(hn_1, input_dim=np.shape(X_new)[1], kernel_initializer=init,
                        kernel_regularizer=regularizers.l2(decay), activation=activation_1))
        model.add(Dense(hn_2, kernel_initializer=init, kernel_regularizer=regularizers.l2(decay),
                        activation=activation_2))

    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    if optimizer == 'SGD':
        model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=["accuracy"])
    if optimizer == 'AdamW':
        model.compile(loss='binary_crossentropy', optimizer=AdamW(), metrics=["accuracy"])
    return model
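
For reference, the hidden-layer sizing used in create_model (the rule the code comment attributes to Huang et al., 2003) can be checked in isolation. This is a small sketch with a hypothetical helper name and an assumed sample count of 300; the np.sum call in the original is applied to a scalar and does not change the result, so it is left out here.

```python
import math


def hidden_layer_sizes(n_samples, n_outputs=1):
    """Hypothetical helper mirroring hn_1 and hn_2 in create_model."""
    ratio = math.sqrt(n_samples / (n_outputs + 2))
    hn_1 = int(math.sqrt((n_outputs + 2) * n_samples) + 2 * ratio)
    hn_2 = int(n_outputs * ratio)
    return hn_1, hn_2


# Assumed example: N = 300 samples, m = 1 output neuron.
# sqrt(3 * 300) = 30 and sqrt(300 / 3) = 10, so hn_1 = 30 + 2 * 10 and hn_2 = 10.
print(hidden_layer_sizes(300))  # → (50, 10)
```

Both sizes depend only on the number of samples and outputs, so they stay fixed across the grid; only input_dim has to follow the selected percentile.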


tuned_parameters = dict(
                            ANOVA__percentile = [20, 40, 60, 80],
                            NN__optimizer = ['SGD', 'AdamW'],
                            NN__init = ['glorot_normal', 'glorot_uniform'],
                            NN__activation_1 = ['relu', 'sigmoid'],
                            NN__activation_2 = ['relu', 'sigmoid'],
                            NN__batch_size = [32, 64, 128, 256],
                            NN__decay = [10.0**i for i in range(-10,-0) if i%2 == 1]
                        )
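
The NN__decay comprehension is easy to misread, so here is a short sketch of what it evaluates to and how large the resulting grid is. The axis sizes are read off the dict above; the variable names are illustrative only. Note that range(-10,-0) is the same as range(-10, 0), and that Python's % returns a non-negative remainder for a positive modulus, so odd negative exponents pass the i%2 == 1 test.

```python
import math

# Same comprehension as NN__decay, split so the exponents are visible.
exponents = [i for i in range(-10, 0) if i % 2 == 1]
decay_grid = [10.0 ** i for i in exponents]

# Sizes of the grid axes, read off the tuned_parameters dict above.
axis_sizes = {
    "ANOVA__percentile": 4, "NN__optimizer": 2, "NN__init": 2,
    "NN__activation_1": 2, "NN__activation_2": 2, "NN__batch_size": 4,
    "NN__decay": len(decay_grid),
}
n_candidates = math.prod(axis_sizes.values())
n_inner_fits = n_candidates * 10  # 10 inner folds from StratifiedKFold(n_splits=10)

print(exponents)     # → [-9, -7, -5, -3, -1]
print(n_candidates)  # → 1280
print(n_inner_fits)  # → 12800
```

That is 12800 model fits per outer fold before the final refit, each configured for up to 1000 epochs, which is worth keeping in mind when choosing n_jobs.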

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
for train_indices, test_indices in kfold.split(data, labels):
    # Ensure models from last iteration have been cleared.
    clear_session()

    # Learning Rate
    clr = CyclicLR(mode='triangular', base_lr=0.001, max_lr=0.6, step_size=5) 

    # Split data
    X_train = [data[idx] for idx in train_indices]
    y_train = [labels[idx] for idx in train_indices]
    X_test = [data[idx] for idx in test_indices]
    y_test = [labels[idx] for idx in test_indices]

    # Apply mean and variance center based on training fold
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Memory handling
    cachedir = tempfile.mkdtemp()
    mem = Memory(location=cachedir, verbose=0)
    f_classif = mem.cache(f_classif)

    # Build and train model
    ANOVA = ANOVASelection(percentile=5)
    NN = KerasClassifier(build_fn=create_model, epochs=1000, verbose=0)
    clf = Pipeline([('ANOVA', ANOVA), ('NN', NN)])
    clf = GridSearchCV(clf, tuned_parameters, scoring='balanced_accuracy', n_jobs=28, cv=kfold)
    clf.fit(X_train, y_train, NN__callbacks=[clr])

    # Test model
    y_true, y_pred = y_test, clf.predict(X_test)
