THE MNIST DATABASE of handwritten digits

2023-11-14

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

Four files are available on this site:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

Please note that your browser may decompress these files without telling you. If the files you downloaded are larger than the sizes listed above, your browser has decompressed them; simply rename them to remove the .gz extension. Some people have asked me, "my application can't open your image files." These files are not in any standard image format. You have to write your own (very simple) program to read them. The file format is described at the bottom of the original MNIST page.
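Because the files are not in a standard image format, a reader has to be written by hand. The following is a minimal Python sketch of such a reader, not the author's code. It assumes the IDX layout used by these four files: two zero bytes, a type byte of 0x08 (unsigned byte), a byte giving the number of dimensions, each dimension size as a big-endian 32-bit integer, and then the raw values in row-major order. The names `parse_idx` and `read_idx` are illustrative. The reader checks for the gzip signature, so it works whether or not the browser has already decompressed the file.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_idx(data: bytes):
    """Parse IDX-formatted bytes holding unsigned-byte values.

    Returns (dims, values): dims is a tuple of dimension sizes and
    values is a flat bytes object in row-major order.
    """
    if len(data) < 4 or data[0] != 0 or data[1] != 0:
        raise ValueError("not an IDX file (first two bytes must be zero)")
    dtype, ndim = data[2], data[3]
    if dtype != 0x08:
        raise ValueError(f"unsupported IDX data type 0x{dtype:02x}")
    header_end = 4 + 4 * ndim
    if len(data) < header_end:
        raise ValueError("truncated IDX header")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    dims = struct.unpack(f">{ndim}I", data[4:header_end])
    count = 1
    for d in dims:
        count *= d
    values = data[header_end:header_end + count]
    if len(values) != count:
        raise ValueError("truncated IDX payload")
    return dims, values


def read_idx(path: str):
    """Read an IDX file, whether or not it is still gzip-compressed."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw[:2] == GZIP_MAGIC:  # still compressed: decompress in memory
        raw = gzip.decompress(raw)
    return parse_idx(raw)
```

With the training images, `dims, pixels = read_idx("train-images-idx3-ubyte.gz")` should give `dims == (60000, 28, 28)`, and image `k` would then occupy `pixels[k * 784:(k + 1) * 784]`, one byte per grey level.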

The original black-and-white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
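The centering step can be sketched in a few lines of Python. This is an illustration, not the code NIST or the MNIST authors used: it assumes a grey-level image given as a list of rows, takes the center of the 28x28 field to be (13.5, 13.5) in pixel coordinates, and translates by whole pixels only. The function names are hypothetical.

```python
def center_of_mass(img):
    """Intensity-weighted (row, col) center of mass of a 2-D grid of grey levels."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("blank image has no center of mass")
    r = sum(i * v for i, row in enumerate(img) for v in row) / total
    c = sum(j * v for row in img for j, v in enumerate(row)) / total
    return r, c


def center_by_mass(digit, size=28):
    """Paste `digit` into a size x size field so that its center of mass
    falls as close as possible to the field center, using whole-pixel shifts."""
    r, c = center_of_mass(digit)
    mid = (size - 1) / 2  # 13.5 for a 28x28 field (assumed definition of "center")
    top, left = round(mid - r), round(mid - c)
    field = [[0] * size for _ in range(size)]
    for i, row in enumerate(digit):
        for j, v in enumerate(row):
            y, x = top + i, left + j
            if 0 <= y < size and 0 <= x < size:  # clip anything shifted off the field
                field[y][x] = v
    return field
```

A digit that was already normalized to the 20x20 box always fits after this shift, so the clipping check only matters for unusual inputs.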

With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than by center of mass. If you do this kind of preprocessing, you should report it in your publications.
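For comparison, here is a hedged Python sketch of bounding-box centering: it finds the rows and columns that contain non-zero pixels and moves that box to the middle of the field. The function names are illustrative, and the choice to put an odd leftover pixel of margin at the bottom/right is an assumption, not something the text specifies.

```python
def bounding_box(img):
    """Return (top, bottom, left, right) of the non-zero pixels, inclusive."""
    rows = [i for i, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("blank image has no bounding box")
    cols = [j for j in range(len(img[0])) if any(row[j] for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def center_by_bounding_box(img):
    """Translate the ink so its bounding box sits in the middle of the same-size
    field (whole-pixel shift; odd leftovers go to the bottom/right margin)."""
    height, width = len(img), len(img[0])
    top, bottom, left, right = bounding_box(img)
    box_h, box_w = bottom - top + 1, right - left + 1
    new_top, new_left = (height - box_h) // 2, (width - box_w) // 2
    out = [[0] * width for _ in range(height)]
    for i in range(box_h):
        for j in range(box_w):
            out[new_top + i][new_left + j] = img[top + i][left + j]
    return out
```

Unlike center-of-mass centering, the result ignores how the ink is distributed inside the box, so a digit with a heavy stroke on one side ends up in the same place as a lighter one of the same extent.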

The MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. This is because SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples. Therefore, it was necessary to build a new database by mixing NIST's datasets.

The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. The test set is composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000-pattern training set contains examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint.

SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available, and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set, and the remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern #0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern #35,000 to make a full set of 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000-sample training set is available.
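The construction procedure described above can be sketched in Python. This is a hypothetical reconstruction, not the authors' tooling: it assumes SD-1 is available as a list of `(writer_id, sample)` pairs in scrambled order, that "the first 250 writers" means the first 250 in ascending writer-id order, and that SD-3 is a list indexed by pattern number. It does not verify writer disjointness for the SD-3 portions, which would require SD-3 writer ids.

```python
def split_by_writer(sd1, n_train_writers):
    """Group scrambled (writer_id, sample) pairs by writer and send the first
    n_train_writers writers (assumed: ordered by ascending writer id) to training."""
    by_writer = {}
    for writer, sample in sd1:
        by_writer.setdefault(writer, []).append(sample)
    writers = sorted(by_writer)
    train_ids = set(writers[:n_train_writers])
    train = [s for w in writers if w in train_ids for s in by_writer[w]]
    test = [s for w in writers if w not in train_ids for s in by_writer[w]]
    return train, test


def complete_with(base, sd3, start, target):
    """Append consecutive SD-3 samples beginning at index `start` until len == target."""
    need = target - len(base)
    if need < 0:
        raise ValueError("base set is already larger than the target size")
    extra = sd3[start:start + need]
    if len(extra) < need:
        raise ValueError("not enough SD-3 samples after the start index")
    return base + extra


def build_split(sd1, sd3, n_train_writers=250, train_size=60_000,
                test_size=60_000, test_sd3_start=35_000):
    """Writer-disjoint SD-1 halves, each topped up with a different SD-3 range."""
    train_sd1, test_sd1 = split_by_writer(sd1, n_train_writers)
    used = train_size - len(train_sd1)  # SD-3 patterns consumed by the training set
    if used > test_sd3_start:
        raise ValueError("training and test SD-3 ranges would overlap")
    train = complete_with(train_sd1, sd3, 0, train_size)
    test = complete_with(test_sd1, sd3, test_sd3_start, test_size)
    return train, test
```

The defaults mirror the numbers in the text (250 training writers, 60,000 patterns per set, SD-3 test patterns starting at #35,000). The overlap check makes explicit why starting the test portion at #35,000 keeps the two SD-3 ranges separate when the training set needs only about 30,000 SD-3 patterns.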

Reposted from: http://yann.lecun.com/exdb/mnist/
