Understanding GPT-3: OpenAI's Latest Language Model

2023-10-27


1. Introduction

If you have been following recent developments in the NLP space, it would have been almost impossible to avoid the GPT-3 hype over the last few months. It all started with OpenAI researchers publishing their paper "Language Models are Few-Shot Learners", which introduced the GPT-3 family of models.

GPT-3's size and language capabilities are breathtaking: it can create fiction, develop program code, compose thoughtful business memos, summarize text and much more. Its possible use cases are limited only by our imagination. What makes it fascinating is that the same algorithm can perform a wide range of tasks. At the same time, there is widespread misunderstanding about the nature and risks of GPT-3's abilities.

To better appreciate the powers and limitations of GPT-3, one needs some familiarity with the pre-trained NLP models that came before it. The table below compares some of the prominent pre-trained models:

Source: https://medium.com/dataseries/bert-distilbert-roberta-and-xlnet-simplified-explanation-and-implementation-5a9580242c70

Let’s look at some of the common characteristics of the pre-trained NLP models before GPT-3:


i) NLP pre-trained models are based on the Transformer architecture

Most of the pre-trained models belong to the Transformer family and use attention techniques. These models can be divided into four categories:

Types of Language Models

ii) Different models for different tasks

The focus has been on creating customized models for various NLP tasks, so we have different pre-trained models for separate NLP tasks like sentiment analysis, question answering, entity extraction, etc.

iii) Fine-tuning of pre-trained models for improved performance

For each task, the pre-trained model needs to be fine-tuned to customize it to the data at hand. Fine-tuning involves gradient updates to the respective pre-trained model, and the updated weights are then stored for making predictions on the respective NLP task.
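To make this concrete, here is a minimal sketch of the fine-tuning workflow using the Hugging Face transformers library (discussed later in this article). The model name, dataset and hyper-parameters are illustrative placeholders, not a prescription:

```python
# Minimal fine-tuning sketch: task-specific head + gradient updates,
# then the updated weights are saved for later prediction.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)           # task-specific head

dataset = load_dataset("imdb")                   # labeled data for *this* task

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=3),
    train_dataset=dataset["train"],
)
trainer.train()                                  # gradient updates happen here
trainer.save_model("bert-imdb")                  # store the updated weights
```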

iv) Dependency of fine-tuning on large datasets


Fine-tuning a model requires the availability of a large amount of custom labeled data. This has been a bottleneck when it comes to extending pre-trained models to new domains where labeled data is limited.

v) The focus was more on architectural improvements than on size

While new pre-trained models emerged in a short span, the larger focus has been on bringing architectural improvements, or training on different datasets, to widen the net of NLP applications.

2. GPT-3 - Quick Overview

Key facts about GPT-3:


  • Models: GPT-3 has eight different models, with sizes ranging from 125 million to 175 billion parameters.

  • Model size: The largest GPT-3 model has 175 billion parameters. This is 470 times bigger than the largest BERT model (375 million parameters).

  • Architecture: GPT-3 is an autoregressive model and follows a decoder-only architecture. It is trained using a next-word-prediction objective (a minimal sketch of this objective appears after the source link below).

  • Learning: GPT-3 learns through few-shot learning; there are no gradient updates while learning.

  • Training data needed: GPT-3 needs less training data. It can learn from very little data, which enables its application in domains having less data.

Sizes, architectures, and learning hyper-parameters of the GPT-3 models

Source: https://arxiv.org/abs/2005.14165
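To make the next-word-prediction objective concrete, here is a minimal PyTorch sketch. The embedding-plus-linear model below is a toy stand-in for a decoder-only Transformer; the point is the shifted cross-entropy loss, where each position predicts the token that follows it:

```python
# Sketch of the next-word-prediction (autoregressive) objective.
# Sizes are toy values; a real GPT-3 model has up to 175B parameters.
import torch
import torch.nn.functional as F

vocab_size, batch, seq_len, d_model = 1000, 2, 8, 32

# Stand-in for a decoder-only Transformer: embed tokens, then project
# each position's hidden state back onto the vocabulary.
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))
logits = lm_head(embed(tokens))                  # (batch, seq_len, vocab)

# Each position t predicts token t+1: shift logits and targets by one.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),      # predictions for steps 1..T-1
    tokens[:, 1:].reshape(-1),                   # the actual next tokens
)
print(loss.item())
```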

Key design assumptions in the GPT-3 model:

(i) An increase in model size and training on larger data can lead to improvement in performance.

(ii) A single model can provide good performance on a host of NLP tasks.


(iii) The model can infer from new data without the need for fine-tuning.

(iv) The model can solve problems on datasets it has never been trained upon.


3. How GPT-3 learns

Traditionally, pre-trained models have learnt through fine-tuning. Fine-tuning a model needs a lot of data for the problem we are solving and also requires updates to the model weights. The existing fine-tuning approach is explained in the diagram below.

Learning Process for earlier Pre-Trained Language Models — Fine Tuning


GPT-3 adopts a different learning approach. There is no need for large labeled data for inference on new problems. Instead, it can learn from no examples (zero-shot learning), just one example (one-shot learning) or a few examples (few-shot learning).

Below is a representation of the different learning approaches followed by GPT-3.
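The three settings differ only in how many solved examples are placed in the prompt, as in this English-to-French illustration adapted from the GPT-3 paper. Note that no weights are updated; the examples live entirely in the prompt fed to the model:

```python
# Prompt formats for the three learning modes. The model receives only
# the prompt text; it is never fine-tuned on these examples.

zero_shot = """Translate English to French:
cheese =>"""

one_shot = """Translate English to French:
sea otter => loutre de mer
cheese =>"""

few_shot = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush giraffe => girafe en peluche
cheese =>"""
```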

4. How is GPT-3 different from BERT

BERT was among the earliest pre-trained models and is credited with setting the benchmarks for most NLP tasks. Below we compare GPT-3 with BERT along three dimensions:

The things which stand out from the above representation are:

  • GPT-3's size is the standout feature. It's almost 470 times the size of the largest BERT model.

  • On the architecture dimension, BERT still holds the edge. Its training objectives are better able to capture the latent relationships between pieces of text in different problem contexts.

  • GPT-3's learning approach is relatively simple and can be applied to many problems where sufficient data does not exist. Thus GPT-3 should have wider application when compared to BERT.

5. Where GPT-3 has really been successful

The application of NLP techniques has evolved with progress made in learning better representations of the underlying text corpus. The chart below gives a quick overview of some of the traditional NLP application areas.

NLP Application Areas

Traditional NLP, built on the Bag-of-Words approach, was limited to tasks like parsing text, sentiment analysis, topic models, etc. With the emergence of word vectors and neural language models, new applications like machine translation, entity recognition and information retrieval came into prominence.
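As a quick illustration of why the Bag-of-Words approach is so limited, here is a minimal scikit-learn example: each document becomes a vector of word counts, and word order is discarded entirely.

```python
# Bag-of-Words in a nutshell: documents become word-count vectors.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie was great", "the movie was terrible"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # ['great' 'movie' 'terrible' 'the' 'was']
print(counts.toarray())                     # [[1 1 0 1 1]
                                            #  [0 1 1 1 1]]
```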

In the last couple of years, the emergence of pre-trained models like BERT and RoBERTa, and of supporting frameworks like Hugging Face and spaCy Transformers, has made NLP tasks like reading comprehension and text summarization possible, and state-of-the-art benchmarks were set by these NLP models.

The frontiers where pre-trained NLP models struggled were tasks like natural language generation, natural language inference and common-sense reasoning. There was also a question mark over the application of NLP in areas where limited data is available. So the question is how much impact GPT-3 is able to make on some of these tasks.

GPT-3 has been able to make substantive progress on (i) text generation tasks and (ii) extending NLP's application into domains where there is a lack of sufficient training data.

  • Text generation capabilities: GPT-3 is very powerful when it comes to generating text. Based on human surveys, it has been observed that very little separates text generated by GPT-3 from text written by humans. This is a great development for building solutions in the space of generating creative fiction, stories, resumes, narratives, chatbots, text summarization, etc. At the same time, the world is taking cognizance of the fact that this power can be used by unscrupulous elements to create and plant fake data on social platforms.

Source: https://old.reddit.com/r/slatestarcodex/comments/hmu5lm/fiction_by_neil_gaiman_and_terry_pratchett_by_gpt3
  • Build NLP solutions with limited data: The other area where GPT-3 models have left a mark is domains where limited data is available. We have seen the open-source community use the GPT-3 API for tasks like generating UNIX shell commands, SQL queries, machine-learning code, etc. All that users need to provide is a task description in plain English and some examples of input/output (see the sketch after the source link below). This has huge potential for organizations to automate routine tasks, speed up processes and focus their talent on higher-value tasks.

Source: https://vimeo.com/427943407/98fe5258a7
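As a sketch of what such a demo looks like, here is a hypothetical English-to-SQL prompt. The API call follows the shape of OpenAI's original completions interface at the time; the engine name, example queries and parameters are illustrative placeholders:

```python
# Hypothetical few-shot prompt for English-to-SQL generation, in the
# style of the community demos. Engine name and parameters are placeholders.
import openai

# openai.api_key = "..."  # set your API key here

prompt = """Translate English to SQL:

English: show all employees
SQL: SELECT * FROM employees;

English: count orders placed in 2020
SQL: SELECT COUNT(*) FROM orders WHERE YEAR(order_date) = 2020;

English: names of customers from Paris
SQL:"""

response = openai.Completion.create(
    engine="davinci",        # placeholder engine name
    prompt=prompt,
    max_tokens=64,
    temperature=0,
    stop="\n\n",
)
print(response.choices[0].text.strip())
```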

6. Where GPT-3 struggles

We have seen that GPT-3 is able to make substantive progress on text generation tasks and also extend NLP's applications to domains that have limited data available. But how does it fare on traditional NLP tasks like machine translation, reading comprehension and natural language inference? It's a mixed bag, and this is clearly documented in the original research paper.

  • Language Modeling: GPT-3 beat all the benchmarks on pure language modeling tasks.

  • Machine Translation: The model is able to beat the benchmark performance on translation tasks that require converting documents into English. But the opposite is not true, and GPT-3's performance struggles if the translation needs to be done from English into a non-English language.

  • Reading Comprehension: GPT-3 models' performance falls well short of the state of the art here.

  • Natural Language Inference: Natural language inference (NLI) concerns the ability to understand the relationship between two sentences; for example, recognizing that the premise "A man is sleeping" contradicts the hypothesis "A man is running". GPT-3 models' performance falls well short on NLI tasks.

  • Common Sense Reasoning: Common-sense reasoning datasets test performance on physical or scientific reasoning skills. GPT-3 models' performance falls well short on these tasks.

7. Road Ahead

Integration challenges: At the moment, GPT-3 is available to selected users through OpenAI's API, and the user community is happily building toy applications with it. Many firms, especially in the financial world, have regulations that prohibit transferring data outside the firm. Given GPT-3's size, if the model needs to be integrated into mainstream applications, it will be a herculean effort to develop the necessary infrastructure to ingest the data and the model.

Single versus hybrid model debate: The dream of having a single model for all tasks, one which does not need to be trained and can learn without much data, is a cherished one. GPT-3 has made the first steps towards achieving it, but there is a journey that still needs to be made. In its present form, GPT-3 is a mixed bag, and organizations will have to take a horses-for-courses approach. One possible approach is to use GPT-3 for tasks like text generation, machine translation and areas where limited data exists, while the existing pre-trained customized models (BERT/RoBERTa/Reformer…) continue to hold ground on traditional tasks like entity recognition, sentiment analysis and question answering.
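A hypothetical sketch of such a horses-for-courses setup is shown below; both handler functions are placeholders standing in for real integrations, not an actual API:

```python
# Hypothetical hybrid dispatcher: route generative and low-data tasks to
# GPT-3, keep classic tasks on fine-tuned models. Both handlers are stubs.

def gpt3_completion(prompt: str) -> str:
    raise NotImplementedError("call the GPT-3 API here")

def fine_tuned_model(task: str, text: str) -> str:
    raise NotImplementedError("call a fine-tuned BERT/RoBERTa model here")

GPT3_TASKS = {"text-generation", "summarization", "translation"}

def solve(task: str, text: str) -> str:
    if task in GPT3_TASKS:
        return gpt3_completion(f"{task}: {text}")
    # entity recognition, sentiment analysis, QA, etc. stay on
    # fine-tuned task-specific models
    return fine_tuned_model(task, text)
```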

Concerns over model bias and explainability: Given the sheer size of GPT-3, it will be very difficult for firms to explain the decisions made by the algorithm. There is no way for firms to regulate the data that was used to train the algorithm. How do we know whether the training data has built-in bias, or whether the algorithm is making its decisions based on false data that has been put in the public domain? This is further complicated by the ambiguity over how the actual decision is made just by observing a few examples in few-shot learning.

Need for regulations to prevent misuse: Some very valid concerns are being raised about misuse of GPT-3's powers if it is not properly regulated. How the AI community comes up with regulation that prohibits misuse will govern, to a large extent, how the model gets accepted in organizations, where there is increased awareness of Responsible AI.

Translated from: https://medium.com/swlh/understanding-gpt-3-openais-latest-language-model-a3ef89cffac2
