Re 40：读论文 GL-GIN: Fast and Accurate Non-Autoregressive Model for Joint Multiple Intent Detection and

2023-05-16

诸神缄默不语-个人CSDN博文目录

论文名称：GL-GIN: Fast and Accurate Non-Autoregressive Model for Joint Multiple Intent Detection and Slot Filling
论文下载地址：https://aclanthology.org/2021.acl-long.15或https://arxiv.org/abs/2106.01925
论文官方GitHub项目地址：yizhen20133868/GL-GIN

本文是2021年ACL论文，关注意图识别（intent detection）和槽填充（slot filling）任务，这两个任务本来就常常被联合建模，以利用其间的关系。
场景是multi-intent SLU（每个句子需要识别多个意图），以前的工作用的都是AR范式，缺点是推理慢、信息泄露（……其实我不是很懂为什么会泄露，从左到右还泄露个啥？），本文是第一篇用NAR范式做这个任务的工作。

文章目录

1. 背景介绍
2. Global-Locally Graph-Interaction Network (GLGIN)
- 2.1 Self-attentive Encoder
- 2.2 Token-Level Intent Detection Decoder
- 2.3 Slot Filling Decoder
- - 2.3.1 Slot-aware LSTM
  - 2.3.2 Global-locally Graph Interaction Layer
  - 2.3.3 Slot Prediction
- 2.4 Joint Training
3. 实验
- 3.1 数据集
- 3.2 baseline
- 3.3 实验设置
- 3.4 主实验结果
- 3.5 模型分析
- - 3.5.1 速度
  - 3.5.2 ablation study
  - 3.5.3 可视化
  - 3.5.4 定性分析 - 案例分析
  - 3.5.5 预训练模型的功效
4. 代码复现

1. 背景介绍

AR和NAR的区别：
multiple intent detection：对每句话识别出多个意图（多任务文本分类的感觉）
槽填充slot filling：序列标注任务。比如需要知道用户要在什么时间订房，就在输入句中标注时间

在这里插入图片描述

2. Global-Locally Graph-Interaction Network (GLGIN)

联合训练，同时用NAR范式生成目标和槽
逻辑上是先表征输入文本，然后预测intent，然后再用intent预测的信息来预测slot，整个过程是耦合的
在这里插入图片描述

（图右下角）local slot-aware graph layer：槽隐藏层互联，建模槽依赖关系，缓解NAR造成的uncoordinated slot problem¹

（图右上角）global intent-slot interaction layer：sentence-level intent-slot interaction
global graph由all tokens with multiple intents构造，并行生成槽序列，加速解码过程
（以前的工作仅考虑 token-level intent-slot interaction）

加上预训练语言模型RoBerta²后效果更好

参考了：A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding

2.1 Self-attentive Encoder

BiLSTM + self-attention
BiLSTM的输出和self-attention的输出进行concat

2.2 Token-Level Intent Detection Decoder

对每个token预测多个意图，然后在所有token上投票，得到最终的结果

过BiLSTM（E是输入表征）
预测token对应的意图（2层MLP）：
得到最终意图：

直接投票（超过一半token支持该意图）

2.3 Slot Filling Decoder

2.3.1 Slot-aware LSTM

在这里插入图片描述

2.3.2 Global-locally Graph Interaction Layer

Vanilla Graph Attention Network
masked self-attention layers
Local Slot-aware Graph Interaction Layer
构图：节点是每一个word slot（用对应slot的隐藏层表征初始化），边是slot连接一个窗口内的所有其他slot
Information Aggregation：
Global Slot-Intent Graph Interaction Layer
可以实现并行输出output slot sequences（说是这么个意思，但是我记得transformer和GNN等价来着³，所以跟直接用transformer是不是差不多啊……）
构图：
- 节点
  意图和slot token（上一层得到的）
  （这一部分的叙述我总感觉有点问题，但是总之：）
- 边
  - intent-slot connection：slot连接所有被预测的intent
  - slot-slot connection：以指定window size连接相邻slot（slot依赖和incorporate the bidirectional contextual information）
  - intent-intent connection：全连接
    参考：⁴
Information Aggregation：大概来说就是分别把slot和intent的信息聚合后求和

2.3.3 Slot Prediction

用slot的表征过MLP：
在这里插入图片描述

2.4 Joint Training

参考Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
2个任务损失函数的加权求和

在这里插入图片描述

3. 实验

3.1 数据集

数据集都下载自⁴的官方项目：https://github.com/LooperXX/AGIF

MixSNIPS⁵
Mix-ATIS⁶

GL-GIN处理后的数据样本格式都是：

list O
california B-state_name
airports O
, O
list O
la B-city_name
and O
how O
many O
canadian B-airline_name
airlines I-airline_name
international I-airline_name
flights O
use O
aircraft O
320 B-aircraft_code
atis_airport#atis_city#atis_quantity

3.2 baseline

略。

3.3 实验设置

略。

3.4 主实验结果

对具体指标的介绍略。
在这里插入图片描述

3.5 模型分析

3.5.1 速度

在这里插入图片描述

3.5.2 ablation study

在这里插入图片描述

3.5.3 可视化

Global intent-slot GAL的注意力值
在这里插入图片描述

3.5.4 定性分析 - 案例分析

在这里插入图片描述

3.5.5 预训练模型的功效

在这里插入图片描述

用Roberta²替换Self-attentive Encoder，进行微调
consider the first subword label if a word is broken into multiple subwords（参考⁷）

4. 代码复现

我是直接用之前配好的一个有PyTorch的环境，然后单独安了所需的包：
pip install fitlog
pip install ordered-set

直接使用GitHub的README文件给出的执行代码即可：
MixATIS数据集的运行命令（跑了5个小时）：python train.py -g -bs=16 -dd=./data/MixATIS_clean -sd=./save/MixATIS_clean -nh=4 -wed=128 -ied=128 -ehd=256 -sdhd=128 -dghd=64 -nldg=2 -sgw=2 -ne=200
运行结果：slot f1: 0.8737, intent f1: 0.785428378059492, intent acc: 0.7729468599033816, exact acc: 0.4190821256038647

MixSNIPS数据集的运行命令：python train.py -g -bs=16 -dd=./data/MixSNIPS_clean -sd=./save/MixSNIPS_clean -nh=8 -wed=64 -ied=128 -ehd=256 -sdhd=128 -dghd=128 -nldg=2 -sgw=1 -ne=100
运行结果：
在这里插入图片描述

这个代码写得非常清晰，我辈楷模！

我看作者是哈工大的，为什么要用复旦开发的贼难用的fitlog，小编也很好奇！
我写的fitlog笔记博文：fitlog使用教程（持续更新ing…）
train.py：代码入口
config.py：超参设置（也可以通过命令行传参）
loader.py
1. DatasetManager
  1. __init__：词/slot/intent分别建立Alphabet、raw text、序列化text
  2. quick_build：把3个数据集文件里的数据添加到__init__中构建的三个实例中，新建保存路径，保存Alphabet到本地
  3. add_file：把单个数据集文件里的数据添加到__init__中构建的三个实例中
  4. __read_file：读取单个文件里的数据（返回texts, slots, intents的原始文本）
  5. show_summary：打印数据集和超参
2. Alphabet：感觉可以理解成泛词典这种感觉
  1. __init__
  2. add_instance：把token加进对象（#是考虑到类似atis_capacity#atis_flight）
  3. save_content：保存对象到本地
module.py
1. ModelManager：整个模型
  1. __intent_decoder：2层MLP
  2. __intent_embedding：这个是构图时候用的intent表征
  3. generate_global_adj_gat
  4. generate_slot_adj_gat
2. Encoder：LSTMEncoder和SelfAttention的输出concat起来
3. LSTMEncoder：Dropout→单层Bi-LSTM
4. SelfAttention：Dropout→QKVAttention
5. QKVAttention：QKV分别进行线性转换，然后Q×K^T，过softmax，开方（多头），×V，过dropout
6. LSTMDecoder：slot过GAT后和intent合并，再过一次GAT，过2层MLP，取出slot表征
  1. __slot_graph：GAT
  2. __global_graph：GAT
7. GAT：这个是作者手动实现的
8. GraphAttentionLayer

别的我就不写了。

可参考我写的另一篇博文：BIO序列标注中标签不协调的问题及其解决方案 ↩︎
RoBERTa: A Robustly Optimized BERT Pretraining Approach ↩︎ ↩︎
可以参考一篇博文。
中文版：原来Transformer就是一种图神经网络，这个概念你清楚吗？
英文原文：Transformers are Graph Neural Networks | NTU Graph Deep Learning Lab ↩︎
AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling ↩︎ ↩︎
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces ↩︎
The ATIS Spoken Language Systems Pilot Corpus ↩︎
A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding ↩︎

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)