Model() got multiple values for argument 'nr_class' - SpaCy multi-classification model (BERT integration)

2024-04-09

Hello, I am working on implementing a multi-classification model (5 classes) with the new SpaCy model en_pytt_bertbaseuncased_lg. The code for the new pipeline is here:

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class":5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last = True)

textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")

The training code is as follows, based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):

def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key

# get names of other pipes to disable them during training
n_iter = 250 # number of epochs

train_data = list(zip(train_texts, [{"cats": cats} for cats in train_cats]))


dev_cats_single   = [extract_cat(x) for x in dev_cats]
train_cats_single = [extract_cat(x) for x in train_cats]
cats = list(set(train_cats_single))
recall = {}
for c in cats:
    if c is not None: 
        recall['dev_'+c] = []
        recall['train_'+c] = []



optimizer = nlp.resume_training()
batch_sizes = compounding(1.0, round(len(train_texts)/2), 1.001)

for i in range(n_iter):
    random.shuffle(train_data)
    losses = {}
    batches = minibatch(train_data, size=batch_sizes)
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
    print(i, losses)

So my data is structured like this:

[('TEXT TEXT TEXT',
  {'cats': {'class1': False,
    'class2': False,
    'class3': False,
    'class4': True,
    'class5': False}}), ... ]

I am not sure why I am getting the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-32-1588a4eadc8d> in <module>
     21 
     22 
---> 23 optimizer = nlp.resume_training()
     24 batch_sizes = compounding(1.0, round(len(train_texts)/2), 1.001)
     25 

TypeError: Model() got multiple values for argument 'nr_class'

EDIT:

If I take out the nr_class argument, I get this error instead:

ValueError: operands could not be broadcast together with shapes (1,2) (1,5)

I actually expected this to happen, because I no longer specify the nr_class argument. Is that right?


This is a regression in the most recent version of spacy-pytorch-transformers that we released. Sorry about this!

The root cause is that this is another case of the evils of **kwargs. I'm looking forward to refining the spaCy API to prevent these issues in the future.

You can see the offending line here: https://github.com/explosion/spacy-pytorch-transformers/blob/c1def95e1df783c69bff9bc8b40b5461800e9231/spacy_pytorch_transformers/pipeline/textcat.py#L71. We supply nr_class as a positional argument, which collides with the explicit nr_class you passed in through the config.
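To make the collision concrete, here is a minimal, library-free sketch of how this kind of error arises in plain Python. The names `Model` and `create_model`, and the hard-coded label count, are illustrative assumptions and not the actual code in textcat.py; only the mechanism (a positional argument plus the same name arriving again via `**kwargs`) is the point.

```python
def Model(nr_class, **cfg):
    # Hypothetical stand-in for a model factory that takes nr_class first
    # and accepts any remaining settings as keyword arguments.
    return {"nr_class": nr_class, **cfg}


def create_model(user_config):
    # Hypothetical stand-in for the library code: it supplies nr_class
    # positionally AND forwards the user's config as keyword arguments.
    nr_labels = 5  # assumed to be derived by the library, e.g. from the labels
    return Model(nr_labels, **user_config)


message = None
try:
    # The user config also contains nr_class, so Model() receives it twice.
    create_model({"nr_class": 5, "exclusive_classes": True})
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Without the duplicate key the same call goes through.
ok = create_model({"exclusive_classes": True})
print(ok)
```

Because the config is unpacked with `**`, Python cannot tell that the two values are meant to be the same setting; it only sees one parameter being bound twice and raises the TypeError.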

To work around the problem, you can simply remove the nr_class key from the config dict you're passing into nlp.create_pipe().
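The workaround can be sketched as follows. The helper name `without_nr_class` is hypothetical and only meant to show the idea of dropping the key before the config reaches the pipe; the commented-out spaCy call mirrors the one from the question and assumes spaCy 2.x with spacy-pytorch-transformers installed.

```python
def without_nr_class(config):
    """Return a copy of a pipe config dict with the 'nr_class' key removed."""
    return {key: value for key, value in config.items() if key != "nr_class"}


# The config from the question, including the key that triggers the collision.
config = {"nr_class": 5, "exclusive_classes": True}
safe_config = without_nr_class(config)
print(safe_config)  # {'exclusive_classes': True}

# With the libraries installed, the cleaned config would be passed like this:
# textcat = nlp.create_pipe("pytt_textcat", config=safe_config)
```

In practice it is simpler to just leave nr_class out of the dict literal; the helper is only useful if the config comes from somewhere else.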
