我目前正在研究 NER 模型。我有一堆以 CoNLL 格式存储的数据,需要转换为 Spacy 格式。在 CoNLL 中,句子的每个单词旁边都有一个标签。在 Spacy 中,标签仅显示给具有实际标签的单词。我如何从下面的这种格式转换(CoNLL)
From O
2001 B-DateTime
to I-DateTime
2004 I-DateTime
, O
I O
was O
a O
stagehand O
for O
Hartford B-Company
Stage I-Company
Company O
. O
改成下面的格式(Spacy)
TRAIN_DATA = [('what is the price of polo?', {'entities': [(21, 25, 'PrdName')]}),
('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}),
('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}),
('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}),
('what is the price of jeans?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of bat?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of shirt?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of bag?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of cup?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of jug?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of plate?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of glass?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of watch?', {'entities': [(21, 26, 'PrdName')]})]