I have a large dataset of synonyms (10,000+ entries) stored as a list of tuples, like this:
data = [
    (435347, 'cat'),
    (435347, 'feline'),
    (435347, 'lion'),
    (6765756, 'dog'),
    (6765756, 'hound'),
    (6765756, 'puppy'),
    (435347, 'kitten'),
    (987977, 'frog')
]
Each synonym is identified by an arbitrary shared ID. In this example the IDs are 435347, 6765756, and 987977.
I want to write a function that groups the words by ID, so the data ends up looking like this:
processed_data = [
    (435347, 'cat', 'feline', 'lion', 'kitten'),
    (6765756, 'dog', 'hound', 'puppy'),
    (987977, 'frog')
]
Any suggestions would be greatly appreciated!
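One possible approach, sketched under a few assumptions: Python 3.7+ (where dicts, including `collections.defaultdict`, preserve insertion order), and the desired order being "IDs in the order they first appear, words in their original order within each ID", which is what the example output shows ('kitten' ends up after 'lion' even though it appears after the dog entries). The helper name `group_synonyms` is hypothetical. It makes a single pass over the list, so it should scale comfortably to 10,000+ tuples.

```python
from collections import defaultdict


def group_synonyms(pairs):
    """Group (id, word) pairs into (id, word1, word2, ...) tuples.

    IDs come out in the order they are first seen; words keep their
    original relative order within each ID.
    """
    # id -> list of words; defaultdict is a dict, so insertion order is kept (Python 3.7+)
    groups = defaultdict(list)
    for syn_id, word in pairs:
        groups[syn_id].append(word)
    # Flatten each group into a single tuple: (id, word1, word2, ...)
    return [(syn_id, *words) for syn_id, words in groups.items()]


data = [
    (435347, 'cat'),
    (435347, 'feline'),
    (435347, 'lion'),
    (6765756, 'dog'),
    (6765756, 'hound'),
    (6765756, 'puppy'),
    (435347, 'kitten'),
    (987977, 'frog'),
]

processed_data = group_synonyms(data)
print(processed_data)
# [(435347, 'cat', 'feline', 'lion', 'kitten'), (6765756, 'dog', 'hound', 'puppy'), (987977, 'frog')]
```

If the variable-length tuples are not a hard requirement, returning `dict(groups)` (ID mapped to a list of words) is often easier to work with for later lookups. `itertools.groupby` is another option, but it only merges adjacent items, so the data would have to be sorted by ID first, which changes the ID order relative to the example output.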