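This looks like a graph problem: each pair of words is an edge, and the groups you want are the connected components.
You could try networkx (https://networkx.org):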
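import networkx as nx

# Build an undirected graph with one edge per row of df (columns 'v1' and 'v2').
G = nx.from_pandas_edgelist(df, 'v1', 'v2')
# connected_components returns a generator, so materialize it as a list of sets.
clusters = list(nx.connected_components(G))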
Output:
[{'be', 'belong'}, {'delay', 'increase', 'decrease'}, {'analyze', 'assay'},
{'report', 'bespeak', 'circulate'}, {'induce', 'generate'}, {'trip', 'cause'},
{'distinguish', 'isolate'}, {'infect', 'give'}, {'prove', 'result'},
{'intercede', 'describe', 'explain'}, {'affect', 'expose'}, {'restrict', 'suppress'}]
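If it helps to see what "connected components" means here, below is a minimal pure-Python sketch that groups words the same way. Note that it uses a union-find structure, which is my own substitution for illustration and not necessarily how networkx computes the result. The function name `connected_groups` and the sample `edges` list are hypothetical; the real edges would come from the `v1`/`v2` columns of `df`.

```python
def connected_groups(edges):
    """Group nodes that are linked by any chain of edges (union-find sketch)."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root.
        parent.setdefault(node, node)
        # Walk up to the root, shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the two groups at either end of every edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect nodes under their final root.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical edge list, standing in for the rows of df[['v1', 'v2']].
edges = [("delay", "increase"), ("increase", "decrease"), ("be", "belong")]
groups = connected_groups(edges)
```

Here "delay" and "decrease" end up in the same group even though they never appear in the same pair, because both are linked to "increase". With a real DataFrame you could probably pass `df[['v1', 'v2']].itertuples(index=False)` as `edges`, but for anything beyond a demo the networkx call above is the simpler choice.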
To view the clusters as a graph, here is a small function that plots the graph in Jupyter:
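def nxplot(G):
    # to_agraph needs pygraphviz (and Graphviz itself) to be installed.
    from networkx.drawing.nx_agraph import to_agraph
    from IPython.display import Image

    A = to_agraph(G)
    A.layout('dot')  # compute node positions with Graphviz's dot layout
    A.draw('/tmp/graph.png')
    return Image(filename='/tmp/graph.png')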