Another approach is to use from_pandas_edgelist https://networkx.github.io/documentation/stable/reference/generated/networkx.convert_matrix.from_pandas_edgelist.html and ancestors https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.dag.ancestors.html from the networkx https://networkx.github.io/documentation/latest/ package:
import networkx as nx
import pandas as pd

# Create the directed graph (parent -> child edges).
# The column names must match the columns of df exactly.
G = nx.from_pandas_edgelist(df,
                            source='Parent',
                            target='child',
                            create_using=nx.DiGraph())

# Create a dict mapping each child node to itself plus all of its ancestors
ancestors = {n: {n} | nx.ancestors(G, n) for n in df['child'].unique()}

# Convert the dict back to a DataFrame if necessary
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

print(df_ancestors)
[out]
   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  bc11        [bc11, Aw00]
2  Aee1        [Aee1, Aee0]
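To make clearer what the ancestors call is doing, here is a minimal plain-Python sketch of the same idea: walk the parent links backwards from a node and collect everything reachable. The edge list is an assumption inferred from the output above (the original df is not shown), and find_ancestors is a hypothetical helper, not a networkx function.

```python
from collections import defaultdict

# Hypothetical (parent, child) edges, inferred from the output above.
edges = [("Aw00", "bc11"), ("bc11", "A1x2"), ("Aee0", "Aee1")]

# Map each child to the set of its direct parents.
parents = defaultdict(set)
for parent, child in edges:
    parents[child].add(parent)


def find_ancestors(node):
    """Return every node that can reach `node` by following parent links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for p in parents.get(current, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Same shape as the dict built with nx.ancestors: node -> itself + ancestors.
# sorted() is used only to make the ordering deterministic.
children = sorted({child for _, child in edges})
ancestry = {n: sorted({n} | find_ancestors(n)) for n in children}
print(ancestry)
```

Because sets are unordered, the order of names inside each ancestry_tree list in the networkx version is not guaranteed; sorting as above is one way to get stable output.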
To filter the "middle children" out of the output table, you can use the out_degree https://networkx.github.io/documentation/stable/reference/classes/generated/networkx.DiGraph.out_degree.html method: a last child has no outgoing edges, so its out_degree == 0.
last_children = [n for n, d in G.out_degree() if d == 0]
ancestors = {n: {n} | nx.ancestors(G, n) for n in last_children}
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])
[out]
   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  Aee1        [Aee1, Aee0]
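The out-degree filter can also be sketched without networkx, which may help show why it works: a node that never appears as a parent has zero outgoing edges, so it must be the end of a chain. The edge list below is again an assumption inferred from the output, not the original data.

```python
from collections import Counter

# Hypothetical (parent, child) edges, inferred from the output above.
edges = [("Aw00", "bc11"), ("bc11", "A1x2"), ("Aee0", "Aee1")]

# Out-degree of a node = number of edges leaving it (its number of children).
out_degree = Counter(parent for parent, _ in edges)

# All nodes that appear anywhere in the edge list.
nodes = {n for edge in edges for n in edge}

# Nodes with no outgoing edges are the "last children".
last_children = sorted(n for n in nodes if out_degree[n] == 0)
print(last_children)
```

Here bc11 is dropped because it is the parent of A1x2, which matches the filtered table above.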