作为叶子集合的递归集合的树
让我们暂时搁置 newick 表示,并考虑该问题的可能的 python 表示。
一棵有根的树可以被视为一棵集合的递归层次结构(组(组...))叶子。集合是无序的,这非常适合描述树中的分支:{{{"A", "B"}, {"C", "D"}}, "E"}
应该是一样的{{{"C", "D"}, {"B", "A"}}, "E"}
.
如果我们考虑初始的叶子集{"A", "B", "C", "D", "E"}
,以“E”为外群的树是以下形式的集合的集合{tree, "E"}
where tree
s 取自可以从叶子集合构建的树集合{"A", "B", "C", "D"}
。我们可以尝试写一个递归trees
函数来生成这组树,我们的总树集将表示如下:
{{tree, "E"} for tree in trees({"A", "B", "C", "D"})}
(这里,我使用的是集合理解 https://en.wikipedia.org/wiki/List_comprehension#Set_comprehension符号。)
实际上,python 不允许集合的集合,因为集合的元素必须是“可散列的”(也就是说,python 必须能够计算对象的一些“散列”值,以便能够检查它们是否属于集)。碰巧Python集合没有这个属性。幸运的是,我们可以使用一个类似的数据结构,名为frozenset https://docs.python.org/3/library/stdtypes.html#frozenset,其行为非常像一个集合,但无法修改并且是“可散列的”。因此,我们的树集将是:
all_trees = frozenset(
{frozenset({tree, "E"}) for tree in trees({"A", "B", "C", "D"})})
实施trees
功能
现在让我们重点关注trees
功能。
对于每一个可能的分割(分解为一组不相交的子集,包括所有元素)的叶子集,我们需要为分区的每个部分找到所有可能的树(通过递归调用)。对于给定的分区,我们将为跨其各个部分的子树的每个可能组合创建一棵树。
例如,如果一个分区是{"A", {"B", "C", "D"}}
,我们将考虑可以由部分组成的所有可能的树"A"
(其实只是叶子"A"
本身),以及可以由部分组成的所有可能的树{"B", "C", "D"}
(那是,trees({"B", "C", "D"})
)。然后,通过获取其中一个元素仅来自的所有可能对来获得该分区的可能树"A"
,另一个来自trees({"B", "C", "D"})
.
这可以推广到具有两个以上部分的分区,并且product
函数来自itertools
这里似乎很有用。
因此,我们需要一种方法来生成一组叶子的可能分区。
生成集合的分区
这里我做了一个partitions_of_set
函数改编自这个解决方案 https://stackoverflow.com/a/30134039/1878788:
# According to https://stackoverflow.com/a/30134039/1878788:
# The problem is solved recursively:
# If you already have a partition of n-1 elements, how do you use it to partition n elements?
# Either place the n'th element in one of the existing subsets, or add it as a new, singleton subset.
def partitions_of_set(s):
if len(s) == 1:
yield frozenset(s)
return
# Extract one element from the set
# https://stackoverflow.com/a/43804050/1878788
elem, *_ = s
rest = frozenset(s - {elem})
for partition in partitions_of_set(rest):
for subset in partition:
# Insert the element in the subset
try:
augmented_subset = frozenset(subset | frozenset({elem}))
except TypeError:
# subset is actually an atomic element
augmented_subset = frozenset({subset} | frozenset({elem}))
yield frozenset({augmented_subset}) | (partition - {subset})
# Case with the element in its own extra subset
yield frozenset({elem}) | partition
为了检查获得的分区,我们创建一个函数来使它们更容易显示(这对于稍后创建树的 newick 表示也很有用):
def print_set(f):
if type(f) not in (set, frozenset):
return str(f)
return "(" + ",".join(sorted(map(print_set, f))) + ")"
我们测试分区是否有效:
for partition in partitions_of_set({"A", "B", "C", "D"}):
print(len(partition), print_set(partition))
Output:
1 ((A,B,C,D))
2 ((A,B,D),C)
2 ((A,C),(B,D))
2 ((B,C,D),A)
3 ((B,D),A,C)
2 ((A,B,C),D)
2 ((A,B),(C,D))
3 ((A,B),C,D)
2 ((A,D),(B,C))
2 ((A,C,D),B)
3 ((A,D),B,C)
3 ((A,C),B,D)
3 ((B,C),A,D)
3 ((C,D),A,B)
4 (A,B,C,D)
实际代码trees
功能
现在我们可以写出tree
功能:
from itertools import product
def trees(leaves):
if type(leaves) not in (set, frozenset):
# It actually is a single leaf
yield leaves
# Don't try to yield any more trees
return
# Otherwise, we will have to consider all the possible
# partitions of the set of leaves, and for each partition,
# construct the possible trees for each part
for partition in partitions_of_set(leaves):
# We need to skip the case where the partition
# has only one subset (the initial set itself),
# otherwise we will try to build an infinite
# succession of nodes with just one subtree
if len(partition) == 1:
part, *_ = partition
# Just to be sure the assumption is correct
assert part == leaves
continue
# We recursively apply *tree* to each part
# and obtain the possible trees by making
# the product of the sets of possible subtrees.
for subtree in product(*map(trees, partition)):
# Using a frozenset guarantees
# that there will be no duplicates
yield frozenset(subtree)
测试它:
all_trees = frozenset(
{frozenset({tree, "E"}) for tree in trees({"A", "B", "C", "D"})})
for tree in all_trees:
print(print_set(tree) + ";")
Output:
(((B,C),A,D),E);
((((A,B),D),C),E);
((((B,D),A),C),E);
((((C,D),A),B),E);
(((A,D),B,C),E);
((A,B,C,D),E);
((((B,D),C),A),E);
(((A,B,C),D),E);
((((A,C),B),D),E);
((((C,D),B),A),E);
((((B,C),A),D),E);
(((A,B),C,D),E);
(((A,C),(B,D)),E);
(((B,D),A,C),E);
(((C,D),A,B),E);
((((A,B),C),D),E);
((((A,C),D),B),E);
(((A,C,D),B),E);
(((A,D),(B,C)),E);
((((A,D),C),B),E);
((((B,C),D),A),E);
(((A,B),(C,D)),E);
(((A,B,D),C),E);
((((A,D),B),C),E);
(((A,C),B,D),E);
(((B,C,D),A),E);
我希望结果是正确的。
这种方法要正确执行有点棘手。我花了一些时间才弄清楚如何避免无限递归(当分区时会发生这种情况){{"A", "B", "C", "D"}}
).