懒惰的、广度优先的一元玫瑰树展开是否可能？

2023-11-24

Data.Tree包括unfoldTreeM_BF and unfoldForestM_BF使用单子操作的结果来广度优先构造树的函数。树展开器可以使用森林展开器轻松编写，因此我将重点关注后者：

unfoldForestM_BF :: Monad m =>
             (b -> m (a, [b])) -> [b] -> m [Tree a]

从种子列表开始，它对每个种子应用一个函数，生成将产生树根和下一级别展开的种子的动作。使用的算法有些严格，因此使用unfoldForestM_BF与Identitymonad 与使用 pure 并不完全相同unfoldForest。我一直在试图找出是否有一种方法可以让它变得懒惰而不牺牲它O(n)有时间限制。如果（正如爱德华·克梅特向我建议的那样）这是不可能的，我想知道是否可以用更受约束的类型来做到这一点，特别需要MonadFix而不是Monad。那里的概念是（以某种方式）设置指向未来计算结果的指针，同时将这些计算添加到待办事项列表中，因此，如果它们对早期计算的影响感到懒惰，它们将立即可用。

我之前声称下面提出的第三种解决方案与深度优先具有相同的严格性unfoldForest，这是不正确的。

即使我们不需要MonadFix实例。当已知分支因子是有限的以及当已知分支因子“大”时，存在针对特殊情况的解决方案。我们将从一个运行的解决方案开始O(n)具有有限分支因子的树的时间，包括每个节点只有一个子节点的退化树。有限分支因子的解决方案将无法在具有无限分支因子的树上终止，我们将使用运行于O(n)“大”分支因子大于 1 的树（包括具有无限分支因子的树）的时间。 “大”分支因子的解决方案将运行在O(n^2)每个节点只有一个子节点或没有子节点的退化树上的时间。当我们将两个步骤中的方法结合起来尝试创建一个运行于O(n)对于任何分支因子，我们将得到一个比有限分支因子的第一个解决方案更惰性的解决方案，但无法适应从无限分支因子快速过渡到没有分支的树。

有限分支因子

总的想法是，我们将首先构建整个级别的所有标签以及下一个级别的森林种子。然后我们将进入下一个级别，构建所有这些。我们将把更深层次的成果汇集起来，为外部造林。我们将把标签和森林放在一起来建造树木。

unfoldForestM_BF相当简单。如果该关卡没有种子，则会返回。构建完所有标签后，它会获取每个森林的种子，并将它们收集到所有种子的列表中，以构建下一个级别并展开整个更深的级别。最后，它根据种子的结构为每棵树构建森林。

import Data.Tree hiding (unfoldTreeM_BF, unfoldForestM_BF)

unfoldForestM_BF :: Monad m => (b->m (a, [b])) -> [b] -> m [Tree a]
unfoldForestM_BF f []    = return []
unfoldForestM_BF f seeds = do
    level <- sequence . fmap f $ seeds
    let (labels, bs) = unzip level
    deeper <- unfoldForestM_BF f (concat bs)
    let forests = trace bs deeper
    return $ zipWith Node labels forests

trace从展平列表中重建嵌套列表的结构。假设有一个项目[b]对于任意位置的每个项目[[a]]。指某东西的用途concat ... trace扁平化有关祖先级别的所有信息会阻止此实现在具有无限子节点的树上工作。

trace :: [[a]] -> [b] -> [[b]]
trace []       ys = []
trace (xs:xxs) ys =
    let (ys', rem) = takeRemainder xs ys
    in   ys':trace xxs rem
    where
        takeRemainder []        ys  = ([], ys)
        takeRemainder (x:xs) (y:ys) = 
            let (  ys', rem) = takeRemainder xs ys
            in  (y:ys', rem)

展开一棵树与展开一片森林相比是微不足道的。

unfoldTreeM_BF :: MonadFix m => (b->m (a, [b])) -> b -> m (Tree a)
unfoldTreeM_BF f = (>>= return . head) . unfoldForestMFix_BF f . (:[])

大分支因子

大分支因子的解决方案与有限分支因子的解决方案大致相同，只是它保留了树的整个结构而不是concat将级别中的分支添加到单个列表中，并且trace荷兰国际集团的名单。除了import在上一节中使用过，我们将使用Compose将树的多个级别的函子组合在一起，并且Traversable to sequence跨越多层结构。

import Data.Tree hiding (unfoldForestM_BF, unfoldTreeM_BF)

import Data.Foldable
import Data.Traversable
import Data.Functor.Compose
import Prelude hiding (sequence, foldr)

而不是将所有祖先结构压平在一起concat我们将用Compose祖先和下一级的种子，并在整个结构上递归。

unfoldForestM_BF :: (Traversable t, Traceable t, Monad m) =>
                    (b->m (a, [b])) -> t b -> m (t (Tree a))
unfoldForestM_BF f seeds
    | isEmpty seeds = return (fmap (const undefined) seeds)
    | otherwise     = do
        level <- sequence . fmap f $ seeds
        deeper <- unfoldForestM_BF f (Compose (fmap snd level))
        return $ zipWithIrrefutable Node (fmap fst level) (getCompose deeper)

zipWithIrrefutable是一个懒惰的版本zipWith这依赖于这样的假设：第一个列表中的每个项目在第二个列表中都有一个项目。这Traceable结构是Functors可以提供一个zipWithIrrefutable。法律适用于Traceable适合每一个a, xs, and ys if fmap (const a) xs == fmap (const a) ys then zipWithIrrefutable (\x _ -> x) xs ys == xs and zipWithIrrefutable (\_ y -> y) xs ys == ys。它的严格性是针对每一个f and xs by zipWithIrrefutable f xs ⊥ == fmap (\x -> f x ⊥) xs.

class Functor f => Traceable f where
    zipWithIrrefutable :: (a -> b -> c) -> f a -> f b -> f c

如果我们已经知道两个列表具有相同的结构，我们可以惰性地组合它们。

instance Traceable [] where
    zipWithIrrefutable f []       ys    = []
    zipWithIrrefutable f (x:xs) ~(y:ys) = f x y : zipWithIrrefutable f xs ys

如果我们知道可以组合每个函子，我们就可以组合两个函子的组合。

instance (Traceable f, Traceable g) => Traceable (Compose f g) where
    zipWithIrrefutable f (Compose xs) (Compose ys) =
        Compose (zipWithIrrefutable (zipWithIrrefutable f) xs ys)

isEmpty检查节点的空结构以像模式匹配一样扩展[]解决了有限分支因子的问题。

isEmpty :: Foldable f => f a -> Bool
isEmpty = foldr (\_ _ -> False) True

聪明的读者可能会注意到zipWithIrrefutable from Traceable非常类似于liftA2这是定义的一半Applicative.

混合解决方案

混合解决方案结合了有限解决方案和“大”解决方案的方法。与有限解一样，我们将在每一步压缩和解压缩树表示。与“大”分支因子的解决方案一样，我们将使用允许跨过完整分支的数据结构。有限分支因子解决方案使用了到处都是扁平化的数据类型，[b]。 “大”分支因子解决方案使用了一种没有任何地方扁平化的数据类型：越来越多的嵌套列表以[b] then [[b]] then [[[b]]]等等。在这些结构之间将是嵌套列表，它们要么停止嵌套，要么只保存一个b或继续嵌套并保持[b]s。这种递归模式一般由Free monad.

data Free f a = Pure a | Free (f (Free f a))

我们将专门与Free []看起来像。

data Free [] a = Pure a | Free [Free [] a]

对于混合解决方案，我们将重复其所有导入和组件，以便下面的代码应该是完整的工作代码。

import Data.Tree hiding (unfoldTreeM_BF, unfoldForestM_BF)

import Data.Traversable
import Prelude hiding (sequence, foldr)

由于我们将与Free []，我们将为它提供一个zipWithIrrefutable.

class Functor f => Traceable f where
    zipWithIrrefutable :: (a -> b -> c) -> f a -> f b -> f c  

instance Traceable [] where
    zipWithIrrefutable f []       ys    = []
    zipWithIrrefutable f (x:xs) ~(y:ys) = f x y : zipWithIrrefutable f xs ys 

instance (Traceable f) => Traceable (Free f) where
    zipWithIrrefutable f (Pure x)  ~(Pure y ) = Pure (f x y)
    zipWithIrrefutable f (Free xs) ~(Free ys) =
        Free (zipWithIrrefutable (zipWithIrrefutable f) xs ys)

广度优先遍历看起来与有限分支树的原始版本非常相似。我们为当前级别构建当前标签和种子，压缩树的其余部分的结构，为剩余深度完成所有工作，并解压缩结果的结构以使森林与标签相匹配。

unfoldFreeM_BF :: (Monad m) => (b->m (a, [b])) -> Free [] b -> m (Free [] (Tree a))
unfoldFreeM_BF f (Free []) = return (Free [])
unfoldFreeM_BF f seeds = do
    level <- sequence . fmap f $ seeds
    let (compressed, decompress) = compress (fmap snd level)
    deeper <- unfoldFreeM_BF f compressed
    let forests = decompress deeper
    return $ zipWithIrrefutable Node (fmap fst level) forests

compress需要一个Free []保存森林的种子[b]并压平[b]进入Free得到一个Free [] b。它还返回一个decompress函数可用于撤消展平以恢复原始结构。我们压缩掉没有剩余种子的树枝和只向一种方向分枝的树枝。

compress :: Free [] [b] -> (Free [] b, Free [] a -> Free [] [a])
compress (Pure [x]) = (Pure x, \(Pure x) -> Pure [x])
compress (Pure xs ) = (Free (map Pure xs), \(Free ps) -> Pure (map getPure ps))
compress (Free xs)  = wrapList . compressList . map compress $ xs
    where    
        compressList []                 = ([], const [])
        compressList ((Free [],dx):xs) = let (xs', dxs) = compressList xs
                                         in  (xs', \xs -> dx (Free []):dxs xs)
        compressList (      (x,dx):xs) = let (xs', dxs) = compressList xs
                                         in  (x:xs', \(x:xs) -> dx x:dxs xs)
        wrapList ([x], dxs) = (x,             \x   -> Free (dxs [x]))
        wrapList (xs , dxs) = (Free xs, \(Free xs) -> Free (dxs xs ))

每个压缩步骤还返回一个函数，当应用于Free []具有相同结构的树。所有这些函数都是部分定义的；他们做什么Free []具有不同结构的树是未定义的。为了简单起见，我们还定义了偏函数的反函数Pure and Free.

getPure (Pure x)  = x
getFree (Free xs) = xs

Both unfoldForestM_BF and unfoldTreeM_BF通过将他们的论点包装成一个来定义Free [] b并假设结果具有相同的结构，对结果进行解包。

unfoldTreeM_BF :: MonadFix m => (b->m (a, [b])) -> b -> m (Tree a)
unfoldTreeM_BF f = (>>= return . getPure) . unfoldFreeM_BF f . Pure


unfoldForestM_BF :: MonadFix m => (b->m (a, [b])) -> [b] -> m [Tree a]
unfoldForestM_BF f = (>>= return . map getPure . getFree) . unfoldFreeM_BF f . Free . map Pure

通过认识到这一点，可以制作该算法的更优雅的版本>>= for a Monad嫁接在树上，两者Free and FreeT提供 monad 实例。两个都compress and compressList可能有更优雅的演示。

上面提出的算法还不够懒惰，不足以允许查询具有无限多种分支方式然后终止的树。一个简单的反例是以下生成函数0.

counterExample :: Int -> (Int, [Int])
counterExample 0 = (0, [1, 2])
counterExample 1 = (1, repeat 3)
counterExample 2 = (2, [3])
counterExample 3 = (3, [])

这棵树看起来像

0
|
+- 1
|  |
|  +- 3
|  |
|  `- 3
|  |
|  ...
|
`- 2
   |
   +- 3

尝试下降第二个分支（到2）并检查剩余的有限子树将无法终止。

Examples

以下示例证明了所有实现unfoldForestM_BF按广度优先顺序运行操作runIdentity . unfoldTreeM_BF (Identity . f)具有相同的严格性unfoldTree对于具有有限分支因子的树。对于具有无穷大分支因子的树，只有“大”分支因子的解决方案具有与unfoldTree。为了证明惰性，我们将定义三个无限树 - 具有一个分支的一元树，具有两个分支的二叉树，以及每个节点具有无限数量分支的无限树。

mkUnary :: Int -> (Int, [Int])
mkUnary x = (x, [x+1])

mkBinary :: Int -> (Int, [Int])
mkBinary x = (x, [x+1,x+2])

mkInfinitary :: Int -> (Int, [Int])
mkInfinitary x = (x, [x+1..])

和...一起unfoldTree，我们将定义unfoldTreeDF按照unfoldTreeM检查一下unfoldTreeM真的像你声称的那样懒惰unfoldTreeBF按照unfoldTreeMFix_BF检查新的实现是否同样懒惰。

import Data.Functor.Identity

unfoldTreeDF f = runIdentity . unfoldTreeM    (Identity . f)
unfoldTreeBF f = runIdentity . unfoldTreeM_BF (Identity . f)

为了获得这些无限树的有限部分，甚至是无限分支的树，我们将定义一种从树中获取的方法，只要它的标签与谓词匹配。这可以根据将函数应用于每个的能力来写得更简洁subForest.

takeWhileTree :: (a -> Bool) -> Tree a -> Tree a
takeWhileTree p (Node label branches) = Node label (takeWhileForest p branches)

takeWhileForest :: (a -> Bool) -> [Tree a] -> [Tree a]
takeWhileForest p = map (takeWhileTree p) . takeWhile (p . rootLabel)

这让我们定义九个示例树。

unary   = takeWhileTree (<= 3) (unfoldTree   mkUnary 0)
unaryDF = takeWhileTree (<= 3) (unfoldTreeDF mkUnary 0)
unaryBF = takeWhileTree (<= 3) (unfoldTreeBF mkUnary 0)

binary   = takeWhileTree (<= 3) (unfoldTree   mkBinary 0)
binaryDF = takeWhileTree (<= 3) (unfoldTreeDF mkBinary 0)
binaryBF = takeWhileTree (<= 3) (unfoldTreeBF mkBinary 0)

infinitary   = takeWhileTree (<= 3) (unfoldTree   mkInfinitary 0)
infinitaryDF = takeWhileTree (<= 3) (unfoldTreeDF mkInfinitary 0)
infinitaryBF = takeWhileTree (<= 3) (unfoldTreeBF mkInfinitary 0)

所有五种方法对于一元树和二元树都有相同的输出。输出来自putStrLn . drawTree . fmap show

0
|
`- 1
   |
   `- 2
      |
      `- 3

0
|
+- 1
|  |
|  +- 2
|  |  |
|  |  `- 3
|  |
|  `- 3
|
`- 2
   |
   `- 3

然而，对于具有无限分支因子的树来说，有限分支因子解的广度优先遍历不够惰性。其他四种方法输出整个树

0
|
+- 1
|  |
|  +- 2
|  |  |
|  |  `- 3
|  |
|  `- 3
|
+- 2
|  |
|  `- 3
|
`- 3

生成的树unfoldTreeBF因为有限分支因子解永远不可能完全超过其第一个分支。

0
|
+- 1
|  |
|  +- 2
|  |  |
|  |  `- 3
|  |
|  `- 3

建设肯定是广度优先。

mkDepths :: Int -> IO (Int, [Int])
mkDepths d = do
    print d
    return (d, [d+1, d+1])

mkFiltered :: (Monad m) => (b -> Bool) -> (b -> m (a, [b])) -> (b -> m (a, [b]))
mkFiltered p f x = do
    (a, bs) <- f x
    return (a, filter p bs)

binaryDepths = unfoldTreeM_BF (mkFiltered (<= 2) mkDepths) 0

Running binaryDepths在内部级别之前输出外部级别

从懒惰到彻头彻尾的懒惰

前面部分的混合解决方案还不够懒惰，无法具有与以下内容相同的严格语义：Data.Tree's unfoldTree。它是一系列算法中的第一个，每个算法都比其前身稍微懒惰一些，但没有一个懒惰到具有与前任相同的严格语义。unfoldTree.

混合解决方案不能保证探索树的一部分不需要探索同一棵树的其他部分。下面提供的代码也不会。在一个特殊但常见的情况下由 dfeuer 识别只探索一个log(N)有限树的大小切片会强制整个树。当探索具有恒定深度的树的每个分支的最后一个后代时，会发生这种情况。压缩树时，我们会丢弃所有没有后代的琐碎分支，这是避免的必要条件O(n^2)运行时间。如果我们能够快速证明一个分支至少有一个后代，那么我们只能懒洋洋地跳过这部分压缩，因此我们可以拒绝该模式Free []。在具有恒定深度的树的最大深度处，没有任何分支具有任何剩余的后代，因此我们永远不能跳过压缩步骤。这导致探索整个树以便能够访问最后一个节点。当由于无限分支因子而达到该深度的整个树是非有限时，探索树的一部分无法终止，而当它生成时会终止unfoldTree.

混合解决方案部分中的压缩步骤压缩掉在第一代中没有后代的分支，这对于压缩来说是最佳的，但对于惰性来说不是最佳的。我们可以通过延迟压缩发生的时间来使算法更加惰性。如果我们将其延迟一代（甚至任何恒定的代数），我们将维持O(n)时间上限。如果我们将其推迟几代人，这在某种程度上取决于N我们必然会牺牲O(N)有时间限制。在本节中，我们将把压缩延迟一代。

为了控制压缩的发生方式，我们将分离最里面的填充物[]进入Free []压缩具有 0 或 1 个后代的退化分支的结构。

因为这个技巧的一部分在压缩过程中如果没有大量的惰性就无法工作，所以我们将在任何地方采用偏执的过度懒惰的程度。如果除了元组构造函数之外还有任何关于结果的信息(,)可以在不强制其部分输入与模式匹配的情况下确定，我们将避免强制它，直到有必要。对于元组，任何与它们匹配的模式都会延迟执行。因此，下面的一些代码看起来像核心或更糟。

bindFreeInvertible取代Pure [b,...] with Free [Pure b,...]

bindFreeInvertible :: Free [] ([] b) -> (Free [] b, Free [] a -> Free [] ([] a))
bindFreeInvertible = wrapFree . go
    where
        -- wrapFree adds the {- Free -} that would have been added in both branches
        wrapFree ~(xs, dxs) = (Free xs, dxs)
        go (Pure xs) = ({- Free -} (map Pure xs), Pure . map getPure . getFree)
        go (Free xs) = wrapList . rebuildList . map bindFreeInvertible $ xs
        rebuildList = foldr k ([], const [])
        k ~(x,dx) ~(xs, dxs) = (x:xs, \(~(x:xs)) -> dx x:dxs xs)
        wrapList ~(xs, dxs) = ({- Free -} xs, \(~(Free xs)) -> Free (dxs xs)))

compressFreeList删除出现的Free []并取代Free [xs] with xs.

compressFreeList :: Free [] b -> (Free [] b, Free [] a -> Free [] a)
compressFreeList (Pure x) = (Pure x, id)
compressFreeList (Free xs) = wrapList . compressList . map compressFreeList $ xs
    where
        compressList = foldr k ([], const [])
        k ~(x,dx) ~(xs', dxs) = (x', dxs')
            where
                x' = case x of
                        Free []   -> xs'
                        otherwise -> x:xs'
                dxs' cxs = dx x'':dxs xs''
                    where
                        x'' = case x of
                            Free []   -> Free []
                            otherwise -> head cxs
                        xs'' = case x of
                            Free []   -> cxs
                            otherwise -> tail cxs
        wrapList ~(xs, dxs) = (xs', dxs')
            where
                xs' = case xs of
                        [x]       -> x
                        otherwise -> Free xs
                dxs' cxs = Free (dxs xs'')
                    where
                        xs'' = case xs of
                            [x]       -> [cxs]
                            otherwise -> getFree cxs

整体压缩不会束缚Pure []s into Free直到退化之后Frees 已被压缩掉，延迟了退化的压缩Free一代中引入了下一代压缩。

compress :: Free [] [b] -> (Free [] b, Free [] a -> Free [] [a])
compress xs = let ~(xs' , dxs' ) = compressFreeList xs
                  ~(xs'', dxs'') = bindFreeInvertible xs'
                  in (xs'', dxs' . dxs'')

出于持续的偏执，帮助者getFree and getPure也无可辩驳地变得懒惰。

getFree ~(Free xs) = xs
getPure ~(Pure x)  = x

这很快就解决了 dfeuer 发现的有问题的示例

print . until (null . subForest) (last . subForest) $
    flip unfoldTreeBF 0 (\x -> (x, if x > 5 then [] else replicate 10 (x+1)))

但由于我们只是将压缩延迟了1一代，如果最后一个分支的最后一个节点是，我们可以重新创建完全相同的问题1比所有其他分支更深。

print . until (null . subForest) (last . subForest) $
    flip unfoldTreeBF (0,0) (\(x,y) -> ((x,y), 
        if x==y
        then if x>5 then [] else replicate 9 (x+1, y) ++ [(x+1, y+1)]
        else if x>4 then [] else replicate 10 (x+1, y)))

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)