Algorithmic complexity is the same, since O(logb n) = O(c log n) = O(log n) but the constant factors, which are hidden in big-O notation, could vary noticeably, depending on implementation and hardware.
B 树是为盘片硬盘设计的,它具有较长的访问时间(将磁头移动到位),然后读取整个物理扇区。使 B 树节点与扇区一样大可以最大限度地减少访问次数,并最大限度地提高每次读取操作的有用数据。
但是,如果您的内存不足,则访问时间可以忽略不计,因此更好的比较是计算算法访问的单个单词的数量。
For example, let's plan a data structure to store 220 keys of 1 word each, for a total of 4MiB of raw data on a 32bit machine.
A binary search tree will have 220 nodes, each holding one key and two pointers (3 words). Depth will be log2(220) = 20. The average search will have to read the key and one of the pointers from each node in its path, from the root all the way down = 40 words.
A B-tree made for hard disks will have 4kB nodes. Each node could be stored internally as a sorted array of key and pointer couples, between 256 and 512 of them. What is the average search going to look like? Considering an average 3/4 fill, each node will contain 384 entries, and its internal binary search will have to visit on average log2(384) = 5.95 keys. The average depth will be log384(220) = 2.33, so our search will have to read on average 2.33 times 5.95 keys, or about 14 words.
In the case of a low-fanout (branching factor) B-tree, with each node holding between 16 and 32 keys, the average fill will be 24 keys, the average depth log24(220) = 4.36, the binary search in each node will make log2(24) = 4.58 comparisons, and the overall average search will have to read about 20 words.
请记住,最后两个数据结构比二叉树获得了更好的结果,因为它们优化读取操作过度修改。要将键插入这些 B 树之一,您必须平均重写整个 384 字或 24 字节点(如果不超过一个),而在二叉树情况下,写入操作仍然只需要最多触摸 40 个单词。
(之前我错了。感谢@virco 和@Groo 在评论中指出我的错误。)
无论如何,看起来扇出较低的纯内存 B 树在实践中似乎比二叉树表现更好 http://panthema.net/2007/stx-btree/speedtest/.
每个节点 32 个密钥似乎是当前架构(32 位和 64 位)的最佳选择。许多较新的语言和库正在使用 32 键 B 树作为内置数据结构,与哈希表和数组一起使用或作为哈希表和数组的替代品。这种用法是由Clojure http://clojure.org/和其他函数式语言,但随后被 Javascript 等更主流的语言所采用,最近的重点是不可变数据结构(例如不可变.js https://facebook.github.io/immutable-js/)
这个结果不仅可以通过计算从内存读取的字数来解释,还可以通过高速缓存未命中来解释,高速缓存未命中是导致 CPU 停止并等待 RAM 的读取操作。如果缓存架构可以一次获取包含整个 B 树节点的 RAM 块,我们就可以获得与基于磁盘的大容量存储成功使用的相同优化。
在硬盘优化数据结构的情况下,我们将使用节点与物理磁盘扇区一样大的 B 树,以最大限度地减少磁盘访问时间。在本例中,我们使用 B 树,其节点与 3 级缓存对 RAM 执行的读取操作一样大,以最大限度地减少缓存未命中。