什么会导致算法具有 O(log n) 复杂度？

2023-11-22

我对大 O 的了解是有限的，当对数项出现在方程中时，它让我更加困惑。

有人可以用简单的语言向我解释一下什么是O(log n)算法是？对数从何而来？

当我试图解决这个期中练习题时，特别出现了这个问题：

令 X(1..n) 和 Y(1..n) 包含两个整数列表，每个列表均按非降序排序。给出一个 O(log n) 时间算法来查找所有 2n 个组合元素的中值（或第 n 个最小整数）。例如，X = (4, 5, 7, 8, 9) 且 Y = (3, 5, 8, 9, 10)，则 7 是组合列表 (3, 4, 5, 5, 7) 的中位数、8、8、9、9、10）。 [提示：使用二分搜索的概念]

我必须承认，当你第一次看到 O(log n) 算法时，这很奇怪……这个对数到底是从哪里来的？然而，事实证明，有几种不同的方法可以让对数项以大 O 表示法显示。以下是一些：

反复除以一个常数

取任意数n；比如说，16。在得到小于或等于 1 的数之前，你能将 n 除以 2 多少次？对于 16 岁，我们有

Notice that this ends up taking four steps to complete. Interestingly, we also have that log₂ 16 = 4. Hmmm... what about 128?

128 / 2 = 64
 64 / 2 = 32
 32 / 2 = 16
 16 / 2 = 8
  8 / 2 = 4
  4 / 2 = 2
  2 / 2 = 1

This took seven steps, and log₂ 128 = 7. Is this a coincidence? Nope! There's a good reason for this. Suppose that we divide a number n by 2 i times. Then we get the number n / 2ⁱ. If we want to solve for the value of i where this value is at most 1, we get

n / 2ⁱ ≤ 1

n ≤ 2ⁱ

log₂ n ≤ i

In other words, if we pick an integer i such that i ≥ log₂ n, then after dividing n in half i times we'll have a value that is at most 1. The smallest i for which this is guaranteed is roughly log₂ n, so if we have an algorithm that divides by 2 until the number gets sufficiently small, then we can say that it terminates in O(log n) steps.

An important detail is that it doesn't matter what constant you're dividing n by (as long as it's greater than one); if you divide by the constant k, it will take log_k n steps to reach 1. Thus any algorithm that repeatedly divides the input size by some fraction will need O(log n) iterations to terminate. Those iterations might take a lot of time and so the net runtime needn't be O(log n), but the number of steps will be logarithmic.

那么这是从哪里出现的呢？一个经典的例子是二分查找，一种用于在排序数组中搜索值的快速算法。该算法的工作原理如下：

如果数组为空，则返回该元素不存在于数组中。
Otherwise:
- 查看数组的中间元素。
- 如果它等于我们要查找的元素，则返回成功。
- If it's greater than the element we're looking for:
  - 丢弃数组的后半部分。
  - Repeat
- If it's less than the the element we're looking for:
  - 丢弃数组的前半部分。
  - Repeat

例如，要在数组中搜索 5

1   3   5   7   9   11   13

我们首先看中间的元素：

1   3   5   7   9   11   13
            ^

由于 7 > 5，并且由于数组已排序，因此我们知道数字 5 不可能位于数组的后半部分，因此我们可以将其丢弃。这留下

1   3   5

现在我们看看中间的元素：

1   3   5
    ^

由于3

我们再次查看该数组的中间：

        5
        ^

由于这正是我们要查找的数字，因此我们可以报告 5 确实在数组中。

那么这有多有效呢？好吧，在每次迭代中，我们都会丢弃至少一半的剩余数组元素。一旦数组为空或者我们找到了我们想要的值，算法就会停止。在最坏的情况下，元素不存在，因此我们不断将数组的大小减半，直到用完元素。这需要多长时间？好吧，由于我们一遍又一遍地将数组切成两半，所以我们最多会在 O(log n) 次迭代中完成，因为在运行之前我们不能将数组切成两半超过 O(log n) 次超出数组元素。

遵循一般技术的算法分而治之（将问题切成碎片，解决这些碎片，然后将问题重新组合在一起）出于同样的原因往往会在其中使用对数项 - 你不能将某个对象切成两半超过 O(log n) 次。您可能想看看归并排序就是一个很好的例子。

一次处理一位数字的值

How many digits are in the base-10 number n? Well, if there are k digits in the number, then we'd have that the biggest digit is some multiple of 10^k. The largest k-digit number is 999...9, k times, and this is equal to 10^{k + 1} - 1. Consequently, if we know that n has k digits in it, then we know that the value of n is at most 10^{k + 1} - 1. If we want to solve for k in terms of n, we get

n ≤ 10^k+1 - 1

n + 1 ≤ 10^k+1

log₁₀ (n + 1) ≤ k + 1

(log₁₀ (n + 1)) - 1 ≤ k

由此我们得知 k 大约是 n 的以 10 为底的对数。换句话说，n 中的位数是 O(log n)。

例如，让我们考虑一下将两个大数字相加的复杂性，这两个数字太大而无法放入机器字中。假设这些数字以 10 为基数表示，我们将这些数字称为 m 和 n。添加它们的一种方法是通过小学方法 - 一次写出一位数字，然后从右到左进行计算。例如，要将 1337 和 2065 相加，我们首先将数字写为

    1  3  3  7
+   2  0  6  5
==============

我们添加最后一位数字并进位 1：

          1
    1  3  3  7
+   2  0  6  5
==============
             2

然后我们添加倒数第二个（“倒数第二个”）数字并携带 1：

       1  1
    1  3  3  7
+   2  0  6  5
==============
          0  2

接下来，我们添加倒数第三个（“倒数第二个”）数字：

       1  1
    1  3  3  7
+   2  0  6  5
==============
       4  0  2

最后，我们添加倒数第四个（“preantepenultimate”......我喜欢英语）数字：

       1  1
    1  3  3  7
+   2  0  6  5
==============
    3  4  0  2

现在，我们做了多少工作？我们对每个数字总共执行 O(1) 次工作（即恒定量的工作），需要处理的数字总数为 O(max{log n, log m})。这总共给出了 O(max{log n, log m}) 复杂度，因为我们需要访问两个数字中的每个数字。

许多算法通过在某个基数中一次处理一位数字来获得 O(log n) 项。一个经典的例子是基数排序，一次对整数进行一位数排序。基数排序有多种形式，但它们通常运行时间为 O(n log U)，其中 U 是要排序的最大可能整数。原因是每次排序都需要 O(n) 时间，并且总共需要 O(log U) 次迭代来处理正在排序的最大数字的每个 O(log U) 数字。许多先进的算法，例如Gabow 的最短路径算法或缩放版本Ford-Fulkerson 最大流量算法，其复杂性有一个对数项，因为它们一次只处理一位数字。

至于你如何解决这个问题的第二个问题，你可能想看看这个相关问题它探索了更高级的应用程序。鉴于此处描述的问题的一般结构，当您知道结果中存在对数项时，您现在可以更好地了解如何思考问题，因此我建议您在给出答案之前不要查看答案一些想法。

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)