Update
经过OP的澄清后,最大纯行问题是查找切换后变为 00...0 或 11...1 的最大行数。我已相应更新了我的解决方案。
请注意,我们有以下事实:
-
If two rows ri and rj reduce to a pure row after toggling, then we must have ri = rj to start with.
-
If ri ≠ rj and ri overlaps rj (i.e. some of their corresponding column are the same), then both of them cannot map to a pure row.
上述两个事实直接来自以下观察:
Max number of "pure" rows is the same as the max number of identical rows
Proof
我们声称构成最大纯问题解的所有行在矩阵 M 中必须相同。
Suppose we are given a m-by-n matrix M, and we have found a solution of the max-pure row problem. Let rows ri and rj be two arbitrary rows that get reduce to pure rows after toggling.
Observe that after all the necessary toggling operation on the columns (denote by σ1, σ2, ..., σk), ri and rj are both "pure" rows. i.e. We have the following:
σ1(σ2(...(σk(ri)...)) = σ1(σ2(...(σk(rj)...)) = 00...0
or
σ1(σ2(...(σk(ri)...)) = σ1(σ2(...(σk(rj)...)) = 11...1
So after applying all these toggling operations, ri and rj will equal each other. If we undo the very last toggling (i.e. we toggling the same column entry of these rows), it is obviously that both ri and rj will still map to the same output. i.e. We have the following:
σ2(σ3(...(σk(ri)...)) = σ2(σ3(...(σk(rj)...))
If we we continue undoing the toggling operations, we can conclude that ri = rj. In other words, if you pick any arbitrary rows from a solution of the max-pure problem, these rows must be identical in the beginning.
Idea
Given a row ri, if it can be reduce to the pure row, say 00...0, then we know that another row rj cannot be reduced to 11...1 if ri overlaps with rj (from fact 2 above). We can only hope that another row rk which does not overlap with ri to reduce to 11...1.
算法
根据前面的想法,我们可以有以下简单的算法来解决最大纯行问题。
We first scan over the rows of matrix M, and then find all the unique rows of the matrix (denote by s1, s2, ..., sk). We let count(si)
denotes the number of times si appears in M.
We then loop over all the pairs (si, sj) to determine the max-pure row number as below:
int maxCount = 0;
for each row si:
for each sj ≠ si:
if (sj overlaps si)
continue;
else
if (count(si) + count(sj) > maxCount)
// We have found a better pair
maxCount = count(si) + count(sj);
return maxCount;
We are doing O(n)
works in the inner for loop (for entry-wise checking whether two rows overlap), and the loops are over O(m2)
rows in the worst-case, so the running time of the algorithm is O(nm2)
.