我认为一个好的衡量标准是:
let the result set be s1,s2,...,sk
let MAX be max{sum(si) for each i}
f({s1,...,sk}) = Sigma(MAX-sum(si)) for each i)
好处是:完美的分布将始终产生 0!
缺点:如果没有完美的解决方案,最好的结果不会是0。
这个问题的贪心启发式是:
sorted<-sort(S) (let's say sorted[0] is the highest)
s1=s2=...=sk= {}
for each x in sorted:
s <- find_min() (*)
s.add(x)
其中 find_min() 产生 s,使得每个 si 的 sum(s)
该解决方案将产生 f(上面定义的指标),使得f(sol) <= (k-1)*max{S}
(这里是这个界限的证明):
claim:对于每个子集 s,MAX- sum(s) <= max{S}
proof- 通过归纳法:在每一步中,临时解决方案的断言都是正确的。
在每个步骤中,在迭代开始时(相加之前)让 MAX 为 max{sum(si)}!
base: the set of subsets at start is {},{},.. MAX=sum(si)=0 for each si.
step: assume the assumption is true for iteration i, we'll show it is also true for iteration i+1:
let s be the set that x was added to, then MAX-sum(s) <= max{S} (induction assumption).
if sum(s) + x <= MAX: we are done, MAX was not changed.
else: we sorted the elements at start, so x <= max{S}, and thus if s was chosen
(sum(si) >= sum(s) for each si) and sum(s) + x > MAX then: for each si, sum(si) + x >=
sum(s) + x, so sum(s)+x - sum(si) <= x <= max{S}. since sum(s)+x will be the MAX next
iteration, we are done.
因为对于每组MAX-sum(si) <= max{S}
(显然,对于最大集合,MAX-sum(si)=0
),总体而言Sigma(MAX-sum(si)) <= (k-1)*max{S}
, 按照承诺。
EDIT :
I had some spare time, so I programmed both heuristics suggested by me and by @Akhil, and both metrics, first of all, both results are conclusive (according to Wilcoxon's pair-t test), but which is better is defined by which metric you choose, surprisingly, the algorithm which tried to minimize f() (@Akhil`s) scored lower for this same f, but higher for the second metric.