这称为离散 p 色散 (maxmin) 问题。
最优性界被证明为白 (1991) https://academic.oup.com/imaman/article-abstract/3/2/131/723484 and 拉维等人。 (1994) https://pdfs.semanticscholar.org/1634/b7d3dd67daf1067d97185319c44f8a7d227e.pdf给出问题的因子 2 近似值,后者证明这种启发式是最好的(除非 P=NP)。
因子 2 近似
因子 2 近似如下:
Let V be the set of nodes/objects.
Let i and j be two nodes at maximum distance.
Let p be the number of objects to choose.
P = set([i,j])
while size(P)<p:
Find a node v in V-P such that min_{v' in P} dist(v,v') is maximum.
\That is: find the node with the greatest minimum distance to the set P.
P = P.union(v)
Output P
你可以像这样在 Python 中实现它:
#!/usr/bin/env python3
import numpy as np
p = 50
N = 400
print("Building distance matrix...")
d = np.random.rand(N,N) #Random matrix
d = (d + d.T)/2 #Make the matrix symmetric
print("Finding initial edge...")
maxdist = 0
bestpair = ()
for i in range(N):
for j in range(i+1,N):
if d[i,j]>maxdist:
maxdist = d[i,j]
bestpair = (i,j)
P = set()
P.add(bestpair[0])
P.add(bestpair[1])
print("Finding optimal set...")
while len(P)<p:
print("P size = {0}".format(len(P)))
maxdist = 0
vbest = None
for v in range(N):
if v in P:
continue
for vprime in P:
if d[v,vprime]>maxdist:
maxdist = d[v,vprime]
vbest = v
P.add(vbest)
print(P)
精确解
您还可以将其建模为 MIP。对于p=50,n=400,6000s后,最优性差距仍然是568%。近似算法需要 0.47 秒才能获得 100%(或更小)的最优性差距。一个简单的 Gurobi Python 表示可能如下所示:
#!/usr/bin/env python
import numpy as np
import gurobipy as grb
p = 50
N = 400
print("Building distance matrix...")
d = np.random.rand(N,N) #Random matrix
d = (d + d.T)/2 #Make the matrix symmetric
m = grb.Model(name="MIP Model")
used = [m.addVar(vtype=grb.GRB.BINARY) for i in range(N)]
objective = grb.quicksum( d[i,j]*used[i]*used[j] for i in range(0,N) for j in range(i+1,N) )
m.addConstr(
lhs=grb.quicksum(used),
sense=grb.GRB.EQUAL,
rhs=p
)
# for maximization
m.ModelSense = grb.GRB.MAXIMIZE
m.setObjective(objective)
# m.Params.TimeLimit = 3*60
# solving with Glpk
ret = m.optimize()
Scaling
显然,初始点的 O(N^2) 缩放很糟糕。通过认识到该对必须位于数据集的凸包上,我们可以更有效地找到它们。这给了我们一个O(N log N)找到该对的方法。一旦找到它,我们就像以前一样继续(使用 SciPy 进行加速)。
最好的缩放方法是使用 R* 树来有效地找到候选点 p 和集合 P 之间的最小距离。但这在 Python 中无法有效完成,因为for
仍然涉及循环。
import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist
p = 300
N = 16000000
# Find a convex hull in O(N log N)
points = np.random.rand(N, 3) # N random points in 3-D
# Returned 420 points in testing
hull = ConvexHull(points)
# Extract the points forming the hull
hullpoints = points[hull.vertices,:]
# Naive way of finding the best pair in O(H^2) time if H is number of points on
# hull
hdist = cdist(hullpoints, hullpoints, metric='euclidean')
# Get the farthest apart points
bestpair = np.unravel_index(hdist.argmax(), hdist.shape)
P = np.array([hullpoints[bestpair[0]],hullpoints[bestpair[1]]])
# Now we have a problem
print("Finding optimal set...")
while len(P)<p:
print("P size = {0}".format(len(P)))
distance_to_P = cdist(points, P)
minimum_to_each_of_P = np.min(distance_to_P, axis=1)
best_new_point_idx = np.argmax(minimum_to_each_of_P)
best_new_point = np.expand_dims(points[best_new_point_idx,:],0)
P = np.append(P,best_new_point,axis=0)
print(P)