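I am trying to assign unique values in a pandas df to specific individuals.

For the df below, [Area] and [Place] together make up unique values, which are different jobs. These values are assigned to individuals, with the overall aim of using as few individuals as possible.

The trick is that these values are constantly starting and finishing, and they run for different lengths of time. The most unique values assigned to an individual at any one time is 3. [On] displays how many unique values of [Place] and [Area] are currently occurring.

So this gives a concrete guide to how many individuals I need. For example, 3 unique values on = 1 person, 6 unique values on = 2 persons.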
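To make the rule concrete, here is a minimal sketch of how the required head count follows from [On], assuming the relationship is simply a ceiling division by 3. The helper name `people_needed` and the constant `MAX_JOBS_PER_PERSON` are my own, for illustration only.

```python
import math

# Assumption taken from the rule above: one individual holds at most 3 unique values.
MAX_JOBS_PER_PERSON = 3

def people_needed(on_count, per_person=MAX_JOBS_PER_PERSON):
    """Smallest number of individuals that can cover `on_count` concurrent unique values."""
    return math.ceil(on_count / per_person)

for on in (1, 3, 4, 6):
    print(on, "unique values on ->", people_needed(on), "person(s)")
```

With this rule, 3 values on needs 1 person and 6 values on needs 2, which matches the examples above.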
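I can't just do a groupby statement where I assign the first 3 unique values to individual 1, the next 3 unique values to individual 2, and so on.

What I envisage is that, when there are more than 3 unique values, I group the values by [Area] first and then combine the leftovers. So look to assign values with the same [Area] to one individual (up to 3). Then, if there are _leftover_ values (<3), they should be combined into a group of 3 where possible.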
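As a rough illustration of the "group by [Area] first, then combine the leftovers" idea, here is a sketch that works on a single snapshot of the jobs that are on at one moment. It deliberately ignores the time dimension (it does not try to keep a job with the same person from row to row), and the function name `group_by_area_then_leftovers` is hypothetical.

```python
from collections import defaultdict

def group_by_area_then_leftovers(jobs, per_person=3):
    """Split concurrent (area, place) jobs into groups of at most `per_person`.

    Full groups are formed within one area first; whatever is left over
    from each area is then pooled and chunked into further groups.
    """
    by_area = defaultdict(list)
    for area, place in dict.fromkeys(jobs):  # drop duplicates, keep first-seen order
        by_area[area].append((area, place))

    groups, leftovers = [], []
    for area_jobs in by_area.values():
        full = len(area_jobs) // per_person * per_person
        for i in range(0, full, per_person):
            groups.append(area_jobs[i:i + per_person])  # same-area group
        leftovers.extend(area_jobs[full:])              # fewer than per_person remain

    for i in range(0, len(leftovers), per_person):
        groups.append(leftovers[i:i + per_person])      # mixed-area group
    return groups

# Jobs that appear around 10:15 in the df below: four in area A, one in area B.
snapshot = [("A", "House 1"), ("A", "House 2"), ("A", "House 3"),
            ("A", "House 4"), ("B", "House 1")]
for person, group in enumerate(group_by_area_then_leftovers(snapshot), start=1):
    print(f"Person {person}: {group}")
```

For that snapshot, the first person gets three area A houses and the second person gets the leftover area A house together with the area B house.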
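The way I envisage this working is to look into the future by an hour. For each new row of values, the script should see how many values will be [On] (this indicates how many individuals are required in total). Where there are more than 3 unique values, they should be assigned by grouping the same values in [Area]. If there are leftover values, they should be combined in whatever way makes up a group of 3.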
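The look-ahead could be approximated along these lines. This is only a sketch: it assumes that a job counts as "coming up" if its (Area, Place) pair appears in any row within the next hour, which is not necessarily how [On] is derived. The function name `unique_jobs_next_hour` and the column name `JobsNextHour` are made up for illustration.

```python
import pandas as pd

def unique_jobs_next_hour(df, hours=1):
    """For each row, count distinct (Area, Place) pairs seen in the following `hours`."""
    times = pd.to_timedelta(df["Time"])       # '8:03:00' -> 0 days 08:03:00
    jobs = list(zip(df["Area"], df["Place"]))
    horizon = pd.Timedelta(hours=hours)

    counts = []
    for t in times:
        # rows whose time falls in the half-open window [t, t + horizon)
        in_window = (times >= t) & (times < t + horizon)
        counts.append(len({job for job, keep in zip(jobs, in_window) if keep}))
    return pd.Series(counts, index=df.index, name="JobsNextHour")

sample = pd.DataFrame({
    "Time": ["8:03:00", "8:17:00", "8:20:00", "9:10:00"],
    "Area": ["A", "A", "A", "B"],
    "Place": ["House 1", "House 2", "House 3", "House 1"],
})
print(unique_jobs_next_hour(sample).tolist())  # [3, 3, 2, 1]
```

The loop is quadratic in the number of rows, which is fine for a df of this size but would need rethinking for a much larger one.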
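For the df below, the number of unique values occurring for [Place] and [Area] varies between 1 and 6, so we should never have more than 2 individuals assigned. When there are more than 3 unique values, they should be assigned by [Area] first. The leftover values should be combined with other individuals that have fewer than 3 unique values.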
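A quick way to check any candidate assignment against that bound is sketched below. It assumes the bound is the peak of [On] divided by 3 and rounded up, and that the assignment lives in a [Person] column. `check_people_bound` is a hypothetical helper and the tiny frame is invented data, not the real df.

```python
import math
import pandas as pd

def check_people_bound(df, per_person=3):
    """Return (people_used, people_allowed) for an assignment stored in df['Person']."""
    peak_on = df["On"].astype(int).max()       # 'On' is stored as strings in the df
    allowed = math.ceil(peak_on / per_person)
    used = df["Person"].nunique()
    return used, allowed

# Invented three-row example: On peaks at 6, but three different people were used.
toy = pd.DataFrame({
    "On": ["1", "4", "6"],
    "Person": ["Person 1", "Person 2", "Person 3"],
})
used, allowed = check_people_bound(toy)
print(f"used {used}, allowed {allowed}")  # used 3, allowed 2
```

If `used` is larger than `allowed`, the assignment is using more individuals than the rule says are necessary.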
Apologies for the large df. It's the only way I could replicate the problem!
import pandas as pd
import numpy as np
from collections import Counter
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:33:00','8:47:00','8:48:00','9:03:00','9:15:00','9:18:00','9:33:00','9:45:00','9:48:00','10:03:00','10:15:00','10:15:00','10:15:00','10:18:00','10:32:00','10:33:00','10:39:00','10:43:00','10:48:00','10:50:00','11:03:00','11:03:00','11:07:00','11:25:00','11:27:00','11:42:00','11:48:00','11:51:00','11:57:00','12:00:00','12:08:00','12:15:00','12:17:00','12:25:00','12:30:00','12:35:00','12:39:00','12:47:00','12:52:00','12:55:00','13:00:00','13:03:00','13:07:00','13:12:00','13:15:00','13:22:00','13:27:00','13:27:00'],
'Area' : ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','A','B','A','A','A','A','B','A','A','B','B','A','B','C','A','B','C','C','A','B','C','C','B','A','C','B','C','C','A','C','B','C','C','A','C'],
'Place' : ['House 1','House 2','House 3','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 4','House 1','House 2','House 1','House 1','House 4','House 3','House 2','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 2','House 1','House 4','House 1','House 1','House 2','House 1','House 4','House 1','House 1','House 3','House 2','House 4','House 1','House 2','House 4','House 1','House 4','House 2'],
'On' : ['1','2','3','3','3','3','3','3','3','3','3','3','3','3','4','5','5','5','5','5','5','4','3','3','3','2','2','2','2','3','3','3','4','4','4','4','4','4','4','4','4','4','4','4','4','4','5','6','6','6','6'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1','Person 3','Person 1','Person 2','Person 1','Person 1','Person 3','Person 1','Person 2','Person 3','Person 3','Person 2','Person 3','Person 4','Person 2','Person 3','Person 4','Person 4','Person 2','Person 3','Person 4','Person 4','Person 3','Person 2','Person 4','Person 3','Person 4','Person 4','Person 2','Person 4','Person 3','Person 5','Person 4','Person 2','Person 4'],
})
df = pd.DataFrame(data=d)
def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values
    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area,
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurs within an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
    df = df.copy()
    # map the integer person ids onto the existing 'Person' labels
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df

print(allocatePeople(df))
Output:
        Time  Area    Place  On    Person
0    8:03:00     A  House 1   1  Person 1
1    8:17:00     A  House 2   2  Person 1
2    8:20:00     A  House 3   3  Person 1
3    8:33:00     A  House 1   3  Person 1
4    8:47:00     A  House 3   3  Person 1
5    8:48:00     A  House 2   3  Person 1
6    9:03:00     A  House 1   3  Person 1
7    9:15:00     A  House 3   3  Person 1
8    9:18:00     A  House 2   3  Person 1
9    9:33:00     A  House 1   3  Person 1
10   9:45:00     A  House 3   3  Person 1
11   9:48:00     A  House 2   3  Person 1
12  10:03:00     A  House 1   3  Person 1
13  10:15:00     A  House 3   3  Person 1
14  10:15:00     A  House 4   4  Person 2
15  10:15:00     B  House 1   5  Person 2
16  10:18:00     A  House 2   5  Person 1
17  10:32:00     B  House 1   5  Person 2
18  10:33:00     A  House 1   5  Person 1
19  10:39:00     A  House 4   5  Person 2
20  10:43:00     A  House 3   5  Person 1
21  10:48:00     A  House 2   4  Person 1
22  10:50:00     B  House 1   3  Person 2
23  11:03:00     A  House 1   3  Person 1
24  11:03:00     A  House 4   3  Person 2
25  11:07:00     B  House 1   2  Person 2
26  11:25:00     B  House 1   2  Person 2
27  11:27:00     A  House 4   2  Person 2
28  11:42:00     B  House 1   2  Person 2
29  11:48:00     C  House 1   3  Person 2
30  11:51:00     A  House 4   3  Person 2
31  11:57:00     B  House 1   3  Person 2
32  12:00:00     C  House 2   4  Person 3
33  12:08:00     C  House 1   4  Person 2
34  12:15:00     A  House 4   4  Person 2
35  12:17:00     B  House 1   4  Person 2
36  12:25:00     C  House 1   4  Person 2
37  12:30:00     C  House 2   4  Person 3
38  12:35:00     B  House 1   4  Person 2
39  12:39:00     A  House 4   4  Person 2
40  12:47:00     C  House 1   4  Person 2
41  12:52:00     B  House 1   4  Person 2
42  12:55:00     C  House 3   4  Person 3
43  13:00:00     C  House 2   4  Person 3
44  13:03:00     A  House 4   4  Person 2
45  13:07:00     C  House 1   4  Person 2
46  13:12:00     B  House 2   5  Person 3
47  13:15:00     C  House 4   6  Person 4
48  13:22:00     C  House 1   6  Person 2
49  13:27:00     A  House 4   6  Person 2
50  13:27:00     C  House 2   6  Person 3
Intended output, with comments on why I think each value should be assigned that way: