One approach is to use nest and unnest, like this:
    let noDuplicates : Frame<(int * string), string> =
        df1
        |> Frame.groupRowsBy "Tomas"
        |> Frame.nest
        |> Series.mapValues (Frame.take 1)
        |> Frame.unnest
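To make the intent of the pipeline concrete, here is a rough analogue written with plain F# lists instead of Deedle. It is only a sketch: the tuple layout `(day, Tomas value, Adam value)` and the names `rows` and `firstPerTomas` are assumptions made for illustration, and the comments map each list operation only loosely onto the corresponding Deedle call.

```fsharp
// Illustrative data only: (day, Tomas value, Adam value); this tuple layout is an assumption.
let rows : (string * int * int option) list =
    [ ("Monday", 1, None)
      ("Tuesday", 4, Some 2)
      ("Wednesday", -5, None)
      ("Thursday", 4, Some 5) ]

let firstPerTomas =
    rows
    // Roughly Frame.groupRowsBy "Tomas" followed by Frame.nest: one sub-list per distinct value.
    |> List.groupBy (fun (_, tomas, _) -> tomas)
    // Roughly Series.mapValues (Frame.take 1): keep only the first row of each group.
    |> List.map (fun (key, group) -> key, List.truncate 1 group)
    // Roughly Frame.unnest: flatten back to rows keyed by (group key, day).
    |> List.collect (fun (key, group) ->
        group |> List.map (fun (day, tomas, adam) -> (key, day), (tomas, adam)))

printfn "%A" firstPerTomas
```

The Thursday row is dropped because its "Tomas" value, 4, already appeared on Tuesday; the Monday, Tuesday and Wednesday rows remain.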
Let's walk through each step. Suppose you have this data frame:
    // Create from individual observations (row * column * value)
    let df1 =
        [ ("Monday", "Tomas", 1); ("Tuesday", "Adam", 2)
          ("Tuesday", "Tomas", 4); ("Wednesday", "Tomas", -5)
          ("Thursday", "Tomas", 4); ("Thursday", "Adam", 5) ]
        |> Frame.ofValues

                 Tomas Adam
    Monday    -> 1     <missing>
    Tuesday   -> 4     2
    Wednesday -> -5    <missing>
    Thursday  -> 4     5
You want to remove the rows that contain a duplicate value in the "Tomas" column.
First, group the rows by that column.
    let df2 : Frame<(int * string), string> = df1 |> Frame.groupRowsBy "Tomas"

                    Tomas Adam
    1  Monday    -> 1     <missing>
    4  Tuesday   -> 4     2
    4  Thursday  -> 4     5
    -5 Wednesday -> -5    <missing>
Now you have a frame with a two-level index, which you can convert into a series of data frames.
    let df3 = df2 |> Frame.nest

                 Tomas Adam
    Monday    -> 1     <missing>

                 Tomas Adam
    Tuesday   -> 4     2
    Thursday  -> 4     5

                 Tomas Adam
    Wednesday -> -5    <missing>
Take the first row of each frame.
    let df4 = df3 |> Series.mapValues (fun fr -> fr |> Frame.take 1)

                 Tomas Adam
    Monday    -> 1     <missing>

                 Tomas Adam
    Tuesday   -> 4     2

                 Tomas Adam
    Wednesday -> -5    <missing>
What remains is the reverse conversion: from a series of data frames back to a frame with a two-level index.
    let df5 = df4 |> Frame.unnest

                    Tomas Adam
    -5 Wednesday -> -5    <missing>
    1  Monday    -> 1     <missing>
    4  Tuesday   -> 4     2
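The result is still keyed by the pair (Tomas value, day). If you would rather get back the original single-level row keys, one option is to drop the first component of each key with `Frame.mapRowKeys`. The script below is a sketch, not part of the steps above: it assumes Deedle is loaded from NuGet in F# Interactive, and the name `flat` is made up for illustration.

```fsharp
#r "nuget: Deedle"
open Deedle

// Same sample frame as in the walkthrough.
let df1 =
    [ ("Monday", "Tomas", 1); ("Tuesday", "Adam", 2)
      ("Tuesday", "Tomas", 4); ("Wednesday", "Tomas", -5)
      ("Thursday", "Tomas", 4); ("Thursday", "Adam", 5) ]
    |> Frame.ofValues

// The de-duplication pipeline, keyed by (Tomas value, day).
let noDuplicates : Frame<(int * string), string> =
    df1
    |> Frame.groupRowsBy "Tomas"
    |> Frame.nest
    |> Series.mapValues (Frame.take 1)
    |> Frame.unnest

// Assumption: only the day label is wanted as the row key,
// so keep the second component of each (value, day) pair.
let flat : Frame<string, string> =
    noDuplicates |> Frame.mapRowKeys snd

flat.Print()
```

This is safe here because each day appears only once in the result; if the second key component could repeat, the row keys would no longer be unique after the mapping.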