I don't believe you will be able to compare clusterings from two different graphs that contain two different sets of nodes. Practically, you can't do it in igraph, and conceptually it's hard because clusterings are compared by considering all pairs of nodes in a graph and checking whether each pair is placed in the same cluster or in different clusters under each of the two clusterings. If both clusterings typically put the same nodes together and the same nodes apart, they are considered more similar. [1]
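To make the pair-counting idea concrete, here is a minimal base R sketch. The membership vectors a and b are made up for illustration, and the score computed is the Rand index (the share of node pairs on which the two clusterings agree). That is one pair-counting measure, not the default that compare() uses.

```r
# Hypothetical cluster memberships for the same five nodes
a <- c(1, 1, 2, 2, 3)
b <- c(1, 1, 1, 2, 2)

# All unordered pairs of node indices (one pair per column)
pairs <- combn(length(a), 2)

# For each pair: are the two nodes in the same cluster?
same_a <- a[pairs[1, ]] == a[pairs[2, ]]
same_b <- b[pairs[1, ]] == b[pairs[2, ]]

# Rand index: fraction of pairs treated the same way by both clusterings
rand_index <- mean(same_a == same_b)
rand_index
```

This only works because a and b index the same nodes in the same order, which is exactly what is missing when the two graphs have different node sets.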
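I think another valid approach would be to assess how similar the clusterings are on the set of nodes that is purely the intersection of the two graphs. You will have to decide what makes more sense in your setting. I'll show how to do it using the union of the nodes rather than the intersection.
So, you need all the same nodes in both graphs in order to make the comparison. In fact, I think the easier way is to put all the nodes in a single graph and give the edges different types. You can then compute the clustering for each edge type separately and compare the results. Hopefully the example below is clear: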
# repeat your set-up
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
                       to = "sample2"))
nodes2 <- as_tibble(list(id = c("sample1", "sample21", "sample22", "sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
                        to = c("sample21", "sample22")))
# approach from a single graph
# concatenate edges
edges <- rbind(edge, edge2)
# create an edge attribute indicating network type
edges$type <- c("phone", "email", "email")
# the set of nodes (across both graphs)
nodes <- unique(rbind(nodes, nodes2))
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)
# We cluster over the graph without the email edges
com_phone <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="email"]))
plot(g, mark.groups = com_phone)
# Now we can cluster over the graph without the phone edges
com_email <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="phone"]))
plot(g, mark.groups = com_email)
# Now we can compare
compare(com_phone, com_email)
#> [1] 0.7803552
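As you can see from the plots, we pick out the same initial clustering structure that you would find in the separate graphs, with the addition of the extra isolated nodes.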
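For the intersection alternative mentioned earlier, a rough sketch could look like the following. The graphs g1 and g2 and their node names are hypothetical (in your example the two graphs share only sample1, which leaves nothing to compare), and restricting the memberships to the shared nodes after clustering each full graph is just one way to do it.

```r
library(igraph, warn.conflicts = FALSE)

# Two hypothetical graphs that share the nodes a, b, c and d
g1 <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c", "e"), to = c("b", "c", "d", "a")),
  directed = FALSE
)
g2 <- graph_from_data_frame(
  data.frame(from = c("a", "c", "d"), to = c("b", "d", "f")),
  directed = FALSE
)

# Cluster each graph on its own
com1 <- cluster_louvain(g1)
com2 <- cluster_louvain(g2)

# Name the membership vectors so they can be aligned by node
m1 <- setNames(com1$membership, V(g1)$name)
m2 <- setNames(com2$membership, V(g2)$name)

# Keep only the nodes present in both graphs, in the same order
shared <- intersect(V(g1)$name, V(g2)$name)

# Re-index cluster ids to 1..k after subsetting, then compare
vi_shared <- compare(as.integer(factor(m1[shared])),
                     as.integer(factor(m2[shared])))
vi_shared
```

The trade-off is that nodes outside the intersection still influence how each graph is clustered but are ignored in the comparison itself.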
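[1] Obviously this is a fairly vague explanation. The default method used by compare(), variation of information, comes from this paper https://link.springer.com/chapter/10.1007/978-3-540-45167-9_14, which has a good discussion.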
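If a different measure suits your setting better, compare() also takes a method argument. Here is a small sketch, assuming the memberships that the two Louvain runs above produce over the six nodes. They are written out by hand (and named phone and email, which are my labels) so the snippet runs on its own:

```r
library(igraph, warn.conflicts = FALSE)

# Node order: sample1, sample2, sample3, sample21, sample22, sample23
# Hand-written memberships standing in for com_phone and com_email (assumption)
phone <- c(1, 1, 2, 3, 4, 5)  # sample1 and sample2 together, the rest isolated
email <- c(1, 2, 3, 1, 1, 4)  # sample1, sample21 and sample22 together

# Score the same pair of clusterings under several measures
methods <- c("vi", "nmi", "rand", "adjusted.rand")
scores <- sapply(methods, function(m) compare(phone, email, method = m))
scores
```

Keep in mind that vi is a distance (0 means identical clusterings), whereas nmi, rand and adjusted.rand are similarities (1 means identical), so the numbers are not directly comparable across methods.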