Add a column to a summary table containing a distribution plot for each variable

2024-05-06

Is there a good way to add a column that contains a distribution plot for each variable?

For example, here:

library(vtable)
st(iris)

There is no need to use this package. I am open to suggestions.


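Although I know this would probably be easier to do with ggplot, I am always eager to see whether I can achieve a similar result with the base R plotting tools. I will use the iris data in this example.

We first need to determine which columns of our data.frame are numeric.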

# returns logical which is TRUE if column p is numeric
numeric_cols <- c(rep(NA, ncol(iris)))
for(p in seq_len(ncol(iris))) {
  numeric_cols[p] <- inherits(iris[, p], 'numeric')
}
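
As a side note, the same logical vector can be built without an explicit loop. The sketch below is only an alternative, and it assumes you also want integer columns to count as numeric, which `inherits(x, 'numeric')` does not do. The name `numeric_cols_alt` is made up for illustration.

```r
# TRUE for every column that is.numeric() accepts (double or integer)
numeric_cols_alt <- vapply(iris, is.numeric, logical(1L))
numeric_cols_alt

# the two tests disagree on integer vectors
inherits(1L, 'numeric')  # class is 'integer', so this test fails
is.numeric(1L)           # integers are treated as numeric here
```

The result is a named logical vector, so it can be used for indexing in the same way as `numeric_cols`.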

Then we can pick some arbitrary colors for the densities. Here I chose three colors, matching the number of levels of iris$Species.

my_cols <- c('blue4', 'darkorange', '#00b0a4')
adj_col <- \(x) adjustcolor(x, alpha.f = 0.2)
my_transp_cols <- c(
  adj_col('blue4'), adj_col('darkorange'), adj_col('#00b0a4')
)
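
If the factor has a different number of levels, the colors could also be generated rather than typed out. This is a rough sketch, assuming the 'Dark 3' palette from `hcl.colors()` is acceptable (an arbitrary choice, not part of the answer above); the names `n_lvls`, `gen_cols` and `gen_transp_cols` are illustrative. It also relies on `adjustcolor()` being vectorized, so all colors can be made transparent in one call.

```r
# one color per factor level
n_lvls <- nlevels(iris$Species)
gen_cols <- hcl.colors(n_lvls, palette = 'Dark 3')

# adjustcolor() accepts a vector, so no per-color calls are needed
gen_transp_cols <- adjustcolor(gen_cols, alpha.f = 0.2)
gen_transp_cols
```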

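Now we need to plot the densities. The function given below (i.e. plot_densities) can provide either marginal densities or densities conditional on some factor variable. If you want the densities conditional on a factor variable, just set include_factor to TRUE and pass the name of the factor variable of interest to the factor argument.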

plot_densities <- \(DF, columns, include_factor = FALSE, factor) {
  ## keep complete rows only
  DF <- DF[complete.cases(DF), ]
  ## positions of the numeric columns (columns is a logical vector)
  num_idx <- which(columns)
  ## setting up plotting device: one panel per numeric column, in one row
  layout(matrix(seq_along(num_idx), nrow = 1L))
  ## wide right margin so that the legend fits next to each panel
  par(mar = c(5, 4, 4, 8) + 0.1, xpd = TRUE)
  ## formats a statistic with one decimal place
  r <- \(x) format(round(x, 1L), nsmall = 1L)
  if (include_factor) {
    ## if densities are to be shown per factor level
    lvls <- levels(DF[[factor]])
    for (i in num_idx) {
      ## values, densities, means and SDs of column i per factor level
      subs <- lapply(lvls, \(l) DF[[i]][DF[[factor]] == l])
      dens <- lapply(subs, density)
      means <- sapply(subs, \(s) r(mean(s)))
      SDs <- sapply(subs, \(s) r(sd(s)))
      ## common axis limits so that all levels fit in the same panel
      x_lim <- range(sapply(dens, \(d) range(d$x)))
      y_lim <- c(0, max(sapply(dens, \(d) max(d$y))))
      for (j in seq_along(lvls)) {
        if (j == 1L) {
          ## the first level opens the panel
          plot(dens[[j]], main = '',
               las = 1, col = my_cols[j], xlab = '',
               xlim = x_lim, ylim = y_lim, bty = 'n')
          title(main = names(DF)[i], xpd = TRUE, adj = 1)
        } else {
          ## the remaining levels are drawn on top of it
          lines(dens[[j]], col = my_cols[j])
        }
        polygon(dens[[j]], density = -1L, col = my_transp_cols[j])
      }
      ## add legend to the plot
      legend('topright', paste0(lvls, ': ', means, ' (', SDs, ')'),
             fill = my_transp_cols[seq_along(lvls)], bty = 'n',
             inset = c(-0.5, 0.1))
    }
  } else {
    ## if densities are NOT to be shown per factor level
    for (i in num_idx) {
      d <- density(DF[[i]])
      plot(d,
           las = 1, main = names(DF)[i], col = my_cols[1L], xlab = '',
           bty = 'n')
      polygon(d, density = -1L, col = my_transp_cols[1L])
      ## add legend to the plot: mean (SD) of column i
      legend('topright',
             paste0(names(DF)[i], ': ', r(mean(DF[[i]])),
                    ' (', r(sd(DF[[i]])), ')'),
             fill = my_transp_cols[1L], bty = 'n',
             inset = c(-0.5, 0.1))
    }
  }
}
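
To make the distinction between marginal and conditional densities concrete, here is a minimal sketch of the underlying computation for a single variable, without any plotting. It assumes `Sepal.Length` as the variable and `Species` as the conditioning factor; the object names (`d_marginal`, `d_conditional`, `x_lim`, `y_lim`) are illustrative only.

```r
x <- iris$Sepal.Length

# marginal density: all observations pooled
d_marginal <- density(x)

# conditional densities: one density per level of Species
d_conditional <- lapply(split(x, iris$Species), density)

# shared axis limits, so that every level fits in the same panel
x_lim <- range(sapply(d_conditional, function(d) range(d$x)))
y_lim <- c(0, max(sapply(d_conditional, function(d) max(d$y))))
```

Computing the limits over all levels before plotting is what keeps curves drawn later with `lines()` from being cut off.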

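We can save the output as a .pdf file. If you want to change the layout of the plotting device, then you will also have to play around a bit with width and height to make it fit your specific case.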

# marginal densities
pdf(file = 'my_directory/my_plot.pdf', # change my_directory
    width = 13, height = 4) 
plot_densities(DF = iris, columns = numeric_cols)
dev.off()

# conditional densities
pdf(file = 'my_directory/my_plot2.pdf', # change my_directory
    width = 13, height = 4) 
plot_densities(DF = iris, columns = numeric_cols,
               include_factor = TRUE, factor = 'Species')
dev.off()
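
One possible way to avoid guessing the width is to scale it with the number of panels. The sketch below assumes the 13 x 4 inch, four-panel example above as a starting point, i.e. 13 / 4 inches per panel, and writes to a temporary directory. `n_panels`, `panel_width` and `out_file` are illustrative names, and the plain `plot()` loop merely stands in for the plotting function.

```r
# numeric columns get one panel each
is_num <- vapply(iris, is.numeric, logical(1L))
n_panels <- sum(is_num)

# inches per panel, taken from the 13 x 4 inch example with four panels
panel_width <- 13 / 4

out_file <- file.path(tempdir(), 'densities_sketch.pdf')
pdf(file = out_file, width = panel_width * n_panels, height = 4)
layout(matrix(seq_len(n_panels), nrow = 1L))
for (nm in names(iris)[is_num]) {
  plot(density(iris[[nm]]), main = nm, xlab = '', bty = 'n', las = 1)
}
dev.off()
```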

I usually make .pdf files of my plots and then convert them to .png files with this online conversion tool: https://pdf2jpg.net/
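
As an alternative to converting afterwards, base R can also write a PNG directly through the `png()` device. This is only a sketch: it assumes your R build has PNG support (checked with `capabilities('png')`), the resolution of 150 dpi is an arbitrary choice, and `png_file` is a made-up temporary path.

```r
png_file <- file.path(tempdir(), 'density_sketch.png')  # hypothetical path

# only attempt this when the PNG device is available in this R build
if (capabilities('png')) {
  png(filename = png_file, width = 13, height = 4, units = 'in', res = 150)
  plot(density(iris$Sepal.Length), main = 'Sepal.Length',
       xlab = '', bty = 'n', las = 1)
  dev.off()
}
```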

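I show the mean (SD) in the legend, but you can choose to show any statistic you like. Just swap the calls to mean() and sd() inside plot_densities for the statistic functions you are interested in.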
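
For instance, legend labels showing the median (IQR) per level could be built roughly like this. The sketch assumes `Sepal.Length` split by `Species`; `by_species`, `stat_text` and `legend_text` are illustrative names and not part of the function above.

```r
# same one-decimal formatting as used for the mean (SD) labels
r <- function(x) format(round(x, 1L), nsmall = 1L)

by_species <- split(iris$Sepal.Length, iris$Species)

# "median (IQR)" for each level
stat_text <- vapply(
  by_species,
  function(s) paste0(r(median(s)), ' (', r(IQR(s)), ')'),
  character(1L)
)

# text that could be passed to legend()
legend_text <- paste0(names(stat_text), ': ', stat_text)
legend_text
```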

Output

[Figure: marginal densities]

[Figure: conditional densities]

Note: use function(x) instead of \(x) if your version of R does not support the \(x) shorthand (it was introduced in R 4.1.0).
