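If we want to add a proportion column, group by `gender` and `pre`, count the rows in each combination, divide each count by the sum of the counts to get the proportion, and join the result back onto the original data with `right_join`: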
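library(dplyr)

out1 <- out %>%
  group_by(gender, pre) %>%
  summarise(n = n(), .groups = 'drop') %>%   # count per gender/pre combination
  mutate(fpc = n/sum(n)) %>%                 # proportion of all rows
  right_join(out, by = c("gender", "pre"))   # keep every row of `out`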
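Since `out` itself is not shown, here is a minimal sketch of the same pipeline on a small made-up tibble (`toy`, with hypothetical columns `id`, `gender`, `pre` and made-up values) so the contents of `n` and `fpc` can be checked by hand:

```r
library(dplyr)

# Hypothetical stand-in for `out`: 6 rows, two grouping columns
toy <- tibble(
  id     = 1:6,
  gender = c("F", "F", "M", "M", "M", "F"),
  pre    = c("High", "Low", "High", "High", "Low", "High")
)

toy1 <- toy %>%
  group_by(gender, pre) %>%
  summarise(n = n(), .groups = "drop") %>%   # one row per gender/pre combination
  mutate(fpc = n / sum(n)) %>%               # share of all 6 rows
  right_join(toy, by = c("gender", "pre"))   # attach n and fpc to every original row

toy1
```

Here F/High and M/High each get `n = 2` and `fpc = 2/6`, while F/Low and M/Low get `n = 1` and `fpc = 1/6`.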
Or use `adorn_percentages` from `janitor`:
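library(dplyr)
library(janitor)
library(tidyr)

out1 <- out %>%
  tabyl(gender, pre) %>%                        # gender x pre table of counts
  adorn_percentages(denominator = "all") %>%    # divide each cell by the grand total
  pivot_longer(cols = -gender, names_to = 'pre',
               values_to = 'fpc') %>%           # back to one row per gender/pre
  right_join(out, by = c("gender", "pre"))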
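For comparison, the same `denominator = "all"` proportions can be computed without janitor using base R's `table()`, `prop.table()` and `merge()`. This is a swapped-in base-R technique rather than part of the answer above, and `toy` is a hypothetical stand-in for `out`:

```r
# Hypothetical stand-in for `out`
toy <- data.frame(
  id     = 1:6,
  gender = c("F", "F", "M", "M", "M", "F"),
  pre    = c("High", "Low", "High", "High", "Low", "High"),
  stringsAsFactors = FALSE
)

# Cross-tabulate gender x pre, then divide every cell by the grand total
# (the same idea as adorn_percentages(denominator = "all"))
tab <- prop.table(table(gender = toy$gender, pre = toy$pre))

# as.data.frame() on a table gives one row per cell (long format)
props <- as.data.frame(tab, responseName = "fpc", stringsAsFactors = FALSE)

# Attach fpc to every original row; all.y = TRUE keeps all rows of `toy`
toy1 <- merge(props, toy, by = c("gender", "pre"), all.y = TRUE)
toy1
```

Like the janitor version, this yields only the proportion `fpc`, not the count `n`.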
If we need a function that takes the grouping columns as a character vector:
f1 <- function(dat, grp_cols) {
  dat %>%
    group_by(across(all_of(grp_cols))) %>%   # group by the columns named in grp_cols
    summarise(n = n(), .groups = 'drop') %>%
    mutate(fpc = n/sum(n)) %>%
    right_join(dat, by = grp_cols)           # explicit join keys
}
f1(out, c("gender", "pre"))
# A tibble: 200 x 11
#    gender pre       n   fpc   no. fake.name sector   pretest state email   phone
#    <chr>  <chr> <int> <dbl> <int> <chr>     <chr>      <int> <chr> <chr>   <chr>
#  1 F      High     31 0.155     1 Pont      Private     1352 NY    <email> xxx-xx-6216
#  2 F      High     31 0.155     2 Street    NGO         1438 CA    <email> xxx-xx-6405
#  3 F      High     31 0.155     3 Galvan    Private     1389 NY    <email> xxx-xx-9195
#  4 F      High     31 0.155     4 Gorman    NGO         1375 CA    <email> xxx-xx-1845
#  5 F      High     31 0.155     5 Jacinto   Private     1386 CA    <email> xxx-xx-6237
#  6 F      High     31 0.155     6 Shah      Public      1384 CA    <email> xxx-xx-5723
#  7 F      High     31 0.155     7 Randon    Private     1360 TX    <email> xxx-xx-7542
#  8 F      High     31 0.155     8 Koucherik NGO         1439 NY    <email> xxx-xx-9137
#  9 F      High     31 0.155     9 Waters    Industry    1414 TX    <email> xxx-xx-7560
# 10 F      High     31 0.155    10 David     Industry    1396 CA    <email> xxx-xx-6498
# … with 190 more rows
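Because `grp_cols` is a character vector, the same function works for any number of grouping columns. A minimal sketch on a hypothetical `toy` tibble (columns and values are made up), calling it once with one column and once with two:

```r
library(dplyr)

# Same function as above
f1 <- function(dat, grp_cols) {
  dat %>%
    group_by(across(all_of(grp_cols))) %>%
    summarise(n = n(), .groups = "drop") %>%
    mutate(fpc = n / sum(n)) %>%
    right_join(dat, by = grp_cols)
}

# Hypothetical toy data
toy <- tibble(
  id     = 1:6,
  gender = c("F", "F", "M", "M", "M", "F"),
  pre    = c("High", "Low", "High", "High", "Low", "High")
)

by_gender <- f1(toy, "gender")            # one grouping column: 3 F and 3 M rows
by_both   <- f1(toy, c("gender", "pre"))  # two grouping columns
```

With only `gender`, each row gets `fpc = 3/6 = 0.5`; with both columns the values match the two-column pipeline.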