这完全是在乞求一个数据框,那么这只是带有诸如ddply
(惊人的强大):
tot_mamj <- ddply(rain[rain$month %in% 3:6,-2], 'year', colwise(sum))
按年份给出 M/A/M/J 的总计:
year station_1 station_2 station_3 station_4 station_5 ...
1 1972 8.618960 5.697739 10.083192 9.264512 11.152378 ...
2 1973 18.571748 18.903280 11.832462 18.262272 10.509621 ...
3 1974 22.415201 22.670821 32.850745 31.634717 20.523778 ...
4 1975 16.773286 17.683704 18.259066 14.996550 19.007762 ...
...
下面是完美运行的代码。我们创建一个数据框,其列名是“station_n”;还有年份和月份的额外列(因子,或者整数,如果你很懒,请参阅脚注)。现在您可以按月或年进行任意分析(使用 plyr 的 split-apply-combine 范例):
require(plyr) # for d*ply, summarise
#require(reshape) # for melt
# Parameterize everything here, it's crucial for testing/debugging
all_years <- c(1970:2011)
nYears <- length(all_years)
nStations <- 101
# We want station names as vector of chr (as opposed to simple indices)
station_names <- paste ('station_', 1:nStations, sep='')
rain <- data.frame(cbind(
year=rep(c(1970:2011),12),
month=1:12
))
# Fill in NAs for all data
rain[,station_names] <- as.numeric(NA)
# Make 'month' a factor, to prevent any numerical funny stuff e.g accidentally 'aggregating' it
rain$month <- factor(rain$month)
# For convenience, store the row indices for all years, M/A/M/J
I.mamj <- which(rain$month %in% 3:6)
# Insert made-up seasonal data for M/A/M/J for testing... leave everything else NA intentionally
rain[I.mamj,station_names] <- c(3,5,9,6) * runif(4*nYears*nStations)
# Get our aggregate of MAMJ totals, by year
# The '-2' column index means: "exclude month, to prevent it also getting 'aggregated'"
excludeMonthCol = -2
tot_mamj <- ddply(rain[rain$month %in% 3:6, excludeMonthCol], 'year', colwise(sum))
# voila!!
# year station_1 station_2 station_3 station_4 station_5
# 1 1972 8.618960 5.697739 10.083192 9.264512 11.152378
# 2 1973 18.571748 18.903280 11.832462 18.262272 10.509621
# 3 1974 22.415201 22.670821 32.850745 31.634717 20.523778
# 4 1975 16.773286 17.683704 18.259066 14.996550 19.007762
作为脚注,在我将月份从数字转换为因子之前,它会默默地“聚合”(直到我输入“-2”:排除列引用)。
然而,更好的是当你把它作为一个因素时,它会直接拒绝聚合,并抛出一个错误(这对于调试来说是理想的):
ddply(rain[rain$month %in% 3:6, ], 'year', colwise(sum))
Error in Summary.factor(c(3L, 3L, 3L, 3L, 3L, 3L), na.rm = FALSE) :
sum not meaningful for factors