Create run-length ID while allowing for gaps of a certain length in a run

2023-12-23

(I originally posted a question here: https://stackoverflow.com/questions/66478148/create-a-list-of-vectors-from-a-vector-where-n-consecutive-values-are-not-0-in-r/66480056#66480056, but it did not fully cover my problem.)

I have a data frame with a "date" column and precipitation (rainfall):

  date precip
1    1    0.0
2    2    0.0
3    3   12.4
4    4   10.2
5    5    0.0
6    6   13.6

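I want to create an "event" column that contains a counter (ID) for each consecutive rain period. A rain event can be defined as a consecutive run of days with precipitation greater than, e.g., 0.

If we do not allow any short gaps of zero rain, "event" would look like this, with a counter for the non-0 periods and NA for periods with no rain: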

  date precip event
1    1    0.0    NA
2    2    0.0    NA
3    3   12.4     1
4    4   10.2     1
5    5    0.0    NA
6    6   13.6     2
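
For the no-gap case, a minimal sketch with base R's rle() might look like the following. It assumes the six-row example data above and a threshold of 0; the variable names wet and r are only illustrative.

```r
# Small example data from above
d <- data.frame(date = 1:6, precip = c(0, 0, 12.4, 10.2, 0, 13.6))

wet <- d$precip > 0   # TRUE on days with rain
r <- rle(wet)         # consecutive runs of wet / dry days

# Number the wet runs 1, 2, ...; dry runs become NA
r$values <- ifelse(r$values, cumsum(r$values), NA)

# Expand the runs back to one value per row
d$event <- inverse.rle(r)
d
```

This only handles strictly consecutive rain days; it does not bridge any dry gaps.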

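In addition, I want to be able to allow short periods with no rain, e.g. of length n = 1 day, within each run of non-0 values.

For example, in the data frame above, if we allow 1 day with 0 rain inside a consecutive rain period, such as day 5, then days 3 to 6 can be defined as one rain event: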

  date precip event
1    1    0.0    NA
2    2    0.0    NA
3    3   12.4     1
4    4   10.2     1
5    5    0.0     1 # <- gap of 1 day with no rain: OK
6    6   13.6     1
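
One way to see which dry spells would be bridged is to look at the run lengths of the dry days. This is only a sketch, assuming the six-row example above; the gaps data frame and its column names are made up for illustration.

```r
# Small example data from above
d <- data.frame(date = 1:6, precip = c(0, 0, 12.4, 10.2, 0, 13.6))

n <- 1                    # longest dry gap (in days) allowed inside an event
r <- rle(d$precip == 0)   # runs of dry (TRUE) and wet (FALSE) days

# One row per run: is it dry, how long is it, and is it short enough to bridge?
gaps <- data.frame(
  dry     = r$values,
  length  = r$lengths,
  bridged = r$values & r$lengths <= n
)
gaps
```

Here only the single dry day (day 5) is short enough to be absorbed; the two dry days at the start are not.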

A slightly larger toy data set:

d = structure(list(date = 1:31, precip = c(0, 0, 12.3999996185303, 
10.1999998092651, 0, 13.6000003814697, 16.6000003814697, 21.5, 
7.59999990463257, 0, 0, 0, 0.699999988079071, 0, 0, 0, 5.40000009536743, 
0, 1, 35.4000015258789, 11.5, 16.7000007629395, 13.5, 13.1000003814697, 
11.8000001907349, 1.70000004768372, 0, 15.1000003814697, 12.8999996185303, 
3.70000004768372, 24.2999992370605)), row.names = c(NA, -31L), class = "data.frame")

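Now I am really stuck. I tried some odd things like the code below (just a start), but I do not think I will figure it out on my own, and I would be very grateful for any help.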

# this is far from being any helpful, but just to show the direction I was heading...
# the threshold could be 0 to mirror the example above...

rainfall_event = function(df,
                          daily_thresh = .2,
                          n = 1) {
  for (i in 1:nrow(df)) {
    zero_index = 1
    
    if (df[i,]$precip < daily_thresh) {
      # every time you encounter a value below the threshold count the 0s
      zero_counter = 0
      
      while (df[i,]$precip < daily_thresh) {

        zero_counter = zero_counter + 1
        
        if (i != nrow(df)) {
          i = i + 1
          zero_index = zero_index + 1
        } else{
          break
        }
      }
      
      if (zero_counter > n) {
        df[zero_index:(zero_index + zero_counter),][["event"]] = NA
      }
      
    } else{
      event_counter = 1
      
      while (df[i, ]$precip > daily_thresh) {

        df[["event"]] = event_counter
        if (i != nrow(df)) {
          i = i + 1
        } else{
          break
        }
      }
      
    }
  }
  
}

An rle alternative:

# limit of n days with precip = 0 to be allowed in runs of non-zero
n = 1

# rle of precip == 0
r = rle(d$precip == 0)

# replace the values of precip = 0 & length > limit with NA
r$values[r$values & r$lengths > n] = NA

# reconstruct the vector from the updated runs
ir = inverse.rle(r)

# rle of "is NA"
r2 = rle(is.na(ir))

# replace length of NA runs with 0
r2$lengths[r2$values] = 0

# replace values of non-NA runs with a sequence
r2$values[!r2$values] = seq_along(r2$values[!r2$values])

# create event column
d[!is.na(ir), "event"] = inverse.rle(r2)

   date precip event
1     1    0.0    NA
2     2    0.0    NA
3     3   12.4     1
4     4   10.2     1
5     5    0.0     1
6     6   13.6     1
7     7   16.6     1
8     8   21.5     1
9     9    7.6     1
10   10    0.0    NA
11   11    0.0    NA
12   12    0.0    NA
13   13    0.7     2
14   14    0.0    NA
15   15    0.0    NA
16   16    0.0    NA
17   17    5.4     3
18   18    0.0     3
19   19    1.0     3
20   20   35.4     3
21   21   11.5     3
22   22   16.7     3
23   23   13.5     3
24   24   13.1     3
25   25   11.8     3
26   26    1.7     3
27   27    0.0     3
28   28   15.1     3
29   29   12.9     3
30   30    3.7     3
31   31   24.3     3
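
The steps above could be wrapped in a small function so that the gap length and a rain threshold become arguments. This is only a sketch: the function name rain_event_id and the thresh argument are hypothetical and not part of the answer, the data are the rounded toy values printed above, and it assumes precip has no missing values.

```r
# Hypothetical helper; name and arguments are illustrative only
rain_event_id <- function(precip, n = 1, thresh = 0) {
  dry <- precip <= thresh                    # days counted as "no rain"
  r <- rle(dry)
  r$values[r$values & r$lengths > n] <- NA   # dry runs longer than n split events
  ir <- inverse.rle(r)

  r2 <- rle(is.na(ir))
  r2$lengths[r2$values] <- 0                 # drop the separating dry runs
  r2$values[!r2$values] <- seq_along(r2$values[!r2$values])  # number the events

  event <- rep(NA_integer_, length(precip))
  event[!is.na(ir)] <- inverse.rle(r2)
  event
}

# Toy data, rounded as in the printed output above
d <- data.frame(
  date = 1:31,
  precip = c(0, 0, 12.4, 10.2, 0, 13.6, 16.6, 21.5, 7.6, 0, 0, 0, 0.7,
             0, 0, 0, 5.4, 0, 1, 35.4, 11.5, 16.7, 13.5, 13.1, 11.8, 1.7,
             0, 15.1, 12.9, 3.7, 24.3)
)
d$event <- rain_event_id(d$precip, n = 1)
```

With n = 1 this should reproduce the event column shown above, and n = 0 gives the stricter variant without any gaps. Note that, as in the steps above, a dry run of at most n days at the very start or end of the series is also attached to the neighbouring event, which may or may not be what you want.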