Broken R code for selecting specific rows and cells in a text file and putting them into a data frame

2024-01-04

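This is an extension of this question: https://stackoverflow.com/questions/42513614/select-specific-rows-and-cells-in-text-file-and-put-into-data-frame-python-on-r It needs to be changed to accommodate more rows of Bands in the text file. I want to select the "Basic Stats" rows from a text file like the one shown below and organize them in a data frame like the one at the bottom of the question. Here's a link to the file if you want to work with it directly: https://www.dropbox.com/s/w76w2qqxqcdncik/test2.txt?dl=0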

Filename: /blah/blah/blah.txt
ROI: red_2 [Red] 12 points

Basic Stats      Min         Max        Mean       Stdev
     Band 1 0.032262    0.124425    0.078073    0.028031
     Band 2 0.021072    0.064156    0.037923    0.012178
     Band 3 0.013404    0.066043    0.036316    0.014787
     Band 4 0.005162    0.055781    0.015526    0.013255

Histogram         DN       Npts   Total  Percent     Acc Pct
Band 1      0.032262          1       1   8.3333      8.3333
Bin=0.00036 0.032624          0       1   0.0000      8.3333
            0.032985          0       1   0.0000      8.3333
            0.033346          0       1   0.0000      8.3333

Here's the code I'm using:

dat <- readLines('/blah/blah/blah.txt') 
# create an index for the lines that are needed: Basic stats and Bands
ti <- rep(which(grepl('ROI:', dat)), each = 8) + 1:8
# create a grouping vector of the same length
grp <- rep(1:203, each = 8)

# filter the text with the index 'ti' 
# and split into a list with grouping variable 'grp'
lst <- split(dat[ti], grp)
# loop over the list and read the text parts in as data frames
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', header = TRUE, blank.lines.skip = TRUE))

# bind the dataframes in the list together in one data.frame
DF <- do.call(rbind, lst)
# change the name of the first column
names(DF)[1] <- 'ROI'

# get the correct ROIs for the ROI column
DF$ROI <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)])
DF
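A toy run makes the indexing step in this code easier to follow. The sketch uses made-up line positions (not taken from the real file) and a block length of 7, matching the excerpt: the `ROI:` line, a blank line, the `Basic Stats` header and four Band rows. It also shows why `grp` has to be derived from the number of `ROI:` lines: in the code above, `ti` has 8 elements per ROI block, while `rep(1:203, each = 8)` always has 1624, so the two line up only when the file holds exactly 203 blocks. Note too that, in the excerpt, offsets `1:8` run past the Band rows to the blank line and the `Histogram` header.

```r
# Toy run of the indexing step, on made-up line numbers (not from test2.txt).
roi_lines <- c(2, 20, 38)        # hypothetical positions of three 'ROI:' lines
n_roi     <- length(roi_lines)   # number of ROI blocks, derived from the data
block_len <- 7                   # 'ROI:' line, blank line, header, 4 Band rows

# each start position is repeated block_len times, and the offsets
# 0:(block_len - 1) are recycled along the result
ti <- rep(roi_lines, each = block_len) + 0:(block_len - 1)
# ti: 2 3 4 5 6 7 8 20 21 22 23 24 25 26 38 39 40 41 42 43 44

# one group label per index, so split() keeps each block together
grp <- rep(seq_len(n_roi), each = block_len)

# the two vectors must have the same length for split(dat[ti], grp)
stopifnot(length(ti) == length(grp))
```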

The output looks like this:

$ROI
[1] "red_2"  "red_3"  "red_4"  "red_5"  "red_6"  "red_7"  "red_8"  "red_9"  "red_10" "bcs_1"  "bcs_2" 
[12] "bcs_3"  "bcs_4"  "bcs_5"  "bcs_6"  "bcs_7"  "bcs_8"  "bcs_9"  "bcs_10" "red_11" "red_12" "red_12"
[23] "red_13" "red_14" "red_15" "red_16" "red_17" "red_18" "red_19" "red_20" "red_21" "red_22" "red_23"
[34] "red_24" "red_25" "red_24" "red_25" "red_26" "red_27" "red_28" "red_29" "red_30" "red_31" "red_33"

$<NA>
[1] "Basic Stats\t     Min\t     Max\t    Mean\t   Stdev"

$<NA>
[1] "Basic Stats\t     Min\t     Max\t    Mean\t   Stdev"
etc...

It should look like this instead:

ROI      Band         Min        Max         Mean   Stdev
red_2    Band 1 0.032262    0.124425    0.078073    0.028031
red_2    Band 2 0.021072    0.064156    0.037923    0.012178
red_2    Band 3 0.013404    0.066043    0.036316    0.014787
red_2    Band 4 0.005162    0.055781    0.015526    0.013255
red_3    Band 1 values...
red_4    Band 2 
red_4    Band 3 
red_4    Band 4 

I'd appreciate some help.


You have to adapt the approach I proposed in https://stackoverflow.com/a/42514499/2204410 for this file. For the linked text file (test2.txt), I suggest the following approach:

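dat <- readLines('test2.txt')

# number of ROI blocks in the file
len <- sum(grepl('ROI:', dat))
# each block: the 'ROI:' line plus the 6 lines below it
# (blank line, 'Basic Stats' header, Band 1 to Band 4, as in the excerpt)
ti <- rep(which(grepl('ROI:', dat)), each = 7) + 0:6
# one group label per block
grp <- rep(1:len, each = 7)

lst <- split(dat[ti], grp)
# skip the 'ROI:' line and use the 'Basic Stats' line as the header
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', skip = 1, header = TRUE, blank.lines.skip = TRUE))

# name each data frame after its ROI, e.g. "red_2"
names(lst) <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)])

library(data.table)
# stack the data frames, storing the list names in an 'ROI' column,
# then rename the second column (the 'Basic Stats' column) to 'Band'
DT <- rbindlist(lst, idcol = 'ROI')
setnames(DT, 2, 'Band')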

This gives the desired result:

> DT
         ROI        Band      Min      Max     Mean    Stdev
   1:  red_1      Band 1 0.013282 0.133982 0.061581 0.034069
   2:  red_1      Band 2 0.009866 0.112935 0.042688 0.026618
   3:  red_1      Band 3 0.008304 0.037059 0.018434 0.007515
   4:  red_1      Band 4 0.004726 0.040089 0.018490 0.009605
   5:  red_2      Band 1 0.032262 0.124425 0.078073 0.028031
  ---                                                       
1220: bcs_49      Band 4 0.002578 0.010578 0.006191 0.002285
1221: bcs_50      Band 1 0.032775 0.072881 0.051152 0.012593
1222: bcs_50      Band 2 0.020029 0.085993 0.042864 0.018628
1223: bcs_50      Band 3 0.012770 0.034367 0.023056 0.006581
1224: bcs_50      Band 4 0.005804 0.024798 0.014049 0.005744
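The answer's code assumes every `ROI:` block is exactly seven lines long, that is, exactly four Band rows. If the number of Band rows varies (the motivation stated at the top of the question), one option is to scan for the rows instead of counting offsets. The following is an alternative sketch in base R, not the answerer's method: `parse_basic_stats()` is a hypothetical helper that assumes each `Basic Stats` header has an `ROI:` line somewhere above it, that its Band rows follow the header directly and end at the first line not starting with `Band N`, and that fields are separated by tabs or spaces.

```r
# Sketch: collect every 'Band N' row under each 'Basic Stats' header,
# however many bands there are, and tag it with the nearest ROI above.
# parse_basic_stats() is a hypothetical helper, not part of the answer.
parse_basic_stats <- function(dat) {
  roi_idx   <- grep('^ROI:', dat)                           # positions of 'ROI:' lines
  roi_names <- sub('^ROI: (\\w+).*$', '\\1', dat[roi_idx])  # e.g. "red_2"
  stats_idx <- grep('^Basic Stats', dat)                    # positions of the headers

  blocks <- lapply(stats_idx, function(s) {
    # advance past the run of 'Band N' lines directly below the header;
    # in the excerpt the run ends at the blank line before 'Histogram'
    i <- s + 1
    while (i <= length(dat) && grepl('^\\s*Band\\s+[0-9]+', dat[i])) i <- i + 1
    if (i == s + 1) return(NULL)                            # header with no Band rows
    band_lines <- dat[(s + 1):(i - 1)]

    # split on any run of tabs/spaces: "Band", "1", Min, Max, Mean, Stdev
    m <- do.call(rbind, strsplit(trimws(band_lines), '\\s+'))
    data.frame(
      ROI   = roi_names[findInterval(s, roi_idx)],           # last 'ROI:' line at or above s
      Band  = paste(m[, 1], m[, 2]),
      Min   = as.numeric(m[, 3]),
      Max   = as.numeric(m[, 4]),
      Mean  = as.numeric(m[, 5]),
      Stdev = as.numeric(m[, 6]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, blocks)
}
```

With the linked file, `DF <- parse_basic_stats(readLines('test2.txt'))` should return one row per Band row with columns ROI, Band, Min, Max, Mean and Stdev, as a plain `data.frame`. Matching each header to its ROI with `findInterval()` against the `ROI:` line positions avoids counting lines between the two, so extra Band rows or extra blank lines before the header do not shift the blocks.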