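We can actually work directly with the .Rmd file rather than running a word count on the output LaTeX file.
This code takes a similar approach to the word count addin https://github.com/benmarwick/wordcountaddin that you mentioned, with the addition that any text between the tags <!---TC:ignore---> and <!---TC:endignore---> is excluded from the count: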
library(stringr)
library(tidyverse)

RmdWords <- function(file) {
  # Read the file and collapse it into a single string of text
  file_string <- file %>%
    readLines() %>%
    paste0(collapse = " ") %>%
    # Remove YAML header
    str_replace_all("^<--- .*?--- ", "") %>%
    str_replace_all("^--- .*?--- ", "") %>%
    # Remove code (fenced chunks first, then inline code)
    str_replace_all("```.*?```", "") %>%
    str_replace_all("`.*?`", "") %>%
    # Remove LaTeX (display math first, then inline math)
    str_replace_all("[^\\\\]\\$\\$.*?[^\\\\]\\$\\$", "") %>%
    str_replace_all("[^\\\\]\\$.*?[^\\\\]\\$", "") %>%
    # Delete text between the ignore tags
    str_replace_all("TC:ignore.*?TC:endignore", "") %>%
    # Replace punctuation with spaces, then collapse runs of whitespace
    # to a single space so that adjacent words stay separated
    str_replace_all("[[:punct:]]", " ") %>%
    str_replace_all("\\s+", " ") %>%
    str_replace_all("<", "") %>%
    str_replace_all(">", "")

  # Save several different results
  word_count <- str_count(file_string, "\\S+")
  char_count <- str_replace_all(string = file_string, " ", "") %>% str_count()

  return(list(num_words = word_count, num_char = char_count, word_list = file_string))
}
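To see what the tag-removal step does in isolation, here is a minimal sketch that applies the same non-greedy pattern to a single string. The sample text and the variable names (`sample_text`, `stripped`, `cleaned`, `n_words`) are made up for illustration and are not part of the original function; `str_squish()` is used here as a convenient stand-in for the whitespace cleanup.

```r
library(stringr)

# Hypothetical sample text, made up for illustration
sample_text <- "Keep this. <!---TC:ignore---> Drop this. <!---TC:endignore---> Keep this too."

# Non-greedy match: removes everything from "TC:ignore" up to the
# nearest following "TC:endignore"
stripped <- str_replace_all(sample_text, "TC:ignore.*?TC:endignore", "")

# Replace leftover punctuation and comment delimiters with spaces,
# then trim and collapse whitespace before counting words
cleaned <- stripped %>%
  str_replace_all("[[:punct:]]", " ") %>%
  str_replace_all("[<>]", " ") %>%
  str_squish()

n_words <- str_count(cleaned, "\\S+")
print(cleaned)
print(n_words)
```

Because the pattern is non-greedy (`.*?`), several ignore blocks in the same document are removed one by one, and the text between them is still counted.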
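The function returns a list with three items:
- num_words: the number of words in the file
- num_char: the number of characters, excluding spaces
- word_list: the cleaned text that was counted, as a single space-separated string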
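Since word_list holds the cleaned text as one string, it can be split into individual words if you need them. The sketch below uses a hypothetical stand-in (`result`) for the list returned by RmdWords(), so that it runs without an .Rmd file on disk; in practice you would use the actual return value instead.

```r
library(stringr)

# Hypothetical stand-in for the list returned by RmdWords()
result <- list(
  num_words = 7,
  num_char  = 34,
  word_list = "There are seven words with 34 characters"
)

# Split the cleaned text on whitespace to get one element per word
words <- str_split(result$word_list, "\\s+")[[1]]

# Cross-check the stored counts against the split words
counts_match <- length(words) == result$num_words &&
  sum(str_length(words)) == result$num_char
print(counts_match)

# Word frequencies, most frequent first
word_freq <- sort(table(str_to_lower(words)), decreasing = TRUE)
print(head(word_freq))
```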
If you want to display the results in the compiled report, you can use inline R code, as follows:
```{r}
words <- RmdWords("MWE.Rmd")
```
There are seven words with 34 characters.
<!-- TC:ignore -->
Words that I want to exclude but cannot,
because the comments do not appear in the `.tex` file.
<!-- TC:endignore -->
<!-- TC:ignore -->
Word Count: `r words$num_words` \newline
Character Count: `r words$num_char`
<!-- TC:endignore -->
Note: parts of the original script were adapted from http://www.questionflow.org/2017/10/13/how-to-scrape-pdf-and-rmd-to-get-inspiration/