Assigning an Rcpp object into an Rcpp List produces duplicates of the last element

2023-12-14

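I'm trying to take an Rcpp::CharacterMatrix and convert each row into its own element of an Rcpp::List.

However, the function I wrote for this behaves strangely: every entry of the list corresponds to the last row of the matrix. Why is that? Is this some pointer-related concept? Please explain.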

Function:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List char_expand_list(CharacterMatrix A) {
  CharacterVector B(A.ncol());

  List output;

  for(int i=0;i<A.nrow();i++) {
    for(int j=0;j<A.ncol();j++) {
      B[j] = A(i,j);
    }

    output.push_back(B);
  }

  return output;
}

Test matrix:

This is the matrix A passed to the function above.

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
mat
#     [,1] [,2] [,3]
# [1,] "a"  "a"  "a" 
# [2,] "b"  "b"  "b" 
# [3,] "c"  "c"  "c"

Output:

Given this matrix as input, the function above should return a list of the matrix rows, like this:

char_expand_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"

But instead, I get something different:

char_expand_list(mat)
# [[1]]
# [1] "c" "c" "c"
#
# [[2]]
# [1] "c" "c" "c"
#
# [[3]]
# [1] "c" "c" "c"

As you can see, the last matrix row ("c") is repeated in the first and second list elements as well. Why does this happen?

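What's happening here is largely a consequence of how Rcpp objects work. In particular, a CharacterVector acts as a pointer to a memory location (the underlying R object). Because that memory location is defined outside the for loop, the result is effectively a "global" pointer: each push_back(B) stores another reference to the same object rather than a snapshot of its current contents. So when B is updated inside the loop, every copy of B already stored in the Rcpp::List sees the update too. Hence the repeated row of "c" throughout the list.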
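
To make the aliasing concrete without needing R, here is a minimal standalone C++ analogy. It assumes, for illustration only, that an Rcpp vector behaves like a shared handle to its storage, which is modeled here with std::shared_ptr<std::vector<std::string>>; this is not how Rcpp is actually implemented. The hypothetical function expand_rows_aliased mirrors the loop in char_expand_list(): storing the handle copies the handle, not the data, so every entry ends up seeing the last row written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: copying the handle shares the data.
using Handle = std::shared_ptr<std::vector<std::string>>;
using Rows = std::vector<std::vector<std::string>>;

// Mirrors char_expand_list(): one buffer B, created once and reused for every row.
// Assumes all rows have the same length, as in a matrix.
std::vector<Handle> expand_rows_aliased(const Rows& rows) {
    std::vector<Handle> output;
    if (rows.empty()) return output;
    Handle B = std::make_shared<std::vector<std::string>>(rows[0].size());
    for (const auto& row : rows) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            (*B)[j] = row[j];  // overwrites the single shared buffer
        }
        output.push_back(B);   // stores another handle to the same buffer
    }
    return output;
}
```

Called on the 3×3 matrix from the question (as a vector of rows), all three returned handles point to the same buffer, and that buffer holds "c", "c", "c".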

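That being said, it is a very, very, very bad idea to use .push_back() on any Rcpp data type, because you end up copying to and from an ever-expanding object. The copying happens because an Rcpp data type wraps the underlying SEXP that controls the R object, and that object has to be recreated every time it grows. Instead, you should try one of the following approaches: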

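1. Rearrange the code so the Rcpp::CharacterVector is created inside the first for loop, and preallocate the Rcpp::List space.
2. Switch to using only C++ standard library objects and convert to the appropriate type at the end.
   - a std::list of std::vector<T>, with T being std::string
   - Rcpp::wrap(x) to return the correct object, or change the function's return type from Rcpp::List to std::list<std::vector<T> >.
3. Preallocate the Rcpp::List space and use std::vector<T> elements, with T being std::string.
4. Preallocate the Rcpp::List space and clone() the Rcpp object before storing it in the list.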
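
To see why growing a vector one element at a time is costly, here is a rough standalone model. It assumes that each push_back on a fixed-size vector allocates a new vector one element longer and copies every existing element across, which is how the recreation of the underlying R object is described above. GrowByOneVector and the two counting functions are hypothetical names for illustration, not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a fixed-size vector that can only "grow" by reallocating.
class GrowByOneVector {
public:
    void push_back(const std::string& value) {
        std::vector<std::string> bigger(data_.size() + 1);  // fresh allocation of size n + 1
        for (std::size_t k = 0; k < data_.size(); ++k) {
            bigger[k] = data_[k];                           // copy every old element
            ++copies_;
        }
        bigger.back() = value;
        ++copies_;
        data_.swap(bigger);                                 // old storage is discarded
    }
    std::size_t size() const { return data_.size(); }
    std::size_t copies() const { return copies_; }

private:
    std::vector<std::string> data_;
    std::size_t copies_ = 0;
};

// Element copies needed to build n elements by repeated push_back: 1 + 2 + ... + n.
std::size_t copies_with_push_back(std::size_t n) {
    GrowByOneVector v;
    for (std::size_t i = 0; i < n; ++i) v.push_back("x");
    return v.copies();
}

// Element writes needed when the size is known up front: exactly n.
std::size_t copies_with_preallocation(std::size_t n) {
    std::vector<std::string> v(n);
    std::size_t writes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = "x";
        ++writes;
    }
    return writes;
}
```

Under this model, building 1,000 elements takes 500,500 element copies with repeated push_back versus 1,000 writes with preallocation, i.e. quadratic versus linear work.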

Option 1

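Here we rearrange the function by moving the declaration of B into the first loop, preallocating the list space, and accessing the output list by index as usual.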

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
  Rcpp::List output(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {
    // Declared inside the loop, so each row gets its own newly allocated vector
    Rcpp::CharacterVector B(A.ncol());

    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    output[i] = B;
  }

  return output;
}

Option 2

Here we drop Rcpp::CharacterVector in favor of std::vector<std::string>, and replace Rcpp::List with std::list<std::vector<std::string> >. At the end, we convert the standard object to an Rcpp::List via Rcpp::wrap().

#include <Rcpp.h>
#include <list>
#include <string>
#include <vector>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
  std::vector<std::string> B(A.ncol());

  std::list<std::vector<std::string> > o;

  for(int i = 0; i < A.nrow(); i++) {
    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    o.push_back(B);  // std::list stores a copy of B, so reusing B is safe
  }

  return Rcpp::wrap(o);
}

Giving:

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
char_expand_std_to_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"
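
Option 2 works even though B is still declared outside the loop, because standard containers have value semantics: std::list::push_back stores a copy of B's current contents. The sketch below shows this with plain standard-library types only; expand_rows_by_value is a hypothetical name that mirrors the loop in char_expand_std_to_list().

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

using Rows = std::vector<std::vector<std::string>>;

// Same loop shape as char_expand_std_to_list(): B is declared once, outside the loop.
// Assumes all rows have the same length, as in a matrix.
std::list<std::vector<std::string>> expand_rows_by_value(const Rows& rows) {
    std::list<std::vector<std::string>> o;
    if (rows.empty()) return o;
    std::vector<std::string> B(rows[0].size());
    for (const auto& row : rows) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            B[j] = row[j];  // overwrite the reusable buffer
        }
        o.push_back(B);     // copies B's contents into a new list element
    }
    return o;
}
```

Each list element keeps the row it was copied from, so the result matches the expected "a", "b", "c" rows.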

Option 3

Alternatively, you can stick with an Rcpp::List, but declare the size it should expect up front and still use std::vector<T> elements.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
  std::vector<std::string> B(A.ncol());

  Rcpp::List o(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {
    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    o[i] = B;  // converted into a new R character vector on assignment
  }

  return o;
}

Option 4

Finally, with the space for the list predefined, you can explicitly clone the data on each iteration.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
  Rcpp::CharacterVector B(A.ncol());
  Rcpp::List output(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {

    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    output[i] = clone(B);  // deep copy, so later writes to B do not affect it
  }

  return output;
}
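
Option 4 keeps the single reused B but breaks the aliasing by storing a deep copy. Continuing the earlier analogy that an Rcpp vector behaves like a shared handle (an assumption for illustration), clone() corresponds to allocating new storage and copying the current contents. The hypothetical clone_handle below does that with std::make_shared, and expand_rows_cloned mirrors char_expand_list_clone().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Handle = std::shared_ptr<std::vector<std::string>>;
using Rows = std::vector<std::vector<std::string>>;

// Hypothetical analogue of clone(): new storage holding a copy of the data.
Handle clone_handle(const Handle& h) {
    return std::make_shared<std::vector<std::string>>(*h);
}

// Reuse one buffer B, but store a deep copy of it on every iteration.
// Assumes all rows have the same length, as in a matrix.
std::vector<Handle> expand_rows_cloned(const Rows& rows) {
    std::vector<Handle> output(rows.size());  // preallocated, like Rcpp::List output(A.nrow())
    if (rows.empty()) return output;
    Handle B = std::make_shared<std::vector<std::string>>(rows[0].size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        for (std::size_t j = 0; j < rows[i].size(); ++j) {
            (*B)[j] = rows[i][j];  // overwrite the shared buffer
        }
        output[i] = clone_handle(B);  // snapshot the current row into new storage
    }
    return output;
}
```

Because every entry owns separate storage, each one keeps the row that was current when it was cloned.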

Benchmark

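The benchmark results show that Option 1, with the rearrangement and preallocated space, performs best. A close second is Option 4, which clones each vector before saving it into the Rcpp::List. Note that on this 3×3 test matrix all four options run in a few microseconds, so the differences are small.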

library("microbenchmark")
library("ggplot2")

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))

micro_mat_to_list = 
  microbenchmark(char_expand_list_rearrange(mat),
                 char_expand_std_to_list(mat),
                 char_expand_list_vec(mat),
                 char_expand_list_clone(mat))
micro_mat_to_list
# Unit: microseconds
#                             expr   min     lq    mean median     uq    max neval
#  char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445  6.797   100
#     char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627   100
#        char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814   100
#      char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965   100
