I read that "ptmalloc2 became the default memory allocator on Linux because of its thread support." Is there any way to check this myself?
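glibc uses ptmalloc2 internally, and that is not a recent development. In any case, it isn't hard to check: run getconf GNU_LIBC_VERSION, then cross-check that version to see whether it uses ptmalloc2. But I bet you'd be wasting your time.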
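If you'd rather query the version from code than from the shell, here is a minimal C++ sketch. It assumes a glibc-based Linux system: gnu_get_libc_version() and confstr(_CS_GNU_LIBC_VERSION, ...) are glibc facilities (the latter returns the same kind of string that getconf GNU_LIBC_VERSION prints), and the helper names are purely illustrative. Note that this only tells you which glibc is loaded, not which allocator that glibc uses; you would still cross-check the version.

```cpp
#include <cstddef>
#include <string>
#include <unistd.h>            // confstr
#ifdef __GLIBC__
#include <gnu/libc-version.h>  // gnu_get_libc_version (glibc-only)
#endif

// Illustrative helper: true when compiled against glibc headers.
constexpr bool built_against_glibc() {
#ifdef __GLIBC__
    return true;
#else
    return false;
#endif
}

// Illustrative helper: version of the glibc loaded at run time (e.g. "2.23"),
// or an empty string when the program is not running on glibc.
std::string runtime_glibc_version() {
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return {};
#endif
}

// Illustrative helper: the string confstr reports for _CS_GNU_LIBC_VERSION
// (on glibc something like "glibc 2.23"), or empty if unavailable.
std::string confstr_libc_version() {
#if defined(__GLIBC__) && defined(_CS_GNU_LIBC_VERSION)
    std::size_t n = confstr(_CS_GNU_LIBC_VERSION, nullptr, 0);  // size incl. NUL
    if (n == 0) return {};
    std::string buf(n, '\0');
    confstr(_CS_GNU_LIBC_VERSION, &buf[0], n);
    buf.resize(n - 1);  // drop the terminating NUL counted by confstr
    return buf;
#else
    return {};
#endif
}
```

Printing runtime_glibc_version() from main gives you a version number to look up; the compile-time macros __GLIBC__ and __GLIBC_MINOR__ give the header version, which can differ from the library actually loaded.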
I'm asking because I don't seem to get any speedup from parallelizing the malloc loop in the code below.
I turned your example into an MCVE (http://coliru.stacked-crooked.com/a/5acfcc8825aaf79f; code omitted here for brevity) and compiled it with g++ -Wall -pedantic -O3 -pthread -fopenmp, using g++ 5.3.1.
Here are my results.
With OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 746 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 541 free_time = 267
parallelism = 3 itr = 10000000 malloc_time = 405 free_time = 259
parallelism = 4 itr = 10000000 malloc_time = 324 free_time = 221
parallelism = 5 itr = 10000000 malloc_time = 330 free_time = 242
parallelism = 6 itr = 10000000 malloc_time = 287 free_time = 244
parallelism = 7 itr = 10000000 malloc_time = 257 free_time = 226
parallelism = 8 itr = 10000000 malloc_time = 270 free_time = 225
parallelism = 9 itr = 10000000 malloc_time = 253 free_time = 225
parallelism = 10 itr = 10000000 malloc_time = 236 free_time = 226
parallelism = 11 itr = 10000000 malloc_time = 225 free_time = 239
parallelism = 12 itr = 10000000 malloc_time = 276 free_time = 258
parallelism = 13 itr = 10000000 malloc_time = 241 free_time = 228
parallelism = 14 itr = 10000000 malloc_time = 254 free_time = 225
parallelism = 15 itr = 10000000 malloc_time = 278 free_time = 272
parallelism = 16 itr = 10000000 malloc_time = 235 free_time = 220
23.87 user, 2.11 system, 0:10.41 elapsed, 249% CPU
Without OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 748 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 344 free_time = 256
parallelism = 3 itr = 10000000 malloc_time = 751 free_time = 254
parallelism = 4 itr = 10000000 malloc_time = 339 free_time = 262
parallelism = 5 itr = 10000000 malloc_time = 748 free_time = 253
parallelism = 6 itr = 10000000 malloc_time = 330 free_time = 256
parallelism = 7 itr = 10000000 malloc_time = 734 free_time = 260
parallelism = 8 itr = 10000000 malloc_time = 334 free_time = 259
parallelism = 9 itr = 10000000 malloc_time = 750 free_time = 256
parallelism = 10 itr = 10000000 malloc_time = 339 free_time = 255
parallelism = 11 itr = 10000000 malloc_time = 743 free_time = 267
parallelism = 12 itr = 10000000 malloc_time = 342 free_time = 261
parallelism = 13 itr = 10000000 malloc_time = 739 free_time = 252
parallelism = 14 itr = 10000000 malloc_time = 333 free_time = 252
parallelism = 15 itr = 10000000 malloc_time = 740 free_time = 252
parallelism = 16 itr = 10000000 malloc_time = 330 free_time = 252
13.38 user, 4.66 system, 0:18.08 elapsed, 99% CPU
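As a sanity check on the two summaries, assuming they come from GNU time (whose CPU percentage is computed as (user + system) / elapsed), the figures are internally consistent, and the elapsed times give the gap and the speedup directly:

$$
\text{CPU}_{\text{OpenMP}} = \frac{23.87 + 2.11}{10.41} \approx 2.496 \;(249\%),\qquad
\text{CPU}_{\text{serial}} = \frac{13.38 + 4.66}{18.08} \approx 0.998 \;(99\%)
$$

$$
18.08\,\text{s} - 10.41\,\text{s} = 7.67\,\text{s},\qquad \frac{18.08}{10.41} \approx 1.74
$$

In other words, the OpenMP run kept roughly two and a half cores busy on average, while the serial run used about one.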
Parallel looks roughly 8 seconds faster. Still not convinced? Fine. I went ahead and grabbed dlmalloc (https://github.com/ennorehling/dlmalloc) and ran make to produce libmalloc.a. My new command was:
g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc
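If you relink like this and want to confirm which allocator the binary actually ended up with, one hedged option is to ask the dynamic loader which loaded object contains malloc. The sketch below uses dladdr() from <dlfcn.h>, a GNU/BSD extension rather than ISO C++; malloc_provider is a hypothetical helper name. On glibc older than 2.34 you also need to link with -ldl, and in a non-PIE build the address of malloc can resolve to a stub inside the executable, so treat the answer as a hint rather than proof.

```cpp
#include <cstdlib>
#include <dlfcn.h>   // dladdr, Dl_info (g++ defines _GNU_SOURCE, which glibc needs here)
#include <string>

// Hypothetical helper: file name of the loaded object that contains the
// address `malloc` resolves to in this program, or empty if dladdr() fails.
std::string malloc_provider() {
    Dl_info info{};
    // Function-to-object pointer casts are conditionally supported; GCC allows them.
    void* addr = reinterpret_cast<void*>(&malloc);
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return {};
    }
    return info.dli_fname;
}
```

With the default link this would typically name the shared libc (e.g. a path ending in libc.so.6); if libmalloc.a defines malloc itself and is linked statically, the definition lives in the executable, so the executable's own name is what you would expect to see.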
With OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 814 free_time = 277
I CTRL-C'd after 37 seconds.
Without OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 772 free_time = 271
parallelism = 2 itr = 10000000 malloc_time = 780 free_time = 272
parallelism = 3 itr = 10000000 malloc_time = 783 free_time = 272
parallelism = 4 itr = 10000000 malloc_time = 792 free_time = 277
parallelism = 5 itr = 10000000 malloc_time = 813 free_time = 281
parallelism = 6 itr = 10000000 malloc_time = 800 free_time = 275
parallelism = 7 itr = 10000000 malloc_time = 795 free_time = 277
parallelism = 8 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 9 itr = 10000000 malloc_time = 788 free_time = 277
parallelism = 10 itr = 10000000 malloc_time = 784 free_time = 276
parallelism = 11 itr = 10000000 malloc_time = 786 free_time = 284
parallelism = 12 itr = 10000000 malloc_time = 807 free_time = 279
parallelism = 13 itr = 10000000 malloc_time = 791 free_time = 277
parallelism = 14 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 15 itr = 10000000 malloc_time = 785 free_time = 276
parallelism = 16 itr = 10000000 malloc_time = 787 free_time = 275
6.48 user, 11.27 system, 0:17.81 elapsed, 99% CPU
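That's a pretty significant difference. I suspect the problem lies in your more complex code, or that something is wrong with your benchmark.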
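To rule out the benchmark itself, it can help to time the allocation and free phases separately with a monotonic clock and keep any setup outside the timed regions. The following is a minimal sketch under those assumptions, not the code from the linked MCVE; bench_malloc_free and PhaseTimes are hypothetical names. Built with -fopenmp, each loop is split across `parallelism` threads; built without it, the pragmas are ignored (g++ -Wall warns about them) and both loops run on one thread.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical result type: phase times in milliseconds plus failed mallocs.
struct PhaseTimes {
    long long malloc_ms;
    long long free_ms;
    long failed;
};

// Hypothetical helper: allocate `itr` blocks of `size` bytes, then free them,
// timing each phase on its own.
PhaseTimes bench_malloc_free(int parallelism, long itr, std::size_t size) {
    (void)parallelism;  // unused when OpenMP is disabled
    // The pointer table is allocated before timing starts.
    std::vector<void*> ptrs(static_cast<std::size_t>(itr));
    long failed = 0;
    using Clock = std::chrono::steady_clock;

    const auto t0 = Clock::now();
    #pragma omp parallel for num_threads(parallelism) schedule(static) reduction(+ : failed)
    for (long i = 0; i < itr; ++i) {
        ptrs[i] = std::malloc(size);
        if (ptrs[i] == nullptr) ++failed;
    }
    const auto t1 = Clock::now();

    #pragma omp parallel for num_threads(parallelism) schedule(static)
    for (long i = 0; i < itr; ++i) {
        std::free(ptrs[i]);  // free(nullptr) is a no-op, so failed slots are safe
    }
    const auto t2 = Clock::now();

    using std::chrono::duration_cast;
    using std::chrono::milliseconds;
    return {duration_cast<milliseconds>(t1 - t0).count(),
            duration_cast<milliseconds>(t2 - t1).count(),
            failed};
}
```

Calling it for parallelism = 1 through 16 with itr = 10000000 and printing the fields side by side gives output comparable in shape to the tables above, while making it obvious if one phase, rather than the whole run, is where the time goes.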