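I found oprofile (http://oprofile.sourceforge.net) to be a better option than gprof in this instance. The reports it produces (http://oprofile.sourceforge.net/examples/) are much more comprehensive. Using #ifndef PROFILE, I compiled out the Ruby parts of the C extension that were causing the segfaults (some of them, not all) and replaced them with non-Ruby code. I wrote a main() routine inside the extension itself to call the extension's functions, then set up a makefile that builds the extension as a plain C program with PROFILE defined. After that I installed oprofile on Ubuntu (http://lbrandy.com/blog/2008/11/oprofile-profiling-in-linux-for-fun-and-profit/) and wrote this script: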
#!/bin/bash
sudo opcontrol --reset
sudo opcontrol --start
./a.out Rome Damascus NewYork Delhi Bangalore
sudo opcontrol --shutdown
opreport -lt1
I compiled my program and ran the script above. The opreport command gave the following output:
...
...
Killing daemon.
warning: /no-vmlinux could not be found.
warning: [vdso] (tgid:10675 range:0x920000-0x921000) could not be found.
warning: [vdso] (tgid:1270 range:0xba1000-0xba2000) could not be found.
warning: [vdso] (tgid:1675 range:0x973000-0x974000) could not be found.
warning: [vdso] (tgid:1711 range:0x264000-0x265000) could not be found.
warning: [vdso] (tgid:1737 range:0x990000-0x991000) could not be found.
warning: [vdso] (tgid:2477 range:0xa53000-0xa54000) could not be found.
warning: [vdso] (tgid:5658 range:0x7ae000-0x7af000) could not be found.
CPU: Core Solo / Duo, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name         symbol name
12731    32.8949  a.out            levenshtein
11958    30.8976  a.out            corpora_pass2
5231     13.5161  no-vmlinux       /no-vmlinux
4021     10.3896  a.out            corpora_pass1
1733      4.4778  libc-2.10.1.so   /lib/tls/i686/cmov/libc-2.10.1.so
542       1.4004  ld-2.10.1.so     /lib/ld-2.10.1.so
398       1.0284  a.out            method_top_matches
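So there it is: the top consumer is the function levenshtein(). I then ran another command to generate disassembled output annotated with the source code and with the sample count and percentage for each line. It looks like this (the counts and percentages are to the left of each executed line):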
> opannotate --source --assembly ./a.out > report.as.handcoded.1
> cat report.as.handcoded.1
...
...
...
: __asm__ (
2 0.0069 : 804918a: mov -0x50(%ebp),%ecx
4 0.0137 : 804918d: mov -0x54(%ebp),%ebx
: 8049190: mov -0x4c(%ebp),%eax
12 0.0412 : 8049193: cmp %eax,%ecx
10 0.0344 : 8049195: cmovbe %ecx,%eax
8 0.0275 : 8049198: cmp %eax,%ebx
11 0.0378 : 804919a: cmovbe %ebx,%eax
16 0.0550 : 804919d: mov %eax,-0x4c(%ebp)
: "cmp %0, %2\n\t"
: "cmovbe %2, %0\n\t"
: : "+r"(a) :
: "%r"(b), "r"(c)
: );
: return a;
...
...
...
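
For anyone who wants to try the same setup, here is a minimal sketch of how the PROFILE guard described at the top might be laid out. It is not my actual extension code: levenshtein_sketch(), min3() and the argument handling in main() are hypothetical stand-ins, and the Ruby glue is only indicated by a comment. The idea is that building with -DPROFILE leaves out the Ruby side and gives you an ordinary a.out that oprofile can sample.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PROFILE
/* Normal build: the Ruby glue (#include <ruby.h>, the method wrappers and
 * the Init_ function) would live here. Omitted in this sketch. */
#endif

/* Smallest of three unsigned values (hypothetical helper). */
static unsigned min3(unsigned a, unsigned b, unsigned c)
{
    unsigned m = a < b ? a : b;
    return m < c ? m : c;
}

/* Plain C edit distance using two rows of the DP table.
 * Hypothetical stand-in for the extension's levenshtein().
 * Returns (unsigned)-1 if memory allocation fails. */
unsigned levenshtein_sketch(const char *s, const char *t)
{
    size_t n = strlen(s), m = strlen(t);
    unsigned *prev = malloc((m + 1) * sizeof *prev);
    unsigned *curr = malloc((m + 1) * sizeof *curr);
    unsigned result;

    if (!prev || !curr) {
        free(prev);
        free(curr);
        return (unsigned)-1;
    }
    for (size_t j = 0; j <= m; j++)
        prev[j] = (unsigned)j;          /* distance from "" to t[0..j) */
    for (size_t i = 1; i <= n; i++) {
        curr[0] = (unsigned)i;          /* distance from s[0..i) to "" */
        for (size_t j = 1; j <= m; j++) {
            unsigned cost = (s[i - 1] == t[j - 1]) ? 0u : 1u;
            curr[j] = min3(prev[j] + 1,          /* deletion */
                           curr[j - 1] + 1,      /* insertion */
                           prev[j - 1] + cost);  /* substitution */
        }
        unsigned *tmp = prev;           /* reuse the two rows */
        prev = curr;
        curr = tmp;
    }
    result = prev[m];
    free(prev);
    free(curr);
    return result;
}

#ifdef PROFILE
/* Profiling build only: drive the functions directly, without Ruby. */
int main(int argc, char **argv)
{
    for (int i = 1; i + 1 < argc; i++)
        printf("%s %s %u\n", argv[i], argv[i + 1],
               levenshtein_sketch(argv[i], argv[i + 1]));
    return 0;
}
#endif
```

Assuming the file is saved as sketch.c (a made-up name), something like gcc -g -DPROFILE sketch.c produces the a.out that the script above runs. Keeping debug info (-g) is what lets opannotate --source show source lines next to the assembly.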