Counting the instructions executed between two points in a program
Environment: a system with perf support
Installing on Ubuntu
apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
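Android
simpleperf is usually already available.
Analysis
Crash the program at each of the two points, record the instruction count up to each crash, and subtract the two counts to get the number of instructions executed in between. Each point is marked with a null-pointer write, which raises a segmentation fault: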
*(int*)nullptr = 0;
Example
Point 1
Code
#define SETPOINT(...) do { *(int*)nullptr = 0; } while(0)

int main() {
    SETPOINT(1);  // crash here, before the loop runs
    int a = 0;
    for (int i = 0; i < 500; i++) a += i;
    return 0;
}
Output
ch@ch-ubuntu:~/ch/perf_test/exe_count$ sudo perf stat -e instructions:u ./a.out
[sudo] password for ch:
./a.out: Segmentation fault

 Performance counter stats for './a.out':

            100064      instructions:u

       0.111183917 seconds time elapsed

       0.001217000 seconds user
       0.000000000 seconds sys

ch@ch-ubuntu:~/ch/perf_test/exe_count$ sudo perf stat -e instructions:u ./a.out
./a.out: Segmentation fault

 Performance counter stats for './a.out':

            100064      instructions:u

       0.104691299 seconds time elapsed

       0.000875000 seconds user
       0.000000000 seconds sys

ch@ch-ubuntu:~/ch/perf_test/exe_count$ sudo perf stat -e instructions:u ./a.out
./a.out: Segmentation fault

 Performance counter stats for './a.out':

            100064      instructions:u

       0.111463860 seconds time elapsed

       0.000931000 seconds user
       0.000000000 seconds sys
Point 2
Code
#define SETPOINT(...) do { *(int*)nullptr = 0; } while(0)

int main() {
    int a = 0;
    for (int i = 0; i < 500; i++) a += i;
    SETPOINT(2);  // crash here, after the loop has finished
    return 0;
}
Output
ch@ch-ubuntu:~/ch/perf_test/exe_count$ sudo perf stat -e instructions:u ./a.out
./a.out: Segmentation fault

 Performance counter stats for './a.out':

            102569      instructions:u

       0.105533002 seconds time elapsed

       0.000904000 seconds user
       0.000000000 seconds sys

ch@ch-ubuntu:~/ch/perf_test/exe_count$ sudo perf stat -e instructions:u ./a.out
./a.out: Segmentation fault

 Performance counter stats for './a.out':

            102567      instructions:u

       0.105150980 seconds time elapsed

       0.000876000 seconds user
       0.000000000 seconds sys

ch@ch-ubuntu:~/ch/perf_test/exe_count$ sudo perf stat -e instructions:u ./a.out
./a.out: Segmentation fault

 Performance counter stats for './a.out':

            102567      instructions:u

       0.103408851 seconds time elapsed

       0.000897000 seconds user
       0.000000000 seconds sys
Result analysis
102567 - 100064 == 2503
Code analysis
ch@ch-ubuntu:~/ch/perf_test/exe_count$ objdump --disassemble=main ./a.out

./a.out:     file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .plt.got:

Disassembly of section .text:

0000000000001129 <main>:
    1129:  f3 0f 1e fa             endbr64
    112d:  55                      push   %rbp
    112e:  48 89 e5                mov    %rsp,%rbp
    1131:  c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
    1138:  c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    113f:  eb 0a                   jmp    114b <main+0x22>
    1141:  8b 45 fc                mov    -0x4(%rbp),%eax
    1144:  01 45 f8                add    %eax,-0x8(%rbp)
    1147:  83 45 fc 01             addl   $0x1,-0x4(%rbp)
    114b:  81 7d fc f3 01 00 00    cmpl   $0x1f3,-0x4(%rbp)
    1152:  7e ed                   jle    1141 <main+0x18>
    1154:  b8 00 00 00 00          mov    $0x0,%eax
    1159:  c7 00 00 00 00 00       movl   $0x0,(%rax)
    115f:  b8 00 00 00 00          mov    $0x0,%eax
    1164:  5d                      pop    %rbp
    1165:  c3                      ret
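The loop spans addresses 1141 through 1152, which is 5 instructions in total. 5 * 500 = 2500, which is close to the measured 2503; a small discrepancy in the perf count is normal.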
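Conclusion
About 2500 instructions are executed between the two points. From the instruction count and the execution environment, the theoretical performance of that code can then be estimated.
This method is not suitable for code that interacts with many kinds of hardware. Syscalls are a further limitation: different syscalls take different amounts of time, so an instruction count alone does not predict their cost.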