y**b Posts: 10166 | 1 What general-purpose methods and tools do people usually use for this?
I have a few questions:
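(1) PAPI_MEM_SCY (Cycles Stalled Waiting for Memory Access; its advantage is that it can be embedded in the code, which suits parallel computing especially well) does not seem to be supported on Sandy Bridge / Ivy Bridge / Haswell processors. Does that mean the PAPI-based tools TAU, PerfSuite, and HPCToolkit cannot do this kind of measurement either?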
(2) I ran a test with perf; the results are as follows:
perf stat -p 48382 sleep 10

 Performance counter stats for process id '48382':

       4821.604141 task-clock (msec)         #    0.963 CPUs utilized            [100.00%]
             1,218 context-switches          #    0.253 K/sec                    [100.00%]
                 0 cpu-migrations            #    0.000 K/sec                    [100.00%]
                 0 page-faults               #    0.000 K/sec
    17,312,623,873 cycles                    #    3.591 GHz                      [100.00%]
     5,783,328,106 stalled-cycles-frontend   #   33.41% frontend cycles idle     [100.00%]
     2,359,944,745 stalled-cycles-backend    #   13.63% backend cycles idle      [100.00%]
    27,153,618,219 instructions              #    1.57  insns per cycle
                                             #    0.21  stalled cycles per insn  [100.00%]
     4,263,391,770 branches                  #  884.227 M/sec                    [100.00%]
        27,273,889 branch-misses             #    0.64% of all branches

       5.004692107 seconds time elapsed
(3) Here IPC = 1.57, so CPI = 1/1.57 = 0.64;
stalled cycles per insn = 0.21, so the stalled fraction of cycles is 0.21/0.64 = 33%,
which matches the stalled-cycles-frontend figure above (33.41% frontend cycles idle).
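(4) Comparing the two,
front-end (fetch and decode phases) vs. back-end (execute),
which one better reflects the memory stall time to execution time ratio?
In other words, which metric should I use to describe it? I don't know much about processor pipeline structure.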
(5) Are these two values considered high?
33.41% frontend cycles idle
13.63% backend cycles idle
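
As a sanity check on the arithmetic in (3), here is a minimal Python sketch that recomputes the derived figures from the raw counts in the perf output in (2). The variable names are mine, not perf's. It also assumes that perf's "stalled cycles per insn" line is computed from the larger of the two stall counters (here the frontend one); I believe that is how perf derives it, but treat it as an assumption.

```python
# Raw counter values copied from the perf stat output in (2).
cycles = 17_312_623_873
instructions = 27_153_618_219
stalled_frontend = 5_783_328_106
stalled_backend = 2_359_944_745

# Instructions per cycle and its reciprocal, cycles per instruction.
ipc = instructions / cycles
cpi = cycles / instructions

# Fraction of all cycles in which the frontend / backend was reported idle.
frontend_idle = stalled_frontend / cycles
backend_idle = stalled_backend / cycles

# Assumption: perf uses the larger of the two stall counts for this line.
stalled_per_insn = max(stalled_frontend, stalled_backend) / instructions

# Dividing stalls-per-instruction by cycles-per-instruction cancels the
# instruction count, leaving stalled cycles / total cycles.
stalled_fraction = stalled_per_insn / cpi

print(f"IPC                     = {ipc:.2f}")
print(f"CPI                     = {cpi:.2f}")
print(f"stalled cycles per insn = {stalled_per_insn:.2f}")
print(f"frontend idle           = {frontend_idle:.2%}")
print(f"backend idle            = {backend_idle:.2%}")
print(f"stalled per insn / CPI  = {stalled_fraction:.2%}")
```

Under that assumption, 0.21/0.64 reproducing the frontend percentage is an algebraic identity rather than an independent measurement: the instruction count cancels, so the quotient is just stalled-cycles-frontend divided by cycles.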