一尘不染

硬件缓存事件和性能

linux

运行时,perf list我看到了一系列的 硬件缓存事件 ,如下所示:

$ perf list | grep 'cache event'
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-loads                                         [Hardware cache event]
  node-store-misses                                  [Hardware cache event]
  node-stores                                        [Hardware cache event]

这些事件似乎大多基于测试返回合理的值,但是我想知道如何确定将这些事件映射到系统上的硬件性能计数器事件?

也就是说,这些事件肯定是在Skylake CPU上使用一个或多个基础x86 PMU计数器实现的-但是我怎么知道哪个?

您可以查找/sys/devices/cpu/events其他硬件事件,但不能查找“硬件缓存事件”。


阅读 443

收藏
2020-06-03

共1个答案

一尘不染

用户@Margaret指出注释中的合理答案-阅读内核源代码以查看PMU事件的映射。

我们可以检查arch / x86 / events / intel /
core.c
中的事件定义。我实际上不知道这里的“核心”是否指的是Core体系结构,就大多数定义而言,这是最合适的-
但无论如何,这就是您要查看的文件。

关键部分是此部分,它定义了skl_hw_cache_event_ids

static __initconst const u64 skl_hw_cache_event_ids
                [PERF_COUNT_HW_CACHE_MAX]
                [PERF_COUNT_HW_CACHE_OP_MAX]
                [PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
 [ C(L1D ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x81d0,  /* MEM_INST_RETIRED.ALL_LOADS */
        [ C(RESULT_MISS)   ] = 0x151,   /* L1D.REPLACEMENT */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x82d0,  /* MEM_INST_RETIRED.ALL_STORES */
        [ C(RESULT_MISS)   ] = 0x0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0,
        [ C(RESULT_MISS)   ] = 0x0,
    },
},
...

解码嵌套初始化,你得到的L1D-dcahe-load对应MEM_INST_RETIRED.ALL_LOADL1-dcache-load- missesL1D.REPLACEMENT

我们可以用perf仔细检查一下:

$ ocperf stat -e mem_inst_retired.all_loads,L1-dcache-loads,l1d.replacement,L1-dcache-load-misses,L1-dcache-loads,mem_load_retired.l1_hit head -c100M /dev/zero > /dev/null

 Performance counter stats for 'head -c100M /dev/zero':

        11,587,793      mem_inst_retired_all_loads                                   
        11,587,793      L1-dcache-loads                                             
            20,233      l1d_replacement                                             
            20,233      L1-dcache-load-misses     #    0.17% of all L1-dcache hits  
        11,587,793      L1-dcache-loads                                             
        11,495,053      mem_load_retired_l1_hit

       0.024322360 seconds time elapsed

“硬件缓存”事件显示的值与使用我们检查源时猜测的基础PMU事件的值完全相同。

2020-06-03