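A modern x86_64 Linux with glibc detects that the CPU supports the AVX extension and switches many string functions from the generic implementation to AVX-optimized versions (with the help of ifunc dispatchers: 1, 2).
This feature can be good for performance, but it prevents several tools, such as valgrind (older libVEX versions, before valgrind-3.8) and gdb's `target record` (Reverse Execution), from working correctly (Ubuntu "Z" 17.04 beta, gdb 7.12.50.20170207-0ubuntu2, gcc 6.3.0-8ubuntu1 20170221, Ubuntu GLIBC 2.24-7ubuntu2):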
    $ cat a.c
    #include <string.h>
    #define N 1000
    int main(){
            char src[N], dst[N];
            memcpy(dst, src, N);
            return 0;
    }
    $ gcc a.c -o a -fno-builtin
    $ gdb -q ./a
    Reading symbols from ./a...(no debugging symbols found)...done.
    (gdb) start
    Temporary breakpoint 1 at 0x724
    Starting program: /home/user/src/a
    Temporary breakpoint 1, 0x0000555555554724 in main ()
    (gdb) record
    (gdb) c
    Continuing.
    Process record does not support instruction 0xc5 at address 0x7ffff7b60d31.
    Process record: failed to record execution log.
    Program stopped.
    __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:416
    416             VMOVU   (%rsi), %VEC(4)
    (gdb) x/i $pc
    => 0x7ffff7b60d31 <__memmove_avx_unaligned_erms+529>:   vmovdqu (%rsi),%ymm4
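gdb's `target record` stops with the error `Process record does not support instruction 0xc5` because it cannot record the AVX instruction (the same failure is sometimes reported in `_dl_runtime_resolve_avx`).
The workarounds proposed in https://sourceware.org/ml/gdb/2016-08/msg00028.html are to recompile glibc or to set `LD_BIND_NOW=1`, but a recompiled glibc still contains AVX code, and bind-now does not help here.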
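I have heard that glibc has the `/etc/ld.so.nohwcap` and `LD_HWCAP_MASK` configurations. Can they be used to disable ifunc dispatching to the AVX-optimized string functions in glibc?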
How does glibc (rtld?) detect AVX: with `cpuid`, with `/proc/cpuinfo` (probably not), or with the HWCAP aux vector entry (the command `LD_SHOW_AUXV=1 /bin/echo |grep HWCAP` prints `AT_HWCAP: bfebfbff`)?
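There does not seem to be a straightforward runtime method to patch the feature detection. The detection happens rather early in the dynamic linker (ld.so).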
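As for the `AT_HWCAP: bfebfbff` value quoted in the question: to my understanding, on x86 Linux this auxv entry carries the EDX output of `cpuid(eax=1)`, whereas the AVX flag is bit 28 of ECX, so the value cannot describe AVX at all. The sketch below decodes a handful of EDX bits from that value. The helper names and the small bit table are my own, and 64-bit little-endian auxv words are assumed.

```python
import struct

AT_NULL = 0    # terminator entry, from <elf.h>
AT_HWCAP = 16  # from <elf.h>

# A few CPUID(eax=1) EDX feature bits; deliberately incomplete.
EDX_BITS = {0: "fpu", 23: "mmx", 25: "sse", 26: "sse2"}


def parse_auxv(raw, word_fmt="<Q"):
    """Parse raw auxv bytes into {type: value}; 64-bit LE words are an assumption."""
    size = struct.calcsize(word_fmt)
    entries = {}
    for off in range(0, len(raw) - 2 * size + 1, 2 * size):
        (a_type,) = struct.unpack_from(word_fmt, raw, off)
        (a_val,) = struct.unpack_from(word_fmt, raw, off + size)
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


def decode_hwcap(hwcap):
    """Return the names of the known EDX bits that are set in hwcap."""
    return [name for bit, name in sorted(EDX_BITS.items()) if (hwcap >> bit) & 1]


def read_own_hwcap(path="/proc/self/auxv"):
    """Read AT_HWCAP of the current process (Linux only)."""
    with open(path, "rb") as f:
        return parse_auxv(f.read()).get(AT_HWCAP, 0)


# Decode the value quoted in the question.
print(decode_hwcap(0xBFEBFBFF))
```

None of the decoded bits is AVX, which is consistent with the detection being done through `cpuid` inside ld.so rather than through the aux vector.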
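Binary patching the linker seems to be the easiest method at the moment. @osgx described one method in which a jump is overwritten. Another approach is to fake the cpuid result. Normally `cpuid(eax=0)` returns the highest supported function in `eax`, while the manufacturer ID is returned in the registers ebx, ecx and edx. glibc 2.25 has this snippet in `sysdeps/x86/cpu-features.c`: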
    __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);
    /* This spells out "GenuineIntel".  */
    if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
      {
          /* feature detection for various Intel CPUs */
      }
    /* another case for AMD */
    else
      {
        kind = arch_kind_other;
        get_common_indeces (cpu_features, NULL, NULL, NULL, NULL);
      }
The `__cpuid` line translates to these instructions in `/lib/ld-linux-x86-64.so.2` (`/lib/ld-2.25.so`):
    172a8:       31 c0                   xor    eax,eax
    172aa:       c7 44 24 38 00 00 00    mov    DWORD PTR [rsp+0x38],0x0
    172b1:       00
    172b2:       c7 44 24 3c 00 00 00    mov    DWORD PTR [rsp+0x3c],0x0
    172b9:       00
    172ba:       0f a2                   cpuid
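So rather than patching branches, we can just as well change the `cpuid` into a `nop` instruction, which results in the last `else` branch being taken (because the registers will not contain "GenuineIntel"). Since `eax=0` initially, `cpu_features->max_cpuid` will also be 0, and the `if (cpu_features->max_cpuid >= 7)` check will be bypassed as well.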
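To make the reasoning above concrete, here is a small model of the vendor check. It is only a sketch: `vendor_string` and `classify` are names I made up, and the zeros used for the "cpuid replaced by nop" case stand in for whatever stale values the registers actually hold at that point.

```python
import struct


def vendor_string(ebx, ecx, edx):
    """Assemble the 12-byte vendor ID; CPUID(eax=0) spells it in EBX, EDX, ECX order."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii", errors="replace")


def classify(max_cpuid, ebx, ecx, edx):
    """Toy model of the branch selection: returns (branch, leaf 7 queried?)."""
    vendor = vendor_string(ebx, ecx, edx)
    if vendor == "GenuineIntel":
        branch = "intel"
    elif vendor == "AuthenticAMD":
        branch = "amd"
    else:
        branch = "other"  # corresponds to arch_kind_other in the snippet
    return branch, max_cpuid >= 7


# Constants from the glibc snippet spell out the Intel vendor ID.
print(vendor_string(0x756E6547, 0x6C65746E, 0x49656E69))

# With cpuid nopped out, eax stays 0 and the vendor registers hold no valid ID
# (zeros here are a placeholder assumption).
print(classify(0, 0, 0, 0))
```

With `max_cpuid` equal to 0 and no recognizable vendor, the model ends up in the generic branch and never reaches the leaf 7 query.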
Replacing `cpuid(eax=0)` with `nop` in the binary can be done with this utility (it works for both x86 and x86-64):
    #!/usr/bin/env python
    import re
    import sys
    infile, outfile = sys.argv[1:]
    d = open(infile, 'rb').read()
    # Match CPUID(eax=0), "xor eax,eax" followed closely by "cpuid"
    # and replace the "cpuid" (0f a2) with a two-byte nop (66 90).
    o = re.sub(b'(\x31\xc0.{0,32}?)\x0f\xa2', b'\\1\x66\x90', d)
    # Fail loudly if nothing was patched.
    assert d != o
    open(outfile, 'wb').write(o)
An equivalent Perl variant follows; `-0777` ensures that the file is read in one go instead of being split into records at line feeds:
    perl -0777 -pe 's/\x31\xc0.{0,32}?\K\x0f\xa2/\x66\x90/' < /lib/ld-linux-x86-64.so.2 > ld-linux-x86-64-patched.so.2
    # Verify result, should display "Success"
    cmp -s /lib/ld-linux-x86-64.so.2 ld-linux-x86-64-patched.so.2 && echo 'Not patched' || echo Success
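The `cmp` check only tells you that something changed. A slightly stricter check is sketched below: it confirms that every differing byte belongs to a `0f a2` that became `66 90`. The function names are hypothetical, and the sample bytes are taken from the disassembly listing earlier in this answer.

```python
def changed_offsets(original, patched):
    """Return the offsets at which two equal-length byte strings differ."""
    if len(original) != len(patched):
        raise ValueError("sizes differ; an in-place patch must keep the length")
    return [i for i, (a, b) in enumerate(zip(original, patched)) if a != b]


def only_cpuid_nopped(original, patched):
    """True if something changed and every change is `0f a2` -> `66 90`."""
    offsets = changed_offsets(original, patched)
    if not offsets or len(offsets) % 2:
        return False
    # Both bytes of the instruction change, so differences come in pairs.
    for start in offsets[::2]:
        if original[start:start + 2] != b"\x0f\xa2":
            return False
        if patched[start:start + 2] != b"\x66\x90":
            return False
    return True


# Bytes of the xor/mov/mov/cpuid sequence from the listing, before and after.
before = bytes.fromhex("31c0c744243800000000c744243c000000000fa2")
after = bytes.fromhex("31c0c744243800000000c744243c000000006690")
print(changed_offsets(before, after), only_cpuid_nopped(before, after))
```

For the real files, pass the contents of `/lib/ld-linux-x86-64.so.2` and `ld-linux-x86-64-patched.so.2`, both read in binary mode.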
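That was the easy part. Now, I did not want to replace the system-wide dynamic linker, but to run only one particular program with this linker. Sure, that can be done with `./ld-linux-x86-64-patched.so.2 ./a`, but the naive gdb invocations failed to set breakpoints: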
    $ gdb -q -ex "set exec-wrapper ./ld-linux-x86-64-patched.so.2" -ex start ./a
    Reading symbols from ./a...done.
    Temporary breakpoint 1 at 0x400502: file a.c, line 5.
    Starting program: /tmp/a
    During startup program exited normally.
    (gdb) quit
    $ gdb -q -ex start --args ./ld-linux-x86-64-patched.so.2 ./a
    Reading symbols from ./ld-linux-x86-64-patched.so.2...(no debugging symbols found)...done.
    Function "main" not defined.
    Temporary breakpoint 1 (main) pending.
    Starting program: /tmp/ld-linux-x86-64-patched.so.2 ./a
    [Inferior 1 (process 27418) exited normally]
    (gdb) quit
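A manual workaround is described in "How to debug program with custom elf interpreter?". It works, but unfortunately it is a manual procedure using `add-symbol-file`. It should be possible to automate it somewhat with GDB catchpoints, though.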
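An alternative approach that involves no binary patching is `LD_PRELOAD`ing a library that defines custom routines for `memcpy`, `memmove`, and so on. These then take precedence over the glibc routines. The full list of functions is available in `sysdeps/x86_64/multiarch/ifunc-impl-list.c`. Current HEAD has more symbols than the glibc 2.25 release; in total (`grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+' sysdeps/x86_64/multiarch/ifunc-impl-list.c`):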
memchr, memcmp, __memmove_chk, memmove, memrchr, __memset_chk, memset, rawmemchr, strlen, strnlen, stpncpy, stpcpy, strcasecmp, strcasecmp_l, strcat, strchr, strchrnul, strrchr, strcmp, strcpy, strcspn, strncasecmp, strncasecmp_l, strncat, strncpy, strpbrk, strspn, strstr, wcschr, wcsrchr, wcscpy, wcslen, wcsnlen, wmemchr, wmemcmp, wmemset, __memcpy_chk, memcpy, __mempcpy_chk, mempcpy, strncmp, __wmemset_chk.
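If `grep -P` is not available, the same extraction can be sketched in Python. The regular expression mirrors the one in the grep command above; `SAMPLE` is a made-up excerpt that only imitates the shape of `ifunc-impl-list.c` (the implementation names in it are placeholders), so run the function on the real file to get the actual list.

```python
import re

# Same pattern as the grep: capture the symbol name after "IFUNC_IMPL (i, name, ".
IFUNC_NAME = re.compile(r"IFUNC_IMPL \(i, name, ([^,]+)")


def ifunc_names(source_text):
    """Return the function names that have an ifunc implementation list."""
    return IFUNC_NAME.findall(source_text)


# Hypothetical excerpt, shaped like the real file but not copied from it.
SAMPLE = """
  IFUNC_IMPL (i, name, memchr,
              IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_placeholder))
  IFUNC_IMPL (i, name, __memmove_chk,
              IFUNC_IMPL_ADD (array, i, __memmove_chk, 1, __memmove_chk_placeholder))
"""

print(", ".join(ifunc_names(SAMPLE)))
```

The resulting names are the symbols a preload library would have to define to keep the AVX variants from being selected.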