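Alibaba Tech Editor's Note
Memory corruption bugs are expensive to analyze, and low-probability ones are harder still. This article analyzes and reconstructs, in detail, two memory-corruption cases triggered by the global symbol interposition mechanism of dynamic libraries (it's a feature, not a bug).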
1. Memory corruption is more than mischief
1.1. Heap memory management structures
1.2. Consequences of memory corruption
2. A small case
2.1. The crash stack does not reveal the cause
#0 0x00007f91396cd207 in raise () from /lib64/libc.so.6
#1 0x00007f91396ce8f8 in abort () from /lib64/libc.so.6
#2 0x00007f913970fd27 in __libc_message () from /lib64/libc.so.6
#3 0x00007f91397165d4 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f91397186cb in _int_free () from /lib64/libc.so.6
#5 0x00007f913ce85fa1 in Posxxx::releasexxx ()
#6 0x00007f913cdf53be in xxxProvider::~xxxProvider (this=0x151fbc0, __in_chrg=<optimized out>)
    at /root/workspace/feature/xxxProvider/xxxProvider.cpp:27
2.2. The valgrind report names the culprit
valgrind --tool=memcheck --leak-check=full --show-reachable=yes --trace-children=yes ./Map /data/ /short.loc 2>&1|tee valgrind.log
valgrind: m_mallocfree.c:305 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1360, hi = 3212836864.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.
==92== Invalid write of size 8
==92== at 0x50F317C: reset (xxx_define.h:380)
==92== by 0x50F317C: xxxInfo (xxx_define.h:299)
==92== by 0x50F317C: xxx::xxx::xxxData() (xxxDefineBase.cpp:4)
==92== by 0x6089286: ??? (in /root/workspace/test/sdk/xxxResim/libxxxSimulater.so)
==92== by 0x400F8F2: _dl_init (in /usr/lib64/ld-2.17.so)
==92== by 0x4001159: ??? (in /usr/lib64/ld-2.17.so)
==92== by 0x2: ???
2.3. Fixing the problem
2.4. Getting to the bottom of it
2.5. A deeper analysis
LD_DEBUG=files ./Map /data/ /data/short.loc
560: file=libxxxSDK.so [0]; needed by ./Map [0]
560: file=libxxxSDK.so [0]; generating link map
560: dynamic: 0x00007f53a0aa1a10 base: 0x00007f53a02d9000 size: 0x00000000007e1f90
560: entry: 0x00007f53a04a05b0 phdr: 0x00007f53a02d9040 phnum: 7
...
560: file=libxxxSimulater.so [0]; needed by ./Map [0]
560: file=libxxxSimulater.so [0]; generating link map
...
2.6. The truth comes out
2.7. Putting it to the test
// a1_def.h
#pragma once
#include <iostream>

struct A {
    A()
    {
        std::cout << "A() in a1_def.h" << std::endl;
    }
    ~A()
    {
        std::cout << "~A() in a1_def.h" << std::endl;
    }
    int a;
};
// a2_def.h
#pragma once
#include <iostream>

struct A {
    A()
    {
        std::cout << "A() in a2_def.h" << std::endl;
    }
    ~A()
    {
        std::cout << "~A() in a2_def.h" << std::endl;
    }
    int a[8]; // Note: deliberately larger than the A in a1_def.h (8 ints instead of 1)
};
void b1();
void b2();
// b1.cpp: sees the A from a1_def.h
#include <iostream>
#include "a1_def.h"

void b1()
{
    struct A a;
    std::cout << "b1(): sizeof A is: " << sizeof(a) << std::endl;
}
// b2.cpp: sees the A from a2_def.h
#include <iostream>
#include "a2_def.h"

void b2()
{
    struct A a;
    std::cout << "b2(): sizeof A is: " << sizeof(a) << std::endl;
}
g++ -fPIC -shared b1.cpp -o b1.so
g++ -fPIC -shared b2.cpp -o b2.so
// main.cpp
void b1();
void b2();

int main()
{
    b2();
    b1();
    return 0;
}
3. Déjà vu
3.1. A familiar recipe--a strange crash stack
3.2. The same flavor--global symbol interposition
3.3. The truth comes out, again
3.4. show me the code
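// A.cpp: built with -fshort-wchar, so wchar_t is 2 bytes here
#include <iostream>
#include <vector>
void funcA();
void funcA()
{
    std::cout << "funcA: size of wchar_t:" << sizeof(wchar_t) << std::endl;
    std::vector<wchar_t> words = {};
    size_t n = words.size();  // calls the instantiated std::vector<wchar_t>::size()
}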
g++ -fPIC -std=c++11 -fshort-wchar -shared -g A.cpp -o A.so
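// B.cpp: default build, so wchar_t is 4 bytes here
#include <iostream>
#include <vector>
void funcB();
void funcB()
{
    std::cout << "funcB: size of wchar_t:" << sizeof(wchar_t) << std::endl;
    std::vector<wchar_t> words = {};
    size_t n = words.size();  // calls the instantiated std::vector<wchar_t>::size()
}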
g++ -fPIC -std=c++11 -shared -g B.cpp -o B.so
[root@4bad734105ec stl_template_demo]# nm A.so |grep vector
000000000000134a W _ZNKSt6vectorIwSaIwEE4sizeEv
00000000000012ec W _ZNSt6vectorIwSaIwEEC1Ev
00000000000012ec W _ZNSt6vectorIwSaIwEEC2Ev
0000000000001306 W _ZNSt6vectorIwSaIwEED1Ev
0000000000001306 W _ZNSt6vectorIwSaIwEED2Ev
[root@4bad734105ec stl_template_demo]
std::vector<wchar_t, std::allocator<wchar_t> >::size() const
[root@4bad734105ec stl_template_demo]
std::vector<wchar_t, std::allocator<wchar_t> >::vector()
[root@4bad734105ec stl_template_demo]
std::vector<wchar_t, std::allocator<wchar_t> >::~vector()
[root@4bad734105ec stl_template_demo]
std::vector<wchar_t, std::allocator<wchar_t> >::~vector()
void funcA();
void funcB();

int main()
{
    funcA();
    funcB();
}
// 先链接A.so
g++ main.cpp A.so B.so -g -o main_A_link_first
// 先链接B.so
g++ main.cpp B.so A.so -g -o main_B_link_first
Starting program: /root/workspace/test/stl_template_demo/main_A_link_first
warning: Error disabling address space randomization: Operation not permitted
Breakpoint 1, main () at main.cpp:6
6 funcA();
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.3.x86_64 libgcc-4.8.5-36.el7_6.1.x86_64
(gdb) n
funcA: size of wchar_t:2
7 funcB();
(gdb) s
funcB () at B.cpp:7
7 std::cout << "funcB: size of wchar_t:" << sizeof(wchar_t) << std::endl;
(gdb) n
funcB: size of wchar_t:4
8 std::vector<wchar_t> words = {};
(gdb) s
std::vector<wchar_t, std::allocator<wchar_t> >::vector (this=0x7fff35804dd0) at /usr/include/c++/4.8.2/bits/stl_vector.h:249
249 : _Base() { }
(gdb) i register pc
pc 0x7f312b8f62f8 0x7f312b8f62f8 <std::vector<wchar_t, std::allocator<wchar_t> >::vector()+12>
(gdb) i symbol 0x7f312b8f62f8
std::vector<wchar_t, std::allocator<wchar_t> >::vector() + 12 in section .text of ./A.so
[root@4bad734105ec stl_template_demo]# ./main_A_link_first
funcA: size of wchar_t:2
funcB: size of wchar_t:4
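3.5. Recap
Our SDK exported more symbols than it needed to. A better practice is to compile with -fvisibility=hidden, so that all symbols are hidden by default, and then explicitly export only the symbols that make up the public interface.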
4. Summary & reflections
Inconsistency is the root of all evil.