We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
近期遇到一个堆外内存导致swap飙高的问题,这类问题比较罕见,因此将整个排查过程记录下来了
最近1周线上服务器时不时出现swap报警(swap超过内存10%时触发报警,内存是4G,因此swap超过400M会触发报警),每次都是童鞋们通过重启tomcat解决的; 但导致的根本原因是什么呢?必须找到根本原因才行,总是这么重启就有点low了
于是找了1台占用了swap但还未触发报警的服务器进行了排查 以下是当时通过top命令观察到的结果
23:03:22 swap占用了354M的内存
23:55:42 swap占用了398M的内存
到底是什么原因导致swap飙高呢?肯定是tomcat,因为每次重启tomcat就解决了;但根本原因是?
通过以下脚本 swap.sh
#!/bin/bash # Get current swap usage for all running processes # Erik Ljungstrom 27/05/2011 do_swap () { SUM=0 OVERALL=0 for DIR in `find /proc/ -maxdepth 1 -type d | egrep "^/proc/[0-9]"` ; do PID=`echo $DIR | cut -d / -f 3` PROGNAME=`ps -p $PID -o comm --no-headers` for SWAP in `grep Swap $DIR/smaps 2>/dev/null| awk '{ print $2 }'` do let SUM=$SUM+$SWAP done echo "PID=$PID - Swap used: $SUM - ($PROGNAME )" let OVERALL=$OVERALL+$SUM SUM=0 done echo "Overall swap used: $OVERALL" } do_swap |awk -F[\ \(] '{print $5,$1,$8}' | sort -n | tail -3
可以看出PID=19911这个进程使用了324M的swap
通过grep进程号19911可以看出确实是tomcat占用swap最多
进程19911占用总的物理内存是3.1G,java占用的堆内内存大小为2.78G,剩下的320M是堆外内存占用的
Max memory = [-Xmx] + [-XX:MaxPermSize] + number_of_threads * [-Xss]
2779M=2048M+268M+463*1M
sudo -u tomcat ./jinfo -flag MaxPermSize 19911 -XX:MaxPermSize=268435456
java -XX:+PrintFlagsFinal -version | grep ThreadStackSize intx CompilerThreadStackSize = 0 {pd product} intx ThreadStackSize = 1024 {pd product} intx VMThreadStackSize = 1024 {pd product} java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
java -XX:+PrintFlagsFinal -version | grep -i permsize uintx AdaptivePermSizeWeight = 20 {product} uintx MaxPermSize = 85983232 {pd product} uintx PermSize = 21757952 {pd product} java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
堆内内存溢出可以直接通过MAT分析堆信息就可以定位到具体的代码,但是对于堆外内存就必须通过BTrace来解决
如何安装和使用google-perftools见这里 由于要启动google-perftools需要重启tomcat,所以重启tomcat后,PID从19911变成了9176
重启tomcat后,会自动生成heap文件,文件名的命名规范是gperf_pid.xxx.heap,所以我们只需要关注gperf_9176.*即可
[xxxx@xxxx /home/xxx/logs]$ ll *.heap …... -rw-r--r-- 1 tomcat tomcat 5048 May 6 10:46 gperf_9171.0001.heap -rw-r--r-- 1 tomcat tomcat 5036 May 6 10:46 gperf_9173.0001.heap -rw-r--r-- 1 tomcat tomcat 5055 May 6 10:46 gperf_9174.0001.heap -rw-r--r-- 1 tomcat tomcat 5352 May 6 10:46 gperf_9175.0001.heap -rw-r--r-- 1 tomcat tomcat 1048563 May 6 10:46 gperf_9176.0001.heap -rw-r--r-- 1 tomcat tomcat 1048564 May 6 10:46 gperf_9176.0002.heap -rw-r--r-- 1 tomcat tomcat 1048563 May 6 10:47 gperf_9176.0003.heap -rw-r--r-- 1 tomcat tomcat 1048565 May 6 10:47 gperf_9176.0004.heap -rw-r--r-- 1 tomcat tomcat 1048574 May 6 10:49 gperf_9176.0005.heap -rw-r--r-- 1 tomcat tomcat 1048574 May 6 10:50 gperf_9176.0006.heap -rw-r--r-- 1 tomcat tomcat 1048568 May 6 10:51 gperf_9176.0007.heap -rw-r--r-- 1 tomcat tomcat 1048572 May 6 10:53 gperf_9176.0008.heap -rw-r--r-- 1 tomcat tomcat 1048564 May 6 10:55 gperf_9176.0009.heap -rw-r--r-- 1 tomcat tomcat 1048560 May 6 10:58 gperf_9176.0010.heap -rw-r--r-- 1 tomcat tomcat 1048563 May 6 11:00 gperf_9176.0011.heap -rw-r--r-- 1 tomcat tomcat 1048564 May 6 11:03 gperf_9176.0012.heap …...
分析heap文件
/home/google-perftools/bin/pprof --text /home/java /home/logs/gperf_9176.0010.heap Using local file /home/java. Using local file /home/logs/gperf_9176.0010.heap. Total: 186.4 MB 91.2 48.9% 48.9% 91.2 48.9% updatewindow 52.5 28.2% 77.1% 52.5 28.2% os::malloc 38.0 20.4% 97.4% 38.0 20.4% inflateInit2_ 3.0 1.6% 99.0% 3.0 1.6% init 0.8 0.4% 99.5% 0.8 0.4% ObjectSynchronizer::omAlloc 0.4 0.2% 99.7% 0.4 0.2% readCEN 0.3 0.2% 99.9% 38.3 20.5% Java_java_util_zip_Inflater_init 0.1 0.1% 100.0% 0.1 0.1% _dl_allocate_tls 0.0 0.0% 100.0% 0.0 0.0% _dl_new_object 0.0 0.0% 100.0% 1.1 0.6% Thread::Thread 0.0 0.0% 100.0% 0.0 0.0% CollectedHeap::CollectedHeap 0.0 0.0% 100.0% 0.0 0.0% Events::init 0.0 0.0% 100.0% 0.4 0.2% ZIP_Put_In_Cache0 0.0 0.0% 100.0% 0.0 0.0% read_alias_file 0.0 0.0% 100.0% 0.0 0.0% _nl_intern_locale_data
可以看出是java.util.zip.Inflater的init()占用了比较多的内存
编写代码BtracerInflater.java对init方法进行拦截
import static com.sun.btrace.BTraceUtils.*; import com.sun.btrace.annotations.*; import java.nio.ByteBuffer; import java.lang.Thread; @BTrace public class BtracerInflater{ @OnMethod( clazz="java.util.zip.Inflater", method="/.*/" ) public static void traceCacheBlock(){ println("Who call java.util.zip.Inflater's methods :"); jstack(); } }
运行BTrace
[xxxx@l-xxx.xx.xx /home/xxx/btrace-bin/bin]$ sudo -u tomcat ./btrace -cp ../build 9176 BtracerInflater.java|more Who call java.util.zip.Inflater's methods : java.util.zip.Inflater.<init>(Inflater.java:102) java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:76) java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:90) com.xxx.OrderDiffUtil.ungzip(OrderDiffUtil.java:54) com.xxx.OrderDiffUtil.parse(OrderDiffUtil.java:32) com.xxx.FaxOrderEventListener.takeSectionChangedInfo(FaxOrderEventListener.java:87) com.xxx.FaxOrderEventListener.onMessage(FaxOrderEventListener.java:46) .......
可以看出是OrderDiffUtil的ungzip()调用了java.util.zip.Inflater的init() 看看OrderDiffUtil.ungzip()
private static String ungzip(String encodeJson) { ByteArrayOutputStream out = new ByteArrayOutputStream(encodeJson.length() * 5); ByteArrayInputStream in = null; try { in = new ByteArrayInputStream(Base64.decode(encodeJson)); } catch (UnsupportedEncodingException e) { return "{}"; } try { GZIPInputStream gunzip = new GZIPInputStream(in); byte buffer[] = new byte[1024]; int len = 0; while ((len = gunzip.read(buffer)) != -1) { out.write(buffer, 0, len); } } catch (IOException e) { return "{}"; } try { return out.toString("ISO-8859-1"); } catch (UnsupportedEncodingException e) { } return "{}"; }
可见gunzip未被close 所以根本原因是未调用GZIPInputStream的close()关闭流导致堆外内存占用
http://m.blog.csdn.net/blog/whuoyunshen88/19508075 http://itindex.net/detail/11709-perftools-内存-hbase http://outofmemory.cn/code-snippet/1713/Btrace-usage-introduction
The text was updated successfully, but these errors were encountered:
No branches or pull requests
近期遇到一个堆外内存导致swap飙高的问题,这类问题比较罕见,因此将整个排查过程记录下来了
现象描述
最近1周线上服务器时不时出现swap报警(swap超过内存10%时触发报警,内存是4G,因此swap超过400M会触发报警),每次都是童鞋们通过重启tomcat解决的;
但导致的根本原因是什么呢?必须找到根本原因才行,总是这么重启就有点low了
问题排查
于是找了1台占用了swap但还未触发报警的服务器进行了排查
以下是当时通过top命令观察到的结果
23:03:22 swap占用了354M的内存
23:55:42 swap占用了398M的内存
原因分析
到底是什么原因导致swap飙高呢?肯定是tomcat,因为每次重启tomcat就解决了;但根本原因是?
谁占用了swap
通过以下脚本 swap.sh
可以看出PID=19911这个进程使用了324M的swap
通过grep进程号19911可以看出确实是tomcat占用swap最多
进程19911占用总的物理内存是3.1G,java占用的堆内内存大小为2.78G,剩下的320M是堆外内存占用的
2779M=2048M+268M+463*1M
哪行代码占用了堆外内存
堆内内存溢出可以直接通过MAT分析堆信息就可以定位到具体的代码,但是对于堆外内存就必须通过BTrace来解决
google-perftools 定位类名和方法名
如何安装和使用google-perftools见这里
由于要启动google-perftools需要重启tomcat,所以重启tomcat后,PID从19911变成了9176
重启tomcat后,会自动生成heap文件,文件名的命名规范是gperf_pid.xxx.heap,所以我们只需要关注gperf_9176.*即可
分析heap文件
可以看出是java.util.zip.Inflater的init()占用了比较多的内存
通过BTrace定位代码调用方
编写代码BtracerInflater.java对init方法进行拦截
运行BTrace
可以看出是OrderDiffUtil的ungzip()调用了java.util.zip.Inflater的init()
看看OrderDiffUtil.ungzip()
可见gunzip未被close
所以根本原因是未调用GZIPInputStream的close()关闭流导致堆外内存占用
总结
参考资料
http://m.blog.csdn.net/blog/whuoyunshen88/19508075
http://itindex.net/detail/11709-perftools-内存-hbase
http://outofmemory.cn/code-snippet/1713/Btrace-usage-introduction
The text was updated successfully, but these errors were encountered: