Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JAVA使用堆外内存导致swap飙高 #29

Open
nereuschen opened this issue May 6, 2015 · 0 comments
Open

JAVA使用堆外内存导致swap飙高 #29

nereuschen opened this issue May 6, 2015 · 0 comments

Comments

@nereuschen
Copy link
Owner

近期遇到一个堆外内存导致swap飙高的问题,这类问题比较罕见,因此将整个排查过程记录下来了

现象描述

最近1周线上服务器时不时出现swap报警(swap超过内存10%时触发报警,内存是4G,因此swap超过400M会触发报警),每次都是童鞋们通过重启tomcat解决的;
但导致的根本原因是什么呢?必须找到根本原因才行,总是这么重启就有点low了

问题排查

于是找了1台占用了swap但还未触发报警的服务器进行了排查
以下是当时通过top命令观察到的结果

23:03:22 swap占用了354M的内存
top-swap

23:55:42 swap占用了398M的内存
top-swap-2

原因分析

到底是什么原因导致swap飙高呢?肯定是tomcat,因为每次重启tomcat就解决了;但根本原因是?

谁占用了swap

通过以下脚本 swap.sh

#!/bin/bash
# Get current swap usage for all running processes
# Erik Ljungstrom 27/05/2011
do_swap () {
SUM=0
OVERALL=0
for DIR in `find /proc/ -maxdepth 1 -type d | egrep "^/proc/[0-9]"` ; do
PID=`echo $DIR | cut -d / -f 3`
PROGNAME=`ps -p $PID -o comm --no-headers`
for SWAP in `grep Swap $DIR/smaps 2>/dev/null| awk '{ print $2 }'`
do
let SUM=$SUM+$SWAP
done
echo "PID=$PID - Swap used: $SUM - ($PROGNAME )"
let OVERALL=$OVERALL+$SUM
SUM=0

done
echo "Overall swap used: $OVERALL"
}
do_swap  |awk -F[\ \(] '{print $5,$1,$8}' | sort -n | tail -3

可以看出PID=19911这个进程使用了324M的swap

swap-sh

通过grep进程号19911可以看出确实是tomcat占用swap最多
grep-pid

进程19911占用总的物理内存是3.1G,java占用的堆内内存大小为2.78G,剩下的320M是堆外内存占用的

Max memory = [-Xmx] + [-XX:MaxPermSize] + number_of_threads * [-Xss]

2779M=2048M+268M+463*1M

sudo -u tomcat ./jinfo -flag MaxPermSize 19911
-XX:MaxPermSize=268435456
java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
     intx CompilerThreadStackSize                   = 0               {pd product}
     intx ThreadStackSize                           = 1024            {pd product}
     intx VMThreadStackSize                         = 1024            {pd product}
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
java -XX:+PrintFlagsFinal -version | grep -i permsize
    uintx AdaptivePermSizeWeight                    = 20              {product}
    uintx MaxPermSize                               = 85983232        {pd product}
    uintx PermSize                                  = 21757952        {pd product}
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

哪行代码占用了堆外内存

堆内内存溢出可以直接通过MAT分析堆信息就可以定位到具体的代码,但是对于堆外内存就必须通过BTrace来解决

google-perftools 定位类名和方法名

如何安装和使用google-perftools见这里
由于要启动google-perftools需要重启tomcat,所以重启tomcat后,PID从19911变成了9176

重启tomcat后,会自动生成heap文件,文件名的命名规范是gperf_pid.xxx.heap,所以我们只需要关注gperf_9176.*即可

[xxxx@xxxx   /home/xxx/logs]$ ll *.heap
…...
-rw-r--r-- 1 tomcat tomcat    5048 May  6 10:46 gperf_9171.0001.heap
-rw-r--r-- 1 tomcat tomcat    5036 May  6 10:46 gperf_9173.0001.heap
-rw-r--r-- 1 tomcat tomcat    5055 May  6 10:46 gperf_9174.0001.heap
-rw-r--r-- 1 tomcat tomcat    5352 May  6 10:46 gperf_9175.0001.heap
-rw-r--r-- 1 tomcat tomcat 1048563 May  6 10:46 gperf_9176.0001.heap
-rw-r--r-- 1 tomcat tomcat 1048564 May  6 10:46 gperf_9176.0002.heap
-rw-r--r-- 1 tomcat tomcat 1048563 May  6 10:47 gperf_9176.0003.heap
-rw-r--r-- 1 tomcat tomcat 1048565 May  6 10:47 gperf_9176.0004.heap
-rw-r--r-- 1 tomcat tomcat 1048574 May  6 10:49 gperf_9176.0005.heap
-rw-r--r-- 1 tomcat tomcat 1048574 May  6 10:50 gperf_9176.0006.heap
-rw-r--r-- 1 tomcat tomcat 1048568 May  6 10:51 gperf_9176.0007.heap
-rw-r--r-- 1 tomcat tomcat 1048572 May  6 10:53 gperf_9176.0008.heap
-rw-r--r-- 1 tomcat tomcat 1048564 May  6 10:55 gperf_9176.0009.heap
-rw-r--r-- 1 tomcat tomcat 1048560 May  6 10:58 gperf_9176.0010.heap
-rw-r--r-- 1 tomcat tomcat 1048563 May  6 11:00 gperf_9176.0011.heap
-rw-r--r-- 1 tomcat tomcat 1048564 May  6 11:03 gperf_9176.0012.heap
…...

分析heap文件

/home/google-perftools/bin/pprof  --text  /home/java  /home/logs/gperf_9176.0010.heap
Using local file /home/java.
Using local file /home/logs/gperf_9176.0010.heap.
Total: 186.4 MB
91.2 48.9% 48.9% 91.2 48.9% updatewindow
52.5 28.2% 77.1% 52.5 28.2% os::malloc
38.0 20.4% 97.4% 38.0 20.4% inflateInit2_
3.0 1.6% 99.0% 3.0 1.6% init
0.8 0.4% 99.5% 0.8 0.4% ObjectSynchronizer::omAlloc
0.4 0.2% 99.7% 0.4 0.2% readCEN
0.3 0.2% 99.9% 38.3 20.5% Java_java_util_zip_Inflater_init
0.1 0.1% 100.0% 0.1 0.1% _dl_allocate_tls
0.0 0.0% 100.0% 0.0 0.0% _dl_new_object
0.0 0.0% 100.0% 1.1 0.6% Thread::Thread
0.0 0.0% 100.0% 0.0 0.0% CollectedHeap::CollectedHeap
0.0 0.0% 100.0% 0.0 0.0% Events::init
0.0 0.0% 100.0% 0.4 0.2% ZIP_Put_In_Cache0
0.0 0.0% 100.0% 0.0 0.0% read_alias_file
0.0 0.0% 100.0% 0.0 0.0% _nl_intern_locale_data

可以看出是java.util.zip.Inflater的init()占用了比较多的内存

通过BTrace定位代码调用方

编写代码BtracerInflater.java对init方法进行拦截

import static com.sun.btrace.BTraceUtils.*;
import com.sun.btrace.annotations.*;

import java.nio.ByteBuffer;
import java.lang.Thread;

@BTrace public class BtracerInflater{
   @OnMethod(
      clazz="java.util.zip.Inflater",
      method="/.*/"
   )
   public static void traceCacheBlock(){
      println("Who call java.util.zip.Inflater's methods :");
     jstack();
   }
}

运行BTrace

[xxxx@l-xxx.xx.xx /home/xxx/btrace-bin/bin]$ sudo -u tomcat ./btrace -cp ../build 9176 BtracerInflater.java|more
Who call java.util.zip.Inflater's methods :
java.util.zip.Inflater.<init>(Inflater.java:102)
java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:76)
java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:90)
com.xxx.OrderDiffUtil.ungzip(OrderDiffUtil.java:54)
com.xxx.OrderDiffUtil.parse(OrderDiffUtil.java:32)
com.xxx.FaxOrderEventListener.takeSectionChangedInfo(FaxOrderEventListener.java:87)
com.xxx.FaxOrderEventListener.onMessage(FaxOrderEventListener.java:46)
.......

可以看出是OrderDiffUtil的ungzip()调用了java.util.zip.Inflater的init()
看看OrderDiffUtil.ungzip()

    private static String ungzip(String encodeJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(encodeJson.length() * 5);
        ByteArrayInputStream in = null;
        try {
            in = new ByteArrayInputStream(Base64.decode(encodeJson));
        } catch (UnsupportedEncodingException e) {
            return "{}";
        }
        try {
            GZIPInputStream gunzip = new GZIPInputStream(in);
            byte buffer[] = new byte[1024];
            int len = 0;
            while ((len = gunzip.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        } catch (IOException e) {
            return "{}";
        }
        try {
            return out.toString("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
        }
        return "{}";
    }

可见gunzip未被close
所以根本原因是未调用GZIPInputStream的close()关闭流导致堆外内存占用

总结

  • 堆外内存的问题使用google-prefile和BTrace定位问题非常方便
  • 堆外内存主要出现在使用DirectByteBuffer或者gzip不当时出现

参考资料

http://m.blog.csdn.net/blog/whuoyunshen88/19508075
http://itindex.net/detail/11709-perftools-内存-hbase
http://outofmemory.cn/code-snippet/1713/Btrace-usage-introduction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant