转载

关于JVM堆外内存的一切

Java中的对象都是在JVM堆中分配的，其好处在于开发者不用关心对象的回收。但有利必有弊，堆内内存主要有两个缺点：1.GC是有成本的，堆中的对象数量越多，GC的开销也会越大。2.使用堆内内存进行文件、网络的IO时，JVM会使用堆外内存做一次额外的中转，也就是会多一次内存拷贝。

和堆内内存相对应，堆外内存就是把内存对象分配在Java虚拟机堆以外的内存，这些内存直接受操作系统管理（而不是虚拟机），这样做的结果就是能够在一定程度上减少垃圾回收对应用程序造成的影响。

我们先看下堆外内存的实现原理，再谈谈它的应用场景。

更多文章见个人博客： github.com/farmerjohng…

堆外内存的实现

Java中分配堆外内存的方式有两种，一是通过 ByteBuffer.java#allocateDirect 得到以一个DirectByteBuffer对象，二是直接调用 Unsafe.java#allocateMemory 分配内存，但Unsafe只能在JDK的代码中调用，一般不会直接使用该方法分配内存。

其中DirectByteBuffer也是用Unsafe去实现内存分配的，对堆内存的分配、读写、回收都做了封装。本篇文章的内容也是分析DirectByteBuffer的实现。

我们从堆外内存的分配回收、读写两个角度去分析DirectByteBuffer。

堆外内存的分配与回收

//ByteBuffer.java 
public static ByteBuffer allocateDirect(int capacity) {
    return new DirectByteBuffer(capacity);
}
复制代码

ByteBuffer#allocateDirect 中仅仅是创建了一个DirectByteBuffer对象，重点在DirectByteBuffer的构造方法中。

DirectByteBuffer(int cap) {                   // package-private
    //主要是调用ByteBuffer的构造方法，为字段赋值
    super(-1, 0, cap, cap);
    //如果是按页对齐，则还要加一个Page的大小；我们分析只pa为false的情况就好了
    boolean pa = VM.isDirectMemoryPageAligned();
    int ps = Bits.pageSize();
    long size = Math.max(1L, (long)cap + (pa ? ps : 0));
    //尝试释放内存
    Bits.reserveMemory(size, cap);

    long base = 0;
    try {
        //分配内存
        base = unsafe.allocateMemory(size);
    } catch (OutOfMemoryError x) {
        Bits.unreserveMemory(size, cap);
        throw x;
    }
    //将分配的内存的所有值赋值为0
    unsafe.setMemory(base, size, (byte) 0);
    //为address赋值，address就是分配内存的起始地址，之后的数据读写都是以它作为基准
    if (pa && (base % ps != 0)) {
        // Round up to page boundary
        address = base + ps - (base & (ps - 1));
    } else {
        //pa为false的情况，address==base
        address = base;
    }
    //创建一个Cleaner，将this和一个Deallocator对象传进去
    cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
    att = null;

}
复制代码

DirectByteBuffer构造方法中还做了挺多事情的，总的来说分为几个步骤：

释放内存
分配内存
将刚分配的内存空间初始化为0
创建一个cleaner对象，Cleaner对象的作用是当DirectByteBuffer对象被回收时，释放其对应的堆外内存

Java的堆外内存回收设计是这样的：当GC发现DirectByteBuffer对象变成垃圾时，会调用 Cleaner#clean 回收对应的堆外内存，一定程度上防止了内存泄露。当然，也可以手动的调用该方法，对堆外内存进行提前回收。

Cleaner的实现

我们先看下 Cleaner#clean 的实现：

public class Cleaner extends PhantomReference<Object> {
   ...
    private Cleaner(Object referent, Runnable thunk) {
        super(referent, dummyQueue);
        this.thunk = thunk;
    }
    public void clean() {
        if (remove(this)) {
            try {
                //thunk是一个Deallocator对象
                this.thunk.run();
            } catch (final Throwable var2) {
              ...
            }

        }
    }
}

private static class Deallocator
    implements Runnable
    {

        private static Unsafe unsafe = Unsafe.getUnsafe();

        private long address;
        private long size;
        private int capacity;

        private Deallocator(long address, long size, int capacity) {
            assert (address != 0);
            this.address = address;
            this.size = size;
            this.capacity = capacity;
        }

        public void run() {
            if (address == 0) {
                // Paranoia
                return;
            }
            //调用unsafe方法回收堆外内存
            unsafe.freeMemory(address);
            address = 0;
            Bits.unreserveMemory(size, capacity);
        }

    }

复制代码

Cleaner继承自PhantomReference，关于虚引用的知识，可以看我之前写的文章

简单的说，就是当字段referent(也就是DirectByteBuffer对象)被回收时，会调用到 Cleaner#clean 方法，最终会调用到 Deallocator#run 进行堆外内存的回收。

Cleaner是虚引用在JDK中的一个典型应用场景。

释放内存

然后再看下DirectByteBuffer构造方法中的第二步， reserveMemory

static void reserveMemory(long size, int cap) {
        //maxMemory代表最大堆外内存，也就是-XX:MaxDirectMemorySize指定的值
        if (!memoryLimitSet && VM.isBooted()) {
            maxMemory = VM.maxDirectMemory();
            memoryLimitSet = true;
        }

        //1.如果堆外内存还有空间，则直接返回
        if (tryReserveMemory(size, cap)) {
            return;
        }
		//走到这里说明堆外内存剩余空间已经不足了
        final JavaLangRefAccess jlra = SharedSecrets.getJavaLangRefAccess();

        //2.堆外内存进行回收，最终会调用到Cleaner#clean的方法。如果目前没有堆外内存可以回收则跳过该循环
        while (jlra.tryHandlePendingReference()) {
            //如果空闲的内存足够了，则return
            if (tryReserveMemory(size, cap)) {
                return;
            }
        }

       //3.主动触发一次GC，目的是触发老年代GC
        System.gc();

        //4.重复上面的过程
        boolean interrupted = false;
        try {
            long sleepTime = 1;
            int sleeps = 0;
            while (true) {
                if (tryReserveMemory(size, cap)) {
                    return;
                }
                if (sleeps >= MAX_SLEEPS) {
                    break;
                }
                if (!jlra.tryHandlePendingReference()) {
                    try {
                        Thread.sleep(sleepTime);
                        sleepTime <<= 1;
                        sleeps++;
                    } catch (InterruptedException e) {
                        interrupted = true;
                    }
                }
            }

            //5.超出指定的次数后，还是没有足够内存，则抛异常
            throw new OutOfMemoryError("Direct buffer memory");

        } finally {
            if (interrupted) {
                // don't swallow interrupts
                Thread.currentThread().interrupt();
            }
        }
    }
	
    private static boolean tryReserveMemory(long size, int cap) {
        //size和cap主要是page对齐的区别，这里我们把这两个值看作是相等的
        long totalCap;
        //totalCapacity代表通过DirectByteBuffer分配的堆外内存的大小
        //当已分配大小<=还剩下的堆外内存大小时，更新totalCapacity的值返回true
        while (cap <= maxMemory - (totalCap = totalCapacity.get())) {
            if (totalCapacity.compareAndSet(totalCap, totalCap + cap)) {
                reservedMemory.addAndGet(size);
                count.incrementAndGet();
                return true;
            }
        }
		//堆外内存不足，返回false
        return false;
    }
复制代码

在创建一个新的DirecByteBuffer时，会先确认有没有足够的内存，如果没有的话，会通过一些手段回收一部分堆外内存，直到可用内存大于需要分配的内存。具体步骤如下：

tryHandlePendingReference
-XX:+DisableExplicitGC

详细分析下第2步是如何回收垃圾的： tryHandlePendingReference 最终调用到的是 Reference#tryHandlePending 方法，在之前的文章中有介绍过该方法

static boolean tryHandlePending(boolean waitForNotify) {
        Reference<Object> r;
        Cleaner c;
        try {
            synchronized (lock) {
                //pending由jvm gc时设置
                if (pending != null) {
                    r = pending;
                    // 如果是cleaner对象，则记录下来
                    c = r instanceof Cleaner ? (Cleaner) r : null;
                    // unlink 'r' from 'pending' chain
                    pending = r.discovered;
                    r.discovered = null;
                } else {
                    // waitForNotify传入的值为false
                    if (waitForNotify) {
                        lock.wait();
                    }
                    // 如果没有待回收的Reference对象，则返回false
                    return waitForNotify;
                }
            }
        } catch (OutOfMemoryError x) {
            ...
        } catch (InterruptedException x) {
           ...
        }

        // Fast path for cleaners
        if (c != null) {
            //调用clean方法
            c.clean();
            return true;
        }

        ...
        return true;
}
复制代码

可以看到， tryHandlePendingReference 的最终效果就是：如果有垃圾DirectBytebuffer对象，则调用对应的 Cleaner#clean 方法进行回收。clean方法在上面已经分析过了。

堆外内存的读写

public ByteBuffer put(byte x) {
       unsafe.putByte(ix(nextPutIndex()), ((x)));
       return this;
}

final int nextPutIndex() {                         
    if (position >= limit)
        throw new BufferOverflowException();
    return position++;
}

private long ix(int i) {
    return address + ((long)i << 0);
}

public byte get() {
    return ((unsafe.getByte(ix(nextGetIndex()))));
}

final int nextGetIndex() {                          // package-private
    if (position >= limit)
        throw new BufferUnderflowException();
    return position++;
}

复制代码

读写的逻辑也比较简单，address就是构造方法中分配的native内存的起始地址。Unsafe的putByte/getByte都是native方法，就是写入值到某个地址/获取某个地址的值。

堆外内存的使用场景

适合长期存在或能复用的场景

堆外内存分配回收也是有开销的，所以适合长期存在的对象

适合注重稳定的场景

堆外内存能有效避免因GC导致的暂停问题。

适合简单对象的存储

因为堆外内存只能存储字节数组，所以对于复杂的DTO对象，每次存储/读取都需要序列化/反序列化，

适合注重IO效率的场景

用堆外内存读写文件性能更好

文件IO

关于堆外内存IO为什么有更好的性能这点展开一下。

BIO

BIO的文件写 FileOutputStream#write 最终会调用到native层的 io_util.c#writeBytes 方法

void
writeBytes(JNIEnv *env, jobject this, jbyteArray bytes,
           jint off, jint len, jboolean append, jfieldID fid)
{
    jint n;
    char stackBuf[BUF_SIZE];
    char *buf = NULL;
    FD fd;

 	...

    // 如果写入长度为0，直接返回0
    if (len == 0) {
        return;
    } else if (len > BUF_SIZE) {
        // 如果写入长度大于BUF_SIZE（8192），无法使用栈空间buffer
        // 需要调用malloc在堆空间申请buffer
        buf = malloc(len);
        if (buf == NULL) {
            JNU_ThrowOutOfMemoryError(env, NULL);
            return;
        }
    } else {
        buf = stackBuf;
    }

    // 复制Java传入的byte数组数据到C空间的buffer中
    (*env)->GetByteArrayRegion(env, bytes, off, len, (jbyte *)buf);
 	
     if (!(*env)->ExceptionOccurred(env)) {
        off = 0;
        while (len > 0) {
            fd = GET_FD(this, fid);
            if (fd == -1) {
                JNU_ThrowIOException(env, "Stream Closed");
                break;
            }
            //写入到文件，这里传递的数组是我们新创建的buf
            if (append == JNI_TRUE) {
                n = (jint)IO_Append(fd, buf+off, len);
            } else {
                n = (jint)IO_Write(fd, buf+off, len);
            }
            if (n == JVM_IO_ERR) {
                JNU_ThrowIOExceptionWithLastError(env, "Write error");
                break;
            } else if (n == JVM_IO_INTR) {
                JNU_ThrowByName(env, "java/io/InterruptedIOException", NULL);
                break;
            }
            off += n;
            len -= n;
        }
    }
}

复制代码

GetByteArrayRegion 其实就是对数组进行了一份拷贝，该函数的实现在jni.cpp宏定义中，找了很久才找到

//jni.cpp
JNI_ENTRY(void, /
jni_Get##Result##ArrayRegion(JNIEnv *env, ElementType##Array array, jsize start, /
             jsize len, ElementType *buf)) /
 ...
      int sc = TypeArrayKlass::cast(src->klass())->log2_element_size(); /
      //内存拷贝
      memcpy((u_char*) buf, /
             (u_char*) src->Tag##_at_addr(start), /
             len << sc);                          /
...
  } /
JNI_END
复制代码

可以看到，传统的BIO，在native层真正写文件前，会在堆外内存（c分配的内存）中对字节数组拷贝一份，之后真正IO时，使用的是堆外的数组。要这样做的原因是

1.底层通过write、read、pwrite，pread函数进行系统调用时，需要传入buffer的起始地址和buffer count作为参数。如果使用java heap的话，我们知道jvm中buffer往往以byte[] 的形式存在，这是一个特殊的对象，由于java heap GC的存在，这里对象在堆中的位置往往会发生移动，移动后我们传入系统函数的地址参数就不是真正的buffer地址了，这样的话无论读写都会发生出错。而C Heap仅仅受Full GC的影响，相对来说地址稳定。

2.JVM规范中没有要求Java的byte[]必须是连续的内存空间，它往往受宿主语言的类型约束；而C Heap中我们分配的虚拟地址空间是可以连续的，而上述的系统调用要求我们使用连续的地址空间作为buffer。
复制代码

以上内容来自于知乎 ETIN的回答 www.zhihu.com/question/60…

BIO的文件读也一样，这里就不分析了。

NIO

NIO的文件写最终会调用到 IOUtil#write

static int write(FileDescriptor fd, ByteBuffer src, long position,
                     NativeDispatcher nd, Object lock)
        throws IOException
    {
    	//如果是堆外内存，则直接写
        if (src instanceof DirectBuffer)
            return writeFromNativeBuffer(fd, src, position, nd, lock);

        // Substitute a native buffer
        int pos = src.position();
        int lim = src.limit();
        assert (pos <= lim);
        int rem = (pos <= lim ? lim - pos : 0);
        //创建一块堆外内存，并将数据赋值到堆外内存中去
        ByteBuffer bb = Util.getTemporaryDirectBuffer(rem);
        try {
            bb.put(src);
            bb.flip();
            // Do not update src until we see how many bytes were written
            src.position(pos);

            int n = writeFromNativeBuffer(fd, bb, position, nd, lock);
            if (n > 0) {
                // now update src
                src.position(pos + n);
            }
            return n;
        } finally {
            Util.offerFirstTemporaryDirectBuffer(bb);
        }
    }
    
 	/**
     * 分配一片堆外内存
     */
    static ByteBuffer getTemporaryDirectBuffer(int size) {
        BufferCache cache = bufferCache.get();
        ByteBuffer buf = cache.get(size);
        if (buf != null) {
            return buf;
        } else {
            // No suitable buffer in the cache so we need to allocate a new
            // one. To avoid the cache growing then we remove the first
            // buffer from the cache and free it.
            if (!cache.isEmpty()) {
                buf = cache.removeFirst();
                free(buf);
            }
            return ByteBuffer.allocateDirect(size);
        }
    }

   
复制代码

可以看到，NIO的文件写，对于堆内内存来说也是会有一次额外的内存拷贝的。