Understanding Java Phantom References from the Source Code

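References

Since JDK 1.2, Java has extended the concept of a reference and distinguishes four kinds: strong, soft, weak, and phantom references.

  • Strong reference: the ordinary reference, i.e. the reference assignments found throughout program code. As long as a strong reference to an object still exists, the garbage collector will never reclaim that object.
  • Soft reference: one level weaker than a strong reference. Before the system would throw an out-of-memory error, objects that are only softly reachable are included in the scope of garbage collection.
  • Weak reference: weaker than a soft reference. An object that is only weakly reachable survives only until the next garbage collection.
  • Phantom reference: the weakest kind, also called a "ghost reference". Whether an object has phantom references has no effect on its lifetime, and an object instance cannot be obtained through a phantom reference.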
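
The difference in what get() returns can be seen in a few lines. This is a minimal sketch; the class name ReferenceKindsDemo and the method describe are made up for illustration, and the referent is deliberately kept strongly reachable so that the result does not depend on garbage collection.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class ReferenceKindsDemo {

    /** Reports what get() returns for each kind while the referent is strongly held. */
    static String describe() {
        Object target = new Object();   // strong reference: keeps the object alive
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        // A phantom reference can only be created together with a queue.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        return "soft=" + (soft.get() == target)
                + " weak=" + (weak.get() == target)
                + " phantomIsNull=" + (phantom.get() == null);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Soft and weak references hand back the referent while it is alive; the phantom reference never does.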

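Phantom references

As noted above, the existence of a phantom reference has no effect on the lifetime of an object, and the object cannot be obtained through it.

So what is a phantom reference actually good for?

A phantom reference is created with java.lang.ref.PhantomReference.

Let's look at its Javadoc first: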

/**
 * Phantom reference objects, which are enqueued after the collector
 * determines that their referents may otherwise be reclaimed.  Phantom
 * references are most often used for scheduling pre-mortem cleanup actions in
 * a more flexible way than is possible with the Java finalization mechanism.
 *
 * <p> If the garbage collector determines at a certain point in time that the
 * referent of a phantom reference is <a
 * href="package-summary.html#reachability">phantom reachable</a>, then at that
 * time or at some later time it will enqueue the reference.
 *
 * <p> In order to ensure that a reclaimable object remains so, the referent of
 * a phantom reference may not be retrieved: The <code>get</code> method of a
 * phantom reference always returns <code>null</code>.
 *
 * <p> Unlike soft and weak references, phantom references are not
 * automatically cleared by the garbage collector as they are enqueued.  An
 * object that is reachable via phantom references will remain so until all
 * such references are cleared or themselves become unreachable.
 *
 * @author   Mark Reinhold
 * @since    1.2
 */

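Phantom reference objects are enqueued after the collector determines that their referents may otherwise be reclaimed. They are most often used to schedule cleanup actions before an object is reclaimed, in a more flexible way than the finalization mechanism allows.

If the garbage collector determines at some point that the referent of a phantom reference is phantom reachable, it enqueues the reference at that time or at some later time.

To ensure that a reclaimable object stays reclaimable, the referent cannot be retrieved through a phantom reference: its get method always returns null.

Unlike soft and weak references, phantom references are not cleared automatically by the garbage collector when they are enqueued. An object that is reachable through phantom references stays so until all such references are cleared or themselves become unreachable.

The comment on PhantomReference keeps referring to a queue. That queue is java.lang.ref.ReferenceQueue.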
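
The interplay between the two classes can be tried out directly. The following is a sketch only: PhantomEnqueueDemo and enqueuedAfterGc are illustrative names, and System.gc() is merely a hint to the JVM, so the loop retries a bounded number of times instead of assuming that a single call is enough.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical demo class, not part of the JDK.
final class PhantomEnqueueDemo {

    /** Returns true if the phantom reference appeared on its queue within the given attempts. */
    static boolean enqueuedAfterGc(int attempts) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);

        referent = null;                     // drop the only strong reference
        for (int i = 0; i < attempts; i++) {
            System.gc();                     // a request, not a guarantee
            Reference<?> polled = queue.remove(100);   // wait up to 100 ms
            if (polled == ref) {
                return true;                 // the collector found the referent phantom reachable
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("enqueued: " + enqueuedAfterGc(50));
    }
}
```

On a JVM that ignores explicit GC requests the method may return false; that reflects the JVM configuration, not the reference itself.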

ReferenceQueue

Let's now look closely at ReferenceQueue.

The simplest way to get to know a class is to read its Javadoc:

/**
 * Reference queues, to which registered reference objects are appended by the
 * garbage collector after the appropriate reachability changes are detected.
 *
 * @author   Mark Reinhold
 * @since    1.2
 */

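A reference queue is a queue to which the garbage collector appends registered reference objects after it detects the appropriate reachability changes.

Here is the enqueue code in ReferenceQueue: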

 boolean enqueue(Reference<? extends T> r) { /* Called only by Reference class */
        synchronized (lock) {
            // Check that since getting the lock this reference has not already been
            // enqueued (and even then removed)
            ReferenceQueue<?> queue = r.queue;
            if ((queue == NULL) || (queue == ENQUEUED)) {
                return false;
            }
            assert queue == this;
            r.queue = ENQUEUED;
            r.next = (head == null) ? r : head;
            head = r;
            queueLength++;
            if (r instanceof FinalReference) {
                sun.misc.VM.addFinalRefCount(1);
            }
            lock.notifyAll();
            return true;
        }
    }

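The enqueue code is fairly simple: it links the Reference passed in as the argument into the current queue. The following walks through how the references between the objects change when enqueue is called.

Before the reference is enqueued, its queue field should point to the ReferenceQueue object (assert queue == this).

In enqueue, the queue field of the reference is first set to ENQUEUED. Because the queue is still empty (head is null), both head and the reference's next field are set to the reference passed in, and finally queueLength is incremented by 1.

The ReferenceQueue now has length 1.

If another reference is enqueued, the queue already holds one element and head is not null, so the new reference's next pointer is set to the element that head currently points to, and head is then updated to the new reference.

At this point queueLength == 2.

So ReferenceQueue uses the next pointer in Reference to build a singly linked list. Because new elements are inserted at the head, and elements are also removed from the head, the most recently enqueued reference is the first one out: despite its name, the structure behaves as a last-in, first-out stack rather than a first-in, first-out queue.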
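
The order in which references leave the queue can be checked through the public API by enqueueing two references by hand with Reference.enqueue(). This is a sketch; QueueOrderDemo and pollOrder are illustrative names, and the observed order is a property of the implementation shown above, not something the API documentation promises.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class QueueOrderDemo {

    /** Enqueues "first" and then "second" and reports the order in which poll() returns them. */
    static String pollOrder() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object a = new Object();
        Object b = new Object();
        WeakReference<Object> first = new WeakReference<>(a, queue);
        WeakReference<Object> second = new WeakReference<>(b, queue);

        first.enqueue();    // becomes head
        second.enqueue();   // inserted in front of the previous head

        StringBuilder order = new StringBuilder();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            if (order.length() > 0) {
                order.append(' ');
            }
            order.append(r == first ? "first" : "second");
        }
        return order.toString();
    }

    public static void main(String[] args) {
        // With the head insertion shown above, the later reference comes out first.
        System.out.println(pollOrder());
    }
}
```

Code that drains a ReferenceQueue should therefore not rely on receiving references in the order in which they were enqueued.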

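During enqueueing, the following check ensures that only a reference that is registered with this queue, and not already enqueued, can enter the current ReferenceQueue instance: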

    ReferenceQueue<?> queue = r.queue;
    if ((queue == NULL) || (queue == ENQUEUED)) {
        return false;
    }
    assert queue == this;

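So how is a reference object registered with a reference queue?

The code already gives the answer: a reference counts as registered with a queue if its queue field holds a reference to that queue. That field is set when the queue is passed to the reference's constructor.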
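
Seen from the public API, registration simply means passing the queue to the constructor. The sketch below contrasts a reference created without a queue with one created with a queue; RegistrationDemo and its method names are made up for illustration.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class RegistrationDemo {

    /** A reference created without a queue is not registered, so enqueue() reports false. */
    static boolean enqueueWithoutQueue() {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);
        return ref.enqueue();
    }

    /** A reference created with a queue can be enqueued and then polled from that queue. */
    static boolean enqueueWithQueue() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);
        boolean accepted = ref.enqueue();
        return accepted && queue.poll() == ref;
    }

    public static void main(String[] args) {
        System.out.println("without queue: " + enqueueWithoutQueue());
        System.out.println("with queue:    " + enqueueWithQueue());
    }
}
```

There is no public method for attaching a queue to an existing reference; the choice is made once, at construction time.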

Reference

Now to the abstract class java.lang.ref.Reference. As usual, the Javadoc first:

/**
 * Abstract base class for reference objects.  This class defines the
 * operations common to all reference objects.  Because reference objects are
 * implemented in close cooperation with the garbage collector, this class may
 * not be subclassed directly.
 *
 * @author   Mark Reinhold
 * @since    1.2
 */

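This is the abstract base class for reference objects, and it defines the operations common to all of them. Because reference objects are implemented in close cooperation with the garbage collector, this class may not be subclassed directly.

According to the comments in the source, a Reference is in one of four internal states:

  • Active

    A newly created instance is Active.

  • Pending

    Waiting to be put on its reference queue. An instance that is not registered with a queue is never in this state.

  • Enqueued

    Already a member of its reference queue. Likewise, an instance that is not registered with a queue is never in this state.

  • Inactive

    The final state: once an instance becomes Inactive, its state never changes again.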
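
Part of this life cycle is visible through the public API. The sketch below drives a reference through Active, Enqueued, and Inactive by hand; the Pending state exists only between the collector and the Reference Handler thread and cannot be observed this way. StateDemo and observe are illustrative names.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of the JDK.
final class StateDemo {

    /** Records isEnqueued() in each observable state, then the result of a second enqueue(). */
    static String observe() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        StringBuilder log = new StringBuilder();
        log.append(ref.isEnqueued());               // Active: not on the queue yet
        ref.enqueue();                              // manual enqueue, skipping Pending
        log.append(',').append(ref.isEnqueued());   // Enqueued
        queue.poll();                               // removal makes the reference Inactive
        log.append(',').append(ref.isEnqueued());   // Inactive: no longer on the queue
        log.append(',').append(ref.enqueue());      // Inactive is final: cannot be enqueued again
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```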

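The four states change as follows: an Active reference becomes Pending if it is registered with a queue, or goes straight to Inactive if it is not; a Pending reference becomes Enqueued; and an Enqueued reference becomes Inactive once it is removed from the queue.

The transition from Pending to Enqueued is carried out by the background Reference Handler thread, so let's see what this thread does.

In Reference, the JDK starts the Reference Handler thread from a static initializer block: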

static {
        ThreadGroup tg = Thread.currentThread().getThreadGroup();
        for (ThreadGroup tgn = tg;
             tgn != null;
             tg = tgn, tgn = tg.getParent());
        Thread handler = new ReferenceHandler(tg, "Reference Handler");
        /* If there were a special system-only priority greater than
         * MAX_PRIORITY, it would be used here
         */
        handler.setPriority(Thread.MAX_PRIORITY);
        handler.setDaemon(true);
        handler.start();

        // provide access in SharedSecrets
        SharedSecrets.setJavaLangRefAccess(new JavaLangRefAccess() {
            @Override
            public boolean tryHandlePendingReference() {
                return tryHandlePending(false);
            }
        });
    }

As the code shows, the Reference Handler thread is a daemon thread with priority MAX_PRIORITY. ReferenceHandler is a private static nested class of Reference.
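
These properties can be checked from a running program by looking the thread up by name. This is a sketch that assumes a JVM exposing the thread under the name "Reference Handler", as in the source above; HandlerThreadDemo and findReferenceHandler are illustrative names, and the lookup returns null where no such thread is visible.

```java
// Hypothetical demo class, not part of the JDK.
final class HandlerThreadDemo {

    /** Returns the live thread named "Reference Handler", or null if none is visible. */
    static Thread findReferenceHandler() {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if ("Reference Handler".equals(t.getName())) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Thread handler = findReferenceHandler();
        if (handler == null) {
            System.out.println("no Reference Handler thread visible on this JVM");
        } else {
            System.out.println("daemon=" + handler.isDaemon()
                    + " priority=" + handler.getPriority());
        }
    }
}
```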

ReferenceHandler extends Thread, and its run method is an infinite loop:

private static class ReferenceHandler extends Thread {

        private static void ensureClassInitialized(Class<?> clazz) {
            try {
                Class.forName(clazz.getName(), true, clazz.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw (Error) new NoClassDefFoundError(e.getMessage()).initCause(e);
            }
        }

        static {
            // pre-load and initialize InterruptedException and Cleaner classes
            // so that we don't get into trouble later in the run loop if there's
            // memory shortage while loading/initializing them lazily.
            ensureClassInitialized(InterruptedException.class);
            ensureClassInitialized(Cleaner.class);
        }

        ReferenceHandler(ThreadGroup g, String name) {
            super(g, name);
        }

        public void run() {
            while (true) {
                tryHandlePending(true);
            }
        }
    }

To understand the logic of the Reference Handler thread, first look at a few key fields of Reference:

    /* When active:   next element in a discovered reference list maintained by GC (or this if last)
     *     pending:   next element in the pending list (or null if last)
     *   otherwise:   NULL
     */
    transient private Reference<T> discovered;  /* used by VM */

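discovered is a private transient field. Apart from resetting it to null in tryHandlePending, the Java code never assigns to it. The comment says it is used by the VM, which evidently sets it according to the state the reference is in.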

    /* Object used to synchronize with the garbage collector.  The collector
     * must acquire this lock at the beginning of each collection cycle.  It is
     * therefore critical that any code holding this lock complete as quickly
     * as possible, allocate no new objects, and avoid calling user code.
     */
    static private class Lock { }
    private static Lock lock = new Lock();

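lock is the object used to synchronize with the garbage collector. The collector must acquire this lock at the beginning of each collection cycle, so any code that holds it should finish as quickly as possible, allocate no new objects, and avoid calling user code.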

    /* List of References waiting to be enqueued.  The collector adds
     * References to this list, while the Reference-handler thread removes
     * them.  This list is protected by the above lock object. The
     * list uses the discovered field to link its elements.
     */
    private static Reference<Object> pending = null;

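pending is the list of references waiting to be enqueued; the references are chained to one another, which is why the comment calls it a list. The collector adds references to this list, and the Reference Handler thread removes them. The list is protected by the lock object above, and its elements are linked through the discovered field, not through next.

Now for the main logic of the Reference Handler thread. First, the Javadoc of tryHandlePending: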

    /**
     * Try handle pending {@link Reference} if there is one.<p>
     * Return {@code true} as a hint that there might be another
     * {@link Reference} pending or {@code false} when there are no more pending
     * {@link Reference}s at the moment and the program can do some other
     * useful work instead of looping.
     *
     * @param waitForNotify if {@code true} and there was no pending
     *                      {@link Reference}, wait until notified from VM
     *                      or interrupted; if {@code false}, return immediately
     *                      when there is no pending {@link Reference}.
     * @return {@code true} if there was a {@link Reference} pending and it
     *         was processed, or we waited for notification and either got it
     *         or thread was interrupted before being notified;
     *         {@code false} otherwise.
     */

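This method handles a pending reference if there is one. A return value of true is a hint that another reference might be pending; false means that there are no more pending references at the moment, so the caller can do other useful work instead of looping.

The main code is as follows (exception handling removed):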

    Reference<Object> r;
    Cleaner c;
    synchronized (lock) {
            if (pending != null) {
                r = pending;
                // 'instanceof' might throw OutOfMemoryError sometimes
                // so do this before un-linking 'r' from the 'pending' chain...
                c = r instanceof Cleaner ? (Cleaner) r : null;
                // unlink 'r' from 'pending' chain
                pending = r.discovered;
                r.discovered = null;
            } else {
                // The waiting on the lock may cause an OutOfMemoryError
                // because it may try to allocate exception objects.
                if (waitForNotify) {
                    lock.wait();
                }
                // retry if waited
                return waitForNotify;
            }
        }
    // ... exception handling omitted ...
    // Fast path for cleaners
    if (c != null) {
        c.clean();
        return true;
    }
    
    ReferenceQueue<? super Object> q = r.queue;
    if (q != ReferenceQueue.NULL) q.enqueue(r);
    return true;
    

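From the comments on pending and discovered, the state of the objects in memory at this point can be worked out.

pending is the head of the list of references waiting to be enqueued, and while a reference is in the Pending state its discovered field points to the next element of that list. The code takes the head r off the list and uses r.discovered to advance pending to the next element.

(Note that this linked list is the pending list, which is not the same thing as the ReferenceQueue discussed above.)

If r is not an instance of sun.misc.Cleaner, it is finally added to its reference queue, provided it was registered with one.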
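
The unlinking step can be modelled in isolation. The sketch below is a toy model, not JDK code: PendingListModel, Ref, discover, and drain are hypothetical names, strings stand in for reference objects, and letting the "collector" prepend to the list is an assumption made only to keep the model small.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the pending list; hypothetical, not part of the JDK.
final class PendingListModel {

    static final class Ref {
        final String name;
        Ref discovered;              // next element of the pending list, null if last
        Ref(String name) { this.name = name; }
    }

    private final Object lock = new Object();
    private Ref pending;             // head of the pending list
    private final List<String> enqueued = new ArrayList<>();

    /** Plays the role of the collector: links r in front of the pending list (assumed order). */
    void discover(Ref r) {
        synchronized (lock) {
            r.discovered = pending;
            pending = r;
        }
    }

    /** Non-blocking counterpart of tryHandlePending(false): handles at most one reference. */
    boolean tryHandlePending() {
        Ref r;
        synchronized (lock) {
            if (pending == null) {
                return false;        // nothing pending right now
            }
            r = pending;
            pending = r.discovered;  // advance the head
            r.discovered = null;     // unlink r from the chain
        }
        enqueued.add(r.name);        // stands in for q.enqueue(r), done outside the lock
        return true;
    }

    /** Handles pending references until none is left and returns the handling order. */
    List<String> drain() {
        while (tryHandlePending()) {
            // keep going while the hint says more may be pending
        }
        return enqueued;
    }

    public static void main(String[] args) {
        PendingListModel model = new PendingListModel();
        model.discover(new Ref("a"));
        model.discover(new Ref("b"));
        System.out.println(model.drain());
    }
}
```

As in the real method, the lock is held only for the pointer updates, and the enqueue step happens after the lock has been released.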

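At this point, PhantomReference and ReferenceQueue, introduced at the start of this article, have been tied together:

how a reference is created, registered with a reference queue, and finally added to that queue;

the four internal states of a reference;

the pending list and the processing logic of the background Reference Handler thread.

Finally, we also came across sun.misc.Cleaner. DirectByteBuffer uses a Cleaner to release its off-heap memory.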
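
The same idea can be sketched with nothing but PhantomReference and ReferenceQueue. This is not how sun.misc.Cleaner is implemented; PhantomCleanupSketch, CleanupRef, register, and drain are hypothetical names, and the caller is assumed to invoke drain() from time to time (a real design would typically use a dedicated thread).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of phantom-reference based cleanup; not the JDK's Cleaner.
final class PhantomCleanupSketch {

    /** A phantom reference that carries the action to run once its owner is gone. */
    static final class CleanupRef extends PhantomReference<Object> {
        private final Runnable action;

        CleanupRef(Object owner, ReferenceQueue<Object> queue, Runnable action) {
            super(owner, queue);
            this.action = action;    // must not capture the owner, or it never becomes unreachable
        }

        void runAction() {
            action.run();
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the reference objects themselves reachable until they have been processed.
    private final Set<CleanupRef> registered = ConcurrentHashMap.newKeySet();

    CleanupRef register(Object owner, Runnable action) {
        CleanupRef ref = new CleanupRef(owner, queue, action);
        registered.add(ref);
        return ref;
    }

    /** Runs the actions of all references currently on the queue; returns how many ran. */
    int drain() {
        int count = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            CleanupRef ref = (CleanupRef) r;
            registered.remove(ref);
            ref.clear();             // JDK 8 does not clear phantom references automatically
            ref.runAction();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        PhantomCleanupSketch cleaner = new PhantomCleanupSketch();
        Object owner = new Object();
        CleanupRef ref = cleaner.register(owner, () -> System.out.println("resource released"));
        ref.enqueue();               // simulates the collector so that the demo is deterministic
        System.out.println("actions run: " + cleaner.drain());
    }
}
```

In real use the collector, not a manual enqueue() call, puts the reference on the queue after the owner becomes phantom reachable.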

Reference material

《深入理解Java虚拟机》 (Understanding the Java Virtual Machine in Depth)

JDK 8 source code and comments