Android exception: finalize() timed out after 10 seconds

This article is a set of notes based on lesson 2 of the course Android开发高手课.

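This article looks at a TimeoutException raised by the system's FinalizerWatchdogDaemon. The exception is reported when an object's finalize() method runs for more than 10 seconds. Before fixing the problem, it helps to understand what FinalizerWatchdogDaemon is: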

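FinalizerWatchdogDaemon is a nested class of Daemons and extends its Daemon base class. When an app is launched, Zygote forks a process, and the daemons are started as part of creating that child process. Creation has three steps: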

1. VM_HOOKS.preFork(): does the preparation work before the fork.

2. nativeForkAndSpecialize: the method that creates the child process.

3. VM_HOOKS.postForkCommon(): starts the four daemon threads, including FinalizerWatchdogDaemon.

public static int forkAndSpecialize(int uid, int gid, int[] gids, int runtimeFlags,
      int[][] rlimits, int mountExternal, String seInfo, String niceName, int[] fdsToClose,
      int[] fdsToIgnore, boolean startChildZygote, String instructionSet, String appDataDir) {
    VM_HOOKS.preFork();
    // Resets nice priority for zygote process.
    resetNicePriority();
    int pid = nativeForkAndSpecialize(
              uid, gid, gids, runtimeFlags, rlimits, mountExternal, seInfo, niceName, fdsToClose,
              fdsToIgnore, startChildZygote, instructionSet, appDataDir);
    // Enable tracing as soon as possible for the child process.
    if (pid == 0) {
        Trace.setTracingEnabled(true, runtimeFlags);
    
        // Note that this event ends at the end of handleChildProc,
        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "PostFork");
    }
    VM_HOOKS.postForkCommon();
    return pid;
    }

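 // step 1:
    // Stop the four daemon threads: heap task, reference queue, finalizer, and finalizer watchdog.
    // In other words, these threads must not be running while the child process is forked.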
    public void preFork() {
        Daemons.stop();
        waitUntilAllThreadsStopped();
        token = nativePreFork();
    }
    
    ....
    
    /**
     * Called by the zygote in both the parent and child processes after
     * every fork. In the child process, this method is called after
     * {@code postForkChild}.
     */
     // step 3: start the daemons
    public void postForkCommon() {
        Daemons.start();
    }

With the creation process in mind, here are the four daemon threads mentioned above:

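ReferenceQueueDaemon: the reference queue daemon. A reference object can be associated with a queue when it is created. When the object it refers to (the referent) is reclaimed by the GC, the reference object is added to that queue. ReferenceQueueDaemon performs this enqueue operation, which lets the application find out which referents have been reclaimed.
FinalizerDaemon: the finalizer daemon. Objects that override finalize() are not reclaimed immediately when the GC decides to collect them. They are placed in a queue, FinalizerDaemon calls their finalize() method, and only then are they reclaimed.
FinalizerWatchdogDaemon: the finalizer watchdog daemon. It monitors FinalizerDaemon. If it detects that a finalize() call has run longer than the allowed time, it makes the VM exit.
HeapTaskDaemon: the heap task daemon. It trims the heap, that is, it returns free heap memory to the system.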
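
The reference-queue behaviour described for ReferenceQueueDaemon can be observed from application code. The following is a minimal sketch for a plain JVM, where a different internal thread does the enqueueing; the class and method names are illustrative and not part of Android. Because System.gc() is only a hint, the unreachable case may or may not report true on a given run.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

final class ReferenceQueueSketch {
    // Holds a strong reference when the referent should stay reachable.
    private static Object strongHolder;

    /**
     * Creates a WeakReference registered with a queue, requests GC a few times,
     * and reports whether the reference object appeared on the queue.
     */
    static boolean enqueuedAfterGc(boolean keepReferentReachable) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);
        strongHolder = keepReferentReachable ? referent : null;
        referent = null; // drop the local strong reference
        try {
            for (int i = 0; i < 10; i++) {
                System.gc(); // only a hint; collection is not guaranteed
                Reference<?> polled = queue.remove(50); // wait up to 50 ms
                if (polled == ref) {
                    return true;
                }
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            strongHolder = null;
        }
    }

    public static void main(String[] args) {
        System.out.println("reachable referent enqueued: " + enqueuedAfterGc(true));
        System.out.println("unreachable referent enqueued: " + enqueuedAfterGc(false));
    }
}
```

A referent that is still strongly reachable is never enqueued, which is why the first call reports false.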

As you can see, FinalizerWatchdogDaemon mainly monitors how long finalize() takes. Here is its source:

 @Override public void runInternal() {
    while (isRunning()) {
        if (!sleepUntilNeeded()) {
            // We have been interrupted, need to see if this daemon has been stopped.
            continue;
        }
        final Object finalizing = waitForFinalization();
        if (finalizing != null && !VMRuntime.getRuntime().isDebuggerActive()) {
            finalizerTimedOut(finalizing);
            break;
        }
    }
}

The result of waitForFinalization is stored in finalizing. If it is not null (and no debugger is attached), finalizerTimedOut is called. First, waitForFinalization:

/**
* Return an object that took too long to finalize or return null.
* Wait MAX_FINALIZE_NANOS.  If the FinalizerDaemon took essentially the whole time
* processing a single reference, return that reference.  Otherwise return null.
*/
private Object waitForFinalization() {
  long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
  // Avoid remembering object being finalized, so as not to keep it alive.
  if (!sleepFor(MAX_FINALIZE_NANOS)) {
    // Don't report possibly spurious timeout if we are interrupted.
    return null;
  }
  if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
    // ...
    Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
    sleepFor(NANOS_PER_SECOND / 2);
    //...
    if (getNeedToWork()
        && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
      return finalizing;
    }
  }
  return null;
}
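
The core of the check above is "snapshot a progress counter, wait, and see whether it moved while work was pending". The sketch below isolates that idea in plain Java. WatchdogSketch and stalled are hypothetical names, the wait is in milliseconds rather than MAX_FINALIZE_NANOS, and the second half-second re-check of the real method is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class WatchdogSketch {
    /**
     * Returns true when work is pending and the progress counter did not move
     * during the whole wait, which is the condition treated as a timeout.
     */
    static boolean stalled(AtomicInteger progressCounter, AtomicBoolean needToWork, long waitMillis) {
        int startCount = progressCounter.get();
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // like the original, do not report a timeout when interrupted
        }
        return needToWork.get() && progressCounter.get() == startCount;
    }

    public static void main(String[] args) {
        AtomicBoolean pending = new AtomicBoolean(true);

        // A stuck finalizer: work is pending and the counter never moves.
        System.out.println("stuck: " + stalled(new AtomicInteger(), pending, 200));

        // A healthy finalizer: another thread bumps the counter during the wait.
        AtomicInteger progress = new AtomicInteger();
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                return;
            }
            progress.incrementAndGet();
        });
        worker.start();
        System.out.println("healthy: " + stalled(progress, pending, 200));
    }
}
```

Comparing counters instead of remembering the object being finalized matches the comment in the original code: the watchdog avoids holding a reference that would keep the object alive.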

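As the method's comment says, if FinalizerDaemon spends the whole MAX_FINALIZE_NANOS window (10 s) on a single object, the method returns that object, read from FinalizerDaemon.INSTANCE.finalizingObject; otherwise it returns null. As noted above, a non-null return value leads to a call to finalizerTimedOut: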

private static void finalizerTimedOut(Object object) {
    // The current object has exceeded the finalization deadline; abort!
    String message = object.getClass().getName() + ".finalize() timed out after "
            + (MAX_FINALIZE_NANOS / NANOS_PER_SECOND) + " seconds";
    Exception syntheticException = new TimeoutException(message);
    // We use the stack from where finalize() was running to show where it was stuck.
    syntheticException.setStackTrace(FinalizerDaemon.INSTANCE.getStackTrace());

    // Send SIGQUIT to get native stack traces.
    try {
        Os.kill(Os.getpid(), OsConstants.SIGQUIT);
        // Sleep a few seconds to let the stack traces print.
        Thread.sleep(5000);
    } catch (Exception e) {
        System.logE("failed to send SIGQUIT", e);
    } catch (OutOfMemoryError ignored) {
        // May occur while trying to allocate the exception.
    }

    //...
    if (Thread.getUncaughtExceptionPreHandler() == null &&
            Thread.getDefaultUncaughtExceptionHandler() == null) {
        // If we have no handler, log and exit.
        System.logE(message, syntheticException);
        System.exit(2);
    }

    // Otherwise call the handler to do crash reporting.
    // We don't just throw because we're not the thread that
    // timed out; we're the thread that detected it.
    Thread.currentThread().dispatchUncaughtException(syntheticException);
}
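
Because finalizerTimedOut dispatches the exception from the watchdog thread, an app's default uncaught exception handler receives a TimeoutException whose message follows the format built above. The sketch below shows how a crash reporter might recognise it. FinalizerTimeoutDetector is a hypothetical helper, and the thread name "FinalizerWatchdogDaemon" is an assumption to verify on the Android versions you target.

```java
import java.util.concurrent.TimeoutException;

final class FinalizerTimeoutDetector {
    // Assumed name of the watchdog thread; verify on the target Android versions.
    static final String WATCHDOG_THREAD_NAME = "FinalizerWatchdogDaemon";

    /** Returns true if the uncaught error looks like the watchdog's synthetic exception. */
    static boolean isFinalizerTimeout(Thread thread, Throwable error) {
        return error instanceof TimeoutException
                && thread != null
                && WATCHDOG_THREAD_NAME.equals(thread.getName())
                && error.getMessage() != null
                && error.getMessage().contains(".finalize() timed out after ");
    }

    /** Installs a handler that tags these crashes, then delegates to the previous handler. */
    static void install() {
        final Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (isFinalizerTimeout(thread, error)) {
                System.err.println("finalizer watchdog timeout: " + error.getMessage());
            }
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        });
    }
}
```

Note that the stack trace on the exception comes from FinalizerDaemon, while the thread passed to the handler is the watchdog, so the check relies on the thread name and message rather than the trace.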

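finalizerTimedOut builds a TimeoutException and hands it to the uncaught exception handler; if no handler is installed, it logs the error and exits with System.exit(2). That exit code is uncommon in everyday code, where System.exit(0) is the usual call. So what does the argument to exit mean?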

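For System.exit(int code), 0 means a normal exit; any other value signals that the program exited because of an error or exception.

1-127: by convention, codes chosen by the program itself.

128-255: by shell convention, 128 + N means the process was terminated by signal N, for example SIGSEGV or SIGTERM.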
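
The ranges above can be turned into a small lookup. This is a sketch of the common shell convention, not a rule enforced by Java or Android; ExitStatusSketch and describe are hypothetical names.

```java
final class ExitStatusSketch {
    /** Describes an exit status using the usual shell convention. */
    static String describe(int status) {
        if (status == 0) {
            return "success";
        }
        if (status > 128 && status <= 255) {
            // 128 + N: the process was terminated by signal N.
            return "terminated by signal " + (status - 128);
        }
        return "application-defined failure code " + status;
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
        System.out.println(describe(2));   // the code used by finalizerTimedOut
        System.out.println(describe(139)); // 128 + 11, where 11 is SIGSEGV on Linux
    }
}
```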

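Back to the TimeoutException. The analysis above shows where the exception originates, so the fix is to keep that method from running, in other words to stop FinalizerWatchdogDaemon. It is essentially a thread, and its parent class provides a stop method, so the first idea is to reach the class through reflection and call stop: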

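import java.lang.reflect.Field;
import java.lang.reflect.Method;
// Look up the hidden watchdog class and its singleton instance.
final Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
final Field field = clazz.getDeclaredField("INSTANCE");
field.setAccessible(true);
final Object watchdog = field.get(null); // static field, so no receiver is needed
// stop() is declared on the Daemon superclass.
final Method method = clazz.getSuperclass().getDeclaredMethod("stop");
method.setAccessible(true);
method.invoke(watchdog);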

This looks fine, but on systems below Android 6.0 it can run into thread synchronization problems. To see why, compare the source for Android 6.0 and later with that of Android 5.1:

Android 7.0:

public void stop() {
  Thread threadToStop;
  synchronized (this) {
    threadToStop = thread;
    thread = null;
  }
  if (threadToStop == null) {
    throw new IllegalStateException("not running");
  }
  interrupt(threadToStop);
  while (true) {
    try {
      threadToStop.join();
      return;
    } catch (InterruptedException ignored) {
    } catch (OutOfMemoryError ignored) {
      // An OOME may be thrown if allocating the InterruptedException failed.
    }
  }
}

Android 5.1

public void stop() {
  Thread threadToStop;
  synchronized (this) {
    threadToStop = thread;
    thread = null;
  }
  if (threadToStop == null) {
    throw new IllegalStateException("not running");
  }
  threadToStop.interrupt();
  while (true) {
    try {
      threadToStop.join();
      return;
    } catch (InterruptedException ignored) {
    }
  }
}

The comparison shows that Android 6.0 and later interrupt the thread through the method interrupt(threadToStop), while Android 5.1 calls Thread.interrupt directly. Here is the interrupt method:

 public synchronized void interrupt(Thread thread) {
   if (thread == null) {
     throw new IllegalStateException("not running");
   }
   thread.interrupt();
 }

This shows that below Android 6.0 the interrupt is not synchronized, so problems may occur under concurrent access. For that reason, the demo takes a different approach:

final Field thread = clazz.getSuperclass().getDeclaredField("thread");
thread.setAccessible(true);
thread.set(watchdog, null);

It sets the daemon's thread field directly to null. The runInternal method of FinalizerWatchdogDaemon loops on:

while(isRunning()){
	//...
}

protected synchronized boolean isRunning() {
  return thread != null;
}

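When thread is null, isRunning() returns false and the while loop exits, which has the same effect as calling stop. This turns off the 10-second finalize watchdog and so avoids the TimeoutException.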
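
The effect of nulling the field can be reproduced off-device with a stand-in class. In the sketch below, FakeDaemon is a hypothetical imitation of the Daemon base class, not the real Android code; it only shows that clearing a private thread field through reflection ends a loop guarded by isRunning(). Unlike the demo snippet, the write here is done while holding the daemon's lock so that the looping thread is guaranteed to see it.

```java
import java.lang.reflect.Field;

final class ThreadFieldSketch {
    /** Hypothetical stand-in for the Daemon base class; not the real Android class. */
    static class FakeDaemon implements Runnable {
        private Thread thread;

        synchronized void start() {
            thread = new Thread(this, "FakeDaemon");
            thread.setDaemon(true);
            thread.start();
        }

        synchronized boolean isRunning() {
            return thread != null;
        }

        @Override public void run() {
            while (isRunning()) {
                try {
                    Thread.sleep(10); // stands in for the watchdog's periodic work
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }

    /** Nulls the private thread field via reflection and reports whether the loop ended. */
    static boolean stopByClearingThreadField() {
        try {
            FakeDaemon daemon = new FakeDaemon();
            daemon.start();
            Field threadField = FakeDaemon.class.getDeclaredField("thread");
            threadField.setAccessible(true);
            Thread worker = (Thread) threadField.get(daemon);
            synchronized (daemon) {
                threadField.set(daemon, null); // same move as the demo: thread = null
            }
            worker.join(2000); // give the loop time to observe the change
            return !worker.isAlive();
        } catch (ReflectiveOperationException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("loop ended: " + stopByClearingThreadField());
    }
}
```

No interrupt is sent, so the loop ends only when the thread next evaluates isRunning(); on a real device that means the watchdog stops at its next check rather than immediately.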