Android性能优化 - 捕获java crash的那些事

4,294 阅读7分钟

本文为稀土掘金技术社区首发签约文章,14天内禁止转载,14天后未获授权禁止转载,侵权必究!

背景

crash一直是影响app稳定性的大头,同时在随着项目逐渐迭代,复杂性越来越提高的同时,由于主观或者客观的的原因,都会造成意想不到的crash出现。同样的,在android的历史化过程中,就算是android系统本身,在迭代中也会存在着隐含的crash。我们常说的crash包括java层(虚拟机层)crash与native层crash,本期我们着重讲一下java层的crash。

java层crash由来

虽然说我们在开发过程中会遇到各种各样的crash,但是这个crash是如果产生的呢?我们来探讨一下一个crash是如何诞生的!

我们很容易就知道,在java中main函数是程序的开始(其实还有前置步骤),我们开发中,虽然android系统把应用的主线程创建封装在了自己的系统中,但是无论怎么封装,一个java层的线程无论再怎么强大,背后肯定是绑定了一个操作系统级别的线程,才真正得与驱动,也就是说,我们平常说的java线程,它其实是被操作系统真正的Thread的一个使用体罢了,java层的多个thread,可能会只对应着native层的一个Thread(便于区分,这里thread统一只java层的线程,Thread指的是native层的Thread。其实native的Thread也不是真正的线程,只是操作系统提供的一个api罢了,但是我们这里先简单这样定义,假设了native的线程与操作系统线程为同一个东西)

每一个java层的thread调用start方法,就会来到native层Thread的世界

public synchronized void start() {
        throw new IllegalThreadStateException();
    group.add(this);

    started = false;
    try {
        nativeCreate(this, stackSize, daemon);
        started = true;
    } finally {
        try {
            if (!started) {
                group.threadStartFailed(this);
            }
        } catch (Throwable ignore) {
            /* do nothing. If start0 threw a Throwable then
              it will be passed up the call stack */
        }
    }
}

最终调用的是一个jni方法

private native static void nativeCreate(Thread t, long stackSize, boolean daemon);

而nativeCreate最终在native层的实现是

static void Thread_nativeCreate(JNIEnv* env, jclass, jobject java_thread, jlong stack_size,
jboolean daemon) {
    // There are sections in the zygote that forbid thread creation.
    Runtime* runtime = Runtime::Current();
    if (runtime->IsZygote() && runtime->IsZygoteNoThreadSection()) {
        jclass internal_error = env->FindClass("java/lang/InternalError");
        CHECK(internal_error != nullptr);
        env->ThrowNew(internal_error, "Cannot create threads in zygote");
        return;
    }
     // 这里就是真正的创建线程方法
    Thread::CreateNativeThread(env, java_thread, stack_size, daemon == JNI_TRUE);
}

CreateNativeThread 经过了一系列的校验动作,终于到了真正创建线程的地方了,最终在CreateNativeThread方法中,通过了pthread_create创建了一个真正的Thread

Thread::CreateNativeThread 方法中
...
pthread_create_result = pthread_create(&new_pthread,
&attr,
Thread::CreateCallback,
child_thread);
CHECK_PTHREAD_CALL(pthread_attr_destroy, (&attr), "new thread");

if (pthread_create_result == 0) {
    // pthread_create started the new thread. The child is now responsible for managing the
    // JNIEnvExt we created.
    // Note: we can't check for tmp_jni_env == nullptr, as that would require synchronization
    //       between the threads.
    child_jni_env_ext.release();  // NOLINT pthreads API.
    return;
}
...

到这里我们就能够明白,一个java层的thread其实真正绑定的,是一个native层的Thread,有了这个知识,我们就可以回到我们的crash主题了,当发生异常的时候(即检测到一些操作不符合虚拟机规定时),注意,这个时候还是在虚拟机的控制范围之内,就可以直接调用

void Thread::ThrowNewException(const char* exception_class_descriptor,
const char* msg) {
    // Callers should either clear or call ThrowNewWrappedException.
    AssertNoPendingExceptionForNewException(msg);
    ThrowNewWrappedException(exception_class_descriptor, msg);
}

进行对exception的抛出,我们目前所有的java层crash都是如此,因为对crash的识别还属于本虚拟机所在的进程的范畴(native crash 虚拟机就没办法直接识别),比如我们常见的各种crash

image.png

然后就会调用到Thread::ThrowNewWrappedException 方法,在这个方法里面再次调用到Thread::SetException方法,成功的把当次引发异常的信息记录下来

void Thread::SetException(ObjPtr<mirror::Throwable> new_exception) {
    CHECK(new_exception != nullptr);
    // TODO: DCHECK(!IsExceptionPending());
    tlsPtr_.exception = new_exception.Ptr();
}

此时,此时就会调用Thread的Destroy方法,这个时候,线程就会在里面判断,本次的异常该怎么去处理

void Thread::Destroy() {
   ...

    if (tlsPtr_.opeer != nullptr) {
        ScopedObjectAccess soa(self);
        // We may need to call user-supplied managed code, do this before final clean-up.
        HandleUncaughtExceptions(soa);
        RemoveFromThreadGroup(soa);
        Runtime* runtime = Runtime::Current();
        if (runtime != nullptr) {
                runtime->GetRuntimeCallbacks()->ThreadDeath(self);
        }

HandleUncaughtExceptions 这个方式就是处理的函数,我们继续看一下这个异常处理函数

void Thread::HandleUncaughtExceptions(ScopedObjectAccessAlreadyRunnable& soa) {
    if (!IsExceptionPending()) {
        return;
    }
    ScopedLocalRef<jobject> peer(tlsPtr_.jni_env, soa.AddLocalReference<jobject>(tlsPtr_.opeer));
    ScopedThreadStateChange tsc(this, ThreadState::kNative);

    // Get and clear the exception.
    ScopedLocalRef<jthrowable> exception(tlsPtr_.jni_env, tlsPtr_.jni_env->ExceptionOccurred());
    tlsPtr_.jni_env->ExceptionClear();

    // Call the Thread instance's dispatchUncaughtException(Throwable)
    // 关键点就在此,回到java层
    tlsPtr_.jni_env->CallVoidMethod(peer.get(),
    WellKnownClasses::java_lang_Thread_dispatchUncaughtException,
    exception.get());

    // If the dispatchUncaughtException threw, clear that exception too.
    tlsPtr_.jni_env->ExceptionClear();
}

到这里,我们就接近尾声了,可以看到我们的处理函数最终通过jni,再次回到了java层的世界,而这个连接的java层函数就是dispatchUncaughtException(java_lang_Thread_dispatchUncaughtException)

public final void dispatchUncaughtException(Throwable e) {
    // BEGIN Android-added: uncaughtExceptionPreHandler for use by platform.
    Thread.UncaughtExceptionHandler initialUeh =
            Thread.getUncaughtExceptionPreHandler();
    if (initialUeh != null) {
        try {
            initialUeh.uncaughtException(this, e);
        } catch (RuntimeException | Error ignored) {
            // Throwables thrown by the initial handler are ignored
        }
    }
    // END Android-added: uncaughtExceptionPreHandler for use by platform.
    getUncaughtExceptionHandler().uncaughtException(this, e);
}

到这里,我们就彻底了解到了一个java层异常的产生过程!

为什么java层异常会导致crash

从上面我们文章我们能够看到,一个异常是怎么产生的,可能细心的读者会了解到,笔者一直在用异常这个词,而不是crash,因为异常发生了,crash是不一定产生的!我们可以看到dispatchUncaughtException方法最终会尝试着调用UncaughtExceptionHandler去处理本次异常,好家伙!那么UncaughtExceptionHandler是在什么时候设置的?其实就是在Init中,由系统提前设置好的!frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");

    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));

    /*
     * Install a time zone supplier that uses the Android persistent time zone system property.
     */
    RuntimeHooks.setTimeZoneIdSupplier(() -> SystemProperties.get("persist.sys.timezone"));
    
    LogManager.getLogManager().reset();
    new AndroidConfig();

    /*
     * Sets the default HTTP User-Agent used by HttpURLConnection.
     */
    String userAgent = getDefaultUserAgent();
    System.setProperty("http.agent", userAgent);

    /*
     * Wire socket tagging to traffic stats.
     */
    TrafficStats.attachSocketTagger();

    initialized = true;
}

好家伙,原来是KillApplicationHandler“捣蛋”,在异常到来时,就会通过KillApplicationHandler去处理,而这里的处理就是,杀死app!!


private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    private final LoggingHandler mLoggingHandler;
    public KillApplicationHandler(LoggingHandler loggingHandler) {
        this.mLoggingHandler = Objects.requireNonNull(loggingHandler);
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            ensureLogging(t, e);

            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }

            // Bring up crash dialog, wait for it to be dismissed
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails!  Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }

   
    private void ensureLogging(Thread t, Throwable e) {
        if (!mLoggingHandler.mTriggered) {
            try {
                mLoggingHandler.uncaughtException(t, e);
            } catch (Throwable loggingThrowable) {
                // Ignored.
            }
        }
    }
}

看到了吗!异常的产生导致的crash,真正的源头就是在此了!

捕获crash

通过对前文的阅读,我们了解到了crash的源头就是KillApplicationHandler,因为它默认处理就是杀死app,此时我们也注意到,它是继承于UncaughtExceptionHandler的。当然,有异常及时抛出解决,是一件好事,但是我们也可能有一些异常,比如android系统sdk的问题,或者其他没那么重要的异常,直接崩溃app,这个处理就不是那么好了。但是不要紧,java虚拟机开发者也肯定注意到了这点,所以提供

Thread.java

public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh)

方式,导入一个我们自定义的实现了UncaughtExceptionHandler接口的类

public interface UncaughtExceptionHandler {
    /**
     * Method invoked when the given thread terminates due to the
     * given uncaught exception.
     * <p>Any exception thrown by this method will be ignored by the
     * Java Virtual Machine.
     * @param t the thread
     * @param e the exception
     */
    void uncaughtException(Thread t, Throwable e);
}

此时我们只需要写一个类,模仿KillApplicationHandler一样,就能写出一个自己的异常处理类,去处理我们程序中的异常(或者Android系统中特定版本的异常)。例子demo比如

class MyExceptionHandler:Thread.UncaughtExceptionHandler {
    override fun uncaughtException(t: Thread, e: Throwable) {
        // 做自己的逻辑
        Log.i("hello",e.toString())
    }
}

总结

到这里,我们能够了解到了一个java crash是怎么产生的了,同时我们也了解到了常用的UncaughtExceptionHandler为什么可以拦截一些我们不希望产生crash的异常,在接下来的android性能优化系列中,会持续带来相关的其他分享,感谢观看