02.Android崩溃Crash库之App崩溃分析

1,636 阅读9分钟

目录总结

  • 01.抛出异常导致崩溃分析
  • 02.RuntimeInit类分析
  • 03.Looper停止App就退出吗
  • 04.handleApplicationCrash
  • 05.native_crash如何监控
  • 06.ANR是如何监控的
  • 07.回过头看addErrorToDropBox

前沿

  • 上一篇整体介绍了crash崩溃库崩溃重启,崩溃记录记录,查看以及分享日志等功能。
  • 项目地址:github.com/yangchong21…
  • 欢迎star,哈哈哈

01.抛出异常导致崩溃分析

  • 线程中抛出异常以后的处理逻辑。
    • 一旦线程出现抛出异常,并且我们没有捕捉的情况下,JVM将调用Thread中的dispatchUncaughtException方法把异常传递给线程的未捕获异常处理器。
    • 如果没有设置uncaughtExceptionHandler,将使用线程所在的线程组来处理这个未捕获异常。线程组ThreadGroup实现了UncaughtExceptionHandler,所以可以用来处理未捕获异常。
    public final void dispatchUncaughtException(Throwable e) {
        Thread.UncaughtExceptionHandler initialUeh =
                Thread.getUncaughtExceptionPreHandler();
        if (initialUeh != null) {
            try {
                initialUeh.uncaughtException(this, e);
            } catch (RuntimeException | Error ignored) {
                // Throwables thrown by the initial handler are ignored
            }
        }
        getUncaughtExceptionHandler().uncaughtException(this, e);
    }
    
    public static UncaughtExceptionHandler getUncaughtExceptionPreHandler() {
        return uncaughtExceptionPreHandler;
    }
    
    public UncaughtExceptionHandler getUncaughtExceptionHandler() {
        return uncaughtExceptionHandler != null ?
            uncaughtExceptionHandler : group;
    }
    
    private ThreadGroup group;
    
  • 然后看一下ThreadGroup中实现uncaughtException(Thread t, Throwable e)方法,代码如下
    • 默认情况下,线程组处理未捕获异常的逻辑是,首先将异常消息通知给父线程组,
    • 然后尝试利用一个默认的defaultUncaughtExceptionHandler来处理异常,
    • 如果没有默认的异常处理器则将错误信息输出到System.err。
    • 也就是JVM提供给我们设置每个线程的具体的未捕获异常处理器,也提供了设置默认异常处理器的方法。
    public void uncaughtException(Thread t, Throwable e) {
        if (parent != null) {
            parent.uncaughtException(t, e);
        } else {
            Thread.UncaughtExceptionHandler ueh =
                Thread.getDefaultUncaughtExceptionHandler();
            if (ueh != null) {
                ueh.uncaughtException(t, e);
            } else if (!(e instanceof ThreadDeath)) {
                System.err.print("Exception in thread \""
                                 + t.getName() + "\" ");
                e.printStackTrace(System.err);
            }
        }
    }
    
  • 既然Android遇到异常会发生崩溃,然后找一些哪里用到设置setDefaultUncaughtExceptionHandler,即可定位到RuntimeInit类。

02.RuntimeInit类分析

  • 然后看一下RuntimeInit类,由于是java代码,所以首先找main方法入口。代码如下所示
    public static final void main(String[] argv) {
        enableDdms();
        if (argv.length == 2 && argv[1].equals("application")) {
            if (DEBUG) Slog.d(TAG, "RuntimeInit: Starting application");
            redirectLogStreams();
        } else {
            if (DEBUG) Slog.d(TAG, "RuntimeInit: Starting tool");
        }
    
        commonInit();
    
        /*
         * Now that we're running in interpreted code, call back into native code
         * to run the system.
         */
        nativeFinishInit();
    
        if (DEBUG) Slog.d(TAG, "Leaving RuntimeInit!");
    }
    
  • 然后再来看一下commonInit()方法,看看里面做了什么操作?
    • 可以发现这里调用了setDefaultUncaughtExceptionHandler方法,设置了自定义的Handler类
    protected static final void commonInit() {
        if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
        /*
         * set handlers; these apply to all threads in the VM. Apps can replace
         * the default handler, but not the pre handler.
         */
        LoggingHandler loggingHandler = new LoggingHandler();
        Thread.setUncaughtExceptionPreHandler(loggingHandler);
        Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
    
        initialized = true;
    }
    
  • 接着看一下KillApplicationHandler类,可以发现该类实现了Thread.UncaughtExceptionHandler 接口

    private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
        private final LoggingHandler mLoggingHandler;
    
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            try {
                ensureLogging(t, e);
    
                // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
                if (mCrashing) return;
                mCrashing = true;
    
                // Try to end profiling. If a profiler is running at this point, and we kill the
                // process (below), the in-memory buffer will be lost. So try to stop, which will
                // flush the buffer. (This makes method trace profiling useful to debug crashes.)
                if (ActivityThread.currentActivityThread() != null) {
                    ActivityThread.currentActivityThread().stopProfiling();
                }
    
                // Bring up crash dialog, wait for it to be dismissed
                ActivityManager.getService().handleApplicationCrash(
                        mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
            } catch (Throwable t2) {
                if (t2 instanceof DeadObjectException) {
                    // System process is dead; ignore
                } else {
                    try {
                        Clog_e(TAG, "Error reporting crash", t2);
                    } catch (Throwable t3) {
                        // Even Clog_e() fails!  Oh well.
                    }
                }
            } finally {
                // Try everything to make sure this process goes away.
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
        }
    }
    
  • 得出结论
    • 其实在fork出app进程的时候,系统已经为app设置了一个异常处理,并且最终崩溃后会直接导致执行该handler的finallly方法最后杀死app直接退出app。如果你要自己处理,你可以自己实现Thread.UncaughtExceptionHandler。

03.Looper停止App就退出吗

  • looper如果停止了,那么app会退出吗,先做个实验看一下。代码如下所示
    • 可以发现调用这句话,是会让app退出的。会报错崩溃日志是:java.lang.IllegalStateException: Main thread not allowed to quit.
    Looper.getMainLooper().quit();
    //下面这种是安全退出
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN_MR2) {
        Looper.getMainLooper().quitSafely();
    }
    
  • 然后看一下Looper中quit方法源码
    • Looper的quit方法源码如下:
    public void quit() {
        mQueue.quit(false);
    }
    
    • Looper的quitSafely方法源码如下:
    public void quitSafely() {
        mQueue.quit(true);
    }
    
  • 以上两个方法中mQueue是MessageQueue类型的对象,二者都调用了MessageQueue中的quit方法,MessageQueue的quit方法源码如下:
    • 可以发现上面调用了quit方法,即会出现出现崩溃,主要原因是因为调用prepare()-->new Looper(true)--->new MessageQueue(true)--->mQuitAllowed设置为true
    void quit(boolean safe) {
        if (!mQuitAllowed) {
            throw new IllegalStateException("Main thread not allowed to quit.");
        }
    
        synchronized (this) {
            if (mQuitting) {
                return;
            }
            mQuitting = true;
    
            if (safe) {
                removeAllFutureMessagesLocked();
            } else {
                removeAllMessagesLocked();
            }
    
            // We can assume mPtr != 0 because mQuitting was previously false.
            nativeWake(mPtr);
        }
    }
    
  • 通过观察以上源码我们可以发现:
    • 当我们调用Looper的quit方法时,实际上执行了MessageQueue中的removeAllMessagesLocked方法,该方法的作用是把MessageQueue消息池中所有的消息全部清空,无论是延迟消息(延迟消息是指通过sendMessageDelayed或通过postDelayed等方法发送的需要延迟执行的消息)还是非延迟消息。
    • 当我们调用Looper的quitSafely方法时,实际上执行了MessageQueue中的removeAllFutureMessagesLocked方法,通过名字就可以看出,该方法只会清空MessageQueue消息池中所有的延迟消息,并将消息池中所有的非延迟消息派发出去让Handler去处理,quitSafely相比于quit方法安全之处在于清空消息之前会派发所有的非延迟消息。
    • 无论是调用了quit方法还是quitSafely方法只会,Looper就不再接收新的消息。即在调用了Looper的quit或quitSafely方法之后,消息循环就终结了,这时候再通过Handler调用sendMessage或post等方法发送消息时均返回false,表示消息没有成功放入消息队列MessageQueue中,因为消息队列已经退出了。
    • 需要注意的是Looper的quit方法从API Level 1就存在了,但是Looper的quitSafely方法从API Level 18才添加进来。

04.handleApplicationCrash

  • 在KillApplicationHandler类中的uncaughtException方法,可以看到ActivityManager.getService().handleApplicationCrash被调用,那么这个是用来做什么的呢?
    • ActivityManager.getService().handleApplicationCrash-->ActivityManagerService.handleApplicationCrash-->handleApplicationCrashInner方法
    • 从下面可以看出,若传入app为null时,processName就设置为system_server
    public void handleApplicationCrash(IBinder app,
            ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
        ProcessRecord r = findAppProcess(app, "Crash");
        final String processName = app == null ? "system_server"
                : (r == null ? "unknown" : r.processName);
    
        handleApplicationCrashInner("crash", r, processName, crashInfo);
    }
    
  • 然后接着看一下handleApplicationCrashInner方法做了什么
    • 调用addErrorToDropBox将应用crash,进行封装输出。
    void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
            ApplicationErrorReport.CrashInfo crashInfo) {
    
        addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);
    
        mAppErrors.crashApplication(r, crashInfo);
    }
    

05.native_crash如何监控

  • native_crash,顾名思义,就是native层发生的crash。其实他是通过一个NativeCrashListener线程去监控的。
    final class NativeCrashListener extends Thread {
        ...
    
        @Override
        public void run() {
            final byte[] ackSignal = new byte[1];
    
           ...
    
            // The file system entity for this socket is created with 0777 perms, owned
            // by system:system. selinux restricts things so that only crash_dump can
            // access it.
            {
                File socketFile = new File(DEBUGGERD_SOCKET_PATH);
                if (socketFile.exists()) {
                    socketFile.delete();
                }
            }
    
            try {
                FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
                final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                        DEBUGGERD_SOCKET_PATH);
                Os.bind(serverFd, sockAddr);
                Os.listen(serverFd, 1);
                Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);
    
                //1.一直循环地读peerFd文件,若发生存在,则进入consumeNativeCrashData
                while (true) {
                    FileDescriptor peerFd = null;
                    try {
                        if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
                        peerFd = Os.accept(serverFd, null /* peerAddress */);
                        if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
                        if (peerFd != null) {
                            // the reporting thread may take responsibility for
                            // acking the debugger; make sure we play along.
                            //2.进入native crash数据处理流程
                            consumeNativeCrashData(peerFd);
                        }
                    } catch (Exception e) {
                        Slog.w(TAG, "Error handling connection", e);
                    } finally {
                        ...
                    }
                }
            } catch (Exception e) {
                Slog.e(TAG, "Unable to init native debug socket!", e);
            }
        }
    
        // Read a crash report from the connection
        void consumeNativeCrashData(FileDescriptor fd) {
            try {
                    ...
                    //3.启动NativeCrashReporter作为上报错误的新线程
                    final String reportString = new String(os.toByteArray(), "UTF-8");
                    (new NativeCrashReporter(pr, signal, reportString)).start();
    
            } catch (Exception e) {
                ...
            }
        }
    }
    
  • 上报native_crash的线程-->NativeCrashReporter:
    class NativeCrashReporter extends Thread {
        ProcessRecord mApp;
        int mSignal;
        String mCrashReport;
    
        NativeCrashReporter(ProcessRecord app, int signal, String report) {
            super("NativeCrashReport");
            mApp = app;
            mSignal = signal;
            mCrashReport = report;
        }
    
        @Override
        public void run() {
            try {
                //1.包装崩溃信息
                CrashInfo ci = new CrashInfo();
                ci.exceptionClassName = "Native crash";
                ci.exceptionMessage = Os.strsignal(mSignal);
                ci.throwFileName = "unknown";
                ci.throwClassName = "unknown";
                ci.throwMethodName = "unknown";
                ci.stackTrace = mCrashReport;
    
                if (DEBUG) Slog.v(TAG, "Calling handleApplicationCrash()");
                //2.转到ams中处理,跟普通crash一致,只是类型不一样
                mAm.handleApplicationCrashInner("native_crash", mApp, mApp.processName, ci);
                if (DEBUG) Slog.v(TAG, "<-- handleApplicationCrash() returned");
            } catch (Exception e) {
                Slog.e(TAG, "Unable to report native crash", e);
            }
        }
    }
    
  • native crash跟到这里就结束了,后面的流程就是跟application crash一样,都会走到addErrorToDropBox中,这个最后在说。

06.ANR是如何监控的

  • 这里就不讨论每种anr发生后的原因和具体的流程了,直接跳到已经触发ANR的位置。AppErrors.appNotResponding:
    final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
        SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
    
        if (mService.mController != null) {
            try {
                //1.判断是否继续后面的流程,还是直接kill掉当前进程
                // 0 == continue, -1 = kill process immediately
                int res = mService.mController.appEarlyNotResponding(
                        app.processName, app.pid, annotation);
                if (res < 0 && app.pid != MY_PID) {
                    app.kill("anr", true);
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }
    
        //2.记录发生anr的时间
        long anrTime = SystemClock.uptimeMillis();
        //3.更新cpu使用情况
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
        }
    
        //可以在设置中设置发生anr后,是弹框显示还是后台处理,默认是后台
        // Unless configured otherwise, swallow ANRs in background processes & kill the process.
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
    
        boolean isSilentANR;
    
        synchronized (mService) {
            ...
    
            // In case we come through here for the same app before completing
            // this one, mark as anring now so we will bail out.
            app.notResponding = true;
    
            //3.将anr写入event log中
            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                    app.processName, app.info.flags, annotation);
    
            // Dump thread traces as quickly as we can, starting with "interesting" processes.
            firstPids.add(app.pid);
    
            // Don't dump other PIDs if it's a background ANR
            isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
            if (!isSilentANR) {
                int parentPid = app.pid;
                if (parent != null && parent.app != null && parent.app.pid > 0) {
                    parentPid = parent.app.pid;
                }
                if (parentPid != app.pid) firstPids.add(parentPid);
    
                if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
    
                for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                    ProcessRecord r = mService.mLruProcesses.get(i);
                    if (r != null && r.thread != null) {
                        int pid = r.pid;
                        if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                            if (r.persistent) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                            } else if (r.treatLikeActivity) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                            } else {
                                lastPids.put(pid, Boolean.TRUE);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                            }
                        }
                    }
                }
            }
    
        }
    
        // 4.将主要的anr信息写到main.log中
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(app.pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }
    
        ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
        ArrayList<Integer> nativePids = null;
    
        // don't dump native PIDs for background ANRs unless it is the process of interest
        String[] nativeProc = null;
        if (isSilentANR) {
            for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
                if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
                    nativeProc = new String[] { app.processName };
                    break;
                }
            }
            int[] pid = nativeProc == null ? null : Process.getPidsForCommands(nativeProc);
            if(pid != null){
                nativePids = new ArrayList<Integer>(pid.length);
                for (int i : pid) {
                    nativePids.add(i);
                }
            }
        } else {
            nativePids = Watchdog.getInstance().getInterestingNativePids();
        }
    
        //5.dump出stacktraces文件
        // For background ANRs, don't pass the ProcessCpuTracker to
        // avoid spending 1/2 second collecting stats to rank lastPids.
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);
    
        String cpuInfo = null;
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            //6.再次更新cpu使用情况
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                //7.打印anr时cpu使用状态
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }
    
        info.append(processCpuTracker.printCurrentState(anrTime));
    
        //8.当traces文件不存在时,只能打印线程日志了
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit's threads to the log
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }
    
        ...
        //9.关键,回到了我们熟悉的addErrorToDropBox,进行错误信息包装跟上传了
        mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
                cpuInfo, tracesFile, null);
    
        if (mService.mController != null) {
            try {
                //10.根据appNotResponding返回结果,看是否继续等待,还是结束当前进程
                // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
                int res = mService.mController.appNotResponding(
                        app.processName, app.pid, info.toString());
                if (res != 0) {
                    if (res < 0 && app.pid != MY_PID) {
                        app.kill("anr", true);
                    } else {
                        synchronized (mService) {
                            mService.mServices.scheduleServiceTimeoutLocked(app);
                        }
                    }
                    return;
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }
    
       ...
    }
    
  • 我们来看一下traces文件是怎么dump出来的:
    public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,
            ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
            ArrayList<Integer> nativePids) {
        ArrayList<Integer> extraPids = null;
    
        //1.测量CPU的使用情况,以便在请求时对顶级用户进行实际的采样。
        if (processCpuTracker != null) {
            processCpuTracker.init();
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
            }
    
            processCpuTracker.update();
    
            // 2.爬取顶级应用到的cpu使用情况
            final int N = processCpuTracker.countWorkingStats();
            extraPids = new ArrayList<>();
            for (int i = 0; i < N && extraPids.size() < 5; i++) {
                ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
                if (lastPids.indexOfKey(stats.pid) >= 0) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + stats.pid);
    
                    extraPids.add(stats.pid);
                } else if (DEBUG_ANR) {
                    Slog.d(TAG, "Skipping next CPU consuming process, not a java proc: "
                            + stats.pid);
                }
            }
        }
    
        //3.读取trace文件的保存目录
        File tracesFile;
        final String tracesDirProp = SystemProperties.get("dalvik.vm.stack-trace-dir", "");
        if (tracesDirProp.isEmpty()) {
            ...
            String globalTracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
            ...
        } else {
           ...
        }
    
        //4.传入指定目录,进入实际dump逻辑
        dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids,
                useTombstonedForJavaTraces);
        return tracesFile;
    }
    
  • dumpStackTraces
    private static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
            ArrayList<Integer> nativePids, ArrayList<Integer> extraPids,
            boolean useTombstonedForJavaTraces) {
    
        ...
        final DumpStackFileObserver observer;
        if (useTombstonedForJavaTraces) {
            observer = null;
        } else {
            // Use a FileObserver to detect when traces finish writing.
            // The order of traces is considered important to maintain for legibility.
            observer = new DumpStackFileObserver(tracesFile);
        }
    
        //我们必须在20秒内完成所有堆栈转储。
        long remainingTime = 20 * 1000;
        try {
            if (observer != null) {
                observer.startWatching();
            }
    
            // 首先收集所有最重要的pid堆栈。
            if (firstPids != null) {
                int num = firstPids.size();
                for (int i = 0; i < num; i++) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for pid "
                            + firstPids.get(i));
                    final long timeTaken;
                    if (useTombstonedForJavaTraces) {
                        timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile, remainingTime);
                    } else {
                        timeTaken = observer.dumpWithTimeout(firstPids.get(i), remainingTime);
                    }
    
                    remainingTime -= timeTaken;
                    if (remainingTime <= 0) {
                        Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                            "); deadline exceeded.");
                        return;
                    }
    
                    if (DEBUG_ANR) {
                        Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                    }
                }
            }
    
            //接下来收集native pid的堆栈
            if (nativePids != null) {
                for (int pid : nativePids) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for native pid " + pid);
                    final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);
    
                    final long start = SystemClock.elapsedRealtime();
                    Debug.dumpNativeBacktraceToFileTimeout(
                            pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
                    final long timeTaken = SystemClock.elapsedRealtime() - start;
    
                    remainingTime -= timeTaken;
                    if (remainingTime <= 0) {
                        Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
                            "); deadline exceeded.");
                        return;
                    }
    
                    if (DEBUG_ANR) {
                        Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
                    }
                }
            }
    
            // 最后,从CPU跟踪器转储所有额外PID的堆栈。
            if (extraPids != null) {
                for (int pid : extraPids) {
                    if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + pid);
    
                    final long timeTaken;
                    if (useTombstonedForJavaTraces) {
                        timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);
                    } else {
                        timeTaken = observer.dumpWithTimeout(pid, remainingTime);
                    }
    
                    remainingTime -= timeTaken;
                    if (remainingTime <= 0) {
                        Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
                                "); deadline exceeded.");
                        return;
                    }
    
                    if (DEBUG_ANR) {
                        Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
                    }
                }
            }
        } finally {
            if (observer != null) {
                observer.stopWatching();
            }
        }
    }
    
  • 看完之后,应该可以很清楚地的明白。ANR的流程就是打印一些 ANR reason、cpu stats、线程日志,然后分别写入main.log、event.log,然后调用到addErrorToDropBox中,最后kill该进程。

07.回过头看addErrorToDropBox

  • 为什么说addErrorToDropBox是殊途同归呢,因为无论是crash、native_crash、ANR或是wtf,最终都是来到这里,交由它去处理。那下面我们就来揭开它的神秘面纱吧。
    public void addErrorToDropBox(String eventType,
            ProcessRecord process, String processName, ActivityRecord activity,
            ActivityRecord parent, String subject,
            final String report, final File dataFile,
            final ApplicationErrorReport.CrashInfo crashInfo) {
        // NOTE -- this must never acquire the ActivityManagerService lock,
        // otherwise the watchdog may be prevented from resetting the system.
    
        // Bail early if not published yet
        if (ServiceManager.getService(Context.DROPBOX_SERVICE) == null) return;
        final DropBoxManager dbox = mContext.getSystemService(DropBoxManager.class);
    
        //只有这几种类型的错误,才会进行上传
        final boolean shouldReport = ("anr".equals(eventType)
                || "crash".equals(eventType)
                || "native_crash".equals(eventType)
                || "watchdog".equals(eventType));
    
        // Exit early if the dropbox isn't configured to accept this report type.
        final String dropboxTag = processClass(process) + "_" + eventType;
        //1.如果DropBoxManager没有初始化,或不是要上传的类型,则直接返回
        if (dbox == null || !dbox.isTagEnabled(dropboxTag)&& !shouldReport)
            return;
    
        ...
    
        final StringBuilder sb = new StringBuilder(1024);
        //2.添加一些头部log信息
        appendDropBoxProcessHeaders(process, processName, sb);
        //3.添加崩溃进程和界面的信息
        try {
            if (process != null) {
                //添加是否前台前程log
                sb.append("Foreground: ")
                        .append(process.isInterestingToUserLocked() ? "Yes" : "No")
                        .append("\n");
            }
            //触发该崩溃的界面,可以为null
            if (activity != null) {
                sb.append("Activity: ").append(activity.shortComponentName).append("\n");
            }
            if (parent != null && parent.app != null && parent.app.pid != process.pid) {
                sb.append("Parent-Process: ").append(parent.app.processName).append("\n");
            }
            if (parent != null && parent != activity) {
                sb.append("Parent-Activity: ").append(parent.shortComponentName).append("\n");
            }
            //定入简要信息
            if (subject != null) {
                sb.append("Subject: ").append(subject).append("\n");
            }
            sb.append("Build: ").append(Build.FINGERPRINT).append("\n");
            //是否连接了调试
            if (Debug.isDebuggerConnected()) {
                sb.append("Debugger: Connected\n");
            }
        } catch (NullPointerException e) {
           e.printStackTrace();
        } finally {
            sb.append("\n");
        }
    
    
        final String fProcessName = processName;
        final String fEventType = eventType;
        final String packageName = getErrorReportPackageName(process, crashInfo, eventType);
        Slog.i(TAG,"addErrorToDropbox, real report package is "+packageName);
    
        // Do the rest in a worker thread to avoid blocking the caller on I/O
        // (After this point, we shouldn't access AMS internal data structures.)
        Thread worker = new Thread("Error dump: " + dropboxTag) {
            @Override
            public void run() {
                //4.添加进程的状态到dropbox中
                BufferedReader bufferedReader = null;
                String line;
                try {
                    bufferedReader = new BufferedReader(new FileReader("/proc/" + pid + "/status"));
                    for (int i = 0; i < 5; i++) {
                        if ((line = bufferedReader.readLine()) != null && line.contains("State")) {
                            sb.append(line + "\n");
                            break;
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    if (bufferedReader != null) {
                        try {
                            bufferedReader.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
    
                if (report != null) {
                    sb.append(report);
                }
    
                String setting = Settings.Global.ERROR_LOGCAT_PREFIX + dropboxTag;
                int lines = Settings.Global.getInt(mContext.getContentResolver(), setting, 0);
                int maxDataFileSize = DROPBOX_MAX_SIZE - sb.length()
                        - lines * RESERVED_BYTES_PER_LOGCAT_LINE;
    
                //5.将dataFile文件定入dropbox中,一般只有anr时,会将traces文件通过该参数传递进来者,其他类型都不传.
                if (dataFile != null && maxDataFileSize > 0) {
                    try {
                        sb.append(FileUtils.readTextFile(dataFile, maxDataFileSize,
                                    "\n\n[[TRUNCATED]]"));
                    } catch (IOException e) {
                        Slog.e(TAG, "Error reading " + dataFile, e);
                    }
                }
    
                //6.如果是crash类型,会传入crashInfo,此时将其写入dropbox中
                if (crashInfo != null && crashInfo.stackTrace != null) {
                    sb.append(crashInfo.stackTrace);
                }
    
                if (lines > 0) {
                    sb.append("\n");
    
                    // 7.合并几个logcat流,取最新部分log
                    InputStreamReader input = null;
                    try {
                        java.lang.Process logcat = new ProcessBuilder(
                                "/system/bin/timeout", "-k", "15s", "10s",
                                "/system/bin/logcat", "-v", "threadtime", "-b", "events", "-b", "system",
                                "-b", "main", "-b", "crash", "-t", String.valueOf(lines))
                                        .redirectErrorStream(true).start();
    
                        try { logcat.getOutputStream().close(); } catch (IOException e) {}
                        try { logcat.getErrorStream().close(); } catch (IOException e) {}
                        input = new InputStreamReader(logcat.getInputStream());
    
                        int num;
                        char[] buf = new char[8192];
                        while ((num = input.read(buf)) > 0) sb.append(buf, 0, num);
                    } catch (IOException e) {
                        Slog.e(TAG, "Error running logcat", e);
                    } finally {
                        if (input != null) try { input.close(); } catch (IOException e) {}
                    }
                }
    
                ...
    
    
                if (shouldReport) {
                    synchronized (mErrorListenerLock) {
                        try {
                            if (mIApplicationErrorListener == null) {
                                return;
                            }
                            //8.关键,在这里可以添加一个application error的接口,用来实现应用层接收崩溃信息
                            mIApplicationErrorListener.onError(fEventType,
                                    packageName, fProcessName, subject, dropboxTag
                                            + "-" + uuid, crashInfo);
                        } catch (DeadObjectException e) {
                            Slog.i(TAG, "ApplicationErrorListener.onError() E :" + e, e);
                            mIApplicationErrorListener = null;
                        } catch (Exception e) {
                            Slog.i(TAG, "ApplicationErrorListener.onError() E :" + e, e);
                        }
                    }
                }
            }
        };
    
        ...
    }
    
  • 调用appendDropBoxProcessHeaders添加头部log信息:
    private void appendDropBoxProcessHeaders(ProcessRecord process, String processName,
            StringBuilder sb) {
        // Watchdog thread ends up invoking this function (with
        // a null ProcessRecord) to add the stack file to dropbox.
        // Do not acquire a lock on this (am) in such cases, as it
        // could cause a potential deadlock, if and when watchdog
        // is invoked due to unavailability of lock on am and it
        // would prevent watchdog from killing system_server.
        if (process == null) {
            sb.append("Process: ").append(processName).append("\n");
            return;
        }
        // Note: ProcessRecord 'process' is guarded by the service
        // instance.  (notably process.pkgList, which could otherwise change
        // concurrently during execution of this method)
        synchronized (this) {
            sb.append("Process: ").append(processName).append("\n");
            sb.append("PID: ").append(process.pid).append("\n");
            int flags = process.info.flags;
            IPackageManager pm = AppGlobals.getPackageManager();
            //添加该进程的flag
            sb.append("Flags: 0x").append(Integer.toHexString(flags)).append("\n");
            for (int ip=0; ip<process.pkgList.size(); ip++) {
                String pkg = process.pkgList.keyAt(ip);
                sb.append("Package: ").append(pkg);
                try {
                    PackageInfo pi = pm.getPackageInfo(pkg, 0, UserHandle.getCallingUserId());
                    if (pi != null) {
                        sb.append(" v").append(pi.getLongVersionCode());
                        if (pi.versionName != null) {
                            sb.append(" (").append(pi.versionName).append(")");
                        }
                    }
                } catch (RemoteException e) {
                    Slog.e(TAG, "Error getting package info: " + pkg, e);
                }
                sb.append("\n");
            }
            //如果是执行安装的app,会在log中添加此项
            if (process.info.isInstantApp()) {
                sb.append("Instant-App: true\n");
            }
        }
    }
    

项目地址:github.com/yangchong21…