Source Code Analysis of the iOS SDK for the Crash Monitoring Platform Sentry (Part 1)

Background

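Sentry is a platform for real-time event logging and aggregation. It focuses on error monitoring and on extracting all the information needed for post-mortem analysis, without relying on cumbersome user feedback. Comparable app crash collection platforms in China include Bugtags and Bugly. Sentry's advantage is that it supports many platforms: server side, Android, iOS, Web, and more. Most importantly, it is open source! It is open source! It is open source! (Important things are worth saying three times.) The SDKs for every platform and the server-side code are all open source. Our company's app recently wanted to adopt this platform (actually, I wanted to self-host it myself), so I decided to study the SDK so that customers can make better use of our product.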

Integration

According to the official documentation, integrating on iOS takes only two steps:

  1. Add the dependency
source 'https://github.com/CocoaPods/Specs.git'
platform :ios, '8.0'
use_frameworks!

target 'YourApp' do
    pod 'Sentry', :git => 'https://github.com/getsentry/sentry-cocoa.git', :tag => '4.3.1'
end
  2. Initialize in the didFinishLaunchingWithOptions method
NSError *error = nil;
SentryClient *client = [[SentryClient alloc] initWithDsn:@"https://xxxx@sentry.io/xxxx" didFailWithError:&error];
SentryClient.sharedClient = client;
[SentryClient.sharedClient startCrashHandlerWithError:&error];
if (nil != error) {
    NSLog(@"%@", error);
}

With that, crashes are captured and reported to the server. Very convenient!

Source Code Analysis

Initialization

The initialization contains one key line of code (there are only 7 lines in total anyway):

[SentryClient.sharedClient startCrashHandlerWithError:&error];

Here is the body of startCrashHandlerWithError:

- (BOOL)startCrashHandlerWithError:(NSError *_Nullable *_Nullable)error {
    [SentryLog logWithMessage:@"SentryCrashHandler started" andLevel:kSentryLogLevelDebug];
    static dispatch_once_t onceToken = 0;
    dispatch_once(&onceToken, ^{
        installation = [[SentryInstallation alloc] init];
        [installation install];
        [installation sendAllReports];
    });
    return YES;
}

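The work is wrapped in a dispatch_once, which runs only once and so prevents repeated initialization. From here on we open the door to a new world; the key method calls are shown in the sequence diagram below.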
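The once-only guarantee can be sketched in plain C with pthread_once, the POSIX counterpart of dispatch_once. This is only an illustration: demo_install, demo_start_crash_handler, and demo_install_count are hypothetical names, and the installer body is a stand-in for the real installation work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_once_t g_install_once = PTHREAD_ONCE_INIT;
static int g_install_count = 0;

/* Hypothetical stand-in for the work done inside the dispatch_once block. */
static void demo_install(void)
{
    g_install_count++;
}

/* Safe to call any number of times, from any thread: demo_install runs once. */
bool demo_start_crash_handler(void)
{
    int rc = pthread_once(&g_install_once, demo_install);
    assert(rc == 0);
    (void)rc;
    return true;
}

/* Exposes how many times the installer actually ran. */
int demo_install_count(void)
{
    return g_install_count;
}
```

Calling demo_start_crash_handler() repeatedly leaves demo_install_count() at 1, which mirrors why a second call to startCrashHandlerWithError does not reinstall the handlers.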

The last things to be initialized are a series of Monitors. According to the enum definition in the code, there are 9 kinds of them:

typedef enum
{
    /* Captures and reports Mach exceptions. */
    SentryCrashMonitorTypeMachException      = 0x01,

    /* Captures and reports POSIX signals. */
    SentryCrashMonitorTypeSignal             = 0x02,

    /* Captures and reports C++ exceptions.
     * Note: This will slightly slow down exception processing.
     */
    SentryCrashMonitorTypeCPPException       = 0x04,

    /* Captures and reports NSExceptions. */
    SentryCrashMonitorTypeNSException        = 0x08,

    /* Detects and reports a deadlock in the main thread. */
    SentryCrashMonitorTypeMainThreadDeadlock = 0x10,

    /* Accepts and reports user-generated exceptions. */
    SentryCrashMonitorTypeUserReported       = 0x20,

    /* Keeps track of and injects system information. */
    SentryCrashMonitorTypeSystem             = 0x40,

    /* Keeps track of and injects application state. */
    SentryCrashMonitorTypeApplicationState   = 0x80,

    /* Keeps track of zombies, and injects the last zombie NSException. */
    SentryCrashMonitorTypeZombie             = 0x100,
} SentryCrashMonitorType;

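As I understand it, these Monitors fall into two categories, as shown in the figure below (I present the categories by file name so that the code is easy to locate).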

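CrashMonitors are the monitors that can capture a crash, while ContextMonitors record context that is injected into the report. Note, however, that not all of these Monitors are necessarily initialized: Sentry initializes different Monitors depending on the situation. For example: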

void sentrycrashcm_setActiveMonitors(SentryCrashMonitorType monitorTypes)
{
    if(sentrycrashdebug_isBeingTraced() && (monitorTypes & SentryCrashMonitorTypeDebuggerUnsafe))
    {
        static bool hasWarned = false;
        if(!hasWarned)
        {
            hasWarned = true;
            SentryCrashLOGBASIC_WARN("    ************************ Crash Handler Notice ************************");
            SentryCrashLOGBASIC_WARN("    *     App is running in a debugger. Masking out unsafe monitors.     *");
            SentryCrashLOGBASIC_WARN("    * This means that most crashes WILL NOT BE RECORDED while debugging! *");
            SentryCrashLOGBASIC_WARN("    **********************************************************************");
        }
        monitorTypes &= SentryCrashMonitorTypeDebuggerSafe;
    }
    ...
}

When the app runs under a debugger, the requested monitors are masked down to the SentryCrashMonitorTypeDebuggerSafe set. That macro is defined as follows:

/** Monitors that are safe to enable in a debugger. */
#define SentryCrashMonitorTypeDebuggerSafe (SentryCrashMonitorTypeAll & (~SentryCrashMonitorTypeDebuggerUnsafe))
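The masking step can be sketched with a few illustrative flags. The DEMO_ names and the choice of which monitors count as debugger-unsafe are assumptions made for this example, not Sentry's actual definitions.

```c
#include <stdbool.h>

/* Illustrative flags only; names and membership are hypothetical. */
#define DEMO_MONITOR_MACH      0x01u
#define DEMO_MONITOR_SIGNAL    0x02u
#define DEMO_MONITOR_SYSTEM    0x40u
#define DEMO_MONITOR_APP_STATE 0x80u

#define DEMO_MONITOR_ALL \
    (DEMO_MONITOR_MACH | DEMO_MONITOR_SIGNAL | DEMO_MONITOR_SYSTEM | DEMO_MONITOR_APP_STATE)

/* Assumption: the crash-capturing monitors are the debugger-unsafe ones. */
#define DEMO_MONITOR_DEBUGGER_UNSAFE (DEMO_MONITOR_MACH | DEMO_MONITOR_SIGNAL)
#define DEMO_MONITOR_DEBUGGER_SAFE   (DEMO_MONITOR_ALL & ~DEMO_MONITOR_DEBUGGER_UNSAFE)

/* Returns the monitors that may be activated for the requested set. */
unsigned demo_effective_monitors(unsigned requested, bool being_traced)
{
    if (being_traced && (requested & DEMO_MONITOR_DEBUGGER_UNSAFE)) {
        /* Clear every unsafe bit and keep the rest. */
        requested &= DEMO_MONITOR_DEBUGGER_SAFE;
    }
    return requested;
}
```

With a debugger attached, requesting DEMO_MONITOR_ALL yields only the system and app-state bits; without one, the requested set passes through unchanged.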

Monitors

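Next we go through how each of these Monitors captures exceptions and records information, in the order of Sentry's enum definition. Friendly reminder: the code below may cause some discomfort. If you run out of patience, read it in several sittings.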

1. SentryCrashMonitorTypeMachException

This Monitor captures kernel (Mach) exceptions. The core code is as follows:

static bool installExceptionHandler()
{
    SentryCrashLOG_DEBUG("Installing mach exception handler.");

    bool attributes_created = false;
    pthread_attr_t attr;

    kern_return_t kr;
    int error;

    const task_t thisTask = mach_task_self();
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
    EXC_MASK_BAD_INSTRUCTION |
    EXC_MASK_ARITHMETIC |
    EXC_MASK_SOFTWARE |
    EXC_MASK_BREAKPOINT;

    // Back up the existing exception ports
    SentryCrashLOG_DEBUG("Backing up original exception ports.");
    kr = task_get_exception_ports(thisTask,
                                  mask,
                                  g_previousExceptionPorts.masks,
                                  &g_previousExceptionPorts.count,
                                  g_previousExceptionPorts.ports,
                                  g_previousExceptionPorts.behaviors,
                                  g_previousExceptionPorts.flavors);
    if(kr != KERN_SUCCESS)
    {
        SentryCrashLOG_ERROR("task_get_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    if(g_exceptionPort == MACH_PORT_NULL)
    {
        // Allocate a new port with receive rights
        SentryCrashLOG_DEBUG("Allocating new port with receive rights.");
        kr = mach_port_allocate(thisTask,
                                MACH_PORT_RIGHT_RECEIVE,
                                &g_exceptionPort);
        if(kr != KERN_SUCCESS)
        {
            SentryCrashLOG_ERROR("mach_port_allocate: %s", mach_error_string(kr));
            goto failed;
        }

        SentryCrashLOG_DEBUG("Adding send rights to port.");
        kr = mach_port_insert_right(thisTask,
                                    g_exceptionPort,
                                    g_exceptionPort,
                                    MACH_MSG_TYPE_MAKE_SEND);
        if(kr != KERN_SUCCESS)
        {
            SentryCrashLOG_ERROR("mach_port_insert_right: %s", mach_error_string(kr));
            goto failed;
        }
    }

    // Install the new port as the exception port
    SentryCrashLOG_DEBUG("Installing port as exception handler.");
    kr = task_set_exception_ports(thisTask,
                                  mask,
                                  g_exceptionPort,
                                  EXCEPTION_DEFAULT,
                                  THREAD_STATE_NONE);
    if(kr != KERN_SUCCESS)
    {
        SentryCrashLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    // Create the secondary exception thread
    SentryCrashLOG_DEBUG("Creating secondary exception thread (suspended).");
    pthread_attr_init(&attr);
    attributes_created = true;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    error = pthread_create(&g_secondaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadSecondary);
    if(error != 0)
    {
        SentryCrashLOG_ERROR("pthread_create_suspended_np: %s", strerror(error));
        goto failed;
    }
    g_secondaryMachThread = pthread_mach_thread_np(g_secondaryPThread);
    sentrycrashmc_addReservedThread(g_secondaryMachThread);

    // Create the primary exception thread
    SentryCrashLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadPrimary);
    if(error != 0)
    {
        SentryCrashLOG_ERROR("pthread_create: %s", strerror(error));
        goto failed;
    }
    pthread_attr_destroy(&attr);
    g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
    sentrycrashmc_addReservedThread(g_primaryMachThread);

    SentryCrashLOG_DEBUG("Mach exception handler installed.");
    return true;

failed:
    SentryCrashLOG_DEBUG("Failed to install mach exception handler.");
    if(attributes_created)
    {
        pthread_attr_destroy(&attr);
    }
    uninstallExceptionHandler();
    return false;
}

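Kernel functions such as task_get_exception_ports, mach_port_allocate, and task_set_exception_ports are probably unfamiliar to most readers; this was my first encounter with them too. Apple's official developer documentation, Kernel Functions, has no description for these functions. But don't panic! I found the open-source XNU repository, darwin-xnu, where descriptions of these functions can be looked up. There is also another site, which I found via Google, that is easier to read: web.mit.edu/darwin/src/…
If you are interested, dig into these kernel functions yourself; we will stop here for today.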

2. SentryCrashMonitorTypeSignal

static bool installSignalHandler()
{
    SentryCrashLOG_DEBUG("Installing signal handler.");

#if SentryCrashCRASH_HAS_SIGNAL_STACK

    if(g_signalStack.ss_size == 0)
    {
        // Allocate memory for the alternate signal handler stack
        SentryCrashLOG_DEBUG("Allocating signal stack area.");
        g_signalStack.ss_size = SIGSTKSZ;
        g_signalStack.ss_sp = malloc(g_signalStack.ss_size);
    }

    // Switch signal handling to the alternate stack
    SentryCrashLOG_DEBUG("Setting signal stack area.");
    if(sigaltstack(&g_signalStack, NULL) != 0)
    {
        SentryCrashLOG_ERROR("signalstack: %s", strerror(errno));
        goto failed;
    }
#endif

    const int* fatalSignals = sentrycrashsignal_fatalSignals();
    int fatalSignalsCount = sentrycrashsignal_numFatalSignals();

    if(g_previousSignalHandlers == NULL)
    {
        // Allocate memory to store the previous signal handlers
        SentryCrashLOG_DEBUG("Allocating memory to store previous signal handlers.");
        g_previousSignalHandlers = malloc(sizeof(*g_previousSignalHandlers)
                                          * (unsigned)fatalSignalsCount);
    }

    struct sigaction action = {{0}};
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
#if SentryCrashCRASH_HOST_APPLE && defined(__LP64__)
    action.sa_flags |= SA_64REGSET;
#endif
    // Initialize the signal mask to the empty set
    sigemptyset(&action.sa_mask);
    // Set the signal handler function
    action.sa_sigaction = &handleSignal;

    // Install the handler for each fatal signal in turn
    for(int i = 0; i < fatalSignalsCount; i++)
    {
        SentryCrashLOG_DEBUG("Assigning handler for signal %d", fatalSignals[i]);
        // If installation fails, restore the handlers replaced so far
        if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
        {
            char sigNameBuff[30];
            const char* sigName = sentrycrashsignal_signalName(fatalSignals[i]);
            if(sigName == NULL)
            {
                snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
                sigName = sigNameBuff;
            }
            SentryCrashLOG_ERROR("sigaction (%s): %s", sigName, strerror(errno));
            // Try to reverse the damage
            for(i--;i >= 0; i--)
            {
                sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
            }
            goto failed;
        }
    }
    SentryCrashLOG_DEBUG("Signal handlers installed.");
    return true;

failed:
    SentryCrashLOG_DEBUG("Failed to install signal handlers.");
    return false;
}

The core API here is sigaction. For reference, here is the official GNU documentation: www.gnu.org/software/li…
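The same pattern (alternate stack, save the previous handler, install a new one) can be reduced to a small POSIX sketch. All demo_ names are hypothetical, and the handler only records the signal number instead of writing a crash report.

```c
#define _XOPEN_SOURCE 700  /* expose sigaltstack and siginfo_t under strict -std modes */
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t g_last_signal = 0;
static struct sigaction g_previous_action;
static stack_t g_signal_stack;

/* Runs on the alternate stack; only async-signal-safe work belongs here. */
static void demo_handle_signal(int signum, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    g_last_signal = signum;
}

bool demo_install_signal_handler(int signum)
{
    if (g_signal_stack.ss_sp == NULL) {
        /* Give the handler its own stack so it still works after a stack overflow. */
        g_signal_stack.ss_size = SIGSTKSZ;
        g_signal_stack.ss_flags = 0;
        g_signal_stack.ss_sp = malloc(g_signal_stack.ss_size);
        if (g_signal_stack.ss_sp == NULL) {
            return false;
        }
    }
    if (sigaltstack(&g_signal_stack, NULL) != 0) {
        return false;
    }

    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = demo_handle_signal;

    /* The third argument receives the handler being replaced. */
    return sigaction(signum, &action, &g_previous_action) == 0;
}

/* Puts back whatever handler was installed before ours. */
bool demo_uninstall_signal_handler(int signum)
{
    return sigaction(signum, &g_previous_action, NULL) == 0;
}

int demo_last_signal(void)
{
    return (int)g_last_signal;
}
```

Saving the previous handler matters because other libraries may have installed their own; a crash reporter that overwrites them without a way back breaks the chain.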

3. SentryCrashMonitorTypeCPPException

This Monitor captures C++ exceptions. The core code is as follows:

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            initialize();

            sentrycrashid_generate(g_eventID);
            // Replace the terminate handler with CPPExceptionTerminate
            g_originalTerminateHandler = std::set_terminate(CPPExceptionTerminate);
        }
        else
        {
            std::set_terminate(g_originalTerminateHandler);
        }
        g_captureNextStackTrace = isEnabled;
    }
}

For a description of std::set_terminate, see the C++ reference: zh.cppreference.com/w/cpp/error…

4. SentryCrashMonitorTypeNSException

This one should be more familiar: it captures app-level exceptions (NSException). The core code is as follows:

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            // Back up the existing exception handler
            SentryCrashLOG_DEBUG(@"Backing up original handler.");
            g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();

            // Install the new exception handler
            SentryCrashLOG_DEBUG(@"Setting new handler.");
            NSSetUncaughtExceptionHandler(&handleUncaughtException);
            SentryCrash.sharedInstance.uncaughtExceptionHandler = &handleUncaughtException;
            SentryCrash.sharedInstance.currentSnapshotUserReportedExceptionHandler = &handleCurrentSnapshotUserReportedException;
        }
        ...
    }
}

For the usage of NSSetUncaughtExceptionHandler, see Apple's official documentation: developer.apple.com/documentati…

5. SentryCrashMonitorTypeMainThreadDeadlock

This Monitor detects a blocked main thread. The core code is as follows:

- (void) watchdogPulse
{
    __block id blockSelf = self;
    self.awaitingResponse = YES;
    dispatch_async(dispatch_get_main_queue(), ^
                   {
                       [blockSelf watchdogAnswer];
                   });
}

- (void) watchdogAnswer
{
    self.awaitingResponse = NO;
}

- (void) runMonitor
{
    BOOL cancelled = NO;
    do
    {
        // Only do a watchdog check if the watchdog interval is > 0.
        // If the interval is <= 0, just idle until the user changes it.
        @autoreleasepool {
            NSTimeInterval sleepInterval = g_watchdogInterval;
            BOOL runWatchdogCheck = sleepInterval > 0;
            if(!runWatchdogCheck)
            {
                sleepInterval = kIdleInterval;
            }
            [NSThread sleepForTimeInterval:sleepInterval];
            cancelled = self.monitorThread.isCancelled;
            if(!cancelled && runWatchdogCheck)
            {
                if(self.awaitingResponse)
                {
                    [self handleDeadlock];
                }
                else
                {
                    [self watchdogPulse];
                }
            }
        }
    } while (!cancelled);
}

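Detection is based on how long the main thread takes to run a block: the monitor thread dispatches a block to the main queue, sleeps for the watchdog interval, and if the block still has not run by then (awaitingResponse is still YES), the main thread is considered blocked.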
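The pulse/answer handshake boils down to a tiny state machine. The sketch below is a single-threaded simulation in C with hypothetical names (DemoWatchdog, demo_watchdog_tick, demo_watchdog_answer); the real monitor runs the tick on a background thread and the answer on the main queue.

```c
#include <stdbool.h>

typedef struct {
    bool awaiting_response;  /* set when a pulse is sent, cleared by the main thread */
    int  deadlocks_detected; /* how many unanswered pulses were seen */
} DemoWatchdog;

/* What the main thread does once it gets around to running the dispatched block. */
void demo_watchdog_answer(DemoWatchdog *w)
{
    w->awaiting_response = false;
}

/* One iteration of the monitor loop, run after sleeping for the watchdog interval.
 * Returns true if the previous pulse was never answered. */
bool demo_watchdog_tick(DemoWatchdog *w)
{
    if (w->awaiting_response) {
        w->deadlocks_detected++;
        return true;
    }
    w->awaiting_response = true; /* send a new pulse */
    return false;
}
```

Two consecutive ticks with no answer in between report a deadlock, which corresponds to the main thread failing to run the block within one watchdog interval.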

6. SentryCrashMonitorTypeUserReported

This covers exceptions that the user reports explicitly by calling the following API:

[SentryClient.sharedClient reportUserException:<#(NSString *)name#> 
                                        reason:<#(NSString *)reason#> 
                                      language:<#(NSString *)language#> 
                                    lineOfCode:<#(NSString *)lineOfCode#> 
                                    stackTrace:<#(NSArray *)stackTrace#> 
                                 logAllThreads:<#(BOOL)logAllThreads#> 
                              terminateProgram:<#(BOOL)terminateProgram#>];

7. SentryCrashMonitorTypeSystem

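This Monitor records and injects system state, such as the OS version, the kernel version, and whether the app is running in a simulator. Part of the code is shown below: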

static void initialize()
{
    static bool isInitialized = false;
    if(!isInitialized)
    {
        isInitialized = true;
        ...
        g_systemData.kernelVersion = stringSysctl("kern.version");
        g_systemData.osVersion = stringSysctl("kern.osversion");
        g_systemData.isJailbroken = isJailbroken();
        g_systemData.bootTime = dateSysctl("kern.boottime");
        g_systemData.appStartTime = dateString(time(NULL));
        g_systemData.executablePath = cString(getExecutablePath());
        g_systemData.executableName = cString(infoDict[@"CFBundleExecutable"]);
        g_systemData.bundleID = cString(infoDict[@"CFBundleIdentifier"]);
        g_systemData.bundleName = cString(infoDict[@"CFBundleName"]);
        g_systemData.bundleVersion = cString(infoDict[@"CFBundleVersion"]);
        g_systemData.bundleShortVersion = cString(infoDict[@"CFBundleShortVersionString"]);
        g_systemData.appID = getAppUUID();
        g_systemData.cpuArchitecture = getCurrentCPUArch();
        g_systemData.cpuType = sentrycrashsysctl_int32ForName("hw.cputype");
        g_systemData.cpuSubType = sentrycrashsysctl_int32ForName("hw.cpusubtype");
        g_systemData.binaryCPUType = header->cputype;
        g_systemData.binaryCPUSubType = header->cpusubtype;
        g_systemData.timezone = cString([NSTimeZone localTimeZone].abbreviation);
        g_systemData.processName = cString([NSProcessInfo processInfo].processName);
        g_systemData.processID = [NSProcessInfo processInfo].processIdentifier;
        g_systemData.parentProcessID = getppid();
        g_systemData.deviceAppHash = getDeviceAndAppHash();
        g_systemData.buildType = getBuildType();
        g_systemData.storageSize = getStorageSize();
        g_systemData.memorySize = sentrycrashsysctl_uint64ForName("hw.memsize");
    }
}
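The idea of collecting system facts once at startup can be tried outside iOS with a small sketch. Note the swap: the Sentry code reads values through sysctl, whereas this example uses the portable POSIX uname call as a stand-in, and DemoSystemInfo and demo_collect_system_info are hypothetical names.

```c
#include <stdio.h>
#include <sys/utsname.h>

typedef struct {
    char kernel_name[64];
    char kernel_release[64];
    char machine[64];
} DemoSystemInfo;

/* Fills `out` from uname(); returns 0 on success, -1 on failure. */
int demo_collect_system_info(DemoSystemInfo *out)
{
    struct utsname u;
    if (uname(&u) != 0) {
        return -1;
    }
    /* snprintf always NUL-terminates and truncates overly long values. */
    snprintf(out->kernel_name, sizeof(out->kernel_name), "%s", u.sysname);
    snprintf(out->kernel_release, sizeof(out->kernel_release), "%s", u.release);
    snprintf(out->machine, sizeof(out->machine), "%s", u.machine);
    return 0;
}
```

Copying the values into fixed-size buffers ahead of time means they can later be written into a report without further system calls or allocation.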

8. SentryCrashMonitorTypeApplicationState

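This Monitor records and injects app state, such as the launch time and whether the app is in the foreground. It is fairly simple, so I will not go into detail here.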

9. SentryCrashMonitorTypeZombie

This Monitor tracks zombie objects and injects their information. The core code is as follows:

#define CREATE_ZOMBIE_HANDLER_INSTALLER(CLASS) \
static IMP g_originalDealloc_ ## CLASS; \
static void handleDealloc_ ## CLASS(id self, SEL _cmd) \
{ \
    handleDealloc(self); \
    typedef void (*fn)(id,SEL); \
    fn f = (fn)g_originalDealloc_ ## CLASS; \
    f(self, _cmd); \
} \
static void installDealloc_ ## CLASS() \
{ \
    Method method = class_getInstanceMethod(objc_getClass(#CLASS), sel_registerName("dealloc")); \
    g_originalDealloc_ ## CLASS = method_getImplementation(method); \
    method_setImplementation(method, (IMP)handleDealloc_ ## CLASS); \
}

CREATE_ZOMBIE_HANDLER_INSTALLER(NSObject)
CREATE_ZOMBIE_HANDLER_INSTALLER(NSProxy)

static void install()
{
    // Allocate the zombie cache
    unsigned cacheSize = CACHE_SIZE;
    g_zombieHashMask = cacheSize - 1;
    g_zombieCache = calloc(cacheSize, sizeof(*g_zombieCache));
    if(g_zombieCache == NULL)
    {
        SentryCrashLOG_ERROR("Error: Could not allocate %u bytes of memory. SentryCrashZombie NOT installed!",
              cacheSize * sizeof(*g_zombieCache));
        return;
    }

    g_lastDeallocedException.class = objc_getClass("NSException");
    g_lastDeallocedException.address = NULL;
    g_lastDeallocedException.name[0] = 0;
    g_lastDeallocedException.reason[0] = 0;

    // Hook the dealloc methods
    installDealloc_NSObject();
    installDealloc_NSProxy();
}

This hooks the dealloc methods of two classes, NSObject and NSProxy, using Method Swizzling. For background on how Method Swizzling works, see this introductory article: iOS黑魔法-Method Swizzling
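The save-then-replace pattern behind the macro can be shown with plain C function pointers. This is an analogy rather than Objective-C runtime code: DemoClass and every demo_ function are hypothetical, with a struct field standing in for the dealloc method slot.

```c
#include <stddef.h>

typedef void (*DemoDeallocFn)(void *object);

/* Hypothetical "class" whose dealloc slot can be replaced at run time. */
typedef struct {
    DemoDeallocFn dealloc;
} DemoClass;

static DemoDeallocFn g_original_dealloc = NULL;
static void *g_last_dealloced = NULL;
static int g_original_calls = 0;

static void demo_original_dealloc(void *object)
{
    (void)object;
    g_original_calls++;
}

/* The replacement: record the object, then forward to the saved implementation. */
static void demo_hooked_dealloc(void *object)
{
    g_last_dealloced = object;
    g_original_dealloc(object);
}

DemoClass demo_make_class(void)
{
    DemoClass cls = { demo_original_dealloc };
    return cls;
}

/* Save the current implementation, then swap in the hook. */
void demo_install_hook(DemoClass *cls)
{
    g_original_dealloc = cls->dealloc;
    cls->dealloc = demo_hooked_dealloc;
}

void *demo_last_dealloced(void) { return g_last_dealloced; }
int demo_original_calls(void) { return g_original_calls; }
```

After demo_install_hook, every call through cls.dealloc passes through the hook first and still reaches the original, which is the same shape as handleDealloc_ ## CLASS calling g_originalDealloc_ ## CLASS.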

Summary

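That's it: by now you should have a basic picture of how Sentry captures the various kinds of crash events. The next part will cover how Sentry records and sends those events.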