iOS 底层探索篇 —— cache_t分析

872 阅读7分钟

前言

观察cache_t结构体

1. 类结构体

struct objc_class : objc_object {
    // Class ISA;           //8
    Class superclass;       //8
    cache_t cache;          //16        // formerly cache pointer and vtable
    class_data_bits_t bits;  
...省略部分信息...
};

2. cache_t结构体

struct cache_t {
    struct bucket_t *_buckets;  //8
    mask_t _mask;               //4      typedef uint32_t mask_t;  
    mask_t _occupied;           //4

public:
    struct bucket_t *buckets();
    mask_t mask();
    mask_t occupied();
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void initializeToEmpty();

    mask_t capacity();
    bool isConstantEmptyCache();
    bool canBeFreed();

    static size_t bytesForCapacity(uint32_t cap);
    static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap);

    void expand();
    void reallocate(mask_t oldCapacity, mask_t newCapacity);
    struct bucket_t * find(SEL sel, id receiver);

    static void bad_cache(id receiver, SEL sel, Class isa) __attribute__((noreturn));
};

3. struct bucket_t *_buckets

struct bucket_t {
private:
#if __arm64__
    uintptr_t _imp;   //typedef unsigned long           uintptr_t;
    SEL _sel;
#else
    SEL _sel;
    uintptr_t _imp;
#endif
    ...省略部分信息...
};
  • _buckets一个指向bucket_t结构体的指针,可以看命名是一个数组;
  • _imp函数实现地址;
  • _sel函数;
  • 注意在arm64的平台上_imp_sel的前面。

4. mask_t _mask

一个修饰的值,在接下来的分析可以得到是 hash链表的长度减1。

5. mask_t _occupied

缓存已占用的空间

cache_t缓存的内容

通过bucket_t结构体观察存的就是impsel,证实缓存的是OC的方法。

cache_t缓存的流程

1. 缓存入口

我们在objc的源码objc_cache.mm文件的开始位置可以看到这么一段注释

 * Cache readers (PC-checked by collecting_in_critical())
 * objc_msgSend*
 * cache_getImp
 *
 * Cache writers (hold cacheUpdateLock while reading or writing; not PC-checked)
 * cache_fill         (acquires lock)
 * cache_expand       (only called from cache_fill)
 * cache_create       (only called from cache_expand)
 * bcopy               (only called from instrumented cache_expand)
 * flush_caches        (acquires lock)
 * cache_flush        (only called from cache_fill and flush_caches)
 * cache_collect_free (only called from cache_expand and cache_flush)

既然是缓存就是要写入,我们直接从Cache writers下面看,cache_fill就是缓存开始的位置。

2. cache_fill

void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
    mutex_locker_t lock(cacheUpdateLock);
    cache_fill_nolock(cls, sel, imp, receiver);
#else
    _collecting_in_critical();
    return;
#endif
}

忽略其他信息,只有cache_fill_nolock这个函数是我们的目标。

3. cache_fill_nolock

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();

    // Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasnot added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;

    cache_t *cache = getCache(cls);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
       // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(sel, receiver);
    if (bucket->sel() == 0) cache->incrementOccupied();
    bucket->set<Atomic>(sel, imp);
}

cache_t缓存的流程深入

从上面的分析我们知道在函数cache_fill_nolock里面会实现缓存步骤。接下来开始对每一步深入研究。

1. 缓存主线流程分析

1. 条件判断

 if (!cls->isInitialized()) return;
 if (cache_getImp(cls, sel)) return;
  • 判断类是否已经初始化,还没有就直接返回了。
  • 在类里面是否可以找到sel,可以找到,说明已经缓存了,直接返回。

2. cache_t *cache = getCache(cls)

cache_t *getCache(Class cls) 
{
    assert(cls);
    return &cls->cache;
}

从当前类获取cache成员。

3. mask_t newOccupied = cache->occupied() + 1

  • occupied()函数返回的是_occupied,即当前已经占用的空间大小。
  • _occupied + 1 , 表示新的即将占用的内存空间大小。

4. mask_t capacity = cache->capacity()

mask_t cache_t::capacity() 
{
    return mask() ? mask()+1 : 0; 
    //mask() return _mask;
}

获取缓存的容量,注意这里 + 1,原因是因为赋值的时候,用了容量 - 1。

5. 缓存容量的处理

if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        //INIT_CACHE_SIZE 4
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }
  • cache是空的,就需要reallocte
  • newOccupied小于或者等于总容量的3/4,就走后面的。
  • 如果大于3/4,就需要expand()即缓存扩容。

6. 查找bucket以及赋值

bucket_t *bucket = cache->find(sel, receiver);
if (bucket->sel() == 0) cache->incrementOccupied();
bucket->set<Atomic>(sel, imp);

缓存的主线流程就是这样的一个过程的,接下来把第5步和第6步进行分析。

2. 缓存主线流程 容量处理过程查找过程分析

1. 容量处理过程

  1. cache->isConstantEmptyCache()
bool cache_t::isConstantEmptyCache()
{
    return 
        occupied() == 0  &&  
        buckets() == emptyBucketsForCapacity(capacity(), false);
}
  • 第一次进来的时候occupied()返回值_occupoed值为0。
  • buckets()返回值_buckets是一个还没有初始化的指针为nil。
bucket_t *emptyBucketsForCapacity(mask_t capacity, bool allocate = true)
{
    cacheUpdateLock.assertLocked();

    size_t bytes = cache_t::bytesForCapacity(capacity);
   
    // Use _objc_empty_cache if the buckets is small enough.

    if (bytes <= EMPTY_BYTES) {
        return (bucket_t *)&_objc_empty_cache;
    }
    
    *
    走到这个就会直接返回了
    bucket_t *          8                          1000
    _objc_empty_cache   16                        10000
                        &                         00000
    *
    
    //下面的代码与我们的流程没有关系 暂不分析
    // Use shared empty buckets allocated on the heap.
    static bucket_t **emptyBucketsList = nil;
    static mask_t emptyBucketsListCount = 0;
    
    mask_t index = log2u(capacity);

    if (index >= emptyBucketsListCount) {
        if (!allocate) return nil;

        mask_t newListCount = index + 1;
        bucket_t *newBuckets = (bucket_t *)calloc(bytes, 1);
        emptyBucketsList = (bucket_t**)
            realloc(emptyBucketsList, newListCount * sizeof(bucket_t *));
        // Share newBuckets for every un-allocated size smaller than index.
        // The array is therefore always fully populated.
        for (mask_t i = emptyBucketsListCount; i < newListCount; i++) {
            emptyBucketsList[i] = newBuckets;
        }
        emptyBucketsListCount = newListCount;

        if (PrintCaches) {
            _objc_inform("CACHES: new empty buckets at %p (capacity %zu)", 
                         newBuckets, (size_t)capacity);
        }
    }

    return emptyBucketsList[index];
}
  • cache_t::bytesForCapacity(capacity),参数capacity为0,实现为sizeof(bucket_t) * (cap + 1)bucket_t结构体里面有两个成员都是8字节总共16字节,运算结果为16。
  • EMPTY_BYTES,一个宏定义,在DEBUG值为(8+1)*16,否则为(1024+1)*16
  • if (bytes <= EMPTY_BYTES)显然这个条件是满足的。
  • (bucket_t *)&_objc_empty_cache,这是一个&运算,OBJC_EXPORT struct objc_cache _objc_empty_cache这是_objc_empty_cache的定义,它的字节占用的值为16,我们在objc_cache这个结构体里面是可以看到的。 8 & 16 记过计算出来为0。
  • if条件里面返回了,后面的流程就不用看了。

则第一次进来的时候这个条件判断是否为空的缓存是存在的。

  1. cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE)

表达的意思为缓存空的时候,就去开辟。

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    bool freeOld = canBeFreed();
   
    bucket_t *oldBuckets = buckets();

    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache is old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    setBucketsAndMask(newBuckets, newCapacity - 1);
  
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}
  • bool freeOld = canBeFreed(),函数的实现就是!isConstantEmptyCache(),则freeOld的值为true
  • bucket_t *newBuckets = allocateBuckets(newCapacity),根据容量 系统开辟一个新的buckets
  • setBucketsAndMask(newBuckets, newCapacity - 1)。做了三件事,空的_buckets赋值;_mask赋值,注意这里是容量 - 1;_occupied已占用的空间置为空。
  • if (freeOld)通过上面的取值以及知道是成立的,主要做了清空旧的缓存。
  1. newOccupied <= capacity / 4 * 3

这个意思就是新占用的空间是否小于总容量的3/4。

  1. cache->expand()

同比第7步,如果容量大于3/4的时候,内存需要扩容。

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    
    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can it grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}
  • uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;,这里扩容后的容量会是旧容量的两倍。
  • reallocate(oldCapacity, newCapacity),这一步如同刚开始没有缓存的时候,重新让系统来开辟。

2. 查找过程

  1. bucket_t *bucket = cache->find(sel, receiver)
bucket_t * cache_t::find(SEL s, id receiver)
{
    assert(s != 0);

    bucket_t *b = buckets(); //获取_buckets
    mask_t m = mask(); //获取_mask

    mask_t begin = cache_hash(s, m);
    mask_t i = begin;
    do {
        if (b[i].sel() == 0  ||  b[i].sel() == s) {
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin); 
    
    // hack
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)s, cls);
}
  • cache_hash(sel, _mask),这是一个hash算法return (mask_t)(uintptr_t)sel & mask,这里也可以知道_mask会比总容量小1的原因,就是这里计算的时候保证不会越界。
  • do...while循环,当_sel不存在或者在缓存里面匹配到了,就返回当前的bucket。循环的条件在arm64下面是return i ? i-1 : mask;,实际上就是形成一个闭环来查找。
  1. if (bucket->sel() == 0) cache->incrementOccupied()
void cache_t::incrementOccupied() 
{
    _occupied++;
}

表示在缓存里面没有找到的时候,这个时候要存到缓存里面,则已占用的空间需要增加。

  1. bucket->set<Atomic>(sel, imp)
void bucket_t::set(SEL newSel, IMP newImp)
{
    _imp = (uintptr_t)newImp;
    
    if (_sel != newSel) {
        if (atomicity == Atomic) {
            mega_barrier();
        }
        _sel = newSel;
    }
}

这一步就是表示当前取出来的bucket来重新赋值_sel_imp这两个成员。

到此为止cache的缓存流程就结束了。

学习之路,砥砺前行

不足之处可以在评论区指出