SIL optimizer - string append optimization

Introduction

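This is an optimization the Swift developers recently made to Swift String. The PR is "SIL optimizer: Add a new string optimization" #33128. According to its description, it covers the following cases:

  • When x in the expression x.append(y) is empty, replace it with x = y.
  • Remove x.append("").
  • When x and y in x.append(y) are both constant strings, replace it with x = x + y.
  • If T is statically known, replace _typeName(T.self) with a constant string.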

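SIL analysis

Because this optimization is implemented by adding a SIL pass, that is, it happens at the SIL level, we first need a basic understanding of the SIL instructions involved in String append.

Consider a simple example. Create a file named String.swift and add the following code: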


func stringTest() -> String {
    var string = "Hello"
    string.append("Roy")
    return string
}

The code is simple: it creates a string and then appends another string to its end with string.append.

swiftc -emit-sil String.swift > String.sil

The command above compiles the Swift source to SIL. The output is not long:

// stringTest()
sil hidden @$s6String10stringTestSSyF : $@convention(thin) () -> @owned String {
bb0:
  %0 = alloc_stack $String, var, name "string"    // users: %7, %24, %23, %14, %19
  %1 = string_literal utf8 "Hello"                // user: %6
  %2 = integer_literal $Builtin.Word, 5           // user: %6
  %3 = integer_literal $Builtin.Int1, -1          // user: %6
  %4 = metatype $@thin String.Type                // user: %6
  // function_ref String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:)
  %5 = function_ref @$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // user: %6
  %6 = apply %5(%1, %2, %3, %4) : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // user: %7
  store %6 to %0 : $*String                       // id: %7
  %8 = string_literal utf8 "Roy"                  // user: %13
  %9 = integer_literal $Builtin.Word, 3           // user: %13
  %10 = integer_literal $Builtin.Int1, -1         // user: %13
  %11 = metatype $@thin String.Type               // user: %13
  // function_ref String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:)
  %12 = function_ref @$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // user: %13
  %13 = apply %12(%8, %9, %10, %11) : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // users: %18, %16
  %14 = begin_access [modify] [static] %0 : $*String // users: %17, %16
  // function_ref String.append(_:)
  %15 = function_ref @$sSS6appendyySSF : $@convention(method) (@guaranteed String, @inout String) -> () // user: %16
  %16 = apply %15(%13, %14) : $@convention(method) (@guaranteed String, @inout String) -> ()
  end_access %14 : $*String                       // id: %17
  release_value %13 : $String                     // id: %18
  %19 = begin_access [read] [static] %0 : $*String // users: %20, %22
  %20 = load %19 : $*String                       // users: %25, %21
  retain_value %20 : $String                      // id: %21
  end_access %19 : $*String                       // id: %22
  destroy_addr %0 : $*String                      // id: %23
  dealloc_stack %0 : $*String                     // id: %24
  return %20 : $String                            // id: %25
} // end sil function '$s6String10stringTestSSyF'

// String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:)
sil [serialized] [always_inline] [readonly] [_semantics "string.makeUTF8"] @$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String

// String.append(_:)
sil [_semantics "string.append"] @$sSS6appendyySSF : $@convention(method) (@guaranteed String, @inout String) -> ()

Let's walk through it briefly.

  %0 = alloc_stack $String, var, name "string"    // users: %7, %24, %23, %14, %19
  %1 = string_literal utf8 "Hello"                // user: %6
  %2 = integer_literal $Builtin.Word, 5           // user: %6
  %3 = integer_literal $Builtin.Int1, -1          // user: %6
  %4 = metatype $@thin String.Type                // user: %6
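
  • alloc_stack T allocates (uninitialized) memory on the stack to hold a T and returns the address of the allocated memory.
  • $String is the type we allocate memory for, String. Types in SIL are prefixed with $.
  • %1 = string_literal utf8 "Hello" creates a reference to a string in the global string table. The result is a pointer to the data, and the referenced string is always null-terminated. The literal value is written in Swift's string literal syntax; the encoding here is utf8.
  • integer_literal $Builtin.Word, 5 creates an integer literal of type Builtin.Word with value 5. It is passed as utf8CodeUnitCount: "Hello" occupies 5 UTF-8 code units.
  • integer_literal $Builtin.Int1, -1 creates an integer literal of type Builtin.Int1 with value -1. Bool is also represented as Builtin.Int1 in SIL, and -1 (the single bit set) means true. It is passed as isASCII.
  • metatype $T.Type creates a reference to the metatype object for type T. Here we get a reference to the type String. Note that this is a concrete type, since it contains no placeholder types.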
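
To make the two integer literals concrete, here is a minimal C++ sketch (not compiler code) that derives the same two values from a literal's bytes. The names LiteralArgs and literalArgs are invented for this illustration, and the input is assumed to be UTF-8 encoded already.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for two of the arguments passed to
// String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:).
struct LiteralArgs {
  std::size_t utf8CodeUnitCount; // number of bytes in the UTF-8 encoding
  bool isASCII;                  // true if every byte is below 0x80
};

// Assumes `utf8` already holds UTF-8 encoded bytes.
LiteralArgs literalArgs(const std::string &utf8) {
  LiteralArgs args{utf8.size(), true};
  for (unsigned char byte : utf8) {
    if (byte >= 0x80) {
      args.isASCII = false;
      break;
    }
  }
  return args;
}
```

For "Hello" this gives a count of 5 and an ASCII flag of true, matching the literals 5 and -1 in the SIL above.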
 // function_ref String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:)
  %5 = function_ref @$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // user: %6
  %6 = apply %5(%1, %2, %3, %4) : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // user: %7
  store %6 to %0 : $*String                       // id: %7
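
  • %5 is a reference to a function, String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:), the String initializer for literals. At the SIL level it takes 4 parameters, of types Builtin.RawPointer, Builtin.Word, Builtin.Int1, and @thin String.Type, and it returns a String.
  • apply is a function call. It calls %5 with the arguments %1, %2, %3, %4, and the result is stored in %6.
  • store is a memory access instruction that stores the value %6 to the memory at address %0. %0 has type $*String, the address of a String, and %6 has type String. The store overwrites the memory at %0.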
  %8 = string_literal utf8 "Roy"                  // user: %13
  ....
  %13 = apply %12(%8, %9, %10, %11) : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // users: %18, %16

These instructions work the same way as the ones above; they initialize the string "Roy".

  %14 = begin_access [modify] [static] %0 : $*String // users: %17, %16
  // function_ref String.append(_:)
  %15 = function_ref @$sSS6appendyySSF : $@convention(method) (@guaranteed String, @inout String) -> () // user: %16
  %16 = apply %15(%13, %14) : $@convention(method) (@guaranteed String, @inout String) -> ()
  end_access %14 : $*String                       // id: %17
  release_value %13 : $String                     // id: %18
  %19 = begin_access [read] [static] %0 : $*String // users: %20, %22

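  • begin_access begins an access to the memory at %0; [modify] marks it as a modifying access.
  • %15 is a reference to the function String.append(_:), the append method of String. It takes 2 parameters, a @guaranteed String and an @inout String, and returns nothing.
  • apply is a function call. It calls %15 with the arguments %13 and %14, where %13 is the string "Roy" and %14 is the address of the variable that holds "Hello".

The two parameter conventions, @guaranteed and @inout, and the begin_access instruction need some explanation.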

@guaranteed

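The SIL ownership model makes it possible to express static lifetime invariants and have them enforced in the SIL IR along SSA edges. It is a derived form of SSA used to express ownership invariants along def-use edges.

Guaranteed is one of the ValueOwnershipKind values. A value with @guaranteed ownership is an immutable value with a scoped lifetime, derived from an @owned value. The lifetime of the @guaranteed value constrains the lifetime of the @owned value: it statically prevents the @owned value from being destroyed until the lifetime of the @guaranteed value has ended.

In the SIL above, the function call that produces %13 returns an @owned String, which matches this description.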

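@inout arguments

The following explanation is from master/docs/SIL.rst#inout-arguments:

@inout arguments are passed into the entry point by address. The callee does not take ownership of the referenced memory. The referenced memory must be initialized upon function entry and exit. If the @inout argument refers to a fragile physical variable, then the argument is the address of that variable. If the @inout argument refers to a logical property, then the argument is the address of a caller-owned writeback buffer. It is the caller's responsibility to initialize the buffer by storing the result of the property getter prior to calling the function, and to write back to the property on return by loading from the buffer and invoking the setter with the final value.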

func setToOne(_ x: inout Int) {
  x = 1
}

With a function like this, the value of the argument passed in can be modified inside the function.
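
The writeback protocol for logical properties can be sketched in C++ as an analogy. This is not how the Swift compiler is implemented; Celsius and callWithWriteback are hypothetical names, and a C++ reference parameter stands in for the @inout address.

```cpp
#include <cassert>

// Callee: takes its argument by address, like an @inout parameter.
void setToOne(int &x) { x = 1; }

// Hypothetical type with a "logical" (computed) property: the value is only
// reachable through a getter and a setter, not through a stable address.
class Celsius {
  int kelvin_ = 273;
public:
  int get() const { return kelvin_ - 273; }
  void set(int celsius) { kelvin_ = celsius + 273; }
  int kelvin() const { return kelvin_; }
};

// Caller-side protocol for a logical property, as described above.
void callWithWriteback(Celsius &temperature) {
  int buffer = temperature.get(); // initialize the buffer from the getter
  setToOne(buffer);               // pass the buffer's address to the callee
  temperature.set(buffer);        // write the final value back via the setter
}
```

For a plain local variable (the "physical variable" case) the caller would simply pass the variable itself, as in setToOne(someInt).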

begin_access

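begin_access and end_access are memory access instructions: begin_access starts an access to memory and end_access ends it. Every access must be ended exactly once on every control-flow path.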
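
The rule that an access must end on every control-flow path resembles a scope guard in C++. The sketch below is only an analogy under that assumption; Location and AccessScope are made-up names, not SIL or compiler types.

```cpp
#include <cassert>

// Hypothetical model of one memory location's access state.
struct Location {
  int value = 0;
  int openAccesses = 0; // begin_access calls without a matching end_access
};

// "begin_access" in the constructor, "end_access" in the destructor, so the
// access is closed on every path that leaves the scope.
class AccessScope {
  Location &loc_;
public:
  explicit AccessScope(Location &loc) : loc_(loc) { ++loc_.openAccesses; }
  ~AccessScope() { --loc_.openAccesses; }
  AccessScope(const AccessScope &) = delete;
  AccessScope &operator=(const AccessScope &) = delete;
  int &address() { return loc_.value; }
};

// Two exits, but the access ends on both of them.
bool incrementIfSmall(Location &loc, int limit) {
  AccessScope access(loc);
  if (access.address() >= limit)
    return false; // early exit: the destructor still runs
  ++access.address();
  return true;
}
```

In SIL the compiler has to emit end_access explicitly on each path; the destructor here only plays that role for the purpose of the illustration.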

StringOptimizationPass

StringOptimizationPass inherits from SILFunctionTransform; it is a SIL pass.

/// The StringOptimization function pass.
class StringOptimizationPass : public SILFunctionTransform {
public:

  void run() override {
    SILFunction *F = getFunction();
    if (!F->shouldOptimize())
      return;

    LLVM_DEBUG(llvm::dbgs() << "*** StringOptimization on function: "
                            << F->getName() << " ***\n");

    StringOptimization stringOptimization;
    bool changed = stringOptimization.run(F);

    if (changed) {
      invalidateAnalysis(SILAnalysis::InvalidationKind::CallsAndInstructions);
    }
  }
};

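This class is simple. void run() is the entry point: it obtains the SILFunction via getFunction(), returns early if the function should not be optimized, and otherwise calls the run method of StringOptimization with the SILFunction as its argument.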

StringOptimization

run

/// The main entry point of the optimization.
bool StringOptimization::run(SILFunction *F) {
  /// Look up the declaration of the standard library's String type.
  NominalTypeDecl *stringDecl = F->getModule().getASTContext().getStringDecl();
  /// Without that declaration there is nothing to optimize.
  if (!stringDecl)
    return false;
  stringType = SILType::getPrimitiveObjectType(
                 CanType(stringDecl->getDeclaredType()));

  /// Tracks whether the SIL was modified, i.e. whether any optimization was
  /// applied. Starts out as false.
  bool changed = false;
  
  /// Visit every SILBasicBlock of the SILFunction and optimize it.
  for (SILBasicBlock &block : *F) {
    changed |= optimizeBlock(block);
  }
  return changed;
}

optimizeBlock

/// Run the optimization on a basic block.
bool StringOptimization::optimizeBlock(SILBasicBlock &block) {
  bool changed = false;
  
  /// Maps identifyable objects (alloc_stack, inout parameters) to the string
  /// values which are stored in those objects.
  llvm::DenseMap<SILValue, SILValue> storedStrings;
  
  /// Iterate over the SILInstructions of the SILBasicBlock.
  for (auto iter = block.begin(); iter != block.end();) {
    SILInstruction *inst = &*iter++;
    /// Look for a StoreInst.
    if (StoreInst *store = isStringStoreToIdentifyableObject(inst)) {
      /// Record the mapping from the store's destination address to the
      /// value being stored.
      storedStrings[store->getDest()] = store->getSrc();
      continue;
    }
    /// Look for an apply of string.append with 2 arguments.
    if (ApplyInst *append = isSemanticCall(inst, semantics::STRING_APPEND, 2)) {
      /// Optimize String.append.
      if (optimizeStringAppend(append, storedStrings)) {
        changed = true;
        continue;
      }
    }
    /// Look for an apply of typeName with 2 arguments.
    if (ApplyInst *typeName = isSemanticCall(inst, semantics::TYPENAME, 2)) {
      if (optimizeTypeName(typeName)) {
        changed = true;
        continue;
      }
    }
    // Remove items from storedStrings if inst overwrites (or potentially
    // overwrites) a stored String in an identifyable object.
    invalidateModifiedObjects(inst, storedStrings);
  }
  return changed;
}

StoreInst *store = isStringStoreToIdentifyableObject(inst)

Here store is the instruction store %6 to %0 : $*String // id: %7 from the example.

storedStrings[store->getDest()] = store->getSrc();

enum {
    /// the value being stored
    Src,
    /// the lvalue being stored to
    Dest
  };

  SILValue getSrc() const { return Operands[Src].get(); }
  SILValue getDest() const { return Operands[Dest].get(); }

store->getDest() returns the destination operand of the store instruction, which is %0. store->getSrc() returns the value being stored, which is %6.
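
The bookkeeping done with storedStrings can be modelled in a few lines. The sketch below is a heavily simplified toy, not the pass itself: Inst and knownValueAtAppends are hypothetical, values are plain names such as "%0", and every append is assumed to invalidate its destination (the real pass instead updates the entry when it rewrites the append).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified instruction: just enough to show the
// bookkeeping. Values and addresses are referred to by name ("%0", "%6", ...).
struct Inst {
  enum Kind { Store, Append, UnknownWrite } kind;
  std::string src;  // Store: stored value; Append: appended value
  std::string dest; // address that is written
};

// For each Append, report which stored value is known to be at its
// destination ("" if unknown), walking the block once from top to bottom.
std::vector<std::string> knownValueAtAppends(const std::vector<Inst> &block) {
  std::unordered_map<std::string, std::string> storedStrings;
  std::vector<std::string> result;
  for (const Inst &inst : block) {
    switch (inst.kind) {
    case Inst::Store:
      storedStrings[inst.dest] = inst.src; // dest address -> stored value
      break;
    case Inst::Append: {
      auto it = storedStrings.find(inst.dest);
      result.push_back(it == storedStrings.end() ? "" : it->second);
      storedStrings.erase(inst.dest); // the append itself modifies dest
      break;
    }
    case Inst::UnknownWrite:
      storedStrings.erase(inst.dest); // value at dest is no longer known
      break;
    }
  }
  return result;
}
```

With a store of %6 to %0 followed by two appends to %0, only the first append sees a known value (%6); the map is per block, which is why it is created inside optimizeBlock.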

ApplyInst *append = isSemanticCall(inst, semantics::STRING_APPEND, 2)

semantics::STRING_APPEND is defined as SEMANTICS_ATTR(STRING_APPEND, "string.append") in include/swift/AST/SemanticAttrs.def. In SIL, the attribute appears on the function declaration:

// String.append(_:)
sil [_semantics "string.append"] @$sSS6appendyySSF : $@convention(method) (@guaranteed String, @inout String) -> ()

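So the append matched here is actually %16.

semantics::TYPENAME was newly added by this PR. It is defined as SEMANTICS_ATTR(TYPENAME, "typeName"), also in include/swift/AST/SemanticAttrs.def. It supports one of the optimizations listed at the beginning, which is not covered in detail here.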

isStringStoreToIdentifyableObject

StoreInst *StringOptimization::
isStringStoreToIdentifyableObject(SILInstruction *inst) {
    auto *store = dyn_cast<StoreInst>(inst);
    /// The instruction must be a StoreInst.
    if (!store)
        return nullptr; 
    /// The stored value must be of String type.
    if (store->getSrc()->getType() != stringType)
        return nullptr;
    
    SILValue destAddr = store->getDest();    
    /// We only handle alloc_stack and indirect function arguments. For those we
    /// can be sure that they are not aliased, just by checking all users.
    if (!isa<AllocStackInst>(destAddr) && !isExclusiveArgument(destAddr))
        return nullptr;
    /// Return the cached answer if there is one; otherwise compute it below
    /// and cache it.
    if (identifyableObjectsCache.count(destAddr) != 0) {
        return identifyableObjectsCache[destAddr] ? store : nullptr;
    }
    
    /// Check if it is an "identifyable" object. This is the case if it only has
    /// users which we are able to track in a simple way: stores and applies.
    for (Operand *use : destAddr->getUses()) {
        SILInstruction *user = use->getUser();
        switch (user->getKind()) {
            case SILInstructionKind::DebugValueAddrInst:
            case SILInstructionKind::DeallocStackInst:
            case SILInstructionKind::LoadInst:
                break;
            default:
                if (!mayWriteToIdentifyableObject(user)) {
                    // We don't handle user. It is some instruction which may write to
                    // destAddr or let destAddr "escape" (like an address projection).
                    identifyableObjectsCache[destAddr] = false;
                    return nullptr;
                }
                break;
        }
    }
    identifyableObjectsCache[destAddr] = true;
    return store;
}
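
store->getSrc()->getType() != stringType

The value being stored must be of String type. In the example that is %6, the string created by String.init.

isa<AllocStackInst>(destAddr)

The destination address must be produced by an alloc_stack instruction (AllocStackInst) or be an exclusive function argument. In the example that is %0, from %0 = alloc_stack $String, var, name "string".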

isSemanticCall

Returns the apply instruction if \p inst is a call to a function that has the semantic attribute \p attr and exactly \p numArgs arguments.

ApplyInst *StringOptimization::isSemanticCall(SILInstruction *inst,
                                              StringRef attr, unsigned numArgs) {
    auto apply = dyn_cast<ApplyInst>(inst);
    if (!apply || apply->getNumArguments() != numArgs)
        return nullptr;
    
    SILFunction *callee = apply->getReferencedFunctionOrNull();
    if (callee && callee->hasSemanticsAttr(attr))
        return apply;
    
    return nullptr;
}

optimizeStringAppend

This function optimizes the String.append cases listed at the beginning.

bool StringOptimization::optimizeStringAppend(ApplyInst *appendCall,
                            llvm::DenseMap<SILValue, SILValue> &storedStrings) {
  /// Get argument 0 of appendCall.
  SILValue rhs = appendCall->getArgument(0);
  /// Get the string information for argument 0.
  StringInfo rhsString = getStringInfo(rhs);
  
  // Remove lhs.append(rhs) if rhs is empty. This is the second case in the list.
  if (rhsString.isEmpty()) {
    appendCall->eraseFromParent();
    return true;
  }
  
  /// Get argument 1 of appendCall.
  SILValue lhsAddr = appendCall->getArgument(1);
  /// Get the string information for storedStrings[lhsAddr].
  StringInfo lhsString = getStringInfo(storedStrings[lhsAddr]);

  // The following two optimizations are a trade-off: Performance-wise it may be
  // beneficial to initialize an empty string with reserved capacity and then
  // append multiple other string components.
  // Removing the empty string (with the reserved capacity) might result in more
  // allocations.
  // So we just do this optimization up to a certain capacity limit (found by
  // experiment).
  if (lhsString.reservedCapacity > 50)
    return false;

  // Replace lhs.append(rhs) with 'lhs = rhs' if lhs is empty. This is the first
  // case in the list.
  if (lhsString.isEmpty()) {
    // Replace the String.append call with a store of rhs.
    replaceAppendWith(appendCall, rhs, /*copyNewValue*/ true);
    storedStrings[lhsAddr] = rhs;
    return true;
  }
  
  // Replace lhs.append(rhs) with "lhs = lhs + rhs" if both lhs and rhs are
  // constant strings.
  if (lhsString.isConstant() && rhsString.isConstant()) {
    std::string concat = lhsString.str;
    /// Concatenate the two strings.
    concat += rhsString.str;
    // Create the call to the string initializer.
    if (ApplyInst *stringInit = createStringInit(concat, appendCall)) {
      // Replace the String.append call with the initializer call and return true.
      replaceAppendWith(appendCall, stringInit, /*copyNewValue*/ false);
      storedStrings[lhsAddr] = stringInit;
      return true;
    }
  }
  
  return false;
}

The ApplyInst *appendCall is %16 from the SIL section.

// function_ref String.append(_:)
  %15 = function_ref @$sSS6appendyySSF : $@convention(method) (@guaranteed String, @inout String) -> () // user: %16
  %16 = apply %15(%13, %14) : $@convention(method) (@guaranteed String, @inout String) -> ()

It has two arguments, %13 and %14. %13 is the string "Roy", and %14 is the address of "Hello", that is, %0.

  SILValue rhs = appendCall->getArgument(0);
  StringInfo rhsString = getStringInfo(rhs);
   if (rhsString.isEmpty()) {
    appendCall->eraseFromParent();
    return true;
  }

rhs is %13, the string "Roy", and rhsString is the StringInfo describing "Roy". If that string is empty, appendCall is removed. This is the x.append("") case.

  SILValue lhsAddr = appendCall->getArgument(1);
  StringInfo lhsString = getStringInfo(storedStrings[lhsAddr]);

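lhsAddr is %14, the result of begin_access on %0, so it refers to the same address as %0. The lookup in storedStrings only succeeds if lhsAddr is the same SILValue as the destination of the store, so this walkthrough treats %14 as %0.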

store %6 to %0 : $*String  

Combined with the store instruction, storedStrings[lhsAddr] is %6, the string "Hello", and lhsString is the StringInfo describing "Hello".
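
The order of the checks in optimizeStringAppend can be replayed on plain data. The following is a sketch under assumptions, not the compiler's code: the StringInfo here is a simplified guess at the pass's struct (in particular the definitions of isConstant and isEmpty are assumed), and Rewrite, classifyAppend and constant are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified version of the pass's string summary.
struct StringInfo {
  std::string str;
  int numCodeUnits = -1;    // negative: not a known constant
  int reservedCapacity = 0; // non-zero for an empty string with reserved capacity

  bool isConstant() const { return numCodeUnits >= 0; }
  bool isEmpty() const { return isConstant() && str.empty(); }
};

enum class Rewrite { None, RemoveAppend, AssignRhs, AssignConcat };

// Mirrors the order of the checks in optimizeStringAppend. If the result is
// AssignConcat, `concat` receives the folded literal.
Rewrite classifyAppend(const StringInfo &lhs, const StringInfo &rhs,
                       std::string &concat) {
  if (rhs.isEmpty())
    return Rewrite::RemoveAppend; // x.append("")
  if (lhs.reservedCapacity > 50)
    return Rewrite::None; // keep large reserved buffers
  if (lhs.isEmpty())
    return Rewrite::AssignRhs; // x = y
  if (lhs.isConstant() && rhs.isConstant()) {
    concat = lhs.str + rhs.str;
    return Rewrite::AssignConcat; // x = "<lhs><rhs>"
  }
  return Rewrite::None;
}

// Convenience helper for a known literal.
StringInfo constant(const std::string &s) {
  StringInfo info;
  info.str = s;
  info.numCodeUnits = static_cast<int>(s.size());
  return info;
}
```

For the article's example, classifyAppend(constant("Hello"), constant("Roy"), out) selects the constant-folding case and sets out to "HelloRoy". Note that the empty-rhs check comes first, so x.append("") is removed even when nothing is known about x.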

replaceAppendWith

/// Replace a String.append() with a store of \p newValue to the destination.
void StringOptimization::replaceAppendWith(ApplyInst *appendCall,
                                      SILValue newValue, bool copyNewValue) {
  SILBuilder builder(appendCall);
  /// Get the SILLocation of appendCall.
  SILLocation loc = appendCall->getLoc();
  /// Get argument 1 of appendCall.
  SILValue destAddr = appendCall->getArgument(1);
  if (appendCall->getFunction()->hasOwnership()) {
    if (copyNewValue)
      newValue = builder.createCopyValue(loc, newValue);
    builder.createStore(loc, newValue, destAddr,
                        StoreOwnershipQualifier::Assign);
  } else {
    if (copyNewValue)
      builder.createRetainValue(loc, newValue, builder.getDefaultAtomicity());
    builder.createDestroyAddr(loc, destAddr);
    builder.createStore(loc, newValue, destAddr,
                        StoreOwnershipQualifier::Unqualified);
  }
  appendCall->eraseFromParent();
}

The replacement instructions are built with a SILBuilder.

appendCall->getFunction()->hasOwnership()

/// Returns true if this function has qualified ownership instructions in it.
  bool hasOwnership() const { return HasOwnership; }

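hasOwnership() is called on the function that contains appendCall. It returns true if that function contains ownership-qualified instructions. It does not depend on the @guaranteed convention of the first parameter of append. The SIL printed earlier uses an unqualified store together with retain_value and release_value, so for that form of the function the else branch would be taken.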

builder.createDestroyAddr(loc, destAddr);

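This creates a destroy_addr instruction. In the branch without ownership, an unqualified store does not release the value previously stored at destAddr, so the old string is destroyed explicitly before the new one is stored. In the ownership branch the Assign qualifier makes the store destroy the old value itself, so no separate destroy_addr is needed there. The destroy_addr already present at the end of the example function serves a different purpose: it destroys the final value of %0 before the stack slot is deallocated.

destroy_addr %0 : $*String                      // id: %23

builder.createStore(loc, newValue, destAddr,
                            StoreOwnershipQualifier::Assign);

This creates a store instruction that stores newValue to destAddr, with the qualifier StoreOwnershipQualifier::Assign.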

createStringInit

Creates the call instruction for the string initializer.

/// Creates a call to a string initializer.
ApplyInst *StringOptimization::createStringInit(StringRef str,
                                                SILInstruction *beforeInst) {
  SILBuilder builder(beforeInst);
  SILLocation loc = beforeInst->getLoc();
  SILModule &module = beforeInst->getFunction()->getModule();
  ASTContext &ctxt = module.getASTContext();
  
  if (!makeUTF8Func) {
    // Find the String initializer which takes a string_literal as argument.
    ConstructorDecl *makeUTF8Decl = ctxt.getMakeUTF8StringDecl();
    if (!makeUTF8Decl)
      return nullptr;
    
    auto Mangled = SILDeclRef(makeUTF8Decl, SILDeclRef::Kind::Allocator).mangle();
    makeUTF8Func = module.findFunction(Mangled, SILLinkage::PublicExternal);
    if (!makeUTF8Func)
      return nullptr;
  }

  auto *literal = builder.createStringLiteral(loc, str,
                    StringLiteralInst::Encoding::UTF8);

  auto *length = builder.createIntegerLiteral(loc,
                    SILType::getBuiltinWordType(ctxt),
                    literal->getCodeUnitCount());

  auto *isAscii = builder.createIntegerLiteral(loc,
                    SILType::getBuiltinIntegerType(1, ctxt),
                    intmax_t(ctxt.isASCIIString(str)));

  SILType stringMetaType = SILType::getPrimitiveObjectType(
    CanType(MetatypeType::get(stringType.getASTType(),
      MetatypeRepresentation::Thin)));

  auto *metaTypeInst = builder.createMetatype(loc, stringMetaType);

  auto *functionRef = builder.createFunctionRefFor(loc, makeUTF8Func);

  return builder.createApply(loc, functionRef, SubstitutionMap(),
                             { literal, length, isAscii, metaTypeInst });
}

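This is not covered in detail here. It is easy to follow by comparing it with the two string initializer call sequences in the SIL, %1 to %6 and %8 to %13.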

getStringInfo

Returns information about the value, if it is a constant string.

/// Returns information about value if it's a constant string.
StringOptimization::StringInfo StringOptimization::getStringInfo(SILValue value) {
  // Start with a non-constant result.
  StringInfo result;
  
  auto *apply = dyn_cast_or_null<ApplyInst>(value);
  if (!apply)
    return result;

  SILFunction *callee = apply->getReferencedFunctionOrNull();
  if (!callee)
    return result;
  // Empty-string initializer: set result.numCodeUnits = 0.
  if (callee->hasSemanticsAttr(semantics::STRING_INIT_EMPTY)) {
    result.numCodeUnits = 0;
    return result;
  }
  // Empty-string initializer with a reserved capacity: set
  // result.numCodeUnits = 0 and record the capacity.
  if (callee->hasSemanticsAttr(semantics::STRING_INIT_EMPTY_WITH_CAPACITY)) {
    result.numCodeUnits = 0;
    result.reservedCapacity = std::numeric_limits<int>::max();
    if (apply->getNumArguments() > 0) {
      if (Optional<int> capacity = getIntConstant(apply->getArgument(0)))
        // result.reservedCapacity is the capacity passed to the initializer.
        result.reservedCapacity = capacity.getValue();
    }
    return result;
  }
  // The string literal initializer.
  if (callee->hasSemanticsAttr(semantics::STRING_MAKE_UTF8)) {
    SILValue stringVal = apply->getArgument(0);
    auto *stringLiteral = dyn_cast<StringLiteralInst>(stringVal);
    SILValue lengthVal = apply->getArgument(1);
    auto *intLiteral = dyn_cast<IntegerLiteralInst>(lengthVal);
    if (intLiteral && stringLiteral &&
        // For simplicity, we only support UTF8 string literals.
        stringLiteral->getEncoding() == StringLiteralInst::Encoding::UTF8) {
      result.str = stringLiteral->getValue();
      result.numCodeUnits = intLiteral->getValue().getSExtValue();
      return result;
    }
  }
  return result;
}

semantics::STRING_INIT_EMPTY is defined as SEMANTICS_ATTR(STRING_INIT_EMPTY, "string.init_empty") in include/swift/AST/SemanticAttrs.def.

semantics::STRING_INIT_EMPTY_WITH_CAPACITY is defined as SEMANTICS_ATTR(STRING_INIT_EMPTY_WITH_CAPACITY, "string.init_empty_with_capacity") in include/swift/AST/SemanticAttrs.def. It marks the initializer that creates an empty string with a reserved capacity.

semantics::STRING_MAKE_UTF8 marks the string literal initializer. Its SIL declaration is:

// String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:)
sil [serialized] [always_inline] [readonly] [_semantics "string.makeUTF8"] @$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String

The apply instruction is:

 %6 = apply %5(%1, %2, %3, %4) : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String // user: %7

SILValue stringVal = apply->getArgument(0); takes argument 0, which is %1, a StringLiteralInst. SILValue lengthVal = apply->getArgument(1); takes argument 1, which is %2, the string length 5. result.str = stringLiteral->getValue(); retrieves the string itself, "Hello".

ASTContext.cpp

ASTContext.cpp gains a getMakeUTF8StringDecl() method that returns the MakeUTF8StringDecl. It is used in createStringInit to build the call to the string initializer by hand.

ConstructorDecl *ASTContext::getMakeUTF8StringDecl() const {
  if (getImpl().MakeUTF8StringDecl)
    return getImpl().MakeUTF8StringDecl;

  // Look up all initializers of String.
  auto initializers =
    getStringDecl()->lookupDirect(DeclBaseName::createConstructor());
  
  for (Decl *initializer : initializers) {
    auto *constructor = cast<ConstructorDecl>(initializer);
    auto Attrs = constructor->getAttrs();
    for (auto *A : Attrs.getAttributes<SemanticsAttr, false>()) {
      if (A->Value != semantics::STRING_MAKE_UTF8)
        continue;
      auto ParamList = constructor->getParameters();
      if (ParamList->size() != 3)
        continue;
      ParamDecl *param = constructor->getParameters()->get(0);
      if (param->getArgumentName().str() != "_builtinStringLiteral")
        continue;

      getImpl().MakeUTF8StringDecl = constructor;
      return constructor;
    }
  }
  return nullptr;
}

semantics::STRING_MAKE_UTF8 is defined as SEMANTICS_ATTR(STRING_MAKE_UTF8, "string.makeUTF8") in include/swift/AST/SemanticAttrs.def. The SIL is:

// String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:)
sil [serialized] [always_inline] [readonly] [_semantics "string.makeUTF8"] @$sSS21_builtinStringLiteral17utf8CodeUnitCount7isASCIISSBp_BwBi1_tcfC : $@convention(method) (Builtin.RawPointer, Builtin.Word, Builtin.Int1, @thin String.Type) -> @owned String
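
ParamList->size() != 3

This checks that the initializer has exactly 3 parameters. The Swift declaration indeed has 3: _builtinStringLiteral, utf8CodeUnitCount, and isASCII. The fourth parameter in the SIL signature is the @thin String.Type metatype.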

ParamDecl *param = constructor->getParameters()->get(0);
      if (param->getArgumentName().str() != "_builtinStringLiteral")

This gets the argument name of parameter 0 from the parameter list and checks that it is "_builtinStringLiteral", which also matches the SIL.
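
The lookup follows a common "search once, then cache" pattern. Here is a minimal stand-alone sketch of that pattern; Constructor and Context are hypothetical stand-ins for ConstructorDecl and ASTContext, and the scans counter exists only to make the caching observable.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a constructor declaration.
struct Constructor {
  std::string semantics;               // value of its semantics attribute
  std::vector<std::string> paramNames; // argument labels, in order
};

class Context {
  std::vector<Constructor> initializers_;
  mutable const Constructor *cached_ = nullptr; // filled on first success
public:
  mutable int scans = 0; // counts full lookups, for demonstration only

  explicit Context(std::vector<Constructor> inits)
      : initializers_(std::move(inits)) {}

  const Constructor *getMakeUTF8StringDecl() const {
    if (cached_)
      return cached_; // fast path: already found earlier
    ++scans;
    for (const Constructor &c : initializers_) {
      if (c.semantics != "string.makeUTF8")
        continue;
      if (c.paramNames.size() != 3)
        continue;
      if (c.paramNames[0] != "_builtinStringLiteral")
        continue;
      cached_ = &c;
      return cached_;
    }
    return nullptr; // not found: nothing is cached, a later call scans again
  }
};
```

As in the original method, a failed lookup is not cached, so only successful results take the fast path.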

Adding the pass to the PassManager

Finally, the pass is registered in include/swift/SILOptimizer/PassManager/Passes.def:

PASS(StringOptimization, "string-optimization",
     "Optimization for String operations")
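
A .def file of this kind is typically consumed with the X-macro technique: the including code defines PASS, expands the list, and undefines it again. The sketch below shows the idea in a single file; EXAMPLE_PASSES, PassKind and passTag are names chosen for this illustration, the ExampleCleanup entry is invented, and the expansions are not claimed to match what the Swift sources actually generate.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the contents of a .def file: one PASS(...) line per pass.
#define EXAMPLE_PASSES(PASS) \
  PASS(StringOptimization, "string-optimization", \
       "Optimization for String operations") \
  PASS(ExampleCleanup, "example-cleanup", "Made-up pass for illustration")

// Expansion 1: one enumerator per pass.
enum class PassKind {
#define PASS(Id, Tag, Description) Id,
  EXAMPLE_PASSES(PASS)
#undef PASS
};

// Expansion 2: the command-line tag of each pass.
const char *passTag(PassKind kind) {
  switch (kind) {
#define PASS(Id, Tag, Description) case PassKind::Id: return Tag;
    EXAMPLE_PASSES(PASS)
#undef PASS
  }
  return "";
}
```

Adding one PASS line therefore updates every table generated from the list, which is why registering the new pass needs only that single entry.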