ART Virtual Machine | A Brief History of JNI Optimization


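When we call a native method from Java, most people assume the program jumps straight into the corresponding C/C++ function. That is not what happens. An intermediate function is needed to handle thread state transitions, Local Reference Table updates, argument conversion, and similar work. This function is usually called the "JNI trampoline", and the less time it takes to run, the better JNI calls perform.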

[Figure: JNI Trampoline]

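Over the history of Android, Google has made several major optimizations to the JNI trampoline. Broadly, they fall into two directions:

  1. Moving from a generic trampoline that works for all parameter types to specific trampolines that each work only for particular parameter types.
  2. Skipping some of the trampoline's work, depending on what the C/C++ function actually does.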

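As is well known, ART supports three execution modes: interpretation, AOT, and JIT. The latter two both execute machine code. When the interpreter encounters a native method, it uses the runtime's built-in art_quick_generic_jni_trampoline to handle the intermediate work. This generic trampoline works for every parameter type, but because it has to account for all cases, including the most extreme ones, it does not perform well. For example, even if we pass only a single argument, art_quick_generic_jni_trampoline still reserves 5 KB on the stack.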

// Reserved area on stack for art_quick_generic_jni_trampoline:
//           4    local state ref
//           4    padding
//        4096    4k scratch space, enough for 2x 256 8-byte parameters
//   8*(32+32)    max 32 GPRs and 32 FPRs on each architecture, 8 bytes each
// +         4    padding for 16-bytes alignment
// -----------
//        4616
// Round up to 5k, total 5120
#define GENERIC_JNI_TRAMPOLINE_RESERVED_AREA 5120

As for why the comment says "256 8-byte parameters": the JVM Specification caps the number of method parameters at 255.

The number of method parameters is limited to 255 by the definition of a method descriptor (§4.3.3), where the limit includes one unit for this in the case of instance or interface method invocations.
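The arithmetic in the comment above GENERIC_JNI_TRAMPOLINE_RESERVED_AREA can be replayed at compile time. The sketch below only re-adds the terms listed in that comment; the constant names and the RoundUp helper are my own and do not come from ART. Note that the listed terms sum to 4616 before the final 4-byte padding and 4620 after it, and either value rounds up to 5120.

```cpp
#include <cstddef>

// Terms copied from the comment above; the names are illustrative only.
constexpr std::size_t kLocalStateRef = 4;
constexpr std::size_t kPadding = 4;
constexpr std::size_t kScratch = 2 * 256 * 8;          // 2x 256 8-byte parameters
constexpr std::size_t kRegisterSpill = 8 * (32 + 32);  // 32 GPRs + 32 FPRs, 8 bytes each
constexpr std::size_t kAlignPadding = 4;

// Rounds `value` up to the next multiple of `unit`.
constexpr std::size_t RoundUp(std::size_t value, std::size_t unit) {
  return (value + unit - 1) / unit * unit;
}

constexpr std::size_t kSubtotal =
    kLocalStateRef + kPadding + kScratch + kRegisterSpill;
constexpr std::size_t kTotal = kSubtotal + kAlignPadding;
constexpr std::size_t kReserved = RoundUp(kTotal, 1024);

static_assert(kScratch == 4096, "scratch space is 4k");
static_assert(kSubtotal == 4616, "sum before the final padding");
static_assert(kTotal == 4620, "sum including the final padding");
static_assert(kReserved == 5120, "rounded up to 5k");
```

The point of the exercise is that the reservation is independent of the method being called: a one-argument native method pays for the same 5120 bytes as one with 255 parameters.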

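Things improve when machine code is executed, because the compiler generates a dedicated trampoline for each combination of parameter types, usually called a "compiler JNI trampoline". Since these trampolines know the parameter types, argument conversion and passing are more direct. The compiler also inlines the thread state transition. Together these make the compiler JNI trampoline faster than the generic JNI trampoline.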

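Note that one trampoline is generated per combination of parameter types, not per native method. For example, below are two different native methods from boot.oat. Their return types differ, but their parameter types match: the first parameter is a reference and the second is a 4-byte primitive. They therefore share a trampoline.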

  • static native void listen(FileDescriptor fd, int backlog) throws IOException;
  • private static native int chmod(String fileName, int permission);

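Dumping the assembly of these two methods (that is, the compiler trampoline) with oatdump shows that the two are identical, down to the address. This means the trampoline exists only once in the oat file, and every native method with matching parameter types points to it. (For a detailed explanation of this assembly, see my earlier article.)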

32: int java.util.prefs.FileSystemPreferences.chmod(java.lang.String, int) (dex_method_idx=26767)
  DEX CODE:
  OatMethodOffsets (offset=0x0000e9fc)
    code_offset: 0x00098d90 
  OatQuickMethodHeader (offset=0x00098d8c)
    vmap_table: (offset=0x0006f7bd)
  QuickMethodFrameInfo
    frame_size_in_bytes: 176
    core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
    fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
  CODE: (code_offset=0x00098d90 size=304)...
    0x00098d90: d102c3ff	sub sp, sp, #0xb0 (176)
    0x00098d94: a90553f3	stp tr, x20, [sp, #80]
    0x00098d98: a9065bf5	stp x21, x22, [sp, #96]
    0x00098d9c: a90763f7	stp x23, x24, [sp, #112]
    0x00098da0: a9086bf9	stp x25, x26, [sp, #128]
    0x00098da4: a90973fb	stp x27, x28, [sp, #144]
    0x00098da8: a90a7bfd	stp x29, lr, [sp, #160]
    0x00098dac: 6d0127e8	stp d8, d9, [sp, #16]
    0x00098db0: 6d022fea	stp d10, d11, [sp, #32]
    0x00098db4: 6d0337ec	stp d12, d13, [sp, #48]
    0x00098db8: 6d043fee	stp d14, d15, [sp, #64]
    0x00098dbc: f90003e0	str x0, [sp]
    0x00098dc0: 35000614	cbnz w20, #+0xc0 (addr 0x98e80)
    0x00098dc4: b900bbe1	str w1, [sp, #184]
    0x00098dc8: 910003f0	mov x16, sp
    0x00098dcc: f9005a70	str x16, [tr, #176] ; top_quick_frame_method
    0x00098dd0: 885f7e70	ldxr w16, [tr]
    0x00098dd4: 52ab8011	mov w17, #0x5c000000
    0x00098dd8: 35000610	cbnz w16, #+0xc0 (addr 0x98e98)
    0x00098ddc: 8810fe71	stlxr w16, w17, [tr]
    0x00098de0: 35ffff90	cbnz w16, #-0x10 (addr 0x98dd0)
    0x00098de4: f904f67f	str xzr, [tr, #2536] ; 2536
    0x00098de8: f9406a76	ldr x22, [tr, #208] ; jni_env
    0x00098dec: b9401ad7	ldr w23, [x22, #24]
    0x00098df0: b94022d8	ldr w24, [x22, #32]
    0x00098df4: b9001ad8	str w24, [x22, #24]
    0x00098df8: 2a0203e3	mov w3, w2
    0x00098dfc: 9102e3f0	add x16, sp, #0xb8 (184)
    0x00098e00: 7100003f	cmp w1, #0x0 (0)
    0x00098e04: 9a9f1202	csel x2, x16, xzr, ne
    0x00098e08: aa0003e1	mov x1, x0
    0x00098e0c: aa1603e0	mov x0, x22
    0x00098e10: f940083e	ldr lr, [x1, #16]
    0x00098e14: d63f03c0	blr lr
    0x00098e18: 885ffe70	ldaxr w16, [tr]
    0x00098e1c: 52ab8011	mov w17, #0x5c000000
    0x00098e20: 6b11021f	cmp w16, w17
    0x00098e24: 54000401	b.ne #+0x80 (addr 0x98ea4)
    0x00098e28: 88107e7f	stxr w16, wzr, [tr]
    0x00098e2c: 35ffff70	cbnz w16, #-0x14 (addr 0x98e18)
    0x00098e30: f943d270	ldr x16, [tr, #1952] ; 1952
    0x00098e34: f904f670	str x16, [tr, #2536] ; 2536
    0x00098e38: b9401ad8	ldr w24, [x22, #24]
    0x00098e3c: b90022d8	str w24, [x22, #32]
    0x00098e40: b9001ad7	str w23, [x22, #24]
    0x00098e44: f9405270	ldr x16, [tr, #160] ; exception
    0x00098e48: b5000350	cbnz x16, #+0x68 (addr 0x98eb0)
    0x00098e4c: a94553f3	ldp tr, x20, [sp, #80]
    0x00098e50: a9465bf5	ldp x21, x22, [sp, #96]
    0x00098e54: a94763f7	ldp x23, x24, [sp, #112]
    0x00098e58: a9486bf9	ldp x25, x26, [sp, #128]
    0x00098e5c: a94973fb	ldp x27, x28, [sp, #144]
    0x00098e60: a94a7bfd	ldp x29, lr, [sp, #160]
    0x00098e64: 6d4127e8	ldp d8, d9, [sp, #16]
    0x00098e68: 6d422fea	ldp d10, d11, [sp, #32]
    0x00098e6c: 6d4337ec	ldp d12, d13, [sp, #48]
    0x00098e70: 6d443fee	ldp d14, d15, [sp, #64]
    0x00098e74: b9402674	ldr w20, [tr, #36] ; is_gc_marking
    0x00098e78: 9102c3ff	add sp, sp, #0xb0 (176)
    0x00098e7c: d65f03c0	ret
    0x00098e80: b9400016	ldr w22, [x0]
    0x00098e84: b94006d0	ldr w16, [x22, #4]
    0x00098e88: 37eff9f0	tbnz w16, #29, #-0xc4 (addr 0x98dc4)
    0x00098e8c: f942fe7e	ldr lr, [tr, #1528] ; pJniReadBarrier
    0x00098e90: d63f03c0	blr lr
    0x00098e94: 17ffffcc	b #-0xd0 (addr 0x98dc4)
    0x00098e98: f941967e	ldr lr, [tr, #808] ; pJniMethodStart
    0x00098e9c: d63f03c0	blr lr
    0x00098ea0: 17ffffd2	b #-0xb8 (addr 0x98de8)
    0x00098ea4: f9419a7e	ldr lr, [tr, #816] ; pJniMethodEnd
    0x00098ea8: d63f03c0	blr lr
    0x00098eac: 17ffffe3	b #-0x74 (addr 0x98e38)
    0x00098eb0: f9405260	ldr x0, [tr, #160] ; exception
    0x00098eb4: f942867e	ldr lr, [tr, #1288] ; pDeliverException
    0x00098eb8: d63f03c0	blr lr
    0x00098ebc: d4200000	brk #0x0
39: void sun.nio.ch.Net.listen(java.io.FileDescriptor, int) (dex_method_idx=34223)
  DEX CODE:
  OatMethodOffsets (offset=0x0000748c)
    code_offset: 0x00098d90 
  OatQuickMethodHeader (offset=0x00098d8c)
    vmap_table: (offset=0x0006f7bd)
  QuickMethodFrameInfo
    frame_size_in_bytes: 176
    core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
    fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
  CODE: (code_offset=0x00098d90 size=304)...
    0x00098d90: d102c3ff	sub sp, sp, #0xb0 (176)
    0x00098d94: a90553f3	stp tr, x20, [sp, #80]
    0x00098d98: a9065bf5	stp x21, x22, [sp, #96]
    0x00098d9c: a90763f7	stp x23, x24, [sp, #112]
    0x00098da0: a9086bf9	stp x25, x26, [sp, #128]
    0x00098da4: a90973fb	stp x27, x28, [sp, #144]
    0x00098da8: a90a7bfd	stp x29, lr, [sp, #160]
    0x00098dac: 6d0127e8	stp d8, d9, [sp, #16]
    0x00098db0: 6d022fea	stp d10, d11, [sp, #32]
    0x00098db4: 6d0337ec	stp d12, d13, [sp, #48]
    0x00098db8: 6d043fee	stp d14, d15, [sp, #64]
    0x00098dbc: f90003e0	str x0, [sp]
    0x00098dc0: 35000614	cbnz w20, #+0xc0 (addr 0x98e80)
    0x00098dc4: b900bbe1	str w1, [sp, #184]
    0x00098dc8: 910003f0	mov x16, sp
    0x00098dcc: f9005a70	str x16, [tr, #176] ; top_quick_frame_method
    0x00098dd0: 885f7e70	ldxr w16, [tr]
    0x00098dd4: 52ab8011	mov w17, #0x5c000000
    0x00098dd8: 35000610	cbnz w16, #+0xc0 (addr 0x98e98)
    0x00098ddc: 8810fe71	stlxr w16, w17, [tr]
    0x00098de0: 35ffff90	cbnz w16, #-0x10 (addr 0x98dd0)
    0x00098de4: f904f67f	str xzr, [tr, #2536] ; 2536
    0x00098de8: f9406a76	ldr x22, [tr, #208] ; jni_env
    0x00098dec: b9401ad7	ldr w23, [x22, #24]
    0x00098df0: b94022d8	ldr w24, [x22, #32]
    0x00098df4: b9001ad8	str w24, [x22, #24]
    0x00098df8: 2a0203e3	mov w3, w2
    0x00098dfc: 9102e3f0	add x16, sp, #0xb8 (184)
    0x00098e00: 7100003f	cmp w1, #0x0 (0)
    0x00098e04: 9a9f1202	csel x2, x16, xzr, ne
    0x00098e08: aa0003e1	mov x1, x0
    0x00098e0c: aa1603e0	mov x0, x22
    0x00098e10: f940083e	ldr lr, [x1, #16]
    0x00098e14: d63f03c0	blr lr
    0x00098e18: 885ffe70	ldaxr w16, [tr]
    0x00098e1c: 52ab8011	mov w17, #0x5c000000
    0x00098e20: 6b11021f	cmp w16, w17
    0x00098e24: 54000401	b.ne #+0x80 (addr 0x98ea4)
    0x00098e28: 88107e7f	stxr w16, wzr, [tr]
    0x00098e2c: 35ffff70	cbnz w16, #-0x14 (addr 0x98e18)
    0x00098e30: f943d270	ldr x16, [tr, #1952] ; 1952
    0x00098e34: f904f670	str x16, [tr, #2536] ; 2536
    0x00098e38: b9401ad8	ldr w24, [x22, #24]
    0x00098e3c: b90022d8	str w24, [x22, #32]
    0x00098e40: b9001ad7	str w23, [x22, #24]
    0x00098e44: f9405270	ldr x16, [tr, #160] ; exception
    0x00098e48: b5000350	cbnz x16, #+0x68 (addr 0x98eb0)
    0x00098e4c: a94553f3	ldp tr, x20, [sp, #80]
    0x00098e50: a9465bf5	ldp x21, x22, [sp, #96]
    0x00098e54: a94763f7	ldp x23, x24, [sp, #112]
    0x00098e58: a9486bf9	ldp x25, x26, [sp, #128]
    0x00098e5c: a94973fb	ldp x27, x28, [sp, #144]
    0x00098e60: a94a7bfd	ldp x29, lr, [sp, #160]
    0x00098e64: 6d4127e8	ldp d8, d9, [sp, #16]
    0x00098e68: 6d422fea	ldp d10, d11, [sp, #32]
    0x00098e6c: 6d4337ec	ldp d12, d13, [sp, #48]
    0x00098e70: 6d443fee	ldp d14, d15, [sp, #64]
    0x00098e74: b9402674	ldr w20, [tr, #36] ; is_gc_marking
    0x00098e78: 9102c3ff	add sp, sp, #0xb0 (176)
    0x00098e7c: d65f03c0	ret
    0x00098e80: b9400016	ldr w22, [x0]
    0x00098e84: b94006d0	ldr w16, [x22, #4]
    0x00098e88: 37eff9f0	tbnz w16, #29, #-0xc4 (addr 0x98dc4)
    0x00098e8c: f942fe7e	ldr lr, [tr, #1528] ; pJniReadBarrier
    0x00098e90: d63f03c0	blr lr
    0x00098e94: 17ffffcc	b #-0xd0 (addr 0x98dc4)
    0x00098e98: f941967e	ldr lr, [tr, #808] ; pJniMethodStart
    0x00098e9c: d63f03c0	blr lr
    0x00098ea0: 17ffffd2	b #-0xb8 (addr 0x98de8)
    0x00098ea4: f9419a7e	ldr lr, [tr, #816] ; pJniMethodEnd
    0x00098ea8: d63f03c0	blr lr
    0x00098eac: 17ffffe3	b #-0x74 (addr 0x98e38)
    0x00098eb0: f9405260	ldr x0, [tr, #160] ; exception
    0x00098eb4: f942867e	ldr lr, [tr, #1288] ; pDeliverException
    0x00098eb8: d63f03c0	blr lr
    0x00098ebc: d4200000	brk #0x0
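To make the sharing rule concrete, here is a toy model of a "trampoline key". It is a sketch under assumptions, not ART's actual deduplication logic: ParamShorty is a hypothetical helper that reduces a JNI method descriptor to its parameter kinds, collapsing every reference type to 'L' in the spirit of a dex shorty. Leaving the return type out of the key follows the listen/chmod example above and is not meant as a general claim about every return type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an ART API): reduces a JNI method descriptor such
// as "(Ljava/lang/String;I)I" to its parameter kinds. Every reference type,
// including arrays, collapses to 'L'; primitives keep their descriptor letter.
// The return type is deliberately left out of the key (an assumption).
std::string ParamShorty(const std::string& descriptor) {
  std::size_t i = descriptor.find('(');
  assert(i != std::string::npos && "descriptor must contain a parameter list");
  ++i;
  std::string key;
  while (i < descriptor.size() && descriptor[i] != ')') {
    bool is_array = false;
    while (i < descriptor.size() && descriptor[i] == '[') {  // skip dimensions
      is_array = true;
      ++i;
    }
    assert(i < descriptor.size() && "truncated descriptor");
    const char c = descriptor[i];
    if (c == 'L') {
      // A class name runs up to the terminating ';'.
      const std::size_t semicolon = descriptor.find(';', i);
      assert(semicolon != std::string::npos && "unterminated class name");
      i = semicolon + 1;
    } else {
      ++i;  // single-letter primitive
    }
    key += (is_array || c == 'L') ? 'L' : c;
  }
  return key;
}
```

With this helper, the descriptors of listen, "(Ljava/io/FileDescriptor;I)V", and chmod, "(Ljava/lang/String;I)I", both reduce to "LI", which matches the two methods pointing at the same code_offset in the dump.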

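Compiling native methods with JIT/AOT therefore improves performance. The gain does not come from turning bytecode into machine code, because a native method has an empty body and no bytecode. It comes from switching from art_quick_generic_jni_trampoline to the "compiler JNI trampoline".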

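Given the Baseline Profile approach that mainstream apps are now trying, it may be worth adding these native methods to the profile. Because different methods with the same parameter types share one trampoline, the extra code size from compilation is negligible, while every native method benefits from the performance gain.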

Let's go back to the very beginning: why does a JNI call need a trampoline at all?

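Argument conversion is easy to understand: the C/C++ function takes an extra JNIEnv* parameter, and Java reference types, including String, have to be converted to their corresponding C++ types. But what are thread state transitions and the Local Reference Table for?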

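In short, both exist so that GC can work correctly. However garbage collectors evolve, two foundations stay the same. First, there must be a quiescent window in which a stable heap state can be observed. Second, all GC roots must be found. The quiescent window is also known as "stop the world", during which no thread may touch the heap. The notion of Java thread states grows out of this. The Runnable state means the thread is running in the Java world and may use the heap at any moment. The Native state means the thread is running in the C/C++ world and will not use the Java heap. To the GC, a thread in the Native state is "paused". "Paused" here does not mean the thread is not running, only that it does not touch the Java heap.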

So in a normal JNI call, the thread state has to switch from Runnable to Native, which tells the GC: I won't use the Java heap from here on, so treat me as asleep.
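The Runnable to Native switch is visible in the ordinary trampoline dumps earlier in the article: the ldxr/stlxr pair stores 0x5c000000 into the thread's state word if that word is currently zero, and otherwise branches to pJniMethodStart; on return, the ldaxr/stxr pair does the reverse or branches to pJniMethodEnd. Below is a simplified model of that pattern using std::atomic. FakeThread, EnterNative, ExitNative and CallNative are hypothetical names, and the slow paths are reduced to a counter; the real runtime entry points do much more.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Values read off the dump: 0 is "Runnable, no pending flags" and 0x5c000000
// is the word stored while the thread is in native code. Everything else in
// this model is a simplification.
constexpr std::uint32_t kRunnableNoFlags = 0;
constexpr std::uint32_t kNative = 0x5c000000;

struct FakeThread {
  std::atomic<std::uint32_t> state_and_flags{kRunnableNoFlags};
  int slow_path_calls = 0;  // stands in for pJniMethodStart / pJniMethodEnd
};

// Runnable -> Native. Returns true when the inlined fast path succeeded.
bool EnterNative(FakeThread& t) {
  std::uint32_t expected = kRunnableNoFlags;
  if (t.state_and_flags.compare_exchange_strong(
          expected, kNative, std::memory_order_release,
          std::memory_order_relaxed)) {
    return true;
  }
  ++t.slow_path_calls;  // some flag was set: let the "runtime" handle it
  t.state_and_flags.store(kNative, std::memory_order_release);
  return false;
}

// Native -> Runnable. Returns true when the inlined fast path succeeded.
bool ExitNative(FakeThread& t) {
  std::uint32_t expected = kNative;
  if (t.state_and_flags.compare_exchange_strong(
          expected, kRunnableNoFlags, std::memory_order_acquire,
          std::memory_order_relaxed)) {
    return true;
  }
  ++t.slow_path_calls;
  t.state_and_flags.store(kRunnableNoFlags, std::memory_order_relaxed);
  return false;
}

// Wraps a native call the way the trampoline does: enter, call, exit.
template <typename Fn>
void CallNative(FakeThread& t, Fn&& native_function) {
  EnterNative(t);
  assert(t.state_and_flags.load() == kNative);  // GC may treat us as paused
  native_function();
  ExitNative(t);
}
```

In the common case both transitions are a single compare-and-swap, which is why inlining them into the compiler JNI trampoline pays off; the out-of-line call is only taken when another thread has set a flag in the meantime.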

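What if the C/C++ function needs the Java heap again? That happens all the time, for example calling a Java method with env->CallObjectMethod or reading a field of a reference argument with env->GetBooleanField. The thread state then has to switch from Native back to Runnable. A normal JNI call can therefore involve frequent thread state transitions, which is a considerable performance cost.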

So why not skip the transition altogether and leave the thread in the Runnable state? What would the consequences be?

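Whether interpreted or run as machine code, Java/Kotlin code has many checkpoints inserted along its execution path. They guarantee that a thread can be paused promptly when needed, rather than running on blindly. C/C++ code has no such checkpoints (they would hurt performance, and compilers differ in how they generate code, so nothing can be guaranteed). If such a thread is in the Runnable state, the GC can only wait for it to return to the Java world and reach a checkpoint before pausing it. If the C/C++ function runs only briefly, skipping the transition does improve performance. But if it runs for a long time, or may block internally (for example waiting on a lock or sleeping), the GC is severely disrupted, and the time saved may well be outweighed by the harm done to the GC.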
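The cost of missing checkpoints can be illustrated with a small deterministic simulation. This is purely a teaching model with made-up names: StepsUntilSuspended counts how many units of work a thread performs between a suspend request and the moment it notices the request, given how often it polls. A poll interval of 0 stands for C/C++ code running in the Runnable state, which never polls and is only stopped once it returns.

```cpp
#include <cassert>

// Returns how many work steps run after a suspend request arrives before the
// worker notices it. `poll_interval` is the number of steps between two
// checkpoints; 0 means the code never polls, so the request is only seen when
// the function finishes after `total_steps`.
int StepsUntilSuspended(int total_steps, int poll_interval, int request_at) {
  assert(total_steps > 0 && request_at >= 0 && request_at < total_steps);
  bool suspend_requested = false;
  for (int step = 0; step < total_steps; ++step) {
    if (step == request_at) {
      suspend_requested = true;  // the GC asks this thread to stop
    }
    const bool at_checkpoint = poll_interval > 0 && step % poll_interval == 0;
    if (at_checkpoint && suspend_requested) {
      return step - request_at;  // noticed at the next checkpoint
    }
    // ... one unit of work would happen here ...
  }
  return total_steps - request_at;  // noticed only on return to managed code
}
```

With a checkpoint every 10 steps, a request arriving at step 25 is honored 5 steps later; without checkpoints, the same request in a 1000-step function waits 975 steps. The longer the C/C++ function, the longer every other thread waits for the stop-the-world window.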

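So when a developer can guarantee that the C/C++ function behind a native method is short, they can use a mechanism provided by Android to tell the trampoline not to perform the thread state transition.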

In early Android versions, prefixing the signature with ! skipped the thread state transition. This approach was called fast jni, as shown below.

static JNINativeMethod gMethods[] = {
  NATIVE_METHOD(Unsafe, compareAndSwapInt, "!(Ljava/lang/Object;JII)Z"),
  NATIVE_METHOD(Unsafe, compareAndSwapLong, "!(Ljava/lang/Object;JJJ)Z"),
  // ... remaining entries omitted
};

This approach was deprecated starting with Android 8 and replaced by the @FastNative annotation, as shown below. The generated JNI trampoline is 61 instructions long.

@FastNative
private static native int getArrayBaseOffsetForComponentType(Class component_class);
2: int sun.misc.Unsafe.getArrayBaseOffsetForComponentType(java.lang.Class) (dex_method_idx=32803)
  DEX CODE:
  OatMethodOffsets (offset=0x000070cc)
    code_offset: 0x00096080 
  OatQuickMethodHeader (offset=0x0009607c)
    vmap_table: (offset=0x00082eed)
  QuickMethodFrameInfo
    frame_size_in_bytes: 176
    core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
    fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
  CODE: (code_offset=0x00096080 size=244)...
    0x00096080: d102c3ff	sub sp, sp, #0xb0 (176)
    0x00096084: a90553f3	stp tr, x20, [sp, #80]
    0x00096088: a9065bf5	stp x21, x22, [sp, #96]
    0x0009608c: a90763f7	stp x23, x24, [sp, #112]
    0x00096090: a9086bf9	stp x25, x26, [sp, #128]
    0x00096094: a90973fb	stp x27, x28, [sp, #144]
    0x00096098: a90a7bfd	stp x29, lr, [sp, #160]
    0x0009609c: 6d0127e8	stp d8, d9, [sp, #16]
    0x000960a0: 6d022fea	stp d10, d11, [sp, #32]
    0x000960a4: 6d0337ec	stp d12, d13, [sp, #48]
    0x000960a8: 6d043fee	stp d14, d15, [sp, #64]
    0x000960ac: f90003e0	str x0, [sp]
    0x000960b0: 35000494	cbnz w20, #+0x90 (addr 0x96140)
    0x000960b4: b900bbe1	str w1, [sp, #184]
    0x000960b8: 910003f0	mov x16, sp
    0x000960bc: f9005a70	str x16, [tr, #176] ; top_quick_frame_method
    0x000960c0: f9406a76	ldr x22, [tr, #208] ; jni_env
    0x000960c4: b9401ad7	ldr w23, [x22, #24]
    0x000960c8: b94022d8	ldr w24, [x22, #32]
    0x000960cc: b9001ad8	str w24, [x22, #24]
    0x000960d0: 9102e3f0	add x16, sp, #0xb8 (184)
    0x000960d4: 7100003f	cmp w1, #0x0 (0)
    0x000960d8: 9a9f1202	csel x2, x16, xzr, ne
    0x000960dc: aa0003e1	mov x1, x0
    0x000960e0: aa1603e0	mov x0, x22
    0x000960e4: f940083e	ldr lr, [x1, #16]
    0x000960e8: d63f03c0	blr lr
    0x000960ec: b9401ad8	ldr w24, [x22, #24]
    0x000960f0: b90022d8	str w24, [x22, #32]
    0x000960f4: b9001ad7	str w23, [x22, #24]
    0x000960f8: f9405270	ldr x16, [tr, #160] ; exception
    0x000960fc: b5000350	cbnz x16, #+0x68 (addr 0x96164)
    0x00096100: b9400270	ldr w16, [tr] ; state_and_flags
    0x00096104: 72000a1f	tst w16, #0x7
    0x00096108: 54000281	b.ne #+0x50 (addr 0x96158)
    0x0009610c: a94553f3	ldp tr, x20, [sp, #80]
    0x00096110: a9465bf5	ldp x21, x22, [sp, #96]
    0x00096114: a94763f7	ldp x23, x24, [sp, #112]
    0x00096118: a9486bf9	ldp x25, x26, [sp, #128]
    0x0009611c: a94973fb	ldp x27, x28, [sp, #144]
    0x00096120: a94a7bfd	ldp x29, lr, [sp, #160]
    0x00096124: 6d4127e8	ldp d8, d9, [sp, #16]
    0x00096128: 6d422fea	ldp d10, d11, [sp, #32]
    0x0009612c: 6d4337ec	ldp d12, d13, [sp, #48]
    0x00096130: 6d443fee	ldp d14, d15, [sp, #64]
    0x00096134: b9402674	ldr w20, [tr, #36] ; is_gc_marking
    0x00096138: 9102c3ff	add sp, sp, #0xb0 (176)
    0x0009613c: d65f03c0	ret
    0x00096140: b9400016	ldr w22, [x0]
    0x00096144: b94006d0	ldr w16, [x22, #4]
    0x00096148: 37effb70	tbnz w16, #29, #-0x94 (addr 0x960b4)
    0x0009614c: f942fe7e	ldr lr, [tr, #1528] ; pJniReadBarrier
    0x00096150: d63f03c0	blr lr
    0x00096154: 17ffffd8	b #-0xa0 (addr 0x960b4)
    0x00096158: f942827e	ldr lr, [tr, #1280] ; pTestSuspend
    0x0009615c: d63f03c0	blr lr
    0x00096160: 17ffffeb	b #-0x54 (addr 0x9610c)
    0x00096164: f9405260	ldr x0, [tr, #160] ; exception
    0x00096168: f942867e	ldr lr, [tr, #1288] ; pDeliverException
    0x0009616c: d63f03c0	blr lr
    0x00096170: d4200000	brk #0x0

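For comparison, here is the assembly generated for an ordinary native method with the same parameter types. Like getArrayBaseOffsetForComponentType, it is a static method, so the only difference is the @FastNative annotation. The ordinary native method compiles to 75 instructions, while the @FastNative method compiles to only 61.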

5: int sun.nio.ch.IOUtil.fdVal(java.io.FileDescriptor) (dex_method_idx=34034)
  DEX CODE:
  OatMethodOffsets (offset=0x0000738c)
    code_offset: 0x00097110 
  OatQuickMethodHeader (offset=0x0009710c)
    vmap_table: (offset=0x0008147a)
  QuickMethodFrameInfo
    frame_size_in_bytes: 176
    core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
    fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
  CODE: (code_offset=0x00097110 size=300)...
    0x00097110: d102c3ff	sub sp, sp, #0xb0 (176)
    0x00097114: a90553f3	stp tr, x20, [sp, #80]
    0x00097118: a9065bf5	stp x21, x22, [sp, #96]
    0x0009711c: a90763f7	stp x23, x24, [sp, #112]
    0x00097120: a9086bf9	stp x25, x26, [sp, #128]
    0x00097124: a90973fb	stp x27, x28, [sp, #144]
    0x00097128: a90a7bfd	stp x29, lr, [sp, #160]
    0x0009712c: 6d0127e8	stp d8, d9, [sp, #16]
    0x00097130: 6d022fea	stp d10, d11, [sp, #32]
    0x00097134: 6d0337ec	stp d12, d13, [sp, #48]
    0x00097138: 6d043fee	stp d14, d15, [sp, #64]
    0x0009713c: f90003e0	str x0, [sp]
    0x00097140: 350005f4	cbnz w20, #+0xbc (addr 0x971fc)
    0x00097144: b900bbe1	str w1, [sp, #184]
    0x00097148: 910003f0	mov x16, sp
    0x0009714c: f9005a70	str x16, [tr, #176] ; top_quick_frame_method
    0x00097150: 885f7e70	ldxr w16, [tr]
    0x00097154: 52ab8011	mov w17, #0x5c000000
    0x00097158: 350005f0	cbnz w16, #+0xbc (addr 0x97214)
    0x0009715c: 8810fe71	stlxr w16, w17, [tr]
    0x00097160: 35ffff90	cbnz w16, #-0x10 (addr 0x97150)
    0x00097164: f904f67f	str xzr, [tr, #2536] ; 2536
    0x00097168: f9406a76	ldr x22, [tr, #208] ; jni_env
    0x0009716c: b9401ad7	ldr w23, [x22, #24]
    0x00097170: b94022d8	ldr w24, [x22, #32]
    0x00097174: b9001ad8	str w24, [x22, #24]
    0x00097178: 9102e3f0	add x16, sp, #0xb8 (184)
    0x0009717c: 7100003f	cmp w1, #0x0 (0)
    0x00097180: 9a9f1202	csel x2, x16, xzr, ne
    0x00097184: aa0003e1	mov x1, x0
    0x00097188: aa1603e0	mov x0, x22
    0x0009718c: f940083e	ldr lr, [x1, #16]
    0x00097190: d63f03c0	blr lr
    0x00097194: 885ffe70	ldaxr w16, [tr]
    0x00097198: 52ab8011	mov w17, #0x5c000000
    0x0009719c: 6b11021f	cmp w16, w17
    0x000971a0: 54000401	b.ne #+0x80 (addr 0x97220)
    0x000971a4: 88107e7f	stxr w16, wzr, [tr]
    0x000971a8: 35ffff70	cbnz w16, #-0x14 (addr 0x97194)
    0x000971ac: f943d270	ldr x16, [tr, #1952] ; 1952
    0x000971b0: f904f670	str x16, [tr, #2536] ; 2536
    0x000971b4: b9401ad8	ldr w24, [x22, #24]
    0x000971b8: b90022d8	str w24, [x22, #32]
    0x000971bc: b9001ad7	str w23, [x22, #24]
    0x000971c0: f9405270	ldr x16, [tr, #160] ; exception
    0x000971c4: b5000350	cbnz x16, #+0x68 (addr 0x9722c)
    0x000971c8: a94553f3	ldp tr, x20, [sp, #80]
    0x000971cc: a9465bf5	ldp x21, x22, [sp, #96]
    0x000971d0: a94763f7	ldp x23, x24, [sp, #112]
    0x000971d4: a9486bf9	ldp x25, x26, [sp, #128]
    0x000971d8: a94973fb	ldp x27, x28, [sp, #144]
    0x000971dc: a94a7bfd	ldp x29, lr, [sp, #160]
    0x000971e0: 6d4127e8	ldp d8, d9, [sp, #16]
    0x000971e4: 6d422fea	ldp d10, d11, [sp, #32]
    0x000971e8: 6d4337ec	ldp d12, d13, [sp, #48]
    0x000971ec: 6d443fee	ldp d14, d15, [sp, #64]
    0x000971f0: b9402674	ldr w20, [tr, #36] ; is_gc_marking
    0x000971f4: 9102c3ff	add sp, sp, #0xb0 (176)
    0x000971f8: d65f03c0	ret
    0x000971fc: b9400016	ldr w22, [x0]
    0x00097200: b94006d0	ldr w16, [x22, #4]
    0x00097204: 37effa10	tbnz w16, #29, #-0xc0 (addr 0x97144)
    0x00097208: f942fe7e	ldr lr, [tr, #1528] ; pJniReadBarrier
    0x0009720c: d63f03c0	blr lr
    0x00097210: 17ffffcd	b #-0xcc (addr 0x97144)
    0x00097214: f941967e	ldr lr, [tr, #808] ; pJniMethodStart
    0x00097218: d63f03c0	blr lr
    0x0009721c: 17ffffd3	b #-0xb4 (addr 0x97168)
    0x00097220: f9419a7e	ldr lr, [tr, #816] ; pJniMethodEnd
    0x00097224: d63f03c0	blr lr
    0x00097228: 17ffffe3	b #-0x74 (addr 0x971b4)
    0x0009722c: f9405260	ldr x0, [tr, #160] ; exception
    0x00097230: f942867e	ldr lr, [tr, #1288] ; pDeliverException
    0x00097234: d63f03c0	blr lr
    0x00097238: d4200000	brk #0x0

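@FastNative and fast jni both skip the thread state transition, but their implementation details differ somewhat. From the JNI trampoline's point of view, you can think of @FastNative as a refactoring of fast jni with slightly better performance.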

However, @FastNative does not support synchronized, because the locking for synchronized happens inside the trampoline and may block, which goes against the design intent of @FastNative.

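With thread state transitions covered, let's turn to GC roots in a JNI call. They come mainly from two places:

  • Reference arguments passed into the native method.
  • Java objects created inside the C/C++ function.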

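These objects must be treated as GC roots because nothing else may reference them, for example when a newly created object is passed directly as an argument. To find these roots during GC, the runtime introduced data structures such as the Local Reference Table, Global Reference Table, and HandleScope (now deprecated), and the JNI trampoline gained some extra processing steps. But what if the C/C++ function does not need these Java objects? Does that mean the JNI trampoline can skip those steps?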
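Before looking at how that work can be skipped, it helps to see what the bookkeeping amounts to. The following is a minimal sketch of a local reference table, assuming a simple stack discipline: a frame is opened when the native method is entered, references are registered while it runs, and everything registered since entry is dropped on exit. LocalRefTable and FakeObject are hypothetical names, and this is not how ART's real reference tables are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using FakeObject = int;  // stand-in for a managed object

class LocalRefTable {
 public:
  // Called on native-method entry: remembers where this call's references
  // begin. The returned cookie must be handed back to PopFrame.
  std::size_t PushFrame() const { return refs_.size(); }

  // Registers an object as a local reference and returns its slot index.
  std::size_t Add(const FakeObject* obj) {
    refs_.push_back(obj);
    return refs_.size() - 1;
  }

  // Called on native-method exit: drops every reference added since the
  // matching PushFrame, so those objects stop being roots.
  void PopFrame(std::size_t cookie) {
    assert(cookie <= refs_.size());
    refs_.resize(cookie);
  }

  // What a GC would scan as roots while native code is running.
  const std::vector<const FakeObject*>& Roots() const { return refs_; }

 private:
  std::vector<const FakeObject*> refs_;
};
```

A trampoline built on this model would call PushFrame before jumping to the C/C++ function, Add for each reference argument, and PopFrame after the function returns. A native function that never touches a Java object needs none of the three, which is the opening the next optimization exploits.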

Following this line of optimization, Google introduced the @CriticalNative annotation, as shown below.

@CriticalNative
public static native long getNanoTimeAdjustment(long offsetInSeconds);

JNIEXPORT jlong JNICALL VM_getNanoTimeAdjustment(jlong offsetInSeconds) {
    return JVM_GetNanoTimeAdjustment(nullptr, nullptr, offsetInSeconds);
}

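Because a JNI function annotated with @CriticalNative cannot use Java objects internally, the annotation can only be applied to static methods, since a non-static method implicitly receives this as an argument. In addition, the corresponding C/C++ function no longer takes the JNIEnv* and jclass parameters.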

Let's see what @CriticalNative can do!

10: long jdk.internal.misc.VM.getNanoTimeAdjustment(long) (dex_method_idx=32015)
  DEX CODE:
  OatMethodOffsets (offset=0x00006fd0)
    code_offset: 0x000988c0 
  OatQuickMethodHeader (offset=0x000988bc)
    vmap_table: (offset=0x0006fd81)
  QuickMethodFrameInfo
    frame_size_in_bytes: 0
    core_spill_mask: 0x00000000 
    fp_spill_mask: 0x00000000 
  CODE: (code_offset=0x000988c0 size=16)...
    0x000988c0: aa0003ef	mov x15, x0
    0x000988c4: aa0103e0	mov x0, x1
    0x000988c8: f94009f0	ldr x16, [x15, #16]
    0x000988cc: d61f0200	br x16

As you can see, a method annotated with @CriticalNative compiles down to just 4 instructions, a dramatic improvement.
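As I read the four instructions in the dump, the stub sets the incoming method pointer aside (mov x15, x0), shifts the only Java argument into the first argument register (mov x0, x1), loads the native entry point from the method (ldr x16, [x15, #16]) and jumps to it (br x16). The C++ sketch below mirrors that shape and contrasts the two function signatures. The types in namespace sketch are stand-ins so the example compiles without <jni.h>, FakeMethod and CriticalStub are hypothetical, and the function bodies are placeholders.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

struct JNIEnv {};              // opaque stand-in for the real JNIEnv
using jclass = const void*;    // stand-in for the real jclass
using jlong = std::int64_t;    // JNI's jlong is a 64-bit signed integer

// Shape of an ordinary (or @FastNative) static native implementation.
jlong NormalImpl(JNIEnv* /*env*/, jclass /*clazz*/, jlong offset) {
  return offset + 1;  // placeholder body
}

// Shape of a @CriticalNative implementation: no JNIEnv*, no jclass.
jlong CriticalImpl(jlong offset) {
  return offset + 1;  // placeholder body
}

// Toy method record holding only the native entry point.
struct FakeMethod {
  jlong (*entry)(jlong);
};

// Mirrors the stub: the method pointer is dropped from the argument list,
// the entry point is loaded from it, and the remaining argument is passed
// straight through. No frame setup, no state switch, no reference handling.
jlong CriticalStub(const FakeMethod* method, jlong arg) {
  assert(method != nullptr && method->entry != nullptr);
  return method->entry(arg);
}

}  // namespace sketch
```

The real stub ends in br rather than blr, so the C/C++ function returns directly to the Java caller; C++ gives no way to demand that, so the sketch uses an ordinary call.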

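@FastNative and @CriticalNative both help performance, but pay close attention to where they apply and do not use them casually, or you may cause other problems. For a detailed description of the two and their caveats, see the official documentation for @FastNative and @CriticalNative. I won't belabor them here.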