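When we call a native method from Java, most people assume execution jumps straight into the corresponding C/C++ function. That is not what happens. An intermediate function is needed to switch the thread state, update the Local Reference Table, convert arguments, and so on. This function is usually called the "JNI trampoline", and the less time it takes, the faster the JNI call.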
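Over Android's history, Google has optimized the JNI trampoline several times. Broadly, the optimizations go in two directions:
- Moving from a generic trampoline that handles every parameter type to specific trampolines that each handle one particular combination of parameter types.
- Skipping some of the trampoline's work, depending on what the C/C++ function actually does.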
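As is well known, the ART runtime supports three execution modes: interpretation, AOT, and JIT. The latter two both run machine code. When the interpreter encounters a native method, it hands the intermediate work to the runtime's built-in art_quick_generic_jni_trampoline. This generic trampoline works for every parameter type. However, it must account for every case, even the most extreme ones, so its performance is poor. For example, even when we pass a single argument, art_quick_generic_jni_trampoline still reserves 5 KB (5120 bytes) on the stack.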
// Reserved area on stack for art_quick_generic_jni_trampoline:
// 4 local state ref
// 4 padding
// 4096 4k scratch space, enough for 2x 256 8-byte parameters
// 8*(32+32) max 32 GPRs and 32 FPRs on each architecture, 8 bytes each
// + 4 padding for 16-bytes alignment
// -----------
// 4616
// Round up to 5k, total 5120
#define GENERIC_JNI_TRAMPOLINE_RESERVED_AREA 5120
The "256 8-byte parameters" in the comment comes from the JVM Specification, which caps the number of parameters a Java method can take at 255:
The number of method parameters is limited to 255 by the definition of a method descriptor (§4.3.3), where the limit includes one unit for this in the case of instance or interface method invocations.
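To make the arithmetic in the reserved-area comment easy to check, here is a small C++ sketch that recomputes each term at compile time. The constant names are illustrative and not ART's; only the final value 5120 comes from the macro above. The terms as listed actually add up to 4620 rather than the 4616 written in the comment. Either way, the total rounds up to the same 5 KB.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative names for the terms in the GENERIC_JNI_TRAMPOLINE_RESERVED_AREA comment.
constexpr std::size_t kLocalStateRef = 4;
constexpr std::size_t kPadding = 4;
// 2 x 256 parameters x 8 bytes each = 4096 bytes of scratch space.
constexpr std::size_t kScratchSpace = 2 * 256 * 8;
// Up to 32 GPRs plus 32 FPRs, 8 bytes each.
constexpr std::size_t kRegisterSpace = 8 * (32 + 32);
constexpr std::size_t kAlignPadding = 4;

constexpr std::size_t kRawTotal =
    kLocalStateRef + kPadding + kScratchSpace + kRegisterSpace + kAlignPadding;

// Round up to the next multiple of 1 KiB.
constexpr std::size_t RoundUpToKiB(std::size_t n) {
  return (n + 1023) / 1024 * 1024;
}

constexpr std::size_t kReservedArea = RoundUpToKiB(kRawTotal);

static_assert(kScratchSpace == 4096, "4 KiB scratch space");
static_assert(kRegisterSpace == 512, "64 registers x 8 bytes");
static_assert(kRawTotal == 4620, "sum of the terms as listed");
static_assert(kReservedArea == 5120, "matches GENERIC_JNI_TRAMPOLINE_RESERVED_AREA");
```

Almost all of the 5 KB is the fixed 4 KB scratch area. That is why even a one-argument call pays for the worst case.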
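Things improve with machine-code execution. There, the compiler generates a specific trampoline for each combination of parameter types, usually called a "compiler JNI trampoline". Because these trampolines know the parameter types, they can convert and pass arguments more directly. The compiler also inlines the thread-state transitions. Together, these changes make the compiler JNI trampoline faster than the generic JNI trampoline.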
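Note that one trampoline is generated per combination of parameter types, not per native method. For example, below are two different native methods from boot.oat. Their return types differ, but their parameter types match: the first parameter is a reference and the second is a 4-byte primitive. They therefore share one trampoline.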
- static native void listen(FileDescriptor fd, int backlog) throws IOException;
- private static native int chmod(String fileName, int permission);
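The sharing rule can be modeled with dex "shorty" descriptors. In a shorty, the first character is the return type and every reference parameter collapses to L. The sketch below uses a hypothetical grouping key: the static flag plus the parameter part of the shorty. It only illustrates the article's point and is not ART's actual logic for deciding which methods share a stub. The real logic may also consider things like synchronized, @FastNative and @CriticalNative.

```cpp
#include <cassert>
#include <string>

// Hypothetical key for deciding which native methods can share one compiled
// JNI trampoline: whether the method is static, plus its parameter types.
struct StubKey {
  bool is_static;
  std::string param_shorty;  // shorty without its return-type character

  bool operator==(const StubKey& other) const {
    return is_static == other.is_static && param_shorty == other.param_shorty;
  }
};

// shorty[0] is the return type (a shorty always has at least one character);
// this toy model ignores it, as in the listen/chmod example.
StubKey MakeStubKey(bool is_static, const std::string& shorty) {
  return StubKey{is_static, shorty.substr(1)};
}
```

With this key, listen (shorty "VLI") and chmod (shorty "ILI") both map to the static key "LI". This matches the shared code_offset in the oatdump listings below.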
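Dumping the assembly for these two methods (that is, their compiler trampoline) with oatdump shows that they are exactly the same, right down to the address (code_offset 0x00098d90). So the oat file contains only one copy of this trampoline, and every native method with matching parameter types points to it. (For a detailed explanation of this assembly, see my previous article.)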
32: int java.util.prefs.FileSystemPreferences.chmod(java.lang.String, int) (dex_method_idx=26767)
DEX CODE:
OatMethodOffsets (offset=0x0000e9fc)
code_offset: 0x00098d90
OatQuickMethodHeader (offset=0x00098d8c)
vmap_table: (offset=0x0006f7bd)
QuickMethodFrameInfo
frame_size_in_bytes: 176
core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
CODE: (code_offset=0x00098d90 size=304)...
0x00098d90: d102c3ff sub sp, sp, #0xb0 (176)
0x00098d94: a90553f3 stp tr, x20, [sp, #80]
0x00098d98: a9065bf5 stp x21, x22, [sp, #96]
0x00098d9c: a90763f7 stp x23, x24, [sp, #112]
0x00098da0: a9086bf9 stp x25, x26, [sp, #128]
0x00098da4: a90973fb stp x27, x28, [sp, #144]
0x00098da8: a90a7bfd stp x29, lr, [sp, #160]
0x00098dac: 6d0127e8 stp d8, d9, [sp, #16]
0x00098db0: 6d022fea stp d10, d11, [sp, #32]
0x00098db4: 6d0337ec stp d12, d13, [sp, #48]
0x00098db8: 6d043fee stp d14, d15, [sp, #64]
0x00098dbc: f90003e0 str x0, [sp]
0x00098dc0: 35000614 cbnz w20, #+0xc0 (addr 0x98e80)
0x00098dc4: b900bbe1 str w1, [sp, #184]
0x00098dc8: 910003f0 mov x16, sp
0x00098dcc: f9005a70 str x16, [tr, #176] ; top_quick_frame_method
0x00098dd0: 885f7e70 ldxr w16, [tr]
0x00098dd4: 52ab8011 mov w17, #0x5c000000
0x00098dd8: 35000610 cbnz w16, #+0xc0 (addr 0x98e98)
0x00098ddc: 8810fe71 stlxr w16, w17, [tr]
0x00098de0: 35ffff90 cbnz w16, #-0x10 (addr 0x98dd0)
0x00098de4: f904f67f str xzr, [tr, #2536] ; 2536
0x00098de8: f9406a76 ldr x22, [tr, #208] ; jni_env
0x00098dec: b9401ad7 ldr w23, [x22, #24]
0x00098df0: b94022d8 ldr w24, [x22, #32]
0x00098df4: b9001ad8 str w24, [x22, #24]
0x00098df8: 2a0203e3 mov w3, w2
0x00098dfc: 9102e3f0 add x16, sp, #0xb8 (184)
0x00098e00: 7100003f cmp w1, #0x0 (0)
0x00098e04: 9a9f1202 csel x2, x16, xzr, ne
0x00098e08: aa0003e1 mov x1, x0
0x00098e0c: aa1603e0 mov x0, x22
0x00098e10: f940083e ldr lr, [x1, #16]
0x00098e14: d63f03c0 blr lr
0x00098e18: 885ffe70 ldaxr w16, [tr]
0x00098e1c: 52ab8011 mov w17, #0x5c000000
0x00098e20: 6b11021f cmp w16, w17
0x00098e24: 54000401 b.ne #+0x80 (addr 0x98ea4)
0x00098e28: 88107e7f stxr w16, wzr, [tr]
0x00098e2c: 35ffff70 cbnz w16, #-0x14 (addr 0x98e18)
0x00098e30: f943d270 ldr x16, [tr, #1952] ; 1952
0x00098e34: f904f670 str x16, [tr, #2536] ; 2536
0x00098e38: b9401ad8 ldr w24, [x22, #24]
0x00098e3c: b90022d8 str w24, [x22, #32]
0x00098e40: b9001ad7 str w23, [x22, #24]
0x00098e44: f9405270 ldr x16, [tr, #160] ; exception
0x00098e48: b5000350 cbnz x16, #+0x68 (addr 0x98eb0)
0x00098e4c: a94553f3 ldp tr, x20, [sp, #80]
0x00098e50: a9465bf5 ldp x21, x22, [sp, #96]
0x00098e54: a94763f7 ldp x23, x24, [sp, #112]
0x00098e58: a9486bf9 ldp x25, x26, [sp, #128]
0x00098e5c: a94973fb ldp x27, x28, [sp, #144]
0x00098e60: a94a7bfd ldp x29, lr, [sp, #160]
0x00098e64: 6d4127e8 ldp d8, d9, [sp, #16]
0x00098e68: 6d422fea ldp d10, d11, [sp, #32]
0x00098e6c: 6d4337ec ldp d12, d13, [sp, #48]
0x00098e70: 6d443fee ldp d14, d15, [sp, #64]
0x00098e74: b9402674 ldr w20, [tr, #36] ; is_gc_marking
0x00098e78: 9102c3ff add sp, sp, #0xb0 (176)
0x00098e7c: d65f03c0 ret
0x00098e80: b9400016 ldr w22, [x0]
0x00098e84: b94006d0 ldr w16, [x22, #4]
0x00098e88: 37eff9f0 tbnz w16, #29, #-0xc4 (addr 0x98dc4)
0x00098e8c: f942fe7e ldr lr, [tr, #1528] ; pJniReadBarrier
0x00098e90: d63f03c0 blr lr
0x00098e94: 17ffffcc b #-0xd0 (addr 0x98dc4)
0x00098e98: f941967e ldr lr, [tr, #808] ; pJniMethodStart
0x00098e9c: d63f03c0 blr lr
0x00098ea0: 17ffffd2 b #-0xb8 (addr 0x98de8)
0x00098ea4: f9419a7e ldr lr, [tr, #816] ; pJniMethodEnd
0x00098ea8: d63f03c0 blr lr
0x00098eac: 17ffffe3 b #-0x74 (addr 0x98e38)
0x00098eb0: f9405260 ldr x0, [tr, #160] ; exception
0x00098eb4: f942867e ldr lr, [tr, #1288] ; pDeliverException
0x00098eb8: d63f03c0 blr lr
0x00098ebc: d4200000 brk #0x0
39: void sun.nio.ch.Net.listen(java.io.FileDescriptor, int) (dex_method_idx=34223)
DEX CODE:
OatMethodOffsets (offset=0x0000748c)
code_offset: 0x00098d90
OatQuickMethodHeader (offset=0x00098d8c)
vmap_table: (offset=0x0006f7bd)
QuickMethodFrameInfo
frame_size_in_bytes: 176
core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
CODE: (code_offset=0x00098d90 size=304)...
0x00098d90: d102c3ff sub sp, sp, #0xb0 (176)
0x00098d94: a90553f3 stp tr, x20, [sp, #80]
0x00098d98: a9065bf5 stp x21, x22, [sp, #96]
0x00098d9c: a90763f7 stp x23, x24, [sp, #112]
0x00098da0: a9086bf9 stp x25, x26, [sp, #128]
0x00098da4: a90973fb stp x27, x28, [sp, #144]
0x00098da8: a90a7bfd stp x29, lr, [sp, #160]
0x00098dac: 6d0127e8 stp d8, d9, [sp, #16]
0x00098db0: 6d022fea stp d10, d11, [sp, #32]
0x00098db4: 6d0337ec stp d12, d13, [sp, #48]
0x00098db8: 6d043fee stp d14, d15, [sp, #64]
0x00098dbc: f90003e0 str x0, [sp]
0x00098dc0: 35000614 cbnz w20, #+0xc0 (addr 0x98e80)
0x00098dc4: b900bbe1 str w1, [sp, #184]
0x00098dc8: 910003f0 mov x16, sp
0x00098dcc: f9005a70 str x16, [tr, #176] ; top_quick_frame_method
0x00098dd0: 885f7e70 ldxr w16, [tr]
0x00098dd4: 52ab8011 mov w17, #0x5c000000
0x00098dd8: 35000610 cbnz w16, #+0xc0 (addr 0x98e98)
0x00098ddc: 8810fe71 stlxr w16, w17, [tr]
0x00098de0: 35ffff90 cbnz w16, #-0x10 (addr 0x98dd0)
0x00098de4: f904f67f str xzr, [tr, #2536] ; 2536
0x00098de8: f9406a76 ldr x22, [tr, #208] ; jni_env
0x00098dec: b9401ad7 ldr w23, [x22, #24]
0x00098df0: b94022d8 ldr w24, [x22, #32]
0x00098df4: b9001ad8 str w24, [x22, #24]
0x00098df8: 2a0203e3 mov w3, w2
0x00098dfc: 9102e3f0 add x16, sp, #0xb8 (184)
0x00098e00: 7100003f cmp w1, #0x0 (0)
0x00098e04: 9a9f1202 csel x2, x16, xzr, ne
0x00098e08: aa0003e1 mov x1, x0
0x00098e0c: aa1603e0 mov x0, x22
0x00098e10: f940083e ldr lr, [x1, #16]
0x00098e14: d63f03c0 blr lr
0x00098e18: 885ffe70 ldaxr w16, [tr]
0x00098e1c: 52ab8011 mov w17, #0x5c000000
0x00098e20: 6b11021f cmp w16, w17
0x00098e24: 54000401 b.ne #+0x80 (addr 0x98ea4)
0x00098e28: 88107e7f stxr w16, wzr, [tr]
0x00098e2c: 35ffff70 cbnz w16, #-0x14 (addr 0x98e18)
0x00098e30: f943d270 ldr x16, [tr, #1952] ; 1952
0x00098e34: f904f670 str x16, [tr, #2536] ; 2536
0x00098e38: b9401ad8 ldr w24, [x22, #24]
0x00098e3c: b90022d8 str w24, [x22, #32]
0x00098e40: b9001ad7 str w23, [x22, #24]
0x00098e44: f9405270 ldr x16, [tr, #160] ; exception
0x00098e48: b5000350 cbnz x16, #+0x68 (addr 0x98eb0)
0x00098e4c: a94553f3 ldp tr, x20, [sp, #80]
0x00098e50: a9465bf5 ldp x21, x22, [sp, #96]
0x00098e54: a94763f7 ldp x23, x24, [sp, #112]
0x00098e58: a9486bf9 ldp x25, x26, [sp, #128]
0x00098e5c: a94973fb ldp x27, x28, [sp, #144]
0x00098e60: a94a7bfd ldp x29, lr, [sp, #160]
0x00098e64: 6d4127e8 ldp d8, d9, [sp, #16]
0x00098e68: 6d422fea ldp d10, d11, [sp, #32]
0x00098e6c: 6d4337ec ldp d12, d13, [sp, #48]
0x00098e70: 6d443fee ldp d14, d15, [sp, #64]
0x00098e74: b9402674 ldr w20, [tr, #36] ; is_gc_marking
0x00098e78: 9102c3ff add sp, sp, #0xb0 (176)
0x00098e7c: d65f03c0 ret
0x00098e80: b9400016 ldr w22, [x0]
0x00098e84: b94006d0 ldr w16, [x22, #4]
0x00098e88: 37eff9f0 tbnz w16, #29, #-0xc4 (addr 0x98dc4)
0x00098e8c: f942fe7e ldr lr, [tr, #1528] ; pJniReadBarrier
0x00098e90: d63f03c0 blr lr
0x00098e94: 17ffffcc b #-0xd0 (addr 0x98dc4)
0x00098e98: f941967e ldr lr, [tr, #808] ; pJniMethodStart
0x00098e9c: d63f03c0 blr lr
0x00098ea0: 17ffffd2 b #-0xb8 (addr 0x98de8)
0x00098ea4: f9419a7e ldr lr, [tr, #816] ; pJniMethodEnd
0x00098ea8: d63f03c0 blr lr
0x00098eac: 17ffffe3 b #-0x74 (addr 0x98e38)
0x00098eb0: f9405260 ldr x0, [tr, #160] ; exception
0x00098eb4: f942867e ldr lr, [tr, #1288] ; pDeliverException
0x00098eb8: d63f03c0 blr lr
0x00098ebc: d4200000 brk #0x0
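Compiling native methods with JIT/AOT therefore improves performance. The gain does not come from turning bytecode into machine code, since a native method has no body and therefore no bytecode. It comes from replacing art_quick_generic_jni_trampoline with a "compiler JNI trampoline".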
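Many mainstream apps are now trying the Baseline Profile approach, and it may be worth adding these native methods to the profile. Different methods with the same parameter types share one trampoline, so the extra compiled code size should be negligible, while every native method benefits from the speedup.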
Let's return to where we started: why does a JNI call need a trampoline at all?
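Argument conversion is easy to understand. The C/C++ function takes an extra JNIEnv* parameter, and Java reference types and String must be converted to the corresponding C++ types. But what are the thread-state switch and the Local Reference Table for?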
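In short, both exist so that GC can work correctly. However garbage collectors evolve, two fundamentals stay the same. First, there must be a quiet window in which the heap can be observed in a stable state. Second, all GC roots must be found. The quiet window is also called "stop the world": during it, no thread may touch heap memory. This is where the concept of Java thread states comes from. The Runnable state means the thread is running in the Java world and may use heap memory at any moment. The Native state means the thread is running in the C/C++ world and will not touch the Java heap. To the GC, a thread in the Native state counts as "paused". Here "paused" does not mean the thread has stopped running, only that it is not touching the Java heap.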
So every regular JNI call has to switch the thread state from Runnable to Native. This tells the GC: from here on I won't touch the Java heap, so treat me as asleep.
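But what if the C/C++ function needs the Java heap again? That happens all the time. Examples include calling a Java method via env->CallObjectMethod, or reading a field of a reference argument via env->GetBooleanField. In those cases the thread state must switch from Native back to Runnable. A regular JNI call therefore involves frequent thread-state switches, which is a significant performance cost.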
Then why not skip the switch altogether and leave the thread Runnable? What would happen?
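Whether interpreted or compiled, Java/Kotlin code has many check points (suspend checks) inserted into it. They ensure a thread can pause promptly when needed instead of charging ahead blindfolded. C/C++ code has no such check points, partly for performance and partly because compilers differ and such insertion cannot be guaranteed. So if a thread is Runnable while running C/C++ code, the GC can only wait for it to return to the Java world and reach a check point before suspending it. If the C/C++ function runs very briefly, skipping the state switch does improve performance. If it runs for a long time, or may block internally (for example waiting on a lock or calling sleep), the GC will be severely disrupted. The cost to GC may well outweigh the little time saved.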
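Here is a toy C++ model of the rule above. A stop-the-world pause can begin only when every thread is either Native (not touching the heap) or already parked at a check point. The enum and function names are hypothetical and greatly simplified. They are not ART's ThreadState API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified thread states (ART's real enum has many more).
enum class State { kRunnable, kNative, kSuspended };

struct ThreadModel {
  State state;
};

// A thread blocks the stop-the-world pause only while it is Runnable:
// Native threads promise not to touch the Java heap, and Suspended threads
// have already stopped at a check point.
bool CanStopTheWorld(const std::vector<ThreadModel>& threads) {
  for (const ThreadModel& t : threads) {
    if (t.state == State::kRunnable) {
      return false;  // the GC must wait for this thread to reach a check point
    }
  }
  return true;
}
```

A C/C++ function that skips the state switch keeps its thread kRunnable for its entire run. That is exactly the case where CanStopTheWorld stays false until the function returns to Java.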
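So when developers can guarantee that the C/C++ function behind a native method finishes quickly, they can use a mechanism Android provides to tell the trampoline to skip the thread-state switch.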
In early Android versions, prefixing the method signature with ! skipped the thread-state switch. This approach is called fast JNI, as shown below.
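static JNINativeMethod gMethods[] = {
  // The leading '!' in the signature marks the method as fast JNI (pre-Android 8).
  NATIVE_METHOD(Unsafe, compareAndSwapInt, "!(Ljava/lang/Object;JII)Z"),
  NATIVE_METHOD(Unsafe, compareAndSwapLong, "!(Ljava/lang/Object;JJJ)Z"),
  // ... remaining entries omitted
};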
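Under the fast JNI scheme, the runtime had to recognize the ! prefix when native methods were registered. The sketch below shows the idea with a hypothetical helper, ParseRegisteredSignature. It strips a leading ! and records that the method is "fast". It is an illustration only, not ART's registration code.

```cpp
#include <cassert>
#include <string>

// Result of parsing a registered JNI signature (hypothetical helper type).
struct ParsedSignature {
  bool is_fast;           // true if the legacy "!" fast-JNI prefix was present
  std::string signature;  // the ordinary JNI signature with the prefix removed
};

ParsedSignature ParseRegisteredSignature(const std::string& sig) {
  if (!sig.empty() && sig[0] == '!') {
    return ParsedSignature{true, sig.substr(1)};
  }
  return ParsedSignature{false, sig};
}
```

Because the flag lives inside the signature string, it is only visible at registration time. The annotation that replaced it is visible to the compiler when it generates the trampoline.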
This approach has been deprecated since Android 8 and replaced by the @FastNative annotation, shown below. The generated JNI trampoline is 61 lines of assembly.
@FastNative
private static native int getArrayBaseOffsetForComponentType(Class component_class);
2: int sun.misc.Unsafe.getArrayBaseOffsetForComponentType(java.lang.Class) (dex_method_idx=32803)
DEX CODE:
OatMethodOffsets (offset=0x000070cc)
code_offset: 0x00096080
OatQuickMethodHeader (offset=0x0009607c)
vmap_table: (offset=0x00082eed)
QuickMethodFrameInfo
frame_size_in_bytes: 176
core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
CODE: (code_offset=0x00096080 size=244)...
0x00096080: d102c3ff sub sp, sp, #0xb0 (176)
0x00096084: a90553f3 stp tr, x20, [sp, #80]
0x00096088: a9065bf5 stp x21, x22, [sp, #96]
0x0009608c: a90763f7 stp x23, x24, [sp, #112]
0x00096090: a9086bf9 stp x25, x26, [sp, #128]
0x00096094: a90973fb stp x27, x28, [sp, #144]
0x00096098: a90a7bfd stp x29, lr, [sp, #160]
0x0009609c: 6d0127e8 stp d8, d9, [sp, #16]
0x000960a0: 6d022fea stp d10, d11, [sp, #32]
0x000960a4: 6d0337ec stp d12, d13, [sp, #48]
0x000960a8: 6d043fee stp d14, d15, [sp, #64]
0x000960ac: f90003e0 str x0, [sp]
0x000960b0: 35000494 cbnz w20, #+0x90 (addr 0x96140)
0x000960b4: b900bbe1 str w1, [sp, #184]
0x000960b8: 910003f0 mov x16, sp
0x000960bc: f9005a70 str x16, [tr, #176] ; top_quick_frame_method
0x000960c0: f9406a76 ldr x22, [tr, #208] ; jni_env
0x000960c4: b9401ad7 ldr w23, [x22, #24]
0x000960c8: b94022d8 ldr w24, [x22, #32]
0x000960cc: b9001ad8 str w24, [x22, #24]
0x000960d0: 9102e3f0 add x16, sp, #0xb8 (184)
0x000960d4: 7100003f cmp w1, #0x0 (0)
0x000960d8: 9a9f1202 csel x2, x16, xzr, ne
0x000960dc: aa0003e1 mov x1, x0
0x000960e0: aa1603e0 mov x0, x22
0x000960e4: f940083e ldr lr, [x1, #16]
0x000960e8: d63f03c0 blr lr
0x000960ec: b9401ad8 ldr w24, [x22, #24]
0x000960f0: b90022d8 str w24, [x22, #32]
0x000960f4: b9001ad7 str w23, [x22, #24]
0x000960f8: f9405270 ldr x16, [tr, #160] ; exception
0x000960fc: b5000350 cbnz x16, #+0x68 (addr 0x96164)
0x00096100: b9400270 ldr w16, [tr] ; state_and_flags
0x00096104: 72000a1f tst w16, #0x7
0x00096108: 54000281 b.ne #+0x50 (addr 0x96158)
0x0009610c: a94553f3 ldp tr, x20, [sp, #80]
0x00096110: a9465bf5 ldp x21, x22, [sp, #96]
0x00096114: a94763f7 ldp x23, x24, [sp, #112]
0x00096118: a9486bf9 ldp x25, x26, [sp, #128]
0x0009611c: a94973fb ldp x27, x28, [sp, #144]
0x00096120: a94a7bfd ldp x29, lr, [sp, #160]
0x00096124: 6d4127e8 ldp d8, d9, [sp, #16]
0x00096128: 6d422fea ldp d10, d11, [sp, #32]
0x0009612c: 6d4337ec ldp d12, d13, [sp, #48]
0x00096130: 6d443fee ldp d14, d15, [sp, #64]
0x00096134: b9402674 ldr w20, [tr, #36] ; is_gc_marking
0x00096138: 9102c3ff add sp, sp, #0xb0 (176)
0x0009613c: d65f03c0 ret
0x00096140: b9400016 ldr w22, [x0]
0x00096144: b94006d0 ldr w16, [x22, #4]
0x00096148: 37effb70 tbnz w16, #29, #-0x94 (addr 0x960b4)
0x0009614c: f942fe7e ldr lr, [tr, #1528] ; pJniReadBarrier
0x00096150: d63f03c0 blr lr
0x00096154: 17ffffd8 b #-0xa0 (addr 0x960b4)
0x00096158: f942827e ldr lr, [tr, #1280] ; pTestSuspend
0x0009615c: d63f03c0 blr lr
0x00096160: 17ffffeb b #-0x54 (addr 0x9610c)
0x00096164: f9405260 ldr x0, [tr, #160] ; exception
0x00096168: f942867e ldr lr, [tr, #1288] ; pDeliverException
0x0009616c: d63f03c0 blr lr
0x00096170: d4200000 brk #0x0
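For comparison, here is the assembly generated for a regular native method with the same parameter types. Like getArrayBaseOffsetForComponentType, it is a static method that takes one reference and returns an int. The only relevant difference is therefore the @FastNative annotation. The regular native method's trampoline is 75 lines of assembly (300 bytes), while the @FastNative method's is only 61 lines (244 bytes).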
5: int sun.nio.ch.IOUtil.fdVal(java.io.FileDescriptor) (dex_method_idx=34034)
DEX CODE:
OatMethodOffsets (offset=0x0000738c)
code_offset: 0x00097110
OatQuickMethodHeader (offset=0x0009710c)
vmap_table: (offset=0x0008147a)
QuickMethodFrameInfo
frame_size_in_bytes: 176
core_spill_mask: 0x7ff80000 (r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30)
fp_spill_mask: 0x0000ff00 (fr8, fr9, fr10, fr11, fr12, fr13, fr14, fr15)
CODE: (code_offset=0x00097110 size=300)...
0x00097110: d102c3ff sub sp, sp, #0xb0 (176)
0x00097114: a90553f3 stp tr, x20, [sp, #80]
0x00097118: a9065bf5 stp x21, x22, [sp, #96]
0x0009711c: a90763f7 stp x23, x24, [sp, #112]
0x00097120: a9086bf9 stp x25, x26, [sp, #128]
0x00097124: a90973fb stp x27, x28, [sp, #144]
0x00097128: a90a7bfd stp x29, lr, [sp, #160]
0x0009712c: 6d0127e8 stp d8, d9, [sp, #16]
0x00097130: 6d022fea stp d10, d11, [sp, #32]
0x00097134: 6d0337ec stp d12, d13, [sp, #48]
0x00097138: 6d043fee stp d14, d15, [sp, #64]
0x0009713c: f90003e0 str x0, [sp]
0x00097140: 350005f4 cbnz w20, #+0xbc (addr 0x971fc)
0x00097144: b900bbe1 str w1, [sp, #184]
0x00097148: 910003f0 mov x16, sp
0x0009714c: f9005a70 str x16, [tr, #176] ; top_quick_frame_method
0x00097150: 885f7e70 ldxr w16, [tr]
0x00097154: 52ab8011 mov w17, #0x5c000000
0x00097158: 350005f0 cbnz w16, #+0xbc (addr 0x97214)
0x0009715c: 8810fe71 stlxr w16, w17, [tr]
0x00097160: 35ffff90 cbnz w16, #-0x10 (addr 0x97150)
0x00097164: f904f67f str xzr, [tr, #2536] ; 2536
0x00097168: f9406a76 ldr x22, [tr, #208] ; jni_env
0x0009716c: b9401ad7 ldr w23, [x22, #24]
0x00097170: b94022d8 ldr w24, [x22, #32]
0x00097174: b9001ad8 str w24, [x22, #24]
0x00097178: 9102e3f0 add x16, sp, #0xb8 (184)
0x0009717c: 7100003f cmp w1, #0x0 (0)
0x00097180: 9a9f1202 csel x2, x16, xzr, ne
0x00097184: aa0003e1 mov x1, x0
0x00097188: aa1603e0 mov x0, x22
0x0009718c: f940083e ldr lr, [x1, #16]
0x00097190: d63f03c0 blr lr
0x00097194: 885ffe70 ldaxr w16, [tr]
0x00097198: 52ab8011 mov w17, #0x5c000000
0x0009719c: 6b11021f cmp w16, w17
0x000971a0: 54000401 b.ne #+0x80 (addr 0x97220)
0x000971a4: 88107e7f stxr w16, wzr, [tr]
0x000971a8: 35ffff70 cbnz w16, #-0x14 (addr 0x97194)
0x000971ac: f943d270 ldr x16, [tr, #1952] ; 1952
0x000971b0: f904f670 str x16, [tr, #2536] ; 2536
0x000971b4: b9401ad8 ldr w24, [x22, #24]
0x000971b8: b90022d8 str w24, [x22, #32]
0x000971bc: b9001ad7 str w23, [x22, #24]
0x000971c0: f9405270 ldr x16, [tr, #160] ; exception
0x000971c4: b5000350 cbnz x16, #+0x68 (addr 0x9722c)
0x000971c8: a94553f3 ldp tr, x20, [sp, #80]
0x000971cc: a9465bf5 ldp x21, x22, [sp, #96]
0x000971d0: a94763f7 ldp x23, x24, [sp, #112]
0x000971d4: a9486bf9 ldp x25, x26, [sp, #128]
0x000971d8: a94973fb ldp x27, x28, [sp, #144]
0x000971dc: a94a7bfd ldp x29, lr, [sp, #160]
0x000971e0: 6d4127e8 ldp d8, d9, [sp, #16]
0x000971e4: 6d422fea ldp d10, d11, [sp, #32]
0x000971e8: 6d4337ec ldp d12, d13, [sp, #48]
0x000971ec: 6d443fee ldp d14, d15, [sp, #64]
0x000971f0: b9402674 ldr w20, [tr, #36] ; is_gc_marking
0x000971f4: 9102c3ff add sp, sp, #0xb0 (176)
0x000971f8: d65f03c0 ret
0x000971fc: b9400016 ldr w22, [x0]
0x00097200: b94006d0 ldr w16, [x22, #4]
0x00097204: 37effa10 tbnz w16, #29, #-0xc0 (addr 0x97144)
0x00097208: f942fe7e ldr lr, [tr, #1528] ; pJniReadBarrier
0x0009720c: d63f03c0 blr lr
0x00097210: 17ffffcd b #-0xcc (addr 0x97144)
0x00097214: f941967e ldr lr, [tr, #808] ; pJniMethodStart
0x00097218: d63f03c0 blr lr
0x0009721c: 17ffffd3 b #-0xb4 (addr 0x97168)
0x00097220: f9419a7e ldr lr, [tr, #816] ; pJniMethodEnd
0x00097224: d63f03c0 blr lr
0x00097228: 17ffffe3 b #-0x74 (addr 0x971b4)
0x0009722c: f9405260 ldr x0, [tr, #160] ; exception
0x00097230: f942867e ldr lr, [tr, #1288] ; pDeliverException
0x00097234: d63f03c0 blr lr
0x00097238: d4200000 brk #0x0
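Here is how the two listings differ:
- **Regular trampoline, before the call:** it runs an ldxr/stlxr loop that swaps state_and_flags from 0 to 0x5c000000. If the word is not 0, it falls back to pJniMethodStart.
- **Regular trampoline, after the call:** an ldaxr/stxr loop swaps the word back, falling back to pJniMethodEnd.
- **@FastNative trampoline:** it has neither loop. It only loads state_and_flags after the call and calls pTestSuspend if any of the low three bits are set.

The C++ sketch below models that difference with std::atomic. The numeric encodings are read off these particular listings and are assumptions about this build. ModelThread and the Slow* functions are hypothetical stand-ins, not ART code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Encodings read off this particular oatdump output (assumed build-specific).
constexpr std::uint32_t kRunnableNoFlags = 0x00000000;  // fast path requires this before the call
constexpr std::uint32_t kNativeNoFlags = 0x5c000000;    // stored by the regular trampoline
constexpr std::uint32_t kFlagMask = 0x7;                // bits tested by the @FastNative trampoline

// Hypothetical stand-in for the thread's state_and_flags word.
struct ModelThread {
  std::atomic<std::uint32_t> state_and_flags{kRunnableNoFlags};
  int slow_path_calls = 0;
};

// Hypothetical stand-ins for pJniMethodStart / pJniMethodEnd / pTestSuspend.
void SlowMethodStart(ModelThread& t) {
  ++t.slow_path_calls;
  t.state_and_flags.store(kNativeNoFlags);  // service pending requests, then become Native
}
void SlowMethodEnd(ModelThread& t) {
  ++t.slow_path_calls;
  t.state_and_flags.store(kRunnableNoFlags);  // service pending requests, then become Runnable
}
void SlowTestSuspend(ModelThread& t) {
  ++t.slow_path_calls;
  t.state_and_flags.fetch_and(~kFlagMask);  // pretend the pending request was serviced
}

// Regular native method: one atomic transition before and one after the call.
template <typename Fn>
int RegularTrampoline(ModelThread& t, Fn native_fn) {
  std::uint32_t expected = kRunnableNoFlags;  // cf. ldxr + cbnz + stlxr loop
  if (!t.state_and_flags.compare_exchange_strong(expected, kNativeNoFlags)) {
    SlowMethodStart(t);
  }
  int result = native_fn();
  expected = kNativeNoFlags;  // cf. ldaxr + cmp + stxr loop
  if (!t.state_and_flags.compare_exchange_strong(expected, kRunnableNoFlags)) {
    SlowMethodEnd(t);
  }
  return result;
}

// @FastNative method: the thread stays Runnable; only poll the flags afterwards.
template <typename Fn>
int FastNativeTrampoline(ModelThread& t, Fn native_fn) {
  int result = native_fn();
  if (t.state_and_flags.load() & kFlagMask) {  // cf. ldr + tst #0x7 + b.ne
    SlowTestSuspend(t);
  }
  return result;
}
```

On the common path, the regular trampoline pays for two exclusive load/store sequences. The @FastNative one pays for a single plain load.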
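@FastNative and fast JNI both skip the thread-state switch, but their implementation details differ somewhat. From the JNI trampoline's point of view, you can think of @FastNative as a refactoring of fast JNI with slightly better performance.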
However, @FastNative does not support synchronized methods. For a synchronized method, the lock is acquired inside the trampoline and may block, which defeats the purpose of @FastNative.
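Having covered thread-state switching, let's turn to GC roots during a JNI call. They come mainly from two places:
- Reference arguments passed to the native method.
- Java objects created inside the C/C++ function.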
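These objects must be treated as GC roots because nothing else may reference them. A typical example is an object created with new and passed directly as an argument. To find these roots during GC, the runtime introduced a set of data structures: the Local Reference Table, the Global Reference Table, and HandleScope (now deprecated). The JNI trampoline also gained extra steps to maintain them. So what if the C/C++ function doesn't need any Java objects? Can the JNI trampoline then skip those steps?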
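The regular trampolines listed earlier touch two 32-bit fields of JNIEnv (x22) around the call:
- **Before the call:** it saves the field at offset 24 into w23 and copies the field at offset 32 into offset 24.
- **After the call:** it copies offset 24 back to offset 32 and restores offset 24 from w23.

One plausible reading, and the one the sketch below assumes, is that offset 24 holds a "cookie" marking where the current segment of the Local Reference Table starts. The save and copy then open a new segment, so every local reference created during the call is dropped on return. LocalRefTable, PushFrame and PopFrame are hypothetical names, not ART's.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a segmented local reference table.
struct LocalRefTable {
  std::vector<const void*> entries;  // live local references, acting as GC roots
  std::uint32_t segment_start = 0;   // the "cookie": where the current frame's refs begin

  // A local reference created by JNI code becomes a GC root.
  void Add(const void* obj) { entries.push_back(obj); }
};

// Entering a native call: remember the old cookie and start a new segment
// at the current top (cf. the loads/store on [x22, #24] and [x22, #32]).
std::uint32_t PushFrame(LocalRefTable& lrt) {
  std::uint32_t saved_cookie = lrt.segment_start;
  lrt.segment_start = static_cast<std::uint32_t>(lrt.entries.size());
  return saved_cookie;
}

// Leaving the call: drop every reference created in this segment and
// restore the caller's cookie.
void PopFrame(LocalRefTable& lrt, std::uint32_t saved_cookie) {
  lrt.entries.resize(lrt.segment_start);
  lrt.segment_start = saved_cookie;
}
```

This bookkeeping runs on every regular and @FastNative call, whether or not the C/C++ function creates any local references.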
Following this line of thinking, Google introduced the @CriticalNative annotation, shown below.
@CriticalNative
public static native long getNanoTimeAdjustment(long offsetInSeconds);
JNIEXPORT jlong JNICALL VM_getNanoTimeAdjustment(jlong offsetInSeconds) {
return JVM_GetNanoTimeAdjustment(nullptr, nullptr, offsetInSeconds);
}
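A function annotated with @CriticalNative cannot use Java objects, so the annotation can only be applied to static methods: a non-static method always receives this as an implicit argument. In addition, the corresponding C/C++ function no longer takes the JNIEnv* and jclass parameters.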
Let's see what @CriticalNative can do!
10: long jdk.internal.misc.VM.getNanoTimeAdjustment(long) (dex_method_idx=32015)
DEX CODE:
OatMethodOffsets (offset=0x00006fd0)
code_offset: 0x000988c0
OatQuickMethodHeader (offset=0x000988bc)
vmap_table: (offset=0x0006fd81)
QuickMethodFrameInfo
frame_size_in_bytes: 0
core_spill_mask: 0x00000000
fp_spill_mask: 0x00000000
CODE: (code_offset=0x000988c0 size=16)...
0x000988c0: aa0003ef mov x15, x0
0x000988c4: aa0103e0 mov x0, x1
0x000988c8: f94009f0 ldr x16, [x15, #16]
0x000988cc: d61f0200 br x16
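As you can see, the method annotated with @CriticalNative compiles down to just 4 instructions, with no stack frame at all, which is a big performance improvement. Here is what each instruction does:
- mov x15, x0 sets aside the ArtMethod* held in x0.
- mov x0, x1 moves the first Java argument from x1 into the first C argument register.
- ldr x16, [x15, #16] loads the native function pointer stored in the ArtMethod.
- br x16 jumps straight to that function.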
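The four instructions amount to a tail call: drop the method pointer, forward the argument, and jump to the registered C function. The C++ sketch below mimics this with a hypothetical FakeArtMethod struct. Its layout is an assumption. Only the idea of "load the stored function pointer and forward the argument" comes from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Function type of a @CriticalNative implementation of a (long)long method:
// no JNIEnv* and no jclass, just the Java arguments.
using CriticalFn = std::int64_t (*)(std::int64_t);

// Hypothetical stand-in for ArtMethod; only the field the stub reads is modeled.
struct FakeArtMethod {
  CriticalFn native_entry;  // cf. ldr x16, [x15, #16]
};

// Models the four-instruction stub: no frame, no thread-state switch and no
// local reference handling. The real stub jumps (br x16) rather than calls.
inline std::int64_t CriticalStub(const FakeArtMethod* method, std::int64_t arg0) {
  return method->native_entry(arg0);
}

// Hypothetical native implementation with the @CriticalNative shape.
std::int64_t ExampleCriticalImpl(std::int64_t x) { return x * 2; }
```

Compared with the RegularTrampoline-style work in the earlier listings, nothing is left except the indirect jump itself.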
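@FastNative and @CriticalNative can improve performance, but pay close attention to their restrictions. Don't apply them casually, or you may create other problems. For details and caveats on both, see the official documentation for @FastNative and @CriticalNative. I won't rehash them here.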