Into the JVM Series (1): A Systematic Study and Global Overview of the JVM


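1. What is the JVM

JVM stands for Java Virtual Machine. The JVM is a specification for a computing device: an abstract computer that is implemented by emulating computer functions in software on a real machine.

With the Java Virtual Machine in place, Java programs do not need to be recompiled to run on different platforms.

The Java language uses the JVM to hide platform-specific details, so the Java compiler only needs to generate target code (bytecode) that runs on the JVM, and that bytecode can run unmodified on many platforms.

First, let's get a quick picture of the JDK from the official documentation (docs.oracle.com/javase/8/do…). On that page we see a diagram.

From our first contact with Java, we have known that Java code can be written once and run anywhere. What makes that possible? The lowest-level support is the subject of this article: the JVM.

Many people are unclear about how the JDK, JRE, and JVM relate to one another, but the diagram above makes the relationship plain.

The JDK contains the JRE, and the JRE contains the JVM.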

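2. From source code to class file

First, how does a .java file get executed?

Let's walk through it hands-on.

Step 1: prepare a HelloWorld.java file.

Step 2: use javac to generate a HelloWorld.class file.

Step 3: open HelloWorld.class with a tool.

We find that the compiler has turned the source into a binary file (shown as hexadecimal in the viewer). A wall of hex digits may look baffling, but that is fine as long as the JVM can read it. After all, no interviewer is going to hand you a .class file and ask you to translate it back into .java.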
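To make the "binary file" a little less opaque, here is a minimal sketch that reads the first four bytes of a compiled class file. Every class file begins with the magic number 0xCAFEBABE. The class name `ClassFileHeader` and the method `readMagic` are hypothetical names chosen for this illustration, and the sketch assumes the class file can be found as a resource through the system class loader.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    /** Returns the first four bytes of the named class's .class file as an int. */
    static int readMagic(String className) {
        // Assumption: the class file is visible as a resource to the system class loader.
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            // DataInputStream reads big-endian, which is the byte order of class files.
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", readMagic("java.lang.Object"));
    }
}
```

The same method could be pointed at HelloWorld once HelloWorld.class is on the classpath; the bytes after the magic number hold the version, the constant pool, and the rest of the class structure.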

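3. From class file to the Java Virtual Machine (class loading mechanism)

How does HelloWorld.class get into the JVM? That brings us to the class loading mechanism.

From the official documentation we can see:

3.1. Loading

Find and import the class file.

(1) Obtain the binary byte stream that defines the class, using the class's fully qualified name.

(2) Convert the static storage structure represented by this byte stream into a run-time data structure in the method area (stored in the method area).

(3) Create a java.lang.Class object in the Java heap to represent the class, serving as the access point to that data in the method area (stored in the heap).

Step (1) is carried out by a class loader which, as the name suggests, loads class files. There is not just one ClassLoader; different ClassLoaders load classes from different locations, as shown in the figure below (_a topic that comes up often in interviews_):

Checking whether a class is already loaded: the check runs bottom-up, from the Custom ClassLoader up to the Bootstrap ClassLoader, layer by layer. As soon as any ClassLoader in the chain has loaded the class, it is treated as loaded, which ensures the class is loaded only once across the whole chain of ClassLoaders.

Loading order: loading runs top-down, that is, the upper layers try to load the class first, one layer at a time.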
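The layering described above can be observed at run time. The following is a small sketch that walks from the system (application) class loader up through getParent(); the bootstrap loader is represented as null in the Java API, so the walk ends there. `LoaderChain` and `chainOf` are hypothetical names for this illustration, and the concrete loader class names printed depend on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    /** Walks getParent() from the given loader up to the bootstrap loader (reported as null). */
    static List<String> chainOf(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // The bootstrap loader has no ClassLoader object; the API reports it as null.
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        chainOf(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
    }
}
```

A custom class loader created with the default constructor would sit one level below the system loader in this chain.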

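This naturally leads to a frequently asked interview question: what is the parent delegation mechanism?

Parent delegation mechanism

Definition: when a class loader receives a request to load a class, it does not try to load the class itself first. Instead it delegates the request to its parent class loader, and so on recursively. If the parent can complete the load, it returns successfully; only when the parent cannot complete it does the loader try to load the class itself.
Advantage: Java classes acquire a prioritized hierarchy together with the class loaders that load them. Take java.lang.Object, which lives in rt.jar (in JDK 8 and earlier): whichever class loader is asked to load it, the request is ultimately delegated to the bootstrap class loader at the top of the model, so Object is the same class in every class-loading environment. Without the parent delegation model, if each class loader loaded classes on its own, the system could end up with several different Object classes.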
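The Object example can be checked directly. In this sketch a fresh URLClassLoader with an empty search path is asked for java.lang.Object; because it delegates upward, it hands back the very same Class object that the bootstrap loader defined. `DelegationDemo` and `loadThroughEmptyLoader` are hypothetical names for this illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

class DelegationDemo {
    /** Asks a fresh, empty URLClassLoader for a class and returns what it resolves to. */
    static Class<?> loadThroughEmptyLoader(String className) {
        try (URLClassLoader loader = new URLClassLoader(new URL[0])) {
            return loader.loadClass(className);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> c = loadThroughEmptyLoader("java.lang.Object");
        System.out.println(c == Object.class);          // true: same Class object
        System.out.println(c.getClassLoader() == null); // true: defined by the bootstrap loader
    }
}
```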

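After you explain parent delegation, the interviewer will usually follow up with another question: how can the parent delegation mechanism be broken?

First, let's look at the source of loadClass: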

protected Class<?> loadClass(String name, boolean resolve)
        throws ClassNotFoundException
    {
        synchronized (getClassLoadingLock(name)) {
            // First, check if the class has already been loaded
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                long t0 = System.nanoTime();
                try {
                    if (parent != null) {
                        c = parent.loadClass(name, false);
                    } else {
                        c = findBootstrapClassOrNull(name);
                    }
                } catch (ClassNotFoundException e) {
                    // ClassNotFoundException thrown if class not found
                    // from the non-null parent class loader
                }

                if (c == null) {
                    // If still not found, then invoke findClass in order
                    // to find the class.
                    long t1 = System.nanoTime();
                    c = findClass(name);

                    // this is the defining class loader; record the stats
                    sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0);
                    sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1);
                    sun.misc.PerfCounter.getFindClasses().increment();
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

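From the source we can see that the code first checks whether the class has already been loaded, then delegates to parent (or to the bootstrap loader when parent is null). Only if the class is still not found (c == null) does it go on to call its own findClass.

So it is quite simple: define a custom class loader and override loadClass, and the parent delegation mechanism is broken.

This is only one approach; readers can explore other ways of breaking it on their own.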
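As one possible illustration of overriding loadClass, here is a sketch of a "child-first" loader. `ChildFirstLoader`, its `prefix` parameter, and `loadUnchecked` are all hypothetical names invented for this example; it assumes the class bytes can be read as a resource, and it is not taken from any framework's implementation. For class names that start with the prefix it tries its own findClass before asking the parent, which reverses the default order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical child-first loader: tries to define matching classes itself before asking the parent. */
class ChildFirstLoader extends ClassLoader {
    private final String prefix; // only class names starting with this prefix skip delegation

    ChildFirstLoader(String prefix, ClassLoader parent) {
        super(parent);
        this.prefix = prefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && name.startsWith(prefix)) {
                try {
                    c = findClass(name); // child first: the reverse of the default order
                } catch (ClassNotFoundException ignored) {
                    // not available locally; fall through to the parent-first path
                }
            }
            if (c == null) {
                return super.loadClass(name, resolve); // default parent delegation
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Assumption: the class bytes are reachable as a resource on the classpath.
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            byte[] bytes = out.toByteArray();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    /** Convenience wrapper for demos that converts the checked exception. */
    Class<?> loadUnchecked(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A class defined this way is a different run-time class from the one the parent would return, even though the bytes are identical, because a class's identity includes its defining loader. Names under java. still resolve through the parent in practice, since the JVM refuses to let ordinary loaders define classes in those packages.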

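3.2. Linking

Verification (Verify)

Ensures the loaded class is well-formed and safe:
  1. File format verification
  2. Metadata verification
  3. Bytecode verification
  4. Symbolic reference verification

Preparation (Prepare)

Allocates memory for the class's static variables and initializes them to their default values.

Resolution (Resolve)

Converts the symbolic references in the class into direct references.

What is a symbolic reference? A symbol (a name) in the class file that stands for the target, such as the name and descriptor of a class, field, or method.

What is a direct reference? A reference that locates the target directly, for example the actual memory address behind String str.

Initialization (Initialize)

Executes the initialization of the class's static variables and static initializer blocks.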
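The difference between preparation (default values) and initialization (assigned values) can be seen in a small sketch. `InitOrder` is a hypothetical class for this illustration. Static initializers run in textual order, so the first field's initializer observes the second field while it still holds the default value.

```java
class InitOrder {
    // Runs first during initialization; 'value' has not been assigned yet,
    // so peek() observes the default 0 that the preparation phase set.
    static int early = peek();

    // Assigned second, during the same initialization phase.
    static int value = 42;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(early + " " + value); // prints "0 42"
    }
}
```

If `value` were declared `static final int value = 42`, it would be a compile-time constant and `peek()` would return 42 instead.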

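4. Run-time data areas (a first look)

In steps (2) and (3) of loading above, the JVM takes in the class file.

Our ultimate goal is for the JVM to execute the .class file. So how is the .class file placed into the JVM? Is the whole class file dropped in as one piece? Surely not; the designers of the JVM would not allow anything so crude.

So the class must be broken apart, with different kinds of information placed in different blocks. It follows that the JVM itself has to be divided into several areas, each storing a different part of the class's information.

These are the run-time data areas discussed next.

First, let's look at a diagram of the run-time data areas.

From the official documentation we can see:

2.5. Run-Time Data Areas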

The Java Virtual Machine defines various run-time data areas that are used during execution of a program. Some of these data areas are created on Java Virtual Machine start-up and are destroyed only when the Java Virtual Machine exits. Other data areas are per thread. Per-thread data areas are created when a thread is created and destroyed when the thread exits.

2.5.1. Method Area

The Java Virtual Machine has a _method area_ that is shared among all Java Virtual Machine threads. The method area is analogous to the storage area for compiled code of a conventional language or analogous to the "text" segment in an operating system process. It stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors, including the special methods (§2.9) used in class and instance initialization and interface initialization.
The method area is created on virtual machine start-up. Although the method area is logically part of the heap, simple implementations may choose not to either garbage collect or compact it. This specification does not mandate the location of the method area or the policies used to manage compiled code. The method area may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger method area becomes unnecessary. The memory for the method area does not need to be contiguous.
A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the method area, as well as, in the case of a varying-size method area, control over the maximum and minimum method area size.

The following exceptional condition is associated with the method area:

If memory in the method area cannot be made available to satisfy an allocation request, the Java Virtual Machine throws an OutOfMemoryError.

2.5.2. Heap

The Java Virtual Machine has a _heap_ that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.
The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a _garbage collector_); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements. The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous.
A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the heap, as well as, if the heap can be dynamically expanded or contracted, control over the maximum and minimum heap size.

The following exceptional condition is associated with the heap:

If a computation requires more heap than can be made available by the automatic storage management system, the Java Virtual Machine throws an OutOfMemoryError.

2.5.3. Run-Time Constant Pool

A _run-time constant pool_ is a per-class or per-interface run-time representation of the constant_pool table in a class file (§4.4). It contains several kinds of constants, ranging from numeric literals known at compile-time to method and field references that must be resolved at run-time. The run-time constant pool serves a function similar to that of a symbol table for a conventional programming language, although it contains a wider range of data than a typical symbol table.
Each run-time constant pool is allocated from the Java Virtual Machine's method area (§2.5.4). The run-time constant pool for a class or interface is constructed when the class or interface is created (§5.3) by the Java Virtual Machine.
The following exceptional condition is associated with the construction of the run-time constant pool for a class or interface:
When creating a class or interface, if the construction of the run-time constant pool requires more memory than can be made available in the method area of the Java Virtual Machine, the Java Virtual Machine throws an OutOfMemoryError.

2.5.4. Native Method Stacks

An implementation of the Java Virtual Machine may use conventional stacks, colloquially called "C stacks," to support native methods (methods written in a language other than the Java programming language). Native method stacks may also be used by the implementation of an interpreter for the Java Virtual Machine's instruction set in a language such as C. Java Virtual Machine implementations that cannot load native methods and that do not themselves rely on conventional stacks need not supply native method stacks. If supplied, native method stacks are typically allocated per thread when each thread is created.
This specification permits native method stacks either to be of a fixed size or to dynamically expand and contract as required by the computation. If the native method stacks are of a fixed size, the size of each native method stack may be chosen independently when that stack is created.
A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the native method stacks, as well as, in the case of varying-size native method stacks, control over the maximum and minimum method stack sizes.

The following exceptional conditions are associated with native method stacks:

If the computation in a thread requires a larger native method stack than is permitted, the Java Virtual Machine throws a StackOverflowError.
If native method stacks can be dynamically expanded and native method stack expansion is attempted but insufficient memory can be made available, or if insufficient memory can be made available to create the initial native method stack for a new thread, the Java Virtual Machine throws an OutOfMemoryError.

2.5.5. Java Virtual Machine Stacks

Each Java Virtual Machine thread has a private _Java Virtual Machine stack_, created at the same time as the thread. A Java Virtual Machine stack stores frames (§2.6). A Java Virtual Machine stack is analogous to the stack of a conventional language such as C: it holds local variables and partial results, and plays a part in method invocation and return. Because the Java Virtual Machine stack is never manipulated directly except to push and pop frames, frames may be heap allocated. The memory for a Java Virtual Machine stack does not need to be contiguous.
In the First Edition of _The Java® Virtual Machine Specification_, the Java Virtual Machine stack was known as the _Java stack_.

This specification permits Java Virtual Machine stacks either to be of a fixed size or to dynamically expand and contract as required by the computation. If the Java Virtual Machine stacks are of a fixed size, the size of each Java Virtual Machine stack may be chosen independently when that stack is created.
A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of Java Virtual Machine stacks, as well as, in the case of dynamically expanding or contracting Java Virtual Machine stacks, control over the maximum and minimum sizes.

The following exceptional conditions are associated with Java Virtual Machine stacks:

If the computation in a thread requires a larger Java Virtual Machine stack than is permitted, the Java Virtual Machine throws a StackOverflowError.
If Java Virtual Machine stacks can be dynamically expanded, and expansion is attempted but insufficient memory can be made available to effect the expansion, or if insufficient memory can be made available to create the initial Java Virtual Machine stack for a new thread, the Java Virtual Machine throws an OutOfMemoryError.
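The first exceptional condition above is easy to provoke. The sketch below recurses without a base case until the thread's stack is exhausted, then reports how deep it got. `StackDepthDemo` and `measureDepth` are hypothetical names for this illustration; the depth reached varies with the JVM, the thread's stack size, and the frame size, so no particular number should be expected.

```java
class StackDepthDemo {
    private static int depth;

    // Each call pushes one more frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the JVM stack is exhausted and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the frames have been popped again.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames pushed before overflow: " + measureDepth());
    }
}
```

On HotSpot, running with a different `-Xss` value changes the per-thread stack size and therefore the reported depth.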

2.5.6. The pc Register

The Java Virtual Machine can support many threads of execution at once (JLS §17). Each Java Virtual Machine thread has its own pc (program counter) register. At any point, each Java Virtual Machine thread is executing the code of a single method, namely the current method (§2.6) for that thread. If that method is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed. If the method currently being executed by the thread is native, the value of the Java Virtual Machine's pc register is undefined. The Java Virtual Machine's pc register is wide enough to hold a returnAddress or a native pointer on the specific platform.

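Take your time with the English above. It is well worth reading closely.

After digesting it, we arrive at the following overall picture of the run-time data areas: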

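1. Method Area

The method area is a memory area shared by all threads, created when the virtual machine starts.

It stores class information loaded by the virtual machine, constants, static variables, code compiled by the just-in-time compiler, and similar data.

Although the Java Virtual Machine Specification describes the method area as a logical part of the heap, it also goes by the alias Non-Heap, to distinguish it from the Java heap.

When the method area cannot satisfy a memory allocation request, an OutOfMemoryError is thrown.

Now look back at step 2 of loading: (2) convert the static storage structure represented by this byte stream into a run-time data structure in the method area.
If we combine the journey from class file to loading steps (1) and (2), we can draw a picture: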

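Notes

(1) In HotSpot, the method area is implemented as Metaspace in JDK 8, and as the permanent generation (Perm Space) in JDK 6 and 7.

(2) Run-Time Constant Pool

Besides descriptive information such as the class version, fields, methods, and interfaces, a class file also contains a constant pool, which holds the literals and symbolic references generated at compile time. After the class is loaded, this content is placed in the run-time constant pool in the method area.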
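One visible effect of literals being recorded in the constant pool is that identical string literals resolve to the same String instance. The sketch below is only an illustration of that language-level behavior; `ConstantPoolDemo` and its method names are hypothetical, and it says nothing about where a particular JVM physically stores interned strings.

```java
class ConstantPoolDemo {
    static final String A = "jvm";

    /** Two identical literals resolve to one interned String instance. */
    static boolean literalsShareInstance() {
        String b = "jvm";
        return A == b;
    }

    /** A String built at run time is a distinct object until it is interned. */
    static boolean runtimeStringIsDistinct() {
        String c = new String("jvm");
        return c != A && c.intern() == A;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());   // true
        System.out.println(runtimeStringIsDistinct()); // true
    }
}
```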

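2. Heap

The Java heap is usually the largest block of memory managed by the Java Virtual Machine. It is created when the virtual machine starts and is shared by all threads.

Java object instances and arrays are allocated on the heap.

Now look back at step 3 of loading: (3) create a java.lang.Class object in the Java heap to represent the class, serving as the access point to that data in the method area.
The picture for loading steps (1), (2), and (3) can now be revised: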
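The idea of the Class object as an "access point" can be sketched with reflection: every instance leads back to the same Class object, and that object exposes the class's field metadata. `ClassObjectDemo` and `fieldNames` are hypothetical names for this illustration.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class ClassObjectDemo {
    static int counter;    // class-level data
    String label = "demo"; // instance-level data, stored in each object on the heap

    /** Lists declared (non-synthetic) field names by going through the Class object. */
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ClassObjectDemo first = new ClassObjectDemo();
        ClassObjectDemo second = new ClassObjectDemo();
        // Both instances share one Class object.
        System.out.println(first.getClass() == second.getClass()); // true
        System.out.println(fieldNames(ClassObjectDemo.class));
    }
}
```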

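3. Java Virtual Machine Stacks

After the analysis above, the loading step of the class loading mechanism is complete, and the subsequent linking and initialization steps take effect in turn.
Suppose initialization has now finished. What comes next? Use, of course; otherwise all this effort would be pointless. But how does a class come to be used? In other words, how does its content get executed? Take the main method calling other methods: those methods are invoked as the main thread runs. So to use any of the class's content, some thread has to execute the corresponding method. How, then, is the execution state of a thread maintained?
How many methods can a thread execute, and how are those relationships tracked?

The Java Virtual Machine stack is the area in which a thread executes; it holds the call state of the methods in that thread. In other words, the running state of a Java thread is kept in one JVM stack, so the stack is necessarily private to its thread and is created together with the thread.

Each method executed by the thread is a stack frame on this stack, that is, each method invocation corresponds to one frame. Calling a method pushes a frame onto the stack; when the call completes, the frame is popped off.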

Consider the following code:

void a() {
  b();
}
void b() {
  c();
}
void c() {
}

When this code runs, the stack and its frames look like this:
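The push order of frames for a, b, and c can also be observed from inside the program. This sketch captures a stack trace in the innermost method and keeps only the frames that belong to the demo class. `FrameDemo` is a hypothetical class for this illustration; the specification allows a JVM to omit frames from stack traces, so the listing reflects typical HotSpot behavior rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

class FrameDemo {
    static List<String> a() {
        return b(); // pushes b's frame on top of a's
    }

    static List<String> b() {
        return c(); // pushes c's frame on top of b's
    }

    /** Captures the current call stack and keeps only this class's frames, innermost first. */
    static List<String> c() {
        List<String> frames = new ArrayList<>();
        for (StackTraceElement e : new Throwable().getStackTrace()) {
            if (e.getClassName().equals(FrameDemo.class.getName())) {
                frames.add(e.getMethodName());
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        // Typically lists c, b, a, main: the top of the stack comes first.
        System.out.println(a());
    }
}
```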

So what information does each stack frame store? Again, from the official documentation we can see:

Local Variables

Each frame (§2.6) contains an array of variables known as its _local variables_. The length of the local variable array of a frame is determined at compile-time and supplied in the binary representation of a class or interface along with the code for the method associated with the frame (§4.7.3).
A single local variable can hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress. A pair of local variables can hold a value of type long or double.
Local variables are addressed by indexing. The index of the first local variable is zero. An integer is considered to be an index into the local variable array if and only if that integer is between zero and one less than the size of the local variable array.
A value of type long or type double occupies two consecutive local variables. Such a value may only be addressed using the lesser index. For example, a value of type double stored in the local variable array at index _n_ actually occupies the local variables with indices _n_ and _n_+1; however, the local variable at index _n_+1 cannot be loaded from. It can be stored into. However, doing so invalidates the contents of local variable _n_.
The Java Virtual Machine does not require _n_ to be even. In intuitive terms, values of types long and double need not be 64-bit aligned in the local variables array. Implementors are free to decide the appropriate way to represent such values using the two local variables reserved for the value.
The Java Virtual Machine uses local variables to pass parameters on method invocation. On class method invocation, any parameters are passed in consecutive local variables starting from local variable _0_. On instance method invocation, local variable _0_ is always used to pass a reference to the object on which the instance method is being invoked (this in the Java programming language). Any parameters are subsequently passed in consecutive local variables starting from local variable _1_.

Operand Stacks

Each frame (§2.6) contains a last-in-first-out (LIFO) stack known as its _operand stack_. The maximum depth of the operand stack of a frame is determined at compile-time and is supplied along with the code for the method associated with the frame (§4.7.3).
Where it is clear by context, we will sometimes refer to the operand stack of the current frame as simply the operand stack.
The operand stack is empty when the frame that contains it is created. The Java Virtual Machine supplies instructions to load constants or values from local variables or fields onto the operand stack. Other Java Virtual Machine instructions take operands from the operand stack, operate on them, and push the result back onto the operand stack. The operand stack is also used to prepare parameters to be passed to methods and to receive method results.
For example, the _iadd_ instruction (§iadd) adds two int values together. It requires that the int values to be added be the top two values of the operand stack, pushed there by previous instructions. Both of the int values are popped from the operand stack. They are added, and their sum is pushed back onto the operand stack. Subcomputations may be nested on the operand stack, resulting in values that can be used by the encompassing computation.
Each entry on the operand stack can hold a value of any Java Virtual Machine type, including a value of type long or type double.
Values from the operand stack must be operated upon in ways appropriate to their types. It is not possible, for example, to push two int values and subsequently treat them as a long or to push two float values and subsequently add them with an _iadd_ instruction. A small number of Java Virtual Machine instructions (the _dup_ instructions (§dup) and _swap_ (§swap)) operate on run-time data areas as raw values without regard to their specific types; these instructions are defined in such a way that they cannot be used to modify or break up individual values. These restrictions on operand stack manipulation are enforced through class file verification (§4.10).
At any point in time, an operand stack has an associated depth, where a value of type long or double contributes two units to the depth and a value of any other type contributes one unit.
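To connect local variables and the operand stack, here is a hand-written simulation of how a static two-argument add could execute. For a method like `add` below, javac typically emits iload_0, iload_1, iadd, ireturn. `OperandStackDemo` and `simulateAdd` are hypothetical names, and the simulation is a teaching sketch rather than a description of how any real JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OperandStackDemo {
    // The real method; its bytecode is typically: iload_0, iload_1, iadd, ireturn.
    static int add(int a, int b) {
        return a + b;
    }

    /** Simulates those four instructions with an explicit local variable array and operand stack. */
    static int simulateAdd(int a, int b) {
        int[] locals = {a, b};                        // static method: parameters start at slot 0
        Deque<Integer> operands = new ArrayDeque<>(); // operand stack is empty when the frame is created
        operands.push(locals[0]);                     // iload_0
        operands.push(locals[1]);                     // iload_1
        int right = operands.pop();                   // iadd pops two int values...
        int left = operands.pop();
        operands.push(left + right);                  // ...and pushes their sum
        return operands.pop();                        // ireturn hands the top value to the caller
    }

    public static void main(String[] args) {
        System.out.println(simulateAdd(2, 3) == add(2, 3)); // true
    }
}
```

For an instance method the same parameters would start at slot 1, because slot 0 holds the reference to this.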

Dynamic Linking

Each frame (§2.6) contains a reference to the run-time constant pool (§2.5.5) for the type of the current method to support _dynamic linking_ of the method code. The class file code for a method refers to methods to be invoked and variables to be accessed via symbolic references. Dynamic linking translates these symbolic method references into concrete method references, loading classes as necessary to resolve as-yet-undefined symbols, and translates variable accesses into appropriate offsets in storage structures associated with the run-time location of these variables.
This late binding of the methods and variables makes changes in other classes that a method uses less likely to break this code.

Normal Method Invocation Completion

A method invocation _completes normally_ if that invocation does not cause an exception (§2.10) to be thrown, either directly from the Java Virtual Machine or as a result of executing an explicit throw statement. If the invocation of the current method completes normally, then a value may be returned to the invoking method. This occurs when the invoked method executes one of the return instructions (§2.11.8), the choice of which must be appropriate for the type of the value being returned (if any).
The current frame (§2.6) is used in this case to restore the state of the invoker, including its local variables and operand stack, with the program counter of the invoker appropriately incremented to skip past the method invocation instruction. Execution then continues normally in the invoking method's frame with the returned value (if any) pushed onto the operand stack of that frame.

Abrupt Method Invocation Completion

A method invocation _completes abruptly_ if execution of a Java Virtual Machine instruction within the method causes the Java Virtual Machine to throw an exception (§2.10), and that exception is not handled within the method. Execution of an _athrow_ instruction (§athrow) also causes an exception to be explicitly thrown and, if the exception is not caught by the current method, results in abrupt method invocation completion. A method invocation that completes abruptly never returns a value to its invoker.
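The two kinds of completion can be contrasted in a few lines. In this sketch, `divide` either returns a value to its caller (normal completion) or lets an ArithmeticException escape (abrupt completion), and the caller reports which one happened. `CompletionDemo` and its methods are hypothetical names for this illustration.

```java
class CompletionDemo {
    static int divide(int a, int b) {
        // Integer division by zero makes the JVM throw ArithmeticException;
        // it is not handled here, so this invocation completes abruptly.
        return a / b;
    }

    static String describe(int a, int b) {
        try {
            return "normal: " + divide(a, b);
        } catch (ArithmeticException e) {
            // No value was returned by divide; control arrived via the exception.
            return "abrupt: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(6, 3)); // normal: 2
        System.out.println(describe(1, 0)); // abrupt: ArithmeticException
    }
}
```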

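Again, take the time to read through this carefully yourself.

4. The pc Register (program counter)

As we know, a JVM process runs many threads, and whether a given thread gets to execute is decided by CPU scheduling.
Suppose thread A is part-way through its work when it loses the CPU and thread B is switched in. When thread A gets the CPU back, how does it carry on from where it stopped? Each thread needs to maintain a variable that records how far it has executed.

The program counter occupies very little memory. Because the JVM implements multithreading by switching among threads and allotting processor time to each in turn, at any moment a processor executes the instructions of only one thread. So, to resume at the right position after a thread switch, every thread needs its own independent program counter (thread-private).

If the thread is executing a Java method, the counter holds the address of the bytecode instruction currently being executed; if it is executing a native method, the counter's value is undefined.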

5. Native Method Stacks

If the method the current thread is executing is native, it runs on the native method stack.
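Which methods are native can be checked through reflection. As a small sketch, the helper below inspects public no-argument methods of java.lang.Object; hashCode is declared native there, while toString is ordinary Java code. `NativeCheck` and `isNativeObjectMethod` are hypothetical names for this illustration.

```java
import java.lang.reflect.Modifier;

class NativeCheck {
    /** Reports whether the named public no-argument method of java.lang.Object is declared native. */
    static boolean isNativeObjectMethod(String methodName) {
        try {
            return Modifier.isNative(Object.class.getMethod(methodName).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isNativeObjectMethod("hashCode")); // true
        System.out.println(isNativeObjectMethod("toString")); // false
    }
}
```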


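This is the first article in the Into the JVM series and only aims to give you a global picture of the JVM. The JVM is a key part of performance tuning and indispensable on the road to becoming a Java architect. So don't be put off by the difficulty: if one read-through isn't enough, read it twice, then a third and fourth time. With enough persistence, an iron rod can be ground into a needle. Keep going!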
