从源码来聊一聊hashmap

HashMap为什么会是面试中的常客呢？我觉得有以下几点原因：
* 考察你阅读源码的能力
* 是否了解内部数据结构
* 是否了解其存储和查询逻辑
* 对非线程安全情况下的使用考虑
前段时间一同事面试蚂蚁金服，就被问到了这个问题；其实很多情况下都是从hashMap,hashTable,ConcurrentHahMap三者之间的关系衍生而出，当然也有直接就针对hashMap原理直接进行考察的。实际上本质都一样，就是为了考察你是否对集合中这些常用集合的原理、实现和使用场景是否清楚。一方面是我们开发中用的多，当然用的人也就多，但是用的好的人却不多（我也用的多，用的也不好）。所以就借此机会（强行蹭一波）再来捋一捋这个HashMap。本文基于jdk1.7.0_80；jdk 1.8之后略有改动，这个后面细说。

继承关系

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

hashMap实现了Map、Cloneable、Serializable三个接口，并且继承了AbstractMap这个抽象类。hashTable继承的是Dictionary这个类，同时也实现了Map、Cloneable、Serializable三个接口。

主要属性

DEFAULT_INITIAL_CAPACITY 默认初始容量 16 （hashtable 是11）常量

 /**
     * The default initial capacity - MUST be a power of two.
     * 默认初始容量-必须是2的幂。
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

MAXIMUM_CAPACITY 默认最大容量常量

/**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     *如果有一个更大的值被用于构造HashMap,则使用最大值
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

DEFAULT_LOAD_FACTOR 负载因子(默认0.75) 常量

/**
     * The load factor used when none specified in constructor.
     * 加载因子，如果构造函数中没有指定，则使用默认的
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

EMPTY_TABLE 默认的空表

/**
     * An empty table instance to share when the table is not inflated.
     * 当表不膨胀时共享的空表实例。
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};

table 表，必要时调整大小。长度必须是两个幂。这个也是hashmap中的核心的存储结构

/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

size 表示HashMap中存放KV的数量（为链表/树中的KV的总和）

/**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

threshold 扩容变量，表示当HashMap的size大于threshold时会执行resize操作。 threshold=capacity*loadFactor

/**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;

loadFactor 负载因子负载因子用来衡量HashMap满的程度。loadFactor的默认值为0.75f。计算HashMap的实时装载因子的方法为：size/capacity，而不是占用桶的数量去除以capacity。（桶的概念后续介绍）

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

modCount 这个HashMap的结构修改的次数是那些改变HashMap中的映射数量或修改其内部结构(例如rehash)的那些。这个字段用于使迭代器对HashMap失败快速的集合视图。(见ConcurrentModificationException)。

/**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

hashSeed 与此实例相关联的随机值，用于哈希键的散列代码，使散列冲突更难找到。如果0，那么替代哈希是禁用的。

/**
     * A randomizing value associated with this instance that is applied to
     * hash code of keys to make hash collisions harder to find. If 0 then
     * alternative hashing is disabled.
     */
    transient int hashSeed = 0;

结构分析

static class Entry<K,V> implements Map.Entry<K,V>

hashmap中是通过使用一个继承自Map中内部类Entry的Entry静态内部类来存储每一个K-V值的。看下具体代码：

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key; //键对象
        V value;     //值对象
        Entry<K,V> next; //指向链表中下一个Entry对象，可为null，表示当前Entry对象在链表尾部
        int hash;    //键对象的hash值

        /**
         * 构造对象
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        /**
        * 获取key
        */
        public final K getKey() {
            return key;
        }
        /**
        * 获取value
        */
        public final V getValue() {
            return value;
        }
        /**
        * 设置value，这里返回的是oldValue(这个不太明白，哪位大佬清楚的可以留言解释下，非常感谢)
        */
        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        /**
        * 重写equals方法
        */
        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }
        /**
        * 重写hashCode方法
        */
        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

HashMap是一个用于存储Key-Value键值对的集合，每一个键值对也叫做Entry。这些个键值对（Entry）分散存储在一个数组当中，这个数组就是HashMap的主干（也就是上面的table--桶）。看一张图：

hashmap初始化时各个空间的默认值为null，当插入元素时（具体插入下面分析），根据key值来计算出具体的索引位置，如果重复，则使用尾插入法进行插入后面链表中。

尾插法
之前我是通过插入17条数据来试验的（具体数据数目随意，越大重复的几率越高）

public static void main(String[] args) throws Exception {
		HashMap<String, Object> map=new HashMap<>();
		for (int i = 0; i < 170; i++) {
			map.put("key"+i, i);
		}
		System.out.println(map);
	}

通过断点查看next，可以得出我们上面的结论：
1.索引冲突时会使用链表来存储； 2.插入链表的方式是从尾部开始插入的（官方的解释是一般情况下，后来插入的数据被使用的频次较高），这样的话有利于查找。

主要方法

我们平时在开发是最常用的hashMap中的方法无非就是先创建一个HashMap对象，然后存，接着取；对应的方法就是：

构造函数
put函数
get函数

构造函数

 /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity 指定的初始化容量大小
     * @param  loadFactor      the load factor 指定的负载因子
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        //如果初始化容量小于0，则抛出异常
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        //如果初始化容量大于最大容量，则使用默认最大容量
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
       //如果负载因子小于0或者非数值类型，则抛出异常
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        //初始化负载因子
        this.loadFactor = loadFactor;
        //初始化threshold
        threshold = initialCapacity;
        //这个初始化方法是个空方法，应该是意在HashMap的子类中由使用者自行重写该方法的具体实现
        init();
    }

另外两个构造方法实际上都是对上面这个构造方法的调用：

//只制定默认容量
 public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
 }
 //使用HashMap默认的容量大小和负载因子
 public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
 }

还有一个是：

public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }

构造一个映射关系与指定 Map 相同的新 HashMap。所创建的 HashMap 具有默认加载因子 (0.75) 和足以容纳指定 Map 中映射关系的初始容量。

put方法
首先，我们都知道hashmap中的key是允许为null的，这一点也是面试中最常问到的点。那我先看下为什么可以存null作为key值。

public V put(K key, V value) {
        //如果table是空的
        if (table == EMPTY_TABLE) {
            //inflate：扩容/膨胀的意思
            inflateTable(threshold);
        }
        //如果key为null 此处敲下桌子，为什么可以存null？
        if (key == null)
            //执行putForNullKey方法，这个方法的作用是如果key为null，就将当前的k-v存放到table[0],即第一个桶。
            return putForNullKey(value);
        //对key进行一次hash运算，获取hash值
        int hash = hash(key);
        //根据key值得hash值和表的长度来计算索引位置
        int i = indexFor(hash, table.length);
        //移动数据，插入数据
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                //上面Entry中的setValue中也有提到，返回的都是旧的数据
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

hash方法：检索对象哈希代码，并将附加哈希函数应用于结果哈希，该哈希函数防止质量差的哈希函数。这是至关重要的，因为HashMap使用两个长度的哈希表，否则会碰到hashCode的冲突，这些hashCodes在低位上没有区别。注意：空键总是映射到散列0，因此索引为0。

/**
    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        //这个函数确保在每个比特位置上仅以恒定倍数不同
        //的散列码具有有限数量的冲突（在默认加载因子下大约为8）。
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

冲突具体过程描述：

一个空的hashmap表
插入元素，通过hash计算得出索引为3，因为当前3的位置没有元素，因此直接插入进去即可
再次插入元素，通过hash计算得出索引还是3，发生冲突，则将当前新插入的元素放在原来的已有的元素位置，并将其next指向原来已经存在的元素。
get方法
返回指定键映射到的值;如果此映射不包含键映射，则返回null。

 public V get(Object key) {
        //和存null key一样，取的时候也是从table[0]取
        if (key == null)
            return getForNullKey();
        //获取entry
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

getEntry方法

 final Entry<K,V> getEntry(Object key) {
        //size等于0，说明当前hashMap中没有元素，直接返回null（每个entry默认值为null）
        if (size == 0) {
            return null;
        }
        //根据key值计算hash值
        int hash = (key == null) ? 0 : hash(key);
        //通过hash值获取到索引位置，找到对应的桶链进行遍历查找
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            //如果找到则返回，如果没有链表指针移动到下一个节点继续查找。
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

扩容机制

在前面提到过threshold，扩容变量，表示当HashMap的size大于threshold时会执行resize操作。其计算方式是：threshold=capacity*loadFactor。从上面的式子中我们可以得知hashmap的扩容时机是当前当前size的值超过容量乘以负载因子时就会触发扩容。来看下源码：

void addEntry(int hash, K key, V value, int bucketIndex) {
        //如果当前size超过threshold 并且满足桶索引位置不为null的情况下，扩容
        if ((size >= threshold) && (null != table[bucketIndex])) {
           //扩容之后为原来的两倍
            resize(2 * table.length);
            //重新计算hash值
            hash = (null != key) ? hash(key) : 0;
            //重写计算索引
            bucketIndex = indexFor(hash, table.length);
        }
        //执行具体的插入操作
        createEntry(hash, key, value, bucketIndex);
    }

void createEntry(int hash, K key, V value, int bucketIndex) {
        //先取到当前桶的entry
        Entry<K,V> e = table[bucketIndex];
        //将新的数据插入到table[bucketIndex]，再将之前的entry通过链表简介到table[bucketIndex]的next指向；前面的图已经进行了描述。
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

需要注意的是，扩容并不是在hashmap满了之后才进行的，看下面断点：

通过默认构造函数new了一个map对象出来，通过for循环插入12条数据，断点到执行结束，我们看到当前table的容量是16，扩容变量threshold为12（16x0.75），现在我们将12改为13.

此时13还是小于16的，但是还是触发了hashmap 的扩容。当前table容量为32（扩容为了之前的两倍），threshold为24（32x0.75），通过这两种图我们得知：

每次扩容之后的容量为原有容量的两倍（2n）
触发扩容并不是因为当前hashmap对象已经满了，而是通过threhold扩容变量来控制触发时机的。

小结

本文就单纯的扒了一波源码，并对源码中的注释并结合自己的理解进行了翻译，通过断点调试简单的介绍了尾插法在hashmap的应用。最后通过几张图描述了下hashmap发生索引冲突时的解决方案。hashmap在面试时真的是可深可浅，但是源码的阅读还是很有必要的，下面推荐两篇博客给大家。

1.关于hashmap与hashtable的具体对比可以参考这个博客：
HashMap和HashTable到底哪不同？
2.关于为什么hashmap中的容量必须是2的幂，这篇博客大家可以看下：
什么是hashmap？
3.关于hashmap非线程安全的解释
并发安全问题之HashMap