0%

HashMap源码解析

一、JDK1.8

1、总的介绍

官方注释摘录如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
*
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
* HashMap 的实例有两个影响其性能的参数:初始容量和负载因子。容量是哈希表中桶的数量,初始容量只是创建哈希表时的容量。负载因子是哈希
* 表在其容量自动增加之前允许达到多满的度量。当哈希表的条目数超过负载因子与当前容量的乘积时,哈希表将被重新哈希(即重建内部数据结
* 构),使哈希表的桶数大约增加一倍。
*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
*
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
*
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
*
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* index = (n - 1) & hash 高效取余数
* ^:异或,相同取0,不同取1;
* &:与,1&1=1,其余为0,即两位同时为“1”,结果才为“1”,否则为0;
* |:或,0|0=0,其余为1,即参加运算的两个对象只要有一个为1,其值为1
*
* 1、对hashCode进行16位的无符号右移
* 2、对自身进行与或运算
* 3、取余
*
* hash的再次计算能够把高位的变化影响到了低位的变化,大大减少了hash冲突
* 主要是让高16位参与运算,使下标更加散列,产生更少的碰撞,增加效率。
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Doug Lea
* @author Josh Bloch
* @author Arthur van Hoff
* @author Neal Gafter
* @see Object#hashCode()
* @see Collection
* @see Map
* @see TreeMap
* @see Hashtable
* @since 1.2
*/

2、成员变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
*
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* The root of a tree bin is normally its first node. However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).
*
* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.
*
* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.
*
* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)
*
* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
*/

初始化常量:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* 1073741824
*/
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;

/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;

3、方法解析

3.1 put方法

方法参数:

1
2
3
4
V put(
K key,
V value
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
// 根据key计算hashcode,相对于JDK7中hash算法有所简化
return putVal(hash(key), key, value, false, true);
}

3.2 putVal方法

方法参数:

1
2
3
4
5
6
7
V putVal(
int hash,
K key,
V value,
boolean onlyIfAbsent,
boolean evict
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
/**
* Implements Map.put and related methods.
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;

// 给tab赋值,并判断数组是否为null,如果是则初始化数组,数组大小为n
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;

// 根据hashcode计算对应数组的下标i,并判断该位置是否存在元素
// 如果等于null,则生成一个Node对象赋值到该数组位置
// 如果不为null,将该位置对应的元素取出来赋值给p
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
// 如果该下标位置存在元素,则进行一系列判断
Node<K,V> e; K k;

// 首先判断该下标位置存在的元素的key是否和当前put进来的key相等
// 如果相等,则在后续代码中更新value,并返回oldValue
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// 如果该下标位置存在的元素的类型是TreeNode,表示该位置存的是一棵红黑树
// 那么就会把新元素添加到红黑树中,并且也会判断新key是否已经存在红黑树中
// 如果存在则返回该TreeNode,并在后续代码中更新value
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
// 否则该位置存的是一个链表,那就要把新元素插入到链表中
// 因为要看当前链表的长度,所以需要遍历链表
// 在遍历链表的过程中,一边记录链表上的元素个数,一边判断是否存在相同的key
// 遍历到尾节点后,将新元素封装为Node对象,并插入到链表的尾部
// 并且链表上的元素个数如果已经有8个了(不包括新元素对应的节点),则将链表改造成红黑树
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
// TREEIFY_THRESHOLD = 8
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}

// 如果key相同,则更新value,返回oldValue
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}

// 增加修改次数
++modCount;

// 新元素插入后,判断是否需要扩容
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}

3.3 resize方法

方法参数:

1
Node<K,V>[] resize()

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
// resize()包括数组初始化和扩容

// 记录当前数组信息
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;

// 计算新数组的扩容大小、扩容阈值
int newCap, newThr = 0;

// 如果旧数组大小大于0,则双倍扩容
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
// 表示要初始化数组,但用户指定了初始化容量
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
// 表示要初始化数组,用默认值
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// 用新数组的大小计算新数组的扩容阈值
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;

// 生成新数组,并赋值给table属性
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

// 如果是扩容,则把老数组上的元素转移到新数组上
if (oldTab != null) {
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;

// 遍历数组的每一个位置
if ((e = oldTab[j]) != null) {
oldTab[j] = null;

// 如果该位置只有一个元素,则直接转移到新数组上
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
// 如果该位置上的元素是TreeNode,则对这棵红黑树进行转移
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
// 否则,该位置上是一个链表,则要转移链表
else { // preserve order

// 将当前链表拆分为两个链表,记录链表的头节点和尾节点
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;

// 遍历链表
do {
next = e.next;
// 加入低位链表
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
// 加入高位链表
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);

// 将拆分后的链表转移到新数组上
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}

3.4 split方法

方法参数:

1
2
3
4
5
6
void split(
HashMap<K,V> map,
Node<K,V>[] tab,
int index,
int bit
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66

/**
* Splits nodes in a tree bin into lower and upper tree bins,
* or untreeifies if now too small. Called only from resize;
* see above discussion about split bits and indices.
*
* @param map the map
* @param tab the table for recording bin heads
* @param index the index of the table being split
* @param bit the bit of hash to split on
*/
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
TreeNode<K,V> b = this;
// Relink into lo and hi lists, preserving order
TreeNode<K,V> loHead = null, loTail = null;
TreeNode<K,V> hiHead = null, hiTail = null;
int lc = 0, hc = 0;

// 由于红黑树是由链表改造而来,所以链表其实还是存在的
// 对链表进行高低拆分
for (TreeNode<K,V> e = b, next; e != null; e = next) {
next = (TreeNode<K,V>)e.next;
e.next = null;
if ((e.hash & bit) == 0) {
if ((e.prev = loTail) == null)
loHead = e;
else
loTail.next = e;
loTail = e;
++lc;
}
else {
if ((e.prev = hiTail) == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
++hc;
}
}

// 拆分之后,如果存在低位链表,则看链表长度,如果小于等于6,则把节点类型改成Node类型
if (loHead != null) {
// UNTREEIFY_THRESHOLD = 6
if (lc <= UNTREEIFY_THRESHOLD)
tab[index] = loHead.untreeify(map);
else {
// 否则,把头节点转移到新节点(红黑树的根节点一定是链表的头节点)
tab[index] = loHead;
// 如果存在高位链表
if (hiHead != null) // (else is already treeified)
loHead.treeify(tab);
}
}

// 和上面类似
if (hiHead != null) {
if (hc <= UNTREEIFY_THRESHOLD)
tab[index + bit] = hiHead.untreeify(map);
else {
tab[index + bit] = hiHead;
if (loHead != null)
hiHead.treeify(tab);
}
}
}

3.5 putTreeVal方法

方法参数:

1
2
3
4
5
6
TreeNode<K,V> putTreeVal(
HashMap<K,V> map,
Node<K,V>[] tab,
int h, K k,
V v
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
int h, K k, V v) {
Class<?> kc = null;
boolean searched = false;
TreeNode<K,V> root = (parent != null) ? root() : this;
for (TreeNode<K,V> p = root;;) {
int dir, ph; K pk;
if ((ph = p.hash) > h)
dir = -1;
else if (ph < h)
dir = 1;
else if ((pk = p.key) == k || (k != null && k.equals(pk)))
return p;
else if ((kc == null &&
(kc = comparableClassFor(k)) == null) ||
(dir = compareComparables(kc, k, pk)) == 0) {
if (!searched) {
TreeNode<K,V> q, ch;
searched = true;
if (((ch = p.left) != null &&
(q = ch.find(h, k, kc)) != null) ||
((ch = p.right) != null &&
(q = ch.find(h, k, kc)) != null))
return q;
}
dir = tieBreakOrder(k, pk);
}

TreeNode<K,V> xp = p;
if ((p = (dir <= 0) ? p.left : p.right) == null) {
Node<K,V> xpn = xp.next;
TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
if (dir <= 0)
xp.left = x;
else
xp.right = x;
xp.next = x;
x.parent = x.prev = xp;
if (xpn != null)
((TreeNode<K,V>)xpn).prev = x;
moveRootToFront(tab, balanceInsertion(root, x));
return null;
}
}
}

3.6 treeifyBin方法

方法参数:

1
2
3
4
void treeifyBin(
Node<K,V>[] tab,
int hash
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/**
* Replaces all linked nodes in bin at index for given hash unless
* table is too small, in which case resizes instead.
*/
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;

// 如果长度小于MIN_TREEIFY_CAPACITY(默认为64),则会扩容
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {

// 把链表改造为双向链表,并且把节点类型改为TreeNode
TreeNode<K,V> hd = null, tl = null;
do {
TreeNode<K,V> p = replacementTreeNode(e, null);
if (tl == null)
hd = p;
else {
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);

// 改造为红黑树
if ((tab[index] = hd) != null)
hd.treeify(tab);
}
}

3.7 replacementTreeNode方法

方法参数:

1
2
3
4
TreeNode<K,V> replacementTreeNode(
Node<K,V> p,
Node<K,V> next
)

源码:

1
2
3
4
// For treeifyBin
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
return new TreeNode<>(p.hash, p.key, p.value, next);
}

3.8 treeify方法

方法参数:

1
2
3
void treeify(
Node<K,V>[] tab
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
/**
* Forms tree of the nodes linked from this node.
*/
final void treeify(Node<K,V>[] tab) {
TreeNode<K,V> root = null;
for (TreeNode<K,V> x = this, next; x != null; x = next) {
next = (TreeNode<K,V>)x.next;
x.left = x.right = null;
if (root == null) {
x.parent = null;
x.red = false;
root = x;
}
else {
K k = x.key;
int h = x.hash;
Class<?> kc = null;
for (TreeNode<K,V> p = root;;) {
int dir, ph;
K pk = p.key;
if ((ph = p.hash) > h)
dir = -1;
else if (ph < h)
dir = 1;
else if ((kc == null &&
(kc = comparableClassFor(k)) == null) ||
(dir = compareComparables(kc, k, pk)) == 0)
dir = tieBreakOrder(k, pk);

TreeNode<K,V> xp = p;
if ((p = (dir <= 0) ? p.left : p.right) == null) {
x.parent = xp;
if (dir <= 0)
xp.left = x;
else
xp.right = x;
root = balanceInsertion(root, x);
break;
}
}
}
}
moveRootToFront(tab, root);
}

3.9 untreeify方法

方法参数:

1
2
3
final Node<K,V> untreeify(
HashMap<K,V> map
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/**
* Returns a list of non-TreeNodes replacing those linked from
* this node.
*/
final Node<K,V> untreeify(HashMap<K,V> map) {
Node<K,V> hd = null, tl = null;
for (Node<K,V> q = this; q != null; q = q.next) {
Node<K,V> p = map.replacementNode(q, null);
if (tl == null)
hd = p;
else
tl.next = p;
tl = p;
}
return hd;
}

// For conversion from TreeNodes to plain nodes
Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
return new Node<>(p.hash, p.key, p.value, next);
}

3.10 红黑树方法

1
// Red-black tree methods, all adapted from CLR

3.9.1 rotateLeft方法

方法参数:

1
2
3
4
<K,V> TreeNode<K,V> rotateLeft(
TreeNode<K,V> root,
TreeNode<K,V> p
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root,
TreeNode<K,V> p) {
TreeNode<K,V> r, pp, rl;
if (p != null && (r = p.right) != null) {
if ((rl = p.right = r.left) != null)
rl.parent = p;
if ((pp = r.parent = p.parent) == null)
(root = r).red = false;
else if (pp.left == p)
pp.left = r;
else
pp.right = r;
r.left = p;
p.parent = r;
}
return root;
}

3.9.2 rotateRight方法

方法参数:

1
2
3
4
<K,V> TreeNode<K,V> rotateRight(
TreeNode<K,V> root,
TreeNode<K,V> p
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root,
TreeNode<K,V> p) {
TreeNode<K,V> l, pp, lr;
if (p != null && (l = p.left) != null) {
if ((lr = p.left = l.right) != null)
lr.parent = p;
if ((pp = l.parent = p.parent) == null)
(root = l).red = false;
else if (pp.right == p)
pp.right = l;
else
pp.left = l;
l.right = p;
p.parent = l;
}
return root;
}

3.9.3 balanceInsertion方法

方法参数:

1
2
3
4
<K,V> TreeNode<K,V> balanceInsertion(
TreeNode<K,V> root,
TreeNode<K,V> x
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root,
TreeNode<K,V> x) {
x.red = true;
for (TreeNode<K,V> xp, xpp, xppl, xppr;;) {
if ((xp = x.parent) == null) {
x.red = false;
return x;
}
else if (!xp.red || (xpp = xp.parent) == null)
return root;
if (xp == (xppl = xpp.left)) {
if ((xppr = xpp.right) != null && xppr.red) {
xppr.red = false;
xp.red = false;
xpp.red = true;
x = xpp;
}
else {
if (x == xp.right) {
root = rotateLeft(root, x = xp);
xpp = (xp = x.parent) == null ? null : xp.parent;
}
if (xp != null) {
xp.red = false;
if (xpp != null) {
xpp.red = true;
root = rotateRight(root, xpp);
}
}
}
}
else {
if (xppl != null && xppl.red) {
xppl.red = false;
xp.red = false;
xpp.red = true;
x = xpp;
}
else {
if (x == xp.left) {
root = rotateRight(root, x = xp);
xpp = (xp = x.parent) == null ? null : xp.parent;
}
if (xp != null) {
xp.red = false;
if (xpp != null) {
xpp.red = true;
root = rotateLeft(root, xpp);
}
}
}
}
}
}

3.9.4 balanceDeletion方法

方法参数:

1
2
3
4
<K,V> TreeNode<K,V> balanceDeletion(
TreeNode<K,V> root,
TreeNode<K,V> x
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root,
TreeNode<K,V> x) {
for (TreeNode<K,V> xp, xpl, xpr;;) {
if (x == null || x == root)
return root;
else if ((xp = x.parent) == null) {
x.red = false;
return x;
}
else if (x.red) {
x.red = false;
return root;
}
else if ((xpl = xp.left) == x) {
if ((xpr = xp.right) != null && xpr.red) {
xpr.red = false;
xp.red = true;
root = rotateLeft(root, xp);
xpr = (xp = x.parent) == null ? null : xp.right;
}
if (xpr == null)
x = xp;
else {
TreeNode<K,V> sl = xpr.left, sr = xpr.right;
if ((sr == null || !sr.red) &&
(sl == null || !sl.red)) {
xpr.red = true;
x = xp;
}
else {
if (sr == null || !sr.red) {
if (sl != null)
sl.red = false;
xpr.red = true;
root = rotateRight(root, xpr);
xpr = (xp = x.parent) == null ?
null : xp.right;
}
if (xpr != null) {
xpr.red = (xp == null) ? false : xp.red;
if ((sr = xpr.right) != null)
sr.red = false;
}
if (xp != null) {
xp.red = false;
root = rotateLeft(root, xp);
}
x = root;
}
}
}
else { // symmetric
if (xpl != null && xpl.red) {
xpl.red = false;
xp.red = true;
root = rotateRight(root, xp);
xpl = (xp = x.parent) == null ? null : xp.left;
}
if (xpl == null)
x = xp;
else {
TreeNode<K,V> sl = xpl.left, sr = xpl.right;
if ((sl == null || !sl.red) &&
(sr == null || !sr.red)) {
xpl.red = true;
x = xp;
}
else {
if (sl == null || !sl.red) {
if (sr != null)
sr.red = false;
xpl.red = true;
root = rotateLeft(root, xpl);
xpl = (xp = x.parent) == null ?
null : xp.left;
}
if (xpl != null) {
xpl.red = (xp == null) ? false : xp.red;
if ((sl = xpl.left) != null)
sl.red = false;
}
if (xp != null) {
xp.red = false;
root = rotateRight(root, xp);
}
x = root;
}
}
}
}
}

3.9.5 balanceDeletion方法

方法参数:

1
2
3
<K,V> boolean checkInvariants(
TreeNode<K,V> t
)

源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/**
* Recursive invariant check
*/
static <K,V> boolean checkInvariants(TreeNode<K,V> t) {
TreeNode<K,V> tp = t.parent, tl = t.left, tr = t.right,
tb = t.prev, tn = (TreeNode<K,V>)t.next;
if (tb != null && tb.next != t)
return false;
if (tn != null && tn.prev != t)
return false;
if (tp != null && t != tp.left && t != tp.right)
return false;
if (tl != null && (tl.parent != t || tl.hash > t.hash))
return false;
if (tr != null && (tr.parent != t || tr.hash < t.hash))
return false;
if (t.red && tl != null && tl.red && tr != null && tr.red)
return false;
if (tl != null && !checkInvariants(tl))
return false;
if (tr != null && !checkInvariants(tr))
return false;
return true;
}

4、注意事项

4.1 允许为null

HashTable和ConcurrentHashMap不允许键值为Null,而HashMap是允许键值为null。

这里简单说一下,HashTable代码是如何限制的,我们可以看下面的HashTable的put方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
public synchronized V put(K key, V value) {
// Make sure the value is not null
if (value == null) {
throw new NullPointerException();
}

// Makes sure the key is not already in the hashtable.
Entry<?,?> tab[] = table;
int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;
@SuppressWarnings("unchecked")
Entry<K,V> entry = (Entry<K,V>)tab[index];
for(; entry != null ; entry = entry.next) {
if ((entry.hash == hash) && entry.key.equals(key)) {
V old = entry.value;
entry.value = value;
return old;
}
}

addEntry(hash, key, value, index);
return null;
}

当HashTable的key为null时,会直接抛出空指针异常;当HashTable的value为null时,key.hashCode()此时执行会报错。

再来看ConcurrentHashMap的put方法,是直接判断key或者value是否为null,如果有一个为null,则直接排除空指针异常。

1
2
3
4
5
6
7
8
9
public V put(K key, V value) {
return putVal(key, value, false);
}

/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
if (key == null || value == null) throw new NullPointerException();
// 剩下的就不展示啦...
}

4.1.1 为什么HashTable和ConcurrentHashMap要设计键值不为null呢?

因为它们是并发安全的类,当线程A去获取一个key=“k”的值的时候,因为当前key不存在,所以会返回null。但同时另一个线程B插入了key=“k”但是值为null,这个时候就会产生歧义(二义性问题),究竟是key不存在为null还是key存在但是值为null呢?而HashTable和ConcurrentHashMap又是线程安全的实现,所以它们不允许为null。

而HashMap就没有这个担忧了,它的值可以为null,value也可以为null。不过需要注意的是,HashMap的key为null的只能有一个,value为null的却可以有多个。

4.2 并发不安全

HashMap是不安全的实现,HashTable和ConcurrentHashMap是安全的,HashTable锁的粒度比ConcurrentHashMap要大。

4.3 fail-fast机制

fail-fast机制和fail-safe机制,两种机制都是多线程并发操作集合时的一种失败处理机制。

4.3.1 fail-fast

表示快速失败,在集合遍历过程中,一旦发现容器中的数据被修改了,会立刻抛出ConcurrentModificationException异常,从而导致遍历失败。

在HashMap源码中有解释:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>

HashMap中所有“集合视图方法”返回的迭代器都是fail-fast:如果在迭代器创建后的任何时候对映射进行结构修改,除了通过迭代器自己的remove方法外,其他任何方式,迭代器将抛出ConcurrentModificationException。因此,面对并发修改,迭代器会迅速而干净地失败,而不是冒着在未来不确定的时间出现任意、不确定行为的风险。
迭代器的故障快速行为是无法保证的,因为一般来说,在存在不同步的并发修改的情况下,不可能做出任何硬保证。故障快速迭代器在尽力而为的基础上抛出ConcurrentModificationException。因此,编写一个依赖于此异常的正确性的程序是错误的:迭代器的快速故障行为应该只用于检测错误。

java.util包下的集合类都是fail-fast机制的,常见的使用fail-fast方式遍历的容器除了HashMap,还有ArrayList等。

4.3.2 fail-safe

表示失败安全,也就是在这种机制下,出现集合元素的修改,不会抛出ConcurrentModificationException异常。原因是采用安全失败机制的集合容器,在遍历是不是直接在集合内容上访问的,而是先复制原有集合内容,在拷贝的集合上进行遍历。由于迭代器是对原集合的拷贝进行遍历,所以在遍历过程中对原结合所做的修改并不能被迭代器检测到。

java.util.concurrent包下的容器都是fail-safe的,可以在多线程下并发修改。常见的使用fail-safe方式遍历的容器有ConcurrentHashMap和CopyOnWriteArrayList等。

4.4 初始容量16

HashMap底层是通过数组实现的,数组的初始容量为1 << 4,为2的幂次方。那么就有两个问题,一是为什么数组的容量要为2的幂次方,二是为什么初始化容量是16而不是8或者32?基于这两个问题,作如下分析:

4.4.1 为什么数组的容量要为2的幂次方?

首先,计算机是二进制的,我们所有的代码计算机执行最后都是二进制0和1。其次,二进制进行位操作效率更高,我们看源码的时候,经常发现位运算,这也是因为位运算比起一般的运算(比如取模运算,十进制运算)效率更高的原因,这也是DEFAULT_INITIAL_CAPACITY = 1 << 4而不是DEFAULT_INITIAL_CAPACITY = 16的原因。

4.4.2 HashMap是如何保证数组容量一直为2的幂次方?

一是在容量初始化的时候指定容量为16,二是在扩容的时候,容量变成两倍。

4.4.3 为什么初始化容量是16而不是8或者32呢?

在源码文档里有做解释:

1
2
3
4
5
6
7
8
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.

此实现为基本操作(get和put)提供恒定时间性能,假设哈希函数在存储桶中正确分散元素。对集合视图进行迭代所需的时间与 HashMap 实例的“容量”(存储桶数)加上其大小(键值映射数)成正比。因此,如果迭代性能很重要,则不要将初始容量设置得太高(或负载系数太低),这一点非常重要。

16是经验值的选择。

4.5 负载因子0.75

负载因子表示哈希表的填满程度,跟扩容息息相关。为什么不是0.5或者1呢?

如果是0.5,就是说哈希表填到一半就开始扩容了,这样会导致扩容频繁,并且空间利用率比较低。 如果是1,就是说哈希表完全填满才开始扩容,这样虽然空间利用提高了,但是哈希冲突机会却大了。这里我们可以看一下源码文档的解释(总的介绍里面也有):

1
2
3
4
5
6
7
8
9
10
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.

作为一般规则,默认负载因子(0.75)在时间和空间成本上提供了良好的权衡。负载因子数值越大,空间开销越低,但是会提高查找成本(体现在大多数的HashMap类的操作,包括get和put)。设置初始大小时,应该考虑预计的entry数在map及其负载系数,并且尽量减少rehash操作的次数。如果初始容量大于最大条目数除以负载因子,rehash操作将不会发生。

也就是说,负载因子0.75就是冲突的机会与空间利用率权衡的最后体现,也是一个程序员实验的经验值。

4.6 扩容机制

HashMap的扩容条件就是当HashMap中的元素个数(size)超过临界值(threshold)时就会自动扩容。

在HashMap中,threshold = loadFactor * capacity。loadFactor是装载因子,表示HashMap满的程度,默认值为0.75f,设置成0.75有一个好处,那就是0.75正好是3/4,而capacity又是2的幂。所以,两个数的乘积都是整数。

下面是HashMap中的扩容方法(resize)中的一段:

1
2
3
4
if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}

从上面代码可以看出,扩容后的table大小变为原来的两倍,这一步执行之后,就会进行扩容后table的调整,这部分非本文重点,省略。

所以,通过保证初始化容量均为2的幂,并且扩容时也是扩容到之前容量的2倍,所以,保证了HashMap的容量永远都是2的幂。

4.7 红黑树转换

JDK 1.8 的 HashMap 和 ConcurrentHashMap 都有这样一个特点:最开始的 Map 是空的,因为里面没有任何元素,往里放元素时会计算 hash 值,计算之后,第 1 个 value 会首先占用一个桶(也称为槽点)位置,后续如果经过计算发现需要落到同一个桶中,那么便会使用链表的形式往后延长,这种解决hash冲突的方法称为“拉链法”。

4.7.1 JDK1.8为什么要使用红黑树?

使用红黑树是为了优化hash表链表过长导致时间复杂度增加的问题。这样可以利用链表对内存的使用率以及红黑树的高效检索,是一种很有效率的数据结构。

但同样是树,为什么不用AVL而是使用红黑树呢?

AVL树是一种高度平衡的二叉树,所以查找的非常高,但是,有利就有弊,AVL树为了维持这种高度的平衡,就要付出更多代价。每次插入、删除都要做调整,复杂、耗时。所以,hashmap用红黑树。

4.7.2 什么时候链表会转换为红黑树?

当链表长度大于8,且,满足容量大于或等于 MIN_TREEIFY_CAPACITY(默认为 64)时,就会把链表转换为红黑树。

这里要注意边界,第一个条件是链表长度大于8

4.7.3 为什么是链表长度大于8?为什么不是大于等于8?为什么不是大于等于7?

1
2
3
// TREEIFY_THRESHOLD = 8
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);

我们看代码的时候,可能会困惑,这不是binCount > 8 - 1 也就是链表长度大于等于7?其实,产生这样的困惑是因为代码没有看全,binCount是从0开始计数,所以在做树化判断时binCount的值等于 链表长度 - 1(注意此时的链表长度没有算新插入的节点),判断条件为 binCount >= TREEIFY_THRESHOLD - 1也就是binCount+1(链表长度)>= TREEIFY_THRESHOLD。看到这里,有人就立马得出了链表长度大于等于8的结论,但这还是不对。因为我们没有计算新插入的节点,此时链表新插入了一个节点,所以链表树化的那一刻,它的真实长度应该是binCount +1+1( 链表长度)> TREEIFY_THRESHOLD(8)。

综上,所见不一定即所得,人云亦云不可取,知其然也需知其所以然!

第二个条件是要满足容量大于或等于64

1
2
3
4
5
6
7
8
9
// MIN_TREEIFY_CAPACITY = 64
// 先判断table的长度是否小于 MIN_TREEIFY_CAPACITY (64)
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
// 小于64则扩容
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
// ...
}

4.7.4 为什么最小树形化容量是64?

这是因为容量低于64时,哈希碰撞的机率比较大,而这个时候出现长链表的可能性会稍微大一些,这种原因下产生的长链表,我们应该优先选择扩容而避免不必要的树化。

4.7.5 为什么链表长度大于等于8后,还需要判断table长度是否小于MIN_TREEIFY_CAPACITY呢?小于的话会直接扩容,大于或等于才会转换红黑树呢?

链表长度大于8有两种情况:

  • table长度足够,hash冲突过多
  • hash没有冲突,但是在计算table下标的时候,由于table长度太小,导致很多hash不一致的key计算的下标一致

第二种情况是可以用扩容的方式来避免的,扩容后链表长度变短,读写效率自然提高。另外,扩容相对于转换为红黑树的好处在于可以保证数据结构更简单。 由此可见并不是链表长度超过8就一定会转换成红黑树,而是先尝试扩容。

4.7.6 什么时候红黑树会转换为链表?

红黑树转换为链表有两种情况:

  • 调用map的remove方法删除元素
  • resize的时候,对红黑树进行了拆分

第一种情况的话,是通过红黑树根节点及其子节点是否为空来判断。

1
2
3
4
5
6
7
8
9
// removeTreeNode
if (root == null
|| (movable
&& (root.right == null
|| (rl = root.left) == null
|| rl.left == null))) {
tab[index] = first.untreeify(map); // too small
return;
}

第二种情况,当红黑树的节点小于或等于 6 个以后,会恢复为链表形态。

1
2
3
4
5
6
7
8
9
10
if (hiHead != null) {
// UNTREEIFY_THRESHOLD = 6
if (hc <= UNTREEIFY_THRESHOLD)
tab[index + bit] = hiHead.untreeify(map);
else {
tab[index + bit] = hiHead;
if (loHead != null)
hiHead.treeify(tab);
}
}

4.7.7 为什么不一开始就用红黑树,而要经历一个转换的过程呢?

通过查看源码可以发现,默认是链表长度达到 8 就转成红黑树,而当长度降到 6 就转换回去,这体现了时间和空间平衡的思想,最开始使用链表的时候,空间占用是比较少的,而且由于链表短,所以查询时间也没有太大的问题。可是当链表越来越长,需要用红黑树的形式来保证查询的效率。对于何时应该从链表转化为红黑树,需要确定一个阈值,这个阈值默认为 8,并且在源码中也对选择 8 这个数字做了说明:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million

如果 hashCode 分布良好,也就是 hash 计算的结果离散好的话,那么红黑树这种形式是很少会被用到的,因为各个值都均匀分布,很少出现链表很长的情况。在理想情况下,链表长度符合泊松分布,各个长度的命中概率依次递减,当长度为 8 的时候,概率仅为 0.00000006。这是一个小于千万分之一的概率,通常我们的 Map 里面是不会存储这么多的数据的,所以通常情况下,并不会发生从链表向红黑树的转换。

但是,HashMap 决定某一个元素落到哪一个桶里,是和这个对象的 hashCode 有关的,JDK 并不能阻止用户实现自己的哈希算法。

4.8 hash冲突解决方案

4.8.1 开放寻址法(线性探测法)(ThreadLocal)

从发生冲突的那个位置开始,按照一定的次序(按顺序向前)从hash表中找到一个空闲的位置,然后把发生冲突的元素存入到这个空闲的位置中。ThreadLocal就用到了线性探测法来解决hash冲突。

4.8.2 链式寻址法(拉链法)(HashMap)

把存在hash冲突的key,以单向链表的方式来存储,比如HashMap就是采用链式寻址法来实现的。

4.8.3 再hash法

当通过某个hash函数计算的key存在冲突时,再用另一个hash函数对这个key做hash,一直运算直至不再产生冲突。这种方式会增加计算时间,性能影响较大。

4.8.4 建立公共溢出区

把hash表分为基本表和溢出表两个部分,凡是存在冲突的元素一律放到溢出表中。

上面的四种是常见的解决hash冲突的方法。HashMap在JDK1.8版本中,通过链式寻址法+红黑树的方式来解决hash冲突问题。

5、总结

阅读HashMap源码,对我们实际编程也十分受益。在我们实际编程中,也需要注意:能够使用位运算的地方尽量使用位运算;初始化的数字都需要慎重对待;数据结构是基础;优化不是一蹴而就的,但贯穿始终。

更多HashTable和ConcurrentHashMap相关,请博客内搜索查看相关文档。

文章参考:https://juejin.cn/post/6917866631322402830