在Javadoc中,Object.hashCode()指出
在合理可行的范围内,由class定义的hashCode方法Object确实为不同的对象返回不同的整数。(这通常是通过将对象的 内部地址 转换为整数来实现的,但是Java™编程语言不需要此实现技术。)
Object
这是一个常见的误解,它与内存地址有关,但没有关系,因为它可以在不通知的情况下发生更改,并且hashCode()不会更改,也不得针对对象更改。
这是一个例子来说明我的关注
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null); for (int t = 0; t < 10; t++) { System.gc(); Object[] objects = new Object[10]; for (int i = 0; i < objects.length; i++) objects[i] = new Object(); for (int i = 0; i < objects.length; i++) { if (i > 0) System.out.print(", "); int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i); System.out.printf("%08x: hc= %08x", location, objects[i].hashCode()); } System.out.println(); }
版画
eac00038: hc= 4f47e0ba, eac00048: hc= 2342d884, eac00058: hc= 7994d431, eac00068: hc= 19f71b53, eac00078: hc= 2e22f376, eac00088: hc= 789ddfa3, eac00098: hc= 44c58432, eac000a8: hc= 036a11e4, eac000b8: hc= 28bc917c, eac000c8: hc= 73f378c8 eac00038: hc= 30813486, eac00048: hc= 729f624a, eac00058: hc= 3dee2310, eac00068: hc= 5d400f33, eac00078: hc= 18a60d19, eac00088: hc= 3da5f0f3, eac00098: hc= 596e0123, eac000a8: hc= 450cceb3, eac000b8: hc= 4bd66d2f, eac000c8: hc= 6a9a4f8e eac00038: hc= 711dc088, eac00048: hc= 584b5abc, eac00058: hc= 3b3219ed, eac00068: hc= 564434f7, eac00078: hc= 17f17060, eac00088: hc= 6c08bae7, eac00098: hc= 3126cb1a, eac000a8: hc= 69e0312b, eac000b8: hc= 7dbc345a, eac000c8: hc= 4f114133 eac00038: hc= 50c8c3b8, eac00048: hc= 2ca98e77, eac00058: hc= 2fc83d89, eac00068: hc= 034005e1, eac00078: hc= 6041f871, eac00088: hc= 0b1df416, eac00098: hc= 5b83d60d, eac000a8: hc= 2c5a1e6b, eac000b8: hc= 5083198c, eac000c8: hc= 4f025f9f eac00038: hc= 00c5eb8a, eac00048: hc= 41eab16b, eac00058: hc= 1726099c, eac00068: hc= 4240eca3, eac00078: hc= 346fe350, eac00088: hc= 1db4b415, eac00098: hc= 429addef, eac000a8: hc= 45609812, eac000b8: hc= 489fe953, eac000c8: hc= 7a8f6d64 eac00038: hc= 7e628e42, eac00048: hc= 7869cfe0, eac00058: hc= 6aceb8e2, eac00068: hc= 29cc3436, eac00078: hc= 1d77daaa, eac00088: hc= 27b4de03, eac00098: hc= 535bab52, eac000a8: hc= 274cbf3f, eac000b8: hc= 1f9fd541, eac000c8: hc= 3669ae9f eac00038: hc= 772a3766, eac00048: hc= 749b46a8, eac00058: hc= 7e3bfb66, eac00068: hc= 13f62649, eac00078: hc= 054b8cdc, eac00088: hc= 230cc23b, eac00098: hc= 1aa3c177, eac000a8: hc= 74f2794a, eac000b8: hc= 5af92541, eac000c8: hc= 1afcfd10 eac00038: hc= 396e1dd8, eac00048: hc= 6c696d5c, eac00058: hc= 7d8aea9e, eac00068: hc= 2b316b76, eac00078: hc= 39862621, eac00088: hc= 16315e08, eac00098: hc= 03146a9a, eac000a8: hc= 3162a60a, eac000b8: hc= 4382f3da, eac000c8: hc= 4a578fd6 eac00038: hc= 225765b0, eac00048: hc= 17d5176d, eac00058: hc= 26f50154, eac00068: hc= 1f2a45c7, eac00078: hc= 104b1bcd, eac00088: hc= 330e3816, eac00098: hc= 6a844689, eac000a8: hc= 12330301, eac000b8: hc= 530a3ffc, eac000c8: hc= 45eee3fb eac00038: hc= 3f9432e0, eac00048: hc= 1a9830bc, eac00058: hc= 7da79447, eac00068: hc= 04f801c4, eac00078: hc= 363bed68, eac00088: hc= 185f62a9, eac00098: hc= 1e4651bf, eac000a8: hc= 1aa0e220, eac000b8: hc= 385db088, eac000c8: hc= 0ef0cda1
附带说明;如果您看这段代码
if (value == 0) value = 0xBAD ;
似乎0xBAD映射到此值的概率是任何hashCode的两倍,因为0。如果运行足够长的时间,您会看到
long count = 0, countBAD = 0; while (true) { for (int i = 0; i < 200000000; i++) { int hc = new Object().hashCode(); if (hc == 0xBAD) countBAD++; count++; } System.out.println("0xBAD ratio is " + (double) (countBAD << 32) / count + " times expected."); }
0xBAD ratio is 2.0183116992481205 times expected.
这显然是特定于实现的。
下面,我包括Object.hashCode()OpenJDK 7中使用的实现。
Object.hashCode()
该函数支持六种不同的计算方法,其中只有两种可以注意到对象的地址(“地址”是C ++ oop强制转换为intptr_t)。两种方法之一按原样使用地址,而另一种方法则进行一些打乱,然后使用不经常更新的随机数将结果混搭。
oop
intptr_t
在其余方法中,一个返回一个常量(大概是为了测试),一个返回序号,其余的则基于伪随机序列。
似乎可以在运行时选择该方法,默认似乎是方法0,即os::random()。后者是线性同余生成器,带有所谓的竞争条件。:-)竞争条件是可以接受的,因为在最坏的情况下,它将导致两个对象共享相同的哈希码。这不会破坏任何不变性。
os::random()
第一次需要哈希码时执行计算。为了保持一致性,结果将存储在对象的标头中,并在随后的调用中返回hashCode()。缓存是在此功能之外完成的。
hashCode()
总而言之,Object.hashCode()基于对象地址的概念在很大程度上是一个历史性的手工艺品,已被现代垃圾收集器的特性所淘汰。
// hotspot/src/share/vm/runtime/synchronizer.hpp // hashCode() generation : // // Possibilities: // * MD5Digest of {obj,stwRandom} // * CRC32 of {obj,stwRandom} or any linear-feedback shift register function. // * A DES- or AES-style SBox[] mechanism // * One of the Phi-based schemes, such as: // 2654435761 = 2^32 * Phi (golden ratio) // HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ; // * A variation of Marsaglia's shift-xor RNG scheme. // * (obj ^ stwRandom) is appealing, but can result // in undesirable regularity in the hashCode values of adjacent objects // (objects allocated back-to-back, in particular). This could potentially // result in hashtable collisions and reduced hashtable efficiency. // There are simple ways to "diffuse" the middle address bits over the // generated hashCode values: // static inline intptr_t get_next_hash(Thread * Self, oop obj) { intptr_t value = 0 ; if (hashCode == 0) { // This form uses an unguarded global Park-Miller RNG, // so it's possible for two threads to race and generate the same RNG. // On MP system we'll have lots of RW access to a global, so the // mechanism induces lots of coherency traffic. value = os::random() ; } else if (hashCode == 1) { // This variation has the property of being stable (idempotent) // between STW operations. This can be useful in some of the 1-0 // synchronization schemes. intptr_t addrBits = intptr_t(obj) >> 3 ; value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; } else if (hashCode == 2) { value = 1 ; // for sensitivity testing } else if (hashCode == 3) { value = ++GVars.hcSequence ; } else if (hashCode == 4) { value = intptr_t(obj) ; } else { // Marsaglia's xor-shift scheme with thread-specific state // This is probably the best overall implementation -- we'll // likely make this the default in future releases. unsigned t = Self->_hashStateX ; t ^= (t << 11) ; Self->_hashStateX = Self->_hashStateY ; Self->_hashStateY = Self->_hashStateZ ; Self->_hashStateZ = Self->_hashStateW ; unsigned v = Self->_hashStateW ; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; Self->_hashStateW = v ; value = v ; } value &= markOopDesc::hash_mask; if (value == 0) value = 0xBAD ; assert (value != markOopDesc::no_hash, "invariant") ; TEVENT (hashCode: GENERATE) ; return value; }