
Sample code sharing for implementing hashCode method in Java

Original post by 黄舟, 2017-03-28 10:47:52

So you know that an object's identity alone is not enough, and you have written a nice equals implementation? Great! But now you also have to implement hashCode.

Let's see why, and how to do it right.

## Equality and hash codes

While equality makes sense from a general perspective, hash codes are much more technical. If we were being a little hard on them, we could say that they are just an implementation detail to improve performance.

Most data structures use the equals method to determine whether they contain an element, for example:

List<String> list = Arrays.asList("a", "b", "c");
boolean contains = list.contains("b");

The variable contains is true because, although the two "b"s need not be the same instance (ignoring string interning), they are equal. To find this out, contains compares each element of the list with the given instance using equals.
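To make the cost of this approach concrete, here is a minimal sketch of an equality-based linear search. The helper containsByEquals is hypothetical; it follows the behavior that List.contains specifies (true if some element is equal to the argument), not necessarily how any particular list class implements it. The new String("b") call forces a distinct instance so that identity and equality can be told apart.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class LinearContains {

    // Hypothetical helper: equality-based linear search. Every element is
    // compared with equals until a match is found, so the cost grows with the list.
    static boolean containsByEquals(List<?> list, Object target) {
        for (Object element : list) {
            if (Objects.equals(element, target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        String b = new String("b");                    // a new instance, not the interned literal
        System.out.println(list.get(1) == b);          // false: different instances
        System.out.println(containsByEquals(list, b)); // true: equal according to equals
        System.out.println(list.contains(b));          // true
    }
}
```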

Comparing every element like this is wasteful, though, and a whole class of data structures uses a more performant approach. Instead of comparing the requested instance with each element they contain, they take a shortcut that narrows down the set of potentially equal instances and then compare only those.

This shortcut is the hash code, which can be seen as an object's equality boiled down to an integer value. Instances with the same hash code are not necessarily equal, but equal instances have the same hash code (or at least they should; we will discuss this shortly). Such data structures are often named after this technique and can be recognized by the Hash in their names, HashMap being the most famous representative. They generally work like this:

  • When an element is added, its hash code is used to compute an index into an internal array (a so-called bucket).

  • If other, non-equal elements have the same hash code (or their hash codes map to the same index), they end up in the same bucket and must be bundled together, for example by adding them to a list.

  • When an instance is passed to contains, its hash code is used to compute the bucket, and only the elements in that bucket are compared with the instance.

This way, very few (ideally no) equals comparisons are needed to implement contains.
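The steps above can be sketched as a toy hash set. ToyHashSet is a hypothetical, deliberately simplified class: it never resizes, keeps each bucket as an ArrayList, and is not how java.util.HashSet is actually implemented. It only shows that equals is called on the elements of one bucket rather than on every element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, deliberately simplified hash set: fixed number of buckets,
// no resizing, each bucket a plain list. Not how java.util.HashSet works internally.
class ToyHashSet<E> {

    private final List<List<E>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Hash code -> bucket index; floorMod keeps negative hash codes in range.
    private List<E> bucketFor(Object element) {
        int index = Math.floorMod(Objects.hashCode(element), buckets.size());
        return buckets.get(index);
    }

    boolean add(E element) {
        List<E> bucket = bucketFor(element);
        if (bucket.contains(element)) { // equals runs only within this one bucket
            return false;
        }
        bucket.add(element);
        return true;
    }

    boolean contains(Object element) {
        // Only the elements in the matching bucket are compared with equals.
        return bucketFor(element).contains(element);
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(16);
        set.add("a");
        set.add("b");
        set.add("c");
        System.out.println(set.contains(new String("b"))); // true
        System.out.println(set.contains("d"));             // false
    }
}
```

Math.floorMod is used because hash codes may be negative, and a negative array index would be invalid.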
Like equals, hashCode is defined on Object.

## Thoughts on hashing

If hashCode is used as a shortcut to determine equality, then there is really only one thing we should care about: equal objects should have the same hash code. This is also why, if we override equals, we must create a matching hashCode implementation!

Otherwise, objects that are equal according to our implementation would most likely not have the same hash code, because they would still use Object's implementation.

## Hash code guidelines

Quoting the official documentation, the general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another.

  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

  • It is not required that calling hashCode on two objects that are unequal according to equals(Object) produces distinct integer results. However, programmers should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The first point mirrors the consistency property of equals, and the second is the requirement we came up with above. The third states an important detail that we will return to later.
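As a small illustration of the second and third rules, here is a hypothetical test helper, honorsEqualsRule, that checks a single pair of objects. It can only catch violations for the samples it is given, not prove that a class honors the contract. The String pair "Aa" and "BB" is a well-known case of unequal objects sharing a hash code, which the third rule explicitly allows.

```java
import java.util.Objects;

class HashCodeContract {

    // Hypothetical test helper for the second rule: if a and b are equal,
    // their hash codes must be equal. It only checks the pair it is given.
    static boolean honorsEqualsRule(Object a, Object b) {
        if (!Objects.equals(a, b)) {
            return true; // the contract makes no demand on unequal objects
        }
        return Objects.hashCode(a) == Objects.hashCode(b);
    }

    public static void main(String[] args) {
        System.out.println(honorsEqualsRule("Ab", new String("Ab"))); // true

        // Third rule in action: unequal strings that share a hash code.
        System.out.println("Aa".equals("BB"));                  // false
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true (both are 2112)
    }
}
```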
## Implementing hashCode

A very simple implementation of Person.hashCode:

@Override
public int hashCode() {
    return Objects.hash(firstName, lastName);
}
The person's hash code is computed by combining the hash codes of several fields, which is done here by the utility method Objects.hash.
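For context, here is a minimal, self-contained Person class around that hashCode. The source only shows hashCode, so the two String fields, the constructor, and the equals implementation are assumptions; the point is that equals and hashCode use the same fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Assumed shape of Person: only hashCode comes from the text above;
// the fields, constructor, and equals are illustrative.
final class Person {

    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        // Same fields as equals, combined by Objects.hash.
        return Objects.hash(firstName, lastName);
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Ada", "Lovelace"));
        System.out.println(people.contains(new Person("Ada", "Lovelace"))); // true
    }
}
```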

## Choosing the fields

But which fields are relevant? The requirements help us answer this question: if equal objects must have the same hash code, then the hash code computation should not include any field that is not used for equality checks. (Otherwise two objects that differ only in those fields would be equal but could have different hash codes.) So the set of fields used for hashing should be a subset of the fields used for equality. By default both use the same fields, but there are a couple of details to consider.
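A sketch of what goes wrong otherwise, using a hypothetical Account class: userName takes part in equality, while loginCount is bookkeeping state that does not. Including loginCount in the hash would give two equal accounts different hash codes as soon as one of them records a login, so hashCode uses only userName.

```java
import java.util.Objects;

// Hypothetical class: userName defines equality, loginCount is bookkeeping.
final class Account {

    private final String userName;
    private int loginCount; // not part of equals, so it must not be part of hashCode

    Account(String userName) {
        this.userName = userName;
    }

    void recordLogin() {
        loginCount++;
    }

    int loginCount() {
        return loginCount;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        return Objects.equals(userName, ((Account) o).userName);
    }

    @Override
    public int hashCode() {
        // Objects.hash(userName, loginCount) would break the contract: two equal
        // accounts with different login counts would get different hash codes.
        return Objects.hashCode(userName);
    }

    public static void main(String[] args) {
        Account a = new Account("ada");
        Account b = new Account("ada");
        a.recordLogin();
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```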

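## Consistency

First, there is the requirement for consistency. It should be read quite strictly. While it allows the hash code to change if some fields change (which is often unavoidable with mutable classes), hashing data structures are not prepared for this scenario.

As we have seen above, the hash code is used to determine an element's bucket. But if the hash-relevant fields change, the hash code is not recomputed and the internal array is not updated.

This means that a later query with an equal object, or even with the very same instance, fails! The data structure computes the current hash code, which differs from the one that was used to store the instance, and looks in the wrong bucket.

Conclusion: better not to use mutable fields for hash code computation!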
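The following sketch shows the lookup failure with java.util.HashSet. MutableKey is a hypothetical class whose hash code depends on a mutable field. After the field changes, the set still holds the element, but contains computes the new hash code and no longer finds it, not even for the very same instance.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class whose hash code depends on a mutable field.
final class MutableKey {

    String name; // mutable AND hash-relevant: the source of the problem

    MutableKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    public static void main(String[] args) {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey("Alice");
        set.add(key);

        key.name = "Bob"; // hash code changes; the set keeps the element in its old bucket

        System.out.println(set.size());                            // 1: still stored
        System.out.println(set.contains(key));                     // false: same instance, new hash
        System.out.println(set.contains(new MutableKey("Alice"))); // false: old bucket, but no longer equal
    }
}
```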

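## Performance

Hash codes may end up being computed about as often as equals is called. This can well happen in performance-critical parts of the code, so it makes sense to think about performance here. And unlike equals, there is a little more room to optimize it.

Unless sophisticated algorithms are used or very many fields are involved, the arithmetic cost of combining their hash codes is as negligible as it is unavoidable. But it is worth considering whether all fields need to be included in the computation. Collections in particular should be treated with suspicion: lists and sets, for example, compute their hash code from every element they contain. Whether calling hashCode on them is necessary has to be decided case by case.

If performance is critical, Objects.hash may not be the best choice either, because it has to create an array for its varargs. But the general rule about optimization holds: don't do it prematurely! Use a common hash code algorithm, perhaps forgo including collections, and optimize only after profiling shows potential for improvement.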
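Here is a sketch of both ideas, assuming a hypothetical Article class whose equals compares a title, an author, and a list of tags. The hashCode below leaves out the tags and combines the remaining fields by hand, avoiding the varargs array that Objects.hash would allocate. Because the hashed fields are a subset of the equality fields, the contract still holds; the price is that articles differing only in their tags collide. Whether such a trade-off pays off is exactly what profiling should decide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical class: equals uses all three fields, hashCode skips the list.
final class Article {

    private final String title;
    private final String author;
    private final List<String> tags;

    Article(String title, String author, List<String> tags) {
        this.title = title;
        this.author = author;
        this.tags = new ArrayList<>(tags); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Article)) return false;
        Article other = (Article) o;
        return Objects.equals(title, other.title)
                && Objects.equals(author, other.author)
                && tags.equals(other.tags);
    }

    @Override
    public int hashCode() {
        // Combined by hand (no varargs array); tags deliberately excluded.
        int result = 1;
        result = 31 * result + Objects.hashCode(title);
        result = 31 * result + Objects.hashCode(author);
        return result;
    }

    public static void main(String[] args) {
        Article a = new Article("Hashing", "Ann", Arrays.asList("java"));
        Article b = new Article("Hashing", "Ann", Arrays.asList("jvm"));
        System.out.println(a.equals(b));                  // false
        System.out.println(a.hashCode() == b.hashCode()); // true: allowed, but a collision
    }
}
```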

## Collisions

With all this focus on performance, what about this implementation?

@Override
public int hashCode() {
    return 0;
}

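It's fast, that's for sure. And equal objects will have the same hash code. And there are no mutable fields!

But remember what we said about buckets? With this approach, all instances end up in the same bucket! Typically this results in a list that holds all the elements, which performs very poorly: every call to contains triggers a linear scan of the whole list.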
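A quick sketch with a hypothetical ConstantHash class that uses the return-0 implementation above. The set still behaves correctly, because equals settles every comparison, but all elements share one bucket, so every add and contains has to search that single bucket instead of jumping straight to a small one.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class with the constant hash code from above.
final class ConstantHash {

    private final int id;

    ConstantHash(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHash && ((ConstantHash) o).id == id;
    }

    @Override
    public int hashCode() {
        return 0; // legal, but every instance lands in the same bucket
    }

    public static void main(String[] args) {
        Set<ConstantHash> set = new HashSet<>();
        for (int i = 0; i < 1_000; i++) {
            set.add(new ConstantHash(i)); // each add searches the one shared bucket
        }
        // Still correct, just slow: the hash code cannot narrow the search.
        System.out.println(set.size());                         // 1000
        System.out.println(set.contains(new ConstantHash(42))); // true
    }
}
```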

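So we want as few items in the same bucket as possible! An algorithm that returns wildly varying hash codes, even for very similar objects, is a good start.

How to get there partly depends on the chosen fields. The more details we include in the computation, the more likely it is that the hash codes differ. Note that this is the exact opposite of our thoughts on performance. So, interestingly enough, using too many or too few fields can both result in bad performance.

The other part of preventing collisions is the algorithm that is used to actually compute the hash.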

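## Computing the hash

The easiest way to compute a field's hash code is to simply call hashCode on it; the results then have to be combined, which can be done by hand. A common algorithm is to start with some arbitrary number (here 1) and to repeatedly multiply the result by another number (often a small prime such as 31) before adding a field's hash code: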

int prime = 31;
int result = 1;
result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
return result;

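This can cause overflows, which are not particularly problematic because they do not cause exceptions in Java: int arithmetic simply wraps around.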
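The combination code above can be wrapped in a small helper and compared with Objects.hash. Objects.hash is documented to hash its arguments as Arrays.hashCode does, which in turn matches List.hashCode: start at 1, multiply by 31, and add each element's hash code (0 for null). So both should agree. The class name ManualHash is hypothetical.

```java
import java.util.Objects;

class ManualHash {

    // The prime-multiplication scheme from above; firstName and lastName may be null.
    static int hash(String firstName, String lastName) {
        int prime = 31;
        int result = 1;
        result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
        return result; // int arithmetic silently wraps around on overflow
    }

    public static void main(String[] args) {
        System.out.println(hash("Ada", "Lovelace") == Objects.hash("Ada", "Lovelace")); // true
        System.out.println(hash(null, null)); // 961, i.e. 31 * 31
    }
}
```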

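Note that even a very good hash algorithm can lead to frequent collisions if the input data follows certain patterns. As a simple example, suppose we computed the hash of points by adding their x and y coordinates. That doesn't sound too bad until we deal with points on the line f(x) = -x: for all of them x + y == 0, so they all collide.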
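The point example can be checked directly. The helpers additiveHash and primeHash are hypothetical: the first adds the coordinates, the second uses the prime-multiplication scheme shown above. For 100 points on the line y = -x the additive version yields a single hash code, while the prime version yields 100 distinct ones. No scheme is immune, though: the prime version collides along other lines, for example (0, 31) and (1, 0) both hash to 992.

```java
import java.util.HashSet;
import java.util.Set;

class PointHashes {

    // Naive hash: every point on the line y = -x hashes to 0.
    static int additiveHash(int x, int y) {
        return x + y;
    }

    // Prime-multiplication hash: 31 * (31 * 1 + x) + y.
    static int primeHash(int x, int y) {
        int result = 1;
        result = 31 * result + x;
        result = 31 * result + y;
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> additive = new HashSet<>();
        Set<Integer> prime = new HashSet<>();
        for (int x = 0; x < 100; x++) {
            int y = -x; // points on f(x) = -x
            additive.add(additiveHash(x, y));
            prime.add(primeHash(x, y));
        }
        System.out.println(additive.size()); // 1: all 100 points collide
        System.out.println(prime.size());    // 100: distinct for these points
    }
}
```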

But again: use a common algorithm, and change it only if profiling shows that it is not adequate.

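## Summary

We have seen that computing a hash code is something like compressing equality into an integer value: equal objects must have the same hash code, and for performance reasons it is best if as few unequal objects as possible share the same hash code.

This means that whenever equals is overridden, hashCode must be overridden as well.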
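A sketch of what happens when this rule is ignored. BrokenName and FixedName are hypothetical classes that differ only in whether hashCode is overridden. With BrokenName, HashSet very likely fails to find an equal element, because the two instances almost certainly have different identity hash codes; that outcome is not strictly guaranteed, so no fixed output is claimed for it. FixedName uses the same field in both methods and is found reliably.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class that overrides equals but not hashCode.
final class BrokenName {
    private final String value;

    BrokenName(String value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenName && Objects.equals(value, ((BrokenName) o).value);
    }
    // hashCode is not overridden, so Object's identity-based version is used.
}

// The same class with a matching hashCode.
final class FixedName {
    private final String value;

    FixedName(String value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedName && Objects.equals(value, ((FixedName) o).value);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(value); // same field as equals
    }
}

class EqualsWithoutHashCode {
    public static void main(String[] args) {
        Set<BrokenName> broken = new HashSet<>();
        broken.add(new BrokenName("x"));
        // Very likely false: the lookup uses a different identity hash code.
        System.out.println(broken.contains(new BrokenName("x")));

        Set<FixedName> fixed = new HashSet<>();
        fixed.add(new FixedName("x"));
        System.out.println(fixed.contains(new FixedName("x"))); // true
    }
}
```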

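When implementing hashCode:

  • Use the same fields that are used in equals (or a subset of them).

  • Better not to include mutable fields.

  • Consider not calling hashCode on collections.

  • Use a common hash algorithm unless the input data has specific patterns that counteract it.

Remember that hashCode is about performance, so don't spend too much effort on it unless profiling indicates it is necessary.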

