
How to Implement Java's hashCode Correctly

2025-02-18 10:46:14

SitePoint is exploring the Java world: Java developers are invited to contribute


SitePoint continues to expand the range of topics it covers and will focus on Java in the near future. If you are an experienced Java developer and would like to contribute to our Java content, please get in touch and share the topics you would like to write about.

Correct implementation of equals and hashCode methods in Java

So you have implemented the equals method for your class? Great! But you also have to implement hashCode. Let's look at why, and at how to do it correctly.

Key points:

  • In Java, equal objects must have the same hash code. Therefore, if the equals method is overridden, a matching hashCode implementation must be created so that objects are stored and retrieved accurately and consistently in hash-based collections.
  • When implementing hashCode, use the same fields that are used in the equals method. Try to avoid mutable fields, and be cautious about including collections, since hashing them can be expensive.
  • hashCode is a performance optimization, so don't put too much effort into it unless profiling shows that improvements are needed.
  • Hash collisions (two different objects having the same hash code) can be reduced with a better hashing algorithm, for example one that combines the fields' hash codes with a prime multiplier. A more even distribution of hash codes means fewer collisions and faster lookups.

equals and hashCode methods

Although equals is reasonable from a general perspective, hashCode is much more technical. Strictly speaking, it is just an implementation detail for improving performance.

Most data structures use the equals method to check whether they contain an element. For example:

<code class="language-java">import java.util.Arrays;
import java.util.List;
List<String> list = Arrays.asList("a", "b", "c");
boolean contains = list.contains("b");</code>

The variable contains is true because, although the instances of "b" are not identical (again, ignoring string interning), they are equal.
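To see that contains relies on equality rather than identity, the following sketch forces a second, distinct "b" instance with new String(...) so that interning cannot make the two references identical. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

class ContainsDemo {
    // Asks the list whether it contains an equal, but separately created, "b".
    static boolean containsEqualCopy() {
        List<String> list = Arrays.asList("a", "b", "c");
        String otherB = new String("b"); // always a new instance, bypassing interning
        return list.contains(otherB);    // contains compares elements with equals
    }

    // Checks whether the list's "b" is the very same object as a new "b".
    static boolean isSameInstance() {
        List<String> list = Arrays.asList("a", "b", "c");
        return list.get(1) == new String("b"); // == compares references
    }

    public static void main(String[] args) {
        System.out.println(containsEqualCopy()); // true
        System.out.println(isSameInstance());    // false
    }
}
```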

However, comparing every element with the instance passed to contains is wasteful, and a whole class of data structures uses a more efficient approach. Instead of comparing the requested instance with each element they contain, they use a shortcut that reduces the number of potentially equal instances and then compare only those.

This shortcut is the hash code, which can be seen as an object's equality boiled down to an integer value. Instances with the same hash code are not necessarily equal, but equal instances have the same hash code. (Or at least they should; we will discuss this later.) Such data structures are usually named after this technique, recognizable by the "Hash" in their names, with HashMap being the best-known representative.

They usually work as follows:

  • When an element is added, its hash code is used to compute an index into an internal array (the slots of this array are called buckets).
  • If other, unequal elements have the same hash code (or hash codes that map to the same index), they end up in the same bucket and must be bundled together, for example by adding them to a list.
  • When an instance is passed to contains, its hash code is used to compute the bucket. Only the elements in that bucket are compared with the instance.
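The three steps above can be sketched as a toy hash set. This is a deliberately simplified, hypothetical class (fixed bucket count, no resizing, no removal, no null support) meant only to illustrate buckets; it is not how java.util.HashSet is actually implemented. The equalsCalls counter is an added illustration device.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified hash set: fixed number of buckets, each bucket is a list.
class ToyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();
    int equalsCalls = 0; // number of equals comparisons performed so far

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // The hash code picks the bucket; floorMod keeps negative hash codes in range.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    void add(E element) {
        if (!contains(element)) {
            // unequal elements whose hash codes map to the same index share one list
            bucketFor(element).add(element);
        }
    }

    boolean contains(Object o) {
        // only the elements in o's bucket are compared with equals
        for (E candidate : bucketFor(o)) {
            equalsCalls++;
            if (candidate.equals(o)) {
                return true;
            }
        }
        return false;
    }
}
```

After adding 100 strings to a ToyHashSet with 16 buckets, a lookup only compares against the elements in a single bucket, typically a handful, rather than all 100.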

This way, very few equals comparisons, ideally none, are required to implement contains.

Like the equals method, the hashCode method is also defined in the Object class.

Thinking about hashing

If the hashCode method is used as a shortcut to determine equality, then there is only one thing we should really care about: equal objects should have the same hash code.

This is also why, if we override the equals method, we have to create a matching hashCode implementation! Otherwise, objects that are equal according to our implementation would likely not have the same hash code, because they would still use the implementation from the Object class.
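A minimal sketch of what goes wrong, using a hypothetical class EqualsOnlyName that overrides equals but not hashCode. The two instances are equal, yet HashSet almost always fails to find the second one, because each instance keeps Object's identity-based hash code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class that overrides equals but (wrongly) not hashCode.
class EqualsOnlyName {
    final String value;

    EqualsOnlyName(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsOnlyName
                && Objects.equals(value, ((EqualsOnlyName) o).value);
    }
    // no hashCode override: the identity-based Object.hashCode is inherited
}

class EqualsOnlyDemo {
    public static void main(String[] args) {
        EqualsOnlyName a = new EqualsOnlyName("Jane");
        EqualsOnlyName b = new EqualsOnlyName("Jane");
        System.out.println(a.equals(b)); // true

        Set<EqualsOnlyName> set = new HashSet<>();
        set.add(a);
        // Almost always false: b's identity hash code differs from a's,
        // so the set never gets as far as calling equals on a.
        System.out.println(set.contains(b));
    }
}
```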

The hashCode contract

Quoting the Javadoc of Object.hashCode:

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The first point mirrors the consistency property of equals, and the second is the requirement we arrived at above. The third states an important detail that we will return to later.

Implementing hashCode

A very simple implementation of Person.hashCode looks like this:

<code class="language-java">@Override
public int hashCode() {
    return Objects.hash(firstName, lastName); // the fields compared in equals
}</code>

The person's hash code is computed by calculating the hash codes of the relevant fields and combining them. Both steps are left to the utility function Objects.hash.
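For context, here is a self-contained sketch of a Person class in which equals and hashCode use the same fields. The fields firstName and lastName are assumed for illustration; Objects.equals and Objects.hash are real java.util.Objects methods and both handle null.

```java
import java.util.Objects;

// Sketch of a Person class; the two fields are assumptions for illustration.
final class Person {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        // same fields as equals, so equal persons always get equal hash codes
        return Objects.hash(firstName, lastName);
    }
}
```

With this class, two separately created Person("Jane", "Doe") instances are equal, share a hash code, and a HashSet holding one of them reports that it contains the other.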

Selecting fields

But which fields are relevant? The requirements help answer this question: if equal objects must have the same hash code, the hash code computation should not include any field that is not used for the equality check. (Otherwise, two objects that differ only in those fields would be equal but have different hash codes.)

Therefore, the set of fields used for hashing should be a subset of the set of fields used for equality. By default, both will use the same fields, but there are some details to consider.

Consistency

First, there is the consistency requirement. It should be interpreted quite strictly. While it allows the hash code to change when some fields change (which is often unavoidable with mutable classes), hash-based data structures are not prepared for this scenario.

As we saw above, the hash code is used to determine the bucket of the element. However, if the hash-related fields change, the hash is not recalculated and the internal array is not updated.

This means that subsequent lookups with an equal object, or even with the very same instance, will fail! The data structure computes the current hash code (which differs from the one used when the instance was stored) and looks in the wrong bucket.

Conclusion: it is best not to use mutable fields for the hash code computation!
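A minimal sketch of this failure, using a hypothetical mutable class MutableTag whose hash code depends on a field that changes after the object has been added to a HashSet. The lookup uses the new hash code and fails, although iteration (which does not use hash codes) shows the element is still stored.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable class whose hash code depends on a changeable field.
class MutableTag {
    String name;

    MutableTag(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableTag && Objects.equals(name, ((MutableTag) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class MutableKeyDemo {
    // Stores the tag in a fresh HashSet, then changes its hash-relevant field.
    static Set<MutableTag> addThenRename(MutableTag tag, String newName) {
        Set<MutableTag> set = new HashSet<>();
        set.add(tag);
        tag.name = newName; // the hash code changes, but the set does not rehash
        return set;
    }

    public static void main(String[] args) {
        MutableTag tag = new MutableTag("Alice");
        Set<MutableTag> set = addThenRename(tag, "Bob");
        System.out.println(set.contains(tag));            // false: lookup uses the new hash code
        System.out.println(set.iterator().next() == tag); // true: the element is still stored
    }
}
```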

Performance

Hash codes may be computed about as often as equals is called. This can well happen in performance-critical parts of the code, so it makes sense to think about performance. And unlike with equals, there is more room for optimization here.

Unless sophisticated algorithms are used or a great many fields are involved, the arithmetic cost of combining their hash codes is as negligible as it is unavoidable. But consider whether all fields really need to be included in the computation! Collections in particular deserve suspicion: lists and sets, for example, compute the hash of every one of their elements. Whether calling their hashCode is necessary should be decided case by case.
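One way to act on this, sketched with a hypothetical Playlist class: equals compares both the name and the list of songs, while hashCode hashes only the name. This satisfies the contract because the hashed fields are a subset of the compared fields; the trade-off is that playlists sharing a name always collide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical class: equals compares name and songs, hashCode hashes only the name.
final class Playlist {
    private final String name;
    private final List<String> songs;

    Playlist(String name, List<String> songs) {
        this.name = name;
        this.songs = new ArrayList<>(songs); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Playlist)) return false;
        Playlist other = (Playlist) o;
        return Objects.equals(name, other.name) && songs.equals(other.songs);
    }

    @Override
    public int hashCode() {
        // songs is deliberately left out so the list is not traversed on every call;
        // allowed because name is a subset of the fields compared in equals
        return Objects.hashCode(name);
    }
}
```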

If performance is critical, using Objects.hash might not be the best choice either, because it has to create an array for its varargs.
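Objects.hash is declared with a varargs parameter (Object... values), and its Javadoc specifies that the result is computed as if the values were placed into an array hashed with Arrays.hashCode. The sketch below (class and method names are illustrative) makes that array explicit; both methods return the same value.

```java
import java.util.Arrays;
import java.util.Objects;

class VarargsDemo {
    // Each call implicitly allocates new Object[] {first, last} for the varargs.
    static int viaObjectsHash(String first, String last) {
        return Objects.hash(first, last);
    }

    // The same computation with the array allocation written out.
    static int viaExplicitArray(String first, String last) {
        return Arrays.hashCode(new Object[] {first, last});
    }
}
```

Whether that allocation matters for a given application is exactly the kind of question profiling should answer.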

But the general rule about optimization still applies: don't optimize prematurely! Use a common hash code algorithm, perhaps skip including collections, and only optimize further after profiling shows potential for improvement.

Collisions

Going all-in on performance, what about this implementation?

<code class="language-java">@Override
public int hashCode() {
    return 0; // the same hash code for every instance
}</code>
It's certainly fast. And equal objects will have the same hash code, so we're good on that front, too. As a bonus, no mutable fields are involved!

But remember what we said about buckets? This way, all instances end up in the same bucket! This usually results in a linked list holding all elements, which is very bad for performance. Each call to contains, for example, triggers a linear scan of that list.
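A small sketch of the effect, assuming a hypothetical table of 16 buckets and hypothetical keys "key0", "key1", and so on: a hash function that returns the same constant for every key puts all keys into one bucket, while a content-based hash spreads them out.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.ToIntFunction;

class BucketSpread {
    static final int BUCKETS = 16; // hypothetical table size

    // Constant hash code: the same value for every key.
    static int constantHash(String key) {
        return 0;
    }

    // Hash code derived from the key's contents.
    static int contentHash(String key) {
        return Objects.hash(key);
    }

    // Counts how many distinct buckets the keys "key0" .. "key{n-1}" land in.
    static int usedBuckets(ToIntFunction<String> hash, int keyCount) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            used.add(Math.floorMod(hash.applyAsInt("key" + i), BUCKETS));
        }
        return used.size();
    }
}
```

With 100 keys, usedBuckets(BucketSpread::constantHash, 100) returns 1, so every lookup would have to scan all 100 entries.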

So we want as few items in the same bucket as possible! An algorithm that returns wildly varying hash codes, even for very similar objects, is a good start. How to get there depends partly on the selected fields. The more details we include in the computation, the more likely it is that the hash codes differ. Note that this is the exact opposite of our thoughts about performance. So, interestingly enough, using too many or too few fields can both result in poor performance.

Another part of preventing collisions is the algorithm used to actually compute the hash.

Computing the hash

The easiest way to compute a field's hash code is to call hashCode on it. The results can then be combined manually. A common algorithm is to start with some arbitrary number and, for each field, multiply the running result by another number (usually a small prime) and then add the field's hash:

<code class="language-java">int result = 17; // arbitrary start value
result = 31 * result + Objects.hashCode(firstName); // 31: small prime; null hashes to 0
result = 31 * result + Objects.hashCode(lastName);
return result;</code>
This may overflow, but that is not a real problem: int overflow in Java silently wraps around instead of throwing an exception.
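As a cross-check of the manual algorithm, here is a sketch with a hypothetical ManualName class. It starts at 1 and multiplies by 31 because List.hashCode is specified with exactly these constants, and Objects.hash is defined in terms of Arrays.hashCode, which returns the same value as the list's hashCode. With that start value, the hand-written version produces the same result as Objects.hash, without allocating an array.

```java
import java.util.Objects;

// Hypothetical class whose hashCode combines field hashes by hand.
final class ManualName {
    private final String firstName;
    private final String lastName;

    ManualName(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ManualName)) return false;
        ManualName other = (ManualName) o;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        int result = 1;                                     // start value
        result = 31 * result + Objects.hashCode(firstName); // 0 for null
        result = 31 * result + Objects.hashCode(lastName);
        return result;                                      // overflow wraps around
    }

    // For comparison: the same fields hashed with the utility method.
    int utilityHash() {
        return Objects.hash(firstName, lastName);
    }
}
```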

Note that even excellent hashing algorithms can produce unusually frequent collisions if the input data follows a specific pattern. As a simple example, suppose we compute a point's hash by adding its x and y coordinates. That sounds fine until we realize that we often deal with points on the line f(x) = -x, which means that x + y == 0 for all of them. Collisions galore!
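The point example can be checked with a short sketch. The method names are hypothetical, and 31 * x + y is just one common alternative, not a cure-all: it merely moves the collisions elsewhere (for instance, (x, y) and (x + 1, y - 31) still collide).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

class PointHashes {
    // Naive hash from the text: add the coordinates.
    static int additiveHash(int x, int y) {
        return x + y;
    }

    // Alternative: weight x by a small prime before adding y.
    static int primeHash(int x, int y) {
        return 31 * x + y;
    }

    // Counts distinct hash codes for the points (i, -i), i = 0 .. n-1, all on f(x) = -x.
    static int distinctOnLine(IntBinaryOperator hash, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(hash.applyAsInt(i, -i));
        }
        return seen.size();
    }
}
```

For 100 points on that line, the additive hash yields a single value (0) for all of them, while 31 * x + y yields 100 distinct values (30 * i).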

But again: use a common algorithm and don't worry unless profiling shows a problem.

Summary

We have seen that computing a hash code is like compressing equality into an integer value: equal objects must have the same hash code, and for performance reasons it is best if as few unequal objects as possible share the same hash code.

This means that if equals is overridden, hashCode must always be overridden as well.

When implementing hashCode:

  • Use the same fields that are used in equals (or a subset of them).
  • Preferably, do not include mutable fields.
  • Consider not calling hashCode on collections.
  • Use a common algorithm unless patterns in the input data counteract it.
Remember that hashCode is about performance, so don't spend too much energy on it unless profiling shows it is necessary.

FAQs on Correctly Implementing Java's hashCode Method

What is the significance of the hashCode() method in Java?

The hashCode() method is defined in the Object class and returns an integer value. It is mainly used by hash-based collections such as HashMap, HashSet and Hashtable to store and retrieve objects efficiently. It works together with the equals() method: the hash code narrows a lookup down to a small group of candidates, and equals() decides which of them actually matches. This speeds up data retrieval, especially in large collections, and thereby improves the performance of Java applications.

How does the hashCode() method work in Java?

The default implementation inherited from Object returns an identity-based integer; it is often described as the object's memory address, but the specification does not require that. Classes that override equals() compute the value from their fields instead. Hash-based collections use this value to determine the bucket in which an object is stored. Hash codes are not unique, though: two different objects may have the same hash code, which is called a hash collision.

What is the contract between the equals() and hashCode() methods in Java?

The contract between equals() and hashCode() is a set of rules that governs how the two methods interact. It states that if two objects are equal according to equals(), calling hashCode() on each of them must produce the same integer result. This ensures that objects are stored and retrieved consistently and accurately in hash-based collections.

How do I override the hashCode() method in Java?

Overriding hashCode() means providing your own implementation that derives an integer from the same fields that equals() compares. The value does not have to be unique; it only has to be the same for equal objects. A common approach combines the fields' hash codes with a prime multiplier (or simply uses Objects.hash), which helps distribute hash codes evenly and reduces the likelihood of collisions.

What is a hash collision and how can it be avoided?

A hash collision occurs when hashCode() returns the same integer for two different objects. Collisions do not cause data loss in Java's hash-based collections, which fall back to equals() to tell the objects apart, but frequent collisions slow lookups down. They cannot be avoided entirely; a hashing algorithm that spreads values well, for example one that combines the fields' hash codes with a prime multiplier, keeps them rare.

Why should the hashCode() method be overridden?

If you override equals(), you must override hashCode() as well; otherwise equal objects may have different hash codes, and hash-based collections will fail to find them. A good implementation also produces well-distributed hash codes, which reduces collisions and keeps lookups in large collections fast.

In Java, can two unequal objects have the same hashCode?

Yes, in Java, two unequal objects can have the same hash code. This is called a hash collision, and the contract explicitly allows it. A good hashing algorithm, however, makes it less likely.

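What happens if I don't override the hashCode() method?

If you don't override hashCode(), your class inherits the identity-based implementation from Object. That is fine as long as you don't override equals() either. If you do override equals(), though, two equal objects will usually get different hash codes, and hash-based collections such as HashMap and HashSet may fail to find an object even though an equal one is present.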

How does hashCode() improve the performance of Java applications?

hashCode() lets hash-based collections narrow a search down to a single bucket. Instead of comparing the requested object with every element, the collection computes its hash code, jumps to the matching bucket, and calls equals() only on the few elements stored there.

Can I use the hashCode() method in a non-hash-based collection?

Although hashCode() is mainly used by hash-based collections, you can call it on any object, including elements of other collections. There is little benefit in doing so, however, because collections that are not hash-based do not rely on hash codes to store and retrieve their elements.
