Use it every day! Do you know what HASH is?-javaTutorial-php.cn

Home

Java

javaTutorial

Use it every day! Do you know what HASH is?

Java学习指南

Jul 26, 2023 pm 02:47 PM

hash

Have you ever used HashMap? So have you seriously understood what Hash is? This article will take you through an in-depth study of the Hash algorithm from a principle level.

1. The concept of hashing

The main idea of the hashing method is to determine the storage address of the node based on the key code value: based on the key code The value K is an independent variable. Through a certain functional relationship h(K) (called a hash function), the corresponding function value is calculated. This value is interpreted as the storage address of the node, and the node is stored in this storage. in the unit. When retrieving, use the same method to calculate the address, and then go to the corresponding unit to get the node you are looking for. Nodes can be quickly retrieved through hashing. Hash (also called "hash") is an important storage method and a common retrieval method.

The storage structure constructed according to the hash storage method is called a hash table (hash table). A position in the hash table is called a slot. The core of hashing technology is the hash function. For any given dynamic lookup table DL, if an "ideal" hash function h and the corresponding hash table HT are selected, then for each data element X in DL. The function value h (X.key) is the storage location of X in the hash table HT. Data element X will be placed at this position when inserting (or creating a table), and will also be searched at this position when retrieving X. The storage location determined by the hash function is called the hash address. Therefore, the core of hashing is: the hash function determines the corresponding relationship between the key code value (X.key) and the hash address h (X.key). Through this relationship, organizational storage and retrieval are realized.

Generally, the storage space of the hash table is a one-dimensional array HT[M], and the hash address is the subscript of the array. The goal of designing a hashing method is to design a certain hash function h, 0

2. Hash function

In the following discussion, we assume that we are dealing with key codes with integer values. Otherwise, we can always create a key code with a normal value. One-to-one correspondence between integers, thereby converting the retrieval of the key into a retrieval of the corresponding positive integer; at the same time, it is further assumed that the value of the hash function falls between 0 and M-1. The principles for selecting a hash function are: the operation is as simple as possible; the value range of the function must be within the range of the hash table; the nodes should be distributed evenly as much as possible, that is, try to make different keys have different hash function values. Various factors need to be considered: key length, hash table size, key distribution, record retrieval frequency, etc. Below we introduce several commonly used hash functions.

1. Division by remainder method

#As the name suggests, division by remainder method is to divide the key code x by M (often taking the length of the hash table), And take the remainder as the hash address. The remainder method is almost the simplest hashing method, and the hash function is: h(x) = x mod M.

2. Multiply remainder method

When using this method, first multiply the key code by a constant A (0

##3. Square method

Since integer division usually runs slower than multiplication, consciously avoid using the remainder Legal operations can improve the running time of hashing algorithms. The specific implementation of the square-middle method is to first find the square value of the key code to expand the difference between similar numbers, and then take the middle few digits (often binary bits) according to the length of the table as the hash function value. Because the middle digits of a product are related to each digit of the multiplier, the resulting hash address is more uniform.

Use it every day! Do you know what HASH is?

4. Numerical analysis method

Assume that each keyword in the keyword set is It consists of s digits (u1, u2, …, us), analyzes the entire keyword set, and extracts several evenly distributed bits or their combination as addresses. The digital analysis method is a method of taking some digital bits with relatively uniform values in the data element keywords as the hash address. That is, when the keyword has a lot of digits, you can analyze the bits of the keyword and discard the unevenly distributed bits as a hash value. It is only suitable when all keyword values are known. By analyzing the distribution, the keyword value range is converted into a smaller keyword value range.

For example: To construct a hash table with the number of data elements n=80 and the hash length m=100. Without loss of generality, we only give 8 keywords for analysis here. The 8 keywords are as follows:

K1=61317602 K2=61326875 K3=62739628 K4=61343634

K5=62706815 K6=62774638 K7=61381262 K8=61394220

Analysis of the above 8 keywords shows that the values of the 1st, 2nd, 3rd, and 6th digits from left to right of the keywords are relatively concentrated and should not be used as Haha. Hash address, the remaining 4th, 5th, 7th, and 8th bits have even values, and two of them can be selected as the hash address. Suppose the last two digits are selected as the hash address, then the hash addresses of these eight keywords are: 2, 75, 28, 34, 15, 38, 62, 20.

This method is suitable for: being able to estimate in advance the frequency of occurrence of various numbers on each digit of all keywords.

5. Base conversion method

Consider the key value as a number in another base system and then convert it into the original base number , and then select several of them as hash addresses.

Example Hash(80127429)=(80127429)13=8137 0136 1135 2134 7133 4132 2*131 9=(502432641)10 If you take the middle three digits as the hash value, you get Hash(80127429)=432 In order to obtain a good hash function, several methods can be used in combination, such as changing the base first, then folding or squaring, etc. As long as the hash is uniform, you can piece it together at will.

6. Folding method

#Sometimes the key code contains many digits, and it is too complicated to use the square-centering method, then you can The key code is divided into several parts with the same number of digits (the number of digits in the last part can be different), and then the superposition sum of these parts (carry out) is taken as the hash address. This method is called the folding method.

is divided into:

Shift superposition: Align and add the low bits of the divided parts.
Border Overlay: Fold back and forth from one end along the dividing boundary, then align and add.

3. Conflict resolution strategy

Although the goal of the hash function is to minimize conflicts, in fact conflicts are unavoidable. Therefore, we must look into conflict resolution strategies. Conflict resolution techniques can be divided into two categories: open hashing (also known as zipper method, separate chaining) and closed hashing method (closed hashing, also known as open address method, open addressing). The difference between these two methods is that the open hashing method stores the colliding key outside the main hash table, while the closed hashing method stores the colliding key in another slot in the table.

1. Separated linked list method

(1)Zipper method

A kind of hashing method The simple form is to define each slot in the hash table as the head of a linked list. All records that hash to a specific slot are put into the linked list for that slot. Figure 9-5 illustrates an open hash table in which each slot stores a record and a pointer to the rest of the linked list. These 7 numbers are stored in a hash table with 11 slots, and the hash function used is h(K) = K mod 11. The insertion order of numbers is 77, 7, 110, 95, 14, 75 and 62. There are 2 values hashed to slot 0, 1 value hashed to slot 3, 3 values hashed to slot 7, and 1 value hashed to slot 9.

2. Closed hashing method (open address method)

The closed hashing method combines all Records are stored directly in the hash table. Each record key key has a base position calculated by a hash function, that is, h(key). If a key is to be inserted and another record already occupies the base position of R (a collision occurs), then R is stored at another address in the table, and the conflict resolution policy determines which address it is.

The basic idea of closed hash table conflict resolution is: when a conflict occurs, use a certain method to generate a hash address sequence d0, d1, d2,...di,...dm for the key K -1. Among them, d0=h (K) is called the base address home position of K; all di (0

Different methods of forming probes lead to different methods of resolving conflicts. Below are several common construction methods.

(1) Linear detection method

Treat the hash table as a ring table. If a conflict occurs at the base address d (i.e. h(K)=d) , then the following address units are explored in sequence: d 1, d 2,..., M-1, 0, 1,..., d-1 until a free address is found or the key code is found to the node of key. Of course, if you return to address d after searching along the probing sequence, it means failure whether you are doing an insertion operation or a retrieval operation. The probe function for simple linear probe is: p(K,i) = i

Example 9.7 It is known that a set of key codes is (26, 36, 41, 38, 44, 15, 68, 12, 06, 51, 25), the length of the hash table M = 15, and the linear exploration method is used to resolve the conflict structure A hash table of this set of keys. Because n=11, use the remainder method to construct a hash function and select the largest prime number P=13 less than M. Then the hash function is: h(key) = key. Insert each node in order: 26: h(26) = 0, 36: h(36) = 10, 41: h(41) = 2, 38: h(38) = 12, 44: h(44) = 5. When 15 is inserted, its hash address is 2. Since 2 is already occupied by the element with key code 41, it needs to be probed. According to the sequential probing method, it is obvious that 3 is an open free address, so it can be placed in unit 3. Similarly, 68 and 12 can be placed in units 4 and 13 respectively.

(2) Secondary exploration method

The basic idea of the secondary exploration method is: The generated subsequent hash addresses are not consecutive, but skipped to leave room for subsequent data elements and thus reduce aggregation. The detection sequence of the secondary detection method is: 12, -12, 22, -22,... etc. That is to say, when a conflict occurs, the synonyms are hashed back and forth at both ends of the first address. The formula for finding the next open address is:

##(3) Random exploration method

The ideal probing function should randomly select the next position from an unvisited slot in the probing sequence, that is, the probing sequence should be a random permutation of the hash table positions. However, we cannot actually randomly select a position from the probe sequence because the same probe sequence cannot be established when retrieving the key. However, we can do something like pseudo-random probing. In pseudo-random probing, the i-th slot in the probing sequence is (h(K) ri) mod M, where ri is a "random" sequence of numbers between 1 and M - 1. All insertions and retrievals use the same "random" number. The probe function will be p(K,i) = perm[i - 1], where perm is an array of length M - 1 containing a random sequence of values from 1 to M - 1.

Example:

For example, the known hash table length m=11, the hash function is: H (key) = key % 11, then H (47) = 3, H ( 26)=4, H(60)=5, assuming the next keyword is 69, then H(69)=3, which conflicts with 47. If linear detection and then hashing are used to handle conflicts, the next hash address is H1=(3 1)% 11 = 4, and there is still a conflict, and then the next hash address is H2=(3 2)% 11 = 5, still the same If there is a conflict, continue to find the next hash address as H3 = (3 3)% 11 = 6. At this time, there is no conflict. Fill 69 into unit 5, see Figure 8.26 (a). If you use secondary detection and hashing to handle conflicts, the next hash address is H1=(3 12)% 11 = 4. If there is still a conflict, find the next hash address H2=(3 - 12)% 11 = 2 , there is no conflict at this time, fill 69 into unit 2, see Figure 8.26 (b). If pseudo-random detection and then hashing are used to handle conflicts, and the pseudo-random number sequence is: 2, 5, 9,…….., then the next hash address is H1 = (3 2)% 11 = 5, and there is still a conflict. Find the next hash address as H2 = (3 5)% 11 = 8. At this time, there is no conflict. Fill 69 into unit 8, see Figure 8.26 (c).

(4)Double hash detection method

Both pseudo-random probing and secondary probing can eliminate the problem of basic aggregation - that is, key codes with different base addresses, and some segments of their probing sequences overlap. However, if two keys hash to the same base address, the same probe sequence will still be obtained using both methods, and aggregation will still occur. This is because the probing sequence generated by pseudo-random probing and secondary probing is only a function of the base address, not a function of the original key value. This problem is called secondary clustering.

In order to avoid secondary aggregation, we need to make the exploration sequence a function of the original key value rather than a function of the base position. The double hash detection method uses the second hash function as a constant, skips the constant term each time, and performs linear detection.

The above is the detailed content of Use it every day! Do you know what HASH is?. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:Java学习指南. If there is any infringement, please contact admin@php.cn delete

Is Java Platform Independent if then how?May 09, 2025 am 12:11 AM

Java is platform-independent because of its "write once, run everywhere" design philosophy, which relies on Java virtual machines (JVMs) and bytecode. 1) Java code is compiled into bytecode, interpreted by the JVM or compiled on the fly locally. 2) Pay attention to library dependencies, performance differences and environment configuration. 3) Using standard libraries, cross-platform testing and version management is the best practice to ensure platform independence.

The Truth About Java's Platform Independence: Is It Really That Simple?May 09, 2025 am 12:10 AM

Java'splatformindependenceisnotsimple;itinvolvescomplexities.1)JVMcompatibilitymustbeensuredacrossplatforms.2)Nativelibrariesandsystemcallsneedcarefulhandling.3)Dependenciesandlibrariesrequirecross-platformcompatibility.4)Performanceoptimizationacros

Java Platform Independence: Advantages for web applicationsMay 09, 2025 am 12:08 AM

Java'splatformindependencebenefitswebapplicationsbyallowingcodetorunonanysystemwithaJVM,simplifyingdeploymentandscaling.Itenables:1)easydeploymentacrossdifferentservers,2)seamlessscalingacrosscloudplatforms,and3)consistentdevelopmenttodeploymentproce

JVM Explained: A Comprehensive Guide to the Java Virtual MachineMay 09, 2025 am 12:04 AM

TheJVMistheruntimeenvironmentforexecutingJavabytecode,crucialforJava's"writeonce,runanywhere"capability.Itmanagesmemory,executesthreads,andensuressecurity,makingitessentialforJavadeveloperstounderstandforefficientandrobustapplicationdevelop

Key Features of Java: Why It Remains a Top Programming LanguageMay 09, 2025 am 12:04 AM

Javaremainsatopchoicefordevelopersduetoitsplatformindependence,object-orienteddesign,strongtyping,automaticmemorymanagement,andcomprehensivestandardlibrary.ThesefeaturesmakeJavaversatileandpowerful,suitableforawiderangeofapplications,despitesomechall

Java Platform Independence: What does it mean for developers?May 08, 2025 am 12:27 AM

Java'splatformindependencemeansdeveloperscanwritecodeonceandrunitonanydevicewithoutrecompiling.ThisisachievedthroughtheJavaVirtualMachine(JVM),whichtranslatesbytecodeintomachine-specificinstructions,allowinguniversalcompatibilityacrossplatforms.Howev

How to set up JVM for first usage?May 08, 2025 am 12:21 AM

To set up the JVM, you need to follow the following steps: 1) Download and install the JDK, 2) Set environment variables, 3) Verify the installation, 4) Set the IDE, 5) Test the runner program. Setting up a JVM is not just about making it work, it also involves optimizing memory allocation, garbage collection, performance tuning, and error handling to ensure optimal operation.

How can I check Java platform independence for my product?May 08, 2025 am 12:12 AM

ToensureJavaplatformindependence,followthesesteps:1)CompileandrunyourapplicationonmultipleplatformsusingdifferentOSandJVMversions.2)UtilizeCI/CDpipelineslikeJenkinsorGitHubActionsforautomatedcross-platformtesting.3)Usecross-platformtestingframeworkss

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Nordhold: Fusion System, Explained

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Blue Prince: How To Get To The Basement

1 months agoByDDD

Hot Tools

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Hot Topics

1664

1423

1318

1269

1248