ホームページ >ウェブフロントエンド >jsチュートリアル >ハッシュテーブル : 衝突、サイズ変更、ハッシュ

ハッシュテーブル : 衝突、サイズ変更、ハッシュ

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBオリジナル: 2024-08-17 08:50:02966ブラウズ

Hash Tables : Collisions, Resizing, Hashing

Big O 記法を理解していることを前提としています。例は JavaScript で示されています。情報参照先「Cracking thecoding Interview」ゲイル・ラークマン・マクダウェル著

ハッシュテーブルを理解する

辞書、ハッシュマップ、ハッシュテーブルについて聞いたことがあるかどうかに関係なく、それらはすべて本質的に同じです。このブログでは、わかりやすくするために、このデータ構造をハッシュテーブルとして参照します。ハッシュテーブルとは何かを定義することから始めましょう。 ハッシュテーブル は、非常に効率的な検索を実現するために、キーと値のペアの形式でキーを値にマップするデータ構造です。実装するには複数の方法があります。

ハッシュテーブルの主要なコンポーネント

リンクされたリストの配列とハッシュ関数を使用して、ハッシュテーブルを実装できます。ハッシュ関数とは何かをさらに詳しく見てみましょう。

ハッシュ関数とは何ですか?

ハッシュ関数は、ハッシュテーブルの重要なコンポーネントです。これは、入力 (または「キー」) を受け取り、固定サイズのバイト文字列を通常は整数の形式で返す、通常は関数の形式のアルゴリズムです。出力はハッシュコード、または単にハッシュと呼ばれます。ハッシュテーブルのコンテキストにおけるハッシュ関数の主な目的は、ハッシュコードをバケット/スロットの配列の有効なインデックスにマップし、そこから目的の値を見つけることです。この場合、これらのバケット/スロットはリンクリストになります。 優れたハッシュ関数の特徴:

決定的: 指定された入力に対して、常に同じハッシュ出力を生成する必要があります。
均一分布: 予想される入力を出力範囲全体にわたってできるだけ均等にマッピングする必要があります。
効率的: 計算が速いはずです。
雪崩効果: 入力の小さな変更により、ハッシュ出力が大幅に異なります。

リンクされたリストの配列を使用する理由

ハッシュテーブルの実装におけるリンクリストの配列の使用は、チェーンとして知られる一般的な手法です。このアプローチにはいくつかの利点があります。

衝突処理: リンクリストの配列を使用する主な理由は、衝突を効率的に処理することです。 2 つのキーをハッシュして同じインデックスにマップする場合、新しいキーと値のペアをそのインデックスのリンクリストに追加するだけです。
スペース効率: ハッシュテーブルが基になる配列のサイズよりも多くの項目を格納できるようになります。各配列スロットは、リンクされたリストを通じて複数の項目を保持できます。

Array:  [0] -> (key1, value1) -> (key2, value2)
        [1] -> (key3, value3)
        [2] -> (key4, value4) -> (key5, value5) -> (key6, value6)
        [3] -> Empty
        [4] -> (key7, value7)

この例では、キー 1 と 2 はインデックス 0 にハッシュされ、キー 4、5、および 6 はすべてインデックス 2 にハッシュされます。

キーと値のペアの挿入

ハッシュ関数とチェーンを使用する理由をよく理解したところで、キーと値のペアをハッシュテーブルに挿入するフローを見てみましょう。

キー (任意の値) を挿入するときは、最初にキーの ハッシュコード (通常は int または long) を計算します。 キーの数は無限で、整数の数は有限である可能性があるため、2 つの異なるキーが同じハッシュコードを持つ可能性があります。
ハッシュコードを配列内のインデックスにマッピングします。ハッシュコードを配列にマッピングする一般的な方法は、モジュラス演算子 を使用することです。 (例: hash(key) % array.length))。 このメソッドを使用すると、2 つの異なるハッシュコードが同じインデックスにマッピングされる可能性があります。
インデックスには、キーと値のリンクされたリストがあります。キーと値のペアをこのインデックスに保存します。キーに同じハッシュコードがある場合、またはハッシュコードが同じインデックスにマップされている場合、衝突が発生します。

キーと値のペアへのアクセス

ハッシュテーブルの実装では、キーと値のペアへのアクセスが非常に効率的です。単純にキーからハッシュコードを計算し、次にハッシュコードからインデックスを計算し、最後にリンクリストでこのキーの値を検索します。実装が適切であると仮定すると、キーと値のペア (挿入と削除も) にアクセスするにはがかかります

O (1) O(1)

What Makes a Hash Table Implementation "Good"?

A well-implemented hash table should balance efficiency, space utilization, and collision handling. Here are the key factors that contribute to a good hash table implementation:

A Good Hash Function

The heart of any hash table is its hash function. A good hash function should:

Be quick to compute
Minimize collisions

Optimal Load Factor

The load factor is the ratio of filled slots to total slots in the hash table. Maintaining an appropriate load factor is crucial:

A typical sweet spot is between 0.6 and 0.75

Too low (< 0.5): Wastes space
Too high (> 0.8): Increases collision risk

Collision Resolution Techniques

Two primary methods for handling collisions are:

Chaining: Each table position stores a linked list of collided items. Simple to implement but can lead to slower lookups if chains become long.
Open Addressing: If a collision occurs, look for the next available slot. Keeps all data in the table but requires careful implementation to avoid clustering of stored data.

Note that chaining and open-addressing cannot coexist easily. Logically, it would not make sense to look for the next available slot but store collided items at a specific index.

Dynamic Resizing

As the number of elements grows, the hash table should resize to maintain performance:

Typically, the table size is doubled when the load factor exceeds a threshold. All elements need to be rehashed into the new, larger table.

This operation is expensive but infrequent, keeping the amortized time complexity at O(1).

JavaScript Implementation

This implementation will utilize resizing and chaining for collision resolution. We will assume that our keys are integers.

For the hash function + mapping, we will keep it very simple and simply perform the following given a key:

k e y % a r r a y c a p a c i t y key \hspace{0.2cm} \% \hspace{0.2cm} array \hspace{0.1cm} capacity

Classical OOP

class HashNode {
    constructor(key, value) {
        this.key = key;
        this.value = value;
        this.next = null;
    }
}

class HashTable {
    constructor(capacity = 16) {
        this.capacity = capacity;
        this.size = 0;
        this.buckets = new Array(this.capacity).fill(null);
        this.threshold = 0.75;
    }

    hash(key) {
        return key % this.capacity;
    }

    insert(key, value) {
        const index = this.hash(key);
        if (!this.buckets[index]) {
            this.buckets[index] = new HashNode(key, value);
            this.size++;
        } else {
            let currentNode = this.buckets[index];
            while (currentNode.next) {
                if (currentNode.key === key) {
                    currentNode.value = value;
                    return;
                }
                currentNode = currentNode.next;
            }
            if (currentNode.key === key) {
                currentNode.value = value;
            } else {
                currentNode.next = new HashNode(key, value);
                this.size++;
            }
        }

        if (this.size / this.capacity >= this.threshold) {
            this.resize();
        }
    }

    get(key) {
        const index = this.hash(key);
        let currentNode = this.buckets[index];
        while (currentNode) {
            if (currentNode.key === key) {
                return currentNode.value;
            }
            currentNode = currentNode.next;
        }
        return undefined;
    }

    remove(key) {
        const index = this.hash(key);
        if (!this.buckets[index]) {
            return false;
        }
        if (this.buckets[index].key === key) {
            this.buckets[index] = this.buckets[index].next;
            this.size--;
            return true;
        }
        let currentNode = this.buckets[index];
        while (currentNode.next) {
            if (currentNode.next.key === key) {
                currentNode.next = currentNode.next.next;
                this.size--;
                return true;
            }
            currentNode = currentNode.next;
        }
        return false;
    }

    resize() {
        const newCapacity = this.capacity * 2;
        const newBuckets = new Array(newCapacity).fill(null);
        this.buckets.forEach(head => {
            while (head) {
                const newIndex = head.key % newCapacity;
                const next = head.next;
                head.next = newBuckets[newIndex];
                newBuckets[newIndex] = head;
                head = next;
            }
        });
        this.buckets = newBuckets;
        this.capacity = newCapacity;
    }

    getSize() {
        return this.size;
    }

    getCapacity() {
        return this.capacity;
    }
}

Functional OOP

function createHashTable(initialCapacity = 16) {
    let capacity = initialCapacity;
    let size = 0;
    let buckets = new Array(capacity).fill(null);
    const threshold = 0.75;

    function hash(key) {
        return key % capacity;
    }

    function resize() {
        const newCapacity = capacity * 2;
        const newBuckets = new Array(newCapacity).fill(null);
        buckets.forEach(function(head) {
            while (head) {
                const newIndex = head.key % newCapacity;
                const next = head.next;
                head.next = newBuckets[newIndex];
                newBuckets[newIndex] = head;
                head = next;
            }
        });
        buckets = newBuckets;
        capacity = newCapacity;
    }

    return {
        insert: function(key, value) {
            const index = hash(key);
            const newNode = { key, value, next: null };

            if (!buckets[index]) {
                buckets[index] = newNode;
                size++;
            } else {
                let currentNode = buckets[index];
                while (currentNode.next) {
                    if (currentNode.key === key) {
                        currentNode.value = value;
                        return;
                    }
                    currentNode = currentNode.next;
                }
                if (currentNode.key === key) {
                    currentNode.value = value;
                } else {
                    currentNode.next = newNode;
                    size++;
                }
            }

            if (size / capacity >= threshold) {
                resize();
            }
        },

        get: function(key) {
            const index = hash(key);
            let currentNode = buckets[index];
            while (currentNode) {
                if (currentNode.key === key) {
                    return currentNode.value;
                }
                currentNode = currentNode.next;
            }
            return undefined;
        },

        remove: function(key) {
            const index = hash(key);
            if (!buckets[index]) {
                return false;
            }
            if (buckets[index].key === key) {
                buckets[index] = buckets[index].next;
                size--;
                return true;
            }
            let currentNode = buckets[index];
            while (currentNode.next) {
                if (currentNode.next.key === key) {
                    currentNode.next = currentNode.next.next;
                    size--;
                    return true;
                }
                currentNode = currentNode.next;
            }
            return false;
        },

        getSize: function() {
            return size;
        },

        getCapacity: function() {
            return capacity;
        }
    };
}

以上がハッシュテーブル : 衝突、サイズ変更、ハッシュの詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

JavaScript String Integer Array if for while int using Length operator finally map number function this position input table

声明：

この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。

前の記事：Google スプレッドシート: 期間 (時間) の SUMIFS、パート 2次の記事：Google スプレッドシート: 期間 (時間) の SUMIFS、パート 2

続きを見る

ハッシュ テーブル : 衝突、サイズ変更、ハッシュ