Hash Table (해시 테이블)

Reference (참고 사이트)
  • 6.5. Hashing - Problem Solving with Algorithms and Data Structures: https://runestone.academy/ns/books/published/pythonds/SortSearch/Hashing.html
  • Hash table - Wikipedia: https://en.wikipedia.org/wiki/Hash_table
  • Hash Tables - programming.guide: https://programming.guide/hash-tables.html

What is a Hash Table?

Data structure that stores elements in key-value pairs, where
  • Key - unique integer (or property name in JavaScript objects) that is used for indexing the values
  • Value - data that is associated with the key
  • Bucket - array cell/slot where nodes of key-value pairs are stored
notion image

For fast data storage and retrieval

  • In a linear search (e.g. a linked list) or a binary search (e.g. a BST), lookups have a time complexity of O(n) or O(log n), respectively. For large datasets, these costs can become significant
  • Hashing allows lookups to occur in constant time (O(1))

Actions and their time complexity

insert ⇒ O(1)
  • dependent on the hash function (but we can assume that most programming languages use a hash function that runs in O(1))
lookup (finding a value based on a key) ⇒ O(1)
  • same as insert: the given key (property name) is hashed, which points directly to the address where the value is stored
  • however, this depends on whether there are any collisions
delete ⇒ O(1)
  • same idea, because we locate the entry by its key
  • also, since hash tables are unordered, we don’t have to shift indexes like we would with arrays
search (check if a certain key exists) ⇒ O(1)
  • same as lookup
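As a rough sketch of these operations, JavaScript's built-in Map (which most engines back with a hash-table-like structure) exposes all four in amortized constant time; the keys and values below are made up for illustration:

// average-case O(1) operations on a built-in Map
const prices = new Map();

prices.set('grapes', 10000);        // insert
prices.set('apples', 500);          // insert

console.log(prices.get('grapes'));  // lookup -> 10000
console.log(prices.has('apples'));  // search -> true

prices.delete('apples');            // delete
console.log(prices.has('apples'));  // -> false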

Hash functions (해시 함수)

A hash function maps each element of a dataset to an index in the table. It generates a value of fixed length for every input it receives. There are many types of hash functions (e.g. MD5, SHA-1, SHA-256), and we can assume that the hash functions built into our programming languages run in O(1).
 
A hash function for a hash table generates a hash output, which is then used as an index into the hash table, and an index space (or address space) is allotted for that index.
 
Some key aspects of hash functions:
  • One-way function (일방향) - it is practically impossible to recover the input from the output (the key cannot be retrieved from its hash)
  • Idempotent (멱등성) - the same input always gives the same output. It is not a random gibberish generator: if you keep feeding in the same value, the output stays the same until the input changes. (A good hash function should also distribute different keys evenly so that they rarely produce the same hash.)
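A quick sketch of these two properties, assuming a Node.js environment with the built-in crypto module (the sha256 helper below is just for illustration):

const crypto = require('crypto');

// helper (for illustration): SHA-256 digest of a string, as hex
const sha256 = (input) => crypto.createHash('sha256').update(input).digest('hex');

// Idempotent: hashing the same input always gives the same output
console.log(sha256('grapes') === sha256('grapes')); // true

// Different inputs (almost always) give different outputs
console.log(sha256('grapes') === sha256('grape'));  // false

// One-way: there is no practical way to recover 'grapes' from sha256('grapes')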

Hash Collisions (해시 충돌)

A collision occurs when two keys get mapped to the same index/memory space.
notion image
Remember, our computer has limited space, and when we create a hash table, it decides how much space to allocate. It does not allocate all of its memory to this hash table; just a portion of it.
 
Within this limited space, the hash function maps each key to a slot in memory where the data is stored. Although hash functions are designed to distribute data evenly across the available slots, there is still a possibility that two keys will map to the same index. This situation, where a newly inserted key maps to an already occupied slot in the hash table, is called a hash collision, and there are multiple ways of handling collisions.
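For example, with a toy hash function that just sums character codes modulo the table size (made up here purely for illustration), two different keys can easily land on the same index:

const TABLE_SIZE = 10;

// toy hash: sum of character codes, kept within the table bounds
function toyHash(key) {
  let sum = 0;
  for (let i = 0; i < key.length; i++) {
    sum += key.charCodeAt(i);
  }
  return sum % TABLE_SIZE;
}

console.log(toyHash('ab')); // (97 + 98) % 10 = 5
console.log(toyHash('ba')); // (98 + 97) % 10 = 5 -> collision: different keys, same index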
 

Load Factor & Capacity

The load factor is the average number of key-value pairs per bucket.
When the load factor reaches a given limit, rehashing kicks in. Since rehashing increases the number of buckets, it reduces the load factor.
 
However, there is a tradeoff between time and space costs that we must consider when configuring the load factor limit.
notion image
Statistically speaking, a hash table performs best when it operates at a load factor between 0.25 and 0.75.
 
The capacity is the maximum number of key-value pairs for the given load factor limit and current bucket count. Since rehashing increases the number of buckets, it also increases the capacity.
Here is an example of a hash table configured with a load factor limit of 4:
notion image
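A small sketch of how these numbers relate; the 0.75 threshold here mirrors the resize check used in the implementation further below, while the chaining example above uses a limit of 4:

// load factor = entries / buckets; capacity = buckets * load factor limit
const bucketCount = 16;
const loadFactorLimit = 0.75;

const capacity = bucketCount * loadFactorLimit; // 12 entries before rehashing kicks in

function shouldRehash(entryCount, buckets, limit = loadFactorLimit) {
  return entryCount / buckets > limit;
}

console.log(shouldRehash(11, bucketCount)); // false (11 / 16 ≈ 0.69)
console.log(shouldRehash(13, bucketCount)); // true  (13 / 16 ≈ 0.81)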

Methods to resolve collisions (해시 충돌 해결)

Separate Chaining (분리 연결법)

Separate Chaining - forming linked lists at slots where collisions occur

Data structures used for Separate Chaining:

  1. Linked List
  2. Self-Balancing Binary Search Tree (Red-Black Tree)
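A minimal sketch of separate chaining with plain arrays as the chains (a full class-based version appears in the implementation section below; the names here are made up for illustration):

const SIZE = 8;
const buckets = new Array(SIZE);

function hash(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) h = (h + key.charCodeAt(i)) % SIZE;
  return h;
}

function insert(key, value) {
  const index = hash(key);
  if (!buckets[index]) buckets[index] = [];  // start a new chain at this slot
  buckets[index].push([key, value]);         // colliding keys share the chain
}

function lookup(key) {
  const chain = buckets[hash(key)];
  if (!chain) return undefined;
  for (const [k, v] of chain) {
    if (k === key) return v;                 // walk the chain to find the key
  }
  return undefined;
}

insert('grapes', 10000);
insert('apples', 500);
console.log(lookup('grapes')); // 10000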

Rehashing (Resize)

As the lengths of the linked lists grow, the average lookup time increases. Therefore, to keep linked lists short and lookups fast, the number of buckets must be increased (and nodes redistributed). This process is called rehashing.
notion image
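Conceptually, rehashing just allocates more buckets and re-inserts every existing entry against the new size (the _resize method in the implementation below does exactly this); a rough sketch, where hashWith is a hypothetical hash helper parameterized by table size:

// rough sketch: double the bucket count and redistribute all entries
function rehash(oldBuckets, hashWith) {
  const newBuckets = new Array(oldBuckets.length * 2);
  for (const chain of oldBuckets) {
    if (!chain) continue;
    for (const [key, value] of chain) {
      const index = hashWith(key, newBuckets.length); // re-hash against the new size
      if (!newBuckets[index]) newBuckets[index] = [];
      newBuckets[index].push([key, value]);
    }
  }
  return newBuckets;
}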
 

Open Addressing (개방 주소법)

Tries to find another open slot to hold the item that caused the collision.
notion image

Linear Probing

  • Systematically visiting each slot one at a time to find the next open slot (adding +1)
  • Creates a tendency for clustering
    • If many collisions occur at the same hash slot, the surrounding slots get filled up by linear probing. This affects items inserted later (one collision tends to induce further collisions)
// step forward one slot at a time, wrapping around the table
function linearProbing(pos, tableSize) {
  return (pos + 1) % tableSize;
}

const newHashValue = linearProbing(oldHashValue, tableSize);

Quadratic Probing

  • Similar to linear probing but instead of adding +1, adds a successive value of an arbitrary quadratic polynomial until an open slot is found. This is to lessen the problem of clustering.
notion image
// on the i-th probe attempt, offset by i * i from the original position
function quadraticProbing(pos, i, tableSize) {
  return (pos + i * i) % tableSize;
}
// try i = 0, 1, 2, ... until an open slot is found
 

Open Addressing vs. Separate Chaining

  • First of all, both approaches are O(n) in the worst case.
  • However, because open addressing stores data in contiguous memory, it has better cache efficiency than separate chaining. So when the number of items is small enough, open addressing performs better than separate chaining.
  • There is one more difference: compared to separate chaining, open addressing keeps consuming its buckets. Separate chaining can therefore postpone table expansion (rehashing) longer.
    • The purpose of a supplemental hash function (보조 해시 함수) is to transform the key's hash value so as to reduce the chance of collisions. It is used together with separate chaining, and it helps avoid cases that approach the worst case.

Arrays vs. Sets and Objects vs. Maps

Arrays
notion image
  • Use when you need an ordered list of values (which may contain duplicates)
  • Use when you need to manipulate data
 
Sets
notion image
  • Use when you need to work with unique values
  • Use when high performance is really important (operations like searching for an item or deleting an item from a set can be 10x faster than in arrays)
  • Use to remove duplicates from arrays
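For example, deduplicating an array with a Set:

const values = [2, 5, 1, 2, 3, 5, 1, 2, 4];

const unique = [...new Set(values)]; // a Set keeps only unique values
console.log(unique);                 // [2, 5, 1, 3, 4]

const uniqueSet = new Set(values);
console.log(uniqueSet.has(3));       // fast membership check -> true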
Objects
notion image
  • More “traditional” key/value store
  • Easier to write and access values with . and [] operators

  • Use when you need to include functions (methods) and also use this keyword to access properties of the same object
  • Use when working with JSON (can convert to map)
Maps
notion image
  • Better performance
  • Keys can have any data type
  • Easy to iterate
  • Easy to compute size of map

  • Use when you simply need to map keys to values
  • Use when you need keys that are NOT strings
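A short sketch contrasting the two (values made up for illustration):

// Object: string (or Symbol) keys, convenient literal syntax, plays well with JSON
const objStore = { grapes: 10000, apples: 500 };
console.log(objStore.grapes);                 // 10000
console.log(Object.keys(objStore).length);    // size must be computed manually -> 2

// Map: keys of any type, built-in size, directly iterable
const mapStore = new Map();
mapStore.set('grapes', 10000);
mapStore.set(42, 'the answer');               // non-string key
mapStore.set(true, 'boolean key');

console.log(mapStore.size);                   // 3
for (const [key, value] of mapStore) {
  console.log(key, value);                    // easy iteration in insertion order
}

// JSON interop: a plain object can be converted into a Map
const fromJson = new Map(Object.entries(objStore));
console.log(fromJson.get('apples'));          // 500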

Implementing Hash Tables (해시 테이블 구현)

Simple hash function

_hash(key) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    // combine character codes and keep the result within the table bounds
    hash = (hash + key.charCodeAt(i) * i) % this.data.length;
  }
  return hash;
}

Simple hash table constructor

class HashTable {
  constructor(size) {
    this.limit = size;                  // initial number of buckets
    this.data = new Array(this.limit);
    this.count = 0;                     // how many key-value pairs are currently stored
  }

  _hash(key) {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (hash + key.charCodeAt(i) * i) % this.data.length;
    }
    return hash;
  }

  _resize(limit) {
    const oldData = this.data;
    this.limit = limit;
    this.data = new Array(this.limit);
    this.count = 0;

    // iterate through oldData and re-insert every entry into the new buckets
    for (let bucket of oldData) {
      if (bucket) {
        for (let entry of bucket) {
          this.insert(entry[0], entry[1]);
        }
      }
    }
  }

  insert(key, value) {
    const index = this._hash(key);
    if (!this.data[index]) {
      this.data[index] = [];            // start a new bucket (separate chaining)
    }
    this.data[index].push([key, value]);
    this.count++;

    // check if the new insert pushes the load factor above 0.75
    if (this.count / this.limit > 0.75) {
      this._resize(this.limit * 2);
    }
  }

  lookup(key) {
    const index = this._hash(key);
    const currentBucket = this.data[index];
    if (!currentBucket) return console.error('Item does not exist');
    for (const entry of currentBucket) {
      if (entry[0] === key) return entry[1];
    }
  }

  keys() {
    const keysArr = [];
    for (let bucket of this.data) {
      if (bucket) {
        for (let entry of bucket) {
          keysArr.push(entry[0]);
        }
      }
    }
    return keysArr;
  }
}

const myHashTable = new HashTable(100);
myHashTable.insert('grapes', 10000);
myHashTable.insert('grape', 12345);
myHashTable.insert('apples', 500);
myHashTable.insert('banana', 2000);
console.log(myHashTable);
console.log(myHashTable.lookup('grape'));
  • In keys(), we have to loop over every bucket in the table and then over every entry inside each bucket, because the table is unordered and we have to account for collisions. So retrieving all keys is not O(1); it costs time proportional to the table size plus the number of entries
    • This is also part of why iterating over object keys with for...in in JavaScript is relatively slow
 

Exercise: Find First Recurring Character in Array

Test Cases
  • [2, 5, 1, 2, 3, 5, 1, 2, 4] ⇒ 2
  • [2, 1, 1, 2, 3, 5, 1, 2, 4] ⇒ 1
  • [2, 3, 4, 5] ⇒ undefined
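One way to solve this with a hash-table-backed structure (a Set), sketched below: scan left to right and return the first value that has already been seen, which makes a single O(n) pass instead of comparing every pair of elements:

function firstRecurringCharacter(array) {
  const seen = new Set();
  for (const item of array) {
    if (seen.has(item)) return item; // first value we have already seen
    seen.add(item);
  }
  return undefined;                  // no value recurs
}

console.log(firstRecurringCharacter([2, 5, 1, 2, 3, 5, 1, 2, 4])); // 2
console.log(firstRecurringCharacter([2, 1, 1, 2, 3, 5, 1, 2, 4])); // 1
console.log(firstRecurringCharacter([2, 3, 4, 5]));                // undefined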