# Hashtables --- CS 137 // 2021-09-20

## Administrivia

- You should have turned in:
    + Your reflection for daily exercise 4
    + Your solution to daily exercise 5
- No daily exercise for Wednesday
    + Need to catch up on grading!

# Questions

## ...about anything?

# Daily Exercise

# Binary Search Trees

## Trees

- A **tree** is a linked data structure where each node has a reference to **zero or more** other nodes
```dot
digraph dfa {
    A -> B; A -> C; A -> D;
    B -> F; B -> G;
    C -> E;
}
```
- Trees are **acyclic**, so the arrows only go "top down"

## Binary Trees

- A **binary tree** has nodes with at most two children
```dot
digraph dfa {
    A -> B; A -> C;
    B -> E; B -> F;
    C -> G; C -> H;
}
```
## Binary Trees

- The `Node` type for a binary tree would look something like this:

```java
class Node<E>
{
    E value;        // data stored in the node
    Node<E> left;   // left subtree
    Node<E> right;  // right subtree
}
```

## Binary Tree Traversals

- An **in-order** traversal is one that iterates over the left subtree first, then the root, then the right subtree

```java
void iterateInOrder(Node<E> root)
{
    if (root != null)
    {
        iterateInOrder(root.left);
        // do something with root.value here
        iterateInOrder(root.right);
    }
}
```

# Binary Search Trees

## Binary Search Trees

- Store key/value pairs just like a dictionary
- Keys follow the BST property: every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger
```dot
digraph dfa {
    A [label="4"]; B [label="2"]; C [label="6"];
    D [label="1"]; E [label="3"]; F [label="5"]; G [label="7"];
    A -> B; A -> C;
    B -> D; B -> E;
    C -> F; C -> G;
}
```
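One useful consequence of the BST property: an in-order traversal visits the keys in sorted order. As a minimal sketch (class and field names here are mine, not from the slides), we can build the tree pictured above and traverse it:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: build the BST pictured above (keys only, no values)
// and show that an in-order traversal visits its keys in sorted order.
class InOrderDemo
{
    static class Node
    {
        int key;
        Node left, right;

        Node(int key, Node left, Node right)
        {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Append keys to out: left subtree, then root, then right subtree
    static void inOrder(Node root, List<Integer> out)
    {
        if (root != null)
        {
            inOrder(root.left, out);
            out.add(root.key);
            inOrder(root.right, out);
        }
    }

    public static void main(String[] args)
    {
        Node root = new Node(4,
                new Node(2, new Node(1, null, null), new Node(3, null, null)),
                new Node(6, new Node(5, null, null), new Node(7, null, null)));
        List<Integer> keys = new ArrayList<>();
        inOrder(root, keys);
        System.out.println(keys); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```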
## Binary Search Tree

```java
class Node<K, V>
{
    K key;            // key stored in the node
    V value;          // value stored in the node
    Node<K, V> left;  // left subtree
    Node<K, V> right; // right subtree
}
```

## Implementing a Dictionary with a BST

- Recall that common dictionary operations include
    + `search(key)`
    + `insert(key, value)`
    + `delete(key)`

## Searching in a BST

```java
V search(Node<K, V> node, K key)
{
    if (node == null)
        return null;
    int cmp = key.compareTo(node.key); // requires K extends Comparable<K>
    if (cmp == 0)
        return node.value;
    else if (cmp < 0)
        return search(node.left, key);
    else
        return search(node.right, key);
}
```

## Inserting/Deleting in a BST

- Similar to searching
- For inserting, we search for the location the node *should be* and then add a new node there
- For deleting, we search for the node that contains the key and then remove it

## Runtime Complexity of BST

- If we implement a dictionary using a BST, then what is the complexity of each of these operations?
- $O(h)$ where $h$ is the height of the tree
- If a tree has $n$ nodes, what is the height of the tree?
    + In the worst case, $O(n)$!

## Self-Balancing Trees

- It is possible to implement BSTs that are **self-balancing**, ensuring that $h = O(\log n)$
- Two approaches are [AVL trees](https://en.wikipedia.org/wiki/AVL_tree) and [Red-Black trees](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)

## Self-Balancing Trees

- With self-balancing trees, it is possible to implement a dictionary with the following complexity

| **Operation** | **BST**     |
|---------------|-------------|
| `search`      | $O(\log n)$ |
| `insert`      | $O(\log n)$ |
| `delete`      | $O(\log n)$ |

# Hashtables

## Dictionary Example

- Suppose I want to map phone numbers to names
    + `5152714599` $\rightarrow$ `"Titus Klinge"`
    + `5152712177` $\rightarrow$ `"Eric Manley"`
- Arrays have $O(1)$ random access, so what if we use a large array for the dictionary?
    + Make the length 10000
    + Use the last four digits as the "key"

## Dictionary Example

- Notice that we have:
    + `arr[2177]` $\rightarrow$ `"Eric Manley"`
    + `arr[4599]` $\rightarrow$ `"Titus Klinge"`
- The array behaves just like a map!
- **Pros**: $O(1)$ search/insertion/deletion
- **Cons**: Might waste a **LOT** of space!

## Hashing Functions

- Turning a phone number into an array index was relatively easy---but what about other objects?
- This is exactly what a **hash** function is!
    + In essence, a hash function turns a complicated object into an integer
- Using such a function, we can turn our dictionary keys into integers!

## Hashing Functions

- **Question**: What if the array is only 10000 elements long but the hashcode is larger than 9999?
    + We can use the modulo operator!
    + `hashIndex = hashCode % 10000`

## Example

- Suppose I have an array with 2000 elements
- The hashcode of my phone number is 4599
- The hash index is `4599 % 2000 = 599`
- `arr[599]` $\rightarrow$ `"Titus Klinge"`

## Self Check Questions

- **Question**: Will this approach use *all* of the space in the array?
    + Not necessarily---for example, if most phone numbers happen to be odd, a lot of space is wasted
- **Question**: What happens if someone else at Drake has the phone number 515-271-0599?
    + Its hashcode is also 0599, so it **collides** with mine!
    + We need some way to resolve collisions

## Properties of Good Hash Functions

1. A hash function needs to be **fast**!
2. If `x == y`, then `x` and `y` should have the same hash
3. Distribution should be as **random** as possible to spread out hashcodes

## Collision Resolution

- How do we handle it if two distinct items have the same hashcode?
- Two primary approaches:
    1. Open addressing
    2. Chaining

## Open Addressing

- Every slot in the array contains at most one item
- When a collision happens, we look for the next available open spot
    + Sometimes called *linear probing*

## Open Addressing Example

- Suppose we have an array of length 7 and we insert the following numbers in order into the table
    + 5, 13, 7, 28, 14
- How would searching work?
- How would removal work?
    + What if I remove 28 and then search for 14?
    + We need to insert "dummy" markers in removed slots

## Pros/Cons of Open Addressing

- **Pros**: Makes efficient use of memory and cache
- **Cons**: Lots of dummies accumulate if you add/remove a lot

## Chaining

- Another idea for handling collisions is **chaining**
- Idea: Each slot holds a **linked list** of items, all with the same hash index

## Efficiency

- Recall that we wanted better than $O(\log n)$ runtime
- For hashtables, performance depends on the number of collisions
- Things that help minimize collisions:
    1. Using a prime number for the array length
    2. Having a great hash function
- However, things still go bad if the table gets too full

## Efficiency

- The **load factor** of a hashtable is:
$$\lambda = \frac{\text{\# items}}{\text{size of array}}$$
- Keeping the load factor low is important
- Most hashtable implementations reallocate the array (similar to `ArrayList`) once $\lambda$ reaches $0.75$

## Efficiency

| **Operation** | **Hashtable (expected)** | **Hashtable (worst case)** |
|---------------|--------------------------|----------------------------|
| `search`      | $O(1 + \lambda)$         | $O(n)$                     |
| `insert`      | $O(1)$                   | $O(n)$                     |
| `delete`      | $O(1 + \lambda)$         | $O(n)$                     |
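The open addressing example above can be traced in code. This is a minimal linear-probing sketch (class name and layout are mine; no resizing, searching, or deletion), inserting 5, 13, 7, 28, 14 into a length-7 table:

```java
// Minimal linear-probing sketch (illustration only): insert keys into a
// fixed-size table, stepping to the next open slot on each collision.
class LinearProbingDemo
{
    public static void main(String[] args)
    {
        Integer[] table = new Integer[7];   // null means the slot is empty
        int[] keys = {5, 13, 7, 28, 14};
        for (int key : keys)
        {
            int index = key % table.length;       // hash index
            while (table[index] != null)          // collision: probe forward
                index = (index + 1) % table.length;
            table[index] = key;
        }
        // 5 -> slot 5, 13 -> slot 6, 7 -> slot 0,
        // 28 -> slot 0 taken, probes to slot 1,
        // 14 -> slots 0 and 1 taken, probes to slot 2
        for (int i = 0; i < table.length; i++)
            System.out.println(i + ": " + table[i]);
    }
}
```

Note that 28 and 14 both hash to slot 0 but end up elsewhere, which is exactly why removing 28 outright (instead of leaving a dummy) would make a later search for 14 stop too early.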