# Hashtables --- CS 137 // 2021-09-20

## Administrivia

- You should have turned in:
    + Your reflection for daily exercise 4
    + Your solution to daily exercise 5
- No daily exercise for Wednesday
    + Need to catch up on grading!

# Questions

## ...about anything?

# Daily Exercise

# Binary Search Trees

## Trees

- A **tree** is a linked data structure where each node has a reference to **zero or more** other nodes
```dot
digraph dfa {
    A -> B; A -> C; A -> D;
    B -> F; B -> G;
    C -> E;
}
```
- Trees are **acyclic**, so the arrows only go "top down"

## Binary Trees

- A **binary tree** has nodes with at most two children
```dot
digraph dfa {
    A -> B; A -> C;
    B -> E; B -> F;
    C -> G; C -> H;
}
```
## Binary Trees

- The `Node` type for a binary tree would look something like this:

```java
class Node<E>
{
    E value;        // data stored in the node
    Node<E> left;   // left subtree
    Node<E> right;  // right subtree
}
```

## Binary Tree Traversals

- An **in-order** traversal is one that iterates over the left subtree first, then the root, then the right subtree

```java
void iterateInOrder(Node<E> root)
{
    if (root != null)
    {
        iterateInOrder(root.left);
        // do something with root.value here
        iterateInOrder(root.right);
    }
}
```

# Binary Search Trees

## Binary Search Trees

- Store key/value pairs just like a dictionary
- Keys follow the BST property: every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger
```dot
digraph dfa {
    A [label="4"]; B [label="2"]; C [label="6"];
    D [label="1"]; E [label="3"]; F [label="5"]; G [label="7"];
    A -> B; A -> C;
    B -> D; B -> E;
    C -> F; C -> G;
}
```
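One useful consequence of the BST property: an in-order traversal visits the keys in sorted order. As a minimal sketch (class and field names here are mine, not from the slides), we can build the tree pictured above and traverse it:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: build the BST pictured above (keys only, no values)
// and show that an in-order traversal visits its keys in sorted order.
class InOrderDemo
{
    static class Node
    {
        int key;
        Node left, right;

        Node(int key, Node left, Node right)
        {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Append keys to out: left subtree, then root, then right subtree
    static void inOrder(Node root, List<Integer> out)
    {
        if (root != null)
        {
            inOrder(root.left, out);
            out.add(root.key);
            inOrder(root.right, out);
        }
    }

    public static void main(String[] args)
    {
        Node root = new Node(4,
                new Node(2, new Node(1, null, null), new Node(3, null, null)),
                new Node(6, new Node(5, null, null), new Node(7, null, null)));
        List<Integer> keys = new ArrayList<>();
        inOrder(root, keys);
        System.out.println(keys); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```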
## Binary Search Tree

```java
class Node<K, V>
{
    K key;            // key stored in the node
    V value;          // value stored in the node
    Node<K, V> left;  // left subtree
    Node<K, V> right; // right subtree
}
```

## Implementing a Dictionary with a BST

- Recall that common dictionary operations include
    + `search(key)`
    + `insert(key, value)`
    + `delete(key)`

## Searching in a BST

```java
V search(Node<K, V> node, K key)
{
    if (node == null)
        return null;
    int cmp = key.compareTo(node.key); // requires K extends Comparable<K>
    if (cmp == 0)
        return node.value;
    else if (cmp < 0)
        return search(node.left, key);
    else
        return search(node.right, key);
}
```

## Inserting/Deleting in a BST

- Similar to searching
- For inserting, we search for the location the node *should be* and then add a new node there
- For deleting, we search for the node that contains the key and then remove it

## Runtime Complexity of BST

- If we implement a dictionary using a BST, then what is the complexity of each of these operations?
- $O(h)$ where $h$ is the height of the tree
- If a tree has $n$ nodes, what is the height of the tree?
    + In the worst case, $O(n)$!

## Self-Balancing Trees

- It is possible to implement BSTs that are **self-balancing**, ensuring that $h = O(\log n)$
- Two approaches are [AVL trees](https://en.wikipedia.org/wiki/AVL_tree) and [Red-Black trees](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)

## Self-Balancing Trees

- With self-balancing trees, it is possible to implement a dictionary with the following complexity

| **Operation** | **BST**     |
|---------------|-------------|
| `search`      | $O(\log n)$ |
| `insert`      | $O(\log n)$ |
| `delete`      | $O(\log n)$ |

# Hashtables

## Dictionary Example

- Suppose I want to map phone numbers to names
    + `5152714599` $\rightarrow$ `"Titus Klinge"`
    + `5152712177` $\rightarrow$ `"Eric Manley"`
- Arrays have $O(1)$ random access, so what if we use a large array for the dictionary?
    + Make the length 10000
    + Use the last four digits as the "key"

## Dictionary Example

- Notice that we have:
    + `arr[2177]` $\rightarrow$ `"Eric Manley"`
    + `arr[4599]` $\rightarrow$ `"Titus Klinge"`
- The array behaves just like a map!
- **Pros**: $O(1)$ search/insertion/deletion
- **Cons**: Might waste a **LOT** of space!

## Hashing Functions

- Turning a phone number into an array index was relatively easy---but what about other objects?
- This is exactly what a **hash** function is!
    + In essence, a hash function turns a complicated object into an integer
- Using such a function, we can turn our dictionary keys into integers!

## Hashing Functions

- **Question**: What if the array is only 10000 elements long but the hashcode is larger than 9999?
    + We can use the modulo operator!
    + `hashIndex = hashCode % 10000`

## Example

- Suppose I have an array with 2000 elements
- The hashcode of my phone number is 4599
- The hash index is `4599 % 2000 = 599`
- `arr[599]` $\rightarrow$ `"Titus Klinge"`

## Self Check Questions

- **Question**: Will this approach use *all* of the space in the array?
    + Not necessarily---for example, if most phone numbers happen to be odd, a lot of space is wasted
- **Question**: What happens if someone else at Drake has the phone number 515-271-0599?
    + Its hashcode is also 0599, so it **collides** with mine!
    + We need some way to resolve collisions

## Properties of Good Hash Functions

1. A hash function needs to be **fast**!
2. If `x == y`, then `x` and `y` should have the same hash
3. Distribution should be as **random** as possible to spread out hashcodes

## Collision Resolution

- How do we handle it if two distinct items have the same hashcode?
- Two primary approaches:
    1. Open addressing
    2. Chaining

## Open Addressing

- Every slot in the array contains at most one item
- When a collision happens, we look for the next available open spot
    + Sometimes called *linear probing*

## Open Addressing Example

- Suppose we have an array of length 7 and we insert the following numbers in order into the table
    + 5, 13, 7, 28, 14
- How would searching work?
- How would removal work?
    + What if I remove 28 and then search for 14?
    + We need to insert "dummy" markers in removed slots

## Pros/Cons of Open Addressing

- **Pros**: Makes efficient use of memory and cache
- **Cons**: Lots of dummies accumulate if you add/remove a lot

## Chaining

- Another idea for handling collisions is **chaining**
- Idea: Each slot holds a **linked list** of items, all with the same hash index

## Efficiency

- Recall that we wanted better than $O(\log n)$ runtime
- For hashtables, performance depends on the number of collisions
- Things that help minimize collisions:
    1. Using a prime number for the array length
    2. Having a great hash function
- However, things still go bad if the table gets too full

## Efficiency

- The **load factor** of a hashtable is:
$$\lambda = \frac{\text{\# items}}{\text{size of array}}$$
- Keeping the load factor low is important
- Most hashtable implementations reallocate the array (similar to `ArrayList`) once $\lambda$ reaches $0.75$

## Efficiency

| **Operation** | **Hashtable (expected)** | **Hashtable (worst case)** |
|---------------|--------------------------|----------------------------|
| `search`      | $O(1 + \lambda)$         | $O(n)$                     |
| `insert`      | $O(1)$                   | $O(n)$                     |
| `delete`      | $O(1 + \lambda)$         | $O(n)$                     |
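The open addressing example above can be traced in code. This is a minimal linear-probing sketch (class name and layout are mine; no resizing, searching, or deletion), inserting 5, 13, 7, 28, 14 into a length-7 table:

```java
// Minimal linear-probing sketch (illustration only): insert keys into a
// fixed-size table, stepping to the next open slot on each collision.
class LinearProbingDemo
{
    public static void main(String[] args)
    {
        Integer[] table = new Integer[7];   // null means the slot is empty
        int[] keys = {5, 13, 7, 28, 14};
        for (int key : keys)
        {
            int index = key % table.length;       // hash index
            while (table[index] != null)          // collision: probe forward
                index = (index + 1) % table.length;
            table[index] = key;
        }
        // 5 -> slot 5, 13 -> slot 6, 7 -> slot 0,
        // 28 -> slot 0 taken, probes to slot 1,
        // 14 -> slots 0 and 1 taken, probes to slot 2
        for (int i = 0; i < table.length; i++)
            System.out.println(i + ": " + table[i]);
    }
}
```

Note that 28 and 14 both hash to slot 0 but end up elsewhere, which is exactly why removing 28 outright (instead of leaving a dummy) would make a later search for 14 stop too early.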