# Examples of Dynamic Programming

---

CS 137 // 2021-11-10

## Administrivia

- Daily exercises 9 and 10
    + Due before class today
    + Reflections due before next class
- Daily exercise 11 is due Monday

# Questions

## ...about anything?

# Daily Exercise Solutions

# Dynamic Programming

## Dynamic Programming

- Dynamic programming is a technique for improving the efficiency of certain recursive algorithms by caching previous results so they aren't recomputed
- Name is synonymous with:
    + "filling in a table over time"

## Fibonacci Numbers

- Recall that Fibonacci using direct recursion is:

```py
def fib(n):
    # Base cases: this deck uses the convention fib(0) = fib(1) = 1
    if n == 0 or n == 1:
        return 1
    else:
        # Each call branches into two more calls
        return fib(n-1) + fib(n-2)
```

- This makes an exponential number of calls!
- Since there are only $n+1$ **unique** calls (arguments $0$ through $n$), dynamic programming can improve the runtime significantly

## Fibonacci with Dynamic Programming

1. Create array $A$ with $n+1$ slots
2. Set $A[0] = 1$ and $A[1] = 1$
3. For $i = 2\ldots n$:
    1. $A[i] = A[i-1] + A[i-2]$
4. Return $A[n]$

## Word Segmentation

- **INPUT:** A string of letters $x\in\\{a,\ldots,z\\}^\*$ and a dictionary of words with $O(1)$ lookup time
- **OUTPUT:** `True` if $x$ is the concatenation of a sequence of legal words in the dictionary and `False` otherwise

---

- `"thequickbrownfox"` can be segmented into `["the", "quick", "brown", "fox"]`
- However, `"abcdefg"` has no segmentation

## Recursive WS Solution

- Define $WS(x)$ to be the following algorithm:

1. If $|x| = 0$, return `True`
2. Otherwise, for $j = 0\ldots |x|-1$ (where $x_{1\ldots 0}$ is the empty string):
    1. If $WS(x_{1\ldots j})$ and $x_{j+1\ldots |x|} \in \text{dictionary}$
        1. Return `True`
3. Return `False`

## WS Analysis

- Running time is exponential
- Only $n+1$ unique calls to $WS$, one per prefix of $x$ (where $n = |x|$)

## DP WS Solution

1. Create array VALID with $|x|+1$ slots and set VALID[0] = True
2. For $i = 1\ldots |x|$:
    1. VALID[i] = False
    2. For $j=0\ldots i-1$:
        1. If VALID[j] and $x_{j+1\ldots i} \in \text{dictionary}$, then set VALID[i] = True and break inner loop
3. Return VALID[$|x|$]

## DP WS Analysis

- Running time?
    + $O(n^2)$

# Solving WS with Graphs

# Edit Distance

## Edit Distance

- Approximate string matching is a common problem that appears in various disciplines
    + "Fuzzy searching"
    + Calculating similarity between genomes
- We want to find the minimal number of "edits" required to transform a string $x$ into a string $y$
- Requires defining what we mean by "edit" and what the "cost" of each edit is

## Types of Edits

1. **Substitution:** Replace a single character with another single character
    + e.g., `"shot"` to `"shop"`
2. **Insertion:** Add a new character into the string
    + e.g., `"lot"` to `"loft"`
3. **Deletion:** Remove a character from the string
    + e.g., `"dogs"` to `"dos"`

## Types of Edits

- Commonly these costs are uniform (i.e., 1), but it can be useful to keep them generic
    + Characters close to each other on the keyboard might have cheaper costs
- Let $c_\text{sub}(x,y)$ be the cost of replacing character $x$ with character $y$ in a string
    + Note that $c_\text{sub}(x,x)$ should always be zero
- Let $c_\text{ins}(x)$ and $c_\text{del}(x)$ be the costs for insertion and deletion of a single character $x$

## Towards a Solution

- Let $a$ and $b$ be strings and $|a|=n$ and $|b|=m$
    + Use $a_i$ for the $i$th character of $a$
- Let $d(i,j)$ be the minimum cost of editing the string $a_{1\ldots i}$ into $b_{1\ldots j}$ using the edit operations
- Note that $d(n,m)$ is the solution we're looking for

## Recursive Characterization

- $d(0,0) = 0$
    + Editing `""` to `""` costs nothing
- $d(i,0) = \sum_{k=1}^i c_\text{del}(a_k)$
    + Editing `"xyz"` to `""` is just the sum of the costs of deleting `x`, `y`, and `z` individually
- $d(0,j) = \sum_{k=1}^j c_\text{ins}(b_k)$
    + Editing `""` to `"xyz"` is just the sum of the costs of inserting `x`, `y`, and `z` individually

## Recursive Characterization

- If $i > 0$ and $j > 0$, then we have three possible cases (deletion, insertion, substitution):

$$d(i,j) = \min\begin{cases}
d(i-1,j) + c_\text{del}(a_i)\\\\
d(i,j-1) + c_\text{ins}(b_j)\\\\
d(i-1,j-1) + c_\text{sub}(a_i,b_j)
\end{cases}$$

## Recursive Solution

```py
# c_del, c_ins, and c_sub are the cost functions defined on the earlier slide.
# i and j are prefix lengths, so the 1-indexed characters a_i and b_j
# are a[i-1] and b[j-1] in Python's 0-indexed strings.
def edit_dist(a, b, i, j):
    if i == 0 and j == 0:
        return 0
    elif j == 0:
        # Delete all of a[0..i-1]
        return sum(c_del(a[k]) for k in range(i))
    elif i == 0:
        # Insert all of b[0..j-1]
        return sum(c_ins(b[k]) for k in range(j))
    else:
        d1 = edit_dist(a, b, i-1, j  ) + c_del(a[i-1])
        d2 = edit_dist(a, b, i  , j-1) + c_ins(b[j-1])
        d3 = edit_dist(a, b, i-1, j-1) + c_sub(a[i-1], b[j-1])
        return min(d1, d2, d3)
```

## Analysis of Solution

- Each call makes three recursive calls, but each of those only removes one or two characters
- Thus runtime is exponential!
- How many unique calls to `edit_dist`?
- Only $(n+1)\cdot (m+1)$, which is $O(nm)$
    + Once for each $i \in 0\ldots n$ and $j \in 0\ldots m$

## Dynamic Programming Solution for Edit Distance
1. Create an $(n+1)\times (m+1)$ array, $D$
2. $D[0,0] = 0$
3. For $i = 1\ldots n$ do:
    1. $D[i,0] = D[i-1,0] + c_\text{del}(a_i)$
4. For $j = 1\ldots m$ do:
    1. $D[0,j] = D[0,j-1] + c_\text{ins}(b_j)$
5. For $i = 1\ldots n$ do:
    1. For $j = 1\ldots m$ do:

        $$D[i,j] = \min\begin{cases}
        D[i-1,j] + c_\text{del}(a_i)\\\\
        D[i,j-1] + c_\text{ins}(b_j)\\\\
        D[i-1,j-1] + c_\text{sub}(a_i,b_j)
        \end{cases}$$
6. Return $D[n,m]$
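
The table-filling steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `edit_dist_dp` is hypothetical, and the three cost functions are assumed to be uniform (every edit costs 1, and substituting a character for itself costs 0). Swap in other cost functions as needed.

```python
# Assumed uniform costs; the slides leave these generic.
def c_del(ch):
    return 1

def c_ins(ch):
    return 1

def c_sub(x, y):
    return 0 if x == y else 1

def edit_dist_dp(a, b):
    """Fill the (n+1) x (m+1) table D, where D[i][j] is the minimum
    cost of editing the prefix a[:i] into the prefix b[:j]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]  # D[0][0] = 0

    # First column: delete every character of a[:i]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + c_del(a[i - 1])

    # First row: insert every character of b[:j]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + c_ins(b[j - 1])

    # Remaining cells: cheapest of deletion, insertion, substitution.
    # a_i and b_j (1-indexed) are a[i-1] and b[j-1] in Python.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + c_del(a[i - 1]),
                D[i][j - 1] + c_ins(b[j - 1]),
                D[i - 1][j - 1] + c_sub(a[i - 1], b[j - 1]),
            )

    return D[n][m]
```

Under these assumed uniform costs, each example from the "Types of Edits" slide is a single edit, so `edit_dist_dp("shot", "shop")`, `edit_dist_dp("lot", "loft")`, and `edit_dist_dp("dogs", "dos")` each return 1.
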
## Analysis of DP Solution

- Runtime of DP algorithm?
    + $O(nm)$, dominated by the nested for-loops

# Floyd Warshall

# Seam Carving