Assignment 2

4 minute read

Type: Individual Assignment
Due: before class on Wednesday, October 27, 2021

Prologue

The goal of this assignment is two-fold:

  1. To give you experience implementing algorithms in an actual programming language rather than just pseudocode, and
  2. Benchmark different algorithms for the same problem to compare their runtime in practice.

In particular, we will be looking at algorithms that solve the following problem (which should look familiar to you):

As input we are given two unsorted sets $S_1$ and $S_2$ (each of length $n$) and a number $x$. As output, we wish to return True if there exists a pair of elements $a\in S_1$, $b\in S_2$ such that $a + b = x$, and returns False otherwise.

List of Algorithms

Below is a list of algorithms, each of which solve the above problem using different approaches.

Brute Force

Given input $S_1$, $S_2$, and $x$ do the following:

  1. For each element $a\in S_1$:
    • For each element $b\in S_2$:
      • If $x = a + b$, then return True
  2. Return False

Given input $S_1$, $S_2$, and $x$ do the following:

  1. Sort the set $S_2$ in ascending order
  2. For each element $a\in S_1$:
    • Using binary search, check if $(x-a) \in S_2$. If so, return True
  3. Return False

With a Hashset

Given input $S_1$, $S_2$, and $x$ do the following:

  1. Build a hashset $H_2$ from the elements of $S_2$
  2. For each element $a\in S_1$:
    • Check if $(x-a) \in H_2$. If so, return True
  3. Return False

NOTE: A hashset is implemented as a hashtable. However, rather than having key/value pairs, a hashset simply stores keys. Although hashsets do not allow duplicates, it is possible for two different keys to have the same hash code, so it still is possible for collisions within the table to occur.

Sorting and Sweeping

Given input $S_1$, $S_2$, and $x$ do the following:

  1. Sort $S_1$ in ascending order
  2. Sort $S_2$ in ascending order
  3. Initialize variables $l := 0$ and $r := n-1$
  4. While $l < n$ and $r \ge 0$ do the following:
    • Let $y := S_1[l] + S_2[r]$
    • If $y = x$, then return True
    • Else if $y < x$, then let $l := l + 1$
    • Else, let $r := r-1$
  5. Return False

The idea behind this algorithm is that we sweep across one sorted list from left-to-right and the other sorted list from right-to-left. If there exists such a pair, then we will eventually find it in the middle.

Part 1: Implementation

For the first part of this assignment, you must implement all of the above algorithms in either Python or Java.

If you choose to use Python:

  • Use Python’s built-in list data type for $S_1$ and $S_2$
  • Use Python’s built-in set data type for your hashset
  • Use Python’s built-in sorted function to create sorted lists
  • Use Python’s built-in bisect module for binary search

If you choose to use Java:

  • Treat $S_1$ and $S_2$ as arrays
  • Use Java’s built-in HashSet class for your hashset
  • Use Java’s built-in Arrays class for binary search and sorting methods

Be sure to test your functions to make sure they work as expected!

Part 2: Benchmarking

After you have each of your functions implemented, it is time to benchmark them. For this assignment, we will benchmark the algorithms by running them on randomly generated inputs with varying sizes $n$ and creating a plot of how the running times of each algorithm change as a function of $n$. Since the above algorithms are either $O(n^2)$ or $O(n\log n)$, I suggest you do 100 experiments with $n$ ranging from 100 to 10000 in increments of 100. For each choice of $n$, you should take the average running time of at least 10 randomly generated inputs so as to minimize noise introduced by background processes and garbage collection.

If you are using Python, you can check how long a function call takes to run by using the timeit module. The timeit module attempts to time only the Python contributions to the elapsed time in order to get an accurate measurement.

from timeit import default_timer as timer
start_time = timer()
# <=== function you want to measure goes here
end_time = timer()
total_time = end_time - start_time
print("Ran in", total_time, "seconds")

If you are using Java, you can use System.nanoTime() to calculate elapsed time:

long start_time = System.nanoTime();
// <=== function you want to measure goes here
long end_time = System.nanoTime();
double total_time = (end_time - start_time) / 1000000;
System.out.println("Ran in " + total_time + "milliseconds");

From your benchmark tests, you should produce a table with a column for $n$ and four columns corresponding to the four algorithms above. From such a table, it is easy to plot the time-series for each algorithm to see how its performance changes as $n$ gets large.

Part 3: Analysis

In this part, your job is to write a short report comparing each of the algorithms using the data collected from Part 2. Your report should include the following for each algorithm:

  1. Graphs of the runtime of your experiments as a function of $n$
  2. A description of the algorithm’s asymptotic runtime complexity (what is its Big Oh?) and whether the graph supports your asymptotic analysis

Finally, your report should conclude with a discussion of which algorithm is better to use in practice. Are there situations where the highest performing algorithm would be inferior to one of the other algorithms? Why or why not?

What to Turn In

Ultimately you should submit the following to Gradescope:

  1. Your source code for both Part 1 and Part 2, ideally in the same file.
  2. A text file with the table generated from Part 2 with five columns and 100 data points from your experiments. (This should essentially be the output of your Part 2 code)
  3. Your report from Part 3, preferably as a PDF.