Overview
Runtime Minimization. One of the most important properties of a program is the time it takes to execute. One goal as a programmer is to minimize the time (in seconds) that a program takes to complete.
Runtime Measurement. Some natural techniques:
- Measure the number of seconds that a program takes to complete using a stopwatch, either physical or in software (see the sketch after this list). This tells you the actual runtime, but is dependent on the machine and inputs.
- Count the number of operations needed for inputs of a given size. This is a machine independent analysis, but still depends on the input, and also doesn't actually tell you how long the code takes to run.
- Derive an algebraic expression relating the number of operations to the size of an input. This tells you how the algorithm scales, but does not tell you how long the code takes to run.
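As an illustration of the stopwatch technique, here is a minimal Java sketch; the function `f` is a hypothetical stand-in for whatever code we want to time, not a function from this guide:

```java
public class TimingDemo {
    /** Hypothetical stand-in for the code we actually want to time. */
    static void f(int N) {
        long sum = 0;
        for (int i = 0; i < N; i += 1) {
            sum += i;
        }
    }

    public static void main(String[] args) {
        int N = 10000000;
        long start = System.nanoTime();            // start the software stopwatch
        f(N);
        long elapsed = System.nanoTime() - start;  // elapsed wall-clock nanoseconds
        System.out.println("seconds: " + elapsed / 1e9);
    }
}
```

The number printed depends on the machine, the load on it, and the input, which is exactly the limitation noted above.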
Algorithm Scaling. While we ultimately care about the runtime of an algorithm in seconds, we'll often say that one algorithm is better than another simply because of how it scales. By scaling, we mean how the runtime of a piece of code grows as a function of its input size. For example, inserting at the beginning of an ArrayList on an old computer might take $R(N) = 0.0001N$ seconds, where $N$ is the size of the list.
For example, if the runtimes of two algorithms are $R_1(N) = N^2$ and $R_2(N) = 5000 + N$, we'd say algorithm 2 is better, even though $R_1$ is much faster for small $N$.
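To see where the crossover happens, set the two expressions equal: $N^2 = 5000 + N$ has positive solution $N = \frac{1 + \sqrt{20001}}{2} \approx 71.2$, so algorithm 1 wins only for $N \leq 71$, while algorithm 2 wins for every $N \geq 72$.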
A rough justification for this argument is that performance critical situations are exactly those for which N is "large", though this is not an obvious fact. In almost all cases we'd prefer the linear algorithm. In some limited real-world situations like matrix multiplication, one might select one algorithm for small N, and another algorithm for large N. We won't do this in 61B.
Simplifying Algebraic Runtime. We utilize several simplifications to make runtime analysis more manageable.
- Pick a representative operation to be our cost model, e.g. the number of array accesses.
- Ignore lower order terms, e.g. treat $2N + 1$ just like $2N$.
- Ignore constant scaling factors, e.g. treat $2N$ just like $N$.
As an example, if we have an algorithm that performs $2N + 1$ increment operations and $4N^2 + 2N + 6$ compares, our intuitive simplifications will lead us to say that this algorithm has a runtime proportional to $N^2$.
The cost model is simply an operation that we're picking to represent the entire piece of code. Make sure to pick an appropriate cost model! If we had chosen the number of increment operations as our cost model, we'd mistakenly determine that the runtime was proportional to $N$. This is incorrect since for large N, the comparisons will vastly outnumber the increments.
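To make this concrete, here is a hypothetical Java sketch (not code from the text, and with different exact operation counts than the example above) where one operation dominates another:

```java
public class CostModelDemo {
    /** Counts how many indices i have their value repeated later in the array. */
    public static int countRepeatedStarts(int[] a) {
        int N = a.length;
        int count = 0;
        for (int i = 0; i < N; i += 1) {
            for (int j = i + 1; j < N; j += 1) {
                if (a[i] == a[j]) { // compare: runs about N^2 / 2 times in the worst case
                    count += 1;     // increment: at most once per i, so at most N times
                    break;
                }
            }
        }
        return count;
    }
}
```

Choosing the `count += 1` increments as the cost model would wrongly suggest a linear runtime; choosing the compares correctly captures the quadratic worst case.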
Big Theta. To formalize our intuitive simplifications, we introduce Big-Theta notation. We say that a function $R(N) \in \Theta(f(N))$ if there exist positive constants $k_1$ and $k_2$ such that $k_1 \cdot f(N) \leq R(N) \leq k_2 \cdot f(N)$ for all values of $N$ greater than some $N_0$.
Many authors write $R(N) = \Theta(f(N))$ instead of $R(N) \in \Theta(f(N))$. You may use either notation as you please. I will use them interchangeably.
An alternate non-standard definition is that $R(N) \in \Theta(f(N))$ iff $\lim_{N\to\infty} \frac{R(N)}{f(N)} = k$, where $k$ is some positive constant. We will not use this calculus-based definition in class. I haven't thought carefully about this alternate definition, so it might be slightly incorrect due to some calculus subtleties.
When using $\Theta$ to capture a function's asymptotic scaling, we avoid unnecessary terms in our $\Theta$ expression. For example, while $4N^2 + 3N + 6 \in \Theta(4N^2 + 3N)$, we will usually make the simpler claim that $4N^2 + 3N + 6 \in \Theta(N^2)$.
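For instance, we can exhibit explicit constants for that claim: for all $N \geq 1$, $4N^2 \leq 4N^2 + 3N + 6 \leq 4N^2 + 3N^2 + 6N^2 = 13N^2$, so $k_1 = 4$ and $k_2 = 13$ satisfy the definition, and indeed $4N^2 + 3N + 6 \in \Theta(N^2)$.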
Order of Growth. If a function $R(N) \in \Theta(f(N))$, we say that the order of growth is $f(N)$. For example, $4N^2 + 3N + 6 \in \Theta(N^2)$, so we say its order of growth is $N^2$.
The terms "constant", "linear", and "quadratic" are often used for algorithms with order of growth $1$, $N$, and $N^2$, respectively. For example, we might say that an algorithm with runtime $4N^2 + 3N + 6$ is quadratic.
Big O. Big O notation is similar to Big Theta. However, instead of bounding from below and above, big O only bounds from above.
In other words, $R(N) \in O(f(N))$ iff there exists a positive constant $k_2$ such that $R(N) \leq k_2 \cdot f(N)$ for all values of $N$ greater than some $N_0$.
In terms of limits, $R(N) \in O(f(N))$ if $\lim_{N\to\infty} \frac{R(N)}{f(N)} = k$, where $k$ is some finite nonnegative constant. The same caveats as for the limit definition of $\Theta$ above apply.
You can think of $\Theta$ sort of like $=$ for orders of growth, and big O sort of like $\leq$.
For example, the following facts are true:
- $N + N^2 \in O(N^{500})$, runtime is $\leq$ 500th power
- $N + N^2 \in O(N^2)$, runtime is $\leq$ quadratic
- $N + N^2 \in \Theta(N^2)$, runtime is $=$ quadratic
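As a quick check of the second claim: for all $N \geq 1$, $N + N^2 \leq N^2 + N^2 = 2N^2$, so $k_2 = 2$ works; and since $2N^2 \leq 2N^{500}$ for $N \geq 1$, the same constant verifies the first claim too.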
Best Case vs. Always vs. Worst Case. One particularly vexing topic is what happens when a piece of code has a runtime that depends on the nature of its input. For example, consider a `containsZero` function that checks whether an array contains any zeros. The runtime of this algorithm seems like it should just be $\Theta(N)$, meaning that the runtime should grow linearly with the array size.
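Here is a minimal sketch of such a function (the implementation is assumed, not given in the text); the key property is that it returns as soon as it finds a zero:

```java
public class ZeroFinder {
    /** Returns true if a contains at least one zero.
     *  Stops scanning as soon as the first zero is found. */
    public static boolean containsZero(int[] a) {
        for (int i = 0; i < a.length; i += 1) {
            if (a[i] == 0) {
                return true;  // a zero at the front means we do constant work
            }
        }
        return false;         // no zeros means we examine all N elements
    }
}
```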
However, this isn't quite correct. This code has order of growth equal to $N$ only in the worst case. Suppose that we test this algorithm by feeding it increasingly large arrays that are all zeros: the runtime function we measure will be completely flat, since the function will always find a zero at the front of the array. In the special case of an all-zeros array, the runtime is $\Theta(1)$.
Here, the problem isn't with $\Theta$ notation, but rather the fact that asking for the runtime of `containsZero` is an imprecise question.
It's somewhat like asking "If you have 1000 dollars in savings, how much will you have next year?". There is no answer. If it's invested in the stock market, it will grow (or decline) exponentially. If it is invested in the space under your mattress, it will be constant.
Big Theta in the Worst Case vs. Big O. In the real world, it is very common for people to use $O$ notation when they really mean to say "$\Theta$ in the worst case". For example, the `containsZero` function above is $\Theta(N)$ in the worst case, but most posts on the internet would say it is simply $O(N)$ (with no explicit mention of the phrase 'worst case').
Both statements are correct! However, "$\Theta(N)$ in the worst case" is a slightly stronger statement.
The difference is identical to the one between the English sentences "Every member of my family is less than or equal to 100 years old" and "The oldest member of my family is 100 years old." The latter is the stronger statement: any family for which the latter statement applies is also a family for which the former statement applies, but not necessarily the other way around.
This is not particularly important, but since we're students at arguably the best CS program in the country, it seems like a thing worth mentioning given the ubiquity of $O$ notation.
Please forgive your TAs if they are unfamiliar with this distinction. It is a subtle difference that often goes overlooked.
Recommended Problems
For more code analysis problems, see the lec19 guide instead.
B level
1. Suppose we have a function `bleepBlorp`, and its runtime $R(N)$ has order of growth $N^2$. Which of the following can we say?
- $R(N) \in \Theta(N^2)$. True, this is what order of growth means!
- $R(N) \in \Theta(N^2)$ for any inputs. True, this statement is exactly the same as the one above.
- $R(N) \in \Theta(N^2)$ for worst case inputs. True, since it is also true for ANY input.
- For large $N$, if we run `bleepBlorp` on an input of size $N$, and an input of size $10N$, we will have to wait roughly 100 times as long for the larger input. True, this is the nature of quadratics.
- If we run `bleepBlorp` on an input of size 1000, and an input of size 10000, we will have to wait roughly 100 times as long for the larger input. False, 1000 may not be a large enough $N$ to exhibit quadratic behavior.
2. Suppose we have a function called `binarySearch`, and that its runtime is $O(N^2)$. Which of the following can we say?
- For large $N$, if we run `binarySearch` on an input of size $N$, and an input of size $10N$, we will have to wait roughly 100 times as long for the larger input. False, big O just says we'll wait no longer than roughly 100 times as long.
- For large $N$, if we run `binarySearch` on a worst case input of size $N$, and a worst case input of size $10N$, we will have to wait roughly 100 times as long for the larger input. False, big O just says we'll wait no longer than roughly 100 times as long.
- For large $N$, if we run `binarySearch` on a worst case input of size $N$, and a worst case input of size $10N$, we won't have to wait longer than roughly 100 times as long for the larger input. True, big O says we'll wait no longer than roughly 100 times as long.
- The worst case runtime of `binarySearch` is $\Theta(N^2)$. False, it could be, but knowing that the runtime is $O(N^2)$ only says that the runtime grows no more quickly than $N^2$, not that it grows exactly at the rate $N^2$.
- The worst case runtime of `binarySearch` is not $\Theta(N^2)$. False, for the same reason as above.
A level
1. Suppose we find that an algorithm has a runtime $R(N)$ that is quadratic in the worst case and linear in the best case. Which of the following statements are true? The answer follows each bullet point.
- $R(N) \in O(N)$. False, the runtime may be worse.
- $R(N) \in O(N^2)$. True, never worse than quadratic.
- $R(N) \in O(N^3)$. True, never worse than cubic.
- $R(N) \in \Theta(N)$. False, the order of growth of the runtime varies.
- $R(N) \in \Theta(N^2)$. False, the order of growth of the runtime varies.
- $R(N) \in \Theta(N^3)$. False, the order of growth of the runtime varies.
- In the worst case, $R(N) \in O(N)$. False, quadratic is not less than or equal to linear.
- In the worst case, $R(N) \in O(N^2)$. True, quadratic is less than or equal to quadratic.
- In the worst case, $R(N) \in O(N^3)$. True, quadratic is less than or equal to cubic.
- In the worst case, $R(N) \in \Theta(N)$. False, quadratic is not equal to linear.
- In the worst case, $R(N) \in \Theta(N^2)$. True, quadratic is equal to quadratic.
- In the worst case, $R(N) \in \Theta(N^3)$. False, quadratic is not equal to cubic.
- In the best case, $R(N) \in O(N)$. True, linear is less than or equal to linear.
- In the best case, $R(N) \in O(N^2)$. True, linear is less than or equal to quadratic.
- In the best case, $R(N) \in O(N^3)$. True, linear is less than or equal to cubic.
- In the best case, $R(N) \in \Theta(N)$. True, linear is equal to linear.
- In the best case, $R(N) \in \Theta(N^2)$. False, linear is not equal to quadratic.
- In the best case, $R(N) \in \Theta(N^3)$. False, linear is not equal to cubic.