<!-- vim: set ts=2 sw=2 et tw=80: -->
# Complexity

A general way to describe the efficiency of algorithms (linear vs exponential),
independent of the computer architecture/speed.
## The RAM - random-access machine

The model of computer used in this course.

It has random-access memory.
### Basic types and basic operations

The RAM has basic types (like int, float, 64-bit words). A basic step is an
operation on a basic type (load, store, add, sub, ...). A branch is a basic
step. Invoking a function and returning from it are basic steps as well, but
the entire execution of the function takes longer.

Complexity is measured not by the input value but by the input size in bits.
`Fibonacci(n)` is linear in `n` (the value of the input) but exponential in
`l` (the number of bits of `n`, i.e. the size of the input).
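To see the difference concretely, here is a minimal iterative sketch (my own
illustration, not the lecture's code): it performs `n` additions, so it is
linear in the value `n` but exponential in the number of bits of the input.

```python
def fibonacci(n):
    # n iterations => linear in the value n, but exponential in
    # l = number of bits of n, since n can be as large as ~2^l.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```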
By default, WORST-CASE complexity is considered.
## Donald Knuth's A-notation

A(c) indicates a quantity that is absolutely at most c.

Antonio's weight = (pronounced "is") A(100)
## (big-) O-notation

f(n) = O(g(n))

*Definition:* if f(n) is such that f(n) = k * A(g(n)) for all _n_ sufficiently
large and for some constant k > 0, then we say that f(n) = O(g(n)).
# Complexity notations (lecture 2019-02-26)

## Characterizing unknown functions

pi(n) = number of primes less than n

## First approximation

*Upper bound:* linear function

pi(n) = O(n)

*Lower bound:* constant function

pi(n) = Omega(1)

*Non-trivial tight bound*:

pi(n) = theta(n/log n)
## Theta notation

Given a function g(n), we define the __family__ of functions theta(g(n)): f(n)
is in theta(g(n)) if there exist constants c_1, c_2 > 0 and an n_0 such that
for all n >= n_0, f(n) is sandwiched between c_1\*g(n) and c_2\*g(n).
## Big omega notation

Omega(g(n)) is the family of functions f(n) such that there exist a c > 0 and
an n_0 such that for all n >= n_0, f(n) dominates c\*g(n).
## Big "oh" notation

O(g(n)) is the family of functions f(n) such that there exist a c > 0 and an
n_0 such that for all n >= n_0, f(n) is dominated by c\*g(n).
## Small "oh" notation

o(g(n)) is the family of functions O(g(n)) excluding all the functions in
theta(g(n)).
## Small omega notation

omega(g(n)) is the family of functions Omega(g(n)) excluding all the functions
in theta(g(n)).
## Recap

- *asymptotically* =  <=> theta(g(n))
- *asymptotically* <  <=> o(g(n))
- *asymptotically* >  <=> omega(g(n))
- *asymptotically* <= <=> O(g(n))
- *asymptotically* >= <=> Omega(g(n))
# Insertion sort

## Complexity

- *Best case:* linear (theta(n))
- *Worst case:* number of swaps = 1 + 2 + ... + (n-1) = (n-1)n/2 = theta(n^2)
- *Average case:* number of swaps is half of the worst case = n(n-1)/4 =
  theta(n^2)
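The swap counts above can be read off a sketch of the algorithm (the usual
swap-based formulation; the notes give no code, so this is illustrative):

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for i in range(1, len(a)):
        j = i
        # Swap a[j] backwards until it reaches its place: in the worst
        # case (reverse-sorted input) this does 1 + 2 + ... + (n-1) swaps.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a
```

On an already-sorted input the inner loop never runs, which gives the linear
best case.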
## Correctness

Proof, sort of, by induction.

An algorithm is correct if, given an input, the output satisfies the stated
conditions. The algorithm must terminate.
### The loop invariant

An invariant condition that makes a loop equivalent to a straight path in an
execution graph.
# Heaps and Heapsort

A data structure is a way to structure data.
A binary heap is like an array and can be of two types: max-heap and min-heap.

## Interface of a heap

- `Build_max_heap(A)` rearranges `A` into a max-heap;
- `Heap_insert(H, key)` inserts `key` in the heap;
- `Heap_extract_max(H)` extracts the maximum `key`;
- `H.heap_size` returns the size of the heap.
A binary heap is like a binary tree mapped on an array:

```
       1
      / \
     /   \
    2     3
   / \   / \
  4   5 6   7

=> [1 2 3 4 5 6 7]
```
The parent position of `n` is the integer division of `n` by 2:

```python
def parent(x):
    return x // 2
```
The left child of `n` is at `n` times 2, and the right child at `n` times 2
plus 1:

```python
def left(x):
    return x * 2


def right(x):
    return x * 2 + 1
```

**Max-heap property**: for all i > 1, A[parent(i)] >= A[i]
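The max-heap property is maintained by the standard sift-down procedure, often
called `Max-heapify` (a sketch, not part of the interface above; it uses the
1-based `parent`/`left`/`right` arithmetic, so `A[0]` is an unused
placeholder):

```python
def max_heapify(A, i, heap_size):
    # Restore the max-heap property at position i, assuming the subtrees
    # rooted at left(i) and right(i) are already max-heaps (1-based indexing).
    l, r = i * 2, i * 2 + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        # Swap i with its largest child and keep sifting down.
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)
```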
# Some data structures

A way to organize information.

A data structure has an interface (functions to work with the DS).

A data structure has data and meta-data (like size, length).
## Stack (LIFO)

### Operations

- `push(S, x)` (put one element, move TOS)
- `pop(S)` (remove the element at TOS, move TOS)
- `stack-empty(S)` (returns TRUE if the stack is empty)
## Queue (FIFO)

### Structure

- Based on an array
- `length`
- `head`
- `tail`

The queue always keeps 1 cell free to avoid confusing a full queue with an
empty one.
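A minimal circular-buffer sketch of this layout (an assumed implementation,
the notes give no code) shows why the spare cell disambiguates full from
empty:

```python
class Queue:
    def __init__(self, length):
        self.q = [None] * length
        self.length = length
        self.head = 0  # position of the next element to dequeue
        self.tail = 0  # position of the next free slot

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # One cell is always kept free: without it, a full queue would
        # also have head == tail, indistinguishable from an empty one.
        return (self.tail + 1) % self.length == self.head

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("queue overflow")
        self.q[self.tail] = x
        self.tail = (self.tail + 1) % self.length

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue underflow")
        x = self.q[self.head]
        self.head = (self.head + 1) % self.length
        return x
```

A queue of `length` 3 therefore holds at most 2 elements.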
## Dictionary

Data structure for fast search.

One way to implement a dictionary is a *direct-access table*.

### API of dictionary

- `Insert(D, k)` inserts a key `k` into dictionary `D`
- `Delete(D, k)` removes key `k`
- `Search(D, k)` tells whether `D` contains a key `k`

There are many different implementations.
# Direct-access tables

- universe of keys = {1,2,...,M}
- array `T` of size M
- each key has its own position in T
## The 'dumb' approach

```python
def Insert(D, x):
    D[x] = True


def Delete(D, x):
    D[x] = False


def Search(D, x):
    return D[x]
```

Everything is O(1), but memory complexity is awful.

Solve this by mapping keys to smaller keys using a hash function.
Table T is many times smaller than the universe and differs only by a small
constant factor from the set S of keys actually stored.

Instead of using keys directly, the index returned by the hash function is
used.

**Pigeonhole problem:** there are more keys (pigeons) in the universe than
holes (indexes of table T).
## The solution: the *chained hash table*

For every cell in table T, don't store True/False but a linked list of the
keys hashing to that index. This is called a *chained hash table*.

```python
def Chained_hash_insert(T, k):
    return List_insert(T[hash(k)], k)


def Chained_hash_search(T, k):
    return List_search(T[hash(k)], k)
```

Elements are spread evenly across the table if the hash function is good.

*alpha* = *n / |T|*

is the average length of the linked lists inside the table (where n is the
number of elements in the table and *|T|* is the size of the table).

A good hash table implementation keeps *alpha* at O(1): if *n*, the number of
elements that we want to store in the hash table, grows, then *|T|* must also
grow. *alpha* represents the time complexity of both insertion and search.
## Growing a *chained hash table*

In order to grow a table, a new table must be created. The hash function (or
its range parameters) must be changed as well.

### Rehashing

*Rehashing* is the process of putting all the elements of the old table into
the new table according to the new hash function. The complexity is O(n),
since `Chained-hash-insert` is constant.

### Growing the table by a constant amount every time

If the table is grown by a constant amount every time it *overflows*, then the
complexity of n insertions is O(n^2) due to all the *rehashing* needed.

### Growing the table by doubling the size

If the table size is doubled when it *overflows*, then the complexity of n
insertions becomes linear again, sacrificing some memory.
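The difference can be checked by counting the elements moved during rehashing
(an illustrative sketch; `rehash_cost` and the growth policies are my own
names, not from the lecture):

```python
def rehash_cost(n, grow):
    # Count the total number of elements moved by rehashing while
    # inserting n keys into a table of initial size 1, where grow(size)
    # returns the new table size on overflow.
    size, count, moved = 1, 0, 0
    for _ in range(n):
        if count == size:   # overflow: rehash everything into a new table
            size = grow(size)
            moved += count  # each stored key is reinserted once
        count += 1
    return moved

doubling = rehash_cost(1000, lambda s: s * 2)    # O(n) total rehash work
constant = rehash_cost(1000, lambda s: s + 10)   # O(n^2) total rehash work
```

With doubling, the moved counts form a geometric series bounded by 2n; with a
constant increment they form an arithmetic series that is quadratic in n.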
# Binary search trees

Implementation of a dynamic set over a *totally ordered* universe (with an
order relation that has ...)
## Interface

- `Tree-Insert(T, k)` adds a key `k` to tree `T`;
- `Tree-Delete(T, k)` deletes the key `k` from tree `T`;
- `Tree-Search(T, k)` returns whether key `k` is in the tree `T`;
- `Tree-Minimum(T)` finds the smallest element in the tree;
- `Tree-Maximum(T)` finds the biggest element in the tree;
- `Tree-successor(T, k)` and `Tree-predecessor(T, k)` find the next and
  previous position in the tree.
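`Tree-Search` can be sketched as a walk down from the root (an assumed node
representation with `key`, `left`, `right` attributes; not the notes' code):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def tree_search(x, k):
    # Walk down from the root: go left if k is smaller than the current
    # key, right if it is bigger; stop when found or at a NULL leaf.
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x is not None
```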
## Height of a tree

The height of a tree is the maximum number of edges traversed from parent to
child in order to reach a leaf from the root of the tree.
## Rotation

```
          b
         / \
        a   \
       / \   \
      /   \   \
  k <= a  a <= k <= b  k >= b
```

```
      a
     / \
    /   b
   /   / \
  /   /   \
k <= a  a <= k <= b  k >= b
```
# Red-Black trees

1. Every node has a color: red or black
2. The root is black
3. Every NULL leaf node is black
4. If a node is red, both of its children are black
5. The black height is the same for every branch in the tree

A red node is a way to stretch the tree, but you cannot stretch it too far.
# B-trees

Disk access sucks (a billion times slower than registers). So, ignore the RAM
model and come up with a better solution.

If the number of keys per node is increased to k keys, the complexity of tree
search becomes log_{k+1}(n). We can afford to search among the keys of a node
linearly, because we have to wait for the disk anyway. And log_1000() is much
better than log_2().
## Implementation

- Minimum degree (min. num. of children in each node) defined as `t` (`t >= 2`)
- Every node other than the root must have at least `t - 1` keys
- Every node must contain at most `2t - 1` keys
- Every node has a boolean flag for leaf
### Insertion

- Read the node from disk
- If full, split it
- If it is a leaf, insert the key in the right position
- If it is not a leaf, recurse on the right branch based on the keys

### Search

- Read the node from disk
- Linearly search among the keys
- If the key is found, return True
- Otherwise, if the node is a leaf, return False
- Otherwise, recurse on the right branch
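The search steps above can be sketched as follows (assumed node layout: a
sorted list `keys`, `children` with one more entry than `keys` for internal
nodes, and a `leaf` flag; disk reads are implicit):

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # len(children) == len(keys) + 1
        self.leaf = not self.children


def btree_search(node, k):
    # Linearly search among the keys (we wait for the disk anyway).
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == k:
        return True                       # key found
    if node.leaf:
        return False                      # bottom reached, key absent
    return btree_search(node.children[i], k)  # recurse on the right branch
```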