## Learning Outcomes

At the end of this lecture, you’ll be able to:

- Explain and trace the core operations of the **Set ADT**.
- Implement the core operations of Set efficiently using array/linked structure.
- Analyze the time/space efficiency of alternative implementation approaches (e.g. array vs linked structure).
- Understand “move-to-front” and “transpose sequential search” *heuristics* well enough to implement them.
- Implement **Fail-Fast** iterators for the Set ADT.
- Explain the difference between the Set and the **OrderedSet ADT**.
- Implement the core operations of OrderedSet efficiently using array/linked structure.
- Describe set-theoretical operations *union*, *intersection*, and *set difference*.

## Lecture Plan

In this lecture, we'll cover the following lessons:

- Set ADT: The Interface
- Linked Implementation of Set⚡
- LinkedSet⚡
- Hacking the O(n) Find⚡
- Array Implementation of Set⚡
- ArraySet⚡
- Hacking the O(n) Find⚡
- Set Iterator⚡
- Fail-Fast Iterators⚡
- Ordered Set: The Interface
- Linked Implementation of OrderedSet⚡
- Array Implementation of OrderedSet⚡
- Set-theoretical Operations

Lessons marked with ⚡ contain exercise/activity.

## Set ADT: The Interface

A set is an *iterable* collection of **unique** elements. A set has no particular ordering of elements (neither by position nor by value).

```
/**
 * Sets of arbitrary values (not necessarily Comparable).
 * Iteration order is undefined.
 *
 * @param <T> Element type.
 */
public interface Set<T> extends Iterable<T> {
  /**
   * Insert a value.
   * Set doesn't change if we try to insert an existing value.
   * Post: has(t) == true.
   *
   * @param t Value to insert.
   */
  void insert(T t);

  /**
   * Remove a value.
   * Set doesn't change if we try to remove a non-existent value.
   * Post: has(t) == false.
   *
   * @param t Value to remove.
   */
  void remove(T t);

  /**
   * Test membership of a value.
   *
   * @param t Value to test.
   * @return True if t is in the set, false otherwise.
   */
  boolean has(T t);

  /**
   * Number of values.
   *
   * @return Number of values in the set, always greater than or equal to 0.
   */
  int size();
}
```

## Linked Implementation of Set

We want to *efficiently* implement the Set ADT with an underlying linked list. (Go for the simplest choice, a *singly linked list*, unless efficiency demands a more complex structure.)

Exercise Complete the following table.

Operation | How? | Runtime |
---|---|---|
`has` | | |
`insert` | | |
`remove` | | |
`size` | | |

## Solution

All operations, except for `size`, require a helper `find` method to check if an element exists. We cannot do better than Linear Search for `find`.

Operation | How? | Runtime |
---|---|---|
`has` | `return find(t) != null;` | $O(n)$ |
`insert` | `if (find(t) == null) prepend(t);` | $O(n)$ |
`remove` | `remove(find(t));` | $O(n)$ |
`size` | `return numElements;` | $O(1)$ |
`find` | Linear search | $O(n)$ |

We can use a doubly linked list so once the “node to be removed” is found, we can remove it in constant time (we need access to the previous node). Or we can have a `findPrevious` method to get hold of the node before the one “to be removed” in a singly linked list, in linear time, and then remove the “next” node (the target node) in constant time.
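The second option might look roughly like the sketch below. It is not the posted solution: the class name `SinglyLinkedSetSketch` and its private `Node` type are assumptions made for illustration, elements are assumed to be non-null, and the "previous node" walk is folded directly into `remove` rather than exposed as a separate `findPrevious` helper.

```java
// Hypothetical sketch: a singly linked set whose remove() unlinks the
// target in O(1) once the node *before* it has been located in O(n).
class SinglyLinkedSetSketch<T> {
  private static class Node<T> {
    T data;
    Node<T> next;

    Node(T data, Node<T> next) {
      this.data = data;
      this.next = next;
    }
  }

  private Node<T> head;
  private int numElements;

  // Linear search: returns the node holding t, or null if t is absent.
  private Node<T> find(T t) {
    for (Node<T> n = head; n != null; n = n.next) {
      if (n.data.equals(t)) {
        return n;
      }
    }
    return null;
  }

  public boolean has(T t) {
    return find(t) != null;
  }

  public void insert(T t) {
    if (find(t) == null) {
      head = new Node<>(t, head); // prepend in O(1)
      numElements++;
    }
  }

  public void remove(T t) {
    if (head == null) {
      return;
    }
    if (head.data.equals(t)) { // the first node has no predecessor
      head = head.next;
      numElements--;
      return;
    }
    // Walk with a reference to the node before the candidate.
    for (Node<T> prev = head; prev.next != null; prev = prev.next) {
      if (prev.next.data.equals(t)) {
        prev.next = prev.next.next; // unlink the target in O(1)
        numElements--;
        return;
      }
    }
  }

  public int size() {
    return numElements;
  }
}
```

The search dominates the cost, so `remove` stays $O(n)$ overall; the point is only that no second traversal is needed after the target is found.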

## LinkedSet

Exercise Open the starter code and complete the implementation of `LinkedSet`. (Do this at home!)

## Solution

Please check the posted solution.

## Hacking the O(n) Find

Consider the following implementation of `find`:

```
private Node<T> find(T t) {
  for (Node<T> n = head; n != null; n = n.next) {
    if (n.data.equals(t)) {
      return n;
    }
  }
  return null;
}
```

Exercise Update the implementation of `find` to employ the “move-to-front heuristic” as it is described in the “Dictionary of Algorithms and Data Structures”.

## Solution

Assuming there are helper methods `remove` and `prepend`:

```
private Node<T> find(T t) {
  for (Node<T> n = head; n != null; n = n.next) {
    if (n.data.equals(t)) {
      remove(n); // removes node n from this list
      prepend(n.data); // add to the front of this list
      return head; // assuming no sentinel node
    }
  }
  return null;
}
```

## Array Implementation of Set

We want to *efficiently* implement the Set ADT with an array as the underlying data structure.

Exercise Complete the following table.

Operation | How? | Runtime |
---|---|---|
`has` | | |
`insert` | | |
`remove` | | |
`size` | | |

## Solution

All operations, except for `size`, require a helper `find` method to check if an element exists. We could keep the underlying data in order so that `find` can perform Binary Search. However, we will explore that option when implementing an array-based OrderedSet ADT. For now, let’s keep the underlying data unordered and perform Linear Search in `find`.

Operation | How? | Runtime |
---|---|---|
`has` | `return find(t) != -1;` | $O(n)$ |
`insert` | `if (find(t) == -1) data[numElements++] = t;` | $O(n)$ |
`remove` | Find the element, swap with last, `numElements--` | $O(n)$ |
`size` | `return numElements;` | $O(1)$ |
`find` | Linear search | $O(n)$ |

Notice the strategy for `remove`, which allows us to spend constant time after the element is found.
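Here is a minimal sketch of the table above, not the posted `ArraySet` solution. The class name `ArraySetSketch`, the initial capacity of 4, and the doubling growth policy are assumptions made for illustration (the notes do not say how the array grows), and elements are assumed to be non-null.

```java
import java.util.Arrays;

// Hypothetical sketch of an unordered array-backed set.
class ArraySetSketch<T> {
  private T[] data;
  private int numElements;

  @SuppressWarnings("unchecked")
  ArraySetSketch() {
    data = (T[]) new Object[4]; // assumed initial capacity
  }

  // Linear search: returns the index of t, or -1 if t is absent.
  private int find(T t) {
    for (int i = 0; i < numElements; i++) {
      if (data[i].equals(t)) {
        return i;
      }
    }
    return -1;
  }

  public boolean has(T t) {
    return find(t) != -1;
  }

  public void insert(T t) {
    if (find(t) != -1) {
      return; // already present: the set does not change
    }
    if (numElements == data.length) {
      data = Arrays.copyOf(data, 2 * data.length); // assumed growth policy
    }
    data[numElements++] = t;
  }

  public void remove(T t) {
    int i = find(t);
    if (i == -1) {
      return; // not present: the set does not change
    }
    // Order does not matter in a Set, so overwrite the target with the
    // last element instead of shifting everything after it.
    data[i] = data[numElements - 1];
    data[numElements - 1] = null; // drop the stale reference
    numElements--;
  }

  public int size() {
    return numElements;
  }
}
```

Overwriting the target with the last element is only valid because the Set ADT promises no ordering; an ordered collection would have to shift elements instead.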

## ArraySet

Exercise Open the starter code and complete the implementation of `ArraySet`. (Do this at home!)

## Solution

Please check the posted solution.

## Hacking the O(n) Find

Recall that we can use a *heuristic* that moves the target of a search to the head of a list so it is found faster next time. This technique is called the “move-to-front heuristic”. It speeds up linear search in a linked list if the target item is likely to be searched for again soon.

Exercise Can we apply the *move-to-front* heuristic to speed up linear search in an array?

## Solution

Maybe! Moving the target of a search to the front of an array requires shifting all the other elements to the right. This is an additional linear time operation (in addition to the linear search).

Please watch the video lecture where some of the students brilliantly came up with ideas to keep this a constant time operation (at the cost of more complex implementations such as circular array).

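A more common strategy is the “transpose sequential search” heuristic: when the target is found, swap it with its immediate predecessor, so frequently searched items gradually migrate toward the front.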

**Note:** Think about why it would not be a good idea to implement the “move-to-front” heuristic in an array by swapping the target element with the front value, instead of actually moving it to the front.
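A minimal sketch of transpose sequential search on an array follows. The class name `TransposeSearch` and the choice to return the target's new index are assumptions made for illustration; elements in `data[0..numElements-1]` are assumed to be non-null.

```java
// Hypothetical sketch of the "transpose sequential search" heuristic.
final class TransposeSearch {
  private TransposeSearch() {}

  // Linear search over data[0..numElements-1]. If target is found, it is
  // swapped with its immediate predecessor (one step toward the front).
  // Returns the index where target ends up, or -1 if it is absent.
  static <T> int find(T[] data, int numElements, T target) {
    for (int i = 0; i < numElements; i++) {
      if (data[i].equals(target)) {
        if (i == 0) {
          return 0; // already at the front: nothing to swap
        }
        T tmp = data[i - 1];
        data[i - 1] = data[i];
        data[i] = tmp;
        return i - 1;
      }
    }
    return -1;
  }
}
```

Each successful search costs one extra constant-time swap, so nothing needs to be shifted. An item that is searched for once moves only one position, which keeps a single rare lookup from displacing the items at the front.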

## Resources

- Wikipedia’s entry on Techniques for rearranging nodes in Self-organizing list

## Set Iterator

We know a set is an *unordered* collection.

Exercise How should we implement the iterator for Set ADT?

A) Simply iterate in the same order the elements have been added.

B) To ensure this is an *unordered* collection, we must iterate over the elements in a random order.

C) Iterate over the elements from `head` to `tail` (from index `0` to `numElements - 1` in array).

## Solution

The correct answer is **C**. It is the cheapest strategy.

The statement that “a set is an *unordered* collection” means that a client should **not** expect the iteration to follow any particular order. We don’t need to go out of our way to ensure a disorderly iteration!

## Fail-Fast Iterators

Have you ever thought about what will happen if you *structurally* modify a data structure while you are iterating over it?

A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the data structure.

Assume you are iterating over an ArraySet. While the iteration is going on, an element you’ve already visited (iterated over) is removed. This could happen in a *concurrent* program, but here is a contrived example to showcase this scenario:

```
for (int num: myArraySet) {
  // do something with num
  if (feelingLucky()) {
    myArraySet.remove(num);
  }
}
```

Exercise Can you anticipate any issues with iteration?

## Solution

It depends on the implementation of the iterator and the `remove` method. In general, the results of the iteration are undefined under a structural modification.

If we assume the removal strategy is the one we discussed earlier, the last element of the array is moved into the slot of the removed element. If the iterator has already advanced past that slot, the iteration ends without ever visiting (or knowing about) the element that used to be last.
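The skipped element can be reproduced with a small simulation. This is a sketch under stated assumptions: it uses a plain `int[]` and an index cursor in place of a real `ArraySet` and its iterator, and the names `SkippedElementDemo` and `visitWhileRemoving` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: an index-based iteration over data[0..n-1]
// combined with the swap-with-last removal strategy.
final class SkippedElementDemo {
  private SkippedElementDemo() {}

  // Visits the elements in index order and removes `victim` right after
  // visiting it. Returns the values the iteration actually saw.
  static List<Integer> visitWhileRemoving(int[] data, int victim) {
    int numElements = data.length;
    List<Integer> visited = new ArrayList<>();
    int cursor = 0;
    while (cursor < numElements) { // what hasNext() would check
      int num = data[cursor++]; // what next() would do
      visited.add(num);
      if (num == victim) {
        // remove(num): overwrite the victim's slot with the last element.
        data[cursor - 1] = data[numElements - 1];
        numElements--;
      }
    }
    return visited;
  }
}
```

For the input `{10, 20, 30, 40}` with `victim = 20`, the cursor is already at index 2 when `40` is moved into index 1, so the loop visits `10`, `20`, and `30` and never sees `40`.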

In the Java Collections Framework, it is generally prohibited for one thread to modify a Collection while another thread is iterating over it.

When that happens, the iterator will throw `ConcurrentModificationException`.

Iterators that do this are known as *fail-fast* iterators, as they fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Exercise How can you make an iterator “fail-fast”?

**Hint:** To make an iterator “fail-fast” we need to be able to tell that the data structure has been modified since the iteration started.

## Solution

Here is one strategy: We can use a **version number** in the data structure class to achieve this.

- The number starts at `0` and is incremented whenever a structural modification is performed.
- Each iterator also “remembers” the version number it was created for.
- We can then check for modifications by comparing version numbers in the Iterator operations: If we notice a mismatch, we raise an exception.

We have implemented this feature in the `LinkedSet`. Make sure to carefully study it when you get the solution code. Then, try to implement it for `ArraySet`.
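To make the version-number strategy concrete, here is a minimal sketch on a deliberately simplified collection. It is neither the posted `LinkedSet` nor a full `ArraySet`: the class `FailFastArraySketch`, its fixed capacity, and its `add`/`removeLast` operations are assumptions made for illustration, and it does not enforce uniqueness.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a fixed-capacity array-backed collection whose
// iterator fails fast by comparing version numbers.
class FailFastArraySketch<T> implements Iterable<T> {
  private final T[] data;
  private int numElements;
  private int version; // starts at 0; bumped on every structural change

  @SuppressWarnings("unchecked")
  FailFastArraySketch(int capacity) {
    data = (T[]) new Object[capacity];
  }

  // Appends t; assumes there is room (no growth in this sketch).
  public void add(T t) {
    data[numElements++] = t;
    version++;
  }

  public void removeLast() {
    if (numElements == 0) {
      return; // nothing removed, so no structural change
    }
    data[--numElements] = null;
    version++;
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      private int cursor = 0;
      // The version this iterator was created for.
      private final int expectedVersion = version;

      private void checkVersion() {
        if (expectedVersion != version) {
          throw new ConcurrentModificationException();
        }
      }

      @Override
      public boolean hasNext() {
        checkVersion();
        return cursor < numElements;
      }

      @Override
      public T next() {
        checkVersion();
        if (cursor >= numElements) {
          throw new NoSuchElementException();
        }
        return data[cursor++];
      }
    };
  }
}
```

Checking the version in both `hasNext` and `next` is a design choice of this sketch; what matters is that every iterator operation that touches the data compares versions before doing so.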

## Ordered Set: The Interface

An ordered-set is an *iterable* collection of **ordered** *unique* elements. The elements are expected to be iterated over in order, __based on their values__.

```
/**
 * Ordered set of arbitrary values.
 * Iteration order is based on the values.
 *
 * @param <T> Element type.
 */
public interface OrderedSet<T extends Comparable<T>> extends Set<T> {
  // Same operations as the Set ADT
}
```

Notice we must use *bounded generics* to ensure the elements are comparable (otherwise we cannot put them in order).

## Linked Implementation of OrderedSet

We want to *efficiently* implement the OrderedSet ADT with an underlying linked list.

Exercise Complete the following table.

Operation | How? | Runtime |
---|---|---|
`has` | | |
`insert` | | |
`remove` | | |
`size` | | |

## Solution

All operations, except for `size`, require a helper `find` method to check if an element exists. We cannot do better than Linear Search for `find`. (Performing Binary Search on a linked list is futile as its cost is $O(n)$ while its implementation is more complex than Linear Search.)

Operation | How? | Runtime |
---|---|---|
`has` | `return find(t) != null;` | $O(n)$ |
`insert` | Find where to insert, then insert! | $O(n)$ |
`remove` | `remove(find(t));` | $O(n)$ |
`size` | `return numElements;` | $O(1)$ |
`find` | Linear search | $O(n)$ |

We can come up with a clever implementation in which the `find` method returns the node *before* the target (instead of the target node itself). This makes it easier for `insert` to use the same `find` operation as the other operations. We leave it as an (unsolved) exercise for you to implement the operations of OrderedSet using a linked list.
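One possible shape for that idea is sketched below; treat it as a starting point rather than a reference solution. The names `LinkedOrderedSetSketch`, `findPrevious`, `after`, `holds`, and `toList` are all assumptions made for illustration, and elements are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a sorted singly linked set built around a search
// that returns the node *before* the position where t belongs.
class LinkedOrderedSetSketch<T extends Comparable<T>> {
  private static class Node<T> {
    T data;
    Node<T> next;

    Node(T data, Node<T> next) {
      this.data = data;
      this.next = next;
    }
  }

  private Node<T> head;
  private int numElements;

  // Returns the last node whose value is strictly less than t, or null
  // if there is no such node (t belongs at the front).
  private Node<T> findPrevious(T t) {
    Node<T> prev = null;
    for (Node<T> n = head; n != null && n.data.compareTo(t) < 0; n = n.next) {
      prev = n;
    }
    return prev;
  }

  // The node right after prev, where a null prev means "before head".
  private Node<T> after(Node<T> prev) {
    return prev == null ? head : prev.next;
  }

  private boolean holds(Node<T> n, T t) {
    return n != null && n.data.compareTo(t) == 0;
  }

  public boolean has(T t) {
    return holds(after(findPrevious(t)), t);
  }

  public void insert(T t) {
    Node<T> prev = findPrevious(t);
    Node<T> candidate = after(prev);
    if (holds(candidate, t)) {
      return; // already present
    }
    Node<T> fresh = new Node<>(t, candidate);
    if (prev == null) {
      head = fresh;
    } else {
      prev.next = fresh;
    }
    numElements++;
  }

  public void remove(T t) {
    Node<T> prev = findPrevious(t);
    Node<T> candidate = after(prev);
    if (!holds(candidate, t)) {
      return; // not present
    }
    if (prev == null) {
      head = candidate.next;
    } else {
      prev.next = candidate.next;
    }
    numElements--;
  }

  public int size() {
    return numElements;
  }

  // Values in ascending order, for inspection only.
  public List<T> toList() {
    List<T> out = new ArrayList<>();
    for (Node<T> n = head; n != null; n = n.next) {
      out.add(n.data);
    }
    return out;
  }
}
```

Because the list is sorted, the search can stop at the first value that is not smaller than `t`, though the worst case is still $O(n)$. A sentinel head node would remove the `prev == null` special cases at the cost of one extra node.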

## Array Implementation of OrderedSet

We want to *efficiently* implement the OrderedSet ADT with an array as the underlying data structure.

Exercise Complete the following table.

Operation | How? | Runtime |
---|---|---|
`has` | | |
`insert` | | |
`remove` | | |
`size` | | |

## Solution

All operations, except for `size`, require a helper `find` method to check if an element exists. We can perform Binary Search so `find` and `has` will cost $O(\lg n)$. The `insert` and `remove` operations will remain linear time since we must shift the elements around to keep the values in order.

Operation | How? | Runtime |
---|---|---|
`has` | `return find(t) != -1;` | $O(\lg n)$ |
`insert` | Find where to insert, then shift elements to make room. | $O(n)$ |
`remove` | Find the element, shift all elements after it to the left. | $O(n)$ |
`size` | `return numElements;` | $O(1)$ |
`find` | Binary search | $O(\lg n)$ |

We leave this as an (unsolved) exercise to you: implement the operations of an array-based OrderedSet.
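If you want something to compare your attempt against, here is one possible sketch. The class name `ArrayOrderedSetSketch`, the initial capacity, the doubling growth policy, and the private `search` helper (which encodes the insertion point as a negative number, the same convention `java.util.Arrays.binarySearch` uses) are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of a sorted array-backed set.
class ArrayOrderedSetSketch<T extends Comparable<T>> {
  private T[] data;
  private int numElements;

  @SuppressWarnings("unchecked")
  ArrayOrderedSetSketch() {
    // T erases to Comparable, so the backing array must be a Comparable[].
    data = (T[]) new Comparable[4]; // assumed initial capacity
  }

  // Binary search over data[0..numElements-1].
  // Returns the index of t if present; otherwise -(insertionPoint) - 1.
  private int search(T t) {
    int lo = 0;
    int hi = numElements - 1;
    while (lo <= hi) {
      int mid = lo + (hi - lo) / 2;
      int cmp = data[mid].compareTo(t);
      if (cmp == 0) {
        return mid;
      } else if (cmp < 0) {
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return -lo - 1;
  }

  // Index of t, or -1 if absent (matches the table above).
  private int find(T t) {
    int i = search(t);
    return i >= 0 ? i : -1;
  }

  public boolean has(T t) {
    return find(t) != -1;
  }

  public void insert(T t) {
    int i = search(t);
    if (i >= 0) {
      return; // already present
    }
    int at = -i - 1; // where t belongs
    if (numElements == data.length) {
      data = Arrays.copyOf(data, 2 * data.length); // assumed growth policy
    }
    // Shift data[at..numElements-1] one slot to the right.
    System.arraycopy(data, at, data, at + 1, numElements - at);
    data[at] = t;
    numElements++;
  }

  public void remove(T t) {
    int i = find(t);
    if (i == -1) {
      return; // not present
    }
    // Shift data[i+1..numElements-1] one slot to the left.
    System.arraycopy(data, i + 1, data, i, numElements - i - 1);
    data[--numElements] = null; // drop the stale reference
  }

  public int size() {
    return numElements;
  }

  @Override
  public String toString() {
    return Arrays.toString(Arrays.copyOf(data, numElements));
  }
}
```

Returning the insertion point from the search lets `insert` reuse the same $O(\lg n)$ lookup instead of scanning for the right slot; the shift that follows is what keeps `insert` and `remove` at $O(n)$.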

## Set-theoretical Operations

In Mathematics, two sets can be “added” together. The **union** of $A$ and $B$, denoted by $A \cup B$, is the set of all elements that are members of either $A$ or $B$.

We can include this operation in the Set ADT:

```
/**
* Constructing a new set with elements that are in this set or
* in the other set.
*
* @param other set.
* @return all elements that are in this set or the other set.
*/
Set<T> union(Set<T> other);
```

The **intersection** of $A$ and $B$, denoted by $A \cap B$, is the set of all elements that are members of both $A$ and $B$.

```
/**
* Constructing a new set with elements that are in this set and
* in the other set.
*
* @param other set.
* @return the elements this set and other set have in common.
*/
Set<T> intersect(Set<T> other);
```

The **set difference** of $A$ and $B$, denoted by $A - B$, is the set of all elements that are in $A$ but not in $B$.

```
/**
* Constructing a new set with elements that are in this set but not
* in the other set.
*
* @param other set.
* @return the elements in this set but not in the other set.
*/
Set<T> subtract(Set<T> other);
```

These operations can be defined for OrderedSet ADT as well:

```
OrderedSet<T> union(OrderedSet<T> other);
OrderedSet<T> intersect(OrderedSet<T> other);
OrderedSet<T> subtract(OrderedSet<T> other);
```

We leave it to you as a *challenge* exercise to implement these operations *efficiently*.
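As a hint for the ordered case, here is a sketch of the merge idea on sorted data. It does not implement the `Set` or `OrderedSet` interfaces: the class `SortedSetOps` is hypothetical, and its methods assume each input is a sorted, duplicate-free, random-access `List` (such as an `ArrayList`) standing in for the contents of an ordered set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merge-style set operations over sorted,
// duplicate-free lists. Each runs in O(n + m) for random-access lists.
final class SortedSetOps {
  private SortedSetOps() {}

  static <T extends Comparable<T>> List<T> union(List<T> a, List<T> b) {
    List<T> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < a.size() && j < b.size()) {
      int cmp = a.get(i).compareTo(b.get(j));
      if (cmp < 0) {
        out.add(a.get(i++));
      } else if (cmp > 0) {
        out.add(b.get(j++));
      } else { // in both: keep one copy
        out.add(a.get(i));
        i++;
        j++;
      }
    }
    while (i < a.size()) {
      out.add(a.get(i++));
    }
    while (j < b.size()) {
      out.add(b.get(j++));
    }
    return out;
  }

  static <T extends Comparable<T>> List<T> intersect(List<T> a, List<T> b) {
    List<T> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < a.size() && j < b.size()) {
      int cmp = a.get(i).compareTo(b.get(j));
      if (cmp < 0) {
        i++;
      } else if (cmp > 0) {
        j++;
      } else { // in both: keep it
        out.add(a.get(i));
        i++;
        j++;
      }
    }
    return out;
  }

  static <T extends Comparable<T>> List<T> subtract(List<T> a, List<T> b) {
    List<T> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < a.size()) {
      if (j == b.size()) { // b is exhausted: the rest of a survives
        out.add(a.get(i++));
        continue;
      }
      int cmp = a.get(i).compareTo(b.get(j));
      if (cmp < 0) {
        out.add(a.get(i++)); // a's value cannot appear later in b
      } else if (cmp > 0) {
        j++;
      } else { // in both: drop it
        i++;
        j++;
      }
    }
    return out;
  }
}
```

For unordered sets there is no sorted order to exploit, so the straightforward approach of calling `has` on the other set for every element costs $O(n \cdot m)$ with the linear-search implementations above.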