Learning Outcomes
At the end of this lecture, you’ll be able to:
- Explain and trace the core operations of Set ADT.
- Implement the core operations of Set efficiently using array/linked structure.
- Analyze the time/space efficiency of alternative implementation approaches (e.g., array vs. linked structure).
- Understand “move-to-front” and “transpose sequential search” heuristics well enough to implement them.
- Implement Fail-Fast iterators for the Set ADT.
- Explain the difference between the Set and the OrderedSet ADT.
- Implement the core operations of OrderedSet efficiently using array/linked structure.
- Describe set-theoretical operations union, intersection, and set difference.
Lecture Plan
In this lecture, we'll cover the following lessons:
- Set ADT: The Interface
- Linked Implementation of Set⚡
- LinkedSet⚡
- Hacking the O(n) Find⚡
- Array Implementation of Set⚡
- ArraySet⚡
- Hacking the O(n) Find⚡
- Set Iterator⚡
- Fail-Fast Iterators⚡
- Ordered Set: The Interface
- Linked Implementation of OrderedSet⚡
- Array Implementation of OrderedSet⚡
- Set-theoretical Operations
Lessons marked with ⚡ contain exercise/activity.
Set ADT: The Interface ↗
A set is an iterable collection of unique elements. A set has no particular ordering of elements (neither by position nor by value).
/**
* Sets of arbitrary values (not necessarily Comparable).
* Iteration order is undefined.
*
* @param <T> Element type.
*/
public interface Set<T> extends Iterable<T> {
/**
* Insert a value.
* Set doesn't change if we try to insert an existing value.
* Post: has(t) == true.
*
* @param t Value to insert.
*/
void insert(T t);
/**
* Remove a value.
* Set doesn't change if we try to remove a non-existent value.
* Post: has(t) == false.
*
* @param t Value to remove.
*/
void remove(T t);
/**
* Test membership of a value.
*
* @param t Value to test.
* @return True if t is in the set, false otherwise.
*/
boolean has(T t);
/**
* Number of values.
*
* @return Number of values in the set, always greater than or equal to 0.
*/
int size();
}
Linked Implementation of Set ↗
We want to efficiently implement the Set ADT with an underlying linked list. (Go for the simplest choice, a singly linked list, unless efficiency demands a more complex structure.)
Exercise Complete the following table.
Operation | How? | Runtime |
---|---|---|
has | ||
insert | ||
remove | ||
size | ||
Solution
All operations, except for size, require a helper find method to check if an element exists. We cannot do better than Linear Search for find.
Operation | How? | Runtime |
---|---|---|
has | return find(t) != null; | $O(n)$ |
insert | if (find(t) == null) prepend(t); | $O(n)$ |
remove | remove(find(t)); | $O(n)$ |
size | return numElements; | $O(1)$ |
find | Linear search | $O(n)$ |
We can use a doubly linked list so that once the “node to be removed” is found, we can remove it in constant time (we need access to the previous node). Alternatively, in a singly linked list, we can have a findPrevious method that gets hold of the node before the one “to be removed” in linear time, and then remove the “next” node (the target node) in constant time.
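To make the findPrevious idea concrete, here is a minimal sketch of removal from a singly linked list. The class name SinglyLinkedSketch and its layout are assumptions made for illustration; this is not the posted LinkedSet solution, and it omits the iterator.

```java
// Hypothetical minimal singly linked set; names are illustrative only.
class SinglyLinkedSketch<T> {
  private static class Node<T> {
    T data;
    Node<T> next;

    Node(T data, Node<T> next) {
      this.data = data;
      this.next = next;
    }
  }

  private Node<T> head;
  private int numElements;

  boolean has(T t) {
    for (Node<T> n = head; n != null; n = n.next) {
      if (n.data.equals(t)) {
        return true;
      }
    }
    return false;
  }

  // Prepend only if the value is not already present (O(n) for the search).
  void insert(T t) {
    if (!has(t)) {
      head = new Node<>(t, head);
      numElements++;
    }
  }

  // Returns the node just before the one holding t, or null if t is
  // absent or is stored in the head node (which has no predecessor).
  private Node<T> findPrevious(T t) {
    for (Node<T> n = head; n != null && n.next != null; n = n.next) {
      if (n.next.data.equals(t)) {
        return n;
      }
    }
    return null;
  }

  void remove(T t) {
    if (head == null) {
      return;
    }
    if (head.data.equals(t)) { // the head is a special case
      head = head.next;
      numElements--;
      return;
    }
    Node<T> prev = findPrevious(t); // linear time
    if (prev != null) {
      prev.next = prev.next.next;   // constant-time unlink
      numElements--;
    }
  }

  int size() {
    return numElements;
  }
}
```

The head needs special handling here because it has no previous node; a sentinel head node is one way to remove that special case.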
LinkedSet ↗
Exercise Open the starter code and complete the implementation of LinkedSet. (Do this at home!)
Solution
Please check the posted solution.
Hacking the O(n) Find ↗
Consider the following implementation of find:
private Node<T> find(T t) {
for (Node<T> n = head; n != null; n = n.next) {
if (n.data.equals(t)) {
return n;
}
}
return null;
}
Exercise Update the implementation of find to employ the “move-to-front heuristic” as it is described in the “Dictionary of Algorithms and Data Structures”.
Solution
Assuming there are helper methods remove and prepend:
private Node<T> find(T t) {
for (Node<T> n = head; n != null; n = n.next) {
if (n.data.equals(t)) {
remove(n); // removes node n from this list
prepend(n.data); // add to the front of this list
return head; // assuming no sentinel node
}
}
return null;
}
Array Implementation of Set ↗
We want to efficiently implement the Set ADT with an array as the underlying data structure.
Exercise Complete the following table.
Operation | How? | Runtime |
---|---|---|
has | ||
insert | ||
remove | ||
size | ||
Solution
All operations, except for size, require a helper find method to check if an element exists. We could keep the underlying data sorted so that find can perform Binary Search. However, we will explore that option when implementing an array-based OrderedSet ADT. For now, let’s keep the underlying data unordered and perform Linear Search in find.
Operation | How? | Runtime |
---|---|---|
has | return find(t) != -1; | $O(n)$ |
insert | if (find(t) == -1) data[numElements++] = t; | $O(n)$ |
remove | Find the element, swap with last, numElements-- | $O(n)$ |
size | return numElements; | $O(1)$ |
find | Linear search | $O(n)$ |
Notice the strategy for remove, which allows us to spend only constant time after the element is found.
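The “swap with last” removal can be sketched as follows. This is an illustration under simplifying assumptions (a hypothetical class ArrayRemoveSketch, String elements, a fixed capacity with no resizing); it is not the posted ArraySet solution.

```java
// Hypothetical fixed-capacity, array-backed set of strings (no resizing, for brevity).
class ArrayRemoveSketch {
  private final String[] data = new String[8];
  private int numElements = 0;

  // Linear search: index of t, or -1 if t is absent.
  private int find(String t) {
    for (int i = 0; i < numElements; i++) {
      if (data[i].equals(t)) {
        return i;
      }
    }
    return -1;
  }

  boolean has(String t) {
    return find(t) != -1;
  }

  void insert(String t) {
    if (find(t) == -1) {
      if (numElements == data.length) {
        throw new IllegalStateException("sketch has a fixed capacity");
      }
      data[numElements++] = t;
    }
  }

  void remove(String t) {
    int i = find(t);                 // O(n)
    if (i == -1) {
      return;
    }
    data[i] = data[numElements - 1]; // overwrite the target with the last element
    data[numElements - 1] = null;    // clear the vacated slot
    numElements--;                   // everything after find is O(1)
  }

  int size() {
    return numElements;
  }
}
```

Because a set promises no ordering, moving the last element into the hole is allowed; an ordered collection would have to shift elements instead.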
ArraySet ↗
Exercise Open the starter code and complete the implementation of ArraySet. (Do this at home!)
Solution
Please check the posted solution.
Hacking the O(n) Find ↗
Recall that we can use a heuristic that moves the target of a search to the head of a list so it is found faster next time. This technique is called the “move-to-front heuristic”. It speeds up linear search in a linked list if the target item is likely to be searched for again soon.
Exercise Can we apply the move-to-front heuristic to speed up linear search in an array?
Solution
Maybe! Moving the target of a search to the front of an array requires shifting all the other elements to the right. This is an additional linear time operation (in addition to the linear search).
Please watch the video lecture where some of the students brilliantly came up with ideas to keep this a constant time operation (at the cost of more complex implementations such as circular array).
A more common strategy for arrays is the “transpose sequential search” heuristic.
Note: think about why it would not be a good idea to implement the “move-to-front” heuristic in an array by swapping the target element with the front value instead of “moving” it to the front.
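As a rough sketch of the transpose sequential search heuristic: on a successful search, the target is swapped with the element immediately before it, so frequently searched values drift toward the front one position at a time. The class name TransposeSearchSketch, the static method signature, and the use of an int array are assumptions made to keep the example short.

```java
// Hypothetical helper illustrating transpose sequential search on an int array.
class TransposeSearchSketch {
  // Searches data[0..numElements) for target. On a hit, swaps the target
  // with its predecessor (if any) and returns the target's new index.
  // Returns -1 if the target is not found.
  static int find(int[] data, int numElements, int target) {
    for (int i = 0; i < numElements; i++) {
      if (data[i] == target) {
        if (i == 0) {
          return 0; // already at the front, nothing to transpose
        }
        int tmp = data[i - 1];
        data[i - 1] = data[i];
        data[i] = tmp;
        return i - 1;
      }
    }
    return -1;
  }
}
```

The swap costs constant time, so the search stays $O(n)$ with no extra linear-time shifting.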
Resources
- Wikipedia’s entry on Techniques for rearranging nodes in Self-organizing list
Set Iterator ↗
We know a set is an unordered collection.
Exercise How should we implement the iterator for Set ADT?
A) Simply iterate in the same order the elements have been added.
B) To ensure this is an unordered collection, we must iterate over the elements in a random order.
C) Iterate over the elements from head to tail (from index 0 to numElements - 1 in an array).
Solution
The correct answer is C. It is the cheapest strategy.
The statement that “a set is an unordered collection” means that a client should not expect the iteration to follow any particular order. We don’t need to go out of our way to ensure a disorderly iteration!
Fail-Fast Iterators ↗
Have you ever thought about what will happen if you structurally modify a data structure while you are iterating over it?
A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the data structure.
Assume you are iterating over an ArraySet. While the iteration is going on, an element you’ve already visited (iterated over) is removed. This could happen in a concurrent program, but here is a contrived example to showcase this scenario:
for (int num: myArraySet) {
// do something with num
if (feelingLucky()) {
myArraySet.remove(num);
}
}
Exercise Can you anticipate any issues with iteration?
Solution
It depends on the implementation of the iterator and the remove method. In general, the results of the iteration are undefined under a structural modification.
If we assume the removal strategy is the one we discussed earlier, then the last element of the array is swapped into the slot of the removed element. Since that slot has already been visited, the iteration ends without ever visiting (or knowing about) the element that was last before the removal.
In the Java Collections Framework, it is generally prohibited for one thread to modify a Collection while another thread is iterating over it.
When that happens, the iterator will throw ConcurrentModificationException.
Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Exercise How can you make an iterator “fail-fast”?
Hint: To make an iterator “fail-fast” we need to be able to tell that the data structure has been modified since the iteration started.
Solution
Here is one strategy: We can use a version number in the data structure class to achieve this.
- The number starts at 0 and is incremented whenever a structural modification is performed.
- Each iterator also “remembers” the version number it was created for.
- We can then check for modifications by comparing version numbers in the Iterator operations: If we notice a mismatch, we raise an exception.
We have implemented this feature in the LinkedSet. Make sure to study it carefully when you get the solution code. Then, try to implement it for ArraySet.
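To illustrate the version-number strategy, here is a minimal sketch on a deliberately simplified collection. The class VersionedCollectionSketch, its add and removeLast methods, and its fixed capacity are all assumptions for illustration; it is neither the posted LinkedSet code nor a complete ArraySet.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical array-backed collection of ints with a fail-fast iterator.
class VersionedCollectionSketch implements Iterable<Integer> {
  private final int[] data = new int[16]; // fixed capacity, for brevity
  private int numElements = 0;
  private int version = 0; // incremented on every structural modification

  void add(int value) {
    if (numElements == data.length) {
      throw new IllegalStateException("sketch has a fixed capacity");
    }
    data[numElements++] = value;
    version++;
  }

  void removeLast() {
    if (numElements > 0) {
      numElements--;
      version++;
    }
  }

  @Override
  public Iterator<Integer> iterator() {
    return new Iterator<Integer>() {
      // The version this iterator was created for.
      private final int expectedVersion = version;
      private int cursor = 0;

      private void checkVersion() {
        if (expectedVersion != version) {
          throw new ConcurrentModificationException();
        }
      }

      @Override
      public boolean hasNext() {
        checkVersion();
        return cursor < numElements;
      }

      @Override
      public Integer next() {
        checkVersion();
        if (cursor >= numElements) {
          throw new NoSuchElementException();
        }
        return data[cursor++];
      }
    };
  }
}
```

In this sketch both hasNext and next compare versions, so an iterator created before a modification fails on its very next use.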
Ordered Set: The Interface ↗
An ordered-set is an iterable collection of ordered unique elements. The elements are expected to be iterated over in order, based on their values.
/**
* Ordered set of arbitrary values.
* Iteration order is based on the values.
*
* @param <T> Element type.
*/
public interface OrderedSet<T extends Comparable<T>> extends Set<T> {
// Same operations as the Set ADT
}
Notice we must use bounded generics to ensure the elements are comparable (otherwise we cannot put them in order).
Linked Implementation of OrderedSet ↗
We want to efficiently implement the OrderedSet ADT with an underlying linked list.
Exercise Complete the following table.
Operation | How? | Runtime |
---|---|---|
has | ||
insert | ||
remove | ||
size | ||
Solution
All operations, except for size, require a helper find method to check if an element exists. We cannot do better than Linear Search for find. (Performing Binary Search on a linked list is futile as its cost is $O(n)$ while its implementation is more complex than Linear Search.)
Operation | How? | Runtime |
---|---|---|
has | return find(t) != null; | $O(n)$ |
insert | Find where to insert, then insert! | $O(n)$ |
remove | remove(find(t)); | $O(n)$ |
size | return numElements; | $O(1)$ |
find | Linear search | $O(n)$ |
We can come up with a clever implementation in which the find method returns the node before the target (instead of the target node itself). This makes it easier for the insert method to use the same find operation as the other operations. We leave it to you as an (unsolved) exercise to implement the operations of OrderedSet using a linked list.
Array Implementation of OrderedSet ↗
We want to efficiently implement the OrderedSet ADT with an array as the underlying data structure.
Exercise Complete the following table.
Operation | How? | Runtime |
---|---|---|
has | ||
insert | ||
remove | ||
size | ||
Solution
All operations, except for size, require a helper find method to check if an element exists. We can perform Binary Search so find and has will cost $O(\lg n)$. The insert and remove operations will remain linear time since we must shift the elements around to keep the values in order.
Operation | How? | Runtime |
---|---|---|
has | return find(t) != -1; | $O(\lg n)$ |
insert | Find where to insert, then shift elements to make room. | $O(n)$ |
remove | Find the element, shift all elements after it to the left. | $O(n)$ |
size | return numElements; | $O(1)$ |
find | Binary search | $O(\lg n)$ |
We leave this as an (unsolved) exercise to you: implement the operations of an array-based OrderedSet.
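As a starting point, the Binary Search used by find could look like the sketch below. It assumes the first numElements slots of the array are sorted in ascending order; the class name SortedArraySearchSketch and the static signature are illustrative assumptions, and insert and remove (with their shifting) are still left to you.

```java
// Hypothetical helper: binary search over the sorted prefix of an array.
class SortedArraySearchSketch {
  // Returns the index of target in data[0..numElements), or -1 if absent.
  // Assumes data[0..numElements) is sorted in ascending order.
  static <T extends Comparable<T>> int find(T[] data, int numElements, T target) {
    int lo = 0;
    int hi = numElements - 1;
    while (lo <= hi) {
      int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
      int cmp = target.compareTo(data[mid]);
      if (cmp == 0) {
        return mid;
      } else if (cmp < 0) {
        hi = mid - 1; // target can only be in the left half
      } else {
        lo = mid + 1; // target can only be in the right half
      }
    }
    return -1;
  }
}
```

Each iteration halves the search range, which is where the $O(\lg n)$ cost of find and has comes from.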
Set-theoretical Operations ↗
In Mathematics, two sets can be “added” together. The union of $A$ and $B$, denoted by $A \cup B$, is the set of all elements that are members of either $A$ or $B$.
We can include this operation in the Set ADT:
/**
* Constructing a new set with elements that are in this set or
* in the other set.
*
* @param other set.
* @return all elements that are in this set or the other set.
*/
Set<T> union(Set<T> other);
The intersection of $A$ and $B$, denoted by $A \cap B$, is the set of all elements that are members of both $A$ and $B$.
/**
* Constructing a new set with elements that are in this set and
* in the other set.
*
* @param other set.
* @return the elements this set and other set have in common.
*/
Set<T> intersect(Set<T> other);
The set difference of $A$ and $B$, denoted by $A - B$, is the set of all elements that are in $A$ but not in $B$.
/**
* Constructing a new set with elements that are in this set but not
* in the other set.
*
* @param other set.
* @return the elements in this set but not in the other set.
*/
Set<T> subtract(Set<T> other);
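To see the intended semantics of the three operations, here is a small sketch that uses Java's built-in java.util.Set and java.util.HashSet rather than the Set ADT of this lecture. The helper class SetOperationsSketch is an assumption for illustration; it shows what the results should contain, not how to implement the operations efficiently on our array or linked structures.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helpers built on java.util.Set (not the lecture's Set interface).
class SetOperationsSketch {
  // Elements that are in a or in b.
  static <T> Set<T> union(Set<T> a, Set<T> b) {
    Set<T> result = new HashSet<>(a);
    result.addAll(b);
    return result;
  }

  // Elements that are in both a and b.
  static <T> Set<T> intersect(Set<T> a, Set<T> b) {
    Set<T> result = new HashSet<>(a);
    result.retainAll(b);
    return result;
  }

  // Elements that are in a but not in b.
  static <T> Set<T> subtract(Set<T> a, Set<T> b) {
    Set<T> result = new HashSet<>(a);
    result.removeAll(b);
    return result;
  }
}
```

Each helper copies the first set before modifying it, so neither input changes, matching the “constructing a new set” wording in the documentation comments above.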
These operations can be defined for OrderedSet ADT as well:
OrderedSet<T> union(OrderedSet<T> other);
OrderedSet<T> intersect(OrderedSet<T> other);
OrderedSet<T> subtract(OrderedSet<T> other);
We leave it to you as a challenge exercise to implement these operations efficiently.