Lecture 16 Handout

Set ADT

Learning Outcomes

At the end of this lecture, you’ll be able to:

Lecture Plan

In this lecture, we'll cover the following lessons:

  1. Set ADT: The Interface
  2. Linked Implementation of Set
  3. LinkedSet
  4. Hacking the O(n) Find
  5. Array Implementation of Set
  6. ArraySet
  7. Hacking the O(n) Find
  8. Set Iterator
  9. Fail-Fast Iterators
  10. Ordered Set: The Interface
  11. Linked Implementation of OrderedSet
  12. Array Implementation of OrderedSet
  13. Set-theoretical Operations

Lessons marked with ⚡ contain exercise/activity.

Set ADT: The Interface

A set is an iterable collection of unique elements. A set has no particular ordering of elements (neither by position nor by value).

/**
 * Sets of arbitrary values (not necessarily Comparable).
 * Iteration order is undefined.
 *
 * @param <T> Element type.
 */
public interface Set<T> extends Iterable<T> {
  /**
   * Insert a value.
   * Set doesn't change if we try to insert an existing value.
   * Post: has(t) == true.
   *
   * @param t Value to insert.
   */
  void insert(T t);

  /**
   * Remove a value.
   * Set doesn't change if we try to remove a non-existent value.
   * Post: has(t) == false.
   *
   * @param t Value to remove.
   */
  void remove(T t);

  /**
   * Test membership of a value.
   *
   * @param t Value to test.
   * @return True if t is in the set, false otherwise.
   */
  boolean has(T t);

  /**
   * Number of values.
   *
   * @return Number of values in the set, always greater than or equal to 0.
   */
  int size();
}

Linked Implementation of Set

We want to efficiently implement the Set ADT with an underlying linked list. (Go for the simplest choice, a singly linked list, unless efficiency demands a more complex structure.)

Exercise Complete the following table.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | | |
| insert | | |
| remove | | |
| size | | |

Solution

All operations, except for size, require a helper find method to check if an element exists. We cannot do better than Linear Search for find.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | return find(t) != null; | $O(n)$ |
| insert | if (find(t) == null) prepend(t); | $O(n)$ |
| remove | remove(find(t)); | $O(n)$ |
| size | return numElements; | $O(1)$ |
| find | Linear search | $O(n)$ |

We can use a doubly linked list so once the “node to be removed” is found, we can remove it in constant time (we need access to the previous node). Or we can have a findPrevious method to get hold of the node before the one “to be removed” in a singly linked list, in linear time, and then remove the “next” node (the target node) in constant time.
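The findPrevious idea for a singly linked list could be sketched roughly as follows. This is only an illustration under assumptions, not the posted LinkedSet solution: the class name SinglyLinkedSketch, the Node fields, and the exact behavior of findPrevious are made up for this example.

```java
// A minimal sketch with hypothetical names; not the course's LinkedSet.
class SinglyLinkedSketch<T> {
  private static class Node<T> {
    T data;
    Node<T> next;

    Node(T data, Node<T> next) {
      this.data = data;
      this.next = next;
    }
  }

  private Node<T> head;
  private int numElements;

  // Linear search: O(n).
  public boolean has(T t) {
    for (Node<T> n = head; n != null; n = n.next) {
      if (n.data.equals(t)) {
        return true;
      }
    }
    return false;
  }

  // Prepend only if t is absent: O(n) because of the membership test.
  public void insert(T t) {
    if (!has(t)) {
      head = new Node<>(t, head);
      numElements++;
    }
  }

  // Returns the node just before the one holding t, or null if t is
  // stored in the head node or is not in the list at all: O(n).
  private Node<T> findPrevious(T t) {
    for (Node<T> n = head; n != null && n.next != null; n = n.next) {
      if (n.next.data.equals(t)) {
        return n;
      }
    }
    return null;
  }

  public void remove(T t) {
    if (head == null) {
      return;
    }
    if (head.data.equals(t)) {
      head = head.next; // the target is the first node
      numElements--;
      return;
    }
    Node<T> prev = findPrevious(t);
    if (prev != null) {
      prev.next = prev.next.next; // unlink the target in O(1)
      numElements--;
    }
  }

  public int size() {
    return numElements;
  }
}
```

The head node is handled as a special case here because it has no previous node; a sentinel head node would be another way to avoid that branch.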

LinkedSet

Exercise Open the starter code and complete the implementation of LinkedSet. (Do this at home!)

Solution

Please check the posted solution.

Hacking the O(n) Find

Consider the following implementation of find:

private Node<T> find(T t) {
  for (Node<T> n = head; n != null; n = n.next) {
    if (n.data.equals(t)) {
      return n;
    }
  }
  return null;
}

Exercise Update the implementation of find to employ the “move-to-front heuristic” as it is described in the “Dictionary of Algorithms and Data Structures”.

Solution

Assuming there are helper methods remove and prepend:

private Node<T> find(T t) {
  for (Node<T> n = head; n != null; n = n.next) {
    if (n.data.equals(t)) {
      remove(n);   // removes node n from this list
      prepend(n.data);  // add to the front of this list
      return head; // assuming no sentinel node 
    }
  }
  return null;
}

Array Implementation of Set

We want to efficiently implement the Set ADT with an array as the underlying data structure.

Exercise Complete the following table.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | | |
| insert | | |
| remove | | |
| size | | |

Solution

All operations, except for size, require a helper find method to check if an element exists. We could keep the underlying data in order so that find can use Binary Search. However, we will explore that option when we implement an array-based OrderedSet ADT. For now, let’s keep the underlying data unordered and perform Linear Search in find.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | return find(t) != -1; | $O(n)$ |
| insert | if (find(t) == -1) data[numElements++] = t; | $O(n)$ |
| remove | Find the element, swap with last, numElements-- | $O(n)$ |
| size | return numElements; | $O(1)$ |
| find | Linear search | $O(n)$ |

Notice the strategy for remove which allows us to spend constant time after the element is found.
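The operations in the table could look roughly like the sketch below. It is not the posted ArraySet solution: the class name ArraySetSketch, the constructor taking a capacity, and the doubling growth policy are assumptions made for this illustration.

```java
import java.util.Arrays;

// A minimal sketch with hypothetical names; not the course's ArraySet.
class ArraySetSketch<T> {
  private T[] data;
  private int numElements;

  @SuppressWarnings("unchecked")
  ArraySetSketch(int capacity) {
    data = (T[]) new Object[capacity];
  }

  // Linear search; returns the index of t or -1 if absent: O(n).
  private int find(T t) {
    for (int i = 0; i < numElements; i++) {
      if (data[i].equals(t)) {
        return i;
      }
    }
    return -1;
  }

  public boolean has(T t) {
    return find(t) != -1;
  }

  public void insert(T t) {
    if (find(t) != -1) {
      return; // already present, the set does not change
    }
    if (numElements == data.length) {
      // Assumed growth policy: double the capacity when full.
      data = Arrays.copyOf(data, Math.max(1, 2 * data.length));
    }
    data[numElements++] = t;
  }

  public void remove(T t) {
    int i = find(t);
    if (i == -1) {
      return; // not present, the set does not change
    }
    // Overwrite the removed slot with the last element: O(1) after find.
    data[i] = data[numElements - 1];
    data[numElements - 1] = null; // drop the stale reference
    numElements--;
  }

  public int size() {
    return numElements;
  }
}
```

Overwriting the removed slot with the last element is only acceptable because a Set promises no particular order of elements.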

ArraySet

Exercise Open the starter code and complete the implementation of ArraySet. (Do this at home!)

Solution

Please check the posted solution.

Hacking the O(n) Find

Recall that we can use a heuristic that moves the target of a search to the head of a list so it is found faster next time. This technique is called the “move-to-front heuristic”. It speeds up linear search in a linked list if the target item is likely to be searched for again soon.

Exercise Can we apply the move-to-front heuristic to speed up linear search in an array?

Solution

Maybe! Moving the target of a search to the front of an array requires shifting all the other elements to the right. This is an additional linear time operation (in addition to the linear search).

Please watch the video lecture, where some of the students brilliantly came up with ideas to keep this a constant-time operation (at the cost of a more complex implementation, such as a circular array).

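A more common strategy is the “transpose sequential search” heuristic: when the target is found, it is swapped with its immediate predecessor, so frequently searched elements gradually move toward the front.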

Note: think about why it would not be a good idea to implement the “move-to-front” heuristic in an array by swapping the target element with the front value, instead of actually “moving” it to the front.
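The transpose heuristic could be sketched as below: a found element is swapped with its immediate predecessor, which costs constant time on top of the linear search. The class name TransposeSearchSketch and the static find signature are hypothetical, chosen only so the example is self-contained.

```java
// A minimal sketch of transpose sequential search (hypothetical names).
class TransposeSearchSketch {
  // Searches the first numElements slots of data for t.
  // Returns the index where t ends up after the swap, or -1 if absent.
  static <T> int find(T[] data, int numElements, T t) {
    for (int i = 0; i < numElements; i++) {
      if (data[i].equals(t)) {
        if (i == 0) {
          return 0; // already at the front, nothing to swap
        }
        // Move the target one position closer to the front.
        T tmp = data[i - 1];
        data[i - 1] = data[i];
        data[i] = tmp;
        return i - 1;
      }
    }
    return -1;
  }
}
```

For example, searching for 9 in the array 5, 7, 9 leaves the array as 5, 9, 7, and searching for 9 once more leaves it as 9, 5, 7.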

Set Iterator

We know a set is an unordered collection.

Exercise How should we implement the iterator for Set ADT?

A) Simply iterate in the same order the elements have been added.
B) To ensure this is an unordered collection, we must iterate over the elements in a random order.
C) Iterate over the elements from head to tail (from index 0 to numElements - 1 in an array).

Solution

The correct answer is C. It is the cheapest strategy.

The statement that “set is an unordered collection” implies that a client should not expect the iteration to happen in any particular order. We don’t need to go out of our way to ensure a disorderly iteration!

Fail-Fast Iterators

Have you ever thought about what will happen if you structurally modify a data structure while you are iterating over it?

A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the data structure.

Assume you are iterating over an ArraySet. While the iteration is going on, an element you’ve already visited (iterated over) is removed. This could happen in a concurrent program, but here is a contrived example to showcase this scenario:

for (int num: myArraySet) {
  // do something with num
  if (feelingLucky()) {
    myArraySet.remove(num);
  }
}

Exercise Can you anticipate any issues with iteration?

Solution

It depends on the implementation of the iterator and the remove method. In general, the results of the iteration are undefined under a structural modification.

If we assume the removal strategy is the one we discussed earlier, the last element of the array is moved into the slot of the removed element and numElements is decremented. When the removed element sits at a position the iterator has already passed, the iteration ends without ever visiting the element that used to be last.

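In the Java Collections Framework, it is generally not permissible for one thread to modify a Collection while another thread is iterating over it.

When an iterator detects such a modification, it throws ConcurrentModificationException. The same exception can also be thrown in single-threaded code that modifies a collection directly while iterating over it.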

Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Exercise How can you make an iterator “fail-fast”?

Hint: To make an iterator “fail-fast” we need to be able to tell that the data structure has been modified since the iteration started.

Solution

Here is one strategy: We can use a version number in the data structure class to achieve this.

  • The number starts at 0 and is incremented whenever a structural modification is performed.
  • Each iterator also “remembers” the version number it was created for.
  • We can then check for modifications by comparing version numbers in the Iterator operations: If we notice a mismatch, we raise an exception.
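The version-number strategy above could be sketched as follows. This is not the LinkedSet solution code: the class FailFastSketch, its int-only array storage, and the decision to check the version in both hasNext and next are assumptions made to keep the illustration short.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// A minimal sketch of a fail-fast iterator (hypothetical class, ints only).
class FailFastSketch implements Iterable<Integer> {
  private int[] data = new int[8];
  private int numElements;
  private int version; // starts at 0, incremented on structural modification

  public void insert(int t) {
    for (int i = 0; i < numElements; i++) {
      if (data[i] == t) {
        return; // already present: no structural modification
      }
    }
    if (numElements == data.length) {
      data = Arrays.copyOf(data, 2 * data.length);
    }
    data[numElements++] = t;
    version++;
  }

  public void remove(int t) {
    for (int i = 0; i < numElements; i++) {
      if (data[i] == t) {
        data[i] = data[--numElements]; // swap-with-last removal
        version++;
        return;
      }
    }
  }

  @Override
  public Iterator<Integer> iterator() {
    return new Iterator<Integer>() {
      // The iterator remembers the version it was created for.
      private final int expectedVersion = version;
      private int cursor;

      private void checkVersion() {
        if (expectedVersion != version) {
          throw new ConcurrentModificationException();
        }
      }

      @Override
      public boolean hasNext() {
        checkVersion();
        return cursor < numElements;
      }

      @Override
      public Integer next() {
        checkVersion();
        if (cursor >= numElements) {
          throw new NoSuchElementException();
        }
        return data[cursor++];
      }
    };
  }
}
```

With this sketch, a for-each loop that calls remove on the set it is iterating over fails with ConcurrentModificationException on the next iterator operation, instead of silently skipping an element.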

We have implemented this feature in the LinkedSet. Make sure to carefully study it when you get the solution code. Then, try to implement it for ArraySet.

Ordered Set: The Interface

An ordered set is an iterable collection of unique elements kept in order. The elements are expected to be iterated over in order, based on their values.

/**
 * Ordered set of arbitrary values.
 * Iteration order is based on the values.
 *
 * @param <T> Element type.
 */
public interface OrderedSet<T extends Comparable<T>> extends Set<T> {
  // Same operations as the Set ADT
}

Notice we must use bounded generics to ensure the elements are comparable (otherwise we cannot put them in order).

Linked Implementation of OrderedSet

We want to efficiently implement the OrderedSet ADT with an underlying linked list.

Exercise Complete the following table.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | | |
| insert | | |
| remove | | |
| size | | |

Solution

All operations, except for size, require a helper find method to check if an element exists. We cannot do better than Linear Search for find. (Performing Binary Search on a linked list is futile as its cost is $O(n)$ while its implementation is more complex than Linear Search.)

| Operation | How? | Runtime |
| --- | --- | --- |
| has | return find(t) != null; | $O(n)$ |
| insert | Find where to insert, then insert! | $O(n)$ |
| remove | remove(find(t)); | $O(n)$ |
| size | return numElements; | $O(1)$ |
| find | Linear search | $O(n)$ |

We can come up with a clever implementation in which the find method returns the node before the target (instead of the target node itself). This makes it easier for insert to use the same find operation as the other operations. We leave it as an (unsolved) exercise for you to implement the operations of OrderedSet using a linked list.

Array Implementation of OrderedSet

We want to efficiently implement the OrderedSet ADT with an array as the underlying data structure.

Exercise Complete the following table.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | | |
| insert | | |
| remove | | |
| size | | |

Solution

All operations, except for size, require a helper find method to check if an element exists. We can perform Binary Search, so find and has will cost $O(\lg n)$. The insert and remove operations will remain linear time since we must shift the elements around to keep the values in order.

| Operation | How? | Runtime |
| --- | --- | --- |
| has | return find(t) != -1; | $O(\lg n)$ |
| insert | Find where to insert, then shift elements to make room. | $O(n)$ |
| remove | Find the element, shift all elements after it to the left. | $O(n)$ |
| size | return numElements; | $O(1)$ |
| find | Binary search | $O(\lg n)$ |

We leave this as an (unsolved) exercise to you: implement the operations of an array-based OrderedSet.
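As a possible starting point, the find helper alone could be sketched as a standard binary search over the first numElements slots of a sorted array. The class name BinarySearchSketch and the static signature are hypothetical; insert and remove, with their shifting, are still left to you.

```java
// A minimal sketch of binary search for find (hypothetical names).
class BinarySearchSketch {
  // Assumes data[0..numElements-1] is sorted in ascending order.
  // Returns the index of t, or -1 if t is absent: O(lg n).
  static <T extends Comparable<T>> int find(T[] data, int numElements, T t) {
    int lo = 0;
    int hi = numElements - 1;
    while (lo <= hi) {
      int mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
      int cmp = t.compareTo(data[mid]);
      if (cmp == 0) {
        return mid;
      } else if (cmp < 0) {
        hi = mid - 1; // target can only be in the left half
      } else {
        lo = mid + 1; // target can only be in the right half
      }
    }
    return -1;
  }
}
```

For insert, a variant that reports where the search ended (the final value of lo) would tell you the position at which to make room.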

Set-theoretical Operations

In Mathematics, two sets can be “added” together. The union of $A$ and $B$, denoted by $A \cup B$, is the set of all elements that are members of either $A$ or $B$.

We can include this operation in the Set ADT:

/**
 * Constructing a new set with elements that are in this set or
 * in the other set.
 *
 * @param other set.
 * @return all elements that are in this set or the other set.
 */
Set<T> union(Set<T> other);

The intersection of $A$ and $B$, denoted by $A \cap B$, is the set of all elements that are members of both $A$ and $B$.

/**
 * Constructing a new set with elements that are in this set and
 * in the other set.
 *
 * @param other set.
 * @return the elements this set and other set have in common.
 */
Set<T> intersect(Set<T> other);

The set difference of $A$ and $B$, denoted by $A - B$, is the set of all elements that are in $A$ but not in $B$.

/**
 * Constructing a new set with elements that are in this set but not
 * in the other set.
 *
 * @param other set.
 * @return the elements in this set but not in the other set.
 */
Set<T> subtract(Set<T> other);
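To make the three definitions concrete, here is a small worked example. Note the assumption: it uses java.util.Set and java.util.TreeSet from the Java standard library (with addAll, retainAll, and removeAll), not the Set interface defined in this lecture, and the class SetOperationsExample is hypothetical. It illustrates what the operations should return, not how to implement them efficiently for LinkedSet or ArraySet.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrates union, intersection, and difference using java.util sets.
class SetOperationsExample {
  // Elements that are in a or in b.
  static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
    Set<Integer> result = new TreeSet<>(a); // copy, so a is not modified
    result.addAll(b);
    return result;
  }

  // Elements that are in both a and b.
  static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
    Set<Integer> result = new TreeSet<>(a);
    result.retainAll(b);
    return result;
  }

  // Elements that are in a but not in b.
  static Set<Integer> subtract(Set<Integer> a, Set<Integer> b) {
    Set<Integer> result = new TreeSet<>(a);
    result.removeAll(b);
    return result;
  }
}
```

With $A = \{1, 2, 3\}$ and $B = \{3, 4, 5\}$, these give $A \cup B = \{1, 2, 3, 4, 5\}$, $A \cap B = \{3\}$, and $A - B = \{1, 2\}$.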

These operations can be defined for OrderedSet ADT as well:

OrderedSet<T> union(OrderedSet<T> other); 
OrderedSet<T> intersect(OrderedSet<T> other);  
OrderedSet<T> subtract(OrderedSet<T> other);  

We leave it to you as a challenge exercise to implement these operations efficiently.