JEP draft: Sequenced Collections

OwnerStuart Marks
TypeFeature
ScopeSE
StatusDraft
Componentcore-libs / java.util:collections
Created2022/01/27 22:13
Updated2022/02/18 20:14
Issue8280836

Summary

Introduce a new Collection interface representing a collection with a defined encounter order. Such a collection has a well-defined first element, second element, and so forth, up to the last element.

"Life can only be understood backwards; but it must be lived forwards."
— Kierkegaard

Motivation

The Collections Framework lacks a collection type that represents a sequence of elements with a defined encounter order, and also lacks a uniform set of operations that apply across such collections. These gaps have been a repeated source of complaints about the Collections Framework.

For example, List and Deque both define an encounter order, but their common supertype is Collection, which does not. Similarly, Set does not define an encounter order, and implementations such as HashSet do not provide one, but some subtypes such as SortedSet and LinkedHashSet do. Support for encounter order is thus somewhat scattered across the type hierarchy, and it is therefore difficult to express certain useful concepts in APIs. Neither Collection nor List are very good for describing a parameter or return value that has an encounter order; Collection is too weak (relegating such constraints to the prose specification) and List is too strong (excluding SortedSet and LinkedHashSet).

A related problem is that view collections are often forced to downgrade to weaker semantics. Wrapping a LinkedHashSet with Collections::unmodifiableSet yields a Set, discarding the information about encounter order. Without an interface to define them, operations on collections with encounter order are inconsistent or missing. While many implementations support getting the first or last element, each collection defines its own way, and some are not obvious or are missing entirely:

.               First Element                     Last Element

List            list.get(0)                       list.get(list.size() - 1)
Deque           deque.getFirst()                  deque.getLast()
SortedSet       sortedSet.first()                 sortedSet.last()
LinkedHashSet   linkedHashSet.iterator().next()   [missing]

Some of these are unnecessarily cumbersome, such as getting the last element of a List. Some are not even possible without heroics: the only way to get the last element of a LinkedHashSet is to iterate the entire set. Similarly, all of these collections support multiple means of iterating from first to last, with an Iterator, the enhanced for loop, a stream(), or toArray. However, iterating the elements in reverse order is less simple and uniform. NavigableSet provides the descendingSet view for reverse iteration:

for (var e : navSet.descendingSet())
    process(e);

Deque does so with a reverse Iterator:

for (var it = deque.descendingIterator(); it.hasNext();) {
    var e = it.next();
    process(e);
}

List does so, but requires using ListIterator:

for (var it = list.listIterator(list.size()); it.hasPrevious();) {
    var e = it.previous();
    process(e);
}

The only practical way to process the elements of a LinkedHashSet in reverse order is to copy its elements into another collection and iterate that collection, reversing either during copying or iteration.

Similarly, processing a collection's elements using streams is a powerful and effective alternative to processing elements using loops, but obtaining a stream in reverse order can be difficult. Of the various collections that provide encounter order, the only one that supports this conveniently is NavigableSet:

navSet.descendingSet().stream()

The others require either copying the elements to another collection or creating a stream from a customized Spliterator that reverses iteration.

Overall, this is an unfortunate state of affairs, as there is no fundamental reason all of these collection types could not easily implement reverse iteration. The lack of API points in the Collections Framework forces programmers to write slower and more complicated code than necessary. The concept of a collection with defined encounter order exists in multiple places in the Collections Framework, but there is no single type that represents it.
As a result, the operations on such collections are inconsistent or missing, and processing elements in reverse order ranges from inconvenient to impossible. These gaps in the Collections Framework should be filled.

Description

Define a sequenced collection as a collection whose elements have a well-defined linear ordering. The word "sequenced" is the past participle of the verb to sequence, meaning "to arrange elements in a particular order." A non-empty sequenced collection has first and last elements, and every element in between them has a successor and a predecessor. A sequenced collection supports common operations at either end, and it supports processing of the elements from first to last and from last to first (forward and reverse). A sequenced set is a Set; it is a sequenced collection that contains no duplicate elements. A sequenced map is a Map whose entries have a well-defined linear ordering.

Add three new interfaces embodying these concepts into the collection type hierarchy.

Interface declarations are as follows:

interface SequencedCollection<E> extends Collection<E> {
    // new methods
    void addFirst();
    void addLast();
    E getFirst();
    E getLast();
    E removeFirst();
    E removeLast();
    SequencedCollection<E> reversed();
}

interface SequencedSet<E> extends Set<E>, SequencedCollection<E> {
    // covariant override
    SequencedSet<E> reversed();
}

interface SequencedMap<K,V> extends Map<K,V> {
    // new methods
    Entry<K, V> firstEntry();
    Entry<K, V> lastEntry();
    K firstKey();
    K lastKey();
    Entry<K, V> pollFirstEntry();
    Entry<K, V> pollLastEntry();
    V putFirst(K, V);
    V putLast(K, V);
    SequencedMap<K,V> reversed();
    // covariant overrides
    SequencedSet<K> keySet();
    SequencedCollection<V> values();
    SequencedSet<Entry<K,V>> entrySet();
}

Retrofit List and Deque to have SequencedCollection as their immediate superinterfaces. Retrofit LinkedHashSet to implement SequencedSet, and retrofit SortedSet so its immediate superinterface is SequencedSet. Retrofit LinkedHashMap to implement SequencedMap, and retrofit SortedMap so its immediate superinterface is SequencedMap.

Default implementations are provided for all of the new methods. There is no definition of equals and hashCode in SequencedCollection, as the sub-interfaces have conflicting definitions. Add several covariant overrides for reversed in appropriate places. For example, List::reversed is overridden with a return type of List.

The following new methods are added to the Collections utility class to create unmodifiable wrappers for the respective types:

Collections.unmodifiableSequencedCollection(collection)
Collections.unmodifiableSequencedSet(set)
Collections.unmodifiableSequencedMap(map)

The new interface types unify the collections hierarchy so that the ordered collections interfaces List, Deque, SortedSet, and NavigableSet, as well as the LinkedHashSet class, all have a common supertype of SequencedCollection. The SortedMap and NavigableMap interfaces and the LinkedHashMap class all have a common supertype of SequencedMap.

The reversed method provides a reverse-ordered view of the original collection. Any modifications made to the original collection are visible in the view. If permitted, modifications to the view write through to the original collection.

The reverse-ordered view enables all the different sequenced types to process elements in both directions, using all the mechanisms: Iterator, an enhanced-for ("for-each") loop, the forEach method, streams, and the toArray methods. For example, obtaining a reverse-ordered stream from a LinkedHashSet was previously quite difficult; now it is simply

linkedHashSet.reversed().stream()

The reversed method is essentially the same as the NavigableSet::descendingSet method, renamed and promoted to the SequencedCollection interface and thus inherited much more broadly.

New methods on SequencedCollection support adding, getting, and removing at both ends. These methods are essentially the same as the corresponding methods currently on Deque, promoted to SequencedCollection in order to be inherited broadly. The adding and removing methods are optional, primarily to support the case of unmodifiable collections. Note that the getting and removing methods throw NoSuchElementException if the collection is empty.

Collections can be ordered externally or internally. In an externally ordered collection, the ordering of elements is determined by the succession of API calls that populate and modify the collection. In an internally ordered collection, the ordering is determined by comparisons among the elements themselves. (This distinction is described in Goldberg & Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983 (the "Blue Book") at the beginning of Chapter 10.)

The addFirst and addLast methods of SequencedSet have special case semantics for externally ordered collections such as LinkedHashSet. If the element is already present in the set, it is also moved into the appropriate position. If the element is not present, it is simply added at the appropriate position. This remedies a long-standing deficiency in LinkedHashSet, the inability to reposition elements that are already in the collection.

Internally ordered collections such as SortedSet cannot support any explicit-positioning operations, including the addFirst and addLast methods. Thus, these methods will throw UnsupportedOperationException.

The methods firstEntry, lastEntry, firstKey, lastKey, pollFirstEntry, and pollLastEntry are present on SequencedMap, promoted from SortedMap and NavigableMap. If the map is empty, entry-returning methods return null instead of throwing NoSuchElementException. This is inconsistent with other methods on the new interfaces; this is the way the methods are defined on NavigableMap. There is however no ambiguity with these methods over a null return value, as an Entry is always non-null, even if the map can contain null keys or values.

New methods putFirst and putLast are added to SequencedMap. They have special semantics similar to the corresponding methods of SequencedSet, For externally ordered maps (e.g., LinkedHashMap) these methods have the additional effect of repositioning the mapping. For internally ordered maps (e.g., SortedMap) the methods throw UnsupportedOperationException.

Rationale and Alternatives

New Types

Adding a new type SequencedCollection is useful because it facilitates creation of APIs that accept, consume, or implement ordered collection semantics. For example, a method may have a parameter that processes a collection of elements in a particular order. In the absence of SequencedCollection, the parameter must be of some other type that doesn't express the desired semantics.

For example, the method can declare the parameter as type Collection and specify in prose that it must be ordered. This requirement is easy to miss. In addition, there is no compile-time or runtime checking of this requirement, which may lead to subtle, hard-to-debug errors if an unordered collection is passed to this parameter.

An alternative is for the method to declare a parameter of a more specific type such as List, which is unambiguously an ordered collection. However, this imposes constraints on callers to produce their input in a collection of this specific type, even if they already have an ordered collection of a different type.

SequencedSet adds no new operations, but it is necessary to support certain view collections that represent something that is a Set and that also is a SequencedCollection. For example, if the LinkedHashSet::reversed method were to return a SequencedCollection, it would no longer be a Set. Similarly, the keySet and entrySet views of SequencedMap were merely of the type Set, they would lack the sequence semantics and the corresponding operations.

SequencedMap seems to add less value as it unifies only the two subtypes LinkedHashmap and SortedMap. (Indeed, earlier versions of this proposal omitted it.) However, its value becomes clear when the complete set of operations are considered. The LinkedHashMap::reversed method returns a reverse-ordered view of the map. This view needs to be of a sequenced type in order to provide sequenced map views and ordered operations.

Having the full complement of sequenced types supports the addition of the corresponding unmodifiable wrappers that provide unmodifiable and sequenced return types. Without the sequenced types, other unmodifiable wrappers would need to be used, resulting in a loss of sequenced semantics.

Another alternative would be to repurpose an existing interface such as List or Deque to denote a sequenced collection. List is sequenced, but it is also externally ordered and it supports access by integer indexes. External ordering is incompatible with the internal ordering of types such as SortedSet. Indexing is a direct-access style of API that can only be implemented via traversal in classes such as LinkedHashSet; this would lead to surprising performance problems, similar to the ones that occur already with LinkedList. Supporting all the List methods does not add much value, yet they can be burdensome to implement. For example, consider implementing replaceAll on a LinkedHashSet.

Deque seems like a promising interface to reuse as a general sequence type, as it supports the right set of operations. However, it is cluttered with other operations, including a family of null-returning operations (offerFirst, offerLast, peekFirst, peekLast, pollFirst, pollLast) and stack operations (push and pop). Deque also inherits operations from Queue including element, offer, peek, and poll which are sensible for a queue but less so for sequences in general. There is also a clash of semantics. If Deque were made a general-purpose sequence, then List would also be a Queue and it would also support stack operations. The result would be a cluttered and confusing API.

Naming

The term sequence implies elements that are arranged in order. It's fairly descriptive, and it's in common usage across various platforms to represent collections with similar semantics to those described here.

The term ordered is useful and descriptive, but is not quite specific enough. To support the semantics and operations described here, it's insufficient for a collection to be ordered; it most also be iterable in both directions, and it must support operations at both ends. Queue is a notable outlier: it is ordered, but it is also decidedly asymmetric.

The term reversible (used in a previous version of this proposal) is reasonably descriptive but it doesn't immediately evoke the concept of having two ends. Perhaps a bigger issue is that the Map variant would have been named ReversibleMap, which seems to imply incorrectly the ability to look up entries by value as well as by key (like a "BiMap" or "BidiMap"). This would likely have been a continual source of confusion.

Omission of Queue

Queue is ordered, but it has been omitted from the SequencedCollection family. It is iterable in only one direction, and its operations are decidedly asymmetric: elements can be added only at the tail and removed only from the head. In addition, certain implementations such as PriorityQueue are not iterable in order; the ordering induced by its comparison method is revealed only destructively. Since the Deque interface inherits all the operations of Queue, it is possible to use a Deque to handle cases where both queue-related and SequencedCollection operations are desired.

Exception-Throwing Methods

The getFirst, getLast, removeFirst, and removeLast methods are promoted from Deque into SequencedCollection. These methods throw NoSuchElementException if the collection is empty. Deque also provides corresponding peek and poll methods, which return null instead of throwing.

These latter methods were not included in SequencedCollection because they introduce ambiguity between null meaning "the collection is empty" and "the element that was retrieved is a null." The Map::get method already has this problem; it seemed unwise extend that problem to additional methods.

We have chosen not to add methods that return Optional, because this would set a new precedent for APIs that isn't followed anywhere else in the Collections Framework.

History

This proposal is an incremental evolution of the ReversibleCollections proposal posted to core-libs-dev on 2021-04-16. The major changes from that proposal are renaming, the addition of a Map interface, and the addition of unmodifiable wrapper methods.

The ReversibleCollection proposal was in turn based on an earlier OrderedMap/OrderedSet proposal posted by Tagir Valeev to core-libs-dev on 2020-04-25. Several fundamental concepts from the proposal are still present, although there are many differences in detail.

Over the years there have been a variety of requests and proposals filed in the vein of combining a List and a Set (or a Map). The recurring themes are a List that contains unique elements, or a Set (or Map) that maintains ordering. Some of the related requests in the JDK Bug System include the following:

JDK-4152834, JDK-4245809, JDK-4264420, JDK-4268146, JDK-6447049, and JDK-8037382.

Some early requests were partially addressed with the introduction of LinkedHashSet and LinkedHashMap in JDK 1.4. While these classes do satisfy some use cases, their introduction left gaps in the abstractions and operations provided by the Collections Framework, as described here.

Testing

A comprehensive set of tests are added to the JDK Regression Test suite.

Risks and Assumptions

Introduction of new methods fairly high in the inheritance hierarchy runs the risk of method name clashes over obvious names such as "reversed" and "getFirst".

Of particular concern is that the reversed method returns a SequencedCollection while on List it is overridden to return a List. This is source and binary incompatible with collections that implement both List and Deque. There are two examples of this in the JDK: LinkedList and an internal class sun.awt.util.IdentityLinkedList.

This incompatibility must be handled by modifying such classes to override reversed to return List. Note that Deque does not provide a covariant override of reversed returning Deque, which would make resolving this issue impossible. Instead, it provides the reversedDeque method with return type Deque.

This situation could be mitigated by avoiding covariant overrides of reversed and providing alternative methods (such as reversedList) instead.

The introduction of covariant overrides on view-providing methods of the Map interface could result in source and binary incompatibilities with subclasses that override these methods while retaining their existing return type. A fallback approach would be to introduce new methods instead covariant overrides. This approach was taken with the navigableKeySet method instead of introducing a covariant override of keySet.

There is a possibility of a source incompatibility caused by type inference. The type Collection might be inferred in a certain context, and other code might depend on that implicitly, which might break if the inferred type were to change to be SequencedCollection. Situations where this occurs can easily be mitigated by providing an explicit type instead of using type inference.

A further potential for source incompatibilities might occur if there are collections subclasses or independent classes with the same name.

Analysis of a large corpus of Java code will be performed in order to assess each of these risks.