JEP 103: Parallel Array Sorting

AuthorsDavid Holmes, Chris Hegarty
OwnerChris Hegarty
Created2011/09/26 20:00
Updated2014/07/10 20:16
TypeFeature
StatusCompleted
Componentcore-libs
ScopeSE
Discussioncore dash libs dash dev at openjdk dot java dot net
EffortXS
DurationXS
Priority4
Endorsed byBrian Goetz
Release8
Issue8046093
DependsJEP 155: Concurrency Updates
Relates to8003981: Support Parallel Array Sorting - JEP 103

Summary

Add additional utility methods to java.util.Arrays that use the JSR 166 Fork/Join parallelism common pool to provide sorting of arrays in parallel.

Non-Goals

There are many algorithms for parallel array sorting with different trade-offs for time and space. The goal here is to provide a generally useful utility operation, not to provide a framework of different algorithms from which the programmer can select.

Success Metrics

A minimum speedup of at least 1.3 over sequential sort on a two core system, for a suitably sized data set.

Motivation

Java 7 added the Fork/Join framework for lightweight data parallelism, but users must presently implement their own algorithms for simple/common tasks. This proposal addresses one common task by providing parallel sorting of arrays. By converting to/from arrays this can also be used to sort arbitrary collections (for those that have a defined traversal order).

Description

Current sorting implementations provided by the Java Collections Framework (Collections.sort and Arrays.sort) all perform the sorting operation sequentially in the calling thread. This enhancement will offer the same set of sorting operations currently provided by the Arrays class, but with a parallel implementation that utilizes the Fork/Join framework. These new API's are still synchronous with regard to the calling thread as it will not proceed past the sorting operation until the parallel sort is complete.

The actual sorting API this proposal adds will leverage the ForkJoinPool commonPool (default Fork/Join pool being defined by JEP 155).

public class Arrays {

  ...
  public static void parallelSort(byte[] a) { ... }
  public static void parallelSort(byte[] a, int fromIndex, int toIndex)
  public static void parallelSort(short[] a) { ... }
  public static void parallelSort(short[] a, int fromIndex, int toIndex)
    {...}

  ...

}

Sort methods are defined for all primitive array types except boolean, plus Comparable object types, plus arbitrary Object types using a supplied Comparator. The sorting algorithm is that used in Doug Lea's ParallelArray implementation and requires a working space the same size as the array to be sorted (this is the whole array, not just the portion to be sorted).

Open issues:

Alternatives

No general alternatives have been considered. The parallel sorting implementation comes from Doug Lea's ParallelArray framework. Some variations in the API's, particularly in regard to which pool to use, have been considered, but are presently deemed more complex than needed.

Testing

Risks and Assumptions

It is assumed that the choice of the Fork/Join system-wide common pool, and the tying of this API to that pool will not be contentious (or at least not to the point where it prevents the progress of this proposal). The ability to expand the API to allow further control over the pool could be added at a later stage.

It is also assumed that the simple algorithm choice will be sufficient for the general use case.

Impact