JEP draft: Vector API (Second Incubator)

Owner: Paul Sandoz
Type: Feature
Scope: JDK
Status: Draft
Component: hotspot / compiler
Discussion: panama dash dev at openjdk dot java dot net
Effort: M
Duration: M
Created: 2021/02/12 17:06
Updated: 2021/03/01 08:07
Issue: 8261663

Summary

Provide a second iteration of an incubator module, jdk.incubator.vector, to express vector computations that reliably compile at runtime to optimal vector hardware instructions on supported CPU architectures and thus achieve superior performance to equivalent scalar computations.

History

The Vector API was first proposed by JEP 338 and was integrated into Java 16 as an incubating API. This JEP proposes to incorporate enhancements to the Vector API based on feedback, along with performance improvements and significant implementation enhancements that optimize masked vector operations on supporting hardware.

Goals

Motivation

The primary motivation of the Vector API remains unchanged, as described in JEP 338.

This JEP has three specific motivations. The first is to improve the Vector API by incorporating feedback, which involves some minor additions and adjustments. The second is to extend Vector API support to new CPU architectures, specifically ARM SVE. The third is to improve the performance of the Vector API with enhancements to HotSpot, specifically by enhancing vector support in the C2 runtime compiler for the Intel x64 and ARM SVE architectures. Where possible, this work may also enhance auto-vectorization in HotSpot, or enable future enhancements to it.

Description

API enhancements

The following API enhancements are proposed:

Intel SVML intrinsics

[TODO]

ARM SVE

[TODO]

Masking

Vector operations that accept masks are not optimally supported on architectures that support masking in hardware. Currently, such operations are implemented by composing the non-masked operation with a blend operation. For example, the masked binary lanewise operation on DoubleVector is implemented as follows:

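@ForceInline
public final
DoubleVector lanewise(VectorOperators.Binary op,
                      Vector<Double> v,
                      VectorMask<Double> m) {
    // Compute the operation on every lane, then keep the result only in
    // the lanes where m is set; unset lanes retain this vector's values.
    return blend(lanewise(op, v), m);
}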

On hardware that supports mask registers, such as AVX-512 and SVE, the blend operation is not required. Instead, the mask m can be compiled to a mask register, and the vector operation can be compiled to a vector hardware instruction that operates under that mask register.
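The difference between the two strategies can be illustrated with a scalar model. The sketch below is not the Vector API and not HotSpot code: plain double[] and boolean[] arrays stand in for vector lanes and mask lanes, and the class and method names (MaskedLanewiseModel, addViaBlend, addMasked) are hypothetical. It assumes an ADD operation and shows that computing every lane and then blending gives the same result as computing only the lanes selected by the mask, which is the behavior a masked hardware instruction would provide directly.

```java
import java.util.Arrays;

// Scalar model only: arrays stand in for vector and mask lanes.
// All names here are hypothetical and chosen for illustration.
final class MaskedLanewiseModel {

    // Unmasked lanewise ADD: every lane is computed.
    public static double[] add(double[] a, double[] b) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    // Blend: take the lane from 'other' where the mask is set,
    // otherwise keep the lane from 'self'.
    public static double[] blend(double[] self, double[] other, boolean[] m) {
        double[] r = new double[self.length];
        for (int i = 0; i < self.length; i++) {
            r[i] = m[i] ? other[i] : self[i];
        }
        return r;
    }

    // Current strategy: full-width operation followed by a blend.
    public static double[] addViaBlend(double[] a, double[] b, boolean[] m) {
        return blend(a, add(a, b), m);
    }

    // Masked strategy: only the selected lanes are computed;
    // unselected lanes keep the value from 'a'.
    public static double[] addMasked(double[] a, double[] b, boolean[] m) {
        double[] r = a.clone();
        for (int i = 0; i < a.length; i++) {
            if (m[i]) {
                r[i] = a[i] + b[i];
            }
        }
        return r;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0, 4.0};
        double[] b = {10.0, 20.0, 30.0, 40.0};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(addViaBlend(a, b, m))); // [11.0, 2.0, 33.0, 4.0]
        System.out.println(Arrays.toString(addMasked(a, b, m)));   // [11.0, 2.0, 33.0, 4.0]
    }
}
```

Both methods produce the same lanes, so replacing the blend composition with a single masked instruction would not change observable results in this model. It only removes the separate full-width operation and blend.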

[TODO description of C2 generic modifications]

[TODO description of C2 x64 modifications]

[TODO description of C2 SVE modifications]

Alternatives

Testing

Existing tests will be updated to test enhancements to the Vector API.

Existing tests are considered sufficient to cover enhancements to HotSpot. Testing on ARM SVE and AVX-512 hardware will be aided by the contributors, since such hardware may not be widely available.

Risks and Assumptions

If the Foreign-Memory Access API does not exit incubation, then some API enhancements and corresponding implementation updates will need to be deferred.