JEP 285: Spin-Wait Hints
|Authors||Gil Tene, Ivan Krylov|
|Status||Closed / Delivered|
|Component||core-libs / java.lang|
|Discussion||core dash libs dash dev at openjdk dot java dot net|
|Reviewed by||Doug Lea, Paul Sandoz|
|Endorsed by||Brian Goetz|
Summary
Define an API to allow Java code to hint that a spin loop is being executed.
Goals
Define an API that would allow Java code to hint to the run-time system that it is in a spin loop. The API will be a pure hint, and will carry no semantic behaviour requirements (for example, a no-op is a valid implementation). Allow the JVM to benefit from spin loop specific behaviours that may be useful on certain hardware platforms. Provide both a no-op implementation and an intrinsic implementation in the JDK, and demonstrate an execution benefit on at least one major hardware platform.
Non-Goals
It is not a goal to look at performance hints beyond spin loops. Other performance hints, such as prefetch hints, are outside the scope of this JEP.
Motivation
Some hardware platforms benefit from software indication that a spin loop is in progress. Some common execution benefits may be observed:
1. The reaction time of a spin loop may be improved when a spin hint is used due to various factors, reducing thread-to-thread latencies in spinning wait situations;
2. The power consumed by the core or hardware thread involved in the spin loop may be reduced, benefiting the overall power consumption of a program, and possibly allowing other cores or hardware threads to execute at faster speeds within the same power consumption envelope.
While long-term spinning is often discouraged as a general user-mode programming practice, short-term spinning prior to blocking is a common practice (both inside and outside of the JDK). Furthermore, as core-rich computing platforms are commonly available, many performance- and latency-sensitive applications, such as the Disruptor, use a pattern that dedicates a spinning thread to a latency-critical function and may involve long-term spinning as well.
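The "short-term spinning prior to blocking" pattern can be sketched roughly as follows. This is an illustration only, not code from the JDK: the class name `SpinThenParkGate`, the `SPIN_LIMIT` budget, and the single-waiter restriction are all assumptions made for the example. The hint call used here is the `java.lang.Thread.onSpinWait()` method that this JEP proposes.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical one-shot gate for a single waiting thread (illustration only). */
final class SpinThenParkGate {
    // Assumed spin budget; a real value would be tuned per workload and platform.
    private static final int SPIN_LIMIT = 1_000;

    private volatile boolean opened;   // set once by open()
    private volatile Thread waiter;    // thread to unpark once it has stopped spinning

    /**
     * Spins briefly, then parks until open() is called.
     * Interruption is not handled: an interrupted waiter degrades to busy-waiting.
     */
    void await() {
        // Phase 1: short spin, hinting to the runtime that this is a spin loop.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (opened) {
                return;
            }
            Thread.onSpinWait();
        }
        // Phase 2: publish ourselves, then block. park() may return spuriously,
        // so the flag is re-checked in a loop.
        waiter = Thread.currentThread();
        while (!opened) {
            LockSupport.park(this);
        }
        waiter = null;
    }

    /** Opens the gate and wakes the waiter if it has already parked. */
    void open() {
        opened = true;
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    boolean isOpen() {
        return opened;
    }
}
```

Because both fields are volatile, either the waiter observes `opened` after publishing itself, or `open()` observes the published waiter and unparks it, so a wake-up should not be lost in this single-waiter sketch.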
As a practical example and use case, current x86 processors support a PAUSE instruction that can be used to indicate spinning behavior. Using a PAUSE instruction demonstrably reduces thread-to-thread round trips. Due to its benefits and widely recommended use, the x86 PAUSE instruction is commonly used in kernel spinlocks, in POSIX libraries that perform heuristic spins prior to blocking, and even by the JVM itself. However, due to the inability to hint that a Java loop is spinning, its benefits are not available to regular Java code.
We include specific supporting evidence: In simple tests performed on an E5-2697 v2, measuring the round-trip latency behavior between two threads that communicate by spinning on a volatile field, round-trip latencies were demonstrably reduced by 18-20 ns across a wide percentile spectrum (from the 10th to the 99.9th percentile). This reduction can represent an improvement as high as 35%-50% in best-case thread-to-thread communication latency, for example when two spinning threads execute on two hardware threads that share a physical CPU core and an L1 data cache. The full listing of the test may be found here.
The above image shows an example latency measurement comparing the reaction latency of a spin loop that includes an intrinsic spin hint (intrinsified as a PAUSE instruction) to the same loop executed without using a PAUSE instruction, along with measurements of the time it takes to perform an actual System.nanoTime() call to measure time.
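A measurement of the kind described above, two threads exchanging values by spinning on volatile fields, might be sketched as follows. This is not the test listing referred to in this JEP; the class `SpinPingPong`, its `run` method, and the round counts are hypothetical, and the sketch reports only a mean rather than a percentile distribution.

```java
/** Hypothetical two-thread ping-pong over volatile fields (illustration only). */
final class SpinPingPong {
    private static volatile long ping = -1;
    private static volatile long pong = -1;

    /** Runs the given number of round trips and returns the mean time per round trip in ns. */
    static long run(int rounds, boolean useHint) throws InterruptedException {
        ping = -1;
        pong = -1;
        // Responder: waits for each ping value, then echoes it back as pong.
        Thread responder = new Thread(() -> {
            for (long i = 0; i < rounds; i++) {
                while (ping != i) {
                    if (useHint) {
                        Thread.onSpinWait();
                    }
                }
                pong = i;
            }
        });
        responder.start();

        long start = System.nanoTime();
        // Initiator: publishes a ping value and spins until the matching pong arrives.
        for (long i = 0; i < rounds; i++) {
            ping = i;
            while (pong != i) {
                if (useHint) {
                    Thread.onSpinWait();
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        responder.join();
        return elapsed / rounds;
    }

    public static void main(String[] args) throws InterruptedException {
        int rounds = 1_000_000;
        run(rounds, true);   // warm-up pass so the JIT has compiled the loops
        System.out.println("with hint:    " + run(rounds, true) + " ns per round trip");
        System.out.println("without hint: " + run(rounds, false) + " ns per round trip");
    }
}
```

Results from a sketch like this depend heavily on thread placement, the hardware, and whether the JVM in use intrinsifies the hint, so any difference it shows should be treated as indicative only.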
Description
We propose to add a method to the JDK which would hint that a spin loop is being executed.
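As a minimal sketch of how such a hint might appear in application code, the example below spins on a volatile flag and calls `java.lang.Thread.onSpinWait()` on each iteration. The class name `EventHandler` and its `publish` and `waitForEvent` methods are hypothetical names chosen for illustration.

```java
/** Hypothetical event hand-off between two threads (illustration only). */
final class EventHandler {
    private String event;                     // written before the flag is set
    private volatile boolean eventReceived;   // volatile flag publishes 'event'

    /** Called by the producer thread. */
    void publish(String e) {
        event = e;
        eventReceived = true;   // volatile write makes 'event' visible to the spinner
    }

    /** Spins until an event has been published, then returns it. */
    String waitForEvent() {
        while (!eventReceived) {
            // Pure hint: the loop stays correct even if this call does nothing.
            Thread.onSpinWait();
        }
        return event;
    }

    public static void main(String[] args) throws InterruptedException {
        EventHandler handler = new EventHandler();
        Thread producer = new Thread(() -> handler.publish("hello"));
        producer.start();
        System.out.println(handler.waitForEvent());   // prints "hello"
        producer.join();
    }
}
```

The hint carries no semantic requirements, so removing the `Thread.onSpinWait()` call would leave the program's behavior unchanged; only its latency or power characteristics might differ.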
An empty method would be a valid implementation of the
java.lang.Thread.onSpinWait() method, but an intrinsic implementation is the
obvious goal for hardware platforms that can benefit from it. We intend to
produce an intrinsic x86 implementation for the JDK as part of this JEP. A
prototype implementation already exists and results from initial testing show
promise. Refer to JBS bug JDK-8147844
for pointers to webrevs with the proposed changes in class libraries and JVM.
Alternatives
JNI can be used to loop with a spin-loop-hinting CPU instruction; however, the JNI-boundary crossing overhead tends to be larger than the benefit provided by the instruction, at least where latency is concerned.
We could attempt to have the JIT compilers deduce spin-loop situations and automatically include spin-loop-hinting CPU instructions with no Java code hints required. We suspect that the complexity of automatically and reliably detecting spinning situations, coupled with questions about potential tradeoffs in using the hints on some platforms, would significantly delay the availability of viable implementations.
Testing
Testing of a "vanilla" no-op implementation will obviously be fairly simple.
We believe that given the very small footprint of this API, testing of an intrinsified x86 implementation will also be straightforward. We expect testing to focus on confirming both the code generation correctness and latency benefits of using the spin loop hint with an intrinsic implementation.
Should this API be accepted as a Java SE API (e.g., for inclusion in the java.* namespace in a future Java SE 9 or Java SE 10), we expect to develop associated TCK tests for the API for potential inclusion in the Java SE TCK.
Risks and Assumptions
The "vanilla" no-op implementation is obviously fairly low risk. An intrinsic x86 implementation will involve modifications to multiple JVM components and as such carries some risk, but no more than other simple intrinsics added to the JDK.