JEP draft: Virtual Threads (Preview)

Authors: Ron Pressler, Alan Bateman
Owner: Alan Bateman
Type: Feature
Scope: SE
Status: Draft
Component: core-libs
Created: 2021/11/15 16:43
Updated: 2021/11/18 14:02
Issue: 8277131

Summary

Drastically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications that make the best use of available hardware through virtual threads, a lightweight user-mode thread implementation with dramatically reduced costs. This is a Preview Feature.

Goals

Non-Goals

Motivation

Developers have been using Java widely for the past couple of decades to write concurrent applications such as servers, and threads, specifically java.lang.Thread, have served as their core building block. Threads work well to represent some application unit of concurrency, such as a transaction, because the platform and its tooling know about it and track it. The platform attaches troubleshooting context to exceptions in the form of a thread's stack trace, and thread dumps allow us to get a snapshot of what the program is doing by grabbing all the threads' stacks; the debugger allows us to step through the execution of a thread; Java Flight Recorder (JFR) emits events for analysis by profilers grouped by thread. These capabilities give us invaluable insight into the program, but only as long as the thread -- the platform's view of the program -- corresponds to the developer's logical view of the application, as, say, a collection of concurrent transactions.

Unfortunately, the current implementation of Thread consumes an OS thread for each Java thread, and OS threads are scarce and costly, much more so than, say, sockets. This means that a modern server can handle orders of magnitude more concurrent transactions than it has OS threads. Developers writing high-throughput server software have had to make effective use of hardware so as not to waste it, and so have had to share threads among transactions. First they used thread pools that would loan a thread to a transaction so as to save on the cost of creating a new thread for each one. Then, when that wasn't enough, because the OS simply cannot support as many concurrent threads as there are transactions, developers began returning threads to the pool even in the middle of a transaction, while it is waiting on I/O. This results in the asynchronous style of programming, which not only requires a separate and incompatible set of APIs, but also breaks the connection between the logical application unit (the transaction) and the platform's unit (the thread), which makes the platform unaware of the application's logical units. As a result, troubleshooting, observation, debugging, and profiling become very difficult, as the platform's context -- the thread -- no longer represents a transaction, and so is not very useful. Better hardware utilization is bought at the price of much more difficult development and maintenance, which is also a form of waste. Developers, then, are forced to choose between a natural style that models logical units of concurrency directly as threads but wastes considerable throughput that their hardware could support, and an asynchronous style that uses the hardware well but is at odds with the platform and its tooling.

Virtual threads -- user-mode implementations of java.lang.Thread -- give us the best of both worlds. When using the same synchronous APIs on virtual threads, those cheap threads block without blocking any precious OS threads. Hardware utilization is close to optimal, allowing a high level of concurrency and, as a result, high throughput, while the program remains harmonious with the thread-based design of the Java platform and its tooling. Virtual threads are to platform threads what virtual memory is to physical RAM: a mechanism that gives the illusion of a plentiful "virtual" resource through an automatic and efficient mapping to the underlying "physical" resource. Because virtual threads are cheap and plentiful, patterns of thread use are expected to change. For example, a server that, in the course of serving a request, consults two remote services, awaiting their responses concurrently, might today either submit two blocking HTTP client tasks to some thread-pool, or initiate two asynchronous HTTP client tasks that notify some callback upon completion. Instead, it could spawn two virtual threads, each doing nothing other than perform an HTTP client call on behalf of the transaction. This would be as efficient as the asynchronous option -- which, unlike the thread pool option, does not hold on to two precious OS threads for the duration of the requests -- and the code is not only as familiar and simple as the thread-pool option, but also safer, as threads are not shared by multiple tasks, risking thread-local pollution.
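The two-service example above can be sketched roughly as follows. This is only an illustration: callService is a hypothetical stand-in that sleeps instead of performing an HTTP call, the service names are invented, and the code assumes a JDK in which the virtual thread API is available without preview flags (JDK 21 or later).

```java
class FanOutSketch {
    // Hypothetical stand-in for a blocking HTTP client call; it only sleeps.
    static String callService(String name, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return name + "-interrupted";
        }
        return name + "-response";
    }

    // Serves one "request" by consulting two services concurrently,
    // one virtual thread per outgoing call.
    static String handleRequest() {
        String[] responses = new String[2];
        Thread first = Thread.ofVirtual().start(() -> responses[0] = callService("inventory", 50));
        Thread second = Thread.ofVirtual().start(() -> responses[1] = callService("pricing", 50));
        try {
            // join() makes the writes of each virtual thread visible to the caller.
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for responses", e);
        }
        return responses[0] + "," + responses[1];
    }

    public static void main(String[] args) {
        System.out.println(handleRequest());
    }
}
```

Each thread is used by exactly one task and then discarded, so no thread-local state can leak from one task to the next.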

There is no need to learn a new programming model to use virtual threads. Anyone who uses Java to write concurrent applications today already knows this model, which is pretty much the same one as Java's original programming model. But we will need to unlearn old habits -- complicated, suboptimal ones -- that arose out of necessity, not elegance, because of threads' high cost; in particular, the use of thread pools that, like all pools, are only useful when the resource they're pooling is scarce and/or costly to create.

Description

Virtual threads are instances of java.lang.Thread implemented by the JDK in a manner that allows a great many active instances to coexist in the same process. They can be created with the java.lang.Thread.Builder interface like so:

Thread thread = Thread.ofVirtual().name("duke").unstarted(runnable);

Whether the thread is virtual or not can be queried by the Thread::isVirtual method.
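A slightly fuller sketch of the builder call and of Thread::isVirtual, for readers who want something runnable. The class and method names are illustrative only, and the code assumes a JDK in which the API is available without preview flags (JDK 21 or later).

```java
import java.util.concurrent.atomic.AtomicBoolean;

class BuilderSketch {
    // Starts a named virtual thread, waits for it, and reports whether both the
    // Thread object and the code running inside it see the thread as virtual.
    static boolean runsOnVirtualThread() {
        AtomicBoolean observedVirtual = new AtomicBoolean();
        Runnable runnable = () -> observedVirtual.set(Thread.currentThread().isVirtual());

        // Same builder call as in the text: the thread is created but not yet started.
        Thread thread = Thread.ofVirtual().name("duke").unstarted(runnable);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the thread", e);
        }
        return thread.isVirtual() && observedVirtual.get();
    }

    public static void main(String[] args) {
        System.out.println("started thread is virtual: " + runsOnVirtualThread());
    }
}
```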

In practice, as is the case today, developers will rarely construct virtual threads directly using the builder, but will instead use constructs that abstract the creation of the threads, possibly taking an instance of a ThreadFactory created with the builder, like so:

ThreadFactory factory = Thread.ofVirtual().factory();
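One construct that accepts such a factory is a thread-per-task executor. The sketch below assumes Executors.newThreadPerTaskExecutor(ThreadFactory) and an AutoCloseable ExecutorService, both of which exist in JDK 21 but are not named at this point in the text; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

class FactorySketch {
    // Submits the given number of tasks and counts how many ran on a virtual thread.
    static int countVirtualTasks(int tasks) {
        ThreadFactory factory = Thread.ofVirtual().factory();
        Callable<Boolean> task = () -> Thread.currentThread().isVirtual();
        int virtualCount = 0;
        // The executor asks the factory for one new thread per submitted task;
        // close() (via try-with-resources) waits for the tasks to finish.
        try (ExecutorService executor = Executors.newThreadPerTaskExecutor(factory)) {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(executor.submit(task));
            }
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    virtualCount++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        }
        return virtualCount;
    }

    public static void main(String[] args) {
        System.out.println(countVirtualTasks(10) + " of 10 tasks ran on virtual threads");
    }
}
```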

As far as Java code is concerned, the semantics of virtual threads are identical to those of platform threads, except that they all belong to a single ThreadGroup and cannot be enumerated. However, native code called on such threads may observe different behavior; for example, when called multiple times on the same virtual thread, it may observe a different OS thread ID each time. In addition, OS-level monitoring will observe that the process uses fewer OS threads than the number of virtual threads created. Virtual threads are invisible to OS-level monitoring, as the OS is unaware of their existence.

The JDK implements virtual threads by storing their state, including the stack, on the Java heap. Virtual threads are scheduled by a scheduler in the Java class libraries, whose worker threads mount virtual threads on their backs when the virtual threads are executing, thus becoming their carriers. When a virtual thread parks -- say, when it blocks on some I/O operation or a java.util.concurrent synchronization construct -- it suspends, and the virtual thread's carrier is free to run any other task. When a virtual thread is unparked -- say, by an I/O operation completing -- it is submitted to the scheduler, which, when available, will mount and resume the virtual thread on some carrier thread, not necessarily the same one it ran on previously. In this way, when a virtual thread performs a blocking operation, instead of parking an OS thread, it is suspended by the JVM and another one scheduled in its place, all without blocking any OS threads (see the Limitations section).
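To make the effect of parking concrete, the following sketch starts many virtual threads that all block in Thread.sleep at the same time. The thread count and sleep duration are arbitrary choices, the names are illustrative, and the code assumes JDK 21 or later. Because a sleeping virtual thread releases its carrier, the wall-clock time should be close to one sleep interval rather than the sum of all of them, although the exact figure depends on the machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ParkingSketch {
    // Starts `count` virtual threads that each block for `sleepMillis`,
    // then returns how many of them ran to completion.
    static int runSleepers(int count, long sleepMillis) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(sleepMillis); // parks the virtual thread, not an OS thread
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sleepers", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runSleepers(10_000, 100);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " threads finished in about " + elapsedMillis + " ms");
    }
}
```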

While the carrier thread shares its corresponding OS thread with the virtual thread it mounts, from the perspective of Java code, the carrier and virtual threads are completely separate. The identity of the carrier is not known to the virtual threads, and the two threads’ stack traces are independent.

The JVM Tool Interface (JVM TI) can observe and manipulate virtual threads as it does platform threads, but some operations are not supported, as summarised below and detailed in the JVM TI spec. In particular, JVM TI cannot enumerate all virtual threads. Similarly, the debugger interface JDI supports most operations on virtual threads, but cannot enumerate them. JFR associates events occurring on a virtual thread with the virtual thread. Ordinary thread dumps will show all running platform threads and mounted virtual threads, but a new kind of thread dump is added, and will be described later.

java.lang.Thread API

The java.lang.Thread API is updated as follows:

The java.lang.Thread API is otherwise unchanged. The constructors defined by java.lang.Thread create platform threads as before. No new public constructors have been added.

The API differences between virtual and platform threads are:

Thread locals

Virtual threads support thread locals and inheritable thread-locals, just like platform threads, so they can run existing code that uses thread locals.
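A small sketch of this behavior, assuming JDK 21 or later and the default builder settings (under which a new thread inherits the initial values of inheritable thread locals). The class, field, and value names are illustrative.

```java
class ThreadLocalSketch {
    static final ThreadLocal<String> LOCAL = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    // Returns "<child's LOCAL>|<child's INHERITED>|<caller's LOCAL>".
    static String observe() {
        INHERITED.set("parent-value");
        String[] seen = new String[2];
        try {
            Thread child = Thread.ofVirtual().start(() -> {
                LOCAL.set("child-value");   // visible only inside this virtual thread
                seen[0] = LOCAL.get();
                seen[1] = INHERITED.get();  // copied from the creating thread
            });
            child.join();
            // The caller never set LOCAL, so it still sees no value here.
            return seen[0] + "|" + seen[1] + "|" + LOCAL.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        } finally {
            INHERITED.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```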

In preparation for virtual threads, many usages of thread locals have been eliminated from the java.base module. This should reduce some of the concerns with memory footprint when running with millions of virtual threads.

The Thread.Builder API defines a method to opt out of thread locals when creating a thread. It also defines a method to opt out of inheriting the initial value of inheritable thread-locals. When invoked from a thread that does not support thread locals, the ThreadLocal::get method returns the initial value, and the ThreadLocal::set method throws an exception.

The legacy context ClassLoader is re-specified to work like an inheritable thread local. If Thread::setContextClassLoader is invoked on a thread that does not support thread locals then an exception is thrown.

JEP: Scope Locals (Preview) proposes the addition of Scope Locals as a better alternative to thread locals for some use-cases.

java.util.concurrent APIs

LockSupport, the primitive API to support locking, has been updated to support virtual threads. If a virtual thread parks then it releases the underlying carrier thread to do other work if possible. Unparking a virtual thread submits it to the scheduler so that it is scheduled to continue. The update to LockSupport enables all APIs that use it (Locks, Semaphores, blocking queues, ...) to park gracefully when used in virtual threads.
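As an illustration, the sketch below hands values from one virtual thread to another through a SynchronousQueue, a blocking queue in which every put waits for a matching take. The choice of queue and all names are assumptions made for the example, which targets JDK 21 or later.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.atomic.AtomicInteger;

class HandoffSketch {
    // A producer passes the integers 1..items to a consumer, one rendezvous at a time,
    // and the consumer's running total is returned.
    static int sumViaHandoff(int items) {
        SynchronousQueue<Integer> queue = new SynchronousQueue<>();
        AtomicInteger sum = new AtomicInteger();

        Thread producer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks until the consumer takes the value
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    sum.addAndGet(queue.take()); // blocks until the producer offers a value
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for handoff", e);
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println("sum = " + sumViaHandoff(100));
    }
}
```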

A small number of APIs are added:

Networking APIs

The implementations of the networking APIs defined in the java.net and java.nio.channels API packages have been updated to work with virtual threads. An operation that blocks, e.g. establishing a network connection or reading from a socket, will release the underlying carrier thread to do other work.

To allow for interruption and cancellation, the blocking I/O methods defined by java.net.Socket, java.net.ServerSocket and java.net.DatagramSocket have been re-specified to be interruptible when invoked in the context of a virtual thread. Interrupting a virtual thread blocked on a socket will unpark the thread and close the socket.
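The following sketch tries to demonstrate that behavior with a ServerSocket on the loopback interface: a virtual thread blocks in accept, nothing ever connects, and the thread is then interrupted. It assumes JDK 21 or later and the system-default socket implementation; the 200 ms pause is only a crude way of letting the thread reach accept, and all names are illustrative.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class InterruptSketch {
    // Interrupts a virtual thread blocked in accept() and describes what it observed.
    static String interruptBlockedAccept() {
        AtomicReference<String> outcome = new AtomicReference<>("not-run");
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            CountDownLatch aboutToAccept = new CountDownLatch(1);
            Thread acceptor = Thread.ofVirtual().start(() -> {
                aboutToAccept.countDown();
                try {
                    server.accept(); // no client ever connects, so this blocks
                    outcome.set("accepted");
                } catch (IOException e) {
                    // Record the exception type and whether the socket was closed.
                    outcome.set(e.getClass().getSimpleName() + ", closed=" + server.isClosed());
                }
            });
            aboutToAccept.await();
            Thread.sleep(200); // give the virtual thread time to block in accept()
            acceptor.interrupt();
            acceptor.join();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("sketch could not run", e);
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(interruptBlockedAccept());
    }
}
```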

java.io APIs

The java.io package provides APIs for streams of bytes and characters. The implementations of these APIs are heavily synchronized and require changes to avoid pinning when using these APIs from virtual threads.

As background, the byte-oriented input/output streams are not specified to be thread-safe and do not specify the expected behavior when close is invoked while a thread is blocked in a read or write method. In most scenarios it doesn't make sense to use an input or output stream from concurrent threads. The character-oriented reader/writers are also not specified to be thread-safe but they do expose a lock object for sub-classes. Aside from pinning, the synchronization is problematic and inconsistent, e.g. the stream encoder/decoders used by InputStreamReader and OutputStreamWriter synchronize on the stream rather than the lock object.

As a workaround, to avoid pinning, the implementations are changed as follows:

Going further and eliminating the locking is beyond the scope of this JEP. A future JEP may re-examine all the locking in this area.

In addition to the changes to locking, the initial sizes of the buffers used by BufferedOutputStream, BufferedWriter, and the underlying stream encoder for OutputStreamWriter are changed to reduce memory usage when there are many output streams or writers in the heap (as might arise if there are 1M virtual threads, each with a buffered stream on a socket connection).

Scheduler

The scheduler for virtual threads is a work-stealing ForkJoinPool that works in first-in-first-out (async) mode, with parallelism set to the number of available processors.

Some blocking APIs temporarily pin the carrier thread, e.g. most file I/O operations. The implementations of these APIs will compensate for the pinning by temporarily expanding parallelism by means of the ForkJoinPool "managed blocker" mechanism. Consequently, the number of carrier threads may temporarily exceed the number of available processors.

The scheduler may be configured, for tuning purposes, with two system properties:

Java Native Interface (JNI)

JNI has been updated to define one new function, IsVirtualThread, to test if an object is a virtual Thread. The JNI specification is otherwise unchanged.

Debugger

The debugger architecture consists of three interfaces, namely the JVM Tool Interface (JVM TI), the Java Debug Wire Protocol (JDWP), and the Java Debug Interface (JDI). All three interfaces have been updated to support virtual threads.

JVM TI has been significantly updated as follows:

Existing JVM TI agents will mostly work as before but may encounter errors if they invoke functions that are not supported on virtual threads. This will arise when a "virtual thread unaware" agent is used with an application that uses virtual threads. The change to GetAllThreads to return an array containing only the platform threads may also be an issue for some agents. There may also be performance issues for existing agents that enable the ThreadStart/ThreadEnd events as they lack the ability to limit the events to only platform threads.

JDWP is updated as follows:

JDI is updated as follows:

As noted above, virtual threads are not considered to be active threads in a thread group. Consequently, the JVM TI function GetThreadGroupChildren, the JDWP command ThreadGroupReference/Children, and the JDI method com.sun.jdi.ThreadGroupReference::threads return a list of the platform threads in the thread group; they do not return virtual threads.

Java Flight Recorder (JFR)

JFR is updated to support virtual threads. A number of new events are added:

Troubleshooting and Diagnosability

A new thread dump implementation is added that supports virtual threads in addition to the platform threads. Virtual threads that are blocked in network I/O operations, or created by the "new thread per task" ExecutorService listed above, are included in the thread dump. The new thread dump does not include object addresses, locks, JNI stats, heap stats, and other information that appear in a regular HotSpot VM thread dump. The new thread dump outputs JSON format to make it easy to parse. The JSON output has an array of "thread containers" with one for each thread pool (ThreadPoolExecutor, ForkJoinPool) and thread-per-task executor.

A new method/operation is added to com.sun.management.HotSpotDiagnosticMXBean to generate thread dumps with the new implementation. This can be used directly, or indirectly via the platform MBeanServer from a local or remote JMX tool.
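A sketch of direct use from Java code follows. The draft does not name the method; the code assumes the dumpThreads(String, ThreadDumpFormat) operation found on com.sun.management.HotSpotDiagnosticMXBean in JDK 21 and later, which expects an absolute path to a file that does not yet exist. The temporary directory and file name are arbitrary.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class ThreadDumpSketch {
    // Writes a JSON thread dump to a fresh temporary file and returns its contents.
    static String dumpThreadsAsJson() {
        try {
            Path dir = Files.createTempDirectory("thread-dump");
            Path file = dir.resolve("threads.json"); // must not exist yet
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpThreads(file.toAbsolutePath().toString(),
                    HotSpotDiagnosticMXBean.ThreadDumpFormat.JSON);
            return Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = dumpThreadsAsJson();
        System.out.println(json.length() + " characters of JSON written");
    }
}
```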

A new command is added to jcmd to use the new thread dump implementation:

jcmd <pid> JavaThread.dump -format=json <file>

As listed in the Java Flight Recorder section, a JFR event is emitted when a thread is pinned when attempting to park with a native frame on the stack or while holding a monitor. A development-time system property, jdk.tracePinnedThreads, is added to print a stack trace to System.out when a thread is pinned. Running with -Djdk.tracePinnedThreads=full prints a complete stack trace when a thread is pinned with the native frames and frames holding monitors highlighted. Running with -Djdk.tracePinnedThreads=short limits the output to just the problematic frames.

Degrade java.lang.ThreadGroup API

java.lang.ThreadGroup is a legacy API for grouping threads that is rarely used in modern applications and not the right API for grouping virtual threads. It is significantly deprecated and degraded to "make space" to introduce a new construct for organizing threads in the future (see Structured Concurrency).

As background, the ThreadGroup API dates from JDK 1.0 and was intended as a form of job control for threads, e.g. "stop all threads". Modern code is more likely to use the thread pool APIs provided by the java.util.concurrent API since Java 5. ThreadGroup supported the isolation of applets in early JDK releases. The Java security architecture evolved significantly in Java 1.2, after which thread groups no longer had a significant role. ThreadGroup was also intended to be useful for diagnostic purposes, but that aspect has been superseded by the monitoring and management support and the java.lang.management API since Java 5. Aside from relevance, the ThreadGroup API and implementation have a number of significant problems, including:

ThreadGroup is re-specified, deprecated, and degraded as follows:

Limitations

There are situations when the VM cannot suspend a virtual thread, in which case it is said to be pinned. Currently, there are two:

  1. When a native method is currently executing in the virtual thread (even if it is calling back into Java)
  2. When a native monitor is held by the virtual thread, meaning it is currently executing inside a synchronized block or method.

The first limitation is here to stay, while the second might be removed in the future.

When a virtual thread tries to park, say, by performing a blocking I/O operation, while pinned, rather than released, its underlying OS thread will be blocked for the duration of the operation. For this reason, very frequent pinning for long durations might harm the scalability of virtual threads.

Therefore, to gain the most out of virtual threads, synchronized blocks or methods that are run frequently and guard potentially long I/O operations should be replaced with java.util.concurrent.locks.ReentrantLock. There is no need to replace synchronized blocks and methods that are infrequent (say, only performed at startup) or guard in-memory operations, although it's always a good idea to consider java.util.concurrent.locks.StampedLock for the latter case. As always, keeping a locking policy simple and clear should be a priority.
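A minimal sketch of such a replacement, assuming JDK 21 or later. The class is invented for illustration and slowIo is a hypothetical blocking operation simulated with Thread.sleep; the point is only the shape of the lock/try/finally idiom that takes the place of a synchronized method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class LockMigrationSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int requests;

    // Hypothetical long-running blocking operation, simulated with a sleep.
    private void slowIo() throws InterruptedException {
        Thread.sleep(5);
    }

    // Before migration this would have been: synchronized void handle() { ... }
    // A virtual thread blocked while holding a ReentrantLock can still be unmounted.
    void handle() throws InterruptedException {
        lock.lock();
        try {
            slowIo();
            requests++;
        } finally {
            lock.unlock();
        }
    }

    // Runs handle() once on each of `threads` virtual threads and returns the final count.
    static int run(int threads) {
        LockMigrationSketch service = new LockMigrationSketch();
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                try {
                    service.handle();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        try {
            for (Thread thread : started) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return service.requests;
    }

    public static void main(String[] args) {
        System.out.println("handled " + run(50) + " requests");
    }
}
```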

To assist in migration and help assess whether a particular use of synchronized should be considered for replacement with a j.u.c lock, the JFR event jdk.VirtualThreadPinned will be emitted when a virtual thread attempts to park while pinned (with a default threshold of 20ms). See also the Troubleshooting and Diagnosability section for further diagnostics of pinning.

Alternatives

Testing

The existing tests in the OpenJDK repository will be used to ensure that the changes do not cause any unexpected regressions in the multitude of configurations and execution modes in which they are run.

Risks and Assumptions

The primary risks of this proposal are ones of compatibility due to changes in existing APIs and their implementation:

In addition, there are several behavioural differences between platform and virtual threads that may be observed when using existing code with newer code that takes advantage of virtual threads or the new APIs:

Tooling based on the JVM Tool Interface (JVM TI) may also observe differences, as a number of functions are not supported on virtual threads. The JVM TI spec has more details.

Dependencies