JEP draft: Re-implement ThreadGroup

Owner: Alan Bateman
Type: Feature
Scope: SE
Status: Draft
Component: core-libs
Created: 2020/09/07 18:22
Updated: 2020/11/24 17:50
Issue: 8252885

Summary

Re-implement java.lang.ThreadGroup to address its most significant flaws. Remove the ability to explicitly destroy a thread group. Remove the notion of a daemon thread group. Degrade or remove the methods that are already deprecated and terminally deprecated, but otherwise leave the ThreadGroup API "as is".

Non-Goals

It is not a goal of this JEP to terminally deprecate or remove ThreadGroup. Removing ThreadGroup would be a disruptive change and is discussed further in the Alternatives section below. This JEP does not preclude deprecating, terminally deprecating, or removing ThreadGroup in the future.

It is not a goal to modernize or add new features to the ThreadGroup API.

Description

From the API docs:

"A thread group represents a set of threads. In addition, a thread group can also include other thread groups. The thread groups form a tree in which every thread group except the initial thread group has a parent."

The java.lang.ThreadGroup API dates from JDK 1.0. The API appears to have been influenced by Unix process groups of the time and was intended as a form of job control for threads, e.g. "stop all threads". Modern code is more likely to use the thread pool executors provided by the java.util.concurrent API.

ThreadGroup supported the isolation of applets in early JDK releases. The Java security architecture evolved significantly in Java 1.2, after which thread groups no longer had a significant role.

ThreadGroup was also intended to be useful for diagnostic purposes, but that role has been superseded since Java 5 by the monitoring and management support, including the java.lang.management API.
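
As an illustration of that alternative, the following minimal sketch lists live threads through java.lang.management.ThreadMXBean rather than by walking thread groups. The class and method names (ThreadDiagnostics, liveThreadCount, printThreads) are hypothetical and not part of this proposal.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDiagnostics {

    // Number of live threads (daemon and non-daemon) as reported by the JVM.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Prints the name and state of each live thread without going through ThreadGroup.
    static void printThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.getThreadInfo(mx.getAllThreadIds());
        for (ThreadInfo info : infos) {
            // An entry is null if that thread terminated after getAllThreadIds() returned.
            if (info != null) {
                System.out.println(info.getThreadName() + " " + info.getThreadState());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreadCount());
        printThreads();
    }
}
```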

Problems with ThreadGroup

Aside from relevance, the ThreadGroup API and implementation have a number of significant problems:

  1. The API and mechanism to destroy thread groups is flawed. To avoid a memory leak, users need to explicitly destroy a thread group when it is empty and no longer needed, or set its daemon status so that the group is automatically destroyed when the last thread in the group terminates. Often there is no reliable point at which to destroy a thread group or set its daemon status, for example:

    • ThreadGroup::destroy is not an atomic operation. It can fail with IllegalThreadStateException after destroying some, but not all, subgroups.

    • Threads can be created, but not started, before their thread group is destroyed. This can lead to Thread::start failing with an undocumented IllegalThreadStateException.

    • If setDaemon(true) is called after starting threads in the group then it races with the termination of the last thread in the group. If the last thread terminates before setDaemon is called then the group will not be destroyed.

    • If setDaemon(true) is called before starting threads in the group then it also races with thread termination when there is more than one thread in the group. Thread termination may destroy the group before the remaining threads have been created or started.

  2. The implementation keeps a reference to every live thread in the group. This adds synchronization and contention overhead to thread creation, start, and termination. ThreadGroup maintains an internal "threads" array that increases the likelihood of Thread objects being on the same cache line (hence the padding in Thread to avoid false sharing on the mutable fields used by java.util.concurrent.ThreadLocalRandom).

  3. ThreadGroup defines suspend, resume, and stop methods that are inherently deadlock-prone and unsafe because they invoke the deadlock-prone and unsafe methods of the same names defined by java.lang.Thread.

  4. ThreadGroup defines enumerate methods that are inherently racy and flawed. These methods cannot return a complete snapshot of all threads when called with an undersized array, so they must be called in a loop until the returned thread count is less than the array size.

  5. ThreadGroup defines several methods that should have been final (activeCount, enumerate, isDestroyed, ...).

  6. ThreadGroup has a number of concurrency bugs, e.g. the daemon and maxPriority fields are accessed without synchronization.
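
The retry loop that item 4 forces on callers can be sketched as follows. GroupSnapshot and threadsIn are hypothetical names, and the initial headroom of 8 and the doubling growth are arbitrary choices. Even with the loop, the result is only a best-effort snapshot, because threads can start or terminate between activeCount and enumerate.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class GroupSnapshot {

    // Returns the threads in `group` and its subgroups. enumerate() silently
    // drops threads that do not fit, so a count equal to the array length may
    // be incomplete: retry with a bigger array until the count is strictly
    // less than the array length.
    static Thread[] threadsIn(ThreadGroup group) {
        // activeCount() is only an estimate; the +8 headroom is arbitrary.
        int size = group.activeCount() + 8;
        while (true) {
            Thread[] threads = new Thread[size];
            int n = group.enumerate(threads, true); // true: recurse into subgroups
            if (n < threads.length) {
                return Arrays.copyOf(threads, n);
            }
            size *= 2; // array was filled, so some threads may have been missed
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadGroup group = new ThreadGroup("workers");
        CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[3];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(group, () -> {
                try {
                    release.await(); // keep the worker alive until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread t : threadsIn(group)) {
            System.out.println(t.getName());
        }
        release.countDown();
        for (Thread t : workers) {
            t.join();
        }
    }
}
```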

Usages of ThreadGroup

A search of 100,000 artifacts on Maven Central found 2394 unique artifacts with compiled code referencing java.lang.ThreadGroup. Many of the usages sampled have "ThreadFactory" in the class name and appear to just create threads. Several usages sampled are classes that extend Thread and define constructors that take a ThreadGroup.

The following is the usage count of specific methods from the search. The usages of Thread.activeCount and Thread.enumerate are included as these methods delegate to ThreadGroup.

Method                                    Usages     Notes
suspend / resume / allowThreadSuspension  0 / 0 / 0
stop                                      3
interrupt                                 420
list                                      10
setDaemon / isDaemon                      154 / 12
destroy / isDestroyed                     57 / 71
setMaxPriority / getMaxPriority           11 / 132
activeCount                               744
activeGroupCount                          325
enumerate                                 651        Mix of enumerate(Thread[]) and enumerate(ThreadGroup[])
Thread.activeCount                        175        Delegates to ThreadGroup.activeCount
Thread.enumerate                          134        Delegates to ThreadGroup.enumerate(Thread[])

The search does not provide insight into the age or relevance of the artifacts, but it does show which methods have some usage and which are rarely or never used. In particular:

  1. There are no usages of the suspend, resume or allowThreadSuspension methods and only 3 usages of the stop method. These problematic methods could be removed with little impact.
  2. The activeCount and enumerate methods are used (usually in conjunction with each other). Removing these methods would be disruptive for at least some libraries and tools.

Debugger support

Changes to ThreadGroup need to take debugger support into account:

In addition to debugger support, changes to ThreadGroup need to take Java Flight Recorder (JFR) into account, because the JFR API for consuming recorded data provides access to the data recorded about threads and thread groups.

Project Loom and ThreadGroup

Project Loom aims to drastically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications that make the best use of available hardware. The foundation in the current design and prototype is virtual threads that are scheduled by the Java virtual machine rather than the operating system. The current proposal uses the existing java.lang.Thread API which necessitates addressing a number of issues with both Thread and ThreadGroup. In the current prototype, the relationship between virtual threads and thread groups is as follows:

  1. Virtual threads are not active threads of a thread group. Thread::getThreadGroup returns a placeholder "VirtualThreads" thread group that cannot be destroyed. Its activeCount() method returns 0, so it appears empty. There is no way to create virtual threads in other thread groups.
  2. By default, threads are created in the same thread group as their parent. If code executing in the context of a virtual thread creates a kernel thread then it is currently created in a sub-group of "VirtualThreads".

Proposed Changes

  1. Remove the ability to explicitly destroy a thread group. This impacts the destroy and isDestroyed methods which can be degraded for a release or two prior to removal. In degraded form, the destroy method is changed to be a no-op or unconditionally throw, and the isDestroyed method changed to always return false.
  2. Remove the notion of a daemon thread group that is automatically destroyed when the last thread in the group terminates. This impacts the setDaemon and isDaemon methods which can be degraded for a release or two prior to removal. In degraded form, the setDaemon method is changed to be a no-op or unconditionally throw, and the isDaemon method changed to always return false.
  3. Re-specify Thread to allow a thread group to be eligible for garbage collection when there are no live threads in the group and nothing else is keeping the thread group alive.
  4. Change the implementation to not keep a reference to the threads in the group, meaning its internal "threads" array goes away. The activeCount and enumerate methods are re-implemented to take a snapshot of the VM thread list. This change should be transparent to users of the API.
  5. Degrade the suspend, resume, and stop methods to unconditionally throw UnsupportedOperationException, or remove them.
  6. If these methods are removed, the allowThreadSuspension method will also be removed.
  7. Fix the concurrency bugs by way of re-implementation.
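
The snapshot-based counting in change 4 can be pictured with the following minimal sketch. It is not the JDK implementation: SnapshotCounts is a hypothetical class, and Thread.getAllStackTraces() is assumed here only as a public stand-in for the VM's internal thread list. Because that call also captures every thread's stack, it is far more expensive than a real VM snapshot would be. The point is that the group needs no "threads" array; membership is derived from each thread's group, and ThreadGroup.parentOf covers subgroups.

```java
public class SnapshotCounts {

    // Sketch only: counts live threads in `group` and its subgroups by
    // filtering a snapshot of all threads, so the group itself keeps no
    // references to its threads. Thread.getAllStackTraces() is a stand-in
    // (assumption) for the VM's internal thread list.
    static int activeCount(ThreadGroup group) {
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ThreadGroup g = t.getThreadGroup(); // may be null if t has terminated
            // parentOf(g) is true if group is g itself or an ancestor of g
            if (g != null && group.parentOf(g)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ThreadGroup current = Thread.currentThread().getThreadGroup();
        System.out.println(current.getName() + ": " + activeCount(current));
    }
}
```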

The proposal does not deprecate the flawed enumerate methods. It would not be hard to define better methods to enumerate the set of threads or subgroups, but ThreadGroup is legacy and not interesting for modern code.

The proposed changes do nothing about the "should be final" methods; making them final is not worth the disruption.

The proposed changes require adjustments to the implementation of 3 JVM TI functions, and a small update to the specification of the JVM TI GetThreadGroupInfo function, but otherwise have no impact on the debugger support.

Preparatory Changes

In advance of the changes, the stop, destroy, isDestroyed, setDaemon and isDaemon methods will be terminally deprecated.

The suspend, resume, and allowThreadSuspension methods are already terminally deprecated. These methods could be removed in advance of the changes proposed in this JEP, although this is not critical.

Alternatives

Deprecate, terminally deprecate, and eventually remove ThreadGroup. This would be a disruptive change, as at least some tools use it (including Apache Ant and the jtreg test harness used by the JDK tests). The debugger support is deeply tied to thread groups, meaning ThreadGroup cannot be significantly degraded or removed without also providing migration paths for debuggers and other tools that use JVM TI or JDI. The proposed changes do not preclude revisiting this alternative in the future.

Reduce the impact of removing the ability to destroy a thread group. The main compatibility impact in the proposal is on code that depends on destroying a thread group. Several alternatives to this aspect of the proposal have been explored:

  1. Keep the daemon status and change isDestroyed to return true if the thread group is a daemon thread group and activeCount returns 0. This alternative was ruled out because it creates inconsistencies, such as not preventing a thread from being created or started in a "destroyed" thread group. It would also mean that isDestroyed could return false some time after it had returned true.
  2. Change isDestroyed to return true if destroy has been invoked. This alternative was ruled out because it creates the same inconsistencies as the previous alternative. Fixing the anomalies or races would require thread creation or start to coordinate with the thread group.

Do nothing. This alternative is problematic for Project Loom because the introduction of virtual threads necessitates specifying the behavior of their thread group (e.g. an early prototype had to specify that the thread group could not be destroyed).

Risks and Assumptions

The following are the risks and compatibility issues that have been identified with the proposed changes. Preliminary testing of the changes with tools that use ThreadGroup in anger has not run into any of these issues.

  1. Code that depends on destroying a thread group may be impacted, e.g.
    • There may be code that waits for a group to be destroyed with a loop like this: while (!group.isDestroyed()) { ... } This code would loop forever with the proposed change.
    • There may be code that invokes destroy and catches IllegalThreadStateException to detect that there are threads still running. The exception will not be thrown with the proposed change.
  2. Code that depends on finding a thread group by name may be impacted, e.g. it is possible that code enumerates a thread group to find a subgroup by name. In that scenario, the subgroup may have been GC'ed so the search will fail.
  3. Performance. The performance of activeCount will regress. One data point is a group with two subgroups, each with 100 threads. With the existing ThreadGroup implementation, activeCount takes around 45 ns/op on a 2.6 GHz Intel i7. With the proposal, it takes about 4 µs/op on the same system, roughly 90 times slower. The performance of enumerate(Thread[]) may regress in some cases, although it may be faster than the existing implementation in others.
  4. Code that depends on suspend, resume, or stop will fail (at run-time if they are changed to throw UnsupportedOperationException or at compile-time if these methods are removed).
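
For the first risk, one possible migration is sketched below, under the assumption that the waiting code starts the threads itself. Instead of polling isDestroyed (which would loop forever once it always returns false) or relying on destroy throwing IllegalThreadStateException, the code keeps references to the threads it starts and joins them. JoinInsteadOfDestroy and runAndWait are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class JoinInsteadOfDestroy {

    // Hypothetical replacement for a loop such as
    //     while (!group.isDestroyed()) { ... }
    // Instead of polling the group, keep the threads you start and join them.
    static void runAndWait(ThreadGroup group, List<Runnable> tasks) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread t = new Thread(group, task);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // returns once t has terminated
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> tasks = List.of(() -> done.incrementAndGet(), () -> done.incrementAndGet());
        runAndWait(new ThreadGroup("batch"), tasks);
        System.out.println(done.get()); // prints 2
    }
}
```

Holding references to the threads (or to the group) also sidesteps the second risk, since code that keeps what it needs does not have to find a subgroup by name after it may have been garbage collected.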