JEP draft: Re-implement ThreadGroup

Owner       Alan Bateman
Type        Feature
Scope       SE
Status      Draft
Component   core-libs
Created     2020/09/07 18:22
Updated     2020/09/23 08:01
Issue       8252885

Summary

Re-implement, and partly re-specify, java.lang.ThreadGroup to address its most significant flaws. Remove the ability to explicitly destroy a thread group and degrade the methods that are already deprecated or terminally deprecated, but otherwise leave the ThreadGroup API "as is".

Non-Goals

It is not a goal of this JEP to terminally deprecate or remove ThreadGroup (see Alternatives below for more on this). It is also not a goal to modernize or add new features to the ThreadGroup API.

Description

From the API docs:

"A thread group represents a set of threads. In addition, a thread group can also include other thread groups. The thread groups form a tree in which every thread group except the initial thread group has a parent."

The java.lang.ThreadGroup API dates from JDK 1.0. It appears to have been influenced by the Unix process groups of the time and was intended as a form of job control for threads, e.g. "stop all threads".

ThreadGroup is mildly useful for compartmentalisation, e.g. one ThreadGroup for the threads with green stars on their bellies, another for the threads without stars. Modern code is more likely to use the thread pool executors, provided by the java.util.concurrent API, so thread groups are less useful than they were in the early JDK releases.

ThreadGroup was also intended to be useful for diagnostic purposes, but that aspect has been superseded by the monitoring and management support, and the java.lang.management API, available since Java 5.
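As an illustration of the diagnostic role that java.lang.management now covers, the following is a minimal sketch that lists live threads through ThreadMXBean rather than by walking thread groups. The class name ThreadDiagnostics and the method describeLiveThreads are hypothetical names chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class ThreadDiagnostics {

    /** Returns one line per live thread: id, name, and state. */
    static List<String> describeLiveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // An entry is null if the thread exited after its id was read.
        ThreadInfo[] infos = threads.getThreadInfo(threads.getAllThreadIds());
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : infos) {
            if (info != null) {
                lines.add(info.getThreadId() + " " + info.getThreadName()
                        + " " + info.getThreadState());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeLiveThreads().forEach(System.out::println);
    }
}
```

The result is still only a snapshot, but it does not require a ThreadGroup or a pre-sized array.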

Problems with ThreadGroup

Aside from relevance, the ThreadGroup API and implementation have a number of significant problems:

  1. The API and mechanism to destroy thread groups is flawed. To avoid a memory leak, users need to explicitly destroy a thread group when it is empty and no longer needed, or set its daemon status so that the group is automatically destroyed when the last thread in the group terminates. There is often no reliable point at which to destroy a thread group or set its daemon status, for example:

    • ThreadGroup::destroy is not an atomic operation. It can fail with IllegalThreadStateException after destroying some, but not all, subgroups.

    • Threads can be created, but not started, before their thread group is destroyed. This can lead to Thread::start failing with an undocumented IllegalThreadStateException.

    • If setDaemon(true) is called after starting threads in the group then it races with the termination of the last thread in the group. If the last thread terminates before setDaemon is called then the group will not be destroyed.

    • If setDaemon(true) is called before starting threads in the group then it also races with thread termination when there is more than one thread in the group. Thread termination may destroy the group before the remaining threads have been created or started.

  2. The implementation has a reference to all live threads in the group. This adds synchronization and contention overhead to thread creation, start, and termination. ThreadGroup maintains an internal "threads" array that increases the likelihood of Thread objects being on the same cache line (hence the padding in Thread to avoid false sharing on the mutable fields used by java.util.concurrent.ThreadLocalRandom).

  3. Defines suspend/resume/stop methods that are inherently unsafe.

  4. Defines enumerate methods that are inherently racy.

  5. Defines several methods that should have been final (activeCount, enumerate, isDestroyed, ...).

  6. ThreadGroup has a number of concurrency bugs, e.g. the daemon and maxPriority fields are accessed without synchronization.
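The raciness of the enumerate methods (item 4 above) shows up in the idiom that callers typically need in order to use them. The following is a minimal sketch of that idiom; EnumerateSnapshot, activeThreads, and startParked are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of the JDK.
class EnumerateSnapshot {

    /**
     * Best-effort copy of the active threads in a group and its subgroups.
     * activeCount() is only an estimate and enumerate silently drops threads
     * that do not fit, so retry with a larger array until there is spare room.
     */
    static Thread[] activeThreads(ThreadGroup group) {
        int size = group.activeCount() + 4;
        while (true) {
            Thread[] buffer = new Thread[size];
            int copied = group.enumerate(buffer, true);
            if (copied < buffer.length) {
                return Arrays.copyOf(buffer, copied);
            }
            size *= 2; // the array may have been too small; try again
        }
    }

    /** Starts daemon threads in the group that block until the returned latch is released. */
    static CountDownLatch startParked(ThreadGroup group, int count) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(group, () -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, group.getName() + "-" + i);
            t.setDaemon(true);
            t.start();
        }
        return release;
    }

    public static void main(String[] args) {
        ThreadGroup group = new ThreadGroup("demo");
        CountDownLatch release = startParked(group, 3);
        for (Thread t : activeThreads(group)) {
            System.out.println(t.getName());
        }
        release.countDown();
    }
}
```

Even with the retry loop, the result can be stale by the time it is returned, because threads may start or terminate at any moment.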

Usages of ThreadGroup

A search of 100,000 artifacts on Maven Central found 2394 unique artifacts with compiled code referencing java.lang.ThreadGroup. Many of the usages sampled have "ThreadFactory" in the class name and appear to just create threads. Several usages sampled are classes that extend Thread and define constructors that take a ThreadGroup.
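For readers unfamiliar with the pattern, the following is a minimal sketch of the kind of ThreadFactory usage described above: a factory that does little more than create threads in a dedicated group. It is not taken from any of the sampled artifacts, and the class name GroupedThreadFactory is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not taken from a real artifact.
class GroupedThreadFactory implements ThreadFactory {
    private final ThreadGroup group;
    private final AtomicInteger counter = new AtomicInteger();

    GroupedThreadFactory(String groupName) {
        this.group = new ThreadGroup(groupName);
    }

    @Override
    public Thread newThread(Runnable task) {
        // The group is used only as a label and container for the new thread.
        Thread t = new Thread(group, task,
                group.getName() + "-" + counter.incrementAndGet());
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(2, new GroupedThreadFactory("workers"));
        Future<String> name =
                pool.submit(() -> Thread.currentThread().getThreadGroup().getName());
        System.out.println(name.get());
        pool.shutdown();
    }
}
```

Code of this shape never destroys, enumerates, or suspends the group, so it would not be expected to depend on the parts of the API discussed in this JEP.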

The following is the usage count of specific methods from the search. The usages of Thread.activeCount and Thread.enumerate are included as these methods delegate to ThreadGroup.

Method                                     #Usages      Notes
suspend / resume / allowThreadSuspension   0 / 0 / 0
stop                                       3
interrupt                                  420
list                                       10
setDaemon / isDaemon                       154 / 12
destroy / isDestroyed                      57 / 71
setMaxPriority / getMaxPriority            11 / 132
activeCount                                744
activeGroupCount                           325
enumerate                                  651          Mix of enumerate(Thread[]) and enumerate(ThreadGroup[])
Thread.activeCount                         175          Delegates to ThreadGroup activeCount
Thread.enumerate                           134          Delegates to ThreadGroup enumerate(Thread[])

The search doesn't provide insight into the age or relevance of the artifacts but it at least shows which methods have some usage (and which methods are rarely or never used).

Debugger support

Changes to ThreadGroup need to take debugger support into account.

In addition to the debugger, changes to ThreadGroup need to take Java Flight Recorder (JFR) into account because its API for consuming JFR data has support for accessing data recorded about threads and thread groups.

Project Loom and ThreadGroup

Project Loom aims to drastically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications that make the best use of available hardware. The foundation in the current design and prototype is virtual threads that are scheduled by the Java virtual machine rather than the operating system. The current proposal uses the existing java.lang.Thread API which necessitates addressing a number of issues with both Thread and ThreadGroup. In the current prototype, the relationship between virtual threads and thread groups is as follows:

  1. Virtual threads are not active threads of a thread group. Thread::getThreadGroup returns a dummy "VirtualThreads" thread group that cannot be destroyed. Its activeCount() method returns 0, so the group appears empty. There is no way to create virtual threads in other thread groups.
  2. By default, threads are created in the same thread group as their parent. If code executing in the context of a virtual thread creates a kernel thread then it is currently created in a sub-group of "VirtualThreads".

Proposed Changes

  1. Remove the ability to explicitly destroy a thread group. By default a ThreadGroup will be eligible to be GC'ed when there are no live threads in the group and there is nothing else keeping the group alive. The destroy method is re-specified to be a no-op and the isDestroyed method re-specified to return false.
  2. The notion of daemon thread group is re-specified to mean it is weakly reachable from the parent. A non-daemon thread group is strongly reachable from its parent. Newly created thread groups are daemon thread groups. The system and main thread groups created during VM startup are created as non-daemon thread groups.
  3. The implementation is changed to not keep a reference to the threads in the group, meaning its internal "threads" array goes away. The activeCount/enumerate methods are re-implemented to take a snapshot of the VM thread list. This change should be transparent to users of the API.
  4. The stop/suspend/resume methods are changed to unconditionally throw UnsupportedOperationException.
  5. Fix the concurrency bugs by way of re-implementation.
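The snapshot approach in item 3 can be approximated with public API. The following is a rough sketch, not the proposed implementation: it uses Thread.getAllStackTraces() as a stand-in for the VM thread list (which is not directly accessible to application code, and is more expensive to obtain this way), and filters by group ancestry with ThreadGroup::parentOf. The names SnapshotThreadGroups, liveThreads, activeCount, and startParked are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, not the proposed JDK implementation.
class SnapshotThreadGroups {

    /** Live threads in the group or any of its subgroups, from a snapshot of all live threads. */
    static List<Thread> liveThreads(ThreadGroup group) {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ThreadGroup g = t.getThreadGroup(); // null once the thread has terminated
            if (g != null && group.parentOf(g)) { // parentOf is true for the group itself and its descendants
                result.add(t);
            }
        }
        return result;
    }

    static int activeCount(ThreadGroup group) {
        return liveThreads(group).size();
    }

    /** Starts daemon threads in the group, waits until they run, and returns the latch that releases them. */
    static CountDownLatch startParked(ThreadGroup group, int count) {
        CountDownLatch started = new CountDownLatch(count);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(group, () -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
        try {
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return release;
    }

    public static void main(String[] args) {
        ThreadGroup parent = new ThreadGroup("parent");
        ThreadGroup child = new ThreadGroup(parent, "child");
        CountDownLatch a = startParked(parent, 2);
        CountDownLatch b = startParked(child, 3);
        System.out.println("parent: " + activeCount(parent) + ", child: " + activeCount(child));
        a.countDown();
        b.countDown();
    }
}
```

Because the group no longer needs to track its members, nothing has to be updated on thread creation, start, or termination; the cost moves to the query instead.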

The proposal does not deprecate the racy enumerate methods. It wouldn't be hard to define better methods to enumerate the set of threads or subgroups but ThreadGroup is legacy and not interesting for modern code.

The proposed changes do not do anything about the "should be final" methods; it is not worth the disruption.

The proposed changes require changes to the implementation of 3 JVM TI functions but otherwise have no impact on the debugger support.

Preparatory Changes

In advance of the changes, the destroy, isDestroyed, and stop methods will be terminally deprecated. The suspend and resume methods are already terminally deprecated.

Alternatives

Deprecate, terminally deprecate, and eventually remove ThreadGroup. This would be a disruptive change as there are at least some tools using it (including Apache Ant and the jtreg test harness used by the JDK tests). The debugger support is deeply tied to thread groups, meaning ThreadGroup cannot be significantly degraded or removed without also providing migration paths for debuggers and other tools that use JVM TI or JDI. The proposed changes do not preclude revisiting this alternative in the future.

Reduce the impact of removing the ability to destroy a thread group. The main compatibility impact in the proposal is on code that depends on destroying a thread group. Several alternatives to this aspect of the proposal have been explored:

  1. Change destroy to throw IllegalThreadStateException. Ruled out because it is disruptive and just invites bug reports that the method throws when the thread group is empty.
  2. Change isDestroyed to return true if the thread group is a daemon thread group and activeCount returns 0. Ruled out because it creates inconsistencies, such as not preventing a thread from being created or started in a "destroyed" thread group. It would also mean that isDestroyed could return false some time after it has returned true.
  3. Change isDestroyed to return true if destroy has been invoked. Ruled out because it creates the same inconsistencies as the previous alternative. Fixing the anomalies or races would require thread creation or start to coordinate with the thread group.

Do nothing. This alternative is problematic for Project Loom because the introduction of virtual threads necessitates specifying the behavior of their thread group (e.g. an early prototype had to specify that the thread group could not be destroyed).

Risks and Assumptions

The following are the risks and compatibility issues that have been identified with the proposed changes. Preliminary testing of the changes with tools that use ThreadGroup in anger has not run into any of these issues.

  1. Code that depends on destroying a thread group may be impacted, e.g.

    • There may be code that waits for a group to be destroyed with a loop like this: while (!group.isDestroyed()) { ... } This code would loop forever with the proposed change.
    • There may be code that invokes destroy and catches IllegalThreadStateException to detect that there are threads still running. The exception will not be thrown with the proposed change.
  2. Code that depends on finding a thread group by name may be impacted, e.g. it is possible that code enumerates a thread group to find a subgroup by name. In that scenario, the subgroup may have been GC'ed so the search will fail.

  3. Performance. The performance of activeCount is degraded. One data point is a group with two sub-groups, each with 100 threads. The activeCount method takes around 45 ns/op on a 2.6 GHz Intel i7 with the existing ThreadGroup implementation. With the proposal, it takes about 4 µs/op on the same system, a significant hit. On the other hand, the performance of enumerate(Thread[]) is not impacted significantly, and may even be faster in some cases when compared to the existing implementation. Furthermore, the proposed change removes overhead at Thread creation, start, and termination, measured at about 300 ns in the uncontended case on this system.

  4. Code that depends on suspend/resume/stop will fail with UnsupportedOperationException.
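For the first risk, code that polls isDestroyed could instead wait on a condition that does not depend on group destruction. The following is one possible sketch, offered as an assumption rather than a recommended migration path: it polls activeCount with a timeout so that it cannot loop forever. The names AwaitGroupIdle and awaitNoActiveThreads are hypothetical, and joining the threads directly is usually preferable when the caller holds references to them.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the JDK or of this proposal.
class AwaitGroupIdle {

    /**
     * Waits until the group reports no active threads or the timeout elapses.
     * Returns true if the group was observed to be empty.
     */
    static boolean awaitNoActiveThreads(ThreadGroup group, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (group.activeCount() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out with threads still active
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ThreadGroup group = new ThreadGroup("jobs");
        new Thread(group, () -> System.out.println("working")).start();
        System.out.println("idle: " + awaitNoActiveThreads(group, 5_000));
    }
}
```

Note that activeCount is only an estimate and, per the performance data above, becomes more expensive under the proposal, so a short sleep between polls matters.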