JEP draft: Container aware Java

Owner: Bob Vandette
Created: 2017/06/13 18:30
Updated: 2017/08/11 14:48
Type: Feature
Status: Submitted
Component: hotspot / runtime
Scope: JDK
Effort: M
Duration: M
Priority: 3
Issue: 8182070

Summary

Container aware Java runtime

Goals

Enhance the JVM and Core libraries to detect running in a container and adapt to the system resources available to it. This JEP will only support Docker on Linux-x64 although the design should be flexible enough to allow support for other platforms and container technologies. The initial focus will be on Linux low level container technology such as cgroups so that we will be able to easily support other container technologies running on Linux in addition to Docker.

Non-Goals

It is not a goal of this JEP to support any platform other than Docker container technology running on Linux x64.

Success Metrics

Success will be measured by the improved efficiency of running multiple Java containers on a host system with out of the box options.

Motivation

Container technology is becoming more and more prevalent in Cloud based applications. This technology provides process isolation and allows the platform vendor to specify limits and alter the behavior of a process running inside a container that the Java runtime is not aware of. This causes the Java runtime to potentially attempt to use more system resources than are available to it causing performance degradation or even termination.

Description

This enhancement will be made up of the following work items:

A. Detecting if Java is running in a container.

The Java runtime, as well as any tests that we might write for this feature, will need to be able to detect that the current Java process is running in a container. I propose that we add a new JVM_ native function that returns a boolean true value if we are running inside a container.

JVM_InContainer will return true if the currently running process is running in a container, otherwise false will be returned.
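
As an illustration only, one way a Docker-specific check could work is to inspect the cgroup membership lines of the current process (the content of /proc/self/cgroup under cgroups v1). This is a heuristic sketch and not the proposed JVM_InContainer implementation: the class and method names are hypothetical, and the "/docker/" path prefix is an assumption that does not hold for other container runtimes or cgroup drivers.

```java
import java.util.List;

// Hypothetical sketch; not the proposed JVM_InContainer implementation.
final class InContainerSketch {

    /**
     * Returns true if any cgroup v1 membership line of the form
     * "hierarchy-id:controllers:path" has a path starting with "/docker/".
     * Assumption: Docker's default cgroup layout; other setups will be missed.
     */
    static boolean looksLikeDocker(List<String> cgroupLines) {
        for (String line : cgroupLines) {
            // Limit the split to 3 so that colons inside the path are preserved.
            String[] fields = line.split(":", 3);
            if (fields.length == 3 && fields[2].startsWith("/docker/")) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would pass the lines read from /proc/self/cgroup. A process on the host typically reports paths such as "/" or "/user.slice/...", which this check rejects.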

B. Exposing container resource limits and configuration.

There are several configuration options and limits that can be imposed upon a running container. Not all of these are important to a running Java process. We clearly want to be able to detect how many CPUs have been allocated to our process along with the maximum amount of memory that we will be allocated, but there are other options that we might want to base runtime decisions on.

In addition, since containers typically impose limits on system resources, they also provide the ability to easily access the amount of consumption of these resources. I intend to provide this information in addition to the configuration data.

I propose adding a new jdk.internal.Platform class that will allow access to this information. Since some of this information is needed during the startup of the VM, I propose that much of the implementation of the methods in the Platform class be done in the VM and exposed as JVM_xxxxxx functions. In hotspot, the JVM_xxxxxx function will be implemented via the os.hpp interface.

Here are the categories of configuration and consumption statistics that will be made available (The exact API is TBD):

isContainerized
Memory Limit 
Total Memory Limit
Soft Memory Limit
Max Memory Usage
Current Memory Usage 
Maximum Kernel Memory
CPU Shares
CPU Period
CPU Quota
Number of CPUs
CPU Sets
CPU Set Memory Nodes
CPU Usage
CPU Usage Per CPU
Block I/O Weight
Block I/O Device Weight 
Device I/O Read Rate
Device I/O Write Rate
OOM Kill Enabled
OOM Score Adjustment
Memory Swappiness
Shared Memory Size

TODO:

  1. Need to specify the exact arguments and return format for these accessor functions.

C. Adjusting Java runtime configuration based on limits.

Java startup normally queries the operating system in order to set up runtime defaults for things such as the number of GC threads and default memory limits. When running in a container, the operating system functions used provide information about the host and do not include the container's configuration and limits. The VM and core libraries will be modified as part of this JEP to first determine if the current running process is running in a container. It will then cause the runtime to use the container values rather than the general operating system functions for configuring and managing the Java process. There have been a few attempts to correct some of these issues in the VM but they are not complete. The CPU detection in the VM currently only handles a container that limits CPU usage via CPU sets. If the Docker --cpus option, or --cpu-period along with --cpu-quota, is specified, it currently has no effect on the VM's configuration.

The experimental memory detection that has been implemented only impacts the Heap selection and does not apply to the os::physical_memory or os::available_memory low level functions. This leaves other parts of the VM and core libraries to believe there is more memory available than there actually is.

The NUMA support available in the VM is also not correct when running in a container. The number of available memory nodes and enabled nodes as reported by the libnuma library does not take into account the impact of the Docker --cpuset-mems option, which restricts which memory nodes the container can use. Inside the container, the file /proc/{pid}/status does report the correct Cpus_allowed and Mems_allowed values, but libnuma doesn't respect this. This has been verified via the numactl utility.
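
To make the point about the per-process values concrete, here is a minimal sketch that extracts the allowed CPU or memory node ids from the lines of /proc/self/status. It uses the Cpus_allowed_list and Mems_allowed_list fields, which hold the same information as the bitmask fields in list form (for example "0-3,8"). The class and method names are hypothetical and this is not how the VM is proposed to implement it.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; names are not part of any proposed API.
final class AllowedListSketch {

    /** Parses a kernel list such as "0-3,8" into the ids {0, 1, 2, 3, 8}. */
    static SortedSet<Integer> parseList(String list) {
        SortedSet<Integer> ids = new TreeSet<>();
        String trimmed = list.trim();
        if (trimmed.isEmpty()) {
            return ids;
        }
        for (String part : trimmed.split(",")) {
            String[] bounds = part.trim().split("-");
            int first = Integer.parseInt(bounds[0]);
            int last = bounds.length > 1 ? Integer.parseInt(bounds[1]) : first;
            for (int id = first; id <= last; id++) {
                ids.add(id);
            }
        }
        return ids;
    }

    /**
     * Finds a field such as "Mems_allowed_list" in the lines of
     * /proc/self/status and returns its ids, or an empty set if absent.
     */
    static SortedSet<Integer> allowedFromStatus(Iterable<String> statusLines, String field) {
        String prefix = field + ":";
        for (String line : statusLines) {
            if (line.startsWith(prefix)) {
                return parseList(line.substring(prefix.length()));
            }
        }
        return new TreeSet<>();
    }
}
```

The size of the returned set gives the number of CPUs or memory nodes the process may actually use, which is the figure libnuma fails to reflect.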

To correct these shortcomings and make this support more robust, here's a list of the current cgroup subsystems that will be examined in order to update the internal VM and core library configuration.

Number of CPUs

Use a combination of number_of_cpus() and cpu_sets() in order to determine how many processors are available to the process and adjust the JVM's os::active_processor_count appropriately. The number_of_cpus() will be calculated based on the cpu_quota() and cpu_period() using this formula: number_of_cpus() = cpu_quota() / cpu_period(). Since it's not currently possible to understand the relative weight of the running container against all other containers, altering the cpu_shares of a running container will have no effect on Java's configuration.
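
The calculation can be sketched as follows. This is a sketch under stated assumptions, not the final algorithm: the class and method names are hypothetical, the quota and period are taken to be the cgroups v1 values cpu.cfs_quota_us and cpu.cfs_period_us (where a quota of -1 means "no limit"), and rounding a fractional quotient up to a whole CPU is my assumption, since the formula above does not say how fractions are handled.

```java
// Hypothetical sketch; names and the rounding rule are assumptions.
final class ContainerCpuSketch {

    /**
     * Combines the quota-based CPU count with the cpuset size.
     *
     * @param quotaMicros  cpu.cfs_quota_us, or -1 when no quota is set
     * @param periodMicros cpu.cfs_period_us
     * @param cpusetCount  number of CPUs in the container's cpuset, or 0 if unknown
     * @param hostCpus     number of CPUs reported by the host
     */
    static int activeProcessorCount(long quotaMicros, long periodMicros,
                                    int cpusetCount, int hostCpus) {
        // Start from the cpuset restriction, which can never exceed the host.
        int limit = cpusetCount > 0 ? Math.min(cpusetCount, hostCpus) : hostCpus;

        if (quotaMicros > 0 && periodMicros > 0) {
            // number_of_cpus() = cpu_quota() / cpu_period(), rounded up (assumption).
            long quotaCpus = (quotaMicros + periodMicros - 1) / periodMicros;
            limit = (int) Math.min(limit, quotaCpus);
        }
        // Never report fewer than one processor.
        return Math.max(1, limit);
    }
}
```

For example, a quota of 200000 with a period of 100000 on an 8-CPU host yields 2, while the same quota combined with a single-CPU cpuset yields 1, because the smaller of the two restrictions wins.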

Also add a new VM flag that allows the number of CPUs to be overridden. This flag will be honored even if UseContainerSupport is not enabled.

Total available memory

Use the memory_limit() value from the cgroup file system to initialize the os::physical_memory() value in the VM. This value will propagate to all other parts of the Java runtime.
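
A minimal sketch of how the limit might be read and applied is shown below. Several details are assumptions rather than part of this proposal: the class and method names are hypothetical, the path is the usual cgroups v1 mount point seen inside a Docker container, and a limit that is not smaller than the host's memory (cgroups v1 reports a very large number when no limit is set) is treated as "unlimited".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; names and the mount point are assumptions.
final class ContainerMemoryLimitSketch {

    // Typical cgroups v1 location inside a Docker container (assumption).
    static final Path LIMIT_FILE =
            Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");

    /** Parses the single decimal number stored in a cgroup value file. */
    static long parseBytes(String content) {
        return Long.parseLong(content.trim());
    }

    /** Uses the container limit only when it is a real restriction. */
    static long effectivePhysicalMemory(long limitBytes, long hostBytes) {
        return (limitBytes > 0 && limitBytes < hostBytes) ? limitBytes : hostBytes;
    }

    /** Falls back to the host value if the file is missing or malformed. */
    static long physicalMemory(Path limitFile, long hostBytes) {
        try {
            String content = new String(Files.readAllBytes(limitFile),
                                        StandardCharsets.US_ASCII);
            return effectivePhysicalMemory(parseBytes(content), hostBytes);
        } catch (IOException | NumberFormatException e) {
            return hostBytes;
        }
    }
}
```

Falling back to the host value keeps the behavior unchanged on systems where the cgroup file system is not mounted at the assumed location.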

We might also consider examining the soft_memory_limit and total_memory_limit in addition to the memory_limit during the ergonomics startup processing in order to fine-tune some of the other VM settings.

CPU Memory Nodes

Use cpu_set_memory_nodes() to configure the os::numa support.

Memory usage

Use memory_usage_in_bytes() for providing os::available_memory() by subtracting the usage from the total available memory allocated to the container.
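
The subtraction can be sketched as below. The class and method names are hypothetical, and clamping the result at zero is my assumption, made because the usage counter can briefly read above the limit.

```java
// Hypothetical sketch; names and the clamping rule are assumptions.
final class ContainerAvailableMemorySketch {

    /**
     * @param totalBytes memory available to the container (its memory limit)
     * @param usageBytes current usage, e.g. the value of memory.usage_in_bytes
     * @return bytes still available, never negative
     */
    static long availableMemory(long totalBytes, long usageBytes) {
        return Math.max(0L, totalBytes - usageBytes);
    }
}
```

With a 512 MB limit and 200 MB in use, this reports 312 MB available regardless of how much free memory the host has.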

D. Adding container configuration to error crash logs and Unified JVM logging.

As a troubleshooting aid, we will dump any available container statistics to the hotspot error log and add container specific information to the JVM logging system.

E. Adding a startup flag to enable/disable this support.

Add a -XX:+UseContainerSupport VM option that will be used to enable this support. The default will be off until this feature is proven.

F. Configuration change notifications

An additional API will be provided to allow an application to receive a notification when configuration changes occur. Configuration change events will not necessarily cause the VM and Java core libraries to reconfigure their usage of resources. This support will be optional.

Alternatives

There are a few existing RFEs filed that could be used to enhance the current experimental implementation rather than taking the JEP route.

Testing

Docker/container specific tests should be added in order to validate the functionality being provided with this JEP.

Risks and Assumptions

Docker is currently based on cgroups v1. Cgroups v2 is also available but is incomplete and not yet supported by Docker. It's possible that v2 could replace v1 in an incompatible way rendering this work unusable until it is upgraded.

Dependencies

None at this time.