JEP draft: Adding support for rsockets

Owner: Yingqi Lu
Created: 2018/05/16 18:45
Updated: 2018/05/18 17:17
Type: Feature
Status: Closed / Withdrawn
Component: core-libs
Scope: JDK
Effort: M
Duration: M
Priority: 3
Issue: 8203314

Summary

Add rsocket support to the JDK to improve the throughput and latency of socket-based network communication.

Motivation

For HPC and cloud applications, fully utilizing networking hardware capabilities to reach maximum bandwidth at low latency is challenging. The networking libraries in the JDK are currently based on OS kernel sockets. Data transfers involve multiple memory copies between user space and kernel space, which consume extra memory bandwidth and CPU cycles. To improve this, we propose adding support for rsocket, a protocol over Remote Direct Memory Access (RDMA).

Description

In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters. RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by CPUs, caches, or context switches, and transfers continue in parallel with other system operations. When an application performs an RDMA Read or Write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer. – Wikipedia [1]

rsocket is a protocol over RDMA that provides a socket-level API for applications. It is intended to match the behavior of the corresponding socket calls. rsocket functions match the names and function signatures of socket calls, except that all function names are prefixed with an 'r' [2]. For example, to create a socket and return a file descriptor:

default socket call: int socket(int domain, int type, int protocol);
rsocket function: int rsocket(int domain, int type, int protocol);

Currently, the following rsocket functions are supported: rsocket, rbind, rlisten, raccept, rconnect, rshutdown, rclose, rrecv, rrecvfrom, rrecvmsg, rread, rreadv, rsend, rsendto, rsendmsg, rwrite, rwritev, rpoll, rselect, rgetpeername, rgetsockname, rsetsockopt, rgetsockopt and rfcntl [2].
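
The naming rule above can be sketched in a few lines of Java. This is only an illustration of the 'r' prefix convention, using the function list quoted above; the class and method names (RsocketNames, rsocketNameFor, hasRsocketEquivalent) are hypothetical and are not part of the proposal or of librdmacm.

```java
import java.util.List;
import java.util.Set;

/** Illustrates the rsocket naming rule: the socket call name prefixed with 'r'. */
final class RsocketNames {
    // Function names copied from the list of supported rsocket functions above.
    static final Set<String> SUPPORTED = Set.of(
        "rsocket", "rbind", "rlisten", "raccept", "rconnect", "rshutdown", "rclose",
        "rrecv", "rrecvfrom", "rrecvmsg", "rread", "rreadv",
        "rsend", "rsendto", "rsendmsg", "rwrite", "rwritev",
        "rpoll", "rselect", "rgetpeername", "rgetsockname",
        "rsetsockopt", "rgetsockopt", "rfcntl");

    /** Returns the rsocket counterpart of a socket call name, e.g. "connect" -> "rconnect". */
    static String rsocketNameFor(String socketCall) {
        return "r" + socketCall;
    }

    /** True if the prefixed name appears in the supported list above. */
    static boolean hasRsocketEquivalent(String socketCall) {
        return SUPPORTED.contains(rsocketNameFor(socketCall));
    }

    public static void main(String[] args) {
        for (String call : List.of("socket", "connect", "epoll_wait")) {
            System.out.println(call + " -> " + rsocketNameFor(call)
                + " (supported: " + hasRsocketEquivalent(call) + ")");
        }
    }
}
```

The lookup for epoll_wait fails because the list contains rpoll and rselect but no epoll counterpart.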

Given that the current JDK networking libraries are built on a socket-level API, we believe rsocket is a good fit for enabling RDMA on both traditional sockets and non-blocking socket channels. The proposed public APIs and non-public classes are listed below.

Public APIs proposed for RDMA based sockets

Module name: jdk.net; Package name: jdk.net

jdk.net.Sockets.openRdmaSocket(), returns java.net.Socket

jdk.net.Sockets.openRdmaServerSocket(), returns java.net.ServerSocket
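
A minimal sketch of how application code might use the proposed factory while still running on a JDK that does not provide it. The reflective lookup and the fallback to a plain java.net.Socket are assumptions made for illustration and are not part of the proposal; RdmaSocketSketch and openSocketPreferRdma are hypothetical names.

```java
import java.lang.reflect.Method;
import java.net.Socket;

final class RdmaSocketSketch {
    /**
     * Looks up the proposed factory jdk.net.Sockets.openRdmaSocket() reflectively.
     * If the class or method is missing, or the call fails, a plain unconnected
     * TCP socket is returned instead (an assumption made for this sketch).
     */
    static Socket openSocketPreferRdma() {
        try {
            Class<?> sockets = Class.forName("jdk.net.Sockets");
            Method factory = sockets.getMethod("openRdmaSocket"); // proposed API, may not exist
            return (Socket) factory.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            return new Socket(); // fallback: regular kernel socket
        }
    }

    public static void main(String[] args) throws Exception {
        try (Socket socket = openSocketPreferRdma()) {
            // Callers only see java.net.Socket, whichever implementation backs it.
            System.out.println("connected: " + socket.isConnected());
        }
    }
}
```

Because the factory returns java.net.Socket, the rest of the application does not need to know which implementation is in use.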

Non-public classes:

Module name: jdk.net; Package name: rdma.ch

RdmaSocketImpl/RdmaSocketImpl.PlatformRdmaSocketImpl: RdmaSocketImpl is a subclass of java.net.SocketImpl. It is the implementation for RDMA-based sockets and server sockets. When jdk.net.Sockets.openRdmaSocket() or jdk.net.Sockets.openRdmaServerSocket() is invoked, a new instance of RdmaSocketImpl is created and used to create the socket or server socket. RdmaSocketImpl has a static inner class, RdmaSocketImpl.PlatformRdmaSocketImpl.

LinuxRdmaSocketImpl: a subclass of RdmaSocketImpl.PlatformRdmaSocketImpl for Linux OS

RdmaSocketInputStream/RdmaSocketOutputStream: subclasses of java.io.FileInputStream/java.io.FileOutputStream that handle rsocket-specific IO operations

RdmaSocketOptions/RdmaSocketOptions.PlatformRdmaSocketOptions: In addition to the supported standard socket options, rsocket has three additional options: RDMA_SQSIZE, RDMA_RQSIZE and RDMA_INLINE [2]. This class is used to set and get rsocket-specific options. RdmaSocketOptions has an inner class, RdmaSocketOptions.PlatformRdmaSocketOptions.

LinuxRdmaSocketOptions: a subclass of RdmaSocketOptions.PlatformRdmaSocketOptions
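
As a rough illustration, the three rsocket-specific options could be modeled as java.net.SocketOption constants. The sketch below is not the proposed RdmaSocketOptions class: the class name RdmaOptionsSketch, the helper trySet, and the choice of Integer as the value type are assumptions (the rsocket man page [2] describes the three options as integer sizes).

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;

final class RdmaOptionsSketch {
    /** Simple integer-valued socket option (hypothetical helper type). */
    private static final class IntOption implements SocketOption<Integer> {
        private final String name;
        IntOption(String name) { this.name = name; }
        @Override public String name() { return name; }
        @Override public Class<Integer> type() { return Integer.class; }
        @Override public String toString() { return name; }
    }

    // Option names come from the text above; the Integer type is an assumption.
    static final SocketOption<Integer> RDMA_SQSIZE = new IntOption("RDMA_SQSIZE");
    static final SocketOption<Integer> RDMA_RQSIZE = new IntOption("RDMA_RQSIZE");
    static final SocketOption<Integer> RDMA_INLINE = new IntOption("RDMA_INLINE");

    /** Sets the option only if the socket reports support for it. */
    static boolean trySet(Socket socket, SocketOption<Integer> option, int value)
            throws IOException {
        if (!socket.supportedOptions().contains(option)) {
            return false; // e.g. a regular TCP socket
        }
        socket.setOption(option, Integer.valueOf(value));
        return true;
    }

    public static void main(String[] args) throws IOException {
        try (Socket plain = new Socket()) {
            // A regular socket does not know these options, so nothing is set.
            System.out.println("RDMA_SQSIZE applied: " + trySet(plain, RDMA_SQSIZE, 256));
        }
    }
}
```

Checking supportedOptions() first avoids the UnsupportedOperationException that Socket.setOption throws for an option the socket does not recognize.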

The class diagrams are shown in Figure 1.

Public APIs proposed for RDMA based socket channels:

Module name: jdk.net; Package name: jdk.net

jdk.net.Sockets.openRdmaSocketChannel(), return java.nio.channels.SocketChannel

jdk.net.Sockets.openRdmaServerSocketChannel(), return java.nio.channels.ServerSocketChannel

jdk.net.Sockets.openRdmaSelector(), return java.nio.channels.Selector
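
Because the proposed factories return the standard java.nio.channels types, code written against those types should not need to change. The sketch below shows a small selector loop that depends only on ServerSocketChannel, SocketChannel and Selector. It runs here with the regular channel factories over loopback; substituting channels and a selector from the proposed RDMA factories is an assumption that this sketch does not exercise. RdmaChannelSketch, acceptAndRead and demo are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class RdmaChannelSketch {
    /** Accepts one connection via the selector and reads from it until end of stream. */
    static String acceptAndRead(ServerSocketChannel server, Selector selector)
            throws IOException {
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        ByteBuffer buf = ByteBuffer.allocate(64);
        long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 seconds
        while (deadline - System.nanoTime() > 0) {
            selector.select(500);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel peer = server.accept();
                    if (peer != null) {
                        peer.configureBlocking(false);
                        peer.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel peer = (SocketChannel) key.channel();
                    int n = peer.read(buf);
                    if (n < 0 || !buf.hasRemaining()) { // end of stream or buffer full
                        peer.close();
                        buf.flip();
                        return StandardCharsets.UTF_8.decode(buf).toString();
                    }
                }
            }
        }
        throw new IOException("timed out waiting for a message");
    }

    /** Runs the loop over loopback using the regular (non-RDMA) factories. */
    static String demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            client.connect(server.getLocalAddress());
            client.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            client.close(); // signals end of stream to the server side
            return acceptAndRead(server, selector);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The proposal pairs its RDMA channels with a dedicated selector from openRdmaSelector(), so in an RDMA variant of this sketch the selector would presumably have to come from that factory rather than Selector.open().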

Non-public classes:

Module name: jdk.net; Package name: rdma.ch

RdmaSocketChannelImpl: a subclass of java.nio.channels.SocketChannel that defines the implementations of RDMA channel operations 
such as connect, read and write

RdmaServerSocketChannelImpl: a subclass of java.nio.channels.ServerSocketChannel that defines the implementations of RDMA 
server channel operations such as bind and accept

RdmaSocketAdaptor: a subclass of java.net.Socket. It is created from RdmaSocketChannelImpl to make an RDMA socket channel look like an RDMA socket

RdmaServerSocketAdaptor: a subclass of java.net.ServerSocket. It is created from RdmaServerSocketChannelImpl to make an RDMA server socket channel look like an RDMA server socket

RdmaPollSelectorProvider: a subclass of sun.nio.ch.PollSelectorProvider for RDMA based socket channels. 
When jdk.net.Sockets.openRdmaSelector() is invoked, RdmaPollSelectorProvider.provider().openSelector() is called internally 
and a new instance of RdmaPollSelectorImpl is returned

RdmaPollSelectorImpl: a subclass of sun.nio.ch.PollSelectorImpl. It is the selector implementation returned by RdmaPollSelectorProvider for RDMA-based socket channels

RdmaSocketDispatcher/RdmaSocketDispatcher.PlatformRdmaSocketDispatcher: RdmaSocketDispatcher is a subclass of sun.nio.ch.SocketDispatcher. It performs the majority of the RDMA-based socket channel IO operations. It has a static inner class, PlatformRdmaSocketDispatcher.

LinuxRdmaSocketDispatcher: a subclass of RdmaSocketDispatcher.PlatformRdmaSocketDispatcher for Linux OS

RdmaNet: a subclass of sun.nio.ch.Net for RDMA based socket channel operations such as listen, bind and setSocketOption/getSocketOption

The class diagrams are shown in Figure 2.


Testing

  1. Functional testing on both RDMA based sockets and RDMA based non-blocking socket channels.

  2. CPU usage profiling with and without the feature to ensure CPU consumption is reduced, especially in kernel space.

Alternative

  1. Socket Direct Protocol (SDP) [3] is an alternative approach to enabling RDMA for networking. Support for it was released with JDK 7. However, SDP kernel support (libsdp) has been deprecated since OpenFabrics Enterprise Distribution (OFED) version 3.5 (February 2013) [4]. rsocket was introduced to OFED in April 2012 as a successor to SDP. On Linux specifically, rsocket support is also part of the kernel distribution, so there is no need to download and install it from OFED.

  2. Another alternative is to use LD_PRELOAD with the librspreload library, which is part of librdmacm [2]. With this approach, all system socket calls are intercepted and replaced with the rsocket calls provided by the library. This does not allow an application to use both regular socket operations and RDMA socket operations.

Risks and Assumptions

  1. rsocket is currently available only on Linux. We assume that the RDMA verbs transport library is pre-installed on the OS.

  2. IPv4 and IPv6 incompatibility. Similar to SDP, rsocket does not work with IPv4-mapped IPv6 addresses [5]. Applications must be run with -Djava.net.preferIPv4Stack=true.

  3. rsocket does not currently support an epoll-equivalent capability; rpoll is used instead.
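
For the IPv4 requirement in item 2, an application could check at startup that it was launched with the expected flag. This is a minimal sketch under that assumption; Ipv4StackCheck and its methods are hypothetical names, and the check only inspects the system property rather than the state of the networking stack.

```java
final class Ipv4StackCheck {
    /** True if the JVM was started with -Djava.net.preferIPv4Stack=true. */
    static boolean prefersIpv4Stack() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }

    /** Fails fast with a hint when the flag is missing. */
    static void requireIpv4Stack() {
        if (!prefersIpv4Stack()) {
            throw new IllegalStateException(
                "RDMA sockets need -Djava.net.preferIPv4Stack=true on the command line");
        }
    }

    public static void main(String[] args) {
        System.out.println("preferIPv4Stack: " + prefersIpv4Stack());
    }
}
```

The flag should be passed on the command line, since setting the property programmatically after the networking classes have initialized may have no effect.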

References

[1] https://en.wikipedia.org/wiki/Remote_direct_memory_access

[2] https://linux.die.net/man/7/rsocket

[3] https://docs.oracle.com/javase/tutorial/sdp/sockets/index.html

[4] https://openfabrics.org/downloads/OFED/release_notes/OFED_3.5_release_notes

[5] https://docs.oracle.com/javase/tutorial/sdp/sockets/issues.html