JEP 246: Leverage CPU Instructions for GHASH and RSA

Owner: Anthony Scarpino
Created: 2014/06/16 17:18
Updated: 2017/01/26 18:41
Type: Feature
Status: Closed / Delivered
Component: security-libs / javax.crypto
Scope: JDK
Discussion: security dash dev at openjdk dot java dot net
Effort: L
Duration: L
Priority: 2
Reviewed by: Brian Goetz, Sean Mullan
Endorsed by: Brian Goetz
Release: 9
Issue: 8046943

Summary

Improve the performance of GHASH and RSA cryptographic operations by leveraging recently introduced SPARC and Intel x64 CPU instructions.

Success Metrics

The AES-CBC support included in JDK 8 (see JEP 164) shows roughly an 8x improvement over the pure software implementation. Gains will vary by algorithm, but we should see similarly significant improvements.
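One way to check a speedup claim like this on your own hardware is a simple throughput loop. The sketch below is an illustration under stated assumptions, not the benchmark used for this JEP: it times AES/GCM/NoPadding encryption through the standard javax.crypto API, and the class name GcmThroughput, the all-zero key, and the buffer sizes are hypothetical choices. Running it once normally and once with HotSpot's -XX:-UseAESIntrinsics (and, on builds that include GHASH intrinsics, -XX:-UseGHASHIntrinsics) gives a rough hardware-versus-software comparison; a harness such as JMH gives more trustworthy numbers.

```java
import java.nio.ByteBuffer;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class GcmThroughput {
    /** Encrypts buf with AES-128-GCM iters times and returns throughput in MiB/s. */
    static double measure(byte[] buf, int iters) throws Exception {
        // Fixed all-zero key: acceptable for timing only, never for real data.
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES");
        byte[] iv = new byte[12];
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            // GCM must not reuse a (key, IV) pair for encryption, and SunJCE checks
            // this on re-init, so write the loop counter into the last four IV bytes.
            ByteBuffer.wrap(iv).putInt(8, i);
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            cipher.doFinal(buf);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (double) buf.length * iters / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) throws Exception {
        byte[] buf = new byte[16 * 1024];
        measure(buf, 10_000); // warm-up so the JIT (and any intrinsics) are in play
        System.out.printf("AES/GCM: %.1f MiB/s%n", measure(buf, 10_000));
    }
}
```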

Motivation

The less we rely on native libraries, such as those accessed through PKCS#11, the fewer code-complexity and memory issues arise from interacting with complex native APIs. Fewer JNI calls to native libraries also mean faster crypto. By implementing crypto operations directly in the JVM, we can control their implementation and management through a built-in provider, thereby providing out-of-the-box support.

Description

No existing APIs will be modified or extended.

Algorithms

The existing implementation invokes AES instructions in HotSpot when the CPU supports them. For CBC mode, there are additional optimizations that help AES and CBC work efficiently together. These instructions and optimizations replace the current SunJCE bytecode methods. The plan is to implement similar optimizations for GCM and RSA, both of which can benefit greatly from hardware assistance. AES-GCM and RSA are both part of the TLS cipher suites.
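From an application's point of view the acceleration is invisible: code keeps calling the standard javax.crypto API, and HotSpot substitutes the CPU instructions underneath when they are available. A minimal AES-GCM round trip, the code path the GHASH work targets, might look like the following sketch (the AesGcmExample class and its helper names are hypothetical).

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class AesGcmExample {
    static final int TAG_BITS = 128; // full-length authentication tag
    static final int IV_BYTES = 12;  // 96-bit IV, the length GCM handles most directly

    static byte[] encrypt(SecretKey key, byte[] iv, byte[] aad, byte[] plaintext) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        c.updateAAD(aad);            // authenticated but not encrypted
        return c.doFinal(plaintext); // ciphertext followed by the tag
    }

    static byte[] decrypt(SecretKey key, byte[] iv, byte[] aad, byte[] ciphertextAndTag) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        c.updateAAD(aad);
        return c.doFinal(ciphertextAndTag); // throws AEADBadTagException if anything was altered
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv); // fresh IV per message; never reuse one with the same key
        byte[] aad = "header".getBytes(StandardCharsets.UTF_8);
        byte[] ct = encrypt(key, iv, aad, "hello, GCM".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, iv, aad, ct), StandardCharsets.UTF_8));
    }
}
```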

GHASH, which is part of GCM, will be accelerated using pclmulqdq on Intel x64 and xmul/xmulhi on SPARC.
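GHASH is a polynomial hash over GF(2^128): each 16-byte block is XORed into an accumulator, which is then multiplied by the hash key H. That carry-less field multiplication is the step pclmulqdq and xmul/xmulhi accelerate. As a reference for what the hardware replaces, here is a deliberately slow Java sketch of the bit-by-bit multiplication from NIST SP 800-38D (Algorithm 1) and of GHASH built on it. The class and method names are hypothetical, only 96-bit IVs are handled, the code is not taken from SunJCE or HotSpot, and AES/ECB/NoPadding is used only as the raw block cipher to derive H and the tag mask.

```java
import java.nio.ByteBuffer;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class GhashSketch {
    // Reduction constant R = 11100001 || 0^120 (SP 800-38D), as the high 64 bits.
    private static final long R_HI = 0xE100000000000000L;

    /** X * Y in GF(2^128), bit by bit per SP 800-38D Algorithm 1 (bit 0 = MSB of byte 0). */
    static long[] gfMul(long xHi, long xLo, long yHi, long yLo) {
        long zHi = 0, zLo = 0, vHi = yHi, vLo = yLo;
        for (int i = 0; i < 128; i++) {
            long xBit = (i < 64) ? (xHi >>> (63 - i)) & 1 : (xLo >>> (127 - i)) & 1;
            if (xBit != 0) { // Z ^= V when bit i of X is set
                zHi ^= vHi;
                zLo ^= vLo;
            }
            boolean carry = (vLo & 1) != 0;  // rightmost bit of V
            vLo = (vLo >>> 1) | (vHi << 63); // V >> 1 across both halves
            vHi >>>= 1;
            if (carry) {
                vHi ^= R_HI; // reduce modulo the GCM polynomial
            }
        }
        return new long[] {zHi, zLo};
    }

    /** Absorbs data one zero-padded 16-byte block at a time: Y = (Y ^ X_i) * H. */
    private static long[] absorb(long[] y, long hHi, long hLo, byte[] data) {
        for (int off = 0; off < data.length; off += 16) {
            byte[] block = new byte[16];
            System.arraycopy(data, off, block, 0, Math.min(16, data.length - off));
            ByteBuffer b = ByteBuffer.wrap(block);
            long xHi = b.getLong();
            long xLo = b.getLong();
            y = gfMul(y[0] ^ xHi, y[1] ^ xLo, hHi, hLo);
        }
        return y;
    }

    /** GHASH_H(A || pad || C || pad || [len(A)]_64 || [len(C)]_64), lengths in bits. */
    static byte[] ghash(byte[] h, byte[] aad, byte[] ciphertext) {
        ByteBuffer hb = ByteBuffer.wrap(h);
        long hHi = hb.getLong();
        long hLo = hb.getLong();
        long[] y = {0L, 0L};
        y = absorb(y, hHi, hLo, aad);
        y = absorb(y, hHi, hLo, ciphertext);
        y = gfMul(y[0] ^ (aad.length * 8L), y[1] ^ (ciphertext.length * 8L), hHi, hLo);
        return ByteBuffer.allocate(16).putLong(y[0]).putLong(y[1]).array();
    }

    /** GCM tag for a 96-bit IV: E_K(J0) ^ GHASH, with H = E_K(0^128), J0 = IV || 0^31 || 1. */
    static byte[] gcmTag(byte[] key, byte[] iv12, byte[] aad, byte[] ciphertext) throws Exception {
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding"); // raw block cipher E_K
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] h = aes.doFinal(new byte[16]);
        byte[] j0 = new byte[16];
        System.arraycopy(iv12, 0, j0, 0, 12);
        j0[15] = 1;
        byte[] tag = ghash(h, aad, ciphertext);
        byte[] ekJ0 = aes.doFinal(j0);
        for (int i = 0; i < 16; i++) {
            tag[i] ^= ekJ0[i];
        }
        return tag;
    }
}
```

Because the tag is fully determined by GHASH and two AES block encryptions, comparing gcmTag against the tag that Cipher's AES/GCM/NoPadding appends is a convenient self-check for a sketch like this.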

RSA will be accelerated using the Bit Manipulation Instruction Set 2 (BMI2). Other asymmetric algorithms will likely benefit from these changes as well, but the gains will be measured using RSA. SPARC instructions were not added, given the complexity and limitations of the 'montmul' and 'montsqr' instructions; using the native library provides complete RSA functionality without that downside. Additionally, because RSA is a slow operation, the JNI and native API layers most likely cost little in the overall performance picture.
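The core of an RSA operation is modular exponentiation on very large integers, and a common way to implement it is Montgomery multiplication, whose inner loops are long chains of word-by-word multiplies. Instructions such as BMI2's MULX speed up those multi-word products. The sketch below shows the Montgomery technique itself (REDC plus left-to-right square-and-multiply) using java.math.BigInteger; MontgomerySketch is a hypothetical name, the code is not constant-time, and it does not reproduce any JDK intrinsic.

```java
import java.math.BigInteger;

final class MontgomerySketch {
    private final BigInteger n;      // odd modulus
    private final int k;             // R = 2^k, chosen so that R > n
    private final BigInteger rMask;  // R - 1, for cheap "mod R"
    private final BigInteger nPrime; // -n^{-1} mod R
    private final BigInteger rModN;  // R mod n, i.e. 1 in Montgomery form

    MontgomerySketch(BigInteger n) {
        if (!n.testBit(0)) {
            throw new IllegalArgumentException("Montgomery form needs an odd modulus");
        }
        this.n = n;
        this.k = n.bitLength();
        BigInteger r = BigInteger.ONE.shiftLeft(k);
        this.rMask = r.subtract(BigInteger.ONE);
        this.nPrime = n.modInverse(r).negate().mod(r);
        this.rModN = r.mod(n);
    }

    /** REDC: returns t * R^{-1} mod n for 0 <= t < n * R, using only masks and shifts by R. */
    BigInteger redc(BigInteger t) {
        BigInteger m = t.and(rMask).multiply(nPrime).and(rMask);
        BigInteger u = t.add(m.multiply(n)).shiftRight(k); // exact: t + m*n is divisible by R
        return u.compareTo(n) >= 0 ? u.subtract(n) : u;    // u < 2n, so one subtraction suffices
    }

    /** base^exp mod n, keeping intermediate values in Montgomery form. */
    BigInteger modPow(BigInteger base, BigInteger exp) {
        BigInteger x = rModN;                               // Montgomery form of 1
        BigInteger b = base.mod(n).shiftLeft(k).mod(n);     // Montgomery form of base
        for (int i = exp.bitLength() - 1; i >= 0; i--) {
            x = redc(x.multiply(x));                        // Montgomery square
            if (exp.testBit(i)) {
                x = redc(x.multiply(b));                    // Montgomery multiply
            }
        }
        return redc(x);                                     // convert back to normal form
    }
}
```

With the textbook toy key n = 3233, e = 17, d = 2753, new MontgomerySketch(BigInteger.valueOf(3233)).modPow(BigInteger.valueOf(65), BigInteger.valueOf(17)) gives 2790, and raising 2790 to d recovers 65. The point of the technique is that REDC replaces division by n with masking and shifting by a power of two, leaving multi-word multiplication as the dominant cost.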

Providers

The management of algorithms is an important issue that has become more complicated over time. An extreme case is the default provider configuration for Solaris, where the SunPKCS11 provider is ahead of SunJCE in the provider list. SunPKCS11 supports all of the hardware-accelerated and optimized algorithms on Solaris. To use the JDK 8 AES-CBC support, SunJCE must be moved ahead of SunPKCS11. For an application that needs only AES-CBC, such as a performance test, the other algorithms do not matter, so this works. However, for applications that use multiple algorithms, the other algorithms will then run using unaccelerated software-based implementations instead of the hardware-accelerated implementations from SunPKCS11. Other OSes can have the same problem when NSS (configured via the SunPKCS11 provider) is used.
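For reference, the reordering described above can also be done at runtime with the standard java.security.Security API instead of editing the security.provider.N entries in java.security. The hypothetical helper below moves one provider to a given position; note that it moves every algorithm that provider offers, which is exactly the all-or-nothing trade-off this paragraph describes.

```java
import java.security.Provider;
import java.security.Security;

final class ProviderOrder {
    /**
     * Moves an installed provider to a 1-based position in the global provider list.
     * Returns the position it ended up at, or -1 if no provider has that name.
     */
    static int moveProvider(String name, int position) {
        Provider p = Security.getProvider(name);
        if (p == null) {
            return -1;
        }
        // insertProviderAt returns -1 for a provider that is already installed,
        // so remove it first and then re-insert it at the requested position.
        Security.removeProvider(name);
        return Security.insertProviderAt(p, position);
    }

    public static void main(String[] args) {
        moveProvider("SunJCE", 1);
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName());
        }
    }
}
```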

As a result, a new security property, jdk.security.provider.preferred, has been added to the java.security file. It allows specific algorithms and algorithm groups to be directed to a particular provider before the ordered provider list is checked. This property is intended for advanced users and is not set by default. With many different generations of x86 and SPARC CPUs in current use, setting a default would likely cause performance regressions on older systems and would require continuous maintenance as newer CPUs add support. Additionally, existing JDK configurations, such as FIPS 140 setups or other specialized providers, could unknowingly be redirected to a different provider. Thus, it is best to leave jdk.security.provider.preferred unset by default and let vendors and advanced users set it to match what their CPUs support.
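As we read the comments in the JDK 9 java.security file, the property takes a comma-separated list of serviceType.algorithm:provider entries, with the service type optional; a line such as jdk.security.provider.preferred=AES/GCM/NoPadding:SunJCE, MessageDigest.SHA-256:SUN is an illustration of that form, not a recommended setting. It can also be supplied from an override file passed with -Djava.security.properties=.... To confirm which provider actually serves an algorithm after such a change, a small check like the following sketch (class and method names hypothetical) uses only the standard getInstance and getProvider calls.

```java
import java.security.MessageDigest;
import java.security.Security;
import javax.crypto.Cipher;
import javax.crypto.Mac;

final class PreferredProviderCheck {
    static String cipherProvider(String transformation) throws Exception {
        return Cipher.getInstance(transformation).getProvider().getName();
    }

    static String digestProvider(String algorithm) throws Exception {
        return MessageDigest.getInstance(algorithm).getProvider().getName();
    }

    static String macProvider(String algorithm) throws Exception {
        return Mac.getInstance(algorithm).getProvider().getName();
    }

    public static void main(String[] args) throws Exception {
        // Security.getProperty returns null when the property is not set.
        System.out.println("jdk.security.provider.preferred = "
                + Security.getProperty("jdk.security.provider.preferred"));
        System.out.println("Cipher AES/GCM/NoPadding -> " + cipherProvider("AES/GCM/NoPadding"));
        System.out.println("MessageDigest SHA-256    -> " + digestProvider("SHA-256"));
        System.out.println("Mac HmacSHA256           -> " + macProvider("HmacSHA256"));
    }
}
```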

Testing

Existing Known Answer Tests (KAT) should suffice for functional testing. There will be a significant amount of performance testing using existing benchmarks and internal tests.
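A KAT pins an implementation to published input/output pairs, so the same vectors should pass whether or not the CPU instructions are used (for example, with and without -XX:-UseGHASHIntrinsics). The sketch below checks AES-GCM through javax.crypto against Test Cases 1 and 2 from McGrew and Viega's GCM specification (128-bit all-zero key, 96-bit all-zero IV); the GcmKat class and its helpers are hypothetical and are not the JDK's own test suite.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class GcmKat {
    /** Decodes a lowercase hex string into bytes. */
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Encrypts with AES-GCM (128-bit tag) and compares ciphertext || tag with the expected value. */
    static boolean check(String keyHex, String ivHex, String ptHex, String expectedHex) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(hex(keyHex), "AES"),
               new GCMParameterSpec(128, hex(ivHex)));
        return Arrays.equals(c.doFinal(hex(ptHex)), hex(expectedHex));
    }

    public static void main(String[] args) throws Exception {
        String zeroKey = "00000000000000000000000000000000";
        String zeroIv = "000000000000000000000000";
        // Test Case 1: empty plaintext, so the output is only the 16-byte tag.
        boolean tc1 = check(zeroKey, zeroIv, "", "58e2fccefa7e3061367f1d57a4e7455a");
        // Test Case 2: one all-zero block; output is ciphertext followed by tag.
        boolean tc2 = check(zeroKey, zeroIv, "00000000000000000000000000000000",
                "0388dace60b6a392f328c2b971b2fe78" + "ab6e47d42cec13bdf53a67b21257bddf");
        System.out.println("TC1 " + (tc1 ? "pass" : "FAIL") + ", TC2 " + (tc2 ? "pass" : "FAIL"));
    }
}
```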