JEP 151: Compress Time-Zone Data
Author: Stuart Marks, Darryl Mocek, Peter Jensen
Discussion: i18n dash dev at openjdk dot java dot net
Store time-zone data more efficiently, in a single compressed file rather than in one uncompressed file per zone.
The original reason to keep time-zone data in individual uncompressed files, rather than in a single compressed file, was (we surmise) to optimize access to the data for a particular time zone and to reduce dynamic memory consumption.
Because the data for a given zone is read only once, and most applications use only one or a few zones, this is probably not a major concern. It is equally possible that the current implementation was just more convenient, and that no requirement justified the extra effort of using a compressed format.
For a large number of files of random size, the expected disk overhead is number-of-files * 0.5 * file-system-block-size, because the last block of each file is, on average, half empty.
The block size on UNIX, Linux (including embedded Linuxes), and NTFS file systems is typically 4KB. There are 500+ time-zone files, resulting in an expected overhead of about 1MB (in line with observations).
On a system with a smaller block size of 1KB we would still expect to see an overhead of about 250KB, or about 100% of the actual file size.
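The arithmetic above can be checked with a short sketch. The class and method names here are hypothetical and not part of the JDK, and the file count of 500 is a round figure standing in for the "500+" quoted in the text.

```java
// Hypothetical helper, not part of the JDK: estimates the disk space lost to
// partially filled blocks when many small files are stored individually.
final class BlockOverheadEstimate {

    /** Expected wasted bytes, assuming each file wastes half a block on average. */
    static long expectedOverheadBytes(int fileCount, int blockSizeBytes) {
        return (long) fileCount * blockSizeBytes / 2;
    }

    /** Exact wasted bytes in the last block of a single file of the given size. */
    static long slackBytes(long fileSizeBytes, int blockSizeBytes) {
        long remainder = fileSizeBytes % blockSizeBytes;
        return remainder == 0 ? 0 : blockSizeBytes - remainder;
    }

    public static void main(String[] args) {
        int files = 500; // assumption: round figure for the "500+" zone files
        System.out.println("4 KB blocks: " + expectedOverheadBytes(files, 4096) / 1024 + " KB");
        System.out.println("1 KB blocks: " + expectedOverheadBytes(files, 1024) / 1024 + " KB");
    }
}
```

With these inputs the sketch reports 1000 KB for 4KB blocks and 250 KB for 1KB blocks, matching the figures in the text.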
Options for reducing the static (on-disk) footprint include:
1. Store files in a zip/jar archive
2. Use an embedded database
Option (1) requires only minimal, localized changes: the implementation reads zip-file entries rather than individual files.
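To illustrate how small the change for the zip/jar option might be, here is a minimal sketch using the standard `java.util.zip` API. It is not the JDK's actual implementation: the class name, the use of the zone ID as the entry name, and the placeholder payloads are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: one compressed archive with one entry per zone ID.
final class ZoneArchiveSketch {

    /** Packs each (zone ID, bytes) pair into its own entry of a single zip archive. */
    static void writeArchive(Path archive, Map<String, byte[]> zones) throws IOException {
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> zone : zones.entrySet()) {
                zip.putNextEntry(new ZipEntry(zone.getKey()));
                zip.write(zone.getValue());
                zip.closeEntry();
            }
        }
    }

    /** Reads the bytes stored for one zone ID, or returns null if there is no such entry. */
    static byte[] readZone(Path archive, String zoneId) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(zoneId);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes(); // requires Java 9 or later
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path archive = Files.createTempFile("tzdata-sketch", ".zip");
        try {
            // Placeholder payloads, not real time-zone data.
            writeArchive(archive, Map.of(
                    "America/Los_Angeles", "placeholder-1".getBytes(StandardCharsets.UTF_8),
                    "Europe/Paris", "placeholder-2".getBytes(StandardCharsets.UTF_8)));
            byte[] data = readZone(archive, "Europe/Paris");
            System.out.println(new String(data, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(archive);
        }
    }
}
```

Only the lookup of a zone's bytes changes in this sketch (an archive entry instead of a file path); the code that parses those bytes would not need to know where they came from.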
Option (2) requires a database, whose performance characteristics are unknown. It may still be of interest if the future installed-module format already uses a database for efficient storage of, and access to, the items contained in a module.
The performance impact of retrieving time-zone data must be tested, especially for the first call that retrieves time-zone data.
The tests for the upgrade tools must be changed to verify that the time-zone data has been written out properly.
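For the first-call measurement mentioned above, a rough starting point might look like the following sketch. It uses only `java.util.TimeZone.getTimeZone` and `System.nanoTime`; the class name and the choice of zone ID are assumptions. A single wall-clock sample is noisy, and the JVM may already have loaded the default zone during startup, so this is a smoke check rather than a benchmark.

```java
import java.util.TimeZone;

// Hypothetical sketch: compares the cost of the first lookup of a zone
// with a repeated lookup of the same zone in the same JVM.
final class FirstLookupTiming {

    /** Returns the elapsed nanoseconds for one lookup of the given zone ID. */
    static long timeLookupNanos(String zoneId) {
        long start = System.nanoTime();
        TimeZone zone = TimeZone.getTimeZone(zoneId);
        long elapsed = System.nanoTime() - start;
        // getTimeZone falls back to GMT for unknown IDs, so check what came back.
        if (!zone.getID().equals(zoneId)) {
            throw new IllegalArgumentException("Unknown zone ID: " + zoneId);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String zoneId = "Asia/Tokyo"; // assumption: any zone that is not the JVM default
        long first = timeLookupNanos(zoneId);
        long repeat = timeLookupNanos(zoneId);
        System.out.println("first lookup:  " + first + " ns");
        System.out.println("repeat lookup: " + repeat + " ns");
    }
}
```

Running the sketch in a fresh JVM before and after a format change gives a first impression of the first-call cost; a proper comparison would repeat the run many times or use a benchmarking harness.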
Risks and Assumptions
Time-zone updates using a compressed format will not apply to older JDKs. This might require some duplication of effort, to provide updates in two different formats.
There is a risk of a decrease in performance when retrieving time-zone data, in particular for the first zone requested. In the case of a zip file, the performance decrease is expected to be small.
- Other JDK components: Time-zone and locale data-upgrade tools will have to be changed to handle the new format.