Few cents about my commits

fix: robovm crashed on some locales due U_MISSING_RESOURCE_ERROR


github user @obigu reported that he observes crashes in 0.1% of sessions related to code in icu. it is well known dances around icu.dat but it should not cause crashes. First of all it was required to obtain Locale id example that caused crash.
UPDATE: also included rfc3491 to fix #86 PR#227

Crash log shows that easiest way is to trying fetching LocaleData for all possible locales as ut crash inside of it:

Caused by: java.lang.RuntimeException: DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR at
libcore.icu.ICU.getBestDateTimePatternNative(Native Method) at
libcore.icu.ICU.getBestDateTimePattern(ICU.java:133) at
libcore.icu.LocaleData.initLocaleData(LocaleData.java:179) at
libcore.icu.LocaleData.get(LocaleData.java:129) at
libcore.icu.LocaleData.<clinit>(LocaleData.java:42) ... 19 more

Travesal over all java Local.getAvailableLocales() will just ask ICU about locales. It is required to ask iOS to find out if there is anything icu is not happy about. Following snipped does the trick:

List<String> langs = NSLocale.getAvailableLocaleIdentifiers();
for (String l : langs) {
    try {
        Locale locale = new Locale(l);
    } catch (Exception e) {
        System.out.println("Locale " + l + " exception " + e.getLocalizedMessage());

And there is a list of trouble Locales (on different systems could be different):

Locale uz_Arab exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR
Locale ps_AF exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR
Locale ps exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR
Locale fa_IR exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR
Locale fa exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR
Locale fa_AF exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR
Locale uz_Arab_AF exception DateTimePatternGenerator::createInstance failed: U_MISSING_RESOURCE_ERROR

Looking for fix First approach is update icu library to recent. Sadly it looks like Android is not using it in recent versions at least I failed to locate it in sources. Will use it from origin icu-project

try1: icu59.1: latest stable version but had to drop it as icu-project migrated to conform C++11 and changed type of UChar for unsigned to signed UChar→char16_t which break compilation of RoboVM in big amount of places and increase the risk break it.

try2: icu58.2: with minor adaptations was integrated but RoboVM lack of icudt58l.dat as embedded one is v51 and it keep crashing. Try to genereate one at data customizer failed as it is stateted This tool has not yet been updated to ICU 58, see this bug

try3: icu57.1: with minor adaptations was integrated and icu.dat generated. Locale issue solved but there is doubt what solved it: v57 or new icu file that is 6 time bigger than original one

Searching for minimal icu.dat To solve what fixed issue with v57, it was reverted to v51 and very fat icu.dat was provided. this caused no issues so, it seems it is possible stay on v51 but just fix icu.dat

try1: just generating Base Data (294 KB) package failed as RoboVM crashed due missing resources.

try2: lets check what RoboVM embeds.
RoboVMs embeds icu.dat file as byte array in compiler/vm/rt/android/libcore/luni/src/main/native/icudt51l.dat.gz.h. Converting hex array to file with online service, unpack and there is icudt51l to work with. Lets see what items is in it and generate new one with only this list to receive fresh dat file. To work with icu.dat binary ICU4C Binary has to be downloaded. So inside old icudt51l.dat there are following sections:

bin64>icupkg.exe -l icudt51l.dat


UPDATE: include rfc3491 as well to fix #86

Using this list as reference allows to generate new minimal dat file to try.

  • go to data customizer
  • make sure that only Base Data is checked
  • expand Advanced Options at the bottom and expand all groups
  • now just use the list above, find the line and check it
  • generate dat file by clicking Get Library Data

File generate and issue is solved with it. But its size 635472 against 259248 of embedded one. Keeping in mind that embedded icu dat is loaded into memory directly it will waste twice of memory than before.

try3: finding what makes it bigger, unpacking both versions with icu tools:

  • fresh generate one: icupkg -x icudt51l.dat -d icu-gen icudt51l.dat
  • robovm one: icupkg -x icudt51l.dat -d icu-robo icudt51l.dat
    |icu-gen                   |icu-robo                 |
    |cnvalias.icu              |cnvalias.icu             |
    |confusables.cfu           |                         |
    |en.res                    |en.res                   |
    |icustd.res                |icustd.res               |
    |icuver.res                |icuver.res               |
    |likelySubtags.res         |likelySubtags.res        |
    |nfc.nrm                   |                         |
    |nfkc.nrm                  |                         |
    |nfkc_cf.nrm               |                         |
    |numberingSystems.res      |numberingSystems.res     |
    |pool.res                  |pool.res                 |
    |                          |res_index.res            |
    |root.res                  |root.res                 |
    |supplementalData.res      |supplementalData.res     |
    |uts46.nrm                 |                         |
    |                          |brkitr                   |
    |                          |    res_index.res        |
    |                          |coll                     |
    |                          |    res_index.res        |
    |curr                      |curr                     |
    |    en.res                |    en.res               |
    |    pool.res              |    pool.res             |
    |    res_index.res         |    res_index.res        |
    |    root.res              |    root.res             |
    |    supplementalData.res  |    supplementalData.res |
    |lang                      |lang                     |
    |    en.res                |    en.res               |
    |    pool.res              |    pool.res             |
    |    res_index.res         |    res_index.res        |
    |    root.res              |    root.res             |
    |rbnf                      |rbnf                     |
    |    en.res                |    en.res               |
    |    res_index.res         |                         |
    |    root.res              |    root.res             |
    |region                    |region                   |
    |    en.res                |    en.res               |
    |    pool.res              |    pool.res             |
    |    res_index.res         |    res_index.res        |
    |    root.res              |    root.res             |
    |zone                      |zone                     |
    |    en.res                |    en.res               |
    |    pool.res              |    pool.res             |
    |    res_index.res         |    res_index.res        |
    |    root.res              |    root.res             |

Size of particular files like en.res and root.res increased but this and makes a fix. But there is also files that were missing in embedded version:

confusables.cfu nfc.nrm

These are files that are used with Normalizer2. There is normalizer support in RT in class java.text.Normalizer but it is not used directly by runtime and these can be removed. Put the list above to the file remove.txt and removing these from icu.dat with command icupkg -r remove.txt icudt51l.dat. Result is 372640 which is closer to original 259248.

Adding icudt51l.dat to resources just to test: works, no crashes no complaints about missing locales. Preparing to embed it:

  • pack with gz: gzip -9 -k icudt51l.dat
  • covert it to hex array: xxd -i icudt51l.dat.gz > data.c
  • copy content of array from data.c to compiler/vm/rt/android/libcore/luni/src/main/native/icudt51l.dat.gz.h, also update out_icudt51l_dat_gz_len, out_icudt51l_dat_len:
    unsigned char out_icudt51l_dat_gz[] = {
    // array from data.c here
    unsigned int out_icudt51l_dat_gz_len = 139073;
    unsigned int out_icudt51l_dat_len = 372640;

recompiling robovm to get it in place – all works

Few notes:

  1. upgrading to v57 has own benefits as obviously there are positive changes but also has cons:
    • it is risky as affect entire string infrastructure;
    • no direct need into this as it is already works in v51
    • compatibility: lot of developers bundles own v51 icu.dat files into resources and they will have to regenerate this to v57
  2. why not make embedded file as big as you wish (cover all cases): it is being loaded into memory directly and has memory footprint, e.g. consumes memory. and there is not enough memory all time
  3. it is better to use file instead of embedded one as file is loaded as mmap which means big files can be used without wasting memory;
  4. probably RoboVM one day shall add smart logic bundle icu dat as resource as well to save 300K of memory