Chapter 10. Performance Optimizations

Table of Contents
Code Size vs. Runtime Performance
Optimizing RAM Memory Demand

This chapter shows by example how to achieve good runtime performance for a Java application while keeping its code size and RAM demand to a minimum. The example application is Pendragon Software's embedded CaffeineMark (tm) 3.0.

Code Size vs. Runtime Performance

Using Smart Linking

When an application is built, smart linking is used to reduce the set of standard classes that become part of the application. However, because the set of available library classes is large, the resulting application is still fairly large. In this example, compilation is turned off with the builder option interpret:

 > jamaica CaffeineMarkEmbeddedApp -interpret \
 > -destination=caffeine_interpret
Reading configuration from '/usr/local/jamaica/target/linux-x86/etc/
  jamaica.conf'...
Reading configuration from '/usr/local/jamaica/etc/jamaica.conf'...
Jamaica Builder Tool 3.2 Release 1
Generating code for target 'linux-x86', optimisation 'none'
 + caffeine_interpret__.c
 + caffeine_interpret__.h
Class file compaction gain: 83.17969% (5858125 ==> 985355)
 * C compiling 'caffeine_interpret__.c'
 + caffeine_interpret__nc.o
 * linking
 * stripping
Application memory demand will be as follows:
                       initial               max
Thread C    stacks:     512KB (=   8*  64KB)   31MB (= 511*  64KB)
Thread Java stacks:     128KB (=   8*  16KB) 8176KB (= 511*  16KB)
Heap Size:             2048KB                 256MB
GC data:                128KB                  16MB
TOTAL:                 2816KB                 311MB

 > filesize caffeine_interpret
3123396
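
The figures in the builder output above can be checked by hand. The following sketch recomputes the memory demand totals and the class file compaction gain. It assumes that 1KB means 1024 bytes and that the builder truncates values when converting to MB; both are inferred from the printed numbers, not taken from the builder documentation.

```python
KB = 1024
MB = 1024 * KB

# (initial, max) demand in bytes, copied from the builder output above.
demand = {
    "Thread C stacks":    (8 * 64 * KB, 511 * 64 * KB),
    "Thread Java stacks": (8 * 16 * KB, 511 * 16 * KB),
    "Heap Size":          (2048 * KB, 256 * MB),
    "GC data":            (128 * KB, 16 * MB),
}

initial_total = sum(initial for initial, _ in demand.values())
max_total = sum(maximum for _, maximum in demand.values())

# Integer division drops the fraction (assumed to match the builder's rounding).
print(f"TOTAL: {initial_total // KB}KB initial, {max_total // MB}MB max")

# Class file compaction gain: share of the original class file bytes removed.
before, after = 5858125, 985355
gain_percent = (1 - after / before) * 100
print(f"Class file compaction gain: {gain_percent:.5f}%")
```

The initial total is dominated by the 2048KB heap, and the maximum by the 256MB heap limit, so the heap settings are the first place to look when reducing RAM demand.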
      

The built application performs slightly better than the same program interpreted by jamaicavm_slim (overall score 827 versus 695), but compilation yields a much larger gain, as shown in the next section.

 > ./caffeine_interpret
Sieve score = 697 (98)
Loop score = 673 (2017)
Logic score = 623 (0)
String score = 4388 (708)
Float score = 588 (185)
Method score = 426 (166650)
Overall score = 827
      
 > jamaicavm_slim CaffeineMarkEmbeddedApp
Sieve score = 590 (98)
Loop score = 438 (2017)
Logic score = 607 (0)
String score = 3306 (708)
Float score = 481 (185)
Method score = 455 (166650)
Overall score = 695
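
The overall scores printed in the two runs above are consistent with the geometric mean of the six individual scores, with the fraction dropped. The sketch below checks this and derives the relative gain of the built application over jamaicavm_slim. The truncation step is an assumption inferred from the printed values.

```python
import math

def overall(scores):
    """Geometric mean of the individual benchmark scores."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Sieve, Loop, Logic, String, Float, Method (copied from the runs above).
built_interpreted = [697, 673, 623, 4388, 588, 426]
vm_slim = [590, 438, 607, 3306, 481, 455]

built_score = int(overall(built_interpreted))  # fraction dropped (assumption)
slim_score = int(overall(vm_slim))

gain_percent = (built_score / slim_score - 1) * 100
print(f"built: {built_score}, jamaicavm_slim: {slim_score}, "
      f"gain: {gain_percent:.0f}%")
```

Because the overall score is a geometric mean, a large change in a single test (such as String here) moves it less than an arithmetic mean would.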
      

Using Compilation

Compilation can be used to increase the runtime performance of Java applications significantly. Compiled code is typically about 20 to 30 times faster than interpreted code. However, because Java bytecode is very compact compared to machine code on CISC or RISC machines, fully compiled applications require significantly more memory. This is why we recommend using a profile as described in the Section called Compilation via Profiling instead of fully compiling the application as described in the Section called Using Full Compilation.

Using Default Compilation

If none of the options interpret, compile, or useProfile is specified, the default compilation is used: a pre-generated profile is applied to the system classes, and all application classes are compiled fully. This default usually gives good performance for small applications, but it causes an extreme code size increase for larger applications, and it results in slow execution of applications that use the system classes differently from the usage recorded in the system profile.

 > jamaica CaffeineMarkEmbeddedApp -destination=caffeine
Reading configuration from '/usr/local/jamaica/target/linux-x86/etc/
  jamaica.conf'...
Reading configuration from '/usr/local/jamaica/etc/jamaica.conf'...
Jamaica Builder Tool 3.2 Release 1
Generating code for target 'linux-x86', optimisation 'none'
 + PKG__V549b799c4894b375__.c
 + PKG_com_aicas_jamaica_V8fb9360ab2450838__.c
 + PKG_gnu_classpath_V8fb9360ab2450838__.c
 + PKG_java_beans_V8fb9360ab2450838__.c
 + PKG_java_io_V8fb9360ab2450838__.c
 + PKG_java_lang_V8fb9360ab2450838__.c
 + PKG_java_lang_ref_V8fb9360ab2450838__.c
 + PKG_java_lang_reflect_V8fb9360ab2450838__.c
 + PKG_java_math_V8fb9360ab2450838__.c
 + PKG_java_net_V8fb9360ab2450838__.c
 + PKG_java_security_V8fb9360ab2450838__.c
 + PKG_java_text_V8fb9360ab2450838__.c
 + PKG_java_util_V8fb9360ab2450838__.c
 + PKG_java_util_regex_V8fb9360ab2450838__.c
 + PKG_java_util_zip_V8fb9360ab2450838__.c
 + caffeine__.c
 + caffeine__.h
Class file compaction gain: 83.73798% (5767243 ==> 937870)
 * C compiling 'caffeine__.c'
 * C compiling 'PKG__V549b799c4894b375__.c'
 * C compiling 'PKG_com_aicas_jamaica_V8fb9360ab2450838__.c'
 * C compiling 'PKG_gnu_classpath_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_beans_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_io_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_lang_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_lang_ref_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_lang_reflect_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_math_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_net_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_security_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_text_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_util_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_util_regex_V8fb9360ab2450838__.c'
 * C compiling 'PKG_java_util_zip_V8fb9360ab2450838__.c'
 + caffeine__nc.o
 * linking
 * stripping
Application memory demand will be as follows:
                       initial               max
Thread C    stacks:     512KB (=   8*  64KB)   31MB (= 511*  64KB)
Thread Java stacks:     128KB (=   8*  16KB) 8176KB (= 511*  16KB)
Heap Size:             2048KB                 256MB
GC data:                128KB                  16MB
TOTAL:                 2816KB                 311MB

 > filesize caffeine
3873440
      

The runtime performance is better than that of the interpreted version. Compared to the executable in the next section, however, the application is larger and slightly slower. We strongly recommend creating a profile as described in the Section called Compilation via Profiling.

 > ./caffeine
Sieve score = 20520 (98)
Sieve score = 20521 (98)
Loop score = 28783 (2017)
Logic score = 11473 (0)
String score = 16085 (708)
Float score = 12287 (185)
Method score = 5230 (166650)
Overall score = 13832
      

Compilation via Profiling

Generation of a profile for compilation is a powerful tool for creating small applications with fast turn-around times. The profile collects information on the runtime behavior of an application, guiding the compiler in its optimization process and in the selection of which methods to compile and which methods to leave in compact bytecode format.

To generate the profile, we first have to create a profiling version of the application using the builder option profile (see Chapter 8) or using the command jamaicavmp:

 > jamaicavmp CaffeineMarkEmbeddedApp
Sieve score = 337 (98)
Loop score = 313 (2017)
Logic score = 387 (0)
String score = 2813 (708)
Float score = 343 (185)
Method score = 291 (166650)
Overall score = 474
Start writing profile data into file 'CaffeineMarkEmbeddedApp.prof'
 Write threads data...
 Write instantiation data...
 Write invocation data...
 Write heap data...
0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Done writing profile data
        

This profiling run also illustrates the runtime overhead of collecting profiling data: the profiling run is about 30% slower than the interpreted run under jamaicavm_slim (overall score 474 versus 695).
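
To see where the profiling overhead falls, the scores of the jamaicavmp run can be compared test by test with the jamaicavm_slim run shown earlier. This is a small sketch using only the scores printed in this chapter; treating a lower score as proportionally slower execution is a simplifying assumption.

```python
# Scores copied from the jamaicavmp and jamaicavm_slim runs in this chapter.
profiling = {"Sieve": 337, "Loop": 313, "Logic": 387,
             "String": 2813, "Float": 343, "Method": 291, "Overall": 474}
interpreted = {"Sieve": 590, "Loop": 438, "Logic": 607,
               "String": 3306, "Float": 481, "Method": 455, "Overall": 695}

# Percentage by which each score drops when profiling data is collected.
slowdown = {
    name: (1 - profiling[name] / interpreted[name]) * 100
    for name in profiling
}

for name, percent in slowdown.items():
    print(f"{name:8s} {percent:5.1f}% lower")
```

The overhead is not uniform across the individual tests, which is worth keeping in mind when profiling code with timing-sensitive behavior.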

Now, an application can be compiled using the profiling data that was stored in file CaffeineMarkEmbeddedApp.prof:

 > jamaica -useProfile=CaffeineMarkEmbeddedApp.prof \
 > CaffeineMarkEmbeddedApp -destination=caffeine_useProfile10
Reading configuration from '/usr/local/jamaica/target/linux-x86/etc/
  jamaica.conf'...
Reading configuration from '/usr/local/jamaica/etc/jamaica.conf'...
Jamaica Builder Tool 3.2 Release 1
 + PKG__V92bece49ddb8a421__.c
 + PKG_java_lang_V366cdfcbd50ed4c6__.c
 + PKG_java_lang_ref_V366cdfcbd50ed4c6__.c
 + PKG_java_util_V366cdfcbd50ed4c6__.c
 + PKG_javax_realtime_V366cdfcbd50ed4c6__.c
 + caffeine_useProfile10__.c
 + caffeine_useProfile10__.h
Class file compaction gain: 83.04776% (5767243 ==> 977677)
 * C compiling 'caffeine_useProfile10__.c'
 * C compiling 'PKG__V92bece49ddb8a421__.c'
 * C compiling 'PKG_java_lang_V366cdfcbd50ed4c6__.c'
 * C compiling 'PKG_java_lang_ref_V366cdfcbd50ed4c6__.c'
 * C compiling 'PKG_java_util_V366cdfcbd50ed4c6__.c'
 * C compiling 'PKG_javax_realtime_V366cdfcbd50ed4c6__.c'
 + caffeine_useProfile10__nc.o
 * linking
 * stripping
Application memory demand will be as follows:
                       initial               max
Thread C    stacks:     512KB (=   8*  64KB)   31MB (= 511*  64KB)
Thread Java stacks:     128KB (=   8*  16KB) 8176KB (= 511*  16KB)
Heap Size:             2048KB                 256MB
GC data:                128KB                  16MB
TOTAL:                 2816KB                 311MB

 > filesize caffeine_useProfile10
3255608
        

The resulting application size is only slightly larger than the interpreted version, but the runtime performance is nearly the same as that of the fully compiled version as presented in the Section called Using Full Compilation:

 > ./caffeine_useProfile10
Sieve score = 20873 (98)
Loop score = 29011 (2017)
Logic score = 11538 (0)
String score = 17197 (708)
Float score = 12211 (185)
Method score = 5249 (166650)
Overall score = 14052
        

When a profile is used to guide the compiler, by default 10% of the methods executed during the profile run are compiled. This results in a moderate code size increase compared with fully interpreted code and typically results in a run-time performance very close to fully compiled code. Using the builder option percentageCompiled, this default setting can be adjusted to any value between 0% and 100%. Note that setting the value to 100% is not the same as setting the option compile (see the Section called Using Full Compilation), since the percentage value only refers to those methods executed during the profiling run. Methods not executed during the profiling run will not be compiled when useProfile is used.
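
The difference between percentageCompiled=100 and the option compile can be illustrated with a small model. The sketch below is not the builder's algorithm: it assumes that methods are ranked by their profile count and that the cut-off is rounded up. All method names except java/util/Vector.size()I, and all counts except 64, are hypothetical.

```python
import math

def select_for_compilation(profile_counts, percentage_compiled=10):
    """Pick the hottest share of the methods that ran during profiling.

    Ranking by count and rounding up are assumptions of this sketch.
    """
    executed = {m: c for m, c in profile_counts.items() if c > 0}
    ranked = sorted(executed, key=executed.get, reverse=True)
    keep = math.ceil(len(ranked) * percentage_compiled / 100)
    return set(ranked[:keep])

# Hypothetical profile: method name -> count recorded during the run.
profile = {
    "Hypothetical.hotLoop()V": 900000,
    "Hypothetical.helper()I": 1200,
    "java/util/Vector.size()I": 64,
    "Hypothetical.init()V": 1,
    "Hypothetical.neverCalled()V": 0,   # not executed during profiling
}

default_set = select_for_compilation(profile)          # 10%
everything = select_for_compilation(profile, 100)      # 100%

print(sorted(default_set))
print(sorted(everything))
```

Even at 100%, the method that never ran during profiling stays out of the compiled set, which is the behavior the paragraph above describes for useProfile.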

Entries in the profile can be edited manually, for instance to enforce compilation of a performance-critical method. The profile generated for this example contains the following entry for the method size() of class java.util.Vector:

PROFILE: 64 (0%)        java/util/Vector.size()I
        

To enforce compilation of this method even when percentageCompiled is not set to 100%, the profiling data can be changed to a higher value, e.g.,

PROFILE: 1000000 (0%)        java/util/Vector.size()I
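
When several entries need to be raised, the edit can be scripted. The following sketch rewrites the count of one PROFILE entry and leaves every other line untouched. The line format is inferred from the single entry shown above, so other record types in a real profile file are simply passed through; the function name boost is hypothetical.

```python
import re

# "PROFILE: <count> (<percent>%)   <method>", as in the entry shown above.
PROFILE_LINE = re.compile(r"^(PROFILE:\s+)(\d+)(\s+\(\d+%\)\s+)(\S+)\s*$")

def boost(lines, method, new_count):
    """Return a copy of lines with the count of `method` set to new_count.

    Expects lines without trailing newlines, e.g. from str.splitlines().
    """
    result = []
    for line in lines:
        match = PROFILE_LINE.match(line)
        if match and match.group(4) == method:
            prefix, _, middle, name = match.groups()
            line = f"{prefix}{new_count}{middle}{name}"
        result.append(line)
    return result

original = ["PROFILE: 64 (0%)        java/util/Vector.size()I"]
edited = boost(original, "java/util/Vector.size()I", 1000000)
print(edited[0])
```

Keeping a copy of the unedited profile is advisable, since a fresh profiling run overwrites the file and discards manual changes.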
        

Selecting C compiler optimization level

Enabling C compiler optimizations for code size or execution speed can have an important effect on the size and speed of the application. These optimizations are enabled by setting the command line options "-optimize=size" or "-optimize=speed", respectively.

 > jamaica -useProfile=CaffeineMarkEmbeddedApp.prof -optimize=speed \
 > CaffeineMarkEmbeddedApp -destination=caffeine_useProfile10_speed
Reading configuration from '/usr/local/jamaica/target/linux-x86/etc/
  jamaica.conf'...
Reading configuration from '/usr/local/jamaica/etc/jamaica.conf'...
Jamaica Builder Tool 3.2 Release 1
Generating code for target 'linux-x86', optimisation 'speed'
 + PKG__V299b443475949ffb__.c
[..]

 > filesize caffeine_useProfile10_speed
3206488
        
 > jamaica -useProfile=CaffeineMarkEmbeddedApp.prof -optimize=size \
 > CaffeineMarkEmbeddedApp -destination=caffeine_useProfile10_size
Reading configuration from '/usr/local/jamaica/target/linux-x86/etc/
  jamaica.conf'...
Reading configuration from '/usr/local/jamaica/etc/jamaica.conf'...
Jamaica Builder Tool 3.2 Release 1
Generating code for target 'linux-x86', optimisation 'size'
 + PKG__V5cef8b506b6f85cf__.c  
[..]

 > filesize caffeine_useProfile10_size
3169624
        

The resulting performance depends strongly on the C compiler that is employed and may even show anomalies such as better runtime performance for the version optimized for smaller code size:

 > ./caffeine_useProfile10_speed
Sieve score = 16623 (98)
Loop score = 87567 (2017)
Logic score = 75990 (0)
String score = 20265 (708)
Float score = 20222 (185)
Method score = 14583 (166650)
Overall score = 29514

 > ./caffeine_useProfile10_size
Sieve score = 23449 (98)
Loop score = 68301 (2017)
Logic score = 212193 (0)
String score = 20445 (708)
Float score = 21781 (185)
Method score = 14470 (166650)
Overall score = 36035
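
A per-test comparison makes the anomaly easier to see. The sketch below uses only the scores of the two runs above and lists the tests in which the build optimized for size scored higher than the build optimized for speed.

```python
# Scores copied from the -optimize=speed and -optimize=size runs above.
speed_build = {"Sieve": 16623, "Loop": 87567, "Logic": 75990,
               "String": 20265, "Float": 20222, "Method": 14583}
size_build = {"Sieve": 23449, "Loop": 68301, "Logic": 212193,
              "String": 20445, "Float": 21781, "Method": 14470}

# Ratio above 1.0 means the size-optimized build scored higher.
ratio = {name: size_build[name] / speed_build[name] for name in speed_build}
faster_with_size = [name for name, r in ratio.items() if r > 1.0]

for name, r in ratio.items():
    print(f"{name:7s} size/speed = {r:.2f}")
print("size build ahead in:", ", ".join(faster_with_size))
```

In this measurement the Logic test accounts for most of the gap, so results like these are best confirmed on the actual target and C compiler before choosing an optimization level.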
        

Using Full Compilation

Full compilation can be used when no profiling information is available and neither code size nor build time is an important issue.

Warning

Fully compiling an application leads to very poor turn-around times and may require significant amounts of memory during the C compilation phase. We recommend that compilation be used only through profiling as described above (the Section called Compilation via Profiling).

To compile the complete application, the option compile must be set:

 > jamaica -compile CaffeineMarkEmbeddedApp \
 > -destination=caffeine_compiled
Reading configuration from '/usr/local/jamaica/target/linux-x86/etc/
  jamaica.conf'...
Reading configuration from '/usr/local/jamaica/etc/jamaica.conf'...
Jamaica Builder Tool 3.2 Release 1
Generating code for target 'linux-x86', optimisation 'none'
 + PKG__Vabe8b46b7ebd5924__.c
 + PKG_com_aicas_jamaica_V70cafbfd846ce269__.c
[..]
 + caffeine_compiled__.c
 + caffeine_compiled__.h
Class file compaction gain: 89.87185% (5767243 ==> 584115)
 * C compiling 'caffeine_compiled__.c'
 * C compiling 'PKG__Vabe8b46b7ebd5924__.c'
 * C compiling 'PKG_com_aicas_jamaica_V70cafbfd846ce269__.c'
[..]
 + caffeine_compiled__nc.o
 * linking
 * stripping
Application memory demand will be as follows:
                       initial               max
Thread C    stacks:     512KB (=   8*  64KB)   31MB (= 511*  64KB)
Thread Java stacks:     128KB (=   8*  16KB) 8176KB (= 511*  16KB)
Heap Size:             2048KB                 256MB
GC data:                128KB                  16MB
TOTAL:                 2816KB                 311MB

 > filesize caffeine_compiled
9978244
        

The performance of the compiled version is significantly better than that of the interpreted version. However, there is only a small difference compared to the version created using the profile as described in the Section called Compilation via Profiling.

 > ./caffeine_compiled
Sieve score = 20194 (98)
Loop score = 28863 (2017)
Logic score = 11583 (0)
String score = 13435 (708)
Float score = 11972 (185)
Method score = 5231 (166650)
Overall score = 13357
        

For better performance of a fully compiled application, compile can of course be combined with the appropriate C compiler optimization level as shown in the Section called Selecting C compiler optimization level.
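
The trade-off between code size and speed across the builds in this chapter can be summarized in one table. The sketch below uses the file sizes and overall scores printed above and relates each build to the interpreted one; the short build labels are chosen here for readability only.

```python
# (file size in bytes, overall CaffeineMark score), copied from this chapter.
builds = {
    "interpret":              (3123396, 827),
    "default":                (3873440, 13832),
    "useProfile (10%)":       (3255608, 14052),
    "useProfile + speed":     (3206488, 29514),
    "useProfile + size":      (3169624, 36035),
    "compile":                (9978244, 13357),
}

base_size, base_score = builds["interpret"]

# Relative size and relative score compared to the interpreted build.
summary = {
    name: (size / base_size, score / base_score)
    for name, (size, score) in builds.items()
}

print(f"{'build':22s} {'size':>6s} {'score':>7s}")
for name, (rel_size, rel_score) in summary.items():
    print(f"{name:22s} {rel_size:5.2f}x {rel_score:6.1f}x")
```

For this benchmark, the profile-guided build stays within a few percent of the interpreted size while matching the fully compiled build in speed, whereas full compilation roughly triples the file size.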