Chapter 3. The Program Transformation Framework
3.5. Code Generation
In this section we show some examples of program transformation using JGene. Before turning to the examples, we briefly describe the Java Native Interface (JNI). JNI is a programming framework that allows Java code running in a Java VM to call, and be called by, native applications and libraries written in other languages such as C, C++ and assembly. Native method code is bound to the Java application through a native method interface, which has been standardized across most JVMs. This interface is used to write native methods for situations in which an application cannot be written entirely in the Java programming language, such as when the standard Java class library does not support a platform-dependent feature or program library. It is also used to make an existing application, written in another programming language, accessible to Java applications.
For example, consider the pure Java program below:
public class adder {
    void exec() {
        int x = add(1, 2);
        System.out.println(x);
    }
    int add(int i, int j) {
        return i + j;
    }
}
The corresponding Java class with a native method declaration, together with its JNI-compliant C function, may be generated as follows:
// Java
public class adder {
    public void exec() {
        int x = add(1, 2);
        System.out.println(x);
    }
    public native int add(int i, int j);
}

// JNI
JNIEXPORT jint JNICALL Java_adder_add(JNIEnv *env, jobject obj, jint i, jint j) {
    return i + j;
}
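For the native method to be resolved at run time, the Java side must also load the compiled C library, typically with System.loadLibrary(). The sketch below illustrates this step under stated assumptions: the library name "adder", the class name AdderWithFallback, and the pure Java fallback are all hypothetical and are not part of the generated code shown above.

```java
// Sketch only: the library name "adder" and the fallback path are assumptions
// made for illustration; they are not produced by JGene.
class AdderWithFallback {
    // True only if the (hypothetical) native library was found and loaded.
    private static final boolean NATIVE_LOADED = tryLoad("adder");

    private static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library missing or loading not permitted: stay in pure Java.
            return false;
        }
    }

    // Implemented in C; for this class the JNI symbol would be
    // Java_AdderWithFallback_addNative.
    private native int addNative(int i, int j);

    // Dispatches to the native version when available, otherwise to Java.
    int add(int i, int j) {
        return NATIVE_LOADED ? addNative(i, j) : i + j;
    }

    public static void main(String[] args) {
        System.out.println(new AdderWithFallback().add(1, 2));
    }
}
```

Declaring a native method does not by itself require the library to be present; the lookup happens when the method is first called, which is why the dispatch in add() is safe when loading fails.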
In the context of embedded systems, the overhead of supporting JNI can become large and may not justify the gain. The main reason is that JNI is intended to be standardized across multiple JVMs so that the same native C/C++ code remains portable, but this benefit is less relevant for embedded systems. The KVM Native Interface (KNI) is a trimmed-down version of JNI for KVM, a small Java VM written for embedded systems.
Unlike JNI, KNI is essentially implementation-specific and is not intended to be standardized.
Another VM for embedded systems, CVM, is a larger JVM implementation that complies with the JNI standard and includes more advanced JIT compiler technology, and hence requires more resources than KVM. Note that many of the standard library classes depend on KNI to provide functionality to the developer and the user, e.g. file I/O and sound capabilities.
Including performance- and platform-sensitive API implementations in the standard library allows all Java applications to access this functionality in a safe and platform-independent manner.
Below we illustrate the transformation targeting KNI using the same example given previously. Currently, the transformation framework only works for a subset of input programs, namely, Java programs with certain syntactical restrictions. The goal is to investigate the potential usefulness of the transformation approach in general and the internals of the KVM and CVM. We show the generated KNI C program below.
// Modified Java class
public class adder {
    public void exec() {
        int x = add(1, 2);
        System.out.println(x);
    }
    public native int add(int i, int j);
}

// KNI
KNIEXPORT KNI_RETURNTYPE_INT Java_adder_add() {
    jint i = KNI_GetParameterAsInt(1);
    jint j = KNI_GetParameterAsInt(2);
    KNI_ReturnInt(i + j);
}
As the example above shows, when targeting KNI, the generated C function needs to comply with the programming model defined by KNI. In particular, parameter passing is done through explicit stack operations using predefined KNI macros like KNI_GetParameterAsInt(). KNI also defines macros for accessing objects inside KVM. Note that this indicates some potential issues when the generated C function is to be placed on a different core than the one running the KVM.
A more involved example is the Lens Blur Filter (see Chapter 4), shown below:
public static int[] lensBlurFilter(int[] rgbIn, int width, int height) {
    ...
    ImageFFT fft = new ImageFFT( Math.max(log2rows, log2cols) );
    // Normalize the kernel
    i = 0;
    fft.transform2D( mask1, mask2, w, h, true );
    for (...) {
        for (...) {
            ...
            // src image getRGB
            ...
        }
    }
    // Transform into frequency space
    fft.transform2D( ar1, ar2, cols, rows, true );
    fft.transform2D( gb1, gb2, cols, rows, true );
    // Multiply the transformed pixels by the transformed kernel
    ...
    // Transform back
    fft.transform2D( ar1, ar2, cols, rows, false );
    fft.transform2D( gb1, gb2, cols, rows, false );
    ...
    // dst image setRGB
    ...
    return ...;
}
public void transform2D( float[] real, float[] imag, int cols, int rows, boolean forward ) {
    ...
    // FFT the rows
    for ( int y = 0; y < rows; y++ ) {
        ...
        transform1D(rtemp, itemp, log2cols, cols, forward);
        ...
    }
    // FFT the columns
    for ( int x = 0; x < cols; x++ ) {
        ...
        transform1D(rtemp, itemp, log2rows, rows, forward);
        ...
    }
}
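The row and column loops above suggest a row-column decomposition, in which each call to transform2D() issues rows + cols one-dimensional transforms. The following is a minimal, self-contained sketch of that pattern. It makes several assumptions that do not come from the listing: the data is taken to be stored row-major, the elided loop bodies are taken to copy one row or column into the temporary buffers and back, and a naive O(n^2) DFT (the hypothetical dft1D()) stands in for the FFT-based transform1D().

```java
// Sketch only: dft1D() is a naive stand-in for transform1D(), and the
// row-major layout and copy steps are assumptions, not the thesis code.
class SeparableTransformSketch {
    // In-place naive DFT over the first n entries of re/im.
    // The inverse direction scales by 1/n so forward + inverse is the identity.
    static void dft1D(float[] re, float[] im, int n, boolean forward) {
        double sign = forward ? -1.0 : 1.0;
        double scale = forward ? 1.0 : 1.0 / n;
        float[] outRe = new float[n];
        float[] outIm = new float[n];
        for (int k = 0; k < n; k++) {
            double sumRe = 0.0, sumIm = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = sign * 2.0 * Math.PI * k * t / n;
                double c = Math.cos(angle), s = Math.sin(angle);
                sumRe += re[t] * c - im[t] * s;
                sumIm += re[t] * s + im[t] * c;
            }
            outRe[k] = (float) (sumRe * scale);
            outIm[k] = (float) (sumIm * scale);
        }
        System.arraycopy(outRe, 0, re, 0, n);
        System.arraycopy(outIm, 0, im, 0, n);
    }

    // Element (x, y) is assumed to live at index y * cols + x.
    static void transform2D(float[] real, float[] imag, int cols, int rows, boolean forward) {
        float[] rtemp = new float[Math.max(cols, rows)];
        float[] itemp = new float[Math.max(cols, rows)];
        // Transform the rows: one 1D transform per row.
        for (int y = 0; y < rows; y++) {
            System.arraycopy(real, y * cols, rtemp, 0, cols);
            System.arraycopy(imag, y * cols, itemp, 0, cols);
            dft1D(rtemp, itemp, cols, forward);
            System.arraycopy(rtemp, 0, real, y * cols, cols);
            System.arraycopy(itemp, 0, imag, y * cols, cols);
        }
        // Transform the columns: one 1D transform per column.
        for (int x = 0; x < cols; x++) {
            for (int y = 0; y < rows; y++) {
                rtemp[y] = real[y * cols + x];
                itemp[y] = imag[y * cols + x];
            }
            dft1D(rtemp, itemp, rows, forward);
            for (int y = 0; y < rows; y++) {
                real[y * cols + x] = rtemp[y];
                imag[y * cols + x] = itemp[y];
            }
        }
    }
}
```

Under these assumptions, the filter's four transform2D() calls on the image data alone trigger 4 * (rows + cols) one-dimensional transforms, which is consistent with transform2D() being a natural candidate for native translation.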
The most frequently invoked method in the Lens Blur Filter is transform2D(). Indeed, profiling showed that transform2D() accounts for most of the run time. Naturally, we would like to make transform2D() native. Alternatively, we could translate all of lensBlurFilter(), including transform2D() and the other methods it calls, into pure C code, but this may result in excessive code size. Using JGene, we are able to explore both approaches quickly. Chapter 4 will show the performance results.
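The profiling step mentioned above can be approximated, when no profiler is available on the target, by a small timing wrapper around candidate methods. The sketch below is one possible way to do this and is not the tool used for the measurements in this thesis; the class MethodTimerSketch, its method names, and the labels used in main() are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a hand-rolled timer, assumed here for illustration.
class MethodTimerSketch {
    // Per label: [0] = number of calls, [1] = accumulated nanoseconds.
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    // Runs body once and charges its wall-clock time to label.
    void time(String label, Runnable body) {
        long start = System.nanoTime();
        body.run();
        long elapsed = System.nanoTime() - start;
        long[] s = stats.computeIfAbsent(label, k -> new long[2]);
        s[0]++;
        s[1] += elapsed;
    }

    long calls(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[0];
    }

    long nanos(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[1];
    }

    // Label with the largest accumulated time, or null if nothing was timed.
    String hottest() {
        String best = null;
        long bestNanos = -1;
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            if (e.getValue()[1] > bestNanos) {
                bestNanos = e.getValue()[1];
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MethodTimerSketch timer = new MethodTimerSketch();
        double[] sink = new double[1];
        // Hypothetical workloads standing in for a hot and a cold method.
        timer.time("hotLoop", () -> {
            for (int i = 1; i < 2_000_000; i++) sink[0] += Math.sqrt(i);
        });
        timer.time("setup", () -> sink[0] += 1.0);
        System.out.println("hottest: " + timer.hottest() + " (" + sink[0] + ")");
    }
}
```

Wall-clock timing of this kind is coarse and is affected by JIT warm-up, so it is only suited to identifying methods that dominate the run time by a wide margin.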
Consider the case where only transform2D() is made native. The following code shows that transform2D() is generated in KNI-compliant format and transform1D() in pure C format.
KNIEXPORT KNI_RETURNTYPE_VOID
Java_ccl_midlet_image_ImageFFT_transform2D() {
    /* Get the java input parameter */
    KNI_StartHandles(5);
    KNI_DeclareHandle(handle1);
    KNI_GetParameterAsObject(1, handle1);
    jint handle1_len = KNI_GetArrayLength(handle1);
    jfloat real[handle1_len];
    KNI_GetRawArrayRegion(handle1, 0, handle1_len*sizeof(jfloat), (jbyte*)real);
    ...
    jint cols = KNI_GetParameterAsInt(6);
    jint rows = KNI_GetParameterAsInt(7);
    jboolean forward = KNI_GetParameterAsBoolean(8);
    // start real java code here ...
    // FFT the rows
    for (int y = 0; y < rows; y++) {
        ...
        transform1D(rtemp, itemp, w1, w2, w3, log2cols, cols, forward);
        ...
    }
    // FFT the columns
    for (int x = 0; x < cols; x++) {
        ...
        transform1D(rtemp, itemp, w1, w2, w3, log2rows, rows, forward);
        ...
    }
    // end real java code
    KNI_SetRawArrayRegion(handle1, 0, handle1_len*sizeof(jfloat), (jbyte*)real);
    KNI_SetRawArrayRegion(handle2, 0, handle2_len*sizeof(jfloat), (jbyte*)imag);
    KNI_EndHandles();
}
void transform1D(float *real, float *imag, float *w1, float *w2, float *w3,
                 int logN, int n, bool forward) {
    scramble(n, real, imag);
    butterflies(n, logN, forward ? 1 : -1, real, imag, w1, w2, w3);
}
Note that static program transformation is also closely related to the dynamic JIT compiler. For example, in some situations a method may become too big for the JIT compiler to compile dynamically, because the embedded system may not provide sufficient memory. One example is the FFT benchmark in Scimark2 (see Chapter 4). In this case we rewrite its main method, transformInternal(), with the goal of reducing the original method's code size by dividing transformInternal() into several smaller methods. With the help of JGene, the user can easily perform the necessary refactoring and explore the differences in terms of memory cost and speed-up. In our experiment, after proper refactoring, the new version of transformInternal() could be translated into native format easily, and the original transformInternal() could be JIT-compiled again.
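The kind of refactoring described above can be illustrated with a small sketch in which one method containing several phases is divided into one method per phase. The phases below (reversal, pairwise sums and differences, scaling) are hypothetical and chosen only for brevity; this is not the Scimark2 code, and the class and method names are assumptions.

```java
// Sketch only: hypothetical phases, not the Scimark2 transformInternal().
class SplitMethodSketch {
    // Before: a single method that performs all three phases inline.
    static double[] processMonolithic(double[] data) {
        double[] out = data.clone();
        int n = out.length;
        // Phase 1: reverse the element order.
        for (int i = 0; i < n / 2; i++) {
            double tmp = out[i];
            out[i] = out[n - 1 - i];
            out[n - 1 - i] = tmp;
        }
        // Phase 2: replace each adjacent pair (a, b) by (a + b, a - b).
        for (int i = 0; i + 1 < n; i += 2) {
            double a = out[i], b = out[i + 1];
            out[i] = a + b;
            out[i + 1] = a - b;
        }
        // Phase 3: scale by 1/n.
        for (int i = 0; i < n; i++) {
            out[i] /= n;
        }
        return out;
    }

    // After: the same work, with each phase in its own small method so that
    // each piece can be compiled (or translated) independently.
    static double[] processSplit(double[] data) {
        double[] out = data.clone();
        reverse(out);
        butterflies(out);
        scale(out);
        return out;
    }

    static void reverse(double[] a) {
        int n = a.length;
        for (int i = 0; i < n / 2; i++) {
            double tmp = a[i];
            a[i] = a[n - 1 - i];
            a[n - 1 - i] = tmp;
        }
    }

    static void butterflies(double[] a) {
        for (int i = 0; i + 1 < a.length; i += 2) {
            double x = a[i], y = a[i + 1];
            a[i] = x + y;
            a[i + 1] = x - y;
        }
    }

    static void scale(double[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            a[i] /= n;
        }
    }
}
```

Because both versions perform the same arithmetic in the same order, their results can be compared for exact equality, which gives a simple check that the split did not change behavior.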
Although JGene offers an automatic translation mechanism, it does not guarantee that the translated code works exactly the same as the original code. The engineer should inspect the generated code along with its Java counterpart to ensure that the translation scheme is valid.
Because of this, no extensive compiler background is needed to extend and use JGene, unless one wants to provide more advanced analyses.
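Manual inspection of the generated code can be complemented by running the original and the translated method on the same inputs and comparing the results. The sketch below shows one possible harness for methods with two int parameters. It is an assumption about how such a check could be organized, not a feature of JGene; the class TranslationCheckSketch and its method are hypothetical, and in practice the second argument would wrap a call to the native method.

```java
import java.util.Random;
import java.util.function.IntBinaryOperator;

// Sketch only: a hypothetical differential test, not part of JGene.
class TranslationCheckSketch {
    // Returns the number of random input pairs on which the reference
    // (original Java) and the translated implementation disagree.
    static int countMismatches(IntBinaryOperator reference,
                               IntBinaryOperator translated,
                               int trials, long seed) {
        Random rng = new Random(seed); // fixed seed keeps failures reproducible
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            int a = rng.nextInt();
            int b = rng.nextInt();
            if (reference.applyAsInt(a, b) != translated.applyAsInt(a, b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        IntBinaryOperator original = (i, j) -> i + j;
        // Stand-in for the translated version; a real check would call the
        // native method here instead.
        IntBinaryOperator translated = (i, j) -> j + i;
        System.out.println("mismatches: "
                + countMismatches(original, translated, 1000, 42L));
    }
}
```

A check of this kind can reveal disagreements but cannot prove equivalence, so it supplements the inspection described above and does not replace it.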