5.4 Method calls
5.4.1 Lightweight methods
For these cases we introduce a new type of method call: lightweight methods. These methods differ from normal methods in two ways:
• No stack frame is created for lightweight methods, but space reserved in the caller’s frame is used.
• Parameters are passed on the stack, rather than in local variables.
Figure 5.2: Number of CoreMark method calls vs. duration (logarithmic scales)
Lightweight methods give us a third choice, in between a normal method call and method inlining. When calling a lightweight method, the method’s AOT compiled code is called directly. This bypasses the VM completely, reusing the caller’s stack frame, and leaving the parameters on the (caller’s) stack. In effect, lightweight methods behave similarly to inlined code, but do not incur the code size overhead of duplicating large inlined methods. Because the method will be called from multiple locations which may have different cache states, the stack cache must be flushed to memory before a call. This results in slightly more overhead than for inlined code, but much less than for a normal method call.
As an example, consider the simple isOdd method in Listing 5.3:
// JAVA
Listing 5.3: Simple, stack-only lightweight method example
The normal implementation has a single local variable. It expects the parameter to be stored in the local variable and the stack to be empty when we enter the method. In contrast, the lightweight method does not have any local variables and expects the parameter to be on the stack at the start of the method.
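The two entry conventions can be contrasted with a small simulation. This is only a sketch, not CapeVM code: the class `IsOddConventions`, the use of `java.util.ArrayDeque` as the operand stack, and the assumed method body (testing the lowest bit) are all illustrative choices.

```java
// Hypothetical model of the two entry conventions; not CapeVM source.
final class IsOddConventions {
    // Normal method: the parameter arrives in local variable 0,
    // and the operand stack is empty on entry.
    static void isOddNormal(int[] locals, java.util.ArrayDeque<Integer> stack) {
        stack.push(locals[0]);          // load local 0 onto the stack
        stack.push(stack.pop() & 1);    // assumed body: keep the lowest bit
    }

    // Lightweight method: no locals; the parameter is already on the stack.
    static void isOddLight(java.util.ArrayDeque<Integer> stack) {
        stack.push(stack.pop() & 1);    // same body, no load from a local
    }

    public static void main(String[] args) {
        java.util.ArrayDeque<Integer> stack = new java.util.ArrayDeque<>();

        // Normal call: the argument is first moved into the callee's locals.
        isOddNormal(new int[] {7}, stack);
        System.out.println(stack.pop());

        // Lightweight call: the caller simply leaves the argument on the stack.
        stack.push(7);
        isOddLight(stack);
        System.out.println(stack.pop());
    }
}
```

In both cases the result ends up on top of the stack; the lightweight variant simply skips the step of loading the parameter from a local variable.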
We added a new instruction, INVOKELIGHT, to call lightweight methods. Listing 5.4 shows how INVOKELIGHT and INVOKESTATIC are translated to native code. Both first flush the stack cache to memory. After that, the lightweight invocation can directly call the implementation of isOdd, while the normal version first saves the stack pointers, and then enters an expensive call to the callMethod function in the VM, which will set up a stack frame for isOdd, and then call the actual method.
// NORMAL INVOCATION
// INVOKESTATIC isOdd:
push r25            // Flush the cache
push r24
call &preinvoke     // Save X and SP
ldi r22, 253        // Set parameters
ldi r23, 2          // for callMethod
ldi r24, 21
..
call &callMethod    // Call to VM
call &postinvoke    // Restore X and SP

// LIGHTWEIGHT INVOCATION
// INVOKELIGHT isOdd:
push r25            // Flush the cache
push r24
call &isOdd
Listing 5.4: Comparison of lightweight and normal method invocation
Local variables
The lightweight implementation of the isOdd example only needs to process the values that are on the stack, but this is only possible for the smallest methods. If a lightweight method has local variables, space is reserved for them in the caller’s stack frame, equal to the maximum number of slots needed by all the lightweight methods it may call.
CapeVM uses the ATmega’s Y register to point to the start of a method’s local variables. To call a lightweight method with local variables, the caller shifts Y up to the region reserved for lightweight method variables before doing the CALL. The lightweight method can then access its locals as if it were a normal method. After the lightweight method returns, the caller lowers Y, so it points to the caller’s own variables again.
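The Y adjustment can be pictured by modelling the caller's frame as an array and Y as an index into it. The sketch below is a simplification under stated assumptions: `SharedFrame` and its members are hypothetical names, and each local occupies exactly one array element.

```java
// Hypothetical model: one caller frame holding its own locals followed by
// the region reserved for lightweight locals. Not CapeVM source.
final class SharedFrame {
    final int[] slots;   // the caller's whole stack frame
    int y = 0;           // models the Y register: base of the current locals

    SharedFrame(int totalSlots) { slots = new int[totalSlots]; }

    int  getLocal(int index)            { return slots[y + index]; }
    void setLocal(int index, int value) { slots[y + index] = value; }

    // Caller side of a lightweight call to a method with local variables.
    void callLightweight(int callerLocals, Runnable lightweightBody) {
        y += callerLocals;        // shift Y up to the reserved region
        lightweightBody.run();    // callee addresses its locals relative to Y
        y -= callerLocals;        // lower Y back to the caller's own locals
    }

    public static void main(String[] args) {
        SharedFrame frame = new SharedFrame(5);   // 3 caller locals + 2 reserved
        frame.setLocal(0, 11);                    // caller's local 0
        frame.callLightweight(3, () -> frame.setLocal(0, 99)); // callee's local 0
        System.out.println(frame.getLocal(0) + " " + frame.slots[3]);
    }
}
```

Both methods use index 0 for their first local, yet they touch different slots, because the callee's accesses are relative to the shifted Y.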
Nested calls
A final extension is to allow for nested calls. While frequently called leaf methods benefit the most from lightweight methods, there are many cases where it is useful for one lightweight method to call another. A good example from the CoreMark benchmark is the 32-bit crcu32 function, which is implemented as two calls to crcu16. For the best performance, both methods should be lightweight.
So far we have not discussed how to handle the return address in a lightweight method.
The AOT compiler uses the native stack to store the VM’s integer stack values, which means the operands to a lightweight method will be on the native stack. But after a native CALL instruction, the return address is also put on the native stack, covering the method parameters.
For leaf methods, the lightweight method will first pop the return address into two fixed registers, and avoid using these registers for stack caching. When the method returns, the return address is pushed back onto the stack just before the RET instruction.
For lightweight methods that will call another lightweight method, the return address is also popped from the stack, but instead of leaving it in the fixed registers, where it would be overwritten by the nested call, it is saved in the first local variable slot and Y is incremented to skip this slot. Since each lightweight method has its own block of locals, lightweight calls can be nested as deeply as needed.
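The two ways of handling the return address can be sketched as a simulation of the native stack. Everything here is an assumption made for illustration: the class `ReturnAddressModel`, the single `fixedReg` field standing in for the register pair, and the treatment of a return address as one integer in one slot.

```java
// Hypothetical model of the lightweight prologue and epilogue. Not CapeVM source.
final class ReturnAddressModel {
    final java.util.ArrayDeque<Integer> nativeStack = new java.util.ArrayDeque<>();
    final int[] locals = new int[8];   // region reserved for lightweight locals
    int y = 0;                         // models the Y register
    int fixedReg;                      // models the fixed return-address registers

    // Models CALL: the return address lands on top of the operands.
    void call(int returnAddress) { nativeStack.push(returnAddress); }

    // Models RET: the return address is taken from the top of the stack.
    int ret() { return nativeStack.pop(); }

    // Leaf method: keep the return address in the fixed registers.
    void leafPrologue() { fixedReg = nativeStack.pop(); }
    void leafEpilogue() { nativeStack.push(fixedReg); }

    // Method that makes nested lightweight calls: park the address in its
    // first local slot and step Y past that slot.
    void nestedPrologue() { locals[y++] = nativeStack.pop(); }
    void nestedEpilogue() { nativeStack.push(locals[--y]); }

    public static void main(String[] args) {
        ReturnAddressModel m = new ReturnAddressModel();
        m.nativeStack.push(42);   // operand for the outer lightweight method
        m.call(0x100);            // normal method calls the outer method
        m.nestedPrologue();       // operand 42 is on top of the stack again
        m.call(0x200);            // outer method calls a leaf
        m.leafPrologue();
        m.leafEpilogue();
        int backInOuter = m.ret();
        m.nestedEpilogue();
        int backInCaller = m.ret();
        System.out.println(backInOuter + " " + backInCaller);
    }
}
```

In this model the leaf overwrites `fixedReg` with its own return address, which is why the outer method cannot rely on the register and uses its first local slot instead.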
This difference in method prologue and epilogue is the only difference in the way the VM generates code for a lightweight method; all bytecode instructions are translated the same way as for a normal method.
Stack frame layout
A normal method that invokes a chain of lightweight methods needs to reserve space for them in its stack frame. How much space it needs to reserve can be determined by the infuser at compile time, and this information is added to the method header used to create the stack frame.
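One way such a computation could look is sketched below. This is not the infuser's actual algorithm: the class `LightweightSpace` and its methods are hypothetical, sizes are counted in whole slots, a saved return address is assumed to take one slot, and the call graph of lightweight methods is assumed to be acyclic.

```java
// Hypothetical sketch of the space computation; names are not from CapeVM.
final class LightweightSpace {
    final int ownLocals;                 // slots for the method's own locals
    final LightweightSpace[] callees;    // lightweight methods it may call

    LightweightSpace(int ownLocals, LightweightSpace... callees) {
        this.ownLocals = ownLocals;
        this.callees = callees;
    }

    // Extra slots a caller must reserve: the largest need of any callee.
    int reservedForCallees() {
        int max = 0;
        for (LightweightSpace callee : callees) {
            max = Math.max(max, callee.slotsAsLightweight());
        }
        return max;
    }

    // Slots this method occupies when it is called as a lightweight method.
    int slotsAsLightweight() {
        int returnSlot = callees.length > 0 ? 1 : 0;  // leaves keep it in registers
        return returnSlot + ownLocals + reservedForCallees();
    }

    public static void main(String[] args) {
        LightweightSpace h = new LightweightSpace(2);      // leaf
        LightweightSpace g = new LightweightSpace(3, h);   // calls h
        LightweightSpace f = new LightweightSpace(4, g);   // normal method
        System.out.println(f.reservedForCallees());        // 1 + 3 + 2 = 6
    }
}
```

Taking the maximum rather than the sum reflects that only one lightweight callee is active at a time, so sibling callees can share the same reserved region.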
An example is shown in Figure 5.3, which shows the stack frame for a normal method f, which calls lightweight method g_lw, which in turn calls another lightweight method h_lw.
The stack frame for f contains space for its own locals, and for the locals of the lightweight method it calls: g_lw. In turn, g_lw’s locals contain space for h_lw’s locals, as well as a slot to store the return address back to f. Since h_lw does not call any other methods, it just keeps its return address in registers.
[Figure panels: stack frame layout for methods without lightweight method calls; stack frame layout for method f, which calls lightweight method g_lw. Labelled regions: local variables, reference stack, optional extra space for lw methods, return address from g_lw to f, Y when calling g_lw.]
Figure 5.3: Stack frame layout for a normal method f, which calls lightweight method g_lw, which in turn calls lightweight method h_lw
When a method calls a lightweight method with local variables, it will move the Y register to point to the lightweight method’s locals. From Figure 5.3 it is clear it only needs to increment Y by the size of its own locals. For f, this will place the Y register at the beginning of g_lw’s locals. Since g_lw may call h_lw, g_lw’s prologue will first store the return address in the first local slot, moving Y forward in the process so that Y points to the first free slot.
Mark loops
Unlike normal methods, lightweight methods may use any register and do not save call-saved registers. When a lightweight method is called inside a MARKLOOP block, it may corrupt some of the variables pinned to registers. In this case the caller saves those variables back to memory before calling the lightweight method and loads them again after the call returns. Since lightweight methods always come before their invocation in the infusion, the VM already knows which registers the lightweight method will use, and will only save and restore pinned variables if there is a conflict. Because registers for MARKLOOP are allocated low to high, and for normal stack caching from high to low, in many cases the two do not collide.
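The conflict check can be illustrated with register sets represented as bit masks. This is a sketch under assumptions, not the VM's implementation: `PinnedRegisterCheck` and its methods are hypothetical, and bit n of a mask is taken to stand for register Rn.

```java
// Hypothetical sketch; register sets are modelled as 32-bit masks, bit n = Rn.
final class PinnedRegisterCheck {
    // Builds a mask from a list of register numbers (0 to 31).
    static int mask(int... registers) {
        int m = 0;
        for (int r : registers) {
            m |= 1 << r;
        }
        return m;
    }

    // Registers that must be saved before the call and reloaded afterwards:
    // only those both pinned by MARKLOOP and used by the lightweight method.
    static int toSaveAndRestore(int pinnedByMarkLoop, int usedByCallee) {
        return pinnedByMarkLoop & usedByCallee;
    }

    public static void main(String[] args) {
        int pinned = mask(14, 15);              // variable pinned to R14:R15
        int calleeA = mask(14, 15, 24, 25);     // overlaps the pinned pair
        int calleeB = mask(22, 23, 24, 25);     // high registers only
        System.out.println(toSaveAndRestore(pinned, calleeA) == pinned);
        System.out.println(toSaveAndRestore(pinned, calleeB) == 0);
    }
}
```

When the intersection is empty, as for the second callee, no save or reload code needs to be emitted around the call.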
Example call
An example of the most complex case for a lightweight call is shown in Listing 5.5, which shows how method f from Figure 5.3 would call g_lw, assuming the call is inside a MARKLOOP block that has pinned a variable to registers R14:R15, and that these registers are also used by g_lw.
In the translation of the INVOKELIGHT instruction, first the stack cache is flushed to memory, then the value of the local variable at offset 22 is saved because it was pinned to R14:R15. Next, the Y register is incremented to skip the caller’s own local variables and point to the start of the space reserved for lightweight method locals.
In the implementation of g_lw, the return address is popped off the stack into R18:R19. Since g_lw may call another lightweight method which will do the same, the return address is stored in the first local slot, incrementing Y in the process. After g_lw’s body, the return address is pushed back onto the stack before the final ret instruction.
After g_lw returns, the reverse process is used to return to the caller: the Y register is restored to point to the caller’s locals, and the local variable at offset 22 is loaded back into the pinned registers R14:R15.
// LIGHTWEIGHT INVOCATION
// INVOKELIGHT g_lw:
push r25            // Flush the cache
push r24
std Y+22, r14       // Save pinned value
std Y+23, r15
..
ldd r14, Y+22       // Reload pinned value
ldd r15, Y+23

// IMPLEMENTATION OF g_lw
pop r18             // Pop the return address
pop r19
st Y+, r18          // Save in 1st local slot,
st Y+, r19          // and increment Y
..                  // g_lw's body
ld r19, -Y          // Load return address,
ld r18, -Y          // and decrement Y
push r19            // Push return address
push r18            // onto the stack
ret
Listing 5.5: Full lightweight method call