A leaf subroutine is one that does not call any other subroutine. A simpler calling convention can be used for leaf subroutines.
A leaf subroutine uses only the first six out registers, and the global registers %g0 and %g1.
A leaf routine does not execute a save or a restore instruction. Examples of such subroutines are .mul and .div.
Because no save is executed, the register window does not change, so the return address is %o7 + 8 (and not %i7 + 8).
There is a synthetic instruction for this return:
retl ==> jmpl %o7 + 8, %g0
If the subroutine foo in the previous example is written as a leaf subroutine, then it should have the following code:
define(a8_s, arg8_s)
define(a7_s, arg7_s)
define(a6_r, o5)
define(a5_r, o4)
define(a4_r, o3)
define(a3_r, o2)
define(a2_r, o1)
define(a1_r, o0)
.global _foo
_foo: add %a2_r, %a1_r, %o0 !o0 = first + second
add %a3_r, %o0, %o0 !o0 += third argument
add %a4_r, %o0, %o0 !o0 += fourth argument
add %a5_r, %o0, %o0 !o0 += fifth argument
add %a6_r, %o0, %o0 !o0 += sixth argument
ld [%sp + a7_s], %o1 !the seventh argument
add %o1, %o0, %o0 !o0 += seventh argument
ld [%sp + a8_s], %o1 !the eighth argument
retl
add %o1, %o0, %o0 !o0 += eighth argument
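For comparison, the leaf routine above can be read as the following C function. This is only a sketch: it assumes all eight arguments are plain ints and that the result is returned in %o0, and the parameter names a1 to a8 are chosen here for illustration.

```c
/* Hypothetical C view of the leaf routine _foo: it returns the sum of its
 * eight integer arguments. The first six arrive in %o0-%o5; the seventh
 * and eighth are loaded from the caller's stack frame. */
int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7, int a8)
{
    int sum = a2 + a1;   /* add %a2_r, %a1_r, %o0 */
    sum += a3;           /* third argument */
    sum += a4;           /* fourth argument */
    sum += a5;           /* fifth argument */
    sum += a6;           /* sixth argument */
    sum += a7;           /* seventh argument, loaded from [%sp + a7_s] */
    sum += a8;           /* eighth argument, added in the delay slot of retl */
    return sum;
}
```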
void swap(int *x, int *y)
{
    int temp;
    temp = *x;
    *x = *y;
    *y = temp;
}
The corresponding assembly code is given below:
define(x_s, -4)
define(y_s, -8)
.global _main
_main: save %sp, (-64 -8) & -8, %sp
mov 5, %o0
st %o0, [%fp + x_s] !x = 5
mov 7, %o0
st %o0, [%fp + y_s] !y = 7
add %fp, x_s, %o0 !pointer to x in %o0
call _swap
add %fp, y_s, %o1 !pointer to y in %o1
ret
restore
.global _swap ! a leaf routine
_swap: ld [%o0], %o2 !%o2 = *x
ld [%o1], %o3 !%o3 = *y
st %o2, [%o1]
retl
st %o3, [%o0]
If x and y are initially stored in registers, then a few modifications to the code are required, as shown below:
define(x_r, l0) ! x in %l0
define(y_r, l1) ! y in %l1
define(x_s, -4) ! where x might be stored on the stack
define(y_s, -8) ! where y might be stored on the stack
.global _main
_main: save %sp, -72, %sp
mov 5, %x_r !x = 5
mov 7, %y_r !y = 7 , now call swap
st %x_r, [%fp + x_s] ! place args on stack
st %y_r, [%fp + y_s]
add %fp, x_s, %o0 !pass pointers to args on stack
call _swap
add %fp, y_s, %o1
ld [%fp + x_s], %x_r !move values back into reg.
ld [%fp + y_s], %y_r
ret
restore
.global _swap ! a leaf routine
_swap: ld [%o0], %o2 !%o2 = *x
ld [%o1], %o3 !%o3 = *y
st %o2, [%o1]
retl
st %o3, [%o0]
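The same idea can be sketched in C. A variable declared register has no address, so its value must be copied to memory before a pointer to it can be passed, and copied back afterwards. The function name swap_register_values and the temporaries x_mem and y_mem are hypothetical, chosen only for this illustration.

```c
/* swap as defined earlier in the text */
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Hypothetical caller: x and y live in registers, so they are spilled to
 * stack temporaries for the call and reloaded afterwards. The final
 * values are reported through result_x and result_y. */
void swap_register_values(int *result_x, int *result_y)
{
    register int x = 5;      /* mov 5, %x_r */
    register int y = 7;      /* mov 7, %y_r */

    int x_mem = x;           /* st %x_r, [%fp + x_s] */
    int y_mem = y;           /* st %y_r, [%fp + y_s] */

    swap(&x_mem, &y_mem);    /* pass pointers to the stack copies */

    x = x_mem;               /* ld [%fp + x_s], %x_r */
    y = y_mem;               /* ld [%fp + y_s], %y_r */

    *result_x = x;
    *result_y = y;
}
```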
The principal design decision supporting efficiency is the fact that instructions in the SPARC architecture are constant size. Each instruction occupies a single word, 32 bits. This allows the instruction fetch unit to be very simple and fast. For each instruction executed, the i-fetch unit simply gets one word from memory.
The 32 bits that we have available to encode an instruction are divided into fields -- contiguous subsets of the bits. We need to define fields to specify the instruction's opcode, and fields to specify the instruction's arguments (operands).
The format of the instruction should be uniform and as simple as possible. A typical scheme is given below:
8 bits - To specify the instruction (we get 256 instructions!!)
5 bits each - For the addresses of the three registers (32 different registers)
1 bit - To specify if the second argument is a source register or a constant
8 bits - The remaining 8 bits are either combined with the 5 bits of the second source register to form a 13-bit signed immediate constant, or used as 8 additional bits to specify floating point instructions.
The 8 bits of the instruction opcode are divided into 2 parts:
op - 2 bits
op3 - 6 bits
The division of the instruction into fields is done in three different ways, called formats. The format is specified in the first two bits of the instruction:
OP   INSTRUCTION CLASS          FORMAT
00   Branch instruction         Format 2
01   Call instruction           Format 1
10   Format three instruction   Format 3
11   Format three instruction   Format 3
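The table above amounts to a lookup on the top two bits of the instruction word. A minimal sketch in C follows; the function name sparc_format is hypothetical, and the instruction is assumed to be held in a 32-bit unsigned integer with op in the most significant bits.

```c
#include <stdint.h>

/* Hypothetical helper: returns the format number (1, 2, or 3) selected by
 * the op field, i.e. the two most significant bits of the instruction. */
int sparc_format(uint32_t insn)
{
    unsigned op = insn >> 30;    /* keep only bits 31..30 */

    switch (op) {
    case 0:  return 2;           /* 00: branch instruction */
    case 1:  return 1;           /* 01: call instruction */
    default: return 3;           /* 10 or 11: format three instruction */
    }
}
```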
op rs1, rs2, rd
or
op rs1, imm, rd
where "imm" is an immediate value, that is, it is coded right into the
instruction.
For the first case (3 registers), the fields in the instruction look like this:
Width:   2    5     6     5    1     8      5
       -----------------------------------------
       |    |    |     |     | 0 |       |     |
       -----------------------------------------
Name:    op   rd   op3   rs1       empty   rs2
op: 10 or 11, indicating that this is a format 3 instruction
rd: numeric ID of destination register
op3: encoding of opcode
rs1: numeric ID of first source register
rs2: numeric ID of second source register
The numeric IDs of registers are as follows:
%g0 - %g7 -- 0 to 7
%o0 - %o7 -- 8 to 15
%l0 - %l7 -- 16 to 23
%i0 - %i7 -- 24 to 31
Five bits suffice to encode a register, since there are only 32 of
them (2^5 = 32). The numeric encoding of opcodes can be found in Chapter 8, pp.
215-216.
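The register numbering above is simply a base of 0, 8, 16, or 24 plus the index within the group. A small C sketch of that mapping, where the function name reg_number and the use of -1 for invalid input are assumptions made for illustration:

```c
/* Hypothetical helper: maps a register class ('g', 'o', 'l', or 'i') and
 * an index 0..7 to its numeric ID 0..31. Returns -1 for invalid input. */
int reg_number(char cls, int index)
{
    int base;

    if (index < 0 || index > 7)
        return -1;

    switch (cls) {
    case 'g': base = 0;  break;   /* %g0 - %g7 */
    case 'o': base = 8;  break;   /* %o0 - %o7 */
    case 'l': base = 16; break;   /* %l0 - %l7 */
    case 'i': base = 24; break;   /* %i0 - %i7 */
    default:  return -1;
    }
    return base + index;
}
```

For example, reg_number('o', 7) yields 15, the ID of %o7.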
We can see that eight bits are left unused in this case; they are needed only to "pad" the instruction out to the required 32 bit length.
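Packing the three-register form can be sketched in C with shifts and masks. The function name encode_format3_reg is hypothetical. The example in the note below uses op = 10 and op3 = 000000 for add; those values come from the SPARC opcode tables, not from the text above.

```c
#include <stdint.h>

/* Hypothetical helper: builds a format 3 instruction with two source
 * registers. Field positions, from the most significant bit down:
 * op(2) rd(5) op3(6) rs1(5) 0(1) unused(8) rs2(5). */
uint32_t encode_format3_reg(unsigned op, unsigned rd, unsigned op3,
                            unsigned rs1, unsigned rs2)
{
    return ((uint32_t)(op  & 0x03u) << 30)   /* bits 31..30 */
         | ((uint32_t)(rd  & 0x1Fu) << 25)   /* bits 29..25 */
         | ((uint32_t)(op3 & 0x3Fu) << 19)   /* bits 24..19 */
         | ((uint32_t)(rs1 & 0x1Fu) << 14)   /* bits 18..14 */
                                             /* bit 13 stays 0: register form */
                                             /* bits 12..5 stay 0: unused */
         |  (uint32_t)(rs2 & 0x1Fu);         /* bits 4..0 */
}
```

With %o0 = 8, %o1 = 9, and %o2 = 10, the instruction add %o0, %o1, %o2 would be built as encode_format3_reg(2, 10, 0x00, 8, 9), which gives 0x94020009.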
For the second case, (2 registers and an immediate value), the fields in the instruction look like this:
Width:   2    5     6     5    1         13
       ---------------------------------------------
       |    |    |     |     | 1 |                 |
       ---------------------------------------------
Name:    op   rd   op3   rs1       signed constant
op: 10 or 11, indicating that this is a format 3 instruction
rd: numeric ID of destination register
op3: encoding of opcode
rs1: numeric ID of first source register
The processor tells these two cases apart by the single bit right after
the rs1 field: 0 selects the three-register form, 1 the immediate form.
Note that the signed constant is only allotted 13 bits. This is a consequence of our decision to limit all instructions to exactly 32 bits in size. The immediate value is encoded in two's complement, so the range of values we can encode is -4096 to +4095. Thus, if we try to use an immediate value in a format 3 instruction that is out of this range, the assembler will complain -- it just can't fit such a value in there.
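The range check and the immediate encoding can be sketched in C as follows. The names fits_simm13 and encode_format3_imm are hypothetical. The example values in the note below (op = 10 with op3 = 000000 for add, and op3 = 111000 for jmpl) come from the SPARC opcode tables rather than from the text.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if value fits in a 13-bit two's
 * complement field, i.e. lies in -4096..+4095. */
int fits_simm13(int32_t value)
{
    return value >= -4096 && value <= 4095;
}

/* Hypothetical helper: builds a format 3 instruction with one source
 * register and a 13-bit signed immediate. The caller is expected to
 * check the immediate with fits_simm13 first; here it is just truncated. */
uint32_t encode_format3_imm(unsigned op, unsigned rd, unsigned op3,
                            unsigned rs1, int32_t imm)
{
    uint32_t simm13 = (uint32_t)imm & 0x1FFFu;   /* low 13 bits, two's complement */

    return ((uint32_t)(op  & 0x03u) << 30)   /* bits 31..30 */
         | ((uint32_t)(rd  & 0x1Fu) << 25)   /* bits 29..25 */
         | ((uint32_t)(op3 & 0x3Fu) << 19)   /* bits 24..19 */
         | ((uint32_t)(rs1 & 0x1Fu) << 14)   /* bits 18..14 */
         | (1u << 13)                        /* bit 13 = 1: immediate form */
         | simm13;                           /* bits 12..0 */
}
```

Under these assumptions, add %o0, -1, %o0 is encode_format3_imm(2, 8, 0x00, 8, -1), giving 0x90023FFF, and jmpl %o7 + 8, %g0 is encode_format3_imm(2, 0, 0x38, 15, 8), giving 0x81C3E008.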
Seven bits are used to select a format 3 instruction (one bit to
distinguish op = 10 from op = 11, plus the six bits of op3), so we can have
a total of 128 possible instructions. All the opcodes that are not used are
named UNIMP.