GAS/GCC ARM Assembler

version 0.2 [updated 17th Oct 2001]

Colour usage

Any sections/text in this cyan colour are untested.
Any sections/text in this greeny colour are tested and should work; if they do not, then it was a mistake when I cut'n'pasted them from working code.
Anything in this colour is a command entered at the command line.
Finally, items in this colour are jargon, or terms that you should know; at some point I may make them links to their definitions.

It is assumed that the following is true:
You are using Windows (all tests were done on Windows 2000)
You are using gcc 3.0.1 (DevKit Advance)
DevKit Advance is installed in c:\devkitadv\....
and that c:\devkitadv\bin is on your PATH,
e.g. set PATH=%PATH%;c:\devkitadv\bin

The basics

I'll assume you have read the ARM instruction set documentation and have a basic understanding of how a CPU works.
I will start from what I consider the basics. For many this will be too slow; hopefully nobody will think it too fast. If you do find yourself getting lost, or find items confusing, email me, Mike Wynn, and I'll produce a FAQ from the mails. Please tell me if there are any errors, spelling mistakes or omissions, because:
A. I hate inaccurate docs
B. I cannot spell
C. I hate inaccurate docs

To make life easy for those of you who do not have RISC experience, I'm going to start with the Thumb mode instruction set.
The GBA CPU (an ARM7TDMI) has two instruction sets, Thumb and ARM. The ARM set is a more classical RISC instruction set; anyone who has MIPS or PowerPC experience will find it quite familiar. The basic differences are: there is no load-linked/store-conditional pair, every instruction has an optional condition code, and, unlike MIPS, there is no branch delay slot to worry about.
Thumb, on the other hand, is much more akin to the instruction set of old 8-bit CPUs like the 6502, but it operates on 32-bit registers. It is a much more restrictive set than ARM, but if you are not experienced with RISC assembler then it's a good starting point, as it has fewer pitfalls than ARM mode: the ordinary arithmetic and logical instructions always set the flags, the stack pointer is the stack pointer (in ARM mode any register can be a stack pointer), every instruction is executed (there is no condition code prefix; only branches are conditional) and, as I said before, it may be a little more familiar to those who come from a CISC or old 8-bit CPU background.
I have not YET written a guide to ARM/Thumb (and may never do so); have a look at www.devrs.com/gba or www.gbadev.org, where there are lots of docs on the GBA and ARM.
Let's start by looking at a simple function from test1.c:
static unsigned int getValue( void )
{
	return 0xC0DEF00D;
}
which returns an unsigned 32-bit quantity.
If you enter the following command
gcc -mthumb -S test1.c
you will have created test1.s, an assembler version of the test1.c file.
The important part of this file is the assembler for that function:
	.align	2
	.thumb_func
	.type	getValue,function
getValue:
	push	{r7, lr}
	mov	r7, sp
	ldr	r0, .L8
	mov	sp, r7
	pop	{r7, pc}
.L9:
	.align	2
.L8:
	.word	-1059131379
.Lfe2:
	.size	getValue,.Lfe2-getValue
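Line by line (the instruction or directive, followed by its meaning):
.align 2 tells the assembler to align what follows to a 4-byte (word) boundary. On ARM targets the argument to .align is a power of two, so .align 2 means 2^2 = 4 bytes, not 2 bytes (N.B. all Thumb code MUST be at least half-word aligned; starting the function on a word boundary satisfies that).
.thumb_func marks the label that follows as the start of a Thumb function, so that the assembler and linker know it must be entered in Thumb state; this matters for ARM/Thumb interworking.
.type getValue,function records in the symbol table that getValue is a function. It stays local to this file because there is no .global directive for it (the C function was declared static).
getValue: the label itself, the start of the function that the previous two directives describe.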
push {r7, lr} This is the first of two prolog instructions. It saves any registers that must be preserved, together with the caller's frame pointer.
In Thumb mode GCC uses r7 as the frame pointer, much as fp (r11) is used in ARM mode: it points to the current stack frame, both for accessing parameters that have been passed in and so that the saved values can be reloaded in the epilog. N.B. this is a RISC CPU, so the return address is not automatically pushed on the stack as it would be on a CISC CPU (e.g. Z80, 6502, 68K or x86, to name but a few); instead the return address is stored in a register, namely lr, the link register (other RISC CPUs, e.g. MIPS and PowerPC, do exactly the same).
You should note that here we are pushing the frame pointer and the link register onto the stack.
mov r7, sp This is the second and final instruction of the prolog. We set the frame pointer (having saved the old value in the previous instruction) so that the stack pointer (sp) is then free to move, for example when calling other functions, without us having to keep track of how much stack the function is using.
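ldr r0, .L8 Finally the actual code; in this example it is just this one instruction.
We load a register, in this case r0, with the value stored at a label local to this item, here .L8.
N.B. this PC-relative load can only reach forwards. The label must be word (4-byte) aligned and lie 0 to 1020 bytes beyond the word-aligned value of (address of the ldr + 4); in other words, the constant must come after the instruction and no more than about 1KB away.
r0 is now loaded with the value to return (r0 is where the caller expects a 32-bit result).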
mov sp, r7 Having set r0, we can leave.
First we must restore the stack pointer so that we can recover the registers we saved in the prolog. Simply setting the stack pointer to the current frame pointer will do.
pop {r7, pc} Now that the stack pointer is restored, we can restore the other registers.
But note that we pushed the frame pointer and link register, and now we are restoring the frame pointer and program counter.
Of course we are! In the prolog the link register held the return address. Now we want to go back there, so rather than waste an instruction copying the link register into the program counter, we pop the saved value straight into the program counter, and that's the end of the function.
In one hit the frame pointer now points to the parent's stack frame and the program counter is at the instruction directly after the function call (branch and link) that sent us here.
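.L9: a local label the compiler emits after the last instruction; nothing in this listing refers to it.
.align 2 tells the assembler that what follows must be 4-byte (word) aligned (on ARM the .align argument is a power of two). This is needed because the constant loaded by ldr r0, .L8 must be word aligned.
.L8: the local label, seen earlier in the ldr, that marks the constant we are going to load into r0.
.word -1059131379 tells the assembler to emit a 32-bit word containing this value. GCC simply prints it as a signed 32-bit integer; the bit pattern is 0xC0DEF00D (3235835917 - 2^32 = -1059131379).
.Lfe2: a label marking the end of the item.
.size getValue,.Lfe2-getValue tells the assembler to record the size of the item in the symbol table, computed as the distance between the two labels. Here that covers the function and its associated constant: 10 bytes of code (five 2-byte Thumb instructions), 2 bytes of padding and the 4-byte word, 16 bytes in all.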
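The numbers in that table (the alignment, how far the ldr can reach, the odd-looking negative .word and the 16-byte size) can all be checked with a bit of C arithmetic. This is my own sketch, not part of test1: gas_arm_align, thumb_ldr_literal_ok, word_as_signed and getValue_size are hypothetical helper names. It assumes GAS's power-of-two meaning of .align on ARM and the Thumb PC-relative ldr encoding (an 8-bit word offset added to the word-aligned value of the instruction address + 4).

```c
#include <stdbool.h>
#include <stdint.h>

/* GAS on ARM: ".align n" pads the location counter up to a multiple
   of 2^n bytes, so ".align 2" means 4-byte (word) alignment. */
static uint32_t gas_arm_align(uint32_t loc, unsigned n)
{
    uint32_t a = 1u << n;
    return (loc + a - 1u) & ~(a - 1u);
}

/* Thumb "ldr rd, label": the literal address is
   ((instr + 4) & ~3) + imm8 * 4, with imm8 in 0..255.
   Returns true (and sets *imm8) if lit is reachable from instr. */
static bool thumb_ldr_literal_ok(uint32_t instr, uint32_t lit, uint32_t *imm8)
{
    uint32_t base = (instr + 4u) & ~3u;
    if ((lit & 3u) != 0 || lit < base || lit - base > 1020u)
        return false;   /* misaligned, behind the pc, or too far away */
    *imm8 = (lit - base) >> 2;
    return true;
}

/* Reinterpret a 32-bit word as two's complement, the way GCC
   printed 0xC0DEF00D as ".word -1059131379". */
static int64_t word_as_signed(uint32_t w)
{
    return (w & 0x80000000u) ? (int64_t)w - 4294967296LL : (int64_t)w;
}

/* Layout of getValue from test1.s, starting at a word-aligned 'start':
   five 2-byte Thumb instructions, ".align 2", then the 4-byte .word. */
static uint32_t getValue_size(uint32_t start)
{
    uint32_t code_end = start + 5u * 2u;            /* push, mov, ldr, mov, pop */
    uint32_t literal  = gas_arm_align(code_end, 2); /* where .L8 lands          */
    return (literal + 4u) - start;                  /* .Lfe2 - getValue         */
}
```

For the listing above, getValue_size gives 16, and the ldr at offset 4 reaches .L8 at offset 12 with an 8-bit offset field of 1.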

Inlining yourself

If you look in test2.c you should find this:
static unsigned int getValue( void )
{
	/* no C return statement: the asm below leaves the result in r0 */
	asm volatile 
	("
	ldr r0, .myValue
	b .exit_this_function
	.align 2
	.myValue:
	.word 0xFEEDFACE
	.align 2
	.exit_this_function:
	");
}
If, however, you run the following command
gcc -mthumb -S test2.c
you will have created test2.s, an assembler version of the test2.c file,
and within it you will find the following:
	.align	2
	.thumb_func
	.type	getValue,function
getValue:
	push	{r7, lr}
	mov	r7, sp
	
	ldr r0, .myValue
	b .exit_this_function
	.align 2
	.myValue:
	.word 0xFEEDFACE
	.align 2
	.exit_this_function:
	
	.code	16
	mov	sp, r7
	pop	{r7, pc}
.Lfe2:
	.size	getValue,.Lfe2-getValue
I'm not going to annotate this, as I'm sure you can see that the compiler has taken the inline code and wrapped it with the same prolog and epilog code that it would output around a normal "C" function (the .code 16 after your block just tells the assembler to carry on assembling Thumb instructions).
You can also see why the extra branch to the end of the inline block is required: without it, the constant you have dropped into the code segment would be executed as if it were an instruction, not a very pleasant prospect.
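The prolog/epilog pair the compiler wraps around your code is easiest to follow as a toy model of the stack. The C below is purely illustrative (cpu_model, push_r7_lr, pop_r7_pc and run_getValue are hypothetical names of mine): it assumes the full-descending stack that push/pop use, where push stores the lower-numbered register at the lower address, and it shows the word saved from lr coming back out as pc.

```c
#include <stdint.h>

/* Toy CPU state: just enough to follow push {r7, lr} / pop {r7, pc}.
   Addresses are byte addresses into a small word array. */
typedef struct {
    uint32_t mem[16];                 /* 64 bytes of stack memory */
    uint32_t sp, r0, r7, lr, pc;
} cpu_model;

static uint32_t *word_at(cpu_model *c, uint32_t addr) { return &c->mem[addr / 4u]; }

/* push {r7, lr}: sp drops by 8 first (full-descending stack),
   then r7 goes to the lower address and lr to the higher one. */
static void push_r7_lr(cpu_model *c)
{
    c->sp -= 8u;
    *word_at(c, c->sp) = c->r7;
    *word_at(c, c->sp + 4u) = c->lr;
}

/* pop {r7, pc}: the same two words come back, but the one saved
   from lr goes straight into pc. (On the GBA's ARMv4T core bit 0 of
   that word is ignored and the CPU stays in Thumb state; this toy
   just keeps the raw word.) */
static void pop_r7_pc(cpu_model *c)
{
    c->r7 = *word_at(c, c->sp);
    c->pc = *word_at(c, c->sp + 4u);
    c->sp += 8u;
}

/* getValue from test1.s/test2.s, one instruction per line. */
static void run_getValue(cpu_model *c, uint32_t value)
{
    push_r7_lr(c);   /* push {r7, lr}            */
    c->r7 = c->sp;   /* mov  r7, sp  (new frame) */
    c->r0 = value;   /* ldr  r0, .L8 (the body)  */
    c->sp = c->r7;   /* mov  sp, r7              */
    pop_r7_pc(c);    /* pop  {r7, pc}            */
}
```

Starting with sp = 64, r7 = 0x03007F00 and lr = 0x08000123, run_getValue leaves pc = 0x08000123, r7 and sp back at their entry values, and r0 holding the value to return.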
First potential pitfall
As you have seen, things get added outside your inline code, so if you do require inline constants (and invariably you will), be very careful.
You could have done this:
static unsigned int getValue( void )
{
	asm volatile 
	("
	ldr r0, .myValue
	mov	sp, r7
	pop	{r7, pc}
	.align 2
	.myValue:
	.word 0xFEEDFACE
	");
}
since you know what prolog code will be generated, you can write your own epilog. I would not advise this approach, for the following reason.
Try this command (ignore the warning about multi-line string literals; later GCC releases dropped that extension, so with those each line of the asm must be written as its own string ending in \n):
gcc -O3 -S -mthumb -o test2_opt.s test2.c
It should generate test2_opt.s, another assembler version of test2.c, BUT this time we have told gcc to fully optimise the code (-O3):
	.align	2
	.thumb_func
	.type	getValue,function
getValue:
	
	ldr r0, .myValue
	b .exit_this_function
	.align 2
	.myValue:
	.word 0xFEEDFACE
	.align 2
	.exit_this_function:
	
	.code	16
	bx	lr
.Lfe5:
	.size	getValue,.Lfe5-getValue
At first glance you would be forgiven for thinking that the compiler has gone against your wishes and plonked the CPU into ARM mode somewhere in main before calling this.
But that is not true: bx switches to whichever state the target address asks for, so in Thumb mode it can return to ARM mode but need not. If you check the ARM-mode bx instruction you will see the same is true.
If you look in the documents on the Thumb bl instruction, you will see that it sets the bottom bit of the link register (lr) to 1. Then, if you read the docs on bx, you will see that the state the CPU switches to depends on the bottom bit of the register used in the bx instruction, in this case lr: 1 means Thumb, 0 means ARM. Since the caller reached us with a Thumb bl, bx lr returns in Thumb state.
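Here is a minimal C model of that bx rule, assuming nothing beyond what is described above: bit 0 of the register picks the state and is not part of the address. The names bx and bx_result are hypothetical, made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which instruction set bx switches to, and where it jumps. */
typedef struct {
    bool     thumb;   /* true: Thumb state, false: ARM state */
    uint32_t target;  /* branch address with bit 0 cleared   */
} bx_result;

/* bx rm: bit 0 of rm selects the state, the remaining bits are the
   address. (In ARM state the target should also be word aligned.) */
static bx_result bx(uint32_t rm)
{
    bx_result r;
    r.thumb  = (rm & 1u) != 0;
    r.target = rm & ~1u;
    return r;
}
```

For example, a Thumb bl at 0x08000230 (a 4-byte instruction pair) leaves lr = 0x08000235, and bx on that value gives Thumb state at 0x08000234.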
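As you can see, this has no prolog or epilog because it is a leaf function (one that does not call any others, like the leaves of a tree), so lr is never overwritten and we never have to create a real stack frame. Even better, it has no parameters or locals and uses no callee-saved registers, so there is nothing to save. This is why I advised against the hand-written epilog: at -O3 there is no push {r7, lr} to undo, so mov sp, r7 would load sp from whatever r7 happens to hold, and pop {r7, pc} would then pop garbage into r7 and pc. Try running this: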
gcc -S -mthumb -o param1.s param1.c
gcc -O3 -S -mthumb -o param1_opt.s param1.c
Then look in param1.c, param1.s and param1_opt.s and you will see examples of leaf functions before and after optimisation, with differing prologs and epilogs due to the optimisations that the compiler could perform. I have no idea yet why gcc uses mov pc, lr in ARM mode but bx lr in Thumb mode, as I did not have either in interwork mode and Thumb does have a mov pc, lr instruction.
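One way to sidestep the hand-written-epilog problem entirely is GCC's extended asm: rather than leaving a value in r0 and hoping, you give the asm an output operand and let the compiler generate whatever prolog and epilog it likes. This is my own sketch, not taken from test2.c. The numeric local labels (1:, 2:) also avoid the duplicate-label error .myValue would cause if the function were ever inlined in two places. The "l" constraint asks for a low register (r0-r7), which a Thumb ldr needs; the #else branch is only there so the sketch still compiles on a non-ARM host.

```c
#include <stdint.h>

/* Hypothetical replacement for getValue() in test2.c. */
static uint32_t getValue(void)
{
    uint32_t v;
#if defined(__arm__) || defined(__thumb__)
    __asm__ volatile (
        "ldr  %0, 1f\n\t"   /* pc-relative load of the word at 1: */
        "b    2f\n\t"       /* jump over the inline constant      */
        ".align 2\n"        /* the literal must be word aligned   */
        "1:\n\t"
        ".word 0xFEEDFACE\n"
        "2:"
        : "=l" (v));        /* output: any low register (r0-r7)   */
#else
    v = 0xFEEDFACEu;        /* host build: no ARM assembler here  */
#endif
    return v;
}
```

Because the compiler now knows where the result lives, it is free to drop the prolog at -O3 (or keep it at -O0) without breaking the asm.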
As a quick aside, in interwork mode (-mthumb-interwork) the compiler always uses bx for both ARM and Thumb. If you were wondering how to branch and link to a Thumb subroutine from ARM code, this is how:
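	ldr r0, thumb_func_addr	@ loads the word stored at thumb_func_addr
	mov lr, pc		@ ARM state: pc reads as this instruction + 8
	bx r0			@ bit 0 of r0 is 1, so enter Thumb state
	... rest of arm code ...
	.align 2
thumb_func_addr:		@ literal word, within 4KB of the ldr
	.word my_thumb_routine + 1	@ hypothetical routine name; +1 = Thumb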
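This works because in ARM state pc reads as 8 bytes ahead of the current instruction, so mov lr, pc leaves lr pointing at the instruction after bx r0, with bit 0 clear (ARM state). As long as the Thumb subroutine returns with a bx instruction all is OK; if it returns with pop {r4, pc} then things will go pear-shaped, because on the GBA's ARM7TDMI a pop into pc does not change state, so you would arrive back in the ARM code still in Thumb state.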
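Calling ARM code from Thumb needs more care. The mirror image, mov lr, pc followed by bx r0, does not work: in Thumb state pc reads as the current instruction + 4, which is the right return address (just after the 2-byte bx), but bit 0 is clear, so when the ARM routine returns with bx lr the CPU comes back in ARM state. Instead, let the Thumb bl set lr for you and branch to a one-instruction trampoline:
	ldr r0, arm_func_addr	@ word holding the ARM routine's address, bit 0 clear
	bl call_via_r0		@ Thumb bl sets lr = return address | 1
	... rest of thumb code ...

call_via_r0:			@ hypothetical trampoline label, in Thumb code
	bx r0			@ bit 0 of r0 is 0, so enter ARM state
This time the bottom bit of the link register is set correctly for the return, so a bx lr at the end of the ARM routine comes back in Thumb state.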
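The lr arithmetic behind these calling sequences fits in a few lines of C. These are hypothetical helpers written for illustration; they encode only the pc-read offsets (8 bytes ahead in ARM state, 4 in Thumb state), the Thumb bl rule that sets bit 0 of lr, and the bx rule that bit 0 selects the state. The Thumb mov lr, pc case shows the trap: the address is right but bit 0 is clear, so bx lr would return in ARM state.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM state "mov lr, pc" at address 'at': pc reads as at + 8, i.e. the
   instruction after the following 4-byte "bx r0". Bit 0 clear = ARM. */
static uint32_t lr_from_arm_mov(uint32_t at) { return at + 8u; }

/* Thumb state "mov lr, pc" at 'at': pc reads as at + 4. That is the
   instruction after a 2-byte bx, but bit 0 is clear, meaning ARM state. */
static uint32_t lr_from_thumb_mov(uint32_t at) { return at + 4u; }

/* Thumb "bl" (a 4-byte pair starting at 'at'): lr = next | 1. */
static uint32_t lr_from_thumb_bl(uint32_t at) { return (at + 4u) | 1u; }

/* "bx lr" at the end of the callee returns in Thumb state iff bit 0 is set. */
static bool bx_lr_returns_to_thumb(uint32_t lr) { return (lr & 1u) != 0; }
```

So an ARM caller using mov lr, pc at 0x08000100 gets lr = 0x08000108 and comes back in ARM state, a Thumb bl at 0x08000200 gives lr = 0x08000205 and comes back in Thumb state, while a Thumb mov lr, pc at 0x08000200 gives 0x08000204 and would wrongly come back in ARM state.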