We can all be glad that the current popularity of ARM has saved us from the wretched and vile mess that x86 assembly is. It is a tribute to the work of clever engineers at Intel that they are able to coax the amazing performance they do out of the x86 while maintaining binary compatibility with such a miserable architecture. Enough said on that topic. ARM by contrast is quite civilized.
I will say little if anything about machine language. If you are writing an assembler or disassembler, you will need to care about how instructions are encoded, but I find little need to be concerned about it, although sometimes it forces itself on us. It is worth knowing that ARM instructions have a constant 32 bit size. (But see my notes on "thumb encoding" below).
    R11 - fp (the frame pointer, if you use gcc)
    R13 - sp (the stack pointer, entirely by convention)
    R14 - lr (the link register, holds the return address for subroutine calls)
    R15 - pc (the program counter)
Although r13 is the stack pointer only by convention, you would be making severe trouble for yourself if you did otherwise. If you are using gcc, the R11 register is used for the frame pointer.
Why do we skip r12 you should be asking. I have found no good reason. It was probably specified for some special purpose in some committee designed standard, but then was never actually put to use. Who knows!
The Gnu tools use obsolete aliases for the R10 and R12 registers. These may have had particular definitions in some deprecated ABI, but I have never found them to be anything but general registers and it is unfortunate and somewhat confusing that the Gnu tools retain the old names.
    R10 - sl (the stack limit)
    R12 - ip (intra procedure scratch register)
    bl mysub
The call places the return address in the "lr" register which can simply be copied into "pc" to return. However the usual way to return is as follows.
    bx lr
Since to return you just copy lr to pc, you can achieve the proper effect in any number of ways. You are probably saying, "why not just return by using 'mov pc,lr'?" and indeed you could. The "bx" instruction does something extra to allow switching to and from thumb mode (see below). If you are not using thumb mode (and I have yet to use it), this does not matter, but doing a return using "bx lr" is as good as anything and a fine convention to follow. So if you were wondering why we use this special "bx" instruction to do something that doesn't seem all that special, this is why and now you know.
This business of not using the stack makes for tidy and efficient "leaf" routines, but if your subroutine intends to call other subroutines, you will need to shove lr onto the stack, otherwise it will get overwritten by the next call. This leads to the following idiom for coding a subroutine. Let's suppose we also want to save the r4 register.
    push {r4, lr}
    ...
    bl next_sub
    ...
    pop {r4, pc}
This can be expanded to save and restore any number of registers.
As near as I can tell, you never need to save r0, r1, r2, or r3 to play nice with gcc. Also r0 serves to return function values. The first four subroutine arguments get passed in r0 through r3, and any beyond that go on the stack. This is all about C compiler conventions, so if you are writing pure assembly language you can do anything you can keep straight with yourself. Good luck, and "Vaya con Dios".
The ARM has an alternate encoding where each instruction occupies 16 bits that is known as "thumb mode". This allows compact code and possibly more efficient code if the memory bus is a design bottleneck. Thumb mode has a fair number of instruction differences and even instructions that have no direct counterpart in regular ARM mode.
You enter thumb mode using the "bx" instruction with the target address in some register. If the address being branched to is odd (the low bit is set) execution switches to thumb mode. Another way is to fiddle with the "t" bit in the SPSR in a specific way. You can't just fiddle it directly; you must let some other instruction restore the CPSR.
To exit thumb mode, execute a "bx" instruction with an even target address.
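The rule for "bx" can be written down as a tiny Python model. This is only a sketch of the behavior described above, not anything from a real toolchain; the function name bx and the tuple it returns are made up for illustration.

```python
def bx(target):
    """Model of 'bx rN' where rN holds target: return (new_pc, thumb)."""
    thumb = bool(target & 1)        # low bit set selects thumb mode
    new_pc = target & 0xFFFFFFFE    # the low bit is a flag, not part of the address
    return new_pc, thumb
```

So bx(0x8001) lands at 0x8000 in thumb mode, while bx(0x8000) lands at the same address in regular ARM mode.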
The instruction PUSH {r3} writes the contents of r3 to the address sp-4, then subtracts 4 from the value in sp.
The instruction POP {r3} reads the contents of the address pointed to by sp into r3, then adds 4 to the value in sp.
push and pop are really just shorthand for more general and powerful multiple register instructions (STMDB, LDM, and/or LDMIA). Ignoring that though, they nicely handle any number of registers in a single instruction.
And no, you cannot just write "push r3", the assembler demands the curly braces.
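Here is a small Python model of what push and pop do to sp and memory, including the "push {r4, lr}" / "pop {r4, pc}" idiom from earlier. It is a sketch under simplifying assumptions: registers are a dict, memory is a dict keyed by word address, and the names REG_ORDER, push, and pop are invented for this example. With several registers, the lowest numbered register ends up at the lowest address.

```python
# Register numbering order, used to decide which register goes where.
REG_ORDER = ["r%d" % n for n in range(13)] + ["sp", "lr", "pc"]

def push(regs, mem, names):
    """Model of PUSH {names}: decrement sp by 4, then store, per register."""
    for name in sorted(names, key=REG_ORDER.index, reverse=True):
        regs["sp"] = (regs["sp"] - 4) & 0xFFFFFFFF
        mem[regs["sp"]] = regs[name]

def pop(regs, mem, names):
    """Model of POP {names}: load from sp, then increment sp by 4, per register."""
    for name in sorted(names, key=REG_ORDER.index):
        regs[name] = mem[regs["sp"]]
        regs["sp"] = (regs["sp"] + 4) & 0xFFFFFFFF

regs = {"r4": 7, "lr": 0x8000, "sp": 0x2000}
mem = {}
push(regs, mem, ["r4", "lr"])   # sp is now 0x1ff8
pop(regs, mem, ["r4", "pc"])    # the saved lr lands in pc, which is the return
```

Popping the saved lr value straight into pc is what makes the one-instruction return work.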
    mov r0,#0
    mov r3,#0x40
This is all well and good, but things get more complicated with larger values. Also there is no clever trick to clear a register; loading a zero immediate works as well as anything. This is a RISC processor after all and that instruction will run in a single clock like any other.
The story with larger values is simply that there is only so much room in a 32 bit instruction set aside for the immediate value, namely 12 bits. But it isn't even that simple - the 12 bits is divided into a 4 bit "rotation" and an 8 bit value. So values from 0-255 are no big deal. Beyond that, the assembler does the dirty work, so you can write things like this:
    mov r1, #0x00ab0000
The assembler stores the 8 bit value 0xab along with a rotation value, so it looks like magic stuffing 32 bit constants into 12 bits. There is even more to this, which you can read about elsewhere. If you need to load some general 32 bit constant that is not accommodated by this compression scheme, you have two choices. You can use two instructions and load it one half at a time, or you can stick it someplace in memory and then fetch it. The first scheme works like this:
    movw sp, #:lower16:my_stack
    movt sp, #:upper16:my_stack
Note that you must do the "movw" first, as it zero extends its 16 bit value and so clears the upper half of the word; "movt" then fills in the upper half and leaves the lower half alone. The second scheme uses a nice assembler construct like this:
    ldr r2, =0x01F00220
Just for the record, this typically assembles into something like the following. The assembler finds some spot nearby (typically after a subroutine) to dump the constant, then loads it via a PC relative address, like this:
    ldr r2, [pc, #140]
There is a fair chance this will be in the cache and the instruction will nicely execute in one cycle.
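To see why the order of the movw/movt pair matters, here is a minimal Python model of the two instructions. It is a sketch only; the function names simply mirror the mnemonics, and registers are modeled as plain integers.

```python
def movw(imm16):
    """movw rd, #imm16: write the low half, upper half becomes zero."""
    return imm16 & 0xFFFF

def movt(reg, imm16):
    """movt rd, #imm16: write the upper half, keep the low half of rd."""
    return (reg & 0xFFFF) | ((imm16 & 0xFFFF) << 16)

value = 0x01F00220
right = movt(movw(value & 0xFFFF), value >> 16)   # movw first, then movt
wrong = movw(value & 0xFFFF)                      # a movw done last wipes the upper half
```

Done in the right order the register ends up holding the full 0x01F00220. If movw comes last, whatever movt wrote is wiped and only 0x0220 is left.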
    mov r2, #0xFFFFFFFF
32 bits and all ones, how does this fit into an 8 bit immediate field? The answer is that the assembler is being clever and mapping this to the "mvn" instruction. The assembler will do many things of this sort, so one approach to big immediate values is to just write naive code and wait for the assembler to blow the whistle, then convert as needed to one of the other forms mentioned above.
The "mvn" instruction flips all the bits of the operand, then loads it into a register.
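If you want to know ahead of time whether a constant fits the "8 bit value plus rotation" scheme, the check is easy to model in Python. The sketch below assumes the classic ARM mode encoding, where the 4 bit rotation field is doubled and the 8 bit value is rotated right by that amount. The function names are made up here, and this is not how any particular assembler is implemented.

```python
def rotr32(value, amount):
    """Rotate a 32 bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def encode_arm_immediate(value):
    """Return (rot4, imm8) with imm8 rotated right by 2*rot4 == value, else None."""
    value &= 0xFFFFFFFF
    for rot4 in range(16):
        # Rotating right by 32 - 2*rot4 is a rotate left by 2*rot4,
        # which undoes the rotate right the hardware applies.
        imm8 = rotr32(value, 32 - 2 * rot4)
        if imm8 <= 0xFF:
            return rot4, imm8
    return None

def pick_instruction(value):
    """Return 'mov', 'mvn', or None if neither single instruction can load value."""
    if encode_arm_immediate(value) is not None:
        return "mov"
    if encode_arm_immediate(~value) is not None:
        return "mvn"
    return None
```

For 0x00ab0000 this finds the value 0xab with a rotation field of 8 (a rotate right by 16). For 0xFFFFFFFF it picks "mvn", and for 0x01F00220 it returns None, which is the case where you need movw/movt or the "ldr =" form.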
    ldr r0, [r4]
    str r1, [r0]
But you can add offsets, and do other things like store bytes and halfwords.
    ldrh r4, [r0, #136]
    ldrb r0, [r5, #24]
    strb r1, [r0, #24]
    strh r0, [r1, #130]
And you can do this, which is post-indexed addressing. No shift is involved: the byte in r4 is stored at the address held in r3, and then 1 is added to r3.
    strb r4, [r3], #1
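The difference between the offset form "[rn, #offset]" and the post-indexed form "[rn], #offset" can be modeled in a few lines of Python. This is a sketch with big simplifications: memory is a dict keyed by byte address, registers are a dict, and the function names are invented for the example.

```python
def strb_offset(regs, mem, rt, rn, offset=0):
    """strb rt, [rn, #offset]: store the low byte at rn + offset, rn unchanged."""
    mem[(regs[rn] + offset) & 0xFFFFFFFF] = regs[rt] & 0xFF

def strb_post(regs, mem, rt, rn, offset):
    """strb rt, [rn], #offset: store the low byte at rn, then add offset to rn."""
    mem[regs[rn]] = regs[rt] & 0xFF
    regs[rn] = (regs[rn] + offset) & 0xFFFFFFFF

regs = {"r3": 0x100, "r4": 0x41}
mem = {}
strb_post(regs, mem, "r4", "r3", 1)   # byte goes to 0x100, r3 becomes 0x101
strb_post(regs, mem, "r4", "r3", 1)   # byte goes to 0x101, r3 becomes 0x102
```

The post-indexed form is what makes a byte copy loop so compact, since the pointer bumps itself along on every store.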
add r4, r5, r6
    eor r0, r0, r8
    orr r5, r5, #1
    and r6, r0, r1
    bic r6, r0, r1
The first three instructions in the above should be obvious enough, and note that we slipped in EOR (the exclusive or) just to see if you were paying attention. However, you should be asking, "What the heck is "bic" and what is it doing with these familiar logical operations?"
BIC stands for "bit clear" and it is simply an and with the complement of the second operand (which clears all bits in the mask). Note that the second operand is a "mask" not a bit number and multiple bits can be cleared, as in the following instruction:
    bic r0, r0, #0x1f
Handy enough, but you may be saying, "why not just use "and" and let the assembler invert the mask?". Well, you can code that way if you want, but sometimes it is handy to have a single mask in a register and just use different instructions to set or clear a bit. And just in case you are asking where the "BIS" instruction is, there ain't one, you just use "ORR" to set bits.
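In Python terms, "bic" and "orr" with the same mask look like this. It is just a sketch on 32 bit values, with function names that mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def bic(value, mask):
    """bic rd, rn, mask: clear every bit of value that is set in mask."""
    return value & ~mask & MASK32

def orr(value, mask):
    """orr rd, rn, mask: set every bit of value that is set in mask."""
    return (value | mask) & MASK32

cleared = bic(0xFF, 0x1f)        # low five bits cleared
restored = orr(cleared, 0x1f)    # same mask, now setting the bits
```

The point is that one mask value serves both purposes, and the choice of instruction decides whether the bits get set or cleared.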
    add r6, r6, r0
    add r5, r5, #1
    add r0, r0, r5, lsl #4
    add r1, r1, r0, asr #7
The first two of these are clear enough, but what about the last two? The ARM has a place in each instruction to provide a shift specification. You can also do:
    mul r0, r5, r0
    sub r0, r1, #1
    subs r9, r9, #1
    subeq r0, r0, r2, lsr #1
    subscc r4, r4, lr, lsr #4
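To make the shift specification concrete, here is a Python sketch of the three shifts used above, applied to 32 bit values. The helper names mirror the mnemonics, the shift counts are assumed to be small constants as in the examples, and condition codes and flag setting (the "s", "eq", and "cc" suffixes) are left out entirely.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left: zeros come in at the bottom."""
    return (value << n) & MASK32

def lsr(value, n):
    """Logical shift right: zeros come in at the top."""
    return (value & MASK32) >> n

def asr(value, n):
    """Arithmetic shift right: copies of the sign bit come in at the top."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> n) & MASK32

def add(a, b):
    """32 bit add with wraparound."""
    return (a + b) & MASK32

r0 = add(1, lsl(3, 4))            # add r0, r0, r5, lsl #4 with r0 = 1 and r5 = 3
```

So "add r0, r0, r5, lsl #4" adds 16 times r5 to r0 in one instruction, and the difference between "lsr" and "asr" only shows up when the top bit is set.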
If they are 0000, you lose.
If they are 0001, you get SDIV and UDIV in the thumb instructions.
If they are 0010, you get SDIV and UDIV in both thumb and ARM instructions.
There are all kinds of ways to accomplish integer division without a divide instruction. Being lazy, my approach is to write code in C and let the compiler sweat it out. Many ARM processors these days have floating point hardware, and you can use floating point instructions to do your division for you.
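As one example of dividing without a divide instruction, here is the schoolbook shift and subtract method for unsigned 32 bit values, written in Python so the logic is easy to follow. This is a sketch of a generic technique, not what gcc or any ARM support library actually emits, and the name udiv32 is made up.

```python
def udiv32(n, d):
    """Unsigned 32 bit division by shift and subtract: return (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((n >> bit) & 1)
        if remainder >= d:
            remainder -= d
            quotient |= 1 << bit
    return quotient, remainder
```

Each pass of the loop needs only a shift, a compare, and a subtract, all of which the ARM does happily in a handful of instructions.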
Tom's Computer Info / tom@mmto.org