The GNU C compiler (gcc) supports inline assembly within C code. As they say, there are two reasons to use inline assembly. One is code optimization, the other is access to specific instructions that the compiler is unable to generate for you. My use here is a bit of both.
First I want to explain one thing that has just become clear to me and that has greatly aided my understanding. Consider a statement like the following:
    asm volatile ( "movs %[digit], #0x7\n\t" : [digit] "=r" (d) );

I am clearly diving in without explaining many things, but stick with me. This is equivalent to the C statement "d=7;" and of course we are using inline assembly here just for the sake of illustration. The clause [digit] "=r" (d) is what establishes the connection between the assembly language world and the C world. The symbol "d", as in (d), is the C variable we want to act on. In the assembly world we use "digit" to refer to it. The "=r" business indicates it is an output register.
The important idea is that %[digit] refers to a register. It is the compiler's problem to assign that register, and to know that it needs to take the value left in that register and store it into the C variable afterwards.
What I am saying might be clearer if I were talking about an input register. In that case it would be the compiler's problem to assign that register and to get the value of the C variable into that register. That is all done for you and you don't have to worry about it.
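To make the input case concrete, here is a minimal sketch (not from the original code) with one output operand and two input operands. The function name add_regs and the operand names sum, a and b are made up for illustration, and the plain C branch is there only so the file still builds on a non-ARM host.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: returns a + b. */
static inline uint32_t add_regs(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__arm__) || defined(__thumb__)
    /* The compiler puts a and b into registers of its choosing before the
     * asm runs, and copies %[sum] back into the C variable afterwards. */
    asm volatile (
        "adds %[sum], %[a], %[b]\n\t"
        : [sum] "=r" (sum)             /* output operand */
        : [a] "r" (a), [b] "r" (b)     /* input operands */
        : "cc"                         /* adds updates the condition flags */
    );
#else
    sum = a + b;    /* non-ARM build: same result in plain C */
#endif
    return sum;
}
```

Nothing in the assembly names a specific register; the choice is left entirely to the compiler.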
All of this means that your job is easier than you thought it might be. You just decide what needs to be done with ARM registers coming and going, and the compiler sets them up for you coming in and takes them from you going out. Having said all of that, here is a great tutorial on gcc inline assembly for the ARM:
The original code in my printf for %d looked like this:
    do {
        *cp++ = hex_table[n%10];
        n /= 10;
    } while (n);

Note the % to get the modulo and the divide by 10. I recoded this like so:
    do {
        d = digit ( &n );
        *cp++ = hex_table[d];
    } while (n);

Here the function "digit" does the divide by 10, returning the remainder and modifying the value of n (dividing it by 10). The assembly code I first wrote and put into start.S looks like this:
    #define SIO_BASE 0xD0000000

    // Calculate p/q where p is argument and q is 10
    .global digit
    .thumb_func
    digit:
            ldr r1,=SIO_BASE
            ldr r2, [r0]
            str r2, [r1,#0x60]
            movs r2, #10        // divide by 10
            str r2, [r1,#0x64]
            // Delay for 8 cycles
            // each branch gives us 2 cycles
            b 1f
    1:      b 1f
    1:      b 1f
    1:      b 1f
    1:
            // Must read quotient last
            ldr r3, [r1,#0x74]  // remainder
            ldr r2, [r1,#0x70]  // quotient
            str r2, [r0]
            movs r0, r3
            bx lr

This worked just fine, but I wanted to see if I could use inline assembly code, and I ended up with this:
    // #define SIO_BASE 0xD0000000

    do {
        asm volatile (
            "ldr r1, =0xD0000000\n\t"
            "str %[value], [r1,#0x60]\n\t"
            "movs r2, #10\n\t"
            "str r2, [r1,#0x64]\n\t"
            // Delay for 8 cycles
            "b 1f\n\t"
            "1: b 1f\n\t"
            "1: b 1f\n\t"
            "1: b 1f\n\t"
            "1:"
            "ldr %[digit], [r1,#0x74]\n\t" // remainder
            "ldr %[value], [r1,#0x70]\n\t" // quotient
            : [digit] "=r" (d) , [value] "+r" (n)
            :
            : "r1", "r2"
        );
        *cp++ = hex_table[d];
    } while (n);

Note that the inline assembly did not incorporate the macro value SIO_BASE, so I had to hand code it. This actually works fine and yields code as good as anyone could want.
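The macro is not picked up because the preprocessor does not expand macros inside string literals. One common workaround, which the original code does not use, is the two-level stringification idiom combined with the concatenation of adjacent string literals. The helper macros STR and XSTR and the array load_sio_base below are hypothetical names chosen for this sketch.

```c
#define SIO_BASE 0xD0000000

/* Two-level stringification: XSTR expands its argument first, then STR
 * turns the expanded text into a string literal. (Hypothetical names.) */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Adjacent string literals are concatenated by the compiler, so this holds
 * the same text as "ldr r1, =0xD0000000\n\t". */
static const char load_sio_base[] = "ldr r1, =" XSTR(SIO_BASE) "\n\t";
```

Inside an asm statement the first line could then be written as "ldr r1, =" XSTR(SIO_BASE) "\n\t". Passing the address to the asm as an input operand is another option.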
Generalizing this (perhaps with an inline function) for integer division would not be hard. I have yet to investigate what gcc optimization would do with this. The generated code from objdump is as follows:
    100002ba: 49a2 ldr r1, [pc, #648]
    100002bc: 660b str r3, [r1, #96] @ 0x60
    100002be: 220a movs r2, #10
    100002c0: 664a str r2, [r1, #100] @ 0x64
    100002c2: e7ff b.n 100002c4
    100002c4: e7ff b.n 100002c6
    100002c6: e7ff b.n 100002c8
    100002c8: e7ff b.n 100002ca
    100002ca: 6f48 ldr r0, [r1, #116] @ 0x74
    100002cc: 6f0b ldr r3, [r1, #112] @ 0x70

Here we see that the compiler selected r3 to hold the value of "n" both coming in and going out. The compiler selected r0 to hold the value of "d" going out.
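As a rough idea of the generalization mentioned above, an inline function for unsigned division might look like the sketch below. It is untested on hardware and rests on several assumptions: the names udivmod and USE_SIO_DIVIDER are invented here, the register offsets and the 8 cycle delay are simply copied from the code above, and the plain C branch exists only so the function can be built and checked on a host machine.

```c
#include <stdint.h>

#define SIO_BASE 0xD0000000u

/* Hypothetical helper: returns p / q and stores p % q in *rem. */
static inline uint32_t udivmod(uint32_t p, uint32_t q, uint32_t *rem)
{
#if defined(USE_SIO_DIVIDER)    /* assumed build flag, for the target only */
    uint32_t quot, r;
    asm volatile (
        "str %[p], [%[base], #0x60]\n\t"     /* dividend */
        "str %[q], [%[base], #0x64]\n\t"     /* divisor */
        /* Delay for 8 cycles, 2 per branch, as in the code above */
        "b 1f\n\t"
        "1: b 1f\n\t"
        "1: b 1f\n\t"
        "1: b 1f\n\t"
        "1:\n\t"
        "ldr %[r], [%[base], #0x74]\n\t"     /* remainder */
        "ldr %[quot], [%[base], #0x70]\n\t"  /* quotient, read last */
        /* "=&r": r is written while base is still needed, so it must not
         * be given the same register as an input. */
        : [quot] "=r" (quot), [r] "=&r" (r)
        : [p] "r" (p), [q] "r" (q), [base] "r" (SIO_BASE)
        : "memory"
    );
    *rem = r;
    return quot;
#else
    *rem = p % q;       /* host build: plain C division */
    return p / q;
#endif
}
```

With this in place the loop body would become d = udivmod ( n, 10, &r ) style calls, and the divisor and base address are handed to the compiler as ordinary input operands, so no registers need to be listed as clobbered.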
Tom's electronics pages / tom@mmto.org