December 4, 2016

GCC inline assembly

This stuff is a pain. It has a weird syntax and lots of tricky details. But it is rewarding once mastered, because it lets you do lots of amazing things without actually writing assembly language source files, and the result is often more efficient and elegant. Almost anyone working on operating system code will be interested in this topic. It is possible to avoid it entirely by coding C-callable routines in assembly language, but inline assembly avoids the call overhead, which makes it more efficient. In addition, inline assembly can make code more readable by keeping all the details in one place; however, these inline things tend to get gathered up into include files, negating this virtue.

Some examples and a few notes

Unless otherwise mentioned, most of the examples are for the x86, although plenty of ARM examples are creeping in. The basic concepts are identical for both architectures.

The simplest possible case is something like this:

    asm ( "nop" );	/* on an x86 */
    asm ( "mov r0, r0" );	/* on an ARM */

These examples are simple because these "nop" instructions do not affect registers or memory. In other words, they have no side effects. As soon as you start using registers or doing things that the compiler may need to know about, you really need to sit down and learn the whole business. The above links are pretty good.

If you actually want to use this code (such as for a short delay or some such), it is a good habit to add the word "volatile" after the "asm". GCC already treats an asm statement with no output operands as volatile, but as soon as a statement has outputs that go unused, the optimizer is free to delete it. Essentially the word "volatile" tells the optimizer to keep its hands off.

    asm volatile ( "nop" );

You can also use __asm__ and __volatile__ if you think there may be name conflicts with your code.
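
As a minimal sketch of the short delay idea, here is a hypothetical helper (the name spin_delay is mine, not from any library). It uses the __asm__ and __volatile__ spellings, which also build when GCC runs in a strict mode such as -std=c99 where the plain asm keyword is not available.

```c
/* Hypothetical helper: execute n "nop" instructions, return how many ran. */
static inline unsigned spin_delay(unsigned n)
{
    unsigned done = 0;

    while (done < n) {
        /* "nop" is a valid mnemonic on x86, ARM and AArch64 alike. */
        __asm__ __volatile__ ("nop");
        done++;
    }
    return done;
}
```

The loop counter is ordinary C; only the "nop" itself is assembly. How long each pass takes depends entirely on the CPU, so treat this as a crude delay at best.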

The extended case

    asm ( "A" : O : I : C );
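
Where "A" is the assembler template, O is the list of output operands, I is the list of input operands, and C is the clobber list. So there are 4 sections separated by colons. Trailing empty sections may be omitted, but an empty section in the middle must be retained as a placeholder. I will leave the details to the excellent links above.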
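
To make the four sections concrete, here is a small sketch for the x86 with every section filled in. The function name add_u32 is made up for illustration. The "+r" constraint (an operand that is both read and written) and the "cc" clobber are standard GCC features; the #else branch is only there so the sketch still compiles on other CPUs.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b, computed by a single x86 "addl". */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum = a;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"   /* A: assembler template                  */
             : "+r" (sum)    /* O: outputs ("+" means read and write)  */
             : "r" (b)       /* I: inputs                              */
             : "cc");        /* C: clobbers (addl changes the flags)   */
#else
    sum += b;                /* plain C fallback on other CPUs         */
#endif
    return sum;
}
```

Here %0 is "sum" and %1 is "b", counting through the O and I sections in order.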

Two flavors

The best thing about standards is that there are so many to choose from, or so the saying goes. There are two ways of specifying the C to assembler linkage (the old way and the new way). The old way uses numerical values like this:

    asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));	/* ARM assembler */

In this statement, %0 refers to the first operand encountered, not in the "A" section, but in the "O, I, C" list that follows. The main bad thing here is the confusion about numbering. Here %0 corresponds to "result" and %1 corresponds to "value", both of which are variables in the C code.

Also note in passing that in ARM assembler data flows from right to left, whereas in x86 assembler (in the AT&T syntax that GCC uses by default) data flows from left to right.

The new way avoids all the confusion about numbering. Each "connection" is given a name. It looks like this (ARM assembly here):

    asm("mov %[res], %[val], ror #1" : [res] "=r" (result) : [val] "r" (value));

Here the symbols "res" and "val" specify what corresponds to what between the "A" section and the "O, I, C" sections to the right of it. The fancy name for these is "symbolic operand names" and they live in their own isolated name space and have nothing to do with C variable names. Here again, "result" and "value" are variables in the C code.

GCC supports both of these methods so that old code need never die.
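
For comparison on the x86, here is a sketch of the same rotate-right-by-one written both ways. The function names are hypothetical, and since the x86 "rorl" works in place, each version copies the value first. The two functions should behave identically; the #else branches are plain C fallbacks so the sketch compiles on other CPUs.

```c
#include <stdint.h>

/* Old way: operands referred to by position (%0, %1). */
static inline uint32_t ror1_numbered(uint32_t value)
{
    uint32_t result;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"
             "rorl $1, %0"
             : "=r" (result) : "r" (value) : "cc");
#else
    result = (value >> 1) | (value << 31);
#endif
    return result;
}

/* New way: operands referred to by symbolic name (%[res], %[val]). */
static inline uint32_t ror1_named(uint32_t value)
{
    uint32_t result;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %[val], %[res]\n\t"
             "rorl $1, %[res]"
             : [res] "=r" (result) : [val] "r" (value) : "cc");
#else
    result = (value >> 1) | (value << 31);
#endif
    return result;
}
```

Note the source-then-destination operand order in the x86 template, the reverse of the ARM examples.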

Constraint strings

What about these weird "=r" things and such? These are called constraint strings. Many of these are hardware dependent (i.e. different for the x86 and ARM). An "r" indicates a general purpose register, and a leading "=" marks the operand as one the assembly code writes to.
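
Here is a quick sketch, assuming an x86, of how the constraint string changes what the compiler hands to your instruction. The helper names are hypothetical. "+r" asks for the value in a general purpose register, "+m" asks for it as a memory operand, and the "+" means the operand is both read and written.

```c
/* Hypothetical helper: increment a value held in a register. */
static inline int inc_in_register(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("incl %0" : "+r" (x) : : "cc");   /* %0 is a register name */
#else
    x++;                                       /* plain C fallback */
#endif
    return x;
}

/* Hypothetical helper: increment a value directly in memory. */
static inline void inc_in_memory(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("incl %0" : "+m" (*p) : : "cc");  /* %0 is a memory reference */
#else
    (*p)++;                                    /* plain C fallback */
#endif
}
```

The template is the same "incl %0" in both; only the constraint decides whether %0 expands to a register or to an address expression.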

The clobber list

This is a comma separated list of register names (and a few other things besides, such as "cc" for the condition codes and "memory" for memory in general). Two examples of what one looks like:

    "r1", "r4", "cc"
    "esp", "memory"

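As a sketch of why the clobber list matters, here is an x86 fragment that uses ECX as a scratch register behind the compiler's back. The function name is hypothetical. Listing "ecx" as a clobber tells GCC not to keep anything it cares about there, and not to pick it for %0 or %1.

```c
#include <stdint.h>

/* Hypothetical helper: doubles a value using ECX as scratch space. */
static inline uint32_t double_via_ecx(uint32_t in)
{
    uint32_t out;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%ecx\n\t"      /* copy the input into ECX    */
             "addl %%ecx, %%ecx\n\t"   /* ECX = ECX + ECX            */
             "movl %%ecx, %0"          /* copy ECX to the output     */
             : "=r" (out)
             : "r" (in)
             : "ecx", "cc");           /* we trashed ECX and the flags */
#else
    out = in + in;                     /* plain C fallback */
#endif
    return out;
}
```

Leave "ecx" out of the list and the code may still appear to work, right up until the compiler decides to keep something live in that register.
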
Examples

Here are some examples to put some meat on the bones of all this abstract description.

The following transfers the contents of the ESP register (stack pointer) to a C variable "sp".

    asm("movl %%esp, %0\n" :"=r"(sp));

Here is a pair to read and write the x86 "cr8" register, to and from a C variable "val".

    asm volatile("movq %%cr8,%0" : "=r" (val));
    asm volatile("movq %0,%%cr8" :: "r" (val) : "memory");

The following are from arch/x86/include/asm/pci_x86.h. The first supplies the C variable "pos" and reads into "val". The second supplies both "pos" and "val". Notice that the letter "a" indicates the EAX register is being used.

    asm volatile("movl (%1),%%eax" : "=a" (val) : "r" (pos));
    asm volatile("movl %%eax,(%1)" : : "a" (val), "r" (pos) : "memory");
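
The pci_x86.h pair can be wrapped up and tried on ordinary memory, as in the sketch below. The function names load32 and store32 are mine, not the kernel's. One deliberate difference: I added a "memory" clobber to the read as well, on the assumption that the target is plain RAM the compiler may be caching in registers.

```c
#include <stdint.h>

/* Hypothetical helper: read 32 bits from *pos by way of EAX. */
static inline uint32_t load32(const volatile void *pos)
{
    uint32_t val;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("movl (%1), %%eax"
                          : "=a" (val) : "r" (pos) : "memory");
#else
    val = *(const volatile uint32_t *)pos;   /* plain C fallback */
#endif
    return val;
}

/* Hypothetical helper: write 32 bits to *pos by way of EAX. */
static inline void store32(volatile void *pos, uint32_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("movl %%eax, (%1)"
                          : : "a" (val), "r" (pos) : "memory");
#else
    *(volatile uint32_t *)pos = val;         /* plain C fallback */
#endif
}
```

Since "a" pins "val" to EAX, the compiler has to choose some other register for "pos" in the store.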

Using inline C functions to wrap assembly language statements

The Linux kernel does a lot of this. Here is an example from arch/x86/include/asm/processor.h, in a style that might be worth emulating.

You used to see a lot more inline assembly hidden inside macros, but that seems to have fallen out of vogue.

    /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
    static inline void rep_nop(void)
    {
            asm volatile("rep; nop" ::: "memory");
    }
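
Here is a sketch of how such a wrapper might be used in a busy-wait loop. Both names (cpu_pause and spin_wait) are hypothetical, and the spin limit exists only so the example always terminates. The "memory" clobber is what forces the compiler to reload *flag on every pass instead of testing it once.

```c
/* Hypothetical wrapper in the style of rep_nop() above. */
static inline void cpu_pause(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("rep; nop" ::: "memory");
#else
    __asm__ __volatile__ ("" ::: "memory");   /* compiler barrier only */
#endif
}

/*
 * Spin until *flag becomes nonzero or max_spins passes have been made.
 * Returns the number of passes actually spent waiting.
 */
static inline unsigned spin_wait(const int *flag, unsigned max_spins)
{
    unsigned spins = 0;

    while (*flag == 0 && spins < max_spins) {
        cpu_pause();
        spins++;
    }
    return spins;
}
```

In real code the flag would be set by an interrupt handler or another CPU; the caller only sees an ordinary C function.
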

And for the person who endures to the end, here is a bonus: a routine to read the x86 timestamp counter on a 32 bit processor. Compare this to the first example that reads the value of the stack pointer. Here the big difference is the "A" (not to be confused with "a"). This is x86 specific and pertains to 64-bit integer values intended to be returned with the "D" register (EDX) holding the most significant bits and the "A" register (EAX) holding the least significant bits. Just about custom made for the 64 bit TSC, eh?

    static inline unsigned long long read_tsc (void)
    {
        unsigned long long val;

        asm volatile("rdtsc" : "=A" (val) );
        return val;
    }
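
The "=A" trick is tied to 32 bit code. A common alternative, sketched here under the hypothetical name read_tsc64, names the two halves separately with "=a" and "=d" and glues them together in C. That form should work in both 32 bit and 64 bit builds. On a CPU that is not an x86 this sketch just returns 0 as a placeholder.

```c
#include <stdint.h>

/* Hypothetical helper: read the TSC as two 32-bit halves. */
static inline uint64_t read_tsc64(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;

    /* rdtsc leaves the low half in EAX and the high half in EDX. */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;   /* placeholder: no TSC on this CPU */
#endif
}
```

Two back-to-back calls give a rough idea of elapsed cycles, though the exact meaning of a TSC tick varies between processors.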


Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org