The BBB (BeagleBone Black) has a single ARM Cortex-A8 core, which has two floating point facilities: a NEON unit and a VFP unit.
I am going to ignore NEON in this write-up. It is a SIMD floating point device that targets multi-media applications. It supports only single precision floating point, and has no divide.
The VFP is an IEEE-754 compatible (with a few caveats) floating point unit. The "V" stands for "vector", an early feature that has since been deprecated, and now the letter sticks around just to foster confusion.
The Cortex-A8 is not necessarily better than the Cortex-A7. Both implement the ARMv7 instruction-set architecture, and in one respect the Cortex-A7 is ahead: it has an integer divide instruction, while the Cortex-A8 does not.
This is as good a time as any to discuss the muddle of terminology that ARM has produced. A good way to keep your mind straight is to realize that on one hand we have the "architecture", i.e. ARMv7, while on the other hand we have "marketing names" like Cortex-A8, with absolutely no obvious relationship between the labels used in the two worlds.
This is as good a place as any to discuss the issue of documentation. There are two printed books entitled "ARM Architecture Reference Manual". These are fossils, though still useful with some care. The original book, edited by Dave Jaggar (1996), covers up through ARMv4 and is over 20 years old. The second edition, edited by David Seal (2000), covers up through ARMv5 and is also plenty old. After these two books ARM gave up producing printed manuals, the ARM variants proliferated, and you absolutely must find the manual for your particular variant online and study it for the last word on any specific detail. The ARMv7-A reference manual I currently use is 2734 pages.
That being said, the second edition book has a nice section on VFPv1 that provides a good starting point.
    -marm -march=armv7-a -msoft-float

Naturally the "soft-float" option prevents any hardware floating point code from being generated. To find out what target-dependent options are available, you can consult the link above, or do something like this:
    arm-linux-gnu-gcc --target-help

This yields several screens of options. What did the trick for me was to change my options to:
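    -marm -march=armv7-a -mfpu=vfpv4

(Strictly speaking, the Cortex-A8 implements VFPv3, so -mfpu=vfpv3 is the exact match; vfpv4 additionally allows instructions such as the fused multiply-add vfma, which the Cortex-A8 does not have.) This generates floating point instructions, but running the result triggers an undefined instruction exception at the first floating point instruction, namely this: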
    vldr s13, [r3]

So the floating point unit itself needs to be enabled before you can use it.
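Since the difference between these two builds is easy to lose track of, here is a small C sketch (the function name fp_build_mode is my own invention) that reports, at compile time, what the compiler was told it may use. It relies on the predefined macros __arm__ and __ARM_FP; the latter comes from the ARM C Language Extensions, where bit 0x8 means double precision hardware, and as far as I know it is not defined at all under -msoft-float. Note that this only describes code generation: it says nothing about whether the FPU has actually been switched on at run time, which is the problem at hand.

```c
/* Report, at compile time, what floating point the compiler may emit.
 * __arm__ is predefined for 32-bit ARM targets; __ARM_FP (ACLE) is a
 * bitmask of hardware FP support: 0x2 half, 0x4 single, 0x8 double. */
const char *fp_build_mode(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_FP) && (__ARM_FP & 0x8)
    return "VFP single and double precision instructions allowed";
#elif defined(__ARM_FP)
    return "VFP single precision instructions allowed";
#else
    return "soft float: no VFP instructions generated";
#endif
}
```

Printing this string once at startup is a cheap way to confirm that the -mfpu option actually made it into the build.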
The processor comes up with the floating point coprocessors disabled. My old friend David Welch suggests the following code:
    mrc p15, 0, r0, c1, c0, 2
    orr r0, r0, #0x300000    @ single precision
    orr r0, r0, #0xC00000    @ double precision
    mcr p15, 0, r0, c1, c0, 2
    mov r0, #0x40000000
    fmxr fpexc, r0

The manuals recommend an ISB (instruction synchronization barrier) after the "mcr" instruction.
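The "mrc" and "mcr" instructions read and write the "Coprocessor Access Control Register" (CPACR), setting the fields for coprocessors 10 and 11 to enable both single and double precision floating point. Supposedly bad things happen if you enable one and not the other; the ARMv7 manual says the two fields must hold the same value, or the behavior is unpredictable. The final "fmxr" sets the EN bit (bit 30) of the FPEXC register, which turns the unit on.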
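The two orr constants and the fpexc value are easier to check when spelled out field by field. This host-side C sketch (the macro and function names are hypothetical, my own) only computes the values; the register accesses themselves still need the mrc, mcr and fmxr instructions. The field layout follows the ARMv7 manual: cp10 in CPACR bits 21:20, cp11 in bits 23:22, where 0b11 means full access, and EN as bit 30 of FPEXC.

```c
#include <stdint.h>

/* CPACR fields (ARMv7): cp10 is bits [21:20], cp11 is bits [23:22].
 * 0b11 in a field means full access from privileged and user mode. */
#define CPACR_CP10_FULL  (UINT32_C(3) << 20)   /* 0x00300000, "single precision" */
#define CPACR_CP11_FULL  (UINT32_C(3) << 22)   /* 0x00C00000, "double precision" */

/* FPEXC.EN is bit 30; setting it turns the floating point unit on. */
#define FPEXC_EN         (UINT32_C(1) << 30)   /* 0x40000000 */

/* Value to write back to CPACR: the same effect as the two "orr"
 * instructions, granting access to cp10 and cp11 together and leaving
 * every other bit as it was read. */
uint32_t cpacr_with_fp_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_FULL | CPACR_CP11_FULL;
}
```

Keeping both fields in one expression makes it hard to accidentally enable one coprocessor without the other.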
ARM Architecture Reference Manual (ARMv7-A and ARMv7-R edition) (2734 pages)
Cortex-A7 MPCore Technical Reference Manual (268 pages)
Cortex-A7 Floating-Point Unit Technical Reference Manual (25 pages)
Cortex-A7 NEON Media Processing Engine Technical Reference Manual (26 pages)
Cortex-A Series (Version: 2.0) Programmer’s Guide (455 pages)
It turns out that the "w" constraint letter, which indicates one of the double precision registers, also works for single precision values, which is nice. Apparently the compiler is clever enough to take a clue from the type of the variable being translated: if it is a float, it maps to an "s" register, and if it is a double, it maps to a "d" register.
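For a single precision operand, GCC also has the "t" constraint letter, which asks specifically for one of s0-s31. As a sketch (the function name vfp_add_f32 is hypothetical, and the #if guard is my own choice), here is a float add written with "t", falling back to ordinary C when the code is not being built for 32-bit ARM with VFP enabled, so the same file also compiles on a host.

```c
/* Single precision add through inline assembly.
 * "t" = a VFP single precision register (s0-s31) in GCC's ARM constraints. */
float vfp_add_f32(float a, float b)
{
    float r;
#if defined(__arm__) && defined(__ARM_FP)
    __asm__ ("vadd.f32 %0, %1, %2" : "=t" (r) : "t" (a), "t" (b));
#else
    r = a + b;  /* host fallback so the sketch builds anywhere */
#endif
    return r;
}
```

Unlike the square root example, this asm has no effect beyond its output, so I left off volatile and let the compiler move or drop it as it sees fit.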
More about these sorts of things in these excellent guides:
Note that the second link digs a bunch of these inline tricks out of the "constraints.md" file (gcc/config/arm/constraints.md in the gcc source tree). It also comments that much of this is intentionally left undocumented, because the gcc maintainers do not consider it a public interface and may change it at any time.

    extern void fp_enable ( void );   /* defined in start.S */

    static int
    sqrt_i ( int arg )
    {
        float farg = arg;
        float root;

        /* "w": let gcc choose a VFP register for each operand */
        asm volatile ("vsqrt.f32 %0, %1" : "=w" (root) : "w" (farg) );
        return 10000 * root;    /* scaled by 10000, truncated to int */
    }

    void
    arm_float ( void )
    {
        int val;
        int num = 2;

        val = sqrt_i ( num );
        printf ( "Square root of %d is %d\n", num, val );
    }

    void
    my_main ( void )
    {
        fp_enable ();    /* must run before any floating point instruction */
        arm_float ();
    }

When this runs, I get:
    Square root of 2 is 14142

Notice the call above to fp_enable(). This is a bit of assembly language in my start.S file that looks like this (and which you should recognize from above):
    .globl fp_enable
    fp_enable:
        mrc p15, 0, r0, c1, c0, 2
        orr r0, r0, #0x300000    @ single precision
        orr r0, r0, #0xC00000    @ double precision
        mcr p15, 0, r0, c1, c0, 2
        isb
        mov r0, #0x40000000
        fmxr fpexc, r0
        mov pc, lr
There are the usual single and double precision types, but also a 16-bit "half precision" format, which the VFP supports only through conversion instructions, not arithmetic. The "vector" in VFP is a bit of a misnomer, as the vector operations are now deprecated. I guess they expect you to use the NEON unit if that is your game.
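The 16-bit format packs a sign bit, a 5-bit exponent (bias 15) and a 10-bit fraction. As a host-side illustration (this is not VFP code, and the function name half_to_float is my own), the following C sketch decodes such a bit pattern into an ordinary float by hand, covering normal numbers, subnormals, infinities and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 ("half precision") bit pattern to a float.
 * Layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (h >> 15) & 0x1u;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: value is frac * 2^-24, exact in a float. */
        f = (float)frac * (1.0f / 16777216.0f);
        return sign ? -f : f;
    }
    if (exp == 0x1Fu) {
        /* Infinity (frac == 0) or NaN (frac != 0): float exponent all ones. */
        bits = (sign << 31) | (0xFFu << 23) | (frac << 13);
    } else {
        /* Normal: rebias exponent from 15 to 127 (add 112), widen fraction. */
        bits = (sign << 31) | ((exp + 112u) << 23) | (frac << 13);
    }
    memcpy(&f, &bits, sizeof f);   /* reinterpret the 32 bits as a float */
    return f;
}
```

For example, 0x3C00 decodes to 1.0 and 0x7BFF to 65504.0, the largest finite half precision value.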
The VFP unit provides the usual floating point math operations (add, subtract, multiply, divide), and includes square root in hardware.
Different implementations of the ARM VFP can have different numbers of registers. The usual situation is that you get 32 single precision registers (s0-s31), which can also be viewed as 16 double precision registers (d0-d15); implementations with NEON, such as the Cortex-A8, add 16 more double precision registers (d16-d31) that have no single precision counterparts. Each of d0-d15 sits on top of two single precision registers, and in ARMv7 the mapping is spelled out: s(2n) is the low half of d(n) and s(2n+1) is the high half. To do a single or double precision add where s1 = s2 + s3 (or d1 = d2 + d3) you do either:
    vadd.f32 s1, s2, s3
    vadd.f64 d1, d2, d3

Loads and stores from and to memory look like:
    vldr s1, [r3]
    vldr s2, [r3, #4]
    vstr s10, [r4]

The "vmov" instruction can move values between the regular ARM registers and the floating point registers.
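One way to see these instructions without writing assembly is to compile small C functions with the cross compiler and the -mfpu option discussed earlier, adding -O2 -S, and then read the .s file. The functions below (names hypothetical) are a sketch of that. The instruction names in the comments are what I would typically expect, not a guarantee, since register choice and ordering vary. Depending on the -mfloat-abi your toolchain defaults to, you may also see vmov instructions moving arguments and results between the core registers and the VFP registers.

```c
/* Plain C that an ARM cross compiler given a VFP -mfpu option would
 * normally implement with the instructions named in the comments. */

float add_f32(float a, float b)
{
    return a + b;            /* typically vadd.f32 */
}

double add_f64(double a, double b)
{
    return a + b;            /* typically vadd.f64 */
}

/* Load two adjacent floats and store their sum. */
void sum_pair(const float *src, float *dst)
{
    /* typically: vldr [rN], vldr [rN, #4], vadd.f32, vstr [rM] */
    *dst = src[0] + src[1];
}
```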
Tom's Computer Info / tom@mmto.org