December 22, 2016

Multiple cores - Part 1

This is entirely new ground to me. Much to my surprise, the multiple core business falls outside of the ARM specification. An ARM core is an ARM core and each is equal to any other from the view of the ARM itself. The business of launching cores other than core 0 is handled by hardware (and registers) that are part of the SoC that contains the cores and is unique to each SoC fabricator.

The Allwinner H3 chip on the Orange Pi PC has four cores. There are hints in the datasheet, but as it turns out there is not sufficient information there to know how to get things done. I have learned the basic game by looking at the linux kernel code.

Some possibly useful links follow.

The first link got me started looking at the linux sources. (As it turns out this was somewhat misleading, see Multiple Cores - part 2).

I have the linux 4.2.1 kernel sources on my machine and looked at those. This quickly led me to the file arch/arm/mach-sunxi/platsmp.c and the function sun8i_boot_secondary(). The process for starting a core accesses registers in the CPUCFG and PRCM sections of the H3 chip. The base addresses for these are:

CPUCFG_BASE = 0x01f01c00
PRCM_BASE   = 0x01f01400
(These base addresses are specified to the linux kernel in arch/arm/boot/dts/sun8i-a23-a33.dtsi.) The relevant register offsets within these two sections are:
#define CPUCFG_CPU_RST_CTRL_REG(cpu)            (((cpu) + 1) * 0x40)
#define CPUCFG_GEN_CTRL_REG                     0x184
#define CPUCFG_PRIVATE0_REG                     0x1a4

#define PRCM_CPU_PWROFF_REG                     0x100
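
As a sanity check on the numbers above, here is a minimal sketch that combines the base addresses with the offsets to get absolute register addresses. The helper names cpucfg_addr() and prcm_addr() are my own invention for illustration and do not appear in the kernel sources.

```c
#include <stdint.h>

/* Base addresses and offsets as quoted above. */
#define CPUCFG_BASE                   0x01f01c00u
#define PRCM_BASE                     0x01f01400u

#define CPUCFG_CPU_RST_CTRL_REG(cpu)  (((cpu) + 1) * 0x40)
#define CPUCFG_GEN_CTRL_REG           0x184
#define CPUCFG_PRIVATE0_REG           0x1a4
#define PRCM_CPU_PWROFF_REG           0x100

/* Hypothetical helpers (not kernel code): absolute address of a register. */
static inline uint32_t cpucfg_addr(uint32_t offset)
{
    return CPUCFG_BASE + offset;
}

static inline uint32_t prcm_addr(uint32_t offset)
{
    return PRCM_BASE + offset;
}
```

With these, cpucfg_addr(CPUCFG_PRIVATE0_REG) works out to 0x01f01da4, and the reset control registers for cores 0 through 3 land at 0x01f01c40, 0x01f01c80, 0x01f01cc0 and 0x01f01d00.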

The "recipe" looks to be the following:

1 - Acquire the CPU spin lock.

2 - Write the address of the code you want the processor to run into the PRIVATE0_REG. (This register is undocumented in the datasheet.)

3 - Put the processor into reset by writing 0 to the appropriate processor reset control register. There are 4 of these, one per core. Each has two reset bits, so you write a 3 to de-assert both the "core" and "power on" resets when the time comes.

4 - Reset the L1 cache for that processor by clearing a bit in the general control register. There are other bits, including bits to reset the L2 cache that you don't want to mess with. Do something like this:

GEN_CTRL_REG &= ~BIT(cpu);

5 - Power on the core in question by clearing the power suppression bit. This should look familiar:

PRCM_CPU_PWROFF_REG &= ~BIT(cpu);

6 - De-assert reset.

7 - Release the spin lock.
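
The recipe can be sketched in C roughly as follows. This is only my reading of the steps, not the kernel code itself. The function name start_secondary_core() and the reg_read()/reg_write() helpers are made up for illustration, the register blocks are passed in as pointers (so the routine can be tried against plain arrays as well as real hardware), and the spin lock and any settling delay are left to the caller.

```c
#include <stdint.h>

#define CPUCFG_CPU_RST_CTRL_REG(cpu)  (((cpu) + 1) * 0x40)
#define CPUCFG_GEN_CTRL_REG           0x184
#define CPUCFG_PRIVATE0_REG           0x1a4
#define PRCM_CPU_PWROFF_REG           0x100

/* Hypothetical helpers: 32-bit access at a byte offset from a block base. */
static inline uint32_t reg_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4];
}

static inline void reg_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    base[off / 4] = val;
}

/*
 * Steps 2 through 6 of the recipe for one core (1-3).
 * "entry" is the address the new core should jump to.
 * Locking (steps 1 and 7) is assumed to be done by the caller.
 */
static void start_secondary_core(volatile uint32_t *cpucfg,
                                 volatile uint32_t *prcm,
                                 unsigned cpu, uint32_t entry)
{
    uint32_t bit = 1u << cpu;

    /* 2 - tell the boot rom where to send the new core */
    reg_write(cpucfg, CPUCFG_PRIVATE0_REG, entry);

    /* 3 - assert both resets for this core */
    reg_write(cpucfg, CPUCFG_CPU_RST_CTRL_REG(cpu), 0);

    /* 4 - put this core's L1 cache in reset, leaving other bits alone */
    reg_write(cpucfg, CPUCFG_GEN_CTRL_REG,
              reg_read(cpucfg, CPUCFG_GEN_CTRL_REG) & ~bit);

    /* 5 - clear the power-off bit for this core */
    reg_write(prcm, PRCM_CPU_PWROFF_REG,
              reg_read(prcm, PRCM_CPU_PWROFF_REG) & ~bit);

    /* 6 - de-assert both the "core" and "power on" resets */
    reg_write(cpucfg, CPUCFG_CPU_RST_CTRL_REG(cpu), 3);
}
```

On real hardware the two pointers would be the CPUCFG and PRCM base addresses cast to volatile uint32_t pointers. Passing them in as parameters is just a convenience that makes the sequence easy to check without the board.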

What about the start address?

I had expected that an ARM core would come up with the PC set to either 0 or 0xffff0000. Coming up at 0 would mean running in SRAM, and coming up at 0xffff0000 would mean running in the boot rom. It turns out that it comes up with the PC set to 0xffff0000, the usual entry point into the boot rom (BROM). An inspection of the disassembled boot rom code shows that the ROM does indeed fetch an address from PRIVATE0 (0x01F01DA4) and jump to it. Linux code loads PRIVATE0 with the address of "secondary_startup", which is a routine in arch/arm/kernel/head.S. This is tricky business, but not as bad as it sounds. Clocks and memory are already set up, so the main thing to do is to set up the MMU in the new processor and give it a stack.

Looking at the boot rom code, the first question that arises is "what does the following instruction do?"

mrc     15, 0, r0, cr0, cr0, {5}
"MRC" is the "move to ARM register from coprocessor" instruction. It is moving from coprocessor 15 (the system control coprocessor). Opcode 1 is 0 and it moves to register "r0". Coprocessor register 0 is mentioned twice (once as CRn and once as CRm), and opcode 2 is 5.

Note that this instruction arises in the BE8 macro that is part of the linux assembly language startup when a new core starts running. This is reading the multiprocessor affinity register.
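
To double check the disassembler, here is a small sketch that pulls the fields out of the raw instruction word (0xee100fb0 in the boot rom listing). It assumes the standard ARM (A32) encoding for MRC. The struct and function names are my own.

```c
#include <stdint.h>

/* Fields of an ARM (A32) MRC/MCR instruction word. */
struct mrc_fields {
    unsigned opc1;    /* bits 23:21 */
    unsigned crn;     /* bits 19:16 */
    unsigned rt;      /* bits 15:12, the ARM register */
    unsigned coproc;  /* bits 11:8  */
    unsigned opc2;    /* bits 7:5   */
    unsigned crm;     /* bits 3:0   */
};

/* Hypothetical helper: split an instruction word into its fields. */
static struct mrc_fields decode_mrc(uint32_t insn)
{
    struct mrc_fields f;

    f.opc1   = (insn >> 21) & 0x7;
    f.crn    = (insn >> 16) & 0xf;
    f.rt     = (insn >> 12) & 0xf;
    f.coproc = (insn >> 8) & 0xf;
    f.opc2   = (insn >> 5) & 0x7;
    f.crm    = insn & 0xf;
    return f;
}
```

Feeding it 0xee100fb0 gives coprocessor 15, opcode 1 of 0, register r0, CRn and CRm both 0, and opcode 2 of 5, which agrees with the disassembly.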

The Allwinner H3 has a Cortex-A7 processor, which implements the ARMv7-A architecture (not the ARMv7-M). The Cortex-A8 in the BBB also implements the ARMv7-A by the way. Figuring out just what register this is from the 2736 page manual is hell on wheels, but a search for "arm system control register list" led to this register summary, which reveals that this reads the MPIDR (multiprocessor affinity register). The lowest 2 bits tell you what processor (0-3) you are running on. The relevant code in the boot rom is:
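
For reference, the same thing can be sketched in C along these lines. The function names are mine, and the inline assembly is GCC style, guarded so that it only gets compiled for 32-bit ARM targets. The value passed to mpidr_core() in any experiment on a desktop machine would just be a made up number.

```c
#include <stdint.h>

#if defined(__arm__)
/* Read the MPIDR: same instruction as in the boot rom (GCC inline asm). */
static inline uint32_t read_mpidr(void)
{
    uint32_t val;

    __asm__ volatile("mrc p15, 0, %0, c0, c0, 5" : "=r"(val));
    return val;
}
#endif

/* Hypothetical helper: the lowest 2 bits give the core number (0-3). */
static inline unsigned mpidr_core(uint32_t mpidr)
{
    return mpidr & 3;
}
```

On the board, mpidr_core(read_mpidr()) should tell you which core is executing the call.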

ffff2c44:       ee100fb0        mrc     15, 0, r0, cr0, cr0, {5}
ffff2c48:       e2001003        and     r1, r0, #3
ffff2c4c:       e3510000        cmp     r1, #0
ffff2c50:       1afffff9        bne     0xffff2c3c

ffff2c3c:       e59f01c8        ldr     r0, [pc, #456]  ; 0xffff2e0c (PRIVATE0)
ffff2c40:       e590f000        ldr     pc, [r0]
So this is simple and clear. If we are not core 0, we jump to the address stored in "PRIVATE0". There is more code, which is interesting but has little to do with the current topic. Here it is along with what follows:
ffff2c44:       ee100fb0        mrc     15, 0, r0, cr0, cr0, {5}
ffff2c48:       e2001003        and     r1, r0, #3
ffff2c4c:       e3510000        cmp     r1, #0
ffff2c50:       1afffff9        bne     0xffff2c3c	; start non-zero core

; We must be core 0
; This reads the "cluster ID" pin value, which could be
; non-zero in some multiprocessor system with discrete processors.
ffff2c54:       e2001cff        and     r1, r0, #65280  ; 0xff00
ffff2c58:       e3510000        cmp     r1, #0
ffff2c5c:       1afffff6        bne     0xffff2c3c

; Now we read PRIVATE1 and compare it to a "magic value"
; If it matches, we branch to PRIVATE0
ffff2c60:       e59f11a8        ldr     r1, [pc, #424]  ; 0xffff2e10    (PRIVATE1)
ffff2c64:       e59f21a8        ldr     r2, [pc, #424]  ; 0xffff2e14    (MAGIC)
ffff2c68:       e5913000        ldr     r3, [r1]
ffff2c6c:       e1520003        cmp     r2, r3
ffff2c70:       1a000000        bne     0xffff2c78
ffff2c74:       eafffff0        b       0xffff2c3c      ; start as if non-zero core

ffff2e0c:       01f01da4        ; address of PRIVATE0
ffff2e10:       01f01dac        ; another address (PRIVATE1)
ffff2e14:       fa50392f        ; magic number
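
My reading of this chunk of the boot rom can be modeled in C as below. This is a sketch of the decision logic only, based on the listing above. The function name is made up, and what happens on the "normal boot" path at 0xffff2c78 is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

#define BROM_MAGIC  0xfa50392fu   /* constant at 0xffff2e14 in the listing */

/*
 * Hypothetical model of the boot rom test: returns true when the rom
 * would load the PC from PRIVATE0, false when it carries on at 0xffff2c78.
 * "private1" is the value read from the PRIVATE1 register.
 */
static bool brom_jumps_via_private0(uint32_t mpidr, uint32_t private1)
{
    if ((mpidr & 0x3) != 0)       /* not core 0 */
        return true;
    if ((mpidr & 0xff00) != 0)    /* non-zero cluster ID */
        return true;
    return private1 == BROM_MAGIC; /* core 0, but the magic value is set */
}
```

So there are three ways to end up jumping through PRIVATE0: being a core other than 0, being in a non-zero cluster, or being core 0 with the magic value sitting in PRIVATE1.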

What the heck is BE8 mode?

This is a long story that ought to be short. ARM processors have the ability to switch their endian behavior. Most people use them as little endian machines (like x86 and I guess VAX), but you could use them big endian. You could. I don't know why you would want to, but some people do, and you can build the linux kernel to support it.

Older ARM processors handled the big endian business in a way that caused problems that some folks didn't like. So, ARM fixed this and came out with a new way of doing things. The old way is called BE32 and the new way is called BE8. BE8 arrived with armv6, and armv7 dropped the old way entirely (so as of armv7, you get BE8).

Do we care? Probably not, but you trip over this stuff in the linux kernel. In general you want to flip the bit that puts the processor in this mode as early as possible after reset if you are crazy enough to want to go down this road. I don't understand the interest, but there are people putting a lot of effort into this.

Unaligned memory accesses

We are wandering farther and farther afield, but it is worth making note of things as we trip across them. The NetBSD article above makes the claim: "ARM CPUs post version 5 can do unaligned data access". It turns out this is indeed true! It involves clearing a bit in the system control register (the "A" alignment check bit, bit 1 of the SCTLR). This bit should be clear coming out of reset, but for one reason or another U-boot seems to set it, which is unfortunate.
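
A minimal sketch of clearing that bit follows. It assumes the bit in question is the "A" (alignment check) bit, bit 1 of the ARMv7 system control register (SCTLR), which is my identification and not something spelled out above. The function names are mine, and the inline assembly is GCC style, only compiled for ARMv7 targets, and has to run in a privileged mode.

```c
#include <stdint.h>

#define SCTLR_A  (1u << 1)   /* alignment check enable (assumed to be the bit in question) */

/* Hypothetical helper: SCTLR value with alignment checking turned off. */
static inline uint32_t sctlr_allow_unaligned(uint32_t sctlr)
{
    return sctlr & ~SCTLR_A;
}

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* Read-modify-write the SCTLR (privileged mode only). */
static inline void allow_unaligned_access(void)
{
    uint32_t val;

    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(val));
    val = sctlr_allow_unaligned(val);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(val) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

Only the one bit is touched, so whatever else U-boot has set up in the register is left alone.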


Have any comments? Questions? Drop me a line!

Tom's electronics pages / tom@mmto.org