December 22, 2016

Multiple cores - Part 1

I began looking into this (without success) a month ago, back on 12-22-2016. Now I am back at it again, with new insights.

Bad documentation

This business is virtually undocumented in the Allwinner H3 datasheet. On top of that, the stock linux kernel does not include the latest H3 code. For that, you need to grab the "Armbian" distribution, which builds a custom kernel that includes the most complete sunxi code for the H3 chip.

Armbian to the rescue

I had gone through the full exercise of building Armbian from scratch sometime around 12-11-2016. To do this I ran VirtualBox, installed Debian (as directed) and ran the Armbian build scripts (an involved set of bash scripts that will only work on a Debian system). These patch the regular U-boot and linux kernel (3.4.112) distributions and build from the patched sources. I worked out how to do the U-boot patching and building under Fedora, but never followed on with the kernel. But since I had the patched kernel sources in the virtual disk image, it was easy enough to transfer that snapshot and study it.

It turns out there is a huge amount of code in the arch/arm/mach-sunxi directory. It was also clear which of the many files were actually involved in the build: just look at which ones have corresponding ".o" files. This is good, and a valuable source of information above and beyond what is in the mainline linux kernel sources.

If you do begin looking at the linux code, there is a lot of nomenclature to be aware of. First of all, the H3 chip is in the "sun8i" family. In particular, it is a sun8iw7p1. So you can ignore stuff for the sun8iw6 and the sun9i and others.

Note that the H5 chip (that I plan to work on someday) is in the sun50i family, and as near as I can tell there is no support at all for this in the code tree I have. It might be supported if a different set of patches were applied and an H5 specific Armbian build was performed. Maybe. Also note that the H5 is a 4-core Cortex-A53 device (a 64 bit ARM). Something for another day.

ARM booting and Bootrom involvement

It turns out that when a new core starts running, it is simply another ARM processor just like any other. It is unaware that it is part of a multi-core system or that it is not the one and only ARM processor in the whole wide world. It has ways of figuring things out (namely reading the processor affinity register), but until it does that, it simply comes out of reset, sets the PC to 0xffff0000 and starts running. This, as it happily turns out, is the start address of the H3 bootrom (and a second core will start there just like the first processor did).

A new core does not get far before reading the processor affinity register and getting its processor ID from the low 2 bits. If these are 0, it is the "main" processor and continues on in the bootrom, ends up doing the SPL process, running U-boot and all that we are familiar with. However if the processor ID is non-zero, it is some other core and the bootrom code does something special. Namely it loads the PC from a special location, address 0x01f01da4. Naturally this must have been set to point to some code we have prepared for it by the routine we are using to start a new core.
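Here is a rough sketch in C of the decision just described. This is my own model, not the bootrom code (which I have not seen in source form). The names core_id_from_mpidr, read_mpidr, and bootrom_next_pc are made up for illustration, and returning 0 to mean "carry on with the normal boot" is just a convention I picked. The only pieces taken from the discussion above are the low 2 bits of the affinity register and the address 0x01f01da4.

```c
#include <stdint.h>

/* Address quoted above: where the bootrom finds a secondary core's entry point. */
#define H3_SECONDARY_ENTRY_REG 0x01f01da4u

/* The core number is in the low 2 bits of the affinity register. */
static inline unsigned core_id_from_mpidr(uint32_t mpidr)
{
    return mpidr & 0x3u;
}

#if defined(__arm__)
/* Read the affinity register (MPIDR, CP15 c0/c0/5) on a 32-bit ARM core.
 * Only compiled when building for ARM; it must run in a privileged mode. */
static inline uint32_t read_mpidr(void)
{
    uint32_t val;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 5" : "=r" (val));
    return val;
}
#endif

/* Hypothetical model of the bootrom's choice.
 * Returns 0 (my convention) when the core should continue the normal
 * boot path, otherwise the address it should load into the PC. */
static inline uint32_t bootrom_next_pc(uint32_t mpidr, uint32_t entry_reg_value)
{
    if (core_id_from_mpidr(mpidr) == 0)
        return 0;               /* core 0: SPL, U-boot, and so on */
    return entry_reg_value;     /* other cores: jump where we were told */
}
```

On real hardware the second argument would be whatever was last written to H3_SECONDARY_ENTRY_REG. Passing it in as a plain value lets the logic be tried out on a desktop machine.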

Note that this is not a location in on-chip SRAM; it sits out among the peripheral registers. It does boil down to a 4 byte piece of static storage, if you want to look at it that way.

Detecting a new core running

There are any number of ways for a freshly booted processor to get into trouble, so we would like to contrive the simplest possible test to have it announce that it is up and running. My first idea was to just set aside a memory location, set this location to some non-zero value, and have the new core clear the location when it starts up. Processor zero can just poll this location, watching to see if it goes to zero. This all sounds simple enough --- until you consider the issue of caching.

The issue of caching

The simple strategy of using a "sentinel" like this fails because of caching. What happens is that the fully initialized processor that is trying to start a new core has the data cache enabled. It writes some non-zero value to the sentinel address, but that value never makes it beyond the cache. The new core will almost certainly come up with the data cache disabled, but this doesn't help us any.
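To make the failure concrete, here is a toy model in C. It is only an illustration of the reasoning, assuming a single write-back cache line that belongs to core 0 and nothing else. Real hardware is more complicated (lines get evicted, and there is coherency hardware that may or may not be switched on), and the struct and function names are all made up.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model: one word of SDRAM plus core 0's private cached copy of it. */
struct toy_system {
    uint32_t sdram_word;   /* what actually sits in memory */
    uint32_t cached_word;  /* core 0's cached copy */
    bool     line_dirty;   /* cached copy has not been written back */
};

/* Core 0 writes with its data cache enabled: the value lands in the cache only. */
static inline void core0_write(struct toy_system *s, uint32_t v)
{
    s->cached_word = v;
    s->line_dirty = true;
}

/* Core 0 reads: served from the cache while the line is dirty. */
static inline uint32_t core0_read(const struct toy_system *s)
{
    return s->line_dirty ? s->cached_word : s->sdram_word;
}

/* The new core runs with its data cache off, so it talks to SDRAM directly. */
static inline uint32_t new_core_read(const struct toy_system *s)
{
    return s->sdram_word;
}

static inline void new_core_write(struct toy_system *s, uint32_t v)
{
    s->sdram_word = v;
}

/* Cleaning the line pushes the cached value out to SDRAM.
 * This is the step the simple sentinel scheme leaves out. */
static inline void core0_clean(struct toy_system *s)
{
    if (s->line_dirty) {
        s->sdram_word = s->cached_word;
        s->line_dirty = false;
    }
}
```

In this model, after core0_write(&s, 1) the new core still reads 0 from SDRAM, and after new_core_write(&s, 0) core 0 still reads 1 from its cache. Each side is looking at its own copy, so the polling loop would never see the sentinel change.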

We could write some code to flush and invalidate cache lines to solve all this, but we would like a simpler solution just to do some preliminary testing.

The thing to do is to use some memory location that is not cached. In other words, a memory location that is not in SDRAM. One possibility is to use any part of the on-chip SRAM (which is simply gathering dust once the system is booted up).

Another possibility is to just zero the value in the magic location that holds the jump address, namely 0x01f01da4. This works out just fine as it turns out.

After having success with this, we try just some "random" address in on-chip SRAM (namely 0x4). This also works just fine.
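The handshake can be sketched in C something like this. It is a minimal sketch and not the code I actually ran: wait_for_core, kick_fn, and the two fake_core helpers are names invented for illustration, and the step that actually releases the new core from reset is left as a hook because it is not covered here. The two addresses are the ones mentioned above.

```c
#include <stdint.h>
#include <stdbool.h>

#define H3_SECONDARY_ENTRY_REG 0x01f01da4u  /* the magic jump-address location */
#define H3_SRAM_TEST_ADDR      0x00000004u  /* the "random" on-chip SRAM address */

/* Hypothetical hook: whatever it takes to get the new core running. */
typedef void (*kick_fn)(void *ctx);

/* Write a non-zero marker to an uncached location, start the new core,
 * then poll until the new core zeros the location or we give up.
 * Returns true if the core announced itself within max_polls reads. */
static inline bool wait_for_core(volatile uint32_t *sentinel, uint32_t marker,
                                 kick_fn kick, void *ctx, unsigned long max_polls)
{
    *sentinel = marker;         /* must be non-zero */
    kick(ctx);
    while (max_polls--) {
        if (*sentinel == 0)
            return true;        /* the new core cleared it */
    }
    return false;               /* never heard from it */
}

/* Stand-ins for trying the logic on a desktop machine (not real hardware). */
static inline void fake_core_clears(void *ctx)
{
    *(volatile uint32_t *)ctx = 0;  /* pretend the new core ran and cleared it */
}

static inline void fake_core_never_starts(void *ctx)
{
    (void)ctx;                      /* pretend nothing happened */
}
```

On the board, sentinel would be (volatile uint32_t *)H3_SECONDARY_ENTRY_REG with the entry address itself as the marker, or (volatile uint32_t *)H3_SRAM_TEST_ADDR with any non-zero value. In the second case the entry address still has to be written to the magic location separately. The poll limit is there so that processor zero does not hang forever if the new core goes off into the weeds.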


Have any comments? Questions? Drop me a line!

Tom's electronics pages / tom@mmto.org