January 27, 2023

Kyu networking -- Allwinner H3 - ARM v7 D cache notes

You might think this would be simple -- but you would be wrong. You have both the L1 and L2 cache to worry about, along with the MMU. It turns out that it is not possible to enable the D cache without first enabling the MMU. Why? For one thing, you won't be at all happy if accesses to IO registers are being cached. So you need to set up the MMU to indicate which regions are cacheable and which are not. The MMU handles a large part of the cache configuration, with bits in the translation table entries determining what cache strategies (things like write-back versus write-through, and whether to write-allocate) should be used.
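To make the "cacheable versus not" idea concrete, here is a minimal sketch of building a flat (virtual = physical) first-level table out of 1 MB sections in the ARMv7 short-descriptor format. The assumptions are mine, not something the manual hands you: TEX remap is off (SCTLR.TRE = 0), domain 0 is set to "client" in the DACR, DRAM starts at 0x40000000 with 1 GB fitted, and everything else is treated as device memory. The names (section_entry, build_flat_table, DRAM_BASE, DRAM_SIZE) are made up for illustration.

```c
#include <stdint.h>

/* Short-descriptor first-level "section" entry bits (1 MB per entry). */
#define SECT_TYPE    0x2u              /* bits [1:0] = 0b10 marks a section */
#define SECT_B       (1u << 2)         /* bufferable */
#define SECT_C       (1u << 3)         /* cacheable */
#define SECT_XN      (1u << 4)         /* execute never */
#define SECT_AP_RW   (3u << 10)        /* AP[1:0] = 0b11, AP[2] = 0: full access */
#define SECT_TEX(x)  ((uint32_t)(x) << 12)

/* With TEX remap disabled:
 *   TEX=000 C=0 B=1  -> shareable Device memory (never cached)
 *   TEX=001 C=1 B=1  -> Normal memory, write-back, write-allocate
 */
#define SECT_DEVICE  (SECT_TYPE | SECT_AP_RW | SECT_XN | SECT_B)
#define SECT_NORMAL  (SECT_TYPE | SECT_AP_RW | SECT_TEX(1) | SECT_C | SECT_B)

/* Assumed memory map: adjust for the board in hand. */
#define DRAM_BASE    0x40000000u
#define DRAM_SIZE    0x40000000u       /* assumption: 1 GB fitted */

/* Descriptor for the megabyte numbered mb_index (0..4095), mapped flat. */
uint32_t section_entry(uint32_t mb_index)
{
    uint32_t base = mb_index << 20;
    int is_dram = base >= DRAM_BASE && (base - DRAM_BASE) < DRAM_SIZE;

    return base | (is_dram ? SECT_NORMAL : SECT_DEVICE);
}

/* Fill all 4096 entries. The real table must be 16 KB aligned. */
void build_flat_table(uint32_t table[4096])
{
    for (uint32_t i = 0; i < 4096; i++)
        table[i] = section_entry(i);
}
```

This leaves out the S (shareable) bit on the normal memory entries, which will probably matter once more than one core is running, and it says nothing about actually loading TTBR0 and turning on the MMU and caches in the SCTLR.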

Vocabulary

A glossary of sorts is needed. What are PoU and PoC? Exactly what do inner and outer mean? What are sets and ways? What does the "M" in MVA mean? And we are barely getting warmed up.

inner and outer -- I have spent a fair bit of time puzzling over exactly what these mean. The best I have been able to do so far is to say that "inner" means the L1 cache and "outer" means the L2. This is probably what it means in a specific context (and hopefully mine). There is probably a reason why ARM is being vague and using these terms. Whatever the case, I need to know what this means in the system I am now working with and that is very hard to pin down. For now I am going with L1 and L2 and if there is trouble down the road, I can hardly be blamed.

PoU and PoC are also tricky. The textbook definitions at least let you know what the letters stand for. PoC is "point of coherency" and PoU is "point of unification".
PoC is typically main external memory and is the point at which all "observers" (cores and DMA) are guaranteed to see the same copy of memory.
PoU is main memory if there is no external cache. This is the point where the I and D caches and translation table walks are guaranteed to see the same copy of memory.
So there you have it, for whatever use you can make of it.
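Where PoC is likely to matter for a network driver is DMA: before the hardware reads a transmit buffer, the dirty lines covering it need to be cleaned out to the PoC. Here is a sketch of the address arithmetic only, assuming a 64 byte line. The per-line operation is passed in as a function pointer so the loop can be tried on any machine. On the real target it would be a one-instruction wrapper around DCCMVAC (mcr p15, 0, Rt, c7, c10, 1) followed by a dsb once the loop is done. The names cache_range_op, count_line and lines_touched are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* One cache maintenance operation applied to the line holding addr. */
typedef void (*line_op_t)(uintptr_t addr);

/* Stand-in operation for trying this on a host: just counts calls. */
unsigned long lines_touched;

void count_line(uintptr_t addr)
{
    (void)addr;
    lines_touched++;
}

/* Apply op once to every cache line overlapping [start, start + len).
 * line is the line size in bytes and must be a power of two.
 */
void cache_range_op(uintptr_t start, size_t len, size_t line, line_op_t op)
{
    if (len == 0)
        return;

    uintptr_t end = start + len;
    uintptr_t addr = start & ~(uintptr_t)(line - 1);   /* round down to a line */

    for (; addr < end; addr += line)
        op(addr);
}
```

The same loop would serve for invalidating a receive buffer (DCIMVAC), but then a buffer that does not start and end on line boundaries is trouble, since invalidating a shared line throws away whatever dirty data its neighbor had there.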

MVA is easier. It stands for "modified virtual address". The bottom line is that this pertains to a feature I won't use and can just ignore. At some point ARM provided a feature they called "FCSE" (Fast Context Switch Extension). The idea was that cache data was tagged with some kind of context ID. This allowed data to remain in the cache across context switches (sacrificing some of the upper address bits to provide room for the context ID). Without FCSE, the MVA is simply the virtual address.

sets and ways -- ARM caches are always set associative (not direct mapped). The cache is divided into equal sized pieces called "ways".
L1 caches are often 2 or 4 way.
L2 caches commonly have 16 ways.
As I understand this, some of the address bits select a set, the line for that address can live in any one of the ways within that set, and the cache hardware checks all of the ways at once. Do I need to know about all this to write software? Probably not, but we shall see.
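One place where sets and ways do leak into software is the "by set/way" maintenance operations (DCISW, DCCSW, DCCISW), which are what you use to invalidate or flush an entire cache. The geometry comes from the CCSIDR register. Here is a sketch of decoding it and of packing the set/way operand, written as plain C so the arithmetic can be checked off-target. The sample value 0x700FE01A is what I would expect for a 32 KB, 4 way L1 D cache with 64 byte lines, which is an assumption about the Cortex-A7 in the H3 and not something I have read back from hardware. The function and struct names are made up.

```c
#include <stdint.h>

struct cache_geom {
    unsigned line_bytes;    /* bytes per cache line */
    unsigned ways;          /* associativity */
    unsigned sets;          /* number of sets */
};

/* Decode the ARMv7 CCSIDR fields:
 *   [2:0]   LineSize       = log2(words per line) - 2
 *   [12:3]  Associativity  = ways - 1
 *   [27:13] NumSets        = sets - 1
 */
struct cache_geom ccsidr_decode(uint32_t ccsidr)
{
    struct cache_geom g;

    g.line_bytes = 1u << ((ccsidr & 0x7) + 4);      /* 4 bytes per word */
    g.ways       = ((ccsidr >> 3) & 0x3ff) + 1;
    g.sets       = ((ccsidr >> 13) & 0x7fff) + 1;
    return g;
}

/* Smallest b such that 2^b >= n. */
static unsigned log2_ceil(unsigned n)
{
    unsigned b = 0;

    while ((1u << b) < n)
        b++;
    return b;
}

/* Pack the register operand for a set/way operation.
 * level is 0 for L1, 1 for L2. The way number sits in the top bits,
 * the set number starts at bit log2(line_bytes), the level at bit 1.
 */
uint32_t setway_operand(const struct cache_geom *g,
                        unsigned level, unsigned way, unsigned set)
{
    unsigned a = log2_ceil(g->ways);
    unsigned l = log2_ceil(g->line_bytes);
    uint32_t v = ((uint32_t)set << l) | ((uint32_t)level << 1);

    if (a > 0)                  /* a direct-mapped cache has no way field */
        v |= (uint32_t)way << (32 - a);
    return v;
}
```

A whole-cache flush is then two nested loops over ways and sets, issuing one DCCISW (mcr p15, 0, Rt, c7, c14, 2) per operand. On the target, the level is selected by writing CSSELR before reading CCSIDR.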

Bite the bullet - read the manual

Some will say, "why, wasn't that the very first thing you did?". Not when the ARM Architecture Reference manual for ARMv7-A (and -R) is a 2734 page monster. The manual is in 4 parts (A, B, C, and D). Part A is "application level architecture" and is the stuff a "normal" programmer wants to know about. Part A is the first 1130 pages. Part B is "system level architecture" and continues to page 2016. Part C is "debug architecture" and Part D is the appendix.

Chapters are B1, B2, and so forth in Part B. Page numbers are of the form B9-2016, which is nice. You get the part, chapter, and the page in the whole document.

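VMSA and PMSA -- VMSA (virtual memory system architecture) is the MMU-based scheme used by the -A "profile" parts. The PMSA (protected memory system architecture) is something I can ignore. It pertains to the -R "profile" parts. VFP is vector floating point, which I ignore along with SIMD.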

Jazelle is a processor mode that allows direct execution of Java byte codes (not something I am in any way concerned with doing).

Chapter B2 talks about memory attributes and references page A3-126. Usually there are 4 choices, and they are described as "hints".

Section B2.2 talks about caches and branch predictors. Page A3-155 also talks about caches. Branch predictors are a special sort of cache, and they may need "maintenance" (usually the invalidating of out of date entries when instruction memory is changed).

Up to 7 levels of cache are supported and software may read various registers to determine the cache configuration.
(see page B2-1266)
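The register that describes the levels is CLIDR (read with mrc p15, 1, Rt, c0, c0, 1). It has a 3 bit "Ctype" field for each of the 7 possible levels, plus fields giving the level of coherency (LoC) and level of unification (LoUU, LoUIS), which tie back to PoC and PoU. Here is a sketch of pulling those fields apart, again as plain C that does not need the target. The sample value 0x0A200023 is what I believe a Cortex-A7 with an L2 reports, so treat it as an assumption until it has been read from the actual chip. The function names are hypothetical.

```c
#include <stdint.h>

/* Values of a CLIDR Ctype field. */
enum {
    CTYPE_NONE    = 0,      /* no cache at this level */
    CTYPE_I_ONLY  = 1,      /* instruction cache only */
    CTYPE_D_ONLY  = 2,      /* data cache only */
    CTYPE_SPLIT   = 3,      /* separate I and D caches */
    CTYPE_UNIFIED = 4       /* unified cache */
};

/* Cache type at a level, with level counted 1..7 as in the manual. */
unsigned clidr_ctype(uint32_t clidr, unsigned level)
{
    return (clidr >> (3 * (level - 1))) & 0x7;
}

/* Level of unification, inner shareable: bits [23:21]. */
unsigned clidr_louis(uint32_t clidr) { return (clidr >> 21) & 0x7; }

/* Level of coherency: bits [26:24]. */
unsigned clidr_loc(uint32_t clidr)   { return (clidr >> 24) & 0x7; }

/* Level of unification, uniprocessor: bits [29:27]. */
unsigned clidr_louu(uint32_t clidr)  { return (clidr >> 27) & 0x7; }

/* Number of cache levels actually present: stop at the first empty one. */
unsigned clidr_levels(uint32_t clidr)
{
    unsigned n = 0;

    while (n < 7 && clidr_ctype(clidr, n + 1) != CTYPE_NONE)
        n++;
    return n;
}
```

For the sample value this gives separate I and D caches at level 1, a unified cache at level 2, LoUU = 1 and LoC = 2. As I read it, that means cleaning L1 is enough to reach the PoU, and both levels have to be cleaned to reach the PoC.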

I enjoyed this statement:
The exact form of any required initialization routine is IMPLEMENTATION DEFINED, and the routine must be documented clearly as part of the documentation of the device.
And here I am reading this manual trying to figure out exactly what they tell me I won't be able to figure out!


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org