January 29, 2023

Kyu networking -- Allwinner H3 - ARM v7 D cache notes, part 2

You can only read manuals for so long before you have to write some code and see if you really understand what you are reading. So I wrote up a bit of code to read the CCSIDR register. Since there are several caches, a person wonders just which cache this is telling him about. You have to select one first by writing the CSSELR register. After writing a short routine to turn the contents of the CCSIDR register into something a person can understand at a quick glance, here is what we get:
CLIDR = 0a200023
CCSIDR, L1-D = 700fe01a
 supports Write back
 supports Read allocate
 supports Write allocate
 128 sets as 4 way
 line size = 16 words (64 bytes)
CCSIDR, L1-I = 203fe009
 supports Read allocate
 512 sets as 2 way
 line size = 8 words (32 bytes)
CCSIDR, L2-D = 707fe03a
 supports Write back
 supports Read allocate
 supports Write allocate
 1024 sets as 8 way
 line size = 16 words (64 bytes)
CCSIDR, L2-I = 707fe03a
 supports Write back
 supports Read allocate
 supports Write allocate
 1024 sets as 8 way
 line size = 16 words (64 bytes)
Note that we get the same result for L2 whether we ask for I or D, which makes sense I suppose given that it is a unified cache. Also notice:
for L1 D:  128 sets * 4 way * 64 byte lines = 32768 bytes
for L1 I:  512 sets * 2 way * 32 byte lines = 32768 bytes
for L2:   1024 sets * 8 way * 64 byte lines = 524288 bytes
This is all as advertised. Code I had written before got a 64 byte line size, but that could be called coincidence or luck given that I never set the CSSELR register.
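For anyone who wants to do the same, here is a minimal sketch in C of that kind of decoding routine. It is not the code that produced the dump above, and all the names are made up. It works on raw register values, so it can be tried on any host. On the board the value would come from mrc p15, 1, rX, c0, c0, 0 (CCSIDR) after writing the selection with mcr p15, 2, rX, c0, c0, 0 (CSSELR), with an isb in between. CSSELR takes the cache level minus 1 in bits 3:1, plus a 1 in bit 0 to ask about the instruction cache.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded view of an ARMv7 CCSIDR value. */
struct ccsidr_info {
    int write_through, write_back, read_alloc, write_alloc;
    unsigned sets;        /* NumSets field [27:13] + 1 */
    unsigned ways;        /* Associativity field [12:3] + 1 */
    unsigned line_bytes;  /* derived from LineSize field [2:0] */
};

/* CSSELR value that selects a cache: level is 1-based (1 = L1),
 * icache sets the InD bit (bit 0) to pick the instruction cache. */
uint32_t csselr_value(unsigned level, int icache)
{
    assert(level >= 1 && level <= 7);
    return ((uint32_t)(level - 1) << 1) | (icache ? 1u : 0u);
}

struct ccsidr_info ccsidr_decode(uint32_t v)
{
    struct ccsidr_info ci;
    ci.write_through = (v >> 31) & 1;
    ci.write_back    = (v >> 30) & 1;
    ci.read_alloc    = (v >> 29) & 1;
    ci.write_alloc   = (v >> 28) & 1;
    ci.sets = ((v >> 13) & 0x7fff) + 1;
    ci.ways = ((v >> 3) & 0x3ff) + 1;
    /* LineSize is log2(words per line) - 2, so bytes = 1 << (LineSize + 4) */
    ci.line_bytes = 1u << ((v & 0x7) + 4);
    return ci;
}

/* Total size: sets * ways * line size, as in the arithmetic above. */
unsigned ccsidr_cache_bytes(uint32_t v)
{
    struct ccsidr_info ci = ccsidr_decode(v);
    return ci.sets * ci.ways * ci.line_bytes;
}

/* Print in roughly the same layout as the dump above. */
void ccsidr_print(const char *name, uint32_t v)
{
    struct ccsidr_info ci = ccsidr_decode(v);
    printf("CCSIDR, %s = %08x\n", name, (unsigned)v);
    if (ci.write_through) printf(" supports Write through\n");
    if (ci.write_back)    printf(" supports Write back\n");
    if (ci.read_alloc)    printf(" supports Read allocate\n");
    if (ci.write_alloc)   printf(" supports Write allocate\n");
    printf(" %u sets as %u way\n", ci.sets, ci.ways);
    printf(" line size = %u words (%u bytes)\n", ci.line_bytes / 4, ci.line_bytes);
}
```

Feeding it the values above gives 128 sets, 4 ways and 64 byte lines for 0x700fe01a, and ccsidr_cache_bytes() reproduces the 32768 and 524288 byte totals.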

Now look at the CTR register:

CTR = 84448003
 CTR - minimum line in I cache = 32
 CTR - minimum line in D cache = 64
 CTR - CWG = 64
 CTR - ERG = 64
This register gives line sizes as log2 of the number of words (I display the values in bytes above). The manual "strongly recommends" using the DminLine and IminLine values it gives for loops in cache maintenance operations.
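Here is a sketch in C of pulling those fields apart, again with made-up names and working on the raw value (on the board it comes from mrc p15, 0, rX, c0, c0, 1). The dcache_range() part shows one way to use DminLine as the step size of a maintenance loop by address; its per-line hook is a hypothetical stand-in for an mcr to an operation like DCCMVAC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR size fields hold log2 of a size in 4-byte words. */
static unsigned words_log2_to_bytes(unsigned f) { return 4u << f; }

unsigned ctr_imin_bytes(uint32_t ctr) { return words_log2_to_bytes(ctr & 0xf); }          /* IminLine [3:0] */
unsigned ctr_dmin_bytes(uint32_t ctr) { return words_log2_to_bytes((ctr >> 16) & 0xf); }  /* DminLine [19:16] */

/* ERG [23:20] and CWG [27:24]: a field of 0 means the size is not
 * reported; this sketch returns 0 in that case instead of guessing. */
unsigned ctr_erg_bytes(uint32_t ctr)
{
    unsigned f = (ctr >> 20) & 0xf;
    return f ? words_log2_to_bytes(f) : 0;
}

unsigned ctr_cwg_bytes(uint32_t ctr)
{
    unsigned f = (ctr >> 24) & 0xf;
    return f ? words_log2_to_bytes(f) : 0;
}

/* Hypothetical per-line hook; on the board this would be an mcr to a
 * by-address operation such as DCCMVAC (c7, c10, 1). */
typedef void (*line_op_fn)(uintptr_t addr, void *ctx);

/* Visit every D cache line overlapping [start, start + len),
 * stepping by DminLine. */
void dcache_range(uint32_t ctr, uintptr_t start, size_t len,
                  line_op_fn op, void *ctx)
{
    uintptr_t line = ctr_dmin_bytes(ctr);
    uintptr_t end = start + len;
    if (len == 0)
        return;
    for (uintptr_t a = start & ~(line - 1); a < end; a += line)
        op(a, ctx);
}

/* Trivial hook for host testing: just counts the lines visited. */
void count_line(uintptr_t addr, void *ctx)
{
    (void)addr;
    ++*(unsigned *)ctx;
}
```

With the H3 value 0x84448003, a 0x80 byte buffer starting at 0x1010 touches 3 lines (0x1000, 0x1040 and 0x1080).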

And there is the CLIDR register:

CLIDR = 0a200023
 CLIDR - LoUU = 1
 CLIDR - Loc = 2
 CLIDR - LoUIS = 1
 CLIDR - L1 type = 3 I/D
 CLIDR - L2 type = 4 unified
It gives the "type" for up to 7 levels of cache; the results for the Cortex-A7 in the Allwinner H3 are shown. This register lets you discover how many levels of cache your device has; you can then interrogate each level using the CSSELR and CCSIDR registers as shown above.
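A small C sketch of decoding CLIDR, with made-up names (on the board the value comes from mrc p15, 1, rX, c0, c0, 1). As I read the manual, once you see a Ctype of 0 working outward from level 1, there are no caches further out, and that is how clidr_levels() counts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

unsigned clidr_louu(uint32_t c)  { return (c >> 27) & 7; }  /* Level of Unification, Uniprocessor */
unsigned clidr_loc(uint32_t c)   { return (c >> 24) & 7; }  /* Level of Coherence */
unsigned clidr_louis(uint32_t c) { return (c >> 21) & 7; }  /* Level of Unification, Inner Shareable */

/* Ctype for a 1-based level (1..7): 3 bits per level starting at bit 0. */
unsigned clidr_ctype(uint32_t c, unsigned level)
{
    assert(level >= 1 && level <= 7);
    return (c >> (3 * (level - 1))) & 7;
}

const char *ctype_name(unsigned t)
{
    static const char *names[] = { "none", "I only", "D only", "I/D", "unified" };
    return t < 5 ? names[t] : "reserved";
}

/* Number of cache levels: stop at the first level whose Ctype is 0. */
unsigned clidr_levels(uint32_t c)
{
    unsigned n = 0;
    while (n < 7 && clidr_ctype(c, n + 1) != 0)
        n++;
    return n;
}

void clidr_print(uint32_t c)
{
    printf("CLIDR = %08x\n", (unsigned)c);
    printf(" LoUU = %u, LoC = %u, LoUIS = %u\n",
           clidr_louu(c), clidr_loc(c), clidr_louis(c));
    for (unsigned lvl = 1; lvl <= clidr_levels(c); lvl++)
        printf(" L%u type = %u %s\n", lvl, clidr_ctype(c, lvl),
               ctype_name(clidr_ctype(c, lvl)));
}
```

For 0x0a200023 it reports LoUU = 1, LoC = 2, LoUIS = 1 and two levels (type 3, I/D, then type 4, unified), matching the output above.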

I am not yet working with any devices with more than 2 levels of cache. Someday I might get my hands on the Rockchip RK3588. It is an 8 core device set up with 3 levels:

L1 = 64K I / 64K D on each of 4 cores, 32K I / 32K D on each of the other 4
L2 = 2M and 512K
L3 = 3M

How to clean the entire data cache

I am surprised that there is no single operation to invalidate or clean the entire data cache. You have to know the cache structure and loop through all the sets and ways. The following does a "clean" (some people call this a "flush"). To do an invalidate, you would replace the DCCSW with a DCISW.

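Pages B2-1286 to 1287 give example code for cache maintenance (cleaning in this case). On line 3 they shift by 23 to get the LoC value, which looks like a typo (you would expect 24). It is deliberate, though: shifting by 23 leaves LoC multiplied by 2, which lines up with their cache level counter (r10 in the pseudocode below). That counter steps by 2 because it doubles as the CSSELR value, where the level field starts at bit 1.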
Note that LSR is "logical shift right" and LSL is "logical shift left".
Also note the "isb" between setting the CSSELR and reading the CCSIDR.

Here is my (possibly error ridden) translation into pseudocode.

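    r0 = CLIDR (mrc)
    r3 = (r0 & 0x07000000) >> 24	// LoC (level of coherence)
    if ( r3 == 0 ) finished
    /* r10 is the cache level times 2, the form CSSELR wants */
    for ( r10 = 0; r10 < 2 * r3; r10 += 2 ) {
	r2 = 3 * (r10 >> 1)	// Ctype is 3 bits per level
	r1 = r0 >> r2
	r1 &= 7
	/* skip if no cache or I cache only */
	if ( r1 < 2 ) continue
	CSSELR = r10
	isb()
	r1 = CCSIDR
	r2 = r1 & 0x7	// line length
	r2 += 4		// now log2 of line length in bytes
	r4 = (r1>>3) & 0x3ff	// max way number
	r5 = clz(r4)	// bit position of the way number
	r9 = r4		// way number
	for ( ;; ) {
	    r7 = (r1>>13) & 0x7fff	// index (set number), start at max
	    for ( ;; ) {
		r11 = r10 | r9 << r5	// way number and cache number
		r11 |= r7 << r2		// factor in index number
		DCCSW = r11		// clean by set/way
		r7--			// decrement index
		if ( r7 < 0 ) break	// index 0 must be done too
	    }
	    r9--		// decrement way number
	    if ( r9 < 0 ) break	// way 0 must be done too
	}
    }
    dsb()	// wait for the maintenance to finish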
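For checking the logic on a host, here is the same set/way walk as a C sketch. It is a restructured version of the ARM listing that uses a plain 0-based level counter and builds the CSSELR value from it. The two hooks stand in for the mrc/mcr accesses and, like all the names here, are hypothetical. Passing a different op hook is how you would switch between clean (DCCSW), invalidate (DCISW) or clean-and-invalidate (DCCISW).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks. On the board, read_ccsidr would write CSSELR,
 * do an isb, then read CCSIDR with mrc; op would be an mcr to DCCSW
 * (c7,c10,2), DCISW (c7,c6,2) or DCCISW (c7,c14,2). */
typedef uint32_t (*read_ccsidr_fn)(uint32_t csselr, void *ctx);
typedef void (*setway_op_fn)(uint32_t operand, void *ctx);

/* Same result as the ARM clz instruction on max_way (that is, 32 - A),
 * but also defined for 0, unlike __builtin_clz(0) in C. */
static unsigned way_shift(uint32_t max_way)
{
    unsigned s = 32;
    while (max_way) {
        max_way >>= 1;
        s--;
    }
    return s;
}

/* Walk every set and way of every data or unified cache below LoC. */
void dcache_walk_setway(uint32_t clidr, read_ccsidr_fn read_ccsidr,
                        setway_op_fn op, void *ctx)
{
    unsigned loc = (clidr >> 24) & 7;
    for (unsigned level = 0; level < loc; level++) {      /* 0 = L1 */
        unsigned ctype = (clidr >> (3 * level)) & 7;
        if (ctype < 2)                  /* no cache, or I cache only */
            continue;
        uint32_t csselr = level << 1;   /* InD = 0: data/unified side */
        uint32_t ccsidr = read_ccsidr(csselr, ctx);
        unsigned line_shift = (ccsidr & 7) + 4;           /* log2(line bytes) */
        int32_t max_way = (ccsidr >> 3) & 0x3ff;
        int32_t max_set = (ccsidr >> 13) & 0x7fff;
        unsigned wshift = way_shift((uint32_t)max_way);
        for (int32_t way = max_way; way >= 0; way--) {
            for (int32_t set = max_set; set >= 0; set--) {
                uint32_t operand = csselr | ((uint32_t)set << line_shift);
                if (wshift < 32)        /* direct mapped: no way bits */
                    operand |= (uint32_t)way << wshift;
                op(operand, ctx);
            }
        }
    }
    /* On the board, a dsb would follow here. */
}

/* Host scaffolding: CCSIDR values as read on the H3 in the dump above. */
uint32_t h3_read_ccsidr(uint32_t csselr, void *ctx)
{
    (void)ctx;
    switch (csselr) {
    case 0:  return 0x700fe01a;   /* L1 D */
    case 1:  return 0x203fe009;   /* L1 I */
    default: return 0x707fe03a;   /* L2, unified */
    }
}

/* Host scaffolding: counts operations and keeps the first and last operand. */
struct op_log { unsigned count; uint32_t first, last; };

void log_op(uint32_t operand, void *ctx)
{
    struct op_log *lg = ctx;
    if (lg->count == 0)
        lg->first = operand;
    lg->last = operand;
    lg->count++;
}
```

With the H3 CLIDR value and the scaffolding hooks, the walk does 128 * 4 + 1024 * 8 = 8704 operations, starting at 0xc0001fc0 (L1, way 3, set 127) and ending at 0x00000002 (L2, way 0, set 0).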
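Note the "clz" instruction. This is an ARM instruction that counts leading zeros in a word. The heart of all this is the DCCSW register. This is "Data Cache Clean by Set/Way" and is one of the cache maintenance instructions. Several of these instructions (DCISW, DCCSW and DCCISW) share a common format for the value written to the register: the way number in the top bits, the set number just above the line offset bits, and the cache level minus 1 in bits 3:1. The bit positions drift around based on the cache geometry as follows:
A = log2 of the associativity (number of ways)
S = log2 of the number of sets
L = log2 of the line length in bytes
B = L + S
The way goes in bits 31 down to 32-A and the set in bits B-1 down to L. That is where clz comes in: clz(ways - 1) works out to 32 - A, the shift for the way number.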
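To make the bit positions concrete, here is a hypothetical C helper that builds the set/way operand straight from the formulas, taking A as log2 of the number of ways, S as log2 of the number of sets, L as log2 of the line length in bytes, and B = L + S. The way goes in bits 31 down to 32-A, the set in bits B-1 down to L, and the level minus 1 in bits 3:1. For the H3's L2 (8 ways, 1024 sets, 64 byte lines) that gives A = 3, S = 10, L = 6 and B = 16, so the way lands in bits 31:29 and the set in bits 15:6.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(n)) for n >= 1, so non-power-of-2 sizes round up. */
unsigned log2_ceil(uint32_t n)
{
    unsigned r = 0;
    while (r < 32 && (1ull << r) < n)
        r++;
    return r;
}

/* Hypothetical helper: set/way operand for DCISW, DCCSW or DCCISW,
 * built from the A, L, S, B definitions.
 * level is 1-based (1 = L1); way < ways; set < sets. */
uint32_t setway_operand(unsigned level, uint32_t way, uint32_t set,
                        uint32_t ways, uint32_t sets, uint32_t line_bytes)
{
    unsigned A = log2_ceil(ways);
    unsigned L = log2_ceil(line_bytes);
    unsigned S = log2_ceil(sets);
    unsigned B = L + S;

    assert(level >= 1 && level <= 7 && way < ways && set < sets);
    assert(B <= 32 - A);              /* set field stays below the way field */

    uint32_t operand = (uint32_t)(level - 1) << 1;   /* bits [3:1] */
    operand |= set << L;                             /* bits [B-1:L] */
    if (A > 0)
        operand |= way << (32 - A);                  /* bits [31:32-A] */
    return operand;
}
```

The last way and set of the L2 come out as 0xe000ffc2, the same value the clz-based loop produces, since clz(8 - 1) = 29 = 32 - A.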
All this is described here:

Special things for the Cortex-A7 MPCore

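Who would have thought? The v7-A manual does not tell it all. There are quite a few registers that are specific to the Cortex-A7 MPCore. In particular, look at DCCISW (c7,c14,2), which is "Data cache clean and invalidate line by set/way". That one is actually a standard v7-A operation (alongside DCISW and DCCSW) rather than an A7 addition; it does the clean and the invalidate in a single step, so it can take the place of DCCSW in the loop above when you want both.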


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org