April 23, 2026

Allwinner H5 network driver -- Sets and Ways

I wrote some code for the H5 to extract information from the cache control registers and display it.
--------   cache CCSIDR for L1 D = 700fe01a
NO Write through
Write back
Read allocate
Write allocate
ARM L1 D cache line size: 64
 associativity: 4
 sets: 128
--------   cache CCSIDR for L1 I = 201fe00a
NO Write through
NO Write back
Read allocate
NO Write allocate
ARM L1 I cache line size: 64
 associativity: 2
 sets: 256
--------   cache CCSIDR for L2 = 703fe07a
NO Write through
Write back
Read allocate
Write allocate
ARM L2 cache line size: 64
 associativity: 16
 sets: 512
The heart of an H5 chip is a quad core Cortex-A53 cluster. As we know, there are 3 different caches, and that is what we see above.
The cache sizes for the H5 are:
A shared (unified) 512KiB L2 cache for all cores
A 32KiB L1 Instruction cache for each core.
A 32KiB L1 Data cache for each core
The Mali450-MP4 GPU has a 32K cache for the vertex processor
The Mali450-MP4 GPU has a 128K cache for the pixel processor

I've never had anything to do with the Mali450 (and probably never will), but it is interesting to learn that there are caches tucked away in it.

Note in the above:

128*4*64 = 32K
256*2*64 = 32K
512*16*64 = 512K
So the cache sizes calculated from the information we read out of the chip match the sizes claimed in the literature.
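
The arithmetic above is simple enough to put in a helper. This is only a sketch, the function names are mine and not taken from any real header, and it runs on any host rather than on the H5 itself.

```c
#include <assert.h>
#include <stdint.h>

/* Total capacity in bytes of a set associative cache.
 * Every set holds one line per way, so the total is just the product. */
static uint32_t cache_bytes(uint32_t sets, uint32_t ways, uint32_t line_bytes)
{
    return sets * ways * line_bytes;
}

/* Check the three geometries reported by the chip against the
 * sizes claimed in the literature. */
void cache_bytes_selfcheck(void)
{
    assert(cache_bytes(128, 4, 64) == 32u * 1024);    /* L1 D */
    assert(cache_bytes(256, 2, 64) == 32u * 1024);    /* L1 I */
    assert(cache_bytes(512, 16, 64) == 512u * 1024);  /* L2 */
}
```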

Sets and Ways

There are many ways to design caches. All 3 caches we are dealing with here are set associative caches.

Consider finding an entry in the cache. A given line has a memory address and the address gets split into 3 parts.

The low part is easy, it just selects which byte in the cache line we are currently interested in. We can more or less ignore this. It is 6 bits for our 64 byte line size.

The middle part is the index, and it selects which set we search. In the case of our I-cache with 256 sets, this is 8 bits.

The high part is called the tag and will be matched against the saved tags for the lines in the selected set.

Note that either the high or middle part could in theory be used for the tag. It seems to be common to use the high part as shown here, which means that consecutive lines in memory land in different sets.

Each set holds one line per "way". In the case of our 2-way I-cache, each set holds 2 lines, and with 256 sets that makes 512 lines in all. Each line is stored along with its tag value. There is also a "valid" bit.

Once a set is selected, a match is performed in parallel for all ways in that set (so each way has a comparator to perform the match of the current tag against the saved tag). For our I-cache, the tag would be 64-14 or 50 bits if we had full 64 bit addresses. In practice addresses are narrower than 64 bits, so the stored tag is smaller.

A hit takes place when some comparator indicates a match against the tag and when the valid bit is set.

As you can see, there is quite a bit of circuitry aside from the cache memory itself when you start thinking about memory for tag values along with the comparators (we need 2 comparators for our I-cache, and 16 for the L2).
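
To make the address split concrete, here is a small host-side sketch. It assumes the conventional layout: low bits are the byte offset, middle bits index the set, and the remaining high bits are the tag. All the struct and function names are made up for illustration, and the loop over the ways stands in for what the hardware does with parallel comparators.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a cache; line_bytes and sets must be powers of 2. */
struct cache_geom {
    unsigned line_bytes;
    unsigned sets;
    unsigned ways;
};

struct addr_parts {
    uint64_t tag;     /* high bits, compared against the stored tags */
    unsigned index;   /* middle bits, select the set */
    unsigned offset;  /* low bits, select the byte within the line */
};

/* One stored tag plus its valid bit (the line data itself is omitted). */
struct tag_entry {
    int valid;
    uint64_t tag;
};

static unsigned log2u(unsigned v)
{
    unsigned n = 0;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

static struct addr_parts split_addr(const struct cache_geom *g, uint64_t addr)
{
    unsigned off_bits = log2u(g->line_bytes);
    unsigned idx_bits = log2u(g->sets);
    struct addr_parts p;

    assert((g->line_bytes & (g->line_bytes - 1)) == 0);
    assert((g->sets & (g->sets - 1)) == 0);

    p.offset = (unsigned)(addr & (g->line_bytes - 1));
    p.index = (unsigned)((addr >> off_bits) & (g->sets - 1));
    p.tag = addr >> (off_bits + idx_bits);
    return p;
}

/* tags holds sets*ways entries, laid out set by set.
 * Returns the way that hit, or -1 on a miss. */
static int lookup(const struct cache_geom *g, const struct tag_entry *tags,
                  uint64_t addr)
{
    struct addr_parts p = split_addr(g, addr);
    const struct tag_entry *set = &tags[p.index * g->ways];
    unsigned w;

    for (w = 0; w < g->ways; w++)
        if (set[w].valid && set[w].tag == p.tag)
            return (int)w;
    return -1;
}
```

For the I-cache (64 byte lines, 256 sets, 2 ways) the address 0x40001234 splits into offset 0x34, index 0x48 and tag 0x10000. Only the 2 entries in set 0x48 ever get compared.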

The CSSELR_EL1 register

The _EL1 suffix names the lowest exception level that is allowed to access the register. We are running in EL2 so we are free to access it.

This is a 64 bit register with only 5 bits active

bit 4 - set 0 to get I, D, or unified (as we want)
bits 3,2,1 - level (0 is L1, 1 is L2, ...)
bit 0 - set 1 to get I, set 0 to get D or unified
The purpose of this register is to make a selection for subsequent accesses to the CCSIDR_EL1 register.
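
Building the selector value from the bit layout above might look like the following sketch. The enum and function names are hypothetical, and the code only computes the value, so it runs on any host. On the chip the value would be written with "msr csselr_el1", followed by an "isb" before reading CCSIDR_EL1.

```c
#include <assert.h>
#include <stdint.h>

/* Value of bit 0 of the selector (names are mine). */
enum cache_kind {
    CACHE_DATA_OR_UNIFIED = 0,
    CACHE_INSTRUCTION = 1
};

/* level is 1 for L1, 2 for L2, and so on.
 * Bit 4 is left at 0, bits 3..1 hold (level - 1), bit 0 holds the kind. */
static uint64_t csselr_value(unsigned level, enum cache_kind kind)
{
    assert(level >= 1 && level <= 7);
    return ((uint64_t)(level - 1) << 1) | (uint64_t)kind;
}
```

The three caches on the H5 then correspond to csselr_value(1, CACHE_DATA_OR_UNIFIED), csselr_value(1, CACHE_INSTRUCTION) and csselr_value(2, CACHE_DATA_OR_UNIFIED), that is the values 0, 1 and 2.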

The CCSIDR_EL1 register

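Reading this register gives information about the cache selected by a previous write to the CSSELR register. It is accessed as a 64 bit register, but on the Cortex-A53 only the low 32 bits are used, which matches the 32 bit values shown above. The fields are as follows:
bits 28-31 = write allocate, read allocate, write back, write through flags (bit 28 up to bit 31)
bits 13-27 = number of sets - 1
bits 3-12 = associativity - 1
bits 0-2 = log2(line-size)-4
For our 64 byte line size, the line size field is 2
(A wider layout with the set count in bits 32-55 and the associativity in bits 3-23 exists, but only on newer cores that implement the CCIDX extension.)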
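
Here is a sketch of decoding a raw CCSIDR value, assuming the 32 bit layout that the values printed at the top of this page follow (sets - 1 in bits 13-27, associativity - 1 in bits 3-12, the four policy flags in bits 28-31). The struct and function names are mine. It takes the raw value as an argument, so it can be tried on any host.

```c
#include <assert.h>
#include <stdint.h>

struct ccsidr_info {
    unsigned line_bytes;
    unsigned ways;
    unsigned sets;
    int write_through;
    int write_back;
    int read_alloc;
    int write_alloc;
};

static struct ccsidr_info ccsidr_decode(uint32_t v)
{
    struct ccsidr_info c;

    c.line_bytes = 1u << ((v & 0x7) + 4);    /* field is log2(bytes) - 4 */
    c.ways = ((v >> 3) & 0x3ff) + 1;         /* 10 bit field, value - 1 */
    c.sets = ((v >> 13) & 0x7fff) + 1;       /* 15 bit field, value - 1 */
    c.write_through = (v >> 31) & 1;
    c.write_back = (v >> 30) & 1;
    c.read_alloc = (v >> 29) & 1;
    c.write_alloc = (v >> 28) & 1;
    return c;
}

/* Check against the L1 D value read from the H5. */
void ccsidr_selfcheck(void)
{
    struct ccsidr_info d = ccsidr_decode(0x700fe01a);

    assert(d.line_bytes == 64 && d.ways == 4 && d.sets == 128);
    assert(!d.write_through && d.write_back && d.read_alloc && d.write_alloc);
}
```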

The CTR_EL0 register

This is the "cache type register". The _EL0 suffix means it can be read even from EL0 (if the OS allows it), and of course we can access it from EL1 or EL2. This register is used by the dcache flush and invalidate routines I found in U-Boot.
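
As far as I can tell, what flush and invalidate loops want from this register is the smallest cache line size, so they know how far to step between maintenance operations. The sketch below decodes the two line size fields (DminLine in bits 16-19, IminLine in bits 0-3, each the log2 of the line size in 4 byte words). The function names are mine, and the value in the self check is an illustrative one, not something I read from the H5.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest data cache line in bytes: 4 bytes per word, shifted by DminLine. */
static unsigned ctr_dcache_min_line(uint64_t ctr)
{
    return 4u << ((ctr >> 16) & 0xf);
}

/* Smallest instruction cache line in bytes, from IminLine. */
static unsigned ctr_icache_min_line(uint64_t ctr)
{
    return 4u << (ctr & 0xf);
}

/* Illustrative value with both fields set to 4, which means 64 byte lines. */
void ctr_selfcheck(void)
{
    assert(ctr_dcache_min_line(0x84448004) == 64);
    assert(ctr_icache_min_line(0x84448004) == 64);
}
```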


Have any comments? Questions? Drop me a line!

Tom's electronics pages / tom@mmto.org