-------- cache CCSIDR for L1 D = 700fe01a
NO Write through
Write back
Read allocate
Write allocate
ARM L1 D cache
line size: 64
associativity: 4
sets: 128
-------- cache CCSIDR for L1 I = 201fe00a
NO Write through
NO Write back
Read allocate
NO Write allocate
ARM L1 I cache
line size: 64
associativity: 2
sets: 256
-------- cache CCSIDR for L2 = 703fe07a
NO Write through
Write back
Read allocate
Write allocate
ARM L2 cache
line size: 64
associativity: 16
sets: 512

The heart of an H5 chip is a quad core Cortex-A53 cluster. As we know, there are 3 different caches, and that is what we see above.
A shared (unified) 512KiB L2 cache for all cores
A 32KiB L1 Instruction cache for each core
A 32KiB L1 Data cache for each core
The Mali450-MP4 GPU has a 32K cache for the vertex processor
The Mali450-MP4 GPU has a 128K cache for the pixel processor
I've never had anything to do with the Mali450 (and probably never will), but it is interesting to learn that there are caches tucked away in it.
Note in the above:
128*4*64 = 32K
256*2*64 = 32K
512*16*64 = 512K
So the cache sizes calculated (sets * ways * line size) from the values we read out of the chip match the sizes claimed in the literature.
Consider finding an entry in the cache. We start with a memory address, and the address gets split into 3 parts.
The low part is easy: it just selects which byte in the cache line we are interested in. We can more or less ignore it. It is 6 bits for our 64 byte line size.
The middle part is called the index and selects which set we search. In the case of our I-cache with 256 sets, this is 8 bits.
The high part is called the tag and will be matched against the saved tags for the lines in that set.
Note that in theory either the high or the middle part could be used as the index. It is standard to use the middle part as shown here, because then consecutive lines of memory land in different sets rather than all competing for the same one.
Each set holds one line per "way". In the case of our 2-way I-cache, each of the 256 sets holds 2 lines. Along with the line itself, each entry holds a tag value and a "valid" bit.
Once the index selects a set, a match is performed in parallel for all ways in that set (so each way has a comparator to match the current tag against its saved tag). For our I-cache, the tag must be 64-6-8 or 50 bits if we have 64 bit addresses.
A hit takes place when some comparator indicates a match against the tag and the valid bit for that entry is set.
As you can see, there is quite a bit of circuitry aside from the cache memory itself when you start thinking about memory for tag values (512 of them for our I-cache, one per line) along with the comparators (one per way: only 2 for our I-cache, but 16 for the 16-way L2).
The cache size selection register, CSSELR_EL1, is a 64 bit register with only 5 bits active:
bit 4 - set 0 to get I, D, or unified (as we want)
bits 3,2,1 - level (0 is L1, 1 is L2, ...)
bit 0 - set 1 to get I, set 0 to get D or unified
The purpose of this register is to make a selection for subsequent accesses to the CCSIDR_EL1 register.
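The Cortex-A53 does not implement the ARMv8.3 FEAT_CCIDX extension, so its CCSIDR_EL1 uses the 32 bit layout (the values in the dump above only decode correctly this way):
bit 31 = write through, bit 30 = write back, bit 29 = read allocate, bit 28 = write allocate
bits 13-27 = number of sets - 1
bits 3-12 = associativity - 1
bits 0-2 = log2(line size) - 4
The layout with the number of sets in bits 32-55 and associativity in bits 3-23 only applies to cores that implement FEAT_CCIDX.
For our 64 byte line size, the line size field is 2 (log2(64) = 6, and 6 - 4 = 2).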
Tom's electronics pages / tom@mmto.org