.pushsection .text.__asm_flush_dcache_range, "ax"
ENTRY(__asm_flush_dcache_range)
        mrs     x3, ctr_el0
        ubfx    x3, x3, #16, #4
        mov     x2, #4
        lsl     x2, x2, x3              /* cache line size */
        /* x2 <- minimal cache line size in cache system */
        sub     x3, x2, #1
        bic     x0, x0, x3
1:      dc      civac, x0               /* clean & invalidate data or unified cache */
        add     x0, x0, x2
        cmp     x0, x1
        b.lo    1b
        dsb     sy
        ret
ENDPROC(__asm_flush_dcache_range)
Replace "civac" with "ivac" to get the invalidate routine.
ENTRY and ENDPROC are defined in include/linux/linkage.h -- they are trivial and of no particular interest here.
The "mrs" instruction reads the "ctr_el0" register (cache type register). The ubfx (unsigned bitfield extract) instruction pulls a 4 bit field from offset 16 from the 32 bit value this returns (namely bits 19:16). This gives the log2 of the size in words of the smallest cache line in the L1 and L2 caches controlled by this core.
The value "4" is loaded into x2 to effectively multiply the result to be obtained by 4, converting the word count to a byte count. The shift then inverts the log2 to give us the actual line size in bytes (in x2).
The "bic" instruction is "bit clear". It does NOT just clear one bit. It clears all the bits given (in this case) by the mask in x3. This mask is (x2-1), i.e. the line size minus 1. Clearing these low bits in x0 (start address argument) gives a nice start to the loop that follows.
The loop should be clear. We issue the "dc civac" (or "dc ivac") instruction for each cache line in the range; the unsigned compare against the end address in x1 ("b.lo" is branch if lower) ends the loop.
Finally, a "dsb sy" barrier ensures that the function does not return until all the writes have been performed.
#define CCSIDR_LINE_SIZE_OFFSET         0
#define CCSIDR_LINE_SIZE_MASK           0x7

u32 line_len, ccsidr;
u32 mva;

/* Read current CP15 Cache Size ID Register */
asm volatile ("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
/* LineSize field encodes log2(words per line) - 2 */
line_len = ((ccsidr & CCSIDR_LINE_SIZE_MASK) >>
                CCSIDR_LINE_SIZE_OFFSET) + 2;
/* Converting from words to bytes */
line_len += 2;
/* converting from log2(linelen) to linelen */
line_len = 1 << line_len;

#if FLUSH
/* Align start to cache line boundary */
start &= ~(line_len - 1);
for (mva = start; mva < stop; mva = mva + line_len) {
        /* DCCIMVAC - Clean & Invalidate data cache by MVA to PoC */
        asm volatile ("mcr p15, 0, %0, c7, c14, 1" : : "r" (mva));
}
#endif

#if INVALIDATE
for (mva = start; mva < stop; mva = mva + line_len) {
        /* DCIMVAC - Invalidate data cache by MVA to PoC */
        asm volatile ("mcr p15, 0, %0, c7, c6, 1" : : "r" (mva));
}
#endif

// dsb();
asm volatile ("dsb sy" : : : "memory");
Note that in the above, the C code must do a shift and mask to extract a field from the register, while with aarch64 we have the ubfx instruction to do just this sort of thing for us.
Here we have the infernal and cursed syntax for the aarch32 system registers. Let me go on a rant here. Why, o why? Why did they not make the assembler take care of this and allow us instead to write code somewhat like:
        mrc r0, CCSIDR
        mcr DCCIMVAC, r1

They certainly could have -- and should have! I have written post processors to produce this syntax for code I have disassembled, and it has been a huge benefit. This is what assemblers are for and what they should do!!
The U-boot code kindly provides comments to tell us what is intended.
Why does the flush get the start aligned to a line boundary, while the invalidate does not? They both should (as they both do in the aarch64 code above). With an unaligned start, the loop steps by line_len from mid-line addresses and can stop one line short, leaving the last line of the range untouched. Invalidating is also less forgiving than flushing: a pure invalidate of a partially covered line throws away any dirty data that happens to share that line, so ranges handed to an invalidate routine really ought to be line aligned in the first place.
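For example, with 64-byte lines, start = 0x1020, and stop = 0x1081, the unaligned loop issues operations at 0x1020 and 0x1060 and then stops (0x10a0 is past stop), so the line at 0x1080 -- which the range reaches into -- is never touched.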
asm volatile ("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
So the code can be written to use either.
The DminLine field in the CTR does say that it returns the minimum line size across ALL of the caches, which is just what a routine like this wants.
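As a sketch (the function below is mine, not U-boot's), the minimum line size can be computed from the aarch32 CTR exactly the way the aarch64 code does it, since DminLine sits in bits 19:16 there as well:

static u32 dcache_min_line_size(void)
{
        u32 ctr;

        /* CTR - cache type register */
        asm volatile ("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
        /* DminLine (bits 19:16): log2 of the smallest line, in words */
        return 4U << ((ctr >> 16) & 0xf);
}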
Exactly what the CCSIDR reports depends on the specific core (A7 or A53 or ...), and on which cache has been selected via the CSSELR (cache size selection register).
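That wrinkle is worth a sketch (the function below is my own illustration, not U-boot code): the CCSIDR describes whichever cache is currently selected in the CSSELR, so careful code selects the L1 data cache before reading it:

static u32 read_l1_dcache_ccsidr(void)
{
        u32 ccsidr;

        /* CSSELR: Level field 0 = L1, InD = 0 = data/unified cache */
        asm volatile ("mcr p15, 2, %0, c0, c0, 0" : : "r" (0));
        asm volatile ("isb");
        /* CCSIDR now describes the selected cache */
        asm volatile ("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
        return ccsidr;
}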
Tom's electronics pages / tom@mmto.org