January 8, 2026

Kyu - ARM - Aarch64 cache dc and ic instructions

I had expected all of the cache maintenance to be handled by msr and mrs instructions to system registers. Instead, aarch64 has two cache-specific instructions, one for the data cache and one for the instruction cache.
The form is generally "dc op, reg"
dc  ivac -- invalidate by VA to PoC
dc  isw  -- invalidate by set/way
dc  csw  -- clean by set/way
dc  cisw -- clean and invalidate by set/way
dc  cvac -- clean by VA to PoC
dc  civac -- clean and invalidate by VA to PoC
dc  cvau -- clean by VA to PoU
dc  cvap -- clean by VA to PoP
dc  cigdpae -- clean and invalidate data and allocation tags by PA to PoE
dc  cgdpae -- clean data and allocation tags by PA to PoE
dc  igdpae -- invalidate data and allocation tags by PA to PoE
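Note that these affect both the core-specific data cache and the L2 unified cache: the by-VA forms act on every cache level out to the named point, while the set/way forms act on the one cache level selected in the register operand.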
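The set/way forms take a register that packs a cache level, a set number and a way number into one value. Here is a minimal C sketch of that packing as I understand it from the Arm architecture manual (level in bits 3:1, set starting at Log2 of the line size, way left-aligned in the top of the low 32 bits). The function names and parameters are my own invention for illustration, not anything from U-boot.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits b such that (1 << b) >= n, for n >= 1. */
static unsigned ceil_log2(uint32_t n)
{
    unsigned bits = 0;
    while (((uint64_t)1 << bits) < n)
        bits++;
    return bits;
}

/*
 * Build the operand for "dc isw", "dc csw" or "dc cisw".
 * (Hypothetical helper, assumed field layout as described above.)
 *   level      - 0 for L1, 1 for L2, ...
 *   set, way   - which line to operate on
 *   num_ways   - associativity of that cache level
 *   line_bytes - line size in bytes (a power of two)
 */
static uint64_t setway_operand(unsigned level, uint32_t set, uint32_t way,
                               uint32_t num_ways, uint32_t line_bytes)
{
    assert(num_ways != 0 && line_bytes != 0);

    unsigned way_bits = ceil_log2(num_ways);
    unsigned line_shift = ceil_log2(line_bytes);

    uint64_t val = (uint64_t)level << 1;      /* level in bits 3:1 */
    val |= (uint64_t)set << line_shift;       /* set above the line offset */
    if (way_bits > 0)
        val |= (uint64_t)way << (32 - way_bits); /* way at the top of 31:0 */
    return val;
}
```

For example, setway_operand(0, 5, 3, 4, 64) describes way 3, set 5 of a 4-way L1 cache with 64-byte lines. A real routine would get the number of sets and ways from the cache size ID registers rather than hard-coding them.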

PoP is "point of persistence", which may not be supported by an implementation (and commonly is not).

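PoE is "point of encryption". These by-PA operations belong to the Memory Encryption Contexts extension (FEAT_MEC), and the forms that touch allocation tags also need the MemTag extension (i.e. probably not in the systems I work with).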

The "ic" instruction for the instruction cache

Here we have:
ic  ialluis  -- invalidate all to PoU, Inner Shareable
ic  iallu    -- invalidate all to PoU
ic  ivau     -- invalidate by address to PoU

Examples

The following is only intended to give samples of instruction syntax (along with some hints about use).

U-boot does this after relocating code:

    ic  iallu       /* i-cache invalidate all */
    isb sy
And it provides this in a routine "asm_invalidate_icache_all":
    ic  ialluis
    isb sy
We never see U-boot using "ic ivau", so it does a whole hog invalidation of the I cache whenever it gets the itch to do so.

We see these in a loop:

    dc  isw, x9
    dc  cisw, x9
And these in a loop in a "dcache flush by range" routine:
    dc  civac, x0   /* clean & invalidate data or unified cache */
    dsb sy
And these in a loop in a "dcache invalidate by range" routine:
    dc  ivac, x0    /* invalidate data or unified cache */
    dsb sy
The routines in question here are in arch/arm/cpu/armv8/cache.S. This is only 259 lines of code and worthy of study. Note however that there is also cache_v8.c as a layer above this assembly code, and it is 787 lines.

In other words there are about 1000 lines of code to digest to get a handle on these cache routines.
Most of them are involved with MMU setup code.
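To picture what the "by range" loops above are doing, here is a rough C model. It is only a sketch: the callback stands in for a single "dc civac" or "dc ivac" on one line, the line size is assumed to be already known, and all the names are hypothetical. It does not model the dsb barrier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback standing in for one "dc <op>, xN" instruction. */
typedef void (*line_op_fn)(uint64_t line_addr, void *ctx);

/*
 * Walk every cache line that overlaps [start, end) and apply op to it.
 * Returns the number of lines visited.
 */
static unsigned for_each_cache_line(uint64_t start, uint64_t end,
                                    uint64_t line_bytes,
                                    line_op_fn op, void *ctx)
{
    /* The line size must be a nonzero power of two. */
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);

    unsigned count = 0;
    uint64_t addr = start & ~(line_bytes - 1); /* round down to a line start */

    while (addr < end) {
        op(addr, ctx);        /* on hardware: dc civac / dc ivac on addr */
        addr += line_bytes;
        count++;
    }
    return count;
}

/* Example callback: just remember the last line address we were given. */
static void record_last(uint64_t line_addr, void *ctx)
{
    *(uint64_t *)ctx = line_addr;
}
```

With 64-byte lines, a range from 0x1010 to 0x1100 touches the four lines at 0x1000, 0x1040, 0x1080 and 0x10C0. Note that the first line starts below the requested address, which matters for an invalidate since it can discard neighboring data that shares the line.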

Take a look at this:

The ctr_el0 system register

This is the all-important "cache type register". It gets read by both the invalidate_dcache_range and flush_dcache_range routines, which use it to figure out the cache line size. The code uses the ubfx (unsigned bit field extract) instruction on the contents of this register:
    mrs  x3, ctr_el0
    ubfx x3, x3, #16, #4
This extracts the 4-bit field at bits 19:16 (DminLine). The field is Log2 of the number of 4-byte words in the smallest cache line of all the data caches and unified caches that are controlled by the PE, so the line size in bytes is 4 shifted left by this value. The code uses the line length to set up a loop to invalidate or flush line by line.
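The same field extraction can be written out in C to check the arithmetic. This is a sketch with names of my own choosing. The register value is passed in as a plain number so it runs on any machine, whereas on real hardware it would come from the mrs instruction. The I-cache field (IminLine, bits 3:0) is not used by the routines discussed here; I include it as an assumption-free extra because it is encoded the same way.

```c
#include <stdint.h>

/* Extract "width" bits starting at bit "lsb", like the ubfx instruction. */
static unsigned ctr_field(uint64_t ctr, unsigned lsb, unsigned width)
{
    return (unsigned)((ctr >> lsb) & (((uint64_t)1 << width) - 1));
}

/* Smallest data/unified cache line in bytes: 4-byte words << DminLine. */
static unsigned dcache_line_bytes(uint64_t ctr)
{
    return 4u << ctr_field(ctr, 16, 4);
}

/* Smallest instruction cache line in bytes: 4-byte words << IminLine. */
static unsigned icache_line_bytes(uint64_t ctr)
{
    return 4u << ctr_field(ctr, 0, 4);
}
```

As a sample, a ctr_el0 value of 0x84448004 has 4 in both fields, which gives 64-byte lines for both caches.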


Have any comments? Questions? Drop me a line!

Tom's electronics pages / tom@mmto.org