What about the F411? Here we have a Cortex-M4. The core itself has neither an I-cache nor a D-cache. What it does have is something called the ART accelerator, which is essentially a cache for flash memory accesses. To me, that makes it to all intents and purposes an I-cache. But if you run code out of RAM, to pick an example, the ART would not get involved (nor would it need to). ART stands for "adaptive real-time" accelerator, and indeed it is only involved with flash accesses. The F429 also has one.
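As a rough illustration of how the ART gets switched on, here is a minimal sketch that builds a value for the flash access control register (FLASH_ACR) on an F4 part. The bit positions (PRFTEN bit 8, ICEN bit 9, DCEN bit 10, LATENCY in the low 4 bits) are what I understand the F4 reference manuals to describe; the macro and function names are my own invention, and the right wait-state count depends on your clock and supply voltage, so check the manual for your part.

```c
#include <stdint.h>

/* Assumed FLASH_ACR bit layout for STM32F4 parts (verify against the
 * reference manual for your chip). Names here are hypothetical. */
#define ACR_LATENCY_MASK 0x0Fu        /* wait states, bits 3:0   */
#define ACR_PRFTEN       (1u << 8)    /* prefetch enable         */
#define ACR_ICEN         (1u << 9)    /* ART instruction cache   */
#define ACR_DCEN         (1u << 10)   /* ART data (literal) cache */

/* Build an ACR value with prefetch and both ART caches enabled and
 * the given number of flash wait states. */
uint32_t art_acr_value(uint32_t wait_states)
{
    return (wait_states & ACR_LATENCY_MASK)
         | ACR_PRFTEN | ACR_ICEN | ACR_DCEN;
}
```

On real hardware the result would be written to the FLASH_ACR register before raising the system clock. Code running from RAM never goes through this path, which is why the ART has nothing to do in that case.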
In the other direction, consider the F7 and H7 parts, which are often (but not always) Cortex-M7 cores. These have both I and D caches, along with all the DMA coherency issues that a D-cache brings.
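To make the DMA issue a bit more concrete: D-cache maintenance on the Cortex-M7 works on whole cache lines (32 bytes), so a DMA buffer has to be widened to line boundaries before it is cleaned or invalidated. The sketch below only does that address arithmetic and runs anywhere; the type and function names are hypothetical, not from any library.

```c
#include <stdint.h>
#include <stddef.h>

#define DCACHE_LINE_SIZE 32u   /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;   /* line-aligned start address */
    size_t    length;  /* length, a multiple of the line size */
} cache_range_t;

/* Widen [addr, addr + len) outward to whole cache lines. */
cache_range_t dcache_align_range(uintptr_t addr, size_t len)
{
    cache_range_t r;
    uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    uintptr_t end  = (addr + len + mask) & ~mask;  /* round end up */

    r.start  = addr & ~mask;                       /* round start down */
    r.length = (size_t)(end - r.start);
    return r;
}
```

On a real M7 the resulting range would be handed to the CMSIS routines SCB_CleanDCache_by_Addr (before the DMA reads a buffer the CPU wrote) or SCB_InvalidateDCache_by_Addr (after the DMA writes a buffer the CPU will read). Note that invalidating a widened range can throw away unrelated data sharing those lines, which is a good reason to align and pad DMA buffers to 32 bytes in the first place.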
Tom's Computer Info / tom@mmto.org