The Kyu project

January 18, 2023

ARM Processor 101 -- the MMU

All of my 32 bit ARM boards use the ARMv7-A architecture. This means that what I learn about the MMU for one applies to all of them.

Manuals and documentation

ARM makes a vast and overwhelming amount of documentation available. Take care that you are reading the documents that pertain to ARMv7-A (or whatever architecture you are working with). I have found myself mistakenly reading about MMU details for some other architecture, which may be very similar but differ in vital details.

Enabling and Disabling the MMU

Basics

The ARM MMU can be used in many different ways. In the most general case it is a 2 level MMU with (for example) 4K pages. This is how something like the Linux operating system might use it.

The way I use it is as a 1 level MMU with 1M "sections". My aim is to make it transparent and forget about it. You might ask, "why not just turn it off entirely", but as near as I can tell that will not allow caches to be enabled. Also mapping as much of the address space as possible "invalid" will catch certain bugs.

Translation table

The translation table must begin on a 16K boundary. When a virtual address needs to be translated, the top 12 bits of that address select the first level entry in the translation table. Hence each first level entry deals with 1M of the virtual address space. Each entry is 32 bits in size.
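The lookup just described is simple enough to write down in C. This is only a sketch to make the arithmetic concrete; the names l1_index and l1_entry_addr are my own inventions for illustration and do not come from any ARM header.

```c
#include <stdint.h>

#define SECTION_SHIFT  20       /* each first level entry covers 1M */
#define L1_ENTRIES     4096     /* 2^12 entries of 4 bytes each = 16K */

/* The top 12 bits of the virtual address select the first level entry. */
static inline uint32_t l1_index(uint32_t va)
{
    return va >> SECTION_SHIFT;
}

/* Address of that entry, given the (16K aligned) base of the table. */
static inline uint32_t l1_entry_addr(uint32_t table_base, uint32_t va)
{
    return table_base + l1_index(va) * 4;   /* entries are 32 bits in size */
}
```

The 4096 entries of 4 bytes each are where the 16K table size comes from.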

There are 4 types of table entries as indicated by the low 2 bits of the entry:

00 - unmapped (access will give a translation fault).
01 - coarse 2nd level table
10 - section descriptor
11 - fine 2nd level table (an ARMv5 feature; ARMv7-A no longer has fine tables)
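Decoding the entry type is just a matter of masking off the low 2 bits. A minimal sketch, with enum and function names that are my own and purely illustrative:

```c
#include <stdint.h>

enum l1_type {
    L1_FAULT   = 0,   /* 00 - unmapped, access gives a translation fault */
    L1_COARSE  = 1,   /* 01 - coarse 2nd level table */
    L1_SECTION = 2,   /* 10 - section descriptor */
    L1_TYPE_11 = 3    /* 11 - fine 2nd level table (ARMv5 era) */
};

/* The low 2 bits of a first level entry say what kind of entry it is. */
static inline enum l1_type l1_entry_type(uint32_t entry)
{
    return (enum l1_type)(entry & 0x3u);
}
```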

As a side note, some ARM devices support 16M "supersections", but none that I am working with.

I use section descriptors, which are as follows. My old printed manual (for ARMv5 it turns out) showed bits 19-12 as "should be zero". ARMv7 has added many new definitions. Also note that attribute meanings have been redefined. It is vital to use the relevant online documents.

bits 31-20 (12 bits) section base address
bit 19 NS
bit 18 0
bit 17 NG
bit 16 S
bit 15 AP[2]
bits 14-12 (3 bits) TEX
bits 11-10 AP[1:0]
bit 9 IMP (implementation defined)
bits 8-5 DOM (4 bits) domain (16 possible)
bit 4 XN
bit 3 C - cacheable
bit 2 B - bufferable
bits 1-0 "10" to make this a section descriptor

NS is "non secure" and is only significant on chips with security extensions.
NG is "non global" and could be used to divide memory into global/non-global regions.
S is "shareable".
AP (3 bits) are "access permission".
TEX along with C,B are the new "attribute bits". The v7 document says that these bits now have more general meaning than their names imply.
XN is "execute never" and prohibits instruction fetch if set.
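Here is how the layout above might be turned into C, along with a loop that fills a table with identity mapped sections and marks everything else invalid. Treat it as a sketch under assumptions: the macro and function names (SEC_AP, make_section, fill_identity and so on) are hypothetical, and which TEX, C and B values are right for a given region is something to take from the ARM documents, not from this code.

```c
#include <stdint.h>

#define SEC_TYPE     0x2u                           /* bits 1-0 = "10" */
#define SEC_B        (1u << 2)                      /* bufferable */
#define SEC_C        (1u << 3)                      /* cacheable */
#define SEC_XN       (1u << 4)                      /* execute never */
#define SEC_DOM(d)   (((uint32_t)(d) & 0xfu) << 5)  /* domain, bits 8-5 */
#define SEC_TEX(t)   (((uint32_t)(t) & 0x7u) << 12) /* TEX, bits 14-12 */
#define SEC_S        (1u << 16)                     /* shareable */
#define SEC_NG       (1u << 17)                     /* non global */
#define SEC_NS       (1u << 19)                     /* non secure */

/* The 3 AP bits are split: AP[1:0] go to bits 11-10, AP[2] goes to bit 15. */
#define SEC_AP(ap)   ((((uint32_t)(ap) & 0x3u) << 10) | \
                      ((((uint32_t)(ap) >> 2) & 0x1u) << 15))

/* Build a descriptor for the 1M section that contains physical address pa. */
static inline uint32_t make_section(uint32_t pa, uint32_t flags)
{
    return (pa & 0xfff00000u) | flags | SEC_TYPE;
}

/* Identity map sections first..last (inclusive), everything else invalid. */
static void fill_identity(uint32_t *table, uint32_t first, uint32_t last,
                          uint32_t flags)
{
    for (uint32_t i = 0; i < 4096; i++) {
        if (i >= first && i <= last)
            table[i] = make_section(i << 20, flags);
        else
            table[i] = 0;       /* low bits 00, so a translation fault */
    }
}
```

A call like fill_identity(table, 0x400, 0x5ff, SEC_AP(3) | SEC_DOM(0)) would map 512M starting at 0x40000000 onto itself and leave the rest of the address space unmapped. The addresses here are made up for the example.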

A bit in the SCTLR register (namely the TRE bit) can change the meaning of the TEX bits.
When TRE (TEX remap enable) is 0, TEX along with C and B work in the traditional way.

The 5 bits give 32 possibilities, many of which are "reserved", see this table:

TEX and C,B

The AP bits are as per this table. Note that there are actually 3 bits, but separated in the descriptor. Note that 011 will give full access (r/w).

AP - access permission

The DOM bits specify one of 16 domains. The DACR (domain access control register) has a 2 bit field controlling access to each domain. A reasonable approach for me is to put everything into domain 0 and set up the DACR accordingly.

Domains

The DACR register

This is a 32 bit register consisting of 16 2-bit groups. The 2 bits specify access for the given domain as follows:

00 - no access
01 - client access (as per bits in translation tables)
10 - reserved (undefined and unpredictable)
11 - manager access (bits in translation tables are not checked).

MCR p15, 0, <Rt>, c3, c0, 0    ; Write Rt to DACR

To entirely get this business out of your hair, set the domain field in all the descriptors to zero (actually any value will do), then do this:

set_DACR ( 0xffffffff );
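For the curious, here is one way the DACR value could be computed and written from C. This is a sketch, not necessarily how Kyu does it: dacr_field and dacr_all are hypothetical helpers, and this set_DACR is just my guess at a reasonable implementation using GCC style inline assembly. The MCR only assembles for a 32 bit ARM target, so it is guarded with a preprocessor test.

```c
#include <stdint.h>

#define DACR_NONE     0x0u   /* 00 - no access */
#define DACR_CLIENT   0x1u   /* 01 - bits in the translation tables are checked */
#define DACR_MANAGER  0x3u   /* 11 - bits in the translation tables are not checked */

/* Place a 2 bit access value into the field for one of the 16 domains. */
static inline uint32_t dacr_field(unsigned domain, uint32_t access)
{
    return (access & 0x3u) << (2 * (domain & 0xfu));
}

/* The same access value in all 16 fields. */
static inline uint32_t dacr_all(uint32_t access)
{
    uint32_t val = 0;

    for (unsigned d = 0; d < 16; d++)
        val |= dacr_field(d, access);
    return val;
}

/* Write the DACR (must run in a privileged mode on the ARM). */
static inline void set_DACR(uint32_t val)
{
#if defined(__arm__)
    __asm__ volatile ("mcr p15, 0, %0, c3, c0, 0" : : "r" (val));
#else
    (void) val;              /* not an ARM build, nothing to write */
#endif
}
```

With these, dacr_all(DACR_MANAGER) yields the 0xffffffff used above, and dacr_field(0, DACR_CLIENT) would be the value to use if you wanted the AP bits checked for domain 0 only.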

Translation table registers

There are 3 of them: two base registers and a control register.

MCR p15,0,<Rt>,c2,c0,2    ; Write CP15 Translation Table Base Control Register
MRC p15,0,<Rt>,c2,c0,0    ; Read CP15 Translation Table Base Register 0
MRC p15,0,<Rt>,c2,c0,1    ; Read CP15 Translation Table Base Register 1

TTBCR - translation table base control register. Above all else, this controls whether TTBR0 or TTBR1 supplies the translation table base.
It is zero after reset. All bits should be zero except the following:

Bit 5 - PD1 - disables translations when TTBR1 is selected.
Bit 4 - PD0 - disables translations when TTBR0 is selected.
Bits 2-0 (3 bits) N - determines width of base field in TTBR0

Setting N to 0 says to always use TTBR0 for the base address and to use bits 31-14 as the base. This is how I intend to do business. Setting a non-zero value activates logic that selects TTBR0 or TTBR1 depending on the virtual address. Note that when the PD bits are set, values already in the TLB will still be used. If the TLB is flushed and the PD bits are set, translations will yield a translation fault.
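The selection logic for non-zero N, as I read the ARMv7-A documentation, works like this: if the top N bits of the virtual address are all zero, TTBR0 is used, otherwise TTBR1. A non-zero N also moves the low end of the TTBR0 base field down from bit 14 to bit 14-N, since the TTBR0 table gets smaller. The sketch below just restates that in C; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 0 if TTBR0 would be used for va, 1 if TTBR1, given TTBCR.N. */
static inline int ttbr_select(uint32_t va, unsigned n)
{
    n &= 0x7u;                       /* N is a 3 bit field */
    if (n == 0)
        return 0;                    /* N = 0: TTBR0 handles everything */
    return (va >> (32 - n)) != 0;    /* any of the top N bits set: TTBR1 */
}

/* Mask for the base address field of TTBR0: bits 31 down to (14 - N). */
static inline uint32_t ttbr0_base_mask(unsigned n)
{
    n &= 0x7u;
    return ~((1u << (14 - n)) - 1u);
}
```

With N = 1, for example, the lower 2G of the address space would go through TTBR0 and the upper 2G through TTBR1.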

I see no reason not to just set the TTBCR register to 0.

TTBR1 - translation table base register 1. I don't intend to use this register (I will set TTBCR to zero), so I skip describing it.

TTBR0 - translation table base register 0.

Bits 31-14 Base address of the table
Bits 13-6 should be zero
Bit 5 NOS "not outer shareable"
Bits 4-3 RGN "region"
Bit 2 IMP (implementation defined)
Bit 1 S "shareable"
Bit 0 C "inner cacheable"
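Putting a TTBR0 value together from this layout (with N = 0) might look like the following. Again this is a sketch with made up names (make_ttbr0 and the TTBR_ macros), and it makes no claim about which attribute bits you ought to set.

```c
#include <stdint.h>

#define TTBR_C        (1u << 0)                      /* inner cacheable */
#define TTBR_S        (1u << 1)                      /* shareable */
#define TTBR_RGN(r)   (((uint32_t)(r) & 0x3u) << 3)  /* "region", bits 4-3 */
#define TTBR_NOS      (1u << 5)                      /* not outer shareable */

/*
 * Combine the table base and attribute bits into a TTBR0 value.
 * Returns 0 on success, -1 if the table is not on a 16K boundary.
 */
static inline int make_ttbr0(uint32_t table_base, uint32_t attrs,
                             uint32_t *out)
{
    if (table_base & 0x3fffu)
        return -1;                   /* bits 13-0 of the base must be zero */
    *out = table_base | (attrs & 0x3bu);   /* keep only bits 5, 4, 3, 1, 0 */
    return 0;
}
```

The resulting value would then be written with an MCR to c2, c0, 0, the same register the MRC shown earlier reads.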

This register has an alternate definition when "multiprocessing extensions" are enabled. Also, I have not plumbed the depths of regions, inner, and outer, for which, see the online pages:

Translation control registers

Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org