May 11, 2018

The ESP32 bootrom - case studies

For no particular reason, let's look at the ets_delay_us routine. Actually, there is a good reason: it is both short and instructive. Here is the disassembled code:
[tom@trona bootrom]$ ./espdis ets_delay_us

40008530:	3ffe01e0	; l32r

40008534:	364100        	entry	a1, 32
40008537:	a5b103        	call8	0x4000c050	; xthal_get_ccount
4000853a:	81fdff        	l32r	a8, 0x40008530	; g_ticks_per_us_pro ( 0x3ffe01e0 )
4000853d:	3d0a      	mov.n	a3, a10
4000853f:	8808      	l32i.n	a8, a8, 0
40008541:	802282        	mull	a2, a2, a8
40008544:	a5b003        	call8	0x4000c050	; xthal_get_ccount
40008547:	30aac0        	sub	a10, a10, a3
4000854a:	273af6        	bltu	a10, a2, 0x40008544
4000854d:	1df0      	retw.n

4000c050:	362100        	entry	a1, 16
4000c053:	20ea03        	rsr.ccount	a2
4000c056:	1df0      	retw.n
This calls a routine, "xthal_get_ccount", which is about as simple as a routine can get. It reads the ESP32 processor's "ccount" register into a2 and returns. The first thing to note is the "call8". It shifts the register window by 8, so once the routine returns, the caller sees the value in a10. It is also worth pointing out that the whole call could be replaced by the single inline instruction "rsr.ccount a10". The "rsr.ccount" instruction (not a routine) reads a cycle count from a special register in the ESP32. That counter increments continually at the processor clock rate, probably 240 MHz.
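The a2-becomes-a10 behavior can be checked with the small host-side C model below. It is not ESP32 code, and all names in it are hypothetical. It assumes a 64-entry physical register file and treats WindowBase as counting groups of four registers. It ignores WindowStart and the window overflow/underflow exceptions entirely. The model only shows why the callee's a2 and the caller's a10 are the same physical register after a call8.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of Xtensa register windowing; all names are hypothetical. */
#define NUM_AREGS 64                  /* physical registers (assumed size) */
#define NUM_UNITS (NUM_AREGS / 4)     /* WindowBase counts groups of 4 registers */

static uint32_t phys[NUM_AREGS];      /* the physical register file */
static unsigned window_base;          /* current WindowBase */

/* Return the physical slot behind logical register aN (a0..a15). */
static uint32_t *areg(unsigned n)
{
    assert(n < 16);
    return &phys[(window_base * 4 + n) % NUM_AREGS];
}

/* entry, after a call8, rotates the window forward by 2 units (8 registers). */
static void rotate_forward8(void) { window_base = (window_base + 2) % NUM_UNITS; }

/* retw from a call8 frame rotates it back by the same amount. */
static void rotate_back8(void) { window_base = (window_base + NUM_UNITS - 2) % NUM_UNITS; }

/* Model of "call8 xthal_get_ccount": the callee leaves its result in its a2. */
static void model_call8_get_ccount(uint32_t ccount_value)
{
    rotate_forward8();                /* entry a1, 16 */
    *areg(2) = ccount_value;          /* rsr.ccount a2 */
    rotate_back8();                   /* retw.n */
}
```

Here is how it behaves. After `model_call8_get_ccount(1234)`, the caller's `*areg(10)` reads 1234. The same holds when WindowBase is near the top of the file and the index wraps around.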

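What happens when the ccount register overflows? At 240 MHz, a 32-bit ccount register wraps around every 2^32 / 240,000,000 seconds, or about 17.9 seconds. At first glance the routine does nothing to guard against this, and a wrap looks like it could produce a delay of nearly 17.9 seconds. In fact the routine is safe. The elapsed count is computed with a 32-bit "sub" and tested with "bltu", an unsigned compare. Because that arithmetic is modulo 2^32, it gives the correct elapsed count even when ccount wraps between the two samples. The real limit is elsewhere: "mull" keeps only the low 32 bits of the product. A requested delay longer than about 17.9 seconds would therefore be silently truncated.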
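The wraparound arithmetic is easy to check on a host machine. The plain C sketch below uses hypothetical helper names and does three things:

* shows that a 32-bit unsigned subtraction gives the right elapsed count across a wrap;
* computes the wrap period at an assumed 240 MHz clock;
* computes the longest delay whose cycle count still fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two ccount samples; unsigned math is modulo 2^32, like sub. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Seconds between ccount wraps for a given clock rate in Hz. */
static double wrap_period_s(double clock_hz)
{
    assert(clock_hz > 0.0);
    return 4294967296.0 / clock_hz;   /* 2^32 cycles per wrap */
}

/* Largest microsecond delay whose product with ticks_per_us fits in 32 bits. */
static uint32_t max_delay_us(uint32_t ticks_per_us)
{
    assert(ticks_per_us > 0);
    return UINT32_MAX / ticks_per_us;
}
```

For example, a start sample of 0xFFFFFF00 and a later sample of 0x10 give 0x110 (272) elapsed cycles. At 240 ticks per microsecond, the longest representable delay is 17,895,697 us, just under 17.9 seconds.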

The variable at 0x3ffe01e0 (in RAM) apparently holds the number of processor ticks per microsecond. For a 240 MHz processor that is 240,000,000 / 1,000,000 = 240. We don't know how or where this value gets initialized, but some startup code must do it. The argument (the desired delay in microseconds) comes in in the a2 register. It gets multiplied by the ticks-per-microsecond value (240 here). The routine then loops, sampling the ccount register until the elapsed count reaches the desired delay. C pseudocode might look like this:

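    uint32_t a2, a3, a8, a10;   /* all registers are 32-bit unsigned */
    a10 = ccount;               /* call8 xthal_get_ccount: result arrives in a10 */
    a3 = a10;                   /* remember the starting count */
    a8 = ticks_per_us;          /* l32r + l32i.n: load g_ticks_per_us_pro */
    a2 = delay * a8;            /* mull: delay (passed in a2) in cycles, low 32 bits */
    do {
        a10 = ccount;
    } while (a10 - a3 < a2);    /* sub + bltu: unsigned 32-bit compare */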
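The same loop can be written as a runnable host-side C sketch. A PC has no ccount register, so `read_ccount()` is a hypothetical stand-in. It advances a simulated counter by a made-up number of cycles per call. `ticks_per_us` plays the role of g_ticks_per_us_pro. Starting the simulated counter just below 2^32 exercises the wraparound case.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sim_ccount;            /* simulated ccount (hypothetical) */
static uint32_t sim_step = 7;          /* cycles that pass between samples (made up) */
static uint32_t ticks_per_us = 240;    /* stands in for g_ticks_per_us_pro */

/* Stand-in for xthal_get_ccount / rsr.ccount; wraps modulo 2^32. */
static uint32_t read_ccount(void)
{
    sim_ccount += sim_step;
    return sim_ccount;
}

/* Model of ets_delay_us; returns the number of cycles actually waited. */
static uint32_t delay_us_model(uint32_t us)
{
    assert(sim_step > 0);                  /* otherwise the loop could never end */
    uint32_t start = read_ccount();        /* a10, copied to a3 */
    uint32_t target = us * ticks_per_us;   /* a2 = a2 * a8, low 32 bits like mull */
    uint32_t now;
    do {
        now = read_ccount();               /* a10 */
    } while (now - start < target);        /* sub a10, a10, a3 ; bltu a10, a2 */
    return now - start;
}
```

Set `sim_ccount = 0xFFFFFF00` and call `delay_us_model(10)`. The model waits 2401 cycles, the first multiple of 7 at or past 2400, even though the simulated counter wraps partway through.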

The entry instruction

I never saw this instruction in ESP8266 disassembly, because the ESP8266 never shifts the register window. Its core was built without the windowed-register option and has only 16 address registers, so its code uses the call0 convention throughout. The ESP32 has a larger physical register file. The Xtensa documentation says the actual size may be either 32 or 64 registers, and the ESP32 uses 64. There are lots of details in section 4.7.1 of the ISA manual (starting at page 180).

The entry instruction is what actually does the window shifting. The call8 instruction just records the shift amount in a field (CALLINC) of the processor PS register and makes the call. The entry instruction also sets up the callee's stack frame. It reads the stack pointer from the register specified (here a1) and subtracts the frame size (here 16 or 32). It places the result in the callee's a1, leaving the caller's a1 untouched. The frame size is given in bytes.
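The frame size can be read straight out of the instruction bytes in the listing. ENTRY uses the BRI12 layout, which this decoder assumes:

* the low byte is 0x36;
* bits 11..8 name the stack-pointer register;
* the 12-bit immediate in bits 23..12 gives the frame size in units of 8 bytes.

The small C decoder below (the name `decode_entry` is hypothetical) checks that against the two entry instructions shown, 364100 and 362100.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a 3-byte Xtensa ENTRY instruction given in memory (little-endian) order.
 * Returns false if the bytes are not an ENTRY. */
static bool decode_entry(const uint8_t b[3], unsigned *sp_reg, uint32_t *frame_bytes)
{
    assert(b && sp_reg && frame_bytes);
    uint32_t word = (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);

    if ((word & 0xffu) != 0x36u)        /* op0=0110, n=11, m=00 */
        return false;
    *sp_reg = (word >> 8) & 0xfu;       /* the stack-pointer operand, a1 here */
    *frame_bytes = (word >> 12) << 3;   /* imm12 is counted in 8-byte units */
    return true;
}
```

The bytes 36 41 00 decode to a1 with a 32-byte frame (imm12 = 4), and 36 21 00 decode to a1 with a 16-byte frame (imm12 = 2). This matches the disassembler's "entry a1, 32" and "entry a1, 16".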

The call8 instruction places the return address in a8, which becomes a0 in the routine being called (the "callee"). The plain call0 instruction would place the return address in a0 instead. The retw instruction performs a "windowed return" and can only be used in a routine that started with an entry instruction. A plain "ret" is the non-windowed counterpart, used with call0. The retw undoes the window shift set up by the call8 and then jumps to the return address. Undoing the shift also brings the caller's registers back into view, including its stack pointer in a1, which is how the caller's stack frame is "restored". This may trigger a window underflow exception if the caller's registers were spilled to memory and are no longer resident in the register file.
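The trick that lets retw know how far to rotate back is in the return address itself. call8 stores the window increment (2) in the top two bits of the return address it writes to a8. retw reads those bits from a0 and takes the rest of the return address from the low 30 bits. The C sketch below (all helper names hypothetical) reproduces this using the two call8 instructions in the listing. It computes each call8 target from the instruction bytes, then shows the return-address round trip for the first call, at 0x40008537.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 3-byte CALL8 at address pc, bytes given in memory order. */
static uint32_t call8_target(uint32_t pc, const uint8_t b[3])
{
    uint32_t word = (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
    assert((word & 0x3fu) == 0x25u);            /* op0=0101, n=10: CALL8 */
    int32_t offset = (int32_t)(word >> 6);      /* 18-bit word offset field */
    if (offset & 0x20000)                       /* sign-extend from bit 17 */
        offset -= 0x40000;
    return (pc & ~3u) + 4u * (uint32_t)offset + 4u;
}

/* Value call8 writes to a8 (the callee's a0): increment 2 in bits 31..30. */
static uint32_t call8_return_reg(uint32_t pc)
{
    return (2u << 30) | ((pc + 3u) & 0x3fffffffu);
}

/* Window increment that retw will undo, read back out of a0. */
static unsigned retw_increment(uint32_t a0)
{
    return a0 >> 30;
}

/* Where retw goes: top two bits from the current PC, low 30 bits from a0. */
static uint32_t retw_target(uint32_t pc, uint32_t a0)
{
    return (pc & 0xc0000000u) | (a0 & 0x3fffffffu);
}
```

For the first call8 (bytes a5 b1 03 at 0x40008537), the target comes out as 0x4000c050, matching the disassembly. The value written to a8 is 0x8000853a. When xthal_get_ccount executes retw.n at 0x4000c056, it returns to 0x4000853a and undoes a window increment of 2.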

Feedback? Questions? Drop me a line!
