[tom@trona bootrom]$ ./espdis ets_delay_us
40008530:  3ffe01e0     ; l32r
ets_delay_us:
40008534:  364100       entry   a1, 32
40008537:  a5b103       call8   0x4000c050      ; xthal_get_ccount
4000853a:  81fdff       l32r    a8, 0x40008530  ; g_ticks_per_us_pro ( 0x3ffe01e0 )
4000853d:  3d0a         mov.n   a3, a10
4000853f:  8808         l32i.n  a8, a8, 0
40008541:  802282       mull    a2, a2, a8
40008544:  a5b003       call8   0x4000c050      ; xthal_get_ccount
40008547:  30aac0       sub     a10, a10, a3
4000854a:  273af6       bltu    a10, a2, 0x40008544
4000854d:  1df0         retw.n
xthal_get_ccount:
4000c050:  362100       entry   a1, 16
4000c053:  20ea03       rsr.ccount      a2
4000c056:  1df0         retw.n

This calls a routine "xthal_get_ccount", which is about as simple as a routine can get. It reads the esp32 processor "ccount" register into a2 and returns. The first thing to note is that because of the "call8", the register window shifts by 8, so that once this returns, the caller sees the value in a10. It is also worth a comment that this could simply be done inline via the instruction "rsr.ccount a10".

Note that the "rsr.ccount" instruction reads a cycle count value from a special register in the esp32. This register counts continually at the processor clock rate (probably 240 MHz).
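What happens when the ccount register overflows? At 240 MHz, with a 32 bit ccount register, we will see overflow every 17.9 seconds. At first glance the routine does not guard against that, but it does not need to. The "sub" and "bltu" instructions work on unsigned 32 bit values, so the difference between the current count and the starting count comes out right (modulo 2^32) even when ccount wraps between samples. The real limit is the "mull", which keeps only the low 32 bits of the product. At 240 ticks per microsecond, a requested delay longer than about 17.9 seconds overflows the multiplication, and the routine then returns much sooner than requested.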
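The arithmetic around a ccount wrap can be checked on any machine with a few lines of C. This is only a sketch: the function names are made up for illustration and are not boot ROM routines, and the 240 ticks per microsecond figure is the assumed value discussed here.

```c
#include <stdint.h>

/* Ticks elapsed between two ccount samples. Unsigned subtraction wraps
   modulo 2^32, which mirrors "sub a10, a10, a3" in the disassembly. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Largest delay (in microseconds) whose tick count still fits in 32 bits,
   i.e. the largest argument that the 32 bit "mull" does not truncate. */
static uint32_t max_delay_us(uint32_t ticks_per_us)
{
    return UINT32_MAX / ticks_per_us;
}
```

With a start sample of 0xFFFFFF00 and a later sample of 0x00000100 (the counter wrapped in between), elapsed_ticks gives 0x200, and max_delay_us(240) gives 17895697, which is the 17.9 second figure expressed in microseconds.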
The variable at 0x3ffe01e0 (in ram) apparently holds the number of processor ticks per microsecond, i.e. 240,000,000 / 1,000,000 = 240 for a 240 MHz processor. We don't know how or where this value gets initialized, but some startup code must do this. The argument (the desired delay in microseconds) comes in in the a2 register. It gets multiplied by the ticks per microsecond value (240), then the routine loops, sampling the ccount register until the elapsed count reaches the desired delay. C pseudocode might look like this:
    a10 = ccount
    a3 = a10
    a8 = ticks_per_us
    a2 = delay * a8
    do {
        a10 = ccount
    } while ( a10 - a3 < a2 )
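The same loop can be written as real C and run on a desktop machine by replacing the hardware counter with a fake one. Everything here is a stand-in for illustration: fake_ccount, fake_step, get_ccount and delay_us_model are hypothetical names, and ticks_per_us is simply assumed to be 240.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware: a fake cycle counter that
   advances by a fixed step every time it is read. */
static uint32_t fake_ccount;
static uint32_t fake_step = 7;
static uint32_t ticks_per_us = 240;   /* assumed value of g_ticks_per_us_pro */

static uint32_t get_ccount(void)
{
    fake_ccount += fake_step;
    return fake_ccount;
}

/* Same structure as the disassembled loop. Returns the number of ticks
   actually waited so the result can be inspected. */
static uint32_t delay_us_model(uint32_t us)
{
    uint32_t start = get_ccount();        /* a3 = ccount */
    uint32_t ticks = us * ticks_per_us;   /* a2 = delay * a8 (low 32 bits) */
    uint32_t elapsed;

    do {
        elapsed = get_ccount() - start;   /* sub a10, a10, a3 */
    } while (elapsed < ticks);            /* bltu a10, a2, loop */

    return elapsed;
}
```

Setting fake_ccount close to 0xFFFFFFFF before calling delay_us_model is an easy way to watch the loop run across a counter wrap.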
The entry instruction actually does the window shifting. The call8 instruction just sets the shift amount in a field in the processor PS register and makes the call. The entry instruction also sets up the stack frame. It takes the stack pointer from the register specified (here a1), subtracts the frame size indicated (here 16 or 32), and places the result in the same register of the new window, so the caller's stack pointer is left untouched in the caller's a1. The frame sizes are in bytes.
The call8 instruction places the return address in a8, which becomes a0 in the routine called (the "callee"). The plain old call0 instruction would place the return address into a0. The retw instruction performs a "windowed return" and can only be used in a routine that started off with an entry instruction. A plain "ret" is available for non-windowed use. The retw both restores the stack frame and undoes the window shift done by the call8. This may trigger a window underflow exception if the restored register window is not resident in the register file.
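The window shift can be modeled in a few lines of C, which shows why the callee's a2 is the caller's a10. This is a toy model only: it assumes a 64 entry physical register file with a window base that moves in units of 4 registers, it ignores overflow and underflow exceptions, and all of the names are made up for illustration.

```c
#include <stdint.h>

#define NUM_PHYS 64                 /* assumed physical register file size */
#define NUM_BASES (NUM_PHYS / 4)    /* window base moves in units of 4 registers */

static uint32_t phys[NUM_PHYS];
static unsigned window_base;

/* Pointer to the register that the current window calls "aN". */
static uint32_t *areg(unsigned n)
{
    return &phys[(window_base * 4 + n) % NUM_PHYS];
}

/* What "entry" does after a call8: rotate the window by 8 registers. */
static void rotate_for_call8(void)
{
    window_base = (window_base + 2) % NUM_BASES;
}

/* What "retw" does when returning from a call8: rotate back. */
static void rotate_back_call8(void)
{
    window_base = (window_base + NUM_BASES - 2) % NUM_BASES;
}

/* Model of xthal_get_ccount reached via call8. */
static void model_get_ccount(uint32_t ccount)
{
    rotate_for_call8();     /* entry a1, 16 */
    *areg(2) = ccount;      /* rsr.ccount a2 */
    rotate_back_call8();    /* retw.n */
}
```

After model_get_ccount(1234) returns, *areg(10) in the caller's window reads 1234, which matches the "mov.n a3, a10" that follows the first call8 in ets_delay_us.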
Tom's Computer Info / tom@mmto.org