They were used all the time to enter and leave subroutines, as well as to save and restore registers during exception handling. Typical uses looked like this:
stmia r1!, {r3, r4, r5, r6, r7, r8, r9, sl}
ldm r7!, {r0, r1, r2, r3}
These instructions are gone in aarch64. What happened?
Well, the ARM designers came to their senses, that is what happened. The real question is how CISC-like instructions such as ldm/stm ever found their way into the ARM architecture in the first place.
What we have now is ldp/stp, which loads (or stores) a pair of registers. This may not seem so handy, but it plays much better with RISC chip design, pipelines, and all of that. Consider that many ARM chips these days have multiple execution units and can perform 3 or 4 operations in parallel. It is quite possible that this instruction can run in one cycle, storing (or loading) both registers at the same time. Remember that these are 64 bit registers, so moving a pair at once needs a 128 bit wide data path to memory -- and this is quite likely to be available with modern cache designs.
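To make the pair semantics concrete, here is a small Python model of the two addressing forms that show up in the exception handler further down: pre-indexed stp (adjust the base, then store) and post-indexed ldp (load, then adjust the base). This is only a sketch. The Machine class and its method names are made up for illustration, memory is assumed to be little-endian, and the model says nothing about timing.

```python
import struct


class Machine:
    """Toy model: 31 general registers, a stack pointer, and a small memory."""

    def __init__(self, mem_size=256):
        self.mem = bytearray(mem_size)
        self.x = [0] * 31          # x0 .. x30
        self.sp = mem_size         # stack starts at the top and grows down

    def stp_pre(self, rt, rt2, imm):
        """stp xRt, xRt2, [sp, #imm]!  (add imm to sp first, then store the pair)"""
        self.sp += imm
        # xRt goes to the lower address, xRt2 to the 8 bytes after it.
        struct.pack_into("<QQ", self.mem, self.sp, self.x[rt], self.x[rt2])

    def ldp_post(self, rt, rt2, imm):
        """ldp xRt, xRt2, [sp], #imm  (load the pair from sp, then add imm to sp)"""
        self.x[rt], self.x[rt2] = struct.unpack_from("<QQ", self.mem, self.sp)
        self.sp += imm


m = Machine()
m.x[0:4] = [0x11, 0x22, 0x33, 0x44]
top = m.sp

m.stp_pre(0, 1, -16)      # stp x0, x1, [sp, #-16]!
m.stp_pre(2, 3, -16)      # stp x2, x3, [sp, #-16]!
pushed_sp = m.sp

m.x[0:4] = [0, 0, 0, 0]   # pretend the called code trashed x0..x3

m.ldp_post(2, 3, 16)      # ldp x2, x3, [sp], #16
m.ldp_post(0, 1, 16)      # ldp x0, x1, [sp], #16

restored = m.x[0:4]
```

The restores run in the opposite order from the saves, and each 16 byte step keeps the stack pointer 16 byte aligned, which is what aarch64 expects when sp is used as a base.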
Here is an example of these instructions being used in an exception handler. Registers x0 to x18 and x30 get saved. It is not necessary to save x19 to x29. I hear the screams already. You need to save ALL of the registers! Well, you do, but it is not the responsibility of this code to do so. The ARM procedure call standard (the ABI that compilers follow) says that x19 to x29 are "callee saved" registers. So, it is the responsibility of the subroutine being called to save any of those registers it intends to use. Often this will be none at all and nothing needs to be done -- which is certainly an optimization over blindly saving them all.
ffff0a80: a9bf07e0 stp x0, x1, [sp, #-16]!
ffff0a84: a9bf0fe2 stp x2, x3, [sp, #-16]!
ffff0a88: a9bf17e4 stp x4, x5, [sp, #-16]!
ffff0a8c: a9bf1fe6 stp x6, x7, [sp, #-16]!
ffff0a90: a9bf27e8 stp x8, x9, [sp, #-16]!
ffff0a94: a9bf2fea stp x10, x11, [sp, #-16]!
ffff0a98: a9bf37ec stp x12, x13, [sp, #-16]!
ffff0a9c: a9bf3fee stp x14, x15, [sp, #-16]!
ffff0aa0: a9bf47f0 stp x16, x17, [sp, #-16]!
ffff0aa4: a9bf7bf2 stp x18, x30, [sp, #-16]!
ffff0aa8: 94000a63 bl 0xffff3434
ffff0aac: a8c17bf2 ldp x18, x30, [sp], #16
ffff0ab0: a8c147f0 ldp x16, x17, [sp], #16
ffff0ab4: a8c13fee ldp x14, x15, [sp], #16
ffff0ab8: a8c137ec ldp x12, x13, [sp], #16
ffff0abc: a8c12fea ldp x10, x11, [sp], #16
ffff0ac0: a8c127e8 ldp x8, x9, [sp], #16
ffff0ac4: a8c11fe6 ldp x6, x7, [sp], #16
ffff0ac8: a8c117e4 ldp x4, x5, [sp], #16
ffff0acc: a8c10fe2 ldp x2, x3, [sp], #16
ffff0ad0: a8c107e0 ldp x0, x1, [sp], #16
ffff0ad4: d69f03e0 eret
I told you aarch64 is quite different from aarch32, and this is a great example.
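The hex words in the listing can be checked by hand. As a rough sketch, the Python below picks apart the 64 bit integer ldp/stp encodings that appear there, handling only the pre-indexed and post-indexed forms. The names reg and decode_pair are made up for this example, and anything beyond those two forms is rejected rather than decoded.

```python
def reg(n, is_base=False):
    # Register number 31 means sp when it is the base register;
    # in the data register slots it is the zero register.
    if n == 31:
        return "sp" if is_base else "xzr"
    return f"x{n}"


def decode_pair(word):
    """Decode a 64 bit ldp/stp instruction word with pre- or post-index writeback."""
    # Bits 31:26 are 101010 for the 64 bit integer (non-SIMD) pair instructions.
    if (word >> 26) & 0x3F != 0b101010:
        raise ValueError("not a 64 bit integer ldp/stp")
    mode = (word >> 23) & 0x7      # 001 = post-index, 011 = pre-index
    load = (word >> 22) & 0x1      # 1 = ldp, 0 = stp
    imm7 = (word >> 15) & 0x7F
    if imm7 & 0x40:
        imm7 -= 0x80               # sign-extend the 7 bit field
    offset = imm7 * 8              # the immediate is scaled by the register size
    rt2 = (word >> 10) & 0x1F
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F

    name = "ldp" if load else "stp"
    regs = f"{name} {reg(rt)}, {reg(rt2)}"
    base = reg(rn, is_base=True)
    if mode == 0b011:
        return f"{regs}, [{base}, #{offset}]!"
    if mode == 0b001:
        return f"{regs}, [{base}], #{offset}"
    raise ValueError("only pre- and post-indexed forms are handled")


first_save = decode_pair(0xA9BF07E0)     # stp x0, x1, [sp, #-16]!
first_restore = decode_pair(0xA8C17BF2)  # ldp x18, x30, [sp], #16
```

Note that the offset is stored as a small signed count of 8 byte units, so -16 is encoded as -2 and +16 as 2.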
Tom's Computer Info / tom@mmto.org