ADC Rd, Rd, Rd
Add the S modifier to have the carry flag updated.
SBC Rd, Rd, Rd
[Rn, -Rn]
addressing mode.
This can provide a useful bootstrap in implementing certain low-level operating system routines (e.g. breakpoint handler), for example:

    STR   R0, [R0, -R0]   ; preserve R0 at address 0 so it is free to use
    MOV   R0, #0x100      ; or any convenient register save block
    STMIB R0, {R1-R14}    ; save R1-R14
    LDR   R1, [R0, -R0]   ; original R0
    STR   R1, [R0]        ; save it too
    ; now all registers R0-R14 free to use
    ; ...
    MOV   R0, #0x100      ; register save block
    LDMIA R0, {R0-R14}    ; restore all registers

I first saw this technique used in a disassembly of the Arthur operating system.
0xFC000003 and 0x03FFFFFC are both acceptable).
For other cases, a PC-relative load from a literal pool is typically used:

    LDR R0, const      ; assembles to: LDR R0, [PC, #offset]
    ...
    .const EQUD 0x55555555

This takes 2 words of storage and typically 3 cycles to execute (one of which may be delayed if it misses the cache), so a sequence of 2 data processing instructions is better where possible. A sequence of 3 data processing instructions will often be preferable too, since this trades instruction cache for data cache. Of course, any constant can be constructed with 4 data processing instructions (e.g. a MOV, ORR, ORR, ORR sequence), and the majority of “random” constants do require the full 4.
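
To experiment with which constants need a literal pool, here is a rough Python sketch. It models the immediate operand of a data processing instruction as an 8-bit value rotated right by an even amount. The function names are my own, and `mov_orr_length` only considers a MOV followed by ORRs, so a sequence using MVN, ADD, SUB or EOR may be shorter than it reports.

```python
MASK = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def is_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    return any(rol32(value & MASK, rot) <= 0xFF for rot in range(0, 32, 2))

def fits_mov_or_mvn(value):
    """True if a single MOV or MVN can load the constant."""
    return is_immediate(value) or is_immediate(~value)

def mov_orr_length(value):
    """Fewest instructions in a MOV, ORR, ORR, ... sequence (hypothetical helper)."""
    best = 4
    for rot in range(0, 32, 2):               # try every even starting alignment
        v, count = rol32(value & MASK, rot), 0
        while v:
            low = (v & -v).bit_length() - 1   # index of the lowest set bit
            low &= ~1                         # chunks must start on an even bit
            v &= ~(0xFF << low)               # one instruction covers 8 bits
            count += 1
        best = min(best, max(count, 1))       # loading zero still costs one MOV
    return best
```

With this model, 0xFC000003 is a valid immediate, 0x03FFFFFC is reachable through MVN, and 0x55555555 needs the full 4 instructions.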
Multiplier | MUL Rd, Rd, #imm | MUL Rd, Rn, #imm | MLA Rd, Rd, #imm, Rm | MLA Rd, Rn, #imm, Rm |
---|---|---|---|---|
2 |
MOV Rd, Rd LSL #1
|
ADD Rd, Rm, Rd LSL #1
|
||
3 |
ADD Rd, Rd, Rd LSL #1
|
ADD Rd, Rd, Rd LSL #1 |
||
4 |
MOV Rd, Rd LSL #2
|
ADD Rd, Rm, Rd LSL #2
|
||
5 |
ADD Rd, Rd, Rd LSL #2
|
ADD Rd, Rd, Rd LSL #2 |
||
6 |
ADD Rd, Rd, Rd LSL #1 |
ADD Rd, Rd, Rd LSL #1 |
||
7 |
RSB Rd, Rd, Rd LSL #3
|
RSB Rd, Rd, Rd LSL #3 |
||
8 |
MOV Rd, Rd LSL #3
|
ADD Rd, Rm, Rd LSL #3
|
||
9 |
ADD Rd, Rd, Rd LSL #3
|
ADD Rd, Rd, Rd LSL #3 |
||
10 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rd, Rd LSL #2 |
||
11 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rn, Rn LSL #2 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rn, Rn LSL #2 |
12 |
ADD Rd, Rd, Rd LSL #1 |
ADD Rd, Rd, Rd LSL #1 |
||
13 |
ADD Rd, Rd, Rd LSL #1 |
ADD Rd, Rn, Rn LSL #1 |
ADD Rd, Rd, Rd LSL #1 |
ADD Rd, Rn, Rn LSL #1 |
14 |
RSB Rd, Rd, Rd LSL #3 |
RSB Rd, Rd, Rd LSL #3 |
||
15 |
RSB Rd, Rd, Rd LSL #4
|
RSB Rd, Rd, Rd LSL #4 |
||
16 |
MOV Rd, Rd LSL #4
|
ADD Rd, Rm, Rd LSL #4
|
||
17 |
ADD Rd, Rd, Rd LSL #4
|
ADD Rd, Rd, Rd LSL #4 |
||
18 |
ADD Rd, Rd, Rd LSL #3 |
ADD Rd, Rd, Rd LSL #3 |
||
19 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rn, Rn LSL #3 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rn, Rn LSL #3 |
20 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rd, Rd LSL #2 |
||
100 |
ADD Rd, Rd, Rd LSL #2 |
ADD Rd, Rd, Rd LSL #2 |
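
The single-instruction entries in the table above can be sanity-checked with a small Python model of the barrel shifter. This is only a sketch: the helper names are mine, and it covers just the multipliers whose table entry is a single ADD or RSB (3, 5, 7, 9, 15 and 17).

```python
MASK = 0xFFFFFFFF

def add_lsl(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm LSL #shift."""
    return (rn + (rm << shift)) & MASK

def rsb_lsl(rn, rm, shift):
    """Model of RSB Rd, Rn, Rm LSL #shift, i.e. (Rm << shift) - Rn."""
    return ((rm << shift) - rn) & MASK

# One instruction per multiplier, with Rd used as both operands.
SINGLE_INSTRUCTION = {
    3:  lambda rd: add_lsl(rd, rd, 1),   # ADD Rd, Rd, Rd LSL #1
    5:  lambda rd: add_lsl(rd, rd, 2),   # ADD Rd, Rd, Rd LSL #2
    7:  lambda rd: rsb_lsl(rd, rd, 3),   # RSB Rd, Rd, Rd LSL #3
    9:  lambda rd: add_lsl(rd, rd, 3),   # ADD Rd, Rd, Rd LSL #3
    15: lambda rd: rsb_lsl(rd, rd, 4),   # RSB Rd, Rd, Rd LSL #4
    17: lambda rd: add_lsl(rd, rd, 4),   # ADD Rd, Rd, Rd LSL #4
}

def check(multiplier, samples=(0, 1, 2, 12345, 0x7FFFFFFF, 0xFFFFFFFF)):
    """Compare the modelled instruction with a plain 32-bit multiply."""
    op = SINGLE_INSTRUCTION[multiplier]
    return all(op(x) == (x * multiplier) & MASK for x in samples)
```

Calling `check(m)` for each key of `SINGLE_INSTRUCTION` compares the shifted form with an ordinary multiply modulo 2^32.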
Divisor | Unsigned, floor | Unsigned, ceil | Unsigned, nearest | Signed, floor | Signed, ceil | Signed, trunc | Signed, nearest |
---|---|---|---|---|---|---|---|
2 |
MOV Rd, Rd LSR #1
|
SUB Rd, Rd, Rd LSR #1
ADDS Rd, Rd, #1
MOVS Rd, Rd LSR #1
|
; tiebreak: even
; tiebreak: odd
; Ru = 1<<31
|
MOV Rd, Rd ASR #1
|
SUB Rd, Rd, Rd ASR #1
MOVS Rd, Rd ASR #1
|
MOVS Rd, Rd ASR #1
; Ru = 1<<30, Rv = 0
; Ru = 1<<30, Rv = -1
|
|
3 |
; Ru = 0x55555555
; Ru = 0xAAAAAAAB
|
... |
... |
... |
... |
... |
... |
4 |
MOV Rd, Rd LSR #2
|
SUB Rd, Rd, Rd LSR #1
; Ru = 1<<30
; Ru = 0
; Ru = 1
; Ru = -1
; Ru = -1<<30, Rv = 0
ADDS Rd, Rd, #3
ADDS Rd, Rd, #3
|
; tiebreak: floor
; tiebreak: ceil
; tiebreak: even
; tiebreak: odd
|
MOV Rd, Rd ASR #2
|
... |
... |
... |
5 | ... |
... |
... |
... |
... |
... |
... |
Divisor | Unsigned, floor | Unsigned, ceil | Unsigned, nearest | Signed, trunc | Signed, nearest |
---|---|---|---|---|---|
2 |
AND Rd, Rd, #1
|
MOV Rd, Rd LSL #31
AND Rd, Rd, #1
|
; tiebreak: even
; tiebreak: odd
|
... |
... |
3 |
; Ru = 3
; Ru = 3
|
... |
... |
... |
... |
4 |
AND Rd, Rd, #3
|
ANDS Rd, Rd, #3 |
; tiebreak: floor
; tiebreak: ceil
; tiebreak: even
; tiebreak: odd
|
... |
... |
5 | ... |
... |
... |
... |
... |
Divisor | Unsigned, floor | Unsigned, ceil | Unsigned, nearest | Signed, floor | Signed, ceil | Signed, trunc | Signed, nearest |
---|---|---|---|---|---|---|---|
2 | ... |
... |
... |
... |
... |
... |
... |
3 | ... |
... |
... |
... |
... |
... |
... |
4 | ... |
... |
... |
... |
... |
... |
... |
5 | ... |
... |
... |
... |
... |
... |
... |
Divisor | Inverse | Minimal in-place code | Minimal code using temporaries |
---|---|---|---|
3 | 0xAAAAAAAB |
ADD Rd, Rd, Rd LSL #2 |
|
-3 | 0x55555555 |
ADD Rd, Rd, Rd LSL #2 |
|
5 | 0xCCCCCCCD |
SUB Rd, Rd, Rd LSL #2 |
|
-5 | 0x33333333 |
ADD Rd, Rd, Rd LSL #1 |
|
7 | 0xB6DB6DB7 |
ADD Rd, Rd, Rd LSL #3 |
|
-7 | 0x49249249 |
ADD Rd, Rd, Rd LSL #3 |
|
9 | 0x38E38E39 |
SUB Rd, Rd, Rd LSL #3 |
|
-9 | 0xC71C71C7 |
RSB Rd, Rd, Rd LSL #3 |
|
11 | 0xBA2E8BA3 |
ADD Rd, Rd, Rd LSL #1 |
|
-11 | 0x45D1745D |
ADD Rd, Rd, Rd LSL #1 |
|
13 | 0xC4EC4EC5 |
ADD Rd, Rd, Rd LSL #2 |
|
-13 | 0x3B13B13B |
ADD Rd, Rd, Rd LSL #2 |
|
15 | 0xEEEEEEEF |
ADD Rd, Rd, Rd LSL #4 |
|
-15 | 0x11111111 |
ADD Rd, Rd, Rd LSL #4 |
|
17 | 0xF0F0F0F1 |
SUB Rd, Rd, Rd LSL #4 |
|
-17 | 0x0F0F0F0F |
RSB Rd, Rd, Rd LSL #4 |
|
19 | 0x286BCA1B |
ADD Rd, Rd, Rd LSL #1 |
|
-19 | 0xD79435E5 |
ADD Rd, Rd, Rd LSL #1 |
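
The Inverse column holds multiplicative inverses modulo 2^32: multiplying an exact multiple of the divisor by the inverse gives the quotient. A minimal Python sketch for checking or extending the column follows. The function names are mine, and the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
MOD = 1 << 32

def inverse(divisor):
    """Multiplicative inverse of an odd divisor modulo 2**32."""
    return pow(divisor % MOD, -1, MOD)

def exact_divide(value, divisor):
    """Divide a known multiple of an odd divisor by multiplying by its inverse."""
    return (value * inverse(divisor)) % MOD

print(hex(inverse(3)))                # 0xaaaaaaab
print(hex(inverse(-3)))               # 0x55555555
print(exact_divide(3 * 1234567, 3))   # 1234567
```

This only works for odd divisors, since even numbers have no inverse modulo a power of two.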
A CMP instruction has 7 possible outcomes in terms of flag configuration (the constraints are V ⇒ N⊕C, Z ⇒ N̅CV̅). A CMN instruction has 9 possible outcomes (the constraints are V ⇒ N⊕C, Z ⇒ N̅(C+V̅)): observe that CMN #0, #0 and CMN #1<<31, #1<<31 are special cases, leading to the 2 extra possibilities. If one of the arguments is fixed, there are in general 5 possible outcomes for either instruction (although sometimes as few as 3 for special values of the fixed argument). Thus multi-way switches can be built, for example:

    CMP R0, #0x80000001
    ; R0 = 0x7FFFFFFF => VS
    ; R0 = 0x80000000 => LT
    ; R0 = 0x80000001 => EQ
    ; R0 = 0x80000002 => HI

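
The outcome counts can be checked by brute force. The Python sketch below models the N, Z, C, V flags for CMP and CMN and enumerates all operand pairs on 8-bit registers, on the assumption that the set of reachable flag combinations does not depend on the register width. The function names and the `condition` helper are mine; `condition` tests the conditions in the order the example lists them.

```python
def cmp_flags(a, b, bits=32):
    """Flags (N, Z, C, V) after CMP a, b, which computes a - b."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    a, b = a & mask, b & mask
    result = (a - b) & mask
    carry = a >= b                                   # C set means no borrow
    overflow = bool((a ^ b) & (a ^ result) & sign)   # signs differ, result flips
    return bool(result & sign), result == 0, carry, overflow

def cmn_flags(a, b, bits=32):
    """Flags (N, Z, C, V) after CMN a, b, which computes a + b."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    a, b = a & mask, b & mask
    total = a + b
    result = total & mask
    overflow = bool((a ^ result) & (b ^ result) & sign)
    return bool(result & sign), result == 0, total > mask, overflow

def condition(flags):
    """First matching condition, tested in the order VS, LT, EQ, HI."""
    n, z, c, v = flags
    if v:
        return "VS"
    if n != v:
        return "LT"
    if z:
        return "EQ"
    if c and not z:
        return "HI"
    return "none of these"

def count_outcomes(flag_fn, bits=8):
    """Number of distinct flag configurations over all operand pairs."""
    values = range(1 << bits)
    return len({flag_fn(a, b, bits) for a in values for b in values})
```

`count_outcomes(cmp_flags)` and `count_outcomes(cmn_flags)` give the 7 and 9 quoted above, and `condition(cmp_flags(r0, 0x80000001))` reproduces the four cases in the example.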
CMN or CMP with appropriate shift and constant.
EOR R0, R0, R0 LSL #1):

    EOR R0, R0, R0 LSL #16
    EOR R0, R0, R0 LSL #8
    EOR R0, R0, R0 LSL #4
    EOR R0, R0, R0 LSL #2
    EOR R0, R0, R0 LSL #1

The above operations in fact commute, so they can be reordered freely.
EOR R0, R0, R0 LSR #1):

    EOR R0, R0, R0 LSR #16
    EOR R0, R0, R0 LSR #8
    EOR R0, R0, R0 LSR #4
    EOR R0, R0, R0 LSR #2
    EOR R0, R0, R0 LSR #1

The operations EOR R0, R0, R0 ASR #1 and EOR R0, R0, R0 ROR #1 are 2-to-1, with inverse images also given by the above procedure and its (one’s) complement.
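
These inverses are easy to check in a Python model of the 32-bit registers. This is a sketch with function names of my own; the `order` parameter is there only to demonstrate that the five steps commute.

```python
MASK = 0xFFFFFFFF

def eor_lsl1(x):
    """EOR R0, R0, R0 LSL #1"""
    return (x ^ (x << 1)) & MASK

def undo_eor_lsl1(y, order=(16, 8, 4, 2, 1)):
    """The five-instruction LSL sequence, in any order."""
    for shift in order:
        y = (y ^ (y << shift)) & MASK
    return y

def eor_lsr1(x):
    """EOR R0, R0, R0 LSR #1 (binary to Gray code)"""
    return x ^ (x >> 1)

def undo_eor_lsr1(y, order=(16, 8, 4, 2, 1)):
    """The five-instruction LSR sequence, in any order."""
    for shift in order:
        y ^= y >> shift
    return y

def eor_asr1(x):
    """EOR R0, R0, R0 ASR #1"""
    return x ^ ((x >> 1) | (x & 0x80000000))

def eor_ror1(x):
    """EOR R0, R0, R0 ROR #1"""
    return x ^ (((x >> 1) | (x << 31)) & MASK)

def preimages(y):
    """Both candidates for an ASR/ROR result: the LSR procedure and its complement."""
    x = undo_eor_lsr1(y)
    return x, x ^ MASK
```

For any 32-bit `x`, `undo_eor_lsl1(eor_lsl1(x))` and `undo_eor_lsr1(eor_lsr1(x))` return `x`, and `x` appears in `preimages(eor_asr1(x))` and `preimages(eor_ror1(x))`.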

    EOR  R0, R0, R0 ROR #16
    EOR  R0, R0, R0 ROR #8
    EOR  R0, R0, R0 ROR #4
    EOR  R0, R0, R0 ROR #2
    EORS R0, R0, R0 ROR #1
    ; now R0 = 0 (even parity) or -1 (odd parity)
    ; also N = Z̅ = parity

Variants of the final instruction can be used to get parity in the C or V flags.
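
To see why the result is all zeros or all ones, note that each rotate-and-EOR step doubles the number of original bits folded into every bit position, so after five steps each bit is the XOR of all 32. A small Python model (a sketch, names mine):

```python
MASK = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def parity_word(x):
    """Model of the ROR sequence: 0 for even parity, 0xFFFFFFFF (-1) for odd."""
    for n in (16, 8, 4, 2, 1):
        x ^= ror32(x, n)
    return x

print(hex(parity_word(0x00000001)))   # 0xffffffff
print(hex(parity_word(0x00000003)))   # 0x0
```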

    ; Ru = 0x34CB34CB
    EOR  R0, R0, R0 LSR #16
    EOR  R0, R0, R0 LSR #8
    EOR  R0, R0, R0 LSR #4
    MOVS R0, Ru ROR R0    ; msb(R0) = N = parity (also C = parity, if C clear initially)

Alternatively, using a lookup table:

    ; R1 = pointer to 256-byte lookup table (1 byte per value)
    EOR  R0, R0, R0 LSL #16
    EOR  R0, R0, R0 LSL #8
    LDRB R0, [R1, R0 LSR #24]


    ; R1 = pointer to 32-byte lookup table (1 bit per value)
    EOR  R0, R0, R0 ROR #16
    EOR  R0, R0, R0 ROR #8
    LDR  R2, [R1, R0 LSR #27]    ; requires unaligned loads to be aligned (not rotated)
    MOVS R0, R2 ROR R0           ; msb(R0) = N = parity (also C = parity, if C clear initially)

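
The magic constant 0x34CB34CB works as a 16-entry parity table stored one bit per entry and repeated twice, so the junk left in bit 4 of the rotate amount does not matter. Here is a Python sketch of that variant and of the 256-byte table variant. The names are mine, and the contents of `PARITY_TABLE` are my assumption about what the byte table would hold. The register-specified rotate is modelled as using the bottom byte of R0, reduced modulo 32.

```python
MASK = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def parity_magic(x):
    """Fold to 4 bits, then look the parity up in the rotated constant."""
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    rotated = ror32(0x34CB34CB, x & 0xFF)   # MOVS R0, Ru ROR R0
    return rotated >> 31                    # msb = parity

# Assumed table contents: parity of each byte value, one byte per entry.
PARITY_TABLE = bytes(bin(i).count("1") & 1 for i in range(256))

def parity_byte_table(x):
    """Fold all four bytes into the top byte, then index the table with it."""
    x = (x ^ (x << 16)) & MASK
    x = (x ^ (x << 8)) & MASK
    return PARITY_TABLE[x >> 24]            # LDRB R0, [R1, R0 LSR #24]
```

Both functions return 1 for odd parity and 0 for even parity.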

    ; Ru = 0x55555555, Rv = 0x33333333, Rw = 0x0F0F0F0F
    BIC R1, R0, Ru
    SUB R0, R0, R1 LSR #1
    AND R1, Rv, R0 LSR #2
    AND R0, R0, Rv
    ADD R0, R0, R1
    ADD R0, R0, R0 LSR #4
    AND R0, R0, Rw
    ADD R0, R0, R0 LSR #8
    ADD R0, R0, R0 LSR #16
    AND R0, R0, #0xFF

Or with a multiply (likely slower):

    ; Ru = 0x55555555, Rv = 0x33333333, Rw = 0x0F0F0F0F, Rx = 0x01010101
    BIC R1, R0, Ru
    SUB R0, R0, R1 LSR #1
    AND R1, Rv, R0 LSR #2
    AND R0, R0, Rv
    ADD R0, R0, R1
    ADD R0, R0, R0 LSR #4
    AND R0, R0, Rw
    MUL R0, Rx, R0
    MOV R0, R0 LSR #24

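
Both bit-counting sequences translate line for line into Python, which makes them easy to test against a reference count. A sketch, with function names of my own:

```python
MASK = 0xFFFFFFFF
RU, RV, RW, RX = 0x55555555, 0x33333333, 0x0F0F0F0F, 0x01010101

def byte_counts(x):
    """Shared first seven instructions: leaves a bit count in each byte."""
    r1 = x & ~RU & MASK            # BIC R1, R0, Ru  (the odd-numbered bits)
    x = (x - (r1 >> 1)) & MASK     # SUB R0, R0, R1 LSR #1  (2-bit counts)
    r1 = RV & (x >> 2)             # AND R1, Rv, R0 LSR #2
    x &= RV                        # AND R0, R0, Rv
    x = (x + r1) & MASK            # ADD R0, R0, R1  (4-bit counts)
    x = (x + (x >> 4)) & MASK      # ADD R0, R0, R0 LSR #4
    return x & RW                  # AND R0, R0, Rw  (8-bit counts)

def popcount_adds(x):
    """Finish by summing the four bytes with shifted adds."""
    x = byte_counts(x)
    x = (x + (x >> 8)) & MASK      # ADD R0, R0, R0 LSR #8
    x = (x + (x >> 16)) & MASK     # ADD R0, R0, R0 LSR #16
    return x & 0xFF                # AND R0, R0, #0xFF

def popcount_mul(x):
    """Finish by summing the four bytes into the top byte with a multiply."""
    x = (RX * byte_counts(x)) & MASK   # MUL R0, Rx, R0
    return x >> 24                     # MOV R0, R0 LSR #24
```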

    EOR R0, R0, R1
    EOR R1, R1, R0
    EOR R0, R0, R1

(Using RSB instead of EOR throughout will result in swapping and negating both values. Other variants can selectively negate just R0 or R1.)
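
The RSB claim is quick to confirm in a Python model of the two registers. This is a sketch with my own function names; `RSB Rd, Rn, Rm` is modelled as `Rm - Rn` modulo 2^32.

```python
MASK = 0xFFFFFFFF

def eor_swap(r0, r1):
    """Three EORs: plain swap."""
    r0 ^= r1   # EOR R0, R0, R1
    r1 ^= r0   # EOR R1, R1, R0
    r0 ^= r1   # EOR R0, R0, R1
    return r0, r1

def rsb_swap(r0, r1):
    """Three RSBs: swap and negate both values."""
    r0 = (r1 - r0) & MASK   # RSB R0, R0, R1
    r1 = (r0 - r1) & MASK   # RSB R1, R1, R0
    r0 = (r1 - r0) & MASK   # RSB R0, R0, R1
    return r0, r1

print(eor_swap(5, 9))                                # (9, 5)
print(rsb_swap(5, 9) == (-9 & MASK, -5 & MASK))      # True
```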

    EOR R1, R0, R0 ROR #16
    BIC R1, R1, #0xFF0000
    MOV R0, R0 ROR #8
    EOR R0, R0, R1 LSR #8

Or with preloaded registers:

    ; Ru = 0xFF00FF
    AND R1, Ru, R0
    AND R0, Ru, R0 ROR #24
    ORR R0, R0, R1 ROR #8

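
Both byte-reversal sequences can be traced in Python, one statement per instruction. A sketch, with function names of my own:

```python
MASK = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def byte_reverse(x):
    """Four-instruction version with no preloaded constants."""
    r1 = x ^ ror32(x, 16)        # EOR R1, R0, R0 ROR #16
    r1 &= ~0x00FF0000 & MASK     # BIC R1, R1, #0xFF0000
    x = ror32(x, 8)              # MOV R0, R0 ROR #8
    return x ^ (r1 >> 8)         # EOR R0, R0, R1 LSR #8

def byte_reverse_preloaded(x, ru=0x00FF00FF):
    """Three-instruction version with Ru = 0xFF00FF preloaded."""
    r1 = ru & x                  # AND R1, Ru, R0
    x = ru & ror32(x, 24)        # AND R0, Ru, R0 ROR #24
    return x | ror32(r1, 8)      # ORR R0, R0, R1 ROR #8

print(hex(byte_reverse(0x11223344)))             # 0x44332211
print(hex(byte_reverse_preloaded(0x11223344)))   # 0x44332211
```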

    ; Ru = 0x01010101
    SUB R1, R0, Ru
    BIC R1, R1, R0
    TST R1, Ru LSL #7    ; Z̅ = zero byte present

Furthermore, bit 7 of R1 indicates whether the least significant byte is 0 (and if not, bit 15 corresponds to the next byte, etc.). A precheck which takes one fewer cycle but is only necessary and not sufficient:

    AND R1, R0, R0 LSL #16
    TST R1, R1 LSL #8    ; Z = zero byte maybe present

Unfortunately, random data will trigger this check about 60% of the time! ASCII text will fare better. Is there a better 2-cycle precheck?
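
The 60% figure has a simple explanation: the precheck sets Z exactly when the bitwise AND of the four bytes is zero. For random data each bit position survives the AND with probability 1/16, so the check fires with probability (15/16)^8. A Python sketch of both tests, with function names of my own:

```python
MASK = 0xFFFFFFFF

def has_zero_byte(x):
    """Exact three-instruction test: True if any byte of x is zero."""
    r1 = (x - 0x01010101) & MASK   # SUB R1, R0, Ru
    r1 &= ~x                       # BIC R1, R1, R0
    return (r1 & 0x80808080) != 0  # TST R1, Ru LSL #7

def precheck(x):
    """Two-instruction precheck: True if a zero byte may be present."""
    r1 = x & (x << 16) & MASK             # AND R1, R0, R0 LSL #16
    return (r1 & (r1 << 8) & MASK) == 0   # TST R1, R1 LSL #8

def word(b3, b2, b1, b0):
    """Assemble a 32-bit word from four bytes."""
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0

# Fraction of uniformly random words for which the precheck fires.
precheck_rate = (15 / 16) ** 8   # about 0.597
```

The precheck never misses a zero byte, but it also fires for words such as 0x0F0F0FF0 that contain none.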