What PA-RISC does is ingenious and profoundly disturbing. PA-RISC delays the execution of the branch for one cycle. As a result, the instruction following the branch (located in the delay slot) is executed before control passes to the branch destination. The compiler looks for an instruction that it can put in the delay slot, one that can be executed during the branch operation without changing the logic:
    BL   opencarton    ; branch
    LDW  26 ...        ; load word into register during delay
The BL instruction is executed before the LDW instruction, but does not take effect until one cycle later. The LDW instruction that comes after the BL is actually executed while the BL completes. If the compiler can't find anything useful to do in that slot, you see a No-Op instruction:
    BL   closecarton   ; branch
    NOP                ; code 8000240, actually OR 0,0,0
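The ordering described above can be sketched with a toy interpreter. This is only an illustration, not a model of real PA-RISC hardware: the `run` function, the two-counter scheme (`pc` and `next_pc`), the tuple format, and the `ADD` and `STW` filler instructions are all assumptions made for the example, and the "link" (return address) half of BL is ignored.

```python
def run(program, labels):
    """Run a toy program with a one-instruction branch delay slot.

    Returns the mnemonics in the order they execute.
    """
    trace = []
    pc, next_pc = 0, 1  # instruction executing now, and the one queued after it
    while pc < len(program):
        mnemonic, target = program[pc]
        trace.append(mnemonic)
        # Normally, the instruction after the queued one simply follows it.
        following = next_pc + 1
        if mnemonic == "BL":
            # The branch redirects only the instruction AFTER the queued one,
            # so the delay-slot instruction still runs before the target.
            following = labels[target]
        pc, next_pc = next_pc, following
    return trace


# Hypothetical program: (mnemonic, branch target or None)
program = [
    ("BL", "opencarton"),  # 0: branch
    ("LDW", None),         # 1: delay slot, runs before control transfers
    ("ADD", None),         # 2: skipped by the branch
    ("STW", None),         # 3: opencarton: branch destination
]
labels = {"opencarton": 3}

print(run(program, labels))  # ['BL', 'LDW', 'STW']
```

The trace shows LDW running after BL but before the first instruction at the destination, while the ADD that follows the delay slot is never reached.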
Delayed branching is like being able to pack our bags while flying to our destination:
1. book our flight
2. reserve hotel room
3. reserve rental car
4. fly to destination
5. (pack suitcase during the delay slot)
6. collect baggage
7. get rental car
8. check into hotel
The delayed branch is not just an intellectual curiosity. When you set Debugger breakpoints, you must remember that the instruction after the branch will be executed before the branch takes effect. Setting breakpoints near branches can stretch your imagination.
The newest PA-RISC designs are even more radical. They speculatively execute many instructions at once, out of sequence, then undo the ones that have a conflict. This makes delayed branching seem almost ordinary.