Delayed Branching in PA-RISC

A primary goal of the PA-RISC architecture is to complete the execution of a useful instruction in each machine cycle. A branch is difficult to implement in one cycle because the processor must first compute the branch address, then go to memory again to retrieve the instruction at that address. PA-RISC is a pipelined processor: while it executes the current instruction, it is already fetching and preparing the instructions that follow. As long as the instructions execute in sequence, the hardware to pre-retrieve the following instructions is fairly straightforward. When we branch, the instruction at the branch destination cannot already be in the pipeline, because the hardware does not know in advance which instruction it is. The choice seems to be between using two cycles to execute the branch or stretching the basic cycle time to allow for retrieving the instruction at the branch destination from memory. Neither choice is attractive.

What PA-RISC does is ingenious and profoundly disturbing. PA-RISC delays the execution of the branch for one cycle. As a result, the instruction following the branch (located in the delay slot) is executed before control passes to the branch destination. The compiler looks for an instruction that it can put in the delay slot, one that can be executed during the branch operation without changing the logic:

    BL opencarton       ; branch
    LDW 26 ...          ; load word into register during delay

The BL instruction is executed before the LDW instruction, but does not take effect until one cycle later. The LDW instruction that comes after the BL is actually executed while the BL completes. If the compiler can't find anything useful to do in that slot, you see a No-Op instruction:

    BL closecarton      ; branch
    NOP                 ; hex 08000240, actually OR 0,0,0
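
The execution order described above can be sketched with a toy model in Python. This is not a PA-RISC emulator: the tuple format, the `run` function, and the instruction labels are all invented for illustration, and the model ignores the link register that BL (branch and link) also sets. It keeps two addresses, the current instruction and the one already queued behind it, so that a branch can only redirect the fetch after the queued instruction.

```python
# Toy model of a delayed-branch machine (an illustration, not a PA-RISC emulator).
# Each instruction is ("op", label) or ("branch", label, target_index).

def run(program, max_steps=100):
    """Return the labels of the instructions in the order they execute."""
    trace = []
    pc, next_pc = 0, 1              # current address and the address queued behind it
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        trace.append(instr[1])
        following = next_pc + 1     # default: the queue advances sequentially
        if instr[0] == "branch":
            # The branch takes effect one slot late: the instruction already
            # queued (the delay slot) still runs, and the target comes after it.
            following = instr[2]
        pc, next_pc = next_pc, following
    return trace

program = [
    ("branch", "BL opencarton", 3),
    ("op", "LDW (delay slot)"),
    ("op", "fall-through (skipped)"),
    ("op", "opencarton: first instruction"),
]

for label in run(program):
    print(label)
```

The trace lists the branch, then the delay-slot load, then the first instruction at the destination; the fall-through instruction never runs.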

Delayed branching is like being able to pack our bags while flying to our destination:

      1.  book our flight
      2.  reserve hotel room
      3.  reserve rental car
      4.  fly to destination
      5.       (pack suitcase during the delay slot)
      6.            collect baggage
      7.            get rental car
      8.            check into hotel
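
Moving "pack suitcase" to after the flight is the same reordering the compiler performs when it fills a delay slot. The following Python sketch shows one simple way such a pass could work; it is not the algorithm of any actual PA-RISC compiler. The `Instr` type, the register names, the use of `r2` as the link register, and the pseudo-register `mem` standing in for all of memory are assumptions made for illustration. The pass looks backward for an instruction that does not depend on anything it would be moved past, and falls back to a NOP when it finds none.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str            # descriptive label, not real PA-RISC syntax
    reads: frozenset     # registers (or "mem") the instruction reads
    writes: frozenset    # registers (or "mem") the instruction writes

NOP = Instr("NOP", frozenset(), frozenset())

def independent(a, b):
    """True if a and b can be reordered without changing any result."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)

def fill_delay_slot(body, branch):
    """Return the instructions of body plus branch plus one delay-slot instruction."""
    for i in range(len(body) - 1, -1, -1):
        candidate = body[i]
        later = body[i + 1:] + [branch]   # everything the candidate would move past
        if all(independent(candidate, other) for other in later):
            return body[:i] + body[i + 1:] + [branch, candidate]
    return body + [branch, NOP]           # nothing safe to move: pad with a NOP

# A call whose argument load does not affect the branch itself.
load = Instr("load argument into r26", frozenset({"mem"}), frozenset({"r26"}))
call = Instr("BL opencarton", frozenset(), frozenset({"r2"}))

# A conditional branch that needs the value computed just before it.
compute = Instr("compute r5 from r4", frozenset({"r4"}), frozenset({"r5"}))
cond = Instr("branch if r5 is zero", frozenset({"r5"}), frozenset())

filled_call = [i.text for i in fill_delay_slot([load], call)]
filled_cond = [i.text for i in fill_delay_slot([compute], cond)]
print(filled_call)
print(filled_cond)
```

In the first case the load moves into the delay slot, as in the BL/LDW listing earlier. In the second the branch depends on the preceding instruction, so the slot is padded with a NOP.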

The delayed branch is not just an intellectual curiosity. When you set Debugger breakpoints, you must remember that the instruction after the branch will be executed before control reaches the branch destination. Setting breakpoints near branches can stretch your imagination. The newest PA-RISC designs are even more radical. They speculatively execute many instructions at once, out of sequence, then undo the ones that have a conflict. This makes delayed branching seem almost ordinary.