The following was a proposal I wrote proposing an improvement to the branch prediction in a new ICL mainframe. The main scheme was what is now called a two-bit Smith branch predictor, but the interesting bit is at the end where I show how the branch prediction can be improved using only one bit per hash entry by using random numbers.

An area like this I used random numbers for is when gathering statistics on program running. It would be daft to do this nowadays with memory so cheap but once upon a time an extra 16k really meant something. The idea is that instead of using 4-byte integers for the count you use only a single byte, but the value is only incremented with a certain probability that goes down the higher the number in the byte. Thus the exact number isn't kept only a approximation to the log of the number.

Probably the neatest application of probability I know of that is still of use in computing is to form a directory fingerprint so one can normally avoid checking it for an entry which isn't present. Hmm..., I just did a search using Vivisimo and I can't find any reference to the technique so perhaps it ain't so useful any more :(

D McQuillan

BRA 4OPAW x2633

25 Sept '79

The jump prediction described in the new PSD about the pipeline for S$ predicts that a conditional jump will jump the same way as it did last. I also suppose that there is no clearing of the jump prediction table - a jump address hashes to a place in a table in which a single bit is kept giving the prediction.

The jump prediction could be very significantly improved I think by having 2-bits per entry and only changing a prediction if it is wrong twice in succession. What is wrong with having one bit is:

a) In repeated loop the first time round is predicted wrong each time - thus the prediction is wrong twice per invocation of the loop (either explicitly in an outer loop or implicitly in a repeated subroutine call).

b) Single executions of contrary jumps in straight sequences of code have an undue disruptive influence when they hash to the same places as repeated jumps.

There are a number of alternatives which work better in these circumstances. My favourite is a two bit method. One bit (j)holds the result of the last jump and another (p) hold the prediction for the next jump hashing to the location. The prediction is changed if it is wrong and differed from the last jump bit also. The action is shown in the table below

prediction=p | j | jump is | success | next p | j=jump |

0 | 0 | 0 | Y | 0 | 0 |

0 | 0 | 1 | N | 0 | 1 |

0 | 1 | 0 | Y | 0 | 0 |

0 | 1 | 1 | N | 1 | 1 |

1 | 0 | 0 | N | 0 | 0 |

1 | 0 | 1 | Y | 1 | 1 |

1 | 1 | 0 | N | 1 | 0 |

1 | 1 | 1 | Y | 1 | 1 |

Repeat Loop size |
Random prob of jump |
|||||

2 | 3 | 4 | 5 | 1/3 | 1/4 | |

PSD Method | 0% | 33% | 50% | 60% | 56% | 62% |

2-bit method | 50% | 67% | 75% | 80% | 59% | 69% |

Formulae:

PSD method repeated loop size N |
% right = (N-2)/N |

2-bit method |
% right = (N-1)/N |

PSD method random prob of jump |
% right = 1-2g |

2-bit method |
% right = (1-2g-2g |

Where g = p(1-p)

and p = probability of successful jump

Difference repeated loop size N = 1/N

Difference random prob f jump = (g-4g^{2})/(1-g) approx= p
when p small

Other interesting methods are:-

The 1-bit random method – the prediction bit is only changed with probability p when it is wrong.. The following shows the % of correct predictions:

Repeat Loop size |
||||

2 | 3 | 4 | 5 | |

p=1/2 | 33% | 43% | 53% | 61% |

p small | 50% | 55% | 62% | 68% |

The probabilities are the same as the PSD method for random probabilities if jumping. The only disadvantage is the method takes longer to settle down and it is not as good as the 2-bit method.

> 2-bit methods. If more than 2 bits are used then better prediction can be done for the random probability of jump case approaching the best case of % prediction right = max(p,1-p). The pattern for small loops can be recognized so that if

true false false true false false

is encountered a prediction of true is made and so a 100% success rate achieved for such loops.

An 8-bit method might be

New 8-bit = old 8 bit shift left 1 and put in new jump at the right

so the 8 bits contain a historical record of the last 8 jumps. A 256 entry table would be used prediction indexed by the 8-bit value with values like

pattern 10001000 predict 1 loop of 4

pattern 01110111 predict 0

pattern 10101010 predict 1 loop of 2

pattern 01010101 predict 0

David McQuillan