The magazine of the Institute & Faculty of Actuaries

# The flaw in batting averages

So, can you tell the true batting averages of all the great and not-so-great one-day players? They’re all published and publicised, quoted and misquoted. But does the batting average as we know it truly reflect a measure of central tendency? We don’t think so.

So, what’s wrong?
What’s wrong with the current method of calculating averages is the treatment of ‘not-outs’. The current definition of the batting average takes the total number of runs scored by a player and divides it by the number of innings played. However, the denominator excludes the innings in which the batsman was not out. The logic for ignoring the not-out innings is that it is not possible to estimate the number of runs a batsman would have made had the game continued. However, this really bumps up the career batting average for late-order batsmen like Michael Bevan.
All those versed in the concept of exposed-to-risk will know there’s something wrong here. For example, if a batsman scores 300 runs in four innings of a series and is not out in each of them, the denominator is zero and his average is effectively infinite. If he then gets out for a duck in the fifth innings, his average for the series would be a majestic 300!
Clearly, batsmen who do not have a substantial number of not-outs in their careers would not be affected much, but those late-order batsmen who stay around until the end of the innings would benefit unduly under the current method of calculating batting average.
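The series example above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `conventional_average` and the even split of the 300 runs into four scores of 75 are our assumptions, not figures from any real series.

```python
import math

def conventional_average(innings):
    """Conventional batting average: total runs divided by dismissals.

    `innings` is a list of (runs, was_out) pairs. Not-out innings add
    runs to the numerator but nothing to the denominator.
    """
    total_runs = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, was_out in innings if was_out)
    if dismissals == 0:
        # No dismissals: the ratio is undefined; reported here as infinite.
        return math.inf
    return total_runs / dismissals

# Hypothetical split of the 300 runs: four not-out scores of 75.
series = [(75, False)] * 4
print(conventional_average(series))            # inf

# A duck in the fifth innings supplies the only dismissal.
series_with_duck = series + [(0, True)]
print(conventional_average(series_with_duck))  # 300.0
```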

An alternative measure
An alternative to the current method is to account for, using some proxy, the innings in which the batsman was not out. One proxy can be the career average number of minutes spent by a batsman in the middle. The problem with this proxy is that a batsman could have spent a lot of minutes in the middle, but on the non-striker’s end.
Another, and we assert better, proxy is the career average number of balls faced per innings. This essentially uses the principle of exposed-to-risk for calculating the denominator in the average. Each innings of a player can be viewed as contributing to the ‘exposure’, and the fact that a batsman ends up being not out at the end of the innings does not indicate that the exposure was zero, since the player has had the opportunity to score runs in that innings.
We propose the following formula to calculate the exposure for each innings:
IF Batsman = Out
THEN 1
ELSEIF Batsman = Not Out
AND NOB < AvgNOB THEN NOB/AvgNOB
ELSE 1
ENDIF
where,
NOB = Number of balls faced in a particular innings
AvgNOB = Career average number of balls faced per innings

Therefore, for an innings in which the batsman is not out we can define the exposure as the actual number of balls faced in the innings over the average number of balls faced by the batsman in his career. This exposure is capped at one, so if the batsman is not out but has faced more than the average number of balls then he has played a ‘full’ innings for the purpose of calculating the batting average.
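The exposure rule can be sketched in Python as follows. The function names and the sample innings are hypothetical, and the sketch assumes that AvgNOB is computed over all the innings supplied, including the not-out ones; the article does not spell out that detail.

```python
def innings_exposure(balls_faced, was_out, avg_balls):
    """Exposure for one innings: 1 if dismissed, otherwise NOB/AvgNOB capped at 1."""
    if was_out:
        return 1.0
    return min(balls_faced / avg_balls, 1.0)

def revised_average(innings):
    """Revised batting average: total runs divided by total exposure.

    `innings` is a list of (runs, balls_faced, was_out) triples.
    Assumption: AvgNOB is the mean balls faced over these same innings.
    """
    total_balls = sum(balls for _, balls, _ in innings)
    if not innings or total_balls == 0:
        raise ValueError("need at least one innings with balls faced")
    avg_balls = total_balls / len(innings)
    total_runs = sum(runs for runs, _, _ in innings)
    exposure = sum(innings_exposure(balls, out, avg_balls)
                   for _, balls, out in innings)
    return total_runs / exposure

# Hypothetical career of three innings: (runs, balls faced, dismissed?)
sample = [(40, 50, True), (30, 20, False), (20, 50, False)]
# AvgNOB = 120 / 3 = 40, so the exposures are 1, 20/40 = 0.5 and 1 (capped).
print(revised_average(sample))  # 36.0, against a conventional 90 / 1 = 90
```

In this made-up sample the short not-out innings counts as half an innings and the long one as a full innings, so the denominator rises from 1 to 2.5.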

Does this really matter?
This depends on whether it’s Tendulkar or Bevan, Sehwag or Klusener. Top-order batsmen are less likely to have many not-outs in their career, so the revised batting average will not be very different from the normal average. However, middle-order and lower middle-order batsmen, especially those who stick around to the end, are the ones most affected. Michael Bevan, the quintessential ‘finisher’ in one-dayers, ended with a phenomenal average of 53.58 in 232 ODIs. However, his revised average drops to 38.73!
Similarly, Lance Klusener’s average drops from 41.10 to 28.73. In our sample of batsmen, Virender Sehwag shows the maximum resistance and his average only drops from 32.44 to 31.20. We all know the reason why: he’s not very likely to remain not out at the end of 50 overs!

Do the results make sense?
Table 1 shows the results for the sample batsmen we looked at sorted by the drop in their averages. The top 10 batsmen in the table whose averages have dropped the most are all middle-order batsmen. Of these, five are Aussies.
An alternative, and simpler, way of calculating the revised average is to deduct from the normal average the proportion of not-outs to matches. This gives an estimate (shown as ‘revised average (2)’ in table 1) that has an error on average of less than half a run, so the method gives a reasonable estimate of the average score.
Comparing the two averages with the median score also shows that the revised average is closer to the median than the normal average is. However, the median is still lower than both the revised and the normal average. This indicates that the distribution of runs scored by batsmen is positively skewed; that is, there are a few high scores above the median which push the averages up.
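The effect of positive skew on the mean and median can be seen in a small sketch. The scores below are invented purely for illustration and are not taken from table 1 or from any player’s record.

```python
from statistics import mean, median

# Hypothetical innings scores: mostly modest, with two big hundreds.
scores = [0, 5, 12, 18, 25, 31, 40, 102, 145]

print(median(scores))  # the middle score of the nine
print(mean(scores))    # pulled upwards by the two large scores

# In a positively skewed sample like this one, the mean exceeds the median.
skewed_upwards = mean(scores) > median(scores)
print(skewed_upwards)
```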
We believe that our proposed adjustment to the current methodology for calculating the batting average has some advantages and is a better reflection of the underlying central tendency. In our sample, Viv Richards has the second-highest normal average. Under the revised method, Richards moves to the top, with Sachin Tendulkar moving to second position and Ricky Ponting to third. That sounds about right!
