Alan Chalk introduces the science of machine learning and argues that it has a place in the actuarial skillset of the future.
Seen in this way, the link between machine learning and actuarial work is not instantly obvious, yet the techniques used by data scientists to solve problems in image recognition, social networking and elsewhere are directly applicable.
Machine learning can be broken down into various kinds. Learning from past examples of the outcome of a process is called supervised learning. This is what actuaries do when they use past claims development to estimate future development factors, or past claims experience in pricing exercises. Clustering customers into groups based on their attributes in order to help design appropriate marketing material is called unsupervised learning. Other categories of learning include semi-supervised learning and reinforcement learning. Within each of these types of learning there are many methods that can be used. Some of these methods are already widely used by actuaries but many are not and an exploration of these would be beneficial. In this article though, we look at an overarching concept within the machine learning community - 'How do we know when we have learned something?' - and we see how it can benefit actuarial work.
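To make the supervised learning example concrete, the sketch below learns volume-weighted development factors from a small cumulative claims triangle and uses them to project ultimate claims. The triangle figures and the function names `development_factors` and `project_ultimates` are illustrative assumptions, not taken from the article, and the volume-weighted (chain ladder) average is only one of several ways such factors might be estimated.

```python
# Illustrative cumulative claims triangle (made-up figures, not from the article).
# Rows are accident years; columns are development periods 1, 2, 3, 4.
triangle = [
    [1000.0, 1500.0, 1650.0, 1700.0],
    [1100.0, 1650.0, 1815.0],
    [1200.0, 1800.0],
    [1300.0],
]


def development_factors(tri):
    """Volume-weighted age-to-age factors learned from past development."""
    n_periods = len(tri[0])
    factors = []
    for j in range(n_periods - 1):
        # Use only accident years where both periods j and j + 1 are observed.
        rows = [row for row in tri if len(row) > j + 1]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        factors.append(numerator / denominator)
    return factors


def project_ultimates(tri, factors):
    """Roll each accident year forward to the final development period."""
    ultimates = []
    for row in tri:
        value = row[-1]
        # Apply only the factors for periods not yet observed in this row.
        for factor in factors[len(row) - 1:]:
            value *= factor
        ultimates.append(value)
    return ultimates


factors = development_factors(triangle)
ultimates = project_ultimates(triangle, factors)
for factor in factors:
    print(round(factor, 4))
```

The past development plays the role of the labelled examples, and the factors are what has been 'learned' from them. Whether those factors predict future development well is a separate question.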
What is 'learning'?
Consider training a computer to recognise apples. In this instance, it means finding a function that can predict, based on the colouring of pixels, those that are likely to belong to an apple. Various images are used to find such a function. This process is called training, and the images used in this process are called training images (or training data).
One of the training images is shown in Figure 1, and the result of the function, when run over the pixels in that image, is shown in Figure 2. It seems that the computer has 'learned' to identify the apple pixels.
However, we might ask: 'Did the computer really learn to understand the difference between apples and other things, or did the model only work for the few images it was trained on?'
The top panel of Figure 3 shows a more complicated image and the bottom panel shows the result of applying the function. Most of the leaves have been classified as 'apple' (there were green and red apples in the training set). Clearly, the algorithm does not generalise well to new images.
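The article does not say which function was fitted to the pixels, so the following is only a sketch of how a colour-only classifier can behave in this way. It assumes a nearest-neighbour rule on (R, G, B) values; the training colours, the leaf colour and the function names are all hypothetical.

```python
# Hypothetical training pixels as (R, G, B) colours with labels; the
# colours are illustrative and are not taken from the article's images.
TRAINING_PIXELS = [
    ((200, 30, 40), "apple"),        # red apple
    ((120, 190, 60), "apple"),       # green apple
    ((240, 240, 240), "not apple"),  # white background
    ((90, 60, 40), "not apple"),     # brown table
]


def squared_distance(a, b):
    """Squared Euclidean distance between two colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_pixel(colour, training=TRAINING_PIXELS):
    """Label a pixel with the label of the closest training colour."""
    _, label = min(training, key=lambda item: squared_distance(colour, item[0]))
    return label


# On the training colours the function is perfect ...
training_accuracy = sum(
    classify_pixel(colour) == label for colour, label in TRAINING_PIXELS
) / len(TRAINING_PIXELS)

# ... but a leaf-green pixel, never seen in training, is labelled as apple.
leaf_pixel = (70, 160, 50)
leaf_prediction = classify_pixel(leaf_pixel)
print(training_accuracy, leaf_prediction)
```

Because the function sees only colour, a perfect fit to the training pixels says nothing about how it treats green objects that are not apples.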
It is important to be sure of the ability of a computer vision model to generalise to the population at large. We need to know that a driverless car can tell the difference between a green traffic light and a green leaf. In the insurance setting, there is little point in a pricing model that fits well to historic claims experience but does not accurately estimate the claims experience of future new business.
In a reserving exercise, it is not useful to find a set of loss development factors that accurately fits the data in the triangles but does not correctly predict ultimate claims.
Understanding the errors
There are various reasons why a model may not generalise well.
• Models may be too complex - their complexity allows them to fit the vagaries of training data and gives a false impression that something useful has been learnt. However, what has been learnt is very specific to the training data and does not generalise well to new examples.
• Models may be too simple - they do not reflect all the important features of the real world and will be consistently wrong in certain situations. This is, indeed, the situation for the 'apples' model above, which does not allow for the shape of objects.
• We may not have enough data.
Methods to deal with this issue include:
1. Split the data into two or three parts - training, validation and possibly test sets.
2. Fit and fine-tune models and avoid over-fitting by using the training and validation data-sets.
3. Assess the performance of different models based on test data.
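The first of the steps above can be sketched as follows. The function name `split_records`, the 60/20/20 proportions and the use of a simple random shuffle are assumptions made for illustration; the article does not prescribe them, and a random split presumes that the records can be treated as exchangeable.

```python
import random


def split_records(records, validation_share=0.2, test_share=0.2, seed=0):
    """Shuffle records and split them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    n_validation = int(len(shuffled) * validation_share)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test


records = list(range(8000))  # stand-ins for 8,000 experience records
training, validation, test = split_records(records)
print(len(training), len(validation), len(test))
```

Models would then be fitted on `training`, tuned and compared on `validation`, and assessed once on `test`, which plays no part in fitting or tuning.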
Once this has been done, useful analytic graphs can be produced. Two such graphs are shown in Figure 4 (see below). The top panel shows a fairly typical situation: the model fits the training data increasingly well as complexity increases (for example, with the addition of more rating factors or more interactions between rating factors), but beyond a certain complexity the validation error gets worse. This graph and related measures can help in making sure the best model is chosen.
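A graph of training and validation error against complexity could be produced along the following lines. This is a minimal sketch on simulated data: the model is a piecewise-constant fit whose complexity is the number of bins, standing in for the number of rating factors or interactions, and all names and parameters are assumptions rather than the method behind Figure 4.

```python
import math
import random


def make_data(n, rng):
    """Simulate (x, y) records: a smooth signal plus noise (illustrative only)."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = math.sin(2 * math.pi * x) + rng.gauss(0, 0.5)
        data.append((x, y))
    return data


def fit_binned_means(train, n_bins):
    """Fit a piecewise-constant model: the mean of y within each bin of x."""
    overall_mean = sum(y for _, y in train) / len(train)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in train:
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # Bins with no training records fall back to the overall mean.
    return [s / c if c else overall_mean for s, c in zip(sums, counts)]


def mean_squared_error(model, data):
    """Average squared difference between observed y and the bin prediction."""
    n_bins = len(model)
    return sum(
        (y - model[min(int(x * n_bins), n_bins - 1)]) ** 2 for x, y in data
    ) / len(data)


rng = random.Random(1)
train = make_data(200, rng)
validation = make_data(200, rng)

complexities = [1, 2, 4, 8, 16, 32, 64, 128]  # number of bins
training_errors = []
validation_errors = []
for n_bins in complexities:
    model = fit_binned_means(train, n_bins)
    training_errors.append(mean_squared_error(model, train))
    validation_errors.append(mean_squared_error(model, validation))

# Choose the complexity with the lowest validation error.
best = complexities[validation_errors.index(min(validation_errors))]
for row in zip(complexities, training_errors, validation_errors):
    print(row)
print("chosen number of bins:", best)
```

Because each set of bins is a refinement of the previous one, the training error cannot increase as bins are added. The validation error carries no such guarantee, which is why it, rather than the training error, is used to choose the complexity.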
The bottom panel reflects a situation in which an analysis is being done with only 8,000 experience records, raising concerns that the number of records is too small for accurate learning. The graph is created by taking samples of increasing size from the training data (that is, the experience records) and repeatedly finding the best-fitting model, once for each sample. The graph shows clearly that, while some learning has been achieved, more data would still be helpful. In a competitive situation, such graphs can help us to understand whether a lack of data might be leading to poor models and therefore to possible anti-selection.
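A graph of validation error against sample size could be built roughly as below. The data are simulated, the model is a straight line fitted by least squares, and the sample sizes, number of repeats and function names are all assumptions chosen for illustration; in practice the model fitted at each size would be whatever the analysis calls for.

```python
import random


def fit_line(data):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(model, data):
    """Average squared prediction error of a fitted line on a data set."""
    intercept, slope = model
    return sum((y - intercept - slope * x) ** 2 for x, y in data) / len(data)


def learning_curve(train, validation, sizes, repeats, rng):
    """Average validation error of models fitted to random samples of each size."""
    curve = []
    for size in sizes:
        errors = []
        for _ in range(repeats):
            sample = rng.sample(train, size)
            errors.append(mean_squared_error(fit_line(sample), validation))
        curve.append(sum(errors) / repeats)
    return curve


rng = random.Random(2)
# Simulated records (illustrative only): a linear signal plus noise.
records = []
for _ in range(10000):
    x = rng.random()
    records.append((x, 1.0 + 2.0 * x + rng.gauss(0, 1.0)))

train, validation = records[:8000], records[8000:]
sizes = [10, 100, 1000, 8000]
curve = learning_curve(train, validation, sizes, repeats=5, rng=rng)
for size, error in zip(sizes, curve):
    print(size, round(error, 4))
```

If the curve is still falling at the largest sample size, more data would probably help; if it has flattened, the remaining error is more likely to come from noise or from the choice of model.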
In the machine learning community there is a significant and formal emphasis on understanding the generalisation error of models. Various empirical methods are used - essentially, more sophisticated versions of the train-validation-test approach discussed above. Some models actually come with theoretical guarantees on the generalisation error.
The same emphasis is not always found in work currently carried out by actuaries, and this can be attributed to a number of factors:
• The often automated application of machine learning algorithms may require these checks and guarantees more than the mixture of statistics and expert judgment present in much actuarial work.
• In terms of the over-fitting risk, historically, the type of model and data used by actuaries has meant that this risk was small - although growing in importance, for example, where proxy models for assets and liabilities are used within Solvency II internal capital models.
Nonetheless, as data volumes increase and actuarial models become more
complex, those not currently using these techniques may find a fresh review
Figure 1: An apple and a 'not apple'
Figure 2: Estimated 'apple' pixels
Figure 3: The test image contains green leaves. The result of the function is shown below (white parts representing predicted apple pixels) and it can be seen that the leaves have been misclassified
Figure 4 (top panel): Training error reduces with increasing model complexity, based on the chosen measure of complexity
Figure 4 (bottom panel): Validation error reduces with increasing sample size, and is still reducing when approaching the total data-set