The world of software: And then there was R
Alan Chalk uses R to visualise data in Google Earth.
17 AUGUST 2012 | ALAN CHALK
Are any of these statements true?
• You know everything.
• You want to be part of a profession whose overall skill set is irrelevant and outdated.
• You want your own skill set to be irrelevant and outdated.
• You are not interested in using your skill set to create new insights and add value to your customers/shareholders/company.
If so, stop reading now.
If you are still reading, the good news is that there is one easy way to keep your skill set moving ahead. Get to know the capabilities of the software that analysts and experts in all fields worldwide are using to push back the boundaries.
‘I am too busy,’ I hear you cry. ‘I’m working 12 hours a day at work. / I am out watching the Olympics. / I am eating pizza.’ (Delete as necessary).
This is a very real issue. We are all very busy. We are not professional programmers. We have some exceedingly easy-to-use software which does the day job reasonably well (for the moment). The learning curve to get significant benefit out of new software is steep and requires an investment of hundreds of hours.
For an investment of 15 minutes digesting the contents of this column every month or two, we will bring you some aspects of the huge range of tools that are “out there”. This, at least, will help you decide whether investing further time might be beneficial and when calling in external expertise may be of use.
We will of course take a good look at Open Source software: R, Python, Perl, LaTeX + Sweave. Some of the names may be familiar to you, some may not. You may have your own favourites. There are so many things they can do that are useful for you and will expand your skill set. They are free, so you can experiment at your leisure without worrying about budgets. We won’t limit ourselves to Open Source though. There are commercial packages which are too good to ignore – Matlab comes immediately to mind.
With our remaining column inches this month, we will take a very brief look at geospatial visualisation in R. If you work in insurance or any form of risk management, you could probably benefit from visualising where your risks are and from analysing how your risk varies over space and time. Two obvious applications are setting premiums where prices depend on where the risks are (auto, private household, etc.) and accumulation management. We will concentrate on visualisation; we’ll save analysis for another time. The main advantage of starting to do this in R is that you will find yourself exposed to new ideas as you begin your journey through the amazing set of R libraries.
New to R?
If you are new to R and would like to run the code in this article:
a) Don’t panic
b) Visit http://www.r-project.org/ and follow the instructions for getting started.
c) You will need to download certain libraries – you can see these in the code, in the brackets after the keyword “library”. For example, for Figure 1 you will need “rgdal”.
d) To download a library: once you have started R, type install.packages('name of library'), for example install.packages('rgdal'). Then press Enter. Don’t forget to enclose the name of the library in quotes.
e) One more thing. You will probably want your work and files to appear in a particular directory. If so, when you start your session, type setwd("directory of your choice"), for example setwd("C:/work/geo"). Note that R expects forward slashes (or doubled backslashes, as in "C:\\work\\geo") in Windows paths, rather than the single backslashes Windows normally displays.
R and Google Earth
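Before turning to Google Earth: steps d) and e) above can also be scripted, so that a fresh R session sets itself up. This is a minimal sketch, not something the column itself provides. The helper name install_missing is hypothetical, the package list in the commented line is our assumption about what this column uses (sp is included because it supplies the meuse data), and C:/work/geo is just the example directory from step e).

```r
# Hypothetical helper: install only those packages that are not yet installed
install_missing <- function(pkgs) {
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing)  # needs an internet connection
  }
  invisible(missing)           # returns the names it tried to install
}

# Assumed package list for this column; uncomment to run
# install_missing(c("sp", "rgdal", "googleVis"))

# file.path() joins with forward slashes, which R accepts on Windows
work_dir <- file.path("C:", "work", "geo")
print(work_dir)  # "C:/work/geo"
# dir.create(work_dir, recursive = TRUE, showWarnings = FALSE)
# setwd(work_dir)
```

Checking before installing means the same script can be rerun every session without downloading anything twice.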
Take a look at the graphic below, which shows where pollution readings were taken near the River Maas (Figure 1).
Figure 1: Pollution readings next to the River Maas on the border between Belgium and the Netherlands. The rather grey looking river does not stand out but you can see a yellow line over it. The level of pollution is not shown – just where the readings were taken.
Here R is used to create a KML file, which is then viewed in Google Earth. The process is not particularly complicated. If you have the data, you could just as easily view where accidents happen or the damage claims along a storm path.
R code for Figure 1
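# Load the libraries that you need
library(sp)     # spatial classes; sp also ships the meuse data set
library(rgdal)  # projections (spTransform) and KML output (writeOGR)
# Load the data that you need
data(meuse)
# Turn the data frame into spatial points using its x and y columns
coordinates(meuse) <- c("x", "y")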
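# Add the coordinate system: meuse is in the Dutch national grid (EPSG:28992)
proj4string(meuse) <- CRS("+init=epsg:28992")
# Google Earth expects longitude/latitude on the WGS84 datum
meuse_ll <- spTransform(meuse, CRS("+proj=longlat +datum=WGS84"))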
# Write the kml file
writeOGR(meuse_ll["zinc"], "meuse.kml", layer="zinc", driver="KML")
# Find the kml file, open Google Earth and drag the kml file on to the globe.
R and Google Maps
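Before turning to Google Maps, it may help to see what a KML file actually is: plain XML text that Google Earth reads. The sketch below writes point placemarks by hand in base R, without rgdal. It is not what writeOGR does internally; the function write_points_kml is hypothetical and writes only the bare minimum (a name plus a longitude,latitude pair per point, in that order, as KML requires). The example point is made up.

```r
# Hypothetical helper: write points as KML 2.2 placemarks
write_points_kml <- function(lon, lat, name, file) {
  # Escape the XML special characters that could appear in a name
  name <- gsub("&", "&amp;", name, fixed = TRUE)
  name <- gsub("<", "&lt;", name, fixed = TRUE)
  name <- gsub(">", "&gt;", name, fixed = TRUE)
  # KML wants "longitude,latitude" (WGS84 degrees), longitude first
  placemarks <- sprintf(
    "  <Placemark><name>%s</name><Point><coordinates>%.6f,%.6f</coordinates></Point></Placemark>",
    name, lon, lat)
  lines <- c('<?xml version="1.0" encoding="UTF-8"?>',
             '<kml xmlns="http://www.opengis.net/kml/2.2">',
             "<Document>", placemarks, "</Document>", "</kml>")
  writeLines(lines, file)
  invisible(file)
}

# Usage with one made-up point near Maastricht (illustrative only)
kml_file <- write_points_kml(5.7, 50.85, "example reading",
                             tempfile(fileext = ".kml"))
# With meuse_ll from Figure 1 one could write, for example:
# cc <- coordinates(meuse_ll)
# write_points_kml(cc[, 1], cc[, 2], paste("zinc =", meuse_ll$zinc), "meuse_points.kml")
```

Dragging the resulting file onto the globe should show one pin per row; rgdal's KML driver does the same job with many more options.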
You might be thinking you would like the visualisation to indicate the amount of pollution at each measurement. Consider Figure 2 below:
Figure 2: Pollution readings next to the River Maas on the border between Belgium and the Netherlands. The size of the circles represents the amount of pollution. It can be seen that the largest amounts of pollution are near the river.
In this case, we used R to create an htm file directly. Double-click on the htm file to view the map in your browser (or upload it to your company’s website, etc.).
R code for Figure 2
# Load the library that you need
# Load the data that you need and add coordinate reference system
# already done for Figure 1
# Create the htm file
# Your output will look similar to Figure 2, but not identical.
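Whichever package draws the bubbles, one design choice matters for a map like Figure 2: for circle sizes to be read correctly, the circle’s area (not its radius) should be proportional to the reading, so the radius grows with the square root of the value. Below is a minimal base-R sketch of that scaling. The function name bubble_radius and its max_radius parameter are hypothetical, and the readings are illustrative, not the meuse data.

```r
# Hypothetical helper: radius proportional to sqrt(value), so circle AREA is
# proportional to value; the largest value gets max_radius
bubble_radius <- function(value, max_radius = 100) {
  stopifnot(is.numeric(value), all(value >= 0, na.rm = TRUE))
  max_radius * sqrt(value / max(value, na.rm = TRUE))
}

# Illustrative readings: each is four times the previous one
readings <- c(100, 400, 1600)
radii <- bubble_radius(readings, max_radius = 80)
print(radii)  # 20 40 80: four times the value gives twice the radius

# On the meuse data (coordinates in metres) with base graphics, e.g.:
# data(meuse, package = "sp")
# symbols(meuse$x, meuse$y, circles = bubble_radius(meuse$zinc, 80), inches = FALSE)
```

Scaling the radius linearly instead would make a reading four times larger look sixteen times larger, exaggerating the hot spots.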
Before we leave the River Maas, let us briefly show that R is good for more than visualisation. Imagine that you would like to infer from the sample above how the risk of pollution is spread across the whole area. R has many ways to help you do this; we show just one of them below (Figure 3):
Figure 3: Ordinary kriging (OK) used to predict pollution levels near the River Maas. Lighter areas are those with higher predicted pollution levels.
The white circles are the original data points but now you can also see predictions of the pollution level for all points in the area. The R code for carrying out this analysis is fairly simple.
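The column does not print its kriging code. As a hedged illustration of how short it can be, here is the standard ordinary-kriging workflow from the gstat package applied to the same meuse data; it is not necessarily the code behind Figure 3. Assumptions: we krige log(zinc) (a common choice for skewed concentration data; the caption does not say which pollutant or transformation was used), fit a spherical variogram, and predict on the meuse.grid grid that ships with sp.

```r
library(sp)     # spatial classes plus the meuse and meuse.grid data sets
library(gstat)  # variograms and kriging

# Sample points (as for Figure 1) and a prediction grid covering the floodplain
data(meuse)
coordinates(meuse) <- ~x + y
data(meuse.grid)
gridded(meuse.grid) <- ~x + y

# Empirical variogram of log(zinc), then fit a spherical model to it.
# The starting values (partial sill 1, range 900 m, nugget 1) are only
# initial guesses handed to fit.variogram, which adjusts them.
v  <- variogram(log(zinc) ~ 1, meuse)
vm <- fit.variogram(v, vgm(1, "Sph", 900, 1))

# Ordinary kriging: "~ 1" means an unknown but constant mean
ok <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vm)

# var1.pred holds the predictions, var1.var the kriging variance
spplot(ok["var1.pred"], main = "Ordinary kriging of log(zinc)")
```

Plotting ok["var1.var"] the same way shows where the predictions are least certain, typically far from any sample point.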
You could of course do exactly the same for accident frequencies or severities for any class of business you are looking at. If you like 3-d graphics, those are available too.
R and the Google Visualisation API
What about earthquake monitoring? Markus Gesmann at Lloyd’s (watch out for him at GIRO) contributed Figure 4 below:
Figure 4: All earthquakes of magnitude greater than 4 in the last 30 days
How many lines of code do you think it took to fetch the data from the Internet and produce this output? Not many (using googleVis). See below.
R code for Figure 4
# Load the library that you need
library(googleVis)
# Get earthquake data of the last 30 days
eq <- eq[] ## extract the eq table
eq$MAG <- as.numeric(as.character(eq$MAG))
# Create the map
eq$loc <- paste(eq$LAT, eq$LON, sep = ":") ## create a lat:long location variable
plot(gvisGeoMap(eq, "loc", "MAG","DATE", options=list(dataMode="markers")))
You can see that it would not be hard to plot the number of insured locations, the accumulated sum insured or other measures of risk within a 20 km radius of each epicentre. A nice challenge for one of your summer interns?
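As a starting point for that challenge, here is a minimal base-R sketch that totals the sum insured within 20 km of each epicentre, using great-circle (haversine) distance on a spherical Earth of radius 6,371 km. The data frames quakes and policies, their columns lat, lon and sum_insured, and the helper names are all hypothetical; with the Figure 4 data the epicentres would come from eq$LAT and eq$LON. The inputs below are made up.

```r
# Great-circle distance in km between points given in decimal degrees
# (haversine formula, spherical Earth of radius 6371 km)
haversine_km <- function(lat1, lon1, lat2, lon2, earth_radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: total sum insured within radius_km of each epicentre
accumulate_si <- function(quakes, policies, radius_km = 20) {
  vapply(seq_len(nrow(quakes)), function(i) {
    d <- haversine_km(quakes$lat[i], quakes$lon[i],
                      policies$lat, policies$lon)
    sum(policies$sum_insured[d <= radius_km])
  }, numeric(1))
}

# Illustrative, made-up inputs
quakes   <- data.frame(lat = c(0, 10), lon = c(0, 10))
policies <- data.frame(lat = c(0, 0.1, 0.5), lon = c(0, 0, 0),
                       sum_insured = c(100, 200, 400))
quakes$si_20km <- accumulate_si(quakes, policies, radius_km = 20)
print(quakes)  # si_20km: 300 for the first epicentre, 0 for the second
```

The loop compares every epicentre with every policy, which is simple but grows with the product of the two counts; the si_20km column could then be passed to gvisGeoMap as the value to display.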
For me, the main benefit of using R has not been that it can do many things far better than most other software. It is that, by using it, I am exposed to the methods and thinking of top experts from all over the world; the ideas you see above are only the beginning of the journey. This keeps my skill set up to date and helps me to add value for my customers.
That’s all we have space for this month. If you have a topic for which you would like a solution, or a favourite piece of software that you would like covered, please email me at firstname.lastname@example.org. Do also let me know if you would like more practical detail on the topics we cover. Better still, if you would like to contribute to this column with a case study of your own, please let me know.
Further afield, the CAS (Casualty Actuarial Society) has an Open Source software committee which carries out various activities. These include telephone conferences where you can hear the latest news and WebExes given by various experts. If you would like to be part of that community, please email Lee Bowron at email@example.com.
Alan Chalk works in general insurance and is always looking for new ways to add value.