## Simple Risk Model (Part 1 of 5): Simulate Loss Frequency #1

October 25, 2010

Let’s start this series by defining risk. I am going to use the FAIR definition of risk which is: the probable frequency and probable magnitude of future loss. From a modeling perspective, I need at least two variables to model the risk for any given risk issue: a loss frequency variable and a loss magnitude variable. Hopefully, you are using a risk analysis methodology that deconstructs risk into these two variables…

The examples I am sharing in this blog series are exercises in stochastic modeling. The use of random values as an input to a probability distribution ensures there is variation in the output; thus making it stochastic. The variable output allows for analysis through many different lenses; especially when there are additional (meaningful) attributes associated with any given risk issue (policy section, business unit, risk type, etc…).

Part 1 and 2 of this series will focus on “probable or expected [loss] frequency”. Frequency implies a number of occurrences over a given period of time. Loss events are discrete in nature; there are no “partial” loss events. So, when we see probable loss frequency values like 0.10 or 0.25 – and our time period is a year – we interpret that to mean that there is a 10% or 25% chance of a loss event in any given year. Another way of thinking about it is in terms of time; we expect a loss event once every ten years (0.10) or once every four years (0.25). Make sense?
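The frequency-versus-time interpretation above is just a reciprocal relationship; a tiny Python sketch (function name is my own) makes the arithmetic explicit:

```python
def years_between_events(annual_frequency):
    """Mean number of years between loss events for a given annual loss frequency."""
    return 1.0 / annual_frequency

assert years_between_events(0.10) == 10.0  # 0.10 -> once every ten years
assert years_between_events(0.25) == 4.0   # 0.25 -> once every four years
```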

You may want to download this Excel spreadsheet to reference for the rest of the post (it should work in Excel 2003, Excel 2007 and Excel 2010; I have not tested it on Office for Mac).

Make sure you view it in Excel and NOT Google Apps.

In a simulation, how would we randomly draw loss frequency values for a risk issue whose expected loss frequency is 0.10, or once every ten years? I will share two ways; the first of which is the remainder of this post.

For any simulation iteration, we would generate a random value between 0 and 1, and compare the result to the expected loss frequency value:

a.    The stated expected loss frequency is 0.10 (cell B1; tab “loss 1”)

b.    For purposes of demonstration, we will number some cells to reflect the number of iterations (A6:A1005; A6=1; A7=A6+1; drag A7 down until you get to 1000).

c.    In Excel, we would use the =RAND() function to generate the random values in cells B6:B1005.

d.    We would then compare the randomly generated value to the expected loss frequency value in cell B1; with this code in C6 dragged down to C1005:

=IF(B6<=$B$1,1,0)

i.    If the generated random value in cell B6 is less than or equal to 0.1000 (cell B1), then the number of loss events for that iteration is 1.
ii.    If the generated random value in B6 is greater than 0.1000, then the number of loss events for that iteration is 0.

e.    Once you have values in all the cells, you can now look at how many iterations resulted in a loss and how many did not. Cell B2 counts the number of iterations in which you had a loss and cell B3 counts the number of iterations in which you did not have a simulated loss; their corresponding percentages are next to them.

f.    The pie chart shows the percentage and count for each loss versus no loss.

g.    Press the F9 key; new random values will be generated. Every time you press F9, think of it as a new simulation with 1000 iterations. Press F9 lots of times and you will notice that in some simulations loss events occur more than 10% of the time and in some simulations less than 10% of the time.

h.    What you are observing is the effect of randomness. Over a large number of iterations and/or simulations we would expect the loss frequency to converge to 10%.

i.    Another thing worth mentioning is that the output from the RAND() function is uniform in nature. Thus, there is equal probability of any value between 0 and 1 being drawn for any given iteration.

j.    Since our expected loss frequency is 0.1000 and the RAND() function's output is uniform in nature – we would expect roughly 10% of our iterations to result in loss; in any single simulation it will be somewhat more or somewhat less.
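For readers who prefer code to cells, the Excel walkthrough above can be sketched in a few lines of Python (the names are my own; the logic mirrors RAND() and the IF comparison):

```python
import random

def simulate(expected_frequency=0.10, iterations=1000, seed=None):
    """One simulation: for each iteration, draw a uniform random value
    (the Python analogue of Excel's RAND()) and count a loss event
    when the draw is at or below the expected loss frequency."""
    rng = random.Random(seed)
    return sum(1 for _ in range(iterations)
               if rng.random() <= expected_frequency)

# Each call is like pressing F9: a fresh simulation of 1000 iterations.
results = [simulate(0.10, 1000, seed=s) for s in range(20)]
print(min(results), max(results))  # individual simulations scatter around 100 losses
print(sum(results) / 20000)        # but the overall rate converges toward 0.10
```

Running it shows the same effect as mashing F9: each simulation wanders above or below 10%, while the average across many simulations settles near the expected loss frequency.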

There are some limitations with this method for simulating the loss frequency portion of our risk model:

1.    If the expected loss frequency is greater than 1, then using RAND() is not viable, because RAND() only generates values between 0 and 1.

2.    In iterations where you had a loss event, this method does not reflect the actual number of loss events for that iteration. In reality, there could be some iterations (or years) where you have more than one loss event.
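One common way to overcome both limitations – not necessarily the only one – is to draw the event count from a discrete distribution such as the Poisson, which can return 0, 1, 2, or more events per iteration and works for expected frequencies above 1. A minimal stdlib-only sketch (my own naming, using Knuth's algorithm):

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw a Poisson-distributed event count with mean lam
    (Knuth's algorithm). Unlike the RAND() comparison, this can
    return several loss events in a single iteration."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
draws = [poisson_draw(2.5, rng) for _ in range(10000)]
print(sum(draws) / len(draws))  # sample mean lands near 2.5
print(max(draws))               # some iterations see several loss events
```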

Some of the first models I built used this approach for generating loss frequency values. Despite its simplicity, the approach is useful. However, there are other methods to simulate loss frequency that are more appropriate for modeling and overcome the limitations listed above. In the next post, we will use random values, a discrete probability distribution and the expected loss frequency value to randomly generate loss frequency values.

NOTES / DISCLAIMERS: I am intentionally over-simplifying these modeling examples for a few reasons:
1.    To demonstrate that IT Risk modeling is achievable, even for someone who is not an actuary or modeling professional.
2.    To give a glimpse of the larger forest past some of the trees blocking our view within the information risk management profession.
3.    As with any model – simple or complex – there is diligence involved to ensure that the right probability distributions and calculations are being used, reflective of the data being modeled.
4.    In cases where assumptions are made in a model, they should be documented.

## Useful Risk Modeling / Simulation Excel Add-Ons

October 18, 2010

I have had a lot of “modeling” and simulation conversations over the last several months. One common strand of these discussions is around Excel add-ons used to perform simulations and analysis. I want to share with you a few thoughts as well as some of the add-ons I use.

THOUGHTS

1.    Add-ons are great and they can make many aspects of simulation and analysis “easy”. However, I would argue that you still need to understand “the why” and “the how” of an add-on’s features and functionality to ensure you are using it correctly. One of my modeling mentors discourages extensive use of “fancy” add-ons and prefers plain vanilla Excel, in part because most of the add-ons are not worth the money. He has also suggested that if you are forced to write your own calculations and create your own models and charts, it forces you to learn the trade – especially the probability theory and statistics. There is truth in what he says, but one has to balance learning against time.

2.    Generally speaking, most of what you need for Monte Carlo simulations is already in Excel. So, if you have the input parameters you need, know the right distribution to reference, and have modest VBA skills – you can begin to conquer the world.

3.    Add-on documentation. Any add-on or 3rd party plug-in should have ample documentation. In cases where the add-on includes its own probability distributions – understand why you would use one tool’s probability distribution implementation over any other.

TOOLS

1.    Microsoft Excel. At a minimum you want to use Excel 2007 – just to get around some of the row and column limitations of previous versions of Excel. Do not use versions of Excel prior to Excel 2003 as the random number generator was flawed. With Excel 2010, Microsoft included a new random number generator algorithm called Mersenne Twister. I won’t bore you with the details – just understand that it is a good thing and reduces the need for yet another Excel plug-in or 3rd party tool.
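As an aside, Python’s standard random module also implements the Mersenne Twister (MT19937), which makes it easy to demonstrate one practical benefit Excel’s RAND() does not offer: seeded, reproducible random streams, handy when a simulation needs to be rerun or audited.

```python
import random

# Two generators seeded identically produce the same Mersenne Twister
# stream, so a simulation run can be reproduced exactly for review.
rng1 = random.Random(12345)
rng2 = random.Random(12345)
assert [rng1.random() for _ in range(5)] == [rng2.random() for _ in range(5)]
```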

2.    SimTools. This is a recent find for me and it is FREE. The primary reason I like this add-in is for its broad implementation of inverse cumulative-probability functions that Excel 2007 does not have. Another reason I like this add-on is that the SimTool VBA code can be found here; just in case you need to validate calculations that are occurring or if your work is being audited / challenged.

3.    Simulacion. I was originally turned onto this add-on by my modeling mentor and it’s free as well, though documentation is scarce. Useful features include distribution fitting and correlation capabilities. I do not use this add-on very much – but the distribution fitting capability alone makes it worth having.

4.    RiskAmp. I was introduced to RiskAmp via Risk Management Insight (RMI). Some of RMI’s tools leverage (require) the RiskAmp add-on. RiskAmp includes a few continuous probability distributions not included in Excel. One specifically is the betaPERT distribution. RiskAmp’s implementation of betaPERT allows for a fourth parameter called lambda, which determines the height of the distribution. For what it’s worth, Oracle’s Crystal Ball betaPERT implementation does not allow for lambda. So, if betaPERT is a big deal in your analysis / simulations and you don’t need other flash-whiz-bang functionality out of Oracle Crystal Ball – save yourself over $1,000 and purchase a copy of RiskAmp. One more thing on RiskAmp – I cannot prove it yet – but I think there are some memory sharing issues with their code. If you have multiple spreadsheets open that use RiskAmp for simulations, the speed of any one simulation is painfully slow.
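For readers who want to experiment with betaPERT outside of an add-on, the standard mapping onto a scaled beta distribution is easy to sketch in Python (the function name and defaults are my own; lambda = 4 gives the classic PERT shape, and individual add-on implementations may differ in detail):

```python
import random

def betapert(minimum, mode, maximum, lam=4.0, rng=random):
    """Draw from a betaPERT distribution via a scaled beta distribution.
    The lam parameter controls how strongly draws concentrate around
    the mode; higher lam means a taller, narrower peak."""
    span = maximum - minimum
    alpha = 1.0 + lam * (mode - minimum) / span
    beta = 1.0 + lam * (maximum - mode) / span
    return minimum + rng.betavariate(alpha, beta) * span

rng = random.Random(7)
draws = [betapert(10_000, 50_000, 250_000, rng=rng) for _ in range(10000)]
print(sum(draws) / len(draws))  # sample mean near (min + 4*mode + max)/6, about 76,667
```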

5.    Mersenne Twister (MT) pseudorandom number generator. The folks at Structured Data (makers of RiskAmp) were kind enough to make an MT pseudorandom number generator add-on available for FREE. I have used this add-on in both Excel 2003 and Excel 2007. There is no need for it if you are using Excel 2010 – since Microsoft implemented the MT algorithm in its RAND() function.

6.    XLSim. This is a nifty Monte Carlo simulator that has a bunch of goodies in it that make it worth the couple of hundred dollars. There is a lot of flexibility with the chart output and the simulation menus are very intuitive. XLSim also includes the ability to store simulation results via the DIST standard. I will limit discussion of DIST for this post – but there are quite a few vendors that have already accommodated their software to create and/or work with DIST files.

7.    MegaStat. MegaStat came with one of my decision science textbooks while taking some courses at the Fisher College of Business, The Ohio State University. There is a great correlation matrix generator, the descriptive statistics functionality is great, and there are some other useful functions. While you can definitely find MegaStat for free on the Internet – consider buying a textbook from the creator of MegaStat, Professor J.B. Orris of Butler University. It’s ethically the right thing to do and it’s a small price to pay for a great utility!

8.    XY Chart Labeler. This FREE Excel add-on has nothing to do with statistics or probability distributions. I use this add-on to add labels to data points for some charts. It does not sound like much, but the functionality makes up for a shortcoming in Excel.

That’s it for now. Feel free to leave comments; especially if you have a really useful add-on or tool that you have used. I will mark as SPAM any comments that I feel are purely product pitches – with no substantive, experience-based content.