In parts one and two of this series we looked at two methods for simulating loss frequency. Method one – while useful – has shortcomings, as it primarily requires working with expected loss frequency values less than 1 (once per year). In addition, with method one it was not possible to identify iterations where loss frequency could be greater than once per year.
Method two overcame these limitations. We leveraged the Poisson probability distribution (a discrete distribution), an expected loss frequency value, and a random value between 0 and 1 to return a loss event count (an integer) for any given iteration. Using this method, about 10% of our iterations resulted in loss events, and some of those iterations had multiple loss events. From my perspective, method two is the more useful of the two – especially since it can account for low-probability situations where there are numerous loss events in a single simulation iteration.
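As a refresher, method two's Poisson draw can be sketched in a few lines of Python. The helper name `poisson_inverse` is mine, and the inverse-transform walk shown here is one standard way to implement the idea, not necessarily the exact mechanics of the earlier spreadsheet:

```python
import math
import random

def poisson_inverse(expected_frequency: float, u: float) -> int:
    """Walk the Poisson CDF term by term until it exceeds the random draw u;
    the number of steps taken is the simulated loss event count."""
    k = 0
    p = math.exp(-expected_frequency)  # P(k = 0)
    cdf = p
    while u > cdf:
        k += 1
        p *= expected_frequency / k    # P(k) from P(k - 1)
        cdf += p
    return k

# With an expected frequency of 0.1, most iterations return 0 events,
# but some produce 1 or more losses.
random.seed(1)
draws = [poisson_inverse(0.1, random.random()) for _ in range(1000)]
```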
The purpose of this post is to simulate loss magnitude. Conceptually, we are going to do what we did with loss frequency method two – but our distribution and input parameters will differ. To simulate loss magnitude we need four things:
1. A continuous probability distribution.
2. A random value between 0 and 1.
3. An expected or average loss magnitude.
4. A loss magnitude standard deviation.
Continuous Probability Distribution. Ideally, if you have internal or external loss magnitude data, you would analyze that data and fit it to an appropriate continuous probability distribution; there are dozens of such distributions. Often, though, we have limited data or need to make good faith (or “educated”) assumptions about the shape of our loss magnitude curve. Loss magnitude curves for IT risk scenarios are often assumed to be normal or lognormal. The normal distribution has its limitations: it can produce negative values, and rarely is there a “perfect” normal loss magnitude curve for an IT risk scenario. However, many “normal-like” distributions converge to normal as data points increase. Thus, for purposes of demonstration I am going to use the normal distribution.
Random Value Between 0 and 1. Because we are dealing with uncertainty and a distribution, we will use random values between 0 and 1 in our probability distribution; think Excel function RAND().
Expected or Average Loss Magnitude. Statistics 101: if you take the sum of N values and divide by N, you get the average. Quantitative risk analysis methodologies like FAIR can facilitate deriving an average loss magnitude estimate, or you may have actual loss magnitude data points. How you derive average loss magnitude is not the focus of this post – just remember that to use the normal distribution you need that average loss magnitude value.
Loss Magnitude Standard Deviation. More Statistics 101. At a high level, standard deviation is a measure of how spread out our data points are relative to the mean. The larger the number, the wider and flatter our distribution (think bell curve) will be; the smaller the number, the narrower the bell curve will be. In the interest of brevity, it is assumed that either you can use existing Excel functions to calculate a standard deviation from your loss magnitude data points, or your risk analysis tool sets will provide this value to you. In some cases you may not have actual data sets from which to calculate a standard deviation, let alone an average magnitude value – in those cases we have to make our best estimates and document assumptions accordingly.
How do these work together? In layman’s terms – given a normal loss distribution with an average loss magnitude of $5000 and a standard deviation of $1000; what is the loss value (inverse cumulative value) at any point in the distribution, given a random probability value?
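That question can be checked in two lines using Python's standard library: `statistics.NormalDist.inv_cdf` performs the same inverse cumulative calculation, and the variable names below are mine:

```python
from statistics import NormalDist

dist = NormalDist(mu=5000, sigma=1000)  # average loss $5000, sd $1000
print(dist.inv_cdf(0.5))   # the 50th percentile of a normal is its mean
print(dist.inv_cdf(0.95))  # a high random draw maps to a larger loss
```

Feed it a low probability and you get a loss well below the mean; feed it a high probability and you get one well above it.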
You may want to download this Excel spreadsheet to reference for the rest of the post (it should work in Excel 2003, Excel 2007 and Excel 2010; I have not tested it on Office for Mac). Reference tab “magnitude” and make sure you view it in Excel and NOT in Google Apps.
a. The average loss magnitude is $5000 (cell B1; tab “magnitude”).
b. The loss magnitude standard deviation is $1000 (cell B2; tab “magnitude”).
c. For purposes of demonstration, we will number some cells to reflect the number of iterations (A9:A1008; A9=1; A10=A9+1; drag A10 down until you reach 1000 in cell A1008).
d. In Excel, we would use the =RAND() function to generate the random values in cells B9:B1008.
e. Now, in column C beginning in cell C9, we are going to combine a normal probability distribution with our average loss ($B$1), our standard deviation ($B$2) and the random value(s) in column B to return a loss value. In other words, given a normal distribution with mean $5000 and standard deviation $1000 – what is the value of that distribution at a given random probability between 0 and 1, rounded to the nearest 10? You would type the following in C9 and then drag C9 down to C1008:

=ROUND(MAX(NORMINV(B9,$B$1,$B$2),0),-1)
Let’s dissect this formula.
i. ROUND. I am going to round the output of this formula to the nearest 10; indicated by the -1 argument.
ii. MAX. Because we are using the normal distribution, some values could be less than zero, which is not applicable for most IT scenarios. So we compare the value generated by the NORMINV function to 0; whichever is larger is the value that then gets rounded to the nearest 10.
iii. NORMINV. This is the function built into Excel that returns an inverse cumulative value of a normal distribution given a probability, a mean and a standard deviation: NORMINV(probability, mean, standard_dev).
f. Once you have values in all the cells – hit F9 a few times.
g. Cell B3 gives the minimum loss value from cells C9 through C1008. The random value associated with the minimum value is probably less than 0.00xxxx.
h. Cell B4 gives the maximum loss value from cells C9 through C1008. The random value associated with the maximum value is probably greater than 0.99xxxx.
i. The histogram shows the count of iterations whose loss magnitude values fall within each loss magnitude bin. If you drew a line around the tops of the columns it would resemble a bell curve. We expect this since we are using the normal distribution.
j. Press the F9 key; new random values will be generated. Every time you press F9 think of it as a new simulation with 1000 iterations. Press F9 lots of times and you will notice that the histogram changes as well. While individual bin counts will change – the general shape of the histogram does not.
k. By the way, if you change the average loss magnitude value in cell B1, the histogram will probably break. But you can change the value in B2 to 500, hit F9 a few times and observe how the bell-curve shape becomes narrower. Or, change B2 to 2000 and you will see a much flatter bell curve.
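For readers outside Excel, the worksheet logic above can be sketched in Python. `statistics.NormalDist.inv_cdf` plays the role of NORMINV; the function name `simulated_loss` and the $1000-wide histogram bins are my choices, not part of the workbook:

```python
import random
from collections import Counter
from statistics import NormalDist

def simulated_loss(u: float, mean: float = 5000.0, sd: float = 1000.0) -> float:
    """Mirror of the worksheet formula: inverse cumulative normal,
    floored at zero, rounded to the nearest 10."""
    value = NormalDist(mu=mean, sigma=sd).inv_cdf(u)
    return round(max(value, 0.0), -1)

# One "simulation" of 1000 iterations, like rows 9 through 1008.
random.seed(42)  # fixed seed so the run is repeatable; F9 would reseed
losses = [simulated_loss(random.random()) for _ in range(1000)]

# Count iterations per $1000-wide loss magnitude bin, like the histogram tab.
bins = Counter(int(loss // 1000) * 1000 for loss in losses)
```

Rerunning with a different seed is the equivalent of pressing F9: individual bin counts move around, but the bell shape stays put.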
A couple of takeaways from this exercise:
1. As we did with simulating loss frequency, we leverage randomness to simulate loss magnitude.
2. While we typically talk about an average loss magnitude value, losses can range in magnitude. Being able to work within a range of loss values gives us a more complete view of our loss potential.
In part four of the series, we will combine loss frequency and loss magnitude into one simulation. For every iteration, we will randomly derive a loss frequency value (an integer) and a loss magnitude value. We will then calculate an expected loss, which is the product of the loss frequency and loss magnitude values. Perform this cycle thousands or millions of times and you have an expected loss distribution.
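As a preview, one iteration of that combined simulation might look like the sketch below. Here `NormalDist.inv_cdf` stands in for NORMINV, the Poisson draw is the method two approach, and the function names and the 10,000-iteration count are illustrative choices of mine:

```python
import math
import random
from statistics import NormalDist

dist = NormalDist(mu=5000, sigma=1000)  # average loss and standard deviation

def poisson_draw(lam: float, u: float) -> int:
    """Inverse-transform draw from a Poisson distribution (method two)."""
    k = 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def iteration_loss(lam: float = 0.1) -> float:
    """One iteration: a random loss frequency times a random loss magnitude."""
    frequency = poisson_draw(lam, random.random())
    magnitude = max(dist.inv_cdf(random.random()), 0.0)
    return frequency * magnitude

random.seed(7)
losses = [iteration_loss() for _ in range(10_000)]
# The per-iteration expected losses collectively form the loss distribution.
```

Most iterations contribute zero (no loss event); the nonzero ones fill out the right-hand tail of the distribution.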