First, I would like to welcome Jack Jones *back* to the world of risk blogging. Jack blogged a few weeks ago on the subject of heat maps (“Lipstick on Pigs” and “Lipstick Part II”), prompting a response from Jared Pfost of Third Defense. These are great posts that underscore the need to structure and leverage heat maps in an effective and defensible manner.
The purpose of this post is to share a recent “ah hah” moment involving heat maps and loss distributions. Whether or not you are an advocate of risk quantification or simulation modeling, it is hard to criticize someone for having tools or procedures in place that essentially serve as a “risk sniff test”. I consider reconciling portions of a loss distribution against a heat map a pretty useful sniff test.
QUESTION TO BE ANSWERED: How do I reconcile – or validate – the plotting of a heat map bubble with a loss distribution?
Well, it depends…but let’s establish some context.
• 5-by-5 heat map. The X axis of the heat map represents “Frequency of Loss”; the Y axis represents “Magnitude of Loss”. Each axis is broken into 5 sections.
• Let’s say we have a heat map whose bubbles represent categories of risk issues (ISO 27002 categories, BASEL II OpRisk categories, etc…).
• At a minimum, all of these issues have been assessed with some methodology and/or tool (I prefer FAIR) that allows us to associate a frequency and magnitude of loss with each and every issue.
• We can perform thousands of simulation iterations for each and every issue in the risk repository, perform analysis, determine categories of risk that are contributing the most to various percentiles of the loss distribution, and then associate them with a heat map.
• For the purpose of this post we are going to make a good faith assumption that our loss distribution resembles a log-normal or normal “like” distribution.
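As a minimal sketch of what the per-issue simulation might look like, here is one way to draw an annual event count and per-event loss magnitude for each issue and sum them. The distributions (Poisson frequency, lognormal magnitude), parameters, and issue names are purely illustrative assumptions on my part, not values from any real repository:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_annual_loss(freq_mean, loss_mu, loss_sigma, iterations=10_000):
    """Simulate `iterations` years of loss for a single risk issue.
    Frequency is drawn from a Poisson process and per-event magnitude
    from a lognormal -- illustrative choices, not a prescribed model."""
    counts = rng.poisson(freq_mean, size=iterations)
    losses = np.zeros(iterations)
    for i, n in enumerate(counts):
        if n:
            losses[i] = rng.lognormal(loss_mu, loss_sigma, size=n).sum()
    return losses

# Two hypothetical issues: one frequent/low-severity, one rare/high-severity
routine = simulate_annual_loss(freq_mean=12.0, loss_mu=8.0, loss_sigma=0.5)
bcm = simulate_annual_loss(freq_mean=0.05, loss_mu=15.0, loss_sigma=1.0)
total = routine + bcm
print(np.percentile(total, [50, 90, 99]))
```

Repeating this for every issue in the repository gives you the per-iteration losses you need to attribute percentiles of the aggregate distribution back to categories.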
Back in my “Rainbow Risk” post I shared an example of a “rainbow chart”: a 100% stacked bar chart representing the percentage of loss that each category contributes to any given loss distribution percentile. For example, the rainbow chart in that post showed that the Business Continuity Mgmt category of risk issues accounted for about 55% of the risk at the 99th percentile. On a heat map, the most significant IT business continuity issues are probably going to be very low frequency, very high magnitude events. Thus, it is fairly intuitive that very low frequency / very high magnitude loss events would “drive” the tail of a given loss distribution.
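A minimal sketch of how those contribution percentages might be computed: here “contribution” is taken as each category’s average share of total loss among the simulation iterations at or above a given percentile. The category names and simulated numbers below are hypothetical placeholders:

```python
import numpy as np

def category_contribution(losses_by_cat, percentile):
    """Share of total loss attributable to each category, computed over
    the simulation iterations at or above the given percentile of total loss."""
    cats = list(losses_by_cat)
    matrix = np.column_stack([losses_by_cat[c] for c in cats])
    total = matrix.sum(axis=1)
    tail = matrix[total >= np.percentile(total, percentile)]
    return dict(zip(cats, tail.sum(axis=0) / tail.sum()))

rng = np.random.default_rng(7)
sims = {
    # Rare but severe: a loss event in roughly 2% of simulated years
    "Business Continuity Mgmt": np.where(
        rng.random(10_000) < 0.02, rng.lognormal(15, 1, 10_000), 0.0),
    # Routine, moderate losses every year
    "Access Control": rng.lognormal(9, 0.6, 10_000),
}
print(category_contribution(sims, 99))
```

Run across several percentiles, the resulting shares are exactly the stacked-bar segments of a rainbow chart.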
In the images below, I have mapped areas on a heat map (image 1) to areas on a distribution (image 2). Specifically, I am trying to illustrate how the frequency and magnitude of any given issue factor into, and where they are most likely represented in, a loss distribution.
Area A – Very low frequency, very high magnitude risk issues. These types of events or risk issues drive the tail portion of a loss distribution.
Area B – Very low frequency, moderate or high magnitude risk issues; or low to moderate frequency, very high magnitude loss events. These types of issues also drive the tail, though perhaps not as far past the 99th percentile as the issues associated with Area A.
Area C – Low to Moderate frequency, moderate or high magnitude. These issues are best represented in the middle of the distribution; generally speaking, around one standard deviation on both sides of the mean.
Area D – Very frequent, moderate or high magnitude. Loss associated with these issues is not as severe as that of Areas A and B, but is typically greater than the mean expected loss.
Area E – Very frequent, very high magnitude. Generally speaking, these issues probably drive the portion of the distribution between 1 and 2.5 standard deviations (to the right of the mean).
Area F – Low or moderate frequency, low or moderate magnitude. These issues best factor into the area of the distribution left of the mean. Loss associated with these issues is less than the mean.
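The area descriptions above overlap slightly at their boundaries, so any encoding on the 5-by-5 grid involves a judgment call. As one plausible, assumed reading (bin 1 = very low, bin 5 = very high; boundary assignments such as giving magnitude bin 3 to Area C rather than F are my assumptions):

```python
# One plausible encoding of Areas A-F on the 5-by-5 grid.
AREA = {(1, 5): "A", (5, 5): "E"}
for m in (3, 4):
    AREA[(1, m)] = "B"          # very low freq, moderate/high magnitude
    AREA[(5, m)] = "D"          # very high freq, moderate/high magnitude
for f in (2, 3):
    AREA[(f, 5)] = "B"          # low/moderate freq, very high magnitude
    for m in (3, 4):
        AREA[(f, m)] = "C"      # low/moderate freq, moderate/high magnitude
    for m in (1, 2):
        AREA[(f, m)] = "F"      # low/moderate freq, lower magnitude

def area_for(freq_bin, mag_bin):
    """Return the area letter for a heat-map cell, or '?' for cells
    not described in the text."""
    return AREA.get((freq_bin, mag_bin), "?")
```

With a lookup like this, an issue binned at very low frequency and very high magnitude lands in Area A, which is exactly the tail-driver profile described above, and you can cross-check each bubble’s grid position against the portion of the loss distribution its category actually drives.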
In closing, I will share at least one use case for performing this analysis or validation: key risk heat maps. If all of your issues have frequency and magnitude values, as well as some other attributes associated with them, you can:
1. Perform simulations on all of these issues.
2. Calculate their contributions to various distribution percentiles.
3. Analyze the results by various attributes (ISO 27002, BASEL II, IT Process, etc…).
4. Chart the derived information (categories of risk) on a heat map.
5. Review for plausibility / accuracy (this should occur all the time).
I welcome any feedback!