Deconstructing Some HITECH Hype

February 23, 2011

A few days ago I began analyzing some model output and noticed that the amount of loss exposure for ISO 27002 section “Communications and Operations Management” had increased by 600% in a five week time frame. It only took a few seconds to zero-in on an issue that was responsible for the increase.

The issue was related to a gap with a 3rd party of which there was some Health Information Technology for Economic and Clinical Health Act (HITECH) fine exposure. The estimated HITECH fines were really LARGE. Large in the sense that the estimates:

a.    Did not pass the sniff test
b.    Could not be justified based off any documented fines / or statutes.
c.    From a simulation perspective were completely skewing the average simulated expected loss value for the scenario itself.

I reached out to better understand the rationale of the practitioner who performed the analysis and after some discussion we were in agreement that some additional analysis was warranted to accurately represent assumptions as well as refine loss magnitude estimates – especially for potential HITECH fines. About 10 minutes of additional information gathering yielded valuable information.

In a nutshell, the HITECH penalty structure is a tiered system that takes into consideration the nature of the data breach, the fine per violation and maximum amounts of fines for a given year. See below (the tier summary is from link # 2 at the bottom of this post; supported by links # 1 and 3):

Tier A is for violations in which the offender didn’t realize he or she violated the Act and would have handled the matter differently if he or she had. This results in a $100 fine for each violation, and the total imposed for such violations cannot exceed $25,000 for the calendar year.

Tier B is for violations due to reasonable cause, but not “willful neglect.” The result is a $1,000 fine for each violation, and the fines cannot exceed $100,000 for the calendar year.

Tier C is for violations due to willful neglect that the organization ultimately corrected. The result is a $10,000 fine for each violation, and the fines cannot exceed $250,000 for the calendar year.

Tier D is for violations of willful neglect that the organization did not correct. The result is a $50,000 fine for each violation, and the fines cannot exceed $1,500,000 for the calendar year.

Given this information and the nature of the control gap – one can quickly determine the penalty tier as well as estimate fine amounts. The opportunity cost to gather this additional information was minimal and the benefits of the additional analysis will result  in not only more accurate and defendable analysis – but also spare the risk practitioner from what would have been certain scrutiny from other IT risk leaders and possibly business partner allegations of IT Risk Management once again “crying wolf”.

Key Take-Away(s)

1.    Perform sniff tests on your analysis; have others review your analysis.
2.    There is probably more information then you realize about the problem space you are dealing with.
3.    Be able to defend assumptions and estimates that you make.
4.    Become the “expert” about the problem space not the repeater of information that may not be valid to begin with.

Links / References associated with this post:

1.    HIPAA Enforcement Rule ref. HITECH <- lots of legalese
2.    HITECH Summary <- less legalese
3.    HITECH Act scroll down to section 13410 for fine information <-lots of legalese
4.    Actual instance of a HITECH-related fine
5.    Interesting Record Loss Threshold Observation; Is 500 records the magic number?

Advertisements

Simple Risk Model (Part 4 of 5): Simulating both Loss Frequency & Loss Magnitude

February 5, 2011

Part 1 – Simulate Loss Frequency Method 1
Part 2 – Simulate Loss Frequency Method 2
Part 3 – Simulate Loss Frequency Method 3

In this post we want to combine the techniques demonstrated in parts two and three into a single simulation. To accomplish this simulation we will:

1.    Define input parameters
2.    Introduce VBA code – via a macro – that consumes the input parameters
3.    Perform functions within the VBA code
4.    Take the output from functions and store them in the spreadsheet
5.    Create a histogram of the simulation output.

Steps 3 & 4 will be performed many times; depending on the number of iterations we want to perform in our simulation.

You can download this spreadsheet to use as a reference throughout the post. The spreadsheet should be used in Excel only. The worksheets we are concerned with are:

test – This worksheet contains code that will step through each part of the loss magnitude potion of the simulation. By displaying this information, it allows you to validate that both the code and calculations are functioning as coded. This tab is also useful for testing code in small iterations. Thus, the number of iterations should be kept fairly low (“test”; B1).

prod – Unlike the “test” tab, this tab does not display the result of each loss magnitude calculation per iteration. This is the tab that you would want to run the full simulation on; thousands of iterations.

Here we go…and referencing the “prod” worksheet…

Input Parameters.
Expected Loss Frequency. It is assumed for this post that you have estimated or derived a most likely or average loss frequency value. Cell B2 contains this value. The value in this cell will be one of the input parameters into a POISSON probability distribution to return an inverse cumulative value (Part 2 of this Series).

Average Loss Magnitude. It is assumed for this post that you have estimated or derived a most likely or average loss magnitude value. Cell B3 contains this value. The value in this cell will be one of the input parameters into a NORMAL probability distribution to return an inverse cumulative value (Part 3 of this Series).

Loss Magnitude Standard Deviation. It is assumed for this post that you have estimated or derived the standard deviation for loss magnitude. Cell B4 contains this value. The value in this cell will be one of the input parameters into a NORMAL probability distribution to return an inverse cumulative value (Part 3 of this Series).

The Simulation.
On the “prod” tab, when you click the button labeled “Prod” – this will execute a macro composed of VBA code. I will let you explore the code on your own – it is fairly intuitive. I also left a few comments in the VBA so I remember what certain sections of the code are doing. There are four columns of simulation output that the macro will generate.

Iter# (B10). This is the iteration number. In cell B1 we set the number of iterations to be 5000. Thus, the VBA will cycle through a section of its code 5000 times.

LEF Random (C10). For each iteration, we will generate a random value between 0 and 1 to be used in generating a loss frequency value. Displaying the random value in the simulation is not necessary, but I prefer to see it so I can informally analyze the random values themselves and gauge the relationship between the random value and the inverse cumulative value in the next cell.

LEF Value (D10). For each iteration, we will use the random value we generated in the adjacent cell (column c), combine it with the Expected Loss Frequency value declared in B2 and input these values as parameters into a POISSON probability distribution that returns an inverse cumulative value. The value returned will be an integer – a whole number. Why a whole number? Because you can’t have half a loss event – just like a woman cannot be half pregnant ( <- one of my favorite analogies). This is a fairly important concept to realize from a loss event modeling perspective.

Loss Magnitude (E10). For each iteration, we will consume the value in the adjacent cell (column D) and apply logical rules to it.

a.    If the LEF Value = 0, then the loss magnitude is zero.
b.    If the LEF Value > 0, then for each instance of loss we will:
1.    Generate a random value
2.    Consume the average loss magnitude value in cell B3
3.    Consume the loss magnitude standard deviation in cell B4
4.    Use the values referenced in 1-3 as input parameters into a Normal probability distribution and return an inverse cumulative value. In other words, given a normal distribution with mean $2000 and standard deviation of $1000 – what is the value of that distribution point given a random value between 0 and 1.
5.    We will add all the instances of loss for that iteration and record the sum in column E.

Note: Steps 4 and 5 can be observed on the “test” worksheet by clicking the button labeled “test”.

The code will continue to loop until we have completed the number of iterations we specified in cell B1.

The Results. Now that the simulation is complete we can begin to analyze the output.

# of Iterations With No Loss (B5). This is the number of iterations where the returned inverse cumulative value was zero.

# of Iterations With Loss (B6). This is the number of iterations where the returned inverse cumulative value was greater than zero.

# of Loss Events (B7). This is the sum of loss events for all the iterations. There was some iteration where there was more then one loss event.

Max. # of Loss Events for an iteration (B8). This is the maximum number of loss events for any given iteration.

Next, let’s look at some of the simulation output in the context of loss severity; $.

Min. Loss (K6). This is minimum loss value returned from the simulation. I round the results to the nearest hundred in the worksheet.

Max. Loss (K7). This is maximum loss value returned from the simulation. I round the results to the nearest hundred in the worksheet.

Median (G5). This is the 50th percentile of the simulation results. In other words, 50% of the simulations results were equal to or less then this value.

Average (G6). This is the average loss value for the simulation. This is the quotient of summing all the loss magnitude values and dividing by the number of iterations. This value can quickly be compared to the median to make inferences about the skew of the simulation output.

80th % (G7). This is the 80th percentile of the simulation results. In other words, 80% of the simulations results were equal to or less then this value. In some industries, this is often referred to as the 1-in-5 loss.

90th % (G8). This is the 90th percentile of the simulation results. In other words, 90% of the simulations results were equal to or less then this value. In some industries, this is often referred to as the 1-in-10 loss.

95th % (G9). This is the 95th percentile of the simulation results. In other words, 95% of the simulations results were equal to or less then this value. In some industries, this is often referred to as the 1-in-20 loss.

99th % (G10). This is the 99th percentile of the simulation results. In other words, 99% of the simulations results were equal to or less then this value. In some industries, this is often referred to as the 1-in-100 loss.

Note 2: Generally speaking, the 95th, 99th and greater percentiles are often considered as being part of the tail of the loss distribution. I consider all the points in cells G5:G10 to be useful. For some loss exposures, the median and average values are more than enough to make informed decisions. For some loss exposures; the 80th, 90th, 95th, 99th and even larger percentiles are necessary.

Simulated Loss Magnitude Histogram. A histogram is a graphical representation showing the distribution of data. The histogram in the “prod” worksheet represents the distribution of data for all iterations where the loss was greater than zero.

Wrap Up. What I have presented in this post is a very simple model for a single loss exposure using randomness and probability distributions. Depending on your comfort level with VBA and creativity, one can easily build out more complex models; whether it is hundreds of loss exposures you want to model for or just a few dependent loss exposures.