Sanity Check – Managing Risk to the “Unexpected Loss”

November 18, 2008

Question: Can you justify proposing a budget that manages risk to the “unexpected loss” event level? If so, would your company even be able to operate?

On Monday, I wrote a rebuttal to a blog post by Stuart King regarding the risk assessment posts I wrote a few weeks back. I want to take a moment to follow up on a comment Stuart left on my blog – specifically: “…I suppose that the “right” way is whatever way works best in your present circumstances: fact of the matter is that even given the best set of data, we usually get broadsided by something unexpected.”

I do not think that Stuart was referring to “unexpected loss” in the sense that an actuarial or enterprise risk management professional does. “Unexpected loss” in that context is usually measured in the >99.x% area of the risk curve: extreme tail risk – what would probably be a death blow to an F100 company, usually measured in the tens (maybe hundreds) of billions of dollars of unexpected loss in a single year.
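To make the distinction concrete, here is a minimal Python sketch of the quantile-minus-mean framing of unexpected loss. This is my own illustration – the lognormal distribution and its parameters are invented for demonstration, not drawn from any actuarial standard.

import random

# Illustrative only: simulate 100,000 hypothetical annual-loss outcomes.
# The lognormal parameters are made up for demonstration purposes.
random.seed(42)
annual_losses = sorted(random.lognormvariate(11, 2) for _ in range(100_000))

expected_loss = sum(annual_losses) / len(annual_losses)   # the mean
q999 = annual_losses[int(0.999 * len(annual_losses))]     # 99.9th percentile
unexpected_loss = q999 - expected_loss                    # the tail beyond the mean

print(f"Expected (mean) annual loss: ${expected_loss:,.0f}")
print(f"99.9th percentile loss:      ${q999:,.0f}")
print(f"Unexpected loss (the gap):   ${unexpected_loss:,.0f}")

Run it and the 99.9th percentile dwarfs the mean – which is exactly why budgeting to that level is a non-starter for most companies.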

I do think that Stuart was referring to losses that creep up and bite an organization for one of two reasons:

1.    An event that results in a loss that truly could not have reasonably been detected, prevented, or responded to in a way that would have avoided significant loss. It is important to state that the use of the word “reasonably” is in the context of the company experiencing the loss – not all the naysayers who sit back and proclaim they would never have missed such a gap.

2.    A loss where effective risk management is not occurring.

From a risk assessment perspective, I would break the two reasons above into three different types of loss scenarios:

1.    A significant loss independent of any identified risks. Meaning: I was not aware we had an asset (or process) with a vulnerability that could result in a significant loss to our company.

2.    A significant loss associated with an existing risk issue where some risk component changed after the original assessment, increasing its vulnerability and thus its loss event frequency (this assumes the risk was assessed with a sound methodology to begin with).

3.    A significant loss associated with an existing risk issue where the actual loss magnitude exceeded previous estimates (this assumes the risk was assessed with a sound methodology that factors in some form of monetary loss magnitude).

Question: If I managed risk to the “unexpected loss” event level – would I ever suffer a loss?

Now from my “risk chair quarterback” position, I would say that most CISOs, CROs, and information security managers cannot request funds that are unreasonable. Some big government agencies aside, most companies and small businesses are tightening their belts when it comes to IT security spend. So to get dollars above and beyond paying your employees, training them, maintaining existing capabilities, and maybe even improving a few capabilities, you have to justify it. I think it would be very hard to defend a budget increase under the umbrella of “I need to manage risk to a level where unexpected losses are accounted for”. I would get laughed out of my current job if I ever suggested such a thing – and I do not even have a budget to manage.

With all that said, I do think there are some ways to reduce the chances of an “unexpected loss” (in the info sec context), and it comes down to effective risk management.

1.    Regarding loss type one above, ask yourself: “Do I know what my key risks are?” If you are reading this, work in information security in a leadership or influencer position, and cannot answer this question…uh oh. Once in a while you have to take a step back, assess the “risk landscape”, and determine where you are relative to it. This will not catch all “unexpected loss events” but it will no doubt catch some. Performing this exercise could be the justification needed to request extra budget dollars to improve existing – or implement new – preventive, detective, or response security controls. And by the way, you will probably have to do some form of risk assessment on newly identified risks. I do not recommend assigning a HIGH, MEDIUM, or LOW risk. Estimate the risk in dollars and determine whether the control mitigates the risk down to an acceptable level (see the sketch after this list).

2.    Regarding loss types two and three above: effective unmitigated risk issue management. The reality is that we document a lot of risk issues that other people are responsible for mitigating if a decision is made not to assume the risk. Those “risk assumed” issues need periodic follow-up, and part of the follow-up should be to reassess the issue: maybe something has changed in your environment that increases or decreases the loss event frequency or loss magnitude. The beauty of this exercise is that it forces someone to become better tuned to the assumptions they made when the issue was first assessed, and it serves as an indicator of how accurate their previous estimates were. A self-sharpening knife concept.
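To illustrate the “estimate the risk in dollars” advice from point one, here is a minimal sketch. Every number in it is a hypothetical placeholder; the point is that a dollar figure – unlike a HIGH, MEDIUM, or LOW label – can be defended in a budget conversation and re-run when your assumptions change (which is exactly the periodic reassessment from point two).

# Minimal sketch: a dollar-based risk estimate. All numbers are placeholders.

def annualized_loss(loss_event_frequency, loss_magnitude):
    """Expected annual loss = how often it happens x what each event costs."""
    return loss_event_frequency * loss_magnitude

lef = 1 / 5            # hypothetical: one loss event every five years
magnitude = 250_000    # hypothetical: dollar cost per loss event

baseline = annualized_loss(lef, magnitude)
# A proposed control is believed to cut event frequency by 80%.
mitigated = annualized_loss(lef * 0.20, magnitude)

print(f"Baseline annualized loss: ${baseline:,.0f}")
print(f"With proposed control:    ${mitigated:,.0f}")
print(f"Risk reduction per year:  ${baseline - mitigated:,.0f}")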

So in this post, I have commented on the concept of unexpected loss in the context of enterprise risk management (ERM) and information security risks. A lot of enterprise risk management teams classify “information security risks” as operational risk. For the information security risk professionals out there: make a new friend this week with someone in your enterprise risk management group, or in any other group that manages risk associated with investments, product risk, etc.

For a brief glimpse into this fascinating aspect of ERM take a look at this white paper: http://www.google.com/search?hl=en&q=%22unexpected+loss%22+%22jonathan+davies%22&btnG=Google+Search&aq=f&oq=


Stuart King – Risk Assessment Rebuttal

November 17, 2008

Stuart King over at ComputerWeekly.com is not complimentary about my recent risk assessment blog post. I am happy that Stuart reads this blog, and to his credit he helped welcome me to the blogosphere a few months back.

When I first read his take on my assessment post, I have to admit that I wanted to reach toward the screen with an open hand and ask him to choke himself – but then I flashed back to work and happiness. (Yes, I was a Marine, and my idea of humor is probably a lot different than most folks’.)

After reading Stuart’s post a few more times, I see a significant difference between his idea of a risk assessment and mine. Simply put, I believe in performing “risk assessments”, while Stuart believes in doing “vulnerability assessments”.

The LOW HANGING FRUIT objection. Stuart implies that the approach I use is too time consuming – especially given the length of the post. What Stuart does not mention is that I address this misconception at the end of my post. He also did not state that the assessment I posted – and future assessments – are meant to be training tools for those not familiar with a formal risk assessment approach, especially FAIR.

Next objection: using simple language that the business can understand. I agree with this comment. Most of the assessment analysis that I posted is more technical, given the audience that I know reads this blog. Again though, at the end of the assessment I provided a three-sentence business / decision maker summary.

The most important objection. Stuart states in his blog entry that his risk assessment approach – what I will call “keep it simple” – is:

1.    List the threats.
2.    State the level of vulnerability.
3.    List operational costs and potential revenue hits.
4.    Describe controls and options.
5.    Write up who needs to do what; keep track of time.
6.    Slap on a high, medium or low qualitative risk label.

Stuart – you have just completed a vulnerability assessment – you are crying WOLF. You are not taking into consideration how often the asset with that vulnerability is getting attacked, let alone how often you experience a loss because of a successful attack. Risk assessments take this into consideration.

As for HIGH, MEDIUM, or LOW – qualitative labels may be a good starting point, but at the end of the day they are still representative of some loss magnitude. Stick it out there and associate a cost with the risk you are trying to explain, versus doing a “wet finger in the wind” gut-feeling check.

I welcome any feedback on my blog entries, and I especially enjoy defending what I believe is a solid approach to a much sought-after discipline within our profession.


Security Template Exception (part 2) – The Assessment

November 6, 2008

In “The Scenario” I laid out the scenario whose risk we want to assess: simply put, a rather routine Windows security template exception. Using the RMI FAIR Basic Risk Assessment Guide (BRAG) as our guide, let’s jump in.

Note: In the interest of brevity, I will try to strike an appropriate balance between descriptiveness and conciseness when characterizing scenario components. When in doubt, err on the side of what may seem like too much documentation. Not only does it make the assessment more defensible now – it also helps in the future if you have to revisit it or need to compare a similar scenario against it.

1.    Identify the Asset(s) at Risk: A non-redundant, non-highly-available Windows 2003 Active Directory member server that runs a sales tracking application, used infrequently by CSRs to retrieve detailed sales order information for customers. The most likely threat scenario is non-availability of the server to the business (the CSRs) due to the TCOMM identified below leveraging 3rdpartysalesapp.exe to fill up the server’s hard disks with useless data.

2.    Identify the “Threat Community” (TCOMM): This is always an interesting discussion since there can be multiple threats that can come into contact and attempt to take action against an asset. For this scenario – the first two that come to mind are malware and a malicious server administrator; I will only perform the assessment using one of them.

a.    Malware. Based on the given information, I am less inclined to stick with malware because the server is very isolated from the attack vectors by which one would most expect malware to propagate (email, Internet browsing, an outbreak in less-trusted network space, etc.). Now please understand, I am not stating that this server cannot be attacked by malware; I am stating that a malware infection on this server has a lower probability of occurring than an attack from my other threat community.

b.    Malicious Server Administrator (SA). I am choosing this TCOMM as the most likely for several reasons:

i.    The server is not accessible from the Internet, which reduces the chances of attack from the traditional “hacker” TCOMM.
ii.    It is reasonable to assume that most Initech Novelty, Inc. end users that interface with the application running on the server do not have privileged knowledge of the security configuration of the server.
iii.    Based on Initech company history there has been at least one incident of a malicious technical insider attack (Initech, Inc.).
iv.    I would characterize my TCOMM as “an internal, privileged, professional, technical server administrator”.

3.    Threat Event Frequency (TEF): TEF is the probable frequency, within a given timeframe, that a threat agent will act against an asset. For this step, I am going to select LOW, or once every 10 years. Here is why:

a.    There was an incident in 1999 where a malicious internal employee was able to successfully take action against Initech. The circumstances then were different from this scenario, but it gives us a starting point from an analysis perspective.

b.    In general, SAs are pretty trustworthy individuals. Initech Novelty, Inc. is a small company with minimal IT staff. It is reasonable to assume that most of them would have neither reason nor intent to intentionally bring down a production server. From a scenario perspective, there is nothing stated that should lead one to assume any of the existing SAs has a reason to take malicious action.

c.    Initech, Inc. has already been assessed by a 3rd party against ISO 27002 and given a CMM score of around 3.5 for the “human resource security management” section. An assessor could assume that Initech, Inc. is performing a good level of due diligence to ensure it hires trustworthy individuals, and that deterrents are in place to minimize malicious behavior (potentially a combination of preventive and detective controls: policy, monitoring, training, etc.).

*NOTE – It may make more sense to skip to Step Five and then come back to Step Four.

4.    Threat Capability (TCAP): The probable level of force that a threat agent (within a threat community) is capable of applying against an asset. Keep in mind that we are focused on the TCOMM in the context of a weakened security template that allows a non-Windows-provided executable to write to “%systemroot%\system32” – not the threat population. For this step I am selecting MODERATE; between 16% and 84% of the threat community is capable of applying force against the server and the vulnerability in focus for this scenario. Here is my reasoning:

a.    The TCOMM would have unfettered and privileged access to the server and be able to easily launch an attack.

b.    The TCOMM would most likely have privileged knowledge of the weakened security template.

c.    It would not take much effort for the TCOMM to find or quickly create a method to exploit the vulnerability.

5.    Control Resistance (CR; aka Control Strength): The expected effectiveness of controls, over a given timeframe, as measured against a baseline level of force. The baseline level of force in this case is the greater threat population. This is important to understand: a threat community is a subset of a greater threat population. We can have internal employees as a threat population, but we have narrowed our focus in this scenario to the small subset of that population articulated in Step 2. Pardon the redundancy, but Control Resistance is analyzed against the population – not the threat community. For this scenario, I am selecting a Control Resistance value of HIGH; stated otherwise, the controls on the server are resistant to 84% of the threat population. Here is my reasoning:

a.    A very small percentage of the Initech Novelty, Inc. workforce would ever have privileged knowledge of the weakened security template.

b.    The skills required to remotely exploit the vulnerability are not trivial. It is possible that someone may have the skills and tools, but it is not probable that a large or even moderate percentage of the threat population does for this scenario.

6.    Vulnerability (VULN): The probability that an asset will be unable to resist the actions of a threat agent. The basic FAIR methodology determines vulnerability via a look-up table that takes into consideration “Threat Capability” and “Control Resistance” (this is on page seven of the BRAG).

a.    In step four – Threat Capability (TCAP) – we selected a value of MODERATE.

b.    In step five – Control Resistance (CR) – we selected a value of HIGH.

c.    Using the TCAP and CR inputs in the Vulnerability table, we are returned a vulnerability value of LOW.

7.    Loss Event Frequency (LEF): The probable frequency, within a given timeframe, that a threat agent will inflict harm upon an asset. The basic FAIR methodology determines LEF via a look-up table that takes into consideration “Threat Event Frequency” and “Vulnerability” (this is on page eight of the BRAG).

a.    In step three – Threat Event Frequency (TEF) – we selected a value of LOW; once every 10 years.

b.    The outcome of step 6 was a VULN value of LOW.

c.    Using the TEF and VULN inputs in the Loss Event Frequency table, we are returned a LEF value of VERY LOW.

*Note: the loss magnitude table used in the BRAG and the loss magnitude table for the Initech, Inc. scenarios are different. The Initech loss magnitude table can be viewed below as well as on the Initech, Inc. page of this blog.

Loss Magnitude Table (Initech Specific)

8.    Estimate Worst-Case Loss (WCL): Now we want to start estimating loss values in terms of dollars. For the basic FAIR methodology there are two types of loss: worst case and probable (or expected) loss. The BRAG asks us to: determine the threat action that would most likely result in a worst-case outcome, estimate the magnitude for each loss form associated with that threat action, and sum the loss magnitude. For this step, I am going to select DENY ACCESS, in the RESPONSE loss form, with a WCL value of LOW. Here is why:

a.    The server going down results in it not being available – thus access to it is denied. The most likely loss form to Initech Novelty, Inc. is the cost (IT response) of bringing it back up.

b.    Since this is “worst case”, the longest I could see this server being down is five business days. Based on the given information, this would result in one call to the service center not being able to be properly serviced because the application is down. I estimate response loss from a customer service center perspective to be less than $100.

c.    The effort required by IT to restore the server is 1-2 hours – easily under $1,000.

d.    Even though the application is not considered “mission critical” to Initech Novelty, Inc., a prolonged outage could impact other processes as well as increase the response impact.

9.    Estimate Probable Loss Magnitude (PLM): In step eight, we focused on worst-case loss – which in reality for this type of scenario is probably not a practical step. Now we are going to focus on probable loss, which is for the most part always going to be lower than worst-case loss. The BRAG asks us to: determine the most likely threat action, estimate the magnitude for each loss form associated with that threat action, and sum the loss magnitude. For this step, I am going to select DENY ACCESS, in the RESPONSE loss form, with a PLM value of VERY LOW. Here is why:
a.    The server going down results in it not being available – thus access to it is denied. The most likely loss form to Initech Novelty, Inc. is the cost (IT response) of bringing it back up.

b.    Since this is “probable loss”, the longest I could see this server being down is a few hours – not more than a day. Based on the given information, this could result in one call to the service center not being able to be properly serviced because the application is down. I estimate response loss from a customer service center perspective to be less than $100.

c.    The effort required by IT to restore the server is 1-2 hours – easily under $1,000.

10.     Derive and Articulate Risk: At this point in the basic FAIR methodology we can derive a qualitative risk rating. Using the table on page 11 of the BRAG worksheet, we take the LEF value from step seven and the Probable Loss Magnitude value from step nine to derive our overall qualitative risk label (a scripted version of these lookups appears after this list).

a.    LEF value from step seven was VERY LOW.

b.    PLM from step nine was VERY LOW.

c.    Overall risk using the BRAG table on page 11 is LOW.
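Since the basic FAIR derivation above is just a sequence of table lookups, the whole thing can be scripted. The sketch below encodes only the table cells actually used in this walkthrough – it is not a reproduction of the full BRAG matrices, which you should take from pages 7, 8, and 11 of the worksheet itself.

# Scripted version of steps 6, 7, and 10 above. Only the cells used in this
# walkthrough are encoded; the full matrices live in the BRAG worksheet.

VULN_TABLE = {("MODERATE", "HIGH"): "LOW"}      # (TCAP, CR) -> VULN, page 7
LEF_TABLE = {("LOW", "LOW"): "VERY LOW"}        # (TEF, VULN) -> LEF, page 8
RISK_TABLE = {("VERY LOW", "VERY LOW"): "LOW"}  # (LEF, PLM) -> risk, page 11

tcap, cr = "MODERATE", "HIGH"   # steps 4 and 5
tef = "LOW"                     # step 3: roughly once every 10 years
plm = "VERY LOW"                # step 9

vuln = VULN_TABLE[(tcap, cr)]   # step 6:  LOW
lef = LEF_TABLE[(tef, vuln)]    # step 7:  VERY LOW
risk = RISK_TABLE[(lef, plm)]   # step 10: LOW

print(f"VULN={vuln}, LEF={lef}, overall risk={risk}")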

So how would I articulate this to a decision maker or someone that is responsible for this exposure to the organization?

“A security template modification is necessary for this application to function properly. The modification somewhat weakens the security posture of that server. The most likely threat to the server would be a disgruntled server administrator who wants to bring the server down. Our assessment is that there is a very low likelihood of loss, and even if it did occur there would be minimal impact in terms of response costs to Initech Novelty, Inc. (expected loss of less than $1,000; worst case loss less than $5,000).”

Final thoughts: Some of you that have read the scenario and assessment are probably thinking that this seems like a long process. As with anything new, it takes a few iterations to become comfortable. Over time, a simple scenario like this could easily be done in a few minutes mentally – maybe five minutes if you have to document some of your justifications (which I suggest you always do).

I look forward to any feedback you might have! If anyone has any suggestions for a future scenario – please let me know.


Security Template Exception (part 1) – The Scenario

November 6, 2008

A Windows server administrator (SA) has contacted the Initech Novelty, Inc. Security Manager to obtain a formal security exception to modify the security template, allowing the executable of a third-party sales tracking application (3rdpartysalesapp.exe) write access to the “%systemroot%\system32” directory on a Windows 2003 server. The sales tracking application has been designed to create temporary files in the “%systemroot%\system32” directory. This server fulfills a “member server” role within the Initech Active Directory domain.

Given:

1.    The Windows server runs a web-based intranet sales tracking application. It is accessed over TCP 443 (SSL) from the user segment. The application’s data actually resides on a separate database server in a separate data network segment. The application is not considered a mission critical application.

2.    The Windows server sits on a network segment dedicated to Initech internal application servers. This segment is firewalled off from the user segments, the database servers, and the network segments that host Internet-facing servers. All firewall configurations are least-privilege in nature. Both logical and physical security controls facilitate network segmentation.

3.    The servers on the internal application server network segment are centrally managed from an OS patching and anti-malware perspective (in other words, they do not make direct connections to the Internet).

4.    Initech, Inc. mandates via its information security policy that Windows server security templates be applied to all of its servers, specific to each server’s role within the enterprise. In the case of this scenario, the default Microsoft-provided Windows Server 2003 “member server” security template has been applied to the Windows 2003 server.

5.    This particular sales tracking application server is not clustered or considered to be highly available. It is estimated that it would take Initech Novelty, Inc. IT personnel 1-2 hours to restore the server from the most recent daily back-up and another 1-2 hours for application testing.

6.    The application is used by Initech Novelty, Inc. customer service representatives to get detailed information about a sales order that is not available within their regular CSR application. About one out of every 250 calls to the Initech Novelty, Inc. service center requires access to the application in focus for this scenario.

7.    The Initech Novelty, Inc. service center is open seven days per week between the hours of 8 AM and 10 PM EST. The average daily volume of sales-transaction-related calls is 50.

8.    Initech Novelty, Inc. has calculated CSR employee costs to the organization (salary, benefits, etc.) at $60 per hour. Information technology employees are listed at $85 per hour. (These rates feed the quick cost arithmetic sketched after this list.)

9.    Finally, the server in this scenario is not in scope for PCI-DSS compliance (Payment Card Industry Data Security Standards). Access to credit card “primary account number” (PAN) or any other card information is not possible in this application.
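As a quick back-of-the-envelope check on what an outage of this server could cost, here is the arithmetic implied by the givens above. The five-day outage window is a hypothetical worst case, and the 15 minutes of lost CSR time per affected call is my own assumption.

# Back-of-the-envelope outage cost from the givens. The five-day outage and
# the per-call CSR time impact are assumptions for illustration.

IT_RATE = 85     # given: IT employee cost per hour
CSR_RATE = 60    # given: CSR employee cost per hour

# IT response: 1-2 hours to restore from backup plus 1-2 hours of testing.
it_response_cost = (2 + 2) * IT_RATE                    # upper bound: $340

daily_sales_calls = 50    # given: sales-related calls per day
outage_days = 5           # hypothetical worst-case outage
affected_calls = daily_sales_calls * outage_days / 250  # 1 in 250 needs the app
csr_cost = affected_calls * 0.25 * CSR_RATE             # assume 15 min lost per call

print(f"IT response cost (upper bound): ${it_response_cost}")
print(f"Calls affected over {outage_days} days: {affected_calls:.0f}")
print(f"CSR impact at 15 min per call:  ${csr_cost:.0f}")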

In the next post, we will perform the risk assessment using the RMI FAIR Basic Risk Assessment Guide (BRAG) as our guide.


Initech Inc., Risk Scenario PRE-READ

November 6, 2008

I participated in an advanced FAIR training session recently with a very small group of peers from my employer. It was great training, great collaboration, and was actually the formal kick-off to a special project I am leading regarding risk quantification. During the course of this training, I was reminded of a few things that I think are important to remember about risk scenarios – especially given the upcoming posts where I will post risk scenarios and my analysis.

1.    Training risk scenarios – whether reflective of actual incidents or purely made up – need to be structured enough to minimize “what-if” and hypothetical questions. During this training event, I brought to the table what I thought was a “simple” risk scenario that I expected would take maybe 10 minutes to work through – it took about 30 minutes (there were 7 people chiming in). Everyone has a different perspective when looking at and dealing with risk. So, to be effective at writing risk scenarios, I think each scenario needs to be framed to account for at least 80-90% of the relevant information one needs to truly assess it. Anything greater than 90% may be time-prohibitive. Feel free to provide comments about the structure of the risk scenarios I present – what missing information do you need? Ask yourself whether the information you need is something that would only be applicable in your environment, versus universal information that should have been included in the scenario.

2.    I will use the FAIR methodology to assess the risk for these scenarios. There are four FAIR certifications that can be earned – you can get more details at RMI’s website. I am currently certified as a “FAIR Analyst” and a “FAIR Senior Analyst”. For the risk scenarios I post, I will reference a freely available FAIR tool called the “Basic Risk Assessment Guide” (BRAG) and stick with basic FAIR concepts for the actual risk assessment. This approach should allow for an easier understanding of FAIR concepts, and over time the complexity of the scenarios will be easier to digest. Of course, I would recommend reading the FAIR white paper, but I am hoping that the risk scenarios will still give an adequate representation of FAIR.

3.    In the BRAG that is available from RMI – in the loss magnitude section – there is a table for loss magnitude severity with dollar value ranges. The values listed in the BRAG should be replaced with dollar value ranges more reflective of your company – especially if you start to adopt FAIR and use it on a regular basis. Determining these ranges should be an exercise that includes information security, IT, legal, business folks, and probably others I have not listed. In the case of the Initech risk scenarios – I have modified the loss magnitude severity table and posted it on the Initech, Inc. page.
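To make that last point concrete, here is a minimal sketch of what a company-specific loss magnitude severity table could look like in code. The labels follow the qualitative scale used in my scenario posts, but the dollar ranges are invented placeholders – not the Initech table posted on this blog, and not the ranges in the BRAG.

# Illustrative only: a company-specific loss magnitude severity table.
# Replace the invented dollar ranges below with ranges your information
# security, IT, legal, and business folks agree on.

SEVERITY_RANGES = [
    ("VERY LOW", 0, 1_000),
    ("LOW", 1_000, 10_000),
    ("MODERATE", 10_000, 100_000),
    ("HIGH", 100_000, 1_000_000),
    ("VERY HIGH", 1_000_000, float("inf")),
]

def severity(loss_dollars):
    """Map an estimated dollar loss onto a qualitative severity label."""
    for label, low, high in SEVERITY_RANGES:
        if low <= loss_dollars < high:
            return label
    raise ValueError("loss estimate must be non-negative")

print(severity(750))       # VERY LOW
print(severity(42_000))    # MODERATE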