4.3 Analyse the vulnerabilities of components

This activity must be performed for each component in turn. At each step, one component is selected for analysis.

4.3.1 Add and remove vulnerabilities

Inspect the listed vulnerabilities of the component. Vulnerabilities that are not in the general checklist may also exist; these must be added. Use the disaster scenarios prepared in Stage 1 as guidance when deciding which vulnerabilities to add.

Example: Telecommunication satellites are vulnerable to space debris. This vulnerability does not apply to any other kind of equipment, and will therefore not be in the equipment checklist. On the other hand, satellites are not vulnerable to flooding. Therefore “Collision with space debris” must be added, and “Flooding” must be removed from the list of satellite vulnerabilities.

A vulnerability must not be removed unless it is clearly nonsensical, e.g. configuration errors on devices that do not allow for any kind of configuration, or flood damage to a space satellite. To be removed, a vulnerability must be physically impossible, not just very unlikely in practice. In all other cases the frequency and impact of the vulnerability should be assessed (although they can both be set to Extremely low), and the vulnerability must be part of the review at the end of Stage 2.

When a vulnerability is removed, its node will also not appear in the list for common cause failures. That is another reason not to remove vulnerabilities.

It is important that vulnerabilities that are merely unlikely but not physically impossible are retained in the analysis, because such vulnerabilities could have an extremely high impact. Low-probability/high-impact events must not be excluded from the risk analysis.

4.3.2 Assess vulnerabilities

When the list of vulnerabilities for the component is complete, each vulnerability must be assessed. The analysts, based on their collective knowledge, estimate two factors:

the likelihood (frequency) that the vulnerability will lead to an incident, and
the impact of that incident.

Both factors, Frequency and Impact, are split into eight classes. The classes do not correspond to ranges (with a highest and lowest permissible value); instead, each class is described by a typical, characteristic value. Selecting the proper class may require discussion between analysts, who must provide convincing arguments for their choice.

Sometimes a factor (a likelihood or an impact) is extremely large or extremely small. Extremely large values are not merely very big: they are too big to fit the normal scale, and are unacceptably or even intolerably high. Likewise, extremely small values fall outside the scale of normal values and can sometimes be safely ignored. Extreme values lie outside the normal experience of analysts and other stakeholders, so the normal paths of reasoning cannot be applied to them.

If the analysts cannot reach consensus, the class Ambiguous must be assigned. In the remarks, the analysts should briefly explain the cause of the disagreement and the classes that different analysts would prefer.

A limited amount of uncertainty is unavoidable and normal for risk assessments. However, when uncertainty becomes so large that multiple classes could be assigned to a factor, the class Unknown must be assigned.
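The rules for Ambiguous and Unknown can be summarised as a small decision function. This is a minimal sketch, not part of the Raster tool: the function name `assign_class` and the `enough_knowledge` flag are hypothetical, and checking for Unknown before Ambiguous is an assumption, since the text does not say which takes precedence when both apply.

```python
from typing import Sequence

# Hypothetical helper for illustration; not part of the Raster tool.
AMBIGUOUS = "Ambiguous"
UNKNOWN = "Unknown"


def assign_class(proposals: Sequence[str], enough_knowledge: bool = True) -> str:
    """Return the class to record for one factor (Frequency or Impact).

    proposals: the class each analyst argues for, e.g. ["Medium", "High"].
    enough_knowledge: False when uncertainty is so large that several
        classes could be assigned (an assumed flag, set by the analysts).
    """
    if not proposals:
        raise ValueError("at least one analyst must propose a class")
    # Assumption: too much uncertainty takes precedence over disagreement.
    if not enough_knowledge:
        return UNKNOWN
    if len(set(proposals)) == 1:
        return proposals[0]  # consensus reached
    return AMBIGUOUS  # no consensus: explain the disagreement in the remarks
```

For example, `assign_class(["Medium", "High"])` yields Ambiguous, and the remarks should then record that one analyst preferred Medium and another High.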

The Raster tool assists in recording the analysis results. The tool will also automatically compute the combined vulnerability score for each vulnerability, and the overall vulnerability level for each node (see sections Vulnerability assessment window and Single failures view; for technical details see Computation of vulnerability levels).

Do not blindly trust your initial estimate of frequency and impact. Do not rely only on information that confirms your estimate; actively search for contradicting evidence as well.

4.3.3 Assess frequency

For natural vulnerabilities, the factor Frequency indicates the likelihood that the vulnerability will lead to an incident with an effect on the telecom service. All eight classes can be used for Frequency (see the Natural frequencies table).

A frequency of “once in 50 years” is an average; it does not mean that an incident is guaranteed to occur every 50 years. It may be interpreted as:

The average timespan between incidents on a single component is 50 years.
For a set of 50 identical components, each year on average one of them will experience an incident.
Each year, the component has a 1 in 50 chance of experiencing an incident.
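The three readings describe the same rate, as the following sketch shows using Python's standard `fractions` module. The variable names are illustrative only. Strictly speaking, reading a rate of 1/50 per year as a 1-in-50 yearly chance is an approximation, but it holds well for small rates such as these.

```python
from fractions import Fraction

# "Once in 50 years" expressed as an average rate per component-year.
PERIOD_YEARS = 50
rate = Fraction(1, PERIOD_YEARS)

# Reading 1: average timespan between incidents on a single component.
mean_years_between_incidents = 1 / rate

# Reading 2: expected incidents per year across 50 identical components.
fleet_size = 50
expected_incidents_per_year = fleet_size * rate

# Reading 3: yearly chance for one component. Treating the rate as a
# probability is a close approximation when the rate is small.
yearly_chance = rate

print(mean_years_between_incidents, expected_incidents_per_year, yearly_chance)
```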

When the lifetime of a component is 5 years (or when the component is replaced every 5 years), the frequency of a vulnerability can still be “once in 500 years”.

Example: a component is always replaced after one year, even if it is still functioning. On average, 10% of components fail before their full year is up. The general frequency for this failure is therefore estimated as “once in 10 years” even though no component will be in use that long.

Note that this value is between the characteristic values for High and Medium. The analysts must together decide which of these two classes is assigned.
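The conversion in the example, and the observation that the result falls between two classes, can be sketched as follows. The helper names are hypothetical, and the function deliberately returns the candidate classes rather than choosing one, because the text leaves that choice to the analysts.

```python
# Characteristic periods (years) of the three normal natural-frequency classes.
CHARACTERISTIC_PERIOD = {"High": 5, "Medium": 50, "Low": 500}


def period_from_annual_fraction(fraction_failing_per_year: float) -> float:
    """Convert the share of components failing per year into 'once in N years'."""
    if not 0 < fraction_failing_per_year <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1 / fraction_failing_per_year


def candidate_classes(period_years: float) -> list[str]:
    """Return the class(es) whose characteristic values bracket the period.

    One class on an exact match, otherwise the two neighbouring classes.
    Periods outside 5..500 years point towards an extreme class.
    """
    ordered = sorted(CHARACTERISTIC_PERIOD.items(), key=lambda kv: kv[1])
    for name, value in ordered:
        if period_years == value:
            return [name]
    if period_years < ordered[0][1]:
        return ["Extremely high", ordered[0][0]]
    if period_years > ordered[-1][1]:
        return [ordered[-1][0], "Extremely low"]
    for (lo_name, lo_val), (hi_name, hi_val) in zip(ordered, ordered[1:]):
        if lo_val < period_years < hi_val:
            return [lo_name, hi_name]
    raise AssertionError("unreachable")


# The example: 10% of components fail within their one-year service life.
period = period_from_annual_fraction(0.10)
print(period, candidate_classes(period))
```

For the example this yields a period of 10 years and the candidates High and Medium; the sketch stops there, as the final choice is a matter for discussion.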

**Natural frequencies:** characteristic values for frequency classes of natural vulnerabilities.
Class	Value	Symbol
High	Once in 5 years. For 100 identical components, each month 1 or 2 will experience an incident.	H
Medium	Once in 50 years. For 100 identical components, each year 2 will experience an incident.	M
Low	Once in 500 years. For 100 identical components, one incident will occur every five years.	L
Extremely high	Routine event. Very often.	V
Extremely low	Very rare, but not physically impossible.	U
Ambiguous	Indicates lack of consensus between analysts.	A
Unknown	Indicates lack of knowledge or data.	X
Not yet analysed	Default. Indicates that no assessment has been done yet.	–
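The per-100-components descriptions in the table follow directly from the characteristic periods. The sketch below encodes the table as data (the `FrequencyClass` type is an illustrative assumption, not the Raster tool's data model) and recomputes those figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrequencyClass:
    name: str
    symbol: str
    period_years: Optional[float]  # None: no single characteristic period


# Rows of the natural frequencies table (illustrative encoding).
NATURAL_FREQUENCIES = [
    FrequencyClass("High", "H", 5),
    FrequencyClass("Medium", "M", 50),
    FrequencyClass("Low", "L", 500),
    FrequencyClass("Extremely high", "V", None),
    FrequencyClass("Extremely low", "U", None),
    FrequencyClass("Ambiguous", "A", None),
    FrequencyClass("Unknown", "X", None),
    FrequencyClass("Not yet analysed", "–", None),
]


def incidents_per_year(cls: FrequencyClass, fleet: int = 100) -> float:
    """Expected incidents per year across `fleet` identical components."""
    if cls.period_years is None:
        raise ValueError(f"{cls.name} has no characteristic period")
    return fleet / cls.period_years


for c in NATURAL_FREQUENCIES[:3]:
    per_year = incidents_per_year(c)
    print(f"{c.symbol}: {per_year:g} per year, {per_year / 12:.2f} per month")
```

For High this gives 20 incidents per year (about 1.67 per month, hence “1 or 2”), for Medium 2 per year, and for Low 0.2 per year, i.e. one incident every five years.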

The likelihood of malicious vulnerabilities is not based on chance (as is the case for natural vulnerabilities), but on the difficulty of the action and on the determination and capabilities of the attacker. An attack that requires only modest capabilities may already prove too demanding for a casual customer or employee. On the other hand, even a difficult attack may well be within the reach of skilled state-sponsored hackers. The Raster method assumes the most skilled attacker that could plausibly target the organisation: the worst plausible attacker.

**Worst plausible attackers:** descriptions, motivations and goals.
Attacker	Description
Customers, employees	Unskilled, lightly motivated by opportunity or mild protest (e.g. perceived unfair treatment).
Activists	Moderately skilled, aiming for media exposure to further their cause or protest. Visible impact.
Criminals	Highly skilled, motivated by financial gains (e.g. ransomware).
Competitors	Highly skilled, aiming to obtain trade secrets for competitive advantage. Avoid visible impact.
State-sponsored hackers	Very highly skilled, motivated by geo-political advantages. Avoid visible impact.
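Selecting the worst plausible attacker amounts to taking the most skilled attacker type that is plausible for the organisation. In this sketch the numeric skill levels are an assumption derived from the table's wording (unskilled, moderately, highly, very highly skilled). Because criminals and competitors share a skill level, the hypothetical function returns all tied attackers and leaves the choice between them to the analysts.

```python
from enum import Enum


class Attacker(Enum):
    CUSTOMERS_EMPLOYEES = "Customers, employees"
    ACTIVISTS = "Activists"
    CRIMINALS = "Criminals"
    COMPETITORS = "Competitors"
    STATE_SPONSORED = "State-sponsored hackers"


# Assumed ordinal skill levels, read from the table's descriptions.
SKILL = {
    Attacker.CUSTOMERS_EMPLOYEES: 1,  # unskilled
    Attacker.ACTIVISTS: 2,            # moderately skilled
    Attacker.CRIMINALS: 3,            # highly skilled
    Attacker.COMPETITORS: 3,          # highly skilled
    Attacker.STATE_SPONSORED: 4,      # very highly skilled
}


def worst_plausible_attackers(plausible: set[Attacker]) -> set[Attacker]:
    """Return the most skilled attacker type(s) among the plausible ones."""
    if not plausible:
        raise ValueError("at least one attacker type must be plausible")
    top = max(SKILL[a] for a in plausible)
    return {a for a in plausible if SKILL[a] == top}
```

If both criminals and competitors come out on top, their differing motivations (visible versus hidden impact) can guide the final choice.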

In the Raster tool you set the worst plausible attacker as part of the project properties. Since this is a property of the entire project, for each vulnerability you only need to select the appropriate difficulty level of the exploit, as per the table below.

**Malicious frequencies:** characteristic values for frequency classes of malicious vulnerabilities.
Class	Value
Very difficult	Exploit requires skill, custom attack tools, long preparation time and multiple weaknesses and zero-days.
Difficult	Exploit requires skill, some customised attack tools and long preparation time.
Easy	Tools exist to execute the exploit. Basic skills required.
Trivial	Requires no skill or tools at all.
Nearly impossible	Exploit may be possible in theory, but consensus is that exploit is infeasible.
Ambiguous	Indicates lack of consensus between analysts.
Unknown	Indicates lack of knowledge or data.
Not yet analysed	Default. Indicates that no assessment has been done yet.
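Because the attacker is a project-wide property, each malicious vulnerability only needs a difficulty level. The following sketch illustrates that split. The class and field names are hypothetical and do not describe the Raster tool's internal data model, and the vulnerability names in the usage example are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class Difficulty(Enum):
    """Difficulty levels from the malicious frequencies table."""
    NEARLY_IMPOSSIBLE = "Nearly impossible"
    VERY_DIFFICULT = "Very difficult"
    DIFFICULT = "Difficult"
    EASY = "Easy"
    TRIVIAL = "Trivial"
    AMBIGUOUS = "Ambiguous"
    UNKNOWN = "Unknown"
    NOT_YET_ANALYSED = "Not yet analysed"


@dataclass
class MaliciousVulnerability:
    name: str
    # Only the exploit difficulty is recorded per vulnerability.
    difficulty: Difficulty = Difficulty.NOT_YET_ANALYSED


@dataclass
class Project:
    # Hypothetical field: set once, applies to every malicious vulnerability.
    worst_plausible_attacker: str
    vulnerabilities: list[MaliciousVulnerability] = field(default_factory=list)

    def pending(self) -> list[str]:
        """Names of malicious vulnerabilities whose difficulty is still unset."""
        return [v.name for v in self.vulnerabilities
                if v.difficulty is Difficulty.NOT_YET_ANALYSED]


project = Project("Criminals", [
    MaliciousVulnerability("Unauthorised access"),
    MaliciousVulnerability("Denial of service", Difficulty.EASY),
])
print(project.pending())
```

Changing `worst_plausible_attacker` affects every vulnerability at once, while the per-vulnerability difficulty entries stay untouched.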

Use the following three-step procedure to determine the factor Frequency:

1. Find the frequency class that applies to this type of node in general.

This can be based on, for example, past experience or expert opinion. If available, MTBF (mean time between failures) figures or failure rates should be used.

2. Think of reasons why this particular node should have a lower or higher frequency than usual.

Existing countermeasures may make the frequency lower than usual. For example, if an organisation already has a stand-by generator that kicks in when power fails, the frequency of power failure incidents is reduced. Remember that the frequency does not reflect the likelihood that the vulnerability is triggered, but the likelihood that the vulnerability will lead to an incident.

For some components, monitoring can detect imminent failures before they occur. This also reduces the frequency of incidents. Other examples are the use of premium-quality components, and secure, controlled equipment rooms. All of these measures make incidents less likely.

The disaster scenarios may indicate that the frequency should be higher than usual. In crisis situations an incident is often more likely to occur. For example, power outages are not very common, but are far more likely during flooding disasters, which are themselves very uncommon. The overall frequency is therefore determined by:
- the likelihood of power outages during normal circumstances, and
- the likelihood of power outages during a flood, combined with the likelihood of flooding.

3. Decide on the frequency class for this particular node.

Typically Low, Medium, or High will be used. If none of these accurately reflects the frequency, one of the extreme classes should be used. If no class can be assigned by consensus, either Ambiguous or Unknown should be used.
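The arithmetic behind this procedure can be sketched with annual incident rates. This is a simplified model under stated assumptions: MTBF is converted assuming a constant failure rate, a countermeasure is modelled as an independent probability of failing to take over, and rates are treated as yearly probabilities, which is reasonable only for small values. All function names and example figures are hypothetical.

```python
# Simplified sketch; function names and example figures are hypothetical.
HOURS_PER_YEAR = 24 * 365.25


def annual_rate_from_mtbf(mtbf_hours: float) -> float:
    """Step 1: incidents per component-year from an MTBF figure in hours.

    Assumes a constant failure rate, so the rate is simply 1 / MTBF.
    """
    return HOURS_PER_YEAR / mtbf_hours


def with_backup(rate: float, p_backup_fails: float) -> float:
    """Step 2: an incident only occurs if the countermeasure also fails.

    Assumes the backup fails independently of the original failure.
    """
    return rate * p_backup_fails


def overall_annual_rate(rate_normal: float, flood_rate: float,
                        p_outage_given_flood: float) -> float:
    """Step 2: add the outages expected from the disaster scenario.

    rate_normal: outages per year under normal circumstances.
    flood_rate: floods per year at this location.
    p_outage_given_flood: chance that a flood causes an outage.
    """
    return rate_normal + flood_rate * p_outage_given_flood


# Step 1: an MTBF of 438,300 hours corresponds to about once in 50 years.
base = annual_rate_from_mtbf(438_300)
# Power fails once in 5 years, but the generator fails to start 10% of the time.
power = with_backup(0.2, 0.1)
# A flood once in 100 years causes an outage 80% of the time
# (assumed here to disable the generator as well).
total = overall_annual_rate(power, 0.01, 0.8)
print(f"once in {1 / base:.0f}, {1 / power:.0f} and {1 / total:.1f} years")
```

With these illustrative figures the flood scenario raises the overall frequency from about once in 50 years to roughly once in 36 years, which lies between Medium and High; step 3 then remains a decision for the analysts.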

4.3.4 Assess impact

The factor Impact indicates the severity of the effect when a vulnerability does lead to an incident. This severity is the effect on the service as a whole, not the effect on the component that experienced the vulnerability. For example, a power failure will cause equipment to stop functioning temporarily. This is normal, and in itself of little relevance, unless it affects the availability of the telecom service. The power failure could cause the service to fail (if the equipment is essential), could have no effect at all (if the equipment has a backup), or could have any effect in between.

Only the effects on the telecom service must be taken into account in this stage. Loss of business, penalties, and other damage are not considered, but may be relevant during risk evaluation (see Assessing social risk factors).

The damage may be caused by an incident that also affects other components of the same telecom service. For example, a cable may be damaged by an earthquake; the same earthquake will likely damage other components as well. However, this additional damage must not be taken into account. Only the effects resulting from damage to this component must be considered. The next stage, common cause failures analysis, takes care of multiple failures due to a single incident.

The impact of a vulnerability on a component covers:

only effects on the service, not the effects on the component itself,
only effects on the service, not subsequent damage to the organisation,
only effects due to damage to this single component, not effects due to the wider failure scenario.

All eight classes can be used for Impact. Characteristic values for the classes are given in the Impact classes table.

Use the following three-step procedure to determine the factor Impact:

1. Choose the impact class that seems to describe the impact of the incident most accurately.

2. Think of reasons why the impact would be higher or lower than this initial assessment.

Existing redundancy can reduce or even eliminate the impact. For example, a telecom service may have been designed such that when a wireless link fails, a backup wired link is used automatically. The impact of the wireless link failing is thereby reduced.

Monitoring and automatic alarms may reduce the impact of incidents. When incidents are detected quickly, repairs can be initiated sooner. Keeping stock of spare parts, having well-trained repair teams, and conducting regular drills and exercises all help reduce the impact of failures and must be considered in the assessment. Conversely, the absence of these measures may increase the impact of the incident.

3. Decide on the impact class.

Typically Low, Medium, or High will be used. If none of these accurately reflects the impact, one of the extreme classes should be used. If no class can be assigned by consensus, either Ambiguous or Unknown should be used.

**Impact classes:** characteristic values for natural and malicious vulnerabilities.
Class	Value	Symbol
High	Partial unavailability, if unrepairable. Total unavailability, if long-term.	H
Medium	Partial unavailability, if repairable (short-term or long-term). Total unavailability, if short-term.	M
Low	Noticeable degradation, repairable (short-term or long-term) or unrepairable.	L
Extremely high	Very long-term or unrepairable unavailability.	V
Extremely low	Unnoticeable effects, or no actors affected.	U
Ambiguous	Indicates lack of consensus between analysts.	A
Unknown	Indicates lack of knowledge or data.	X
Not yet analysed	Default. Indicates that no assessment has been done yet.	–

It typically does not matter for the selection of impact class whether some or all actors are affected. All actors are important; they would not appear in the diagram otherwise. However, if the analysts agree that only very few actors are affected, they can select the next lower class (e.g. Low instead of Medium).

The meaning of “short-term” and “long-term” depends on the tasks and use-cases of the actors. A two-minute outage is short-term for fixed telephony but long-term for real-time remote control of drones and robots.

“Degradation” means that actors notice reduced performance (e.g. noise during telephone calls, unusual delay in delivery of email messages), but not so much that their tasks or responsibilities are affected.

“Partial unavailability” means severe degradation or unavailability of some aspects of the service, such that actors cannot effectively perform some of their tasks or responsibilities. For example: email can only be sent within the organisation; noise makes telephone calls almost unintelligible; mobile data is unavailable but mobile calls and SMS are not affected. Actors can still perform some of their tasks, but other tasks are impossible or require additional effort.

“Total unavailability” means that actors effectively cannot perform any of their tasks and responsibilities using the telecom service (e.g. phone calls can be made but are completely unintelligible because of extremely poor quality).

“Extremely high” means that if the incident happens the damage will be so large that major redesign of the telecom service is necessary, or the service has to be terminated and replaced with an alternative because repairs are unrealistic.
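The impact table and the definitions above can be read as a lookup from the kind of effect and its duration to a class. The sketch below is one interpretation, not the Raster tool's logic: the string vocabulary is hypothetical, “very long” partial unavailability is treated as repairable (Medium) because the table does not list it separately, and the `lower_for_few_actors` flag stands for the analysts' optional agreement to select the next lower class.

```python
from enum import IntEnum


class Impact(IntEnum):
    # Ordered from least to most severe, so "next lower class" is value - 1.
    EXTREMELY_LOW = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    EXTREMELY_HIGH = 4


EFFECTS = {"unnoticeable", "degradation", "partial", "total"}
DURATIONS = {"short", "long", "very long", "unrepairable"}


def impact_class(effect: str, duration: str,
                 lower_for_few_actors: bool = False) -> Impact:
    """Map an effect and its duration to an impact class (one interpretation).

    effect: "unnoticeable", "degradation", "partial" or "total" (unavailability).
    duration: "short", "long", "very long" or "unrepairable".
    """
    if effect not in EFFECTS or duration not in DURATIONS:
        raise ValueError("unrecognised effect or duration")
    if effect == "unnoticeable":
        cls = Impact.EXTREMELY_LOW
    elif effect == "degradation":
        cls = Impact.LOW  # regardless of whether it is repairable
    elif effect == "partial":
        # Assumption: anything short of unrepairable counts as repairable.
        cls = Impact.HIGH if duration == "unrepairable" else Impact.MEDIUM
    elif duration == "short":
        cls = Impact.MEDIUM  # total unavailability, short-term
    elif duration == "long":
        cls = Impact.HIGH  # total unavailability, long-term
    else:
        cls = Impact.EXTREMELY_HIGH  # very long-term or unrepairable
    if lower_for_few_actors and cls > Impact.EXTREMELY_LOW:
        cls = Impact(cls - 1)
    return cls
```

Whether a given outage counts as “short” or “long” still has to be decided per use-case before calling such a function, as the two-minute outage example shows.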

4.3.5 Assessing all vulnerabilities on a component

The overall vulnerability level of a component is defined as the worst vulnerability for that component. If some vulnerabilities have not been assessed (no frequency or impact has been set), they do not contribute to the overall vulnerability level. It can thus be a useful time-saver to skip the assessment of unimportant vulnerabilities.

It is very important that all vulnerabilities with High or Extremely high impact are assessed fully, even when their Frequency is low.
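The rule that the worst assessed vulnerability determines the component's level, and that unassessed vulnerabilities are ignored, can be sketched as follows. The ordered scale and the function name are assumptions for illustration; the actual combined scores, and how Ambiguous and Unknown are handled, are described in Computation of vulnerability levels.

```python
from typing import Optional, Sequence

# Assumed ordered scale for combined per-vulnerability scores (illustrative).
SCALE = ["Extremely low", "Low", "Medium", "High", "Extremely high"]


def overall_level(scores: Sequence[Optional[str]]) -> Optional[str]:
    """Worst score among assessed vulnerabilities; None marks 'not assessed'."""
    assessed = [s for s in scores if s is not None]
    if not assessed:
        return None  # nothing assessed yet, so no overall level
    return max(assessed, key=SCALE.index)
```

For example, `overall_level(["Low", None, "Medium"])` returns Medium even if the unassessed vulnerability would have scored High, which illustrates why vulnerabilities with High or Extremely high impact must never be skipped.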