7 Executing the Raster method

Practical guidelines for execution of the Raster method.

7.1 Team composition

The execution of the Raster method is a project that requires a suitable project leader. The project leader should posses the following skills:

  • able to effectively lead project meetings.
  • ample experience with the Raster method.
  • sufficient knowledge of IT and telecommunication technology.

In project meetings, the project leader ensures that each participant receives opportunities to contribute, and that all points of view are discussed. When necessary, the project leader queries statements, to improve on opinions and assessments. The project leader does not have to be a telecommunication expert, but should posses sufficient knowledge of IT and telecoms to lead the discussions. The project leader can participate as one of the analysts, or concentrate on managing the project.

Three factors influence the choice and number of analysts.

  1. To apply the Raster method to an organisation, expertise from various fields of study is essential. Analysing threats to telecom service components requires in-depth knowledge of telecoms engineering, crisis management, political and legal issues, and the preferences of external stakeholders. No analyst can be expected to be expert in all these fields.
  2. Raster requires analysts to make assessments about uncertain scenarios, often without access to all desired information. This inevitably means that assessments are partly subjective. By including several analysts from different backgrounds, the amount of subjectivity can be kept in check.
  3. Several steps in the Raster method call for consensus. When the group becomes too large, reaching consensus will be time consuming.

These factors indicate that the group of analysts should not be too small, but also not too large. The group should include experts from different fields and backgrounds, and should not exceed 10 persons.

Before a Raster project can start, an introductory session should be held in which the project leader shows the key activities using a small mock-example.

It is often useful to use a core team. The core team consists of the two or three most experienced analysts, plus the project leader. The responsibility of the core group is to execute most of the operational tasks, so that the other analysts can restrict their involvement to providing their specific knowledge.

During stages 2 and 3 one of the analysts should be appointed as recorder. The responsibility of the recorder is to record the diagrams and the assessments of vulnerabilities to components using the Raster tool. The recorder should use a computer connected to a projector, so that all analysts in the room can view a common, central display of the tool. The project leader may perform the recorder role.

Because the recorder notes all assessments, he or she will be the best placed to detect inconsistencies in assessments. The recorder should take special care to notice inconsistent scores between components, and bring these up for discussion. For example, if some vulnerability is scored as Medium in one component but as Low in another, similar component, the group should discuss whether one of these scores may have to be adjusted.

In follow-up sessions the recorder may find it useful to distribute printouts from the Raster tool for reference.

7.2 Stage 1 — Initiation and preparation

If a core group is used, the core group will take care of Stage 1. The results are then presented during the first project meeting, so that the other analysts can contribute.

7.3 Stage 2 — Single failures analysis

Most commonly, the analysis cannot be completed in a single work session. To make the most effective use of the expertise of the analysts, the project leader decides which telecommunication services and components are examined in the project meetings The goal is to discuss as many points of view as possible, in order to understand each others arguments for frequency and impact assessments. Based on this insight, the core group can then examine and assess the remaining components. Those results are then presented during the next meeting, and discussed briefly. Then the single failures of the next batch of components are discussed. This procedure is repeated until all single failures are assessed. It is not unusual that two to three project meetings are needed.

In the assessments of frequency and impact, already implemented risk treatments are taken into account. Existing measures may reduce the frequency of vulnerabilities, their impact, or both. Because of a backup power generator, for example, power supply will fail less often. The use of a smart phone cover will reduce the possibility that a smart phone is physically damaged. The generator and the cover will not prevent the loss of external power of dropping of a phone, but help to prevent it from leading to an incident, which is the meaning of Frequency.

Impact is reduced by measures that provide alternatives, or backup options. For example, a stand-by server that takes over from the main server when it fails. Or the use of two cable connections, so that in case of a cable break service continues at half capacity.

During assessments it is of prime importance to keep the definitions of frequency and impact classes in mind. The precise definition of the various vulnerabilities must be used consistently as well. The recorder or project leader must monitor consistency.

The following clarifications to the standard vulnerabilities may be useful. Suggestions for additional vulnerabilities are provided as well.

7.3.1 Wireless connections

Examples included mobile telephony (GSM, UMTS, LTE), WiFi, cordless DECT telephones, bluetooth, wireless audio and video connections, access cards for electronic locks, two-way radios and remote controls.

Interference. Unintentional interference by a radio source using the same frequency band. WiFi, for example, can be disturbed by other transmitters in the same frequency band, or even by a badly shielded microwave oven. Interference is often unpredictable, and of short duration.

Jamming. Intentional interference by a third party. For example, someone deliberately tries to disrupt mobile telephony. Jammers are sometimes used by criminals to prevent tracing and detection. Jamming may last for a long time, and is often difficult to locate and remove.

Congestion. The amount of traffic offered exceeds the capacity of the link. WiFi connections can become very slow on busy locations; mobile telephony can become troublesome when a large group of people start calling or use social media simultaneously, for example during a festival or an incident. Congestion is often brief, but can persist during large incidents.

Signal weakening. Loss of signal strength through distance or blocking by buildings, trees, etc. In modern buildings mobile signal strength is often low, due to thin metal foil in insulating glazing. In basements or underground parking lots the use of mobile phones and two-way radios may be impeded by weak signals. Often users know at which locations they can expect signal weakening.

7.3.2 Wired connections

Indoor examples include cords, network cables and patch cables. Outdoor examples include fiber optic, coax or traditional copper cables running above ground or below ground. Wired connections do not include power cords; vulnerabilities to power cords are included in equipment power loss.

Break. Cable damaged by natural events, trenching during construction work, anchors (for marine cables or cables under rivers or canals), weak contacts (corrosion, loose connectors) or other external influences. Especially for patch cables it is common that the wrong cable is unplugged during maintenance. This type of mistake can also be included in cabe breaks.

Congestion. The amount of traffic offered exceeds the capacity of the link. This vulnerability is similar to congestion on wired links. Base your assessments on the true capacity, not on theoretical capacity. Fiber optic cables can handle enormous throughput, but when a contract for 2 Mbps has been agreed with the supplier, the data speed will be limited to 2 Mbps. Often the impact of congestion is noticeable at 50% load. An example of congestion is the scenario whereby all office PCs simultaneously attempt to download their monthly patches.

Cable aging. Insulation weakens with age. This is an issue mostly with outdoor cables that are exposed to the elements. Cable ageing expresses itself in noise or disruptions. A cable that gets cut and disconnected due to ageing is considered a cable break instead.

Additional vulnerabilities. Some cables are susceptible to electromagnetic interference, meaning that the cable acts as an antenna to nearby transmitters or other equipment.

7.3.3 Equipment

Physical damage. Fire, flood, water from fire fighters, knocks and other physical damage inflicted. Physical damage involves unusual external influences. For example, a user drops his mobile phone or radio, a cup of coffee is spilled over a laptop, or the automatic fire sprinklers are activated.

Power. Failure of electrical power supply. For battery-powered devices this means that the battery is empty. If backup power is available, the frequency of power failures is reduced; because of the backup, the power supply to the device is not cut. A power incident only occurs when also the backup power cuts out. A defect in the power supply unit inside a device ican be considered as power failure, not as Malfunction. Accidental switch-off and accidental unplugging of the power plug can be considered power failure, not as Configuration error. The vulnerability of power failure can be removed when the device does not have a power cord nor battery. This applies, for example, to WiFi access points or IP phones that use Power of Ethernet (PoE).

Configuration. Incorrect configuration or mistakes by operators or users. Examples include hardware configuration (switches, volume controls) or software configuration. Devices can be configured by the end user, by an IT department or an external service organisation. Complex devices such as smart phones or laptops contain many settings that can possibly be misconfigured by end users, causing malfunction of the device. The IT department may roll out a faulty patch to PCs under their control, causing malfunction of all office computers. A mobile two-way radio can be set to the wrong channel, or its volume can be dialled down, causing a message to be missed. Unintended switching off of devices is most commonly regarded as Power failure. Only the most simple devices do not have configuration settings. Sometimes devices are set up once before deployment, and never reconfigured thereafter. For these devices the vulnerability Configuration can be removed too.

Malfunction. Failure of an internal module without a clear external cause, possibly by aging. Malfunction involves failures without an apparent cause; devices have a limited lifetime. Even new devices can fail within their warranty period. Malfunction and Physical failure are similar, and their impact will often be identical. Their frequencies typically differ.

Additional vulnerabilities. At some locations theft is an issue. Some devices are more attractive to thieves than others. Overheating may be an issue. If the environmental controls fail in a large data centre, the consequences will be high. Also, like wired connections devices can be susceptible to electromagnetic interference.

7.4 Stage 3 — Common cause failures analysis

The following information may help with choosing the critical property and the classification of components into clusters. Each cluster has a corresponding story or failure scenario. For example: “when the power fails in this room, all these devices will stop functioning”. Creating such stories makes for a good check; if you cannot think of a plausible story, the cluster classification is probably incorrect.

Interference. The critical property is the frequency band, together with geographical proximity. For two wired links to be affected by the same radio source, that source must be transmitting on a near frequency, and be relatively close.

Jamming. Most jammers operate on a single frequency band. Because the use of a jammer is intentional, the motive of the person jamming is relevant. Most jammers have a limited range. Clusters can be organised based on “who may want to jam communication where, and for which motive?”

Congestion. Congestion is mostly temporary. When there is a high amount of activity in the neighbourhood, the mobile or private networks may become congested. Congestion may therefore affect multiple frequency bands simultaneously (GSM, UMTS and LTE, for example). Depending on the technology used, congestion may be local or affect the entire telecommunication service.

Signal weakening. This vulnerability is mostly limited to mobile devices. A common cause failure requires one person using two devices is at a spot with poor coverage (for example, using a mobile phone and a two-way radio), or that two users are at such a spot.

Break. The critical property is geographical proximity; cables sharing the same location can be damaged simultaneously by external influences. Cables below the ground often follow the same route, or share a common duct underneath roads or canals.

Congestion. As for wired links.

Cable ageing. The critical property can be whether the cable is used above the ground, below the ground or indoor. If the age of the cables is known, a subdivision based on age can be used.

7.4.3 Equipment

Physical damage. For fixed equipment the critical property is geographical proximity. Mobile devices have their own cluster, with possibly subclusters for different types of users.

Power. For fixed equipment the critical property is geographical proximity. Mobile devices have their own cluster; subclusters can be created according to battery life.

Configuration. The critical property is who controls or can change the configuration settings: the IT department, an external service organisation, professional end users or common users. Computers are maintained by the IT department, but their users can often change settings (accidentally) as well. In this case, assign the device to the most vulnerably cluster.

Malfunction. Devices are often bought in batches. It is not impossible for multiple devices to fail (roughly) simultaneously. Age is a relevant property, but the kind of use and treatment are relevant as well. Devices that experience rough handling will have a higher probability of sudden malfunction than devices at a fixed location.

Often a single project meeting suffices, after which the core team completes the assessment of common cause failures.

7.5 Stage 4 — Risk evaluation

Most commonly the compilation of the longlist and reduction into the shortlist can be completed in a single project meeting. It is also possible for the core group to prepare their choice for discussion. Social risk factors must be assessed for each of the risks on the shortlist.

The collation of material into a final report can be prepared by the core team. Much of the Stage 1 report can be reused, and printouts from the Raster tool can be used for the appendices suggested in the template in section Prepare the final report.