1 Introduction

Introduction and guide to this document.

Organisations use many types of telecommunication services: fixed and mobile telephony, videoconferencing, internet, encrypted links between offices, etc. In the last decade, organisations have become much more dependent on these services. Whereas in the past a telephone outage was an inconvenience, today the failure of telecom services often makes it impossible to do business at all. And as organisations move online and into the cloud, reliability of telecom services becomes even more essential.

At the same time, technological and market changes have made it more difficult to assess the reliability of telecommunication services. Networks grow continuously, new technologies replace old ones, and telecom operators outsource and merge their operations. For any end-to-end telecom service, several telecom operators will be involved, and none of them can know how important that service is to each customer.

This increased dependency applies even more to organisations that fulfil a vital role in society, such as fire services, medical care, water boards, utilities, banks, etc.

It is therefore important that organisations in general, and organisations that supply critical infrastructures in particular, understand the vulnerabilities and dependencies of the telecom services they use. This document describes a method, called Raster, to assist in this understanding.

The goal of Raster is to make the organisation less vulnerable to telecom failures. To reduce its vulnerability, the organisation must first understand what can go wrong with each telecom service it uses. These risks must also be ranked, so that the most pressing risks can be addressed first. Raster helps a team of analysts to map and investigate one or more telecom services for an organisation. The result is a report showing which risks should be addressed first, and why. Selection and execution of countermeasures is the next logical step, but is not part of the Raster method.

Incidents with availability of telecom services often happen because of component failures: an underground cable is damaged by a contractor, a power failure causes equipment to shut down. To prepare for these incidents, the organisation must first realise that the cable and equipment exist. An important part of the Raster method is therefore to draw a diagram showing all components involved in delivering the service.

Incidents can also happen when a single event leads to the simultaneous failure of two or more components. For example, two cables in the same duct can be cut in the same incident, or a software update can cause several servers to misbehave. These failures are called common cause failures, and they are dangerous because their impact can be quite large.
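As an illustration only, the idea of a common cause failure can be sketched in a few lines of Python. The component list, the field names and the function `common_cause_candidates` are hypothetical and are not taken from the Raster tool; the sketch merely groups components that share a property (here, a location) and so could plausibly fail in a single incident.

```python
from collections import defaultdict

# Hypothetical components of one telecom service. The names and fields are
# illustrative assumptions, not data or structures from the Raster tool.
components = [
    {"name": "cable A", "type": "cable", "location": "duct 1"},
    {"name": "cable B", "type": "cable", "location": "duct 1"},
    {"name": "router X", "type": "equipment", "location": "server room"},
    {"name": "cable C", "type": "cable", "location": "duct 2"},
]


def common_cause_candidates(components, shared_key):
    """Group component names by a shared property; keep groups of two or more."""
    groups = defaultdict(list)
    for component in components:
        groups[component[shared_key]].append(component["name"])
    # A single component cannot fail "simultaneously" with itself,
    # so only groups with at least two members are of interest.
    return {value: names for value, names in groups.items() if len(names) >= 2}


candidates = common_cause_candidates(components, "location")
print(candidates)
```

In this sketch the two cables in the same duct are flagged together, matching the example in the text. Whether such a group is a real risk remains a judgement for the analysts.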

Major steps in the Raster method are to draw service diagrams, and to assess the likelihood and potential impact of single and common cause failures. However, unlike other methods, Raster does not take a narrow numerical approach to assessing risks.

Risks with low probability and high impact are especially important. These rare but catastrophic events have been called “black swans”. Raster helps to uncover black swans in telecom services.

Risk assessments are always in part subjective, and information is hardly ever as complete as analysts would like it to be. This does not mean that biases and prejudices are acceptable. Raster tries to nudge analysts into a critical mode of thinking. Uncertainty is normal, and assessments can be explicitly marked as “Unknown” or “Ambiguous” if a more specific assessment cannot be made. Raster can be applied even when much of the desired information on the composition of telecom networks is unavailable or unknown. Missing information can be gradually added.
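The following Python sketch shows one possible way to record assessments so that uncertainty stays visible instead of being hidden behind a guess. Only the values “Unknown” and “Ambiguous” come from the text; the other levels, the component names and the function `needs_follow_up` are assumptions made for illustration and do not describe the Raster tool.

```python
from enum import Enum


class Level(Enum):
    # Hypothetical qualitative scale. Only "Unknown" and "Ambiguous" are
    # named in this manual; the other levels are illustrative assumptions.
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    UNKNOWN = "Unknown"
    AMBIGUOUS = "Ambiguous"


# Hypothetical assessments of three components.
assessments = {
    "underground cable": {"likelihood": Level.LOW, "impact": Level.HIGH},
    "office switch": {"likelihood": Level.UNKNOWN, "impact": Level.MEDIUM},
    "leased line": {"likelihood": Level.MEDIUM, "impact": Level.AMBIGUOUS},
}


def needs_follow_up(assessments):
    """Return the components whose assessment is still Unknown or Ambiguous."""
    undecided = {Level.UNKNOWN, Level.AMBIGUOUS}
    return sorted(
        name
        for name, assessment in assessments.items()
        if undecided & set(assessment.values())
    )


open_items = needs_follow_up(assessments)
print(open_items)
```

Listing the open items separately reflects the point made above: the analysis can proceed with incomplete information, and the missing assessments can be filled in later.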

To avoid a narrow risk assessment, the Raster method is applied by a team of experts, each having their own area of expertise. Raster facilitates cooperation between experts of different backgrounds.

Raster facilitates the construction of a recommendation using a tested methodical analysis. This recommendation is not based only on the technical aspects of failures of telecom services, but also takes account of the societal impact of failures, and of the risk perceptions of external stakeholders.

One final remark: Raster can be deployed on its own, or as part of a company-wide risk management framework. This manual assumes a stand-alone application. When Raster is used as an element within such a framework, the initiation stage (in which the scope of the study is determined) will likely need to be adapted.

The following parties are involved in applying the Raster method.

The case organisation: the method is executed on request of an organisation. This organisation is the requesting client of the study.

The project leader: the person facilitating the application of the method. The project leader can be one of the analysts, or focus on managing the process.

The analysts: the method is executed by a group of professionals. It is essential that this group consists of multiple people. Not only does a single person seldom possess all required knowledge, it is also important that the study leads to an objective and impartial assessment, as much as possible free from personal preferences or personal blind spots.

The team needs to encompass knowledge on essential business activities and technical aspects of telecommunication networks and services. Additionally, it will be useful if team members have some experience with risk assessment, and with the Raster method in particular. Because of this range of knowledge it will be necessary to include employees of the case organisation in the team of analysts.

The sponsor: the person or entity representing the case organisation for the purpose of the study. Typically this will be a manager from the case organisation. The sponsor can be one of the analysts. The sponsor is the customer to the project leader.

The decision makers: the output of the method is a set of recommendations and supporting argumentation that serve as the basis for the selection of risk treatment decisions. Responsibility for the selection does not belong to the analysts, but to the decision makers. The decision maker can be the sponsor, but these roles can also be separate.

The external stakeholders: this category includes all parties that are not part of the case organisation and not involved in the use of telecom services, but do have interests that may be harmed by the risks or chosen risk treatments. External stakeholders may be 'the public' in general, or a specific group such as those people living in the neighbourhood of a facility, the patients of a hospital, customers, etc.

Software tools are available to support the application of the method. Their use is strongly recommended, and this manual assumes that one of these applications is used. There are multiple versions: a standalone application for Windows and macOS, and a web-based tool that requires an intranet server. All versions function almost identically. This manual uses “the application” without further qualification to mean any of these versions.

This manual is for the professionals who will execute the Raster method. It explains the method and provides guidance. These professionals can either be telecom experts or experts in any other field whose expertise is needed.

Examples, notes and tips are typeset in text boxes.

This would be an example, note, tip or shortcut.

Chapters 3 to 6 describe the Raster method; chapters 8 to 13 describe the Raster tool that aids the creation of diagrams and the analysis of Single Failures and Common Cause Failures. When executing an analysis using Raster, you will proceed as in the figure overleaf.

 

Technical issues