SS-E - Practical Safety Instrumentation & Emergency Shutdown Systems for Process Industries
For project managers and engineers involved with hazardous processes, this manual focuses on the management, planning and execution of automatic safety systems in accordance with IEC 61511, the newly released international standard for process industry safety controls.
Overview of Safety Instrumented Systems - Practical Safety Instrumentation & Emergency Shutdown Systems for Process Industries
1 Overview of Safety Instrumented Systems
1.1 Summary of contents
The theme of this chapter can be simply stated by two sentences:
- A business that operates any form of hazardous process needs safety systems
- Safety systems do not work without good management
Safety Instrumented Systems are part of the overall risk reduction measures that a company will typically install to deal with a hazardous process. We explain the basic technical features of a safety system and show what tasks must be carried out to ensure that the protection measures are properly defined and implemented. The performance requirements of safety systems are described in non-technical terms and the relevance of safety integrity to capital and operating costs is spelt out.
We look at the developments that have resulted in a comprehensive, internationally accepted standard, IEC 61511-2003, being available specifically for use in the process industries. The chapter explains the scope and importance of IEC 61511 as a means to achieve and demonstrate high quality in applied safety systems. The version of IEC 61511 published in the USA as ANSI/ISA S84.00.01:2004 is the standard required by OSHA for SIS to achieve compliance with Process Safety Management and General Duty regulations as applied to process plants.
Past failures of safety systems have very often been attributed to human errors in their design and upkeep. Authorities responsible for enforcement of safety have come to the realization that the management of all safety activities is therefore as important as the technical equipment used to carry out safety functions. This is why IEC 61511 defines the management of safety life cycle activities as one of the critical issues in achieving compliance with the standards. This chapter outlines the requirements for management of safety life cycle activities and introduces issues such as staff competency requirements and conformity assessment schemes.
1.2 Introduction and objectives
This IDC training workshop has been developed to provide a broad introduction to the methods and concepts of applying safety instrumented systems to processing plants. The range of process industries that are likely to use this type of safety instrumentation is as broad as the range of process control system applications. The only thing they may have in common is that they have potentially hazardous materials or processes, or large sources of stored energy that could be harmful if something goes wrong.
Safety instrumentation is not exclusively an instrument and control engineering subject. The successful implementation of a safety system project depends on the support and knowledge of other disciplines as well as being dependent on a full commitment from company management structures. It requires the environment of a well defined safety management system within the company. Without proper support structures and a good understanding by all involved in defining safety requirements the safety instrumentation on its own will be unlikely to deliver the levels of safety that are expected of it.
IDC’s past experience with training in this field has indicated there is a need for process engineers and technical managers to be conversant with the basics of functional safety systems as they are broadly described in the new standards. The support structures are a crucial part of the assessment scope for compliance with the new IEC 61508 and 61511 standards. This workshop therefore provides a mix of training in the technical issues of safety instrumentation with training in the project engineering and support activities that are essential for success. It is the responsibility of the instrument engineer to involve colleagues from other disciplines in the safety package. It is the responsibility of management to see that the safety activities are clearly assigned and supported.
The idea of this first training chapter is to provide a substantial overview of the basic issues affecting a safety instrumentation project. It is intended that this chapter can be used as a half day briefing package for any managers and engineers in a company who may wish to learn more about the subject but do not require deeper knowledge of the technical issues. This chapter equally serves instrument or electrical technicians as an introduction to the subjects to be covered in more detail in the subsequent chapters.
- To introduce the concepts of functional safety management and the principles of safety systems to both engineering and management personnel
- To provide a foundation for more detailed training of engineers and technicians
- To assist managers in developing safety engineering competencies within their organization
After this training chapter you should:
- Understand the basic concepts of instrumented protection systems
- Recognize the main activities of a safety system project
- Be able to plan a complete lifecycle project using IEC 61511 for guidance
- Be able to identify the main safety system support tasks required in your organization
- Be able to use IEC 61511 to review your existing practices and identify possible shortcomings
- Know the meaning of safety integrity level, and be aware of its relevance to cost of ownership of a safety system
In this chapter the practical exercise takes the form of a questionnaire. Questions will require knowledge of basic principles to be applied.
1.2.5 Contents and roadmap
The subjects in the chapter include the following:
- Safety system basics
- Risk management principles applied to protection systems
- Process hazard analysis and its link to protection systems
- The legal framework
- The meaning of SILs and their cost implications
- An overview of standards ANSI/ISA S84.01, IEC 61508 and IEC 61511
- An introduction to the safety life cycle as defined in IEC 61511
- The problems and rewards of SIL determination
- Basics of safety instrumentation needed to meet SIL targets
- Why programmable systems need special treatment
- Cost models and the cost of ownership
- Management of functional safety
- Competency requirements and conformity assessment programmes
The following diagram provides a graphical indication of the steps we are going to cover in this chapter.
A roadmap for the safety systems overview
1.3 Safety system basics
To begin this workshop we first need to answer the question: What is safety instrumentation?
Here is a typical definition as given by the UK Health and Safety Executive in their very useful publication “Out of Control: why safety systems go wrong”.
1.3.1 Definition of Safety Instrumented Systems
Safety Instrumented Systems are control systems that take the process to a safe state on detection of conditions that may be hazardous in themselves or if no action were taken could eventually give rise to a hazard. They perform “safety instrumented functions” by acting to prevent the hazard or mitigate the consequences.
The abbreviation SIS is used for “Safety Instrumented Systems” whilst the abbreviation SIF means “Safety Instrumented Function” which is the task or function performed by the SIS. These are terms generally used in engineering standards. You may know the subject by other names because of the different ways in which these systems have been applied. Here are some of the other names in use:
Alternative names found in service:
- Trip and Alarm System
- Emergency Shutdown System
- Safety Shutdown System
- Safety Interlock System
- Safety Related Control System (a more general term for any system that maintains a safe state for the equipment under control, EUC)
SIS operates independently of the Basic Process Control System (BPCS)
We are talking about automatic control systems or devices that will protect personnel, plant equipment or the environment against harm that may arise from specified hazardous conditions.
When applied to a typical process plant situation the SIS is normally seen as a separate control system that acts independently of any other control or persons. The diagram here shows the basic arrangement.
The SIS is an example of a “Functional Safety System”, meaning that safety depends on the correct functions being performed. This distinguishes functional safety from “passive safety” devices such as handrails or blast proof walls. It is a useful term because it identifies the active safety system of any type, whether mechanical, electrical or in any other form, that must function properly to provide safety.
1.3.2 The structure of an SIS
Safety Instrumented Systems are normally regarded as being structured into three parts within a framework or boundary that defines the system. All three parts are always required:
- Sensor sub-system: To capture the data on line from the process
- Logic solver sub-system: To evaluate the data and make decisions on when and how to act
- Actuator sub-system: To execute the required actions on plant
Structure of a Safety Instrumented System
Figure 1.3 shows that the subsystems lie within a boundary that defines the essential SIS whilst it also needs to have interfaces to its users and those who maintain it as well as to the basic plant controls. Items within the boundary must be engineered to the standards required for functional safety systems.
All three sub-systems must perform correctly to ensure that the SIS can provide the required protection. Which brings us to one of the key design principles.
1.3.3 Safety integrity
The degree of confidence that can be placed in the reliability of the SIS to perform its intended safety function is known as its “Safety Integrity”. The concept of safety integrity includes all aspects of a safety system that are needed to ensure it does the job it is intended to perform. One of these aspects will be the hardware reliability of the equipment and the way it responds under all conditions. Other aspects include the accuracy with which it has been designed and the level of understanding of the hazards that went into its original design.
These are topics that we must be concerned with if we are to build a credible or “high integrity safety system”.
We shall see in a moment how safety integrity is graded into levels of performance called SILs or safety integrity levels.
It follows from the structure of the SIS that all three subsystems must individually be good enough to ensure that the overall safety integrity of the SIS meets the intended target or SIL target. This is a useful concept because it means we can concentrate on each subsystem separately at the basic engineering stage.
1.3.4 Practical example of an SIS
It may be useful at this stage to translate the above concepts into something closer to reality. Let’s consider a simple process plant example as shown in figure 1.4. The hazard in this process is seen as the overfilling of a pressure vessel with a toxic chemical leading to release via the relief valve.
The causes of the overfill could be an operational error or a failure of the basic level control instrumentation. An SIS can be designed to independently shut off the incoming feed if the level or pressure becomes high enough to indicate a dangerous condition.
Figure 1.4 shows the SIS added to the plant as an entirely separate control system capable of acting despite any problems with the rest of the plant equipment.
Figure 1.4
Example of a simple shutdown system
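The trip action in this example can be sketched in a few lines of code. This is only an illustration of the sensor, logic solver and actuator chain described above, not a design: the class name, the trip settings and the latching behaviour are all assumptions made for the sketch and do not come from the figure.

```python
from dataclasses import dataclass


@dataclass
class OverfillTrip:
    """Hypothetical high level / high pressure trip for the vessel example."""

    level_trip_pct: float = 90.0    # assumed trip setting, % of vessel level
    pressure_trip_bar: float = 8.0  # assumed trip setting, bar
    tripped: bool = False           # latched state held by the logic solver

    def logic_solver(self, level_pct: float, pressure_bar: float) -> bool:
        """Evaluate the sensor readings and decide whether to trip."""
        if level_pct >= self.level_trip_pct or pressure_bar >= self.pressure_trip_bar:
            # Assumption: the trip latches until it is reset manually.
            self.tripped = True
        return self.tripped

    def feed_valve_command(self, level_pct: float, pressure_bar: float) -> str:
        """Actuator command for the feed shut-off valve."""
        return "CLOSE" if self.logic_solver(level_pct, pressure_bar) else "OPEN"


sis = OverfillTrip()
print(sis.feed_valve_command(level_pct=50.0, pressure_bar=3.0))  # normal operation
print(sis.feed_valve_command(level_pct=95.0, pressure_bar=3.0))  # high level
print(sis.feed_valve_command(level_pct=50.0, pressure_bar=3.0))  # level back to normal
```

Note that the sketch takes its readings directly and does not depend on the basic level control, which reflects the independence shown in the figure.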
This example is sufficient for our overview work and we must now attend to the underlying concepts of hazards and risk reduction.
1.4 Risk reduction and safety integrity
There is a common saying in the control systems world: “if you want to control something, first make sure you can measure it.” We need to control the risks of harm or losses in the workplace due to hazards of all forms. So what we need to measure is: RISK. Here we need to be clear on the terms Hazard and Risk.
1.4.1 What is hazard and what is risk?
A hazard is “an inherent physical or chemical characteristic that has the potential for causing harm to people, property, or the environment”.
In chemical processes: “It is the combination of a hazardous material, an operating environment, and certain unplanned events that could result in an accident”.
Risk is usually defined as the combination of the severity and probability of an event. In other words, how often can it happen and how bad is it when it does happen? Risk can be evaluated qualitatively or quantitatively.
Roughly: Risk = Frequency x Consequence of hazard
Risk reduction can be achieved by reducing either the frequency of a hazardous event or its consequences, or by reducing both of them. Generally the most desirable approach is to first reduce the frequency since all events are likely to have cost implications even without dire consequences.
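As a rough worked example of the relationship Risk = Frequency x Consequence, the sketch below compares a baseline case with one where the frequency is reduced and one where the consequence is reduced. All figures are hypothetical and the consequence is expressed as a cost purely for convenience.

```python
def risk(frequency_per_year: float, consequence_cost: float) -> float:
    """Rough risk measure: expected loss per year."""
    return frequency_per_year * consequence_cost


# Hypothetical figures, chosen only for illustration.
baseline = risk(frequency_per_year=0.1, consequence_cost=1_000_000)
fewer_events = risk(frequency_per_year=0.01, consequence_cost=1_000_000)
milder_events = risk(frequency_per_year=0.1, consequence_cost=100_000)

for label, value in [
    ("baseline", baseline),
    ("frequency reduced tenfold", fewer_events),
    ("consequence reduced tenfold", milder_events),
]:
    print(f"{label}: {value:,.0f} per year")
```

Both measures give the same tenfold reduction in this simple model, which is why either route counts as risk reduction.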
Example for risk reduction
Safety systems are all about risk reduction. If we can’t take away the hazard we shall have to reduce the risk. To know how to do this it helps to look at the theory of measuring risk and then reducing it.
1.4.2 Hazards and risks
All types of safety measures are intended to reduce risk of harm to people, the environment and assets. The hazards most commonly found in process industries are those due to:
- Explosions or bursting due to large amounts of stored energy, chemical reactions or release of flammable vapors
- Fires due to combustion of chemical substances internally or externally to the process or through overheating of equipment
- Toxic releases and exposures or entrapment in gas filled spaces
- Mechanical hazards due to large machines, materials handling, steam and gas discharges
We have seen that risk is usually defined as the combination of the severity and probability of an event. In other words, how often can it happen and how bad is it when it does happen?
Roughly: risk = Frequency x Consequence of hazard
1.4.3 Measurement of risk
Risk can be evaluated qualitatively or quantitatively. The qualitative approach requires that we describe risk in descriptive terms such as “high”, “low” or “moderate”. These terms are only effective if everyone has a good understanding of what they mean in the context of use. Hence a “high risk neighborhood” is not popular with insurance companies. If the terms are well defined or “calibrated” against a scale of values that is generally accepted the qualitative risk measurement can be very effective.
The quantitative approach is easier to define in terms of frequency of events and how many people get hurt but it is often hard to extract a firm number from a situation without a lot of statistical evidence. For the moment our studies will assume a quantitative measure of risk is possible.
Figure 1.6 shows that risk levels can be regarded as similar if a severe consequence may occur rarely or if a less severe consequence occurs more often. It follows that risk reduction can be achieved either by reducing the frequency (likelihood) of the hazardous event or by reducing the consequences.
Principles of risk reduction by reducing frequency or consequence
Usually a functional safety system acts to reduce the likelihood of the hazardous event whilst other operational measures are used to minimize consequences. For example a blast proof wall may protect people against an explosion but it will not reduce the chances of the explosion.
So the easiest way to visualize an SIS providing safety is to regard it as reducing the event frequency.
As shown in Figure 1.7, a plant without a safety system may have an unprotected risk frequency of Fnp which is reduced to a protected risk frequency, Fp, by adding a safety system. The risk reduction provided by the SIS is called the Risk Reduction Factor (RRF) and is simply the ratio of the unprotected risk frequency to the protected risk frequency.
RRF = Fnp/Fp
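A short calculation shows how the ratio is used. The frequencies below are hypothetical: an unprotected event frequency of once in 10 years reduced to once in 1000 years.

```python
def risk_reduction_factor(f_np: float, f_p: float) -> float:
    """RRF = Fnp / Fp, with both frequencies in events per year."""
    if f_np <= 0 or f_p <= 0:
        raise ValueError("frequencies must be positive")
    return f_np / f_p


# Hypothetical frequencies, events per year.
rrf = risk_reduction_factor(f_np=0.1, f_p=0.001)
print(f"RRF = {rrf:.0f}")
```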
This simple ratio makes RRF a very effective index of safety system performance or integrity. The amount of risk reduction provided by the SIS depends on its “safety integrity”.
SIS reducing the frequency of the hazardous event
1.4.4 Introducing safety integrity levels
We have noted that safety integrity depends on hardware and design. We have seen that the required RRF provides a scale of performance for the ability of a safety system to reduce risk. We can therefore use RRF as a measure of safety integrity. Safety system engineers recognize that it is helpful to grade safety integrity into four distinct bands of risk reduction capability known as the safety integrity levels.
Figure 1.8 shows how 4 safety integrity levels are recognized and how these levels encompass 4 ranges of RRF capability.
In practice a SIL 1 safety system is the most commonly used and provides risk reduction in the range from 10:1 to 100:1. In the process industries the highest SIL rating used is normally SIL 3 whilst SIL 4 is only attempted under very special circumstances. The SIL levels 1 to 3 therefore represent a coarse scale of safety performance for the SIS. The challenge will be to choose the right SIL for any particular problem.
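The grading of RRF into bands can be expressed as a simple lookup. The SIL 1 range of 10 to 100 is taken from the text above; the ranges used here for SIL 2 to SIL 4 (each a further factor of ten) follow the demand mode table in IEC 61508 and IEC 61511, and the treatment of values that fall exactly on a band limit is an assumption of this sketch.

```python
# Upper RRF limit of each band, lowest first.
SIL_UPPER_LIMITS = [(1, 100.0), (2, 1_000.0), (3, 10_000.0), (4, 100_000.0)]


def sil_for_required_rrf(required_rrf: float) -> int:
    """Return the lowest SIL whose band can deliver the required RRF.

    Returns 0 when the required RRF is 10 or less, i.e. below the SIL 1 band.
    """
    if required_rrf <= 10.0:
        return 0
    for sil, upper_limit in SIL_UPPER_LIMITS:
        if required_rrf <= upper_limit:
            return sil
    raise ValueError("required RRF is beyond the SIL 4 band")


for rrf in (5, 50, 500, 5_000):
    print(f"required RRF {rrf}: SIL {sil_for_required_rrf(rrf)}")
```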
Table of safety integrity levels
1.4.5 Demand mode and continuous mode
The new standards have clarified the fact that there are two basic types of safety controls. In the process industry, “Demand Mode” is widely used and applies when the safety trip is expected to be called upon less than once per year. These are the familiar safety trip systems that are used to shut down the process in an emergency. The hardware reliability that is expected of a SIL rating is derived from the “Probability of failure on demand”. As can be seen in figure 1.9 this is the PFDavg.
In “Continuous Mode” the safety trip or control action is expected more than once per year. This would be the case, for example, where a start-up safety interlock may have to act daily or once per month. This type of control is regarded as a safety control system and the SIL is derived from:
Frequency of dangerous failures per hour.
The engineering principles for these two modes are exactly the same but the method of calculating the possible failure and consequent accident rate is different. We shall look at this point in more detail later in the workshop but it is worth noting at this stage that where a plant is encountering trips more often than once per year the validity of its SIS performance claims may have to be revised to lower values.
SIS operating in demand mode
In conclusion, the Demand Mode features are summarized in figure 1.9 where the SIS can be seen responding to the possible hazard as a demand to take action. Only when the SIS fails will the demand be allowed to become a hazardous event.
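The demand mode relationship can be sketched numerically. In this simple model the hazardous event frequency is the demand frequency multiplied by PFDavg, since a demand only becomes a hazardous event when the SIS fails to respond. The once-per-year threshold is taken from the text; treating exactly one demand per year as demand mode, and the example figures, are assumptions.

```python
def operating_mode(demands_per_year: float) -> str:
    """Classify the mode of operation using the once-per-year threshold."""
    # Assumption: exactly one demand per year is still treated as demand mode.
    return "demand" if demands_per_year <= 1.0 else "continuous"


def hazardous_event_frequency(demands_per_year: float, pfd_avg: float) -> float:
    """Events per year that get past a demand mode SIS."""
    if not 0.0 <= pfd_avg <= 1.0:
        raise ValueError("PFDavg is a probability and must lie between 0 and 1")
    return demands_per_year * pfd_avg


# Hypothetical case: one demand every two years, PFDavg of 0.01.
demands = 0.5
print(operating_mode(demands))
print(f"{hazardous_event_frequency(demands, pfd_avg=0.01):.4f} events per year")
```

Comparing this with RRF = Fnp/Fp shows that, for a demand mode SIS in this model, the RRF is the reciprocal of PFDavg.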
1.5 Protection layers
Now that we see the SIS as a risk reduction element it is helpful to see how it fits in the context of overall plant safety. This will enable us to see how the SIL target can be adjusted to provide best overall value from the plant safety systems.
1.5.1 Belt and braces
Layers of protection model
The concept of protection layers applies to the use of a number of safety measures all designed to prevent the accidents that are seen to be possible. Essentially this concept identifies all “belts and braces” involved in providing protection against a hazardous event or in reducing its consequences. Figure 1.10 shows the concept where the core risk due to a hazard is seen to be contained by successive layers of protection leaving a minimal or acceptable risk level at the outside boundary.
Protection layers can be divided into two main types: Prevention and Mitigation as seen in figure 1.10:
- Prevention layers:These try to stop the hazardous event from occurring
- Mitigation layers: Mitigation layers reduce the consequences after the hazardous event has taken place
1.5.2 Prevention layers
Examples of prevention layers include:
- Plant design
Plants should be designed as far as possible to be inherently safe. This is the first step in safety and techniques such as the use of low-pressure designs and low inventories are obviously the most desirable route to follow wherever possible
- Process control and work procedures
The control system and the working procedures for operators play a role in providing a safety layer since they try to keep the machinery or process within safe bounds. However we shall see later that their contribution to plant safety is limited and can sometimes be overrated.
- Alarm systems
Alarm Systems have a very close relationship to safety shutdown systems but they do not have the same function as a safety instrumented system. Essentially alarms are provided to draw the attention of operators to a condition that is outside the desired range of conditions for normal operation. Such conditions require some decision or intervention by persons. Where this intervention affects safety, the limitations of human operators have to be allowed for.
- Mechanical or Non-SIS protection layers
A large amount of protection against hazards can often be provided by mechanical safety devices such as relief valves or overflow devices. These are independent layers of protection and play an important role in many protection schemes.
- Shutdown systems (SIS)
The safety shutdown system provides a safety layer through taking automatic and independent action to protect the personnel and plant equipment against potentially serious harm. The essence of a shutdown system is that it is able to take direct action and does not require a response from an operator.
1.5.3 Mitigation layers
Mitigation layers are identified as those measures that reduce the consequences of the hazardous event after it has occurred. Examples include: Fire & Gas systems, Containments and Evacuation Procedures.
Using more than one method of protection is generally the most successful way of reducing risk. The safety standards rate this approach very highly and it is particularly strong where a SIS is backed up with, say, a mechanical system or another SIS working on a completely different parameter.
1.5.5 Risk reduction models
It is often helpful to visualize risk reduction by using a graphical model as seen in the example shown in figure 1.11.
SIS seen as a layer of protection
This model indicates the core risk of a toxic release from a hazardous process and shows a potential release frequency of Fnp. The successive and diverse layers of protection reduce the risk frequency at each stage until the residual risk becomes Fp. Note the role of the SIS in this example.
Risk reduction models help us to see how the risk reduction tasks have been “allocated” to various protection layers.
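The stage-by-stage reduction in such a model can be sketched as a running product. The layer names and failure probabilities below are hypothetical and are not read from the figure, and the multiplication is only valid if the layers are fully independent of each other.

```python
def frequency_after_each_layer(f_np, layers):
    """Return (layer name, residual frequency) after each protection layer.

    layers is a list of (name, probability that the layer fails on demand).
    Assumes the layers fail independently of each other.
    """
    results = []
    frequency = f_np
    for name, pfd in layers:
        frequency *= pfd
        results.append((name, frequency))
    return results


# Hypothetical layers and values, for illustration only.
layers = [
    ("process control and alarms", 0.1),
    ("SIS", 0.01),
    ("mechanical protection", 0.01),
]

stages = frequency_after_each_layer(f_np=1.0, layers=layers)
for name, frequency in stages:
    print(f"after {name}: {frequency:.0e} per year")
```

The last value in the list corresponds to Fp, the residual risk frequency.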
1.5.6 The problem of common cause
The idea of protection layers and successive risk reduction is only valid if the layers are fully independent of each other. It assumes that if one layer fails the other layers will still do the job. If there is a possibility that two or more layers could fail at the same time the assumptions become invalid and the protection systems are said to have a “common cause failure”.
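A simple numerical sketch shows why this matters. The model below is an assumption made for illustration, not a method taken from the standards: a single shared cause disables every layer with probability `p_common_cause`, and otherwise the layers fail independently. All values are hypothetical.

```python
from math import prod


def probability_all_layers_fail(layer_pfds, p_common_cause=0.0):
    """Probability that every protection layer fails on the same demand.

    Assumed model: with probability p_common_cause a shared cause defeats
    all layers together; otherwise the layers fail independently.
    """
    independent = prod(layer_pfds)
    return p_common_cause + (1.0 - p_common_cause) * independent


layer_pfds = [0.1, 0.01, 0.01]  # hypothetical values

independent_only = probability_all_layers_fail(layer_pfds)
with_common_cause = probability_all_layers_fail(layer_pfds, p_common_cause=0.01)

print(f"independent layers: {independent_only:.1e}")
print(f"with common cause:  {with_common_cause:.1e}")
```

Even a small common cause probability dominates the result, because it bypasses the multiplication of the individual layer values.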
Whilst common cause failures may be attributable to some form of engineering factor, a more likely cause is a failure in the management of overall safety that affects two or more safety layers. One tragic example was seen in the events at the Bhopal pesticides plant in India as illustrated by the next figure 1.12.
These notes are based on the account of the accident described in the book “Five Past Midnight in Bhopal” by D Lapierre and J Moro (see ref 4 in appendix 1).
The plant had 3 storage tanks for methyl isocyanate (MIC), an unstable liquid that decomposes into a range of toxic components as its temperature rises above 15 C. Most deadly of these is hydrocyanic acid or cyanide gas, which when inhaled, typically leads to death in a very short time.
The Bhopal disaster: all safety layers disabled
The safety systems for the tanks comprised 4 protection layers:
- Each tank was to be operated at no more than 50 % capacity to allow room for a solvent to be added in case a chemical reaction started in the tank
- The tank contents were to be kept below 15C by means of a refrigerant system circulating Freon through cooling pipes at the tanks. A high temperature alarm was provided on each tank to alert operators to an abnormal temperature rise
- Should any gases start to emerge from the tanks they should be absorbed by caustic soda injection as they pass through a decontamination tower
- Finally, if any gases escape the absorber tower they should be burned off by a flare at the top of a 34-metre flare stack
Due to a lack of demand for the pesticides produced by the plant there had been a long period of time when production had been shut down or kept to a minimum. The plant equipment and operating standards had been allowed to deteriorate. Finally on the night of 2 December 1984 the tanks appear to have been contaminated with hot water from a pipe-flushing task. This led to an uncontrollable reaction, which ruptured the tanks, the first of which was 100% full and contained 42 tons of MIC. The resulting gas clouds blew across the settlements adjoining the factory fence and onwards into the city. The death toll is disputed but is claimed by Lapierre and Moro to be between 16 000 and 30 000 with around 500 000 people injured.
How could 4 layers of protection be defeated? The simple answer is that there was a common cause failure that was not factored into the safety calculations: the failure to manage the plant according to intended safety and maintenance practices.
The individual failures were:
- Tanks were not kept below 50% full as intended for safe operating practices.
- The refrigeration system had been turned off months earlier including the alarm system because the plant manager did not believe it was necessary to keep the MIC at 5 C. The ambient temperature was 20 C.
- The decontamination tower was offline for maintenance and had been so for a week.
- The flare stack was also out of service for maintenance.
The chemical industry has hopefully learned a lot of hard lessons from the Bhopal disaster, but it is informative to read the details and see how familiar the reported problems are.
1.5.7 Summary of hazards and risk reduction
What we have seen so far indicates that the SIS is just one component of an overall risk management strategy for a hazardous activity in a manufacturing plant. For a SIS to be effectively designed and implemented, the following key aspects of a SIS project will have to be assured.
- Hazard studies and hazard analysis
Identify the hazards and estimate the risks.
- Definition of overall safety targets for each type of risk
The overall amount of risk reduction needed for the hazard needs to be defined by someone who knows what is acceptable: This is a management or corporate responsibility.
- Allocate risk reduction functions and RRFs to layers of protection
This defines the risk reduction contribution of the SIS and hence defines its target SIL.
- Ensure that each safety layer is managed to deliver the required risk reduction
This requires correct design procedures in each discipline and requires work procedures and responsibilities to be defined and supported by management.
- Ensure that the SIS delivers the required functional safety
What does it take to ensure the SIS will deliver the required functional safety?
We are going to investigate the answer to this question in the next sections. To proceed further we should now look for guidance from the standards. The next section introduces the standards, describes how they have come about and shows what they cover.
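The allocation step in the list above can be illustrated with a short calculation: the overall risk reduction needed is the unprotected frequency divided by the tolerable frequency, and whatever the other layers do not provide is left for the SIS. The function name and all figures are hypothetical, and the layers are assumed to be independent.

```python
from math import prod


def required_sis_rrf(f_np, f_tolerable, other_layer_rrfs):
    """RRF left for the SIS after crediting the other protection layers.

    f_np             unprotected event frequency, per year
    f_tolerable      tolerable event frequency set by management, per year
    other_layer_rrfs risk reduction factors credited to the non-SIS layers
    """
    total_rrf = f_np / f_tolerable
    return total_rrf / prod(other_layer_rrfs)


# Hypothetical allocation: two non-SIS layers each credited with an RRF of 10.
sis_rrf = required_sis_rrf(f_np=0.1, f_tolerable=1e-5, other_layer_rrfs=[10, 10])
print(f"RRF required from the SIS: {sis_rrf:.0f}")
```

With these figures the SIS must provide an RRF of about 100, which sits at the top of the 10:1 to 100:1 range quoted earlier for SIL 1. This is how the allocation defines the target SIL.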
1.6 Safety management principles
It helps to look at the principles of risk management because they can be applied directly to safety management. Understanding risk management will show us how the application of Safety Instrumented Systems is an integral part of the overall task of managing risk in a company.
Why is this important?
Because both managers and engineers can do a better job for safety instrumentation by understanding its context and relevance to the overall business and the risk it carries.
1.6.1 The meaning of safety management
What does safety management mean for a manufacturing plant or large item of equipment?
Safety management involves the provision of a safe working environment for all persons involved in the manufacturing process. It extends to cover the safety of the environment and the security of the business from losses.
The fundamental components of safety management will include:
- Having a systematic method of identifying and recording all hazards and risks presented by the subject plant or equipment
- Ensuring that all unacceptable risks are reduced to an acceptably low level by recognized and controllable methods that can be sustained throughout the life cycle of the plant
- Having a monitoring and review system in place that monitors implementation and performance of all safety measures
- Ensuring all departments and personnel involved in safety administration are aware of their individual responsibilities
- Responding to regulatory requirements from national and local authorities for the provision of adequate safeguards against harm to persons and the environment
- Maintaining a risk register and a safety case report that demonstrates adequate safety measures are in place and are being maintained at all times
Safety management is effectively the same as the more general term, risk management, but applied specifically to risks associated with harm to persons, property or environment. Let’s take a closer look at risk management principles to see what we can learn from them.
1.6.2 Risk management defined
Risk management is a very broadly used term and it is typically applied to business and organizational activities. The broad scope of this term can be seen in the definition of risk management taken from the Australian/New Zealand standard AS/NZS 4360:1999 clause 1.3.24. (Latest version AS/NZS 4360:2004)
“Risk management - The culture, process and structure, which come together to optimize the management of potential opportunities and adverse effects”
The application of risk management to occupational health and safety is just one of the many areas where the techniques are used. Let’s look at a few basic processes in risk management to show how they match up to established or emerging methods in engineering systems. The following notes are based on guidance provided in the guideline document “A basic introduction to managing risk”, published as an Australian guideline HB 142-1999 by Standards Australia. (Now superseded by HB 436:2004, Risk Management Guidelines: Companion to AS/NZS 4360:2004.)
According to this guidance, risk management:
- Requires rigorous thinking. It is a logical process, which can be used when making decisions to improve the effectiveness and efficiency of performance.
- Encourages an organization to manage pro-actively rather than reactively.
- Requires responsible thinking and improves the accountability in decision making.
- Requires balanced thinking… “Recognizing that a risk-free environment is uneconomic (if not impossible) to achieve, a decision is needed to decide what level of risk is acceptable”.
- Requires understanding of business operations carried on, where conformity with process will alleviate or reduce risk.
Hazard studies are part of the disciplined approach to managing risks in plant operations and they must be conducted in accordance with the principles shown here.
1.6.3 The process for managing risk
It turns out that the models suggested for managing risk are the same as those we find in the procedural models described for safety life cycle activities that we shall be looking at later. This is encouraging since it means that one procedural model fits all circumstances and no specialties are involved for safety. If the company recognizes risk management in its business, it should have no problem understanding safety management.
Here is a diagram of a general risk management model based on the version originally published in AS/NZS 4360: 1999. (Latest version AS/NZS 4360: 2004)
The process for managing risk
This model is intended to serve for all risk management activities within a company. These begin with strategic risk management applicable to the corporate planning levels where key business decisions can be subjected to risk evaluation and treatment. There are close parallels with the management of engineering risks and the management of functional safety. Let’s examine the meaning of each step of the process.
1.6.4 Establishing the context
The context includes:
- Strategic context: In our field of work this would be typically defined by the organization’s overall Safety Health and Environment (SHE) policy. It would also define the legal framework or regulatory compliance needs for the plant in question.
- Organizational context: Requires an understanding of the organization and its capabilities. For example, is the plant in a high-tech or low-tech area?
- Risk management context: Defining which part of the organization or which activities are in the scope. This would be the specific manufacturing plant or process under consideration.
- Risk evaluation criteria: Defines the criteria against which any risk is to be evaluated. We shall see that in our field this includes the so-called tolerable risk criteria for risks of harm to persons, environment and asset losses. Risk management and risk reduction cannot be conducted without some reference points for what is acceptable.
- Structural context: Deals with how the risk management process is to be handled and documented within the organization. Expect this to lead to a definition of who is responsible for the supply of information, conducting studies and managing the documentary records. In the case of SHE risk management the documentary records are of critical importance and will require a quality management system.
1.6.5 Identify risks
With the context in place the risk management model says, “identify the risks”. The HB 436 guide (see Para.1.6.3) raises the issue of “perceptions of risk” and points out that: “perceptions of risk can vary significantly between technical experts, project team members, decision makers and stakeholders”.
In this workshop we have to take the “technical experts” route to risks, as we shall see below. It is instructive to note that the layperson sees risk on a more personal and subjective scale.
“…lay persons are less accepting of risk over which they have little or no control (e.g. public transport versus driving one’s own car), where the consequences are dreaded or the activity is unfamiliar.”
This is the stage where hazard studies are performed to answer the questions: “What can happen? How can it happen?” The result is a list of risks with the possible causes. This provides the foundation for the “Risk Register”, a convenient way of registering all known risks at the plant and recording what measures have been taken to reduce them to an acceptable level.
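To make the idea of a risk register concrete, here is a minimal sketch in Python. The class names, fields and example hazards are all hypothetical and are not taken from any standard; a real register would normally carry far more detail (owners, dates, references to study records).

```python
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    """One row of a hypothetical risk register."""
    hazard: str                 # what can happen
    causes: list[str]           # how it can happen
    measures: list[str] = field(default_factory=list)  # risk reduction measures taken
    acceptable: bool = False    # True once the residual risk is judged acceptable


class RiskRegister:
    """Holds all known risks and reports those still needing treatment."""

    def __init__(self) -> None:
        self._entries: list[RiskEntry] = []

    def add(self, entry: RiskEntry) -> None:
        self._entries.append(entry)

    def open_items(self) -> list[RiskEntry]:
        """Return risks not yet reduced to an acceptable level."""
        return [e for e in self._entries if not e.acceptable]


# Illustrative entries only
register = RiskRegister()
register.add(RiskEntry("Reactor overpressure", ["cooling failure", "blocked vent"]))
register.add(RiskEntry("Tank overfill", ["level transmitter fault"],
                       measures=["high-level trip"], acceptable=True))

open_hazards = [e.hazard for e in register.open_items()]
print(open_hazards)
```

The point of the sketch is simply that each risk stays on the list, with its causes and measures recorded, until someone can show it has been reduced to an acceptable level.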
1.6.6 Analyze risks
The next step is “Analyze the risk”. You can see from the diagram that it is necessary to establish a level of risk based on the criteria we mentioned earlier. The likelihood and the consequences must be found, the resultant risk established and applied to a scale of risk used to set priorities. The result is known as the risk ranking value and is often obtained by using a risk matrix.
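A risk matrix can be sketched in a few lines of Python. The four-point scales, the multiplication rule and the priority thresholds below are assumptions chosen purely for illustration; each company defines its own scales and bands.

```python
# Hypothetical 1-4 scales; a real study would use the company's own criteria.
LIKELIHOOD = {"remote": 1, "unlikely": 2, "possible": 3, "frequent": 4}
CONSEQUENCE = {"minor": 1, "serious": 2, "major": 3, "catastrophic": 4}


def risk_rank(likelihood: str, consequence: str) -> int:
    """Risk ranking value as likelihood score times consequence score."""
    return LIKELIHOOD[likelihood] * CONSEQUENCE[consequence]


def priority(rank: int) -> str:
    """Map a ranking value to an assumed priority band."""
    if rank >= 9:
        return "high"
    if rank >= 4:
        return "medium"
    return "low"


# Illustrative risks: name -> (likelihood, consequence)
risks = {
    "Sample point leak": ("possible", "minor"),
    "Reactor overpressure": ("possible", "catastrophic"),
    "Tank overfill": ("unlikely", "serious"),
}

# Highest ranking value first, to set priorities for treatment
ranked = sorted(risks, key=lambda name: risk_rank(*risks[name]), reverse=True)
for name in ranked:
    rank = risk_rank(*risks[name])
    print(f"{name}: rank {rank}, priority {priority(rank)}")
```

Sorting by the ranking value gives the hazard study team an ordered list, so that attention goes first to the risks that combine high likelihood with severe consequences.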
1.6.8 Evaluate risks
The next step is to compare the risk level with certain reference points to decide if the risk level is acceptable or not.
Evaluation of risk and the treatment stages
If the risks are unacceptable the choice is to treat the risks or decide to avoid the risks altogether by doing something else.
The diagram introduces the concept of “tolerable risk” or “acceptable risk”. In business practice, the reference point for acceptable risks may depend on the company and its senior management. When it comes to safety and operability there is less room for flexibility. We are concerned with what is acceptable to society and our workers as a “tolerable risk”.
We are going to take a closer look at tolerable risk concepts in a few moments. Before that, let’s look at the general-purpose model for risk treatment.
Details of treatment of risks (based on AS/NZS 4360: 1999 but modified for safety studies)
This diagram is informative for us in safety management because it demonstrates the options and decisions that have to be considered during a hazard analysis and after a Hazop study. In fact this diagram covers all stages in the life cycle of the situation being considered. We shall see this theme recurring throughout the workshop. Let’s consider the terms on the left-hand side of the diagram:
1.6.9 Identify treatment options
In safety applications we are often able to reduce the risk by treating the likelihood (i.e. reducing the chances of the accident). Sometimes it is necessary to reduce the consequences by what is called “mitigation”. (Putting on gas masks after a gas escape is a simple example of mitigation.) Protection methods to reduce risk are described as “layers of protection” and we shall be looking at those shortly.
One solution to an unacceptable risk is to avoid it altogether. Unfortunately, this route sometimes implies not building the plant and this has to be considered along with all other options. One of the most important outcomes of a hazard study can be the decision to abort the whole project or adopt an alternative technology on the grounds of unacceptable risk to persons and environment.
1.6.9 Assess treatment options
This is a very interesting stage of risk analysis. We have to consider feasibility, costs and benefits of the possible risk treatment options.
In the case of an engineering project the choices typically come down to:
- Shall we redesign the process to minimize hazard?
- Shall we provide alarms and trips to shut down the process when the hazardous condition approaches?
- Shall we provide a blast-proof room and evacuation facilities to protect the persons on the plant?
- Shall we do all of these things?
To make a good decision here requires knowledge of the process and the protection methods, some experience and some good cost information. Someone has to do a quantitative analysis of the risks. The problem for hazard study teams and project managers is often that the analysis of the risk is approximate and the cost implications of some of the solutions are not readily available. And there may not be much time available for the choices to be made as project deadlines always demand an early decision.
Assume for the moment that the approximate cost of all risk treatment options is known in a particular case. If a choice of options is available, the decision can be made by looking for a trade-off between the achievable risk level and the cost of achieving it. The relationship model is typically as shown in the next diagram.
Typically, the cost of reducing risk levels will increase with the amount of reduction achieved and it will follow “the law of diminishing returns”. Risk is usually impossible to eliminate so there has to be a cut off point for the risk reduction we are prepared to pay for. We have to decide on a balance between cost and acceptable risk. This is the principle of ALARP that we shall examine in the next section.
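As a rough illustration of this trade-off, the following Python sketch picks the cheapest treatment option that meets a target risk level. The option names, costs, residual event frequencies and the target are all invented for the example, and selecting purely on lowest cost is a simplifying assumption rather than a recommended decision rule.

```python
from typing import NamedTuple, Optional


class Option(NamedTuple):
    name: str
    cost: float            # capital cost, arbitrary currency units (assumed)
    residual_risk: float   # hazardous event frequency per year after treatment (assumed)


def cheapest_acceptable(options: list[Option], target: float) -> Optional[Option]:
    """Return the lowest-cost option whose residual risk meets the target, if any."""
    acceptable = [o for o in options if o.residual_risk <= target]
    return min(acceptable, key=lambda o: o.cost) if acceptable else None


# Invented figures showing diminishing returns: each further step in
# risk reduction costs much more than the one before.
options = [
    Option("Alarms only", 50_000, 1e-2),
    Option("SIS trip", 200_000, 1e-3),
    Option("SIS trip plus blast-proof room", 900_000, 5e-4),
]

choice = cheapest_acceptable(options, target=1e-3)
print(choice.name if choice else "No option meets the target")
```

In this invented case the last option halves the residual risk again but costs more than four times as much, which is the kind of cut-off decision the text describes.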
The second factor that will influence the hazard study work is the relationship between design changes and their impact on project costs. There are heavy cost penalties involved in late design changes. Hence it pays to design the hazard study program to identify critical safety and operability problems at an early stage. This is where preliminary hazard study methods are valuable. Preliminary studies can often identify major problems at the early stage of design where risk reduction measures or design changes can be introduced with minimum costs.
Risk reduction versus cost
Cost of design changes against project time
1.6.10 Prepare treatment plans
The next step in the risk management model is to detail the chosen or proposed solutions to the risk problems. In safety systems, this translates into what is known as the “safety requirements specification”. Later in the workshop we are going to examine this stage in detail to make sure the transition from problem identification to solution works properly. The need for monitoring and review becomes critical from this point on as we seek to make sure the solutions still fit the problem.
This stage is completed when the chosen solutions are ready for use and have been validated to be correct for the original purpose.
1.6.11 Implement treatment plans
Implementation covers the in-service operation of the safety systems and is supported by the monitoring and review process. The model shows that the question of acceptable risk is to be kept open and under review. This philosophy requires, for example, that the hazard study information is kept up to date and that periodic reviews are held to see that the risk levels are still acceptable.
1.6.12 Practical versions of risk management for plant safety
Practical implementation of risk management
The SIS project is an integral part of the overall safety management system for a process plant that presents a hazard. All the elements of risk management translate into practical activities for the specification and design of Safety Instrumented Systems. Figure 1.17 shows the key elements of the safety project with the hazard study stages on the left and the SIS implementation on the right.
- The preliminary hazard studies identify the risks and place them into a risk register.
- The risk register records the risk reduction needs for each risk and the treatment options deliver the requirements for safety into the core documents for the plant and for SIS. These are called the safety requirements specifications.
- The requirements are used as the basis for building the safety systems. Non-SIS devices are those such as relief valves and protected buildings. The SIS takes its share of the risk reduction, known as its safety allocation.
- Hazop studies examine the detailed P&I diagrams for the plant and should be used to confirm that the planned safety measures are still acceptable. Sometimes the Hazops identify new hazards and risks and these are then added to the list.
- As the plant moves into construction and commissioning, follow-up studies confirm that the measures listed by previous studies have been implemented and these support the validation of the completed SIFs.
- The final risk rating should be low enough to be considered acceptable for all personal, environmental and business risks. Validation of the installed SIS and other measures seeks to confirm that the risk reduction objectives have been achieved.
- Once the plant is fully operational, all the operating and maintenance procedures are aimed at keeping up the standard of performance of the various safety systems.
- Periodic reviews of both the hazard studies and the SIS performance are used to ensure risk levels are being kept within the target range.
1.6.13 Conclusions from risk management
We have seen how the generalized models for risk management are directly applicable in safety management. Risk management involves the systematic analysis of risk levels, knowledge of acceptable risk levels and the selection of measures to reduce risk to the acceptable level. The selection of measures involves balancing the level of safety achieved against the cost of achieving it.
When we look at the new application standards for Safety Instrumented Systems it is easy to recognize the same principles being applied. Industry therefore has available a set of recognized standards and practices for designing and operating safety systems that aligns with well-established principles of risk management. Does it also have a legal obligation to use them? Let’s take a look at the legal framework.
1.7 The legal framework for process safety
Where do we stand with regard to the legal requirements for safety? What does the law require us to do? Are there any safety targets that we are legally required to meet?
Most industrialized countries have legal frameworks in place that are similar in nature and have been substantially improved in recent years. Safety regulation now emphasizes the need for a complete safety management system. This aims to deal with the fact that many accidents can be traced back to failures to manage the various aspects of safety from identification of hazards through to training and continued monitoring of safety performance.
The general principles outlined here are those commonly seen in regulations in the USA and in Europe. These provide a good indication of what one should expect to be doing to satisfy good practices anywhere. The most commonly seen principle is that all potentially hazardous activities must be subject to a risk assessment process. This typically comprises:
- Hazard studies to identify hazards and risks
- Risk analysis to decide the level of risk
- Decide if risk reduction measures are needed
- Implement risk reduction measures
- Confirm that the process is now safe to an acceptable level of risk.
- Carry out periodic audits and reviews of safety studies and achieved performance
In the case of the process industries, plants having a known hazardous process or having major accident potential are required to develop a comprehensive safety case for inspection by authorities and this will include proving that they have a good safety management system in place. They are required to carry out process hazard analysis studies at frequent intervals to ensure the plant risk assessments and treatment methods are up to date with the current version of the plant.
1.7.1 International trends in safety practices
Typical structure of health and safety regulations for industry
In most countries Occupational Health and Safety (OHS) regulations lay down basic requirements for employers to safeguard their workers and the public from harm. The overall OHS requirements are typically supplemented by additional regulations that target particular sectors of industry where significant problems with hazards are known. The above diagram shows the characteristic structure seen in the USA, Europe, South Africa, Australia and other countries.
The OHS types of regulations usually require that a risk assessment be carried out on the occupations and processes at the place of work. They normally require a reporting and review system to assist regulatory oversight.
Specific regulations have been generated for particular types of industry that supplement the basic OHS requirements. For example the principal regulations affecting the chemical industries in the USA are:
- OSHA regulations for “Process Safety Management of Highly Hazardous Chemicals; Explosives and Blasting Agents” (29 CFR 1910.119), referred to as the “PSM rule”
- EPA regulations under the Clean Air Act (40 CFR Part 68): Accidental Release Prevention Requirements, Risk Management Program, referred to as the “RMP Rule”
The widespread application of the PSM and RMP rules means that process hazard analysis (PHA) is an essential technique for very many companies in the USA. In particular the more critical process plants will be most likely to employ detailed HAZOP procedures as the routine method for assisting them to comply with the regulations.
Periodic reviews of existing hazard studies are part of the mandatory review procedures built into safety management systems. Some countries define mandatory review intervals. The PSM rule in the USA requires companies to update or revalidate their process hazard analysis at least every 5 years; in South Africa the Major Hazard Installations Act has set the review interval at 3 years.
The PSM rule was an improvement over earlier safety regulations and was driven by the realization that major hazard potentials at plants were not being managed to adequate standards in some areas. The main driving force was said to be the Pasadena, Texas incident. Subsequently the USA set up the Chemical Safety Board to track all chemical plant accidents. Their annual reports have shown some startling facts:
Studies have been carried out in response to these concerns and there have been significant if sometimes conflicting findings.
1.7.2 European regulations
In Europe the major hazard regulations are derived from the Seveso II directive (96/82/EC) and its amendments. The directive originates from the Seveso I directive that was introduced following the disastrous events at Seveso in Northern Italy.
The first Seveso directive was later revised and extended, again stimulated by accidents such as Bhopal, India 1984 and Basel, Switzerland, 1986. The current version is known as the Seveso II directive:
Incidents at Seveso, Italy
Outline of Seveso II
The SEVESO II Directive sets out basic principles and requirements for policies and management systems, suitable for the prevention, control and mitigation of major accident hazards.
Establishments that have the potential for major accidents are required to comply with the requirements of the directive in the form of national laws that are passed to enact the EU directives. The establishments are classed into “lower tier” and “upper tier” according to size of inventories and the size of the plant.
- Lower tier establishments are to draw up a Major Accident Prevention Policy (MAPP), designed to guarantee a high level of protection for man and the environment by appropriate means including appropriate management systems, taking account of the principles contained in Annex III of the Directive
- Upper tier establishments (covered by Article 9 of the Directive and corresponding to a larger inventory of hazardous substances) are required to demonstrate in the ‘safety report’ that a MAPP and a Safety Management System (SMS) for implementing it have been put into effect in accordance with the information set out in Annex III of the Directive
1.7.3 Development of a Major Accident Prevention Policy (MAPP)
The Seveso II directive states:
“The major accident prevention policy should be established in writing and should include the operator’s overall aims and principles of action with respect to the control of major accident hazard”
Activities in support of the SMS are defined in the directive. These include:
- Organization and personnel: Roles and responsibilities of personnel, identification of training needs and the provision of training. The operator should identify the skills and abilities needed by such personnel, and ensure their provision
- Hazard identification and evaluation: Includes procedures to systematically identify and evaluate hazards, define measures for the prevention of incidents and mitigation of consequences
- Operational control: Documented procedures to ensure safe design and operation of the plant. Safe working practices should be defined for all activities relevant for operational safety
- Management of change: Operating company should adopt procedures for planning and controlling all changes in people, plant, processes and process variables, materials, equipment, procedures, software, design or external circumstances which are capable of affecting the control of major accident hazards
- Planning for emergencies: An emergency plan is required
- Monitoring performance: The operator should maintain procedures to ensure that safety performance can be monitored and compared with the safety objectives defined
- Audit and review: Independent audit of the organization and its processes. Management to keep its SMS under review for essential correction or changes
The above principles have been transferred into national laws in member states of the EU. In the UK, for example, the directive is implemented as the Control of Major Accident Hazards (COMAH) regulations and has been in force since Feb 1999. The two-tier reporting requirements are defined as per the directive. Additionally, all hazardous chemical and other substances used in industry are subject to the Control of Substances Hazardous to Health (COSHH) regulations, 1994.
1.7.4 Are there any legal requirements for tolerable risk targets?
Regulations usually leave open the question: how safe is safe enough? The approach that has been widely adopted in industry is to avoid specifying absolute numbers as a measure of safety but rather to use a comparative scale for similar situations, or “context”, to use the term seen earlier. We are going to study this issue in chapter 4 but here is a brief summary of the subject.
When considering how much risk reduction is needed for a process risk, the general approach by safety authorities is to see if the company has followed the principle of ALARP, meaning As Low As Reasonably Practicable.
The ALARP principle is commonly represented by the following “ALARP Diagram”.
ALARP diagram based on the version published in IEC 61511-3 Annex A Figure A-1
The ALARP (as low as reasonably practicable) principle recognizes that there are three broad categories of risks:
- Negligible risk: Broadly accepted by most people as they go about their everyday lives, these would include the risk of being struck by lightning or of having brake failure in a car
- Tolerable risk: We would rather not have the risk but it is tolerable in view of the benefits obtained by accepting it. The cost in inconvenience or in money is balanced against the scale of risk and a compromise is accepted
- Unacceptable risk: The risk level is so high that we are not prepared to tolerate it. The losses far outweigh any possible benefits in the situation
The width of the triangle represents risk and hence as it reduces the risk zones change from unacceptable through to negligible. Clearly this is following the same principle that we saw earlier in the risk management section. The hazard study and the design teams for a hazardous process or machine have to find a level of risk that is as low as reasonably practicable in the circumstances or context of the application. The problem here is: How do we find the ALARP level in any application?
The procedure is deceptively simple!
The estimated level of risk must first be reduced to below the maximum level of the ALARP region at all costs.
This assumes that the maximum acceptable risk line has been set as the maximum tolerable risk for the society or industry concerned. This line is not always easy to find, as we shall see in a moment.
Further reduction of risk in the ALARP region requires cost benefit analysis to see if it is justified. This step is a bit easier and many companies define cost benefit formulae to support cost justification decisions on risk reduction projects. The principle is simple:
“If the cost of the hazardous event is likely to exceed the cost of more risk reduction then more risk reduction is justified.”
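The two-step procedure can be sketched in Python as follows. The function names, the region boundaries and all the figures are hypothetical. The cost benefit test shown here compares the expected annual loss avoided with the annual cost of the extra measure, which is one simple reading of the principle quoted above; companies that define their own cost benefit formulae will generally use something more detailed.

```python
def alarp_region(risk: float, upper_limit: float, lower_limit: float) -> str:
    """Place a risk (event frequency per year) in one of the three ALARP zones.

    upper_limit: assumed maximum tolerable risk
    lower_limit: assumed boundary of the negligible risk zone
    """
    if risk > upper_limit:
        return "unacceptable"
    if risk <= lower_limit:
        return "negligible"
    return "tolerable (ALARP)"


def reduction_justified(event_cost: float, freq_before: float,
                        freq_after: float, annual_cost_of_measure: float) -> bool:
    """Simplified cost benefit test: is the expected annual loss avoided
    greater than the annual cost of the additional risk reduction?"""
    annual_benefit = event_cost * (freq_before - freq_after)
    return annual_benefit > annual_cost_of_measure


# Illustrative boundaries and figures only
UPPER, LOWER = 1e-3, 1e-6

print(alarp_region(1e-2, UPPER, LOWER))   # must be reduced regardless of cost
print(alarp_region(1e-4, UPPER, LOWER))   # further reduction needs cost benefit analysis

# A 5 million loss event, with a measure costing 20,000 per year
print(reduction_justified(5_000_000, 1e-2, 1e-3, 20_000))
print(reduction_justified(5_000_000, 1e-3, 1e-4, 20_000))
```

With these invented numbers the first tenfold reduction pays for itself but the second does not, which illustrates why the cost benefit step only applies once the risk is already inside the ALARP region.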
The tolerable risk region remains the problem for us. How do we work out what is tolerable in terms of harm to people, property and environment?
In chapter 4 you will find some notes on this subject. The conclusion to be reached is that there are approximate scales of personal risk that are derived from accident statistics and the knowledge of what is reasonably achievable in different types of industries. Similarly in business the risk frequency for major damage to the plant can be found by considering what scale of loss it will represent. For example a 1 million dollar loss every 10 years may be just acceptable without too much long term harm. But a 50 million dollar loss might be the end of the business and you may want to set that chance at 1 in 10,000 years.
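The two business loss examples can be put on a common footing by converting each to an average loss per year. This is a small arithmetic sketch only; the function name is hypothetical and the figures are those quoted in the paragraph above.

```python
def annualized_loss(loss: float, interval_years: float) -> float:
    """Average loss per year for an event of a given size occurring once per interval."""
    return loss / interval_years


small_event = annualized_loss(1_000_000, 10)        # 1 million every 10 years
large_event = annualized_loss(50_000_000, 10_000)   # 50 million, 1 in 10,000 years

print(f"1 million every 10 years: {small_event:,.0f} per year")
print(f"50 million at 1 in 10,000 years: {large_event:,.0f} per year")
```

The calculation gives 100,000 per year for the smaller event and 5,000 per year for the larger one. In other words, the target suggested for the business-ending loss is much stricter than a simple average-loss comparison would require, which reflects the fact that such an event cannot be absorbed at all.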
The problem here for the risk assessment team is that someone has to set the targets for tolerable risk so that the risk reduction measures can be adopted to meet or better the target by following the ALARP principle. This comes down to a management or corporate responsibility.
One simple way of presenting the targets is to establish a tolerable risk profile or chart describing what levels of risk are acceptable and what levels are not. Figure 1.22 shows an elementary risk matrix chart with the tolerable and unacceptable areas marked. The overlap region in the middle is the “Grey Area” of uncertainty. This is where the principle of ALARP must be applied for each individual case.
Risk matrix with tolerability bands
1.7.5 Legal requirements for safety instrumentation
The provision of a SIS will fall within the overall safety management system wherever it is claimed to be part of the risk reduction measures. Where there are well-established protection methods for known types of hazards in the workplace, (e.g. for many common types of machines), the regulations usually require compliance or will accept conformity to an approved standard.
When it comes to implementing protection measures or trips the solutions are not usually prescribed directly except in the case of boiler and furnace safety interlocks. For process plants there is no specific requirement for safety instrumentation to be applied but it will frequently be claimed as one of the key safety measures applied to make a plant safe.
The questions then arise:
- How can the company substantiate its claim that the plant has been made safe by the fact that it has a safety instrumented trip system?
- How effective is the safety system?
- How does the company ensure that it is kept in working order?
This is where the new international standards for functional safety, IEC 61508 and IEC 61511, provide a comprehensive method of assessment for instrumented safe