TY-E - Reliability Centered Maintenance
The manual is designed to familiarize you with the principles and processes of implementing a Reliability Centered Maintenance (RCM) program. RCM is a systematic and structured process used to decide what must be done to ensure that any physical asset, system or process continues to do whatever its users want it to do.
The manual covers the principles of RCM, including the types of strategies, as well as the tools that are used to facilitate the process. It will assist maintenance managers, reliability engineers and technicians directly involved in maintaining and preserving the function of assets.
Chapter 1: Introduction to Reliability Centered Maintenance
1.1 Maintenance perspective
Maintenance is commonly misunderstood in industry. It is often regarded as a “necessary evil” because its activities cost money. Although budgets are allocated to sustain these activities, budget cuts can make maintenance work harder and weaken control over maintenance management. Machine downtime has always affected productivity: any malfunction may reduce output and increase both the machine’s operating costs and the production costs.
The goal of maintenance is not to preserve equipment as an asset, but to preserve its system function. Preserving the equipment as a whole may not seem to be a problem. In practice, however, this thinking has produced many problems: maintenance actions become overly conservative, opportunistic maintenance is performed, and the resulting unnecessary intrusions make human error more likely.
Like any profitable department in a business, maintenance can change from a “necessary evil” into a “profit generator”; if properly performed, it can produce significant results.
It is time for management to change its view of the maintenance organization and its responsibilities. Over the years, maintenance strategies have changed, perhaps more than those of any other management discipline. The changes were driven by a huge increase in the number of assets such as plant, equipment, and buildings, by much more complex designs, by new products and processes, and by new maintenance techniques.
Managers in business units have started looking for a new approach to maintenance. They want to avoid the pitfalls that always accompany major upheavals. They also seek a strategic framework that synchronizes the maintenance and production functions and allows both to be evaluated more easily, to the advantage of their business units.
The strategic framework that managers are seeking, and its philosophy, is embodied in “reliability-centered maintenance,” or RCM for short. If applied correctly, RCM can transform the relationship between the business unit, its existing physical assets such as plant, equipment, and buildings, and the personnel who operate and maintain them. The RCM framework also enables assets to be put into effective service with great confidence, speed, and accuracy.
Before going into the details of the RCM framework, we need to understand how maintenance has evolved over the past 50 years.
1.2 The history of maintenance
In the late 1950s and early 1960s, the first generation of jet aircraft had an alarming crash rate. Studies showed that a fundamental assumption held by design engineers and maintenance planners was wrong in nearly every specific case in a jet airliner. That assumption was that “every airplane and every major component in the airplane, such as its engines, had a specific lifetime of reliable service, after which it had to be replaced or overhauled in order to prevent failures.” The studies, conducted by the North American civil aviation industry, also concluded that many maintenance philosophies were outdated: they were not only expensive but also highly dangerous for an airborne jetliner.
This realization led the aircraft industry to rethink its maintenance philosophies. Extensive research revealed that only 11% of aircraft components suffered from age-related failures; the remaining 89% were most likely to fail when new or immediately after maintenance.
To sort things out, the industry formed a series of “Maintenance Steering Groups,” or “MSGs.” These groups consisted of representatives of the aircraft manufacturers, the airlines, and the “Federal Aviation Administration,” or “FAA,” and they started to re-examine everything they were doing to keep their aircraft airborne. The formation of the MSGs called for a whole new approach to determining aircraft maintenance requirements. The first attempt at a rational, zero-based process for formulating maintenance strategies was promulgated by the Air Transport Association in Washington, DC, in 1968. It is known as MSG-1-1968 and was applied to the Boeing 747. A refinement of MSG-1, known as the generic MSG-2, was promulgated in 1970.
The term RCM was first used in public papers authored by Tom Matteson, Stanley Nowlan, Howard Heap, and other senior executives and engineers at United Airlines (UAL) to describe a process used to determine the optimum maintenance requirements for an aircraft.
The landmark development in the history of RCM was Stan Nowlan and Howard Heap’s 1978 report, “Reliability-Centered Maintenance,” which remains one of the most important documents in the history of physical asset management and is the basis of RCM. Nowlan and Heap’s report represented a considerable advance on MSG-2 thinking. It was used as a basis for MSG-3, which was promulgated in 1980. MSG-3 has since been revised four times. Revision 1 was issued in 1988 and revision 2 in 1993. MSG3.2001 and MSG3.2002 were issued in 2001 and 2002, respectively. MSG-3 is used to develop prior-to-service maintenance programs for new aircraft types (including the Boeing 777 and Airbus 330/340).
After being created by the commercial aviation industry, RCM was adopted by the US military (beginning in the mid-1970s) and by the US commercial nuclear power industry (in the 1980s). It began to enter further into other commercial industries, including transport, petro-chemical, mining, steel making, manufacturing, and utilities in the early 1990s.
1.3 Generation of changes in maintenance
The evolution of maintenance since the 1930s can be traced through three generations of change. Maintenance practice has now become fairly stable, although further changes are likely in the near future. The three generations are:
- First generation maintenance
- Second generation maintenance
- Third generation maintenance
The present-day maintenance is considered as third generation, which has evolved from the first and second generations. The figure below illustrates this evolution.
1.3.1 The First Generation thinking till 1950s
The first generation covers maintenance thinking prior to World War II. Because industry was not highly mechanized, downtime was not considered important, and the prevailing strategy was “Fix it when it’s broke.” Moreover, the equipment and machinery of the time were simple in construction, often over-designed, and considered reliable and easy to repair. Systematic maintenance and the skill level of maintenance personnel were not priorities in those days. Lubricating, cleaning, and servicing were found to be adequate.
1.3.2 The Second Generation thinking till 1975
During World War II, the supply of industrial manpower dropped sharply while the demand for mechanized equipment and machinery rose steadily, and reliability in the field became a top priority. Maintenance philosophies for equipment and machinery needed to change. The focus shifted from first-generation thinking to second-generation thinking, with expectations such as:
- availability of plant and machinery for maintenance function.
- expected longer life and durability of equipment and machinery.
- manufacture of machines at a lower cost.
After World War II, mechanization became even more important: all types of machines were mechanized, and dependence on them grew steadily. By the 1950s, more complex machines were being built, and any downtime of these machines incurred extraordinary expenditure. The cost of maintenance also started to rise sharply relative to other operating costs. As mechanization grew, downtime came into sharper focus. Equipment failures needed to be prevented by some means, which led, in turn, to the concept of preventive maintenance and to maintenance planning and control systems.
As the capital tied up in machinery as fixed assets increased, the focus of industrial management shifted to maximizing the useful life of the machinery, and this led to third-generation thinking.
1.3.3 The Third Generation thinking after 1975
After 1975, the pace of change in industry increased, and new research, techniques, and expectations started to evolve. The new expectations can be classified under the following headings:
- Availability of plant for maintenance and the reliability of machinery
- All possible safety standards are followed
- Better product quality must be achieved by using these machines
- Environmental concerns must be taken into account
- Cost effectiveness of the product manufactured by these machines
- Durability of these machines and high life expectancy
1.4 Evolution of maintenance processes
Maintenance processes are classified as follows:
- Primitive maintenance
- Preventive maintenance
- Predictive maintenance
- Pro-active maintenance
1.4.1 Primitive maintenance
As explained in first-generation maintenance thinking, the equipment and machinery designed in those times were simple in construction and many times they were over-designed and were considered reliable and easy to repair. Systematic maintenance and skill level of maintenance personnel were not a priority in those days. A method of lubricating, cleaning, and servicing was found to be adequate. This is considered as “primitive maintenance.”
1.4.2 Preventive maintenance
Over a period of time, industries realized that whenever equipment breaks down, the cost of maintenance escalates and more time is needed for repair. Together with the cost of production lost to downtime, the total cost of maintenance and downtime seemed extremely high. This prompted a rethinking of primitive maintenance and the evolution of a new concept called “preventive maintenance.” Industries started adopting a policy of time-based maintenance, in which the equipment is shut down for short periods so that minor repairs and adjustments can be made to reduce the frequency and duration of breakdowns.
Preventive maintenance provides an opportunity to inspect the machine’s operation and running behavior and to carry out routine oil changes, filter changes, lubrication, and similar tasks, so that the equipment is taken care of before it fails.
1.4.3 Predictive maintenance
Though preventive maintenance may produce some results, it cannot prevent certain types of failures. The ageing of machinery, wear and tear, improper operation, improper lubrication, some mechanisms and linkage failures, high operating temperature, and fatigue failures all contribute to unexpected breakdowns. To prevent these kinds of failures, we will have to monitor the condition of the machinery in operation continuously, predict the failures that are bound to occur, and use proper techniques to anticipate the failures. This kind of maintenance is called “predictive maintenance.” The following are some of the common techniques:
- Vibration analysis
- Thermo-graphic or infrared temperature monitoring
Information about the machine’s operating condition obtained by the above techniques helps to keep breakdowns to a low level. With this method, machine breakdowns may be reduced by 90% or more, because a hidden problem is noticed before it can develop into a major one. Continuous monitoring may improve the situation further. Condition-based monitoring (CBM) and the use of a computerized maintenance management system (CMMS) are essential parts of this kind of maintenance.
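The idea of predicting a failure from monitored condition data can be illustrated with a very simple trend calculation. The sketch below fits a straight line to periodic readings and estimates when the trend would reach an alarm level. The function name, the readings, and the alarm level are hypothetical, and a straight-line trend is an assumption made for illustration; real condition-monitoring programs use limits and methods suited to the specific equipment.

```python
def estimate_alarm_hour(hours, readings, alarm_level):
    """Fit a least-squares straight line to the readings and return the
    running hour at which the fitted line reaches alarm_level.

    Returns None when the trend is flat or falling, because such a trend
    never reaches the alarm level.
    """
    n = len(hours)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two (hour, reading) pairs")
    mean_h = sum(hours) / n
    mean_r = sum(readings) / n
    sxx = sum((h - mean_h) ** 2 for h in hours)
    if sxx == 0:
        raise ValueError("readings must be taken at different times")
    sxy = sum((h - mean_h) * (r - mean_r) for h, r in zip(hours, readings))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_h
    if slope <= 0:
        return None
    return (alarm_level - intercept) / slope


# Hypothetical vibration readings (mm/s) logged every 100 running hours.
hours = [0, 100, 200, 300]
vibration = [2.0, 2.5, 3.0, 3.5]
crossing = estimate_alarm_hour(hours, vibration, alarm_level=7.0)
print(f"Trend reaches the alarm level at about {crossing:.0f} running hours")
```

In this made-up data set the reading rises by 0.5 mm/s every 100 hours, so the fitted trend reaches 7.0 mm/s at roughly 1000 running hours, which gives the maintenance team time to plan the repair.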
1.4.4 Pro-active maintenance
The final step of the evolution process involves operators and maintenance personnel working together with a condition analyst, who assists the team in solving machine problems and recommends steps to make maintenance more effective and to preserve the system function. Involving machine operators in the team is justified because they are the first to know when something goes wrong with the equipment. In fact, this is the fastest way to detect trouble. The information is then passed on to the maintenance team, who normally need time to prepare to tackle the problem. Some of the basic tasks in which the operators can be engaged are as follows:
- Basic lubrication
- Routine cleaning of equipment
The maintenance team, after receiving communication from the operator, can concentrate on refining the predictive monitoring and trending of the equipment. They also will have more time to concentrate on equipment failure analysis, which will prevent future or repetitive problems on the equipment. This step increases not only the availability of the equipment but also the reliability over its useful life. This type of maintenance is called “pro-active maintenance.”
The essential resources needed to achieve “pro-active maintenance” can be classified as follows:
- The team must have sufficient training to learn the functionality of pro-active maintenance.
- Sufficient time must be allowed for analyzing and trouble-shooting.
- There must be management support and understanding to form a team for root-cause analysis.
- A laboratory set-up is essential for testing, analyzing, and providing the results.
1.5 Definition of maintenance and RCM
Let us first distinguish between the two terms, maintenance and RCM. Numerous dictionaries and websites define them.
1.5.1 Maintenance
Maintenance has been defined in the following ways:
- The action taken to protect, preserve, or restore the as-built functionality of any facility or system.
- Actions performed to keep some machine or system functioning or in service.
- Activities required to conserve as nearly and as long as possible the original condition of an asset or resource while compensating for normal wear and tear.
- Actions necessary for retaining or restoring an equipment, machine, or system to the specified operable condition to achieve its maximum useful life. They include corrective maintenance and preventive maintenance.
- Ensuring that physical assets continue to do what their users want them to do.
From the above definitions, we may conclude that maintenance means preserving something. What, then, is to be preserved? It is not only the physical asset but also its system function or functions, so that the asset continues to fulfill our expectations.
1.5.2 Reliability centered maintenance
What the users want will depend on exactly where and how the asset is being used (the operating context). This leads to the following formal definitions of RCM:
- A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context.
- RCM is a process to establish the safe minimum levels of maintenance.
- RCM is an engineering framework that enables the definition of a complete maintenance regime.
The RCM process is a cost-effective strategy for maintaining physical assets. Applied well, it can deliver a more precise maintenance strategy, higher reliability, and a slimmer budget, and the maintenance personnel who carry out the process tend to take much more interest in it.
Reliability improvement methods start with the strategy of identifying the preventive maintenance tasks that are both cost-effective and technically correct in maintaining equipment function. Increased equipment reliability leads to improved system and facility availability. Reliability improvement methods also create a documented basis for the preventive maintenance (PM) tasks performed on equipment.
The RCM and its methods or processes can be classified into the following:
- Classical RCM
- Streamlined RCM
- PM Optimization
Classical RCM, in use by commercial airlines, the military, and nuclear facilities, has been used to improve equipment reliability and availability. Because of safety concerns, all equipment must be included in the evaluation when executing classical RCM.
It essentially creates a new preventive maintenance program rather than enhancing or revising the existing program. Because the method is complex, a group of specialized engineers is needed to run it, and the results are passed on to the maintenance department for implementation.
Classical RCM is a labor-intensive program, and performing and implementing it has proved a major hurdle because of excessive documentation, inflexible process steps, and hard-to-understand process basics. Although classical RCM can deliver excellent benefits, many facilities have decided not to proceed with this methodology because of its significant cost and low success rate.
Executing classical RCM can take as long as 6 years and requires a budget of $70,000 per system. Its use is therefore limited to industries where absolute safety is a major concern.
Streamlined RCM, sometimes called modified classical or reliability-based maintenance, maintains the same methods as classical RCM. The major difference is that streamlined RCM evaluates only a portion of the plant, focusing on systems that are prescreened as “important.” Executing streamlined RCM can take as long as 2 years and requires a budget of $40,000 per system.
PM optimization is a new evaluation and is framed from the experiences and drawbacks learnt from both classical and streamlined RCM. This new evaluation focuses on a rapid evaluation cycle and high craft involvement, while maintaining many of the classical RCM methods.
PM optimization employs many of the same analysis techniques as RCM. However, PM optimization is a more efficient approach. RCM starts at the top, with a system, and identifies the critical equipment. In PM optimization, the existing PM procedures are broken down into tasks, and each task is reviewed to identify the failure it is intended to prevent. Related data are then collected and evaluated, and the resulting recommendations are compared with the existing PM tasks to arrive at the final task recommendations.
In PM optimization, plant condition and performance can be enhanced through a carefully planned and executed program. Significant cost benefits come from three major areas: reduced inventory of both capital equipment and spares, reduced equipment downtime and its associated cost, and reduced cost of maintenance management. Executing PM optimization takes about a year, and the ROI is considered to be one sixth of a classical RCM program.
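The per-system budgets quoted above for classical and streamlined RCM can be turned into a rough program estimate. The following sketch is only an arithmetic illustration: the plant with 20 systems, of which 8 are prescreened as important, is hypothetical, and the function name is an assumption.

```python
# Per-system budgets quoted in the text (US dollars).
CLASSICAL_COST_PER_SYSTEM = 70_000
STREAMLINED_COST_PER_SYSTEM = 40_000


def program_cost(systems_analyzed, cost_per_system):
    """Return the total analysis budget for the given number of systems."""
    return systems_analyzed * cost_per_system


# Hypothetical plant: classical RCM evaluates every system, whereas
# streamlined RCM evaluates only the systems prescreened as important.
total_systems = 20
important_systems = 8

classical_total = program_cost(total_systems, CLASSICAL_COST_PER_SYSTEM)
streamlined_total = program_cost(important_systems, STREAMLINED_COST_PER_SYSTEM)

print(f"Classical RCM:   ${classical_total:,}")
print(f"Streamlined RCM: ${streamlined_total:,}")
```

For this hypothetical plant the classical program would cost $1,400,000 and the streamlined program $320,000. The difference comes both from the lower per-system budget and from analyzing fewer systems.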
RCM: The seven basic questions
Before we embark on the RCM process, we need to know the functions of the assets that are to be preserved, the ways in which they can fail, the consequences of those failures, and how the failures can be prevented. It is also helpful to know the default actions carried out by the maintenance department before introducing an RCM process.
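According to the SAE JA1011 standard, which describes the minimum criteria that a process must comply with in order to be called “RCM,” an RCM process answers the following seven essential questions:
- What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
- In what ways can it fail to fulfill its functions (functional failures)?
- What causes each functional failure (failure modes)?
- What happens when each failure occurs (failure effects)?
- In what way does each failure matter (failure consequences)?
- What should be done to predict or prevent each failure (proactive tasks)?
- What should be done if a suitable proactive task cannot be found (default tasks)?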
The answers to these questions are discussed in more detail below.
A-1 – Functions and performance standards
The first thing to do is to ensure that the physical asset continues to do whatever its users want it to do in its present operating context by:
- determining what its users want it to do
- ensuring that it is capable of doing what its users want.
The first step in the RCM process is defining the functions of each asset in its operating context, together with the associated desired performance standards.
There are two types of functions:
- Primary functions – why the asset was installed in the first place (speed, output, carrying or storage capacity, product quality, customer service, etc.).
- Secondary functions – which recognize that every asset is expected to do more than simply fulfilling its primary functions (safety, control, containment, comfort, structural integrity, economy, protection, efficiency of operation, compliance with environmental regulations, appearance, etc.)
A-2 – Functional failures
In the second step, we must identify the following:
(a) What kinds of failure can prevent an asset from fulfilling the performance standards required by its users?
(b) What is the effective strategy that maintenance can adopt to avoid this failure?
In the RCM process, failed states are known as functional failures because they occur when an asset is unable to fulfill a function to a standard of performance that is acceptable to the user.
A-3 – Failure modes
In the third step, after functional failures have been identified, we must identify the “failure modes” that are likely to cause these failures.
These failure modes and their causes include:
- Failures that have occurred on the same or similar equipment operating in the same context.
- Failures that are currently being prevented by existing maintenance procedures.
- Failures that have not happened yet but are considered to be real possibilities:
- Failures caused by deterioration or normal wear and tear.
- Failures caused by human errors (either from operators or maintenance personnel).
- Faulty design.
A-4 – Failure effects
The fourth step is to list failure effects that describe what happens when each failure mode occurs. This should include all the information needed to support the evaluation of the consequences of the failure such as the following:
X.1 What is the evidence that the failure has occurred?
X.2 In what ways does it pose a threat to safety or the environment?
X.3 In what ways does it affect production or operations?
X.4 What physical damage is caused by the failure?
X.5 What must be done to repair these failures?
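Steps A-1 to A-4 build up a hierarchy: each function has functional failures, each functional failure has failure modes, and each failure mode has effects. A minimal sketch of how such a record might be organized is shown below. The class names, the reference numbering scheme (function number, failure letter, mode number), and the pump example are all illustrative assumptions and are not taken from SAE JA1011.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureMode:
    cause: str      # what causes the functional failure (A-3)
    effects: str    # what happens when this failure mode occurs (A-4)


@dataclass
class FunctionalFailure:
    description: str                      # failed state (A-2)
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class AssetFunction:
    statement: str                        # function and performance standard (A-1)
    primary: bool                         # True for primary, False for secondary
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def worksheet_rows(functions):
    """Flatten the hierarchy into rows of
    (reference, function, functional failure, failure mode, effects)."""
    rows = []
    for f_no, function in enumerate(functions, start=1):
        for ff_index, failure in enumerate(function.functional_failures):
            letter = chr(ord("A") + ff_index)
            for fm_no, mode in enumerate(failure.failure_modes, start=1):
                rows.append((f"{f_no}-{letter}-{fm_no}", function.statement,
                             failure.description, mode.cause, mode.effects))
    return rows


# Hypothetical example: a transfer pump.
pump = AssetFunction(
    statement="Transfer water from tank X to tank Y at not less than 800 litres per minute",
    primary=True,
    functional_failures=[
        FunctionalFailure("Unable to transfer any water", [
            FailureMode("Bearing seizes due to normal wear",
                        "Motor trips; tank Y level falls; production stops until repaired"),
            FailureMode("Coupling assembled incorrectly by maintenance personnel",
                        "Motor runs but pump does not turn; low-flow alarm sounds"),
        ]),
        FunctionalFailure("Transfers less than 800 litres per minute", [
            FailureMode("Impeller worn", "Flow drops gradually; tank Y fills slowly"),
        ]),
    ],
)

rows = worksheet_rows([pump])
for row in rows:
    print(row[0], "|", row[2], "|", row[3])
```

Keeping effects attached to the individual failure mode, rather than to the functional failure, matches the text: failure effects describe what happens when each failure mode occurs.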
A-5 – Failure consequences
The fifth step is to recognize the consequences of failures, which may prompt pro-active maintenance to reduce those consequences, if not avoid them altogether.
The RCM process classifies these consequences into the following groups:
Hidden function – failure will not become evident to operators under normal circumstances if it occurs on its own.
Evident function – failure will become evident to operators under normal circumstances with four types of consequences:
- Safety: A failure mode has safety consequences if it causes a loss of function or other damage that could injure or kill someone.
- Environmental: A failure mode has environmental consequences if it causes a loss of function or other damage that could lead to the breach of any known environmental standard or regulation.
- Operational: A failure has operational consequences if it has a direct effect on operational capability.
- Non-operational: Any evident failure not included above.
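The grouping above can be read as a short decision sequence. The sketch below is one possible encoding of it. The function and parameter names are hypothetical, and the order in which the evident categories are checked (safety, then environmental, then operational) is an assumption made for illustration.

```python
def classify_consequence(evident, affects_safety=False,
                         breaches_environmental_standard=False,
                         affects_operations=False):
    """Return the consequence group for a single failure mode."""
    if not evident:
        # The failure does not become evident to operators on its own.
        return "hidden"
    if affects_safety:
        return "safety"
    if breaches_environmental_standard:
        return "environmental"
    if affects_operations:
        return "operational"
    # Any evident failure that falls into none of the groups above.
    return "non-operational"


# Hypothetical failure modes.
print(classify_consequence(evident=False))                           # hidden
print(classify_consequence(evident=True, affects_safety=True))       # safety
print(classify_consequence(evident=True, affects_operations=True))   # operational
print(classify_consequence(evident=True))                            # non-operational
```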
The RCM process requires the failure consequences to be described for every failure mode of the asset, stating in what way each failure matters for the purposes of risk assessment.
Risk assessment is done by using a risk priority number (RPN). Referring to the graph below, the RPN is assigned a numerical value on a scale of 1 to 10. The value assigned indicates the probability of failure relative to its size and the priority of each significant failure.
The objective of the RCM process is to reduce the RPN. The high-priority and low-priority lines are indicated in the graph.
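The text does not state how the RPN is calculated. One widely used convention from failure mode and effects analysis (FMEA), adopted here purely as an assumption, rates severity, occurrence, and detection each on a scale of 1 to 10 and multiplies the three ratings, giving a value between 1 and 1000. This may differ from the scale used in the graph referred to above. The failure modes and ratings below are hypothetical.

```python
def risk_priority_number(severity, occurrence, detection):
    """Return severity x occurrence x detection (assumed FMEA convention).

    Each rating must lie between 1 and 10 inclusive.
    """
    ratings = (("severity", severity), ("occurrence", occurrence),
               ("detection", detection))
    for name, rating in ratings:
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10, got {rating}")
    return severity * occurrence * detection


# Hypothetical ratings: (severity, occurrence, detection).
failure_modes = {
    "Bearing seizes": (7, 4, 3),
    "Seal leaks": (4, 6, 2),
    "Coupling shears": (8, 2, 5),
}

# Rank the failure modes so that the highest RPN is dealt with first.
ranked = sorted(failure_modes,
                key=lambda name: risk_priority_number(*failure_modes[name]),
                reverse=True)
for name in ranked:
    print(name, risk_priority_number(*failure_modes[name]))
```

Under these made-up ratings the bearing seizure ranks first (84), followed by the coupling (80) and the seal (48). A proactive task that lowers the occurrence or improves detection lowers the RPN of the failure mode it addresses.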
A-6 – Proactive tasks
It is still widely believed that the best way to optimize plant availability is pro-active maintenance on a routine basis. Second-generation wisdom suggested that this should consist of overhauls or component replacements at fixed intervals.
This diagram shows the fixed interval view of failures.
The graph indicates that most assets function reliably for a period of time and then wear out. Classical thinking suggests that extensive records about failure will enable us to determine this life and so make plans to take preventive action shortly before the item is due to fail in future.
A-7 – Default tasks
Default tasks are carried out when it is not possible to identify an effective pro-active task, and include the following:
These are fully explained in subsequent chapters.