On the journey towards maintenance excellence, conventional wisdom places a heavy emphasis on the efficient planning, scheduling, and execution of work. We are told to “maximize resource utilization” and work as efficiently as possible to “keep the backlog under control.” But have you ever taken a step back to ask: why are we even doing these maintenance tasks in the first place? Are the inspections that we carry out every month actually a good use of resources? You can be as efficient as you want in executing maintenance – but if you are doing the wrong maintenance, then you are still wasting your time.
What matters more is that you design an effective Maintenance Strategy – i.e. a program of maintenance tasks specifically targeted at relevant failure modes. This way, every routine task is a justifiable use of time and resources. Your Maintenance Strategy should be the foundation of your entire maintenance process, with all routine maintenance tasks flowing from it.
In this article, I want to focus on two specific steps from Figure 1 that are essential for designing and improving an effective maintenance strategy: Criticality Analysis and Vulnerability Analysis.
A Reliability Engineer might be responsible for thousands of assets. It’s not possible to give them all your undivided attention, so you need to prioritize. Criticality Analysis is a method for ranking your assets based on importance – i.e. which ones are more critical than others. This assessment is based on the consequences of failure. If an asset fails, how much of a problem will it cause? How much production will be lost? Will anybody get hurt? Will we fall out of compliance with regulations or standards? Each possible answer to these questions is assigned a value, and each question is weighted according to the context of the business and the operating conditions. The weighted values are then summed to give an overall criticality score for each asset.
Once you have determined which assets are the most important, you can use this information to influence the design of your Maintenance Strategy. More resources can be allocated to those assets with higher criticality, and those with low criticality can be justifiably ignored.
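To make the weighted-sum idea concrete, here is a minimal sketch in Python. The consequence categories, the weights, the 0 to 5 rating scale, and the asset names are all assumptions I have made up for illustration; a real analysis would calibrate them to your own business context.

```python
# Hypothetical consequence categories and weights (assumptions, not a standard).
WEIGHTS = {"production_loss": 3, "safety": 5, "compliance": 2}


def criticality_score(ratings, weights=WEIGHTS):
    """Sum each consequence rating (0 to 5 here) multiplied by its weight."""
    return sum(weights[category] * ratings[category] for category in weights)


# Illustrative assets with made-up consequence ratings.
assets = {
    "feed_pump": {"production_loss": 4, "safety": 2, "compliance": 1},
    "office_hvac": {"production_loss": 0, "safety": 1, "compliance": 0},
    "boiler": {"production_loss": 5, "safety": 5, "compliance": 4},
}

scores = {name: criticality_score(ratings) for name, ratings in assets.items()}

# Rank assets from most to least critical.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['boiler', 'feed_pump', 'office_hvac']
```

With these example numbers the boiler scores 48, the feed pump 24, and the office HVAC unit 5, so the ranking tells you where to spend your attention first.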
How do you know if your strategy is working? It’s not enough to just set an initial plan and hope for the best, because you won’t always get it right first time. You need to continuously monitor the results and make adjustments where necessary to generate improvement. This is where we can apply the feedback loop from Figure 1. After tasks from your strategy are executed and closed out, data is collected and analyzed to understand the effectiveness of the strategy.
But exactly what kind of data do we need to determine if the strategy is effective? To answer this question, think about what the strategy is trying to prevent – i.e. asset failure. This means that we need to know which assets are vulnerable to failure. What is their current health? If the strategy is working, then they will be running smoothly without defects. If the strategy is wrong, then there should be some warning signs that the asset is more likely to experience a failure. By monitoring these warning signs, which together indicate vulnerability, we get an indication of how effective the strategy is. Vulnerability can be determined by a combination of several different factors, such as failure rate, routine maintenance compliance, and the availability of critical spares. As with criticality, each of these factors can be given a weighted numerical value, and the values can be summed to give an overall vulnerability score for each asset.
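A vulnerability score along these lines might be sketched as follows. This is only an illustration: the three factors come from the paragraph above, but the weights, the cap of 10 failures per year, and the way each factor is normalized to a 0 to 1 scale are my own assumptions.

```python
# Hypothetical factor weights; they sum to 1 so the score stays between 0 and 1.
FACTOR_WEIGHTS = {"failure_rate": 0.5, "pm_compliance": 0.3, "spares": 0.2}


def vulnerability_score(failures_per_year, pm_compliance, spares_in_stock,
                        max_failures=10):
    """Combine three warning signs into one weighted score from 0 to 1."""
    factors = {
        # More failures means more vulnerable, capped at max_failures.
        "failure_rate": min(failures_per_year, max_failures) / max_failures,
        # pm_compliance is a fraction from 0 to 1; missed routine work
        # raises vulnerability.
        "pm_compliance": 1.0 - pm_compliance,
        # A missing critical spare counts as fully vulnerable on this factor.
        "spares": 0.0 if spares_in_stock else 1.0,
    }
    return sum(FACTOR_WEIGHTS[name] * value for name, value in factors.items())


healthy = vulnerability_score(failures_per_year=0, pm_compliance=1.0,
                              spares_in_stock=True)
neglected = vulnerability_score(failures_per_year=10, pm_compliance=0.0,
                                spares_in_stock=False)
typical = vulnerability_score(failures_per_year=2, pm_compliance=0.8,
                              spares_in_stock=True)
```

Because the inputs change as maintenance is completed and failures are recorded, a score like this would need to be recalculated regularly rather than set once.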
If we consider that criticality is a measure of the consequences of failure, and vulnerability is a measure of the likelihood of failure, then it follows naturally that multiplying these values together will give a measure of risk. If you plot these two parameters on a graph, like in Figure 2 below, you get a good picture of overall Asset Risk, where Risk = Criticality x Vulnerability.
This chart gives a clear picture of the assets that are both critical and vulnerable: those which are close to failure and also have a high consequence of failure (the top-right section of Figure 2). Such visibility is extremely useful, because you can instantly see where your highest risks are and take action to adjust your strategy (e.g. increase maintenance scope or frequency) to bring those risks back under control.
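The same idea can be expressed as a short Python sketch that computes Risk = Criticality x Vulnerability and picks out the assets in the top-right region. The asset names, the 0 to 100 scales, and the threshold of 50 on each axis are assumptions chosen purely for illustration.

```python
# Illustrative (criticality, vulnerability) pairs, both on a 0 to 100 scale.
assets = {
    "boiler": (90, 80),
    "feed_pump": (70, 20),
    "office_hvac": (10, 85),
    "conveyor": (60, 65),
}


def risk(criticality, vulnerability):
    """Risk = Criticality x Vulnerability."""
    return criticality * vulnerability


def high_risk_assets(assets, crit_threshold=50, vuln_threshold=50):
    """Return (name, risk) for assets that are both critical and vulnerable,
    sorted with the highest risk first. Thresholds are hypothetical."""
    flagged = [
        (name, risk(c, v))
        for name, (c, v) in assets.items()
        if c >= crit_threshold and v >= vuln_threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


print(high_risk_assets(assets))  # [('boiler', 7200), ('conveyor', 3900)]
```

Note that the feed pump is critical but currently healthy, and the office HVAC unit is vulnerable but unimportant, so neither is flagged; only assets that score highly on both axes appear in the list.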
Visibility of asset risk at this level is valuable, but it is difficult to achieve in practice. Calculating criticality and vulnerability would be easy if you only had to look after one asset – but with thousands, how do you keep track? Consider also that vulnerability is especially difficult to pin down, because it is constantly changing. From one day to the next, an asset’s current health will change as maintenance is carried out and defects accumulate. Getting a live overview of asset risk, across a large asset portfolio, is nearly impossible.
Or at least, it was. Much of my time at Prometheus Group has been spent working with our development team to create a new software solution called Total Asset Optimizer (TAO), which is designed specifically to solve this problem. It provides Criticality Analysis and Vulnerability Analysis, and visualizes overall Asset Risk, directly inside SAP.
In managing your maintenance processes, don’t overlook the vital role that your Maintenance Strategy plays in making sure that your planners, schedulers, and technicians are focusing their attention in the right places. Developing and improving an effective Maintenance Strategy requires good visibility of criticality, vulnerability, and risk, across your entire asset portfolio. Previously, this was difficult to achieve in practice. But with Total Asset Optimizer, it is now possible to have all of the right information available in real time, in a highly visible format, so that effective decisions can be made.