There are many ways to collect datasets for determining machine and equipment health. Popular approaches available to engineering and maintenance professionals include measuring vibration, temperature and ultrasonic data, as well as data from thermal images.
Collecting datasets is necessary for a number of efficiency-centered maintenance processes and activities, such as failure mode and effects analysis (FMEA) and reliability-centered maintenance (RCM).
FMEA approaches are employed when designing or analyzing potential failures of a process or service. This step is implemented with the active involvement of maintenance engineers.
Accordingly, it’s important to note that remedial efforts which produce the greatest value are planned (such as during preventive or predictive maintenance).
In principle, RCM and FMEA are analytical methods that can greatly benefit from the analysis of maintenance-related datasets. Analyzing such datasets helps classify parts in terms of their failure probability, or even predict asset longevity.
Maintenance datasets can be analyzed in order to extract maintenance knowledge, such as rules that signify the high likelihood of an asset’s failure.
Data analysis for automatic classification, production of end-of-life (EOL) predictions and extraction of rules falls in the realm of data mining and machine learning.
Machine learning hinges on the selection of a proper method and model for the problem at hand. For example, automated classification is based on the use of the collected datasets in conjunction with some classification method. It uses past data about the status of assets (e.g., failed, malfunctioning, working normally) in order to identify the most likely health status of an asset.
There are several classification techniques and algorithms such as decision trees, which use a tree-like graph for modeling classification decisions. Decision trees represent a list of if-then-else statements on the observed data, which ultimately lead to a classification decision.
Because of their simple design, decision trees are quite easy to understand but may not be accurate enough. Consequently, data scientists and statisticians may opt to use other, more effective classification methods.
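The if-then-else structure of a decision tree can be illustrated with a minimal, hand-written sketch. The function name `classify_asset`, the two input features and all thresholds below are hypothetical and chosen only for illustration; in practice the tree would be learned from past status data rather than written by hand.

```python
# Hypothetical hand-written decision tree. The thresholds are illustrative
# assumptions, not values taken from a real asset or standard.

def classify_asset(vibration_mm_s: float, temperature_c: float) -> str:
    """Return a health status by walking a small if-then-else tree."""
    if vibration_mm_s > 7.0:
        # High vibration: temperature separates the two fault classes.
        if temperature_c > 85.0:
            return "failed"
        return "malfunctioning"
    # Low vibration: only a very hot asset is flagged.
    if temperature_c > 95.0:
        return "malfunctioning"
    return "working normally"


# Example readings as (vibration in mm/s, temperature in deg C) pairs.
readings = [(2.1, 60.0), (8.4, 70.0), (9.0, 90.0), (3.0, 99.0)]
statuses = [classify_asset(v, t) for v, t in readings]
print(statuses)
```

Each path from the root to a return statement corresponds to one rule, which is why such models are easy to read and audit.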
The selection of a proper machine learning and data mining model is crucial for the credible estimation of parameters, including failure probabilities and EOL. To best achieve these results, workers traditionally employ disciplined methodologies to analyze data and evaluate alternative data mining models.
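The most popular methodologies for analyzing and mining maintenance datasets are the Cross-Industry Standard Process for Data Mining (CRISP-DM), Knowledge Discovery in Databases (KDD), and Sample, Explore, Modify, Model, Assess (SEMMA).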
These methodologies are iterative, which means that they are subject to a design, build, deploy and evaluate cycle. This cycle can be applied multiple times in order to boost continuous improvement.
These methodologies are also cross-sector: they are used not only for industrial maintenance, but also in other industrial sectors and applications.
By using these approaches, a user can evaluate the performance of a given model on the supplied datasets, prior to the final selection and field deployment of a data mining algorithm.
CRISP-DM is the most popular of the three approaches. It is an iterative methodology comprising six major phases, which are sequential in the sense that each is based on the outcomes of the previous one.
Due to the outcome-based nature of CRISP-DM, it is possible, and in most cases required, to revert from one phase to a previous one. In the case of mining maintenance datasets, the six phases of CRISP-DM are business understanding, data understanding, data preparation, modeling, evaluation and deployment. Each is described in turn below.
The initial phase, business understanding, sets the scene and decides the scope of the data mining activities. It establishes the requirements and goals of the maintenance data mining process, including the expected result.
To this end, the target maintenance-related question has to be formulated. This could be the prediction of a machine’s EOL, based on vibration and ultrasonic data or even the estimation of a failure probability, based on temperature data. In addition to formulating the target maintenance question, a preliminary plan to resolve this question is developed, including the datasets to be used and the models that should be explored.
In the data understanding phase, datasets are collected and reviewed. For the success of the data mining process, it is very important to inspect the datasets to identify any quality problems. This inspection also helps determine which models could be effective or ineffective. Even though every problem is different, experienced data scientists can prioritize the methods to be tested and evaluated simply by reviewing the available data.
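A first inspection for quality problems can be sketched as a simple count of missing and implausible values. The records, field names and plausible ranges below are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical raw sensor records; None marks a missing value.
records = [
    {"asset": "pump-1", "temperature_c": 61.5, "vibration_mm_s": 2.3},
    {"asset": "pump-2", "temperature_c": None, "vibration_mm_s": 2.9},
    {"asset": "pump-3", "temperature_c": 640.0, "vibration_mm_s": 3.1},
    {"asset": "pump-4", "temperature_c": 58.0, "vibration_mm_s": None},
]

# Assumed plausible range per field (illustrative values only).
PLAUSIBLE = {"temperature_c": (-40.0, 200.0), "vibration_mm_s": (0.0, 50.0)}


def quality_report(rows):
    """Count missing and out-of-range values for each checked field."""
    issues = Counter()
    for row in rows:
        for field, (low, high) in PLAUSIBLE.items():
            value = row.get(field)
            if value is None:
                issues[(field, "missing")] += 1
            elif not low <= value <= high:
                issues[(field, "out_of_range")] += 1
    return dict(issues)


report = quality_report(records)
print(report)
```

A report like this gives the team an early indication of how much cleaning the data will need before any model is tried.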
In the data preparation phase, the final maintenance datasets to be used for extracting and evaluating the data mining model are prepared. This may involve several transformations to the raw data collected by the sensors, including:
- filtering information (e.g., selecting specific attributes);
- transforming data into different formats;
- combining datasets (e.g., joining datasets from different sensing modalities); and
- cleaning the data (e.g., getting rid of empty or incomplete fields).
The ultimate objective of this phase is to ensure that the data is ready for data modeling tools.
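The transformations listed above can be sketched in a few lines of plain Python. The readings, the field names and the `prepare` helper are hypothetical; a real pipeline would more likely rely on a data-frame library, but the steps are the same.

```python
# Hypothetical readings from two sensing modalities, keyed by asset and hour.
vibration = [
    {"asset": "pump-1", "hour": 0, "vibration_mm_s": 2.3, "sensor_id": "v17"},
    {"asset": "pump-1", "hour": 1, "vibration_mm_s": None, "sensor_id": "v17"},
    {"asset": "pump-2", "hour": 0, "vibration_mm_s": 6.8, "sensor_id": "v21"},
]
temperature = [
    {"asset": "pump-1", "hour": 0, "temperature_c": 61.5},
    {"asset": "pump-1", "hour": 1, "temperature_c": 63.0},
    {"asset": "pump-2", "hour": 0, "temperature_c": 88.2},
]


def prepare(vib_rows, temp_rows):
    """Join both modalities, keep modeling attributes, drop incomplete rows."""
    # Combining: index temperature readings by (asset, hour) for the join.
    temp_by_key = {(r["asset"], r["hour"]): r["temperature_c"] for r in temp_rows}
    prepared = []
    for row in vib_rows:
        key = (row["asset"], row["hour"])
        if key not in temp_by_key:
            continue
        # Filtering: keep only the selected attributes (sensor_id is dropped).
        merged = {
            "asset": row["asset"],
            "hour": row["hour"],
            "vibration_mm_s": row["vibration_mm_s"],
            "temperature_c": temp_by_key[key],
        }
        # Cleaning: skip rows that contain an empty field.
        if any(value is None for value in merged.values()):
            continue
        prepared.append(merged)
    return prepared


dataset = prepare(vibration, temperature)
print(dataset)
```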
As already outlined, a variety of models can be used for classification, prediction or even rule extraction. The purpose of the modeling phase is to apply some of the available methods, while also calibrating them by tuning their parameters. Note that each selected model may require different datasets. Therefore, it is common to go back to the data preparation phase in order to select alternative datasets as needed.
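Calibrating a model by tuning its parameters can be sketched with a deliberately simple classifier that predicts a failure whenever temperature exceeds a threshold. The labeled history, the candidate thresholds and the choice of accuracy as the selection criterion are all assumptions made for illustration.

```python
# Hypothetical labeled history: (temperature in deg C, asset failed?).
history = [
    (55.0, False), (62.0, False), (71.0, False),
    (78.0, True), (83.0, True), (90.0, True),
    (58.0, True),  # a noisy case that no single threshold explains
]


def accuracy(threshold, samples):
    """Share of samples where 'temperature > threshold' matches the label."""
    correct = sum((temp > threshold) == failed for temp, failed in samples)
    return correct / len(samples)


# Simple grid search over assumed candidate values of the parameter.
candidates = [60.0, 65.0, 70.0, 75.0, 80.0]
best_threshold = max(candidates, key=lambda t: accuracy(t, history))
print(best_threshold, accuracy(best_threshold, history))
```

In practice the parameter would be tuned on one part of the data and checked on a held-out part, so that the reported performance is not overly optimistic.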
Following the development of the data model(s) in the previous phase, the evaluation phase thoroughly assesses the selected models against the target objectives. Each model is first evaluated in terms of its performance. For example, a model is tested to determine whether it can produce EOL predictions that are very close to the known EOL of assets. Beyond model performance, it is also important to assess, at a higher level, whether the business objectives can be met. This assessment helps determine whether a model can be moved to production. It is quite common for the data mining team to return from this phase to the first step of the process in order to reformulate the data-driven maintenance target at hand.
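One way to check how close EOL predictions are to the known EOL of assets is the mean absolute error. The figures below and the acceptance tolerance `MAX_ACCEPTABLE_MAE_DAYS` are hypothetical; the tolerance stands in for whatever business objective was agreed on in the first phase.

```python
# Hypothetical known and predicted end-of-life values, in days.
known_eol_days = [120, 200, 340, 90]
predicted_eol_days = [110, 215, 330, 100]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


mae = mean_absolute_error(known_eol_days, predicted_eol_days)

# Assumed business tolerance: predictions may be off by two weeks on average.
MAX_ACCEPTABLE_MAE_DAYS = 14
ready_for_production = mae <= MAX_ACCEPTABLE_MAE_DAYS
print(mae, ready_for_production)
```

Separating the error metric from the acceptance threshold mirrors the two levels of evaluation: model performance first, then fit with the business objective.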
The deployment phase is concerned with deploying successful data mining models in the field. It is not confined to integrating algorithms within platforms such as Asset Management (AM) and Enterprise Resource Planning (ERP) systems. Users also decide on the best way to present the information to maintenance teams, which can include reports and visualizations.
KDD and SEMMA are less popular than CRISP-DM, but are still deployed in several data mining settings and applications. They are also staged and iterative. In particular, KDD's main stages are selection, preprocessing, transformation, data mining, and interpretation and evaluation.
Likewise, SEMMA also comprises five data preparation and processing phases: sample, explore, modify, model and assess.
One can easily observe a direct mapping between the stages and activities of KDD and SEMMA, as well as their correspondence to the phases of CRISP-DM. This reveals the similarities of the three methodologies.
One of the major challenges of the data mining process is to assemble the right team that will be in charge of deployment. Indeed, the implementation of the methodology outlined above requires at least the involvement of experts from three different disciplines:
Maintenance workers will remain at the heart of data-driven maintenance processes. Nevertheless, effective mining of maintenance data also requires additional experts, such as data scientists, in disciplines where there is still a large talent gap, including statistics, machine learning and deep learning.
The practice of mining maintenance data and deriving hidden patterns of maintenance is both art and science.
The presented methodologies provide a sound basis for understanding the scientific part. However, they also reveal the importance of a maintenance data scientist’s creativity for processes like inspecting the data, selecting suitable data mining and machine learning models, as well as evaluating the business relevance of the results.
This is the reason why a good multi-disciplinary team is required, including people with field experience. Is your organization ready to efficiently process maintenance data to help achieve maintenance excellence?