Machine Learning: Predictive Modeling for Failure Prevention

Created in:

July 1, 2021

Updated:

4/22/2024

Machine learning is part of the world of data science, where prediction problems with unbalanced data are common. They occur when the observations are not evenly distributed among the possible classes of the response variable.

There is an even greater challenge: when this disproportion in the response variable extends over more than two classes.

To solve this, you need to enter the field of data science called multiclassification with unbalanced data.

Start this immersion now by reading this case study. We bring a practical example of predictive maintenance, in which the goal is to find patterns and predict failures in a particular piece of equipment.

Happy reading!

Machine Learning: An Introduction

Long before data science, around the end of the eighteenth century, industry emerged. Back then, it was very common for a production line to halt because a piece of equipment unexpectedly stopped working due to some defect.

Over the years, some procedures have been adopted to minimize this type of problem. One of these procedures is the preventive maintenance of equipment, which is still very common today.

How do you do that?

At set intervals of time, a technician performs a check-up on each piece of equipment.

It turns out that relying on this alone does not prevent equipment from stopping unexpectedly, and it is an expensive procedure, since it requires dedicated manpower. Fortunately, we are evolving!

Nowadays, a new way of dealing with these problems has gained ground and popularity in the industrial sector. We are talking about predictive maintenance, which tries to identify the moment when a piece of equipment is about to fail.

There is no crystal ball involved: with sensors and machine learning algorithms, it is now possible to identify when a piece of equipment is about to fail.

In data science, the problem of finding patterns and predicting which failure might occur in a given piece of equipment falls into a field called multiclassification with unbalanced data.

And that's what you're going to read about throughout this study.

Prediction issues with unbalanced data are common in the machine learning universe. They occur in situations where there is not a balanced proportion of observations among the possible classes of the response variable.

This disproportion makes the developed model harder to evaluate: because of the disparity in the volume of observations, the algorithm tends to favor the class with more instances, while giving the false impression of being a highly accurate model.

Here are some examples of unbalanced situations you may encounter:

Fraud Detection
Spam Filtering
Mechanical Failure Prediction
Medical diagnosis of cancer
Oil Spill Detection in Satellite Imagery

Shall we analyze an example in practice?

Looking at the following image, we have the analysis of the proportion of failures of a product in relation to the total produced:

[Figure: Machine Learning: Failure Ratio Analysis Chart. Column chart showing that failures account for 3.39% of the total produced.]

Since we are talking about machine learning, let's use this example to raise some common problems in this universe.

First point: because the algorithm receives far more products without defects (96.61%, versus 3.39% with some irregularity), it is very likely to be biased at classification time.

Second point: identifying all products as non-defective would give the false impression of a highly accurate model, since it would get 96.61% of the answers right.
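The second point can be checked with a few lines of Python. This is a minimal sketch that assumes a hypothetical batch of 10,000 products (the batch size is an assumption; the study only gives the 3.39% failure rate) and a "model" that labels every product as non-defective.

```python
# Hypothetical batch: 10,000 products, 3.39% defective (1 = defective, 0 = ok).
n_total = 10_000
n_defective = 339
y_true = [1] * n_defective + [0] * (n_total - n_defective)

# A "model" that labels every product as non-defective.
y_pred = [0] * n_total

# Accuracy: share of products whose label was predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total

# Recall on the defective class: share of real failures that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall_defective = caught / n_defective

print(f"accuracy = {accuracy:.4f}")                    # accuracy = 0.9661
print(f"recall (defective) = {recall_defective:.4f}")  # recall (defective) = 0.0000
```

The accuracy looks excellent, yet not a single failure is detected, which is exactly the false impression described above.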

Third point: finally, when an observation can belong to one of several classes rather than just two, we face another common problem in the machine learning universe, that of multiclassification.

At Indicium, as in data science in general, we use the term multiclassification for machine learning problems in which we need to predict one class out of three or more possible response classes.

This type of classification also has many applications, including text document classification, speech recognition, image recognition, etc.

Most multiclass classification techniques were proposed for models that learn from balanced datasets.

And therein lies the danger!

In a number of real-world situations, as we've already mentioned, datasets have an unbalanced distribution of data. This means that some data classes may have few training examples compared to other classes.

Machine learning: the problem

In data science, real predictive maintenance datasets are often difficult to obtain and particularly difficult to publish. Therefore, we will use a synthetic dataset that simulates failures in industrial equipment, available in the UCI Machine Learning Repository, to feed our predictive maintenance models.

In the following image, note that even among the possible types of equipment failure there is an imbalance that our predictive maintenance model will need to take into account.

[Figure: Machine Learning: Unbalance in Possible Different Types of Failure. Column chart analyzing the imbalance among the possible failure types.]

In addition, in our database, 23 observations had 2 different classes associated with them, and 1 had 3 classes.

[Figure: Machine Learning: Multiple Classes of Failures. Column chart showing observations with multiple classes of equipment failure.]

Since few items have more than one failure associated with them, and to simplify the problem, we treat the failures as having occurred independently.

Machine learning: preparing the data to enter the model

Different models have different sensitivities to the type of predictors they receive, so how the predictors enter the model also matters. Data transformations that reduce the impact of skewness or outliers can lead to significant improvements in performance.

Some models, such as tree-based ones, are remarkably insensitive to the characteristics of the predictor data. Others, such as linear regression, are not.

Tree-based classification models create divisions of the training data, and the predictive equation is a set of logical statements, such as: if predictor A is greater than X, predict class Y. As a result, an outlier typically does not have an exceptional influence on the model.
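To illustrate why a split rule is barely affected by an outlier, the sketch below searches for the best single threshold on a toy predictor, then repeats the search after one value is made extreme. The data and the helper `best_threshold` are hypothetical, and the search is a simplified stand-in for what a real tree-building algorithm does.

```python
from statistics import mean

def best_threshold(xs, ys):
    """Return the threshold t whose rule 'predict 1 if x > t' makes the fewest errors."""
    values = sorted(set(xs))
    # Candidate splits sit halfway between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x > t) != (y == 1) for x, y in zip(xs, ys))

    return min(candidates, key=errors)

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
xs_outlier = xs[:-1] + [12_000.0]  # same data, last reading made extreme

# The split only depends on the ordering of the values, so it does not move.
print(best_threshold(xs, ys), best_threshold(xs_outlier, ys))  # 6.5 6.5

# A statistic such as the mean, by contrast, is dragged toward the outlier.
print(mean(xs), mean(xs_outlier))  # 6.5 2004.5
```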

Data cleansing to remove noise, inconsistent data, and errors also needs to be done. This gives us a better and more representative dataset for developing a reliable prediction model, because in most prediction models dirty data can hurt prediction accuracy. And we don't want that to happen.

Data Balancing

In recent years, several balancing methods have been developed in data science to solve problems of imbalance between classes. These methods are broadly categorized into two groups:

Balancing in terms of data
Balancing in terms of algorithm

In machine learning, data-level approaches rebalance the class distribution through resampling methods.

How?

By increasing the minority class or by reducing the observations of the majority class.

To give you a better understanding, here are some examples of a data approach:

Oversample: Increases the volume of data in the minority classes by generating synthetic data or replicating existing observations.
Undersample: Reduces the volume of the majority classes, seeking a numerical balance with the rest of the data.
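Both ideas can be sketched in their simplest form, random resampling, using only the Python standard library. The function names and the toy data are illustrative assumptions, not the implementation used in the study.

```python
import random
from collections import Counter

def group_by_class(rows, labels):
    """Collect the rows of each class in a dictionary."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups

def random_oversample(rows, labels, seed=0):
    """Replicate rows of the smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out = []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out.extend((row, label) for row in group + extra)
    return out

def random_undersample(rows, labels, seed=0):
    """Randomly drop rows of the larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out = []
    for label, group in groups.items():
        out.extend((row, label) for row in rng.sample(group, target))
    return out

rows = list(range(10))
labels = ["ok"] * 8 + ["fail"] * 2

over = Counter(label for _, label in random_oversample(rows, labels))
under = Counter(label for _, label in random_undersample(rows, labels))
print(over)   # both classes end up with 8 rows
print(under)  # both classes end up with 2 rows
```

Oversampling keeps all the original information at the cost of duplicated rows, while undersampling discards majority rows, so resampling is normally applied to the training data only.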

[Figure: Machine Learning: Data Balancing Approaches. Two side-by-side column charts illustrating the data-level balancing approaches.]

On the other hand, algorithm-based balancing approaches in machine learning modify the parameters and hyperparameters of the model in order to compensate for the disparities. These modifications can take a variety of forms, such as cost functions for false negatives (i.e., penalizing the model when it identifies a defective product as non-defective) and fit and trend induction.
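One way to picture the cost-function idea is to weight each class inversely to its frequency, so that missing a rare failure costs far more than a false alarm. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`, the same formula scikit-learn documents for `class_weight="balanced"`. The batch of 10,000 products is a hypothetical assumption, and the study does not state which cost scheme it used.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

def weighted_error(y_true, y_pred, weights):
    """Share of the total class weight that falls on misclassified observations."""
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    total = sum(weights[t] for t in y_true)
    return wrong / total

# Hypothetical batch with the 3.39% failure rate from the example above.
y_true = ["fail"] * 339 + ["ok"] * 9661
y_pred = ["ok"] * 10_000  # a model that never predicts a failure

weights = balanced_class_weights(y_true)
print(weights)  # "fail" gets a much larger weight than "ok"
print(round(weighted_error(y_true, y_pred, weights), 4))  # 0.5
```

Under these weights, the model that never predicts a failure is wrong on half of the total weight instead of on only 3.39% of the observations.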

Evaluation metrics

In machine learning, as in everything in life, you have to know how to choose.

There are several metrics for comparing models and assessing whether their predictions are good. But choosing the wrong one can lead to a misinterpretation of the quality of the model.

Here at Indicium, as performance comparison metrics for this problem, we use the F1-score and the MCC (Matthews correlation coefficient), which are among the most widely used to evaluate this type of model.
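For the binary case, both metrics can be computed directly from the confusion matrix: F1 = 2TP / (2TP + FP + FN) and MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal illustration that returns 0 when a denominator is zero (a common convention, assumed here) and reuses a hypothetical batch of 10,000 products with 339 failures. The multiclass versions of these metrics are generalizations not shown here.

```python
import math

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# "Everything is fine" baseline: no failure is ever predicted.
baseline = dict(tp=0, fp=0, fn=339, tn=9661)
accuracy = (baseline["tp"] + baseline["tn"]) / 10_000

print(accuracy)                                 # 0.9661
print(f1_score(tp=0, fp=0, fn=339))             # 0.0
print(mcc(**baseline))                          # 0.0
```

Unlike accuracy, both metrics expose the baseline as uninformative.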

Machine Learning: Methods to Solve the Problem

This study considers two possible scenarios:

Binary classification followed by multiclass classification: first 1 for defective products and 0 for non-defective ones, then a multiclass classification among the 5 potential defects (TWF, HDF, PWF and OSF).
A single classification model: developed with the same classes as the previous model plus a sixth category, called without defect.

In the following diagram, we show you an example of the structure of the experiments to be tested:

[Figure: Machine Learning: Experiment Framework Example. Diagram showing an example of the structure of the experiments to be tested.]

Tested Models

Let's now look at the four types of model that were tested.

a) Logistic regression in machine learning

As a prediction method for categorical variables, logistic regression is comparable to the supervised techniques proposed in machine learning (decision trees, neural networks, etc.) and to predictive discriminant analysis in exploratory statistics. You can put these approaches in competition to choose the model best suited to the predictive problem at hand.

Conceptually speaking, logistic regression in machine learning is a regression model for dependent (response) variables that follow a binomial distribution. It is useful for modeling the probability of an event occurring as a function of other factors. It is a generalized linear model that uses the logit function as its link function.
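The role of the logit can be made concrete with a small sketch: the model is linear on the log-odds scale, and the inverse of the logit (the sigmoid) maps that linear score back to a probability. The coefficients and feature values below are made-up numbers for illustration, not fitted parameters.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps a real-valued score to a probability."""
    return 1 / (1 + math.exp(-z))

def failure_probability(features, coefficients, intercept):
    """Linear predictor on the log-odds scale, mapped to a probability."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical coefficients and feature values.
p = failure_probability([1.5, -0.2], coefficients=[0.8, 2.0], intercept=-1.0)
print(round(p, 4))         # 0.4502
print(round(logit(p), 4))  # -0.2, the linear score is recovered
```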

b) Decision trees in machine learning

As Kartik Nooney explains, the multilabel decision tree is like the traditional one, but with one change: instead of an object following a single path through the tree to be classified, it follows all the paths that are pertinent to it, and each "leaf" that it reaches is a label that it owns.

Decision trees in machine learning are among the most popular inference algorithms and have been applied in various areas, such as medical diagnostics and credit risk. From them, you can extract if-then rules, which are easily understood. A tree's discriminative ability comes from dividing the space defined by the attributes into subspaces, each of which is associated with a class.

c) Ensembles in machine learning

Ensemble (or hybrid) models are composed of two or more prediction models, combining characteristics unique to each. Boosting models are considered the most popular within this set, as they build an accurate classifier from other, lower-accuracy classifiers.

The superiority of boosting models in machine learning lies in their serial learning nature, which results in excellent approximation and generalization. In other words, weak classifiers are trained sequentially, each one reducing the errors of the previous classifiers.
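The sequential idea can be illustrated with a compact AdaBoost-style sketch on a one-dimensional toy problem. AdaBoost is chosen here only as a well-known example of boosting; the study does not say which boosting algorithm was used, and the data and function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: returns +1 or -1 depending on the side of the threshold."""
    return polarity if x > threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best single split."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train stumps one after another, reweighting the points each stump gets wrong."""
    weights = [1 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # say of this stump in the vote
        ensemble.append((alpha, t, polarity))
        # Misclassified points gain weight, correct ones lose it, then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, xs, ys):
    return sum(ensemble_predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]  # no single threshold separates these

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
print(accuracy(single, xs, ys), accuracy(boosted, xs, ys))  # 0.7 1.0
```

One stump alone gets 7 of the 10 points right; three stumps trained in sequence, each focusing on the points the previous ones missed, classify all of them correctly.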

d) Support vector machine (SVM) in machine learning

SVM belongs to the class of supervised learning algorithms and, because it divides the data into two classes, it is known as a binary classifier.

As the number of classes grows, so does the complexity of SVM, which reduces computing performance. To mitigate this, the multiclass problem is simplified into a series of binary classifications, using schemes such as One-Against-One and One-Against-All.
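The two decomposition schemes differ in how many binary problems they create. As a small sketch, using the five labels that appear in the direct classification experiment of this study (ok, TWF, HDF, PWF and OSF):

```python
from itertools import combinations

classes = ["ok", "TWF", "HDF", "PWF", "OSF"]

# One-Against-All: one binary problem per class (that class versus all the others).
one_against_all = [(c, "rest") for c in classes]

# One-Against-One: one binary problem per unordered pair of classes.
one_against_one = list(combinations(classes, 2))

print(len(one_against_all))  # 5
print(len(one_against_one))  # 10
print(one_against_one[:3])   # [('ok', 'TWF'), ('ok', 'HDF'), ('ok', 'PWF')]
```

With K classes, One-Against-All trains K classifiers on the full data, while One-Against-One trains K(K-1)/2 classifiers, each on only the observations of two classes.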

Oversample Balancing Methods

Here are six methods of oversample balancing.

SMOTE (synthetic minority over-sampling technique)
Random over-sampler
Borderline-SMOTE
K-Means SMOTE
SVM-SMOTE
ADASYN (adaptive synthetic sampling approach)
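The core idea behind SMOTE, the first method in the list, is to create a synthetic point somewhere on the line segment between a minority observation and one of its nearest minority neighbours. The sketch below is a simplified illustration of that interpolation step only, with hypothetical data and function names. Library implementations handle many details that are left out here.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the segment between the point and its neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class observations with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))  # 6
```

Because every synthetic point lies between two existing minority observations, the new data stays inside the region the minority class already occupies instead of merely duplicating rows.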

Undersample Balancing Methods

And below, we present four methods of undersample balancing.

Near miss
Condensed nearest neighbour
Tomek links
Edited nearest neighbours
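As an illustration of one of these methods: a Tomek link is a pair of observations from different classes that are each other's nearest neighbour. Removing the majority-class member of each link cleans up the boundary between classes. The sketch below only detects the links on a hypothetical one-feature dataset, using a brute-force neighbour search.

```python
import math

def nearest(i, points):
    """Index of the nearest other point to points[i] (brute force)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def tomek_links(points, labels):
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links

# Hypothetical data: one feature per observation, "ok" is the majority class.
points = [(0.0,), (1.0,), (2.0,), (2.3,), (5.0,), (6.0,)]
labels = ["ok", "ok", "ok", "fail", "fail", "fail", ]

links = tomek_links(points, labels)
print(links)  # [(2, 3)]

# Undersampling step: drop the majority-class member of each link.
to_drop = {i for pair in links for i in pair if labels[i] == "ok"}
print(sorted(to_drop))  # [2]
```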

Findings

After preparing the data and choosing suitable methods, we arrive at the results.

Do you know what to do now?

Binary classification and subsequent multiclass classification

For the combinations, we first run the models for binary classification. We then use the results of the top 10 models as input for the multiclass prediction.

At the same time, we performed a test taking into account only defective products (TWF, HDF, PWF and OSF) in order to identify which models have the greatest potential to distinguish between classes.

The two-step prediction was developed from the 100 different combinations of the models presented above.

Attention here: only products classified as problematic move on to the second phase of the prediction. The rest of the data is automatically classified as non-problematic.
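The control flow of the two-step prediction can be sketched as follows. The two "models" are hypothetical stand-ins (simple hand-written rules on made-up sensor fields) so that the example runs on its own. In the study, they would be the trained binary and multiclass classifiers.

```python
def two_stage_predict(rows, is_defective, failure_type, ok_label="ok"):
    """Stage 1 filters defective rows; only those reach the multiclass stage 2."""
    predictions = []
    for row in rows:
        if is_defective(row):
            predictions.append(failure_type(row))
        else:
            # Rows rejected by stage 1 are labelled non-problematic automatically.
            predictions.append(ok_label)
    return predictions

# Hypothetical stand-ins for trained models; thresholds are made up.
def toy_binary_model(row):
    return row["sensor_a"] > 60 or row["sensor_b"] > 200

def toy_multiclass_model(row):
    return "TWF" if row["sensor_b"] > 200 else "OSF"

rows = [
    {"sensor_a": 10, "sensor_b": 5},
    {"sensor_a": 90, "sensor_b": 5},
    {"sensor_a": 10, "sensor_b": 300},
]
result = two_stage_predict(rows, toy_binary_model, toy_multiclass_model)
print(result)  # ['ok', 'OSF', 'TWF']
```

One consequence of this design is that any failure missed by the first stage can never be recovered by the second.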

The best results from this? We present them below.

Direct multiclassification

The previously listed models were also tested directly (i.e., classifying among ok, TWF, HDF, PWF and OSF). This allowed us to compare the evaluation metrics of each case.

Get to know the 10 best results obtained below.

Machine learning: what is the conclusion of this study?

Although it allows efficient prediction models to be combined, we conclude that dividing the classification process into two steps impairs the serial learning method of the ensemble models.

In the specific case of this research, boosting methods outperformed any combination of the other models. And the SMOTE, ADASYN, and Tomek links methods were shown to bring the best data balancing results.

A tip for you to put into practice: changes to the hyperparameters of each model can also be tested. This can give a better fit of the model to the data and, consequently, an improvement in the evaluation metrics.