Introduction
Visual inspection is traditionally used to detect pavement distress, whereby specialized personnel walk along a road while making on-site measurements. Because this approach is labor-intensive, alternative technologies are required to analyze the behavior of road infrastructure 1. Inspection methods include manual, deflectometer-based, vibroacoustic, penetrating-radar-based and automated approaches. The traditional (manual) method is one of the most widely used techniques but can be tedious, subjective and dependent on the experience of the evaluator 2. The Dynaflect is a deflectometer that uses geophones to measure the deflection caused by oscillating weights. The oscillation of the weights causes the pavement to deflect and rebound in a manner similar to a vehicle driving over the pavement surface 3. In the vibroacoustic method, sensors are employed to detect structural variations in pavement resulting from hidden damage before the damage propagates along the road surface 4. Penetrating radar sends electromagnetic waves into pavement materials, and variations in the dielectric constant caused by the presence of a new material or deterioration are reflected in the response signal 5. In automated methods, a vehicle fitted with cameras is used to acquire data for the pavement surface 6.
PIAS is a pavement image acquisition system consisting of light sources, a camera and global positioning systems. This system captures road surface data that are processed by a computer using multi-resolution methods for crack detection 7. RIEGL VMX-450 is a system consisting of laser scanners and cameras that are mounted on a vehicle grill to automatically detect road cracks 8. ARAN is an automated road analyzer that uses a high-density synchronous flash to identify pavement cracks 9. These technologies involve image processing techniques and pattern recognition. For example, the authors of 10 used various techniques, such as low-pass filters, an edge detection algorithm and mathematical morphology. In 11, cracks in concrete bridges were identified by semantic segmentation of images acquired by mobile devices. In 12, concrete structures were subjected to load testing, and cracks in the structures were detected by morphological processing: a Gaussian filter was applied, followed by the bottom-hat transform to detect detail elements that were subsequently segmented using the Otsu method. Other techniques, such as the wavelet scattering transform (WST), have been used for pattern recognition in texture discrimination to obtain iris information 13.
Machine learning algorithms have been reported to provide the best results for classification and pattern recognition 9. In 14, pavement cracks were identified using image processing, fuzzy logic and artificial neural networks (ANNs) with the backpropagation learning algorithm. In 15, pavement cracks were detected and classified by implementing the artificial bee colony algorithm and ANNs. In 16, pavement cracks were classified by applying different machine learning algorithms, such as the support vector machine, random forest and ANN. In 17, longitudinal and transverse cracks were detected and classified by implementing image processing techniques and an ANN, where the images were acquired through ARAN.
In this study, a new methodology is proposed for the detection of pavement distress that combines the wavelet scattering transform with Hu invariant moments. First, the wavelet scattering transform provides a multiresolution analysis of the input image, allowing for the detection of cracks at different scales. The resulting scattering coefficients capture the texture and structural information of the image, making it easier to distinguish between cracks and other image features. The Hu invariant moments then provide features that are invariant to rotation, translation and scale, which further enhances the accuracy of crack detection. This means that the same crack can be identified regardless of its orientation or location within the image. In addition, Hu moments are relatively robust to noise and variations in lighting conditions, which are a common challenge in pavement inspection.
Another advantage of the proposed methodology is its relatively low computational cost compared with other pattern-recognition techniques. The WST can efficiently analyze the input image, and the Hu moments can be quickly calculated from the resulting edge detections. This methodology is therefore a promising option for future research and practical implementation in pavement-management systems. This paper is organized as follows: Section 2, Materials and methods; Section 3, Results and discussion; and Section 4, Conclusions.
Materials and methods
The methodology for the detection and classification of pavement distress is implemented in four stages: (1) image acquisition, (2) image processing for contrast enhancement and edge detection, (3) feature extraction and (4) deterioration classification using an ANN.
Image acquisition
A set of 300 road images was used in this study. The images were obtained using a mobile device and have different resolutions, lighting variations and types of deterioration: potholes, longitudinal cracks and alligator cracking. Some images were also acquired from the SDNET2018 dataset of Utah State University 18. The images were captured on dry pavement under daytime lighting conditions, which allowed for natural variations in lighting. The images can be in either color or grayscale format, providing flexibility in terms of image representation. Furthermore, it was essential that the longitudinal cracking in the pavement was vertically oriented. This requirement ensured consistency in the analysis and facilitated accurate detection and classification of longitudinal cracks. The complete image set was divided into 240 images for training and 60 images for validation, similar to the work by 16. Table 1 lists the information on the dataset employed.
Image processing
Preprocessing is required to remove distortion and enhance contrast in images. Power functions are used to remove the effects of nonuniform intensity in the image background by enhancing the contrast 19. For this purpose, the image is converted to grayscale and subdivided into several square windows. The average grayscale value of each window is considered its representative grayscale 17.
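The two preprocessing operations described above can be sketched in a few lines. This is a minimal illustration on plain Python lists, not the implementation used in the study: the exponent `gamma=0.5`, the window size and the function names `power_law` and `window_means` are assumptions, since the text does not state them.

```python
def power_law(gray, gamma=0.5):
    """Apply s = 255 * (r / 255) ** gamma to every pixel of a grayscale image."""
    return [[255.0 * (p / 255.0) ** gamma for p in row] for row in gray]


def window_means(gray, size):
    """Mean gray level of each non-overlapping size x size window."""
    rows, cols = len(gray), len(gray[0])
    means = []
    for r0 in range(0, rows - size + 1, size):
        means_row = []
        for c0 in range(0, cols - size + 1, size):
            block = [gray[r][c]
                     for r in range(r0, r0 + size)
                     for c in range(c0, c0 + size)]
            means_row.append(sum(block) / len(block))
        means.append(means_row)
    return means


# Toy 4 x 4 grayscale image (values 0..255), assumed for illustration only.
image = [
    [0, 0, 64, 64],
    [0, 0, 64, 64],
    [255, 255, 128, 128],
    [255, 255, 128, 128],
]
enhanced = power_law(image, gamma=0.5)      # exponent < 1 brightens dark regions
background = window_means(image, size=2)    # one representative value per window
```

An exponent below 1 stretches the dark end of the intensity range, while an exponent above 1 stretches the bright end; the appropriate value depends on the images.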
The wavelet scattering transform is invariant to rotation, translation and scaling and has therefore been applied to texture discrimination. This transform reduces the variability between features and the quantity of information. The transform is implemented using cascaded wavelet convolutions. The first-order scattering coefficients 𝑆1𝑥 are defined in Eq. 1 20,21.
Where 𝑈1𝑥 is the modulus of the wavelet coefficients, 𝜙𝐽 is a scaling function that is calculated using Eq. 3, and 𝑥 denotes the input image 21.
The mother wavelet 𝜓1(𝑢) is calculated using Eq. 4, in which the wavelet is scaled by 2^𝑗1, where 𝑗1 is an integer or a half-integer, and rotated by 𝜃1 = 2𝑘𝜋/𝐾1 with 0 ≤ 𝑘 < 𝐾1 21.
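Eq. 1 to Eq. 4 are cited in the text but are not reproduced here. For orientation, the forms commonly given in the scattering literature are sketched below in the same order. The index 𝜆1 = (𝜃1, 𝑗1), the rotation operator 𝑟 and the convolution symbol ⋆ are notational assumptions and may differ from those used in 20,21.

```latex
\begin{align}
S_1 x(u, \lambda_1) &= U_1 x(u, \lambda_1) \star \phi_J(u) \\
U_1 x(u, \lambda_1) &= \lvert x \star \psi_{\lambda_1}(u) \rvert \\
\phi_J(u) &= 2^{-2J}\, \phi\!\left(2^{-J} u\right) \\
\psi_{\lambda_1}(u) &= 2^{-2 j_1}\, \psi\!\left(2^{-j_1} r_{-\theta_1} u\right),
  \qquad \lambda_1 = (\theta_1, j_1)
\end{align}
```

Read this way, the first-order coefficients are a local average (through 𝜙𝐽) of the modulus of the wavelet coefficients, which is what makes them stable to small translations.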
The wavelet scattering transform is applied to the contrast-corrected images to identify features, thereby reducing the image size and highlighting the edges.
Edges are basic image features. An edge corresponds to a set of pixels that reflect change and discontinuity in an image. A margin or edge is one of the most important features considered in image segmentation 22. Canny and Prewitt are two of the various edge detection methods available.
Prewitt method: Convolution is used to identify the edges in an image. Two directional filters are used to determine the pixels of horizontal 𝑃𝑦 and vertical 𝑃𝑥 edges 22 and are given by Eq. 5 and Eq. 6 23,24.
The total gradient 𝑃𝑝 is the combination of the two filters and is calculated using Eq. 7, where 𝑁𝐼𝑃 are the pixel values of a 3 × 3 image window 24.
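The Prewitt step can be sketched as follows. This is a minimal illustration that assumes the standard 3 × 3 Prewitt kernels and assumes that the two responses are combined with the Euclidean norm; the names `KX`, `KY` and `prewitt` are hypothetical, and border pixels are simply left at zero.

```python
import math

# Standard Prewitt kernels (assumed): KX responds to intensity changes along x
# (vertical edges), KY to intensity changes along y (horizontal edges).
KX = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
KY = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def prewitt(gray):
    """Gradient magnitude for the interior pixels of a grayscale image."""
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Kernels are applied as a correlation; flipping them (true
            # convolution) only changes the sign, not the magnitude.
            px = sum(KX[i][j] * gray[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            py = sum(KY[i][j] * gray[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            out[r][c] = math.hypot(px, py)
    return out


# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = prewitt(step)
```

On the step image only the horizontal-difference kernel responds, so the magnitude at the interior pixels comes entirely from `px`.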
Canny method: The Canny method consists of applying a Gaussian convolution to an image to remove most of the noise. The first derivative is calculated to detect locations with intensity discontinuities in the image. Eq. 8 and Eq. 9 are used to calculate the edge intensity and direction for each pixel of the smoothed image 24,25.
In the equations presented above, 𝐺𝜎 is the Gaussian function with variance 𝜎², 𝑚 and 𝑛 denote the location of a pixel in the image, and 𝑓(𝑚, 𝑛) denotes the pixel values of the image 24.
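Eq. 8 and Eq. 9 are cited but not reproduced in this text. A common way of writing the edge intensity and direction of the Gaussian-smoothed image is sketched below; the symbols 𝑔𝑚, 𝑔𝑛 (partial derivatives of the smoothed image) and 𝜃 are assumptions introduced for illustration and may not match the notation of 24,25.

```latex
\begin{align}
g_m(m,n) &= \frac{\partial}{\partial m}\,(G_\sigma * f)(m,n), \qquad
g_n(m,n) = \frac{\partial}{\partial n}\,(G_\sigma * f)(m,n) \\
\lvert \nabla (G_\sigma * f)(m,n) \rvert &= \sqrt{g_m(m,n)^2 + g_n(m,n)^2} \\
\theta(m,n) &= \arctan\!\left(\frac{g_n(m,n)}{g_m(m,n)}\right)
\end{align}
```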
A single method cannot be used for edge detection of different distress types. Potholes and alligator cracks are most effectively detected using the Prewitt method, whereas Canny is most effective for detecting longitudinal cracks. All images are preprocessed with both edge detection methods.
Morphological operators are commonly applied in image processing. The elements in an image are collected and analyzed. Mathematical morphology is used to modify the shape of objects through interaction with neighboring objects, reducing noise and joining fragments that appear during edge detection 14,26,27. Erosion and dilation are among the basic morphological operations used. Dilation combines all the elements of the neighboring region into a single element, filling in gaps as objects dilate and grow. Erosion removes all neighboring pixels if a single pixel value is zero, thus reducing the size of objects 28.
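The gap-filling effect of dilation followed by erosion can be illustrated on a small binary image. This is a minimal sketch that assumes a 3 × 3 square structuring element and treats pixels outside the image as zero; the function names are hypothetical and the structuring element used in the study is not stated.

```python
def _apply(binary, pick):
    """Slide a 3 x 3 window over a binary image; outside pixels count as 0."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                binary[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
            ]
            out[r][c] = pick(window)
    return out


def dilate(binary):
    """A pixel becomes 1 if any pixel in its 3 x 3 neighborhood is 1."""
    return _apply(binary, max)


def erode(binary):
    """A pixel stays 1 only if every pixel in its 3 x 3 neighborhood is 1."""
    return _apply(binary, min)


# Two edge fragments separated by a one-pixel gap.
fragments = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
closed = erode(dilate(fragments))  # dilation joins the fragments, erosion thins them back
```

Applying the operations in the opposite order (erosion first) would instead remove isolated noise pixels, which is why the two are often combined.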
The edge-detected images are subjected to gap filling and denoising at different thresholds to identify the optimal threshold for noise removal. A single threshold value cannot be used for all three types of deterioration. Thus, different thresholds are applied according to the type of deterioration, as shown in Table 2.
Feature extraction
Feature extraction directly impacts the classifier performance and is therefore one of the most important and critical steps in classification 29. The Hu moment invariants are seven quantities that are invariant with respect to image rotation, translation and scaling and are calculated from the normalized central moments up to the third order 30. The normalized central moments are defined in Eq. 10:
Where 𝑝+𝑞 is the order of the geometric moment 31. The seven Hu moment invariants are defined in Eq. 11 to Eq. 17.
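The first two invariants are built from the second-order moments (𝑝+𝑞=2) and the remaining five from the third-order moments (𝑝+𝑞=3).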
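Eq. 10 to Eq. 17 are cited but not reproduced in this text. The standard definitions of the normalized central moments and of the first four Hu invariants, which are the ones used for feature extraction in this study, are given below. The symbols 𝜇, 𝜂 and 𝜙 follow common usage and are assumed to match the notation of 30,31.

```latex
\begin{align}
\eta_{pq} &= \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1 \\
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\,\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2
\end{align}
```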
The first four moment invariants are used in feature extraction because discrimination between classes is not possible using the fifth to seventh invariants. A 12-dimensional feature vector is composed by concatenating the first four moment invariants of the three images produced by mathematical morphology.
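The construction of the feature vector can be sketched as follows. This is a minimal, dependency-free illustration based on the standard definitions of the normalized central moments and the first four Hu invariants; the function names `hu_first_four` and `feature_vector` and the toy images are assumptions and do not come from the study.

```python
def hu_first_four(image):
    """First four Hu moment invariants of a 2-D intensity (or binary) image."""
    pixels = [(x, y, v) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    if m00 == 0:
        raise ValueError("image has no non-zero pixels")
    xc = sum(x * v for x, _, v in pixels) / m00  # centroid gives translation invariance
    yc = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment mu_pq normalized by mu_00 ** ((p + q) / 2 + 1).
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ]


def feature_vector(images):
    """Concatenate the four invariants of each image (three images give 12 values)."""
    return [value for img in images for value in hu_first_four(img)]


# An L-shaped blob and the same blob rotated by 90 degrees.
blob = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
rotated = [list(row) for row in zip(*blob[::-1])]
features = feature_vector([blob, rotated, blob])
```

The rotated blob yields the same four values as the original up to floating-point error, which is the invariance property that motivates the use of these features.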
ANN classification
Damage was classified using a multilayer perceptron (MLP) class of an ANN with a backpropagation learning algorithm 14. Neural networks are powerful machine learning models that have shown success in various classification tasks, including image analysis. They are capable of learning complex patterns and relationships within data, which is crucial for accurately classifying different types of pavement damage. The MATLAB Statistics and Machine Learning Toolbox was used for this purpose 32.
Results and discussion
Image processing
As the considered images had different sizes, the processing time varied with the image dimensions. The maximum processing time of 3 minutes and 10 seconds was determined by the wavelet scattering transform, the most time-consuming process. This transform highlighted the deteriorations by producing data representations that minimize differences within a texture type and discriminate textures, thereby improving edge detection. Figure 1 shows the images obtained at different processing stages for each type of deterioration.
ANN classification
The dataset was divided into 80% training and 20% validation. Using 80% of the dataset for training, the model could obtain sufficient information from a sufficiently large sample size to accurately identify the patterns and features of the images. The remaining 20% of the dataset was used for validation to test the performance of the model on new data.
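An 80/20 split that keeps the classes balanced can be sketched as follows. This is an illustration only: the text does not say how images were assigned to each subset, so the per-class random shuffle, the seed and the function name `stratified_split` are assumptions.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices into training and validation sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        validation.extend(indices[cut:])
    return sorted(train), sorted(validation)


# 300 images, 100 per deterioration type, as implied by the 240/60 split.
labels = ["pothole"] * 100 + ["alligator"] * 100 + ["longitudinal"] * 100
train_idx, val_idx = stratified_split(labels)
```

With 100 images per class this gives 80 training and 20 validation images for each type of deterioration.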
To identify the best classifier, 20 neural networks were created with varying hidden layer configurations. The training dataset consisted of 80 images for each type of deterioration (potholes, alligator cracks, and longitudinal cracks). Exploring different hidden layer configurations made it possible to determine the network architecture that yielded the highest classification performance and to assess the impact of the hidden layer configuration on that performance. By comparing the accuracy and precision metrics of each network, it was possible to identify the neural network with the highest overall performance. The input layer was composed of a 12-dimensional feature vector based on the Hu moment invariants for three images obtained in the processing stage. The output layer was composed of three neurons coded according to deterioration type. The 10 neural networks with the lowest mean square error were used for validation. Figure 2 shows the classifier results.
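The layer structure and the mean square error used to rank the candidate networks can be sketched as follows. This is a forward pass with untrained random weights and not the MATLAB implementation used in the study: the sigmoid activation, the one-hot output coding, the class order and all function names are assumptions, and backpropagation training is omitted for brevity.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights plus one bias term per neuron of a fully connected layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    """Sigmoid activations of one layer; the last weight of each neuron is its bias."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])))
        for neuron in layer
    ]


def predict(network, features):
    """Propagate a feature vector through every layer in turn."""
    activations = features
    for layer in network:
        activations = forward(layer, activations)
    return activations


def mean_square_error(network, samples):
    """Average squared difference between network outputs and one-hot targets."""
    total, count = 0.0, 0
    for features, target in samples:
        output = predict(network, features)
        total += sum((o - t) ** 2 for o, t in zip(output, target))
        count += len(target)
    return total / count


rng = random.Random(0)
hidden_size = 14  # one candidate hidden-layer size; the study varied this value
network = [init_layer(12, hidden_size, rng), init_layer(hidden_size, 3, rng)]
sample = ([0.1] * 12, [1, 0, 0])  # 12 features; target assumed to code "pothole"
output = predict(network, sample[0])
error = mean_square_error(network, [sample])
```

Repeating this with different values of `hidden_size`, after training each network, gives the set of mean square errors from which candidates can be ranked.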
The selected neural network had 12 neurons in the input layer, 14 neurons in the hidden layer and 3 neurons in the output layer. This network had a mean square error of 0.056, which was not the lowest value among the 20 networks, but it had the highest overall accuracy of 95.56% and a precision of 94.44%, both averaged over the three classes. Sensitivity and specificity were also calculated for each type of deterioration. The sensitivities were 100% for both alligator and longitudinal cracks and 80% for potholes because some potholes were confused with alligator cracks. The specificities were 100% for both longitudinal cracks and potholes and 90% for alligator cracks. The F-score provided further insight into the classification performance for each type of deterioration: it was 0.88 for potholes, 0.90 for alligator cracks, and 1.0 for longitudinal cracks. These values indicate a good balance between precision and recall for all types of damage, with the highest F-score achieved for the longitudinal cracks.
Table 3 shows the confusion matrix obtained for the selected network with the 12-14-3 configuration. This result was obtained using a validation dataset of 60 images, with 20 images for each type of deterioration. Potholes were confused with alligator cracks, which can be attributed to the presence of both types of damage in many images; this introduced noise and made it challenging for the network to differentiate between them accurately.
Deterioration | Potholes | Alligator cracks | Longitudinal cracks |
---|---|---|---|
Pothole | 16 | 4 | 0 |
Alligator cracks | 0 | 20 | 0 |
Longitudinal cracks | 0 | 0 | 20 |
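The reported metrics can be recomputed from the confusion matrix. The sketch below reads the rows of Table 3 as actual classes and the columns as predicted classes, which is the reading consistent with the reported sensitivities; the function name `per_class_metrics` is hypothetical.

```python
classes = ["potholes", "alligator cracks", "longitudinal cracks"]
# Rows: actual class, columns: predicted class (values from Table 3).
confusion = [
    [16, 4, 0],
    [0, 20, 0],
    [0, 0, 20],
]


def per_class_metrics(matrix, k):
    """One-vs-rest metrics for class k of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }


metrics = [per_class_metrics(confusion, k) for k in range(len(classes))]
macro_accuracy = sum(m["accuracy"] for m in metrics) / len(metrics)
macro_precision = sum(m["precision"] for m in metrics) / len(metrics)
for name, m in zip(classes, metrics):
    print(name, {key: round(value, 4) for key, value in m.items()})
print("macro accuracy (%):", round(100 * macro_accuracy, 2))
print("macro precision (%):", round(100 * macro_precision, 2))
```

Averaging the per-class accuracies (93.33%, 93.33% and 100%) gives 95.56%, and averaging the per-class precisions (100%, 83.33% and 100%) gives 94.44%, which matches the overall values reported in the text.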
The processing time ranged from 3.88 seconds for an image of 256 × 256 pixels to 3 minutes and 10 seconds for an image of 4624 × 3468 pixels. The time variation resulted from the difference in the sizes of the processed images and the properties of the hardware used (a computer with an Intel Core i5 processor, 8 gigabytes of RAM and a 500-gigabyte hard drive). The process that consumed the most computational resources was the wavelet scattering transform.
The integration of the wavelet scattering transform and Hu moment invariants provided a comprehensive representation of the crack and pothole features, capturing both local and global characteristics. This feature representation, combined with the learning capabilities of neural networks, enables the accurate classification of different types of pavement damage.
The classification performances of different neural network configurations for pavement deterioration are compared in this study. The neural network used in this study had a 12-14-3 configuration and used a backpropagation learning algorithm. The overall classifier accuracy and precision were 95.56% and 94.44%, respectively, although there were some instances of confusion between potholes and alligator cracks. Previous studies on pavement crack classification have also utilized neural networks with different configurations. For example, in the study by 17, a neural network with a 2-10-2 configuration achieved an accuracy of 92.5% for classifying alligator and longitudinal cracks using a dataset of 400 images that were first preprocessed by background correction and segmented with the Otsu method.
Another study 15 used a neural network with a 3-8-3 configuration to classify 600 images of longitudinal cracks, transverse cracks, and potholes with an accuracy of 97.5%. Similarly, in 16, a classifier accuracy of 84.25% was obtained for the classification of longitudinal, transverse, and alligator cracks and no deterioration using 200 images of 150 × 150 pixels.
Additionally, in 14, an ANN with a backpropagation learning algorithm and a 2-13-1 configuration produced a classifier accuracy of 98.81% for cracked and uncracked images. In another study 33, a classifier accuracy of 79.5% was obtained for crack detection in concrete. In 34, an algorithm was used to classify images of concrete with wide and narrow cracks and no cracks. Two types of concrete were considered: one type had been exposed for 10 years in Ottawa and the second type was stored in the GRAI laboratory. Image processing resulted in a classifier accuracy from 71.4% to 76.5% for the exposed concrete and from 68.7% to 76.9% for the concrete in the laboratory. Table 4 presents the classification results obtained in different studies. These findings emphasize the importance of considering different methodologies and configurations when approaching pavement crack classification.
Study | Longitudinal cracks (%) | Transverse cracks (%) | Block cracks (%) | Alligator cracks (%) | Potholes (%) | No cracks (%) | Total accuracy (%) |
---|---|---|---|---|---|---|---|
17 | 87.5 | - | - | 97.5 | - | - | 92.5 |
15 | 97.5 | 100.0 | - | - | 95.0 | - | 97.5 |
16 | 89.5 | 82.0 | - | 77.5 | - | 88.0 | 84.2 |
35 | 98.0 | 90.0 | 88.0 | 88.0 | - | 100.0 | 95.2 |
36 | 82.9 | 90.8 | - | 80.3 | - | 85.4 | 84.8 |
37 | 95.0 | 94.2 | 93.5 | - | - | - | 92.7 |
Authors | 100.0 | - | - | 93.3 | 93.3 | - | 95.5 |
Conclusions
A methodology was developed to detect pavement cracks and potholes. Pattern recognition techniques, including the wavelet scattering transform and neural networks, were used to detect deterioration in the form of potholes, alligator cracks and longitudinal cracks. A classifier accuracy of 95.56% and a precision of 94.44% were obtained using a neural network with a 12-14-3 configuration. The classification accuracy was 93.33% for both potholes and alligator cracks and 100% for longitudinal cracks.
The combination of the wavelet scattering transform, Hu moment invariants, and neural networks demonstrates the potential of advanced pattern recognition techniques for automated pavement inspection and maintenance. This study contributes to the advancement of pavement inspection technologies and highlights the importance of leveraging innovative approaches for the efficient and accurate analysis of pavement cracks and potholes.
Despite the relatively small size of the dataset used in this study, promising results were obtained. The employed methodology was effective in detecting and classifying pavement damage. Although a larger dataset would enhance the generalizability of the findings, the obtained results provide a solid foundation for further exploration and validation. Future studies can build on these initial findings and consider expanding the dataset to strengthen the robustness and reliability of the proposed approach.
In future studies, different sensors could be used to obtain data to analyze other crack properties, such as depth. Analyses of other types of deterioration, such as transverse and block cracks, are recommended. It is also recommended that other pattern recognition techniques be used for detecting deterioration in the presence of other objects. In addition, the potential benefits of ensemble models should be investigated in the context of pavement damage classification. The use of ensemble models can potentially enhance the classification accuracy, robustness, and generalization capabilities of the system, making it more effective in real-world scenarios.