Object Recognition Through Artificial Intelligence Techniques

Ramírez-Arias Ph. D., José-Luis; Rubiano-Fonseca Ph. D., Astrid; Jiménez-Moreno Ph. D., Robinson

doi:10.19053/01211129.v29.n54.2020.10734

Revista Facultad de Ingeniería

Print version ISSN 0121-1129; On-line version ISSN 2357-5328

Rev. Fac. ing. vol.29 no.54 Tunja Jan./Mar. 2020 Epub Feb 01, 2020

Papers

Reconocimiento de objetos a través de técnicas de inteligencia artificial

Reconhecimento de objetos através de técnicas de inteligência artificial

José-Luis Ramírez-Arias Ph. D.¹
http://orcid.org/0000-0002-7126-5378

Astrid Rubiano-Fonseca Ph. D.²
http://orcid.org/0000-0002-8894-7121

Robinson Jiménez-Moreno Ph. D.³
http://orcid.org/0000-0002-4812-3734

¹Ph. D. Universidad Militar Nueva Granada (Cajicá-Cundinamarca, Colombia).

²Ph. D. Universidad Militar Nueva Granada (Cajicá-Cundinamarca, Colombia).

³Ph. D. Universidad Militar Nueva Granada (Cajicá-Cundinamarca, Colombia).

Abstract

This paper describes a methodology for recognizing objects categorized as polyhedral and non-polyhedral. Recognition is achieved through digital image processing combined with artificial intelligence algorithms, specifically Hopfield networks. The procedure consists of processing images to obtain the patterns used to train the system. The processing is carried out in three stages: i) segmentation, ii) intelligent recognition, and iii) feature extraction; the resulting object images are used to train the designed neural network. Finally, the Hopfield network determines the object type when a new image is presented to it. The proposed methodology was evaluated in a real environment with a considerable number of images; the uncertainty in recognizing noisy images was 2.6%, an acceptable result considering variable light, shape, and color. The results of this experiment show a high recognition level of 97.4%. Based on this procedure, new patterns can be trained, and the model is expected to recognize them. Potentially, the proposed methodology could be used in a wide range of applications, such as object identification in industrial environments, grasping objects with manipulators or robotic arms, and aids for visually impaired people, among others.

Keywords Hopfield network; morphological operations; neural networks; object recognition from 2D images

Resumen

En el presente artículo se describe una metodología para el reconocimiento de objetos, los cuales se han clasificado en poliedros y no poliedros, este reconocimiento se logra mediante procesamiento digital de imágenes combinada con el uso de algoritmos de inteligencia artificial, como son las redes neuronales de Hopfield. En una primera etapa se procesa las imágenes, con el fin de obtener los patrones a entrenar, dicho proceso fue desarrollado en tres etapas: i.) Segmentación, ii.) Reconocimiento inteligente, iii.) Extracción de características, a partir de los resultados obtenidos, en este caso imágenes de los objetos, estos elementos se entrenan en la red neuronal diseñada, finalmente se hace uso de la red neuronal de Hopfied propuesta, la cual, al recibir un nuevo elemento o imagen de un objeto, determinará el tipo de objeto. La metodología propuesta fue evaluada en un ambiente real, mostrando un amplio número de imágenes detectadas, la incertidumbre al reconocer imágenes ruidosas, representa el 2,6% de la muestra, ofreciendo una respuesta aceptable frente a condiciones de luz, forma y color variables, los resultados obtenidos a partir del experimento evidencian un grado alto de reconocimiento del 97.4%, consecuentemente, a partir de este procedimiento es posible entrenar nuevos patrones con nuevas formas, y se espera que este modelo de reconocimiento sea capaz de reconocer patrones completamente nuevos. La metodología propuesta potencialmente puede ser utilizada en diferentes aplicaciones, como es la identificación de objetos en procesos industriales, funciones de agarre de objetos mediante el uso de manipuladores o brazos robóticos, en el área de la rehabilitación como ayuda a personas con limitaciones visuales, entre otras.

Palabras clave operaciones morfológicas; reconocimiento de imágenes en 2D; red de Hopfield; redes neuronales

Resumo

No presente artigo descreve-se uma metodologia para o reconhecimento de objetos, os quais se tem classificado em poliedros e não poliedros, este reconhecimento logra-se mediante processamento digital de imagens combinada com o uso de algoritmos de inteligência artificial, como são as redes neuronais de Hopfield. Em uma primeira etapa processam-se as imagens, com o fim de obter os padrões para treinar, dito processo foi desenvolvido em três etapas: i.) Segmentação, ii.) Reconhecimento inteligente, iii.) Extração de características, a partir dos resultados obtidos, neste caso imagens dos objetos, estes elementos treinam-se na rede neuronal desenhada, finalmente faz-se uso da rede neuronal de Hopfied proposta, a qual, ao receber um novo elemento ou imagem de um objeto, determinará o tipo de objeto. A metodologia proposta foi avaliada em um ambiente real, mostrando um amplo número de imagens detectadas; a incerteza ao reconhecer imagens ruidosas, representa 2,6% da amostra, oferecendo uma resposta aceitável frente a condições de luz, forma e cor variáveis, os resultados obtidos a partir do experimento evidenciam um grau alto de reconhecimento de 97.4%, consequentemente, a partir deste procedimento é possível treinar novos padrões com novas formas, e espera-se que este modelo de reconhecimento seja capaz de reconhecer padrões completamente novos. A metodologia proposta potencialmente pode ser utilizada em diferentes aplicações, como é a identificação de objetos em processos industriais, funções de agarre de objetos mediante o uso de manipuladores ou braços robóticos, na área da reabilitação como ajuda a pessoas com limitações visuais, entre outras.

Palavras-chave operações morfológicas; reconhecimento de imagens em 2D; rede de Hopfield; redes neuronais

I. INTRODUCTION

Artificial vision is a relevant research field with multiple applications, such as robotics, industrial processes, and devices for people with visual disabilities, among others. It belongs to the field of artificial intelligence and uses different algorithms, techniques, and methods to process the information contained in digital images. In particular, multiple algorithms have been proposed to solve object recognition [¹]; they involve morphological operations that adapt the captured images, which vary widely due to: i.) light conditions, ii.) the device used to capture the image, and iii.) the object itself, among many other factors. Additionally, the proposed methodology involves artificial intelligence algorithms, particularly Hopfield neural networks [²,³,⁴,⁵,⁶,⁷,⁸,⁹,¹⁰].

It therefore becomes necessary to propose an artificial vision algorithm that recognizes a collection of objects, both polyhedral and non-polyhedral, under variable luminosity conditions. The proposed procedure involves two main phases: i.) processing of the captured image, and ii.) application of the neural network. In the first phase, images are reduced and adjusted through the following procedure: i.) segmentation, ii.) intelligent recognition, and iii.) feature extraction. The second phase uses a Hopfield neural network: it is trained with patterns, and when new images are presented, the network automatically and autonomously recognizes the objects. The article is organized as follows: first, the methods and materials are introduced, detailing the digital image processing and the artificial intelligence model. Then, the results section presents the experimental procedure used to validate the model and test the artificial intelligence system. Finally, the discussion and conclusions are presented.

II. METHODS AND MATERIALS

Conventionally, the recognition of objects in a scene is achieved by digital image processing [¹¹], whose purpose is directly related to artificial vision or computer vision. Artificial vision aims to detect, segment, locate, and recognize particular objects. Thus, the following methodology is proposed: (a) segmentation, (b) intelligent recognition, and (c) feature extraction.

A. Segmentation

The proposed image processing begins by converting the image to grayscale, which yields a lighter image format. Then, the edges of the image are detected by the derivative method, after which the image is dilated and eroded to close the detected edges. Finally, the borders are filled, producing a mask that identifies the position of the object inside the image. Each step is described in detail below:

1) Greyscale Transformation. It consists of determining the luminance equivalent of each pixel. Luminance is defined here as the light received on a surface, expressed as the ratio of the luminous flux to the illuminated area [¹²]; the concept is associated with the human eye's perception of different light intensities [¹³]. The luminance is calculated as the weighted average of the color components of each pixel, as shown in Equation 1, where L corresponds to the luminance, R is the red component, G the green component, and B the blue component.

(1)

Equation 1 is applied to each pixel to compute its grayscale value; this step is required for image edge detection.
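The weighted average described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the coefficients of Equation 1 are not reproduced in this text, so the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are assumed here, and the function name `to_grayscale` is hypothetical.

```python
# Assumed weights: ITU-R BT.601 luma coefficients (an assumption, not taken from the paper).
W_R, W_G, W_B = 0.299, 0.587, 0.114


def to_grayscale(rgb_image):
    """Convert a nested list of (R, G, B) pixels into a nested list of luminance values."""
    return [[W_R * r + W_G * g + W_B * b for (r, g, b) in row] for row in rgb_image]


# A 2x2 test image: pure red, pure green, pure blue, and white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(image)
```

Because the assumed weights sum to one, a white pixel keeps its full intensity, while green contributes more to the luminance than red or blue.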

2) Image Edge Detection. The image obtained in the previous step can be represented as a discrete 2D function of the pixel coordinates m and n. The value of the function at a specific point is known as the brightness or pixel intensity. An edge is defined as a change in tone between pixels; where the change exceeds a threshold value, it is considered an edge. Different methods have been proposed to identify edges. One of them computes the intensity gradient of each pixel using a convolution mask, then calculates its magnitude, and finally applies a threshold [¹⁴].

The most widely used edge detection techniques employ local operators [¹⁵] based on discrete approximations of the first and second derivatives of grayscale images. The operator used here, which is based on the first derivative of the image, is described below.

3) First Derivative Operators. The derivative of a continuous signal provides the local variations with respect to a coordinate, so the obtained values are higher when variations are faster. In the case of a two-dimensional function 𝑓(𝑥, 𝑦), the derivative is a vector pointing in the direction of the maximum variation of 𝑓(𝑥, 𝑦), whose modulus is proportional to that variation. This vector is called the gradient [¹⁶] and is defined as

$$\nabla f(x, y) = \left[ \frac{\partial f}{\partial x}, \; \frac{\partial f}{\partial y} \right]^{T}$$

its magnitude is defined as

$$|\nabla f(x, y)| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^{2} + \left(\frac{\partial f}{\partial y}\right)^{2}}$$

and its direction is given by

$$\theta(x, y) = \arctan\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right)$$
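A gradient-based edge detector of the kind described above can be sketched as follows. The central-difference approximation, the threshold value, and the name `gradient_edges` are assumptions chosen for illustration; the text does not specify which convolution mask was used.

```python
import math


def gradient_edges(gray, threshold):
    """Return a binary edge map: 1 where the gradient magnitude exceeds the threshold."""
    rows, cols = len(gray), len(gray[0])
    edges = [[0] * cols for _ in range(rows)]
    # Border pixels are skipped because central differences need both neighbours.
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            gx = (gray[m][n + 1] - gray[m][n - 1]) / 2.0  # horizontal derivative
            gy = (gray[m + 1][n] - gray[m - 1][n]) / 2.0  # vertical derivative
            if math.hypot(gx, gy) > threshold:
                edges[m][n] = 1
    return edges


# A dark left half and a bright right half: the edge is the vertical boundary.
gray = [[0, 0, 255, 255] for _ in range(4)]
edges = gradient_edges(gray, threshold=50)
```

In this toy image the two interior columns on either side of the boundary are marked as edges, while the untouched border rows stay at zero.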

4) Morphological Operations. Morphology studies the contour and shape of objects by applying a set of mathematical operations to images [¹⁷]. Sets represent the shapes of the objects contained in an image; a geometric structure is extracted from a set by probing it with a known shape, called the structuring element. Each pixel is then compared with its neighboring pixels [¹⁸]. The most commonly used morphological operations are dilation and erosion, which are described below.

5) Dilation. Dilation increases the size of the objects and reduces the size of the background; the result depends on the structuring element applied. For example, to dilate the image shown in Fig. 1a, a red cross (+) was used as the structuring element. The structuring element is matched against each element of the input image, as shown in Fig. 1b and Fig. 1c; the dilated pixels appear in pink in Fig. 1d.

Fig. 1 Dilation example. (a) Original image, (b) First step, (c) Intermediate step, (d) Dilated image.
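Dilation with a cross-shaped structuring element, as in the example of Fig. 1, can be sketched on a binary mask as follows. The 3x3 cross size and the names `CROSS` and `dilate` are assumptions for illustration.

```python
# Offsets (row, column) of a 3x3 cross-shaped structuring element (assumed size).
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def dilate(mask, offsets=CROSS):
    """Set every pixel covered by the structuring element centred on a foreground pixel."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            if mask[m][n]:
                for dm, dn in offsets:
                    mm, nn = m + dm, n + dn
                    if 0 <= mm < rows and 0 <= nn < cols:
                        out[mm][nn] = 1
    return out


mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
dilated = dilate(mask)
```

A single foreground pixel grows into the shape of the structuring element itself, which is why the choice of element controls the result.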

6) Erosion. Erosion is a morphological operation that functions in a way similar to dilation but obtaining an inverse result. This process reduces the size of objects by extending the limits of the background and eliminating small objects.
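Erosion can be sketched in the same style. This is a minimal illustration under two assumptions: a 3x3 cross-shaped structuring element, and pixels outside the image treated as background; the names `CROSS` and `erode` are hypothetical.

```python
# Offsets (row, column) of a 3x3 cross-shaped structuring element (assumed size).
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def erode(mask, offsets=CROSS):
    """Keep a pixel only if the whole structuring element fits inside the foreground."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            # Positions outside the image count as background, so they fail the test.
            out[m][n] = int(all(
                0 <= m + dm < rows and 0 <= n + dn < cols and mask[m + dm][n + dn]
                for dm, dn in offsets
            ))
    return out


mask = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
eroded = erode(mask)
```

Applied to a cross-shaped object, erosion with the same cross leaves only the centre pixel, the inverse of what dilation does to a single pixel.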

7) Filling Edges. This step completes rough or broken edges. Edges are defined as the transition between two regions with considerably different grey levels; from them, the boundary or contour of an object can be determined.

Fig. 2 summarizes the image segmentation process. This methodology identifies the position of the object within the image; however, as observed in the right-hand image, the segmentation result is not conclusive, and the edge of the object cannot yet be identified, since the system is highly sensitive to errors. For this reason, the use of artificial intelligence becomes necessary.
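One common way to turn a closed contour into a filled mask is to flood-fill the background from the image border and mark everything that was not reached as object. The sketch below assumes that approach and 4-connectivity; the text does not state which filling algorithm was used, and `fill_holes` is a hypothetical name.

```python
from collections import deque


def fill_holes(edges):
    """Fill the interior of closed contours in a binary edge map."""
    rows, cols = len(edges), len(edges[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every background pixel lying on the image border.
    for m in range(rows):
        for n in range(cols):
            on_border = m in (0, rows - 1) or n in (0, cols - 1)
            if on_border and not edges[m][n]:
                outside[m][n] = True
                queue.append((m, n))
    # Breadth-first flood fill over 4-connected background pixels.
    while queue:
        m, n = queue.popleft()
        for dm, dn in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            mm, nn = m + dm, n + dn
            if 0 <= mm < rows and 0 <= nn < cols:
                if not edges[mm][nn] and not outside[mm][nn]:
                    outside[mm][nn] = True
                    queue.append((mm, nn))
    # Anything not reachable from the border is either an edge or an enclosed hole.
    return [[0 if outside[m][n] else 1 for n in range(cols)] for m in range(rows)]


edges = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
mask = fill_holes(edges)
```

If the contour has a gap, the flood fill leaks into the interior and nothing is filled, which is why the edges are closed by dilation and erosion beforehand.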

Fig. 2 Image segmentation.

B. Artificial Intelligence for Object Recognition

The image is reconstructed using a Hopfield neural network; the network eliminates noise in the edge image, improving accuracy and precision and yielding a properly segmented image.

Artificial intelligence is known as "the science and engineering of making intelligent machines, especially intelligent computer programs"; this definition was proposed by Professor John McCarthy in 1956 [¹⁹]. The goal of artificial intelligence is to think, evaluate, or act according to certain inputs in order to perform a specific function. To achieve this, different techniques can be used: i.) genetic algorithms, ii.) artificial neural networks, and iii.) formal logic.

For this specific problem, artificial neural networks are employed. They use information-processing elements whose local interactions determine the overall behavior of the system [²⁰]. Networks consist of a large number of simple processing elements called nodes or neurons, arranged in layers [²¹]. Each neuron is connected to other neurons by communication links, each with an associated weight. The weights represent the information used by the neural network to solve a given problem [²²]. There are various types of neural networks, such as self-organizing and recurrent networks, among others.

The Hopfield network is a recurrent network with probabilistic adaptation and associative memory; it learns to reconstruct the input patterns memorized during training [²³], even when patterns are presented incompletely or with noise. Hopfield networks are also characterized by the fact that each neuron can be updated an indefinite number of times, independently of the rest of the neurons, and that the network is interconnected in parallel. Figure 3 shows the structure of the Hopfield network, where N1, N2, and N3 correspond to neurons, X to the inputs, D to the distribution node, F is the activation function, 𝑙_𝑖 is the input bias, 𝑊_𝑖𝑗 are the weights, and 𝑌_𝑖 is the output.

Fig. 3 Hopfield network structure.

The model consists of a single-layer network with N neurons, with analog inputs and outputs, using neurons with activation function F (Fig. 4). The behavior of the neural network is described by equations 2 and 3.

Fig. 4 Activation function.

(2)

(3)

The Hopfield network employs the basic structure of individual Adaline perceptron-type neurons. However, it departs from the usual neural network designs in its feedback structure. Note that a binary Hopfield network of n neurons can be considered a system of 2ⁿ states; for two neurons this gives 2² states, with outputs belonging to the set of four states {00, 01, 10, 11}. When the network receives an input vector, it will stabilize in one of the above states, and the output state will be determined by the network weight configuration.

For the implementation of the Hopfield network, it is necessary to propose a weight model based on auto-associative memory [²⁴]. Such a memory presupposes that the input must approach a value stored in memory, in such a way that an arbitrary input closer to a stored pattern x than to any other stored pattern is associated with x, obtaining equation 4.

(4)

In this way, the network weights will be calculated using equation 5.

(5)

A partially incorrect input produces a network output close to the most similar pattern stored in memory, yielding a correct output in the presence of distorted information. This property makes the Hopfield network the most suitable choice for the expected solution (Fig. 5): since the input is a partially incorrect image due to noise, the output of the network will be the corrected image.
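The recovery of a stored pattern from a noisy input can be illustrated with a small discrete Hopfield network. This is a sketch under stated assumptions, not the authors' model: equations 2 to 5 are not reproduced in this text, so the classic Hebbian outer-product rule with a zero diagonal, bipolar (+1/-1) patterns, a sign activation, and asynchronous updates are assumed. The names `train_hopfield` and `recall` are hypothetical.

```python
def train_hopfield(patterns):
    """Build the weight matrix with the Hebbian outer-product rule (assumed), zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_sweeps=10):
    """Update neurons one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            activation = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if activation >= 0 else -1  # sign activation (assumed)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two orthogonal bipolar patterns stored in an 8-neuron network.
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([pattern_a, pattern_b])

# pattern_a with its first element flipped plays the role of a noisy image.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(weights, noisy)
```

With these two stored patterns, the corrupted input settles back to `pattern_a`, and both stored patterns are fixed points of the update rule.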

Fig. 5 Artificial intelligence system procedure.

C. Feature Extraction

The feature extraction phase consists of dilating and eroding the image produced by the Hopfield network, generating a new, improved mask that, when applied to the original image, segments the image and extracts the object to be analyzed (Fig. 6).

Fig. 6 Feature extraction procedure.
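Applying the final mask to the original image can be sketched as a pixel-wise selection. The background value of 0 and the name `apply_mask` are assumptions for illustration.

```python
def apply_mask(gray, mask, background=0):
    """Keep the pixels where the mask is set and replace the rest with the background value."""
    return [
        [pixel if keep else background for pixel, keep in zip(gray_row, mask_row)]
        for gray_row, mask_row in zip(gray, mask)
    ]


gray = [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]]
mask = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
segmented = apply_mask(gray, mask)
```

Only the pixels under the mask survive, so the quality of the extracted object depends directly on the mask produced by the previous stages.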

III. RESULTS

The proposed methodology was validated through an experiment that evaluates its effectiveness. The materials are described below, followed by the results obtained.

A. Materials

1) Capture Device. A Sony Handycam® camera (reference DCR-HC52) with a Carl Zeiss® lens and 40x optical zoom was selected because it delivers 720 x 480 images, a size that allows the image to be processed quickly without loss of information. The selected camera also has self-calibration, which improves focus. Additionally, the camera is fixed on a tripod with a tilt of 33.75°.

2) Working Space. The working space consists of boxes 47 cm wide, 40 cm high, and 38 cm deep, lined internally with red, green, and blue paper because these are the basic colors (Figure 7). Three 3 x 4 arrays of white LEDs were placed inside each box, centered on the top, right, and left faces. The LEDs were powered at different voltages: 12 V, 9 V, 7.5 V, 6 V, and 4.5 V.

Fig. 7. Box with LED arrays.

3) Objects. Two types of objects, in small and large sizes, were selected: i.) Polyhedral: prismatic (Fig. 8), hexahedron, and orthohedron; and ii.) Non-polyhedral (Fig. 9).

Fig. 8. Polyhedral objects.

Fig. 9. Non-polyhedral objects.

B. Experimental Configuration

The camera was placed at an angle of 33.75° (see Fig. 10).

Fig. 10. Experiment.

C. Samples

A total of 150 images were taken, corresponding to prismatic, hexahedron and octahedron, cylindrical, and spherical objects. Five different photographs were taken, varying the intensity of light through the different voltages; as the voltage decreases, the images get darker and, as a consequence, noise becomes more significant. Samples of the photographs collected for prismatic objects are presented in Table 1.

Table 1 Images of polyhedral objects

IV. DISCUSSION

A. Image Segmentation

The segmentation procedure was carried out on the 150 images (see Figure 11, where errors are highlighted in red); only 4 cases were not recognized. Therefore, the uncertainty in noisy image recognition is 2.6%. It is important to note that the errors occurred under the least favorable conditions, when the light intensity was low. The mathematical model is therefore considered to have worked properly, offering an acceptable response to variable light conditions, shape, and color.

Fig. 11. Segmentation results, mistakes are highlighted in red.

B. Artificial Intelligence

Since the goal of the neural network is to reconstruct an unknown pattern based on information stored in memory, a network with associative memory is required. The concept is quite intuitive: it associates two patterns, the input pattern and one stored in memory, as explained above.

Segmented images measure 720x480 pixels, so the input vectors would measure 1x345600 and the weight matrix 345600x345600; these dimensions are too large to be processed by a conventional computer. This is why, before being processed by the neural network, the image was dilated and its size was reduced by a factor of 10. The patterns used to train the network are shown in Figure 12.

Fig. 12. Trained patterns.

The patterns shown have dimensions of 72x48 pixels, so the W weight matrix has a dimension of 3456x3456, so it is possible to process the network on a conventional computer.
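The effect of the size reduction on the weight matrix can be checked with a short calculation. The 8 bytes per weight (double precision) used for the memory estimate is an assumption, as is the helper name `hopfield_size`.

```python
def hopfield_size(width, height, bytes_per_weight=8):
    """Return (neurons, weights, bytes) for a fully connected network with one neuron per pixel."""
    neurons = width * height
    weights = neurons * neurons
    return neurons, weights, weights * bytes_per_weight


full = hopfield_size(720, 480)     # original segmented image
reduced = hopfield_size(72, 48)    # image reduced by a factor of 10 per side

print(full)     # neurons, weights, and estimated bytes for 720x480
print(reduced)  # neurons, weights, and estimated bytes for 72x48
```

Reducing each side by a factor of 10 cuts the number of neurons by 100 and the number of weights by 10,000: under the assumed 8 bytes per weight, roughly 955 GB for the full-size matrix versus about 96 MB for the reduced one.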

C. Results

The effectiveness tests on the neural network show that the network reconstructed the expected pattern. Figure 13 shows the confusion matrix of the neural network, which demonstrates its classification capacity. Consequently, the only recognition errors come from the 2.6% uncertainty found in the segmentation stage. Therefore, the complete methodology shows an overall efficiency of 97.4%.
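The reported percentages can be related to the counts given in the segmentation discussion (4 unrecognized images out of 150). The following is only a worked check of that arithmetic; reading the 2.6% figure as a truncation of the exact ratio is an assumption.

```latex
\frac{4}{150} \approx 0.0267 \quad (\text{reported as } 2.6\%), \qquad 100\% - 2.6\% = 97.4\%
```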

Fig. 13 Confusion matrix.

V. CONCLUSIONS

The proposed methodology proved able to identify the basic shapes of objects from images; it can be extended to different applications, such as grasping objects with robotic arms and aids for visually impaired people, among others.

Furthermore, the use of hybrid methodologies that combine image processing techniques with artificial intelligence techniques, specifically the Hopfield network, constitutes an essential contribution to the area of artificial vision.

References

[1] S. Todorovic, and N. Ahuja, "Unsupervised Category Modeling, Recognition, and Segmentation in Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 2158-2174, 2008. https://doi.org/10.1109/TPAMI.2008.24

[2] S. Chartier, and R. Lepage, "Learning and extracting edges from images by a modified Hopfield neural network," Object recognition supported by user interaction for service robots, Quebec, Canada, 2002.

[3] R. Sammouda, and B. B. Youssef, "A comparison of cluster distance metrics for the segmentation of sputum color image using unsupervised hopfield neural network classifier," in Global Summit on Computer & Information Technology, Sousse, Tunisia, 2014. https://doi.org/10.1109/GSCIT.2014.6970130

[4] X. Zhao, Y. Li, and Q. Zhao, "A Fuzzy Clustering Approach for Complex Color Image Segmentation Based on Gaussian Model with Interactions between Color Planes and Mixture Gaussian Model," International Journal of Fuzzy Systems, vol. 20, pp. 309-317, 2017. https://doi.org/10.1007/s40815-017-0411-1

[5] K. Mutter, M. Jafri, and A. Aziz, "Real time object detection using Hopfield neural network for Arabic printed letter recognition," in 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA), 2010. https://doi.org/10.1109/isspa.2010.5605416

[6] C. M. Bishop, Pattern recognition and machine learning, New York: Springer, 2006.

[7] S. S. Young, P. D. Scott, and N. M. Nasrabadi, "Object recognition using multilayer Hopfield neural network," IEEE Transactions on Image Processing, vol. 6 (3), pp. 357-372, 1997. https://doi.org/10.1109/83.557336

[8] Z. Hongbin, N. Hideki, and N. Tadashi, "Recognizing 3D objects by using a Hopfield-style optimization algorithm for matching patch-based descriptions," Pattern Recognition, Pergamon, vol. 31, pp. 727-741, 1998. https://doi.org/10.1016/s0031-3203(97)00105-2

[9] H. Zha, H. Nanamegi, and T. Nagata, "3-D object recognition from range images by using a model-based Hopfield-style matching algorithm," in 13th International Conference on Pattern Recognition, Vienna, Austria, 1996. https://doi.org/10.1109/ICPR.1996.547244

[10] N. M. Nasrabadi, and W. Li, "Object recognition by a Hopfield neural network," IEEE Transactions on Systems, Man, and Cybernetics, vol. 21 (6), pp. 1523-1535, 1991. https://doi.org/10.1109/21.135694

[11] E. García, "Detección y clasificación de objetos dentro de un salón de clases empleando técnicas de procesamiento digital de imágenes," México, 2008. http://newton.azc.uam.mx/mcc/01_esp/11_tesis/tesis/terminada/080513_garcia_santillan_elias.pdf

[12] P. Fiorentin, and A. Scroccaro, "Comparison of luminance measurement based on illuminance and luminance detectors," in Conference IEEE Instrumentation and Measurement Technology, Singapore, 2009. https://doi.org/10.1109/IMTC.2009.5168561

[13] B. Escalante, "Procesamiento digital de imágenes," Apuntes de curso, Distrito Federal, México, 2006. http://lapi.fi-p.unam.mx/wp-content/uploads/PDI_Cap1_Introduccion.pdf

[14] E. Lucer, and H. Saldana, "Utilización de técnicas de visión artificial para la detección automática de defectos externos del mango," Grade Thesis, Universidad Señor de Sipán, Chiclayo, Perú, 2016.

[15] A. Martínez, "Técnicas de segmentación de imágenes, reconstrucción y descomposición de mallas enfocadas y aplicaciones médicas," Doctoral Thesis, Universidad de Jaén, Spain, 2013. http://ruja.ujaen.es/bitstream/10953/524/1/9788484390398.pdf

[16] E. R. Davies, Machine Vision, Theory, Algorithms, Practicalities, Elsevier, 2004.

[17] Mathworks, "MathWorks-Makers of MATLAB and Simulink," 2020. https://www.mathworks.com/

[18] S. Ortiz, and C. Lemus, "Diseño de un modelo basado en técnicas de inteligencia artificial para el desarrollo de un sistema inteligente orientado al aprendizaje," Grade Thesis, Escuela Especializada en Ingeniería ITCA, Santa Tecla, El Salvador, 2011. https://www.itca.edu.sv/wp-content/themes/elaniin-itca/docs/2011-Diseno-de-un-modelo-basado-en-tecnicas-de-inteligencia.pdf

[19] J. R. Hilera, and V. J. Martínez, Redes neuronales artificiales: Fundamentos, modelos y aplicaciones, RA-MA, 2009.

[20] D. J. Matich, Redes Neuronales: Conceptos Básicos y Aplicaciones, Universidad Tecnológica Nacional, Argentina, 2001.

[21] M. J. Palmer, and J. J. Montaño, "¿Qué son las redes neuronales artificiales? Aplicaciones realizadas en el ámbito de las adicciones," Adicciones, vol. 11, pp. 243-255, 1999.

[22] Electronica.com.mx, "Hopfield," 2020. http://electronica.com.mx/neural/informacion/hopfield.html

[23] L. Wang, "Effects of noise in training patterns on the memory capacity of the fully connected binary Hopfield neural network: mean-field theory and simulations," IEEE Transactions on Neural Networks, vol. 9 (4), pp. 697-704, 1998. https://doi.org/10.1109/72.701182

Received: February 18, 2020; Accepted: March 24, 2020

Competing interests: The authors have declared that no competing interests exist.

AUTHOR’S CONTRIBUTION: Ramirez and Rubiano developed the methodology, the collection of patterns, the experimentation and the results obtained. Jimenez contributed to the development of the artificial intelligence system and the analysis of the results.

This work is licensed under a Creative Commons Attribution 4.0 International License.