Point cloud saliency detection via local sparse coding

Leal, Esmeide; Sanchez-Torres, German; Branch-Bedoya, John William

doi:10.15446/dyna.v86n209.75958


DYNA

Print version ISSN 0012-7353

Dyna rev.fac.nac.minas vol.86 no.209 Medellín Apr./June 2019


Esmeide Leal^a

German Sanchez-Torres^b

John William Branch-Bedoya^c

^a Facultad de Ingenierías, Universidad Autónoma de Colombia, Barranquilla, Colombia. esleal@uac.edu.co

^b Facultad de Ingenierías, Universidad del Magdalena, Santa Marta, Colombia. gsanchez@unimagdalena.edu.co

^c Facultad de Minas, Universidad Nacional de Colombia, Medellín, Colombia. jwbranch@unal.edu.co

Abstract

The human visual system (HVS) can process large quantities of visual information instantly. Visual saliency perception is the process of locating and identifying regions that stand out from a visual standpoint. Mesh saliency detection has been studied extensively in recent years, but few studies have focused on saliency detection in 3D point clouds. The estimation of visual saliency is important for computer graphics tasks such as simplification, segmentation, shape matching and resizing. In this paper, we present a method for detecting saliency directly on unorganized point clouds. First, our method computes a set of overlapping neighborhoods and estimates a descriptor vector for each point inside each neighborhood. Then, the descriptor vectors are used as a natural dictionary for a sparse coding process. Finally, we estimate a saliency map of the point neighborhoods based on the Minimum Description Length (MDL) principle. Experimental results show that the proposed method achieves results comparable to those of previous methods in the literature and in some cases improves on them. It captures the geometry of the point clouds without using any topological information and achieves an acceptable performance. The effectiveness and robustness of our approach are demonstrated through comparisons with previous methods.

Keywords: point clouds; sparse coding; saliency; minimum description length


1. Introduction

The human visual system (HVS) can process large amounts of visual information instantly and is able to locate objects of interest and distinguish them from complex background scenes. Research shows that the HVS pays more attention to infrequent features and suppresses repetitive ones [¹-²]. Visual saliency plays an important role in the process by which the HVS identifies scenes and detects objects, and it is closely related to the way the biological system perceives the environment. For example, every time we look at a specific place, we pay more attention to particular regions that are distinct from their surroundings.

Visual saliency is an active research field in areas such as psychology [²], neuroscience [³], computer vision [⁴-⁵] and computer graphics [⁶-⁷]. There are many computational methods for simulating the HVS and visual saliency, but the field remains largely open because designing algorithms that simulate this process is difficult [⁶]. In computer graphics, the concept of visual saliency has been widely explored for meshes [⁶], [⁸-¹⁸], but few studies have explored visual saliency in point clouds [¹⁹-²²]. Visual saliency is an important topic in the study of 3D surfaces and has important applications in 3D geometry processing such as resizing [¹], simplification [²³-²⁴], smoothing [²⁵], segmentation [²⁶], shape matching and retrieval [²⁷-²⁸], 3D printing [²⁹], and so forth.

Owing to advances in 3D scanning technology, current scanners generate thousands of points for every scanned object, and point clouds have therefore become an alternative to triangular meshes for rendering. A common way to process a point cloud is to reconstruct the surface using representations such as triangular meshes, NURBS and Radial Basis Functions. However, because of the large number of points, varying sampling densities and the inherent noise produced by the scanning process, reconstruction is an expensive and challenging computational task. For these reasons, it is necessary to develop geometry processing algorithms that operate directly on the point sets. Applying existing saliency detection techniques to point clouds is not trivial, because point clouds lack the topological information that mesh-based methods rely on. The method we propose here is inspired by Li et al. [³⁰], and we generalize it using the Minimum Description Length (MDL) principle, which serves as the criterion for distinguishing salient regions in the point cloud.

1.1. Contribution

The main contributions of this paper are: (1) a method for saliency detection without topological information using only the raw point sets, (2) the use of the MDL principle for defining saliency measurements, extended to sparse coding representation to obtain saliency maps of point clouds.

Experimental results show that the proposed algorithm performs favorably in the capture of the geometry of the point clouds without using any topological information and achieves an acceptable performance when compared to previous approaches.

1.2. Organization

This paper is organized as follows. A review of related work on saliency detection is provided in Section 2. The theoretical basis for Sparse Coding and MDL for the proposed Sparse Coding Saliency method is presented in Section 3. Section 4 lays out the results and discussion. Finally, we present the conclusions and future studies in Section 5.

2. Related work

Visual saliency detection has its origin in the area of computer vision, specifically in 2D images. Inspired by this research, visual saliency detection has been applied successfully to 3D meshes and point clouds. In recent years, much research has tried to develop methods for visual saliency on 3D surfaces [¹⁰-¹²,¹⁷,¹⁹-²¹,³²].

Early advances in the field of mesh saliency used 2D projections for 3D applications. While such works often ignored (or were unable to exploit) the importance of depth in human perception, they laid the groundwork for further research by highlighting the importance of human perception in image analysis [¹⁸]. Lee et al. [³²] were among the first researchers to exploit saliency for 3D mesh processing. The authors introduced the concept of mesh saliency and computed it using a Gaussian-weighted center-surround mechanism. The results were then used to implement mesh simplification and best-viewpoint selection algorithms. Wu et al. [¹³] presented an approach based on the principles of local contrast and global rarity, with applications to mesh smoothing, simplification and sampling. Leifman et al. [³³] proposed a method for detecting regions of interest on surfaces that captures the distinctness of each vertex descriptor, which characterizes the local geometry around the vertex. The method is applied to viewpoint selection and shape similarity. Tao et al. [¹¹] put forward a mesh saliency detection approach that reached state-of-the-art performance even when handling noisy models. First, manifold ranking was employed in a descriptor space to imitate human attention; descriptor vectors were then built for each over-segmented patch on the mesh using Zernike coefficients and center-surround operators. Afterward, background patches were selected as queries to propagate saliency, which helps the method handle noise. Liu et al. [¹⁰] proposed an approach for mesh saliency detection based on a Markov chain: the input mesh is partitioned into segments using the Ncuts algorithm and then over-segmented into patches described by Zernike coefficients. Instead of employing center-surround operators, background patches are selected according to feature variance in order to separate the insignificant regions. Limper et al. [⁸] applied the concept of Shannon entropy, defined as the expected information content of a message, to 3D mesh processing, treating curvature as the primary source of information in the mesh. Song et al. [¹⁸] argued for the benefits of using both local and global criteria for mesh saliency detection. Their method first computes local saliency features and then computes global saliency with a statistical Laplacian-based algorithm that captures salient features at multiple scales.

Recently, point-based saliency detection methods have been introduced. Shtrom et al. [²¹] proposed a multi-level approach to find distinct points, based on the context-aware saliency method for images. Tasse et al. [²⁰] proposed a cluster-based approach that decomposes the point cloud into small clusters using an adaptive fuzzy clustering algorithm; it is applied to keypoint detection.

Finally, Guo et al. [¹⁹] proposed a saliency detection method based on a covariance descriptor that captures local geometric information. A Sigma Set descriptor transforms the covariance descriptor from a Riemannian space to a Euclidean space, which makes it possible to apply Principal Component Analysis to analyze the inner structure and classify each point as salient or not.

3. Proposed method

The saliency detection framework based on sparse coding is shown in Fig. 1. The input point cloud is represented in sparse form, and saliency is estimated according to the center-surround hypothesis [³⁴-³⁵], which states that a region is salient if it is distinct from its surrounding regions. Taking a point cloud as input, we estimate a neighborhood with radius r for each point in the cloud and then select some of its boundary points. We then estimate a neighborhood with radius r around each selected point. Overlap between neighborhoods is allowed in order to capture the structure of the cloud. For each point in a neighborhood, we compute a set of basic features to build a descriptor vector, and we take the mean of these feature vectors to obtain a single descriptor vector per neighborhood. We do the same for the neighborhoods surrounding the central one. Afterward, we construct a dictionary D from the feature vectors of the surrounding neighborhoods and perform sparse coding on the feature vector of the central neighborhood, that is, we search for a sparse representation of it over the dictionary D. Finally, we compute the neighborhood saliency using the MDL principle from two measures: the sparsity of the representation vector x and the residual of the sparse reconstruction. The final saliency map is computed by fusing both measures.

Source: The Authors.

Figure 1. Processing pipeline of our sparse coding-based saliency detection method.

3.1. Sparse coding

The purpose of sparse coding is to approximate an input feature vector as a linear combination of basis vectors, called atoms, selected from a dictionary created from the data. In other words, sparse coding provides a low-dimensional approximation of a given signal over a given basis [³⁶].

Formally, let $\mathbf{y} \in \mathbb{R}^{n}$ be a signal of dimension $n$. Sparse coding aims to find a dictionary $D = [\mathbf{d}_1, \ldots, \mathbf{d}_k] \in \mathbb{R}^{n \times k}$ such that $\mathbf{y}$ can be approximated by a linear combination of the atoms, that is, $\mathbf{y} \approx D\mathbf{x}$ (the dictionary is overcomplete if $k > n$), where most of the coefficients of $\mathbf{x} \in \mathbb{R}^{k}$ are zero or close to zero [³⁷]. The sparse coding problem can typically be formulated as the optimization problem (1):

$$\min_{\mathbf{x}} \; \lVert \mathbf{y} - D\mathbf{x} \rVert_2^2 \quad \text{subject to} \quad \lVert \mathbf{x} \rVert_0 \le L \qquad (1)$$

In this formulation, the dictionary $D$ is given and $L$ controls the sparsity of $\mathbf{x}$ in $D$. The term $\lVert \mathbf{x} \rVert_0$ measures the sparsity of the decomposition and can be understood as the number of non-zero (sparse) coefficients in $\mathbf{x}$ used to approximate the signal $\mathbf{y}$ as sparsely as possible. Alternatively, the problem can be formulated as (2):

$$\min_{\mathbf{x}} \; \tfrac{1}{2} \lVert \mathbf{y} - D\mathbf{x} \rVert_2^2 + \lambda \lVert \mathbf{x} \rVert_1 \qquad (2)$$

In this optimization problem, the $\ell_0$ norm is replaced by the $\ell_1$ norm, and $\lambda$ is the regularization parameter. Solving (1) with the $\ell_0$ norm is NP-hard; fortunately, under certain conditions, the problem can be relaxed using the $\ell_1$ norm, and an approximate solution can be found by solving (2).
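To make the ℓ1-relaxed problem (2) concrete, the following is a minimal sketch of the iterative shrinkage-thresholding algorithm (ISTA), a standard first-order solver for ℓ1-regularized least squares. It is not the LARS solver used later in this paper; the function names `soft_threshold` and `ista_lasso` and the fixed iteration count are illustrative assumptions, and the sketch relies only on NumPy.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: element-wise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista_lasso(D, y, lam, n_iter=500):
    """Approximately solve min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1 (eq. 2)."""
    # Step size 1/L, with L the largest eigenvalue of D^T D, i.e. the squared
    # spectral norm of D (Lipschitz constant of the quadratic term's gradient).
    L = np.linalg.norm(D, ord=2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)  # gradient of 0.5 * ||y - D x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

With an orthonormal dictionary such as the identity, each coefficient is simply the soft-thresholded input, so `ista_lasso(np.eye(3), np.array([2.0, -0.3, 0.5]), 0.4)` gives approximately `[1.6, 0.0, 0.1]`; small entries are set exactly to zero, which is the sparsifying effect of the ℓ1 penalty.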

3.2. Minimum description length

The Minimum Description Length (MDL) principle [⁴][³¹][³⁸] states that, given a class of candidate models, its parameter M and a data sample, the best model is the one that yields the shortest description of the data; MDL thus provides a generic solution to model selection. Formally, given a set of candidate models $\mathcal{M}$ and a data vector $\mathbf{y}$, MDL searches for the model $\hat{M} \in \mathcal{M}$ that describes $\mathbf{y}$ in its entirety with the shortest length:

$$\hat{M} = \arg\min_{M \in \mathcal{M}} \mathcal{L}(\mathbf{y}, M) \qquad (3)$$

where $\mathcal{L}(\cdot)$ is a coding assignment function that gives the codelength required to describe $(\mathbf{y}, M)$ uniquely.

In [³⁹], a method based on information theory was introduced for image saliency detection. This method measures saliency in terms of the likelihood of a patch given the patches that surround it, using the self-information, i.e., the negative log-likelihood. Defining $I$ as an image and $P(\mathbf{p}_i \mid \mathbf{p}_{\mathcal{S}_i})$ as the probability of occurrence of the patch $\mathbf{p}_i$ given its surrounding neighborhood patches $\mathbf{p}_{\mathcal{S}_i}$, the saliency measure of the patch is defined as $S(\mathbf{p}_i) = -\log P(\mathbf{p}_i \mid \mathbf{p}_{\mathcal{S}_i})$; that is, the self-information characterizes the raw likelihood of the $n$-dimensional vector values of $\mathbf{p}_i$.

Based on [⁴,³¹,³⁸], we use the MDL principle to propose a method for saliency detection in point clouds based on sparse coding. The coding assignment function can be defined in terms of a probability assignment based on the ideal Shannon codelength assignment [²⁸], that is, $\mathcal{L}(\mathbf{y}, M) = -\log P(\mathbf{y}, M)$. Using Bayes' theorem, we can write $P(\mathbf{y}, M)$ as $P(\mathbf{y} \mid M)\,P(M)$; then, by applying maximum a posteriori (MAP) estimation, the penalized likelihood form of the coding model is formulated as:

$$\hat{M} = \arg\min_{M \in \mathcal{M}} \left\{ -\log P(\mathbf{y} \mid M) - \log P(M) \right\} \qquad (4)$$

where $-\log P(\mathbf{y} \mid M)$ is the term that describes how well the model fits the data and $-\log P(M)$ describes the model complexity, also called the model cost or prior term.

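Once $\hat{M}$ is estimated, the terms in (4) can be interpreted as follows:

$\mathcal{L}(\hat{M}) = -\log P(\hat{M})$ is the description length, or codelength, of the model; and

$\mathcal{L}(\mathbf{y} \mid \hat{M}) = -\log P(\mathbf{y} \mid \hat{M})$ is the description length, or codelength, of the data encoded using the model.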
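As a toy illustration of the two-part codelength in (4), unrelated to point clouds, the sketch below scores two hypothetical Bernoulli models of a binary sequence by $-\log_2 P(M) - \log_2 P(\mathbf{y} \mid M)$ in bits and selects the shortest. The models, their priors and the data are invented purely for illustration.

```python
import math


def codelength_bits(p):
    """Ideal Shannon codelength: -log2(p) bits for an event of probability p."""
    return -math.log2(p)


def two_part_codelength(data, theta, prior):
    """L(M) + L(y | M) in bits for a Bernoulli model M with P(1) = theta."""
    model_bits = codelength_bits(prior)  # L(M) = -log2 P(M)
    data_bits = sum(codelength_bits(theta if b else 1.0 - theta) for b in data)
    return model_bits + data_bits  # equals -log2 [P(y | M) P(M)]


def mdl_select(data, candidates):
    """Return the theta (dict key) whose model gives the shortest total description."""
    return min(candidates, key=lambda th: two_part_codelength(data, th, candidates[th]))
```

For nine ones and one zero with equal priors, the fair-coin model costs 1 + 10 = 11 bits, while the model with P(1) = 0.9 costs about 1 + 4.69 bits, so MDL selects the latter: the model that captures the regularity of the data yields the shorter description.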

3.3. Relating the MDL principle to sparse coding

A standard Gaussian is assumed for the coding assignment function $P(\mathbf{y} \mid M)$. Assuming that $\mathbf{y}$ can be represented sparsely given a basis dictionary $D$ and a sparse vector $\mathbf{x}$, the reconstruction error $\mathbf{y} - D\mathbf{x}$ follows a Gaussian distribution with known variance $\sigma^2$. The term $-\log P(\mathbf{y} \mid M)$ in (4) then becomes $\frac{1}{2\sigma^2}\lVert \mathbf{y} - D\mathbf{x} \rVert_2^2$ (up to an additive constant), and the term $-\log P(M)$ in (4), under a sparsity constraint, becomes proportional to $\lVert \mathbf{x} \rVert_1$. These expressions correspond to the sparse coding model described in Section 3.1.
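The step from (4) to the sparse coding objective can be written out as follows, with data $\mathbf{y} \in \mathbb{R}^{n}$, dictionary $D$ and sparse vector $\mathbf{x} \in \mathbb{R}^{k}$. This is a sketch under two stated assumptions: i.i.d. Gaussian reconstruction noise with variance $\sigma^2$, and, as one common way to encode the sparsity constraint, a Laplacian prior with rate $\beta$ on each coefficient of $\mathbf{x}$ (the text does not name the prior).

$$
\begin{aligned}
-\log P(\mathbf{y} \mid \mathbf{x}) &= \frac{1}{2\sigma^2}\lVert \mathbf{y} - D\mathbf{x} \rVert_2^2 + \frac{n}{2}\log(2\pi\sigma^2), \\
-\log P(\mathbf{x}) &= \beta \lVert \mathbf{x} \rVert_1 - k \log\frac{\beta}{2}, \\
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \left\{ -\log P(\mathbf{y} \mid \mathbf{x}) - \log P(\mathbf{x}) \right\}
&= \arg\min_{\mathbf{x}} \left\{ \tfrac{1}{2}\lVert \mathbf{y} - D\mathbf{x} \rVert_2^2 + \sigma^2\beta \lVert \mathbf{x} \rVert_1 \right\}.
\end{aligned}
$$

Dropping the constants and multiplying by $\sigma^2 > 0$ does not change the minimizer, so under these assumptions the MAP form of (4) coincides with (2) for $\lambda = \sigma^2\beta$.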

The sparsity condition and the sparse reconstruction error conform to the MDL principle if we set the estimated parameter $M = \mathbf{x}$ and evaluate it with the prior term $-\log P(\mathbf{x})$ and the likelihood term $-\log P(\mathbf{y} \mid \mathbf{x})$, which yield $\lVert \mathbf{x} \rVert_1$ and $\lVert \mathbf{y} - D\mathbf{x} \rVert_2^2$, respectively. We conclude that the sparsity of the vector $\mathbf{x}$ is the codelength of the model and the residual of the sparse reconstruction is the codelength of the data given the model.

The MDL principle selects the model that produces the shortest description of the data: the more regularity the data present, the shorter the description. If a neighborhood is identical or only slightly different from its surroundings in terms of information, these signals (neighborhoods) are redundant and can be represented sparsely over a suitable basis dictionary. Its description length (the sparsity of the vector $\mathbf{x}$, i.e., $\lVert \mathbf{x} \rVert_1$) will therefore be short, and we conclude that the neighborhood is not salient. Conversely, if the neighborhood is very different from its surroundings, it cannot be represented sparsely by the basis dictionary, its description length will be longer, and we conclude that the neighborhood is salient.

Likewise, if the sparse reconstruction (i.e., $\lVert \mathbf{y} - D\mathbf{x} \rVert_2^2$) leaves a high residual, the description length of the data will be longer, which implies that the neighborhood is dissimilar from its surroundings and therefore more salient. If, on the other hand, the reconstruction leaves a low residual, the description length will be short, which implies that the neighborhood is similar to its surroundings and consequently less salient. Based on these saliency measurements, our point cloud saliency detection method is described below.

3.4. Sparse coding saliency detection

Unlike some saliency detection methods for point clouds reported in the literature, our method does not estimate saliency from individual points directly; rather, the saliency of each neighborhood is determined.

3.4.1. Feature vector

Given the point set $P = \{\mathbf{p}_i\}_{i=1}^{N}$, a neighborhood $\mathcal{N}_i$ with radius $r$ is estimated for every point $\mathbf{p}_i$, where $k$ is the number of neighbor points inside a sphere with radius $r$ and center $\mathbf{p}_i$ (see Fig. 1). In order to establish whether a neighborhood is different, it must first be characterized with a descriptor. Several descriptors capture low-level features for each $\mathbf{p}_j \in \mathcal{N}_i$, for example normals, curvatures and the shape index. In this method, the normal and the Gaussian curvature are selected; these features are rotationally invariant. A third feature, $\psi_i$, is also selected. With these features, a five-dimensional feature vector is formed for every point of $\mathcal{N}_i$, i.e., $\mathbf{f}_j = (n_x, n_y, n_z, K_j, \psi_i)$, where $n_x, n_y, n_z$ are the three components of the normal vector $\mathbf{n}_j$, $K_j$ is the Gaussian curvature and $\psi_i$ is the fifth component of $\mathbf{f}_j$, defined below. Since a single descriptor vector is needed for each neighborhood, the mean of the feature vectors belonging to the neighborhood is computed as follows:

$$\mathbf{F}_i = \frac{1}{\lvert \mathcal{N}_i \rvert} \sum_{\mathbf{p}_j \in \mathcal{N}_i} \mathbf{f}_j \qquad (5)$$

where $\lvert \mathcal{N}_i \rvert$ is the cardinality of $\mathcal{N}_i$.

$\psi_i$ is a global measure that quantifies the difference between the feature vector of each neighborhood and the global mean (6) of all the feature vectors of the point cloud, that is:

$$\bar{\mathbf{F}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{F}_i \qquad (6)$$

$$\psi_i = \lVert \mathbf{F}_i - \bar{\mathbf{F}} \rVert \qquad (7)$$
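A minimal NumPy sketch of the neighborhood mean descriptor and the global term of (6) and (7) follows, assuming per-point normals and Gaussian curvatures have already been estimated and that each neighborhood is given as a list of point indices. Two choices are assumptions not fixed by the text: $\psi_i$ is computed from the four-dimensional part $(n_x, n_y, n_z, K)$ of the descriptor, to avoid a circular definition, and the norm in (7) is taken to be Euclidean. The function names are hypothetical.

```python
import numpy as np


def neighborhood_descriptors(normals, curvature, neighborhoods):
    """Mean 4-D descriptor (n_x, n_y, n_z, K) of the points in each neighborhood."""
    base = np.column_stack([normals, curvature])  # (N, 4) per-point features
    return np.array([base[idx].mean(axis=0) for idx in neighborhoods])


def add_global_distinctness(F):
    """Append psi_i = ||F_i - mean(F)||_2 as the fifth descriptor component."""
    global_mean = F.mean(axis=0)  # global mean over all neighborhood descriptors, eq. (6)
    psi = np.linalg.norm(F - global_mean, axis=1)  # eq. (7), Euclidean norm assumed
    return np.column_stack([F, psi])
```

In this sketch a neighborhood whose mean normal and curvature deviate strongly from the cloud-wide average receives a large fifth component, which is what lets an otherwise uniform region be flagged as globally distinct.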

3.4.2. Surrounding neighborhoods

Once the neighborhood descriptor is established, the next step is to find the neighborhoods surrounding each $\mathcal{N}_i$. To find these neighborhoods, the 3×3 covariance matrix (8) of $\mathcal{N}_i$ is first computed as:

$$C_i = \frac{1}{k} \sum_{\mathbf{p}_j \in \mathcal{N}_i} (\mathbf{p}_j - \bar{\mathbf{p}}_i)(\mathbf{p}_j - \bar{\mathbf{p}}_i)^{\top} \qquad (8)$$

where $\bar{\mathbf{p}}_i$ is the mean of $\mathcal{N}_i$ and $k = \lvert \mathcal{N}_i \rvert$.

After the covariance matrix has been estimated, we use its two largest eigenvalues and their corresponding eigenvectors, which define the $u$ and $v$ axes spanning the tangent plane to $\mathcal{N}_i$ at the point $\mathbf{p}_i$. Next, the points are projected onto this 2D plane, as shown in Fig. 2(a). Before projecting the points, we define the area of the surrounding neighborhoods as the n-ring, as can be seen in Fig. 2(b): the 1-ring is the innermost ring around the central neighborhood, the 2-ring is the next one, and so on. In these experiments we used the 1-ring. We then trace a set of rays, separated by a fixed angle, from the center of the projected neighborhood in Fig. 2(b); the length of the rays is determined by the n-ring used. On each ray, we mark points (green dots) according to whether the 1-ring, 2-ring or n-ring is used, as shown in Fig. 2(b). Then, we find the projected point nearest to each marked point and its corresponding 3D point, and we estimate a neighborhood with radius r around each of these 3D points, as seen in Fig. 2(c). Finally, a feature vector is estimated for each surrounding neighborhood in the same way as in the previous section.
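The geometric steps above (the covariance matrix (8), the tangent-plane projection and the placement of marks on rays) can be sketched with NumPy as below. The number of rays `n_rays`, the distance `ring_radius` at which marks are placed, and the use of a single mark per ray (the 1-ring case) are illustrative assumptions, since the text does not give these values; the point set passed in is assumed to cover the ring area. All function names are hypothetical.

```python
import numpy as np


def tangent_frame(points):
    """Centroid and the two dominant eigenvectors of the 3x3 covariance matrix (eq. 8)."""
    centroid = points.mean(axis=0)
    diffs = points - centroid
    C = diffs.T @ diffs / len(points)  # 3x3 covariance matrix
    _, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    u, v = eigvecs[:, 2], eigvecs[:, 1]  # two largest span the tangent plane
    return centroid, u, v


def project_to_plane(points, center, u, v):
    """2-D coordinates of the points in the (u, v) tangent plane through `center`."""
    d = points - center
    return np.column_stack([d @ u, d @ v])


def surrounding_seeds(points, center_idx, ring_radius, n_rays=8):
    """Indices of the points nearest to marks placed on equally spaced rays."""
    _, u, v = tangent_frame(points)
    uv = project_to_plane(points, points[center_idx], u, v)
    angles = 2.0 * np.pi * np.arange(n_rays) / n_rays  # rays at equal angular spacing
    marks = ring_radius * np.column_stack([np.cos(angles), np.sin(angles)])
    # Distance from every mark to every projected point; the nearest point's
    # 3-D counterpart would seed a surrounding neighborhood of radius r.
    dists = np.linalg.norm(uv[None, :, :] - marks[:, None, :], axis=2)
    return np.unique(dists.argmin(axis=1))
```

On a flat, regularly sampled patch the returned seeds form a ring of points around the center, and several rays may map to the same point, which is why the indices are de-duplicated.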

3.4.3. Dictionary construction and sparse coding model

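The surrounding feature vectors form a natural overcomplete basis dictionary $D = [\mathbf{F}^{s}_{1}, \ldots, \mathbf{F}^{s}_{m}]$, and the central feature vector $\mathbf{y} = \mathbf{F}_i$ is approximated as a sparse linear combination of these basis vectors, or atoms, i.e., $\mathbf{y} \approx D\mathbf{x}$. The sparse model can now be written as:

$$\min_{\mathbf{x}} \; \tfrac{1}{2} \lVert \mathbf{y} - D\mathbf{x} \rVert_2^2 + \lambda \lVert \mathbf{x} \rVert_1 \qquad (9)$$

Equation (9) is a linear regression optimization problem for estimating $\mathbf{x}$, known as the Lasso. It is solved with the LARS algorithm, using the SPAMS library to carry out the sparse coding.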
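The paper solves (9) with LARS through the SPAMS library. The sketch below instead uses cyclic coordinate descent, a different but standard Lasso solver, written in plain NumPy so that it is self-contained; it is not the authors' implementation. It builds the dictionary from the surrounding descriptors (as columns) and returns the residual and sparsity terms that are combined in (10). The names `lasso_cd` and `neighborhood_saliency_terms` are hypothetical, and the value of λ may need rescaling to match the SPAMS conventions.

```python
import numpy as np


def lasso_cd(D, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_x 0.5 * ||y - D x||^2 + lam * ||x||_1 (eq. 9)."""
    x = np.zeros(D.shape[1])
    col_sq = (D ** 2).sum(axis=0)  # squared norm of each atom
    r = y - D @ x  # current residual
    for _ in range(n_iter):
        for j in range(D.shape[1]):
            if col_sq[j] == 0.0:
                continue
            r += D[:, j] * x[j]  # remove atom j's current contribution
            rho = D[:, j] @ r
            x[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= D[:, j] * x[j]  # add back the updated contribution
    return x


def neighborhood_saliency_terms(center_desc, surround_descs, lam=0.9):
    """Return (residual l1 norm, coefficient l1 norm) for one central neighborhood."""
    D = np.asarray(surround_descs, dtype=float).T  # atoms = surrounding descriptors
    y = np.asarray(center_desc, dtype=float)
    x = lasso_cd(D, y, lam)
    return np.abs(y - D @ x).sum(), np.abs(x).sum()
```

A central descriptor that coincides with one of its surrounding descriptors is reconstructed almost exactly with a single atom, whereas a descriptor with a component that no atom spans leaves a residual of at least that component and needs several atoms, so both terms are larger, in line with the reasoning of Section 3.3.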

3.4.4. Saliency detection

As mentioned in Section 3.4.1, $\psi_i$ is the fifth component of the feature vectors and is estimated using (7). This component was added because a local neighborhood often resembles its surroundings while the neighborhood and its surroundings, taken together, are globally distinct within the entire point cloud. Using only the normal vectors and the Gaussian curvature can produce salient areas in the cloud whose centers are empty; adding $\psi_i$ solves this difficulty.

Once a sparse solution to (9) is obtained, the codelength of each neighborhood is established as laid out in Section 3.3. Because our model is based on the MDL principle, the saliency of each neighborhood is proportional to its total codelength, $\lVert \mathbf{x} \rVert_1 + \lVert \mathbf{y} - D\mathbf{x} \rVert_2^2$. We replaced the $\ell_2$ norm with the $\ell_1$ norm in the residual error because it is more discriminative and more robust to outliers. Next, following Borji and Itti [⁴⁰], both saliency measurements are normalized and combined as in (10):

$$S_r = \lVert \mathbf{y} - D\mathbf{x} \rVert_1, \qquad S_s = \lVert \mathbf{x} \rVert_1, \qquad S = \operatorname{norm}(S_r) \circ \operatorname{norm}(S_s) \qquad (10)$$

where $S_r$ is the saliency produced by the sparse reconstruction error, $S_s$ is the saliency produced by the sparse coefficients and $S$ is the combination of both saliency measurements; $\operatorname{norm}(\cdot)$ normalizes the saliency measurements for a better fusion. The symbol $\circ$ in (10) denotes an integration scheme [⁴⁰], such as product, sum, minimum or maximum (see Fig. 3).
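The normalization and fusion in (10) can be sketched as follows over arrays holding one value per neighborhood. Min-max scaling to [0, 1] is an assumed choice for the normalization, and the operator table mirrors the four operators compared in Fig. 3; the names are hypothetical.

```python
import numpy as np


def minmax_normalize(s):
    """Map saliency values to [0, 1]; a constant input maps to all zeros."""
    s = np.asarray(s, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


# Candidate integration schemes for the fusion operator in eq. (10).
FUSION_OPS = {
    "*": np.multiply,
    "+": np.add,
    "min": np.minimum,
    "max": np.maximum,
}


def fuse_saliency(S_r, S_s, op="*"):
    """Combine normalized residual and sparsity saliency with the chosen operator."""
    return FUSION_OPS[op](minmax_normalize(S_r), minmax_normalize(S_s))
```

With the product operator, a neighborhood is rated highly only when both measures agree, which is one way to read the observation that the residual acts as a weighting factor on the rarity measure.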

Source: The Authors.

Figure 2 Surrounding neighborhoods selection. (a) Projecting the 3D points to 2D, (b) mark points and n-ring area, (c) selecting surrounding neighborhoods.

Source: The Authors.

Figure 3 Saliency maps generated by different operator functions in equation (10). (a) Using the * operator, (b) using the + operator, (c) using the min operator and (d) using the max operator.

The best results in all experiments were obtained with the * operator. Since overlap between neighborhoods is allowed, the total saliency map is obtained by accumulating the saliency of each neighborhood onto its points. Our method differs from that proposed in [³⁰], which uses only the sparse coefficient vector (i.e., $\lVert \mathbf{x} \rVert_1$) as a measure of saliency, whereas our method also takes into account the reconstruction error, or residual (i.e., $\lVert \mathbf{y} - D\mathbf{x} \rVert_1$), as an additional measure of saliency; combining both measurements yields a better saliency map. Our method can thus be seen as a generalization of the framework presented in Li et al. [³⁰] that improves the final saliency map, as can be observed in Fig. 4. This improvement results from the residual acting as a weighting factor on the measure of rarity given by the sparse coefficient vector.
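Because neighborhoods overlap, each point receives contributions from every neighborhood that contains it. A minimal sketch of this accumulation follows, assuming a plain sum; the text does not say whether the sum is further normalized by the number of contributing neighborhoods, and the function name is hypothetical.

```python
import numpy as np


def accumulate_point_saliency(n_points, neighborhoods, neigh_saliency):
    """Sum the saliency of each neighborhood onto every point it contains."""
    point_sal = np.zeros(n_points)
    for idx, s in zip(neighborhoods, neigh_saliency):
        np.add.at(point_sal, idx, s)  # overlapping neighborhoods add up
    return point_sal
```

`np.add.at` is used instead of `point_sal[idx] += s` so that repeated indices within one neighborhood list, if any, are each counted.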

Source: The Authors

Figure 4 Saliency map results. (a) Using only the sparse coefficient vector as the saliency measurement; (b) using the sparse coefficient vector and the residual error as saliency measurements.

4. Results and discussion

In this section, we evaluate the proposed method on a set of objects and compare it against 3D saliency detection approaches from the literature, including both point cloud-based and mesh-based methods. The object shapes were obtained from the Watertight Models of SHREC 2007 and the Stanford 3D Scanning Repository. Our method is compared to three point cloud-based methods, Tasse et al. [²⁰], Shtrom et al. [²¹] and Guo et al. [¹⁹], as well as six mesh-based methods, Lee et al. [³²], Wu et al. [¹³], Leifman et al. [³³], Tao et al. [¹¹], Song et al. [¹⁸] and Liu et al. [¹⁰]. It is also compared against the pseudo-ground truth [⁴¹].

All the experiments were run on a PC with an Intel Core i7-2670QM CPU @ 2.20 GHz and 8 GB of RAM. As an example, for the model of the girl in Fig. 8, with 15.5 K vertices, our method took 81.7 s to compute the saliency, which is slower than Tasse et al. [²⁰] and Guo et al. [¹⁹]; however, the implementation admits further optimizations, and performance should improve if it is ported to a language such as C++.

Source: The Authors.

Figure 5 Saliency map produced by our method with different values of λ. From left to right: λ = 0.1, 0.2, 0.4, 0.6, 0.8, 0.9.

Source: [¹⁹-²¹].

Figure 6 Point-based methods saliency comparison. (a) Shtrom et al. [²¹], (b) Tasse et al. [²⁰], (c) Guo et al. [¹⁹] and (d) our proposal.

Source

Figure 7 Mesh-based methods comparison. (a) Lee et al. [³²], (b) Wu et al. [¹³], (c) Leifman et al. [³³], (d) Tao et al. [¹¹] and (e) our proposal.

Source: [¹⁰,¹¹,¹⁸,⁴¹]

Figure 8 Mesh saliency results of (a) Tao et al. [¹¹], (b) Liu et al. [¹⁰], (c) Song et al. [¹⁸], (d) our method, and the pseudo-ground truth [⁴¹].

4.1. Parameter selection

The only parameter of our method is the regularization parameter $\lambda$. Results become smoother as its value increases within the range (0, 1); this visual effect can be appreciated in Fig. 5. We found experimentally that, on average, a value of $\lambda$ near 0.9 generates the best qualitative and quantitative results; therefore, we fix $\lambda = 0.9$ in all our experiments.

The size of each neighborhood is adapted to the local characteristics of the points, such as density and curvature, increasing or decreasing accordingly. To achieve this, a method such as the one proposed in [⁴²] is used to estimate the neighborhood size from these local characteristics.

4.2. Qualitative evaluation

Our method is compared with methods from the literature, both mesh-based and point cloud-based. In Fig. 6, we compare the point-based methods of Shtrom et al. [²¹], Tasse et al. [²⁰] and Guo et al. [¹⁹]. The method of Shtrom et al. obtains a reasonable saliency result and highlights a relevant salient region on the Max Planck model, but there are also extensive noisy areas around the principal salient features. The method of Tasse et al. produces less noise, but there are large regions of high saliency around the features. The method of Guo et al. yields a clean saliency map concentrated on the most representative salient features; our result achieves the same and, in addition, highlights areas such as the eyes, the lower part of the nose, the ears and the lips with greater saliency. The dragon model shows a similar outcome to the Max Planck model: the proposed method produces a clean saliency map compared to Shtrom et al. and Tasse et al. The saliency map produced by Guo et al. is very similar to ours, which differs only in highlighting some finer details such as the eyes and ridges.

We also compare our results with those of the mesh-based methods of Lee et al. [³²], Wu et al. [¹³], Leifman et al. [³³] and Tao et al. [¹¹], shown in Fig. 7. It can be observed that local changes in curvature have less influence on the proposed method than on the method of Lee et al. [³²].

In the bunny and dragon models, the saliency map of Lee et al. is not correct because the level of saliency varies across different areas; the dragon, for example, loses some fine details such as crests and ridges.

The method of Wu et al. produces a better saliency map than that of Lee et al., but some salient areas are missing, as in the case of the dragon. The method proposed by Leifman et al. produces a reliable saliency map for the dragon model, but for the bunny the feet are missing from the final saliency map. For the method of Tao et al., some areas of the dragon model are shown as salient where they are not. In these visual comparisons, our method generally achieves better results than the other four methods.

Fig. 8 compares our results with [¹⁰,¹¹,¹⁸] and shows that our method detects saliency more coherently: it detects small salient regions, such as the ears and the little tie, and facial regions such as the eyes and mouth of the bust of the girl. Furthermore, the hair (bun and braids) agrees better with the pseudo-ground truth [⁴¹] than in the other three methods. For the bird model, our method detects salient points on the wings, tail and beak, as well as the lightly salient area between the wings, in agreement with the pseudo-ground truth.

Source: [¹¹].

Figure 9. Saliency results on the Gargoyle model: our method (bottom row) and Tao et al. [¹¹] (top row). The first two columns show front and back views of the saliency results on the Gargoyle model; the last two columns show the same views with 30% random noise relative to the average nearest-neighbor distance of each point.

Fig. 9 shows the results when the input is a noisy point cloud; our method is more robust against noise than the mesh-based method of Tao et al. [¹¹]. We added random noise of 30% (relative to the average nearest-neighbor distance of each point) to perturb the point coordinates and observed that the method of Tao et al. failed to detect some details of the gargoyle's wings, such as the little rings between the veins in the front view, and lost some details of the veins in the back view. Our method is more consistent than Tao et al. [¹¹] in detecting salient features on the gargoyle model, for both the clean and the noisy model. The strength of our method lies in the fact that point cloud saliency is sparse in nature, so sparse models are usually a good way to represent it. Combined with the MDL principle, which favors the model that yields the shortest description of the data, this provides a principled way of choosing among the possible representations.
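The noise experiment can be reproduced approximately with the sketch below. The text states only the 30% level relative to the average nearest-neighbor distance; the uniform distribution, the independent per-coordinate perturbation, the fixed seed and the brute-force O(N²) distance computation (adequate only for small clouds) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbor (brute force)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return d.min(axis=1).mean()


def add_relative_noise(points, level=0.3, seed=0):
    """Perturb each coordinate uniformly in [-level*d, level*d], d = mean NN distance."""
    rng = np.random.default_rng(seed)
    scale = level * mean_nn_distance(points)
    return points + rng.uniform(-scale, scale, size=points.shape)
```

Scaling the noise by the mean nearest-neighbor distance makes the perturbation comparable across models of different size and sampling density; for large clouds a spatial index such as a k-d tree would replace the brute-force distance matrix.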

4.3. Quantitative evaluation

We carried out a quantitative evaluation on the 400 watertight models of SHREC 2007, using the distribution of Schelling points provided by Chen et al. [⁴¹] as the ground truth. To evaluate performance, we used the metrics proposed by Tasse et al. [⁴³], which adapt three 2D image saliency metrics to 3D saliency: the Area Under the ROC Curve (AUC), the Normalized Scanpath Saliency (NSS) and the Linear Correlation Coefficient (LCC). For AUC, an ideal saliency model scores 1.0; AUC disregards regions with no saliency and focuses on the ordering of saliency values. NSS measures the saliency values at the points selected by users as fixation points. LCC takes values between -1 and 1, with values closer to 0 implying weak correlation; it compares the saliency map under consideration with the ground-truth saliency map (the Schelling distributions). For comparison, we selected three methods from the literature, two based on point clouds and one based on meshes: the clustering method (CS) [²⁰], the point-wise method (PW) [¹⁹] and the spectral analysis method (SS) [¹²]. In addition, as a reference, we included the human performance score (HS) provided by Tasse et al. [⁴³].
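For reference, simplified versions of the three metrics can be written as follows, assuming one saliency value per vertex, a list of fixation (Schelling) vertex indices, and a ground-truth saliency map of the same length. This is a sketch of the standard definitions; the exact protocol of Tasse et al. [⁴³] (for example, how the Schelling distributions are turned into fixations) may differ, and the function names are hypothetical.

```python
import numpy as np


def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at the fixation vertices."""
    s = np.asarray(saliency, dtype=float)
    z = (s - s.mean()) / s.std()
    return z[fixations].mean()


def lcc(saliency, ground_truth):
    """Linear Correlation Coefficient: Pearson correlation between two saliency maps."""
    return np.corrcoef(saliency, ground_truth)[0, 1]


def auc(saliency, fixations):
    """ROC AUC with fixation vertices as positives and all other vertices as negatives.

    Uses the rank-sum (Mann-Whitney) formulation; ties count as one half.
    """
    s = np.asarray(saliency, dtype=float)
    pos_mask = np.zeros(len(s), dtype=bool)
    pos_mask[fixations] = True
    pos, neg = s[pos_mask], s[~pos_mask]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

AUC depends only on the ordering of the values, NSS on how far above the mean the fixated vertices lie, and LCC on the linear agreement of the whole map with the ground truth, which is why the three scores can rank methods differently.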

Table 1 shows the AUC scores of the selected methods. As the final average shows, our method is competitive with the CS and PW methods. The SS method performs worse than our method and the CS and PW methods, and the human score (HS) also exceeds SS. For the classes glasses, ant, octopus, bird and bearing, our method obtained the best results, while for the classes hand, fish, spring, mechanic, airplane and vase, it matches the PW and CS methods. These results show that the proposed method performs on par with the most recent methods in the literature and, in some cases, improves on them.

Table 1 AUC performance per shape class in SHREC 2007.

Source: The Authors.

Fig. 10 shows the NSS and LCC scores, which confirm that our method performs similarly to CS and PW and outperforms SS, while HS outperforms all the methods presented.

Figure 10 Saliency performance evaluation under the metrics NSS and LCC.

Source: The Authors.

5. Conclusions and future work

This paper presents a novel and simple method for point cloud saliency detection via sparse coding. Based on the MDL principle, the proposed method uses a sparse coding representation to compute the minimum codelength of each neighborhood, which determines whether the neighborhood is salient with respect to its surrounding neighborhoods. The method is robust to noise because it represents each neighborhood by a single feature vector, the mean of the feature vectors of its points.
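To make the idea concrete, the following Python sketch scores each neighborhood by how expensive it is to encode its descriptor using the descriptors of its surrounding neighborhoods as a natural dictionary. It is a hedged illustration, not the authors' implementation: the Lasso objective solved with ISTA stands in for the MDL codelength (it equals a Gaussian-residual plus Laplacian-prior codelength up to constants), and `lam`, `n_iter` and all function names are assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA; columns of D are atoms."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a


def codelength_proxy(x, D, lam=0.1):
    """Hypothetical codelength proxy: the Lasso objective evaluated at the sparse code."""
    a = sparse_code_ista(x, D, lam)
    r = x - D @ a
    return 0.5 * float(r @ r) + lam * float(np.abs(a).sum())


def neighborhood_saliency(descriptors, neighbors, lam=0.1):
    """Saliency of each neighborhood = codelength of its descriptor over its surroundings.

    descriptors: (m, d) array, one row per neighborhood (e.g. the mean of its
    points' feature vectors). neighbors[i]: indices of the neighborhoods around i.
    """
    sal = np.empty(len(descriptors))
    for i, nbrs in enumerate(neighbors):
        D = descriptors[nbrs].T  # natural dictionary: one atom per surrounding neighborhood
        sal[i] = codelength_proxy(descriptors[i], D, lam)
    return sal
```

Neighborhoods that the dictionary reconstructs cheaply receive low scores, while those that differ from their surroundings need a larger residual or more coefficients and therefore score higher.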

Our approach produces plausible and faithful results on a variety of models. We compared our results with the most recent approaches in the literature, using the pseudo-ground truth provided by Chen et al. [⁴¹] as a reference, and found that the proposed method performs comparably to these approaches and outperforms them for several shape classes. In future work, we plan to investigate how incorporating high-level information, in the form of semantic cues, into point cloud saliency detection can help identify saliency globally on the point cloud.

Acknowledgements

This study was partially supported by the Departamento Administrativo de Ciencia, Tecnología e Innovación de Colombia (Colciencias) through doctoral scholarship program 727.

References

[1] Jia, S., Zhang, C., Li, X. and Zhou, Y., Mesh resizing based on hierarchical saliency detection, Graph. Models, 76(5), pp. 355-362, Sep. 2014. DOI: 10.1016/j.gmod.2014.03.012

[2] Wolfe, J.M., Guided Search 2.0: A revised model of visual search, Psychon. Bull. Rev., 1(2), pp. 202-238, Jun. 1994. DOI: 10.3758/BF03200774

[3] Koch, C. and Poggio, T., Predicting the visual world: silence is golden, Nat. Neurosci., 2(1), pp. 9-10, Jan. 1999. DOI: 10.1038/4511

[4] Somasundaram, G., Cherian, A., Morellas, V. and Papanikolopoulos, N., Action recognition using global spatio-temporal features derived from sparse representations, Comput. Vis. Image Underst., 123(1), pp. 1-13, Jun. 2014. DOI: 10.1016/j.cviu.2014.01.002

[5] Kalboussi, R., Abdellaoui, M. and Douik, A., A spatiotemporal model for video saliency detection, in: 2016 International Image Processing, Applications and Systems (IPAS), 2016, pp. 1-6. DOI: 10.1109/IPAS.2016.7880113

[6] Kim, Y., Varshney, A., Jacobs, D.W. and Guimbretière, F., Mesh saliency and human eye fixations, ACM Trans. Appl. Percept., 7(2), pp. 1-13, Feb. 2010. DOI: 10.1145/1670671.1670676

[7] Lau, M., Dev, K., Shi, W., Dorsey, J. and Rushmeier, H., Tactile mesh saliency, ACM Trans. Graph., 35(4), pp. 1-11, Jul. 2016. DOI: 10.1145/2897824.2925927

[8] Limper, M., Kuijper, A. and Fellner, D.W., Mesh saliency analysis via local curvature entropy, in: Proceedings of the 37th Annual Conference of the European Association for Computer Graphics: short papers, Goslar, Germany, 2016, pp. 13-16. DOI: 10.2312/egsh.20161003

[9] Wang, S., Li, N., Li, S., Luo, Z., Su, Z. and Qin, H., Multi-scale mesh saliency based on low-rank and sparse analysis in shape feature space, Comput. Aided Geom. Des., 35-36, pp. 206-214, May 2015. DOI: 10.1016/j.cagd.2015.03.003

[10] Liu, X., Tao, P., Cao, J., Chen, H. and Zou, C., Mesh saliency detection via double absorbing Markov chain in feature space, Vis. Comput., 32(9), pp. 1121-1132, Sep. 2016. DOI: 10.1007/s00371-015-1184-x

[11] Tao, P., Cao, J., Li, S., Liu, X. and Liu, L., Mesh saliency via ranking unsalient patches in a descriptor space, Comput. Graph., 46, pp. 264-274, Feb. 2015. DOI: 10.1016/j.cag.2014.09.023

[12] Song, R., Liu, Y., Martin, R. and Rosin, P.L., Mesh saliency via spectral processing, ACM Trans. Graph., 33(1), pp. 6:1-6:17, Feb. 2014. DOI: 10.1145/2530691

[13] Wu, J., Shen, X., Zhu, W. and Liu, L., Mesh saliency with global rarity, Graph. Models, 75(5), pp. 255-264, Sep. 2013. DOI: 10.1016/j.gmod.2013.05.002

[14] Nouri, A., Charrier, C. and Lézoray, O., Multi-scale mesh saliency with local adaptive patches for viewpoint selection, Signal Process. Image Commun., 38, pp. 151-166, Oct. 2015. DOI: 10.1016/j.image.2015.08.002

[15] Liu, X., Ma, L. and Liu, L., P2: a robust and rotationally invariant shape descriptor with applications to mesh saliency, Appl. Math.- J. Chin. Univ., 31(1), pp. 53-67, Mar. 2016. DOI: 10.1007/s11766-016-3364-5

[16] Zhao, Y. et al., Region-based saliency estimation for 3D shape analysis and understanding, Neurocomputing, 197, pp. 1-13, Jul. 2016. DOI: 10.1016/j.neucom.2016.01.012

[17] Jeong, S.W. and Sim, J.Y., Saliency detection for 3D surface geometry using semi-regular meshes, IEEE Trans. Multimed., 19(12), pp. 2692-2705, Dec. 2017. DOI: 10.1109/TMM.2017.2710802

[18] Song, R., Liu, Y., Martin, R. and Echavarria, K., Local-to-global mesh saliency, Vis. Comput., 34(3), pp. 323-336, Nov. 2016. DOI: 10.1007/s00371-016-1334-9

[19] Guo, Y., Wang, F. and Xin, J., Point-wise saliency detection on 3D point clouds via covariance descriptors, Vis. Comput., pp. 1-14, Jun. 2017. DOI: 10.1007/s00371-017-1416-3

[20] Tasse, F.P., Kosinka, J. and Dodgson, N., Cluster-based point set saliency, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 163-171. DOI: 10.1109/ICCV.2015.27

[21] Shtrom, E., Leifman, G. and Tal, A., Saliency detection in large point sets, in: 2013 IEEE International Conference on Computer Vision, 2013, pp. 3591-3598. DOI: 10.1109/ICCV.2013.446

[22] Akman, O. and Jonker, P., Computing saliency map from spatial information in point cloud data, in: Advanced Concepts for Intelligent Vision Systems, 2010, pp. 290-299. DOI: 10.1007/978-3-642-17688-3_28

[23] Yu, H., Wang, R., Chen, J., Liu, L. and Wan, W., Saliency computation and simplification of point cloud data, in: Proceedings of the 2nd International Conference on Computer Science and Network Technology, 2012, pp. 1350-1353. DOI: 10.1109/ICCSNT.2012.6526171

[24] An, G., Watanabe, T. and Kakimoto, M., Mesh simplification using hybrid saliency, in: 2016 International Conference on Cyberworlds (CW), 2016, pp. 231-234. DOI: 10.1109/CW.2016.47

[25] Dutta, S., Banerjee, S., Biswas, P.K. and Bhowmick, P., Mesh denoising using multi-scale curvature-based saliency, in: Computer Vision - ACCV 2014 Workshops, 2014, pp. 507-516. DOI: 10.1007/978-3-319-16631-5_37

[26] Jiao, X., Wu, T. and Qin, X., Mesh segmentation by combining mesh saliency with spectral clustering, J. Comput. Appl. Math., 329(1), pp. 134-146, Feb. 2018. DOI: 10.1016/j.cam.2017.05.007

[27] Tasse, F.P., Kosinka, J. and Dodgson, N., How well do saliency-based features perform for shape retrieval?, Comput. Graph., 59, pp. 57-67, Oct. 2016. DOI: 10.1016/j.cag.2016.04.003

[28] Gal, R. and Cohen-Or, D., Salient geometric features for partial shape matching and similarity, ACM Trans. Graph., 25(1), pp. 130-150, Jan. 2006. DOI: 10.1145/1122501.1122507

[29] Wang, W. et al., Saliency preserving slicing optimization for effective 3D printing, Comput. Graph. Forum, 34(6), pp. 148-160, Sep. 2015. DOI: 10.1111/cgf.12527

[30] Li, Y., Zhou, Y., Xu, L., Yang, X. and Yang, J., Incremental sparse saliency detection, in: 2009 16th IEEE International Conference on Image Processing (ICIP), 2009, pp. 3093-3096. DOI: 10.1109/ICIP.2009.5414465

[31] Ramirez, I. and Sapiro, G., An MDL framework for sparse coding and dictionary learning, IEEE Trans. Signal Process., 60(6), pp. 2913-2927, Jun. 2012. DOI: 10.1109/TSP.2012.2187203

[32] Lee, C.H., Varshney, A. and Jacobs, D.W., Mesh saliency, in: ACM SIGGRAPH 2005 Papers, New York, NY, USA, 2005, pp. 659-666. DOI: 10.1145/1186822.1073244

[33] Leifman, G., Shtrom, E. and Tal, A., Surface regions of interest for viewpoint selection, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 414-421. DOI: 10.1109/CVPR.2012.6247703

[34] Itti, L., Koch, C. and Niebur, E., A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell., 20(11), pp. 1254-1259, Nov. 1998. DOI: 10.1109/34.730558

[35] Itti, L. and Koch, C., Computational modelling of visual attention, Nat. Rev. Neurosci., 2(3), pp. 194-203, Mar. 2001. DOI: 10.1038/35058500

[36] Olshausen, B.A. and Field, D.J., Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Res., 37(23), pp. 3311-3325, Dec. 1997. DOI: 10.1016/S0042-6989(97)00169-7

[37] Bao, C., Ji, H., Quan, Y. and Shen, Z., Dictionary learning for sparse coding: algorithms and convergence analysis, IEEE Trans. Pattern Anal. Mach. Intell., 38(7), pp. 1356-1369, Jul. 2016. DOI: 10.1109/TPAMI.2015.2487966

[38] Rissanen, J., Modeling by shortest data description, Automatica, 14(5), pp. 465-471, Sep. 1978. DOI: 10.1016/0005-1098(78)90005-5

[39] Bruce, N.D.B. and Tsotsos, J.K., Saliency based on information maximization, in: Proceedings of the 18th International Conference on Neural Information Processing Systems, Cambridge, MA, USA, 2005, pp. 155-162. DOI: 10.1167/9.3.5

[40] Borji, A. and Itti, L., Exploiting local and global patch rarities for saliency detection, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 478-485. DOI: 10.1109/CVPR.2012.6247711

[41] Chen, X., Saparov, A., Pang, B. and Funkhouser, T., Schelling points on 3D surface meshes, ACM Trans. Graph., 31(4), pp. 29:1-29:12, Jul. 2012. DOI: 10.1145/2185520.2185525

[42] Mitra, N.J. and Nguyen, A., Estimating surface normals in noisy point cloud data, in: Proceedings of the Nineteenth Annual Symposium on Computational Geometry, New York, NY, USA, 2003, pp. 322-328. DOI: 10.1145/777792.777840

[43] Tasse, F.P., Kosinka, J. and Dodgson, N.A., Quantitative analysis of saliency models, in: SIGGRAPH ASIA 2016 Technical Briefs, New York, NY, USA, 2016, pp. 19:1-19:4. DOI: 10.1145/3005358.3005380

How to cite: Leal, E., Sanchez-Torres, G. & Branch-Bedoya, J.W., Point cloud saliency detection via local sparse coding. DYNA, 86(209), pp. 238-247, April - June, 2019.

E. Leal received a BSc. Eng. in Systems Engineering in 2000 from Universidad de Antioquia, Colombia, and an MSc. in Systems Engineering in 2006 from Universidad Nacional de Colombia in Medellín, Colombia. He is currently a full professor at the Faculty of Engineering, Universidad Autónoma del Caribe, Barranquilla, Colombia. His research interests include digital image processing, 3D surface reconstruction, computer vision and computational intelligence techniques. ORCID: 0000-0003-2468-370X

G. Sanchez-Torres received a BSc. Eng. in Systems Engineering in 2005 from Universidad del Magdalena, Colombia, and an MSc. in Systems Engineering in 2006 and a PhD in Systems Engineering in 2012, both from Universidad Nacional de Colombia in Medellín, Colombia. He is currently a full professor at the Faculty of Engineering, Universidad del Magdalena, Santa Marta, Colombia. His research interests include digital image processing, 3D surface reconstruction, computer vision, computational intelligence techniques, and computational simulation and modeling. ORCID: 0000-0002-9069-0732

J.W. Branch-Bedoya received a BSc. Eng. in Mining and Metallurgy Engineering in 1995, an MSc. in Systems Engineering in 1997 and a PhD in Systems Engineering in 2007, all from Universidad Nacional de Colombia in Medellín, Colombia. He is currently a full professor at the Computing and Decision Sciences Department, Facultad de Minas, Universidad Nacional de Colombia, Medellín, Colombia. His research interests include automation, computer vision, digital image processing and computational intelligence techniques. ORCID: 0000-0002-0378-028X

Received: November 04, 2018; Revised: March 19, 2019; Accepted: April 22, 2019

This is an open-access article distributed under the terms of the Creative Commons Attribution License
