I. INTRODUCTION
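Among Karp's 21 NP-complete problems [1] is the set covering problem (SCP). The SCP has many real-world applications, such as locating emergency services [2][3], military planning [4][5], and decision-making in the context of the COVID-19 pandemic [6][7]. Its solution is given by a subset of available elements subject to problem constraints. The SCP is defined by a binary matrix $A = (a_{ij})$ with $m$ rows and $n$ columns, where $a_{ij} = 1$ if column $j$ covers row $i$ and $a_{ij} = 0$ otherwise, and by a cost $c_j$ associated with each column $j$. The goal is to find a minimum-cost subset of columns that covers all rows:
Minimize:
$$\sum_{j=1}^{n} c_j x_j \qquad (1)$$
Constraints:
$$\sum_{j=1}^{n} a_{ij} x_j \ge 1, \quad i = 1, \dots, m \qquad (2)$$
$$x_j \in \{0, 1\}, \quad j = 1, \dots, n \qquad (3)$$
Equation (2) indicates that each row must be covered by at least one column. Equation (3) indicates that the decision variables $x_j$ are binary: $x_j = 1$ if column $j$ is part of the solution and $x_j = 0$ otherwise.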
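The formulation can be illustrated on a small instance. The following is a minimal sketch, assuming a hypothetical 4-row, 5-column matrix `A` and cost vector `c` that are not taken from any of the reviewed studies; it evaluates the objective and checks the covering constraint of Equation (2) for a candidate solution.

```python
# Hypothetical toy instance (illustration only): 4 rows, 5 columns.
A = [
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
c = [3, 2, 2, 1, 4]  # cost of each column


def cost(x, c):
    """Objective: total cost of the selected columns."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_feasible(x, A):
    """Covering constraint: every row has at least one selected column."""
    return all(sum(aij * xj for aij, xj in zip(row, x)) >= 1 for row in A)


x = [1, 0, 0, 1, 0]  # binary decision vector: select columns 0 and 3
print(cost(x, c), is_feasible(x, A))
```

Here columns 0 and 3 together cover all four rows, so the vector is feasible with a total cost of 4, whereas selecting column 0 alone leaves rows 2 and 3 uncovered.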
Many algorithms have been proposed over the years to solve the SCP. One group consists of exact algorithms, which guarantee the optimality of the solutions they find. However, these can solve only problems of limited size, since large-scale problems tend to require substantial computational effort [8][9]. Another group consists of heuristic (H) and metaheuristic (MH) algorithms. The latter are iterative procedures that guide a subordinate heuristic, combining different concepts to explore and exploit the search space intelligently. Although the optimality of the solutions obtained by MH algorithms cannot be guaranteed, they have been empirically shown to be of good quality [10].
Considering the above, MH algorithms include components, such as initialization methods and local search, that can be modified to help find better-quality solutions to large-scale SCPs. In the present study, we carry out a systematic mapping of the literature focused on these methods as integrated into algorithms applied to the SCP, with the aim of identifying possible lines of work in this area that can be explored further in future research. Given that the SCP is a very active research topic with a large number of studies, the present systematic mapping considers only studies published in the last 5 years, in order to focus on the most recent publications.
The present study is organized as follows: Section 2 explains the research protocol to conduct this systematic mapping, Section 3 shows the results obtained by the mapping, Section 4 lays out in detail the impact of the results obtained on the study area, and finally, in Section 5 the conclusions are presented.
II. METHODOLOGY
Systematic mapping of the literature is a methodological process that allows collecting and classifying existing studies on a research topic. To carry out this mapping, the guidelines presented in the following works were used as a reference: Kitchenham et al. [11], Petersen et al. [12], Cañizares et al. [13] and Bashab et al. [14], involving three main stages: planning, implementation, and documentation of results. The first two stages are described in this section, and the results documentation stage is presented in Section 3.
A. Planning Stage
In this stage, the activities for carrying out the search, selection and analysis of the studies are defined, encompassing the following activities: (i) design of the research questions; (ii) research strategy; (iii) definition of inclusion/exclusion criteria; (iv) quality evaluation criteria; and (v) execution stage.
Research Questions: With the present systematic mapping, we seek to determine the state of the art of the initialization and local search methods that have been applied to the SCP, in order to identify possible directions for future studies. To fulfill this objective, the research questions shown in Table 1 were established.
Research question | Motivation |
---|---|
Q1. What initialization methods have recently been applied to the SCP? | Identify initialization methods that have been used in recent algorithms applied to the SCP |
Q2. What local search methods have recently been applied to the SCP? | Identify local search methods that have been used in recent algorithms applied to the SCP |
Q3. What is the quality of the selected studies? | Determine the quality of the selected studies according to the quality assessment criteria of the primary studies (Table 6) |
Search Strategy: In order to find the primary studies, search strings related to the research questions were defined and applied to the Scopus, SpringerLink, ScienceDirect, and ACM research databases, which were selected for their recognition and wide use. In addition, preliminary searches in these databases returned a considerable number of results on the proposed research topic. Scopus in particular was selected because it is an abstract indexer, which provides coverage of other databases that were not selected. The search strings applied in these databases are presented in Table 2.
Database | Search string |
---|---|
Scopus | |
Springerlink | algorithm AND *heuristic* AND "set covering problem" AND (initializ* OR "local search") |
ScienceDirect, ACM | "set covering problem" AND (algorithm OR heuristic) AND (initialize OR initialization OR "local search") |
The structure of the search strings usually uses the OR connector to join similar terms as synonyms. In this study, the use of the OR connector is defined to include studies that apply at least one of the two searched methods since, in preliminary searches, few studies that apply the two methods in the same proposed algorithm were observed. The AND operator makes the search compulsorily include the terms in such a way that it finds articles related to initialization methods and local search applied to the SCP. Quotation marks are used to search for exact phrases and the asterisk (*) wildcard is to cover spelling variations of words referring to the same concept. In Scopus, the following reserved search words specific to the database were used: (i) TITLE-ABS-KEY to search the title, abstract and keywords, (ii) REF to search the document references, (iii) PUBYEAR to filter the results by year of publication.
Selection Criteria for Primary Studies: The studies were collected by verifying that the terms defined in the search strings were included within the title, abstract, keywords or body of the document (Fig. 1).
Studies that met the following criteria were considered: (i) solve the classic SCP, (ii) use initialization or local search methods, (iii) are published in English, (iv) contain the keywords defined in the search string, (v) were published from 2017 to the year of implementation of this mapping (2022). Meanwhile, studies that met the following criteria were excluded: (i) they do not contain detailed information or there is no access to it, (ii) repeated studies.
Data Extraction Strategy: In order to extract the relevant information from the primary studies and answer the research questions, they are grouped by similar characteristics that are described later. For studies that are not grouped, the main characteristics of the methods found are described. Then, the answers obtained are analyzed, and the main contributions found are discussed.
Synthesis Methods: The analysis of the primary studies was based on the representation of the information in tables, grouping the methods found according to common characteristics. The quality analysis of the studies was represented with a stacked bar graph, showing the percentage of studies that met each quality criterion according to their obtained score.
Systematic Mapping Calendar: The systematic mapping began in June 2022 and ended in October 2022.
B. Execution Stage
In this stage, the research protocol defined in the previous stage was applied through four iterations, one iteration for each research database. Table 3 presents the total number of studies found in the defined research databases, then the relevant, repeated, and primary ones.
III. RESULTS
In this stage, the research questions raised are presented along with their respective answers based on the selected primary studies.
A. What Initialization Methods have been Applied Recently to the SCP?
Table 4 shows the initialization methods found, indicating the approach applied and the type of algorithm (H or MH) with their respective quantities. The percentages are calculated taking as reference the 23 primary studies. 4% of the initialization methods were found in H algorithms, while 74% were found in MH algorithms. Two initialization approaches were identified: 65% of the studies are related to a randomized approach with solution repair, and 13% are related to other approaches based on iterative construction of solutions and based on n-approximate algorithms. The main characteristics of the methods found are described below.
Method | Studies | H | MH | Approach | Quantity | % |
---|---|---|---|---|---|---|
Random with heuristic operator and removal of columns | [16] - [27] | | 13 | Random, solution repair | 15 | 65% |
Random. Solution repair is applied at later stages | [28], [29] | | 2 | Random, solution repair | | |
Iterative construction of solutions | [30] | 1 | | Other approaches | 3 | 13% |
Based on n-approximate algorithms | [31], [32] | | 2 | Other approaches | | |
Total | | 1 | 17 | | 18 | |
Percentage | | 4% | 74% | | 78% | |
Studies [15]-[16] present population-based MH algorithms in which the initialization stage combines the random generation of solutions, a heuristic repair operator applied to solutions that do not meet the covering conditions, and a redundant column removal operator, which removes columns whose rows are all covered by other columns in the solution, seeking to improve solutions. These last two operators are applied again in later stages of the algorithms to ensure compliance with the covering conditions.
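This kind of initialization can be sketched as follows. It is a minimal illustration, not the implementation of any reviewed study: the toy instance, the function names, the cost-per-newly-covered-row repair rule, and the order in which redundant columns are tried are all assumptions. Each column is represented by the set of rows it covers, and every row is assumed to be coverable by at least one column.

```python
import random

# Hypothetical toy instance: cols[j] is the set of rows covered by column j.
cols = [{0, 1}, {1, 2}, {0, 3}, {2, 3}, {0, 2}]
costs = [3, 2, 2, 1, 4]
n_rows = 4


def uncovered_rows(x, cols, n_rows):
    """Rows not covered by any selected column of the binary vector x."""
    covered = set()
    for j, xj in enumerate(x):
        if xj:
            covered |= cols[j]
    return set(range(n_rows)) - covered


def repair(x, cols, costs, n_rows):
    """Greedily add columns until every row is covered."""
    x = list(x)
    missing = uncovered_rows(x, cols, n_rows)
    while missing:
        candidates = [j for j in range(len(cols)) if cols[j] & missing]
        # Assumed rule: lowest cost per newly covered row.
        j = min(candidates, key=lambda k: costs[k] / len(cols[k] & missing))
        x[j] = 1
        missing -= cols[j]
    return x


def remove_redundant(x, cols, costs, n_rows):
    """Drop columns whose rows are all covered by other selected columns."""
    x = list(x)
    # Assumed order: try the most expensive columns first.
    for j in sorted(range(len(x)), key=lambda k: -costs[k]):
        if x[j]:
            x[j] = 0
            if uncovered_rows(x, cols, n_rows):
                x[j] = 1  # the column was needed, so restore it
    return x


def initialize(pop_size, cols, costs, n_rows, rng):
    """Random generation followed by repair and redundant column removal."""
    population = []
    for _ in range(pop_size):
        x = [rng.randint(0, 1) for _ in cols]
        x = repair(x, cols, costs, n_rows)
        population.append(remove_redundant(x, cols, costs, n_rows))
    return population


population = initialize(5, cols, costs, n_rows, random.Random(0))
```

Every solution in `population` is feasible and contains no redundant column, which is the state the reviewed algorithms aim for before the main search begins.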
Meanwhile, [17] and [18] follow a similar approach, except that the initialization stage performs only the random generation of solutions. As in [15]-[16], the heuristic and column reduction operators are applied in later stages of the algorithms: in the variant of the binary monkey algorithm [17], they are applied after the watch-jump process, and in the binary version of the fruit fly algorithm [18], they are applied at the global view-based search stage.
In [19], the iterative solution construction phase is implemented using a set of evaluation functions and selection criteria to instantiate the columns of a solution set according to the constraint matrix of the problem entered.
In the initialization phase defined in [9], a set of initial solutions is built randomly, and a simple greedy heuristic then repairs infeasible solutions by adding the column j with the highest priority. The priority of each column is calculated in terms of the set of uncovered rows, the weight of row i, and the cost of column j.
In [20] and [21], population initialization methods based on n-approximate algorithms are applied, which guarantee that the cost of each generated solution is within a factor n of the optimal cost. These methods were coupled to a genetic algorithm in order to analyze their impact on the overall performance of the algorithm.
B. What Local Search Methods have been Applied Recently to the SCP?
Table 5 shows the local search methods grouped by algorithm type. The percentages are calculated taking as reference the 23 primary studies. 13% of the local search methods were found in H algorithms and 26% in MH algorithms. In addition, all methods were used by a single study except for the Configuration Checking strategy (CC) and Row Weighting-based Local Search (RWLS), which were used by two studies. The main characteristics of the methods found are described below.
Method | Studies | Algorithm | Quantity | % | |
---|---|---|---|---|---|
H | MH | ||||
Generation and selection of the best local neighbor | [18] | X | 1 | 4% | |
KNN Perturbation operator | [15] | X | 1 | 4% | |
Iterated Local Search ILS | [19] | X | 1 | 4% | |
Configuration Checking Strategy CC | [8], [22] | X | 2 | 9% | |
Row Weighting-based Local Search RWLS | [23][24] | X | 2 | 9% | |
JB Local Search | [25] | X | 1 | 4% | |
Row Weighting-based mutation and Local Search | [9] | X | 1 | 4% | |
Total | 3 | 6 | 9 | ||
Percentage | 13% | 26% | 40% |
A local search method is integrated into the fruit fly algorithm [18]. It is based on generating neighbors for each solution in the population and selecting the best local neighbor by evaluating the objective function. Neighbors of a solution are generated by randomly selecting columns and flipping their binary values. If infeasible solutions are obtained, a solution repair operator similar to the one used in the solution initialization of the algorithm is applied.
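The neighbor generation and selection step can be sketched as below. This is a simplified illustration under stated assumptions: the toy instance, the function names, and the parameters `n_neighbors` and `n_flips` are hypothetical, and infeasible neighbors are simply discarded here rather than repaired as described for [18].

```python
import random

# Hypothetical toy instance: cols[j] is the set of rows covered by column j.
cols = [{0, 1}, {1, 2}, {0, 3}, {2, 3}, {0, 2}]
costs = [3, 2, 2, 1, 4]
n_rows = 4


def cost(x, costs):
    """Total cost of the selected columns."""
    return sum(cj for cj, xj in zip(costs, x) if xj)


def is_feasible(x, cols, n_rows):
    """True if every row is covered by at least one selected column."""
    covered = set()
    for j, xj in enumerate(x):
        if xj:
            covered |= cols[j]
    return len(covered) == n_rows


def best_neighbor(x, cols, costs, n_rows, rng, n_neighbors=10, n_flips=1):
    """Generate random bit-flip neighbors of x and return the best feasible one.

    Returns x itself when no feasible neighbor improves the objective.
    """
    best = list(x)
    for _ in range(n_neighbors):
        y = list(x)
        for j in rng.sample(range(len(y)), n_flips):
            y[j] = 1 - y[j]  # flip the selected column to its binary opposite
        # Simplification: infeasible neighbors are skipped instead of repaired.
        if is_feasible(y, cols, n_rows) and cost(y, costs) < cost(best, costs):
            best = y
    return best


start = [1, 1, 1, 1, 1]
improved = best_neighbor(start, cols, costs, n_rows, random.Random(1))
```

Starting from the all-ones vector, any single flip removes one column while keeping every row covered in this instance, so `improved` is always cheaper than `start`.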
In [15], a solution perturbation operator based on the k-nearest neighbors (KNN) technique is implemented, which perturbs the list of solutions when a perturbation criterion is met. If the solution is within 25% of the best solutions of the iteration, a perturbation probability is defined for each element of the solution. Otherwise, the solution is randomly perturbed according to a defined parameter.
In [19], an iterated local search (ILS) procedure is proposed that improves solutions after the initialization phase: it randomly sets some solution variables to zero, applies a randomized greedy procedure to complete the solution, and replaces the current solution if the generated solution is better.
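The destroy, complete, and accept loop described for the ILS procedure can be sketched as follows. The toy instance, the function names, the restricted candidate list size `rcl_size`, and the number of destroyed variables `destroy` are assumptions made for illustration; the starting solution is assumed to be feasible.

```python
import random

# Hypothetical toy instance: cols[j] is the set of rows covered by column j.
cols = [{0, 1}, {1, 2}, {0, 3}, {2, 3}, {0, 2}]
costs = [3, 2, 2, 1, 4]
n_rows = 4


def cost(x, costs):
    return sum(cj for cj, xj in zip(costs, x) if xj)


def uncovered(x, cols, n_rows):
    covered = set()
    for j, xj in enumerate(x):
        if xj:
            covered |= cols[j]
    return set(range(n_rows)) - covered


def greedy_complete(x, cols, costs, n_rows, rng, rcl_size=2):
    """Randomized greedy: pick at random among the rcl_size best candidates."""
    x = list(x)
    missing = uncovered(x, cols, n_rows)
    while missing:
        cand = [j for j in range(len(cols)) if cols[j] & missing]
        # Assumed ranking: cost per newly covered row, lowest first.
        cand.sort(key=lambda k: costs[k] / len(cols[k] & missing))
        j = rng.choice(cand[:rcl_size])
        x[j] = 1
        missing -= cols[j]
    return x


def iterated_local_search(x, cols, costs, n_rows, rng, iters=50, destroy=2):
    """Repeatedly zero some selected variables, rebuild, and keep improvements."""
    best = list(x)
    for _ in range(iters):
        y = list(best)
        selected = [j for j, v in enumerate(y) if v]
        for j in rng.sample(selected, min(destroy, len(selected))):
            y[j] = 0  # randomly set some solution variables to zero
        y = greedy_complete(y, cols, costs, n_rows, rng)
        if cost(y, costs) < cost(best, costs):
            best = y  # replace the current solution only if it is better
    return best


start = [1, 1, 1, 1, 1]
result = iterated_local_search(start, cols, costs, n_rows, random.Random(0))
```

Because a candidate replaces the current solution only when it is strictly cheaper and the completion step always restores coverage, `result` is feasible and never worse than `start`.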
In [8] and [22], local search algorithms based on the configuration checking (CC) strategy are proposed; this strategy prevents the algorithms from revisiting previously explored regions of the search space. In [22], this strategy is integrated into a heuristic that distinguishes four different situations during the search and uses different variable selection rules in each situation to determine the bit exchange of a solution. In [8], the SCP is interpreted as a hypergraph, where the local search is responsible for adding or removing a hyperedge (column) of a candidate solution according to the evaluation of established rules. It then applies the weight diversity strategy to update the vertices covered and uncovered, respectively, by the candidate solution.
In [23] and [24], memetic approaches to the unicost SCP are presented that integrate the row weighting-based local search (RWLS) algorithm. RWLS is composed of three elements: a weighting scheme that updates the weights of the uncovered elements to avoid convergence to local optima, a tabu strategy to avoid possible cycles during the search, and a timestamp method to prioritize uncovered rows.
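The weighting element alone can be illustrated with a short sketch. It assumes an additive update (the weight of each uncovered row grows by one) and a column score equal to the total weight of the rows the column would newly cover; both rules, the toy instance, and the function names are assumptions, and the tabu and timestamp components are omitted.

```python
# Hypothetical toy instance: cols[j] is the set of rows covered by column j.
cols = [{0, 1}, {1, 2}, {0, 3}, {2, 3}, {0, 2}]
n_rows = 4


def covered_rows(x, cols):
    """Rows covered by the selected columns of the binary vector x."""
    covered = set()
    for j, xj in enumerate(x):
        if xj:
            covered |= cols[j]
    return covered


def bump_weights(weights, x, cols):
    """Assumed update: add one to the weight of every row left uncovered by x."""
    covered = covered_rows(x, cols)
    return [w if i in covered else w + 1 for i, w in enumerate(weights)]


def column_score(j, x, cols, weights):
    """Total weight of the rows that adding column j would newly cover."""
    covered = covered_rows(x, cols)
    return sum(weights[i] for i in cols[j] - covered)


x = [1, 0, 0, 0, 0]          # only column 0 selected: rows 2 and 3 uncovered
weights = [1] * n_rows
weights = bump_weights(weights, x, cols)
scores = [column_score(j, x, cols, weights) for j in range(len(cols))]
```

Rows that stay uncovered for longer accumulate weight, so columns covering them score higher and the search is pushed away from the region where it stalled.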
In [25], a local search algorithm called JB local search is implemented, which consists of eliminating the columns that exceed a cost threshold from the solution. Since the covering of one or more rows is likely to be lost with this action, the algorithm covers the uncovered rows based on cost criteria at the maximum cost allowed for a column to be included.
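A rough sketch of this remove-then-recover idea is shown below. It is not the procedure of [25]: the toy instance, the function name `threshold_step`, the choice of the cheapest admissible column for each uncovered row, and the fallback to any covering column when none respects the threshold are all assumptions.

```python
# Hypothetical toy instance: cols[j] is the set of rows covered by column j.
cols = [{0, 1}, {1, 2}, {0, 3}, {2, 3}, {0, 2}]
costs = [3, 2, 2, 1, 4]
n_rows = 4


def covered_rows(x, cols):
    covered = set()
    for j, xj in enumerate(x):
        if xj:
            covered |= cols[j]
    return covered


def threshold_step(x, cols, costs, n_rows, threshold):
    """Remove columns costlier than threshold, then re-cover the lost rows."""
    # Step 1: eliminate every selected column whose cost exceeds the threshold.
    x = [v if costs[j] <= threshold else 0 for j, v in enumerate(x)]
    missing = set(range(n_rows)) - covered_rows(x, cols)
    # Step 2: cover each uncovered row with the cheapest admissible column.
    while missing:
        i = min(missing)
        covering = [j for j in range(len(cols)) if i in cols[j]]
        allowed = [j for j in covering if costs[j] <= threshold]
        # Assumed fallback: ignore the threshold if no admissible column exists.
        j = min(allowed or covering, key=lambda k: costs[k])
        x[j] = 1
        missing -= cols[j]
    return x


x = [1, 0, 0, 1, 0]  # columns 0 and 3, total cost 4
y = threshold_step(x, cols, costs, n_rows, threshold=2)
```

With a threshold of 2, column 0 is removed and rows 0 and 1 are re-covered by columns 2 and 1. Note that the resulting cost is 5, higher than the starting cost of 4, so a step like this would typically be paired with an acceptance test or redundancy removal.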
In [9], a memetic approach to solving the SCP based on a genetic algorithm is shown. This study proposes a Row Weighting-based Mutation operator combined with a local search algorithm. Instead of randomly inverting bits of the solution with a small probability, a scoring term is defined for each bit of a gene. The genetic bit with the highest score is mutated, then local search is performed to build a better solution, and a variance crossover operator is applied that can exchange the evolutionary information of individuals. Finally, the best genes for the next generation are selected.
C. What is the Quality of the Studies Selected?
To evaluate and weigh the quality of the primary studies [26], we defined five criteria to be applied to each study and a compliance scale associated with a score with possible values (-1, 0, 1), as shown in Table 6.
Criterion | Description | Score -1 | Score 0 | Score 1 |
---|---|---|---|---|
C1 | The study presents a detailed description of the algorithms and methods used. | No | Partially | Yes |
C2 | The study presents the results obtained clearly and in detail. | No | Partially | Yes |
C3 | Number of measurements used by the study to present the results obtained (Max, Avg, RPD, etc.). | 1 | 2 or 3 | 3+ |
C4 | The study has been published in a relevant journal or conference (considering the JCR index). | No | Q4, Q3 | Q2, Q1 |
C5 | The study has been cited by other authors (according to the Google Scholar citation index). | 0 | 1 to 10 | 10+ |
Finally, the values obtained are added to obtain a total score for each study (Table 7).
Study | C1 | C2 | C3 | C4 | C5 | Total |
---|---|---|---|---|---|---|
[8] | 1 | 0 | -1 | 1 | 1 | 2 |
[9] | 1 | 1 | 0 | -1 | -1 | 0 |
[16] | 1 | 1 | 1 | 1 | 0 | 4 |
[17] | 1 | 1 | 0 | 1 | 0 | 3 |
[18] | 0 | 1 | 1 | 1 | 1 | 4 |
[19] | 0 | 1 | 0 | 0 | -1 | 0 |
[20] | 0 | 1 | 0 | 0 | 1 | 2 |
[21] | 0 | 1 | 0 | 0 | 0 | 1 |
[22] | 1 | 1 | 1 | 1 | 1 | 5 |
[23] | 0 | 1 | 0 | 0 | -1 | 0 |
[24] | 0 | 1 | 0 | 0 | -1 | 0 |
[25] | 0 | 1 | 0 | 0 | -1 | 0 |
[26] | 0 | 1 | 0 | 0 | -1 | 0 |
[27] | 0 | 1 | 0 | 0 | -1 | 0 |
[28] | 0 | 1 | 0 | 0 | 0 | 1 |
[29] | 1 | 1 | 0 | 1 | 1 | 4 |
[30] | 1 | 1 | 1 | 1 | 0 | 4 |
[31] | 1 | 0 | 1 | -1 | 0 | 1 |
[32] | 1 | 0 | 1 | 0 | -1 | 1 |
[33] | 1 | 0 | 0 | 1 | 0 | 2 |
[34] | 1 | 1 | 0 | 1 | -1 | 2 |
[35] | 1 | 1 | 1 | 1 | 1 | 5 |
[36] | 1 | 1 | 1 | 1 | -1 | 3 |
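The total score of a study is the plain sum of its five criterion scores. As a small worked example, the sketch below recomputes the totals for three rows copied from Table 7; the dictionary layout and variable names are illustrative assumptions.

```python
# Scores (C1, C2, C3, C4, C5) for three studies, copied from Table 7.
scores = {
    "[8]": (1, 0, -1, 1, 1),
    "[9]": (1, 1, 0, -1, -1),
    "[22]": (1, 1, 1, 1, 1),
}

# Each criterion takes a value in {-1, 0, 1}, so totals range from -5 to 5.
totals = {study: sum(values) for study, values in scores.items()}
print(totals)
```

The results (2, 0, and 5) match the Total column of Table 7 for these studies.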
Figure 2 shows the percentage distribution of the studies according to the quality criteria and scores obtained. It shows that approximately 53% of the studies describe in detail the algorithms and methods used (C1); 82% clearly reveal the results (C2); and 34% use more than three measures when showing the results (C3). In terms of publication and the number of citations of the studies, 47% of the studies were published in recognized journals or conferences (C4), and 23% have ten or more accumulated citations (C5). In the latter, it should be noted that approximately 43% of the studies have no citations (score -1).
IV. DISCUSSION
A. Main Observations
Tables 4 and 5 identify the H and MH algorithms applied to the SCP in recent years. Comparing these two approaches, the highest percentages (74% of the initialization methods and 26% of the local search methods) correspond to methods applied in MH algorithms. The dominance of this type of algorithm is due to the advantages it offers when solving both small and large-scale problems [10].
Meanwhile, two types of algorithms were identified in the primary studies to solve the SCP: (i) algorithms that have not been previously applied to this problem and (ii) modified algorithms based on a previous proposal. The results of some algorithms of type (i) [15][16][18][27] show that the average values obtained over the algorithm executions are far from the known optimal values of the test problems. This is related to the novelty of the proposed algorithms and the initial uncertainty about the quality of the results that they might return.
Population-based MH algorithms designed for continuous domains were found to be applied through binarization methods [15, 17-18], a percentile operator [28-31], and operators based on machine learning [15-16, 32-35] in order to solve binary-domain problems such as the SCP. These methods facilitate the adoption of algorithms that have not been applied to the SCP or that were recently proposed.
From the results obtained in Table 4, it can be seen that 65% of the studies found apply the random initialization method with heuristic search and elimination of redundant columns, which is due to the ease of its implementation to generate feasible solutions. Meanwhile, in other approaches that correspond to 13%, studies [20-21] stand out, where n-approximate algorithms are proposed as initialization methods which locate the search in promising regions of the search space in a shorter convergence time, increasing the chances of finding quality solutions.
From the results obtained in Table 5, it can be seen that 40% of the studies adapt local search methods. In addition, it was observed in studies [23][25] that adapting these methods to the base algorithm contributes to obtaining better results compared to the execution of the algorithm without applying local search.
Regarding quality assessment, it is observed that between 92% and 100% of the studies satisfactorily or partially fulfill the first four quality criteria. However, approximately 43% of the studies have not been cited, which may affect whether they are used as a reference in future research.
B. Limitations of Systematic Mapping
On executing the search strings in the defined research databases, they returned a large number of studies that address the formulation of real problems based on the SCP, as well as a large number of H and MH algorithms applied to the classic SCP and its variants, which included the keywords without detailing the search methods. For this reason, many studies were discarded (Table 3).
V. CONCLUSIONS
This systematic mapping found a trend toward applying MH algorithms and random initialization methods with heuristic search and removal of redundant columns to the SCP. In addition, it was found that adapting local search methods to MH algorithms applied to the SCP yields better results than executing the algorithm without local search.
The initialization and local search methods applied to the SCP found in this mapping can be applied in new algorithms to solve this type of problem, with the aim of analyzing their behavior and impact on the results obtained.
The mapping of recent studies should be continued, including other databases, in order to expand knowledge regarding initialization and local search methods applied to the SCP. Furthermore, the variants of this problem should be included in order to determine which methods have been used for them.