1. Introduction
Supervised learning techniques are commonly used in Machine Learning applications to design classifiers that solve pattern recognition problems. One such classifier is the Support Vector Machine (SVM), which aims to find a decision surface that maximizes the margin between two distinct groups in binary classification [1,3,4].
In embedded systems, hardware limitations can pose restrictions on implementing sophisticated algorithms. However, methods such as modular arithmetic can assist in overcoming these limitations by leading to less complex functional units compared to floating-point arithmetic. One specific approach to utilizing modular arithmetic in parallel hardware operations is the Residue Number System (RNS), which enables the representation of large numbers using a set of smaller numbers [5].
Although the RNS exhibits fast operations such as addition, subtraction, and multiplication, it poses challenges when dealing with non-linear operations like comparison and division [6]. Overcoming this limitation requires a specific approach: first, developing a method to represent input features using integers modulo-M; second, mapping non-linear functions such as the Kernel in SVM to integers modulo-M; and finally, utilizing the model to design an equivalent RNS-based implementation of SVM (SVM-RNS).
By leveraging the SVM-RNS equivalent as a foundational component in pattern recognition, it becomes feasible to explore more complex architectures for multi-class classifiers in domains such as surface electromyographic (sEMG) signal analysis [7,8].
2. Support Vector Machines (SVM)
In SVM classifiers, the decision surface is chosen to achieve the optimal margin, which involves maximizing the minimum distance between the two classes.
This problem can be formulated as an optimization task using the dual problem formulation:
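\max_{\lambda}\; W(\lambda) = \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j\, l_i l_j\, K(x_i, x_j), \qquad \text{subject to } 0 \le \lambda_i \le C, \quad \sum_{i=1}^{N} \lambda_i l_i = 0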
Here, each λ corresponds to a Lagrange Multiplier, K represents the Kernel function, C is the upper bound for λ, x represents an input feature vector from the training set, and l is the class label. Numerical optimization methods such as a simplified version of the Sequential Minimal Optimization (SMO) algorithm [9] can be employed to obtain the Lagrange Multipliers and Support Vectors (S.V), which form a hyperplane:
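g(x) = \sum_{i \in \mathrm{S.V}} \lambda_i\, l_i\, K(x_i, x) + b

As a minimal Python sketch (the function and argument names are illustrative, not taken from the original implementation), this decision function can be evaluated from a trained model as follows:

```python
def hyperplane(x, support_vectors, lambdas, labels, bias, kernel):
    """Evaluate g(x) = sum_i lambda_i * l_i * K(x_i, x) + b over the
    support vectors obtained from SMO training."""
    return sum(lam * l * kernel(sv, x)
               for sv, lam, l in zip(support_vectors, lambdas, labels)) + bias
```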
The Kernel function is a powerful tool that can transform non-separable categories into separable ones. One well-known example is the Gaussian Kernel, which can map finite-dimensional features onto a space with an infinite number of dimensions [10]. This transformation is achieved by calculating the Euclidean distance between the vectors xi and xj, which involves performing inner product operations:
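K(x_i, x_j) = \exp\!\left(-\gamma \lVert x_i - x_j \rVert^2\right), \qquad \lVert x_i - x_j \rVert^2 = \langle x_i, x_i \rangle - 2\langle x_i, x_j \rangle + \langle x_j, x_j \rangle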
After the training stage, the Lagrange Multipliers lie within the interval 0 ≤ λi ≤ C.
The class labels are defined as 1 for Class 1 and -1 for Class 2; that is, the labels li belong to the set {-1, 1}.
For classification purposes, the hyperplane g is used as the input to the sign function, so the predicted class is sign(g(x)).
3. Residue Number Systems (RNS)
In hardware implementations, several techniques are available to execute operations in parallel based on modular arithmetic [11]. For instance, consider an integer X that can be decomposed into a set of residues {x1, x2, …, xn} with respect to a moduli set {m1, m2, …, mn}, where the moduli are pairwise coprime. When the same procedure is applied to an integer Y, the following system can be defined:
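x_i = |X|_{m_i}, \quad y_i = |Y|_{m_i}, \quad |X \circ Y|_{m_i} = |x_i \circ y_i|_{m_i}, \qquad i = 1, \dots, n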
where ◦ denotes any operator from the set {+, -, *}, and M = m1 · m2 · … · mn represents the Dynamic Range of the moduli set, which defines the number of values representable in the ring [9]. Furthermore, the RNS is specified in this work under the following constraints:
Each integer X represented in this finite field must satisfy the following condition:
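0 \le X \le M - 1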
This main constraint is thoroughly analyzed in this work as it significantly influences every aspect of the SVM, as discussed in the following sections.
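As a minimal sketch of this correspondence (the moduli set {5, 6, 17} is illustrative; it yields the dynamic range M = 510 used as an example later):

```python
MODULI = (5, 6, 17)          # pairwise coprime; M = 5 * 6 * 17 = 510
M = 510

def to_rns(x):
    # Forward conversion: an integer becomes its set of residues
    return tuple(x % m for m in MODULI)

def rns_op(rx, ry, op):
    # Addition, subtraction, and multiplication act channel by channel,
    # which is what enables parallel hardware implementations
    return tuple(op(a, b) % m for a, b, m in zip(rx, ry, MODULI))

x, y = 123, 45
assert rns_op(to_rns(x), to_rns(y), lambda a, b: a + b) == to_rns((x + y) % M)
assert rns_op(to_rns(x), to_rns(y), lambda a, b: a * b) == to_rns((x * y) % M)
```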
4. Implementation constraints over integers
Recalling the previous analysis, it can be inferred that the SVM hyperplane satisfies the following equation:
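\operatorname{sign}\big(G_1 \cdot g(x_j)\big) = \operatorname{sign}\big(g(x_j)\big), \qquad G_1 > 0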
where G1 represents a gain that assists in mapping g(xj) into a limited scale. In the case of an integer implementation, there are more specific restrictions for each part of the hyperplane. Firstly, finite fields only allow the representation of a subset P composed of integers:
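P = \{0, 1, 2, \dots, M - 1\}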
This implies that an equivalent representation in the subset P must be obtained for every component of the SVM, including Lagrange Multipliers, feature vectors, Kernel function, class labels and comparator function.
5. Proposed Kernel function mapping over integers modulo-M
The Kernel function is an essential component of Support Vector Machines (SVMs) that maps features into a higher-dimensional space. A Radial Basis Function (RBF) measures the distance between two finite-dimensional vectors, which corresponds to an inner product between two infinite-dimensional vectors. This approach enhances separability, and by utilizing the Kernel Trick [10] in an SVM with an RBF Kernel, the hyperplane can be calculated without explicitly computing the infinite-dimensional inner product. The calculations can be performed using the following equation, which proposes a mapping over integers modulo-M:
where γ is a positive constant parameter. The exponential function delimits the range of the kernel to the interval (0, 1].
This function can be interpreted as being dependent on the distance d:
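z(d) = e^{-\gamma d^2}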
Based on this, the behavior of the function z is illustrated in Fig. 1.
This function has its domain and range in real numbers, but they can be mapped into integers by determining their respective bounds. The first step is to find the bounds of the domain. Essentially, d² represents the squared Euclidean distance between two k-dimensional vectors xi and xj:
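d^2 = \lVert x_i - x_j \rVert^2 = \sum_{n=1}^{k} (x_{i,n} - x_{j,n})^2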
If the x vectors are normalized, their components must lie within the interval [0, 1]. Under this assumption, the maximum distance between each pair of components is 1. Therefore, it can be inferred that the maximum distance between vectors is √k.
This implies that d² lies in the interval [0, k].
Taking this inequality into account, and defining the mapping function as:
Then:
This last inequality is useful for achieving the mapping of d² to integers, indicating that:
By utilizing the floor function to approximate the mapped distance, the following relationship is obtained:
This approximation provides a method for converting a normalized feature vector into its integer equivalent:
where M represents the dynamic range of the RNS and k is the number of dimensions of the feature vectors. This analysis provides an integer equivalent for the distance; the next step is to convert γ and the function z(d) to integers.
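For illustration, the following sketch quantizes a normalized feature vector so that the integer squared distance stays within the dynamic range; the scale factor sqrt(M/k) is an assumption consistent with the bounds derived above, not necessarily the exact constant of the original mapping:

```python
import math

def quantize_features(x, M, k):
    # Each normalized component in [0, 1] is scaled and floored; with this
    # (assumed) scale, pairwise component differences stay below sqrt(M/k)
    s = math.sqrt(M / k)
    return [math.floor(v * s) for v in x]

def integer_sq_distance(xi_hat, xj_hat):
    # Integer counterpart of d^2, bounded by roughly M for normalized inputs
    return sum((a - b) ** 2 for a, b in zip(xi_hat, xj_hat))
```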
To convert γ into an equivalent integer, it is necessary to compensate for the effect of the distance scaling in the exponent. Therefore, it is advisable to use the following mapping function:
These approximations lead to the following model for the exponential function:
Finally, to adjust the RBF kernel, the following integer-based equivalent is proposed:
In this case, the mapped kernel is a function whose range is contained in a bounded interval, which is a valid subset within the finite field. Ge is a positive constant gain that allows adjusting the amplitude of the exponential function according to the subsequent mapping of the SVM hyperplane.
For example, if M = 510, Ge = 5, and k = 2, the integer-based exponential equivalent will have the following bounds:
Thus, the integer exponential function will produce values within the range [0,25], which is within the valid subset of the finite field P.
This relationship is illustrated in Fig. 2.
The kernel function is typically stored in a Look-Up Table (LUT) or an equivalent form to simplify its approximation. Although other methods exist, such as using Taylor Series over finite fields, they can be challenging to implement. One major issue with the Taylor Series approach is the difficulty of finding a modular multiplicative inverse for the factorial function, which is needed to calculate the series coefficients. If the modulus is not a prime number and is not coprime with all Taylor coefficients, the existence of a modular multiplicative inverse cannot be guaranteed [12]. In the case of a composite number like M, it is not certain that all the Taylor Series coefficients will be available.
To visualize the integer exponential approximation, a block diagram is provided in Fig. 3.
6. Proposed support vector machine mapping over integers modulo-M
Building on the analysis presented in the previous sections, it is feasible to map the SVM hyperplane into integers by mapping all its components individually. Starting with the Kernel function represented in the modulo-M finite field, similar steps can be taken to convert the other parameters. A summary of these conversions is provided in Table 1.
By utilizing these conversion functions, it becomes possible to obtain the approximated bound of the hyperplane, referred to as glim. This parameter represents the maximum value of g within a specific area of interest, which corresponds to the vector space encompassing all possible feature vectors. In the case of a two-dimensional normalized feature, for instance, the area of interest is a square defined within the range [0, 1] in both dimensions, as depicted in Fig. 4.
To approximate the value of glim, the absolute value of g can be evaluated at q randomly selected points pj within the area of interest. By comparing these evaluations, the maximum value can be determined, yielding glim = max over j of |g(pj)|.
While numerical analysis techniques such as SMO (Sequential Minimal Optimization) or genetic algorithms could potentially provide a more precise estimation of glim, for the sake of simplicity this work utilizes the q random points method, as depicted in Fig. 5. Although it may not guarantee an optimal solution, it offers a practical and straightforward approach for approximating glim.
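A minimal sketch of this estimation, assuming a callable decision function g and a fixed seed for reproducibility (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_g_lim(g, q=10_000, k=2):
    # Sample q points uniformly in the normalized region [0, 1]^k and take
    # the largest absolute value of g as the estimate of g_lim
    points = rng.random((q, k))
    return max(abs(g(p)) for p in points)
```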
Recalling that the hyperplane can be multiplied by a positive gain G1 without altering its sign, one can utilize the following inequality:
Then:
In principle, Gplane can be used to map g into the available range. However, considering the raw approximation of glim, a slack of 20% is applied to Gplane, which corresponds to 80% of the total range:
Additionally, Glambda is a parameter that scales the Lagrange Multipliers in the hyperplane function. If the Lagrange Multipliers are significantly smaller than the available range, Glambda can be defined as a small fraction of Gplane:
Next, combining all the components of the SVM, the integer SVM can be expressed as follows:
Then, replacing gains:
From this model, the system gains can be determined based on the following conditions:
7. Proposed SVM model using Residue Number Systems
Based on the previous approaches, a new objective can be achieved: constructing an SVM classifier using RNS when the SVM is modeled in a modulo-M ring. The initial step in this approach is to select a suitable set of moduli. Using moduli related to powers of two (such as 2^n and 2^n ± 1, which remain pairwise coprime) is often a suitable choice to simplify the hardware implementation:
RNS residues for SVM parameters in the modulo-M system can be obtained by applying the conversions outlined in Table 2.
In Table 2, for each parameter p, there is a corresponding 3-tuple where each component is defined as follows:
When a Look-Up Table is used to represent the integer exponential, there will be a corresponding matrix of the same size in the RNS representation.
With this parameter conversion, it becomes suitable to create an equivalent hyperplane in RNS. However, it is necessary to define a new function in RNS that can replace the sign function:
Implementing the sign function, which involves comparing a number with zero to determine whether it is less than, greater than, or equal to zero, is not a straightforward task in RNS. However, a variant of the LPN algorithm [13] that is specifically designed for the chosen modulus set can be utilized to accomplish the goal of performing the comparison in RNS.
The first step in this process is to consider an integer number in modulo-M, denoted as X, and its equivalent in RNS, denoted as the residue set {R1, R2, R3}. The relationship between them can be expressed as Ri = |X|mi for i = 1, 2, 3, where |X|mi denotes the residue of X modulo mi.
A periodicity is evident in the set of residues with respect to the m1 and m3 moduli. This can be observed by defining:
The last residue in the set can be defined as:
By generalizing and assuming a period T, where T = m1 · m3, it can be observed that if a multiple of the period, kT, is added to or subtracted from an integer number X, the residue R2 is shifted by k units:
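|X \pm kT|_{m_1} = R_1, \qquad |X \pm kT|_{m_3} = R_3, \qquad |X \pm kT|_{m_2} = \big| R_2 \pm k\,|T|_{m_2} \big|_{m_2}

so that R2 shifts by exactly k units whenever |T|_{m_2} = 1, a property the chosen moduli set is assumed to satisfy.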
To compare numbers in RNS, periodicity can be leveraged by defining the following terms:
- Least Possible Number (LPN): this is the smallest integer whose residues R1 and R3 match those of the number being represented.
- Reference Residue (RR): this is the R2 residue of LPN.
Representing an integer in RNS using a modulus set can be thought of as a linear combination of its LPN (LPNx) and RR (RRx). This is illustrated by considering two integers, X and X', and their corresponding RNS residues:
Now, X’ can be defined as a linear combination of LPNx, R2 and RRx:
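X' = LPN_X + T \cdot \big| R_2 - RR_X \big|_{m_2}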
By means of modular arithmetic, it can be concluded that X is equivalent to X’ in RNS representation:
Since LPNx lies within the first period T = m1 · m3 and the period count is bounded by m2, it follows that X' = X exactly, with both values lying in the interval [0, M - 1].
Then, an algorithm based on these principles can be utilized to compare two integer numbers, denoted as X and Y, represented in RNS:
Then:
In terms of LPNs and RRs:
To simplify the comparison function in RNS, the subtraction of RRs from their corresponding R2 residues is carried out in modulo m2 to ensure positive values. In terms of hardware implementation, if LPNs and RRs are obtained from a memory-based LUT, the comparison function can be calculated solely using the LUTs and the residues of X and Y. This approach avoids the main challenge in RNS, which is the conversion from RNS to binary using methods such as the Chinese Remainder Theorem. The comparison function can be defined as follows:
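Since each number satisfies X = LPN_X + T · k_X with LPN_X < T, where k_X = |R_{2,X} - RR_X|_{m_2} is its period index, the comparison reduces to a lexicographic test:

X > Y \iff k_X > k_Y \ \text{ or }\ \big(k_X = k_Y \ \text{ and }\ LPN_X > LPN_Y\big)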
This last function can be implemented efficiently using adders and comparators. In signed RNS, positive numbers are represented in the range [0, M/2), while negative numbers are represented in the range [-M/2, 0), or equivalently in the nominal range [M/2, M - 1]. Using the complete nominal range [0, M - 1], the decision boundary between positive and negative numbers is M/2. Therefore, the sign function in RNS can be defined as follows:
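\operatorname{sign}_{RNS}(X) = \begin{cases} +1 & 0 < X < M/2 \\ \;\;0 & X = 0 \\ -1 & M/2 \le X \le M - 1 \end{cases}

The sketch below ties the pieces together. The moduli set {5, 6, 17} is an illustrative assumption, chosen so that T = m1 · m3 = 85 satisfies |T|_{m2} = 1 as the periodicity argument requires; the LUT construction follows the LPN and RR definitions above:

```python
M1, M2, M3 = 5, 6, 17
T = M1 * M3                 # period of the (R1, R3) residue pair
M = M1 * M2 * M3            # dynamic range, 510

# LUTs: for every (R1, R3) pair, the Least Possible Number and Reference Residue
LPN = {}
for x in range(T):
    LPN.setdefault((x % M1, x % M3), x)
RR = {key: lpn % M2 for key, lpn in LPN.items()}

def to_rns(x):
    return (x % M1, x % M2, x % M3)

def period_index(r):
    # Number of whole periods T elapsed, recovered from R2 and the RR LUT
    r1, r2, r3 = r
    return (r2 - RR[(r1, r3)]) % M2

def rns_compare(rx, ry):
    # Return -1, 0 or 1 as X <, =, > Y, using only residues and the LUTs
    ax = (period_index(rx), LPN[(rx[0], rx[2])])
    ay = (period_index(ry), LPN[(ry[0], ry[2])])
    return (ax > ay) - (ax < ay)

def rns_sign(r):
    # Nominal-range convention: [1, M/2) is positive, [M/2, M - 1] is negative
    x = period_index(r) * T + LPN[(r[0], r[2])]
    return 0 if x == 0 else (1 if x < M / 2 else -1)

assert rns_compare(to_rns(100), to_rns(99)) == 1
assert rns_sign(to_rns(300)) == -1      # 300 >= M/2, so it encodes a negative value
```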
This property enables the use of RNS for representing SVM-RBF classifiers.
8. Results
The methods discussed in the previous sections were implemented to demonstrate the feasibility of SVMs with a Radial Basis Function kernel in both modulo-M and RNS representations. The implementation was carried out using the Python programming language along with libraries such as Numpy, sklearn, and matplotlib. The SVM training method used was a simplified SMO-algorithm-based approach (SMO-lite).
Python proved to be a versatile and widely used platform for implementing SVM-RBF models. The powerful numerical computing library, Numpy, facilitated efficient mathematical operations and array manipulation. The sklearn library provided convenient tools for machine learning, including SVM implementations. Matplotlib was utilized for visualizing the results and analyzing the performance of the models.
As an example, a dataset of 25 two-dimensional feature vectors was analyzed, divided into two distinct groups, with the main group centered around a cluster. To ensure consistency, all samples were normalized and subjected to Gaussian noise with a standard deviation of 0.1. The Radial Basis Function (RBF) kernel used a γ parameter set to 1.
Figs. 6-8 compare the decision boundaries of the SVM-RBF approximation in modulo-M to the original SMO-lite algorithm, sklearn SVM, and a Maclaurin Series approximation with a ninth-degree polynomial. The effect of changing the dynamic range (M) is also depicted, with α and β parameters influencing the shape of the decision boundaries.
Fig. 9 shows the plot of the sign function in Python using the LPN method. The plot labels the first half of the nominal range as positive (+1) and the second half as negative (-1), while a zero input yields a sign of 0, following the Dirichlet conditions.
Finally, Fig. 10 compares the four previous methods to a fifth SVM approach, known as SVM-RNS, which maps the modulo-M equivalent to the RNS representation. The figure provides a visual comparison of the performance and accuracy of these different methods.
The results obtained indicate that the SVM in RNS and the SVM in modulo-M demonstrate performance similar to the other methods, with decision boundaries that are almost identical across different moduli sets. In addition, the LPN method for implementing the sign function has been shown to be effective for classification purposes in SVM-RNS. This suggests that the LPN-based approach provides a viable solution for implementing the sign function in RNS-based SVM classifiers.
9. Literature review
Table 3 provides a summary of recent research studies that have focused on utilizing RNS in machine learning applications to enhance computing performance. The literature review reveals that incorporating RNS in signal processing systems can lead to substantial benefits such as improved power efficiency and reduced latency.
10. Conclusions
The results show that it is possible to implement SVMs using integer numbers, even with nonlinear kernel functions such as Radial Basis Functions. The modulo-M approximation provides a better fit to the original SVM when the dynamic range is increased, allowing for more possible represented values. However, this approach requires a larger Look-Up Table (LUT) size, which may demand more memory resources.
Furthermore, the implementation of SVM-RNS demonstrates that Residue Number System (RNS) is suitable for SVMs with RBF kernels. This opens possibilities for future research in developing hardware-based methods for implementing classifiers. RNS-based SVMs provide a promising avenue for efficient and robust hardware implementations of SVM classifiers, which can be explored in future studies.
Both Look-Up Tables (LUTs) and the Residue Number System (RNS) have been effective techniques for feature extraction, as in the case of Wavelet coefficients [18], and for training Artificial Neural Networks (ANNs) [19]. In addition, modular arithmetic has been used in various deep learning acceleration techniques, such as employing Number Theoretical Transforms (NTTs) in convolutional layers [20].
It is important to note that approaches like Convolutional Neural Network (CNN) require more resources compared to an SVM and may offer similar performance in certain applications, such as EMG pattern recognition [21]. Thus, the choice between SVMs and CNNs should depend on the specific requirements and constraints of a given application. Depending on the situation, SVMs can provide a simpler and more resource-efficient solution while still achieving satisfactory performance.
Nonlinear activation functions play a crucial role in introducing complex decision boundaries in the stages of a classifier. Commonly used nonlinear activation functions include softmax, logistic, hyperbolic, and ReLU [22]. In the context of neural network development with the RNS system, advantages have been observed in performing addition, subtraction, and multiplication operations [15].
However, when it comes to using nonlinear functions such as division, comparison, and exponentiation in the RNS, there is a high computational cost involved [23]. Previous research has explored various implementations, including representations based on the Mixed Radix Representation and the Chinese Remainder Theorem [24]. However, these methods can be complex to implement. In the case of the comparison operation, the LPN method has recently been implemented in a memristor-based processing-in-memory (PIM) array [13]. In other recent research, a comparable approach has been demonstrated utilizing the Mixed Radix representation with dynamic range partitioning [25]. It is noteworthy that one of the major contributions of this research is the proposal of a method to utilize SVM with RNS. This method involves the use of the LPN-based comparison operation with a modulus set distinct from the one utilized in the PIM implementation.
Additionally, the research presents a methodology for performing the exponential operation in RNS. Both of these objectives hold fundamental importance in advancing the practical application of unconventional numerical systems, such as RNS, in pattern recognition systems and digital signal processing [11,15,16,24,26].