Hardware Design of the Discrete Wavelet Transform: an Analysis of Complexity, Accuracy and Operating Frequency

Ballesteros-L., Dora M.; Renza, Diego; Fernando-Pedraza, Luis

doi:10.17230/ingciencia.12.24.6

Ingeniería y Ciencia

Print version ISSN 1794-9165

ing.cienc. vol.12 no.24 Medellín July/Dec. 2016

https://doi.org/10.17230/ingciencia.12.24.6

Research Article

Hardware Design of the Discrete Wavelet Transform: an Analysis of Complexity, Accuracy and Operating Frequency

Diseño hardware de la transformada wavelet discreta: un análisis de complejidad, precisión y frecuencia de operación

Dora M. Ballesteros-L.¹

Diego Renza²

Luis Fernando-Pedraza³

¹ Universidad Militar Nueva Granada, dora.ballesteros@unimilitar.edu.co, ORCID: http://orcid.org/0000.0003-4399-823X, Bogotá, Colombia.

² Universidad Militar Nueva Granada, diego.renza@unimilitar.edu.co, ORCID: http://orcid.org/0000.0003-4399-823X, Bogotá, Colombia.

³ Universidad Distrital Francisco José de Caldas, lfpedrazam@distrital.edu.co, ORCID: http://orcid.org/0000.0003-4399-823X, Bogotá, Colombia.

Abstract

The purpose of this paper is to present a comparative analysis of hardware designs of the Discrete Wavelet Transform (DWT) in terms of three design goals: accuracy, hardware cost and operating frequency. Every design should take into account the following factors: method (non-polyphase, polyphase or lifting), topology (multiplier-based or multiplierless-based), structure (conventional or pipelined), and quantization format (floating-point, fixed-point, CSD or integer). Since the DWT is widely used in several applications (e.g. compression, filtering, coding and pattern recognition, among others), the selection of adequate parameters plays an important role in the performance of these systems.

Key words: Discrete Wavelet Transform; topology; quantization format; accuracy

Resumen

El propósito de este documento es presentar un análisis comparativo de esquemas hardware de la Transformada Wavelet Discreta, DWT, en términos de tres objetivos de diseño: precisión, complejidad y frecuencia de operación. Cada diseño debe considerar los siguientes aspectos: método (no polifásico, polifásico y lifting), topología (basados en multiplicadores y sin multiplicadores), estructura (convencional o pipeline) y formato de cuantización (punto flotante, punto fijo, CSD o entero). Dado que la DWT es ampliamente utilizada en diversas aplicaciones (por ejemplo en compresión, filtrado, codificación, reconocimiento de patrones, entre otras), la selección adecuada de parámetros de diseño desempeña un papel importante en el diseño de estos sistemas.

Palabras clave: DWT; topología; formato de cuantización; precisión

1 Introduction

The Discrete Wavelet Transform (DWT) is a powerful tool for the multi-resolution analysis of different kinds of signals (e.g. biomedical, voice, image and video). Among other uses, the DWT is applied in filtering (1-3), compression (4-10), pattern recognition (11-14) and coding (15-19). Since several systems require real-time operation (20-26), the design of hardware topologies for the DWT remains a current issue.

The hardware implementation of the DWT involves several design choices. In terms of the method, it can be carried out with non-polyphase (convolution-based) (27), polyphase (28-31) or lifting schemes (32-34). In terms of the topology, the schemes are multiplier-based (27, 35-39) or multiplierless-based (28-31, 40-49). In terms of the quantization of the filter weights, the system can work with floating-point data (35), fixed-point data (27, 36, 37, 41), Canonical Signed Digit (CSD) (30, 31, 40) or integer data (28, 29, 42-45). Finally, the structure is conventional (27-31, 40, 42-45) or pipelined (41, 50-55).

Every choice plays an important role in the performance of the system. For example, non-polyphase schemes are easier to design than the others, but have lower throughput. Lifting schemes with non-pipelined structures have a higher path delay than non-polyphase schemes. Quantization error decreases with longer word lengths, but the hardware cost increases. Therefore, the following design aims must be taken into account: high accuracy, high operating frequency and low hardware cost. No single design optimizes all three objectives simultaneously: a design that is good for one aim is not necessarily good for another.

The rest of the paper is organized as follows. First, the background of the Discrete Wavelet Transform is presented. Second, a review of works in terms of complexity is given. Third, the main concepts behind accuracy and some of the most remarkable works in terms of accuracy are presented. Then, pipelined and conventional schemes are discussed.

2 Background of the discrete wavelet transform

The Discrete Wavelet Transform (DWT) is a multi-resolution transform in which both the time and the frequency content of the input signal are analyzed. From the point of view of filter banks, the DWT is carried out in two stages: first, the input signal is filtered with two half-band filters (a low-pass filter and a high-pass filter); second, the filtered signals are decimated by a factor of two. Many filters satisfy the conditions of the wavelet transform, and they are grouped in families. Within the same family, the filters differ in length.

The simplest representation of the DWT is the convolution (non-polyphase) approach. In this case, the two stages of the DWT are clearly separated (Figure 1). If the DWT is designed as a Finite State Machine (FSM), the first state computes the filtered signals (which can take several clock cycles), and the second state discards half of the data (the decimation process). Although its hardware implementation is less complex, its throughput is not the highest possible.

Figure 1: Generic block diagram of the non-polyphase scheme.

The second design method is the polyphase one. In this case, convolution and decimation are carried out in the same state. The input signal is first down-sampled (i.e. split) into the data of even clock cycles and the data of odd clock cycles. Then, the even data are filtered with the even weights of the filter (low-pass or high-pass) and the odd data are filtered with the odd weights of the filter. Finally, the even and odd filtered data of the same filter are added (Figure 2) (29). Unlike the convolution approach, no computed data are discarded. Therefore, the throughput of this kind of scheme is double that of convolution-based schemes.

Figure 2: Generic block diagram of the polyphase scheme.
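The equivalence between the two methods can be checked numerically. The following is a minimal software sketch, not a hardware description: it assumes integer 5/3 low-pass weights with the constant factor left out, and the function names and test signal are illustrative choices rather than part of any cited design.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def dwt_branch_nonpolyphase(x, h):
    """Filter the whole signal, then keep only the even-indexed outputs."""
    return convolve(x, h)[::2]


def dwt_branch_polyphase(x, h):
    """Split signal and filter into even/odd phases, filter at half rate, add."""
    x_even, x_odd = x[::2], x[1::2]
    h_even, h_odd = h[::2], h[1::2]
    y = [0.0] * ((len(x) + len(h)) // 2)  # number of even-indexed outputs
    # Even samples meet even weights with no extra delay.
    for n, v in enumerate(convolve(x_even, h_even)):
        y[n] += v
    # Odd samples meet odd weights one half-rate sample later.
    for n, v in enumerate(convolve(x_odd, h_odd)):
        y[n + 1] += v
    return y


x = [23, 5, 17, 8, 42, 4, 16, 15]   # illustrative input samples
h = [-1, 2, 6, 2, -1]               # assumed integer 5/3 low-pass weights
print(dwt_branch_nonpolyphase(x, h))
print(dwt_branch_polyphase(x, h))
```

Both functions return the same sequence, but the polyphase version never computes the outputs that decimation would throw away, which is where the throughput gain comes from.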

A special case of the polyphase scheme is the lifting approach. As in the polyphase structure, the input signal is down-sampled before the filtering process. However, the approximation and detail coefficients are calculated using a P (prediction) unit and a U (updating) unit. The detail coefficients are obtained from the odd part and the result of the P unit. The approximation coefficients are obtained from the even part and the result of the U unit applied to the detail coefficients. The P and U functions are directly related to the selected wavelet base. Figure 3 shows a generic block diagram in which these functions are not specified.

Figure 3: Generic block diagram of the lifting scheme.

In terms of throughput, the lifting scheme matches the polyphase scheme. The differences lie in hardware resources and latency, which depend on the P and U functions (and therefore on the selected wavelet base).
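As an illustration of concrete P and U functions, the sketch below uses the integer 5/3 lifting steps known from lossless JPEG 2000. This choice, the symmetric handling of the borders and the function names are assumptions made for the example; they are not taken from the designs reviewed in this paper.

```python
def lifting_53_forward(x):
    """One level of integer 5/3 lifting for an even-length list of integers."""
    even, odd = x[::2], x[1::2]
    n = len(odd)
    # P (prediction): each odd sample is predicted from its two even neighbours.
    detail = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1)
              for i in range(n)]
    # U (update): each even sample is corrected with the neighbouring details.
    approx = [even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
              for i in range(n)]
    return approx, detail


def lifting_53_inverse(approx, detail):
    """Undo the U step, then the P step, and interleave the two phases."""
    n = len(detail)
    even = [approx[i] - ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
            for i in range(n)]
    odd = [detail[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1)
           for i in range(n)]
    x = [0] * (2 * n)
    x[::2], x[1::2] = even, odd
    return x


signal = [23, 5, 17, 8, 42, 4, 16, 15]   # illustrative input samples
approx, detail = lifting_53_forward(signal)
print(approx, detail)
print(lifting_53_inverse(approx, detail) == signal)
```

Only additions and shifts appear in both steps, and the inverse simply replays the same operations in reverse order with opposite signs, so the reconstruction is exact for integer inputs.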

3 Design goal: complexity

Since the DWT relies on a small set of mathematical operations (multiplication, addition and down-sampling), one parameter to take into account is the topology, which can be multiplier-based or multiplierless-based. In the first case, the convolution between the input signal and the filter weights is carried out by multiplier units; in the second case, it is computed with right-shifts and left-shifts.

In order to illustrate the differences between the topologies with an example, we have selected the 5/3 wavelet base (i.e. CDF 2,2). The filter weights are as follows:

(1)

(2)

where h₀ is the low-pass filter, h₁ is the high-pass filter, k is in the range [0, 4] for h₀, and k is in the range [0, 2] for h₁.
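For orientation, the 5/3 analysis filters are often written with a √2 normalization. That normalization is assumed here because it is consistent with the quantized value |h₀(1)| = 0.15625 used in Section 4. The sign convention of the high-pass filter varies between references, so the form below is one common choice and not necessarily the one used by the authors:

```latex
h_0(k) = \frac{\sqrt{2}}{8}\,\{-1,\ 2,\ 6,\ 2,\ -1\}
       \approx \{-0.1768,\ 0.3536,\ 1.0607,\ 0.3536,\ -0.1768\}

h_1(k) = \frac{\sqrt{2}}{4}\,\{-1,\ 2,\ -1\}
       \approx \{-0.3536,\ 0.7071,\ -0.3536\}
```

Written this way, each filter is a set of small integers times a single constant, which is the property that the multiplierless designs discussed below rely on.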

In multiplier-based schemes, the design uses one multiplier for each weight of the filter (i.e. 5 multipliers for h₀ and 3 multipliers for h₁), one adder to obtain the approximation coefficients and one adder to obtain the detail coefficients. The multiplier units must be able to multiply data in floating-point or fixed-point format, with several bits at the inputs and outputs. Therefore, hardware resources are directly related to the word length of the input signals: the longer the input word length, the higher the hardware cost. Like the multiplier units, the adder units must work with several bits, so their hardware cost is also directly related to the word length.

Figure 4 shows a generic block diagram for the low-pass filter of the 5/3 wavelet base, for a multiplier-based topology.

Figure 4: Generic block diagram of the low-pass filter 5/3 wavelet base: multiplier-based topology.

If the input signal is quantized to 16 bits and the filter weights are quantized to 8 bits (e.g. in fixed-point format), the multiplier units must work with 23 bits and the adder unit must work with at least 23 bits. The higher the total number of bits, the longer the delay of the multiplication process.

On the other hand, in multiplierless-based schemes the multiplier units are removed from the design and the mathematical operations are carried out by left-shifts or right-shifts. If the signal is left-shifted, one bit with value 0_b is appended at the right of the signal; if the signal is right-shifted, the least significant bit of the signal is discarded. A left-shift is equivalent to multiplying the input by 2; a right-shift is equivalent to taking the integer part of the input divided by 2. For example, if the input signal is 10111_b, the result of the left-shift is 101110_b and the result of the right-shift is 1011_b. In decimal format, the input signal is 23, the result of the left-shift is 46 and the result of the right-shift is 11. As a consequence of the right-shift, a truncation (clipping) error appears. However, this error is small, and so the quantization error remains low.
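The shift example can be reproduced in a few lines of software; the variable names are illustrative only.

```python
x = 0b10111            # 23, the example value used in the text
left = x << 1          # appends a 0 bit on the right
right = x >> 1         # drops the least significant bit

print(bin(left), left)     # 0b101110 46
print(bin(right), right)   # 0b1011 11
print(x / 2 - right)       # 0.5, the part lost by discarding the LSB
```

The last line shows the size of the truncation error for this input: half a unit, because the discarded bit was 1.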

In multiplierless-based schemes, if data are quantized in integer format, the length of the internal signals is significantly lower than in a multiplier-based topology, even if the latter also uses integer quantization. For example, suppose that the input signal has 5 bits (e.g. 23 = 10111_b) and the filter weight has 2 bits (e.g. 2 = 10_b). With a multiplier unit the result has 7 bits (e.g. 46 = 0101110_b). However, as explained in the previous paragraph, the result with one left-shift (which is equivalent to multiplying by 2) has 6 bits (e.g. 101110_b). Although the input data are the same in both topologies, multiplierless-based schemes need fewer bits than multiplier-based schemes.

Figure 5 shows a generic block diagram for the low-pass filter of the 5/3 wavelet base, for a multiplierless-based topology with integer data. The constant factor has been ignored; it is taken into account in a post-amplifier stage.

An example of a multiplierless-based topology with integer quantization of the filter weights is found in the work of Ballesteros and Moreno (28). In that case, left-shift, right-shift, delay and split units are used to compute the 5/3 wavelet base. In terms of hardware resources, the wavelet transform uses 99 slice registers, 130 slice LUTs, 87 LUT-FF pairs and 51 bonded IOBs. With that design the maximum delay is 3.59 ns, with a latency of 2. This design was used for data hiding purposes (29).

Figure 5: Generic block diagram of the low-pass filter 5/3 wavelet base: multiplierless-based topology.

In summary, in terms of complexity, a design with a multiplierless-based topology and integer quantization of the filter weights is preferable to a multiplier-based topology, even if the latter also uses integer quantization of the filter weights.
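The difference between the two topologies can be sketched in software for one output sample of the 5/3 low-pass filter. The integer weights (-1, 2, 6, 2, -1) assume that the constant factor is left to a later stage, and the decomposition of 6 into two shifts is one possible choice; neither is claimed to match the internal structure of Figure 5 or of any cited design.

```python
def lowpass_53_multiplier(window):
    """Multiplier-based form: one product per filter weight."""
    weights = (-1, 2, 6, 2, -1)
    return sum(w * s for w, s in zip(weights, window))


def lowpass_53_shift_add(window):
    """Multiplierless form: 2*s = s << 1 and 6*s = (s << 2) + (s << 1)."""
    a, b, c, d, e = window
    return -a + (b << 1) + (c << 2) + (c << 1) + (d << 1) - e


window = (23, 5, 17, 8, 42)   # five consecutive input samples (illustrative)
print(lowpass_53_multiplier(window))
print(lowpass_53_shift_add(window))
```

Both functions give the same value, but the second one uses only shifts, additions and subtractions, which in hardware are wiring and adders and avoid multiplier blocks altogether.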

4 Design goal: accuracy

One of the most important requirements in several systems is accuracy. If a system satisfies this requirement, the user knows that the obtained data are very close to the theoretical data. In hardware wavelet-based systems, it is desirable that the quantization error (qe) of the filter weights be as low as possible, so that the obtained data are very close to the theoretical values. If the system uses both the decomposition (DWT) and the reconstruction (IDWT) stages, the total error due to quantization is known as the reconstruction error (re). Some applications tolerate values of re up to 2 %; in other cases (e.g. data hiding systems) re must be lower than 0.1 %. In this section, some architectures of the DWT-IDWT are presented and analyzed in terms of accuracy.

Since accuracy is strongly related to the quantization of the filter weights, the main point in the design is to select the most appropriate format to represent the data. There are four formats: floating-point, fixed-point, Canonical Signed Digit (CSD) and integer. In floating-point format, the filter weights are represented by bits assigned to the exponent and the mantissa. The higher the number of bits, the lower the quantization error; nevertheless, higher precision implies higher hardware cost. In fixed-point format, the binary representation has two parts: an integer part and a fractional part. As in floating-point format, the total number of bits is strongly related to the quantization error. In CSD, every non-zero digit can be a positive or a negative power of two (e.g. −2⁻² = −0.25), so the total number of bits needed to represent the data is lower than in fixed-point format (because it does not need a sign bit). Finally, integer format is the simplest in terms of binary representation. It is useful in multiplierless topologies, in which data operations (multiplication, division) are performed by right-shifts and left-shifts.
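To make the CSD idea concrete, the sketch below converts a non-negative integer into signed digits in {-1, 0, 1} with no two adjacent non-zero digits, using the standard non-adjacent-form recoding. The function name is illustrative, and a fractional weight would first have to be scaled to an integer, which is not shown here.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


print(to_csd(7))    # [-1, 0, 0, 1], i.e. 8 - 1
print(to_csd(23))   # [-1, 0, 0, -1, 0, 1], i.e. 32 - 8 - 1
```

In plain binary, 23 has four non-zero bits, while its CSD form has three non-zero digits. Each non-zero digit costs one shifted addition or subtraction in a multiplierless filter, so fewer non-zero digits means fewer adders.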

In order to illustrate the quantization error with an example, we have selected the 5/3 wavelet base, whose weights are given in Eqs. 1 and 2.

In fixed-point format, if the weights are represented with six bits, one bit is for the integer part and five bits are for the fractional part (e.g. |h₀(1)| = 0.00101_b = 0.15625). In this example, the sign is not included in the binary data. The quantized filters for the 5/3 wavelet base are obtained as follows:

(3)

(4)

Since by definition the low-pass weights sum to √2, and in the current case the quantized low-pass weights sum to 1.4375, the total quantized error is 1.64 %.
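The 1.64 % figure can be checked by hand under two assumptions that are not spelled out in the text: that the error is measured on the sum of the low-pass weights, whose theoretical value is √2 for this normalization, and that the six-bit quantized low-pass weights are {-0.15625, 0.34375, 1.0625, 0.34375, -0.15625}, a set chosen here because it is consistent with |h₀(1)| = 0.15625:

```latex
\sum_k \hat{h}_0(k) = 2(-0.15625) + 2(0.34375) + 1.0625 = 1.4375

qe = \frac{\left| 1.4375 - \sqrt{2} \right|}{\sqrt{2}} \times 100\,\%
   = \frac{0.02329}{1.41421} \times 100\,\% \approx 1.646\,\%
```

This is in line with the 1.64 % quoted above, where the symbol with a hat denotes the quantized weights.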

In integer format, the common term of this wavelet base can be factored out in a similar way to (28, 29), so that the weights are represented by rational terms in which both the numerator and the denominator are integers. The quantization error is then due only to the division, which is performed by right-shifts of the data (i.e. 1/2^p needs p right-shifts). The authors of (28, 29) found that the quantization error is up to 0.0031 %. In data hiding schemes based on LSB (Least Significant Bit) substitution, a very low quantization error is very useful for recovering the embedded data.

As a second example of quantization in fixed-point format, suppose that the db2 wavelet base is selected. The decomposition filters are shown in Equations 5 and 6:

(5)

(6)

With nine bits, the binary representation of the weights is, for example, |h₀(1)| = 0.00100001_b = 0.12890625. The quantized filters for db2 are obtained as follows:

(7)

(8)

In the current case the quantized low-pass weights sum to 1.4140625, and then the total quantized error is 0.01 %. This result is better than the one obtained in the first example; however, the quantization here uses nine bits instead of six.
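The db2 example can be reproduced with a short script. It is a sketch under stated assumptions: the low-pass weights are taken from the closed-form db2 expressions, each weight is rounded to the nearest multiple of 2⁻⁸ (one integer bit plus eight fractional bits), and the error is measured on the sum of the weights against its theoretical value √2. The function and variable names are illustrative.

```python
import math


def quantize(weights, frac_bits):
    """Round each weight to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return [round(w * scale) / scale for w in weights]


s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
# Closed-form db2 low-pass decomposition weights; their sum is sqrt(2).
h0 = [(1 - s3) / (4 * s2), (3 - s3) / (4 * s2),
      (3 + s3) / (4 * s2), (1 + s3) / (4 * s2)]

h0_q = quantize(h0, frac_bits=8)            # 1 integer bit + 8 fractional bits
error_pct = abs(sum(h0_q) - s2) / s2 * 100  # error on the sum of the weights

print(h0_q)
print(round(error_pct, 4))
```

Under these assumptions the first quantized weight has magnitude 0.12890625, as in the text, and the error comes out at roughly 0.01 %. Changing `frac_bits` shows how quickly the error grows when fewer bits are available.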

Table 1 compares some works on the design of the wavelet transform. The first column gives the proposal and its design method (Non-pol.: non-polyphase, Pol.: polyphase, Lif.: lifting). The second column gives the topology (M: multiplier-based, Ml: multiplierless-based). The third column gives the type of structure (C: conventional, P: pipelined). The fourth column gives the quantization format. The fifth column gives the highest quantization error and/or the total reconstruction error. Finally, the sixth column lists the strengths and weaknesses of the proposal.

Table 1: Comparison in terms of accuracy.

Work and method	Topology (T)	Structure (S)	Quantization (Q)	qe (max), re (total)	Strengths and weaknesses
(35) Lif.	M	P	Floating-point	Non-fixed	It allows choosing the desired data precision. Long word width.
(36) Lif.	M	P	Fixed-point	qe ≤ 3 %	Large internal signals. Medium operating frequency.
(27) Non-pol.	M	C	Fixed-point	qe ≤ 2 %	No external memories. Very low latency.
(37) Lif.	M	P	Fixed-point	qe ≤ 0.02 %	Very low latency. Long word width. Medium hardware cost.
(30, 31) Pol.	Ml	C	CSD	qe ≤ 7 %	Long sum-product terms. Long latency.
(40) Non-pol.	Ml	C	CSD	qe ≤ 4 %	Long sum-product terms. Long latency.
(41) Lif.	Ml	P	Fixed-point	qe ≤ 6 %	High operating frequency. Medium hardware cost.
(42)	Ml	C	Integer	qe ≤ 5 %	It can work for image processing.
(43, 44) Lif.	Ml	C	Integer	qe ≤ 2.5 %	Non-reconfigurable. Medium hardware cost. Real-time operation.
(45) Pol.	Ml	C	Integer	qe ≤ 0.0031 %	Non-reconfigurable. Medium hardware cost. Real-time operation.
(28, 29) Pol.	Ml	C	Integer	qe ≤ 0.0031 %, re ≤ 0.0092 %	Non-reconfigurable. Low latency and low hardware cost. Real-time operation.

As expected, the quantization format is the most important parameter in terms of accuracy. A very low quantization error can be obtained with either multiplier-based or multiplierless-based topologies. However, designs are less complex with multiplierless-based topologies because they need only shifts (instead of multiplier units).

If accuracy is the goal of the design, multiplierless-based topologies with integer quantization are suggested. Since the throughput of lifting and polyphase schemes is the same, either of them can be selected.

5 Design goal: operating frequency

Another important aspect in the design of the DWT and the inverse DWT (IDWT) is the operating frequency. Several applications work with high-frequency signals, so a scheme with a fast response is necessary. Here, pipelined and conventional designs are compared.

One disadvantage of lifting schemes compared with non-polyphase schemes is that the former have a longer delay path, and therefore their operating frequency is expected to be lower. To overcome this problem, pipelined architectures are used. For example, the highest operating frequency of the DWT can increase from 117 MHz to 277 MHz with a pipelined structure (54). In another work, it was found that the highest operating frequency depends on the number of pipeline stages: the higher the number of pipeline stages, the higher the maximum operating frequency (e.g. 60 MHz with 3 pipeline stages and 186 MHz with 18 pipeline stages) (41). However, a pipelined scheme does not always ensure a high operating frequency. For example, a design of the 9/7 lifting wavelet with a pipelined structure, fixed-point quantization of the filter weights and a multiplierless-based topology has a highest operating frequency of up to 100 MHz (55).
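The effect of pipelining on the maximum clock frequency can be illustrated with a simple timing model. The block delays below are hypothetical numbers chosen for the example, not measurements from any cited design, and the model ignores register overhead and routing delay.

```python
def fmax_mhz(stage_delays_ns):
    """Maximum clock frequency (MHz) set by the slowest register-to-register stage."""
    return 1000.0 / max(stage_delays_ns)


# Hypothetical combinational delays (ns) of four cascaded blocks in a lifting datapath.
blocks = [2.1, 2.4, 1.9, 2.2]

unpipelined = fmax_mhz([sum(blocks)])   # one long path between registers
pipelined = fmax_mhz(blocks)            # a register after every block

print(round(unpipelined, 1), "MHz without pipelining")
print(round(pipelined, 1), "MHz with four pipeline stages")
```

The gain is bounded by the slowest block, so unbalanced stages, wide words or slow arithmetic units can leave a pipelined design below a well-balanced conventional one, which is consistent with the mixed results reported above.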

Another approach consists of using Distributed Arithmetic. For example, the db4 wavelet base is implemented with a ROM lookup table and a cascade of shift registers in a parallel structure. In this approach, the highest operating frequency is 134 MHz (56).

On the other hand, in some works with multiplierless-based topologies and conventional structures, the highest operating frequency is 110 MHz (30), 140 MHz (31) or 166 MHz (28). These values are lower than the one obtained in (54) but higher than the results of (55) and (41) (with 3 pipeline stages).

In summary, although pipelined structures may have a shorter delay path, choosing this structure does not guarantee a high operating frequency. Other factors, such as topology and quantization, should also be taken into account.

6 Conclusion

In this paper we reviewed several works on the hardware implementation of the DWT. The proposals were analyzed in terms of three design aims: complexity, accuracy and highest operating frequency. In any design, the following parameters must be taken into account: method (convolution, polyphase, lifting), topology (multiplier-based or multiplierless-based), structure (conventional, pipelined) and quantization format (floating-point, fixed-point, CSD, integer).

First, if the aim of the design is low complexity (and low hardware cost), multiplierless-based topologies are suggested. In addition, integer data use fewer equivalent blocks than the other formats. In terms of the method, there is no meaningful difference between polyphase and lifting schemes.

Second, if the aim is accuracy, the most important aspect of the design is the quantization format. Low error values have been found when the system works with integer data, regardless of the structure. It is worth noting that multiplierless-based schemes take advantage of integer data, and therefore this choice is also suggested.

Finally, if the aim is operating frequency, the best result was found with a pipelined structure. Nevertheless, some conventional designs obtained better results than some pipelined designs, so it cannot be asserted that pipelined structures always outperform conventional ones.

References

1 () Y. Zhang, Y. Wang, W. Wang , and B. Liu, " Doppler ultrasound signal denoising based on wavelet frames," IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, vol. 48, no. 3, pp. 709-716, May 2001. (Online). Available: http://ieeexplore.ieee.org/document/920698/ 130 [ Links ]

2 () D. Hepburn and M. Michel, " Second generation wavelet transform for data denoising in PD measurement," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 14, no. 6, pp. 1531-1537, Dec. 2007. (Online). Available: http://ieeexplore.ieee.org/document/4401237/ 130 [ Links ]

3 () G. Bandyopadhyay, P. Syam, A. Chattopadhyay, and S. Das, " Application of Wavelet transform in denoising synchronising signal in line synchronized power electronics converters," IET Power Electronics, vol. 5, no. 3, pp. 281-292, Mar. 2012. (Online). Available: http://digital-library.theiet.org/content/journals/10.1049/iet-pel.2010.0382 130 [ Links ]

4 () J. Reichel, G. Menegaz, M. J. Nadenau, and M. Kunt, " Integer wavelet transform for embedded lossy to lossless image compression." IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, vol. 10, no. 3, pp. 383-92, Jan. 2001. (Online). Available: http://www.ncbi.nlm.nih.gov/pubmed/18249628 130 [ Links ]

5 () E. Hamid and Z.-I. Kawasaki, " Wavelet-based data compression of power system disturbances using the minimum description length criterion," IEEE Transactions on Power Delivery, vol. 17, no. 2, pp. 460-466, Apr. 2002. (Online). Available: http://ieeexplore.ieee.org/document/997918/ 130 [ Links ]

6 () B. A. Rajoub, " An efficient coding algorithm for the compression of ECG signals using the wavelet transform." IEEE transactions on biomedical engineering, vol. 49, no. 4, pp. 355-62, Apr. 2002. (Online). Available: http://www.ncbi.nlm.nih.gov/pubmed/11942727 130 [ Links ]

7 () Y. Shi, " A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression," IEEE Transactions on Circuits and Systems for Video Technology , vol. 13, no. 8, pp. 776-786, Aug. 2003. (Online). Available: http://ieeexplore.ieee.org/document/1227607/ 130 [ Links ]

8 () L. Brechet, M.-F. Lucas, C. Doncarli, and D. Farina, " Compression of Biomedical Signals With Mother Wavelet Optimization and Best-Basis Wavelet Packet Selection," IEEE Transactions on Biomedical Engineering, vol. 54, no. 12, pp. 2186-2192, Dec. 2007. (Online). Available: http://ieeexplore.ieee.org/document/4360002/ 130 [ Links ]

9 () Y. Zheng, " Quality Constrained Compression Using DWT-Based Image Quality Metric," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 7, pp. 910-922, Jul. 2008. (Online). Available: http://ieeexplore.ieee.org/document/4472175/ 130 [ Links ]

10 () Z. Fang, N. Xiong, L. T. Yang, X. Sun, and Y. Yang, " Interpolation-Based Direction-Adaptive Lifting DWT and Modified SPIHT for Image Compression in Multimedia Communications," IEEE Systems Journal, vol. 5, no. 4 , pp. 584-593, Dec. 2011. (Online). Available: http://ieeexplore.ieee.org/document/6044698/ 130 [ Links ]

11 () H. Demirel and G. Anbarjafari, " Improved face recognition system using probability distribution functions extracted from wavelet subbands," in 2009 24th International Symposium on Computer and Information Sciences, IEEE. Ankara: IEEE, Sep. 2009, pp. 94-98. (Online). Available: http://ieeexplore.ieee.org/document/5291859/ 130 [ Links ]

12 () N. Begum, M. Alam, and M. I. Islam, " Application of Canny filter and DWT in fingerprint detection a new approach," in 2010 13th International Conference on Computer and Information Technology (ICCIT), IEEE. Dhaka, Bangladesh: IEEE, Dec. 2010, pp. 256-260. (Online). Available: http://ieeexplore.ieee.org/document/5723865/ 130 [ Links ]

13 () Y.-T. Chou, S.-M. Huang, S.-H. Wu, and J.-F. Yang, " DWT and Sub-pattern PCA for Face Recognition Based on Fuzzy Data Fusion," in 2011 International Conference on Intelligent Computation and Bio-Medical Instrumentation, IEEE. Wuhan, China: IEEE, Dec. 2011, pp. 296-299. (Online). Available: http://ieeexplore.ieee.org/document/6131767/ 130 [ Links ]

14 () R. Zhang and J. Ding, " Facial recognition based on wavelet transform," in World Automation Congress (WAC), 2012. Puerto Vallarta, Mexico: IEEE, 2012, pp. 1-4. (Online). Available: http://ieeexplore.ieee.org/document/6321574/ 130 [ Links ]

15 () J. Andrew, " Coding gain and spatial localisation properties of discrete wavelet transform filters for image coding," IEE Proceedings - Vision, Image, and Signal Processing, vol. 142, no. 3, p. 133, 1995. (Online). Available: http://digital-library.theiet.org/content/journals/10.1049/ip-vis_19951938 130 [ Links ]

16 () D. Marpe and H. Cycon, " Very low bit-rate video coding using wavelet-based techniques," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, pp. 85-94, 1999. (Online). Available: http://ieeexplore.ieee.org/document/744277/ 130 [ Links ]

17 () S. Li and W. Li, " Shape-adaptive discrete wavelet transforms for arbitrarily shaped visual object coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 5, pp. 725-743, 2000. (Online). Available: http://ieeexplore.ieee.org/document/856450/ 130 [ Links ]

18 () K. Ferguson and N. Allinson, " Psychophysically derived quantisation model for efficient DWT image coding," IEE Proceedings - Vision, Image, and Signal Processing, vol. 149, no. 1, p. 51, 2002. (Online). Available: http://digital-library.theiet.org/content/journals/10.1049/ip-vis_20020073 130 [ Links ]

19 () J. Yang, Y. Wang, W. Xu, and Q. Dai, " Image coding using dual-tree discrete wavelet transform." IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, vol. 17, no. 9, pp. 1555-69, Sep. 2008. (Online). Available: http://www.ncbi.nlm.nih.gov/pubmed/18701394 130 [ Links ]

20 () H. Mota, N. Volpini, G. Rodrigues, and F. Vasconcelos, " A real- time processing system for denoising of partial discharge signals using the wavelet transform," in Conference Record of the 2008 IEEE International Symposium on Electrical Insulation, IEEE. Vancouver, British Colombia, Canada: IEEE, Jun. 2008, pp. 391-395. (Online).Available: http://ieeexplore.ieee.org/document/4570356/ 130 [ Links ]

21 () J. Chilo and T. Lindblad, " Hardware Implementation of 1D Wavelet Transform on an FPGA for Infrasound Signal Classification," IEEE Transactions on Nuclear Science, vol. 55, no. 1, pp. 9-13, 2008. (Online).Available: http://ieeexplore.ieee.org/document/4448457/ 130 [ Links ]

22 () H. A. Darwish, M. Hesham, A.-M. I. Taalab, and N. M. Mansour, " Close Accord on DWT Performance and Real-Time Implementation for Protection Applications," IEEE Transactions on Power Delivery, vol. 25, no. 4, pp. 2174-2183, Oct. 2010. (Online). Available: http://ieeexplore.ieee.org/document/5556054/ 130 [ Links ]

23 () J. d. J. Rangel-Magdaleno, R. d. J. Romero-Troncoso, R. A. Osornio-Rios, E. Cabal-Yepez, and A. Dominguez-Gonzalez, " FPGA-Based Vibration Analyzer for Continuous CNC Machinery Monitoring With Fused FFT- DWT Signal Processing," IEEE Transactions on Instrumentation and Measurement, vol. 59, no. 12, pp. 3184-3194, Dec. 2010. (Online). Available: http://ieeexplore.ieee.org/document/5458093/ 130 [ Links ]

24 () K. Inoue, Y. Kuroki, M. Kurosaki, Y. Nagao, and H. Ochi, " Real time 2D-DWT of JPEG 2000 for Digital Cinema using CUDA 4.0," in 2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), IEEE. Chiang Mai, Thailand: IEEE, Dec. 2011, pp. 1-5. (Online). Available: http://ieeexplore.ieee.org/document/6146127/ 130 [ Links ]

25 () Y. Han, H.-i. Kang, C. Kim, and Y. Seo, " Statistical Pattern Based Real-Time Smoke Detection Using DWT Energy," in 2011 International Conference on Information Science and Applications, IEEE. Jeju Island in Korea: IEEE, Apr. 2011, pp. 1-7. (Online). Available: http://ieeexplore.ieee.org/document/5772361/ 130 [ Links ]

26 () F. B. Costa, C. M. S. Neto , S. F. Carolino, R. L. A. Ribeiro, R. L. Barreto, T. O. A. Rocha, and P. Pott, " Comparison between two versions of the discrete wavelet transform for real-time transient detection on synchronous machine terminals," in 2012 10th IEEE/IAS International Conference on Industry Applications, IEEE. Fortaleza, CE: IEEE, Nov. 2012, pp. 1-5. (Online). Available: http://ieeexplore.ieee.org/document/6453533/ 130 [ Links ]

27 () D. M. Ballesteros, D. M. Moreno, and A. E. Gaona, " FPGA compression of ECG signals by using modified convolution scheme of the Discrete Wavelet Transform," Ingeniare. Revista chilena de ingeniería, vol. 20, no. 1, pp. 8-16, Apr. 2012. (Online). Available: http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S071833052012000100002&lng=en&nrm=iso&tlng=en 130,139 [ Links ]

28 () D. M. Ballesteros L and J. M. Moreno A, " Wavelet-denoising on hardware devices with Perfect Reconstruction, low latency and adaptive thresholding," Computers & Electrical Engineering, vol. 39, no. 4, pp. 1300-1311, May 2013. (Online). Available: http://linkinghub.elsevier.com/retrieve/pii/S0045790613000621 130, 135, 137, 138, 139, 140 [ Links ]

29 () D. M. Ballesteros L and J. M. Moreno A, " Real-time, speech-in-speech hiding scheme based on least significant bit substitution and adaptive key," Computers & Electrical Engineering, vol. 39, no. 4, pp. 1192-1203, May 2013. (Online). Available: http://linkinghub.elsevier.com/retrieve/pii/S0045790613000323130, 132, 135,137, 138, 139 [ Links ]

30 () K. Kotteri, S. Barua, A. Bell, and J. Carletta, " A comparison of hardware implementations of the biorthogonal 9/7 DWT: convolution versus lifting," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 52, no. 5, pp. 256-260, May 2005. (Online). Available:http://ieeexplore.ieee.org/document/1431103/ 130, 139, 140 [ Links ]

[31] K. Kotteri, A. Bell, and J. Carletta, "Multiplierless filter Bank design: structures that improve both hardware and image compression performance," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 6, pp. 776-780, Jun. 2006. (Online). Available: http://ieeexplore.ieee.org/document/1637517/

[32] H. Li, Q. Wang, and L. Wu, "A novel design of lifting scheme from general wavelet," IEEE Transactions on Signal Processing, vol. 49, no. 8, pp. 1714-1717, 2001. (Online). Available: http://ieeexplore.ieee.org/document/934141/

[33] A. Soman and P. Vaidyanathan, "On orthonormal wavelets and paraunitary filter banks," IEEE Transactions on Signal Processing, vol. 41, no. 3, pp. 1170-1183, Mar. 1993. (Online). Available: http://ieeexplore.ieee.org/document/205722/

[34] X. Lan, N. Zheng, and Y. Liu, "A high-performance and memory-efficient VLSI architecture with parallel scanning method for 2-D lifting-based discrete wavelet transform," IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 400-407, May 2009. (Online). Available: http://ieeexplore.ieee.org/document/5174400/

[35] D.-U. Lee, L.-W. Kim, and J. D. Villasenor, "Precision-aware self-quantizing hardware architectures for the discrete wavelet transform," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 768-77, Feb. 2012. (Online). Available: http://www.ncbi.nlm.nih.gov/pubmed/21824849

[36] S. Silva and S. Bampi, "Area and Throughput Trade-Offs in the Design of Pipelined Discrete Wavelet Transform Architectures," in Design, Automation and Test in Europe, IEEE. Grenoble, France: IEEE, 2005, pp. 32-37. (Online). Available: http://ieeexplore.ieee.org/document/1395789/

[37] Y.-K. Lai, L.-F. Chen, and Y.-C. Shih, "A high-performance and memory-efficient VLSI architecture with parallel scanning method for 2-D lifting-based discrete wavelet transform," IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 400-407, May 2009. (Online). Available: http://ieeexplore.ieee.org/document/5174400/

[38] M. A. Farahani, S. Mirzaei, and H. A. Farahani, "Implementation of a reconfigurable architecture of discrete wavelet packet transform with three types of multipliers on FPGA," in 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), IEEE. Niagara Falls, Ontario: IEEE, May 2011, pp. 1459-1462. (Online). Available: http://ieeexplore.ieee.org/document/6030704/

[39] Z. Szadkowski, "An Optimization of the FPGA Based Wavelet Trigger in Radio Detection of Cosmic Rays," IEEE Transactions on Nuclear Science, vol. 62, no. 3, pp. 993-1001, Jun. 2015. (Online). Available: http://ieeexplore.ieee.org/document/7102787/

[40] K. Kotteri, A. Bell, and J. Carletta, "Design of Multiplierless, High-Performance, Wavelet Filter Banks With Image Compression Applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 3, pp. 483-494, Mar. 2004. (Online). Available: http://ieeexplore.ieee.org/document/1275595/

[41] W. Wang, Z. Du, and Y. Zeng, "High-Speed FPGA Implementation for DWT of Lifting Scheme," in 2009 5th International Conference on Wireless Communications, Networking and Mobile Computing, IEEE. Beijing, China: IEEE, Sep. 2009, pp. 1-4. (Online). Available: http://ieeexplore.ieee.org/document/5302003/

[42] G. N. Geetha and K. K. Mohammed Salih, "A parallel processing architecture for two dimensional discrete wavelet transform without using multipliers," in 2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT'12), IEEE. Tamilnadu, India: IEEE, Jul. 2012, pp. 1-4. (Online). Available: http://ieeexplore.ieee.org/document/6396059/

[43] A. Abbas and T. Tran, "Multiplierless Design of Biorthogonal Dual-Tree Complex Wavelet Transform using Lifting Scheme," in 2006 International Conference on Image Processing, IEEE. Atlanta, GA: IEEE, 2006, pp. 1605-1608. (Online). Available: http://ieeexplore.ieee.org/document/4106852/

[44] A. Abbas and T. Tran, "Rational Coefficient Dual-Tree Complex Wavelet Transform: Design and Implementation," IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3523-3534, Aug. 2008. (Online). Available: http://ieeexplore.ieee.org/document/4527184/

[45] M. Zhang, R. Deng, Z. Ma, and M. Zhang, "A FPGA-based low-cost real-time wavelet packet denoising system," in Proceedings of 2011 International Conference on Electronics and Optoelectronics, vol. 2, IEEE. Dalian, Liaoning, China: IEEE, Jul. 2011, pp. V2-350-V2-353. (Online). Available: http://ieeexplore.ieee.org/document/6013254/

[46] N. Elghamery and S.-D. Habib, "An efficient FPGA implementation of a wavelet coder/decoder," in ICM 2000. Proceedings of the 12th International Conference on Microelectronics (IEEE Cat. No.00EX453), IEEE. Singapore City, Singapore: Univ. Tehran, 2000, pp. 269-272. (Online). Available: http://ieeexplore.ieee.org/document/916458/

[47] J. M. Abdul-Jabbar and R. W. Hmad, "Allpass-based design, multiplierless realization and implementation of IIR wavelet filter banks with approximate linear phase," in International Symposium on Innovations in Information and Communications Technology, IEEE. Amman, Jordan: IEEE, Nov. 2011, pp. 118-123. (Online). Available: http://ieeexplore.ieee.org/document/6149606/

[48] S. Ghosh, S. P. Maity, and H. Rahaman, "Multiplier-less VLSI architecture of 1-D Hilbert transform pair using Biorthogonal Wavelets for QCM-SS image watermarking," in 2013 4th International Conference on Computer and Communication Technology (ICCCT), IEEE. Allahabad, India: IEEE, Sep. 2013, pp. 5-10. (Online). Available: http://ieeexplore.ieee.org/document/6749594/

[49] T.-Y. Sung, Y.-S. Shieh, C.-W. Yu, and H.-C. Hsin, "Low-Power Multiplierless 2-D DWT and IDWT Architectures Using 4-tap Daubechies Filters," in 2006 Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), IEEE. Taipei, Taiwan: IEEE, 2006, pp. 185-190. (Online). Available: http://ieeexplore.ieee.org/document/4032175/

[50] C. Zhang, C. Wang, and M. O. Ahmad, "A Pipeline VLSI Architecture for Fast Computation of the 2-D Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 8, pp. 1775-1785, Aug. 2012. (Online). Available: http://ieeexplore.ieee.org/document/6133304/

[51] K. Mei, N. Zheng, and Y. Liu, "A high pipeline and low memory design of JPEG2000 encoder," in Proceedings of the Ninth International Symposium on Consumer Electronics, 2005 (ISCE 2005), IEEE. Quebec, Canada: IEEE, 2005, pp. 315-319. (Online). Available: http://ieeexplore.ieee.org/document/1502394/

[52] X. Fan, Z. Pang, D. Chen, and H. Z. Tan, "A Pipeline Architecture for 2-D Lifting-Based Discrete Wavelet Transform of JPEG2000," in 2010 International Conference on Multimedia Technology, IEEE. Xuanwu, China: IEEE, Oct. 2010, pp. 1-4. (Online). Available: http://ieeexplore.ieee.org/document/5629864/

[53] X. Mei-hua, C. Zhang-jin, R. Feng, and C. Yu-lan, "Architecture research and VLSI implementation for discrete wavelet packet transform," in Conference on High Density Microsystem Design and Packaging and Component Failure Analysis, 2006 (HDP'06), IEEE. Shanghai, China: IEEE, 2006, pp. 1-4. (Online). Available: http://ieeexplore.ieee.org/document/1707554/

[54] M. Maamoun, R. Bradai, A. Meraghni, and R. Beguenane, "Low cost VLSI discrete wavelet transform and FIR filters architectures for very high-speed signal and image processing," in 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems, IEEE. Reading, United Kingdom: IEEE, Sep. 2010, pp. 1-6. (Online). Available: http://ieeexplore.ieee.org/document/5898088/

[55] Z. Wu and W. Wang, "Pipelined architecture for FPGA implementation of lifting-based DWT," in 2011 International Conference on Electric Information and Control Engineering. Yichang, China: IEEE, Apr. 2011, pp. 1535-1538. (Online). Available: http://ieeexplore.ieee.org/document/5777731/

[56] M. Nagabushanam, P. Cyril Prasanna Raj, and S. Ramachandran, "Design and FPGA implementation of modified Distributive Arithmetic based DWT-IDWT processor for image compression," in 2011 International Conference on Communications and Signal Processing, IEEE. Nanjing, China: IEEE, Feb. 2011, pp. 1-4. (Online). Available: http://ieeexplore.ieee.org/document/5739397/

Received: April 28, 2016; Accepted: October 21, 2016

This is an open-access article distributed under the terms of the Creative Commons Attribution License