

# Implementation of Discrete Cosine Transform using Common Boolean Logic Adder

Neelam Sharma, Vipul Agrawal

Trinity Institute of Technology and Research, Bhopal, Madhya Pradesh, India

## ABSTRACT

Low-power design is one of the most important challenges in maximizing battery life in portable devices and saving energy during system operation. The Discrete Cosine Transform (DCT) is widely used in image and video compression standards and is one of the most popular lossy techniques in today's video compression schemes. In this paper, we review low-power DCT architectures built using various techniques. Several algorithms have been proposed to implement the DCT. Loeffler (1989) introduced a new class of 1-D DCT algorithms that uses just 11 multiplications and 29 additions. Implementing such an algorithm requires one or more multipliers to be integrated, which occupies a large silicon area. Distributed arithmetic is widely used to implement such algorithms without multipliers. The reconfigurable 8-point DCT has been coded in VHDL on the Xilinx platform.

Keywords: Discrete Cosine Transform (DCT), Inverse Discrete Cosine Transform (IDCT), VHDL

## I. INTRODUCTION

The rapid growth of multimedia services running on portable devices demands low-power, high-quality implementations of complex signal processing algorithms. Multimedia systems involve image and video processing, which must be implemented at low cost and low power because of limited battery lifetime. Many papers have been published on reducing the power dissipation of image and video applications, especially on low-power design of the discrete cosine transform [1]. The DCT is a computation-intensive operation in image and video compression. It is used in image and video compression standards such as JPEG [2], MPEG, H.263 [3] and H.264. A direct implementation of the DCT requires a large number of multipliers and adders. Many previous works focused on Distributed Arithmetic (DA) based DCT and multiple constant multiplication (MCM) [4]. To reduce power consumption, DA is used in place of multipliers [5]. DCT implementation using DA offers several advantages, such as area reduction and high-speed performance. High speed can be achieved in a conventional DA implementation by pre-computing all possible values and storing them in ROM. However, ROM-based DA has the disadvantage of redundancy, which is introduced to accommodate all possible combinations of bit patterns of the input signals. A regular and simple DCT architecture can be obtained by using a bit-serial DA-based approach. MCM-based DCT can be implemented with a smaller number of shift and add operations.
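The ROM-based, bit-serial DA idea described above can be illustrated with a short Python sketch. It is only a behavioural model under stated assumptions: the inputs are unsigned integers of a fixed bit width, the coefficients are already integers, and the names `build_da_rom` and `da_inner_product` are hypothetical and not taken from any of the cited works.

```python
def build_da_rom(coeffs):
    """Pre-compute the DA look-up table (assumed layout, for illustration).

    Entry `addr` holds the sum of every coefficient whose index bit is set
    in `addr`, so the table has 2**len(coeffs) entries.
    """
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, width=8):
    """Compute sum(coeffs[i] * xs[i]) without any multiplication.

    Assumes each x is an unsigned integer that fits in `width` bits.
    One bit plane of the inputs is processed per step (bit-serial).
    """
    rom = build_da_rom(coeffs)
    acc = 0
    for b in range(width):
        # Bit b of every input forms the ROM address for this step.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # Shift-and-accumulate replaces the multipliers.
        acc += rom[addr] << b
    return acc
```

The table grows as 2^N with the number of inputs N, which is the redundancy cost of ROM-based DA mentioned above.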

The Discrete Cosine Transform (DCT) is a Fourier-like transform developed by Ahmed, Natarajan and Rao in 1974. It has become one of the most widely used transform techniques in digital signal processing. The DCT is a computationally intensive transform that requires many multiplications and additions [1]. While the Fourier transform represents a signal as a mixture of sines and cosines, the cosine transform performs only the cosine-series expansion. The purpose of the DCT is to decorrelate the input signal and to present the output in the frequency domain. The DCT is known for its high "energy compaction" property, meaning that the transformed signal can be analyzed using only a few low-frequency components [4]. The DCT offers a reasonable balance between optimality of the input decorrelation (approaching the Karhunen-Loève transform) and computational complexity, which has made it widely used in digital signal processing [2].

All of the fast algorithms still require floating-point multiplication, which is slow in both hardware and software implementations. To achieve a faster implementation, the coefficients can be scaled and approximated by integers, so that floating-point multiplication can be replaced by integer multiplication [6]. This is done by multiplying each floating-point value by a scale factor, which can be any integer, and rounding the result to an integer. This is called fixed-point arithmetic. The resulting algorithms are much faster than the original versions and therefore have wide practical applications.
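As a minimal sketch of this fixed-point idea, the Python fragment below scales a cosine coefficient to an integer and replaces the floating-point multiplication with an integer multiply followed by a shift. The power-of-two scale factor, the constant `SCALE_BITS = 12`, and the function names are assumptions made for illustration; the text only requires that the scale factor be an integer.

```python
import math

SCALE_BITS = 12  # assumed scale factor of 2**12; chosen only for illustration


def to_fixed(coeff, bits=SCALE_BITS):
    """Scale a floating-point coefficient and round it to an integer."""
    return round(coeff * (1 << bits))


def fixed_mul(x, coeff_fixed, bits=SCALE_BITS):
    """Integer multiply, then shift back down with round-to-nearest."""
    return (x * coeff_fixed + (1 << (bits - 1))) >> bits


# Example: multiply a sample by cos(pi/4) using integers only.
C4 = to_fixed(math.cos(math.pi / 4))
approx = fixed_mul(100, C4)
```

A larger `SCALE_BITS` reduces the coefficient quantization error at the cost of wider multipliers and adders.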

## **II. METHODS AND MATERIAL**

### A. Literature Review

Mamatha I et al. [1], The Discrete Fourier Transform (DFT) is widely used in signal processing for spectral analysis, filtering, image enhancement, OFDM, etc. The cyclic convolution based approach is one of the techniques used for computing the DFT. Using this approach, an N-point DFT can be computed using four pairs of [(M-1)/2]-point cyclic convolutions, where M is an odd number and N=4M. This work proposes an architecture for convolution-based DFT and its FPGA implementation. The proposed architecture comprises a pre-processing element, a systolic array and a post-processing stage. The processing element of the systolic array uses a tag bit to decide on the type of operation (addition/subtraction) on the input signals. The proposed architecture is simulated for a 28-point DFT using ModelSim 6.5 and synthesized using Xilinx ISE 10.1 with a Virtex-5 xc5vfx100t-3ff1738 FPGA as the target device, and it can operate at a maximum frequency of 224.9 MHz. The performance analysis is carried out in terms of hardware utilization and computation time and is compared with existing similar architectures. Further, as the convolution-based DCT has two systolic arrays similar to those of the DFT, a unified architecture is proposed for 1D DFT/1D DCT.

Mansi Mane et al. [2], CORDIC (COordinate Rotation DIgital Computer) is a fast, simple, coherent and powerful algorithm used for a wide range of digital signal processing applications. To meet the speed and accuracy requirements of today's applications, the authors put forward a variable-iteration CORDIC algorithm, in which the number of CORDIC iterations is reduced for a specific accuracy in order to boost speed. This enhances the efficiency of the conventional CORDIC algorithm, which is used to compute the Discrete Cosine Transform for image processing. The one-dimensional Discrete Cosine Transform is implemented using only 6 CORDIC blocks, which need only 6 multipliers. Because of the simplicity of the hardware, the speed of image processing on the FPGA is increased. A further increase in speed can be achieved by concurrently processing a number of macro-blocks of an image on the FPGA.

Hyeonuk Jeong et al. [3], Low-power design is one of the most important challenges in maximizing battery life in portable devices and saving energy during system operation. The paper proposes a low-power DCT (Discrete Cosine Transform) architecture using a modified multiplier-less CORDIC (Coordinate Rotation Digital Computer) arithmetic. The switching power consumption is reduced during the DCT: the proposed architecture does not perform arithmetic operations on unnecessary bits during the CORDIC calculations. The experimental results show a reduction of up to 26.1% in power dissipation without compromising the final DCT results. In addition, the speed of the proposed architecture is increased by about 10%. The proposed low-power DCT architecture can be applied to consumer electronics and portable multimedia systems requiring high throughput and low power.

Esakkirajan G et al. [4], CORDIC (COordinate Rotation DIgital Computer) is a fast, simple, efficient and powerful algorithm used in digital signal processing applications. The paper extends the methodology for designing a low-power, area-efficient DCT for image compression using only shift registers, adders/subtractors and special interconnections. Hardware synthesis showed that shift-and-add based DCT computation is more efficient than the conventional multiplier-based approach, and accuracy was measured by comparing the PSNR of the reconstructed image against the original image using MATLAB.

E. Jebamalar Leavline et al. [5], The Discrete Cosine Transform (DCT) is widely used in image and video compression standards. This paper presents a low-power, CORDIC (Coordinate Rotation Digital Computer) based reconfigurable architecture for the DCT.
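Several of the works reviewed above build the DCT from CORDIC rotations and trade the number of iterations against accuracy. The sketch below shows the standard rotation-mode CORDIC iteration in floating-point Python, with the iteration count as a parameter. It is a behavioural illustration only, not the architecture of any cited paper; the function name `cordic_rotate` and its default of 16 iterations are assumptions, and a hardware version would use shifts on fixed-point values and a pre-computed gain constant.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro-rotations.

    Each step rotates by +/- atan(2**-i), which needs only shifts and adds in
    hardware. The accumulated gain is divided out at the end. The angle is
    assumed to lie within the CORDIC convergence range (about +/- 1.74 rad).
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1 if z >= 0 else -1          # direction of this micro-rotation
        step = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)         # residual angle still to rotate
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, y / gain
```

Fewer iterations leave a larger residual angle, which is the speed and accuracy trade-off exploited by variable-iteration CORDIC designs.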



Figure 1: 8-point Discrete Cosine Transform

### B. Discrete Cosine Transform

A discrete cosine transform (DCT) expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio (e.g. MP3) and images (e.g. JPEG), where small high-frequency components can be discarded, to spectral methods for the numerical solution of partial differential equations. The use of cosine rather than sine functions is critical for compression, since fewer cosine functions are needed to approximate a typical signal, whereas for differential equations the cosines express a particular choice of boundary conditions.

#### **DCT** output

$$F(0) = 0.5[\{f(0) + f(1) + f(2) + f(3) + f(4) + f(5) + f(6) + f(7)\}\cos\frac{\pi}{4}]$$

$$F(1) = 0.5[\{f(0) - f(7)\}\cos\frac{\pi}{16} + \{f(1) - f(6)\}\cos\frac{3\pi}{16} + \{f(2) - f(5)\}\cos\frac{5\pi}{16} + \{f(3) - f(4)\}\cos\frac{7\pi}{16}]$$

$$F(2) = 0.5[\{f(0) - f(3) - f(4) + f(7)\}\cos\frac{2\pi}{16} + \{f(1) - f(2) - f(5) + f(6)\}\cos\frac{6\pi}{16}]$$

$$F(3) = 0.5[\{f(0) - f(7)\}\cos\frac{3\pi}{16} + \{f(6) - f(1)\}\cos\frac{7\pi}{16} + \{f(5) - f(2)\}\cos\frac{\pi}{16} + \{f(4) - f(3)\}\cos\frac{5\pi}{16}]$$

$$F(4) = 0.5[\{f(0) + f(3) + f(4) + f(7) - f(1) - f(2) - f(5) - f(6)\}\cos\frac{\pi}{4}]$$

$$F(5) = 0.5[\{f(0) - f(7)\}\cos\frac{5\pi}{16} + \{f(6) - f(1)\}\cos\frac{\pi}{16} + \{f(2) - f(5)\}\cos\frac{7\pi}{16} + \{f(3) - f(4)\}\cos\frac{3\pi}{16}]$$

$$F(6) = 0.5[\{f(0) - f(3) - f(4) + f(7)\}\cos\frac{6\pi}{16} - \{f(1) - f(2) - f(5) + f(6)\}\cos\frac{2\pi}{16}]$$

$$F(7) = 0.5[\{f(0) - f(7)\}\cos\frac{7\pi}{16} + \{f(6) - f(1)\}\cos\frac{5\pi}{16} + \{f(2) - f(5)\}\cos\frac{3\pi}{16} + \{f(4) - f(3)\}\cos\frac{\pi}{16}]$$
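The factored 8-point outputs can be checked numerically against the direct definition. The Python sketch below assumes the usual 8-point DCT-II convention, $F(k) = 0.5\,c(k)\sum_{n=0}^{7} f(n)\cos\frac{(2n+1)k\pi}{16}$ with $c(0) = \cos\frac{\pi}{4}$ and $c(k) = 1$ otherwise. Under that definition every odd-indexed output is built only from the differences $f(n) - f(7-n)$ and every even-indexed output only from the sums $f(n) + f(7-n)$. The function names are hypothetical.

```python
import math


def dct8_direct(f):
    """8-point DCT-II computed straight from the (assumed) definition."""
    out = []
    for k in range(8):
        ck = math.cos(math.pi / 4) if k == 0 else 1.0
        acc = sum(f[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                  for n in range(8))
        out.append(0.5 * ck * acc)
    return out


def dct8_factored(f):
    """8-point DCT using the symmetric sum/difference factorization."""
    c = [math.cos(k * math.pi / 16) for k in range(8)]  # c[k] = cos(k*pi/16)
    # Sums and differences of mirrored input pairs f(n) and f(7-n).
    a0, a1, a2, a3 = f[0] + f[7], f[1] + f[6], f[2] + f[5], f[3] + f[4]
    d0, d1, d2, d3 = f[0] - f[7], f[1] - f[6], f[2] - f[5], f[3] - f[4]
    F0 = 0.5 * (a0 + a1 + a2 + a3) * c[4]
    F1 = 0.5 * (d0 * c[1] + d1 * c[3] + d2 * c[5] + d3 * c[7])
    F2 = 0.5 * ((a0 - a3) * c[2] + (a1 - a2) * c[6])
    F3 = 0.5 * (d0 * c[3] - d1 * c[7] - d2 * c[1] - d3 * c[5])
    F4 = 0.5 * (a0 - a1 - a2 + a3) * c[4]
    F5 = 0.5 * (d0 * c[5] - d1 * c[1] + d2 * c[7] + d3 * c[3])
    F6 = 0.5 * ((a0 - a3) * c[6] - (a1 - a2) * c[2])
    F7 = 0.5 * (d0 * c[7] - d1 * c[5] + d2 * c[3] - d3 * c[1])
    return [F0, F1, F2, F3, F4, F5, F6, F7]
```

The factored form needs far fewer cosine multiplications than the 64 of the direct form, which is what makes it attractive for hardware built from adders.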

#### **Common Boolean Logic**

Area- and power-efficient high-speed data path logic is one of the most significant areas of research. Simple modifications at the gate level can improve the results. The speed of an adder depends on the time required to propagate the carry through it. Conventional adders work serially: the sum at each bit position is computed only after the previous bits have been summed and the carry has propagated to that stage.

The carry select adder (CSLA) is one of the advanced adders used in data processing processors to perform fast arithmetic. It addresses the problem of carry propagation delay by generating the carries independently at each stage and then selecting the correct one with a multiplexer to form the sum. The conventional CSLA uses ripple carry adders (RCA) that generate the partial sum and carry for both input carry conditions, Cin=0 and Cin=1; one of each pair is then selected to form the final sum and final carry output.

The RCA-based CSLA is not area-efficient, because a large number of gates is used to form both sets of partial sums and carries before the final sum and carry are selected.

Another form of CSLA uses a binary to excess-1 converter (BEC) in place of the ripple carry adder with Cin=1. This adder is known as the CSLA with BEC. The number of gates is reduced, particularly when designing adders with a large bit width. This adder uses less silicon area than the RCA-based CSLA, but it has a marginally higher delay.
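Both carry-select variants can be modelled at the bit level. The Python sketch below is a behavioural illustration, not a gate-accurate design: the block size of 4 bits, the 16-bit width, and the names `ripple_carry_add`, `bec` and `carry_select_add` are assumptions. With `use_bec=False` each block runs two ripple carry adders (Cin=0 and Cin=1); with `use_bec=True` the Cin=1 result is derived from the Cin=0 result by a binary to excess-1 converter.

```python
def ripple_carry_add(a_bits, b_bits, cin):
    """Add two LSB-first bit lists; return (sum_bits, carry_out)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def bec(bits):
    """Binary to excess-1 converter: add 1 to an LSB-first bit list."""
    out, ripple = [], 1
    for bit in bits:
        out.append(bit ^ ripple)   # flip while all lower bits were 1
        ripple &= bit
    return out


def carry_select_add(a, b, width=16, block=4, use_bec=False):
    """Carry-select addition of two unsigned integers, block by block."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for lo in range(0, width, block):
        blk_a, blk_b = a_bits[lo:lo + block], b_bits[lo:lo + block]
        s0, c0 = ripple_carry_add(blk_a, blk_b, 0)      # assumes Cin = 0
        if use_bec:
            plus_one = bec(s0 + [c0])                   # Cin = 1 via BEC
            s1, c1 = plus_one[:-1], plus_one[-1]
        else:
            s1, c1 = ripple_carry_add(blk_a, blk_b, 1)  # assumes Cin = 1
        # Multiplexer: the incoming carry selects one precomputed pair.
        s, carry = (s1, c1) if carry else (s0, c0)
        for i, bit in enumerate(s):
            result |= bit << (lo + i)
    return result, carry
```

In this model the two variants always produce the same result; the difference in hardware is that the BEC path replaces a second adder with a simpler incrementer.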

The proposed Common Boolean Logic (CBL) adder is area-, power- and delay-efficient. It removes the redundant adders of the conventional carry select adder and uses common Boolean logic instead. The CBL block comprises two parts: a sum generation block and a carry generation block. In the sum generation block, a multiplexer selects the output sum depending on the value of Cin (the carry from the previous bit).

If Cin=0, the sum output is the XOR of the two input bits. If Cin=1, the sum output is the inverted XOR. In the carry generation block, a multiplexer selects the carry for the next stage depending on the previous carry input: if Cin=0, Cout is the AND of the two input bits, and if Cin=1, Cout is the OR of the two input bits.



(a) CBL Block



Figure 2: Block Diagram of n-bit CBL

If Cin = 0  
Sum = A XOR B  
Carry = A AND B  
else  
Sum = NOT (A XOR B)  
Carry = A OR B

The same process is repeated for each of the n bits, which yields the final sum and carry outputs.
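A bit-level Python sketch of an n-bit CBL adder is given below. It is a behavioural model with hypothetical names (`cbl_bit`, `cbl_add`) and an assumed default width of 8 bits. The carry selection follows the full-adder identity Cout = AB + Cin(A XOR B), which reduces to A AND B when Cin = 0 and to A OR B when Cin = 1.

```python
def cbl_bit(a, b, cin):
    """One CBL stage: both candidate outputs share the same Boolean terms."""
    x = a ^ b
    # Sum multiplexer: XOR when Cin = 0, inverted XOR when Cin = 1.
    s = (x ^ 1) if cin else x
    # Carry multiplexer: AND when Cin = 0, OR when Cin = 1.
    cout = (a | b) if cin else (a & b)
    return s, cout


def cbl_add(a, b, width=8):
    """Add two unsigned integers by chaining `width` CBL stages."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = cbl_bit((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Each stage needs only one XOR, one AND, one OR, one inverter and two multiplexers, which is where the area saving over duplicated adders comes from.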

## **III. CONCLUSION**

From the literature survey, we found that the CBL adder based DCT algorithm is the best among the existing algorithms. We therefore implement the CBL based DCT algorithm in this paper. The performance of the various sub-modules was evaluated using the Xilinx ISE 14.1 simulator, and the circuits designed using the CBL-based DCT logic showed reduced delay and power. As future work, more arithmetic and logical functions can be implemented.

## **IV. REFERENCES**

- [1] Mamatha I, Nikhita Raj J, Shikha Tripathi, Sudarshan TSB, "Systolic Architecture Implementation of 1D DFT and 1D DCT", 978-1-4799-1823-2/15/\$31.00 ©2015 IEEE.
- [2] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput., Vol. EC-8, no. 3, pp. 335-339, Sept. 1959.
- [3] Liyi Xiao and Hai Huang, "Novel CORDIC Based Unified Architecture for DCT and IDCT", 2012 International Conference on Optoelectronics and Microelectronics (ICOM), 978-1-4673-2639-1/12/\$31.00 ©2012 IEEE.
- [4] Shymna Nizar N. S., Abhila and R. Krishna, "An Efficient Folded Pipelined Architecture For Fast Fourier Transform Using CORDIC Algorithm", 2014 IEEE International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), IEEE.
- [5] E. Jebamalar Leavline, S. Megala and D. Asir Antony Gnana Singh, "CORDIC Iterations Based Architecture for Low Power and High Quality DCT", 2014 International Conference on Recent Trends in Information Technology, 978-1-4799-4989-2/14/\$31.00 ©2014 IEEE.
- [6] Hyeonuk Jeong et al., "Low-Power Multiplierless DCT Architecture Using Image Data Correlation," IEEE Transactions on Consumer Electronics, Vol. 50, No. 1, February 2004.
- [7] Syed Ali Khayam, "The Discrete Cosine Transform (DCT): Theory and Application", Department of Electrical and Computer Engineering, Michigan State University, March 10th, 2003.
- [8] Satyasen Panda, "Performance Analysis and Design of a Discrete Cosine Transform Processor Using CORDIC Algorithm", 2008-2010.
- [9] Behrooz Parhami, "Computer Arithmetic: Algorithms and Hardware Designs", Oxford University Press Inc., 198 Madison Avenue, New York, 2000.
- [10] Keshab K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation", Wiley.
- [11] J. SriKrishna, "DESIGN OF 2D DISCRETE COSINE TRANSFORM USING CORDIC ARCHITECTURES IN VHDL", Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, May 2007.
- [12] Deepika Ghai, "COMPARATIVE ANALYSIS OF VARIOUS CORDIC TECHNIQUES", June 2011.
- [13] Yuan-Ho Chen et al., "A High Performance Video Transform Engine by Using Space-Time Scheduling Strategy", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 20, No. 4, April 2012.