FIR FILTER SOFT CORE GENERATOR

Anatoli Sergyienko*, Vladislav Vasylienko*, Oleg Maslennikow**

* National Technical University of Ukraine "KPI", Kiev, Ukraine, email: email@example.com
** Technical University of Koszalin, Poland, email: firstname.lastname@example.org

ABSTRACT

A new FIR filter soft core generator is presented which exploits the structural properties of FPGAs and minimizes the hardware volume of the generated filters. It supports the design of highly pipelined structures with extremely high clock frequency, which are described in VHDL.

1. INTRODUCTION

Modern system-on-chip (SOC) technology is based on a set of principles, one of which is the use of core generators. Owing to its flexibility and reusability, a core generator can replace a very large library of cores of a single type. Tunable core generators help to optimize the speed, hardware volume, and energy consumption of the resulting project.

Finite impulse response (FIR) filters are widely used in digital signal processing systems owing to their high throughput, the simple procedures for calculating their characteristics, and the linearity of their phase characteristic. Implementing FIR filters in FPGAs has a number of advantages, such as a short design period, high throughput, effective hardware utilization, and arbitrary calculation precision, because the implemented structure is fully adapted to the filtering algorithm. In many special applications, such as high-speed communications, the FPGA is the only solution for FIR filter implementation.

FIR filter core generators support the development of high-speed FIR filters with a given filter length, impulse response, and signal-to-noise ratio. The Xilinx FIR core generator is well known for the high speed and small hardware volume of the filters it generates [1]. This generator produces so-called firm cores, which are represented by a netlist.
This representation is the source of serious disadvantages, such as a narrow range of supported parameters and the restriction of the generator to a small set of FPGA chip families. Because the generated relatively placed macros are large, these filters are rigid and fit only large devices, and therefore they utilize the device area poorly. In this paper another FIR filter core generator is presented, which generates filter cores described in VHDL with increased throughput, minimized hardware, and a wide range of filter lengths and data and coefficient widths, and which can be implemented in a large set of FPGA families.

2. METHODS OF COEFFICIENT MULTIPLICATION IN FPGAS

FIR filter computation, or convolution, consists in deriving the following sum of products:

$$y_n = \sum_{k=1}^{K} a_k \, x_{n-k+1},$$

where $x_{n-k+1}$ is the input data sample, $a_k$ is the filter impulse response coefficient, K is the impulse response length, and $y_n$ is the output sample with index n. To derive one output sample the filter has to perform on average K multiplications and additions. Convolution is therefore rather computation-intensive.

Initially FPGAs were not intended for computation-intensive applications such as convolution. Multiplier units were therefore first programmed as a set of logic equations or as a set of AND gates with an adder tree. When the use of FPGAs in intensive signal processing became established and widespread, elements supporting high-speed multiplication were added to the new FPGA families. For example, additional AND gates supporting multiplication were added to the configurable logic blocks (CLBs) of Xilinx Virtex™ devices, and separate multiplier units were then introduced in Virtex-II™ devices. In most applications the convolution procedure involves multiplication by constant coefficients, and the use of general combinational multiplier circuits for such multiplication is rather redundant.
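The sum of products defined in this section can be written down directly in software. The following is only a reference sketch in Python, not part of the generator: the function name `fir_direct` is hypothetical, the list index is 0-based (so `a[k]` stands for $a_{k+1}$), and samples before the start of the input are assumed to be zero.

```python
def fir_direct(a, x):
    """Direct evaluation of y_n = sum over k of a_k * x_(n-k+1).

    a: list of K coefficients, x: list of input samples.
    Samples before the start of x are assumed to be zero (an assumption
    of this sketch, not something stated in the paper).
    """
    K = len(a)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(K):            # 0-based k: a[k] is a_(k+1)
            if n - k >= 0:
                acc += a[k] * x[n - k]
        y.append(acc)                 # K multiply-add steps per output sample
    return y


coeffs = [1, 2, 3]
samples = [1, 0, 0, 4]
print(fir_direct(coeffs, samples))    # [1, 2, 3, 4]
```

Each output sample costs K multiplications, which is the cost that the constant coefficient techniques discussed in this section aim to reduce.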
Moreover, the number of multiplier units in the new FPGA families is not sufficient for many DSP applications. A set of techniques has therefore been proposed that exploit the properties of constant coefficient multipliers (KCM). For example, taking into account that the average number of one bits in an m-bit constant coefficient is m/2, a KCM can be designed that consists of a tree of m/2-1 adders.

A. Peled and B. Liu invented in 1974 a method of deriving the sum of products called distributed arithmetic [2]. The method consists in storing up to $2^K$ combinations of sums of K coefficients in a ROM. If the i-th bit slice of the input data set is applied to the address input of this ROM, then the data read out is the corresponding sum of partial products. The whole sum of products is obtained when up to m such sums of partial products are added in the result accumulator with the proper bit shift.

This method was proposed by L. Mintzer as an effective way to derive the sum of products in FPGAs [3]. It exploits the ability of a 4-input look-up table (LUT) to be programmed as a high-speed 16-bit ROM. The method is widely used in many DSP applications, and in the Xilinx FIR core generator as well [1, 4]. To achieve high throughput, all the sums of partial products for the i-th digits of the multiplicands are calculated in parallel. Such filters are therefore usually called parallel distributed arithmetic (PDA) FIR filters. The disadvantage of the method is the extremely large ROM volume when the number of coefficients K exceeds 8 to 12. To reduce this disadvantage, the filter is divided into a set of blocks, each of which calculates the sum of K = 4 or 5 products.

Another way to multiply by a constant coefficient was proposed in [5]. This method was shown in [6] to be an effective multiplication method for FPGAs. It is based on a ROM which stores in its i-th location the coefficient a multiplied by i.
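The two ROM-based schemes described above differ in what addresses the ROM: a bit slice taken across K samples in distributed arithmetic, or a single sample in the ROM-based multiplier. The sketch below contrasts them in Python under simplifying assumptions: input data is unsigned, ROM words are kept at full precision, and the bit slices are processed in a loop rather than in parallel. All function names are hypothetical.

```python
def build_da_rom(coeffs):
    """Distributed arithmetic ROM with 2**K words.

    The word at address adr is the sum of those coefficients whose
    corresponding address bit is 1.
    """
    K = len(coeffs)
    return [sum(coeffs[k] for k in range(K) if (adr >> k) & 1)
            for adr in range(1 << K)]


def da_sum_of_products(coeffs, xs, m):
    """Sum of coeffs[k] * xs[k] for unsigned m-bit samples xs."""
    rom = build_da_rom(coeffs)
    acc = 0
    for i in range(m):                     # i-th bit slice of all K samples
        adr = 0
        for k, x in enumerate(xs):
            adr |= ((x >> i) & 1) << k
        acc += rom[adr] << i               # accumulate with the proper bit shift
    return acc


def build_kcm_rom(a, m):
    """ROM-based multiplier: location i holds the coefficient a times i."""
    return [a * i for i in range(1 << m)]


coeffs = [3, -5, 7, 2]
xs = [10, 200, 33, 255]
m = 8
expected = sum(a * x for a, x in zip(coeffs, xs))
da_result = da_sum_of_products(coeffs, xs, m)
rbm_result = sum(build_kcm_rom(a, m)[x] for a, x in zip(coeffs, xs))
print(expected, da_result, rbm_result)     # all three values agree
```

In this example the distributed arithmetic ROM has $2^4 = 16$ words shared by all four taps, while each ROM-based multiplier needs its own table indexed by the sample value.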
With one location per input value, a ROM of $2^m$ words implements the KCM for m-bit data. Such a multiplier and the filter based on it are called a ROM-based multiplier (RBM) and an RBM filter, respectively. When the available ROM volume is too small for this, the input data is divided into l slices of p bits each so that m ≤ lp, and the result is equal to the sum of l partial products. In [5] it was shown that, for ASIC implementation, such multipliers are effective compared with conventional 16x16-bit multipliers when the ROM volume is less than about 4 kbytes, and therefore the optimum slice width is p = 4. An example of such a KCM for 16-bit input data, 16-bit coefficients, a 16-bit precision product, and p = 4 is illustrated in Fig. 1. This implementation of the KCM has the following advantages.

• High throughput. The multiplier netlist has many pipeline registers, and the logic paths are rather short. The minimum clock period can therefore be extremely short and is equal to the delay of a single adder.

• Small hardware volume. The hardware volume of RBM FIR filters is equal to or less than that of PDA filters. It is less owing to the following feature. Suppose the coefficient a has a sequence of j equal (zero or one) bits. Then in the ROM a bit slice j-p+1 bits wide will consist of bits that are equal to each other. At the netlist synthesis stage such a bit slice is minimized to a constant source. The FIR filters used in applications usually have more than 70% of coefficients with more than j = 4 equal most significant bits, and on average 40% of coefficients with more than 7 such bits. Therefore, when the RBM FIR filter is described in a language such as VHDL, the synthesized filter has 1.1 to 1.6 times less hardware volume than the PDA filter, depending on the filter coefficient set.

[Block diagram not reproduced. Recoverable labels: input x; register RGD with bit fields 15-12, 11-8, 7-4, 3-0; ROM1 to ROM4; pipeline registers RG; adders SM1 to SM3; output ay.]

Fig.1.
Example of the constant coefficient multiplier

• Increased equivalent coefficient bit length. This feature follows from the fact that, when the ROM contents are stored, the code is rounded to m bits after the coefficient a has been multiplied by the constant i, not before the multiplication. The equivalent coefficient length in RBM filters can therefore be estimated as m-2, where m is the maximum word length of the ROM. Note that in PDA filters the coefficient length is equal to m-p.

3. DESCRIPTION OF THE FIR FILTER CORE GENERATOR

A FIR filter core generator has been designed which uses ROM-based multipliers. The generated FIR filter structure is a conventional one and contains an array of internal registers for input data delay (one per tap), an array of coefficient multipliers (one per tap), and an adder tree which calculates the sum of the delayed and weighted input data. To support high throughput the circuit is deeply pipelined. Two types of structures are distinguished: filters with nonsymmetrical (NSRBM) and symmetrical (SRBM) impulse responses. The latter structure exploits the pairs of equal coefficients, which halves the number of multipliers. The advantages of these structures were shown in [7].

The generated cores are described by a single VHDL file and are independent of the FPGA family and vendor. This file can be compiled by any suitable synthesis tool into a netlist for FPGAs of families such as Xilinx XC4000, Spartan™, Virtex™, Altera Flex10K™, Flex20K™, and others.

The core generator is designed as a Win32 Delphi dialog-based application. Its main window is shown in Fig. 2.

Fig.2. FIR filter core generator parameterization window

The generator is also implemented as a command-line driven application. This implementation is included in the HDL Wizard™ tool of the Active HDL package.
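The sliced ROM-based multiplier of Section 2 and the symmetric structure described above can be illustrated by a small behavioral model. This is a Python sketch under stated assumptions, not the generated VHDL: data is unsigned, ROM words are kept at full precision (the rounding of ROM contents is ignored), pipelining is not modeled, and the symmetric filter is assumed to add each pair of samples before its single multiplier. All function names are hypothetical.

```python
P = 4  # slice width in bits, as in the p = 4 example


def build_slice_roms(a, m, p=P):
    """One 2**p-word ROM per p-bit slice of an m-bit unsigned input.

    Every ROM holds a * i at location i; the number of slices l is
    chosen so that m <= l * p.
    """
    n_slices = -(-m // p)                      # ceil(m / p)
    return [[a * i for i in range(1 << p)] for _ in range(n_slices)]


def rbm_multiply(roms, x, p=P):
    """Sum of the partial products, each shifted to its slice weight."""
    mask = (1 << p) - 1
    return sum(rom[(x >> (j * p)) & mask] << (j * p)
               for j, rom in enumerate(roms))


def srbm_output(half_coeffs, window, m):
    """One output of a symmetric filter with an even number of taps.

    window holds the K most recent m-bit samples; half_coeffs holds the
    first K/2 coefficients. Paired samples are added first (an assumed
    arrangement), so only K/2 multipliers are needed.
    """
    K = len(window)
    total = 0
    for k, a in enumerate(half_coeffs):
        pair_sum = window[k] + window[K - 1 - k]   # needs m + 1 bits
        total += rbm_multiply(build_slice_roms(a, m + 1), pair_sum)
    return total


roms = build_slice_roms(12345, 16)
print(len(roms), rbm_multiply(roms, 0xBEEF) == 12345 * 0xBEEF)
print(srbm_output([3, -5], [100, 200, 300, 400], 16))   # -1000
```

For 16-bit data and p = 4 the model uses four 16-word ROMs per multiplier, against a single ROM of $2^{16}$ words without slicing.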
The user can set the following parameters:

• coefficient set, given as decimal, octal, hexadecimal, or binary integers or as real numbers;
• input data width from 4 to 24 bits;
• coefficient width from 4 to 24 bits;
• number of taps from 4 to 100;
• filter type: SRBM, NSRBM, or antisymmetric;
• output precision up to full precision;
• 2's complement or unsigned input data;
• speed slow-down, which is provided by control of the Clock Enable signal.

The parameter limits mentioned above are set as practical limits; the maximum values can be raised.

4. EXPERIMENTAL RESULTS

A set of FIR filters was generated by the new core generator and implemented in Xilinx FPGAs using the Foundation 3.1 synthesis and implementation tool. The impulse responses were chosen as practical ones, representing low-pass and band-pass filters with sharp frequency response characteristics and a suppression rate of 60-70 dB. Some results for an input and output data width of 16, a coefficient width of 16, and a symmetrical impulse response are shown in Table 1. The results of the Xilinx FIR PDA core generator are shown in the last row of the table for comparison. The table shows that the resulting filter structures have high throughput and minimized hardware. Comparing the results for the Virtex devices shows a small increase in hardware volume and a small degradation of the sampling frequency as the filter complexity increases.

Table 1. Parameters of the designed FIR filters and the Xilinx PDA filter

Filter type   Device type    Number of taps, K   Hardware volume, CLB (slices)   Maximum sampling frequency, MHz
SRBM          VirtexE-8      20                  1022                            212
SRBM          VirtexE-8      99                  5151                            208
SRBM          XC4000XV-07    20                  1008                            76
SPDA          XC4000XV-07    20                  1457                            50

5. CONCLUSION

A new FIR filter soft core generator is presented which exploits the structural properties of FPGAs and minimizes the hardware volume of the generated filters.
It supports the design of highly pipelined structures with extremely high clock frequency, which exceeds 200 MHz for modern FPGA families. The generated core is described by a single VHDL file and is therefore independent of the FPGA vendor and family. The core generator is currently being extended to generate decimation filters, interpolation filters, 1/n-band filters, etc.

REFERENCES

[1] Parallel Distributed Arithmetic FIR filter. Product Specification. Available at http://www.xilinx.com. Feb. 8, 1988.
[2] Peled A., Liu B. A New Hardware Realization of Digital Filters. IEEE Trans. on Acoustics, Speech, and Signal Processing, V.ASSP-22, June, 1974.
[3] Mintzer L. FIR filters with the Xilinx FPGA. FPGA'92, ACM/SIGDA Workshop on FPGAs, 1992, pp. 129-134.
[4] Goslin G.R. A Guide to Using Field Programmable Gate Arrays (FPGAs) for Application-Specific Digital Signal Processing Performance. Available at http://www.xilinx.com. 1995.
[5] Kanevski Ju.S., Nekrasov B.A., Sergyienko A.M. Issues of High-Performance FFT Processor Implementation. (In Russian). Upravlajustshie Systemy i Mashiny, 1986, №4, pp. 60-63.
[6] Chapman K. Constant Coefficient Multipliers for the XC4000E. Application Note. Available at http://www.xilinx.com. December 11, 1996.
[7] Kaniewski J., Berezowski R., Gretkowski D., Maslennikov O., Soltan P. Modele VHDL filtrov FIR przeznaczonych do realizaciji w ukladach FPGA. Materialy III Krajowej Konferencji Naukowej "Reprogramowalne uklady cyfrowe", RUC'2000, Szczecin (Poland), 10-11 Kwietnia 2000, pp. 269-276.