Journal "Computational Technologies"

Article information

2018 , Volume 23, № 3, p.39-57

Isupov K.S., Knyazkov V.S., Kuvaev A.S.

Efficient scaling in RNS using interval estimations

Purpose. This paper addresses an acceleration of the scaling operation in the residue number system (RNS). In RNS, addition, subtraction and multiplication are concurrently performed on the n digits (residues) within n parallel channels, and this is the primary advantage of RNS over the binary system. However, some basic operations are more complex in RNS than in binary system. Scaling, i.e. division by a constant factor, is one of such operations. The impossibility of effective scaling prevents a more widespread use of parallel RNS arithmetic.

Methodology. We developed two RNS scaling algorithms, focused on arbitrary moduli sets. In these algorithms, in order to determine the remainder when dividing the number to be scaled by the scaling factor, interval estimation for the relative value of an RNS representation is used. The first algorithm is applicable for scaling factors that do not exceed the machine word size. The second algorithm is designed for scaling by an arbitrary power of two. Both algorithms require only machine-precision integer and floating-point operations, so they are well suited for implementation on many computing architectures.

Findings. The developed algorithms are implemented in C language for CPU and GPU using NVIDIA CUDA, and are tested on moduli sets of sizes from 8 to 64, which provide the dynamic ranges with bit sizes from 55 to 512, respectively. Performance of new algorithms is shown to be much higher than for the algorithms based on the Chinese remainder theorem and parity checking. In addition, new algorithms are well parallelized: increasing the number of RNS moduli leads only to a slight decrease in the performance of parallel CUDA implementations.

Originality/value. The proposed scaling algorithms can be used in many RNS applications that require a dynamic range reduction, for example, in digital signal processing and multiple-precision arithmetic. In particular, the proposed power-of-two scaling algorithm is used for fast rounding and exponent alignment in the hybrid CPUGPU software library for multiple-precision floating-point computations based on RNS, which is developed by the authors.

[full text]
Keywords: residue number system, scaling, parallel algorithms, CUDA programming

doi: 10.25743/ICT.2018.3.15962

Author(s):
Isupov Konstantin Sergeevich
Office: Vyatka State University
Address: 610000, Russia, Kirov
E-mail: ks_isupov@vyatsu.ru

Knyazkov Vladimir Sergeevich
Office: Vyatka State University
Address: 610000, Russia, Kirov

Kuvaev Alexander Sergeevich
Office: Vyatka State University
Address: 610000, Russia, Kirov

References:
[1] Akushskiy, I.Ya., Yuditskiy, D.I. Mashinnaya arifmetika v ostatochnykh klassakh [Machine arithmetic in residual classes]. Moscow: Sovetskoe Radio; 1968: 440. (In Russ.)

[2] Szabo, N.S., Tanaka, R.I. Residue arithmetic and its application to computer technology. N.Y.: McGraw-Hill; 1967: 236.

[3] Omondi, A., Premkumar, B. Residue number systems: theory and implementation. London: Imperial College Press; 2007: 312.

[4] Bajard, J.-C., Eynard, J., Merkiche, N. Montgomery reduction within the context of residue number system arithmetic. Journal of Cryptographic Engineering. 2017:1–12. Available at: https://doi.org/10.1007/s13389-017-0154-9

[5] Antao, S., Bajard, J.-C., Sousa, L. RNS-based elliptic curve point multiplication for massive parallel architectures. Computer Journal. 2012; 55(5):629–647.

[6] Cardarilli, G.C., Nannarelli, A., Re, M. RNS applications in digital signal processing. Embedded Systems Design with Special Arithmetic and Number Systems. Springer International Publishing. Available at: https://doi.org/10.1007/978-3-319-49742-6_8

[7] Chen, J., Yatskiv, V., Sachenko, A., Su, J. Wireless sensor networks based on modular arithmetic. Radioelectronics and Communications Systems. 2017; 60(5):215–224.

[8] Younes, D., Steffan, P. Efficient image processing application using residue number system. Proc. of the 20th Intern. Conf. Mixed Design of Integrated Circuits and Systems (MIXDES). Gdynia: IEEE; 2013:468–472.

[9] Isupov, K., Kuvaev, A., Popov, M., Zaviyalov, A. Multiple-precision residue-based arithmetic library for parallel CPU-GPU architectures: data types and features. Lecture Notes in Computer Science. 2017; (10421):196–204.

[10] Kong, Y., Phillips, B. Fast scaling in the residue number system. IEEE Transactions On Very Large Scale Integration (VLSI) Systems. 2009; 17(3):443–447.

[11] Low, J.Y.S., Chang, C.H. A VLSI efficient programmable power-of-two scaler for 2𝑛 − 1, 2𝑛, 2𝑛 + 1 RNS. IEEE Transactions on Circuits and Systems I. 2012; 59(12):2911–2919.

[12] Isupov, K., Knyazkov, V. Interval estimation of relative values in residue number system. Journal of Circuits, Systems and Computers. 2018; 27(1):1850004.

[13] Meyer-B‥ase, U., Stouraitis, T. New power-of-2 RNS scaling scheme for cell-based IC design. IEEE Transactions On Very Large Scale Integration (VLSI) Systems. 2003; 11(2):280–283.

[14] Garcıa, A., Lloris A. A look up scheme for scaling in the RNS. IEEE Transactions on Computers. 1999; 48(7):748–751.

[15] Cardarilli, G.C., Del Re, A., Nannarelli, A., Re, M. Programmable power-of-two RNS scaler and its application to a QRNS polyphase filter. Proc. of the 2005 IEEE Intern. Symp. on Circuits and Systems. Kobe: IEEE; 2005:1102–1105.

[16] Czyżak M., Smyk R., Ulman Z., Pipelined scaling of signed residue numbers with the mixed-radix conversion in the programmable gate array. Poznan University of Technology Academic Journals. Electrical Engineering. 2013; (76):89–99.

[17] Jullien, G.A. Residue number scaling and other operations using ROM arrays. IEEE Transactions on Computers. 1978; 27(4):325–336.

[18] Shenoy, A.P., Kumaresan, R. A fast and accurate RNS scaling technique for high speed signal processing. IEEE Transactions on Acoustics, Speech, & Signal Processing. 1989; 37(6):929–937.

[19] Barsi, F., Pinotti, M.C. Fast base extension and precise scaling in RNS for look-up table implementation. IEEE Transactions on Signal Processing. 1995; 43(10):2427–2430.

[20] P. V. Ananda Mohan Scaling, base extension, sign detection and comparison in RNS. Birkhäuser, Cham; 2016:133–162. Available at: https://doi.org/10.1007/978-3-319-41385-3_6 ссылка для Г.Г. https://link.springer.com/chapter/10.1007%2F978-3-319-41385-3_6

[21] Shenoy, A.P., Kumaresan, R. Fast base extension using a redundant modulus in RNS. IEEE Transactions on Computers. 1989; 38(2):292–297.

[22] Lu, M., Chiang, J.-S. A novel division algorithm for the residue number system. IEEE Transactions on Computers. 1992; 41(8):1026–1032.

[23] Sousa, L. 2𝑛 RNS scalers for extended 4-moduli sets. IEEE Transactions on Computers. 2015; 64(12):3322–3334.

[24] Hiasat, A. Efficient RNS scalers for the extended three-moduli set (2𝑛 − 1, 2𝑛+𝑝, 2𝑛 + 1). IEEE Transactions on Computers. 2017; 66(7):1253–1260.

[25] Soderstrand, M., Vernia, C., Chang, J.-H. An improved residue number system digitalto-analog converter. IEEE Transactions on Circuits and Systems. 1983; 30(12):903–907.

[26] Vu, T.V. Efficient implementations of the chinese remainder theorem for sign detection and residue decoding. IEEE Transactions on Computers. 1985; 34(7):646–651.

[27] Wilt, N. The CUDA handbook: a comprehensive guide to GPU programming. Addison-Wesley; 2013: 528.

[28] Harris, M. Optimizing parallel reduction in CUDA. Available at: http://developer. download.nvidia.com/compute/cuda/1.1 Beta/x86_website/projects/reduction/doc/ reduction.pdf (accessed 11.10.2017).

Bibliography link:
Isupov K.S., Knyazkov V.S., Kuvaev A.S. Efficient scaling in RNS using interval estimations // Computational technologies. 2018. V. 23. № 3. P. 39-57