Method and device for realizing 32-bit integer division with high precision and low time delay

Document No.: 1003151 Published: 2020-10-23

Reading note: This technology, "Method and device for realizing 32-bit integer division with high precision and low time delay", was designed by 谭定富 on 2020-07-10. Its main content is as follows. The invention provides a method and a device for realizing 32-bit integer division with high precision and low time delay, comprising the following steps: S1, the dividend and the divisor are input into a zero-check unit; if the dividend is 0, the quotient is output directly as 0 and the scaling factor is 0; S2, the dividend and the divisor output by step 1 are input into a sign extraction module, which outputs the sign of the quotient and the magnitudes of the dividend and the divisor. In the invention, a CORDIC stage with a small number of iterations supplies the initial value for the Newton iteration; it needs only addition and shift operations, so resource consumption is low and hardware implementation is convenient. The scaling module scales the dividend and the divisor to the same magnitude range, which reduces the calculation bit width, the number of iterations required, and the operation cycle count. One added Newton iteration increases the precision at the cost of a small number of arithmetic units and a small time delay. The result is output as a quotient plus a scaling factor, which preserves the precision of the quotient while keeping the output bit width small and convenient for subsequent use. The error is less than one part in a thousand.

1. A method for realizing 32-bit integer division with high precision and low time delay, characterized by comprising the following steps:

S1, inputting the dividend and the divisor into a zero-check unit; if the dividend is 0, the quotient is output directly as 0 and the scaling factor is 0;

S2, inputting the dividend and the divisor output by step 1 into a sign extraction module, and outputting the sign of the quotient and the magnitudes of the dividend and the divisor;

S3, inputting the dividend and the divisor output by step 2 into a scaling module, scaling each of them to 12 significant bits, outputting the scaled dividend and divisor, and recording the scaling factor of each;

S4, inputting the divisor output by step 3 into a CORDIC unit, which updates the quotient and the remainder, starting from remainder = 2^12 and initial value = 2^12, and iterates 6 times;

S5, after the 6 iterations are completed, passing the quotient and the remainder as initial values to a Newton iteration unit and performing one Newton iteration;

S6, inputting the quotient output by step 5 and the dividend output by step 3, and outputting quotient = quotient × dividend / 2^12;

S7, updating the quotient output by step 6 and outputting quotient = quotient × sign, with scaling factor = dividend scaling factor - divisor scaling factor - 12, for subsequent use;

S8, the intermediate bit width of the CORDIC unit of the divider is within S14bit (14 bits, signed), and the unit is implemented using only addition and shifting to obtain the initial value of the Newton iteration;

S9, the divider obtains the most significant 12 bits of the quotient; to calculate y/x, the module outputs a quotient a and a scaling factor b such that y/x = a × 2^b.

2. The method of claim 1, wherein in step S1, if the dividend is not 0 and the divisor is 0, the quotient is output directly as 4095 and the shift factor is 20; if neither the dividend nor the divisor is 0, the method proceeds to the next step.

3. The method of claim 1, wherein in step S4 the i-th CORDIC iteration unit operates as follows:

S401, obtaining d_i = -sign(remainder) from the sign of the remainder input to the current iteration;

S402, updating the remainder as remainder = (remainder + d_i × divisor) × 2, and updating the quotient as quotient = quotient - d_i × initial value / 2^(i-1).

4. The method of claim 1, wherein step S5 further comprises the following steps:

S501, judging whether the remainder is 0; if so, ending the Newton iteration and outputting directly to step 6;

S502, updating the remainder as remainder = quotient × remainder / 2^18 and outputting the remainder;

S503, inputting the remainder output in S502, updating the quotient as quotient = quotient + remainder, and outputting the quotient.

5. The method of claim 1, wherein in step S9: when the result is used in a subsequent multiplication, a is used directly and b continues to be recorded; when the result is used in a subsequent addition, the scaling factors of the operands are aligned first and the values are then added directly.

6. A device for realizing 32-bit integer division with high precision and low time delay, characterized by comprising a divider system that includes an input module, an extraction module, a scaling module, a calculation module and an output module.

7. The device of claim 6, wherein: the input module is used for inputting numerical values; the extraction module is used for extracting the values of the dividend and the divisor; the scaling module is used for scaling the dividend and the divisor while recording the scaling factor of each; the calculation module is used for calculating on the input values; and the output module is used for outputting the calculated value.

Technical Field

The invention relates to the technical field of digital signal processing, in particular to a method and a device for realizing 32-bit integer division with high precision and low time delay.

Background

In the field of digital signal processing, a 32-bit integer divider is often used, for example, in operations such as signal normalization and channel estimation, but the existing 32-bit integer divider has the following disadvantages:

1. Existing division schemes usually use reciprocal division, the SRT method, the alternating addition and subtraction method, the CORDIC method and the like. The operation cycle count of these methods often grows sharply as the number of bits increases, which wastes power.

2. As the data bit width increases, the intermediate bit width of the arithmetic unit grows accordingly and occupies a large amount of storage space.

3. To guarantee performance, the bit width of the output result is large, so the resource overhead is high when the result is used in subsequent additions and multiplications.

4. Reciprocal division usually realizes a/b with a Newton iteration plus one multiplication, but the initial value of the Newton iteration is generally obtained in one of two ways, table lookup or Taylor expansion. Both require storage space, and the Taylor expansion also requires extra multiplications and additions, which consumes resources.
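For reference, the reciprocal division described in item 4 can be sketched as follows. This is a generic floating-point illustration of the well-known Newton-Raphson reciprocal update x <- x × (2 - b × x), not code from the invention; the function names and the seed argument `x0` (standing in for the table lookup or Taylor expansion) are hypothetical.

```python
def newton_reciprocal(b, x0, iterations=3):
    """Refine an estimate x of 1/b with the update x <- x * (2 - b * x).

    x0 is the seed; in the schemes discussed above it would come from a
    lookup table or a Taylor expansion (here it is simply passed in).
    """
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - b * x)
    return x


def reciprocal_divide(a, b, x0):
    """Compute a / b as a times the refined reciprocal of b."""
    return a * newton_reciprocal(b, x0)
```

Each update roughly squares the relative error of the estimate, which is why the quality of the seed determines how many iterations are needed.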

Disclosure of Invention

The invention aims to provide a method and a device for realizing 32-bit integer division with high precision and low time delay, which can effectively solve the problems in the background technology.

In order to achieve the purpose, the invention is realized by the following technical scheme: a method for realizing 32-bit integer division with high precision and low time delay comprises the following steps:

S1, inputting the dividend and the divisor into a zero-check unit; if the dividend is 0, the quotient is output directly as 0 and the scaling factor is 0.

S2, inputting the dividend and the divisor output by step 1 into a sign extraction module, and outputting the sign of the quotient and the magnitudes of the dividend and the divisor.

S3, inputting the dividend and the divisor output by step 2 into a scaling module, scaling each of them to 12 significant bits, outputting the scaled dividend and divisor, and recording the scaling factor of each.

S4, inputting the divisor output by step 3 into a CORDIC unit, which updates the quotient and the remainder, starting from remainder = 2^12 and initial value = 2^12, and iterates 6 times.

S5, after the 6 iterations are completed, passing the quotient and the remainder as initial values to a Newton iteration unit and performing one Newton iteration.

S6, inputting the quotient output by step 5 and the dividend output by step 3, and outputting quotient = quotient × dividend / 2^12.

S7, updating the quotient output by step 6 and outputting quotient = quotient × sign, with scaling factor = dividend scaling factor - divisor scaling factor - 12, for subsequent use.

S8, the intermediate bit width of the CORDIC unit of the divider is within S14bit (14 bits, signed), and the unit is implemented using only addition and shifting to obtain the initial value of the Newton iteration.

S9, the divider obtains the most significant 12 bits of the quotient; to calculate y/x, the module outputs a quotient a and a scaling factor b such that y/x = a × 2^b.
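The sign extraction of S2 and the scaling of S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, excess low bits are truncated rather than rounded, and the scaling factor is defined so that magnitude ≈ scaled value × 2^(scaling factor), which is the convention that makes the S7 formula consistent.

```python
def extract_sign(dividend, divisor):
    """Return the sign of the quotient (+1 or -1) and both magnitudes."""
    sign = -1 if (dividend < 0) != (divisor < 0) else 1
    return sign, abs(dividend), abs(divisor)


def scale_to_12_bits(magnitude):
    """Scale a positive integer to 12 significant bits.

    Returns (scaled, factor) with 2**11 <= scaled < 2**12 and
    magnitude ~= scaled * 2**factor. Low bits are truncated (assumption).
    """
    factor = magnitude.bit_length() - 12
    if factor >= 0:
        scaled = magnitude >> factor      # drop excess low bits
    else:
        scaled = magnitude << -factor     # pad small values with zeros
    return scaled, factor
```

For example, 1000000 has 20 significant bits, so it is shifted right by 8 to give 3906 with a scaling factor of 8, while 3 is shifted left by 10 to give 3072 with a scaling factor of -10.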

Further, in step S1, if the dividend is not 0 and the divisor is 0, the quotient is output directly as 4095 and the shift factor is 20; if neither the dividend nor the divisor is 0, the inputs are passed to the next step.
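The zero handling of S1 can be sketched as follows. The function name `zero_check` and the use of `None` to mean "continue with the next step" are assumptions made for illustration; the constants 0, 4095 and 20 are those given in the text.

```python
def zero_check(dividend, divisor):
    """Return a fixed (quotient, scaling factor) pair for the zero cases.

    Returns None when neither input is 0, meaning the normal
    division path should be taken.
    """
    if dividend == 0:
        return 0, 0          # 0 / x: quotient 0, scaling factor 0
    if divisor == 0:
        return 4095, 20      # y / 0: saturated output 4095 * 2**20
    return None
```

Because the dividend is tested first, this sketch returns (0, 0) when both inputs are 0.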

Further, in step S4, the i-th CORDIC iteration unit operates as follows:

S401, obtaining d_i = -sign(remainder) from the sign of the remainder input to the current iteration.

S402, updating the remainder as remainder = (remainder + d_i × divisor) × 2, and updating the quotient as quotient = quotient - d_i × initial value / 2^(i-1).
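The CORDIC iteration of S401 and S402 can be sketched in Python as follows. This is an illustration under assumptions, not the circuit itself: the function name, the starting quotient of 0, the treatment of a zero remainder as positive, and the reading of the update rules as d_i = -sign(remainder), remainder = (remainder + d_i × divisor) × 2 and quotient = quotient - d_i × initial value / 2^(i-1) are all assumed.

```python
def cordic_reciprocal(divisor, iterations=6):
    """Approximate 2**24 / divisor using only additions and shifts.

    divisor is expected to have 12 significant bits
    (2**11 <= divisor < 2**12). Returns (quotient, remainder).
    """
    initial_value = 1 << 12
    remainder = 1 << 12
    quotient = 0                                  # assumed starting value
    for i in range(1, iterations + 1):
        d = -1 if remainder >= 0 else 1           # d_i = -sign(remainder)
        remainder = (remainder + d * divisor) * 2
        quotient -= d * (initial_value >> (i - 1))
    return quotient, remainder
```

Under these assumptions the outputs after 6 iterations satisfy quotient × divisor + 64 × remainder = 2^24 exactly, so the quotient approximates 2^24 / divisor and the remainder records what is still missing. For a divisor of 2048 the sketch returns (8064, 4096), against an exact value of 8192.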

Further, step S5 also includes the following steps:

S501, judging whether the remainder is 0; if so, ending the Newton iteration and outputting directly to step 6.

S502, updating the remainder as remainder = quotient × remainder / 2^18 and outputting the remainder.

S503, inputting the remainder output in S502, updating the quotient as quotient = quotient + remainder, and outputting the quotient.
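The single Newton iteration of S501 to S503 can be sketched as follows. The function name is hypothetical, and two points are assumptions: the division by 2^18 is done as an arithmetic right shift (which rounds toward minus infinity for negative remainders), and S503 consumes the remainder produced by S502. The example inputs are values that a 6-iteration CORDIC stage satisfying quotient × divisor + 64 × remainder = 2^24 would produce.

```python
def newton_refine(quotient, remainder):
    """Apply one correction step to a CORDIC reciprocal estimate."""
    if remainder == 0:
        return quotient                           # S501: already exact
    remainder = (quotient * remainder) >> 18      # S502
    return quotient + remainder                   # S503
```

For a divisor of 2048 the inputs (8064, 4096) give 8064 + 126 = 8190, against an exact 2^24 / 2048 = 8192. The factor 2^18 follows from the relation above: the missing amount is 64 × remainder / divisor, and replacing 1 / divisor by quotient / 2^24 gives quotient × remainder / 2^18.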

Further, in step S9: when the result is used in a subsequent multiplication, a is used directly and b continues to be recorded; when the result is used in a subsequent addition, the scaling factors of the operands are aligned first and the values are then added directly.
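Subsequent use of a result in the form a × 2^b can be sketched as follows. The function names are hypothetical, and aligning to the smaller scaling factor with a left shift is an assumption chosen to keep the sketch exact; a hardware design might instead shift the other operand right to limit bit width.

```python
def scaled_multiply(a1, b1, a2, b2):
    """(a1 * 2**b1) * (a2 * 2**b2): multiply mantissas, add scaling factors."""
    return a1 * a2, b1 + b2


def scaled_add(a1, b1, a2, b2):
    """(a1 * 2**b1) + (a2 * 2**b2): align scaling factors, then add."""
    b = min(b1, b2)
    return (a1 << (b1 - b)) + (a2 << (b2 - b)), b
```

For example, 3 × 2^-2 (0.75) and 5 × 2^1 (10) multiply to 15 × 2^-1 (7.5) and add to 43 × 2^-2 (10.75).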

Further, the input module is used for inputting numerical values; the extracting module is used for extracting numerical values of dividends and divisors; the scaling module is used for scaling the dividend and the divisor and recording scaling factors of the dividend and the divisor at the same time; the calculation module is used for calculating the input numerical value; and the output module is used for outputting the calculated numerical value.

The invention provides a method and a device for realizing 32-bit integer division with high precision and low time delay. The method has the following beneficial effects:

(1) The initial value of the Newton iteration is obtained by CORDIC with a small number of iterations, which requires only addition and shift operations, so resource consumption is low and hardware implementation is convenient.

(2) The scaling module scales the dividend and the divisor to the same magnitude range, which reduces the calculation bit width, the number of iterations required, and the operation cycle count. One added Newton iteration increases the precision at the cost of a small number of arithmetic units and a small time delay.

(3) The result is output in the form of a quotient plus a scaling factor, which effectively preserves the precision of the quotient while keeping the output bit width small and convenient for subsequent use. The error is less than one part in a thousand.

(4) Compared with a common CORDIC scheme, the method greatly reduces the number of iterations and the operation cycle count. Compared with a common Newton iteration scheme, the method uses CORDIC to calculate the initial value and the remainder, which adds only a small operation delay while reducing the storage and multiply-add resources required.

Description of the drawings:

FIG. 1 is a block diagram of the system of the present invention;

FIG. 2 is an overall block diagram of a divider according to the present invention;

FIG. 3 is a block diagram of an ith CORDIC iteration block according to the present invention;

FIG. 4 is a block diagram of a Newton's iteration unit of the present invention.

Detailed Description

The invention is illustrated below with reference to specific examples. It will be understood by those skilled in the art that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention in any way.

Embodiment 1: referring to FIGS. 1-4, a method for realizing 32-bit integer division with high precision and low time delay, often applied in the field of digital signal processing in operations such as signal normalization and channel estimation, is realized by the following steps:

Step one: the dividend and the divisor are input into the zero-check unit. If the dividend is 0, the quotient is output directly as 0 and the scaling factor is 0. If the dividend is not 0 and the divisor is 0, the quotient is output directly as 4095 and the shift factor is 20. If neither the dividend nor the divisor is 0, the inputs are passed to the next step.

Step two: the dividend and the divisor output by step 1 are input into the sign extraction module, which outputs the sign of the quotient and the magnitudes of the dividend and the divisor.

Step three: the dividend and the divisor output by step 2 are input into the scaling module, which scales each of them to 12 significant bits, outputs the scaled dividend and divisor, and records the scaling factor of each.

Step four: the divisor output by step 3 is input into the CORDIC unit, which updates the quotient and the remainder, starting from remainder = 2^12 and initial value = 2^12, and iterates 6 times. The i-th CORDIC iteration unit operates as follows:

1) From the sign of the remainder input to the current iteration, d_i = -sign(remainder) is obtained.

2) The remainder is updated as remainder = (remainder + d_i × divisor) × 2. The quotient is updated as quotient = quotient - d_i × initial value / 2^(i-1).

Step five: after the 6 iterations are completed, the quotient and the remainder are passed as initial values to the Newton iteration unit, and one Newton iteration is performed:

1) Judge whether the remainder is 0. If so, end the Newton iteration and output directly to step 6. Otherwise, proceed to 2).

2) Update the remainder as remainder = quotient × remainder / 2^18 and output the remainder.

3) Input the remainder output in 2), update the quotient as quotient = quotient + remainder, and output the quotient.

Step six: the quotient output by step 5 and the dividend output by step 3 are input, and quotient = quotient × dividend / 2^12 is output.

Step seven: the quotient output by step 6 is updated and output as quotient = quotient × sign, with scaling factor = dividend scaling factor - divisor scaling factor - 12, for subsequent use.

Step eight: the intermediate bit width of the CORDIC unit of the divider is within S14bit (14 bits, signed), and the unit is implemented using only addition and shifting to obtain the initial value of the Newton iteration.

Step nine: the divider obtains the most significant 12 bits of the quotient; to calculate y/x, the module outputs a quotient a and a scaling factor b such that y/x = a × 2^b. When the result is used in a subsequent multiplication, a is used directly and b continues to be recorded; when the result is used in a subsequent addition, the scaling factors of the operands are aligned first and the values are then added directly.
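Steps one to seven can be combined into a single Python sketch as follows. It is an illustrative model under assumptions, not a bit-exact description of the divider: the function name `divide`, the starting quotient of 0 in the CORDIC stage, the reading d_i = -sign(remainder), and truncation (arithmetic right shift) at every scaling step are assumed. Because of the truncation choices, the error of this sketch need not match the figure stated for the invention, and no final trimming of a to 12 bits is modeled.

```python
def divide(y, x):
    """Return (a, b) such that y / x is approximately a * 2**b."""
    # Step one: zero check.
    if y == 0:
        return 0, 0
    if x == 0:
        return 4095, 20

    # Step two: sign of the quotient and magnitudes of the inputs.
    sign = -1 if (y < 0) != (x < 0) else 1

    # Step three: scale both magnitudes to 12 significant bits.
    def scale(m):
        s = m.bit_length() - 12
        return (m >> s if s >= 0 else m << -s), s

    y12, sy = scale(abs(y))
    x12, sx = scale(abs(x))

    # Step four: 6 CORDIC iterations approximating 2**24 / x12.
    quotient, remainder = 0, 1 << 12
    for i in range(1, 7):
        d = -1 if remainder >= 0 else 1
        remainder = (remainder + d * x12) * 2
        quotient -= d * ((1 << 12) >> (i - 1))

    # Step five: one Newton iteration.
    if remainder != 0:
        quotient += (quotient * remainder) >> 18

    # Step six: multiply by the scaled dividend.
    quotient = (quotient * y12) >> 12

    # Step seven: apply the sign and combine the scaling factors.
    return sign * quotient, sy - sx - 12
```

For example, `divide(100, 7)` returns (3656, -8) in this sketch, that is 3656 / 256 = 14.28125, against an exact value of about 14.2857.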

A device for realizing 32-bit integer division with high precision and low time delay comprises a divider system that includes an input module, an extraction module, a scaling module, a calculation module and an output module, wherein the input module is used for inputting numerical values; the extraction module is used for extracting the values of the dividend and the divisor; the scaling module is used for scaling the dividend and the divisor while recording the scaling factor of each; the calculation module is used for calculating on the input values; and the output module is used for outputting the calculated value.

For a certain LTE receiver, the maximum number of receiving antennas is 4 and the maximum number of receiving layers is 4. When the actual number of receiving antennas is 4 and the number of receiving layers is 2, let the received signal be y and the corresponding channel estimate be H.

Then y = H × x + n, where n is a 2 × 1 matrix, and the ML solution of x is required.

H and y are input into the QR decomposition module, which outputs a 4 × 3 upper triangular matrix R. The ML solution of y = H × x + n is then equivalent to the solution of the triangular system defined by R. Back-substitution gives x₁ = R₂₃/R₂₂ and x₂ = (R₁₃ - R₁₂ × x₁)/R₁₁.

To solve this system, a divider module is required. The dividend R₂₃ and the divisor R₂₂ are input into the divider, which outputs the quotient a and the scaling factor b, so that x₁ = a × 2^b. This gives the result.

According to the invention, the initial value of the Newton iteration is obtained by CORDIC with a small number of iterations, which requires only addition and shift operations, so resource consumption is low and hardware implementation is convenient. The scaling module scales the dividend and the divisor to the same magnitude range, which reduces the calculation bit width, the number of iterations required, and the operation cycle count. One added Newton iteration increases the precision at the cost of a small number of arithmetic units and a small time delay. The result is output in the form of a quotient plus a scaling factor, which effectively preserves the precision of the quotient while keeping the output bit width small and convenient for subsequent use. Compared with a general Newton iteration scheme, the method uses CORDIC to calculate the initial value and the remainder, which adds only a small operation delay while reducing the storage and multiply-add resources required.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various changes and modifications can be made without departing from the inventive concept of the present invention, and these changes and modifications are all within the scope of the present invention.
