Device and method for solving the N-th roots of a complex number

Document No.: 1952072 | Published: 2021-12-10

Abstract: This technology, "Device and method for solving the N-th roots of a complex number", was designed by Li Li (李丽), Xu Jin (徐瑾), Fu Yuxiang (傅玉祥), Chen Hui (陈辉), Jiang Lin (蒋林), Wu Ruiqi (武瑞琪), He Shuzhuan (何书专) and Chen Jian (陈健) on 2021-09-27. The invention discloses a device and a method for solving the N-th roots of a complex number, and relates to the technical field of complex N-th root operations. To address the complicated computation process and low efficiency of complex N-th root calculation in the prior art, the invention builds a CORDIC module for the computation. The CORDIC module adopts a pipeline architecture, so that multiple iterations proceed simultaneously; the similarity among the solving processes of the N roots is exploited to share computing resources and reduce computational complexity; and the computations of the result processing unit are executed in parallel. This lowers the computation cost and improves computation efficiency and precision while keeping the hardware complexity low. Inputs in the range 10^-8 to 10^4 are supported, and the relative error can reach the order of 10^-6.

1. A device for solving the N-th roots of a complex number, characterized by comprising a CORDIC calculation module and a result processing unit, wherein input data a and b are processed by the CORDIC calculation module and then passed to the result processing unit, which computes the calculation result.

2. The apparatus of claim 1, wherein the CORDIC module comprises a first coordinate conversion unit, a modulus calculation unit, a phase angle calculation unit and a second coordinate conversion unit; the first coordinate conversion unit comprises a circular vector calculation unit; the modulus calculation unit comprises a first linear vector calculation unit, a hyperbolic vector calculation unit and a hyperbolic rotation calculation unit; the phase angle calculation unit comprises a second linear vector calculation unit; the second coordinate conversion unit comprises a circular rotation calculation unit;

the input data are fed to the first coordinate conversion unit and converted from rectangular form to polar form; the converted data are processed by the modulus calculation unit and the phase angle calculation unit respectively and then fed to the second coordinate conversion unit, which converts the polar form back to rectangular form; the system uses a pipelined architecture.

3. The apparatus of claim 2, wherein the CORDIC module converges in X iterations, X being an integer greater than zero; the X iterations are carried out by an X-stage pipeline architecture to complete the convergence of the result.

4. The apparatus of claim 3, wherein the iterations comprise positive iterations, which determine the calculation accuracy, and negative iterations, which expand the convergence range of the calculation.

5. The apparatus according to claim 4, further comprising a look-up table for storing constant values associated with N.

6. The apparatus of claim 1, wherein the result processing unit comprises a plurality of parallel sub-calculation units, and different sub-calculation units in the result processing unit are dynamically activated according to the value of the input integer N, so that the N roots are computed in parallel.

7. The apparatus of claim 6, wherein the sub-calculation unit uses the trigonometric angle sum and difference formulas to calculate the real and imaginary parts of the N-th root of the complex number.

8. The apparatus of claim 7, wherein the sub-calculation unit comprises multipliers, an adder and a subtracter; the outputs of the first multiplier and the third multiplier are connected to the inputs of the adder, which outputs the imaginary part of the N-th root; the outputs of the second multiplier and the fourth multiplier are connected to the inputs of the subtracter, which outputs the real part of the N-th root.

9. A method for solving the N-th roots of a complex number, characterized in that it uses the apparatus for solving the N-th roots of a complex number according to any one of claims 1 to 8; the complex number is denoted z = a + b·i; the input data a and b undergo X iterations in the CORDIC module through a pipeline architecture to complete the convergence of the result, X being an integer greater than zero; and the output of the CORDIC module is passed to the result processing unit, which computes the roots in parallel to obtain the calculation result.

10. The method of claim 9, wherein N is an integer greater than or equal to two and less than or equal to ten.

Technical Field

The present invention relates to the technical field of complex N-th root operations, and more particularly to an apparatus and method for solving the N-th roots of a complex number.

Background

Complex arithmetic is a core part of circuit computation; it is widely used in communication systems and signal processing for real-time data representation and system modeling. The N-th root operation is an important component of complex function theory, and it is introduced into polynomial computation, matrix computation, trigonometric function computation and the like to simplify the calculation. However, the complex N-th root operation is highly complex, both because the number of roots is not fixed and because complex arithmetic is itself involved. Most research on N-th roots focuses on real numbers, and complex N-th roots are usually computed in software. Software implementations tend to mix several general algorithms rather than use a dedicated one; although they give accurate and reliable results, the process contains redundancy and performs poorly in real-time operation.

Another approach is to accelerate the N-th root operation with Application Specific Integrated Circuit (ASIC) hardware to achieve high computational performance. However, only a few works address hardware implementation of complex square roots, and because higher-order roots are widely used in fields such as atmospheric modeling and radiation, square root operations alone cannot meet application requirements. In a hardware implementation, the resources consumed by the N-dependent constants are proportional to the range of N; in practical applications the most common values of N are the integers from 2 to 10.

The coordinate rotation digital computer (CORDIC) algorithm can efficiently compute transcendental functions such as trigonometric, exponential and logarithmic functions using only simple shift and add operations. It achieves high computation speed with a good balance between precision and area, and therefore low cost. The invention accordingly provides a low-complexity CORDIC-based hardware solution for computing the 2nd to 10th roots of a complex number, which reduces hardware implementation complexity while achieving high computation efficiency.

Because the number of results is not fixed, N-th root calculation has long been a challenging problem, and most hardware implementations focus on real N-th roots or only on square roots. Chinese patent application No. CN202011357034.8, published on 2021-03-12, discloses a CORDIC-based method for computing the N-th roots of a complex number, which is earlier work by the present applicant. Although it can compute N-th roots of arbitrary order, it computes the roots of each input serially according to the value of N and produces only one of the N results at a time. Since an N-th root operation has N results, the circuit complexity grows rapidly as N increases, so the computation efficiency and flexibility of that method are insufficient.

Disclosure of Invention

1. Technical problem to be solved

To address the complicated computation process and low efficiency of complex N-th root calculation in the prior art, the invention provides a device and a method for solving the N-th roots of a complex number.

2. Technical scheme

The purpose of the invention is realized by the following technical scheme.

The device comprises a CORDIC calculation module and a result processing unit, wherein input data a and b are processed by the CORDIC module and then passed to the result processing unit, which computes the calculation result.

Furthermore, the CORDIC module comprises a first coordinate conversion unit, a modulus calculation unit, a phase angle calculation unit and a second coordinate conversion unit. The first coordinate conversion unit comprises a circular vector calculation unit; the modulus calculation unit comprises a first linear vector calculation unit, a hyperbolic vector calculation unit and a hyperbolic rotation calculation unit; the phase angle calculation unit comprises a second linear vector calculation unit; the second coordinate conversion unit comprises a circular rotation calculation unit.

The input data are fed to the first coordinate conversion unit and converted from rectangular form to polar form; the converted data are processed by the modulus calculation unit and the phase angle calculation unit respectively and then fed to the second coordinate conversion unit, which converts the polar form back to rectangular form. The system uses a pipelined architecture.

Furthermore, the CORDIC module converges in X iterations, X being an integer greater than zero; the X iterations are carried out by an X-stage pipeline architecture to complete the convergence of the result.

Further, the iterations comprise positive iterations, which determine the calculation accuracy, and negative iterations, which expand the convergence range of the calculation.

Still further, the apparatus includes a look-up table for storing constant values associated with N. The constants in the calculation are pre-calculated and stored in the lookup table, so that complex hardware calculation can be avoided.

Furthermore, the result processing unit comprises a plurality of parallel sub-calculation units, and different sub-calculation units in the result processing unit are dynamically activated according to the value of the input integer N, so that the N roots are computed in parallel.

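Further, the sub-calculation unit uses the trigonometric angle sum and difference formulas to calculate the real and imaginary parts of the N-th root of z. With ρ and θ the modulus and phase angle of z and d = 0, 1, ..., N-1 the index of the root, the real part and the imaginary part are expressed as follows:

$$\mathrm{Re}\left(\sqrt[N]{z}\right) = \sqrt[N]{\rho}\,\cos\frac{\theta}{N}\cos\frac{2d\pi}{N} - \sqrt[N]{\rho}\,\sin\frac{\theta}{N}\sin\frac{2d\pi}{N}$$

$$\mathrm{Im}\left(\sqrt[N]{z}\right) = \sqrt[N]{\rho}\,\sin\frac{\theta}{N}\cos\frac{2d\pi}{N} + \sqrt[N]{\rho}\,\cos\frac{\theta}{N}\sin\frac{2d\pi}{N}$$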

Furthermore, the sub-calculation unit comprises multipliers, an adder and a subtracter. The outputs of the first multiplier and the third multiplier are connected to the inputs of the adder, which outputs the imaginary part of the N-th root; the outputs of the second multiplier and the fourth multiplier are connected to the inputs of the subtracter, which outputs the real part of the N-th root.

A method for solving the N-th roots of a complex number uses the above device. The complex number is denoted z = a + b·i; the input data a and b undergo X iterations in the CORDIC module through a pipeline architecture to complete the convergence of the result, X being an integer greater than zero; the output of the CORDIC module is then passed to the result processing unit, which computes the roots in parallel to obtain the calculation result.

Further, N is an integer of two or more and ten or less.

The invention uses an efficient parallel result processing unit and reduces computational complexity by exploiting the similarity among the solving processes of the N roots of a complex N-th root. With its pipeline architecture it dynamically supports computation of the 2nd to 10th roots of a complex number, saves storage resources, and overcomes the high computational complexity and long computation time of N-th root calculation.

3. Advantageous effects

Compared with the prior art, the method adopts a pipeline architecture, can calculate a plurality of iterative processes simultaneously, and improves the throughput rate; in the calculation process, the constant value related to N is calculated in advance and stored in the lookup table, and the corresponding constant is flexibly taken out according to different integers N, so that the hardware overhead is further reduced, and the storage resource is saved.

The invention shares computing resources by exploiting the similarity among the solving processes of the N roots of a complex N-th root, which reduces the hardware implementation complexity of high-order roots; using the properties of CORDIC, the roots are obtained with simple shift and add operations. Different result processing units are dynamically activated according to the value of the input integer N and the N results are computed in parallel. This improves the input range, throughput and flexibility while reducing hardware implementation complexity and resource consumption.

The invention enlarges the convergence range of the calculation: in software simulation it supports inputs in the range 10^-8 to 10^4 with a relative error on the order of 10^-6, giving a wide supported range and high calculation precision.

Drawings

FIG. 1 is a hardware architecture diagram of the present invention;

FIG. 2 is a diagram of a parallel result processing unit architecture of the present invention;

FIG. 3 is a schematic diagram of a 23-stage pipeline architecture of the CV-CORDIC module of the present invention;

FIG. 4 is a graph of the simulated relationship of the average relative error, the integer value of N, and the number of positive iterations P in the present invention.

Detailed Description

The invention is described in detail below with reference to the drawings and specific examples. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.

Examples

As shown in FIG. 1, the real part a and the imaginary part b of a complex number z are input to the system and pass through a CORDIC module. The CORDIC module comprises a first coordinate conversion unit, a modulus calculation unit, a phase angle calculation unit and a second coordinate conversion unit. The first coordinate conversion unit comprises a circular vector calculation unit; the modulus calculation unit comprises a first linear vector calculation unit, a hyperbolic vector calculation unit and a hyperbolic rotation calculation unit; the phase angle calculation unit comprises a second linear vector calculation unit; the second coordinate conversion unit comprises a circular rotation calculation unit.

The input data are fed to the first coordinate conversion unit and converted from rectangular form to polar form. The converted data are processed by the modulus calculation unit and the phase angle calculation unit respectively and then fed to the second coordinate conversion unit, which converts the polar form back to rectangular form and sends it to the result processing unit. The system uses a pipeline architecture to achieve high-speed, fully pipelined computation.

The result processing unit uses the trigonometric angle sum and difference formulas, implemented with shift and add operations, to calculate the real and imaginary parts of the N-th root. The result processing unit comprises a plurality of parallel sub-calculation units, and different sub-calculation units are dynamically activated according to the value of the input integer N, so that the N roots are computed in parallel.

A high-order N-th root has N roots, and the circuit complexity grows rapidly as N increases. Taking N = 10 as an example, if 10 copies of the computing resources are used to compute the 10 results in parallel, the hardware overhead is 10 times the original; if the 10 roots are computed serially, a large amount of computation time is consumed. To reduce the complexity of high-order roots, this embodiment shares computing resources by exploiting the similarity among the solving processes of the N roots, and attaches parallel result processing units only at the final stage, thereby reducing the hardware implementation complexity.

In this embodiment, five CORDIC algorithms drawn from three coordinate systems are used to compute the N-th roots of a complex number. Each coordinate system has two modes, rotation and vector. The generalized CORDIC formula is as follows:

$$x_{k+1} = x_k - \mu\, d_k \left(2^{-k} y_k\right)$$

$$y_{k+1} = y_k + d_k \left(2^{-k} x_k\right)$$

$$z_{k+1} = z_k - d_k\, e_k$$

where k is the current iteration index and d_k is the decision operator, whose value depends on the operating mode of the CORDIC: in rotation mode, d_k = sign(z_k); in vector mode, d_k = -sign(x_k y_k), so that y is driven toward zero. In addition, the values of μ and e_k depend on the coordinate system used to describe the CORDIC equations, as shown in the following formula:

$$(\mu,\ e_k) = \begin{cases} (1,\ \tan^{-1} 2^{-k}), & \text{circular} \\ (0,\ 2^{-k}), & \text{linear} \\ (-1,\ \tanh^{-1} 2^{-k}), & \text{hyperbolic} \end{cases}$$

In the above formula, circular denotes the circular coordinate system, linear the linear coordinate system, and hyperbolic the hyperbolic coordinate system. In general, k starts from 0 and increases by 1 in each iteration; the calculation range can be expanded by negative extension, that is, by starting k from a negative value. The hyperbolic coordinate system is an exception: at the fixed special values k = 4, 13, 40, the iteration must be repeated once to guarantee convergence.
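The recurrence above can be exercised in software. The following is a minimal floating-point sketch, not the fixed-point pipeline of the embodiment; the function name `cordic`, its parameters and the index range 0 to 20 are assumptions made for illustration. It assumes the standard values μ = 1, 0, -1 and e_k = atan(2^-k), 2^-k, atanh(2^-k) for the circular, linear and hyperbolic systems, and the vector-mode decision d_k = -sign(x_k y_k) that drives y toward zero.

```python
import math

def cordic(x, y, z, mode, system, indices):
    """Run the generalized CORDIC recurrence over the given iteration indices.

    mode:   "rotation" drives z toward 0, "vector" drives y toward 0.
    system: "circular", "linear" or "hyperbolic" (selects mu and e_k).
    Note: the hyperbolic branch needs k >= 1 here; the separate formula
    used for negative hyperbolic iterations is not reproduced.
    """
    mu = {"circular": 1, "linear": 0, "hyperbolic": -1}[system]
    for k in indices:
        t = 2.0 ** -k
        if system == "circular":
            e = math.atan(t)
        elif system == "linear":
            e = t
        else:
            e = math.atanh(t)
        if mode == "rotation":
            d = 1 if z >= 0 else -1
        else:
            d = -1 if x * y >= 0 else 1
        # All three updates use the values from the previous iteration.
        x, y, z = x - mu * d * t * y, y + d * t * x, z - d * e
    return x, y, z

# Circular vector mode: recover modulus and phase angle of 3 + 4j.
x, y, z = cordic(3.0, 4.0, 0.0, "vector", "circular", range(21))
gain = math.prod(math.sqrt(1 + 4.0 ** -k) for k in range(21))
modulus, angle = x / gain, z   # close to 5.0 and math.atan2(4, 3)
```

In vector mode the x output is scaled by the accumulated gain of the rotations, which is why the sketch divides by `gain`; in hardware this constant would typically be precomputed.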

In this embodiment, each CORDIC operator is iterated 23 times to complete convergence. In principle the number of iterations may take any value if accuracy is not a concern; 23 is used here as an example because software simulation shows that 23 iterations achieve high accuracy. The maximum number of positive iterations is chosen to be 20, at which the relative error is on the order of 10^-6 and the precision requirement is met. The negative iterations extend down to index -2, which expands the input range to 10^-8 to 10^4 and covers N-th root operations on typical data. After 23 iterations, the CORDIC equations converge to the values shown in Table 1.

TABLE 1 CORDIC output Convergence values for three coordinate systems, two modes

In Table 1, χ and λ are scaling factors used to correct the results; they are defined separately for the circular coordinate system and for the hyperbolic coordinate system. Since the number of iterations is known, these scaling factors are constants that can be pre-computed in software and stored in a look-up table to avoid complex hardware computation.
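As an illustration of such a pre-computation, the sketch below evaluates the standard circular CORDIC gain, the product of sqrt(1 + 2^-2k) over the iteration indices. Whether this product is exactly the factor denoted χ in Table 1 is an assumption, since the defining expressions are not available in the text; the function name `circular_gain` is hypothetical.

```python
import math

def circular_gain(indices):
    """Accumulated vector-length gain of circular CORDIC micro-rotations."""
    gain = 1.0
    for k in indices:
        gain *= math.sqrt(1.0 + 4.0 ** -k)   # 4**-k equals 2**(-2k)
    return gain

# Conventional index set k = 0, 1, ..., 20.
gain_plain = circular_gain(range(21))

# Index set with k = 0 repeated three times, then k = 1, ..., 20.
gain_extended = circular_gain([0, 0, 0] + list(range(1, 21)))
```

Because each repeated k = 0 step multiplies the gain by sqrt(2), the extended set has exactly twice the gain of the conventional one, so the correction remains a single known constant.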

According to Table 1, a circular vector calculation unit, a hyperbolic vector calculation unit, a first linear vector calculation unit, a hyperbolic rotation calculation unit, a second linear vector calculation unit and a circular rotation calculation unit are used to compute the N-th roots of the complex number z = m + jn. The result of the circular vector calculation unit passes through a multi-stage buffer unit before being sent to the second linear vector calculation unit, so that it stays synchronized with the hyperbolic rotation calculation unit. The connections between the CORDIC calculation units and their input and output values are shown in FIG. 1. The result processing unit finally outputs the real and imaginary parts of the N-th root through shift and add operations.
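One plausible reading of the modulus path, assuming the usual roles of the CORDIC modes (hyperbolic vector for a logarithm, linear vector for a division, hyperbolic rotation for an exponential), is the following chain of identities. The mapping of each identity to a specific unit is an assumption, since Table 1 is not reproduced in the text; the identities themselves are standard.

$$\ln\rho = 2\tanh^{-1}\frac{\rho-1}{\rho+1}, \qquad \frac{\ln\rho}{N} = \ln\rho \div N, \qquad \sqrt[N]{\rho} = e^{\frac{\ln\rho}{N}} = \cosh\frac{\ln\rho}{N} + \sinh\frac{\ln\rho}{N}$$

Under the same assumption, the phase angle path needs only the division θ/N, which matches the single linear vector unit in the phase angle calculation unit.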

The exponential form of z is z = ρ e^{j(2dπ+θ)}, and the N-th roots of z are solved from this form:

$$\sqrt[N]{z} = \sqrt[N]{\rho}\; e^{j\frac{2d\pi+\theta}{N}}$$

where ρ = sqrt(m^2 + n^2) and d = 0, 1, ..., N-1; when m ≥ 0, θ = arctan(n/m); when m < 0 and n ≥ 0, θ = arctan(n/m) + π; when m < 0 and n < 0, θ = arctan(n/m) - π.

Depending on the N different values of d, the N-th root of z takes N values. Once d is fixed, cos(2dπ/N) and sin(2dπ/N) are constants; they are stored in a look-up table in advance, and the constants corresponding to the current integer N between 2 and 10 are fetched as needed to save hardware computation time.

As shown in FIG. 2, the result processing unit comprises multipliers, an adder and a subtracter. The outputs of the first multiplier and the third multiplier are connected to the inputs of the adder, which outputs the imaginary part of the N-th root; the outputs of the second multiplier and the fourth multiplier are connected to the inputs of the subtracter, which outputs the real part of the N-th root. The result processing unit adopts a parallel structure and uses the trigonometric angle sum and difference formulas to compute the real part and the imaginary part of each root:

$$\mathrm{Re}\left(\sqrt[N]{z}\right) = \sqrt[N]{\rho}\,\cos\frac{\theta}{N}\cos\frac{2d\pi}{N} - \sqrt[N]{\rho}\,\sin\frac{\theta}{N}\sin\frac{2d\pi}{N}$$

$$\mathrm{Im}\left(\sqrt[N]{z}\right) = \sqrt[N]{\rho}\,\sin\frac{\theta}{N}\cos\frac{2d\pi}{N} + \sqrt[N]{\rho}\,\cos\frac{\theta}{N}\sin\frac{2d\pi}{N}$$
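The sharing idea can be sketched in software: the polar conversion and the two shared products are computed once, and each root then costs four multiplications, one subtraction and one addition against constants that depend only on N and d. This is a floating-point illustration that uses library functions in place of the CORDIC units; the function name `nth_roots` and the list standing in for the look-up table are assumptions.

```python
import math

def nth_roots(a, b, n):
    """Return the n n-th roots of z = a + b*j using one shared polar conversion."""
    rho = math.hypot(a, b)            # modulus of z
    theta = math.atan2(b, a)          # phase angle of z
    r = rho ** (1.0 / n)              # N-th root of the modulus
    c = r * math.cos(theta / n)       # shared by all n roots
    s = r * math.sin(theta / n)       # shared by all n roots

    # Stand-in for the look-up table: constants that depend only on n and d.
    lut = [(math.cos(2 * math.pi * d / n), math.sin(2 * math.pi * d / n))
           for d in range(n)]

    roots = []
    for cos_d, sin_d in lut:
        real = c * cos_d - s * sin_d  # two products feeding a subtracter
        imag = s * cos_d + c * sin_d  # two products feeding an adder
        roots.append(complex(real, imag))
    return roots

roots = nth_roots(3.0, 4.0, 5)        # the five 5th roots of 3 + 4j
```

Each loop iteration is independent of the others, which is what allows the per-root work to be laid out as parallel units in hardware.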

The value of the integer N input in each clock cycle can change dynamically between 2 and 10. The parallel result processing unit dynamically activates different result units according to the value of the input integer N and computes the N roots in parallel. In the case shown in FIG. 2, N = 3, and the result processing unit activates units result1 to result3 to compute the 3 roots simultaneously.

FIG. 3 shows the 23-stage pipeline architecture of the CV-CORDIC module of the present invention; the operators of the other CORDIC modules have a similar architecture. In the first three iterations, which take the place of the negative iterations, k = 0 and the iteration formulas are:

$$x_{k+1} = x_k - d_k\, y_k$$

$$y_{k+1} = y_k + d_k\, x_k$$

$$z_{k+1} = z_k - d_k \tan^{-1}(1)$$

In the last twenty positive iterations, k = 1, 2, ..., 20 and the iteration formulas are:

$$x_{k+1} = x_k - d_k\, y_k\, 2^{-k}$$

$$y_{k+1} = y_k + d_k\, x_k\, 2^{-k}$$

$$z_{k+1} = z_k - d_k \tan^{-1}\left(2^{-k}\right)$$

Here d_k is the vector-mode decision operator of the generalized CORDIC formula, determined by the sign of y_k.

The values tan^-1(1) and tan^-1(2^-k) are stored in a look-up table. In this embodiment each CORDIC operator is iterated 23 times to complete convergence, and the number of iterations matches the number of pipeline stages, so that a high-throughput fixed-point implementation of 2nd to 10th root computation is obtained. Software simulation shows that a maximum of 20 positive iterations is sufficient to meet the precision requirement, while the negative iterations extend down to index -2. The negative-extension method differs slightly among the three coordinate systems:

(a) for the circular coordinate system CORDIC algorithm, the sequence k = 0, 0, 0, 1, ..., 20 has been verified to be a better sequence than k = -2, -1, 0, 1, ..., 20;

(b) for the linear coordinate system CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, ..., 20, as shown in FIG. 3;

(c) for the hyperbolic coordinate system CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, ..., 20 as in the linear case; the difference from the linear coordinate system CORDIC algorithm is that the negative iterations of the hyperbolic algorithm use an iteration formula that is independent of the one used for the positive iterations.
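The effect of the negative indices is easiest to see in the linear case, where vector mode performs a division and the largest quotient it can reach is the sum of the step sizes 2^-k. The sketch below is a floating-point illustration; the function names `linear_range` and `linear_vector_divide` are hypothetical.

```python
def linear_range(indices):
    """Largest |y/x| that linear vector mode can reach with these indices."""
    return sum(2.0 ** -k for k in indices)

def linear_vector_divide(y, x, indices):
    """Approximate y / x with linear CORDIC in vector mode (x is unchanged)."""
    z = 0.0
    for k in indices:
        t = 2.0 ** -k
        d = -1 if x * y >= 0 else 1   # drive y toward zero
        y += d * t * x
        z -= d * t
    return z

plain = range(0, 21)       # k = 0, 1, ..., 20
extended = range(-2, 21)   # k = -2, -1, 0, 1, ..., 20

range_plain = linear_range(plain)         # just under 2
range_extended = linear_range(extended)   # just under 8

q_plain = linear_vector_divide(6.0, 1.0, plain)        # saturates near 2
q_extended = linear_vector_divide(6.0, 1.0, extended)  # converges to about 6
```

With only non-negative indices the quotient 6 lies outside the reachable range and the result saturates; adding k = -2 and k = -1 contributes steps of 4 and 2 and brings it within range without changing the final precision, which is still set by the smallest step 2^-20.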

In this embodiment, the precision is determined by the number of positive iterations and the convergence range is expanded by the negative iterations, which gives great flexibility. A simulation over 10000 data points was carried out in MATLAB.

With the negative iterations extending down to index -2 and the maximum number of positive iterations denoted P, the experiment explores how the average relative error depends on the integer N and on the number of positive iterations P.

The integer N takes values from 2 to 10. Varying P from 15 to 20 yields the simulated relationship among the average relative error, the integer N and the number of positive iterations P shown in FIG. 4. When P = 20, the average relative error reaches 1.38 × 10^-6, and the input range 10^-8 to 10^4 is supported.

This embodiment was modeled in Verilog HDL; hardware simulation on 10000 test data gives an average relative error of 2.9578 × 10^-6. Taking N = 10 as an example, the circuit was synthesized in a TSMC 28 nm CMOS process. Table 2 compares the synthesis results and performance of this embodiment with those of the prior work.

TABLE 2

Design         | Process | Architecture  | Frequency | Area (μm^2) | Latency  | Accuracy
Prior work 1   | 28 nm   | Non-pipelined | 1.5 GHz   | 6561        | 170.9 ns | 9.6117 × 10^-5
Prior work 2   | 28 nm   | Pipelined     | 2.218 GHz | 67964.17    | 4.50 ns  | 2.9660 × 10^-6
This invention | 28 nm   | Pipelined     | 2.218 GHz | 68070.87    | 0.451 ns | 2.9578 × 10^-6

Prior work 1 is the patent cited in the Background section and is earlier work by the applicant. It operates at 1.5 GHz without a pipeline architecture and computes the N-th roots of each input serially according to the value of N, producing only 1 of the N results at a time. Taking N = 10 as an example, computing the 10th roots of one complex number takes 255 clock cycles, or 170.9 ns.

Prior work 2 adds a pipeline architecture to prior work 1 and operates at 2.218 GHz. Taking N = 10 as an example, only 1 of the 10 roots is computed at a time, so computing the 10th roots of one complex number takes 10 clock cycles, or 4.5 ns.

This embodiment adds a parallel result processing unit to prior work 2 and reduces computational complexity by exploiting the similarity among the solving processes of the complex N-th roots. Taking N = 10 as an example, once the pipeline is filled, computing the 10th roots of one complex number takes only 1 clock cycle, or 0.45 ns.

According to the synthesis results in Table 2, the circuit area of this embodiment is only 0.157% larger than that of prior work 2, while the computation speed is 10 times higher. Compared with prior work 1, the computation efficiency of this embodiment is improved by up to 379 times, and the hardware implementation precision is improved by one order of magnitude.
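The ratios quoted above follow directly from the figures in Table 2. The short check below is only an arithmetic illustration using the table values for N = 10; the variable names are chosen for this sketch.

```python
# Values copied from Table 2 (N = 10).
latency_ns = {"prior_work_1": 170.9, "prior_work_2": 4.50, "this_work": 0.451}
area_um2 = {"prior_work_2": 67964.17, "this_work": 68070.87}

# Relative area growth of this work over prior work 2, in percent.
area_increase_pct = (100.0 * (area_um2["this_work"] - area_um2["prior_work_2"])
                     / area_um2["prior_work_2"])

# Latency ratios against the two earlier designs.
speedup_vs_prior_2 = latency_ns["prior_work_2"] / latency_ns["this_work"]
speedup_vs_prior_1 = latency_ns["prior_work_1"] / latency_ns["this_work"]

print(f"{area_increase_pct:.3f} %  {speedup_vs_prior_2:.1f}x  {speedup_vs_prior_1:.0f}x")
# prints "0.157 %  10.0x  379x"
```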

In summary, the present invention provides a low-hardware-complexity architecture that dynamically supports 2nd to 10th root operations on complex numbers, using the circular, linear and hyperbolic CORDIC systems to construct a hardware-efficient algorithm. The convergence range is expanded: software simulation supports inputs from 10^-8 to 10^4 with a relative error as low as the order of 10^-6, giving a wide supported range and high precision. The pipeline architecture enables high-speed, fully pipelined computation, and the similarity among the solving processes of the N roots is exploited to reduce computational complexity. The design dynamically supports computation of the 2nd to 10th roots of a complex number with high efficiency, high precision and low hardware complexity.

The invention and its embodiments have been described above schematically and without limitation. Although the invention has been shown and described with reference to specific preferred embodiments, it is not limited to these embodiments. Various changes in form and detail may be made without departing from the spirit or essential characteristics of the invention, and all matter contained in the description and the accompanying claims is to be interpreted as illustrative and not in a limiting sense. Several of the elements described in this application may also be implemented by a single element in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
