Speculative computation in square root operations

Document No.: 1419476. Published: 2020-03-13.

Abstract: Speculative computation in square root operations, created by Javier Diaz Bruguera on 2019-08-29. The present disclosure relates to speculative computation in square root operations. A data processing apparatus is provided that includes an input circuit to receive a signal corresponding to a square root instruction that identifies an input value. A processing circuit performs an iterative square root operation on the input value and includes: a digit determination circuit to determine, for a current iteration, a next digit of an at least partial result of the square root operation; and a remainder determination circuit to determine, for the current iteration, an at least partial remainder of the square root operation. The next digit for the current iteration is determined based on the at least partial remainder of the square root operation from the previous iteration. The at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration.

1. A data processing apparatus comprising:

an input circuit to receive a signal corresponding to a square root instruction that identifies an input value;

a processing circuit to perform an iterative square root operation on the input value, the processing circuit comprising:

a digit determination circuit to determine, for a current iteration, a next digit of an at least partial result of the square root operation; and

a remainder determination circuit to determine, for the current iteration, an at least partial remainder of the square root operation,

wherein

the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration;

the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and

the processing circuit is adapted to speculatively generate a set of candidate at least partial remainders of the square root operation for the current iteration before the at least partial result of the square root operation for the current iteration is determined.

2. The data processing apparatus according to claim 1, wherein

each of the candidate at least partial remainders of the square root operation for the current iteration is in a redundant representation.

3. The data processing apparatus according to claim 1, wherein

the processing circuit is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined by the digit determination circuit.

4. The data processing apparatus according to claim 1, wherein

the set of candidate at least partial remainders includes one candidate value for each possible value of the most recent digit of the at least partial result.

5. The data processing apparatus according to claim 1, comprising:

a normalization circuit to normalize the input value.

6. The data processing apparatus according to claim 1, wherein

The processing circuit is adapted to perform two iterations in a single clock cycle.

7. The data processing apparatus according to claim 1, wherein

The iterative square root operation is radix 4.

8. The data processing apparatus according to claim 1, wherein

At least a partial remainder of the square root operation from the previous iteration comprises a predetermined number of most significant bits of at least a partial remainder of the square root operation from the previous iteration.

9. The data processing apparatus according to claim 1, wherein

At least one of at least a partial remainder of the square root operation from the previous iteration and at least a partial result of the square root operation from the previous iteration is provided in a redundant representation.

10. The data processing apparatus according to claim 1, wherein

the processing circuit is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration in both a redundant representation and a non-redundant representation; and

the candidate at least partial remainders of the square root operation for the current iteration in the non-redundant representation are based on most significant bits of the at least partial remainder of the square root operation from the previous iteration.

11. A method of data processing, comprising:

receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value;

performing an iterative square root operation on the input value by:

determining, for a current iteration, a next digit of an at least partial result of the square root operation;

determining, for the current iteration, an at least partial remainder of the square root operation, wherein

the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; and

speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.

12. A data processing apparatus comprising:

means for receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value;

means for performing an iterative square root operation on the input value, comprising:

means for determining, for a current iteration, a next digit of an at least partial result of the square root operation; and

means for determining, for the current iteration, an at least partial remainder of the square root operation, wherein

the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration;

the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and

means for speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.

Technical Field

The present disclosure relates to data processing. More specifically, it relates to the computation of square roots.

Background

A square root can be calculated using an iterative digit recurrence. At each iteration (other than the first), the result of the previous iteration is the input, and one or more digits of the result are the output. The circuitry that performs each iteration must be fast so that a greater number of digits can be output during a single cycle of the processor clock. The result is a circuit that calculates the square root faster.

Disclosure of Invention

Viewed from a first example configuration, there is provided a data processing apparatus comprising: an input circuit to receive a signal corresponding to a square root instruction, the square root instruction identifying an input value; and a processing circuit to perform an iterative square root operation on the input value, the processing circuit comprising: a digit determination circuit to determine, for a current iteration, a next digit of an at least partial result of the square root operation; and a remainder determination circuit to determine, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and the processing circuit is adapted to speculatively generate a set of candidate at least partial remainders of the square root operation for the current iteration before the at least partial result of the square root operation for the current iteration is determined.

Viewed from a second example configuration, there is provided a data processing method comprising: receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value; and performing an iterative square root operation on the input value by: determining, for a current iteration, a next digit of an at least partial result of the square root operation; and determining, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.

Viewed from a third example configuration, there is provided a data processing apparatus comprising: means for receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value; means for performing an iterative square root operation on the input value, comprising: means for determining, for a current iteration, a next digit of an at least partial result of the square root operation; and means for determining, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and means for speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.

Drawings

The invention will be further described, by way of example only, with reference to the embodiments of the invention illustrated in the accompanying drawings, in which:

FIG. 1 schematically illustrates a data processing apparatus according to some embodiments;

FIG. 2 illustrates an example of a processing circuit according to some embodiments;

FIGS. 3 and 4 illustrate relocation of circuitry in a processing circuit according to some embodiments;

FIG. 5 illustrates an example of a processing circuit according to some embodiments; and

FIG. 6 illustrates a flow diagram showing a data processing method according to some embodiments.

Detailed Description

Before discussing embodiments with reference to the figures, a description of the embodiments below is provided.

According to one example configuration, there is provided a data processing apparatus comprising: an input circuit to receive a signal corresponding to a square root instruction that identifies an input value; and a processing circuit to perform an iterative square root operation on the input value, the processing circuit comprising: a digit determination circuit to determine, for a current iteration, a next digit of an at least partial result of the square root operation; and a remainder determination circuit to determine, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and the processing circuit is adapted to speculatively generate a set of candidate at least partial remainders of the square root operation for the current iteration before the at least partial result of the square root operation for the current iteration is determined.

By speculatively generating a set of candidate at least partial remainders of the square root operation, generation of the at least partial remainder can be removed from the critical path of the operations performed by the data processing apparatus. This may be accomplished, for example, by generating the set of candidate at least partial remainders while other operations are being performed. Once the next digit has been determined, it can be used to select the candidate to output as the at least partial remainder of the square root operation for the current iteration. Thus, the time taken to perform one iteration is reduced. This may relax the timing constraints on the circuit and, in some cases, may allow more iterations to be performed within a clock cycle than in a system in which the at least partial remainder is not generated speculatively. It is noted that although the term "square root" is used herein, situations may arise in which, as is known in the art, the exact square root cannot be determined (e.g., because the number is irrational or cannot be represented exactly in binary). Thus, the term "square root" is intended to encompass the approximation of the square root that is produced in such cases.

In some embodiments, each of the candidate at least partial remainders of the square root operation for the current iteration is in a redundant representation. Redundant representation is a technique that enables efficient transfer of data values between certain circuits. In particular, a redundant representation represents a number as a pair of values rather than as a single value. The pair of values may be a sum value and a carry value, or a positive value and a negative value. Some circuits (e.g., addition circuits) can process numbers given in this format faster than numbers given in another format. By speculatively generating the candidate values for the at least partial remainder of the square root operation in a redundant representation, the time that would otherwise be spent converting the candidates to a non-redundant representation can be removed from the critical path.
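As an illustration of the sum-and-carry form of redundant representation, the following is a minimal Python sketch of a carry-save (3:2) addition. The function names `carry_save_add` and `resolve` and the 16-bit default width are assumptions made for this example only; they do not come from the disclosure.

```python
def carry_save_add(sum_word, carry_word, addend, width=16):
    """Add `addend` to a (sum, carry) pair without propagating carries.

    Each output bit depends only on the three input bits in the same
    column, which is why this step is fast in hardware.
    """
    mask = (1 << width) - 1
    new_sum = (sum_word ^ carry_word ^ addend) & mask
    # Majority function gives the carry out of each column, shifted left by one.
    majority = (sum_word & carry_word) | (sum_word & addend) | (carry_word & addend)
    new_carry = (majority << 1) & mask
    return new_sum, new_carry


def resolve(sum_word, carry_word, width=16):
    """Convert the pair to a single non-redundant value (the slow, carry-propagating step)."""
    return (sum_word + carry_word) & ((1 << width) - 1)


s, c = carry_save_add(5, 0, 9)
s, c = carry_save_add(s, c, 7)
print(resolve(s, c))  # 5 + 9 + 7
```

The pair `(s, c)` can be passed from one addition to the next, and `resolve` is only needed when a single conventional value is required.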

In some embodiments, the processing circuit is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined by the digit determination circuit. The at least partial result of the square root operation for the current iteration consists of the at least partial result of the square root operation for the previous iteration and the next digit calculated for the current iteration. In this way, each iteration adds a further digit to the at least partial result.

In some embodiments, the set of candidate at least partial remainders comprises one candidate value for each possible value of the most recent digit of the at least partial result. In each iteration, another digit of the at least partial result of the square root operation is output. However, there are only a limited number of possibilities for that digit. The number of possible digits depends on the radix (r) of the square root operation. In general, a digit may take one of the following values: {-a, -(a-1), ..., -1, 0, +1, ..., +a}, where a ≥ ceil(r/2) and a < r. For radix 4, if a is 2, this gives the set {-2, -1, 0, 1, 2}. This is called the minimally redundant digit set. If a is 3, there are seven values {-3, -2, -1, 0, 1, 2, 3}. This is called the maximally redundant digit set. If a is 4, there are nine values {-4, -3, -2, -1, 0, 1, 2, 3, 4}. This is called an over-redundant digit set because a > r-1. Thus, the data processing apparatus speculatively generates the set of candidate at least partial remainders by generating one candidate for each possible digit value. By using a minimally redundant digit set, the number of candidates is kept low, and thus the amount of circuitry required to consider each candidate is also kept low. In the example above, one candidate would correspond to the possible digit -2, another candidate would correspond to the possible digit -1, and so on. Partial remainders are calculated for all of these possible cases, and once the next digit is known, it can be determined which of the possible digits is correct, and the corresponding at least partial remainder can be selected from the candidates. This removes the need to compute the at least partial remainder only after the next digit is known, and so removes this computation from the critical path.
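The digit sets described above can be enumerated with a short Python sketch. The helper names `digit_set` and `classify` are hypothetical, and the classification simply restates the three cases given in the text (a = ceil(r/2), a = r-1, a > r-1).

```python
import math


def digit_set(radix, a):
    """Return the signed digit set {-a, ..., 0, ..., +a} for the given radix."""
    if a < math.ceil(radix / 2):
        raise ValueError("a must be at least ceil(radix / 2)")
    return list(range(-a, a + 1))


def classify(radix, a):
    """Name the digit set following the three cases described in the text."""
    if a < math.ceil(radix / 2):
        raise ValueError("a must be at least ceil(radix / 2)")
    if a > radix - 1:
        return "over-redundant"
    if a == math.ceil(radix / 2):
        return "minimally redundant"
    if a == radix - 1:
        return "maximally redundant"
    return "redundant"


for a in (2, 3, 4):
    print(a, classify(4, a), digit_set(4, a))
```

The length of the returned list is also the number of speculative candidates that would have to be generated per iteration, which is why the minimally redundant set keeps the circuit small.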

In some embodiments, the data processing apparatus includes a normalization circuit to normalize the input value. In normalized form, the mantissa is greater than or equal to 1 and less than 2; the integer portion of the mantissa is, however, not stored in a floating-point number. Further, in some embodiments, scaling is performed (e.g., by a scaling circuit). Normalization and scaling allow certain assumptions to be made about the input value as well as the output value. For example, scaling may involve altering the mantissa (e.g., dividing the mantissa by 2) and adjusting the exponent accordingly. If the exponent is thereby made even, the exponent of the square root can be determined by dividing the exponent of the input value by 2.
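A minimal Python sketch of the exponent handling described above follows. It assumes a value of the form mantissa × 2^exponent with the mantissa in [1, 2), and halves the mantissa only when the exponent is odd; the function names are hypothetical, and `math.sqrt` stands in for the iterative operation on the scaled mantissa.

```python
import math


def scale_for_sqrt(mantissa, exponent):
    """Make the exponent even by halving the mantissa when needed.

    Assumes 1 <= mantissa < 2, so the scaled mantissa lies in [0.5, 2).
    """
    if exponent % 2 != 0:
        mantissa /= 2.0
        exponent += 1
    return mantissa, exponent


def sqrt_via_scaling(mantissa, exponent):
    """Square root of mantissa * 2**exponent using the even-exponent trick."""
    m, e = scale_for_sqrt(mantissa, exponent)
    # The root's exponent is exactly half of the (now even) input exponent.
    return math.sqrt(m) * 2.0 ** (e // 2)


print(scale_for_sqrt(1.5, 3))    # odd exponent: mantissa halved, exponent incremented
print(sqrt_via_scaling(1.5, 3))  # square root of 1.5 * 2**3 = 12
```

Because the value mantissa × 2^exponent is unchanged by the scaling step, only the mantissa has to go through the iterative square root.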

In some embodiments, the processing circuit is adapted to perform two iterations in a single clock cycle. By reducing the time required to perform each iteration, the number of iterations of the square root operation that can be performed per clock cycle may be increased. In this way, the present techniques may be applied to circuits that implement at least two iterations per clock cycle. Thus, the entire square root operation may be performed faster (e.g., using fewer clock cycles) than with other proposed techniques.
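The effect of the number of iterations per cycle on the total cycle count can be estimated with a small Python sketch. It relies on the general fact that a radix-r iteration (r a power of two) produces log2(r) result bits; the 54-bit result width used in the example is a hypothetical figure chosen for illustration, and initial-iteration, normalization and rounding cycles are ignored.

```python
def bits_per_iteration(radix):
    """Number of result bits produced by one iteration at a power-of-two radix."""
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two")
    return radix.bit_length() - 1


def cycles_needed(result_bits, radix=4, iterations_per_cycle=2):
    """Clock cycles spent in the iterative part, rounded up."""
    per_cycle = bits_per_iteration(radix) * iterations_per_cycle
    return -(-result_bits // per_cycle)  # ceiling division


print(cycles_needed(54, radix=4, iterations_per_cycle=1))
print(cycles_needed(54, radix=4, iterations_per_cycle=2))
```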

In some embodiments, the iterative square root operation is radix 4. A radix-4 implementation makes it possible to output each digit as two binary bits at each iteration. For example, if the digit set is {-2, -1, 0, +1, +2}, the digits may be binary coded as part of the redundant representation: -2 = 10, -1 = 01, 0 = 00, +1 = 01, +2 = 10. Note that positive and negative values of the same magnitude are coded identically (e.g., -2 and +2 are both coded as binary 10). This is because, in the redundant representation, negative and positive values are provided as separate words. For example, the sequence 1, -2, 0 would be encoded as a positive word of 01, 00, 00 and a negative word of 00, 10, 00. In other embodiments, other radices are possible, such as 2 or 8. However, because a higher radix has more possible output digits, the number of candidate at least partial remainders also increases. Thus, increasing the radix (and thereby reducing the number of iterations required to determine the square root) results in a larger circuit.
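The positive-word and negative-word coding of radix-4 digits can be sketched in Python as follows. The function names are hypothetical; the example reproduces the sequence 1, -2, 0 used in the text and shows that subtracting the negative word from the positive word recovers the conventional value.

```python
def encode_signed_digits(digits):
    """Split radix-4 signed digits into positive and negative 2-bit fields."""
    positive, negative = [], []
    for d in digits:
        if not -2 <= d <= 2:
            raise ValueError("digit outside {-2, ..., 2}")
        positive.append(format(d if d > 0 else 0, "02b"))
        negative.append(format(-d if d < 0 else 0, "02b"))
    return positive, negative


def decode_words(positive, negative):
    """Recover the integer value: positive word minus negative word."""
    return int("".join(positive), 2) - int("".join(negative), 2)


pos, neg = encode_signed_digits([1, -2, 0])
print(pos, neg)
print(decode_words(pos, neg))  # 1*16 + (-2)*4 + 0*1
```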

In some embodiments, the at least partial remainder of the square root operation from the previous iteration comprises a predetermined number of most significant bits of the at least partial remainder of the square root operation from the previous iteration. Rather than considering all bits of the at least partial remainder from the previous iteration, only a predetermined number of its most significant bits are considered. This predetermined number depends on the radix of the square root operation and is large enough to include the bits necessary for the rounding operation. In this way, insignificant bits that have no effect on the overall result can be discarded, resulting in a less complex circuit.

In some embodiments, at least one of at least a partial remainder of the square root operation from a previous iteration and at least a partial result of the square root operation from the previous iteration is provided in a redundant representation.

In some embodiments, the processing circuit is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration in both a redundant representation and a non-redundant representation, and the candidate at least partial remainders in the non-redundant representation are based on the most significant bits of the at least partial remainder of the square root operation from the previous iteration. This may be accomplished, for example, by generating the set of candidate at least partial remainders for the current iteration in a redundant representation, and then taking the most significant bits of these candidates and converting them to a non-redundant representation. By providing the set of candidate at least partial remainders generated in the current iteration in redundant form, the selected candidate can be provided as input to the next iteration. By providing a set of candidate approximate remainders in a non-redundant representation, the selected candidate can be used as part of a comparison function to select the next digit in the next iteration. In this way, two elements of the square root operation can be removed from the critical path, thereby relaxing the timing constraints of the circuit.

Specific embodiments will now be described with reference to the accompanying drawings.

Fig. 1 illustrates an apparatus 100 according to some embodiments. The apparatus 100 includes a receiving circuit 110 responsible for receiving input values. The input value may be received as part of a signal corresponding to a square root instruction. The effect of the signal is to cause the apparatus 100 to perform a square root operation on the input value so as to output a value corresponding to the square root of the input value. It is noted that the actual square root of the input value may be, for example, an irrational number or another type of number that cannot be accurately represented in binary representation. Thus, the output may actually be an approximation of the square root of the input value.

After the receiving circuit 110 receives the input value, the input value is provided to the normalization and scaling circuit 120. In some embodiments, the normalization and scaling circuit performs initial normalization and scaling operations on the input value. These operations may be performed in order to improve how efficiently the floating-point operation can be carried out. For example, by performing particular operations so that the input value is received in a particular format, assumptions can be made about the manner in which the square root operation is performed. Thus, the subsequent iterations of the square root operation may be simplified, so that less circuitry is used or the square root operation is performed in fewer processing cycles than would otherwise be required. In some embodiments, the normalization process allows the initial iteration of the square root operation to proceed more quickly.

After any necessary normalization and scaling has been performed, the input value is passed to the processing circuit 130. The processing circuit 130 includes an initial iteration circuit 140, as well as a digit determination circuit 150 and a remainder determination circuit 160. In this embodiment, the initial iteration circuit 140 performs the initial iteration of the iterative square root operation. In some embodiments, the initial iteration circuit 140 is not an explicit, separate element; instead, its role is performed by the digit determination circuit 150 and the remainder determination circuit 160. In other embodiments, the initial iteration circuit is a simplified version of the digit determination circuit 150 and the remainder determination circuit 160. In particular, the initial iteration of the square root operation may be simplified if the input to the initial iteration circuit 140 is known to have a particular format. For example, if the input value is known to lie between two bounds (e.g., as a result of the normalization and scaling circuit), then the output value may also be known to lie between two bounds, thereby limiting the possible digits output in the first iteration. Such techniques are beyond the scope of this disclosure, although the disclosure does not preclude their use.

After the initial iteration has been performed, the digit determination circuit 150 and the remainder determination circuit 160 perform further iterations of the square root operation. In each iteration, one or more further digits of the final result are produced by the digit determination circuit 150. When a desired level of precision is reached (e.g., when a desired number of digits have been output by the digit determination circuit 150), the iterative operation performed by the processing circuit 130 ends, and the digits output by the digit determination circuit 150 are concatenated. The concatenated result is then passed to the scaling and rounding circuit 170, where the scaling operation performed by the scaling and rounding circuit 170 may correspond to the inverse of any scaling operation performed by the normalization and scaling circuit 120. Furthermore, a number of different rounding operations may be applied depending on the needs of the user performing the square root operation. After any scaling and rounding operations have been performed, the final result is output by the scaling and rounding circuit 170.

Fig. 2 illustrates an example of a processing circuit 130 according to some embodiments. In these embodiments, the digit determination circuit 150 is composed of a first digit determination circuit component 150a and a second digit determination circuit component 150b. The first digit determination circuit component 150a is responsible for generating a first digit from one iteration of the square root operation, and the second digit determination circuit component 150b is responsible for generating a second digit of the result from a further iteration of the square root operation. Similarly, the remainder determination circuit 160 is composed of a first remainder determination circuit component 160a and a second remainder determination circuit component 160b. As with the digit determination circuit, the first remainder determination circuit component 160a is responsible for generating the remainder value for one iteration of the square root operation, and the second remainder determination circuit component 160b is responsible for generating a further remainder value from the further iteration of the square root operation.

The first remainder determination circuit component 160a receives the at least partial root and the remainder value from the previous iteration in a redundant representation. The at least partial root consists of all of the digits output by the digit determination circuit 150 so far. It is "at least partial" because the concatenation of all of the digits may or may not be an exact square root. The at least partial root and the remainder value are received by the next remainder circuit 200a, which calculates the remainder for the current iteration according to the following equation:

rem[i+1] = r × rem[i] - s_{i+1} × (2 × S[i] + s_{i+1} × r^{-(i+1)})

where 'i' is the iteration number, 'r' is the radix (e.g., 4), 's_i' is the digit output in iteration i, 'a' is the maximum output digit in radix r, and 'S[i]' is the partial root before iteration i.
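The remainder recurrence can be checked numerically with a short Python sketch using exact fractions. The sketch assumes the residual definition rem[i] = r^i × (x - S[i]^2), which is the usual digit-recurrence definition but is not stated in the text, and it assumes an initial partial root S[0] = 1; the function names and the example digit sequence are hypothetical.

```python
from fractions import Fraction


def next_remainder(rem, root, digit, i, radix=4):
    """rem[i+1] = r*rem[i] - s*(2*S[i] + s*r^-(i+1)), with s the new digit."""
    weight = Fraction(1, radix ** (i + 1))
    return radix * rem - digit * (2 * root + digit * weight)


def run_recurrence(x, digits, root0=Fraction(1), radix=4):
    """Apply the recurrence for a given digit sequence; return (root, remainder)."""
    root = root0
    rem = x - root0 * root0  # assumed definition: rem[0] = x - S[0]^2
    for i, digit in enumerate(digits):
        rem = next_remainder(rem, root, digit, i, radix)
        root += digit * Fraction(1, radix ** (i + 1))
    return root, rem


x = Fraction(1, 2)
root, rem = run_recurrence(x, [-1, -1, 1])
# Under the assumed definition, the remainder tracks the scaled residual exactly.
print(root, rem, rem == 4 ** 3 * (x - root * root))
```

The equality holds for any digit sequence, because the recurrence is an algebraic rearrangement of the residual; choosing good digits is what keeps the remainder bounded so that the partial root converges.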

Note that there are five next remainder circuits 200a, 200b, 200c, 200d, 200e. Each of them speculatively determines a next remainder value, and each assumes a different value for the digit that is about to be output by the first digit determination circuit component 150a. Thus, these circuits produce a set of candidate remainder values, each of which is input into the 5:1 multiplexer 210. In this way, a candidate for the next remainder value rem[i+1] is generated by each of the next remainder circuits 200. Once the digit s_{i+1} of the current iteration has been determined, it is used as the select signal for the 5:1 multiplexer 210 to select the correct one of the five candidates, which is output as the remainder value rem[i+1] of the current iteration. Thus, there is no need to wait for the digit of the current iteration to be known before computing the remainder value for the current iteration. This can significantly reduce the length of the "critical path".
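The arrangement of five speculative next-remainder computations followed by a 5:1 selection can be modelled in software with the Python sketch below. Several points are assumptions made only so that the sketch is self-contained: exact fractions are used instead of a redundant representation, the input is taken to be already scaled into [1/4, 1), the initial partial root is 1, and the digit is chosen by a stand-in rule (smallest candidate magnitude) rather than by comparing the most significant bits of the previous remainder with selection constants as the circuit does. All function names are hypothetical.

```python
from fractions import Fraction

DIGITS = (-2, -1, 0, 1, 2)  # minimally redundant radix-4 digit set


def candidate_remainders(rem, root, i, radix=4):
    """Model of the five next-remainder circuits: one candidate per possible digit."""
    weight = Fraction(1, radix ** (i + 1))
    return {d: radix * rem - d * (2 * root + d * weight) for d in DIGITS}


def select_digit(candidates):
    """Stand-in for digit selection (an assumption, not the circuit's method)."""
    return min(DIGITS, key=lambda d: abs(candidates[d]))


def sqrt_digit_recurrence(x, iterations):
    """Radix-4 square root of x in [1/4, 1); returns (root, remainder, digits)."""
    if not Fraction(1, 4) <= x < 1:
        raise ValueError("x must be scaled into [1/4, 1)")
    root, rem, digits = Fraction(1), x - 1, []
    for i in range(iterations):
        candidates = candidate_remainders(rem, root, i)  # speculative work
        digit = select_digit(candidates)                 # digit becomes known
        rem = candidates[digit]                          # the 5:1 multiplexer
        root += digit * Fraction(1, 4 ** (i + 1))
        digits.append(digit)
    return root, rem, digits


root, rem, digits = sqrt_digit_recurrence(Fraction(1, 2), 12)
print(digits)
print(float(root))
```

In the circuit the candidate generation and the digit selection proceed in parallel; in this sequential model the point to note is that once the digit is available, updating the remainder is only a dictionary lookup and no further arithmetic.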

The first digit determination circuit component 150a operates substantially simultaneously with the first remainder determination circuit component 160a. The first digit determination circuit component 150a takes as input the most significant bits of the remainder value rem[i] of the previous iteration (in redundant representation). In the present embodiment, it is assumed that the square root operation is performed in radix 4, and the appropriate level of precision is therefore 9 bits. Thus, a 9-bit adder 220 is provided to convert the most significant bits of the remainder of the previous iteration from the redundant representation to a non-redundant representation. The output of the adder 220 is then provided to the comparison circuit 230. Here, the most significant bits of the remainder value are compared with different selection constants to determine the digit s_{i+1} that is output for the current iteration. These selection constants are provided in the literature, for example in section 8.2.1 of Miloš D. Ercegovac and Tomás Lang, "Division and Square Root: Digit-Recurrence Algorithms and Implementations", Kluwer Academic Publishers, 1994.
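The conversion of only the most significant bits of a redundant remainder to a non-redundant value can be illustrated with the Python sketch below. It assumes a sum-and-carry pair held in 16-bit words (a hypothetical width, with both inputs below 2^16) of which the top 9 bits are added; the function names are not from the disclosure. Because the discarded low bits can contribute at most one carry into the kept bits, the short addition differs from the exact leading bits by 0 or 1 (modulo 2^9).

```python
def msb_estimate(sum_word, carry_word, width=16, kept=9):
    """Add only the top `kept` bits of each word, as a short adder would."""
    shift = width - kept
    mask = (1 << kept) - 1
    return ((sum_word >> shift) + (carry_word >> shift)) & mask


def msb_exact(sum_word, carry_word, width=16, kept=9):
    """Top `kept` bits of the full-width sum, for comparison."""
    shift = width - kept
    return ((sum_word + carry_word) & ((1 << width) - 1)) >> shift


s, c = 0x1234, 0x0FFF
print(msb_estimate(s, c), msb_exact(s, c))
```

A short adder of this kind is much faster than resolving the full-width remainder, and the bounded error is the kind of inaccuracy that digit selection with a redundant digit set is designed to tolerate.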

The digit is then provided to the root update circuit 230, which updates the at least partial root, and, as the select signal, to the 5:1 multiplexer 210 in the first remainder determination circuit component 160a as described above.

In the present embodiment, the second digit determination circuit component 150b operates in a manner similar to that of the first digit determination circuit component 150a. The significant difference is that the outputs of the second digit determination circuit component 150b and the second remainder determination circuit component 160b may be provided back as inputs to the first digit determination circuit component 150a and the first remainder determination circuit component 160a if further iterations are to be performed, or may be output as the final result if no further iterations are to be performed.

Thus, it can be seen that by providing the next remainder circuits 200, the next remainder value for each iteration can be speculatively determined before at least one input required for the calculation is known. Specifically, the next remainder circuits 200 are provided such that calculations of the next remainder can be performed in parallel, each circuit 200a, 200b, 200c, 200d, 200e assuming a different one of the possible values of the current digit s_{i+1}. Once the current digit s_{i+1} is known, it can be used as the select signal for the 5:1 multiplexer 210 to select the corresponding remainder value. Therefore, the calculation need not be performed after the current digit s_{i+1} is known, but can be performed in advance (e.g., in parallel with the digit determination circuit 150), so that once the next digit s_{i+1} has been determined, only a selection operation is needed to output the corresponding remainder value. Thus, once the next digit is known, the remainder value can be output quickly, and the operation completes quickly. The timing constraint on each iteration of the square root operation is therefore relaxed. In some cases, this may allow a larger number of iterations of the square root operation to be performed in a single cycle of the processor, which in turn enables the square root operation to be performed faster. The skilled person will thus appreciate that, depending on how particular components are placed within the processing circuit 130, particular operations can be performed in parallel and/or speculatively to make the circuit work faster.

Fig. 3 illustrates a manner in which similar techniques may be used instead of, or in addition to, those described with respect to Fig. 2. Specifically, the 9-bit adder 300 of the second digit determination circuit 150b need not be implemented as part of the second digit determination circuit 150b. In particular, it will be appreciated that the possible inputs to the 9-bit adder 300 are also limited, and thus the output of the 9-bit adder 300 may be generated speculatively. This operation may be performed in the first digit determination circuit 150a at a time similar to the operation of the 9-bit adder 220 and the comparison circuit 230 of the first digit determination circuit 150a. In a similar manner to that described with respect to Fig. 2, by speculatively performing this calculation and then selecting the correct output once the next digit s_{i+1} is known, the critical path can be reduced, since only a selection operation, rather than an addition operation, has to be performed once the next digit s_{i+1} is known. An example of how such a circuit may be configured is shown with respect to Fig. 4.

Fig. 4 shows an example of a processing circuit 130' in which the 9-bit adder circuit 300 has been moved to the first digit determination circuit 150a'. As with the previously described technique, the 9-bit adder circuit is provided five times (300a, 300b, 300c, 300d, 300e) in order to speculatively generate the remainder value rem[i+1] for the current iteration in a non-redundant representation. In the present example, each of the five 9-bit adders 300a-300e is responsible for generating one candidate value of the remainder value rem[i+1] for the current iteration. Each of the adders 300a-300e behaves as if the next digit output by the comparison circuit 230 were a different one of the possible values -2, -1, 0, 1, 2 (for radix 4). Because there are five possible values, five 9-bit adders are required. The candidates are provided to a 5:1 multiplexer 400, whose select signal is the digit s_{i+1} output for the current iteration. Thus, when the next digit is known, it can be provided to the multiplexer to provide the appropriate remainder value for the previous iteration in a non-redundant representation. The 9-bit adder circuit 300 therefore does not need to operate after the digit has been determined. In other words, this circuit has been removed from the critical path and can operate substantially in parallel with the 9-bit adder 220 already located in the first digit determination circuit 150a'. The amount of processing on the critical path of the digit determination circuit 150 may therefore be reduced. Again, this results in the square root operation being performed faster and makes the timing constraints easier to satisfy.
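One way to picture the role of the five short adders is sketched below. This is a sketch under stated assumptions rather than the circuit of Fig. 4: it assumes that each candidate remainder is held as an unsigned carry-save pair (sum word, carry word) and that a short adder sums only the leading bits of that pair to obtain a non-redundant estimate. The widths 59 and 9 appear in the text, but pairing them in this way, and all function names, are illustrative assumptions.

```python
WIDTH, KEEP = 59, 9  # widths taken from the text; combining them like this is an assumption


def estimate_top_bits(sum_word, carry_word):
    """Short add of the KEEP leading bits of a carry-save pair (one 'adder')."""
    drop = WIDTH - KEEP
    return (sum_word >> drop) + (carry_word >> drop)


def exact_top_bits(sum_word, carry_word):
    """Reference value: full-width add first, then truncate."""
    return (sum_word + carry_word) >> (WIDTH - KEEP)


def speculative_estimates(candidate_pairs):
    """One short add per possible digit, all performed before the digit is known."""
    return {s: estimate_top_bits(sw, cw) for s, (sw, cw) in candidate_pairs.items()}


def select_estimate(estimates, digit):
    """Models the 5:1 multiplexer: the digit only selects a precomputed value."""
    return estimates[digit]


# Hypothetical candidate remainders in carry-save form, one per digit value.
pairs = {
    s: (((s + 3) * 0x123456789ABCDE) % 2 ** 58, ((s + 5) * 0x0FEDCBA987654) % 2 ** 58)
    for s in (-2, -1, 0, 1, 2)
}
estimates = speculative_estimates(pairs)
```

Because the discarded low-order bits of the two words can generate at most one carry into the kept bits, the short add differs from the exact leading bits by 0 or 1. A bounded error of this kind is what makes a truncated estimate usable for digit selection in a redundant digit set.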

Fig. 5 shows an embodiment of the processing circuit 130" in more detail. Of particular note, the 9-bit adder circuit 220 has been replaced with an 8-bit adder 520. By additionally using the carry-in signal cin, this adder can effectively perform a 9-bit addition. Also in the present embodiment, the selection constants supplied to the comparison circuit 230 are generated according to interval[i]. The interval varies over the first two iterations (because the first two digits of the partial root determine the interval), and the possible range of the partial root at iteration i is divided into several equal-sized intervals.

The interval value is updated in the first digit determination circuit part 150a" by the interval update circuit 530. For the next remainder circuit 200, the most recently determined digit (consisting of two bits, since this embodiment is radix 4) is concatenated, using a mask, at the correct position of the shifted partial root, and this input is provided to the 4:2 carry-save adder together with the remainder value rem[i] of the previous iteration. Note that the mask value is shifted to the right by two bits between the first remainder determination circuit part 160a" and the second remainder determination circuit part 160b". This is because another digit (comprising two bits) is concatenated between the two parts 160a" and 160b". Note that many of the inputs for the partial root are 56 bits wide. This is because a double-precision floating-point number has 53 fractional bits. A guard bit is required for the rounding operation. Adding one bit for the integer component and one padding bit brings the total to 56 bits. 59 bits are provided for the remainder inputs. These comprise 55 fractional bits and 4 integer bits (due to the maximum value of the remainder).

Fig. 6 shows a flow diagram 600 illustrating a data processing method according to one embodiment. At step 610, an input is received that indicates the value on which the square root operation is to be performed. At step 620, the input value is scaled and converted to a normalized format as appropriate. At step 630, a first iteration of the square root operation is performed. The first iteration may be considered separately from the other iterations, since it can be performed faster than the others owing to the limited number of possible input values. In any case, at step 640, the next digit and the at least partial remainder value are determined for the next two iterations in a single clock cycle. This process involves speculatively generating, for at least one of the iterations, a set of candidate at least partial remainders of the square root operation before the at least partial result of the square root operation for the current iteration has been determined. At step 650, it is determined whether the operation is complete. This may be determined based on the number of iterations that have been performed and on whether the iterations have reached a desired level of accuracy. If further iterations are to be performed (i.e., the process is not complete), the process returns to step 640, where two further iterations are performed. Otherwise, at step 660, the digits output by step 630 and step 640 are concatenated to produce a result. The result is then scaled and rounded as appropriate, and the final value is output at step 670. Note that in some embodiments only one further iteration, rather than two, may be performed at step 640, for example if doing so brings the accuracy to the desired level.
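The control flow of steps 630 to 660 can be sketched as follows. This is a minimal sketch with several assumptions: the input is taken to be already scaled into [1/4, 1) (step 620 is not modelled), the final scaling and rounding of step 670 are omitted, the initial partial root is taken to be 1, and the digit choice inside iterate is an exact-arithmetic stand-in rather than a comparison with selection constants. The names iterate and square_root_flow are hypothetical.

```python
from fractions import Fraction

DIGITS = (-2, -1, 0, 1, 2)  # radix-4 signed-digit set


def iterate(x, S, i):
    """One radix-4 iteration; the digit choice is a software stand-in."""
    u = Fraction(1, 4 ** (i + 1))  # weight of digit i+1
    s = min(DIGITS, key=lambda d: abs(x - (S + d * u) ** 2))
    return s, S + s * u


def square_root_flow(x, total_iterations):
    x = Fraction(x)                 # steps 610/620: assumed already in [1/4, 1)
    digits, S = [], Fraction(1)
    s, S = iterate(x, S, 0)         # step 630: first iteration handled on its own
    digits.append(s)
    done, cycles = 1, 0
    while done < total_iterations:  # step 650: stop after a fixed iteration count
        # Step 640: up to two iterations per modelled clock cycle.
        for _ in range(min(2, total_iterations - done)):
            s, S = iterate(x, S, done)
            digits.append(s)
            done += 1
        cycles += 1
    return digits, S, cycles        # step 660: S is the digits combined by weight
```

With total_iterations = 7, the loop runs for three modelled cycles of two iterations each after the first iteration. With an even total, the last cycle performs a single iteration, which corresponds to the remark that step 640 may perform only one further iteration.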

Thus, it has been demonstrated that by speculatively generating a set of candidate values before the at least partial result of the current iteration has been determined, the number of operations on the critical path can be reduced. Some of the processing can therefore be performed in parallel, and the time required for each iteration to complete may decrease. This in turn allows the timing constraints to be relaxed, or more iterations to be performed in a single clock cycle than would otherwise be possible. Therefore, the square root operation can be performed faster.

In this application, the words "configured to..." are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a "configuration" means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware that provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.

Although illustrative embodiments of the present invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes, additions and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims may be made with the features of the independent claims without departing from the scope of the invention.
