Speculative computation in square root operations
Reading note: This technology, Speculative computation in square root operations, was designed by Javier Diaz Bruguera on 2019-08-29. Its main content is as follows. The present disclosure relates to speculative computation in square root operations. A data processing apparatus is provided that includes an input circuit for receiving a signal corresponding to a square root instruction that identifies an input value. A processing circuit performs an iterative square root operation on the input value and includes: a digit determination circuit for determining, for a current iteration, a next digit of an at least partial result of the square root operation; and a remainder determination circuit for determining, for the current iteration, an at least partial remainder of the square root operation. The next digit for the current iteration is determined based on the at least partial remainder of the square root operation from the previous iteration. The at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration.
1. A data processing apparatus comprising:
an input circuit to receive a signal corresponding to a square root instruction that identifies an input value;
a processing circuit to perform an iterative square root operation on the input value, the processing circuit comprising:
a digit determination circuit to determine, for a current iteration, a next digit of an at least partial result of the square root operation; and
a remainder determination circuit to determine an at least partial remainder of the square root operation for the current iteration,
wherein
the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration;
the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and
the processing circuit is adapted to speculatively generate a set of candidate at least partial remainders of the square root operation for the current iteration before the at least partial result of the square root operation for the current iteration is determined.
2. The data processing apparatus according to claim 1, wherein
each of the candidate at least partial remainders of the square root operation for the current iteration is in a redundant representation.
3. The data processing apparatus according to claim 1, wherein
the processing circuit is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined by the digit determination circuit.
4. The data processing apparatus according to claim 1, wherein
the set of candidate at least partial remainders includes one candidate value for each possible value of the most recent digit of the at least partial remainder.
5. The data processing apparatus according to claim 1, comprising:
a normalization circuit to normalize the input value.
6. The data processing apparatus according to claim 1, wherein
the processing circuit is adapted to perform two iterations in a single clock cycle.
7. The data processing apparatus according to claim 1, wherein
the iterative square root operation is radix 4.
8. The data processing apparatus according to claim 1, wherein
the at least partial remainder of the square root operation from the previous iteration comprises a predetermined number of most significant bits of the at least partial remainder of the square root operation from the previous iteration.
9. The data processing apparatus according to claim 1, wherein
at least one of the at least partial remainder of the square root operation from the previous iteration and the at least partial result of the square root operation from the previous iteration is provided in a redundant representation.
10. The data processing apparatus according to claim 1, wherein
the processing circuit is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration in both a redundant representation and a non-redundant representation; and
the candidate at least partial remainders of the square root operation for the current iteration in the non-redundant representation are based on most significant bits of the at least partial remainder of the square root operation from the previous iteration.
11. A method of data processing, comprising:
receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value;
performing an iterative square root operation on the input value by:
determining, for a current iteration, a next digit of an at least partial result of the square root operation;
determining an at least partial remainder of the square root operation for the current iteration, wherein
the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration;
the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and
speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.
12. A data processing apparatus comprising:
means for receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value;
means for performing an iterative square root operation on the input value, comprising:
means for determining a next digit of an at least partial result of the square root operation for a current iteration; and
means for determining an at least partial remainder of the square root operation for the current iteration, wherein
the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration;
the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and
means for speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.
Technical Field
The present disclosure relates to data processing. More specifically, it relates to the computation of square roots.
Background
The square root can be calculated using an iterative digit recurrence. At each iteration (other than the first), the result of the previous iteration is the input, and one or more digits of the result are the output. The circuitry that performs each iteration needs to operate quickly so that a greater number of digits can be output during a single cycle of the processor clock. A faster iteration results in a circuit that calculates the square root faster.
Disclosure of Invention
Viewed from a first example configuration, there is provided a data processing apparatus comprising: an input circuit for receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value; and processing circuitry for performing an iterative square root operation on the input value, the processing circuitry comprising: a digit determination circuit for determining, for a current iteration, a next digit of an at least partial result of the square root operation; and a remainder determination circuit for determining, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and the processing circuitry is adapted to speculatively generate a set of candidate at least partial remainders of the square root operation for the current iteration before the at least partial result of the square root operation for the current iteration is determined.
Viewed from a second example configuration, there is provided a data processing method comprising: receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value; and performing an iterative square root operation on the input value by: determining, for a current iteration, a next digit of an at least partial result of the square root operation; and determining, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.
Viewed from a third example configuration, there is provided a data processing apparatus comprising: means for receiving a signal corresponding to a square root instruction, the square root instruction identifying an input value; and means for performing an iterative square root operation on the input value, comprising: means for determining, for a current iteration, a next digit of an at least partial result of the square root operation; and means for determining, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and means for speculatively generating a set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined.
Drawings
The invention will be further described, by way of example only, with reference to the embodiments of the invention illustrated in the accompanying drawings, in which:
FIG. 1 schematically illustrates a data processing apparatus according to some embodiments;
FIG. 2 illustrates an example of a processing circuit according to some embodiments;
FIGS. 3 and 4 illustrate relocation of circuitry in a processing circuit according to some embodiments;
FIG. 5 illustrates an example of a processing circuit according to some embodiments; and
FIG. 6 illustrates a flow diagram showing a data processing method according to some embodiments.
Detailed Description
Before discussing embodiments with reference to the figures, a description of the embodiments below is provided.
According to one example configuration, there is provided a data processing apparatus comprising: an input circuit to receive a signal corresponding to a square root instruction that identifies an input value; and processing circuitry for performing an iterative square root operation on the input value, the processing circuitry comprising: a digit determination circuit for determining, for a current iteration, a next digit of an at least partial result of the square root operation; and a remainder determination circuit for determining, for the current iteration, an at least partial remainder of the square root operation, wherein the next digit for the current iteration is determined based on the at least partial remainder of the square root operation from a previous iteration; the at least partial remainder for the current iteration is determined based on the at least partial remainder and the at least partial result of the square root operation from the previous iteration; and the processing circuitry is adapted to speculatively generate a set of candidate at least partial remainders of the square root operation for the current iteration before the at least partial result of the square root operation for the current iteration is determined.
By speculatively generating a set of candidate at least partial remainders for the square root operation, generation of the at least partial remainder can be removed from the critical path of the operations performed by the data processing apparatus. This may be accomplished, for example, by generating the set of candidate at least partial remainders while other operations are being performed. Once the next digit of the operation has been determined, it can be used to select the candidate that is output as the at least partial remainder of the square root operation for the current iteration. Thus, the time taken to perform one iteration is reduced. This may relax the timing constraints on the circuit and, in some cases, may allow more iterations to be performed within a clock cycle than in a system in which the at least partial remainder is not generated speculatively. It is noted that although the term "square root" is used herein, situations may arise in which, as is known in the art, the exact square root cannot be determined (e.g., because the number is irrational, or cannot be represented exactly in binary). Thus, the term "square root" is intended to encompass the approximation of the square root that is produced in such cases.
In some embodiments, each of the candidate at least partial remainders of the square root operation for the current iteration is in a redundant representation. Redundant representation is a technique that enables efficient transfer of data values between certain circuits. In particular, a redundant representation represents a number as a pair of values, rather than as a single value. The pair of values may be a sum value and a carry value, or a positive value and a negative value. Some circuits (e.g., addition circuits) can process numbers given in this format faster than numbers given in another format. By speculatively generating the candidate values for the at least partial remainder of the square root operation in a redundant representation, the time that would otherwise be spent converting the candidates to a non-redundant representation can be removed from the critical path.
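The sum-and-carry form described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `carry_save_add` and `to_non_redundant` and the 16-bit word width are hypothetical and do not come from the disclosure. The sketch shows why an adder that works on the redundant form is fast: each output bit depends only on the three input bits in the same position, so no carry ripples across the word.

```python
def carry_save_add(sum_word, carry_word, addend, width=16):
    """Add `addend` to a number held as a (sum, carry) pair.

    This is a 3:2 compressor: every bit position is handled independently,
    so there is no carry propagation along the word.
    """
    mask = (1 << width) - 1
    new_sum = (sum_word ^ carry_word ^ addend) & mask
    majority = (sum_word & carry_word) | (sum_word & addend) | (carry_word & addend)
    new_carry = (majority << 1) & mask
    return new_sum, new_carry


def to_non_redundant(sum_word, carry_word, width=16):
    """Collapse the pair into one value; this step needs a full carry-propagate add."""
    return (sum_word + carry_word) & ((1 << width) - 1)


# Accumulate two addends while staying in redundant form, then convert once.
s, c = 0x1234, 0
s, c = carry_save_add(s, c, 0x0FFF)
s, c = carry_save_add(s, c, 0x0001)
total = to_non_redundant(s, c)
```

Only the final conversion involves a carry-propagate addition, which is the cost the passage above describes keeping off the critical path.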
In some embodiments, the processing circuitry is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration before the next digit is determined by the digit determination circuitry. The at least partial result of the square root operation for the current iteration comprises the at least partial result of the square root operation for the previous iteration together with the next digit calculated for the current iteration. In this way, each iteration adds a further digit to the at least partial result.
In some embodiments, the set of candidate at least partial remainders comprises one candidate value for each possible value of the most recent digit of the at least partial remainder in the set of at least partial remainders. In each iteration, a further digit of the at least partial result of the square root operation is output. However, there are only a limited number of possibilities for that digit. The number of possible digits depends on the radix (r) of the square root operation. In general, a digit may take the following values: {-a, -(a-1), …, -1, 0, +1, …, +a}, where a ≥ ceil(r/2) and a < r. For
In some embodiments, the data processing apparatus includes a normalization circuit that normalizes the input value. In normalized form, the mantissa is greater than or equal to 1 and less than 2. However, the integer portion of the mantissa is not stored in a floating point number. Further, in some embodiments, scaling is performed (e.g., by scaling circuitry). This allows certain assumptions to be made about the input value and the output value. For example, scaling may involve altering the mantissa (e.g., dividing the mantissa by 2) and adjusting the exponent accordingly. If the exponent is thereby made even, the exponent of the square root can be determined by dividing the exponent of the input value by 2.
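The exponent handling described above can be sketched in Python as follows. The helper names `scale_for_sqrt` and `sqrt_via_scaling` are hypothetical, and `math.sqrt` merely stands in for the iterative square root operation on the mantissa; the sketch only illustrates how halving the mantissa makes an odd exponent even so that the exponent of the result is obtained by dividing by 2.

```python
import math


def scale_for_sqrt(mantissa, exponent):
    """Return an equivalent (mantissa, exponent) pair whose exponent is even.

    The value mantissa * 2**exponent is unchanged: when the exponent is odd,
    the mantissa is halved and the exponent is incremented.
    """
    if exponent % 2 != 0:
        mantissa /= 2
        exponent += 1
    return mantissa, exponent


def sqrt_via_scaling(value):
    """Square root of a positive float, handling mantissa and exponent separately."""
    m, e = math.frexp(value)      # value = m * 2**e with m in [0.5, 1)
    m, e = m * 2, e - 1           # normalized form: m in [1, 2)
    m, e = scale_for_sqrt(m, e)   # m in [0.5, 2), e even
    # math.sqrt is a placeholder for the iterative operation on the mantissa.
    return math.ldexp(math.sqrt(m), e // 2)


approx = sqrt_via_scaling(2.0)
```

After scaling, the mantissa lies in a known range (here [0.5, 2)), which is the kind of assumption about the input value that the passage above refers to.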
In some embodiments, the processing circuitry is adapted to perform two iterations in a single clock cycle. By reducing the time required to perform each iteration, the number of iterations of the square root operation that can be performed per clock cycle may be increased. In this way, the present techniques may be applied to circuits that implement at least two iterations per clock cycle. Thus, the entire square root operation may be performed faster (e.g., using fewer clock cycles) than with other proposed techniques.
In some embodiments, the iterative square root operation is radix-4. A radix-4 implementation makes it possible to output a digit in the form of two binary bits at each iteration. For example, if the digit set is {-2, -1, 0, +1, +2}, then the digits may be binary coded as part of the redundant representation: -2 = 10, -1 = 01, 0 = 00, +1 = 01, +2 = 10. Note that a positive value and its negative counterpart are output identically (e.g., -2 and +2 are both output as the binary value 10). This is because, in the redundant representation, negative and positive values are provided as separate words. For example, the
In some embodiments, the at least partial remainder of the square root operation from the previous iteration comprises a predetermined number of most significant bits of the at least partial remainder of the square root operation from the previous iteration. Rather than considering all bits of the at least partial remainder of the square root operation from the previous iteration, only a predetermined number of its most significant bits are considered. This predetermined number depends on the radix of the square root operation and is large enough to include the bits necessary for the rounding operation. In this way, insignificant bits that have no effect on the overall output of the result can be discarded, resulting in a less complex circuit.
In some embodiments, at least one of at least a partial remainder of the square root operation from a previous iteration and at least a partial result of the square root operation from the previous iteration is provided in a redundant representation.
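As an illustration of a partial result held in redundant form, the following Python sketch stores radix-4 signed digits from the set {-2, -1, 0, +1, +2} in a positive word and a negative word, two bits per digit. The function names `append_digit` and `to_value` are hypothetical and the layout is an assumption chosen for clarity rather than the layout used in any particular embodiment.

```python
def append_digit(pos_word, neg_word, digit):
    """Append one radix-4 signed digit (-2..+2) as two bits to each word.

    The magnitude goes into the positive word for a positive digit and into
    the negative word for a negative digit; the other word receives 00.
    """
    if not -2 <= digit <= 2:
        raise ValueError("digit must be in the range -2..+2")
    magnitude = abs(digit)
    pos_bits = magnitude if digit > 0 else 0
    neg_bits = magnitude if digit < 0 else 0
    return (pos_word << 2) | pos_bits, (neg_word << 2) | neg_bits


def to_value(pos_word, neg_word):
    """Convert the redundant pair to a single value with one subtraction."""
    return pos_word - neg_word


pos, neg = 0, 0
for d in (1, -2, 0, 2):
    pos, neg = append_digit(pos, neg, d)
value = to_value(pos, neg)   # ((1*4 - 2)*4 + 0)*4 + 2 == 34
```

Appending a digit is a shift and an OR on each word, so the subtraction that produces the conventional value can be deferred until it is actually needed.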
In some embodiments, the processing circuitry is adapted to speculatively generate the set of candidate at least partial remainders of the square root operation for the current iteration in both a redundant representation and a non-redundant representation; and the candidate at least partial remainders of the square root operation for the current iteration in the non-redundant representation are based on the most significant bits of the at least partial remainder of the square root operation from the previous iteration. This may be accomplished, for example, by generating the set of candidate at least partial remainders of the square root operation for the current iteration in a redundant representation, and then taking the most significant bits of these candidates and converting them to a non-redundant representation. Because the set of candidate at least partial remainders generated in the current iteration is represented in redundant form, the selected candidate can be provided as the input for the next iteration. Because a set of candidate approximate remainders is also provided in a non-redundant representation, the selected candidate can be used as part of a comparison function to select the next digit in the next iteration. In this way, two elements of the square root operation can be removed from the critical path, thereby relaxing the timing constraints of the circuit.
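The idea of converting only the most significant bits of a redundant value can be sketched in Python. The example assumes a sum-and-carry pair and uses a hypothetical helper `msb_estimate`; the widths are arbitrary. Adding only the top bits needs a much shorter carry-propagate adder than converting the whole word, at the cost of a small, bounded error.

```python
def msb_estimate(sum_word, carry_word, width, kept_bits):
    """Non-redundant estimate built from the top `kept_bits` of a (sum, carry) pair.

    The dropped low-order bits can contribute at most one carry into the kept
    bits, so the estimate is either exact or one unit too small (modulo
    2**kept_bits).
    """
    drop = width - kept_bits
    mask = (1 << kept_bits) - 1
    return ((sum_word >> drop) + (carry_word >> drop)) & mask


def exact_msbs(sum_word, carry_word, width, kept_bits):
    """Top bits of the fully converted value, for comparison with the estimate."""
    full = (sum_word + carry_word) & ((1 << width) - 1)
    return full >> (width - kept_bits)


estimate = msb_estimate(0b10110111, 0b00001001, width=8, kept_bits=4)
exact = exact_msbs(0b10110111, 0b00001001, width=8, kept_bits=4)
```

A digit selection that compares such an estimate against thresholds has to tolerate this one-unit uncertainty, which is one reason the number of bits kept is chosen with some margin.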
Specific embodiments will now be described with reference to the accompanying drawings.
Fig. 1 illustrates an
After the receiving
After performing any necessary normalization and scaling, the input values are passed to the
After performing the initial iteration, the
Fig. 2 illustrates an example of a
The first remainder
rem[i+1] = r \times rem[i] - s_{i+1} \times \left(2 \times S[i] + s_{i+1} \times r^{-(i+1)}\right)
where 'i' is the iteration number, 'r' is the base (e.g. 4), 's_i' is the digit output in iteration i, 'a' is the maximum output digit in base r, and 'S[i]' is the partial root before iteration i.
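The recurrence can be exercised with a short Python sketch that computes one candidate remainder for every digit in {-2, ..., +2} before the digit is chosen and then simply picks the matching candidate. Several things here are assumptions made for illustration and are not taken from the disclosure: the names `next_remainder`, `select_digit` and `sqrt_radix4`, the input range [1/4, 1), the starting values S[0] = 1 and rem[0] = x - 1, and in particular the digit selection, which uses exact rational comparisons as a software stand-in for a hardware selection based on a few most significant bits.

```python
from fractions import Fraction

RADIX = 4
DIGITS = (-2, -1, 0, 1, 2)


def next_remainder(rem, root, digit, i):
    """rem[i+1] = r*rem[i] - s*(2*S[i] + s*r**-(i+1)) for a candidate digit s."""
    weight = Fraction(1, RADIX ** (i + 1))
    return RADIX * rem - digit * (2 * root + digit * weight)


def select_digit(rem, root, i):
    """Illustrative exact selection: round the ideal digit to the nearest integer.

    Each threshold t is the value of r*rem[i] at which digits t-1/2 and t+1/2
    would be equally good; the digit is -2 plus the number of thresholds met.
    """
    scaled = RADIX * rem
    weight = Fraction(1, RADIX ** (i + 1))
    digit = -2
    for t in (Fraction(-3, 2), Fraction(-1, 2), Fraction(1, 2), Fraction(3, 2)):
        if scaled >= 2 * root * t + t * t * weight:
            digit += 1
    return digit


def sqrt_radix4(x, iterations):
    """Approximate sqrt(x) for 1/4 <= x < 1; returns (root, remainder)."""
    x = Fraction(x)
    if not Fraction(1, 4) <= x < 1:
        raise ValueError("x must lie in [1/4, 1)")
    root = Fraction(1)   # assumed initial partial root S[0]
    rem = x - 1          # rem[0] = x - S[0]**2
    for i in range(iterations):
        # Speculative step: every candidate is computed before the digit is known.
        candidates = {s: next_remainder(rem, root, s, i) for s in DIGITS}
        digit = select_digit(rem, root, i)
        rem = candidates[digit]   # selection only, no further arithmetic
        root += digit * Fraction(1, RADIX ** (i + 1))
    return root, rem


root, rem = sqrt_radix4(Fraction(1, 2), 10)
```

In this sketch the candidate computation and the digit selection depend only on values from the previous iteration, so in hardware they could proceed side by side, leaving only a selection among the five candidates after the digit is known. The remainder always equals 4**i * (x - root**2), which gives a simple consistency check on the result.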
Note that there are five
The first digit
The numbers are then provided to the
In the present embodiment, the second digital
Thus, it can be seen that by providing the next remainder circuit 200, the next remainder value for each iteration can be speculatively determined before at least one input required for the calculation is known. Specifically, a next remainder circuit 200 is provided such that calculations of next remainders may be performed in parallel, each
Fig. 3 illustrates a manner in which similar techniques may be used instead of or in addition to those described with respect to fig. 2. Specifically, the 9-
Fig. 4 shows an example of a processing circuit 130 'in which the 9-
Fig. 5 shows an embodiment of the
The interval value is updated in the first digital
FIG. 6 shows a flow diagram 600 illustrating a data processing method according to one embodiment. At
Thus, it has been demonstrated that by speculatively generating a set of candidate values before the at least partial result of the current iteration is determined, the number of operations on the critical path can be reduced. Some processes can therefore be performed in parallel, and the time required for each iteration to complete can decrease. This in turn allows the timing constraints to be relaxed, or allows more iterations to be performed in a single clock cycle than would otherwise be possible. The square root operation can therefore be performed faster.
In this application, the words "configured to…" are used to indicate that an element of an apparatus has a configuration capable of performing the defined operation. In this context, a "configuration" refers to an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware that provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
Although illustrative embodiments of the present invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes, additions and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims may be made with the features of the independent claims without departing from the scope of the invention.