Application specific integrated circuit chip and method, block chain system and block generation method

Document No.: 1904810 Publication date: 2021-11-30

Reading note: this technology, "Application specific integrated circuit chip and method, block chain system and block generation method", was designed by 刘明 and 汪福全 on 2021-11-01. Main content: The application provides an application specific integrated circuit chip and method, a block chain system and a block generation method. The application specific integrated circuit chip includes a compression calculation unit. The compression calculation unit includes: a first parallel processing subunit, used for performing, in parallel, combination calculation on the four groups of first elements contained in the intermediate variable and the corresponding input block data, and performing first and second rounds of reassignment on the four groups of first elements respectively according to the results of the combination calculation, wherein the first elements in the intermediate variable are divided into four groups according to a first rule; and a second parallel processing subunit, used for receiving the reassigned intermediate variable output by the first parallel processing subunit, performing, in parallel, combination calculation on the four groups of first elements contained in the intermediate variable and the corresponding input block data, and performing third and fourth rounds of reassignment on the four groups of first elements respectively according to the results of the combination calculation, wherein the first elements in the intermediate variable are divided into four groups according to a second rule.

1. An application specific integrated circuit chip implementing Blake2, comprising: a compression calculation unit; wherein the compression calculation unit comprises:

the first parallel processing subunit is used for performing parallel combination calculation on four groups of first elements contained in the intermediate variable and corresponding input block data, and performing first and second round reassignment on the four groups of first elements respectively according to the result of the combination calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a first rule;

the second parallel processing subunit is used for receiving the reassigned intermediate variables output by the first parallel processing subunit, performing parallel combination calculation on four groups of first elements contained in the intermediate variables and corresponding input block data, and performing third and fourth rounds of reassignment on the four groups of first elements respectively according to the results of the combination calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a second rule.

2. The application specific integrated circuit chip of claim 1, wherein the first parallel processing subunit comprises, connected in series: a first function module and a second function module;

the first function module comprises four first processing modules which perform first combination calculation in parallel, each first processing module is respectively used for performing first combination calculation on a group of first elements and an input second element and performing reassignment on the group of first elements according to the result of the first combination calculation;

the second function module comprises four second processing modules which perform second combination calculation in parallel, each second processing module is respectively used for performing second combination calculation on a group of first elements and an input second element, and performing reassignment on the group of first elements according to the result of the second combination calculation;

the four groups of first elements processed by the first and second function modules are obtained by dividing the intermediate variables according to a first rule; and the second elements input to the first and second function modules are different from each other.

3. The application specific integrated circuit chip of claim 2, wherein the second parallel processing subunit comprises: a mapping module, the first function module, the second function module, an inverse mapping module connected to the first parallel processing subunit;

the mapping module is used for respectively assigning the values of different first elements to the first elements mapped by the first elements according to a preset mapping relation;

the inverse mapping module is used for respectively assigning the values of different first elements to the first elements mapped to the first elements according to the preset mapping relation;

the second elements input by the first and second function modules are different from each other and different from the second elements input by the first and second function modules in the first parallel processing subunit.

4. The application specific integrated circuit chip of claim 3, wherein the mapping module assigning the values of different first elements to the first elements to which they are mapped according to a preset mapping relation comprises:

the mapping module assigns the values of the first elements with index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13 and 14 to the first elements with index values of 0 to 15 respectively;

the inverse mapping module assigns values of different first elements to the first elements mapped to the first elements according to a preset mapping relation, and the inverse mapping module comprises:

the inverse mapping module assigns values of first elements having index values of 0 to 15 to the first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14, respectively.

5. A method of implementing Blake2, the method being applied to the application specific integrated circuit chip implementing Blake2 as claimed in any of claims 1-4, the method comprising: performing compression calculation of a preset number of rounds, wherein each round of compression calculation respectively comprises the following steps:

the first parallel processing subunit performs combined calculation on four groups of first elements contained in the intermediate variables and corresponding input block data in parallel, and performs first and second round reassignment on the four groups of first elements respectively according to the result of the combined calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a first rule;

the second parallel processing subunit performs combined calculation on four groups of first elements contained in the intermediate variables and corresponding input block data in parallel, and performs third and fourth rounds of reassignment on the four groups of first elements respectively according to the result of the combined calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a second rule.

6. The method of claim 5, wherein the first parallel processing subunit comprises, connected in series: a first function module and a second function module; the parallel combination calculation of four groups of first elements contained in the intermediate variables and corresponding input block data, and the first and second round reassignment of the four groups of first elements according to the result of the combination calculation respectively comprises:

inputting input parameters into the first function module, wherein the input parameters comprise intermediate variables and four second elements in the input block data; the first function module executes four first sub-processes in parallel, each first sub-process respectively carries out first combination calculation on a group of first elements and one second element serving as an input parameter, and carries out first round reassignment on the group of first elements according to the result of the first combination calculation;

inputting input parameters into a second function module, wherein the input parameters comprise intermediate variables subjected to the first round of reassignment and four second elements in input block data; the second function module executes four second sub-processes in parallel, each second sub-process respectively carries out second combination calculation on a group of first elements subjected to the first round of reassignment and one second element serving as an input parameter, and carries out second round of reassignment on the group of first elements according to the result of the second combination calculation;

wherein, the groups of the first elements corresponding to the first and second sub-processes are obtained by dividing sixteen first elements of the intermediate variable according to a first rule; and the second elements in the input parameters of the first function and the second function are different.

7. The method of claim 6, wherein the second parallel processing subunit comprises: a mapping module, the first function module, the second function module, an inverse mapping module connected to the first parallel processing subunit; the parallel combination calculation of four groups of first elements contained in the intermediate variables and corresponding input block data, and the third and fourth re-assignment of the four groups of first elements according to the result of the combination calculation respectively comprises:

the mapping module assigns values of different first elements to the first elements mapped by the first elements in the intermediate variables subjected to the second round of reassignment according to a preset mapping relation;

inputting input parameters into the first function module, wherein the input parameters comprise the intermediate variables assigned by the mapping module and four second elements in the input block data; the first function module executes four first sub-processes in parallel, each first sub-process respectively carries out the first combination calculation on a group of first elements subjected to the second round of reassignment and one second element serving as an input parameter, and carries out a third round of reassignment on the group of first elements according to the result of the first combination calculation;

inputting input parameters into the second function module, wherein the input parameters comprise the intermediate variables subjected to the third round of reassignment and four second elements in the input block data; the second function module executes four second sub-processes in parallel, each second sub-process respectively carries out the second combination calculation on a group of first elements subjected to the third round of reassignment and one second element serving as an input parameter, and carries out a fourth round of reassignment on the group of first elements according to the result of the second combination calculation;

the inverse mapping module assigns values of different first elements to the first elements mapped to the first elements in the intermediate variables subjected to fourth-round reassignment according to the preset mapping relation;

and the second elements in the input parameters of the first and second functions are different from the second elements in the input parameters of the first and second functions used last time.

8. The method of claim 7, wherein the assigning values of different first elements to which the first elements are mapped according to a preset mapping relationship comprises:

assigning values of first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14 to the first elements having index values of 0 to 15, respectively;

assigning values of different first elements to the first elements mapped to the first elements according to a preset mapping relation comprises:

values of the first elements having index values of 0 to 15 are assigned to the first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14, respectively.

9. A blockchain system, comprising: a plurality of nodes;

characterized in that each of the nodes includes: an application specific integrated circuit chip implementing Blake2 as recited in any one of claims 1-4.

10. A method for generating blocks in a block chain system, comprising:

converting the input conditions into a problem to be solved; the input conditions include: a block header and preset parameters of the last block;

performing Blake2 a predetermined number of times according to the method of implementing Blake2 of any of claims 5-8, and solving the problem based on the raw data obtained from Blake2;

and if the obtained solution meets the preset algorithm condition and difficulty condition, judging that the solution is successful, and generating a new block.

Technical Field

The present disclosure relates to the field of blockchains, and more particularly, to an application specific integrated circuit chip and method, a blockchain system and a block generation method.

Background

A blockchain serves as a shared database, and the data or information stored in it is unforgeable, leaves a trace throughout its whole course, is traceable, open and transparent, and is collectively maintained. Based on these characteristics, blockchain technology lays a solid foundation of 'trust', creates a reliable 'cooperation' mechanism, and has broad application prospects. Equihash is a memory-oriented proof-of-work algorithm used in blockchains. The general process is as follows: first, the input conditions, namely the block header and various parameters, are constructed; then the input conditions are converted into a general form of the generalized birthday problem, and the problem is solved; difficulty judgment is performed on the obtained solution, and if the algorithm condition and the difficulty condition are met, the solution is judged to be successful, the proof of work is completed, and a new block is generated; otherwise, the parameters are adjusted and the operation is performed again.

Equihash requires about two million raw data items to be generated by Blake2, from which a solution to the above problem is obtained. Blake2 includes Blake2B and Blake2S: Blake2B generates one 400-bit data item per execution, which is divided into two 200-bit data items; Blake2S generates one 200-bit data item per execution, which is divided into two 100-bit data items. Equihash needs to perform Blake2 about one million times to obtain the raw data to be solved, thereby completing the proof of work and generating a new block in the blockchain.

Disclosure of Invention

The following is a summary of the subject matter described in detail in this application. This summary is not intended to limit the scope of the claims.

The application provides an application specific integrated circuit chip and method, a block chain system and a block generation method, which can improve the processing speed.

The embodiment of the present application provides an asic chip for implementing Blake2, including: a compression calculation unit; the compression calculation unit includes:

the first parallel processing subunit is used for performing parallel combination calculation on four groups of first elements contained in the intermediate variable and corresponding input block data, and performing first and second round reassignment on the four groups of first elements respectively according to the result of the combination calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a first rule;

the second parallel processing subunit is used for receiving the reassigned intermediate variables output by the first parallel processing subunit, performing parallel combination calculation on four groups of first elements contained in the intermediate variables and corresponding input block data, and performing third and fourth rounds of reassignment on the four groups of first elements respectively according to the results of the combination calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a second rule.

Optionally, the first parallel processing subunit includes, connected in sequence: a first function module and a second function module;

the first function module comprises four first processing modules which perform first combination calculation in parallel, each first processing module is respectively used for performing first combination calculation on a group of first elements and an input second element and performing reassignment on the group of first elements according to the result of the first combination calculation;

the second function module comprises four second processing modules which perform second combination calculation in parallel, each second processing module is respectively used for performing second combination calculation on a group of first elements and an input second element, and performing reassignment on the group of first elements according to the result of the second combination calculation;

the four groups of first elements processed by the first and second function modules are obtained by dividing the intermediate variables according to a first rule; and the second elements input to the first and second function modules are different from each other.

Optionally, the second parallel processing subunit includes: a mapping module, the first function module, the second function module, an inverse mapping module connected to the first parallel processing subunit;

the mapping module is used for respectively assigning the values of different first elements to the first elements mapped by the first elements according to a preset mapping relation;

the inverse mapping module is used for respectively assigning the values of different first elements to the first elements mapped to the first elements according to the preset mapping relation;

the second elements input by the first and second function modules are different from each other and different from the second elements input by the first and second function modules in the first parallel processing subunit.

Optionally, the assigning, by the mapping module, values of different first elements to the first element mapped to by the first element according to a preset mapping relationship includes:

the mapping module assigns the values of the first elements with index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13 and 14 to the first elements with index values of 0 to 15 respectively;

the inverse mapping module assigns values of different first elements to the first elements mapped to the first elements according to a preset mapping relation, and the inverse mapping module comprises:

the inverse mapping module assigns values of first elements having index values of 0 to 15 to the first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14, respectively.
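The mapping and inverse mapping described above can be read as a fixed permutation of the sixteen first elements and its inverse. The following is a minimal Python sketch under that reading; the names `MAPPING`, `apply_mapping` and `apply_inverse_mapping` are illustrative and do not come from the application.

```python
# Preset mapping relation quoted from the text: the first element with index i
# receives the value of the first element with index MAPPING[i].
MAPPING = [0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14]


def apply_mapping(v):
    """Mapping module: new element i takes the value of old element MAPPING[i]."""
    return [v[src] for src in MAPPING]


def apply_inverse_mapping(v):
    """Inverse mapping module: element MAPPING[i] takes the value of element i."""
    out = [0] * 16
    for i, dst in enumerate(MAPPING):
        out[dst] = v[i]
    return out


# Placeholder values standing in for the sixteen first elements v0..v15.
v = list(range(100, 116))
mapped = apply_mapping(v)
restored = apply_inverse_mapping(mapped)
```

In this sketch, applying the inverse mapping after the mapping restores every element to its original index, which is what allows the same first and second function modules to be reused between the two steps.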

The embodiment of the present application further provides a method for implementing Blake2, which is applied to the above-mentioned asic chip for implementing Blake2, and the method includes: performing compression calculation of a preset number of rounds, wherein each round of compression calculation respectively comprises the following steps:

the first parallel processing subunit performs combined calculation on four groups of first elements contained in the intermediate variables and corresponding input block data in parallel, and performs first and second round reassignment on the four groups of first elements respectively according to the result of the combined calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a first rule;

the second parallel processing subunit performs combined calculation on four groups of first elements contained in the intermediate variables and corresponding input block data in parallel, and performs third and fourth rounds of reassignment on the four groups of first elements respectively according to the result of the combined calculation; and dividing sixteen first elements in the intermediate variables into four groups according to a second rule.

Optionally, the first parallel processing subunit includes, connected in sequence: a first function module and a second function module; the parallel combination calculation of four groups of first elements contained in the intermediate variables and corresponding input block data, and the first and second round reassignment of the four groups of first elements according to the result of the combination calculation respectively comprises:

inputting input parameters into the first function module, wherein the input parameters comprise intermediate variables and four second elements in the input block data; the first function module executes four first sub-processes in parallel, each first sub-process respectively carries out first combination calculation on a group of first elements and one second element serving as an input parameter, and carries out first round reassignment on the group of first elements according to the result of the first combination calculation;

inputting input parameters into a second function module, wherein the input parameters comprise intermediate variables subjected to the first round of reassignment and four second elements in input block data; the second function module executes four second sub-processes in parallel, each second sub-process respectively carries out second combination calculation on a group of first elements subjected to the first round of reassignment and one second element serving as an input parameter, and carries out second round of reassignment on the group of first elements according to the result of the second combination calculation;

wherein, the groups of the first elements corresponding to the first and second sub-processes are obtained by dividing sixteen first elements of the intermediate variable according to a first rule; and the second elements in the input parameters of the first function and the second function are different.

Optionally, the second parallel processing subunit includes: a mapping module, the first function module, the second function module, an inverse mapping module connected to the first parallel processing subunit; the parallel combination calculation of four groups of first elements contained in the intermediate variables and corresponding input block data, and the third and fourth re-assignment of the four groups of first elements according to the result of the combination calculation respectively comprises:

the mapping module assigns values of different first elements to the first elements mapped by the first elements in the intermediate variables subjected to the second round of reassignment according to a preset mapping relation;

inputting input parameters into the first function module, wherein the input parameters comprise the intermediate variables assigned by the mapping module and four second elements in the input block data; the first function module executes four first sub-processes in parallel, each first sub-process respectively carries out the first combination calculation on a group of first elements subjected to the second round of reassignment and one second element serving as an input parameter, and carries out a third round of reassignment on the group of first elements according to the result of the first combination calculation;

inputting input parameters into the second function module, wherein the input parameters comprise the intermediate variables subjected to the third round of reassignment and four second elements in the input block data; the second function module executes four second sub-processes in parallel, each second sub-process respectively carries out the second combination calculation on a group of first elements subjected to the third round of reassignment and one second element serving as an input parameter, and carries out a fourth round of reassignment on the group of first elements according to the result of the second combination calculation;

the inverse mapping module assigns values of different first elements to the first elements mapped to the first elements in the intermediate variables subjected to fourth-round reassignment according to the preset mapping relation;

and the second elements in the input parameters of the first and second functions are different from the second elements in the input parameters of the first and second functions used last time.

Optionally, the assigning the values of the different first elements to the first element mapped by the first element according to the preset mapping relationship includes:

assigning values of first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14 to the first elements having index values of 0 to 15, respectively;

assigning values of different first elements to the first elements mapped to the first elements according to a preset mapping relation comprises:

values of the first elements having index values of 0 to 15 are assigned to the first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14, respectively.

An embodiment of the present application further provides a blockchain system, including: a plurality of nodes; each of the nodes includes: the above-described application specific integrated circuit chip implementing Blake2.

The embodiment of the present application further provides a block generation method in a block chain system, including:

converting the input conditions into a problem to be solved; the input conditions include: a block header and preset parameters of the last block;

performing Blake2 for a preset number of times according to the above method for implementing Blake2, and solving the problem according to the raw data obtained by Blake2;

and if the obtained solution meets the preset algorithm condition and difficulty condition, judging that the solution is successful, and generating a new block.

Compared with the related art, the embodiment of the application provides an implementation scheme of Blake2 that carries out each round of compression calculation in Blake2 in a parallel manner, so that, compared with the traditional implementation scheme, the computation speed of Blake2 is increased and the computation time of Blake2 is reduced. In the process of generating a new block in the blockchain system, Equihash uses Blake2 to generate the raw data that is then solved to complete the proof of work and generate the new block, and about one million executions of Blake2 are required to generate 2 solutions. Even a slight increase in the computation speed of Blake2 can therefore greatly improve the efficiency of completing the proof of work and increase the speed of generating a new block, which can bring considerable benefits.

Other aspects will be apparent upon reading and understanding the attached drawings and detailed description.

Drawings

The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.

FIG. 1 is a schematic diagram of a compressed computing unit in an ASIC chip implementing Blake2 according to an embodiment of the present application;

FIG. 2 is a flow chart of each round of compression calculation in a method for implementing Blake2 according to an embodiment of the present application;

FIG. 3 is a flow chart of an example of an embodiment of the present application;

FIG. 4 is a flowchart of a block generation method according to an embodiment of the present application.

Detailed Description

The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.

The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.

Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.

In addition, descriptions in this application as to "first", "second", etc. are used for differentiation in description only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the related art, Blake2 includes two versions, Blake2B and Blake2S. Blake2B is the 64-bit version and can generate a hash value of any length up to 512 bits; Blake2S is the 32-bit version and can generate a hash value of up to 256 bits.

Taking solving with Blake2B as an example, the input data of Equihash includes a 108-byte block header and a 32-byte nonce value, for a total of 140 bytes. Each time Blake2B is executed, a different 4-byte counter value, ranging from 0 to 2^20-1, is concatenated after the 140 bytes, for a total of 144 bytes. The 144 bytes of data are divided into two blocks of data: the first block data is 128 bytes and the second block data is 16 bytes (comprising 12 bytes of the nonce value and the 4-byte counter value); zeros are then appended after the 16 bytes of the second block data until it is padded to 128 bytes.

In the calculation using Blake2B, the h value of the first block data (h_1st) is calculated first, and then 12 rounds of compression calculation are performed on the second block data. In each round of compression calculation, the mixing function G combines the intermediate variable v with the input block data m (namely the second block data), yielding a new intermediate variable v for use in the next round. The v_init required by the first round of compression calculation on the second block data is obtained from the h value of the first block data (h_1st) and a preset initial vector iv; the calculation method is as follows:
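The input layout described above can be sketched in Python as follows. This is an illustration only: the function name `build_blocks` is hypothetical, and the little-endian encoding of the 4-byte counter is an assumption that the text does not state.

```python
import struct

HEADER_LEN = 108    # block header, bytes
NONCE_LEN = 32      # nonce value, bytes
BLOCK_LEN = 128     # Blake2B input block size, bytes
COUNTER_LIMIT = 2 ** 20   # counter values run from 0 to 2^20 - 1


def build_blocks(header: bytes, nonce: bytes, counter: int):
    """Return the two 128-byte input blocks for one execution of Blake2B."""
    if len(header) != HEADER_LEN or len(nonce) != NONCE_LEN:
        raise ValueError("header must be 108 bytes and nonce 32 bytes")
    if not 0 <= counter < COUNTER_LIMIT:
        raise ValueError("counter out of range")
    # Assumption: the counter is encoded as a little-endian 32-bit integer.
    data = header + nonce + struct.pack("<I", counter)   # 144 bytes in total
    first = data[:BLOCK_LEN]                              # 108 header + 20 nonce bytes
    second = data[BLOCK_LEN:].ljust(BLOCK_LEN, b"\x00")   # 12 nonce + 4 counter + zeros
    return first, second


# Placeholder header and nonce contents, chosen only to make the layout visible.
first_block, second_block = build_blocks(b"\x01" * 108, b"\x02" * 32, 5)

# Each execution yields two raw data items, so 2^20 counter values give 2^21 items.
raw_items = COUNTER_LIMIT * 2
```

With 2^20 counter values and two raw data items per execution, this count agrees with the "about one million" executions and "two million" raw data items mentioned in the background.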

v_init[0:7] := h_1st[0:7];

v_init[8:11] := iv[0:3];

v_init[12] := iv[4] ^ (t mod 2**w); // bitwise XOR

v_init[13] := iv[5] ^ (t >> w);

v_init[14] := ~iv[6]; // bitwise negation

v_init[15] := iv[7];

Wherein ":=" denotes assignment; t is the offset counter; "mod 2**w" denotes reduction modulo 2 to the power w, where w is 64 in Blake2B; ">>" is a logical right shift; the h value is 512 bits (eight 64-bit words); and v is 1024 bits (sixteen 64-bit words).
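
The initialization above can be sketched in Python as follows. The function name init_v is hypothetical; the IV constants are the standard Blake2B initialization vector, which this text refers to only as "a preset initial vector iv". The value t = 144 used in the example assumes that the offset counter equals the total number of input bytes processed so far.

```python
W = 64
MASK = (1 << W) - 1

# Standard Blake2B initialization vector (not listed in the text above).
IV = [
    0x6A09E667F3BCC908, 0xBB67AE8584CAA73B,
    0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
    0x510E527FADE682D1, 0x9B05688C2B3E6C1F,
    0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179,
]


def init_v(h_1st, t):
    """Build the 16-word v_init from the 8-word h value and the offset counter t."""
    v = list(h_1st) + list(IV)   # v[0:7] = h_1st, v[8:15] = iv
    v[12] ^= t & MASK            # iv[4] ^ (t mod 2**w)
    v[13] ^= (t >> W) & MASK     # iv[5] ^ (t >> w)
    v[14] ^= MASK                # XOR with all ones, i.e. bitwise negation of iv[6]
    return v


v_init = init_v([0] * 8, 144)
```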

The v_out obtained from the last round of compression calculation is combined with the h value of the first block data to output the 400-bit data blake2b_data, which is split into two 200-bit data as the final result of this execution of Blake2B; the calculation is as follows:

h_result [0] = h_1st[0] ^ v_out[0] ^ v_out[8];

h_result [1] = h_1st[1] ^ v_out[1] ^ v_out[9];

h_result [2] = h_1st[2] ^ v_out[2] ^ v_out[10];

h_result [3] = h_1st[3] ^ v_out[3] ^ v_out[11];

h_result [4] = h_1st[4] ^ v_out[4] ^ v_out[12];

h_result [5] = h_1st[5] ^ v_out[5] ^ v_out[13];

h_result [6] = h_1st[6] ^ v_out[6] ^ v_out[14];

h_result [7] = h_1st[7] ^ v_out[7] ^ v_out[15];

blake2b_data[399:0] = h_result[399:0];
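
The finalization above can be sketched as follows. The names finalize and split_400 are hypothetical. The sketch assumes that bits [399:0] correspond to the first 50 bytes of h_result when its words are serialized in little-endian order; the text does not state the bit ordering.

```python
def finalize(h_1st, v_out):
    """h_result[i] = h_1st[i] ^ v_out[i] ^ v_out[i + 8] for i = 0..7."""
    return [h_1st[i] ^ v_out[i] ^ v_out[i + 8] for i in range(8)]


def split_400(h_result):
    """Keep 400 bits (50 bytes) of the 512-bit result and split them in two."""
    # Assumption: words are serialized little-endian, lowest index first.
    raw = b"".join(word.to_bytes(8, "little") for word in h_result)
    data = raw[:50]
    return data[:25], data[25:]


h_result = finalize(list(range(8)), list(range(16)))
low, high = split_400(h_result)
```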

In Blake2B, the G function is used 8 times in each round of compression calculation.

The input block data m is a word vector containing 16 words, each word being 64 bits in Blake2B; each word in the input block data m is referred to herein as an m element (also referred to herein as a second element). The 16 m elements serve as inputs across the 8 uses of the G function.

The intermediate variable v is a word vector containing 16 words, each word being 64 bits in Blake2B; each word in the intermediate variable v is referred to herein as a v element (also referred to herein as a first element). The 16 v elements v0 to v15 are divided into the following 4 groups according to a first rule, which serve as the inputs for the first 4 uses of the G function respectively:

(v0, v4, v8, v12), (v1, v5, v9, v13), (v2, v6, v10, v14), (v3, v7, v11, v15);

After the G function has been used the first 4 times, all 16 v elements have been updated; the updated 16 v elements are divided into the following 4 groups according to a second rule, which serve as the inputs for the next 4 uses of the G function respectively:

(v0, v5, v10, v15), (v1, v6, v11, v12), (v2, v7, v8, v13), (v3, v4, v9, v14).

It can be seen that the rules adopted in the division are different, and the v elements contained in each group are different, so that the v elements involved in the combined calculation are different. Wherein v0 refers to the v element with index value (index) of 0 in the intermediate variable v, or the first v element; the other v elements are analogized.

The 12-round compression calculation in Blake2B may be implemented with the following code:

FOR i = 0 TO 11 DO

s[0..15] := SIGMA[i mod 10][0..15]

v := G( v, 0, 4, 8, 12, m[s[ 0]], m[s[ 1]] )

v := G( v, 1, 5, 9, 13, m[s[ 2]], m[s[ 3]] )

v := G( v, 2, 6, 10, 14, m[s[ 4]], m[s[ 5]] )

v := G( v, 3, 7, 11, 15, m[s[ 6]], m[s[ 7]] )

v := G( v, 0, 5, 10, 15, m[s[ 8]], m[s[ 9]] )

v := G( v, 1, 6, 11, 12, m[s[10]], m[s[11]] )

v := G( v, 2, 7, 8, 13, m[s[12]], m[s[13]] )

v := G( v, 3, 4, 9, 14, m[s[14]], m[s[15]] )

END FOR

wherein s holds the row of the permutation constant SIGMA selected for the current round, and SIGMA is a constant matrix with 12 rows (rows 0 to 11) and 16 columns, in which rows 10 and 11 are the same as rows 0 and 1 respectively; SIGMA can take the following form:

const uint8_t SIGMA[12][16] = {

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},

{14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},

{11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},

{7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},

{9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},

{2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9},

{12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11},

{13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10},

{6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5},

{10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0},

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},

{14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3}

};

In the 12 rounds of compression calculation, the first 10 rounds assign rows 0 to 9 of SIGMA to s[0..15] respectively, and the 11th and 12th rounds assign rows 0 and 1 of SIGMA to s[0..15] respectively. Thus, in different rounds of compression calculation, the m elements input each time the G function is used differ. For example, in the first round of compression calculation, i = 0 and s[0..15] is assigned the first row of the SIGMA matrix, so in this round the m elements input the first time the G function is used are m0 and m1, the m elements input the second time are m2 and m3, and so on; in the second round, i = 1 and s[0..15] is assigned the second row of the SIGMA matrix, so the m elements input the first time the G function is used are m14 and m10, the m elements input the second time are m4 and m8, and so on.
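
The selection of m elements per round can be illustrated with a short sketch. The function name m_indices is hypothetical; the table holds the 10 distinct SIGMA rows, and i mod 10 selects the row as in the pseudocode above.

```python
SIGMA = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
]


def m_indices(round_index):
    """Return the (x, y) m-element indices for the 8 uses of G in one round."""
    s = SIGMA[round_index % 10]
    return [(s[2 * j], s[2 * j + 1]) for j in range(8)]


round0 = m_indices(0)
round1 = m_indices(1)
```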

Each time the G function is used, the input set of m elements and the set of v elements are combined to update the set of 4 v elements. One execution process of the G function can be divided into a first half update process and a second half update process, wherein the first half update process uses one of the input 2 m elements to perform combined calculation with the input 4 v elements so as to reassign the 4 v elements (i.e., update the 4 v elements); the second half update uses another of the input 2 m elements to combine with the re-assigned 4 v elements in the first half update to re-assign the 4 v elements again; and taking the re-assigned 4 v elements as the output of the G function at the time.

In the process of one round of compression calculation (using 8 times of G function), all 16 v elements of the intermediate variable v are respectively input into the G function twice, and 2 rounds of updating in the first and second half stages are carried out after each time of inputting the G function, namely each v element is updated for 4 rounds in each round of compression calculation.

Each of the first-half and second-half updates comprises: performing a combined calculation on the first v element in the group, the second v element in the group, and one input m element, and reassigning the first v element according to the result; and then reassigning each of the last three v elements in the group according to the result of combining that v element with the third, fourth, or first v element in the group, respectively. The difference between the two is that the second-half update performs the combined calculation again on the basis of the first-half update, so all the v elements it uses are those already reassigned in the first-half update.

The G function can be expressed as the following code:

FUNCTION G ( v[ 0..15 ], a, b, c, d, x, y )

v[a] := (v[a] + v[b] + x) mod 2**w // first-half update, uses x

v[d] := (v[d] ^ v[a] ) >>>R1

v[c] := (v[c] + v[d]) mod 2**w

v[b] := (v[b] ^ v[c] ) >>>R2

v[a] := (v[a] + v[b] + y) mod 2**w // second-half update, uses y

v[d] := (v[d] ^ v[a] ) >>>R3

v[c] := (v[c] + v[d]) mod 2**w

v[b] := (v[b] ^ v[c] ) >>>R4

RETURN v[ 0..15 ]

END FUNCTION

wherein a, b, c, d, x and y are the input parameters of the G function. a, b, c and d are indexes identifying the input v elements; for example, when a, b, c and d are 0, 4, 8 and 12 respectively, they identify v0, v4, v8 and v12. x and y are the input m elements, such as m[s[0]] and m[s[1]]. "2**w" denotes 2 to the power w, where w is 64 in Blake2B; the modulo can be omitted from the combined calculation when the addition is carried out in w-bit arithmetic, which wraps around on overflow. "^" denotes an XOR operation and ">>>" denotes a circular right shift; R1 to R4 after ">>>" are the numbers of bits shifted right, which in Blake2B are 32, 24, 16 and 63 respectively.
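
For reference, the G function can be written as a short Python sketch. It follows the Blake2B definition in RFC 7693, in which the v[c] updates take no m element and the rotation amounts are 32, 24, 16 and 63; the helper name rotr is hypothetical. Python integers do not wrap, so the mask stands in for "mod 2**w".

```python
W = 64
MASK = (1 << W) - 1
R1, R2, R3, R4 = 32, 24, 16, 63  # Blake2B rotation amounts per RFC 7693


def rotr(x, n):
    """Circular right shift of a W-bit word by n bits."""
    return ((x >> n) | (x << (W - n))) & MASK


def G(v, a, b, c, d, x, y):
    """Mix m elements x and y into v[a], v[b], v[c], v[d]; returns a new list."""
    v = list(v)
    # First-half update, uses x
    v[a] = (v[a] + v[b] + x) & MASK
    v[d] = rotr(v[d] ^ v[a], R1)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], R2)
    # Second-half update, uses y
    v[a] = (v[a] + v[b] + y) & MASK
    v[d] = rotr(v[d] ^ v[a], R3)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], R4)
    return v


out = G([0] * 16, 0, 4, 8, 12, 1, 0)
```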

Blake2S and Blake2B are performed in substantially the same way, except that Blake2S performs only 10 rounds of compression calculation, w is 32, one word is 32 bits, the block length is 64 bytes, the hash result is 32 bytes, and (R1, R2, R3, R4) is (16, 12, 8, 7).

It can be seen that the whole implementation process of Blake2 has many steps: 10 or 12 rounds of compression calculation need to be performed, the G function needs to be used 8 times in each round of compression calculation, and each use of the G function includes 8 assignment operations and 4 shift operations. Considering that Equihash requires about one million runs of Blake2, this results in a long processing time, and therefore low efficiency when Equihash is used for proof of work in a blockchain system to generate new blocks.

An embodiment of the present application provides an ASIC chip for implementing Blake2, including: a compression calculation unit; as shown in fig. 1, the compression calculation unit includes:

the first parallel processing subunit 11 is configured to perform parallel combination calculation on four groups of first elements included in the intermediate variable and corresponding input block data, and perform first and second round reassignment on the four groups of first elements according to a result of the combination calculation; the sixteen first elements in the intermediate variables are divided into four groups according to a first rule;

the second parallel processing subunit 12 is configured to receive the reassigned intermediate variable output by the first parallel processing subunit, perform parallel combination calculation on four groups of first elements included in the intermediate variable and corresponding input block data, and perform third and fourth round reassignment on the four groups of first elements according to a result of the combination calculation; the sixteen first elements in the intermediate variable are divided into four groups according to a second rule.

In this embodiment, the compression calculation unit uses the first and second parallel processing subunits to update the four groups of first elements in the intermediate variable in parallel, which increases the calculation speed of Blake2 and reduces its calculation time compared with the conventional implementation. When a blockchain system generates a new block, Equihash uses Blake2 to generate the original data that is then solved to complete the proof of work, and generating 2 solutions requires about one million runs of Blake2; therefore even a slight increase in the calculation speed of Blake2 can greatly improve the efficiency of completing the proof of work and speed up the generation of new blocks, which can bring considerable benefits.

In this embodiment, the input block data targeted by the first parallel processing subunit 11 is different from that targeted by the second parallel processing subunit 12.

In this embodiment, four groups of first elements divided according to the first rule are:

first elements with index values of 0, 4, 8, 12, first elements with index values of 1, 5, 9, 13, first elements with index values of 2, 6, 10, 14, first elements with index values of 3, 7, 11, 15;

the four groups of first elements divided according to the second rule are respectively:

first elements with index values of 0, 5, 10, 15, first elements with index values of 1, 6, 11, 12, first elements with index values of 2, 7, 8, 13, first elements with index values of 3, 4, 9, 14.

In an exemplary embodiment, as shown in fig. 1, the first parallel processing subunit 11 may include, connected in sequence: a first function module 111 and a second function module 112;

the first function module 111 includes four first processing modules for performing a first combination calculation in parallel, each of the first processing modules is configured to perform a first combination calculation on a group of first elements and an input second element, and perform reassignment on the group of first elements according to a result of the first combination calculation;

the second function module 112 includes four second processing modules for performing second combination calculation in parallel, each of the second processing modules is respectively configured to perform second combination calculation on a group of first elements and an input second element, and perform reassignment on the group of first elements according to a result of the second combination calculation;

the four groups of first elements aimed at by the first and second function modules are obtained by dividing the intermediate variables according to a first rule; the second elements input by the first and second function modules are different from each other.

In an alternative of this embodiment, as shown in fig. 1, the second parallel processing subunit 12 may include, connected in sequence after the first parallel processing subunit 11: a mapping module 121, a first function module 122, a second function module 123, and an inverse mapping module 124;

the input end of the mapping module 121 may be connected to the output end of the second function module 112, and the mapping module 121 is configured to assign the value of each first element to the first element onto which it is mapped according to a preset mapping relationship;

the inverse mapping module 124 is configured to assign the value of each first element back to the first element that was mapped onto it according to the preset mapping relationship; the output end of the inverse mapping module 124 outputs the intermediate variable, i.e., the output result of one round of compression calculation.

The second elements input by the first and second function modules are different from each other and different from the second elements input by the first and second function modules in the first parallel processing subunit 11.

Wherein the first function module 122 and the second function module 123 may reuse the first function module 111 and the second function module 112 respectively; alternatively, the first parallel processing subunit 11 and the second parallel processing subunit 12 may each use their own first and second function modules.

In this alternative, the process of one round of compression calculation may include: v and m are input into the first function module 111 for the first round of updating; m and the v output by the first function module 111 are input into the second function module 112 for the second round of updating; the v output by the second function module 112 is input into the mapping module 121 for the forward transformation of v; m and the v output by the mapping module 121 are input into the first function module 122 for the third round of updating; m and the v output by the first function module 122 are input into the second function module 123 for the fourth round of updating; the v output by the second function module 123 is input into the inverse mapping module 124 for the inverse transformation of v; and the v output by the inverse mapping module 124 is taken as the output of the current round of compression calculation.

It can be seen that the alternative is a pipeline type of operation, and each module outputs the result to the next module after completing its own processing; when the first and second function modules are used for processing, a parallel processing mode is adopted, and 4 first and second processing modules can be respectively adopted for parallel combined calculation.

In this alternative, the assigning, by the mapping module, values of different first elements to the first element to which the first element is mapped according to a preset mapping relationship may include:

the mapping module assigns the values of the first elements with index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13 and 14 to the first elements with index values of 0 to 15 respectively;

in this alternative, the assigning, by the inverse mapping module, of the value of each first element back to the first element that was mapped onto it according to the preset mapping relationship may include:

the inverse mapping module assigns values of the first elements having index values of 0 to 15 to the first elements having index values of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14, respectively.
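
The mapping and inverse mapping described above amount to a fixed permutation of the 16 elements and its inverse. A minimal sketch, with hypothetical names PERM, map_v and inverse_map_v, showing that the inverse mapping restores the original order:

```python
# Source index for each destination index 0..15, as listed above.
PERM = [0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14]


def map_v(v):
    """Mapping module: element i receives the value of element PERM[i]."""
    return [v[src] for src in PERM]


def inverse_map_v(v):
    """Inverse mapping module: element PERM[i] receives the value of element i."""
    out = [None] * 16
    for i, dst in enumerate(PERM):
        out[dst] = v[i]
    return out


v = [f"v{i}" for i in range(16)]
mapped = map_v(v)
restored = inverse_map_v(mapped)
```

In hardware such a permutation is a fixed rewiring of the 16 words rather than a computation, which is why it adds little cost to the pipeline.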

The embodiment of the application provides an application-specific integrated circuit chip for realizing Blake2, and the main difference between the application-specific integrated circuit chip and the related technology is that a first parallel processing subunit and a second parallel processing subunit are adopted in each round of compression calculation to realize parallel updating of a first element in an intermediate variable; illustratively, newly designed first and second function modules are adopted in the first and second parallel processing subunits to perform combined calculation and updating on 4 groups of first elements in parallel.

The embodiment of the application also provides a method for realizing Blake2, which comprises the following steps: performing a predetermined number of rounds of compression calculations, each round of compression calculation as shown in fig. 2, comprising steps S210-S220:

s210, carrying out parallel combined calculation on 4 groups of v elements contained in the intermediate variable v and corresponding input block data, and respectively carrying out first and second round reassignment on the 4 groups of v elements according to the result of the combined calculation; wherein, 16 v elements in the intermediate variable v are divided into 4 groups according to a first rule;

s220, parallelly carrying out combined calculation on 4 groups of v elements contained in the intermediate variable v and corresponding input block data, and respectively carrying out third and fourth rounds of reassignment on the 4 groups of v elements according to the result of the combined calculation; wherein, 16 v elements in the intermediate variable v are divided into 4 groups according to a second rule.

In this embodiment, in the process of compression calculation, an implementation manner of parallel update of 4 groups of v elements is adopted, so that the processing speed of each round of compression calculation is increased, and further the processing speed of Blake2 is increased.

In this embodiment, the steps S210 and S220 may be implemented by, but not limited to, an ASIC (Application Specific Integrated Circuit) chip provided in the above embodiment, or implemented by running a program. When implemented by an ASIC, the first parallel processing subunit 11 may be used to perform step S210, and the second parallel processing subunit 12 may be used to perform step S220. The first and second parallel processing subunits can perform 4 processing procedures in parallel, and each processing procedure respectively performs combined calculation on 1 group of v elements and corresponding input block data. When the method is implemented by running a program, 4 processor cores can be adopted, 4 threads can be run in parallel, and each thread performs combined calculation on 1 group of v elements and corresponding input block data. In steps S210 and S220, when performing the combination calculation, a v element may be read from an intermediate variable v stored in a first storage area of the memory, and input block data may be read from a second storage area of the memory; the v element may be written to the first memory region after reassigning the v element.

In this embodiment, steps S210 and S220 adopt different m elements in the input block data to participate in the combination calculation.

In this embodiment, for Blake2B, the predetermined number of rounds is 12 rounds; for Blake2S, the predetermined number of rounds is 10 rounds; when implementing Blake2B and Blake2S, the parameters used in the combined calculations will also differ accordingly.

In this embodiment, before performing the predetermined number of rounds of compression calculation, the method for implementing Blake2 may further include: calculating the h value of the first block data and storing it in a third storage area of a memory; and obtaining an initial intermediate variable v according to the h value and storing it in the first storage area. After the predetermined number of rounds of compression calculation, the method may further include: calculating, from the intermediate variable v stored in the first storage area and the h value stored in the third storage area, two 200-bit data, and storing them in a fourth storage area of the memory as the output of Blake2.

In an implementation manner of this embodiment, step S210 may include:

inputting input parameters into a first function module, wherein the input parameters comprise intermediate variables and 4 m elements in input block data; the first function module executes 4 first sub-processes in parallel, each first sub-process carries out first combination calculation on 1 group of v elements and 1 m element serving as input parameters respectively, and carries out first round reassignment on the group of v elements according to the result of the first combination calculation;

inputting input parameters into a second function module, wherein the input parameters comprise the intermediate variable after the first round of reassignment and 4 m elements in the input block data; the second function module executes 4 second sub-processes in parallel, each second sub-process performs a second combination calculation on 1 group of v elements after the first round of reassignment and 1 m element serving as an input parameter, and performs the second round of reassignment on the group of v elements according to the result of the second combination calculation;

the groups of v elements corresponding to the 4 first and second sub-processes are obtained by dividing the 16 v elements of the intermediate variable v according to the first rule; the m elements in the input parameters of the first and second function modules are different.

In this embodiment, on the basis of performing the updating of the 4 groups of v elements in parallel, the first and second round of updating of the 4 groups of v elements divided according to the first rule are completed by using the two functions in sequence, and compared with the method of using the G function 4 times in the conventional scheme, the number of times of using the functions can be reduced, the number of times of reading and writing the intermediate variable v is reduced, and the processing time is further reduced.

In this embodiment, the first function module and the second function module may respectively include 4 processing modules, and each of the first/second sub-processes is implemented by using 1 processing module; compared with the traditional implementation scheme, the scheme can reduce the number of modules, thereby reducing the area of a chip and saving resources.

In an alternative of this embodiment, the first and second function modules may be implemented by a program as a first function and a second function, and using the first and second function modules means calling the first and second functions; compared with the conventional implementation, this reduces the number of function calls.

In this embodiment, m elements input to the first function and the second function are different; the number of bits of cyclic shift in the first combining calculation and the second combining calculation is different.

In an alternative of this embodiment, a single function may be used to complete the first and second round updates of the 4 groups of v elements in parallel. Although this scheme needs only one function, the 16 v elements must be reassigned twice each time the function is used, which places a high demand on storage capacity and thus increases cost. In another alternative of this embodiment, four or eight functions may be used in succession, each function performing in parallel, for the 4 groups, half of the first or second round update or the update of a single v element; this requires using functions many times and reading and writing the intermediate variable v many times, which lengthens the processing time and weakens the advantage of parallel processing. In contrast, the two-function implementation of this embodiment balances cost and processing time.

In this embodiment, step S220 may include:

the mapping module assigns, in the intermediate variable after the second round of reassignment, the value of each v element to the v element onto which it is mapped according to a preset mapping relationship;

inputting input parameters into the first function module, wherein the input parameters comprise the intermediate variable after the second round of reassignment and the mapping, and 4 m elements in the input block data; the first function module executes 4 first sub-processes in parallel, each first sub-process performs the first combination calculation on 1 group of v elements after the second round of reassignment and 1 m element serving as an input parameter, and performs the third round of reassignment on the group of v elements according to the result of the first combination calculation;

inputting input parameters into the second function module, wherein the input parameters comprise the intermediate variable after the third round of reassignment and 4 m elements in the input block data; the second function module executes 4 second sub-processes in parallel, each second sub-process performs the second combination calculation on 1 group of v elements after the third round of reassignment and 1 m element serving as an input parameter, and performs the fourth round of reassignment on the group of v elements according to the result of the second combination calculation;

the inverse mapping module assigns, in the intermediate variable after the fourth round of reassignment, the value of each v element back to the v element that was mapped onto it according to the preset mapping relationship.

In this embodiment, the first and second function modules used in the first and second rounds of reassignment may be reused in the third and fourth rounds of reassignment, or additional first and second function modules may be used.

In this embodiment, m elements input when the first function and the second function are used are different from each other, and are different from m elements input when the first function and the second function are used in the first and second rounds of reassignment.

In this embodiment, when the first/second function is used for the first time or used again, 4 processor cores may be adopted, 4 threads are run in parallel, each thread executes a first/second subprocess, and when the first/second subprocess is executed, the v elements participating in the combination calculation are read from the first storage area, and the v elements after reassignment are written into the first storage area.

In this embodiment, since each group of v elements needs to be divided according to the second rule in the third and fourth updates of v elements, and the v element groups corresponding to the first and second sub-processes are divided according to the first rule, in order to implement the third and fourth updates using the first and second functions without changing the functions, the values may be assigned according to a preset mapping relationship, so that the values of the 4 v elements included in each of the 4 groups divided according to the first rule after being assigned are the same as the values of the 4 v elements included in each of the 4 groups divided according to the second rule before being assigned.

For example, the first group divided according to the first rule is (v0, v4, v8, v12), and the first group divided according to the second rule is (v0, v5, v10, v15). According to the preset mapping relationship, the value of v5 is assigned to v4, which is equivalent to moving the v element originally at index 5 to index 4; similarly, the value of v10 is assigned to v8 and the value of v15 is assigned to v12. Thus, after the assignment according to the mapping relationship, when the first/second sub-process extracts v0, v4, v8 and v12 to participate in the combination calculation, it actually extracts the values that v0, v5, v10 and v15 held before the assignment; the other groups are handled in the same way. Therefore, with the first and second function modules kept unchanged, the groups of v elements divided according to the second rule can undergo the combination calculation and reassignment without introducing a new function module or modifying the input parameters of the first and second function modules.
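
The equivalence described in this example can be checked for all four groups with a short sketch. The names FIRST_RULE, SECOND_RULE, PERM and groups_after_mapping are hypothetical; the index lists are those given in this description.

```python
FIRST_RULE = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
SECOND_RULE = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]
PERM = [0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14]


def groups_after_mapping(v):
    """Apply the mapping, then read out the 4 groups using the first rule."""
    mapped = [v[src] for src in PERM]
    return [[mapped[i] for i in group] for group in FIRST_RULE]


v = list(range(100, 116))  # distinct placeholder values for v0..v15
via_mapping = groups_after_mapping(v)
via_second_rule = [[v[i] for i in group] for group in SECOND_RULE]
```

Reading first-rule groups from the mapped vector yields exactly the second-rule groups of the unmapped vector, which is why the same function modules can serve both halves of a round.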

In this embodiment, after the third and fourth round updates of the v elements are completed using the first and second function modules, the value of each v element is assigned back to the v element that was mapped onto it, which amounts to a restoration, i.e. an inverse mapping. For example, v5 is mapped to v4: the value of v5 is first assigned to v4 according to the preset mapping relationship; after the first and second function modules have performed the third and fourth round updates, the value of v4 (namely the value that v5 held before the mapping, now updated by the third and fourth rounds) is assigned back to v5, which restores v5.

In this embodiment, the preset mapping relationship may be:

values of v elements with indices of 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14 are respectively assigned to v elements with indices of 0 to 15.

Accordingly, when restoring, the values of the v elements with indices 0 to 15 are respectively assigned to the v elements with indices 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14.

In an alternative of this embodiment, the third and fourth round updates of the v elements may be completed using a third and a fourth function, in which the v element groups corresponding to the 4 sub-processes are obtained by dividing the intermediate variable v according to the second rule. However, this requires constructing more functions, which is cumbersome. In another alternative of this embodiment, the indices of the 4 v elements corresponding to each of the first and second sub-processes may be specified in the input parameters of the first and second functions; but since 4 sub-processes and 16 v elements are involved, the first and second functions would have too many input parameters. In this embodiment, the values are assigned according to the mapping relationship and mapped back after the first and second function modules have been used, so only two function modules are needed in the whole compression calculation process and the input parameters are simple; the compression calculation can therefore be implemented very conveniently and a complicated design is avoided.

The method of implementing Blake2 of the present application embodiment is described below with an example.

In this example, the implementation steps other than the compression calculation may be the same as those of the conventional scheme; the difference is that two new functions, G1 and G2, are used to complete the compression calculation process when implementing Blake2B. The implementation of Blake2S is similar and is not described in detail herein. The function G1 may be implemented by the first function module in the ASIC chip, and the function G2 may be implemented by the second function module in the ASIC chip.

Wherein the first function G1 can be represented as a code of the form:

Function G1(v, x1, x2, x3, x4)

Return(v)

End function

Here, a first sub-process corresponds to 1 of the 4 columns of code separated by dividing lines. The 4 columns of code can be regarded as representing 4 first sub-processes executed in parallel in the G1 function, each of which may be implemented by a first processing module. Each column of code performs the first combination calculation on one group of v elements and the corresponding input block data, and updates that group of v elements. The 4 groups of v elements are the following 4 groups, divided according to the first rule:

(v0, v4, v8, v12), (v1, v5, v9, v13), (v2, v6, v10, v14), (v3, v7, v11, v15).
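Under some assumptions, the G1 function described above can be sketched in Python as follows. The operations and the rotation amounts 32 and 24 are taken from the first half of the G function of Blake2B in RFC 7693, not from this text; the names `g1`, `rotr64`, and `MASK64` are illustrative only. In hardware the four columns would run in parallel; the loop below only models their combined effect.

```python
MASK64 = (1 << 64) - 1  # Blake2B operates on 64-bit words


def rotr64(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def g1(v, x1, x2, x3, x4):
    """Sketch of G1: first half of the Blake2B G function on four columns.

    Rotation amounts (32, 24) are assumed from RFC 7693.
    Returns a new list; the input list is left unchanged.
    """
    v = list(v)
    for col, x in enumerate((x1, x2, x3, x4)):
        # First-rule group for this column: (v[col], v[col+4], v[col+8], v[col+12])
        a, b, c, d = col, col + 4, col + 8, col + 12
        v[a] = (v[a] + v[b] + x) & MASK64
        v[d] = rotr64(v[d] ^ v[a], 32)
        v[c] = (v[c] + v[d]) & MASK64
        v[b] = rotr64(v[b] ^ v[c], 24)
    return v


# Usage: all-zero state, only the first column receives a non-zero input word.
state = [0] * 16
updated = g1(state, 1, 0, 0, 0)
```

Because each column touches a disjoint group of four v elements, the four iterations are independent and can be mapped onto four processing modules.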

In this example, the second function G2 may be represented as code of the following form:

Function G2(v, x1, x2, x3, x4)

Return(v)

End function

Here, a second sub-process corresponds to 1 of the 4 columns of code separated by dividing lines. The 4 columns of code can be regarded as representing 4 second sub-processes executed in parallel in the G2 function, each of which may be implemented by a second processing module. Each column of code performs the second combination calculation on one group of v elements and the corresponding input block data, and updates that group of v elements. The 4 groups of v elements are the 4 groups divided according to the first rule.

It can be seen from the above codes that the main difference between the first and second combined calculations is the difference in the number of bits that are circularly right-shifted.
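To illustrate that the two functions differ only in their rotation amounts, the following Python sketch expresses both through one parameterized helper. The helper `half_g` and the rotation amounts (32 and 24 for G1, 16 and 63 for G2) are assumptions based on the Blake2B G function in RFC 7693, not code from this text.

```python
MASK64 = (1 << 64) - 1  # Blake2B operates on 64-bit words


def rotr64(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def half_g(v, xs, rot_d, rot_b):
    """Apply one half of G to the four first-rule groups; returns a new list."""
    v = list(v)
    for col, x in enumerate(xs):
        a, b, c, d = col, col + 4, col + 8, col + 12
        v[a] = (v[a] + v[b] + x) & MASK64
        v[d] = rotr64(v[d] ^ v[a], rot_d)
        v[c] = (v[c] + v[d]) & MASK64
        v[b] = rotr64(v[b] ^ v[c], rot_b)
    return v


def g1(v, x1, x2, x3, x4):
    """First combination calculation (rotations assumed: 32, 24)."""
    return half_g(v, (x1, x2, x3, x4), 32, 24)


def g2(v, x1, x2, x3, x4):
    """Second combination calculation (rotations assumed: 16, 63)."""
    return half_g(v, (x1, x2, x3, x4), 16, 63)


# Usage: same input, different rotation amounts.
out1 = g1([0] * 16, 1, 0, 0, 0)
out2 = g2([0] * 16, 1, 0, 0, 0)
```

Sharing one datapath shape between the two modules, with only the rotation wiring changed, is one plausible reason the design needs just two function modules.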

In this example, one round of the compression calculation is shown in fig. 3 and includes the following steps 301 to 306:

301. Using the G1 function; the 4 m elements input to the G1 function are the elements of the input block data m with indices s[0], s[2], s[4], and s[6]; that is, the input parameters x1, x2, x3, and x4 of the G1 function are m[s[0]], m[s[2]], m[s[4]], and m[s[6]], respectively.

s[0..15] may be derived from SIGMA and the current round i of the compression calculation, for example s[0..15] := SIGMA[i mod 10][0..15].

302. Using the G2 function; the 4 m elements input to the G2 function are the elements of the input block data m with indices s[1], s[3], s[5], and s[7]; that is, the input parameters x1, x2, x3, and x4 of the G2 function are m[s[1]], m[s[3]], m[s[5]], and m[s[7]], respectively.

In this step, the v elements input to the G2 function are the v elements updated in step 301.

Wherein the G2 function performs 4 second sub-processes in parallel for 4 groups of v elements and 4 input m elements; each second sub-process performs a second combination calculation on a set of v elements and the input one of m elements, respectively, and updates the set of v elements according to the result of the second combination calculation.

Steps 301 and 302 can be regarded as the first and second round updates of the 16 v elements in the intermediate variable, respectively. G1 can be regarded as performing the first half of the update process of the G function in parallel for the 4 groups of v elements divided according to the first rule, and G2 as performing the second half of the update process of the G function in parallel for the same 4 groups. Together, G1 and G2 complete the processing performed by the first 4 uses of the G function in one round of compression calculation in the related art. Only 2 function calls are needed, and each call updates 4 groups of v elements in parallel, which speeds up the first and second round updates.
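The claim that steps 301 and 302 together match the first four uses of the G function can be checked with a small Python sketch. It assumes the Blake2B G function and rotation amounts from RFC 7693; the helper names (`g`, `half_g`, `columns_with_g`, `columns_with_g1_g2`) and the test values are illustrative only.

```python
MASK64 = (1 << 64) - 1  # Blake2B operates on 64-bit words


def rotr64(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def g(v, a, b, c, d, x, y):
    """Full Blake2B G function as in RFC 7693 (updates v in place)."""
    v[a] = (v[a] + v[b] + x) & MASK64
    v[d] = rotr64(v[d] ^ v[a], 32)
    v[c] = (v[c] + v[d]) & MASK64
    v[b] = rotr64(v[b] ^ v[c], 24)
    v[a] = (v[a] + v[b] + y) & MASK64
    v[d] = rotr64(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK64
    v[b] = rotr64(v[b] ^ v[c], 63)


def half_g(v, xs, rot_d, rot_b):
    """One half of G on the four first-rule groups (updates v in place)."""
    for col, x in enumerate(xs):
        a, b, c, d = col, col + 4, col + 8, col + 12
        v[a] = (v[a] + v[b] + x) & MASK64
        v[d] = rotr64(v[d] ^ v[a], rot_d)
        v[c] = (v[c] + v[d]) & MASK64
        v[b] = rotr64(v[b] ^ v[c], rot_b)


def columns_with_g(v, m, s):
    """Related-art style: four sequential uses of the full G function."""
    v = list(v)
    for col in range(4):
        g(v, col, col + 4, col + 8, col + 12, m[s[2 * col]], m[s[2 * col + 1]])
    return v


def columns_with_g1_g2(v, m, s):
    """Steps 301 and 302: one use of G1 followed by one use of G2."""
    v = list(v)
    half_g(v, [m[s[k]] for k in (0, 2, 4, 6)], 32, 24)  # step 301 (G1)
    half_g(v, [m[s[k]] for k in (1, 3, 5, 7)], 16, 63)  # step 302 (G2)
    return v


# s for round i = 1, i.e. SIGMA[1] as listed in RFC 7693.
S1 = (14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3)
v0 = [(k * 0x9E3779B97F4A7C15) & MASK64 for k in range(16)]  # arbitrary state
m0 = [k * k + 1 for k in range(16)]                          # arbitrary block
same = columns_with_g(v0, m0, S1) == columns_with_g1_g2(v0, m0, S1)
```

The two orderings agree because the four groups are disjoint, so interleaving the halves across groups does not change any group's result.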

303. The intermediate variable v is forward transformed.

Step 303 may specifically include: assigning the value of each v element to the v element it is mapped to according to a preset mapping relation. For example, if v5 is mapped to v4, then v4 is assigned the value of v5 (i.e., the value of the v element with index 5 is assigned to the v element with index 4; equivalently, v5 is moved to the position with index 4).

In this step, the values of the v elements originally at indices 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, and 14 are respectively assigned to the v elements with indices 0 to 15.

The v elements before and after the forward transformation are shown in Table 1.

TABLE 1. Forward transformation of v

Before forward transformation    After forward transformation
v0 v0
v1 v1
v2 v2
v3 v3
v5 v4
v6 v5
v7 v6
v4 v7
v10 v8
v11 v9
v8 v10
v9 v11
v15 v12
v12 v13
v13 v14
v14 v15
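The forward transformation of Table 1 can be sketched in Python as a gather over a fixed index list. The names `FORWARD` and `forward_transform` are illustrative. The sketch also shows that, after the transformation, the first-rule groups (k, k+4, k+8, k+12) hold the diagonal groups used by the last four G calls of a standard Blake2 round, which is presumably what the second rule describes.

```python
# Source index for each destination index 0..15, as listed in Table 1.
FORWARD = (0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14)


def forward_transform(v):
    """Return a new list where element k holds the old value of v[FORWARD[k]]."""
    return [v[src] for src in FORWARD]


# Usage: label each element with its original index, then transform.
labels = list(range(16))
moved = forward_transform(labels)

# First-rule groups of the transformed vector, expressed in original indices.
groups = [tuple(moved[col + 4 * row] for row in range(4)) for col in range(4)]
```

With this rearrangement, G1 and G2 can keep addressing the fixed positions (k, k+4, k+8, k+12) while actually operating on the diagonal groups.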

304. Using the G1 function; the 4 m elements input to the G1 function are the elements of the input block data m with indices s[8], s[10], s[12], and s[14]; that is, the input parameters x1, x2, x3, and x4 of the G1 function are m[s[8]], m[s[10]], m[s[12]], and m[s[14]], respectively.

In this step, the v elements input to the G1 function are the v elements that were updated in the two rounds of steps 301 and 302 and then transformed in step 303.

305. Using the G2 function; the 4 m elements input to the G2 function are the elements of the input block data m with indices s[9], s[11], s[13], and s[15]; that is, the input parameters x1, x2, x3, and x4 of the G2 function are m[s[9]], m[s[11]], m[s[13]], and m[s[15]], respectively.

In this step, the v elements input to the G2 function are the v elements that were updated in the two rounds of steps 301 and 302, forward transformed in step 303, and then updated again in step 304.

Steps 304 and 305 can be regarded as the third and fourth round updates of the 16 v elements in the intermediate variable, respectively. G1 can be regarded as performing the first half of the update process of the G function in parallel for the 4 groups of v elements divided according to the second rule, and G2 as performing the second half of the update process of the G function in parallel for the same 4 groups. Together, G1 and G2 complete the processing performed by the last 4 uses of the G function in one round of compression calculation in the related art. Only 2 function calls are needed, and each call updates 4 groups of v elements in parallel, which speeds up the third and fourth round updates.

306. The intermediate variable v is inverse transformed.

Step 306 may specifically include: assigning the value of each v element to the v element it is mapped to according to a preset mapping relation. Step 306 can be regarded as the reverse of step 303: it undoes the reassignment made in step 303. For example, if v5 is mapped to v4 and the value of v5 is assigned to v4 in step 303, then in step 306 the value of v4 is assigned to v5 (i.e., the value of the v element with index 4 is copied to the v element with index 5; equivalently, v4 is moved to the position with index 5). This puts the element that originated at v5 back at v5. Note, however, that the value restored to v5 differs from its value at step 303, because it has been updated in the two rounds of steps 304 and 305 in between.

For example, if the forward transformation in step 303 is performed according to Table 1, then in step 306 the values of the v elements with indices 0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, and 12 before the inverse transformation are respectively assigned to the v elements with indices 0 to 15 after the inverse transformation; equivalently, the values of the v elements with indices 0 to 15 before the inverse transformation are respectively assigned to the v elements with indices 0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, and 14 after the inverse transformation.

The v elements before and after the inverse transformation are shown in Table 2.

TABLE 2. Inverse transformation of v

Before inverse transformation    After inverse transformation
v0 v0
v1 v1
v2 v2
v3 v3
v7 v4
v4 v5
v5 v6
v6 v7
v10 v8
v11 v9
v8 v10
v9 v11
v13 v12
v14 v13
v15 v14
v12 v15
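The relation between Table 1 and Table 2 can be sketched in Python by deriving the inverse index list from the forward one and checking the round trip. The names `FORWARD`, `INVERSE`, `invert`, and `apply_mapping` are illustrative.

```python
# Source index for each destination index 0..15, as listed in Table 1.
FORWARD = (0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14)


def invert(perm):
    """Return the index list that undoes a gather by perm."""
    inverse = [0] * len(perm)
    for dst, src in enumerate(perm):
        # Forward moved old[src] to position dst, so restoring position src
        # must read from position dst.
        inverse[src] = dst
    return tuple(inverse)


def apply_mapping(perm, v):
    """Gather: element k of the result is v[perm[k]]."""
    return [v[src] for src in perm]


INVERSE = invert(FORWARD)

# Usage: a forward transformation followed by the inverse restores the order.
original = [f"v{k}" for k in range(16)]
restored = apply_mapping(INVERSE, apply_mapping(FORWARD, original))
```

The derived `INVERSE` list reproduces the left column of Table 2, so the inverse transformation needs no independent definition beyond the forward mapping.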

In this example, each first sub-process in G1 and each second sub-process in G2 may include the following operations:

performing a combination calculation on the first v element in the group, the second v element in the group, and the input m element, and reassigning the first v element according to the result; and reassigning each of the other three v elements in the group according to the result of a combination calculation on that v element and another v element in the group: the second v element is combined with the third, the third with the fourth, and the fourth with the first.

The first and second subprocesses may differ in the amount of shift when performing cyclic shift in the combined calculation.

In this example, the rounds of the compression calculation in Blake2B can be represented by the following code:

FOR i = 0 TO 11 DO

    s[0..15] := SIGMA[i mod 10][0..15]

    G1(v, m[s[0]], m[s[2]], m[s[4]], m[s[6]])

    G2(v, m[s[1]], m[s[3]], m[s[5]], m[s[7]])

    vc[0..15] := {v0, v1, v2, v3, v5, v6, v7, v4, v10, v11, v8, v9, v15, v12, v13, v14}

    v[0..15] := vc[0..15]

    G1(v, m[s[8]], m[s[10]], m[s[12]], m[s[14]])

    G2(v, m[s[9]], m[s[11]], m[s[13]], m[s[15]])

    vc[0..15] := {v0, v1, v2, v3, v7, v4, v5, v6, v10, v11, v8, v9, v13, v14, v15, v12}

    v[0..15] := vc[0..15]

END FOR
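As a plausibility check of the loop above, the following Python sketch builds a single-block Blake2B hash from it and compares the result with the standard library. The constants (IV, SIGMA), the rotation amounts, and the initialization and finalization steps are taken from RFC 7693, not from this text; all function names are illustrative, and only unkeyed messages of at most 128 bytes with a 64-byte digest are handled.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1
IV = (0x6A09E667F3BCC908, 0xBB67AE8584CAA73B, 0x3C6EF372FE94F82B,
      0xA54FF53A5F1D36F1, 0x510E527FADE682D1, 0x9B05688C2B3E6C1F,
      0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179)
SIGMA = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
    (14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3),
    (11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4),
    (7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8),
    (9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13),
    (2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9),
    (12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11),
    (13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10),
    (6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5),
    (10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0),
)
FORWARD = (0, 1, 2, 3, 5, 6, 7, 4, 10, 11, 8, 9, 15, 12, 13, 14)  # Table 1
INVERSE = (0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12)  # Table 2


def rotr64(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def half_g(v, xs, rot_d, rot_b):
    """G1 (rotations 32, 24) or G2 (rotations 16, 63) on the four columns."""
    for col, x in enumerate(xs):
        a, b, c, d = col, col + 4, col + 8, col + 12
        v[a] = (v[a] + v[b] + x) & MASK64
        v[d] = rotr64(v[d] ^ v[a], rot_d)
        v[c] = (v[c] + v[d]) & MASK64
        v[b] = rotr64(v[b] ^ v[c], rot_b)


def compress(h, block, t, last):
    """Blake2B compression using steps 301 to 306 for each of the 12 rounds."""
    m = struct.unpack("<16Q", block)
    v = list(h) + list(IV)
    v[12] ^= t & MASK64
    v[13] ^= (t >> 64) & MASK64
    if last:
        v[14] ^= MASK64
    for i in range(12):
        s = SIGMA[i % 10]
        half_g(v, [m[s[k]] for k in (0, 2, 4, 6)], 32, 24)     # step 301
        half_g(v, [m[s[k]] for k in (1, 3, 5, 7)], 16, 63)     # step 302
        v = [v[j] for j in FORWARD]                            # step 303
        half_g(v, [m[s[k]] for k in (8, 10, 12, 14)], 32, 24)  # step 304
        half_g(v, [m[s[k]] for k in (9, 11, 13, 15)], 16, 63)  # step 305
        v = [v[j] for j in INVERSE]                            # step 306
    return [h[k] ^ v[k] ^ v[k + 8] for k in range(8)]


def blake2b_single_block(data):
    """Unkeyed Blake2B-512 of a message that fits in one 128-byte block."""
    if len(data) > 128:
        raise ValueError("this sketch handles at most one block")
    h = list(IV)
    h[0] ^= 0x01010040  # parameter block: 64-byte digest, no key
    h = compress(h, data.ljust(128, b"\x00"), len(data), True)
    return struct.pack("<8Q", *h)


matches = blake2b_single_block(b"abc") == hashlib.blake2b(b"abc").digest()
```

Agreement with `hashlib.blake2b` suggests that the G1/G2 split with forward and inverse transformations is functionally equivalent to the conventional eight G calls per round.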

An embodiment of the present application further provides a block chain system, including: a plurality of nodes;

each of the nodes includes the application specific integrated circuit chip implementing Blake2 according to the above embodiment.

An embodiment of the present application further provides a block generation method in a block chain system, as shown in fig. 4, including steps 401 to 403:

401. Converting the input conditions into a problem to be solved; the input conditions include: a block header and preset parameters of the previous block;

402. Performing Blake2 a predetermined number of times according to the method for implementing Blake2 of any one of the above embodiments, and solving the problem according to the raw data obtained from Blake2;

403. If the obtained solution meets the preset algorithm condition and difficulty condition, determining that the solution is successful and generating a new block.
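Steps 401 to 403 can be illustrated with a deliberately simplified Python sketch. The text does not specify the problem, the algorithm condition, or the difficulty condition, so everything below is an assumption: the problem is modeled as a nonce search, the difficulty condition as a numeric target on the digest, and `hashlib.blake2b` stands in for the chip. The names `build_problem` and `generate_block` are hypothetical.

```python
import hashlib


def build_problem(prev_header: bytes, params: bytes) -> bytes:
    """Step 401 (assumed form): derive a seed from the input conditions."""
    return hashlib.blake2b(prev_header + params).digest()


def generate_block(prev_header, params, difficulty_bits, max_attempts):
    """Steps 402 and 403 (assumed form): bounded search for a valid nonce."""
    seed = build_problem(prev_header, params)
    # Assumed difficulty condition: digest, read as an integer, below target.
    target = 1 << (512 - difficulty_bits)
    for nonce in range(max_attempts):  # "predetermined number of times"
        digest = hashlib.blake2b(seed + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            # Solution found: assemble a minimal stand-in for the new block.
            return {"prev_header": prev_header, "nonce": nonce, "digest": digest}
    return None  # no solution within the attempt budget


# Usage with a low difficulty so the search finishes quickly.
block = generate_block(b"previous-block-header", b"params", 8, 100000)
```

In a real system the problem would follow the chain's consensus rules, and the Blake2 calls would be the part offloaded to the ASIC chip.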

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media, as is known to those skilled in the art.
