Parallel operation acceleration system and operation method thereof

Document No. 135438. Publication date: 2021-10-22.

This technology, a parallel operation acceleration system and operation method thereof, was designed by 常子奇 and 赵旺 on 2021-07-20. The invention discloses a parallel operation acceleration system and an operation method thereof. The parallel operation acceleration system comprises an operand cache module, a result cache module, a control module and a calculation module. The control module comprises an access control unit, M-level parallel expression cache units, M-level parallel analysis units and a distribution unit, and the calculation module comprises N-level parallel calculation units. One end of the N-level parallel calculation units is connected to one end of the distribution unit, so that the calculation units receive the calculation operation information transmitted by the distribution unit; the other end of the N-level parallel calculation units is connected to each of the M-level parallel expression cache units. The invention distributes a calculation to the calculation units step by step, so that the largest mixed operation the acceleration system supports in a single pass is limited only by the cache space of the control module, and the calculation module transmits its data directly to the expression cache units, which improves the transmission efficiency of calculation results and thereby the operation efficiency of the acceleration system.

1. A parallel operation acceleration system, characterized in that the parallel operation acceleration system comprises an operand cache module, a result cache module, a control module and a calculation module; the control module comprises an access control unit, M-level parallel expression cache units, M-level parallel analysis units and a distribution unit; the calculation module comprises N-level parallel calculation units; one end of the N-level parallel calculation units in the calculation module is connected to one end of the distribution unit of the control module, so that the N-level parallel calculation units receive the calculation operation information transmitted by the distribution unit, and the other end of the N-level parallel calculation units in the calculation module is connected to each of the M-level parallel expression cache units of the control module; the operand cache module is used for caching a batch of operands of the calculation operations to be executed; the result cache module is used for caching expression calculation results; wherein M and N are integers greater than or equal to 2.

2. The parallel operation acceleration system according to claim 1, characterized in that the access control unit is configured to read operands to be calculated from the operand cache module, and is further configured to transmit the expression calculation results of completed calculation operations in the control module to the result cache module; the access control unit comprises M expression calculation result input terminals, M storage state signal input terminals, M operand output terminals, one expression calculation result output terminal, one operand input terminal and M+1 access state signal output terminals;

the M storage state signal input terminals are used by the access control unit to receive the storage state signals of the M-level parallel expression cache units respectively, and the access control unit judges, according to the storage state signal of each level of expression cache unit, whether to transmit a group of operands to that level of expression cache unit;

the M operand output terminals are used by the access control unit to transmit operands to one or more levels of the M-level parallel expression cache units;

the operand input terminal is used by the access control unit to read at least one group of operands to be calculated from the operand cache module;

the M expression calculation result input terminals are used by the access control unit to receive, respectively, the expression calculation results of completed calculation operations output by the M-level parallel expression cache units;

the expression calculation result output terminal is used by the access control unit to transmit the expression calculation result of a completed calculation operation to the result cache module;

and the M+1 access state signal output terminals are used by the access control unit to transmit access state signals to the M-level parallel expression cache units and to the operand cache module respectively.

3. The parallel operation acceleration system according to claim 2, characterized in that the M-level parallel expression cache units are M expression cache units arranged in parallel, no data and/or signals are transmitted between any two of the M expression cache units, and each level of expression cache unit is internally configured with the initial calculation expression of its level; each level of expression cache unit comprises an expression calculation result output terminal, a current calculation result input terminal, a storage state signal output terminal, an access state signal input terminal, an operand input terminal, a calculation operation information output terminal and a calculation completion signal input terminal;

the expression calculation result output terminal is connected to the corresponding expression calculation result input terminal of the access control unit and is used by the expression cache unit of that level to transmit the expression calculation result of a completed calculation operation to the access control unit;

the current calculation result input terminal is used by the expression cache unit of that level to receive the current calculation result output by the calculation module;

the storage state signal output terminal is connected to the corresponding storage state signal input terminal of the access control unit and is used by the expression cache unit of that level to transmit its storage state signal to the access control unit;

the access state signal input terminal is connected to the corresponding access state signal output terminal of the access control unit and is used by the expression cache unit of that level to receive the access state signal transmitted by the access control unit;

the operand input terminal is connected to the corresponding operand output terminal of the access control unit and is used by the expression cache unit of that level to receive the operands transmitted by the access control unit;

the operand output terminal is used by the expression cache unit of that level to transmit the operands of the calculation operation to be executed to the analysis unit at the same level;

the calculation completion signal input terminal is used by the expression cache unit of that level to receive the calculation completion signal transmitted by the analysis unit at the same level;

the calculation operation information comprises the operator and operands of the calculation currently to be executed and the level number of the expression cache unit of that level; the calculation completion signal is a signal indicating that all calculation operations contained in the calculation expression of the expression cache unit of that level have been executed.

4. The parallel operation acceleration system according to claim 3, characterized in that the M-level parallel analysis units are M analysis units arranged in parallel, no data and/or signals are transmitted between any two of the M analysis units, and each level of analysis unit organizes a data structure table according to the calculation expression configured inside the expression cache unit at the corresponding level and reads calculation operation information from that expression cache unit according to the data structure table; each level of analysis unit comprises an operand input terminal, a calculation completion signal output terminal, a calculation operation information output terminal and a calculation authorization signal input terminal;

the operand input terminal is connected to the operand output terminal of the expression cache unit at the same level and is used by the analysis unit of that level to read operands from the expression cache unit at the same level and to generate calculation operation information;

the calculation completion signal output terminal is connected to the calculation completion signal input terminal of the expression cache unit at the same level and is used by the analysis unit of that level to transmit a calculation completion signal to the expression cache unit at the same level;

the calculation authorization signal input terminal is used by the analysis unit of that level to receive the calculation authorization signal output by the distribution unit;

the calculation operation information output terminal is used by the analysis unit of that level to transmit the calculation operation information to the distribution unit;

the calculation authorization signal is a signal fed back by the distribution unit when the calculation operation information transmitted by the analysis unit of that level meets the condition of a target calculation unit, and indicates that the application for calculation resources has succeeded; the kth-level expression cache unit is the expression cache unit at the same level as the kth-level analysis unit, and k is an integer greater than 0 and less than or equal to M; meeting the condition of a target calculation unit means that a calculation unit is able to execute the calculation operation information and is in an idle state, in which case that calculation unit is determined to be the target calculation unit.

5. The parallel operation acceleration system according to claim 4, characterized in that the data structure table comprises:

a valid identifier column for identifying whether each row of the data structure table holds valid expression content;

a single/double identifier column for identifying whether the operator is a unary operator or a binary operator, and thereby determining the number of operands the operator requires;

an operator column for storing the calculation operation to be executed;

an operation identifier column for identifying whether each row of the data structure table holds an operand;

a data column for storing the operands of the calculation operation to be executed;

wherein the information in the valid identifier column, the single/double identifier column, the operation identifier column, the data column and the operator column is configured by the analysis unit according to the calculation expression configured in the expression cache unit of that level; and valid expression content refers to operands and operators.
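For illustration only, the five columns of the data structure table could be modelled in software as below. This is a minimal Python sketch, not the hardware layout of the invention: the names `TableRow`, `build_table` and `operands_needed`, the operator sets, the fixed table depth and the postfix token order are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

BINARY_OPS = {"+", "-", "*", "/"}  # assumed operator set, not from the source
UNARY_OPS = {"neg"}                # assumed operator set, not from the source


@dataclass
class TableRow:
    valid: bool = False             # valid identifier column
    is_binary: bool = False         # single/double identifier column
    operator: Optional[str] = None  # operator column
    has_operand: bool = False       # operation identifier column
    data: Optional[str] = None      # data column


def build_table(tokens: List[str], depth: int = 8) -> List[TableRow]:
    """Fill one row per token; rows beyond the expression stay invalid."""
    if len(tokens) > depth:
        raise ValueError("expression does not fit in the table")
    rows = [TableRow() for _ in range(depth)]
    for row, token in zip(rows, tokens):
        row.valid = True
        if token in BINARY_OPS or token in UNARY_OPS:
            row.operator = token
            row.is_binary = token in BINARY_OPS
        else:
            row.has_operand = True
            row.data = token
    return rows


def operands_needed(row: TableRow) -> int:
    """Operand count implied by the single/double identifier column."""
    if row.operator is None:
        raise ValueError("row holds no operator")
    return 2 if row.is_binary else 1


# Hypothetical expression neg(a + b), written in postfix order.
table = build_table(["a", "b", "+", "neg"], depth=6)
```

Here the first two rows hold operands, the third row holds a binary operator, the fourth a unary operator, and the remaining two rows stay invalid.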

6. The parallel operation acceleration system according to claim 4, characterized in that the N-level parallel calculation units are N calculation units operating in parallel, no data and/or signals are transmitted between any two of the N calculation units, and each level of calculation unit is configured to perform one calculation operation; each level of calculation unit comprises a calculation operation information input terminal, an idle state signal output terminal and M current calculation result output terminals;

the calculation operation information input terminal is used by the calculation unit of that level to receive the calculation operation information transmitted by the distribution unit;

the idle state signal output terminal is used by the calculation unit of that level to transmit an idle state signal to the distribution unit;

the M current calculation result output terminals are connected to the M current calculation result input terminals of the M-level expression cache units respectively, and are used by the calculation unit of that level to transmit the current calculation result to the expression cache unit whose level corresponds to the calculation operation information that produced the result;

wherein the idle state signal is a signal indicating whether the calculation unit of that level is in an idle state; the idle state is a state in which the calculation unit is able to receive the information of a calculation operation to be executed and to execute the corresponding calculation operation.
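As an illustration of how a current calculation result finds its way back, the sketch below routes a result to the expression cache unit named by the level number carried in the calculation operation information. It is a minimal Python model under stated assumptions: `OperationInfo`, `execute` and the list-of-lists standing in for the M current calculation result inputs are hypothetical, and levels are numbered from 0 here although the text numbers them from 1.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class OperationInfo:
    op: str                      # operator of the calculation to be executed
    operands: Tuple[float, ...]  # operands of the calculation to be executed
    level: int                   # level number of the originating expression cache unit


def execute(info: OperationInfo,
            fn: Callable[..., float],
            result_inputs: List[List[float]]) -> float:
    """Run one calculation and deliver the result to the originating level only."""
    result = fn(*info.operands)
    result_inputs[info.level].append(result)
    return result


# Three expression cache units (M = 3), modelled as three result inboxes.
result_inputs = [[] for _ in range(3)]
execute(OperationInfo("+", (2, 3), level=1), operator.add, result_inputs)
```

Because the level number travels with the request, the calculation unit needs no distribution step on the return path, which is the saving the text attributes to the direct connection.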

7. The parallel operation acceleration system according to claim 6, characterized in that the distribution unit is configured with the calculation operation executed by each level of calculation unit, and the distribution unit comprises M calculation operation information input terminals, M calculation authorization signal output terminals, N idle state signal input terminals and N calculation operation information output terminals;

the M calculation operation information input terminals are connected to the M calculation operation information output terminals of the M-level parallel analysis units in one-to-one correspondence, and are used by the distribution unit to receive the calculation operation information transmitted by the M-level analysis units respectively;

the M calculation authorization signal output terminals are connected to the M calculation authorization signal input terminals of the M-level parallel analysis units in one-to-one correspondence, and are used by the distribution unit to transmit a calculation authorization signal to the analysis unit whose level corresponds to the calculation operation information;

the N idle state signal input terminals are connected to the N idle state signal output terminals of the N-level parallel calculation units in one-to-one correspondence, and are used by the distribution unit to receive the N idle state signals transmitted by the N-level calculation units respectively;

the N calculation operation information output terminals are connected to the N calculation operation information input terminals of the N-level parallel calculation units in one-to-one correspondence, and are used by the distribution unit to transmit the calculation operation information to the target calculation unit;

wherein the target calculation unit is a calculation unit that is able to execute the calculation operation information and is in an idle state.

8. A method for operating the parallel operation acceleration system according to any one of claims 1 to 7, the method comprising:

step 1: when the parallel operation acceleration system is started, each level of analysis unit reads the initial calculation expression of its level from the expression cache unit of the same level, and organizes the data structure table of its level according to that calculation expression;

step 2: the access control unit judges whether any of the M-level parallel expression cache units is in a to-be-stored state; if the kth-level expression cache unit is in the to-be-stored state, the access control unit reads a group of operands from the operand cache module and transmits it to the kth-level expression cache unit, and the kth-level expression cache unit, after receiving the group of operands transmitted by the access control unit, changes from the to-be-stored state to the stored state;

step 3: the kth-level analysis unit judges, according to the kth-level data structure table, whether a calculation operation currently to be executed exists; if so, the method proceeds to step 4; if not, the kth-level analysis unit transmits a calculation completion signal to the kth-level expression cache unit, the kth-level expression cache unit, after receiving the calculation completion signal, transmits the most recently cached current calculation result to the access control unit as the expression calculation result of the completed calculation operation, and the access control unit transmits that expression calculation result to the result cache module for caching;

step 4: the kth-level analysis unit determines, according to the kth-level data structure table, the operands required by the calculation operation currently to be executed, reads the corresponding operands from the kth-level expression cache unit, obtains the information of the calculation operation currently to be executed, and transmits that information to the distribution unit;

step 5: the distribution unit parses, from the received calculation operation information, the calculation operation that the kth-level analysis unit requests to execute, determines a target calculation unit according to that calculation operation, transmits the information of the calculation operation currently to be executed by the kth-level analysis unit to the target calculation unit, and transmits a calculation authorization signal to the kth-level analysis unit;

step 6: the target calculation unit obtains the operands of the calculation operation currently to be executed from the received calculation operation information, executes the corresponding calculation operation, obtains the current calculation result, and transmits the current calculation result to the kth-level expression cache unit;

step 7: the kth-level expression cache unit caches the current calculation result, updates its calculation expression according to the current calculation result, and the method returns to step 3;

wherein k is an integer greater than 0 and less than or equal to M.
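The control flow of steps 1 to 7 can be sketched as a small sequential software model. This is only an illustration under several assumptions that are not in the source: expressions are written in postfix order with binary operators only, every level is configured with the same expression, the distribution and calculation units are collapsed into a direct function call, and the names `run`, `next_operation` and `OPS` are hypothetical. Step 1 is folded into the moment a level receives its operands.

```python
import operator
from collections import deque

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # assumed operator set


def next_operation(table):
    """Step 3: index of the first pending operator, or None when nothing is left."""
    for i, token in enumerate(table):
        if isinstance(token, str) and token in OPS:
            return i
    return None


def run(expression, operand_groups, m_levels=2):
    pending = deque(operand_groups)   # stands in for the operand cache module
    results = []                      # stands in for the result cache module
    levels = [None] * m_levels        # None marks the to-be-stored state
    while pending or any(t is not None for t in levels):
        for k in range(m_levels):
            if levels[k] is None:
                if pending:           # step 2: load one operand group
                    group = pending.popleft()
                    levels[k] = [group.get(t, t) for t in expression]
                continue
            table = levels[k]
            i = next_operation(table)
            if i is None:             # step 3: expression finished
                results.append(table[0])
                levels[k] = None      # back to the to-be-stored state
                continue
            a, b = table[i - 2], table[i - 1]   # step 4: fetch operands
            value = OPS[table[i]](a, b)         # steps 5 and 6: dispatch and calculate
            table[i - 2:i + 1] = [value]        # step 7: update the expression
    return results


# (a + b) * c in postfix order, evaluated for three operand groups on two levels.
groups = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}, {"a": 0, "b": 1, "c": 10}]
print(run(["a", "b", "+", "c", "*"], groups, m_levels=2))  # prints [9, 54, 10]
```

With two levels and three operand groups, the third group is loaded only after a level has returned to the to-be-stored state, which mirrors how step 2 gates the reading of operands.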

9. The method according to claim 8, characterized in that while the parallel operation acceleration system is executing steps 3 to 7, it also executes step 2; step 2 further comprises: when none of the M-level parallel expression cache units is in the to-be-stored state, the access control unit repeatedly judges whether any of the M-level parallel expression cache units is in the to-be-stored state; when more than one of the M-level parallel expression cache units is in the to-be-stored state, the access control unit reads groups of operands from the operand cache module one after another, in the order in which the expression cache units were detected to be in the to-be-stored state, and transmits each group to the corresponding expression cache unit in that order; an expression cache unit in the to-be-stored state changes from the to-be-stored state to the stored state after receiving a group of operands transmitted by the access control unit.
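The ordering rule in claim 9 amounts to first-detected, first-served. A minimal Python sketch follows, assuming the detection order is already available as a list of level indices; the function name `refill` and the tuple operand groups are hypothetical.

```python
from collections import deque


def refill(detected_order, operand_cache):
    """Give one operand group to each waiting level, in detection order.

    Returns the assignments made and the levels still waiting because the
    operand cache ran out of groups.
    """
    waiting = deque(detected_order)
    assignments = {}
    while waiting and operand_cache:
        level = waiting.popleft()
        assignments[level] = operand_cache.popleft()
    return assignments, list(waiting)


# Levels 2, 0 and 1 were detected in that order; only two groups are cached.
cache = deque([("a0", "b0"), ("a1", "b1")])
assignments, still_waiting = refill([2, 0, 1], cache)
```

In this run level 2 receives the first group, level 0 the second, and level 1 keeps waiting until more operands arrive.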

10. The method for operating a parallel operation acceleration system according to claim 9, characterized in that step 3 further comprises: while the kth-level analysis unit transmits the calculation completion signal to the kth-level expression cache unit, the kth-level analysis unit resets its data structure table to the initial data structure table; when the kth-level expression cache unit transmits the expression calculation result of the completed calculation operation to the access control unit, the kth-level expression cache unit changes from the stored state to the to-be-stored state, and at the same time its calculation expression is reset to the initial calculation expression of the kth level; the initial data structure table is the data structure table organized by the analysis unit according to the initial calculation expression of the expression cache unit at the same level.

11. The method for operating a parallel operation acceleration system according to claim 8, characterized in that step 5 specifically comprises: the distribution unit parses, from the received calculation operation information, the calculation operation that the kth-level analysis unit requests to execute; matches that calculation operation against the calculation operations of the N-level calculation units configured in the distribution unit; screens out, from the N-level calculation units, at least one level of calculation unit able to execute that calculation operation; determines, in combination with the idle state signals of the calculation units so screened, the calculation unit that has the lowest transmission cost and is in an idle state as the target calculation unit; and transmits the information of the calculation operation currently to be executed by the kth-level analysis unit to the target calculation unit, while the distribution unit transmits a calculation authorization signal to the kth-level analysis unit; the lowest transmission cost means that transmitting the information of the calculation operation currently to be executed from the distribution unit to that calculation unit takes the shortest time and occupies the fewest system resources.
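The selection rule of claim 11 (able to execute, idle, lowest transmission cost) can be sketched as a filter followed by a minimum. The sketch below is an assumption-laden Python model: the transmission cost, which the text defines in terms of both time and system resources, is reduced to a single integer, ties are broken by unit index, and the names `ComputeUnit` and `select_target` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComputeUnit:
    index: int
    operation: str  # the calculation operation this unit is configured to execute
    idle: bool      # value of the unit's idle state signal
    cost: int       # assumed scalar stand-in for the transmission cost


def select_target(requested_op: str,
                  units: Sequence[ComputeUnit]) -> Optional[ComputeUnit]:
    """Return the idle, capable unit with the lowest cost, or None if there is none."""
    candidates = [u for u in units if u.operation == requested_op and u.idle]
    if not candidates:
        return None  # no calculation authorization signal can be issued yet
    return min(candidates, key=lambda u: (u.cost, u.index))


units = [
    ComputeUnit(0, "+", idle=True, cost=3),
    ComputeUnit(1, "+", idle=True, cost=1),
    ComputeUnit(2, "+", idle=False, cost=0),  # cheapest, but busy
    ComputeUnit(3, "*", idle=True, cost=0),
]
target = select_target("+", units)  # unit 1: idle and cheaper than unit 0
```

Returning `None` models the case where the request stays pending and the analysis unit receives no authorization signal.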

12. The method for operating a parallel operation acceleration system according to claim 11, characterized in that step 5 further comprises: after receiving the calculation authorization signal transmitted by the distribution unit, the kth-level analysis unit updates the data structure table inside the kth-level analysis unit.

13. The method according to claim 12, characterized in that updating the data structure table inside the kth-level analysis unit specifically comprises: replacing, in the data structure table, the rows of the operands and the operator contained in the calculation operation information corresponding to the calculation authorization signal with one row of operand information; replacing with one row of operand information means that the current calculation result corresponding to the calculation operation information is written into the data structure table as an operand in the form of a code.
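The row replacement described in claim 13 can be pictured as collapsing an operator row and its operand rows into a single operand row that holds a code for the pending result. The following Python sketch assumes a postfix row order in which an operator's operands sit directly before it; the dictionary rows, the function name `collapse_rows` and the code `"R0"` are hypothetical.

```python
def collapse_rows(table, op_index, arity, result_code):
    """Replace an operator row and its operand rows with one operand row.

    `arity` is 1 for a unary operator and 2 for a binary one. The new row
    stores a code that stands for the result still being calculated.
    """
    start = op_index - arity
    if start < 0:
        raise ValueError("not enough operand rows before the operator")
    merged = {"kind": "operand", "data": result_code}
    return table[:start] + [merged] + table[op_index + 1:]


# (a + b) * c in postfix order; the addition has just been authorized.
table = [
    {"kind": "operand", "data": "a"},
    {"kind": "operand", "data": "b"},
    {"kind": "operator", "data": "+"},
    {"kind": "operand", "data": "c"},
    {"kind": "operator", "data": "*"},
]
table = collapse_rows(table, op_index=2, arity=2, result_code="R0")
```

After the update the table reads `R0 c *`, so the next pass of the analysis unit sees the multiplication as the operation to request once `R0` is available.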

Technical Field

The invention relates to the field of integrated circuits, in particular to a parallel operation acceleration system and an operation method thereof.

Background

With the rapid development of science and technology, more and more technical fields, such as artificial intelligence and secure computation, involve operations on massive amounts of data, and the demand for performing the same calculation operation on large batches of data keeps growing. In these computation-intensive fields, multiple calculation units usually have to be controlled to work simultaneously in order to improve data processing speed and capacity. Fields such as artificial intelligence and secure computation also typically require fast calculation, low latency and high efficiency, so improving the efficiency of mixed operations has always been a goal of accelerator chip design.

Disclosure of Invention

In order to solve the above problems, the present invention provides a parallel operation acceleration system and an operation method thereof, which achieve continuous calculation of data and continuous output of calculation results and improve the pipeline efficiency of mixed operations. The specific technical scheme of the invention is as follows:

A parallel operation acceleration system, comprising an operand cache module, a result cache module, a control module and a calculation module; the control module comprises an access control unit, M-level parallel expression cache units, M-level parallel analysis units and a distribution unit; the calculation module comprises N-level parallel calculation units; one end of the N-level parallel calculation units in the calculation module is connected to one end of the distribution unit of the control module, so that the N-level parallel calculation units receive the calculation operation information transmitted by the distribution unit, and the other end of the N-level parallel calculation units in the calculation module is connected to each of the M-level parallel expression cache units of the control module; the operand cache module is used for caching a batch of operands of the calculation operations to be executed; the result cache module is used for caching expression calculation results; wherein M and N are integers greater than or equal to 2.

Compared with the prior art, the parallel operation acceleration system disclosed in this technical scheme is based on M-level parallel expression cache units and analysis units, so that M calculation expressions can be calculated independently. Through the control module, a complex calculation is distributed to the calculation units step by step, so the largest mixed operation the acceleration system supports in a single pass is limited only by the size of the cache space of the control module and not by the number of hardware calculation resources. Complex calculations can therefore be carried out with fewer calculation resources, which greatly reduces the design area occupied by the bus part of the acceleration module and lowers the hardware cost of the chip. The overall structure of the acceleration system can be tailored to calculation requirements of different complexity, which makes it more adaptable. In addition, the calculation module feeds calculation results back to the expression cache units directly, which saves the time otherwise spent distributing calculation results and improves the operation efficiency and performance of the acceleration system.

Further, the access control unit is configured to read operands to be calculated from the operand cache module, and is further configured to transmit the expression calculation results of completed calculation operations in the control module to the result cache module; the access control unit comprises M expression calculation result input terminals, M storage state signal input terminals, M operand output terminals, one expression calculation result output terminal, one operand input terminal and M+1 access state signal output terminals; the M storage state signal input terminals are used by the access control unit to receive the storage state signals of the M-level parallel expression cache units respectively, and the access control unit judges, according to the storage state signal of each level of expression cache unit, whether to transmit a group of operands to that level of expression cache unit; the M operand output terminals are used by the access control unit to transmit operands to one or more levels of the M-level parallel expression cache units; the operand input terminal is used by the access control unit to read at least one group of operands to be calculated from the operand cache module; the M expression calculation result input terminals are used by the access control unit to receive, respectively, the expression calculation results of completed calculation operations output by the M-level parallel expression cache units; the expression calculation result output terminal is used by the access control unit to transmit the expression calculation result of a completed calculation operation to the result cache module; and the M+1 access state signal output terminals are used by the access control unit to transmit access state signals to the M-level parallel expression cache units and to the operand cache module respectively.

In this technical scheme, the control module, through the access control unit, reads operands according to the cache space available in its expression cache units and caches the calculation results of completed calculation operations in the result cache module, which prevents the access control unit from continuing to read operands when the control module has insufficient cache space. When the M-level expression cache units transmit more than one expression calculation result of completed calculation operations to the access control unit one after another, the access control unit supports transmitting those results to the result cache module in the order in which they were transmitted to it, and also supports transmitting them to the result cache module out of order.

Furthermore, the M-level parallel expression cache units are M expression cache units arranged in parallel, no data and/or signals are transmitted between any two of the M expression cache units, and each level of expression cache unit is internally configured with the initial calculation expression of its level; each level of expression cache unit comprises an expression calculation result output terminal, a current calculation result input terminal, a storage state signal output terminal, an access state signal input terminal, an operand output terminal and a calculation completion signal input terminal; the expression calculation result output terminal is connected to the corresponding expression calculation result input terminal of the access control unit and is used by the expression cache unit of that level to transmit the expression calculation result of a completed calculation operation to the access control unit; the current calculation result input terminal is used by the expression cache unit of that level to receive the current calculation result output by the calculation module; the storage state signal output terminal is connected to the corresponding storage state signal input terminal of the access control unit and is used by the expression cache unit of that level to transmit its storage state signal to the access control unit; the access state signal input terminal is connected to the corresponding access state signal output terminal of the access control unit and is used by the expression cache unit of that level to receive the access state signal transmitted by the access control unit; the operand input terminal is connected to the corresponding operand output terminal of the access control unit and is used by the expression cache unit of that level to receive the operands transmitted by the access control unit; the operand output terminal is used by the expression cache unit of that level to transmit operands to the analysis unit at the same level; the calculation completion signal input terminal is used by the expression cache unit of that level to receive the calculation completion signal transmitted by the analysis unit at the same level; the calculation operation information comprises the operator and operands of the calculation currently to be executed and the level number of the expression cache unit of that level; the calculation completion signal is a signal indicating that all calculation operations contained in the calculation expression of the expression cache unit of that level have been executed. By adopting multi-level expression cache units, this technical scheme enables the acceleration system to operate on multiple groups of calculation expressions in parallel, and achieves continuous calculation of data and continuous output of calculation results.

Furthermore, the M-level parallel analysis units are M analysis units operating in parallel, with no data and/or signal transmission between any two of them. Each level of analysis unit organizes a data structure table according to the calculation expression configured in the expression cache unit of the same level, and reads calculation operation information from that expression cache unit according to the data structure table. Each level of analysis unit comprises an operand input end, a calculation completion signal output end, a calculation operation information output end and a calculation authorization signal input end. The operand input end is connected with the operand output end of the expression cache unit of the same level, so that the analysis unit of that level reads the operands currently to be executed from the expression cache unit of the same level and generates calculation operation information; the calculation completion signal output end is connected with the calculation completion signal input end of the expression cache unit of the same level, so that the analysis unit of that level transmits the calculation completion signal to the expression cache unit of the same level; the calculation authorization signal input end is used by the analysis unit of that level to receive the calculation authorization signal output by the distribution unit; the calculation operation information output end is used by the analysis unit of that level to transmit calculation operation information to the distribution unit. The calculation authorization signal is fed back by the distribution unit when a target computing unit exists for the calculation operation information transmitted by the analysis unit, and indicates that the application for computing resources has succeeded; a computing unit is determined to be the target computing unit when it can execute the calculation operation information and is in an idle state. The kth-level expression cache unit is the expression cache unit at the same level as the kth-level analysis unit, where k is an integer greater than 0 and less than or equal to M. In this technical scheme, each level of expression cache unit is paired with an analysis unit of the same level, which parses the calculation expression configured in the expression cache unit and organizes the data structure table, so that the analysis unit can subsequently read the operands currently to be executed from the expression cache unit according to the data structure table, which improves the efficiency of mixed operations.

Further, the data structure table includes: a valid identifier column for identifying whether each row in the data structure table holds valid expression content; a single/double identifier column for identifying whether an operator is a unary operator or a binary operator, and hence how many operands the operator requires; an operator column for storing the calculation operation to be executed; an operation identifier column for identifying whether each row in the data structure table holds an operand; and a data column for storing the operands of the calculation operation to be executed. The contents of the valid identifier column, the single/double identifier column, the operation identifier column, the data column and the operator column are configured by the analysis unit according to the calculation expression configured in the expression cache unit of the same level; valid expression content refers to operands and operators. In this technical scheme, the analysis unit organizes the data structure table according to the calculation expression, so that it can subsequently judge from the table whether all calculation operations in the calculation expression have been executed, and read the operands to be executed from the expression cache unit according to the table, which improves data reading efficiency.
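
As a non-limiting illustration, the five columns described above can be pictured as one small record per table row. The Python sketch below is a software model only; the field names (`valid`, `binary`, `operator`, `has_operand`, `data`), the operator sets, the fixed table size and the helpers `build_table`, `operands_needed` and `all_done` are assumptions introduced for illustration and do not appear in the invention.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical operator sets; the invention does not enumerate operators.
BINARY_OPS = {"+", "-", "*", "/"}
UNARY_OPS = {"neg"}


@dataclass
class Row:
    valid: bool = False             # valid identifier column
    binary: bool = False            # single/double identifier column
    operator: Optional[str] = None  # operator column
    has_operand: bool = False       # operation identifier column
    data: Optional[str] = None      # data column (here an operand tag)


def build_table(tokens: List[str], size: int = 8) -> List[Row]:
    """Fill a fixed-size table, one row per token of the expression."""
    if len(tokens) > size:
        raise ValueError("expression does not fit in the table")
    table = [Row() for _ in range(size)]
    for i, tok in enumerate(tokens):
        if tok in BINARY_OPS or tok in UNARY_OPS:
            table[i] = Row(valid=True, binary=tok in BINARY_OPS, operator=tok)
        else:
            table[i] = Row(valid=True, has_operand=True, data=tok)
    return table


def operands_needed(row: Row) -> int:
    """Number of operands an operator row requires."""
    return 2 if row.binary else 1


def all_done(table: List[Row]) -> bool:
    """True when no valid row still holds an operator."""
    return not any(r.valid and r.operator is not None for r in table)
```

In this model, rows beyond the end of the expression simply keep `valid=False`, which is one way to read the role of the valid identifier column.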

Furthermore, the N-level parallel computing units are N computing units operating in parallel, with no data and/or signal transmission between any two of them. Each level of computing unit is configured to execute one calculation operation and comprises a calculation operation information input end, an idle state signal output end and M current calculation result output ends. The calculation operation information input end is used by the computing unit of that level to receive the calculation operation information transmitted by the distribution unit; the idle state signal output end is used by the computing unit of that level to transmit its idle state signal to the distribution unit; the M current calculation result output ends are respectively connected with the current calculation result input ends of the M levels of expression cache units, so that the computing unit of that level transmits the current calculation result to the expression cache unit whose level corresponds to the calculation operation information that produced the result. The idle state signal indicates whether the computing unit of that level is in an idle state; the idle state is a state in which the computing unit can receive information on a calculation operation to be executed and execute the corresponding calculation operation. The N computing units configured in this technical scheme can execute a variety of different calculation operations, so that the control module can distribute calculation operation information flexibly, output calculation results continuously and improve the pipeline efficiency of mixed operations.

Furthermore, the distribution unit is internally configured with the computing operation correspondingly executed by each level of computing unit, and comprises M computing operation information input ends, M computing authorization signal output ends, N idle state signal input ends and N computing operation information output ends; the M computing operation information input ends are correspondingly connected with the M computing operation information output ends of the M-level parallel analysis units and are used for realizing that the distribution unit respectively receives the computing operation information transmitted by the M-level analysis units; the M calculation authorization signal output ends are correspondingly connected with the M calculation authorization signal input ends of the M-level parallel analysis units and are used for transmitting the calculation authorization signals to the one-level analysis unit corresponding to the calculation operation information by the distribution unit; the N idle state signal input ends are correspondingly connected with the N idle state signal output ends of the N-level parallel computing units and are used for realizing that the distributing unit respectively receives N idle state signals transmitted by the N-level computing units; the N computing operation information output ends are correspondingly connected with the N computing operation information input ends of the N-level parallel computing units and used for realizing that the distributing unit transmits the computing operation information to the target computing unit; wherein, the target computing unit refers to a computing unit which can execute the computing operation information and is in an idle state. 
In this technical scheme, the distribution unit is arranged in the control module and is configured with the calculation operations that the N levels of computing units can execute, so that the control module, through the distribution unit, detects and manages the idle states of all computing units in a unified manner, which improves calculation efficiency. The calculation operation that each level of computing unit executes can be configured adaptively according to the calculation operations the acceleration system is required to perform, which improves calculation efficiency and the utilization of computing resources. In addition, because the distribution unit controls the distribution of calculation operation information, one computing unit can execute two or more identical calculation operations that occur in the same calculation expression.

The invention also discloses an operation method of the parallel operation acceleration system, which comprises the following steps. Step 1: when the parallel operation acceleration system is started, each level of analysis unit reads the initial calculation expression of its level from the expression cache unit of the same level and organizes the data structure table of that level according to the calculation expression. Step 2: the access control unit judges whether any of the M-level parallel expression cache units is in the to-be-stored state; if the kth-level expression cache unit is in the to-be-stored state, the access control unit reads a group of operands from the operand cache module and transmits it to the kth-level expression cache unit, and the kth-level expression cache unit, after receiving the group of operands, switches from the to-be-stored state to the stored state. Step 3: the kth-level analysis unit judges, according to the kth-level data structure table, whether a calculation operation currently to be executed exists; if so, the method proceeds to step 4; if not, the kth-level analysis unit transmits a calculation completion signal to the kth-level expression cache unit, the kth-level expression cache unit, after receiving the calculation completion signal, transmits the most recently cached current calculation result to the access control unit as the expression calculation result obtained after all calculation operations have been executed, and the access control unit transmits that expression calculation result to the result cache module for caching. Step 4: the kth-level analysis unit determines, according to the kth-level data structure table, the operands required by the calculation operation currently to be executed, reads the corresponding operands from the kth-level expression cache unit, generates the calculation operation information currently to be executed, and transmits it to the distribution unit. Step 5: the distribution unit parses, from the received calculation operation information, the calculation operation requested by the kth-level analysis unit, determines a target computing unit according to that calculation operation, transmits the calculation operation information currently to be executed to the target computing unit, and transmits a calculation authorization signal to the kth-level analysis unit. Step 6: the target computing unit obtains the operands of the calculation operation currently to be executed from the calculation operation information, executes the corresponding calculation operation, obtains the current calculation result and transmits it to the kth-level expression cache unit. Step 7: the kth-level expression cache unit caches the current calculation result, updates its calculation expression according to the current calculation result, and the method returns to step 3. Here k is an integer greater than 0 and less than or equal to M. Compared with the prior art, the method calculates data continuously, outputs calculation results continuously and improves the pipeline efficiency of mixed operations.
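
The loop formed by steps 3 to 7 can be sketched in software for a single level k. The following Python model is an illustrative assumption, not the hardware described here: it runs sequentially, supports binary operators only, and the names `UNITS`, `next_operation`, `run_level`, the result tags `t0`, `t1`, ... and the example expression are all hypothetical.

```python
import operator

# Hypothetical computing units, one calculation operation each.
UNITS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def next_operation(table):
    """Step 3: index of the first operator left in the table, or None."""
    for i, tok in enumerate(table):
        if tok in UNITS:
            return i
    return None


def run_level(expression, operands):
    """Evaluate one postfix expression, one calculation operation per pass."""
    table = list(expression)   # step 1: table organised from the expression
    cache = dict(operands)     # step 2: operand group held in the cache
    step = 0
    while True:
        i = next_operation(table)                 # step 3
        if i is None:
            # Nothing left to execute: the latest cached result
            # is the expression calculation result.
            return cache[table[0]]
        op, lhs, rhs = table[i], table[i - 2], table[i - 1]   # step 4
        result = UNITS[op](cache[lhs], cache[rhs])            # steps 5 and 6
        tag = f"t{step}"
        step += 1
        cache[tag] = result                       # step 7: cache the result
        table[i - 2:i + 1] = [tag]                # step 7: update expression


value = run_level(["A", "B", "+", "C", "*"], {"A": 2, "B": 3, "C": 4})
print(value)  # (2 + 3) * 4 = 20
```

In a postfix expression the first operator found by a left-to-right scan always has its operands immediately before it, which is why the sketch can pick the two preceding entries without further checks.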

Further, while the parallel operation acceleration system is executing steps 3 to 7, it also executes step 2 at the same time. Step 2 further comprises: when none of the M-level parallel expression cache units is in the to-be-stored state, the access control unit repeatedly judges whether any of them has entered the to-be-stored state; when more than one of the M-level parallel expression cache units is in the to-be-stored state, the access control unit reads groups of operands from the operand cache module one at a time and transmits them to the expression cache units in the to-be-stored state, in the order in which those units were detected to be in the to-be-stored state, and each such expression cache unit switches from the to-be-stored state to the stored state after receiving its group of operands from the access control unit. The access control unit thus continuously monitors the storage states of the M-level parallel expression cache units, promptly sends a group of operands to any expression cache unit in the to-be-stored state, and controls the reading and transmission of operands by means of the storage state signals of the expression cache units, which ensures that the control module has enough cache space to accept incoming calculation results.

Further, step 3 further comprises: when the kth-level analysis unit transmits the calculation completion signal to the kth-level expression cache unit, the kth-level analysis unit also resets its internal data structure table to the initial data structure table; when the kth-level expression cache unit transmits the expression calculation result obtained after all calculation operations have been executed to the access control unit, the kth-level expression cache unit switches from the stored state to the to-be-stored state and resets its calculation expression to the initial calculation expression of the kth level. The initial data structure table is the data structure table organized by the analysis unit according to the initial calculation expression of the expression cache unit of the same level. This technical scheme controls both the reading of a new group of operands and the re-initialization of the calculation expression through the storage state transitions of the expression cache unit.
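
The storage state transitions and the in-order feeding of operand groups described above can be modelled as follows. This is a minimal Python sketch under stated assumptions: the class `ExpressionCache`, the enum `State`, the function `feed` and the representation of an operand group as a tuple are hypothetical names chosen for illustration.

```python
from collections import deque
from enum import Enum


class State(Enum):
    AWAITING = "to-be-stored"
    STORED = "stored"


class ExpressionCache:
    """Software stand-in for one level of expression cache unit."""

    def __init__(self, level):
        self.level = level
        self.state = State.AWAITING
        self.operands = None

    def store(self, group):
        # Receiving a group of operands: to-be-stored -> stored.
        self.operands = group
        self.state = State.STORED

    def release(self):
        # After the expression result is handed over: stored -> to-be-stored.
        self.operands = None
        self.state = State.AWAITING


def feed(detected, operand_queue):
    """Serve caches in the order they were detected as awaiting storage.

    Returns the levels that received a group of operands.
    """
    served = []
    for cache in detected:
        if not operand_queue:
            break  # operand cache module has no more groups
        if cache.state is State.AWAITING:
            cache.store(operand_queue.popleft())
            served.append(cache.level)
    return served
```

A caller would pass a `deque` of operand groups as `operand_queue`; caches already in the stored state are skipped, so a group is never overwritten before its expression has been fully evaluated.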

Further, step 5 specifically includes: the distribution unit parses, from the received calculation operation information, the calculation operation requested by the kth-level analysis unit and matches it against the calculation operations of the N levels of computing units configured in the distribution unit, thereby screening out at least one level of computing unit capable of executing the requested calculation operation; the distribution unit then combines the idle state signals of the computing units so screened, determines as the target computing unit the one that is in an idle state and has the lowest transmission cost, and transmits the calculation operation information currently to be executed by the kth-level analysis unit to the target computing unit; at the same time, the distribution unit transmits a calculation authorization signal to the kth-level analysis unit. The lowest transmission cost means that transmitting the calculation operation information currently to be executed from the distribution unit to that level of computing unit takes the shortest time and occupies the fewest system resources. The most suitable computing unit is thus selected according to several conditions to execute the calculation operation currently to be executed, which improves the calculation efficiency of the acceleration system.
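
The selection rule of step 5 (capable, idle, lowest transmission cost) can be sketched as a short filter. In the Python model below, the `Unit` record, its `cost` field as a single integer and the function `select_target` are assumptions made for illustration; the description does not specify how transmission cost is measured or encoded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Unit:
    level: int   # level number of the computing unit
    op: str      # calculation operation configured for this unit
    idle: bool   # idle state signal
    cost: int    # hypothetical transmission cost (lower is better)


def select_target(units: Sequence[Unit], requested_op: str) -> Optional[Unit]:
    """Return the idle, capable unit with the lowest cost, or None."""
    capable = [u for u in units if u.op == requested_op]
    candidates = [u for u in capable if u.idle]
    if not candidates:
        return None  # no target unit, so no authorization signal is sent
    return min(candidates, key=lambda u: u.cost)
```

Returning `None` corresponds to the case in which no calculation authorization signal is fed back, so the requesting analysis unit would keep its request pending.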

Further, the step 5 further comprises: and after the kth-level analysis unit receives the calculation authorization signal transmitted by the distribution unit, updating the data structure table information in the kth-level analysis unit. According to the technical scheme, the data structure information is updated after each piece of calculation operation information is executed, so that wrong repeated operation cannot be carried out on partial operands or operators, and the accuracy of the operation result of the acceleration system is improved.

Further, updating the data structure table inside the kth-level analysis unit specifically includes: replacing, in the data structure table, the rows holding the operands and the operator contained in the calculation operation information corresponding to the calculation authorization signal with a single row of operand information; replacing with a row of operand information means that the current calculation result corresponding to the calculation operation information is entered in the data structure table, in coded form, as an operand.
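
The row replacement described above can be illustrated with a short sketch. The Python function below is an assumption-laden model: rows are represented as `(kind, value)` tuples, and `apply_result`, `arity` and the result code `"R1"` are hypothetical names rather than terms from the invention.

```python
def apply_result(rows, op_index, arity, result_code):
    """Replace an operator row and its operand rows with one operand row.

    rows        -- list of ("operand", code) or ("operator", symbol) tuples
    op_index    -- position of the operator that was just authorized
    arity       -- 1 for a unary operator, 2 for a binary operator
    result_code -- code standing for the current calculation result
    """
    start = op_index - arity
    if start < 0:
        raise ValueError("not enough operand rows before the operator")
    return rows[:start] + [("operand", result_code)] + rows[op_index + 1:]


rows = [
    ("operand", "A"),
    ("operand", "B"),
    ("operator", "+"),
    ("operand", "C"),
    ("operator", "*"),
]
print(apply_result(rows, 2, 2, "R1"))
# [('operand', 'R1'), ('operand', 'C'), ('operator', '*')]
```

Because the consumed rows disappear from the table, the same operands and operator cannot be issued a second time, which matches the stated purpose of the update.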

Drawings

Fig. 1 is a schematic structural diagram of a parallel operation acceleration system according to an embodiment of the present invention.

Fig. 2 is a port diagram of an access control unit according to an embodiment of the invention.

Fig. 3 is a schematic port diagram of an expression cache unit according to an embodiment of the present invention.

Fig. 4 is a schematic port diagram of the parsing unit according to an embodiment of the present invention.

Fig. 5 is a port diagram of a distribution unit according to an embodiment of the present invention.

Fig. 6 is a schematic port diagram of a computing unit according to an embodiment of the present invention.

Fig. 7 is a flowchart illustrating an operation method of the parallel operation acceleration system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the following specific embodiments are illustrative only and are not intended to limit the invention. Moreover, modifications made by those skilled in the art on the basis of the technical content disclosed herein are merely conventional technical means, and should not be taken to mean that the disclosure of the present invention is insufficient.

Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by one of ordinary skill in the art to which this application belongs. Words such as "a," "an" and "the" in this application are not to be construed as limiting in number, and may denote the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or apparatus that comprises a list of steps or modules is not limited to the listed steps or modules, but may include additional steps or modules not listed or inherent to such process, method, product or apparatus. The terms "first," "second," "third," and the like in this application are used only to distinguish between similar objects and do not imply a particular ordering of the objects.

In an embodiment of the present invention, a parallel operation acceleration system is provided. As shown in fig. 1, the parallel operation acceleration system includes an operand cache module, a result cache module, a calculation module and a control module. It should be noted that the arrows in fig. 1 only indicate that signals and/or information are transmitted between two modules or units, or between a module and a unit, together with the direction of that transmission; they do not indicate that the several signals and/or pieces of information exchanged between those modules or units pass through a single port. It should also be noted that, as shown in fig. 1, each level of computing unit of the calculation module exchanges signals and information with the M-level parallel expression cache units of the control module.

Specifically, the operand cache module is configured to store a batch of operands to be calculated, where a batch of operands comprises one or more groups of operands. The operand cache module comprises an operand output end through which the control module reads operands from the operand cache module. The operand cache module also comprises an access state signal input end through which it receives the access state signal of the access control unit in the control module, and it decides according to that signal whether to allow the access control unit to read operands: if the access control unit is in the access saturation state but still submits a request to read operands, the operand cache module rejects the read request.

The result caching module is used for caching the expression calculation results of all calculation operations, and comprises an expression calculation result input end used for receiving and caching the expression calculation results transmitted by the control module.

The control module comprises an access control unit, an M-level parallel expression caching unit, an M-level parallel analysis unit and a distribution unit; as shown in fig. 1, one end of the access control unit is connected to one end of the M-level parallel expression cache unit, the other end of the access control unit is connected to the operand cache module and the result cache module, the other end of each level of expression cache unit is connected to one end of the corresponding level of parsing unit, one end of the distribution unit is connected to the other end of the M-level parallel parsing unit, and the other end of the distribution unit is connected to the N-level parallel computing unit of the computing module; the analysis unit of the kth level is an analysis unit of a corresponding level of the kth level expression cache unit; n is an integer greater than or equal to 2, M is an integer greater than or equal to 2, and k is an integer less than or equal to M and greater than 0.

Specifically, the access control unit is configured to enable the control module to read operands from the operand cache module, and to transmit to the result cache module the expression calculation results obtained in the control module after all calculation operations have been executed. As shown in fig. 2, the access control unit includes M expression calculation result input ends, M storage state signal input ends, M operand output ends, an expression calculation result output end, an operand input end and M+1 access state signal output ends. The M storage state signal input ends allow the access control unit to judge, from the storage state signal of each level of expression cache unit, whether to transmit a group of operands to that expression cache unit; the M operand output ends are used by the access control unit to transmit operands to one or more of the M-level parallel expression cache units; the operand input end is used by the access control unit to read at least one group of operands to be calculated from the operand cache module; the M expression calculation result input ends are used by the access control unit to receive the expression calculation results, obtained after all calculation operations have been executed, that are output by the M-level parallel expression cache units; the expression calculation result output end is used by the access control unit to transmit those expression calculation results to the result cache module; and the M+1 access state signal output ends are used by the access control unit to transmit the access state signal to the M-level parallel expression cache units and to the operand cache module respectively.

The access state signal represents the access state of the access control unit, which is either a to-be-accessed state or an access saturation state. In the to-be-accessed state, the access control unit can receive new operands and expression calculation results. In the access saturation state, the data stored in the access control unit has reached a preset capacity threshold, and the access control unit cannot receive new operands or expression calculation results.

Preferably, when the M-level expression cache units successively transmit more than one expression calculation result, each obtained after all calculation operations have been executed, to the access control unit, the access control unit supports transmitting those results to the result cache module in the order in which they were received, and also supports transmitting them to the result cache module out of order.

The M-level expression cache units are M expression cache units operating in parallel, with no data or signal transmission between any two of them, and each level of expression cache unit is configured with the initial calculation expression of that level. As shown in fig. 3, each level of expression cache unit includes an expression calculation result output end, a current calculation result input end, a storage state signal output end, an access state signal input end, an operand input end, an operand output end and a calculation completion signal input end. The expression calculation result output end is connected with the corresponding expression calculation result input end of the access control unit, so that the expression cache unit of that level transmits to the access control unit the expression calculation result obtained after all calculation operations have been executed; the current calculation result input end is used by the expression cache unit of that level to receive the current calculation result output by the calculation module; the storage state signal output end is connected with the corresponding storage state signal input end of the access control unit, so that the expression cache unit of that level transmits its storage state signal to the access control unit; the access state signal input end is connected with the corresponding access state signal output end of the access control unit, so that the expression cache unit of that level receives the access state signal transmitted by the access control unit; the operand input end is connected with the corresponding operand output end of the access control unit, so that the expression cache unit of that level receives the operands transmitted by the access control unit; the operand output end is used by the expression cache unit of that level to transmit the operands currently to be executed to the analysis unit of the same level; the calculation completion signal input end is used by the expression cache unit of that level to receive the calculation completion signal transmitted by the analysis unit of the same level. The calculation operation information comprises the operator and the operands of the calculation currently to be executed and the level number of the expression cache unit; the calculation completion signal indicates that all calculation operations contained in the calculation expression of that level of expression cache unit have been executed. The storage state signal represents the storage state of each level of expression cache unit, which is either a to-be-stored state or a storage saturation state: when an expression cache unit is in the to-be-stored state, the access control unit transmits a group of operands to it, and when the expression cache unit has received the group of operands it switches from the to-be-stored state to the storage saturation state. It should be noted that the storage state of an expression cache unit does not restrict it from receiving the current calculation result transmitted by the calculation module.

The M-level analysis units are M analysis units operating in parallel, with no data and/or signal transmission between any two of them. Each analysis unit organizes a data structure table according to the calculation expression configured in the expression cache unit of the same level, and reads the calculation operation information from that expression cache unit according to the data structure table. The data structure table includes: a valid identifier column for identifying whether each row in the data structure table holds valid expression content; a single/double identifier column for identifying whether an operator is a unary operator or a binary operator, and hence how many operands the operator requires; an operator column for storing the calculation operation to be executed; an operation identifier column for identifying whether each row in the data structure table holds an operand; and a data column for storing the operands of the calculation operation to be executed. The contents of these columns are configured by the analysis unit according to the calculation expression configured in the expression cache unit of the same level; valid expression content refers to operands and operators. It should be noted that the data structure table is not a physically existing table, but a virtual table obtained by organizing and sorting the initial expression in the expression cache unit; it exists only at the logical level.
It will be appreciated that the initial expression is written in reverse Polish (postfix) notation, and the analysis unit parses out of the initial expression the operands and operators it contains. For example, if the initial expression is 'AB + C', the analysis unit obtains the operands A, B and C and the operators from the initial expression, organizes them into the data structure table in the order in which they appear, and sets the valid identifier, single/double identifier, operation identifier and other information of each row accordingly. It should be noted that an operand obtained by the analysis unit from the initial expression is only an operand tag code, not an actual operand: the analysis unit first determines from the data structure table which operand tag codes are currently to be executed, and then reads the actual operands from the expression cache unit according to those tag codes.
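
The distinction between operand tag codes and actual operands can be illustrated as follows. This Python sketch is an assumption made for illustration: the operator alphabet, the single-character tag codes, the example expression `"AB+"` and the names `tokenize` and `resolve` are hypothetical and are not taken from the embodiment.

```python
# Hypothetical operator alphabet; every other character is an operand tag.
OPERATORS = {"+", "-", "*", "/"}


def tokenize(expr):
    """Split a compact postfix string into ("tag", x) / ("op", x) entries."""
    return [
        ("op" if ch in OPERATORS else "tag", ch)
        for ch in expr
        if not ch.isspace()
    ]


def resolve(tag, cache):
    """Read the actual operand for a tag code from the cached operand group."""
    return cache[tag]


entries = tokenize("AB+")
print(entries)  # [('tag', 'A'), ('tag', 'B'), ('op', '+')]
print(resolve("B", {"A": 1.5, "B": 2.0}))  # 2.0
```

Keeping only tag codes in the table means the table itself never changes when a new group of operands arrives; only the cached values that the tags point to are replaced.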

As shown in fig. 4, the calculation operation information input end of each parsing unit is connected to the calculation operation information output end of the expression cache unit of the same level, so that the parsing unit can read calculation operation information from that expression cache unit. The calculation completion signal output end is connected to the calculation completion signal input end of the expression cache unit of the same level, so that the parsing unit can transmit a calculation completion signal to that expression cache unit. The calculation authorization signal input end is used by the parsing unit to receive the calculation authorization signal output by the distribution unit. The calculation operation information output end is used by the parsing unit to transmit the calculation operation information to the distribution unit. The calculation authorization signal is the signal the distribution unit feeds back when the calculation operation information transmitted by the parsing unit satisfies the target calculation unit condition, indicating that the application for a computing resource has succeeded. The kth-level expression cache unit is the expression cache unit at the same level as the kth-level parsing unit, where k is an integer less than or equal to M and greater than 0. The target calculation unit condition is satisfied when a calculation unit is able to execute the calculation operation information and is in an idle state; that calculation unit is then determined to be the target calculation unit.

The distribution unit stores internally the calculation operation that each level of calculation unit executes. As shown in fig. 5, the distribution unit includes M calculation operation information input ends, M calculation authorization signal output ends, N idle state signal input ends and N calculation operation information output ends. The M calculation operation information input ends are connected one-to-one to the M calculation operation information output ends of the M-level parallel parsing units, so that the distribution unit receives the calculation operation information transmitted by each of the M parsing units. The M calculation authorization signal output ends are connected one-to-one to the M calculation authorization signal input ends of the M-level parallel parsing units, so that the distribution unit transmits the calculation authorization signal to the parsing unit that sent the corresponding calculation operation information. The N idle state signal input ends are connected one-to-one to the N idle state signal output ends of the N-level parallel calculation units, so that the distribution unit receives the N idle state signals transmitted by the N calculation units. The N calculation operation information output ends are connected one-to-one to the N calculation operation information input ends of the N-level parallel calculation units, so that the distribution unit transmits the calculation operation information to the target calculation unit. The target calculation unit is a calculation unit that is able to execute the calculation operation information and is in an idle state.

The calculation module comprises N levels of calculation units, which obtain a calculation result based on the calculation operation information transmitted by the control module and output the calculation result to the expression cache units of the control module. The N-level calculation units are N independent calculation units operating in parallel, with no data or signal transmission between any two of them, and each level of calculation unit is configured to perform one calculation operation, which may be, but is not limited to, addition, multiplication, division, root extraction and the like. As shown in fig. 6, each level of calculation unit includes a calculation operation information input end, an idle state signal output end and M current calculation result output ends. The calculation operation information input end is used by the calculation unit to receive the calculation operation information transmitted by the distribution unit. The idle state signal output end is used by the calculation unit to transmit an idle state signal to the distribution unit. The M current calculation result output ends are connected respectively to the current calculation result input ends of the M-level expression cache units, so that the calculation unit transmits the current calculation result to the expression cache unit of the level from which the corresponding calculation operation information originated. The idle state signal indicates whether the calculation unit is in an idle state; the idle state is the state in which the calculation unit is able to receive information on a calculation operation to be executed and perform that operation.
It can be understood that the M current calculation result output ends of each level of calculation unit are connected one-to-one to the M current calculation result input ends of the M-level expression cache units, and each level of calculation unit determines, according to the input calculation operation information, the expression cache unit to which the current calculation result should be output.
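The routing of a result back to the originating expression cache unit can be sketched as below. This is a software analogy under stated assumptions: the text says only that the calculation unit decides the destination from the calculation operation information, so the explicit `level` field in `OpInfo`, the `OPS` table and the list-based caches are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class OpInfo:
    level: int                # hypothetical tag: index k of the originating level
    op: str                   # requested calculation operation
    args: Tuple[float, ...]   # actual operands read from the expression cache unit


# Assumed operation set; in the described system each calculation unit performs one of them.
OPS: Dict[str, Callable[..., float]] = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "sqrt": math.sqrt,
}


def execute_and_route(info: OpInfo, caches: List[List[float]]) -> None:
    """Compute one operation and write the result to the cache of the originating level."""
    result = OPS[info.op](*info.args)
    caches[info.level].append(result)  # one of the M result outputs is selected by level


caches: List[List[float]] = [[], [], []]  # M = 3 expression cache units, modeled as lists
execute_and_route(OpInfo(level=1, op="+", args=(2.0, 3.0)), caches)
execute_and_route(OpInfo(level=2, op="sqrt", args=(16.0,)), caches)
```

After these two calls the level-1 cache holds `5.0` and the level-2 cache holds `4.0`, while the level-0 cache is untouched.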

Preferably, in a parallel operation acceleration system provided by another embodiment of the present invention, which differs from the parallel operation acceleration system of the foregoing embodiment, the control module includes an access control unit, M expression cache units, M parsing units and a distribution unit, and the calculation module includes N calculation units, where M is an integer less than or equal to N and greater than 0, and N is an integer greater than 0. In this embodiment the number N of calculation units is not smaller than the number M of parsing units. Because each calculation unit performs only one calculation operation, when the calculation units outnumber the parsing units the distribution unit can respond more quickly to the calculation operation information sent by the parsing units, the control module can allocate the calculation operation information more flexibly, and under optimal conditions the utilization of the computing resources inside the acceleration system can reach 100%.

In another embodiment of the present invention, based on the parallel operation acceleration system of the above embodiment, an operation method of the parallel operation acceleration system is disclosed, as shown in fig. 7, the operation method of the parallel operation acceleration system includes the following steps:

Step 1: when the parallel operation acceleration system is started, each level of parsing unit reads the initial calculation expression of its level from the expression cache unit of the same level and organizes the data structure table of its level according to that calculation expression;

Step 2: the access control unit judges whether any of the M-level parallel expression cache units is in a to-be-stored state; if the kth-level expression cache unit is in the to-be-stored state, the access control unit reads a group of operands from the operand cache module and transmits it to the kth-level expression cache unit, and the kth-level expression cache unit, after receiving the group of operands transmitted by the access control unit, changes from the to-be-stored state to the stored state;

Step 3: the kth-level parsing unit judges, according to the kth-level data structure table, whether a calculation operation currently to be executed exists; if so, the method proceeds to step 4; if not, the kth-level parsing unit transmits a calculation completion signal to the kth-level expression cache unit, the kth-level expression cache unit, after receiving the calculation completion signal, transmits its most recently cached current calculation result to the access control unit as the expression calculation result of the completed calculation, and the access control unit transmits that expression calculation result to the result cache module for caching;

Step 4: the kth-level parsing unit determines, according to the kth-level data structure table, the operands required by the calculation operation currently to be executed, reads the corresponding operands from the kth-level expression cache unit, obtains the information of the calculation operation currently to be executed, and transmits that information to the distribution unit;

Step 5: the distribution unit analyzes, according to the received calculation operation information, the calculation operation that the kth-level parsing unit requests to execute, determines a target calculation unit accordingly, transmits the information of the calculation operation currently to be executed by the kth-level parsing unit to the target calculation unit, and transmits a calculation authorization signal to the kth-level parsing unit;

Step 6: the target calculation unit obtains the operands of the calculation operation currently to be executed from the received calculation operation information, executes the corresponding calculation operation, obtains the current calculation result and transmits it to the kth-level expression cache unit;

Step 7: the kth-level expression cache unit caches the current calculation result, updates its calculation expression according to the current calculation result, and the method returns to step 3;

wherein k is an integer less than or equal to M and greater than 0.
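The loop formed by steps 3 to 7 for a single level k can be sketched in software as follows. This is a sequential model under assumptions, not the parallel hardware: the function name `run_level`, the operator tables and the example operand values are hypothetical, and dispatch to a calculation unit is reduced to a direct function call.

```python
import math
from typing import Dict, List, Union

# Assumed operator tables standing in for the N calculation units.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"sqrt": math.sqrt}

Token = Union[str, float]


def run_level(rpn: List[str], operands: Dict[str, float]) -> float:
    """Evaluate one reverse Polish expression one operation at a time."""
    expr: List[Token] = list(rpn)  # step 1: working copy of the initial expression
    while True:
        # Step 3: look for a calculation operation still to be executed.
        idx = next(
            (i for i, t in enumerate(expr)
             if isinstance(t, str) and (t in BINARY or t in UNARY)),
            None,
        )
        if idx is None:
            # No operation left: the latest cached value is the expression result.
            last = expr[-1]
            return operands[last] if isinstance(last, str) else last
        op = expr[idx]
        arity = 2 if op in BINARY else 1
        # Step 4: resolve operand tag codes into actual operands.
        args = [operands[t] if isinstance(t, str) else t
                for t in expr[idx - arity:idx]]
        # Steps 5 and 6: dispatch to a matching unit and compute.
        result = BINARY[op](*args) if arity == 2 else UNARY[op](*args)
        # Step 7: replace operands and operator with the current result.
        expr[idx - arity:idx + 1] = [result]


value = run_level(["A", "B", "+", "C", "*"], {"A": 1.0, "B": 2.0, "C": 4.0})
```

Because the leftmost operator in a reverse Polish expression is always preceded directly by its own operands, picking the first operator each time is sufficient; `value` here is `12.0`, that is (1 + 2) * 4.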

Specifically, while the parallel operation acceleration system is executing steps 3 to 7, it also executes step 2 at the same time. Step 2 further comprises: when none of the M-level parallel expression cache units is in the to-be-stored state, the access control unit repeatedly judges whether any of the M-level parallel expression cache units is in the to-be-stored state; when more than one of the M-level parallel expression cache units is in the to-be-stored state, the access control unit reads groups of operands from the operand cache module one at a time, in the order in which the expression cache units in the to-be-stored state were detected, and transmits them in that order to the corresponding expression cache units, each of which changes from the to-be-stored state to the stored state after receiving the group of operands transmitted by the access control unit.
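One polling pass of the access control unit can be sketched as follows. The class and function names, the two-state enum and the use of a `deque` as the operand cache module are assumptions for illustration; scanning by index stands in for "the order of detection".

```python
from collections import deque
from enum import Enum
from typing import Deque, List, Optional, Tuple


class State(Enum):
    TO_BE_STORED = 0   # waiting for a group of operands
    STORED = 1         # holds a group of operands


class ExprCache:
    def __init__(self) -> None:
        self.state = State.TO_BE_STORED
        self.operands: Optional[Tuple[float, ...]] = None


def refill(caches: List[ExprCache], operand_buffer: Deque[Tuple[float, ...]]) -> List[int]:
    """One polling pass: feed each waiting cache one operand group, in scan order."""
    served: List[int] = []
    for k, cache in enumerate(caches):
        if cache.state is State.TO_BE_STORED and operand_buffer:
            cache.operands = operand_buffer.popleft()
            cache.state = State.STORED
            served.append(k)
    return served


caches = [ExprCache(), ExprCache(), ExprCache()]
caches[1].state = State.STORED                      # level 1 is still busy
buffer = deque([(1.0, 2.0, 4.0), (3.0, 5.0, 7.0)])  # hypothetical operand groups
first_pass = refill(caches, buffer)
second_pass = refill(caches, buffer)
```

The first pass serves levels 0 and 2; the second pass finds nothing to do, which corresponds to the access control unit repeating its check until a cache returns to the to-be-stored state.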

Based on the foregoing embodiment, in the operation method of a parallel operation acceleration system disclosed in another embodiment of the present invention, step 3 further includes: when the kth-level parsing unit transmits the calculation completion signal to the kth-level expression cache unit, the kth-level parsing unit also resets its internal data structure table to the initial data structure table; when the kth-level expression cache unit transmits the expression calculation result of the completed calculation to the access control unit, the kth-level expression cache unit changes from the stored state to the to-be-stored state, and at the same time the calculation expression of the kth-level expression cache unit is reset to the initial calculation expression of the kth level. The initial data structure table is the data structure table organized by the parsing unit according to the initial expression of the expression cache unit of the same level.

Based on the foregoing embodiment, in the operation method of a parallel operation acceleration system disclosed in another embodiment of the present invention, step 5 includes: the distribution unit analyzes, according to the received calculation operation information, the calculation operation that the kth-level parsing unit requests to execute, and matches it against the calculation operations, stored in the distribution unit, that the N-level calculation units execute, thereby screening out from the N-level calculation units at least one calculation unit able to execute the requested calculation operation. Using the idle state signals of those calculation units, the distribution unit determines as the target calculation unit the one that is in an idle state and has the lowest transmission cost, and transmits the information of the calculation operation currently to be executed by the kth-level parsing unit to the target calculation unit; at the same time, the distribution unit transmits a calculation authorization signal to the kth-level parsing unit. The lowest transmission cost means that transmitting the information of the calculation operation currently to be executed from the distribution unit to that calculation unit takes the shortest time and occupies the fewest system resources.
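The target selection described above can be sketched as a filter followed by a minimum. The source defines transmission cost only qualitatively (shortest time, fewest resources), so the single integer `cost` field is an assumption, as are the names `ComputeUnit` and `select_target`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputeUnit:
    op: str      # the one calculation operation this unit performs
    idle: bool   # current idle state signal
    cost: int    # hypothetical scalar standing in for transmission cost


def select_target(units: List[ComputeUnit], requested_op: str) -> Optional[int]:
    """Return the index of the idle, capable unit with the lowest cost, or None."""
    candidates = [i for i, u in enumerate(units)
                  if u.op == requested_op and u.idle]
    if not candidates:
        return None  # no authorization can be granted yet
    return min(candidates, key=lambda i: units[i].cost)


units = [
    ComputeUnit("+", idle=True, cost=3),
    ComputeUnit("+", idle=True, cost=1),
    ComputeUnit("*", idle=True, cost=0),
    ComputeUnit("+", idle=False, cost=0),
]
target = select_target(units, "+")
```

Here `target` is `1`: unit 3 is cheaper but busy, and unit 2 performs a different operation. Returning `None` models the case where the parsing unit receives no authorization signal and has to keep waiting.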

Preferably, after the kth-level parsing unit receives the calculation authorization signal transmitted by the distribution unit, the kth-level parsing unit updates its internal data structure table.

Specifically, updating the data structure table inside the kth-level parsing unit includes: replacing the rows of the data structure table that hold the operands and the operator contained in the calculation operation information corresponding to the calculation authorization signal with a single row of operand information. Replacing with a single row of operand information means that the current calculation result corresponding to that calculation operation information is entered in the data structure table, in code form, as an operand.
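The table update can be sketched as collapsing the operand rows and the operator row into one row holding a result code. The table is simplified here to a list of tokens, and the function name `collapse` and the code `R0` are hypothetical.

```python
from typing import List


def collapse(table: List[str], op_index: int, arity: int, result_code: str) -> List[str]:
    """Replace an operator row and its operand rows with one operand row."""
    start = op_index - arity
    if start < 0:
        raise ValueError("operator does not have enough preceding operand rows")
    # Rows before the operands and after the operator are kept unchanged.
    return table[:start] + [result_code] + table[op_index + 1:]


# Hypothetical table for (A + B) * C; "R0" stands for the pending result of A + B.
before = ["A", "B", "+", "C", "*"]
after = collapse(before, op_index=2, arity=2, result_code="R0")
```

`after` is `["R0", "C", "*"]`. Using a code rather than a value fits the timing described here: the update is made when the authorization signal arrives, at which point the calculation result may not yet have been produced.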

Preferably, the distribution unit's control of the distribution of calculation operation information allows one calculation unit to execute two or more identical calculation operations within the same calculation expression. Specifically, when one calculation expression contains three addition operations and a calculation unit that performs addition exists, then, provided the target calculation unit screening conditions are met, the three addition operations are executed by the same addition calculation unit. The target calculation unit screening conditions are: having the lowest transmission cost, being in an idle state, and being able to perform the requested calculation operation.

It should be noted that, in the above embodiments, the operand cache module and the result cache module are essentially storage media, which may be, but are not limited to, any storage media capable of storing program code, such as a Read-Only Memory (ROM) or a Random Access Memory (RAM). Each expression cache unit, parsing unit, distribution unit and access control unit in the control module, and each calculation unit in the calculation module, may be, but is not limited to, a digital circuit module written by a designer in the hardware description language Verilog HDL, or a digital circuit module drawn or written by a designer in software that provides circuit drawing or coding functions. In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one processing module.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
