System, method and device for neural network operation and storage medium

Document No.: 510376  Publication date: 2021-05-28

Reading note: This technology, "System, method and device for neural network operation and storage medium" (一种神经网络运算的系统、方法、装置及存储介质), was designed and created by 刘文峰 on 2019-11-27. Its main content is as follows: The invention provides a system, a method, a device and a storage medium for neural network operation, used to reduce data movement during neural network computation and to improve the operation efficiency of a neural network processor. The neural network operation system comprises at least two neural network processing units, a first storage unit and a second storage unit, wherein: the first storage unit is used for storing input data and output data of the neural network and the operation parameters required by the operation of each layer of the neural network; the second storage unit is used for providing an input buffer and an output buffer for each of the at least two neural network processing units, each neural network processing unit comprises two input buffers and two output buffers, and the two output buffers of one of two adjacent neural network processing units are the two input buffers of the other; the at least two neural network processing units are connected in a ring.

1. A neural network operation system, comprising at least two neural network processing units, a first storage unit and a second storage unit, wherein:

the first storage unit is used for storing input data and output data of the neural network and operation parameters required by operation of each layer of neural network;

the second storage unit is configured to provide an input buffer and an output buffer for each of the at least two neural network processing units, each neural network processing unit includes two input buffers and two output buffers, and two output buffers of one of the two adjacent neural network processing units are two input buffers of the other neural network processing unit;

the at least two neural network processing units are connected in a ring shape, each of the at least two neural network processing units sequentially carries out convolution operation on partial row-column data of each layer of the neural network, and an obtained convolution operation result is stored in an output cache of the neural network processing unit, so that a downstream neural network processing unit adjacent to the neural network processing unit obtains the convolution operation result from the output cache, and convolution operation of a next layer of network is carried out according to the convolution operation result.

2. A neural network operation method applied to a neural network operation system including at least two neural network processing units including a first neural network processing unit and a second neural network processing unit which are adjacent to each other, the method comprising:

the first neural network processing unit sequentially performs convolution operation on partial row-column data of a layer of network, and stores an obtained first convolution operation result to a first output cache of the first neural network processing unit; when the cache switching triggering condition is met, the first neural network processing unit switches the output cache from the first output cache to a second output cache;

and the second neural network processing unit obtains the first convolution operation result from the first output cache, performs convolution operation on the next layer of network adjacent to the layer of network according to the first convolution operation result, and stores the obtained second convolution operation result into one output cache of the second neural network processing unit.

3. The method of claim 2, wherein determining that a cache switch triggering condition is satisfied comprises:

determining that all operations on the partial row-column data have been completed.

4. The method of claim 2, wherein the method further comprises:

after the second neural network processing unit finishes the operation on the first convolution operation result, determining whether the data in the second output cache has completed storage preparation;

if the data in the second output cache has completed storage preparation, switching the input cache of the second neural network processing unit from the first output cache to the second output cache, so as to obtain data from the second output cache for operation;

and if the data in the second output cache has not completed storage preparation, waiting.

5. The method of claim 2, wherein the first neural network processing unit sequentially convolves the partial row and column data of a layer of the network, comprising:

and the first neural network processing unit sequentially performs convolution operation on partial row-column data of a layer of network according to a preset interval step, wherein the preset interval step is determined according to the first output buffer and the second output buffer.

6. The method of any one of claims 2-4, wherein the first neural network processing unit sequentially convolves the partial row-column data of a layer of the network, comprising:

and if the operation is carried out according to rows, the first neural network processing unit carries out convolution operation on all columns of part of the rows and data of all input channels.

7. The method of any one of claims 2-4, wherein the first neural network processing unit sequentially convolves the partial row-column data of a layer of the network, comprising:

and if the operation is carried out according to the columns, the first neural network processing unit carries out convolution operation on all rows of part of the columns and data of all input channels.

8. A neural network operation device, the device comprising:

the first processing module is used for the first neural network processing unit to sequentially carry out convolution operation on partial row-column data of a layer of network and store an obtained first convolution operation result to a first output cache of the first neural network processing unit; when the cache switching triggering condition is met, the first neural network processing unit switches the output cache from the first output cache to a second output cache;

and the second processing module is used for obtaining the first convolution operation result from the first output cache by the second neural network processing unit, performing convolution operation on the next layer of network adjacent to the layer of network according to the first convolution operation result, and storing the obtained second convolution operation result into one output cache of the second neural network processing unit.

9. A computing device, wherein the computing device comprises:

a memory for storing program instructions;

a processor for calling program instructions stored in said memory and for executing the steps comprised in the method of any one of claims 2 to 7 in accordance with the obtained program instructions.

10. A storage medium storing computer-executable instructions for causing a computer to perform the steps comprised in the method of any one of claims 2-7.

Technical Field

The present invention relates to processors, and more particularly, to a system, method, apparatus, and storage medium for neural network operations.

Background

In the process of running a neural network model, a Neural Network Processing Unit (NPU) can provide sufficient computing power for the training and inference of a deep neural network. Because a deep neural network is computed layer by layer and the input and output data volume of each layer is large, considerable memory must be provided to store the intermediate data and weight data. A large amount of resources is therefore wasted moving data between the cache and the memory, which causes the memory-wall problem and reduces the operation efficiency of the processor.

Disclosure of Invention

The embodiment of the application provides a system, a method and a device for neural network operation and a storage medium, which are used for reducing data movement in the neural network calculation process and improving the operation efficiency of a neural network processor.

In a first aspect, a neural network operation system is provided, which includes at least two neural network processing units, a first storage unit and a second storage unit, wherein:

the first storage unit is used for storing input data and output data of the neural network and operation parameters required by operation of each layer of neural network;

the second storage unit is configured to provide an input buffer and an output buffer for each of the at least two neural network processing units, each neural network processing unit includes two input buffers and two output buffers, and two output buffers of one of the two adjacent neural network processing units are two input buffers of the other neural network processing unit;

the at least two neural network processing units are connected in a ring shape, each of the at least two neural network processing units sequentially carries out convolution operation on partial row-column data of each layer of the neural network, and an obtained convolution operation result is stored in an output cache of the neural network processing unit, so that a downstream neural network processing unit adjacent to the neural network processing unit obtains the convolution operation result from the output cache, and convolution operation of a next layer of network is carried out according to the convolution operation result.

In a second aspect, a method for neural network operation is provided, where the method is applied to a neural network operation system, the neural network operation system includes at least two neural network processing units, the at least two neural network processing units include a first neural network processing unit and a second neural network processing unit which are adjacent to each other, and the method includes:

the first neural network processing unit sequentially performs convolution operation on partial row-column data of a layer of network, and stores an obtained first convolution operation result to a first output cache of the first neural network processing unit; when the cache switching triggering condition is met, the first neural network processing unit switches the output cache from the first output cache to a second output cache;

and the second neural network processing unit obtains the first convolution operation result from the first output cache, performs convolution operation on the next layer of network adjacent to the layer of network according to the first convolution operation result, and stores the obtained second convolution operation result into one output cache of the second neural network processing unit.

Optionally, determining that the cache handover triggering condition is met includes:

determining that all operations on the partial row-column data have been completed.

Optionally, the method further includes:

after the second neural network processing unit finishes the operation on the first convolution operation result, determining whether the data in the second output cache has completed storage preparation;

if the data in the second output cache has completed storage preparation, switching the input cache of the second neural network processing unit from the first output cache to the second output cache, so as to obtain data from the second output cache for operation;

and if the data in the second output cache has not completed storage preparation, waiting.

Optionally, the first neural network processing unit sequentially performing convolution operations on partial row-column data of a layer of the network includes:

and the first neural network processing unit sequentially performs convolution operation on partial row-column data of a layer of network according to a preset interval step, wherein the preset interval step is determined according to the first output buffer and the second output buffer.

Optionally, the first neural network processing unit sequentially performing convolution operations on partial row-column data of a layer of the network includes:

and if the operation is carried out according to the rows, the first neural network processing unit carries out convolution operation on all columns of the preset number of rows and data of all input channels.

Optionally, the first neural network processing unit sequentially performing convolution operations on partial row-column data of a layer of the network includes:

and if the operation is carried out according to the columns, the first neural network processing unit carries out convolution operation on all rows of the preset number of columns and data of all input channels.

In a third aspect, a neural network operation device is provided, the device including:

the first processing module is used for the first neural network processing unit to sequentially carry out convolution operation on partial row-column data of a layer of network and store an obtained first convolution operation result to a first output cache of the first neural network processing unit; when the cache switching triggering condition is met, the first neural network processing unit switches the output cache from the first output cache to a second output cache;

and the second processing module is used for obtaining the first convolution operation result from the first output cache by the second neural network processing unit, performing convolution operation on the next layer of network adjacent to the layer of network according to the first convolution operation result, and storing the obtained second convolution operation result into one output cache of the second neural network processing unit.

Optionally, the first processing module is configured to:

determining that all operations on the partial row-column data have been completed.

Optionally, the first processing module is further configured to:

after the second neural network processing unit finishes the operation on the first convolution operation result, determining whether the data in the second output cache has completed storage preparation;

if the data in the second output cache has completed storage preparation, switching the input cache of the second neural network processing unit from the first output cache to the second output cache, so as to obtain data from the second output cache for operation;

and if the data in the second output cache has not completed storage preparation, waiting.

Optionally, the first processing module is configured to:

and the first neural network processing unit sequentially performs convolution operation on partial row-column data of a layer of network according to a preset interval step, wherein the preset interval step is determined according to the first output buffer and the second output buffer.

Optionally, the first processing module is configured to:

and if the operation is carried out according to the rows, the first neural network processing unit carries out convolution operation on all columns of the preset number of rows and data of all input channels.

Optionally, the first processing module is configured to:

and if the operation is carried out according to the columns, the first neural network processing unit carries out convolution operation on all rows of the preset number of columns and data of all input channels.

In a fourth aspect, a computing device is provided, the computing device comprising:

a memory for storing program instructions;

a processor for calling the program instructions stored in the memory and executing the steps included in the method of any of the second aspect according to the obtained program instructions.

In a fifth aspect, there is provided a storage medium storing computer-executable instructions for causing a computer to perform the steps included in the method of any one of the second aspects.

A sixth aspect provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method described in the various possible implementations above.

In an embodiment of the present application, a neural network operation system is provided, which includes at least two neural network processing units, a first storage unit and a second storage unit. The first storage unit is configured to store the input data and output data of the neural network and the operation parameters required by the operation of each layer of the neural network. The second storage unit is configured to provide an input buffer and an output buffer for each of the at least two neural network processing units; each neural network processing unit includes two input buffers and two output buffers, and the two output buffers of one of two adjacent neural network processing units are the two input buffers of the other. The at least two neural network processing units are connected in a ring. Each of the at least two neural network processing units sequentially performs convolution operations on partial row-column data of one layer of the neural network and stores the obtained convolution operation result in an output buffer of that neural network processing unit, so that the adjacent downstream neural network processing unit obtains the convolution operation result from the output buffer and performs the convolution operation of the next layer of the network according to that result.

That is to say, at least two neural network processing units connected in a ring are adopted: each neural network processing unit is responsible for the computation of one layer of the neural network, and the adjacent next neural network processing unit computes the next layer. Each neural network processing unit is provided with two input buffers and two output buffers, and the two output buffers of the previous neural network processing unit are shared as the two input buffers of the adjacent next one. After the current neural network processing unit finishes computing a portion of the row-column data, it switches its output buffer, so that the adjacent next neural network processing unit can take the computed row-column data of the previous layer and start computing the next layer. This inter-layer pipelined parallel operation reduces the movement of intermediate-layer data, realizes pipelined parallel computation of adjacent layers, and improves the utilization rate of the computing resources.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application.

Fig. 1 is a schematic structural diagram of a neural network operation system according to an embodiment of the present disclosure;

fig. 2 is another schematic structural diagram of a neural network operation system according to an embodiment of the present disclosure;

FIG. 3 is a flow chart of a method for neural network operation provided by an embodiment of the present application;

fig. 4 is a flowchart of a specific implementation process of a neural network operation method provided in an embodiment of the present application;

fig. 5 is a block diagram of a neural network operation device according to an embodiment of the present disclosure;

fig. 6 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.

The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof, which are intended to cover non-exclusive protection. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. The "plurality" in the present application may mean at least two, for example, two, three or more, and the embodiments of the present application are not limited.

In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this document generally indicates that the preceding and following related objects are in an "or" relationship unless otherwise specified.

For ease of understanding, the technical background of the embodiments of the present invention will be described below.

As described above, a deep neural network is currently computed layer by layer: generally, the data of one layer is computed in full before the data of the next layer is computed. Therefore, after each layer is computed, all of its data is stored in memory and then read back when the next layer is computed. Since the input and output data volume of each layer is large, a large amount of computing resources is wasted on data transport.

In view of this, in order to improve the utilization rate of computing resources and the operation efficiency of the processor, embodiments of the present application provide a neural network operation system, method and apparatus in which a plurality of Neural Network Processing Units (NPUs) are connected together in a ring. Referring to fig. 1, at least two NPUs are connected in a ring; each NPU is responsible for the computation of one layer of the neural network, and the adjacent processing unit computes the next layer. Each NPU has two input buffers and two output buffers, and the two output buffers of one unit are shared as the two input buffers of the adjacent unit. When the NPU of the previous layer has computed part of the row-column data of its layer, it puts that partial result into one output buffer, switches its current output buffer to the other of the two output buffers, continues to compute the remaining data, and puts the new result into the other output buffer. Meanwhile, the adjacent next NPU reads the data stored in the output buffer filled before the switch and performs the computation of the adjacent next layer. In other words, the second part of the data of the upper layer and the first part of the data of the next layer are computed simultaneously, where the first part of the next layer is computed from the first part of the upper layer. By this inter-layer pipelined parallel operation, the intermediate data storage requirement is greatly reduced, data movement is reduced, waiting time during computation is reduced, and the operation efficiency of the processor is improved.
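
For intuition only, the following Python sketch shows how the row or column batches of successive layers overlap in time under this scheme. It is not part of the patent: the ring size of 4, the batch count of 6 and all names are illustrative assumptions, and the wrap-around of layer 5 back onto NPU1 is ignored.

NUM_NPUS = 4      # assumed ring size, matching the example later in the description
NUM_BATCHES = 6   # assumed number of row/column batches per layer

def pipeline_schedule(num_npus=NUM_NPUS, num_batches=NUM_BATCHES):
    """For each time step, report which (layer, batch) every NPU is working on."""
    steps = []
    for t in range(num_batches + num_npus - 1):
        active = {}
        for npu in range(num_npus):
            batch = t - npu
            if 0 <= batch < num_batches:
                # NPU k owns layer k and, at step t, works on batch t - k + 1 of that layer.
                active[f"NPU{npu + 1}"] = (npu + 1, batch + 1)
        steps.append(active)
    return steps

for t, active in enumerate(pipeline_schedule()):
    busy = ", ".join(f"{npu}: layer {l} batch {b}" for npu, (l, b) in active.items())
    print(f"step {t}: {busy}")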

The technical scheme provided by the embodiment of the application is described in the following with the accompanying drawings of the specification.

Referring to fig. 2, fig. 2 is a diagram of a neural network computing system according to an embodiment of the present disclosure, where the neural network computing system includes a first storage unit 201, a second storage unit 202, and at least two neural network processing units 203.

A first storage unit 201, configured to store input data and output data of a neural network and operation parameters required by operation of each layer of the neural network;

a second storage unit 202, configured to provide an input buffer and an output buffer for each of the at least two neural network processing units, where each neural network processing unit includes two input buffers and two output buffers, and two output buffers of one of the two adjacent neural network processing units are two input buffers of another neural network processing unit;

at least two neural network processing units 203 which are connected in a ring shape, and each of the at least two neural network processing units sequentially performs convolution operation on partial row-column data of each layer of the neural network, and stores the obtained convolution operation result to an output buffer of the neural network processing unit, so that a downstream neural network processing unit adjacent to the neural network processing unit obtains the convolution operation result from the output buffer, and performs convolution operation of the next layer of network according to the convolution operation result.

Based on the same inventive concept, referring to fig. 3, an embodiment of the present application provides a neural network operation method. The flow of the method shown in fig. 3 is described as follows:

step 301: at least two NPUs are connected in a ring.

In the embodiment of the application, at least two NPUs are connected in a ring. One NPU computes one layer of the neural network, and the adjacent NPU computes the next layer. Each NPU includes two input buffers and two output buffers, and the two output buffers of the previous NPU are shared as the two input buffers of the adjacent next NPU. The reason two input buffers and two output buffers are used is that with only one buffer, the output of one layer and the input of the next layer would access it at the same time, requiring arbitration and time-shared access, which affects processing efficiency. Two buffers are therefore designed: when the output of the previous layer has filled one buffer, the NPU of the adjacent next layer takes the data stored in that buffer as input to perform the operation of the next layer, while the output of the previous layer is switched to the other buffer and continues to be written, so that two NPUs never access the same buffer simultaneously. When two adjacent neural network processing units are connected, the buffers are linked head to tail to form a ring structure; for example, the output buffer 1 of the previous neural network processing unit is connected to the input buffer 1 of the next neural network processing unit, and the output buffer 2 of the previous neural network processing unit is connected to the input buffer 2 of the next neural network processing unit.
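
As a minimal data-structure sketch (the class names PingPongBuffer and NPU and all fields below are hypothetical, chosen only to make the sharing explicit), the ring can be modelled so that each NPU's two output buffers are literally the same objects as the next NPU's two input buffers, with the last NPU's outputs wrapping around to the first NPU's inputs:

class PingPongBuffer:
    def __init__(self, name):
        self.name = name
        self.data = None       # payload written by the producer NPU
        self.ready = False     # set once the producer has filled the buffer

class NPU:
    def __init__(self, idx, in_bufs, out_bufs):
        self.idx = idx
        self.in_bufs = in_bufs     # two input buffers, shared with the upstream NPU
        self.out_bufs = out_bufs   # two output buffers, shared with the downstream NPU
        self.cur_in = 0            # index of the input buffer currently being read
        self.cur_out = 0           # index of the output buffer currently being written

def build_ring(num_npus=4):
    # One pair of ping-pong buffers sits between each pair of adjacent NPUs.
    links = [[PingPongBuffer(f"buf{i + 1}{j + 1}") for j in range(2)]
             for i in range(num_npus)]
    npus = []
    for i in range(num_npus):
        out_bufs = links[i]                   # written by this NPU
        in_bufs = links[(i - 1) % num_npus]   # read by this NPU, written by its predecessor
        npus.append(NPU(i + 1, in_bufs, out_bufs))
    return npus

ring = build_ring(4)
assert ring[0].in_bufs is ring[-1].out_bufs   # the ring is closed: NPU4's outputs feed NPU1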

Step 302: the first neural network processing unit sequentially performs convolution operation on partial row-column data of the first layer of network, and stores an obtained first convolution operation result to a first output cache of the first neural network processing unit.

In the embodiment of the application, the intermediate-layer data of the neural network is a three-dimensional array in which each input channel is a two-dimensional array with rows and columns. If only part of the row-column data is to be computed, only that part needs to be loaded; the convolution operation is performed only on the loaded partial row-column data, the operation result is stored in the first output buffer of the current neural network processing unit, and the first neural network processing unit then continues to perform convolution operations on the remaining row-column data in sequence.

In a possible implementation, the first neural network processing unit determines the number of rows or columns loaded each time (the interval step, for example) according to the buffering capacity of the first output buffer and the second output buffer, and then performs the convolution operation on the partial row-column data so determined.
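
One possible sizing rule, sketched here purely as an assumption (the formula, names and numbers below are not taken from the patent), is to pick the largest batch of output rows whose results still fit into one output buffer:

def rows_per_batch(buffer_bytes, out_cols, out_channels, dtype_bytes=1):
    """Largest number of output rows whose results fit into one output buffer."""
    bytes_per_output_row = out_cols * out_channels * dtype_bytes
    return max(1, buffer_bytes // bytes_per_output_row)

# Example: a 64 KiB output buffer, 224 output columns, 32 output channels, int8 results.
print(rows_per_batch(64 * 1024, out_cols=224, out_channels=32))   # -> 9 rows per batch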

In a possible embodiment, the data output mode is calculation by rows (i.e. output by rows): the NPU computes the convolution results of all columns and all input channels for a preset number of rows, and outputs the computation result of that part of the rows to the first output buffer. For example, if the data output mode is output by rows and the preset number of rows computed per batch is 10, the data of all input columns and input channels of rows 1-10 is computed first, and the result for rows 1-10 is then put into the current output buffer. In the output buffer only the row dimension is incomplete; the data in the column and input-channel dimensions is complete. Because the convolution operation has locality, a partial-row input is enough to produce a partial-row output, and it is not necessary to input all rows before computation can start. The partial-row data can therefore already be used by the adjacent NPU to start the convolution operation of the next layer, which improves operation efficiency.

In another possible embodiment, the data output mode is calculation by columns (i.e. output by columns): the NPU computes the convolution results of all rows and all input channels for a preset number of columns, and outputs the computation result of that part of the columns to the first output buffer. For example, if the data output mode is output by columns and the number of columns computed per batch is 10, the data of all input rows and input channels of columns 1-10 is computed first, and the result for columns 1-10 is then put into the current output buffer. In the output buffer only the column dimension is incomplete; the data in the row and input-channel dimensions is complete. Because the convolution operation has locality, a partial-column input is enough to produce a partial-column output, and it is not necessary to input all columns before computation can start. The partial-column data can therefore already be used by the adjacent NPU to start the convolution operation of the next layer, which improves operation efficiency.
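
To make the locality argument concrete, the following NumPy sketch computes only a band of output rows for all columns and channels. It is an illustration under assumed conditions (stride 1, no padding, a single input image), not the patent's implementation; the column-wise case is the same with rows and columns exchanged.

import numpy as np

def conv_rows(inp, weights, row_start, row_count):
    """inp: (C_in, H, W) feature map; weights: (C_out, C_in, K, K) kernels.
    Computes only output rows [row_start, row_start + row_count) for every
    column and output channel -- a band a downstream layer can already consume."""
    c_out, c_in, k, _ = weights.shape
    _, h, w = inp.shape
    out_w = w - k + 1
    rows = min(row_count, h - k + 1 - row_start)
    out = np.zeros((c_out, rows, out_w), dtype=np.float32)
    for r in range(rows):
        # Each output row touches only k consecutive input rows: the locality relied on above.
        band = inp[:, row_start + r: row_start + r + k, :]
        for c in range(out_w):
            patch = band[:, :, c:c + k]   # (C_in, K, K)
            out[:, r, c] = np.tensordot(weights, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

# Toy usage: 3 input channels, an 8x8 map, 4 output channels, 3x3 kernels, first 2 output rows.
x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(conv_rows(x, w, row_start=0, row_count=2).shape)   # -> (4, 2, 6)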

Step 303: it is determined whether a cache switch triggering condition is satisfied.

In a possible embodiment, the cache switch trigger condition is satisfied when the first neural network processing unit has completed all operations on the partial row-column data. For example, if only 10 rows or 10 columns of data are loaded, the condition is satisfied once all operations on the current 10 rows or 10 columns have been completed.

Step 304: and when the cache switching triggering condition is met, the first neural network processing unit switches the output cache from the first output cache to the second output cache.

In this embodiment of the present application, after the first neural network processing unit has computed the data of part of the rows and columns, it switches from the current first output buffer to the second output buffer, and the data in the just-filled output buffer is shared with the input buffer of the neural network processing unit of the next layer.

In a possible embodiment, when the data output mode is output by row or by column, the neural network processing unit switches the current output buffer to the other output buffer once the current batch of data has been computed (for example, once the data of rows 1-10 or columns 1-10 has been computed), continues to compute all the data of rows 11-20 or columns 11-20 after the switch, and then puts the computation result for rows 11-20 or columns 11-20 into the output buffer in use after the switch (the second output buffer).
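
The producer-side switch can be pictured with the hypothetical NPU and PingPongBuffer classes from the ring sketch above; this is again an assumption-level illustration, not the patent's control logic:

def produce_batch(npu, batch_result):
    buf = npu.out_bufs[npu.cur_out]
    buf.data = batch_result   # convolution result of the current row/column batch
    buf.ready = True          # the whole batch is stored -> switch condition satisfied
    npu.cur_out ^= 1          # ping-pong: switch from buffer 0 to 1, or from 1 to 0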

Step 305: the second neural network processing unit obtains the first convolution operation result from the first output cache and, according to the first convolution operation result, performs the convolution operation of the next layer of the network adjacent to the first layer.

In this embodiment, the second neural network processing unit obtains the first convolution operation result for the partial row-column data from the first output buffer, and performs the convolution operation of its own layer according to that result.

Step 306: the obtained second convolution operation result is stored in an output buffer of the second neural network processing unit.

In this embodiment, the second neural network processing unit stores the second convolution operation result, obtained by operating on the first convolution operation result in its own layer, into one of its output buffers.

In a possible implementation, after the second neural network processing unit completes the operation on the first convolution operation result, it checks whether the data in the second output buffer of the first neural network processing unit has completed storage preparation. If the data in the second output buffer is ready, the input buffer of the second neural network processing unit is switched from the first output buffer to the second output buffer, so that data can be obtained from the second output buffer for operation. If the data in the second output buffer is not yet ready, the second neural network processing unit waits; once the data in the second output buffer has completed storage preparation, its input buffer is switched from the first output buffer to the second output buffer so that data can be obtained from the second output buffer and the operation can continue.
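
The consumer side can be sketched in the same hypothetical setting; the polling wait below merely stands in for whatever stall or handshake a real NPU would use:

import time

def next_input(npu, poll_interval=0.001):
    other = npu.cur_in ^ 1
    while not npu.in_bufs[other].ready:     # storage preparation not yet complete
        time.sleep(poll_interval)           # wait until the upstream NPU has filled it
    npu.in_bufs[npu.cur_in].ready = False   # release the buffer that was just consumed
    npu.cur_in = other                      # switch to the freshly filled buffer
    return npu.in_bufs[other].data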

Specifically, as shown in fig. 4, suppose there are 4 NPUs in the neural network system, and a secondary cache or a random access memory is used to store the input/output data of the neural network and the operation parameters required by the neural network processing units during operation. NPU1 computes the data of layer 1 of the neural network, NPU2 computes layer 2, NPU3 computes layer 3, and NPU4 computes layer 4. The output buffer 11 and the output buffer 12 are the two output buffers of NPU1. The output buffer 11 of NPU1 is shared with the input buffer 21 of NPU2, that is, they are the same buffer; the output buffer 12 of NPU1 and the input buffer 22 of NPU2 are the same buffer; the output buffer 21 of NPU2 and the input buffer 31 of NPU3 are the same buffer; the output buffer 22 of NPU2 and the input buffer 32 of NPU3 are the same buffer; the output buffer 31 of NPU3 and the input buffer 41 of NPU4 are the same buffer; the output buffer 32 of NPU3 and the input buffer 42 of NPU4 are the same buffer; the output buffer 41 of NPU4 and the input buffer 11 of NPU1 are the same buffer; and the output buffer 42 of NPU4 and the input buffer 12 of NPU1 are the same buffer.

During operation, NPU2 obtains input data from its two input buffers in turn. Taking 10 rows of data computed per batch as an example: NPU1 computes the data of rows 1-10 and stores the result in the output buffer 11; NPU2 obtains data from the input buffer 21 (i.e. the output buffer 11); at this time NPU1 switches its output buffer to the output buffer 12, computes the data of rows 11-20, and stores the result in the output buffer 12. When the data in the input buffer 21 has been used up, NPU2 switches to the input buffer 22 to continue obtaining data; if the data in the output buffer 12 is not yet ready, NPU2 waits until it is, then performs the convolution computation of the second batch of its layer, while NPU1 switches from the output buffer 12 back to the output buffer 11. The two input buffers and two output buffers of each neural network processing unit alternate in this way until all the data computation of the layer is complete. After NPU1 has completed all the data computation of layer 1 of the neural network, it waits for the output data of layer 4 in order to compute layer 5 of the neural network. In this way, the parallelism of computing adjacent convolution layers is exploited so that the data of each layer is shared while being computed, the waiting time for each layer's data to be fully computed is saved, and the computation efficiency of the processor is improved.
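
Purely for illustration, the following sequential toy run strings the hypothetical helpers above (build_ring, produce_batch, next_input) together for this 4-NPU example; each "layer" simply adds 1 to its inputs so the movement of the two batches through the shared buffers is easy to trace, whereas real NPUs would run concurrently:

ring = build_ring(4)

# NPU1 computes rows 1-10 of layer 1 into output buffer 11 (= NPU2's input buffer 21).
produce_batch(ring[0], [r + 1 for r in range(1, 11)])
rows_1_10_layer1 = ring[1].in_bufs[ring[1].cur_in].data   # NPU2 can start layer 2 at once

# Meanwhile NPU1 has switched to output buffer 12 and computes rows 11-20 of layer 1.
produce_batch(ring[0], [r + 1 for r in range(11, 21)])

# NPU2 finishes layer 2 on rows 1-10, writes its output buffer 21 (= NPU3's input buffer 31),
# then switches its input buffer from 21 to 22 to pick up rows 11-20.
produce_batch(ring[1], [x + 1 for x in rows_1_10_layer1])
rows_11_20_layer1 = next_input(ring[1])
print(ring[1].out_bufs[0].data[:3], rows_11_20_layer1[:3])   # [3, 4, 5] [12, 13, 14]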

The division of the modules in the embodiments of the present application is schematic, and only one logical function division is provided, and in actual implementation, there may be another division manner, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, may also exist alone physically, or may also be integrated in one module by two or more modules. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

Based on the same inventive concept, the embodiment of the present application provides a device for neural network operation, which can implement the corresponding function of the foregoing method for neural network operation. The neural network operation device can be a hardware structure, a software module or a hardware structure and a software module. The device for neural network operation can be realized by a chip system, and the chip system can be formed by a chip and can also comprise the chip and other discrete devices. Referring to fig. 5, the neural network computing device includes a first processing module 501 and a second processing module 502. Wherein:

a first processing module 501, configured for a first neural network processing unit to sequentially perform convolution operations on partial row-column data of a layer of the network and store the obtained first convolution operation result in a first output cache of the first neural network processing unit; when the cache switching trigger condition is met, the first neural network processing unit switches the output cache from the first output cache to a second output cache;

the second processing module 502 is configured for a second neural network processing unit to obtain the first convolution operation result from the first output cache, perform the convolution operation of the next layer of the network adjacent to that layer according to the first convolution operation result, and store the obtained second convolution operation result in an output cache of the second neural network processing unit.

In one possible implementation, the first processing module 501 is configured to:

determining that all operations on the partial row-column data have been completed.

In a possible implementation, the first processing module 501 is further configured to:

after the second neural network processing unit finishes the operation on the first convolution operation result, determining whether the data in the second output cache has completed storage preparation;

if the data in the second output cache has completed storage preparation, switching the input cache of the second neural network processing unit from the first output cache to the second output cache, so as to obtain data from the second output cache for operation;

and if the data in the second output cache has not completed storage preparation, waiting.

In one possible implementation, the first processing module 501 is configured to:

and the first neural network processing unit sequentially performs convolution operation on partial row-column data of a layer of network according to a preset interval step, wherein the preset interval step is determined according to the first output buffer and the second output buffer.

In one possible implementation, the first processing module 501 is configured to:

and if the operation is carried out according to the rows, the first neural network processing unit carries out convolution operation on all columns of the preset number of rows and data of all input channels.

In one possible implementation, the first processing module 501 is configured to:

and if the operation is carried out according to the columns, the first neural network processing unit carries out convolution operation on all rows of the preset number of columns and data of all input channels.

Based on the same inventive concept, the embodiment of the application provides a computing device. Referring to fig. 6, the computing device includes at least one processor 601 and a memory 602 connected to the at least one processor. The specific connection medium between the processor 601 and the memory 602 is not limited in this application; in fig. 6 the processor 601 and the memory 602 are connected by a bus 600 as an example. The bus 600 is represented by a thick line in fig. 6, and the connection manner between the other components is only schematically illustrated and is not limited. The bus 600 may be divided into an address bus, a data bus, a control bus, etc.; it is shown with only one thick line in fig. 6 for ease of illustration, but this does not mean there is only one bus or one type of bus.

In the embodiment of the present application, the memory 602 stores instructions executable by the at least one processor 601, and the at least one processor 601 may execute the steps included in the method of the foregoing computing device by executing the instructions stored in the memory 602.

The processor 601 is the control center of the computing device. It can connect the various parts of the entire computing device by using various interfaces and lines, and, by running or executing instructions stored in the memory 602 and calling up data stored in the memory 602, perform the various functions of the computing device and process data, thereby monitoring the computing device as a whole. Optionally, the processor 601 may include one or more processing units, and the processor 601 may integrate an application processor, which mainly handles the operating system and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 601. In some embodiments, the processor 601 and the memory 602 may be implemented on the same chip; in other embodiments, they may be implemented separately on their own chips.

The processor 601 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for neural network operations disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.

The memory 602, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 602 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 602 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 602 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.

By programming the processor 601, the code corresponding to the method of the computing device described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the method of the computing device when running, and how to program the processor 601 is a technique known by those skilled in the art and is not described herein again.

Based on the same inventive concept, the present application also provides a storage medium, such as a computer-readable storage medium, which stores computer instructions that, when executed on a computer, cause a computing device (e.g., a computer) to perform the steps of the method for neural network operation as described above.

In some possible embodiments, various aspects of the method of neural network operation provided herein may also be implemented in the form of a program product comprising program code; when the program product is run on a computing device, the program code causes the computing device to perform the steps of the method of neural network operation according to the various exemplary embodiments of the present application described above in this specification.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data computing device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data computing device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data computing device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data computing device to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
