Multi-bit multiplexing multiply-add operation device, neural network operation system, and electronic apparatus

This technology, "Multi-bit multiplexing multiply-add operation device, neural network operation system, and electronic apparatus", was created by 邢小地, 孙旭光 and 王绍迪 on 2020-06-02. Abstract: The invention provides a multi-bit multiplexing multiply-add operation device, a neural network operation system, and an electronic device. The multi-bit multiplexing multiply-add operation device comprises two multi-bit multiplexing multiplication modules, a shift register module, and an accumulation module; the output end of one multi-bit multiplexing multiplication module is connected with the first input end of the accumulation module, the output end of the other multi-bit multiplexing multiplication module is connected with the input end of the shift register module, and the output end of the shift register module is connected with the second input end of the accumulation module. Through the use of the shift register module and the accumulation module, the two multi-bit multiplexing multiplication modules cooperate to realize the multi-bit multiplexing multiply-add operation; elements are reused, power consumption is saved, and the adoption of neural network algorithms is effectively promoted. The device is particularly suitable for sparse neural network operations.

1. A multi-bit multiplexing multiply-add operation device, comprising: two multi-bit multiplexing multiplication modules, a shift register module and an accumulation module;

the output end of one multi-bit multiplexing multiplication module is connected with the first input end of the accumulation module, the output end of the other multi-bit multiplexing multiplication module is connected with the input end of the shift register module, and the output end of the shift register module is connected with the second input end of the accumulation module.

2. The multi-bit multiplexing multiply-add operation device of claim 1, wherein the multi-bit multiplexing multiplication module comprises: two multipliers, a shift register and an accumulator;

the output end of one multiplier is connected with the first input end of the accumulator, the output end of the other multiplier is connected with the input end of the shift register, and the output end of the shift register is connected with the second input end of the accumulator.

3. The multi-bit multiplexing multiply-add operation device according to claim 2, wherein a circuit configuration of the multiplier is the same as a circuit configuration of the multi-bit multiplexing multiplication module.

4. The multi-bit multiplexing multiply-add operation device according to claim 1, wherein the multi-bit multiplexing multiply-add operation device is configured to perform an a × b multiply-add operation, the two multi-bit multiplexing multiplication modules are respectively configured to perform an a × c multiplication operation and an a × d multiplication operation, and the shift register module is a c-bit shift register or a d-bit shift register;

wherein c + d = b.

5. The multi-bit multiplexing multiply-add operation device according to claim 4, wherein a = 8, b = 8, and c = d = 4, and the multi-bit multiplexing multiplication module comprises: two 8x2 multipliers, a 2-bit shift register, and an accumulator;

the output end of one 8x2 multiplier is connected with the first input end of the accumulator, the output end of the other 8x2 multiplier is connected with the input end of the 2-bit shift register, the output end of the 2-bit shift register is connected with the second input end of the accumulator, and the output end of the accumulator is used as the output end of the multi-bit multiplexing multiplication module.

6. The multi-bit multiplexing multiply-add operation device of claim 5, wherein the operation modes of the device comprise: an 8 × 8 mode, an 8 × 4 mode and an 8 × 2 mode, and each component is controlled by a mode selection signal to switch between the modes;

wherein the mode selection signal is determined according to the number of significant bits of the multiplier.

7. The multi-bit multiplexing multiply-add operation device according to any one of claims 1 to 6, applied to a neural network operation.

8. A neural network computing system, comprising: a storage and computation integrated operation device, a multi-bit multiplexing multiply-add operation device according to any one of claims 1 to 6, a shift register device, and an accumulation device;

the input end of the storage and computation integrated operation device receives input data, and the output end of the storage and computation integrated operation device is connected with the first input end of the accumulation device; the input end of the multi-bit multiplexing multiply-add operation device receives the input data, the output end of the multi-bit multiplexing multiply-add operation device is connected with the input end of the shift register device, and the output end of the shift register device is connected with the second input end of the accumulation device.

9. An electronic device comprising the multi-bit multiplexing multiply-add operation device according to any one of claims 1 to 7 or the neural network operation system according to claim 8.

Technical Field

The invention relates to the field of artificial intelligence, in particular to a multi-bit multiplexing multiply-add operation device, a neural network operation system and an electronic device.

Background

Artificial Neural Networks (ANNs), also referred to as Neural Networks (NNs) or connection models, are algorithmic models that mimic the behavioral characteristics of biological neural networks and perform distributed parallel information processing. Depending on the complexity of the system, such a network achieves the goal of processing information by adjusting the interconnections among a large number of internal nodes.

Artificial neural networks are widely used in fields such as intelligent control, pattern recognition and image/speech processing. As models become increasingly complex and the number of parameters grows, the amount of computation increases accordingly and hardware power consumption becomes excessive.

Disclosure of Invention

In view of the problems in the prior art, the present invention provides a multi-bit multiplexing multiply-add operation device, a neural network operation system and an electronic device that can at least partially solve the problems in the prior art and make full use of the advantages of the storage and computation integrated (compute-in-memory) technology and the sparsity of neural networks.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, a multi-bit multiplexing multiply-add operation apparatus is provided, including: two multi-bit multiplexing multiplication modules, a shift register module and an accumulation module;

the output end of one multi-bit multiplexing multiplication module is connected with the first input end of the accumulation module, the output end of the other multi-bit multiplexing multiplication module is connected with the input end of the shift register module, and the output end of the shift register module is connected with the second input end of the accumulation module.

Further, the multi-bit multiplexing multiplication module includes: two multipliers, a shift register and an accumulator;

the output end of one multiplier is connected with the first input end of the accumulator, the output end of the other multiplier is connected with the input end of the shift register, and the output end of the shift register is connected with the second input end of the accumulator.

Further, the circuit structure of the multiplier is the same as that of the multi-bit multiplexing multiplication module.

Furthermore, the multi-bit multiplexing multiply-add operation device is used for performing an a × b multiply-add operation, the two multi-bit multiplexing multiplication modules are respectively used for performing an a × c multiplication operation and an a × d multiplication operation, and the shift register module is a c-bit shift register or a d-bit shift register;

wherein c + d = b.

Further, when a = 8, b = 8, and c = d = 4, the multi-bit multiplexing multiplication module includes: two 8x2 multipliers, a 2-bit shift register, and an accumulator;

the output end of one 8x2 multiplier is connected with the first input end of the accumulator, the output end of the other 8x2 multiplier is connected with the input end of the 2-bit shift register, the output end of the 2-bit shift register is connected with the second input end of the accumulator, and the output end of the accumulator is used as the output end of the multi-bit multiplexing multiplication module.

Further, the operation modes include: an 8 × 8 mode, an 8 × 4 mode and an 8 × 2 mode, and each component is controlled by a mode selection signal to switch between the modes;

wherein the mode selection signal is determined according to the number of significant bits of the multiplier.

Further, the multi-bit multiplexing multiply-add operation device is applied to convolution operation.

In a second aspect, a neural network computing system is provided, including: a storage and computation integrated operation device, the multi-bit multiplexing multiply-add operation device described above, a shift register device and an accumulation device;

the input end of the storage and computation integrated operation device receives input data, and the output end of the storage and computation integrated operation device is connected with the first input end of the accumulation device; the input end of the multi-bit multiplexing multiply-add operation device receives the input data, the output end of the multi-bit multiplexing multiply-add operation device is connected with the input end of the shift register device, and the output end of the shift register device is connected with the second input end of the accumulation device.

In a third aspect, an electronic device is provided, which includes the multi-bit multiplexing multiply-add operation device or the neural network operation system.

The embodiment of the invention provides a multi-bit multiplexing multiply-add operation device, a neural network operation system and an electronic device. The multi-bit multiplexing multiply-add operation device includes: two multi-bit multiplexing multiplication modules, a shift register module and an accumulation module; the output end of one multi-bit multiplexing multiplication module is connected with the first input end of the accumulation module, the output end of the other multi-bit multiplexing multiplication module is connected with the input end of the shift register module, and the output end of the shift register module is connected with the second input end of the accumulation module. Through the use of the shift register module and the accumulation module, the two multi-bit multiplexing multiplication modules cooperate to realize the multi-bit multiplexing multiply-add operation; elements are reused, power consumption is saved, and the adoption of neural network algorithms is effectively promoted. The device is particularly suitable for sparse neural network operations.

In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a block diagram of a multi-bit multiplexing multiply-add operation device according to an embodiment of the present invention;

FIG. 2 is a block diagram of a multi-bit multiplexing multiplier according to an embodiment of the present invention;

FIG. 3 is a diagram of an embodiment of a multi-bit multiplexing 8 × 8 multiply-add operation device according to the present invention;

FIG. 4 illustrates the ports of the multi-bit multiplexing 8 × 8 multiply-add operation device of FIG. 3;

FIG. 5 is a block diagram showing the structure of a neural network operation system in the embodiment of the present invention;

FIG. 6 illustrates a neural network sparse matrix in an embodiment of the present invention;

FIG. 7 shows a frame-structure transmission scheme for the sparse matrix of FIG. 6.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The detailed features and advantages of the present invention are described in the following embodiments in sufficient detail to enable anyone skilled in the art to understand the technical content of the present invention and to implement it accordingly. The related objects and advantages of the present invention can be readily understood by anyone skilled in the art from the disclosure, the claims and the drawings of the present specification. The following examples further illustrate aspects of the present invention in detail, but are not intended to limit the scope of the invention in any way.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

FIG. 1 is a block diagram of a multi-bit multiplexing multiply-add operation device according to an embodiment of the present invention; as shown in fig. 1, the multi-bit multiplexing multiply-add operation device includes: a multi-bit multiplexing multiplication module 1, a multi-bit multiplexing multiplication module 2, a shift register module 3 and an accumulation module 4;

the output end of the multi-bit multiplexing multiplication module 2 is connected to the first input end of the accumulation module 4, the output end of the multi-bit multiplexing multiplication module 1 is connected to the input end of the shift register module 3, and the output end of the shift register module 3 is connected to the second input end of the accumulation module 4.

Specifically, the multi-bit multiplexing multiplication modules 1 and 2 may be implemented by ordinary multipliers or by the multi-bit multiplexing multipliers described below, and the accumulation module may be implemented by an accumulator. The first input end of the multi-bit multiplexing multiplication module 2 receives multiplier 1 and its second input end receives the lower n bits of multiplier 2, and the module multiplies multiplier 1 by the lower n bits of multiplier 2. The first input end of the multi-bit multiplexing multiplication module 1 receives multiplier 1 and its second input end receives the upper m bits of multiplier 2, and the module multiplies multiplier 1 by the upper m bits of multiplier 2. The shift register module 3 may be implemented by a shift register that shifts left by n bits; it shifts the output of the multi-bit multiplexing multiplication module 1 left by n bits. The accumulation module 4 accumulates the product of multiplier 1 and the lower n bits of multiplier 2 output by module 2 with the shifted product of multiplier 1 and the upper m bits of multiplier 2, thereby obtaining the product of multiplier 1 and multiplier 2; the device as a whole can therefore be used as a multiplier.

Here m and n may be equal or different, and multiplier 2 is an (m + n)-bit binary number.
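
As a minimal illustration of this principle (an illustrative sketch only, not part of the patent text; the function name and parameters are assumptions), the following Python code assembles the product of multiplier 1 and an (m + n)-bit multiplier 2 from the two partial products, the left shift by n bits, and the accumulation described above:

    def split_multiply(a, b, n, m):
        """Compute a * b by splitting b into an upper m-bit part and a lower n-bit part.

        Module 2 multiplies a by the lower n bits of b, module 1 multiplies a by the
        upper m bits of b, the shift register shifts the latter product left by n bits,
        and the accumulator adds the two partial products.
        """
        assert 0 <= b < (1 << (m + n)), "b must be an (m + n)-bit unsigned number"
        b_low = b & ((1 << n) - 1)     # lower n bits of multiplier 2
        b_high = b >> n                # upper m bits of multiplier 2
        product_low = a * b_low        # output of multiplication module 2
        product_high = a * b_high      # output of multiplication module 1
        shifted = product_high << n    # shift register module 3: left shift by n bits
        return product_low + shifted   # accumulation module 4

    # Example with 8-bit operands and n = m = 4:
    assert split_multiply(0b10110101, 0b01101100, n=4, m=4) == 0b10110101 * 0b01101100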

It should be noted that many applications, such as neural network operations (especially convolution operations), require multiply-add operations, and the embodiment of the present invention can implement them well. For example, to implement the operation a1 × b1 + a2 × b2 + a3 × b3, it is only necessary to input a1 and b1 into the multi-bit multiplexing multiply-add operation device, then input a2 and b2, and then input a3 and b3; the accumulation module 4 then accumulates the result of a1 × b1, the result of a2 × b2 and the result of a3 × b3 together to implement the multiply-add operation. The computation of the result of a1 × b1 is as described above and is not repeated here.

It should also be noted that, for a multiplier in which only some of the lower bits are valid, for example "00000110", the upper bits are 0 and need not be considered during calculation; only the lower bits need to participate. Assuming n is 3, 4, 5 or 6 (i.e., the lower n bits cover all of the significant bits), only the multi-bit multiplexing multiplication module 2 needs to operate and the multi-bit multiplexing multiplication module 1 does not need to participate. In this case the mode selection signal controls module 1 to stand by while module 2 performs the multiplication, which reduces the operating time of module 1, saves hardware resources and reduces power consumption.

In the multi-bit multiplexing multiply-add operation device provided by the embodiment of the invention, through the use of the shift register module and the accumulation module, the two multi-bit multiplexing multiplication modules cooperate to realize the multi-bit multiplexing multiply-add operation. The device can be matched to various bit widths according to the operation data, that is, it adapts to the actual bit width of a sparse neural network matrix, which saves power consumption, improves applicability and chip performance, and effectively promotes the adoption of neural network algorithms; it is particularly suitable for sparse neural network operations.

In an alternative embodiment, the multi-bit multiplexing multiplication module may be implemented with a multi-bit multiplexing multiplication structure whose principle is the same as that of the multi-bit multiplexing multiply-add operation device: several multipliers operate on different bit fields of one of the operands, and a shift register and an accumulator are then used to combine the partial products into the full multiplication result. Specifically, referring to FIG. 2, the multi-bit multiplexing multiplication module includes: a multiplier 1a, a multiplier 1b, a shift register 1c and an accumulator 1d;

the output terminal of the multiplier 1b is connected to the first input terminal of the accumulator 1d, the output terminal of the multiplier 1a is connected to the input terminal of the shift register 1c, and the output terminal of the shift register 1c is connected to the second input terminal of the accumulator 1d. The operation and implementation principle are as described above and are not repeated here.

It should be noted that the multiplier 1a and the multiplier 1b may be implemented by ordinary multipliers or by the multi-bit multiplexing structure itself (see FIG. 2), so that a large-scale, complex multiply-add operation device can be realized by cascading a number of smaller multipliers. For example, a 64 × 64 multi-bit multiplexing multiply-add operation device can be implemented with 32 64 × 2 multipliers, 16 64 × 4 multipliers or 8 64 × 8 multipliers, combined with the required number of shift registers and accumulators.
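
A non-authoritative sketch of this cascading idea (the function name, argument names and slice-wise loop are assumptions made for illustration, not the patented circuit): an a × b multiplication can be composed from narrower a × k multiplier units by shifting each partial product into position and accumulating, which is how, for example, 64 × 2 units can be combined into a 64 × 64 multiply-add operation device.

    def cascaded_multiply(a, b, b_bits, unit_bits):
        """Compose an (a x b_bits) multiplication from (a x unit_bits) multiplier units.

        Each unit multiplies a by one unit_bits-wide slice of b; the slice products are
        shifted into position and accumulated, mimicking a tree of shift registers and
        accumulators.
        """
        assert b_bits % unit_bits == 0
        total = 0
        for i in range(b_bits // unit_bits):
            b_slice = (b >> (i * unit_bits)) & ((1 << unit_bits) - 1)  # one slice of b
            partial = a * b_slice                                      # one a x unit_bits multiplier unit
            total += partial << (i * unit_bits)                        # shift into position and accumulate
        return total

    # A 64 x 64 multiplication assembled from 32 slices handled by 64 x 2 units:
    assert cascaded_multiply(123456789, 987654321, b_bits=64, unit_bits=2) == 123456789 * 987654321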

In an alternative embodiment, the multi-bit multiplexing multiply-add operation device is used for performing an a × b multiply-add operation, where a and b respectively denote the bit widths of the two operands; the two multi-bit multiplexing multiplication modules are respectively used for performing an a × c multiplication operation and an a × d multiplication operation, and the shift register module is a c-bit shift register or a d-bit shift register;

where c + d = b, and c and d may be equal or different.

For example, assume a = 8, b = 8 and c = d = 4, and refer to FIG. 3. For an 8x8 multiply-add operation device, an 8x2 multiplier is built first, and two 8x2 multipliers are then used to build an 8x4 multiplier, where one 8x2 multiplier handles the lower 2 bits and the other 8x2 multiplier handles the upper 2 bits, and the partial results are shifted and accumulated. Two 8x4 multipliers are then used to build an 8x8 multiplier, where one 8x4 multiplier handles the lower 4 bits and the other 8x4 multiplier handles the upper 4 bits, and the partial results are again shifted and accumulated to form the final result.

Specifically, wt[7:0] denotes weight data with a bit width of 8 bits, dat[7:0] denotes input data with a bit width of 8 bits, "<<" denotes a left shift, and the summation symbol denotes accumulation.

An 8x4 multiplier 20 is used to calculate product 1 of the lower 4 bits of wt[7:0] and dat[7:0], and an 8x4 multiplier 10 is used to calculate product 2 of the upper 4 bits of wt[7:0] and dat[7:0]. Product 2 is shifted left by 4 bits by a shift register 30 to obtain result 1, and an accumulator accumulates product 1 and result 1.

It can be understood by those skilled in the art that when a multiply-add operation is required, the 8 × 8 multiplier first performs the multiplication of the first pair of operands; the second pair of operands is then input, and the accumulator accumulates the result of the first round with the result of the second round to complete the multiply-add operation.

The 8x2 multiplier 22 is used to calculate product 3 of the first and second bits of wt[7:0] (i.e., wt[1:0]) and dat[7:0], and the 8x2 multiplier 21 is used to calculate product 4 of the third and fourth bits of wt[7:0] (i.e., wt[3:2]) and dat[7:0]. Product 4 is shifted left by 2 bits by the shift register 23 to obtain result 2, and the accumulator 24 accumulates product 3 and result 2 to obtain product 1, the product of the lower 4 bits of wt[7:0] and dat[7:0].

The 8x2 multiplier 12 is used to calculate product 5 of the fifth and sixth bits of wt[7:0] (i.e., wt[5:4]) and dat[7:0], and the 8x2 multiplier 11 is used to calculate product 6 of the seventh and eighth bits of wt[7:0] (i.e., wt[7:6]) and dat[7:0]. Product 6 is shifted left by 2 bits by the shift register 13 to obtain result 3, and the accumulator 14 accumulates product 5 and result 3 to obtain product 2, the product of the upper 4 bits of wt[7:0] and dat[7:0].
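
For readability, the following Python sketch mirrors the hierarchy described above and in FIG. 3 (it is an illustrative model under the wiring just described, not the patented circuit; the function names are assumptions): four 8x2 partial-product units, a 2-bit shift and accumulation inside each 8x4 stage, and a 4-bit shift and accumulation between the two 8x4 stages.

    def mul_8x2(dat, wt2):
        """One 8x2 multiplier unit: multiply 8-bit input data by a 2-bit weight slice."""
        return dat * (wt2 & 0b11)

    def mul_8x4(dat, wt4):
        """One 8x4 stage: two 8x2 units, a 2-bit shift register and an accumulator."""
        low = mul_8x2(dat, wt4 & 0b11)           # lower 2 bits (e.g. multiplier 22)
        high = mul_8x2(dat, (wt4 >> 2) & 0b11)   # upper 2 bits (e.g. multiplier 21)
        return low + (high << 2)                 # 2-bit shift register + accumulator

    def mul_8x8(dat, wt):
        """The 8x8 multiplier of FIG. 3: two 8x4 stages, a 4-bit shift and a final accumulation."""
        product_1 = mul_8x4(dat, wt & 0xF)          # lower 4 bits of wt (module 20)
        product_2 = mul_8x4(dat, (wt >> 4) & 0xF)   # upper 4 bits of wt (module 10)
        result_1 = product_2 << 4                   # shift register 30
        return product_1 + result_1                 # final accumulation

    assert mul_8x8(0xB7, 0x5C) == 0xB7 * 0x5C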

It will be understood by those skilled in the art that when only the lower 2 bits of wt[7:0] are valid and the upper 6 bits are invalid, such as "00000001" or "00000010", only the 8x2 multiplier 22 needs to participate in the operation, and the 8x2 multiplier 21, the 8x2 multiplier 11, the 8x2 multiplier 12, the shift register 23, the shift register 13, the shift register 30 and the accumulator 14 do not need to participate. When the lower 4 bits of wt[7:0] are valid and the upper 4 bits are invalid, such as "00001001" or "00001010", only the 8x4 multiplier 20 needs to participate, and neither the 8x4 multiplier 10 nor the shift register 30 needs to participate.

That is, the operation modes of the device shown in FIG. 3 include: an 8 × 8 mode (in which the 8x4 multiplier 10 and the 8x4 multiplier 20 both participate), an 8 × 4 mode (in which the 8x4 multiplier 10 and the shift register 30 do not need to participate) and an 8 × 2 mode (in which the 8x2 multiplier 21, the 8x2 multiplier 11, the 8x2 multiplier 12, the shift register 23, the shift register 13, the shift register 30 and the accumulator 14 do not need to participate). The operation mode is switched by controlling each element with the mode selection signal sel; each element in the circuit (e.g., multiplier, shift register, accumulator) has a control terminal that receives the mode selection signal sel, and whether the element operates depends on this signal.

Wherein the mode select signal sel is determined according to the number of significant bits of the multiplier.

Specifically, referring to FIG. 4, sel is the mode selection signal and wt is the weight data input: in the 8x8 mode wt represents one parameter; in the 8x4 mode the lower and upper 4 bits respectively represent two parameters; and in the 8x2 mode wt represents four parameters. dat0/1/2/3 are the data inputs, representing the same data. The final result is output as follows:

When sel = 0 (8 × 8 mode): result = wt * dat0;

When sel = 1 (8 × 4 mode): result = wt[3:0] * dat0 + wt[7:4] * dat1;

When sel = 2 (8 × 2 mode): result = wt[1:0] * dat0 + wt[3:2] * dat1 + wt[5:4] * dat2 + wt[7:6] * dat3;

sel is generated by the chip state machine from the input data.
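
The three output equations above can be summarized in the following non-authoritative Python sketch (the function name and the way sel is passed are illustrative assumptions; in the actual circuit sel comes from the chip state machine):

    def mac_modes(sel, wt, dat0, dat1=0, dat2=0, dat3=0):
        """Evaluate the result equations of the 8x8 / 8x4 / 8x2 modes selected by sel."""
        if sel == 0:   # 8x8 mode: wt is a single 8-bit parameter
            return wt * dat0
        if sel == 1:   # 8x4 mode: wt carries two 4-bit parameters
            return (wt & 0xF) * dat0 + ((wt >> 4) & 0xF) * dat1
        if sel == 2:   # 8x2 mode: wt carries four 2-bit parameters
            return ((wt & 0x3) * dat0 + ((wt >> 2) & 0x3) * dat1
                    + ((wt >> 4) & 0x3) * dat2 + ((wt >> 6) & 0x3) * dat3)
        raise ValueError("sel must be 0, 1 or 2")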

For better understanding of the present application, the present invention will be described in detail by taking convolution operation as an example:

The core of the convolution operation is a matrix multiply-add operation. For a 2 × 2 convolution kernel, the core operation is a1 × b1 + a2 × b2 + a3 × b3 + a4 × b4. Assume that a1, b1, a2, b2, a3, b3, a4 and b4 are 8-bit binary numbers, and refer to FIG. 3. First, a1 and b1 are input to the 8 × 8 multiplier. Assuming the lower 4 bits of b1 are valid, the sel signal is controlled so that the 8x4 multiplier 10 and the shift register 30 are in a standby state; the 8x2 multiplier 22 receives a1 and the lower two bits of b1, the 8x2 multiplier 21 receives a1 and the third and fourth bits of b1, the shift register 23 shifts the output of the 8x2 multiplier 21 left by 2 bits, and the accumulator 24 adds the output of the shift register 23 and the output of the 8x2 multiplier 22 and passes the result to the final accumulator. Next, a2 and b2 are input to the 8 × 8 multiplier. Assuming the lower 2 bits of b2 are valid, the sel signal is controlled so that the 8x2 multiplier 21, the 8x2 multiplier 11, the 8x2 multiplier 12, the shift register 23, the shift register 13, the shift register 30 and the accumulator 14 are all in a standby state; the 8x2 multiplier 22 receives a2 and the lower two bits of b2, the accumulator 24 passes the output of the 8x2 multiplier 22 to the final accumulator, and the final accumulator accumulates the product a1 × b1 with the product a2 × b2. Then a3 and b3 are input to the 8 × 8 multiplier, and after that calculation is finished a4 and b4 are input; finally the results of a1 × b1, a2 × b2, a3 × b3 and a4 × b4 are all accumulated in the accumulator, realizing the matrix multiply-add operation.
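
A self-contained Python sketch of this walk-through (illustrative only; the mode-selection policy and all names are assumptions): each weight is multiplied using only its significant low-order slice, just as the idle units above stay in standby, and the four products are accumulated.

    def pick_mode(wt):
        """Choose the narrowest mode covering the significant bits of an 8-bit weight (assumed policy)."""
        return 2 if wt < 4 else (1 if wt < 16 else 0)   # 8x2, 8x4 or 8x8 mode

    def conv2x2_mac(data, weights):
        """Accumulate a1*b1 + a2*b2 + a3*b3 + a4*b4 using only the significant slice of each weight."""
        acc = 0
        for a, b in zip(data, weights):
            width = {0: 8, 1: 4, 2: 2}[pick_mode(b)]   # bits actually multiplied; other units stand by
            acc += a * (b & ((1 << width) - 1))        # only the low `width` bits of b participate
        return acc

    data, weights = [3, 7, 12, 200], [1, 9, 160, 2]
    assert conv2x2_mac(data, weights) == sum(a * b for a, b in zip(data, weights))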

It is worth noting that, through extensive research and experiments, the applicant has found that although the overall model of a neural network is large, a certain degree of sparsity exists inside it, and the complexity of the model can be greatly reduced by means such as compression. Meanwhile, the storage and computation integrated technology can meet the requirements of high-performance neural network operation, but the regularity of the storage and computation integrated array does not match the sparsity of the neural network, which easily leads to wasted resources. Therefore, the chip design can be optimized by combining the characteristics of the neural network model: the model is decomposed so that its dense part is realized by the storage and computation integrated array, while for the arithmetic of the sparse part, because the bit width produced by decomposing the neural network model is uncertain, an efficient and low-power implementation is difficult to achieve with existing multiply-add circuits; the sparse part can therefore be realized with the multi-bit multiplexing multiply-add operation device provided by the embodiment of the invention, and the two parts can be optimized separately.

An embodiment of the present invention further provides a neural network operation system. Referring to FIG. 5, the neural network operation system includes: a storage and computation integrated (compute-in-memory) operation device 100 for performing the operations of the dense core, the multi-bit multiplexing multiply-add operation device 200 described above for performing the operations of the sparse core, a shift register device 300 for shifting, and an accumulation device 400;

the input end of the storage and computation integrated operation device 100 receives input data, and its output end is connected to the first input end of the accumulation device 400; the input end of the multi-bit multiplexing multiply-add operation device 200 receives the input data, its output end is connected to the input end of the shift register device 300, and the output end of the shift register device 300 is connected to the second input end of the accumulation device 400.

Specifically, when the neural network operation is implemented with the storage and computation integrated technology, if the weight distribution of the neural network algorithm is too small or too large, or the input signal is too small or too large, the analog voltage/current output by the memory cell array may be too small or too large and may fall outside the lower or upper limit of the ADC range. An ADC usually has the highest quantization precision near the middle of its range and poorer quantization precision toward the two ends, and when its input exceeds the lower or upper limit the corresponding output is simply truncated to the minimum or maximum value, which reduces the operation precision. To solve this problem, a weight matrix whose overall distribution is uneven can be scaled up or down, so that the signal obtained when the processed weight matrix is stored in the memory cell array for in-memory computation lies within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the operation precision. For an array with an uneven weight distribution, some of the larger values in the scaled weight matrix may exceed the available bit width (the excess bits may also be called overflow bits after shifting). In that case the overflow bits are intercepted: the part that is not intercepted forms the weight matrix, while the intercepted overflow bits, with zeros filled in elsewhere, form a sparse matrix (its non-zero entries may also be referred to as the sparse weight data wt[7:0]). The weight matrix is input to the storage and computation integrated operation device 100, which performs the matrix multiply-add operation between the weight matrix and the input data; the sparse matrix is input to the multi-bit multiplexing multiply-add operation device 200, which performs the multiply-add operation between the sparse matrix and the input data (the operation principle is as described above and is not repeated here). The shift register device 300 then shifts the output result of the multi-bit multiplexing multiply-add operation device 200, and the accumulation device 400 accumulates the output results of the storage and computation integrated operation device 100 and the shifted output of the multi-bit multiplexing multiply-add operation device 200.
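
As a rough, assumption-laden sketch of this decomposition (the scaling factor, bit width and function name are invented for illustration and are not specified by the patent):

    import numpy as np

    def split_scaled_weights(w, scale=4, bits=8):
        """Scale a weight matrix, then split it into an in-range part and an overflow part.

        The in-range part (the low `bits` bits) would be stored in the memory cell array;
        the overflow part is zero almost everywhere and forms the sparse matrix handled by
        the multiply-add device.  The scaled weights are recovered as in_range + (overflow << bits).
        """
        scaled = w * scale                      # scale weights toward the ADC's effective range
        in_range = scaled & ((1 << bits) - 1)   # part kept in the memory cell array
        overflow = scaled >> bits               # intercepted overflow bits (sparse)
        return in_range, overflow

    w = np.array([[3, 70, 1], [2, 5, 90]], dtype=np.int64)
    in_range, overflow = split_scaled_weights(w)
    assert np.array_equal(in_range + (overflow << 8), w * 4)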

The storage and computation integrated operation device 100 works as follows: the weight matrix is stored in its memory cell array; in the application stage a data stream is input to the memory cell array, and an analog vector-matrix multiplication is performed between the input data stream and the weight matrix stored in the array; the operation result, in the form of the analog voltage/current output by the memory cell array, is transmitted to the ADC placed after the array, which converts the analog voltage/current into a digital signal, and this digital signal is the output of the storage and computation integrated operation device 100.

It should be noted that, through extensive research, the applicant has found that the bit width of a neural network model is usually relatively large and the model exhibits sparsity, and a chip implementation may compress the model precision in order to reduce the amount of computation. For example, assume that the neural network model precision is 12 bits while the storage and computation integrated chip is 8-bit. When the neural network operation is realized with such a chip, the weight precision of the neural network cannot be fully expressed with 8 bits, and if the parameters are forcibly truncated to 8 bits, the inference result is likely to become inaccurate because of the loss of precision. To solve this problem, while keeping the 8-bit computation capability of the storage and computation integrated chip, the weight parameters of the neural network can be decomposed into two parts: one part consists of the lower 8-bit parameters and the other part of the upper 4-bit parameters. There are various methods of decomposition, and the present application is not limited in this respect. The two sets of parameters are then processed by the storage and computation integrated operation device and the sparse operation device respectively, and the results are accumulated after an appropriate shift; the combined result is equal to the result of operating directly on the original parameters. Generally, the number of high-precision parameters is small and most parameters can be represented with 8 bits, so most of the high-order parameters obtained by decomposing the neural network can be regarded as a sparse matrix. After decomposition, the non-sparse weight parameters of the neural network are stored directly in the storage and computation integrated array, while the sparse weight matrix (see FIG. 6, where blocks with the same fill represent non-zero parameter values in the same column) is typically stored in main memory and transmitted to the multi-bit multiplexing multiply-add operation device 200 via a bus.
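
The following sketch (illustrative assumptions only; the helper name, the use of NumPy and the exact split are not prescribed by the patent) shows how such a 12-bit weight matrix could be split into a lower-8-bit dense part for the compute-in-memory path and an upper-4-bit sparse part for the multiply-add device path, with the two partial results shifted and accumulated to reproduce the full-precision result:

    import numpy as np

    def system_forward(x, w12, low_bits=8):
        """Decomposed computation: lower 8 bits go to the CIM path, upper bits to the sparse path."""
        w_low = w12 & ((1 << low_bits) - 1)          # dense part, stored in the CIM array
        w_high = w12 >> low_bits                      # mostly zero -> sparse weight matrix
        dense_out = x @ w_low                         # result of the storage and computation integrated device
        sparse_out = x @ w_high                       # result of the multi-bit multiplexing multiply-add device
        return dense_out + (sparse_out << low_bits)   # shift register device + accumulation device

    x = np.array([5, 3, 1], dtype=np.int64)
    w12 = np.array([[4095, 12], [255, 300], [7, 2049]], dtype=np.int64)   # 12-bit weights
    assert np.array_equal(system_forward(x, w12), x @ w12)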

Because of the sparsity of the sparse matrix, only the non-zero parameters in the matrix and the corresponding data addresses need to be input to the multi-bit multiplexing multiply-add operation device 200. The device 200 fetches the input data from the input buffer dac_fifo according to the data address and performs the convolution operation with the weight data wt. The sparse weight data and addresses are input in frames over the AHB bus. Referring to FIG. 7, one frame corresponds to one column of the sparse matrix and comprises a frame header, a data segment, an address segment and a frame trailer. The frame header contains additional address information; the data segment carries the transmitted weight data values; the address segment carries the addresses, in the buffer, of the input data corresponding to the weights; and the frame trailer indicates that a column of data is complete and is input to the accumulator as an eoc signal. The specific transmission method is not limited.
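
A hedged Python sketch of how one column of the sparse matrix might be packed into such a frame (the field contents and layout are assumptions; as stated above, the patent does not fix a specific transmission method):

    def encode_sparse_column(column):
        """Pack one sparse-matrix column into an illustrative header / data / address / trailer frame."""
        nonzero = [(addr, wt) for addr, wt in enumerate(column) if wt != 0]
        return {
            "header": {"count": len(nonzero)},            # additional address information (assumed content)
            "data": [wt for _, wt in nonzero],            # weight data values wt
            "addresses": [addr for addr, _ in nonzero],   # addresses of the corresponding input data in dac_fifo
            "trailer": "eoc",                             # marks the completion of one column
        }

    print(encode_sparse_column([0, 0, 9, 0, 3, 0, 0, 1]))
    # {'header': {'count': 3}, 'data': [9, 3, 1], 'addresses': [2, 4, 7], 'trailer': 'eoc'}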

By adopting the above technical solution, energy consumption is reduced while the operation precision is improved; that is, high-precision, low-power neural network operation is realized.

The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Although the present invention has been described with reference to the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but may be embodied or carried out by various modifications, equivalents and changes without departing from the spirit and scope of the invention.
