Convolutional neural network accelerator based on FPGA

Document No. 1938651 · Published 2021-12-07

Reading note: this technology, "Convolutional neural network accelerator based on FPGA", was designed by 葛志来 (Ge Zhilai), 陈智萍 (Chen Zhiping) and 朱晓梅 (Zhu Xiaomei) on 2021-10-12. Abstract: The invention discloses an FPGA-based convolutional neural network accelerator. The network structure of the convolutional neural network comprises an input layer, a first convolutional layer, a second convolutional layer, a first pooling layer, a second pooling layer, a first fully-connected layer, a second fully-connected layer and an output layer. The input layer receives an image; the image is processed in turn by the first convolutional layer, the first pooling layer, an activation function, the second convolutional layer, the second pooling layer, an activation function, the first fully-connected layer and the second fully-connected layer to obtain a plurality of feature values; the feature values are then normalized into probabilities in a Softmax classification layer, and the index corresponding to the maximum probability is the classification result. The invention realizes a high-speed FPGA accelerator and strikes a good compromise between the number of weights and the accuracy.

1. An FPGA-based convolutional neural network accelerator, characterized in that the network structure of the convolutional neural network comprises an input layer, a first convolutional layer, a second convolutional layer, a first pooling layer, a second pooling layer, a first fully-connected layer, a second fully-connected layer and an output layer,

the input layer receives an image; the image is processed in turn by the first convolutional layer, the first pooling layer, an activation function, the second convolutional layer, the second pooling layer, an activation function, the first fully-connected layer and the second fully-connected layer to obtain a plurality of feature values; the feature values are then normalized into probabilities in a Softmax classification layer, and the index corresponding to the maximum probability is the classification result;

the first convolutional layer and the second convolutional layer both adopt a convolution unrolling scheme that is parallel within a channel and serial between channels; the convolution result of a single channel is output to a buffer corresponding to that convolutional layer, and the buffer obtains the final convolution result of the corresponding layer by repeatedly reading, summing and storing;

one convolutional layer, one pooling layer and one activation function are together treated as a hierarchical layer; a buffer is arranged between two hierarchical layers, and the feature map output by a hierarchical layer, together with the corresponding bias and weight parameters, is stored in the buffer for cyclic reading as the input of the next hierarchical layer;

after the output of the previous hierarchical layer has been stored, the fully-connected layer begins to read the feature map output by that layer together with the corresponding weights and bias; the feature map and the weights are multiplied by DSP multipliers, the products belonging to the current neuron are summed, and the bias is added when the summation finishes to give the final neuron output.

2. The FPGA-based convolutional neural network accelerator of claim 1, wherein the weight parameters of the convolutional layers and the pooling layers are quantized and dequantized with a float32-to-int8 quantization algorithm, the method comprising the following steps:

a. Calculate the scale parameter s and the zero-point offset parameter z:

according to the mutual conversion relation between a floating-point number x and its fixed-point representation:

where x is the floating-point number to be quantized, q(x) is the fixed-point value after quantizing x, floor() truncates the fractional part, s is the scale, whose role is to scale the floating-point number into a fixed interval, and z is the zero point, i.e. the fixed-point offset that the floating-point value 0 is quantized to;

the scale parameter s and the zero-point offset parameter z required for quantization are obtained as follows:

where x_max and x_min are respectively the maximum and minimum of the floating-point number x, and p_max and p_min are respectively the maximum and minimum of the quantized value range;

b. When there is no bias, the convolution or pooling operation is:

where N is the number of convolution kernel parameters, x_i is the input data, w_i is the weight, and y is the convolution output of the layer; x_i, w_i and y are all float32 floating-point numbers;

quantizing x_i and w_i gives:

by inverse quantization, x_i and w_i can be expressed as:

substituting equation (5) into equation (3) gives:

the convolution output y is a floating-point number and must also be quantized before being fed into the next convolutional layer; the quantization and inverse quantization of y are:

substituting equation (7) into equation (6) gives:

the data that each layer must pass on to the next layer is the quantized value q(y); transforming equation (8) gives:

this yields the quantized data needed by the next layer and completes the function of the current layer;

a floating-point factor remains in equation (9); let M = s_w·s_x/s_y, so that M is a floating-point number, and approximate it as 2^(-n)·M_0, where n and M_0 are both positive integers and n lies between 0 and 15, such that the error between M and 2^(-n)·M_0 is on the order of 2^(-16); equation (9) is then rewritten as:

where M_0·(q(w_i)-z_w)·(q(x_i)-z_x) and z_y involve only integer arithmetic, and the factor 2^(-n) is realized in the FPGA as a right shift by n bits;

c. When a bias b is added, equation (9) becomes:

where q(b) is the quantized value of b, s_b is the scale of b and z_b is the zero point of b;

q(b) is stored as int32, and letting s_b = s_x·s_w, the quantized result needed by the next layer is expressed as:

3. The FPGA-based convolutional neural network accelerator of claim 2, wherein the maximum and minimum of the values to be quantized are required when calculating the scale; the maximum and minimum of the feature map of each layer are measured with at least 100 samples of data, and the resulting scale is used at inference time;

after M is obtained, the 2^(-n)·M_0 closest to M is found: n is taken between 0 and 15, and M_0 is taken as whichever of the two integers nearest to 2^n·M gives the smaller error; the second fully-connected layer is the last layer, so there is no need to find 2^(-n)·M_0 for it, and the factor is simply dropped during calculation.

4. The FPGA-based convolutional neural network accelerator of claim 1, wherein the convolutional layers use 5 × 5 convolution kernels, the pipeline generates the 5 × 5 region to be convolved, and shift-RAM shift registers are used as the buffer to generate the 5 × 5 region to be convolved and the convolution kernel;

when a single shift RAM is enabled by the module and a rising clock edge arrives, the data at the input is stored into the shift RAM, the existing data in the shift RAM are shifted left in turn, and the last datum is discarded; 4 shift RAMs are connected end to end to achieve an overall data shift, and the outputs of the 4 shift RAMs plus the current input form one column of the 5 × 5 matrix; obtaining the full 5 × 5 matrix requires 25 registers to receive the data output by the five taps, and through this shifted reception the pipeline generates the 5 × 5 region to be convolved together with the convolution kernel;

after a 5 × 5 convolution kernel and a region to be convolved are received, the convolution kernel is unrolled in parallel: 25 multiplications are performed in parallel by 25 instantiated DSP fixed-point multipliers, the products are obtained after a delay of one clock, and the 25 products, whose bit width is 16 bits, are then summed; the summation of the convolution is decomposed into a 6-stage pipeline, with all padding data set to 0: the 25 values are first extended to 26, and the 26 values are summed in pairs to give 13 values of 17 bits, which is the first pipeline stage; the 13 values are extended to 14 and summed in pairs to give 7 values of 18 bits, the second pipeline stage; the 7 values are extended to 8 and summed in pairs to give 4 values of 19 bits, the third pipeline stage; the 4 values are summed in pairs to give 2 values of 20 bits, the fourth pipeline stage; the 2 values are summed to give 1 value of 21 bits, the fifth pipeline stage; finally the 32-bit bias is added to obtain the final convolution result.

5. The FPGA-based convolutional neural network accelerator of claim 4, wherein the pooling layer uses 2 × 2 MaxPooling; a shift RAM with a width of 32 bits and a depth of half the channel length of the previous layer is first set up, and a row of the matrix is continuously generated through this shift RAM; the row of data from the shift RAM is shifted into and stored by four registers, generating the 2 × 2 pooling window of the pipeline; since the pooling stride is set to 2, the 2 × 2 windows generated by the pipeline are valid only at alternate positions; after a 2 × 2 window is obtained, the four numbers are compared in pairs by two combinational-logic comparators to obtain two maxima, the two outputs are compared by a further combinational-logic comparator to output the maximum, and the result is the output of the pooling layer.

6. The FPGA-based convolutional neural network accelerator of claim 1, wherein the data set used for training the convolutional neural network is the MNIST data set; the MNIST data set is first downloaded through torchvision, the number of epochs is set to 15, the batch size to 64 and the learning rate to 0.0001; the error uses cross entropy, and gradient descent is performed by stochastic gradient descent.

Technical Field

The invention belongs to the technical field of neural networks, and particularly relates to a convolutional neural network accelerator based on an FPGA (field programmable gate array).

Background

A Convolutional Neural Network (CNN) is a feed-forward neural network that mainly comprises convolutional layers, pooling layers and fully-connected layers; its weight sharing reduces the number of parameters required by a conventional fully-connected network. A CNN can extract deep features from an image, avoid excessive data preprocessing, and maintain a high recognition rate. In recent years, convolutional neural networks have achieved remarkable success in speech recognition, object detection, face recognition and other fields.

The convolutional neural network is a computation-intensive model: the convolution operations at its core involve an extremely large amount of computation, which the computing power of portable embedded devices can hardly cope with, so accelerating neural networks with low-power hardware is a current research hotspot. As a programmable device, a Field Programmable Gate Array (FPGA) contains abundant logic resources, offers high performance, low power consumption and reconfigurability, and can realize the large number of independent convolution operations in a CNN in a multi-path parallel manner. The first neural network accelerator built on an FPGA appeared in 1994, but because neural networks received little attention at the time, FPGA-based accelerator technology was likewise overlooked. With the milestone network AlexNet at the 2012 ILSVRC challenge, neural networks set off a new wave of interest. As the computation and parameter counts of neural networks grow day by day, researchers have turned to reprogrammable, low-power hardware platforms; FPGA deployments of CNNs now appear widely at international conferences and in journals, and in 2018 the number of papers on FPGA-based neural network accelerators published on IEEE Xplore reached 69.

However, the storage space and resources on an FPGA development board are limited. Taking the classical convolutional neural network LeNet for recognizing the MNIST handwritten digit data set as an example, the recognition rate can exceed 98%, but the total number of weight parameters exceeds 430,000, which consumes a great deal of storage space and resources on the FPGA development board.

Disclosure of Invention

Purpose of the invention: aiming at the shortcomings of the prior art, the invention provides a lightweight convolutional neural network acceleration system based on an FPGA platform, in order to reduce the number of CNN weight parameters and save resource consumption on the FPGA chip.

The technical scheme is as follows: the network structure of the FPGA-based convolutional neural network accelerator comprises an input layer, a first convolutional layer, a second convolutional layer, a first pooling layer, a second pooling layer, a first fully-connected layer, a second fully-connected layer and an output layer,

the input layer receives an image; the image is processed in turn by the first convolutional layer, the first pooling layer, an activation function, the second convolutional layer, the second pooling layer, an activation function, the first fully-connected layer and the second fully-connected layer to obtain a plurality of feature values; the feature values are then normalized into probabilities in a Softmax classification layer, and the index corresponding to the maximum probability is the classification result;

the first convolutional layer and the second convolutional layer both adopt a convolution unrolling scheme that is parallel within a channel and serial between channels; the convolution result of a single channel is output to a buffer corresponding to that convolutional layer, and the buffer obtains the final convolution result of the corresponding layer by repeatedly reading, summing and storing;

one convolutional layer, one pooling layer and one activation function are together treated as a hierarchical layer; a buffer is arranged between two hierarchical layers, and the feature map output by a hierarchical layer, together with the corresponding bias and weight parameters, is stored in the buffer for cyclic reading as the input of the next hierarchical layer;

after the output of the previous hierarchical layer has been stored, the fully-connected layer begins to read the feature map output by that layer together with the corresponding weights and bias; the feature map and the weights are multiplied by DSP multipliers, the products belonging to the current neuron are summed, and the bias is added when the summation finishes to give the final neuron output.

In a further preferred technical scheme of the invention, the weight parameters of the convolutional layers and the pooling layers are quantized and dequantized with a float32-to-int8 quantization algorithm; the specific method comprises the following steps:

a. Calculate the scale parameter s and the zero-point offset parameter z:

according to the mutual conversion relation between a floating-point number x and its fixed-point representation:

where x is the floating-point number to be quantized, q(x) is the fixed-point value after quantizing x, floor() truncates the fractional part, s is the scale, whose role is to scale the floating-point number into a fixed interval, and z is the zero point, i.e. the fixed-point offset that the floating-point value 0 is quantized to;

the scale parameter s and the zero-point offset parameter z required for quantization are obtained as follows:

where x_max and x_min are respectively the maximum and minimum of the floating-point number x, and p_max and p_min are respectively the maximum and minimum of the quantized value range;

b. When there is no bias, the convolution or pooling operation is:

where N is the number of convolution kernel parameters, x_i is the input data, w_i is the weight, and y is the convolution output of the layer; x_i, w_i and y are all float32 floating-point numbers;

quantizing x_i and w_i gives:

by inverse quantization, x_i and w_i can be expressed as:

substituting equation (5) into equation (3) gives:

the convolution output y is a floating-point number and must also be quantized before being fed into the next convolutional layer; the quantization and inverse quantization of y are:

substituting equation (7) into equation (6) gives:

the data that each layer must pass on to the next layer is the quantized value q(y); transforming equation (8) gives:

this yields the quantized data needed by the next layer and completes the function of the current layer;

a floating-point factor remains in equation (9); let M = s_w·s_x/s_y, so that M is a floating-point number, and approximate it as 2^(-n)·M_0, where n and M_0 are both positive integers and n lies between 0 and 15, such that the error between M and 2^(-n)·M_0 is on the order of 2^(-16); equation (9) is then rewritten as:

where M_0·(q(w_i)-z_w)·(q(x_i)-z_x) and z_y involve only integer arithmetic, and the factor 2^(-n) is realized in the FPGA as a right shift by n bits;

c. When a bias b is added, equation (9) becomes:

where q(b) is the quantized value of b, s_b is the scale of b and z_b is the zero point of b;

q(b) is stored as int32, and letting s_b = s_x·s_w, the quantized result needed by the next layer is expressed as:

Preferably, the maximum and minimum of the values to be quantized are required when calculating the scale; the maximum and minimum of the feature map of each layer are measured with at least 100 samples of data, and the resulting scale is used at inference time;

after M is obtained, the 2^(-n)·M_0 closest to M is found: n is taken between 0 and 15, and M_0 is taken as whichever of the two integers nearest to 2^n·M gives the smaller error; the second fully-connected layer is the last layer, so there is no need to find 2^(-n)·M_0 for it, and the factor is simply dropped during calculation.

Preferably, the convolutional layers use 5 × 5 convolution kernels, the pipeline generates the 5 × 5 region to be convolved, and shift-RAM shift registers are used as the buffer to generate the 5 × 5 region to be convolved and the convolution kernel;

when a single shift RAM is enabled by the module and a rising clock edge arrives, the data at the input is stored into the shift RAM, the existing data in the shift RAM are shifted left in turn, and the last datum is discarded; 4 shift RAMs are connected end to end to achieve an overall data shift, and the outputs of the 4 shift RAMs plus the current input form one column of the 5 × 5 matrix; obtaining the full 5 × 5 matrix requires 25 registers to receive the data output by the five taps, and through this shifted reception the pipeline generates the 5 × 5 region to be convolved together with the convolution kernel, as modelled in the sketch below;

after a 5 × 5 convolution kernel and a region to be convolved are received, the convolution kernel is unrolled in parallel: 25 multiplications are performed in parallel by 25 instantiated DSP fixed-point multipliers, the products are obtained after a delay of one clock, and the 25 products, whose bit width is 16 bits, are then summed; the summation of the convolution is decomposed into a 6-stage pipeline, with all padding data set to 0: the 25 values are first extended to 26, and the 26 values are summed in pairs to give 13 values of 17 bits, which is the first pipeline stage; the 13 values are extended to 14 and summed in pairs to give 7 values of 18 bits, the second pipeline stage; the 7 values are extended to 8 and summed in pairs to give 4 values of 19 bits, the third pipeline stage; the 4 values are summed in pairs to give 2 values of 20 bits, the fourth pipeline stage; the 2 values are summed to give 1 value of 21 bits, the fifth pipeline stage; finally the 32-bit bias is added to obtain the final convolution result.
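As a rough software model of this window generator (a minimal sketch only: the Python deques stand in for the chained shift RAMs and the 25 receiving registers, `width` is the row length of the input feature map, and edge masking at row boundaries is omitted), the stream-driven generation of 5 × 5 windows can be written as:

```python
from collections import deque

def sliding_5x5_windows(pixels, width):
    """Model of the shift-RAM window generator: four chained row buffers
    (the shift RAMs) plus a 5x5 register array; once the buffers are primed,
    every new input pixel yields one 5x5 window."""
    rows = [deque([0] * width, maxlen=width) for _ in range(4)]   # 4 shift RAMs
    window = [deque([0] * 5, maxlen=5) for _ in range(5)]         # 25 receiving registers
    for count, p in enumerate(pixels, start=1):
        outs = [r[0] for r in rows]                       # oldest value in each shift RAM
        column = [outs[3], outs[2], outs[1], outs[0], p]  # one column of the 5x5 matrix
        rows[3].append(outs[2]); rows[2].append(outs[1])  # end-to-end chaining of the RAMs
        rows[1].append(outs[0]); rows[0].append(p)
        for r in range(5):
            window[r].append(column[r])                   # shift the window registers
        if count >= 4 * width + 5:                        # buffers primed: window is valid
            yield [list(w) for w in window]
```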

Preferably, the pooling layer uses 2 × 2 MaxPooling; a shift RAM with a width of 32 bits and a depth of half the channel length of the previous layer is first set up, and a row of the matrix is continuously generated through this shift RAM; the row of data from the shift RAM is shifted into and stored by four registers, generating the 2 × 2 pooling window of the pipeline; since the pooling stride is set to 2, the 2 × 2 windows generated by the pipeline are valid only at alternate positions; after a 2 × 2 window is obtained, the four numbers are compared in pairs by two combinational-logic comparators to obtain two maxima, the two outputs are compared by a further combinational-logic comparator to output the maximum, and the result is the output of the pooling layer.

Preferably, the data set used for training the convolutional neural network is the MNIST data set; the MNIST data set is first downloaded through torchvision, the number of epochs is set to 15, the batch size to 64 and the learning rate to 0.0001; the error uses cross entropy, and gradient descent is performed by stochastic gradient descent.
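A minimal PyTorch training sketch matching these settings (epochs = 15, batch size = 64, learning rate = 0.0001, cross-entropy loss, stochastic gradient descent); the `model` argument stands for the lightweight network described in this document and is not defined here:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train(model: nn.Module, device: str = "cpu") -> None:
    # MNIST downloaded through torchvision, as stated above
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=64, shuffle=True)

    criterion = nn.CrossEntropyLoss()                         # cross-entropy error
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # stochastic gradient descent

    model.to(device).train()
    for epoch in range(15):                                   # epoch set to 15
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: last batch loss {loss.item():.4f}")
```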

Beneficial effects: (1) in the convolutional neural network acceleration system based on the FPGA platform, a lightweight convolutional neural network is first constructed on the basis of LeNet at the software level in PyTorch; a convolution unrolling scheme with better generality and performance is selected, which facilitates DSP reuse; finally a high-speed FPGA accelerator is realized that can be applied to handwritten digit recognition;

(2) In the PyTorch framework, the weight parameters of each layer of the CNN network are stored and operated on in float32 format; the FPGA cannot perform floating-point operations directly and its DSP units are better suited to fixed-point operations, so, considering both computation and storage, the parameters are quantized and M is approximated by 2^(-n)·M_0. Under 500 samples the error before and after quantization is small, and the accuracy of the network trained with PyTorch differs from that of the network finally deployed on the FPGA by only 0.2%, which is negligible.

Drawings

FIG. 1 is a network architecture diagram of a convolutional neural network of the present invention;

FIG. 2 is a hardware framework diagram of the convolutional neural network accelerator of the present invention;

FIG. 3 is a shift ram schematic;

FIG. 4 is a shift ram connection;

FIG. 5 is a flowchart illustrating the operation of the buffer corresponding to a convolutional layer;

FIG. 6 is a flowchart of the operation of the cache between two hierarchical layers;

FIG. 7 is a full connectivity layer workflow diagram;

FIG. 8 is a power consumption parameter graph of a convolutional neural network of an embodiment.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the embodiments.

Example: the network structure of the convolutional neural network comprises an input layer, a first convolutional layer, a second convolutional layer, a first pooling layer, a second pooling layer, a first fully-connected layer, a second fully-connected layer and an output layer.

The convolution unrolling methods can be mainly divided into the following three:

1. Parallelism within a convolution.

2. Parallelism between different input channels.

3. Parallelism between different convolution kernels.

The ideal state of the accelerator is parallelism within the convolution, between different input channels and between different convolution kernels, with a pipeline built on top to reach a fully global pipeline. However, the higher the degree of parallel unrolling, the more DSP resources are needed: in that case the first convolutional layer alone would need 250 DSPs and the second convolutional layer 5000 DSPs, i.e. 5250 DSPs for the two layers. The ZYNQ-7020 series has only 220 DSPs, and a larger network would need even more, so global parallelism is not feasible.

In summary, the three kinds of parallelism cannot all be achieved at the same time, so one or two of them must be given up. Because keeping two of them would still require more than 220 DSPs, the invention adopts a convolution unrolling scheme that is parallel within a channel and serial between channels; the hardware framework is shown in FIG. 2. Since the channels are fed in serially, convolution is performed one channel at a time and the final convolution is the sum of the convolutions of all channels, so a single-channel convolution accumulation and buffering module is added.

A 1 × 28 × 28 pixel image is input; the image is processed in turn by the first convolutional layer, the first pooling layer, an activation function, the second convolutional layer, the second pooling layer, an activation function, the first fully-connected layer and the second fully-connected layer to obtain a plurality of feature values; the feature values are then normalized into probabilities in a Softmax classification layer, and the index corresponding to the maximum probability is the classification result.
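The layer ordering described above can be sketched in PyTorch as follows; this is an illustrative reconstruction only, and the channel widths (6 and 16) and the hidden size of fc1 (84) are assumptions, not values given in the text:

```python
import torch
import torch.nn as nn

class LightCNN(nn.Module):
    """Sketch of the described structure: conv1 -> pool1 -> activation ->
    conv2 -> pool2 -> activation -> fc1 -> fc2 -> Softmax. Channel counts
    and the fc1 width are assumed for illustration."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # first convolutional layer (5x5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)   # second convolutional layer (5x5)
        self.pool = nn.MaxPool2d(2, stride=2)          # 2x2 MaxPooling
        self.fc1 = nn.Linear(16 * 4 * 4, 84)           # first fully-connected layer
        self.fc2 = nn.Linear(84, num_classes)          # second fully-connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.pool(self.conv1(x)))       # conv -> pool -> activation
        x = torch.relu(self.pool(self.conv2(x)))       # conv -> pool -> activation
        x = torch.flatten(x, 1)
        return self.fc2(self.fc1(x))                   # feature values (logits)

    def classify(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(self.forward(x), dim=1)  # Softmax probability normalization
        return probs.argmax(dim=1)                     # index of the max probability = class
```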

Convolutional layers:

the CNN network of the invention has two convolution layers, the hardware design of the convolution module is shown as figure 3, and the convolution is realized by performing convolution serial on a single channel of each convolution core and a corresponding characteristic diagram channel.

The convolution data are read from Block RAM. Since the invention uses 5 × 5 convolution kernels, a pipeline is needed to generate the 5 × 5 regions to be convolved; because the data arrive as a stream, one value at a time, generating a 5 × 5 region to be convolved and a convolution kernel requires four or five rows of data to be held in a buffer, and the invention uses shift-RAM shift registers to generate the 5 × 5 matrix. The shift principle of a single shift RAM is shown in FIG. 3: when the module is enabled and a rising clock edge arrives, the data at the input is stored into the shift RAM, the existing data in the shift RAM are shifted left in turn, and the last datum is discarded. The invention uses 4 shift RAMs as the buffer to generate the 5 × 5 matrix; the 4 shift RAMs are connected as shown in FIG. 4, end to end, so that an overall data shift is achieved, and one column of the 5 × 5 matrix is obtained from the outputs of the 4 shift RAMs plus the current input. Four shift RAMs can provide at most one column of the 5 × 5 matrix, so obtaining the full 5 × 5 matrix requires 25 registers to receive the data output by the five taps, again by shifted reception, so that the pipeline can generate the 5 × 5 region to be convolved together with the convolution kernel. After a 5 × 5 convolution kernel and a region to be convolved are received, the convolution kernel is unrolled in parallel: 25 multiplications are performed in parallel by instantiating 25 DSP fixed-point multipliers, and the products are obtained after a delay of one clock. The 25 products then need to be accumulated; at this point the data bit width is 16 bits, and accumulating 25 values of 16 bits is a complex operation whose timing cannot close at high clock frequencies, so the operation is decomposed in a pipelined manner, allowing the system to run stably under a high-frequency system clock. The invention decomposes the summation of the convolution into a 6-stage pipeline, with all padding data set to 0, as sketched below. The 25 values are first extended to 26, and the 26 values are summed in pairs to give 13 values of 17 bits, the first pipeline stage; the 13 values are extended to 14 and summed in pairs to give 7 values of 18 bits, the second pipeline stage; the 7 values are extended to 8 and summed in pairs to give 4 values of 19 bits, the third pipeline stage; continuing the pairwise summation gives the final 21-bit result, which covers the fourth and fifth pipeline stages. Finally the 32-bit bias is added to obtain the final convolution result. However, because the parallel unrolling of this design is parallel within a single convolution-kernel channel and serial between channels, the bias is not added in this per-channel module, which prevents the bias from being accumulated repeatedly.
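The 6-stage pairwise summation can be modelled in software as below (a behavioural sketch only; the hardware additionally registers each stage and lets the bit width grow from 16 to 21 bits):

```python
def adder_tree_sum(products, stages=5):
    """Pairwise pipelined reduction of the 25 products: pad an odd count with
    a zero, sum neighbouring pairs, and repeat; after 5 stages the 25 values
    collapse to a single sum, to which the 32-bit bias is added separately."""
    values = list(products)
    for _ in range(stages):
        if len(values) % 2:                                   # extend with a zero
            values.append(0)
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    assert len(values) == 1
    return values[0]

# e.g. adder_tree_sum([1] * 25) == 25
```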

The convolution unrolling of the invention is parallel within a channel and serial between channels, so the output of the convolution module is the convolution result of a single channel, while the final output of the convolutional layer is the sum over all channels plus the bias; therefore a buffer must be provided for the convolutional layer. The simplest approach would be to buffer the convolution results of all channels output by the convolutional layer and then read and accumulate them, but this would occupy a large amount of storage, and when the number of convolution channels is too large the on-chip RAM would be insufficient. The invention therefore provides a buffer only one channel deep and obtains the final convolution result by repeatedly reading, summing and storing; the implementation is shown schematically in FIG. 5 and modelled in the sketch below. When the first channel of the current convolution kernel produces its result through the convolution module, the result is stored directly in the buffer. When the result of a subsequent channel is output by the convolution module to be cached, the current content of the buffer is read; because reading data from on-chip RAM has a latency of two clocks, the output result and the enable of the current convolution are buffered for two stages, the content of the buffer is read at that moment, and the value read from the buffer is accumulated with the result of the current channel's convolution and stored back into the buffer. When the convolution kernel outputs the convolution of its last channel, the value read from the buffer is accumulated to give the final convolution output; this result is not stored back into the cache but output directly, and after the 32-bit bias is added the result passes through the ReLU activation function and is output to the pooling layer.
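A minimal software model of this single-channel-deep accumulation buffer (the `conv_single_channel` argument is a placeholder for the per-channel convolution module; the two-clock read latency of the on-chip RAM is not modelled):

```python
import numpy as np

def channel_serial_conv(conv_single_channel, feature_map, kernel, bias):
    """Single-channel-deep accumulation as in FIG. 5: the first channel is
    stored directly, the following channels are read-accumulated-stored, and
    after the last channel the bias is added (only once) and ReLU is applied
    instead of writing back to the buffer."""
    buffer = None                                            # buffer one channel deep
    for ch in range(feature_map.shape[0]):
        partial = conv_single_channel(feature_map[ch], kernel[ch])
        buffer = partial if buffer is None else buffer + partial
    out = buffer + bias                                      # 32-bit bias added once
    return np.maximum(out, 0)                                # ReLU, then to the pooling layer
```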

Pooling layer:

the pooling of the invention adopts 2 × 2 MaxPooling, the operation essence of the pooling layer is similar to that of the convolution layer, and the operation is matrix operation, only the 2 × 2 matrix is generated here, firstly a shift ram with the width of 32bit and the depth of half of the channel length of the previous layer is set, a row of data of the matrix is continuously generated through the shift ram, a row of data obtained by shifting and storing the shift ram by four registers can generate a 2 × 2 pooling window of the production line, and the step length of the pooling is set to be 2, so the 2 × 2 window generated by the production line is not continuously effective, but is effective at intervals. After a 2 × 2 window is obtained, the four numbers are compared pairwise through two combinational logics to obtain a maximum value, the obtained two outputs are compared through one combinational logic to output the maximum value, and the obtained result is the output of the pooling layer.

Interlayer caching:

The convolutional neural network may treat convolutional layer + pooling layer + activation function as one hierarchical layer. Each hierarchical layer needs to read the feature map cyclically several times, so the feature map must have a buffer for cyclic reading; and since the output of one hierarchical layer is the input of the next, the output of each hierarchical layer also needs a buffer for its data. The flow of the buffer design is shown in FIG. 6: between the hierarchical layers, block RAM is used to buffer the output of each layer, and the weights and bias are stored there as well. When a layer produces output, each datum is stored into the block RAM; when the convolution of the last convolution kernel finishes, the read enable is set to 1 and the next layer begins to read the feature map from the block RAM, reading the weights and bias at the same time.

Fully-connected layer:

After the output of the previous hierarchical layer has been stored, the read signal is enabled and the fully-connected layer begins to read the feature map, the weights and the bias; the feature map and the weights are multiplied by DSP multipliers, the products belonging to the current neuron are summed, and the bias is added when the summation finishes to give the final neuron output. The design flow of the fully-connected layer is shown in FIG. 7, and a software model is sketched below.
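A minimal sketch of the multiply-accumulate behaviour of the fully-connected layer (one DSP multiply per product; the names are illustrative):

```python
def fully_connected(feature, weights, biases):
    """For each neuron: multiply the feature values by that neuron's weights
    (DSP multiplies), accumulate the products, and add the bias only when the
    summation has finished, giving the final neuron output."""
    outputs = []
    for neuron_w, neuron_b in zip(weights, biases):
        acc = 0
        for x, w in zip(feature, neuron_w):
            acc += x * w                        # DSP multiply-accumulate
        outputs.append(acc + neuron_b)          # bias added at the end of the sum
    return outputs
```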

Quantization:

In the PyTorch framework, the weight parameters of each layer of the CNN network are stored and operated on in float32 format; the FPGA cannot perform floating-point operations directly and its DSP units are better suited to fixed-point operations, so, considering both computation and storage, the parameters of the convolutional neural network need to be quantized.

The specific method comprises the following steps:

a. Calculate the scale parameter s and the zero-point offset parameter z:

according to the mutual conversion relation between a floating-point number x and its fixed-point representation:

where x is the floating-point number to be quantized, q(x) is the fixed-point value after quantizing x, floor() truncates the fractional part, s is the scale, whose role is to scale the floating-point number into a fixed interval, and z is the zero point, i.e. the fixed-point offset that the floating-point value 0 is quantized to;

the scale parameter s and the zero-point offset parameter z required for quantization are obtained as follows:

where x_max and x_min are respectively the maximum and minimum of the floating-point number x, and p_max and p_min are respectively the maximum and minimum of the quantized value range;
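Following the definitions in step a, a minimal Python sketch of the scale/zero-point computation and of quantization and inverse quantization; the int8 limits (127, -128) and the exact zero-point rounding are assumptions, since the original formulas are not reproduced here:

```python
import math

def quant_params(x_max, x_min, p_max=127, p_min=-128):
    """Scale s maps the floating-point range onto the quantized range;
    zero point z is the fixed-point value that the float 0 maps to."""
    s = (x_max - x_min) / (p_max - p_min)
    z = p_max - math.floor(x_max / s)       # one common choice of zero point (assumed)
    return s, z

def quantize(x, s, z):
    return math.floor(x / s) + z            # q(x) = floor(x / s) + z

def dequantize(q, s, z):
    return s * (q - z)                      # x is recovered approximately as s * (q(x) - z)
```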

b. When there is no bias, the convolution or pooling operation is:

where N is the number of convolution kernel parameters, x_i is the input data, w_i is the weight, and y is the convolution output of the layer; x_i, w_i and y are all float32 floating-point numbers;

quantizing x_i and w_i gives:

by inverse quantization, x_i and w_i can be expressed as:

substituting equation (5) into equation (3) gives:

the convolution output y is a floating-point number and must also be quantized before being fed into the next convolutional layer; the quantization and inverse quantization of y are:

substituting equation (7) into equation (6) gives:

the data that each layer must pass on to the next layer is the quantized value q(y); transforming equation (8) gives:

this yields the quantized data needed by the next layer and completes the function of the current layer;

a floating-point factor remains in equation (9); let M = s_w·s_x/s_y, so that M is a floating-point number, and approximate it as 2^(-n)·M_0, where n and M_0 are both positive integers and n lies between 0 and 15, such that the error between M and 2^(-n)·M_0 is on the order of 2^(-16); equation (9) is then rewritten as:

where M_0·(q(w_i)-z_w)·(q(x_i)-z_x) and z_y involve only integer arithmetic, and the factor 2^(-n) is realized in the FPGA as a right shift by n bits;

c. When a bias b is added, equation (9) becomes:

where q(b) is the quantized value of b, s_b is the scale of b and z_b is the zero point of b;

q(b) is stored as int32, and letting s_b = s_x·s_w, the quantized result needed by the next layer is expressed as:
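Putting equations (9)-(12) together, a hedged sketch of how one quantized output value is computed with integer arithmetic only (assuming, as above, that the bias is stored as int32 with s_b = s_x·s_w and that M is replaced by M_0·2^(-n), i.e. an integer multiply followed by an n-bit right shift):

```python
def quantized_output(qx, qw, qb, zx, zw, zb, zy, M0, n):
    """Integer-only evaluation of one output value: integer multiply-accumulate
    of the zero-point-corrected inputs and weights, the bias in the same scale,
    then the M0 multiply and right shift that replace the floating-point M."""
    acc = sum((w - zw) * (x - zx) for x, w in zip(qx, qw))   # integer MACs
    acc += qb - zb                                           # int32 bias, s_b = s_x * s_w
    return ((M0 * acc) >> n) + zy                            # q(y) = M0 * 2^-n * acc + z_y
```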

when calculating the maximum value and the minimum value of the value to be quantized required by scale, testing the maximum value and the minimum value of the characteristic diagram of each layer by using at least 100 parts of data, and obtaining scale results which are shown in the following table and used for predicting scale;

after obtaining M, find 2 closest to M-nM0Let n be between 0 and 15, M0GetAndthe results and errors obtained are as follows:

Type n M0 error
conv1 15 27 6.94e-6
conv2 14 15 3.07e-6
fc1 14 19 1.25e-5

Since fc2 is the last layer, there is no need to find 2^(-n)·M_0 for it; the factor is simply dropped during calculation, because scaling the outputs by a positive constant does not change which class has the maximum value.
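A small sketch of the search for the 2^(-n)·M_0 approximation described above (n from 0 to 15, M_0 chosen as the better of the two integers nearest to M·2^n):

```python
def approximate_M(M: float):
    """Return the (n, M0) pair, with n in [0, 15], that minimises
    |M - M0 * 2**-n|, trying the floor and ceiling of M * 2**n as M0."""
    best = None
    for n in range(16):
        for M0 in (int(M * 2 ** n), int(M * 2 ** n) + 1):    # floor and ceiling candidates
            err = abs(M - M0 * 2 ** -n)
            if best is None or err < best[2]:
                best = (n, M0, err)
    return best   # (n, M0, error)
```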

The quality of quantization is measured by the precision loss, i.e. the difference between the test-set accuracy after quantization and before quantization. The errors of this method come mainly from two sources: the error of inverse quantization and the error of approximating M by 2^(-n)·M_0. Under 500 samples, the accuracy error of the design is as follows. As the table shows, the precision error before and after quantization is very small, and the accuracy of the network trained with PyTorch differs from that of the network finally deployed on the FPGA by only 0.2%, which is negligible.

                     Accuracy   Error
Before quantization  97%
After quantization   97%        0%
M approximation      96.8%      0.2%

And (3) performance testing:

the CNN network in the embodiment is a lightweight convolutional neural network designed based on LeNet, the used data set is an MNIST data set, and the used FPGA platform is a ZYNQ-7020 series development board which comprises an FPGA chip and two ARM-A9 processors. The EDA (electronic design automation) tool used was vivado2018.3 from Xilinx corporation; the software tool used anaconda + python3.6 and the deep learning framework used was pytorch1.7.0.

In this embodiment, the resources consumed by the convolutional neural network accelerator designed on the PL side are shown in the table below. The two convolutional layers each use 25 DSPs for the parallel unrolling of a convolution channel, the three dequantization stages use 6 DSPs in total and the multiplications of the two fully-connected layers use 2 DSPs, giving 58 DSPs in total; the three intermediate-layer stores occupy 9 BRAMs in total and the buffers assisting the convolution a further 2 BRAMs, giving 11 BRAMs in total. As the table shows, the accelerator designed by the invention uses only a very small amount of resources, meeting the initial design expectation.

Resource   Used   Available   Utilization %
LUT        2110   53200       3.97
LUTRAM     151    17400       0.87
FF         3555   106400      3.34
BRAM       11     140         7.86
DSP        58     220         26.36
IO         6      125         4.8
MMCM       1      4           25

The power consumption estimate of this embodiment is produced by the Xilinx EDA tool Vivado and is shown in FIG. 8. The total power of the accelerator on the PL side is 0.402 W, of which the main part is dynamic power, i.e. the consumption of the FPGA switching states. The MMCM entry is the resource consumption of clock multiplication; multiplying a low-frequency clock up to a high-frequency clock takes a relatively large share of the power. Apart from that, the highest consumption comes from the DSPs and the BRAMs: the DSPs are mainly used for the parallel unrolling of the products and the BRAMs for intermediate parameter storage, and these are the core of the CNN network. As the figure shows, the power consumption of the accelerator is low, and the junction temperature during operation is 29.6 °C, which is also a suitable operating state for the chip.

The performance evaluation of a CNN accelerator consists mainly of two aspects, precision and speed; the precision has been compared above, with a drop of only 0.2% relative to the PyTorch model. For speed, in order to show the advantages of the CNN accelerator, this embodiment compares its inference speed with a CPU platform; the specifics of the CPU are as follows:

embedded CPU platform: ARM-A9 embedded CPU, the operating frequency is 1 Ghz.

The inference speed of the accelerator in this embodiment of the invention is as follows: the FPGA-designed accelerator takes 0.267 ms to infer one frame, while the ARM-A9 embedded CPU takes 1310 ms to infer one frame, so the inference speed of the FPGA is about 4906 times that of the CPU.

Comparison with CPU

Device ARM-A9 FPGA
Clock(Hz) 1G 200M
Memory(MB) 1024 4.9
Latency per img(ms) 1310 0.267
FPS (1/s) 0.76 3748

By comparison, the low-power, low-resource CNN accelerator designed on the FPGA has a structure that readily allows DSP reuse; its resource consumption, power and precision show that the design is fully applicable to embedded platforms with limited resources and power budgets. At the same time, the comparison with ARM-A9 embedded CPU inference fully demonstrates that the design has a very good acceleration effect on the convolutional neural network.

As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
