Convolution calculation method based on ferroelectric transistor FeFET

Document No.: 988108    Publication date: 2020-11-06

Note: This technique, "Convolution calculation method based on ferroelectric transistor FeFET", was created by Huang Ru, Liu Shuhan and Huang Qianqian on 2020-07-10. Abstract: The invention provides a convolution calculation method based on the ferroelectric transistor (FeFET), belonging to the technical field of convolution computation in convolutional neural networks. The method first constructs a memory array of m x n FeFETs, an m-bit shift register and an n-bit shift register. The FeFETs are arranged into a memory array with m rows and n columns, which stores the input feature map, performs in-memory computation and outputs the result of the convolution operation; the m-bit shift register stores the column vector of the decomposed convolution kernel and outputs voltages to the word lines; the n-bit shift register stores the row vector of the decomposed convolution kernel and outputs voltages to the bit lines. Compared with the conventional approach of converting convolution into matrix multiplication, the invention avoids complex data scheduling and redundant data storage, greatly reduces hardware overhead, and provides a new design idea for the hardware implementation of convolutional neural networks.

1. A convolution calculation method based on a ferroelectric transistor (FeFET) is characterized by comprising the following steps:

1) constructing a memory array consisting of ferroelectric transistors (FeFETs), an m-bit shift register and an n-bit shift register; the FeFETs are arranged into a memory array with m rows and n columns, the gate terminals of the devices in each row are connected to form a word line, the drain terminals of the devices in each column are connected to form a bit line, the source terminals of the devices in each column are connected to form a source line, and all the source lines are connected together to form an output terminal; each output bit of the m-bit shift register is connected to the corresponding word line; each output bit of the n-bit shift register is connected to the corresponding bit line;

2) writing the input feature map into the memory array element by element, with data '0' corresponding to the low-threshold state of the memory device and data '1' corresponding to the high-threshold state; decomposing the convolution kernel into a product of two vectors, storing the two vectors in the two shift registers respectively, and applying them as voltage inputs to the word lines and bit lines corresponding to the convolution window;

3) setting the voltages of all bit lines and word lines outside the convolution window to zero, and reading the output current value from the output port to obtain one result of the convolution calculation; then continuously moving the convolution window until the convolution of the kernel with the input feature map is completed, thereby obtaining the output feature map.

2. A convolution calculation method based on a ferroelectric transistor FeFET as in claim 1, characterized in that the convolution kernel is decomposed into the product of a column vector and a row vector.

3. The method of claim 1, characterized in that, if the convolution kernel is not directly decomposable, it is trained to be a decomposable matrix by applying a constraint during training, approximately decomposed by minimizing a loss function, or decomposed into a linear superposition of a series of decomposable matrices.

4. A convolution calculation method based on a ferroelectric transistor FeFET as in claim 1, characterized in that the shift registers are used to shift the voltages of the bit lines and/or word lines of the convolution window, thereby moving the window.

5. A convolution calculation method based on a ferroelectric transistor FeFET as in claim 1, characterized in that the n-bit shift register is used to store the row vector of the decomposed convolution kernel and to output voltages to the bit lines.

6. A convolution calculation method based on a ferroelectric transistor FeFET as in claim 1, characterized in that the m-bit shift register is used to store the column vector of the decomposed convolution kernel and to output voltages to the word lines.

7. A convolution calculation method based on a ferroelectric transistor FeFET as in claim 1, characterized in that the FeFET device is based on an MFMIS, MFIS or MFS structure.

8. The convolution calculation method based on a ferroelectric transistor FeFET as claimed in claim 1, characterized in that the FeFET device employs a perovskite ferroelectric, a ferroelectric polymer, or a doped-HfO2 ferroelectric material such as Zr-doped HfO2, Al-doped HfO2, Si-doped HfO2 or Y-doped HfO2.

Technical Field

The invention relates to the physical implementation of convolution computation in convolutional neural networks, and in particular to a convolution calculation method based on the ferroelectric transistor (FeFET).

Background

The traditional von Neumann computing architecture faces the bottleneck of the memory wall: data must be moved back and forth between the computing unit and the memory unit, which severely limits system speed and power consumption. As a result, improvements in system computing capability are gradually saturating and can no longer meet the demands of massive data processing and intelligent tasks, so new computing architectures need to be explored. Meanwhile, the human brain is a highly efficient intelligent system that completes complex intelligent computation with extremely low power consumption, and neural-network computing architectures were inspired by the way the human brain operates.

The basic computational units of the human brain are neurons and synapses, organized as neuron-synapse-neuron connections between which information is transmitted in the form of spikes. Synaptic plasticity is the basis of memory and learning in the human brain, while neurons integrate the spike signals arriving from synapses and emit new spikes to convey information. The neural network built in this way is a distributed computing network with integrated storage and computation and high parallelism.

Artificial neural networks borrow the concepts of neurons, synapses and their connection patterns, performing a high-level, module-by-module numerical abstraction of the biological system of the human brain and introducing mathematical operations that do not exist in the brain in pursuit of better overall system performance. In an artificial neural network, synapses remain responsible for learning and memory, but the stored weights are updated in various ways, the most widely used being the back-propagation algorithm. The processing of signals by neurons is abstracted into various forms of nonlinear activation functions, which perform numerical computation on the data arriving from synapses and output activation values. This constitutes the simplest model, the multilayer perceptron (MLP). Driven by the pursuit of higher performance, MLP-based models have been repeatedly extended with additional mathematical operations, creating a wide variety of neural networks, among which the convolutional neural network (CNN) is the most widely used. The convolutional neural network improves computational efficiency by performing convolution operations between the input data and the synaptic weights. Current convolutional neural networks exceed human-brain performance in many respects and are widely used in fields such as image understanding and speech recognition.

However, the high accuracy of convolutional neural networks on various tasks comes at the cost of high computational complexity, which means the hardware overhead of implementing them is usually large; this hinders their application in scenarios such as edge computing and AIoT. Convolution, the core operation of a convolutional neural network, accounts for 90% of its computation, and its hardware overhead is particularly large. It is therefore necessary to study hardware implementations of convolution with low overhead. Common convolution acceleration algorithms include conversion to matrix multiplication, the fast Fourier transform (FFT), and the Winograd algorithm. The latter two are difficult to implement in hardware and are usually used only on GPU/CPU platforms, while converting convolution into matrix multiplication not only requires complex data scheduling during the conversion, but also produces a large amount of redundant data that must be stored in a correspondingly large number of additional registers.
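To make the redundancy concrete, the following NumPy sketch (illustrative only, not part of the invention) shows the standard im2col conversion of convolution into matrix multiplication: a 4 x 4 input unfolded for a 3 x 3 kernel grows from 16 stored values to 36, because overlapping windows duplicate the interior pixels.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw window of a 2-D input into one row of a matrix."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

x = np.arange(16, dtype=float).reshape(4, 4)  # 16 input values
k = np.ones((3, 3))                           # hypothetical 3x3 kernel
cols = im2col(x, 3, 3)                        # shape (4, 9): 36 stored values
y = cols @ k.ravel()                          # convolution as a matrix-vector product
print(cols.shape, y.reshape(2, 2))            # the redundant storage the invention avoids
```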

With the wider adoption of compute-in-memory architectures for neural networks, more and more emerging memory devices are being studied for the hardware implementation of neural networks. The ferroelectric transistor (FeFET) is attracting attention as a memory technology with many advantages for such hardware. A FeFET is formed by adding a layer of ferroelectric material above the gate oxide of a MOSFET. When the ferroelectric polarization points towards the device channel, positive charge is induced at the gate oxide, increasing the channel conductance and the current, which corresponds to a lower threshold voltage. Conversely, when the polarization points away from the channel, the channel conductance and current decrease, which corresponds to a higher threshold voltage. Different polarization states of the ferroelectric therefore correspond to different threshold voltages of the FeFET, and thus to different channel currents at the same terminal voltages, so the FeFET can serve as a memory device storing different states. Applying a voltage pulse to the gate changes the polarization of the ferroelectric material through the electric field, thereby changing the threshold voltage of the FeFET, i.e. its memory state. Owing to excellent device characteristics such as low write power consumption and low read current, FeFETs have already been studied for use in convolutional neural networks. These studies, however, adopt the conventional scheme: convolution is first converted into matrix multiplication, and a crossbar memory array then performs the matrix multiplication in memory for hardware acceleration. They also operate the FeFET in a two-terminal mode, feeding the vector input into one terminal, reading the multiply-accumulate result from another, and holding the third terminal at a fixed voltage, which does not fully exploit the extra terminal that the FeFET offers as a three-terminal device.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a convolution calculation method based on the ferroelectric transistor FeFET. Compared with the conventional approach of converting convolution into matrix multiplication, the method avoids complex data scheduling and redundant data storage, realizes convolution directly in hardware for the first time, greatly reduces hardware overhead, and provides a new design idea for the hardware implementation of convolutional neural networks.

The invention aims to provide a convolution calculation method based on a ferroelectric transistor FeFET, which comprises the following steps:

First, the FeFET-based hardware comprises a crossbar memory array of m x n ferroelectric transistors (FeFETs), an m-bit shift register and an n-bit shift register. The FeFET memory array stores the input feature map, performs the in-memory computation and outputs the result of the convolution operation; the FeFETs are arranged into an array of m rows and n columns, the gate terminals of the devices in each row are connected to form a word line, the drain terminals of the devices in each column are connected to form a bit line, the source terminals of the devices in each column are connected to form a source line, and all the source lines are connected together to form the output terminal. The m-bit shift register stores the column vector of the decomposed convolution kernel and outputs voltages to the word lines, each of its output bits being connected to the corresponding word line; the n-bit shift register stores the row vector of the decomposed convolution kernel and outputs voltages to the bit lines, each of its output bits being connected to the corresponding bit line.
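A behavioural sketch of this structure follows (my simplification for illustration, not a circuit-level model of the array): an m x n matrix of stored states, word-line voltages driven by the m-bit register, bit-line voltages driven by the n-bit register, and a shared source-line output that sums the current of each conducting device in proportion to the product of its word-line and bit-line voltages. How the low/high threshold encoding of '0'/'1' realises the on/off behaviour at the device level is abstracted away here.

```python
import numpy as np

class FeFETCrossbar:
    """Behavioural model of the m x n FeFET array with a shared source-line output.
    A device contributes a current proportional to WL[i] * BL[j] scaled by its
    stored bit; the threshold-voltage encoding of that bit is abstracted away."""

    def __init__(self, m, n):
        self.state = np.zeros((m, n))  # stored feature-map bits, one per FeFET
        self.wl = np.zeros(m)          # word-line voltages (from the m-bit shift register)
        self.bl = np.zeros(n)          # bit-line voltages (from the n-bit shift register)

    def write(self, feature_map):
        """Programme the input feature map into the array, one bit per device."""
        self.state[...] = feature_map

    def read_output(self):
        """Current summed on the common source-line output (arbitrary units)."""
        return float(self.wl @ self.state @ self.bl)
```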

Second, the input feature map is written into the memory array element by element, with data '0' corresponding to the low-threshold state of the memory device and data '1' corresponding to the high-threshold state. The convolution kernel is decomposed into the product of two vectors, the two vectors are stored in the two shift registers, and they are applied as voltage inputs to the word lines and bit lines corresponding to the convolution window. The voltages of all other bit lines and word lines are set to zero, and the output current value read from the output port gives one result of the convolution calculation. The convolution window is then moved continuously by shifting the shift registers until the convolution of the kernel with the input feature map is completed and the output feature map is obtained.
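The reason the decomposition works can be checked numerically: if the kernel is K = u v^T, then driving the window's word lines with u and its bit lines with v makes the summed output current proportional to sum_ij u_i * v_j * X_ij = sum_ij K_ij * X_ij, which is exactly the convolution result for that window. A short check under the behavioural assumptions above (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(3, 3)).astype(float)  # stored bits inside one 3x3 window
u = np.array([1.0, 2.0, 3.0])                      # column vector -> word-line voltages
v = np.array([0.5, 1.0, 1.5])                      # row vector   -> bit-line voltages
K = np.outer(u, v)                                 # rank-1 convolution kernel K = u v^T

current = u @ X @ v          # what the shared source line accumulates for this window
reference = np.sum(K * X)    # direct element-wise convolution of the same window
assert np.isclose(current, reference)
```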

The ferroelectric transistor used in the convolution calculation method of the invention has the typical memory characteristics of a FeFET. In terms of write characteristics, applying a write voltage to the gate changes the threshold voltage of the FeFET, i.e. its memory state; in terms of read characteristics, applying read voltages to the gate and between source and drain yields different source-drain currents for different memory states. The FeFET used in the convolution calculation hardware of the invention may be any FeFET device with the above typical characteristics, using a conventional ferroelectric material such as a perovskite ferroelectric (PZT, BFO, SBT), a ferroelectric polymer (P(VDF-TrFE)), or a doped-HfO2 ferroelectric such as Zr-doped HfO2 (HZO), Al-doped HfO2 (HfAlO), Si-doped HfO2 or Y-doped HfO2.
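A toy two-state device model of these write and read characteristics is sketched below (the threshold voltages, read biases and current prefactor are assumed illustrative values, not measured data): a gate pulse sets the polarisation and hence the threshold voltage, and the read current at a fixed read bias then depends on the stored state.

```python
class ToyFeFET:
    """Toy two-state FeFET: a write pulse on the gate selects one of two threshold
    voltages; the read current at a fixed read bias depends on that state."""
    VT_LOW, VT_HIGH = 0.3, 1.0          # assumed low/high threshold voltages (V)

    def __init__(self):
        self.vt = self.VT_HIGH          # assume the device starts in the high-Vt state

    def write(self, gate_pulse_v):
        """A positive gate pulse polarises the ferroelectric towards the channel
        (low Vt, higher read current); a negative pulse reverses it (high Vt)."""
        self.vt = self.VT_LOW if gate_pulse_v > 0 else self.VT_HIGH

    def read(self, v_gate=0.6, v_drain=0.1, k=1e-4):
        """Simplified linear-region current: I = k * (Vg - Vt) * Vd above threshold."""
        return k * (v_gate - self.vt) * v_drain if v_gate > self.vt else 0.0
```

With the read gate voltage chosen between the two threshold levels, read() returns a measurable current only in the low-Vt state, which is the distinction the array read-out relies on.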

The convolution calculation hardware design based on the ferroelectric transistor FeFET has the following beneficial effects:

By exploiting the three-terminal structure and low write power consumption of the FeFET device, the invention avoids the data scheduling and redundant data storage introduced by converting convolution into matrix multiplication, realizes convolution directly in hardware for the first time, greatly reduces hardware overhead compared with conventional schemes, and provides a new design idea for the hardware implementation of convolutional neural networks.

Drawings

FIG. 1 is a schematic flow chart of a convolution calculation method based on a ferroelectric transistor FeFET according to the present invention;

FIG. 2 is a schematic diagram of the FeFET array in an embodiment of the present invention.

Detailed Description

The invention is further described below by means of specific embodiments with reference to the drawings.

As shown in FIG. 2, the present embodiment is a convolution calculation hardware design implemented with Zr-doped HfO2 (HZO) ferroelectric transistors (FeFETs), comprising a crossbar memory array of m x n FeFETs, an m-bit shift register and an n-bit shift register. The FeFET memory array stores the input feature map, performs the in-memory computation and outputs the convolution results; the FeFETs are arranged into an array of m rows and n columns, the gate terminals of the devices in each row are connected to form word lines WL1-WLm, the drain terminals of the devices in each column are connected to form bit lines BL1-BLn, the source terminals of the devices in each column are connected to form source lines SL1-SLn, and all the source lines SL are connected together to form the output terminal Output. The m-bit shift register stores the column vector of the decomposed convolution kernel and outputs voltages to word lines WL1-WLm, each of its output bits being connected to the corresponding word line; the n-bit shift register stores the row vector of the decomposed convolution kernel and outputs voltages to bit lines BL1-BLn, each of its output bits being connected to the corresponding bit line.

The convolution calculation process based on the ferroelectric transistor FeFET of the invention consists of five steps. In the first step, the input feature map is written into the crossbar memory array: as shown in FIG. 1, data '0' corresponds to the low-threshold state of the memory device and data '1' corresponds to the high-threshold state, and the feature map is written into the array element by element. In the second step, the convolution kernel is decomposed into the product of two vectors. Some special matrices can be decomposed exactly into the product of a column vector and a row vector; for non-decomposable matrices, solutions such as training the kernel to be a decomposable matrix under a constraint, approximating the decomposition by minimizing a loss function, or decomposing the kernel into a linear superposition of a series of decomposable matrices can be adopted. In the third step, the two vectors obtained in the previous step are stored in the two shift registers and applied as voltage inputs to the bit lines BL and word lines WL corresponding to the convolution window, the voltages of all other bit lines and word lines are set to zero, and the output current value read from the output port Output is the result of this calculation. In the fourth step, the convolution window is moved to obtain the next result; in hardware this corresponds to shifting the BL and/or WL voltages of the previous step with the shift registers, simulating the movement of the convolution kernel over the input feature map, and the output current is again read from the port Output. In the fifth step, the fourth step is repeated, shifting continuously until the convolution of the kernel with the input feature map is completed, yielding the output feature map.
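The five steps can be tied together in a short behavioural simulation (my reconstruction under the same simplified current model as above; the SVD used here is one concrete way to obtain the rank-1 factors, standing in for the approximate decomposition mentioned in the second step):

```python
import numpy as np

def rank1_factors(kernel):
    """Best rank-1 approximation K ~ u v^T in the least-squares sense (exact if K is rank-1)."""
    U, s, Vt = np.linalg.svd(kernel)
    return U[:, 0] * np.sqrt(s[0]), Vt[0] * np.sqrt(s[0])

def crossbar_conv2d(feature_map, kernel):
    m, n = feature_map.shape                   # step 1: feature map stored in the m x n array
    kh, kw = kernel.shape
    u, v = rank1_factors(kernel)               # step 2: kernel -> column vector u, row vector v
    out = np.zeros((m - kh + 1, n - kw + 1))
    for r in range(out.shape[0]):              # steps 3-5: slide the convolution window;
        for c in range(out.shape[1]):          # in hardware the shift registers shift u and v
            wl = np.zeros(m)
            bl = np.zeros(n)
            wl[r:r + kh] = u                   # word-line voltages (zero outside the window)
            bl[c:c + kw] = v                   # bit-line voltages (zero outside the window)
            out[r, c] = wl @ feature_map @ bl  # summed source-line current = one output value
    return out

fmap = (np.random.default_rng(1).random((6, 6)) > 0.5).astype(float)  # binary feature map
kernel = np.outer([1.0, 2.0, 1.0], [1.0, 0.0, -1.0])                  # separable (rank-1) kernel
print(crossbar_conv2d(fmap, kernel))
```

For a kernel that is not rank-1, rank1_factors returns the loss-minimizing approximation, and the exact result can be recovered by running the loop once per rank-1 term of the SVD and summing the outputs, matching the "linear superposition of decomposable matrices" option of the second step.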

In summary, the FeFET-based convolution calculation hardware design of the invention exploits the three-terminal structure and low write power consumption of the FeFET device: the entire convolution calculation can be realized with only one FeFET memory array and two shift registers. Compared with the conventional scheme of converting convolution into matrix multiplication, it avoids the data scheduling of the conversion process and saves the register overhead of storing redundant data, realizes convolution directly in hardware for the first time, greatly reduces hardware overhead, and provides a new design idea for the hardware implementation of convolutional neural networks.

Finally, it should be noted that the disclosed embodiments are intended to aid further understanding of the invention; those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments; its scope is defined by the appended claims.
