Decoder for analog neural memory in deep learning artificial neural network
Reading note: This technology, "Decoder for analog neural memory in deep learning artificial neural network," was created by H. V. Tran, S. Hong, A. Ly, T. Vu, H. Pham, K. Nguyen, and H. Tran on 2019-01-24. Summary: Various embodiments of a decoder for use with a vector-matrix multiplication (VMM) array in an artificial neural network are disclosed. The decoder includes a bit line decoder, a word line decoder, a control gate decoder, a source line decoder, and an erase gate decoder. In some embodiments, high-voltage versions and low-voltage versions of the decoders are used.
1. A bit line decoder circuit coupled to a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line, the bit line decoder circuit comprising:
a first circuit for enabling individual bit lines during program and verify operations; and
a second circuit for enabling all bit lines during a read operation.
2. The bit line decoder circuit of claim 1, wherein the second circuit comprises a select transistor and an activation function circuit coupled to each bit line.
3. The bit line decoder circuit of claim 2, wherein the gate of each select transistor is coupled to the same control line.
4. The bit line decoder circuit of claim 1, wherein a negative bias is applied to the word line of each unselected memory cell during program and verify operations or read operations.
5. The bit line decoder circuit of claim 1, wherein each of the non-volatile memory cells is a split gate flash memory cell.
6. The bit line decoder circuit of claim 1, wherein each of the non-volatile memory cells is a stacked gate flash memory cell.
7. The bit line decoder circuit of claim 1, wherein each of the non-volatile memory cells is configured to operate in a sub-threshold region.
8. The bit line decoder circuit of claim 1, wherein each of the non-volatile memory cells is configured to operate in a linear region.
9. The bit line decoder circuit of claim 1, wherein a negative bias is applied to the gate of each unselected bit line decoder during program and verify operations or read operations.
10. A bit line decoder circuit coupled to a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line, the bit line decoder circuit comprising:
a multiplexing circuit, wherein in a first mode the multiplexing circuit enables individual bit lines during program and verify operations, and in a second mode the multiplexing circuit enables all bit lines during read operations.
11. The bit line decoder circuit of claim 10, wherein the multiplexing circuit comprises a select transistor and an activation function circuit coupled to each bit line.
12. The bit line decoder circuit of claim 10, wherein a negative bias is applied to the word line of each unselected memory cell during program and verify operations or read operations.
13. The bit line decoder circuit of claim 10, wherein each of the non-volatile memory cells is a split gate flash memory cell.
14. The bit line decoder circuit of claim 10, wherein each of the non-volatile memory cells is a stacked gate flash memory cell.
15. The bit line decoder circuit of claim 10, wherein each of the non-volatile memory cells is configured to operate in a sub-threshold region.
16. The bit line decoder circuit of claim 10, wherein each of the non-volatile memory cells is configured to operate in a linear region.
17. The bit line decoder circuit of claim 10, wherein a negative bias is applied to the gate of each unselected bit line decoder during program and verify operations or read operations.
18. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line and each memory cell comprises a word line terminal and a source line terminal;
a word line decoder circuit coupled to the word line terminal of the non-volatile memory cell, wherein the word line decoder circuit is capable of applying a low voltage or a high voltage to the coupled word line terminal; and
a source line decoder circuit coupled to the source line terminal of the non-volatile memory cell, wherein the source line decoder circuit is capable of applying a low voltage or a high voltage to the coupled source line terminal.
19. The system of claim 18, wherein each memory cell further comprises an erase gate terminal, the system further comprising:
an erase gate decoder circuit coupled to the erase gate terminal of the non-volatile memory cell, wherein the erase gate decoder circuit is capable of applying a low voltage or a high voltage to the coupled erase gate terminal.
20. A word line driver coupled to a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each row is coupled to a word line and each word line is coupled to the word line driver, the word line driver comprising:
a plurality of select transistors, each of the plurality of select transistors comprising a first terminal, a second terminal, and a gate, wherein the gate of each of the plurality of select transistors is coupled to a common control line and the first terminal of each of the plurality of select transistors is coupled to a different word line, and wherein the second terminal of each of the plurality of select transistors is coupled to one or more bias transistors;
wherein the bias transistor coupled to each of the plurality of select transistors is capable of providing a bias voltage to a single select transistor or all of the select transistors.
21. The word line driver of claim 20, wherein at least one bias transistor coupled to each of the plurality of select transistors is coupled to a common control line.
22. The word line driver of claim 20, wherein each bias transistor coupled to each of the plurality of select transistors is coupled to a different control line.
23. The word line driver of claim 20, wherein each of the bias transistors is coupled to a circuit for decoding a word line address.
24. The word line driver of claim 20, wherein the bias transistor is coupled to a shift register.
25. The word line driver of claim 20, wherein each select transistor is coupled to a capacitor.
26. The word line driver of claim 20, wherein each bias transistor is coupled to a comparator.
27. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line and each memory cell comprises a word line terminal and a source line terminal;
a word line decoder circuit coupled to the word line terminals of the non-volatile memory cells, wherein the word line decoder circuit is capable of applying a low voltage to a coupled word line terminal through a low voltage transistor or a high voltage to a coupled word line terminal through a high voltage transistor, the word line decoder circuit including an isolation transistor coupled to each word line to isolate the high voltage transistor from the low voltage transistor.
28. The system of claim 27, wherein a negative bias is applied to the word line of each unselected memory cell during program and verify operations or read operations.
29. The system of claim 27, wherein each of the non-volatile memory cells is a split gate flash memory cell.
30. The system of claim 27, wherein each of the non-volatile memory cells is a stacked gate flash memory cell.
31. The system of claim 27, wherein each of the non-volatile memory cells is configured to operate in a sub-threshold region.
32. The system of claim 27, wherein each of the non-volatile memory cells is configured to operate in a linear region.
33. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line and each memory cell comprises a word line terminal and a source line terminal;
a word line decoder circuit coupled to the word line terminal of the non-volatile memory cell, wherein the word line decoder circuit is capable of applying a low voltage or a high voltage to the coupled word line terminal; and
a sample-and-hold capacitor coupled to each word line.
34. The system of claim 33, wherein a negative bias is applied to the word line of each unselected memory cell during program and verify operations or read operations.
35. The system of claim 33, wherein the capacitor in the sample-and-hold capacitor is provided by an inherent capacitance of a word line.
36. The system of claim 33, wherein each of the non-volatile memory cells is a split gate flash memory cell.
37. The system of claim 33, wherein each of the non-volatile memory cells is a stacked gate flash memory cell.
38. The system of claim 33, wherein each of the non-volatile memory cells is configured to operate in a sub-threshold region.
39. The system of claim 33, wherein each of the non-volatile memory cells is configured to operate in a linear region.
40. The system of claim 33, wherein the S/H capacitor is used as a voltage source for the word line.
41. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line and each memory cell comprises a word line terminal and a source line terminal; and
a source line decoder circuit coupled to the source line terminal of the non-volatile memory cell, wherein the source line decoder circuit is capable of applying a low voltage or a high voltage to the coupled source line terminal, wherein the source line decoder circuit includes a drive transistor and a monitor transistor.
42. The system of claim 41, wherein the system further comprises a force sensing circuit operating in a closed loop.
43. The system of claim 41, wherein each of the non-volatile memory cells is a split gate flash memory cell.
44. The system of claim 41, wherein each of the non-volatile memory cells is a stacked gate flash memory cell.
45. The system of claim 41, wherein each of the non-volatile memory cells is configured to operate in a sub-threshold region.
46. The system of claim 41, wherein each of the non-volatile memory cells is configured to operate in a linear region.
47. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns, wherein each column is connected to a bit line and each memory cell comprises a word line terminal and a source line terminal;
a control gate decoder circuit coupled to the control gate terminal of the non-volatile memory cell, wherein the control gate decoder circuit is capable of applying a low voltage or a high voltage to the coupled control gate terminal; and
a sample-and-hold capacitor coupled to each control gate line.
48. The system of claim 47, wherein a negative bias is applied to the word line of each unselected memory cell during program and verify operations or read operations.
49. The system of claim 47, wherein the sample-and-hold capacitor comprises a control gate capacitance.
50. The system of claim 47, wherein each of the non-volatile memory cells is a split gate flash memory cell.
51. The system of claim 47, wherein each of the non-volatile memory cells is a stacked gate flash memory cell.
52. The system of claim 47, wherein each of the non-volatile memory cells is configured to operate in a sub-threshold region.
53. The system of claim 47, wherein each of the non-volatile memory cells is configured to operate in a linear region.
54. The system of claim 47, wherein the S/H capacitor serves as a voltage source for the control gate line.
55. A current-to-voltage circuit, the circuit comprising:
a reference circuit to receive an input current and output a first voltage in response to the input current, the reference circuit comprising an input current source, an NMOS transistor, a cascode bias transistor, and a reference memory cell;
a sample and hold circuit for receiving the first voltage and outputting a second voltage constituting a sampled value of the first voltage, the sample and hold circuit comprising a switch and a capacitor.
56. The current-to-voltage circuit of claim 55 further comprising an amplifier for receiving the second voltage and outputting a third voltage.
57. The current-to-voltage circuit of claim 55, wherein the second voltage is provided to a wordline in a flash memory system.
58. The current-to-voltage circuit of claim 56, wherein the second voltage is provided to a wordline in a flash memory system.
59. The current-to-voltage circuit of claim 55, wherein the second voltage is provided to a control gate line in a flash memory system.
60. The current-to-voltage circuit of claim 56, wherein the second voltage is provided to a control gate line in a flash memory system.
61. A current-to-voltage circuit, the circuit comprising:
a reference circuit to receive an input current and output a first voltage in response to the input current, the reference circuit comprising an input current source, an NMOS transistor, a cascode bias transistor, and a reference memory cell;
an amplifier for receiving the first voltage and outputting a second voltage;
a sample and hold circuit for receiving the second voltage and outputting a third voltage constituting a sampled value of the second voltage, the sample and hold circuit comprising a switch and a capacitor.
62. The current-to-voltage circuit of claim 61, wherein the third voltage is provided to a wordline in a flash memory system.
63. The current-to-voltage circuit of claim 61, wherein the third voltage is provided to a control gate line in a flash memory system.
64. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns; and
redundant sectors.
65. The system of claim 64, further comprising a non-volatile register for storing system information.
66. A neuromorphic memory system, the system comprising:
a vector-matrix multiplication array comprising an array of non-volatile memory cells organized into rows and columns; and
a non-volatile register to store system information.
Technical Field
Various embodiments of a decoder for use with a vector-matrix multiplication (VMM) array in an artificial neural network are disclosed.
Background
Artificial neural networks mimic biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks typically include layers of interconnected "neurons" that exchange messages with each other.
FIG. 1 illustrates an artificial neural network, where circles represent the inputs or layers of neurons. Connections (called synapses) are indicated by arrows and have a numerical weight that can be adjusted empirically. This enables the neural network to adapt to the input and to learn. Typically, a neural network includes a layer of multiple inputs. There are typically one or more intermediate layers of neurons, and an output layer of neurons that provides the output of the neural network. Neurons at each level make decisions based on data received from synapses, either individually or collectively.
One of the major challenges in developing artificial neural networks for high-performance information processing is the lack of adequate hardware technology. In practice, useful neural networks rely on a very large number of synapses to achieve high connectivity between neurons, i.e., very high computational parallelism. In principle, such complexity can be achieved with digital supercomputers or clusters of graphics processing units. However, in addition to their high cost, these approaches are also energy-inefficient compared with biological networks, which consume far less energy primarily because they perform low-precision analog computation. CMOS analog circuits have been used for artificial neural networks, but most CMOS-implemented synapses are too bulky given the large number of neurons and synapses required.
Applicants previously disclosed an artificial (analog) neural network that utilizes one or more non-volatile memory arrays as synapses in U.S. patent application 15/594,439, which is incorporated herein by reference. The non-volatile memory arrays operate as an analog neuromorphic memory. The neural network device includes a first plurality of synapses configured to receive a first plurality of inputs and to generate therefrom a first plurality of outputs, and a first plurality of neurons configured to receive the first plurality of outputs. The first plurality of synapses includes a plurality of memory cells, wherein each of the memory cells includes: spaced-apart source and drain regions formed in a semiconductor substrate, with a channel region extending between the source and drain regions; a floating gate disposed over and insulated from a first portion of the channel region; and a non-floating gate disposed over and insulated from a second portion of the channel region. Each of the plurality of memory cells is configured to store a weight value corresponding to a number of electrons on the floating gate. The plurality of memory cells is configured to multiply the first plurality of inputs by the stored weight values to generate the first plurality of outputs.
Each non-volatile memory cell used in an analog neuromorphic memory system must be erased and programmed to maintain a very specific and precise amount of charge in the floating gate. For example, each floating gate must hold one of N different values, where N is the number of different weights that can be indicated by each cell. Examples of N include 16, 32, and 64.
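To make the discrete-level idea concrete, here is a minimal sketch (not from the patent; the weight range and function are assumed for illustration) of mapping a continuous weight to one of N floating-gate levels:

```python
# Illustrative sketch: quantize a continuous weight into one of N discrete
# floating-gate levels. The weight range [-1, 1] is an assumed example.

def quantize_weight(w: float, n_levels: int, w_min: float = -1.0, w_max: float = 1.0) -> int:
    """Return the index (0 .. n_levels - 1) of the level nearest to w."""
    w = min(max(w, w_min), w_max)              # clamp to the representable range
    step = (w_max - w_min) / (n_levels - 1)    # spacing between adjacent levels
    return round((w - w_min) / step)

# With N = 16 levels, each cell effectively stores log2(16) = 4 bits of weight.
level = quantize_weight(0.5, n_levels=16)      # level index 11 of 0..15
```

With N = 16, 32, or 64, one cell holds 4, 5, or 6 bits of weight precision, which is why each cell's charge must be tuned so precisely.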
Prior art decoding circuits used in conventional flash memory arrays (such as bit line decoders, word line decoders, control gate decoders, source line decoders, and erase gate decoders) are not suitable for use with VMMs in analog neuromorphic memory systems. One reason is that in a VMM system, the verify portion of a program-and-verify operation (which is itself a read operation) acts on a single selected memory cell, whereas a read operation acts on all of the memory cells in the array.
What is needed are improved decoding circuits suitable for use with a VMM in an analog neuromorphic memory system.
Disclosure of Invention
Various embodiments of a decoder for use with a vector-matrix multiplication (VMM) array in an artificial neural network are disclosed.
Drawings
FIG. 1 is a schematic diagram illustrating an artificial neural network.
FIG. 2 is a cross-sectional side view of a conventional 2-gate non-volatile memory cell.
FIG. 3 is a cross-sectional side view of a conventional 4-gate non-volatile memory cell.
FIG. 4 is a cross-sectional side view of a conventional 3-gate non-volatile memory cell.
FIG. 5 is a cross-sectional side view of another conventional 2-gate non-volatile memory cell.
FIG. 6 is a schematic diagram illustrating different stages of an exemplary artificial neural network utilizing a non-volatile memory array.
FIG. 7 is a block diagram showing a vector multiplier matrix.
FIG. 8 is a block diagram showing various stages of a vector multiplier matrix.
FIG. 9 depicts an embodiment of a vector multiplier matrix.
FIG. 10 depicts another embodiment of a vector multiplier matrix.
FIG. 11 depicts another embodiment of a vector multiplier matrix.
FIG. 12 depicts another embodiment of a vector multiplier matrix.
FIG. 13 depicts another embodiment of a vector multiplier matrix.
FIG. 14 depicts an embodiment of a bit line decoder for a vector multiplier matrix.
FIG. 15 depicts another embodiment of a bit line decoder for a vector multiplier matrix.
FIG. 16 depicts another embodiment of a bit line decoder for a vector multiplier matrix.
FIG. 17 depicts a system for operating a vector multiplier matrix.
FIG. 18 depicts another system for operating a vector multiplier matrix.
FIG. 19 depicts another system for operating a vector multiplier matrix.
FIG. 20 depicts an embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 21 depicts another embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 22 depicts another embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 23 depicts another embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 24 depicts another embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 25 depicts another embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 26 depicts another embodiment of a word line driver for use with a vector multiplier matrix.
FIG. 27 depicts a source line decoder circuit for use with a vector multiplier matrix.
FIG. 28 depicts a word line decoder circuit, a source line decoder circuit, and a high voltage level shifter for use with a vector multiplier matrix.
FIG. 29 depicts an erase gate decoder circuit, a control gate decoder circuit, a source line decoder circuit, and a high voltage level shifter for use with a vector multiplier matrix.
FIG. 30 depicts a word line decoder circuit for use with a vector multiplier matrix.
FIG. 31 depicts a control gate decoder circuit for use with a vector multiplier matrix.
FIG. 32 depicts another control gate decoder circuit for use with a vector multiplier matrix.
FIG. 33 depicts another control gate decoder circuit for use with a vector multiplier matrix.
FIG. 34 depicts a current-to-voltage circuit for controlling word lines in a vector multiplier matrix.
FIG. 35 depicts another current-to-voltage circuit for controlling word lines in a vector multiplier matrix.
FIG. 36 depicts a current-to-voltage circuit for controlling control gate lines in a vector multiplier matrix.
FIG. 37 depicts another current-to-voltage circuit for controlling control gate lines in a vector multiplier matrix.
FIG. 38 depicts another current-to-voltage circuit for controlling control gate lines in a vector multiplier matrix.
FIG. 39 depicts another current-to-voltage circuit for controlling word lines in a vector multiplier matrix.
FIG. 40 depicts another current-to-voltage circuit for controlling word lines in a vector multiplier matrix.
FIG. 41 depicts another current-to-voltage circuit for controlling word lines in a vector multiplier matrix.
FIG. 42 depicts the operating voltages of the vector multiplier matrix of FIG. 9.
FIG. 43 depicts the operating voltages of the vector multiplier matrix of FIG. 10.
FIG. 44 depicts the operating voltages of the vector multiplier matrix of FIG. 11.
FIG. 45 depicts the operating voltages of the vector multiplier matrix of FIG. 12.
Detailed Description
The artificial neural network of the present invention utilizes a combination of CMOS technology and a non-volatile memory array.
Non-volatile memory cell
Digital non-volatile memories are well known. For example, U.S. Patent 5,029,130 ("the '130 patent"), which is incorporated herein by reference for all purposes, discloses an array of split-gate non-volatile memory cells. Such a memory cell 210 is shown in FIG. 2. Memory cell 210 is erased (with electrons removed from the floating gate) by placing a high positive voltage on the word line terminal, which causes electrons on the floating gate to tunnel to the word line terminal. Memory cell 210 is programmed (with electrons placed on the floating gate) by placing a positive voltage on the word line terminal and a positive voltage on the source line terminal.
Table 1 depicts typical voltage ranges that may be applied to the terminals of memory cell 210 for performing read, erase, and program operations:
table 1: operation of
WL
SL
Reading
2V-3V
0.6V-
0V
Erasing
About 11V to 0V
0V
Programming
1V-2V
1μA-3μA
9V-10V
Other split-gate memory cell configurations are known. For example, FIG. 3 depicts a four-gate memory cell 310 that includes a source region, a drain region, a floating gate over a first portion of the channel region, a select gate (typically coupled to a word line) over a second portion of the channel region, a control gate over the floating gate, and an erase gate over the source region.
Table 2 depicts typical voltage ranges that may be applied to the terminals of memory cell 310 for performing read, erase, and program operations:
table 2: operation of flash memory cell 310 of FIG. 3
FIG. 4 depicts a split-gate three-gate memory cell 410. Memory cell 410 is similar to memory cell 310 of FIG. 3, except that it does not have a separate control gate.
Table 3 depicts typical voltage ranges that may be applied to the terminals of memory cell 410 for performing read, erase, and program operations:
Table 3: Operation of Flash Memory Cell 410 of FIG. 4

              WL/SG        BL          EG         SL
    Read      0.7V-2.2V    0.6V-2V     0V-2.6V    0V
    Erase     -0.5V/0V     0V          11.5V      0V
    Program   1V           2μA-3μA     4.5V       7V-9V
FIG. 5 depicts a stacked-gate memory cell 510. Memory cell 510 is similar to memory cell 210 of FIG. 2, except that the floating gate extends over the entire channel region, and a control gate (which is coupled to a word line) extends over the floating gate, separated from it by an insulating layer.
Table 4 depicts a typical voltage range that may be applied to the terminals of memory cell 510 for performing read, erase, and program operations:
Table 4: Operation of Flash Memory Cell 510 of FIG. 5

              CG                BL         SL       P-sub
    Read      2V-5V             0.6V-2V    0V       0V
    Erase     -8V to -10V/0V    FLT        FLT      8V-10V/15V-20V
    Program   8V-12V            3V-5V      0V       0V
In order to utilize a memory array comprising one of the above types of non-volatile memory cells in an artificial neural network, two modifications are made. First, the circuitry is configured so that each memory cell can be programmed, erased, and read individually without adversely affecting the memory state of other memory cells in the array, as explained further below. Second, continuous (analog) programming of the memory cells is provided.
In particular, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be continuously changed from a fully erased state to a fully programmed state, independently and with minimal disturbance to other memory cells. In another embodiment, the memory state of each memory cell in the array can be continuously changed from a fully programmed state to a fully erased state, and vice versa, independently and with minimal disturbance to other memory cells. This means that the cell storage is analog, or at least can store one of many discrete values (such as 16 or 64 different values), which allows very precise and individual tuning of all the cells in the memory array, and which makes the memory array ideal for storing and fine-tuning the synaptic weights of a neural network.
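The precise, individual tuning described above is typically reached iteratively. A hedged sketch of such a program-and-verify loop follows; the function names and the pulse model are hypothetical, not the patent's circuits:

```python
# Hypothetical program-and-verify loop: apply a small program pulse, read
# (verify) the selected cell, and repeat until the cell reaches its target.

def tune_cell(read_cell, program_pulse, target, tolerance, max_pulses=100):
    """Drive one cell toward `target` using program-and-verify iterations."""
    for _ in range(max_pulses):
        value = read_cell()                  # verify: read the single selected cell
        if abs(value - target) <= tolerance:
            return True                      # within tolerance of the target level
        program_pulse()                      # nudge the floating-gate charge
    return False                             # did not converge within max_pulses

# Example with a simulated cell whose read value rises by 0.1 per pulse:
state = {"v": 0.0}
converged = tune_cell(lambda: state["v"],
                      lambda: state.update(v=state["v"] + 0.1),
                      target=0.5, tolerance=0.05)
```

The verify step reads one selected cell at a time, which is exactly the single-cell access mode the decoders of this disclosure must support alongside all-cell reads.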
Neural Networks Employing Non-Volatile Memory Cell Arrays
Figure 6 conceptually illustrates a non-limiting example of a neural network that utilizes a non-volatile memory array. This example uses a non-volatile memory array neural network for facial recognition applications, but any other suitable application may also be implemented using a non-volatile memory array based neural network.
For this example, S0 is the input layer, which is a 32x32 pixel RGB image with 5-bit precision (i.e., three 32x32 pixel arrays, one for each color R, G, and B, with each pixel having 5-bit precision). The synapses CB1 going from S0 to C1 have both different sets of weights and shared weights, and they scan the input image with a 3x3 pixel overlapping filter (kernel), shifting the filter by 1 pixel (or more than 1 pixel, as dictated by the model). Specifically, the values of 9 pixels in a 3x3 portion of the image (i.e., a filter or kernel) are provided to the synapses CB1, where these 9 input values are multiplied by the appropriate weights and, after summing the outputs of that multiplication, a single output value is determined by the first neuron of CB1 and provided for generating a pixel of one of the feature maps of layer C1. The 3x3 filter is then shifted one pixel to the right (i.e., adding the column of three pixels on the right and dropping the column of three pixels on the left), whereby the 9 pixel values in the newly located filter are provided to the synapses CB1, where they are multiplied by the same weights and a second single output value is determined by the associated neuron. This process continues until the 3x3 filter has scanned the entire 32x32 pixel image, for all three colors and for all bits (precision values). The process is then repeated using different sets of weights to generate a different feature map of C1, until all the feature maps of layer C1 have been computed.
At C1, in this example, there are 16 feature maps, each having 30x30 pixels. Each pixel is a new feature pixel extracted from the product of the input and kernel, so each feature map is a two-dimensional array, so in this example, synapse CB1 is comprised of a 16-layer two-dimensional array (bearing in mind that the neuron layers and arrays referenced herein are logical relationships, not necessarily physical relationships, i.e., the arrays are not necessarily oriented in a physical two-dimensional array). Each of the 16 feature maps is generated by one of sixteen different sets of synaptic weights applied to the filter scan. The C1 feature maps may all relate to different aspects of the same image feature, such as boundary identification. For example, a first map (generated using a first set of weights, shared for all scans used to generate the first map) may identify rounded edges, a second map (generated using a second set of weights different from the first set of weights) may identify rectangular edges, or aspect ratios of certain features, and so on.
Before moving from C1 to S1, an activation function P1 (pooling) is applied that pools values from consecutive, non-overlapping 2x2 regions in each feature map. The purpose of the pooling stage is to average the neighboring locations (or a max function can also be used), for example to reduce the dependence on edge locations and to reduce the data size before moving to the next stage. At S1, there are 16 15x15 feature maps (i.e., sixteen different arrays of 15x15 pixels each). The synapses and associated neurons in CB2 going from S1 to C2 scan the maps in S1 with 4x4 filters, with a filter shift of 1 pixel. At C2, there are 22 12x12 feature maps. Before moving from C2 to S2, an activation function P2 (pooling) is applied that pools values from consecutive, non-overlapping 2x2 regions in each feature map. At S2, there are 22 6x6 feature maps. An activation function is applied at the synapses CB3 going from S2 to C3, where every neuron in C3 connects to every map in S2. At C3, there are 64 neurons. The synapses CB4 going from C3 to the output S3 fully connect S3 to C3. The output at S3 includes 10 neurons, where the highest-output neuron determines the class. This output could, for example, indicate an identification or a classification of the contents of the original image.
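The layer dimensions quoted above follow from simple arithmetic, sketched below: a k x k filter at stride 1 over an n x n map yields n - k + 1 outputs per side, and non-overlapping 2x2 pooling halves each side.

```python
# Feature-map size arithmetic for the example network described above.

def conv_size(n: int, k: int, stride: int = 1) -> int:
    """Output side length of a k x k filter scanned over an n x n map."""
    return (n - k) // stride + 1

def pool_size(n: int, p: int = 2) -> int:
    """Output side length after non-overlapping p x p pooling."""
    return n // p

c1 = conv_size(32, 3)   # 30 -> C1: 16 feature maps of 30x30
s1 = pool_size(c1)      # 15 -> S1: 16 maps of 15x15
c2 = conv_size(s1, 4)   # 12 -> C2: 22 maps of 12x12
s2 = pool_size(c2)      #  6 -> S2: 22 maps of 6x6
```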
Each level of synapse is implemented using an array or portion of an array of non-volatile memory cells. FIG. 7 is a block diagram of a vector-matrix multiplication (VMM) array that includes non-volatile memory cells and is used as synapses between an input layer and a next layer. In particular, the VMM 32 includes a non-volatile memory cell array 33, an erase gate and word line gate decoder 34, a control gate decoder 35, a bit line decoder 36 and a source line decoder 37, which decode the inputs to the memory array 33. In this example, the source line decoder 37 also decodes the output of the memory cell array. Alternatively, the bit line decoder 36 may decode the output of the memory array. The memory array serves two purposes. First, it stores the weights to be used by the VMM. Second, the memory array effectively multiplies the inputs by the weights stored in the memory array and adds them for each output line (source line or bit line) to produce an output that will be the input of the next layer or the input of the final layer. By performing the multiply and add functions, the memory array eliminates the need for separate multiply and add logic circuits and is also power efficient due to in-situ memory computations.
The output of the memory array is supplied to a differential adder (such as a summing op-amp) 38, which sums the outputs of the memory cell array to create a single value for that convolution. The differential adder serves to sum the contributions of positive weights and negative weights given positive inputs. The summed output value is then supplied to activation function circuit 39, which rectifies the output. The activation function may be a sigmoid, tanh, or ReLU function. The rectified output value becomes an element of a feature map of the next layer (e.g., C1 in the description above), and is then applied to the next synapse to produce the next feature map layer or the final layer. Thus, in this example, the memory array constitutes a plurality of synapses (which receive their inputs from the prior layer of neurons or from an input layer such as an image database), and summing op-amp 38 and activation function circuit 39 constitute a plurality of neurons.
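Functionally, the array-plus-summer-plus-activation chain described above computes one neural-network layer. A minimal behavioral sketch (pure Python, not a circuit model; ReLU is chosen here as the example activation):

```python
# Behavioral model of one VMM stage: cells multiply inputs by stored weights,
# each output line sums its column, and the activation circuit rectifies.

def vmm_stage(weights, inputs):
    """weights: rows of per-cell weights; inputs: one value per word line."""
    n_cols = len(weights[0])
    sums = [0.0] * n_cols
    for row, x in zip(weights, inputs):      # each input drives one row
        for j, w in enumerate(row):
            sums[j] += w * x                 # multiply-accumulate per column
    return [max(0.0, s) for s in sums]       # ReLU activation per neuron

out = vmm_stage([[0.5, -1.0],
                 [0.25, 2.0]], [1.0, 2.0])   # -> [1.0, 3.0]
```

In the physical array the inner loops are not executed sequentially: every cell multiplies simultaneously and each output line sums its column by Kirchhoff's current law.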
FIG. 8 is a block diagram of the various stages of the VMM. As shown in FIG. 8, the input is converted from digital to analog by a digital-to-
Vector-matrix multiplication (VMM) array
FIG. 9 depicts a neuron VMM 900, which is particularly suited for use with memory cells of the type shown in FIG. 2 and serves as a synapse and component for neurons between an input layer and a next layer. The VMM 900 includes a memory array 903 of non-volatile memory cells, a reference array 901, and a reference array 902. Reference arrays 901 and 902 convert current inputs flowing into terminals BLR0-3 into voltage inputs on word lines WL0-3. As shown, the reference arrays 901 and 902 run in the column direction; in general, the reference array direction is orthogonal to the input lines. In practice, each reference memory cell is diode-connected through a multiplexer (multiplexer 914, which includes a multiplexer and a cascoding transistor VBLR for biasing the reference bit line) with a current input flowing into it. The reference cells are tuned to target reference levels.
The memory array 903 serves two purposes. First, it stores the weights to be used by VMM 900. Second, memory array 903 effectively multiplies the inputs (current inputs provided on terminals BLR0-3, which reference arrays 901 and 902 convert to input voltages supplied to word lines WL0-3) by the weights stored in the memory array to produce an output that will be either the input to the next layer or the input to the final layer. By performing the multiplication function, the memory array eliminates the need for separate multiplication logic circuits and is also power efficient. Here, voltage inputs are provided on the word lines and the output appears on the bit lines during a read (inference) operation. The current on each bit line performs a summing function over all the currents from the memory cells connected to that bit line.
FIG. 42 depicts operating voltages for VMM 900. The columns in the table indicate the voltages placed on the word line for the selected cell, the word lines for the unselected cells, the bit line for the selected cell, the bit lines for the unselected cells, the source line for the selected cell, and the source line for the unselected cells. The rows indicate read, erase, and program operations.
FIG. 10 depicts a neuron VMM 1000, the VMM 1000 being particularly suited for use in memory cells of the type shown in FIG. 2 and serving as a synapse and component for neurons between an input layer and a next layer. VMM 1000 includes a memory array 1003 of non-volatile memory cells, a reference array 1001, and a reference array 1002. VMM 1000 is similar to VMM 900, except that in VMM 1000, the word lines extend in a vertical direction. There are two reference arrays 1001 (at the top, which provide a reference to convert the input current to a voltage for the even rows) and 1002 (at the bottom, which provide a reference to convert the input current to a voltage for the odd rows). Here, the input is provided on a word line and the output appears on a source line during a read operation. The current placed on the source line performs the summing function of all currents from the memory cells connected to the source line.
FIG. 43 depicts operating voltages for VMM 1000. The columns in the table indicate the voltages placed on the word line for the selected cell, the word lines for the unselected cells, the bit line for the selected cell, the bit lines for the unselected cells, the source line for the selected cell, and the source line for the unselected cells. The rows indicate read, erase, and program operations.
FIG. 11 depicts a
Fig. 44 depicts operating voltages for
FIG. 12 depicts a
Fig. 45 depicts operating voltages for
FIG. 13 depicts a neuron VMM 1300, the VMM 1300 being particularly suited for use with memory cells of the type shown in FIG. 3 and serving as a synapse and component for neurons between an input layer and a next layer. VMM 1300 includes a memory array 1301 of non-volatile memory cells and a reference array 1302 (at the top of the array). Alternatively, another reference array may be provided at the bottom, similar to the reference array of FIG. 10. In other respects, VMM 1300 is similar to
As described herein with respect to neural networks, the flash memory cells are preferably configured to operate in a sub-threshold region.
The memory cells described herein are biased in weak inversion:

Ids = Io * e^((Vg - Vth)/(k*Vt)) = w * Io * e^(Vg/(k*Vt))

where w = e^(-Vth/(k*Vt)).

For an I-to-V logarithmic converter that uses a memory cell to convert an input current into an input voltage:

Vg = k*Vt*log[Ids/(wp*Io)]

For a memory array used as a vector-matrix multiplier VMM, the output current is:

Iout = wa * Io * e^(Vg/(k*Vt)), i.e.,

Iout = (wa/wp) * Iin = W * Iin

W = e^((Vthp - Vtha)/(k*Vt))

where wp is the w of a reference memory cell and wa is the w of a memory cell in the array.
The word line or control gate may be used as an input to the memory cell for an input voltage.
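The relation Iout = W * Iin can be checked numerically against the equations above (a sketch only; the constants k, Vt, Io and the threshold voltages are illustrative values, not from this disclosure):

```python
import math

k, Vt, Io = 1.5, 0.026, 1e-12   # illustrative: slope factor, thermal voltage, scale current

def ids(Vg, Vth):
    # Sub-threshold cell current: Ids = Io * e^((Vg - Vth)/(k*Vt))
    return Io * math.exp((Vg - Vth) / (k * Vt))

def vg_from_iin(Iin, Vthp):
    # Diode-connected reference cell as I-to-V log converter:
    # Vg = k*Vt*log[Iin/(wp*Io)], with wp = e^(-Vthp/(k*Vt))
    wp = math.exp(-Vthp / (k * Vt))
    return k * Vt * math.log(Iin / (wp * Io))

Vthp, Vtha = 1.0, 0.9           # reference cell and array cell thresholds
Iin = 5e-9
Vg = vg_from_iin(Iin, Vthp)     # input current converted to a gate voltage
Iout = ids(Vg, Vtha)            # array cell converts it back to a current
W = math.exp((Vthp - Vtha) / (k * Vt))
# Iout matches W * Iin, confirming Iout = (wa/wp) * Iin
```

The log conversion in the reference cell and the exponential reconversion in the array cell cancel, leaving a current that is scaled purely by the threshold-voltage difference.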
Alternatively, the flash memory cells may be configured to operate in the linear region:

Ids = beta * (Vgs - Vth) * Vds, where beta = u*Cox*W/L

W ∝ (Vgs - Vth)

For an I-to-V linear converter, a memory cell operating in the linear region may be used to linearly convert an input/output current into an input/output voltage.
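In the linear region the cell current scales linearly with Vds, which is what makes a linear I-to-V converter possible. A short sketch (the beta and Vth values are illustrative, not from this disclosure):

```python
def ids_linear(Vgs, Vds, Vth=0.7, beta=2e-4):
    """Triode-region cell current: Ids = beta * (Vgs - Vth) * Vds.
    The effective weight is set by (Vgs - Vth)."""
    return beta * (Vgs - Vth) * Vds

i1 = ids_linear(1.7, 0.1)
i2 = ids_linear(1.7, 0.2)   # doubling Vds doubles Ids for a fixed weight
```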
Other embodiments of ESF vector matrix multipliers are described in U.S. patent application 15/826,345, which is incorporated herein by reference. The source line or bit line may be used as the neuron output (current summation output).
Fig. 14 depicts an implementation of a bit
One challenge with an analog neuromorphic system is that the system must be able to program and verify individually selected cells (which involves a read operation), and it must also be able to perform an ANN read in which all of the cells in the array are selected and read. In other words, the bit line decoder must sometimes select only one bit line and at other times must select all bit lines.
The bit
FIG. 15 depicts an implementation of a bit line decoder circuit 1500. The bit line decoder circuit 1500 is coupled to the VMM array 1501. The VMM array may be based on any of the previously discussed VMM designs (such as
Select transistors 1502 and 1503 are controlled by a pair of complementary control signals (V0 and VB_0) and are coupled to a bit line (BL0). Select transistors 1504 and 1505 are controlled by another pair of complementary control signals (V1 and VB_1) and are coupled to another bit line (BL1). Select transistors 1502 and 1504 are coupled to the same output, such as for enabling programming, and select transistors 1503 and 1505 are coupled to the same output, such as for inhibiting programming. The output line (the program and erase, PE, decode path) of transistors 1502/1503/1504/1505 is coupled, for example, to PE column driver circuitry that controls programming, program verification, and erase (not shown).
The select transistor 1506 is coupled to a bit line (BL0) and to an output and activation function circuit 1507 (e.g., a current adder and an activation function such as tanh, sigmoid, ReLU). The select transistor 1506 is controlled by a control line 1508.
When only BL0 is to be activated, control line 1508 is de-asserted and signal V0 is asserted, so that only BL0 is read. During an ANN read operation, control line 1508 is asserted, select transistor 1506 and the similar transistors on the other bit lines are turned on, and all bit lines are read, such as for all-neuron processing.
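The two selection modes can be summarized in a small behavioral sketch (the function and mode names are illustrative, not from the figure):

```python
def bitline_enables(mode, selected=None, n_bitlines=4):
    """Which bit lines are enabled in each mode: program/verify enables
    only the selected bit line (its V signal asserted); an ANN read
    asserts the shared control line so every read path turns on."""
    if mode == "ann_read":
        return [True] * n_bitlines
    if mode in ("program", "verify"):
        return [i == selected for i in range(n_bitlines)]
    raise ValueError(f"unknown mode: {mode}")
```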
FIG. 16 depicts an embodiment of a bit line decoder circuit 1600. The bit line decoder circuit 1600 is coupled to the VMM array 1601. The VMM array may be based on any of the previously discussed VMM designs (such as
Select transistor 1601 is coupled to a bit line (BL0) and to an output and activation function circuit 1603. Select transistor 1602 is coupled to the same bit line (BL0) and to a common output (the PE decode path).
When only BL0 is activated, select transistor 1602 is activated and BL0 is attached to the common output. During an ANN read operation, the select transistor 1601 and similar transistors are turned on and all bit lines are read.
For the decoding in FIGS. 14, 15, and 16, a negative bias can be applied to unselected transistors to reduce transistor leakage so that it does not affect memory cell performance. Alternatively, a negative bias may be applied to the PE decoding path when the array is in ANN operation. The negative bias may be from -0.1V to -0.5V or more negative.
Fig. 17 depicts a
As shown, a reference cell low
The low
Fig. 18 depicts a VMM system 1800. The VMM system 1800 is similar to the
Fig. 19 depicts a
The reference array low
Fig. 20 depicts a word line driver 2000. The word line driver 2000 selects a word line (such as the exemplary word lines WL0, WL1, WL2, and WL3 shown here) and provides a bias voltage to the word line. Each word line is attached to a select transistor, such as select iso (isolation) transistor 2002, controlled by a control line 2001. The iso transistor 2002 isolates high voltages (e.g., 8V-12V), such as those used during erase, from the word line decode transistors, which may be implemented with IO transistors (e.g., 1.8V, 3.3V). Here, during any operation, the control line 2001 is activated and all of the select transistors similar to select iso transistor 2002 are turned on. An exemplary bias transistor 2003 (part of the word line decode circuit) selectively couples the word line to a first bias voltage (such as 3V), and an exemplary bias transistor 2004 (part of the word line decode circuit) selectively couples the word line to a second bias voltage, which is lower than the first bias voltage and may be ground, a bias in between, or a negative voltage bias to reduce leakage from unused memory rows. During an ANN read operation, all used word lines are selected and tied to the first bias voltage, while all unused word lines are tied to the second bias voltage. During other operations, such as a programming operation, only one word line is selected and the other word lines are tied to the second bias voltage, which may be a negative bias (e.g., -0.3V to -0.5V or more negative) to reduce array leakage.
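The biasing scheme can be sketched behaviorally (the 3V and -0.3V values follow the examples in the text; the function itself is illustrative):

```python
def wordline_biases(n_rows, mode, used=None, selected=None,
                    v_first=3.0, v_second=-0.3):
    """ANN read: all used word lines get the first bias; unused rows get
    the second (possibly negative) bias. Program: only the selected row
    gets the first bias; every other row gets the second bias."""
    if mode == "ann_read":
        return [v_first if r in used else v_second for r in range(n_rows)]
    if mode == "program":
        return [v_first if r == selected else v_second for r in range(n_rows)]
    raise ValueError(f"unknown mode: {mode}")
```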
Fig. 21 depicts a word line driver 2100. The word line driver 2100 is similar to the word line driver 2000 except that the top transistors (such as the bias transistor 2103) may be separately coupled to a bias voltage and all such transistors are not tied together as in the word line driver 2000. This allows all word lines to have different independent voltages in parallel at the same time.
Fig. 22 depicts a word line driver 2200. The word line driver 2200 is similar to the word line driver 2100 except that the bias transistors 2103 and 2104 are coupled to the decoder circuit 2201 and the inverter 2202. Thus, fig. 22 depicts a decode sub-circuit 2203 within the word line driver 2200.
FIG. 23 depicts a word line driver 2300. The word line driver 2300 is similar to the word line driver 2100 except that bias transistors 2103 and 2104 are coupled to the output of a stage 2302 of a shift register 2301. The shift register 2301 allows each row to be controlled independently by serially shifting in data (clocking the register serially), so that one or more rows can be enabled simultaneously depending on the shifted-in data pattern.
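The serial row-control idea can be sketched as follows (class and method names are illustrative, not from the figure):

```python
class RowShiftRegister:
    """Behavioral sketch of row control via a shift register: a data
    pattern clocked in serially determines which rows are enabled, so
    any subset of rows can be on at once."""
    def __init__(self, n_rows):
        self.stages = [0] * n_rows
    def clock_in(self, bit):
        # each clock shifts every stage down one row; the new bit enters stage 0
        self.stages = [bit] + self.stages[:-1]
    def enabled_rows(self):
        return [i for i, s in enumerate(self.stages) if s]

sr = RowShiftRegister(4)
for b in (1, 0, 1):       # shift in the pattern 1, 0, 1
    sr.clock_in(b)
```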
Fig. 24 depicts a word line driver 2400. The word line driver 2400 is similar to the word line driver 2000 except that each select transistor is further coupled to a capacitor (such as capacitor 2403). The
Fig. 25 depicts a
FIG. 26 depicts a word line driver. The
Fig. 27 depicts a high voltage source line decoder circuit 2700. The high voltage source line decoder circuit includes transistors 2701, 2702, and 2703 configured as shown. Transistor 2703 is used to deselect the source line to a low voltage. Transistor 2702 is used to drive a high voltage onto the source line of the array, and transistor 2701 is used to monitor the voltage on the source line. Transistors 2702 and 2701 and a driver circuit (such as an operational amplifier) are configured in a closed-loop (force/sense) manner to maintain the source line voltage across PVT (process, voltage, temperature) variations and varying current load conditions. SLE (the driven source line node) and SLB (the monitored source line node) may be located at the same end of the source line. Alternatively, SLE may be located at one end of the source line and SLB may be located at the other end.
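The force/sense idea (drive one node, monitor another, and correct the drive so the monitored voltage holds its target despite load-dependent drops) can be illustrated with a toy feedback loop; the numbers and the simple resistive-drop model are illustrative only:

```python
def regulate_source_line(target, load_drop, gain=0.5, steps=200):
    """Toy closed-loop force/sense model: the driver forces a voltage on
    one node (SLE), the sensed line voltage is lower by a load-dependent
    drop, and the error steers the drive until the sensed voltage
    converges to the target."""
    drive = 0.0
    for _ in range(steps):
        sensed = drive - load_drop          # what the monitor node sees
        drive += gain * (target - sensed)   # correct toward the target
    return drive - load_drop                # final sensed line voltage

v = regulate_source_line(target=4.5, load_drop=0.3)
```

Because the loop regulates the sensed node rather than the driven node, the drop along the source line is compensated automatically.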
FIG. 28 depicts VMM high voltage decoding circuitry comprising word
The word
The source
The high
Fig. 29 depicts VMM high voltage decoding circuitry including erase
The erase
The source
The high
FIG. 30 depicts a word line decoder 3000 for exemplary word lines WL0, WL1, WL2, and
Fig. 31 depicts a
Fig. 32 depicts a control gate decoder 3200 for exemplary control gate lines CG0, CG1, CG2, and
Fig. 33 depicts a
Fig. 34 depicts a current-to-voltage circuit 3400. The circuit includes a diode-connected reference cell circuit 3450 and a sample-and-hold circuit 3460. The circuit 3450 includes an input current source 3401, an NMOS transistor 3402, a cascode bias transistor 3403, and a reference memory cell 3404. The sample-and-hold circuit comprises a switch 3405 and an S/H capacitor 3406. The memory cell 3404 is biased in a diode-connected configuration, with the bias on its bit line used to convert the input current into a voltage, such as for supplying a word line.
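The sample-and-hold behavior can be modeled simply (an idealized sketch with no droop or charge injection; class and method names are illustrative):

```python
class SampleAndHold:
    """Idealized S/H stage: while the switch is closed the capacitor
    tracks the input voltage; when it opens, the last sampled value
    is held at the output."""
    def __init__(self):
        self.held = None
        self.closed = False
    def set_switch(self, closed):
        self.closed = closed
    def step(self, v_in):
        if self.closed:
            self.held = v_in   # capacitor tracks the input
        return self.held       # otherwise the held value persists

sh = SampleAndHold()
sh.set_switch(True)
v1 = sh.step(1.2)   # sampling: output follows the input
sh.set_switch(False)
v2 = sh.step(0.5)   # holding: input changes are ignored
```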
Fig. 35 depicts a current-to-
Fig. 36 depicts a current-to-voltage circuit 3600, which is designed the same as current-to-voltage circuit 3400 but with the control gate in a diode-connected configuration. Current-to-voltage circuit 3600 includes a diode-connected reference cell circuit 3650 and a sample-and-hold circuit 3660.
Fig. 37 shows a current-to-
Fig. 38 depicts a current-to-voltage circuit 3800, which is similar to the circuit of FIG. 35 but with the control gate in a diode-connected configuration. The current-to-voltage circuit 3800 includes a diode-connected
Fig. 39 depicts a current-to-voltage circuit 3900, which is similar to the circuit of FIG. 34 but applied to the memory cell of FIG. 2. The current-to-voltage circuit 3900 includes a diode-connected reference cell circuit 3950 and a sample-and-hold circuit 3960.
Fig. 40 depicts a current-to-voltage circuit 4000, which current-to-voltage circuit 4000 is similar to fig. 37 applied to the memory cell in fig. 2, with a buffer 4090 placed between a reference circuit 4050 and an S/H circuit 4060.
Fig. 41 depicts a current-to-voltage circuit 4100, which is similar to the circuit of FIG. 38 but applied to the memory cell of FIG. 2. The current-to-voltage circuit 4100 includes a diode-connected reference cell circuit 4150, a sample-and-hold circuit 4170, and an amplifier stage 4162.
It should be noted that, as used herein, the terms "over" and "on" both inclusively include "directly on" (with no intermediate materials, elements, or spaces disposed therebetween) and "indirectly on" (with intermediate materials, elements, or spaces disposed therebetween). Similarly, the term "adjacent" includes "directly adjacent" (no intermediate materials, elements, or spaces disposed therebetween) and "indirectly adjacent" (intermediate materials, elements, or spaces disposed therebetween), "mounted to" includes "directly mounted to" (no intermediate materials, elements, or spaces disposed therebetween) and "indirectly mounted to" (intermediate materials, elements, or spaces disposed therebetween), and "electrically coupled to" includes "directly electrically coupled to" (no intermediate materials or elements therebetween that electrically connect the elements together) and "indirectly electrically coupled to" (intermediate materials or elements therebetween that electrically connect the elements together). For example, forming an element "over a substrate" can include forming the element directly on the substrate with no intervening materials/elements therebetween, as well as forming the element indirectly on the substrate with one or more intervening materials/elements therebetween.