Integrated circuit

Document No.: 600266    Publication date: 2021-05-04

Reading note: This technology, "Integrated circuit" (集成电路), was designed and created by 吕函庭 on 2019-11-04. Abstract: The invention discloses an integrated circuit, which comprises a memory array, a plurality of word lines, a plurality of bit lines, and a page buffer. The memory array includes a plurality of memory cells, each configured to be written with a weight. The plurality of word lines are respectively connected to a row of memory cells among the plurality of memory cells. The plurality of bit lines are respectively connected to a column of memory cells, of the plurality of memory cells, that are connected in series with one another. The plurality of bit lines in a block of the memory array, or the plurality of word lines in a plurality of blocks of the memory array, are configured to receive a plurality of input voltages, and the memory cells receiving the input voltages are configured to multiply the written weights by the received input voltages. The page buffer is coupled to the memory array and configured to sense a plurality of products of the weights and the input voltages.

1. An integrated circuit, comprising:

a memory array including a plurality of memory cells respectively configured to be written with weights;

a plurality of word lines and a plurality of bit lines, wherein the plurality of word lines respectively connect a row of memory cells of the plurality of memory cells, the plurality of bit lines respectively connect a column of memory cells of the plurality of memory cells connected in series with each other, multiple ones of the plurality of bit lines in a block of the memory array or multiple ones of the plurality of word lines in a plurality of blocks of the memory array are configured to receive a plurality of input voltages, and the memory cells receiving the plurality of input voltages are configured to multiply the written weights by the received input voltages; and

a page buffer coupled to the memory array and configured to sense a plurality of products of the plurality of weights and the plurality of input voltages.

2. The integrated circuit of claim 1, wherein the multiple ones of the plurality of bit lines in the block receive the plurality of input voltages and one of the plurality of word lines in the block is configured to receive a read voltage while the other ones of the plurality of word lines in the block are configured to receive a pass voltage.

3. The integrated circuit of claim 2, wherein memory cells corresponding to the multiple ones of the plurality of bit lines and the one of the plurality of word lines are configured to multiply the stored weights by the received input voltages and to generate the plurality of products.

4. The integrated circuit of claim 3, further comprising a counter, wherein the counter is coupled to the page buffer and configured to sum the plurality of products.

5. The integrated circuit of claim 2, wherein at least two of the plurality of input voltages are distinct from each other.

6. The integrated circuit of claim 2, wherein the plurality of input voltages are the same as each other.

7. The integrated circuit of claim 6, wherein the page buffer comprises a first cache and a second cache, the first cache configured to receive a plurality of first logic signals converted from the products of the plurality of weights and the plurality of input voltages and to be pre-written with a plurality of second logic signals converted from a plurality of additional input voltages, and the second cache configured to multiply the plurality of first logic signals and the plurality of second logic signals and to accumulate the plurality of products of the plurality of first logic signals and the plurality of second logic signals.

8. The integrated circuit of claim 7, wherein at least two of the plurality of additional input voltages are distinct from each other and converted to different logic signals.

9. The integrated circuit of claim 1, wherein the multiple ones of the plurality of word lines in the plurality of blocks are configured to receive the plurality of input voltages, a word line of one of the plurality of blocks is electrically isolated from a word line of another one of the plurality of blocks, the plurality of bit lines are respectively shared by the plurality of blocks of the memory array, and one of the plurality of bit lines is configured to receive a read voltage while the others of the plurality of bit lines are configured to receive a pass voltage.

10. The integrated circuit of claim 9, wherein memory cells corresponding to the multiple ones of the plurality of word lines and the one of the plurality of bit lines are configured to multiply the stored weights by the received input voltages and to generate the plurality of products.

11. The integrated circuit of claim 10, wherein the plurality of products are summed via the one of the plurality of bit lines.

12. The integrated circuit of claim 10, wherein the memory cells corresponding to the multiple ones of the plurality of word lines and the one of the plurality of bit lines have a threshold voltage greater than or equal to 0 V.

13. The integrated circuit of claim 1, wherein the memory array is a NAND flash memory array and the plurality of memory cells are a plurality of flash memory cells.

14. The integrated circuit of claim 1, comprising a plurality of the page buffers, wherein a block of the memory array has a plurality of sub-blocks, each coupled to one of the plurality of page buffers.

Technical Field

The present invention relates to an integrated circuit and an operation method thereof, and more particularly, to a memory circuit.

Background

In a computer designed with the von Neumann architecture, the data storage unit and the data processing unit are separated from each other. Data must travel back and forth between the data storage unit and the data processing unit via input/output (I/O) ports and a bus, which is time- and energy-consuming. In addition, for the processing of huge amounts of data, this round trip of data between the units creates a bottleneck in processing performance. In recent years, with the rise of artificial intelligence (AI) technology, the amount of data that a computer is required to process has greatly increased, which makes the above-mentioned performance bottleneck increasingly serious.

Disclosure of Invention

The invention provides an integrated circuit which can be operated in a memory mode and an operation mode.

The integrated circuit of the present invention includes: a memory array including a plurality of memory cells respectively configured to be written with weights; a plurality of word lines and a plurality of bit lines, wherein the plurality of word lines respectively connect a row of memory cells of the plurality of memory cells, the plurality of bit lines respectively connect a column of memory cells of the plurality of memory cells connected in series with each other, multiple ones of the plurality of bit lines in a block of the memory array or multiple ones of the plurality of word lines in a plurality of blocks of the memory array are configured to receive a plurality of input voltages, and the memory cells receiving the plurality of input voltages are configured to multiply the written weights by the received input voltages; and a page buffer coupled to the memory array and configured to sense a plurality of products of the weights and the input voltages.

In some embodiments, the ones of the plurality of bit lines in the block receive the plurality of input voltages, and one of the plurality of word lines in the block is configured to receive a read voltage while others of the plurality of word lines in the block are configured to receive a pass voltage.

In some embodiments, memory cells corresponding to the multiple bit lines and the one word line are configured to multiply the stored weights by the received input voltages and to generate the plurality of products.

In some embodiments, the integrated circuit further comprises a counter, wherein the counter is coupled to the page buffer and configured to sum the plurality of products.

In some embodiments, at least two of the plurality of input voltages are distinct from each other.

In some embodiments, the plurality of input voltages are the same as each other.

In some embodiments, the page buffer includes a first cache and a second cache. The first cache is configured to receive a plurality of first logic signals converted from the products of the weights and the input voltages, and to be pre-written with a plurality of second logic signals converted from additional input voltages. The second cache is configured to multiply the plurality of first logic signals by the plurality of second logic signals and to accumulate the plurality of products of the first logic signals and the second logic signals.

In some embodiments, at least two of the plurality of additional input voltages are distinct from each other and converted to different logic signals.

In some embodiments, the plurality of word lines in the plurality of blocks are configured to receive the plurality of input voltages, the word line of one of the plurality of blocks is electrically isolated from the word line of another of the plurality of blocks, the plurality of bit lines are respectively shared by the plurality of blocks of the memory array, and one of the plurality of bit lines is configured to receive a read voltage while the others of the plurality of bit lines are configured to receive a pass voltage.

In some embodiments, memory cells corresponding to the plurality of word lines and the one of the plurality of bit lines are configured to multiply the stored weights by the received input voltages and to generate the plurality of products.

In some embodiments, the plurality of products are summed via the one of the plurality of bit lines.

In some embodiments, memory cells corresponding to the ones of the plurality of word lines and the one of the plurality of bit lines have a threshold voltage greater than or equal to 0 V.

In some embodiments, the memory array is a NAND flash memory array and the plurality of memory cells are a plurality of flash memory cells.

In some embodiments, the integrated circuit includes a plurality of page buffers, and a block of the memory array has a plurality of sub-blocks respectively coupled to one of the plurality of page buffers.

The operation method of the integrated circuit of the invention includes the following steps: performing at least one programming operation to write the weights into the memory cells respectively; applying a plurality of input voltages to multiple ones of the plurality of bit lines in a block of the memory array or multiple ones of the plurality of word lines in a plurality of blocks of the memory array, wherein the memory cells receiving the plurality of input voltages are configured to multiply the stored weights by the received input voltages to obtain a plurality of products; and summing the plurality of products via the page buffer or via one of the plurality of bit lines.

In some embodiments, the step of applying the plurality of input voltages and the step of summing the plurality of products form a loop, and the method of operation of the integrated circuit includes performing the loop a plurality of times.

In some embodiments, the step of applying the plurality of input voltages of one of the plurality of cycles precedes the step of applying the plurality of input voltages of a subsequent one of the plurality of cycles.

In some embodiments, the step of applying the plurality of input voltages for one of the plurality of cycles overlaps in time with the step of summing the plurality of products for a previous one of the plurality of cycles.

In some embodiments, the plurality of input voltages are applied to the plurality of bit lines in the block, and the page buffer is configured to sum the plurality of products.

In some embodiments, the plurality of input voltages are applied to the plurality of word lines in the plurality of blocks, and the plurality of products are summed via the one of the plurality of bit lines.

Based on the above, the integrated circuit of the present invention can operate in a memory mode and an operational mode. The integrated circuit includes a memory array, such as a NAND flash memory array. The integrated circuit can perform the sum-of-products function and can be used for artificial intelligence applications, neuromorphic computing systems, and the learning procedures of machine learning systems. In the memory mode, the weights are written into the memory cells of the memory array. In the operational mode, the stored weights are multiplied by the input voltages delivered to the memory cells via the bit lines or word lines, and the products of the weights and the input voltages are accumulated. In contrast to the von Neumann architecture, which performs operations in a data processing unit (e.g., a central processing unit) separate from a data storage unit (e.g., a memory integrated circuit), the integrated circuit of the present invention can operate in both a memory mode and an operational mode. Thus, data no longer needs to travel back and forth between the data processing unit and the data storage unit, and the instruction rate can be significantly increased. In particular, the page buffer used to write the weights into the memory cells and to receive the products of the weights and the input voltages is coupled to the memory array through a large number of bit lines with a high degree of parallelism, so the page buffer has a relatively high bandwidth. Thus, the integrated circuit can be used for large data operations, and may not suffer from the performance bottleneck of the von Neumann architecture.

In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.

Drawings

FIG. 1A is a schematic diagram of an integrated circuit in accordance with some embodiments of the invention.

FIG. 1B is a flow chart of a method of operating the integrated circuit exemplarily shown in FIG. 1A.

FIG. 2 is a schematic diagram of an integrated circuit in accordance with some embodiments of the invention.

FIG. 3 is a schematic diagram of an integrated circuit in accordance with some embodiments of the invention.

FIG. 4 is a schematic diagram of an integrated circuit in accordance with some embodiments of the invention.

Description of Symbols

10, 10a, 10b, 20: Integrated circuit

100, 100', 200: Memory array

BL: Bit line

BK1, BK2: Block

BS: Inter-sub-block bus system

CA1: First cache

CA2: Second cache

CT: Counter

GSL: Ground selection line

GST: Ground selection transistor

MC: Memory cell

PB, PB': Page buffer

S100, S102, S102_1, S102_2, S102_n, S104, S104_1, S104_2, S104_n: Steps

SL: Source line

SSL: String selection line

SST: String selection transistor

TL: Sub-block

W_i, W_1, W_2: Weights

WL, WL1, WL2, WL3, WLn: Word lines

X, X_i, X_1, X_2: Input voltages

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

Fig. 1A is a schematic diagram of an integrated circuit 10 in accordance with some embodiments of the present invention. FIG. 1B is a flowchart illustrating a method of operating the integrated circuit 10 illustrated in FIG. 1A.

Referring to FIG. 1A, the integrated circuit 10 may be a memory circuit, such as a non-volatile memory circuit. In some embodiments, the integrated circuit 10 is a NAND flash memory circuit, and may be used in applications such as neuromorphic computing systems, machine learning systems, and artificial intelligence (AI) that involve performing multiply-and-accumulate (MAC) operations. The MAC operation can be represented by a sum-of-products function, as shown in equation (1):

Sum of products = Σ_i X_i × W_i        (1)

In equation (1), each accumulated product term is the product of an input value X_i and a weight W_i. The weights W_i of the accumulated product terms may differ from one another. The weights may be specified as a set of constants, so that the sum of the product terms varies as the input values change. In addition, when an algorithm executes a learning procedure, the weights may differ from one learning procedure to another, so that the system learns from the sum of the product terms. For example, the weights may be obtained via remote training performed in a computer, and these weights are then downloaded to the integrated circuit 10. Such weights may be downloaded and updated within the integrated circuit 10 after the pattern of the remote training is changed.
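
As an illustration of equation (1), the following minimal Python sketch computes the sum of products for a handful of hypothetical input values and weights; the numeric values are examples only and are not taken from this description.

```python
# Minimal sketch of the sum-of-products (MAC) function in equation (1).
# The example values for X_i and W_i below are illustrative assumptions.
def sum_of_products(inputs, weights):
    """Return the sum over i of X_i * W_i."""
    assert len(inputs) == len(weights)
    return sum(x * w for x, w in zip(inputs, weights))

X = [1, 0, 1]           # input values X_i
W = [0.5, 0.8, 0.3]     # weights W_i (e.g., obtained from remote training)
print(sum_of_products(X, W))  # 0.5*1 + 0.8*0 + 0.3*1 = 0.8
```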

Integrated circuit 10 includes a memory array 100. The memory array 100 has a plurality of memory cells MC. In some embodiments, the memory array 100 is a three-dimensional memory array. As shown in FIG. 1A, the memory cells MC of each block are arranged in a plurality of columns (also called strings) and a plurality of rows (also called pages). In embodiments where the integrated circuit 10 is a NAND flash memory circuit, the memory cells MC may be floating gate transistors, semiconductor-oxide-nitride-oxide-semiconductor (SONOS) transistors, or the like. The memory cells MC of each column (or string) are connected in series between a bit line BL and a source line SL. In some embodiments, a plurality of columns (or strings) of memory cells MC share a source line SL. On the other hand, one of a plurality of word lines WL (e.g., word lines WL1, WL2, WL3, ..., WLn as shown in FIG. 1A) connects each row (or page) of memory cells MC. In some embodiments, the memory array 100 further includes string selection transistors SST and ground selection transistors GST. In such embodiments, each column (or string) of memory cells MC is connected between a string selection transistor SST and a ground selection transistor GST. The plurality of string selection transistors SST may be respectively connected to one of the plurality of bit lines BL, and the plurality of ground selection transistors GST may be connected to the source line SL. Further, a string selection line SSL connects the string selection transistors SST of one row, and a ground selection line GSL connects the ground selection transistors GST of one row.

The integrated circuit 10 is operable in a memory mode and an operational mode. In the memory mode, a program operation, an erase operation, and a read operation may be performed to write data into the memory cells MC or to read data from the memory cells MC. Peripheral circuits coupled to the memory array 100 may support the program, erase, and read operations described above. For example, the peripheral circuits may include a decoder (not shown), a page buffer PB, and the like. During a program operation, a word line WL and some bit lines BL are selected, and data is written into the memory cells MC corresponding to the selected word line WL and bit lines BL through the page buffer PB and the selected word line WL. On the other hand, during a read operation, data is read out from the memory cells MC corresponding to the selected word line WL and bit lines BL via the page buffer PB and the selected bit lines BL. In some embodiments, each program operation writes data to a page of memory cells MC, while each read operation reads data from a page of memory cells MC. In embodiments where the integrated circuit 10 is configured to perform the sum-of-products function (as shown in equation (1)), the weights W_i (e.g., including the weights W_1 and W_2 shown in FIG. 1A) are written into the plurality of memory cells MC by the above-described programming operations. The weights W_i written into the plurality of memory cells MC determine the conductance or transconductance of the memory cells MC. In some embodiments, the memory cells MC are programmed in a binary mode, and the weights W_i are stored as binary levels. In alternative embodiments, the weights W_i are stored as multi-bit levels or analog codes. For example, the multi-bit levels may be N levels, where N is a positive integer greater than 2.
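
The following sketch illustrates, under assumed value ranges, how a trained weight W_i might be mapped to a binary level or to one of N levels before being programmed into a memory cell MC; the threshold and level counts are illustrative assumptions, not values specified by this description.

```python
# Illustrative quantization of a weight W_i to a programmable cell level.
# The threshold 0.5 and the [0, 1] weight range are assumptions for this sketch.
def quantize_binary(weight, threshold=0.5):
    """Binary mode: store the weight as one of two levels (0 or 1)."""
    return 1 if weight >= threshold else 0

def quantize_n_levels(weight, n_levels=4, w_min=0.0, w_max=1.0):
    """Multi-bit mode: store the weight as one of N levels (N > 2)."""
    step = (w_max - w_min) / (n_levels - 1)
    level = round((weight - w_min) / step)
    return max(0, min(n_levels - 1, level))

print(quantize_binary(0.7))        # 1
print(quantize_n_levels(0.7, 4))   # 2  (0.7 / (1/3) = 2.1 -> level 2)
```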

In the operational mode of the integrated circuit 10, the weights W_i stored in the memory cells MC are multiplied by the input voltages X_i, and the products of the weights W_i and the corresponding input voltages X_i are accumulated. In some embodiments, a plurality of bit lines BL of a block of the memory array 100 are configured to receive the input voltages X_i (including, for example, the input voltages X_1 and X_2 shown in FIG. 1A). In some embodiments, the plurality of input voltages X_i received by the bit lines BL have a specific distribution (pattern), and these input voltages X_i are different from one another. For example, the plurality of input voltages X_i are applied in a dual-bit mode, in which the input voltage X_1 is at a high logic level "1" and the input voltage X_2 is at a low logic level "0". Alternatively, the plurality of input voltages X_i can be applied as multi-bit levels (e.g., N levels, where N is a positive integer greater than 2) or as analog codes. One of the word lines WL of the block of the memory array 100 is selected to receive a read voltage, while the other word lines WL of the block receive pass voltages. In some embodiments, the page of memory cells MC connected to the selected word line WL receiving the read voltage is turned on. In addition, when the bit lines BL input the voltages X_i to the turned-on memory cells MC, the weights W_i stored in the turned-on memory cells MC are multiplied by the corresponding input voltages X_i. In embodiments where the input voltages X_i are transferred to the memory cells MC via the bit lines BL, the weight W_i stored in a memory cell MC can be regarded as the conductance of the memory cell, and the product of the weight W_i and the input voltage X_i is output in the form of a current. Since the multiplication of the weights W_i and the input voltages X_i occurs within the memory array 100, it can be regarded as in-memory computing.
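
A simplified behavioral model of this in-memory multiplication is sketched below: each turned-on cell is treated as a conductance set by its written weight, and its output current is that conductance times the input voltage on its bit line. The model idealizes the cell and ignores pass-voltage and string effects; the numbers are illustrative assumptions.

```python
# Idealized model of in-memory multiplication on the selected page:
# cell output current = (conductance set by weight W_i) * (bit-line voltage X_i).
def cell_currents(conductances, bitline_voltages):
    """Per-bit-line product currents for the page on the selected word line."""
    return [g * v for g, v in zip(conductances, bitline_voltages)]

G = [0.3, 0.9, 0.5, 0.0]   # conductances programmed from the weights W_i (assumed)
X = [1, 0, 1, 1]           # input voltages X_i applied via the bit lines (dual-bit mode)
print(cell_currents(G, X))  # [0.3, 0.0, 0.5, 0.0]
```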

In some embodiments, the products of the weights W_i and the corresponding input voltages X_i are output to a page buffer PB coupled to the memory array 100 via the bit lines BL. A sense amplifier (not shown) in the page buffer PB may be configured to sense these output current signals. In addition, a counter CT coupled to the page buffer PB may be configured to sum the output current signals (i.e., the products of the weights W_i and the corresponding input voltages X_i). Although the page buffer PB and the counter CT are illustrated as separate components in FIG. 1A, the page buffer PB and the counter CT may alternatively be integrated into a single component. The page buffer PB and the counter CT are disposed in an area surrounding the memory array 100 and adjacent to the memory array 100. Therefore, the addition operation performed through the page buffer PB and the counter CT can be regarded as near-memory computing.
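
The near-memory summation can be sketched for the binary case as follows: the page buffer PB senses each bit-line current as a logic 0 or 1, and the counter CT counts the 1s, which equals the accumulated sum. The sensing threshold is an assumed value; in the circuit, sensing is performed by analog sense amplifiers.

```python
# Sketch of near-memory accumulation: sense each bit-line current as a logic
# value (page buffer PB), then count the 1s (counter CT). Threshold is assumed.
def sense_binary(currents, threshold=0.1):
    """Page buffer PB: convert each bit-line current into a logic 0 or 1."""
    return [1 if i > threshold else 0 for i in currents]

def count_ones(logic_values):
    """Counter CT: sum the sensed products."""
    return sum(logic_values)

sensed = sense_binary([0.3, 0.0, 0.5, 0.0])
print(sensed, count_ones(sensed))  # [1, 0, 1, 0] 2
```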

Thus far, the sum-of-products function (as shown in equation (1)) is performed by combining the in-memory operation (multiplying the weights W_i by the corresponding input voltages X_i) and the near-memory operation (summing the products of the weights W_i and the corresponding input voltages X_i). In contrast to the von Neumann architecture, which performs operations in a data processing unit (e.g., a central processing unit) separate from a data storage unit (e.g., a memory integrated circuit), the integrated circuit 10 of the present invention can operate in both a memory mode and an operational mode. Thus, data no longer needs to travel back and forth between the data processing unit and the data storage unit, and the instruction rate can be significantly increased. In particular, the page buffer PB used to write the weights W_i into the memory cells MC and to receive the products of the weights W_i and the input voltages X_i is coupled to the memory array 100 through a large number of bit lines BL with a high degree of parallelism, so the page buffer PB has a relatively high bandwidth. Thus, the integrated circuit 10 may be used for large data operations and may not suffer from the performance bottleneck of the von Neumann architecture. In some embodiments, the page buffer PB may have a bandwidth greater than or equal to 32 kB.

Referring to FIG. 1A and FIG. 1B, the method of operating the integrated circuit 10 may include the following steps. In step S100, the weights W_i are written into the plurality of memory cells MC by performing the above-mentioned programming operation a plurality of times.

In step S102, a plurality of input voltages X_i are applied to a page of memory cells MC connected to one word line WL (e.g., word line WL1). Thereby, the weights W_i and the input voltages X_i are multiplied in the memory cells MC, and the products of the weights W_i and the input voltages X_i are output via the bit lines BL as current signals. Furthermore, the page buffer PB is configured to sense these output current signals. In step S104, the output current signals are summed by a component such as the counter CT. Steps S102 and S104 may constitute a single cycle of performing the sum-of-products function for a single page of memory cells MC. Subsequently, other cycles are performed to perform the sum-of-products function for the memory cells MC of other pages. For example, the other cycles include the cycle of steps S102_1 and S104_1, the cycle of steps S102_2 and S104_2, the cycle of steps S102_n and S104_n, and so on. In two consecutive cycles performing the sum-of-products function for memory cells MC of adjacent pages, the step of applying the input voltages X_i to the memory cells MC of the later page (e.g., step S102_1) comes after the corresponding step of the earlier cycle (e.g., step S102), and may at least partially overlap the step of summing the current signals in the earlier cycle (e.g., step S104). Based on this pipelined timing design, some steps overlap one another in time, so the instruction rate of the integrated circuit 10 can be further increased.
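
The pipelined timing can be sketched as below, where the durations of steps S102 and S104 are assumed to be equal for illustration; the point is only that S102 of the next cycle starts while S104 of the previous cycle is still in progress.

```python
# Illustrative pipeline schedule for the loop of steps S102 (apply voltages)
# and S104 (sum products). Durations are arbitrary assumed time units.
def pipelined_schedule(n_cycles, t_apply=1.0, t_sum=1.0):
    """Return ((S102 start, end), (S104 start, end)) for each cycle."""
    schedule = []
    apply_start = 0.0
    for _ in range(n_cycles):
        apply_end = apply_start + t_apply
        sum_end = apply_end + t_sum
        schedule.append(((apply_start, apply_end), (apply_end, sum_end)))
        # The next cycle's S102 begins as soon as this S102 ends,
        # overlapping this cycle's S104.
        apply_start = apply_end
    return schedule

for k, (s102, s104) in enumerate(pipelined_schedule(3)):
    print(f"cycle {k}: S102 {s102}, S104 {s104}")
```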

Fig. 2 is a schematic diagram of an integrated circuit 10a in accordance with some embodiments of the present invention. The integrated circuit 10a and the operation method thereof described with reference to fig. 2 are similar to the integrated circuit 10 and the operation method thereof described with reference to fig. 1A and 1B. Only the differences between the two will be described below, and the same or similar parts will not be described again.

Referring to FIG. 2, in some embodiments, a plurality of bit lines BL of a block of the memory array 100 receive the same input voltage X in the operational mode. In other words, in these embodiments, the input voltages X received by the bit lines BL do not have a specific distribution (pattern). For example, in the dual-bit mode, all of the bit lines BL may be configured to receive the input voltage X at a high logic level "1". In this way, the weights W_i stored in the memory cells MC are multiplied by the same input voltage X, and the resulting products, in the form of current signals, are converted into logic signals (e.g., 1 and 0) by a sense amplifier (not shown) and input to the page buffer PB'. In some embodiments, the page buffer PB' includes a first cache CA1 and a second cache CA2. The first cache CA1 is configured to receive and temporarily store these logic signals (hereinafter referred to as the first logic signals) and is pre-written with other logic signals (hereinafter referred to as the second logic signals) converted from a plurality of input voltages X_i. These input voltages X_i have a specific distribution (pattern). In other words, the input voltages X_i are different from one another. For example, in the dual-bit mode, some of the input voltages X_i may be converted into a high logic level signal "1", while others of the input voltages X_i may be converted into a low logic level signal "0". A counter (not shown) within the second cache CA2 is then configured to perform a multiply-and-accumulate operation on the first logic signals and the second logic signals. In other words, the second cache CA2 is configured to multiply the first logic signals by the second logic signals and to sum the resulting products. At this point, the sum-of-products function has been performed by the multiply and add operations, both of which can be regarded as near-memory computing.
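
A behavioral sketch of the two-cache flow (not the circuit itself) is shown below with assumed binary logic values: CA1 holds the sensed first logic signals and the pre-written second logic signals, and CA2 multiplies them pairwise and accumulates the products.

```python
# Sketch of the page buffer PB' with two caches: CA1 holds the first logic
# signals (sensed cell outputs) and the pre-written second logic signals
# (converted from the additional input voltages X_i); CA2 multiplies them
# element-wise and accumulates the products. Values are illustrative.
def ca2_multiply_accumulate(first_logic, second_logic):
    """CA2: sum of element-wise products of the two logic vectors."""
    assert len(first_logic) == len(second_logic)
    return sum(a * b for a, b in zip(first_logic, second_logic))

first_logic = [1, 0, 1, 1]    # sensed from the cell currents (weights * common X)
second_logic = [1, 1, 0, 1]   # pre-written pattern from the inputs X_i
print(ca2_multiply_accumulate(first_logic, second_logic))  # 1*1 + 0*1 + 1*0 + 1*1 = 2
```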

Fig. 3 is a schematic diagram of an integrated circuit 10b in accordance with some embodiments of the present invention. The integrated circuit 10B and the operation method thereof described with reference to fig. 3 are similar to the integrated circuit 10 and the operation method thereof described with reference to fig. 1A and 1B. Only the differences between the two will be described below, and the same or similar parts will not be described again.

Referring to FIG. 3, in some embodiments, a block of the memory array 100' of the integrated circuit 10b is divided into a plurality of sub-blocks (tiles). For example, as shown in FIG. 3, a block of the memory array 100' is partitioned into 4 sub-blocks TL. The sub-blocks TL each comprise a portion of the memory array 100', and the sub-blocks TL are physically separated from one another. It is noted that, for the sake of simplicity, FIG. 3 only illustrates the bit lines BL and the word lines WL of each sub-block TL, and omits the other components of each sub-block TL (e.g., the memory cells MC, the string selection transistors SST, the ground selection transistors GST, the string selection lines SSL, and the ground selection lines GSL shown in FIG. 1A). The sub-blocks TL are arranged in an array along a plurality of rows and a plurality of columns. In some embodiments, an inter-sub-block bus system (inter-tile bus system) BS is coupled to and extends between the sub-blocks TL. In addition, the inter-sub-block bus system BS may further be coupled to a timing controller (not shown). Furthermore, each sub-block TL is coupled to peripheral circuits including a page buffer PB and a counter CT. In some embodiments, the peripheral circuits coupled to adjacent sub-blocks TL in the same column face each other, and the peripheral circuits coupled to adjacent sub-blocks TL in the same row are located on the same side of the sub-blocks TL. However, one skilled in the art can adjust the number of sub-blocks TL and the arrangement of the sub-blocks TL and the peripheral circuits according to design requirements, and the invention is not limited thereto. In addition, in some embodiments, each sub-block TL is coupled to a row decoder and a column decoder (both not shown). By dividing the memory array 100' into a plurality of sub-blocks TL, the RC delay of the integrated circuit 10b can be reduced, which may further increase the instruction rate of the integrated circuit 10b.
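
One way to picture the sub-block arrangement, under the assumption that per-tile results are later combined (e.g., over the inter-sub-block bus system BS), is the following sketch in which each tile's page buffer and counter produce a local partial sum.

```python
# Illustrative sketch: the bit-line products of one block are split across
# 4 sub-blocks (tiles), each summed locally by its own page buffer/counter.
# Combining the partial sums afterwards is an assumption of this sketch.
def tile_partial_sums(products_per_bitline, n_tiles=4):
    """Return one local partial sum per tile."""
    per_tile = len(products_per_bitline) // n_tiles
    return [sum(products_per_bitline[t * per_tile:(t + 1) * per_tile])
            for t in range(n_tiles)]

products = [1, 0, 1, 1, 0, 0, 1, 0]        # per-bit-line products (example values)
partials = tile_partial_sums(products, 4)
print(partials, sum(partials))             # [1, 2, 0, 1] 4
```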

Fig. 4 is a schematic diagram of an integrated circuit 20 in accordance with some embodiments of the present invention. The integrated circuit 20 and the operation method thereof described with reference to fig. 4 are similar to the integrated circuit 10 and the operation method thereof described with reference to fig. 1A and 1B. Only the differences between the two will be described below, and the same or similar parts will not be described again.

FIG. 4 illustrates a plurality of blocks of the memory array 200 of the integrated circuit 20, including, for example, block BK1 and block BK2. Each block of the memory array 200 is similar to the block of the memory array 100 depicted in FIG. 1A, and has a plurality of columns (or strings) and a plurality of rows (or pages) of memory cells MC. One of the word lines WL connects each row of memory cells MC, and the memory cells MC of each column (or string) are connected between a bit line BL and a source line SL. In some embodiments, multiple columns (or strings) of memory cells MC in the same block share the same source line SL. In addition, the word lines WL of one block (e.g., block BK1) are not connected to (i.e., are electrically isolated from) the word lines WL of another block (e.g., block BK2), while the bit lines BL of different blocks (e.g., blocks BK1 and BK2) are connected to each other. In other words, each of the blocks has independent word lines WL and shared bit lines BL. In some embodiments, the source lines SL of different blocks may be coupled to each other. In alternative embodiments, the source lines SL of one block (e.g., block BK1) are not connected to (i.e., are electrically isolated from) the source lines SL of another block (e.g., block BK2).

When the integrated circuit 20 operates in the memory mode, the weights W_i are written into the plurality of memory cells MC of the memory array 200 by the programming operations described with reference to FIG. 1A. On the other hand, when the integrated circuit 20 operates in the operational mode, a plurality of word lines WL of different blocks and one bit line BL shared by the different blocks are selected, and the selected word lines WL receive the input voltages X_i. In some embodiments, these input voltages X_i have a specific distribution (pattern), and the input voltages X_i are different from one another. For example, in the dual-bit mode, one of the input voltages X_i is at a high logic level "1", while another of the input voltages X_i is at a low logic level "0". Further, the selected bit line BL receives a read voltage, and the other bit lines BL receive a pass voltage (e.g., 0V). The weights W_i stored in the memory cells MC corresponding to the selected word lines WL and the selected bit line BL are multiplied by the input voltages X_i in these memory cells MC. In embodiments where the input voltages X_i are transferred to the memory cells MC via the word lines WL, the weight W_i stored in a memory cell MC can be regarded as the transconductance of the memory cell MC. The products of the weights W_i and the corresponding input voltages X_i are output via the selected bit line BL in the form of current signals. Since each bit line BL is shared by the different blocks of the memory array 200, these output current signals from the different blocks are accumulated on the selected bit line BL. In some embodiments, the sum of the products of the weights W_i and the corresponding input voltages X_i is sensed through a page buffer PB coupled to the memory array 200.
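
A simplified behavioral model of this configuration (an assumption for illustration, not the patent's circuit equations) treats each selected cell's weight as a transconductance, so each block contributes a current proportional to its weight times the word-line input voltage, and the shared bit line accumulates the contributions.

```python
# Idealized model of the FIG. 4 configuration: each block's selected cell
# contributes current = (transconductance from weight W_i) * (word-line voltage X_i),
# and the shared bit line sums the currents from all blocks.
def shared_bitline_sum(weights_per_block, wordline_voltages):
    """Total current accumulated on the shared, selected bit line."""
    return sum(w * x for w, x in zip(weights_per_block, wordline_voltages))

W = [0.6, 0.2]   # weights stored in the selected cells of blocks BK1 and BK2 (assumed)
X = [1, 1]       # input voltages X_i on the selected word lines of BK1 and BK2
print(shared_bitline_sum(W, X))  # 0.6 + 0.2 = 0.8
```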

Based on the arrangement shown in FIG. 4, the multiplication operations are performed in the memory cells MC, and the addition operation is performed through the bit line BL shared by the different blocks. Thus, both the multiply and add operations can be regarded as in-memory operations.

In the embodiments described with reference to FIG. 4, over-erasing of the memory cells MC is avoided before the programming operation is performed on the memory cells MC. That is, in embodiments where the memory cells MC are N-type transistors, the threshold voltage of an erased memory cell is greater than or equal to 0V. Thus, in the operational mode, the memory cells MC corresponding to the unselected word lines WL can receive the pass voltage of, for example, 0V, and can be completely turned off. Therefore, the output current signal is contributed only by the memory cells MC corresponding to the selected word lines WL and bit line BL, thereby improving the reliability of the integrated circuit 20.
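
The benefit of keeping the erased threshold voltage at or above 0 V can be illustrated with the idealized cell model below: a cell whose word line is at the 0 V pass voltage stays at or below threshold and contributes no current, so only the selected cells contribute to the bit-line current. The linear current expression is an assumption of this sketch.

```python
# Idealized cell cutoff model: a cell conducts only when its word-line voltage
# exceeds its threshold voltage. With Vt >= 0 V, a 0 V pass voltage keeps
# unselected cells fully off.
def cell_current(v_wordline, v_threshold, transconductance):
    """Return the cell's output current, zero when the cell is off."""
    if v_wordline <= v_threshold:
        return 0.0
    return transconductance * (v_wordline - v_threshold)

print(cell_current(v_wordline=0.0, v_threshold=0.0, transconductance=0.5))  # 0.0 (off)
print(cell_current(v_wordline=2.0, v_threshold=0.0, transconductance=0.5))  # 1.0 (on)
```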

In summary, the integrated circuit of the present invention can operate in the memory mode and the operational mode. The integrated circuit includes a memory array, such as a NAND flash memory array. The integrated circuit can perform the sum-of-products function and can be used for artificial intelligence applications, neuromorphic computing systems, and the learning procedures of machine learning systems. In the memory mode, the weights are written into the memory cells of the memory array. In the operational mode, the stored weights are multiplied by the input voltages delivered to the memory cells via the bit lines or word lines, and the products of the weights and the input voltages are accumulated. In contrast to the von Neumann architecture, which performs operations in a data processing unit (e.g., a central processing unit) separate from a data storage unit (e.g., a memory integrated circuit), the integrated circuit of the present invention can operate in both a memory mode and an operational mode. Thus, data no longer needs to travel back and forth between the data processing unit and the data storage unit, and the instruction rate can be significantly increased. In particular, the page buffer used to write the weights into the memory cells and to receive the products of the weights and the input voltages is coupled to the memory array through a large number of bit lines with a high degree of parallelism, so the page buffer has a relatively high bandwidth. Thus, the integrated circuit can be used for large data operations, and may not suffer from the performance bottleneck of the von Neumann architecture.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
