Memory device and method of operating the same

Document No.: 1939940; Publication date: 2021-12-07

Note: This application, "Memory device and method of operating the same" (存储器装置和操作存储器装置的方法), was created by Amit Berman and Evgeny Blaichman on 2021-02-03. Abstract: Memory devices and methods of operating memory devices are disclosed. The method includes: programming a set of information bits into one or more memory cells using a neural network based on a plurality of embedding parameters; determining a set of predicted information bits based on the voltage levels of the one or more memory cells using a neural network comprising a plurality of network parameters trained with the plurality of embedding parameters; and reading information bits from the memory device based on the set of predicted information bits.

1. A method of operating a memory device, comprising:

programming a set of information bits into one or more memory cells using a neural network based on a plurality of embedding parameters;

determining a set of predicted information bits based on the voltage levels of the one or more memory cells using a neural network comprising a plurality of network parameters trained with the plurality of embedding parameters; and

reading information bits from a memory device based on the set of predicted information bits.

2. The method of claim 1, further comprising:

mapping the set of information bits to voltage levels of one or more memory cells based on the plurality of embedding parameters; and

detecting a voltage level of the one or more memory cells.

3. The method of claim 1, wherein,

the one or more memory cells include a plurality of memory cells, and the plurality of embedding parameters include an array having a number of dimensions equal to the number of memory cells.

4. The method of claim 1, further comprising:

embedding the set of information bits into an embedding space based on the plurality of embedding parameters to produce embedded symbols;

applying a sigmoid function to constrain the embedded symbols to produce constrained symbols; and

scaling the constrained symbols to produce scaled symbols corresponding to voltages within the effective dynamic range, wherein the set of information bits is mapped based on the scaled symbols.

5. The method of any of claims 1 to 4, further comprising:

generating a set of information bit probabilities based on voltage levels of the one or more memory cells using a neural network; and

selecting a highest information bit probability from the set of information bit probabilities, wherein the set of predicted information bits is identified based on the highest information bit probability.

6. A method of training an artificial neural network for a memory device, comprising:

initializing a plurality of embedding parameters and network parameters;

mapping a set of information bits to voltage levels of one or more memory cells based on the plurality of embedding parameters;

identifying a set of predicted information bits using an artificial neural network based on network parameters; and

updating the plurality of embedding parameters and network parameters based at least in part on the set of predicted information bits.

7. The method of claim 6, further comprising:

updating the network parameters based on the plurality of embedding parameters to produce updated network parameters; and

updating the plurality of embedding parameters based on the updated network parameters to produce updated embedding parameters.

8. The method of claim 6, further comprising:

performing a plurality of training iterations, wherein the plurality of embedding parameters and network parameters are updated during each training iteration.

9. The method of any of claims 6 to 8, further comprising:

calculating a gradient of a classification loss function of the set of information bits and the set of predicted information bits, wherein the plurality of embedding parameters or network parameters are updated based on the gradient of the classification loss function.

10. The method of claim 9, wherein,

the gradient includes an approximation of the physical NAND channel.

11. The method of claim 9, further comprising:

identifying a mathematical model of the one or more memory cells, wherein the gradient of the classification loss function is calculated based on the mathematical model.

12. The method of claim 11, further comprising:

updating the mathematical model based on data from additional memory cells.

13. The method of any of claims 6 to 8, further comprising:

programming the set of information bits into the one or more memory cells based on a mapping; and

detecting a voltage level of the one or more memory cells to generate one or more detected voltage levels, wherein the set of predicted information bits is identified based on the one or more detected voltage levels.

14. The method of claim 13, further comprising:

generating a set of information bit probabilities based on the detected voltage levels using a neural network, wherein the set of predicted information bits is identified based on a highest information bit probability.

15. The method of any of claims 6 to 8, wherein,

the one or more memory cells include a plurality of memory cells, and the plurality of embedding parameters include an array having a number of dimensions equal to the number of memory cells.

16. The method of any of claims 6 to 8, further comprising:

embedding the set of information bits into an embedding space based on the plurality of embedding parameters to produce embedded symbols;

applying a sigmoid function to constrain the embedded symbols to produce constrained symbols; and

scaling the constrained symbols to produce scaled symbols, wherein the set of information bits is mapped based on the scaled symbols.

17. A memory device, comprising:

a plurality of memory cells;

a programming component comprising an embedding layer based on a plurality of embedding parameters; and

a reading component comprising a neural network based on a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedding parameters.

18. The apparatus of claim 17, wherein,

the programming component further comprises a sigmoid layer and a scaling layer.

19. The apparatus of claim 17, wherein the neural network comprises a probability-based classifier; and/or wherein the plurality of memory cells comprise NAND memory cells.

20. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 16.

Technical Field

The following relates generally to memory devices and, more particularly, to selecting a programming voltage for a memory device.

Background

Memory devices are common electronic components for storing data. NAND flash memory devices allow multiple bits of data to be stored in each memory cell, thereby providing manufacturing cost and performance improvements. A memory cell storing multiple bits of data may be referred to as a multi-level memory cell. A multi-level memory cell divides its threshold voltage range into a plurality of voltage states, and the data value written to the memory cell is extracted from the memory cell voltage level.

The voltage level used to program the memory cell can be determined manually based on theoretical considerations. However, manual selection of voltage levels does not provide an optimal voltage level for minimizing read errors. Accordingly, there is a need in the art for improved systems for determining voltage levels for programming data to memory cells.

Disclosure of Invention

A method, apparatus, non-transitory computer-readable medium, and system for selecting a programming voltage for a memory device are described. Embodiments of the method, apparatus, non-transitory computer readable medium, and system may include: programming a set of information bits into one or more memory cells using neural network embedding based on a plurality of embedding parameters; determining a set of predicted information bits based on a voltage level of a memory cell using a neural network comprising a plurality of network parameters trained with the plurality of embedding parameters; and reading information bits from the memory device based on the set of predicted information bits.

A method, apparatus, non-transitory computer-readable medium, and system for selecting a programming voltage for a memory device are described. Embodiments of the method, apparatus, non-transitory computer readable medium, and system may initialize a plurality of embedding parameters and a set of network parameters; map a set of information bits to voltage levels of one or more memory cells based on the plurality of embedding parameters; identify a set of predicted information bits using an Artificial Neural Network (ANN) based on the network parameters; and update the plurality of embedding parameters and network parameters based at least in part on the set of predicted information bits.

An apparatus, system, and method for selecting a programming voltage of a memory device are described. Embodiments of the apparatus, system, and method may include: a plurality of memory cells; a programming component comprising an embedding layer based on a plurality of embedding parameters; and a reading component comprising a neural network based on a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedding parameters.

Drawings

FIG. 1 illustrates an example of an implementation of a data processing system including a memory system in accordance with aspects of the present disclosure.

Fig. 2 illustrates an example of the memory system of fig. 1 in accordance with aspects of the present disclosure.

Fig. 3 illustrates an example of the non-volatile memory device of fig. 1 in accordance with aspects of the present disclosure.

Fig. 4 illustrates an example of the memory cell array of fig. 3, in accordance with aspects of the present disclosure.

Fig. 5 illustrates an example of a memory block of the memory cell array of fig. 4, in accordance with aspects of the present disclosure.

Fig. 6 illustrates an example of a voltage level constellation in accordance with aspects of the present disclosure.

Fig. 7 illustrates an example of a learning-based memory system in accordance with aspects of the present disclosure.

Fig. 8 illustrates an example of a programming network in accordance with aspects of the present disclosure.

Fig. 9 illustrates an example of a read network in accordance with aspects of the present disclosure.

FIG. 10 illustrates an example of a process of operating a memory device in accordance with aspects of the present disclosure.

FIG. 11 illustrates an example of a process of programming information to a memory device, according to aspects of the present disclosure.

Fig. 12 illustrates an example of a process of training an ANN for selecting a programming voltage of a memory device, in accordance with aspects of the present disclosure.

Detailed Description

The present disclosure relates to systems and methods for programming and reading data from memory devices, and more particularly, selecting a programming voltage for a memory device. Particular embodiments disclosed relate specifically to NAND flash memory devices capable of storing 5-bit or 6-bit data in each memory cell.

Memory devices are common electronic components for storing data. NAND flash memory devices allow multiple bits of data to be stored in each memory cell, providing manufacturing cost and performance improvements. A memory cell storing multiple bits of data may be referred to as a multi-level memory cell. A multi-level memory cell divides its threshold voltage range into a plurality of voltage states, and the data value written to the memory cell is extracted from the memory cell voltage level.

To read information from the memory device, the voltage of each cell is measured and the voltage level stored in the cell is inferred. The bits may then be recovered. There may be a tradeoff between the number of voltage levels and memory reliability. The larger the number of bits per cell, the more information can be stored on the memory device. On the other hand, the voltages representing the different levels must be packed more closely together, because a greater number of distinguishable voltages are used within the same dynamic range. As a result, noise in cell programming or cell reading has a greater chance of changing a voltage of one level into a voltage representing a different level, and thus errors appear when the cell is read.

There are a number of noise sources in a memory device that can cause erroneous reads of information, such as write noise, interference noise, aging, and the read operation itself. Write noise means that the voltage of a cell immediately after programming differs from the intended voltage, due to the programming process. Interference noise means that the voltage of a cell varies as a result of programming different adjacent cells; programming a cell causes a disturb that affects other cells. Aging means that the greater the number of times the device has been written and read, the greater the noise. Furthermore, the longer the time since a cell was programmed, the noisier the cell will be. In addition, the read operation itself may cause noise and interference.

The memory device may be referred to as a channel. The term channel is used because written information enters and passes through the storage medium, and when the information is read back it is corrupted by noise, depending on the characteristics of the medium.

Memory programming is a complex process based on applying voltages to the memory cells. However, the cell voltage may be affected by variables such as the current voltage level, pulse power, and inter-cell interference. Cell voltages may also be affected by inhibited-cell disturb, inter-word-line (WL) coupling, and cell retention. Further, the result of writing to a NAND device can be stochastic: the observed data may be corrupted by noise.

Conventional methods for selecting a programming voltage use manual optimization techniques (such as trial and error). These manual processes do not provide optimal performance and may not account for statistical data. Furthermore, a measure of success (such as a target voltage) is generated for a particular application and may not be suitable for other applications. Finally, manual optimization can be resource intensive, and may trade off some metrics so that others can be optimized faster or more efficiently.

Thus, the systems and methods of the present disclosure can be used to find improved programming voltages for cells. A particular method of finding the programming voltage of a cell uses a learning-based memory system. The learning-based memory system includes a programming network, a NAND memory (or NAND channel), and a read network. A NAND memory may have a plurality of memory cells, each of which may be programmed using a plurality of different voltage levels.

Embodiments of the present disclosure may be used in a flash memory controller. Furthermore, the present disclosure may be superior to current manual optimization processes in terms of bit error rate, and has the advantage of rapid development.

The present disclosure describes a method to find an optimized constellation for modulation, given a number of bits per cell N and a number of cells K. The method may be automatic and data-driven, using data from a real NAND channel, and therefore finds a constellation that is particularly well suited for that channel. For a given number of bits per cell, embodiments of the present disclosure may find a constellation that produces a small number of errors when read.

The present disclosure utilizes machine learning to find the constellation. The training process may be performed offline during product development (not per a particular NAND chip instance). The training process results can then be applied to all instances of NAND chips with similar specifications.

The machine learning setup consists of a programming network module, a read network, and a NAND channel. The programming network takes a level as input and returns the sequence of voltages for that level; that is, the programming network performs the mapping defined by the constellation and enables continuous optimization of the voltages. The read network predicts the original information based on the detected voltage levels of the memory cells.

Hereinafter, exemplary embodiments of the inventive concept will be described more fully with reference to the accompanying drawings. Like reference numerals may refer to like elements throughout.

It will be understood that the terms "first," "second," "third," and the like, are used herein to distinguish one element from another, and that an element is not limited by these terms. Thus, a "first" element in an exemplary embodiment may be described as a "second" element in another exemplary embodiment.

It should be understood that the description of features or aspects within each exemplary embodiment should generally be considered applicable to other similar features or aspects in other exemplary embodiments, unless the context clearly dictates otherwise.

As used herein, the singular is intended to include the plural unless the context clearly indicates otherwise.

Herein, when one value is described as being about equal to or substantially the same as or equal to another value, it will be understood that the values are equal to each other within measurement error, or if measurably unequal, are sufficiently close in value to be functionally equal to each other as will be understood by those of ordinary skill in the art. For example, taking into account the measurement in question and the error associated with measurement of the particular quantity (i.e., the limitations of the measurement system), the term "about" as used herein includes the stated value and means within an acceptable range of deviation for the particular value as determined by one of ordinary skill in the art. For example, "about" may mean within one or more standard deviations as understood by one of ordinary skill in the art. Further, it will be understood that while a parameter may be described herein as having a value of "about" a particular value, according to exemplary embodiments, the parameter may be exactly the particular value or approximately the particular value within measurement error as would be understood by one of ordinary skill in the art.

Exemplary memory System

FIG. 1 is a block diagram illustrating an implementation of a data processing system including a memory system according to an exemplary embodiment of the inventive concepts.

Referring to FIG. 1, data processing system 10 may include a host 100 and a memory system 200. The memory system 200 shown in FIG. 1 may be used in a variety of systems that include data processing functionality. The various systems may be various devices including, for example, mobile devices such as smart phones or tablet computers. However, the various devices are not limited thereto.

Memory system 200 may include various types of memory devices. Herein, exemplary embodiments of the inventive concept will be described as including a memory device as a nonvolatile memory. However, the exemplary embodiments are not limited thereto. For example, the memory system 200 may include memory devices that are volatile memory.

According to an example embodiment, the memory system 200 may include non-volatile memory devices (such as, for example, Read Only Memory (ROM), magnetic disks, optical disks, flash memory, etc.). The flash memory may be a memory that stores data according to a variation in the threshold voltage of a Metal Oxide Semiconductor Field Effect Transistor (MOSFET), and may include, for example, NAND and NOR flash memory. The memory system 200 may be implemented using a memory card including a non-volatile memory device, such as, for example, an embedded multi-media card (eMMC), a Secure Digital (SD) card, a micro SD card, or Universal Flash Storage (UFS), or the memory system 200 may be implemented using, for example, an SSD including a non-volatile memory device. Here, assuming that the memory system 200 is a nonvolatile memory system, the configuration and operation of the memory system 200 will be described. However, the memory system 200 is not limited thereto. The host 100 may include, for example, a system-on-chip (SoC) Application Processor (AP) installed on, for example, a mobile device, or a Central Processing Unit (CPU) included in a computer system.

As described above, host 100 may include AP 110. The AP 110 may include various Intellectual Property (IP) blocks. For example, the AP 110 may include a memory device driver 111 that controls the memory system 200. The host 100 may communicate with the memory system 200 to send commands related to memory operations and receive acknowledgement commands in response to the sent commands. Host 100 may also communicate with memory system 200 regarding a table of information related to memory operations (e.g., INFO_TABLE).

Memory system 200 may include, for example, a memory controller 210 and a memory device 220. The memory controller 210 may receive a command related to a memory operation from the host 100, generate an internal command and an internal clock signal using the received command, and provide the internal command and the internal clock signal to the memory device 220. The memory device 220 may store write data in the memory cell array in response to an internal command or may provide read data to the memory controller 210 in response to an internal command.

Memory device 220 includes an array of memory cells that retain data stored therein even when memory device 220 is not powered. The memory cell array may include memory cells such as NAND or NOR flash memory, Magnetoresistive Random Access Memory (MRAM), Resistive Random Access Memory (RRAM), Ferroelectric Random Access Memory (FRAM), or Phase Change Memory (PCM). For example, when the memory cell array includes a NAND flash memory, the memory cell array may include a plurality of blocks and a plurality of pages. Data can be programmed and read in units of pages, and data can be erased in units of blocks. An example of a memory block included in the memory cell array is shown in fig. 4.

Fig. 2 is a block diagram illustrating the memory system 200 of fig. 1 according to an exemplary embodiment of the inventive concepts.

Referring to fig. 2, a memory system 200 includes a memory device 220 and a memory controller 210. Memory controller 210 may also be referred to herein as a controller circuit. The memory device 220 may perform a write operation, a read operation, or an erase operation under the control of the memory controller 210.

Memory controller 210 may control memory devices 220 according to requests received from host 100 or internally specified schedules. The memory controller 210 may include a controller core 211, an internal memory 214, a host interface block 215, and a memory interface block 216. The memory controller 210 may further include a device information storage 217, the device information storage 217 being configured to provide the first device information DI1 to the host interface block 215 and the second device information DI2 to the controller core 211.

The controller core 211 may include a memory control core 212 and a machine learning core 213, and each of these cores may be implemented by one or more processors. Memory control core 212 may control and access memory device 220 according to requests received from host 100 or internally specified schedules. The memory control core 212 may manage and execute various metadata and code for managing or operating the memory system 200.

As described in further detail below, the machine learning core 213 may be used to perform training and inference of neural networks designed to perform noise cancellation on the memory device 220.

The internal memory 214 may be used as, for example, a system memory used by the controller core 211, a cache memory that stores data of the memory device 220, or a buffer memory that temporarily stores data between the host 100 and the memory device 220. The internal memory 214 may store a mapping table MT indicating the relationship between logical addresses allocated to the memory system 200 and physical addresses of the memory devices 220. The internal memory 214 may include, for example, DRAM or SRAM.

In an exemplary embodiment, a neural network (such as the read network described with reference to fig. 9) may be included in a computer program stored in the internal memory 214 of the memory controller 210 or in the memory device 220. A computer program comprising a neural network may be executed by machine learning core 213 to denoise data stored in memory device 220. Thus, according to an example embodiment, the memory system 200 may denoise data stored in the memory device 220 during a normal read operation of the memory device 220. That is, after the manufacture of the memory system 200 is completed, during normal operation of the memory system 200, and in particular during a normal read operation that reads data from the memory device 220, the data stored in the memory device 220 may be denoised using a neural network stored and executed locally in the memory system 200, and the denoised data may be read out from the memory device 220.

The host interface block 215 may include components (such as, for example, physical blocks) for communicating with the host 100. The memory interface block 216 may include components (such as, for example, physical blocks) for communicating with the memory device 220.

Next, the operation of the memory system 200 over time will be described. When power is supplied to the memory system 200, the memory system 200 may perform initialization with the host 100.

The host interface block 215 may provide the first request REQ1 received from the host 100 to the memory control core 212. The first request REQ1 may include a command (e.g., a read command or a write command) and a logical address. The memory control core 212 may convert the first request REQ1 into a second request REQ2 suitable for the memory device 220.

For example, the memory control core 212 may translate the format of commands. The memory control core 212 may refer to the mapping table MT stored in the internal memory 214 to obtain the address information AI. The memory control core 212 may translate logical addresses into physical addresses of the memory device 220 by using the address information AI. The memory control core 212 may provide a second request REQ2 appropriate for the memory device 220 to the memory interface block 216.

The memory interface block 216 may register the second request REQ2 from the memory control core 212 in a queue. The memory interface block 216 may send the request registered first in the queue to the memory device 220 as a third request REQ3.

When the first request REQ1 is a write request, the host interface block 215 may write data received from the host 100 to the internal memory 214. When the third request REQ3 is a write request, the memory interface block 216 may send data stored in the internal memory 214 to the memory device 220.

When the data is completely written, the memory device 220 may send a third response RESP3 to the memory interface block 216. In response to the third response RESP3, the memory interface block 216 may provide a second response RESP2 to the memory control core 212 indicating that the data was completely written.

After the data is stored in the internal memory 214 or after the second response RESP2 is received, the memory control core 212 may send a first response RESP1 to the host 100 over the host interface block 215 indicating that the request is complete.

When the first request REQ1 is a read request, the read request may be sent to the memory device 220 through the second request REQ2 and the third request REQ3. The memory interface block 216 may store data received from the memory device 220 in the internal memory 214. When the data has been completely sent, the memory device 220 may send a third response RESP3 to the memory interface block 216.

When the third response RESP3 is received, the memory interface block 216 may provide a second response RESP2 to the memory control core 212 indicating that the data was completely read. When the second response RESP2 is received, the memory control core 212 may send the first response RESP1 to the host 100 through the host interface block 215.

The host interface block 215 may send data stored in the internal memory 214 to the host 100. In one exemplary embodiment, in the case where data corresponding to the first request REQ1 is stored in the internal memory 214, the transmission of the second request REQ2 and the third request REQ3 may be omitted.

Memory device 220 may also send first serial peripheral interface information SPI1 to memory interface block 216. The memory interface block 216 may send the second serial peripheral interface information SPI2 to the controller core 211.

Fig. 3 is a detailed block diagram of the nonvolatile memory device 220 of fig. 1 according to an exemplary embodiment of the inventive concept. Referring to fig. 3, the memory device 220 may include, for example, a memory cell array 221, a control logic 222, a voltage generation unit 223, a row decoder 224, and a page buffer 225.

The memory cell array 221 may be connected to one or more string selection lines SSL, a plurality of word lines WL, one or more ground selection lines GSL, and a plurality of bit lines BL. The memory cell array 221 may include a plurality of memory cells disposed at intersections between a plurality of word lines WL and a plurality of bit lines BL.

Control logic 222 may receive commands CMD (e.g., internal commands) and addresses ADD from memory controller 210 and control signals CTRL from memory controller 210 for controlling various functional blocks within memory device 220. The control logic 222 may output various control signals for writing data to the memory cell array 221 or reading data from the memory cell array 221 based on the command CMD, the address ADD, and the control signal CTRL. In this manner, control logic 222 may control the overall operation of memory device 220.

Various control signals output by the control logic 222 may be provided to the voltage generation unit 223, the row decoder 224, and the page buffer 225. For example, control logic 222 may provide voltage control signals CTRL_vol to voltage generation unit 223, row addresses X-ADD to row decoder 224, and column addresses Y-ADD to page buffer 225.

The voltage generation unit 223 may generate various voltages for performing program, read, and erase operations on the memory cell array 221 based on the voltage control signal CTRL_vol. For example, the voltage generation unit 223 may generate a first driving voltage VWL for driving the plurality of word lines WL, a second driving voltage VSSL for driving the plurality of string selection lines SSL, and a third driving voltage VGSL for driving the plurality of ground selection lines GSL. In this case, the first driving voltage VWL may be a program voltage (e.g., a write voltage), a read voltage, an erase voltage, a pass voltage, or a program verify voltage. Also, the second driving voltage VSSL may be a string selection voltage (e.g., an on voltage or an off voltage). Further, the third driving voltage VGSL may be a ground selection voltage (e.g., an on voltage or an off voltage).

The row decoder 224 may be connected to the memory cell array 221 through a plurality of word lines WL, and may activate a portion of the plurality of word lines WL in response to a row address X-ADD received from the control logic 222. For example, in a read operation, row decoder 224 may apply a read voltage to a selected word line and a pass voltage to unselected word lines.

In a programming operation, row decoder 224 may apply a program voltage to a selected word line and a pass voltage to unselected word lines. In one example embodiment, the row decoder 224 may apply a program voltage to a selected word line and an additional selected word line in at least one of a plurality of program cycles.

The page buffer 225 may be connected to the memory cell array 221 through a plurality of bit lines BL. For example, in a read operation, the page buffer 225 may operate as a sense amplifier that outputs data stored in the memory cell array 221. Alternatively, in a program operation, the page buffer 225 may operate as a write driver that writes desired data to the memory cell array 221.

Fig. 4 and 5 illustrate an example of implementing the memory system 200 using a three-dimensional flash memory. Three-dimensional flash memory may include three-dimensional (e.g., vertical) NAND (e.g., VNAND) memory cells. The implementation of the memory cell array 221 including three-dimensional memory cells is described below. Each of the memory cells described below may be a NAND memory cell.

Fig. 4 is a block diagram of the memory cell array 221 of fig. 3 according to an exemplary embodiment of the inventive concept.

Referring to fig. 4, the memory cell array 221 according to an exemplary embodiment includes a plurality of memory blocks BLK1 through BLKz (where z is a positive integer greater than 1). Each of the memory blocks BLK1 through BLKz has a three-dimensional structure (e.g., a vertical structure). For example, each of the memory blocks BLK1 through BLKz may include structures extending in the first direction through the third direction. For example, each of the memory blocks BLK1 through BLKz may include a plurality of NAND strings extending in the second direction. The plurality of NAND strings may be arranged in, for example, a first direction to a third direction.

Each NAND string is connected to a bit line BL, a string select line SSL, a ground select line GSL, a word line WL, and a common source line CSL. That is, each of the memory blocks BLK1 through BLKz may be connected to a plurality of bit lines BL, a plurality of string select lines SSL, a plurality of ground select lines GSL, a plurality of word lines WL, and a common source line CSL. The memory blocks BLK1 through BLKz will be described in further detail below with reference to fig. 5.

Fig. 5 is a circuit diagram of a memory block BLKi according to an exemplary embodiment of the inventive concept. Fig. 5 illustrates an example of one of the memory blocks BLK1 through BLKz in the memory cell array 221 of fig. 4. The number of memory cells and word lines depicted in fig. 5 is merely an example, and any suitable number of memory cells and word lines may be used.

The memory block BLKi may include a plurality of cell strings CS11 through CS41 and CS12 through CS42. The plurality of cell strings CS11 to CS41 and CS12 to CS42 may be arranged in a column direction and a row direction to form columns and rows. Each of the cell strings CS11 through CS41 and CS12 through CS42 may include a ground selection transistor GST, memory cells MC1 through MC6, and a string selection transistor SST. The ground selection transistor GST, the memory cells MC1 through MC6, and the string selection transistor SST included in each of the cell strings CS11 through CS41 and CS12 through CS42 may be stacked in a height direction substantially perpendicular to the substrate.

Columns of the plurality of cell strings CS11 through CS41 and CS12 through CS42 may be connected to different string selection lines SSL1 through SSL4, respectively. For example, the string selection transistors SST of the cell strings CS11 and CS12 may be commonly connected to a string selection line SSL1. The string selection transistors SST of the cell strings CS21 and CS22 may be commonly connected to a string selection line SSL2. The string selection transistors SST of the cell strings CS31 and CS32 may be commonly connected to a string selection line SSL3. The string selection transistors SST of the cell strings CS41 and CS42 may be commonly connected to a string selection line SSL4.

The rows of the plurality of cell strings CS11 to CS41 and CS12 to CS42 may be connected to different bit lines BL1 and BL2, respectively. For example, the string selection transistors SST of the cell strings CS11 to CS41 may be commonly connected to the bit line BL1. The string selection transistors SST of the cell strings CS12 to CS42 may be commonly connected to the bit line BL2.

Columns of the plurality of cell strings CS11 to CS41 and CS12 to CS42 may be connected to different ground selection lines GSL1 to GSL4, respectively. For example, the ground selection transistors GST of the cell strings CS11 and CS12 may be commonly connected to the ground selection line GSL1. The ground selection transistors GST of the cell strings CS21 and CS22 may be commonly connected to the ground selection line GSL2. The ground selection transistors GST of the cell strings CS31 and CS32 may be commonly connected to the ground selection line GSL3. The ground selection transistors GST of the cell strings CS41 and CS42 may be commonly connected to the ground selection line GSL4.

Memory cells disposed at the same height from the substrate (or the ground selection transistor GST) may be commonly connected to a single word line, and memory cells disposed at different heights may be respectively connected to different word lines WL1 to WL6. For example, memory cells MC1 may be commonly connected to word line WL1. Memory cells MC2 may be commonly connected to word line WL2. Memory cells MC3 may be commonly connected to word line WL3. Memory cells MC4 may be commonly connected to word line WL4. Memory cells MC5 may be commonly connected to word line WL5. Memory cells MC6 may be commonly connected to word line WL6. The ground selection transistors GST of the cell strings CS11 through CS41 and CS12 through CS42 may be commonly connected to the common source line CSL.

Modulation

Fig. 6 illustrates an example of a voltage level constellation 600 in accordance with aspects of the present disclosure. Voltage level constellation 600 represents an example of a modulation scheme that may be used to program data to a memory device in accordance with an embodiment of the present disclosure. The modulation scheme may involve grouping the memory cells into groups of a given size (denoted by K) and dividing the voltage range of each memory cell into discrete levels (where the number of bits per cell is denoted by N).

The voltage level constellation 600 includes two cells with two bits per cell (bpc), i.e., N = 2 bpc and K = 2 cells. However, in other examples, a different number of cells and bits per cell may be used. Each cell is represented as an axis, and each information symbol 605 is represented by a pair of voltage levels (one voltage level per cell). The number of cells determines the number of axes, and the bpc determines the number of potential voltage levels.

For example, if the bpc is equal to N, the number of voltage levels per cell may be 2^N. Thus, in this example there are two axes, each with 4 levels, such that voltage level constellation 600 includes 16 information symbols 605. Note that the positions of the information symbols 605 are not perfectly aligned. That is, the voltage levels may be unevenly spaced, and the voltage levels may not be the same for each cell.

Thus, modulation may be used to write information to the NAND device. In short, with modulation, instead of writing N bits per cell, N×K bits are written per K cells. To store a stream of bits to a NAND device, the information stream is grouped into groups of (N×K) bits. Each group may take one of 2^(N×K) different values. Thus, a mapping is performed from each group to a number L in the range [0, 2^(N×K) − 1].

A single level is written to each non-interleaved set of K cells. Each level is associated with a unique sequence of K voltages (such as V1(L), V2(L), …, VK(L)) by a fixed, predetermined mapping. Therefore, in order to store the level L in the K cells, the first cell is programmed to voltage V1(L), the second cell to voltage V2(L), and so on; the K cells are programmed jointly.
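
As an illustration of the mapping just described, the following Python sketch groups a bit stream into (N×K)-bit groups, interprets each group as a level L, and looks up the K programming voltages for that level. The function names and the constellation table (a list of 2^(N×K) K-tuples of voltages) are illustrative assumptions, not taken from the present disclosure.

```python
from typing import List, Sequence

def bits_to_levels(bits: Sequence[int], n: int, k: int) -> List[int]:
    """Group a bit stream into (n*k)-bit groups and map each group to a level L."""
    group_size = n * k
    assert len(bits) % group_size == 0, "pad the stream to a whole number of groups"
    levels = []
    for i in range(0, len(bits), group_size):
        level = 0
        for b in bits[i:i + group_size]:  # read the group as a binary number
            level = (level << 1) | b
        levels.append(level)              # L is in the range [0, 2**(n*k) - 1]
    return levels

def voltages_for_level(level: int, constellation: Sequence[Sequence[float]]) -> Sequence[float]:
    """Look up the voltages V1(L), ..., VK(L) for one level in the fixed mapping."""
    return constellation[level]
```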

If the constellation is calculated correctly, the modulation increases reliability (e.g., reduces bit error rate). This is true when there is interference noise between the jointly programmed cells. In the field of telecommunications, modulation is widely used for similar reasons. The appropriate modulation scheme may be selected based on the physical channel and communication needs.

Modulation also provides for storing a non-integer number of bits per cell. For example, 3.5 bits per cell can be stored by storing 7 bits in 2 cells. In other words, if there are 128 levels (128 = 2^7) and the 128 levels are written on 2 cells, this amounts to 3.5 bits per cell.

In another example, a non-integer number of bits per cell may also be used when N×K is not an integer, under the constraint that 2^(N×K) is an integer number of levels. For example, the memory device may be based on K = 3 cells with 30000 levels, i.e., 2^(N×K) = 30000. In this case, since the number of information bits N×K is not an integer, detecting the set of information bits may be challenging. Therefore, this case (N×K not an integer) can be limited to the case where two conditions are satisfied: 1) the number of information bits is obtained by rounding N×K to the nearest integer (i.e., 15 in this example), and 2) some combinations of information bits are not allowed as input and output. This means that there may be fewer than 2^15 options for the input. For example, the number of allowed combinations may be at most 30000.
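
The two conditions can be checked with a few lines of arithmetic, using the numbers from the example above (a sketch for illustration only):

```python
import math

num_levels = 30000                   # allowed level combinations, 2**(N*K)
k = 3                                # cells per group
total_bits = math.log2(num_levels)   # N*K is approximately 14.87 bits
n = total_bits / k                   # approximately 4.96 bits per cell (non-integer)
info_bits = round(total_bits)        # condition 1: round N*K to 15 information bits
assert num_levels <= 2 ** info_bits  # condition 2: only 30000 of the 2**15 = 32768
                                     # 15-bit combinations are allowed
```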

According to various embodiments, the programming voltage selection for the cells may be performed by a Pulse Amplitude Modulation (PAM) algorithm, wherein the encoded bits are divided into groups of the number of bits per cell. For example, in a triple-level cell (TLC), the number of bits per cell is 3. Each group of bits is called a symbol; for example, the symbol with bits 010 is equal to 2. The dynamic range is divided into 2^N target voltages for N bits per cell. For example, for N = 3, the dynamic range is divided into 8 target voltages. Each target voltage is mapped to a symbol using a gray code, where only a single bit changes between adjacent target voltages. For example, if the dynamic range is between -3 V and 4 V, then -3 V is modulated as 111, -2 V as 110, -1 V as 100, 0 V as 101, 1 V as 001, 2 V as 000, 3 V as 010, and 4 V as 011.
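
The gray-coded mapping in this example can be reproduced as follows. The sketch uses the standard binary-reflected gray code gray(i) = i XOR (i >> 1); complementing its bits yields exactly the labeling listed above, which is an equally valid gray code (adjacent target voltages still differ in a single bit).

```python
N = 3                                      # bits per cell
v_min, v_max = -3.0, 4.0                   # dynamic range from the example
num_levels = 2 ** N                        # 8 target voltages
step = (v_max - v_min) / (num_levels - 1)  # 1 V between adjacent targets

for i in range(num_levels):
    voltage = v_min + i * step
    symbol = (i ^ (i >> 1)) ^ (num_levels - 1)  # complemented gray code
    print(f"{voltage:+.0f} V -> {symbol:0{N}b}")
# prints the mapping above: -3 V -> 111, -2 V -> 110, ..., +4 V -> 011
```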

The example modulation corresponds to an Additive White Gaussian Noise (AWGN) channel. However, in many cases the NAND channel is not an AWGN channel, and PAM modulation is not necessarily optimal for the NAND channel. Optionally, heuristic optimization may be performed to find target voltages with improved performance.

Learning-based memory

Fig. 7 illustrates an example of a learning-based memory system in accordance with aspects of the present disclosure. The illustrated example includes a programming network 700, a memory device 705, and a read network 710.

In some examples, programming network 700 and reading network 710 may include an Artificial Neural Network (ANN). An ANN may be a hardware or software component that includes a plurality of connected nodes (also known as artificial neurons) that may roughly correspond to neurons in the human brain. Each connection or edge (like a physical synapse in the brain) may send a signal from one node to another. When a node receives a signal, the node may process the signal and then send the processed signal to other connected nodes. In some cases, the signals between nodes include real numbers, and the output of each node may be calculated as a function of the sum of the inputs of each node. Each node and edge may be associated with one or more node weights that determine how the signal is processed and transmitted.

It should be noted that this description of an ANN is figurative rather than literal. In other words, it describes one way of interpreting an ANN, but not necessarily how an ANN is implemented. In a hardware or software implementation of an ANN, signals may not literally be transmitted and received.

During the training process, these weights may be adjusted to improve the accuracy of the results (i.e., by minimizing a loss function that corresponds in some way to the difference between the current result and the target result). The weight of an edge may increase or decrease the strength of signals sent between nodes. In some cases, a node may have a threshold below which no signal is sent at all. Nodes may also be aggregated into layers. Different layers may perform different transformations on the inputs of the different layers. The initial layer may be referred to as an input layer and the last layer may be referred to as an output layer. In some cases, a signal may pass through a particular layer multiple times.

Programming network 700 maps a set of information bits to voltage levels of one or more memory cells based on a set of embedding parameters. Programming network 700 may program the set of information bits into the one or more memory cells based on the mapping. Programming network 700 may also apply a gray code to the set of information bits, wherein the mapping is based on the gray code. In some examples, there may be a plurality of memory cells, and the set of embedding parameters includes an array having a number of dimensions equal to the number of memory cells. The dimension may be different from the number of elements in the array. For example, the number of elements in the array may equal the number of possible levels (i.e., 2^(N×K)), and each element of the array may be a K-dimensional vector.

Before training, programming network 700 and read network 710 may initialize a set of embedding parameters and a set of network parameters. Programming network 700 may include a programming component that includes an embedding layer based on the set of embedding parameters. In some examples, the programming component further includes a sigmoid layer and a scaling layer. The programming network 700 may be an example of, or include aspects of, the corresponding elements described with reference to fig. 8.

The programming network 700 takes as input a symbol "in", which is drawn from a finite discrete set. For example, the input to the programming network may be a single symbol taken from the set [0, …, 2^(N×K) − 1], where the symbol encodes N×K bits.

Further, the programming network 700 may convert the input symbol into programming voltages x. The channel adds unknown noise n such that y = x + n, where y represents the output of memory device 705 and x represents the programming voltages. The read network 710 recovers the input symbol as "in_predicted". The output of the read network may include more than a single prediction of the input symbol. For example, the output may include a score (or probability) for each possible input symbol. The symbol with the highest score may be used as the prediction "in_predicted", but all scores may be used when a loss function (e.g., cross entropy) is calculated.

In some examples, programming network 700 and read network 710 may be trained together. For example, programming network 700 and read network 710 may be trained by minimizing the cross entropy between "in" and "in_predicted", or using some other suitable loss function; a joint training sketch is given at the end of this section.

The example programming network 700 may include an embedding layer. The input may be a single symbol from a set of 2^(N×K) values. That is, the input may represent N×K bits, and the output may be a sequence of K voltages.

Further, the programming network 700 may include a sigmoid layer, and a scaling layer for scaling to the dynamic range of the memory cells. Thus, programming network 700 finds the programming voltages of the cells.
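
A minimal PyTorch sketch of such a programming network follows, assuming an embedding layer, a sigmoid layer, and a scaling layer as described. The class name, parameter names, and dynamic range values are illustrative assumptions.

```python
import torch
from torch import nn

class ProgramNet(nn.Module):
    """Sketch of the programming network: embedding -> sigmoid -> scaling.

    Maps a level in [0, 2**(n_bits * k_cells) - 1] to k_cells programming
    voltages constrained to the dynamic range [v_min, v_max].
    """
    def __init__(self, n_bits: int, k_cells: int, v_min: float, v_max: float):
        super().__init__()
        # one trainable K-dimensional vector per level: the embedding parameters
        self.embed = nn.Embedding(2 ** (n_bits * k_cells), k_cells)
        self.v_min, self.v_max = v_min, v_max

    def forward(self, level: torch.Tensor) -> torch.Tensor:
        x = self.embed(level)  # unconstrained embedded symbol
        x = torch.sigmoid(x)   # constrain each element to (0, 1)
        return x * (self.v_max - self.v_min) + self.v_min  # scale to dynamic range
```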

The memory device 705 may include a set of memory cells as described with reference to fig. 1-5. In some examples, the group of memory cells includes NAND memory cells. The memory cells can be set to a specified voltage in a range known as the dynamic range. The terms write and program may be used to describe the process of setting the cell to a desired voltage.

The following is an example process of storing information on a NAND device. Given a bit stream (b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, …), the bits are grouped into groups of N bits. For example, if N = 4, the groups are (b1, b2, b3, b4), (b5, b6, b7, b8), (b9, b10, b11, b12), and so on, where N is the number of bits per cell. For each group of N bits, 2^N different combinations of values are possible. Thus, a mapping is performed from each group to an integer L in the range [0, 2^N − 1]. The number L indicates the level. A single level is written to each memory cell. Each level is associated with a unique voltage V(L) by a fixed, predetermined mapping. This type of mapping is referred to as a constellation, where a constellation represents a mapping from levels to voltages. Thus, to store level L in a cell, voltage V(L) is programmed into the cell.

To read information from the memory device 705, the voltage of each cell is measured and the voltage level stored in the cell is inferred. The bits may then be recovered. In some cases, there may be a tradeoff between the value of N and memory reliability. The larger the value of N, the more information can be stored on the memory device. On the other hand, the voltages representing the different levels must be packed more closely together, because a greater number of distinguishable voltages are used within the same dynamic range. As a result, noise in cell programming or cell reading has a greater chance of changing a voltage of one level into a voltage representing a different level, and thus errors appear when the cell is read.

There are a number of noise sources in the memory device 705 that can cause erroneous reads of information, such as write noise, interference noise, aging, and the read operation itself. Write noise means that the voltage of a cell immediately after programming differs from the intended voltage, due to the programming process. Interference noise means that the voltage of a cell varies as a result of programming different adjacent cells; programming a cell causes a disturb that affects other cells. Aging means that the more times the memory device 705 has been written and read, the greater the noise. Furthermore, the longer the time since a cell was programmed, the noisier the cell will be. In addition, the read operation itself may cause noise and interference.

The memory device 705 may be referred to as a channel. The term channel is used because written information enters and passes through the storage medium, and when the information is read back it is corrupted by noise, depending on the characteristics of the medium.

The read network 710 detects voltage levels of one or more memory cells to generate one or more detected voltage levels. The read network 710 may then identify a set of predicted information bits based on the one or more detected voltage levels using a neural network that includes a set of network parameters. In some cases, the network parameters are trained together with the embedding parameters.

According to one embodiment, the read network 710 may generate a set of information bit probabilities based on the detected voltage levels using a neural network. The read network 710 may then select the highest information bit probability from the set of information bit probabilities. In some cases, the set of predicted information bits is identified based on the highest information bit probability.

The read network 710 may use an ANN based on the network parameters to identify a set of predicted information bits. The read network 710 may include a read component that includes a neural network based on a set of network parameters. In some cases, the network parameters are trained together with the embedding parameters. In some examples, the neural network includes a probability-based classifier. The read network 710 may be an example of, or include aspects of, the corresponding elements described with reference to fig. 9.

Fig. 8 illustrates an example of a programming network 800 in accordance with aspects of the present disclosure. The programming network 800 may be an example of, or include aspects of, the corresponding elements described with reference to fig. 7. Programming network 800 may include an embedding layer 805, a sigmoid layer 810, and a scaling layer 815.

The embedding layer 805 embeds a set of information bits into an embedding space based on the embedding parameters to generate embedded symbols. Sigmoid layer 810 applies a sigmoid function to constrain the embedded symbols to produce constrained symbols. The scaling layer 815 scales the constrained symbols to produce scaled symbols corresponding to voltages within the effective dynamic range. In some cases, the set of information bits is mapped based on the scaled symbols.

Fig. 9 illustrates an example of a read network 900 in accordance with aspects of the present disclosure. The read network 900 may be an example of, or include aspects of, the corresponding elements described with reference to fig. 7. As shown, the read network 900 may be a neural network including one or more fully connected layers (e.g., fully connected linear layers) 905 and one or more rectified linear unit (ReLU) layers 910. In some examples, the fully connected layers 905 and the ReLU layers 910 alternate, as shown in fig. 9. However, this arrangement is only an example, and any suitable neural network capable of learning the association between detected voltage levels and information bits may be used.
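
A minimal PyTorch sketch of such a read network follows; the alternating fully connected and ReLU layers mirror fig. 9, while the class name, depth, and hidden width are illustrative assumptions.

```python
import torch
from torch import nn

class ReadNet(nn.Module):
    """Sketch of the read network: alternating fully connected and ReLU layers.

    Takes the K detected cell voltages and returns one score per possible
    input symbol; the arg max of the scores gives "in_predicted".
    """
    def __init__(self, n_bits: int, k_cells: int, hidden: int = 128):
        super().__init__()
        num_symbols = 2 ** (n_bits * k_cells)
        # nn.BatchNorm1d(hidden) layers could be inserted before each ReLU,
        # as discussed below
        self.layers = nn.Sequential(
            nn.Linear(k_cells, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_symbols),
        )

    def forward(self, voltages: torch.Tensor) -> torch.Tensor:
        return self.layers(voltages)  # raw scores; softmax gives probabilities
```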

In some cases, batch normalization may be used during training of the neural network. In some cases, networks that include batch normalization may use higher learning rates without vanishing or exploding gradients. Furthermore, batch normalization may regularize the network, making generalization easier, so that in some cases it may not be necessary to use dropout to mitigate overfitting. The network may also become more robust to different initialization schemes and learning rates. Batch normalization is achieved by fixing the mean and variance of each layer's inputs. In some cases, normalization may be performed over the entire training set. In other cases, normalization is limited to each mini-batch in the training process.

In a neural network, an activation function may be used to transform the summed weighted input of a node into the node's activation or output. A ReLU layer implements a rectified linear activation function: a piecewise linear function that outputs the input directly if the input is positive, and outputs zero otherwise. The rectified linear activation function is used as a default activation function for many types of neural networks.

Using a rectified linear activation function enables training deep neural networks with stochastic gradient descent and error backpropagation. The rectified linear activation function operates similarly to a linear function, but enables complex relationships in the data to be learned. The rectified linear activation function also provides greater sensitivity to the summed activation input and avoids easy saturation. A node or unit that implements the rectified linear activation function may be referred to as a rectified linear unit, or simply a ReLU. A network that uses rectifier functions for its hidden layers may be referred to as a rectified network.
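
The joint training referred to above can then be sketched as follows, using the ProgramNet and ReadNet sketches from earlier in this section. Because gradients cannot be back-propagated through a physical NAND channel, the sketch substitutes an additive-Gaussian surrogate channel; the present disclosure instead uses an approximation of the physical NAND channel. The hyperparameters are illustrative.

```python
import torch
from torch import nn

n_bits, k_cells = 2, 2  # N = 2 bpc, K = 2 cells, as in fig. 6
prog_net = ProgramNet(n_bits, k_cells, v_min=-3.0, v_max=4.0)
read_net = ReadNet(n_bits, k_cells)
optimizer = torch.optim.Adam(
    list(prog_net.parameters()) + list(read_net.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(10_000):
    # random batch of input symbols "in"
    levels = torch.randint(0, 2 ** (n_bits * k_cells), (256,))
    x = prog_net(levels)               # programming voltages x
    y = x + 0.1 * torch.randn_like(x)  # surrogate channel: y = x + n
    scores = read_net(y)               # one score per possible symbol
    loss = loss_fn(scores, levels)     # cross entropy between "in" and "in_predicted"
    optimizer.zero_grad()
    loss.backward()                    # gradients reach both the network parameters
    optimizer.step()                   # and the embedding parameters
```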

Operation

FIG. 10 illustrates an example of a process of operating a memory device in accordance with aspects of the present disclosure. According to various embodiments, the memory device may include an ANN, and operating the memory device may include computing an output of the ANN based on the voltage levels read from the memory device.

In some examples, the operations may be performed by a system including a processor executing a set of codes to control functional elements of a device. Additionally or alternatively, the processing may be performed using dedicated hardware. In general, these operations may be performed based on the methods and processes described in accordance with aspects of the present disclosure. For example, an operation may consist of individual sub-steps or may be performed in conjunction with other operations described herein.

At operation 1000, the system maps a set of information bits to voltage levels of one or more memory cells based on a set of embedding parameters. In some cases, the operations of this step may be performed with reference to or by a programming network as described with reference to fig. 7 and 8. In some cases, the information bits may be mapped based on a modulation scheme utilizing the voltage level constellation described with reference to fig. 6. For example, the programming network parameters may include a voltage level for each of a plurality of cells corresponding to each symbol in the constellation. Further details regarding the process for mapping information bits are described with reference to fig. 11.

In operation 1005, the system programs the set of information bits into one or more memory cells based on the mapping. In some cases, the operations of this step may be performed with reference to or by a programming network as described with reference to fig. 7 and 8.

In particular, the programming network may include an embedding layer mapping {0, 1, …, 2^(N×K) − 1} → ℝ^K, i.e., a table that maps each integer in the range [0, 2^(N×K) − 1] to a sequence of K real numbers. All entries in the table are treated as independent variables that can be optimized.

The output of the embedding layer may be passed through a sigmoid function, a continuously differentiable monotonic function that converts its input to a number in the range [0, 1]. Each element of the length-K sequence is passed through the sigmoid function. The result is then rescaled to the range [VMIN, VMAX], where VMIN and VMAX are the minimum and maximum allowed voltages (i.e., the dynamic range). Rescaling is performed using the function x → x·(VMAX − VMIN) + VMIN. The sigmoid function and the rescaling ensure that the output of the programming network is within the valid range.

In operation 1010, the system detects voltage levels of the one or more memory cells to generate one or more detected voltage levels. In some cases, the operations of this step may be performed with reference to or by a read network as described with reference to fig. 7 and 9.

At operation 1015, the system identifies a set of predicted information bits based on the one or more detected voltage levels using a neural network that includes a set of network parameters, wherein the network parameters are trained with the embedding parameters. In some cases, the operations of this step may be performed with reference to or by a read network as described with reference to fig. 7 and 9. For example, the read network may identify a predicted constellation symbol and identify the set of information bits associated with that constellation symbol.

The read network may be a neural network classifier that takes as input a sequence of K voltages read from K cells of the memory device and returns a prediction of which level was written to the K cells. The read network may be any neural network, or any differentiable model. The number of outputs of the read network is 2^(N×K), where each of the 2^(N×K) outputs represents a score for the corresponding level given by the read network. For example, the score may represent the probability of the corresponding level. The channel may be a real memory channel or a model of a memory channel.
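Reusing the illustrative names from the earlier sketches (read_net, n_bits, and k_cells), the decoding step might be sketched as follows; the voltage values are made up for illustration.

```python
import torch

# Decode K detected voltages with the read network sketched earlier.
voltages = torch.tensor([[1.2, 3.4, 0.7, 2.9]])  # shape (1, K), here K = 4
scores = read_net(voltages)                      # 2^(N*K) scores, one per level
probs = torch.softmax(scores, dim=-1)            # scores as probabilities
level = int(probs.argmax(dim=-1))                # highest-probability level
bits = format(level, f"0{n_bits * k_cells}b")    # the N*K predicted information bits
```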

FIG. 11 illustrates an example of a process of programming information to a memory device, according to aspects of the present disclosure. In some examples, the operations may be performed by a system including a processor executing a set of codes to control functional elements of a device. Additionally or alternatively, the processing may be performed using dedicated hardware. In general, these operations may be performed based on the methods and processes described in accordance with aspects of the present disclosure. For example, an operation may consist of individual sub-steps or may be performed in conjunction with other operations described herein.

In operation 1100, the system embeds the set of information bits into an embedding space based on the embedding parameters to generate an embedded symbol. In some cases, the operations of this step may be performed with reference to or by an embedding layer as described with reference to fig. 8.

In operation 1105, the system applies a sigmoid function to constrain the embedded information symbols to produce constrained symbols. In some cases, the operations of this step may be performed with reference to or by a sigmoid layer as described with reference to fig. 8.

In operation 1110, the system scales the constrained symbols to generate scaled symbols corresponding to voltages within the effective dynamic range, wherein the set of information bits is mapped based on the scaled symbols. In some cases, the operations of this step may be performed with reference to or by a scaling layer as described with reference to fig. 8.

Training

Fig. 12 illustrates an example of a process for training an ANN for selecting a programming voltage of a memory device, in accordance with aspects of the present disclosure. In some examples, the operations may be performed by a system including a processor executing a set of codes to control functional elements of a device. Additionally or alternatively, the processing may be performed using dedicated hardware. In general, these operations may be performed based on the methods and processes described in accordance with aspects of the present disclosure. For example, an operation may consist of individual sub-steps or may be performed in conjunction with other operations described herein.

In operation 1200, the system initializes a set of embedding parameters and a set of network parameters. In some cases, the operations of this step may be performed with reference to or by a programming network as described with reference to fig. 7 and 8. In some examples, the training process randomly initializes the parameters of the programming network (i.e., the values in the embedding layer) and then randomly initializes the parameters (weights and biases) of the read network.

In operation 1205, the system maps a set of information bits to voltage levels of one or more memory cells based on the embedding parameters. For example, the mapping may be based on a programming constellation as described above with reference to fig. 6. In some cases, the operations of this step may be performed with reference to or by a programming network as described with reference to fig. 7 and 8.

At operation 1210, the system identifies a set of predicted information bits using an ANN based on the network parameters. In some cases, the operations of this step may be performed with reference to or by a read network as described with reference to fig. 7 and 9.

At operation 1215, the system updates the embedding parameters and the network parameters based at least in part on the set of predicted information bits. For example, the parameters may be updated based on the output of the ANN, which may include additional information beyond the predicted bits; in particular, the output of the ANN may include scores for the various combinations of information bits. In some cases, the operations of this step may be performed with reference to or by a training component.

The process of generating output using the ANN and then updating the parameters of the ANN may be repeated multiple times before the training process is completed. For example, the training process may continue until a threshold accuracy is achieved, until a predetermined number of training iterations have been performed, or until network parameters converge. According to one embodiment, the system may update the network parameters based on the embedded parameters to produce updated network parameters; and updating the embedding parameters based on the updated network parameters to produce updated embedding parameters.

According to one embodiment, updating the network parameters may be done according to the following algorithm. In each iteration of the algorithm of the present disclosure, the programming network and the read network are optimized and the cross entropy is minimized. That is, the system may perform a plurality of training iterations, wherein the embedding parameters and the network parameters are updated during each of the plurality of training iterations; for each iteration, the cross entropy is optimized twice. Let P(θ) be the programming network and R(φ) be the read network. "info" denotes a mini-batch of information bits fed to the networks, and "înfo" denotes the mini-batch of estimated information bits at the output of the read network. The variables λ_φ and λ_θ denote the learning rates, and CrossEntropy(info, înfo) denotes the cross entropy.

Example training algorithm
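The algorithm listing itself is not reproduced in this text. The following sketch implements the alternating cross-entropy updates just described, reusing the illustrative ProgramNet and make_read_net components sketched earlier. The stand-in noise channel, batch size, step count, and the use of Adam are assumptions; the text above specifies only the cross-entropy loss and the two learning rates λ_θ and λ_φ.

```python
import torch
import torch.nn.functional as F

n_bits, k_cells, batch_size = 2, 4, 256
program_net = ProgramNet(n_bits, k_cells, v_min=0.0, v_max=5.0)
read_net = make_read_net(k_cells, n_bits)
# Stand-in for a differentiable memory model (see the Gaussian sketch below).
channel = lambda v: v + 0.1 * torch.randn_like(v)

opt_theta = torch.optim.Adam(program_net.parameters(), lr=1e-3)  # learning rate λ_θ
opt_phi = torch.optim.Adam(read_net.parameters(), lr=1e-3)       # learning rate λ_φ

for step in range(10_000):
    # info: a mini-batch of information bit patterns, drawn uniformly.
    info = torch.randint(0, 2 ** (n_bits * k_cells), (batch_size,))

    # First cross-entropy update: train the read network R(φ) with P(θ) frozen.
    with torch.no_grad():
        v = channel(program_net(info))
    loss_phi = F.cross_entropy(read_net(v), info)
    opt_phi.zero_grad()
    loss_phi.backward()
    opt_phi.step()

    # Second cross-entropy update: train the programming network P(θ)
    # through the differentiable channel, stepping only θ.
    v = channel(program_net(info))
    loss_theta = F.cross_entropy(read_net(v), info)
    opt_theta.zero_grad()
    loss_theta.backward()
    opt_theta.step()
```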

Thus, according to a particular embodiment, the loss function may be calculated using the predicted scores and the true voltage level. One option for the loss function is cross entropy, but other options exist. In both the programming network and the read network, the gradient of the loss is calculated with respect to all optimizable parameters. For example, a gradient of a classification loss function of the set of information bits and the set of predicted information bits is calculated, wherein the embedding parameters or the network parameters are updated based on the gradient of the classification loss function. The parameters are updated using the gradients so as to minimize the loss. The update may be done using any suitable optimization algorithm, such as stochastic gradient descent, Adam, etc. These steps are repeated until convergence. An additional variation of training is to alternate each step between updating the programming network and updating the read network.

Note that gradients are calculated during training. A gradient can be calculated for any function that is differentiable and has a well-defined mathematical form. The programming network and the read network are such functions and can be differentiated using standard libraries such as TensorFlow and PyTorch. If a real memory device is used in the training loop, however, it cannot be differentiated through, because no mathematical expression for it is available. In that case, an estimator known as the REINFORCE estimator may be used. In one example, the memory model may be updated based on data from additional memory cells.

Alternatively, a memory model may be used. The memory model is a generative model that takes K voltages as input and returns K voltages representing the input voltages corrupted by noise. The generative model may be any differentiable mathematical expression with a random component (such as a parametric Gaussian model or a generative adversarial network). The memory model may be fitted using measurements collected from physical memory devices, such that it models the real memory behavior as closely as possible; in other words, the memory model may simulate the noise distribution of a real memory device. Fitting such a generative model is a known training process. Once fitted, the memory model may be used in the training process described above, where it is differentiable.
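As a minimal example of such a model, a per-cell parametric Gaussian channel might be sketched as follows. The reparameterized noise keeps the model differentiable despite the random component; the class name and the learnable per-cell noise scale are assumptions.

```python
import torch
import torch.nn as nn

class GaussianChannel(nn.Module):
    """Sketch of a parametric Gaussian memory model: takes K programmed
    voltages and returns K noise-corrupted voltages. The per-cell noise
    scale is a learnable parameter that would be fitted to measurements
    from a real device."""

    def __init__(self, k_cells: int):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(k_cells))

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # Reparameterization: output = v + sigma * eps, with eps ~ N(0, 1),
        # so gradients flow through v and sigma but not the random draw.
        eps = torch.randn_like(v)
        return v + self.log_sigma.exp() * eps
```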

When the memory model option is used, the model may be re-fitted during the optimization process. Re-fitting is performed because the memory behavior can change depending on the constellation used. Thus, after multiple steps of updating the programming network, new measurements may be collected from the real memory device using the current constellation, the memory model may be re-fitted using these measurements, and training may continue.

Accordingly, the present disclosure includes the following embodiments.

A method for selecting a programming voltage of a memory device is described. Embodiments of the method may include: the method includes mapping a set of information bits to voltage levels of one or more memory cells based on a plurality of embedding parameters, programming the set of information bits into the one or more memory cells based on the mapping, detecting the voltage levels of the one or more memory cells to generate one or more detected voltage levels, and identifying a set of predicted information bits based on the one or more detected voltage levels using a neural network comprising a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedding parameters.

An apparatus for selecting a programming voltage for a memory device is described. The apparatus may include a processor, a memory in electronic communication with the processor, and instructions stored in the memory. The instructions may be operable to cause the processor to: the method includes mapping a set of information bits to voltage levels of one or more memory cells based on a plurality of embedding parameters, programming the set of information bits into the one or more memory cells based on the mapping, detecting the voltage levels of the one or more memory cells to generate one or more detected voltage levels, and identifying a set of predicted information bits based on the one or more detected voltage levels using a neural network comprising a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedding parameters.

A non-transitory computer-readable medium storing code for selecting a programming voltage for a memory device is described. In some examples, the code includes instructions executable by a processor to: the method includes mapping a set of information bits to voltage levels of one or more memory cells based on a plurality of embedding parameters, programming the set of information bits into the one or more memory cells based on the mapping, detecting the voltage levels of the one or more memory cells to generate one or more detected voltage levels, and identifying a set of predicted information bits based on the one or more detected voltage levels using a neural network comprising a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedding parameters.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: applying a Gray code to the set of information bits, wherein the mapping is based on the Gray code. In some examples, the one or more memory cells include a plurality of memory cells, and the plurality of embedding parameters includes an array having a number of dimensions equal to the number of memory cells.
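For reference, the standard binary-reflected Gray code is a one-line transformation (the helper name is illustrative):

```python
def gray_encode(b: int) -> int:
    # Binary-reflected Gray code: adjacent integers map to codewords
    # that differ in exactly one bit.
    return b ^ (b >> 1)

# Example: [gray_encode(i) for i in range(4)] -> [0, 1, 3, 2]
```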

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: embedding the set of information bits into an embedding space based on the plurality of embedding parameters to produce an embedded symbol. Some examples may also include: applying a sigmoid function to constrain the embedded information symbols to produce constrained symbols. Some examples may also include: scaling the constrained symbols to produce scaled symbols corresponding to voltages within the effective dynamic range, wherein the set of information bits is mapped based on the scaled symbols.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: a set of information bit probabilities is generated based on the detected voltage levels using a neural network. Some examples may also include: selecting a highest information bit probability from the group of information bit probabilities, wherein the group of predicted information bits is identified based on the highest information bit probability.

A method for selecting a programming voltage of a memory device is described. Embodiments of the method may include: initializing a plurality of embedding parameters and a set of network parameters; mapping a set of information bits to voltage levels of one or more memory cells based on the plurality of embedding parameters; identifying a set of predicted information bits using an ANN based on the network parameters; and updating the plurality of embedding parameters and network parameters based at least in part on the set of predicted information bits.

An apparatus for selecting a programming voltage for a memory device is described. The apparatus may include a processor, a memory in electronic communication with the processor, and instructions stored in the memory. The instructions may be operable to cause a processor to: initializing a plurality of embedding parameters and a set of network parameters; mapping a set of information bits to voltage levels of one or more memory cells based on the plurality of embedding parameters; identifying a set of predicted information bits using an ANN based on the network parameters; and updating the plurality of embedding parameters and network parameters based at least in part on the set of predicted information bits.

A non-transitory computer-readable medium storing code for selecting a programming voltage for a memory device is described. In some examples, the code includes instructions executable by a processor to: initializing a plurality of embedding parameters and a set of network parameters; mapping a set of information bits to voltage levels of one or more memory cells based on the plurality of embedding parameters; identifying a set of predicted information bits using an ANN based on the network parameters; and updating the plurality of embedding parameters and network parameters based at least in part on the set of predicted information bits.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: updating the network parameters based on the plurality of embedding parameters to produce updated network parameters. Some examples further include: updating the plurality of embedding parameters based on the updated network parameters to produce updated embedding parameters.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: performing a plurality of training iterations, wherein the plurality of embedding parameters and network parameters are updated during each training iteration.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: calculating a gradient of a classification loss function of the set of information bits and the set of predicted information bits, wherein the plurality of embedding parameters or network parameters are updated based on the gradient of the classification loss function.

In some examples, the gradient includes an approximation of a physical NAND channel. For example, the gradient may include a mathematical representation based on measurements obtained from one or more physical NAND channels. Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: identifying a mathematical model of the one or more memory cells, wherein the gradient of the classification loss function is calculated based on the mathematical model. Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: updating the mathematical model based on data from additional memory cells.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: programming the set of information bits into the one or more memory cells based on a mapping. Some examples may also include: detecting a voltage level of the one or more memory cells to generate one or more detected voltage levels, wherein the set of predicted information bits is identified based on the one or more detected voltage levels.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: generating a set of information bit probabilities based on the detected voltage levels using a neural network, wherein the set of predicted information bits is identified based on a highest information bit probability. In some examples, the one or more memory cells include a plurality of memory cells, and the plurality of embedding parameters include an array having a number of dimensions equal to the number of memory cells.

Some examples of the above methods, apparatus, non-transitory computer-readable media, and systems may further include: embedding the set of information bits into an embedding space based on the plurality of embedding parameters to produce an embedded symbol. Some examples may also include: applying a sigmoid function to constrain the embedded information symbols to produce constrained symbols. Some examples may also include: scaling the constrained symbols to produce scaled symbols, wherein the set of information bits is mapped based on the scaled symbols.

An apparatus for selecting a programming voltage for a memory device is described. Embodiments of the apparatus may include a plurality of memory cells; a programming component comprising an embedding layer based on a plurality of embedding parameters; and a reading component comprising a neural network based on a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedded parameters.

A method of manufacturing an apparatus for selecting a programming voltage of a memory device is described. The method may include providing a plurality of memory cells; providing a programming component comprising an embedding layer based on a plurality of embedding parameters; and providing a read component comprising a neural network based on a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedded parameters.

A method of using an apparatus for selecting a programming voltage of a memory device is described. The method may comprise: using a plurality of memory cells; using a programming component comprising an embedding layer based on a plurality of embedding parameters; and using a read component comprising a neural network based on a plurality of network parameters, wherein the plurality of network parameters are trained with the plurality of embedded parameters.

In some examples, the programming component further includes a sigmoid layer and a scaling layer. In some examples, the neural network includes a probability-based classifier. In some examples, the plurality of memory cells includes NAND memory cells.

Thus, the present disclosure may provide for automatic selection of programming voltages and may be repeatedly invoked for each new memory device version or generation, thereby quickly generating constellations (when compared to manual labor). Embodiments of the present disclosure provide the ability to find constellations faster than pre-made solutions and better than manual trial-and-error or heuristics, based on an optimization process (training process). Furthermore, embodiments of the present disclosure use real data collected from the memory device, providing a constellation that is tailored to a particular problem when compared to conventional programming constellations.

The description and drawings described herein represent example configurations and do not represent all embodiments within the scope of the claims. For example, operations and steps may be rearranged, combined, or otherwise modified. Additionally, structures and devices may be shown in block diagram form in order to represent relationships between components and to avoid obscuring the described concepts. Similar components or features may have the same name, but may have different reference numerals corresponding to different figures.

Some modifications to the disclosure may be apparent to those skilled in the art, and the principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The described methods may be implemented or performed with devices comprising a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, a conventional processor, a controller, a microcontroller, or a state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration). Thus, the functions described herein may be implemented in hardware or software and may be performed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on a computer-readable medium in the form of instructions or code.

Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of code or data. The non-transitory storage medium may be any readable medium that can be accessed by a computer. For example, a non-transitory computer-readable medium may include: random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Compact Disc (CD) or other optical disk storage, magnetic disk storage, or any other non-transitory medium for carrying or storing data or code.

Further, any connection may be properly termed a computer-readable medium. For example, if the code or data is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technology such as infrared, radio, or microwave signals, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology is included in the definition of medium. Combinations of the media are also included within the scope of computer-readable media.

In the present disclosure and claims, the word "or" indicates an inclusive list such that a list of, for example, X, Y or Z represents X, or Y, or Z, or XY, or XZ, or YZ, or XYZ. Further, the phrase "based on" is not used to denote a closed set of conditions. For example, a step described as "based on condition a" may be based on both condition a and condition B. In other words, the phrase "based on" should be interpreted to mean "based at least in part on. Furthermore, the words "a" or "an" indicate "at least one".
