Maintaining consistent write latency in non-volatile memory devices

Document No.: 1815413   Publication date: 2021-11-09

Reading note: This technology, "Maintaining consistent write latency in non-volatile memory devices," was designed and created by 哈曼·巴蒂亚 and 张帆 on 2021-03-09. Abstract: This disclosure describes devices, systems, and methods for maintaining consistent write latency in non-volatile memory devices. An example method includes: receiving a write command from a host device; calculating an actual latency of the write command based on the arrival of the write command and the completion of the write command; incrementing one or more of a plurality of counters based on the actual latency; updating the value of a minimum duration based on the plurality of counters after the incrementing; and transmitting an indication of completion of the write command to the host device at a time instance determined based on the updated value of the minimum duration, wherein the minimum duration represents a minimum delay between the arrival and the transmission, and wherein transmitting at the time instance enables the observed latency to remain within a predetermined tolerance of an average of the actual latencies.

1. A method of maintaining consistent write latency in a non-volatile memory device, comprising:

receiving a write command from a host device;

calculating an actual latency of the write command based on arrival of the write command and completion of the write command;

incrementing one or more of a plurality of counters based on the actual latency;

updating a value of a minimum duration based on the plurality of counters after the incrementing; and

transmitting, to the host device, an indication of write command completion at a time instance determined based on the updated value of the minimum duration,

wherein the minimum duration represents a minimum delay between arrival of the write command and transmission of the indication to the host device,

wherein transmitting at said time instance enables the observed latency to remain within a predetermined tolerance of an average of the actual latencies,

wherein the observed latency is determined based on a time difference between arrival of the write command and the transmission,

wherein a first counter of the plurality of counters indicates a total number of write commands and is incremented upon receipt of each write command, and

wherein each of a second and subsequent counters of the plurality of counters corresponds to a type of delay violation and is incremented upon determining that the actual latency exceeds a respective threshold of a plurality of thresholds.

2. The method of claim 1, wherein updating the value of the minimum duration is further based on a steady state value of the minimum duration.

3. The method of claim 1, wherein updating the value of the minimum duration is further based on an occurrence of a Flash Translation Layer (FTL) event.

4. The method of claim 1, wherein the non-volatile memory device is a NAND flash memory device.

5. The method of claim 1, wherein the plurality of counters comprises three counters, wherein a first threshold of the plurality of thresholds is between 110% and 125% of the value of the minimum duration, and wherein a second threshold of the plurality of thresholds is twice the value of the minimum duration.

6. The method of claim 1, wherein the plurality of counters comprises three counters, wherein a first threshold of the plurality of thresholds exceeds a value of the minimum duration by 5 microseconds, and wherein a second threshold of the plurality of thresholds is twice the value of the minimum duration.

7. The method of claim 1, wherein an initial value of the minimum duration is predetermined.

8. The method of claim 1, further comprising:

updating a first register based on the plurality of counters, the first register storing a state of the method corresponding to a duration or number of commands since a most recent delay violation, wherein the state is an integer in a range from 0 to N, wherein state 0 corresponds to a most recent delay violation that has occurred within a predetermined amount of time, and wherein state N corresponds to the method operating in a steady state.

9. The method of claim 8, further comprising:

updating a second register based on the plurality of counters, the second register storing a value of the minimum duration corresponding to a most recent time that the method was operating in a steady state.

10. A system for maintaining consistent write latency in a non-volatile memory device, comprising:

a processor and a memory, the memory including instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to:

receive a write command from a host device;

calculate an actual latency of the write command based on arrival of the write command and completion of the write command;

increment one or more of a plurality of counters based on the actual latency;

update a value of a minimum duration based on the plurality of counters after the incrementing; and

transmit, to the host device, an indication of write command completion at a time instance determined based on the updated value of the minimum duration,

wherein the minimum duration represents a minimum delay between arrival of the write command and transmission of the indication to the host device,

wherein transmitting at said time instance enables the observed latency to remain within a predetermined tolerance of an average of the actual latencies,

wherein the observed latency is determined based on a time difference between arrival of the write command and the transmission,

wherein a first counter of the plurality of counters indicates a total number of write commands and is incremented upon receipt of each write command, and

wherein each of a second and subsequent counters of the plurality of counters corresponds to a type of delay violation and is incremented upon determining that the actual latency exceeds a respective threshold of a plurality of thresholds.

11. The system of claim 10, wherein updating the value of the minimum duration is further based on a steady state value of the minimum duration.

12. The system of claim 10, wherein updating the value of the minimum duration is further based on an occurrence of a Flash Translation Layer (FTL) event.

13. The system of claim 10, wherein the plurality of counters comprises three counters, wherein a first threshold of the plurality of thresholds is between 110% and 125% of the value of the minimum duration, and wherein a second threshold of the plurality of thresholds is twice the value of the minimum duration.

14. The system of claim 10, wherein the instructions further cause the processor to:

update a first register based on the plurality of counters, the first register storing a state corresponding to a duration or number of commands since a most recent delay violation, wherein the state is an integer in a range from 0 to N, wherein state 0 corresponds to a most recent delay violation that has occurred within a predetermined amount of time, and wherein state N corresponds to the system operating in a steady state.

15. The system of claim 14, wherein the instructions further cause the processor to:

update a second register based on the plurality of counters, the second register storing a value of the minimum duration corresponding to a most recent time that the system was operating in a steady state.

16. A non-transitory computer-readable storage medium having instructions stored thereon to maintain consistent write latency in a non-volatile memory device, the non-transitory computer-readable storage medium comprising:

instructions to receive a write command from a host device;

instructions to calculate an actual latency of the write command based on arrival of the write command and completion of the write command;

instructions to increment one or more of a plurality of counters based on the actual latency;

instructions to update a value of a minimum duration based on the plurality of counters after the incrementing; and

instructions to transmit, at a time instance determined based on the updated value of the minimum duration, an indication of write command completion to the host device,

wherein the minimum duration represents a minimum delay between arrival of the write command and transmission of the indication to the host device,

wherein transmitting at said time instance enables the observed latency to remain within a predetermined tolerance of an average of the actual latencies,

wherein the observed latency is determined based on a time difference between arrival of the write command and the transmission,

wherein a first counter of the plurality of counters indicates a total number of write commands and is incremented upon receipt of each write command, and

wherein each of a second and subsequent counters of the plurality of counters corresponds to a type of delay violation and is incremented upon determining that the actual latency exceeds a respective threshold of a plurality of thresholds.

17. The storage medium of claim 16, wherein the value of the minimum duration is updated further based on a steady state value of the minimum duration.

18. The storage medium of claim 16, wherein the value of the minimum duration is updated further based on an occurrence of a Flash Translation Layer (FTL) event.

19. The storage medium of claim 16, wherein the non-volatile memory device is a NAND flash memory device.

20. The storage medium of claim 16, wherein the plurality of counters comprises three counters, wherein a first threshold of the plurality of thresholds is between 110% and 125% of the value of the minimum duration, and wherein a second threshold of the plurality of thresholds is twice the value of the minimum duration.

Technical Field

This patent document relates generally to memory devices and, more particularly, to improving the performance of memory devices.

Background

Solid State Drives (SSDs) use multi-level NAND flash devices for persistent storage. However, a multi-level NAND flash memory device must be erased before new data can be rewritten, which may result in long latencies. Personal and enterprise applications require consistent write command latency.

Disclosure of Invention

Embodiments of the disclosed technology relate to maintaining consistent write latency in non-volatile memory devices. These and other features and advantages may be realized, at least in part, by measuring the latency of each host write command and tracking a minimum duration value that is updated based on counters that track delay violations.

In an example aspect, a method for maintaining consistent write latency in a non-volatile memory device is described. The method includes: receiving a write command from a host device; calculating an actual latency of the write command based on the arrival of the write command and the completion of the write command; incrementing one or more of a plurality of counters based on the actual latency; updating the value of a minimum duration based on the plurality of counters after the incrementing; and transmitting an indication of write command completion to the host device at a time instance determined based on the updated value of the minimum duration. The minimum duration represents a minimum delay between the arrival of the write command and the transmission of the indication to the host device, and transmitting at that time instance enables the observed latency to remain within a predetermined tolerance of an average of the actual latencies, where the observed latency is determined based on the time difference between the arrival of the write command and the transmission. A first counter of the plurality of counters indicates the total number of write commands and is incremented upon receipt of each write command, and each of the second and subsequent counters corresponds to a type of delay violation and is incremented upon determining that the actual latency exceeds a respective threshold of a plurality of thresholds.

In another example aspect, the above-described method may be implemented by a system comprising a processor and a memory storing instructions executable by the processor.

In yet another example aspect, the methods may be implemented in the form of processor executable instructions and stored on a computer readable program medium.

The subject matter described in this patent document can be implemented in a specific manner to provide one or more of the following features.

Drawings

FIG. 1 illustrates an example of a memory system.

FIG. 2 is a diagram of an example non-volatile memory device.

FIG. 3 is a graph showing an example of the cell voltage level distribution (Vth) of a non-volatile memory device.

FIG. 4 is a graph showing another example of the cell voltage level distribution (Vth) of a non-volatile memory device.

FIG. 5 is a graph showing an example of the cell voltage level distributions (Vth) of a non-volatile memory device before and after program disturb.

FIG. 6 is an example graph showing the variation of the cell voltage level distribution (Vth) of a non-volatile memory device with respect to a reference voltage.

Fig. 7A illustrates an example Finite State Machine (FSM) that may be used to adaptively determine a minimum duration.

FIG. 7B is a table with example thresholds corresponding to the FSM in FIG. 7A.

FIG. 8 illustrates a flow chart of an example method for adaptively determining latency in a non-volatile memory device.

FIG. 9A is an example numerical comparison of write delay profiles.

FIG. 9B is another example numerical comparison of write delay profiles.

FIG. 9C is yet another example numerical comparison of write delay profiles.

FIG. 10 illustrates a flow chart of an example method for maintaining consistent write latency in a non-volatile memory device.

Detailed Description

Solid State Drives (SSDs) use NAND flash memory as a storage medium because NAND flash memory has superior read latency compared to the magnetic media of hard disk drives. However, NAND flash based media require that a large block be erased before any of its pages can be rewritten. As a result, the SSD must periodically garbage collect old blocks that contain many invalid pages to create space for new host writes. In addition, high-density NAND flash memories in which each cell stores multiple bits, such as Triple-Level Cell (TLC) flash memory with 8 threshold voltage levels and Quad-Level Cell (QLC) flash memory with 16 levels, have long program times.

Fig. 1-6 outline non-volatile memory systems (e.g., flash-based memory, NAND flash) in which embodiments of the disclosed technology can be implemented.

FIG. 1 is a block diagram of an example of a memory system 100 implemented based on some embodiments of the disclosed technology. Memory system 100 includes a memory module 110 that may be used to store information for use by other electronic devices or systems. The memory system 100 may be incorporated (e.g., located on a circuit board) into other electronic devices and systems. Alternatively, the memory system 100 may be implemented as an external storage device such as a USB flash drive or a Solid State Drive (SSD).

The memory modules 110 included in the memory system 100 may include memory regions (e.g., memory arrays) 102, 104, 106, and 108. Each of memory regions 102, 104, 106, and 108 may be included in a single memory die or in multiple memory dies. The memory die may be included in an Integrated Circuit (IC) chip.

Each of the memory regions 102, 104, 106, and 108 includes a plurality of memory cells. A read operation, a program operation, or an erase operation may be performed on a memory unit basis. Thus, each memory unit may include a predetermined number of memory cells. The memory cells in memory region 102, 104, 106, or 108 may be included in a single memory die or in multiple memory dies.

The memory cells in each of the memory regions 102, 104, 106, and 108 may be arranged in rows and columns in a memory unit. Each of the memory units may be a physical unit. For example, a group of multiple memory cells may form a memory unit. Each of the memory units may also be a logical unit. For example, a memory unit may be a block or a page that can be identified by a unique address, such as a block address or a page address, respectively. As another example, memory regions 102, 104, 106, and 108 may comprise computer memory that includes memory banks as logical units of data storage, each of which can be identified by a bank address. During a read or write operation, the unique address associated with a particular memory unit may be used to access that memory unit. Based on the unique address, information may be written to or retrieved from one or more memory cells in that memory unit.

The memory cells in memory regions 102, 104, 106, and 108 may include non-volatile memory cells. Examples of non-volatile memory cells include flash memory cells, phase change random access memory (PRAM) cells, Magnetoresistive Random Access Memory (MRAM) cells, or other types of non-volatile memory cells. In example embodiments where the memory cells are configured as NAND flash memory cells, read or write operations may be performed on a page basis. However, the erase operation in the NAND flash memory is performed on a block basis.

Each of the non-volatile memory cells may be configured as a single-level cell (SLC) or a multi-level memory cell. A single-level cell can store one bit of information per cell. A multi-level memory cell can store more than one bit of information per cell. For example, each of the memory cells in memory regions 102, 104, 106, and 108 may be configured as a multi-level cell (MLC) that stores two bits of information per cell, a triple-level cell (TLC) that stores three bits of information per cell, or a quad-level cell (QLC) that stores four bits of information per cell. In another example, each of the memory cells in memory regions 102, 104, 106, and 108 may be configured to store at least one bit of information (e.g., one bit of information or multiple bits of information), and each of the memory cells in memory regions 102, 104, 106, and 108 may be configured to store more than one bit of information.

As shown in FIG. 1, the memory system 100 includes a controller module 120. The controller module 120 includes: a memory interface 121 in communication with the memory module 110; a host interface 126 to communicate with a host (not shown); a processor 124 to run firmware level code; and a cache 123 and memory 122, which temporarily or permanently store executable firmware/instructions and associated information, respectively. In some embodiments, the controller module 120 may include an error correction engine 125 to perform error correction operations on information stored in the memory module 110. Error correction engine 125 may be configured to detect/correct single bit errors or multiple bit errors. In another embodiment, error correction engine 125 may be located in memory module 110.

The host may be a device or system that includes one or more processors that operate to retrieve data from the memory system 100 or store or write data to the memory system 100. In some embodiments, examples of a host may include a Personal Computer (PC), a portable digital device, a digital camera, a digital multimedia player, a television, and a wireless communication device.

In some embodiments, the controller module 120 may also include a host interface 126 to communicate with a host. Host interface 126 can include components that conform to at least one host interface specification, including, but not limited to, Serial Advanced Technology Attachment (SATA), Serial Attached SCSI (SAS), and Peripheral Component Interconnect Express (PCIe).

FIG. 2 illustrates an example of an array of memory cells implemented based on some embodiments of the disclosed technology.

In some embodiments, the memory cell array may include a NAND flash memory array divided into a number of blocks, each block containing a number of pages. Each block includes a plurality of memory cell strings, and each memory cell string includes a plurality of memory cells.

In some implementations in which the array of memory cells is a NAND flash memory array, read and write (program) operations are performed on a page basis, and erase operations are performed on a block basis. Before a program operation is performed on any page included in a block, all memory cells within the same block must be erased simultaneously. In an embodiment, the NAND flash memory may use an even/odd bit line structure. In another embodiment, the NAND flash memory may use an all bit line architecture. In an even/odd bit line architecture, the even and odd bit lines are interleaved along each word line and are accessed alternately so that each pair of even and odd bit lines can share peripheral circuitry such as a page buffer. In an all bit line architecture, all bit lines are accessed simultaneously.

FIG. 3 shows an example of a threshold voltage distribution curve in a multi-level cell device, where the number of cells per program/erase state is plotted as a function of threshold voltage. As shown therein, the threshold voltage distribution curves include an erased state (denoted as "ER" and corresponding to "11") having the lowest threshold voltage and three programmed states (denoted as "P1", "P2", and "P3" corresponding to "01", "00", and "10", respectively) having read voltages between the states (denoted by dashed lines). In some embodiments, each of the threshold voltage distributions for the program/erase state has a finite width due to differences in material properties between memory arrays.
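As a quick check of the FIG. 3 mapping, the short Python sketch below (function and variable names are illustrative, not from the source) computes how many bits change between neighboring states. Every neighboring pair differs in exactly one bit, i.e., the assignment is a Gray code. A commonly cited benefit of such an assignment, stated here as general background rather than as a claim of this document, is that a cell misread as an adjacent state corrupts only one of its two bits.

```python
from typing import List

# States in order of increasing threshold voltage, with the bit pairs shown in FIG. 3.
FIG3_STATES = ["ER", "P1", "P2", "P3"]
FIG3_BITS = ["11", "01", "00", "10"]


def hamming(a: str, b: str) -> int:
    """Count the bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbor_bit_flips(bits: List[str]) -> List[int]:
    """Number of bits that change between each pair of neighboring states."""
    return [hamming(bits[i], bits[i + 1]) for i in range(len(bits) - 1)]


if __name__ == "__main__":
    for i, flips in enumerate(neighbor_bit_flips(FIG3_BITS)):
        print(f"{FIG3_STATES[i]} ({FIG3_BITS[i]}) -> "
              f"{FIG3_STATES[i + 1]} ({FIG3_BITS[i + 1]}): {flips} bit(s) differ")
```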

Although FIG. 3 illustrates a multi-level cell device by way of example, each of the memory cells may be configured to store any number of bits per cell. In some embodiments, each of the memory cells may be configured as a single-level cell (SLC) that stores one bit of information per cell, a triple-level cell (TLC) that stores three bits of information per cell, or a quad-level cell (QLC) that stores four bits of information per cell.

When more than one bit of data is written to a memory cell, the threshold voltage levels of the memory cells must be placed precisely because the distance between adjacent distributions is reduced. This is achieved by using Incremental Step Pulse Programming (ISPP), in which memory cells on the same word line are repeatedly programmed using a program-and-verify method that applies a staircase program voltage to the word line. Each program state is associated with a verify voltage that is used in the verify operation and sets the target position of each threshold voltage distribution window.
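The program-and-verify loop of ISPP can be sketched as follows. The cell model is a deliberately crude assumption (each pulse moves the threshold voltage up to roughly the program voltage minus a fixed offset, and programming never lowers it), and the class, function, and voltage values are hypothetical. The point is the control flow: apply a pulse, verify against the target, raise the staircase voltage by one step, and repeat.

```python
from dataclasses import dataclass


@dataclass
class ToyCell:
    vth: float            # current threshold voltage in volts
    offset: float = 12.0  # hypothetical: a pulse leaves Vth near (Vpgm - offset)

    def apply_pulse(self, vpgm: float) -> None:
        # Programming only adds charge, so Vth can rise but never fall here.
        self.vth = max(self.vth, vpgm - self.offset)


def ispp_program(cell: ToyCell, verify_level: float, v_start: float,
                 step: float, max_pulses: int = 32) -> int:
    """Program-and-verify loop with a staircase program voltage.

    Returns the number of pulses applied; raises if the cell never verifies.
    """
    vpgm = v_start
    for pulse in range(1, max_pulses + 1):
        cell.apply_pulse(vpgm)
        if cell.vth >= verify_level:  # verify: target window reached, stop
            return pulse
        vpgm += step                  # staircase: next pulse is one step higher
    raise RuntimeError("program failure: verify level not reached")


if __name__ == "__main__":
    cell = ToyCell(vth=-2.0)  # an erased cell
    pulses = ispp_program(cell, verify_level=1.0, v_start=12.0, step=0.5)
    print(f"verified after {pulses} pulses, Vth = {cell.vth:.2f} V")
```

For a cell that starts below the verify level, the final threshold voltage in this model overshoots the verify level by less than one step, which illustrates why a smaller step gives tighter distributions at the cost of more pulses.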

Distortion or overlap of the threshold voltage distributions may cause read errors. The ideal memory cell threshold voltage distributions may be severely distorted or overlapped due to, for example, program and erase (P/E) cycles, intercell interference, and data retention errors (as discussed below). In most cases, these read errors can be handled by using Error Correction Codes (ECC).

Fig. 4 shows an example of an ideal threshold voltage distribution curve 410 and an example of a distorted threshold voltage distribution curve 420. The vertical axis represents the number of memory cells having a particular threshold voltage represented on the horizontal axis.

For an n-bit multi-level cell NAND flash memory, the threshold voltage of each cell can be programmed to 2^n possible values. In an ideal multi-level cell NAND flash memory, each value corresponds to a non-overlapping threshold voltage window.
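The 2^n relationship can be tabulated directly. This small sketch (names are illustrative) also counts the read reference voltages, assuming one read voltage between each pair of adjacent windows as drawn in FIG. 3; for MLC it gives 4 windows and 3 read voltages, matching the ER, P1, P2, and P3 states shown there.

```python
# Bits stored per cell for the cell types named in this document.
CELL_BITS = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def threshold_levels(bits_per_cell: int) -> int:
    """An n-bit cell must be programmable to 2**n distinct threshold-voltage windows."""
    return 2 ** bits_per_cell


def read_voltages(bits_per_cell: int) -> int:
    """One read reference voltage separates each pair of adjacent windows."""
    return threshold_levels(bits_per_cell) - 1


if __name__ == "__main__":
    for name, n in CELL_BITS.items():
        print(f"{name}: {n} bit(s)/cell, {threshold_levels(n)} levels, "
              f"{read_voltages(n)} read voltages")
```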

Flash memory P/E cycling damages the tunnel oxide of the floating gate or charge trapping layer of the cell transistor, which shifts the threshold voltage and thus gradually reduces the noise margin of the memory device. As P/E cycles accumulate, the margin between adjacent distributions for different program states decreases, and eventually the distributions begin to overlap. Data bits stored in memory cells whose threshold voltages fall into the overlapping range of adjacent distributions may be misread as values other than the original target value.

Fig. 5 shows an example of intercell interference in the NAND flash memory. Intercell interference may also cause threshold voltage distortion of the flash memory cells. A shift in the threshold voltage of one memory cell transistor may affect the threshold voltage of its neighboring memory cell transistors through parasitic capacitive coupling effects between the aggressor cell and the victim cell. The amount of intercell interference may be affected by the NAND flash bit line structure. In the even/odd bit line architecture, memory cells on one word line are alternately connected to even and odd bit lines, and in the same word line, even cells are programmed before odd cells. Thus, even cells and odd cells experience different amounts of intercell interference. The cells in the all-bit line architecture experience less intercell interference than the even cells in the even/odd bit line architecture, and the all-bit line architecture can efficiently support high-speed current sensing to improve memory read and verify speeds.

The dashed lines in FIG. 5 represent the nominal distribution of the P/E states (before program disturb) of the cell under consideration, and the "neighbor state values" represent the values to which the neighbor states have been programmed. As shown in FIG. 5, when the adjacent state is programmed to P1, the threshold voltage distribution of the cell under consideration is shifted by a certain amount. However, when the adjacent state is programmed to P2, which has a higher threshold voltage than P1, this results in a larger shift than when the adjacent state is P1. Similarly, when the neighboring state is programmed to P3, the shift in the threshold voltage distribution is greatest.

Fig. 6 shows an example of retention errors in a NAND flash memory by comparing a normal threshold voltage distribution and a shifted threshold voltage distribution. Data stored in NAND flash memory degrades over time; this is called a data retention error. Retention errors are caused by the loss of charge stored in the floating gate or charge trapping layer of the cell transistor. Memory cells that have undergone more program-erase cycles are more likely to experience retention errors due to depletion of the floating gate or charge trapping layer. In the example of Fig. 6, comparing the distributions in the top row (before degradation) with those in the bottom row (affected by retention errors) shows a shift to the left.

NAND flash memory devices (e.g., as described in FIGS. 1-6) suffer from long erase and program latencies, so SSDs may use volatile DRAM to cache writes and send write command completions to the host as soon as the data is stored in the DRAM cache. This reduces the host write latency for most writes to the DRAM write latency (typically a few microseconds). However, when the DRAM cache is full, a host write command cannot be completed until some data is moved from the DRAM cache to a NAND page. These write commands experience a latency equal to that of the NAND program operation (typically hundreds of microseconds). In the worst case, no NAND pages are available and the SSD must perform a garbage collection cycle before more host writes can be serviced. This results in a host write command latency equal to the latency of the erase operation (typically a few milliseconds).
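The three latency regimes described above can be illustrated with a toy model of a write burst. The latency values, the resource counts, and the simplifications (no background flushing, one NAND page consumed per flushed entry, a fixed number of pages reclaimed per garbage-collection cycle) are all assumptions made for illustration, and the names are hypothetical.

```python
from enum import Enum
from typing import List


class WritePath(Enum):
    DRAM_CACHE = "completed from the DRAM cache"
    NAND_PROGRAM = "waited for a NAND program"
    GARBAGE_COLLECT = "waited for garbage collection (erase)"


# Hypothetical latencies in microseconds, chosen only to match the orders of
# magnitude in the text: a few us, hundreds of us, and a few ms.
LATENCY_US = {
    WritePath.DRAM_CACHE: 5,
    WritePath.NAND_PROGRAM: 500,
    WritePath.GARBAGE_COLLECT: 5_000,
}


def classify_write(cache_free_slots: int, free_nand_pages: int) -> WritePath:
    """Pick the path a new host write takes given the current resources."""
    if cache_free_slots > 0:
        return WritePath.DRAM_CACHE
    if free_nand_pages > 0:
        return WritePath.NAND_PROGRAM
    return WritePath.GARBAGE_COLLECT


def burst_latencies(burst_len: int, cache_slots: int, nand_pages: int,
                    pages_per_gc: int = 4) -> List[int]:
    """Latency of each write in a back-to-back burst (toy model, see assumptions)."""
    latencies = []
    for _ in range(burst_len):
        path = classify_write(cache_slots, nand_pages)
        if path is WritePath.DRAM_CACHE:
            cache_slots -= 1                 # the write lands in a free cache slot
        elif path is WritePath.NAND_PROGRAM:
            nand_pages -= 1                  # flush one entry; the new write takes its slot
        else:
            nand_pages += pages_per_gc - 1   # reclaim pages, then flush into one of them
        latencies.append(LATENCY_US[path])
    return latencies


if __name__ == "__main__":
    print(burst_latencies(6, cache_slots=2, nand_pages=2))
```

With a two-entry cache and two free pages, a six-write burst sees latencies of 5, 5, 500, 500, 5000, and 500 microseconds in this model, the kind of spread that motivates keeping write latency consistent.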

This problem may be exacerbated when the host workload contains bursts of write commands. From a host perspective, SSDs with consistent write command latency are desirable, especially for enterprise applications in which writes are spread across multiple drives in a RAID scheme and a long latency on any one drive may cause the application to lag.

Previous techniques have addressed this problem by adding a constant delay to all write command completions, effectively extending the maximum latency to all commands. However, this approach introduces unnecessary delay when the percentage of write traffic is small or when the write cache is empty.

According to some embodiments of the disclosed technology, the SSD controller hardware may measure the latency of each host write command at the host interface and delay the completion of the write command when the measured latency is below a certain threshold. This threshold, which represents the minimum duration after which the write command completion message is sent, is stored in the MinDuration register. Using a minimum duration ensures that the latency of write commands can be kept within a predetermined range (e.g., a small variation around the average), thereby maintaining consistent write command latency. On the other hand, when the measured latency exceeds the MinDuration threshold, the hardware updates one or more counters configured to track delay violations. In some embodiments, when a timer expires or a Flash Translation Layer (FTL) event is encountered, the firmware adjusts the MinDuration value based on the values of the one or more counters. It should be noted that in the following description, specific register names and parameters are used by way of example and not limitation to facilitate understanding of the disclosed techniques.
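The per-command rule can be expressed compactly: the completion is sent at the later of the actual completion time and the arrival time plus MinDuration, and a violation is flagged when the actual latency exceeds MinDuration. The Python sketch below is a minimal illustration assuming integer microsecond timestamps; the type and function names are hypothetical and do not describe the controller's actual interface.

```python
from dataclasses import dataclass


@dataclass
class CompletionDecision:
    send_at_us: int   # when the completion message is sent to the host
    observed_us: int  # latency seen by the host: send time minus arrival time
    violation: bool   # actual latency exceeded MinDuration, so counters must be updated


def schedule_completion(arrival_us: int, done_us: int,
                        min_duration_us: int) -> CompletionDecision:
    actual_us = done_us - arrival_us
    # Hold early completions until MinDuration has elapsed; late ones go out at once.
    send_at_us = max(done_us, arrival_us + min_duration_us)
    return CompletionDecision(send_at_us, send_at_us - arrival_us,
                              actual_us > min_duration_us)


if __name__ == "__main__":
    print(schedule_completion(arrival_us=100, done_us=105, min_duration_us=50))  # held
    print(schedule_completion(arrival_us=100, done_us=400, min_duration_us=50))  # violation
```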

In some embodiments, the SSD controller improves write QoS (quality of service) by adding an appropriate delay to the command completion when a host write is completed by placing its data in the DRAM cache. To determine which commands require added latency, the SSD controller has dedicated hardware that maintains a timer and tracks the arrival time of each write command while it is pending. For such commands, a delay is added to the write command completion so that the observed latency equals the current value of MinDuration.
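One way to model the dedicated tracking hardware in software is a table of arrival times keyed by command tag plus a min-heap of held completions ordered by release time, drained on each timer tick. This is a sketch under assumed names and data structures (command tags, microsecond ticks, a heap); the source only states that the hardware keeps a timer and tracks the arrival time of each pending write.

```python
import heapq
from typing import Dict, List, Tuple


class PendingWriteTracker:
    """Hypothetical software model of the completion-delay hardware."""

    def __init__(self, min_duration_us: int) -> None:
        self.min_duration_us = min_duration_us
        self.arrival_us: Dict[int, int] = {}   # command tag -> arrival timestamp
        self.held: List[Tuple[int, int]] = []  # min-heap of (release time, tag)

    def on_arrival(self, tag: int, now_us: int) -> None:
        self.arrival_us[tag] = now_us

    def on_internal_done(self, tag: int, now_us: int) -> None:
        """The write finished internally (e.g. its data is in the DRAM cache)."""
        release_us = max(now_us, self.arrival_us[tag] + self.min_duration_us)
        heapq.heappush(self.held, (release_us, tag))

    def tick(self, now_us: int) -> List[Tuple[int, int]]:
        """Send every completion that is due; returns (tag, observed latency) pairs."""
        released = []
        while self.held and self.held[0][0] <= now_us:
            _, tag = heapq.heappop(self.held)
            released.append((tag, now_us - self.arrival_us.pop(tag)))
        return released


if __name__ == "__main__":
    tracker = PendingWriteTracker(min_duration_us=50)
    tracker.on_arrival(tag=1, now_us=0)
    tracker.on_internal_done(tag=1, now_us=5)
    print(tracker.tick(now_us=50))
```

The observed latency here depends on the tick granularity: a completion is sent at the first tick at or after its release time, so a coarse timer can make it slightly late.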

In some embodiments, MinDuration may be configured to a predetermined value such that, during sustained writes, the rate at which write completions are sent matches the rate at which data is moved from the cache to NAND pages.
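The source does not give a formula for the predetermined MinDuration. One hedged way to derive it, assuming a closed-loop host that keeps Q writes of S bytes outstanding and a sustained cache-to-NAND drain bandwidth B, is Little's law: if every write is held for MinDuration, completions are sent at roughly Q / MinDuration per second, and matching this to the drain rate B / S gives MinDuration ≈ Q × S / B. The function name and the example numbers are illustrative assumptions.

```python
def rate_matched_min_duration_us(queue_depth: int, cmd_bytes: int,
                                 drain_bytes_per_s: float) -> float:
    """MinDuration (us) that makes the completion rate equal the cache drain rate.

    Assumes a closed-loop host keeping `queue_depth` writes outstanding, every
    write held for exactly MinDuration, so by Little's law the completion rate
    is queue_depth / MinDuration commands per second.
    """
    drain_cmds_per_s = drain_bytes_per_s / cmd_bytes
    return queue_depth / drain_cmds_per_s * 1e6


if __name__ == "__main__":
    # Hypothetical numbers: QD 32, 4 KiB writes, 1 GB/s sustained flush to NAND.
    print(f"{rate_matched_min_duration_us(32, 4096, 1e9):.3f} us")
```

With these hypothetical numbers (queue depth 32, 4 KiB writes, 1 GB/s drain), the fixed MinDuration comes out at about 131 microseconds.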

However, a fixed value may lead to unnecessary write performance degradation; therefore, an adaptive scheme applies an appropriate delay to write completions only when necessary. In these embodiments, in addition to the timer and the latency measurement block, the SSD controller hardware maintains the following counters:

1) TotalWritesCount: this register holds a count of the total number of host writes that have completed since the last reset.

2) X1 Count: this register holds a count of the number of host writes delayed by more than MinDuration + 5 microseconds.

3) X2 Count: this register holds a count of the number of host writes delayed by more than 2 x MinDuration.
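The counter updates performed for each completed host write can be sketched as follows. The field names are Python renderings of TotalWritesCount, X1 Count, and X2 Count, and the 5 microsecond margin is taken from the text. Testing each threshold independently (so a very slow write increments both violation counters) follows the claim wording that each counter is incremented when its respective threshold is exceeded; the class itself is an illustrative assumption.

```python
from dataclasses import dataclass

X1_MARGIN_US = 5  # X1 threshold is MinDuration + 5 microseconds (from the text)


@dataclass
class ViolationCounters:
    total_writes: int = 0  # TotalWritesCount
    x1_count: int = 0      # X1 Count: latency > MinDuration + 5 us
    x2_count: int = 0      # X2 Count: latency > 2 x MinDuration

    def record(self, latency_us: float, min_duration_us: float) -> None:
        """Update the counters for one completed host write."""
        self.total_writes += 1
        # Each violation counter checks its own threshold independently.
        if latency_us > min_duration_us + X1_MARGIN_US:
            self.x1_count += 1
        if latency_us > 2 * min_duration_us:
            self.x2_count += 1

    def reset(self) -> None:
        """Clear all counters, e.g. after firmware has read them."""
        self.total_writes = self.x1_count = self.x2_count = 0


if __name__ == "__main__":
    counters = ViolationCounters()
    for latency in [40, 52, 56, 120]:
        counters.record(latency, min_duration_us=50)
    print(counters)
```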

The firmware running on the SSD controller periodically reads the three counters and updates MinDuration and the following registers:

1) AdaptiveState: the current state of the scheme, which represents the time or number of commands since the last write delay violation. An AdaptiveState equal to 0 indicates a recent delay violation (equivalently, a delay violation occurred within a predetermined duration before the current time), and an AdaptiveState equal to 7 indicates that the adaptive scheme has reached a steady state.

2) LastSteadyStateMinDuration: the MinDuration value when steady state was most recently reached.

In some embodiments, as shown in FIG. 7A, the firmware uses a finite state machine (FSM) to determine the new values of MinDuration, AdaptiveState, and LastSteadyStateMinDuration. FIG. 7B is a table listing example thresholds, branches, and reset conditions for the FSM of FIG. 7A.

In the example of FIGS. 7A and 7B, because write latency violations tend to occur in bursts, MinDuration is configured to increase quickly (e.g., delay violation 1 is triggered when TotalWritesCount is greater than or equal to 16) but to decrease more slowly. The thresholds and reset conditions may vary with the workload.
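The "grow fast, shrink slowly" behavior can be sketched as a small state machine. This is a hypothetical FSM, not the one in FIG. 7B, whose thresholds are not reproduced here: the step sizes `grow_us`, `shrink_us`, and `floor_us` are illustrative assumptions, `min_writes = 16` is one possible reading of the "TotalWritesCount ≥ 16" example, and the caller is assumed to reset the counters after each poll.

```python
from dataclasses import dataclass

STEADY_STATE = 7  # AdaptiveState value that marks steady state (from the text)


@dataclass
class AdaptiveRegisters:
    min_duration_us: float                     # MinDuration
    adaptive_state: int = STEADY_STATE         # AdaptiveState
    last_steady_min_duration_us: float = 0.0   # LastSteadyStateMinDuration


def adapt(regs: AdaptiveRegisters, total_writes: int, x1_count: int, x2_count: int,
          min_writes: int = 16, grow_us: float = 4.0, shrink_us: float = 0.5,
          floor_us: float = 10.0) -> None:
    """One firmware polling step of a hypothetical FSM (all parameters are assumptions)."""
    if total_writes < min_writes:
        return  # too few samples since the last reset to judge
    if x2_count > 0 or x1_count > 0:
        # Violation: grow fast (bigger step for the severe X2 case), never dropping
        # below the last known steady-state value, and restart the state count.
        step = 2 * grow_us if x2_count > 0 else grow_us
        regs.min_duration_us = max(regs.min_duration_us + step,
                                   regs.last_steady_min_duration_us)
        regs.adaptive_state = 0
        return
    # No violation: advance toward steady state one poll at a time.
    regs.adaptive_state = min(regs.adaptive_state + 1, STEADY_STATE)
    if regs.adaptive_state == STEADY_STATE:
        # Remember the steady-state value, then shrink slowly.
        regs.last_steady_min_duration_us = regs.min_duration_us
        regs.min_duration_us = max(floor_us, regs.min_duration_us - shrink_us)


regs = AdaptiveRegisters(min_duration_us=20.0)
adapt(regs, total_writes=100, x1_count=3, x2_count=0)  # violation: 20.0 -> 24.0, state 0
for _ in range(STEADY_STATE):                          # seven clean polls
    adapt(regs, total_writes=100, x1_count=0, x2_count=0)
print(regs)  # AdaptiveRegisters(min_duration_us=23.5, adaptive_state=7, last_steady_min_duration_us=24.0)
```

The asymmetry is visible in the step sizes: a single violating poll adds 4 µs (8 µs for an X2 violation), while recovery removes only 0.5 µs per clean poll and only after seven consecutive clean polls.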

FIG. 8 illustrates a flow diagram of an overall scheme for adaptively determining latency in a non-volatile memory device, in accordance with various embodiments of the disclosed technology. As shown therein, the command delay monitor 810 receives the arrival of a write command (e.g., the command itself or a corresponding timestamp) and the completion of the write command, and it computes and outputs the delayed write completion based on the value in the MinDuration register 840.

The MinDuration adaptation module 820 reads and updates the MinDuration register 840. It also communicates bidirectionally with the counters 830 (e.g., the three counters in the example of FIG. 8) and the status registers 850 (e.g., AdaptiveState and LastSteadyStateMinDuration).

FIGS. 9A, 9B, and 9C illustrate the improvement in the tail of the host write latency distribution when embodiments of the disclosed technology are used with various traffic models. The tail of this distribution refers to latency values far from the average, i.e., very low latencies (e.g., when the write cache is empty) or very high latencies (e.g., when no NAND pages are available and a garbage collection cycle must be performed). The described embodiments reduce the occurrence of these tail events, so that host write latencies are tightly clustered around the average.

FIG. 9A shows the host write latency for various MinDuration values under a 100% write workload at queue depth 1 (QD-1), which corresponds to a single running thread, with each write command carrying 4 KB (4K) of random data. In this example, the drive is a 128Gb drive with 15 dies, triple-level cell (TLC) NAND with a 2000 μs program time, 28% effective over-provisioning (EOP), and a nominal undelayed write-completion latency (denoted tProg) of 10 μs. "2.7 × tProg" refers to a fixed delay (i.e., 27 μs) set to the ideal value for the simulation parameters, and "Adaptive" refers to the adaptive delay scheme described by embodiments of the disclosed technology.

As shown in FIG. 9A, the latency of the adaptive scheme remains relatively constant up to the "five nines" (99.999th percentile) level, indicating that the adaptive scheme suppresses the tail of the host write latency distribution out to that percentile. By contrast, the latency of the "tProg" case (in which no delay is added before the write completion is output) increases sharply as the tail constraint tightens (i.e., as the number of "nines" increases).
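The "number of nines" on the horizontal axis maps to a percentile: k nines means the 1 − 10⁻ᵏ quantile, so two nines is the 99th percentile and five nines the 99.999th. A minimal sketch of reading such a tail value from latency samples, assuming the nearest-rank percentile definition (other definitions interpolate and give slightly different values); the function name is hypothetical.

```python
def latency_at_nines(latencies_us: list[float], nines: int) -> float:
    """Latency at the percentile with the given number of nines (nearest-rank method).

    nines=2 -> 99th percentile, nines=3 -> 99.9th, nines=5 -> 99.999th.
    Integer arithmetic avoids floating-point error in the rank computation.
    """
    if not latencies_us or nines < 1:
        raise ValueError("need at least one sample and nines >= 1")
    n = len(latencies_us)
    scale = 10 ** nines
    rank = -(-n * (scale - 1) // scale)  # ceil(n * (1 - 10**-nines)), 1-based
    return sorted(latencies_us)[max(rank, 1) - 1]


samples = [float(v) for v in range(1, 1001)]  # 1000 synthetic latencies: 1..1000 us
print(latency_at_nines(samples, 2))  # 990.0
print(latency_at_nines(samples, 5))  # 1000.0
```

With only 1000 samples the five-nines value degenerates to the maximum; resolving the 99.999th percentile meaningfully needs on the order of 10⁵ or more samples.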

FIGS. 9B and 9C compare the "tProg" and "Adaptive" schemes under 99% and 95% write workloads, respectively, for the same drive used in FIG. 9A. As shown therein, the adaptive scheme performs only slightly worse than the "tProg" case at the 95% write workload.

Embodiments of the disclosed technology may also be applicable when a single-level cell (SLC) NAND cache is used instead of (or in conjunction with) a DRAM cache. Because the scheme relies on measured latency rather than on the number of free pages in the write buffer, the described embodiments also work when the write buffer is adaptively repurposed for other tasks in the SSD.

FIG. 10 illustrates a flow chart of a method for maintaining consistent write latency in a non-volatile memory device. The method 1000 includes: at operation 1010, a write command is received from a host device.

The method 1000 includes: at operation 1020, an actual latency of the write command is calculated based on the arrival of the write command and the completion of the write command.

The method 1000 includes: at operation 1030, one or more of a plurality of counters are incremented based on the actual latency.

The method 1000 includes: at operation 1040, the value of the minimum duration is updated based on the plurality of counters after incrementing.

The method 1000 includes: at operation 1050, an indication that the write command is complete is transmitted to the host device at a time instance determined based on the updated value of the minimum duration.
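Operations 1010 through 1050 can be tied together in one sketch. The class name and the update rule in `_update_min_duration` (adjust MinDuration once every `window` completed writes, then reset the counters) are hypothetical simplifications chosen for brevity; the text leaves the actual rule to the FSM of FIGS. 7A and 7B.

```python
from dataclasses import dataclass


@dataclass
class WriteLatencyController:
    """Sketch of method 1000; the update rule in _update_min_duration is hypothetical."""
    min_duration_us: float     # initial MinDuration (predetermined)
    margin_us: float = 5.0     # X1 threshold: MinDuration + margin
    window: int = 16           # writes per adaptation step (assumption)
    grow_us: float = 4.0       # increase per violating window (assumption)
    shrink_us: float = 0.5     # decrease per clean window (assumption)
    floor_us: float = 10.0     # lower bound on MinDuration (assumption)
    total_writes: int = 0
    x1_count: int = 0
    x2_count: int = 0

    def on_write_completed(self, arrival_us: float, completion_us: float) -> float:
        """Operations 1020-1050 for one write received at arrival_us (operation 1010)."""
        latency = completion_us - arrival_us               # 1020: actual latency
        self.total_writes += 1                             # 1030: update counters
        if latency > self.min_duration_us + self.margin_us:
            self.x1_count += 1
        if latency > 2 * self.min_duration_us:
            self.x2_count += 1
        self._update_min_duration()                        # 1040: update MinDuration
        # 1050: report completion no earlier than arrival + (updated) MinDuration.
        return max(completion_us, arrival_us + self.min_duration_us)

    def _update_min_duration(self) -> None:
        if self.total_writes < self.window:
            return
        if self.x2_count:
            self.min_duration_us += 2 * self.grow_us
        elif self.x1_count:
            self.min_duration_us += self.grow_us
        else:
            self.min_duration_us = max(self.floor_us, self.min_duration_us - self.shrink_us)
        self.total_writes = self.x1_count = self.x2_count = 0


ctrl = WriteLatencyController(min_duration_us=20.0, window=2)
print(ctrl.on_write_completed(0.0, 10.0))     # 20.0: held until MinDuration elapses
print(ctrl.on_write_completed(100.0, 150.0))  # 150.0: slow write; MinDuration grows to 28.0
print(ctrl.on_write_completed(200.0, 205.0))  # 228.0: held using the updated MinDuration
```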

In some embodiments, the minimum duration represents the minimum delay between the arrival of the write command and the transmission of the completion indication to the host device. Transmitting at the determined time instance keeps the observed latency, measured as the time difference between the arrival of the write command and the transmission, within a predetermined tolerance of the average of the actual latencies. A first counter of the plurality of counters indicates the total number of write commands and is incremented upon receipt of a write command; each of the second and subsequent counters corresponds to a type of latency violation and is incremented upon determining that the latency exceeds a respective threshold of a plurality of thresholds.
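The tolerance condition can be expressed as a simple check. A minimal sketch, assuming the "predetermined tolerance" is a fraction of the mean actual latency (the text does not say whether it is relative or absolute); the function name and the sample timestamps are hypothetical.

```python
from statistics import mean


def observed_within_tolerance(arrivals_us, completions_us, releases_us, rel_tolerance):
    """Check that every observed latency (arrival -> completion indication) lies within
    rel_tolerance, assumed to be a fraction, of the mean actual latency
    (arrival -> internal completion)."""
    actual = [c - a for a, c in zip(arrivals_us, completions_us)]
    observed = [r - a for a, r in zip(arrivals_us, releases_us)]
    avg_actual = mean(actual)
    return all(abs(o - avg_actual) <= rel_tolerance * avg_actual for o in observed)


arrivals = [0.0, 100.0, 200.0]
completions = [10.0, 130.0, 220.0]   # actual latencies 10, 30, 20 -> mean 20
releases = [20.0, 130.0, 220.0]      # MinDuration = 20 us holds the first write
print(observed_within_tolerance(arrivals, completions, releases, 0.5))   # True
print(observed_within_tolerance(arrivals, completions, releases, 0.25))  # False
```

Holding the fast 10 µs write to 20 µs moves it onto the mean; the 30 µs write cannot be sped up, so it alone determines whether the tolerance is met.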

In some embodiments, updating the value of the minimum duration is further based on a steady state value of the minimum duration.

In some embodiments, updating the value of the minimum duration is further based on an occurrence of a Flash Translation Layer (FTL) event.

In some embodiments, the non-volatile memory device is a NAND flash memory device.

In some embodiments, the plurality of counters comprises three counters, a first threshold of the plurality of thresholds is between 110% and 125% of the value of the minimum duration, and a second threshold of the plurality of thresholds is twice the value of the minimum duration.

In some embodiments, the plurality of counters comprises three counters, a first threshold of the plurality of thresholds exceeds a value of the minimum duration by 5 microseconds, and a second threshold of the plurality of thresholds is twice the value of the minimum duration.
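The two threshold variants above can be compared side by side. A minimal sketch; the function name, the `first_rule` switch, and the default `ratio = 1.25` are hypothetical choices within the 110% to 125% range stated in the text.

```python
def violation_thresholds(min_duration_us: float, first_rule: str = "margin",
                         margin_us: float = 5.0, ratio: float = 1.25) -> tuple[float, float]:
    """Return (first, second) thresholds for the X1 and X2 counters.

    first_rule="margin": first = MinDuration + margin_us (5 us in the text)
    first_rule="ratio":  first = ratio * MinDuration, with ratio in [1.10, 1.25]
    The second threshold is always 2 x MinDuration.
    """
    if first_rule == "margin":
        first = min_duration_us + margin_us
    elif first_rule == "ratio":
        if not 1.10 <= ratio <= 1.25:
            raise ValueError("ratio must be between 1.10 and 1.25")
        first = ratio * min_duration_us
    else:
        raise ValueError(f"unknown rule: {first_rule}")
    return first, 2 * min_duration_us


print(violation_thresholds(20.0))                      # (25.0, 40.0)
print(violation_thresholds(40.0, first_rule="ratio"))  # (50.0, 80.0)
```

With `ratio = 1.25` the two rules coincide at MinDuration = 20 µs; above that value the fixed 5 µs margin is the tighter (lower) first threshold, and below it the ratio rule is tighter.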

In some embodiments, the initial value of the minimum duration is predetermined.

In some embodiments, the method 1000 further comprises: updating a first register based on the plurality of counters, the first register storing a state of the method that corresponds to the time elapsed, or the number of commands processed, since the most recent latency violation, wherein the state is an integer in the range 0 to N, state 0 indicates that the most recent latency violation occurred within a predetermined amount of time, and state N indicates that the method is operating in a steady state.

In some embodiments, the method 1000 further comprises: updating a second register based on the plurality of counters, the second register storing the value of the minimum duration from the most recent time the method was operating in a steady state.

Embodiments of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or computer software, firmware, or hardware, including the structures disclosed in this specification and their equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be run on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, the computer need not have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of any invention that may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.

Only a few embodiments and examples are described in this patent document, and other embodiments, modifications, and variations may be made based on what is described and illustrated in this patent document.
