Predictive data prefetching in a data store

Document No. 1879190, published 2021-11-23

Reader's note: This technique, Predictive data prefetching in a data store, was created by A. Frolikov, Z. A. P. Vogel, J. G. Mendes, and C. M. Guda on 2020-03-10. Abstract: A data storage system has a non-volatile medium, a buffer memory, a processing device, and a data prefetcher. The data prefetcher receives commands to be executed in the data storage system, provides the commands as inputs to a predictive model, and obtains, as an output of the predictive model, an identification of at least one command for prefetching. Before the identified command is executed in the data storage system, the data prefetcher retrieves from the non-volatile medium at least a portion of the data to be used in executing the command, and stores the data portion in the buffer memory. The retrieval and storage of the data portion can be performed concurrently with the execution of many other commands, to reduce the latency impact of the identified command on the commands executed concurrently with it.

1. A data storage system, comprising:

a non-volatile medium;

a buffer memory;

a processing device coupled to the buffer memory and the non-volatile medium; and

a data prefetcher configured to:

receive a command to be executed in the data storage system;

provide the command as an input to a predictive model;

identify at least one command for prefetching using the predictive model and based on the command; and

before the identified command is executed in the data storage system,

retrieve from the non-volatile medium at least a portion of data to be used in executing the identified command; and

store the data portion in the buffer memory.

2. The data storage system of claim 1, wherein the data prefetcher is configured to periodically use the predictive model.

3. The data storage system of claim 1, wherein the data prefetcher is configured to provide a predetermined number of the commands as inputs to the predictive model during each use of the predictive model.

4. The data storage system of claim 1, wherein the predictive model is trained using supervised machine learning techniques.

5. The data storage system of claim 4, wherein the data prefetcher is configured to spread the latency impact of the commands across more than a threshold number of commands.

6. The data storage system of claim 4, wherein the data prefetcher is configured to retrieve the data portion from the non-volatile medium and store the data portion in the buffer memory, during execution of a plurality of commands, using resources not needed for the execution of the plurality of commands.

7. The data storage system of claim 4, wherein the command identified for prefetching is predicted to increase an execution latency of another command by more than a threshold amount if the portion of data is not available in the buffer memory.

8. The data storage system of claim 4, wherein the command is identified by the predictive model based at least in part on the command being in a predetermined category.

9. The data storage system of claim 8, wherein the predetermined category of commands has an average execution latency longer than a threshold.

10. The data storage system of claim 9, further configured to:

generate latency data for second commands executed in the data storage system;

identify, from the latency data, a third command that causes the execution latency of at least one of the second commands to increase by more than a threshold amount; and

train the predictive model using the supervised machine learning techniques to reduce a difference between the third command identified using the latency data and a command identified by the predictive model from the second commands.

11. A method, comprising:

receiving, in a controller of a data storage system, a command from a host system for execution in the data storage system;

providing the command as input to a predictive model;

identifying at least one command for prefetching using the predictive model and based on the command as input; and

before the command is executed in the data storage system,

retrieving, from a non-volatile medium of the data storage system, at least a portion of data to be used in executing the command; and

storing the data portion in a buffer memory of the data storage system.

12. The method of claim 11, wherein the predictive model is trained using supervised machine learning techniques.

13. The method of claim 12, further comprising:

generating latency data for first commands executed in the data storage system;

identifying, from the latency data, a second command that causes the execution latency of at least one of the first commands to increase by more than a threshold amount; and

training the predictive model using the supervised machine learning techniques to reduce a difference between the second command identified using the latency data and a third command identified by the predictive model from the first commands.

14. The method of claim 13, further comprising:

calculating average execution latencies for different types of commands; and

comparing the execution latencies of the first commands to the averages to identify the at least one of the first commands having an increase in execution latency greater than the threshold amount.

15. The method of claim 14, further comprising:

identifying the second command in response to determining that the second command has a predetermined characteristic and that the second command has been executed concurrently with the at least one of the first commands.

16. The method of claim 15, wherein the predetermined characteristic comprises a predetermined command type, a predetermined command category, or an average execution latency above a threshold, or any combination thereof.

17. The method of claim 12, further comprising:

spreading the latency impact of the command over more than a threshold number of commands.

18. The method of claim 12, further comprising:

retrieving, during execution of a plurality of commands, the data portion from the non-volatile medium and storing the data portion in the buffer memory using resources not used by the execution of the plurality of commands.

19. A non-transitory computer storage medium storing instructions that, when executed by a computing system, cause the computing system to perform a method, the method comprising:

receiving latency data for first commands executed in a data storage system;

identifying, from the latency data, a second command that causes the execution latency of at least one of the first commands to increase by more than a threshold amount; and

training a predictive model using a supervised machine learning technique to reduce a difference between the second command identified using the latency data and a third command identified by the predictive model from the first commands.

20. The non-transitory computer storage medium of claim 19, storing further instructions that, when executed by a computing system, cause the computing system to perform the method, the method further comprising:

receiving, in a controller of a data storage system, a pending command from a host system for execution in the data storage system;

providing the pending command to the predictive model as input;

identifying at least one fifth command for prefetching, using the predictive model and based on the pending command as input; and

before the fifth command is executed in the data storage system,

retrieving, from a non-volatile medium of the data storage system, at least a portion of data to be used in executing the fifth command; and

storing the data portion in a buffer memory of the data storage system.

Technical Field

At least some embodiments disclosed herein relate generally to memory systems and, more particularly but not exclusively, to predictive data prefetching in a data store.

Background

The memory subsystem may include one or more memory components that store data. The memory subsystem may be a data storage system, such as a Solid State Drive (SSD) or a Hard Disk Drive (HDD). The memory subsystem may be a memory module, such as a dual in-line memory module (DIMM), a small outline DIMM (SO-DIMM), or a non-volatile dual in-line memory module (NVDIMM). The memory components may be, for example, non-volatile memory components and volatile memory components. Examples of memory components include memory integrated circuits. Some memory integrated circuits are volatile, requiring power to maintain stored data. Some memory integrated circuits are non-volatile and can retain stored data even when power is not supplied. Examples of non-volatile memory include flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), and electrically erasable programmable read-only memory (EEPROM), among others. Examples of volatile memory include Dynamic Random Access Memory (DRAM) and Static Random Access Memory (SRAM). In general, a host system may utilize a memory subsystem to store data at, and retrieve data from, the memory components.

A computer may include a host system and one or more memory subsystems attached to the host system. The host system may have a Central Processing Unit (CPU) in communication with the one or more memory subsystems to store and/or retrieve data and instructions. The instructions for the computer may include an operating system, device drivers, and application programs. The operating system manages resources in the computer and provides common services for the application programs, such as memory allocation and time sharing of resources. A device driver operates or controls a particular type of device in the computer; and the operating system uses the device driver to offer the resources and/or services provided by that type of device. The Central Processing Unit (CPU) of the computer system may run the operating system and the device drivers to provide services and/or resources to the application programs. The Central Processing Unit (CPU) may run an application program that uses the services and/or resources. For example, an application program implementing a certain type of application may instruct the Central Processing Unit (CPU) to store data in, and retrieve data from, the memory components of a memory subsystem.

The host system may communicate with the memory subsystem in accordance with a predefined communication protocol, such as the Non-Volatile Memory Host Controller Interface Specification (NVMHCI), also known as NVM Express (NVMe), which specifies a logical device interface protocol for accessing non-volatile storage devices via a Peripheral Component Interconnect Express (PCI Express or PCIe) bus. In accordance with the communication protocol, the host system may send commands of different types to the memory subsystem; and the memory subsystem may execute the commands and provide responses to the commands. Some commands instruct the memory subsystem to store data items at, or retrieve data items from, addresses specified in the commands, such as read commands and write commands. Some commands manage infrastructure and/or administrative tasks in the memory subsystem, such as commands to manage namespaces, commands to attach namespaces, commands to create input/output submission or completion queues, commands to delete input/output submission or completion queues, commands for firmware management, and so forth.

Drawings

Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

FIG. 1 illustrates an example computing system having a memory subsystem in accordance with some embodiments of the present disclosure.

FIG. 2 illustrates a system configured to train a predictive model to identify commands that may increase the execution latency of other commands.

FIG. 3 illustrates a system with a predictive model that prefetches command data from a non-volatile medium into a buffer memory.

FIG. 4 illustrates a method of training a predictive model to recognize high impact commands.

FIG. 5 illustrates a method of prefetching data for high impact commands based on the predictions of a predictive model.

FIG. 6 is a block diagram of an example computer system in which embodiments of the present disclosure may operate.

Detailed Description

At least some aspects of the present disclosure relate to predictively prefetching data for commands that can increase the execution latency of other commands executing concurrently in a data store. For example, a predictive model can be configured in the data storage system to identify commands that are likely to cause significant delays in the execution of other commands. Data to be used by an identified command can be prefetched from the non-volatile storage media of the data storage system into a buffer memory of the data storage system. Prefetching the data into the buffer memory can reduce, minimize, and/or eliminate the delays that the identified command would otherwise cause in the execution of other commands. The predictive model can be built by applying machine learning techniques to a training set of commands, using the execution latency data of the commands in the training set.

Generally, infrastructure commands may be used to manage, configure, administer, or report the status of the infrastructure in a data storage system. Certain infrastructure commands often result in unexpected increases in the execution latency of other commands that are unrelated to them. Such infrastructure commands may have high latency. When certain resources in a data storage system are used to execute a high latency infrastructure command, those resources become unavailable for the execution of other commands, resulting in significant, random delays in the execution of the other commands that would otherwise use those resources.

In at least some embodiments disclosed herein, the predictive model is configured to predict the infrastructure commands that are most likely to increase the latency of other commands. The prediction is based on characteristics of the commands currently queued for processing in the data storage system. The prediction allows the data storage system to prefetch data from the non-volatile storage media into the buffer memory for the predicted infrastructure commands. After the data of a predicted infrastructure command has been prefetched, the command is less likely, during its execution, to use the resources for accessing the non-volatile storage media and thus make those resources unavailable for the execution of other commands. Thus, the impact of the execution of the infrastructure command on other commands can be reduced, minimized, and/or removed.

For example, supervised machine learning techniques may be applied to groups of commands in a training dataset. The training data set may have a mixed set of different types of infrastructure commands and different types of other commands. The command training set may represent the workload of the data storage device/system or the actual workload during a service period. Some parameters of the commands in the training set may be used as input parameters to the predictive model, such as the type of command, the region of the storage system that is accessed by the command, and so on. The measured latency of execution of commands in the training set may be used to identify infrastructure commands that have a high impact on the execution of other commands and infrastructure commands that do not have a high impact on the execution of other commands. For example, a high impact command causes the latency of the execution of other commands to have an increase greater than a threshold amount; and low impact commands cause the latency of other commands to have an increase that does not exceed a threshold amount. Supervised machine learning techniques may be used to train the predictive model by adjusting parameters of the predictive model to minimize the difference between the classification/prediction of the infrastructure commands identified by the predictive model and the classification/prediction of the infrastructure commands identified from the latency data of the training data set.
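
For illustration only, the sketch below shows one way such a training set might be encoded, assuming hypothetical command records of the form (type, accessed region, measured latency increase); the command types, feature layout, and threshold are assumptions for the sketch, not details taken from this disclosure:

```python
import numpy as np

# Hypothetical command types; a real system would use its protocol's own set.
COMMAND_TYPES = ["read", "write", "namespace_mgmt", "queue_mgmt", "firmware_mgmt"]

def encode(commands):
    """One-hot encode each command's type and append the accessed region index."""
    rows = []
    for cmd_type, region, _latency_increase in commands:
        rows.append([1.0 if t == cmd_type else 0.0 for t in COMMAND_TYPES]
                    + [float(region)])
    return np.array(rows)

def threshold_labels(commands, threshold):
    """Label 1 (high impact) when the measured latency increase a command
    causes in other commands exceeds the threshold, else 0 (low impact)."""
    return np.array([1.0 if inc > threshold else 0.0
                     for _type, _region, inc in commands])
```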

For example, the predictive model may be trained to classify commands in a sequence. Each infrastructure command in the sequence may be classified as having, or not having, a high likelihood of impacting the execution of the other commands in the sequence.

For example, the prediction model may be trained to predict, for a sequence of commands, an increase in latency of execution of other commands in the sequence caused by an infrastructure command in the sequence. The predicted increase in execution latency may be compared to a threshold to classify the infrastructure command as a high impact command or a low impact command.

For example, the prediction model may be trained to predict, for a sequence of commands, infrastructure commands that will enter the data storage device/system such that the execution latency of some of the commands in the sequence has an increase greater than a threshold amount. The prediction may be based on the pattern of the infrastructure commands and other commands.

For example, the predictive model may be based on statistical correlations using logistic regression and/or artificial neural networks.
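
As a minimal sketch of the logistic-regression option, assuming the feature encoding above, a classifier could be fitted by gradient descent on the cross-entropy loss; the learning rate and epoch count are illustrative:

```python
import numpy as np

def train_logistic_model(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b so that sigmoid(X @ w + b) approximates
    the probability that a command is high impact."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                            # cross-entropy gradient w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_high_impact(w, b, X, cutoff=0.5):
    """Classify as high impact the commands whose probability exceeds the cutoff."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b))) >= cutoff
```

An artificial neural network could replace the linear model here without changing the surrounding training and prediction flow.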

For example, different training sets may be used for data storage systems having different structures and different configurations.

A data storage system having a particular design may be initially configured with a predictive model trained from typical workloads of commands of the design. The predictive model may then be further trained and/or updated for typical workloads of the data storage system in the computer system and/or based on recent real-time workloads of the data storage system.

Optionally, the data storage system may be further configured to monitor the differences between the real-time predictions made using the predictive model and the latency increases subsequently measured for executed commands, and to further train the predictive model periodically so that its predictive capability adjusts to the real-time workload.

During the use of the data storage system with the predictive model, incoming commands to be executed by the data storage system may be provided as input to the predictive model to identify a list of commands for which prefetching is scheduled/recommended.

For example, the predictive model may be used once per predetermined number of commands to be executed in one or more queues (e.g., every 1000 commands) or once per predetermined period of time (e.g., every 10 ms). During the use of the predictive model, commands awaiting execution in the data storage system may be fed into the predictive model to identify a list of high impact commands for prefetching. The data storage system is configured to prefetch the data that may be used by the high impact commands in the list before the high impact commands are actually executed, in order to spread the impact of executing a high impact command across a large number of other commands. Furthermore, the prefetching may be configured to use spare resources that are not used/needed for the execution of the other commands executed before the high impact command; such an arrangement can reduce the overall impact of high impact commands on other commands.
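
A sketch of such a prefetch cycle is below; `model` and `prefetcher` are hypothetical interfaces standing in for the trained predictive model (131) and the data prefetcher (113), `queued_commands` is assumed to be a list of pending commands, and the batch size and period mirror the examples above:

```python
import time

BATCH_SIZE = 1000   # e.g., run the model once per 1000 queued commands, or
PERIOD_S = 0.010    # once per 10 ms, per the examples above

def prefetch_cycle(queued_commands, model, prefetcher):
    """Feed pending commands to the predictive model and prefetch data for
    the ones it flags as high impact, before they are executed."""
    for cmd in queued_commands[:BATCH_SIZE]:
        if model.is_high_impact(cmd):                # prediction (141)
            data = prefetcher.read_nonvolatile(cmd)  # from non-volatile media (109)
            prefetcher.store_in_buffer(cmd, data)    # into buffer memory (119)

def run_periodically(queue_snapshot, model, prefetcher):
    """Drive the cycle on a fixed period, using spare time between commands."""
    while True:
        prefetch_cycle(queue_snapshot(), model, prefetcher)
        time.sleep(PERIOD_S)
```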

In some cases, the predictive model may predict an infrastructure command before the host system sends the infrastructure command to the data storage system and/or before the infrastructure command is retrieved from the queue for execution. The data storage system may use a flag to indicate whether the prefetched data of a predicted infrastructure command is valid.
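
One way such a validity flag could be kept is sketched below; the entry layout and the invalidate-on-overlapping-write policy are assumptions for illustration, not the disclosure's mechanism:

```python
from dataclasses import dataclass

@dataclass
class PrefetchEntry:
    start: int          # first address covered by the prefetched data
    end: int            # one past the last covered address
    data: bytes
    valid: bool = True  # cleared once the buffered copy can no longer be trusted

def invalidate_overlapping(entries, write_start, write_end):
    """Assumed policy: an intervening write overlapping a prefetched range
    makes the buffered copy stale, so its valid flag is cleared."""
    for e in entries:
        if e.start < write_end and write_start < e.end:
            e.valid = False
```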

In general, a memory subsystem may also be referred to as a "memory device". An example of a memory subsystem is a memory module connected to a Central Processing Unit (CPU) via a memory bus. Examples of memory modules include dual in-line memory modules (DIMMs), small outline DIMMs (SO-DIMMs), non-volatile dual in-line memory modules (NVDIMMs), and so forth.

Another example of a memory subsystem is a data storage device/system connected to a Central Processing Unit (CPU) via a peripheral interconnect (e.g., input/output bus, storage area network). Examples of storage devices include Solid State Drives (SSDs), flash drives, Universal Serial Bus (USB) flash drives, and Hard Disk Drives (HDDs).

In some embodiments, the memory subsystem is a hybrid memory/storage subsystem that provides both memory and storage functions. In general, a host system may utilize a memory subsystem that includes one or more memory components. The host system may provide data to be stored at the memory subsystem and may request data to be retrieved from the memory subsystem.

FIG. 1 illustrates an example computing system having a memory subsystem (110) according to some embodiments of the present disclosure.

The memory subsystem (110) may include a non-volatile medium (109) that includes memory components. In general, the memory components may be volatile memory components, non-volatile memory components, or a combination thereof. In some embodiments, the memory subsystem (110) is a data storage system. An example of a data storage system is an SSD. In other embodiments, the memory subsystem (110) is a memory module. Examples of memory modules include DIMMs, NVDIMMs, and NVDIMM-P modules. In some embodiments, the memory subsystem (110) is a hybrid memory/storage subsystem.

In general, a computing environment may include a host system (120) that uses a memory subsystem (110). For example, the host system (120) may write data to the memory subsystem (110) and read data from the memory subsystem (110).

The host system (120) may be part of a computing device, such as a desktop computer, a laptop computer, a network server, a mobile device, or another computing device that includes a memory and a processing device. The host system (120) may include or be coupled to the memory subsystem (110) such that the host system (120) may read data from or write data to the memory subsystem (110). The host system (120) may be coupled to the memory subsystem (110) via a physical host interface. As used herein, "coupled to" generally refers to a connection between components, which may be an indirect communication connection or a direct communication connection (e.g., without intervening components), whether wired or wireless, including connections such as electrical, optical, magnetic, etc. Examples of physical host interfaces include, but are not limited to, a Serial Advanced Technology Attachment (SATA) interface, a Peripheral Component Interconnect Express (PCIe) interface, a Universal Serial Bus (USB) interface, a Fibre Channel interface, a Serial Attached SCSI (SAS) interface, a Double Data Rate (DDR) memory bus, and the like. The physical host interface may be used to transfer data and/or commands between the host system (120) and the memory subsystem (110). When the memory subsystem (110) is coupled to the host system (120) over a PCIe interface, the host system (120) may further use an NVM Express (NVMe) interface to access the non-volatile media (109). The physical host interface may provide an interface for passing control, address, data, and other signals between the memory subsystem (110) and the host system (120). FIG. 1 shows one memory subsystem (110) as an example. In general, a host system (120) may access multiple memory subsystems via the same communication connection, multiple separate communication connections, and/or a combination of communication connections.

The host system (120) includes a processing device (118) and a controller (116). The processing device (118) of the host system (120) may be, for example, a microprocessor, a Central Processing Unit (CPU), a processing core of a processor, an execution unit, or the like. In some cases, the controller (116) may be referred to as a memory controller, a memory management unit, and/or an initiator. In one example, the controller (116) controls communication over a bus coupled between the host system (120) and the memory subsystem (110).

In general, the controller (116) may send commands or requests to the memory subsystem (110) for desired access to the non-volatile media (109). The controller (116) may further include interface circuitry to communicate with the memory subsystem (110). The interface circuitry may convert responses received from the memory subsystem (110) into information for the host system (120).

The controller (116) of the host system (120) may communicate with the controller (115) of the memory subsystem (110) to perform operations such as reading data, writing data, or erasing data in the non-volatile media (109), among other such operations. In some cases, the controller (116) is integrated within the same package as the processing device (118). In other cases, the controller (116) is packaged separately from the processing device (118). The controller (116) and/or the processing device (118) may include hardware, such as one or more integrated circuits and/or discrete components, a buffer memory, a cache memory, or a combination thereof. The controller (116) and/or processing device (118) may be a microcontroller, special purpose logic circuitry (e.g., a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), etc.), or another suitable processor.

The non-volatile media (109) may include any combination of different types of non-volatile memory components. In some cases, volatile memory components may also be used. Examples of non-volatile memory components include negative-and (NAND) type flash memory. The memory components in the media (109) may include one or more arrays of memory cells, such as Single Level Cells (SLCs) or Multi-Level Cells (MLCs) (e.g., Triple Level Cells (TLCs) or Quad-Level Cells (QLCs)). In some embodiments, a particular memory component may include both SLC and MLC portions of memory cells. Each of the memory cells may store one or more bits of data (e.g., data blocks) used by the host system (120). Although non-volatile memory components such as NAND type flash memory are described, the memory components used in the non-volatile media (109) may be based on any other type of memory. In addition, volatile memory may be used. In some embodiments, the memory components in the media (109) may include, but are not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Phase Change Memory (PCM), Magnetic Random Access Memory (MRAM), Spin Transfer Torque (STT)-MRAM, ferroelectric transistor random-access memory (FeTRAM), ferroelectric RAM (FeRAM), conductive bridging RAM (CBRAM), Resistive Random Access Memory (RRAM), oxide-based RRAM (OxRAM), negative-or (NOR) flash memory, Electrically Erasable Programmable Read-Only Memory (EEPROM), nanowire-based non-volatile memory, memory that incorporates memristor technology, or a cross-point array of non-volatile memory cells, or any combination thereof. A cross-point array of non-volatile memory may perform bit storage based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array. Additionally, in contrast to many flash-based memories, cross-point non-volatile memory can perform a write-in-place operation, where a non-volatile memory cell can be programmed without being previously erased. Furthermore, the memory cells of the memory components in the media (109) can be grouped into memory pages or data blocks, which refer to units of the memory component used to store data.

The controller (115) of the memory subsystem (110) may communicate with the memory components in the media (109) to perform operations such as reading data, writing data, or erasing data at the memory components, and other such operations (e.g., in response to commands scheduled on a command bus by the controller (116)). The controller (115) may include hardware such as one or more integrated circuits and/or discrete components, a buffer memory, or a combination thereof. The controller (115) may be a microcontroller, special purpose logic circuitry (e.g., a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), etc.), or another suitable processor. The controller (115) may include a processing device (117) (processor) configured to execute instructions stored in the buffer memory (119). In the example shown, the buffer memory (119) of the controller (115) includes embedded memory configured to store instructions for performing various processes, operations, logic flows, and routines that control the operation of the memory subsystem (110), including handling communications between the memory subsystem (110) and the host system (120). In some embodiments, the controller (115) may include memory registers storing memory pointers, fetched data, and the like. The controller (115) may also include Read Only Memory (ROM) for storing microcode. While the example memory subsystem (110) in FIG. 1 has been illustrated as including the controller (115), in another embodiment of the present disclosure, a memory subsystem (110) may not include a controller (115), and may instead rely upon external control (e.g., provided by an external host, or by a processor or controller separate from the memory subsystem).

In general, the controller (115) may receive commands or operations from the host system (120) and may convert the commands or operations into instructions or appropriate commands to achieve a desired access to memory components in the media (109). The controller (115) may be responsible for other operations such as wear leveling operations, garbage collection operations, error detection and Error Correction Code (ECC) operations, encryption operations, cache operations, and address translation between logical and physical block addresses associated with memory components in the media (109). The controller (115) may further include host interface circuitry to communicate with the host system (120) via a physical host interface. Host interface circuitry may convert commands received from a host system into command instructions to access a memory component in media (109), and convert responses associated with the memory component into information for the host system (120).

The memory subsystem (110) may also include additional circuitry or components not shown. In some embodiments, the memory subsystem (110) may include a cache or buffer (e.g., DRAM) and address circuitry (e.g., row decoder and column decoder) that may receive addresses from the controller (115) and decode the addresses to access memory components in the medium (109).

The computing system includes a data prefetcher (113) in the memory subsystem (110) that can retrieve data from the non-volatile media (109) into the buffer memory (119) for predicted high impact commands. A predicted high impact command can cause the execution latency of other commands to increase by more than a threshold amount when its data is not prefetched into the buffer memory (119) before the high impact command is executed.

In some embodiments, the controller (115) in the memory subsystem (110) includes at least a portion of the data prefetcher (113). In other embodiments or in combination, the controller (116) and/or the processing device (118) in the host system (120) includes at least a portion of the data prefetcher (113). For example, the controller (115), the controller (116), and/or the processing device (118) may include logic circuitry that implements the data prefetcher (113). For example, the controller (115) or a processing device (118) (processor) of the host system (120) may be configured to execute instructions stored in memory to perform the operations of the data prefetcher (113) described herein. In some embodiments, the data prefetcher (113) is implemented in an integrated circuit chip disposed in the memory subsystem (110). In other embodiments, the data prefetcher (113) is part of an operating system, device driver, or application of the host system (120).

The memory subsystem (110) may have a queue (123) for commands of one class and another queue (125) for commands of another class. For example, the queue (123) may be configured for typical input/output commands, such as read commands and write commands. The queue (125) may be configured for infrastructure commands, which are not typical input/output commands. Some infrastructure commands may be high impact commands that cause the execution latency of particular commands in the queue (123) to increase by more than a threshold amount. The memory subsystem (110) may include one or more completion queues (121) for reporting the results of the execution of commands in the command queues (123 and 125) to the host system (120). In some embodiments, one or more of the queues may be created in response to commands from the host system (120). Thus, in general, the memory subsystem (110) is not limited to the particular number of queues shown in FIG. 1.
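
As a toy sketch of this queue layout (the names and the routing rule are illustrative, assuming commands are dicts with "category" and "id" keys, and are not a controller implementation):

```python
from collections import deque

io_queue = deque()              # queue (123): read/write commands
infrastructure_queue = deque()  # queue (125): infrastructure commands
completion_queue = deque()      # completion queue (121): results for the host

def submit(command):
    """Route a host command to the queue for its class."""
    if command.get("category") == "infrastructure":
        infrastructure_queue.append(command)
    else:
        io_queue.append(command)

def complete(command, status):
    """Report an execution result back to the host system (120)."""
    completion_queue.append((command["id"], status))
```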

The data prefetcher (113) is configured to predict/classify some commands of the class in the queue (125) as high impact commands. Before a high impact command is retrieved from the command queue (125) for execution, the data prefetcher (113) is configured to load the data usable by the high impact command from the non-volatile media (109) into the buffer memory (119). The loading of the data in preparation for the execution of the high impact command may be performed using resources not used for the execution of commands from the queue (123), to increase resource utilization and reduce the overall impact of the high impact command. Alternatively or in combination, the loading of the data in preparation for the execution of the high impact command may be performed so as to spread its effect across the execution of more commands from the queue (123), such that its effect is not concentrated on the one or more commands that are executed concurrently with the high impact command.

FIG. 1 shows an example in which high impact commands are known to be in a particular queue (e.g., 125). In other embodiments, different classes of commands may be mixed in the same queue. For example, in some systems, infrastructure commands may be in the same queue as non-infrastructure commands; and the techniques of the present disclosure may still be used to predict high impact commands and prefetch their data into a buffer memory. Thus, the application of the techniques of the present disclosure is not limited to a particular command queue structure.

FIG. 2 illustrates a system configured to train a predictive model (131) to identify commands that may increase the execution latency of other commands.

For example, the predictive model (131) of FIG. 2 may be configured in the data prefetcher (113) of the memory subsystem (110) of FIG. 1.

In FIG. 2, the command training set (137) is used to capture the patterns of the latency impact that different types of commands have on each other. The command training set (137) may be a set of command instances representing a typical workload of the memory subsystem (110), or the actual workload of the memory subsystem (110) during a particular period of use in the computer system of FIG. 1.

During execution of commands in the training set by the memory subsystem (110) (e.g., without using the data prefetcher (113)), execution latency data (139) for the commands in the training set is measured. The execution latency data (139) may be used to identify high impact commands (135) that result in increased latency.

For example, an average execution latency for each type of command may be calculated from the execution latency data (139). For each respective command in the training set, the increased latency of executing the respective command may be calculated as the difference between the actual execution latency of the command and the average execution latency of commands of the same type. When the latency increase exceeds a threshold, the command is considered highly impacted. Within the time window of the execution of a highly impacted command, the other commands executed in the window and/or concurrently with the impacted command may be examined to identify the high impact command that caused the impact. For example, an infrastructure command executing in the time window may be identified as the source of the high impact, and thus identified as a high impact command. For example, a command of a particular class executed in the time window may be identified as the source of the high impact, and thus identified as a high impact command. For example, a command executed in the time window whose type has an average execution latency above a threshold may be identified as the source of the high impact, and thus identified as a high impact command.
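
The sketch below illustrates these steps, assuming hypothetical log records with "type", "category", "start", "end", and "latency" fields; only the infrastructure-command attribution rule from the paragraph above is shown:

```python
from collections import defaultdict

def label_high_impact(log, threshold):
    """Return the indices of log records labeled as high impact commands (135)."""
    # 1. Average execution latency per command type.
    by_type = defaultdict(list)
    for rec in log:
        by_type[rec["type"]].append(rec["latency"])
    avg = {t: sum(v) / len(v) for t, v in by_type.items()}

    # 2. Commands whose latency exceeds their type average by more than the threshold.
    affected = [r for r in log if r["latency"] - avg[r["type"]] > threshold]

    # 3. Attribute the impact to infrastructure commands whose execution
    #    window overlaps an affected command's window.
    high_impact = set()
    for a in affected:
        for i, r in enumerate(log):
            if (r["category"] == "infrastructure"
                    and r["start"] < a["end"] and a["start"] < r["end"]):
                high_impact.add(i)
    return high_impact
```

The resulting labels would play the role of the high impact commands (135) against which the predictive model's outputs (141) are compared during training.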

In FIG. 2, the predictive model (131) is configured to identify, from the command training set, the high impact commands (141) that are predicted to result in increased latency. The predictive model (131) computes the predictions (141) based on parameters of the commands in the training set and/or the order in which the commands occur in the training set. The parameters may include the types of the commands in the training set and/or the zones/areas of the storage system accessed by the commands. Supervised machine learning (133) is applied to the predictive model (131) to reduce or minimize the differences between the high impact commands (135) identified from the execution latency data (139) and the high impact commands (141) predicted by the predictive model (131).

After the predictive model (131) has been trained using the technique of supervised machine learning (133), it may be used by the data prefetcher (113) of the memory subsystem (110) of FIG. 1 and/or the system shown in FIG. 3.

FIG. 3 shows a system with a predictive model (131) that prefetches data for commands from a non-volatile medium (109) into a buffer memory (119). For example, the system of FIG. 3 may be the memory subsystem (110) of FIG. 1.

In FIG. 3, commands in one or more queues (e.g., 123 and/or 125) are provided as inputs to the predictive model (131) to generate predictions of the high impact commands (141) that may result in increased latency. The data prefetcher (113) is configured to retrieve data from the non-volatile media (109) into the buffer memory (119) before the high impact commands (141) predicted by the predictive model (131) are actually executed.

Typically, the time required to access the non-volatile media (109) for a given amount of data is longer than the time required to access the buffer memory (119) for the same amount of data. Furthermore, the system may have fewer resources for accessing the non-volatile media (109) concurrently for multiple commands than for accessing the buffer memory (119). Thus, when the data used by a high impact command is prefetched into the buffer memory (119), the command's impact on the concurrent execution of other commands can be reduced.

FIG. 4 illustrates a method of training a predictive model to identify commands that have a high probability of causing significant delays in the execution of other commands. For example, the method of FIG. 4 may be implemented in the computer system of FIG. 1 using the techniques discussed in conjunction with FIG. 2.

At block 151, first commands (e.g., 137) are executed in the data storage system.

The first commands may be examples of commands typically executed in data storage systems having the same or a similar structure as the data storage system. Optionally, the first commands may be the actual workload of the data storage system over a period of time.

At block 153, the data storage system (or a host connected to the data storage system) measures the execution latencies of the first commands. For example, the execution latency of a command may be measured as the duration between the retrieval of the command from the execution queue and the completion of the execution of the command in the data storage system. A typical command retrieves data from, or writes data at, an address specified in the command.
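
A trivial sketch of this measurement, with `execute` standing in (as an assumption) for the storage system's command handler:

```python
import time

def execute_with_latency(command, execute):
    """Execution latency: the duration between retrieving the command from
    the execution queue and completing its execution."""
    start = time.monotonic()
    execute(command)
    return time.monotonic() - start
```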

At block 155, a second command (e.g., 135) is identified, using a computing device, that causes the execution latency of some of the first commands to increase by more than a threshold amount. The computing device may be a computer separate from the data storage system and its host system, the host system of the data storage system, or the controller of the data storage system.

For example, the second command may be identified by calculating average latencies for different command types, identifying affected commands whose execution latency exceeds the average of their command type by more than a threshold, and identifying a second command that was executed concurrently with the affected commands and has a predetermined characteristic. For example, the predetermined characteristic may be a predefined command category (e.g., infrastructure command), a command type having an average latency above a threshold, and/or another attribute.

At block 157, the computing device identifies a third command (e.g., 141) using the predictive model (131) based on the first commands.

At block 159, the computing device applies supervised machine learning (133) to the predictive model (131) to reduce a difference between the second command (e.g., 135) and the third command (141).

FIG. 5 illustrates a method of prefetching data for high impact commands based on the predictions of a predictive model (e.g., 131) that may be trained using the method of FIG. 4.

For example, the method of FIG. 5 may be implemented in the computer system of FIG. 1 using the techniques discussed in conjunction with FIG. 3.

At block 171, a data prefetcher (113) of a data storage system (e.g., 110) receives an identification of a command queued for execution in the data storage system.

At block 173, the data prefetcher (113) provides a command as an input to the predictive model (131).

At block 175, the data prefetcher (113) identifies at least one command for prefetching using the predictive model (131) and based on the command as input.

Prior to retrieving a command from the queue for execution in the data storage system, the data prefetcher (113) retrieves at block 177 at least a portion of data to be used in executing the command and stores the retrieved portion of data in a buffer memory (119) of the data storage system at block 179.

At the same time, the controller (115) of the data storage system retrieves some of the queued commands at block 181 and executes the retrieved commands at block 183.

Preferably, the retrieval (177) and storing (179) of the data portion for the prefetched command are performed using resources that are not needed/used in the concurrent execution (183) of other commands. This arrangement reduces the overall impact of the prefetched command on other commands. Alternatively or in combination, the impact of the retrieval (177) and storing (179) of the data portion for the prefetched command is spread across the execution (183) of many commands, such that the impact on each individual command is reduced.

Subsequently, the controller (115) of the data storage system retrieves the command from the queue at block 185 and executes the command using at least the data portion in the buffer memory at block 187.

Because at least the data portion is already in the buffer memory, the execution of the command has less impact on the execution latency of the other commands executed concurrently with it.

Optionally, the data prefetcher (113) may include the supervised machine learning (133) functionality shown in FIG. 2 and/or discussed in connection with FIG. 4. For example, the data prefetcher (113) may measure the execution latencies (139) of commands, identify the commands (135) that result in increased latency, and use supervised machine learning (133) to minimize the number of commands that are predicted not to result in an increase in latency (141) but are found, based on the measured execution latency data (139), to have resulted in an increase in latency (135).

In some implementations, the communication channel between the processing device (118) and the memory subsystem includes a computer network, such as a local area network, a wireless personal area network, a cellular communication network, a broadband high-speed always-on wireless communication connection (e.g., a current or future generation mobile network link); and the processing device (118) and memory subsystem may be configured to communicate with each other using data storage management and usage commands similar to those in the NVMe protocol.

In general, the memory subsystem may have non-volatile storage media. Examples of non-volatile storage media include memory cells formed in integrated circuits and magnetic materials coated on rigid magnetic disks. Non-volatile storage media can maintain data/information stored therein without consuming power. The memory cells may be implemented using various memory/storage technologies, such as NAND logic gates, NOR logic gates, Phase Change Memories (PCMs), magnetic memories (MRAMs), resistive random access memories, cross point storage, and memory devices (e.g., 3D XPoint memories). Cross-point memory devices use transistor-less memory elements, each having a memory cell and a selector, stacked together as a column. Columns of memory elements are connected via two perpendicular conductive lines, one of which is located above the column of memory elements and the other of which is located below the column of memory elements. Each memory element may be individually selected at the intersection of one conductive line on each of the two layers. Cross-point memory devices are fast and nonvolatile and can be used as a unified memory pool for processing and storage.

A controller (e.g., 115) of a memory subsystem (e.g., 110) may execute firmware to perform operations in response to communications from a processing device (118). Generally, firmware is a type of computer program that provides for the control, monitoring, and data manipulation of engineered computing devices.

Some embodiments relating to the operation of the controller (115) and/or the data prefetcher (113) may be implemented using computer instructions executed by the controller (115) (e.g., firmware of the controller (115)). In some cases, hardware circuitry may be used to implement at least some functions. The firmware may be initially stored in a non-volatile storage medium or another non-volatile device and loaded into volatile DRAM and/or in-processor cache memory for execution by the controller (115).

Non-transitory computer storage media may be used to store instructions of firmware of a memory subsystem (e.g., 110). When executed by the controller (115) and/or the processing device (117), the instructions cause the controller (115) and/or the processing device (117) to perform the methods discussed above.

FIG. 6 illustrates an example machine of a computer system (200) within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, can be executed. In some embodiments, the computer system (200) can correspond to a host system (e.g., the host system (120) of FIG. 1) that includes, is coupled to, or utilizes a memory subsystem (e.g., the memory subsystem (110) of FIG. 1), or it can be used to perform the operations of a data prefetcher (113) (e.g., to execute instructions to perform operations corresponding to the data prefetcher (113) described with reference to FIGS. 1-5). In alternative embodiments, the machine can be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, and/or the Internet. The machine can operate in the capacity of a server or a client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or a client machine in a cloud computing infrastructure or environment.

The machine may be a Personal Computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Additionally, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

An example computer system (200) includes a processing device (202), a main memory (204) (e.g., Read Only Memory (ROM), flash memory, Dynamic Random Access Memory (DRAM) such as Synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), Static Random Access Memory (SRAM), etc.), and a data storage system (218), which communicate with each other via a bus (230) (which can include multiple buses).

The processing device (202) represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device can be a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a Very Long Instruction Word (VLIW) microprocessor, or a processor implementing other instruction sets, or processors implementing a combination of instruction sets. The processing device (202) can also be one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a network processor, or the like. The processing device (202) is configured to execute instructions (226) for performing the operations and steps discussed herein. The computer system (200) can further include a network interface device (208) to communicate over a network (220).

The data storage system (218) may include a machine-readable storage medium (224) (also referred to as a computer-readable medium) having stored thereon one or more sets of instructions (226) or software embodying any one or more of the methodologies or functions described herein. The instructions (226) may also reside, completely or at least partially, within the main memory (204) and/or within the processing device (202) during execution thereof by the computer system (200), the main memory (204) and the processing device (202) also constituting machine-readable storage media. The machine-readable storage medium (224), the data storage system (218), and/or the main memory (204) may correspond to the memory subsystem (110) of FIG. 1.

In one embodiment, the instructions (226) include instructions for implementing functionality corresponding to a data prefetcher (113) (e.g., the data prefetcher (113) described with reference to FIGS. 1-5). While the machine-readable storage medium (224) is shown in an example embodiment to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media that store the one or more sets of instructions. The term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. The present disclosure may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage systems.

The present disclosure also relates to an apparatus for performing the operations herein. This apparatus can be specially constructed for the intended purposes, or it can include a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program can be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the method. The structure for a variety of these systems will be presented as set forth in the description below. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.

The present disclosure may be provided as a computer program product or software which may include a machine-readable medium having stored thereon instructions which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). In some embodiments, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., computer) -readable storage medium, such as read only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory components, and so forth.

In this specification, various functions and operations are described as being performed by or caused by computer instructions for simplicity of description. However, those skilled in the art will recognize that the intent of such expressions is that the functions result from execution of computer instructions by one or more controllers or processors (e.g., microprocessors). Alternatively or in combination, the functions and operations may be implemented using special purpose circuitry, with or without software instructions, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA). Embodiments may be implemented using hardwired circuitry without software instructions or in combination with software instructions. Thus, the techniques are not limited to any specific combination of hardware circuitry and software nor to any particular source for the instructions executed by the data processing system.

In the foregoing specification, embodiments of the disclosure have been described with reference to specific example embodiments thereof. It should be evident that various modifications may be made thereto without departing from the broader spirit and scope of embodiments of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
