Use of outstanding command queues for separate read-only and write-read caches in a memory subsystem

Document No.: 108373. Publication date: 2021-10-15.

Abstract: This disclosure, "Use of outstanding command queues for separate read-only and write-read caches in a memory subsystem," was created by D. Bavishi on 2020-02-27. A request may be received to read data stored at a memory subsystem. It may be determined whether the data is stored in a cache of the memory subsystem. In response to determining that the data is not stored in the cache of the memory subsystem, one queue of a set of queues may be determined by a processing device to store the request with other read requests for the data stored at the memory subsystem. Each queue in the set of queues corresponds to a respective cache line of the cache. The request may be stored at the determined queue with the other read requests for the data stored at the memory subsystem.

1. A method, comprising:

receiving a request to read data stored at a memory subsystem;

determining whether the data is stored at a cache of the memory subsystem;

in response to determining that the data is not stored at the cache of the memory subsystem, determining, by a processing device, one of a plurality of queues to store the request with other read requests for the data stored at the memory subsystem, wherein each queue of the plurality of queues corresponds to a respective cache line of the cache; and

storing the request at the determined queue with the other read requests for the data stored at the memory subsystem.

2. The method of claim 1, wherein determining whether the data is stored at the cache of the memory subsystem comprises determining whether a Content Addressable Memory (CAM) of a read-only cache includes an identifier associated with the data or a CAM of a write-read cache includes the identifier associated with the data.

3. The method of claim 2, wherein the plurality of queues are associated with the read-only cache and another plurality of queues are associated with the write-read cache, and determining whether the data is stored at the cache of the memory subsystem further comprises determining whether any of the plurality of queues or the another plurality of queues are associated with the identifier.

4. The method of claim 1, wherein determining the queue of the plurality of queues to store the request with the other read requests for the data stored at the memory subsystem comprises:

determining an identifier associated with the data based on the request, wherein the identifier comprises an address of the data; and

determining that the queue is assigned the identifier, the queue being assigned the identifier in response to determining that the data in a first request included in the other requests is not found in the cache.

5. The method of claim 4, further comprising determining that the queue is invalid and unblocked prior to assigning the identifier to the queue.

6. The method of claim 1, further comprising, in response to determining that the data is stored at the cache of the memory subsystem, storing the request at another queue for managing execution of requests to read other data present in the cache.

7. The method of claim 1, wherein determining whether the data is stored at the cache of the memory subsystem is based on whether a valid bit of the data is set in the cache.

8. The method of claim 1, further comprising incrementing a read counter and a fill counter associated with the queue in response to storing the request in the queue.

9. The method of claim 1, further comprising:

receiving a request to write other data to the memory subsystem;

determining whether an identifier of the request for writing the other data is the same as another identifier associated with the request for reading the data;

in response to determining that the identifier is the same as the other identifier, determining, based on the identifier, that the queue of the plurality of queues is to store the request to write the other data with the request to read the data and the other read requests for the data stored at the memory subsystem, wherein the identifier is assigned to the queue in response to determining that the data of a first request of the other read requests is not found in the cache; and

storing the request to write the other data at the determined queue.

10. A system, comprising:

a memory device; and

a processing device operably coupled with the memory device to:

receive a request to read data stored at a memory subsystem;

determine whether the data is stored at a cache of the memory subsystem;

in response to determining that the data is not stored at the cache of the memory subsystem, determine one of a plurality of queues to store the request with other read requests for the data stored at the memory subsystem, wherein each queue of the plurality of queues corresponds to a respective cache line of the cache; and

store the request at the determined queue with the other read requests for the data stored at the memory subsystem.

11. The system of claim 10, wherein to determine whether the data is stored at the cache of the memory subsystem, the processing device further determines whether a Content Addressable Memory (CAM) of a read-only cache includes an identifier associated with the data or a CAM of a write-read cache includes the identifier associated with the data.

12. The system of claim 11, wherein the plurality of queues are associated with the read-only cache and another plurality of queues are associated with the write-read cache, and wherein to determine whether the data is stored at the cache of the memory subsystem, the processing device further determines whether any of the plurality of queues is associated with the identifier or any of the another plurality of queues is associated with the identifier.

13. The system of claim 10, wherein to determine the queue of the plurality of queues to store the request with the other read requests for the data stored at the memory subsystem, the processing device is further to:

determine an identifier associated with the data based on the request, wherein the identifier comprises an address of the data; and

determine that the queue is assigned the identifier, wherein the queue is assigned the identifier in response to determining that the data in a first request included in the other requests is not found in the cache.

14. The system of claim 13, wherein the processing device further determines that the queue is invalid and unblocked prior to assigning the identifier to the queue.

15. The system of claim 10, wherein in response to determining that the data is stored at the cache of the memory subsystem, the processing device further stores the request at another queue for managing execution of requests to read data present in the cache.

16. The system of claim 10, wherein to determine whether the data is stored at the cache of the memory subsystem, the processing device further determines whether a valid bit of the data is set in the cache.

17. The system of claim 10, wherein the processing device is further to:

receive a request to write other data to the memory subsystem;

determine whether an identifier of the request for writing the other data is the same as another identifier associated with the request for reading the data;

in response to determining that the identifier is the same as the other identifier, determine, based on the identifier, that the queue of the plurality of queues is to store the request to write the other data with the request to read the data and the other read requests for the data stored at the memory subsystem, wherein the identifier is assigned to the queue in response to determining that the data of a first request of the other read requests is not found in the cache; and

store the request to write the other data at the determined queue.

18. A method, comprising:

determining that data requested by a plurality of read operations has been retrieved from a memory subsystem;

performing one or more fill operations to store the data at a cache line of a cache of the memory subsystem;

determining, by a processing device, one of a plurality of queues corresponding to the data, wherein each queue of the plurality of queues corresponds to a respective cache line of a plurality of cache lines of the cache of the memory subsystem; and

in response to performing the one or more fill operations to store the data at the cache line, performing the plurality of read operations in the order in which the memory subsystem received the plurality of read operations stored at the determined queue.

19. The method of claim 18, further comprising:

storing the data in a fill queue in an order in which the data is retrieved from memory components of the memory subsystem; and

performing the one or more fill operations by removing the data from the fill queue in the order in which the data was stored in the fill queue, to store the data at the cache line.

20. The method of claim 18, further comprising:

decrementing a fill counter for each of the one or more fill operations that have been performed.

Technical Field

Embodiments of the present disclosure relate generally to memory subsystems and, more particularly, to the use of an outstanding command queue for separate read-only and write-read caches in a memory subsystem.

Background

The memory subsystem may be a storage system, such as a Solid State Drive (SSD) or a Hard Disk Drive (HDD). The memory subsystem may be a memory module, such as a dual in-line memory module (DIMM), a small outline DIMM (SO-DIMM), or a non-volatile dual in-line memory module (NVDIMM). The memory subsystem may include one or more memory components that store data. The memory components may be, for example, non-volatile memory components and volatile memory components. In general, a host system may utilize a memory subsystem to store data in and retrieve data from a memory component.

Drawings

The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the disclosure.

FIG. 1 illustrates an example computing environment that includes a memory subsystem in accordance with some embodiments of the present disclosure.

FIG. 2 illustrates example cache components and local memory of a memory subsystem according to some embodiments of the present disclosure.

FIG. 3 is a flow diagram of an example method for using separate read-only and write-read caches based on a determined memory access workload of an application, according to some embodiments of the present disclosure.

FIG. 4 is a flow diagram of an example method for accumulating data in a cache using sectors of a fixed data size in a cache line, according to some embodiments of the present disclosure.

FIG. 5 illustrates an example read-only cache and write-read cache, according to some embodiments of the present disclosure.

FIG. 6 is a flow diagram of an example method for storing read requests for data not present in a cache in an outstanding command queue, according to some embodiments of the present disclosure.

FIG. 7 is a flow diagram of an example method for executing a request stored in an outstanding command queue according to some embodiments of the present disclosure.

FIG. 8 illustrates an example read-only outstanding command queue, write-read outstanding command queue, read-only content addressable memory, and write-read content addressable memory, according to some embodiments of the present disclosure.

FIG. 9 is a flow diagram of an example method for determining a schedule for executing requests in a memory subsystem in accordance with some embodiments of the present disclosure.

FIG. 10 is a flow diagram of another example method for determining a schedule for executing requests in a memory subsystem in accordance with some embodiments of the present disclosure.

FIG. 11 illustrates an example of determining a schedule for executing a request based on a priority indicator using a priority scheduler, according to some embodiments of the present disclosure.

FIG. 12 is a block diagram of an example computer system in which embodiments of the present disclosure may operate.

Detailed Description

Aspects of the present disclosure relate to the use of separate outstanding command queues for read-only and write-read caches in a memory subsystem. The memory subsystem is also referred to hereinafter as a "memory device". An example of a memory subsystem is a storage device coupled to a Central Processing Unit (CPU) via a peripheral interconnect (e.g., input/output bus, storage area network). Examples of storage devices include Solid State Drives (SSDs), flash drives, Universal Serial Bus (USB) flash drives, and Hard Disk Drives (HDDs). Another example of a memory subsystem is a memory module coupled to a CPU via a memory bus. Examples of memory modules include dual in-line memory modules (DIMMs), small outline DIMMs (SO-DIMMs), non-volatile dual in-line memory modules (NVDIMMs), and the like. In some embodiments, the memory subsystem may be a hybrid memory/storage subsystem. In general, a host system may utilize a memory subsystem that includes one or more memory devices. The host system may provide data to be stored at the memory subsystem and may request data to be retrieved from the memory subsystem.

The memory subsystem may contain a number of memory components that may store data from the host system. In some host systems, the performance of applications executing on the host system may be highly dependent on the speed at which data may be accessed in the memory subsystem. To speed up data access, conventional memory subsystems use the spatial and temporal locality of memory access patterns to optimize performance. These memory subsystems may use higher performance and lower capacity media called caches to store frequently accessed data (temporal locality) or data located in memory regions that have recently been accessed (spatial locality).

Each of the memory components may be associated with a protocol specifying a size of a management unit used by the memory component and/or a preferred size for requests to access data stored at the management unit. For example, a protocol for one memory component may specify that 512 kilobyte (KB) sized requests be performed for the memory component. An application executing on the host system may initially request to read 512 KB of data from the memory component, but due to the protocol of the bus used to communicate between the host system and the memory subsystem, the 512 KB request is typically broken down into smaller-granularity requests (e.g., eight 64 KB requests). Conventional memory subsystems may perform the smaller-granularity requests to obtain data from a memory component, which may then be stored in a cache and/or returned to the requesting application. Performing smaller-granularity requests on a memory component capable of handling larger-granularity requests may result in faster wear and reduced endurance of the memory component, because more read operations are performed at the memory component.
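The request decomposition described above can be sketched as follows. This is a minimal illustration; the function name and addresses are hypothetical, not from this disclosure:

```python
def split_request(address: int, size: int, granularity: int):
    """Break one large access into bus-granularity sub-requests."""
    return [(address + offset, granularity)
            for offset in range(0, size, granularity)]

# A 512 KB read over a bus that carries 64 KB requests yields eight
# smaller sub-requests, each addressing a consecutive 64 KB region.
sub_requests = split_request(0x0, 512 * 1024, 64 * 1024)
assert len(sub_requests) == 8
```

Each sub-request lands on the same management unit, which is why coalescing them again before touching the backing store (as discussed below) reduces wear.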

In addition, some applications executing on the host system may use the memory subsystem as main memory. In such an example, the address space typically has separate memory address regions for read data and write data. In conventional memory subsystems, a single cache capable of both writing and reading data may be used, which may be undesirable for different memory access workloads. For example, read and write request latencies may differ, and using a single cache for an application that writes and reads to different address spaces may reduce the performance of the memory subsystem.

The different types of memory access workloads may be sequential (in order) and random (out of order) accesses. For example, an application may request that original data be read from an address, that different data be written to the address, and that different data be read from the address. If the requests are not processed correctly in order, there may be a data hazard such as the memory subsystem returning erroneous data to the application (e.g., returning the original data in response to a read request for different data before writing the different data).

Further, in some examples, an application may request access to data at different addresses. The data may be located at the same or different memory components. The latency of returning data at different addresses from the same or different memory components may vary based on various factors such as the speed of the memory components, the size of the data requested, and so forth. Conventional memory subsystems typically wait until data at the address of a first received request is returned from the memory component, regardless of whether data at a different address of another request is returned from the memory component sooner. That is, data at different addresses may be in an idle state after returning from the memory component until the data at the address of the first received request is stored in the cache. This may reduce data throughput in the memory subsystem.

Aspects of the present disclosure address the above and other deficiencies by using separate read-only and write-read caches in a memory subsystem. Separate read-only and write-read caches in the memory subsystem front end may provide different spaces for applications executing on the host system to read and write. For example, an application may request certain virtual addresses that are translated by the host operating system to logical addresses. The logical addresses may be translated into physical addresses that may be maintained in different spaces for reading and writing using the separate read-only and write-read caches. The separate read-only and write-read caches may be located between the host system and the memory components of the memory subsystem (also referred to as the "backing store"). The read-only cache may be used for sequential read requests for data in the memory components, while the write-read cache may be used for handling read and write requests for data in the memory components. The separate caches may improve the performance of the memory subsystem by reading/writing data faster than accessing the slower backing store for each request. In addition, the separate caches improve the endurance of the backing store by reducing the number of requests to the backing store.

In some embodiments, the memory subsystem may detect a memory access workload, such as a sequential memory access workload or a random memory access workload. Sequential memory access workloads may refer to read requests that occur sequentially for the same or sequential addresses. Data requested in the sequential memory access workload may be filled in the read-only cache for faster access than each use of the backing store.

A random memory access workload may refer to randomly occurring writes and reads. Some applications may use random memory access workloads. Data associated with random write and read requests may be populated in the write-read cache. For example, data requested to be written to the backing store may initially be written to the write-read cache, and when that data is later read, the write-read cache may return the written data without accessing the backing store.
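A workload-detection heuristic of this kind might be sketched as follows. The fixed-stride check is an assumption chosen for illustration; the disclosure does not specify the detection algorithm:

```python
def classify_workload(addresses, window=8):
    """Label a window of recent request addresses as sequential or random."""
    recent = addresses[-window:]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    # Constant stride across the window -> treat as sequential.
    if deltas and all(d == deltas[0] for d in deltas):
        return "sequential"
    return "random"

def select_cache(workload: str) -> str:
    # Sequential reads fill the read-only cache; random read/write
    # traffic is routed to the write-read cache.
    return "read-only cache" if workload == "sequential" else "write-read cache"

assert select_cache(classify_workload([0, 64, 128, 192])) == "read-only cache"
assert select_cache(classify_workload([4096, 64, 512])) == "write-read cache"
```

A real controller would likely track per-stream state in hardware rather than inspecting a window of addresses in software, but the routing decision is the same.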

Each of the read-only cache and the write-read cache may use a respective Content Addressable Memory (CAM) to determine whether data associated with a request received from the host system is present in the read-only cache and/or the write-read cache. For example, the memory subsystem may use the CAMs to determine whether the tag of the requested data is stored in the read-only cache and/or the write-read cache. A data request has an address specifying the location of the requested data. The address may be broken into multiple portions, such as an offset that identifies a particular location within a cache line, a set index that identifies the set containing the requested data, and a tag that includes one or more bits of the address that may be stored in each cache line along with its data to distinguish different addresses that may be placed in the set. The CAM corresponding to the read-only cache or the write-read cache that is to store the requested data may store the tag of the requested data, enabling a lookup to be performed more quickly upon receipt of a request than searching the cache itself.
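The address split and CAM lookup can be illustrated with a small sketch. The field widths (6 offset bits, 4 set-index bits) are assumptions chosen for the example, not values from this disclosure:

```python
OFFSET_BITS = 6  # assumed: 64-byte addressable region within a line
SET_BITS = 4     # assumed: 16 sets

def decompose(address: int):
    """Split an address into (tag, set index, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    set_index = (address >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset

class Cam:
    """Toy CAM: maps a tag to the index of the cache line holding the data."""
    def __init__(self):
        self._entries = {}
    def lookup(self, tag):
        return self._entries.get(tag)  # None indicates a miss
    def insert(self, tag, line_index):
        self._entries[tag] = line_index

read_only_cam = Cam()
tag, set_index, offset = decompose(0x1A40)
assert read_only_cam.lookup(tag) is None  # miss: nothing cached yet
read_only_cam.insert(tag, 3)
assert read_only_cam.lookup(tag) == 3     # hit on the next request
```

A hardware CAM compares the tag against all entries in parallel; the dictionary here only models the tag-to-line association, not the parallel search.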

Further, as discussed above, the host system may provide requests for data (e.g., 512 bytes) broken into smaller 64-byte granularity requests based on the protocol used by the memory bus communicatively coupling the host system to the memory subsystem. In some embodiments, each of the read-only cache and the write-read cache uses sectors to aggregate smaller-granularity requests into larger-granularity cache lines (e.g., eight 64-byte requests aggregated to fill a 512-byte cache line). The sectors may have a fixed size specified by the memory access protocol used by the host system and the size of the management unit of the memory component in the backing store where the data is stored. For example, if the size of the management unit in the memory component is 512 bytes and the protocol specifies 64-byte requests, the sectors may have a fixed data size of 64 bytes and the cache line may contain eight sectors to equal the 512-byte management unit. In some examples, the management unit may instead be 128 bytes, and the cache line may use only two sectors with a fixed data size of 64 bytes. The number of sectors of the write-read cache may be greater than the number of sectors of the read-only cache because fewer writes then need to be performed to the backing store, increasing the endurance of the backing store. The memory subsystem may perform one 512-byte request instead of eight 64-byte requests to the backing store, reducing the number of requests to backing stores with large management units and thereby improving the endurance of the memory components.
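The sector-accumulation idea can be sketched as follows, using the 64-byte sector and 512-byte cache line sizes from the example above (the class and method names are illustrative, not from this disclosure):

```python
SECTOR_SIZE = 64
SECTORS_PER_LINE = 8  # 8 x 64 bytes = one 512-byte management unit

class SectoredCacheLine:
    """Accumulates small writes so the backing store sees one large request."""
    def __init__(self):
        self.sectors = [None] * SECTORS_PER_LINE

    def write_sector(self, index: int, data: bytes):
        assert len(data) == SECTOR_SIZE
        self.sectors[index] = data

    def is_full(self) -> bool:
        return all(s is not None for s in self.sectors)

    def flush(self) -> bytes:
        # One 512-byte write to the backing store replaces eight
        # 64-byte writes, reducing wear on the memory component.
        assert self.is_full()
        return b"".join(self.sectors)

line = SectoredCacheLine()
for i in range(SECTORS_PER_LINE):
    line.write_sector(i, bytes([i]) * SECTOR_SIZE)
assert len(line.flush()) == 512
```

For the 128-byte management unit mentioned above, the same structure would simply use `SECTORS_PER_LINE = 2`.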

In some embodiments, the read-only cache and/or the write-read cache may be preloaded with data prior to receiving a memory access request from a host system. For example, the read-only cache and/or the write-read cache may be preloaded during initialization of an application program executing on the host system. The memory protocol may contain semantics that enable an application to send a preload instruction to the memory subsystem to preload the read-only cache and/or the write-read cache with desired data. One or more read requests may be generated by the memory subsystem to obtain data from the backing store. As described below, the outstanding command queue may be used to store requests in the order in which they were generated, and priority scheduling may be performed to determine the schedule for executing the requests. A fill operation may be generated to store data obtained from the backing store in one or more sectors of a cache line in the read-only cache and/or the write-read cache. The application may send the preload instructions based on data that the application typically uses during execution or data that the application plans to use.

In addition, outstanding command queues may be used to store read requests and write requests to prevent data hazards and to improve the quality of service of accessing data in the memory subsystem. The outstanding command queues may increase request traffic throughput based on different types of traffic in the memory subsystem. For example, the memory subsystem may use control logic and the outstanding command queues to provide ordered access to data requested at the same cache line and unordered access to data requested at different cache lines. Separate outstanding command queues may be used for the read-only cache and the write-read cache. Each queue of the read-only outstanding command queues may correspond to a respective cache line in the read-only cache, and each queue of the write-read outstanding command queues may correspond to a respective cache line in the write-read cache. The number of queues in each set of outstanding command queues may be less than the number of cache lines in the read-only cache and the write-read cache.

Typically, the request may be received from a host system. A read-only Content Addressable Memory (CAM) and a write-read CAM may be searched to determine whether a matching tag associated with the address contained in the request is present in the CAM. If a matching tag is found, the data may be returned from the corresponding cache line for a read request or the data may be written to the cache line for a write request. If no matching tag is found in either CAM, the read-only outstanding command queue and the write-read outstanding command queue may be searched for a matching tag. If a matching tag is found in any of the outstanding command queues, there is a pending request for the cache line assigned to the tag, and the received request is stored in the queue after other requests for data at that address. If no matching tag is found in any of the outstanding command queues, one queue may be selected as the desired outstanding command queue, and the requested tag may be assigned to the selected outstanding command queue. In addition, the memory subsystem may set a block bit to block the selected outstanding command queue and store the request in the selected outstanding command queue. The requests may be processed in the order in which the requests were received in the same queue. The different cache lines may be accessed out of order based on when the request is received and by using the block bit to block and unblock the different outstanding command queues assigned to the different cache lines, as described in further detail below.
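The miss-handling flow above can be sketched roughly as follows. This is a simplified model: the queue structure, the block bit, and the routing function are illustrative, and CAM hits are assumed to have been handled already:

```python
from collections import deque

class OutstandingCommandQueue:
    def __init__(self):
        self.tag = None          # tag of the cache line this queue serves
        self.blocked = False     # block bit set while a fill is pending
        self.requests = deque()  # requests processed in arrival order

def route_miss(tag, request, queues):
    # A queue already assigned this tag: append behind the pending
    # requests so accesses to the same cache line stay ordered.
    for q in queues:
        if q.tag == tag:
            q.requests.append(request)
            return q
    # Otherwise claim a free (unassigned, unblocked) queue, assign the
    # tag, and set the block bit while the backing store is accessed.
    for q in queues:
        if q.tag is None and not q.blocked:
            q.tag = tag
            q.blocked = True
            q.requests.append(request)
            return q
    return None  # no queue available: the request must stall

queues = [OutstandingCommandQueue() for _ in range(4)]
first = route_miss(0xAB, "read A", queues)
second = route_miss(0xAB, "read A again", queues)
other = route_miss(0xCD, "read B", queues)
assert first is second and len(first.requests) == 2
assert other is not first  # different cache lines use different queues
```

Because each queue serializes only the requests sharing its tag, requests for different cache lines can complete in any order, which is the out-of-order behavior the block bit enables.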

In some embodiments, to further improve the performance and quality of service of the memory subsystem, a priority scheduler may be used with the priority queue to determine the scheduling of when to perform request and fill operations. As described above, the outstanding command queue may queue misses for read requests and misses for write requests for data in the cache. The priority scheduler may determine when to perform scheduling of requests based on when the requests are received. The priority scheduler may generate and assign a priority indicator (e.g., a token having a priority value) to the requests to maintain an order of the generated requests and fill operations to store data obtained from the backing store at the cache line of the cache.

For example, for a read request miss, the priority scheduler may generate a priority indicator having a higher priority value for a fill operation associated with the particular read request, which may be assigned when data associated with the particular read request is obtained from the backing store. When a request is stored in the outstanding command queue and its scheduling for execution is determined, the priority scheduler may relay the request to be stored in the priority queue. The requests may be processed in the order in which they are stored in the priority queue to obtain data associated with the requests from the backing store or to write data associated with the requests to the backing store. Data returned from the backing store may be stored in a fill queue with a fill operation assigned a priority indicator. The priority indicator may specify that a fill operation is to be performed first in the fill queue and may be used to regulate processing of requests through the outstanding command queue.
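A priority-queue sketch of this scheduling idea follows, assuming for illustration that a lower numeric value means higher priority and that fill operations for read misses receive the lowest values (the encoding is an assumption, not specified by this disclosure):

```python
import heapq

fill_queue = []  # min-heap of (priority, arrival order, operation)
arrival = 0

def enqueue(priority: int, operation: str):
    """Queue an operation; the sequence number keeps equal priorities FIFO."""
    global arrival
    heapq.heappush(fill_queue, (priority, arrival, operation))
    arrival += 1

enqueue(1, "fill cache line 3")  # fill for an earlier read miss
enqueue(5, "read cache line 7")
enqueue(1, "fill cache line 9")  # another fill: jumps ahead of the read

order = [heapq.heappop(fill_queue)[2] for _ in range(3)]
assert order == ["fill cache line 3", "fill cache line 9", "read cache line 7"]
```

The arrival counter in the tuple preserves the received order among operations with the same priority, mirroring the requirement above that requests be processed in the order they were stored.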

As described further below, certain scenarios exist in which requests for data stored at different cache lines may be executed out of order. That is, one read request may be executed from an outstanding command queue, but another request for the same data in the same outstanding command queue may be blocked to allow a request in a different outstanding command queue to be executed. In such examples, requests may be executed out of order based on the priority indicator assigned to each request and the fill operation associated with the request. Requests may be executed out of order between the outstanding command queues so that applications do not have to wait to obtain data from the backing store. Such techniques may improve the quality of service of data returned to the host system, thereby improving the performance of the memory subsystem.

Advantages of the present disclosure include, but are not limited to, improving the endurance of a memory component by using sectored cache lines to accumulate requests such that the number of requests performed on the memory component can be reduced. Moreover, the use of separate read-only and write-read caches may provide separate spaces for applications executing on the host system to read data from and write data to. The separate spaces may improve the data-access performance of an application by detecting the type of memory access workload used by the application and selecting the appropriate cache to service the application's memory accesses. In addition, by using the outstanding command queues and the priority scheduler to determine the scheduling of request execution, the quality of service and performance of the memory subsystem may be improved.

FIG. 1 illustrates an example computing environment 100 that includes a memory subsystem 110, according to some embodiments of the present disclosure. Memory subsystem 110 may contain media, such as memory components 112A through 112N. The memory components 112A-112N may be volatile memory components, non-volatile memory components, or a combination thereof. In some embodiments, the memory subsystem is a storage system. An example of a storage system is an SSD. In some embodiments, memory subsystem 110 is a hybrid memory/storage subsystem. In general, the computing environment 100 may contain a host system 120 that uses a memory subsystem 110. For example, the host system 120 may write data to the memory subsystem 110 and read data from the memory subsystem 110.

The host system 120 may be a computing device such as a desktop computer, a laptop computer, a network server, a mobile device, or a similar computing device that includes a memory and a processing device. The host system 120 may contain or be coupled to the memory subsystem 110 such that the host system 120 may read data from or write data to the memory subsystem 110. The host system 120 may be coupled to the memory subsystem 110 via a physical host interface. As used herein, "coupled to" generally refers to a connection between components, which may be an indirect communication connection or a direct communication connection (e.g., without intervening components), whether wired or wireless, including connections such as electrical, optical, magnetic, and the like. Examples of a physical host interface include, but are not limited to, a Serial Advanced Technology Attachment (SATA) interface, a Peripheral Component Interconnect express (PCIe) interface, a Universal Serial Bus (USB) interface, Fibre Channel, Serial Attached SCSI (SAS), and the like. The physical host interface may be used to transfer data between the host system 120 and the memory subsystem 110. When the memory subsystem 110 is coupled with the host system 120 over a PCIe interface, the host system 120 may further utilize an NVM Express (NVMe) interface to access the memory components 112A-112N. The physical host interface may provide an interface for passing control, address, data, and other signals between the memory subsystem 110 and the host system 120.

The memory components 112A-112N may include any combination of different types of non-volatile memory components and/or volatile memory components. An example of non-volatile memory components includes NAND type flash memory. Each of the memory components 112A-112N may include one or more arrays of memory cells, such as single level cells (SLC) or multi-level cells (MLC) (e.g., triple level cells (TLC) or quad-level cells (QLC)). In some embodiments, a particular memory component may contain both an SLC portion and an MLC portion of memory cells. Each of the memory cells may store one or more bits of data (e.g., a block of data) used by the host system 120. Although non-volatile memory components such as NAND type flash memory are described, the memory components 112A-112N may be based on any other type of memory, such as volatile memory. In some embodiments, the memory components 112A-112N may be, but are not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Phase Change Memory (PCM), Magnetic Random Access Memory (MRAM), NOR flash memory, Electrically Erasable Programmable Read Only Memory (EEPROM), and a cross-point array of non-volatile memory cells. A cross-point array of non-volatile memory may perform bit storage based on a change in bulk resistance, in conjunction with a stackable cross-grid data access array. Additionally, in contrast to many flash-based memories, cross-point non-volatile memory may perform a write in-place operation, in which a non-volatile memory cell may be programmed without the non-volatile memory cell being previously erased. Furthermore, the memory cells of the memory components 112A-112N may be grouped into memory pages or data blocks, which may refer to a unit of the memory component used to store data.

A memory system controller 115 (hereinafter referred to as a "controller") may communicate with the memory components 112A to 112N to perform operations such as reading data, writing data, or erasing data at the memory components 112A to 112N, and other such operations. The controller 115 may include hardware such as one or more integrated circuits and/or discrete components, buffer memory, or a combination thereof. The controller 115 may be a microcontroller, special purpose logic circuitry (e.g., a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), etc.), or another suitable processor. The controller 115 may include a processor (processing device) 117 configured to execute instructions stored in a local memory 119. In the illustrated example, the local memory 119 of the controller 115 includes embedded memory configured to store instructions for performing various processes, operations, logic flows, and routines that control operation of the memory subsystem 110, including handling communications between the memory subsystem 110 and the host system 120. In some embodiments, the local memory 119 may contain memory registers storing memory pointers, prefetched data, and the like. The local memory 119 may also include a Read Only Memory (ROM) for storing microcode. While the example memory subsystem 110 in FIG. 1 has been illustrated as containing the controller 115, in another embodiment of the present disclosure, the memory subsystem 110 may not contain a controller 115, and may instead rely on external control (e.g., provided by an external host, or by a processor or controller separate from the memory subsystem).

In general, the controller 115 may receive commands or operations from the host system 120 and may convert the commands or operations into instructions or appropriate commands to achieve the desired access to the memory components 112A-112N. The controller 115 may be responsible for other operations such as wear leveling operations, garbage collection operations, error detection and Error Correction Code (ECC) operations, encryption operations, caching operations, and address translation between logical block addresses and physical block addresses that are associated with the memory components 112A-112N. The controller 115 may also include host interface circuitry to communicate with the host system 120 via the physical host interface. The host interface circuitry may convert commands received from the host system into command instructions to access the memory components 112A-112N, and convert responses associated with the memory components 112A-112N into information for the host system 120.

Memory subsystem 110 may also contain additional circuitry or components not shown. In some embodiments, memory subsystem 110 may contain a cache or buffer (e.g., DRAM) and address circuitry (e.g., row decoder and column decoder) that may receive addresses from controller 115 and decode the addresses to access memory components 112A-112N.

Memory subsystem 110 contains cache component 113, which may use separate read-only and write-read caches in the memory subsystem. In some embodiments, controller 115 includes at least a portion of cache component 113. For example, the controller 115 may include a processor 117 (processing device) configured to execute instructions stored in the local memory 119 to perform the operations described herein. In some embodiments, cache component 113 is part of the host system 120, an application program, or an operating system.

Cache component 113 may use separate read-only and write-read caches in memory subsystem 110. The read-only cache may be used for sequential read requests for data in the memory components, while the write-read cache may be used for handling both read and write requests for data in the memory components. The separate caches may improve the performance of the memory subsystem by reading and writing data faster than accessing the slower backing store for each request. In addition, the separate caches increase the endurance of the backing store by using sectors in the cache lines to reduce the number of requests sent to the backing store. In some embodiments, cache component 113 may detect a memory access workload, such as a sequential memory access workload or a random memory access workload. Data requested in a sequential memory access workload may be filled into the read-only cache for faster access than using the backing store each time. Data associated with random write and read requests may be populated in the write-read cache. In some embodiments, the cache component 113 may receive preload instructions from one or more applications executing on the host system 120 and preload the read-only cache and/or the write-read cache to improve quality of service.

In addition, the cache component 113 may use outstanding command queues to store read requests and write requests, to prevent data hazards and improve the quality of service of accessing data in the memory subsystem. The outstanding command queues may increase request traffic throughput based on the different types of traffic in the memory subsystem. The controller may use control logic and the outstanding command queues to provide in-order access for data requested at the same cache line and out-of-order access for data requested at different cache lines.

In some embodiments, to further improve the performance and quality of service of the memory subsystem, the cache component 113 may use a priority scheduler with a priority queue to determine a schedule of when to perform requests and fill operations. As described above, the outstanding command queues may queue read requests and write requests that miss for data in the cache. The priority scheduler may determine the schedule for performing the requests based on when the requests are received. The priority scheduler may generate and assign priority indicators (e.g., tokens having priority values) to the requests, to maintain the order of the requests and of the fill operations generated to store data obtained from the backing store at cache lines of the cache.
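The priority-indicator scheme described above may be sketched as follows. This is an illustrative model only, not the controller's actual implementation; the class name, token representation, and operation strings are assumptions made for the example:

```python
from itertools import count

class PriorityScheduler:
    """Assigns monotonically increasing priority tokens so that requests
    and fill operations are executed in the order they were generated."""
    def __init__(self):
        self._token = count()   # next priority value to hand out
        self._pending = []      # (token, operation) pairs awaiting execution

    def submit(self, operation):
        token = next(self._token)
        self._pending.append((token, operation))
        return token

    def next_operation(self):
        # The lowest token marks the oldest operation; it executes first.
        self._pending.sort(key=lambda pair: pair[0])
        return self._pending.pop(0)[1]

sched = PriorityScheduler()
sched.submit("read-miss for line A")
sched.submit("fill for line B")
assert sched.next_operation() == "read-miss for line A"
```

A hardware scheduler could additionally weight fill operations above new requests, as described above, by biasing the comparison rather than using strict arrival order.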

FIG. 2 illustrates an example cache component 113 and local memory 119 of the memory subsystem 110 according to some embodiments of the present disclosure. As depicted, local memory 119 may contain a separate read-only cache 200 and write-read cache 202. Cache component 113 may contain a read-only Content Addressable Memory (CAM) 204 for read-only cache 200, a write-read CAM 206 for write-read cache 202, read-only outstanding command queues 208, and write-read outstanding command queues 210. The read-only outstanding command queues 208 and the write-read outstanding command queues 210 may be first-in-first-out (FIFO) queues. The structure and contents of the read-only CAM 204, the write-read CAM 206, the read-only outstanding command queues 208, and the write-read outstanding command queues 210 are discussed further below. Cache component 113 also includes a priority scheduler 212 that uses priority indicators (e.g., digital tokens) to determine a schedule for performing request and/or fill operations. Cache component 113 may contain a state machine that determines the number of read requests required to fill a cache line of read-only cache 200 or write-read cache 202 with data from the backing store, based on the size of the cache line. The priority scheduler 212 may also contain arbitration logic that determines the order in which requests and/or fill operations are executed. The arbitration logic may specify that requests and/or fill operations are scheduled in the order in which the operations are received. One purpose of the arbitration logic may be to avoid keeping an application waiting while data is obtained from the backing store by cache component 113. To that end, the priority scheduler 212 may assign a higher priority to fill operations and their data. Additional functionality of the priority scheduler 212 is discussed below.

Cache component 113 also contains various queues used for different purposes. The queues may be first-in-first-out (FIFO) queues. As such, the queues may be used to process requests, operations, and/or data in the order in which they were received and stored in the various queues. Cache component 113 may include a fill queue 214, a hit queue 216, an evict queue 218, a priority queue 220, and a pending queue 222. Fill queue 214 may store data obtained from the backing store and the fill operations generated for that data. A fill operation may be generated when a read request is received and the requested data is not found in read-only cache 200 or write-read cache 202 (a cache miss). Hit queue 216 may store requests for data found in read-only cache 200 or write-read cache 202 (cache hits).

Evict queue 218 may be used to evict data from read-only cache 200 and/or write-read cache 202 as needed. For example, when read-only cache 200 and/or write-read cache 202 is full (each cache line contains at least some valid data in one or more sectors), a cache line having the least recently used data may be selected for eviction using, for example, a least recently used eviction policy. The data of the selected cache line may be read from read-only cache 200 and/or write-read cache 202 and stored in evict queue 218. The selected cache line may then be invalidated by setting its valid bit to the invalid state. The invalidated cache line may be used to store subsequent data.
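A least recently used eviction policy of the kind described above may be sketched with a simplified model. The class and method names here are illustrative only and are not part of the described embodiments:

```python
from collections import OrderedDict

class LruCache:
    """Selects the least recently used cache line for eviction when full."""
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()   # tag -> data; insertion order tracks recency

    def access(self, tag, data):
        """Touch a line; returns the evicted (tag, data) pair, if any."""
        evicted = None
        if tag in self.lines:
            self.lines.move_to_end(tag)   # refresh recency on a hit
        else:
            if len(self.lines) >= self.num_lines:
                # Oldest entry is least recently used; send it to the evict queue.
                evicted = self.lines.popitem(last=False)
            self.lines[tag] = data
        return evicted

cache = LruCache(num_lines=2)
cache.access(0x1, "a")
cache.access(0x2, "b")
cache.access(0x1, "a")                         # 0x1 becomes most recent
assert cache.access(0x3, "c") == (0x2, "b")    # 0x2 was least recently used
```

In the embodiments described here, the evicted line's data would be written to evict queue 218 and its valid bit cleared, rather than simply discarded.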

The priority queue 220 may store requests to be executed on the backing store. Priority scheduler 212 may assign a priority indicator to each request, and/or to the fill operation generated for the request, at the time the request is received. Priority scheduler 212 may use the priority indicators to determine the schedule for executing requests and/or fill operations. Based on the determined schedule, the priority scheduler 212 stores the requests in the priority queue 220, and the requests are executed on the backing store in the order in which they were stored in the priority queue 220. When no read-only outstanding command queue 208 or write-read outstanding command queue 210 is available, the pending queue 222 may store received requests for data not found in the caches 200 and 202.

Read-only cache 200 and write-read cache 202 contained in local memory 119 may provide faster access to data stored in the slower memory components of the backing store. Read-only cache 200 and write-read cache 202 may be high-performance, low-capacity media that store data frequently accessed by applications of host system 120 (temporal locality) or data located in memory regions that have recently been accessed (spatial locality). An application binary or a paging software system that uses the memory subsystem as an address space may have separate memory address regions for reading data from and writing data to the system using the read-only cache 200 and the write-read cache 202. Each of read-only cache 200 and write-read cache 202 may contain many cache lines. As discussed further below, each cache line may include one or more sectors of a fixed size.

For a read request, cache component 113 searches read-only CAM 204 and write-read CAM 206 to determine whether a matching tag is found. Finding a matching tag indicates that the data is stored at a cache line of the read-only cache 200 or the write-read cache 202, depending on which CAM 204 or 206 the matching tag is found in. If there is a hit, meaning a matching tag is found in one of the CAMs 204 or 206, the request is executed relatively quickly compared to accessing the backing store. If there is a miss, meaning that no matching tag is found in either of the CAMs 204 or 206, then the read-only outstanding command queues 208 and the write-read outstanding command queues 210 are searched for a matching tag. If there is a hit and a matching tag is found in one of the outstanding command queues 208 or 210, the request is stored in the outstanding command queue assigned the matching tag. If there is a miss in the outstanding command queues 208 and 210, one of the outstanding command queues 208 or 210 may be selected and assigned the tag contained in the address of the request. The outstanding command queues 208 and 210 may prevent data hazards by causing requests for the same cache line to be processed in the order in which they are received. In addition, the outstanding command queues 208 and 210 may improve quality of service and performance by allowing requests for different cache lines to be executed out of order, so that a later request whose data is obtained sooner does not wait behind an earlier request.
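The lookup sequence above (CAM hit, outstanding-command-queue hit, or full miss) may be illustrated with a small dispatch sketch. The function name, the set/dictionary representations of the CAMs and queues, and the returned status strings are assumptions made for the example, not the described hardware:

```python
def handle_read(tag, ro_cam, wr_cam, ro_ocqs, wr_ocqs, hit_queue):
    """Dispatch one read: cache hit -> hit queue; OCQ tag match -> append to
    that queue (ordered per cache line); otherwise claim a free OCQ."""
    if tag in ro_cam or tag in wr_cam:
        hit_queue.append(tag)                 # data is in a cache line
        return "cache-hit"
    for ocq in ro_ocqs + wr_ocqs:
        if ocq["valid"] and ocq["tag"] == tag:
            ocq["requests"].append(tag)       # same line: preserve order
            return "ocq-hit"
    for ocq in ro_ocqs:                       # miss: claim a free read-only OCQ
        if not ocq["valid"]:
            ocq.update(valid=True, tag=tag, requests=[tag])
            return "ocq-allocated"
    return "pending"                          # all OCQs busy: pending queue

ro_ocqs = [{"valid": False, "tag": None, "requests": []}]
wr_ocqs = [{"valid": False, "tag": None, "requests": []}]
hits = []
assert handle_read(0xA, {0xA}, set(), ro_ocqs, wr_ocqs, hits) == "cache-hit"
assert handle_read(0xB, set(), set(), ro_ocqs, wr_ocqs, hits) == "ocq-allocated"
assert handle_read(0xB, set(), set(), ro_ocqs, wr_ocqs, hits) == "ocq-hit"
```

Because each outstanding command queue is bound to a single tag while valid, requests for one cache line stay ordered while requests for other lines proceed independently.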

A read-only outstanding command queue 208 or a write-read outstanding command queue 210 may be selected based on the type of memory access workload currently exhibited by the application or based on the type of the request. For example, if the memory access workload is sequential, a read-only outstanding command queue may be selected. If the memory access workload is random, a write-read outstanding command queue may be selected. If the request is to write data, a write-read outstanding command queue may be selected. In each case, an outstanding command queue having a valid bit set to the invalid state may be selected, and the tag of the request may be assigned to the selected outstanding command queue 208 or 210. Each of outstanding command queues 208 and 210 may correspond to a single cache line in cache 200 or 202 at a given time. When the tag is assigned, the valid bit of the selected one of the outstanding command queues 208 or 210 may be set to the valid state. If every outstanding command queue is in use, as indicated by its valid bit being set to the valid state, the request may be stored in the pending queue 222 until one of the read-only outstanding command queues 208 or the write-read outstanding command queues 210 becomes invalid.

For a write request, cache component 113 can search read-only CAM 204 and invalidate a cache line in read-only cache 200 if that cache line contains data for the address being requested. Cache component 113 can use write-read CAM 206 to identify an empty, invalid cache line in the write-read cache. The data may be written to the selected cache line in the write-read cache 202. A dirty bit may be set in the write-read CAM 206 for the cache line to indicate that data has been written to the cache line. Writing data to the cache may be performed faster than writing data to the slower backing store. Subsequent write requests may write data to the same or different cache lines, and the dirty bit may be set in the write-read CAM 206 for each cache line to which a subsequent write request is performed. Further, if the data associated with a subsequent write request is found in read-only cache 200, that data may be invalidated there. During operation, when the memory subsystem determines to flush either of caches 200 or 202, the dirty cache lines may be identified and queued in evict queue 218 to be sent to the backing store.

FIG. 3 is a flow diagram of an example method 300 for using separate read-only and write-read caches based on a determined memory access workload of an application, in accordance with some embodiments of the present disclosure. Method 300 may be performed by processing logic that may comprise hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 300 is performed by cache component 113 of FIG. 1. Although shown in a particular order or sequence, the order of the processes may be modified unless otherwise specified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes may be performed in a different order, and some processes may be performed in parallel. Furthermore, one or more processes may be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are also possible.

At operation 310, the processing device determines a memory access workload of an application. The processing device may determine the memory access workload of the application by receiving a set of memory access requests from the application, determining a pattern based on the set of memory access requests, and determining the memory access workload of the application based on the pattern. For example, if the requests read the same or sequential addresses in a similar address region, the processing device may determine that the memory access workload is sequential and that the read-only cache should be used to store data associated with the requests. Further, if the pattern indicates that sequential read requests or operations are received consecutively, the processing device may determine that the memory access workload is sequential and that the read-only cache should be used to store data associated with the requests. If the pattern indicates that random read requests and write requests are being received from the application, the processing device may determine that the memory access workload of the application is random and that the write-read cache should be used to store data associated with the requests. In some embodiments, the write-read cache is used to store data associated with any write request.
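A minimal sketch of the pattern classification described above follows. The function name, the specific rule (consecutive addresses separated by exactly one request size, or repeated), and the request size are assumptions for illustration; the embodiments do not prescribe a particular detection heuristic:

```python
def classify_workload(addresses, request_size=64):
    """Classify a burst of read addresses as 'sequential' when each address
    either repeats the previous one or follows it by one request size."""
    for prev, curr in zip(addresses, addresses[1:]):
        if curr not in (prev, prev + request_size):
            return "random"
    return "sequential"

# Consecutive 64-byte strides in one region: sequential workload,
# so the read-only cache would be used.
assert classify_workload([0x000, 0x040, 0x080, 0x0C0]) == "sequential"
# A large jump between addresses: random workload, write-read cache.
assert classify_workload([0x000, 0x400, 0x040]) == "random"
```

A real detector might tolerate small gaps or track several streams at once; the point is only that the classification drives which cache receives the data.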

At operation 320, the processing device determines whether the memory access workload of the application is associated with sequential read operations. For example, as described above, it may be determined whether the memory access workload of the application is sequential or random. At operation 330, based on determining whether the memory access workload of the application is associated with sequential read operations, the processing device stores data associated with the application at either a cache of a first type (read-only) or a cache of a second type (write-read). When the memory access workload is associated with sequential read operations, the processing device stores the data associated with the application at the cache of the first type. In some embodiments, if the processing device determines that the memory access workload is associated with write and read operations, the processing device may store the data associated with the application at the cache of the second type.

The processing device may determine whether the data is present in the read-only cache or the write-read cache by searching the corresponding read-only CAM and write-read CAM. If the data is present in a cache line of either cache, the read request may be performed and the data may be returned to the application. If the data is not present, the read-only outstanding command queues and the write-read outstanding command queues may be searched for a tag associated with the address of the requested data. If no matching tag is found in the read-only outstanding command queues, the read request may be stored in one of the read-only outstanding command queues and executed to obtain the data associated with the request from the backing store. If a matching tag is found in a read-only outstanding command queue, one or more requests for the cache line are already stored in that outstanding command queue, and the current request is stored after the other requests in the read-only outstanding command queue. The current request will be executed after the other requests for the particular cache line, based on the schedule determined by the priority scheduler. Further details regarding the operation of the outstanding command queues and the priority scheduler are discussed below.

In some embodiments, the processing device may receive data associated with the application in one or more requests to write data to a memory component. The one or more write requests may have a fixed data size. The fixed data size is specified by the memory semantics of the protocol used to communicate between the host system and the memory subsystem via a bus. Based on determining that the memory access workload of the application is associated with write and read operations, the processing device may store the data associated with the application at one or more sectors of a cache line of the second type of cache, to accumulate data in the cache line. Each of the one or more sectors has the fixed data size. The processing device may determine when an accumulated data size of the one or more sectors storing the data associated with the application satisfies a threshold condition. In response to determining that the accumulated data size of the one or more sectors storing the data associated with the application satisfies the threshold condition, the processing device may transmit a request to store the accumulated data at the memory component. When each sector in the cache line contains valid data, a single write request may be sent to the backing store to write the accumulated data of the cache line. In this way, instead of issuing eight write requests to the backing store, only one write request for the cache line may be issued to the backing store. Using this technique can improve the endurance of the memory components in the backing store by performing fewer write operations.
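The accumulate-then-flush behavior described above can be modeled briefly. This is an illustrative sketch, not the controller logic; the class name and the convention of clearing the line after flushing are assumptions made for the example:

```python
class WriteReadLine:
    """Accumulates fixed-size host writes in a cache line's sectors and
    issues one combined backing-store write once the line is full."""
    def __init__(self, num_sectors=8, sector_size=64):
        self.sector_size = sector_size
        self.sectors = [None] * num_sectors   # None = sector not yet valid
        self.backing_store_writes = 0

    def write_sector(self, index, data):
        self.sectors[index] = data            # in effect, valid + dirty
        if all(s is not None for s in self.sectors):
            self.backing_store_writes += 1    # one write for the whole line
            self.sectors = [None] * len(self.sectors)

line = WriteReadLine()
for i in range(8):
    line.write_sector(i, b"x" * 64)
assert line.backing_store_writes == 1   # 8 host writes, 1 media write
```

Eight 64-byte host writes thus accumulate into one 512-byte backing-store write, which is the endurance benefit described in the paragraph above.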

Further, read requests may also be received from the application, and the read requests may each have a fixed data size. A cache line in the read-only cache may be divided into one or more sectors, each sector having the fixed data size. When the data requested to be read is already present in either the read-only cache or the write-read cache, the read request may be performed to read the data from the appropriate cache line storing the requested data. The outstanding command queues may be used to process read requests when there is a cache miss and neither the read-only cache nor the write-read cache stores the requested data. The priority scheduler may determine the number of read requests to be performed based on the size of the cache line (e.g., two 64-byte sectors). For example, if only one read request for 64 bytes is received and the cache line size is 128 bytes, the priority scheduler may determine that two read requests for 64 bytes each (128 bytes total) are to be performed to return all of the data to be stored in the cache line associated with the request.
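The arithmetic the state machine performs here is a simple ratio, sketched below. The function name is illustrative; the ceiling division is an assumption covering line sizes that are not exact multiples of the request size:

```python
def fill_read_count(cache_line_size, request_size):
    """Number of fixed-size backing-store reads needed to fill one cache line."""
    # Ceiling division, in case the line size is not an exact multiple.
    return -(-cache_line_size // request_size)

assert fill_read_count(128, 64) == 2   # two 64-byte reads fill a 128-byte line
assert fill_read_count(512, 64) == 8   # eight reads fill a 512-byte line
```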

In some embodiments, the processing device may receive a command or instruction from the application to preload data associated with the application into the read-only cache or the write-read cache. Such data may be data used or manipulated by the application. The processing device may preload the read-only cache or the write-read cache with the data associated with the application before receiving any request from the application to access the data. The instruction may be associated with memory semantics in the protocol used to communicate between the host system and the memory subsystem. To process the preload instruction, the processing device may use the state machine in the priority scheduler to generate the appropriate number of read requests for the data. The processing device may store the generated read requests in the read-only outstanding command queues or the write-read outstanding command queues for execution on the backing store. When the data associated with the read requests is obtained from the backing store, the data may be stored in one or more cache lines of the read-only cache or the write-read cache.

FIG. 4 is a flow diagram of an example method 400 for accumulating data in a cache using sectors of a fixed data size in a cache line, according to some embodiments of the present disclosure. Method 400 may be performed by processing logic that may comprise hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 400 is performed by cache component 113 of FIG. 1. Although shown in a particular order or sequence, the order of the processes may be modified unless otherwise specified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes may be performed in a different order, and some processes may be performed in parallel. Furthermore, one or more processes may be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are also possible.

At operation 410, a processing device receives a set of requests to access data at a memory component. Each of the requests may specify a fixed size of data. The fixed data size is specified by the memory access protocol used by the host system to interface with the memory subsystem that includes the one or more memory components in the backing store. The requests may be requests to write data to the backing store.

At operation 420, the processing device stores the data of each of the requests at a respective sector of a set of sectors of a cache line of the cache, to accumulate the data in the cache line. Each respective sector of the set of sectors of the cache line stores the fixed size of data. The particular cache line may be in the write-read cache and may be selected by identifying an invalid cache line. In other words, a cache line that does not have the valid or dirty bit set for any of its sectors may be initially selected to store the data of the first request. The first write request may be stored in a write-read outstanding command queue, and the tag of the write request may be assigned to that outstanding command queue. The selected outstanding command queue may correspond to the cache line to which the data is to be written. The processing device may execute the write request in the outstanding command queue to write the data to a sector of the corresponding cache line. In addition, an entry in the write-read CAM may be created with the tag of the write request. Subsequent requests to write data whose matching tag is found in the write-read CAM may be stored in the hit queue and then executed to write data to other sectors. Whenever a sector is written, the valid bit of the sector may be set to the state indicating that valid data is stored. In addition, the dirty bit of the sector may be set to indicate that data has been written to the sector.

At operation 430, the processing device determines when the accumulated data size of the set of sectors storing the data of each of the requests satisfies a threshold condition. The threshold condition may be that the accumulated data size satisfies a data size parameter specified for accessing the memory component. For example, the data size parameter of a data access request for the memory component may be set to a larger granularity than the data size of a request received from the host. In one example, the data size parameter may be 512 bytes and the data size of a sector may be 64 bytes. The threshold condition may be satisfied when 512 bytes of data have accumulated in the eight sectors of a cache line.

At operation 440, in response to determining that the accumulated data size of the set of sectors satisfies the threshold condition, the processing device transmits a request to store the accumulated data at the memory component. The data may also remain in the cache line in case the application attempts to access the data again soon. For example, the application may read the data from the cache line of the write-read cache.

In some embodiments, the processing device may receive a command or instruction to preload the cache (e.g., the read-only cache and/or the write-read cache) with other data associated with the application. The processing device may preload the cache with the other data associated with the application prior to receiving the plurality of requests to access the data at the memory component. The application may send the instruction to the memory subsystem if the application determines that the data will be frequently used.

FIG. 5 illustrates an example read-only cache 200 and write-read cache 202 according to some embodiments of the present disclosure. The separate read-only cache 200 and write-read cache 202 may provide separate address spaces for an application or paging system to read data from and write data to the system, which may improve performance of the memory subsystem. Read-only cache 200 and write-read cache 202 contain a plurality of cache lines 500 and 504, respectively. Although only four cache lines are shown in each of caches 200 and 202, it should be understood that many more cache lines (e.g., hundreds, thousands, etc.) may be included. The total size of each of caches 200 and 202 may be any suitable amount, such as 32 kilobytes.

As depicted, a cache line 500 in the read-only cache 200 includes two sectors 502. Each of the sectors has a fixed size, which may be equal to the data size of a request sent from the host system. The data size of the request may be specified by the memory semantics of the protocol used to interface between the host system and the memory subsystem via the bus. In one example, the sectors may each be 64 bytes, and the total data size of cache line 500 may be 128 bytes. A cache line 504 in write-read cache 202 contains more sectors 506 than a cache line in read-only cache 200, because fewer write operations than read operations should be performed on the backing store in order to increase the endurance of the memory components in the backing store. In the depicted example, each cache line 504 of the write-read cache 202 includes eight sectors, which also have a fixed data size (e.g., 64 bytes). The fixed data size may likewise be equal to the data size of a request received from the host system. In one example, the fixed data size of each sector of a cache line 504 in the write-read cache 202 may be 64 bytes. The write-read cache 202 may accumulate data for eight write requests until the accumulated data size of the eight sectors 506 satisfies a threshold condition. The threshold condition may be that the accumulated data size satisfies a data size parameter specified for accessing the memory component. The data size parameter may be the data size of a management unit of the memory component, e.g., 512 bytes. In response to determining that the accumulated data size of the set of sectors 506 of the cache line 504 storing the data of each of the requests satisfies the threshold condition, the cache component may transmit a write request to store the accumulated data at the backing store.

FIG. 6 is a flow diagram of an example method 600 for storing read requests for data not present in a cache in an outstanding command queue, according to some embodiments of the present disclosure. Method 600 may be performed by processing logic that may comprise hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 600 is performed by cache component 113 of FIG. 1. Although shown in a particular order or sequence, the order of the processes may be modified unless otherwise specified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes may be performed in a different order, and some processes may be performed in parallel. Furthermore, one or more processes may be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are also possible.

At operation 610, the processing device receives a request to read data stored at the memory subsystem. A request to read data may be sent from an application executing on the host system. The request may contain an address from which to read data in the memory subsystem. An identifier called a "tag" may be extracted from the address. The tag may be a subset of bits of the address that may be used to identify the location of the data at the address in the memory subsystem.
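A minimal Python sketch (illustrative only) of extracting the tag from a request address. The bit layout is an assumption for illustration: with 128-byte cache lines, the low 7 bits address a byte within the line and the remaining high-order bits form the tag:

```python
# Hypothetical sketch of tag extraction from a request address.
OFFSET_BITS = 7  # log2(128-byte cache line); an assumed layout

def extract_tag(address: int) -> int:
    # The tag is the subset of address bits above the line offset, used to
    # identify the location of the data at the address in the memory subsystem.
    return address >> OFFSET_BITS

print(hex(extract_tag(0x12345680)))  # 0x2468ad
```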

At operation 620, the processing device determines whether the data is stored at a cache of the memory subsystem. In some embodiments, the processing device searches the read-only CAM and the write-read CAM for a tag associated with the requested data. The read-only CAM and the write-read CAM may contain tags corresponding to the data stored at each cache line in the respective cache. The processing device may use comparators in each of the read-only CAM and the write-read CAM to determine whether a matching tag is found for the address from which data is to be read. Determining whether the data is stored in the cache may include determining whether the valid bit for the data is set to the valid state in the CAM in which the matching tag was found. In response to determining that the data is stored at the cache of the memory subsystem, the processing device may store the request at another queue (e.g., a hit queue) for managing execution of requests for data present in the cache.
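A minimal Python sketch (illustrative only) of the dual-CAM lookup, where a dictionary stands in for the hardware comparators of a real CAM; all names are hypothetical:

```python
# Hypothetical sketch of the tag lookup across the read-only and write-read
# CAMs. A hit in either CAM (with the valid bit set) means the data is cached.
class CAM:
    def __init__(self):
        self.entries = {}  # tag -> {"valid": bool, "line": int}

    def lookup(self, tag):
        entry = self.entries.get(tag)
        return entry["line"] if entry and entry["valid"] else None

def is_cached(tag, ro_cam: CAM, wr_cam: CAM) -> bool:
    return ro_cam.lookup(tag) is not None or wr_cam.lookup(tag) is not None

ro_cam, wr_cam = CAM(), CAM()
wr_cam.entries[0x2468AD] = {"valid": True, "line": 3}
print(is_cached(0x2468AD, ro_cam, wr_cam))  # True: hit in the write-read CAM
```

On a hit, the request would be routed to the hit queue; on a miss, to an outstanding command queue as described below.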

At operation 630, in response to determining that the data is not stored at the cache of the memory subsystem, the processing device determines one queue of a set of queues to store the request with other read requests for the data stored at the memory subsystem. In some embodiments, a cache may refer to a read-only cache or a write-read cache. The set of queues may include a read-only outstanding command queue or a write-read outstanding command queue, depending on which cache is selected to service the request. These queues may be used to store cache misses (e.g., read misses and write misses). As discussed above, the memory access workload may dictate which cache to use to service the request. If the memory access workload comprises a sequential read operation, a read-only cache and a read-only outstanding command queue may be used to service the request. If the memory access workload includes random read and write operations, a write-read cache and a write-read outstanding command queue may be used to service the request.

The processing device may determine the queue by determining whether any queue in the set of queues is associated with an identifier of the request. The processing device may search the read-only outstanding command queue and the write-read outstanding command queue for a queue assigned the identifier of the request. If no queue is assigned the identifier, the processing device may select a queue from the appropriate set of queues that has a valid bit set to an invalid state and/or a block bit set to an unblocked state. The processing device may store the request in the selected queue, assign the tag to the queue, set the valid bit to the valid state, and set the block bit to the blocked state. If every queue in the appropriate set of queues is in use (with the valid bit set to the valid state and the block bit set to the blocked state), the request is stored in a pending queue until one queue in the appropriate set of queues becomes invalid.
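A minimal Python sketch (illustrative only) of this queue-selection logic, assuming hypothetical field and function names:

```python
# Hypothetical sketch of choosing an outstanding command queue (OCQ) for a
# cache miss: reuse the queue already assigned the request's tag; otherwise
# claim an invalid, unblocked queue; fall back to a pending queue if all busy.
class OCQ:
    def __init__(self):
        self.tag = None
        self.valid = False
        self.blocked = False
        self.requests = []

def enqueue_miss(tag, request, queues, pending):
    for q in queues:
        if q.valid and q.tag == tag:       # same address already outstanding
            q.requests.append(request)
            return q
    for q in queues:
        if not q.valid and not q.blocked:  # free queue: claim it for this tag
            q.tag, q.valid, q.blocked = tag, True, True
            q.requests.append(request)
            return q
    pending.append((tag, request))         # all queues in use
    return None
```

Keeping all requests for one tag in one queue is what enables the ordered, hazard-free access to the corresponding cache line described in the following operations.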

If one of the queues in the set of queues is assigned an identifier and is valid, there are other requests that have been received for the same address that have been stored in the queue. At operation 640, the processing device stores the request at the determined queue with other read requests for data stored at the memory subsystem. Each queue in the set of queues corresponds to a respective cache line of the cache. The queue corresponds to the corresponding cache line by assigning the tag of the request to the queue corresponding to the cache line storing the requested data in the appropriate cache and also assigning the tag to an entry in the appropriate CAM.

The request may be assigned a priority indicator and relayed to the priority queue by the priority scheduler when the request is stored in the queue, as discussed further below. The priority scheduler may determine the number of requests to generate for the request based on the size of the cache line. For example, if the request is for 64 bytes but the cache line size is 128 bytes, the priority scheduler may determine to generate two 64-byte requests to read data from the backing store to fill the entire cache line with valid data. The priority scheduler may increment the read counter and the fill counter when a request is stored in the priority queue. The request may be executed against the backing store to obtain the requested data, and the read counter may be decremented.
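The request-count calculation above is a ceiling division of the cache line size by the request size, sketched here in Python (illustrative only):

```python
# Hypothetical sketch: the priority scheduler splits one host request into
# enough backing-store reads to fill an entire cache line with valid data.
def backing_store_requests(request_size: int, cache_line_size: int) -> int:
    # e.g. a 64-byte request against a 128-byte cache line -> two 64-byte reads
    return -(-cache_line_size // request_size)  # ceiling division

print(backing_store_requests(64, 128))  # 2
```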

Data obtained from the backing store may be stored in another queue (the fill queue) with a fill operation. The processing device may assign a priority indicator to the fill operation and perform the fill operation to store the data to the appropriate cache line in the cache. The processing device may set a block bit of a queue storing requests to an unblocked state and may decrement a fill counter. A CAM entry may be generated for a cache line storing data and a tag may be assigned to the CAM entry. The processing device may execute requests in the queue to read data from or write data to the cache line. Further, after executing the request in the queue, the processing device may invalidate the queue by setting the valid bit to an invalid state and de-assigning the tag. The queue may then be reused for the same or another cache line based on the subsequent request received.

In some embodiments, a processing device may receive a write request to write data to a backing store. The processing device may obtain the tag from the request and search for the tag in the write-read CAM and the write-read outstanding command queues. If a matching tag is found in the write-read CAM, the data in the request is written to the cache line corresponding to the tag. The processing device may select one or more invalid sectors in the cache line to which the data is to be written. When writing data into the one or more sectors, the processing device may set the valid bit and the change bit of the one or more sectors in the write-read CAM entry corresponding to the cache line containing the one or more sectors.
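A minimal Python sketch (illustrative only) of this write-hit path, with hypothetical names for the CAM entry fields:

```python
# Hypothetical sketch of a write hit in the write-read cache: data is written
# to invalid sectors and the corresponding valid and change bits are set in
# the write-read CAM entry.
class WRCamEntry:
    def __init__(self, num_sectors=8):
        self.valid = [False] * num_sectors
        self.changed = [False] * num_sectors

def write_hit(entry: WRCamEntry, cache_line: list, data_sectors: list):
    free = [i for i, v in enumerate(entry.valid) if not v]  # invalid sectors
    for i, data in zip(free, data_sectors):
        cache_line[i] = data
        entry.valid[i] = True    # sector now holds valid data
        entry.changed[i] = True  # sector is dirty relative to backing store
```

The change bits later tell the eviction/write-back logic which sectors must be written to the backing store.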

If no matching tag is found in the write-read CAM, but a matching tag is found in the write-read outstanding command queue, then other requests containing that tag have already been stored in the queue assigned the matching tag. The processing device may store the write request in the queue assigned the matching tag, and the requests may be processed in the order in which they were received to write data to one or more sectors of the corresponding cache line in the write-read cache. For example, the priority scheduler may generate a priority indicator for the write request and assign the priority indicator to the write request. The priority scheduler may store the write request in a priority queue and execute the write request to write data to the cache line when the write request reaches the front of the queue. Storing the write request in the outstanding command queue may prevent data hazards by ensuring the write request is not executed before requests that were received earlier.

FIG. 7 is a flow diagram of an example method 700 for executing a request stored in an outstanding command queue according to some embodiments of the present disclosure. Method 700 may be performed by processing logic that may comprise hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 700 is performed by cache component 113 of FIG. 1. Although shown in a particular order or sequence, the order of the processes may be modified unless otherwise specified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes may be performed in a different order, and some processes may be performed in parallel. Furthermore, one or more processes may be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are also possible.

At operation 710, the processing device determines that data requested by a set of read operations has been retrieved from a memory component of the memory subsystem. Data retrieved from the memory component may be associated with a generated fill operation, and the data and the fill operation may be stored in a fill queue.

At operation 720, the processing device performs one or more fill operations to store data at a cache line of a cache of the memory subsystem. A fill operation may be generated when data is retrieved from the backing store. When a fill operation is generated, the fill operation may be stored in a fill queue along with associated data. The fill operations may be performed in the order in which they are stored in the fill queue. Performing a fill operation may include deleting data from the fill queue and storing the data at an appropriate cache line in a cache (e.g., a read-only cache or a write-read cache). The processing device may decrement a fill counter for each of the one or more fill operations performed. In response to performing one or more fill operations to store data at the cache line, the processing device may set a block bit associated with the determined queue to an unblocked state to enable execution of a request stored at the determined queue.

At operation 730, the processing device determines one of a set of queues corresponding to data that has been requested by the set of read operations. Each queue in the set of queues corresponds to a respective cache line in a set of cache lines of a cache of the memory subsystem. The cache may be a read-only cache and/or a write-read cache, and the set of queues may be a read-only outstanding command queue and/or a write-read outstanding command queue. Determining that the queue corresponds to the requested data may include determining whether the queue is assigned an identifier (e.g., a tag) associated with the data.

At operation 740, in response to performing one or more fill operations to store data at the cache line, the processing device performs a set of read operations stored at the determined queue in the order that the memory subsystem has received the set of read operations. The use of an outstanding command queue for storing requests enables ordered access to cache lines corresponding to the outstanding command queue, which may prevent data hazards in the memory subsystem. The requests may be assigned priority indicators by a priority scheduler, which may be based on the order in which the requests are received by the memory subsystem, as described further below. The read operation may read the data stored at the cache line and return the data to the application that sent the request.

FIG. 8 illustrates examples of a read-only outstanding command queue 208, a write-read outstanding command queue 210, a read-only content addressable memory (CAM) 204, and a write-read content addressable memory (CAM) 206, according to some embodiments of the present disclosure. As depicted in the figure, the read-only outstanding command queue 208 may contain multiple entries, and each entry may contain fields for a tag, a queue counter, a queue for requests, a read counter and valid bit, and a fill counter and valid bit.

The tag field stores a tag obtained from a request received from a host system. The queue counter may track the number of requests stored in an entry of the queue. The queue counter (qc) may be incremented when additional requests are stored in the queue, and decremented when requests are executed and removed from the queue. The queue for requests may have any suitable number of entries. In one example, the number of entries in the queue is equal to the number of sectors in a cache line of the read-only cache. When a request is stored in the queue, there may be a blocking bit set for the request.

The read counter (R) may track the number of read operations to be performed to obtain the requested data from the backing store. The read counter may be incremented when read operations are generated to retrieve data from the backing store, and decremented when a read operation is performed on the backing store to obtain the data. The valid bit of the read counter may indicate whether the data associated with the read is valid or invalid. A fill counter (F) may track the number of fill operations to be performed to store the requested data in the cache line corresponding to the queue storing the requests. The fill counter may be incremented when a fill operation is generated and decremented when a fill operation is performed. The valid bit of the fill counter may indicate whether data associated with the fill operation is valid or invalid.
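The counter bookkeeping for a read-only outstanding command queue entry can be sketched in Python (illustrative only; field names are hypothetical):

```python
# Hypothetical sketch of the counters in a read-only OCQ entry: the queue
# counter (qc) tracks stored requests, the read counter (R) tracks pending
# backing-store reads, and the fill counter (F) tracks pending fills.
class ReadOnlyOCQEntry:
    def __init__(self):
        self.qc = 0   # requests currently stored in the queue
        self.r = 0    # backing-store read operations still outstanding
        self.f = 0    # fill operations still outstanding

    def on_miss_stored(self, num_reads):
        self.qc += 1
        self.r += num_reads   # one read per backing-store request generated
        self.f += num_reads   # each read produces one fill operation

    def on_read_done(self):
        self.r -= 1           # a backing-store read completed

    def on_fill_done(self):
        self.f -= 1           # data stored at the cache line
```

When the fill counter reaches zero, the entry's block bit can be cleared and the queued requests executed against the now-valid cache line.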

The write-read outstanding command queue 210 may contain multiple entries, and each entry may contain fields for a tag, a queue counter, a queue for requests, an eviction counter (E) and valid bits, and a write-back counter (WB) and valid bits. The tag field stores a tag obtained from a request received from a host system. The queue counter may track the number of requests stored in an entry of the queue. The queue counter may be incremented when additional requests are stored in the queue, and decremented when requests are executed and removed from the queue. The queue for requests may have any suitable number of entries. In one example, the number of entries in the queue is equal to the number of sectors in a cache line of the write-read cache. When a request is stored in the queue, there may be a blocking bit set for the request.

The eviction counter may track the number of eviction operations to be performed to delete data from the write-read cache. The eviction counter may be incremented when data in the cache line is selected for eviction, and the eviction counter may be decremented when data in the cache line is evicted from the cache. The valid bit of the eviction counter may indicate whether data associated with the eviction is valid or invalid. The write-back counter may track a number of write operations to be performed to write data in a cache line corresponding to the queue to the backing store. The writeback counter may be incremented when the write request is stored in the queue and decremented when the write request is executed. The valid bit of the write back counter may indicate whether the data associated with the write operation is valid or invalid.

The read-only CAM 204 may contain multiple entries, and each entry may contain a field for a tag, a valid bit for each sector, a change bit for each sector, and an address. The tag field stores the tag obtained from the request. When a sector of the cache line corresponding to the entry stores valid data, the valid bit of each sector may be set. When data is stored at a sector, the change bit for each sector may be set. The address field may store the address contained in the request.

Write-read CAM 206 may contain multiple entries, and each entry may contain a field for a tag, a valid bit for each sector, a change bit for each sector, and an address. The tag field stores the tag obtained from the request. When a sector of the cache line corresponding to the entry stores valid data, the valid bit of each sector may be set. When data is stored at a sector, the change bit for each sector may be set. The address field may store the address contained in the request.

When a request is received to access (e.g., read or write) data, a tag may be obtained from the address of the data contained in the request. The processing device may search the read-only outstanding command queue 208, the write-read outstanding command queue 210, the read-only CAM 204, and the write-read CAM 206 for a matching tag. If the read-only CAM 204 or the write-read CAM 206 contains a matching tag, then there is a cache hit and the request may be stored in the hit queue for execution. For example, for a read request cache hit, data stored at a cache line corresponding to an entry in CAM 204 or 206 having a matching tag may be returned to the requesting application. For a write request cache hit, the data in the request may be written to a cache line corresponding to an entry in the write-read CAM 206 having a matching tag. When a write begins, the change bits for the sectors in the write-read CAM 206 for the written data may be set. When data is written to a sector, the valid bit in the write-read CAM 206 for the sector may be set.

If no matching tag is found in either the read-only CAM 204 or the write-read CAM 206, but a matching tag is found in the read-only outstanding command queue 208, the request may be stored in an empty entry in the read-only outstanding command queue corresponding to the matching tag. When a request is stored in the read-only outstanding command queue, the queue counter may be incremented, and the read counter and the fill counter may be incremented.

If no matching tag is found in either the read-only CAM 204 or the write-read CAM 206, but a matching tag is found in the write-read outstanding command queue 210, the request may be stored in an empty entry in the write-read outstanding command queue corresponding to the matching tag. When a request is stored in the write-read outstanding command queue, the queue counter may be incremented, and the writeback counter may be incremented.

If no matching tag is found in any of the read-only CAM 204, the write-read CAM 206, the read-only outstanding command queue 208, or the write-read outstanding command queue 210, a queue is selected from the read-only outstanding command queues 208 or the write-read outstanding command queues 210 based on the memory access workload of the application. If the memory access workload uses sequential read operations, a read-only outstanding command queue 208 is selected to store the request. An entry of the read-only outstanding command queue 208 that has its valid bit set to the invalid state, no assigned tag, and an unblocked state may be selected to store the read request. The read request may be stored in the read-only outstanding command queue, the tag of the request may be stored in the tag field, a block bit may be set for the request in the queue, the queue counter may be incremented, the read counter may be incremented, the fill counter may be incremented, and/or a valid bit may be set for the read counter.

If the memory access workload is using random write and read operations, then the write-read outstanding command queue 210 is selected to store the request. An entry of the write-read outstanding command queue 210 that has its valid bit set to the invalid state, no assigned tag, and an unblocked state may be selected to store the write request. The write request may be stored in the write-read outstanding command queue, the tag of the request may be stored in the tag field, a block bit may be set for the request in the queue, the queue counter may be incremented, the writeback counter may be incremented, and a valid bit may be set for the writeback counter.

FIG. 9 is a flow diagram of an example method 900 for determining a schedule for executing requests in a memory subsystem in accordance with some embodiments of the present disclosure. Method 900 may be performed by processing logic that may comprise hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 900 is performed by cache component 113 of FIG. 1. Although shown in a particular order or sequence, the order of the processes may be modified unless otherwise specified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes may be performed in a different order, and some processes may be performed in parallel. Furthermore, one or more processes may be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are also possible.

At operation 910, the processing device receives a request to read data stored at the memory subsystem. The request to read data may be received from an application executing on a host system. The request may contain the address in the memory subsystem from which the data is to be read.

At operation 920, the processing device determines whether the data is stored at a cache of the memory subsystem. The memory subsystem may contain separate read-only and write-read caches. The processing device may determine whether the data is stored in the cache by obtaining a tag from the address contained in the request. The processing device may search in the read-only CAM and the write-read CAM to determine if a matching tag is contained in either CAM. If no matching tag is found, the processing device determines that the data is not stored at the read-only cache or the write-read cache.

The processing device may determine that the tag is not included in either the read-only outstanding command queue or the write-read outstanding command queue by searching both sets of queues for a matching tag. As described above, the processing device may select a queue from the read-only outstanding command queues or the write-read outstanding command queues. The processing device may execute a state machine, contained in the priority scheduler or implemented separately, to determine the number of requests needed to obtain the data based on the size of the cache line in the appropriate cache. The processing device may store one or more requests in the selected outstanding command queue for storing requests to read from or write to the address associated with the data. The processing device may determine that a fill operation is to be used for a read request to store data obtained from the backing store in the cache. The priority scheduler may generate priority indicators (e.g., tokens with numerical values) for read requests and fill operations. The processing device may employ a policy in which the fill operation is assigned a priority indicator with a higher priority value so that the fill operation is performed before the read request. Priority indicators may be generated and assigned to read requests and fill operations in the order in which the read requests are received.
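A minimal Python sketch (illustrative only) of the token-based priority policy, where a monotonically increasing counter stands in for the scheduler's token generator and all names are hypothetical:

```python
# Hypothetical sketch of priority-indicator (token) generation: a fill
# operation receives a higher-priority token than the read request that
# triggered it, and tokens are issued in arrival order.
import itertools

class PriorityScheduler:
    def __init__(self):
        self._tokens = itertools.count(1)  # lower value = higher priority
        self.queue = []                    # (token, kind, payload)

    def submit_miss(self, read_request):
        fill_token = next(self._tokens)    # issued first -> runs first
        read_token = next(self._tokens)
        self.queue.append((fill_token, "fill", read_request))
        self.queue.append((read_token, "read", read_request))

    def schedule(self):
        # Arbitration: order pending operations by token value, so each
        # fill executes before the read that depends on it.
        return sorted(self.queue)

sched = PriorityScheduler()
sched.submit_miss("req-A")
print([kind for _, kind, _ in sched.schedule()])  # ['fill', 'read']
```

This ordering guarantees the cache line is populated before the dependent read executes, while arrival order is preserved across independent requests.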

At operation 930, the processing device obtains the data from a memory component of the memory subsystem in response to determining that the data is not stored at the cache of the memory subsystem. The processing device may obtain the data from the memory component by storing the read request in the priority queue and executing the read request to obtain the data from the memory component. A fill operation may be generated when the data is obtained from the memory component.

At operation 940, the processing device assigns a first priority indicator (e.g., a token having a value of "1") to a fill operation associated with the data obtained from the memory component. The fill operation and the data obtained from the memory component may be stored in a fill queue.

At operation 950, the processing device assigns a second priority indicator (e.g., a token with a value of "2") to the request to read data. The first priority indicator assigned to the fill operation may have a higher priority value than the second priority indicator assigned to the request to read data.

At operation 960, the processing device schedules an order for performing the fill operation and the request to read data based on the first priority indicator and the second priority indicator. The priority scheduler may use arbitration logic to determine the schedule. If no other requests are received, the processing device may use the schedule to perform the fill operation to delete the data from the fill queue and store the data in the cache line corresponding to the queue storing the read request. Further, the processing device may execute the read request to read the data in the cache line and return the data to the requesting application.

In some embodiments, the processing device may receive a second request to read data stored at the memory subsystem while the request to read data is stored in an outstanding command queue (e.g., read-only or write-read). The processing device may determine whether an identifier (tag) associated with the second request to read data is assigned to an outstanding command queue. In response to determining that the identifier associated with the second request to read data is assigned to the outstanding command queue, the processing device may assign a third priority indicator to the second request and store the second request in the outstanding command queue in an entry subsequent to the initial request to read data.

In some embodiments, the processing device may receive a third request to write other data to an address associated with the data at the memory subsystem. The processing device may determine whether an identifier associated with the third request to write other data is assigned to the queue. In response to determining that the identifier associated with the third request to write other data is assigned to the queue, the processing device may assign a fourth priority indicator to the third request and store the write request in an entry following the second request. The processing device may determine a schedule for performing the fill operation, the request to read data, the second request to read data, and the third request to write other data based on the first priority indicator, the second priority indicator, the third priority indicator, and the fourth priority indicator. The schedule may reflect the order in which the request, the second request, and the third request were received in the outstanding command queue. If no other requests are received, the schedule may be used to perform the fill operation, the request to read data, the second request to read data, and the third request to write other data.

In some embodiments, the processing device may receive a second request to read other data stored at the memory subsystem. The processing device may determine whether other data is stored at the cache of the memory subsystem by searching the read-only CAM and the write-read CAM for a tag that matches the tag contained in the second request. If the data is not stored at the cache and the processing device determines that a matching tag is not found in either the read-only outstanding command queue or the write-read outstanding command queue, a second outstanding command queue may be selected to store a second request to read other data. The second outstanding command queue storing second requests for reading other data may be different from the outstanding command queue storing requests for reading data. In response to determining that the other data is not stored at the cache of the memory subsystem, the processing device may obtain the other data from the memory component of the memory subsystem.

The processing device may determine that a second fill operation is to be used to store the requested data obtained from the memory component at the appropriate cache. The processing device may generate priority indicators for the second fill operation and the second request to read the other data. The second fill operation may be generated and a third priority indicator may be assigned to the second fill operation. A fourth priority indicator may be generated and assigned to the second request to read the other data. The processing device may determine a schedule for performing the fill operation, the request to read data, the second fill operation, and the second request to read the other data based at least on the first priority indicator, the second priority indicator, the third priority indicator, and the fourth priority indicator.

The processing device may perform the request to read data and the second request to read other data in an order different from the order in which they were received, based on the determined schedule. For example, in some cases, even if the request for data is sent to the backing store first, the other data requested by the second request may be returned from the backing store more quickly. In this example, the processing device may determine to process the second fill operation for the other data first and process the second request to read the other data before the fill operation and the request to read the data. The fill operation may store the data in the cache line corresponding to the outstanding command queue, and the second fill operation may store the other data in a second cache line corresponding to the second outstanding command queue. The request to read data may read the data from the cache line and return the data to the requesting application. The second request may read the other data from the second cache line and return the other data to the application that sent the second request.

In some embodiments, after a fill operation, read request, and/or write request is performed, the priority indicator may be reused and reassigned to a subsequent fill operation, read request, and/or write request. The processing device may set a limit on the number of priority indicators generated. The limit may be any suitable number and may be dynamically configured to achieve effective request throughput.

FIG. 10 is a flow diagram of another example method 1000 for determining a schedule for executing requests in a memory subsystem in accordance with some embodiments of the present disclosure. Method 1000 may be performed by processing logic that may comprise hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 1000 is performed by cache component 113 of FIG. 1. Although shown in a particular order or sequence, the order of the processes may be modified unless otherwise specified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes may be performed in a different order, and some processes may be performed in parallel. Furthermore, one or more processes may be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are also possible.

At operation 1010, a processing device receives a set of requests to access data stored at a memory subsystem. The set of requests may be received from one or more applications executing on a host system. Each request may contain an address used to access the data. If the data referenced by the set of requests is not present in the read-only cache or the write-read cache, the processing device determines in which of one or more outstanding command queues (e.g., read-only or write-read) the set of requests is to be stored. If the tags contained in the addresses of the set of requests are the same, the same outstanding command queue is used to store the set of requests. If the tags contained in the addresses of the set of requests differ, more than one outstanding command queue may be used to store the set of requests; for example, separate outstanding command queues may be assigned to the respective tags.
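The tag-based routing of misses to outstanding command queues might be sketched as follows; the tag-extraction scheme (dropping the cache-line offset bits of a 64-byte line) and the function names are hypothetical illustrations, not the disclosed circuit.

```python
from collections import defaultdict, deque

def tag_of(address, line_size=64):
    # Hypothetical tag extraction: drop the cache-line offset bits.
    return address // line_size

# One FIFO outstanding command queue per tag (i.e., per cache line).
outstanding = defaultdict(deque)

def enqueue_miss(address):
    outstanding[tag_of(address)].append(address)

enqueue_miss(0x1000)  # new tag -> new outstanding command queue
enqueue_miss(0x1008)  # same 64-byte line -> same tag -> same queue
enqueue_miss(0x2000)  # different tag -> separate queue
```

Requests sharing a tag land in one queue, so a single backing-store read and fill operation can satisfy all of them.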

At operation 1020, the processing device assigns a set of priority indicators to the set of requests. For example, the priority indicators may be generated by the processing device and may comprise numerical values. The priority indicators may reflect the order in which the set of requests are received by the memory subsystem. Priority indicators may be assigned to the set of requests stored in the one or more outstanding command queues. In some examples, when a request is a read request, a fill operation may also be generated and assigned a priority indicator, as described above.

At operation 1030, the processing device determines an order in which to execute the set of requests to access the data based on the set of priority indicators assigned to the set of requests. For example, if the priority indicators are numerical values such as 1, 2, 3, 4, 5, 6, and so forth, the order may be sequential. If the set of requests includes read requests, the processing device may use a state machine to determine a number of one or more read requests for each respective request in the set, based on the size of a cache line in the cache. The processing device may store the one or more read requests in the priority queue based on the order, and may execute the one or more requests stored in the priority queue to read the data from one or more memory components. In response to obtaining the data from the one or more memory components, the processing device may store a fill operation and the data in the fill queue. The processing device may perform the fill operation by removing the data from the fill queue and storing the data at a cache line of a cache of the memory subsystem.
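Executing operations in ascending priority-indicator order can be sketched with a simple min-heap; the operation labels below are illustrative, and the indicator assignments follow the example values used elsewhere in this description (a fill operation receives a higher priority, i.e., a lower value, than its read request).

```python
import heapq

# (priority indicator, operation); a lower value means higher priority.
priority_queue = []
heapq.heappush(priority_queue, (2, "read 123"))
heapq.heappush(priority_queue, (1, "fill 123"))
heapq.heappush(priority_queue, (4, "read 234"))
heapq.heappush(priority_queue, (3, "fill 234"))

order = []
while priority_queue:
    _, operation = heapq.heappop(priority_queue)
    order.append(operation)
# Each fill operation precedes its corresponding read request.
```

Because each fill operation carries a lower numeric value than its read request, the cache line is populated before the read that depends on it executes.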

At operation 1040, in response to obtaining data from one or more memory components of the memory subsystem, the processing device executes the set of requests based on the determined order. In some embodiments, when a fill operation is stored in the fill queue, the processing device may perform the fill operation before performing a read request corresponding to the fill operation because the priority indicator assigned to the fill operation may have a higher priority value than the priority indicator assigned to the corresponding read request.

In some embodiments, a set of second requests to access other data stored at the memory subsystem may be received. The processing device may assign a set of second priority indicators to the set of second requests. The set of second priority indicators may have a higher priority than the set of priority indicators when the other data is obtained from the one or more memory components before the data is obtained. The processing device may determine an order in which to execute the set of requests and the set of second requests based on the set of priority indicators and the set of second priority indicators.

FIG. 11 illustrates an example of using a priority scheduler to determine a schedule for executing requests based on priority indicators, in accordance with some embodiments of the present disclosure. In the depicted example, the processing device may determine that the memory access workload of the application includes sequential read requests, and that the read-only cache is to be used for read requests received from the application. The processing device may receive a first read request from the application and search the read-only CAM and the write-read CAM to determine whether the tag of the request is found. If a matching tag is found in either CAM, the first read request may be sent to the hit queue and may be processed in the order in which it was received at the hit queue to return the data to the application. If no matching tag is found in either CAM, the processing device may determine that the read-only outstanding command queue 208 is to be used, because the application's memory access workload is of the sequential-read type.

In the depicted example, the processing device obtains the tag "123" of the first read request and searches the read-only outstanding command queue 208 for a matching tag. The processing device does not find a matching tag and selects entry 1100 to store the first read request. The processing device may set a blocking bit associated with the read-only outstanding command queue in the request field of entry 1100 to a value indicating that the read-only outstanding command queue is blocked (e.g., while blocked, no requests in the read-only outstanding command queue may be executed). The processing device may increment the queue counter to "1" and assign the tag "123" to the tag field of entry 1100. The processing device may determine that a fill operation is to be generated for the data associated with the first read request that is returned from the backing store. Priority scheduler 212 may generate a priority indicator ("2") for the first read request and a priority indicator ("1") for the fill operation. The priority indicator of the fill operation corresponding to the first read request may have the higher priority, so that the data obtained from the backing store is stored in the cache before the first read request is executed. The priority scheduler 212 may assign the priority indicator "2" to the first read request in the outstanding command queue in the request field of entry 1100. As depicted in the figure, the processing device may also increment the read counter to "1" and the fill counter to "1".

The processing device may receive a second read request containing the tag "234" and may search the read-only outstanding command queue 208 for a matching tag. The processing device does not find a matching tag in the read-only outstanding command queue 208 and selects entry 1102 to store the second read request. The processing device may set a blocking bit associated with the read-only outstanding command queue in the request field of entry 1102 to a value indicating that the read-only outstanding command queue is blocked (e.g., while blocked, no requests in the read-only outstanding command queue may be executed). The processing device assigns the tag "234" to the tag field of entry 1102. Priority scheduler 212 may determine that a fill operation is to be generated for the data associated with the second read request that is returned from the backing store. Priority scheduler 212 may generate a priority indicator ("4") for the second read request and a priority indicator ("3") for the fill operation. The priority scheduler 212 may assign the priority indicator ("4") to the second read request in the outstanding command queue in the request field of entry 1102.

The processing device may receive a third read request containing the tag "123" and may search the read-only outstanding command queue 208 for a matching tag. The processing device finds the matching tag "123" in entry 1100. Thus, the processing device may store the third read request in the read-only outstanding command queue in the request field of entry 1100. As depicted in the figure, the processing device may increment the queue counter to "2". Priority scheduler 212 may determine that a fill operation has already been generated for the data associated with the first read request having the tag "123", and therefore need not generate and assign a priority indicator to another fill operation. Priority scheduler 212 may generate a priority indicator ("5") for the third read request only, and may assign the priority indicator "5" to the third read request in the outstanding command queue in the request field of entry 1100.
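The tag-match path in the example above (a later miss to the same cache line joins the existing entry and reuses its pending fill operation) might look like the following sketch. The entry fields and the `on_read_miss` helper are assumptions introduced for illustration; they do not correspond to a disclosed register layout.

```python
class OcqEntry:
    """One outstanding-command-queue entry per cache line; field names
    are illustrative assumptions, not the disclosed hardware fields."""
    def __init__(self, tag):
        self.tag = tag
        self.requests = []         # priority indicators of queued reads
        self.queue_counter = 0
        self.fill_indicator = None

entries = {}
next_indicator = 1

def on_read_miss(tag):
    """Store a read miss; generate a fill operation only for a new tag."""
    global next_indicator
    entry = entries.get(tag)
    if entry is None:
        entry = entries[tag] = OcqEntry(tag)
        entry.fill_indicator = next_indicator  # one fill per cache line
        next_indicator += 1
    entry.requests.append(next_indicator)      # indicator for the read
    next_indicator += 1
    entry.queue_counter += 1
    return entry

on_read_miss("123")      # fill -> "1", read -> "2"
on_read_miss("234")      # fill -> "3", read -> "4"
e = on_read_miss("123")  # tag match: no new fill; read -> "5"
```

The third miss to tag "123" increments the queue counter of the existing entry and receives only a read indicator, reproducing the "1"–"5" assignment walked through in the example.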

Priority scheduler 212 may use priority queue 220 to store read requests and execute them in the order in which they were stored, to obtain the data from the backing store. As depicted in the figure, the first read request assigned priority indicator "2" is stored first in the priority queue 220, and the second read request assigned priority indicator "4" is stored second in the priority queue 220 because its priority indicator has a lower priority than that of the first read request. Further, the third read request may not be stored in the priority queue 220, because the first read request, which has the same tag, already obtains the data from the backing store at the address corresponding to that tag.

The processing device may execute the first read request to obtain the data corresponding to tag "123" from the backing store. After executing the first read request, the processing device may decrement the read counter in entry 1100 to "0". A first fill operation for the data obtained by the first read request may be generated and stored in fill queue 214 along with the data obtained from the backing store. The priority scheduler 212 may assign the priority indicator "1" to the first fill operation corresponding to the first read request.

The processing device may execute the second read request to obtain the data corresponding to tag "234" from the backing store. After executing the second read request, the processing device may decrement the read counter in entry 1102 to "0". A second fill operation for the data obtained by the second read request may be generated and stored in fill queue 214 along with the data obtained from the backing store. The priority scheduler 212 may assign the priority indicator "3" to the second fill operation corresponding to the second read request.
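The fill-queue handoff just described (backing-store data buffered together with its fill operation, then written into the cache line when the fill is performed) might be sketched as follows; the function names and byte-string payloads are illustrative assumptions.

```python
from collections import deque

fill_queue = deque()
cache = {}  # tag -> cache-line data

def complete_backing_store_read(tag, data, fill_indicator):
    # Buffer the fill operation together with the data it will write.
    fill_queue.append((fill_indicator, tag, data))

def perform_fill():
    # Remove the entry from the fill queue and store it at the cache line.
    _, tag, data = fill_queue.popleft()
    cache[tag] = data

complete_backing_store_read("123", b"data-123", 1)
complete_backing_store_read("234", b"data-234", 3)
perform_fill()  # tag "123" is now resident in the read-only cache
```

Buffering the data alongside its fill operation lets the scheduler defer the cache write until the fill's priority indicator comes up, without losing the returned data.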

The priority scheduler 212 may determine a schedule for performing the read requests and fill operations based on the priority indicators assigned to them. The schedule may order execution by ascending priority-indicator value. In one example, the first fill operation with priority indicator "1", the first read request with priority indicator "2", the second fill operation with priority indicator "3", the second read request with priority indicator "4", and the third read request with priority indicator "5" are scheduled for execution in that order.

The processing device may perform the first fill operation by removing the data from the fill queue 214 and storing the data at the cache line corresponding to tag "123" in the read-only cache. The processing device may decrement the fill counter in entry 1100 to "0". The priority scheduler 212 may reclaim the priority indicator "1" and reuse it for subsequent read requests and/or fill operations. The processing device may unblock the read-only outstanding command queue by setting the blocking bit associated with the read-only outstanding command queue to a value indicating an unblocked state. The processing device may execute the first read request, which has the next priority indicator "2", while the outstanding command queue in entry 1100 is unblocked, to return the data from the cache line corresponding to tag "123" to the application that sent the first read request. The processing device may decrement the queue counter to "1". The priority scheduler 212 may reclaim the priority indicator "2" and reuse it for subsequent read requests and/or fill operations.

The processing device may search for a read request or fill operation having the next priority indicator (e.g., "3"), and may determine that the second fill operation is assigned the next priority indicator. Because the second read request associated with the second fill operation was received before the third read request, the second fill operation, rather than the third read request, is assigned the next priority. The processing device may set the blocking bit corresponding to the read-only outstanding command queue to a value indicating a blocked state, which prevents execution of the third read request.

The processing device may perform the second fill operation by removing the data from the fill queue 214 and storing the data at the cache line corresponding to tag "234" in the read-only cache. The processing device may decrement the fill counter in entry 1102. The priority scheduler 212 may reclaim the priority indicator "3" and reuse it for subsequent read requests and/or fill operations. The processing device may unblock the read-only outstanding command queue by setting the blocking bit associated with the read-only outstanding command queue to a value indicating an unblocked state. The processing device may execute the second read request, which has the next priority indicator "4", while the outstanding command queue in entry 1102 is unblocked, to return the data at the cache line corresponding to tag "234" to the application that sent the second read request. The priority scheduler 212 may reclaim the priority indicator "4" and reuse it for subsequent read requests and/or fill operations. After the second read request is performed, the queue counter of entry 1102 may be decremented to "0".

The processing device may search for the next priority indicator, "5", which is assigned to the third read request. The processing device may set the blocking bit associated with the outstanding command queue of entry 1100 to a value indicating an unblocked state. The processing device may execute the third read request while the outstanding command queue of entry 1100 is unblocked, to return the data at the cache line corresponding to tag "123" to the application that sent the third read request. The priority scheduler 212 may reclaim the priority indicator "5" and reuse it for subsequent read requests and/or fill operations. After the third read request is executed, the queue counter of entry 1100 may be decremented to "0".

It will be appreciated that requests may be executed out of order across the outstanding command queues corresponding to different cache lines. For example, the first read request with priority indicator "2" is executed from the queue of entry 1100, the second read request with priority indicator "4" is executed from the queue of entry 1102, and the third read request with priority indicator "5" is then executed from the queue of entry 1100. This may improve quality of service: if the requested data is available, an application does not have to wait for other requests to finish executing before receiving its data. At the same time, requests directed to the same cache line are executed in the order in which they were received. As depicted in the figure, the third request, which reads data from the cache line corresponding to the same tag as the first request, is received after the first request and is stored in the queue after it. The use of a first-in, first-out outstanding command queue ensures that such requests are processed in the order in which they were received.
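The ordering property above (out of order across cache lines, strictly first-in, first-out within a line) can be demonstrated with a minimal sketch; the request labels and the assumption that the line for tag "234" returns from the backing store first are illustrative.

```python
from collections import deque

# One FIFO outstanding command queue per cache line (keyed by tag).
queues = {"123": deque(), "234": deque()}
queues["123"].append("read#1")  # first request, tag "123"
queues["234"].append("read#2")  # second request, tag "234"
queues["123"].append("read#3")  # third request, same tag as the first

executed = []
# Suppose the backing store returns the line for tag "234" first:
executed.append(queues["234"].popleft())  # read#2 runs out of order
executed.append(queues["123"].popleft())  # read#1 still precedes read#3
executed.append(queues["123"].popleft())  # FIFO order within the line
```

Even though read#2 overtakes the requests for tag "123", read#1 and read#3 retain their arrival order because they share one FIFO queue.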

FIG. 12 illustrates an example machine of a computer system 1200, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein may be executed. In some embodiments, computer system 1200 may correspond to a host system (e.g., host system 120 of FIG. 1) that contains, is coupled to, or utilizes a memory subsystem (e.g., memory subsystem 110 of FIG. 1), or may be used to perform operations of a controller (e.g., to execute an operating system to perform operations corresponding to cache component 113 of FIG. 1). In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the internet. The machine may operate in a client-server network environment as a server or client, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or client in a cloud computing infrastructure or environment.

The machine may be a Personal Computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 1200 includes a processing device 1202, a main memory 1204 (e.g., Read-Only Memory (ROM), flash memory, Dynamic Random Access Memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), static memory 1206 (e.g., flash memory, Static Random Access Memory (SRAM), etc.), and a data storage system 1218, which communicate with each other via a bus 1230.

The processing device 1202 is representative of one or more general purpose processing devices such as a microprocessor, central processing unit, or the like. More specifically, the processing device may be a Complex Instruction Set Computing (CISC) microprocessor, Reduced Instruction Set Computing (RISC) microprocessor, Very Long Instruction Word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processing device 1202 may also be one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), network processor, or the like. The processing device 1202 is configured to execute instructions 1226 for performing the operations and steps discussed herein. The computer system 1200 may further include a network interface device 1208 to communicate over a network 1220.

The data storage system 1218 may include a machine-readable storage medium 1224 (also referred to as a computer-readable medium) on which is stored one or more sets of instructions 1226 or software embodying any one or more of the methodologies or functions described herein. The instructions 1226 may also reside, completely or at least partially, within the main memory 1204 and/or within the processing device 1202 during execution thereof by the computer system 1200, the main memory 1204 and the processing device 1202 also constituting machine-readable storage media. The machine-readable storage medium 1224, data storage system 1218, and/or main memory 1204 may correspond to memory subsystem 110 of FIG. 1.

In one embodiment, instructions 1226 include instructions to implement functionality corresponding to a cache component (e.g., cache component 113 of FIG. 1). While the machine-readable storage medium 1224 is shown in an exemplary embodiment to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media that store the one or more sets of instructions. The term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. The present disclosure may relate to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage systems.

The present disclosure also relates to an apparatus for performing the operations herein. Such an apparatus may be specially constructed for the intended purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the method. The structure for a variety of these systems will appear as set forth in the description below. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.

The present disclosure may be provided as a computer program product or software which may include a machine-readable medium having stored thereon instructions which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). In some embodiments, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., computer) -readable storage medium, such as read only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory components, and so forth.

In the foregoing specification, embodiments of the disclosure have been described with reference to specific example embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of embodiments of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
