Cache coherency management for multi-class memory

Document No.: 948177    Publication date: 2020-10-30

Reading note: This technology, Cache coherency management for multi-class memory, was designed and created by F. R. Dropps, M. S. Woodacre, T. McGee, and M. Malewicki on 2020-04-01. Its main content includes the following: In an exemplary aspect of cache coherency management, a first request is received and includes an address of a first memory block in a shared memory. The shared memory includes memory blocks of memory devices associated with corresponding processors. Each of the memory blocks is associated with one of a plurality of memory classes that indicate a protocol for managing cache coherency of the corresponding memory block. A memory class associated with the first memory block is determined, and a response to the first request is based on the memory class of the first memory block. The first memory block and a second memory block are included in one of the same memory devices, and the memory class of the first memory block is different from the memory class of the second memory block.

1. An apparatus, comprising:

one or more hardware components configured to:

receive a first request, the first request including an address of a first memory block in a shared memory,

wherein the shared memory comprises a plurality of memory blocks of one or more memory devices associated with respective processors, and

wherein each of the plurality of memory blocks is associated with one of a plurality of memory classes indicating a protocol for managing cache coherency of the respective memory block;

determine a memory class associated with the first memory block; and

transmit a response to the first request based on the memory class of the first memory block,

wherein the first memory block is included in the one or more memory devices associated with a first processor of the processors and a memory class of the first memory block is different from a memory class of a second memory block included in the one or more memory devices associated with the first processor.

2. The apparatus of claim 1, wherein the protocol for managing cache coherency corresponding to each of the memory classes indicates where state and ownership information for memory blocks of the respective memory class is stored.

3. The apparatus of claim 1, wherein the plurality of memory classes comprise:

a first memory class, wherein cache coherency of memory blocks of the first memory class is managed by hardware using a coherency directory stored in the one or more memory devices and a coherency directory cache of the apparatus;

a second memory class, wherein cache coherency of memory blocks of the second memory class is managed by hardware using a coherency directory stored in the apparatus; and

a third memory class, wherein cache coherency of memory blocks of the third memory class is managed by software through one or more software applications.

4. The apparatus as set forth in claim 3,

wherein at least the first processor is local to the apparatus and

wherein the apparatus further comprises one or more of: (i) a memory configured to store a coherency directory comprising state and ownership information for at least a portion of the memory blocks of the one or more memory devices associated with the first processor; and (ii) a coherency directory cache configured to store a copy of a portion of state and ownership information included in a coherency directory stored in the one or more memory devices associated with the first processor.

5. The apparatus of claim 4, wherein, for the memory block of the one or more memory devices associated with the first processor:

the coherency directory includes the state and ownership information for memory blocks of the first memory class, and

the coherency directory cache includes a copy of the state and ownership information for a portion of the memory blocks of the second memory class.

6. The apparatus of claim 5, wherein the response comprises one or more of ownership information of the first memory block and data stored in the first memory block.

7. The apparatus of claim 6, wherein the ownership information indicates which of the processors (i) owns the first memory block and/or (ii) is capable of having a copy of data stored in the first memory block stored in their respective caches.

8. The apparatus as set forth in claim 4,

wherein the request includes an identifier associated with the processor or node controller from which the request originated, and

wherein the one or more hardware components are further configured to identify whether the request is local or remote, the response to the request further based on whether the request is local or remote.

9. The apparatus of claim 4, further comprising a memory to store memory class information for the memory blocks of the memory device associated with a local processor, and

wherein the one or more hardware components determine the memory class associated with the first memory block based on the stored memory class information.

10. The apparatus of claim 4, wherein,

the hardware components include one or more processors,

the apparatus further comprises a memory for storing machine-readable instructions executable by the processor, and

the configuration of the one or more processors is based on the machine-readable instructions.

11. A method, comprising:

receiving a first request comprising an address of a first memory block in a shared memory and an instruction associated with the first memory block,

wherein the shared memory comprises a plurality of memory blocks of one or more memory devices, and each of the plurality of memory blocks is associated with one of a plurality of memory classes indicating a protocol for managing cache coherency of the respective memory block;

determining a memory class associated with the first memory block; and

transmitting a response to the first request based on the memory class of the first memory block,

wherein the first memory block is included in one of the memory devices, the one of the memory devices further including at least a second memory block of a memory class different from the memory class of the first memory block.

12. The method of claim 11, wherein a memory class of a memory block indicates where state and ownership information of the memory block is stored.

13. The method of claim 11, further comprising:

storing state and ownership information for memory blocks of one of the memory classes in a coherency directory, and

storing a copy of the state and ownership information for a memory block of another one of the memory classes in a coherency directory cache.

14. The method of claim 13, wherein the plurality of memory classes comprise:

a first memory class, wherein cache coherency of memory blocks of the first memory class is managed by hardware using a coherency directory stored in the one or more memory devices and the coherency directory cache;

a second memory class, wherein cache coherency of memory blocks of the second memory class is managed by hardware using a coherency directory; and

a third memory class, wherein cache coherency of memory blocks of the third memory class is managed by software through one or more software applications.

15. The method of claim 13, wherein the response includes one or more of ownership information of the first memory block and data stored in the first memory block.

16. The method of claim 11, further comprising:

storing class information of the memory blocks of the local memory device, and

wherein the memory class associated with the first memory block is determined based on the stored class information.

17. A system, comprising:

one or more node controllers; and

a plurality of processors and corresponding memories, wherein each of the plurality of processors and corresponding memory is associated with and local to one of the node controllers, the memories forming a shared memory,

wherein each of the one or more node controllers comprises:

a directory storing state and ownership information for memory blocks of the local memory; and

a directory cache storing state information for other memory blocks of the local memory,

wherein the state and ownership information for the other memory blocks of the local memory is not stored in the directory.

18. The system of claim 17, wherein the one or more node controllers are configured to:

store a memory class of the memory block of the local memory, the memory class indicating a cache coherency protocol for managing cache coherency of the memory block; and

determine a memory class of the memory block.

19. The system of claim 18, wherein the one or more node controllers comprise a first node controller configured to:

receive a first request comprising a memory address of a first memory block in the local memory of the node controller and an instruction related to the memory block; and

transmit a response to the first request based on the determined memory class of the first memory block.

20. The system of claim 19, wherein the first request is initiated by a processor and transmitted to the first node controller by a second node controller associated with the processor.

Background

In a multi-processor shared memory system, data stored in a processor's local system memory may be shared, resulting in copies of the data also being stored in other processors' caches. Cache coherency is employed to ensure that changes to the shared data or changes to the copies of the data are propagated throughout the system so that all copies reflect the same value. Hardware and/or software implementations supervise or manage cache coherency in a multiprocessor shared memory system by applying a cache coherency protocol, such as a snoop-based protocol or a directory-based protocol. Directory-based cache coherency protocols employ a coherency directory to track and store the state and ownership of memory blocks that may be shared with other processors in a multiprocessor shared memory system. A coherency directory cache may be employed to provide faster access to the state and ownership information stored in the coherency directory.

Drawings

Certain examples are described in the following detailed description with reference to the accompanying drawings, in which:

FIG. 1 is a diagram illustrating an exemplary embodiment of a cache-coherent shared memory system;

FIG. 2 is a diagram illustrating an exemplary embodiment of a portion of the shared memory system of FIG. 1 including multiple memory classes;

FIG. 3A is a diagram illustrating an exemplary embodiment of a compute node and node controller for managing cache coherency for a memory class;

FIG. 3B is a diagram illustrating an exemplary embodiment of a compute node and node controller for managing cache coherency for another memory class;

FIG. 4 is a flow diagram illustrating an exemplary embodiment of a process for managing cache coherency for the shared memory system of FIG. 1 that includes multiple memory classes; and

FIG. 5 is a diagram illustrating an exemplary embodiment of a node controller of the shared memory system of FIG. 1.

Detailed Description

In the following description, numerous details are set forth to provide an understanding of the subject matter disclosed herein. However, embodiments may be practiced without some or all of these details. Other embodiments may include modifications and variations of the details discussed below. It is intended that the appended claims cover such modifications and variations.

The present disclosure provides for managing cache coherency in a multiprocessor shared memory system. Cache coherency may be managed using different protocols, schemes, and/or configurations, each of which provides different advantages and disadvantages. For example, cache coherency may be managed by hardware and/or software. Furthermore, cache coherency may be directory-based, meaning that a cache coherency protocol employs a coherency directory (also referred to herein as a "directory") to store information related to a memory block or a copy of data from a memory block (e.g., a cache line). The information stored in the directory entry may include the state and/or ownership of a memory block or cache line. The state and ownership information stored in the directory may be used to facilitate or ensure coherency across the multiprocessor shared memory system; e.g., changes to shared data are propagated to the sharing processors or compute nodes in the system.

Systems that provide cache coherency management using a single cache coherency protocol suffer from the disadvantages of such protocols. For example, for some protocols, as the amount of shared memory grows, the size of the directory likewise increases in order to accommodate tracking of state and ownership information for additional memory blocks or cache lines. Larger directories therefore require even more memory, which can result in greater consumption of power, computing, and area resources. Information movement required to maintain a coherency protocol may consume resources that would otherwise be available for other system uses.

In example embodiments described herein, a memory block or region of memory (such as shared memory) of a system may be classified into one of three memory classes. However, it should be understood that the system may support more or fewer than three memory classes, each with a unique cache coherency protocol or management method. Each memory class indicates a cache coherency protocol to be used for a corresponding memory block or region of the memory. By supporting multiple memory classes simultaneously, the system may implement an optimal or improved cache coherency management arrangement that maximizes the advantages and minimizes or avoids the disadvantages of the cache coherency protocols for the memory classes.

For example, for a first memory class, cache coherency is managed by hardware. According to a cache coherency protocol of a first memory class, state and ownership information for local memory blocks and/or corresponding cache lines is stored in a coherency directory in system memory. In addition, a coherency directory cache is provided on the respective node controller to store a copy of some or all of the state and ownership information stored in the coherency directory. The node controller manages, among other things, access to the local memory. Thus, when a node controller receives a request to access state and ownership information, the node controller can efficiently obtain the information from its coherency directory cache without having to retrieve the information from the coherency directory in system memory.

Notably, the size of the node controller may remain constant (and/or need not scale as system memory grows) since the size of the coherency directory cache does not have to increase as the system memory and the corresponding coherency directory grow. Thus, this configuration allows the system to scale to very large system memory sizes. On the other hand, the cache coherency protocol of the first memory class consumes system memory to store the coherency directory. As system memory grows, the coherency directory likewise increases in size to track the state and ownership of the growing system memory, thereby consuming more memory resources. In addition, when the coherency directory cache has a miss or other cache management related activity, maintenance of the coherency directory cache may require access to system memory to obtain state and ownership information.

For the second memory class, cache coherency is managed by hardware. According to a cache coherency protocol of the second memory class, state and ownership information of the memory blocks and/or corresponding cache lines is tracked using a coherency directory stored in the node controller (e.g., the memory of the node controller). That is, instead of storing a coherency directory in the system memory of the compute node or corresponding to the processor, the directory is stored and managed by the node controller. When a node controller receives a request to access state and ownership information associated with its local memory, the node controller can efficiently obtain the information from its coherency directory without having to retrieve the information from a coherency directory stored in system memory, which would otherwise affect system performance.

Storing the directory in the node controller does not require additional system memory to be consumed. Furthermore, since the node controller does not need to access system memory to retrieve state and ownership information to maintain its directory, system performance is not much affected by the cache coherency protocol of the second memory class. On the other hand, the cache coherency protocol of the second memory class poses an obstacle to scalability. For example, maintaining accurate state and ownership information for very large system memories (e.g., for all memory blocks) requires adding resources (e.g., memory, size) at the node controller.

For the third memory class, cache coherency is managed by software. That is, the cache coherency of memory blocks of the third memory class is handled by or according to a software application. This protocol does not consume system memory since no directory is used to track state and ownership information. Furthermore, because software-managed coherency does not use a directory, there is no need for the system memory accesses or hardware messages required to obtain or maintain a directory. The size and resources of the node controller may also be reduced since no coherency directory or coherency directory cache is required in the node controller for these memory regions. On the other hand, software-managed cache coherency does require a more complex application programming model, and application performance may become dependent on the amount of data sharing required.
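To make the three schemes concrete, the following C sketch (purely illustrative; the enum, function, and descriptive strings are assumptions, not part of the disclosed design) tags each class and reports where a node controller would look for state and ownership information under that class's protocol.

```c
#include <stdio.h>

/* Hypothetical tags for the three memory classes described above. */
enum mem_class {
    MEM_CLASS_1, /* hardware-managed: directory in system memory, directory cache on the node controller */
    MEM_CLASS_2, /* hardware-managed: directory held on the node controller itself */
    MEM_CLASS_3  /* software-managed: no hardware directory is consulted */
};

/* Where a node controller would look for state and ownership information. */
static const char *coherency_source(enum mem_class cls)
{
    switch (cls) {
    case MEM_CLASS_1:
        return "coherency directory cache (backed by a directory in system memory)";
    case MEM_CLASS_2:
        return "coherency directory stored on the node controller";
    case MEM_CLASS_3:
        return "none - coherency is handled by software";
    }
    return "unknown";
}

int main(void)
{
    for (int c = MEM_CLASS_1; c <= MEM_CLASS_3; c++)
        printf("class %d -> %s\n", c + 1, coherency_source((enum mem_class)c));
    return 0;
}
```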

Multiprocessor shared memory system

FIG. 1 illustrates an exemplary embodiment of a cache coherent computing system 100. As shown in fig. 1, for purposes of illustration and simplicity, computing system 100 includes computing components (e.g., processors, memory) arranged as computing nodes n1, n2, n3, and n4 (collectively referred to herein as "n1 through n4" or "compute nodes"). That is, each of the compute nodes n1 through n4 refers to an association or logical grouping of compute components that are not necessarily individually packaged or physically separated from the compute components of other compute nodes. In some embodiments, multiple compute nodes (as well as memory, node controllers) may be packaged together in a single enclosure or package. Further, in some embodiments, a reference to one of the compute nodes n1 through n4 may indicate a processor or memory device of that compute node. The compute nodes n1 through n4 are communicatively coupled to node controller 103-1 (node controller 1) and node controller 103-2 (node controller 2) (collectively referred to herein as "103" or "node controllers 103").

The node controllers 103 are communicatively coupled to each other via fabric (or fabric interconnect) 101. As described in further detail below, node controller 103 is configured to provide specific management functions for and/or on behalf of respective computing nodes, as known to those skilled in the art, including cache coherency management and/or implementation of cache coherency protocols or other memory access protocols. For purposes of illustration, although the exemplary computing system 100 of FIG. 1 includes two node controllers and four compute nodes, the computing system 100 may include any number of node controllers and compute nodes.

As shown in the exemplary embodiment illustrated in fig. 1, the compute nodes n1 through n4 are computing systems that include a processor and a memory (e.g., a memory device). It should be understood that a compute node may include or consist of any number of processors and/or memories, as well as other hardware and other software not illustrated in the exemplary FIG. 1, as known to those skilled in the art. The compute nodes n1 through n4 (and/or their corresponding components) may be defined physically or virtually. Further, each of the compute nodes n1 through n4 (and/or components thereof) may be physically packaged separately or with other compute nodes. Thus, in some embodiments, computing system 100 may be a server comprised of one or more enclosures, each enclosure including one or more computing nodes.

In some embodiments, as mentioned above, each of the compute nodes n1 through n4 includes a processor and memory, but may include various other hardware and/or software components. As shown in FIG. 1, compute nodes n1, n2, n3, and n4 include processor 105-1 (processor 1), processor 105-2 (processor 2), processor 105-3 (processor 3), and processor 105-4 (processor 4), respectively (collectively referred to herein as "105" or "processor 105"), and memory 107-1, memory 107-2, memory 107-3, and memory 107-4 (collectively referred to herein as "107" or "memory 107"). In some embodiments, processor 105 (and/or a memory controller of processor 105) is connected via one or more memory channels and/or buses, such as a Peripheral Component Interconnect (PCI) bus, an Industry Standard Architecture (ISA) bus, a PCI Express (PCIe) bus, and a high performance link, such as a Peripheral Component Interconnect (PCI) bus, a PCI Express (ISA) bus, a PCIe) bus, and a high performance bus A Direct Media Interface (DMI) system, a fast channel interconnect, a hypertransport, a Double Data Rate (DDR), SATA, SCSI, or a fibre channel bus, etc.) is communicatively coupled (e.g., directly connected) to its corresponding memory 107. Although not shown in FIG. 1, in some embodiments, one or more of the memories 107 are connected to the fabric 101.

In some embodiments, the memory may be local to one processor and remote to the other processor. For instance, in fig. 1, each of the memories (e.g., memory 107-1) may be considered or referred to as being "local" to one of the processors (e.g., processor 105-1) that is communicatively coupled (e.g., directly attached) to the memory. Each of the memories that is not local to the processor may be considered or referred to as "remote" to those processors. Likewise, the processor 105 and memory 107 (and/or nodes n1 through n4) may be local or remote to one of the node controllers 103. For example, as illustrated in FIG. 1, node controller 103-1 is communicatively coupled to processors 105-1 and 105-2 (and thus to their local memories 107-1 and 107-2). Thus, the processors 105-1 and 105-2 (and their local memories 107-1 and 107-2) are local to the node controller 103-1, while the other processors and memories may be considered remote from the node controller 103-1. It should be understood that the node controller 103 may have any number of local processors and memories.

Each of the processors 105 is an independent processing resource, node, or unit configured to execute instructions. It should be understood that each of the processors 105 may be or refer to one or more Central Processing Units (CPUs), dual or multi-core processors comprised of two or more CPUs, computing clusters, cloud servers, and the like. In some embodiments, two or more of the processors 105 (e.g., processor 105-1 and processor 105-2) may be communicatively coupled using a point-to-point interconnect or bus. For example, two or more of processors 105 may be connected using Intel's Ultra Path Interconnect (UPI) or Intel's QuickPath Interconnect (QPI).

Each of the memories 107 may include or consist of any number of memory devices, which may be volatile devices (e.g., Random Access Memory (RAM), static RAM (SRAM), dynamic RAM (DRAM)), and/or non-volatile devices (e.g., non-volatile RAM (NVRAM), double data rate 4 synchronous dynamic RAM (DDR4 SDRAM)). Other types of memory devices that may be used include read-only memory (ROM) (e.g., mask ROM, programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM)), flash memory, memristor devices, and so forth.

As known to those skilled in the art, the memory 107 may be used to store software such as an Operating System (OS), a hypervisor, and other applications. The software stored on memory 107 is comprised of processes and/or threads that can be executed concurrently and share resources such as memory (e.g., memory 107) and processor (e.g., processor 105). The processes and/or threads, when executed, may cause requests and responses to be transmitted between processor 105 (and/or node controller 103-1 and node controller 103-2). As described in further detail below, software stored in memory 107 may be used to provide cache coherency.

The memory 107, or a portion (e.g., a memory block, a segment) thereof, may form a shared memory 107sm. The shared memory 107sm, formed by all or part of all or some of the memory 107, may be shared and/or accessed by all or some of the processors 105. That is, for example, data stored in a portion of the memory 107-1 that is shared and therefore part of the shared memory 107sm may be accessed by a processor other than the processor 105-1. It should be understood that permissions (e.g., read permissions/write permissions) may be used to control access to all or part of the shared memory 107sm and/or by all or some of the processors 105. It should be understood that for simplicity, unless otherwise specified, any or all of the memory 107 mentioned herein shall refer to portions that are shared and make up the shared memory 107sm, although in some embodiments the memory 107 may include unshared regions.

Each of the memories 107 and/or portions thereof may be configured or programmed according to various settings known to those skilled in the art. Such configuration may include designating all or part of the memory 107 as part of the shared memory. Such a configuration may occur, for example, when memory 107 is partitioned. While many other settings known to those skilled in the art may be defined or set for memory blocks or memory regions of memory, one example is a cache coherency protocol. The cache coherency protocol defines the manner in which cache coherency is managed or handled for a respective memory block or region. For purposes of illustration, memory classes will be described with reference to memory blocks. However, it should be understood that a memory class may be allocated to a memory region of any size and/or other memory portion.

As described in further detail below (e.g., with reference to fig. 2), memory blocks may be classified using memory classes that indicate their cache coherency protocol and/or cache coherency management scheme. Although a memory block could be allocated a memory class from any number of memory classes and corresponding cache coherency protocols, in some example embodiments described herein a memory block may be allocated any of up to three memory classes that define its cache coherency protocol or management scheme:

·Class 1: coherency managed by hardware using a coherency directory in system memory and a coherency directory cache on the node controller;

·Class 2: coherency managed by hardware using a coherency directory on the node controller; and

·Class 3: coherency managed by software.

The dynamic configuration of memory segments into one of the cache coherency classes results in a hybrid shared memory and multiprocessor system that makes it possible to exploit the advantages (and minimize the disadvantages) of each cache coherency class to the maximum. It should be understood that each of the memories 107 may be of different sizes and define any number of memory blocks or regions therein that may be of any size and/or configuration. As described in further detail below, at least some information regarding the definition or configuration of memory blocks or memory regions may be stored, such as in a corresponding node controller. In some embodiments, the node controller may store information indicating a memory class and/or a cache coherency protocol for a memory block (and/or memory region). The information allows the node controller to easily identify the cache coherency protocol for the memory class of each memory block and take specific actions based on the protocol. In some embodiments, a base-limit register may be used to store and/or track memory classes or cache coherency protocols for a memory block (or memory region). That is, for example, the node controller may store base-limit registers, each of which corresponds to a memory class. Each base-limit pair associated with a memory class may include or identify a memory address and/or a range of memory covered by the memory class. Thus, the node controller may determine which base-limit pair a received memory address falls within, and thereby identify the memory class of the base-limit pair and hence of the memory block at that memory address. The base-limit pair may also identify the location of the memory region within the node controller organization.
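As a rough illustration of the base-limit scheme described above, the sketch below (hypothetical register layout, field names, and address ranges; not taken from the disclosure) walks a small table of base-limit pairs to classify an incoming address into a memory class.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical base-limit register pair: each pair covers one contiguous
 * memory region and records the memory class assigned to that region. */
struct base_limit_reg {
    uint64_t base;      /* first address covered by the region */
    uint64_t limit;     /* last address covered by the region  */
    int      mem_class; /* 1, 2, or 3 as described above       */
};

/* Example programming of the registers; the ranges are made up. */
static const struct base_limit_reg regs[] = {
    { 0x0000000000, 0x00FFFFFFFF, 1 }, /* class 1 region                    */
    { 0x0100000000, 0x03FFFFFFFF, 2 }, /* class 2 region                    */
    { 0x0400000000, 0x04FFFFFFFF, 3 }, /* class 3 (software-managed) region */
};

/* Return the memory class for an address, or 0 if no pair matches. */
static int classify_address(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(regs) / sizeof(regs[0]); i++)
        if (addr >= regs[i].base && addr <= regs[i].limit)
            return regs[i].mem_class;
    return 0;
}

int main(void)
{
    uint64_t addr = 0x0123456780;
    printf("address 0x%llx -> memory class %d\n",
           (unsigned long long)addr, classify_address(addr));
    return 0;
}
```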

Still referring to FIG. 1, each of the processors 105 may include or be associated with one or more processor caches for storing data from the memory 107 to provide faster access to the data. More specifically, caches 105-1c, 105-2c, 105-3c, and 105-4c (collectively referred to herein as "105c" or "caches 105c") are associated with (and/or local to) processors 105-1, 105-2, 105-3, and 105-4, respectively. Put another way, in some embodiments, each of the memories 107 is associated with the cache 105c of its respective processor 105. It should be understood that a single cache may correspond to a single processor or may be shared among multiple processors. It should also be understood that each cache may be physically disposed on the same chip or component as its respective processor, or on a separate one.

Since data from or stored in memory 107 may be accessed more quickly when cached, cache 105c may be used to store copies of data originally stored in memory 107 that are, for example, accessed more frequently and/or may need to be accessed more efficiently. It should be understood that the cached data may include all or a subset of the data stored in the memory 107. When being cached, data is transferred from memory 107 to cache 105c in fixed-size blocks called cache lines or cache blocks. The copied cache line is stored as a cache entry in cache 105c. The cache entries contain several types of information, including corresponding data copied from memory 107 and the memory location (e.g., tag or address) of the data within memory 107.

In some embodiments, multiple copies of shared data may be stored in multiple caches 105c. For example, data stored in the memory 107-1 associated with the processor 105-1 may be cached in its local cache 105-1c and shared with other processors such that copies are also stored or cached in the remote caches 105-2c, 105-3c, and/or 105-4c. In this case, access to the shared data is coordinated to provide coherency. For example, when shared data is modified, the changes are propagated throughout the system 100 to ensure that all copies of the data are updated and consistent, or shared copies are invalidated, so that coherency is maintained.

Directories may be used to track shared data and provide coherency. As described in further detail below (e.g., with reference to fig. 2), the directory may track and/or store information about shared data, including, for example, state and ownership information. In addition, as also described in further detail below, the directory may be stored in one or more components or devices based on the memory class. Although a directory may be a "full directory" that includes information (e.g., state, ownership) about all shared data in memory 107, unless otherwise specified, a directory herein is used to track all or part of the cache lines or blocks of a corresponding system memory.

Still referring to fig. 1, as discussed above, a node controller 103 is a computing device or component, including a computing node, memory, and/or processor, configured to provide and/or perform a variety of functions (e.g., cache coherency, routing, load balancing, failover, etc.) on behalf of or for a corresponding computing resource. In some embodiments, the functionality of each of the node controllers 103 may instead be provided in one of the processors 105 of the multiprocessor system 100.

In some embodiments, node controller 103-1 may be or include a general purpose processor (e.g., a microprocessor, a conventional processor, a controller, a microcontroller, a state machine, a sequencer), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. In some embodiments, cache coherency management (e.g., FIG. 4) is performed by a node controller that includes a coherency manager hardware sequencer, a hardware state machine, and/or other circuitry or hardware components. However, it should be understood that coherency management may also be provided by processor-implemented instructions as known to those skilled in the art. The node controller 103-1 may also include one or more memories or memory devices such as ROM, RAM, DRAM, EEPROM, flash memory, registers, or any other form of storage medium known to those skilled in the art.

As illustrated in exemplary fig. 1, node controller 103-1 is communicatively coupled to nodes n1 and n2 (and/or processors 105-1 and 105-2 thereof) and/or associated with nodes n1 and n2 (and/or processors 105-1 and 105-2 thereof); and the node controller 103-2 is communicatively coupled to the nodes n3 and n4 (and/or the processors 105-3 and 105-4 thereof) and/or associated with the nodes n3 and n4 (and/or the processors 105-3 and 105-4 thereof). It should be appreciated that each of node controllers 103 may be communicatively coupled to and/or associated with any number of nodes and/or processors. In some embodiments, the node controllers 103 and their corresponding local processors are communicatively coupled via an interconnect, such as a UPI link. Further, node controllers 103 may be interconnected to each other via fabric 101.

In some embodiments, node controller 103 may provide cache coherency by tracking state and ownership information of cache lines in respective ones of caches 105c, among others. As described in further detail below, node controller 103 may employ a directory and/or directory cache to provide cache coherency. That is, although not illustrated in FIG. 1, in some embodiments, node controller 103 may include a coherent directory cache corresponding to a directory for tracking status and ownership information for a cache line size block of, for example, system memory. However, it should be understood that the directory may additionally or alternatively be stored in memory (e.g., in one of the memories 107) rather than in the node controller, as described herein.

Fabric 101 may include one or more direct interconnects and/or switched interconnects through which node controllers 103 (and thus processors 105) may communicate with each other. For example, in some embodiments, fabric 101 may include a direct interconnection between node controller 103-1 and node controller 103-2 (e.g., to minimize latency). Thus, fabric 101 may be used to transfer data and/or messages between or among node controllers 103 and/or one or more of processors 105. Such communication may include, for example, requests to read or write to memory or cache blocks, in which case, as described below, node controller 103 may provide or facilitate cache coherency via multiple concurrently implemented cache coherency protocols for each type of memory class.

FIG. 2 illustrates an exemplary embodiment of a shared memory having memory blocks of multiple memory classes, each class having or indicating a different cache coherency protocol for managing cache coherency of the memory blocks. More specifically, in FIG. 2, a shared memory 107sm, comprised of memories 107-1, 107-2, 107-3, and 107-4, corresponding to processors 105-1 through 105-4 and/or compute nodes n1 through n4 and/or local to processors 105-1 through 105-4 and/or compute nodes n1 through n4, respectively, is illustrated. For purposes of illustration, the portion of shared memory 107sm made up of memory 107-1 includes memory blocks numbered m01 through m20, each of which may have any size and/or different sizes than the other blocks, as shown. Likewise, the portion of shared memory 107sm comprised of memory 107-2 comprises memory blocks numbered m21 through m51; the portion of shared memory 107sm made up of memory 107-3 includes memory blocks numbered m52 through m78; and the portion of shared memory 107sm made up of memory 107-4 comprises memory blocks numbered m79 through m92. The memory blocks are configured to store data. It should be appreciated that the shared memory 107sm may be of any size, include any number of memory blocks or regions, and be comprised of any number of memories or memory devices corresponding to any number of compute nodes. Further, the memories 107-1, 107-2, 107-3, and 107-4 (and the blocks, classes, or regions in each of these memories) need not be contiguous, although they are illustrated as such in exemplary FIG. 2.

For simplicity, the memory classes and their corresponding cache coherency protocols (e.g., cache coherency management schemes, protocols, configurations) for managing cache coherency are now described in more detail with reference to memory 107-1 of shared memory 107sm. It should be appreciated that, for purposes of simplicity, memories 107-2, 107-3, and 107-4 may include some or all of the features, arrangements, configurations, and/or functionalities described below with reference to memory 107-1, although not illustrated in fig. 2 or discussed with reference to fig. 2. Thus, it should be understood that, similar to memory 107-1, memories 107-2, 107-3, and 107-4 may comprise memory blocks of multiple memory classes and as such be associated with caches, directories, and/or node controllers.

Still referring to FIG. 2, memory 107-1 includes memory blocks of three different memory classes: memory blocks m01 through m05 belong to memory class 1 (labeled "class 1"); memory blocks m06 through m16 belong to memory class 2 (labeled "class 2"); and memory blocks m17 through m20 belong to memory class 3 (labeled "class 3"). Different classes of memory blocks have different cache coherency management protocols associated with them. As described above, the memory class of a memory block or memory region may be set, for example, during memory management operations or memory partitioning. It should be understood that the memory may include any number of memory blocks of a given memory class. That is, although the exemplary memory 107-1 includes three different memory classes, the memory 107-1 may include memory blocks of one or more different classes. Furthermore, the memory blocks of the memory belonging to a single class need not be contiguous. That is, for example, although the class 1 memory blocks (m01 through m05) of memory 107-1 illustrated in FIG. 2 are shown as being contiguous, these memory blocks could instead be distributed non-contiguously across other regions or addresses of memory 107-1.

In some embodiments, memories 107-1 through 107-4 are associated with caches 105-1c through 105-4c, respectively. In some embodiments, data stored in a memory block may be cached (e.g., in a processor cache), meaning that a copy of the data (referred to as a "cache line" or "cache block") is stored in a cache entry of the cache. An exemplary structure of a cache entry storing a cache line (e.g., a copy of data) is shown in table 1 below:

TABLE 1

Cache index | Valid bit | Tag | Data

As shown in Table 1, a cache entry may include, among other information, an index, a valid bit, a tag, and a cache line, i.e., a copy of data copied from one of the main memories. In some embodiments, the cache index may be an identifier associated with each cache entry or cache block within the cache; the valid bit may be an identifier (e.g., a bit) indicating whether the corresponding cache entry is used or unused; the tag may be an identifier or descriptor for determining the identity of the cache line and/or for linking or associating the cache line with its corresponding memory block in main memory; and the data is a copy (the cache line) of the data from main memory.
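A minimal C rendering of the cache-entry fields in Table 1 might look as follows (the field widths and the 64-byte line size are assumptions for illustration only):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define CACHE_LINE_BYTES 64  /* assumed line size; actual sizes vary by design */

/* One cache entry per Table 1: index, valid bit, tag, and the cached copy
 * of the data (the cache line). */
struct cache_entry {
    uint32_t index;                  /* identifies the entry within the cache    */
    bool     valid;                  /* whether the entry currently holds a line */
    uint64_t tag;                    /* links the line back to its memory block  */
    uint8_t  data[CACHE_LINE_BYTES]; /* copy of the data from main memory        */
};

int main(void)
{
    struct cache_entry e = { .index = 3, .valid = true, .tag = 0xABCDE, .data = {0} };
    printf("entry %u: valid=%d tag=0x%llx\n", e.index, (int)e.valid, (unsigned long long)e.tag);
    return 0;
}
```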

For purposes of illustration only, cache 105-1c will now be described in further detail. However, it should be understood that caches 105-2c through 105-4c include some or all of the features of cache 105-1c described herein. In some embodiments, copies of data stored in memory blocks of memory 107-1 may be cached in cache entries of the corresponding cache 105-1c of processor 105-1, as well as in caches 105-2c, 105-3c, and/or 105-4 c. Although the cache may be any size, in some embodiments, cache 105-1c of FIG. 2 is smaller than its corresponding memory 107-1. That is, the cache 105-1c is used to store a subset of the data stored in the memory 107-1 (such as more frequently used data) where it can be more quickly accessed by the processor 105-1.

In exemplary FIG. 2, cache 105-1c includes exemplary cache entries c01 through c11, which may be of any size. As illustrated by the arrows pointing from the memory blocks of memory 107-1 to the cache entries of cache 105-1c, cache entries c03, c01, and c08 are associated with and/or store copies of data stored in memory blocks m01, m04, and m05, respectively, which are class 1 memory blocks. Cache entries c02, c11, c04 are associated with and/or store copies of data stored in memory blocks m06, m15, and m16, respectively, which are class 2 memory blocks. Cache entries c09 and c10 are associated with and/or store copies of data stored in memory blocks m17 and m18, respectively, which are class 3 memory blocks. It should be understood that some cache entries (e.g., c 05-c 07) of cache 105-1c may not be used at any given time.

As described above, in some embodiments, a directory (or "coherency directory") may be used to track the state of memory blocks and/or corresponding cache lines (including cache lines in remote caches) throughout the system (e.g., system 100). That is, in a directory entry, the directory may store information indicating which cache has stored or is storing a copy of the data from the memory block in local memory. Table 2 below illustrates an exemplary structure of a directory entry:

TABLE 2

Tag | State | Ownership

As shown in Table 2, each directory entry may include, for example, a tag, state information, and ownership information. The tag may be an identifier, pointer, etc. that associates the directory entry with the memory block (or cache entry). That is, the tag in a directory entry indicates to which memory block or cache entry the state and ownership information of the directory entry corresponds. The state information indicates a state of the memory block. It should be understood that different states may be tracked, including, for example, a modified state, an exclusive state, a shared state, and/or an invalid state.

As known to those skilled in the art, the modified state indicates that data from a memory block is cached in only one cache but is dirty, meaning that it has been modified from the original value stored in main memory. The exclusive state indicates that data from the memory block is cached in only one cache but is clean, meaning that it matches the original value stored in main memory. In some embodiments, the exclusive state enables data to be changed or updated without notifying other potential sharers of the block. The shared state indicates that data from the memory block may be cached in other caches and is clean, meaning that it matches the original value stored in main memory. The invalid state indicates that data from the memory block is not cached.

Still referring to Table 2, the ownership information indicates the processor that owns (e.g., has cached) the memory block or cache line, e.g., when the state of the memory block in the directory is the modified state or the exclusive state; or it indicates the processors with which the memory block is shared, e.g., when the state of the memory block in the directory is the shared state. It should be understood that the actual information stored as state and ownership information may vary depending on the coherency implementation and coherency protocol used.
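For illustration, a directory entry matching Table 2 could be modeled as below; the state encoding and the bit-vector ("sharer vector") form of the ownership field are common directory-protocol choices assumed here, not details taken from the disclosure.

```c
#include <stdint.h>
#include <stdio.h>

/* Tracked states for a memory block, as described above. */
enum dir_state {
    DIR_INVALID,   /* not cached anywhere                      */
    DIR_SHARED,    /* clean copies may exist in several caches */
    DIR_EXCLUSIVE, /* one clean copy in a single cache         */
    DIR_MODIFIED   /* one dirty copy in a single cache         */
};

/* One directory entry per Table 2: tag, state, ownership.  Ownership is
 * modeled as one bit per processor, which is one common encoding; the
 * actual encoding depends on the coherency implementation used. */
struct dir_entry {
    uint64_t       tag;    /* identifies the memory block this entry covers */
    enum dir_state state;
    uint32_t       owners; /* bit i set => processor i owns/shares the line */
};

int main(void)
{
    struct dir_entry e = { .tag = 0x1f00, .state = DIR_SHARED, .owners = 0x5 }; /* shared by CPUs 0 and 2 */
    printf("tag=0x%llx state=%d owners=0x%x\n",
           (unsigned long long)e.tag, (int)e.state, e.owners);
    return 0;
}
```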

The state and ownership of the memory and/or cache of a single compute node may be tracked using a single directory or multiple directories, which may be stored on one or more memories. In some embodiments, the storage location of the directory may be based on a cache coherency category of the memory block in the corresponding cache. Further, in some embodiments, the number of directories associated with a cache of a single compute node may be based on the number of cache coherency classes of the memory block in the respective cache.

For example, as shown in FIG. 2 discussed above, the memory blocks of memory 107-1 have three different categories (e.g., category 1, category 2, category 3). These categories of memory are now described in further detail with reference to fig. 3A and 3B.

As shown in exemplary FIG. 3A, cache coherency for a class 1 memory block is managed by hardware using a coherency directory stored in local main memory and a coherency directory cache stored on the corresponding node controller. As shown in exemplary FIG. 3B, the cache coherency of a class 2 memory block is managed by hardware using a coherency directory stored on the corresponding node controller. The cache coherency of a class 3 memory block is managed by software.

More specifically, in FIG. 3A, the data is stored in a Category 1 memory block of memory 207-1. Among other things, the directory 209-1 of compute node n1a manages coherency by tracking and/or storing the state and ownership of all or some of the memory blocks of memory 207-1. The compute node n1a is communicatively coupled and/or associated with a node controller 203-1, through which the compute node n1a may communicate with other compute nodes. The node controller 203-1 includes a directory cache 203-1c that caches all or some of the directory entries of the directory 209-1. The example arrangement of FIG. 3A provides scalability by allowing the size of the memory and/or cache of node controller 203-1 to remain relatively constant even as the size of the memory of compute node n1a increases.

In FIG. 3B, the data is stored in a Category 2 memory block of memory 207-2. The compute node n2a is communicatively coupled and/or associated with a node controller 203-2, through which the compute node n2a may communicate with other compute nodes. Node controller 203-2 includes a directory 211-2 that, among other things, manages coherency by tracking and/or storing the state and ownership of all or some of the memory blocks of memory 207-2. The example arrangement of FIG. 3B does not require the addition or additional consumption of memory 207-2 for tracking state and ownership information, and eliminates or reduces access to memory 207-2 as, among other things, directories on node controller 203-2 also store state and ownership information.

It should be understood that the directories and other arrangements of compute node n1a and compute node n2a may be combined such that a single compute node has memory with multiple classes of memory blocks. As a result of this combination, directories stored in both the compute node and the corresponding node controller may be provided, and a directory cache may also be provided on the node controller. Further, while the directory 209-1 on the compute node n1a is illustrated as being separate from its corresponding memory 207-1, it is to be understood that the directory may be included and/or stored in the memory 207-1 (e.g., in an unshared portion of the memory 207-1). It should be noted that, among other things, directory 211-2 may serve as both a directory and a directory cache to track the state and ownership of different memory classes. That is, it should be understood that in some embodiments herein, the distinction between a directory and a directory cache relates to the entry replacement algorithm and to the completeness of the tracking performed by each. For example, a directory may be configured to keep track of all relevant memory blocks and to remove only entries that do not have valid remote ownership. Typically, when the directory is located on a node controller, the node controller retains the only copy of the information (e.g., state, ownership). The directory cache may be configured to contain a subset of directory entries, while a complete set of directory information is also retained in another memory. Directory caches typically replace older entries with newer entries.
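The replacement-policy distinction drawn above can be summarized in a short sketch (the function names and the age-based policy are assumptions for illustration): the directory only reclaims entries that no longer have valid remote ownership, because it holds the only copy of that information, while the directory cache freely recycles its oldest entry since the backing directory retains the full set.

```c
#include <stdbool.h>
#include <stddef.h>

struct entry {
    bool     valid;
    bool     has_remote_owner; /* a remote processor still owns or shares the block */
    unsigned age;              /* larger value = older entry                        */
};

/* Directory: it holds the only copy of this information, so it may reclaim
 * only entries that no longer have valid remote ownership. */
static int directory_pick_victim(const struct entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!e[i].valid || !e[i].has_remote_owner)
            return (int)i;
    return -1; /* nothing evictable: every entry still tracks a remote owner */
}

/* Directory cache: the full information also lives in the backing directory,
 * so it can simply reuse a free slot or replace the oldest entry. */
static int directory_cache_pick_victim(const struct entry *e, size_t n)
{
    int victim = 0;
    for (size_t i = 0; i < n; i++) {
        if (!e[i].valid)
            return (int)i;                 /* free slot: use it immediately */
        if (e[i].age > e[(size_t)victim].age)
            victim = (int)i;               /* otherwise remember the oldest */
    }
    return victim;
}

int main(void)
{
    struct entry dir[2] = { { true, true, 5 }, { true, false, 1 } };
    /* Entry 1 has no remote owner, so the directory may evict it; the
     * directory cache would instead pick entry 0, the oldest. */
    return (directory_pick_victim(dir, 2) == 1 &&
            directory_cache_pick_victim(dir, 2) == 0) ? 0 : 1;
}
```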

Returning to FIG. 2, memory 107-1 includes different classes of memory blocks. As a result, the cache coherency of the data of the memory blocks of memory 107-1 employs a coherency directory 109-1 stored in the memory of compute node n1, and another coherency directory 111-1 stored on node controller 103-1. In addition, node controller 103-1 includes a coherency directory cache 103-1c that includes a copy of the entries in directory 109-1, according to a coherency management protocol for the class 1 memory block. It should be understood that in some embodiments, directory 109-1 and directory 111-1 may be configured individually or in combination to be smaller or include fewer entries than needed to track status and ownership information of all memory blocks of memory 107-1. That is, directory 109-1 and directory 111-1 may track the status and ownership of only a subset of the memory blocks, either individually or in combination.

The directory 109-1 includes exemplary directory entries d01 through d12. As shown, directory entries d01, d02, d03, d08, and d09 are associated with and/or store state and ownership information of memory blocks m01, m04, m05, m03, and m02, respectively, among other things. Notably, memory blocks m01, m04, m05, m03, and m02 are class 1 memory blocks. Thus, the coherency directory cache 103-1c of the node controller 103-1 stores a copy of all or some of the data (e.g., state, ownership) in the directory entries d01, d02, d03, d08, and d09, according to the corresponding cache coherency management protocol for class 1 memory blocks. As discussed in further detail herein, node controller 103-1 may easily access ownership and state information for memory blocks m01, m04, m05, m03, and m02 in its own cache (103-1c) without having to access directory 109-1, cache 105-1c, and/or memory 107-1 on compute node n1 to obtain the information. Specifically, directory cache entries c50, c51, c52, c57, and c58 of coherency directory cache 103-1c include copies of directory entries d01, d02, d03, d08, and d09, respectively, of directory 109-1.

Further, the directory 111-1 stored on the node controller 103-1 includes exemplary directory entries d70 through d79. As shown, directory entries d70, d71, and d79 are associated with and/or store state and ownership information for data stored in memory blocks m15, m16, and m06, respectively, among other things. Notably, memory blocks m15, m16, and m06 are class 2 memory blocks. Thus, the node controller 103-1 may access ownership and state information of memory blocks m15, m16, and m06 in its own directory without having to access memory 107-1 and/or directory 109-1 on compute node n1 to obtain the information.

It should be understood that node controller 103-1 may include other directories and/or include additional directory entries in directory 111-1 to additionally or alternatively track and/or store state and ownership information for memory blocks in other associated memories, such as memory 107-2 of compute node n2.

FIG. 4 is a flow diagram illustrating an exemplary embodiment of a process 400 for managing aspects of cache coherency for multi-class memory. More specifically, the process 400 details how the node controller 103-1 processes received requests relating to memory blocks and/or data stored in any of its associated memories 107-1 and 107-2 among the shared memory 107sm in the system 100 (e.g., as described with reference to fig. 1 and 2).

As discussed above, the node controller 103-1 is communicatively coupled to the compute nodes n1 and n2 (and/or their corresponding processors 105-1 and 105-2) that are local thereto. In addition, node controller 103-1 is coupled to other node controllers, such as node controller 103-2, via fabric 101. The node controller 103-2 is communicatively coupled to the compute nodes n3 and n4 (and/or its processors 105-3 and 105-4) that are local thereto. The nodes n3 and n4 (and/or the processors 105-3 and 105-4 thereof) are remote to the node controller 103-1 (and/or to the compute nodes n1 and n2, and the processors 105-1 and 105-2). As shown in FIG. 2, memory 107-1 includes memory blocks of multiple cache coherency classes (e.g., classes 1 through 3).

At step 450 of process 400, node controller 103-1 receives a request associated with a memory block in one of its respective memories, e.g., triggered by a thread being executed by a processor. The request may include an instruction (e.g., read, write), an address of a memory block in or for which the instruction is processed, data, and/or a source identifier (e.g., requesting processor). The request may be a remote request, meaning that it is received by node controller 103-1 from one of the other node controllers (e.g., node controller 103-2) that represents its local processor of system 100 (and/or from one of the processors themselves). On the other hand, the request may be a local request, meaning that it is received by one of the processors (e.g., 105-1 and 105-2) that is local to the node controller 103-1 receiving the request.

That is, in some embodiments, the request received by the node controller 103-1 may be initiated or sent from any of the processors 105 including local processors (e.g., processors 105-1 and 105-2) and remote processors (e.g., processors 105-3 and 105-4). As known to those skilled in the art, requests may be routed to node controller 103-1 based on information (e.g., a memory map) indicating that a memory block (or memory address or an address included in the request) is managed by node controller 103-1. In some embodiments, the request received at step 450 may be a read command or a write command, e.g., to read from or write to a memory block in memory 107-1. In some embodiments, the node controller 103-1 responsible for managing cache coherency may contain state and ownership information for the memory block. As described above, based on the memory classes (e.g., classes 1-3 discussed above), the state and ownership information may be stored in different memories or memory devices. Node controller 103-1 may use this information to accurately and efficiently respond to the request received at step 450.

Further, at step 452, the node controller identifies whether the request received by node controller 103-1 at step 450 originated from a local processor (e.g., processors 105-1 and 105-2) or a remote processor (e.g., processors 105-3 and 105-4). This determination may be performed by node controller 103-1 based on data included in the request (e.g., the requesting processor and/or node controller or an identifier of where the request was received from) and/or information stored by node controller 103-1. Based on this information (e.g., whether the request is from a local processor or a remote processor), the node controller 103-1 may perform appropriate cache coherency management at steps 454 through 468.
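A request as described at steps 450 and 452 could be modeled roughly as follows; the field names, widths, and identifier scheme are assumptions for illustration, and the actual message format is implementation specific.

```c
#include <stdint.h>
#include <stdbool.h>

enum req_op { REQ_READ, REQ_WRITE };

/* Fields carried by a request per steps 450 and 452: an instruction, the
 * target memory-block address, optional data, and a source identifier used
 * to tell local requests from remote ones. */
struct coherency_request {
    enum req_op op;        /* read or write                             */
    uint64_t    address;   /* address of the memory block               */
    uint64_t    data;      /* payload for writes (unused for reads)     */
    uint16_t    source_id; /* requesting processor / node controller id */
};

/* Hypothetical check for step 452: does the source id belong to one of this
 * node controller's local processors? */
static bool request_is_local(const struct coherency_request *r,
                             const uint16_t *local_ids, int n_local)
{
    for (int i = 0; i < n_local; i++)
        if (r->source_id == local_ids[i])
            return true;
    return false;
}

int main(void)
{
    const uint16_t local_ids[] = { 1, 2 }; /* e.g., processors 105-1 and 105-2 */
    struct coherency_request r = { REQ_READ, 0x123456780ull, 0, 3 };
    return request_is_local(&r, local_ids, 2) ? 1 : 0; /* 0: source 3 is remote */
}
```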

The node controller, in turn, determines whether the address (and/or addresses, address ranges) indicated in the request is a Software Managed Coherency (SMC) memory block (e.g., class 3). In some embodiments, this may be performed by determining whether a memory block is in the SMC memory area. That is, at step 454, the node controller 103-1 analyzes the address included in the request and checks the memory class associated with the memory block located at the address. As described above, the node controller 103-1 may make such a determination using memory management data or the like that includes information about the node controller's associated memory (e.g., 107-1, 107-2) among the shared memory 107 sm. In some embodiments, this information may include memory classes (e.g., classes 1 through 3) for blocks of memory managed by node controller 103-1 or associated with node controller 103-1.

If node controller 103-1 determines at step 454 that the address referenced in the received request is a class 3 memory block and/or corresponds to a class 3 (e.g., SMC) memory region, node controller 103-1 transmits a response at step 456 to the processor from which the request was received (e.g., via that processor's respective node controller). It should be noted that, in embodiments in which node controller 103-1 determines at step 452 that the request was initiated by a local processor (e.g., processor 105-1 or 105-2), the response transmitted at step 456 is sent to the local processor without being routed through another node controller. On the other hand, if the request was initiated by a remote processor (e.g., processor 105-3 or 105-4), the response transmitted at step 456 is sent to the remote processor through the remote processor's respective node controller (e.g., 103-2).

As known to those skilled in the art, the type of response and/or the information contained in the response may vary depending on a number of factors, including the cache coherency protocol, whether the request is a read or a write instruction, and so on. For example, in some embodiments, the response may include data from the referenced memory block as well as state and/or ownership information for that memory block.

In some embodiments, the type of response transmitted at step 456 (and/or at step 462, described in further detail below) may depend on whether the request was determined at step 452 to have come from a local or a remote processor. In some embodiments, if the request is received from a local processor, the node controller transmits a response at step 456 that includes an indication that no processor owns, and/or has a shared copy of the data stored in, the memory block referenced in the request. This is because, under a software-managed coherency approach, cache coherency is maintained by software rather than by hardware; that is, software controls (e.g., tracks and ensures) the coherency of SMC memory blocks or regions. On the other hand, if the request is received by node controller 103-1 from a remote processor, the response does not include ownership information for the memory block.
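
The following sketch illustrates, under assumed response and payload formats, how the step 456 response for an SMC block might differ for local and remote requesters as just described; the names coherency_response_t and build_smc_response are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout for this sketch. */
typedef struct {
    uint64_t address;
    uint8_t  data[64];        /* cache-line-sized payload, size assumed */
    bool     reports_unowned; /* "no processor owns or shares this block" */
} coherency_response_t;

/* Build a step 456 response for a class 3 (SMC) block. Software, not the
 * node controller, tracks coherency for such blocks, so a local requester is
 * told the block is unowned, while a remote requester receives the data
 * without ownership information. */
coherency_response_t build_smc_response(uint64_t addr, const uint8_t *line,
                                        bool request_from_local)
{
    coherency_response_t rsp = { .address = addr };
    memcpy(rsp.data, line, sizeof rsp.data);
    rsp.reports_unowned = request_from_local;
    return rsp;
}
```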

In turn, node controller 103-1 returns to its original state of waiting to receive additional requests at step 450.

Returning to step 454: if, on the other hand, node controller 103-1 determines that the address referenced in the received request does not correspond to a class 3 (SMC) memory block, the node controller proceeds to determine, at step 458, whether the address corresponds to a class 2 memory block, i.e., one whose cache coherency is managed by hardware using a directory on the node controller (or hub). In some embodiments, this may be performed by checking whether the address falls within a class 2 memory region. As in step 454, this determination may be made based on information stored by node controller 103-1 regarding the configuration of its respective or local memories (e.g., memories 107-1 and 107-2).

If node controller 103-1 determines at step 458 that the address referenced in the request received at step 450 corresponds to a memory block whose coherency is managed by hardware using a directory on the hub (e.g., class 2 memory), the node controller proceeds to determine, at step 460, whether a directory hit occurs. As described above, in a directory-on-node-controller (or directory-on-hub) hardware-managed configuration, the state and ownership information for class 2 memory is held in a directory 111-1 stored in node controller 103-1 (rather than in the compute node's main-memory directory 109-1). It should be understood that directory 111-1 may include, among other things, state and ownership information for all or a subset of the class 2 memory blocks.

As known to those skilled in the art, a directory hit indicates that directory 111-1 includes state and/or ownership information for the memory block corresponding to the address referenced in the request. If, on the other hand, directory 111-1 does not include state and ownership information for that memory block, a miss occurs (i.e., no hit).

If a hit is identified at step 460, node controller 103-1 proceeds to respond to the request at step 462. Because a hit has been identified, meaning that the state and/or ownership information for the memory block is included in directory 111-1, the response may be based on and/or include the ownership information for the memory block. As discussed above, the type of response may also vary based on whether the request is received from a local or a remote processor and/or whether the request is a read or a write request. Notably, because the directory 111-1 stored in node controller 103-1 includes the state and ownership information for the relevant memory blocks, node controller 103-1 can respond efficiently without first accessing or retrieving state and ownership information stored elsewhere (such as in the compute node's main memory).

If no hit is identified (i.e., a miss occurs) at step 460, meaning that directory 111-1 does not include state or ownership information for the memory block, node controller 103-1 proceeds to transmit a response to the requesting processor at step 456. As discussed above, the response transmitted at step 456 may vary depending on whether the request is received from a local or a remote processor. In some embodiments, if the request is received from a local processor, the response may include an indication that the memory block is not owned by any processor (e.g., because the directory contains no ownership entry for it). On the other hand, if the request is received from a remote processor, state tracking information may be added to the directory 111-1 on node controller 103-1 and an appropriate response sent to the requesting processor (e.g., via that processor's node controller).
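
Continuing the illustration, the class 2 path of steps 458 through 462 might look roughly like the following, assuming a simple on-controller directory with linear lookup; the entry layout, the directory_lookup and directory_track_remote helpers, and the omission of an eviction policy are all simplifications for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry of the directory 111-1 held on the node controller. */
typedef struct {
    uint64_t block_addr;
    uint16_t owner_id;   /* processor currently tracked as owning the block */
    uint8_t  state;      /* e.g. shared / exclusive; encoding assumed */
    bool     valid;
} dir_entry_t;

typedef struct {
    dir_entry_t *entries;
    int          count;
    int          capacity;
} on_controller_directory_t;

/* Step 460: a hit means state and ownership are available immediately and the
 * step 462 response can be built from the returned entry; a NULL return is a
 * miss, handled at step 456 (for a local requester, report the block as
 * unowned). */
dir_entry_t *directory_lookup(on_controller_directory_t *dir, uint64_t addr)
{
    for (int i = 0; i < dir->count; i++) {
        if (dir->entries[i].valid && dir->entries[i].block_addr == addr)
            return &dir->entries[i];
    }
    return NULL;
}

/* On a miss for a remote requester, begin tracking the block before responding. */
dir_entry_t *directory_track_remote(on_controller_directory_t *dir,
                                    uint64_t addr, uint16_t requester)
{
    if (dir->count >= dir->capacity)
        return NULL;  /* eviction policy omitted from this sketch */
    dir_entry_t *e = &dir->entries[dir->count++];
    e->block_addr = addr;
    e->owner_id   = requester;
    e->state      = 0;   /* initial state encoding assumed */
    e->valid      = true;
    return e;
}
```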

In turn, whether a hit is identified or not, after sending a response at step 456 or 462, node controller 103-1 returns to step 450 where it may wait for additional requests to process.

Returning to step 458: if node controller 103-1 determines that the memory block referenced in the request is not a memory block whose coherency is managed by hardware using the directory stored in the node controller (e.g., class 2 memory), the node controller proceeds to determine, at step 464, whether a directory cache hit occurs. In other words, node controller 103-1 checks whether its coherency directory cache 103-1c includes state and/or ownership information for the memory block referenced in the request.

The coherency directory cache check of step 464 is performed because node controller 103-1 assumes that the memory block referenced in the request is class 1 memory, i.e., memory whose coherency is managed by hardware using the coherency directory 109-1 stored in the compute node's main memory (e.g., in memory 107-1) together with the directory cache 103-1c stored in node controller 103-1. As described above, the directory cache 103-1c may store a copy of the data (e.g., state and ownership information) included in the corresponding directory 109-1. In some embodiments, node controller 103-1 may make this assumption because it has already determined that the memory block is neither class 3 nor class 2 memory, and therefore must be class 1 memory. Although not illustrated in FIG. 4, node controller 103-1 may nevertheless perform an explicit check, based on memory configuration information it stores, to confirm that the memory block is class 1 memory prior to step 464.

If node controller 103-1 determines at step 464 that a directory cache hit occurs (meaning that the address referenced in the request corresponds to a memory block whose state and ownership information is stored in directory cache 103-1c), node controller 103-1 transmits a response including the state and ownership information to the requesting node controller or processor at step 462. Because, as the hit indicates, the state and ownership information is stored in the directory cache 103-1c of node controller 103-1, the node controller can respond efficiently without having to access or retrieve state and ownership information stored elsewhere (such as in the compute node's main memory). In turn, node controller 103-1 returns to step 450, where it may wait for additional requests to process.

On the other hand, if node controller 103-1 does not detect a hit (i.e., detects a miss) at step 464, node controller 103-1 reads and/or retrieves the directory information (e.g., state and ownership) for the memory block referenced in the request from directory 109-1 at step 466. As described above, directory 109-1 is stored in memory rather than in node controller 103-1. For example, the directory 109-1 corresponding to the first computing node n1 and/or processor 105-1 may be stored in local memory 107-1 (e.g., in an unshared portion thereof).

Further, at step 468, node controller 103-1 updates its directory cache 103-1c to include the state and ownership information read or retrieved from directory 109-1. That is, node controller 103-1 may store a copy of the state and ownership information included in the corresponding directory 109-1 for the memory block referenced in the request received at step 450.

In turn, at step 462, node controller 103-1 transmits a response to the requesting node controller or processor. The response may include the state and ownership information for the memory block referenced in the request. Node controller 103-1 then returns to step 450, where it may wait for additional requests to process.
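
The class 1 path of steps 464 through 468 can be sketched as follows, assuming a direct-mapped directory cache and a placeholder helper standing in for the read of directory 109-1 from the compute node's main memory; the cache size, block size, and helper are assumptions, not details of the described system.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIR_CACHE_ENTRIES 1024u   /* directory cache size assumed for the sketch */

/* Hypothetical directory cache entry mirroring directory 109-1 content. */
typedef struct {
    uint64_t block_addr;
    uint16_t owner_id;
    uint8_t  state;
    bool     valid;
} dir_cache_entry_t;

/* Placeholder for the step 466 read of directory 109-1 from main memory;
 * a real implementation would issue a read to the compute node's memory. */
static dir_cache_entry_t read_directory_from_memory(uint64_t addr)
{
    dir_cache_entry_t e = { .block_addr = addr, .owner_id = 0, .state = 0, .valid = true };
    return e;
}

/* Steps 464-468: check the directory cache 103-1c; on a miss, fetch the entry
 * from directory 109-1 and install a copy in the cache. The returned entry is
 * then used to build the step 462 response. */
dir_cache_entry_t lookup_class1_state(dir_cache_entry_t cache[DIR_CACHE_ENTRIES],
                                      uint64_t addr)
{
    unsigned idx = (unsigned)((addr >> 6) % DIR_CACHE_ENTRIES); /* 64-byte blocks assumed */
    if (cache[idx].valid && cache[idx].block_addr == addr)
        return cache[idx];                       /* step 464: directory cache hit */

    dir_cache_entry_t fetched = read_directory_from_memory(addr); /* step 466 */
    cache[idx] = fetched;                        /* step 468: update directory cache */
    return fetched;
}
```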

It should be appreciated that although the determinations of the memory classes corresponding to memory blocks (e.g., steps 454, 458) are illustrated as sequential steps in process 400, these determinations may be performed concurrently and/or partially concurrently. For example, in some embodiments, node controller 103-1 may determine the memory class of a memory block in a single step and process the request accordingly.
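
A single-step variant of this dispatch could be sketched as a simple switch on the memory class, with placeholder handlers standing in for the class 1, class 2, and class 3 paths described above:

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder handlers for the three coherency paths described above. */
static void handle_class1(uint64_t addr) { printf("class 1: directory-cache path for %#llx\n", (unsigned long long)addr); }
static void handle_class2(uint64_t addr) { printf("class 2: on-controller directory path for %#llx\n", (unsigned long long)addr); }
static void handle_class3(uint64_t addr) { printf("class 3: software-managed path for %#llx\n", (unsigned long long)addr); }

/* Once the class of the referenced block is known (e.g., from a single memory
 * map lookup), the request is routed directly to the matching path. */
void dispatch_request(int mem_class, uint64_t addr)
{
    switch (mem_class) {
    case 3:  handle_class3(addr); break;   /* SMC: software maintains coherency */
    case 2:  handle_class2(addr); break;   /* directory held on the node controller */
    case 1:  handle_class1(addr); break;   /* directory in main memory + directory cache */
    default: /* address not managed by this node controller */ break;
    }
}
```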

FIG. 5 illustrates an apparatus 503-1 for providing cache coherency management, comprising hardware or hardware components 503-1h (e.g., circuitry, hardware logic). The hardware 503-1h is configured to perform or execute the methods, functions, and/or processes described herein. In some embodiments, these methods, functions, and/or processes may be implemented as machine-readable instructions or code stored on a computer-readable medium (such as RAM, ROM, EPROM, or EEPROM). These instructions may be executed by one or more processors of apparatus 503-1.

As shown in FIG. 5, hardware 503-1h may include hardware (and/or machine-readable and executable instructions) 504-1 for receiving a request, such as a first memory access request including an address and an instruction. The address may be the memory address of a first memory block in the shared memory, and the instruction may reference the first memory block. The shared memory may include multiple memory blocks of one or more memory devices associated with corresponding processors. Each of the multiple memory blocks may be associated with one of multiple memory classes indicating a protocol for managing cache coherency of the corresponding memory block.

Hardware 503-1h may include hardware (and/or machine-readable and executable instructions) 504-2 for determining a memory class associated with the first memory block, and hardware (and/or machine-readable and executable instructions) 504-3 for transmitting a response to the first memory access request based on the memory class of the first memory block.
