Memory node controller
Note: this Memory node controller was designed by Jonathan Curtis Beard, Roxana Rusitoru, and Curtis Glenn Dunham, with a filing date of 2018-07-05. Abstract: A memory node controller for a node of a data processing network, the network comprising at least one computing device and at least one data resource, each data resource being addressed by a physical address. The node is configured to couple the at least one computing device with the at least one data resource. The elements of the data processing network are addressed via a system address space. The memory node controller includes: a first interface to the at least one data resource; a second interface to the at least one computing device; and a system address to physical address translator cache configured to translate a system address in the system address space to a physical address in a physical address space of the at least one data resource.
1. A memory node controller for a node of a data processing network, the data processing network having at least one computing device and at least one data resource, the node being configured to couple the at least one computing device with the at least one data resource, and elements of the data processing network being addressable via a system address space, the memory node controller comprising:
a first interface to the at least one data resource, wherein each of the at least one data resource is addressed via a physical address space;
a second interface to the at least one computing device; and
a system address to physical address translator cache configured to translate a system address in the system address space to a physical address in the physical address space of a data resource of the at least one data resource.
2. The memory node controller of claim 1, further comprising:
a Physical Device Configuration Settings (PDCS) memory that stores information indicating a mapping of elements of the data processing network into the system address space.
3. The memory node controller of claim 1, wherein the at least one data resource comprises a remote network, and wherein the first interface comprises a network interface card.
4. The memory node controller of claim 1, wherein the first interface comprises an interface to another memory node controller.
5. The memory node controller of claim 4, wherein the system address space comprises a plurality of address partitions, and wherein the memory node controller is associated with a first address partition of the plurality of address partitions and the other memory node controller is associated with a second address partition of the plurality of address partitions.
6. The memory node controller of claim 1, wherein the second interface comprises an interface to a processor core.
7. The memory node controller of claim 1, wherein the second interface comprises an interface to a hardware accelerator.
8. The memory node controller of claim 1, wherein the first interface comprises an interface to a memory device or a storage device.
9. A non-transitory computer readable medium having instructions representing a hardware description language of the memory node controller of claim 1.
10. A non-transitory computer readable medium having a netlist representing the memory node controller of claim 1.
11. A data processing network comprising:
a first memory node controller;
a first plurality of addressable units addressed by a system address space and comprising a first plurality of data resources, each of the first plurality of data resources coupled to the first memory node controller via a channel and addressed by a physical address space; and
a first plurality of computing devices each coupled to the first memory node controller and configured to access the first plurality of addressable units via the first memory node controller,
wherein the first memory node controller comprises a system address to physical address translator cache configured to translate system addresses received from computing devices of the first plurality of computing devices to physical addresses in an address space of data resources of the first plurality of data resources.
12. The data processing network of claim 11, further comprising:
one or more second memory node controllers coupled to the first memory node controller;
wherein the first memory node controller is assigned a first partition of system addresses in the system address space,
wherein each of the one or more second memory node controllers is assigned a second partition of system addresses in the system address space, and
wherein a computing device of the first plurality of computing devices includes a range table associating the first memory node controller with system addresses in the first partition and associating each of the one or more second memory node controllers with system addresses in the corresponding second partition, the range table being used to direct a request to access memory at a system address to the memory node controller, of the first and second memory node controllers, that is associated with that system address.
13. The data processing network of claim 12, further comprising:
a second plurality of data resources each coupled to a second memory node controller of the one or more second memory node controllers via a channel and having a physical address space; and
a second plurality of computing devices each coupled to a second memory node controller of the one or more second memory node controllers and configured to access the data processing network via the system address space,
wherein the one or more second memory node controllers are configured to couple the second plurality of computing devices with the second plurality of data resources.
14. The data processing network of claim 11, wherein the first plurality of addressable units further comprises a hardware accelerator.
15. The data processing network of claim 11, wherein the first plurality of addressable units further comprises a network interface card.
16. A method for accessing one or more data resources by one or more computing devices in a data processing network, the method comprising:
mapping elements of the data processing network to a system address space;
assigning a first partition of the system address space to a first memory node controller of the data processing network, wherein the one or more computing devices and the one or more data resources are coupled to the first memory node controller;
receiving, at the first memory node controller, a request to access an element of the data processing network at a system address in the system address space; and
servicing, by the first memory node controller, the request when the system address is in the first partition of the system address space.
17. The method of claim 16, further comprising:
assigning a second partition of the system address space to a second memory node controller of the data processing network; and
forwarding the request to the second memory node controller when the system address is in the second partition of the system address space.
18. The method of claim 16, further comprising:
assigning a second partition of the system address space to a second memory node controller of the data processing network; and
servicing, by the first memory node controller, the request when the system address is in the second partition of the system address space and the system address is dynamically shared with the first memory node controller.
19. The method of claim 16, wherein each of the one or more data resources is coupled to the first memory node controller via a channel, and wherein servicing the request by the first memory node controller comprises:
identifying a channel to a data resource of the one or more data resources corresponding to the system address;
translating the system address to a physical address in the data resource; and
accessing the data resource at the physical address via the identified channel.
20. The method of claim 16, wherein the first partition of the system address space comprises a first plurality of pages, the method further comprising:
assigning a second partition of the system address space to a second memory node controller of the data processing network, wherein the second partition of the system address space comprises a second plurality of pages;
monitoring access to the second plurality of pages by the one or more computing devices coupled to the first memory node controller; and
migrating pages of the second plurality of pages from the second memory node controller to the first memory node controller in accordance with the monitored accesses.
21. The method of claim 20, further comprising:
recording a coherency state of the migrated pages.
22. The method of claim 16, wherein the first partition of the system address space comprises a plurality of lines, and wherein the data processing network further comprises a data transfer cache, the method further comprising:
monitoring system memory requests to the plurality of lines by the one or more computing devices coupled to the first memory node controller;
servicing, by the first memory node controller, a system memory request when a requested line of the plurality of lines is not present in the data transfer cache;
pushing the requested line from the first memory node controller to the data transfer cache of the data processing network in accordance with the monitored system memory requests; and
servicing, by the data transfer cache, the system memory request when the requested line is present in the data transfer cache.
23. The method of claim 16, further comprising:
assigning a second partition of the system address space to a second memory node controller of the data processing network, wherein one or more additional data resources are coupled to the second memory node controller; and
by the first memory node controller:
allocating memory within an address range of the second partition of the system address space;
entering the allocated address range in a system address translation table of the first memory node controller; and
directing memory requests for addresses within the allocated address range to the second memory node controller.
Technical Field
The present disclosure relates to control of physical device memory in a data processing network.
Background
A data processing system may include multiple computing devices of various types and multiple memory resources of different types. For example, a system may include Dynamic Random Access Memory (DRAM), block devices, Remote Direct Memory Access (RDMA) devices, memory located on hardware accelerators, and other types of volatile and non-volatile memory. Memory and other resources within a data processing system are addressed by a system address space, while each memory device is addressed by a physical address space.
The mapping between system addresses and corresponding physical addresses may be performed statically, either by software calls to the operating system, or by hardware caching of software-mediated translation processes. Such an approach does not provide optimal use of memory, particularly when memory resources are shared among multiple processing cores or multiple processes and when the memory resources have different characteristics.
Drawings
FIG. 1 illustrates a data processing network consistent with certain embodiments of the present disclosure.
Fig. 2 is another block diagram of a data processing network consistent with embodiments of the present disclosure.
FIG. 3 is a block diagram of a simplified network incorporating a memory node controller consistent with embodiments of the present disclosure.
FIG. 4 illustrates a state diagram of a data coherency protocol consistent with embodiments of the present disclosure.
Fig. 5 is a block diagram of a data processing network consistent with the present disclosure.
FIG. 6 is a flow diagram of a method for routing memory access requests consistent with an embodiment of the present disclosure.
Detailed Description
While this invention may be embodied in many different forms, there is shown in the drawings and will herein be described in detail specific embodiments, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated and described. In the description below, like reference numerals may be used to describe the same, similar or corresponding parts in the several views of the drawings.
In this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," "including," "includes," "having," "with," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by "comprises ... a" does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Reference throughout this document to "one embodiment," "certain embodiments," "an embodiment," "implementation(s)," "aspect(s)," or similar terms means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.
The term "or" as used herein is to be interpreted as inclusive, meaning any one or any combination. Thus, "A, B or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C." An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive. Additionally, unless stated otherwise or clear from context, grammatical conjunctions are intended to express any and all disjunctive and conjunctive combinations of conjoined clauses, sentences, words, and the like. Thus, the term "or" should generally be understood to mean "and/or" and the like.
All documents mentioned herein are hereby incorporated by reference in their entirety. Reference to a singular item should be understood to include a plural item and vice versa unless explicitly stated otherwise or clear from the text.
The words "about," "approximately," "substantially," and the like, when accompanied by numerical values, are to be construed as indicating a deviation as would be appreciated by one of ordinary skill in the art to operate satisfactorily for the intended purpose. Values and/or ranges of values are provided herein as examples only and are not limiting upon the scope of the described embodiments. The use of any and all examples, or exemplary language ("e.g.," "such as," etc.), provided herein is intended merely to better illuminate the embodiments and does not pose a limitation on the scope of the embodiments. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the embodiments.
For simplicity and clarity of illustration, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. Numerous details are set forth to provide an understanding of the embodiments described herein. Embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the described embodiments. The description should not be considered as limiting the scope of the embodiments described herein.
In the following description, it is to be understood that such terms as "first," "second," "top," "bottom," "upper," "lower," "above," "below," and the like are words of convenience and are not to be construed as limiting terms. Additionally, the terms apparatus and device may be used interchangeably herein.
System
Fig. 1 is a schematic illustration of a data processing network consistent with embodiments of the present disclosure. Referring to FIG. 1, a data processing network includes a plurality of processor cores. The processor cores are arranged to process data according to virtual memory addresses. For example, each of the processor cores may process data according to virtual memory addresses in a respective virtual memory address space, e.g., under control of an operating system or a so-called hypervisor that allocates virtual memory address spaces to processes being executed by different processor cores, in part as a technique for avoiding a process associated with one processor core accidentally or maliciously accessing data appropriate to a process being executed by another of the processor cores.
First layer memory address translation
Elements of the data processing network, such as memory and other resources, are addressable by system addresses in a system address space. Memory address translation means are provided for translating between virtual memory addresses in the virtual memory address space and system addresses in the system address space. This system address space may be accessed via indirect means or via a processing device that accesses this system address space as an anonymous physical space (i.e., the physical memory for the processing device is virtualized). The system address is the "output" memory address of the first layer. The system address may represent a physical address that can be used to physically address a physical memory device or other addressable unit. Alternatively, the system address may represent an address that requires another stage of address translation before being used to access a physical memory device or other addressable unit. These options are equivalent from the point of view of the address translation technique. That is, the address translation technique starts with a virtual memory address and generates a system memory address. The further stage of address translation for system addresses is provided by the memory node controllers, discussed below.
In fig. 1, address translation is performed by so-called Range Table Buffers (RTBs) 105, 115. These perform address translation between virtual memory addresses in the virtual memory address space and system memory addresses in the system (output) address space. Each of the processor cores has a respective range table buffer. The operation of the range table buffer is described in detail in co-pending patent application No. 15/649,930, which is hereby incorporated by reference in its entirety. However, other address translation techniques may be used, such as, for example, a Translation Lookaside Buffer (TLB).
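The first-layer translation performed by a range table buffer can be illustrated with a minimal software model. This is a sketch only: the entry fields and the flat linear lookup are assumptions for illustration, not the hardware structure described in the referenced co-pending application.

```python
# Illustrative model of first-layer (virtual -> system) address translation
# via a range table. Names and structure are hypothetical.
from dataclasses import dataclass

@dataclass
class RangeEntry:
    virt_base: int   # first virtual address covered by the range
    size: int        # length of the range in bytes
    offset: int      # added to a virtual address to form a system address

class RangeTableBuffer:
    def __init__(self, entries):
        self.entries = entries

    def translate(self, virt_addr):
        """Translate a virtual address to a system (output) address."""
        for e in self.entries:
            if e.virt_base <= virt_addr < e.virt_base + e.size:
                return virt_addr + e.offset
        raise KeyError("no range entry covers this virtual address")
```

For example, with a single entry covering virtual addresses 0x1000-0x2FFF at offset 0x7000, translating 0x1800 yields system address 0x8800; the system address is then passed to the second translation layer in the memory node controller.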
Also provided on-chip is a cache and/or system cache. In one embodiment, the cache/system cache 130 (140) operates according to system (output) memory addresses generated by the range table buffers.
Dashed line 150 indicates a logical boundary between an on-chip device and an off-chip device, but it should be appreciated that this is merely an example and that the implementation of any of the modules shown on the same integrated circuit or as different circuits in FIG. 1 is a matter of the system designer. Thus, fig. 1 shows an illustrative example of how separation of on-chip and off-chip components may be achieved.
Memory node controller
One or more off-chip memory node controllers couple the on-chip devices to memory and storage resources, providing (a) translation of system addresses to physical addresses and (b) management of where, among those resources, data is stored.
The translation operation (a) mentioned above is a second layer address translation and may be performed using the techniques to be discussed below, or by known memory address translation techniques. The management operation (b), for managing which of the memory and storage resources holds particular data, is also discussed below.
In the example of FIG. 1, two memory node controllers are provided. If one of the memory node controllers receives a request directed to a system address outside of its assigned partition, the request is forwarded to the other memory node controller.
The present disclosure relates to Memory Node Controllers (MNCs). According to some embodiments, the MNC provides a dedicated hardware mechanism to collect and use metadata, including performance statistics such as reuse distance. The metadata is used to provide better placement of the memory pages among the available technologies.
As described above, the MNC maps at least one partition of the system address space of the data processing network to physical device memory space. The MNC provides a mapping function from the system address space to physical space in a resource such as, for example, a DRAM device, a block device, a Remote Direct Memory Access (RDMA) appliance, or memory located on a hardware accelerator. An RDMA appliance may be any memory or storage device used for remote memory access. The MNC provides functionality for performing writes to system addresses, maintaining page-level ownership across the memory fabric, optimally placing pages into memory via metadata tracking, and feeding data forward to fast on-chip memory. The MNC is implemented in hardware and may be part of an integrated circuit with additional functionality. For example, the MNC may be synthesized from a netlist or a Hardware Description Language (HDL) representation.
According to some embodiments, the MNC provides a single abstraction of resources, such as storage, memory, or Network Interface Controllers (NICs), into a single system address space.
According to some embodiments, the MNC provides a means for treating memory at the MNC page level as "shareable" between multiple MNCs.
According to some embodiments, the MNC provides second-layer copy-on-write functionality.
According to certain embodiments, the MNC provides an efficient means for performing a single copy operation. This may be provided, for example, to all levels of the accelerator device and may be provided via the NIC.
According to some embodiments, the MNC is part of a memory fabric configured in compliance with a memory server model, where the memory fabric services memory requests from various local or remote computing devices of the data processing network.
Fig. 2 is a block diagram of a memory node controller consistent with embodiments of the present disclosure. The components of the memory node controller are described below.
IO controller
Memory controller
Physical device configuration settings (PDCS) memory
The configuration data is stored in a Physical Device Configuration Settings (PDCS) memory.
In one embodiment, the PDCS memory stores information that tells the MNC about the devices present on the network. This enables devices, in effect, to be mapped from the specified memory space into an operating system and file system.
The information about storage devices is slightly different: it tells the MNC which devices are attached, their characteristics, and which bus channel or channels to assign to them. Further, for a PCIe accelerator, it may be necessary to provide other configuration data, as well as a system address mapping for the accelerator device, so that it can be mapped into the system address space of the host operating system for virtualization.
In summary, the configuration information in the PDCS memory provides the MNC with the information needed to make external devices accessible and to map devices such as accelerators, computing devices, and network interface controllers into the system address space to enable virtualization. This information may supplement or replace traditional device configuration within the kernel.
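The kind of per-device record the configuration memory might hold can be sketched as follows. The field names and the overlap check are illustrative assumptions, not details from the disclosure.

```python
# Hypothetical sketch of per-device configuration records and the
# system-address map they describe. All names are illustrative.
from dataclasses import dataclass

@dataclass
class DeviceConfig:
    name: str       # e.g. "dram0", "accel0", "nic0" (hypothetical labels)
    kind: str       # "memory", "storage", "accelerator", or "nic"
    sys_base: int   # base of the device's window in the system address space
    sys_size: int   # size of the window in bytes
    channel: int    # bus channel assigned to the device

def build_system_map(configs):
    """Return sorted (base, end, name) windows, rejecting overlaps,
    so that every system address maps to at most one device."""
    windows = sorted((c.sys_base, c.sys_base + c.sys_size, c.name)
                     for c in configs)
    for (b1, e1, n1), (b2, e2, n2) in zip(windows, windows[1:]):
        if b2 < e1:
            raise ValueError(f"windows of {n1} and {n2} overlap")
    return windows
```

A map built this way gives each device a disjoint window in the single system address space, which is what allows accelerators and NICs to be virtualized alongside ordinary memory.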
Other memory node controllers can be easily discovered at startup through handshaking, but they can also be specified directly in the configuration information.
System to physical translation (STP) cache structure
A system to physical translation (STP) cache holds translations from addresses in the system address space to physical addresses in the address spaces of the data resources.
Caching
Network configuration
FIG. 3 is a block diagram of a simplified network incorporating a memory node controller, consistent with embodiments of the present disclosure.
The function of the memory node controller is described below.
In operation, a request sent from the computing device to the MNC references a system address. The request sent from the MNC to the memory/storage resource references a physical (or network) address. The MNC is configured to perform a translation from a system address to a physical address.
Routing of memory access requests in a network
One function of the memory node controller is to translate system addresses to physical addresses. When a request to access memory at a particular address is sent in a data processing network having memory node controllers, the request is routed to the appropriate MNC. Various routing techniques may be used. Embodiments may use, for example, a clustered memory node controller scheme as depicted in FIG. 3. Typically, in a clustering scheme, there may be a maximum of N cores or computing devices for each of the K MNCs. The N compute elements are clustered such that the best route is to the local memory node. Each memory request originating from these cores goes directly to the nearest MNC. If the request is for a page that is statically assigned to that MNC, or for a page that is dynamically shared from another MNC, the request may be serviced immediately and the data returned to the core. However, if another MNC owns the memory (as determined, for example, by a coherence protocol such as the one shown in FIG. 4 and discussed below), there is one additional network hop before the request is satisfied. When operation of the system is initiated, each MNC is assigned a region or partition of the overall system addressable space that can be utilized by the system (with provision for repartitioning to facilitate hot-plugging). The partition assigned to each memory node is then divided into pages. An advantage of this scheme is that locality is implied by the MNC at which a memory request originates: the computing device accessing the node (or at least its cluster) is known without additional data. Using this information, the MNC can migrate data pages within the memory network, or retrieve them from the owner of the partition, if the access pattern warrants it.
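The clustered routing decision described above can be sketched as a small software model. This is illustrative only: the partition size, page size, and function names are assumptions, and the static owner computation stands in for whatever partition map a real system would use.

```python
# Toy model of clustered MNC routing: each MNC statically owns one
# partition of the system address space; a request goes first to the
# requester's local MNC, which services it locally or forwards it to
# the partition owner (one extra network hop). Sizes are illustrative.

PARTITION_SIZE = 1 << 20   # assume each MNC owns 1 MiB of system space
PAGE_SIZE = 4096

def owner_mnc(sys_addr):
    """Index of the MNC statically assigned this system address."""
    return sys_addr // PARTITION_SIZE

def route(sys_addr, local_mnc, shared_pages):
    """Return the list of MNC hops needed to service the request.

    shared_pages: set of (mnc, page_number) pairs for pages dynamically
    checked out to an MNC from another partition's owner.
    """
    page = sys_addr // PAGE_SIZE
    if owner_mnc(sys_addr) == local_mnc or (local_mnc, page) in shared_pages:
        return [local_mnc]                    # serviced at the local MNC
    return [local_mnc, owner_mnc(sys_addr)]   # one additional network hop
```

The model shows the locality property in the text: the identity of the local MNC is implicit in where the request arrives, and only requests for pages owned (and not shared) by another MNC pay the extra hop.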
Consistent with some embodiments, the retrieval of pages is facilitated by a restricted directory structure. The restricted directory structure may be stored in memory local to the MNC.
Referring again to fig. 3, when a request from a core or computing device to access memory at a system address arrives at the MNC, it is processed as follows.
When an address arrives in a request from the core to the MNC, a routing calculation is performed for that address. If the address is outside the partition of the current storage node, a range lookup for routing may be performed in parallel by consulting the directory to determine whether the page has been checked in from the master node of its external system address partition.
In parallel with routing the system address, one of two operations can be performed, depending on implementation requirements. First, the hash map can be consulted to see whether the page (assuming the requested address is outside the current memory node's partition of the system address space) has been checked out from its home node and currently resides in the current node (the node performing the address calculation). Another option is a directory-like methodology that sends the request packet to the master node of the system address partition, which then determines whether the page is checked out by a closer node. In this method, the originating MNC (i.e., the first node that receives the request from the computing device) is encoded in the packet. This approach may require an additional network hop if the page is checked out locally, but has the benefit of reduced overall data movement while retaining the benefit of interleaving data with the requesting socket.
Within the MNC, a number of data structures can be used in hardware to store paging information. For example, in one embodiment, a sparse hash map structure is used, which may be implemented as a tree structure. In a write operation to a page without physical backing, backing is created in a class of memory chosen by an optimization function (most likely first in DRAM, as an example); however, new dirty pages can just as easily be created in non-volatile media. In a read operation, a similar thing happens. The operations may be performed on a page-by-page basis, where a page is a subset of a range at a certain granularity (e.g., 4K). In this way, range translation is provided and pages are striped/placed on the most efficient memory technology. This structure is described further below. Each page can be placed anywhere in the memory network by the MNC without the core making any changes or taking any action.
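The lazy, per-page backing described above can be sketched with a dictionary standing in for the sparse hash map. The placement rule here (writes to DRAM, first-touch reads to non-volatile memory) is a deliberately trivial stand-in for the optimization function; all names are hypothetical.

```python
# Sketch of a sparse page map: system pages (4 KiB granularity) are
# materialized lazily, with the backing tier chosen by a toy placement
# function. Structure and names are illustrative only.

PAGE_SIZE = 4096

class SparsePageMap:
    def __init__(self):
        self.pages = {}                        # page number -> (tier, phys_page)
        self.next_phys = {"dram": 0, "nvm": 0} # next free frame per tier

    def _place(self, write):
        # toy optimization function: first touch by a write goes to DRAM,
        # a read of an unbacked page is backed in non-volatile memory
        return "dram" if write else "nvm"

    def access(self, sys_addr, write):
        """Return (tier, physical address) for a system address,
        creating backing on first touch."""
        page = sys_addr // PAGE_SIZE
        if page not in self.pages:
            tier = self._place(write)
            self.pages[page] = (tier, self.next_phys[tier])
            self.next_phys[tier] += 1
        tier, phys_page = self.pages[page]
        return tier, phys_page * PAGE_SIZE + sys_addr % PAGE_SIZE
```

Because only touched pages occupy entries, a large system address range costs nothing until it is used, and each page can independently land on whichever memory technology the placement function selects.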
Since data can be shared between computing devices, a coherency protocol is used to prevent access to memory pages containing stale data. To this end, a restricted directory structure may be used to store the states of the data pages.
Fig. 4 illustrates an example state diagram 400 for a modified MESI protocol consistent with embodiments of the present disclosure. A page can be identified as being in one of four different states: "modified" (M) 402, "exclusive" (E) 404, "shared" (S) 406, and "invalid" (I) 408. MESI diagram 400 illustrates the transitions between the different states. The state may be indicated by status bits in the metadata for each page, together with other data such as page utilization statistics, performance counters, and the like. State diagram 400 illustrates how the modified MESI protocol for MNC page sharing is managed within the MNC network.
For example, a checked-out page that is not from the current MNC's partition is called a "foreign" page, and its checked-out state is recorded in a directory of the MNC, which may be stored in local memory. The checked-out state is indicated in the page metadata for pages in the partition of the current MNC, i.e., pages for which the current MNC is the main partition MNC.
For example, when a page is allocated, it may be initially checked out from the main partition MNC in the "exclusive" (E) 404 state. After a write has occurred, the state changes to "modified" (M) 402. After the page has been synchronized back to the main partition MNC, the state returns to "shared" (S) 406.
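The allocate/write/synchronize transitions just described can be encoded as a small transition table. This is a simplified illustration, not the complete protocol of Fig. 4, and the event names are assumptions.

```python
# Toy encoding of page-state transitions in a modified MESI protocol
# tracked in page metadata. A partial, illustrative transition set only.

TRANSITIONS = {
    ("I", "checkout"): "E",    # page checked out exclusively on allocation
    ("E", "write"): "M",       # first write dirties the page
    ("M", "sync"): "S",        # synchronized back to the main partition MNC
    ("E", "share"): "S",       # another node requests the page
    ("E", "invalidate"): "I",  # page deallocated or recalled
    ("M", "invalidate"): "I",
    ("S", "invalidate"): "I",
}

def next_state(state, event):
    """Return the next page state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Walking the example from the text: checkout takes I to E, a write takes E to M, and synchronizing back to the main partition MNC takes M to S.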
If a page is deallocated while it is in either the "exclusive" (E) 404 or "modified" (M) 402 state, it is migrated back to its main partition MNC.
Once space is needed in the directory structure, or if a page is requested by another node in the shared state, checked-out pages will eventually be migrated back to the main partition MNC. Moving a page back to the main partition MNC is similar to writing data back to memory from a standard cache. However, the MNC may keep the page in persistent or volatile memory; the state of the checked-out page is indicated in the metadata.
Each MNC is assigned a block or partition of the full system address space at start-up or during the renegotiation process. An example system address range may be: {base_address + 0} → {base_address + n}. This partition is further subdivided into physical addresses (or network addresses in the case of a NIC) behind the MNC. The MNC controls access to all resources behind it that can store data and maps the system address space to physical or network addresses in those resources. The file system and networking functionality may also be mapped into this address space. An accelerator with on-board memory is also mapped into this address space and can be accessed from its virtual address space through the interface without knowledge of the system address space.
Starting at the processor core, a Range Table Buffer (RTB) or a Translation Lookaside Buffer (TLB) is used to translate the virtual memory address in a request to an address in the system address space. The request is then sent to the memory node controller.
In a first embodiment, the memory space is divided between the MNCs so that there is a fixed, static mapping after boot. This approach has the advantage of fast route calculation and always reaching the correct node. However, it may not be optimal for on-chip routing or for optimizing memory placement. One reason a fixed mapping is suboptimal is that it increases on-chip memory traffic that could otherwise be used for core-to-core (or thread-to-thread) communication. Another is that the overhead required to support on-chip cluster locality with N cores is log2(N) bits per memory request: each request must be tagged to indicate its origin and then passed to the MNC. This approach may be used, for example, when simplicity is more important than overhead.
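The log2(N) overhead above can be made concrete with a small helper (hypothetical, for illustration): identifying the originating core or cluster among N possibilities requires ceil(log2(N)) tag bits on every memory request.

```python
import math

def origin_tag_bits(n_sources: int) -> int:
    """Bits needed per memory request to encode which of n_sources
    (cores or clusters) originated it -- the log2(N) overhead above."""
    return max(1, math.ceil(math.log2(n_sources)))
```

For instance, 64 cores need 6 tag bits per request, and any non-power-of-two count rounds up to the next whole bit.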
In another embodiment, it is assumed that there will be N cores per each of K MNCs, and that these N cores are clustered, as opposed to forming a fully connected grid. In this approach, the routing path runs from the cluster through one or more caches (including any DDCs) and then to the off-chip interconnect controller associated with the cluster. This interconnect may utilize PCIe or another physical layer, for example. Each MNC is assigned a static system address partition. This may be done after boot configuration or on system reset, to facilitate hot add/remove/plug of storage or memory. At the MNC, the system address range is further subdivided into pages that are either zero-allocated (for initial allocation) or point to physical memory on some device. The advantage of this routing path is that the source of a request is implicit in its origin: it is known which core cluster created the traffic without further information or metadata. Capturing and recording the originating core would otherwise require at least enough bits per memory request to encode the number of clusters within a node. Using this locality information, the MNC network may migrate virtual pages within the network or check them out from the owner of the system address partition if the compute pattern warrants it.
Efficient use of this infrastructure is facilitated through software awareness. Given that the system address space is split between N MNCs, and these MNCs are connected to the computing device based on, for example, physical locality, tasks may be scheduled such that they are executed on computing devices connected to MNCs that control already allocated system and physical memory, or at least to nearby MNCs. This ensures low latency communication.
Fig. 5 is a block diagram of a
When core 502 issues a request to access data at a virtual address in
If the system address is not in partition R2, the MNC controlling that partition is identified and the request is forwarded to the identified MNC for service. Any response to the request is returned to core 502 via link 532.
In some embodiments, the translation between the system address and the physical address is performed within the MNC using a data structure stored in a system-to-physical translation (STP) cache (e.g., 230 in fig. 2). The data structure may be a table indexed by hashing the system address with a page mask. For example, when the page size is a power of two, the page number may be calculated with a logical AND operation between the system address and a mask derived from the page size. An example page entry in the STP cache may contain the information shown in table 1.
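The mask arithmetic described above can be sketched as follows, assuming a power-of-two page size (the 2 MB figure used in the sizing example later in the text):

```python
PAGE_SIZE = 2 * 1024 * 1024        # must be a power of two
OFFSET_MASK = PAGE_SIZE - 1        # low bits select the byte within a page

def page_number(system_addr: int) -> int:
    # Logical AND with the inverted offset mask isolates the page base;
    # dividing by the page size gives the number used to index the table.
    return (system_addr & ~OFFSET_MASK) // PAGE_SIZE

def page_offset(system_addr: int) -> int:
    return system_addr & OFFSET_MASK
```

A hashed table keyed on `page_number` then yields the physical frame, to which `page_offset` is added.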
Table 1.
In one embodiment, copy-on-write is supported using three pointers, one to an entry that is the current clean physical copy (head), one to the parent and one to the child. This enables the update process to be optimized. Other variations will be apparent to those skilled in the art.
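The three-pointer copy-on-write arrangement described above might be sketched as follows; the field and function names are assumptions, and real entries would live in the STP cache rather than in Python objects:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PageEntry:
    phys_addr: Optional[int]                 # physical copy owned here, if any
    head: Optional["PageEntry"] = None       # entry holding the current clean copy
    parent: Optional["PageEntry"] = None
    child: Optional["PageEntry"] = None

def cow_share(parent: PageEntry) -> PageEntry:
    """Share a page without copying: the child reads through 'head'."""
    child = PageEntry(phys_addr=None, head=parent.head or parent, parent=parent)
    parent.child = child
    return child

def cow_write(entry: PageEntry, new_phys: int) -> None:
    """First write to a shared entry: give it a private physical copy."""
    entry.phys_addr = new_phys
    entry.head = entry                       # entry now holds its own clean copy
```

Reads follow `head` until the first write, at which point only the writer's pointer changes, which is what makes the update process cheap.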
Memory allocation may be handled efficiently by using a buddy memory allocation scheme or another scheme that can be represented by a sparse tree. Compression (reordering) of the system address range may be accomplished, for example, by signaling the OS to find the processing threads containing the system address and then change the system address range. This process can be time consuming. However, for a large system address space (such as 64 bits), it is unlikely to be necessary unless the system becomes very much larger.
The page information is stored in memory. In the simplest hardware implementation, one entry is used for each page. For example, if a single MNC is assigned a 100TB address partition and the page size is selected to be 2MB, the table fits into a small 64MB SRAM structure even when the partition is fully populated. Additional space is required if other metadata is to be stored. However, in one embodiment, the size of the table is reduced by compressing empty pages into a zero-page range. In another embodiment, the translation data may be persistent, or may have a second copy to ensure persistence.
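The sizing example above can be checked directly: with one entry per page, a 100 TB partition divided into 2 MB pages needs about 52 million entries.

```python
TB = 2 ** 40
MB = 2 ** 20

# One table entry per 2 MB page of a 100 TB partition.
entries = (100 * TB) // (2 * MB)   # 52,428,800 entries
```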
FIG. 6 is a flow diagram 600 of a method for accessing one or more data resources by one or more computing devices in a data processing network that routes memory access requests consistent with embodiments of the present disclosure. Following
Some embodiments relate to a method for routing memory access requests consistent with embodiments of the present disclosure. The method is applicable to a clustered memory node controller scheme as described above with reference to figure 3. In general, in a clustering scheme there may be a maximum of N cores or computing devices for each of the K MNCs, and the N cores will be clustered such that the best route is to the local memory node. Each memory request originating from these cores goes directly to the nearest MNC, so a request received by an MNC to access a system address comes from a core in the MNC's local cluster. A channel to the appropriate data resource holding the requested page is determined at block 620. If the request is for a page that is statically assigned to the MNC, or for a page that is dynamically shared from another MNC, the response may be returned immediately to the core. However, if another MNC owns the memory (e.g., as determined by the coherence protocol), the request requires one additional network hop before it is satisfied. When operation of the system is initiated, each MNC is assigned a region of the overall system addressable space (while providing for re-partitioning to facilitate hot-plugging). Each partition assigned to a memory node is then divided into pages. The advantage of this system is that locality is implicit in the MNC at which a memory request arrives: the computing device of the access node is known (or at least the cluster of access nodes is known) without additional data. Using this information, the MNC can migrate data pages within the memory network or check them out from the owner of the partition if the compute pattern warrants it.
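Under the assumptions in this paragraph, the cost of serving a request can be summarized in a small helper (names hypothetical): the local MNC serves addresses it owns or has dynamically shared, and otherwise forwards once to the owning MNC.

```python
def hops_to_service(local_mnc: str, owner_mnc: str,
                    shared_with_local: bool = False) -> int:
    """Network hops for a memory request, per the clustered routing above:
    1 if the local MNC can serve it, 2 if it must forward to the owner."""
    if owner_mnc == local_mnc or shared_with_local:
        return 1      # served directly by the cluster's local MNC
    return 2          # one extra hop to the owning MNC
```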
An MNC may allocate memory from the system address space of another memory node controller for use within its system address translation table, to redirect one system address to another. For example, for a defragmentation operation, a first MNC may allocate memory in the system address space partition of a second MNC; the first MNC then records these pages as checked out from the second MNC. The first MNC keeps the physical memory backing of the pages unchanged. Once the address range allocated from the second MNC is entered in the appropriate table, the offset within the range table entry may be changed to point to the new system address range. The previously used system address range is then free. The new system address range from the second MNC, and the pages making up that range, are now free to migrate independently according to the coherency protocol, metadata or scheduling algorithms.
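Because a range-table entry translates by an offset, the retargeting step described above reduces to a single offset update; here is a hedged sketch, with field names assumed:

```python
def retarget_range(entry: dict, new_system_base: int) -> dict:
    """Point a range-table entry at a new system-address range.
    Translation is assumed to be: system_addr = virt_addr + offset."""
    entry["offset"] = new_system_base - entry["virt_base"]
    return entry

# Hypothetical entry currently mapping virtual 0x4000_0000 -> system 0x7000_0000
e = {"virt_base": 0x4000_0000, "offset": 0x3000_0000}

# Defragmentation: retarget the range to a region allocated from another MNC.
retarget_range(e, 0x9000_0000)
```

After the update, the old system range is free while the physical backing of the pages is untouched.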
A system may be equipped with a system cache structure referred to as a data transfer cache (DDC). In this embodiment, system memory requests are sent to the DDC and the MNC simultaneously. If the MNC has registered the page as present in the DDC, the line is served from the DDC and the request is ignored by the MNC; otherwise, if the line is present in the MNC, the request is served by the MNC. It should be apparent that synchronization between MNCs can take the form of a directory or filter mechanism. Example embodiments implement a send/acknowledge scheme in which the DDC does not begin serving a page until the MNC has received an acknowledgement from the DDC that the page was installed. When the MNC decides to push a page to the DDC, the DDC receives, along with the page, any outstanding memory requests for it.
The various embodiments and examples of the present disclosure as presented herein are to be understood as illustrating, not limiting, the present disclosure and its scope.
Further specific and preferred aspects of the present disclosure are set out in the accompanying independent and dependent claims. Features of the dependent claims may be combined with features of the independent claims as appropriate and in combinations other than those explicitly set out in the claims.
One or more memory node controllers may be implemented in an integrated circuit. For example, a circuit may be defined as a set of instructions in a Hardware Description Language (HDL), which may be stored in a non-transitory computer readable medium. The instructions may be distributed via a computer readable medium or via other means, such as a wired or wireless network. The instructions may be used to control the fabrication or design of an integrated circuit, and may be combined with other instructions.
Although illustrative embodiments of the present invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.
It will be appreciated that the above-described apparatus, systems, and methods are set forth by way of example, and not by way of limitation. Absent explicit indication to the contrary, the disclosed steps may be modified, supplemented, omitted, and/or reordered without departing from the scope of the present disclosure. Many variations, additions, omissions, and other modifications will become apparent to those of ordinary skill in the art. Additionally, the order or presentation of method steps in the above description and the figures is not intended to require the performance of such order of recited steps, unless a particular order is explicitly required or otherwise clear from the context.
Unless a different meaning is explicitly provided or otherwise clear from the context, the method steps of the implementations described herein are intended to include any suitable method that causes such method steps to be performed, consistent with the patentability of the following claims.
Thus, while particular embodiments have been shown and described, it will be obvious to those skilled in the art that various changes and modifications in form and detail may be made without departing from the scope of the disclosure, and it is intended that such changes and modifications form part of the disclosure as defined by the following claims, which are to be interpreted in the broadest sense permitted by law.
Various representative embodiments that have been described in detail herein have been presented by way of example and not by way of limitation. It is understood that various changes in form and details of the described embodiments may be made to produce equivalent embodiments that remain within the scope of the appended claims.
Thus, some features of the disclosed embodiments are set forth in the following numbered items:
1. a memory node controller for a node of a data processing network having at least one computing device and at least one data resource, the node being configured to couple the at least one computing device with the at least one data resource and an element of the data processing network addressable via a system address space, the memory node controller comprising: a first interface to the at least one data resource, wherein each of the at least one data resource is addressed via a physical address space; a second interface to the at least one computing device; and a system address to physical address translator cache configured to translate a system address in the system address space to a physical address in a physical address space of a data resource of the at least one data resource.
2. The memory node controller of
3. The memory node controller of
4. The memory node controller of
5. The memory node controller of item 4, wherein the system address space comprises a plurality of address partitions, and wherein the memory node controller is associated with a first address partition of the plurality of address partitions and the another memory node controller is associated with a second address partition of the plurality of address partitions.
6. The memory node controller of
7. The memory node controller of
8. The memory node controller of
9. A non-transitory computer readable medium having instructions representing a hardware description language of the memory node controller system of
10. A non-transitory computer readable medium having a netlist representing a memory node controller according to
11. A data processing network, the data processing network comprising: a first memory node controller; a first plurality of addressable units addressed by a system address space and comprising a first plurality of data resources, each of the first plurality of data resources being coupled to the first memory node controller via a channel and addressed by a physical address space; and a first plurality of computing devices each coupled to the first memory node controller and configured to access the first plurality of addressable units via the first memory node controller, wherein the first memory node controller includes a system-to-physical address translator cache configured to translate a system address received from a computing device of the first plurality of computing devices to a physical address in an address space of a data resource of the first plurality of data resources.
12. The data processing network of item 11, further comprising: one or more second memory node controllers coupled to the first memory node controller; wherein the first memory node controller is assigned a first partition of system addresses in the system address space, wherein each of the one or more second memory node controllers is assigned a second partition of system addresses in the system address space, and wherein a computing device of the first plurality of computing devices comprises a range table that associates the first memory node controller with a system address in a first partition of the system address and associates each of the one or more second memory node controllers with a system address in a second partition of the corresponding system address, and configured to send a request to a memory node controller of the first and second memory node controllers associated with the system address to access memory at the system address.
13. The data processing network of item 12, further comprising: a second plurality of data resources each coupled to a second memory node controller of the one or more second memory node controllers via a channel and having a physical address space; and a second plurality of computing devices each coupled to a second memory node controller of the one or more second memory node controllers and configured to access the data processing network via the system address space, wherein the one or more second memory node controllers are configured to couple the second plurality of computing devices with the second plurality of data resources.
14. The data processing network of item 11, wherein the first plurality of addressable units further comprises a hardware accelerator.
15. The data processing network of item 11, wherein the first plurality of addressable units further comprises a network interface card.
16. A method for accessing one or more data resources by one or more computing devices in a data processing network, the method comprising: mapping elements of the data processing network to a system address space; assigning a first partition of the system address space to a first memory node controller of the data processing network, wherein the one or more computing devices and the one or more data resources are coupled to the first memory node controller; receiving, at the first memory node controller, a request to access an element of the data processing network at a system address in the system address space; and servicing, by the first memory node controller, the request when the system address is in a first partition of the system address space.
17. The method of item 16, further comprising: assigning a second partition of the system address space to a second memory node controller of the data processing network; and forwarding the request to the second memory node controller when the system address is in a second partition of the system address space.
18. The method of item 16, further comprising: assigning a second partition of the system address space to a second memory node controller of the data processing network; and servicing, by the first memory node controller, the request when the system address is in a second partition of the system address space and the system address is dynamically shared with the first memory node controller.
19. The method of item 16, wherein each of the one or more data resources is coupled to the first memory node controller via a channel, and wherein servicing the request by the first memory node controller comprises: identifying a channel to a data resource of the one or more data resources corresponding to the system address; translating the system address to a physical address in the data resource; and accessing the data resource at the physical address via the identified channel.
20. The method of item 16, wherein the first partition of the system address space comprises a first plurality of pages, the method further comprising: assigning a second partition of the system address space to a second memory node controller of the data processing network, wherein the second partition of the system address space comprises a second plurality of pages; monitoring access to the second plurality of pages by the one or more computing devices coupled to the first memory node controller; and migrating a page of the second plurality of pages from the second memory node controller to the first memory node controller based on the monitored access.
21. The method of item 20, further comprising: the coherency state of the migrated page is recorded.
22. The method of item 16, wherein the first partition of the system address space comprises a plurality of lines, and wherein the data processing network further comprises a data transfer cache, the method further comprising: monitoring system memory requests to the plurality of lines by the one or more computing devices coupled to the first memory node controller; servicing, by the first memory node controller, a system memory request when a requested line of the plurality of lines is not present in the data transfer cache; pushing the requested line from the first memory node controller to the data transfer cache of the data processing network in accordance with the monitored system memory requests; and servicing, by the data transfer cache, a system memory request when the requested line is present in the data transfer cache.
23. The method of item 16, further comprising: assigning a second partition of the system address space to a second memory node controller of the data processing network, wherein the one or more additional data resources are coupled to the second memory node controller; and the first memory node controller: allocating memory within an address range of a second partition of the system address space; entering the assigned address range in a system address translation table of the first memory node controller; and directing memory requests for addresses within the allocated address range to the second memory node controller.