Techniques for demoting cache lines to a shared cache

Document No.: 1598842 | Publication date: 2020-01-07

Reading note: This patent, "Techniques for demoting cache lines to a shared cache," was created on 2019-05-30 by E.塔米尔, B.理查森, N.鲍尔, A.坎宁安, D.亨特, K.德维, and C.韦. Abstract: Techniques for demoting cache lines to a shared cache include a computing device having at least one processor with a plurality of cores, a cache memory having a core local cache and a shared cache, and a cache line demotion device. A processor core of a processor of the computing device is configured to retrieve at least a portion of the data of a received network packet and move the data into one or more core local cache lines of the core local cache. The processor core is further configured to perform one or more processing operations on the data and, after the processing operations have been completed, transmit a cache line demotion command to the cache line demotion device. The cache line demotion device is configured to perform a cache line demotion operation to demote the data from the core local cache lines to shared cache lines of the shared cache. Other embodiments are described herein.

1. A computing device for demoting a cache line to a shared cache, the computing device comprising:

one or more processors, wherein each of the one or more processors comprises a plurality of processor cores;

a cache memory, wherein the cache memory comprises a core local cache and a shared cache, wherein the core local cache comprises a plurality of core local cache lines, and wherein the shared cache comprises a plurality of shared cache lines;

a cache line demotion device; and

a Host Fabric Interface (HFI) to receive network packets,

wherein a processor core of a processor of the one or more processors is to:

retrieve at least a portion of the data of the received network packet, wherein to retrieve the data comprises to move the data into one or more of the plurality of core local cache lines;

perform one or more processing operations on the data; and

transmit, after the one or more processing operations on the data have been completed, a cache line demotion command to the cache line demotion device, and

wherein the cache line demotion device is to perform a cache line demotion operation to demote the data from the one or more core local cache lines to one or more shared cache lines of the shared cache in response to having received the cache line demotion command.

2. The computing device of claim 1, wherein the processor core is further to determine whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein to transmit the cache line demotion command to the cache line demotion device comprises to transmit the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

3. The computing device of claim 2, wherein the processor core is further to transmit a cache line demotion instruction to a cache manager of the cache memory after having determined that the size of the received network packet is less than or equal to the packet size threshold, and wherein the cache manager is to demote data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction, wherein the cache line demotion instruction bypasses the cache line demotion device.

4. The computing device of claim 3, wherein to transmit a cache line demotion instruction comprises to transmit one or more cache line identifiers corresponding to the one or more shared cache lines.

5. The computing device of claim 1, wherein to perform a cache line demotion operation comprises to perform a read request or a direct memory access.

6. The computing device of claim 1, wherein the cache line demotion command comprises an indication of a core local cache line associated with the received network packet to be demoted to the shared cache.

7. The computing device of claim 1, wherein the cache line demotion device comprises one of a copy engine, a Direct Memory Access (DMA) device operable to copy data, or an offload device operable to perform read operations.

8. The computing device of claim 1, wherein to transmit a cache line demotion command comprises to transmit one or more cache line identifiers corresponding to the one or more shared cache lines.

9. A computing device for demoting a cache line to a shared cache, the computing device comprising:

means for retrieving, by a processor of the computing device, at least a portion of data of a network packet received by a Host Fabric Interface (HFI) of the computing device, wherein retrieving the data comprises moving the data into one or more core local cache lines of a plurality of core local cache lines of a core local cache of the computing device, and wherein the processor comprises a plurality of processor cores;

means for performing, by a processor core of the plurality of processor cores, one or more processing operations on the data;

means for transmitting, by the processor core and after the one or more processing operations on the data have been completed, a cache line demotion command to a cache line demotion device of the computing device; and

means for performing, by the cache line demotion device and in response to having received the cache line demotion command, a cache line demotion operation to demote data from the one or more core local cache lines to one or more shared cache lines of a shared cache of the computing device.

10. The computing device of claim 9, further comprising means for determining whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein the means for transmitting a cache line demotion command to a cache line demotion device comprises means for transmitting the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

11. The computing device of claim 10, further comprising means for transmitting a cache line demotion instruction to a cache manager of a cache memory including the core local cache and the shared cache after it has been determined that the size of the received network packet is less than or equal to the packet size threshold, and wherein the cache manager is to demote data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction.

12. The computing device of claim 11, wherein means for transmitting a cache line demotion instruction comprises means for transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

13. The computing device of claim 9, wherein means for performing a cache line demotion operation comprises means for performing a read request or a direct memory access.

14. The computing device of claim 9, wherein means for transmitting a cache line demotion command comprises means for transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

15. A method for demoting a cache line to a shared cache, the method comprising:

retrieving, by a processor of a computing device, at least a portion of data of a network packet received by a Host Fabric Interface (HFI) of the computing device, wherein retrieving the data comprises moving the data into one or more core local cache lines of a plurality of core local cache lines of a core local cache of the computing device, and wherein the processor comprises a plurality of processor cores;

performing, by a processor core of the plurality of processor cores, one or more processing operations on the data;

transmitting, by the processor core and after the one or more processing operations on the data have been completed, a cache line demotion command to a cache line demotion device of the computing device; and

performing, by the cache line demotion device and in response to having received the cache line demotion command, a cache line demotion operation to demote data from the one or more core local cache lines to one or more shared cache lines of a shared cache of the computing device.

16. The method of claim 15, further comprising determining whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein transmitting a cache line demotion command to a cache line demotion device comprises transmitting the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

17. The method of claim 16, further comprising:

transmitting, by the processor core and after having determined that the size of the received network packet is less than or equal to the packet size threshold, a cache line demotion instruction to a cache manager of a cache memory including the core local cache and the shared cache; and

demoting, by the cache manager, data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction.

18. The method of claim 17, wherein transmitting a cache line demotion instruction comprises transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

19. The method of claim 15, wherein performing a cache line demotion operation comprises performing one of a read request or a direct memory access.

20. The method of claim 15, wherein transmitting a cache line demotion command comprises transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

21. A computing device for demoting a cache line to a shared cache, the computing device comprising:

circuitry for retrieving, by a processor of a computing device, at least a portion of data of a network packet received by a Host Fabric Interface (HFI) of the computing device, wherein retrieving the data comprises moving the data into one or more core local cache lines of a plurality of core local cache lines of a core local cache of the computing device, and wherein the processor comprises a plurality of processor cores;

circuitry for performing, by a processor core of the plurality of processor cores, one or more processing operations on the data;

circuitry for transmitting, by the processor core and after the one or more processing operations on the data have been completed, a cache line demotion command to a cache line demotion device of the computing device; and

circuitry for performing, by the cache line demotion device and in response to having received the cache line demotion command, a cache line demotion operation to demote data from the one or more core local cache lines to one or more shared cache lines of a shared cache of the computing device.

22. The computing device of claim 21, further comprising circuitry to determine whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein to transmit a cache line demotion command to a cache line demotion device comprises to transmit the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

23. The computing device of claim 22, further comprising:

circuitry for transmitting, by the processor core and after having determined that the size of the received network packet is less than or equal to the packet size threshold, a cache line demotion instruction to a cache manager of a cache memory comprising the core local cache and the shared cache; and

circuitry for demoting, by the cache manager, data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction.

24. The computing device of claim 23, wherein to transmit a cache line demotion instruction comprises to transmit one or more cache line identifiers corresponding to the one or more shared cache lines.

25. The computing device of claim 21, wherein circuitry for performing a cache line demotion operation comprises circuitry for performing one of a read request or a direct memory access.

Background

Modern computing devices have become ubiquitous tools for personal, business, and social uses. As such, many modern computing devices are capable of connecting to various data networks, including the Internet, to transmit and receive data communications at different rates over the various data networks. To facilitate communication between computing devices, data networks typically include one or more network computing devices (e.g., computing servers, storage servers, etc.) to route communications (e.g., north-south network traffic) entering/leaving the network and communications (e.g., east-west network traffic) between network computing devices in the network (e.g., via switches, routers, etc.). In current packet-switched network architectures, data is transmitted in the form of network packets between networked computing devices. At a high level, data is packetized into network packets at one computing device, and the resulting packets are transmitted over a network to another computing device via a transmitting device (e.g., a Network Interface Controller (NIC) of the computing device).

Upon receipt of a network packet, the computing device typically performs one or more processing operations (e.g., security, Network Address Translation (NAT), load balancing, Deep Packet Inspection (DPI), Transmission Control Protocol (TCP) optimization, caching, Internet Protocol (IP) management, etc.) to determine what the computing device will do with the network packet (e.g., drop the network packet, process/store at least a portion of the network packet, forward the network packet, etc.). To do so, such packet processing is often performed in a packet processing pipeline (e.g., a service function chain) in which at least a portion of the data of a network packet is passed from one processor core to another as it is processed. However, during such packet processing, stalls may occur due to cross-core snoops, and cache pollution with stale data may be a problem.
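The packet processing pipeline described above can be sketched as an ordered chain of stages, each of which would run on a different processor core in the scenario the patent addresses. The stage names, the drop policy, and the header-rewrite stand-in below are illustrative assumptions, not details taken from this document:

```c
#include <stddef.h>
#include <stdint.h>

/* A packet flows through an ordered chain of stages (a service function
 * chain); in the scenario above each stage runs on a different core. */
struct packet {
    uint8_t data[64];
    size_t len;
    int dropped;
};

typedef void (*stage_fn)(struct packet *);

/* Illustrative security stage: drop empty packets. */
static void stage_firewall(struct packet *p)
{
    if (p->len == 0)
        p->dropped = 1;
}

/* Illustrative NAT stage: stand-in for rewriting a header field. */
static void stage_nat(struct packet *p)
{
    if (!p->dropped)
        p->data[0] ^= 0xFFu;
}

/* Run the chain; returns the number of stages applied before a drop. */
static size_t run_pipeline(struct packet *p, const stage_fn *stages, size_t n)
{
    size_t applied = 0;
    for (size_t i = 0; i < n && !p->dropped; i++) {
        stages[i](p);
        applied++;
    }
    return applied;
}
```

In a real deployment each stage would execute on its own core, which is exactly where the cross-core snoop stalls described above arise as packet data migrates between core local caches.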

Drawings

In the accompanying drawings, the concepts described herein are illustrated by way of example and not by way of limitation. For simplicity and clarity of illustration, elements illustrated in the figures have not necessarily been drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified block diagram of at least one embodiment of a system for demoting a cache line to a shared cache, the system comprising a source computing device and a network computing device communicatively coupled via a network;

FIG. 2 is a simplified block diagram of at least one embodiment of an environment of a network computing device of the system of FIG. 1;

FIG. 3 is a simplified flow diagram of at least one embodiment of a method for demoting a cache line to a shared cache, which may be performed by the network computing device of FIGS. 1 and 2; and

FIGS. 4 and 5 are simplified block diagrams of at least one embodiment of another environment of the network computing device of FIGS. 1 and 2 for demoting a cache line to a shared cache.

Detailed Description

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to "one embodiment," "an illustrative embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described. Additionally, it should be appreciated that an item included in a list in the form of "at least one of A, B, and C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, an item listed in the form of "at least one of A, B, or C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disk, or other media device).

In the drawings, some structural or methodical features may be shown in a particular arrangement and/or ordering. However, it should be appreciated that such a particular arrangement and/or ordering may not be required. More specifically, in some embodiments, such features may be arranged in a manner and/or order different from that shown in the illustrative figures. Moreover, the inclusion of structural or methodical features in a particular figure is not intended to imply that such features are required in all embodiments and may not be included or may be combined with other features in some embodiments.

Referring now to FIG. 1, in an illustrative embodiment, a system 100 for demoting a cache line to a shared cache includes a source computing device 102 communicatively coupled to a network computing device 106 via a network 104. Although illustratively shown as having a single source computing device 102 and a single network computing device 106, in other embodiments the system 100 may include multiple source computing devices 102 and multiple network computing devices 106. It should be appreciated that, for purposes of clarity of the description, the source computing device 102 and the network computing device 106 have illustratively been designated herein as the "source" and the "destination," respectively, and that the source computing device 102 and/or the network computing device 106 may be capable of performing any of the functions described herein. It should be further appreciated that the source computing device 102 and the network computing device 106 may reside in the same data center or High Performance Computing (HPC) environment. In other words, the source computing device 102 and the network computing device 106 may reside in the same network 104, connected via one or more interconnects.

In use, the source computing device 102 and the network computing device 106 transmit network traffic (e.g., network packets, frames, etc.) to, and receive network traffic from, each other. For example, the network computing device 106 may receive a network packet from the source computing device 102. Upon receipt of a network packet, the network computing device 106, or more specifically a Host Fabric Interface (HFI) 126 of the network computing device 106, identifies one or more processing operations to be performed on at least a portion of the network packet and performs a certain level of processing thereon. To do so, a processor core 110 requests access to data that may have been previously stored or moved into the shared cache memory (typically cached on or near the processor). The network computing device 106 is configured to move the requested data into a core local cache (e.g., the core local cache 114) so that the requesting processor core 110 can access the data more quickly.

Oftentimes, more than one processing operation (e.g., security, Network Address Translation (NAT), load balancing, Deep Packet Inspection (DPI), Transmission Control Protocol (TCP) optimization, caching, Internet Protocol (IP) management, etc.) is performed by the network computing device, with each operation typically being performed by a different processor core in a packet processing pipeline, such as a service function chain. Thus, at the completion of processing, the data accessed by one processor core needs to be released (e.g., demoted to the shared cache 116) so that the next processor core can perform its designated processing operation.
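The per-core release step described above can be sketched in C. This document does not name a specific instruction; the `_cldemote` intrinsic (Intel's CLDEMOTE, a hint that moves a cache line toward a more distant shared cache) is used here as one concrete user-level mechanism, under that assumption, with a no-op fallback when the toolchain does not enable it:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__CLDEMOTE__)
#include <immintrin.h>   /* _cldemote, requires -mcldemote */
#endif

#define CACHE_LINE_SIZE 64u   /* assumption: 64-byte lines */

/* Hint that every cache line covering [buf, buf + len) be demoted
 * toward the shared cache. Returns the number of lines hinted.
 * Without CLDEMOTE support the loop merely counts the lines. */
static size_t demote_buffer(const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    size_t lines = 0;
    for (uintptr_t p = start; p < end; p += CACHE_LINE_SIZE) {
#if defined(__CLDEMOTE__)
        _cldemote((const void *)p);   /* demote this line; a hint, never faults */
#endif
        lines++;
    }
    return lines;
}
```

A core in the pipeline would call `demote_buffer(pkt, pkt_len)` once its processing operation completes, so the next core finds the packet data in the shared cache instead of snooping the previous core's local cache.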

To do so, the network computing device 106 is configured to, based on the size of the network packet, either transmit an instruction to the cache manager to demote the applicable cache line(s) from the core local cache 114 to the shared cache 116, or transmit a command to an offload device (see, e.g., the cache line demotion device 130) to trigger a cache line demotion operation, performed by the offload device, that demotes the cache line(s) from the core local cache 114 to the shared cache 116. In other words, once processing has completed, each processor core demotes the applicable packet cache lines to the shared cache 116, which allows better cache reuse on the first processing core and avoids cross-core snoops by the second processing core in the packet processing pipeline (e.g., when modifying data) or the input/output (I/O) pipeline. Thus, unlike current techniques, stalls due to cross-core snoops and cache pollution can be effectively avoided. Further, also unlike current techniques, the costs attributable to requests for ownership when the requested data is not in a shared cache, or is otherwise inaccessible to the requesting processor core, may be avoided.
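The size-based choice between the two demotion paths, synchronous demotion via the cache manager for small packets versus handing larger packets to the offload device, can be expressed as a small policy function. The threshold value and the enum names below are illustrative assumptions; the patent leaves the threshold unspecified:

```c
#include <stddef.h>

/* Hypothetical policy sketch: small packets are demoted synchronously
 * via a cache-manager instruction; larger packets are handed to the
 * cache line demotion (offload) device, hiding the demotion latency. */
enum demotion_path {
    DEMOTE_VIA_CACHE_MANAGER,    /* packet size <= threshold */
    DEMOTE_VIA_OFFLOAD_DEVICE    /* packet size >  threshold */
};

static enum demotion_path select_demotion_path(size_t packet_size,
                                               size_t packet_size_threshold)
{
    return packet_size > packet_size_threshold
               ? DEMOTE_VIA_OFFLOAD_DEVICE
               : DEMOTE_VIA_CACHE_MANAGER;
}
```

This mirrors claims 2 and 3: the demotion command goes to the demotion device only when the packet exceeds the threshold; otherwise the demotion instruction bypasses the device entirely.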

The network computing device 106 may be embodied as any type of computing or computer device capable of performing the functions described herein, including but not limited to a computer, a server (e.g., standalone, rack-mounted, blade, etc.), a sled (e.g., computing sled, accelerator sled, storage sled, memory sled, etc.), an enhanced or intelligent Network Interface Controller (NIC)/HFI, a network appliance (e.g., physical or virtual), a router, a switch (e.g., a disaggregated switch, a rack-mounted switch, a standalone switch, a fully managed switch, a partially managed switch, a full-duplex switch, and/or a switch that supports a half-duplex communication mode), a web appliance, a distributed computing system, a processor-based system, and/or a multi-processor system.

As shown in FIG. 1, the illustrative network computing device 106 includes one or more processors 108, memory 118, an I/O subsystem 120, one or more data storage devices 122, communication circuitry 124, a cache line demotion device 130, and, in some embodiments, one or more peripheral devices 128. It should be appreciated that in other embodiments the network computing device 106 may include other or additional components, such as those commonly found in typical computing devices (e.g., various input/output devices and/or other components). Further, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component.

The processor(s) 108 may be embodied as any type of device or collection of devices capable of performing the various compute functions described herein. In some embodiments, the processor(s) 108 may be embodied as one or more multi-core processors, Digital Signal Processors (DSPs), microcontrollers, or other processor(s) or processing/control circuit(s). In some embodiments, the processor(s) 108 may be embodied as, include, or otherwise be coupled to an integrated circuit, an embedded system, a Field Programmable Gate Array (FPGA) (e.g., reconfigurable circuitry), a System-on-a-Chip (SoC), an Application Specific Integrated Circuit (ASIC), reconfigurable hardware or hardware circuitry, or other specialized hardware that facilitates performance of the functions described herein.

The illustrative processor(s) 108 include a plurality of processor cores 110 (e.g., two processor cores, four processor cores, eight processor cores, sixteen processor cores, etc.) and a cache memory 112. Each of the processor cores 110 may be embodied as a separate logical execution unit capable of executing programmed instructions. It should be appreciated that in some embodiments, the network computing device 106 (e.g., in a supercomputer embodiment) may include thousands of processor cores. Each of the processor(s) 108 may connect to a physical connector or socket on a motherboard (not shown) of the network computing device 106 that is configured to accept a single physical processor package (i.e., a multi-core physical integrated circuit). In addition, each of the processor cores 110 is communicatively coupled to at least a portion of the cache memory 112 and functional units that may be used to independently execute programs, operations, threads, and the like. It should be appreciated that the processor(s) 108 as described herein are not limited to being on the same die or socket.

The cache memory 112 may be embodied as any type of cache that the processor 108 can access more quickly than the memory 118 (i.e., main memory), such as an on-die cache or an on-processor cache. In other embodiments, the cache memory 112 may be an off-die cache that resides on the same System-on-a-Chip (SoC) as the processor 108. The illustrative cache memory 112 has a multi-level cache architecture embodied as a core local cache 114 and a shared cache 116. The core local cache 114 may be embodied as a cache memory dedicated to a particular one of the processor cores 110. Accordingly, while illustratively shown as a single core local cache 114, it should be appreciated that in some embodiments there may be at least one core local cache 114 for each processor core 110.

The shared cache 116 may be embodied as a cache memory that is typically larger than the core local cache 114 and is shared by all of the processor cores 110 of the processor 108. For example, in the illustrative embodiment, the core local cache 114 may be embodied as a level 1 (L1) cache and a level 2 (L2) cache, while the shared cache 116 may be embodied as a level 3 (L3) cache. In such embodiments, it should be appreciated that the L1 cache may be embodied as any type of memory local to the processor core 110, commonly referred to as the "primary cache," which is the fastest memory closest to the processor 108. It should be further appreciated that, in such embodiments, the L2 cache may be embodied as any type of memory local to the processor core 110, commonly referred to as the "mid-level cache," which feeds the L1 cache and is larger and slower than the L1 cache but typically smaller and faster than the L3/shared cache 116 (i.e., the Last Level Cache (LLC)). In other embodiments, the multi-level cache architecture may include additional and/or alternative levels of cache memory. Although not illustratively shown in FIG. 1, it should be further appreciated that the cache memory 112 includes a cache manager (see, e.g., the cache manager 214 of FIG. 2), which may be embodied as a controller circuit or other logic that serves as an interface between the processor 108 and the memory 118.

The memory 118 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 118 may store various data and software used during operation of the network computing device 106, such as operating systems, applications, programs, libraries, and drivers. It should be appreciated that the memory 118 can be referred to as a main memory (i.e., primary memory). Volatile memory can be a storage medium that requires power to maintain the state of data stored by the medium. Non-limiting examples of volatile memory may include various types of Random Access Memory (RAM), such as Dynamic Random Access Memory (DRAM) or Static Random Access Memory (SRAM).

Each of the processor(s) 108 and the memory 118 is communicatively coupled to other components of the network computing device 106 via an I/O subsystem 120, which I/O subsystem 120 may be embodied as circuitry and/or components that facilitate input/output operations with the processor(s) 108, the memory 118, and other components of the network computing device 106. For example, the I/O subsystem 120 may be embodied as, or otherwise include, a memory controller hub, an input/output control hub, an integrated sensor hub, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems that facilitate the input/output operations. In some embodiments, the I/O subsystem 120 may form a portion of an SoC and be incorporated, along with one or more of the processors 108, the memory 118, and other components of the network computing device 106, on a single integrated circuit chip.

The one or more data storage devices 122 may be embodied as any type of storage device(s) configured for short-term or long-term storage of data, such as, for example, memory devices and circuits, memory cards, hard drives, solid-state drives, or other data storage devices. Each data storage device 122 may include a system partition that stores data and firmware code for the data storage device 122. Each data storage device 122 may also include an operating system partition that stores data files and executable files for the operating system.

The communication circuitry 124 may be embodied as any communication circuit, device, or collection thereof capable of enabling communications between the network computing device 106 and other computing devices, such as the source computing device 102, as well as any network communication enabling devices, such as access points, switches, routers, etc., to allow communication over the network 104. Accordingly, the communication circuitry 124 may be configured to use any one or more communication technologies (e.g., wireless or wired communication technologies) and associated protocols (e.g., Ethernet, Bluetooth®, Wi-Fi®, WiMAX, LTE, 5G, etc.) to effect such communication.

It should be appreciated that in some embodiments, the communication circuitry 124 may comprise dedicated circuitry, hardware, or a combination thereof that executes pipeline logic (e.g., hardware algorithms) for performing the functions described herein, including processing network packets (e.g., parsing received network packets, determining a destination computing device for each received network packet, forwarding network packets to a particular buffer queue of a respective host buffer of the network computing device 106, etc.), performing computing functions, and so forth.

In some embodiments, execution of one or more of the functions of the communication circuitry 124 as described herein may be performed by dedicated circuitry, hardware, or a combination thereof of the communication circuitry 124, which may be embodied as a SoC or otherwise form part of a SoC of the network computing device 106 (e.g., incorporated on a single integrated circuit chip along with the processor 108, the memory 118, and/or other components of the network computing device 106). Alternatively, in some embodiments, the dedicated circuitry, hardware, or a combination thereof may be embodied as one or more discrete processing units of the network computing device 106, each of which may be capable of performing one or more of the functions described herein.

The illustrative communication circuitry 124 includes an HFI 126, which HFI 126 may be embodied as one or more add-in boards, daughter boards, network interface cards, controller chips, chipsets, or other devices that may be used by the network computing device 106 to connect with another computing device (e.g., the source computing device 102). In some embodiments, the HFI 126 may be embodied as part of a SoC that includes one or more processors, or included on a multi-chip package that also contains one or more processors. In some embodiments, the HFI 126 may include a local processor (not shown) and/or local memory (not shown), both of which are local to the HFI 126. In such embodiments, the local processor of the HFI 126 may be capable of performing one or more of the functions of the processor 108 described herein. Additionally or alternatively, in such embodiments, the local memory of the HFI 126 may be integrated into one or more components of the network computing device 106 at the board level, socket level, chip level, and/or other levels.

The one or more peripheral devices 128 may include any type of device that may be used to input information into the network computing device 106 and/or receive information from the network computing device 106. The peripheral devices 128 may be embodied as any auxiliary device usable to input information into the network computing device 106, such as a keyboard, mouse, microphone, bar code reader, image scanner, or the like, or any auxiliary device usable to output information from the network computing device 106, such as a display, speakers, graphics circuitry, printer, projector, or the like. It is to be appreciated that in some embodiments, one or more of the peripherals 128 can function as both an input device and an output device (e.g., a touch screen display, a digitizer above a display screen, etc.). It should also be appreciated that the type of peripheral device 128 connected to the network computing device 106 may depend on, for example, the type and/or intended use of the network computing device 106. Additionally or alternatively, in some embodiments, the peripheral device 128 may include one or more ports, such as a USB port, for example, for connecting an external peripheral device to the network computing device 106.

The cache line destaging device 130 may be embodied as any type of firmware, software, and/or hardware device operable to initiate a cache line destage from the core local cache 114 to the shared cache 116. In some embodiments, the cache line destaging device 130 may be embodied as, but not limited to, a copy engine, a Direct Memory Access (DMA) device that may be used to copy data, a device that supports offload reads, and the like. It should be appreciated that the cache line destaging device 130 may be any type of device capable of performing a read, or an operation that mimics a read, of the data, so long as the cache line associated with the data is demoted to the shared cache 116 as a side effect when the device interacts with or otherwise requests access to the data.

The source computing device 102 may be embodied as any type of computing or computer device capable of performing the functions described herein, including but not limited to a smartphone, mobile computing device, tablet computer, laptop computer, notebook computer, server (e.g., standalone, rack-mounted, blade, etc.), sled (e.g., computing sled, accelerator sled, storage sled, memory sled, etc.), network appliance (e.g., physical or virtual), web appliance, distributed computing system, processor-based system, and/or multiprocessor system. Although not illustratively shown, it is to be appreciated that the source computing device 102 includes similar and/or identical components to those of the illustrative network computing device 106. As such, for clarity of description, the figures and descriptions of the same components are not repeated herein, with the understanding that the description above regarding corresponding components provided by the network computing device 106 applies equally to corresponding components of the source computing device 102. Of course, it should be appreciated that the computing device may include additional and/or alternative components depending on the embodiment.

The network 104 may be embodied as any type of wired or wireless communication network, including, but not limited to, a Wireless Local Area Network (WLAN), a Wireless Personal Area Network (WPAN), an edge network (e.g., a multiple access edge computing (MEC) network), a fog network, a cellular network (e.g., Global System for Mobile Communications (GSM), Long Term Evolution (LTE), 5G, etc.), a telephone network, a Digital Subscriber Line (DSL) network, a cable network, a Local Area Network (LAN), a Wide Area Network (WAN), a global network (e.g., the Internet), or any combination thereof. It should be appreciated that in such embodiments, the network 104 may act as a centralized network, and in some embodiments may be communicatively coupled to another network (e.g., the Internet). Thus, the network 104 may include a variety of other virtual and/or physical network computing devices (e.g., routers, switches, network hubs, servers, storage devices, computing devices, etc.) as needed to facilitate communications between the network computing device 106 and the source computing device 102, which are not shown to maintain clarity of description.

Referring now to FIG. 2, in use, the network computing device 106 establishes an environment 200 during operation. The illustrative environment 200 includes the processor(s) 108, the HFI 126, and the cache line destaging device 130 of FIG. 1, as well as a cache manager 214 and a destage manager 220. The illustrative HFI 126 includes a network traffic ingress/egress manager 208, the illustrative cache line destaging device 130 includes an interface manager 210, and the illustrative processor(s) 108 include a packet processing operation manager 212. The various components of the environment 200 may be embodied as hardware, firmware, software, or a combination thereof. As such, in some embodiments, one or more of the components of the environment 200 may be embodied as a set of circuits or electronic devices (e.g., network traffic ingress/egress management circuitry 208, destaging device interface management circuitry 210, packet processing operation management circuitry 212, cache management circuitry 214, destage management circuitry 220, etc.).

As illustratively shown, the network traffic ingress/egress management circuitry 208, the destaging device interface management circuitry 210, the packet processing operation management circuitry 212, the cache management circuitry 214, and the destage management circuitry 220 form part of particular components of the network computing device 106. However, while illustratively shown as being performed by particular components of the network computing device 106, it should be appreciated that in other embodiments, one or more of the functions described herein as being performed by the network traffic ingress/egress management circuitry 208, the destaging device interface management circuitry 210, the packet processing operation management circuitry 212, the cache management circuitry 214, and/or the destage management circuitry 220 may be performed, at least in part, by one or more other components of the network computing device 106.

Further, in some embodiments, one or more of the illustrative components may form a portion of another component, and/or one or more of the illustrative components may be independent of each other. Additionally, in some embodiments, one or more of the components of the environment 200 may be embodied as virtualized hardware components or emulated architectures, which may be established and maintained by the HFI 126, the processor(s) 108, or other components of the network computing device 106. It should be appreciated that the network computing device 106 may include other components, subcomponents, modules, sub-modules, logic, sub-logic, and/or devices common in computing devices that are not illustrated in FIG. 2 for clarity of description.

In the illustrative environment 200, the network computing device 106 additionally includes cache line address data 202, destage data 204, and network packet data 206, each of which may be accessed by various components and/or subcomponents of the network computing device 106. Further, it should be appreciated that, in some embodiments, data stored in or otherwise represented by each of the cache line address data 202, the destage data 204, and the network packet data 206 may not be mutually exclusive with respect to one another. For example, in some embodiments, data stored in the cache line address data 202 may also be stored as part of one or more of the destage data 204 and/or the network packet data 206, or in another alternative arrangement. As such, while the various data utilized by the network computing device 106 is described herein as particular discrete data, in other embodiments such data may be combined, aggregated, and/or otherwise formed into portions of a single or multiple data sets, including duplicate copies.

The network traffic ingress/egress manager 208, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or combinations thereof, as discussed above, is configured to receive inbound network traffic and route/transmit outbound network traffic. To do so, the illustrative network traffic ingress/egress manager 208 is configured to facilitate inbound network communications (e.g., network traffic, network packets, network flows, etc.) to the network computing device 106 (e.g., from the source computing device 102). Thus, the network traffic ingress/egress manager 208 is configured to manage (e.g., create, modify, delete, etc.) connections to physical and virtual network ports (i.e., virtual network interfaces) of the network computing device 106 (e.g., via the communication circuitry 124), as well as ingress buffers/queues associated therewith.

Further, the network traffic ingress/egress manager 208 is configured to facilitate outbound network communications (e.g., network traffic, network packet streams, network flows, etc.) from the network computing devices 106. To do so, the network traffic ingress/egress manager 208 is configured to manage (e.g., create, modify, delete, etc.) connections to the physical and virtual network ports/interfaces of the network computing device 106 (e.g., via the communication circuitry 124), as well as egress buffers/queues associated therewith. In some embodiments, at least a portion of the network packet (e.g., at least a portion of a header of the network packet, at least a portion of a payload of the network packet, a checksum, etc.) may be stored in the network packet data 206.

The destaging device interface manager 210, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or combinations thereof, as discussed above, is configured to manage the interface of the cache line destaging device 130. For example, the destaging device interface manager 210 is configured to receive a cache line destage command from the processor(s) 108 that can be used to identify the cache line(s) to be destaged from the core local cache 114 to the shared cache 116. Further, the destaging device interface manager 210 is configured to perform some operation (e.g., a read request) to destage one or more cache lines from the core local cache 114 to the shared cache 116 in response to having received a cache line destage command. It should be appreciated that the cache line destage command includes an identifier for each cache line to be destaged from the core local cache 114 to the shared cache 116, and each identifier may be used by the cache line destaging device 130 to destage (e.g., copy, evict, etc.) the applicable cache line(s).
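A command of this kind can be modeled in C as a small structure carrying one identifier per line to be destaged, together with a handler whose "read" of each identified line is the side effect that moves it from the core-local cache to the shared cache. The names (`destage_cmd`, `handle_destage_cmd`, the flat-array cache model) are illustrative assumptions for this sketch, not the patent's actual interface.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DESTAGE_LINES 16

/* Hypothetical command format: one identifier per core-local cache
 * line to be destaged to the shared cache. */
struct destage_cmd {
    size_t    num_lines;
    uintptr_t line_id[MAX_DESTAGE_LINES];
};

/* Model of the destaging device. The two caches are modeled as flat
 * arrays of line identifiers, with 0 marking an empty slot; accessing
 * an identified line evicts it from the core-local array and installs
 * it in the first free shared-cache slot. Returns the number of lines
 * actually moved. */
static size_t handle_destage_cmd(const struct destage_cmd *cmd,
                                 uintptr_t *core_local, size_t core_len,
                                 uintptr_t *shared, size_t shared_len)
{
    size_t moved = 0;
    for (size_t i = 0; i < cmd->num_lines; i++) {
        for (size_t j = 0; j < core_len; j++) {
            if (core_local[j] != cmd->line_id[i])
                continue;
            /* Evict from the core-local cache... */
            core_local[j] = 0;
            /* ...and install in the first free shared-cache slot. */
            for (size_t k = 0; k < shared_len; k++) {
                if (shared[k] == 0) {
                    shared[k] = cmd->line_id[i];
                    moved++;
                    break;
                }
            }
            break;
        }
    }
    return moved;
}
```

In a real device the move would be a coherence-protocol side effect of the read request rather than an explicit copy loop; the model only preserves the observable outcome (identified lines end up in the shared cache).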

The packet processing operation manager 212, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or combinations thereof, as discussed above, is configured to identify which packet processing operations are to be performed on at least a portion of the data of the received network packet (e.g., a header field of the network packet, a portion of the payload of the network packet, etc.) and the associated processor core 110 that is to perform each packet processing operation. Further, in some embodiments, the packet processing operation manager 212 may be configured to identify when each packet processing operation has completed and provide an indication of the completion (e.g., to the destage manager 220). It should be appreciated that although described herein as being performed by an associated processor core 110, one or more of the packet processing operations may be performed by any type of computing device/logic (e.g., accelerator device/logic) that may need to access the cache memory 112.

The cache manager 214, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or combinations thereof, as discussed above, is configured to manage the cache memory 112 (e.g., the core local cache 114 and the shared cache 116). To do so, the cache manager 214 is configured to manage the addition and eviction of entries into and out of the cache memory 112. Thus, the cache manager 214, which may be embodied as or otherwise include a memory management unit, is further configured to record the results of virtual to physical address translations. In such embodiments, the translation results may be stored in the cache line address data 202. The cache manager 214 is further configured to facilitate fetching data from and storing data to main memory, destaging data from the applicable core local cache 114 to the shared cache 116, and promoting data from the shared cache 116 to the applicable core local cache 114.
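The per-line addresses tracked in the cache line address data 202 can be derived by masking a byte address down to its line boundary, and the number of lines a packet buffer occupies (hence how many would later need destaging) follows from the first and last line addresses. The 64-byte line size below is an assumption typical of modern x86 parts; the text does not specify one.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumed line size, in bytes */

/* Address of the cache line containing byte address addr. */
static uintptr_t cache_line_addr(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
}

/* Number of cache lines spanned by a buffer of len bytes at addr,
 * e.g. the lines a received network packet's data occupies in the
 * core local cache 114. */
static size_t cache_lines_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = cache_line_addr(addr);
    uintptr_t last  = cache_line_addr(addr + len - 1);
    return (size_t)((last - first) / CACHE_LINE_SIZE) + 1;
}
```

Note the `+ len - 1` in the last-line computation: a buffer that merely touches the first byte of a line still occupies that whole line.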

The destage manager 220, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or combinations thereof, as discussed above, is configured to manage destaging of data from the core local caches 114 to the shared cache 116. To do so, the destage manager 220 is configured to transmit instructions to a cache memory manager (e.g., cache manager 214) to destage (e.g., copy, evict, etc.) processed data from the core local cache 114 to the shared cache 116, or to transmit commands to the cache line destaging device 130 to destage processed data from the core local cache 114 to the shared cache 116. To determine whether to send a cache line destage instruction to the cache manager 214 or a cache line destage command to the cache line destage device 130, the destage manager 220 is further configured to compare the size of the network packet to a predetermined packet size threshold.

If the destage manager 220 determines that the network packet size is greater than the packet size threshold, the destage manager 220 is configured to transmit a cache line destage command to the cache line destaging device 130. Otherwise, if the destage manager 220 determines that the network packet size is less than or equal to the packet size threshold, the destage manager 220 is configured to transmit a cache line destage instruction to the cache manager 214. Further, the destage manager 220 is configured to include in the cache line destage instruction/command an identifier for each cache line or series of cache lines to be destaged from the core local cache 114 to the shared cache 116. As illustratively shown, the destage manager 220 may be embodied as an offload device; however, in some embodiments, the functions described herein may be performed by a portion of the processor 108 or the processor cores 110, or the destage manager 220 may otherwise form a portion of the processor 108 or the processor cores 110. It should be appreciated that in such a case where the next cache location is known in advance, the destage manager 220 may be configured to move the data to a known core local cache entry of the core local cache associated with the next processor core in the packet processing pipeline.
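The size-based routing performed by the destage manager 220 (per blocks 316-324 of method 300, larger packets are offloaded to the destaging device while smaller ones are destaged via instructions) reduces to a single comparison. The enum names and the example threshold are illustrative assumptions for this sketch, not the patent's interface.

```c
#include <stddef.h>

enum destage_route {
    DESTAGE_VIA_INSTRUCTION, /* destage instruction to the cache manager 214 */
    DESTAGE_VIA_DEVICE       /* destage command to the destaging device 130 */
};

/* Packets at or below the threshold are destaged via cache line
 * destage instructions; larger packets are offloaded to the cache
 * line destaging device so the processor core does not spend cycles
 * destaging many lines one by one. */
static enum destage_route pick_destage_route(size_t packet_size,
                                             size_t size_threshold)
{
    return (packet_size > size_threshold) ? DESTAGE_VIA_DEVICE
                                          : DESTAGE_VIA_INSTRUCTION;
}
```

The threshold would in practice be tuned to the break-even point at which the fixed cost of issuing one command to the device undercuts the per-line cost of the instruction path.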

Referring now to FIG. 3, a method 300 for demoting a cache line to a shared cache is shown, which may be performed by a computing device (e.g., the network computing device 106 of FIGS. 1 and 2). The method 300 begins at block 302, in which the network computing device 106 determines whether to process a network packet (e.g., whether the processor 108 has polled the HFI 126 for the next packet to process). If so, the method 300 proceeds to block 304, in which the network computing device 106 identifies one or more packet processing operations to be performed by the processor core 110 on at least a portion of the network packet. In block 306, the network computing device 106, or more particularly the requesting processor core 110, performs the identified packet processing operation(s) on the applicable portion of the network packet to be processed. It should be appreciated that although described herein as being performed by the requesting processor core 110, one or more of the packet processing operations may be performed by any type of computing device/logic (e.g., accelerator device/logic) that may need to access the cache memory 112.

In block 308, the network computing device 106 determines whether the requesting processor core 110 or the applicable computing device/logic has completed the identified packet processing operation(s), such as may be indicated by the requesting processor core 110. If so, the method 300 proceeds to block 310, where the network computing device 106 determines which cache line or lines in the core local cache 114 are associated with the processed network packet. Further, in block 312, the network computing device 106 identifies a size of the network packet. In block 314, the network computing device 106 compares the identified network packet size to a packet size threshold. In block 316, the network computing device 106 determines whether the identified network packet size is greater than a packet size threshold.

If the network computing device 106 determines that the identified network packet size is less than or equal to the packet size threshold, the method 300 branches to block 318, in which the network computing device 106 transmits a cache line destage instruction to the cache manager 214 to destage one or more cache lines associated with the processed network packet from the core local cache 114 to the shared cache 116. Further, in block 320, the network computing device 106 includes, in the cache line destage instruction, a cache line identifier for each of the determined cache lines of the core local cache 114. Referring back to block 316, if the network computing device 106 determines that the network packet size is greater than the packet size threshold, the method 300 branches to block 322, in which the network computing device 106 transmits a cache line destage command to the cache line destaging device 130 to trigger a cache line destage operation to destage one or more cache lines associated with the processed network packet from the core local cache 114 to the shared cache 116. Further, in block 324, the network computing device 106 includes, in the cache line destage command, one or more cache line identifiers corresponding to the one or more cache lines to be destaged.

Referring now to FIGS. 4 and 5, in use, the network computing device 106 establishes an illustrative environment 400 for destaging cache lines to the shared cache 116 via a cache line destage instruction and an illustrative environment 500 for destaging cache lines to the shared cache 116 via a cache line destage command to the cache line destaging device 130. Referring now to FIG. 4, the illustrative environment 400 includes the HFI 126, the processor core 110, the core local cache 114, the shared cache 116, and the cache line destaging device 130 of FIG. 1, as well as the cache manager 214 of FIG. 2. Each of the illustrative core local cache 114 and the shared cache 116 includes a plurality of cache entries.

As illustratively shown, the core local cache 114 includes a plurality of core local cache entries 404. The illustrative core local cache entry 404 includes: a first core local cache entry designated as core local cache entry (1) 404a, a second core local cache entry designated as core local cache entry (2) 404b, a third core local cache entry designated as core local cache entry (3) 404c, a fourth core local cache entry designated as core local cache entry (4) 404d, and a fifth core local cache entry designated as core local cache entry (N) 404e (i.e., "nth" core local cache entry 404, where "N" is a positive integer and designates one or more additional core local cache entries 404). Similarly, the illustrative shared cache 116 includes a plurality of shared cache entries 406. The illustrative shared cache entry 406 includes: a first shared cache entry designated as shared cache entry (1) 406a, a second shared cache entry designated as shared cache entry (2) 406b, a third shared cache entry designated as shared cache entry (3) 406c, a fourth shared cache entry designated as shared cache entry (4) 406d, and a fifth shared cache entry designated as shared cache entry (N) 406e (i.e., "nth" shared cache entry 406, where "N" is a positive integer and designates one or more additional shared cache entries 406).

Referring now to FIG. 5, similar to the illustrative environment 400 of FIG. 4, the illustrative environment 500 includes the HFI 126, the processor core 110, the core local cache 114, the shared cache 116, and the cache line destaging device 130 of FIG. 1, as well as the cache manager 214 of FIG. 2. As previously described, the processor core 110 is configured to poll the HFI 126 for available network packets to process (e.g., via an HFI/host interface (not shown)) and to perform some level of processing operation on at least a portion of the data of each network packet. As also previously described, upon completion of a processing operation, the processor core 110 is further configured to provide some indication that one or more cache lines are to be demoted from the core local cache 114 to the shared cache 116.

Referring back to FIG. 4, as illustratively shown, the indication provided by the processor core 110 takes the form of one or more cache line destage instructions. It should be appreciated that each cache line destage instruction may be used to identify a single cache line of the core local cache 114 whose data is to be destaged to the shared cache 116. As such, it should be appreciated that such instructions may not be sufficiently efficient for larger packets. Thus, for larger blocks of data, the processor core 110 may be configured to utilize the cache line destaging device 130 to offload the destaging operations. To do so, referring again to FIG. 5, the processor core 110 may be configured to transmit a cache line destage command 502 to the cache line destaging device 130 to trigger a cache line destage operation to be performed by the cache line destaging device 130, such as may be performed via a data read request, a DMA request, or any other type of request that would result in the data being destaged to the shared cache 116 as a side effect, without wasting processor core cycles.
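The per-line instruction path can be modeled as a loop over the cache lines of the packet buffer. On recent x86 processors a per-line demote hint of this kind is exposed as the CLDEMOTE instruction (the `_mm_cldemote` intrinsic); the stub below stands in for it so the sketch stays portable, and the `demote_buffer` helper and 64-byte line size are illustrative assumptions. The loop cost scales with packet size, which is the motivation for replacing it with a single command to the destaging device for larger packets.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE 64u  /* assumed cache line size, in bytes */

static size_t demote_count;  /* counts per-line demote hints issued */

/* Stand-in for a per-line demote hint such as x86 CLDEMOTE. A real
 * build would issue _mm_cldemote(p) here; since demotion is only a
 * performance hint, a counting no-op preserves correctness. */
static void demote_line_hint(const void *p)
{
    (void)p;
    demote_count++;
}

/* Instruction path of FIG. 4: the processor core itself issues one
 * demote hint per cache line of the packet buffer. Returns the
 * number of hints issued. */
static size_t demote_buffer(const void *buf, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t addr  = (uintptr_t)buf;
    uintptr_t first = addr & ~(uintptr_t)(LINE - 1);
    uintptr_t last  = (addr + len - 1) & ~(uintptr_t)(LINE - 1);
    size_t issued = 0;
    for (uintptr_t line = first; line <= last; line += LINE) {
        demote_line_hint((const void *)line);
        issued++;
    }
    return issued;
}
```

For a 1500-byte packet this loop issues roughly two dozen hints per packet, whereas the command path of FIG. 5 replaces all of them with one transmission to the destaging device.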

As illustratively shown in both FIGS. 4 and 5, the data in core local cache line (1) 404a, core local cache line (2) 404b, and core local cache line (3) 404c is associated with the processed network packet, as indicated by the highlighted outline surrounding each of those core local cache lines 404. As also illustratively shown, the cache line destage operation results in the data being destaged such that the data in core local cache line (1) 404a is destaged to shared cache line (1) 406a, the data in core local cache line (2) 404b is destaged to shared cache line (2) 406b, and the data in core local cache line (3) 404c is destaged to shared cache line (3) 406c; however, it should be appreciated that, due to the nature of cache line destage operations, a destaged cache line may be moved to any available shared cache line 406.

Examples of the invention

Illustrative examples of the techniques disclosed herein are provided below. Embodiments of the techniques may include any one or more of the examples described below and any combination of the examples described below.

Example 1 includes a computing device to demote a cache line to a shared cache, the computing device comprising one or more processors, wherein each of the one or more processors comprises a plurality of processor cores; a cache memory, wherein the cache memory comprises a core local cache and a shared cache, wherein the core local cache comprises a plurality of core local cache lines, and wherein the shared cache comprises a plurality of shared cache lines; a cache line demotion device; and a Host Fabric Interface (HFI) to receive a network packet, wherein a processor core of a processor of the one or more processors is to retrieve at least a portion of data of the received network packet, wherein retrieving the data comprises moving the data into one or more of the plurality of core local cache lines; performing one or more processing operations on the data; and after the one or more processing operations on the data have been completed, transmitting a cache line destage command to the cache line destage device, and wherein the cache line destage device is to perform a cache line destage operation to destage the data from the one or more core local cache lines to one or more shared cache lines of the shared cache in response to having received the cache line destage command.

Example 2 includes the subject matter of example 1, and wherein the processor core is further to determine whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein transmitting the cache line demotion command to the cache line demotion device comprises transmitting the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

Example 3 includes the subject matter of any of examples 1 and 2, and wherein the processor core is further to transmit a cache line demotion instruction to a cache manager of the cache memory after having determined that the size of the received network packet is less than or equal to the packet size threshold, and wherein the cache manager is to demote data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction, wherein the cache line demotion instruction bypasses the cache line demotion device.

Example 4 includes the subject matter of any of examples 1-3, and wherein transmitting the cache line demotion instruction comprises transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

Example 5 includes the subject matter of any of examples 1-4, and wherein performing a cache line demotion operation comprises performing a read request or a direct memory access.

Example 6 includes the subject matter of any of examples 1-5, and wherein the cache line demotion command includes an indication of a core local cache line associated with the received network packet to be demoted to the shared cache.

Example 7 includes the subject matter of any of examples 1-6, and wherein the cache line demotion device comprises one of a copy engine, a Direct Memory Access (DMA) device available to copy data, or an offload device available to perform read operations.

Example 8 includes the subject matter of any of examples 1-7, and wherein transmitting the cache line demotion command comprises transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

Example 9 includes one or more machine-readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device: retrieving, by a processor of a computing device, at least a portion of data of a network packet received by a Host Fabric Interface (HFI) of the computing device, wherein retrieving the data comprises moving the data into one or more of a plurality of core local cache lines of a core local cache of the computing device, and wherein the processor comprises a plurality of processor cores; performing, by a processor core of the plurality of processor cores, one or more processing operations on data; transmitting, by the processor and after the one or more processing operations on the data have been completed, a cache line destage command to a cache line destage device of the computing device; and performing, by the cache line destaging device and in response to having received the cache line destage command, a cache line destage operation to destage data from the one or more core local cache lines to one or more shared cache lines of a shared cache of the computing device.

Example 10 includes the subject matter of example 9, and wherein the processor core is further to determine whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein transmitting the cache line demotion command to the cache line demotion device comprises transmitting the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

Example 11 includes the subject matter of any of examples 9 and 10, and wherein the processor core is further to transmit a cache line demotion instruction to a cache manager of a cache memory including the core local cache and the shared cache after having determined that a size of the received network packet is less than or equal to the packet size threshold, and wherein the cache manager is to demote data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction.

Example 12 includes the subject matter of any of examples 9-11, and wherein transmitting the cache line demotion instruction includes transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

Example 13 includes the subject matter of any of examples 9-12, and wherein performing a cache line demotion operation comprises performing a read request or a direct memory access.

Example 14 includes the subject matter of any of examples 9-13, and wherein transmitting the cache line demotion command includes transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

Example 15 includes a method for demoting cache lines to a shared cache, the method comprising retrieving, by a processor of a computing device, at least a portion of data of a network packet received by a Host Fabric Interface (HFI) of the computing device, wherein retrieving the data comprises moving the data into one or more core local cache lines of a plurality of core local cache lines of a core local cache of the computing device, and wherein the processor comprises a plurality of processor cores; performing, by a processor core of the plurality of processor cores, one or more processing operations on data; transmitting, by the processor core and after the one or more processing operations on the data have been completed, a cache line demotion command to a cache line demotion device of the computing device; and performing, by the cache line destaging device and in response to having received the cache line destage command, a cache line destage operation to destage data from the one or more core local cache lines to one or more shared cache lines of a shared cache of the computing device.

Example 16 includes the subject matter of example 15, and further comprising determining whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein transmitting the cache line demotion command to the cache line demotion device comprises transmitting the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

Example 17 includes the subject matter of any of examples 15 and 16, and further comprising transmitting, by the processor core and after having determined that the size of the received network packet is less than or equal to the packet size threshold, a cache line demotion instruction to a cache manager of a cache memory comprising the core local cache and the shared cache; and demoting, by the cache manager, data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction.

Example 18 includes the subject matter of any of examples 15-17, and wherein transmitting the cache line demotion instruction includes transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

Example 19 includes the subject matter of any of examples 15-18, and wherein performing a cache line demotion operation includes performing one of a read request or a direct memory access.

Example 20 includes the subject matter of any of examples 15-19, and wherein transmitting the cache line demotion command includes transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.
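Examples 14 and 20 state that the cache line demotion command carries one or more cache line identifiers naming the lines involved. The following C sketch is a toy software model of how a demotion device might consume such a command, copying each identified line from the core-local cache into the shared cache and invalidating the local copy. The structures, sizes, and field names are illustrative assumptions for the sketch, not the patent's hardware implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES 8   /* toy cache: 8 lines (illustrative) */
#define LINE_SIZE 64  /* 64-byte cache lines */

/* A demotion command names the core-local lines to be demoted
 * via cache line identifiers (Examples 14 and 20). */
struct demote_cmd {
    size_t   count;
    unsigned line_ids[NUM_LINES];
};

/* Minimal model of a cache level: line data plus a valid bit per line. */
struct cache {
    uint8_t lines[NUM_LINES][LINE_SIZE];
    int     valid[NUM_LINES];
};

/* Model of the demotion operation: for each identified line, move its
 * contents to the shared cache and drop the core-local copy. */
void demote_lines(const struct demote_cmd *cmd,
                  struct cache *core_local, struct cache *shared)
{
    for (size_t i = 0; i < cmd->count; i++) {
        unsigned id = cmd->line_ids[i];
        memcpy(shared->lines[id], core_local->lines[id], LINE_SIZE);
        shared->valid[id]     = 1;  /* line now resides in the shared cache */
        core_local->valid[id] = 0;  /* core-local copy is released          */
    }
}
```

In real hardware the same effect could be achieved by the read request or direct memory access mentioned in Examples 13, 19, and 25, pulling the line out of the core-local cache rather than copying it in software.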

Example 21 includes a computing device to demote a cache line to a shared cache, the computing device comprising circuitry to retrieve, by a processor of the computing device, at least a portion of data of a network packet received by a Host Fabric Interface (HFI) of the computing device, wherein retrieving the data comprises moving the data into one or more of a plurality of core local cache lines of a core local cache of the computing device, and wherein the processor comprises a plurality of processor cores; circuitry to perform one or more processing operations on the data by a processor core of the plurality of processor cores; circuitry to transmit, by the processor core and after the one or more processing operations on the data have been completed, a cache line demotion command to a cache line demotion device of the computing device; and means for performing, by the cache line demotion device and in response to having received the cache line demotion command, a cache line demotion operation to demote the data from the one or more core local cache lines to one or more shared cache lines of a shared cache of the computing device.

Example 22 includes the subject matter of example 21, and further comprising circuitry to determine whether a size of the received network packet is greater than a packet size threshold after the one or more processing operations on the data have been completed, wherein transmitting the cache line demotion command to the cache line demotion device comprises transmitting the cache line demotion command after determining that the size of the received network packet is greater than the packet size threshold.

Example 23 includes the subject matter of any of examples 21 and 22, and further comprising circuitry to transmit, by the processor core and after having determined that the size of the received network packet is less than or equal to the packet size threshold, a cache line demotion instruction to a cache manager of a cache memory comprising the core local cache and the shared cache; and demoting, by the cache manager, data from the one or more core local cache lines to the one or more shared cache lines of the shared cache based on the cache line demotion instruction.

Example 24 includes the subject matter of any of examples 21-23, and wherein transmitting the cache line demotion instruction includes transmitting one or more cache line identifiers corresponding to the one or more shared cache lines.

Example 25 includes the subject matter of any of examples 21-24, and wherein the means for performing a cache line demotion operation comprises means for performing one of a read request or a direct memory access.
