System and method to facilitate efficient injection of packets into output buffers in a Network Interface Controller (NIC)

Document No.: 1909825 Publication date: 2021-11-30

Reading note: This technology, "System and method to facilitate efficient injection of packets into output buffers in a Network Interface Controller (NIC)", was designed and created by A·M·巴塔耶纳, T·L·科特, V·张, D·C·休森, P·昆都, and E·P·伦德贝格 on 2020-03-23. Main content: A Network Interface Controller (NIC) capable of efficiently injecting packets into an output buffer is provided. The NIC may be equipped with an output buffer, a plurality of injectors, a prioritization logic block, and a selection logic block. The plurality of injectors may share the output buffer. The prioritization logic block may determine a priority associated with a respective injector based on a high water level and a low water level associated with the injector. The selection logic block may then determine a subset of injectors, from the plurality of injectors, that are associated with a buffer class and determine whether the subset of injectors includes a high priority injector. Upon identifying that a high priority injector is included in the subset of injectors, the selection logic block may select the high priority injector for injecting packets into the output buffer.

1. A Network Interface Controller (NIC), comprising:

an output buffer;

a plurality of injectors sharing the output buffer;

a prioritization logic to determine a priority associated with a respective injector based on a high water level and a low water level associated with the injector; and

a selection logic block to:

determine a subset of injectors, from the plurality of injectors, associated with a buffer class;

determine whether the subset of injectors comprises a high priority injector; and

in response to identifying a high priority injector in the subset of injectors, select the high priority injector for injecting packets into the output buffer.

2. The network interface controller of claim 1, wherein, in response to determining that the subset of injectors does not include a high priority injector, the selection logic block is further to select a low priority injector for injecting packets into the output buffer.

3. The network interface controller of claim 1, wherein the prioritization logic is further to:

determine a command type associated with the respective injector;

in response to the command type being an Immediate Data Command (IDC), determine the high water level and the low water level based on a global limit; and

in response to the command type being a Direct Memory Access (DMA) command, determine the high water level and the low water level based on limits specific to the injector.

4. The network interface controller of claim 3, wherein the command is issued to the network interface controller via a peripheral component interconnect express (PCIe) interface.

5. The network interface controller of claim 1, wherein the prioritization logic is further to obtain a number of cells in the buffer occupied by data from a respective injector.

6. The network interface controller of claim 5, wherein the prioritization logic is further to:

assign a high priority to the injector in response to the number of cells being less than or equal to the low water level; and

assign a low priority to the injector in response to the number of cells being greater than or equal to the high water level.

7. The network interface controller of claim 1, wherein the prioritization logic is further to:

assign a high priority to the injector in response to detecting a priority reset; and

assign a low priority to the injector in response to detecting expiration of a timer associated with the injector.

8. The network interface controller of claim 1, wherein the selection logic block is further to select the buffer class from a set of buffer classes enabled for the network interface controller.

9. The network interface controller of claim 1, wherein an injector is a message chopping unit (MCU) to generate packets from commands issued to the network interface controller.

10. The network interface controller of claim 1, wherein the output buffer is divided into a plurality of cells; and

wherein injecting the packet comprises injecting the packet into a next available cell.

11. A method, comprising:

identifying a plurality of injectors sharing an output buffer in a Network Interface Controller (NIC);

determining a priority associated with a respective injector based on a high water level and a low water level associated with the injector;

determining a subset of injectors from the plurality of injectors associated with a buffer class;

determining whether the subset of injectors comprises a high priority injector; and

in response to identifying a high priority injector in the subset of injectors, selecting the high priority injector for injecting packets into the output buffer.

12. The method of claim 11, further comprising, in response to determining that the subset of injectors does not include a high priority injector, selecting a low priority injector for injecting packets into the output buffer.

13. The method of claim 11, further comprising:

determining a command type associated with the respective injector;

in response to the command type being an Immediate Data Command (IDC), determining the high water level and the low water level based on a global limit; and

in response to the command type being a Direct Memory Access (DMA) command, determining the high water level and the low water level based on limitations specific to the injector.

14. The method of claim 13, wherein the command is issued to the NIC via a peripheral component interconnect express (PCIe) interface.

15. The method of claim 11, further comprising obtaining a number of cells in the buffer occupied by data from a respective injector.

16. The method of claim 15, further comprising:

assigning a high priority to the injector in response to the number of cells being less than or equal to the low water level; and

assigning a low priority to the injector in response to the number of cells being greater than or equal to the high water level.

17. The method of claim 11, further comprising:

assigning a high priority to the injector in response to detecting a priority reset; and

assigning a low priority to the injector in response to detecting expiration of a timer associated with the injector.

18. The method of claim 11, further comprising selecting the buffer class from a set of buffer classes enabled for the NIC.

19. The method of claim 11, wherein an injector is a message chopping unit (MCU) to generate packets from commands issued to the NIC.

20. The method of claim 11, wherein the output buffer is divided into a plurality of cells; and

wherein injecting the packet comprises injecting the packet into a next available cell.

Technical Field

The present disclosure relates generally to the field of networking technology. More particularly, the present disclosure relates to systems and methods for facilitating efficient injection of packets into an output buffer in a Network Interface Controller (NIC).

Background

As network-enabled devices and applications become more prevalent, various types of traffic and increasing network loads continue to demand higher performance from the underlying network architecture. For example, applications such as High Performance Computing (HPC), streaming media, and Internet of Things (IoT) may produce different types of traffic with distinct characteristics. Thus, in addition to traditional network performance metrics such as bandwidth and latency, network architects continue to face challenges such as scalability, versatility, and efficiency.

Disclosure of Invention

A Network Interface Controller (NIC) capable of efficiently injecting packets into an output buffer is provided. The NIC may be equipped with an output buffer, a plurality of injectors, a prioritization logic block, and a selection logic block. The plurality of injectors may share the output buffer. The prioritization logic may determine a priority associated with a respective injector based on the high and low water levels associated with the injector. The selection logic block may then determine a subset of injectors from the plurality of injectors that are associated with a buffer class and determine whether the subset of injectors includes a high priority injector. Upon identifying that a high priority injector is included in the subset of injectors, the selection logic block may select the high priority injector for injecting packets into the output buffer.

Drawings

Fig. 1 illustrates an exemplary network.

Fig. 2A shows an exemplary NIC chip having multiple NICs.

Fig. 2B shows an exemplary architecture of a NIC.

Fig. 3A illustrates an exemplary packet injection in an output buffer in a NIC.

Fig. 3B illustrates an exemplary arbitration process for injecting packets into an output buffer in a NIC.

Fig. 4A shows a flow diagram of a priority assignment process for injecting packets into an output buffer in a NIC.

Fig. 4B shows a flow diagram of an arbitration process for injecting packets into an output buffer in a NIC.

Fig. 5 illustrates an example computer system equipped with a NIC that facilitates efficient injection of packets into an output buffer.

In the drawings, like reference numerals refer to like elements.

Detailed Description

Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. The invention is thus not limited to the embodiments shown.

Overview

This disclosure describes systems and methods that facilitate efficient injection of packets into an output buffer in a Network Interface Controller (NIC). The NIC allows a host to communicate with a data-driven network. The network can adapt to dynamic data traffic by maintaining state information for individual packet flows, which enables fast, efficient congestion control. More specifically, packets injected into the switch fabric may be classified into flows, which may be mapped to their layer 2, layer 3, or other protocol-specific header information. Each flow may be tagged with a distinct identifier that is local to an input port of a switch and provided with a flow-specific input buffer, so that each flow can be individually flow controlled. In addition, packets of a respective flow may be acknowledged upon reaching the egress point of the network, and the acknowledgment packets may be sent back to the ingress point of the flow along the same data path in the reverse direction. As a result, each switch can obtain state information for the active packet flows it is forwarding and can perform highly responsive, flow-specific flow control. Such flow control may allow the network to operate at higher capacity while providing versatile traffic-engineering capabilities.

Embodiments described herein address the problem of efficiently allocating packets from multiple injectors to a shared output buffer of a NIC by: (i) determining the priority of each injector based on how much of the buffer that injector occupies; and (ii) arbitrating among the injectors based on the buffer class and the determined priority. An injector may be any element of the NIC that may inject traffic into the buffer.

During operation, the NIC may receive commands from a host device of the NIC. The host interface of the NIC may couple the NIC with the host device and facilitate communication between the host device and the NIC. A command may be an Immediate Data Command (IDC) or a Direct Memory Access (DMA) command. A command that carries its associated data is an IDC. On the other hand, a command that carries a pointer to the associated data is referred to as a DMA command (DMAC) (e.g., a Remote DMA (RDMA) "GET" or "PUT" command). Further, traffic generated based on the commands may be assigned to different classes, such as a traffic shaping class or a buffer class. Each buffer class may be associated with one or more injectors. Conversely, each injector may be assigned to one buffer class. Thus, multiple injectors may send packets in parallel for the same buffer class.

However, the injectors may share a common output buffer. Thus, if a large number of injectors share the buffer, one injector may occupy a large portion of the buffer due to traffic non-uniformity and randomness. In contrast, another injector may not be able to obtain sufficient buffer capacity. As a result, sharing the buffer may leave some injectors under-utilized while causing other injectors to encounter bottlenecks. In addition, the buffer may be shared unfairly among the injectors and among the buffer classes of the injectors.

To address this problem, the NIC may arbitrate among the injectors so that the capacity of the buffer is fairly allocated. The buffer may be divided into a plurality of cells. A respective cell may have a fixed size (e.g., 2048 bytes). An injector may insert traffic into the buffer at a granularity of the cell size. To ensure that buffer capacity is fairly allocated to the injectors, the NIC may select the injector that inserts traffic into the next available cell based on one or more selection criteria. The selection criteria may allow the NIC to select an injector that is under-subscribed (or under-utilized) and avoid selecting an injector that is over-subscribed (or over-utilized). The NIC may also distinguish between IDCs and DMACs. In addition, the NIC may select an injector so as to ensure a fair allocation of buffer capacity among buffer classes. In some embodiments, the injector may be a message chopping unit (MCU) module that may segment a message into packets of a size corresponding to a Maximum Transmission Unit (MTU).
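The cell-based sharing described above can be pictured with a small model. The following Python sketch is illustrative only: the class name `SharedOutputBuffer`, its methods, and the string injector identifiers are hypothetical and are not taken from the disclosure. It assumes that each injection fills exactly one cell and that the NIC tracks, per injector, how many cells that injector currently occupies.

```python
from collections import Counter, deque


class SharedOutputBuffer:
    """Toy model of an output buffer divided into fixed-size cells."""

    def __init__(self, num_cells: int, cell_size: int = 2048):
        self.cell_size = cell_size                  # 2048 bytes is the example size from the text
        self.free_cells = deque(range(num_cells))   # indices of currently available cells
        self.owner = {}                             # cell index -> injector id
        self.occupied = Counter()                   # injector id -> number of occupied cells

    def inject(self, injector_id: str, payload: bytes) -> int:
        """Place up to one cell of traffic into the next available cell."""
        if len(payload) > self.cell_size:
            raise ValueError("payload exceeds the cell size")
        if not self.free_cells:
            raise BufferError("no cell is available")
        cell = self.free_cells.popleft()
        self.owner[cell] = injector_id
        self.occupied[injector_id] += 1
        return cell

    def drain(self, cell: int) -> None:
        """Release a cell once its contents have left the buffer."""
        injector_id = self.owner.pop(cell)
        self.occupied[injector_id] -= 1
        self.free_cells.append(cell)
```

The per-injector count kept in `occupied` is the quantity that a fairness policy would compare against per-injector limits.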

During operation, the NIC may assign priorities to the respective injectors based on limits associated with the injectors (e.g., the maximum buffer capacity that may be allocated to an injector). The NIC may determine the type of command associated with an injector. Since a DMAC typically stays in the buffer longer (e.g., because additional memory accesses are needed to obtain the associated data), the NIC may use per-injector limits for DMACs and global limits for IDCs. Based on the limits, the NIC may determine a high water level and a low water level for the injector, which may be used to detect over-subscription and under-subscription, respectively. Occupancy at or above the high water level may indicate that the injector is approaching its limit. On the other hand, occupancy at or below the low water level may indicate that the injector is using significantly less than its limit.
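As a rough illustration of how the water levels might be derived, the following Python sketch picks the applicable limit from the command type and scales it. The fractions 0.75 and 0.25, the function name `water_levels`, and the string command types are assumptions made for this example; the disclosure does not specify how the levels are computed from the limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaterLevels:
    high: int  # at or above this many cells, the injector is treated as over-subscribed
    low: int   # at or below this many cells, the injector is treated as under-subscribed


def water_levels(command_type: str, injector_limit: int, global_limit: int,
                 high_fraction: float = 0.75, low_fraction: float = 0.25) -> WaterLevels:
    """Derive water levels (in cells) from the limit that applies to the command type.

    The fractions are illustrative assumptions, not values from the disclosure.
    """
    if command_type == "IDC":
        limit = global_limit        # IDCs use the global limit
    elif command_type == "DMAC":
        limit = injector_limit      # DMACs use the injector-specific limit
    else:
        raise ValueError(f"unknown command type: {command_type!r}")
    return WaterLevels(high=int(limit * high_fraction), low=int(limit * low_fraction))
```

For instance, with a hypothetical global limit of 64 cells and an injector limit of 16 cells, an IDC injector would get levels of 48 and 16 cells, and a DMAC injector would get 12 and 4 cells.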

The NIC may then determine the number of cells in the buffer that are currently occupied by data from the injector. The NIC may assign a low priority to the injector if the capacity represented by the occupied cell is greater than or equal to the high water level. On the other hand, if the capacity represented by the occupied cell is less than or equal to the low water level, the NIC may assign a high priority to the injector. The NIC may assign a priority to each injector by repeating this process. The NIC may then perform a two-phase arbitration process to select an injector for the next available cell in the buffer. In a first phase, the NIC may select a buffer class (e.g., based on a weighted round robin selection).
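The threshold comparison described above can be sketched in a few lines of Python. The function name and the string priorities are hypothetical. The text does not say what happens when the occupied cell count lies strictly between the two water levels; this sketch assumes the injector simply keeps its previous priority.

```python
def assign_priority(occupied_cells: int, low_water: int, high_water: int,
                    previous: str = "high") -> str:
    """Return "high" or "low" for an injector based on its occupied cell count."""
    if occupied_cells >= high_water:
        return "low"      # over-subscribed: deprioritize
    if occupied_cells <= low_water:
        return "high"     # under-subscribed: prioritize
    return previous       # assumption: no change between the two levels
```

Keeping the previous value between the levels gives a simple hysteresis, so an injector whose occupancy hovers near one threshold does not flip priority on every cell.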

Upon selecting a buffer class, the NIC may identify the injectors associated with the buffer class. In the second phase, the NIC may determine whether the identified injectors include an injector having a high priority. If the identified injectors include at least one injector having a high priority, the NIC may select a high-priority injector. To select from among a plurality of such injectors, the NIC may use a selection policy, such as round-robin selection or first-available selection. On the other hand, if the identified injectors only include injectors with low priority, the NIC may select one of these injectors based on a selection policy. To ensure that a priority does not remain assigned to an injector for a long time, the NIC may periodically perturb the priority. In this way, the NIC may facilitate the allocation of packets to the shared output buffer in an efficient manner.
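The two-phase arbitration can be sketched as follows. This Python example is a simplified model under stated assumptions: the class name `Arbiter`, the dictionaries used to describe classes and priorities, and the particular weighted round robin (each class repeated `weight` times in a fixed schedule) are all hypothetical choices, and the second phase uses a rotating pointer per class as one possible round-robin policy.

```python
from itertools import cycle


class Arbiter:
    """Phase 1 picks a buffer class; phase 2 picks an injector within that class."""

    def __init__(self, class_weights: dict, class_members: dict):
        # Simple weighted round robin: each class appears `weight` times per cycle.
        schedule = [c for c, w in class_weights.items() for _ in range(w)]
        self._classes = cycle(schedule)
        self._members = class_members
        self._next_index = {c: 0 for c in class_members}  # round-robin pointer per class

    def select(self, priority: dict) -> tuple:
        buffer_class = next(self._classes)                     # phase 1
        members = self._members[buffer_class]
        high = [m for m in members if priority[m] == "high"]
        candidates = high if high else members                 # fall back to low priority
        start = self._next_index[buffer_class]
        for offset in range(len(members)):                     # phase 2: round robin
            idx = (start + offset) % len(members)
            if members[idx] in candidates:
                self._next_index[buffer_class] = (idx + 1) % len(members)
                return buffer_class, members[idx]
        raise LookupError(f"buffer class {buffer_class!r} has no injectors")
```

A short usage example, with hypothetical class and injector names:

```python
arbiter = Arbiter({"bc0": 2, "bc1": 1},
                  {"bc0": ["m312", "m314"], "bc1": ["m316", "m318"]})
priorities = {"m312": "low", "m314": "high", "m316": "low", "m318": "low"}
picks = [arbiter.select(priorities) for _ in range(3)]
```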

One embodiment of the present invention provides a NIC that may be equipped with an output buffer, a plurality of injectors, a prioritization logic block, and a selection logic block. The plurality of injectors may share the output buffer. The prioritization logic may determine a priority associated with a respective injector based on the high and low water levels associated with the injector. The selection logic block may then determine a subset of injectors from the plurality of injectors that are associated with a buffer class and determine whether the subset of injectors includes a high priority injector. Upon identifying a high priority injector in the subset of injectors, the selection logic block may select the high priority injector for injecting packets into the output buffer.

In a variation of this embodiment, the selection logic block may select a low priority injector for injecting packets into the output buffer if the subset of injectors does not include a high priority injector.

In a variation of this embodiment, the prioritization logic may determine a command type associated with a respective injector. The prioritization logic may determine the high water level and the low water level based on a global limit if the command type is an Immediate Data Command (IDC). On the other hand, if the command type is a Direct Memory Access (DMA) command, the prioritization logic may determine the high and low water levels based on limits specific to the injector.

In a further variation, the command is issued to the NIC via a peripheral component interconnect express (PCIe) interface.

In a variation of this embodiment, the prioritization logic may obtain the number of cells in the buffer occupied by data from the respective injector.

In a further variant, the prioritization logic may assign a high priority to the injector if the number of cells is less than or equal to the low water level. On the other hand, if the number of cells is greater than or equal to the high water level, the prioritization logic may assign a low priority to the injector.

In a variation of this embodiment, the prioritization logic may assign a high priority to the injector upon detecting that the priority is reset. Further, the prioritization logic may assign a low priority to the injector if a timer associated with the injector has expired.
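The reset and timer behavior in this variation might look like the following Python sketch. The class name, the explicit `now` argument used in place of a hardware clock, and the choice to restart the timer on every reset are assumptions made for illustration; the disclosure only states that a reset yields a high priority and that timer expiration yields a low priority.

```python
class InjectorPriority:
    """Priority state for one injector, driven by resets and a timer."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.level = "high"
        self.deadline = timeout       # assumption: the timer starts at time 0

    def reset(self, now: float) -> None:
        """A priority reset restores high priority and restarts the timer."""
        self.level = "high"
        self.deadline = now + self.timeout

    def tick(self, now: float) -> None:
        """Demote the injector once its timer has expired."""
        if now >= self.deadline:
            self.level = "low"
```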

In a variation of this embodiment, the selection logic may select the buffer class from a set of buffer classes enabled for the NIC.

In a variation of this embodiment, the injector may be a message chopping unit (MCU) for generating packets from commands issued to the NIC.

In a variation of this embodiment, the output buffer is divided into a plurality of cells. Injecting the packet may then include injecting the packet into the next available cell.

In this disclosure, the description in connection with Fig. 1 relates to the network architecture, and the description in connection with Fig. 2A and beyond provides more details regarding the architecture and operation of a NIC that supports efficient packet injection.

Fig. 1 illustrates an exemplary network. In this example, switch network 100 (which may also be referred to as a "switch fabric") may include switches 102, 104, 106, 108, and 110. Each switch may have a unique address or ID within the switch fabric 100. Various types of devices and networks may be coupled to the switch fabric. For example, storage array 112 may be coupled to switch fabric 100 via switch 110; InfiniBand (IB)-based HPC network 114 may be coupled to switch fabric 100 via switch 108; a plurality of end hosts, such as host 116, may be coupled to the switch fabric 100 via the switch 104; and the IP/Ethernet network 118 may be coupled to the switch fabric 100 via the switch 102. In general, a switch may have edge ports and fabric ports. An edge port may be coupled to a device external to the fabric. A fabric port may be coupled to another switch within the fabric via a fabric link. Typically, traffic may be injected into the switch fabric 100 via an ingress port of an edge switch and exit the switch fabric 100 via an egress port of another (or the same) edge switch. An ingress link may couple a NIC of an edge device (e.g., an HPC end host) to an ingress edge port of an edge switch. The switch fabric 100 may then transport the traffic to an egress edge switch, which in turn may deliver the traffic to a destination edge device via another NIC.

Exemplary NIC architecture

Fig. 2A shows an exemplary NIC chip having multiple NICs. Referring to the example in fig. 1, NIC chip 200 may be a custom Application Specific Integrated Circuit (ASIC) designed for host 116 to work with switch fabric 100. In this example, chip 200 may provide two separate NICs 202 and 204. Each NIC of chip 200 may be equipped with a Host Interface (HI) (e.g., an interface for connecting to a host processor) and a high-speed network interface (HNI) for communicating with links coupled to switch fabric 100 of fig. 1. For example, the NIC 202 may include the HI 210 and the HNI 220, and the NIC 204 may include the HI 211 and the HNI 221.

In some embodiments, the HI 210 may be a Peripheral Component Interconnect (PCI) interface or a peripheral component interconnect express (PCIe) interface. The HI 210 may be coupled to a host via a host connection 201, which may include N (e.g., N may be 16 in some chips) PCIe Gen 4 lanes capable of operating at signaling rates up to 25 Gbps per lane. The HNI 220 may facilitate a high-speed network connection 203 that may communicate with a link in the switch fabric 100 of Fig. 1. The HNI 220 may operate at an aggregate rate of 100 Gbps or 200 Gbps using M (e.g., M may be 4 in some chips) full-duplex serial lanes. Each of the M lanes may operate at 25 Gbps or 50 Gbps based on non-return-to-zero (NRZ) modulation or pulse amplitude modulation 4 (PAM4), respectively. The HNI 220 may support Institute of Electrical and Electronics Engineers (IEEE) 802.3 Ethernet-based protocols, as well as an enhanced frame format that provides support for higher rates of small messages.
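The aggregate rates quoted above follow directly from the per-lane rates and the lane count. A minimal Python check, in which the dictionary and function names are illustrative only:

```python
# Per-lane rates, in Gbps, for the two modulation schemes named in the text.
LANE_RATE_GBPS = {"NRZ": 25, "PAM4": 50}


def aggregate_rate_gbps(modulation: str, lanes: int = 4) -> int:
    """Aggregate rate of a network interface with `lanes` full-duplex serial lanes."""
    return LANE_RATE_GBPS[modulation] * lanes
```

With M = 4 lanes, NRZ gives 4 x 25 = 100 Gbps and PAM4 gives 4 x 50 = 200 Gbps.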

The NIC 202 may support one or more of the following: Message Passing Interface (MPI) based point-to-point messaging, Remote Memory Access (RMA) operations, offloading and progression of bulk data collective operations, and Ethernet packet processing. When the host issues an MPI message, the NIC 202 may match the corresponding message type. Further, the NIC 202 may implement both the eager protocol and the rendezvous protocol for MPI, thereby offloading the corresponding operations from the host.

Further, RMA operations supported by the NIC 202 may include PUT, GET, and Atomic Memory Operations (AMO). The NIC 202 may provide reliable transmissions. For example, if the NIC 202 is an originating NIC, the NIC 202 may provide a retry mechanism for idempotent operations. Furthermore, connection-based error detection and retry mechanisms may be used for ordered operations that may manipulate the target state. The hardware of NIC 202 may maintain the state required for the retry mechanism. In this way, the NIC 202 may relieve the burden on the host (e.g., software). The policy for deciding the retry mechanism may be specified by the host through driver software to ensure flexibility of the NIC 202.

In addition, the NIC 202 may facilitate triggered operations, a general-purpose mechanism for offloading and progressing dependent sequences of operations (such as bulk data collectives). NIC 202 may support an Application Programming Interface (API), such as a libfabric API, to facilitate the provision of fabric communication services by switch fabric 100 of Fig. 1 to applications running on host 116. The NIC 202 may also support a low-level network programming interface, such as a Portals API. Additionally, the NIC 202 may provide efficient Ethernet packet processing that may include efficient transmission when the NIC 202 is the sender, flow steering when the NIC 202 is the target, and checksum calculation. Further, the NIC 202 may support virtualization (e.g., using containers or virtual machines).

Fig. 2B shows an exemplary architecture of a NIC. In the NIC 202, the port macro of the HNI 220 may facilitate low-level Ethernet operations, such as Physical Coding Sublayer (PCS) and Media Access Control (MAC). Additionally, the NIC 202 may provide support for Link Layer Retry (LLR). Incoming packets may be parsed by the parser 228 and stored in the buffer 229. The buffer 229 may be a PFC buffer provisioned to buffer a threshold amount (e.g., one microsecond) of delay bandwidth. HNI 220 may also include a control transmit unit 224 and a control receive unit 226 for managing outgoing and incoming packets, respectively.
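To give a sense of scale for "one microsecond of delay bandwidth", the capacity can be estimated as link rate multiplied by delay. The following Python sketch uses integer arithmetic (Gbps times nanoseconds gives bits); the function name is hypothetical, and pairing the one-microsecond example with the 100 Gbps and 200 Gbps rates mentioned elsewhere in the text is an assumption for illustration, not a stated buffer size.

```python
def delay_bandwidth_bytes(rate_gbps: int, delay_ns: int) -> int:
    """Bytes in flight on a link of `rate_gbps` during `delay_ns` nanoseconds."""
    bits = rate_gbps * delay_ns   # 1 Gbps for 1 ns is exactly 1 bit
    return bits // 8
```

Under these assumptions, one microsecond at 200 Gbps corresponds to 25,000 bytes, and one microsecond at 100 Gbps corresponds to 12,500 bytes.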

NIC 202 may include a Command Queue (CQ) unit 230. CQ unit 230 may be responsible for fetching and issuing host-side commands. CQ unit 230 may include a command queue 232 and a scheduler 234. The command queue 232 may include two separate sets of queues for initiator commands (PUT, GET, etc.) and target commands (Append, Search, etc.), respectively. The command queue 232 may be implemented as a circular buffer maintained in memory of the NIC 202. An application running on the host may write directly to the command queue 232. The scheduler 234 may include two separate schedulers for initiator commands and target commands, respectively. The initiator commands are sorted into the flow queues 236 based on a hash function. One of the flow queues 236 may be assigned to a unique flow. In addition, CQ unit 230 may further include a triggered operations module 238, which is responsible for queuing and dispatching triggered commands.
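Sorting commands into flow queues by hash can be sketched as follows. The disclosure does not say which fields are hashed or which hash function is used; this Python example assumes a byte-string flow key (for instance, built from a destination and a traffic class) and uses CRC-32 from the standard library purely as a stand-in.

```python
import zlib
from collections import deque


def flow_queue_index(flow_key: bytes, num_queues: int) -> int:
    """Map a flow key to a flow queue; equal keys always map to the same queue."""
    return zlib.crc32(flow_key) % num_queues


def enqueue(flow_queues: list, flow_key: bytes, command: object) -> int:
    """Append a command to the flow queue selected by its flow key."""
    index = flow_queue_index(flow_key, len(flow_queues))
    flow_queues[index].append(command)
    return index


# Hypothetical usage: eight flow queues, two commands of the same flow.
queues = [deque() for _ in range(8)]
first = enqueue(queues, b"dest=42;class=0", "PUT #1")
second = enqueue(queues, b"dest=42;class=0", "PUT #2")
```

Because the mapping is deterministic, commands of the same flow land in the same queue and keep their relative order.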

The outbound transport engine (OXE) 240 may pull commands from the flow queues 236 in order to process them for dispatch. OXE 240 may include an Address Translation Request Unit (ATRU) 244, which may send address translation requests to Address Translation Unit (ATU) 212. ATU 212 may provide virtual-to-physical address translation on behalf of different engines, such as OXE 240, inbound transport engine (IXE) 250, and Event Engine (EE) 216. ATU 212 may maintain a large translation cache 214. ATU 212 may perform the translation itself or may use a host-based Address Translation Service (ATS). OXE 240 may also include a message chopping unit (MCU) 246 that may fragment a large message into packets of a size corresponding to a Maximum Transmission Unit (MTU). MCU 246 may include a plurality of MCU modules. When an MCU module becomes available, the MCU module may obtain the next command from an assigned flow queue. The received data may be written into the data buffer 242. The MCU module may then send the packet header, the corresponding traffic classification, and the packet size to the traffic shaper 248. Shaper 248 may determine which requests presented by MCU 246 may proceed to the network.
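The fragmentation step performed by an MCU module can be illustrated with a few lines of Python. This sketch only splits a payload into MTU-sized pieces; the function name is hypothetical, and packet headers, which a real packet would also carry, are ignored.

```python
def chop_message(message: bytes, mtu: int) -> list:
    """Split a message into consecutive payloads of at most `mtu` bytes each."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]
```

For example, a 5000-byte message with a 2048-byte MTU yields three payloads of 2048, 2048, and 904 bytes.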

The selected packets may then be sent to a Packet and Connection Tracking (PCT) 270. The PCT 270 may store the packet in the queue 274. The PCT 270 may also maintain status information for outbound commands and update the status information when responses are returned. PCT 270 may also maintain packet status information (e.g., to allow matching of responses to requests), message status information (e.g., to track progress of multi-packet messages), initiator completion status information, and retry status information (e.g., to maintain information needed to retry a command if a request or response is lost). If no response is returned within the threshold time, the corresponding command may be stored in retry buffer 272. The PCT 270 may facilitate connection management for initiator commands and target commands based on the source table 276 and the target table 278, respectively. For example, the PCT 270 may update its source table 276 to track the status required to reliably deliver packets and message completion notifications. PCT 270 may forward the outgoing packet to HNI 220, which stores the packet in outbound queue 222.
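The timeout-driven retry bookkeeping described above might be modeled as in the following Python sketch. The class name `PacketTracker`, the request identifiers, and the explicit `now` timestamps are hypothetical; real hardware would track considerably more state (connections, message progress, completion status).

```python
class PacketTracker:
    """Tracks outstanding requests and moves timed-out commands to a retry buffer."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.outstanding = {}    # request id -> (command, time sent)
        self.retry_buffer = []   # commands awaiting retry

    def sent(self, request_id: int, command: str, now: float) -> None:
        self.outstanding[request_id] = (command, now)

    def response(self, request_id: int) -> bool:
        """Match a response to its request; returns False if no request matches."""
        return self.outstanding.pop(request_id, None) is not None

    def expire(self, now: float) -> int:
        """Move every command whose response is overdue into the retry buffer."""
        late = [rid for rid, (_, sent_at) in self.outstanding.items()
                if now - sent_at >= self.timeout]
        for rid in late:
            command, _ = self.outstanding.pop(rid)
            self.retry_buffer.append(command)
        return len(late)
```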

The NIC 202 may also include an IXE 250 that provides packet processing when the NIC 202 is the target or destination. The IXE 250 may obtain the incoming packets from the HNI 220. The parser 256 may parse incoming packets and pass corresponding packet information to a List Processing Engine (LPE) 264 or a Message State Table (MST) 266 for matching. LPE 264 may match incoming messages to buffers. LPE 264 may determine the buffer and starting address to be used for each message. LPE 264 may also manage a pool of list entries 262 used to represent buffers and unexpected messages. MST 266 may store the matching results and the information needed to generate target-side completion events. MST 266 may be used by unrestricted operations, including multi-packet PUT commands and single-packet and multi-packet GET commands.

The parser 256 may then store the packet in the packet buffer 254. The IXE 250 may obtain the matching result for conflict checking. The DMA write and AMO module 252 may then issue the updates generated by write and AMO operations to the memory. If a packet includes a command (e.g., a GET response) that generates a target-side memory read operation, the packet may be passed to OXE 240. The NIC 202 may also include an EE 216 that may receive requests to generate event notifications from other modules or units in the NIC 202. An event notification may specify that either a full event or a counting event is to be generated. EE 216 may manage event queues, located within host processor memory, to which it writes full events. EE 216 may forward counting events to CQ unit 230.

Efficient packet injection in NICs

Fig. 3A illustrates an exemplary packet injection in an output buffer in a NIC. In this example, host device 300 may include NIC 320. The host interface 322 of the NIC 320 may couple the NIC 320 with the device 300 and facilitate communication between the device 300 and the NIC 320. The NIC 320 may include an MCU 324, which may include a plurality of MCU modules 312, 314, 316, and 318. The MCU modules in the MCU 324 may inject traffic into the shared output buffer 328. Thus, each MCU module in the MCU 324 may be an injector of the buffer 328. The traffic injected by MCU 324 may belong to different buffer classes. Each buffer class may be associated with one or more MCU modules. Conversely, each MCU module may be assigned to one buffer class. Thus, multiple MCU modules may inject packets in parallel for the same buffer class.

However, since the MCU modules may share buffer 328, one MCU module may occupy a significant portion of buffer 328 due to traffic non-uniformity and randomness. In contrast, another MCU module may not be able to obtain sufficient buffer capacity. As a result, sharing the buffer 328 may leave some MCU modules under-utilized and cause other MCU modules to encounter bottlenecks. Furthermore, the buffer 328 may be shared unfairly among the MCU modules and among the buffer classes of the MCU modules.

To address this issue, the NIC 320 may be equipped with an arbiter 326 that may arbitrate among MCU modules in the MCU 324, such that the capacity of the buffer 328 is fairly allocated. The buffer 328 may be divided into a plurality of cells. The corresponding cells may have a fixed size. A corresponding MCU module (such as MCU module 312) may insert traffic into the buffer 328 at a granularity of the cell size. To ensure that the capacity of the buffer 328 is allocated fairly to MCU modules, the arbiter 326 may select MCU modules for insertion of traffic into the next available cell 330 based on one or more selection criteria. The selection criteria may allow the arbiter 326 to select MCU modules that are under-subscribed and avoid selecting MCU modules that are over-subscribed.

In addition, the arbiter 326 may also distinguish between IDCs and DMACs. Assume that the IDC 342 is allocated to the MCU module 312 and the DMAC 344 is allocated to the MCU module 318. Since the DMAC 344 may reside in the buffer 328 for a longer period of time, the arbiter 326 may use limits specific to the MCU module 318 (e.g., the maximum number of cells of the buffer 328 that may be allocated to the MCU module) to determine whether to select the MCU module 318. On the other hand, the arbiter 326 may use global limits associated with IDCs to determine whether to select the MCU module 312. In addition, the arbiter 326 may select MCU modules so as to ensure a fair allocation of buffer capacity among buffer classes.

Fig. 3B illustrates an exemplary arbitration process for injecting packets into an output buffer in a NIC. To facilitate efficient injection of packets into the buffer 328, the arbiter 326 may assign priorities to the respective MCU modules of the MCU 324 based on limits associated with the MCU modules. The arbiter 326 may determine that the MCU module 312 is associated with the IDC 342. Accordingly, the arbiter 326 may determine the high and low water levels of the MCU module 312 based on the global limits associated with IDCs. The arbiter 326 may then determine the number of cells in the buffer that are currently occupied by data from the MCU module 312. The number (or count) of occupied cells may be referred to as an Occupied Cell Count (OCC). If the OCC of the MCU module 312 is greater than or equal to the high water level, the arbiter 326 may assign a low priority to the MCU module 312. On the other hand, if the OCC of the MCU module 312 is less than or equal to the low water level, the arbiter 326 may assign a high priority to the MCU module 312.

Similarly, the arbiter 326 may assign priorities to the MCU modules 314, 316, and 318 by repeating this process. For example, the arbiter 326 may determine that the MCU module 318 is associated with the DMAC 344. Accordingly, the arbiter 326 may determine the high and low water levels of the MCU module 318 based on the limits specific to the MCU module 318. The arbiter 326 may then determine the OCC of the MCU module 318. If the OCC of the MCU module 318 is greater than or equal to the high water level, the arbiter 326 may assign a low priority to the MCU module 318. On the other hand, if the OCC of the MCU module 318 is less than or equal to the low water level, the arbiter 326 may assign a high priority to the MCU module 318.

The arbiter 326 may then perform a two-stage arbitration process 360 to select an MCU module for the next available cell 330 in the buffer 328. The arbitration process 360 may include a first-stage arbitration 362 and a second-stage arbitration 364. In arbitration 362, the arbiter 326 may select a buffer class among the buffer classes enabled in the NIC 320 (e.g., based on weighted round robin selection). In some embodiments, the NIC 320 may support N predefined buffer classes (e.g., 10 classes), each of which may correspond to a traffic shaping class associated with the traffic shaper 248 in Fig. 2B. A buffer class may be enabled when sufficient resources are available for the buffer class. Such resources may include transmission credits associated with the retry buffer 272 and the source table 276 of Fig. 2B, as well as the availability of the buffer 328 for the buffer class. The source table 276 may include one or more of the following: a Source Packet Table (SPT), a Source Message Table (SMT), and a Source Connection Table (SCT).
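The first-stage class selection can be illustrated with a short sketch. The text names weighted round robin only as an example and does not specify a variant; the code below assumes the "smooth" weighted round robin variant, and the class name `WeightedRoundRobin` and the weights in the usage note are hypothetical.

```python
class WeightedRoundRobin:
    """Smooth weighted round robin over buffer classes (one common WRR variant)."""

    def __init__(self, weights):
        # weights: buffer class id -> positive integer weight (assumed values).
        self.weights = dict(weights)
        self.current = {c: 0 for c in self.weights}

    def select(self, enabled):
        """Pick one class among the enabled ones, or None if none is enabled."""
        candidates = [c for c in self.weights if c in enabled]
        if not candidates:
            return None
        total = 0
        for c in candidates:
            # Every enabled class earns credit in proportion to its weight.
            self.current[c] += self.weights[c]
            total += self.weights[c]
        # The class with the most accumulated credit wins this round
        # (ties go to the class listed first in the weights).
        chosen = max(candidates, key=lambda c: self.current[c])
        # The winner pays back the total credit handed out this round.
        self.current[chosen] -= total
        return chosen
```

For example, with assumed weights `{352: 2, 354: 1, 356: 1}` and all three classes enabled, four consecutive calls to `select` pick class 352 twice and classes 354 and 356 once each.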

The NIC 320 may enable the buffer classes 352, 354, and 356. Buffer class 352 may include MCU modules 312 and 314; buffer class 354 may include MCU module 316; and buffer class 356 may include MCU module 318. The arbiter 326 may select the buffer class 352 by applying arbitration 362 to the buffer classes 352, 354, and 356. The arbiter 326 may then identify that MCU modules 312 and 314 are associated with buffer class 352 and apply arbitration 364 to MCU modules 312 and 314. Arbitration 364 may select an MCU module with a high priority, if one is available. Otherwise, arbitration 364 may select an MCU module with a low priority.

Accordingly, the arbiter 326 may determine whether MCU modules 312 and 314 include an MCU module having a high priority. For example, if MCU module 312 has a high priority, the arbiter 326 may select MCU module 312. On the other hand, if MCU modules 312 and 314 both have a low priority, the arbiter 326 may select one of the MCU modules 312 and 314 based on a selection policy. If MCU module 312 is selected, MCU module 312 may inject the packet associated with command 342 into cell 330. It should be noted that if command 342 is larger than the MTU (e.g., a PUT command with a large amount of data), MCU module 312 may generate multiple packets based on command 342. After injecting the packet, the MCU module 312 may obtain the next packet associated with command 342. Subsequently, the MCU module 312 may again be subject to arbitration 360.
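The note about commands larger than the MTU amounts to simple arithmetic, sketched below. This is an illustration only: the function name `packet_sizes` is hypothetical, packet headers are ignored, and the byte counts in the usage note are assumed values that do not come from the disclosure.

```python
def packet_sizes(command_bytes, mtu_bytes):
    """Split a command's payload into packet payloads of at most mtu_bytes each."""
    if command_bytes <= 0 or mtu_bytes <= 0:
        raise ValueError("sizes must be positive")
    full_packets, remainder = divmod(command_bytes, mtu_bytes)
    sizes = [mtu_bytes] * full_packets
    if remainder:
        # The last packet carries whatever does not fill a whole MTU.
        sizes.append(remainder)
    return sizes
```

Under these assumptions, a 10000-byte PUT with a 4096-byte MTU yields three packets, and the module would go through arbitration once for each of them.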

To ensure that a given priority does not remain assigned to an MCU module for a long time, the arbiter 326 may periodically perturb the priorities. For example, the priorities may be reset (e.g., periodically or when the NIC is reset). Upon reset, the respective MCU module may be assigned a high priority. In addition, the respective MCU module may be associated with a priority timer. If the timer expires, the corresponding MCU module may be assigned a low priority. Such perturbations may mitigate the effect of an MCU module maintaining a high priority for a significant period of time and adversely affecting the fairness of the arbitration 360. By ensuring fairness of arbitration 360, the NIC 320 may facilitate injecting packets into the buffer 328 in an efficient manner.

Fig. 4A shows a flow diagram of a priority assignment process for injecting packets into an output buffer in a NIC. During operation, the NIC may determine a command associated with the injector (operation 402). The NIC may then determine whether the command is a DMA command (operation 404). If the command is a DMA command, the NIC may determine the high and low water levels based on injector-specific limits (operation 416). On the other hand, if the command is not a DMA command (e.g., the command is an IDC), the NIC may determine the high and low water levels based on global limits (operation 406). The NIC may then determine whether a reset has been triggered (operation 408). If a reset has not been triggered, the NIC may also determine whether a timer associated with the injector has expired (operation 410).

If the timer has not expired, the NIC may determine whether the OCC associated with the injector is greater than or equal to the high water level (operation 412). If the OCC is not greater than or equal to the high water level, the NIC may determine whether the OCC is less than or equal to the low water level (operation 414). If a reset has been triggered (operation 408) or the OCC is less than or equal to the low water level (operation 414), the NIC may assign a high priority to the injector (operation 420). On the other hand, if the associated timer has expired (operation 410) or the OCC is greater than or equal to the high water level (operation 412), the NIC may assign a low priority to the injector (operation 418).
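The decision order of Fig. 4A can be written out as a short sketch. The function and parameter names are hypothetical, and limits are represented as assumed `(high_water, low_water)` pairs. The flow as described does not say what happens when the OCC lies strictly between the two water levels; the sketch assumes the injector simply keeps its current priority in that case.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    LOW = "low"


def water_levels(is_dma_command, injector_limits, global_limits):
    """Operations 404, 406, 416: choose which (high, low) limits apply."""
    return injector_limits if is_dma_command else global_limits


def assign_priority(occ, high_water, low_water, current,
                    reset=False, timer_expired=False):
    """Return the injector's priority following the order of checks in Fig. 4A."""
    if reset:                 # operation 408
        return Priority.HIGH  # operation 420
    if timer_expired:         # operation 410
        return Priority.LOW   # operation 418
    if occ >= high_water:     # operation 412
        return Priority.LOW   # operation 418
    if occ <= low_water:      # operation 414
        return Priority.HIGH  # operation 420
    # ASSUMPTION (not stated in the flow): between the water levels,
    # the existing priority is left unchanged.
    return current
```

For instance, with assumed water levels of 8 and 2 cells, an injector with an OCC of 8 is assigned a low priority, and one with an OCC of 2 is assigned a high priority.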

Fig. 4B shows a flow diagram of an arbitration process for injecting packets into an output buffer in a NIC. During operation, the NIC may determine the enabled buffer classes (operation 452) and select a buffer class based on a class selection policy (operation 454). The NIC may then identify the injectors associated with the selected buffer class (operation 456) and determine whether any high priority injectors exist among the identified injectors (operation 458). If there is at least one high priority injector among the identified injectors, the NIC may select an injector from the high priority injectors based on an injector selection policy (operation 460). Otherwise, the NIC may select an injector from the low priority injectors based on an injector selection policy (operation 462).
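The two-stage selection of Fig. 4B can be sketched as follows. The function name `arbitrate` and its parameters are hypothetical. The class selection policy and the injector selection policy are passed in as callables because the flow leaves them open; the usage note assumes a trivial "pick the first" policy for both.

```python
def arbitrate(enabled_classes, class_members, priorities,
              pick_class, pick_injector):
    """Select the injector for the next available cell, or None if no class is enabled.

    enabled_classes: list of enabled buffer class ids (operation 452)
    class_members:   buffer class id -> list of injector ids
    priorities:      injector id -> "high" or "low"
    pick_class, pick_injector: selection policies (assumed callables)
    """
    if not enabled_classes:
        return None
    chosen_class = pick_class(enabled_classes)            # operation 454
    injectors = class_members[chosen_class]               # operation 456
    high = [i for i in injectors if priorities[i] == "high"]  # operation 458
    if high:
        return pick_injector(high)                        # operation 460
    # No high priority injector: every candidate has a low priority.
    return pick_injector(injectors)                       # operation 462
```

Using the grouping of Fig. 3B (class 352 with modules 312 and 314, class 354 with module 316, class 356 with module 318) and a first-element policy for both stages, the call returns 314 when module 312 has a low priority and module 314 has a high priority, and 312 when both have a low priority.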

Exemplary Computer System

Fig. 5 illustrates an exemplary computer system equipped with a NIC that facilitates efficient packet forwarding. The computer system 550 includes a processor 552, a memory device 554, and a storage device 556. The memory device 554 may include a volatile memory device (e.g., a dual in-line memory module (DIMM)). Further, the computer system 550 may be coupled to a keyboard 562, a pointing device 564, and a display device 566. The storage device 556 may store an operating system 570. Application programs 572 may run on the operating system 570.

Computer system 550 may be equipped with a host interface coupling a NIC 520 that facilitates efficient data request management. The NIC 520 may provide one or more HNIs to the computer system 550. The NIC 520 may be coupled to a switch 502 via one of the HNIs. The NIC 520 may include an arbiter logic block 530, as described in connection with Figs. 3A and 3B. The arbiter logic block 530 may include a tracking logic block 532, a priority logic block 534, and a selection logic block 536.

The tracking logic block 532 may track the OCC of the respective injector (e.g., MCU module) of the NIC 520. The priority logic block 534 may determine the high and low water levels for the respective injector based on the command type associated with the injector. The priority logic block 534 may then determine and assign priorities to the respective injectors, as described in connection with Fig. 4A. The selection logic block 536 may select an injector for injecting packets into the shared output buffer based on the priority and the buffer class of the respective injector, as described in connection with Fig. 4B.

In summary, the present disclosure describes a NIC that facilitates efficient injection of packets into an output buffer. The NIC may be equipped with an output buffer, a plurality of injectors, a prioritization logic block, and a selection logic block. The plurality of injectors may share the output buffer. The prioritization logic may determine a priority associated with a respective injector based on the high and low water levels associated with the injector. The selection logic block may then determine a subset of injectors from the plurality of injectors that are associated with a buffer class and determine whether the subset of injectors includes a high priority injector. Upon identifying a high priority injector in the subset of injectors, the selection logic block may select the high priority injector for injecting packets into the output buffer.

The methods and processes described above may be performed by hardware logic blocks, modules, or devices. A hardware logic block, module, or device may include, but is not limited to, an Application Specific Integrated Circuit (ASIC) chip, a Field Programmable Gate Array (FPGA), a dedicated or shared processor that executes code at a particular time, and other programmable logic devices now known or later developed. When activated, a hardware logic block, module, or device performs the methods and processes included therein.

The methods and processes described herein may also be embodied as code or data, which may be stored in a storage device or computer-readable storage medium. The methods and processes may be performed by a processor when the stored code or data is read and executed by the processor.

The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. The description is not intended to be exhaustive or to limit the invention to the precise form disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the invention is defined by the appended claims.
