System and method for facilitating efficient packet forwarding in a Network Interface Controller (NIC)

Document No.: 1895346. Publication date: 2021-11-26.

Reading note: This technology, "System and method for facilitating efficient packet forwarding in a Network Interface Controller (NIC)," was designed by R·L·阿尔弗森, P·昆都, D·罗威斯, D·C·休森, and A·成 on 2020-03-23. Main content: A Network Interface Controller (NIC) capable of efficient packet forwarding is provided. The NIC may be equipped with a host interface, a packet generation logic block, and a forwarding logic block. During operation, the packet generation logic block may obtain a message from the host device for a remote device via the host interface. The packet generation logic block may generate a plurality of packets for the remote device from the message. The forwarding logic block may then send a first subset of packets of the plurality of packets based on in-order delivery. If a first condition is satisfied, the forwarding logic block may send a second subset of packets of the plurality of packets based on out-of-order delivery. Further, if a second condition is satisfied, the forwarding logic block may send a third subset of packets of the plurality of packets based on in-order delivery.

1. A Network Interface Controller (NIC), comprising:

a host interface;

a packet generation logic block coupled to the host interface and configured to:

obtain a message from a host device for a remote device via the host interface; and

generate a plurality of packets destined for the remote device from the message; and

a forwarding logic block configured to:

transmit a first subset of packets of the plurality of packets based on in-order delivery;

transmit, in response to a first condition, a second subset of packets of the plurality of packets based on out-of-order delivery; and

transmit, in response to a second condition, a third subset of packets of the plurality of packets based on in-order delivery.

2. The network interface controller of claim 1, wherein the packet generation logic is further to determine that the size of the message is greater than a first threshold.

3. The network interface controller of claim 1, wherein triggering the first condition comprises receiving a response from the remote device to one of the first subset of packets.

4. The network interface controller of claim 1, wherein triggering the second condition comprises determining that a number of packets in the third subset of packets is less than a second threshold.

5. The network interface controller of claim 4, wherein the second threshold indicates a number of outstanding packets in the first subset of packets and the second subset of packets.

6. The network interface controller of claim 1, wherein the forwarding logic block is further configured to:

identify a last packet in the third subset of packets; and

refrain from sending the last packet until a respective response has been received for all packets in the second subset of packets.

7. The network interface controller of claim 1, wherein the first, second, and third packet subsets are transmitted in a non-overlapping order.

8. The network interface controller of claim 1, wherein the forwarding logic block is further configured to:

maintain a first counter indicating a number of outstanding packets in the first subset of packets and the third subset of packets; and

maintain a second counter indicating a number of outstanding packets in the second subset of packets.

9. The network interface controller of claim 1, wherein the forwarding logic block is further to set a flag in respective packets in the first subset of packets and the third subset of packets, wherein the flag indicates that packets need to be transmitted in order.

10. The network interface controller of claim 1, wherein the message corresponds to a Remote Direct Memory Access (RDMA) command.

11. A method, comprising:

obtaining a message from a host device for a remote device via a host interface;

generating a plurality of packets destined for the remote device from the message;

transmitting a first subset of packets of the plurality of packets based on in-order delivery;

transmitting, in response to a first condition, a second subset of packets of the plurality of packets based on out-of-order delivery; and

transmitting, in response to a second condition, a third subset of packets of the plurality of packets based on in-order delivery.

12. The method of claim 11, further comprising determining that the size of the message is greater than a first threshold.

13. The method of claim 11, wherein triggering the first condition comprises receiving a response from the remote device to one of the first subset of packets.

14. The method of claim 11, wherein triggering the second condition comprises determining that a number of packets in the third subset of packets is less than a second threshold.

15. The method of claim 14, wherein the second threshold indicates a number of outstanding packets in the first subset of packets and the second subset of packets.

16. The method of claim 11, further comprising:

identifying a last packet in the third subset of packets; and

not sending the last packet until receiving a respective response to all packets in the second subset of packets.

17. The method of claim 11, wherein the first, second, and third packet subsets are transmitted in a non-overlapping order.

18. The method of claim 11, further comprising:

maintaining a first counter indicating a number of outstanding packets in the first subset of packets and the third subset of packets; and

maintaining a second counter indicating a number of outstanding packets in the second subset of packets.

19. The method of claim 11, further comprising setting a flag in respective packets in the first subset of packets and the third subset of packets, wherein the flag indicates that packets need to be transmitted in order.

20. The method of claim 11, wherein the message corresponds to a Remote Direct Memory Access (RDMA) command.

Technical Field

The present disclosure relates generally to the field of networking technology. More particularly, the present disclosure relates to systems and methods for facilitating efficient packet forwarding in a Network Interface Controller (NIC).

Prior Art

As network-enabled devices and applications become more prevalent, various types of traffic and increasing network loads continue to demand higher performance from the underlying network architecture. For example, applications such as High Performance Computing (HPC), streaming media, and the Internet of Things (IoT) may produce different types of traffic with distinct characteristics. Thus, in addition to traditional network performance metrics such as bandwidth and latency, network architects continue to face challenges such as scalability, versatility, and efficiency.

Disclosure of Invention

A Network Interface Controller (NIC) capable of efficient packet forwarding is provided. The NIC may be equipped with a host interface, a packet generation logic block, and a forwarding logic block. The host interface may couple the NIC to a host device. During operation, the packet generation logic block may obtain a message from the host device for a remote device via the host interface. The packet generation logic block may generate a plurality of packets for the remote device from the message. The forwarding logic block may then send a first subset of packets of the plurality of packets based on in-order delivery. If a first condition is satisfied, the forwarding logic block may send a second subset of packets of the plurality of packets based on out-of-order delivery. Further, if a second condition is satisfied, the forwarding logic block may send a third subset of packets of the plurality of packets based on in-order delivery.

Drawings

Fig. 1 illustrates an exemplary network.

Fig. 2A shows an exemplary NIC chip having multiple NICs.

Fig. 2B shows an exemplary architecture of a NIC.

Fig. 3A illustrates an exemplary switch to out-of-order packet forwarding in a NIC.

Fig. 3B illustrates an exemplary switch to in-order packet forwarding in a NIC.

Fig. 4A shows a flow diagram of a message selection process for in-out-in (IOI) packet forwarding, i.e., in-order, then out-of-order, then in-order forwarding, in a NIC.

Fig. 4B shows a flow diagram of an IOI packet forwarding process in a NIC.

Fig. 4C shows a flow diagram of the IOI packet forwarding process for the last packet in the NIC.

Fig. 5 illustrates an exemplary computer system equipped with a NIC that facilitates efficient packet forwarding.

In the drawings, like reference numerals refer to like elements.

Detailed Description

Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. The invention is thus not limited to the embodiments shown.

Overview

The present disclosure describes systems and methods that facilitate efficient packet forwarding in a Network Interface Controller (NIC). The NIC allows the host to communicate with a data-driven network. The network can adapt to dynamic data traffic by maintaining state information for individual packet flows, which enables fast, efficient congestion control. More specifically, packets injected into the switch network may be classified into flows, which may be mapped to their layer 2, layer 3, or other protocol-specific header information. Each flow may be tagged with a distinct identifier local to the input port of the switch and provided with a flow-specific input buffer so that each flow can be individually flow-controlled. In addition, packets of a respective flow may be acknowledged upon reaching an egress point of the network, and the acknowledgment packets may be sent back to the ingress point of the flow in the opposite direction along the same data path. Thus, each switch can obtain state information for the active packet flows it is forwarding and can perform highly responsive, flow-specific flow control. Such flow control may allow the network to operate at higher capacity while providing general traffic engineering capabilities.

Embodiments described herein address the problem of efficiently forwarding ordered packet flows by: (i) forwarding an initial group of packets and a final group of packets in the flow in order; and (ii) switching to out-of-order forwarding for the remaining packets in the flow. In this manner, the NIC may facilitate the in-order transfer of the first and last packets in the flow, and the out-of-order transfer of the intermediate packets in the flow.

During operation, an application running on the source device of a NIC may issue a message indicating a data operation (e.g., a Remote Direct Memory Access (RDMA) "GET" or "PUT" command) on a memory location of a remote target device. The NICs of the source device and the target device may be referred to as the source (or originating) NIC and the destination NIC, respectively. The operations may be idempotent operations or non-idempotent operations. Idempotent operations can be performed more than once without causing errors. A non-idempotent operation, on the other hand, may be performed only once; performing it more than once may result in errors. In general, if an idempotent RDMA operation is not completed, the target device's software (e.g., operating system) may replay the operation, rather than the target NIC performing the operation.

The message indicating the operation may be a large message that is transmitted via multiple packets. Message semantics may require that the packets be delivered in order. For example, memory access related messages may require in-order packet delivery. However, in-order delivery may generate a large amount of overhead (such as transmission over a predetermined path, strict enforcement of in-order packet transmission, and dropping of out-of-order packets), which may result in inefficient data forwarding. Consequently, the in-order delivery of large messages can adversely affect performance.

To address this issue, the originating NIC may use both in-order and out-of-order packet delivery for a message to improve performance while maintaining ordering at message boundaries. In particular, if the message is for an idempotent operation, the NIC may send some of the packets based on out-of-order delivery. During operation, the originating NIC may receive a message that is larger than the Maximum Transmission Unit (MTU). Thus, the originating NIC may generate multiple packets from the message based on the MTU. Since each of these packets may include a portion of the message in its payload, the packets may be referred to as a packet stream. The originating NIC may then determine whether the size of the message (or of the packets in the packet stream) is greater than a size threshold. In some embodiments, the size threshold may correspond to a message size whose transfer time is greater than twice the Round Trip Time (RTT) between the originating NIC and the destination NIC. The NIC may dynamically determine the threshold based on the RTT and the effective bandwidth. When the first response is returned to the NIC, the NIC may measure the RTT and the effective bandwidth based on the number and size of outstanding in-order packets.
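As a rough illustration of the size threshold described above, the following Python sketch computes the message size whose transfer time equals twice the RTT and compares a message against it. The function names, the integer nanosecond and bits-per-second units, and the bandwidth estimate (outstanding bytes divided by RTT) are assumptions made for this example; the disclosure does not specify a formula.

```python
NS_PER_S = 10**9


def effective_bandwidth_bps(outstanding_bytes: int, rtt_ns: int) -> int:
    """Assumed estimate: bytes in flight during one RTT, expressed in bits per second."""
    return outstanding_bytes * 8 * NS_PER_S // rtt_ns


def ioi_size_threshold_bytes(rtt_ns: int, bandwidth_bps: int, factor: int = 2) -> int:
    """Message size (bytes) whose transfer time equals factor * RTT."""
    return factor * rtt_ns * bandwidth_bps // (8 * NS_PER_S)


def qualifies_for_ioi(message_bytes: int, rtt_ns: int, bandwidth_bps: int) -> bool:
    """True if the message is larger than the size threshold."""
    return message_bytes > ioi_size_threshold_bytes(rtt_ns, bandwidth_bps)


# Hypothetical numbers: 2 microsecond RTT on a 200 Gbps link.
threshold = ioi_size_threshold_bytes(2000, 200 * 10**9)
print(threshold)  # 100000
```

With these hypothetical numbers, only messages larger than 100,000 bytes would be selected for mixed in-order and out-of-order forwarding; smaller messages would be sent entirely in order.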

If the message size is greater than the size threshold, the originating NIC may initiate an in-out-in (IOI) packet transfer, i.e., in-order, then out-of-order, then in-order, for the packet stream. To facilitate IOI packet transmission, the originating NIC may forward an initial set of packets for in-order delivery. Each of these packets may include a sequence number and an indicator indicating in-order delivery. The destination NIC may receive one or more packets and issue a corresponding response. Since these packets are ordered packets, the response may also be a cumulative response. However, since the responses may not arrive in order, the originating NIC may receive any one of the responses first. Based on the first received response, the originating NIC may determine that the destination NIC has successfully received all packets up to the sequence number of the response.

The originating NIC may then switch to out-of-order delivery for subsequent packets. The originating NIC may switch back to in-order delivery when the number of remaining packets (i.e., the end-of-message packets) becomes less than a switch threshold. In some embodiments, the switch threshold may indicate a number of outstanding packets. An outstanding packet is a packet for which the originating NIC has not yet received a response. To further ensure that the last packet is delivered in order, the originating NIC may not send the last packet in the packet stream until it has received a response to all out-of-order packets. In this manner, the originating NIC may use IOI packet transfer, which combines in-order and out-of-order packet delivery, thereby facilitating efficient packet forwarding for large messages.
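The phase transitions described above can be sketched as a small state machine. This is a minimal Python illustration under stated assumptions: the phase names, the function signatures, and the idea of passing the switch threshold in as a parameter are hypothetical and are not taken from the disclosure.

```python
from enum import Enum


class Phase(Enum):
    HEAD = "in-order (start of message)"
    MIDDLE = "out-of-order"
    TAIL = "in-order (end of message)"


def next_phase(phase: Phase, got_response: bool, remaining: int,
               switch_threshold: int) -> Phase:
    """Advance the forwarding phase for one message.

    got_response: a response to any in-order head packet has arrived.
    remaining: number of packets of the message not yet sent.
    """
    if phase is Phase.HEAD and got_response:
        phase = Phase.MIDDLE
    if phase is Phase.MIDDLE and remaining < switch_threshold:
        phase = Phase.TAIL
    return phase


def may_send_next(remaining: int, unordered_outstanding: int) -> bool:
    """Hold back only the last packet while out-of-order packets are unacknowledged."""
    is_last_packet = remaining == 1
    return not is_last_packet or unordered_outstanding == 0
```

In this sketch the tail phase is terminal for a message, which mirrors the text: once the sender has switched back to in-order delivery, it stays there until the last packet has been sent.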

One embodiment of the present invention provides a NIC that may be equipped with a host interface, a packet generation logic block, and a forwarding logic block. The host interface may couple the NIC to a host device. During operation, the packet generation logic block may obtain a message from the host device for a remote device via the host interface. The packet generation logic block may generate a plurality of packets for the remote device from the message. The forwarding logic block may then send a first subset of packets of the plurality of packets based on in-order delivery. If a first condition is satisfied, the forwarding logic block may send a second subset of packets of the plurality of packets based on out-of-order delivery. Further, if a second condition is satisfied, the forwarding logic block may send a third subset of packets of the plurality of packets based on in-order delivery.

In a variation of this embodiment, the packet generation logic may determine that the size of the message is greater than a first threshold.

In a variation of this embodiment, triggering the first condition may include receiving a response to one of the first subset of packets from the remote device.

In a variation of this embodiment, triggering the second condition may include determining that a number of packets in the third subset of packets is less than a second threshold.

In a further variant, the second threshold indicates a number of outstanding packets in the first subset of packets and the second subset of packets.

In a variant of this embodiment, the forwarding logic block may identify the last packet in the third subset of packets and not send the last packet until a respective response is received for all packets in the second subset of packets.

In a variation of this embodiment, the first, second and third packet subsets may be transmitted in a non-overlapping order.

In a variation of this embodiment, the forwarding logic block may maintain a first counter indicating a number of outstanding packets in the first subset of packets and the third subset of packets. The forwarding logic block may also maintain a second counter indicating a number of outstanding packets in the second subset of packets.

In a variation of this embodiment, the forwarding logic may set a flag in respective packets in the first subset of packets and the third subset of packets. The flag may indicate that packets need to be transmitted in order.

In a variation of this embodiment, the message corresponds to an RDMA command.

In this disclosure, the description in connection with fig. 1 is related to network architecture, and the description in connection with fig. 2A and beyond provides more details regarding the architecture and operation associated with NICs that support efficient management of idempotent operations.

Fig. 1 illustrates an exemplary network. In this example, switch network 100 (which may also be referred to as a "switch fabric") may include switches 102, 104, 106, 108, and 110. Each switch may have a unique address or ID within the switch fabric 100. Various types of devices and networks may be coupled to the switch fabric. For example, storage array 112 may be coupled to switch fabric 100 via switch 110; InfiniBand (IB)-based HPC network 114 may be coupled to switch fabric 100 via switch 108; a plurality of end hosts, such as host 116, may be coupled to the switch fabric 100 via the switch 104; and the IP/Ethernet network 118 may be coupled to the switch fabric 100 via the switch 102. In general, a switch may have edge ports and fabric ports. An edge port may be coupled to a device external to the fabric. A fabric port may be coupled to another switch within the fabric via a fabric link. In general, traffic may be injected into the switch fabric 100 via an ingress port of an edge switch and exit the switch fabric 100 via an egress port of another (or the same) edge switch. An ingress link may couple a NIC of an edge device (e.g., an HPC end host) to an ingress edge port of an edge switch. The switch fabric 100 may then transmit the traffic to an egress edge switch, which in turn may transmit the traffic to a destination edge device via another NIC.

Exemplary NIC architecture

Fig. 2A shows an exemplary NIC chip having multiple NICs. Referring to the example in fig. 1, NIC chip 200 may be a custom Application Specific Integrated Circuit (ASIC) designed for host 116 to work with switch fabric 100. In this example, chip 200 may provide two separate NICs 202 and 204. Each NIC of chip 200 may be equipped with a Host Interface (HI) (e.g., an interface for connecting to a host processor) and a high-speed network interface (HNI) for communicating with links coupled to switch fabric 100 of fig. 1. For example, the NIC 202 may include the HI 210 and the HNI 220, and the NIC 204 may include the HI 211 and the HNI 221.

In some embodiments, the HI 210 may be a Peripheral Component Interconnect (PCI) interface or a Peripheral Component Interconnect Express (PCIe) interface. The HI 210 may be coupled to a host via a host connection 201, which may include N (e.g., N may be 16 in some chips) PCIe Gen 4 lanes capable of operating at signaling rates up to 25 Gbps per lane. The HNI 220 may facilitate a high-speed network connection 203 that may communicate with links in the switch fabric 100 of fig. 1. The HNI 220 may operate at a total rate of 100 Gbps or 200 Gbps using M (e.g., M may be 4 in some chips) full-duplex serial lanes. Each of the M lanes may operate at 25 Gbps or 50 Gbps based on non-return-to-zero (NRZ) modulation or 4-level pulse amplitude modulation (PAM4), respectively. The HNI 220 may support Institute of Electrical and Electronics Engineers (IEEE) 802.3 Ethernet-based protocols, as well as an enhanced frame format that provides support for higher rates of small messages.
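The relationship between lane count, modulation, and total HNI rate stated above is simple arithmetic, shown here as a short Python check. The table and function names are illustrative only; the per-lane rates are the ones given in the text.

```python
# Per-lane rates in Gbps as stated in the description.
LANE_RATE_GBPS = {"NRZ": 25, "PAM4": 50}


def hni_total_rate_gbps(lanes: int, modulation: str) -> int:
    """Total network-side rate for a given lane count and modulation."""
    return lanes * LANE_RATE_GBPS[modulation]


print(hni_total_rate_gbps(4, "NRZ"))   # 100
print(hni_total_rate_gbps(4, "PAM4"))  # 200
```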

The NIC 202 may support one or more of the following: Message Passing Interface (MPI) based point-to-point messaging, Remote Memory Access (RMA) operations, offloading and scheduling of bulk data collective operations, and Ethernet packet processing. When the host issues an MPI message, the NIC 202 may match the corresponding message type. Further, the NIC 202 may implement both the eager protocol and the rendezvous protocol for MPI, thereby offloading the corresponding operations from the host.

Further, RMA operations supported by the NIC 202 may include PUT, GET, and Atomic Memory Operations (AMO). The NIC 202 may provide reliable transmissions. For example, if the NIC 202 is an originating NIC, the NIC 202 may provide a retry mechanism for idempotent operations. Furthermore, connection-based error detection and retry mechanisms may be used for ordered operations that may manipulate the target state. The hardware of NIC 202 may maintain the state required for the retry mechanism. In this way, the NIC 202 may relieve the burden on the host (e.g., software). The policy for deciding the retry mechanism may be specified by the host through driver software to ensure flexibility of the NIC 202.

In addition, the NIC 202 may facilitate triggered operations, a general-purpose offload mechanism, and the progression of dependent sequences of operations (such as bulk data collectives). NIC 202 may support an Application Programming Interface (API), such as a libfabric API, to facilitate the provision of fabric communication services by switch fabric 100 of fig. 1 to applications running on host 116. The NIC 202 may also support a low-level network programming interface, such as a Portals API. Additionally, the NIC 202 may provide efficient Ethernet packet processing, which may include efficient transmission when the NIC 202 is the sender, flow steering when the NIC 202 is the target, and checksum calculation. Further, the NIC 202 may support virtualization (e.g., using containers or virtual machines).

Fig. 2B shows an exemplary architecture of a NIC. In the NIC 202, the port macro of the HNI 220 may facilitate low-level Ethernet operations, such as Physical Coding Sublayer (PCS) and Media Access Control (MAC). Additionally, the NIC 202 may provide support for Link Layer Retry (LLR). Incoming packets may be parsed by the parser 228 and stored in the buffer 229. The buffer 229 may be a priority flow control (PFC) buffer provisioned to buffer a threshold amount (e.g., one microsecond) of delay bandwidth. HNI 220 may also include a control transmit unit 224 and a control receive unit 226 for managing outgoing and incoming packets, respectively.

NIC 202 may include a Command Queue (CQ) unit 230. CQ unit 230 may be responsible for fetching and issuing host-side commands. CQ unit 230 may include a command queue 232 and a scheduler 234. The command queue 232 may include two separate sets of queues for initiator commands (PUT, GET, etc.) and target commands (Append, Search, etc.), respectively. The command queue 232 may be implemented as a circular buffer maintained in the memory of the NIC 202. An application running on the host may write directly to the command queue 232. The scheduler 234 may include two separate schedulers for initiator commands and target commands, respectively. The initiator commands may be sorted into the flow queues 236 based on a hash function. One of the flow queues 236 may be assigned to a unique flow. In addition, CQ unit 230 may further include a triggered-action module 238, which is responsible for queuing and dispatching triggered commands.
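The hash-based sorting of initiator commands into flow queues can be pictured with the following Python sketch. The choice of CRC32 as the hash, the byte-string flow key, and the queue count are assumptions for illustration; the description only states that a hash function is used.

```python
import zlib


def flow_queue_index(flow_key: bytes, num_flow_queues: int) -> int:
    """Map a flow key to a flow queue; the same key always lands in the same queue."""
    if num_flow_queues <= 0:
        raise ValueError("num_flow_queues must be positive")
    return zlib.crc32(flow_key) % num_flow_queues


# Hypothetical flow key built from a destination and a traffic class.
index = flow_queue_index(b"dst=7;tc=1", 16)
```

Because the mapping is deterministic, all commands of one flow stay in one queue, which preserves their relative order within that queue.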

The outbound transport engine (OXE) 240 may pull commands from the flow queues 236 to process them for dispatch. OXE 240 may include an Address Translation Request Unit (ATRU) 244, which may send address translation requests to the Address Translation Unit (ATU) 212. ATU 212 may provide virtual-to-physical address translation on behalf of different engines, such as OXE 240, the inbound transport engine (IXE) 250, and the Event Engine (EE) 216. ATU 212 may maintain a large translation cache 214. ATU 212 may perform the translation itself or may use a host-based Address Translation Service (ATS). OXE 240 may also include a message chopping unit (MCU) 246 that may chop a large message into packets of a size corresponding to the Maximum Transmission Unit (MTU). MCU 246 may include a plurality of MCU modules. When an MCU module becomes available, it can obtain the next command from its assigned flow queue. The received data may be written into the data buffer 242. The MCU module may then send the packet header, the corresponding traffic classification, and the packet size to the traffic shaper 248. Shaper 248 may determine which of the requests made by MCU 246 may enter the network.
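The splitting of a large message into MTU-sized packets can be sketched as follows. This Python example is only an illustration: the function name, the use of a per-message sequence number starting at 0, and the treatment of the MTU as a payload size (ignoring headers) are assumptions.

```python
from typing import List, Tuple


def chop_message(message: bytes, mtu_payload: int) -> List[Tuple[int, bytes]]:
    """Split a message into (sequence number, payload) pairs of at most mtu_payload bytes."""
    if mtu_payload <= 0:
        raise ValueError("mtu_payload must be positive")
    return [
        (seq, message[offset:offset + mtu_payload])
        for seq, offset in enumerate(range(0, len(message), mtu_payload))
    ]


packets = chop_message(b"abcdefghij", 4)
print(packets)  # [(0, b'abcd'), (1, b'efgh'), (2, b'ij')]
```

Concatenating the payloads in sequence-number order reproduces the original message, which is what allows the receiver to place out-of-order packets correctly.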

The selected packets may then be sent to a Packet and Connection Tracking (PCT) 270. The PCT 270 may store the packet in the queue 274. The PCT 270 may also maintain status information for outbound commands and update the status information when responses are returned. PCT 270 may also maintain packet status information (e.g., to allow matching of responses to requests), message status information (e.g., to track progress of multi-packet messages), initiator completion status information, and retry status information (e.g., to maintain information needed to retry a command if a request or response is lost). If no response is returned within the threshold time, the corresponding command may be stored in retry buffer 272. The PCT 270 may facilitate connection management for initiator commands and target commands based on the source table 276 and the target table 278, respectively. For example, the PCT 270 may update its source table 276 to track the status required to reliably deliver packets and message completion notifications. PCT 270 may forward the outgoing packet to HNI 220, which stores the packet in outbound queue 222.
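The response matching and timeout behavior described above can be illustrated with a small Python sketch. The class and method names, the use of sequence numbers as keys, and the explicit `now` argument are assumptions for this example and do not describe the actual PCT 270 hardware.

```python
from typing import Dict, List


class PacketTracker:
    """Tracks outstanding packets and moves timed-out ones to a retry buffer."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.sent_at: Dict[int, float] = {}   # sequence number -> send time
        self.retry_buffer: List[int] = []     # sequence numbers awaiting retry

    def on_send(self, seq: int, now: float) -> None:
        self.sent_at[seq] = now

    def on_response(self, seq: int) -> bool:
        """Match a response to its request; returns False if nothing was outstanding."""
        return self.sent_at.pop(seq, None) is not None

    def expire(self, now: float) -> List[int]:
        """Move packets with no response within the timeout into the retry buffer."""
        late = sorted(s for s, t in self.sent_at.items() if now - t >= self.timeout_s)
        for seq in late:
            del self.sent_at[seq]
            self.retry_buffer.append(seq)
        return late
```

Passing the current time in explicitly keeps the sketch deterministic; a real implementation would rely on a hardware timer.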

The NIC 202 may also include an IXE 250 that provides packet processing when the NIC 202 is the target or destination. The IXE 250 may obtain the incoming packets from the HNI 220. The parser 256 may parse incoming packets and pass the corresponding packet information to a List Processing Engine (LPE) 264 or a Message State Table (MST) 266 for matching. LPE 264 may match incoming messages to buffers. LPE 264 may determine the buffer and starting address to be used for each message. LPE 264 may also manage a pool of list entries 262 used to represent buffers and unexpected messages. MST 266 may store the matching results and the information needed to generate target-side completion events. MST 266 may be used by unrestricted operations, including multi-packet PUT commands and single-packet and multi-packet GET commands.

The parser 256 may then store the packet in the packet buffer 254. The IXE 250 may obtain the matching result for conflict checking. The DMA write and AMO module 252 may then issue the updates generated by the write and AMO operations to the memory. If a packet includes a command (e.g., a GET response) that generates a target-side memory read operation, the packet may be passed to OXE 240. The NIC 202 may also include an EE 216 that may receive requests to generate event notifications from other modules or units in the NIC 202. An event notification may specify that either a full event or a counting event is to be generated. EE 216 may manage event queues located within host processor memory, to which it writes full events. EE 216 may forward counting events to CQ unit 230.

Efficient packet forwarding in NICs

Fig. 3A illustrates an exemplary switch to out-of-order packet forwarding in a NIC. In this example, devices 302 and 304 may be coupled to each other via a switch fabric 310. Devices 302 and 304 may be equipped with NICs 320 and 330, respectively. HNIs 322 and 332 may couple NICs 320 and 330, respectively, to switch fabric 310. NIC 320 may be equipped with PCT 324 and OXE 326, and NIC 330 may be equipped with PCT 334 and IXE 336.

During operation, an application running on the device 302 may issue a message 340 that may indicate that a data operation (e.g., an RDMA operation) is to be performed on the device 304. The size of message 340 may be larger than the MTU. Thus, the NIC 320 may generate a plurality of packets 370 from the message 340 based on the MTU to send the message 340 over the switch fabric 310. The semantics of message 340 may require the packets 370 to be delivered in order. For example, if message 340 relates to a DMA operation, then the packets 370 may need to be transferred in order. However, the in-order delivery of the packets 370 may incur a significant amount of overhead, such as transmitting through a predetermined path in the switch fabric 310, strictly enforcing in-order packet transmission from the NIC 320, and dropping out-of-order packets at the NIC 330. Thus, the in-order delivery of the packets 370 may adversely affect the performance of the data transmission.

To address this issue, the NIC 320 may use both in-order delivery and out-of-order delivery for the packets 370 to improve performance while maintaining ordering at the boundaries of message 340. Since each of the packets 370 may include a portion of the message 340 in its respective payload, the packets 370 may also be referred to as a packet flow 370. The NIC 320 may then determine whether the size of the message 340 (or of the packets in the packet flow 370) is greater than a size threshold. The size threshold may correspond to a message size whose transmission time is greater than twice the RTT via the switch fabric 310. The NIC 320 may dynamically adjust the size threshold based on the bandwidth and response latency achieved by concurrent messages forwarded via the HNI 322.

If the message size is greater than the size threshold, the OXE 326 of the NIC 320 may initiate an IOI packet transfer for the packet flow 370. To facilitate IOI packet transmission, OXE 326 may forward an initial set of packets 342 and 344 for in-order delivery. Each of these packets may include a sequence number associated with message 340 and an indicator indicating that in-order delivery is to be performed. The header of each of the packets 342 and 344 may include a Differentiated Services Code Point (DSCP) value that may indicate in-order delivery.

For example, the NIC 320 may set a flag in the header to indicate that the NIC 330 should check the sequence numbers in the headers of the packets 342 and 344 to order them. Accordingly, when the NIC 330 receives the packets 342 and 344, the NIC 330 may check the respective sequence numbers of the packets 342 and 344 and process them in sequence. The NIC 330 may also issue a corresponding response. Since these packets are ordered packets, the response may also be a cumulative response. However, since the responses may not be in order, the NIC 320 may receive any of the responses issued by the NIC 330.

Assume that the NIC 320 receives a response 350, which may be a response to the packet 344. Based on the sequence number of response 350, NIC 320 may determine that all packets up to packet 344 (i.e., packets 342 and 344) have been received by NIC 330. Upon receiving the response 350, the PCT 324 may notify the OXE 326 that all packets up to packet 344 have been received by the NIC 330. Thus, OXE 326 may switch to out-of-order delivery for subsequent packets 346 and 348.

In some embodiments, OXE 326 may maintain respective counters for ordered and unordered packets. For example, the OXE 326 may increment an Ordered Packet Counter (OPC) when sending each of the packets 342 and 344 and decrement the OPC when receiving notification from the PCT 324 about the response 350. Since response 350 may acknowledge both packets 342 and 344, response 350 may cause OXE 326 to decrement the OPC twice. OXE 326 may also maintain an Unordered Packet Counter (UPC). OXE 326 may increment the UPC when sending each of packets 346 and 348. Based on the OPC and the UPC, OXE 326 may track the number of outstanding ordered and unordered packets, respectively.
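The counter bookkeeping above can be sketched as follows. The class and method names are hypothetical, and the sketch assumes that the initial ordered packets carry contiguous sequence numbers starting at 0, so that a cumulative response acknowledges every ordered packet up to the sequence number it carries.

```python
class IoiCounters:
    """Tracks outstanding ordered (OPC) and unordered (UPC) packets."""

    def __init__(self) -> None:
        self.opc = 0
        self.upc = 0
        self._highest_acked = -1  # highest ordered sequence number acknowledged

    def on_send(self, ordered: bool) -> None:
        """Count a packet as outstanding when it is sent."""
        if ordered:
            self.opc += 1
        else:
            self.upc += 1

    def on_cumulative_response(self, acked_seq: int) -> None:
        """Decrement OPC once per ordered packet newly covered by the response."""
        newly_acked = max(0, acked_seq - self._highest_acked)
        self.opc -= newly_acked
        self._highest_acked = max(self._highest_acked, acked_seq)

    def on_unordered_response(self) -> None:
        """Each unordered packet is acknowledged individually."""
        self.upc -= 1


counters = IoiCounters()
counters.on_send(ordered=True)                # packet 342, assumed seq 0
counters.on_send(ordered=True)                # packet 344, assumed seq 1
counters.on_cumulative_response(acked_seq=1)  # response 350 covers both packets
opc_after_response = counters.opc
counters.on_send(ordered=False)               # packet 346
counters.on_send(ordered=False)               # packet 348
```

After response 350 the OPC returns to zero, and the two unordered sends leave the UPC at 2.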

The NIC 320 may switch back to in-order delivery when the number of remaining packets becomes less than a switch threshold. Fig. 3B illustrates an exemplary switch to in-order packet forwarding in a NIC. If the OXE 326 has sent out-of-order packets 352, 354, and 356, and these packets are still outstanding, the value of the UPC may be 3. On the other hand, OXE 326 may determine that packets 358 and 360 of message 340 remain to be sent. Therefore, the number of remaining packets may be 2. As each out-of-order packet is sent, OXE 326 may determine whether the number of remaining packets has become less than the switch threshold. The switch threshold may correspond to a combination (e.g., the sum) of the OPC value and the UPC value.

When sending the packet 356, the OXE 326 may determine that the number of remaining packets (2 in this example) has become less than the value of the UPC (3 in this example). Accordingly, OXE 326 may determine that the remaining packets 358 and 360 correspond to the end of message 340. Thus, OXE 326 may switch back to in-order delivery. Accordingly, OXE 326 may send packet 358 based on in-order delivery by setting a corresponding flag in the header of packet 358. When the NIC 330 receives the packet 358, the NIC 330 may determine, based on the flag, that the packet 358 needs to be processed in order. NIC 330 may then process the sequence number in the header of packet 358.
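The check performed after each out-of-order send can be sketched as a comparison of the remaining packet count against the sum of the OPC and the UPC. The function name and the list-based bookkeeping are assumptions, and the sketch assumes that no responses arrive while the three packets are being sent.

```python
def should_switch_to_in_order(remaining: int, opc: int, upc: int) -> bool:
    """Switch back once fewer packets remain than are currently outstanding."""
    return remaining < opc + upc


# Packets of message 340 that have not yet been sent (reference numerals from Fig. 3B).
pending = [352, 354, 356, 358, 360]
opc, upc = 0, 0  # the initial ordered packets are assumed to be fully acknowledged
sent_unordered = []

while pending:
    sent_unordered.append(pending.pop(0))  # send the next packet out of order
    upc += 1
    if should_switch_to_in_order(len(pending), opc, upc):
        break  # the rest of the message is sent in order
```

With these inputs the loop stops after packet 356: two packets remain and the UPC is 3, so packets 358 and 360 form the in-order tail of the message.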

To further ensure that the packet 360, which may be the last packet, is transmitted in order, the OXE 326 may not send the packet 360 until the NIC 320 receives a response to all out-of-order packets (i.e., the value of UPC becomes 0). OXE 326 may decrement the UPC for the response of each of packets 352, 354, and 356. As a result, the value of UPC may become 0 and OXE 326 may send packet 360. In this manner, the NIC 320 may use IOI packet transport, which may combine both in-order packet delivery and out-of-order packet delivery, thereby facilitating efficient packet forwarding of the message 340.

Fig. 4A shows a flow diagram of a message selection process for in-out-in (IOI) packet forwarding in a NIC. During operation, the NIC may obtain a message for a remote device (operation 402) and initiate packet forwarding of the message based on in-order delivery (operation 404). The NIC may then dynamically determine a threshold based on the initial response (e.g., the first response received at the NIC) (operation 406) and determine whether the size of the remaining packets is greater than the threshold (operation 408). Since IOI has not yet been triggered, the remaining packets can still be considered ordered packets. If the size is greater than the threshold, the NIC may initiate IOI forwarding for subsequent packets (operation 410). On the other hand, if the size is less than or equal to the threshold, the NIC may continue packet forwarding based on in-order delivery (operation 412).

Fig. 4B shows a flow diagram of an IOI packet forwarding process in a NIC. During operation, the NIC may obtain a message (operation 452) and generate a packet stream from the message (operation 454). The NIC may then select a packet from the packet stream and mark the packet as in-order (e.g., by setting a flag in the packet) (operation 456). The NIC may then send the packet based on the in-order forwarding policy (operation 458). The in-order forwarding policy may specify how an ordered packet is to be forwarded via the network, such as the forwarding paths and corresponding forwarding parameters in the network.

The NIC may then check whether a response has been received (operation 460). If the NIC has not received a response, the NIC may continue to select the next packet from the packet stream and mark the packet as in-order (operation 456). On the other hand, after receiving the response, the NIC may determine whether the size of the remaining packets is greater than the threshold (operation 462), as described in connection with Fig. 4A. If the size of the remaining packets is greater than the threshold, the NIC may switch to out-of-order delivery (operation 464). Accordingly, the NIC may select the next packet from the packet stream and mark the packet as out-of-order (e.g., by not setting the flag in the packet) (operation 466). The NIC may then send the packet based on the out-of-order forwarding policy (operation 468). The out-of-order forwarding policy may specify how out-of-order packets may be forwarded, such as by load balancing and multi-path forwarding.
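The difference between the two forwarding policies can be illustrated with a small path selector. This is only a sketch under stated assumptions: the disclosure does not specify how paths are chosen, so the CRC-based flow hash for ordered packets, the round-robin spreading for unordered packets, and the `PathSelector` class itself are all hypothetical.

```python
import zlib
from itertools import count


class PathSelector:
    """Picks a path index depending on whether a packet is marked in-order."""

    def __init__(self, num_paths: int) -> None:
        if num_paths <= 0:
            raise ValueError("num_paths must be positive")
        self.num_paths = num_paths
        self._round_robin = count()  # advances only for unordered packets

    def select(self, flow_id: bytes, ordered: bool) -> int:
        if ordered:
            # Ordered packets of one flow always hash to the same path,
            # so the fabric cannot reorder them.
            return zlib.crc32(flow_id) % self.num_paths
        # Unordered packets are spread across all available paths.
        return next(self._round_robin) % self.num_paths


selector = PathSelector(num_paths=4)
ordered_paths = {selector.select(b"message-340", ordered=True) for _ in range(5)}
unordered_paths = [selector.select(b"message-340", ordered=False) for _ in range(4)]
```

Every ordered packet of the flow maps to a single path, while four consecutive unordered packets use paths 0 through 3 in turn.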

The NIC may then check whether the number of remaining packets is less than the switch threshold (operation 470). If the number of remaining packets is not less than the switch threshold, the NIC may continue to select the next packet from the packet stream and mark the packet as out-of-order (operation 466). On the other hand, if the number of remaining packets is less than the switch threshold, the NIC may switch to in-order delivery (operation 472). If the size of the remaining packets is less than or equal to the threshold at operation 462, the message may be too small for IOI. Accordingly, the NIC may mark the remaining packets in the stream as in-order (operation 474) and send the remaining packets based on the in-order forwarding policy and the final packet policy (operation 476). The final packet policy may specify when the last packet may be forwarded.

Fig. 4C shows a flow diagram of the IOI packet forwarding process for the last packet in the NIC. The forwarding process may conform to the final packet policy. During operation, the NIC may identify the last packet remaining (operation 482) and determine whether the value of the UPC has become zero (i.e., responses to all out-of-order packets have been received) (operation 484). If the value of the UPC is not zero, the NIC may not send the last packet (operation 486) and continue to determine if the value of the UPC has become zero (operation 484). On the other hand, if the value of the UPC has become zero, the NIC may send the last packet remaining based on the in-order forwarding policy (operation 488).
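The final packet policy of Fig. 4C amounts to holding the last packet until the UPC reaches zero. A minimal sketch follows; the `FinalPacketGate` class and its method names are assumptions, and the starting value of 3 mirrors the three outstanding out-of-order packets in the Fig. 3B example.

```python
class FinalPacketGate:
    """Holds back the last packet until no unordered packet is outstanding."""

    def __init__(self, outstanding_unordered: int) -> None:
        self.upc = outstanding_unordered

    def on_unordered_response(self) -> None:
        """Record a response to one outstanding unordered packet."""
        if self.upc > 0:
            self.upc -= 1

    def may_send_last_packet(self) -> bool:
        """Operation 484: the last packet may go out only when UPC is zero."""
        return self.upc == 0


gate = FinalPacketGate(outstanding_unordered=3)
decisions = [gate.may_send_last_packet()]  # checked before any response arrives
for _ in range(3):
    gate.on_unordered_response()
    decisions.append(gate.may_send_last_packet())
```

The gate opens only after the third response, at which point the last packet can be sent under the in-order forwarding policy.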

Exemplary Computer System

Fig. 5 illustrates an exemplary computer system equipped with a NIC that facilitates efficient packet forwarding. The computer system 550 includes a processor 552, a memory device 554 and a storage device 556. Memory device 554 may include a volatile memory device (e.g., a dual in-line memory module (DIMM)). Further, the computer system 550 may be coupled to a keyboard 562, a pointing device 564, and a display device 566. The storage device 556 may store an operating system 570. The application programs 572 may operate on an operating system 570.

Computer system 550 may be equipped with a host interface for coupling a NIC 520 that facilitates efficient packet forwarding. The NIC 520 may provide one or more HNIs to the computer system 550. NIC 520 may be coupled to switch 502 via one of the HNIs. NIC 520 may include an IOI logic block 530, as described in connection with Figs. 3A and 3B. The IOI logic block 530 may include a monitoring logic block 532, a packet generation logic block 534, a forwarding logic block 536, and a switching logic block 538. Monitoring logic block 532 may determine whether a packet of a message invokes IOI forwarding.

Packet generation logic block 534 (e.g., in an MCU in NIC 520) may generate a packet stream from the message. Forwarding logic block 536 (e.g., in an OXE in NIC 520) may forward packets of the message based on IOI forwarding, as described in connection with Figs. 4B and 4C. The monitoring logic block 532 may also determine whether a response to the initial packet has been received. The monitoring logic block 532 may maintain values of the OPC and the UPC for the message. Switching logic block 538 (e.g., in an OXE in NIC 520) may determine whether to switch between in-order and out-of-order delivery for IOI forwarding.

In summary, the present disclosure describes a NIC that facilitates efficient packet forwarding. The NIC may be equipped with a host interface, packet generation logic, and forwarding logic. The host interface may couple host devices. During operation, the packet generation logic may obtain a message from the host device for a remote device via the host interface. The packet generation logic may generate a plurality of packets for the remote device from the message. The forwarding logic block may then transmit a first subset of packets of the plurality of packets based on the in-order delivery. The forwarding logic block may send a second subset of packets of the plurality of packets based on out-of-order delivery if a first condition is satisfied. Further, the forwarding logic may send a third subset of the plurality of packets based on the in-order delivery if the second condition is satisfied.

The methods and processes described above may be performed by hardware logic blocks, modules, or devices. A hardware logic block, module, or device may include, but is not limited to, an Application Specific Integrated Circuit (ASIC) chip, a Field Programmable Gate Array (FPGA), a dedicated or shared processor that executes code at a particular time, and other programmable logic devices now known or later developed. A hardware logic block, module, or device, when activated, performs the methods and processes included therein.

The methods and processes described herein may also be embodied as code or data, which may be stored in a storage device or computer-readable storage medium. The methods and processes may be performed by a processor when the stored code or data is read and executed by the processor.

The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. The description is not intended to be exhaustive or to limit the invention to the precise form disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the invention is defined by the appended claims.
