System and method for facilitating dynamic command management in a Network Interface Controller (NIC)

Document No.: 1895350    Publication date: 2021-11-26

Abstract: This technology, "System and method for facilitating dynamic command management in a Network Interface Controller (NIC)", was created by D. Roweth, A. M. Bataineh, and E. L. Froese on 2020-03-23. A Network Interface Controller (NIC) capable of efficient command management is provided. The NIC may be equipped with a host interface, arbitration logic, and command management logic. During operation, the host interface may couple the NIC to a host device. The arbitration logic may select a command queue of the host device to obtain a command. The command management logic may determine whether an internal buffer associated with the command queue contains a command. If the internal buffer contains the command, the command management logic may obtain the command from the internal buffer. On the other hand, if the internal buffer is empty, the command management logic may obtain the command from the command queue via the host interface.

1. A Network Interface Controller (NIC), comprising:

a host interface to couple a host device;

an arbitration logic block to select a command queue of the host device to obtain a command; and

a command management logic block to:

receive the command via the host interface;

determine whether an internal buffer associated with the command queue contains a command;

in response to determining that the internal buffer contains the command, obtain the command from the internal buffer; and

in response to determining that the internal buffer is empty, obtain the command from the command queue via the host interface.

2. The network interface controller of claim 1, wherein the command management logic block is further to provide the host device with one or more of:

processing information associated with the command queue; and

state information including a read pointer of the command queue.

3. The network interface controller of claim 1, wherein the command management logic block is further to determine that the command queue has a new command based on advancement of a write pointer of the command queue.

4. The network interface controller of claim 1, wherein the arbitration logic block is further to select the command queue from a plurality of command queues in a memory device of the host device.

5. The network interface controller of claim 4, further comprising a corresponding internal buffer for a respective command queue.

6. The network interface controller of claim 1, wherein the command management logic block is further to discard a new command received from the host interface and destined for the internal buffer in response to determining one or more of:

the internal buffer is not large enough to accommodate the new command; and

the command queue is not empty.

7. The network interface controller of claim 6, wherein, in response to receiving the new command, the command management logic block is further to advance a write pointer of the command queue to determine that a command is present in the command queue.

8. The network interface controller of claim 1, wherein the command management logic block is further to:

advance a prefetch pointer of the command queue in response to requesting the command; and

advance a read pointer of the command queue in response to receiving data associated with the command.

9. The network interface controller of claim 1, wherein the host interface is a peripheral component interconnect express (PCIe) interface; and

wherein the command management logic block is to obtain the command from the command queue based on a PCIe read.

10. The network interface controller of claim 1, wherein the command comprises a Remote Direct Memory Access (RDMA) command.

11. A computer system for facilitating a command management system, the computer system comprising:

a processor;

a memory device storing a command queue;

a host interface to couple a Network Interface Controller (NIC) that maintains an internal buffer associated with the command queue; and

a storage device storing instructions that, when executed by the processor, cause the processor to perform a method comprising:

writing a command to the command queue;

determining whether the internal buffer is capable of accepting the command based on a state of the command queue;

in response to determining that the internal buffer is capable of accepting the command, writing the command into the internal buffer via the host interface; and

notifying, via the host interface, the NIC that the command has been written to the command queue in response to determining that the internal buffer cannot accept the command.

12. The computer system of claim 11, wherein the state of the command queue indicates whether the command queue is empty.

13. The computer system of claim 11, wherein the method further comprises:

obtaining statistics associated with performance of the internal buffer; and

speculatively determining whether the command queue is expected to be empty based on the obtained statistics.

14. The computer system of claim 11, wherein notifying the NIC further comprises advancing a write pointer of the command queue.

15. The computer system of claim 11, wherein the memory device is to store a plurality of command queues.

16. The computer system of claim 11, wherein the method further comprises:

writing a plurality of commands into the command queue based on a granularity at which writing into the command queue is permitted; and

advancing a write pointer of the command queue according to the granularity.

17. The computer system of claim 11, wherein the method further comprises: selecting the command in the command queue as a backup command in response to determining that the internal buffer cannot accept the command.

18. The computer system of claim 17, wherein the NIC obtains the command from the command queue via the host interface in response to determining that the command has been dropped at the internal buffer.

19. The computer system of claim 11, wherein the host interface is a peripheral component interconnect express (PCIe) interface; and

wherein the command is written to the internal buffer based on a PCIe write.

20. The computer system of claim 11, wherein the command comprises a Remote Direct Memory Access (RDMA) command.

Technical Field

The present disclosure relates generally to the field of networking technology. More particularly, the present disclosure relates to systems and methods for facilitating efficient command management by a Network Interface Controller (NIC).

Background

As network-enabled devices and applications become more prevalent, various types of traffic and ever-increasing network loads continue to demand higher performance from the underlying network architecture. For example, applications such as High Performance Computing (HPC), streaming media, and Internet of Things (IoT) may generate different types of traffic with distinctive characteristics. Thus, in addition to traditional network performance metrics such as bandwidth and latency, network architects continue to face challenges such as scalability, versatility, and efficiency.

Disclosure of Invention

A Network Interface Controller (NIC) capable of efficient command management is provided. The NIC may be equipped with a host interface, an arbitration module, and a command management module. During operation, the host interface may couple the NIC to a host device. The arbitration module may select a command queue of the host device to obtain a command. The command management module may determine whether an internal buffer associated with the command queue contains a command. The command management module may obtain the command from the internal buffer if the internal buffer contains the command. On the other hand, if the internal buffer is empty, the command management module may obtain the command from the command queue via the host interface.

Drawings

Fig. 1 illustrates an exemplary network.

Fig. 2A shows an exemplary NIC chip having multiple NICs.

Fig. 2B shows an exemplary architecture of a NIC.

Fig. 3A illustrates an exemplary dynamic command management process in a NIC.

Fig. 3B illustrates an exemplary queue for facilitating dynamic command management in a NIC.

Fig. 4A shows a flow diagram of a dynamic queue selection process for processing commands in a NIC.

Fig. 4B shows a flow diagram of a dynamic command management process for a host device.

Fig. 4C shows a flow diagram of a dynamic command management process for a memory-based command path in a NIC.

Fig. 4D illustrates a flow diagram of a dynamic command management process for a low latency command path in a NIC.

Fig. 5 illustrates an exemplary computer system equipped with a NIC to facilitate dynamic command management.

In the drawings, like reference numerals refer to like elements.

Detailed Description

Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. The invention is thus not limited to the embodiments shown.

Overview

This disclosure describes systems and methods that facilitate dynamic command management in a Network Interface Controller (NIC). The NIC allows the host to communicate with a data-driven network.

Embodiments described herein address the problem of efficiently transmitting commands to a NIC by: (i) facilitating large-scale command transmission through a command queue in a host device and low-latency command transmission through an internal command buffer in the NIC; and (ii) dynamically selecting between the command queue and the internal buffer to receive commands.

During operation, a host device of the NIC may issue commands (e.g., Remote Direct Memory Access (RDMA) "GET" or "PUT" commands) for data operations to the NIC. To do so, the host device may transmit a command (e.g., a Direct Memory Access (DMA) descriptor of the command) to the NIC. If the host device needs to transmit a large number of commands to the NIC, the host device may store the commands in a command queue, which the host device may maintain in its own memory. When the NIC is ready to receive a new command (e.g., has resources available for the next command), the NIC may request the command from the host device. The processor of the host device may then transmit the command to the NIC.

This read-based approach relies on the NIC accessing the memory of the host device and may therefore be referred to as a memory-based command path. The memory-based command path may allow large-scale transfers to the NIC and facilitate efficient utilization of the host device's internal bandwidth. However, the memory-based command path may have a high command transfer latency because obtaining a command may require multiple accesses to the interface system (or processor interface).

To transmit commands with low latency, the host device may transmit commands associated with a small amount of data to an internal command buffer of the NIC. In some embodiments, the processor of the host device may write to the internal buffer of the NIC. This write-based approach may provide low-latency data transfer and may therefore be referred to as a low latency command path. However, since the capacity of the NIC's internal buffer may be limited, the low latency command path may limit the volume of commands that can be transferred.

To address this problem, the NIC may combine the two approaches to facilitate efficient transmission rates with low latency. The host device may maintain a command queue in memory of the host device for each respective command stream (e.g., based on traffic classification). If an application issues a command for the NIC, the command may be stored in the corresponding command queue. The host device may then notify the NIC of the new command by advancing the write pointer of the queue. This approach may be application independent, as any application may write to the command queue. The NIC may then issue a read operation to the command queue and advance the prefetch pointer of the queue. When the data is returned, the NIC may process the command and advance the read pointer.
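The interplay of the write, prefetch, and read pointers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class name `CommandQueue`, its method names, the fixed slot count, and the in-order return of data are all hypothetical choices made for the example.

```python
class CommandQueue:
    """Toy model of a host-memory command queue kept as a circular buffer."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.write = 0     # advanced by the host after it stores a command
        self.prefetch = 0  # advanced by the NIC when it issues a read
        self.read = 0      # advanced by the NIC when the data has returned

    def host_write(self, command):
        """Host stores a command and advances the write pointer."""
        if self.write - self.read >= len(self.slots):
            raise BufferError("command queue is full")
        self.slots[self.write % len(self.slots)] = command
        self.write += 1  # the advanced write pointer signals a new command

    def nic_request(self):
        """NIC issues a read for the next command it has not yet requested."""
        if self.prefetch == self.write:
            return None  # nothing new to request
        index = self.prefetch
        self.prefetch += 1
        return index

    def nic_complete(self, index):
        """Data for a requested command has returned; the NIC processes it."""
        if index != self.read:
            raise ValueError("this sketch assumes data returns in order")
        command = self.slots[index % len(self.slots)]
        self.read += 1
        return command


cq = CommandQueue(slots=4)
cq.host_write("PUT a")
cq.host_write("GET b")
first = cq.nic_request()   # read issued for "PUT a"
second = cq.nic_request()  # read issued for "GET b"
done = cq.nic_complete(first)
```

In this sketch the gap between the prefetch and read pointers counts the reads that are still in flight, while the gap between the write and prefetch pointers counts the commands the NIC has not yet requested.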

However, if the command queue is likely to be empty (i.e., any commands previously written to the command queue are known to have been, or are likely to have been, processed by the NIC), the host device may insert the command into an internal buffer of the NIC. The NIC may maintain an internal buffer for each respective command queue of the host device. The buffer may have a fixed size and may hold a limited number of commands. Since commands can be written directly into the internal buffer, the NIC can avoid round-trip data exchanges via internal communication channels (e.g., peripheral component interconnect express (PCIe) channels). In this manner, the internal buffer may reduce latency in issuing commands to the NIC. By dynamically switching between command paths, the host device may select a command path that can efficiently transmit commands.

One embodiment of the present invention provides a NIC that may be equipped with a host interface, arbitration logic, and command management logic. During operation, the host interface may couple the NIC to a host device. The arbitration logic may select a command queue of the host device to obtain a command. The command management logic may receive the command via the host interface and determine whether an internal buffer associated with the command queue contains a command. If the internal buffer contains the command, the command management logic may obtain the command from the internal buffer. On the other hand, if the internal buffer is empty, the command management logic may obtain the command from the command queue via the host interface.
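The buffer-first lookup in this embodiment can be illustrated with a short Python sketch. The function name `obtain_command`, the use of `deque` objects for both the internal buffer and the host queue, and the string path labels are assumptions made for illustration only; a real NIC would issue a read over the host interface rather than call a Python function.

```python
from collections import deque


def obtain_command(internal_buffer, fetch_from_command_queue):
    """Return (command, path), preferring the NIC's internal buffer.

    `fetch_from_command_queue` stands in for a read over the host interface.
    """
    if internal_buffer:
        return internal_buffer.popleft(), "internal buffer"
    return fetch_from_command_queue(), "command queue"


host_queue = deque()
internal_buffer = deque(["PUT y"])  # host wrote directly into the NIC

first = obtain_command(internal_buffer, host_queue.popleft)

host_queue.append("GET x")          # later command goes to host memory
second = obtain_command(internal_buffer, host_queue.popleft)
```

The first call is served from the internal buffer without touching host memory; the second finds the buffer empty and falls back to the command queue.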

In a variation of this embodiment, the command management logic may provide one or more of the following to the host device: (i) processing information associated with the internal buffer; and (ii) state information including a read pointer of the command queue.

In a variation of this embodiment, the command management logic may determine that the command queue has a new command based on an advancement of a write pointer of the command queue.

In a variation of this embodiment, the arbitration logic may select the command queue from a plurality of command queues in a memory of the host device.

In a further variant, the NIC may also include a corresponding internal buffer for the respective command queue.

In a variation of this embodiment, the command management logic may discard a new command received from the host interface and destined for the internal buffer after determining one or more of: (i) the internal buffer is not large enough to accommodate the new command; and (ii) the command queue is not empty.
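The two discard conditions can be expressed as a small acceptance check. The sketch below is hypothetical: the function name `try_buffer_write`, the byte-based capacity accounting, and the `queue_pending` counter are assumptions introduced here, since the text does not specify how buffer occupancy is measured.

```python
def try_buffer_write(buffer, capacity_bytes, command, queue_pending):
    """Accept `command` into the internal buffer, or discard it.

    Returns True if the command was accepted and False if it was discarded.
    """
    if queue_pending > 0:
        return False  # condition (ii): the command queue is not empty
    used = sum(len(entry) for entry in buffer)
    if used + len(command) > capacity_bytes:
        return False  # condition (i): not enough room for the new command
    buffer.append(command)
    return True


buffer = []
accepted = try_buffer_write(buffer, 8, b"abcd", queue_pending=0)
too_large = try_buffer_write(buffer, 8, b"abcdef", queue_pending=0)
queue_busy = try_buffer_write(buffer, 8, b"ab", queue_pending=1)
```

Only the first write is accepted here. The second would exceed the assumed 8-byte capacity, and the third arrives while a command is still pending in the command queue.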

In a variation of this embodiment, in response to receiving the new command, the command management logic may advance a write pointer of the command queue to determine that a command is present in the command queue.

In a variation of this embodiment, the command management logic may advance a prefetch pointer of the command queue upon requesting the command and advance a read pointer of the command queue upon receiving data associated with the command.

In a variation of this embodiment, the host interface may be a peripheral component interconnect express (PCIe) interface. The command management logic may then obtain the command from the command queue based on a PCIe read.

In a variation of this embodiment, the command may comprise an RDMA command.

One embodiment of the present invention provides a computer system that may include a memory device, a host interface, and a command management system. The memory device may store a command queue. The host interface may couple a NIC that may maintain an internal buffer associated with the command queue. During operation, the system may write commands to the command queue and determine whether the internal buffer may accept the commands based on the status of the command queue. If the internal buffer can accept the command, the system can write the command into the internal buffer via the host interface. On the other hand, if the internal buffer cannot accept the command, the system may notify the NIC via the host interface that the command has been written into the command queue.

In a variation of this embodiment, the status of the command queue indicates whether the command queue is empty.

In a variation of this embodiment, the system may obtain statistics associated with performance of the internal buffer and speculatively determine whether the command queue is expected to be empty based on the obtained statistics.
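One way such a speculative determination could work is sketched below. The text does not say which statistics are collected or how they are evaluated, so everything here is an assumption: the function name `expect_queue_empty`, the choice of accepted and dropped buffer writes as the statistics, and the 0.9 threshold are illustrative only.

```python
def expect_queue_empty(accepted_writes, dropped_writes, min_success_rate=0.9):
    """Guess whether the command queue is empty from buffer-write history.

    A high share of accepted buffer writes suggests that the NIC is keeping
    up and that the command queue is usually drained.
    """
    attempts = accepted_writes + dropped_writes
    if attempts == 0:
        return True  # assumed default: no history, so try the buffer
    return accepted_writes / attempts >= min_success_rate


mostly_accepted = expect_queue_empty(95, 5)
often_dropped = expect_queue_empty(50, 50)
no_history = expect_queue_empty(0, 0)
```

A wrong guess is tolerable in this scheme because a command that is dropped at the internal buffer can still be obtained from the command queue.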

In a variation of this embodiment, notifying the NIC may include advancing a write pointer of the command queue.

In a variation of this embodiment, the memory device may store multiple command queues.

In a variation of this embodiment, the system may write a plurality of commands to the command queue based on the granularity at which writing to the command queue is allowed. Subsequently, the system may advance a write pointer of the command queue according to the granularity.
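A possible reading of granularity-based writing is sketched below: commands are appended in whole groups, and the write pointer moves in multiples of the group size. The function name `write_in_granules` is hypothetical, and holding back a partial group until it fills is an assumption, since the text does not say how leftover commands are handled.

```python
def write_in_granules(queue, write_pointer, pending, granularity):
    """Append whole granules of commands and advance the write pointer.

    Returns (new_write_pointer, leftover_commands).
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    whole = len(pending) - len(pending) % granularity
    queue.extend(pending[:whole])
    return write_pointer + whole, pending[whole:]


queue = []
new_pointer, leftover = write_in_granules(
    queue, 0, ["c0", "c1", "c2", "c3", "c4"], granularity=2
)
```

With five pending commands and a granularity of two, four commands are written, the write pointer advances by four, and one command remains pending.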

In a variation of this embodiment, the system may select the command in the command queue as a backup command if the internal buffer cannot accept the command.

In a further variation, the NIC may obtain the command from the command queue via the host interface if the command has been dropped at the internal buffer.

In a variation of this embodiment, the host interface may be a PCIe interface. The system may then write the command into the internal buffer based on a PCIe write.

In a variation of this embodiment, the command may comprise an RDMA command.

In this disclosure, the description in connection with Fig. 1 relates to the network architecture, and the description in connection with Fig. 2A and subsequent figures provides more detail on the architecture and operation of NICs that support efficient command management.

Fig. 1 illustrates an exemplary network. In this example, switch network 100 (which may also be referred to as a "switch fabric") may include switches 102, 104, 106, 108, and 110. Each switch may have a unique address or ID within the switch fabric 100. Various types of devices and networks may be coupled to the switch fabric. For example, storage array 112 may be coupled to switch fabric 100 via switch 110; InfiniBand (IB)-based HPC network 114 may be coupled to switch fabric 100 via switch 108; a plurality of end hosts, such as host 116, may be coupled to the switch fabric 100 via the switch 104; and the IP/Ethernet network 118 may be coupled to the switch fabric 100 via the switch 102. In general, a switch may have edge ports and fabric ports. An edge port may be coupled to a device external to the fabric. A fabric port may be coupled to another switch within the fabric via a fabric link. In general, traffic may be injected into the switch fabric 100 via an ingress port of an edge switch and exit the switch fabric 100 via an egress port of another (or the same) edge switch. An ingress link may couple a NIC of an edge device (e.g., an HPC end host) to an ingress edge port of an edge switch. The switch fabric 100 may then transmit the traffic to an egress edge switch, which in turn may pass the traffic to a destination edge device via another NIC.

Exemplary NIC architecture

Fig. 2A shows an exemplary NIC chip having multiple NICs. Referring to the example in Fig. 1, NIC chip 200 may be a custom Application Specific Integrated Circuit (ASIC) designed for host 116 to work with switch fabric 100. In this example, chip 200 may provide two separate NICs 202 and 204. Each NIC of chip 200 may be equipped with a Host Interface (HI) (e.g., an interface for connecting to a host processor) and a high-speed network interface (HNI) for communicating with links coupled to switch fabric 100 of Fig. 1. For example, the NIC 202 may include the HI 210 and the HNI 220, and the NIC 204 may include the HI 211 and the HNI 221.

In some embodiments, the HI 210 may be a Peripheral Component Interconnect (PCI) interface or a peripheral component interconnect express (PCIe) interface. The HI 210 may be coupled to a host via a host connection 201, which may include N (e.g., N may be 16 in some chips) PCIe Gen 4 lanes capable of operating at signaling rates up to 25 Gbps per lane. The HNI 220 may facilitate a high-speed network connection 203 that may communicate with links in the switch fabric 100 of Fig. 1. The HNI 220 may operate at a total rate of 100 Gbps or 200 Gbps using M (e.g., M may be 4 in some chips) full-duplex serial lanes. Each of the M lanes may operate at 25 Gbps or 50 Gbps based on non-return-to-zero (NRZ) modulation or pulse amplitude modulation 4 (PAM4), respectively. The HNI 220 may support Institute of Electrical and Electronics Engineers (IEEE) 802.3 Ethernet-based protocols, as well as an enhanced frame format that provides support for higher rates of small messages.
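The relationship between the per-lane rate and the total HNI rate is simple arithmetic, shown here as a short Python check. The table and function names are illustrative; the lane count and per-lane rates are the example values given in the text.

```python
# Per-lane rates from the text, in Gbps, keyed by modulation scheme.
LANE_RATE_GBPS = {"NRZ": 25, "PAM4": 50}


def aggregate_rate_gbps(lanes, modulation):
    """Total link rate for `lanes` serial lanes using `modulation`."""
    return lanes * LANE_RATE_GBPS[modulation]


nrz_total = aggregate_rate_gbps(4, "NRZ")
pam4_total = aggregate_rate_gbps(4, "PAM4")
```

With M = 4 lanes, NRZ yields the 100 Gbps total and PAM4 yields the 200 Gbps total mentioned above.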

The NIC 202 may support one or more of the following: Message Passing Interface (MPI) based point-to-point messaging, Remote Memory Access (RMA) operations, offloading and scheduling of bulk data collective operations, and Ethernet packet processing. When the host issues an MPI message, the NIC 202 may match the corresponding message type. Further, the NIC 202 may implement both the eager protocol and the rendezvous protocol for MPI, thereby offloading the corresponding operations from the host.

Further, RMA operations supported by the NIC 202 may include PUT, GET, and Atomic Memory Operations (AMO). The NIC 202 may provide reliable transmissions. For example, if the NIC 202 is an originating NIC, the NIC 202 may provide a retry mechanism for idempotent operations. Furthermore, connection-based error detection and retry mechanisms may be used for ordered operations that may manipulate the target state. The hardware of NIC 202 may maintain the state required for the retry mechanism. In this way, the NIC 202 may relieve the burden on the host (e.g., software). The policy for deciding the retry mechanism may be specified by the host through software to ensure flexibility of the NIC 202.

In addition, the NIC 202 may facilitate triggered operations, a general-purpose offload mechanism, and the scheduling of dependent sequences of operations (such as bulk data collectives). NIC 202 may support an Application Programming Interface (API), such as a libfabric API, to facilitate the provision of fabric communication services by switch fabric 100 of Fig. 1 to applications running on host 116. The NIC 202 may also support a low-level network programming interface, such as a Portals API. Additionally, the NIC 202 may provide efficient Ethernet packet processing that may include efficient transmission when the NIC 202 is the sender, flow steering when the NIC 202 is the target, and checksum calculation. Further, the NIC 202 may support virtualization (e.g., using containers or virtual machines).

Fig. 2B shows an exemplary architecture of a NIC. In the NIC 202, the port macro of the HNI 220 may facilitate low-level Ethernet operations, such as Physical Coding Sublayer (PCS) and Media Access Control (MAC). Additionally, the NIC 202 may provide support for Link Layer Retry (LLR). Incoming packets may be parsed by the parser 228 and stored in the buffer 229. The buffer 229 may be a priority flow control (PFC) buffer provisioned to buffer a threshold amount (e.g., one microsecond) of delay bandwidth. HNI 220 may also include a control transmit unit 224 and a control receive unit 226 for managing outgoing and incoming packets, respectively.
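To give a sense of scale for a buffer sized by delay bandwidth, the following sketch computes how many bytes arrive on a link during a given delay. The function name is hypothetical, and pairing the one-microsecond example with the 100 Gbps and 200 Gbps HNI rates is an assumption; the text does not state the actual size of buffer 229.

```python
def delay_bandwidth_bytes(rate_gbps, delay_ns):
    """Bytes that arrive on a link of `rate_gbps` during `delay_ns`."""
    bits = rate_gbps * delay_ns  # Gb/s multiplied by ns gives bits
    return bits // 8


buffer_at_100g = delay_bandwidth_bytes(100, 1000)  # one microsecond
buffer_at_200g = delay_bandwidth_bytes(200, 1000)
```

Under these assumptions, one microsecond of delay bandwidth corresponds to 12,500 bytes at 100 Gbps and 25,000 bytes at 200 Gbps.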

NIC 202 may include a Command Queue (CQ) unit 230. CQ unit 230 may be responsible for fetching and issuing host-side commands. CQ unit 230 may include command queues 232 and schedulers 234. The command queues 232 may include two separate sets of queues for initiator commands (PUT, GET, etc.) and target commands (Append, Search, etc.), respectively. The command queues 232 may be implemented as circular buffers maintained in memory of the NIC 202. An application running on the host may write directly to the command queues 232. The schedulers 234 may include two separate schedulers for initiator commands and target commands, respectively. The initiator commands may be sorted into the flow queues 236 based on a hash function. Each of the flow queues 236 may be assigned to a unique flow. In addition, CQ unit 230 may further include a triggered operation module (or logic block) 238, which is responsible for queuing and dispatching triggered commands.
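Hash-based sorting into flow queues can be sketched as follows. The text does not name the hash function or the fields it covers, so the use of CRC-32 from Python's `zlib` module, the string flow key, and the function names are assumptions made for illustration.

```python
import zlib


def flow_queue_index(flow_key, num_flow_queues):
    """Map a flow key to a flow queue index with a deterministic hash."""
    return zlib.crc32(flow_key.encode("utf-8")) % num_flow_queues


def sort_into_flow_queues(commands, num_flow_queues):
    """Distribute (flow_key, command) pairs over the flow queues."""
    queues = [[] for _ in range(num_flow_queues)]
    for flow_key, command in commands:
        queues[flow_queue_index(flow_key, num_flow_queues)].append(command)
    return queues


commands = [("dst1", "PUT 1"), ("dst2", "GET 2"), ("dst1", "PUT 3")]
queues = sort_into_flow_queues(commands, num_flow_queues=8)
dst1_queue = queues[flow_queue_index("dst1", 8)]
```

Because the hash is deterministic, commands that share a flow key always land in the same flow queue and keep their relative order there.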

The outbound transport engine (OXE) 240 may pull commands from the flow queues 236 to process them for dispatch. OXE 240 may include an Address Translation Request Unit (ATRU) 244, which may send address translation requests to Address Translation Unit (ATU) 212. ATU 212 may provide virtual-to-physical address translation on behalf of different engines, such as OXE 240, inbound transport engine (IXE) 250, and Event Engine (EE) 216. ATU 212 may maintain a large translation cache 214. ATU 212 may perform the translation itself or may use a host-based Address Translation Service (ATS). OXE 240 may also include a message chopping unit (MCU) 246 that may chop a large message into packets of a size corresponding to a Maximum Transmission Unit (MTU). MCU 246 may include a plurality of MCU modules. When an MCU module becomes available, it may obtain the next command from an assigned flow queue. The received data may be written into the data buffer 242. The MCU module may then send the packet header, the corresponding traffic classification, and the packet size to the traffic shaper 248. Shaper 248 may determine which requests made by MCU 246 may enter the network.
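The chopping of a large message into MTU-sized packets can be illustrated with a short Python function. This is a simplified sketch: the function name `chop_message` is hypothetical, and header overhead is ignored, so the MTU is treated as the payload size of each packet.

```python
def chop_message(message_length, mtu):
    """Return a list of (offset, size) pairs, one per packet."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [
        (offset, min(mtu, message_length - offset))
        for offset in range(0, message_length, mtu)
    ]


packets = chop_message(5000, 2048)
```

A 5000-byte message with a 2048-byte MTU produces two full packets followed by a final packet that carries the remaining 904 bytes.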

The selected packets may then be sent to a packet and connection tracking (PCT) unit 270. The PCT 270 may store the packets in the queue 274. The PCT 270 may also maintain state information for outbound commands and update the state information when responses are returned. PCT 270 may also maintain packet state information (e.g., to allow matching of responses to requests), message state information (e.g., to track progress of multi-packet messages), initiator completion state information, and retry state information (e.g., to maintain information needed to retry a command if a request or response is lost). If no response is returned within a threshold time, the corresponding command may be retrieved from the retry buffer 272. The PCT 270 may facilitate connection management for initiator commands and target commands based on the source table 276 and the target table 278, respectively. For example, the PCT 270 may update its source table 276 to track the state required to reliably deliver packets and message completion notifications. PCT 270 may forward outgoing packets to HNI 220, which stores the packets in outbound queue 222.

The NIC 202 may also include an IXE 250 that provides packet processing when the NIC 202 is the target or destination. The IXE 250 may obtain incoming packets from the HNI 220. The parser 256 may parse incoming packets and pass corresponding packet information to a List Processing Engine (LPE) 264 or a Message State Table (MST) 266 for matching. LPE 264 may match incoming messages to buffers. LPE 264 may determine the buffer and starting address to be used for each message. LPE 264 may also manage a pool of list entries 262 representing buffers and unexpected messages. MST 266 may store the matching results and the information needed to generate target-side completion events. An event may be an internal control message for communicating between elements of the NIC 202. MST 266 may be used by unrestricted operations, including multi-packet PUT commands and single-packet and multi-packet GET commands.

The parser 256 may then store the packet in the packet buffer 254. The IXE 250 may obtain the matching result for conflict checking. The DMA write and AMO module 252 may then issue the updates generated by the write and AMO operations to the memory. If a packet includes a command (e.g., a GET request) that generates a target memory read operation, the packet may be passed to OXE 240. The NIC 202 may also include an EE 216 that may receive requests to generate event notifications from other modules or units in the NIC 202. An event notification may specify that either a full event or a counting event is to be generated. EE 216 may manage an event queue located within the host processor memory, to which it may write full events. EE 216 may forward counting events to CQ unit 230.

Dynamic command management in NICs

Fig. 3A illustrates an exemplary dynamic command management process in a NIC. In this example, host device 300 may be equipped with NIC 330. The device 300 may include a processor 302, a memory device 304, and an interface system 306. HI 332 of NIC 330 may be coupled to interface system 306 of device 300. In some embodiments, HI 332 may be a PCIe interface and interface system 306 may be a PCIe system that provides slots for HI 332. The NIC 330 may also include a command queue unit 334 for managing incoming commands from the device 300, as described in connection with Fig. 2A.

During operation, the device 300 may issue a command 320 for an operation (e.g., an RDMA operation). To transmit the command, the device 300 may generate a command descriptor (e.g., a DMA descriptor) for the command 320 and transmit it to the NIC 330. If the command 320 is one of a large number of commands, the device 300 may store the command 320 in the command queue 312 in the memory device 304. When the NIC 330 has resources available for the next command, the NIC 330 may request the command from the device 300. If command 320 is the next command, processor 302 may transmit command 320 to NIC 330 via HI 332. Here, the NIC 330 may read the command from the memory device 304 of the device 300. Such a memory-based command path may allow large-scale command transmission to NIC 330, thereby facilitating efficient bandwidth utilization of interface system 306.

However, the memory-based command path may have a high command transfer latency because obtaining the command 320 may require multiple accesses to the interface system 306. Alternatively, if the command 320 is associated with a small amount of data (e.g., within a threshold), the device 300 may transmit the command 320 to the internal command buffer 314 in the NIC 330. In some embodiments, processor 302 may write to the internal buffer 314. Such a low latency command path may provide low-latency data transmission. However, since the internal buffer 314 may have a limited capacity, the low latency command path may limit the volume of commands that can be transferred.

To address this issue, the NIC 330 may combine the two command paths to facilitate efficient transmission rates with low latency. Further, device 300 may maintain multiple command queues in memory device 304, each for a respective command stream. The command queue 312 may be one of these command queues. When an application running on device 300 issues a command 320 to NIC 330, command 320 may be stored in command queue 312. The device 300 may then notify the NIC 330 of the command 320 by advancing the write pointer of the command queue 312. NIC 330 may then issue a read operation to command queue 312 via HI 332 and advance the prefetch pointer of command queue 312. When data is returned for the command 320, the NIC 330 may process the command 320 and advance the read pointer of the command queue 312.

However, if the command queue 312 is empty, the device 300 may insert the command 320 into the internal buffer 314. NIC 330 may maintain an internal buffer for each of the command queues of device 300. Internal buffer 314 may be managed by command queue unit 334. The internal buffer 314 may have a fixed size and may store a limited number of commands. Since the command 320 may be written directly into the internal buffer 314, the NIC 330 may avoid exchanging data to and from the processor 302 via the interface system 306. In this manner, the internal buffer 314 may reduce latency in issuing commands to the NIC 330. By dynamically switching between the command queue 312 and the internal buffer 314, the device 300 may select a command path that may efficiently transmit commands to the NIC 330.

In some embodiments, the device driver 308 of the NIC 330 running on the operating system of the device 300 may select the command path. Driver 308 may dynamically determine whether to use the memory-based command path or the low-latency command path for each command (i.e., on a command-by-command basis). Driver 308 may determine whether there are outstanding commands in command queue 312 and internal buffer 314 based on information from NIC 330. For example, the NIC 330 may provide the driver 308 with the current location of one or more pointers of the command queue 312. In addition, the NIC 330 may also provide statistics on how efficiently the internal buffer 314 is being used. Based on this information, driver 308 may determine whether to select internal buffer 314 to transfer the next command.

In addition, driver 308 may speculatively determine that internal buffer 314 should have available capacity. Based on the determination, the driver 308 may select the internal buffer 314 to issue the command if the current state of the command queue 312 and the internal buffer 314 satisfy the selection criteria. Otherwise, the driver 308 may use the command queue 312. Accordingly, the NIC 330 may obtain commands from the internal buffer 314, if possible. Otherwise, the NIC 330 may obtain the command from the command queue 312.

FIG. 3B illustrates an exemplary queue for facilitating dynamic command management in a NIC. Operations on the command queue 312 may be based on operations of a circular buffer. During operation, if device 300 determines that command 362 should be issued to command queue 312, device 300 may format command 362. The device 300 may then store the command 362 at the location in the command queue 312 indicated by the write pointer 352. The device 300 may then advance the write pointer 352 to the next memory location. Advancing the write pointer 352 may trigger a notification (or "doorbell") to the NIC 330. The device 300 may write a plurality of commands to the command queue 312 before advancing the write pointer 352. In some embodiments, the granularity at which the write pointer 352 is advanced may be configured at the device 300 (e.g., by a user).

Based on the trigger, the NIC 330 may determine that the command queue 312 has a new command. If the NIC 330 selects the command queue 312 for processing (e.g., based on an arbitration process between command queues), the NIC 330 may read the command indicated by the prefetch pointer 354 from the command queue 312 and advance the prefetch pointer 354. For example, if prefetch pointer 354 indicates the location of command 362, NIC 330 may read command 362 from command queue 312. When data associated with command 362 is returned to NIC 330, NIC 330 may process command 362 and advance read pointer 356.

In some embodiments, advancing the read pointer 356 may include updating the application-visible copy of the read pointer 356 according to a queue-specific policy. The NIC 330 may continue to read commands from the command queue 312 until the processing resources (e.g., the execution units described in connection with FIG. 2B) of the NIC 330 have sufficient commands to execute. If the prefetch pointer 354 reaches the write pointer 352 (i.e., if the command queue 312 is empty), the NIC 330 may stop reading commands. Since the command queue 312 may be a circular queue of fixed size, the device 300 may suspend issuing commands to the command queue 312 if the write pointer 352 reaches the read pointer 356. Arrival of the write pointer 352 at the read pointer 356 may indicate that the command queue 312 is full and unable to accept new commands. Pointers 352, 354, and 356 may indicate locations or positions in command queue 312. For example, pointers 352, 354, and 356 may represent indices (e.g., array indices) of command queue 312, or represent memory pointers that indicate memory locations.
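The three-pointer circular queue described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the class name `CommandQueue` and its method names are hypothetical, and the sketch assumes free-running pointer counters (slot index = pointer modulo capacity) so that the "full" and "empty" conditions can be told apart.

```python
class CommandQueue:
    """Fixed-size circular command queue with write, prefetch, and read pointers.

    Assumption: pointers are free-running counters; the slot index is
    pointer % capacity. Names are hypothetical, chosen for illustration.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_ptr = 0     # advanced by the host after storing a command
        self.prefetch_ptr = 0  # advanced by the NIC when it requests a command
        self.read_ptr = 0      # advanced by the NIC once a command is processed

    def is_full(self):
        # The write pointer has wrapped around and reached the read pointer.
        return self.write_ptr - self.read_ptr == self.capacity

    def host_write(self, command):
        if self.is_full():
            return False  # host suspends issuing until the read pointer advances
        self.slots[self.write_ptr % self.capacity] = command
        self.write_ptr += 1
        return True

    def nic_prefetch(self):
        # Nothing left to fetch once the prefetch pointer reaches the write pointer.
        if self.prefetch_ptr == self.write_ptr:
            return None
        command = self.slots[self.prefetch_ptr % self.capacity]
        self.prefetch_ptr += 1
        return command

    def nic_complete(self):
        # Called after a fetched command has been processed; the guard keeps
        # the read pointer from passing the prefetch pointer.
        if self.read_ptr < self.prefetch_ptr:
            self.read_ptr += 1
```

With a capacity of two, a third `host_write` call is refused until `nic_prefetch` and `nic_complete` have each been called once, which mirrors the host suspending issuance while the queue is full.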

On the other hand, if the device 300 determines or speculates that the command queue 312 is empty, the device 300 may determine that a command 364 should be issued to the internal buffer 314. The device 300 may then format the command 364 and store the command 364 in the command queue 312 at the location indicated by the write pointer 352. However, if the device 300 determines or speculates that the internal buffer 314 has sufficient capacity to accommodate the command 364, the device 300 may not advance the write pointer 352 and may instead write the command 364 into the internal buffer 314. The device 300 may perform the write operations using PCIe-based write operations. A write memory barrier, such as an SFENCE instruction, may be used between the corresponding writes to the command queue 312 and the internal buffer 314.

Upon detecting a write operation in the internal buffer 314, the NIC 330 may advance the write pointer 352. When the NIC 330 selects the command queue 312 to process a command, the NIC 330 may determine that the internal buffer 314 stores a command. Accordingly, the NIC 330 may read from the internal buffer 314 rather than issuing an interface-based read (such as a PCIe read) to the command queue 312. Upon obtaining the command 364 from the internal buffer 314, the NIC 330 may advance the prefetch pointer 354.

It should be noted that write operations to the internal buffer 314 may arrive out of order. Further, the granularity of write operations may be less than the granularity of certain commands. The NIC 330 may therefore track partial write operations to the internal buffer 314 and advance the write pointer 352 once the write operations within a block of the internal buffer 314 have completed. If the internal buffer 314 still contains data for a previous write operation, or the command queue 312 is not yet empty when the command 364 is issued (i.e., the prefetch pointer 354 is not yet equal to the write pointer 352), the NIC 330 may discard the command 364 written to the internal buffer 314. Thus, the copy of the command 364 in the command queue 312 may operate as a backup command. When the NIC 330 has resources available to execute another command, the NIC 330 may obtain the next command from the command queue 312.
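One way to picture the tracking of out-of-order partial writes is a per-block bitmap of written chunks. The sketch below is only an illustration under stated assumptions: the name `BlockWriteTracker`, the 64-byte block size, and the 8-byte chunk size are hypothetical and do not come from the description, and writes are assumed to be chunk-aligned.

```python
class BlockWriteTracker:
    """Tracks which chunks of one internal-buffer block have been written.

    Assumptions: block and chunk sizes are illustrative, and every write
    is aligned to a chunk boundary.
    """

    def __init__(self, block_size=64, chunk_size=8):
        self.chunk_size = chunk_size
        self.written = [False] * (block_size // chunk_size)

    def record_write(self, offset, length):
        """Mark the chunks covered by a write; return True once the block is complete."""
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for index in range(first, last + 1):
            self.written[index] = True
        return all(self.written)


# Four 16-byte writes arrive out of order; the write pointer advances only
# after the last missing piece of the block has landed.
write_ptr = 0
tracker = BlockWriteTracker()
for offset in (32, 0, 48, 16):
    if tracker.record_write(offset, 16):
        write_ptr += 1
```

Only the final write (offset 16) completes the block, so `write_ptr` is advanced exactly once in this example.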

FIG. 4A shows a flow diagram of a dynamic queue selection process for processing commands in a NIC. During operation, the host device of the NIC may obtain the last known state of the command queue (operation 402). The device may then determine whether the command queue is empty (operation 404). If the command queue is not empty, the device may determine whether it is beneficial to speculatively issue commands to the internal buffer (operation 406). For example, if the command queue is likely to be empty, it may be beneficial to speculatively issue commands to the internal buffer.

If it is not beneficial to speculatively issue the command, the device may maintain the memory-based command path (operation 408) and continue to obtain the state of the command queue (operation 402). On the other hand, if the command queue is empty (operation 404) or it is beneficial to speculatively issue a command (operation 406), the device may switch to the low latency command path (operation 410). It should be noted that the memory-based command path may be a default option for the device. Unless switched to the low latency command path, the device may continue to use the memory-based command path to transmit commands to the NIC.
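The selection flow of operations 402 to 410 can be sketched as a small decision function. The description does not say how "beneficial to speculatively issue" is decided, so the threshold on outstanding commands used here is a hypothetical heuristic, as are the names `CommandPath`, `select_command_path`, and `speculation_threshold`.

```python
from enum import Enum


class CommandPath(Enum):
    MEMORY_BASED = "memory-based"
    LOW_LATENCY = "low-latency"


def select_command_path(write_ptr, prefetch_ptr, speculation_threshold=1):
    """Choose a command path from the last known queue state (operation 402)."""
    outstanding = write_ptr - prefetch_ptr      # commands the NIC has not fetched yet
    if outstanding == 0:                        # operation 404: queue is empty
        return CommandPath.LOW_LATENCY          # operation 410
    # Operation 406. Hypothetical heuristic: with few outstanding commands the
    # queue is likely to drain soon, so speculating on the buffer may pay off.
    if outstanding <= speculation_threshold:
        return CommandPath.LOW_LATENCY          # operation 410
    return CommandPath.MEMORY_BASED             # operation 408: keep the default path
```

Setting `speculation_threshold=0` disables speculation in this sketch, so the low latency path is chosen only when the last known state shows an empty queue.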

FIG. 4B shows a flow diagram of a dynamic command management process for a host device. During operation, the device may generate a command (e.g., in a format acceptable to the device's NIC), insert the command into a command queue associated with the command (operation 432), and advance the device's copy of the write pointer (operation 434). The device may then check whether the low latency command path is selected (operation 436). If the low latency command path is selected, the device may also insert the command into a NIC internal buffer associated with the command queue (operation 438).

If the low latency command path is not selected (operation 436), the device may advance the write pointer in the NIC (operation 440). The device may then check whether the device's copy of the write pointer has reached the read pointer (operation 442). If the write pointer has not reached the read pointer, the device may continue to generate and insert commands into the command queue associated with the command (operation 432). However, if the write pointer has reached the read pointer, the command queue may be full and the device may avoid issuing more commands (operation 444).
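The host-side flow of operations 432 to 444 might look like the following sketch. It models the NIC registers and internal buffer as plain Python attributes, which is an assumption made purely for illustration; `HostQueueState` and `host_issue` are hypothetical names. The sketch also assumes that the fullness check of operation 442 runs on both branches, which the flow description leaves open.

```python
class HostQueueState:
    """Host-side view of one command queue (all fields are illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = [None] * capacity  # command queue in host memory
        self.host_write_ptr = 0         # the host's copy of the write pointer
        self.nic_write_ptr = 0          # write pointer held in the NIC
        self.read_ptr = 0               # last read pointer reported by the NIC
        self.nic_buffer_writes = []     # stand-in for writes to the NIC internal buffer


def host_issue(state, command, low_latency_selected):
    """Issue one command; return False if the queue is full afterwards."""
    state.queue[state.host_write_ptr % state.capacity] = command  # operation 432
    state.host_write_ptr += 1                                     # operation 434
    if low_latency_selected:                                      # operation 436
        # Operation 438. A real driver would place a write memory barrier
        # between the queue write above and this buffer write.
        state.nic_buffer_writes.append(command)
    else:
        state.nic_write_ptr = state.host_write_ptr                # operation 440
    queue_full = (state.host_write_ptr - state.read_ptr) == state.capacity  # operation 442
    return not queue_full                                         # False maps to operation 444
```

Note that on the low latency branch the command is still stored in the host queue first, so the queue copy remains available if the NIC does not accept the buffer write.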

FIG. 4C shows a flow diagram of a dynamic command management process for a memory-based command path in a NIC. During operation, the NIC may select a command queue to obtain commands (e.g., based on an arbitration process) (operation 452) and check whether the corresponding internal buffer contains commands (operation 454). If the internal buffer does not contain a command, the NIC may determine whether there is a command in the command queue (operation 456). If there are commands in the command queue, the NIC may request a command from the command queue and advance the prefetch pointer (operation 458).

The NIC may then wait for the requested command to be returned (operation 460). On the other hand, if the internal buffer contains a command, the NIC may obtain the command from the internal buffer associated with the command queue and advance the prefetch pointer (operation 464). After obtaining the command (operations 460 or 464), the NIC may advance the read pointer (operation 462).
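The NIC-side fetch flow of operations 454 to 464 can be sketched as follows. The sketch collapses the asynchronous host-interface read (operations 458 and 460) into a direct list access, which is a simplification; `NicQueueState` and `nic_fetch_next` are hypothetical names, and the arbitration step of operation 452 is assumed to have already picked this queue.

```python
from collections import deque


class NicQueueState:
    """NIC-side view of one command queue (all fields are illustrative)."""

    def __init__(self, host_queue, write_ptr):
        self.host_queue = host_queue   # stand-in for the queue in host memory
        self.write_ptr = write_ptr
        self.prefetch_ptr = 0
        self.read_ptr = 0
        self.internal_buffer = deque()


def nic_fetch_next(nic):
    """Return (command, source), or None if no command is pending."""
    if nic.internal_buffer:                              # operation 454
        command = nic.internal_buffer.popleft()          # operation 464
        source = "internal buffer"
    elif nic.prefetch_ptr != nic.write_ptr:              # operation 456
        # Operations 458 and 460: hardware would issue a read over the host
        # interface and wait for the data; here it is a direct list access.
        command = nic.host_queue[nic.prefetch_ptr % len(nic.host_queue)]
        source = "command queue"
    else:
        return None
    nic.prefetch_ptr += 1    # the prefetch pointer advances on either path
    nic.read_ptr += 1        # operation 462, once the command is in hand
    return command, source
```

In this sketch a buffered command is consumed without touching host memory, while the prefetch pointer still advances so that the backup copy in the command queue is skipped.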

FIG. 4D shows a flow diagram of a command management process for a low latency command path in a NIC. During operation, the NIC may receive a command from the host device (operation 472) and determine whether the command queue is empty (operation 474). If the command queue is empty, the NIC may determine whether there is available capacity in the internal buffer to accommodate the received command (operation 476). If the internal buffer has available capacity, the NIC may insert the command into the internal buffer (operation 478) and advance its local copy of the write pointer (i.e., the NIC's copy) (operation 480). On the other hand, if the command queue is not empty (operation 474) or the internal buffer does not have available capacity (operation 476), the NIC may continue to use the memory-based command path (operation 482).
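The acceptance check of operations 472 to 482 can be sketched as shown here. The sketch treats the queue as empty when the prefetch pointer equals the write pointer and measures buffer capacity in bytes; the byte-based capacity, `NicLowLatencyState`, and `nic_receive_low_latency` are assumptions introduced for illustration only.

```python
from collections import deque


class NicLowLatencyState:
    """NIC-side state for the low latency path (all fields are illustrative)."""

    def __init__(self, buffer_capacity_bytes):
        self.buffer_capacity_bytes = buffer_capacity_bytes
        self.internal_buffer = deque()
        self.write_ptr = 0     # the NIC's local copy of the write pointer
        self.prefetch_ptr = 0


def nic_receive_low_latency(nic, command: bytes):
    """Return True if the command was buffered, False on fallback to the queue."""
    queue_empty = nic.prefetch_ptr == nic.write_ptr                      # operation 474
    used = sum(len(entry) for entry in nic.internal_buffer)
    has_room = used + len(command) <= nic.buffer_capacity_bytes          # operation 476
    if queue_empty and has_room:
        nic.internal_buffer.append(command)                              # operation 478
        nic.write_ptr += 1                                               # operation 480
        return True
    return False  # operation 482: the copy in the command queue is used instead
```

A rejected write loses nothing in this model, because the host has already stored the same command in the command queue before attempting the buffer write.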

Exemplary Computer System

FIG. 5 illustrates an exemplary computer system equipped with a NIC that facilitates dynamic command management. The computer system 550 includes a processor 552, a memory device 554, and a storage device 556. Memory device 554 may include a volatile memory device (e.g., a dual in-line memory module (DIMM)). Further, the computer system 550 may be coupled to a keyboard 562, a pointing device 564, and a display device 566. The storage device 556 may store an operating system 570. Application programs 572 may run on the operating system 570.

Computer system 550 may be equipped with a host interface that couples to a NIC 520, which facilitates efficient command management. The NIC 520 may provide one or more HNIs to the computer system 550. NIC 520 may be coupled to switch 502 via one of the HNIs. NIC 520 may include command logic block 530, as described in connection with FIGs. 2B and 3. The command logic block 530 may include a retrieval logic block 532 and an execution logic block 534. The retrieval logic block 532 may provide information associated with the state of the command queue 560, as known by the command logic block 530, to the computer system 550 via the HI.

The device driver 580 of the NIC 520 running on the operating system 570 may select the command path based on the provided information. The driver 580 may dynamically determine whether to use the memory-based command path or the low-latency command path based on the current state of the command queue 560 in the memory device 554. In addition, the driver 580 may speculatively determine that the command queue 560 may be empty and that the internal buffer 536 should have available capacity. Accordingly, NIC 520 may obtain commands from internal buffer 536, if possible. Otherwise, NIC 520 may obtain the command from command queue 560.

The retrieval logic block 532 may determine whether the internal buffer 536 of the NIC 520 contains a command. If the internal buffer 536 contains a command, the retrieval logic block 532 may obtain the command from the internal buffer 536. On the other hand, if the internal buffer 536 does not contain a command, the retrieval logic block 532 may obtain the command from the command queue 560 in the memory device 554. In either case, the retrieval logic block 532 may advance the prefetch pointer. The execution logic block 534 may execute the command. The execution logic block 534 may then advance the read pointer.

In summary, the present disclosure describes a NIC that facilitates efficient command management. The NIC may be equipped with a host interface, arbitration logic, and command management logic. During operation, the host interface may couple the NIC to a host device. The arbitration logic block may select a command queue of the host device to obtain a command. The command management logic may determine whether an internal buffer associated with the command queue contains a command. The command management logic may obtain the command from the internal buffer if the internal buffer contains the command. On the other hand, if the internal buffer is empty, the command management logic may obtain the command from the command queue via the host interface.

The methods and processes described above may be performed by hardware logic blocks, modules, or devices. A hardware logic block, module, or device may include, but is not limited to, an Application Specific Integrated Circuit (ASIC) chip, a Field Programmable Gate Array (FPGA), a dedicated or shared processor that executes code at a particular time, and other programmable logic devices now known or later developed. When a hardware logic block, module, or device is activated, it performs the methods and processes included therein.

The methods and processes described herein may also be embodied as code or data, which may be stored in a storage device or computer-readable storage medium. The methods and processes may be performed by a processor when the stored code or data is read and executed by the processor.

The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. The description is not intended to be exhaustive or to limit the invention to the precise form disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the invention is defined by the appended claims.
