System and method for adaptive routing in the presence of a continuous flow

Publication No. 1942987, published 2021-12-07

This technology, "System and method for adaptive routing in the presence of a continuous flow," was created by D. Roweth on 2020-03-23. Abstract: Systems and methods for providing adaptive routing in the presence of a continuous flow are described. The switches in the fabric have the ability to establish flow channels. A switch may adaptively route flows while monitoring the transmission characteristics of the flow channels to identify whether any flow is experiencing congestion on its way toward the destination. In response to detecting congestion, it may further be determined whether the flow is a source of the congestion or, alternatively, a congestion victim. The routing of flows that are sources of congestion is constrained to prevent the congestion from propagating. For example, new packets of a flow that is a congestion source may be forced to take only the data transmission path on which the congestion was detected (preventing the congestion from spreading). In contrast, the routing of a congestion victim is not constrained, and its packets may take any path permitted by adaptive routing.

1. A method of routing a plurality of data transmissions in a network having a plurality of switches, the method comprising:

establishing a plurality of flow channels corresponding to each of a plurality of flows comprising the plurality of data transmissions;

adaptively routing the plurality of flows through the network;

monitoring transmission characteristics of each of the plurality of flows via the corresponding flow channel to identify a flow from among the plurality of flows that is experiencing congestion;

in response to identifying a flow that is experiencing congestion, identifying whether the flow is related to a source of congestion; and

in response to identifying that the flow is related to a congestion source, constraining routing decisions for the congestion source flow via the corresponding flow channel such that congestion does not propagate in the network.

2. The method of claim 1, further comprising:

in response to identifying a flow that is experiencing congestion, identifying whether the flow is a congestion victim; and

continuing to adaptively route the congestion victim flow through the network in response to identifying that the flow is a congestion victim.

3. The method of claim 1, wherein the plurality of data transmissions are communicated from a plurality of source ports to a plurality of destination ports via a plurality of fabric ports associated with the plurality of switches in the network.

4. The method of claim 1, wherein each of the plurality of flow channels corresponds to a source port and destination port pair of a respective flow.

5. The method of claim 4, wherein adaptively routing comprises performing routing decisions at each of the fabric ports.

6. The method of claim 1, wherein at least a portion of the plurality of flows comprises a continuous flow.

7. The method of claim 6, wherein adaptive routing comprises:

utilizing the plurality of flow channels to select a path for a first packet in a flow for each of the plurality of flows.

8. The method of claim 7, wherein adaptively routing further comprises dynamically rerouting flows in a direction away from a congestion point in the network.

9. The method of claim 8, wherein dynamically rerouting the flow comprises:

utilizing the plurality of flow channels to force subsequent packets in a flow to follow the same path taken by the first packet such that the ordering of the packets is maintained throughout the data transmission.

10. The method of claim 9, wherein identifying flows that are experiencing congestion is based on congestion acknowledgements indicated in flow channels monitored for the flows.

11. The method of claim 10, wherein the congestion acknowledgement signals upstream that the flow is experiencing congestion.

12. The method of claim 11, wherein adaptively rerouting the flow comprises:

transmitting the congestion acknowledgement upstream to an Input Flow Channel Table (IFCT); and

in response to receiving the congestion acknowledgement at each of the upstream IFCTs, decreasing a maximum flow_extent to prevent new packets from being routed to the congested destination.

13. The method of claim 1, wherein identifying whether the flow is related to a source of congestion is based on a congestion detection capability of each of the plurality of switches in the network.

14. A switch, comprising:

an Application Specific Integrated Circuit (ASIC) to:

establish a plurality of flow channels corresponding to each of a plurality of flows comprising a plurality of data transmissions;

adaptively route the plurality of flows through a network;

monitor transmission characteristics of each of the plurality of flows via the corresponding flow channel to identify a flow from among the plurality of flows that is experiencing congestion;

in response to identifying a flow that is experiencing congestion, identify whether the flow is related to a source of congestion; and

in response to identifying that the flow is related to a congestion source, constrain routing decisions for the congestion source flow via the corresponding flow channel such that congestion does not propagate in the network.

15. The switch of claim 14, wherein the ASIC is further to:

in response to identifying a flow that is experiencing congestion, identify whether the flow is a congestion victim; and

continue to adaptively route the congestion victim flow through the network in response to identifying that the flow is a congestion victim.

16. The switch of claim 15, wherein the ASIC is further to:

detect congestion within the network by measuring a congestion level associated with an egress edge port; and

limit injection of new packets that are part of the congestion source flow into a network fabric.

Background

As network-enabled devices and applications become more prevalent, various types of traffic and increasing network loads continue to demand higher performance from the underlying network architecture. For example, applications such as High Performance Computing (HPC), media streaming, and the Internet of Things (IoT) can generate different types of traffic with distinct characteristics. As a result, network architectures continue to face challenges such as scalability, versatility, and efficiency, in addition to conventional network performance metrics such as bandwidth and latency.

Drawings

The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict typical or example embodiments.

FIG. 1 illustrates an example network in which various embodiments may be implemented.

Fig. 2A illustrates an example switch that facilitates flow channels.

Fig. 2B illustrates an example of how switches along a data path may maintain flow state information.

Fig. 3A illustrates an example fabric header for a data packet.

Fig. 3B illustrates an example Acknowledgement (ACK) packet format.

FIG. 3C illustrates example relationships between different variables used to derive and maintain state information for a flow.

Fig. 4A illustrates an example of how a flow may be delivered using a flow channel table.

Fig. 4B illustrates an example of an Edge Flow Channel Table (EFCT).

Fig. 4C illustrates an example of an Input Flow Channel Table (IFCT).

Fig. 4D illustrates an example of an Output Flow Channel Table (OFCT).

Fig. 5 illustrates an example of a network experiencing congestion in which adaptive routing techniques in the presence of a continuous flow may be implemented.

Fig. 6 illustrates a flow diagram of an exemplary process of adaptive routing in the presence of a continuous flow, in accordance with various embodiments.

Figure 7 illustrates an example of a hardware architecture of a switch that facilitates flow channels in accordance with various embodiments.

FIG. 8 is an example computing component that may be used to implement various features of embodiments described in this disclosure.

The drawings are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed.

Detailed Description

A large network consists of many individual switches connected by many data links. Conventional networks split data into manageable chunks, referred to as packets or frames. This allows many separate and distinct communications to share the bandwidth of a single link. In particular, a single large data transfer of one communication will not prevent many other small communications from completing. The large communication is broken into many individual packets, and these packets are time-multiplexed with the packets of other small and large communications. This approach allows a single shared network resource to carry many concurrent communications and significantly reduces the maximum latency of small communications in the presence of large communications.

However, sharing resources among many disparate communications works well only if no communication can exhaust the shared resources needed by other communications. It is also important that access to the shared resources remains fair and appropriate to the importance of each communication being made. When routing packets between a source node and a destination, typical routing techniques are either static or adaptive (dynamic). In one example of adaptive routing, local routing decisions are made dynamically based on load information and other factors. In current systems, adaptive routing that does not take the source of congestion into account can cause congestion to spread. According to the adaptive routing techniques disclosed herein, certain data flows may be identified as congestion sources, while other data flows may be identified as congestion victims. As will be described in detail, the adaptive routing techniques may allow a victim flow to continue making normal dynamic routing decisions, while flows that are causing congestion will have their routing restricted. Further, the disclosed adaptive routing techniques handle persistent flows.

This disclosure describes systems and methods that can accommodate exascale computing, for example, performing data-intensive tasks such as simulation, data analytics, and artificial intelligence workloads at exascale speeds. In particular, an HPC network or interconnect fabric is provided that may be Ethernet-compatible, able to connect to third-party data storage, and built from extremely high-bandwidth switch components (e.g., about 12.8 Tb/s/dir per switch) with, for example, 64 ports of 200 Gbps each, supporting the creation of large networks with very low diameter (e.g., only three network hops). Furthermore, low latency can be achieved through a novel congestion control mechanism, adaptive routing, and the use of traffic classes that allow flexibility in bandwidth shaping, priority, and routing policies.

With respect to adaptive routing, the techniques and systems described herein may enable dynamic routing of flows by leveraging the identification and management of flow channels. When routing packets between a source node and a destination, typical routing techniques are either static or adaptive (dynamic). In one example of adaptive routing, local routing decisions are made dynamically based on load information and other factors. In current systems, such adaptive routing can cause congestion to spread. In contrast, with the techniques described herein, some data flows may be identified as congestion sources, while other data flows may simply be identified as congestion victims. In the adaptive routing techniques that handle continuous flows, as disclosed herein, a victim flow is allowed to continue making normal dynamic routing decisions, while the routing of flows that cause congestion is limited. As alluded to above, this capability is enabled through the identification and management of flow channels.

Fig. 1 shows an example network 100 that includes a plurality of switches, which may also be referred to as a "switch fabric." As illustrated in fig. 1, network 100 may include switches 102, 104, 106, 108, and 110. Each switch may have a unique address or Identifier (ID) within the switch fabric 100. Various types of devices and networks may be coupled to the switch fabric. For example, storage array 112 may be coupled to switch fabric 100 via switch 110; an InfiniBand (IB)-based HPC network 114 may be coupled to switch fabric 100 via switch 108; a number of end hosts (such as host 116) may be coupled to switch fabric 100 via switch 104; and IP/Ethernet network 118 may be coupled to switch fabric 100 via switch 102. For example, a switch such as switch 102 may receive 802.3 frames (including encapsulated IP payloads) from an Ethernet device, such as a Network Interface Card (NIC), switch, router, or gateway. IPv4 or IPv6 packets, frames formatted specifically for network 100, and so on may also be received and transmitted through switch fabric 100 to another switch (e.g., switch 110). Thus, network 100 is capable of handling multiple types of traffic simultaneously. In general, a switch may have edge ports and fabric ports. An edge port may be coupled to a device external to the fabric. A fabric port may be coupled to another switch within the fabric via a fabric link.

In general, traffic may be injected into the switch fabric 100 via an ingress port of an edge switch and exit the switch fabric 100 via an egress port of another (or the same) edge switch. The ingress edge switch may aggregate the injected packets into flows, which may be identified by flow IDs. The concept of a flow is not limited to a particular protocol or layer, such as layer 2 or layer 3 in the Open Systems Interconnection (OSI) reference model. For example, a flow may be mapped to traffic having a particular Ethernet source address, traffic between a source IP address and a destination IP address, traffic corresponding to a TCP or UDP port/IP 5-tuple (source and destination IP addresses, source and destination TCP or UDP port numbers, and IP protocol number), or traffic generated by a process or thread running on an end host. In other words, a flow may be configured to map to data between any physical or logical entities. The configuration of this mapping may be done remotely or locally at the ingress edge switch.

Upon receiving an injected packet, the ingress edge switch may assign a flow ID to the flow. The flow ID may be included in a special header with which the ingress edge switch may encapsulate the injected packet. In addition, the ingress edge switch may examine the original header fields of the injected packet to determine the appropriate egress edge switch address and include this address as the destination address in the encapsulation header. Note that the flow ID may be a link-specific, locally valid value, and this value may be unique only to a particular input port on a switch. When the packet is forwarded to the next-hop switch, the packet enters another link and the flow ID may be updated accordingly. As the packets of a flow traverse multiple links and switches, the flow IDs corresponding to the flow may form a unique chain. That is, at each switch, the packet's flow ID may be updated to the flow ID used by the outgoing link before the packet leaves the switch. This upstream-to-downstream one-to-one mapping between flow IDs may begin at the ingress edge switch and end at the egress edge switch. Because the flow IDs need only be unique within an incoming link, a switch can accommodate a large number of flows. For example, if a flow ID is 11 bits long, an input port can support up to 2048 flows. In addition, the match pattern (one or more header fields of a packet) used for mapping to a flow may include a larger number of bits. For example, a 32-bit match pattern (which may include multiple fields in the packet header) can map 2^32 different header field patterns. If the fabric has N ingress edge ports, a total of N × 2^32 identifiable flows can be supported.
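To make the link-local numbering concrete, the following minimal Python sketch (not taken from the patent; the class and function names are hypothetical) allocates an independent flow ID on each link a packet traverses, illustrating how the per-hop chain of flow IDs is formed and why an 11-bit ID supports 2048 concurrent flows per input port.

```python
# Illustrative sketch: flow IDs are only unique on their own link, so each hop
# can remap them independently, forming a chain of IDs from ingress to egress.

FLOW_ID_BITS = 11          # example width from the text: 2**11 = 2048 flows per input port


class LinkFlowIdAllocator:
    """Allocates link-local flow IDs for one outgoing link."""

    def __init__(self, bits=FLOW_ID_BITS):
        self.free = list(range(2 ** bits))

    def allocate(self):
        return self.free.pop(0)

    def release(self, flow_id):
        self.free.append(flow_id)


def build_flow_id_chain(links):
    """Assign one flow ID per traversed link; the resulting list is the chain."""
    return [alloc.allocate() for alloc in links]


if __name__ == "__main__":
    path = [LinkFlowIdAllocator() for _ in range(3)]   # ingress link + two fabric links
    chain = build_flow_id_chain(path)
    print("per-link flow ID chain:", chain)            # IDs need only be unique per link
    print("flows per input port:", 2 ** FLOW_ID_BITS)  # 2048 for an 11-bit flow ID
```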

The switch may assign a separate dedicated input queue to each flow. This configuration allows the switch to monitor and manage the congestion level of each individual flow and prevent the head-of-queue blocking that could occur if a shared buffer were used for multiple flows. When a packet is delivered to the destination egress switch, the egress switch may generate an Acknowledgement (ACK) and send the ACK back to the ingress edge switch along the same data path in the upstream direction. Because this ACK packet traverses the same data path, switches along the path can obtain status information associated with the delivery of the corresponding flow by monitoring the amount of outstanding (unacknowledged) data. This state information can then be used to perform flow-specific traffic management to ensure overall network health and fair handling of flows. As explained in more detail below, this per-flow queuing, in combination with flow-specific delivery acknowledgements, may allow the switch fabric to implement efficient, fast, and accurate congestion control. In turn, the switch fabric can deliver traffic with significantly improved network utilization without suffering from congestion.

Flows may be set up and released dynamically or "on the fly" based on demand. In particular, a flow may be set up by an ingress edge switch (e.g., establishing a mapping of flow IDs to packet headers) when a data packet arrives at the switch and no flow ID has been previously assigned to this packet. As this packet travels through the network, a flow ID may be assigned along each switch traversed by the packet, and a chain of flow IDs may be established from ingress to egress. Subsequent packets belonging to the same flow may use the same flow ID along the data path. When a packet is delivered to a destination egress switch and an ACK packet is received by the switches along the data path, each switch may update its state information regarding the amount of outstanding, unacknowledged data for the flow. When the switch's input queue for the flow is empty and there is no more unacknowledged data, the switch may release the flow ID (i.e., release the flow channel) and reuse the flow ID for other flows. Such a data-driven dynamic flow setup and teardown mechanism may eliminate the need for centralized flow management and allow the network to quickly respond to traffic pattern changes. Note that the network architecture described herein is distinct from Software Defined Networking (SDN), which typically uses the OpenFlow protocol. In SDN, switches are configured by a central network controller and forward packets based on one or more fields in a layer 2 (data link layer, such as ethernet), layer 3 (network layer, such as IP), or layer 4 (transport layer, such as TCP or UDP) header. In SDN, such header-field lookups are performed at each switch in the network and there is no flow ID based fast forwarding as done in the networks described herein. Furthermore, since OpenFlow header-field lookups are done using Ternary Content Addressable Memory (TCAM), the cost of such lookups can be high. Moreover, since the header-field mapping configuration is done by the central controller, the setup and teardown of each mapping is relatively slow and may require a significant amount of control traffic. As a result, SDN networks may be slow to respond to various network conditions (such as congestion). In contrast, in the networks described herein, flows may be dynamically set up and torn down based on traffic demand; and packets may be forwarded according to a fixed-length flow ID. In other words, flow channels may be data driven and managed (i.e., set up, monitored, and torn down) in a distributed manner without intervention by a central controller. Furthermore, flow ID based forwarding can reduce the amount of TCAM space used and, as a result, can accommodate a much larger number of flows.
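The data-driven setup and teardown described above can be pictured with a small, hedged Python sketch. It is not the patent's implementation; the dictionary-based table, field names, and byte counters are illustrative stand-ins for the hardware flow tables.

```python
# Minimal sketch of data-driven flow setup and teardown, assuming a simple
# in-memory table keyed by a header match pattern; names are illustrative only.

class FlowTable:
    def __init__(self):
        self.by_match = {}     # match pattern -> flow_id
        self.state = {}        # flow_id -> {"unacked": bytes, "queued": bytes}
        self.next_id = 0

    def on_packet(self, match_pattern, size):
        flow_id = self.by_match.get(match_pattern)
        if flow_id is None:                     # first packet of the flow: set up on the fly
            flow_id = self.next_id
            self.next_id += 1
            self.by_match[match_pattern] = flow_id
            self.state[flow_id] = {"unacked": 0, "queued": 0}
        self.state[flow_id]["queued"] += size
        return flow_id

    def on_forward(self, flow_id, size):        # packet leaves the input queue
        st = self.state[flow_id]
        st["queued"] -= size
        st["unacked"] += size

    def on_ack(self, flow_id, size, match_pattern):
        st = self.state[flow_id]
        st["unacked"] -= size
        if st["unacked"] == 0 and st["queued"] == 0:   # tear down and reuse the flow ID
            del self.state[flow_id]
            del self.by_match[match_pattern]
```

The key property the sketch illustrates is that no central controller is involved: an entry appears when the first unmatched packet arrives and disappears when the queue is empty and nothing remains unacknowledged.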

Referring to the example shown in FIG. 1, assume that storage array 112 is to send data to host 116 using TCP/IP. During operation, storage array 112 may send a first packet using the IP address of host 116 (as the destination address) and a predetermined TCP port specified in the TCP header. When this packet arrives at switch 110, a packet processor at the input port of switch 110 can identify the packet's TCP/IP 5-tuple. The packet processor of switch 110 may also determine that this 5-tuple is not currently mapped to any flow ID and may assign a new flow ID to this 5-tuple. Further, switch 110 may determine the egress switch (which is switch 104) for this packet based on the destination (i.e., host 116's) IP address (assuming switch 110 knows that host 116 is attached to switch 104). Switch 110 may then encapsulate the received packet with a fabric header that indicates the newly assigned flow ID and the fabric address of switch 104. Switch 110 may then schedule the encapsulated packet to be forwarded towards switch 104 based on a fabric forwarding table, which may be computed by all switches in fabric 100 using a routing algorithm, such as a link-state or distance-vector algorithm.

Note that when the first packet is received, the operations described above may be performed substantially at line speed with little buffering and delay. After the first packet is processed and scheduled for transmission, subsequent packets of the same flow can be processed by switch 110 even faster, because the same flow ID is used. In addition, the design of the flow channels may be such that the allocation, matching, and deallocation of flow channels have substantially the same cost. For example, a conditional allocation of a flow channel based on a lookup match and a separate, independent deallocation of another flow channel may be performed concurrently in nearly every clock cycle. This means that creating and controlling flow channels adds little additional overhead to the regular forwarding of packets. On the other hand, the congestion control mechanism may improve the performance of some applications by more than three orders of magnitude.

At each switch along the data path (which includes switches 110, 106, and 104), a dedicated input buffer may be provided for the flow and the amount of transmitted but unacknowledged data may be tracked. When the first packet arrives at switch 104, switch 104 may determine that the destination fabric address in the fabric header of the packet matches its own address. In response, switch 104 may decapsulate the packet from the fabric header and forward the decapsulated packet to host 116. In addition, switch 104 may generate an ACK packet and send this ACK packet back to switch 110. As this ACK packet traverses the same data path, switches 106 and 110 may each update their own state information for the flow's unacknowledged data.

Generally, congestion within a network causes network buffers to fill. When a network buffer is full, traffic trying to pass through that buffer should ideally be slowed down or stopped. Otherwise, the buffer may overflow and packets may be lost. In conventional networks, congestion control is typically done end-to-end at the edge. The core of the network is assumed to function only as a "dumb pipe" whose primary purpose is to forward traffic. Such network designs often suffer from slow responses to congestion, because congestion information often cannot be sent quickly to the edge devices, and the resulting actions taken by the edge devices are not always effective at removing the congestion. This slow response in turn limits the utilization of the network, because network operators often need to limit the total amount of traffic injected into the network in order to keep the network free of congestion. Furthermore, end-to-end congestion control is usually effective only if the network is not already congested. Once the network is heavily congested, end-to-end congestion control will not work, because the congestion notification messages themselves will be congested (unless a separate control-plane network, distinct from the data-plane network, is used to send the congestion control messages).

In contrast, flow channels may prevent such congestion from growing within the switch fabric. The flow channel mechanism may recognize when a flow is experiencing some degree of congestion and, in response, may slow down or stop new packets of the same flow from entering the fabric. These new packets, in turn, may be buffered in a flow channel queue on the edge port, and they are only allowed into the fabric as packets of the same flow exit the fabric at the destination edge port. This process may limit the total buffering requirement of the flow within the fabric to an amount that will not cause the fabric buffers to become too full.

With flow channels, the switch has fairly accurate state information about the amount of outstanding, in-transit data within the fabric. This state information for all flows on an ingress edge port may be aggregated. This means that the total amount of data injected through the ingress edge port can be known. Consequently, the flow channel mechanism may set a limit on the total amount of data in the fabric. When this limiting action is applied by all edge ports, the amount of packet data in the entire fabric can be well controlled, which in turn can prevent the entire fabric from saturating. Flow channels may also slow down the progress of an individual congested flow within the fabric without slowing down other flows. This feature can keep packets away from congestion hot spots while preventing buffers from becoming full and ensuring available buffer space for unrelated traffic.

Operation of flow channels

In general, a flow channel may define a path for each communication session across the switch fabric. The path and the amount of data belonging to each flow may be described in a set of dynamically connected flow tables associated with each link of the switch fabric. On every ingress port, edge and fabric, a set of flow channel queues may be defined. There may be one queue per flow channel. As packets arrive, they are either assigned to a flow channel on an edge port, or have already been assigned to a flow channel on a fabric ingress port by the egress fabric port of the link partner. The flow channel information may be used to direct the packets into the appropriate flow channel queues.

Fig. 2A illustrates an example switch that facilitates flow channels. In this example, the switch may include a crossbar switch 202. Crossbar switch 202 may have a number of input ports (such as input port 204) and a number of output ports (such as output port 208). Crossbar switch 202 may forward packets from the input ports to the output ports. Each input port may be associated with a number of input queues, each assigned to a different incoming flow arriving on that input port. For example, data arriving on a given port of the switch may first be separated based on their respective flows and stored in flow-specific input queues (such as input queue 206). Packets stored in the input queues may be dequeued and sent to crossbar switch 202 based on a scheduling algorithm designed to control congestion (described in more detail in later sections). On the output side, once a packet passes through crossbar switch 202, it may be temporarily stored in an output transmit queue (such as output transmit queue 210), which may be shared by all flows leaving on the same output port. Before the packet is dequeued from the output transmit queue and transmitted on the outgoing link, the packet's header may be updated with the flow ID for the outgoing link. Note that this hop-by-hop flow ID mapping may be done as the first packet of the flow travels across the network. When the packet reaches the next-hop switch, the packet may again be stored in a flow-specific input queue and the same process may be repeated. Note that a flow ID is used to distinguish flows traveling on the same fabric link and is typically assigned by the transmitter end of the link, which is the output port of the switch that transmits onto that link.
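The per-flow queuing and hop-by-hop flow ID rewrite can be sketched in a few lines of Python. This is a hedged, software-only illustration; the class name, dictionary-of-deques layout, and packet representation are assumptions, not the switch's actual microarchitecture.

```python
# Sketch of the per-flow queuing described above: flow-specific input queues,
# a stand-in for the crossbar, and a shared output transmit queue.

from collections import defaultdict, deque


class SwitchPortModel:
    def __init__(self):
        self.input_queues = defaultdict(deque)   # one queue per incoming flow ID
        self.output_transmit_queue = deque()     # shared by all flows on the output port

    def enqueue(self, incoming_flow_id, packet):
        self.input_queues[incoming_flow_id].append(packet)

    def forward(self, incoming_flow_id, outgoing_flow_id):
        """Move one packet through the 'crossbar' and rewrite its flow ID."""
        packet = self.input_queues[incoming_flow_id].popleft()
        packet["flow_id"] = outgoing_flow_id     # hop-by-hop flow ID remapping
        self.output_transmit_queue.append(packet)
        return packet


if __name__ == "__main__":
    port = SwitchPortModel()
    port.enqueue(7, {"flow_id": 7, "payload": b"abc"})
    print(port.forward(7, 42))    # same packet, now carrying the outgoing link's flow ID
```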

By providing flow-specific input queues, the switch can allow each flow to move independently of all other flows. The switch can avoid the head-of-queue blocking problem, which is common when shared input buffers are used. The flow-specific input queues also allow the packets within a single flow to be kept in order. As a flow passes through the switch, a flow-specific input queue on each input port may be allocated for the flow and become linked, effectively forming one long queue for the flow that spans the entire fabric, and the packets of the flow may be kept in order.

The progress of successful delivery of packets belonging to a flow may be reported by a series of ACKs generated by the edge port of the egress switch. These ACK packets may travel in the reverse direction along the data path traversed by the data packets and may be forwarded by the switches according to the forwarding information maintained in the flow tables. As ACK packets travel upstream, they may be processed by each switch's input queue manager, which may update the corresponding flow's state information based on the information carried by the ACK packets. An ACK packet may have a type field to provide advance information about the downstream data path, such as congestion. A switch's input queue manager may use this information to make decisions about pending packets currently buffered in its input queues, such as limiting the transmission rate or changing the forwarding path. In addition, the input queue manager may update the information carried in an ACK packet based on the buffered flow's state information, so that the upstream switches can make appropriate decisions. For example, if the input queue for a given flow is experiencing congestion (e.g., the amount of data in the queue is above a predetermined threshold), the input queue manager may update an ACK packet that is being forwarded to the next upstream switch to include this congestion information.
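The following short Python sketch illustrates this kind of ACK processing. It is an assumption-laden simplification: the threshold value, the dictionary field names, and the "congestion" type tag are placeholders, not the patent's actual ACK format.

```python
# Hedged sketch of how an input queue manager might process an ACK and
# propagate congestion information upstream; thresholds and field names are
# assumed for illustration only.

CONGESTION_THRESHOLD = 16 * 1024   # bytes queued before we flag congestion (assumed value)


def process_ack(flow_state, ack):
    """Update per-flow state from an ACK and build the ACK to send upstream."""
    flow_state["ack_flow"] = ack["ack_flow"]            # advance the acknowledged position
    upstream_ack = dict(ack)
    if flow_state["queued_bytes"] > CONGESTION_THRESHOLD:
        upstream_ack["ack_type"] = "congestion"         # tell upstream switches this flow is congested
    return upstream_ack


if __name__ == "__main__":
    state = {"ack_flow": 0, "queued_bytes": 32 * 1024}
    print(process_ack(state, {"flow_id": 5, "ack_flow": 100, "ack_type": "normal"}))
```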

If the ACK corresponds to the last packet of the flow, the switch may determine that there is no more unacknowledged data for the flow. Accordingly, the switch may release the flow channel by removing the corresponding entry in the flow table.

As mentioned above, the input queue manager at each switch may maintain information about transmitted but unacknowledged data for a given flow. Fig. 2B shows an example of how switches along a data path may maintain flow state information. In this example, the data path taken by a flow may include switches 222, 224, and 226. The amount of transmitted but unacknowledged flow data may be indicated by a variable "flow_extent", which may be measured in a number of fixed-length data units, such as 256 bytes. In addition, flow_extent and other flow state information may be maintained by a switch's input queue manager, which may continuously monitor all flow-specific queues.

In the example in fig. 2B, the flow_extent value at the input queue manager of switch 226 is 1, because one data unit has been sent out of the input queue and forwarded through the crossbar switch. Note that a data packet sent by an input queue may be temporarily buffered in the output transmission buffer due to the scheduling of all the data packets to be transmitted via the output link. When such a packet is buffered in the output port's transmit buffer, the input queue may still treat the packet as transmitted for the purpose of updating the flow_extent value.

Accordingly, because the input queue for the flow at switch 226 holds six queued data units and two additional data units are in transit between switches 224 and 226, the flow_extent value at switch 224 is 9. Similarly, the flow_extent value at switch 222 is 13, because there are three data units stored in the input queue at switch 224 and one data unit in transit between switches 222 and 224.

In general, a flow channel may remain assigned to a single flow until all ACKs for all packets sent on that flow channel have been returned. This means that flow channel table entries may remain active for longer near the fabric ingress edge ports than near the egress edge ports. If a single packet is injected into the network, a flow channel may be allocated for the ingress edge port, then another flow channel may be allocated for the next fabric link the packet traverses, and so on, until the last flow channel is allocated when the packet reaches the last fabric link. Each allocation may generate a flow ID (denoted by the variable "flow_id") that identifies an entry of the flow table for that fabric link. (More details about the flow channel tables are provided below in connection with the description of Fig. 4A.) This first packet may therefore cause a different flow_id to be allocated on each fabric link the packet traverses across the switch fabric.

At the input queue of each switch, the flow channel table entry may indicate the state information (including the flow_extent value) for each flow from that point downstream to the flow's egress destination edge port. A packet received on the local input port may increment this flow_extent value by the amount of incoming data, and an ACK may decrement the flow_extent by the amount of acknowledged, delivered data.
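A minimal sketch of this bookkeeping, using the 256-byte data units from the Fig. 2B example, is shown below. The helper names and the dictionary entry are illustrative only.

```python
# Minimal sketch of the flow_extent bookkeeping described above, measured in
# 256-byte units as in the example of Fig. 2B; function names are illustrative.

UNIT = 256  # bytes per data unit


def units(nbytes):
    return (nbytes + UNIT - 1) // UNIT


def on_packet_received(entry, nbytes):
    entry["flow_extent"] += units(nbytes)      # more data is now downstream of this point


def on_ack_received(entry, acked_bytes):
    entry["flow_extent"] -= units(acked_bytes) # acknowledged data is no longer outstanding


if __name__ == "__main__":
    entry = {"flow_extent": 0}
    on_packet_received(entry, 3 * UNIT)
    on_ack_received(entry, UNIT)
    print(entry["flow_extent"])   # 2 data units still unacknowledged downstream
```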

When a packet reaches its final destination egress port, an ACK packet may be generated and returned for the packet. At each switch along the data path, this ACK may be routed using the data path information stored in the corresponding entry of the flow channel table. As a result, the ACK packet itself need not carry path information and can therefore be small and lightweight. If no other packets are sent on the flow, the ACK may release each flow channel in the reverse order. Once released, the flow channel at each switch may be assigned to a different flow.

If another packet follows the first packet on the same flow, the ACK corresponding to this second packet will need to be received before the flow channel can be released at a given switch. In one embodiment, the flow channel may be released only after the ACKs for all transmitted packets of the same flow have been returned.

In general, various protocols may require in-order packet delivery. Flow channels may be used to guarantee this delivery order even when the fabric uses adaptive routing for load balancing across multiple data paths. If packets between an ingress and an egress edge port (possibly in different switches on opposite sides of the fabric) are injected at a very low rate, each injected packet may reach its destination and return an ACK to the source before the next packet is injected. In this case, each packet may effectively be a lead, path-defining packet and may use the best available dynamic adaptive routing to freely take any path across the fabric. This is possible because the first packet may define the path of the flow through the fabric.

Now assume that the packet injection rate is increased slightly, to the point where the next packet of the same flow is injected before the ACK for the current packet has returned to the source. The second packet may pass the ACK of the first packet somewhere along the flow's data path. Beyond this passing point, the ACK will release the flow channel allocated for the first packet, because the flow_extent value associated with the first packet returns to zero when the flow channel logic processes the ACK. Meanwhile, the second packet can now define a new flow, because it again causes flow channels to be allocated on each of the subsequent fabric links. This second packet, while causing flow channels to be allocated beyond the passing point, may be forwarded onto a different path based on dynamic adaptive routing. Before the passing point, on the other hand, the second packet may extend the outstanding flow created by the first packet to include the second packet. This means that the ACK of the first packet may not reduce the flow_extent value to zero, and the flow channel remains active up to the passing point. It also means that the second packet may follow the exact path taken by the first packet up to the passing point. Note that while it follows the previous packet, the second packet cannot reach the egress edge port before the first packet does, and thus the correct packet order can be maintained.

If the injection rate of the flow is increased further, the second packet will pass the ACK of the first packet at a position closer to the destination edge port. Depending on the packet injection rate and the packet-ACK round-trip delay of the flow, it is also possible that a third, fourth, fifth, or additional packet may enter the fabric before the ACK of the first packet has returned to the source edge port. The maximum packet rate may depend on the size of the packets and the bandwidth of the links. The round-trip delay of the data packets and ACKs can be an important parameter for a fabric implementation and may be used, together with the maximum packet rate, to calculate the maximum number of flow channels required for each link. Ideally, the design can provide a reasonable number of unallocated flow channels regardless of the traffic pattern. The demand on the number of flow channels may be high when a large number of packets arriving at an ingress edge port have different destinations and the packets are small and injected at a high rate. In the most extreme case, each packet may be assigned a different flow channel. These flow channels are released when the ACKs for the packets return. Accordingly, the required number of flow channels may be calculated as ((packet rate) × (average packet-ACK round-trip delay)).
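A small worked example of this sizing formula is shown below. The link speed, packet size, and round-trip delay used here are assumed values for illustration, not figures from the patent.

```python
# Worked example of the sizing formula above: required flow channels ≈
# packet rate × average packet-ACK round-trip delay.

import math


def required_flow_channels(link_gbps, packet_bytes, rtt_seconds):
    packet_rate = (link_gbps * 1e9) / (packet_bytes * 8)   # packets per second on the link
    return math.ceil(packet_rate * rtt_seconds)


if __name__ == "__main__":
    # e.g. a 200 Gbps link, 256-byte packets, 2-microsecond packet-ACK round trip
    print(required_flow_channels(200, 256, 2e-6))   # ~196 flow channels needed on this link
```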

Note that the packet rate on a single flow channel should not be confused with the packet rate on a link. If the traffic pattern is such that many small packets are sent to different destinations, consecutive packets sent onto a link may have different destinations. This means that each packet may belong to a different flow and may be the only packet to use the corresponding flow channel. In this example, the link would experience a high packet rate, but the packet rate of each individual flow would be low. Optionally, a number of ACKs (e.g., 48 ACKs) may be aggregated into a single ACK frame for transmission over a link and protected by a frame check sequence (e.g., a 32-bit FCS). For example, each ACK may occupy 25 bits and the frame may have an overhead of 9 bytes. That is, the overhead per ACK on a full-size frame is about 9 / (25/8 × 48) × 100% = 6%. The logic may optimize the number of ACKs per frame so that, when ACKs arrive slowly, they do not have to wait too long to be aggregated. For example, the ACK aggregation logic block may use three timers to manage ACK transmission based on the activity of the outgoing link. These timers may be started when a new ACK arrives at the ACK aggregation logic block. If the outgoing link is idle, a first timer (which may be set to 30 ns, for example) may be used to hold the ACK while waiting for additional ACKs to arrive. When this timer expires, all ACKs received within the corresponding time window may be aggregated into one frame and transmitted onto the outgoing link. If the outgoing link is busy, a second timer (which may be set to 60 ns, for example) may be used to wait for additional ACKs. Using this second timer may allow more ACKs to be aggregated into a single frame, and the frame may be transmitted only if a predetermined number of ACKs are collected. Note that, due to Ethernet framing constraints, some numbers of ACKs in a single frame may use less wire bandwidth per ACK than other numbers of ACKs. If a sufficient number of ACKs are not collected and the outgoing link is still busy sending normal data packets, a third timer (which may be set to 90 ns, for example) may be used. Once this third timer expires, all ACKs that have been collected may be aggregated into one frame and transmitted onto the link. By using these three timers, the system can significantly reduce the overhead of sending ACKs on the outgoing link.
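The overhead arithmetic and the three-timer choice can be captured in a short Python sketch. The 25-bit ACK size, 9-byte frame overhead, and 30/60/90 ns values come from the example in the text; the helper functions themselves, and the simplified timer-selection logic, are assumptions for illustration.

```python
# Sketch of the ACK aggregation arithmetic and timer selection described above.

def ack_overhead_fraction(num_acks=48, ack_bits=25, frame_overhead_bytes=9):
    payload_bytes = num_acks * ack_bits / 8          # 48 ACKs * 25 bits = 150 bytes
    return frame_overhead_bytes / payload_bytes      # 9 / 150 = 0.06 -> 6% per full frame


def aggregation_timer_ns(link_busy, enough_acks_collected):
    """Pick which of the three timers governs the next flush (simplified)."""
    if not link_busy:
        return 30          # idle link: flush soon
    if enough_acks_collected:
        return 60          # busy link, but a full frame is worth sending
    return 90              # busy link, few ACKs: flush eventually anyway


if __name__ == "__main__":
    print(f"{ack_overhead_fraction():.0%}")              # 6%
    print(aggregation_timer_ns(link_busy=True, enough_acks_collected=False))   # 90
```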

In some examples, an ingress edge port of a switch may encapsulate a received data packet with a fabric header, which allows the packet to be forwarded using flow channels. Fig. 3A illustrates an example fabric header for a data packet. The fabric header may include: a flow_id field, which identifies the flow channel; and a "data_flow" field, which indicates the progress of the entire flow.

At least one ACK may be generated when a data packet is delivered to its destination. Fig. 3B shows an exemplary ACK packet format. An ACK packet may include a "flow_id" field, an "ack_flow" field, an "ACK type" field, and a Cyclic Redundancy Check (CRC) field. The flow_id field may indicate the flow to which this ACK packet belongs. The ack_flow field may indicate the data packet that this ACK packet acknowledges. Recall that each switch may maintain a flow_extent value that indicates the amount of data that has been transmitted but not acknowledged. The flow_extent value may be derived as data_flow - ack_flow, where the data_flow value is taken from the last transmitted data packet.

The ACK type field may indicate different types of ACKs. As mentioned above, during normal operation, when a data packet is delivered to the destination edge port, a conventional ACK packet may be generated and sent back to the source. Accordingly, the ACK type field in the ACK packet may indicate a normal ACK. When congestion occurs in the fabric, the ACK type field may be used to indicate various types and severity of congestion, such as new congestion, persistent congestion, or severe congestion at the egress edge port (severe congestion requires rerouting of flows). In addition, in special cases (such as the presence of heavily congested fabric links, dropped packets, or link errors), ACKs may also be generated by intermediate switches that are not the ultimate destination, and the ACK type field may be used to inform upstream switches of different types of network conditions. Other additional fields may also be included in the ACK packet.

Fig. 3C shows the relationship between different variables used to derive and maintain state information for a flow. In this example, the switch may use the variable "total_extent" to track the total amount of unacknowledged transmitted data plus the data currently queued at the switch. The total_extent value may be equal to the sum of flow_extent (the amount of data that has been transmitted but not acknowledged) and queue_extent (the amount of data stored in the input queue for the corresponding flow). The variable "ack_flow" may indicate the data position corresponding to the most recent ACK for the flow. The variable "data_flow" may indicate the position of the next packet to be transmitted, which also corresponds to the packet stored at the head of the input queue. The variable "next_data_flow" may indicate the position at which the switch expects the next packet from an upstream switch. Note that queue_extent = next_data_flow - data_flow, and flow_extent = data_flow - ack_flow.
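The relationships between these variables can be checked with a small numeric illustration. The specific positions below are arbitrary example values, not numbers from the figure.

```python
# Minimal numeric illustration of the relationships above, using arbitrary
# example positions measured in data units (values are assumptions).

ack_flow = 100        # position of the most recent ACK for the flow
data_flow = 113       # position of the next packet to be transmitted
next_data_flow = 119  # position where the next packet from upstream is expected

flow_extent = data_flow - ack_flow            # transmitted but unacknowledged data
queue_extent = next_data_flow - data_flow     # data held in the local input queue
total_extent = flow_extent + queue_extent     # all unacknowledged data from here downstream

assert total_extent == next_data_flow - ack_flow
print(flow_extent, queue_extent, total_extent)   # 13 6 19
```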

In some examples, a flow channel table may be used to facilitate flow channels throughout the fabric. A flow channel table is a data structure that stores the forwarding and state information for a given flow at a port of a switch. Fig. 4A illustrates an example of how state information associated with multiple flows may be stored using flow channel tables. This state information can be specific to each flow and efficiently stored in a table. Assume that source host 402 is sending data packets to destination host 404 via the fabric. The data path traversed by the packets may include ingress edge switch 406, intermediate switches 408 and 430, and egress edge switch 432.

When a packet arrives over ingress edge link 403 of switch 406, the packet's header may be analyzed by address translation logic 410. Address translation logic 410 may determine the fabric destination address of the egress switch (in this case switch 432) based on the packet's Ethernet, IP, or HPC header information. Note that address translation logic 410 may also use header information associated with other protocols or a combination of different protocols. The fabric destination address determined by address translation logic 410 may then be used to perform a lookup in an Edge Flow Channel Table (EFCT) 412. EFCT 412 may perform a lookup for the packet using the packet's fabric destination address and, optionally, additional values extracted from the packet's header, which together may be referred to as the match pattern. EFCT 412 may compare the packet's match pattern against the stored match patterns of all existing allocated flows. If a match is found, this packet is part of an existing flow and the previously assigned flow ID may be returned for the packet. If no match is found, a new flow ID may be allocated for the packet and the match pattern may be added to EFCT 412. In other words, EFCT 412 may be used to determine whether a flow channel already exists for an incoming packet or whether a new flow channel needs to be allocated. In addition to the destination fabric address, other packet header information (such as the traffic class, the TCP or UDP port number, and a process or thread ID) may also be used to map or allocate flow IDs.
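A hedged sketch of this EFCT lookup is shown below. A real EFCT uses CAM/TCAM hardware; here a Python dictionary stands in, and the class and method names are placeholders rather than the patent's terminology.

```python
# Sketch of the EFCT lookup described above: compare the packet's match pattern
# against stored patterns and either return the existing flow ID or allocate a new one.

class EdgeFlowChannelTable:
    def __init__(self, max_flows=2048):
        self.match_to_flow = {}                  # match pattern -> flow ID
        self.free_ids = list(range(max_flows))

    def lookup_or_allocate(self, fabric_dest_addr, extra_header_fields=()):
        match_pattern = (fabric_dest_addr, *extra_header_fields)
        flow_id = self.match_to_flow.get(match_pattern)
        if flow_id is None:                      # no existing flow channel: allocate one
            flow_id = self.free_ids.pop(0)
            self.match_to_flow[match_pattern] = flow_id
        return flow_id


if __name__ == "__main__":
    efct = EdgeFlowChannelTable()
    print(efct.lookup_or_allocate("switch_432", ("tcp", 80)))   # new flow -> 0
    print(efct.lookup_or_allocate("switch_432", ("tcp", 80)))   # same match -> 0 again
```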

The flow ID obtained by EFCT 412 may then be used as an index to map to an entry in an Input Flow Channel Table (IFCT) 414. Each entry in IFCT 414 may be indexed by a flow ID and store state information for the corresponding flow. An entry in IFCT 414 may store the values of next_data_flow, data_flow, and ack_flow (see Fig. 3C) associated with the flow. In addition, an IFCT entry may store other parameters for congestion control and dynamic routing of the flow.

The flow ID may also be used to identify or assign a flow-specific input queue in which incoming packets may be temporarily stored. Status information for a particular queue and parameters for monitoring and controlling the queue (such as thresholds for detecting congestion) may be stored in corresponding entries in IFCT 414. The input queue management logic may determine when a packet may be dequeued from an input queue and sent to the data crossbar switch 413 based on a flow control parameter stored in an entry of the IFCT 414.

When a packet is dequeued from the input queue and sent through crossbar switch 413 to an output port, the packet is sent together with the input port number on which it arrived at switch 406. When the packet reaches the output port's transmission buffer, the packet's header may be updated, based on the packet's flow ID and input port number, with a new flow ID to be used by the next-hop switch (i.e., switch 408) for the same flow. This is because each link, in each direction, may have its own set of flow channels, identified by their respective flow IDs. The mapping from the incoming flow ID to the outgoing flow ID used on the next link may be done by a lookup in an Output Flow Channel Table (OFCT) 416. OFCT 416 may perform the lookup using a match pattern that is the combination of the local input port number corresponding to link 403 and the packet's flow ID produced by EFCT 412. If a match is found, the flow has already been defined and the packet's flow ID is updated with the value corresponding to the match pattern (this new outgoing flow ID will be used by the downstream next-hop switch 408). If no match is found, a new outgoing flow ID may be allocated for a new flow channel, which may be mapped to the input port number and the previous incoming flow ID. An entry including the outgoing flow ID, the input port number, and the incoming flow ID may be stored in OFCT 416.
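The OFCT mapping and its use in the reverse (ACK) direction can be sketched as follows. This is an illustrative simplification with placeholder names; the linear reverse lookup, in particular, stands in for whatever indexed structure the hardware would use.

```python
# Illustrative sketch of the OFCT mapping described above: (input port, incoming
# flow ID) selects or allocates the outgoing flow ID used on the next link.

class OutputFlowChannelTable:
    def __init__(self, max_flows=2048):
        self.table = {}                        # (input_port, incoming_flow_id) -> outgoing_flow_id
        self.free_ids = list(range(max_flows))

    def map_or_allocate(self, input_port, incoming_flow_id):
        key = (input_port, incoming_flow_id)
        outgoing_flow_id = self.table.get(key)
        if outgoing_flow_id is None:           # first packet of the flow on this link
            outgoing_flow_id = self.free_ids.pop(0)
            self.table[key] = outgoing_flow_id
        return outgoing_flow_id

    def reverse_lookup(self, outgoing_flow_id):
        """Used when forwarding ACKs upstream: recover the input port and incoming flow ID."""
        for (input_port, incoming_flow_id), fid in self.table.items():
            if fid == outgoing_flow_id:
                return input_port, incoming_flow_id
        return None


if __name__ == "__main__":
    ofct = OutputFlowChannelTable()
    out_id = ofct.map_or_allocate(input_port=3, incoming_flow_id=17)
    print(out_id, ofct.reverse_lookup(out_id))   # 0 (3, 17)
```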

In the case where the packet is the first packet in the flow, the lookup performed in OFCT 416 will not result in any mapping. In turn, OFCT 416 may assign a new flow channel for the packet, with a flow ID to be used by the ingress port and IFCT 418 on switch 408. This new flow channel (identified by its flow ID) may be added to the packet header for transmission onto link 417 and may be used by the link partner's (switch 408's) IFCT 418 to access the flow channel's congestion information. Likewise, as the packet continues downstream, if no match is found at switch 408, OFCT 424 may generate a new flow channel using a match pattern formed from its immediately upstream input port number and the flow ID associated with link 417. OFCT 424 may then assign a new flow channel identified by the new flow ID. Note that OFCT 416 may also serve as a forwarding table for ACKs of this flow in the upstream direction. After being forwarded upstream from switch 408 to switch 406, an ACK packet may be updated with the flow ID associated with edge link 403 and forwarded to the appropriate input port on switch 406, as indicated by the corresponding entry in OFCT 416. The ACK packet may be forwarded to that input port by ACK crossbar 415 in the upstream direction.

Subsequently, when a packet arrives at switch 408, its flow ID may be used to identify the input queue to use and determine the entry in IFCT 418. If switch 408 has not previously allocated a packet's flow ID, a new input queue may be provided and a new entry in IFCT 418 may be created. From then on, a similar process may be performed to forward the packet across switches 408 and 430 until the packet reaches egress switch 432.

When a packet arrives at switch 432, ACK generator logic block 420 may generate an ACK packet based on the packet's flow ID and input port number after the packet is forwarded by data crossbar switch 423. This ACK packet may then be forwarded by ACK crossbar 422 in the upstream direction. Meanwhile, based on the ACK packet, IFCT 421 may update the state information of the flow in the corresponding table entry. When an ACK packet arrives at switch 430, OFCT 419 may be looked up to determine the upstream flow ID and upstream input port to which the ACK packet is to be forwarded. The ACK packet may then update its flow ID and be forwarded to the appropriate input port in the upstream direction. Since ACK packets traverse the data path upstream in a similar manner, the IFCT at each switch may update its table entry based on the ACK.

Note that the flow_extent variable may be an important parameter, because it represents the total amount of downstream packet data for a flow. A flow channel is considered free to be reassigned to another flow when the flow_extent of its entry is zero. In general, upon receiving a new packet, the input logic may request that the data be sent to an output port. The selected output port may depend on the flow_extent stored in the IFCT. If flow_extent is zero, there are no packets of this flow downstream on the way to the destination egress edge port. As a result, the switch may use load-based adaptive routing to select any valid path to the destination. In a multi-path network, dynamic adaptive routing can thus be done without reordering packets. If flow_extent is not zero, and if in-order delivery is required, the packet may use the same route taken by the previous packets. The IFCT may have a field that stores the previous output port number, which is loaded when a packet makes a request for an output port and may be used to keep subsequent packets on the previously used output port.
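The output-port decision just described can be summarized in a short sketch. The load-based selection function and the field names are assumed placeholders; only the flow_extent test and the follow_port behavior come from the description above.

```python
# Sketch of the output-port choice described above: if flow_extent is zero the
# flow has no packets in flight downstream and may be routed adaptively,
# otherwise ordered delivery forces it onto the previously used output port.

def choose_output_port(ifct_entry, candidate_ports, port_load):
    if ifct_entry["flow_extent"] == 0:
        # no downstream packets: free to pick the least-loaded valid path adaptively
        port = min(candidate_ports, key=lambda p: port_load[p])
    else:
        # in-order delivery required: stay on the path taken by earlier packets
        port = ifct_entry["follow_port"]
    ifct_entry["follow_port"] = port
    return port


if __name__ == "__main__":
    entry = {"flow_extent": 0, "follow_port": None}
    load = {1: 10, 2: 3, 3: 7}
    print(choose_output_port(entry, [1, 2, 3], load))   # picks port 2 (least loaded)
    entry["flow_extent"] = 5
    print(choose_output_port(entry, [1, 2, 3], load))   # sticks with port 2
```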

As mentioned previously, a flow channel may use a match function to identify packets belonging to an existing flow. When an Ethernet frame or another type of packet is received on an ingress edge port, the received frame or packet may be parsed in real time, and some fields of the packet header may be used for a lookup in a CAM or Ternary Content Addressable Memory (TCAM). If there is a match, the matching address may become the flow ID used to select the flow channel. When no match occurs, the switch hardware may load the pattern that failed to match directly onto a free row of the CAM, which can be done without additional delay. As a result, any subsequent packet can be matched against this new entry without extensive buffering. The free entry that is selected becomes the new flow ID for the new flow channel entry. Note that no external software intervention is required to load the new entry. This process may be done autonomously by the switch hardware.

The deallocation of the flow ID and corresponding CAM matching row may also be performed automatically by the hardware when the last ACK is returned for the flow. Deallocation can occur in hardware with respect to a potentially matching new packet without external software intervention.

In some examples, the ingress edge switch 406 may include fine-grained flow control logic 434 that may communicate with a Network Interface Controller (NIC) 401 on the host 402 to apply flow control on a per-flow basis. More details on fine-grained flow control are provided below in connection with the description of congestion management.

Fig. 4B shows an example of an EFCT. In this example, the EFCT may include a data_flow field 454, an ack_flow field 456, and optionally additional fields. The EFCT may be associated with an input port, and entries in the EFCT may be indexed by a flow_ID value (such as flow_ID 452). In one embodiment, the match pattern field may reside in the match function logic block, which may include a CAM or TCAM. The match function logic may use the match pattern to generate the flow_ID value, which in turn may be used as an index to the corresponding EFCT entry. From the perspective of this EFCT, flow_extent (i.e., data_flow - ack_flow) may include all unacknowledged data downstream of this table, which may include the local flow_queue plus the flow_extent value of the corresponding IFCT.

Fig. 4C shows an example of an IFCT. In this example, the IFCT may be associated with an input port and may include a follow_port field 466, a next_data_flow field 468, a data_flow field 470, an ack_flow field 472, an ep_congestion field 474, an Upstream Metering (UM) flag field 477, a Downstream Metering (DM) flag field 478, and optionally additional fields. The flow_ID value of an incoming packet (such as flow_ID 464) may be used as an index to look up the output port number (indicated by the follow_port field 466) and the state information associated with the corresponding flow. Congestion control information associated with endpoint congestion (such as the ep_congestion field 474) and hop-by-hop credit-based flow control (such as the UM flag field 477 and the DM flag field 478), the latter of which is described in more detail later in this document, may also be stored in the IFCT. The IFCT may further store information related to the dynamic routing associated with different flows.

Fig. 4D shows an example of an OFCT. In this example, the OFCT may be associated with an output port and may include an input_port field 482, an input_port_flow_ID field 484 (which corresponds to a packet's existing flow_ID upon its arrival at the input port), a data_flow field 486, an ack_flow field 488, and optionally additional fields. The data_flow field 486 and the ack_flow field 488 may be used to determine the value of flow_extent from this OFCT onward. The combination of the input_port field 482 and the input_port_flow_ID field 484 (which may also be referred to as the "incoming flow ID") may be used to determine or allocate the outgoing flow ID of a packet ready for transmission onto the outgoing link corresponding to this OFCT. In one embodiment, the outgoing flow_ID value (such as flow_ID 486) may be used as an index to look up entries in the OFCT.

Adaptive routing using flow channels in a multi-path network

As previously described, flow channels may define a path across the network for each communication. The first packet in a flow may define the path and, if the flow remains active, subsequent packets may be forced to follow the same path (defined by the flow channel) that the first packet took. A high-performance fabric may have many routes from a particular source to a given destination. Multi-path networks allow a greater total bisection network bandwidth. In most cases, HPC systems include a multi-path network. A common metric used to measure the performance of a multi-path network is the global bandwidth, i.e., the total bandwidth delivered in an all-to-all communication pattern. In most networks, each node in the system sends packets over a set of links that are used to carry data from a source to a destination. Some methods use hash values generated from values found in the packet headers. While this improves performance, it also suffers from systematic misbehavior, resulting in unpredictable performance. Making dynamic adaptive routing decisions using local load information can be an improvement over the hash-based techniques described above. However, the use of load information may allow the packets of a single flow to be reordered. Reordering may occur, for example, when a packet is sent in a new direction as a result of a routing decision and overtakes a packet sent in the old direction.

This type of out-of-order delivery (or reordering) can be a serious problem for some network protocols. Notably, most Ethernet networks are expected to deliver packets in order. For HPC environments, the ordering requirements may vary depending on the programming model. For example, MPI requires point-to-point ordering of messages, but does not require in-order delivery of bulk data. The PGAS remote memory access model also requires point-to-point ordering of accesses to the same address, but may allow reordering of operations that act on different addresses. The different transport layers should be able to specify their minimum ordering requirements, and the switch fabric should be able to meet these requirements.

Allowing a truly dynamic adaptive routing decision to be made for each packet at each routing stage may result in reordering packets within the same flow from source to destination. Thus, using a packet-level dynamic routing method may cause packets of a flow to be scattered throughout a multi-path fabric. As a result, any control of the flow as a stream of packets may be lost. However, packet-level dynamic routing may be a good model for some network traffic patterns, such as uniform random (UR) traffic, for example the giga-updates per second (GUPS) benchmark. GUPS typically generates many small packets, where each packet is sent to a random destination. Thus, in a large network running GUPS, individual flows from a particular source to a particular destination have little chance of persisting, and the adverse effects of persistent flows may never occur. In the case of GUPS, dynamic adaptive routing of small packets to random destinations can produce a very balanced load across the fabric. However, adaptive routing based on flow channels, as disclosed herein, may result in optimal routing in each of the above examples of networking environments.

The utilization of flow channels enables true dynamic adaptive routing decisions to be made for the first packet of a new flow based on the local load of the network. Referring back to the MPI and PGAS environments where small messages are generated, these messages appear as new flows to a fabric implementing flow channels. These flow channels may allow fully adaptive routing, resulting in a similarly well-balanced load across the fabric (UR traffic being an extreme case).

With respect to network environments that require point-to-point ordering (or where point-to-point ordering is desired), the use of flow channels may ensure that subsequent packets in a flow are forced to follow the first packet, thereby preventing packet reordering. Furthermore, the flow channel gives an opportunity to handle packet loss caused by link errors. Each packet sent onto a link may be recorded in the flow channel state. Input logic may detect a packet loss for a flow by observing "holes" in the flow. The missing packets (or the location of the holes) may be signaled back to the source of the flow.
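
A minimal sketch of how input logic might detect such a "hole" is shown below, assuming packets carry a per-flow sequence number recorded in the flow channel state; the exact mechanism is not specified in the disclosure, so the structure and names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-flow receive state kept in the flow channel (illustrative). */
typedef struct {
    uint32_t expected_seq;   /* next in-order sequence number expected */
} flow_rx_state_t;

/* Returns true and reports the hole location if the arriving packet skips
 * ahead of the expected sequence number (i.e., packets are missing).      */
static bool detect_hole(flow_rx_state_t *st, uint32_t pkt_seq, uint32_t *hole_start)
{
    if (pkt_seq != st->expected_seq) {
        *hole_start = st->expected_seq;   /* location to signal back to the source */
        st->expected_seq = pkt_seq + 1;
        return true;
    }
    st->expected_seq = pkt_seq + 1;
    return false;
}
```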

In addition, for network environments that do not require point-to-point ordering, flow channels may still be used to provide significant advantages. For example, if an out-of-order traffic class has been defined, each packet in the flow may be allowed to be adaptively routed. Thus, instead of defining a single path across a multi-path fabric, a tree of paths is formed that all converge on the same destination. Each flow channel on each downstream fabric link will have at least one packet passing through it, and more packets if the same output port is used again. All downstream flow channels refer back to the same upstream flow channel. An individual ack retraces the path taken by the packet that created the ack at the egress edge port. When these packets reach the destination, they may be out of order. However, by using flow channels, the following advantages can be achieved:

The ingress flow table can still accurately measure the total amount of injected packet data for an egress flow. This means that node injection limits for limiting the total amount of packet data in the network can still come into play. This limit preserves fabric input buffer space even on a tapered fabric, and in so doing prevents congestion from developing.

Acks can still signal back to the source edge port that the destination is congested (either because the node is saturated and/or because an incast is otherwise forming). These acks can then control congestion in two ways: first, by limiting the total amount of data and the maximum bandwidth that the flow can inject into the fabric, and second, by forcing the flow to become ordered. Once ordered, the tree of flow channels that may have formed will collapse back into a single source-to-destination flow.

Out-of-order traffic may provide excellent performance for favorable traffic patterns, especially for HPC. However, out-of-order traffic also exacerbates congested traffic patterns. Out-of-order traffic also consumes available fabric bandwidth to the extent that other applications that may share the same fabric and have other traffic patterns (with much longer flows) may be prevented from making any significant progress. In contrast, flow channels with injection limits manage the fabric utilization of each application so that each application retains fair access. As disclosed herein, adaptive routing using flow channels allows both dynamic adaptive routing and in-order delivery of packets (e.g., prevents reordering).

Adaptive routing using flow channels in congested networks

Fig. 5 illustrates an example of congested flows 510, 520 in a multi-path network 500 that includes a plurality of switches 501-506 and a plurality of paths 531-537. By implementing dynamic adaptive routing in accordance with the disclosed embodiments, these congested flows (e.g., congestion sources or congestion victims) may be identified and then routed accordingly. Additionally, the load across the multi-path network 500 may be distributed via the disclosed dynamic adaptive routing techniques. In the illustrated example of fig. 5, two flows 510 and 520 are shown. Flow 510 may have a destination (not shown) that is different from the destination of flow 520. However, both flows 510, 520 share a congested link 535. The flows 510, 520 may be continuous flows. As mentioned herein, a continuous flow may be described as a flow that lasts for a long time without interruption. As continuous flows, 510 and 520 may have the potential to saturate the links 531-537 at full bandwidth. The first packet in one of these flows (e.g., flow 510) may establish a path across the network 500. Subsequent packets in the flow (such as flow 510) may then continue on this path established by the first packet, which maintains the order of the packets in the flow. New flows may be routed adaptively around the already established flows, but if all of the new flows are also persistent and congested, the traffic pattern may become static and relatively suboptimal (if not adaptively routed). Thus, the disclosed adaptive routing techniques may utilize flow channels in the presence of continuous flows (such as flows 510, 520) to address the negative effects of congestion. As a general description, congestion may be caused by bottlenecks within the system. Bottlenecks may include, but are not limited to:

Final link - if many sources are trying to send to a single destination, as in the case of an incast

NIC - if the NIC does not consume packets quickly enough

Bandwidth variation - if the bandwidth changes from a high-bandwidth link to a lower-bandwidth link

Single link - if a single link in the network fabric has too many flows directed through it

In fig. 5, the example illustrates a single link 535 that may be a bottleneck. As shown, link 535 is experiencing congestion. For example, switch 505 may have a fabric egress port (with a deep egress port header FIFO) that is congesting link 535. The manner in which the bottleneck can be resolved may depend on how and where the bottleneck is detected. If a flow is to be rerouted, the adaptive routing technique should have reasonable confidence that the change caused by the rerouting will reduce the congestion. For example, the adaptive routing technique may determine that flow 510 is a source of congestion. Rerouting a flow that is a source of congestion, as opposed to merely a victim, carries a high degree of confidence that the rerouting will help. Thus, if flow 510 is identified as a source of congestion, the adaptive routing technique may decide to limit the routing of flow 510, or otherwise restrict the routing decisions made for flow 510, in a manner that prevents the flow from spreading the congestion further in the network 500. Conversely, the adaptive routing technique may determine that flow 520 is a congestion victim (as opposed to a source). For a victim flow, there is little confidence that rerouting that particular flow will help. Restated, rerouting a victim flow (such as flow 520) will not significantly prevent more congestion from occurring on network 500. Thus, the adaptive routing technique allows flow 520 to continue making its own adaptive routing decisions, under the assumption that the victim flow will not substantially affect congestion on the network.
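
The routing policy just described can be summarized in a short sketch; the names and structure below are illustrative assumptions rather than the disclosed implementation.

```c
#include <stdint.h>

typedef enum {
    FLOW_UNCONGESTED,
    FLOW_CONGESTION_VICTIM,
    FLOW_CONGESTION_SOURCE
} flow_role_t;

/* A flow identified as a congestion source is pinned to the path on which
 * congestion was seen, while a victim or uncongested flow keeps full
 * adaptive freedom.                                                       */
static int choose_output_port(flow_role_t role,
                              int established_port,        /* port of the existing flow channel path */
                              const uint32_t *port_load, int num_ports)
{
    if (role == FLOW_CONGESTION_SOURCE)
        return established_port;             /* constrain: do not spread congestion   */

    int best = 0;                            /* victim / uncongested: adaptive choice */
    for (int p = 1; p < num_ports; p++)
        if (port_load[p] < port_load[best])
            best = p;
    return best;
}
```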

In practice, adaptively routing a continuous flow that is a cause of congestion, such as flow 510, may reroute the flow by directing flow 510 around the existing bottleneck. This rerouting decision may be based on the following concept: effectively redirecting the flow requires ensuring that the new path does not return to the same bottleneck before reaching the final edge port. The new route selected by adaptive routing should have spare capacity on each link along its entire path in order to provide an improvement in transmission over the original route. The flow channel gives visibility of the entire flow from the source to the destination. Along the path of the flow, there may be a minimum-bandwidth point. Beyond this low-bandwidth point, packets can be accepted from the input queues very quickly; thus, those input queues are generally empty (or nearly empty). In contrast, before the low-bandwidth point, with a continuous flow, packets will most likely accumulate in the FIFO of the output port. Further upstream of the switch with the congested queue, at the ingress edge switch, the flow control mechanism will limit the delivery rate to set a limit (cap) on the total amount of node and flow data within the fabric. Referring back to fig. 5, the low-bandwidth point in network 500 is at the congested link 535. As can be seen, at switch 506, downstream of link 535, the input queue is shown as not congested or empty (e.g., indicated by the few vertical lines). At switch 505, the output (or egress) port is illustrated as congested due to the accumulation of packets (indicated by the multiple arrows) before the low-bandwidth point on link 535. Further upstream of the low-bandwidth point at link 535, particularly at ingress edge switches 501 and 502, flow control may apply injection limits to prevent too much data from entering the fabric by holding packets in their respective input queues (indicated by the many vertical lines).

Adaptive rerouting using flow channels in a congested network

Further, the adaptive routing techniques disclosed herein may implement flow rerouting to redirect flows away from congested fabric links. The performance of the fabric may depend on the load being balanced across these links over the entire fabric. Balancing can be achieved by adaptively routing subsequent packets away from heavily loaded links. Adaptive routing may also rely on injection limits, implemented as part of each edge ingress port IFCT, in order to prevent too much data from entering the fabric, as illustrated in fig. 5.

The injection limit may be particularly important on tapered fabrics, where the number of expensive optical global links has been reduced for cost reasons. These tapered global links can create an intermediate fabric bandwidth bottleneck and are a natural place for congestion to develop. The injection limit sets a cap on the total amount of data in the fabric. If no single egress fabric port onto a global link is overloaded and each global link carries significant load, the fabric may be considered well balanced. Another characteristic of a well-balanced fabric is that few of the expensive links fail to contribute to data delivery. Conversely, an unbalanced fabric may be characterized by the output FIFO of an overloaded global fabric egress port becoming too deep. By implication, and assuming that injection limits are in operation, other global fabric egress ports will not be fully utilized. This means that attempting to move (or redirect) some of the flows that are currently using overloaded links will likely result in their being directed towards underutilized global links, and should provide significant rebalancing of the fabric.

The depth of the egress port output FIFO may be a good measure of port loading. The depth itself and the rate of change of the depth (the first derivative of depth) can be combined to provide a measure (or value) of the congestion on the port. The magnitude of this congestion value may then be used to decide whether a redirection of a flow should be attempted. For example, an overloaded port may have too many flows using it. Generally, to restore proper balance, a few flows should be redirected away from that port. Moving too many flows may result in an under-loaded port, which has its own disadvantages.
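
One possible way to combine the FIFO depth and its rate of change into a single congestion value is sketched below; the weighting factor is an arbitrary assumption made for illustration, not a value from the disclosure.

```c
#include <stdint.h>

typedef struct {
    uint32_t prev_depth;   /* FIFO depth seen at the previous sample */
} port_congestion_state_t;

/* Combine the output-FIFO depth with its rate of change (first derivative)
 * into a single non-negative congestion value for the port.                */
static uint32_t congestion_value(port_congestion_state_t *st, uint32_t fifo_depth)
{
    int32_t delta = (int32_t)fifo_depth - (int32_t)st->prev_depth;
    st->prev_depth = fifo_depth;

    int32_t value = (int32_t)fifo_depth + 4 * delta;   /* weight growth more heavily */
    return value > 0 ? (uint32_t)value : 0;
}
```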

According to an embodiment, a flow may be redirected (or moved) by returning a "redirect ack". Unlike normal acks, a redirect ack belongs to the flow as a whole and can be generated by any packet of the flow as it passes. As with all other acks, it follows the path of the flow upstream towards the edge ingress port.

In an embodiment, when a frame is loaded into the header output FIFO of a global fabric egress port, a redirect ack may be generated by observing the congestion level in that FIFO. At this point, the amount of congestion may be measured and compared with a locally created random number. If the congestion value is greater than the random number, a redirect ack is returned for the flow. The use of a random number makes the probability of generating a redirect ack proportional to the size of the congestion. In other words, only some flows will be redirected under a more moderate congestion condition, but more flows will be redirected if the congestion becomes severe. This ensures that not too many flows are moved unnecessarily, which could itself cause problems (e.g., under-loaded ports).
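
A minimal sketch of this probabilistic redirect decision follows, assuming the congestion value and the random number share the same full-scale range (an assumption for illustration; hardware would likely use an LFSR rather than a library random number generator).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define CONGESTION_MAX 1024u   /* assumed full-scale value of the congestion measure */

/* When a frame is loaded into the egress-port output FIFO, compare the measured
 * congestion value with a fresh random number; the probability of returning a
 * redirect ack is therefore proportional to the severity of the congestion.     */
static bool should_send_redirect_ack(uint32_t congestion_value)
{
    uint32_t r = (uint32_t)rand() % CONGESTION_MAX;
    return congestion_value > r;
}
```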

When an enabled IFCT (shown in fig. 2) receives the redirect ack, the state of the flow in this IFCT changes from RUNNING to BLOCK_FOR_REDIRECT. The enabled IFCT may be any IFCT within the switch fabric; for example, it may be at an edge port, such as an ingress edge port. This state prevents any new packets from being dequeued from the flow queue for the flow until flow_extend becomes zero. flow_extend becomes zero when the acks for all packets of the downstream flow have returned. At this point, the state may change back to RUNNING, because it is now safe from an ordering perspective to take a new path: the next packet taking the new path cannot overtake a previous packet. The first packet released after the state changes back to RUNNING may use the best adaptive load information to select the path through the fabric where congestion is least.
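
The state transitions just described might be sketched as follows; the structure and function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { FLOW_RUNNING, FLOW_BLOCK_FOR_REDIRECT } flow_state_t;

typedef struct {
    flow_state_t state;
    uint32_t     flow_extend;   /* unacknowledged data downstream of this IFCT */
} ifct_flow_t;

/* A redirect ack blocks the dequeue of new packets for the flow. */
static void on_redirect_ack(ifct_flow_t *f)
{
    if (f->state == FLOW_RUNNING)
        f->state = FLOW_BLOCK_FOR_REDIRECT;
}

/* When all downstream acks have returned (flow_extend == 0), it is safe to
 * release the flow and let the next packet pick a new, less congested path. */
static bool may_dequeue(ifct_flow_t *f)
{
    if (f->state == FLOW_BLOCK_FOR_REDIRECT && f->flow_extend == 0)
        f->state = FLOW_RUNNING;
    return f->state == FLOW_RUNNING;
}
```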

Referring now to fig. 6, an example of a process 600 for adaptive routing in the presence of a continuous flow is depicted. Process 600 may be implemented by a network switch (as shown in fig. 1). The process is illustrated as a series of executable operations stored in a machine-readable storage medium 640 and executed by a hardware processor 635 in a computing component 630. In executing process 600, the hardware processor 635 implements the techniques disclosed herein.

Generally, networks do not perform well in the presence of extreme congestion. The adaptive routing technique in process 600 takes advantage of knowledge of the source of congestion in order to manage congestion while routing. Process 600 also relies on communicating congestion information and responding to indications of congestion. In many cases, congestion may be caused by over-subscription of shared network resources. Therefore, proper management of congestion (especially of the sources of congestion) is very important in order to provide fairness to all users of the network and good network utilization for the system as a whole. As described throughout, the disclosed adaptive routing techniques employ flow channels to manage congestion while routing continuous flows. The flow channel may provide very fine-grained control of the flow of frames from source to destination. Further, the IFCT may enable feedback from early frames of the flow, delivered via acks.

As a general description, process 600 involves identifying when packets are going from a source to a destination, where each packet has complete freedom to take any path from the source to the destination. Then, when any packet detects congestion into a destination, a congestion acknowledgement for that packet forces new subsequent packets from the source to that destination to take only the path of that packet, so that all congestion control can be applied to a single ordered flow. Before congestion, there are no constraints on the route, and packets can take any path. Without congestion control, all of these different paths might become congested. When congestion forces a single path, only that path becomes congested (and the flow channel can potentially prevent congestion even on that path). Thus, process 600 may monitor the congestion into a destination experienced by any data transmission. Subsequently, in response to detecting congestion into a destination while any data transmission is in progress, all new data transmissions sharing the same source and destination may be forced to take only the path of the data transmission for which the congestion was detected (preventing the congestion from spreading).

The process may begin by establishing a plurality of flow channels corresponding to a plurality of data transmissions at operation 606. For example, in a network having a switch fabric (shown in fig. 1), or a plurality of switches, data may be transferred from a source port to a destination port across fabric ports within the network body. Operation 606 involves establishing a plurality of flow channels corresponding to the data transfers. The flow channels are described in detail throughout, for example with reference to figs. 2A and 2B. According to an embodiment, a switch may implement the establishment and maintenance of flow channels. As an example, a flow channel for a continuous flow carried within the plurality of data transfers (from a source port to a destination port) may be established at operation 606. The flow channel may use a set of dynamically connected flow tables and flow channel queues (at the ingress port) to track the path and the amount of data belonging to the continuous flow (and to each flow in the plurality of data transfers). Since each flow and its path correspond to a flow channel, each established flow channel is tied to a source port and destination port pair.

The process may proceed to operation 608, where the plurality of transmissions are routed through the network via the switch fabric. According to an embodiment, switches in the fabric are configured to perform adaptive routing of traffic on a flow-by-flow basis. Accordingly, operation 608 may involve routing the plurality of data transmissions, including the continuous flow, using adaptive routing. As described above, adaptive routing in operation 608 may maintain the order of packets within a flow (by routing on a flow-by-flow basis), which is particularly desirable in networking environments that require point-to-point ordering. Adaptive routing and in-order packet delivery in a multi-path network are described in detail with reference to fig. 5. Further, the adaptive routing in operation 608 may include using congestion information (or real-time information about network load) to reroute flows in a way that avoids congestion hotspots. Details regarding rerouting for adaptive routing are described previously. In operation 608, routing decisions for each flow may be made dynamically at each port on its path in accordance with the disclosed adaptive routing techniques.

Subsequently, at operation 610, characteristics of the flows in the plurality of transmissions may be monitored. In particular, the flow channels for the multiple transfers maintain the state of their respective flows. As a result, operation 610 may involve monitoring the flow channels for the status of the flows. In an example, the continuous flow may be on a path that includes a bottleneck (a low-bandwidth or over-utilized link), causing the continuous flow to experience congestion in the fabric. Another capability of the disclosed switch (in addition to flow channels) is congestion detection. Congestion detection may include measuring congestion levels and using ack packets that are indicative of congestion. For example, local congestion within a switch can be observed via buffer utilization, and the buffer utilization may be an indication of the link utilization of the local port. Global congestion can also be observed, which detects "hot spots" that traffic encounters on its path to the destination. In general, such congestion may be described as endpoint congestion (e.g., many-to-one communication), or may result from multiple traffic flows being directed through a particular portion of the network and overloading the resources within that area. With respect to flow channels, a congestion acknowledgement (NewCongestionAck) may be generated when a packet arrives at a heavily congested edge egress port. The ack may be transmitted to all upstream IFCTs to indicate that the flow is experiencing congestion, in this case as a result of destination congestion (which may be caused by incast traffic). In some embodiments, flow channels may be used to adjust traffic based on receiving acks that indicate congestion. By way of example, when the NewCongestionAck passes an IFCT, the maximum flow_extend, the maximum fabric link flow queue depth (referred to as queue_extend), and the maximum flow bandwidth are all reduced. This may prevent new packets from reaching an already congested destination. Thus, operation 610 initially determines whether a flow is experiencing congestion.
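
For illustration, the reaction of an IFCT to a NewCongestionAck might look like the following sketch; the halving factor is an assumption, as the disclosure does not specify how much each limit is reduced.

```c
#include <stdint.h>

/* Per-flow limits kept in an IFCT; names follow the document, values are illustrative. */
typedef struct {
    uint32_t max_flow_extend;   /* cap on unacknowledged data in the fabric */
    uint32_t max_queue_extend;  /* cap on fabric-link flow-queue depth      */
    uint32_t max_bandwidth;     /* cap on injection bandwidth for the flow  */
} ifct_limits_t;

/* On a NewCongestionAck passing through the IFCT, reduce all three limits so
 * that new packets stop piling onto an already congested destination.        */
static void on_new_congestion_ack(ifct_limits_t *lim)
{
    lim->max_flow_extend  /= 2;
    lim->max_queue_extend /= 2;
    lim->max_bandwidth    /= 2;
}
```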

Another capability related to congestion detection implemented by the switches disclosed herein is identifying the source of congestion. Congestion detection (and management) can be implemented in hardware, with each switch detecting congestion, identifying its cause, and providing real-time feedback to its peers. Thus, a switch can determine whether traffic flowing through the congestion point contributes to the congestion (a congestion source) or is a congestion victim. Accordingly, at operation 612, this capability may be leveraged in order to identify a flow as a source of congestion (e.g., a portion of the traffic contributing to congestion) or alternatively as a congestion victim. In some cases, the source of congestion may be identified using a field derived from a packet that has contributed to congestion. For example, Explicit Congestion Notification (ECN) (e.g., a two-bit field in the IP header) may be used to signal congestion. As an example, the continuous flow may be part of an incast communication, where the injected bandwidth may far exceed the ejection bandwidth in a manner that causes congestion. Thus, operation 612 may use the flow state (e.g., congestion acks) maintained within the corresponding flow channel to identify whether a flow experiencing congestion is related to the source of the congestion. In the event that the continuous flow is identified as a source of congestion, process 600 continues to operation 614.
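
As a small illustration of the ECN example above, the two ECN bits occupy the low-order bits of the IPv4 TOS / IPv6 traffic-class byte, and the value 0b11 (Congestion Experienced) marks a packet that has passed through a congestion point; the check itself is a generic sketch, not part of the disclosure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the ECN field of the TOS / traffic-class byte is set to
 * "Congestion Experienced" (binary 11), per the standard ECN encoding.     */
static bool ecn_congestion_experienced(uint8_t tos_byte)
{
    return (tos_byte & 0x03u) == 0x03u;
}
```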

Thereafter, at operation 614, routing decisions made for flows identified as sources of congestion are constrained. Routing decisions made for the flow channels corresponding to congestion-source flows are limited in a manner that prevents congestion from propagating. Restated, under the adaptive routing techniques described above, flows that are causes of congestion are not allowed to continue through the fabric in the same way as flows that do not propagate congestion. For example, once a flow is identified as a source of congestion at the previous operation 612, new packets for that flow may be forced to take only the path on which the congestion was detected (preventing the congestion from spreading). Furthermore, any subsequent data transmission having the same source and destination as the flow associated with the source of the congestion may also be forced to take the same path on which the congestion was detected.

As another example, a flow channel may constrain the routing of a flow (identified as a source of congestion) into the fabric by slowing down the injection of new packets that are part of the same flow. Restricting the flow may include forcing new packets to be buffered in a flow channel queue at the edge port. These new packets are only allowed into the fabric as packets of the same flow leave the fabric at the destination edge port. By limiting the routing of a flow in this manner, the total buffering requirement for the flow within the fabric is limited to an amount that will not cause the fabric buffers to become full, thereby preventing further congestion in the network (caused by the flow).
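
A minimal sketch of such an injection check at the edge port follows, assuming the flow's outstanding data and its cap are tracked as shown; the field names follow the document, but the structure is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Edge-port injection state for a flow identified as a congestion source. */
typedef struct {
    uint32_t flow_extend;       /* data in the fabric not yet acknowledged */
    uint32_t max_flow_extend;   /* injection limit for this flow           */
} edge_flow_t;

/* A new packet is admitted into the fabric only while the flow's outstanding
 * data stays under its cap; otherwise it waits in the flow channel queue.    */
static bool may_inject(const edge_flow_t *f, uint32_t pkt_len)
{
    return f->flow_extend + pkt_len <= f->max_flow_extend;
}
```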

Alternatively, operation 612 may determine that the flow is not the source of congestion but a victim. As a result, process 600 continues to operation 616. At operation 616, flows identified as congestion victims are allowed to continue to be routed nominally through the fabric in accordance with the disclosed adaptive routing techniques. Referring back to the example where the continuous flow is part of an incast communication and is identified as the source of congestion, another flow (another of the multiple data transmissions) that is not part of the incast communication may be routed so as to share a link with the incast flow. Although the incast flow causes the congestion, the other flow still experiences congestion (because it is on the same congested link). Process 600 does not treat flows that are sources of congestion and flows that are merely victims of congestion in the same manner, thereby improving the performance of the overall system (e.g., by not restricting the routing of flows whose rerouting has a low likelihood of significantly reducing congestion).

By utilizing flow channels in adaptive routing, each flow in a network switch may have its own dedicated packet queue. This separates the buffering for each flow and further allows separate flow control for each flow. This completely separate flow control enables the network to be lossless. For example, one flow using a link may be blocked on its way to its final destination without blocking any other flow using the same link. Unlike in a conventional packet-switched network, congestion in one part of the network will only affect the flows that are sending packets into the congestion bottleneck. In a typical lossless network, the buffers before a congestion bottleneck quickly fill with congested packets. This in turn forces the switch to assert a pause, or to use some other flow control method, to prevent the previous switch from sending packets onto the link with the full buffer. The congested packets are stopped, and all other packets that may not even be headed for the congestion bottleneck are also stopped, forcing the congestion to spread laterally and increasing the size of the saturation tree.

By implementing flow channels, the load presented on a link by a congested flow before a congestion bottleneck is reduced, allowing other flows sharing the earlier links to use more link bandwidth and complete their communications faster. Only packets belonging to flows identified as sources of the congestion bottleneck are slowed down; other flows not affected by the congestion are not. These flows utilize the load released by the congested flow. Eventually, the congestion clears, and the flows destined for the congested hot spot complete their communication without losing any packets.

Fig. 7 illustrates an example switch 702 (which may be an embodiment of any one or more of switches 102, 104, 106, 108, and 110) that may be used to create a switch fabric (e.g., switch fabric 100 of fig. 1). In this example, switch 702 may include a number of communication ports, such as port 720. Each port may include a transmitter and a receiver. The switch 702 may also include a processor 704, a storage device 706, and flow channel switching logic 708. The flow channel switching logic 708 may be coupled to all of the communication ports and may further include a crossbar switch 710, EFCT logic 712, IFCT logic 714, and OFCT logic 716.

Crossbar switch 710 may include one or more crossbar switch chips that may be configured to forward data packets and control packets (such as ACK packets) among the communication ports. EFCT logic 712 may process packets received from an edge link and map the received packets to corresponding flows based on one or more header fields in the packets. Additionally, EFCT logic 712 may assemble FGFC Ethernet frames that may be communicated to an end host to control the amount of data injected by the various processes or threads. The IFCT logic block 714 may include an IFCT and perform various flow control methods in response to control packets, such as endpoint congestion notification ACKs and fabric link credit-based flow control ACKs. The OFCT logic 716 can include a memory unit that stores OFCTs and communicates with the IFCT logic of another switch to update the flow IDs of packets as they are forwarded to the next-hop switch.

In one embodiment, switch 702 is an application-specific integrated circuit (ASIC) that can provide 64 network ports, each of which can operate at either 100 Gbps or 200 Gbps, for a total aggregate throughput of 12.8 Tbps. Each network edge port may be capable of supporting IEEE 802.3 Ethernet and optimized IP-based protocols, as well as Portals (an enhanced frame format that provides support for higher rates of small messages). Ethernet frames may be bridged based on their L2 address, or they may be routed based on their L3 (IPv4/IPv6) address. Optimized IP frames have only an L3 (IPv4/IPv6) header and are routed. Specialized NICs may support a fabric format that can be used with the Portals enhanced frame format and may map directly onto network 100; for example, the fabric format provides certain control and status fields to support a multi-chip fabric when switch chips (such as switches 102, 104, 106, 108, and 110) connect and communicate with each other. As alluded to above, flow-channel-based congestion control mechanisms may be used by such switches and may also enable high transmission rates for small packets (e.g., over 1.2 billion packets per second per port) to accommodate the needs of HPC applications.

Switch 702 may provide system-wide quality of service (QoS) classes along with the ability to control how network bandwidth is allocated to different classes of traffic and different classes of applications, where a single privileged application may access more than one class of traffic. In the presence of network bandwidth contention, the arbiter selects a packet to forward based on the packet's traffic class and the credits available for that class. The network may support a minimum bandwidth and a maximum bandwidth for each traffic class. If one class does not use its minimum bandwidth, the other classes may use the unused bandwidth, but none of the classes may get more than its maximum allocated bandwidth. The ability to manage bandwidth provides an opportunity to dedicate network resources as well as CPU and memory bandwidth to a particular application.
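
For illustration only, a simplified arbiter implementing the minimum/maximum bandwidth policy described above might look like the following sketch; the per-window byte accounting is an assumption made for this example, not the disclosed mechanism.

```c
#include <stdint.h>
#include <stddef.h>

/* A class below its minimum is always preferred; a class at or above its
 * maximum is never eligible; otherwise it may borrow unused bandwidth.    */
typedef struct {
    uint64_t bytes_sent;   /* bytes forwarded in the current accounting window */
    uint64_t min_bytes;    /* guaranteed minimum for the window                */
    uint64_t max_bytes;    /* hard ceiling for the window                      */
} traffic_class_t;

static int eligible(const traffic_class_t *tc)
{
    return tc->bytes_sent < tc->max_bytes;   /* never exceed the maximum */
}

/* Pick a class to serve: prefer classes still under their minimum, then any
 * eligible class (which uses bandwidth left unused by the others).           */
static int select_class(const traffic_class_t *tc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (eligible(&tc[i]) && tc[i].bytes_sent < tc[i].min_bytes)
            return (int)i;
    for (size_t i = 0; i < n; i++)
        if (eligible(&tc[i]))
            return (int)i;
    return -1;   /* nothing eligible to send */
}
```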

In addition to supporting QoS classes, switch 702 also implements flow-channel-based congestion control, can limit the routing of flows identified as congested, and can reduce the number of network hops (e.g., in a network with a dragonfly topology) from five network hops to three. The design of the switch 702, described in more detail below, may reduce network cost and power consumption, and may further facilitate the use of innovative adaptive routing algorithms that improve application performance. A fabric created from multiple switches (such as multiple switches 702) can also be used to construct a fat-tree network, for example when building a storage subsystem for integration with third-party networks and software. Still further, the use of the switch 702 enables fine-grained adaptive routing while maintaining in-order packet delivery. In some embodiments, the switch 702 may be configured to send the header of a packet from the input port to the output port before the complete data payload arrives, thereby allowing the output port load metric to reflect future load and improving the adaptive routing decisions made by the switch 702. Crossbar switch 710 may include individual, distributed crossbars that route data/data elements between input ports and output ports. Switch 702 may have multiple transmit/receive ports, such as port 720. The portion of the switch 702 associated with the egress function typically operates on frames in the switch fabric format and has a fabric header, even, for example, for frames arriving at and leaving an Ethernet port within a single switch 702.

Fig. 8 depicts a block diagram of an example computer system 800 in which various embodiments described herein may be implemented. Computer system 800 includes a bus 802 or other communication mechanism for communicating information, and one or more hardware processors 804 coupled with bus 802 for processing information. Hardware processor(s) 804 may be, for example, one or more general-purpose microprocessors.

Computer system 800 also includes a main memory 806, such as a Random Access Memory (RAM), cache memory, and/or other dynamic storage device, coupled to bus 802 for storing information and instructions to be executed by processor 804. Main memory 806 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 804. Such instructions, when stored in a storage medium accessible to processor 804, cause computer system 800 to appear as a special-purpose machine that is customized to perform the operations specified in the instructions.

Computer system 800 further includes a Read Only Memory (ROM)808 or other static storage device coupled to bus 802 for storing static information and instructions for processor 804. A storage device 810, such as a magnetic disk, optical disk or USB thumb drive (flash drive), is provided and coupled to bus 802 for storing information and instructions.

Computer system 800 may be coupled via bus 802 to a display 812, such as a Liquid Crystal Display (LCD) (or touch screen), for displaying information to a computer user. An input device 814, including alphanumeric and other keys, is coupled to bus 802 for communicating information and command selections to processor 804. Another type of user input device is cursor control 816, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 804 and for controlling cursor movement on display 812. In some embodiments, the same directional information and command selections as the cursor control may be implemented via receiving a touch on the touchscreen without a cursor.

The computing system 800 may include a user interface module for implementing a GUI, which may be stored in a mass storage device as executable software code executed by the computing device(s). By way of example, such and other modules may include components (such as software components, object-oriented software components, class components, and task components), procedures, functions, attributes, programs, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

In general, the terms "component," "engine," "system," "database," "data store," and the like, as used herein, may refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language such as, for example, Java, C, or C++. A software component may be compiled and linked into an executable program, installed in a dynamically linked library, or written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts. Software components configured for execution on a computing device may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disk, or any other tangible medium, or provided as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression, or decryption prior to execution). Such software code may be stored, partially or wholly, on a memory device of the executing computing device, for execution by the computing device. The software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware components may comprise connected logic units (such as gates and flip-flops), and/or may comprise programmable units (such as programmable gate arrays or processors).

Computer system 800 may implement the techniques described herein using custom hardwired logic, one or more ASICs or FPGAs, firmware, and/or program logic, which in combination with the computer system causes computer system 800 to be a special purpose machine or programs computer system 800 to be a special purpose machine. According to one embodiment, the techniques herein are performed by computer system 800 in response to processor(s) 804 executing one or more sequences of one or more instructions contained in main memory 806. Such instructions may be read into main memory 806 from another storage medium, such as storage device 810. Execution of the sequences of instructions contained in main memory 806 causes processor(s) 804 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term "non-transitory medium" and similar terms as used herein refer to any medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such non-transitory media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 810. Volatile media includes dynamic memory, such as main memory 806. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and network versions thereof.

The non-transitory medium is different from, but may be used in combination with, a transmission medium. Transmission media participate in the transfer of information between non-transitory media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 802. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.

Computer system 800 also includes a communication interface 818 coupled to bus 802. Network interface 818 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, communication interface 818 may be an Integrated Services Digital Network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface 818 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component for communicating with a WAN). Wireless links may also be implemented. In any such implementation, network interface 818 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

The network link typically provides data communication through one or more networks to other data devices. For example, the network link may provide a connection through a local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the worldwide packet data communication network (now commonly referred to as the "internet"). Both the local network and the internet use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link and through communication interface 818, which carry the digital data to and from computer system 800, are exemplary forms of transmission media.

Computer system 800 can send messages and receive data, including program code, through the network(s), network link, and communication interface 818. In the Internet example, a server might transmit a requested code for an application program through the Internet, an ISP, local network and communication interface 818.

The received code may be executed by processor 804 as it is received, and/or stored in storage device 810, or other non-volatile storage for later execution. Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code means executed by one or more computer systems or computer processors comprising computer hardware. The one or more computer systems or computer processors may also be operable to support the performance of related operations in a "cloud computing" environment or to operate as a "software as a service" (SaaS). The processes and algorithms may be implemented in part or in whole in application specific circuitry. The various features and processes described above may be used independently of one another or may be combined in various ways. Various combinations and sub-combinations are intended to fall within the scope of the present disclosure, and certain method or process blocks may be omitted in some embodiments. The methods and processes described herein are also not limited to any particular order, and the blocks or states associated therewith may be performed in other suitable orders, or may be performed in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The performance of certain operations or processes may be distributed among computer systems or computer processors, not only residing within a single machine, but also deployed across several machines.

As used herein, a circuit may be implemented using any form of hardware, software, or combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logic components, software routines, or other mechanisms may be implemented to make up a circuit. In embodiments, the various circuits described herein may be implemented as discrete circuits, or the functions and features described may be shared, in part or in whole, among one or more circuits. Although various features or functions may be described or claimed separately as separate circuits, these features and functions may be shared among one or more common circuits, and such description should not require or imply that separate circuits are required to implement such features or functions. Where circuitry is implemented, in whole or in part, using software, such software may be implemented to operate with a computing or processing system (such as computer system 800) capable of implementing the functionality described with respect thereto.

As used herein, the term "or" may be interpreted in an inclusive or exclusive sense. Furthermore, the description of a resource, operation, or structure in the singular should not be construed as excluding the plural. Conditional language (such as, among others, "can," "could," "might," or "may") is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps, unless expressly stated otherwise or otherwise understood within the context as used.

Terms and phrases used in this document, and variations thereof, unless expressly stated otherwise, should be construed as open ended as opposed to limiting. Adjectives such as "conventional," "traditional," "normal," "standard," "known," and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known at any time, now or in the future. In some instances, the presence of broad words and phrases such as "one or more," "at least," "but not limited to," or other like phrases should not be read to mean that the narrower case is intended or required in instances where such broad phrase may be lacking.
