System and method for performing transaction aggregation in a network on chip (NoC)

Document No.: 1952544    Publication date: 2021-12-10

Description: this technology, "System and method for performing transaction aggregation in a network on chip (NoC)", was created by B. de Lescure and M. Frank on 2021-06-08. Abstract: Systems and methods for aggregating identical requests sent from multiple initiators over a network on chip (NoC) to a target are disclosed herein. The requests are marked for aggregation. The NoC aggregates the identical requests marked for aggregation using a Request Aggregator (RA) as an aggregation point. At the aggregation point, the identical requests are reduced to a single request, and the single request is sent to the target. The process is repeated in a cascaded manner through the NoC, possibly involving multiple request aggregators. When a response transaction is received back from the target at the aggregation point closest to the target, the response transaction is replicated and sent to each original requestor, either directly or through other request aggregators, which further replicate the replicated response transaction.

1. A system comprising:

a plurality of initiators for generating requests;

at least one target for generating a response; and

a network on chip, NoC, in communication with the plurality of initiators and the target, the NoC comprising:

a plurality of initiator network interface units, each initiator network interface unit connected to one initiator of the plurality of initiators;

at least one target network interface unit connected to the at least one target;

a plurality of switches connected to the plurality of initiator network interface units and the target network interface unit; and

a plurality of request aggregators in communication with the plurality of switches,

wherein one or more request aggregators of the plurality of request aggregators receive a plurality of request transactions from the plurality of initiators and aggregate the plurality of request transactions into an aggregated request transaction, each request transaction being marked for aggregation and targeted to the at least one target to which the aggregated request transaction is sent.

2. The system of claim 1, wherein the target generates a response transaction that is sent to the request aggregator of the NoC.

3. The system of claim 2, wherein the request aggregator replicates the response transaction and forwards one replicated response transaction to each initiator of the plurality of initiators that sent the request transactions marked for aggregation.

4. A method comprising:

receiving, at a request ingress port of a request aggregator, request transactions marked for aggregation from a plurality of initiators;

aggregating, at the request aggregator, the request transactions marked for aggregation to generate an aggregated request transaction;

sending the aggregated request transaction from a request egress port of the request aggregator to a target;

receiving, at a response ingress port of the request aggregator, a response transaction from the target;

replicating the response transaction at the request aggregator to generate a plurality of replicated response transactions; and

sending the plurality of replicated response transactions from a response egress port of the request aggregator to the plurality of initiators such that each initiator receives one replicated response transaction.

5. The method of claim 4, further comprising: sending the replicated response transaction to a second request aggregator to further replicate the replicated response transaction.

6. A non-transitory computer-readable medium comprising code that, when executed by a processor, causes a network on chip (NoC) to:

identify request transactions marked for aggregation;

aggregate the request transactions to generate an aggregated request transaction;

send the aggregated request transaction to a target;

receive a response transaction for the aggregated request transaction;

replicate the response transaction to produce a plurality of replicated response transactions; and

send the plurality of replicated response transactions to a plurality of initiators such that each initiator receives one replicated response transaction.

Technical Field

The present technology is in the field of computer system design and, more particularly, relates to aggregating transactions within a network on chip (NoC) to improve performance.

Background

Multiprocessor systems implemented in a system on chip (SoC) communicate over a network, such as a network on chip (NoC). Intellectual Property (IP) blocks are used for chip design; the SoC includes instances of IP blocks. Some IP blocks are initiators and some IP blocks are targets. Initiators and targets communicate over a network, such as a NoC.

Transactions are sent from an initiator to a target using any one of a number of industry-standard protocols (e.g., ARM AMBA AXI, AHB or APB, or OCP-IP). Such protocols have strict request/response semantics and are typically treated by NoCs as unicast: an initiator connected to the NoC sends a request to a target. The initiator selects a target using an address and sends a request transaction to that target. The NoC decodes the address and transports the request from the initiator to the target. The target processes the request transaction and sends a response transaction, which is transported by the NoC back to the initiator.

If multiple initiators need to send the same request to the same address (to one target), the set of requests (all identical, all sent to the same address) will be serialized at the target by the interconnect (e.g., the NoC). The target sends the same response to each initiator, one at a time. This is very inefficient because the target sends the same response to multiple initiators in a serialized manner, and the same response may occupy the same interconnect link multiple times. For example, if 16 initiators are to read data at the same address of the same target, the interconnect (NoC) must serialize 16 identical requests to the target. The target then needs to send the same data back 16 times, once to each requesting initiator. The total operation at the target takes 16 times as long as a single read request. Therefore, there is a need for a system and method to aggregate identical requests sent to a target, to reduce the number of identical requests sent to the target, to reduce the number of identical requests transmitted by a NoC, and, correspondingly, to reduce the number of identical responses transmitted by the NoC.

Disclosure of Invention

According to various embodiments and aspects of the present invention, a system and method of aggregating identical requests sent from multiple initiators to a target is disclosed. The same request is marked for aggregation. An interconnect or network on chip (NoC) aggregates all of the same requests at an aggregation point. At the aggregation point, multiple identical requests are reduced to a single request. A single request is sent to the target. According to various aspects of the present invention, the process is repeated in a cascaded manner by the NoC. When a response to a request from a target is received back at the aggregation point, the response is replicated and sent to each original initiator.

One advantage of the various aspects and embodiments of the present invention is that maximum efficiency is achieved for operations in which multiple initiators send the same request to the same address. The advantages include, but are not limited to, systems where the request is a read request; various aspects and embodiments of the present invention encompass other types of requests, such as write requests. Taking the above non-limiting example, 16 initiators each send a read (or write) to the same address of the target. The target will receive only one read (or write) command. The target will then send only one response (for a write request, either no response is needed or the response is an acknowledgement that the write has been performed), which will be replicated 16 times en route.

Yet another advantage of the various aspects and embodiments of the present invention is in applications where multiple initiators or nodes want to read the same data from external memory at the same time, such as deep neural network computations and artificial intelligence computations.

Drawings

For a more complete understanding of the present invention, reference is now made to the accompanying drawings. The present invention is described, according to the aspects and embodiments in the following description, with reference to the drawings, in which like numerals represent the same or similar elements. With the understanding that these drawings are not to be considered limiting of its scope, the presently described aspects and embodiments, as well as the presently understood best mode of the invention, are described with additional detail through use of the drawings.

Fig. 1A illustrates a network on chip (NoC) connecting multiple initiators to multiple targets in accordance with various aspects and embodiments of the invention.

Fig. 1B illustrates a NoC connecting multiple initiators to multiple targets in accordance with various aspects and embodiments of the invention.

Fig. 1C illustrates a NoC connecting multiple initiators to multiple targets in accordance with various aspects and embodiments of the invention.

Fig. 2 illustrates the NoC of fig. 1 with an aggregation of request transactions sent from multiple initiators to one target in accordance with various aspects and embodiments of the invention.

Fig. 3 illustrates the NoC of fig. 2 with a response transaction sent from one target to multiple initiators in accordance with various aspects and embodiments of the invention.

FIG. 4 illustrates an address diagram in accordance with various aspects and embodiments of the invention.

Fig. 5 illustrates a process for aggregating transactions in accordance with various aspects and embodiments of the invention.

Detailed Description

Various examples of the present technology are described below, which illustrate various aspects and embodiments of the present invention. In general, examples may use the described aspects in any combination. All statements herein reciting principles, aspects, and embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Further, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Note that as used herein, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Reference throughout this specification to "one aspect," "an aspect," "certain aspects," "various aspects," or similar language means that a particular aspect, feature, structure, or characteristic described in connection with any embodiment is included in at least one embodiment of the present invention.

The appearances of the phrases "in one embodiment," "in at least one embodiment," "in certain embodiments," and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment or similar embodiments. Furthermore, the aspects and embodiments of the invention described herein are merely exemplary and should not be construed as limiting the scope or spirit of the invention as understood by those of ordinary skill in the art. The disclosed invention can be effectively made or used in any embodiment that includes any novel aspect described herein. All statements herein reciting aspects and embodiments of the invention are intended to encompass both structural and functional equivalents thereof. It is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future.

As used herein, an "initiator" refers to an Intellectual Property (IP) block, unit, or module. The terms "initiator," "requestor," and "master" may be used interchangeably within the scope and embodiments of the present invention. As used herein, "target" and "slave" refer to IP blocks; the terms "target" and "slave" may be used interchangeably within the scope and embodiments of the present invention.

As used herein, a transaction may be a request transaction or a response transaction. Examples of request transactions include write requests and read requests. According to various aspects of the invention, when multiple initiators send the same request to the same target, the requests are aggregated. According to one aspect of the invention, "identical requests" means that the request transactions have the same address, the same type, and the same size; the same request is sent from multiple initiators to one target. According to an aspect of the invention, "identical requests" means that the request transactions have the same address, optionally the same request type and optionally the same length.
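
As an illustration only, the matching rule described above can be sketched in a few lines of Python; the Request fields and the is_identical helper are hypothetical names, and whether type and length are checked is the design-time choice noted above, not something this disclosure fixes.

```python
from typing import NamedTuple

class Request(NamedTuple):
    address: int
    req_type: str   # e.g. "read" or "write"
    length: int

def is_identical(a: Request, b: Request,
                 match_type: bool = True, match_length: bool = True) -> bool:
    """Same address is always required; request type and length are optional criteria."""
    if a.address != b.address:
        return False
    if match_type and a.req_type != b.req_type:
        return False
    if match_length and a.length != b.length:
        return False
    return True
```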

As used herein, a node is defined as a distribution point or a communication endpoint that is capable of creating, receiving, and/or transmitting information over a communication path or channel. A node may refer to any of the following: a switch (switch), a splitter (splitter), a merger (merger), a buffer (buffer), and an adapter (adapter). As used herein, splitters and combiners are switches; not all switches are splitters or combiners. As used herein and in accordance with various aspects and embodiments of the present invention, the term "splitter" describes a switch having a single ingress port and multiple egress ports. As used herein and in accordance with various aspects and embodiments of the present invention, the term "merger" describes a switch having a single egress port and multiple ingress ports.

Referring now to fig. 1, a network on chip (NoC) 100 is shown in accordance with various aspects and embodiments of the present invention. The NoC 100 includes nodes or switches 102a, 102b, 102c, and 102d. The initiators 104 (M0, M1, M2, M3) are connected to the NoC 100 using Network Interface (NI) units 106. The NI units 106 convert incoming transactions (requests to read or write) into the protocol used by the NoC 100. The NoC 100 is connected to the targets 108 (T0, T1) using Network Interface (NI) units 116, which perform a function similar to that of the NI units 106, converting the NoC protocol to the target protocol.

In accordance with various aspects of the present invention, fig. 1A, 1B, and 1C show only the request-network connections, for clarity. A request transaction travels from any one of the initiators 104 through the NoC 100 to any one of the targets 108. According to other aspects of the invention, there are also corresponding response networks, in which responses travel from the targets back to the initiators.

According to some aspects and embodiments of the present invention, the NoC 100 includes Request Aggregators (RA) 110, such as RA0, RA1, RA10, and RA11. Various embodiments for implementing the RA 110 are possible. According to some aspects and embodiments of the invention, the RA 110 is a hardware component that works in conjunction with a switch 102 and is connected to the switch 102, as shown in FIG. 1A for clarity.

Referring to fig. 1B, according to other aspects and embodiments of the invention, any RA 110 (e.g., RA10 110 as shown) may be part of, or an internal hardware component of, a switch 102, such as switch 102b.

Referring now to fig. 1C, according to some other aspects and embodiments of the invention, any RA 110 (e.g., RA10 110 as shown) may support and serve multiple switches 102 (e.g., switch 102b and switch 102d).

The RA 110 is connected to both the request and response networks. As described above, only the request-network side of the connections is shown in fig. 1A, 1B, and 1C, for clarity. The response-network connections are not shown, although they are similar to those of fig. 1A, 1B, and 1C, in the opposite direction. The RA 110 includes a plurality of ingress ports. Each ingress port is a destination of requests (toward a particular target) and a source of responses (toward a particular initiator). The RA 110 has at least one egress port, which is a source of requests and a destination of responses. The RA 110 receives request packets on a set of ingress ports, aggregates the requests, and sends the aggregated request to one egress port. According to an aspect of the invention, the number of ingress and egress ports is set at design time.
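
Purely as an illustrative sketch (not part of the disclosure), the design-time port configuration described above might be captured as follows; the field names and example counts are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AggregatorPortConfig:
    """Design-time port counts of one request aggregator (example values only).
    Each request ingress port is paired with a response egress port toward the same
    initiator; the request egress port is paired with a response ingress port toward
    the target."""
    request_ingress_ports: int = 2
    request_egress_ports: int = 1
    response_ingress_ports: int = 1
    response_egress_ports: int = 2
```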

Referring now to fig. 2, RAs 110 (RA0, RA1, RA10, and RA11) are shown, according to some aspects and embodiments of the present invention. In one example, a plurality of identical requests, such as Read (RD) requests (RD0, RD1, RD2, and RD3), are generated by the plurality of initiators 104 (M0, M1, M2, and M3), respectively. Requests RD0, RD1, RD2, and RD3 are transactions directed to and sent toward target T0. Read request transactions RD0, RD1, RD2, and RD3 are identical requests and have been identified for aggregation using a special tag, a special address, a qualifier bit, or a combination of bits contained in the payload of the transaction. These requests may be aggregated when they are sent to a particular address region of the address map 400, as shown in FIG. 4.

RD0 and RD1 arrive at switch 102a and then go to RA0 110. RD0 and RD1 are aggregated at RA0 110 to generate an aggregated read request transaction RD4. RD4 is a single request that stands in for the original request transactions RD0 and RD1; it is sent to switch 102b and then to RA10 110 as RD4 advances towards target T0.

RD2 and RD3 arrive at switch 102c and then at RA1 110. RD2 and RD3 are aggregated at RA1 110 to generate an aggregated read request transaction RD5. RD5 is a single request that stands in for the original request transactions RD2 and RD3; it is sent to switch 102b and then to RA10 110 as RD5 advances towards target T0.

At RA10 110, the two aggregated requests RD4 and RD5, from RA0 110 and RA1 110 respectively, are aggregated again to generate an aggregated read request transaction RD6. RD6 is sent to NI unit 116 and then to target T0.
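
The cascade of fig. 2 can be illustrated with a short, self-contained Python sketch; the aggregate helper and the RD0-RD6 names follow the example above, while everything else (the dictionary representation, the example address) is hypothetical.

```python
def aggregate(requests, name):
    """Reduce identical requests (same address) to a single request, remembering the sources."""
    addresses = {r["addr"] for r in requests}
    assert len(addresses) == 1, "only identical requests are aggregated"
    return {"name": name, "addr": addresses.pop(), "sources": [r["name"] for r in requests]}

# Reads of the same address of target T0 from initiators M0..M3.
rd0, rd1, rd2, rd3 = ({"name": f"RD{i}", "addr": 0x1000} for i in range(4))

rd4 = aggregate([rd0, rd1], "RD4")   # at RA0, reached via switch 102a
rd5 = aggregate([rd2, rd3], "RD5")   # at RA1, reached via switch 102c
rd6 = aggregate([rd4, rd5], "RD6")   # at RA10, reached via switch 102b; RD6 goes to T0

print(rd6["sources"])                # ['RD4', 'RD5'] -- only one request reaches the target
```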

Referring now to FIG. 3, a response transaction RSP0 is generated by target T0 108. The appropriate RA is the identified destination for the response transaction. RSP0 is sent back from target T0 108, at some point reaches the switch of the response network connected to RA10 110, and then goes to the response port paired with the egress port of RA10 110. As described above, and according to various aspects and embodiments of the present invention, the RA may be part of the switch hardware, as on the request transaction side.

RA10 110 replicates the response transaction to generate RSP1 and RSP2. RA10 110 sends RSP1 and RSP2 back to the response ports paired with the egress ports of RA0 110 and RA1 110, respectively. RA10 110 sends a response transaction to each RA from which RA10 110 received a request transaction. Thus, RA10 110 sends one response transaction to RA0 110 and one to RA1 110, since the original request transactions arrived at RA10 110 from RA0 110 and RA1 110.

RSP1 is received at the response port paired with the egress port of RA0 110. RA0 110 replicates RSP1 to generate RSP3 and RSP4, and sends RSP3 and RSP4 to initiator M0 104 and initiator M1 104, respectively, which sent the original requests RD0 and RD1.

RSP2 is received at the response port paired with the egress port of RA1 110. RA1 110 replicates RSP2 to generate RSP5 and RSP6. RA1 110 then sends RSP5 and RSP6 to initiator M2 104 and initiator M3 104, respectively, which sent the original request transactions RD2 and RD3.
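
A matching sketch for the response path of fig. 3, again illustrative only: the replication tree mirrors the request cascade, and the RSP0-RSP6 names follow the example, while the data value and dictionary representation are hypothetical.

```python
def replicate(response, names):
    """Copy one incoming response once per downstream destination (RA or initiator)."""
    return [dict(response, name=n) for n in names]

rsp0 = {"name": "RSP0", "data": 0xCAFE}            # generated once by target T0

rsp1, rsp2 = replicate(rsp0, ["RSP1", "RSP2"])     # at RA10, toward RA0 and RA1
rsp3, rsp4 = replicate(rsp1, ["RSP3", "RSP4"])     # at RA0, toward initiators M0 and M1
rsp5, rsp6 = replicate(rsp2, ["RSP5", "RSP6"])     # at RA1, toward initiators M2 and M3

# All four initiators receive the same data, read from T0 exactly once.
assert rsp3["data"] == rsp4["data"] == rsp5["data"] == rsp6["data"]
```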

The various aspects and embodiments of the present invention are most advantageous in, but not limited to, the case where the request is a read request. Other types of requests, such as write requests, are also contemplated by the present invention. In some embodiments, the write request has a response that is generated after the transaction is completed. In other embodiments, when a write request is executed, no response is generated after the transaction is completed.

The various aspects and embodiments of the present invention are applicable to any type of network topology and are most useful in, but not limited to, situations where RAs can be cascaded to keep request and response traffic to a minimum. According to various embodiments of the invention, there may be multiple aggregator cascades in a NoC, and one aggregator may be part of multiple different cascades, e.g., the two cascades ending at T0 and T1 as shown in fig. 1.

According to various aspects and embodiments of the present invention, an RA aggregates two or more identical requests by using storage present in each RA. The storage is used to track the address and size of aggregated requests. When a new request arrives on one of the aggregator ingress ports, the storage is looked up to see whether the same request has already been logged (same address, and optionally the same request type and/or optionally the same length). If so, the RA records which ingress port the request was received on and which source (initiator) the request came from; nothing is sent to the egress port of the RA: the request is "aggregated" with the requests that have already been received. If the storage lookup does not indicate that the same request has already been logged, the RA records which ingress port the request was received on and which source (initiator) the request came from, and a new entry is added to its storage. In addition, the request is recorded in the storage, including the address and optionally the type and/or optionally the length. After the multiple requests have been aggregated, the RA sends a corresponding request on its egress port. According to some aspects and embodiments of the invention, the number of requests that need to be aggregated before a request is issued on an egress port of the RA is configured at design time. According to some aspects and embodiments of the invention, the number of requests that need to be aggregated is dynamically specified in a special field of the packet header. When all responses corresponding to the request have been replicated and sent back to the requestors, the entry is deleted from the storage used to track the aggregated request addresses.
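
A minimal Python sketch of this tracking logic is given below, assuming the design-time variant in which a fixed number of identical requests must arrive before the aggregated request is issued; the class and field names (RequestKey, PendingEntry, RequestAggregator) are illustrative and not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RequestKey:
    """Fields that make two requests 'identical' for aggregation purposes."""
    address: int
    req_type: str = "read"   # optional match criterion
    length: int = 1          # optional match criterion

@dataclass
class PendingEntry:
    """Which (ingress port, initiator) pairs are waiting on one aggregated request."""
    waiters: list = field(default_factory=list)

class RequestAggregator:
    def __init__(self, num_to_aggregate, send_on_egress):
        self.num_to_aggregate = num_to_aggregate  # fixed at design time in this sketch
        self.send_on_egress = send_on_egress      # callback toward the target
        self.storage = {}                         # RequestKey -> PendingEntry

    def on_request(self, ingress_port, initiator, key):
        entry = self.storage.setdefault(key, PendingEntry())
        entry.waiters.append((ingress_port, initiator))
        # Nothing is forwarded until the configured number of identical requests arrive.
        if len(entry.waiters) == self.num_to_aggregate:
            self.send_on_egress(key)              # one aggregated request toward the target

    def on_response(self, key, response, send_toward_initiator):
        # Replicate the single response to every recorded waiter, then delete the entry.
        entry = self.storage.pop(key)
        for ingress_port, initiator in entry.waiters:
            send_toward_initiator(ingress_port, initiator, response)
```

In the packet-header variant described above, num_to_aggregate would simply be replaced by a per-request value carried in the header.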

According to various aspects and embodiments of the present invention, an initiator selects between a normal transaction and an aggregated operation to a target by marking the operations to be aggregated with a request qualifier. One example of such a request qualifier is the use of a special flag or address bit, as shown in FIG. 4. When a request transaction is sent to a target with the special address bit equal to 0, the NoC does not perform aggregation. If the request transaction is sent with the special address bit equal to 1, the request transaction passes through an RA. The request transaction is aggregated with other requests that have the same special address bit set to 1 and are destined for the same address.
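
As a sketch of this qualifier scheme, assuming the qualifier is a single high-order address bit (the actual bit position is defined by the address map 400 of FIG. 4, so AGG_BIT below is an assumption):

```python
AGG_BIT = 40  # assumed position of the aggregation qualifier bit in the address

def is_marked_for_aggregation(address: int) -> bool:
    """The request is routed through an RA only when the special address bit is 1."""
    return bool((address >> AGG_BIT) & 1)

def base_address(address: int) -> int:
    """Strip the qualifier so marked and unmarked requests decode to the same target address."""
    return address & ~(1 << AGG_BIT)
```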

According to various aspects of the invention, the RA performs request aggregation by using a combining function to combine, into a single request, all requests destined for the same target that accesses or is connected to the NoC.

As described above, one advantage of the various aspects and embodiments of the present invention is read aggregation, where all read requests are combined into one request that is identical to each of the original requests. Other advantages may be realized where the request transaction is a write request; the invention is not limited by the type of transaction. For example, in write request aggregation, the write data associated with each request are combined together according to some mathematical or logical operation.
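
A sketch of such a combining function, assuming bitwise OR is the chosen mathematical or logical operation (the disclosure leaves the function open):

```python
from functools import reduce
from operator import or_

def combine_writes(write_payloads, combine=or_):
    """Combine the write data of identical write requests into a single payload.
    The combining function is a design choice; bitwise OR is only one possibility."""
    return reduce(combine, write_payloads)

# e.g. four initiators each setting a different flag bit at the same address:
assert combine_writes([0x1, 0x2, 0x4, 0x8]) == 0xF   # written to the target once
```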

According to a variation of the aspects and embodiments of the invention, additional response storage is implemented in some or all of the RAs. The additional storage may be used to hold the replicated responses for a period of time, allowing each response port to proceed independently of the other ports. It also allows requests for the same address that arrive after a group of requests for that address has been aggregated and responded to, to be serviced. This removes the need to delete the corresponding entry from the address storage after all responses are sent back; instead, the entry is retained for a longer time and deleted based on age or another eviction mechanism.
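
One possible sketch of such response storage with age-based eviction, offered only as an illustration; the ResponseCache name and the max_age_s value are assumptions.

```python
import time

class ResponseCache:
    """Holds replicated responses so later requests to the same address can be served
    without returning to the target. Age-based eviction is one of the mechanisms the
    text mentions; max_age_s is an assumed parameter."""
    def __init__(self, max_age_s: float = 1e-3):
        self.max_age_s = max_age_s
        self.entries = {}                     # address -> (response, timestamp)

    def put(self, address, response):
        self.entries[address] = (response, time.monotonic())

    def get(self, address):
        item = self.entries.get(address)
        if item is None:
            return None
        response, stamp = item
        if time.monotonic() - stamp > self.max_age_s:
            del self.entries[address]         # entry aged out
            return None
        return response
```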

According to various aspects and embodiments of the present invention, the system and method include implementing, in some or all RAs, speculation in which the aggregated read request is larger (requests more data) than the original read requests. Then, if a subsequent request arrives at the aggregator and requests data that is part of the speculatively requested data, the transaction waits in the aggregator for the response to the speculative request to return. When the response returns, all requests are responded to from the aggregator, including the requests waiting for speculatively requested data. In other words, the RA may support speculative prefetching of read data.
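
A sketch of the speculative widening idea, assuming byte addressing and a fixed speculative window (SPEC_BYTES is an assumed design parameter, not something the disclosure specifies):

```python
SPEC_BYTES = 256  # assumed size of the speculative read window

def widen_read(address: int, length: int):
    """Turn an original read into a larger, window-aligned speculative read."""
    base = address - (address % SPEC_BYTES)
    size = SPEC_BYTES
    while base + size < address + length:     # make sure the original read is covered
        size += SPEC_BYTES
    return base, size

def is_covered(base: int, size: int, address: int, length: int) -> bool:
    """A later read whose data lies inside the speculative window waits for that response."""
    return base <= address and address + length <= base + size
```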

Referring now to fig. 5, a process 500 for aggregating request transactions is shown, in accordance with various embodiments and aspects of the present invention. Process 500 begins and, at step 502, a plurality of initiators each send a request transaction to the NoC; when a request transaction is marked for aggregation, an RA (in communication with, or included in, a switch) receives the request transaction marked for aggregation. At step 504, the RA of the NoC aggregates the request transactions to generate an aggregated request transaction. At step 506, the NoC forwards the aggregated request transaction to the addressed target, and the target receives the aggregated request transaction. At step 508, the target generates a response transaction that is sent to the RA of the NoC. At step 510, the RA replicates the response transaction and the NoC sends a replicated response transaction to each initiator that sent a request transaction marked for aggregation. As noted herein, and in accordance with various aspects and embodiments of the present invention, some request transactions are aggregated and completed without a response transaction being needed or generated.

Certain methods according to various aspects of the present invention may be performed by instructions stored on a non-transitory computer-readable medium. A non-transitory computer-readable medium stores code comprising instructions that, if executed by one or more processors, cause a system or computer to perform the steps of the methods described herein. Non-transitory computer-readable media include: rotating magnetic disks, rotating optical disks, flash random access memory (RAM) chips, and other mechanically moving or solid-state storage media. According to various examples, any type of computer-readable medium is suitable for storing code comprising instructions.

Certain examples have been described herein, and it should be noted that different combinations of different components from different examples are possible. Salient features are presented to better explain examples; it will be apparent, however, that certain features may be added, modified and/or omitted without modifying the functional aspects of these described examples.

Various examples are methods that use the behavior of any machine or combination of machines. Method examples are complete wherever in the world most of the constituent steps occur. For example, and in accordance with various aspects and embodiments of the present invention, an IP element or unit comprises: a processor (e.g., a CPU or GPU), random access memory (RAM, e.g., off-chip dynamic RAM or DRAM), and a network interface for wired or wireless connections, such as Ethernet, WiFi, 3G, 4G Long Term Evolution (LTE), 5G, and other wireless interface standard radios. The IP may also include various I/O interface devices, as needed for different peripheral devices, such as touch screen sensors, geolocation receivers, microphones, speakers, Bluetooth peripherals, and USB devices such as keyboards and mice. By executing instructions stored in the RAM device, the processor performs the steps of the methods described herein.

Some examples are one or more non-transitory computer-readable media arranged to store such instructions for the methods described herein. Any machine having a non-transitory computer-readable medium that includes any necessary code may implement the examples. Some examples may be implemented as: physical devices, such as semiconductor chips; a hardware description language representation of the logical or functional behavior of such a device; and one or more non-transitory computer-readable media for storing such hardware description language representations. The description of the principles, aspects, and embodiments described herein includes structural and functional equivalents thereof. Elements described herein as coupled have an effective relationship that may be achieved through direct connection or indirect connection with one or more other intermediate elements.

Those skilled in the art will recognize many modifications and variations. Modifications and variations include any relevant combination of the disclosed features. The description of the principles, aspects, and embodiments described herein includes structural and functional equivalents thereof. Elements described herein as "coupled" or "communicatively coupled" have a valid relationship that may be achieved through direct connection or indirect connection using one or more other intermediate elements. Embodiments described herein as "communicating" or "with" another device, module, or element include any form of communication or linkage and include an operative relationship. For example, the communication link may be established using a wired connection, a wireless protocol, a near field protocol, or RFID.

To the extent that the terms "includes", "including", "having", "has", "with", or variants thereof are used in the detailed description and the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".

Thus, the scope of the present invention is not intended to be limited to the exemplary embodiments and aspects shown and described herein. Rather, the scope and spirit of the invention is embodied by the appended claims.
