Cache allocation method for a router, network on chip and electronic device

Document No.: 856854    Publication date: 2021-04-02

Description: This invention, "Cache allocation method for a router, network on chip and electronic device" (用于路由器的缓存分配方法、片上网络及电子设备), was designed and created by Yang Ping (杨平) on 2020-12-28. Summary: A cache allocation method for a router, a network on chip, and an electronic device are provided. The cache allocation method includes: determining the active ports among a plurality of receiving ports; and, in response to the number of cache units in a global shared pool being smaller than the number of active ports while at least one idle port exists among the plurality of receiving ports, reclaiming the cache units of the at least one idle port and adding the reclaimed cache units to the global shared pool. The allocatable cache units among the plurality of cache units form the global shared pool; the cache units in the global shared pool are to be allocated to the active ports, and the idle ports are the receiving ports other than the active ports. This cache allocation method can dynamically adjust cache allocation at a finer granularity and improve the utilization of the shared cache; the method provided by at least some embodiments can also dynamically adjust port priorities, so that cache is allocated based on port priority.

1. A cache allocation method for a router, wherein the router comprises a plurality of receiving ports and a plurality of cache units, the method comprising:

determining active ports among the plurality of receiving ports;

in response to the number of cache units in a global shared pool being smaller than the number of active ports and at least one idle port existing among the plurality of receiving ports, reclaiming the cache units of the at least one idle port and adding the reclaimed cache units to the global shared pool;

wherein the allocatable cache units among the plurality of cache units form the global shared pool, the cache units in the global shared pool are to be allocated to the active ports, and the idle ports are the receiving ports among the plurality of receiving ports other than the active ports.

2. The method of claim 1, wherein an active port is a receiving port that is transmitting data at the current moment, and/or a receiving port that is not transmitting data at the current moment but has corresponding data in the corresponding upper-level router and whose number of corresponding cache units is smaller than a maximum preset value, the upper-level router being configured to send data to the router.

3. The method of claim 1, wherein reclaiming the cache units of the at least one idle port and adding the reclaimed cache units to the global shared pool comprises:

determining a cache reclamation budget based on the number of active ports;

distributing the cache reclamation budget among the at least one idle port according to a reclamation weight value of the at least one idle port, to obtain a budget value for each idle port of the at least one idle port;

based on the budget value, sending a reclamation request to the upper-level router connected to each idle port of the at least one idle port, wherein the reclamation request carries the budget value of the corresponding idle port;

and, in response to receiving a token reclamation signal sent by the upper-level router connected to each idle port, reclaiming the cache units of each idle port according to the budget value and adding the reclaimed cache units to the global shared pool.

4. The method of claim 3, wherein the reclamation weight value is determined based on the number of cache units of each idle port, the reclamation weight value being positively correlated with the number of cache units of the corresponding idle port.

5. The method of claim 3, wherein the cache reclamation budget is equal to the number of active ports, or to the difference between the number of active ports and the number of cache units in the global shared pool.

6. The method of claim 2, wherein determining the active ports among the plurality of receiving ports comprises:

determining a receiving port that is not transmitting data at the current moment but has corresponding data in the corresponding upper-level router as a candidate receiving port;

determining each candidate receiving port whose number of corresponding cache units is smaller than the maximum preset value as an active port;

and determining each receiving port that is transmitting data at the current moment as an active port.

7. The method of any of claims 1-6, further comprising:

allocating the cache units in the global shared pool to the active ports according to the priorities of the active ports.

8. The method of claim 7, wherein allocating the cache units in the global shared pool to the active ports according to the priorities of the active ports comprises:

receiving a cache congestion state signal sent by the upper-level router connected to each receiving port of the plurality of receiving ports, wherein the cache congestion state signal indicates a congestion state of the cache in the corresponding upper-level router;

determining the priorities of the active ports according to the cache congestion state signals;

and allocating the cache units in the global shared pool to the active ports in descending order of priority.

9. The method of claim 8, wherein the cache congestion state signal indicates one of a plurality of congestion states representing different degrees of congestion,

the plurality of congestion states being determined based on the number of cache units occupied, among the data stored in the upper-level router, by the data corresponding to the receiving port connected to that upper-level router.

10. The method of claim 9, wherein the plurality of receiving ports comprise at least a first receiving port and a second receiving port, and either the number of cache units occupied by the data corresponding to the first receiving port, among the data stored in the upper-level router connected to the first receiving port, is greater than the number of cache units occupied by the data corresponding to the second receiving port, among the data stored in the upper-level router connected to the second receiving port; or the percentage of cache units occupied by the data corresponding to the first receiving port, relative to all cache units occupied by the data stored in its connected upper-level router, is greater than the corresponding percentage for the second receiving port;

and the plurality of congestion states comprise at least a first state and a second state, the cache congestion state signal corresponding to the first receiving port indicating the first state, the cache congestion state signal corresponding to the second receiving port indicating the second state, the first state representing a higher degree of congestion than the second state.

11. The method of claim 10, wherein the priorities comprise at least a first priority and a second priority, the first receiving port having the first priority, the second receiving port having the second priority, and the first priority being higher than the second priority.

12. The method of claim 8, wherein the number of cache units allocated to each allocated active port is 1.

13. The method of any of claims 1-6, further comprising:

adding the cache units released by the active ports back into the global shared pool.

14. The method of any of claims 1-6, further comprising:

in response to the number of cache units in the global shared pool being greater than or equal to the number of active ports, allocating at least one cache unit in the global shared pool to each active port.

15. The method of any of claims 1-6, further comprising:

in an initialization stage, allocating the plurality of cache units to the plurality of receiving ports to complete initialization.

16. A network on chip comprising a plurality of routers, wherein the plurality of routers comprise a receiving router and at least one sending router, the receiving router being configured to receive data transmitted from the at least one sending router,

the receiving router is connected with each sending router through a data and command transmission bus configured to transmit the data from the sending router to the receiving router, and a bidirectional control channel configured to bidirectionally transmit control signals between the receiving router and the sending router,

the receiving router comprises a cache allocation controller, a plurality of receiving ports, and a plurality of cache units, each receiving port being connected to one sending router,

the cache allocation controller is configured to:

determining active ports among the plurality of receiving ports;

in response to the number of cache units in a global shared pool being smaller than the number of active ports and at least one idle port existing among the plurality of receiving ports, reclaiming the cache units of the at least one idle port and adding the reclaimed cache units to the global shared pool;

wherein the allocatable cache units among the plurality of cache units form the global shared pool, the cache units in the global shared pool are to be allocated to the active ports, and the idle ports are the receiving ports among the plurality of receiving ports other than the active ports.

17. The network on chip of claim 16, wherein the active port is a receiving port that is currently transmitting data, and/or a receiving port that is not currently transmitting data, has corresponding data in a correspondingly connected sending router, and has a number of corresponding cache units smaller than a maximum preset value.

18. The network on chip of claim 16, wherein the bidirectional control channel is a sideband bypass control channel or the data and command transport bus is multiplexed into the bidirectional control channel.

19. The network on chip of claim 18, wherein, in a case where the bidirectional control channel is the sideband bypass control channel, the sideband bypass control channel comprises a first channel, a second channel, and a third channel,

the first channel being configured to transmit a cache congestion state signal generated by the sending router to the receiving router, the cache congestion state signal indicating a congestion state of the cache in the corresponding sending router,

the second channel being configured to send a reclamation request generated by the cache allocation controller to the sending router, wherein the reclamation request carries the budget value of the corresponding idle port, the budget value being obtained based on a cache reclamation budget,

and the third channel being configured to transmit a token reclamation signal generated by the sending router to the receiving router, so that the cache units of the idle port are reclaimed according to the budget value and the reclaimed cache units are added to the global shared pool.

20. The network on chip of any of claims 16-19, wherein the receiving router further comprises a cache allocation arbiter,

the cache allocation arbiter being configured to allocate the cache units in the global shared pool to the active ports according to the priorities of the active ports.

21. A network on chip configured to implement the cache allocation method for a router of any one of claims 1 to 15.

22. An electronic device comprising the network on chip of any of claims 16-21.

Technical Field

Embodiments of the present disclosure relate to a cache allocation method for a router, a network on chip, and an electronic device.

Background

With the increasing core count of processor chips, the System on Chip (SoC) is evolving from multi-core to many-core, placing ever more stringent requirements on communication bandwidth and system scalability. In many-core systems, global interconnects can cause severe on-chip synchronization errors, unpredictable communication delays, and significant power-consumption overhead. The Network on Chip (NoC) is a new SoC communication architecture and a main component of multi-core technology; it implements communication between the modules of a chip, replacing the traditional bus or point-to-point interconnection. NoCs offer significantly better performance than conventional bus-based systems, and NoC-based systems adapt better to the globally asynchronous, locally synchronous clocking schemes used in the design of future complex multi-core SoCs, so NoCs are increasingly widely used.

Disclosure of Invention

At least one embodiment of the present disclosure provides a cache allocation method for a router, where the router includes a plurality of receiving ports and a plurality of cache units. The method includes: determining active ports among the plurality of receiving ports; and, in response to the number of cache units in a global shared pool being smaller than the number of active ports while at least one idle port exists among the plurality of receiving ports, reclaiming the cache units of the at least one idle port and adding the reclaimed cache units to the global shared pool. The allocatable cache units among the plurality of cache units form the global shared pool, the cache units in the global shared pool are to be allocated to the active ports, and the idle ports are the receiving ports among the plurality of receiving ports other than the active ports.
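The determination of active ports and the reclamation trigger described above can be sketched in a few lines of Python; the `Port` model, the `MAX_PRESET` value, and all function names are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Port:
    transmitting: bool = False       # transmitting data at the current moment
    upstream_has_data: bool = False  # data waits in the upper-level router
    credits: int = 0                 # cache units currently held by this port

MAX_PRESET = 4  # "maximum preset value" per port; the value 4 is assumed

def active_ports(ports):
    """Claims 2 and 6: a port is active if it is transmitting now, or is
    not transmitting but has pending upstream data and holds fewer cache
    units than the maximum preset value."""
    return [p for p in ports
            if p.transmitting
            or (p.upstream_has_data and p.credits < MAX_PRESET)]

def needs_reclaim(pool_size, ports):
    """Claim 1's trigger: reclaim idle-port cache units only when the
    global shared pool is smaller than the number of active ports and at
    least one idle port exists."""
    act = active_ports(ports)
    idle = [p for p in ports if p not in act]
    return pool_size < len(act) and len(idle) > 0
```

With two active ports and one idle port, `needs_reclaim` fires only while the pool holds fewer than two cache units.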

For example, in the method provided by an embodiment of the present disclosure, an active port is a receiving port that is transmitting data at the current moment, and/or a receiving port that is not transmitting data at the current moment but has corresponding data in the corresponding upper-level router and whose number of corresponding cache units is smaller than a maximum preset value, where the upper-level router is configured to send data to the router.

For example, in the method provided by an embodiment of the present disclosure, reclaiming the cache units of the at least one idle port and adding the reclaimed cache units to the global shared pool includes: determining a cache reclamation budget based on the number of active ports; distributing the cache reclamation budget among the at least one idle port according to a reclamation weight value of the at least one idle port, to obtain a budget value for each idle port; based on the budget value, sending a reclamation request to the upper-level router connected to each idle port, where the reclamation request carries the budget value of the corresponding idle port; and, in response to receiving a token reclamation signal sent by the upper-level router connected to each idle port, reclaiming the cache units of each idle port according to the budget value and adding the reclaimed cache units to the global shared pool.

For example, in the method provided by an embodiment of the present disclosure, the reclamation weight value is determined based on the number of cache units of each idle port, the reclamation weight value being positively correlated with the number of cache units of the corresponding idle port.

For example, in the method provided by an embodiment of the present disclosure, the cache reclamation budget is equal to the number of active ports, or equal to the difference between the number of active ports and the number of cache units in the global shared pool.
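The budget computation and its weight-proportional distribution over idle ports, as described in the paragraphs above, might be sketched as follows; the largest-remainder rounding policy and all names are assumptions for illustration.

```python
def reclamation_budget(num_active, pool_size, use_difference=True):
    """The budget equals the number of active ports, or the shortfall of
    the global shared pool relative to that number (the paragraph above)."""
    return max(num_active - pool_size, 0) if use_difference else num_active

def distribute_budget(budget, idle_credits):
    """Split the budget over idle ports in proportion to how many cache
    units each idle port holds (weight positively correlated with the
    port's cache-unit count). Largest-remainder rounding is an assumed
    policy; a port never gives up more units than it holds."""
    total = sum(idle_credits.values())
    if total == 0:
        return {port: 0 for port in idle_credits}
    shares = {p: budget * c / total for p, c in idle_credits.items()}
    alloc = {p: min(int(s), idle_credits[p]) for p, s in shares.items()}
    remaining = budget - sum(alloc.values())
    # hand out any leftover units by largest fractional remainder
    for p in sorted(shares, key=lambda q: shares[q] - int(shares[q]),
                    reverse=True):
        if remaining <= 0:
            break
        if alloc[p] < idle_credits[p]:
            alloc[p] += 1
            remaining -= 1
    return alloc
```

For instance, a budget of 3 split over idle ports holding 4 and 2 cache units yields per-port budget values of 2 and 1, which would then be carried in the reclamation requests to the respective upper-level routers.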

For example, in the method provided by an embodiment of the present disclosure, determining the active ports among the plurality of receiving ports includes: determining a receiving port that is not transmitting data at the current moment but has corresponding data in the corresponding upper-level router as a candidate receiving port; determining each candidate receiving port whose number of corresponding cache units is smaller than the maximum preset value as an active port; and determining each receiving port that is transmitting data at the current moment as an active port.

For example, the method provided by an embodiment of the present disclosure further includes: allocating the cache units in the global shared pool to the active ports according to the priorities of the active ports.

For example, in the method provided by an embodiment of the present disclosure, allocating the cache units in the global shared pool to the active ports according to the priorities of the active ports includes: receiving a cache congestion state signal sent by the upper-level router connected to each receiving port of the plurality of receiving ports, where the cache congestion state signal indicates a congestion state of the cache in the corresponding upper-level router; determining the priorities of the active ports according to the cache congestion state signals; and allocating the cache units in the global shared pool to the active ports in descending order of priority.
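A minimal sketch of the congestion-driven, descending-priority allocation just described; the numeric congestion levels and the one-unit-per-port grant follow the surrounding paragraphs, while the function and variable names are assumed.

```python
def allocate_by_priority(pool_size, congestion):
    """congestion maps each active port to the congestion level reported
    by its upper-level router (higher value = more congested = higher
    priority). Grant one cache unit per port, in descending order of
    priority, until the global shared pool runs dry."""
    grants = {}
    ranked = sorted(congestion, key=lambda p: congestion[p], reverse=True)
    for port in ranked:
        if pool_size == 0:
            break
        grants[port] = 1        # one cache unit per allocated active port
        pool_size -= 1
    return grants, pool_size
```

With only two units in the pool, the two most congested ports are served and the least congested one waits for the next round.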

For example, in the method provided by an embodiment of the present disclosure, the cache congestion state signal indicates one of a plurality of congestion states, the congestion states representing different degrees of congestion and being determined based on the number of cache units occupied, among the data stored in the upper-level router, by the data corresponding to the connected receiving port.

For example, in the method provided by an embodiment of the present disclosure, the plurality of receiving ports include at least a first receiving port and a second receiving port, and either the number of cache units occupied by the data corresponding to the first receiving port, among the data stored in the upper-level router connected to the first receiving port, is greater than the number of cache units occupied by the data corresponding to the second receiving port, among the data stored in the upper-level router connected to the second receiving port; or the percentage of cache units occupied by the data corresponding to the first receiving port, relative to all cache units occupied by the data stored in its connected upper-level router, is greater than the corresponding percentage for the second receiving port. The plurality of congestion states include at least a first state and a second state, the cache congestion state signal corresponding to the first receiving port indicates the first state, the cache congestion state signal corresponding to the second receiving port indicates the second state, and the first state represents a higher degree of congestion than the second state.

For example, in the method provided in an embodiment of the present disclosure, the priorities include at least a first priority and a second priority, the first receiving port is the first priority, the second receiving port is the second priority, and the first priority is higher than the second priority.

For example, in the method provided by an embodiment of the present disclosure, the number of cache units allocated to each allocated active port is 1.

For example, the method provided by an embodiment of the present disclosure further includes: adding the cache units released by the active ports back into the global shared pool.

For example, the method provided by an embodiment of the present disclosure further includes: in response to the number of cache units in the global shared pool being greater than or equal to the number of active ports, allocating at least one cache unit in the global shared pool to each active port.

For example, the method provided by an embodiment of the present disclosure further includes: in an initialization stage, allocating the plurality of cache units to the plurality of receiving ports to complete initialization.
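Putting the last two paragraphs together, one allocation round either tops up every active port (pool sufficient) or folds reclaimed idle-port units into the pool first; a hedged sketch, with the integer pool model and all names assumed:

```python
def allocation_cycle(pool, active, idle_reclaimable):
    """If the global shared pool can cover every active port, grant each
    at least one cache unit; otherwise add the token-reclaimed idle-port
    units to the pool first and grant as many of the active ports as the
    replenished pool allows."""
    if pool >= len(active):
        grants = {p: 1 for p in active}
        return grants, pool - len(active)
    pool += sum(idle_reclaimable.values())  # units reclaimed from idle ports
    grants = {p: 1 for p in active[:pool]}
    return grants, pool - len(grants)
```

A pool of one unit facing three active ports is replenished by two reclaimed units, after which all three ports receive a cache unit.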

At least one embodiment of the present disclosure also provides a network on chip including a plurality of routers. The plurality of routers include a receiving router and at least one sending router, the receiving router being configured to receive data transmitted from the at least one sending router. The receiving router is connected with each sending router through a data and command transmission bus and a bidirectional control channel; the data and command transmission bus is configured to transmit the data from the sending router to the receiving router, and the bidirectional control channel is configured to bidirectionally transmit control signals between the receiving router and the sending router. The receiving router includes a cache allocation controller, a plurality of receiving ports, and a plurality of cache units, each receiving port being connected to one sending router. The cache allocation controller is configured to: determine active ports among the plurality of receiving ports; and, in response to the number of cache units in a global shared pool being smaller than the number of active ports while at least one idle port exists among the plurality of receiving ports, reclaim the cache units of the at least one idle port and add the reclaimed cache units to the global shared pool. The allocatable cache units among the plurality of cache units form the global shared pool, the cache units in the global shared pool are to be allocated to the active ports, and the idle ports are the receiving ports among the plurality of receiving ports other than the active ports.

For example, in the network on chip provided by an embodiment of the present disclosure, an active port is a receiving port that is transmitting data at the current moment, and/or a receiving port that is not transmitting data at the current moment but has corresponding data in the correspondingly connected sending router and whose number of corresponding cache units is smaller than a maximum preset value.

For example, in the network on chip provided in an embodiment of the present disclosure, the bidirectional control channel is a sideband bypass control channel, or the data and command transmission bus is multiplexed into the bidirectional control channel.

For example, in the network on chip provided by an embodiment of the present disclosure, in a case where the bidirectional control channel is the sideband bypass control channel, the sideband bypass control channel includes a first channel, a second channel, and a third channel. The first channel is configured to transmit a cache congestion state signal generated by the sending router to the receiving router, the cache congestion state signal indicating a congestion state of the cache in the corresponding sending router; the second channel is configured to send a reclamation request generated by the cache allocation controller to the sending router, the reclamation request carrying the budget value of the corresponding idle port, the budget value being obtained based on a cache reclamation budget; and the third channel is configured to transmit a token reclamation signal generated by the sending router to the receiving router, so that the cache units of the idle port are reclaimed according to the budget value and the reclaimed cache units are added to the global shared pool.
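The three-channel handshake described above (congestion status in, reclamation request out, token reclamation back in) can be modeled as a small message exchange; the message formats and the class name are assumptions for illustration.

```python
from collections import deque

class SidebandLink:
    """Models the first (congestion status), second (reclamation request),
    and third (token reclamation) channels between one sending router and
    the receiving router's cache allocation controller."""
    def __init__(self):
        self.ch1 = deque()  # sender -> receiver: cache congestion state
        self.ch2 = deque()  # receiver -> sender: reclamation request
        self.ch3 = deque()  # sender -> receiver: token reclamation signal

    def sender_report_congestion(self, state):
        self.ch1.append(("CONGESTION", state))

    def receiver_request_reclaim(self, budget_value):
        self.ch2.append(("RECLAIM_REQ", budget_value))

    def sender_ack_tokens(self):
        """The sending router answers a pending reclamation request by
        returning that many tokens on the third channel."""
        _, budget = self.ch2.popleft()
        self.ch3.append(("TOKENS", budget))

    def receiver_collect(self):
        """The receiving router collects the reclaimed cache units, which
        would then be added to the global shared pool."""
        _, tokens = self.ch3.popleft()
        return tokens
```

A request carrying a budget value of 2 on the second channel comes back as 2 reclaimed tokens on the third channel.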

For example, in the network on chip provided by an embodiment of the present disclosure, the receiving router further includes a cache allocation arbiter configured to allocate the cache units in the global shared pool to the active ports according to the priorities of the active ports.

At least one embodiment of the present disclosure further provides a network on chip configured to implement the cache allocation method for a router according to any embodiment of the present disclosure.

At least one embodiment of the present disclosure further provides an electronic device including the network on chip according to any one of the embodiments of the present disclosure.

Drawings

To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.

FIG. 1 is a schematic diagram of a topology of routers in a network on chip;

FIG. 2 is a schematic diagram of a dynamic router cache allocation scheme;

FIG. 3 is a schematic diagram of data interaction between routers in a network on chip;

FIG. 4 is a schematic structural diagram of a network on chip provided by some embodiments of the present disclosure;

FIG. 5 is a schematic flowchart of a cache allocation method for a router provided by some embodiments of the present disclosure;

FIG. 6 is a schematic flowchart of step S10 of the method shown in FIG. 5;

FIG. 7 is a schematic flowchart of step S20 of the method shown in FIG. 5;

FIG. 8 is a schematic flowchart of another cache allocation method for a router provided by some embodiments of the present disclosure;

FIG. 9 is a schematic flowchart of step S30 of the method shown in FIG. 8;

FIG. 10 is a schematic flowchart of another cache allocation method for a router provided by some embodiments of the present disclosure;

FIG. 11 is a schematic flowchart of step S40 of the method shown in FIG. 10;

FIG. 12 is a schematic flowchart of another cache allocation method for a router provided by some embodiments of the present disclosure;

FIG. 13 is a schematic structural diagram of a network on chip provided by some embodiments of the present disclosure;

FIG. 14 is a schematic block diagram of another network on chip provided by some embodiments of the present disclosure; and

FIG. 15 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.

Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.

A Network on Chip (NoC) can be defined narrowly or broadly. In the narrow sense, a NoC is a network architecture that implements communication between the modules of a chip, composed of routing nodes, communication links, and Network Interfaces (NIs). In the broad sense, a NoC is an entire multiprocessor system on a single chip based on network communication, comprising Processing Elements (PEs) and a communication network. A processing element implements a generalized computing function and may be a microprocessor core, a storage unit, or another functional component; the communication network, i.e., the NoC in the narrow sense, is responsible for interconnecting the processing elements.

In a NoC routing node (also referred to as a router), a buffer may be used to temporarily store data of an input port that cannot be forwarded immediately because data from multiple input ports compete for the same output port. However, buffering also causes problems. For example, a cache generates significant power consumption, including static power consumption when idle and dynamic power consumption during reads and writes; cache memory occupies a large amount of chip area; the corresponding control logic greatly increases the complexity of system design; and the system operating frequency can be reduced by about 50%. Therefore, the cache quota must be strictly limited in routing-node design, and the limited cache resources must be allocated reasonably, because different cache allocation strategies yield different cache utilization efficiencies and hence different network delay and throughput performance for the system.

Since NoCs are mostly designed for specific applications, network traffic does not ideally follow a uniform pattern, and the loads on the routing nodes differ across directions. Therefore, to control the cache overhead of NoC routing nodes while guaranteeing good network delay and throughput for the system under strict overhead constraints, the cache allocation techniques of NoC routing nodes merit study.

Cache allocation can be divided into static allocation and dynamic allocation according to the stage at which it is performed. Static allocation determines the cache configuration during the initialization stage of NoC system operation, whereas dynamic allocation configures cache resources in real time while the NoC system runs. The two allocation modes approach cache resources from different angles and suit different application scenarios.

In current NoC systems, the connection and data transmission between modules are mainly realized by routers in various topologies, typically forming a structure as shown in FIG. 1. Referring to FIG. 1, master 1A and master 1B are connected to router 1, and master 2A is connected to router 2. In an actual NoC system, these master devices may be central processing unit (CPU) cores, direct memory access (DMA) modules, dedicated algorithm modules, and the like; the master devices are responsible for initiating read and write requests. Slave 1A and slave 1B are connected to router 1, and slave 2A and slave 2B are connected to router 2. These slave devices may be intellectual property (IP) cores, such as IO interface IPs and display unit IPs, or storage devices such as DDR. There is also a separate port connection between router 1 and router 2, so that master 1A and master 1B can access slave 2A and slave 2B.

The router (router 1 or router 2) determines the output port of a read-write request by calculation or by a lookup table, according to the address information in the request sent by the master device or the ID information of the slave device, to ensure that the request reaches the addressed slave device. After receiving the read-write request and completing the corresponding operation, the slave device returns the response data and state information to the initiating master device through the router. The response may return along the original route, or another route may be selected according to the actual application requirements of the system.

Regarding dynamic cache allocation, current schemes mainly rely on two techniques. The first divides the cache into three types: a port Virtual Channel (VC) dedicated cache, a port shared cache (PSB), and a global shared cache (GSB). The second realizes dynamic bandwidth allocation by assigning different cache quantities to different ports, through proportional port-bandwidth allocation and priority arbitration.

Fig. 2 is a schematic diagram of a dynamic router cache allocation scheme. As shown in fig. 2, all cache resources in the router 5 are managed in a unified and centralized manner, and are allocated to all input ports through a plurality of register sets and control logic, so as to implement dynamic cache allocation. The Virtual Channel (VC) dedicated cache serves as a path minimum bandwidth guarantee to avoid deadlock (deadlock) caused by the exhaustion of input port resources by some kind of transmission data or transmission request. The port shared cache (PSB) can further improve the transmission bandwidth of each VC channel in each port, and improve the flexibility of bandwidth allocation among VC channels. The global shared cache (GSB) can make the bandwidth allocation between the ports more flexible, and can realize the dynamic adjustment of the bandwidth allocation between the ports.

For example, in this example, router 5 includes three ports: port 5A, port 5B, and port 5C. Each port has a plurality of VC channels, each VC channel having a corresponding VC dedicated cache. Each port has a port shared cache (PSB) that may be dynamically allocated to its VC channels. The router 5 also has a global shared cache (GSB), which can be dynamically allocated to each port to serve as additional port shared cache for that port. For example, the router 5 further includes a cache arbitration distributor BAA5 and a route arbitration switch RAS5. The cache arbitration distributor BAA5 arbitrates among the ports and distributes caches based on the arbitration result. The route arbitration switch RAS5 arbitrates switching between the ports and determines the data transmission port based on the arbitration result.

Regarding the transmission of cache allocation information, a counter mechanism is mainly used to keep it up to date. For example, for a single transmission path composed of a sending router and a receiving router, after the buffer allocation value of each port of the receiving router is set with a configuration register, the allocation value is transferred during the initialization stage to the output port of the sending router connected to that port via a sideband bypass signal. An up/down counter is configured in the sending router for each output port so as to track in real time the number of buffers available at the corresponding port of the receiving router. This token-based cache mechanism supports flow control in the sending router and avoids loss and overflow of transmitted data.

Fig. 3 is a schematic diagram of data interaction of a router in a network on chip. It should be noted that, in an actual NoC system, bidirectional transmission channels are provided between routers, and each of the bidirectional transmission channels plays roles of both a sender and a receiver, and in order to avoid confusion, fig. 3 only shows a single transmission channel, where a router 3 serves as a sender (i.e., a sending router) and a router 4 serves as a receiver (i.e., a receiving router). The interaction between routers is described below with reference to fig. 3, taking a single transmission channel as an example.

For example, as shown in fig. 3, the router 3 serves as a transmitting router, the router 4 serves as a receiving router, and the port 3A of the router 3 and the port 4A of the router 4 are connected to each other to transmit data. Ports 3B and 3C of router 3 and port 4B of router 4 are also connected to other routers and are not shown in detail in fig. 3.

Before data transmission, at initialization, the router 4 assigns the type and number of buffers allocated to the port 4A as the initial value of the release counter 4A. The release counter 4A then transmits its value to the reception counter 3A in the router 3 by a sideband bypass signal, indicated by a black solid line in fig. 3. For example, the value may be transmitted by a pulse signal or any other type of signal: each time a value of 1 is transmitted, the release counter 4A decrements by 1 and the reception counter 3A increments by 1. As long as the value in the release counter 4A is not 0, the release counter 4A continues to transmit values to the reception counter 3A. When the release counter 4A has decremented to 0, its entire initial value has been transferred, and the value of the reception counter 3A equals the initial value of the release counter 4A.

For example, in some examples, the initial value in release counter 4A is 20, i.e., router 4 allocates 20 port-shared buffers for port 4A. The release counter 4A then passes its value to the receive counter 3A via the sideband bypass signal. Every time the value 1 is passed, the release counter 4A is decremented by 1 and the receive counter 3A is incremented by 1 until the value of the release counter 4A becomes 0, at which time the value of the receive counter 3A reaches 20. This successfully records the number of buffers available to the port 4A of the router 4 in the reception counter 3A.

After the initialization is completed, the router 3 and the router 4 enter a normal operation phase. In the normal operation phase, every time router 3 sends a transmission request (or transmits a data packet) to router 4, reception counter 3A will be automatically decremented by 1. After forwarding the transmission request, the router 4 releases a corresponding buffer, and the released buffer is again transmitted to the receiving counter 3A in the router 3 via the sideband bypass signal, so that the receiving counter 3A is again incremented by 1.

When the router 4 fails to forward the transmitted data or request and cannot release the buffer, and the router 3 still continuously transmits the data or request to the router 4, the value of the receiving counter 3A is continuously decremented. When the value of the receive counter 3A is decremented to zero, the router 3 will automatically suspend the transmission of data or requests to the router 4 to avoid the loss of data or requests due to overflow.
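The credit exchange described above can be sketched in a few lines of Python. This is a minimal model of the counter pair from fig. 3, not the actual hardware logic; the class and method names are illustrative.

```python
class CreditChannel:
    """Model of the token (credit) exchange between one sending port and one
    receiving port, as in fig. 3. Names here are illustrative assumptions."""

    def __init__(self, initial_credits):
        self.release_counter = initial_credits   # in the receiving router (e.g. counter 4A)
        self.receive_counter = 0                 # in the sending router (e.g. counter 3A)

    def initialize(self):
        # Initialization: the sideband signal transfers the budget one token
        # at a time until the release counter reaches 0.
        while self.release_counter > 0:
            self.release_counter -= 1
            self.receive_counter += 1

    def try_send(self):
        # Normal operation: the sender consumes one credit per packet and
        # suspends transmission when no credits remain.
        if self.receive_counter == 0:
            return False                         # avoid loss due to overflow
        self.receive_counter -= 1
        return True

    def on_buffer_released(self):
        # The receiver forwarded a packet and freed a buffer: return one credit.
        self.receive_counter += 1
```

For the example in the text, `CreditChannel(20)` followed by `initialize()` leaves 20 credits in the sender's reception counter, after which each send costs one credit and each released buffer returns one.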

The above manner is a token-based cache pre-allocation: a receiving router (e.g. router 4) always reserves a certain cache allocation budget in a sending router (e.g. router 3), which ensures continuity of transmission. However, when an output port of the sending router (for example, the port 3A of the router 3) has no transmission task, the shared cache reserved for the corresponding port in the receiving router (for example, the port shared cache reserved for the port 4A in the router 4) cannot be released for use by the other input ports of the receiving router. This results in a low utilization rate of the shared cache and prevents fine-grained, real-time adjustment of bandwidth allocation, so imbalances of transmission load across the paths of the NoC, in both time and space, are difficult to resolve. As a result, some paths suffer transmission congestion during certain periods, with a severe shortage of transmission bandwidth and increased transmission delay on those paths.

At least one embodiment of the disclosure provides a cache allocation method for a router, a network on chip and an electronic device. The cache allocation method can dynamically adjust the allocation of the cache on a finer granularity, improve the utilization rate of the shared cache, improve the transmission bandwidth of the active port, reduce the transmission delay and improve the transmission performance of the network on chip. At least some embodiments of the cache allocation method adjust the priority of each port in the current receiving router based on the cache congestion state in the upper-level sending router, and may implement dynamic adjustment of the port priority, thereby performing cache allocation based on the port priority, and further improving transmission performance.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference numerals in different figures will be used to refer to the same elements that have been described.

At least one embodiment of the present disclosure provides a cache allocation method for a router. The router includes a plurality of receiving ports and a plurality of cache units. The cache allocation method comprises the following steps: determining a valid active port of a plurality of receive ports; and in response to that the number of the cache units in the global shared pool is smaller than the number of the effective active ports and at least one idle port exists in the plurality of receiving ports, recovering the cache unit of the at least one idle port and adding the recovered cache unit into the global shared pool. The buffer units which can be allocated in the plurality of buffer units form a global shared pool, the buffer units in the global shared pool are used for being allocated to the effective active ports, and the idle ports are the receiving ports except the effective active ports in the plurality of receiving ports.

Fig. 4 is a schematic structural diagram of a network on chip according to some embodiments of the present disclosure, where the cache allocation method according to the embodiments of the present disclosure is used, for example, in a router in the network on chip. For example, as shown in fig. 4, the network on chip 100 includes a plurality of routers including a receiving router (e.g., router 7) and at least one transmitting router (e.g., router 6). The receiving router is configured to receive data transmitted from at least one transmitting router.

The receiving router is connected with each sending router through a data and command transmission bus and a bidirectional control channel. The data and command transfer bus is configured to transfer data from the sending router to the receiving router. The bidirectional control channel is configured to transmit control signals bidirectionally between the receiving router and the transmitting router.

For example, the receiving router includes a cache allocation arbiter 110, a cache allocation controller 120, a plurality of receiving ports (e.g., ports 7A, 7B, and 7C), and a plurality of cache units. Each receiving port is connected to a transmitting router. For example, in the example shown in fig. 4, port 7A of the receiving router is connected to port 6A of the sending router, and both are connected through a data and command transfer bus and a bidirectional control channel. A cache unit is, for example, a Buffer in a receiving router, and tens of cache units or even hundreds of cache units are usually arranged in the receiving router. For example, each cache unit has about 500-600 bits of storage space.

The bidirectional control channels include, for example, a first channel ch1, a second channel ch2, and a third channel ch3. The first channel ch1 is configured to transmit a buffer congestion status signal generated by the sending router to the receiving router; the buffer congestion status signal indicates the congestion status of the buffer in the corresponding sending router. The second channel ch2 is configured to send a reclaim request generated by the cache allocation controller 120 to the sending router. The third channel ch3 is configured to transmit the token recovery signal generated by the sending router to the receiving router. For example, the bidirectional control channel may be a sideband bypass control channel. Of course, the embodiments of the present disclosure are not limited to this: the data and command transmission bus may be multiplexed as a bidirectional control channel, or any other suitable manner may be adopted, as long as control signals can be transmitted bidirectionally between the sending router and the receiving router. For example, a sideband bypass channel for transmitting a token release signal is further provided between the receiving router and the sending router, to pass the value of the release counter 7A to the reception counter 6A in normal operation.

It should be noted that the ports 6B and 6C of the sending router and the ports 7B and 7C of the receiving router are also connected to other routers, and for any one of the connection paths including the sender and the receiver, the connection manner is similar to that of the ports 6A and 7A of the sending router and is not shown in detail in fig. 4.

It should be noted that, in the embodiments of the present disclosure, the sending router and the receiving router are defined with respect to the roles of the respective routers in a data transmission link. When a router sends data it acts as a sending router, and when it receives data it acts as a receiving router. That is, from the viewpoint of the transmission direction, a given router acts as a receiving router with respect to its upper-level router and as a sending router with respect to its lower-level router. The roles of sending router and receiving router are interchangeable. For simplicity, the sending router in fig. 4 only shows its output ports (i.e., the sending ports), the receiving router only shows its input ports (i.e., the receiving ports), the signals of port B and port C in the two routers are simplified, and only port A is described in detail. In practical applications, a router has both input ports and output ports.

It should be noted that, compared with a typical network on chip, the network on chip 100 provided by the embodiments of the present disclosure adds a buffer congestion status signal, a reclaim request, a token recovery signal, and the buffer allocation controller 120. The buffer allocation controller 120 is responsible for assigning the amount of buffer to be reclaimed to the relevant free ports and issuing reclaim requests to their upper-level routers. The buffer congestion state signal, the reclaim request, and the token recovery signal are mainly used for dynamic allocation and reclamation management of tokens between the sending router and the receiving router, while at initialization and in normal operation the token release signal path is still used.

Fig. 5 is a flowchart illustrating a cache allocation method for a router, which is used in the network on chip 100 shown in fig. 4, according to some embodiments of the present disclosure. For example, in some examples, as shown in fig. 5, the cache allocation method includes the following operations.

Step S10: determining a valid active port of a plurality of receive ports;

step S20: and in response to that the number of the cache units in the global shared pool is smaller than the number of the effective active ports and at least one idle port exists in the plurality of receiving ports, recovering the cache unit of the at least one idle port and adding the recovered cache unit into the global shared pool.

For example, the cache units that can be allocated in the plurality of cache units form a global shared pool, and the cache units in the global shared pool are used for being allocated to the effective active ports. The idle port is a receiving port of the plurality of receiving ports except for the active port.

The above steps are exemplarily described below with reference to the network on chip 100 shown in fig. 4.

For example, in step S10, a valid active port is determined among the plurality of receiving ports of the receiving router. For example, port 7A, port 7B and port 7C of the receiving router are all for receiving data and therefore are all receiving ports, and a valid active port is determined among port 7A, port 7B and port 7C. Of course, the receiving router may also include a plurality of transmitting ports, which are not shown in fig. 4.

For example, the active port may be the receiving port that is transmitting data at the current time. For example, as shown in fig. 4, if there is a port that is transmitting data among the ports 7A, 7B, and 7C, the port that is transmitting data may be directly determined as a valid active port. Since the active port is transmitting data, the buffer unit allocated to the port in the receiving router must be occupied to some extent (the transmitted data needs to be stored in the buffer unit of the port), and therefore the number of the buffer units corresponding to the port must be smaller than the maximum preset value.

For example, the active port may also be a receiving port which does not transmit data at the current time, has corresponding data in the corresponding upper-level router, and has a number of corresponding cache units smaller than the maximum preset value. For example, as shown in fig. 4, if some ports that do not transmit data at the current time exist in the ports 7A, 7B, and 7C, it is assumed that the port 7A does not transmit data at the current time, but corresponding data exists in the upper level router (i.e., the sending router configured to send data to the current router) corresponding to the port 7A, that is, there is data that needs to be sent to the port 7A in the sending router, and the number of cache units corresponding to the port 7A in the receiving router is smaller than the maximum preset value, the port 7A is also determined to be a valid active port.

For example, the preset maximum value is the maximum value of the allocable buffer unit set for each receiving port in the initialization stage, which may be set by the configuration register. The preset maximum values of the ports may be the same or different, and the specific values of the preset maximum values may also be any applicable values, which may be determined according to actual design requirements, and the embodiment of the present disclosure is not limited thereto.

Fig. 6 is a schematic flow chart of step S10 in the method shown in fig. 5. For example, in some examples, step S10 may further include the following operations.

Step S11: determining a receiving port which does not transmit data at the current moment and has corresponding data in a corresponding upper-level router as a standby receiving port;

step S12: determining the alternative receiving ports of which the number of the corresponding cache units is smaller than the maximum preset value as effective active ports;

step S13: and determining the receiving port transmitting data at the current moment as a valid active port.

For example, in step S11, a receiving port that does not transmit data at the current time and has corresponding data in the corresponding upper level router is determined as an alternative receiving port. The meaning of "corresponding data exists in the corresponding upper level router" has been described above, and is not described herein again. The number of buffer units of the alternative receiving port may be smaller than the maximum preset value, and may also be equal to the maximum preset value. Further determination of the alternate receiving port is therefore required.

For example, in step S12, the candidate receiving ports whose number of corresponding buffer units is smaller than the maximum preset value are determined as valid active ports. For example, if the number of the buffer units of some candidate receiving ports is equal to the maximum preset value, it indicates that the number of the buffer units of the candidate receiving ports has reached the upper limit, and the buffer units cannot be allocated to the candidate receiving ports any more, so that the candidate receiving ports need to be removed, and the candidate receiving ports do not participate in the subsequent buffer allocation operation.

For example, in step S13, the receiving port that is currently transmitting data is determined as the valid active port. Since the active port is transmitting data, the buffer unit allocated to the port in the receiving router must be occupied to a certain extent, and therefore the number of the buffer units corresponding to the port must be smaller than the maximum preset value.

It should be noted that, although steps S11, S12, and S13 are shown in fig. 6 in a certain order, this does not mean that these steps need to be executed in the order shown in fig. 6. For example, steps S11 and S12 may be sequentially performed, and step S13 may be performed in parallel with steps S11 and S12, that is, the receiving port to which data is being transferred and the receiving port to which data is not being transferred may be simultaneously determined, whereby the processing efficiency may be increased.
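One reading of steps S11 to S13 can be sketched as a simple classification routine. This is an illustrative model, not the patented implementation; the port field names are assumptions, and candidate ports already at their buffer cap are simply excluded from this round of allocation, per step S12.

```python
def classify_ports(ports):
    """Split receiving ports into valid active ports and idle ports.
    Each port is a dict with illustrative fields: 'name', 'transmitting',
    'upstream_has_data', 'buffers' (allocated units), 'max_buffers'."""
    active, idle = [], []
    for p in ports:
        if p['transmitting']:
            # Step S13: a port currently transmitting data is always active.
            active.append(p)
        elif p['upstream_has_data']:
            # Step S11: candidate port; step S12: keep it only if it still
            # has headroom below its preset maximum.
            if p['buffers'] < p['max_buffers']:
                active.append(p)
            # else: at its cap, removed from this round of allocation
        else:
            # No data now and none pending upstream: eligible for reclamation.
            idle.append(p)
    return active, idle
```

Steps S11/S12 and S13 are independent per port, so as the text notes, a real implementation can evaluate them in parallel rather than in the loop order shown here.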

For example, as shown in fig. 5, in step S20, if the number of cache units in the global shared pool is less than the number of active ports and there is at least one free port in the plurality of receiving ports, the cache unit of the at least one free port is recycled, and the recycled cache unit is added to the global shared pool. Here, the cache units that can be allocated among the plurality of cache units of the receiving router form a global shared pool, that is, the cache units that are not occupied among the plurality of cache units of the receiving router form the global shared pool. The buffer units in the global shared pool are used to be allocated to the active ports (for example, according to the priority allocation, the specific allocation manner will be described below, and will not be described herein again). For example, the idle port is a receiving port other than the active port among the plurality of receiving ports, that is, the idle port does not transmit data at the current time, and there is no data that needs to be sent to the idle port in the upper level router (sending router) corresponding to the idle port.

By recycling the buffer units of the idle ports, the idle ports can be prevented from occupying buffer resources. Here, the recycled cache unit is a port-shared cache (PSB) of the free port. The recycled cache unit can be used as a global shared cache (GSB) to be allocated to an effective active port, so that the allocation of the cache can be dynamically adjusted on a finer granularity, the utilization rate of the shared cache can be improved, the transmission bandwidth of the active port is improved, the transmission delay is reduced, and the transmission performance of the network on chip is improved.

Fig. 7 is a schematic flow chart of step S20 in the method shown in fig. 5. For example, in some examples, step S20 may further include the following operations.

Step S21: determining a cache reclamation budget based on the number of active ports;

step S22: according to the recovery weight value of at least one idle port, distributing the cache recovery budget to the at least one idle port to obtain a budget value of each idle port in the at least one idle port;

step S23: based on the budget value, sending a recovery request to an upper-level router connected with each idle port in at least one idle port, wherein the recovery request carries the budget value of the corresponding idle port;

step S24: and in response to receiving a token recovery signal sent by an upper-level router connected with each idle port, recovering the cache units of each idle port according to the budget value, and adding the recovered cache units into the global sharing pool.

For example, in step S21, a cache reclamation budget is determined based on the number of valid active ports.

For example, in some examples, the cache reclamation budget is equal to the number of valid active ports. For example, assuming that there are 8 valid active ports, the cache reclamation budget may be determined to be 8, i.e., 8 cache units need to be reclaimed from the cache units of the idle ports. In this case, no matter how many cache units already exist in the global shared pool, each valid active port can be allocated at least one cache unit after reclamation. This method improves the reclamation rate and the allocation margin of buffer units.

For example, in other examples, the cache reclamation budget is equal to the difference between the number of valid active ports and the number of cache units in the global shared pool. For example, still assuming that there are 8 valid active ports, and that the number of cache units in the global shared pool is 3, the cache reclamation budget may be determined to be 5, that is, 5 cache units need to be reclaimed from the cache units of the idle ports. At this time, the 3 buffer units in the global shared pool and the 5 reclaimed buffer units may be allocated to the 8 active ports, so that each active port is allocated one buffer unit. This method reduces the potential impact on the idle ports and avoids a shortage of cache resources when, after reclamation, an idle port bursts into large-flow data transmission.

It should be noted that, in the embodiment of the present disclosure, the manner of determining the cache recycling budget is not limited to the two manners described above, and may also be determined in any other suitable manner, for example, according to a certain proportion of the number of active ports, which may be determined according to actual needs, and the embodiment of the present disclosure is not limited thereto.
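The two example policies above can be written down directly. This is a sketch only; the policy names and function signature are illustrative assumptions, not terms from the source.

```python
def reclamation_budget(num_active, pool_size, policy="difference"):
    """Size the cache reclamation budget from the number of valid active
    ports and the current global shared pool size (step S21)."""
    if policy == "count":
        # First example: budget equals the number of valid active ports,
        # regardless of how many units the global shared pool already holds.
        return num_active
    # Second example: reclaim only the shortfall between the active-port
    # count and the units already available in the pool.
    return max(0, num_active - pool_size)
```

With 8 active ports and 3 pooled units, the "count" policy yields 8 and the "difference" policy yields 5, matching the two worked examples in the text.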

For example, in step S22, the cache reclamation budget is allocated to the at least one idle port according to the recovery weight value of the at least one idle port, so as to obtain a budget value for each of the at least one idle port. As shown in fig. 4, assuming that the port 7A and the port 7B in the receiving router are idle ports, and each has a recovery weight value, the cache reclamation budget is allocated to the port 7A and the port 7B according to their recovery weight values, so as to obtain the budget values of the port 7A and the port 7B. For example, if the cache reclamation budget is 8, the recovery weight value of the port 7A is 0.5, and the recovery weight value of the port 7B is also 0.5, the budget value allocated to the port 7A is 4 and the budget value allocated to the port 7B is also 4. That is, 4 cache units need to be reclaimed from the port 7A and 4 cache units from the port 7B. For another example, if the cache reclamation budget is 8, the recovery weight value of the port 7A is 0.25, and the recovery weight value of the port 7B is 0.75, the budget value allocated to the port 7A is 2 and the budget value allocated to the port 7B is 6. That is, 2 cache units need to be reclaimed from the port 7A and 6 cache units from the port 7B.

For example, the reclamation weight value is determined based on the number of cache cells of each free port, and the reclamation weight value is positively correlated with the number of cache cells of the corresponding free port. That is, if the number of the cache units of a certain idle port is large, the recovery weight value of the idle port is large, so that the budget value of the idle port is large, and more cache units can be recovered from the idle port; if the number of the buffer units of a certain idle port is less, the recovery weight value of the idle port is smaller, so that the budget value of the idle port is smaller, and fewer buffer units can be recovered from the idle port. By setting the recovery weight value, the buffer units can be recovered in a targeted manner, and the number of the buffer units of each idle port after recovery is basically balanced, so that the bandwidth of each idle port during burst data transmission after recovery is ensured.

It should be noted that, in the embodiment of the present disclosure, the manner of allocating the budget value is not limited to the manner described above, and any other suitable manner may also be adopted, which may be determined according to actual needs, and the embodiment of the present disclosure is not limited to this. For example, in other examples, the recovery weight value may not be set, and the cache recovery budgets may be equally allocated to the respective idle ports, so that the allocation manner of the cache recovery budgets may be simplified.
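Since the recovery weight is positively correlated with each idle port's cache-unit count, the weighted split of step S22 can be computed from the counts themselves. The sketch below is one possible integer-arithmetic realization, with a remainder-handling rule (largest holders first) that is an assumption, not specified in the source.

```python
def split_budget(budget, idle_buffer_counts):
    """Distribute the cache reclamation budget over idle ports in proportion
    to how many cache units each currently holds (its recovery weight)."""
    total = sum(idle_buffer_counts)
    if total == 0:
        return [0] * len(idle_buffer_counts)
    # Integer proportional shares; ports holding more units give up more.
    shares = [budget * n // total for n in idle_buffer_counts]
    # Hand any rounding remainder to the largest holders first (assumed rule).
    remainder = budget - sum(shares)
    order = sorted(range(len(shares)), key=lambda i: -idle_buffer_counts[i])
    for i in order[:remainder]:
        shares[i] += 1
    return shares
```

For a budget of 8, two idle ports holding equal counts split it 4/4, and ports holding 5 and 15 units split it 2/6, reproducing the 0.5/0.5 and 0.25/0.75 weight examples above.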

For example, in step S23, based on the budget value, a reclaim request is sent to the upper-level router to which each free port is connected. For example, as shown in fig. 4, in the receiving router, assuming that the cache allocation controller 120 determines that the budget value of the port 7A (a free port in this case) is 4, a reclaim request is sent to the upper-level router (the sending router) to which the port 7A is connected; the reclaim request is sent, for example, through the second channel ch2. The reclaim request carries the budget value of the corresponding free port. When the sending router receives the reclaim request, it obtains the budget value and subtracts it from the value of the corresponding reception counter. For example, assuming that the sending router receives a reclaim request from the port 7A of the receiving router indicating a budget value of 4, the reception counter 6A subtracts 4 from its original value. The sending router then sends a token recovery signal to the receiving router to indicate that the corresponding reclamation has taken place. For example, the token recovery signal may be transmitted through the third channel ch3.

For example, in step S24, after receiving the token recovery signal transmitted by the upper router to which each free port is connected, the buffer unit of each free port is recovered according to the budget value, and the recovered buffer unit is added to the global shared pool. For example, as shown in fig. 4, when the port 7A of the receiving router receives the token recycling signal sent by the sending router, it can be known that the value of the receiving counter 6A in the sending router has been subtracted by a value (for example, 4) equal to the budget value, and therefore, 4 cache units can be taken out from the cache unit of the port 7A and put into the global shared pool. Therefore, the recovery of the buffer unit of the idle port can be realized.
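The handshake of steps S23 and S24 for a single idle port can be modeled as follows. This is a behavioral sketch under the assumption that the ch2/ch3 signaling is reliable; the class and field names are illustrative.

```python
class IdlePortReclaim:
    """Model of the reclaim handshake (steps S23-S24) for one idle port."""

    def __init__(self, port_buffers, upstream_credits):
        self.port_buffers = port_buffers          # PSB units held by the idle port
        self.upstream_credits = upstream_credits  # reception counter in the upstream router

    def reclaim(self, budget, pool):
        # Step S23: the reclaim request carries the budget value; on receipt,
        # the upstream (sending) router subtracts it from its reception counter.
        self.upstream_credits -= budget
        # Step S24: after the token recovery signal arrives, the receiving
        # router moves that many cache units into the global shared pool.
        self.port_buffers -= budget
        pool += budget                            # recovered units become GSB
        return pool
```

For the example in the text (budget 4 for port 7A), the upstream reception counter 6A drops by 4 and the global shared pool grows by 4 units.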

In the above manner, the buffer units of the idle ports can be reclaimed, preventing the idle ports from occupying buffer resources. The reclaimed cache units are the port shared cache (PSB) of the idle ports, and they may be allocated to the valid active ports as global shared cache (GSB). Therefore, the allocation of the cache can be dynamically adjusted at a finer granularity, the utilization rate of the shared cache can be improved, the transmission bandwidth of the active ports is increased, the transmission delay is reduced, and the transmission performance of the network on chip is improved. Moreover, for the same performance, the improved utilization of the shared cache effectively reduces the number of cache units that must be arranged in the router, which can reduce the area and power consumption of the router, raise its working frequency, lower the production and usage costs of the chip, ease its physical implementation, and accelerate its timing closure.

Fig. 8 is a flowchart illustrating another cache allocation method for a router according to some embodiments of the present disclosure. Except for additionally including step S30, the cache allocation method provided in this embodiment is substantially the same as the cache allocation method shown in fig. 5; for related descriptions, reference may be made to the foregoing, which is not repeated here.

For example, as shown in fig. 8, the cache allocation method further includes the following operations.

Step S30: allocating the cache units in the global shared pool to the effective active ports according to the priorities of the effective active ports.

Fig. 9 is a schematic flow chart of step S30 in the method shown in fig. 8. For example, in some examples, step S30 may further include the following operations.

Step S31: receiving a cache congestion state signal sent by the upper-level router connected to each receiving port of the plurality of receiving ports, wherein the cache congestion state signal indicates the congestion state of the cache in the corresponding upper-level router;

Step S32: determining the priorities of the effective active ports according to the cache congestion state signals;

Step S33: allocating the cache units in the global shared pool to the effective active ports in descending order of priority.

For example, in step S31, a cache congestion state signal sent by the upper-level router connected to each of the plurality of receiving ports is received; the cache congestion state signal indicates the congestion state of the cache in the corresponding upper-level router. For example, as shown in fig. 4, taking port 7A of the receiving router as an example, the receiving router receives a cache congestion state signal sent by the upper-level router (the sending router) connected to port 7A, the signal indicating the congestion state of the cache corresponding to port 7A in the sending router. The cache congestion state signal is transmitted, for example, through the first channel ch1.

For example, the cache congestion state signal can indicate a plurality of congestion states, representing different degrees of congestion. For example, in some examples, the cache congestion state signal may indicate three congestion levels: low, medium, and high. The cache congestion state signal may be of any signal type, as long as it can indicate the corresponding congestion state. For example, in some examples, the cache congestion state signal may use a three-bit binary number, with different congestion states represented by different encodings: "001" indicates a low congestion state, "010" indicates a medium congestion state, and "100" indicates a high congestion state. It should be noted that the number of congestion states indicated by the cache congestion state signal and its specific signal type may be determined according to actual needs, and the embodiments of the present disclosure are not limited in this regard.
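The three-level one-hot encoding given above can be written out as a small lookup, purely as an illustration; the bit patterns follow the "001"/"010"/"100" example in the text, and the function name is an assumption.

```python
# Illustrative one-hot encoding of the three congestion levels mentioned
# above; the bit patterns follow the example in the text ("001", "010", "100").
CONGESTION_ENCODING = {
    "low": 0b001,
    "medium": 0b010,
    "high": 0b100,
}

def decode_congestion(signal: int) -> str:
    # Reverse lookup; the signal is assumed to always be a valid one-hot code.
    for level, code in CONGESTION_ENCODING.items():
        if signal == code:
            return level
    raise ValueError(f"not a valid congestion code: {signal:03b}")
```

A one-hot code like this is a common hardware-friendly choice because each state toggles exactly one wire, but as the text notes, any encoding that distinguishes the states would serve.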

For example, the plurality of congestion states are determined based on the number of cache units occupied, among the data stored in the upper-level router, by the data corresponding to the receiving port. Here, the data may be packets, requests, commands, or the like, and the cache units occupied by the data are cache units in the upper-level router.

For example, in some examples, as shown in fig. 4, if the number of cache units occupied by the data corresponding to port 7A among the data stored by the sending router falls within a certain numerical range, the congestion state indicated by the cache congestion state signal sent by the sending router to the receiving router is determined according to that range. For example, assume [10,20] corresponds to a first state representing a high congestion level, and [0,9] corresponds to a second state representing a low congestion level. If the data corresponding to port 7A occupies 13 cache units among the data stored in the sending router, the cache congestion state signal sent by the sending router indicates the first state. Here, the 13 cache units occupied by the data corresponding to port 7A are cache units in the sending router.

For example, in other examples, as shown in fig. 4, if the percentage of cache units occupied by the data corresponding to port 7A, relative to all cache units occupied by data stored in the sending router, falls within a certain numerical range, the congestion state indicated by the cache congestion state signal sent by the sending router to the receiving router is determined according to that range. For example, assume [0.5,1] corresponds to a first state representing a high congestion level, and [0,0.5) corresponds to a second state representing a low congestion level. If the data corresponding to port 7A occupies 7 cache units and all data stored in the sending router occupies 10 cache units, the percentage is 0.7, and the cache congestion state signal sent by the sending router indicates the first state. Here, the 7 cache units occupied by the data corresponding to port 7A and the 10 cache units occupied by all stored data are cache units in the sending router.
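The two rules above, one based on an absolute occupancy count and one based on a relative share, can be sketched as follows. The thresholds mirror the worked examples in the text ([10,20] high / [0,9] low, and 0.5 as the ratio cutoff) and are assumptions for illustration, not values fixed by the disclosure.

```python
# Sketch of the two congestion-state rules illustrated above. Threshold
# values are taken from the worked examples and are assumptions only.

def state_by_count(occupied_units: int) -> str:
    # First example: 13 occupied units falls in [10, 20] -> first (high) state.
    if 10 <= occupied_units <= 20:
        return "first"   # high congestion
    if 0 <= occupied_units <= 9:
        return "second"  # low congestion
    raise ValueError("count outside the illustrated ranges")

def state_by_ratio(port_units: int, total_units: int) -> str:
    # Second example: 7 of 10 units -> ratio 0.7 >= 0.5 -> first (high) state.
    ratio = port_units / total_units
    return "first" if ratio >= 0.5 else "second"
```

The ratio form has the advantage of being self-normalizing when routers have different total cache sizes, while the count form is cheaper to evaluate in hardware; the text leaves the choice open.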

For example, in some examples, the plurality of receiving ports of the receiving router includes at least a first receiving port and a second receiving port, and the plurality of congestion states includes at least a first state and a second state, the congestion degree of the first state being higher than that of the second state. If the number of cache units occupied by the data corresponding to the first receiving port in the upper-level router connected to it is greater than the number of cache units occupied by the data corresponding to the second receiving port in the upper-level router connected to it, or if the corresponding percentage of occupied cache units for the first receiving port is greater than that for the second receiving port, then the cache congestion state signal corresponding to the first receiving port indicates the first state and the cache congestion state signal corresponding to the second receiving port indicates the second state.

That is, for two ports in the receiving router (taking port 7A and port 7B as an example), if the number of cache units occupied by data corresponding to port 7A in the sending router connected to port 7A is greater than the number of cache units occupied by data corresponding to port 7B in the sending router connected to port 7B, the state indicated by the cache congestion state signal received by port 7A has a higher congestion degree than the state indicated by the signal received by port 7B. Likewise, if the percentage of cache units occupied by the data corresponding to port 7A in its sending router is greater than the corresponding percentage for port 7B in its sending router, the signal received by port 7A indicates a higher congestion degree than the signal received by port 7B.

For example, in step S32, the priorities of the effective active ports are determined according to the cache congestion state signals. After each port of the receiving router receives its cache congestion state signal, the congestion degree indicated by that signal can be identified, and the priority of each effective active port can be determined accordingly: a port whose cache congestion state signal indicates a higher congestion degree receives a higher priority, and a port whose signal indicates a lower congestion degree receives a lower priority. That is, the priority of each port is positively correlated with the congestion degree indicated by its cache congestion state signal.

For example, in some examples, when the cache congestion state signal indicates three congestion degrees (low, medium, and high), the priorities may be divided into three levels (low, medium, and high) in one-to-one correspondence with the congestion degrees. For example, if the cache congestion state signal of an effective active port indicates a low congestion degree, the priority of that port is low; if the signal indicates a medium congestion degree, the priority is medium; and if the signal indicates a high congestion degree, the priority is high. It should be noted that the manner of dividing priorities is not limited to the above, and the priorities are not limited to three levels; the division and the number of levels may be determined according to actual needs, and the embodiments of the present disclosure are not limited in this regard.

For example, in other examples, the priorities include at least a first priority and a second priority. If the congestion degree indicated by the cache congestion state signal received by the first receiving port in the receiving router is higher than the congestion degree indicated by the cache congestion state signal received by the second receiving port, it may be determined that the first receiving port is of a first priority, the second receiving port is of a second priority, and the first priority is higher than the second priority.

That is, for two ports (taking port 7A and port 7B as an example) in the receiving router, if the congestion degree indicated by the buffer congestion status signal received by port 7A is higher than the congestion degree indicated by the buffer congestion status signal received by port 7B, the priority of port 7A is higher than the priority of port 7B.

For example, in step S33, the cache units in the global shared pool are allocated to the effective active ports in descending order of priority. That is, cache units are allocated first to effective active ports with higher priority, and then to those with lower priority. In some examples, if multiple effective active ports share the same priority, cache units may be allocated to them in a random order. The cache units in the global shared pool constitute the global shared cache (GSB); once allocated to an effective active port, they become part of that port's port shared cache (PSB).

For example, in some examples, when an effective active port is allocated cache, the number of cache units allocated to it is 1; that is, in one allocation round, 1 cache unit is allocated to the port, so that more effective active ports can be served. Of course, the embodiments of the present disclosure are not limited thereto: multiple cache units may be allocated to one effective active port in a single round, which may be determined according to actual needs.

For example, as shown in fig. 4, after the cache allocation arbiter 110 in the receiving router receives the cache congestion state signals corresponding to the ports, it determines a priority for each effective active port based on these signals, and allocates the cache units in the global shared pool to the effective active ports in descending order of priority. For example, in some examples, as long as the number of cache units in the global shared pool is not 0, effective active ports are allocated cache units according to priority, until all effective active ports have been allocated cache units or the number of cache units in the global shared pool reaches 0.
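The arbiter loop of step S33 can be condensed into a few lines. This is a minimal sketch under stated assumptions: one unit per port per round (matching the single-unit example above), ties broken arbitrarily, and all names illustrative.

```python
# Minimal sketch of the step-S33 arbiter loop: hand out one global shared
# cache unit per effective active port, highest priority first, until every
# port is served or the pool runs dry.

def allocate_by_priority(pool_size: int, ports: dict) -> dict:
    """ports maps port name -> numeric priority (higher = more congested)."""
    grants = {}
    for port in sorted(ports, key=ports.get, reverse=True):
        if pool_size == 0:
            break
        grants[port] = 1      # one unit per port per allocation round
        pool_size -= 1
    return grants

# Example: three effective active ports, but only two units left in the pool,
# so the lowest-priority port (7B) goes unserved this round.
print(allocate_by_priority(2, {"7A": 3, "7B": 1, "7C": 2}))  # → {'7A': 1, '7C': 1}
```

Because a port that misses out this round keeps its candidacy, it will be served in a later round once units are released back into the pool, which is what keeps the scheme starvation-tolerant under the priority ordering.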

In the cache allocation method provided in the embodiments of the present disclosure, the cache congestion state of the sending router is passed to the receiving router, so that the cache allocation arbiter of the receiving router can adjust the priority of each effective active port based on the congestion state of the cache in the sending router and then allocate cache according to those priorities. This achieves dynamic adjustment of port priorities, dynamically tunes cache allocation at a finer granularity, and links the cache allocation of two levels of routers, which helps relieve the congestion state of each path through the receiving router and reduces transmission delay under bursts of high-load traffic on individual paths within a given period. In addition, the receiving router sends reclamation requests to the sending routers connected to idle ports according to the working state of each port, so that the idle shared cache of an idle port can be reallocated to effective active ports. This guarantees the transmission bandwidth requirements of the effective active ports to the greatest extent, improves the utilization of the cache in the router, and improves the transmission performance of the network on chip.

Fig. 10 is a flowchart illustrating another cache allocation method for a router according to some embodiments of the present disclosure. Except for additionally including steps S40, S50, and S60, the cache allocation method provided in this embodiment is substantially the same as the cache allocation method shown in fig. 8; for related descriptions, reference may be made to the foregoing, which is not repeated here.

For example, as shown in fig. 10, the cache allocation method further includes the following operations.

Step S40: in an initialization stage, allocating the plurality of cache units to the plurality of receiving ports to complete initialization;

Step S50: in response to the number of cache units in the global shared pool being greater than or equal to the number of effective active ports, allocating at least one cache unit in the global shared pool to each effective active port;

Step S60: adding the cache units released by the effective active ports to the global shared pool.

It should be noted that, although steps S10-S60 are shown in a certain order in fig. 10, this does not mean that these steps need to be executed in the order shown in fig. 10, and the execution order of each step may be determined according to actual needs, which is not limited by the embodiment of the present disclosure.

For example, in step S40, in the initialization phase, a plurality of buffer units are allocated to a plurality of receiving ports to complete the initialization. For example, in some examples, initialization may occur upon power-up of a system-on-chip to which the cache allocation method applies.

Fig. 11 is a schematic flow chart of step S40 in the method shown in fig. 10. For example, in some examples, step S40 may further include the following operations.

Step S41: allocating the caches of the virtual channels;

Step S42: allocating the port shared caches;

Step S43: allocating the global shared cache in an arbitrated manner, according to the configured weight values and the preset maximum value set for each port's cache.

For example, the allocation of cache in the initialization stage is mainly divided into the 3 steps above, i.e., steps S41, S42, and S43. First, software configures, through registers, the minimum and maximum numbers of the two types of caches: the virtual channel (VC) caches of each port and the port shared cache (PSB) of each port. After all ports have been allocated, the remaining cache constitutes the global shared cache (GSB). The initialization process is briefly described below with reference to fig. 11 and fig. 4.

For example, in step S41, the configured minimum value of VC0 is first loaded into the release counter 7A of the receiving router (router 7) in order of VC channel number, and the release counter 7A starts decrementing to transfer the cache allocation value of VC0 to the sending router (router 6). Each time the token release signal channel (which carries the VC channel number) is sampled as valid, the receiving counter 6A in the sending router increments by 1, until the VC channel number changes; at that point the value of the receiving counter 6A is saved as the token value of VC0, and the receiving counter 6A starts counting from zero again to count the token value of the next VC channel. After traversing all VC channel numbers in this way, the token values of all VC channels have been received.
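The counter-based hand-off in step S41 can be modeled in software as follows. This is a hypothetical behavioral model, not the hardware implementation: the pulse list stands in for the token release signal channel, and all names are illustrative.

```python
# Hypothetical model of the token-value hand-off in step S41: the receiving
# router's release counter emits one token-release pulse per cache unit,
# tagged with the VC channel number; the sending router's receiving counter
# counts pulses until the channel number changes, then latches the total as
# that VC's token value.

def transfer_token_values(vc_minimums: dict) -> dict:
    # vc_minimums: VC channel number -> configured minimum cache count.
    pulses = []
    for vc, count in vc_minimums.items():
        # Release counter decrements `count` times, one pulse per decrement.
        pulses.extend([vc] * count)

    # Sending-router side: count consecutive pulses per channel.
    token_values, current_vc, counter = {}, None, 0
    for vc in pulses:
        if vc != current_vc and current_vc is not None:
            token_values[current_vc] = counter   # latch on channel change
            counter = 0
        current_vc = vc
        counter += 1
    if current_vc is not None:
        token_values[current_vc] = counter       # latch the last channel
    return token_values

print(transfer_token_values({0: 2, 1: 3}))  # → {0: 2, 1: 3}
```

The same mechanism carries the PSB token values in step S42, with only the token channel identifier changed, which is why the model needs no second code path.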

For example, in step S42, the cache value of the port shared cache (PSB) is passed in the same way: the principle and method are the same as the token value passing of the VC channels, only the token channel identifier needs to be changed, and the statistics of received token values are likewise implemented by the release counter 7A and the receiving counter 6A. In this way, the token values of all ports are received.

For example, in step S43, the global shared cache (GSB) is pre-allocated. Based on the preset maximum value configured for each port and the default weight values configured in the registers, each port whose cache has not reached its preset maximum value is submitted to the cache allocation arbiter 110. In each round of allocation, the ports are served one by one according to their weight values, until the global shared cache is fully allocated, or until a port reaches its preset maximum value and exits the cache allocation arbiter 110. This completes the pre-allocation of the global shared cache.
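The weighted round-robin pre-allocation of step S43 might look like the following sketch. The weight and maximum values are illustrative defaults assumed for the example; the text only specifies that both come from register configuration.

```python
# Sketch of the GSB pre-allocation in step S43: in each arbitration round
# every eligible port receives units according to its weight, and a port
# drops out once its cache reaches the preset per-port maximum.

def preallocate_gsb(gsb_total: int, ports: dict) -> dict:
    """ports: name -> {"weight": w, "max": m, "allocated": a}."""
    while gsb_total > 0:
        eligible = [p for p, s in ports.items() if s["allocated"] < s["max"]]
        if not eligible:
            break  # every port has reached its preset maximum
        for p in eligible:
            s = ports[p]
            # Grant up to `weight` units, bounded by the pool and the cap.
            grant = min(s["weight"], s["max"] - s["allocated"], gsb_total)
            s["allocated"] += grant
            gsb_total -= grant
            if gsb_total == 0:
                break
    return {p: s["allocated"] for p, s in ports.items()}

ports = {
    "7A": {"weight": 2, "max": 5, "allocated": 0},
    "7B": {"weight": 1, "max": 2, "allocated": 0},
}
print(preallocate_gsb(6, ports))  # → {'7A': 4, '7B': 2}
```

In the example, port 7B exits after its second unit because it hits its preset maximum, so the remaining pool keeps flowing to port 7A, which is exactly the exit behavior the step describes for the arbiter.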

For example, as shown in fig. 10, after step S40 is completed, that is, after the initialization stage of port cache allocation is finished, the sending router transmits data or request commands to port 7A of the receiving router through port 6A, and the receiving router then starts the dynamic allocation of the global shared cache and the reclamation of idle global shared cache, i.e., steps S10-S30. For the related descriptions of steps S10-S30, reference may be made to the foregoing, which is not repeated here.

For example, in step S50, after the effective active ports are determined, if the number of cache units in the global shared pool is greater than or equal to the number of effective active ports, at least one cache unit in the global shared pool is allocated to each effective active port. In this case, every effective active port is guaranteed to be allocated a cache unit, so cache reclamation from idle ports is not needed, which avoids the associated resource overhead and simplifies the control scheme.

For example, in step S60, the cache units released by the effective active ports are added to the global shared pool. For example, when the receiving router forwards received data or commands, the corresponding effective active port releases cache units. These released cache units are added to the global shared pool, to be allocated by the cache allocation arbiter 110 according to the priority of each effective active port; they are not returned directly to the port that released them.

Fig. 12 is a flowchart illustrating another cache allocation method for a router according to some embodiments of the present disclosure. The following briefly describes a specific operation flow of the cache allocation method provided in the embodiment of the present disclosure with reference to fig. 12.

As shown in fig. 12, after the sending router starts transmitting data or commands to port 7A of the receiving router through port 6A, the receiving router marks port 7A as an effective active port, as conditional input C2. Flow C2 marks a receiving port that is currently receiving data as an active port: because it currently has data or commands in transmission, it consumes its reserved cache, and its cache count will not reach the preset maximum value. The active ports marked in flow C2 are taken as input to decision flow J2.

For the other ports of the receiving router (i.e., receiving ports not currently transmitting data), the occupancy state of the cache taken up by all the data and commands corresponding to these ports in the connected upper-level router (i.e., the sending router) is taken as condition C1; if the cache occupancy is not empty, the port is marked as a candidate active port. For example, the cache occupancy state may be divided into several levels, such as a three-level low-medium-high indication, using preset thresholds or relative proportion values, and used as weight values for cache allocation in the receiving router (the weight values indicate priorities), corresponding for example to 1 (low), 2 (medium), and 3 (high), respectively.

Flow C1 is mainly responsible for recording, for each receiving port of the receiving router, the cache occupancy state of the data and commands destined for the receiving router in the corresponding sending router; if there are data and commands that have not been completely transmitted, the receiving port is marked as a candidate active port.

Taking C1 as the input of decision flow J1, J1 checks, for each candidate active port whose cache occupancy is not empty, whether its total cache count has reached the preset maximum value. If the preset maximum value has been reached, the port no longer participates in cache allocation arbitration and is marked as an invalid active port. If the preset maximum value has not been reached, the port is marked as an effective active port and taken as input to decision flow J2.

The process J2 will determine whether the number of caches in the global shared pool within the current receiving router, i.e., the remaining global shared cache (GSB), is greater than or equal to the number of active ports. If yes, the flow of A1 is entered, and a global shared cache is allocated to each active port. If not, the process proceeds to decision process J3.

Decision flow J3 determines whether a free port currently exists on the receiving router. If not, operation flow A3 is entered: the method waits for the receiving router to forward its data or commands to the next-level router, and the cache released thereby is added to the global shared pool. If a free port currently exists, operation flow A4 is entered. Operation flow A4 determines the cache reclamation budget and the budget value of each idle port according to the number of reserved cache units of each idle port; that is, the cache to be released is apportioned among the idle ports through the arbitration of the cache allocation controller 120, and a reclamation request is issued to the upper-level router (i.e., the sending router) to which each free port is connected. Flow A5 is then entered: the upper-level router releases an amount of cache consistent with the budget value and signals token recovery, for example by decrementing its receiving counter to release the required amount of cache to the receiving router. Then, in flow A6, after the receiving router receives the token recovery signal sent from the sending router, it reclaims the cache of each idle port and adds the reclaimed cache to the global shared pool.
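The decision chain J2 → J3 → A1/A3/A4-A6 described above can be condensed into one function. This is a simplification for illustration: the reclamation handshake of A4-A6 is collapsed into a single pool top-up, and all names are assumptions.

```python
# Condensed sketch of the decision chain J2 -> J3 -> A1/A3/A4-A6 described
# above. The reclamation handshake (A4-A6) is collapsed into a single pool
# top-up for brevity; all names are illustrative.

def decide(pool_units: int, active: int, free_budgets: list) -> tuple:
    if pool_units >= active:             # J2: enough shared cache for all?
        return ("A1_allocate", pool_units)
    if not free_budgets:                 # J3: any idle port to reclaim from?
        return ("A3_wait_for_release", pool_units)
    # A4-A6: reclaim each idle port's budget into the global shared pool.
    return ("A6_reclaimed", pool_units + sum(free_budgets))

# One unit left for three active ports, but two idle ports with budgets 4 and 2.
print(decide(1, 3, [4, 2]))  # → ('A6_reclaimed', 7)
```

After A1 or A6 the method continues into flow A2, where the topped-up pool is distributed by priority, so in steady state the function above is evaluated once per allocation cycle.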

Then, the method proceeds to flow A2. The cache occupancy state (congestion state) of the upper-level router of each effective active port is sent to the arbiter as an arbitration weight value (priority), and the global shared cache units in the global shared pool are allocated to the effective active ports in descending order of priority, for example released one by one to each effective active port until the global shared cache has been fully allocated.

The cache recovery and the cache allocation are completed in the above mode, and then the above dynamic cache allocation and recovery processes are repeated.

It should be noted that, in the embodiment of the present disclosure, the cache allocation method may further include more or fewer steps, and is not limited to the steps described above. The execution order of the steps is not limited, which may be determined according to actual needs, and the embodiments of the present disclosure are not limited thereto.

At least one embodiment of the present disclosure also provides a network on chip. The network on chip can dynamically adjust the allocation of the cache on a finer granularity, improve the utilization rate of the shared cache, improve the transmission bandwidth of the active port, reduce the transmission delay and improve the transmission performance of the network on chip. At least some embodiments of the network on chip adjust the priority of each port in the current receiving router based on the congestion state of the cache in the upper-level sending router, and may implement dynamic adjustment of the port priority, so that the cache allocation may be performed based on the port priority, and further, the transmission performance may be improved.

Fig. 13 is a schematic structural diagram of a network on chip according to some embodiments of the present disclosure. As shown in fig. 13, the network on chip 200 includes a plurality of routers. For example, the plurality of routers includes a receiving router and at least one transmitting router, the receiving router configured to receive data transmitted from the at least one transmitting router.

The receiving router is connected with each sending router through a data and command transmission bus and a bidirectional control channel. The data and command transfer bus is configured to transfer data from the sending router to the receiving router. The bidirectional control channel is configured to transmit control signals bidirectionally between the receiving router and the transmitting router.

The receiving router includes a buffer allocation controller 120, a plurality of receiving ports, and a plurality of buffer units. Each receiving port is connected to a transmitting router. The cache allocation controller 120 is configured to: determining a valid active port of a plurality of receive ports; and in response to that the number of the cache units in the global shared pool is smaller than the number of the effective active ports and at least one idle port exists in the plurality of receiving ports, recovering the cache unit of the at least one idle port and adding the recovered cache unit into the global shared pool. For example, the cache units that can be allocated in the plurality of cache units form a global shared pool, the cache units in the global shared pool are used for being allocated to the active ports, and the idle ports are the receive ports other than the active ports in the plurality of receive ports.

For example, the active port is a receiving port which is currently transmitting data, and/or a receiving port which is not currently transmitting data, has corresponding data in a correspondingly connected sending router, and has a number of corresponding cache units smaller than a maximum preset value.

For example, the receiving router further comprises a cache allocation arbiter 110, the cache allocation arbiter 110 being configured to allocate cache locations in the global shared pool to the active ports according to their priorities.

For example, in some examples, the bidirectional control channel is a sideband bypass control channel. In this case, the network on chip 200 may be the network on chip 100 shown in fig. 4. The sideband bypass control channel includes a first channel ch1, a second channel ch2, and a third channel ch3. The first channel ch1 is configured to transmit the cache congestion state signal generated by the sending router to the receiving router; the cache congestion state signal indicates the congestion state of the cache in the corresponding sending router. The second channel ch2 is configured to send the reclamation request generated by the cache allocation controller 120 to the sending router; the reclamation request carries the budget value of the corresponding free port, which is derived from the cache reclamation budget. The third channel ch3 is configured to transmit the token recovery signal generated by the sending router to the receiving router, so that the cache units of the free ports can be reclaimed according to the budget value and the reclaimed cache units added to the global shared pool. This scheme implements the required signaling in a simple manner, without major changes to the logic.

For example, in other examples, the data and command transmission bus may be multiplexed as the bidirectional control channel. That is, the cache congestion state signal, the reclamation request, and the token recovery signal are all transmitted through the data and command transmission bus; for example, signals and data between routers may be transmitted in a time-division multiplexed manner, with the signals carried over the bus as custom commands. In this case, no additional sideband bypass physical path needs to be provided in the network on chip 200, while the transmission of state information and the cache linkage of the two-level routers can still be achieved, which reduces hardware changes and simplifies the design.

It should be noted that, a specific implementation manner of the bidirectional control channel is not limited to the above-described manner, and any applicable manner may be adopted to implement the bidirectional control channel as long as transmission of the buffer congestion status signal, the reclaim request, and the token reclaim signal is achieved, which may be determined according to actual needs, and this is not limited by the embodiments of the present disclosure. The cache allocation arbiter 110 and the cache allocation controller 120 may be implemented as a logic IP, a module or a circuit composed of components, or any other form, which is not limited in this embodiment of the disclosure.

It should be noted that, in the embodiments of the present disclosure, the sending router and the receiving router are defined by the roles the respective routers play in a data transmission link. When a router sends data, it acts as a sending router; when it receives data, it acts as a receiving router. That is, from the viewpoint of the transmission direction, a given router acts as a receiving router with respect to its upstream router and as a sending router with respect to its downstream router. The roles of sending router and receiving router are therefore interchangeable. For simplicity, the sending router in fig. 13 only shows its output port (i.e., the transmitting port), the receiving router only shows its input port (i.e., the receiving port), the signals of the other ports of the two routers are omitted, and only port a is described in detail. In practical applications, a router has both input ports and output ports.

It should be noted that only the structure related to cache allocation is shown in fig. 13, and other structures in the network on chip 200 may refer to conventional designs, and will not be described in detail here. For detailed description and technical effects of the network on chip 200, reference may be made to the above description of the network on chip 100 and the cache allocation method, which are not described herein again.

Fig. 14 is a schematic block diagram of another network on chip provided by some embodiments of the present disclosure. As shown in fig. 14, the network on chip 300 is configured to implement the cache allocation method for a router according to any embodiment of the present disclosure. The network on chip 300 may be the network on chip 100 or the network on chip 200 described above. For detailed description and technical effects of the network on chip 300, reference may be made to the above descriptions of the networks on chip 100 and 200 and the cache allocation method, which are not repeated here.

Fig. 15 is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure. As shown in fig. 15, the electronic device 400 includes a network on chip 410, and the network on chip 410 may be a network on chip provided in any embodiment of the present disclosure, such as the aforementioned networks on chip 100, 200, and 300. For example, the electronic device 400 may be implemented as a chip, an integrated circuit, or the like, or as any device capable of routing data transmission, which is not limited by the embodiments of the present disclosure.

For detailed description and technical effects of the electronic device 400, reference may be made to the above descriptions of the networks on chip 100, 200, and 300 and the cache allocation method, which are not repeated here.

The following points need to be explained:

(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; other structures may refer to common designs.

(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.

The above description covers only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the scope of the claims.
