High-performance regularized network-on-chip architecture

Document No.: 1098982    Publication date: 2020-09-25

This technology, "High-performance regularized network-on-chip architecture," was designed and created by G. W. Baeckler, M. Langhammer, and S. V. Gribok on 2020-03-16. The main content is as follows: Techniques for designing and implementing a network on chip (NoC) are provided. For example, a computer-implemented method for programming a NoC onto an integrated circuit includes: determining a first portion of a plurality of registers potentially included in a NoC design; determining routing information about data paths between registers in the first portion of the plurality of registers; and determining an expected performance associated with the first portion of the plurality of registers. The method further includes: determining whether the expected performance is within a threshold range; after determining that the expected performance is within the threshold range, including the first portion of the plurality of registers and the data path in the NoC design; and generating instructions configured to cause circuitry corresponding to the NoC design to be implemented on the integrated circuit.

1. A computer-implemented method for programming a network on chip (NoC) onto an integrated circuit, the method comprising:

determining a first portion of a plurality of registers potentially included in a NoC design;

determining routing information about data paths between registers in the first portion of the plurality of registers;

determining an expected performance associated with the first portion of the plurality of registers;

determining whether the expected performance is within a threshold range;

based on determining that the expected performance is within the threshold range, including the first portion of the plurality of registers and the data path in the NoC design; and

generating instructions configured to cause circuitry corresponding to the NoC design to be implemented on the integrated circuit.

2. The computer-implemented method of claim 1, wherein the first portion of the plurality of registers is selected from a library comprising potential register locations specific to the integrated circuit.

3. The computer-implemented method of claim 2, wherein the data path is selected from a plurality of chip-specific data paths included in the library.

4. The computer-implemented method of claim 1, comprising: based on determining that the expected performance is not within the threshold range, moving one or more locations of registers in the first portion of the plurality of registers, altering a portion of the data path, or both.

5. The computer-implemented method of claim 1, comprising:

determining whether the first portion of the plurality of registers includes registers located in more than one clock sector; and

based on determining that the first portion of the plurality of registers includes registers located in more than one clock sector, shifting one or more registers in the first portion of the plurality of registers such that each register in the first portion of the plurality of registers is included in a single clock sector.

6. The computer-implemented method of claim 1, wherein the plurality of registers comprises:

a second portion of the registers configured to route data vertically away from a data source; and

a third portion of registers disposed perpendicular to the second portion of registers and configured to route data horizontally away from the data source.

7. The computer-implemented method of claim 1, comprising:

determining whether any of the data paths are routed through a non-passable region of the integrated circuit; and

after determining that at least one of the data paths is routed through the non-passable region, altering a portion of the data path to bypass the non-passable region.

8. The computer-implemented method of claim 7, comprising altering one or more locations of a second portion of the plurality of registers based on altering the portion of the data path to bypass the non-passable region.

9. The computer-implemented method of claim 1, comprising:

receiving programming associated with a function to be performed by the integrated circuit; and

compiling the programming based on the NoC design to generate a hardware implementation on the integrated circuit for the function.

10. The computer-implemented method of claim 1, comprising:

receiving expected performance information about the NoC design; and

determining the first portion of the plurality of registers and the routing information based on the expected performance information.

11. The computer-implemented method of claim 1, wherein the integrated circuit comprises a programmable logic device.

12. A non-transitory computer-readable medium comprising instructions that, when executed, are configured to cause one or more processors to:

receive data about an integrated circuit and data about a performance characteristic of a network on chip (NoC) to be designed;

select a source point and an endpoint of the NoC based on the data about the integrated circuit and the data about the performance characteristic of the NoC;

determine a route between the source point and the endpoint;

determine whether an expected performance level associated with the route is within a threshold range of a performance level defined by the performance characteristic;

include the endpoint and the route in the NoC; and

generate instructions configured to cause circuitry corresponding to the NoC to be implemented on the integrated circuit.

13. The non-transitory computer-readable medium of claim 12, wherein the instructions are configured to cause the one or more processors to:

set the endpoint as a source point when including the endpoint in the NoC;

determine a second endpoint of the NoC based on the endpoint;

determine a route between the endpoint and the second endpoint;

determine a second route between the endpoint and the second endpoint; and

include the second endpoint and the second route in the NoC.

14. The non-transitory computer-readable medium of claim 12, wherein the instructions are configured to cause the one or more processors to:

move the endpoint closer to the source point based on determining that the expected performance level is slower than a minimum speed defined by the threshold range; and

move the endpoint away from the source point based on determining that the expected performance level is faster than a maximum speed defined by the threshold range.

15. The non-transitory computer-readable medium of claim 12, wherein the instructions are configured to cause the one or more processors to include the endpoint and the route in the NoC based on determining that the expected performance level is within the threshold range.

16. The non-transitory computer-readable medium of claim 12, wherein the instructions are configured to cause the one or more processors to select the route from the data about the integrated circuit based on the data about the performance characteristic of the NoC.

17. The non-transitory computer-readable medium of claim 12, wherein the endpoint corresponds to a plurality of registers of the integrated circuit.

18. An integrated circuit configured to:

receive instructions regarding a NoC design; and

implement a circuit corresponding to the NoC design, wherein the NoC design is determined by:

determining a first portion of a plurality of registers potentially included in the NoC design;

determining routing information about data paths between registers in the first portion of the plurality of registers;

determining an expected performance associated with the first portion of the plurality of registers;

determining whether the expected performance is within a threshold range; and

including the first portion of the plurality of registers and the data path in the NoC design after determining that the expected performance is within the threshold range.

19. The integrated circuit of claim 18, comprising a programmable logic device.

20. The integrated circuit of claim 19, wherein the programmable logic device comprises a Field Programmable Gate Array (FPGA).

Background

The present disclosure relates generally to integrated circuits such as Field Programmable Gate Arrays (FPGAs). More particularly, the present disclosure relates to the design and implementation of networks on chip (NoCs).

This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.

Integrated circuits may be used to perform various functions such as encryption and machine learning. Also, various operations may be performed by various portions of the integrated circuit. For example, one portion of an integrated circuit may perform a function on data while another portion of the integrated circuit may be used to further process the data. NoCs may be used to route communications between different parts of an integrated circuit or to communicate between multiple integrated circuits. For example, a soft NoC may be generated by software used to program an integrated circuit. However, soft NoCs may perform inconsistently, operate at relatively low speeds, and be unable to route wide buses across long spans of an integrated circuit. Also, it may be difficult to control the distribution of a relatively small number of data bits across an integrated circuit using a soft NoC.

Drawings

Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:

FIG. 1 is a block diagram of a system for programming a network on chip (NoC) onto an integrated circuit, according to an embodiment;

FIG. 2 is a block diagram of an integrated circuit in which a NoC may be implemented, according to an embodiment;

FIG. 3 is a block diagram of a clock distribution network, according to an embodiment;

FIG. 4 is a block diagram of a regularized clock distribution network, according to an embodiment;

FIG. 5 is a flow diagram of a process for implementing a NoC on an integrated circuit, according to an embodiment;

FIG. 6 is a diagram of a NoC, according to an embodiment;

FIG. 7 is a diagram of a portion of the NoC of FIG. 6, according to an embodiment;

FIG. 8 is a diagram of a portion of a bidirectional NoC, according to an embodiment;

FIG. 9 is a diagram of another bidirectional NoC, according to an embodiment;

FIG. 10 is a diagram of a portion of a NoC including a gated data path, according to an embodiment;

FIG. 11 is a flow diagram of a process for determining routing information and placement of endpoints of a NoC, according to an embodiment;

FIG. 12 is a block diagram of a portion of a NoC having register blocks located in several clock sectors, according to an embodiment;

FIG. 13 is a block diagram of register blocks routed around a non-passable region of an integrated circuit, according to an embodiment;

FIG. 14 is a block diagram of register blocks that are routed based on other register blocks that have been routed around a non-passable region of an integrated circuit, according to an embodiment;

FIG. 15 is a block diagram of a NoC including a direct data path, according to an embodiment;

FIGS. 16A and 16B (hereinafter "FIG. 16") illustrate a flow diagram of a process for determining routing information and placement of endpoints of a NoC, according to an embodiment;

FIG. 17 is a block diagram of another NoC that includes a direct data path, according to an embodiment; and

FIG. 18 is a block diagram of a data processing system, according to an embodiment.

Detailed Description

One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.

When introducing elements of various embodiments of the present disclosure, the articles "a," "an," and "the" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements. In addition, it should be understood that references to "one embodiment" or "an embodiment" of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Further, the phrase "A based on B" is intended to mean that A is based, at least in part, on B. Furthermore, unless explicitly stated otherwise, the term "or" is intended to be inclusive (e.g., a logical OR) and not exclusive (e.g., a logical XOR). In other words, the phrase "A or B" is intended to mean A, B, or both A and B.

Integrated circuits such as programmable logic devices may be used to perform various functions. In many cases, different parts of the integrated circuit may be used to perform operations in a function. For example, a portion of an integrated circuit may receive data, perform a first operation on the data, and send the data to another portion of the integrated circuit. Another portion of the integrated circuit may then perform another operation on the data. Similarly, multiple integrated circuits may be utilized to perform functions. For example, one operation of a function may be performed by one integrated circuit. The data may then be sent to another integrated circuit, which may perform subsequent operations on the data.

The movement of data and the paths available for it across an integrated circuit are important to the overall performance of the integrated circuit. Some integrated circuits, including programmable logic devices such as Field Programmable Gate Arrays (FPGAs), may utilize a network on chip (NoC) to help facilitate data transfer across the integrated circuit. For example, NoCs may be utilized when routing data from one portion of an integrated circuit (e.g., a sector or an Accelerator Function Unit (AFU)) to another portion of the same integrated circuit or another integrated circuit.

A soft NoC, which may be designed by software for programming an integrated circuit, may be designed and implemented onto an integrated circuit. In other words, a circuit designer may utilize software to generate a NoC to be implemented on an integrated circuit. However, in many cases, the performance of soft NoCs is limited. For example, a soft NoC may perform inconsistently, operate at a relatively low speed, and be unable to route a wide bus across a long span of an integrated circuit. Also, it may be difficult to control the distribution of a relatively small number of data bits across an integrated circuit using a soft NoC. Furthermore, when a circuit design is modified (e.g., by compiling multiple iterations of the circuit design), it may be difficult to provide a NoC that meets the circuit designer's desired characteristics while also allowing the integrated circuit to perform the functions desired by the circuit designer. In other words, optimizing the performance of a NoC while enabling a portion of the integrated circuit to perform the intended function may prove to be infeasible.

The present disclosure relates to techniques for designing and generating high-performance soft NoCs. For example, as described below, a regularized approach may be taken to provide an integrated circuit with a NoC that operates according to the designer's desired settings, may be designed prior to compilation, and enables large amounts of data to be transmitted.

With the above in mind, FIG. 1 illustrates a block diagram of a system 10 that may be used to program one or more integrated circuit devices 12 (e.g., integrated circuit devices 12A, 12B). The integrated circuit device 12 may be reconfigurable (e.g., an FPGA) or may be an Application Specific Integrated Circuit (ASIC). A designer may use design software 14 (e.g., a version of Intel Quartus) to implement a circuit design to be programmed onto integrated circuit device 12.

The design software 14 may be executed by one or more processors 16 of a respective computing system 18. Computing system 18 may include any suitable device capable of executing design software 14, such as a desktop computer, laptop computer, mobile electronic device, server, and the like. Computing system 18 may access integrated circuit device 12, configure integrated circuit device 12, and/or communicate with integrated circuit device 12. Processor 16 may include multiple microprocessors, one or more other integrated circuits (e.g., ASICs, FPGAs, reduced instruction set processors, etc.), or some combination of these.

One or more memory devices 20 may store design software 14. Additionally, memory device 20 may store information related to integrated circuit device 12, such as control software, configuration software, look-up tables, configuration data, and the like. In some embodiments, the processor 16 and/or the memory device 20 may be external to the computing system 18. The memory device 20 may include a tangible, non-transitory, machine-readable medium, such as volatile memory (e.g., Random Access Memory (RAM)) and/or non-volatile memory (e.g., Read Only Memory (ROM)). The memory device 20 may store a variety of information that may be used for various purposes. For example, memory device 20 may store machine-readable and/or processor-executable instructions (e.g., firmware or software) for execution by processor 16, such as instructions for determining a speed of integrated circuit device 12 or a region of integrated circuit device 12, instructions for determining a criticality of a design path programmed in integrated circuit device 12 or a region of integrated circuit device 12, instructions for programming a design in integrated circuit device 12 or a region of integrated circuit device 12, and so forth. The memory device 20 may include one or more storage devices (e.g., non-volatile storage devices) that may include Read Only Memory (ROM), flash memory, a hard drive, or any other suitable optical, magnetic, or solid state storage medium, or any combination thereof.

Design software 14 may use compiler 22 to generate a low-level circuit design program 24 (bit stream), sometimes referred to as a program object file, that programs integrated circuit device 12. That is, compiler 22 may provide machine-readable instructions representing a circuit design to integrated circuit device 12. For example, integrated circuit device 12 may receive one or more programs 24 as a bit stream that describes the hardware implementation that should be stored in integrated circuit device 12. Program 24 (bit stream) may be programmed into integrated circuit device 12 as program configuration 26.

As shown, system 10 also includes a cloud computing system 28, which may be communicatively coupled to computing system 18, e.g., via an internet or network connection. The cloud computing system 28 may include processing circuitry 30 and one or more memory devices 32. Memory device 32 may store information related to integrated circuit device 12 such as control software, configuration software, look-up tables, configuration data, and the like. The memory device 32 may include a tangible, non-transitory, machine-readable medium, such as volatile memory (e.g., Random Access Memory (RAM)) and/or non-volatile memory (e.g., Read Only Memory (ROM)). The memory device 32 may store a variety of information that may be used for various purposes. For example, the memory device 32 may store machine-readable and/or processor-executable instructions (e.g., firmware or software) for execution by the processing circuitry 30. Additionally, memory device 32 of cloud computing system 28 may include program 24 and circuit designs previously made by designers and computing system 18. The memory device 32 may also include one or more libraries of chip-specific predefined locations and fixed routes that may be used to generate NoCs. While the designer is utilizing design software 14, processor 16 may request information regarding NoCs previously designed by other designers or implemented on other integrated circuit devices 12. For example, a designer who is programming integrated circuit device 12A may utilize design software 14A and processor 16A to request a design from cloud computing system 28 for a NoC for use on another integrated circuit (e.g., integrated circuit device 12B). Processing circuitry 30 may generate a design for the NoC and/or retrieve the design for the NoC from memory device 32 and provide the design to computing system 18A. Additionally, the cloud computing system 28 may provide information about predefined locations and fixed routes of the NoC to the computing system 18A based on the particular integrated circuit device 12A (e.g., the particular chip). Further, memory device 32 may keep track of and/or store the designs for providing the regularized structure to the NoC, and processing circuitry 30 may select a particular NoC based on integrated circuit device 12A as well as the designer's design considerations (e.g., amount of data to be transferred, desired data transfer speed).
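For illustration only, the following sketch shows one possible way such a chip-specific library of predefined register locations and fixed routes could be organized in software. The structure, field names, and values are assumptions made for this example and are not taken from the disclosure.

```python
# Illustrative sketch only: a hypothetical organization of a chip-specific library
# of predefined register locations and fixed routes. All names and values below
# are invented for this example.

CHIP_NOC_LIBRARY = {
    "example_chip_model": {
        "predefined_register_locations": [(0, 4), (0, 8), (4, 8)],  # (column, row) grid positions
        "fixed_routes": [
            {"from": (0, 4), "to": (0, 8), "wires": 512},
            {"from": (0, 8), "to": (4, 8), "wires": 512},
        ],
        "bus_speed_mhz": 600,
    },
}

def lookup_noc_building_blocks(chip_model: str) -> dict:
    """Return the predefined locations and fixed routes stored for a chip model."""
    return CHIP_NOC_LIBRARY.get(chip_model, {})

print(lookup_noc_building_blocks("example_chip_model")["bus_speed_mhz"])  # 600
```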

Turning now to a more detailed discussion of integrated circuit device 12, FIG. 2 illustrates an example of integrated circuit device 12 as a programmable logic device such as a Field Programmable Gate Array (FPGA). Further, it should be understood that integrated circuit device 12 may be any other suitable type of programmable logic device (e.g., an application specific integrated circuit and/or an application specific standard product). As shown, the integrated circuit device 12 may have input/output circuitry 42 for driving signals away from the device and for receiving signals from other devices via input/output pins 44. Interconnect resources 46 (e.g., global and local vertical and horizontal conductors and buses) may be used to route signals on integrated circuit device 12. Additionally, the interconnect resources 46 may include fixed interconnects (wires) and programmable interconnects (i.e., programmable connections between respective fixed interconnects), such as the soft NoCs disclosed herein. Programmable logic 48 may include combinational and sequential logic circuits. For example, programmable logic 48 may include look-up tables, registers, and multiplexers. In various embodiments, programmable logic 48 may be configured to perform custom logic functions. The programmable interconnect associated with the interconnect resources 46 may be considered part of programmable logic 48.

A programmable logic device, such as integrated circuit device 12, may contain programmable elements 50 within programmable logic 48. For example, as described above, a designer (e.g., a customer) may program (e.g., configure) programmable logic 48 to perform one or more desired functions. For example, some programmable logic devices may be programmed by configuring their programmable elements 50 using a mask programming arrangement, which is performed during semiconductor fabrication. Other programmable logic devices are configured after semiconductor fabrication operations are completed, for example by programming their programmable elements 50 using electrical programming or laser programming. In general, the programmable elements 50 may be based on any suitable programmable technology (e.g., fuses, antifuses, electrically programmable read only memory technology, random access memory cells, mask programmed elements, etc.).

Many programmable logic devices are electrically programmed. In the case of an electrical programming arrangement, the programmable element 50 may be formed of one or more memory cells. For example, during programming, configuration data is loaded into the memory cells using pins 44 and input/output circuitry 42. In one embodiment, the memory cells may be implemented as Random Access Memory (RAM) cells. The use of memory cells based on RAM technology described herein is intended to be only one example. Furthermore, because these RAM cells are loaded with configuration data during programming, they are sometimes referred to as configuration RAM cells (CRAMs). These memory cells may each provide a corresponding static control output signal that controls the state of an associated logic component in programmable logic 48. For example, in some embodiments, the output signal may be applied to the gate of a Metal Oxide Semiconductor (MOS) transistor within programmable logic 48.

Further, it should be noted that programmable logic 48 may correspond to different portions or sectors on integrated circuit device 12. That is, integrated circuit device 12 may be sectorized, meaning that programmable logic resources may be allocated through a plurality of discrete programmable logic sectors (e.g., each programmable logic 48). In some cases, a sector may be programmed to perform a particular task. For example, a first sector (e.g., programmable logic 48A) may perform a first operation on data. The interconnect resources 46, which may include NoCs that may be designed using the design software 14, may be used to provide data to another sector (e.g., programmable logic 48B) that may perform further operations on the data. As described below, a soft NoC may provide a regularized, predictable way to transfer large amounts of data between computing elements (e.g., between different portions of programmable logic 48).

Continuing with the figures, FIG. 3 provides a block diagram of clock distribution network 80. Clock distribution network 80 (which may be referred to as an "H-tree") represents communication between different portions 82 of integrated circuit device 12. For example, portion 82 may represent a sector of programmable logic 48 or a portion of programmable logic 48 (e.g., programmable element 50 or a group of programmable elements 50) within a single sector. In general, data may propagate from one portion 82 of integrated circuit device 12 to an adjacent portion 82. This may be repeated multiple times until the data is provided to the target destination. For example, data may be sent from portion 82A to portion 82B to portion 82C to portion 82D to transfer data from portion 82A to portion 82D.

NoCs may be designed based on clock distribution network 80. Such NoCs are said to be based on an "optimized" design in the sense that they can be generated to provide routes that enable data to be transferred between certain points of the integrated circuit device 12 as quickly as possible. However, in such NoCs, a portion of the path will be rate-limiting, meaning that there will be some point in the path at which data is transmitted at the slowest rate. For example, where portions 82 correspond to different regions or sectors of programmable logic 48, the various portions of interconnect resources 46 may be comprised of different numbers of wires and/or registers. A portion of the interconnect resources 46 (e.g., where fewer wires are utilized and/or interconnect resources transmit data less frequently) may become a bottleneck in the speed at which data may be transmitted from one portion 82 of the integrated circuit device 12 to another portion 82. Further, certain types of integrated circuit devices 12 (e.g., FPGAs) may have a regularized structure in which data does not originate from a "central" location. For example, data may be generated from programmable logic 48 located in a corner of integrated circuit device 12. Thus, it can be said that the clock distribution network 80 attempts to optimize the overall cost of the entire network (e.g., the entire clock distribution network 80), which may result in the inability to transmit the desired amount of data at the desired speed.

As discussed herein, a "regularized" approach is employed to design and implement a NoC on integrated circuit device 12 (e.g., as part of interconnect resources 46). That is, rather than emphasizing optimization as with clock distribution network 80, emphasis is placed on the regularized NoC architecture that may be provided for integrated circuit device 12. For example, the NoCs discussed herein may be regularized for a particular integrated circuit device (e.g., a particular model or chip). With this in mind, FIG. 4 is a block diagram of another clock distribution network 90 that follows a regularized approach. In particular, clock distribution network 90 is a unidirectional network in which communication from one portion 92 (e.g., a portion of programmable logic 48) of integrated circuit device 12 to another portion 92 is optimized for a single direction. For example, communication from portion 92A to portion 92B may be optimized by providing a particular bandwidth (e.g., number of wires multiplied by frequency) between portion 92A and portion 92B.
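As a simple illustration of the bandwidth relationship noted above (number of wires multiplied by frequency), the following sketch computes the bandwidth of a hypothetical NoC segment; the function name and units are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch: estimating the bandwidth of a unidirectional NoC segment as
# wires x frequency, as described above. Values are illustrative.

def segment_bandwidth_gbps(num_wires: int, frequency_mhz: float) -> float:
    """Bandwidth of a NoC segment in gigabits per second."""
    return num_wires * frequency_mhz * 1e6 / 1e9

# Example: the 512-bit bus routed at 600 MHz discussed later in this disclosure.
print(segment_bandwidth_gbps(512, 600.0))  # 307.2 Gb/s
```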

With the foregoing in mind, FIG. 5 is a flow diagram of a process 100 for implementing a NoC on an integrated circuit device. For example, processor 16A of computing system 18A may perform process 100 to implement a NoC on integrated circuit device 12A. Additionally, process 100 may be performed by computing system 18A in conjunction with cloud computing system 28. The process 100 generally includes: receiving data regarding integrated circuit device 12A and NoC metrics (process block 102), estimating initial endpoints, which may also be referred to as "source points" (process block 104), determining routes and placements of the endpoints (process block 106), determining whether each route and endpoint has been determined (decision block 108), and, when it is determined that some routes or endpoints have not been determined, returning to determine routes and placements of the endpoints (process block 106). When it is determined that each route and endpoint has been determined, process 100 further includes: receiving a circuit design (process block 110), generating a program (bit stream) 24A (process block 112), and causing the integrated circuit device 12A to be programmed (process block 114).

At process block 102, computing system 18A may receive information about integrated circuit device 12A and metrics associated with a NoC to be developed and implemented onto integrated circuit device 12A. The information may be received from the cloud computing system 28. The information about integrated circuit device 12A may include information about a particular integrated circuit device 12A, such as a particular model (e.g., a particular chip) of integrated circuit device 12A, and characteristics associated with a particular integrated circuit device 12A. Metrics associated with a NoC may include bus width (e.g., number of wires), bus speed (e.g., value in hertz), target performance level, and target performance level threshold range. Some NoC metrics may be received from the cloud computing system 28. For example, the bus width and bus speed may be information specific to a particular type of integrated circuit device 12A, which may be stored on the memory device 32. Additionally, the target performance level and the target performance level threshold range may be received via user input from a designer utilizing the design software 14A. In other embodiments, the processor 16A may determine a target performance level and a target performance level threshold range. For example, a default number or percentage may be used to determine the target performance level threshold range. For example, if the target performance level is 600 megahertz (MHz), the threshold range may be defined by a certain amount above or below 600 MHz, such as 580-620 MHz (i.e., within 20 MHz of 600 MHz). Keeping the target performance level at 600 MHz in the same example, the target performance level threshold range may instead be defined by a certain percentage above or below 600 MHz, such as 570-630 MHz (i.e., within five percent of 600 MHz). Additionally, the designer may set the target performance level threshold range based on a quantity (e.g., a number of megahertz) or a percentage value (e.g., five percent).
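The following sketch illustrates the two ways of defining the target performance level threshold range described above (an absolute margin or a percentage margin); the function and its parameters are illustrative assumptions rather than part of the design software.

```python
# Minimal sketch of the target performance level threshold range described above.
# Either an absolute margin (e.g., 20 MHz) or a percentage margin (e.g., 5%) may
# be supplied; both defaults and the function name are illustrative.

def threshold_range_mhz(target_mhz: float, margin_mhz: float = 0.0,
                        margin_percent: float = 0.0) -> tuple[float, float]:
    """Return (minimum, maximum) of the target performance level threshold range."""
    if margin_mhz:
        return target_mhz - margin_mhz, target_mhz + margin_mhz
    delta = target_mhz * margin_percent / 100.0
    return target_mhz - delta, target_mhz + delta

print(threshold_range_mhz(600.0, margin_mhz=20.0))      # (580.0, 620.0)
print(threshold_range_mhz(600.0, margin_percent=5.0))   # (570.0, 630.0)
```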

At process block 104, computing system 18A may estimate an endpoint of the NoC. As an example, an endpoint may include a register (e.g., a block of registers) that is relatively close to, or adjacent to, a point between a source of data (e.g., a portion of programmable logic 48 from which the data originates) and a destination (e.g., another portion of programmable logic 48 to which the data is sent). As another example, an endpoint may refer to a "spine" (e.g., a column) of registers from which "ribs" (e.g., rows) of registers are to be subsequently generated. To aid in illustration, FIG. 6 is provided. In particular, FIG. 6 is a diagram of a NoC 150A generated and implemented by computing system 18 on integrated circuit device 12A by executing process 100. As shown, NoC 150A includes a spine (or ridgeline) 152, which generally refers to a column, region of sectors, or portion of programmable logic 48 that may be used to provide data vertically (e.g., in an outward direction to other portions of programmable logic 48 included in the spine 152) and horizontally (e.g., to ribs 154). As shown, the ribs 154 may extend parallel to each other and perpendicular to the spine 152.

As shown, the register block 156 may be used to route data to a desired location (e.g., from one portion of the programmable logic 48 to another portion of the programmable logic 48) through the NoC 150A. In particular, register blocks 156A, 156B may be used to send data vertically to other portions of programmable logic 48, such as included in rib 154. The register blocks 156C, 156D, 156E, 156F may transmit data horizontally.

In the illustrated embodiment, NoC 150A includes a 512-bit bus implemented via the register blocks 156, which is routed at 600 MHz across integrated circuit device 12A, such as in an embodiment in which integrated circuit device 12A is a programmable logic device (e.g., an FPGA). Using the illustrated approach, a fairly uniform performance distribution can be achieved across the entire NoC 150A. For example, the speed profile from the slowest to the fastest route in the NoC 150A may be approximately 5% (e.g., 0 to 10%). Such uniformity may be achieved even though the NoC 150A may include different amounts of distance between the register blocks 156. For example, as shown, the distance between register block 156D and register block 156E is greater than the distance between register block 156E and register block 156F. The placement of register blocks not included in the spine 152, such as register blocks 156D, 156E, 156F, is discussed in more detail below with respect to FIG. 11.

As also shown in FIG. 6, each sector of programmable logic 48, such as programmable logic 48C, includes three blocks of registers 156A, 156B, 156C forming a "Z" shape. The following discusses the registers within programmable logic 48C and the arrangement of register blocks 156A, 156B, 156C. However, before continuing with the discussion of programmable logic 48C, it should be noted that in other embodiments, the spine 152 may not be located in a central location (e.g., the central column of programmable logic 48 sectors). Further, although the spine 152 is illustrated as extending vertically, in other embodiments the spine 152 may extend horizontally. In embodiments where the spine 152 extends horizontally, the ribs 154 may extend vertically. That is, the spine 152 may still transmit data both horizontally and vertically, but the ribs 154 will be used to transmit data vertically.

Continuing with the figures, FIG. 7 is a diagram of a portion 160 of NoC 150A located on programmable logic 48C. More particularly, FIG. 7 shows registers 162 of register blocks 156A, 156B, 156C included in NoC 150A. Registers 162A, 162B may be used to transmit data vertically along the spine, while register 162C may transmit data horizontally. Utilizing a diagonal set of registers (e.g., register block 156C) may also enable the data path distances associated with each register 162 in register block 156 to be approximately equal. For example, a first total distance (e.g., sum) of the data paths (e.g., wires) 164A, 164B associated with the register 162C is equal to a second total distance (e.g., sum) of the data paths 164C, 164D associated with the register 162D (and to the corresponding data paths of any other register 162 in register block 156C). Because the total distance of the data paths associated with each register is approximately equal, data may be sent by registers 162 of register block 156C at approximately equal speeds.
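The following toy calculation illustrates the property described above: when registers are placed along a diagonal, the sum of each register's incoming and outgoing wire lengths is approximately constant. The grid positions and lengths are invented for illustration and are not taken from the figures.

```python
# Toy illustration: registers placed along a diagonal so that, for each register,
# the sum of its incoming and outgoing wire lengths is the same. Units are
# hypothetical grid steps.

N = 4  # number of registers in the diagonal register block

for i in range(N):
    incoming = i              # e.g., horizontal wire from the spine to register i
    outgoing = (N - 1) - i    # e.g., vertical wire from register i toward the rib
    total = incoming + outgoing
    print(f"register {i}: incoming={incoming}, outgoing={outgoing}, total={total}")

# Every register reports the same total (N - 1), so data through each register of
# the diagonal block traverses approximately the same wire distance.
```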

Although FIGS. 6 and 7 refer to a unidirectional NoC (e.g., NoC 150A), in other embodiments, NoCs generated by computing system 18A and cloud computing system 28 may be bidirectional. In other words, whereas the register blocks 156A, 156B are only used to send data in one direction (e.g., up), register blocks may instead be used to send data both up and down. For example, FIG. 8 is a diagram of a portion of a bidirectional NoC 150B that may be generated by following process 100 and included as part of a spine 152. As shown, alternate registers of the register blocks 156G, 156H are used to provide data in one direction. For example, registers 162E of register block 156G and registers 162F of register block 156H may be used to send data up (e.g., out from a data source). Additionally, registers 162G of register block 156I and registers 162H of register block 156J may send data down (e.g., inward to the data source).

Although FIGS. 6-8 generally refer to a "Z" configuration of register blocks, in other embodiments, other configurations may be used. For example, FIG. 9 is a diagram of another embodiment of a portion of a bidirectional NoC 150C that may be generated by following process 100 and included as part of a spine 152. As shown, the NoC 150C utilizes an "X" configuration that includes two horizontal register blocks 156I, 156J and two diagonal register blocks 156K, 156L. Some of the registers 162 of the register block 156I may be used to transmit data away from the data source, while other registers 162 of the register block 156I may be used to transmit data toward the data source. For example, every other register 162 in register block 156I may transmit data in a particular direction, and adjacent registers 162 may be used to transmit data in the opposite direction. Registers 162 included in register block 156K may be used to transmit data in one direction (e.g., outward) while registers 162 included in register block 156L may transmit data in another direction (e.g., inward).

Continuing with the discussion of different embodiments of the NoC, FIG. 10 is a diagram of a portion of NoC 150D, which may also be generated by following process 100 and included as part of a spine 152. In particular, NoC 150D includes a gated data path. For example, the NoC 150D includes aggregated data paths, e.g., data paths 164E, 164F, 164G, aggregated at a gate router 170 (e.g., an OR-gate router or other logic-gate router) communicatively coupled to the register 162. Gate router 170 may perform logical operations on inputs received via data paths 164E, 164F, 164G and send data. For example, the gate router 170 may combine signals received via the data paths 164E, 164F, 164G and send the combined signal inward (e.g., toward a data source). Data packet collisions may be prevented with a gate router, such as gate router 170, and an aggregated data path may be implemented without utilizing a data buffer.
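The following sketch illustrates the OR-based combining behavior attributed to gate router 170 above, under the assumption that only one branch drives non-zero data in a given cycle while idle branches drive zeros; the function is illustrative and not part of the disclosure.

```python
# Minimal sketch of an OR-based gate router: when only one branch drives non-zero
# data in a cycle (idle branches drive zeros), a bitwise OR merges the branches
# onto one inward-going path without a data buffer.

def or_gate_router(*branch_words: int) -> int:
    """Combine data words arriving on several data paths into a single word."""
    combined = 0
    for word in branch_words:
        combined |= word
    return combined

# One active branch per cycle; the other branches are idle (0x000).
print(hex(or_gate_router(0x000, 0x2AB, 0x000)))  # 0x2ab
```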

Now that several variations of a NoC have been introduced (e.g., NoCs 150A-150D), the discussion will return to FIG. 6 and process 100 to further explain how and why one of NoCs 150A-150D may be generated (e.g., as compared to another of NoCs 150A-150D). Continuing with process 100, at process block 106, computing system 18A may determine routing information and placement of endpoints. For example, determining placement of an endpoint may include determining where other registers or register blocks should be placed, and determining routing information includes determining routes (e.g., particular wires) between the endpoint and other endpoints (e.g., endpoints determined at process block 104). To assist in explaining process block 106 in more detail, FIG. 11 is provided. In particular, FIG. 11 is a flow diagram of a process 200 for determining routing information and placement of endpoints. In other words, the process 200 may be performed as process block 106 of process 100. Process 200 may be performed by computing system 18A, cloud computing system 28, or a combination of computing system 18A and cloud computing system 28. In general, process 200 includes defining destination register placement and routing information (process block 202), determining expected performance based on the defined register placement and routing information (process block 204), determining whether the expected performance is within a threshold range, such as a target performance level threshold range (decision block 206), and returning to define destination register placement and routing information when the expected performance is determined not to be within the threshold range (process block 202). When it is determined that the expected performance is within the threshold range, process 200 further includes: determining whether there are any clock sector considerations (decision block 208) and returning to define destination register placement and routing information upon determining that there are clock sector considerations to consider (process block 202). When it is determined that there are no clock sector considerations to consider, process 200 may also include determining whether there are any discontinuities in the route (decision block 210) and returning to define destination register placement and routing information when it is determined that there are discontinuities in the route (process block 202). When it is determined that there is no discontinuity in the route, process 200 may also include setting the destination register as the source register and saving the routing information (process block 212).
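The following sketch summarizes the control flow of process 200 as a simplified, self-contained loop. The distance-based speed model, clock-sector geometry, non-passable rows, and one-unit adjustment steps are all invented assumptions used only to show the flow of blocks 202-212; they are not taken from the disclosure.

```python
# Hypothetical, simplified sketch of the iterative loop in process 200.
# All constants and the performance model are toy values for illustration.

TARGET_MHZ = 600.0
THRESHOLD = (580.0, 620.0)          # target performance level threshold range
SECTOR_HEIGHT = 10                  # toy clock-sector height (grid rows)
NON_PASSABLE_ROWS = {25}            # toy non-passable region

def expected_mhz(distance: int) -> float:
    """Toy performance model: longer register-to-register spans run slower."""
    return 900.0 - 20.0 * distance

def crosses_non_passable(src_row: int, dst_row: int) -> bool:
    return any(src_row < row < dst_row for row in NON_PASSABLE_ROWS)

def same_clock_sector(rows: list[int]) -> bool:
    return len({row // SECTOR_HEIGHT for row in rows}) == 1

def place_next_register_block(source_row: int, guess_distance: int = 20):
    distance = guess_distance
    while True:
        dst_row = source_row + distance
        mhz = expected_mhz(distance)                      # process block 204
        if mhz < THRESHOLD[0]:                            # too slow: move closer
            distance -= 1
            continue
        if mhz > THRESHOLD[1]:                            # too fast: move farther
            distance += 1
            continue
        block_rows = [dst_row, dst_row + 1, dst_row + 2]  # registers in the block
        if not same_clock_sector(block_rows):             # decision block 208
            distance += 1
            continue
        if crosses_non_passable(source_row, dst_row):     # decision block 210
            distance += 1
            continue
        return dst_row, distance                          # becomes the new source

print(place_next_register_block(source_row=0))  # (16, 16) with the toy model above
```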

At process block 202, computing system 18A may define destination register placement and routing information. Defining destination register placements may include determining potential locations of registers or groups of registers (e.g., blocks of registers) to be included in the NoC. For example, in the context of fig. 6, defining destination register placement may include determining registers that are proximate to rib 154, such as register block 156D. Determining routing information includes determining a route between an existing endpoint (e.g., rib 154 or a portion thereof (e.g., a portion of programmable logic 48 included in rib 154)) and a potential destination register. The routing information may for example comprise a number of wires to be included in the route between two registers. Further, the computing system 18A may define destination register placement and routing information based on data included in the memory device 32 of the cloud computing system 28. For example, the destination register placement and routing information may be determined based on predefined locations and fixed routes included in the memory device 32, which may be chip-specific and/or application-specific.

At process block 204, computing system 18A may determine expected performance based on the defined register placement and routing information. In other words, based on the location of the destination register and the conductors included in a particular path, computing system 18A may determine the performance that is expected to occur, such as the speed at which data may be sent along the route. Computing system 18A may make this determination using information about integrated circuit device 12A (e.g., information stored on a memory device of cloud computing system 28).

At decision block 206, computing system 18A may determine whether the expected performance is within a threshold range. For example, the threshold range may be the target performance level threshold range discussed above. When computing system 18A determines that the expected performance is not within the threshold range (e.g., the expected performance is below the range or above the range), computing system 18A may return to process block 202 and define new destination register placement and routing information. For example, the computing system 18A may adjust the location of the destination register, modify the route between the endpoint (e.g., the endpoint estimated at process block 104) and the destination register, or both. For example, if it is determined that the expected performance is too slow (e.g., slower than a minimum speed defined by the target performance level threshold range), computing system 18A may move the destination register closer to a previously set endpoint (e.g., a source point or source register). As another example, if the expected performance is too fast (e.g., faster than a maximum speed defined by the target performance level threshold range), then computing system 18A may move the destination register further away from the previously set endpoint.

If at decision block 206 computing system 18A determines that the expected performance is within the threshold range, then at decision block 208 computing system 18A may determine whether there are clock sector considerations to consider. To aid in illustration, FIG. 12 is provided. In particular, FIG. 12 is a block diagram of a portion 240 of a NoC (e.g., a NoC design) that includes register blocks 156M, 156N, 156O, 156P, each register block including registers (e.g., register 162I in register block 156M, register 162J in register block 156N, register 162K in register block 156O, register 162 in register block 156P) and located in respective clock sectors 242A, 242B, 242C, 242D. When registers are placed while performing process 200 (or process 100) (e.g., when destination registers are initially placed or when destination registers are moved (e.g., due to expected performance being outside of a threshold range)), registers included in the same set of registers (e.g., a register block) may be placed across multiple clock sectors. When registers of a register block are included in more than one clock sector, computing system 18A may determine that there are clock sector considerations that have not been considered. For example, clock skew may occur when registers of a register block are included in a plurality of clock sectors. Referring back to FIG. 11, if at decision block 208 computing system 18A determines that clock sector considerations are present (e.g., registers of a register block are included in more than one clock sector), computing system 18A may return to process block 202 and define new destination register placement and routing information. For example, the computing system 18A may adjust the location of the destination register, modify the route between the endpoint (e.g., the endpoint estimated at process block 104) and the destination register, or both.
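The clock-sector check at decision block 208 can be illustrated with the following sketch, which assumes a simple grid of fixed-size clock sectors; the sector dimensions and register positions are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of the clock-sector check at decision block 208, assuming a toy
# grid of fixed-size clock sectors.

SECTOR_COLS, SECTOR_ROWS = 10, 10   # toy clock-sector dimensions in grid units

def clock_sector(position: tuple[int, int]) -> tuple[int, int]:
    column, row = position
    return column // SECTOR_COLS, row // SECTOR_ROWS

def register_block_in_single_sector(register_positions: list[tuple[int, int]]) -> bool:
    """True when every register of the block falls in the same clock sector."""
    return len({clock_sector(pos) for pos in register_positions}) == 1

# Register block that straddles the sector boundary between rows 9 and 10:
print(register_block_in_single_sector([(3, 8), (3, 9), (3, 10)]))  # False
# Shifted by one row so the whole block sits in one clock sector:
print(register_block_in_single_sector([(3, 7), (3, 8), (3, 9)]))   # True
```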

If at decision block 208 computing system 18A determines that there are no clock sector considerations to consider (e.g., if the registers of each register block are located within the same clock sector, as shown in FIG. 12), at decision block 210 computing system 18A may determine whether there are any discontinuities in the route. For example, integrated circuit device 12A may include areas that cannot be programmed by a developer, such as hard IP blocks or portions of integrated circuit device 12A that include logic that cannot be modified. Computing system 18A may determine whether there is a discontinuity in the route by determining whether the route will pass through an area of integrated circuit device 12A that cannot be programmed by the developer. If, at decision block 210, computing system 18A determines that there are any discontinuities in the route, computing system 18A may return to process block 202 and define new destination register placement and routing information. For example, the computing system 18A may adjust the location of the destination register, modify a route between an endpoint (e.g., an endpoint estimated at process block 104) and the destination register, or both.
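The discontinuity check at decision block 210 can be illustrated with the following sketch, which tests whether a simple L-shaped route between two register locations would cross a rectangular non-passable region; the coordinates and the region are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of the discontinuity check at decision block 210: test whether a
# straight (L-shaped) route between two register blocks would cross a toy
# non-passable region such as a hard IP block.

NON_PASSABLE = {"col_min": 4, "col_max": 6, "row_min": 10, "row_max": 14}

def point_in_region(col: int, row: int, region: dict) -> bool:
    return (region["col_min"] <= col <= region["col_max"]
            and region["row_min"] <= row <= region["row_max"])

def route_has_discontinuity(src: tuple[int, int], dst: tuple[int, int]) -> bool:
    """Walk a simple route (vertical leg, then horizontal leg) and test each step."""
    src_col, src_row = src
    dst_col, dst_row = dst
    rows = range(min(src_row, dst_row), max(src_row, dst_row) + 1)
    cols = range(min(src_col, dst_col), max(src_col, dst_col) + 1)
    vertical_leg = any(point_in_region(src_col, r, NON_PASSABLE) for r in rows)
    horizontal_leg = any(point_in_region(c, dst_row, NON_PASSABLE) for c in cols)
    return vertical_leg or horizontal_leg

print(route_has_discontinuity((5, 2), (5, 20)))  # True: the path crosses the region
print(route_has_discontinuity((2, 2), (2, 20)))  # False: column 2 bypasses the region
```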

To aid in the explanation, FIG. 13 is provided. In particular, FIG. 13 is a diagram illustrating register blocks 156Q, 156R, 156S, 156T that are routed around a non-passable area 262 of integrated circuit device 12A. The non-passable area 262 may be a portion of the integrated circuit device 12A that cannot be modified (e.g., does not include the programmable logic 48). During initial placement of registers, register block 156T may have been placed in the position shown in FIG. 13, and a direct route from register block 156Q to register block 156T may have been generated. Such a route may have passed through the non-passable area 262. During an iteration of process 200, computing system 18A may have determined that the route would pass through the non-passable area 262 and performed an action to address the route passing through the non-passable area 262. For example, computing system 18A may add register blocks 156R, 156S, and a route from register block 156Q to register block 156R to register block 156S to register block 156T.

It should also be noted that the placement of a destination register or register block (and associated route) may be made based on other registers or register blocks (and associated routes) that bypass the non-passable area 262. For example, FIG. 14 shows a diagram of a portion 280 of integrated circuit device 12A, the portion 280 including portion 260 and register blocks 156U, 156V, 156W, 156X routed based on register blocks 156Q, 156R, 156S, 156T. For example, because the already placed register blocks 156Q, 156R, 156S, 156T are routed around the non-passable area 262, the register blocks 156U, 156V, 156W, 156X and the routes (e.g., wires) associated with the register blocks 156U, 156V, 156W, 156X may be routed in a similar manner.

Returning to FIG. 11 and the discussion of process 200, if at decision block 210 computing system 18A determines that there is no discontinuity in the route, then at process block 212 computing system 18A may set the destination register as the source register and save the routing information. As will be discussed below with respect to process 100, computing system 18A may add more registers (e.g., destination registers) based on the source registers. Computing system 18A may also determine a route between the source register and the destination register.

It should be noted that although process 200 is described as being performed by computing system 18A, in some embodiments, process 200 may be performed by computing system 18A and cloud computing system 28 in combination. For example, computing system 18A may provide information about integrated circuit device 12A, such as a model (e.g., a particular chip) of integrated circuit device 12A, and a target performance level and/or a target performance level threshold range. Processing circuitry 30 of cloud computing system 28 may perform operations of process 200 to generate a NoC design for integrated circuit device 12A based on historical information, predefined locations (e.g., of registers), and/or associated fixed routes stored on memory device 32. For example, the memory device may include data or statistics regarding previous programs implemented on integrated circuit device 12A or integrated circuit device 12B that is the same integrated circuit type as integrated circuit device 12A. Cloud computing system 28 may provide NoC designs to processor 16, and as described above, NoC designs may be implemented on integrated circuit device 12A. Further, process 200 may be performed by computing system 18B alone or in combination with cloud computing system 28.

Additionally, it should be noted that in other embodiments, rather than returning to process block 202 and then to the most recently executed decision block based on the determinations at decision blocks 206, 208, 210, computing system 18A may adjust the placement, the routing, or both of the destination registers. For example, if at decision block 208 computing system 18A determines that there are clock sector considerations that have not been considered, computing system 18A may modify the location of the destination register and then return to decision block 208.

Further, while performing process 200 or after performing process 200, computing system 18A may provide data associated with performing process 200 to cloud computing system 28. For example, data regarding registers (e.g., register blocks) and routes associated with the registers, including registers and routes that are not ultimately implemented in the NoC, may be provided to the cloud computing system 28 and added to the memory device 32. For example, data regarding the registers and the routes associated with the registers may be stored as chip-specific predefined locations and fixed routes, respectively, which may be used to generate NoCs during other iterations of process 200 or process 100.

With the discussion of process 200 in mind, returning to FIG. 5, at decision block 108 computing system 18A may determine whether each endpoint and route has been determined. In other words, computing system 18A may determine whether a complete NoC has been designed that will operate according to the designer's desired performance. For example, computing system 18A may determine whether more registers or routes should be added to the NoC design (e.g., NoCs 150A-150D). If computing system 18A determines that each route and endpoint has not been determined, the computing system may return to process block 106 and determine additional routing information, placement of additional endpoints, or both. That is, multiple iterations of process 200, or portions thereof, may be performed to develop a NoC, such as one of NoCs 150A-150D. For example, while determining register placement and routing information in the NoC, the computing system 18A may adjust the location of the registers and/or modify which conductors to utilize until the expected performance associated with the destination register (e.g., register block) is within the target performance level threshold range, any clock sector considerations have been taken into account, and no route discontinuities remain (e.g., no discontinuities are present or the discontinuities have been addressed). Registers or blocks of registers may be described as being added in "tiers," in which destination registers are placed based on the locations of other registers that have already been set (e.g., source registers). After confirming the placement of the destination register, the destination register may be saved as a source register, and the associated route information may also be saved. More tiers of registers may be added until the NoC is completed. In other words, at decision block 108, computing system 18A may determine whether the NoC design is complete.

If, at decision block 108, computing system 18A determines that each endpoint (e.g., each register or register block) and associated route has been determined, then at process block 110, computing system 18A may receive the circuit design. For example, the circuit design may be a high-level programming language description of a hardware implementation of integrated circuit device 12A written by a designer using design software 14A. At process block 112, computing system 18A may generate a program or bit stream, such as program (bit stream) 24. For example, as described above, compiler 22A may generate program 24, which may be a low-level circuit design that describes a hardware implementation to be stored on integrated circuit device 12A.

A NoC, such as one of NoCs 150A-150D, may be described in program 24. It should be noted, however, that the NoC, including the registers utilized as well as the routing information (e.g., the particular wires utilized between registers), may be defined prior to compilation. In other words, the design for a NoC may be created independently of, and before, compilation. When generating program 24, computing system 18A may program programmable elements 50 of programmable logic 48 of integrated circuit device 12A based on a NoC design (e.g., a NoC to be implemented on integrated circuit device 12A). For example, computing system 18A and/or compiler 22 may determine which portions of programmable logic 48 will not be used by the NoC. When provided to integrated circuit device 12A, program 24 may cause only the portions of programmable logic 48 that will not be used for the NoC to be programmed to perform the operations described by the high-level programming language description provided to compiler 22.
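To illustrate the separation between the pre-defined NoC and the compiled user design, the short sketch below tracks which logic regions remain available once the NoC has reserved its own; the region grid and the `available_regions` helper are assumptions used for illustration, not the compiler's actual bookkeeping.

```python
def available_regions(all_regions, noc_regions):
    """Return the programmable-logic regions left for the user design after
    the pre-defined NoC has reserved its registers and data paths."""
    reserved = set(noc_regions)
    return [region for region in all_regions if region not in reserved]

# Example: a 4x4 grid of logic regions with one column reserved for NoC wiring.
fabric = [(x, y) for x in range(4) for y in range(4)]
noc_fabric = [(2, y) for y in range(4)]
user_fabric = available_regions(fabric, noc_fabric)   # 12 regions remain for the compiled design
```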

At process block 114, computing system 18A may cause integrated circuit device 12A to be programmed. For example, computing system 18A may cause the hardware implementation described by program 24 to be implemented on integrated circuit device 12A by sending program 24 to integrated circuit device 12A. As described above, the program 24 may include instructions for a NoC. Thus, at process block 114, the NoC may be implemented onto programmable logic 48 of integrated circuit device 12A.

It should be noted that although process 100 is described as being performed by computing system 18A, in some embodiments, process 100 may be performed by computing system 18A and cloud computing system 28 in combination. For example, computing system 18A may provide information about integrated circuit device 12A, such as a model (e.g., a particular chip) of integrated circuit device 12A, as well as a target performance level and/or a target performance level threshold range. Processing circuitry 30 of cloud computing system 28 may use historical information, predefined locations (e.g., locations of registers), and/or fixed routes stored on memory device 32 to generate a NoC design for integrated circuit device 12A. For example, the memory device 32 may include data or statistics regarding previous programs implemented on integrated circuit device 12A or on an integrated circuit device 12B of the same integrated circuit type as integrated circuit device 12A. Cloud computing system 28 may provide the NoC design to processor 16, and, as described above, the NoC design may be implemented on integrated circuit device 12A.

By performing process 100 and process 200, computing system 18A may design a device-specific and application-specific NoC and implement the NoC on integrated circuit device 12A. For example, the techniques discussed above enable high-speed data transfer via a wide bus that spans a relatively large distance on integrated circuit device 12A. Furthermore, because the NoC may be designed prior to compilation and the locations of the registers and data paths may be set in advance, the hardware implementation for performing the functions described by program 24 (e.g., machine learning, encryption, etc.) may take the NoC into account and utilize portions of programmable logic 48 based on the NoC. Furthermore, because the NoC may be determined independently of compilation, less time is required for compilation than would be required if the placement of the logic elements used to form the NoC were determined during compilation.

The above discussion provides several examples of NoCs that are generally regular in nature. For example, NoCs 150A-150D include a repeated pattern of registers and routes because such a regularized structure enables large amounts of data to be transmitted quickly across integrated circuit device 12A. However, in some cases, only a relatively small amount of data (e.g., a few bits) may need to be routed. In such a case, utilizing NoC 150A may not be as desirable as using a NoC that provides a direct path between sectors or portions of programmable logic 48, such as a particular Accelerator Function Unit (AFU) used to perform a particular operation (e.g., as part of a larger function performed by integrated circuit device 12A). With this in mind, FIG. 15 is a diagram of a NoC 150E that includes direct paths between various portions of programmable logic 48 of integrated circuit device 12A. For example, FIG. 15 shows endpoints 270 (e.g., endpoints 270A, 270B, 270C), which may include registers or register blocks, and routes 272 (e.g., routes 272A, 272B) between the endpoints 270.

The computing system 18A may generate the NoC 150E by performing the process 100 described above. For example, referring briefly to FIG. 5, computing system 18A may receive an indication that a relatively small amount of data is to be transferred, a target performance level (e.g., speed) associated with the data, and information about integrated circuit device 12A, such as the hardware included in integrated circuit device 12A (process block 102). Based on the desired speed of integrated circuit device 12A and of the NoC (e.g., NoC 150E), initial endpoints, such as the endpoints depicted in FIG. 15, may be determined. Computing system 18A may also determine the routes (e.g., data paths) between the endpoints and the placement of the endpoints (process block 106), for example, in the manner described above with respect to process 200. However, in some embodiments, other techniques may be used.

For example, FIG. 16 is a flow diagram of a process 300 for determining the routes between endpoints and the placement of the endpoints. Although the following process is described as being performed by computing system 18A, process 300 may be performed by computing system 18A, computing system 18B, cloud computing system 28, or cloud computing system 28 in combination with one or more of computing systems 18A, 18B. As described below, process 300 generally includes: analyzing the expected performance of each route (process block 302), storing the locations of the passing endpoints and passing routes (process block 304), adjusting the locations of the endpoints associated with routes having expected performance outside of a threshold range (process block 306), generating routes for the adjusted endpoints (process block 308), analyzing the expected performance of the adjusted routes together with the passed endpoints and the routes associated with the passed endpoints (process block 310), determining a first score based on that expected performance (process block 312), analyzing the expected performance of the adjusted routes together with the passed endpoints and new routes between the passed endpoints (process block 314), determining a second score based on that expected performance (process block 316), determining whether the first score is greater than the second score (decision block 318), and, when it is determined that the first score is greater than the second score, using the passed endpoints, the passed routes, the passed adjusted endpoints, and the passed routes for the adjusted endpoints (process block 320). When it is determined that the first score is not greater than the second score, process 300 may include using the passed endpoints, the passed new routes, the passed adjusted endpoints, and the passed routes for the adjusted endpoints (process block 322). Process 300 may also include determining whether all endpoints and routes pass (decision block 324). When there are failing endpoints and/or routes (e.g., expected performance outside of the threshold range), process 300 may include returning to adjust the locations of the endpoints associated with the routes having expected performance outside of the threshold range (process block 306). Process 300 may end once each endpoint and route passes (process block 326).

At process block 302, computing system 18A may analyze the expected performance of each route. For example, using the endpoints determined at process block 104 of process 100, computing system 18A may determine whether the expected performance (e.g., speed) of each route is within a threshold range (e.g., a target performance level threshold range).

At process block 304, computing system 18A may store the locations of the passing endpoints and the passing routes, that is, routes having expected performance within the target performance level threshold range and the endpoints associated with such routes. At process block 306, computing system 18A may adjust the location of an endpoint associated with a route having expected performance outside of the target performance level threshold range. For example, if a route (e.g., a data path) has an expected performance that is too slow (e.g., below the minimum of the target performance level threshold range), computing system 18A may move one or more endpoints associated with the route (e.g., closer to another endpoint). As another example, if the expected performance is too fast (e.g., above the maximum of the target performance level threshold range), computing system 18A may move one or more endpoints associated with the route (e.g., farther away from another endpoint).

At process block 308, computing system 18A may generate the routes for the endpoints adjusted at process block 306. At process block 310, computing system 18A may analyze the expected performance of the adjusted routes (e.g., the routes generated for the adjusted endpoints) while also using the passed endpoints and the passed routes. In other words, the performance of the routes in a potential NoC design that includes the passed endpoints, the passed routes, the adjusted endpoints, and the adjusted routes may be determined. At process block 312, computing system 18A may determine a first score indicative of the analysis of the adjusted routes and the passed routes. For example, the first score may be a numerical value that indicates the percentage of routes (e.g., adjusted routes and passed routes) having expected performance within the target performance level threshold range. As another example, the first score may weight certain routes more heavily than others. For example, the passed routes may be weighted less heavily than the adjusted routes.
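The following sketch shows one way the first score could be computed under the weighted-percentage interpretation given above; the route representation, the weights, and the example values are assumptions for illustration.

```python
def score(routes, lo, hi, weight_adjusted=2.0, weight_passed=1.0):
    """Weighted percentage of routes whose expected speed lies in [lo, hi].

    Each route is a (speed_mhz, was_adjusted) pair.  Adjusted routes are
    weighted more heavily than previously passed routes, which is one possible
    weighting; the scoring used in practice could differ.
    """
    total = passing = 0.0
    for speed_mhz, was_adjusted in routes:
        weight = weight_adjusted if was_adjusted else weight_passed
        total += weight
        if lo <= speed_mhz <= hi:
            passing += weight
    return 100.0 * passing / total if total else 0.0

# First score: adjusted routes evaluated alongside the previously passed routes.
first_score = score([(120.0, True), (85.0, True), (140.0, False)], lo=90.0, hi=150.0)  # 60.0
```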

At process block 314, computing system 18A may analyze the performance of the adjusted routes while also using the passed endpoints and new routes between the passed endpoints. That is, the performance of the routes in a potential NoC design that includes the passed endpoints, new routes between the passed endpoints, the adjusted endpoints, and the adjusted routes may be determined. At process block 316, computing system 18A may determine a second score indicative of the analysis of the routes analyzed at process block 314. That is, the second score may be determined in a manner similar to the first score.

At decision block 318, computing system 18A may determine whether the first score is greater than the second score. In other words, at decision block 318, computing system 18A may determine whether the analysis associated with process block 310 or the analysis associated with process block 314 provided better results. If computing system 18A determines that the first score is greater than the second score, then at process block 320, computing system 18A may use the passed endpoints, the passed routes, the passed routes for the adjusted endpoints (e.g., the routes between the adjusted endpoints having expected performance within the target performance level threshold range), and the passed adjusted endpoints (e.g., the adjusted endpoints associated with the passed adjusted routes).

However, if computing system 18A determines that the first score is not greater than the second score, then at process block 322, computing system 18A may use the passed endpoints, the passed new routes (e.g., the new routes having expected performance within the target performance level threshold range), the passed routes for the adjusted endpoints (e.g., the routes between the adjusted endpoints having expected performance within the target performance level threshold range), and the passed adjusted endpoints (e.g., the adjusted endpoints associated with the passed adjusted routes).
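A compact sketch of the selection at decision block 318 is shown below; the dictionary layout of the two candidate NoC variants is an assumed representation for illustration only.

```python
def choose_candidate(first_candidate, second_candidate):
    """Keep the candidate whose score is higher.

    `first_candidate` retains the previously passed routes between passed
    endpoints; `second_candidate` replaces them with newly generated routes.
    A tie favors the second candidate, matching the 'not greater than' branch."""
    if first_candidate["score"] > second_candidate["score"]:
        return first_candidate
    return second_candidate

first_candidate = {"score": 87.5, "routes": ["passed routes", "routes for adjusted endpoints"]}
second_candidate = {"score": 91.0, "routes": ["new routes", "routes for adjusted endpoints"]}
selected = choose_candidate(first_candidate, second_candidate)   # -> second_candidate here
```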

Continuing from process block 320 or process block 322 (based on decision block 318), at decision block 324 computing system 18A may determine whether the endpoints and routes being used are associated with expected performance within the target performance level threshold range. In other words, computing system 18A may determine whether there are any routes having an expected performance level that is not within the target performance level threshold range. If computing system 18A determines that each route has an expected performance level that falls within the target performance level threshold range, process 300 may end, as indicated by process block 326.

However, if computing system 18A determines that there is a route having an expected performance that is not within the target performance level threshold range, computing system 18A may return to process block 306 and adjust the location of the endpoint associated with the route having the expected performance that is not within the target performance level threshold range. In other words, computing system 18A may traverse portions of process 300 several times until each route has an expected performance within the target performance level threshold range.
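Putting the pieces together, the outer iteration of process 300 can be summarized in the self-contained sketch below. The coordinate-based endpoints, the distance-based timing model, and the single-step adjustments are illustrative assumptions; the real tool would use chip-specific placement and timing data.

```python
def expected_speed(src, dst):
    """Toy timing model (an assumption): speed falls with Manhattan distance."""
    distance = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return 1000.0 / max(distance, 1)

def step_toward(point, target):
    """Unit Manhattan step moving 'point' one cell toward 'target'."""
    dx = (target[0] > point[0]) - (target[0] < point[0])
    dy = (target[1] > point[1]) - (target[1] < point[1])
    return dx, dy

def refine_until_passing(endpoints, routes, lo, hi, max_iters=200):
    """Keep adjusting the endpoints of failing routes until every route's
    expected speed lands within [lo, hi], or the iteration budget runs out."""
    for _ in range(max_iters):
        failing = [(a, b) for a, b in routes
                   if not lo <= expected_speed(endpoints[a], endpoints[b]) <= hi]
        if not failing:
            break                                   # all endpoints and routes pass
        for a, b in failing:
            src, dst = endpoints[a], endpoints[b]
            dx, dy = step_toward(dst, src)
            if expected_speed(src, dst) < lo:       # too slow: move the destination closer
                endpoints[b] = (dst[0] + dx, dst[1] + dy)
            else:                                   # too fast: move the destination farther away
                endpoints[b] = (dst[0] - dx, dst[1] - dy)
    return endpoints

endpoints = {"270A": (0, 0), "270B": (30, 0), "270C": (0, 25)}
routes = [("270A", "270B"), ("270A", "270C")]
endpoints = refine_until_passing(endpoints, routes, lo=90.0, hi=150.0)
```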

By utilizing process 300, computing system 18A may develop a NoC, such as NoC 150E, having more direct routes (e.g., data paths) than the routes of the regularized NoCs described above. It should be noted, however, that NoCs such as NoC 150E may be used in combination with other NoCs (e.g., NoCs 150A-150D). In other words, multiple NoCs may be utilized, such as one regularized NoC (which enables a large amount of data to be sent quickly) and another NoC for sending a relatively small amount of data between particular portions of integrated circuit device 12A (e.g., between two AFUs, registers, or register blocks).

In some embodiments, process 300 may include additional operations. For example, process 300 may also include operations similar to decision blocks 208, 210 of process 200. That is, while performing process 300, computing system 18A may take clock sectors into account and consider whether any routes would pass through an inaccessible area of integrated circuit device 12A. As another example, process 300 may further include unlocking the routes for all endpoints (e.g., at a particular point during execution of process 300 or at random), which may enable new routes to be generated to determine whether other routes may be better than the routes currently being used. As yet another example, process 300 may include determining endpoints related to distribution channels or other local routes. However, before discussing FIG. 17 in greater detail, it should be noted that in some embodiments, process 300 may be performed as an alternative to process 200, even for generating a regularized NoC such as NoCs 150A-150D. It should also be noted that after performing process 300, computing system 18A may continue to decision block 108 of process 100 and proceed through the remaining operations of process 100, as discussed above with respect to FIG. 5.

Continuing with the figures, FIG. 17 is a diagram of a NoC 150F, which is an embodiment of NoC 150E that includes additional data paths. As described above, process 300 may include determining endpoints associated with a distribution channel or other local route. For example, while performing process 300, computing system 18A may define additional endpoints and/or routes based on the stored endpoints and generate additional routes. For example, as shown in FIG. 17, route 272A may be modified (as compared to FIG. 15) to provide wider data distribution (e.g., by adding more wires). That is, a wider data bus may be utilized between two endpoints (e.g., endpoints 270A, 270B). In this example, the bandwidth of route 272A may have been increased to enable route 272A to better distribute data (e.g., to other endpoints). However, routes may also be added for purposes such as local data distribution (e.g., within a sector or portion of programmable logic 48), as illustrated by route 272C.
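To picture the widening of route 272A, the sketch below models a route as the set of conductors it uses and adds wires to increase the bus width between its endpoints; the `Route` record and wire names are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Route:
    """Illustrative route record: the endpoints it connects and the wires it uses."""
    src: str
    dst: str
    wires: list = field(default_factory=list)

    @property
    def width_bits(self) -> int:
        return len(self.wires)

def widen(route: Route, extra_wires: list) -> Route:
    """Add conductors to an existing route, increasing the usable bus width
    between its endpoints (e.g., widening route 272A between 270A and 270B)."""
    route.wires.extend(extra_wires)
    return route

route_272a = Route("270A", "270B", wires=[f"h_wire_{i}" for i in range(32)])
widen(route_272a, [f"h_wire_{i}" for i in range(32, 64)])   # a 32-bit path widened to 64 bits
```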

Thus, the NoCs discussed herein, and the techniques for generating and implementing them, enable high-speed data transfer via a wide bus that spans a large distance across integrated circuit device 12A. In addition, a NoC having data paths (e.g., routes) that enable a desired amount of data to be transmitted at a target performance level (e.g., speed) or within a target performance level threshold range may eliminate chokepoints or bottlenecks that other NoCs may encounter. Furthermore, because the NoC may be designed, and the locations of the registers and data paths set, prior to compilation, the hardware implementation for performing the functions described by program 24 (e.g., machine learning, encryption, etc.) may take the NoC into account and utilize portions of programmable logic 48 based on the NoC. Furthermore, because the NoC may be determined independently of compilation, less time is required for compilation than would be required if the placement of the logic elements used to form the NoC were determined during compilation.

With the above in mind, the integrated circuit device 12 (e.g., integrated circuit device 12A) may be part of, or a component of, a data processing system that may benefit from the use of the techniques discussed herein. For example, integrated circuit device 12 may be a component of data processing system 400 shown in FIG. 18. Data processing system 400 includes a host processor 402, memory and/or storage circuitry 404, and a network interface 406. Data processing system 400 may include more or fewer components (e.g., electronic displays, user interface structures, Application Specific Integrated Circuits (ASICs)).

Host processor 402 may include any suitable processor, such as an Intel® Xeon® processor or a reduced-instruction processor (e.g., a Reduced Instruction Set Computer (RISC) or Advanced RISC Machine (ARM) processor), that may manage data processing requests for the data processing system 400 (e.g., to perform machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern recognition, spatial navigation, etc.). The memory and/or storage circuitry 404 may include Random Access Memory (RAM), Read Only Memory (ROM), one or more hard disk drives, flash memory, and the like. Memory and/or storage circuitry 404 may be considered external memory for integrated circuit device 12 and may hold data to be processed by data processing system 400; in some embodiments, it may be internal to integrated circuit device 12. Memory and/or storage circuitry 404 may also store a configuration program (e.g., a bitstream) for programming the programmable fabric of integrated circuit device 12. Network interface 406 may allow data processing system 400 to communicate with other electronic devices. Data processing system 400 may include several different packages or may be contained within a single package on a single package substrate.

In one example, data processing system 400 may be part of a data center that processes a variety of different requests. For example, data processing system 400 may receive a data processing request via the network interface 406 to perform machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern recognition, spatial navigation, or some other specialized task. Host processor 402 may cause the programmable logic fabric of integrated circuit device 12 to be programmed with a particular accelerator related to the requested task. For example, host processor 402 may cause configuration data (a bitstream) stored on memory and/or storage circuitry 404, or cached in sector-aligned memory of integrated circuit device 12, to be programmed into the programmable logic fabric of integrated circuit device 12. The configuration data (bitstream) may represent a circuit design for the particular accelerator function associated with the requested task.

The processes and apparatus of the present disclosure may be incorporated into any suitable circuitry. For example, the processes and devices may be incorporated into many types of devices, such as microprocessors or other integrated circuits. Exemplary integrated circuits include Programmable Array Logic (PAL), Programmable Logic Array (PLA), Field Programmable Logic Array (FPLA), Electrically Programmable Logic Device (EPLD), Electrically Erasable Programmable Logic Device (EEPLD), Logic Cell Array (LCA), Field Programmable Gate Array (FPGA), Application Specific Standard Product (ASSP), Application Specific Integrated Circuit (ASIC), and microprocessors, to name a few.

Further, while the process operations have been described in a particular order, it should be understood that other operations may be performed between the described operations, the described operations may be adjusted so that they occur at slightly different times, or the described operations may be distributed in a system that allows the processing operations to occur at various intervals associated with the processing, so long as the processing of the overlay operations is performed in the desired manner.

The embodiments set forth in this disclosure are susceptible to various modifications and alternative forms, and specific embodiments have been shown by way of example in the drawings and described in detail herein. However, it should be understood that the disclosure is not intended to be limited to the particular forms disclosed. Rather, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the appended claims. Furthermore, the techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible, or purely theoretical. Furthermore, if any claims appended to the end of this specification contain one or more elements designated as "means for [performing] [a function]…" or "step for [performing] [a function]…," it is intended that such elements be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements not be interpreted under 35 U.S.C. 112(f).
