Programmable integrated circuit underlay

Document No.: 1938070  Published: 2021-12-07

Summary: This technology, Programmable integrated circuit underlay, was created by G. W. Baeckler and M. Langhammer on 2020-12-23. Abstract: A method for implementing a programmable device is provided. The method may include extracting an underlay from an existing routing network on the programmable device and then mapping the user design to the extracted underlay. The underlay may represent a subset of the fast routing wires that satisfy predetermined constraints. The underlay may consist of multiple repeated adjacent logic blocks, each implementing a datapath reduction operation. Implementing a circuit design in this manner can significantly improve circuit performance while reducing compile time by more than half.

1. A method of implementing a logic circuit on a programmable device using a design tool, comprising:

extracting an underlay from a routing network on the programmable device, wherein the extracted underlay comprises a subset of routing wires in the routing network that satisfy a target routing constraint; and

mapping the logic circuit to the extracted underlay.

2. The method of claim 1, wherein extracting the underlay comprises:

accessing a database to obtain information about the routing network.

3. The method of claim 2, wherein extracting the underlay further comprises:

receiving a target routing constraint, wherein the target routing constraint comprises a constraint selected from the group consisting of: source coordinates, timing requirements, speed requirements, routing resource type, routing direction, and crosstalk attributes.

4. The method of claim 1, further comprising:

determining whether the logic circuit is fully mapped to the extracted underlay.

5. The method of claim 4, further comprising:

performing additional placement and routing operations on unmapped portions of the logic circuit in response to determining that the logic circuit cannot be fully mapped to the extracted underlay.

6. The method of claim 1, further comprising:

using the extracted underlay in at least one other region of the programmable device.

7. The method of claim 1, wherein the extracted underlay comprises a plurality of adjacent programmable logic blocks.

8. The method of any of claims 1-7, wherein the extracted underlay comprises a plurality of 2:1 datapath reduction operators.

9. The method of claim 8, wherein the plurality of 2:1 datapath reduction operators comprises a plurality of 2:1 multiplexers.

10. The method of claim 8, wherein the plurality of 2:1 datapath reduction operators comprises a plurality of adders.

11. The method of claim 8, wherein the plurality of 2:1 datapath reduction operators comprises a plurality of logic gates.

12. The method of claim 8, wherein the plurality of 2:1 datapath reduction operators have different ingress and egress modes.

13. An integrated circuit, comprising:

a programmable routing network; and

logic circuitry implemented using an underlay extracted from the programmable routing network, wherein the underlay includes a routing pattern within the programmable routing network that satisfies a target routing constraint.

14. The integrated circuit of claim 13, wherein the underlay comprises a plurality of programmable logic blocks.

15. The integrated circuit of claim 13, wherein the underlay comprises a plurality of adjacent programmable logic blocks.

16. The integrated circuit of claim 14, wherein at least one of the plurality of programmable logic blocks in the underlay is used to implement a 2:1 datapath reduction operator.

17. The integrated circuit of claim 16, wherein the 2:1 datapath reduction operator comprises a 2:1 multiplexer.

18. The integrated circuit of claim 16, wherein the 2:1 datapath reduction operator comprises an adder.

19. The integrated circuit of claim 16, wherein the 2:1 datapath reduction operator comprises a logic gate.

20. The integrated circuit of any of claims 13-19, wherein the target routing constraint comprises a timing constraint.

21. A design tool for implementing logic circuitry on a programmable device, comprising:

means for extracting an underlay from a routing network on a programmable device, wherein the extracted underlay comprises a subset of routing wires in the routing network that satisfy a target routing constraint; and

means for mapping the logic circuit to the extracted underlay.

22. The design tool of claim 21, further comprising:

means for accessing a database to obtain information about the routing network.

23. The design tool of claim 22, further comprising:

means for receiving a target routing constraint, wherein the target routing constraint comprises a constraint selected from the group consisting of: source coordinates, timing requirements, speed requirements, routing resource type, routing direction, and crosstalk attributes.

24. The design tool of any one of claims 21-23, further comprising:

means for determining whether the logic circuit is fully mapped to the extracted underlay.

25. The design tool of claim 24, further comprising:

means for performing additional placement and routing operations on unmapped portions of the logic circuitry in response to determining that the logic circuitry cannot be fully mapped to the extracted underlay.
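The 2:1 datapath reduction operators recited in claims 8-12 and 16-19 each consume two inputs and produce one output, so repeating them reduces N inputs to a single result using N - 1 operators. The Python sketch below illustrates that counting under an assumed balanced-tree arrangement; `reduce_2to1` is a hypothetical helper, and the actual underlays (e.g., those of FIGS. 7 and 9A-9C) may chain the operators differently. A 2:1 multiplexer tree would additionally need one select bit per level, which this sketch omits.

```python
import operator
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def reduce_2to1(inputs: List[T], op: Callable[[T, T], T]) -> Tuple[T, int, int]:
    """Reduce inputs to one value using only 2:1 operators in a balanced tree.

    Returns (result, number_of_2to1_operators, number_of_tree_levels).
    Hypothetical illustration; not the patented circuit.
    """
    if not inputs:
        raise ValueError("need at least one input")
    level = list(inputs)
    operators = 0
    levels = 0
    while len(level) > 1:
        nxt = []
        # Each neighbouring pair consumes one 2:1 operator.
        for i in range(0, len(level) - 1, 2):
            nxt.append(op(level[i], level[i + 1]))
            operators += 1
        # An odd leftover element passes to the next level unchanged.
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], operators, levels


# Adders as the 2:1 operator: 8 inputs -> 7 adders in 3 levels.
print(reduce_2to1(list(range(1, 9)), operator.add))  # (36, 7, 3)
# XOR gates as the 2:1 operator: parity of 4 bits.
print(reduce_2to1([1, 0, 1, 1], operator.xor))  # (1, 3, 2)
```

Whatever the arrangement, a reduction built purely from 2:1 operators always uses N - 1 of them; only the depth (and hence the critical-path length) depends on how they are connected.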

Background

This relates generally to integrated circuits, and more particularly to ways to improve the design and speed of programmable integrated circuits.

Programmable integrated circuits, such as Programmable Logic Devices (PLDs), include configurable logic circuits having look-up tables (LUTs) and adder-based logic designed to allow a user to customize the circuit according to the user's particular needs. PLDs also include arithmetic logic, such as adders, multipliers, and dot product circuits.

Programmable integrated circuits typically have a maximum rated operating speed. For example, a programmable logic device may be provided with abundant pipelining resources that allow the device to operate at up to 1 GHz. In practice, however, typical customer designs run at only 300-400 MHz, so the device is substantially underutilized from a performance standpoint.

It is within this context that the embodiments described herein arise.

Drawings

FIG. 1 is a diagram of an illustrative programmable integrated circuit, according to an embodiment.

FIG. 2 is a diagram of programmable logic blocks coupled together using interconnect circuitry according to an embodiment.

FIG. 3 is a diagram of an illustrative circuit design system that may be used to design an integrated circuit, according to an embodiment.

FIG. 4 is a diagram of an illustrative computer-aided design (CAD) tool that may be used in a circuit design system, according to an embodiment.

FIG. 5 is a flow diagram of illustrative steps for designing an integrated circuit, according to an embodiment.

FIG. 6 is a flow diagram of illustrative steps for identifying an underlay and mapping a circuit design to the underlay, according to an embodiment.

FIG. 7 is a diagram of an illustrative routing underlay composed of 2:1 multiplexers, according to an embodiment.

FIG. 8 is a diagram of an actual routing pattern that uses the underlay of FIG. 7, according to an embodiment.

FIGS. 9A-9C are diagrams of illustrative routing underlays composed of adders, according to some embodiments.

FIG. 10 is a diagram of an illustrative routing underlay composed of functional blocks, according to an embodiment.

FIGS. 11A-11D are diagrams of illustrative 2:1 operators with different ingress/egress modes, according to some embodiments.

FIG. 12 is a diagram of an illustrative routing underlay formed using multiple 2:1 operators with different ingress/egress modes, according to an embodiment.

Detailed Description

The present embodiments relate to a method for extracting, or parsing, a fast routing pattern from a programmable integrated circuit interconnect architecture and mapping a user application to the extracted fast routing pattern. The extracted routing pattern (sometimes referred to as the routing "underlay") may differ depending on the target logic utilization and speed. The routing pattern may be repeated across the programmable integrated circuit.

Designing custom logic on top of the underlay in this manner can significantly increase the speed of user applications while reducing compile time by 50% or more. For example, in a scenario where the programmable logic device has a maximum operating speed of 1 GHz, user applications designed in this manner may run at up to 800-900 MHz, more than twice as fast as existing designs. It will be understood by those skilled in the art that the present exemplary embodiments may be practiced without some or all of these specific details. In other instances, well-known operations have not been described in detail in order not to unnecessarily obscure the present embodiments.

With the foregoing in mind, FIG. 1 is a diagram of a programmable integrated circuit 10. As shown in FIG. 1, programmable logic device 10 may include a two-dimensional array of functional blocks, including Logic Array Blocks (LABs) 11 and other functional blocks, such as Random Access Memory (RAM) blocks 13 and special-purpose processing blocks such as Digital Signal Processing (DSP) blocks 12, which are partially or completely hardwired to perform one or more particular tasks, such as mathematical/arithmetic operations.

Functional blocks such as LABs 11 may include smaller programmable regions (e.g., logic elements, configurable logic blocks, or adaptive logic modules) that receive input signals and perform customized functions on the input signals to produce output signals. Device 10 may also include programmable routing structures for interconnecting LABs 11 with RAM blocks 13 and DSP blocks 12. The combination of programmable logic and routing structures is sometimes referred to as "soft" logic, while the DSP blocks are sometimes referred to as "hard" logic. The type of hard logic on device 10 is not limited to DSP blocks and may include other types of hard logic. Adders/subtractors, multipliers, dot product computation circuits, and other arithmetic circuits, which may or may not form part of DSP blocks 12, may sometimes be collectively referred to as "arithmetic logic."

Programmable logic device 10 may contain programmable memory elements for configuring the soft logic. The memory elements may be loaded with configuration data (also referred to as programming data) using input/output elements (IOEs) 16. Once loaded, the memory elements provide corresponding static control signals that control the operation of one or more LABs 11, the programmable routing structures, and optionally DSP blocks 12 or RAM blocks 13. In a typical scenario, the outputs of the loaded memory elements are applied to the gates of metal-oxide-semiconductor transistors (e.g., pass transistors) to turn certain transistors on or off and thereby configure the logic in the functional blocks, including the routing paths. Programmable logic circuit elements that can be controlled in this manner include portions of multiplexers (e.g., multiplexers used to form routing paths in the interconnect circuitry), look-up tables, logic arrays, AND, OR, NAND, and NOR logic gates, pass gates, and the like. Logic gates and multiplexers that are part of the soft logic, configurable state machines, or any general-purpose logic component that does not have a single dedicated purpose on device 10 may be collectively referred to as "random logic."
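To make the role of the configuration memory concrete, the Python sketch below models a routing multiplexer built from pass transistors, where each loaded configuration bit statically turns one pass transistor on or off. The `RoutingMux` class and its one-hot encoding are illustrative assumptions; real devices may encode routing multiplexer selects differently (for example, in multiple stages).

```python
from typing import List, Optional


class RoutingMux:
    """Pass-transistor routing mux: configuration bit i drives the gate of transistor i.

    Hypothetical model assuming one-hot configuration (at most one transistor on).
    """

    def __init__(self, config_bits: List[int]):
        if sum(config_bits) > 1:
            raise ValueError("more than one pass transistor enabled (contention)")
        self.config_bits = list(config_bits)

    def output(self, inputs: List[int]) -> Optional[int]:
        if len(inputs) != len(self.config_bits):
            raise ValueError("input count must match configuration width")
        for bit, signal in zip(self.config_bits, inputs):
            if bit:  # transistor turned on: this input is routed to the output
                return signal
        return None  # no transistor on: the mux is unused and its output floats


# Static control signals from loaded configuration memory select input 2.
mux = RoutingMux([0, 0, 1, 0])
print(mux.output([1, 1, 0, 1]))  # 0
```

Because the configuration bits stay fixed after programming, the selected path behaves like a hardwired connection until the device is reconfigured.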

The memory elements may use any suitable volatile and/or non-volatile memory structures, such as Random Access Memory (RAM) cells, fuses, antifuses, programmable read-only memory cells, mask-programmed and laser-programmed structures, mechanical memory devices (e.g., including localized mechanical resonators), mechanically operated RAM (MORAM), Programmable Metallization Cells (PMCs), conductive-bridging RAM (CBRAM), resistive memory elements, combinations of these structures, and the like. Because the memory elements are loaded with configuration data during programming, the memory elements are sometimes referred to as configuration memory, configuration RAM (CRAM), configuration memory elements, or programmable memory elements.

In addition, programmable logic device 10 may use input/output elements (IOEs) 16 to drive signals off of device 10 and to receive signals from other devices. Input/output elements 16 may include parallel input/output circuits, serial data transceiver circuits, differential receiver and transmitter circuits, or other circuits for connecting one integrated circuit to another. Input/output elements 16 may be located around the periphery of the chip as shown. If desired, the programmable logic device may have input/output elements 16 arranged in a different manner.

Routing structures (sometimes referred to as programmable interconnect circuitry) on PLD 10 may be provided in the form of vertical routing channels 14 (i.e., interconnects formed along a vertical axis of PLD 10) and horizontal routing channels 15 (i.e., interconnects formed along a horizontal axis of PLD 10), each routing channel including at least one track to route at least one wire. If desired, the routing wires may be shorter than the entire length of the routing channel. A wire of length L may span L functional blocks. For example, a length-four wire may span four functional blocks. Length-four wires in the horizontal routing channels may be referred to as "H4" wires, while length-four wires in the vertical routing channels may be referred to as "V4" wires.
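The wire-length naming convention can be captured in a small data structure. In this Python sketch, `RoutingWire`, the `(column, row)` coordinate system, and the assumption that a wire extends toward increasing coordinates from its starting block are all illustrative choices, not details taken from the device.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RoutingWire:
    direction: str          # "H" = horizontal routing channel, "V" = vertical
    length: int             # a wire of length L spans L functional blocks
    start: Tuple[int, int]  # (column, row) of the first block the wire reaches

    @property
    def name(self) -> str:
        # Channel letter plus length, e.g. "H4" or "V4".
        return f"{self.direction}{self.length}"

    def spanned_blocks(self) -> List[Tuple[int, int]]:
        """Blocks covered by the wire (assumes it runs toward increasing coordinates)."""
        col, row = self.start
        if self.direction == "H":
            return [(col + i, row) for i in range(self.length)]
        if self.direction == "V":
            return [(col, row + i) for i in range(self.length)]
        raise ValueError("direction must be 'H' or 'V'")


wire = RoutingWire("H", 4, (2, 5))
print(wire.name, wire.spanned_blocks())
# H4 [(2, 5), (3, 5), (4, 5), (5, 5)]
```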

Further, it should be understood that embodiments may be implemented in any integrated circuit. If desired, the functional blocks of such an integrated circuit may be arranged in more levels or layers, with a plurality of functional blocks interconnected to form a larger block. Other device arrangements may use functional blocks that are not arranged in rows and columns. Devices 10 arranged in this manner are sometimes referred to as Field Programmable Gate Arrays (FPGAs).

FIG. 2 is a diagram of programmable logic blocks coupled together using interconnect circuitry. As shown in FIG. 2, two logic blocks 202 may be interconnected using horizontal (row-wise) routing channels R_long and R_short and vertical (column-wise) routing channels C_long and C_short. Logic blocks 202 may represent LABs 11 of FIG. 1 or other suitable groupings of logic components. In the example of FIG. 2, each logic block 202 may include smaller regions of programmable logic 204. The smaller programmable logic regions 204 within each logic block 202 are sometimes referred to as adaptive logic modules (ALMs), logic elements, or logic cells. There may be any suitable number of logic cells 204 within logic block 202. In general, each logic cell or ALM 204 may be a native element on an FPGA that includes a set of look-up table (LUT) circuits and associated registers that may be collectively configured to implement logic gates or even arithmetic circuits.
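A LUT implements an arbitrary logic function by storing its truth table in configuration bits and using the inputs as an address. The sketch below models only that combinational lookup; the `LUT` class and its least-significant-bit-first address ordering are assumptions, and the registers associated with each ALM are omitted.

```python
from typing import Sequence


class LUT:
    """k-input look-up table: 2**k configuration bits hold the truth table."""

    def __init__(self, k: int, truth_table: Sequence[int]):
        if len(truth_table) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} configuration bits")
        self.k = k
        self.table = list(truth_table)

    def evaluate(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        # The inputs form the table address; input 0 is the least-significant bit.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.table[address]


# Configure a 2-input LUT as an AND gate: only address 0b11 returns 1.
and_gate = LUT(2, [0, 0, 0, 1])
print(and_gate.evaluate(1, 1), and_gate.evaluate(1, 0))  # 1 0
```

Reprogramming the same LUT as another gate only requires different configuration bits, e.g. `[0, 1, 1, 0]` for XOR.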

The row routing channels R_long and R_short may represent the horizontal routing channels 15 shown in FIG. 1. The longer routing channel R_long may span more logic blocks (e.g., more than 10 logic blocks, more than 20 logic blocks, more than 30 logic blocks, etc.), while the shorter routing channel R_short may span relatively fewer logic blocks (e.g., fewer than 10 logic blocks, fewer than five logic blocks, etc.).

Similarly, the column routing channels C_long and C_short may represent the vertical routing channels 14 shown in FIG. 1. The longer routing channel C_long may span more logic blocks (e.g., more than 10 logic blocks, more than 20 logic blocks, more than 30 logic blocks, etc.), while the shorter routing channel C_short may span relatively fewer logic blocks (e.g., fewer than 10 logic blocks, fewer than five logic blocks, etc.).

Accordingly, the long routing channels R_long and C_long are sometimes referred to as long global interconnects, while the short routing channels R_short and C_short are sometimes referred to as short global interconnects. Each logic block 202 may be coupled to the short global interconnects via routing wires 210. Logic block 202 may access the long global interconnects via the short global interconnects.

Each logic block 202 may also be coupled to associated local interconnect circuitry 206 via path 208. Signals on R_short may be coupled to local interconnect circuitry 206 via path 212, while signals on C_short may be coupled to local interconnect circuitry 206 via path 214. Logic block 202 may also be coupled directly to adjacent local interconnect circuitry 206 (i.e., the local interconnect circuitry 206 associated with an adjacent logic block 202) via direct link path 216. Direct link path 216 may represent the fastest routing path between adjacent logic blocks and is sometimes referred to as a "sneak" path.

Designing and implementing custom logic circuits in programmable logic devices can be a substantial undertaking. Logic designers therefore often use logic design systems based on computer-aided design (CAD) tools to assist them in designing circuits. A logic design system may help a logic designer design and test complex circuits for a system. When a design is complete, the logic design system may be used to generate configuration data for electrically programming the appropriate programmable logic device.

An illustrative logic circuit design system 300 in accordance with an embodiment is shown in FIG. 3. Circuit design system 300 may be implemented on integrated circuit design computing equipment. For example, system 300 may be based on one or more processors in computing equipment such as personal computers, workstations, and the like. The processor(s) may be linked using a network, such as a local area network or a wide area network. Memory in these computers, or external memory and storage devices such as internal and/or external hard disks, may be used to store instructions and data.

Software-based components, such as computer-aided design tools 320 and databases 330, reside on system 300. During operation, executable software, such as the software of computer-aided design tools 320, runs on the processor(s) of system 300. Databases 330 are used to store data for the operation of system 300. In general, software and data may be stored on non-transitory computer-readable storage media (e.g., tangible computer-readable storage media). Software code may sometimes be referred to as software, data, program instructions, or code. The non-transitory computer-readable storage media may include computer memory chips, non-volatile memory such as non-volatile random-access memory (NVRAM), one or more hard drives (e.g., magnetic drives or solid state drives), one or more removable flash drives or other removable media, compact discs (CDs), digital versatile discs (DVDs), Blu-ray discs (BDs), other optical media, floppy disks, magnetic tape, or any other suitable memory or storage device(s).

Software stored on a non-transitory computer readable storage medium may be executed on system 300. When the software of system 300 is installed, the storage of system 300 has instructions and data that cause the computing devices in system 300 to perform various methods (processes). When performing these processes, the computing device is configured to implement the functionality of circuit design system 300.

The Computer Aided Design (CAD) tool 320 may be provided by a single vendor or multiple vendors, some or all of which are sometimes collectively referred to as CAD tools, circuit design tools, or Electronic Design Automation (EDA) tools. Tool 320 may be provided as one or more sets of tools (e.g., a suite of compilers for performing tasks associated with implementing circuit designs in programmable logic devices) and/or as one or more separate software components (tools). Database(s) 330 may include one or more databases that are only accessed by one or more particular tools, and may include one or more shared databases. The shared database is accessible by multiple tools. For example, the first tool may store data for the second tool in a shared database. The second tool may access the shared database to retrieve the data stored by the first tool. This allows one tool to pass information to another tool. The tools may also pass information between each other, if desired, without storing the information in a shared database.
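The shared-database hand-off described above, in which a first tool stores data that a second tool later retrieves, can be sketched as a minimal in-memory key-value store. `SharedDatabase`, its methods, and the `"netlist"` key are hypothetical illustrations, not the interface of any actual CAD database.

```python
from typing import Any, Dict


class SharedDatabase:
    """Minimal key-value store shared by several design tools (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def store(self, producer: str, key: str, value: Any) -> None:
        # Record which tool produced the data alongside the data itself.
        self._records[key] = {"producer": producer, "value": value}

    def fetch(self, key: str) -> Any:
        return self._records[key]["value"]

    def producer_of(self, key: str) -> str:
        return self._records[key]["producer"]


db = SharedDatabase()
# A first tool (e.g. synthesis) stores its result for a later tool.
db.store("synthesis", "netlist", ["and2_0", "dff_0"])
# A second tool (e.g. placement) retrieves what the first tool stored.
print(db.fetch("netlist"), db.producer_of("netlist"))  # ['and2_0', 'dff_0'] synthesis
```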

An illustrative computer-aided design tool 420 that may be used in a circuit design system, such as circuit design system 300 of FIG. 3, is shown in FIG. 4.

The design process may begin with the formulation of a functional specification for the integrated circuit design (e.g., a functional or behavioral description of the integrated circuit design). The circuit designer may use design and constraint input tools 464 to specify the functional operation of the desired circuit design. Design and constraint input tools 464 may include tools such as design and constraint input aids 466 and design editor 468. Design and constraint input aids, such as aid 466, may be used to assist a circuit designer in locating a desired design from a library of existing circuit designs, and computer-assisted aids may be provided to the circuit designer to input (specify) the desired circuit design.

As one example, design and constraint input aids 466 may be used to present screens of options to a user. The user may click on on-screen options to select whether the circuit being designed should have certain features. Design editor 468 may be used to enter a design (e.g., by entering lines of hardware description language code), may be used to edit a design obtained from a library (e.g., with the help of design and constraint input aids 466), or may assist a user in selecting and editing appropriate prepackaged code/designs.

Design and constraint input tools 464 may be used to allow a circuit designer to provide a desired circuit design using any suitable format. For example, design and constraint input tools 464 may include tools that allow a circuit designer to enter a circuit design using truth tables. Truth tables may be specified using text files or timing diagrams and may be imported from a library. Truth-table circuit design and constraint entry may be used for a portion of a large circuit or for an entire circuit.

As another example, the design and constraint input tools 464 may include schematic capture tools. The schematic capture tool may allow a circuit designer to visually construct an integrated circuit design from constituent parts such as logic gates and groups of logic gates. A pre-existing library of integrated circuit designs may be used to allow a desired portion of the design to be imported with the schematic capture tool.

If desired, design and constraint input tools 464 may allow a circuit designer to provide a circuit design to circuit design system 300 using a hardware description language such as Verilog hardware description language (Verilog HDL), very high speed integrated circuit hardware description language (VHDL), or SystemVerilog, or a higher-level circuit description language such as OpenCL or SystemC, to name a few. A designer of an integrated circuit design may enter the circuit design by writing hardware description language code with editor 468. Blocks of code may be imported from user-maintained or commercial libraries, if desired.

After the design has been entered using the design and constraint input tools 464, a behavioral simulation tool 472 may be used to simulate the functionality of the circuit design. If the function of the design is incomplete or incorrect, the circuit designer may make changes to the circuit design using the design and constraint input tools 464. Before the synthesis operations have been performed using the tool 474, the functional operation of the new circuit design may be verified using the behavioral simulation tool 472. Simulation tools such as behavioral simulation tool 472 may also be used at other stages in the design flow (e.g., after logic synthesis), if desired. The output of the behavioral simulation tool 472 may be provided to the circuit designer in any suitable format (e.g., truth table, timing diagram, etc.).

Once the functional operation of the circuit design has been determined to be satisfactory, logic synthesis and optimization tool 474 may generate a gate-level netlist of the circuit design, for example, using gates from a particular library pertaining to a targeted process supported by the foundry that has been selected to produce the integrated circuit. Alternatively, logic synthesis and optimization tool 474 may generate a gate-level netlist of the circuit design using gates of a targeted programmable logic device (i.e., in the logic and interconnect resources of a particular programmable logic device product or product family).

Logic synthesis and optimization tool 474 may optimize the design by making appropriate selections of hardware to implement different logic functions in the circuit design based on the circuit design data and constraint data entered by the logic designer using tools 464. As one example, logic synthesis and optimization tool 474 may perform multi-level logic optimization and technology mapping based on the lengths of combinational paths between registers in the circuit design and the corresponding timing constraints entered by the logic designer using tools 464.

After logic synthesis and optimization using tool 474, the circuit design system may use tools such as placement, routing, and physical synthesis tools 476 to perform physical design steps (layout synthesis operations). Tools 476 may be used to determine where to place each gate of the gate-level netlist produced by tool 474. For example, if two counters interact with each other, tools 476 may locate these counters in adjacent regions to reduce interconnect delay or to satisfy timing requirements specifying the maximum permitted interconnect delay. Tools 476 create orderly and efficient implementations of circuit designs for any targeted integrated circuit (e.g., for a given programmable integrated circuit such as a Field Programmable Gate Array (FPGA)).
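Placement tools commonly estimate interconnect delay with a wirelength proxy. The sketch below assumes a simple Manhattan-distance cost over two-pin nets (the cost function actually used by tools 476 is not specified here) and shows why placing the two interacting counters in adjacent regions lowers the estimate. `wirelength` and the block coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (column, row) of a placed block


def wirelength(placement: Dict[str, Location], nets: List[Tuple[str, str]]) -> int:
    """Total Manhattan distance over two-pin nets, a common proxy for interconnect delay."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total


nets = [("counter_a", "counter_b")]  # the two counters interact
far = {"counter_a": (0, 0), "counter_b": (7, 3)}
near = {"counter_a": (0, 0), "counter_b": (1, 0)}
print(wirelength(far, nets), wirelength(near, nets))  # 10 1
```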

Tools such as tools 474 and 476 may be part of a compiler suite (e.g., part of a compiler suite of tools provided by a programmable logic device vendor). In certain embodiments, tools such as tools 474, 476, and 478 may also include timing analysis tools such as timing estimators. This allows tools 474 and 476 to meet performance requirements (e.g., timing requirements) before the integrated circuit is actually produced.

After an implementation of the desired circuit design has been generated using tools 476, the implementation of the design may be analyzed and tested using analysis tools 478. For example, analysis tools 478 may include timing analysis tools, power analysis tools, or formal verification tools, to name a few.

After tool 420 has been used and satisfactory optimization operations have been completed according to the target integrated circuit technology, tool 420 may generate a mask level layout description of the integrated circuit or configuration data for programming the programmable logic device.

Illustrative operations involved in using tool 420 of FIG. 4 to generate a mask-level layout description of an integrated circuit are shown in FIG. 5. As shown in FIG. 5, a circuit designer may first provide a design specification 502. Design specification 502 may generally be a behavioral description provided in the form of application code (e.g., C code, C++ code, SystemC code, OpenCL code, etc.). In some scenarios, the design specification may be provided in the form of a Register Transfer Level (RTL) description 506.

The RTL description may have any form of describing circuit functions at the register transfer level. For example, the RTL description may be provided using a hardware description language such as the Verilog hardware description language (Verilog HDL or Verilog), the SystemVerilog hardware description language (SystemVerilog HDL or SystemVerilog), or the very high speed integrated circuit hardware description language (VHDL). If desired, a portion or all of the RTL description may be provided as a schematic representation or in the form of code using OpenCL, MATLAB, Simulink, or other high-level synthesis (HLS) languages.

In general, behavioral design specification 502 may include untimed or partially timed functional code (i.e., the application code does not describe cycle-by-cycle hardware behavior), whereas RTL description 506 may include a fully timed design description that details the cycle-by-cycle behavior of the circuit at the register transfer level.

Design specification 502 or RTL description 506 may also include optimization constraints, such as target criteria for area use, power consumption, delay minimization, clock frequency optimization, or any combination thereof. The optimization constraints and target criteria may be collectively referred to as constraints.

Those constraints may be provided for a separate data path, a portion of a design, or the entire design. For example, constraints may be provided in a constraint file with design specification 502, RTL description 506 (e.g., as a compilation directive (pragma) or assertion), or by user input (e.g., using design and constraint input tool 464 of FIG. 4), to name a few.

At step 504, behavioral synthesis (also sometimes referred to as algorithmic synthesis) may be performed to convert the behavioral description into RTL description 506. Step 504 may be skipped if the design specification is already provided in the form of an RTL description.

At step 518, the behavioral simulation tool 472 may perform an RTL simulation of the RTL description, which may verify the functionality of the RTL description. If the function of the RTL description is incomplete or incorrect, the circuit designer may make changes to the HDL code (as one example). During RTL simulation 518, actual results obtained from simulating the behavior described by the RTL may be compared to expected results.

During step 508, the logic synthesis operation may generate a gate level description 510 using the logic synthesis and optimization tool 474 from FIG. 4. The output of the logic synthesis 508 is a gate level description 510 of the design.

During step 512, the different gates in gate-level description 510 may be placed in preferred locations on the target integrated circuit using, for example, the placement operations of placement tool 476 of FIG. 4 to meet given target criteria (e.g., minimize area and maximize routing efficiency, minimize path delay and maximize clock frequency, minimize overlap between logic elements, or any combination thereof). The output of placement 512 is a placed gate-level description 513 that satisfies the legal placement constraints of the target device.

During step 515, the gates from placed gate-level description 513 may be connected using, for example, the routing operations of routing tool 476 of FIG. 4. The routing operations may attempt to meet given target criteria (e.g., minimize congestion, minimize path delay and maximize clock frequency, satisfy minimum delay requirements, or any combination thereof). The output of routing 515 is a routed gate-level description 516, sometimes referred to as a device configuration bitstream or device configuration image.

While placement and routing are being performed at steps 512 and 515, physical synthesis operation 517 may be performed concurrently to further modify and optimize the circuit design (e.g., using physical synthesis tool 476 of FIG. 4).

The RTL design flow of FIG. 5 generally yields circuit designs with limited performance. When a design is implemented using the conventional RTL flow, whether a particular logic cell gets access to high-speed routing wires is largely a matter of luck, because the optimizations performed during physical placement and routing typically lack symmetric access to the routing network on the FPGA. For example, if the preferred direct-path route is unavailable at a given location, the logic cell is forced to take a much slower, circuitous route to reach its intended destination. This situation arises from contention for the limited physical routing wires, which is a computationally difficult optimization problem.

In accordance with an embodiment, an additional underlay flow may be used to map a circuit design onto an existing underlay. An "underlay" may be defined herein as a subset of routing wires or routing patterns that naturally exists as part of the FPGA routing network architecture and that meets some predetermined speed criteria. Because they arise naturally from the fabric, the underlay routing pattern(s) present on an FPGA are sometimes referred to as "artifacts" of the FPGA routing fabric. The underlay routing wires typically include fast datapath connections suitable for arithmetic, networking, switching, or other functional accelerator designs.

FIG. 6 is a flow chart of illustrative steps for extracting (parsing) an underlay and mapping a circuit design onto the extracted underlay. The steps of FIG. 6 may be performed using circuit design tools 320 or 420 shown in FIGS. 3 and 4. At step 600, the design tools may be used to extract the underlay from the FPGA architecture. Step 600 may include sub-steps 602, 604, and 606.

At step 602, the design tools may access an FPGA device database (see, e.g., database 330 of FIG. 3) to obtain the device routing network of the FPGA device. The device routing network lists all of the available routing connections that exist on the FPGA.

At step 604, the design tools may receive user-defined target routing constraints. As examples, the routing constraints may specify the source (origin) coordinates of signal routing paths, the timing and speed requirements of signal routing paths, the type of routing resources to be used (e.g., using only short global wires of a particular length, using only local interconnect circuitry, using only direct-link paths, etc.), routing direction(s), crosstalk attributes, and other suitable signal routing criteria.

At step 606, the design tools may identify, within the device routing network, a subset (or pattern) of routing wires that connects adjacent (or nearly adjacent) logic blocks and that satisfies the target routing constraints defined at step 604. The identified subset of wires forms the underlay; the remaining wires are discarded as not being part of the underlay.
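As a rough illustration of steps 604 and 606, the sketch below models each routing wire as a record and keeps only the wires that meet a user-defined constraint. It is a minimal sketch under stated assumptions: the `Wire` and `RoutingConstraint` types, their fields (delay in picoseconds, wire kind, direction), the Manhattan-distance adjacency test, and the sample values are all hypothetical and do not come from any real device database or tool API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wire:
    # Hypothetical record for one wire in the device routing network.
    src: tuple[int, int]   # (column, row) of the driving logic block
    dst: tuple[int, int]   # (column, row) of the receiving logic block
    delay_ps: float        # estimated wire delay in picoseconds
    kind: str              # e.g. "short_global", "local", "direct_link"
    direction: str         # "N", "S", "E" or "W"


@dataclass(frozen=True)
class RoutingConstraint:
    # Hypothetical user-defined target routing constraint (step 604).
    max_delay_ps: float
    allowed_kinds: frozenset[str]
    allowed_directions: frozenset[str]
    max_distance: int = 1  # 1 means the wire must join adjacent blocks


def satisfies(w: Wire, c: RoutingConstraint) -> bool:
    """Check one wire against every field of the constraint."""
    distance = abs(w.src[0] - w.dst[0]) + abs(w.src[1] - w.dst[1])
    return (w.delay_ps <= c.max_delay_ps
            and w.kind in c.allowed_kinds
            and w.direction in c.allowed_directions
            and distance <= c.max_distance)


def extract_underlay(network: list[Wire], c: RoutingConstraint) -> list[Wire]:
    """Keep only the wires that meet the constraint (step 606); drop the rest."""
    return [w for w in network if satisfies(w, c)]


network = [
    Wire((0, 0), (1, 0), 80.0, "direct_link", "E"),
    Wire((1, 0), (2, 0), 95.0, "short_global", "E"),
    Wire((2, 0), (2, 3), 60.0, "short_global", "N"),   # spans three rows: not adjacent
    Wire((2, 0), (3, 0), 210.0, "short_global", "E"),  # too slow
]
constraint = RoutingConstraint(100.0,
                               frozenset({"direct_link", "short_global"}),
                               frozenset({"E", "N"}))
underlay = extract_underlay(network, constraint)  # keeps the first two wires
```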

This underlay abstraction is in effect a severely constrained logic router. Under certain constraints, there may be no solution at all or only a very small number of solutions. For this reason, the design tools may identify the subset of routing wires using a recursive search with hierarchical heuristics (which may or may not be fully stable) rather than a conventional global search. A recursive search is more exhaustive and more computationally intensive, but this is acceptable in such a constrained domain. Note also that the extraction tool can assume additional degrees of freedom relative to conventional CAD flows: whereas a conventional router works with fixed source and destination terminals (i.e., a fixed placement of circuits that must be wired together), the underlay router may accept a variety of destination terminals and then modify the requested logic resources to deliver part of a high-speed solution for the desired function.
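The recursive search can be sketched as a depth-first search with backtracking over wires that have already passed the constraint filter. This is an illustrative stand-in rather than the hierarchical heuristic described above: the hypothetical `find_chain` function simply tries the fastest outgoing wire first and, like the underlay router, does not fix a destination terminal but accepts any block that completes a chain of the requested length.

```python
from typing import Optional

Coord = tuple[int, int]


def find_chain(fast_edges: dict[Coord, list[tuple[Coord, float]]],
               start: Coord, length: int,
               path: Optional[list[Coord]] = None) -> Optional[list[Coord]]:
    """Depth-first recursive search for a chain of `length` distinct blocks.

    fast_edges maps a block to (neighbor, delay) pairs that already satisfy
    the routing constraint. The destination is NOT fixed: any block that
    completes the chain is accepted.
    """
    if path is None:
        path = [start]
    if len(path) == length:
        return path
    # Assumed heuristic: try the fastest outgoing wire first.
    for nxt, _delay in sorted(fast_edges.get(path[-1], []), key=lambda e: e[1]):
        if nxt in path:
            continue  # each block may appear only once in the chain
        found = find_chain(fast_edges, start, length, path + [nxt])
        if found is not None:
            return found
    return None  # no solution from this prefix: backtrack


edges = {
    (0, 0): [((1, 0), 80.0), ((0, 1), 70.0)],
    (0, 1): [],                  # fastest first hop, but a dead end
    (1, 0): [((2, 0), 90.0)],
    (2, 0): [((3, 0), 85.0)],
}
chain4 = find_chain(edges, (0, 0), 4)  # backtracks out of (0, 1), then succeeds
```

Because the search backtracks, it returns `None` when the constraints admit no chain, which matches the observation that a constrained extraction may have no solution.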

At step 608, the user's circuit design may be mapped onto the extracted underlay (e.g., so that signals in the circuit design use only the routing paths defined by the extracted underlay). Ideally, a user circuit design or application can be mapped onto the extracted underlay with 100% efficiency, but this is not always possible. If the user's design cannot be fully mapped onto the extracted underlay (as determined at step 610), the conventional RTL flow of FIG. 5 may be used at step 612 to implement the remaining, unmapped portion of the circuit design (e.g., additional placement and routing operations may be performed on the unmapped portion of the logic circuit).
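Steps 608 through 612 amount to partitioning the design's connections into those the underlay can carry and those left for the conventional flow. The sketch below assumes, hypothetically, that each net has already been assigned a (driver, load) pair of underlay slots; the slot names and the `map_to_underlay` function are illustrative only.

```python
def map_to_underlay(nets: dict[str, tuple[str, str]],
                    underlay_edges: set[tuple[str, str]]):
    """Split nets into those the underlay can carry and those it cannot.

    nets maps a net name to its (driver, load) slot names. Returns
    (mapped, unmapped); unmapped nets are left for the conventional
    placement-and-routing flow (step 612).
    """
    mapped, unmapped = {}, {}
    for name, hop in nets.items():
        # A net is carried by the underlay only if its hop is an underlay wire.
        (mapped if hop in underlay_edges else unmapped)[name] = hop
    return mapped, unmapped


underlay_edges = {("lab0", "lab1"), ("lab1", "lab2"), ("lab2", "lab3")}
nets = {"a": ("lab0", "lab1"), "b": ("lab1", "lab2"), "c": ("lab0", "lab3")}
mapped, unmapped = map_to_underlay(nets, underlay_edges)
fully_mapped = not unmapped  # the step 610 decision
```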

At step 614, the extracted underlay may optionally be relocated or replicated to one or more other regions of the FPGA (e.g., the underlay may be applied to a local region or across the entire device). Because FPGAs are typically built from regular, repeatable building blocks, an underlay pattern can be repeated tens, hundreds, or even thousands of times across the device. As examples, the underlay may be used to implement a Clos network, an artificial intelligence (AI) network, an accelerator platform, or another suitable datapath design. Mapping the circuit design onto the extracted high-speed underlay in this way can greatly improve the performance of the custom logic design, typically doubling the maximum operating frequency (Fmax) relative to existing implementations that use only the conventional RTL flow.
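Relocating or replicating an underlay (step 614) can be pictured as translating its wire coordinates by multiples of the fabric's repeat pitch. The sketch assumes the fabric repeats exactly at that pitch, so that each translated copy is equally fast; real devices may break this assumption (for example, around embedded blocks). The function name, pitch, and device size are hypothetical.

```python
Coord = tuple[int, int]
Edge = tuple[Coord, Coord]


def replicate(underlay: list[Edge], pitch: Coord, copies: Coord,
              device: Coord) -> list[list[Edge]]:
    """Translate an underlay by integer multiples of `pitch`.

    Assumes the fabric repeats exactly every `pitch` (columns, rows).
    Copies that would fall off a device of size `device` are skipped.
    """
    out = []
    for i in range(copies[0]):
        for j in range(copies[1]):
            dx, dy = i * pitch[0], j * pitch[1]
            copy = [((a[0] + dx, a[1] + dy), (b[0] + dx, b[1] + dy))
                    for a, b in underlay]
            # Keep the copy only if every endpoint lies on the device.
            if all(0 <= x < device[0] and 0 <= y < device[1]
                   for edge in copy for (x, y) in edge):
                out.append(copy)
    return out


base = [((0, 0), (1, 0)), ((1, 0), (2, 0))]  # a three-block horizontal chain
tiles = replicate(base, pitch=(4, 1), copies=(3, 2), device=(10, 2))
```

Here the copies offset by 8 columns would reach column 10 and are dropped, leaving four placements.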

Underlay-targeted designs may coexist with the conventional design flow. As an example, an underlay-mapped design may be an accelerator that operates at a much higher clock rate than the rest of the logic circuitry, which is implemented using the conventional flow.

Although the method operations are described in a specific order, it should be understood that other operations may be performed between the described operations, that the described operations may be adjusted so that they occur at slightly different times, or that the described operations may be distributed in a system that allows the processing operations to occur at various intervals associated with the processing, as long as the processing of the overlapping operations is performed in the desired way.

The underlay extraction method of FIG. 6 may be used to extract a repeatable high-speed routing pattern such as routing underlay 700 of FIG. 7. As shown in FIG. 7, underlay 700 may include repeating 2:1 multiplexing circuits 704 connected in a chain. The first multiplexer 704 may have inputs configured to receive signals from two different logic blocks 702. Logic blocks 702 may represent logic blocks 202 of FIG. 2. The second multiplexer 704 may have a first input connected to the output of the first multiplexer 704 and a second input configured to receive a signal from another logic block 702. The third multiplexer 704 may have a first input connected to the output of the second multiplexer 704 and a second input configured to receive a signal from yet another logic block 702. This example, in which underlay 700 includes three series-connected multiplexers 704, is merely illustrative. In general, underlay 700 may include any suitable number of repeatable multiplexing circuits (e.g., 2:1 multiplexers, 3:1 multiplexers, some combination of 2:1 and 3:1 multiplexers, etc.) interconnected to form high-speed routes. An FPGA does not have a single fixed underlay; different routing patterns may be detected depending on the target routing density and target performance, or on whether the constraints are relaxed.
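The function of the FIG. 7 chain (three 2:1 multiplexers combining four logic-block signals) can be modeled behaviorally as follows. The select-line convention (select = 0 passes the chained input) and the function names are assumptions made for illustration; the model says nothing about routing delay.

```python
def mux2(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns a when sel == 0, b otherwise."""
    return b if sel else a


def mux_chain(inputs: list[int], selects: list[int]) -> int:
    """Chain of 2:1 muxes: the first mux takes two logic-block signals, and
    each later mux takes the previous mux output plus one new signal."""
    if len(inputs) < 2 or len(selects) != len(inputs) - 1:
        raise ValueError("a chain of k muxes combines k + 1 inputs")
    acc = mux2(selects[0], inputs[0], inputs[1])
    for sel, x in zip(selects[1:], inputs[2:]):
        acc = mux2(sel, acc, x)
    return acc


blocks = [10, 20, 30, 40]           # signals from four logic blocks 702
picked = mux_chain(blocks, [1, 1, 0])  # second mux selects block 3; third passes it on
```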

FIG. 8 is a diagram of an actual routing implementation of underlay 700 of FIG. 7. As shown in FIG. 8, underlay 700 may include multiple adjacent (or nearly adjacent) logic blocks, where each logic block 702 may be implemented using a LAB 202-logic and each 2:1 multiplexer 704 may be implemented using a LAB such as LAB 202-mux (sometimes referred to herein as a LAB-wide multiplexer). LABs 202-input may include device input pins that feed signals to corresponding LABs 202-logic. The third LAB-wide multiplexer 202-mux may feed signals to a corresponding LAB 202-output, which may include a device output pin.

FIG. 8 shows the fast routing connections that link together logic units 204 in different LABs (see, e.g., direct-link paths 216 described in connection with FIG. 2). Although most of these fast routes have a 1:1 correspondence (pairing) between logic units of the same index, some of the fast routes may be scrambled (as shown in portion 850). When the underlay routing is scrambled, one potential problem in mapping an arithmetic datapath onto it is that carry chains may not map onto such an underlay. Nevertheless, this is the fastest underlay. There may also be a slightly slower underlay that has a 1:1 pairing for every LAB (i.e., signals arrive in a bit order that is logically acceptable for the destination function).
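Whether a sequence of fast LAB-to-LAB hops delivers bits in an order usable by a carry chain can be checked by composing the per-hop lane permutations. In this hypothetical sketch, `perms[k][i]` is the lane in the next LAB that receives lane i of the current LAB; the four-lane width and the swapped pair standing in for scrambled portion 850 are made-up values.

```python
def compose(perms: list[list[int]]) -> list[int]:
    """Return, for each original bit, the lane it occupies after all hops."""
    final = list(range(len(perms[0])))
    for p in perms:
        final = [p[lane] for lane in final]  # follow each bit across one hop
    return final


def preserves_bit_order(perms: list[list[int]]) -> bool:
    """True when every bit lands in its own lane (1:1 pairing by index),
    which an arithmetic datapath such as a carry chain relies on."""
    final = compose(perms)
    return final == list(range(len(final)))


straight = [0, 1, 2, 3]   # 1:1 pairing between same-index logic units
scrambled = [1, 0, 2, 3]  # lanes 0 and 1 swapped
```

In this toy model two identical swaps cancel, but in general a scrambled hop must be compensated elsewhere, or a slower 1:1 underlay chosen, before arithmetic can be mapped onto it.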

The underlay also need not be fully specified. In one scenario, most connections are mapped onto the underlay (e.g., by constraining those connections to the known fast connections in the underlay), while logic cells that have no known fast connection in that underlay are left floating, relying on the conventional RTL flow to find connections for those floating logic cells later. Because most of the connections are constrained to the underlay, the design tools should find it easier to locate fast routes for the unspecified connections: the placement and routing freedom of the overall circuit is greatly reduced, so the remaining search space is much smaller.

The underlay delivers the greatest performance benefit when used for fast networks. Speed may degrade as more and more logic is added around the underlay. Unlike in conventional design flows, however, the extracted underlay provides a locally repeatable routing structure (framework) that can be replicated across the device to optimize speed and logic utilization according to the user's goals.

Underlay 700 of FIG. 7, made up of repeated 2:1 multiplexers, is merely illustrative and is not intended to limit the scope of the present embodiments. FIG. 9A shows another suitable underlay 900 made up of repeated adders configured to compute the sum of eight input words. As shown in FIG. 9A, underlay 900 may include repeatable adders 902 connected in a chain-like structure to add together input words d1-d8, where each adder 902 combines two values. Each adder 902 may be implemented using a logic block (see, e.g., LAB 11 of FIG. 1, logic block 202 of FIG. 2, or another suitable group of logic elements) and may therefore be referred to as a LAB-based adder. In the example of FIG. 9A, most of the connections between adjacent adders 902 are short horizontal wires 904; note, however, that one of the connections consists of a horizontal routing segment 906-1 and a vertical routing segment 906-2. The delay through segments 906-1 and 906-2 may be greater than the delay through a single horizontal wire 904.

FIG. 9B shows another suitable underlay 900' that includes additional register circuitry 910 interposed between horizontal routing segment 906-1 and vertical routing segment 906-2. Registers 910 configured in this way may serve as pipeline elements that improve the throughput of the overall adder-based underlay 900'.

FIG. 9C shows yet another suitable underlay 900'' in which an adder such as adder 902' has been moved to the turning node. Configured in this way, all of the routing between adjacent adders 902 can be equally fast without incurring the additional latency that pipeline registers would otherwise introduce.
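A toy delay model can make the trade-off among FIGS. 9A-9C concrete. The sketch treats underlay 900 as a linear chain of seven adders summing d1-d8 (an assumption about the figure's topology), uses invented delays for an adder (`ADDER`), a short horizontal wire (`H`), and a vertical segment (`V`), and reports the combinational delay of each pipeline stage.

```python
# Hypothetical delays in picoseconds (not device data).
ADDER, H, V = 100.0, 50.0, 60.0


def chain(hops: list[list[float]]) -> list[float]:
    """Interleave adders with wire segments: adder, hop 0 segments, adder, ..."""
    path = [ADDER]
    for segments in hops:
        path += segments + [ADDER]
    return path


def stage_delays(path: list[float], registers: set[int]) -> list[float]:
    """Split a combinational path into pipeline stages; a register placed
    after element i (i in `registers`) ends the current stage."""
    stages, current = [], 0.0
    for i, d in enumerate(path):
        current += d
        if i in registers:
            stages.append(current)
            current = 0.0
    stages.append(current)
    return stages


fig9a = chain([[H]] * 3 + [[H, V]] + [[H]] * 2)  # one hop is horizontal + vertical
turn = fig9a.index(V) - 1                         # segment 906-1, just before 906-2
stages_9a = stage_delays(fig9a, set())            # single combinational path
stages_9b = stage_delays(fig9a, {turn})           # register 910 at the turn
stages_9c = stage_delays(chain([[H]] * 6), set()) # adder moved to the turning node
```

With these invented numbers, the register of FIG. 9B splits the 1060 ps path into 600 ps and 460 ps stages at the cost of one extra cycle of latency, while the FIG. 9C arrangement removes the L-shaped hop (1000 ps in total) without adding a register. Real delays depend on the device.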

The examples of FIGS. 7-8, in which the underlay is made up of 2:1 multiplexers, and the examples of FIGS. 9A-9C, in which the underlay is made up of adders, are merely illustrative. FIG. 10 shows an underlay such as FPGA underlay 1000 formed from functional blocks 1002 configured to compute some function of eight different input words d1-d8. Functional blocks 1002 may be "soft" (programmable) logic blocks or "hard" non-configurable functional blocks (sometimes referred to as embedded or hardwired functional blocks). Each functional block 1002 may be implemented using a logic block (see, e.g., LAB 11 of FIG. 1, logic block 202 of FIG. 2, or another suitable group of logic elements) or a logic region. Each functional block 1002 may be a multiplexer (e.g., a 2:1 multiplexer circuit), an adder, a logic gate (e.g., a logic AND gate, a logic NAND gate, a logic OR gate, a logic NOR gate, a logic exclusive-OR (XOR) gate, or a logic exclusive-NOR (XNOR) gate), or another suitable 2:1 functional operator that may be selected from a library of pre-formed elements. If desired, each functional block 1002 in underlay 1000 may have more than two inputs. In general, underlay 1000 may include any number of functional blocks 1002 for combining more than or fewer than eight input signals.
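Because every functional block 1002 is a 2:1 operator, the FIG. 10 underlay behaves like a fold of the input words with whichever operator is chosen from the library. The sketch below assumes a linear chain (one block per step) and an 8-bit word width; both, and the `LIBRARY` names, are hypothetical, and multiplexers are omitted because they also need a select input.

```python
import operator
from functools import reduce
from typing import Callable

WIDTH = 8                  # assumed word width in bits
MASK = (1 << WIDTH) - 1

# Illustrative stand-in for a library of pre-formed 2:1 operators.
LIBRARY: dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK,     # wraps at WIDTH bits
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "nand": lambda a, b: ~(a & b) & MASK,
    "nor": lambda a, b: ~(a | b) & MASK,
    "xnor": lambda a, b: ~(a ^ b) & MASK,
}


def reduce_words(op_name: str, words: list[int]) -> int:
    """Combine the words with a chain of identical 2:1 blocks,
    i.e. op(op(op(d1, d2), d3), ...)."""
    return reduce(LIBRARY[op_name], words)


d = [1, 2, 3, 4, 5, 6, 7, 8]   # stand-ins for input words d1-d8
total = reduce_words("add", d)
```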

Pre-formed library elements may have different ingress and egress patterns, which facilitates the assembly of larger systems, because data does not always flow in one direction from source to destination with a constant datapath width. FIGS. 11A-11D are diagrams of illustrative 2:1 datapath reduction operators with different ingress/egress modes. These 2:1 datapath reduction operators may be multiplexers (e.g., 2:1 multiplexer circuits), adders, logic gates (e.g., logic AND gates, logic NAND gates, logic OR gates, logic NOR gates, logic XOR gates, or logic XNOR gates), or other suitable 2:1 functional operators.

As an example, FIG. 11A shows a first datapath reduction operator 1100-1 having first and second ingress ports on its west and north edges and an egress port on its east edge. As another example, FIG. 11B shows a second datapath reduction operator 1100-2 having first and second ingress ports on its west and south edges and an egress port on its east edge. As yet another example, FIG. 11C shows a third datapath reduction operator 1100-3 having first and second ingress ports on its north and south edges and an egress port on its west edge. As a further example, FIG. 11D shows a fourth datapath reduction operator 1100-4 having first and second ingress ports on its south and east edges and an egress port on its north edge.
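The four ingress/egress modes of FIGS. 11A-11D can be captured as small records, and a tool could then check whether one operator's egress edge abuts a neighbor's ingress edge. The coordinate convention (north = row + 1, east = column + 1), the `Operator` type, and the `can_feed` check are assumptions made for illustration.

```python
from dataclasses import dataclass

# Offset (column, row) to the neighbor across each edge; assumed convention.
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


@dataclass(frozen=True)
class Operator:
    """A 2:1 datapath reduction operator: two ingress edges, one egress edge."""
    name: str
    ingress: frozenset[str]
    egress: str

    def __post_init__(self):
        if len(self.ingress) != 2 or self.egress in self.ingress:
            raise ValueError("need two ingress edges distinct from the egress edge")


OP_1100_1 = Operator("1100-1", frozenset({"W", "N"}), "E")
OP_1100_2 = Operator("1100-2", frozenset({"W", "S"}), "E")
OP_1100_3 = Operator("1100-3", frozenset({"N", "S"}), "W")
OP_1100_4 = Operator("1100-4", frozenset({"S", "E"}), "N")


def can_feed(src: Operator, src_at: tuple[int, int],
             dst: Operator, dst_at: tuple[int, int]) -> bool:
    """True if src's egress edge faces one of dst's ingress edges."""
    dx, dy = STEP[src.egress]
    adjacent = (src_at[0] + dx, src_at[1] + dy) == dst_at
    return adjacent and OPPOSITE[src.egress] in dst.ingress
```

For instance, 1100-1 can drive an 1100-2 placed immediately to its east, but not an 1100-3 there, because 1100-3 has no west-facing ingress port.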

The examples of FIGS. 11A-11D are merely illustrative. In general, a 2:1 functional operator may have ingress ports on any two edges and an egress port on another edge. Each of these datapath reduction operators may be implemented using a logic block (see, e.g., LAB 11 of FIG. 1, logic block 202 of FIG. 2, or another suitable group of logic elements) or a logic region. The input and output ports of these datapath reduction operators may be connected to wires of any suitable length (e.g., short global wires R_short/C_short or long global wires R_long/C_long of the type described in connection with FIG. 2).

Although this approach is more restrictive than free-form structures, a large number of circuits of interest can be expressed or constructed using adjacent (or nearly adjacent) 2:1 datapath reduction operators. FIG. 12 is a diagram of an illustrative N:1 multiplexer that can be constructed from multiple 2:1 multiplexer operators having different ingress/egress modes. As shown in FIG. 12, underlay 1200, which represents a 16:1 multiplexer, may be composed of adjacent datapath reduction operators of the different types shown in FIGS. 11A-11D.
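Functionally, an N:1 multiplexer of this kind reduces to a binary tree of 2:1 multiplexers: for N = 16 there are 8 + 4 + 2 + 1 = 15 of them. The sketch below models only that behavior and ignores how the ingress/egress modes are laid out on the die; steering level k of the tree with bit k of the select value is an assumed convention, and the names are hypothetical.

```python
def mux2(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns a when sel == 0, b otherwise."""
    return b if sel else a


def mux_n(select: int, data: list[int]) -> int:
    """N:1 multiplexer built only from 2:1 muxes arranged as a binary tree.

    Level k of the tree uses bit k of `select`; N must be a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("N must be a power of two")
    level, bit = list(data), 0
    while len(level) > 1:
        s = (select >> bit) & 1
        # Each adjacent pair feeds one 2:1 mux at this level.
        level = [mux2(s, level[i], level[i + 1]) for i in range(0, len(level), 2)]
        bit += 1
    return level[0]


data = list(range(100, 116))   # sixteen arbitrary input values
```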

In general, any digital circuit can be expressed as a combination of 2:1 (two-input) logic NAND gates, so any circuit can be assembled in this way from 2:1 reduction nodes. The additional requirement of adjacency (or near adjacency) in a two-dimensional layout is a constraint, but not an insurmountable one. An underlay formed from these 2:1 datapath reduction operators is known a priori to be extremely fast, so any larger circuit mapped onto such an underlay will be able to operate at very high speed. This stands in stark contrast to conventional methods that perform full placement and routing, whose performance is generally limited by the slowest wire connections.
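The NAND-universality argument can be checked directly: the sketch below builds NOT, AND, OR, and XOR on single bits from one two-input NAND node. XOR uses the standard four-NAND construction; the function names are illustrative.

```python
def nand(a: int, b: int) -> int:
    """The single 2:1 reduction node used below (a and b are 0 or 1)."""
    return 1 - (a & b)


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Standard four-NAND XOR: the shared term m feeds two NANDs.
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))
```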

Embodiments have thus far been described with respect to programmable integrated circuits. Examples of programmable logic devices include Programmable Array Logic (PAL), Programmable Logic Arrays (PLA), Field Programmable Logic Arrays (FPLA), Electrically Programmable Logic Devices (EPLD), Electrically Erasable Programmable Logic Devices (EEPLD), Logic Cell Arrays (LCA), Complex Programmable Logic Devices (CPLD), and Field Programmable Gate Arrays (FPGA), to name a few.

The programmable logic devices described in one or more embodiments herein may be part of a data processing system that includes one or more of the following components: a processor; memory; IO circuitry; and peripheral devices. The data processing system can be used in a wide variety of applications, such as computer networking, data networking, instrumentation, video processing, digital signal processing, or any other suitable application where the advantages of using programmable or reprogrammable logic are desirable. Programmable logic devices can be used to perform a variety of different logic functions. For example, a programmable logic device may be configured as a processor or controller that works in cooperation with a system processor. A programmable logic device may also be used as an arbiter for arbitrating access to a shared resource in the data processing system. In yet another example, a programmable logic device may be configured as an interface between a processor and one of the other components in the system.

Examples:

The following examples pertain to further embodiments.

Example 1 is a method of implementing a logic circuit on a programmable device using a design tool, comprising: extracting an underlay from a routing network on the programmable device, wherein the extracted underlay comprises a subset of routing wires in the routing network that satisfy a target routing constraint; and mapping the logic circuit onto the extracted underlay, wherein signals in the logic circuit use only routing paths defined by the extracted underlay.

Example 2 is the method of example 1, wherein extracting the underlay optionally comprises accessing a database to obtain information about the routing network.

Example 3 is the method of example 2, wherein extracting the underlay optionally further comprises receiving the target routing constraint, and wherein the target routing constraint comprises a constraint selected from the group consisting of: source coordinates, timing requirements, speed requirements, routing resource type, routing direction, and crosstalk attributes.

Example 4 is the method of any one of examples 1-3, optionally further comprising determining whether the logic circuit is fully mapped onto the extracted underlay.

Example 5 is the method of example 4, optionally further comprising performing additional placement and routing operations on an unmapped portion of the logic circuit in response to determining that the logic circuit cannot be fully mapped onto the extracted underlay.

Example 6 is the method of any one of examples 1-5, optionally further comprising using the extracted underlay in at least one other region of the programmable device.

Example 7 is the method of any one of examples 1-6, wherein the extracted underlay optionally includes a plurality of adjacent programmable logic blocks.

Example 8 is the method of any one of examples 1-7, wherein the extracted underlay optionally includes a plurality of 2:1 datapath reduction operators.

Example 9 is the method of example 8, wherein the plurality of 2:1 datapath reduction operators optionally comprises a plurality of 2:1 multiplexers.

Example 10 is the method of example 8, wherein the plurality of 2:1 datapath reduction operators optionally includes a plurality of adders.

Example 11 is the method of example 8, wherein the plurality of 2:1 datapath reduction operators optionally comprises a plurality of logic gates.

Example 12 is the method of example 8, wherein the plurality of 2:1 data path reduction operators optionally have different ingress and egress modes.

Example 13 is an integrated circuit, comprising: a programmable routing network; and logic circuitry implemented using an underlay extracted from the programmable routing network, wherein the underlay includes routing patterns within the programmable routing network that satisfy target routing constraints.

Example 14 is the integrated circuit of example 13, wherein the underlay optionally includes a plurality of programmable logic blocks.

Example 15 is the integrated circuit of example 13, wherein the underlay optionally includes a plurality of adjacent programmable logic blocks.

Example 16 is the integrated circuit of any one of examples 14-15, wherein at least one of the plurality of programmable logic blocks in the underlay is optionally used to implement a 2:1 datapath reduction operator.

Example 17 is the integrated circuit of example 16, wherein the 2:1 data path reduction operator optionally comprises a 2:1 multiplexer.

Example 18 is the integrated circuit of example 16, wherein the 2:1 data path reduction operator optionally comprises an adder.

Example 19 is the integrated circuit of example 16, wherein the 2:1 data path reduction operator optionally comprises a logic gate.

Example 20 is the integrated circuit of any one of examples 13-19, wherein the target routing constraints optionally include timing constraints.

Example 21 is a non-transitory computer-readable storage medium comprising instructions for: extracting a subset of routing paths in a programmable interconnect structure, wherein the extracted subset of routing paths satisfies a predetermined performance criterion; and mapping an application onto the extracted subset of routing paths.

For example, all optional features of the apparatus described above may also be implemented with reference to the methods or processes described herein. The foregoing is merely illustrative of the principles of the present disclosure and various modifications can be made by those skilled in the art. The above embodiments may be implemented individually or in any combination.
