Reconfigurable computing device

Document No.: 197379  Published: 2021-11-02

Reading note: The technology "Reconfigurable computing device" was designed and created by R·E·杜贝 on 2020-03-02. Its main content is as follows: The reconfigurable computing device includes a plurality of computing blocks. Each computing block includes a reconfigurable processing element and a network fabric interface device configured to communicate over a network fabric. The reconfigurable processing element operates on data received from the data I/O interface and/or data received via the network fabric interface device.

1. A reconfigurable computing device, comprising:

a housing;

a network fabric interface disposed on the housing;

a data I/O interface disposed on the housing; and

a first computing block disposed in the housing,

wherein the first computing block comprises:

a network fabric interface device coupled to the network fabric interface and configured to send data to and receive data from a network fabric; and

a Reconfigurable Processing Element (RPE) coupled to at least one of the network fabric interface device and the data I/O interface and configured to process input data received from at least one of the network fabric interface device and the data I/O interface and to provide output data to at least one of the network fabric interface device and the data I/O interface, wherein the output data is a function of the received input data.

2. The reconfigurable computing device of claim 1, further comprising:

a daisy chain port disposed on the housing and coupled to the first computing block,

wherein the daisy chain port is operable to couple the first computing block to a computing block on another reconfigurable computing device.

3. The reconfigurable computing device of claim 1, wherein the network fabric interface device is an ASIC.

4. The reconfigurable computing device of claim 1, wherein the RPE is a Field Programmable Gate Array (FPGA).

5. The reconfigurable computing device of claim 1, wherein the first computing block includes a first RPE and a second RPE, and

wherein the first RPE has a first configuration and the second RPE has a second configuration different from the first configuration.

6. The reconfigurable computing device of claim 1, further comprising:

a second computing block,

wherein the RPE on the first computing block has a first configuration and the RPE on the second computing block has a second configuration different from the first configuration.

7. The reconfigurable computing device of claim 1, further comprising:

a first jumper port disposed on the housing and coupled to the first computing block;

a second computing block; and

a second jumper port disposed on the housing and coupled to the second computing block,

wherein the first and second jumper ports are coupleable to each other to couple the first and second computing blocks together.

8. The reconfigurable computing device of claim 7, further comprising:

a jumper cable having a first end coupled to the first jumper port and a second end coupled to the second jumper port.

9. A computing cluster, comprising:

a rack;

a plurality of reconfigurable computing devices (RCAs) mounted in the rack,

wherein each RCA comprises:

a housing;

a network fabric interface disposed on the housing;

a data I/O interface disposed on the housing; and

a plurality of computing blocks disposed in the housing,

wherein each computing block comprises:

a network fabric interface device coupled to the network fabric interface and configured to send data to and receive data from a network fabric; and

a Reconfigurable Processing Element (RPE) coupled to at least one of the network fabric interface device and the data I/O interface and configured to process input data received from at least one of the network fabric interface device and the data I/O interface and to provide output data to at least one of the network fabric interface device and the data I/O interface, wherein the output data is a function of the received input data.

10. The computing cluster of claim 9, wherein each RCA further comprises:

a daisy chain port disposed on the housing and coupled to one of the plurality of computing blocks,

wherein the daisy chain port is operable to couple the one computing block to a computing block on another RCA.

11. The computing cluster of claim 9, wherein at least one network fabric interface device is an ASIC.

12. The computing cluster of claim 9, wherein at least one RPE is a Field Programmable Gate Array (FPGA).

13. The computing cluster of claim 9, wherein at least one computing block comprises a first RPE and a second RPE, and

wherein the first RPE has a first configuration and the second RPE has a second configuration different from the first configuration.

14. The computing cluster of claim 9, wherein at least one RCA comprises:

a first computing block and a second computing block,

wherein the RPE on the first computing block has a first configuration and the RPE on the second computing block has a second configuration different from the first configuration.

15. The computing cluster of claim 9, wherein at least one RCA further comprises:

a first computing block and a second computing block;

a first jumper port disposed on the housing and coupled to the first computing block; and

a second jumper port disposed on the housing and coupled to the second computing block,

wherein the first and second jumper ports are coupleable to each other to couple the first and second computing blocks together.

16. The computing cluster of claim 15, wherein the at least one RCA further comprises:

a jumper cable having a first end coupled to the first jumper port and a second end coupled to the second jumper port.

Background

Programmable elements, such as Field Programmable Gate Arrays (FPGAs), are used for High Performance Computing (HPC) tasks. However, there is no convenient way to package these components in a traditional HPC form factor. Furthermore, there is no integrated mechanism to efficiently ingest large, high-speed data streams and then efficiently transfer the processing results back and forth across the HPC cluster network fabric. These limitations make it difficult to exploit the capabilities of non-general-purpose computing elements (e.g., FPGAs) for stream computing in an HPC or clustered computing environment.

Currently, stream computing requires either large quantities of commercial off-the-shelf (COTS) hardware or custom hardware that relies on inefficient integration schemes to perform HPC tasks. The most common way to integrate programmable logic or FPGAs into HPC clusters is to package them as plug-in boards for COTS rack servers. These approaches have limited I/O capacity and no cluster fabric integration capability. OpenVPX (and other form factor) circuit card assemblies deployed in card-cage embedded computing environments cannot efficiently handle a large number of externally connected I/Os together with high-performance cluster fabric interfaces.

There is a need for improvements in deploying programmable logic elements in a clustered computing environment.

Disclosure of Invention

In one aspect of the invention, a reconfigurable computing device (RCA) comprises: a housing; a network fabric interface disposed on the housing; a data I/O interface disposed on the housing; and a first computing block disposed in the housing, wherein the first computing block includes: a network fabric interface device coupled to the network fabric interface and configured to transmit data to and receive data from a network fabric; and a Reconfigurable Processing Element (RPE) coupled to at least one of the network fabric interface device and the data I/O interface and configured to process input data received from at least one of the network fabric interface device and the data I/O interface and to provide output data to at least one of the network fabric interface device and the data I/O interface, wherein the output data is a function of the received input data.

The RCA may further include a daisy chain port disposed on the housing and coupled to the first computing block, wherein the daisy chain port is operable to couple the first computing block to a computing block on another RCA.

In another aspect of the invention, a computing cluster is described, comprising: a rack; a plurality of reconfigurable computing devices (RCAs) mounted in the rack, wherein each RCA comprises: a housing; a network fabric interface disposed on the housing; a data I/O interface disposed on the housing; and a plurality of computing blocks disposed in the housing, wherein each computing block includes: a network fabric interface device coupled to the network fabric interface and configured to transmit data to and receive data from the network fabric; and a Reconfigurable Processing Element (RPE) coupled to at least one of the network fabric interface device and the data I/O interface and configured to process input data received from at least one of the network fabric interface device and the data I/O interface and to provide output data to at least one of the network fabric interface device and the data I/O interface, wherein the output data is a function of the received input data.

Drawings

Various aspects of the invention are discussed herein with reference to the figures. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn accurately or to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity, or several physical components may be included in one functional block or element. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. However, for purposes of clarity, not every component may be labeled in every drawing. The drawings are provided for purposes of illustration and explanation and are not intended as a definition of the limits of the invention. In the drawings:

FIG. 1 is a perspective view of a reconfigurable computing device according to an aspect of the present invention;

FIG. 2 is a schematic diagram of the reconfigurable computing device of FIG. 1; and

FIG. 3 is a functional block diagram of a compute block in accordance with an aspect of the present invention.

Detailed Description

In the following detailed description, details are set forth in order to provide a thorough understanding of various aspects of the invention. It will be understood by one of ordinary skill in the art that aspects may be practiced without some of these specific details. In other instances, well-known methods, procedures, components, and structures may not have been described in detail so as not to obscure aspects of the invention.

It is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components or steps set forth in the following description or illustrated in the drawings, since it is capable of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

For clarity, certain features are described in the context of separate implementations, but may also be provided in combination in a single implementation. Conversely, various features that are, for brevity, described in the context of a single implementation, may also be provided separately or in any suitable subcombination.

In one aspect of the invention, a reconfigurable computing device (RCA) packages reconfigurable processing elements or programmable logic devices, such as but not limited to FPGAs, in a form factor compatible with commercially available, i.e., standardized, 19-inch racks. HPC deployments can use the familiar 19-inch rack form factor to simplify integration. Other HPC implementations, such as those in land mobile environments, may use different rack form factors, or may not be rack-mounted at all, and may require liquid cooling and/or ruggedized packaging options.

Advantageously, according to aspects of the present invention, the RCA provides I/O, e.g., 10, 40, or 100 Gigabit Ethernet, that gives the programmable logic direct access to the data to be processed. This allows high-speed data streams to be processed efficiently (stream computing) using non-general-purpose processing elements. The number of programmable elements can be scaled to the number of I/Os the processing requires. The I/O links within and between one or more RCA units may be connected in various configurations, depending on the desired programmable function.
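Scaling the number of programmable elements to the required I/O is simple arithmetic. The sketch below shows one way to size an RCA, assuming each RPE can terminate a fixed number of data I/O ports; `rpes_needed`, `aggregate_ingest_gbps`, and the per-RPE port count are hypothetical, since the description gives no figures.

```python
def rpes_needed(io_ports_required: int, ports_per_rpe: int) -> int:
    """Smallest number of RPEs whose combined data I/O ports cover the requirement.

    ports_per_rpe is a hypothetical per-element capacity; the description gives no value.
    """
    if io_ports_required < 0 or ports_per_rpe <= 0:
        raise ValueError("need io_ports_required >= 0 and ports_per_rpe > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-io_ports_required // ports_per_rpe)


def aggregate_ingest_gbps(port_speeds_gbps: list[int]) -> int:
    """Total line rate of the data I/O ports, e.g. a mix of 10/40/100 Gigabit Ethernet links."""
    return sum(port_speeds_gbps)


# Example: 12 external links, assuming 4 usable data ports per RPE.
print(rpes_needed(12, 4))                         # 3
print(rpes_needed(13, 4))                         # 4
print(aggregate_ingest_gbps([100, 100, 40, 10]))  # 250
```

The aggregate line rate is only an upper bound on ingest; sustained throughput depends on the logic each RPE is configured with.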

The programmable elements within the RCA include connections to HPC cluster fabrics that support Remote Direct Memory Access (RDMA), such as InfiniBand, RoCE, Ethernet, or Omnipath. These connections allow high-rate, low-latency data transfer between the RCA programmable logic elements and other HPC processing resources.

Referring now to FIG. 1, in accordance with an aspect of the present invention, a reconfigurable computing device (RCA) 100 includes a housing 104, e.g., a housing sized for a standard 19-inch rack. The front panel 108 includes: a plurality of bi-directional front-end data input/output (I/O) interfaces or ports 112, provided for receiving input data from, or providing output data to, one or more sources external to the RCA 100; and a plurality of network fabric I/O ports 116, also bi-directional, provided for coupling to the HPC network fabric. In one approach, the data I/O ports 112 may be configured to operate with 10 Gigabit Ethernet. The HPC network fabric may be, for example but not limited to, an HPC cluster fabric that supports Remote Direct Memory Access (RDMA), such as InfiniBand, RoCE, Ethernet, or Omnipath. The front panel 108 can also include at least one control port 120, one or more jumper ports 124, one or more daisy-chain ports 128, and an on/off switch 132. Each of the control port 120, the jumper ports 124, and the daisy-chain ports 128 is bi-directional. In one aspect of the invention, some of the network fabric I/O ports 116, data I/O ports 112, control ports 120, jumper ports 124, daisy-chain ports 128, or the on/off switch 132 may instead be disposed on the back portion 130 of the housing 104.
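As a reading aid, the port inventory of FIG. 1 can be captured as a small data model. This is a hypothetical sketch: the port counts are invented for illustration, the enum values simply reuse the reference numerals, and nothing here reflects real firmware or a management API.

```python
from dataclasses import dataclass
from enum import Enum


class PortKind(Enum):
    """Port types on the RCA 100; values reuse the reference numerals from FIG. 1."""
    DATA_IO = 112
    FABRIC_IO = 116
    CONTROL = 120
    JUMPER = 124
    DAISY_CHAIN = 128


@dataclass(frozen=True)
class Port:
    kind: PortKind
    index: int
    location: str               # "front" (front panel 108) or "back" (back portion 130)
    bidirectional: bool = True  # every port type above is described as bi-directional


def build_panel(counts: dict[PortKind, int], on_back=frozenset()) -> list[Port]:
    """Lay out ports; kinds listed in on_back go on the back portion instead of the front."""
    ports = []
    for kind, n in counts.items():
        location = "back" if kind in on_back else "front"
        ports.extend(Port(kind, i, location) for i in range(n))
    return ports


# Hypothetical port counts; the description does not specify how many of each.
panel = build_panel(
    {PortKind.DATA_IO: 8, PortKind.FABRIC_IO: 4, PortKind.CONTROL: 1,
     PortKind.JUMPER: 4, PortKind.DAISY_CHAIN: 2},
    on_back={PortKind.DAISY_CHAIN},
)
print(len(panel))                                # 19
print(sum(p.location == "back" for p in panel))  # 2
print(all(p.bidirectional for p in panel))       # True
```

The on/off switch 132 is left out because it is a control, not a data path.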

One or more computing blocks 136 are provided within the housing 104, the details of which are described below. Each of the compute blocks 136 may be coupled to one or more of the data I/O ports 112, fabric I/O ports 116, and jumper ports 124, as shown in FIG. 2. It should be noted that the RCA 100 may be configured such that a given computing block 136 is: coupled to neither the data I/O port 112 nor the fabric I/O port 116 (i.e., coupled only to another compute block 136); coupled to only one of the data I/O port 112 and the fabric I/O port 116, and/or to another computing block 136; or coupled to both the data I/O port 112 and the fabric I/O port 116, and/or to another computing block 136. Each computing block 136 may be coupled to, or decoupled from, another computing block 136 within the RCA 100 by connecting or disconnecting an appropriate jumper cable 204 at the jumper ports 124. Jumper cables 204 and jumper ports 124 provide a path for transferring data between compute blocks 136. Thus, the external jumper cables 204 allow the RCA to be reconfigured and customized for different use cases.
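The coupling cases above, and the way jumper cables 204 re-plumb blocks, can be modeled as a small graph. In this sketch `ComputeBlock`, `connect`, `disconnect`, and `coupling` are hypothetical names, and a jumper cable is treated as an undirected link between the jumper ports 124 of two blocks.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeBlock:
    """Hypothetical model of a compute block 136 and its external couplings."""
    name: str
    data_io: bool = False    # coupled to a data I/O port 112
    fabric_io: bool = False  # coupled to a fabric I/O port 116
    peers: set[str] = field(default_factory=set)  # blocks reached via jumper cables 204


def connect(a: ComputeBlock, b: ComputeBlock) -> None:
    """Plug a jumper cable 204 between the jumper ports 124 of two blocks."""
    a.peers.add(b.name)
    b.peers.add(a.name)


def disconnect(a: ComputeBlock, b: ComputeBlock) -> None:
    """Unplug the jumper cable between two blocks (no-op if none is present)."""
    a.peers.discard(b.name)
    b.peers.discard(a.name)


def coupling(block: ComputeBlock) -> str:
    """Classify a block into the coupling cases listed in the description."""
    if block.data_io and block.fabric_io:
        external = "data I/O and fabric"
    elif block.data_io:
        external = "data I/O only"
    elif block.fabric_io:
        external = "fabric only"
    else:
        # Neither external port: the block is useful only if jumpered to another block.
        return "blocks only" if block.peers else "uncoupled"
    return external + (" + blocks" if block.peers else "")


# Example: an ingest -> process -> egress pipeline built with two jumper cables.
ingest = ComputeBlock("A", data_io=True)
middle = ComputeBlock("B")
egress = ComputeBlock("C", fabric_io=True)
connect(ingest, middle)
connect(middle, egress)
print([coupling(b) for b in (ingest, middle, egress)])
# ['data I/O only + blocks', 'blocks only', 'fabric only + blocks']

disconnect(middle, egress)  # re-cable for a different use case
print(coupling(egress))     # fabric only
```

Treating cables as edges keeps re-cabling a cheap data change, which mirrors the point of the external jumper cables.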

A computing block may also be connected to another computing block through a connection 220 that is internal to the RCA 100, i.e., one that does not involve a connection external to the housing 104.

In addition, one RCA 100 may be coupled to another RCA 100 via the daisy-chain ports 128 and appropriate cabling 208. Those of ordinary skill in the art will understand that, although not shown, other devices (e.g., power supplies, fans, etc.) and any corresponding support devices needed to operate the computing blocks are also present in the RCA 100. However, these other devices are not germane to aspects of the present invention.
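Daisy-chaining RCAs forms a larger topology. Assuming, hypothetically, that data can be relayed from RCA to RCA along the chain (the description only states that one RCA couples to another), the hop count between two RCAs is a breadth-first search over the cabling 208. The function name and RCA labels below are illustrative only.

```python
from collections import deque


def daisy_chain_hops(cables, src, dst):
    """Breadth-first search over daisy-chain cabling 208 between RCAs.

    cables: iterable of (rca_a, rca_b) pairs, each a cable between daisy-chain ports 128.
    Returns the number of RCA-to-RCA hops from src to dst, or None if unreachable.
    """
    adj = {}
    for a, b in cables:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical rack of four RCAs cabled in a simple chain: rca0-rca1-rca2-rca3
chain = [("rca0", "rca1"), ("rca1", "rca2"), ("rca2", "rca3")]
print(daisy_chain_hops(chain, "rca0", "rca3"))  # 3
print(daisy_chain_hops(chain, "rca0", "rca9"))  # None
```

In a pure chain the hop count grows linearly with distance, which is one reason bulk traffic would more naturally travel over the network fabric.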

Referring now to FIG. 3, each compute block 136 includes at least one Reconfigurable Processing Element (RPE) 304, such as, but not limited to, an FPGA. RPE 304 is coupled to a corresponding Network Fabric Interface (NFI) device 308, which is configured to interface with the HPC network fabric. In one non-limiting example, NFI device 308 is an Application Specific Integrated Circuit (ASIC) provided and configured according to known techniques to serve as an interface to the HPC network fabric.

A support controller 312 may also be incorporated into each compute block 136 and coupled to the RPE 304 and the NFI device (ASIC) 308. The support controller 312 on each computing block is typically programmed to coordinate, among other tasks, the operation of the RPE 304 and the ASIC 308, and to communicate with other computing blocks in the RCA 100. As known to those of ordinary skill in the art, the support controller 312 may include a CPU, ROM, RAM, I/O interfaces, and the like.
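The description leaves the support controller's coordination logic open. The following is one hypothetical bring-up sequence written as a small state machine. The ordering (configure RPE 304, then bring up NFI device 308, then start streaming), the `SupportController` class, and its method names are all assumptions for illustration, not the patented method.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    RPE_CONFIGURED = auto()
    FABRIC_UP = auto()
    STREAMING = auto()


class SupportController:
    """Hypothetical coordination logic of a support controller 312 for one compute block 136."""

    # Assumed bring-up order: each state may only advance to the next one listed here.
    _NEXT = {
        State.RESET: State.RPE_CONFIGURED,
        State.RPE_CONFIGURED: State.FABRIC_UP,
        State.FABRIC_UP: State.STREAMING,
    }

    def __init__(self, block_id: str) -> None:
        self.block_id = block_id
        self.state = State.RESET
        self.log: list[str] = []

    def _advance(self, target: State, action: str) -> None:
        # Reject any step taken out of order, e.g. streaming before the RPE is configured.
        if self._NEXT.get(self.state) is not target:
            raise RuntimeError(f"{self.block_id}: cannot {action} from {self.state.name}")
        self.state = target
        self.log.append(action)

    def configure_rpe(self, bitstream: bytes) -> None:
        """Load a configuration into RPE 304 (placeholder: only checks it is non-empty)."""
        if not bitstream:
            raise ValueError("empty configuration image")
        self._advance(State.RPE_CONFIGURED, "configure_rpe")

    def bring_up_fabric(self) -> None:
        """Bring up the link on NFI device 308 (placeholder: records the step only)."""
        self._advance(State.FABRIC_UP, "bring_up_fabric")

    def start_stream(self) -> None:
        """Enable data flow through the block (placeholder: records the step only)."""
        self._advance(State.STREAMING, "start_stream")

    def reset(self) -> None:
        self.state = State.RESET
        self.log.append("reset")


ctl = SupportController("block-0")
ctl.configure_rpe(b"\x00\x01")  # placeholder bytes, not a real FPGA bitstream
ctl.bring_up_fabric()
ctl.start_stream()
print(ctl.state.name)  # STREAMING
```

Encoding the order as an explicit transition table makes an out-of-order request (for example, streaming before configuration) fail loudly instead of silently.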

In one approach, RPE 304 is programmed via the front panel connection discussed above using known techniques, and provides a processing throughput that may exceed that of software-based solutions. Alternatively, RPE 304 may be programmed through an interface to the support controller 312, from a pre-programmed memory, or through another interface, such as one compliant with the Joint Test Action Group (JTAG) industry standard. On a computing block 136 that includes multiple RPEs, each RPE may be programmed with the same configuration, or each RPE may be programmed with a configuration different from those of the other RPEs on the computing block 136.

Furthermore, in an RCA 100 having a plurality of computing blocks 136, the programming (i.e., configuration) of the RPEs of one computing block 136 may differ from that of the RPEs of another computing block 136. Advantageously, each computing block 136 may thus be tailored to its intended operation.
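Per-RPE and per-block configurations can be tracked as a map from each block to the list of configurations loaded on its RPEs. This hypothetical sketch reports which blocks are uniform or mixed, whether blocks differ from one another, and which RPEs must be reloaded to reach a target layout; configuration names such as `"fft"` are placeholders, not real bitstreams.

```python
def summarize(rca: dict[str, list[str]]) -> tuple[dict[str, str], bool]:
    """Report per-block uniformity and whether any two blocks are programmed differently.

    rca maps a block name to the configuration ids loaded on its RPEs, in RPE order.
    """
    per_block = {}
    for block, cfgs in rca.items():
        if not cfgs:
            raise ValueError(f"{block} has no RPEs")
        per_block[block] = "uniform" if len(set(cfgs)) == 1 else "mixed"
    # Compare whole per-RPE layouts, so order within a block matters.
    blocks_differ = len({tuple(cfgs) for cfgs in rca.values()}) > 1
    return per_block, blocks_differ


def reprogram_plan(current, target):
    """List the (block, rpe_index, new_config) loads needed to move from current to target.

    Assumes the number of RPEs per block is fixed by hardware, so target lists are not shorter.
    """
    plan = []
    for block, cfgs in target.items():
        old = current.get(block, [])
        for i, cfg in enumerate(cfgs):
            if i >= len(old) or old[i] != cfg:
                plan.append((block, i, cfg))
    return plan


current = {"block0": ["fft", "fft"], "block1": ["fft", "fft"]}
target = {"block0": ["fft", "fft"], "block1": ["fft", "beamform"]}
print(summarize(current))  # ({'block0': 'uniform', 'block1': 'uniform'}, False)
print(summarize(target))   # ({'block0': 'uniform', 'block1': 'mixed'}, True)
print(reprogram_plan(current, target))  # [('block1', 1, 'beamform')]
```

Only the RPEs whose configuration actually changes appear in the plan, so retargeting one block leaves the others untouched.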

Accordingly, aspects of the present invention present programmable logic, via computing blocks and their RPEs, as native participants on the computing fabric, in the same manner as general-purpose computing resources. The reconfigurable I/O mechanism, available through the cabling (plug) configuration of the I/O ports, provides the flexibility to meet different processing requirements, such as radar signal processing and machine learning, to name a few, by applying a set of programmable elements in the RCA. In addition, the tray, module, or rack form factor facilitates application and integration into COTS or embedded computing infrastructure.

Aspects of the above-described systems and methods may be implemented in digital electronic circuitry, computer hardware, firmware, and/or software, in any combination or subcombination. The implementation may be, for example, a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., a machine-readable storage device, for controlling the operation of a data processing apparatus, e.g., a programmable processor, a computer, and/or multiple computers.

A computer program can be written in any form of programming language, including compiled and/or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a subroutine, element, and/or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site.

It should be understood that the present invention has been described using non-limiting detailed descriptions of various aspects of the invention that are provided by way of example only and are not intended to limit the scope of the invention. Features and/or steps described in relation to one aspect may be used with other aspects and not all aspects of the invention may have all of the features and/or steps shown in a particular figure or described in relation to one of these aspects. Variations of the described aspects will occur to those of skill in the art.

It should be noted that some of the above-described aspects include structure, acts, or details of structure and acts, which may not be essential to the invention and which have been described as examples. Structures and/or acts described herein may be replaced by equivalents that perform the same function, even if the structures or acts are different, as is known in the art, e.g., by using multiple dedicated devices to perform at least some of the functions described as being performed by the processor of the present invention. The scope of the invention is therefore intended to be limited solely by the elements and limitations as used in the claims.

Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that the particular aspects shown and described by way of illustration are in no way intended to be considered limiting. In addition, the subject matter has been described with reference to specific aspects, but variations within the spirit and scope of the invention will occur to those skilled in the art. It should be noted that the foregoing examples have been provided merely for the purpose of explanation and are in no way to be construed as limiting of the present invention.

Although the invention has been described herein with reference to particular means, materials and aspects, the invention is not intended to be limited to the particulars disclosed herein; rather, the invention extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims. Various modifications and changes to the disclosed implementations may occur to those skilled in the art without departing from the scope of the invention.
