Multichannel calculation accelerating equipment based on FPGA

Document No.: 974731  Published: 2020-11-03

Reading note: This technology, "Multichannel calculation accelerating equipment based on FPGA" (一种基于fpga的多通道计算加速设备), was designed by Zhu Junhui (诸俊辉), Liu Yiqing (刘一清) and Li Junwei (李俊伟) on 2020-07-06. Abstract: The invention discloses an FPGA-based multichannel computation acceleration device comprising a ten-gigabit Ethernet interface module, an optical port data transceiver module, an arbitration scheduling module, an operation acceleration module, a DDR4 control module, a post-processing module, a monitoring module, a three-speed (10/100/1000 Mbps) Ethernet interface module, a DDR4 memory chip, a power supply module and a clock module. The invention addresses the problem of sharing hardware acceleration resources in edge computing applications. It provides a plurality of ten-gigabit Ethernet interfaces and can receive data from several servers simultaneously. Its resource scheduling mechanism arranges the order in which these data are computed, so that the device's hardware acceleration resources can be shared by multiple servers, improving resource utilization.

1. A multichannel computation acceleration device based on an FPGA, characterized by comprising a ten-gigabit Ethernet interface module, an FPGA module, a three-speed Ethernet interface module, a DDR4 storage chip, a power supply module and a clock module, wherein the ten-gigabit Ethernet interface module is connected with the FPGA module and the power supply module and is used for receiving original operation data from a server and sending back the data operation result;

the FPGA module is connected with the ten-gigabit Ethernet interface module, the three-speed Ethernet interface module, the DDR4 memory chip, the power supply module and the clock module; it receives original operation data from the ten-gigabit Ethernet interface module, arranges a calculation order for the data through an arbitration scheduling algorithm, performs hardware-accelerated operations on the data, packages the operation results into network data packets and sends them to the ten-gigabit Ethernet interface module; the FPGA module also generates a working state data packet containing the usage of FPGA computing resources, the usage of DDR4 storage resources and traffic statistics for each ten-gigabit Ethernet port;

the three-speed Ethernet interface module is connected with the FPGA module, the power supply module and the clock module and is used for sending a working state data packet generated by the FPGA module;

the DDR4 storage chip is connected with the FPGA module, the power supply module and the clock module and is used for storing original operation data from a plurality of servers;

the power supply module is connected with the FPGA module, the three-speed Ethernet interface module, the DDR4 memory chip and the clock module and is used for supplying power to all the modules;

the clock module is connected with the FPGA module, the three-speed Ethernet interface module and the DDR4 memory chip and is used for providing reference clocks for the modules.

2. The multichannel computation acceleration device of claim 1, wherein the ten-gigabit Ethernet interface module communicates with a plurality of servers and comprises a plurality of ten-gigabit communication ports, each comprising an SFP+ ten-gigabit optical module interface and an optical module configuration circuit.

3. The multi-channel computing acceleration device of claim 1, characterized in that: the FPGA module comprises an optical port data transceiver module, an arbitration scheduling module, an operation acceleration module, a DDR4 control module, a post-processing module and a monitoring module;

the optical port data transceiver module is connected with the ten-gigabit Ethernet interface module, the arbitration scheduling module, the post-processing module and the monitoring module; it handles the MAC layer of the ten-gigabit Ethernet link, converts original operation data received from the ten-gigabit Ethernet interface module into a 64-bit-wide AXI-Stream format, adds a port mark identifying the communication port each datum arrived on, and sends the data to the arbitration scheduling module; it also receives operation result packets from the post-processing module, checks each packet's port mark and sends the packet out of the corresponding port of the ten-gigabit Ethernet interface module;

the arbitration scheduling module is connected with the DDR4 control module and the operation acceleration module and allocates and schedules the operation acceleration resources: using an arbitration algorithm, it selects one item of original operation data, either from the optical port data transceiver module or from the per-port data cached in the DDR4 chip, and submits it to the operation acceleration module for computation;

the operation acceleration module is connected with the arbitration scheduling module, the post-processing module and the monitoring module and is used for performing hardware acceleration calculation on data to obtain a data operation result;

the DDR4 control module is connected with the arbitration scheduling module, the DDR4 memory chip and the monitoring module and is used for generating the bus control signals of the DDR4 chip and carrying out data storage and reading on the DDR4 chip; the DDR4 control module divides the DDR4 memory chip into a plurality of address spaces corresponding to the communication ports of the ten-gigabit Ethernet interface module, and a FIFO is established in each address space to store the original operation data packets from the corresponding communication port;

the post-processing module is connected with the operation acceleration module and the optical port data transceiver module; it determines which port each operation acceleration result should be sent to, adds a port mark to the result, and encapsulates it into a ten-gigabit Ethernet data packet;

the monitoring module is connected with the ten-gigabit Ethernet interface module, the operation acceleration module, the optical port data transceiver module and the DDR4 control module and is used for counting the usage of FPGA computing resources, the usage of DDR4 storage resources and the data traffic of each ten-gigabit Ethernet port, packaging the statistics into a working state data packet and sending it to the three-speed Ethernet interface module.

4. The multichannel computation acceleration device of claim 1, wherein the three-speed Ethernet interface module is a monitoring port comprising an RJ45 Ethernet socket and a PHY interface chip, used for connecting to external monitoring equipment through a network cable and sending the working state data packet from the monitoring module.
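Claim 3 has the DDR4 control module split the DDR4 memory chip into one address space per communication port, each holding a FIFO of original operation data packets. The Python sketch below models that arrangement under assumptions that are not in the patent: equal-sized windows, a byte-addressable memory simulated by a `bytearray`, and a ring buffer per window that stores whole packets. The class and method names (`Ddr4PortFifos`, `push`, `pop`) are hypothetical.

```python
from collections import deque
from typing import Optional


class Ddr4PortFifos:
    """Toy model of a DDR4 chip divided into per-port FIFO address spaces.

    Hypothetical sketch: equal-sized windows and ring-buffer FIFOs are
    assumptions, not details taken from the patent.
    """

    def __init__(self, total_bytes: int, num_ports: int) -> None:
        if total_bytes % num_ports:
            raise ValueError("total_bytes must divide evenly among the ports")
        self.mem = bytearray(total_bytes)           # stands in for the DDR4 chip
        self.window = total_bytes // num_ports      # bytes reserved for each port
        self.head = [0] * num_ports                 # read offset inside each window
        self.used = [0] * num_ports                 # bytes currently queued per port
        self.lengths = [deque() for _ in range(num_ports)]  # packet boundaries

    def base_address(self, port: int) -> int:
        """First byte address of the address space assigned to `port`."""
        return port * self.window

    def push(self, port: int, packet: bytes) -> bool:
        """Queue one packet for `port`; return False if its window is full."""
        if self.used[port] + len(packet) > self.window:
            return False
        base = self.base_address(port)
        tail = (self.head[port] + self.used[port]) % self.window
        for byte in packet:                         # write with wrap-around
            self.mem[base + tail] = byte
            tail = (tail + 1) % self.window
        self.used[port] += len(packet)
        self.lengths[port].append(len(packet))
        return True

    def pop(self, port: int) -> Optional[bytes]:
        """Dequeue the oldest packet for `port`, or None if its FIFO is empty."""
        if not self.lengths[port]:
            return None
        size = self.lengths[port].popleft()
        base = self.base_address(port)
        out = bytearray()
        for _ in range(size):                       # read with wrap-around
            out.append(self.mem[base + self.head[port]])
            self.head[port] = (self.head[port] + 1) % self.window
        self.used[port] -= size
        return bytes(out)


if __name__ == "__main__":
    fifos = Ddr4PortFifos(total_bytes=64, num_ports=4)  # 16-byte window per port
    print(fifos.base_address(2))        # 32
    fifos.push(2, b"0123456789")
    print(fifos.pop(2))                 # b'0123456789'
```

Because each port owns a fixed window, a burst from one server can fill only its own FIFO; `push` reports a full window instead of overwriting another port's data. A real controller would also have to handle DDR4 burst alignment, which this model ignores.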

Technical Field

The invention relates to the technical fields of network communication and hardware acceleration, and in particular to an FPGA (field programmable gate array)-based multichannel computation acceleration device dedicated to accelerating computation for a plurality of application servers.

Background Art

With the development of big data, artificial intelligence and related technologies, traditional CPU-centred computing equipment can no longer meet computing requirements such as large data volume, high complexity and low latency. As a result, many GPU-based graphics cards and acceleration boards built around dedicated hardware chips have appeared on the market. These devices help servers perform complex data calculations; because their internal architecture handles complex computation better than a CPU (central processing unit), they raise computing efficiency, reduce computing latency, take load off the server and free up considerable server resources.

However, existing hardware acceleration devices still have shortcomings, especially in resource utilization. Most current hardware acceleration devices, such as GPU graphics cards and acceleration boards, can only be installed in one server and serve only that server. If the server is temporarily idle and has no data to compute, the acceleration device sits unused: its utilization drops sharply and it consumes power to no purpose.

Disclosure of Invention

The invention aims to solve the problem that, in edge computing, artificial intelligence, deep learning and similar applications, traditional hardware acceleration equipment (such as a hardware acceleration board or a hardware acceleration stick) can be used exclusively by only one server, so its computing resources are available to that server alone.

The specific technical scheme for realizing the invention is as follows:

A multichannel computation acceleration device based on an FPGA, characterized by comprising a ten-gigabit Ethernet interface module, an FPGA module, a three-speed Ethernet interface module, a DDR4 storage chip, a power supply module and a clock module, wherein the ten-gigabit Ethernet interface module is connected with the FPGA module and the power supply module and is used for receiving original operation data from a server and sending back the data operation result;

the FPGA module is connected with the ten-gigabit Ethernet interface module, the three-speed Ethernet interface module, the DDR4 memory chip, the power supply module and the clock module; it receives original operation data from the ten-gigabit Ethernet interface module, arranges a calculation order for the data through an arbitration scheduling algorithm, performs hardware-accelerated operations on the data, packages the operation results into network data packets and sends them to the ten-gigabit Ethernet interface module; the FPGA module also generates a working state data packet containing the usage of FPGA computing resources, the usage of DDR4 storage resources and traffic statistics for each ten-gigabit Ethernet port;

the three-speed Ethernet interface module is connected with the FPGA module, the power supply module and the clock module and is used for sending a working state data packet generated by the FPGA module;

the DDR4 storage chip is connected with the FPGA module, the power supply module and the clock module and is used for storing original operation data from a plurality of servers;

the power supply module is connected with the FPGA module, the three-speed Ethernet interface module, the DDR4 memory chip and the clock module and is used for supplying power to all the modules;

the clock module is connected with the FPGA module, the three-speed Ethernet interface module and the DDR4 memory chip and is used for providing reference clocks for the modules.

The ten-gigabit Ethernet interface module communicates with a plurality of servers and comprises a plurality of ten-gigabit communication ports, each comprising an SFP+ ten-gigabit optical module interface and an optical module configuration circuit.

The FPGA module comprises an optical port data transceiver module, an arbitration scheduling module, an operation acceleration module, a DDR4 control module, a post-processing module and a monitoring module;

the optical port data transceiver module is connected with the ten-gigabit Ethernet interface module, the arbitration scheduling module, the post-processing module and the monitoring module; it handles the MAC layer of the ten-gigabit Ethernet link, converts original operation data received from the ten-gigabit Ethernet interface module into a 64-bit-wide AXI-Stream format, adds a port mark identifying the communication port each datum arrived on, and sends the data to the arbitration scheduling module; it also receives operation result packets from the post-processing module, checks each packet's port mark and sends the packet out of the corresponding port of the ten-gigabit Ethernet interface module;

the arbitration scheduling module is connected with the DDR4 control module and the operation acceleration module and allocates and schedules the operation acceleration resources: using an arbitration algorithm, it selects one item of original operation data, either from the optical port data transceiver module or from the per-port data cached in the DDR4 chip, and submits it to the operation acceleration module for computation;

the operation acceleration module is connected with the arbitration scheduling module, the post-processing module and the monitoring module and is used for performing hardware acceleration calculation on data to obtain a data operation result;

the DDR4 control module is connected with the arbitration scheduling module, the DDR4 memory chip and the monitoring module and is used for generating the bus control signals of the DDR4 chip and carrying out data storage and reading on the DDR4 chip; the DDR4 control module divides the DDR4 memory chip into a plurality of address spaces corresponding to the communication ports of the ten-gigabit Ethernet interface module, and a FIFO is established in each address space to store the original operation data packets from the corresponding communication port;

the post-processing module is connected with the operation acceleration module and the optical port data transceiver module; it determines which port each operation acceleration result should be sent to, adds a port mark to the result, and encapsulates it into a ten-gigabit Ethernet data packet;

the monitoring module is connected with the ten-gigabit Ethernet interface module, the operation acceleration module, the optical port data transceiver module and the DDR4 control module and is used for counting the usage of FPGA computing resources, the usage of DDR4 storage resources and the data traffic of each ten-gigabit Ethernet port, packaging the statistics into a working state data packet and sending it to the three-speed Ethernet interface module.
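The patent leaves the arbitration algorithm unspecified. As one possibility, the following Python sketch assumes a round-robin policy across ports: each datum carries the port mark added by the optical port data transceiver module, per-port queues stand in for the DDR4 FIFOs, the arbiter grants one datum at a time to the accelerator, and the post-processing step uses the preserved port mark to route each result back to its originating port. `Tagged`, `RoundRobinArbiter`, `process_all` and the `accelerate` callback are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Tagged:
    """Original operation data plus the port mark added on reception."""
    port: int
    payload: bytes


class RoundRobinArbiter:
    """Grants one queued datum per call, rotating priority across ports.

    Round-robin is an assumed policy; the patent only says 'arbitration algorithm'.
    """

    def __init__(self, num_ports: int) -> None:
        # One queue per port, standing in for the per-port DDR4 FIFOs.
        self.queues: List[Deque[Tagged]] = [deque() for _ in range(num_ports)]
        self.next_port = 0  # port that gets first look at the next grant

    def submit(self, item: Tagged) -> None:
        self.queues[item.port].append(item)

    def grant(self) -> Optional[Tagged]:
        n = len(self.queues)
        for offset in range(n):
            port = (self.next_port + offset) % n
            if self.queues[port]:
                self.next_port = (port + 1) % n  # granted port drops to lowest priority
                return self.queues[port].popleft()
        return None  # nothing waiting on any port


def process_all(arbiter: RoundRobinArbiter,
                accelerate: Callable[[bytes], bytes]) -> Dict[int, List[bytes]]:
    """Drain the arbiter, run each datum through the accelerator and
    route every result back to the port named by its port mark."""
    outputs: Dict[int, List[bytes]] = {p: [] for p in range(len(arbiter.queues))}
    while (job := arbiter.grant()) is not None:
        result = Tagged(job.port, accelerate(job.payload))  # mark travels with the result
        outputs[result.port].append(result.payload)          # post-processing: pick egress port
    return outputs


if __name__ == "__main__":
    arb = RoundRobinArbiter(num_ports=4)
    for item in (Tagged(0, b"a0"), Tagged(0, b"a1"), Tagged(2, b"c0")):
        arb.submit(item)
    print(process_all(arb, bytes.upper))
    # {0: [b'A0', b'A1'], 1: [], 2: [b'C0'], 3: []}
```

Port 0's two requests do not run back-to-back: the grant order is a0, c0, a1, which is how round-robin keeps one busy server from monopolizing the shared accelerator. A weighted or priority-based arbiter could be swapped in without changing the port-mark routing.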

The three-speed Ethernet interface module is a monitoring port comprising an RJ45 Ethernet socket and a PHY interface chip; it connects to external monitoring equipment through a network cable and sends the working state data packet from the monitoring module.
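The patent lists what the working state data packet reports (FPGA computing resource usage, DDR4 storage usage and per-port traffic) but not its byte layout. The Python sketch below assumes one possible layout, packed in network byte order with the standard `struct` module: a one-byte FPGA usage percentage, a one-byte port count, an eight-byte DDR4 byte count, then a pair of eight-byte receive/transmit byte counters per ten-gigabit port. The field order, widths and the names `WorkingState`, `encode_state` and `decode_state` are all hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout, network byte order, no padding.
HEADER = struct.Struct("!BBQ")  # FPGA usage %, number of ports, DDR4 bytes in use
PORT = struct.Struct("!QQ")     # per-port received bytes, transmitted bytes


@dataclass
class WorkingState:
    fpga_usage_pct: int
    ddr4_used_bytes: int
    port_traffic: List[Tuple[int, int]]  # (rx_bytes, tx_bytes) for each port


def encode_state(state: WorkingState) -> bytes:
    """Serialize the statistics into a payload for the monitoring port."""
    parts = [HEADER.pack(state.fpga_usage_pct, len(state.port_traffic),
                         state.ddr4_used_bytes)]
    parts += [PORT.pack(rx, tx) for rx, tx in state.port_traffic]
    return b"".join(parts)


def decode_state(data: bytes) -> WorkingState:
    """Inverse of encode_state, as a monitoring host might implement it."""
    usage, num_ports, ddr4_used = HEADER.unpack_from(data, 0)
    traffic = [PORT.unpack_from(data, HEADER.size + i * PORT.size)
               for i in range(num_ports)]
    return WorkingState(usage, ddr4_used, traffic)


if __name__ == "__main__":
    state = WorkingState(fpga_usage_pct=37, ddr4_used_bytes=3 << 30,
                         port_traffic=[(1000, 800), (0, 0), (52, 52), (7, 9)])
    payload = encode_state(state)
    print(len(payload))                    # 74: 10-byte header + 4 x 16 bytes
    print(decode_state(payload) == state)  # True
```

Carrying the port count in the header lets the same decoder handle boards with different numbers of ten-gigabit ports. How the payload is framed on the three-speed link (for example inside a UDP datagram) is likewise an assumption the patent does not settle.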

Compared with the prior art, the invention has the beneficial effects that:

(1) The device has a plurality of data transmission ports and uses an internal arbitration scheduling mechanism, so it can provide hardware-accelerated computation for multiple servers. This lets the servers share computation acceleration resources, greatly improves the utilization of those resources, lowers the cost of hardware acceleration for the servers, and improves the economic return of computation acceleration.

(2) The device uses a plurality of ten-gigabit Ethernet interfaces for data transmission between the servers and the board; each interface communicates at up to 10 Gbps, which improves data transmission efficiency and keeps the latency of the whole hardware acceleration process low.

(3) The device uses an FPGA as its core chip; its strong parallel data-processing capability allows deeper acceleration of data calculations.
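The 10 Gbps per-interface figure can be related to the 64-bit AXI-Stream width used inside the FPGA module. The patent gives neither the stream clock nor the number of ports, so the Python arithmetic below is only illustrative: it computes the minimum clock a 64-bit bus needs to keep up with one 10 Gbps port (156.25 MHz, which matches the clock commonly paired with 64-bit 10 Gigabit Ethernet MAC interfaces), the number of 64-bit beats a 1500-byte frame occupies, and the aggregate rate for a hypothetical four-port configuration. The function names are hypothetical.

```python
import math


def min_stream_clock_hz(line_rate_bps: float, bus_width_bits: int) -> float:
    """Lowest clock at which a bus of this width carries line_rate_bps."""
    return line_rate_bps / bus_width_bits


def beats_per_frame(frame_bytes: int, bus_width_bits: int = 64) -> int:
    """AXI-Stream transfers needed for one frame (the last beat may be partial)."""
    return math.ceil(frame_bytes / (bus_width_bits // 8))


def aggregate_rate_bps(num_ports: int, per_port_bps: float = 10e9) -> float:
    """Total ingress bandwidth the arbitration logic would have to absorb."""
    return num_ports * per_port_bps


if __name__ == "__main__":
    print(min_stream_clock_hz(10e9, 64) / 1e6)  # 156.25 (MHz)
    print(beats_per_frame(1500))                # 188
    print(aggregate_rate_bps(4) / 1e9)          # 40.0 (Gbps, hypothetical 4 ports)
```

If several ports together deliver more data than the operation acceleration module can consume, the excess presumably has to wait in the per-port DDR4 FIFOs, which is why the aggregate figure, rather than the per-port one, sizes the buffering.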

Drawings

FIG. 1 is a block diagram of an embodiment of the present invention;

FIG. 2 is a flowchart illustrating the operation of data reception according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating the operation of data processing according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an application system in which the invention is connected.

Detailed Description

The invention is described in detail below with reference to the figures and examples.
