Artificial intelligence data processing method and related device

Document No.: 1815842 · Publication date: 2021-11-09

Note: This technology, "Artificial intelligence data processing method and related device" (一种人工智能的数据处理方法及相关装置), was designed and created by Lin Zhengwei (林正伟) on 2021-06-30. Abstract: The application discloses an artificial intelligence data processing method, comprising: performing artificial intelligence data computation on N GPUs to obtain raw data; sending the raw data to M slave CPUs and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data, where M is less than or equal to N; and sending the preprocessed data from the M slave CPUs to a master CPU through a PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data. This maintains the data processing effect while reducing processing latency and improving processing efficiency. The application also discloses an artificial intelligence data processing apparatus, a server, and a computer-readable storage medium, which have the same beneficial effects.

1. An artificial intelligence data processing method, comprising:

performing artificial intelligence data computation on N GPUs to obtain raw data;

sending the raw data to M slave CPUs, and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data; wherein M is less than or equal to N;

and sending the preprocessed data from the M slave CPUs to a master CPU through a PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data.

2. The artificial intelligence data processing method of claim 1, wherein sending the raw data to M slave CPUs, and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data comprises:

sending the raw data from the GPUs to the corresponding slave CPUs through the PCIe ports of the slave CPUs;

and controlling each slave CPU to preprocess the received raw data to obtain the preprocessed data.

3. The artificial intelligence data processing method of claim 1, wherein each of the slave CPUs is connected to the master CPU through the PCIe switch.

4. The artificial intelligence data processing method of claim 1, wherein each of the GPUs is directly connected to a corresponding slave CPU through a PCIe port.

5. The artificial intelligence data processing method of claim 1, wherein controlling each of the slave CPUs to preprocess the received raw data to obtain preprocessed data comprises:

and controlling each slave CPU to perform data simplification on the received raw data to obtain the preprocessed data.

6. The artificial intelligence data processing method of claim 1, wherein controlling the master CPU to process the preprocessed data to obtain target data comprises:

and controlling the master CPU to perform an integration computation on the received preprocessed data to obtain the target data.

7. An artificial intelligence data processing apparatus, comprising:

the GPU calculation module is used for performing artificial intelligence data computation on the N GPUs to obtain raw data;

the slave CPU calculation module is used for sending the raw data to M slave CPUs and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data; wherein M is less than or equal to N;

and the main CPU computing module is used for sending the preprocessed data from the M slave CPUs to a master CPU through a PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data.

8. The artificial intelligence data processing apparatus of claim 7, wherein the slave CPU computing module comprises:

the raw data sending unit is used for sending the raw data from the GPUs to the corresponding slave CPUs through the PCIe port of each slave CPU;

and the raw data preprocessing unit is used for controlling each slave CPU to preprocess the received raw data to obtain the preprocessed data.

9. A server, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the artificial intelligence data processing method according to any one of claims 1 to 6 when executing said computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the artificial intelligence data processing method according to any one of claims 1 to 6.

Technical Field

The present application relates to the field of computer technologies, and in particular, to an artificial intelligence data processing method, an artificial intelligence data processing apparatus, a server, and a computer-readable storage medium.

Background

With the continuous development of information technology, AI (Artificial Intelligence) servers are frequently used to perform artificial intelligence computation, and GPU (Graphics Processing Unit) cards are widely used in AI servers for this purpose. Such a system needs to carry multiple GPU cards, but the number of PCIe (Peripheral Component Interconnect Express) ports on each CPU (Central Processing Unit) is limited; if only the CPU's PCIe ports are used to connect GPU cards, the total number of GPU cards the system can accommodate is limited. To expand the number of GPU cards in the server, additional PCIe ports are provided through a PCIe Switch, so that the system can connect more GPU cards and execute more artificial intelligence operations.

In the related art, multiple PCIe switches are used in the server to expand the number of PCIe ports: each PCIe Switch connects upstream to the CPU and exposes additional downstream PCIe ports for GPU cards. Only one PCIe bus connects the CPU to the PCIe Switch, while the PCIe Switch connects to, for example, five GPUs through five PCIe buses, so the upstream and downstream bandwidths of the PCIe Switch are unbalanced. The raw data computed by each GPU card must be transmitted to the CPU through the PCIe Switch; because of this bandwidth imbalance, a large amount of raw data can keep the PCIe link between the CPU and the PCIe Switch continuously saturated after GPU operations, causing computation delay and reducing computation efficiency.
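The bottleneck described above can be sketched numerically. The link bandwidth figure and port counts below are illustrative assumptions, not values from this application:

```python
# Illustrative sketch (numbers are assumptions, not from the application):
# a PCIe switch with a single x16 uplink to the CPU and five x16 downlinks
# to GPU cards oversubscribes the uplink 5:1 when all GPUs stream raw data.

PCIE_GEN4_X16_GBPS = 31.5  # approximate usable bandwidth of one PCIe 4.0 x16 link

def uplink_oversubscription(num_downlinks: int, num_uplinks: int = 1) -> float:
    """Ratio of aggregate downstream demand to upstream capacity."""
    return num_downlinks / num_uplinks

ratio = uplink_oversubscription(5)   # uplink is the bottleneck when > 1
demand = 5 * PCIE_GEN4_X16_GBPS      # aggregate downstream GPU demand, GB/s
print(ratio, demand, PCIE_GEN4_X16_GBPS)
```

With one uplink shared by five downlinks the oversubscription ratio is 5:1, which is the imbalance the application attributes to the related art.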

Therefore, how to reduce operation delay and improve computation efficiency is a key concern for those skilled in the art.

Disclosure of Invention

The application aims to provide an artificial intelligence data processing method, an artificial intelligence data processing device, a server and a computer readable storage medium, so as to reduce the time delay of the artificial intelligence data processing method and improve the data processing efficiency.

In order to solve the above technical problem, the present application provides an artificial intelligence data processing method, including:

performing artificial intelligence data computation on N GPUs to obtain raw data;

sending the raw data to M slave CPUs, and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data; wherein M is less than or equal to N;

and sending the preprocessed data from the M slave CPUs to a master CPU through a PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data.

Optionally, the sending the original data to M slave CPUs, and controlling each slave CPU to perform preprocessing on the received original data to obtain preprocessed data includes:

sending the raw data from the GPUs to the corresponding slave CPUs through the PCIe ports of the slave CPUs;

and controlling each slave CPU to preprocess the received original data to obtain the preprocessed data.

Optionally, each slave CPU is connected to the master CPU through the PCIe switch.

Optionally, each GPU is directly connected to a corresponding slave CPU via a PCIe port.

Optionally, controlling each slave CPU to perform preprocessing on the received original data to obtain preprocessed data, including:

and controlling each slave CPU to perform data simplification on the received raw data to obtain the preprocessed data.

Optionally, controlling the master CPU to process the preprocessed data to obtain target data includes:

and controlling the master CPU to perform an integration computation on the received preprocessed data to obtain the target data.

The present application further provides an artificial intelligence data processing apparatus, including:

the GPU calculation module is used for performing artificial intelligence data computation on the N GPUs to obtain raw data;

the slave CPU calculation module is used for sending the raw data to M slave CPUs and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data; wherein M is less than or equal to N;

and the main CPU computing module is used for sending the preprocessed data from the M slave CPUs to a master CPU through a PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data.

Optionally, the slave CPU computation module includes:

the raw data sending unit is used for sending the raw data from the GPUs to the corresponding slave CPU through the PCIe port of each slave CPU;

and the raw data preprocessing unit is used for controlling each slave CPU to preprocess the received raw data to obtain the preprocessed data.

The present application further provides a server, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the artificial intelligence data processing method as described above when executing said computer program.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the artificial intelligence data processing method as described above.

The application provides an artificial intelligence data processing method, which comprises the following steps: performing artificial intelligence data computation on N GPUs to obtain raw data; sending the raw data to M slave CPUs, and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data, wherein M is less than or equal to N; and sending the preprocessed data from the M slave CPUs to a master CPU through a PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data.

The raw data produced by the GPUs is sent directly to the slave CPUs for preprocessing, rather than being routed through a PCIe Switch, which avoids the problem of unbalanced upstream and downstream bandwidth. The data is then finally processed by the master CPU, so processing is staged according to the different processing capabilities, and the amount of data the master CPU receives is reduced. As a result, the data processing effect is maintained while processing latency is reduced and overall processing efficiency is improved.

The application also provides an artificial intelligence data processing apparatus, a server, and a computer-readable storage medium, which have the above beneficial effects and are not described again herein.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present application, and those skilled in the art can derive other drawings from the provided drawings without creative effort.

FIG. 1 is a flowchart of a data processing method for artificial intelligence according to an embodiment of the present disclosure;

fig. 2 is a schematic hardware structure diagram of an artificial intelligence data processing method according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of an artificial intelligence data processing apparatus according to an embodiment of the present disclosure.

Detailed Description

The core of the application is to provide an artificial intelligence data processing method, an artificial intelligence data processing device, a server and a computer readable storage medium, so as to reduce the time delay of the artificial intelligence data processing method and improve the data processing efficiency.

In order to make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. It is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.

In the related art, multiple PCIe switches are used in the server to expand the number of PCIe ports: each PCIe Switch connects upstream to the CPU and exposes additional downstream PCIe ports for GPU cards. Only one PCIe bus connects the CPU to the PCIe Switch, while the PCIe Switch connects to, for example, five GPUs through five PCIe buses, so the upstream and downstream bandwidths of the PCIe Switch are unbalanced. The raw data computed by each GPU card must be transmitted to the CPU through the PCIe Switch; because of this bandwidth imbalance, a large amount of raw data can keep the PCIe link between the CPU and the PCIe Switch continuously saturated after GPU operations, causing computation delay and reducing computation efficiency.

Therefore, the present application provides an artificial intelligence data processing method in which the raw data produced by the GPUs is sent directly to the slave CPUs for preprocessing, rather than being routed through a PCIe Switch to a single CPU, which avoids the problem of unbalanced upstream and downstream bandwidth. The data is then finally processed by the master CPU, so processing is staged according to the different processing capabilities and the amount of data the master CPU receives is reduced. As a result, the data processing effect is maintained while processing latency is reduced and overall efficiency is improved.

The following describes an artificial intelligence data processing method provided by the present application, by way of an example.

Referring to fig. 1, fig. 1 is a flowchart illustrating an artificial intelligence data processing method according to an embodiment of the present disclosure.

In this embodiment, the method may include:

s101, carrying out artificial intelligence data calculation on N GPUs to obtain original data;

the step aims to perform artificial intelligence data calculation on the N GPUs to obtain original data. That is, in this embodiment, the GPU performs corresponding artificial intelligence data calculation to obtain the raw data.

In the prior art, the raw data produced by GPU computation is sent directly to a single CPU through the PCIe switch, the CPU performs the corresponding computation on all of the received raw data, and the target data is finally obtained. A single CPU is thus connected to multiple GPUs by means of a PCIe switch. However, adopting a PCIe switch easily causes bandwidth imbalance between the uplink and the downlink, and the CPU is left with an excessive amount of raw data to process, which increases data processing latency and reduces efficiency.

The artificial intelligence data calculation performed by the GPU in this embodiment may adopt any artificial intelligence data calculation method provided in the prior art, and is not specifically limited herein.

S102, sending the raw data to M slave CPUs, and controlling each slave CPU to preprocess the received raw data to obtain preprocessed data; wherein M is less than or equal to N;

On the basis of S101, this step sends the raw data to M slave CPUs and controls each slave CPU to preprocess the data it receives, obtaining preprocessed data, where M is less than or equal to N. That is, the raw data produced by the GPU computation is sent to the corresponding slave CPUs, and each slave CPU preprocesses the raw data it receives to obtain preprocessed data. Because the raw data is sent directly to the corresponding slave CPUs rather than funneled through a PCIe switch, the bandwidth each CPU receives stays within its link capacity, which reduces the pressure on each CPU when processing data.

Therefore, in this embodiment, the CPU and the GPU are directly connected through a PCIe bus. That is, each GPU is directly connected to the corresponding slave CPU through a PCIe port.
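As a sketch of this topology, N GPUs might be attached to M slave CPUs (M ≤ N) as follows. The function name and the round-robin policy are hypothetical, since the application does not specify how GPUs are distributed among the slave CPUs:

```python
# Hypothetical round-robin attachment of N GPUs to M slave CPUs (M <= N).
# The application only requires M <= N; the concrete policy is an assumption.

def assign_gpus_to_slave_cpus(n_gpus: int, m_cpus: int) -> dict:
    if not 1 <= m_cpus <= n_gpus:
        raise ValueError("requires 1 <= M <= N")
    mapping = {cpu: [] for cpu in range(m_cpus)}
    for gpu in range(n_gpus):
        mapping[gpu % m_cpus].append(gpu)  # each GPU gets a direct PCIe port
    return mapping

print(assign_gpus_to_slave_cpus(8, 2))  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```

Any balanced assignment would serve; the point is only that every GPU reaches its slave CPU over a dedicated PCIe link rather than a shared switch uplink.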

Further, the step may include:

step 1, sending original data from a GPU to a corresponding slave CPU through a PCIe port of each slave CPU;

and step 2, controlling each slave CPU to carry out preprocessing on the received original data to obtain preprocessed data.

It can be seen that this alternative mainly illustrates how the preprocessed data is obtained. In this alternative, the raw data is sent from the GPUs to the corresponding slave CPUs through the PCIe ports of the slave CPUs, and each slave CPU is controlled to preprocess the received raw data to obtain the preprocessed data.

Further, the step may include:

and controlling each slave CPU to perform data simplification on the received raw data to obtain the preprocessed data.

It can be seen that this alternative further illustrates how the data preprocessing is performed: each slave CPU is controlled to perform data simplification on the received raw data to obtain the preprocessed data.
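The application does not define the simplification operation itself. As one hedged example, each slave CPU could reduce the per-GPU raw vectors to compact partial sums, shrinking the volume later forwarded to the master CPU; the function below is purely illustrative:

```python
# Hypothetical data simplification on a slave CPU: reduce each GPU's raw
# vector to a single partial sum. The actual operation is not specified
# by the application; summation is used only for illustration.

def simplify(raw_batches):
    """raw_batches: one list of floats per GPU attached to this slave CPU."""
    return [sum(batch) for batch in raw_batches]

preprocessed = simplify([[1.0, 2.0, 3.0], [4.0, 5.0]])
print(preprocessed)  # [6.0, 9.0]
```

Whatever the real operation is, the design intent is the same: the preprocessed output is much smaller than the raw input, so the PCIe switch uplink to the master CPU carries far less traffic.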

S103, sending the preprocessed data from the M slave CPUs to the master CPU through the PCIe switch, and controlling the master CPU to process the preprocessed data to obtain target data.

On the basis of S102, this step sends the preprocessed data from the M slave CPUs to the master CPU through the PCIe switch and controls the master CPU to process the preprocessed data to obtain the target data. That is, each slave CPU sends its preprocessed data to the master CPU through the PCIe switch, so that the master CPU performs the final integrated processing on the data.

Wherein each slave CPU is connected with the master CPU through a PCIe switch.

Further, the step may include:

and controlling the master CPU to perform an integration computation on the received preprocessed data to obtain the target data.

As can be seen, this alternative mainly explains the processing performed by the master CPU: the master CPU is controlled to perform an integration computation on the received preprocessed data to obtain the target data.
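A correspondingly hedged sketch of the integration step: the master CPU merges the preprocessed summaries received from all slave CPUs into one target value. The real integration operation is application-specific; a plain sum stands in for it here:

```python
# Hypothetical final integration on the master CPU: combine the preprocessed
# partial results from every slave CPU into one target value. A plain sum
# stands in for whatever integration the application actually performs.

def integrate(preprocessed_from_slaves):
    """preprocessed_from_slaves: one list of partial results per slave CPU."""
    return sum(sum(part) for part in preprocessed_from_slaves)

target = integrate([[6.0, 9.0], [4.0]])
print(target)  # 19.0
```

Because the master CPU only ever sees the already-simplified summaries, its input size scales with the number of slave CPUs rather than with the raw GPU output.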

Furthermore, in this embodiment, two CPUs are added to the existing AI server, and the GPUs connect directly to these CPUs rather than to a single CPU through a PCIe Switch. The raw data from the GPU operations is therefore transferred more efficiently to the first CPUs (CPU_0/CPU_1), which serve as the slave CPUs and perform the preliminary data processing and data simplification of the raw GPU output. The first CPUs (CPU_0/CPU_1) then transmit the simplified data to the main CPU (CPU Master) of the system through the PCIe switch, and the main CPU collects and integrates all of the data to perform the final integration operation. This avoids the time a main CPU would otherwise spend processing a large amount of raw data transmitted directly through the PCIe Switch after GPU operations, and improves the overall operation efficiency of the AI server.

In summary, in this embodiment the raw data produced by the GPUs is sent directly to the slave CPUs for preprocessing, rather than being routed through a PCIe Switch, which avoids the problem of unbalanced upstream and downstream bandwidth. The data is then finally processed by the master CPU, so processing is staged according to the different processing capabilities and the amount of data the master CPU receives is reduced. As a result, the data processing effect is maintained while processing latency is reduced and overall processing efficiency is improved.

The following further describes an artificial intelligence data processing method provided by the present application from the perspective of a hardware structure by a specific embodiment.

Referring to fig. 2, fig. 2 is a schematic diagram of a hardware structure of an artificial intelligence data processing method according to an embodiment of the present disclosure.

In this embodiment, the GPU cards connect directly to the CPUs (CPU_0/CPU_1) instead of through a PCIe Switch, which solves the uplink/downlink imbalance a PCIe Switch would cause. The GPU cards perform the computation required for artificial intelligence, and the computed raw data is transmitted to the first CPUs (CPU_0/CPU_1) for preliminary processing. Each CPU (CPU_0/CPU_1) then transmits the processed data to the main CPU (CPU Master) of the system through the PCIe switch. Because the complex data processing is completed in advance, the amount of complex raw data the main CPU (CPU Master) must compute is reduced: it only processes the data already computed by the two CPUs (CPU_0/CPU_1), which makes the data transmission and data processing of the main CPU more efficient and yields a highly efficient artificial intelligence computation server.
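The whole data path (GPU → slave CPU → PCIe switch → master CPU) can be sketched end to end as follows. Every function is an illustrative stand-in for the application's unspecified AI workload, and the GPU/CPU counts are example values:

```python
# End-to-end sketch of the described data path. All computations are
# illustrative stand-ins for the unspecified AI workload.

def gpu_compute(gpu_id: int) -> list:
    return [float(gpu_id + i) for i in range(4)]      # stand-in raw data

def slave_cpu_preprocess(raw: list) -> list:
    return [sum(batch) for batch in raw]              # data simplification

def master_cpu_integrate(pre: list) -> float:
    return sum(x for part in pre for x in part)       # final integration

N_GPUS, M_SLAVES = 4, 2                               # M <= N, as required
raw = [gpu_compute(g) for g in range(N_GPUS)]
per_slave = [raw[s::M_SLAVES] for s in range(M_SLAVES)]  # direct PCIe attach
pre = [slave_cpu_preprocess(chunk) for chunk in per_slave]
target = master_cpu_integrate(pre)                    # via the PCIe switch
print(target)  # 48.0
```

Note how only `pre` (two short lists) crosses the PCIe switch to the master, while the bulk `raw` data stays on the direct GPU-to-slave-CPU links.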

Wherein, the first CPU is the slave CPU.

Two CPUs are added to the existing AI server, and the GPUs connect directly to these CPUs rather than to a single CPU through a PCIe Switch, so the raw data from the GPU operations is transferred more efficiently to the first CPUs (CPU_0/CPU_1). The first CPUs are the slave CPUs and perform the preliminary data processing and data simplification of the raw GPU output. The first CPUs (CPU_0/CPU_1) then transmit the simplified data to the main CPU (CPU Master) of the system through the PCIe switch, and the main CPU collects and integrates all of the data to perform the final integration operation. This avoids the time a main CPU would otherwise spend processing a large amount of raw data transmitted directly through the PCIe Switch after GPU operations, and improves the overall operation efficiency of the AI server.

It can be seen that, in this embodiment, the raw data produced by the GPUs is sent directly to the slave CPUs for preprocessing, rather than being routed through a PCIe Switch, which avoids the problem of unbalanced upstream and downstream bandwidth. The data is then finally processed by the master CPU, so processing is staged according to the different processing capabilities and the amount of data the master CPU receives is reduced. As a result, the data processing effect is maintained while processing latency is reduced and overall processing efficiency is improved.

The following describes an artificial intelligence data processing apparatus provided in an embodiment of the present application, and the artificial intelligence data processing apparatus described below and the artificial intelligence data processing method described above may be referred to in correspondence.

Referring to fig. 3, fig. 3 is a schematic structural diagram of an artificial intelligence data processing apparatus according to an embodiment of the present disclosure.

In this embodiment, the apparatus may include:

the GPU calculation module 100 is used for performing artificial intelligence data computation on the N GPUs to obtain raw data;

the slave CPU calculation module 200 is configured to send the raw data to M slave CPUs, and control each slave CPU to preprocess the received raw data to obtain preprocessed data; wherein M is less than or equal to N;

and the main CPU computing module 300 is configured to send the preprocessed data from the M slave CPUs to the master CPU through the PCIe switch, and control the master CPU to process the preprocessed data to obtain target data.

Optionally, the slave CPU calculation module includes: a raw data sending unit, used for sending the raw data from the GPUs to the corresponding slave CPU through the PCIe port of each slave CPU; and a raw data preprocessing unit, used for controlling each slave CPU to preprocess the received raw data to obtain the preprocessed data.

An embodiment of the present application further provides a server, including:

a memory for storing a computer program;

a processor for implementing the steps of the artificial intelligence data processing method as described in the above embodiments when executing the computer program.

Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the artificial intelligence data processing method according to the above embodiments.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The artificial intelligence data processing method, the artificial intelligence data processing device, the server and the computer readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
