Data processing method and device and heterogeneous system

Document No. 135411, published 2021-10-22

Reading note: The technology "Data processing method and device and heterogeneous system" was designed by Li Tao (李涛), Lin Weibin (林伟彬), Liu Haocheng (刘昊程), Xu Lixia (许利霞), and Li Sheng (李生) on 2020-04-22. Its main content is as follows. The application discloses a data processing method and apparatus and a heterogeneous system, and belongs to the field of computer technology. The heterogeneous system includes a first processor and a first accelerator that are connected to each other, and a first secondary memory connected to the first accelerator. The first processor is configured to write data to be processed into the first secondary memory and to trigger the first accelerator to process, according to a processing instruction, the data to be processed in the first secondary memory. The first accelerator is configured to write a processing result of the data to be processed into the first secondary memory and to trigger the first processor to read the processing result from the first secondary memory. In the embodiments of the application, the first processor and the first accelerator interact only a few times and the data processing flow is simple, so data processing efficiency is high.

1. A heterogeneous system, comprising: a first processor and a first accelerator that are connected to each other, and a first secondary memory connected to the first accelerator; wherein

the first processor is configured to write data to be processed into the first secondary memory;

the first processor is further configured to trigger the first accelerator to process the data to be processed in the first secondary memory according to a processing instruction;

the first accelerator is configured to write a processing result of the data to be processed into the first secondary memory; and

the first accelerator is further configured to trigger the first processor to read the processing result from the first secondary memory.

2. The heterogeneous system of claim 1, wherein the first processor and the first accelerator are connected by a cache coherent bus.

3. The heterogeneous system of claim 2, wherein the cache coherent bus comprises a CCIX bus, and the first processor comprises an Advanced RISC Machine (ARM) architecture processor;

alternatively, the cache coherent bus comprises a CXL bus, and the first processor comprises an x86 architecture processor.

4. The heterogeneous system of any one of claims 1 to 3, wherein the heterogeneous system comprises a plurality of accelerators connected to each other, and the first accelerator is any one of the plurality of accelerators;

the processing instruction carries an accelerator identifier, which identifies the accelerator, among the plurality of accelerators, that is to execute the processing instruction; and

the first accelerator is configured to process the data to be processed in the first secondary memory according to the processing instruction when the accelerator identifier is the identifier of the first accelerator.

5. The heterogeneous system of claim 4, wherein the heterogeneous system further comprises a plurality of processors connected to each other, and the first processor is any one of the plurality of processors that is connected to the first accelerator; the processing instruction further carries an identifier of the first processor;

the first accelerator is configured to, when the accelerator identifier is not the identifier of the first accelerator, write the data to be processed into the secondary memory connected to the auxiliary accelerator indicated by the accelerator identifier, and trigger the auxiliary accelerator to process the data to be processed according to the processing instruction;

the auxiliary accelerator is configured to:

after processing the data to be processed according to the processing instruction, write a processing result of the data to be processed into the secondary memory connected to the auxiliary accelerator; and

trigger, according to the identifier of the first processor carried by the processing instruction, the first processor to read the processing result from the secondary memory connected to the auxiliary accelerator.

6. A data processing method for a first accelerator in a heterogeneous system, the heterogeneous system further comprising a first processor and a first secondary memory that are connected to the first accelerator, the method comprising:

when triggered by the first processor, processing data to be processed in the first secondary memory according to a processing instruction;

writing a processing result of the data to be processed into the first secondary memory; and

triggering the first processor to read the processing result from the first secondary memory.

7. The method of claim 6, wherein the heterogeneous system comprises a plurality of accelerators connected to each other, and the first accelerator is any one of the plurality of accelerators; the processing instruction carries an accelerator identifier, which identifies the accelerator in the heterogeneous system that is to execute the processing instruction; and the processing of the data to be processed in the first secondary memory according to the processing instruction comprises:

when the accelerator identifier is the identifier of the first accelerator, processing the data to be processed in the first secondary memory according to the processing instruction.

8. The method of claim 7, wherein the heterogeneous system further comprises a plurality of processors connected to each other, and the first processor is any one of the plurality of processors that is connected to the first accelerator; the processing instruction further carries an identifier of the first processor; and the method further comprises:

when the accelerator identifier is not the identifier of the first accelerator, writing the data to be processed into the secondary memory connected to the auxiliary accelerator indicated by the accelerator identifier, and triggering the auxiliary accelerator to process the data to be processed according to the processing instruction.

9. A data processing method for an auxiliary accelerator in a heterogeneous system, the heterogeneous system comprising a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence; the auxiliary accelerator and a first accelerator are any two connected accelerators among the plurality of accelerators;

the method comprising:

when triggered by the first accelerator, processing data to be processed in the secondary memory connected to the auxiliary accelerator according to a processing instruction, wherein the processing instruction carries an identifier of a first processor connected to the first accelerator;

writing a processing result of the data to be processed into the secondary memory connected to the auxiliary accelerator; and

triggering, according to the identifier of the first processor carried by the processing instruction, the first processor to read the processing result from the secondary memory connected to the auxiliary accelerator.

10. A data processing method for a first processor in a heterogeneous system, the heterogeneous system further comprising a first accelerator connected to the first processor and a first secondary memory connected to the first accelerator, the method comprising:

writing data to be processed into the first secondary memory;

triggering the first accelerator to process the data to be processed in the first secondary memory according to a processing instruction; and

when triggered by the first accelerator, reading a processing result of the data to be processed from the first secondary memory.

11. The method of claim 10, wherein the heterogeneous system comprises a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence; the processing instruction carries an accelerator identifier and an identifier of the first processor, the accelerator identifier identifying the accelerator in the heterogeneous system that is to execute the processing instruction;

the reading, when triggered by the first accelerator, of a processing result of the data to be processed from the first secondary memory comprises:

when the accelerator identifier is the identifier of the first accelerator, reading the processing result of the data to be processed from the first secondary memory when triggered by the first accelerator;

and the method further comprises:

when the accelerator identifier is the identifier of an auxiliary accelerator connected to the first accelerator, reading the processing result from the secondary memory connected to the auxiliary accelerator when triggered by the auxiliary accelerator.

12. A data processing apparatus for a first accelerator in a heterogeneous system, the heterogeneous system further comprising a first processor and a first secondary memory that are connected to the first accelerator, the data processing apparatus comprising:

a processing module, configured to process, when triggered by the first processor, data to be processed in the first secondary memory according to a processing instruction;

a writing module, configured to write a processing result of the data to be processed into the first secondary memory; and

a triggering module, configured to trigger the first processor to read the processing result from the first secondary memory.

13. The data processing apparatus of claim 12, wherein the heterogeneous system comprises a plurality of accelerators connected to each other, and the first accelerator is any one of the plurality of accelerators; the processing instruction carries an accelerator identifier, which identifies the accelerator in the heterogeneous system that is to execute the processing instruction; and

the processing module is configured to process the data to be processed in the first secondary memory according to the processing instruction when the accelerator identifier is the identifier of the first accelerator.

14. The data processing apparatus of claim 13, wherein the heterogeneous system further comprises a plurality of processors connected to each other, and the first processor is any one of the plurality of processors that is connected to the first accelerator; the processing instruction further carries an identifier of the first processor;

the writing module is further configured to write the data to be processed into the secondary memory connected to the auxiliary accelerator indicated by the accelerator identifier when the accelerator identifier is not the identifier of the first accelerator; and

the triggering module is further configured to trigger the auxiliary accelerator to process the data to be processed according to the processing instruction.

15. A data processing apparatus for an auxiliary accelerator in a heterogeneous system, the heterogeneous system comprising a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence; the auxiliary accelerator and a first accelerator are any two connected accelerators among the plurality of accelerators; the data processing apparatus comprising:

a processing module, configured to process, when triggered by the first accelerator, data to be processed in the secondary memory connected to the auxiliary accelerator according to a processing instruction, wherein the processing instruction carries an identifier of a first processor connected to the first accelerator;

a writing module, configured to write a processing result of the data to be processed into the secondary memory connected to the auxiliary accelerator; and

a triggering module, configured to trigger, according to the identifier of the first processor carried by the processing instruction, the first processor to read the processing result from the secondary memory connected to the auxiliary accelerator.

16. A data processing apparatus for a first processor in a heterogeneous system, the heterogeneous system further comprising a first accelerator connected to the first processor and a first secondary memory connected to the first accelerator, the data processing apparatus comprising:

a writing module, configured to write data to be processed into the first secondary memory;

a triggering module, configured to trigger the first accelerator to process the data to be processed in the first secondary memory according to a processing instruction; and

a reading module, configured to read, when triggered by the first accelerator, a processing result of the data to be processed from the first secondary memory.

17. The data processing apparatus of claim 16, wherein the heterogeneous system comprises a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence; the processing instruction carries an accelerator identifier and an identifier of the first processor, the accelerator identifier identifying the accelerator in the heterogeneous system that is to execute the processing instruction;

the reading module is configured to:

when the accelerator identifier is the identifier of the first accelerator, read the processing result of the data to be processed from the first secondary memory when triggered by the first accelerator; and

when the accelerator identifier is the identifier of an auxiliary accelerator connected to the first accelerator, read the processing result from the secondary memory connected to the auxiliary accelerator when triggered by the auxiliary accelerator.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and a heterogeneous system.

Background

Heterogeneous systems typically include processors and accelerators connected by a Peripheral Component Interconnect Express (PCIe) bus, a high-speed serial computer expansion bus. The accelerator can assist the processor in executing certain data processing flows, which gives the heterogeneous system strong data processing capability.

Typically, the processor is connected to a main memory and the accelerator is connected to a secondary memory. When the processor needs the accelerator to process data, the processor first notifies the accelerator to move the data to be processed from the main memory to the secondary memory by direct memory access (DMA). The processor then notifies the accelerator to process the data in the secondary memory. After processing the data, the accelerator writes the processing result into the secondary memory and informs the processor that processing is complete. Finally, the processor notifies the accelerator to move the processing result from the secondary memory to the main memory by DMA, so that the processor can obtain the processing result from the main memory.
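The conventional flow described above can be modelled as a minimal Python sketch. It assumes, for illustration only, that each memory is a plain dictionary and that each notification between the processor and the accelerator is one logged message; the DMA transfers are reduced to dictionary copies and their completion signalling is not modelled. `ConventionalSystem`, `run`, and `kernel` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ConventionalSystem:
    """Hypothetical model of the PCIe-era flow, where data is staged by DMA."""
    main_memory: Dict[str, Any] = field(default_factory=dict)
    secondary_memory: Dict[str, Any] = field(default_factory=dict)
    # Each entry is one notification: (sender, receiver, message).
    notifications: List[Tuple[str, str, str]] = field(default_factory=list)

    def _notify(self, sender: str, receiver: str, message: str) -> None:
        self.notifications.append((sender, receiver, message))

    def run(self, data: Any, kernel: Callable[[Any], Any]) -> Any:
        self.main_memory["input"] = data
        # 1. Processor asks the accelerator to DMA the input into secondary memory.
        self._notify("processor", "accelerator", "move input to secondary memory")
        self.secondary_memory["input"] = self.main_memory["input"]
        # 2. Processor asks the accelerator to process the data.
        self._notify("processor", "accelerator", "process")
        self.secondary_memory["result"] = kernel(self.secondary_memory["input"])
        # 3. Accelerator reports that processing is finished.
        self._notify("accelerator", "processor", "done")
        # 4. Processor asks the accelerator to DMA the result back to main memory.
        self._notify("processor", "accelerator", "move result to main memory")
        self.main_memory["result"] = self.secondary_memory["result"]
        # Only now can the processor read the result from its own main memory.
        return self.main_memory["result"]
```

For example, `ConventionalSystem().run([1, 2, 3], sum)` returns 6 after recording four notifications and two staging copies.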

It can be seen that, when the accelerator assists the processor in processing data, the processor and the accelerator exchange information many times, which reduces data processing efficiency.

Disclosure of Invention

The application provides a data processing method and apparatus and a heterogeneous system, which can solve the problem of low data processing efficiency. The technical solutions are as follows:

In a first aspect, a heterogeneous system is provided, including: a first processor and a first accelerator that are connected to each other, and a first secondary memory connected to the first accelerator. The first processor is configured to write data to be processed into the first secondary memory and to trigger the first accelerator to process the data to be processed in the first secondary memory according to a processing instruction. The first accelerator is configured to write a processing result of the data to be processed into the first secondary memory and to trigger the first processor to read the processing result from the first secondary memory.

It can be seen that, in the solution provided in the embodiments of the present application, the first accelerator assists the first processor in processing the data to be processed, which gives the heterogeneous system as a whole high data processing capability. Moreover, the first processor writes the data to be processed directly into the secondary memory connected to the first accelerator. This removes both the step in which the first processor notifies the first accelerator to move the data from the main memory connected to the first processor into the secondary memory and the move itself. In addition, the first accelerator writes the processing result directly into the secondary memory, and the first processor can read the processing result from that secondary memory. This removes the exchange in which the first accelerator informs the first processor that processing is complete and the first processor then notifies the first accelerator to move the processing result from the secondary memory to the main memory. Therefore, in the embodiments of the present application, the first processor and the first accelerator interact fewer times and the data processing flow is simpler, so data processing efficiency is high.
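The flow of the first aspect can be sketched in Python as follows. This is a minimal, synchronous illustration that assumes the processor can write and read the accelerator's secondary memory directly (modelled as a shared dictionary) and that each trigger is one logged event; `Link`, `Accelerator`, `Processor`, and `kernel` are hypothetical names, and `kernel` merely stands in for whatever the processing instruction specifies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Link:
    """Hypothetical processor<->accelerator link that logs every trigger."""
    triggers: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Accelerator:
    secondary_memory: Dict[str, Any]
    kernel: Callable[[Any], Any]  # stands in for the processing instruction

    def on_trigger(self, link: Link, processor: "Processor") -> None:
        # Process the data the processor already wrote into secondary memory.
        self.secondary_memory["result"] = self.kernel(self.secondary_memory["input"])
        # Trigger the processor to read the result where it is.
        link.triggers.append(("accelerator", "processor"))
        processor.on_result_ready(self.secondary_memory)


@dataclass
class Processor:
    result: Any = None

    def submit(self, link: Link, accel: Accelerator, data: Any) -> Any:
        # Write the input straight into the secondary memory (no DMA request).
        accel.secondary_memory["input"] = data
        link.triggers.append(("processor", "accelerator"))
        accel.on_trigger(link, self)
        return self.result

    def on_result_ready(self, secondary_memory: Dict[str, Any]) -> None:
        # Read the result directly from secondary memory (no copy to main memory).
        self.result = secondary_memory["result"]
```

Running `Processor().submit(Link(), Accelerator({}, sum), [1, 2, 3])` returns 6 with exactly two triggers logged, one in each direction, and no staging copies, in contrast to the four notifications of the conventional flow described in the background. Real hardware would signal asynchronously rather than through nested calls.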

Optionally, the first processor and the first accelerator are connected by a cache coherent bus, that is, a bus that uses a cache coherency protocol. When a processor and an accelerator are connected by a cache coherent bus, the storage space of the main memory, the storage space of the accelerator, and the storage space of the secondary memory in the heterogeneous system are all visible to the processor. These storage spaces are uniformly addressed by the processor, so the processor can read and write each of them by address.
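Uniform addressing can be illustrated with a small Python sketch in which each memory is a window in one address range and the processor reads or writes by global address. The region names, base addresses, and sizes below are hypothetical, and coherency itself (snooping, invalidation) is not modelled; the sketch only shows address-based access across different physical memories.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    name: str
    base: int          # hypothetical base address in the unified address space
    data: bytearray    # backing storage of this memory

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + len(self.data)


class UnifiedAddressSpace:
    """Sketch: main, accelerator, and secondary memory share one address range."""

    def __init__(self, regions: List[Region]):
        self.regions = regions

    def _locate(self, addr: int) -> Tuple[Region, int]:
        for region in self.regions:
            if region.contains(addr):
                return region, addr - region.base
        raise ValueError(f"unmapped address {addr:#x}")

    def write(self, addr: int, payload: bytes) -> None:
        region, offset = self._locate(addr)
        if offset + len(payload) > len(region.data):
            raise ValueError("write crosses a region boundary")
        region.data[offset:offset + len(payload)] = payload

    def read(self, addr: int, length: int) -> bytes:
        region, offset = self._locate(addr)
        if offset + length > len(region.data):
            raise ValueError("read crosses a region boundary")
        return bytes(region.data[offset:offset + length])
```

With regions `main` at 0x0000, `accelerator` at 0x1000, and `secondary` at 0x2000, a write to address 0x2010 lands at offset 0x10 of the secondary memory, and the processor can read it back by the same address.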

Optionally, the cache coherent bus comprises a CCIX (Cache Coherent Interconnect for Accelerators) bus or a CXL (Compute Express Link) bus. Optionally, when the cache coherent bus comprises a CCIX bus, the first processor comprises an ARM architecture processor; alternatively, when the cache coherent bus comprises a CXL bus, the first processor comprises an x86 architecture processor.

Optionally, the secondary memory comprises high bandwidth memory (HBM). Because HBM provides high-bandwidth storage, it can improve the data processing efficiency of the heterogeneous system. In addition, HBM is compact and has low operating power.

Optionally, the accelerator comprises a graphics processing unit (GPU), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). The accelerator may also be another device with a data processing function; this is not limited in the present application.

Optionally, the heterogeneous system comprises a plurality of accelerators connected to each other, and the first accelerator is any one of the plurality of accelerators. The processing instruction carries an accelerator identifier, which identifies the accelerator, among the plurality of accelerators, that is to execute the processing instruction. The first accelerator is configured to process the data to be processed in the first secondary memory according to the processing instruction when the accelerator identifier is the identifier of the first accelerator. Thus, when the heterogeneous system includes multiple accelerators, the first accelerator can determine from the accelerator identifier whether it is the accelerator that the processor designated to process the data.
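The identifier check can be sketched in Python as below. The instruction layout (`accelerator_id`, `opcode`), the `KERNELS` table, and the dictionary-based secondary memory are hypothetical choices made for illustration; the application does not specify an instruction format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ProcessingInstruction:
    accelerator_id: int  # ID of the accelerator that should execute the instruction
    opcode: str          # hypothetical name of the operation to perform


# Hypothetical operations an accelerator knows how to run.
KERNELS: Dict[str, Callable[[Any], Any]] = {
    "sum": sum,
    "double": lambda xs: [2 * x for x in xs],
}


class Accelerator:
    def __init__(self, accel_id: int):
        self.accel_id = accel_id
        self.secondary_memory: Dict[str, Any] = {}

    def handle(self, instr: ProcessingInstruction) -> Optional[Any]:
        # Only the designated accelerator processes the data.
        if instr.accelerator_id != self.accel_id:
            return None  # not the designated accelerator; nothing is processed
        result = KERNELS[instr.opcode](self.secondary_memory["input"])
        self.secondary_memory["result"] = result
        return result
```

An accelerator with ID 1 therefore processes an instruction tagged with ID 1 and ignores one tagged with ID 2, leaving its secondary memory unchanged.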

Optionally, the heterogeneous system comprises a plurality of processors connected to each other, and the first processor is any one of the plurality of processors that is connected to the first accelerator; the processing instruction further carries an identifier of the first processor. When the accelerator identifier is not the identifier of the first accelerator, the first accelerator is configured to write the data to be processed into the secondary memory connected to the auxiliary accelerator indicated by the accelerator identifier, and to trigger the auxiliary accelerator to process the data to be processed according to the processing instruction. The auxiliary accelerator is configured to: after processing the data to be processed according to the processing instruction, write the processing result into the secondary memory connected to it, and trigger, according to the identifier of the first processor carried by the processing instruction, the first processor to read the processing result from that secondary memory. Thus, if the first processor sends the processing instruction to the first accelerator by mistake, the first accelerator can forward the instruction to the auxiliary accelerator, which then processes the data, avoiding adverse consequences of the misdirected instruction.
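The forwarding behaviour can be sketched in Python as follows. It assumes that the first processor has already written the input into the first accelerator's secondary memory, that accelerators reach each other and the processors through lookup tables keyed by ID, and that the processing is a simple sum; all class, field, and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ProcessingInstruction:
    accelerator_id: int  # accelerator that should execute the instruction
    processor_id: int    # processor that should read the result


@dataclass
class Processor:
    proc_id: int
    # (accelerator ID, result) pairs read after each trigger.
    notices: List[Tuple[int, Any]] = field(default_factory=list)

    def on_result_ready(self, accelerator: "Accelerator") -> None:
        # Read the result from the secondary memory of the accelerator that did the work.
        self.notices.append((accelerator.accel_id, accelerator.secondary_memory["result"]))


@dataclass
class Accelerator:
    accel_id: int
    processors: Dict[int, Processor]                    # processors reachable by ID
    peers: Dict[int, "Accelerator"] = field(default_factory=dict)
    secondary_memory: Dict[str, Any] = field(default_factory=dict)

    def handle(self, instr: ProcessingInstruction) -> None:
        if instr.accelerator_id != self.accel_id:
            # Not designated: copy the input into the auxiliary accelerator's
            # secondary memory and trigger it with the same instruction.
            target = self.peers[instr.accelerator_id]
            target.secondary_memory["input"] = self.secondary_memory["input"]
            target.handle(instr)
            return
        self.secondary_memory["result"] = sum(self.secondary_memory["input"])  # stand-in work
        # Trigger the processor named in the instruction.
        self.processors[instr.processor_id].on_result_ready(self)
```

If processor 0 mistakenly sends an instruction for accelerator 2 to accelerator 1, accelerator 1 forwards it, accelerator 2 computes the result in its own secondary memory, and processor 0 is told to read it from there.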

Optionally, the plurality of accelerators are connected by a cache coherent bus and the plurality of processors are connected by a cache coherent bus.

Optionally, in the heterogeneous system, the processor may trigger the accelerator to process the data to be processed in the secondary memory according to the processing instruction in various ways, and the accelerator may likewise trigger the processor to read the processing result from the secondary memory connected to the accelerator in various ways. For example, the processor triggers the accelerator by sending it the processing instruction, and the accelerator triggers the processor by sending it a processing response. For another example, the processor may trigger the accelerator by changing the state value of a register, and the accelerator may trigger the processor by changing the state value of a register.
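The register-based variant can be sketched in Python as a status register with three hypothetical values: the processor writes `START` to trigger processing, and the accelerator writes `DONE` to trigger the read. The encoding, the polling loop, and the names `Status`, `DoorbellAccelerator`, and `run_with_register` are assumptions for illustration; the single-threaded loop stands in for two devices running concurrently.

```python
from enum import IntEnum
from typing import Any, Callable, Dict


class Status(IntEnum):
    IDLE = 0
    START = 1  # written by the processor: "process the data"
    DONE = 2   # written by the accelerator: "the result is ready"


class DoorbellAccelerator:
    """Hypothetical accelerator triggered by a status register value."""

    def __init__(self, kernel: Callable[[Any], Any]):
        self.status = Status.IDLE
        self.secondary_memory: Dict[str, Any] = {}
        self.kernel = kernel

    def poll(self) -> None:
        # One step of the accelerator's control loop.
        if self.status == Status.START:
            self.secondary_memory["result"] = self.kernel(self.secondary_memory["input"])
            self.status = Status.DONE


def run_with_register(accel: DoorbellAccelerator, data: Any) -> Any:
    accel.secondary_memory["input"] = data
    accel.status = Status.START          # processor triggers by changing the register
    while accel.status != Status.DONE:   # processor waits for the accelerator's trigger
        accel.poll()                     # single-threaded stand-in for concurrent hardware
    result = accel.secondary_memory["result"]
    accel.status = Status.IDLE           # release the register for the next job
    return result
```

Compared with sending a processing instruction and a processing response as messages, the register approach needs no message payload; the data and result stay in the secondary memory and only the state value changes.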

In a second aspect, a data processing method is provided for a first accelerator in a heterogeneous system, where the heterogeneous system further includes a first processor and a first secondary memory that are connected to the first accelerator. The method includes: when triggered by the first processor, processing the data to be processed in the first secondary memory according to a processing instruction; then writing a processing result of the data to be processed into the first secondary memory; and triggering the first processor to read the processing result from the first secondary memory.

Optionally, the heterogeneous system comprises a plurality of accelerators connected to each other, and the first accelerator is any one of the plurality of accelerators. The processing instruction carries an accelerator identifier, which identifies the accelerator in the heterogeneous system that is to execute the processing instruction, and the first accelerator may process the data to be processed in the first secondary memory according to the processing instruction when the accelerator identifier is the identifier of the first accelerator.

Optionally, the heterogeneous system comprises a plurality of processors connected to each other, and the first processor is any one of the plurality of processors that is connected to the first accelerator; the processing instruction further carries an identifier of the first processor, and the method further includes: when the accelerator identifier is not the identifier of the first accelerator, writing the data to be processed into the secondary memory connected to the auxiliary accelerator indicated by the accelerator identifier, and triggering the auxiliary accelerator to process the data to be processed according to the processing instruction.

Optionally, the first accelerator may trigger the first processor to read the processing result from the secondary memory connected to the first accelerator in various ways. For example, the first accelerator triggers the first processor by sending it a processing response. For another example, the first accelerator may trigger the first processor by changing the state value of a register.

In a third aspect, a data processing method is provided for an auxiliary accelerator in a heterogeneous system, where the heterogeneous system includes a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence; the auxiliary accelerator and a first accelerator are any two connected accelerators among the plurality of accelerators. The method includes: when triggered by the first accelerator, processing data to be processed in the secondary memory connected to the auxiliary accelerator according to a processing instruction, where the processing instruction carries an identifier of a first processor connected to the first accelerator; writing a processing result of the data to be processed into that secondary memory; and then triggering, according to the identifier of the first processor carried by the processing instruction, the first processor to read the processing result from the secondary memory connected to the auxiliary accelerator.

Optionally, the auxiliary accelerator may trigger the first processor to read the processing result from the secondary memory connected to the auxiliary accelerator in various ways. For example, the auxiliary accelerator triggers the first processor by sending it a processing response. For another example, the auxiliary accelerator may trigger the first processor by changing the state value of a register.

In a fourth aspect, a data processing method is provided for a first processor in a heterogeneous system, where the heterogeneous system further includes a first accelerator connected to the first processor and a first secondary memory connected to the first accelerator. The method includes: writing data to be processed into the first secondary memory, and triggering the first accelerator to process the data to be processed in the first secondary memory according to a processing instruction; and then, when triggered by the first accelerator, reading the processing result of the data to be processed from the first secondary memory.

Optionally, the heterogeneous system comprises a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence. The processing instruction carries an accelerator identifier and an identifier of the first processor, the accelerator identifier identifying the accelerator in the heterogeneous system that is to execute the processing instruction. When the accelerator identifier is the identifier of the first accelerator, the first processor may, when triggered by the first accelerator, read the processing result of the data to be processed from the first secondary memory. When the accelerator identifier is the identifier of an auxiliary accelerator connected to the first accelerator, the first processor may, when triggered by the auxiliary accelerator, read the processing result from the secondary memory connected to the auxiliary accelerator.
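On the processor side, the rule above amounts to choosing which secondary memory to read from. The Python sketch below assumes that the processor sees every secondary memory through the coherent bus (modelled as dictionaries keyed by accelerator ID) and tracks outstanding work with job IDs; the job-ID bookkeeping and all names are hypothetical additions for illustration, not part of the described method.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FirstProcessor:
    proc_id: int
    # Secondary memories reachable over the coherent bus, keyed by accelerator ID
    # (a hypothetical stand-in for the uniformly addressed memory space).
    secondary_memories: Dict[int, Dict[str, Any]]
    # Hypothetical bookkeeping: job ID -> ID of the accelerator named in the instruction.
    pending: Dict[int, int] = field(default_factory=dict)

    def submit(self, job_id: int, first_accel_id: int, target_accel_id: int, data: Any) -> None:
        # The input always goes into the secondary memory of the first accelerator.
        self.secondary_memories[first_accel_id]["input"] = data
        self.pending[job_id] = target_accel_id

    def on_trigger(self, job_id: int, from_accel_id: int) -> Any:
        expected = self.pending[job_id]
        if from_accel_id != expected:
            raise RuntimeError(f"job {job_id}: trigger from {from_accel_id}, expected {expected}")
        del self.pending[job_id]
        # Read from the secondary memory of the accelerator that executed the
        # instruction: the first accelerator or the auxiliary accelerator.
        return self.secondary_memories[from_accel_id]["result"]
```

When the instruction names auxiliary accelerator 2, the input is still written to the first accelerator's secondary memory, but the result is read from accelerator 2's secondary memory once accelerator 2 triggers the processor.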

Optionally, the first processor may trigger the first accelerator to process the data to be processed in the first secondary memory according to the processing instruction in various ways. For example, the first processor triggers the first accelerator by sending it the processing instruction. For another example, the first processor may trigger the first accelerator by changing the state value of a register.

In a fifth aspect, a data processing apparatus is provided for a first accelerator in a heterogeneous system, where the heterogeneous system further includes a first processor and a first secondary memory that are connected to the first accelerator. The data processing apparatus includes modules configured to perform the data processing method provided in the second aspect.

In a sixth aspect, a data processing apparatus is provided for an auxiliary accelerator in a heterogeneous system, where the heterogeneous system includes a plurality of processors connected to each other, a plurality of accelerators connected to each other, and a plurality of secondary memories connected to the accelerators in one-to-one correspondence; the auxiliary accelerator and a first accelerator are any two connected accelerators among the plurality of accelerators. The data processing apparatus includes modules configured to perform the data processing method provided in the third aspect.

In a seventh aspect, a data processing apparatus is provided for a first processor in a heterogeneous system, where the heterogeneous system further includes a first accelerator connected to the first processor and a first secondary memory connected to the first accelerator. The data processing apparatus includes modules configured to perform the data processing method provided in the fourth aspect.

In an eighth aspect, a computer storage medium is provided, in which a computer program is stored, and when the computer program runs on a computer device, the computer device is caused to execute any one of the data processing methods provided in the second aspect, the third aspect, or the fourth aspect.

In a ninth aspect, there is provided a computer program product comprising instructions which, when run on a computer apparatus, cause the computer apparatus to perform any one of the data processing methods provided in the second, third or fourth aspects.

The beneficial effects of the second aspect to the ninth aspect may refer to the beneficial effects in the corresponding description of the first aspect, and are not repeated herein.

Drawings

Fig. 1 is a schematic structural diagram of a heterogeneous system according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of another heterogeneous system according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of another heterogeneous system according to an embodiment of the present application;

Fig. 4 is a flowchart of a data processing method according to an embodiment of the present application;

Fig. 5 is a flowchart of another data processing method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of the functional modules of a heterogeneous system according to an embodiment of the present application;

Fig. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 8 is a block diagram of another data processing apparatus according to an embodiment of the present application;

Fig. 9 is a block diagram of another data processing apparatus according to an embodiment of the present application.

Detailed Description

In order to make the principle and technical solution of the present application clearer, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

With the development of computer technology, heterogeneous systems with strong data processing capability have become widely used. A heterogeneous system can process data efficiently in tasks such as deep-learning-based online prediction, video transcoding for live streaming, and picture compression or decompression.

For example, fig. 1 is a schematic structural diagram of a heterogeneous system provided in an embodiment of the present application. As shown in fig. 1, the heterogeneous system generally includes at least one processor and at least one accelerator. Fig. 1 takes a heterogeneous system with one processor 011 and one accelerator 021 as an example; the heterogeneous system may also include more than one processor and more than one accelerator, which is not limited in this application.

For example, as shown in fig. 2, at least one processor in a heterogeneous system may include: processor 011 and processor 012, the at least one accelerator comprising: accelerator 021 and accelerator 022. For another example, as shown in FIG. 3, at least one processor in a heterogeneous system may include: processor 011, processor 012, processor 013, and processor 014, the at least one accelerator comprising: accelerator 021, accelerator 022, accelerator 023, and accelerator 024. As can be seen from the connection relationship between the devices shown in fig. 2 and fig. 3, when the heterogeneous system includes a plurality of processors, the processors are connected to each other by using a cache coherent bus (some processors are directly connected to each other, and some processors are indirectly connected to each other). When the heterogeneous system comprises a plurality of accelerators, the accelerators are connected with each other by a cache coherence bus (some accelerators are directly connected with each other, and some accelerators are indirectly connected with each other). Each processor is connected to at least one accelerator.

The processor and the accelerator in the heterogeneous system both have data processing functions, and the accelerator assists the processor with some of the data processing to enhance the data processing capability of the heterogeneous system. The processor may be any type of processor, such as an Advanced RISC Machine (ARM) architecture processor or an x86 architecture processor. ARM and x86 are two different processor architectures; processors of the two architectures use different instruction sets and differ in power consumption, performance, and cost. The accelerator may be any device with a data processing function, such as a graphics processing unit (GPU), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).

Further, each processor in the heterogeneous system may have a main memory attached, and each accelerator may have a secondary memory attached. In this case, the heterogeneous system further includes the main memory connected to each processor and the secondary memory connected to each accelerator. For example, referring to fig. 1, the heterogeneous system further includes a main memory 031 connected to the processor 011 and a secondary memory 041 connected to the accelerator 021. Referring to fig. 2, the heterogeneous system further includes main memories 031 and 032 connected to the processors 011 and 012 respectively, and secondary memories 041 and 042 connected to the accelerators 021 and 022 respectively. Referring to fig. 3, the heterogeneous system further includes main memories 031, 032, 033, and 034 connected to the processors 011, 012, 013, and 014 respectively, and secondary memories 041, 042, 043, and 044 connected to the accelerators 021, 022, 023, and 024 respectively.

The main memory and the secondary memory in the heterogeneous system may be any type of memory, such as double data rate synchronous dynamic random access memory (DDR SDRAM, referred to here as DDR) or high bandwidth memory (HBM). In the embodiments of the present application, the main memory is DDR and the secondary memory is HBM, as an example. Because HBM provides higher storage bandwidth, using it as the secondary memory can improve the data processing efficiency of the heterogeneous system; HBM also has a small footprint and low operating power.

Additionally, the main memory in the heterogeneous system may be independent of the connected processor, and the secondary memory may be independent of the connected accelerator; alternatively, the main memory may be integrated on the connected processor and the secondary memory may be integrated on the connected accelerator. In the embodiments of the present application, as an example, the main memory is independent of the connected processor and the secondary memory is integrated on the connected accelerator (the integration is not shown in fig. 1).

In the related art, the processor and the accelerator in a heterogeneous system are connected through a PCIe bus. The storage space on the main memory connected to the processor and the storage space on the accelerator connected to the processor are visible to the processor, which can read and write them. However, the storage space on the secondary memory connected to the accelerator is invisible to the processor, which therefore cannot read or write it. Consequently, when the processor in the related art needs the accelerator to process data, the processor first writes the data into the main memory and notifies the accelerator to move the data from the main memory to the secondary memory by direct memory access (DMA). The processor then notifies the accelerator to process the data in the secondary memory. After finishing, the accelerator writes the processing result into the secondary memory and informs the processor that processing is complete. Finally, the processor notifies the accelerator to move the processing result from the secondary memory to the main memory by DMA, after which the processor can read the result from the main memory. Thus, when the accelerator assists the processor in processing data, the processor and the accelerator exchange many messages, which reduces data processing efficiency.

In the embodiment of the application, a processor and an accelerator in a heterogeneous system are connected through a cache coherence bus. A cache coherency bus is a bus that employs a cache coherency protocol. In the case of a cache coherent bus connection between a processor and an accelerator, memory space on the main memory, memory space on the accelerator, and memory space on the secondary memory in a heterogeneous system can be visible to the processor. The memory spaces are uniformly addressed in the processor so that the processor can read and write to the memory spaces based on the addresses of the memory spaces.

The cache coherence bus may be any bus using a cache coherence protocol, for example a Cache Coherent Interconnect for Accelerators (CCIX) bus or a Compute Express Link (CXL) bus. Optionally, when the cache coherence bus is a CCIX bus, the processor may be the ARM architecture processor described above; when the cache coherence bus is a CXL bus, the processor may be the x86 architecture processor described above. The embodiments of the present application do not limit the type of the cache coherence bus or the type of the processor.

Based on connecting the processor and the accelerator through a cache coherence bus, the embodiments of the present application provide a data processing method for the heterogeneous system. The method enables the accelerator to assist the processor in processing data while reducing the number of interactions between the processor and the accelerator, thereby improving data processing efficiency.

The data processing method provided by the embodiment of the present application may be used in a heterogeneous system (e.g., a heterogeneous system shown in any one of fig. 1 to fig. 3) provided by the embodiment of the present application. The method involves a first processor, a first accelerator and a first auxiliary memory in the heterogeneous system, wherein the first processor is any processor in the heterogeneous system, the first accelerator is any accelerator connected with the first processor, and the first auxiliary memory is an auxiliary memory connected with the first accelerator.

For example, fig. 4 is a flowchart of a data processing method according to an embodiment of the present application, and in fig. 4, the first processor is the processor 011 in fig. 1, the first accelerator is the accelerator 021 in fig. 1, and the secondary memory connected to the first accelerator is the secondary memory 041 in fig. 1. As shown in fig. 4, the data processing method may include:

S401, the processor 011 writes the data to be processed into the auxiliary memory 041 connected to the accelerator 021.

In the embodiments of the present application, because the processor 011 and the accelerator 021 are connected by a cache coherence bus, the storage spaces on the main memory, the accelerator, and the auxiliary memory in the heterogeneous system are visible to the processor 011. When the heterogeneous system starts up, all processors in it (for example, the Basic Input/Output System (BIOS) of each processor) need to uniformly address the storage spaces on the main memory, the accelerator, and the secondary memory. In this way, each processor obtains the address of every storage unit (the smallest unit of storage space) in these storage spaces and can then read and write data directly at those addresses.

By way of example, assume that the storage space on main memory 031 in fig. 1 includes: storage unit 1 and storage unit 2, the storage space on accelerator 021 (such as the storage space of some I/O (input/output) registers on accelerator 021) includes: a storage unit 3 and a storage unit 4, and a storage space on the auxiliary memory 041 includes: a storage unit 5 and a storage unit 6. The processor 011 can obtain the addresses A, B, C, D, E and F of the memory locations shown in table 1 after uniformly addressing the memory spaces.

TABLE 1

In S401, the processor 011 can write the data to be processed into at least one storage unit in the secondary memory 041 according to the address of each storage unit on the secondary memory 041. It should be noted that the to-be-processed data may be data generated by the processor 011, or data sent by another device outside the heterogeneous system, or the to-be-processed data may also be data stored in the main memory 031 by the processor 011 before S401, which is not limited in this embodiment of the present application.
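A minimal sketch of what unified addressing gives the processor, assuming one global address per storage unit as in the storage-unit example above. The class name, the base address 0x1000, and the method names are hypothetical; a real system maps these spaces into the processor's physical address map rather than a Python dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageUnit:
    device: str  # e.g. "main memory 031", "accelerator 021", "secondary memory 041"
    unit: int    # storage unit number on that device


class UnifiedAddressSpace:
    """Hypothetical model of the unified addressing done at boot: every storage
    unit on every device gets one global address, so the processor can read or
    write any of them directly by address."""

    def __init__(self) -> None:
        self._units: dict[int, StorageUnit] = {}
        self._contents: dict[int, bytes] = {}
        self._next_address = 0x1000  # hypothetical base address

    def map_device(self, device: str, unit_numbers: list[int]) -> list[int]:
        """Assign consecutive global addresses to a device's storage units."""
        addresses = []
        for number in unit_numbers:
            self._units[self._next_address] = StorageUnit(device, number)
            addresses.append(self._next_address)
            self._next_address += 1
        return addresses

    def write(self, address: int, data: bytes) -> None:
        if address not in self._units:
            raise KeyError(f"address {address:#x} is not mapped")
        self._contents[address] = data

    def read(self, address: int) -> bytes:
        return self._contents[address]

    def device_of(self, address: int) -> str:
        return self._units[address].device


space = UnifiedAddressSpace()
main_addrs = space.map_device("main memory 031", [1, 2])
accel_addrs = space.map_device("accelerator 021", [3, 4])
secondary_addrs = space.map_device("secondary memory 041", [5, 6])

# S401: the processor writes the data straight into the secondary memory
# by address, with no staging copy in main memory.
space.write(secondary_addrs[0], b"data to be processed")
```

Because all six units share one address space, the write in S401 is an ordinary store to an address that happens to belong to the secondary memory.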

S402, the processor 011 sends a processing instruction of the data to be processed to the accelerator 021.

The processing instruction instructs the accelerator 021 to perform a specified processing on the data to be processed. Therefore, the processing instruction may carry the storage address of the data to be processed in the secondary memory connected to the accelerator 021, together with information indicating the processing to perform. For example, the processing may be based on a machine learning algorithm, a deep learning algorithm, or a financial risk-control algorithm; the embodiments of the present application do not limit the processing indicated by the processing instruction.

S403, the accelerator 021 processes the data to be processed in the auxiliary memory 041 according to the processing instruction.

After receiving the processing instruction, the accelerator 021 can parse it to determine the address of the data to be processed and the processing to be performed on that data. The accelerator 021 can then read the data to be processed from the connected auxiliary memory 041 and perform the indicated processing on it to obtain the processing result.

S404, the accelerator 021 writes the processing result of the data to be processed into the connected auxiliary memory 041.

S405, the accelerator 021 sends a processing response of the data to be processed to the processor 011.

The processing response is used for instructing the processor 011 to acquire a processing result of the data to be processed. Therefore, the processing response needs to carry the storage address of the processing result in the auxiliary memory connected to the accelerator 021.

S406, the processor 011 reads the processing result from the auxiliary memory 041 connected to the accelerator 021 in accordance with the processing response.

Illustratively, the processing response carries the storage address of the processing result in the auxiliary memory 041 connected to the accelerator 021, and thus the processor 011 can read the processing result according to the storage address.
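Steps S401 to S406 can be sketched as a single-process simulation, assuming a byte-addressable secondary memory shared by both sides (which is what the cache coherence bus provides) and a fixed result area. The "uppercase" operation, the class names, and the result address are hypothetical stand-ins; in hardware the instruction and response travel over the bus rather than as a function call and return value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingInstruction:
    data_address: int  # where the data sits in secondary memory 041
    length: int        # size of the data in bytes
    operation: str     # hypothetical name of the processing to perform


@dataclass(frozen=True)
class ProcessingResponse:
    result_address: int  # where the result sits in secondary memory 041
    length: int


class SecondaryMemory:
    """Byte-addressable stand-in for secondary memory 041."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)

    def write(self, address: int, data: bytes) -> None:
        if address < 0 or address + len(data) > len(self.cells):
            raise IndexError("write outside secondary memory")
        self.cells[address:address + len(data)] = data

    def read(self, address: int, length: int) -> bytes:
        return bytes(self.cells[address:address + length])


class Accelerator:
    def __init__(self, memory: SecondaryMemory, result_address: int) -> None:
        self.memory = memory
        self.result_address = result_address  # hypothetical fixed result area

    def handle(self, instruction: ProcessingInstruction) -> ProcessingResponse:
        # S403: read the data named by the instruction and process it.
        data = self.memory.read(instruction.data_address, instruction.length)
        if instruction.operation != "uppercase":  # the only stand-in operation
            raise ValueError(f"unsupported operation {instruction.operation!r}")
        result = data.upper()
        # S404: write the result into the same secondary memory.
        self.memory.write(self.result_address, result)
        # S405: the response tells the processor where the result is.
        return ProcessingResponse(self.result_address, len(result))


def processor_offload(memory: SecondaryMemory, accelerator: Accelerator,
                      data: bytes, data_address: int) -> bytes:
    memory.write(data_address, data)                               # S401
    instruction = ProcessingInstruction(data_address, len(data), "uppercase")
    response = accelerator.handle(instruction)                     # S402 to S405
    return memory.read(response.result_address, response.length)  # S406


memory = SecondaryMemory(64)
accelerator = Accelerator(memory, result_address=32)
result = processor_offload(memory, accelerator, b"abc", data_address=0)
```

Here `result` is `b"ABC"`; note that neither the data nor the result ever passes through a main-memory copy.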

It can be seen that, in the data processing method provided in this embodiment of the application, the first accelerator (such as the accelerator 021 mentioned above) can assist the first processor (such as the processor 011 mentioned above) in processing the data to be processed, so that the data processing capability of the entire heterogeneous system is higher.

In the data processing method, the first processor (such as the processor 011) can directly write the data to be processed into the auxiliary memory (such as the auxiliary memory 041) connected to the first accelerator (such as the accelerator 021). Therefore, the process that the first processor informs the first accelerator to move the data to be processed from the main memory connected with the first processor to the auxiliary memory is avoided, and the process that the first accelerator moves the data to be processed is also avoided.

In addition, in the data processing method, the first accelerator may directly write the processing result to the secondary memory, and the first processor may be capable of acquiring the processing result from the secondary memory. Therefore, the process that the first accelerator informs the first processor that the data to be processed is processed completely and the first processor informs the first accelerator to move the processing result from the auxiliary memory to the main memory is avoided.

Therefore, in the embodiment of the application, the number of times of interaction between the first processor and the first accelerator is small, and the process of the data processing method is simple, so that the data processing efficiency is high.
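To make the comparison concrete, the two flows can be written out as step lists and counted. The step lists below transcribe only what the text describes for the related-art PCIe flow and for S401 to S406; the classification into "notify", "dma", and "access" is an illustrative assumption, not a measured trace.

```python
from collections import Counter

# Each step: (kind, description).
#   "notify" = a processor <-> accelerator interaction
#   "dma"    = a bulk copy performed by the accelerator's DMA engine
#   "access" = a direct load/store by the processor
RELATED_ART_PCIE = [
    ("access", "processor writes data into main memory"),
    ("notify", "processor asks accelerator to DMA data to secondary memory"),
    ("dma", "accelerator copies data from main to secondary memory"),
    ("notify", "processor asks accelerator to process the data"),
    ("notify", "accelerator reports that processing is finished"),
    ("notify", "processor asks accelerator to DMA result to main memory"),
    ("dma", "accelerator copies result from secondary to main memory"),
    ("access", "processor reads result from main memory"),
]

CACHE_COHERENT_BUS = [
    ("access", "S401: processor writes data into secondary memory"),
    ("notify", "S402: processor sends processing instruction"),
    ("notify", "S405: accelerator sends processing response"),
    ("access", "S406: processor reads result from secondary memory"),
]


def summarize(flow: list[tuple[str, str]]) -> Counter:
    """Count the steps of each kind in a flow."""
    return Counter(kind for kind, _ in flow)


related, coherent = summarize(RELATED_ART_PCIE), summarize(CACHE_COHERENT_BUS)
```

Under these step lists the related-art flow has four interactions and two DMA copies, while the cache-coherent flow has two interactions and no DMA copies.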

In addition, to further improve the data processing efficiency of the heterogeneous system, the embodiments of the present application may use a cache coherence bus with a higher transmission rate, for example 25 gigatransfers per second (GT/s).
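A quick way to relate a GT/s figure to byte throughput, assuming the rate is per lane, each transfer carries one bit per lane, and encoding and protocol overhead are ignored (so real throughput is lower). The lane count of 16 in the usage line is an illustrative assumption; the text does not state a link width.

```python
def raw_link_bandwidth_bytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw one-direction bandwidth of a serial link, before encoding/protocol overhead."""
    bits_per_s = gt_per_s * 1e9 * lanes  # one bit per transfer per lane (assumption)
    return bits_per_s / 8


def transfer_time_s(num_bytes: int, gt_per_s: float, lanes: int) -> float:
    """Lower bound on the time to move num_bytes over the link."""
    return num_bytes / raw_link_bandwidth_bytes_per_s(gt_per_s, lanes)


# Hypothetical x16 link at 25 GT/s: 25e9 * 16 / 8 = 50e9 bytes/s (50 GB/s raw).
x16_bandwidth = raw_link_bandwidth_bytes_per_s(25, 16)
one_gb_time = transfer_time_s(10**9, 25, 16)  # about 0.02 s for 1 GB
```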

The embodiment shown in fig. 4 is exemplified by the heterogeneous system shown in fig. 1, and when the heterogeneous system includes a plurality of accelerators (such as the heterogeneous system shown in fig. 2 or fig. 3) connected to each other by using a cache coherence bus, the data processing method further involves an auxiliary accelerator in the heterogeneous system, and an auxiliary memory connected to the auxiliary accelerator.

For example, at this time, the data processing method may be as shown in fig. 5, where in fig. 5, the first processor is the processor 011 in fig. 2, the first accelerator is the accelerator 021 in fig. 2, the first accelerator-connected secondary memory (which may be referred to as the first secondary memory) is the secondary memory 041 in fig. 2, the auxiliary accelerator is the accelerator 022 in fig. 2, and the auxiliary accelerator-connected secondary memory is the secondary memory 042 in fig. 2 as an example. Referring to fig. 5, the data processing method may include:

S501, the processor 011 writes the data to be processed into the auxiliary memory 041 connected to the accelerator 021. S502 is performed.

S501 may refer to S401, and details of the embodiment of the present application are not described herein.

S502, the processor 011 sends a processing instruction for the data to be processed to the accelerator 021. The processing instruction carries an accelerator identifier and the identifier of the processor 011, where the accelerator identifier identifies the accelerator in the heterogeneous system that is to execute the processing instruction. S503 is performed.

S502 may refer to S402, which is not described herein in detail in this embodiment of the present application.

In addition, in the embodiment of the present application, since the heterogeneous system includes a plurality of accelerators, in order to associate a processing instruction with an accelerator for executing the processing instruction, the processing instruction needs to carry an identification of the accelerator for executing the processing instruction. Also, in the case where the heterogeneous system includes multiple processors, in order to associate a processing instruction with the processor that issued the processing instruction, the processing instruction needs to carry an identification of the processor 011 that issued the processing instruction.

S503, the accelerator 021 checks whether the accelerator identifier in the processing instruction is the identifier of the accelerator 021. If it is, S504 is performed; if it is not, S508 is performed.

Because the heterogeneous system in this embodiment includes multiple accelerators, a processing instruction may be delivered to an accelerator other than the one the first processor intended. Therefore, after receiving the processing instruction sent by the processor 011, the accelerator 021 checks whether the accelerator identifier carried in the instruction matches its own identifier, to determine whether it is the accelerator designated by the processor 011 to execute the instruction.

When the accelerator identifier in the processing instruction is the identifier of the accelerator 021, the accelerator 021 determines that it is the accelerator designated by the processor 011 to execute the instruction. In this case, the accelerator 021 performs S504 to process the data according to the processing instruction.

When the identification of the accelerator in a processing instruction is not that of the accelerator 021, the accelerator 021 may determine that it is not the accelerator designated by the processor 011 for executing the processing instruction. At this time, the accelerator 021 may execute S508 to trigger the accelerator 022 designated by the processor 011 to perform corresponding data processing according to the processing instruction.
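The decision in S503 reduces to comparing the carried accelerator identifier with the receiver's own identifier. A minimal sketch, assuming identifiers are plain integers (the values 21, 22, and 11 below simply reuse the reference numerals and are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCESS_LOCALLY = "S504"  # this accelerator is the designated one
    FORWARD = "S508"          # hand data and instruction to the designated peer


@dataclass(frozen=True)
class ProcessingInstruction:
    accelerator_id: int  # accelerator designated to execute the instruction
    processor_id: int    # processor that issued the instruction
    data_address: int
    length: int


def route(own_id: int, instruction: ProcessingInstruction) -> Action:
    """S503: decide whether to process locally or forward to the designated accelerator."""
    if instruction.accelerator_id == own_id:
        return Action.PROCESS_LOCALLY
    return Action.FORWARD


to_021 = ProcessingInstruction(accelerator_id=21, processor_id=11, data_address=0, length=4)
to_022 = ProcessingInstruction(accelerator_id=22, processor_id=11, data_address=0, length=4)
```

Calling `route(21, to_021)` yields `Action.PROCESS_LOCALLY`, and `route(21, to_022)` yields `Action.FORWARD`.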

S504, the accelerator 021 processes the data to be processed in the auxiliary memory 041 according to the processing instruction. S505 is executed.

S504 may refer to S403, which is not described herein again in this embodiment of the present application.

S505, the accelerator 021 writes the processing result of the data to be processed into the connected auxiliary memory 041. S506 is performed.

S505 may refer to S404, which is not described herein again in this embodiment of the present application.

S506, the accelerator 021 sends a processing response of the data to be processed to the processor 011. S507 is executed.

S506 may refer to S405, which is not described herein again in this embodiment of the present application.

S507, the processor 011 reads the processing result from the auxiliary memory 041 connected to the accelerator 021 according to the processing response sent by the accelerator 021.

S507 may refer to S406, which is not described herein again in this embodiment of the present application.

S508, the accelerator 021 writes the data to be processed into the secondary memory 042 connected to the accelerator 022 indicated by the accelerator identifier. S509 is performed.

S509, the accelerator 021 forwards a processing instruction of the data to be processed to the accelerator 022. S510 is performed.

Since the accelerators in the heterogeneous system are connected to each other in the embodiment of the present application, the accelerator 021 is connected to the accelerator 022, and the accelerator 021 can write the data to be processed into the secondary memory 042 connected to the accelerator 022 based on the connection and send the processing instruction to the accelerator 022.

S510, the accelerator 022 processes the data to be processed in the connected secondary memory 042 according to the processing instruction. S511 is performed.

The processing procedure in S510 may refer to the processing procedure in S403, and details of the embodiment of the present application are not described herein.

S511, the accelerator 022 writes the processing result of the data to be processed into the connected secondary memory 042. S512 is performed.

The process of writing the processing result in S511 may refer to the process of writing the processing result in S404, which is not described herein again in this embodiment of the present application.

S512, the accelerator 022 sends a processing response of the data to be processed to the processor 011 according to the identifier of the processor 011 carried by the processing instruction. S513 is performed.

Since the processing instruction carries the identifier of the processor 011 sending the processing instruction, after the accelerator 022 finishes executing the processing instruction sent by the accelerator 021, the accelerator 022 can send a processing response to the processor 011 according to the identifier of the processor 011 to instruct the processor 011 to acquire a processing result of the data to be processed. At this time, the processing response needs to carry the storage address of the processing result in the auxiliary memory 042 to which the accelerator 022 is connected.

S513, the processor 011 reads the processing result from the auxiliary memory 042 connected to the accelerator 022, based on the processing response transmitted from the accelerator 022.

The process of reading the processing result in S513 may refer to the process of reading the processing result in S406, which is not described herein again in this embodiment of the application.
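The forwarding path (S508 to S513) can be sketched as follows, assuming each accelerator owns a small byte-array secondary memory, forwarded data lands at the same address in the peer's memory, and processing responses are delivered to per-processor mailboxes. All of these, plus the byte-reversal "processing" and the fixed result address, are hypothetical simplifications of what the bus actually carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    accelerator_id: int  # accelerator designated to execute it
    processor_id: int    # processor that issued it
    data_address: int
    length: int


@dataclass(frozen=True)
class Response:
    accelerator_id: int  # whose secondary memory holds the result
    result_address: int
    length: int


class Accelerator:
    RESULT_ADDRESS = 32  # hypothetical fixed result area

    def __init__(self, acc_id: int, mailboxes: dict[int, list[Response]]) -> None:
        self.acc_id = acc_id
        self.secondary_memory = bytearray(64)
        self.peers: dict[int, "Accelerator"] = {}
        self.mailboxes = mailboxes  # processor id -> responses delivered to it

    def receive(self, inst: Instruction) -> None:
        span = slice(inst.data_address, inst.data_address + inst.length)
        if inst.accelerator_id != self.acc_id:
            # S508: copy the data into the designated accelerator's secondary memory.
            target = self.peers[inst.accelerator_id]
            target.secondary_memory[span] = self.secondary_memory[span]
            target.receive(inst)  # S509: forward the unchanged instruction
            return
        # S504/S510: process (byte reversal stands in for the real processing).
        result = bytes(self.secondary_memory[span])[::-1]
        end = self.RESULT_ADDRESS + len(result)
        self.secondary_memory[self.RESULT_ADDRESS:end] = result  # S505/S511
        # S506/S512: respond directly to the issuing processor by its identifier.
        self.mailboxes.setdefault(inst.processor_id, []).append(
            Response(self.acc_id, self.RESULT_ADDRESS, len(result)))


mailboxes: dict[int, list[Response]] = {}
acc021, acc022 = Accelerator(21, mailboxes), Accelerator(22, mailboxes)
acc021.peers[22] = acc022
accelerators = {21: acc021, 22: acc022}

# S501/S502: processor 011 writes into memory 041 and designates accelerator 022.
acc021.secondary_memory[0:5] = b"hello"
acc021.receive(Instruction(accelerator_id=22, processor_id=11, data_address=0, length=5))

# S513: read the result from the secondary memory named in the response.
response = mailboxes[11][0]
owner = accelerators[response.accelerator_id]
result = bytes(owner.secondary_memory[response.result_address:
                                      response.result_address + response.length])
```

The response names accelerator 022, so the processor reads `b"olleh"` from secondary memory 042, not 041.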

It can be seen that, in the data processing method provided in the embodiment of the present application, the first accelerator (such as the accelerator 021 described above) or the auxiliary accelerator (such as the accelerator 022 described above) can assist the first processor (such as the processor 011 described above) in processing the data to be processed, so that the data processing capability of the entire heterogeneous system is higher.

In the data processing method, the first processor can directly write the data to be processed into the auxiliary memory (such as the auxiliary memory 041) connected to the first accelerator. Therefore, the process that the first processor informs the first accelerator to move the data to be processed from the main memory connected with the first processor to the auxiliary memory is avoided, and the process that the first accelerator moves the data to be processed is also avoided.

In addition, in the data processing method, the first accelerator or the auxiliary accelerator may directly write the processing result to the auxiliary memory, and the first processor may be capable of acquiring the processing result from the auxiliary memory. Therefore, the process that the first accelerator or the auxiliary accelerator informs the first processor that the data to be processed has been processed completely and the first processor informs the first accelerator or the auxiliary accelerator to move the processing result from the auxiliary memory to the main memory is avoided.

Therefore, in the embodiment of the present application, the number of interactions between the first processor and the first accelerator or the auxiliary accelerator is small, and the process of the data processing method is simple, so that the efficiency of data processing is high.

In addition, to further improve the data processing efficiency of the heterogeneous system, the embodiments of the present application may use a cache coherence bus with a higher transmission rate, for example 25 gigatransfers per second (GT/s).

Alternatively, in the case that the heterogeneous system includes a plurality of accelerators, the processor in the heterogeneous system may control the plurality of accelerators to perform data processing in parallel based on the data processing method shown in fig. 5, thereby further improving the data processing efficiency of the entire heterogeneous system.
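A sketch of the parallel case, assuming the processor splits its work into chunks and issues them round-robin to several accelerators. A thread pool stands in for the concurrently outstanding instructions; `process_on_accelerator` is a hypothetical placeholder for one full S501 to S513 round trip, and round-robin assignment is an illustrative choice the text does not specify.

```python
from concurrent.futures import ThreadPoolExecutor


def process_on_accelerator(accelerator_id: int, chunk: bytes) -> tuple[int, bytes]:
    """Hypothetical stand-in for one S501-S513 round trip on one accelerator."""
    return accelerator_id, chunk.upper()


def offload_in_parallel(chunks: list[bytes],
                        accelerator_ids: list[int]) -> list[tuple[int, bytes]]:
    """Issue chunks round-robin to accelerators and gather results in input order."""
    with ThreadPoolExecutor(max_workers=len(accelerator_ids)) as pool:
        futures = [
            pool.submit(process_on_accelerator,
                        accelerator_ids[i % len(accelerator_ids)], chunk)
            for i, chunk in enumerate(chunks)
        ]
        return [future.result() for future in futures]


results = offload_in_parallel([b"a", b"b", b"c"], [21, 22, 23, 24])
```

Collecting futures in submission order keeps results aligned with the input chunks even though the accelerators may finish in any order.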

As the above embodiments show (for example, S402 and S403 in the embodiment shown in fig. 4, or S502 and S504 in the embodiment shown in fig. 5), the first processor may trigger the first accelerator to process the data to be processed in the connected secondary memory according to the processing instruction. Likewise (for example, S405 and S406 in the embodiment shown in fig. 4, or S506 and S507 in the embodiment shown in fig. 5), the first accelerator may trigger the first processor to read the processing result from the secondary memory connected to the first accelerator.

In the above embodiments, the first processor triggers the first accelerator to perform data processing by sending it a processing instruction, and the first accelerator triggers the first processor to read the processing result by sending it a processing response. Alternatively, the first processor need not trigger the first accelerator by sending a processing instruction, and the first accelerator need not trigger the first processor by sending a processing response.

For example, the storage space on the secondary memory may include three types of storage units: data storage units for storing data, instruction storage units for storing processing instructions, and result storage units for storing processing results. In addition, the I/O registers in the first accelerator may correspond to the data storage units, instruction storage units, and result storage units on the secondary memory connected to the first accelerator. Both the first processor and the first accelerator can obtain this correspondence and execute the data processing method based on it.

For example, when writing the data to be processed into a certain data storage unit in the auxiliary memory connected to the first accelerator, the first processor may write the processing instruction into the instruction storage unit corresponding to the data storage unit according to the correspondence, and modify the state value of the I/O register corresponding to the data storage unit. The I/O register may have a plurality of state values, which may include: a first state value and a second state value. Before the first processor changes the state value of a certain I/O register in the first accelerator, the state value of the I/O register is a first state value; after the first processor changes the state value of an I/O register in the first accelerator, the state value of the I/O register changes to a second state value. When the first accelerator detects that the state value of a certain I/O register is changed into a second state value, the first accelerator can acquire a processing instruction from the instruction storage unit corresponding to the I/O register according to the corresponding relation, and read data to be processed from the data storage unit corresponding to the I/O register. And then, the first accelerator can process the data to be processed according to the processing instruction to obtain a processing result of the data to be processed.

After obtaining the processing result of the data to be processed, the first accelerator may, according to the correspondence, write the processing result into the result storage unit corresponding to the I/O register and modify the state value of that I/O register. The state values of the I/O register may further include a third state value: after obtaining the processing result, the first accelerator may change the state value of the I/O register to the third state value. The first processor may detect whether any I/O register in the first accelerator has the third state value. When the first accelerator has changed the state value of an I/O register to the third state value, the first processor may read the processing result of the data to be processed from the result storage unit corresponding to that I/O register according to the correspondence.

For example, assume that the correspondence relationship is as shown in table 2, and when the processor 011 does not change the state values of the I/O registers in the accelerator 021, the state values of the respective I/O registers are the first state values 0 shown in table 2. If the processor 011 writes the data to be processed into the data storage unit 1.1 and writes the processing instruction into the instruction storage unit 2.1, the processor 011 can also change the state value of the I/O register 3.1 from the first state value 0 to the second state value 1 as shown in table 3. At this time, the accelerator 021 can detect that the state value of the I/O register 3.1 is changed to the second state value 1, and can acquire the data to be processed from the data storage unit 1.1 corresponding to the I/O register 3.1, acquire the processing instruction from the instruction storage unit 2.1 corresponding to the I/O register 3.1, and process the data to be processed according to the processing instruction to obtain the processing result of the data to be processed. Thereafter, the accelerator 021 may write the processing result into the result storage unit 4.1 corresponding to the data storage unit 1.1, and change the state value of the I/O register 3.1 corresponding to the data storage unit 1.1 to the third state value 2 as shown in table 4. When detecting that the state value of the I/O register 3.1 is the third state value 2, the processor 011 can obtain the processing result of the data to be processed from the result storage unit 4.1 corresponding to the I/O register 3.1.

TABLE 2

Data storage unit | Instruction storage unit | Result storage unit | I/O register | State value
1.1 | 2.1 | 4.1 | 3.1 | 0
1.2 | 2.2 | 4.2 | 3.2 | 0
1.3 | 2.3 | 4.3 | 3.3 | 0

TABLE 3

Data storage unit | Instruction storage unit | Result storage unit | I/O register | State value
1.1 | 2.1 | 4.1 | 3.1 | 1
1.2 | 2.2 | 4.2 | 3.2 | 0
1.3 | 2.3 | 4.3 | 3.3 | 0

TABLE 4

Data storage unit | Instruction storage unit | Result storage unit | I/O register | State value
1.1 | 2.1 | 4.1 | 3.1 | 2
1.2 | 2.2 | 4.2 | 3.2 | 0
1.3 | 2.3 | 4.3 | 3.3 | 0

It can be seen that, in the embodiment of the present application, the first processor may also trigger the first accelerator to perform data processing by changing the state value of the I/O register, and the first accelerator may also trigger the first processor to read a processing result by changing the state value of the I/O register.

Optionally, after the first processor reads the processing result, the first accelerator may change the state value of the I/O register corresponding to the result storage unit holding that result back to the first state value. In this way, the next time, the first processor can again trigger the first accelerator to perform data processing by changing the state value of the I/O register, and the first accelerator can again trigger the first processor to read the processing result by changing the state value of the I/O register.
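The I/O register handshake described above can be modelled in software as a small state machine over the rows of the correspondence table. The following Python sketch is a simplified, single-threaded illustration under stated assumptions: the storage units are dictionary entries keyed by their labels from Tables 2 to 4, the functions `processor_submit`, `accelerator_poll`, `processor_read`, and `accelerator_release` are hypothetical names introduced here, and polling stands in for whatever detection mechanism the hardware actually uses. It is not the implementation of the accelerator 021 or the processor 011.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# State values used in Tables 2 to 4.
FIRST_STATE = 0   # idle: the row is free
SECOND_STATE = 1  # the processor has written the data and the instruction
THIRD_STATE = 2   # the accelerator has written the processing result


@dataclass
class Slot:
    """One row of the correspondence table (units identified by their labels)."""
    data_unit: str
    instruction_unit: str
    result_unit: str
    io_register: str
    state: int = FIRST_STATE


class Correspondence:
    def __init__(self, slots: List[Slot]):
        self.slots = slots
        self.units: Dict[str, object] = {}  # contents of the storage units

    def first_with(self, state: int) -> Optional[Slot]:
        return next((s for s in self.slots if s.state == state), None)


def processor_submit(table: Correspondence, data: object, instruction: str) -> Slot:
    slot = table.first_with(FIRST_STATE)
    if slot is None:
        raise RuntimeError("no idle I/O register")
    table.units[slot.data_unit] = data
    table.units[slot.instruction_unit] = instruction
    slot.state = SECOND_STATE  # triggers the accelerator (Table 3)
    return slot


def accelerator_poll(table: Correspondence,
                     ops: Dict[str, Callable[[object], object]]) -> Optional[Slot]:
    slot = table.first_with(SECOND_STATE)
    if slot is None:
        return None
    data = table.units[slot.data_unit]
    op = ops[table.units[slot.instruction_unit]]
    table.units[slot.result_unit] = op(data)  # write the result first
    slot.state = THIRD_STATE                  # then trigger the processor (Table 4)
    return slot


def processor_read(table: Correspondence) -> Optional[Tuple[Slot, object]]:
    slot = table.first_with(THIRD_STATE)
    if slot is None:
        return None
    return slot, table.units[slot.result_unit]


def accelerator_release(slot: Slot) -> None:
    slot.state = FIRST_STATE  # optional reset so the I/O register can be reused


# Three rows, as in Table 2.
table = Correspondence([Slot(f"1.{i}", f"2.{i}", f"4.{i}", f"3.{i}") for i in (1, 2, 3)])
```

Calling `processor_submit`, `accelerator_poll`, and `processor_read` in order on this three-row table reproduces the state columns of Tables 2, 3, and 4 for the I/O register 3.1; `accelerator_release` then returns it to the first state value so the row can be reused.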

As can be seen from S509 and S510 in the embodiment shown in fig. 5, the first accelerator may trigger the auxiliary accelerator to process, according to the processing instruction, the data to be processed in the auxiliary memory connected to the auxiliary accelerator. As can be seen from S512 and S513 in the same embodiment, the auxiliary accelerator may trigger the first processor to read the processing result from the auxiliary memory connected to the auxiliary accelerator.

In S509 and S510, the first accelerator triggers the auxiliary accelerator to perform data processing by sending it a processing instruction, and in S512 and S513, the auxiliary accelerator triggers the first processor to read the processing result by sending it a processing response; these are merely examples. Alternatively, the first accelerator may trigger the auxiliary accelerator to perform data processing without sending it a processing instruction, and the auxiliary accelerator may trigger the first processor to read the processing result without sending it a processing response.

For example, in S509 and S510, the first accelerator may trigger the auxiliary accelerator to perform data processing by changing the state value of an I/O register, in the same way that the first processor triggers the first accelerator to perform data processing. In S512 and S513, the auxiliary accelerator may trigger the first processor to read the processing result by changing the state value of an I/O register, in the same way that the first accelerator triggers the first processor to read the processing result. Details are not repeated here in the embodiments of the present application.

The above embodiments briefly describe the functions of each device in the heterogeneous system by way of the data processing method for the heterogeneous system. The functional modules of each device in the heterogeneous system are further described below.

For example, fig. 6 is a schematic diagram of functional modules of a heterogeneous system according to an embodiment of the present application. Fig. 6 takes one group consisting of a connected processor, accelerator, and secondary memory as an example; when the heterogeneous system includes multiple such groups, the functional modules in each group may be as shown in fig. 6.

As shown in fig. 6, the processor may include an application adaptation layer, an acceleration application programming interface (API), an inter-process shared memory, and a cache coherent memory. The accelerator may include a cache coherency module and a processing module. The cache coherent memory in the processor is connected to the cache coherency module in the accelerator through a cache coherent bus, and both the cache coherency module and the processing module are connected to the secondary memory.

In the processor, application software running on the processor can call the acceleration API through the application adaptation layer. The acceleration API implements data conversion and control between the application software and the accelerator. The inter-process shared memory is used for communication among multiple processes running on the processor, and the cache coherent memory is used for communication between the processor and the accelerator.

In the accelerator, the processing module executes the processing operations performed by the accelerator in the data processing method, and may further trigger the cache coherency module to execute the read and write operations performed by the accelerator in the data processing method.

In the data processing method, reads and writes of data in the secondary memory by the processor and by the processing module in the accelerator are both implemented through the cache coherency module. For example, when the processor or the processing module in the accelerator needs to read or write data in the secondary memory, it may send a read/write request to the cache coherency module. The cache coherency module may generate a request agent (RA) (not shown in fig. 6) for each received read/write request, and the RA performs the corresponding read or write operation. In addition, when an RA reads data from the secondary memory, it caches a copy of the data, so that the next read of that data can be served from the local copy instead of from the secondary memory.

The cache coherency module further includes a home agent (HA) (not shown in fig. 6), which manages all the RAs in the cache coherency module to maintain cache coherency.

For example, before reading or writing data in the secondary memory, each RA first sends a read or write request to the HA.

For an RA that reads data (for example, a processing instruction, data to be processed, or a processing result), the HA, upon receiving the read request from the RA, grants the RA permission to read the data in the secondary memory, after which the RA can read the data.

For an RA that writes data (for example, a processing instruction, data to be processed, or a processing result) to an address in the secondary memory, the HA, upon receiving the write request from the RA, performs a consistency check to ensure that the RA has exclusive rights to the address. For example, the HA may check whether any other RA currently caches a copy of the data at the address. If another RA does, a write by this RA would make that cached copy inconsistent with the actual data at the address. Therefore, during the consistency check, the HA invalidates all such copies and then grants the writing RA permission to write to the address, after which the RA can write the data. This ensures that every RA reads consistent data at the address. It should be noted that once an RA's cached copy has been invalidated, if the RA needs the data again, it sends a new read request for the data in the secondary memory to the HA.
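The read and write handling by the HA resembles a directory-based invalidation protocol. The Python sketch below is a simplified model under assumptions not stated in the text: `HomeAgent` and `RequestAgent` are hypothetical classes, the HA keeps a per-address set of the RAs holding copies, writes go straight through to the secondary memory, and the writing RA holds the only valid copy afterwards. Real CCIX or CXL agents exchange protocol messages over the bus rather than calling methods on each other.

```python
from typing import Dict, Set


class RequestAgent:
    """Hypothetical RA: caches a copy of every address it reads."""

    def __init__(self, name: str, home: "HomeAgent"):
        self.name = name
        self.home = home
        self.cache: Dict[int, int] = {}

    def read(self, addr: int) -> int:
        if addr in self.cache:            # hit: serve from the local copy
            return self.cache[addr]
        value = self.home.grant_read(self, addr)
        self.cache[addr] = value
        return value

    def write(self, addr: int, value: int) -> None:
        self.home.grant_write(self, addr, value)
        self.cache[addr] = value

    def invalidate(self, addr: int) -> None:
        self.cache.pop(addr, None)        # the next read must go back to the HA


class HomeAgent:
    """Hypothetical HA: tracks which RAs hold a copy of each address."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory              # stands in for the secondary memory
        self.sharers: Dict[int, Set[RequestAgent]] = {}

    def grant_read(self, ra: RequestAgent, addr: int) -> int:
        self.sharers.setdefault(addr, set()).add(ra)
        return self.memory[addr]

    def grant_write(self, ra: RequestAgent, addr: int, value: int) -> None:
        # Consistency check: invalidate every other RA's copy before the write.
        for other in self.sharers.get(addr, set()) - {ra}:
            other.invalidate(addr)
        self.sharers[addr] = {ra}         # the writer now holds the only copy
        self.memory[addr] = value
```

For instance, if one RA has cached an address and another RA writes to it, the first RA's copy is dropped, so its next `read` goes back through the HA and returns the new value.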

Further, if the data processing method involves reading and writing the state values of the I/O registers in the accelerator, these reads and writes may also be implemented through the cache coherency module, so as to keep the state values of the I/O registers cache-coherent. This process is similar to the process in which the cache coherency module reads and writes the secondary memory, and is not described again in this embodiment of the present application.

The data processing method provided in the present application has been described in detail above with reference to fig. 1 to fig. 6. Corresponding to this method, the heterogeneous system provided in the embodiments of the present application includes a first processor and a first accelerator that are connected to each other, and a first secondary memory connected to the first accelerator.

The first processor is used for writing data to be processed into the first auxiliary memory; the first processor is also used for triggering the first accelerator to process the data to be processed in the first auxiliary memory according to the processing instruction; the first accelerator is used for writing a processing result of the data to be processed into the first auxiliary memory; the first accelerator is used for triggering the first processor to read the processing result from the first auxiliary memory.

Optionally, the first processor and the first accelerator are connected by a cache coherent bus.

Optionally, the cache coherency bus comprises: a CCIX bus, or a CXL bus.

Optionally, the cache coherency bus comprises a CCIX bus and the first processor comprises an ARM architecture processor; alternatively, the cache coherency bus comprises a CXL bus and the first processor comprises an x86 architecture processor.

Optionally, the secondary memory comprises: HBM.

Optionally, the accelerator comprises: GPU, FPGA, or ASIC.

Optionally, the heterogeneous system comprises a plurality of accelerators connected to one another, and the first accelerator is any one of the plurality of accelerators. The processing instruction carries an accelerator identifier, which identifies the accelerator, among the plurality of accelerators, that is to execute the processing instruction. The first accelerator is configured to process the data to be processed in the first auxiliary memory according to the processing instruction when the accelerator identifier is the identifier of the first accelerator.

Optionally, the heterogeneous system comprises a plurality of auxiliary memories connected to the plurality of accelerators in one-to-one correspondence, and a plurality of processors connected to one another, where the first processor is any one of the plurality of processors that is connected to the first accelerator. The processing instruction further carries an identifier of the first processor. When the accelerator identifier is not the identifier of the first accelerator, the first accelerator is configured to write the data to be processed into the auxiliary memory connected to the auxiliary accelerator indicated by the accelerator identifier, and to trigger the auxiliary accelerator to process the data to be processed according to the processing instruction. The auxiliary accelerator is configured to: after processing the data to be processed according to the processing instruction, write the processing result into the auxiliary memory connected to it; and trigger, according to the identifier of the first processor carried in the processing instruction, the first processor to read the processing result from the auxiliary memory connected to the auxiliary accelerator.

Optionally, the plurality of accelerators are connected through a cache coherent bus, and the plurality of processors are connected through a cache coherent bus.
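The routing rule based on the accelerator identifier and the processor identifier can be sketched as follows. This Python model is illustrative only: `Instruction`, `Accelerator`, `Processor`, and `System` are hypothetical names, each auxiliary memory is a dictionary, the `.result` key suffix is an arbitrary choice, and a direct method call stands in for triggering over the cache coherent bus (including, as a simplification, the path from the auxiliary accelerator back to the first processor).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instruction:
    accelerator_id: str                  # accelerator that must execute the instruction
    processor_id: str                    # processor that must read the result
    op: Callable[[List[int]], int]       # the processing to perform (hypothetical)


class System:
    def __init__(self) -> None:
        self.processors: Dict[str, "Processor"] = {}
        self.accelerators: Dict[str, "Accelerator"] = {}


class Processor:
    def __init__(self, pid: str, system: System):
        self.pid = pid
        self.results: List[object] = []
        system.processors[pid] = self

    def on_result_ready(self, aux_memory: Dict[str, object], key: str) -> None:
        # Triggered by an accelerator: read the result from that accelerator's memory.
        self.results.append(aux_memory[key])


class Accelerator:
    def __init__(self, aid: str, system: System):
        self.aid = aid
        self.system = system
        self.aux_memory: Dict[str, object] = {}
        system.accelerators[aid] = self

    def handle(self, key: str, instr: Instruction) -> None:
        data = self.aux_memory[key]
        if instr.accelerator_id == self.aid:
            # Identifier matches: process locally, write the result, trigger the processor.
            result_key = key + ".result"
            self.aux_memory[result_key] = instr.op(data)
            processor = self.system.processors[instr.processor_id]
            processor.on_result_ready(self.aux_memory, result_key)
        else:
            # Identifier differs: write the data into the target accelerator's
            # auxiliary memory and trigger that (auxiliary) accelerator.
            target = self.system.accelerators[instr.accelerator_id]
            target.aux_memory[key] = data
            target.handle(key, instr)
```

With this structure the same `handle` method covers both branches: the first accelerator forwards when the identifier does not match, and the auxiliary accelerator, finding its own identifier, processes the data and notifies the processor named in the instruction.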

It can be seen that, in the data processing method provided in the embodiment of the present application, the first accelerator can assist the first processor in processing the data to be processed, so that the data processing capability of the entire heterogeneous system is high.

In addition, in the data processing method, the first processor can write the data to be processed directly into the auxiliary memory connected to the first accelerator. This avoids both the step in which the first processor instructs the first accelerator to move the data to be processed from the main memory connected to the first processor into the auxiliary memory, and the step in which the first accelerator actually moves that data.

In addition, in the data processing method, the first accelerator can write the processing result directly into the auxiliary memory, and the first processor can read the processing result from the auxiliary memory. This avoids the steps in which the first accelerator notifies the first processor that processing of the data is complete and the first processor then instructs the first accelerator to move the processing result from the auxiliary memory into the main memory.

Therefore, in the embodiment of the application, the number of times of interaction between the first processor and the first accelerator is small, and the process of the data processing method is simple, so that the data processing efficiency is high.

Further, each data processing apparatus in the data processing system provided by the present application will be described below with reference to fig. 7 to 9.

For example, fig. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be the first accelerator in the data processing system provided in the embodiments of the present application. As shown in fig. 7, the data processing apparatus includes:

a processing module 701, configured to process, when triggered by the first processor, the data to be processed in the first auxiliary memory according to the processing instruction. For the operation performed by the processing module 701, refer to S403 or S504 (or the related description), which is not repeated here;

a writing module 702, configured to write a processing result of the data to be processed into the first auxiliary memory. For the operation performed by the writing module 702, refer to S404 or S505 (or the related description), which is not repeated here; and

a triggering module 703, configured to trigger the first processor to read the processing result from the first secondary memory. For the operation performed by the triggering module 703, refer to S405 or S506 (or the related description), which is not repeated here.

Optionally, the data processing apparatus is further configured to perform other operations in the data processing method shown in fig. 5. For example, the processing module 701 is further configured to execute S503 in fig. 5, the writing module 702 is further configured to execute S508 in fig. 5, and the triggering module 703 is further configured to execute S509 in fig. 5. The specific flow of each module in the data processing apparatus for executing each step is described above with reference to fig. 4 and 5, and is not described here again.

By way of further example, fig. 8 is a block diagram of another data processing apparatus according to an embodiment of the present application. The data processing apparatus may be the auxiliary accelerator in the data processing system provided in the embodiments of the present application. In this case, the heterogeneous system includes a plurality of processors connected to one another, a plurality of accelerators connected to one another, and a plurality of auxiliary memories connected to the plurality of accelerators in one-to-one correspondence; the auxiliary accelerator and the first accelerator are any two connected accelerators among the plurality of accelerators. As shown in fig. 8, the data processing apparatus includes:

a processing module 801, configured to process, when triggered by the first accelerator, the data to be processed in the auxiliary memory connected to the auxiliary accelerator according to a processing instruction, where the processing instruction carries an identifier of the first processor connected to the first accelerator. For the operation performed by the processing module 801, refer to S510 (or the related description), which is not repeated here;

a writing module 802, configured to write a processing result of the data to be processed into the auxiliary memory connected to the auxiliary accelerator. For the operation performed by the writing module 802, refer to S511 (or the related description), which is not repeated here; and

a triggering module 803, configured to trigger, according to the identifier of the first processor carried in the processing instruction, the first processor to read the processing result from the auxiliary memory connected to the auxiliary accelerator. For the operation performed by the triggering module 803, refer to S512 (or the related description), which is not repeated here.

By way of further example, fig. 9 is a block diagram of another data processing apparatus according to an embodiment of the present application. The data processing apparatus may be the first processor in the data processing system provided in the embodiments of the present application. In this case, the heterogeneous system further includes a first accelerator connected to the first processor and a first secondary memory connected to the first accelerator. As shown in fig. 9, the data processing apparatus includes:

a writing module 901, configured to write data to be processed into the first auxiliary memory. For the operation performed by the writing module 901, refer to S401 or S501 (or the related description), which is not repeated here;

a triggering module 902, configured to trigger the first accelerator to process the data to be processed in the first auxiliary memory according to the processing instruction. For the operation performed by the triggering module 902, refer to S402 or S502 (or the related description), which is not repeated here; and

a reading module 903, configured to read, when triggered by the first accelerator, a processing result of the data to be processed from the first secondary memory. For the operation performed by the reading module 903, refer to S406 or S507 (or the related description), which is not repeated here.

Optionally, the data processing apparatus is further configured to perform other operations in the data processing method shown in fig. 5. For example, the reading module 903 is further configured to execute S513 in fig. 5. The specific flow of each module in the data processing apparatus for executing each step is described above with reference to fig. 4 and 5, and is not described here again.
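The module division of fig. 7 and fig. 9 can be pictured as two cooperating objects, one per apparatus, with each module mapped to a method. The following Python sketch is a hypothetical illustration: the class and method names, the `/result` key convention, and the `notify` callback used as the trigger path are assumptions, and the step numbers in the comments only point back to the operations referenced above.

```python
from typing import Callable, Dict, List, Optional


class FirstAccelerator:
    """Sketch of the fig. 7 apparatus: modules 701 to 703 as methods."""

    def __init__(self, aux_memory: Dict[str, object]):
        self.mem = aux_memory                                   # first auxiliary memory
        self.notify: Optional[Callable[[str], None]] = None     # hypothetical trigger path

    def processing_module(self, data_key: str, op: Callable) -> object:   # 701: S403/S504
        return op(self.mem[data_key])

    def writing_module(self, result_key: str, result: object) -> None:    # 702: S404/S505
        self.mem[result_key] = result

    def triggering_module(self, result_key: str) -> None:                 # 703: S405/S506
        if self.notify is not None:
            self.notify(result_key)

    def on_trigger(self, data_key: str, op: Callable) -> None:
        result = self.processing_module(data_key, op)
        result_key = data_key + "/result"
        self.writing_module(result_key, result)
        self.triggering_module(result_key)


class FirstProcessor:
    """Sketch of the fig. 9 apparatus: modules 901 to 903 as methods."""

    def __init__(self, accelerator: FirstAccelerator):
        self.results: List[object] = []
        self.acc = accelerator
        self.acc.notify = self.reading_module   # the accelerator triggers module 903

    def writing_module(self, data_key: str, data: object) -> None:        # 901: S401/S501
        self.acc.mem[data_key] = data           # written directly into the auxiliary memory

    def triggering_module(self, data_key: str, op: Callable) -> None:     # 902: S402/S502
        self.acc.on_trigger(data_key, op)

    def reading_module(self, result_key: str) -> None:                    # 903: S406/S507
        self.results.append(self.acc.mem[result_key])
```

In this sketch, the processor never copies data through its own main memory: it writes into the accelerator's auxiliary memory, triggers processing, and is called back to read the result from the same memory, mirroring the reduced number of interactions described earlier.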

An embodiment of the present application provides a computer storage medium storing a computer program, and the computer program is used to execute any data processing method provided in the present application.

The present application provides a computer program product including instructions which, when run on a computer device, cause the computer device to execute any data processing method provided in the present application.

All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented entirely or partially in the form of a computer program product, which includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium, or a semiconductor medium (e.g., solid state disk), among others.

In this application, the terms "first" and "second," etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "at least one" means one or more, and "a plurality" means two or more, unless expressly defined otherwise.

Different types of embodiments such as the method embodiment and the apparatus embodiment provided by the embodiment of the present application can be mutually referred to, and the embodiment of the present application does not limit this. The sequence of operations in the method embodiments provided in the present application can be appropriately adjusted, and the operations can also be increased or decreased according to the situation, and any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application, and therefore, the details are not repeated.

In the corresponding embodiments provided in the present application, it should be understood that the disclosed apparatus and the like may be implemented by other configuration modes. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.

Modules described as separate components may or may not be physically separate, and components described as modules may or may not be physical units, may be located in one place, or may be distributed on multiple devices. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
