Heterogeneous multi-core data transmission method, device, equipment and storage medium

Document No.: 19990 Publication date: 2021-09-21

Reading notes: This technology, "Heterogeneous multi-core data transmission method, device, equipment and storage medium", was designed by 闫野鹤, 梁天乐, 吴飞红, 黄晓峰, 宋磊 and 贾惠柱 on 2021-08-25. Its main content is as follows: The invention discloses a method, a device, equipment and a storage medium for transmitting heterogeneous multi-core data. The method comprises: acquiring image data to be processed and dividing the image data into image blocks of equal size; calculating the number of image blocks that each intellectual property core transfers at a time according to the size of the image blocks and the capacity of the local memory of each intellectual property core; and each intellectual property core, in the data reading order determined by the bus and following the connection order of the image blocks, sequentially transferring that number of image blocks from the shared memory to its local memory, then reading the image data from the local memory and performing calculation, until processing of the image data is complete. According to the transmission method of this embodiment, the time for which each intellectual property core occupies the bus can be reduced, the delay in each intellectual property core's data access is lowered, and the computing power of each intellectual property core per unit time is maximized.

1. A method for transmitting heterogeneous multi-core data, characterized by comprising the following steps:

acquiring image data to be processed, and dividing the image data into image blocks of equal size;

calculating the number of image blocks that each intellectual property core transfers at a time according to the size of the image blocks and the capacity of the local memory of each intellectual property core;

and each intellectual property core sequentially transferring that number of image blocks from the shared memory to the local memory, in the data reading order determined by the bus and following the connection order of the image blocks, and reading the image data from the local memory and performing calculation, until processing of the image data is complete.

2. The method of claim 1, wherein partitioning the image data into equal-sized image blocks comprises:

obtaining the size of the image block through a dynamic programming algorithm and a preset segmentation rule;

and dividing the image data into image blocks with equal sizes according to the sizes of the image blocks.

3. The method of claim 1, wherein calculating the number of single-transmission image blocks for each intellectual property core based on the size of the image blocks and the capacity of the local memory of each intellectual property core comprises calculating the number of single-transmission image blocks for each intellectual property core based on the following formula:

wherein n represents the number of image blocks transmitted by the intellectual property core at a time, localmemory represents the capacity of the local memory of the intellectual property core, and size represents the size of the image blocks.

4. The method of claim 1, wherein each intellectual property core sequentially transfers the number of tiles from the shared memory to the local memory in a data reading order determined by the bus and in accordance with the order of connection of the tiles, each intellectual property core reading image data from the local memory and performing calculations comprises:

each intellectual property core divides its local memory equally into a first local memory and a second local memory;

transferring the number of image blocks from the shared memory to the first local memory, and calculating the image data in the first local memory;

transferring the number of image blocks from the shared memory to the second local memory while the image data in the first local memory is being calculated;

and after the intellectual property core finishes calculating the image data in the first local memory, calculating the image data in the second local memory.

5. The method of claim 1, wherein each intellectual property core sequentially transfers the number of tiles from the shared memory to the local memory according to a data reading order determined by the bus and a connection order of the tiles, and after each intellectual property core reads image data from the local memory and performs the calculation, further comprising:

and dynamically adjusting the number of image blocks that each intellectual property core transfers at a time according to the duration of each data transfer of that intellectual property core.

6. The method of claim 5, wherein dynamically adjusting the number of tiles transmitted per intellectual property core per time based on the duration of each intellectual property core's data transmission comprises:

calculating the average data transmission duration of each intellectual property core at the current moment according to the data transmission durations and the number of data transmissions of that intellectual property core;

if the average data transmission duration of the intellectual property core at the current moment is greater than a preset duration threshold, reducing the number of image blocks that the intellectual property core transfers at a time by a preset value;

and if the average data transmission duration of the intellectual property core at the current moment is less than the preset duration threshold, increasing the number of image blocks that the intellectual property core transfers at a time by a preset value.

7. A device for transmitting heterogeneous multi-core data, comprising:

the segmentation module is used for acquiring image data to be processed and segmenting the image data into image blocks with equal sizes;

the calculation module is used for calculating the number of the image blocks transmitted by each intellectual property core in a single time according to the size of the image blocks and the capacity of a local memory of each intellectual property core;

and the transmission module is used for each intellectual property core to sequentially transfer that number of image blocks from the shared memory to the local memory, in the data reading order determined by the bus and following the connection order of the image blocks, and to read the image data from the local memory and perform calculation, until processing of the image data is complete.

8. The apparatus of claim 7, wherein the transmission module further comprises a dynamic adjustment unit configured to dynamically adjust the number of the image blocks transmitted by each intellectual property core for a single time according to a data transmission duration of each intellectual property core for each time.

9. A heterogeneous multi-core data transmission device comprising a processor and a memory storing program instructions, the processor being configured to perform the method of transmitting heterogeneous multi-core data according to any one of claims 1 to 6 when executing the program instructions.

10. A computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement a method for heterogeneous multi-core data transmission according to any one of claims 1 to 6.

Technical Field

The invention relates to the technical field of heterogeneous multi-core chips, in particular to a method, a device, equipment and a storage medium for transmitting heterogeneous multi-core data.

Background

On-chip multi-core processors have become the main direction of research and development for multi-core processors. As integrated circuit design and manufacturing processes improve, more and more resources can be integrated on a chip, and on-chip multi-core systems, especially heterogeneous multi-core systems, can exploit the strengths of different computing units to accelerate different embedded applications. A computing system with a heterogeneous architecture can use several computing modes at once and take full advantage of each mode in the application fields where it works best. Integrating various feasible computing modes into one chip is therefore becoming a development trend for heterogeneous computing systems.

However, as the number of IP cores (intellectual property cores) integrated inside a chip increases, when a heterogeneous multi-core chip performs pipeline task processing in a multi-core cooperation manner, it becomes important that a plurality of IP cores reasonably utilize bandwidth.

Therefore, how to reduce the delay in each IP core's data access and improve the computing performance of the chip is an urgent technical problem for those skilled in the art.

Disclosure of Invention

The embodiment of the disclosure provides a method, a device, equipment and a storage medium for transmitting heterogeneous multi-core data. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

In a first aspect, an embodiment of the present disclosure provides a method for transmitting heterogeneous multi-core data, including:

acquiring image data to be processed, and dividing the image data into image blocks with equal sizes;

calculating the number of the image blocks transmitted by each intellectual property core for one time according to the size of the image blocks and the capacity of a local memory of each intellectual property core;

and each intellectual property core sequentially transferring that number of image blocks from the shared memory to the local memory, in the data reading order determined by the bus and following the connection order of the image blocks, and reading the image data from the local memory and performing calculation, until processing of the image data is complete.

In one embodiment, partitioning image data into equal-sized image blocks comprises:

obtaining the size of the image block through a dynamic programming algorithm and a preset segmentation rule;

the image data is divided into image blocks of equal size according to the size of the image blocks.

In one embodiment, calculating the number of single-transmission image blocks for each intellectual property core based on the size of the image blocks and the capacity of the local memory of each intellectual property core comprises calculating the number of single-transmission image blocks for each intellectual property core based on the following formula:

wherein n represents the number of image blocks transmitted by the intellectual property core at a time, localmemory represents the capacity of the local memory of the intellectual property core, and size represents the size of the image blocks.

In one embodiment, each intellectual property core sequentially transfers a number of image blocks from the shared memory to the local memory according to a data reading order determined by the bus and a connection order of the image blocks, and reads image data from the local memory and performs calculation, including:

each intellectual property core divides its local memory equally into a first local memory and a second local memory;

transferring the number of image blocks from the shared memory to the first local memory, and calculating the image data in the first local memory;

transferring the number of image blocks from the shared memory to the second local memory while the image data in the first local memory is being calculated;

after the intellectual property core finishes calculating the image data in the first local memory, calculating the image data in the second local memory.

In one embodiment, after each intellectual property core reads image data from the local memory and performs calculation, the method further includes:

dynamically adjusting the number of image blocks that each intellectual property core transfers at a time according to the duration of each data transfer of that intellectual property core.

In one embodiment, dynamically adjusting the number of image blocks transmitted by each intellectual property core in a single time according to the data transmission duration of each intellectual property core comprises:

calculating the average data transmission duration of each intellectual property core at the current moment according to the data transmission durations and the number of data transmissions of that intellectual property core;

if the average data transmission duration of the intellectual property core at the current moment is greater than a preset duration threshold, reducing the number of image blocks that the intellectual property core transfers at a time by a preset value;

and if the average data transmission duration of the intellectual property core at the current moment is less than the preset duration threshold, increasing the number of image blocks that the intellectual property core transfers at a time by a preset value.

In a second aspect, an embodiment of the present disclosure provides a device for transmitting heterogeneous multi-core data, including:

the segmentation module is used for acquiring image data to be processed and segmenting the image data into image blocks with equal sizes;

the calculation module is used for calculating the number of the image blocks transmitted by each intellectual property core in a single time according to the size of the image blocks and the capacity of a local memory of each intellectual property core;

and the transmission module is used for each intellectual property core to sequentially transfer that number of image blocks from the shared memory to the local memory, in the data reading order determined by the bus and following the connection order of the image blocks, and to read the image data from the local memory and perform calculation, until processing of the image data is complete.

In one embodiment, the transmission module further includes a dynamic adjustment unit, configured to dynamically adjust the number of the image blocks transmitted by each intellectual property core for a single time according to the data transmission duration of each intellectual property core for each time.

In a third aspect, an embodiment of the present disclosure provides a heterogeneous multi-core data transmission device, which includes a processor and a memory storing program instructions, where the processor is configured to execute the heterogeneous multi-core data transmission method provided in the foregoing embodiment when executing the program instructions.

In a fourth aspect, the present disclosure provides a computer-readable medium, on which computer-readable instructions are stored, where the computer-readable instructions are executable by a processor to implement a method for transmitting heterogeneous multi-core data provided in the foregoing embodiments.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

According to the heterogeneous multi-core data transmission method provided by the embodiment of the disclosure, when an IP core reads image data, the data in the shared memory is transferred directly to the local memory, and the IP core then reads the data from the local memory. This removes one bus occupation per access, reduces contention among the cores for the bus, and lowers the data access delay. To further avoid occupying the bus for long periods, each IP core does not transfer the whole image from the shared memory at once but transfers image blocks sized to match its local memory, which reduces bus occupation further.

In summary, according to the transmission method of heterogeneous multi-core data in the embodiment, the time for which each intellectual property core occupies the bus can be reduced, the delay in each intellectual property core's data access is lowered, the computing power of each intellectual property core per unit time is maximized, and the computing performance of the chip is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flowchart illustrating a method for transmitting heterogeneous multi-core data according to an exemplary embodiment;

FIG. 2 is a schematic diagram illustrating the first step of a prior art IP core data access flow, in which data is transferred from the shared memory to the private memory, in accordance with an illustrative embodiment;

FIG. 3 is a schematic diagram illustrating the second step of a prior art IP core data access flow, in which the IP core reads the data from the private memory, in accordance with an illustrative embodiment;

FIG. 4 is a schematic diagram illustrating an IP core data access flow in the present application, in accordance with an exemplary embodiment;

FIG. 5 is a diagram illustrating a method for segmenting an image block, according to an exemplary embodiment;

FIG. 6 is a diagram illustrating a structure of a heterogeneous multi-core data transmission apparatus according to an example embodiment;

FIG. 7 is a block diagram illustrating a heterogeneous multi-core data transmission device according to an example embodiment;

FIG. 8 is a schematic diagram illustrating a computer storage medium in accordance with an exemplary embodiment.

Detailed Description

The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.

It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.

In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

When the different IP cores of a multi-core heterogeneous chip process data, the computing power of the chip can reach its theoretical design peak only if every IP core is computing at every moment and the bandwidth of the chip is not a bottleneck.

Suppose that the theoretical maximum computing power of a single IP core is Tops, the unit time is t, and the bus delay of each IP core is l. The actual computing power of a heterogeneous chip with m IP cores in unit time is then given by the following model formula:

According to the model, the smaller the bus delay per unit time, the closer the computing power of the heterogeneous chip comes to the theoretical peak. To exploit the computing power of the heterogeneous chip, the time for which a single IP core occupies the bus per unit time needs to be reduced, and the IP cores need to use the bus in a balanced way per unit time, matched to the computing power of each IP core and the data access capability of the bus. The application therefore provides a data transmission method that reduces the occupation of the bus by the IP cores.
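The model formula itself is not reproduced in this text. Purely as an illustration, and as an assumption rather than the formula of the application, one model consistent with the variables named above treats each IP core as computing only during the part of the unit time in which it is not waiting on the bus. Here $l_i$ (the bus delay of the $i$-th core) and $P_{\text{actual}}$ are hypothetical names:

```latex
% Assumed illustrative model, not the formula of the application
P_{\text{actual}} = \sum_{i=1}^{m} \text{Tops} \cdot \frac{t - l_i}{t}, \qquad 0 \le l_i \le t
```

Under this assumed form, $P_{\text{actual}}$ approaches the theoretical peak $m \cdot \text{Tops}$ as every $l_i$ approaches 0, which agrees with the statement that a smaller bus delay brings the chip closer to its peak.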

Fig. 1 is a schematic flow chart illustrating a transmission method of heterogeneous multi-core data according to an exemplary embodiment. Referring to Fig. 1, the method specifically includes the following steps:

S101, image data to be processed is obtained, and the image data is divided into image blocks of equal size.

At present, each IP core accesses image data one whole image at a time, so a single IP core occupies the bus for a long time. To reduce the IP core's occupation of the bus, the application provides an image-block data access mode: the whole image to be processed is divided into image blocks of equal size according to the processing capacity and the bandwidth access capacity of the IP core.

When the whole image is divided into image blocks, the length and width of an image block, and hence its size, are determined so that bandwidth occupation is reasonable and the IP core can exert its maximum computing power. For a given hardware computing unit, the division of the image follows these rules:

(1) Data should be transferred in square blocks wherever possible, so the image is preferably divided into square image blocks, for example 64x64 or 128x128. If this cannot be met, then, because of the characteristics of Direct Memory Access (DMA), the width of the image block should be greater than its length, which improves the transfer efficiency of the DMA.

(2) Within the constraints of the local memory size, the area of the image block should be as large as possible to reduce the number of DMA transfers.

(3) The length and width of each image block must be aligned to 8 bytes.
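The three rules can be sketched as a small search over candidate block sizes. This is a minimal illustration and not the method of the application: it uses a brute-force search rather than dynamic programming, the function name `choose_tile_size` and the parameters `bytes_per_pixel`, `max_dim` and `align` are hypothetical, and it assumes that the largest area takes priority with squareness used only as a tie-breaker (the text does not state how the rules are ranked).

```python
def choose_tile_size(local_memory_bytes, bytes_per_pixel=1, max_dim=512, align=8):
    """Return (width, height) of a tile that fits in local memory, or None.

    Assumed reading of the rules:
      rule 3: width and height are multiples of `align`
      rule 1: width >= height, and squarer tiles win ties
      rule 2: the tile area is as large as the local memory allows
    """
    best = None
    best_key = None
    for height in range(align, max_dim + 1, align):
        # Starting at `height` enforces width >= height.
        for width in range(height, max_dim + 1, align):
            area = width * height
            if area * bytes_per_pixel > local_memory_bytes:
                break  # wider tiles of this height only get larger
            # Larger area first, then smaller width/height difference.
            key = (area, -(width - height))
            if best_key is None or key > best_key:
                best, best_key = (width, height), key
    return best


print(choose_tile_size(16384))
print(choose_tile_size(20000))
```

With 16384 bytes and one byte per pixel, a 128x128 square fills the memory exactly; with 20000 bytes no square reaches the maximum area, so the search falls back to a block that is wider than it is long.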

Further, according to the above segmentation rule, a DP (dynamic programming) algorithm is used to calculate the size of the image block.

A dynamic programming algorithm is generally used to find the optimal solution of a problem: the problem is split up, and the problem states and the relations between states are defined so that the problem can be solved by recursion. The basic idea is similar to divide-and-conquer. The problem to be solved is decomposed into a number of sub-problems, which are solved in sequence, and the solution of each earlier sub-problem provides useful information for solving the later ones. When any sub-problem is solved, the possible local solutions are enumerated, those that may be optimal are retained through a decision, and the others are discarded. The sub-problems are solved in order, and the solution of the last sub-problem is the solution of the original problem.

In one possible implementation, an array r[0..N] is created:

    r[0] = 0;
    for j = 1 to N
        q = -∞;
        for i = 1 to j
            q = max(q, p[i] + r[j - i]);
        r[j] = q;
    return r[N];

Here i represents the length of the image block and j represents the width of the image block; dynamic programming returns the value that maximizes the area of the image block, which gives the optimal solution.
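The recurrence r[j] = max(p[i] + r[j - i]) can be written as runnable code as follows. This is only a sketch of the bottom-up evaluation: the text does not define the table p, so the values below are hypothetical placeholders chosen for illustration, and the function name `best_value` is likewise an assumption.

```python
def best_value(p, n):
    """Bottom-up evaluation of r[j] = max over 1 <= i <= j of p[i] + r[j - i].

    p[i] is the value assigned to a piece of size i (p[0] is unused).
    Returns r[n], the best total value for size n.
    """
    r = [0] * (n + 1)          # r[0] = 0
    for j in range(1, n + 1):  # solve sub-problems in increasing size
        q = float("-inf")
        for i in range(1, j + 1):
            q = max(q, p[i] + r[j - i])
        r[j] = q               # keep only the best local solution
    return r[n]


# Hypothetical value table, for illustration only.
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(best_value(p, 4))
print(best_value(p, 7))
```

Each r[j] reuses the already stored results r[0] to r[j - 1], which is the "solution of the earlier sub-problem informs the later one" property described above.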

In this step, the size of the image block is obtained through the dynamic programming algorithm and the preset segmentation rules, and the image data is divided into image blocks of equal size accordingly. Fig. 5 is a schematic diagram illustrating a method for dividing an image into image blocks according to an exemplary embodiment. As shown in Fig. 5, the whole image is divided into a plurality of image blocks of equal length and width, and the 20 image blocks, tile0 to tile19, are connected in order from top to bottom and from left to right.

Dividing the whole image into a plurality of image blocks prevents any one IP core from occupying the bus for a long time during data transfer and thereby increasing the data access delay of the other IP cores. Moreover, because the IP core computes on data one image block at a time, the local memory inside the IP core can be used effectively, which reduces the IP core's own data access delay.
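The division into equally sized, sequentially numbered image blocks can be sketched as follows. The function name `tile_grid` and the 640x512 image size are hypothetical, and the numbering is assumed to run left to right within a row, with rows taken from top to bottom; the text does not state the image dimensions.

```python
def tile_grid(image_width, image_height, tile_width, tile_height):
    """Return a list of (index, x, y) tuples, one per image block.

    x and y are the top-left pixel coordinates of each block. The image
    dimensions are assumed to be exact multiples of the block size.
    """
    if image_width % tile_width or image_height % tile_height:
        raise ValueError("image dimensions must be multiples of the tile size")
    tiles = []
    for y in range(0, image_height, tile_height):     # rows, top to bottom
        for x in range(0, image_width, tile_width):   # columns, left to right
            tiles.append((len(tiles), x, y))
    return tiles


tiles = tile_grid(640, 512, 128, 128)  # 5 columns x 4 rows
print(len(tiles), tiles[0], tiles[19])
```

With these assumed dimensions the grid has 20 blocks, matching tile0 to tile19 in the description.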

S102, calculating the number of the image blocks transmitted by each IP core once according to the size of the image blocks and the capacity of the local memory of each IP core.

In one embodiment, the size of the segmented image blocks is obtained in the step above, and the number of image blocks that each IP core transfers at a time is calculated from this size and the capacity of the local memory of each IP core according to the following formula:

wherein n represents the number of image blocks transmitted by the IP core at a time, localmemory represents the capacity of the local memory of the IP core, and size represents the size of the image blocks.

This step yields the optimal number of image blocks to access each time. Because the IP core transfers only n image blocks per transfer, the amount of data accessed each time is reduced, and so is the time for which a single IP core occupies the chip bus.
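The formula for n is not reproduced in this text. As a hedged illustration only, the sketch below assumes that n is the number of whole image blocks that fit in the local memory, that is, the integer quotient of localmemory and size. This is one reading consistent with the variable definitions and is not confirmed by the source; the function name `tiles_per_transfer` is hypothetical.

```python
def tiles_per_transfer(local_memory_bytes, tile_bytes):
    """Assumed form: whole image blocks that fit in local memory."""
    if tile_bytes <= 0:
        raise ValueError("tile size must be positive")
    return local_memory_bytes // tile_bytes


# Example: 80 KiB of local memory and 128x128 one-byte-per-pixel blocks.
print(tiles_per_transfer(80 * 1024, 128 * 128))
```

If the local memory is later split into two halves for overlapped transfer and computation, the same calculation would presumably apply to the capacity of one half; the text does not say which capacity the formula uses.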

S103, each IP core sequentially transmits the calculated number of image blocks from the shared memory to the local memory according to the data reading sequence determined by the bus and the connection sequence of the image blocks, and reads and calculates the image data from the local memory until the image data processing is finished.

After the number of image blocks that each IP core transfers at a time has been calculated, the IP cores start to transfer the image data. In a heterogeneous multi-core chip, data is shared between cores through a shared memory (Share RAM), from which the data to be processed is delivered to the corresponding IP core. In the prior art, an IP core needs two steps to process data from the shared memory. First, as shown in Fig. 2, the IP core must issue a read instruction to the bus to transfer the data in the shared memory into the private memory (Private RAM). Then, as shown in Fig. 3, the IP core issues another read instruction to the bus to read the data to be processed from the private memory, which completes one data access.

In the prior-art data access method, therefore, completing one calculation on shared-memory data requires occupying the bus twice. When multiple cores read and write data frequently, an IP core is idle and cannot compute while it reads data over the bus, and bus delay becomes the main bottleneck that limits the computing power of the IP core. It follows that, to exert the maximum computing power of the IP core, both the duration and the number of its bus occupations must be reduced. The present embodiment therefore proposes a data transfer method based on a Local Memory. As shown in Fig. 4, when data is read from the shared memory in heterogeneous multi-core communication, the IP core transfers the data in the shared memory directly to the local memory and then reads the data from the local memory. This removes one bus occupation, reduces contention among the cores for the bus, and lowers the data access delay.

In one exemplary scenario, each IP core sequentially transfers its image blocks from the shared memory to the local memory in the data reading order determined by the bus and following the connection order of the image blocks. For example, suppose an image is divided into 20 image blocks and each IP core transfers 5 image blocks at a time. The first IP core transfers the 5 blocks tile0-tile4 from the shared memory to its local memory, the second IP core transfers tile5-tile9, the third IP core transfers tile10-tile14, and the fourth IP core transfers tile15-tile19. The order in which the first, second, third and fourth IP cores read their data is determined through their interaction with the bus.
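The assignment in this example can be sketched as follows. The function name `assign_tiles` is hypothetical, and the sketch covers only which blocks each core fetches; the order in which the cores actually gain access to the bus is decided by bus arbitration and is not modeled here.

```python
def assign_tiles(num_tiles, tiles_per_core, num_cores):
    """Map each core index to the consecutive tile indices it transfers."""
    batches = {}
    for core in range(num_cores):
        start = core * tiles_per_core
        end = min(start + tiles_per_core, num_tiles)
        batches[core] = list(range(start, end))
    return batches


batches = assign_tiles(num_tiles=20, tiles_per_core=5, num_cores=4)
for core, tile_ids in batches.items():
    print(f"IP core {core + 1}: tile{tile_ids[0]}-tile{tile_ids[-1]}")
```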

Further, to fully exert the maximum computing power of each IP core, the local memory of each IP core is divided equally into a first local memory and a second local memory. When the IP core processes data, the image blocks to be transferred are first moved from the shared memory to the first local memory, and the IP core is triggered to calculate the image data in the first local memory. While the image data in the first local memory is being calculated, the IP core uses the DMA accelerator to transfer the next batch of image blocks from the shared memory to the second local memory, and once it has finished calculating the image data in the first local memory it calculates the image data stored in the second local memory. In this way the computing performance of the IP core is fully used and data processing efficiency is improved.
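The overlap of transfer and computation between the two halves of the local memory can be sketched in software as follows. This is an analogy, not the hardware mechanism: a single background worker thread stands in for the DMA accelerator, and the function name `process_with_double_buffer` and the callables `load` and `compute` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_double_buffer(batches, load, compute):
    """Compute on one buffer while the next batch is loaded into the other.

    load(batch) stands in for the DMA transfer from shared memory to a
    local buffer; compute(data) stands in for the IP core's calculation.
    At most two batches are held at once: the one being computed and the
    one being loaded.
    """
    results = []
    if not batches:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(load, batches[0])      # fill the first buffer
        for next_batch in batches[1:]:
            current = pending.result()              # wait for the filled buffer
            pending = dma.submit(load, next_batch)  # start filling the other one
            results.append(compute(current))        # compute overlaps the transfer
        results.append(compute(pending.result()))   # last buffer
    return results


out = process_with_double_buffer(
    [[0, 1], [2, 3], [4]],
    load=lambda batch: [x * 10 for x in batch],
    compute=sum,
)
print(out)
```

The design point is that the core never waits for a transfer unless the transfer takes longer than the computation on the previous buffer.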

In one possible implementation, when multiple IP cores work together, the number of image blocks accessed by each IP core needs to be adjusted dynamically; only then can the bus occupancy of the IP cores reach a dynamic balance.

Specifically, dynamically adjusting the number of image blocks that each IP core transfers at a time according to the duration of each of its data transfers includes the following. First, the average data transmission duration of each IP core at the current moment is calculated from its data transmission durations and its number of data transmissions. For example, the average data transmission duration at the ninth transfer = (the average data transmission duration of the previous 8 transfers + the data transmission duration of the ninth transfer) / 2.

Further, if the average value of the data transmission durations of the IP core at the current time is greater than the preset duration threshold, it indicates that the bandwidth of the current chip is in a state of serious shortage, and it is necessary to reduce the occupied time of the bus of the IP core, at this time, the number of the carried image blocks can be appropriately reduced, so as to achieve the balance of the bandwidths of the IP cores, and the number of the image blocks for single transmission of the IP core is reduced by taking the number n calculated in step S102 as a reference, so as to reduce the preset value. Wherein, the reduction amount can be set by the technicians in the field according to the actual condition of the hardware.

If the average data transmission duration of the IP core at the current time is less than the preset duration threshold, the number of image blocks in a single transfer of the IP core is increased by a preset value. The amount of the increase can be set by those skilled in the art according to the actual hardware.

In an exemplary scenario, the preset duration threshold is 200 clock cycles. If the average transmission duration of the IP core is greater than 200 clock cycles, the number of image blocks per transfer may be decreased appropriately; if it is less than 200 clock cycles, the number may be increased appropriately. The number n of image blocks per transfer calculated in the above steps is taken as the reference, giving n ± Δn. Keeping the carried quantity dynamically adjusted in this way allows the bandwidth to be used in a balanced manner across the heterogeneous cores. The preset duration threshold can be set by those skilled in the art according to the actual hardware.

Optionally, the preset duration threshold in this embodiment may also be a preset duration range, for example 180 to 210 clock cycles. If the average transmission duration of the IP core is above the preset duration range, the number of image blocks per transfer may be reduced appropriately; if it is below the preset duration range, the number may be increased appropriately.
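The adjustment rule described above, covering both the single threshold and the duration range, might be sketched as follows. The function name `adjust_block_count`, its parameters, and the lower clamp of one block per transfer are assumptions made for illustration; the text leaves the step size Δn and the threshold values to the implementer.

```python
def adjust_block_count(n_ref: int, avg_cycles: float,
                       low: float, high: float, delta: int = 1) -> int:
    """Return the number of image blocks for the next single transfer.

    n_ref      reference count n (from the capacity-based calculation)
    avg_cycles current average transmission duration, in clock cycles
    low, high  bounds of the preset duration range; pass low == high
               to model a single threshold such as 200 clock cycles
    delta      preset step (delta n), chosen per hardware
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if avg_cycles > high:
        # Bus is congested: carry fewer blocks per transfer.
        # Assumption: never drop below one block.
        return max(1, n_ref - delta)
    if avg_cycles < low:
        # Bus has headroom: carry more blocks per transfer.
        return n_ref + delta
    return n_ref  # within the range (or exactly at the threshold)
```

With a single threshold of 200 clock cycles, `adjust_block_count(8, 250, 200, 200)` yields 7 and `adjust_block_count(8, 150, 200, 200)` yields 9; with the range 180 to 210, an average of 195 leaves the count at 8.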

Optionally, the local memory in the IP core in the present application may be an SRAM (Static Random-Access Memory), which has the advantage of being faster than other types of RAM (Random Access Memory), such as dynamic RAM.

According to the transmission method of heterogeneous multi-core data provided by the embodiment of the application, each IP core transfers data directly from the shared memory to its local memory, which reduces the number of bus occupations so that the bus needs to be occupied only once. This reduces contention among the cores for the bus and lowers data access latency. Transmitting image blocks in quantities matched to the capacity of the local memory further reduces bus occupation and greatly improves the data transmission efficiency and computing performance of the chip.

An embodiment of the present disclosure further provides a device for transmitting heterogeneous multi-core data, which is configured to execute the method for transmitting heterogeneous multi-core data in the foregoing embodiment. As shown in Fig. 6, the device includes:

the segmentation module 601 is configured to obtain image data to be processed, and segment the image data into image blocks with equal sizes;

a calculating module 602, configured to calculate, according to the size of the image blocks and the capacity of the local memory of each IP core, the number of image blocks transmitted by each IP core in a single transfer;

and a transmission module 603, configured so that each IP core, following the data reading order determined by the bus and the connection order of the image blocks, transfers the calculated number of image blocks at a time from the shared memory to its local memory, and each IP core reads the image data from the local memory and performs computation until processing of the image data is complete.

In one embodiment, the transmission module further includes a dynamic adjustment unit, configured to dynamically adjust the number of image blocks transmitted by each IP core in a single transfer according to the duration of each data transmission of that IP core.
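The roles of the segmentation module and the calculating module can be illustrated with a small sketch. It rests on several assumptions that the text above does not confirm: blocks are identified by their top-left pixel coordinates, the connection order is taken to be row-major, the image dimensions are exact multiples of the block dimensions, and the per-transfer count is the largest whole number of blocks that fits in the local memory. The function names are hypothetical.

```python
from typing import List, Tuple


def split_into_blocks(width: int, height: int,
                      block_w: int, block_h: int) -> List[Tuple[int, int]]:
    """Return top-left (x, y) coordinates of equal-sized blocks, row by row."""
    if block_w <= 0 or block_h <= 0:
        raise ValueError("block dimensions must be positive")
    if width % block_w or height % block_h:
        # Assumption: equal-sized blocks require exact divisibility.
        raise ValueError("image size must be a multiple of the block size")
    return [(x, y)
            for y in range(0, height, block_h)
            for x in range(0, width, block_w)]


def blocks_per_transfer(block_bytes: int, local_capacity_bytes: int) -> int:
    """Largest number of whole blocks that fits in the local memory.

    Assumption: n = floor(capacity / block size).
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    n = local_capacity_bytes // block_bytes
    if n < 1:
        raise ValueError("local memory cannot hold a single block")
    return n
```

For an 8x4 image and 4x2 blocks, `split_into_blocks(8, 4, 4, 2)` returns four block origins, and `blocks_per_transfer(1024, 4096)` returns 4.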

It should be noted that, when the transmission device for heterogeneous multi-core data provided in the foregoing embodiment executes the transmission method for heterogeneous multi-core data, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the transmission device for heterogeneous multi-core data and the embodiment of the transmission method for heterogeneous multi-core data provided in the foregoing embodiments belong to the same concept; details of the implementation process are given in the method embodiments and are not repeated here.

The embodiment of the present disclosure further provides an electronic device corresponding to the method for transmitting heterogeneous multi-core data provided in the foregoing embodiment, so as to execute the method for transmitting heterogeneous multi-core data.

Referring to Fig. 7, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in Fig. 7, the electronic device includes a processor 700, a memory 701, a bus 702 and a communication interface 703, where the processor 700, the communication interface 703 and the memory 701 are connected through the bus 702. The memory 701 stores a computer program that can run on the processor 700, and when the processor 700 runs the computer program, it executes the method for transmitting heterogeneous multi-core data provided by any of the foregoing embodiments.

The memory 701 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of this system and at least one other network element is established through at least one communication interface 703 (wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.

Bus 702 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 701 is configured to store a program, and the processor 700 executes the program after receiving an execution instruction, where the method for transmitting heterogeneous multi-core data disclosed in any embodiment of the present application may be applied to the processor 700, or implemented by the processor 700.

The processor 700 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 700 or by instructions in the form of software. The processor 700 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or registers. The storage medium is located in the memory 701; the processor 700 reads the information in the memory 701 and completes the steps of the method in combination with its hardware.

The electronic device provided by the embodiment of the application is based on the same inventive concept as the transmission method of heterogeneous multi-core data provided by the embodiment of the application, and has the same beneficial effects as the method it adopts, runs, or implements.

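An embodiment of the present application further provides a computer-readable storage medium corresponding to the method for transmitting heterogeneous multi-core data provided in the foregoing embodiments. Referring to Fig. 8, the computer-readable storage medium shown is an optical disc 800 on which a computer program (i.e., a program product) is stored; when the computer program is run by a processor, it executes the method for transmitting heterogeneous multi-core data provided in any of the foregoing embodiments.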

It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.

The computer-readable storage medium provided by the above embodiments of the present application and the transmission method of heterogeneous multi-core data provided by the embodiments of the present application have the same inventive concept, and have the same beneficial effects as methods adopted, run, or implemented by application programs stored in the computer-readable storage medium.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of the technical features that involves no contradiction should be considered within the scope of this specification.

The above examples show only some embodiments of the present invention. Although they are described in specific detail, they should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
