Multi-core parallel hardware coding method and device suitable for JPEG

Document No.: 912764 Publication date: 2021-02-26

Reading note: this technology, "Multi-core parallel hardware coding method and device suitable for JPEG" (适用于JPEG的多核并行硬件编码方法和装置), was designed by 雷理, 韦虎, 张云 and 刘守浩 on 2020-10-26. Abstract: The invention discloses a multi-core parallel hardware encoding method and device suitable for JPEG and relates to the technical field of image coding. The method comprises the following steps: acquiring an image to be encoded and dividing it into a plurality of MCU rows; and performing multi-threaded parallel JPEG encoding with an encoder. The encoder comprises a plurality of JPEG encoding cores, which simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its row, each core inserts its own restart identifier to break the data dependency. After the whole image is encoded, the code streams of the groups of MCU rows are reassembled in order. Because the cores compute in parallel over MCU rows, the invention significantly increases JPEG encoding speed without reducing the compression ratio and without extra memory overhead.

1. A multi-core parallel hardware coding method suitable for JPEG is characterized by comprising the following steps:

acquiring an image to be coded, and dividing the image to be coded into a plurality of MCU rows, wherein each MCU row comprises a plurality of MCU blocks;

performing multi-threaded parallel JPEG encoding with an encoder, wherein the encoder comprises a plurality of JPEG encoding cores; the JPEG encoding cores simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its MCU row, each core inserts its own restart identifier to break the data dependency; and each JPEG encoding core corresponds to the code stream of one group of MCU rows;

and after the whole image is encoded, reassembling the code streams of the groups of MCU rows in order.

2. The method of claim 1, wherein: one restart identifier is inserted at an interval of one MCU row.

3. The method according to claim 1 or 2, characterized in that: during encoding, the restart identifiers are used cyclically to block the propagation of DC coefficient values; the restart identifier insertion interval is marked by 0xFFDD, and the two bytes of data following 0xFFDD indicate how many MCU blocks lie between successive restart identifiers;

the restart identifier is marked by 0xFFD0-0xFFD7; on each insertion it increments from 0xFFD0 to 0xFFD7 and then returns to 0xFFD0, cycling in this order.

4. The method of claim 3, wherein: an independent output code stream storage area is allocated for each JPEG encoding core, so that the code streams of the MCU rows processed by each JPEG encoding core are stored separately, in segments.

5. The method of claim 3, wherein: an X-byte storage space is preset in a DRAM to store the code streams of each group of MCU rows;

when a plurality of JPEG encoding cores encode a plurality of vertically adjacent MCU rows in parallel, the code streams of the rows are filled into the DRAM alternately, X bytes at a time, so that the code stream data of the vertically adjacent rows is confined to a storage interval of preset size;

and during reassembly, after the complete multi-row code streams have been read, spliced, and written back to the DRAM, the storage space occupied by the reassembled multi-row code streams is reclaimed and used to store the final code stream.

6. The method of claim 1, wherein: the encoder comprises 4 JPEG encoding cores, namely Core0, Core1, Core2 and Core3, wherein Core0 is responsible for encoding the 4N-th MCU row, Core1 for the (4N+1)-th MCU row, Core2 for the (4N+2)-th MCU row, and Core3 for the (4N+3)-th MCU row, and N is an integer greater than or equal to 0.

7. The method of claim 6, wherein: after each MCU row is encoded, the start address and the code stream length of that MCU row's code stream are stored, and the code streams of the 4 groups of MCU rows are reassembled as follows:

acquiring the code stream start address and code stream length of each MCU row;

reading the corresponding 4 MCU row code streams in order according to this information;

and splicing the MCU row code streams in order and writing them to the DRAM.

8. The method of claim 7, wherein: Stream_st_addr is allocated as the start address of the final code stream, and an initial row offset distance is set as line_offset_gap;

with Core_st_addr = Stream_st_addr + line_offset_gap, the sub-code streams of the 1st to 4th MCU rows are filled in at Core_st_addr; during reassembly, the sub-code streams of the 1st to 4th MCU rows are read, spliced, and written to Stream_st_addr, after which the space at Core_st_addr occupied by the reassembled sub-code streams of the 1st to 4th MCU rows is reclaimed for storing the final code stream; subsequently, when the 5th to 8th MCU rows are reassembled, their sub-code streams are read and spliced, and the reclaimed sub-code stream space of the 1st to 4th MCU rows is used when writing the final code stream at Stream_st_addr; this proceeds cyclically until the code streams of all MCU rows are reassembled.

9. A multi-core parallel hardware encoding apparatus suitable for JPEG, characterized by comprising:

the data dividing module is used for acquiring an image to be coded and dividing the image to be coded into a plurality of MCU rows, and each MCU row comprises a plurality of MCU blocks;

the encoding module is used for performing multi-threaded parallel JPEG encoding with an encoder, wherein the encoder comprises a plurality of JPEG encoding cores; the JPEG encoding cores simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its MCU row, each core inserts its own restart identifier to break the data dependency; and each JPEG encoding core corresponds to the code stream of one group of MCU rows;

and the reassembly module is used for splicing the code streams in order and writing them to the DRAM to complete the reassembly of the MCU row code streams.

10. The apparatus of claim 9, wherein: an X-byte storage space is preset in a DRAM to store the code streams of each group of MCU rows;

the encoding module is configured so that, when a plurality of JPEG encoding cores encode a plurality of vertically adjacent MCU rows in parallel, the code streams of the rows are filled into the DRAM alternately, X bytes at a time, so that the code stream data of the vertically adjacent rows is confined to a storage interval of preset size; and the reassembly module is configured so that, during reassembly, after the complete multi-row code streams have been read, spliced, and written back to the DRAM, the storage space occupied by the reassembled multi-row code streams is reclaimed and used to store the final code stream.

Technical Field

The invention relates to the technical field of image coding, in particular to a multi-core parallel hardware coding method and device suitable for JPEG.

Background

As industrial machine vision gradually replaces manual inspection, the required resolution and capture frame rate of the acquired images keep rising. For machine vision to replace manual inspection completely, a machine must be able to compress and store high-definition images quickly; in fields such as security monitoring and aerial photography in particular, images must be compressed essentially in real time, which poses new challenges for fast high-definition image compression.

JPEG, as a general-purpose international image compression standard, provides good compression performance and reconstruction quality and is widely used in image and video processing. In the era of high-definition photography, compressing pictures of tens of millions of pixels in software incurs high CPU (central processing unit) load and power consumption; in particular, software encoding is too slow for application scenarios such as rapid continuous shooting of high-definition images and multi-channel capture. Dedicated hardware accelerators are therefore used for image compression in many fields, such as industrial cameras. JPEG hardware encoders mostly adopt a pipelined design with the MCU (Minimum Coded Unit) as the pipeline unit; the main pipeline stages are shown in Fig. 1. Fig. 1 comprises three stages. Stage 1, Src Fetch: prefetch the image source data from memory and apply some simple conversions. Stage 2, DCT and QT: apply the DCT and quantization to each 8x8 block in the MCU. Stage 3, Entropy Enc: entropy-code the quantized DC and AC values separately.

The conventional JPEG encoding flow is shown in Fig. 2. It mainly comprises the following steps:

1) The image is prefetched from memory, and some simple operations such as zero-level offset and rotation (0/90/180/270 degrees) are performed before the data is stored in the pipeline cache.

2) A DCT (discrete cosine transform) is performed on each 8x8 pixel block in the MCU to obtain the direct current (DC) and alternating current (AC) coefficients, removing spatial redundancy from the image.

3) The DC and AC coefficients are quantized separately. The DC and AC quantization matrices, designed according to the characteristics of human vision, quantize low frequencies finely and high frequencies coarsely, reducing visual redundancy.

4) The DC coefficient is differentially coded (DPCM) and then entropy coded (Huffman); the AC coefficients are zig-zag scanned, run-length coded (Run-Level), and then entropy coded, reducing data redundancy.
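As an illustration of the AC path in step 4), the following minimal Python sketch generates the zig-zag scan order of a block and turns the scanned AC coefficients into (zero run, value) pairs. The function names and the "EOB" end-of-block token are illustrative assumptions, not taken from the source; Huffman coding and the special handling of zero runs longer than 15 in real JPEG are omitted.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Cells are grouped by anti-diagonal (r + c). Odd diagonals are walked
    # top-to-bottom (increasing row), even diagonals bottom-to-top.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_ac(block):
    """Encode the AC coefficients of a quantized block as (zero_run, value) pairs."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))][1:]  # skip the DC term
    pairs, run = [], 0
    for value in scan:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run:  # trailing zeros collapse into a single end-of-block token
        pairs.append("EOB")
    return pairs


# Example: a sparse quantized block with two non-zero AC coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, 3, -2
pairs = run_length_ac(block)
```

Because most high-frequency coefficients quantize to zero, the scan usually ends in a long zero run, which is why the run-length representation is compact.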

In the JPEG encoding flow above, the DC coefficient reflects the DC component of a DCT unit (8x8 pixel block), so its value is usually large, and the DC coefficients of two adjacent DCT units are usually strongly correlated. Therefore, when the DC coefficient is differentially coded in step 4) above, what is losslessly coded is the difference between the DC coefficient of the current 8x8 block and that of the previous 8x8 block, shown as Diff_i (difference) in Fig. 3.
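The differential coding of the DC coefficient can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, the first block is assumed to be predicted from 0, and the entropy coding of the differences is left out.

```python
from itertools import accumulate


def dc_differences(dc_values):
    """Return Diff_i = DC_i - DC_(i-1), with the first block predicted from 0."""
    diffs, previous = [], 0
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc  # the next block depends on this one
    return diffs


def dc_restore(diffs):
    """Invert the differential coding with a running sum."""
    return list(accumulate(diffs))


dc = [100, 104, 101, 101]
diffs = dc_differences(dc)
restored = dc_restore(diffs)
```

Each difference depends on the preceding block's DC value, so a block cannot be coded until its predecessor's DC value is known.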

In this encoding process, DPCM coding of the DC coefficient creates an explicit data dependency between MCUs, which makes a parallel algorithm difficult to implement, because parallel computation requires that the data of each MCU be independent.

For the JPEG parallel encoding problem, the prior art also offers multi-core solutions. For example, Chinese patent application CN201910032350.9 discloses a JPEG encoding method for black-and-white images based on an NVIDIA GPU: following the JPEG encoding principle, black-and-white data encoding is ported to the NVIDIA GPU CUDA library, and acceleration is obtained from the GPU's high speed and high parallelism in combination with that library. For a multi-core CPU, software on the PC side can assign the encoding algorithm to one core or to several cores to increase encoding speed. However, this multi-core parallel encoding scheme requires a dedicated NVIDIA GPU CUDA library as the porting target and is suitable only for black-and-white data, so its range of application is narrow.

How to provide a multi-core extensible JPEG image coding method without extra memory overhead and with wide application range is a technical problem which needs to be solved urgently at present.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a multi-core parallel hardware encoding method and device suitable for JPEG. In the encoding scheme provided by the invention, multiple cores compute in parallel over JPEG MCU rows; the number of cores is scalable, the encoding of a JPEG image is greatly accelerated to meet real-time compression requirements, the compression ratio is not reduced, and no extra memory overhead is incurred.

In order to achieve the above object, the present invention provides the following technical solutions:

a multi-core parallel hardware coding method suitable for JPEG comprises the following steps:

acquiring an image to be coded, and dividing the image to be coded into a plurality of MCU rows, wherein each MCU row comprises a plurality of MCU blocks;

performing multi-threaded parallel JPEG encoding with an encoder, wherein the encoder comprises a plurality of JPEG encoding cores; the JPEG encoding cores simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its MCU row, each core inserts its own restart identifier to break the data dependency; and each JPEG encoding core corresponds to the code stream of one group of MCU rows;

and after the whole image is encoded, reassembling the code streams of the groups of MCU rows in order.

Further, one restart identifier is inserted at an interval of one MCU row.

Further, during encoding, the restart identifiers are used cyclically to block the propagation of DC coefficient values; the restart identifier insertion interval is marked by 0xFFDD, and the two bytes of data following 0xFFDD indicate how many MCU blocks lie between successive restart identifiers;

the restart identifier is marked by 0xFFD0-0xFFD7; on each insertion it increments from 0xFFD0 to 0xFFD7 and then returns to 0xFFD0, cycling in this order.

Optionally, an independent output code stream storage area is allocated to each JPEG encoding core, so that the code streams of the MCU rows processed by each JPEG encoding core are stored separately, in segments.

Alternatively, an X-byte storage space is preset in the DRAM to store the code streams of each group of MCU rows;

when a plurality of JPEG encoding cores encode a plurality of vertically adjacent MCU rows in parallel, the code streams of the rows are filled into the DRAM alternately, X bytes at a time, so that the code stream data of the vertically adjacent rows is confined to a storage interval of preset size;

and during reassembly, after the complete multi-row code streams have been read, spliced, and written back to the DRAM, the storage space occupied by the reassembled multi-row code streams is reclaimed and used to store the final code stream.

Further, the encoder comprises 4 JPEG encoding cores, namely Core0, Core1, Core2 and Core3, wherein Core0 is responsible for encoding the 4N-th MCU row, Core1 for the (4N+1)-th MCU row, Core2 for the (4N+2)-th MCU row, and Core3 for the (4N+3)-th MCU row, where N is an integer greater than or equal to 0.

Further, after each MCU row is encoded, the start address and the code stream length of that MCU row's code stream are stored, and the code streams of the 4 groups of MCU rows are reassembled as follows:

acquiring the code stream start address and code stream length of each MCU row;

reading the corresponding 4 MCU row code streams in order according to this information;

and splicing the MCU row code streams in order and writing them to the DRAM.

Further, Stream_st_addr is allocated as the start address of the final code stream, and an initial row offset distance is set as line_offset_gap;

with Core_st_addr = Stream_st_addr + line_offset_gap, the sub-code streams of the 1st to 4th MCU rows are filled in at Core_st_addr; during reassembly, the sub-code streams of the 1st to 4th MCU rows are read, spliced, and written to Stream_st_addr, after which the space at Core_st_addr occupied by the reassembled sub-code streams of the 1st to 4th MCU rows is reclaimed for storing the final code stream; subsequently, when the 5th to 8th MCU rows are reassembled, their sub-code streams are read and spliced, and the reclaimed sub-code stream space of the 1st to 4th MCU rows is used when writing the final code stream at Stream_st_addr; this proceeds cyclically until the code streams of all MCU rows are reassembled.

The invention also provides a multi-core parallel hardware coding device suitable for JPEG, which comprises the following structures:

the data dividing module is used for acquiring an image to be coded and dividing the image to be coded into a plurality of MCU rows, and each MCU row comprises a plurality of MCU blocks;

the encoding module is used for performing multi-threaded parallel JPEG encoding with an encoder, wherein the encoder comprises a plurality of JPEG encoding cores; the JPEG encoding cores simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its MCU row, each core inserts its own restart identifier to break the data dependency; and each JPEG encoding core corresponds to the code stream of one group of MCU rows;

and the reassembly module is used for splicing the code streams in order and writing them to the DRAM to complete the reassembly of the MCU row code streams.

Furthermore, an X-byte storage space is preset in the DRAM to store the code streams of each group of MCU rows;

the encoding module is configured so that, when a plurality of JPEG encoding cores encode a plurality of vertically adjacent MCU rows in parallel, the code streams of the rows are filled into the DRAM alternately, X bytes at a time, so that the code stream data of the vertically adjacent rows is confined to a storage interval of preset size;

and the reassembly module is configured so that, during reassembly, after the complete multi-row code streams have been read, spliced, and written back to the DRAM, the storage space occupied by the reassembled multi-row code streams is reclaimed and used to store the final code stream.

Owing to the above technical solutions, the invention has, by way of example, the following advantages and positive effects over the prior art: in the encoding scheme provided by the invention, multiple cores compute in parallel over JPEG MCU rows; the number of cores is scalable, the encoding of a JPEG image is greatly accelerated to meet real-time compression requirements, the compression ratio is not reduced, and no extra memory overhead is incurred.

On the one hand, the invention borrows restart identifiers to realize multi-core parallel encoding based on MCU rows; inserting one RST marker (a two-byte code in the range 0xFFD0-0xFFD7) per row has little effect on the compression ratio. Moreover, the number of cores can be adjusted freely according to the image size, with more cores configured as needed, so the scheme scales well and is widely applicable.

On the other hand, the invention also provides a method for dynamic reclamation and cyclic reuse during code stream reassembly, which significantly reduces the extra memory overhead brought by multi-core parallel encoding.

Drawings

FIG. 1 is a schematic diagram of a pipeline design of a JPEG hardware encoder in the prior art.

Fig. 2 is a flowchart of JPEG encoding in the prior art.

FIG. 3 shows the difference Diff_i of the DC coefficients in the prior art.

FIG. 4 is a diagram of an example of inserting RST in quad-core parallel code according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a storage and reassembly process of an output code stream of each coding core according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of the operation of code stream recombination dynamic recovery and cyclic utilization provided in the embodiment of the present invention.

Detailed Description

The multi-core parallel hardware coding method and device suitable for JPEG disclosed by the invention are further described in detail in the following with the accompanying drawings and specific embodiments. It should be noted that technical features or combinations of technical features described in the following embodiments should not be considered as being isolated, and they may be combined with each other to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in the respective drawings denote the same features or components, and may be applied to different embodiments. Thus, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

It should be noted that the structures, proportions, sizes, and other dimensions shown in the drawings and described in the specification are only for the purpose of understanding and reading the present disclosure, and are not intended to limit the scope of the invention, which is defined by the claims, and any modifications of the structures, changes in the proportions and adjustments of the sizes and other dimensions, should be construed as falling within the scope of the invention unless the function and objectives of the invention are affected. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that described or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

Examples

To implement parallel computation, the data of each MCU must be independent. The JPEG standard defines an optional "ReStart Marker" (RST) mechanism that allows resynchronization when part of the compressed stream is corrupted in transmission. Since most JPEG images are now transmitted over error-free channels, the RST (also called the restart identifier) is rarely used in normal encoding. After a restart identifier is inserted, the DC prediction is reset, so the DC value of the first pixel block after the identifier is coded relative to 0.

Based on this principle, the invention borrows restart identifiers to provide a multi-core parallel encoding scheme based on MCU rows. Once a restart identifier is inserted after an MCU, the DC prediction for the first pixel block that follows is reset to 0, which breaks the data dependency of DPCM coding on the DC coefficient and allows multi-threaded parallel JPEG encoding. Since restart identifiers can be inserted at any interval of MCUs, the encoder can divide a compression task into N threads processed in parallel and finally splice the compressed stream data into the final compressed stream.
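The effect of the reset can be illustrated with a small Python sketch. It assumes one restart interval per MCU row and models only the DC differences, not the full encoder; the function names and the use of a thread pool in place of hardware cores are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def dpcm_row(dc_row):
    """DPCM over one MCU row, with the predictor reset to 0 at the row start."""
    previous, diffs = 0, []
    for dc in dc_row:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def encode_serial(rows):
    """Reference result: rows processed one after another."""
    return [dpcm_row(row) for row in rows]


def encode_parallel(rows, workers=4):
    """Rows processed by independent workers; no row needs another row's DC value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dpcm_row, rows))


rows = [[10, 12, 11], [50, 49, 49], [7, 7, 9]]
serial = encode_serial(rows)
parallel = encode_parallel(rows)
```

Since every row starts from a predictor of 0, the parallel result matches the serial one regardless of the order in which the workers finish.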

The invention provides a multi-core parallel hardware coding method suitable for JPEG, which comprises the following steps:

S100, dividing the image to be encoded into a plurality of MCU rows, wherein each MCU row comprises a plurality of MCU blocks.

S200, performing multi-threaded parallel JPEG encoding with an encoder. The encoder comprises a plurality of JPEG encoding cores, which simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its MCU row, each core inserts its own restart identifier to break the data dependency; each JPEG encoding core corresponds to the code stream of one group of MCU rows.

S300, after the whole image is encoded, reassembling the code streams of the groups of MCU rows in order.

In this embodiment, the restart identifiers may be used cyclically during encoding to block the propagation of DC coefficient values. According to the JPEG protocol, the restart identifier insertion interval is marked by 0xFFDD, and the two bytes of data following 0xFFDD indicate how many MCU blocks lie between successive restart identifiers. The restart identifier itself takes one of 8 codes, 0xFFD0 to 0xFFD7; on each insertion the code increments from 0xFFD0 to 0xFFD7 and then returns to 0xFFD0, cycling in this order, as shown in the following table.

Restart identifier    Code
RST0                  0xFFD0
RST1                  0xFFD1
RST2                  0xFFD2
RST3                  0xFFD3
RST4                  0xFFD4
RST5                  0xFFD5
RST6                  0xFFD6
RST7                  0xFFD7
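The marker bytes described above can be sketched in Python as follows. The function names are hypothetical. One detail goes beyond the text: in the JPEG standard (ITU-T T.81) the segment introduced by 0xFFDD carries a two-byte length field (value 4) ahead of the two-byte interval, and the sketch includes it. The interval of 80 MCUs is an assumed example, corresponding to one MCU row of a 1280-pixel-wide image with 16x16 MCUs.

```python
import struct


def dri_segment(restart_interval):
    """Build the restart-interval segment: 0xFFDD, length 4, interval in MCUs."""
    return struct.pack(">HHH", 0xFFDD, 4, restart_interval)


def rst_marker(index):
    """Return the two-byte restart marker for the index-th insertion (cycles mod 8)."""
    return bytes([0xFF, 0xD0 + index % 8])


segment = dri_segment(80)
markers = [rst_marker(i).hex() for i in range(10)]
```

The modulo-8 counter is what lets a decoder notice a lost or duplicated restart interval, because consecutive markers must appear in order.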

Although in theory an RST could be inserted after every MCU so that every MCU could be computed in parallel, inserting 0xFFD0-0xFFD7 that frequently would reduce the compression ratio. In this embodiment, to preserve the compression ratio and reduce the complexity of multi-core scheduling, one RST is preferably inserted per MCU row.

The present embodiment will be described in detail with reference to fig. 4 to 6, taking a four-core parallel encoding scheme in which 4 JPEG encoding cores are provided as an example.

Referring to Fig. 4, the encoder includes 4 JPEG encoding cores: Core0, Core1, Core2, and Core3. Under the parallel scheme, the 4 JPEG encoding cores simultaneously start encoding their own MCU rows independently: Core0 encodes MCU row 4N, Core1 encodes MCU row 4N+1, Core2 encodes MCU row 4N+2, and Core3 encodes MCU row 4N+3, where N is an integer greater than or equal to 0.

After the last MCU block of its MCU row is compressed, each core inserts its own RST marker. The RST markers at the row ends are assigned by row order, from 0xFFD0 to 0xFFD7 and then back to 0xFFD0, cycling in sequence. During encoding, each encoding core outputs the compressed code stream of the row it is processing.
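The row-to-core assignment and the marker chosen at each row end can be tabulated with a short Python sketch. The function names are hypothetical, and the sketch assumes rows are numbered from 0 and that the marker index simply follows the row number modulo 8.

```python
def row_schedule(num_rows, num_cores=4):
    """Return (row, core, marker_code) for every MCU row of the image."""
    return [(row, row % num_cores, 0xFFD0 + row % 8) for row in range(num_rows)]


def rows_of_core(core, num_rows, num_cores=4):
    """List the MCU rows handled by one core (rows core, core + 4, core + 8, ...)."""
    return [row for row, owner, _ in row_schedule(num_rows, num_cores) if owner == core]


schedule = row_schedule(10)
core2_rows = rows_of_core(2, 10)
```

Because the marker depends only on the row number, each core can pick the correct marker for its row without consulting the other cores.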

In one implementation of this embodiment, an independent output code stream storage area is allocated to each JPEG encoding core, so that the code streams of the MCU rows processed by each core are stored separately, in segments. The code stream of the whole image is therefore stored across 4 spaces. Referring to Fig. 5, the DRAM is divided into at least 5 spaces: 4 store the segmented sub-code streams produced by Core0, Core1, Core2, and Core3, and 1 stores the final code stream. The symbols Core0_st_addr, Core1_st_addr, Core2_st_addr, and Core3_st_addr denote the start addresses of the respective segmented sub-code streams, and Stream_st_addr denotes the start address of the final code stream.

After the whole image is coded, the code streams of 4 groups of MCU lines need to be recombined (i.e. Reorder).

In this embodiment, after each MCU row is encoded, the start address and length of that row's code stream may be stored so that the code streams can be spliced correctly during reassembly. With reference to Fig. 5, the code streams of the 4 groups of MCU rows are reassembled as follows:

acquiring the code stream start address and code stream length of each MCU row;

reading the corresponding 4 MCU row code streams in order according to this information;

and splicing the MCU row code streams in order and writing them to the DRAM.
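The reassembly steps above can be sketched in Python with a byte buffer standing in for the DRAM. The function name, the record layout (one (start address, length) pair per MCU row, in row order), and the sample data are illustrative assumptions.

```python
def reassemble(dram, row_records):
    """Splice MCU row code streams in row order.

    row_records[i] is the (start_address, length) pair stored for MCU row i.
    """
    final_stream = bytearray()
    for start, length in row_records:
        final_stream += dram[start:start + length]
    return bytes(final_stream)


# Four per-core regions laid out back to back: Core0 holds rows 0 and 4.
dram = b"AAEE" + b"BBB" + b"CC" + b"D"
records = [(0, 2), (4, 3), (7, 2), (9, 1), (2, 2)]  # rows 0, 1, 2, 3, 4
final_stream = reassemble(dram, records)
```

Storing the length alongside the address matters because compressed rows differ in size, so row boundaries cannot be derived from the row number alone.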

In another implementation of this embodiment, consider that allocating four separate memory spaces for the sub-code streams of the 4N-th, (4N+1)-th, (4N+2)-th, and (4N+3)-th MCU rows roughly doubles the memory required. A method for dynamic reclamation and cyclic reuse during code stream reassembly is therefore further provided, which significantly reduces the extra memory overhead caused by multi-core parallel encoding.

Referring to Fig. 6, when 4 adjacent MCU rows are encoded in parallel, a fixed-size storage unit of X bytes may be preset, and the 4 code streams that were previously stored in separate segments are instead filled into the DRAM (memory) alternately, X bytes at a time, so that the compressed code streams of a group of 4 adjacent rows are confined to a bounded DRAM interval. During reassembly, after the complete code streams of the 4 rows have been read, spliced, and written back to the DRAM, the space of the 4 reassembled sub-code streams can be reclaimed and used to store the final code stream.

Specifically, by way of example and not limitation, Stream_st_addr is allocated as the start address of the final code stream, and the initial row offset distance is set to line_offset_gap; in this embodiment, the row offset distance is at least 4 rows.

With Core_st_addr = Stream_st_addr + line_offset_gap (at least 4 rows), the following steps are performed:

1) During encoding, the sub-code streams of the 1st to 4th MCU rows (MCU line0-line3) are filled in at Core_st_addr.

2) During reassembly (Reorder), the sub-code streams of the 1st to 4th MCU rows (MCU line0-line3) are read, spliced, and written to Stream_st_addr; the space at Core_st_addr occupied by the 4 reassembled sub-code streams of line0-line3 is then reclaimed for storing the final code stream.

3) When the 5th to 8th MCU rows (MCU line4-line7) are reassembled in the next cycle, the reassembly module reads and splices their sub-code streams. Because the reclaimed sub-code stream space of line0-line3 is available when writing the final code stream at Stream_st_addr, the write address after splicing is always lower than the addresses of the line4-line7 sub-code streams.

Proceeding cyclically in this way, the storage space of the 4 MCU row code streams reassembled in the previous cycle is reclaimed and used for writing the final code stream of the 4 MCU rows of the next cycle, until the code streams of all MCU rows are reassembled. Because a segmented sub-code stream is reused for the final code stream only after it has been read, splicing never overwrites sub-code stream data that has not yet been reassembled (Reorder); this dynamic reuse therefore saves the extra memory overhead.
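The reclamation cycle can be simulated in Python with a single buffer that holds both the staged sub-code streams and the final code stream. This is a simplified sketch under several assumptions that do not come from the source: each row's sub-code stream is staged in its own fixed-size slot of slot_bytes rather than interleaved every X bytes, every row fits in one slot, line_offset_gap equals one group of 4 slots, and all rows are staged before reassembly begins. The function name and lower-case variable names are hypothetical.

```python
def reorder_in_place(row_streams, slot_bytes, cores=4):
    """Stage row sub-code streams, then splice them into the same buffer."""
    stream_st_addr = 0
    line_offset_gap = cores * slot_bytes             # assumed: one group of rows
    core_st_addr = stream_st_addr + line_offset_gap
    dram = bytearray(core_st_addr + len(row_streams) * slot_bytes)

    # Encoding phase: row r is staged at core_st_addr + r * slot_bytes.
    records = []
    for row, data in enumerate(row_streams):
        assert len(data) <= slot_bytes, "row does not fit its staging slot"
        start = core_st_addr + row * slot_bytes
        dram[start:start + len(data)] = data
        records.append((start, len(data)))

    # Reorder phase: read one group of rows completely, then write it back.
    write_addr = stream_st_addr
    for first in range(0, len(records), cores):
        group = records[first:first + cores]
        spliced = b"".join(bytes(dram[s:s + n]) for s, n in group)
        # The write window must end before the first slot not yet read.
        next_unread = core_st_addr + (first + cores) * slot_bytes
        assert write_addr + len(spliced) <= next_unread
        dram[write_addr:write_addr + len(spliced)] = spliced
        write_addr += len(spliced)
    return bytes(dram[stream_st_addr:write_addr])


rows = [b"AA", b"BBB", b"C", b"DD", b"EEEE", b"F"]
final_stream = reorder_in_place(rows, slot_bytes=4)
```

The assertion inside the loop expresses the property stated in the text: the final code stream only ever grows into slots whose contents have already been read, so no separate region for the final code stream is needed.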

The invention further provides a multi-core parallel hardware coding device suitable for JPEG.

The device comprises a data dividing module, a coding module and a recombination module.

The data dividing module is used for acquiring an image to be coded and dividing the image to be coded into a plurality of MCU lines, and each MCU line comprises a plurality of MCU blocks.

The encoding module is used for performing multi-threaded parallel JPEG encoding with an encoder. The encoder comprises a plurality of JPEG encoding cores, which simultaneously start encoding their respective MCU rows so as to encode in parallel; after compressing the last MCU block of its MCU row, each core inserts its own restart identifier to break the data dependency; each JPEG encoding core corresponds to the code stream of one group of MCU rows.

The reassembly module is used for splicing the code streams in order and writing them to the DRAM to complete the reassembly of the MCU row code streams.

In this embodiment, an X-byte storage space is preset in the DRAM to store the code streams of each group of MCU rows.

In this case, the encoding module is configured as follows: when a plurality of JPEG encoding cores encode a plurality of vertically adjacent MCU rows in parallel, the code streams of the rows are filled into the DRAM alternately, X bytes at a time, so that the code stream data of the vertically adjacent rows is confined to a storage interval of preset size.

The reassembly module is configured as follows: during reassembly, after the complete multi-row code streams have been read, spliced, and written back to the DRAM, the storage space occupied by the reassembled multi-row code streams is reclaimed and used to store the final code stream.

Other technical features are described in the previous embodiment and are not described in detail herein.

In the foregoing description, the disclosure of the present invention is not intended to limit itself to these aspects. Rather, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, terms like "comprising," "including," and "having" should be interpreted as inclusive or open-ended, rather than exclusive or closed-ended, by default, unless explicitly defined to the contrary. All technical, scientific, or other terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. Common terms found in dictionaries should not be interpreted too ideally or too realistically in the context of related art documents unless the present disclosure expressly limits them to that. Any changes and modifications of the present invention based on the above disclosure will be within the scope of the appended claims.
