Hardware decoder pipeline optimization method and application

Document No.: 912765; Publication date: 2021-02-26

Reading note: This technology, Hardware decoder pipeline optimization method and application, was designed by 雷理, 张云, 韦虎, 占坤, 谢峥 and 江焕承 on 2020-10-26. Its main content is as follows: The invention discloses a hardware decoder pipeline optimization method and its application, and relates to the technical field of video decoding. The method divides the decoding of an AVC video sequence into a multi-stage pipeline structure and specifies how the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes cooperate: the inverse quantization process is placed in the same pipeline level as the entropy decoding process, and the inverse DCT transformation process in the pipeline level immediately after it. An entropy decoding and inverse quantization unit entropy decodes and inverse quantizes each macroblock, and the inverse-quantized data of multiple pixels are fed synchronously into the inverse transformation unit for multipoint parallel inverse DCT transformation. The invention avoids the extra multiplier overhead incurred when the IQT and IDCT operations share a pipeline level.

1. A hardware decoder pipeline optimization method, which divides the decoding of an AVC video sequence into a multi-stage pipeline structure and specifies the cooperative working mode of the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes, characterized in that:

the inverse quantization process is located at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process is located at the pipeline level immediately following the pipeline level of the inverse quantization process;

an entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on a macroblock, and the inverse-quantized data of a plurality of pixels are input synchronously to an inverse transformation unit, which performs multipoint parallel inverse DCT transformation on them.

2. The method of claim 1, wherein the decoding of the AVC video sequence is divided into a 4-level pipeline structure: level 1 corresponds to entropy decoding and inverse quantization, level 2 to inverse DCT transformation, level 3 to intra-frame prediction, inter-frame prediction and image reconstruction, and level 4 to deblocking filtering.

3. The method according to claim 1 or 2, characterized in that the entropy decoding and inverse quantization unit comprises a residual parsing module and an inverse quantization calculation module;

the residual parsing module parses the pixel residuals one by one in zigzag order, and each time it parses one pixel residual it sends that residual to the inverse quantization calculation module for inverse quantization; the inverse-quantized data are stored in the entropy decoding output macroblock buffer.

4. The method of claim 3, wherein, when the residual is inverse quantized by the inverse quantization calculation module, the residual coefficient transformation comprises AC coefficient inverse quantization and DC coefficient inverse quantization, and a DC inverse Hadamard transform is performed by the residual parsing module before the DC coefficient inverse quantization.

5. The method of claim 4, wherein the inverse quantization of the residual comprises the following steps:

obtaining the residual parsing state;

when the residual parsing is determined to be idle, parsing, by the luminance DC parsing submodule Y_DC_DEC, the DC coefficients $R_{ij,\mathrm{DC},Y}$ of the 4x4 blocks of the luminance component Y of the macroblock, and then applying, by the luminance DC inverse quantization submodule Y_DC_IQT, an inverse Hadamard transform and inverse quantization to $R_{ij,\mathrm{DC},Y}$ to obtain the inverse-quantized residual values $Q_{ij,\mathrm{DC},Y}$; at the same time, parsing, by the luminance AC parsing submodule Y_AC_DEC, the AC coefficients $R_{ij,\mathrm{AC},Y}$ of the luminance component Y and inverse quantizing them to obtain the inverse-quantized residual values $Q_{ij,\mathrm{AC},Y}$;

then parsing, by the chrominance DC parsing submodule UV_DC_DEC, the DC coefficients $R_{ij,\mathrm{DC},UV}$ of the 2x2 blocks of the chrominance components UV of the macroblock, and then applying, by the chrominance DC inverse quantization submodule UV_DC_IQT, an inverse Hadamard transform and inverse quantization to $R_{ij,\mathrm{DC},UV}$ to obtain the inverse-quantized residual values $Q_{ij,\mathrm{DC},UV}$; at the same time, parsing, by the chrominance AC parsing submodule UV_AC_DEC, the AC coefficients $R_{ij,\mathrm{AC},UV}$ of the chrominance components UV and inverse quantizing them to obtain the inverse-quantized residual values $Q_{ij,\mathrm{AC},UV}$.

6. A data processing apparatus for video decoding, which divides the decoding of an AVC video sequence into a multi-level pipeline structure, characterized in that:

when the data processing apparatus sets the cooperative working mode of the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes, the inverse quantization process is located at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process is located at the pipeline level immediately following that of the inverse quantization process;

the data processing apparatus comprises an entropy decoding and inverse quantization unit and an inverse transformation unit;

the entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on a macroblock and synchronously inputs the inverse-quantized data of a plurality of pixels into the inverse transformation unit;

the inverse transformation unit performs multipoint parallel inverse DCT transformation on the synchronously input data of the plurality of pixels.

7. The data processing apparatus of claim 6, wherein the decoding of the AVC video sequence is divided into a 4-level pipeline structure: level 1 corresponds to entropy decoding and inverse quantization, level 2 to inverse DCT transformation, level 3 to intra-frame prediction, inter-frame prediction and image reconstruction, and level 4 to deblocking filtering.

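8. The data processing apparatus of claim 6, wherein the entropy decoding and inverse quantization unit comprises a residual parsing module and an inverse quantization calculation module; the residual parsing module parses the pixel residuals one by one in zigzag order and sends each parsed pixel residual to the inverse quantization calculation module for inverse quantization, and the inverse-quantized data are stored in the entropy decoding output macroblock buffer.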

9. A video decoder system that employs macroblock-level pipelining, characterized in that the video decoder system comprises decoding firmware and a multi-core hardware decoding accelerator that are communicatively connected, wherein the decoding firmware parses the non-entropy-coded data in the upper layers of the video bitstream, and the multi-core hardware decoding accelerator processes the macroblock-layer decoding tasks of the video bitstream;

the multi-core hardware decoding accelerator comprises the data processing apparatus of any one of claims 6 to 8.

10. A video decoding method, characterized by comprising the steps of:

receiving video code stream data;

parsing the non-entropy-coded data in the upper layers of the video bitstream with decoding firmware, and processing the macroblock-layer decoding tasks of the video bitstream with a multi-core hardware decoding accelerator;

wherein:

the macroblock-layer decoding tasks comprise the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes; when the cooperative working mode of these processes is set, the inverse quantization process is located at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process is located at the pipeline level immediately following that of the inverse quantization process; an entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on a macroblock, and the inverse-quantized data of a plurality of pixels are input synchronously to an inverse transformation unit, which performs multipoint parallel inverse DCT transformation on them.

Technical Field

The invention relates to the technical field of video decoding, in particular to a hardware decoder pipeline optimization method and application.

Background

For high-definition video, fully software-based decoding imposes a heavy CPU load and high power consumption, so the industry usually uses a dedicated hardware accelerator as the video decoder (VDEC, Video Decoder). Taking a common single-core hardware decoder as an example, most such decoders use a pipelined design with the macroblock (MB) as the pipeline unit. For AVC (Advanced Video Coding, one of the current mainstream video compression standards), the main pipeline division is shown in fig. 1 and has 4 levels: level 1, Entropy Dec: entropy decoding (CABAC/CAVLC); level 2, IQT and IDCT: inverse quantization and inverse DCT transformation; level 3, IPred and Rec: intra/inter prediction and image reconstruction; level 4, Dblock: deblocking filtering.

The inverse discrete cosine transform (IDCT) is the most basic and most common transform in video decoding and one of its core operations. It essentially performs two matrix multiplications on the residual data, which requires a large amount of multiplier/adder resources. Before the residual is sent to the IDCT unit it must first be inverse quantized (IQT), which multiplies each coefficient by a scale factor determined by the quantization parameter (QP). In the prior art, because entropy decoding requires complex control to parse the different syntax elements, it is usually implemented with a DSP plus a dedicated entropy decoding accelerator, and because IQT and IDCT are tightly coupled (the output of one feeds directly into the other), they are usually placed in the same pipeline level after entropy decoding.
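To make the "two matrix multiplications" concrete, the sketch below applies the AVC 4x4 inverse core transform as a row pass followed by a column pass, then the standard (x + 32) >> 6 rounding. It is a minimal Python illustration assuming 8-bit video and already inverse-quantized coefficients; the function names are hypothetical. For this particular transform each 1-D pass happens to reduce to additions and shifts.

```python
def _inverse_core_1d(x0, x1, x2, x3):
    """One 1-D pass of the AVC 4x4 inverse core transform (add/shift butterfly)."""
    e0 = x0 + x2
    e1 = x0 - x2
    e2 = (x1 >> 1) - x3
    e3 = x1 + (x3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def idct4x4_avc(d):
    """Inverse-transform a 4x4 block d[row][col] of inverse-quantized coefficients.

    Mirrors the two matrix products of the core transform: every row is
    transformed first, then every column, followed by (x + 32) >> 6 rounding.
    """
    rows = [_inverse_core_1d(*d[i]) for i in range(4)]  # row pass
    cols = [_inverse_core_1d(*(rows[i][j] for i in range(4)))  # column pass
            for j in range(4)]
    # cols[j][i] is row i of column j; transpose back while rounding
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    block = [[128, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
    print(idct4x4_avc(block))  # a DC-only input gives a flat residual block
```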

With the arrival of the 4K/8K high-definition era, the IDCT unit may need to compute multiple pixels in parallel in each clock cycle, so the IQT unit in the same pipeline level must also inverse quantize multiple residuals at once. For example, referring to fig. 2, with the IQT and IDCT units in the same pipeline level, the residuals p0, p1, p2, p3, p4, p5, ……, pn of the pixels computed in parallel must each pass through their own IQT unit before being fed synchronously into the IDCT unit: in fig. 2, p0 is inverse quantized by the IQT0 unit, p1 by the IQT1 unit, and so on, up to pn by the IQTn unit. Because each IQT unit needs at least one 16-bit multiplier, plus adders, shifters and other resources, multipoint parallel operation makes the multiplier overhead grow linearly with the degree of parallelism.
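The linear growth can be pictured with a toy model of the fig. 2 arrangement, in which every residual handed to the parallel IDCT in one cycle needs its own IQT lane. This is a hypothetical Python sketch: IqtLane, same_stage_iqt and the single scale/shift pair are simplifications standing in for the real QP-dependent inverse quantization, and each lane is assumed to hold exactly one multiplier.

```python
from dataclasses import dataclass


@dataclass
class IqtLane:
    """Hypothetical IQT unit (IQT0 ... IQTn in fig. 2) holding one multiplier."""
    scale: int  # assumed QP-derived scale factor
    shift: int  # assumed QP-derived left shift

    def run(self, level: int) -> int:
        # One multiplication plus one shift per residual
        return (level * self.scale) << self.shift


def same_stage_iqt(residuals, scale, shift):
    """Prior-art model: IQT shares the IDCT stage, so all residuals p0..pn the
    IDCT consumes in one cycle must be inverse quantized in that same cycle."""
    lanes = [IqtLane(scale, shift) for _ in residuals]  # one lane per residual
    outputs = [lane.run(p) for lane, p in zip(lanes, residuals)]
    multipliers = len(lanes)  # grows linearly with the degree of parallelism
    return outputs, multipliers


if __name__ == "__main__":
    for n in (4, 8, 16):
        _, mults = same_stage_iqt([1] * n, scale=10, shift=0)
        print(n, "parallel residuals ->", mults, "multipliers")
```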

In summary, an urgent technical problem is to provide a hardware decoder pipeline optimization method that avoids the extra multiplier overhead caused by parallel IDCT operation.

Disclosure of Invention

Purpose of the invention: to overcome the shortcomings of the prior art by providing a hardware decoder pipeline optimization method and its application. By moving the IQT (inverse quantization) operation forward so that it shares a pipeline level with entropy decoding, the invention avoids the extra multiplier overhead incurred when the IQT and IDCT operations share a pipeline level.

In order to achieve the above object, the present invention provides the following technical solutions:

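A hardware decoder pipeline optimization method divides the decoding of an AVC video sequence into a multi-stage pipeline structure and specifies the cooperative working mode of the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes; the inverse quantization process is located at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process at the pipeline level immediately following it; an entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on a macroblock, and the inverse-quantized data of a plurality of pixels are input synchronously to an inverse transformation unit for multipoint parallel inverse DCT transformation.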

Further, the decoding of the AVC video sequence is divided into a 4-level pipeline structure: level 1 corresponds to entropy decoding and inverse quantization, level 2 to inverse DCT transformation, level 3 to intra-frame prediction, inter-frame prediction and image reconstruction, and level 4 to deblocking filtering.

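Further, the entropy decoding and inverse quantization unit comprises a residual parsing module and an inverse quantization calculation module; the residual parsing module parses the pixel residuals one by one in zigzag order, and each time it parses one pixel residual it sends that residual to the inverse quantization calculation module for inverse quantization; the inverse-quantized data are stored in the entropy decoding output macroblock buffer.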

Further, when the residual is inverse quantized by the inverse quantization calculation module, the residual coefficient transformation comprises AC coefficient inverse quantization and DC coefficient inverse quantization, and a DC inverse Hadamard transform is performed by the residual parsing module before the DC coefficient inverse quantization.

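Further, the inverse quantization of the residual comprises the following steps:

obtaining the residual parsing state; when the residual parsing is idle, parsing the DC coefficients of the luminance component Y with Y_DC_DEC and applying an inverse Hadamard transform and inverse quantization to them with Y_DC_IQT, while the AC coefficients of Y are parsed by Y_AC_DEC and inverse quantized; then processing the DC and AC coefficients of the chrominance components UV in the same way with UV_DC_DEC, UV_DC_IQT and UV_AC_DEC.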

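The invention also provides a data processing apparatus for video decoding, which divides the decoding of an AVC video sequence into a multi-stage pipeline structure, with the inverse quantization process at the same pipeline level as the entropy decoding process and the inverse DCT transformation process at the pipeline level immediately following it;

the data processing apparatus comprises an entropy decoding and inverse quantization unit and an inverse transformation unit;

the entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on a macroblock and synchronously inputs the inverse-quantized data of a plurality of pixels into the inverse transformation unit;

the inverse transformation unit performs multipoint parallel inverse DCT transformation on the synchronously input data of the plurality of pixels.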

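Further, the entropy decoding and inverse quantization unit comprises a residual parsing module and an inverse quantization calculation module; the residual parsing module parses the pixel residuals one by one in zigzag order and sends each parsed pixel residual to the inverse quantization calculation module for inverse quantization, and the inverse-quantized data are stored in the entropy decoding output macroblock buffer.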

The invention also provides a video decoder system that employs macroblock-level pipelining; the video decoder system comprises decoding firmware and a multi-core hardware decoding accelerator that are communicatively connected, wherein the decoding firmware parses the non-entropy-coded data in the upper layers of the video bitstream, and the multi-core hardware decoding accelerator processes the macroblock-layer decoding tasks of the video bitstream;

the multi-core hardware decoding accelerator comprises the data processing apparatus described above.

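The invention also provides a video decoding method, comprising the following steps:

receiving video bitstream data;

parsing the non-entropy-coded data in the upper layers of the video bitstream with decoding firmware, and processing the macroblock-layer decoding tasks of the video bitstream with a multi-core hardware decoding accelerator;

wherein the macroblock-layer decoding tasks comprise the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes, the inverse quantization process is located at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process is located at the pipeline level immediately following it.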

Compared with the prior art, the above technical solution provides the following advantages and positive effects, by way of example: by moving the IQT (inverse quantization) operation forward so that it shares a pipeline level with entropy decoding, the invention avoids the extra multiplier overhead incurred when the IQT and IDCT operations share a pipeline level.

Drawings

FIG. 1 is a schematic diagram of a pipeline design of a prior art single core hardware decoder.

FIG. 2 is a diagram illustrating an example of parallel operations in which the IQT and the IDCT are placed in the same pipeline stage.

Fig. 3 is a partial schematic diagram of an optimized pipeline stage division according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of information transmission of an entropy decoding inverse quantization unit and an inverse transform unit according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a residual coefficient transformation process during inverse quantization according to an embodiment of the present invention.

Fig. 6 is a control diagram of a main state machine according to an embodiment of the present invention.

Fig. 7 is a schematic block diagram of a video decoder system according to an embodiment of the present invention.

Detailed Description

The hardware decoder pipeline optimization method and application disclosed by the invention are further described in detail in the following with reference to the accompanying drawings and specific embodiments. It should be noted that technical features or combinations of technical features described in the following embodiments should not be considered as being isolated, and they may be combined with each other to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in the respective drawings denote the same features or components, and may be applied to different embodiments. Thus, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

It should be noted that the structures, proportions, sizes, and other dimensions shown in the drawings and described in the specification are only for the purpose of understanding and reading the present disclosure, and are not intended to limit the scope of the invention, which is defined by the claims, and any modifications of the structures, changes in the proportions and adjustments of the sizes and other dimensions, should be construed as falling within the scope of the invention unless the function and objectives of the invention are affected. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that described or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

Examples

A hardware decoder pipeline optimization method is used for dividing decoding of an AVC video sequence into a multi-level pipeline structure, and the method provides a cooperative working mode of entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction and deblocking filtering processes.

When the pipeline level division is set, the inverse quantization process and the entropy decoding process are positioned at the same pipeline level, and the inverse DCT transformation process is positioned at the next pipeline level of the inverse quantization process. Referring to fig. 3, the entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on the macroblock, and the data of the multiple pixel points after inverse quantization is synchronously input to the inverse transformation unit for multi-point parallel inverse DCT transformation.

In this embodiment, the decoding of the AVC video sequence is preferably divided into a 4-level pipeline structure, as follows: level 1 corresponds to entropy decoding and inverse quantization (the entropy decoding and inverse quantization pipeline level), level 2 to inverse DCT transformation (the inverse DCT transformation pipeline level), level 3 to intra prediction, inter prediction and image reconstruction (the intra prediction, inter prediction and image reconstruction pipeline level), and level 4 to deblocking filtering (the deblocking filtering pipeline level). The difference from the prior-art division of fig. 1 is that the inverse quantization (IQT) operation is moved forward into the pipeline level where entropy decoding takes place.
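The resulting four-level macroblock pipeline can be tabulated as follows. This is an idealized Python sketch, assuming every level takes exactly one macroblock slot with no stalls; the stage labels follow the naming of fig. 1 and fig. 3, but the function pipeline_schedule is hypothetical.

```python
STAGES = (
    "Entropy Dec + IQT",  # level 1
    "IDCT",               # level 2
    "IPred/Rec",          # level 3: intra/inter prediction + reconstruction
    "Dblock",             # level 4: deblocking filter
)


def pipeline_schedule(num_mbs, stages=STAGES):
    """Return one dict per time slot mapping each stage to the MB index it
    holds (None if the stage is empty), assuming one slot per stage per MB."""
    schedule = []
    for t in range(num_mbs + len(stages) - 1):
        slot = {}
        for s, name in enumerate(stages):
            mb = t - s  # the MB that entered level 1 s slots ago
            slot[name] = mb if 0 <= mb < num_mbs else None
        schedule.append(slot)
    return schedule


if __name__ == "__main__":
    for t, slot in enumerate(pipeline_schedule(3)):
        print(t, slot)
```

In steady state each level works on a different macroblock; for example, in slot 3 MB 2 is in the IDCT level while MB 0 is being deblocked.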

With continued reference to fig. 3, the entropy decoding and inverse quantization unit (Entropy Dec and IQT) may include a residual parsing module and an inverse quantization calculation module arranged as a pipeline. Because the residuals (Residue) are parsed one by one in zigzag order during entropy decoding (Entropy Dec), once the inverse quantization operation is moved forward into the entropy decoding pipeline level, each residual pn decoded by the residual parsing (Res Dec) module is immediately sent to the inverse quantization (IQT) module for inverse quantization and then stored in the entropy decoding output macroblock buffer (MB buffer).

Specifically, for the pixel residuals p0, p1, p2, ……, pn in the same macroblock (MB), the residual parsing module is configured to parse the pixel residuals pi (i = 0, 1, 2, ……, n) one by one in zigzag order and to send each parsed residual pi to the inverse quantization calculation module.

The inverse quantization calculation module is configured to receive the pixel residuals sent by the residual parsing module, inverse quantize them, and store the inverse-quantized data in the entropy decoding output macroblock buffer.

For example, in fig. 3, after the residual parsing (Res Dec) module parses the pixel residual p0, p0 is sent to the inverse quantization (IQT) calculation module for inverse quantization (the result is stored in the entropy decoding output macroblock buffer); the residual parsing module then parses the next pixel residual p1 and sends it to the inverse quantization calculation module in the same way (again storing the result in the entropy decoding output macroblock buffer); and so on, until the residual parsing module parses the pixel residual pn and sends it to the inverse quantization calculation module for inverse quantization.

With this technical solution, the inverse quantization calculation module is time-division multiplexed. Because the residual parsing module produces only one residual at a time, a single IQT module keeps pace with it: as shown in fig. 4, the entire level 1 pipeline stage (the entropy decoding and inverse quantization pipeline level) needs only one IQT calculation module, so the multiplier resources required for multi-pixel parallel computation are significantly reduced compared with the prior-art arrangement in which IQT and IDCT share a level.
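The time-division multiplexing can be sketched as a loop in which the parser yields one residual per step and the same IQT function handles each one before it lands in the macroblock buffer; the IDCT level can then read all 16 values of a 4x4 block at once. A hypothetical Python sketch, assuming a single 4x4 block in AVC frame zig-zag order and an arbitrary callable standing in for the single IQT calculation module:

```python
# Raster positions (row * 4 + col) of the AVC 4x4 frame zig-zag scan
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def entropy_dec_iqt_stage(levels_in_scan_order, iqt):
    """Level-1 sketch: Res Dec emits p0..p15 one at a time, and one shared
    IQT callable inverse quantizes each residual before it is written to the
    entropy decoding output MB buffer.

    Returns the 4x4 block in raster order and how many times the single IQT
    module was used.
    """
    mb_buffer = [0] * 16
    iqt_uses = 0
    for k, level in enumerate(levels_in_scan_order):  # one residual per step
        mb_buffer[ZIGZAG_4X4[k]] = iqt(level)          # same IQT module every time
        iqt_uses += 1
    block = [mb_buffer[r * 4:(r + 1) * 4] for r in range(4)]
    return block, iqt_uses


if __name__ == "__main__":
    block, uses = entropy_dec_iqt_stage(list(range(16)), iqt=lambda x: 2 * x)
    print(block, uses)  # level 2 can now read all 16 values in parallel
```

However many pixels the IDCT level consumes per cycle, this level only ever instantiates one IQT callable; the buffer absorbs the difference in data rate.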

Since the residual coefficient transformation process involves the processing of AC and DC coefficients, the complete inverse quantization process of the residual at the entropy decoding inverse quantization pipeline level is described in detail below in conjunction with fig. 5 and 6.

According to the AVC standard, as shown in fig. 5, when the residual pn is inverse quantized, the residual coefficient transformation comprises AC coefficient inverse quantization and DC coefficient inverse quantization; unlike AC coefficient inverse quantization, DC inverse quantization must be preceded by an inverse Hadamard transform. Therefore, a DC inverse Hadamard transform is also performed by the residual parsing module before the DC coefficient inverse quantization.
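The DC inverse Hadamard transforms that precede DC inverse quantization can be written as matrix products whose entries are all ±1, so in hardware they need only additions and subtractions. The Python sketch below follows the H.264/AVC definitions for the 4x4 luma DC array (Intra_16x16 macroblocks) and the 2x2 chroma DC array (4:2:0 sampling assumed); the helper names are hypothetical.

```python
H4 = ((1, 1, 1, 1),
      (1, 1, -1, -1),
      (1, -1, -1, 1),
      (1, -1, 1, -1))
H2 = ((1, 1),
      (1, -1))


def _matmul(a, b):
    """Plain integer matrix product of two nested sequences."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_hadamard_luma_dc(c):
    """f = H4 * c * H4 for the 4x4 array of luma DC coefficients."""
    return _matmul(_matmul(H4, c), H4)


def inverse_hadamard_chroma_dc(c):
    """f = H2 * c * H2 for the 2x2 array of chroma DC coefficients."""
    return _matmul(_matmul(H2, c), H2)


if __name__ == "__main__":
    print(inverse_hadamard_chroma_dc([[1, 2], [3, 4]]))
```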

Let $R_{ij}$ denote the original residual value obtained by entropy decoding (parsing) a syntax element, and $Q_{ij}$ the final inverse-quantized residual value. In a preferred embodiment, the inverse quantization of the residual comprises the following steps:

Step 1: obtain the residual parsing state.

Step 2: when the residual parsing is determined to be in the idle state (IDLE), the luminance DC parsing submodule Y_DC_DEC parses the DC coefficients $R_{ij,\mathrm{DC},Y}$ of the 4x4 blocks of the luminance component Y of the macroblock, and the luminance DC inverse quantization submodule Y_DC_IQT then applies an inverse Hadamard transform and inverse quantization to $R_{ij,\mathrm{DC},Y}$ to obtain the inverse-quantized residual values $Q_{ij,\mathrm{DC},Y}$; at the same time, the luminance AC parsing submodule Y_AC_DEC parses the AC coefficients $R_{ij,\mathrm{AC},Y}$ of the luminance component Y, which are inverse quantized to obtain the inverse-quantized residual values $Q_{ij,\mathrm{AC},Y}$.

Step 3: the chrominance DC parsing submodule UV_DC_DEC then parses the DC coefficients $R_{ij,\mathrm{DC},UV}$ of the 2x2 blocks of the chrominance components UV of the macroblock, and the chrominance DC inverse quantization submodule UV_DC_IQT applies an inverse Hadamard transform and inverse quantization to $R_{ij,\mathrm{DC},UV}$ to obtain the inverse-quantized residual values $Q_{ij,\mathrm{DC},UV}$; at the same time, the chrominance AC parsing submodule UV_AC_DEC parses the AC coefficients $R_{ij,\mathrm{AC},UV}$ of the chrominance components UV, which are inverse quantized to obtain the inverse-quantized residual values $Q_{ij,\mathrm{AC},UV}$.

In this way, the inverse-quantized residual values $Q_{ij,\mathrm{DC},Y}$ and $Q_{ij,\mathrm{AC},Y}$ of the luminance component Y and $Q_{ij,\mathrm{DC},UV}$ and $Q_{ij,\mathrm{AC},UV}$ of the chrominance components UV of the macroblock are obtained. These data are input to the inverse transform unit for IDCT (inverse DCT) transformation.
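How each $R_{ij}$ becomes $Q_{ij}$ can be sketched with the H.264/AVC inverse quantization rules for the simplest case: flat scaling matrices, 8-bit video and 4:2:0 chroma, written in the LevelScale form of the original (2003) edition of the standard, which matches the later edition when the scaling lists are flat. The Python below is an illustration under those assumptions, not the patent's circuit; `qp` stands for the luma QP on the luma paths and the already-derived chroma QP on the chroma path, and the function names are hypothetical. Each `x * ls` product is one use of the multiplier discussed above.

```python
# LevelScale values v[m] for m = QP % 6 (flat scaling, 4x4 blocks)
V = ((10, 16, 13), (11, 18, 14), (13, 20, 16),
     (14, 23, 18), (16, 25, 20), (18, 29, 23))


def level_scale(m, i, j):
    """LevelScale(m, i, j): v[m][0] at even/even, v[m][1] at odd/odd, else v[m][2]."""
    if i % 2 == 0 and j % 2 == 0:
        return V[m][0]
    if i % 2 == 1 and j % 2 == 1:
        return V[m][1]
    return V[m][2]


def iqt_4x4(c, qp):
    """AC path: d_ij = (c_ij * LevelScale(qp % 6, i, j)) << (qp // 6).
    For Intra_16x16 luma and for chroma, entry (0, 0) is later replaced by the
    separately computed DC value."""
    return [[(c[i][j] * level_scale(qp % 6, i, j)) << (qp // 6)
             for j in range(4)] for i in range(4)]


def iqt_luma_dc(f, qp):
    """Luma DC path, applied to the 4x4 inverse Hadamard output f."""
    ls = level_scale(qp % 6, 0, 0)
    if qp >= 12:
        return [[(x * ls) << (qp // 6 - 2) for x in row] for row in f]
    rounding = 1 << (1 - qp // 6)
    return [[(x * ls + rounding) >> (2 - qp // 6) for x in row] for row in f]


def iqt_chroma_dc(f, qp):
    """Chroma DC path, applied to the 2x2 inverse Hadamard output f."""
    ls = level_scale(qp % 6, 0, 0)
    return [[((x * ls) << (qp // 6)) >> 1 for x in row] for row in f]
```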

As a typical preferred mode, when the DC inverse Hadamard operation is inserted into the residual parsing module, the main state machine is controlled as shown in fig. 6.

Here, Y_DC_DEC denotes the luminance DC parsing submodule, which parses the DC coefficients $R_{ij,\mathrm{DC},Y}$ of the 4x4 blocks of the luminance component Y of the macroblock; Y_DC_IQT denotes the luminance DC inverse quantization submodule, which applies an inverse Hadamard transform and inverse quantization to $R_{ij,\mathrm{DC},Y}$ to obtain the inverse-quantized residual values $Q_{ij,\mathrm{DC},Y}$; and Y_AC_DEC denotes the luminance AC parsing submodule, which parses the AC coefficients $R_{ij,\mathrm{AC},Y}$ of the luminance component Y, which are then inverse quantized to obtain the inverse-quantized residual values $Q_{ij,\mathrm{AC},Y}$.

UV_DC_DEC denotes the chrominance DC parsing submodule, which parses the DC coefficients $R_{ij,\mathrm{DC},UV}$ of the 2x2 blocks of the chrominance components UV of the macroblock; UV_DC_IQT denotes the chrominance DC inverse quantization submodule, which applies an inverse Hadamard transform and inverse quantization to $R_{ij,\mathrm{DC},UV}$ to obtain the inverse-quantized residual values $Q_{ij,\mathrm{DC},UV}$; and UV_AC_DEC denotes the chrominance AC parsing submodule, which parses the AC coefficients $R_{ij,\mathrm{AC},UV}$ of the chrominance components UV, which are then inverse quantized to obtain the inverse-quantized residual values $Q_{ij,\mathrm{AC},UV}$.
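Because fig. 6 itself is not reproduced here, the Python sketch below only assumes a plausible sequential order for the main state machine, following steps 1 to 3: from IDLE through the luma DC, luma AC, chroma DC and chroma AC sub-states and back to IDLE. The `intra16x16` flag is an added assumption reflecting that AVC codes a separate luma DC array only for Intra_16x16 macroblocks; the enum and function names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    Y_DC_DEC = auto()   # parse luma DC coefficients
    Y_DC_IQT = auto()   # luma DC inverse Hadamard + inverse quantization
    Y_AC_DEC = auto()   # parse (and inverse quantize) luma AC coefficients
    UV_DC_DEC = auto()  # parse chroma DC coefficients
    UV_DC_IQT = auto()  # chroma DC inverse Hadamard + inverse quantization
    UV_AC_DEC = auto()  # parse (and inverse quantize) chroma AC coefficients


# Assumed successor of each non-idle state (fig. 6 is not reproduced here)
_NEXT = {
    State.Y_DC_DEC: State.Y_DC_IQT,
    State.Y_DC_IQT: State.Y_AC_DEC,
    State.Y_AC_DEC: State.UV_DC_DEC,
    State.UV_DC_DEC: State.UV_DC_IQT,
    State.UV_DC_IQT: State.UV_AC_DEC,
    State.UV_AC_DEC: State.IDLE,  # macroblock finished
}


def next_state(state, intra16x16=True):
    """Return the next state; only Intra_16x16 MBs visit the luma DC states."""
    if state is State.IDLE:
        return State.Y_DC_DEC if intra16x16 else State.Y_AC_DEC
    return _NEXT[state]


def walk_macroblock(intra16x16=True):
    """List the states visited while one macroblock's residuals are processed."""
    visited = [State.IDLE]
    state = next_state(State.IDLE, intra16x16)
    while state is not State.IDLE:
        visited.append(state)
        state = next_state(state, intra16x16)
    return visited


if __name__ == "__main__":
    print([s.name for s in walk_macroblock()])
```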

According to another embodiment of the invention, a data processing device for video decoding is also provided. The data processing apparatus is for partitioning decoding of an AVC video sequence into a multi-level pipeline structure.

In this embodiment, when the data processing apparatus sets the cooperative working mode of the processes of entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction, and deblocking filtering, the inverse quantization process and the entropy decoding process are located at the same pipeline level, and the inverse DCT transformation process is located at a next pipeline level of the pipeline level where the inverse quantization process is located.

Specifically, the data processing apparatus includes an entropy decoding inverse quantization unit and an inverse transformation unit.

The inverse transformation unit can perform multipoint parallel inverse DCT transformation processing on the data of a plurality of synchronously input pixel points.

In this embodiment, the data processing apparatus divides the decoding of an AVC video sequence into a 4-level pipeline structure, where level 1 corresponds to entropy decoding inverse quantization processing, level 2 corresponds to inverse DCT transformation processing, level 3 corresponds to intra prediction, inter prediction, and image reconstruction processing, and level 4 corresponds to deblocking filtering processing.

Preferably, the entropy decoding and inverse quantization unit may include a residual parsing module and an inverse quantization calculation module arranged as a pipeline. Because the residuals (Residue) are parsed one by one in zigzag order during entropy decoding (Entropy Dec), once the inverse quantization operation is moved forward into the entropy decoding pipeline level, each residual pn decoded by the residual parsing (Res Dec) module is immediately sent to the inverse quantization (IQT) module for inverse quantization and then stored in the entropy decoding output macroblock buffer (MB buffer).

Specifically, the residual parsing module parses the pixel residuals one by one in zigzag order and sends each parsed pixel residual to the inverse quantization calculation module, which inverse quantizes it and stores the result in the entropy decoding output macroblock buffer.

With this scheme, the inverse quantization calculation module is time-division multiplexed: because residuals arrive serially from the parser, the entire first pipeline level (the entropy decoding and inverse quantization level) needs only one IQT calculation module. Compared with the prior-art arrangement in which IQT and IDCT operate at the same pipeline level, this significantly reduces the multiplier resources that parallel computation over multiple pixel points would otherwise require.

For other technical features, refer to the previous embodiments; they are not repeated here.

In another embodiment of the present invention, a video decoder system is also provided, which employs macroblock-level pipelining.

Referring to Fig. 7, the video decoder system may include a decoding firmware (VDEC_FW, short for Video FirmWare, in Fig. 7) and a multi-core hardware decoding accelerator (VDEC_MCORE, short for VDEC Multi-Core, in Fig. 7) that are communicatively connected. The decoding firmware is configured to parse the non-entropy-coded data in the upper layers of the video bitstream.

The AVC video bitstream has a layered structure: most of the syntax shared by the GOP layer and the Slice layer is separated out to form a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), and so on. Based on these characteristics of the bitstream data, the decoder system of this embodiment divides the video decoder VDEC into two parts: the decoding firmware VDEC_FW, a software part that parses the non-entropy-coded data in the upper layers of the bitstream (such as the VPS, SPS, PPS, and slice header information), and the multi-core hardware decoding accelerator VDEC_MCORE, a hardware part that handles all decoding operations of the macroblock layer of the bitstream.
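One way to picture the VDEC_FW/VDEC_MCORE split is to route NAL units by type. This sketch assumes the standard H.264/AVC one-byte NAL unit header (nal_unit_type in the low five bits) and covers only parameter sets and ordinary coded slices; data-partitioned slices (types 2 to 4) are not modeled, and the routing labels are hypothetical.

```python
# H.264/AVC nal_unit_type codes (low 5 bits of the one-byte NAL unit header).
NAL_SLICE_NON_IDR = 1
NAL_SLICE_IDR = 5
NAL_SPS = 7
NAL_PPS = 8


def route_nal_unit(nal_header_byte):
    """Return which side of the (hypothetical) FW/HW split handles a NAL unit."""
    nal_unit_type = nal_header_byte & 0x1F
    if nal_unit_type in (2, 3, 4):
        raise NotImplementedError("data-partitioned slices are not modeled")
    if nal_unit_type in (NAL_SPS, NAL_PPS):
        return "VDEC_FW"             # parameter sets: parsed entirely in software
    if nal_unit_type in (NAL_SLICE_NON_IDR, NAL_SLICE_IDR):
        return "VDEC_FW+VDEC_MCORE"  # slice header in firmware, macroblocks in hardware
    return "VDEC_FW"                 # other non-macroblock data (SEI, AUD, ...)
```

For instance, the common header bytes 0x67 (SPS) and 0x68 (PPS) stay in firmware, while 0x65 (IDR slice) and 0x41 (non-IDR slice) involve both parts.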

In this embodiment, the multi-core hardware decoding accelerator includes the data processing apparatus in the foregoing embodiments.

When the data processing apparatus sets the cooperative working mode of the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction, and deblocking filtering processes, the inverse quantization process is placed at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process is placed at the pipeline level immediately following the one that contains the inverse quantization process.

The data processing apparatus may include an entropy decoding inverse quantization unit and an inverse transformation unit.

The entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on the macroblock and synchronously inputs the dequantized data of multiple pixel points into the inverse transformation unit, which performs multipoint parallel inverse DCT transformation on that data. In addition, the data processing apparatus may include a prediction reconstruction unit, configured to perform intra-frame prediction, inter-frame prediction, and image reconstruction on the macroblock data, and a deblocking filtering unit, configured to perform deblocking filtering on the macroblock data.

In one embodiment, the entropy decoding inverse quantization unit may include a residual parsing module and an inverse quantization calculation module, which are arranged as a pipeline. During entropy decoding (Entropy Dec), residuals (Residue) are parsed sequentially, one at a time, in zigzag order. Therefore, once the inverse quantization operation is moved forward into the entropy decoding pipeline stage, each residual pn decoded by the residual parsing (Res Dec) module is immediately sent to the inverse quantization (IQT) module and, after inverse quantization, stored in the entropy decoding output macroblock buffer (MB buffer). Specifically, the residual parsing module parses the pixel residuals one by one in zigzag order and sends each parsed residual to the inverse quantization calculation module for inverse quantization; the processed data is then stored in the entropy decoding output macroblock buffer.

Still referring to Fig. 7, software and hardware interact in units of slices of the video bitstream and exchange data through a Slice Queue inside the video decoder. The interaction flow between the decoding firmware VDEC_FW and the multi-core hardware decoding accelerator VDEC_MCORE may be as follows:

1) After the decoding firmware VDEC_FW finishes parsing the upper layers of the bitstream, it packs the slice's upper-layer parameter information and pushes it into the Slice Queue, where it waits to be processed. The downward arrow in Fig. 7 indicates the push into the Slice Queue.

In this step, the decoding firmware is configured to pack the slice's upper-layer parameter information and push it into the Slice Queue after parsing the upper layers of the video bitstream.

2) The multi-core hardware decoding accelerator VDEC_MCORE queries the ready status of the Slice Queue data, reads the queue information and completes its configuration, and then parses all macroblocks of the current slice entirely in hardware. When the slice is finished, it sends an interrupt signal and releases the Slice Queue entry, that is, pops the corresponding information from the Slice Queue. The upward arrow in Fig. 7 indicates the release of the Slice Queue entry.

In this step, the multi-core hardware decoding accelerator is configured to query the ready status of the Slice Queue data, read the queue and complete its configuration, parse the macroblocks of the current slice until all of them have been parsed, send an interrupt signal once parsing is finished, and release the Slice Queue entry.
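The push/release flow in steps 1) and 2) is essentially a bounded producer/consumer queue. The minimal sketch below assumes a queue depth of two entries, a hypothetical dictionary format for the packed slice parameters, a hypothetical end-of-stream marker, and a callback standing in for the interrupt. A semaphore keeps each entry occupied until the hardware releases it after decoding the whole slice, mirroring the pop in step 2).

```python
import queue
import threading


def firmware(entries, free_slots, slices):
    """VDEC_FW side: after upper-layer parsing, pack slice parameters and push them."""
    for slice_id, num_mbs in slices:
        params = {"slice_id": slice_id, "num_mbs": num_mbs}  # hypothetical packed format
        free_slots.acquire()   # wait for a free Slice Queue entry
        entries.put(params)    # push
    entries.put(None)          # hypothetical end-of-stream marker


def hardware(entries, free_slots, on_interrupt):
    """VDEC_MCORE side: read a ready entry, decode its macroblocks, interrupt, release."""
    while True:
        params = entries.get()  # blocks until an entry is ready
        if params is None:
            break
        for _mb in range(params["num_mbs"]):
            pass                # macroblock-layer decoding would happen here
        on_interrupt(params["slice_id"])  # end-of-slice interrupt
        free_slots.release()              # pop: the entry becomes free again


def run(slices, depth=2):
    """Run firmware and hardware concurrently; return slice IDs in completion order."""
    entries = queue.Queue()
    free_slots = threading.Semaphore(depth)  # models a Slice Queue with `depth` entries
    finished = []
    hw = threading.Thread(target=hardware, args=(entries, free_slots, finished.append))
    hw.start()
    firmware(entries, free_slots, slices)  # the main thread plays the firmware role
    hw.join()
    return finished
```

Calling `run([(0, 10), (1, 5), (2, 7)])` returns `[0, 1, 2]`; while the hardware thread decodes one slice, the firmware can already push the next one, which is the software/hardware overlap the Slice Queue provides.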

Combining the software/hardware split with the Slice Queue therefore enables slice-level parallel processing: because software and hardware run in parallel, the time attributable to software processing is significantly reduced, which improves overall parallel processing efficiency.

As a typical example, the multi-core hardware decoding accelerator may include a preprocessor module and multiple homogeneous full-function hardware decoders. Each full-function hardware decoder can handle at least the steps needed to decode a macroblock row: inverse DCT transformation, intra-frame and inter-frame prediction, and pixel reconstruction.

The preprocessor module comprises an entropy decoding inverse quantization unit of the data processing device.

The full-function hardware decoder comprises an inverse transformation unit, a prediction reconstruction unit and a deblocking filtering unit of the data processing device.

Each full-function hardware decoder core is responsible for decoding one macroblock row, including inverse DCT (discrete cosine transform), intra-frame prediction, inter-frame prediction, image reconstruction, deblocking filtering, and related steps. The macroblocks currently being decoded in two vertically adjacent rows are kept at least two macroblocks apart, which allows the cores to decode synchronously.

Two or more homogeneous full-function hardware decoders may be provided: with two, the accelerator is called a dual-core hardware decoding accelerator; with three, a three-core hardware decoding accelerator; with four, a four-core hardware decoding accelerator; and so on. Because each full-function hardware decoder decodes one macroblock row, a dual-core accelerator can decode two macroblock rows in parallel, a three-core accelerator three rows, and so on.
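The row-per-core scheme with a two-macroblock gap amounts to a wavefront schedule. The sketch below counts decoding time steps under simplifying assumptions that are not from the source: every macroblock takes one step, row r goes to core r % num_cores, a core finishes one row before starting its next, and macroblock x of a row may start only after the row above has finished min(x + 2, width) macroblocks. One common reason for a gap of two in AVC is that intra prediction and motion vector prediction reference the upper-right neighboring macroblock. `wavefront_steps` is a hypothetical name.

```python
def wavefront_steps(mb_cols, mb_rows, num_cores, lag=2):
    """Count time steps to decode an mb_cols x mb_rows region, one MB per core per step.

    Row r is handled by core r % num_cores. MB x of row r may start only once
    row r - 1 has finished at least min(x + lag, mb_cols) macroblocks.
    """
    done = [0] * mb_rows  # number of finished macroblocks in each row
    steps = 0
    while any(d < mb_cols for d in done):
        snapshot = list(done)  # all cores decide from the same start-of-step state
        for core in range(num_cores):
            # The core's current row is its lowest-numbered unfinished row.
            row = next((r for r in range(core, mb_rows, num_cores)
                        if snapshot[r] < mb_cols), None)
            if row is None:
                continue  # this core has no rows left
            x = snapshot[row]  # index of the next macroblock in that row
            if row == 0 or snapshot[row - 1] >= min(x + lag, mb_cols):
                done[row] += 1
        steps += 1
    return steps
```

For a region 5 macroblocks wide and 3 rows high, `wavefront_steps(5, 3, 3)` gives 9 steps versus 15 for a single core (`wavefront_steps(5, 3, 1)`); with one core per row the count is width + 2 * (rows - 1).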

For other technical features of the data processing apparatus, reference is made to the foregoing embodiments, and further description is omitted here.

Another embodiment of the present invention further provides a video decoding method using the aforementioned video decoder system. The video decoding method includes the steps of:

Step 100: receive the video bitstream data.

Step 200: parse the non-entropy-coded data in the upper layers of the video bitstream with the decoding firmware, and process the macroblock-layer decoding tasks of the bitstream with the multi-core hardware decoding accelerator.

In step 200, the macroblock-layer decoding task includes the entropy decoding, inverse quantization, inverse DCT transformation, intra-frame prediction, inter-frame prediction, image reconstruction, and deblocking filtering processes. When the cooperative working mode of these processes is set, the inverse quantization process is placed at the same pipeline level as the entropy decoding process, and the inverse DCT transformation process is placed at the pipeline level immediately following the inverse quantization process. The entropy decoding and inverse quantization unit performs entropy decoding and inverse quantization on the macroblock, and the dequantized data of multiple pixel points is input synchronously to the inverse transformation unit for multipoint parallel inverse DCT transformation.

For other technical features, refer to the previous embodiments; they are not repeated here.

The foregoing description is not intended to limit the disclosure of the present invention to these aspects. Rather, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, unless explicitly defined to the contrary, terms such as "comprising," "including," and "having" should by default be interpreted as inclusive or open-ended rather than exclusive or closed-ended. Unless defined otherwise, all technical, scientific, and other terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Common terms found in dictionaries should not be given an overly idealized or overly formal interpretation in the context of related art documents unless the present disclosure expressly so limits them. Any changes and modifications of the present invention based on the above disclosure fall within the scope of the appended claims.
