Filtering device and method in video coding and decoding

Document No.: 555798    Publication date: 2021-05-14

Reading note: this technology, "Filtering device and method in video coding and decoding", was designed and created on 2019-07-02 by Sergey Yurievich Ikonin, Victor Alexeevich Stepin, Dmitry Kuryshev, Jianle Chen, and Roman [surname truncated in source]. Abstract: The invention relates to a filter for video coding and decoding, wherein the filter is used for processing a reconstructed frame to generate a filtered reconstructed frame, and the reconstructed frame comprises a plurality of pixels. The filter includes one or more processors to: obtain a quantization parameter (QP) for the reconstructed block; obtain a threshold (THR) from the QP; and obtain a look-up table from the QP to generate a filtered reconstructed block from the threshold and the look-up table. The filter helps to improve video coding and decoding efficiency.

1. A method for processing a reconstructed block, the reconstructed block comprising a plurality of pixels, the method comprising:

obtaining a Quantization Parameter (QP) for the reconstructed block;

obtaining a Threshold (THR) from the QP;

obtaining a look-up table from the QP to generate a filtered reconstructed block from the threshold and the look-up table.

2. The method of claim 1, further comprising:

scanning a current pixel of the reconstruction block and a neighboring pixel of the current pixel according to a predefined scanning template;

obtaining spectral components by performing a transformation on the current pixel and neighboring pixels of the current pixel;

obtaining filtered spectral components using the look-up table and the spectral components.

3. The method according to claim 2, characterized in that the filtered spectral components F(i, σ) are derived by the following equation:

F(i,σ)=LUT(R(i),σ), if R(i)≥0

F(i,σ)=-LUT(-R(i),σ), if R(i)<0

where LUT(R(i),σ)=R(i)^3/(R(i)^2+m×σ^2) for R(i)≤THR,

wherein (i) is an index of spectral components, R(i) is the spectral component corresponding to the index (i), σ is a filtering parameter, LUT(R(i), σ) is an element in the lookup table, and THR is the threshold derived from the QP.

4. The method of claim 3, wherein m is a normalization constant equal to the number of spectral components.

5. The method of claim 3 or 4, wherein the filter parameters are derived from the QP.

6. The method of claim 5, wherein the filter parameter σ is derived by the following equation:

σ=k×2^(n×(QP-s))

wherein QP is the coding and decoding quantization parameter, and k, n and s are constants.

7. The method of claim 6, wherein k is 2.64, n is 0.1296, and s is 11.
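Claims 6 and 7 fully determine the filter parameter once the QP is known. As an illustrative sketch (not part of the claims, using the example constants from claim 7):

```python
def sigma_from_qp(qp, k=2.64, n=0.1296, s=11):
    """Derive the filtering parameter sigma = k * 2^(n * (QP - s))."""
    return k * 2 ** (n * (qp - s))

# For QP = s = 11 the exponent vanishes, so sigma equals k.
print(sigma_from_qp(11))  # 2.64
```

The parameter grows exponentially with QP, so coarser quantization yields stronger filtering.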

8. The method according to any one of claims 1 to 7, wherein the threshold value is derived by the following equation:

THR^2/(THR^2+m×σ^2)=C

where C is a value close to 1, σ is a filter parameter, and m is a normalization constant equal to the number of spectral components.
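A hedged sketch of the threshold derivation: solving the implicit relation THR²/(THR²+m·σ²)=C for THR gives THR=sqrt(C·m·σ²/(1−C)). The values of C and m below are illustrative, not mandated by the claims.

```python
import math

def threshold(sigma, m=4, c=0.8):
    """Solve THR^2 / (THR^2 + m*sigma^2) = C for THR.
    Rearranging: THR^2 * (1 - C) = C * m * sigma^2."""
    return math.sqrt(c * m * sigma * sigma / (1.0 - c))

thr = threshold(17.4)
# The derived THR satisfies the implicit equation it was solved from.
print(abs(thr * thr / (thr * thr + 4 * 17.4 ** 2) - 0.8) < 1e-9)  # True
```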

9. The method according to any one of claims 1 to 8, further comprising:

selecting a QP set to obtain the lookup table (LUT), wherein the QP set comprises a first QP corresponding to an index (i) and a second QP corresponding to an index (i+1), and an interval between the first QP and the second QP is greater than 1.

10. The method of claim 9, wherein the interval is a constant equal to 8, 10, or 16.

11. The method according to any one of claims 1 to 10, further comprising:

deleting N bits from the LUT table value, N being an integer.

12. The method of claim 11, wherein LUT elements are accessed as follows:

tbl[(z+HTDF_TBL_RND)>>HTDF_TBL_SH]

wherein HTDF_TBL_SH is an alias of N, and HTDF_TBL_RND is 1<<(HTDF_TBL_SH-1).
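A minimal sketch of the rounded table access of claim 12; the table contents here are placeholders, not a claimed LUT:

```python
HTDF_TBL_SH = 3                        # alias of N: drop 3 LSBs
HTDF_TBL_RND = 1 << (HTDF_TBL_SH - 1)  # rounding offset = 4

tbl = list(range(0, 128, 8))           # placeholder table values

def lut_access(z):
    """Access the table with rounding applied before the right shift."""
    return tbl[(z + HTDF_TBL_RND) >> HTDF_TBL_SH]

print(lut_access(10))  # (10 + 4) >> 3 == 1, so tbl[1] == 8
```

Adding the rounding offset before shifting makes the index round to nearest rather than truncate toward zero.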

13. The method of claim 11, wherein LUT elements are accessed as follows:

LUT[(fHad[i]+(1<<(tblShift-1)))>>tblShift]

for positive values of fHad[i], and

-LUT[(-fHad[i]+(1<<(tblShift-1)))>>tblShift]

for negative values of fHad[i],

wherein tblShift is the alias of N, and fHad[i] is the spectral component to be filtered.
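The sign handling in claim 13 can be sketched as follows; the LUT contents and tblShift value are illustrative:

```python
tblShift = 3                          # alias of N
LUT = [0, 0, 2, 6, 10, 14, 19, 23]    # illustrative LUT row

def filter_component(fHad_i):
    """Apply the LUT to a signed spectral component: the magnitude
    indexes the table after rounding and shifting; the sign is restored."""
    rnd = 1 << (tblShift - 1)
    if fHad_i >= 0:
        return LUT[(fHad_i + rnd) >> tblShift]
    return -LUT[((-fHad_i) + rnd) >> tblShift]

print(filter_component(20), filter_component(-20))  # 6 -6
```

Only non-negative magnitudes index the table, so one LUT covers both signs with odd symmetry.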

14. The method of any one of claims 11 to 13, wherein N is 3.

15. The method according to any of the claims 11 to 13, characterized in that for the first QP in a set of QPs, N is equal to 2, the set of QPs being used to obtain the look-up table (LUT);

for the last QP or the last two QPs in the set of QPs, N is equal to 4;

for the remaining QPs in the set of QPs, N equals 3.

16. The method according to any one of claims 11 to 13, wherein N is defined as follows:

tblShift=tblThrLog2[qpIdx]-4

tblThrLog2[5]={6,7,7,8,8}

wherein tblShift is an alias of N, and qpIdx is derived from the QP.
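Claim 16's table can be checked against the per-QP rule of claim 15, e.g. with this sketch:

```python
tblThrLog2 = [6, 7, 7, 8, 8]

def shift_for(qp_idx):
    """N (tblShift) for a QP index, per claim 16."""
    return tblThrLog2[qp_idx] - 4

# First QP -> N = 2, last two QPs -> N = 4, remaining -> N = 3,
# matching the rule stated in claim 15.
print([shift_for(i) for i in range(5)])  # [2, 3, 3, 4, 4]
```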

17. The method of claim 16, wherein qpIdx is derived from QP as follows:

if(pred_mode_flag[xCb][yCb]==0&&nCbW==nCbH&&min(nCbW,nCbH)>=32)

qpIdx=Clip3(0,4,(QpY-28+(1<<2))>>3)

else

qpIdx=Clip3(0,4,(QpY-20+(1<<2))>>3)

wherein QpY is the QP of the current block, (xCb, yCb) specifies the position of the current block on the picture, pred_mode_flag[xCb][yCb] specifies the prediction mode of the current block (if it is equal to 0, the prediction is inter prediction; otherwise, the prediction is intra prediction), and nCbW and nCbH are the width and the height of the current block, respectively.
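The qpIdx derivation of claim 17 can be sketched as follows; the block parameters in the example call are illustrative:

```python
def clip3(lo, hi, v):
    """Clip3 as commonly defined in video coding specifications."""
    return max(lo, min(hi, v))

def qp_idx(qp_y, pred_mode_flag, ncbw, ncbh):
    """Derive qpIdx: large square inter blocks use a shifted QP origin."""
    if pred_mode_flag == 0 and ncbw == ncbh and min(ncbw, ncbh) >= 32:
        return clip3(0, 4, (qp_y - 28 + (1 << 2)) >> 3)
    return clip3(0, 4, (qp_y - 20 + (1 << 2)) >> 3)

# At QP 32, an intra block maps to index 2, a large square inter block to 1.
print(qp_idx(32, 1, 16, 16), qp_idx(32, 0, 32, 32))  # 2 1
```

The (1<<2) term is a rounding offset for the >>3 division, and the clip keeps the index inside the five-entry QP set.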

18. The method of any of claims 11 to 13, wherein the value of N is determined from the QP value.

19. The method of any of claims 1 to 18, wherein the LUT is generated from an auxiliary function for at least some quantization parameters (QPs).

20. The method of claim 19, wherein the LUT, LUT(R(i), σ), is generated by the following equation:

wherein AuxiliaryFunc_σ(i) represents the auxiliary function, whose value is equal to THR at the value of i corresponding to the argument of the last LUT element + 1.

21. The method according to claim 19 or 20, wherein the auxiliary function is a straight-line equation crossing the points (i, THR) and (a, 0), where a > 0 and the value of a is determined from the filter parameter σ or the QP value.

22. The method of claim 21, wherein the LUT is defined as follows:

setOfLUT[5][16]=

{0,0,2,6,10,14,19,23,28,32,36,41,45,49,53,57,},
{0,0,5,12,20,29,38,47,56,65,73,82,90,98,107,115,},
{0,0,1,4,9,16,24,32,41,50,59,68,77,86,94,103,},
{0,0,3,9,19,32,47,64,81,99,117,135,154,179,205,230,},
{0,0,0,2,6,11,18,27,38,51,64,96,128,160,192,224,}.

23. The method of claim 21, wherein the LUT is defined as follows:

setOfLUT[5][16]=

{0,0,2,6,10,14,19,23,28,32,36,41,45,49,53,57,},
{0,0,5,12,20,29,38,47,56,65,73,82,90,98,107,115,},
{0,0,1,4,9,16,24,32,41,50,59,68,77,86,94,103,},
{0,0,0,1,3,5,9,14,19,25,32,40,55,73,91,110,},
{0,0,0,0,0,1,2,4,6,8,11,14,26,51,77,102,}.

24. The method of claim 21, wherein the LUT is defined as follows:

setOfLUT[5][16]=

{0,2,10,19,28,36,45,53,61,70,78,86,94,102,110,118,},
{0,0,5,12,20,29,38,47,56,65,73,82,90,98,107,115,},
{0,0,1,4,9,16,24,32,41,50,59,68,77,86,94,103,},
{0,0,0,1,3,5,9,14,19,25,32,40,55,73,91,110,},
{0,0,0,0,0,1,2,4,6,8,11,14,26,51,77,102,}.
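Combining one of the LUT definitions above with the per-QP shift of claim 16, an end-to-end lookup might look like the sketch below. The table of claim 24 is used purely as data; the pass-through behavior above the table range is an assumption about components beyond the threshold, not claim text.

```python
# LUT set from claim 24; row selected by qpIdx, column by the shifted component.
setOfLUT = [
    [0, 2, 10, 19, 28, 36, 45, 53, 61, 70, 78, 86, 94, 102, 110, 118],
    [0, 0, 5, 12, 20, 29, 38, 47, 56, 65, 73, 82, 90, 98, 107, 115],
    [0, 0, 1, 4, 9, 16, 24, 32, 41, 50, 59, 68, 77, 86, 94, 103],
    [0, 0, 0, 1, 3, 5, 9, 14, 19, 25, 32, 40, 55, 73, 91, 110],
    [0, 0, 0, 0, 0, 1, 2, 4, 6, 8, 11, 14, 26, 51, 77, 102],
]
tblThrLog2 = [6, 7, 7, 8, 8]

def lookup(fhad, qp_idx):
    """Map a non-negative spectral component to its filtered value."""
    shift = tblThrLog2[qp_idx] - 4
    idx = (fhad + (1 << (shift - 1))) >> shift
    # Assumption: components beyond the table range pass through unfiltered.
    return setOfLUT[qp_idx][idx] if idx < 16 else fhad

print(lookup(20, 0))  # shift=2, (20+2)>>2 == 5, row 0 entry 5 == 36
```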

25. The method according to any one of claims 1 to 24, wherein the transform is a Hadamard transform.

26. The method according to any one of claims 1 to 25, wherein the transform is a 1D transform.

27. A method of processing a prediction block, wherein the prediction block comprises a plurality of pixels, the method comprising:

obtaining a Quantization Parameter (QP) for the prediction block;

obtaining a Threshold (THR) from the QP;

obtaining a look-up table from the QP to generate a filtered prediction block from the threshold and the look-up table.

28. The method of claim 27, further comprising:

scanning a current pixel of the prediction block and neighboring pixels of the current pixel according to a predefined scanning template;

obtaining spectral components by performing a transformation on the current pixel and neighboring pixels of the current pixel;

obtaining filtered spectral components using the look-up table and the spectral components.

29. The method of claim 28, wherein the filtered spectral components F(i, σ) are derived by the following equation:

F(i,σ)=LUT(R(i),σ), if R(i)≥0

F(i,σ)=-LUT(-R(i),σ), if R(i)<0

where LUT(R(i),σ)=R(i)^3/(R(i)^2+m×σ^2) for R(i)≤THR,

wherein (i) is an index of spectral components, R(i) is the spectral component corresponding to the index (i), σ is a filtering parameter, LUT(R(i), σ) is an element in the lookup table, and THR is the threshold derived from the QP.

30. The method of claim 29, wherein m is a normalization constant equal to the number of spectral components.

31. The method according to claim 29 or 30, wherein the filter parameters are derived from the QP.

32. The method of claim 29, wherein the filter parameter σ is derived by the following equation:

σ=k×2^(n×(QP-s))

wherein QP is the coding and decoding quantization parameter, and k, n and s are constants.

33. The method according to any one of claims 27 to 32, wherein the threshold value is derived by the following equation:

THR^2/(THR^2+m×σ^2)=C

where C is a value close to 1, σ is a filter parameter, and m is a normalization constant equal to the number of spectral components.

34. The method of any one of claims 27 to 33, further comprising:

selecting a QP set to obtain the lookup table (LUT), wherein the QP set comprises a first QP corresponding to an index (i) and a second QP corresponding to an index (i+1), and an interval between the first QP and the second QP is greater than 1.

35. The method of any one of claims 27 to 34, further comprising:

deleting N bits from the LUT table value, N being an integer.

36. The method of claim 35, wherein LUT elements are accessed as follows:

tbl[(z+HTDF_TBL_RND)>>HTDF_TBL_SH]

wherein HTDF_TBL_SH is an alias of N, and HTDF_TBL_RND is 1<<(HTDF_TBL_SH-1).

37. The method of claim 35, wherein LUT elements are accessed as follows:

LUT[(fHad[i]+(1<<(tblShift-1)))>>tblShift]

for positive values of fHad[i], and

-LUT[(-fHad[i]+(1<<(tblShift-1)))>>tblShift]

for negative values of fHad[i],

wherein tblShift is the alias of N, and fHad[i] is the spectral component to be filtered.

38. The method of any one of claims 35 to 37, wherein N is defined as follows:

tblShift=tblThrLog2[qpIdx]-4

tblThrLog2[5]={6,7,7,8,8}

wherein tblShift is an alias of N, and qpIdx is derived from QP.

39. The method of claim 38, wherein qpIdx is derived from QP as follows:

if(pred_mode_flag[xCb][yCb]==0&&nCbW==nCbH&&min(nCbW,nCbH)>=32)

qpIdx=Clip3(0,4,(QpY-28+(1<<2))>>3)

else

qpIdx=Clip3(0,4,(QpY-20+(1<<2))>>3)

wherein QpY is the QP of the current block, (xCb, yCb) specifies the position of the current block on the picture, pred_mode_flag[xCb][yCb] specifies the prediction mode of the current block (if it is equal to 0, the prediction is inter prediction; otherwise, the prediction is intra prediction), and nCbW and nCbH are the width and the height of the current block, respectively.

40. The method of any one of claims 35 to 39, wherein the value of N is determined from the QP value.

41. The method of any of claims 35 to 40, wherein the LUT is generated from an auxiliary function for at least some quantization parameters (QPs).

42. The method of claim 41, wherein the LUT, LUT(R(i), σ), is generated by the following equation:

wherein AuxiliaryFunc_σ(i) represents the auxiliary function, whose value is equal to THR at the value of i corresponding to the argument of the last LUT element + 1.

43. The method according to claim 41 or 42, wherein the auxiliary function is a straight-line equation crossing the points (i, THR) and (a, 0), wherein a > 0 and the value of a is determined from the filter parameter σ or the QP value.

44. The method of claim 43, wherein the LUT is defined as follows:

setOfLUT[5][16]=

{0,0,2,6,10,14,19,23,28,32,36,41,45,49,53,57,},
{0,0,5,12,20,29,38,47,56,65,73,82,90,98,107,115,},
{0,0,1,4,9,16,24,32,41,50,59,68,77,86,94,103,},
{0,0,3,9,19,32,47,64,81,99,117,135,154,179,205,230,},
{0,0,0,2,6,11,18,27,38,51,64,96,128,160,192,224,}.

45. The method of claim 43, wherein the LUT is defined as follows:

setOfLUT[5][16]=

{0,0,2,6,10,14,19,23,28,32,36,41,45,49,53,57,},
{0,0,5,12,20,29,38,47,56,65,73,82,90,98,107,115,},
{0,0,1,4,9,16,24,32,41,50,59,68,77,86,94,103,},
{0,0,0,1,3,5,9,14,19,25,32,40,55,73,91,110,},
{0,0,0,0,0,1,2,4,6,8,11,14,26,51,77,102,}.

46. The method of claim 43, wherein the LUT is defined as follows:

setOfLUT[5][16]=

{0,2,10,19,28,36,45,53,61,70,78,86,94,102,110,118,},
{0,0,5,12,20,29,38,47,56,65,73,82,90,98,107,115,},
{0,0,1,4,9,16,24,32,41,50,59,68,77,86,94,103,},
{0,0,0,1,3,5,9,14,19,25,32,40,55,73,91,110,},
{0,0,0,0,0,1,2,4,6,8,11,14,26,51,77,102,}.

47. A decoder, characterized in that the decoder comprises processing circuitry for performing the method according to any of claims 1 to 46.

48. An encoder, characterized in that the encoder comprises processing circuitry for performing the method according to any one of claims 1 to 46.

49. A decoder, characterized in that the decoder comprises a filter for performing the method according to any of claims 1 to 46.

50. An encoder, characterized in that the encoder comprises a filter for performing the method according to any of claims 1 to 46.

51. A decoder, comprising:

a memory comprising instructions;

one or more processors in communication with the memory, wherein the one or more processors execute the instructions to perform the method of any of claims 1-46.

52. An encoder, comprising:

a memory comprising instructions;

one or more processors in communication with the memory, wherein the one or more processors execute the instructions to perform the method of any of claims 1-46.

53. A computer program product, characterized in that it comprises program code for performing the method according to any one of claims 1 to 46 when the program code is executed in a computer or processor.

Technical Field

The present invention relates generally to the field of video coding. More particularly, the present invention relates to a filter for video coding, a method for filtering reconstructed video frames, a method for filtering video blocks, and an encoding device and a decoding device comprising such a filter for video coding.

Background

Digital video has gained widespread use since the advent of DVD discs. Digital video is encoded prior to transmission and then transmitted over a transmission medium. The viewer receives the digital video, decodes and displays the digital video using a viewing device. Video quality has improved over the years due to improvements in resolution, color depth, frame rate, and the like. This results in larger data streams that are currently typically transmitted over the internet and mobile communication networks.

However, high resolution video typically requires more bandwidth because the video contains more information. To reduce bandwidth requirements, video coding standards involving video compression have been introduced. When encoding video, the bandwidth requirement (or the corresponding memory requirement during storage) is reduced. Reducing bandwidth requirements typically results in reduced quality. Video coding standards therefore attempt to balance bandwidth requirements with quality.

Due to the need to continually improve quality and reduce bandwidth requirements, various technical solutions have been studied that maintain quality while reducing bandwidth requirements or improve quality while maintaining bandwidth requirements. In addition, a compromise solution is sometimes acceptable. For example, if the quality improvement is significant, it may be acceptable to increase the bandwidth requirement.

High Efficiency Video Coding (HEVC) is one example of a video coding standard well known to those skilled in the art. In HEVC, a coding unit (CU) is divided into prediction units (PUs) or transform units (TUs). The Versatile Video Coding (VVC) next-generation standard is the latest joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization bodies. These two standardization organizations collaborate in a partnership known as the Joint Video Exploration Team (JVET). VVC is also known as the ITU-T H.266/Next Generation Video Coding (NGVC) standard. VVC removes the multiple-partition-type concept, i.e., it does not distinguish the CU, PU, and TU concepts (unless the CU is too large, exceeding the maximum transform length, in which case these concepts are distinguished as needed), and supports more flexible CU partition shapes.

Image filtering is often used to emphasize certain features of an image or to improve the objective or perceived quality of the filtered image. Image filtering must deal with various noise sources. Accordingly, various methods of improving quality have been provided and are currently in use. For example, in the adaptive loop filter (ALF) method, each reconstructed frame is divided into a set of small blocks (super-pixels), and the adaptive loop filter filters each block; that is, each pixel of the filtered reconstructed frame is a weighted sum of several pixels of the reconstructed frame in a connected region around the position where the filtered pixel is generated. The weighting coefficients (also called filter coefficients) have a central symmetry property and are transmitted from the encoder to the decoder. Edges are often large, so the number of transmitted weighting coefficients can become large, resulting in inefficient processing. The large number of weighting coefficients requires complex rate-distortion optimization (RDO) at the encoding end to reduce the number of weighting coefficients to be transmitted. At the decoding end, ALF needs to implement general multiplications, which need to be reloaded for each 2×2 pixel block.

Therefore, there is a need for an improved filter and method to improve the prediction quality at low complexity, thereby improving video coding efficiency.

Disclosure of Invention

It is an object of the present invention to provide an improved filter and method to increase the filtering efficiency with limited complexity, thereby increasing the video coding and decoding efficiency.

The above and other objects are achieved by the subject matter claimed in the independent claims. Further implementations are apparent in the dependent claims, the description and the drawings.

According to a first aspect, the invention relates to a filter for video coding. The filter is for processing a block to generate a filtered block, the block comprising a plurality of pixels. The filter includes a memory containing instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to: load a current pixel and neighboring pixels of the current pixel into a linear buffer according to a predefined scan template; obtain spectral components by performing a 1D transform on the pixels in the linear buffer; obtain filtered spectral components by multiplying each spectral component by a gain factor, wherein the gain factor is determined from the corresponding spectral component and a filtering parameter; obtain filtered pixels by performing a 1D inverse transform on the filtered spectral components; and generate the filtered block from the filtered pixels.
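As a hedged illustration of the processing chain just described (1D transform, per-component gain, 1D inverse transform), here is a sketch using a 4-point Hadamard transform on a group of four pixels; the gain formula G(i,σ)=R(i)²/(R(i)²+m·σ²) and the constants are illustrative, not the claimed implementation:

```python
# 4-point Hadamard transform matrix (its own inverse up to a factor of 4).
H = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]

def filter_group(pixels, sigma, m=4):
    """Forward 1D transform, multiply each AC component by the gain
    G(i, sigma) = R(i)^2 / (R(i)^2 + m*sigma^2), inverse transform."""
    spec = [sum(H[i][j] * pixels[j] for j in range(4)) for i in range(4)]
    for i in range(1, 4):  # skip the DC component (its gain is treated as 1)
        r = spec[i]
        spec[i] = r * r * r / (r * r + m * sigma * sigma) if r else 0
    return [sum(H[i][j] * spec[j] for j in range(4)) / 4 for i in range(4)]

# With sigma = 0 every gain is 1 and the input is returned unchanged.
print(filter_group([10, 12, 11, 13], 0))  # [10.0, 12.0, 11.0, 13.0]
```

Small spectral components (likely noise) are attenuated toward zero, while components much larger than σ pass almost unchanged; leaving DC untouched preserves the group's mean value.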

For example, the block (or frame) may be a prediction block and the filtered block is a filtered prediction block. As another example, the block (or frame) may be a reconstructed block and the filtered block is a filtered reconstructed block.

For example, the gain factor is a function of the corresponding spectral component and the filter parameter. The filter parameters may be derived from a Quantization Parameter (QP) of the codec.

As another example, when the gain coefficient G (i, σ) of the first spectral component is equal to 1, the first spectral component is skipped without filtering. The first spectral component corresponds to a sum or average of pixels in the linear buffer, which may correspond to DC.

For another example, the one or more processors execute the instructions to remove N bits from a table value of the LUT, where N is an integer. N may be determined from the QP value or may be a fixed value.

For example, the predefined scan template is defined as a set of spatial or raster offsets relative to the position of the current pixel in the reconstructed block. The offsets pointing to neighboring pixels are within the reconstructed block. At least one filtered pixel may be placed at its original position according to the predefined scan template. All filtered pixels may be added to an accumulation buffer according to the predefined scan template, and the accumulation buffer may be initialized to zero before the filtered spectral components are obtained. A final filtered pixel is obtained by dividing the accumulated value in the accumulation buffer by the number of pixels added to the current position of the accumulation buffer; the one or more processors are configured to generate the filtered reconstructed block from the final filtered pixels.
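A minimal sketch of the accumulation just described, for a 1D signal and overlapping groups of two pixels; the identity "filter" is a placeholder standing in for the transform-domain filtering:

```python
def accumulate(pixels, filt=lambda group: group):
    """Filter overlapping 2-pixel groups, accumulate the results at
    their original positions, and divide by the per-position counts."""
    acc = [0.0] * len(pixels)       # accumulation buffer, initialized to 0
    cnt = [0] * len(pixels)         # how many filtered values landed here
    for start in range(len(pixels) - 1):
        out = filt(pixels[start:start + 2])
        for k, v in enumerate(out):
            acc[start + k] += v
            cnt[start + k] += 1
    return [a / c for a, c in zip(acc, cnt)]

# With an identity filter, the averaged output equals the input.
print(accumulate([1, 2, 3, 4]))  # [1.0, 2.0, 3.0, 4.0]
```

Because groups overlap, each position receives several filtered estimates; dividing by the count averages them into the final filtered pixel.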

Optionally, the differences between all filtered pixels and the corresponding unfiltered pixels are added to an accumulation buffer according to the predefined scan template, wherein the accumulation buffer is initialized with the unfiltered pixels multiplied by the maximum number of pixel values to be added to the block. The final filtered pixel is obtained by dividing the accumulated value in the accumulation buffer by this maximum number of pixel values.

According to a second aspect, the invention relates to a corresponding filtering method for processing a block to generate a filtered block. The block includes a plurality of pixels. Each pixel is associated with a pixel value. The filtering method comprises the following steps: loading a current pixel and neighboring pixels of the current pixel into a linear buffer according to a predefined scan template; obtaining spectral components by performing a 1D transform on pixels in the linear buffer; obtaining filtered spectral components by multiplying each spectral component by a gain factor, wherein the gain factor is determined according to the corresponding spectral component and a filtering parameter; obtaining filtered pixels by performing a 1D inverse transform on the filtered spectral components; generating the filtered block from the filtered pixels.

For example, the block (or frame) may be a prediction block and the filtered block is a filtered prediction block. As another example, the block (or frame) may be a reconstructed block and the filtered block is a filtered reconstructed block.

For example, the gain factor is a function of the corresponding spectral component and the filter parameter. The filter parameters may be derived from a Quantization Parameter (QP) of the codec.

As another example, when the gain coefficient G (i, σ) of the first spectral component is equal to 1, the first spectral component is skipped without filtering. The first spectral component may correspond to a DC value.

As another example, the spectral components are filtered according to a look-up table (LUT). The LUT may be generated from an auxiliary function for at least some quantization parameters (QPs). The auxiliary function may be a straight-line equation crossing the points (i, THR) and (a, 0), where a > 0 and a is determined from the filter parameter σ or the QP value. For example, for the last QP in the set of QPs, a equals 11; for the second-to-last QP in the QP set, a equals 9.
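A hedged sketch of the gap-elimination idea described here and illustrated by fig. 13: each LUT entry takes the maximum of the nominal filter value and an auxiliary straight line that reaches THR at the end of the table and zero at argument a. The line geometry, constants, and nominal filter formula below are assumptions for illustration only.

```python
def aux_line(x, i_thr, a, thr):
    """Straight line through (i_thr, THR) and (a, 0)."""
    return thr * (x - a) / (i_thr - a)

def build_lut(sigma, thr, i_thr, a, m=4):
    """LUT entry = max(nominal filter value, auxiliary line), which
    removes the gap in the transfer function near the threshold."""
    lut = []
    for x in range(i_thr + 1):
        nominal = x ** 3 / (x ** 2 + m * sigma ** 2) if x else 0.0
        lut.append(max(nominal, aux_line(x, i_thr, a, thr)))
    return lut

lut = build_lut(sigma=5.0, thr=16.0, i_thr=16, a=9)
# The last entry reaches THR exactly, so the transfer function has no jump.
print(lut[-1])  # 16.0
```

For small arguments the line is negative and the nominal value wins; near the threshold the line wins and lifts the curve up to THR, giving a continuous transfer function.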

For another example, the method further comprises: deleting N bits from the LUT table values, where N is an integer. N may be determined from the QP value or may be a fixed value. In one option, within the QP set, the smaller the QP, the smaller N: for example, for the first QP in the set of QPs, N equals 2, while for the remaining QPs in the QP set, N equals 3. In another option, the larger the QP, the larger N: for example, for the last QP or the last two QPs in the set of QPs, N equals 4, while for the remaining QPs in the QP set, N equals 3. In a further option, both rules apply: for the first QP in the set of QPs, N equals 2; for the last QP or the last two QPs in the set of QPs, N equals 4; and for the remaining QPs in the QP set, N equals 3.

For example, the predefined scan template is defined as a set of spatial or raster offsets relative to the position of the current pixel in the reconstructed block. The offsets pointing to neighboring pixels are within the reconstructed block. At least one filtered pixel may be placed at its original position according to the predefined scan template. All filtered pixels may be added to an accumulation buffer according to the predefined scan template, and the accumulation buffer may be initialized to zero before the filtered spectral components are obtained. A final filtered pixel is obtained by dividing the accumulated value in the accumulation buffer by the number of pixels added to the current position of the accumulation buffer; the one or more processors are configured to generate the filtered reconstructed block from the final filtered pixels.

Optionally, the differences between all filtered pixels and the corresponding unfiltered pixels are added to an accumulation buffer according to the predefined scan template, wherein the accumulation buffer is initialized with the unfiltered pixels multiplied by the maximum number of pixel values to be added to the block. The final filtered pixel is obtained by dividing the accumulated value in the accumulation buffer by this maximum number of pixel values.

According to a third aspect, the invention relates to an encoding device for encoding a current frame in an input video stream. The encoding apparatus comprises a filter according to the first aspect of the invention.

According to a fourth aspect, the present invention relates to a decoding apparatus for decoding a current frame from a received code stream. The decoding apparatus comprises a filter according to the first aspect of the invention.

According to a fifth aspect, the invention relates to a computer program product. The computer program product comprises program code adapted to perform the method provided by the second aspect when executed on a computer.

Therefore, the filter helps to improve video coding efficiency. More specifically, the improved filter according to embodiments of the present invention can estimate the filter parameters from the frame itself, so the filter parameters do not need to be signalled, whereas a conventional filter signals weighting coefficients for filtering in the image domain; the improved filter therefore requires less signalling than the conventional filter.

Drawings

Other embodiments of the invention will be described with reference to the following drawings, in which:

fig. 1A is a schematic diagram of an encoding apparatus provided in this embodiment, where the encoding apparatus includes a filter provided in this embodiment;

fig. 1B is a schematic diagram of an encoding apparatus provided in this embodiment, where the encoding apparatus includes a filter provided in this embodiment;

fig. 2A is a schematic diagram of a decoding apparatus provided in this embodiment, where the decoding apparatus includes a filter provided in this embodiment;

fig. 2B is a schematic diagram of a decoding apparatus provided in this embodiment, where the decoding apparatus includes a filter provided in this embodiment;

FIG. 3A is a schematic diagram of aspects of a filtering process implemented in the filter provided in the present embodiment;

FIG. 3B is a schematic diagram of aspects of a filtering process implemented in the filter provided in the present embodiment;

FIG. 3C is a schematic diagram of aspects of a filtering process implemented in a filter according to another embodiment;

FIG. 4A shows templates for different pixel locations within a square reconstruction block;

FIG. 4B shows an equivalent filter shape for one pixel;

FIG. 4C provides an example of padding;

FIG. 4D provides another example of padding;

fig. 5A is a schematic diagram of the filtering method steps provided in this embodiment;

fig. 5B is a schematic diagram of the filtering method steps provided in this embodiment;

FIG. 6 is an exemplary hardware filter design based on SRAM;

FIG. 7 is an exemplary hardware design for filtering a 2×2 pixel group based on flip-flops;

FIG. 8 shows one example of combining the results of filtering four 2×2 groups and reusing the results of the same spatial group for final filtering;

fig. 9 illustrates the result of LUT optimization;

FIG. 10 illustrates an unwanted gap in the filter transfer function;

FIG. 11 illustrates the entries of the same table plotted one by one;

FIG. 12 illustrates a method of how to eliminate gaps using an auxiliary function;

fig. 13 shows an example of eliminating a gap by taking the maximum value from two values when generating the LUT;

fig. 14 shows by way of example the filter transfer function after applying the method described above;

FIG. 15 shows an example of a filter transfer function determined from five QPs in a set;

fig. 16 shows an example of filter transfer functions for five QPs in a set according to the respective tables;

fig. 17 shows an example of filter transfer functions for five QPs in a set according to the respective tables;

fig. 18 is a schematic diagram of an exemplary structure of the apparatus provided in the present embodiment.

In the various figures, the same reference numerals are used to indicate identical or functionally equivalent features.

Detailed Description

Reference is now made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific aspects in which the invention may be practiced. It is to be understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, as the scope of the present invention is defined by the appended claims.

For example, it is to be understood that the disclosure relating to describing a method may equally apply to a corresponding device or system for performing the method, and vice versa. For example, if a specific method step is described, the corresponding apparatus may comprise means for performing the described method step, even if such means are not explicitly described or illustrated in the figures. Furthermore, it is to be understood that features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.

Fig. 1A shows an encoding apparatus 100 provided in the present embodiment. The encoding apparatus 100 includes the filter 120 provided in the present embodiment. The encoding apparatus 100 is configured to encode a block in a frame of a video signal, wherein the video signal comprises a plurality of frames (also referred to herein as pictures), each frame being divisible into a plurality of blocks, each block comprising a plurality of pixels. In one embodiment, the blocks may be macroblocks, coding tree units, coding units, prediction units, and/or prediction blocks.

The term "block" is used herein to refer to any type of block or any depth of block, for example, the term "block" includes, but is not limited to, root blocks, sub-blocks, leaf nodes, and the like. The sizes of the blocks to be encoded are not necessarily the same. An image may comprise a plurality of blocks of different sizes, and the block grid of different images in a video sequence may also be different.

In the exemplary embodiment shown in fig. 1A, the encoding apparatus 100 is implemented as a hybrid video encoder. Typically, the first frame of a video signal is an intra-coded frame, which is encoded using intra prediction only. To this end, in the embodiment shown in fig. 1A, the encoding apparatus 100 further includes an intra prediction unit 154 for intra prediction. Decoding an intra-coded frame may not require information from other frames. The intra prediction unit 154 may perform intra prediction on a block according to information provided by the intra estimation unit 152.

Blocks in frames subsequent to the first intra-coded frame may be encoded by inter prediction or intra prediction selected by the mode selection unit 160. To this end, the encoding apparatus 100 shown in fig. 1A further includes an inter prediction unit 144. In general, inter prediction unit 144 may be used to perform motion compensation on blocks according to motion estimation provided by inter estimation unit 142.

In addition, in the hybrid encoder embodiment shown in fig. 1B, the filter 145 also filters a prediction signal obtained by intra prediction or inter prediction.

Furthermore, in the hybrid encoder embodiments shown in fig. 1A and 1B, the residual calculation unit 104 determines the difference between the original block and the prediction block of the original block. This difference is a residual block defining the prediction error of intra/inter prediction. The residual block is transformed (e.g., by DCT) by the transform unit 106, and the transform coefficients are quantized by the quantization unit 108. The entropy encoding unit 170 further encodes the output of the quantization unit 108 together with the coding information or side information provided by the intra prediction unit 154, the inter prediction unit 144, the filter 120, and the like.

A hybrid video encoder typically duplicates the decoder processing so that the encoder and the decoder generate the same prediction blocks. Thus, in the embodiment shown in fig. 1, the inverse quantization unit 110 and the inverse transform unit 112 perform the inverse operations of the transform unit 106 and the quantization unit 108 to obtain a decoded approximation of the residual block. The reconstruction unit 114 then adds the decoded residual block to the prediction block. The output of the reconstruction unit 114 may then be provided to a line buffer 116 for intra prediction and further processed by the filter 120, as will be described in detail below. The resulting image is stored in the decoded picture buffer 130 and can be used for inter prediction of subsequent frames.

Fig. 2A is a decoding apparatus 200 according to the present embodiment. The decoding apparatus 200 includes the filter 220 provided in the present embodiment. The decoding apparatus 200 is used for decoding blocks in a frame of an encoded video signal. In the embodiment shown in fig. 2A, the decoding apparatus 200 is implemented as a hybrid decoder. The entropy decoding unit 204 performs entropy decoding on the encoded image data. The encoded image data may typically include prediction errors (i.e., residual blocks), motion data, and other side information, which are required by the intra prediction unit 254 and the inter prediction unit 244, as well as by other components of the decoding apparatus 200 such as the filter 220. In general, the intra prediction unit 254 and the inter prediction unit 244 in the decoding apparatus 200 shown in fig. 2A are selected by the mode selection unit 260 and have the same functions as the intra prediction unit 154 and the inter prediction unit 144 in the encoding apparatus 100 shown in fig. 1, so that the encoding apparatus 100 and the decoding apparatus 200 can generate the same prediction blocks. The reconstruction unit 214 in the decoding apparatus 200 is configured to reconstruct the block in the frame from the residual block provided by the inverse quantization unit 210 and the inverse transform unit 212 and from the prediction block. As with the encoding apparatus 100, the reconstructed block may be provided to a line buffer 216 for intra prediction, and the filter 220 may provide the filtered block/frame to a decoded picture buffer 230 for inter prediction.

In the hybrid encoder embodiment shown in fig. 2B, the filter 264 also filters a prediction signal obtained by intra prediction or inter prediction.

As described above, the filters 120, 220 may filter the frames. For example, the filters 120, 220 may be used to process reconstructed frames in the decoded reconstructed video stream to generate filtered reconstructed frames, where the reconstructed frames include a plurality of blocks. The filters 120, 220 may also filter the blocks immediately after they are reconstructed without waiting for the entire frame. For example, the filters 120, 220 may be used to process a reconstructed block to generate a filtered reconstructed block, where the reconstructed block includes a plurality of pixels.

The filter 120, 220, 145, or 264 includes one or more processors (or one or more processing units). As described further below, the one or more processors (or one or more processing units) are configured to: load a current pixel and neighboring pixels of the current pixel into a linear buffer according to a predefined scan template (or scan order or scan pattern); obtain spectral components by performing a 1D transform on the pixels in the linear buffer; obtain filtered spectral components by multiplying each spectral component by a gain coefficient, wherein the gain coefficient is determined from the corresponding spectral component and a filtering parameter; obtain filtered pixels by performing a 1D inverse transform on the filtered spectral components; and generate a filtered reconstructed block from the filtered pixels estimated in the previous processing steps. As one example, the gain coefficient is determined based on one or more filtering parameters and one or more corresponding spectral components. As another example, the respective gain coefficient may be determined based on one or more filtering parameters, the corresponding spectral component, and the left and right neighboring spectral components of that spectral component.

In-loop filters for lossy video coding are described that perform local filtering and/or non-local filtering on reconstructed blocks in reconstructed frames. For example, the reconstructed frame is divided into a set of non-overlapping small rectangular blocks (CU blocks). In the next step, each reconstructed block (reconstructed CU block) is filtered in the frequency domain independently of the other reconstructed blocks. The filter can also be used after transformation and reconstruction, and the result of the filtering can be used for output, spatial prediction and temporal prediction.

The invention also describes a prediction filter for lossy video coding, which performs local filtering and/or non-local filtering on a prediction block in a reconstructed frame.

In a first processing step, all pixels within the reconstructed block are processed independently. Processing pixel r(0) requires the use of neighboring pixels. For example, as shown in fig. 3A, processing pixel r(0) uses pixels r(1) through r(7), and pixels r(0) through r(7) form one processing group.

Fig. 3A or 3B are schematic diagrams 300, 300' of aspects of a filtering process implemented in the filter provided in the present embodiment. In steps 302, 302', a current pixel and neighboring pixels of the current pixel in a block are loaded into a linear buffer according to a predefined scan template. For example, the block may be a prediction block. As another example, the block may be a reconstructed block.

In steps 304, 304', a 1D transform is performed on pixel r(0) and the neighboring pixels r(1) to r(7) of pixel r(0) in the linear buffer to obtain spectral components R:

R=1D_Transform(r)

for example, the 1D transform may be a Hadamard transform.
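As an illustration, the 4-point Hadamard butterfly used in the pseudo-code later in this document can be sketched as follows (function names are illustrative; the transform is unnormalized, so a forward pass followed by an inverse pass scales each sample by 4):

```c
#include <assert.h>

/* 4-point Hadamard butterfly, matching the add/subtract structure of
   the pseudo-code in this document. Unnormalized: inverse(forward(x))
   returns 4*x for each sample. */
static void hadamard4_fwd(const int x[4], int R[4]) {
    int y0 = x[0] + x[2], y1 = x[1] + x[3];
    int y2 = x[0] - x[2], y3 = x[1] - x[3];
    R[0] = y0 + y1; R[1] = y0 - y1;
    R[2] = y2 + y3; R[3] = y2 - y3;
}

static void hadamard4_inv(const int R[4], int x[4]) {
    /* The Hadamard transform is self-inverse up to scaling,
       so the same butterfly is reused. */
    hadamard4_fwd(R, x);
}
```

Only additions and subtractions are involved, which is why the hardware discussion later notes that no multipliers are needed.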

Whether a 1D transform is performed on 4 pixels in a row (e.g., pixels A, B, C, D in the example shown in fig. 3B) or a 2D transform is performed on the spatially adjacent pixels A, B, C, D depends on the particular implementation. Performing a 2D transform on the 4 pixels A, B, C, D located in a 2 × 2 block produces the same result as performing a 1D transform on the 4 pixels A, B, C, D arranged in a row. In steps 306, 306', filtering is performed in the frequency domain by multiplying each spectral component R(i) by a corresponding gain coefficient G(i, σ) to obtain a filtered spectral component F(i):

F(i)=R(i)×G(i,σ) (1)

the gain coefficients of all spectral components are integrated into the frequency impulse response of the filter.

For example, the gain coefficients are determined from the corresponding spectral components and the filter parameters. As another example, the gain coefficient is determined based on one or more filter parameters and one or more corresponding spectral components. As another example, the respective gain coefficient may be determined based on one or more filter parameters, the corresponding spectral component, and the left and right neighboring spectral components of that spectral component. If each gain coefficient is a function of the spectral components of the reconstructed block and the filter parameters, or a function of the spectral components of the predicted block and the filter parameters, the gain coefficient G(i, σ) can be described, for example, by the following formula:

G(i, σ) = R(i)² / (R(i)² + m × σ²)

where (i) is the index of the spectral components, R(i) is the spectral component corresponding to index (i), G(i, σ) is the gain coefficient corresponding to R(i), σ is the filter parameter, and m is a normalization constant equal to the number of spectral components. For example, m is 1, 2, 3, 4, etc. The gain coefficients for different spectral components may be the same or different.

For transforms where the first spectral component corresponds to the average (FFT, DCT, DST, etc.) or sum (Hadamard transform) of the input pixels in the transform block (typically, the first component corresponds to the DC value), it is preferable to set the corresponding filter coefficient equal to 1 to avoid changing the average luminance of the filtered block. That is, the first spectral component, corresponding to the DC value, is skipped (not filtered).
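A minimal sketch of such a gain computation, assuming the Wiener-like form G(i, σ) = R(i)²/(R(i)² + m·σ²); the function name and the use of floating point are illustrative only:

```c
#include <assert.h>

/* Per-component gain, assuming G = R^2 / (R^2 + m*sigma^2).
   The gain is 0 at R == 0, always below 1 for finite R, and
   approaches 1 as |R| grows. A real implementation would skip
   the DC component (gain 1), as noted above. */
static double htdf_gain(double R, double sigma, int m) {
    double R2 = R * R;
    return R2 / (R2 + m * sigma * sigma);
}
```

The gain vanishes for small components (noise suppression) and approaches 1 for large ones, which is the behavior exploited by the LUT implementation described later in this document.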

The filter parameter σ can be derived from the quantization parameter (QP) at both the encoding end and the decoding end, for example using the following formula:

σ = k × 2^(n × (QP − s))

where k, n, and s are constants, e.g., k = 2.64, n = 0.1296, and s = 11.

The filter parameters for different spectral components may be the same or different.

The parameters k, n, and s may be chosen such that σ depends on the quantization step size. In the latest video coding standards, the quantization step size doubles for every increase of the QP value by 6. In the example where k = 0.5 and s = 0, the parameter σ is derived as:

σ = 0.5 × 2^(QP/6)

Quantization scaling matrices are often used to improve video compression quality. In this method, the quantization step size derived from the QP is multiplied by a scaling factor transmitted in the bitstream. In such an approach, the parameter σ may be derived from the quantization step size actually used for scaling at a given QP:

σ = k × Quantization_step_size(QP − s)

the constants k, n, and s may be fixed values for calculating σ, or take different values according to QP, block size and block shape, prediction type of the current block (inter prediction/intra prediction). For example, for an intra-prediction block of size 32 × 32 or larger, the parameter s may be calculated as s 11+ 8-19. In an equivalent filter, the value of the parameter σ is small, resulting in weak filtering. Weak filtering is more suitable for large intra-predicted blocks, which typically correspond to flat areas. As another example, k may be modified according to the bit depth of the pixel, i.e., kmod=k×(1<<(bit_depth–8))。

According to the methods 300, 300', the gain coefficient for each frequency is derived from the spectral components of the reconstructed or predicted pixels. Thus, the methods 300, 300' do not require transmission of filtering parameters and can be applied to any reconstructed block or predicted block without additional signaling.

Details of the LUT are described below.

It should be noted that the filtering means that the spectral component R(i) is multiplied by a scaling factor that is always smaller than 1. It can also be observed that the scaling factor approaches 1 when the value of R(i) is large. Based on these observations, spectral filtering is implemented using a look-up table, which excludes multiplication and division from the filtering operation:

since the spectral gain coefficient is smaller than 1, the filtering can be implemented by looking up a short look-up table (LUT) according to the following formula:

F(i, σ) = LUT(R(i), σ), if R(i) ∈ [0, THR], and F(i, σ) = R(i) otherwise, where LUT(R(i), σ) = R(i)³ / (R(i)² + m × σ²),

where (i) is the index of the spectral components, R(i) is the spectral component corresponding to index (i), σ is the filter parameter, THR is a threshold, and m is a normalization constant equal to the number of spectral components. For example, m is 1, 2, 3, 4, etc.

For example, THR can be calculated from the following equation:

THR² / (THR² + m × σ²) = C,

where C is a value close to 1, for example 0.8 or 0.9. To reduce LUT size, the threshold THR may be made dependent on the QP value.

To further reduce the LUT size, a second threshold may be introduced to replace small filtered values with 0. In this case, the filtered spectral component F(i, σ) is further derived as:

F(i, σ) = 0, if R(i) < THR2; LUT(R(i), σ), if THR2 ≤ R(i) ≤ THR; and R(i), otherwise,

where THR2 defines a threshold below which the filtered spectral component is considered to be 0. The second threshold THR2 may also be defined as a function of the QP value.
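A hypothetical LUT builder combining these ideas, assuming entry values LUT(R, σ) = R³/(R² + m·σ²) with rounding and a second threshold below which entries are zeroed (all names and the rounding rule are assumptions, not taken from the source):

```c
#include <assert.h>

#define HTDF_THR 128 /* example threshold value from the text */

/* Builds lut[R] for spectral magnitudes 0..HTDF_THR-1; values at or
   above HTDF_THR would pass through unfiltered. Entries for inputs
   below thr2 are zeroed to allow a sparser table. */
static void build_lut(int lut[HTDF_THR], double sigma, int m, int thr2) {
    for (int R = 0; R < HTDF_THR; R++) {
        double f = (double)R * R * R / ((double)R * R + m * sigma * sigma);
        lut[R] = (R < thr2) ? 0 : (int)(f + 0.5); /* round to nearest */
    }
}
```

The table is monotone and never exceeds its input, reflecting the gain being below 1.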

After filtering in the frequency domain, in steps 308, 308', a 1D inverse transform is performed on the filtered spectral components F, resulting in filtered pixels f:

f=1D_Inverse_Transform(F)

in steps 310, 310', the result of the 1D inverse transform is placed into a linear buffer comprising filtered reconstructed pixels or filtered pixels.

In steps 312, 312' (not shown in fig. 3A or 3B), a filtered block is generated from the filtered pixels estimated in the previous processing steps. For example, the filtered block may be a filtered prediction block. As another example, the filtered block may be a filtered reconstructed block.

As shown in FIG. 3A, in one embodiment, after the filtering step 306, the filtered pixel f(0) is placed at the original location of pixel r(0) according to the predefined scan template. The other filtered pixels f(1) to f(7) are not used. In another embodiment, a plurality of filtered pixels (e.g., all filtered pixels in the linear buffer of filtered pixels) are added to an accumulation buffer according to the predefined scan template used in step 302 of FIG. 3A. The accumulation buffer must be initialized to 0 before the filtering step. In the final normalization step, the accumulated value at each position is divided by the number of pixel values added to that position of the accumulation buffer in the previous processing steps to obtain the final filtered pixel. A filtered reconstructed or predicted block is then generated from the final filtered pixels.

In another embodiment, the filter has the same implementation for both intra and inter Coding Unit (CU) filtering.

If the slice quantization parameter is larger than 17, the Hadamard transform domain filter is always applied to luma reconstructed blocks with non-zero transform coefficients, excluding 4 × 4 blocks. The filter parameters are explicitly derived from the coded information. If enabled, the filtering is performed on decoded pixels right after block reconstruction. The filtered result is used both for output and for spatial and temporal prediction.

The filtering process is described below, as shown in FIG. 3C.

The filtering process for each pixel of the reconstructed block comprises the following steps:

scanning 4 neighboring pixels around the currently processed pixel (including the current pixel) according to the scan pattern;

performing a 4-point Hadamard transform on the read pixels;

performing spectral filtering according to formula (1) and formula (2);

skipping a first spectral component corresponding to the DC value without filtering;

performing a 4-point inverse Hadamard transform on the filtered spectrum.

After the filtering step, the filtered pixel is placed in its original position in the accumulation buffer.

After the pixel filtering is completed, the accumulated values are normalized by the number of processing groups used for filtering each pixel. Since the block is padded by one pixel on each side, the number of processing groups is 4 for each pixel in the block, and normalization is performed by right-shifting by 2 bits.

It can be seen that all pixels in the block can be processed in parallel if maximal parallelism is required.

In this embodiment, the threshold THR is set to a predefined value, for example 128. In a straightforward implementation, this requires storing 128 (1 << 7) entries of 7-bit values for each QP.

The size of the LUT affects the amount of on-chip memory required and the hardware implementation cost of the filter. To reduce on-chip memory, the LUT is computed only for a limited set of QPs, starting from QP 20 with a fixed interval of 8. Five predefined LUTs (for five QP groups) are stored. To filter the current CU, the QP of the CU is rounded to the nearest QP in the table.
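The QP rounding described here can be sketched as follows (the helper name and clamping behavior are assumptions; the five table QPs would be 20, 28, 36, 44, 52):

```c
#include <assert.h>

/* Rounds a CU QP to the nearest LUT QP for a table starting at 20
   with a fixed interval of 8 and five entries. */
static int nearest_lut_qp(int qp) {
    const int base = 20, step = 8, n = 5;
    int idx = (qp - base + step / 2) / step; /* round to nearest */
    if (idx < 0) idx = 0;                    /* clamp to table range */
    if (idx > n - 1) idx = n - 1;
    return base + idx * step;
}
```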

To further reduce the LUT size, the N lowest bits are dropped (ignored) during LUT generation, which allows a sparse table representation.

In exemplary implementation A, N equals 2, resulting in a table depth of 7 − 2 = 5 bits (32 table entries, each represented by a 7-bit value);

in exemplary implementation B, N equals 3, resulting in a table depth of 7 − 3 = 4 bits (16 table entries, each represented by a 7-bit value).

Thus, the total memory size required for the entire LUT storage is:

in exemplary implementation A: 5 × 32 × 7 bits = 1120 bits = 140 bytes;

in exemplary implementation B: 5 × 16 × 7 bits = 560 bits = 70 bytes.

the exemplary implementation B refers to a LUT size of 16 bytes to enable parallel access in a software implementation, since the entire LUT can be stored in a 16-byte SSE register, and thus this configuration is proposed.

If the Hadamard transform is used and the filtered pixel is placed at its original location according to a predefined scan template, the filtering process of method 300 is described by the following pseudo-code:

// scan reconstruction/prediction pixels

const int x0=pIn[p0];

const int x1=pIn[p1];

const int x2=pIn[p2];

const int x3 = pIn[p3]; // p0-p3 define the scan pattern

// 1D forward Hadamard transform

const int y0 = x0 + x2;

const int y1 = x1 + x3;

const int y2 = x0 - x2;

const int y3 = x1 - x3;

const int t0 = y0 + y1;

const int t1 = y0 - y1;

const int t2 = y2 + y3;

const int t3 = y2 - y3;

// frequency domain filtering

const int z0=pTbl[t0];

const int z1=pTbl[t1];

const int z2=pTbl[t2];

const int z3=pTbl[t3];

// 1D inverse Hadamard transform

const int iy0 = z0 + z2;

const int iy1 = z1 + z3;

const int iy2 = z0 - z2;

const int iy3 = z1 - z3;

// output filtered pixels

pOut[p0_out] = iy0 + iy1;

If the Hadamard transform is used and a number of filtered pixels in the linear buffer of filtered pixels are added to the accumulation buffer, the filtering process in this scenario is described by the following pseudo-code:

// scan reconstruction/prediction pixels

const int x0=pIn[p0];

const int x1=pIn[p1];

const int x2=pIn[p2];

const int x3 = pIn[p3]; // p0-p3 define the scan pattern

// 1D forward Hadamard transform

const int y0 = x0 + x2;

const int y1 = x1 + x3;

const int y2 = x0 - x2;

const int y3 = x1 - x3;

const int t0 = y0 + y1;

const int t1 = y0 - y1;

const int t2 = y2 + y3;

const int t3 = y2 - y3;

// frequency domain filtering

const int z0=pTbl[t0];

const int z1=pTbl[t1];

const int z2=pTbl[t2];

const int z3=pTbl[t3];

// 1D inverse Hadamard transform

const int iy0 = z0 + z2;

const int iy1 = z1 + z3;

const int iy2 = z0 - z2;

const int iy3 = z1 - z3;

In an alternative embodiment, the accumulation buffer is initialized to the product of the unfiltered pixel value and the maximum number of pixel values to be added for the block. The maximum number of pixel values to be added is determined from the scan template: the scan template defines how many pixel values are added at each position, and the maximum over all positions in the block is selected and used during accumulation buffer initialization. Then, in each accumulation step, the unfiltered pixel value is subtracted from the corresponding filtered value before being added to the accumulation buffer:

// accumulating filtered pixels

pOut[p0] += iy0 + iy1; // p0-p3 define the scan pattern

pOut[p1] += iy0 - iy1;

pOut[p2] += iy2 + iy3;

pOut[p3] += iy2 - iy3;

To reduce the bit depth of the accumulated pixel values, the result of the inverse transform may be normalized by the size of the transform (m) before placing the pixel values into the accumulation buffer:

pOut[p0] += ((iy0 + iy1) >> HTDF_BIT_RND4);

pOut[p1] += ((iy0 - iy1) >> HTDF_BIT_RND4);

pOut[p2] += ((iy2 + iy3) >> HTDF_BIT_RND4);

pOut[p3] += ((iy2 - iy3) >> HTDF_BIT_RND4);

where, for transform size 4, HTDF_BIT_RND4 equals 2.

This embodiment allows the system to avoid storing the number of pixels added at each position: if the maximum number of added pixel values is a power of 2 (e.g., 2, 4, 8), the division and multiplication in the final normalization step and the accumulation buffer initialization step can be replaced by shift operations.

To maintain accuracy, the normalization step can be performed as follows:

// normalization

pFiltered[p0] = CLIP3(0, (1 << BIT_DEPTH) - 1, (pOut[p0] + HTDF_CNT_SCALE_RND) >> HTDF_CNT_SCALE);

where HTDF_CNT_SCALE is the Log2 of the number of pixels placed into the accumulation buffer; for example, for 4 pixels, HTDF_CNT_SCALE equals 2, and HTDF_CNT_SCALE_RND equals (1 << (HTDF_CNT_SCALE − 1)). CLIP3 is a clipping function that ensures the filtered pixel lies within the allowed range between the minimum and maximum pixel values.
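A self-contained sketch of this normalization (the macro and constant names follow the pseudo-code; wrapping it in a function is illustrative):

```c
#include <assert.h>

#define CLIP3(lo, hi, v) ((v) < (lo) ? (lo) : ((v) > (hi) ? (hi) : (v)))

/* Normalization for 4 contributions per pixel: HTDF_CNT_SCALE = 2,
   rounding offset 1 << 1 = 2, result clipped to [0, 2^bit_depth - 1]. */
static int htdf_normalize(int acc, int bit_depth) {
    const int HTDF_CNT_SCALE = 2;
    const int HTDF_CNT_SCALE_RND = 1 << (HTDF_CNT_SCALE - 1);
    return CLIP3(0, (1 << bit_depth) - 1,
                 (acc + HTDF_CNT_SCALE_RND) >> HTDF_CNT_SCALE);
}
```

For example, four accumulated contributions of 100 normalize to 100, while an out-of-range average is clipped to the maximum pixel value.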

As mentioned above, in order to avoid changing the average luminance of the filtered block, the first spectral component (corresponding to the DC value) is preferably not filtered. This also simplifies the filter implementation. The filtering step then becomes:

// frequency domain filtering

const int z0=t0;

const int z1=pTbl[t1];

const int z2=pTbl[t2];

const int z3=pTbl[t3];

For each pixel in the reconstructed block or predicted block, the scan template used in steps 302 and 310 is selected according to the position of the filtered pixel in the reconstructed block or predicted block. The scan template is chosen to ensure that all pixels are located within the reconstructed or predicted CU and close to the currently processed pixel. The templates may use any scanning order. For example, a predefined scan template is defined as a set of spatial or grid offsets relative to the position of the current pixel within the reconstructed or predicted block, where the offsets pointing to neighboring pixels are located within the reconstructed or predicted block. Examples of scan templates are as follows: (0,0),(0,1),(1,0),(1,1).
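For example, gathering one processing group with the scan template (0,0), (0,1), (1,0), (1,1) can be sketched as follows (row-major, stride-based addressing is an assumption of this sketch):

```c
#include <assert.h>

/* Reads the 2x2 processing group at (y, x) from a block stored
   row-major with the given stride, using the example scan template
   offsets (0,0), (0,1), (1,0), (1,1). */
static void gather_group(const int *blk, int stride, int y, int x, int out[4]) {
    static const int dy[4] = {0, 0, 1, 1};
    static const int dx[4] = {0, 1, 0, 1};
    for (int i = 0; i < 4; i++)
        out[i] = blk[(y + dy[i]) * stride + (x + dx[i])];
}
```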

Fig. 4A shows one example of a template for different pixel positions in a block (e.g., a square CU prediction block or a square CU reconstruction block). In this figure, the boundary pixels may be filtered by a 4-point transform, and the center pixels may be filtered by an 8-point transform.

If one side of a rectangular reconstructed or predicted block is longer than the other, the block should be scanned along the long side. For example, horizontal rectangular blocks may use the following scan order:

(0,–3),(0,–2),(0,–1),(0,0),(0,1),(0,2),(0,3),(0,4),

where in each pair (y, x), x is the horizontal offset and y is the vertical offset relative to the position of the currently filtered pixel in the currently filtered reconstructed or predicted block.

The provided filter can be selectively applied according to the following conditions:

the reconstructed block or predicted block has a non-zero residual signal;

the size of the reconstructed block or predicted block (e.g., small blocks whose minimum dimension is less than a threshold);

the aspect ratio of the reconstructed block or predicted block;

the prediction mode (intra prediction or inter prediction) of the reconstructed block or predicted block; for example, only inter-predicted blocks are filtered; or

Any combination of the above conditions.

For example, to avoid processing small blocks, filtering may be skipped (not done) if the block size is less than or equal to 4 × 4 pixels. This reduces the worst case complexity for minimum block processing.

As another example, only blocks with a non-zero residual signal are filtered. It is beneficial to consider the quantization or residual values when deciding whether to apply the filter, since the purpose of the filter is to reduce quantization error.

As another example, since intra prediction is generally less accurate than inter prediction, a filter may be applied to an intra-predicted block even when it has no non-zero residual, while an inter-predicted block is filtered only if it has a non-zero residual signal.

The filter parameter σ and the scan pattern may be different depending on the above conditions.

Fig. 4B shows the equivalent filter shape for filtering one pixel in the current block with the exemplary scan template (0,0), (0,1), (1,0), (1,1). The current pixel is filtered using a square area of 3 × 3 pixels (the current pixel is marked dark gray at the center of the 3 × 3 square). The filtered pixel is combined from transform domain filtered pixels of four 2 × 2 groups. It will be appreciated that if the current pixel is located on a block boundary (e.g., the upper boundary), the upper left and upper right 2 × 2 groups are not available, and only two 2 × 2 groups (lower left and lower right) can be used for filtering. Furthermore, if the current pixel is located in a corner of the block (e.g., the upper left corner), only one 2 × 2 group (lower right) can be used for filtering.

To improve the filtering quality by using all four 2 × 2 groups for filtering, the current block can be padded with additional pixels when the pixel is located at a block boundary or corner. Fig. 4C shows an example of padding on the left and above. The padding pixels may be taken from already reconstructed blocks.

To further unify the filtering process for all pixels in the block (filtering all pixels of the current block with four 2 × 2 groups), the current block can be extended not only by padding at the top left but also by padding at the bottom right, as shown in fig. 4D. Uniform filtering is advantageous because it eliminates special processing cases for corner and boundary pixels, simplifying the implementation.

The padding pixels are preferably taken from adjacent pixels in already reconstructed blocks. In the latest video codec standards, these reconstructed blocks may be located to the left of or above, or to the right of or below, the current block, depending on the block reconstruction order. Using more information from adjacent pixels improves the filtering quality and makes block transitions smoother.

Obtaining reconstructed pixels from adjacent blocks or previously reconstructed blocks may require additional memory loads in a hardware or software implementation. To minimize or eliminate the extra memory, it is preferable to reuse the pixels intended for intra prediction of the current block, which are typically taken from one, two, or more rows and columns of neighboring blocks adjacent to the current block boundary. These pixels are typically stored in fast memory (also referred to as a "line" buffer) for access during intra prediction, and are called reference pixels for intra prediction.

It is further noted that in some implementations, the reference pixels (intra reference pixels) are pre-processed by smoothing, sharpening, de-ringing, or bilateral filtering before intra prediction is performed. In this case, the current block is preferably padded with the pre-processed pixels.

If some pixels in the padding area are not available (because adjacent blocks follow a reconstruction order), the required padding pixels can be obtained by extending the boundary pixels of the current block into the padding area, as shown in fig. 4D.

Fig. 5A is a flowchart of steps of a corresponding in-loop filtering method 500 according to the present embodiment. The reconstructed block includes a plurality of pixels. The method 500 includes the steps of: loading (502) a current pixel and neighboring pixels of the current pixel into a linear buffer according to a predefined scan template; obtaining (504) spectral components by performing a 1D transform on pixels in the linear buffer; obtaining (506) filtered spectral components by multiplying each spectral component by a gain factor, wherein the gain factor is determined from the corresponding spectral component and a filtering parameter; obtaining (508) filtered pixels by performing an inverse 1D transform on the filtered spectral components; generating (510) a filtered reconstructed block from the filtered pixels estimated in the previous processing step. The method 500 may be performed by the encoding apparatus shown in fig. 1 and the decoding apparatus shown in fig. 2. The detailed information 300 of fig. 3A or the information 300' of fig. 3B also applies to the method 500 shown in fig. 5A.

Similar to fig. 5A, fig. 5B is a flow chart of the steps of a corresponding in-loop filtering method 500' according to another embodiment. In this example, the block (or frame) is a prediction block and the filtered block is a filtered prediction block. The detailed description of fig. 5B is similar to fig. 5A.

The following describes a hardware implementation.

Hadamard transform domain filtering is applied to pixels immediately after block reconstruction. The filtered pixels can be used for subsequent block reconstruction, in particular as reference pixels for intra prediction. It is therefore desirable to minimize the delay introduced by the filtering to ensure that the entire reconstruction pipeline is not significantly affected.

The Hadamard transform is relatively simple to implement in hardware: only additions are needed, and no multiplications. As can be seen from pseudo code 1, both the forward and the inverse transform involve 4 additions, which can be done in parallel or as two sequential addition operations that reuse intermediate results.

Pseudo code 1

The forward and inverse Hadamard transforms can be implemented in hardware using combinational logic. Fast, parallel access to the LUT is required.

The following describes an SRAM-based LUT.

In this exemplary implementation, the LUT is stored in an on-chip single port static RAM (FIG. 6).

Once the data from the previous processing step is available in the buffer at the rising edge of the clock, it is accessed by combinational logic implementing the forward Hadamard transform (comprising two subsequent additions). After the combinational logic completes, an address is available for each LUT. The data is read from the SRAM on the falling edge of the clock, using an inverted clock. A second block of combinational logic, implementing the inverse Hadamard transform and normalization, starts immediately after the LUT data is available. The output filtered pixels become available at the end of the current clock cycle and are ready to be processed by the next stage at the next rising edge of the clock.

The following describes a flip-flop based LUT.

Given that the table used during filtering is limited to 16 entries, a flip-flop-based LUT is more efficient: it requires neither several LUT copies for parallel processing nor a dedicated clock edge for data access. Parallel access is instead provided by multiplexers, as shown in fig. 7, which depicts an exemplary design for processing one 2 × 2 group. In this design, 16 seven-bit flip-flop registers provide parallel access during the filtering process. Once the QP of the current CU is available, the QP-specific LUT can be loaded into the flip-flops.

The final filter output is produced by combining the filtering results of four 2 × 2 groups and reusing results for pixels shared between spatial groups, as shown in fig. 8.

From the above analysis it can be concluded that the provided filter can be implemented in hardware within one clock cycle, using either an SRAM-based or a flip-flop-based LUT.

The complexity analysis is described below.

The effect on bitrate/PSNR is measured relative to the anchor.

Complexity is assessed by means of, for example, encoding and decoding time measurements, filled into the table below.

Table 1: summary of CE14-3 complexity analysis

Max/min/abs operations are counted as comparison (check) operations.

The results of the experiments are described below.

The objective results are described below.

The objective results are shown in the following tables:

Table 2: coding performance of test 14-3a

Table 3: coding performance of test 14-3b

The proposed LUT occupies 70 bytes in total (16 entries per QP) and allows a one-clock hardware implementation. It is proposed to adopt the Hadamard transform domain filter in the next version of the VTM.

The following references are incorporated herein by reference in their entirety:

Joint Video Experts Team (JVET) document JVET-K0068.

The following illustrates how the LUT may be optimized.

In example 1, a set of quantization parameters (QPs) is selected to generate the lookup tables (LUTs), where the set includes a first QP corresponding to index (i) and a second QP corresponding to index (i + 1), with a fixed interval between the first QP and the second QP. For example, the interval may be equal to 8, 10, or 16.

For example, with a fixed interval of 8, the QPs of the sparse LUT set are {20, 28, 36, 44, 52}. The interval between the first QP (20) and the second QP (28) is 8; similarly, the interval between the second QP (28) and the third QP (36) is 8. During filtering, the table whose QP is closest to the current QP is selected.

As another example, again with a fixed interval of 8, the QPs of the sparse LUT set are {18, 26, 34, 42, 50}. The interval between the first QP (18) and the second QP (26) is 8; similarly, the interval between the second QP (26) and the third QP (34) is 8. During filtering, the table whose QP is closest to the current QP is selected.

The LUT size is then 5 × 128 = 640 bytes.

Pseudo-code 2 below indicates which QPs are selected to generate the lookup tables (LUTs).

Pseudo code 2

In this pseudo-code, HTDF_QP_ROUND denotes the fixed interval. Choosing a power of 2 for the interval is advantageous because the division used to calculate the index can then be implemented as a shift. Note that other fixed interval values may be selected, such as 2, 4, 10, 15, or 16. Moreover, in alternative embodiments the interval may be arbitrary, with an LUT calculated for an arbitrary set of QPs.

During the filtering process, for a given QP, the index corresponding to the LUT is calculated as follows:

int idx = ((qp - HTDF_MIN_QP) + (HTDF_QP_ROUND >> 1)) / HTDF_QP_ROUND;

Alternatively, with lower precision:

int idx = (qp - HTDF_MIN_QP) / HTDF_QP_ROUND;

If the fixed interval is a power of 2, the LUT index can be computed more efficiently using a shift operation instead of a division:

int idx = (qp - HTDF_MIN_QP) >> HTDF_QP_ROUND_LOG2; // = (qp - HTDF_MIN_QP) >> 3
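As a sanity check, the index computation can be sketched as follows. The constants are assumptions chosen to match the first example above (QP set {20, 28, 36, 44, 52}, fixed interval 8); the actual values of HTDF_MIN_QP etc. are not given in this text:

```c
/* Hypothetical constants matching the QP set {20, 28, 36, 44, 52}. */
#define HTDF_MIN_QP        20
#define HTDF_QP_ROUND      8
#define HTDF_QP_ROUND_LOG2 3

/* Rounded index: selects the table whose QP is closest to qp. */
int lut_index(int qp) {
    return ((qp - HTDF_MIN_QP) + (HTDF_QP_ROUND >> 1)) / HTDF_QP_ROUND;
}

/* Shift variant, valid because the interval is a power of 2
 * (truncating rather than rounding, as in the low-precision case). */
int lut_index_shift(int qp) {
    return (qp - HTDF_MIN_QP) >> HTDF_QP_ROUND_LOG2;
}
```

For instance, `lut_index(30)` returns 1 (the table for QP 28, the closest QP), and `lut_index(33)` returns 2 (the table for QP 36).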

in example 2, the decoder or encoder removes N bits from the table value, where N is an integer. This allows the sparse LUT table to store only selected values within a given range. For example, N is 3. The maximum table value is 127(7 bits), 3 bits are removed, and the result is 4 bits, i.e., 16 in the table entry represented by the 7-bit value, about 16 bytes.

The following is pseudo code 3, describing how the LUT is generated from a given QP value.

Pseudo code 3

In the given example, HTDF_TBL_SH defines the number of bits to delete, and may be 1, 2, 3, 4, etc.

The following pseudo-code illustrates access to the sparse LUT during filtering:

tbl[(z+HTDF_TBL_RND)>>HTDF_TBL_SH]
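A minimal sketch of this access pattern follows. The rounding offset HTDF_TBL_RND is assumed to be half the deleted range, and the guard entry at index 16 (needed because rounding can push arguments near 127 past the last regular index) is an assumption of this sketch; the entry values are borrowed from "table 2" listed later in this document:

```c
#define HTDF_TBL_SH  3                        /* bits removed from the argument */
#define HTDF_TBL_RND (1 << (HTDF_TBL_SH - 1)) /* assumed rounding offset */

/* 16 entries plus one guard entry: with rounding, arguments close to
 * THR - 1 = 127 map to index 16 (assumption for this sketch). */
const int tbl[17] = {0, 0, 1, 4, 9, 16, 24, 32, 41, 50, 59, 68,
                     77, 86, 94, 103, 103};

/* Sparse-LUT access with rounding of the deleted bits. */
int sparse_lut(int z) {
    return tbl[(z + HTDF_TBL_RND) >> HTDF_TBL_SH];
}
```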

Fig. 9 illustrates the result of LUT optimization when examples 1 and 2 above are combined: 5 tables × 16 entries × 7 bits = 560 bits = 70 bytes.

It should be noted that the number of LUT entries is determined by the threshold HTDF_SHORT_TBL_THR (the threshold described in paragraph [0076]) and by the number of deleted bits HTDF_TBL_SH. Considering that the threshold equals 128 (i.e., 1 << 7) and 3 bits are deleted, the number of entries is 1 << (7 − 3) = 1 << 4 = 16. As mentioned above, the LUT threshold is preferably chosen so that the result of the equation described in paragraph [0077] is close to 1, and, as also described herein, it may differ from QP to QP. Therefore, to generate LUTs for higher QP values it is preferable to increase the threshold from 128 (1 << 7) to 256 (1 << 8) and so on. In that case, keeping the LUT entries at the same precision (e.g., with 3 bits deleted) requires 32 entries (32 = 1 << (8 − 3) = 1 << 5). Alternatively, to keep the LUT size the same for high and low QPs, the precision may be reduced further to 4 deleted bits, retaining 16 entries per table (16 = 1 << (8 − 4) = 1 << 4).
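The entry-count arithmetic above can be captured in a one-line helper (a sketch; the function name is not from the source):

```c
/* Number of entries needed so that a table covers [0, THR) when the N
 * least significant bits of the argument are deleted:
 * entries = THR >> N = 1 << (log2(THR) - N). */
int lut_entries(int thr_log2, int n_deleted) {
    return 1 << (thr_log2 - n_deleted);
}
```

With THR = 128 (thr_log2 = 7) and N = 3 this gives 16 entries; with THR = 256 (thr_log2 = 8) it gives 32 entries for N = 3 and again 16 for N = 4, matching the figures above.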

In certain implementations it may be contradictory to keep the LUT size limited and, at the same time, to let THR satisfy the equation in paragraph [0077]. Indeed, keeping the LUT size limited to, for example, 16 entries when the QP value is large (resulting in a large σ value) may produce an undesirable gap in the filter transfer function (represented by the LUT) around the value 120, as shown in fig. 10 (which also includes the effect of LUT downsampling by deleting the 3 least significant bits).

Fig. 11 shows the same table with its entries plotted one by one, without the effect of LUT downsampling, making visible the gap between the last LUT entry (corresponding to index 15) and the value of the filter transfer function at index 16.

Fig. 12 illustrates how the gap can be eliminated using an auxiliary function. The auxiliary function passes through the THR value at the argument corresponding to the last LUT entry plus 1, i.e., at argument 15 + 1, equal to the LUT size 16 — for example, by using a straight-line equation (shown in green) passing through the point THR = 128 at that argument and crossing the x-axis at some point (e.g., at the value 11 in one example). It should be noted that other types of auxiliary function can be used on the same principle, including exponential, logarithmic, or parabolic functions, etc., or combinations thereof.

Fig. 13 illustrates an example of eliminating the gap by taking the maximum of two values when generating the LUT, where the first value is the LUT entry as described above and the second value is the value of the auxiliary function (a straight line in this example) at the same argument i:

LUT(i) = max(LUT(i), AuxiliaryFunc_σ(i)),

where AuxiliaryFunc_σ(i) denotes the auxiliary function, whose value equals THR at the argument i corresponding to the last LUT entry plus 1.
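A minimal sketch of this gap-elimination step follows, assuming a straight-line auxiliary function that is zero at the x-axis crossing x0 and equal to THR at argument LUT_SIZE (one past the last entry); the function names and the integer-division rounding are assumptions, not taken from the source:

```c
#define THR      128
#define LUT_SIZE 16

/* Hypothetical straight line: 0 at x0, THR at i == LUT_SIZE. */
int auxiliary_line(int i, int x0) {
    if (i <= x0) return 0;
    return THR * (i - x0) / (LUT_SIZE - x0);
}

/* Gap elimination: each final entry is the maximum of the original
 * LUT value and the auxiliary line at the same argument. */
void eliminate_gap(int lut[LUT_SIZE], int x0) {
    for (int i = 0; i < LUT_SIZE; i++) {
        int aux = auxiliary_line(i, x0);
        if (aux > lut[i]) lut[i] = aux;
    }
}
```

With x0 = 11 (as in the example above), the line reaches THR = 128 exactly at argument 16, so the transfer function represented by the LUT no longer jumps at the table boundary.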

Fig. 14 shows the filter transfer function after applying the LUT downsampling method described above, using a straight-line auxiliary function and deleting the 3 least significant bits.

As described in paragraph [00144], one LUT can be used for a group of QPs. To cover the range of possible QPs, a predefined set of QPs is used and one LUT is generated for each QP in the set. Fig. 15 shows an example of the filter transfer functions determined by the five QPs in the set and the corresponding tables (table 0 to table 4). In this example, table 4 was generated using the method described in paragraphs [00156] to [00157], with a straight-line auxiliary function crossing the x-axis at the value 11; table 3 was generated using the same method, with a straight-line auxiliary function crossing the x-axis at the value 9. The tables used in this example contain the following values:

table 0 = {0, 2, 10, 19, 28, 36, 45, 53, 61, 70, 78, 86, 94, 102, 110, 118},

table 1 = {0, 0, 5, 12, 20, 29, 38, 47, 56, 65, 73, 82, 90, 98, 107, 115},

table 2 = {0, 0, 1, 4, 9, 16, 24, 32, 41, 50, 59, 68, 77, 86, 94, 103},

table 3 = {0, 0, 0, 1, 3, 5, 9, 14, 19, 25, 32, 40, 55, 73, 91, 110},

table 4 = {0, 0, 0, 0, 1, 2, 4, 6, 8, 11, 14, 26, 51, 77, 102},

as described in paragraphs [00150] to [00153] above, the table downsampling method may remove N bits from the table values to reduce the table size. These paragraphs also mention that N may be different, depending on the QP used to generate a certain table and the THR value selected for that table. For example, the filter parameter σ for a small QP value is relatively smaller than the filter parameter σ for a large QP value. Therefore, the absolute value of THR can be reduced without affecting performance. Furthermore, in order to keep the table size for all QPs in the set the same (which is advantageous for simplifying the implementation) and to reduce the downsampling rate for smaller QPs (corresponding to lower compression levels and better reconstructed image quality), it is advantageous to reduce the number of bits N deleted compared to other QP tables, e.g. by setting N for small QPs to 2 and THR to 64. Fig. 16 shows an example of filter transfer functions for five QPs according to the respective tables (table 0 to table 4), where N is set to 2 in the first table (corresponding to small QP range) and N is set to 3 in the other tables. This example also includes the method of generating tables 3 and 4 using the Auxiliaryfunction described in paragraphs [00159] to [00160 ]. In table 4, the straight-line auxiliary function intersects the x-axis at a value of 11. In table 4, the straight-line helper function intersects the x-axis at a value. The table used in this example includes the following values:

table 0 = {0, 0, 2, 6, 10, 14, 19, 23, 28, 32, 36, 41, 45, 49, 53, 57},

table 1 = {0, 0, 5, 12, 20, 29, 38, 47, 56, 65, 73, 82, 90, 98, 107, 115},

table 2 = {0, 0, 1, 4, 9, 16, 24, 32, 41, 50, 59, 68, 77, 86, 94, 103},

table 3 = {0, 0, 0, 1, 3, 5, 9, 14, 19, 25, 32, 40, 55, 73, 91, 110},

table 4 = {0, 0, 0, 0, 1, 2, 4, 6, 8, 11, 14, 26, 51, 77, 102},

as described in paragraphs [00150] to [00153] above, the table downsampling method may remove N bits from the table values to reduce the table size. These paragraphs also mention that N may be different, depending on the QP used to generate a certain table and the THR value selected for that table. For example, a large QP value corresponds to a filter parameter σ that is relatively larger than a small QP value, which requires increasing the THR value to keep the equation in paragraph [0077] closer to 1. At the same time, to keep the LUT size the same for all QPs in the set (this is advantageous due to implementation simplicity), and also to allow for: for larger QP values, the reconstructed image has more distortion, while for subjective reasons, increasing the sub-sampling of the LUT is acceptable in the presence of strong compression artifacts. The value N of the deleted least significant bit may be increased to 4 for the last or penultimate table in the set, etc. Fig. 17 shows an example of filter transfer functions for five QPs in a set according to the respective tables (table 0 to table 4), where N is set to 2 in the first table (table 0 corresponds to a small QP range), N is set to 4 in the last and second last tables (table 3 and table 4), and N is set to 3 in the other tables. In the present example, TRH is set to 64 when the first table is generated, 256 when the second to last table is generated, and 128 when the remaining tables are generated. This example also includes the method of generating tables 3 and 4 using the Auxiliaryfunction described in paragraphs [00159] to [00160 ]. In table 4, the straight-line auxiliary function intersects the x-axis at a value of 6. In Table 4, the straight line assist function intersects the x-axis at a value of 8. The table used in this example includes the following values:

table 0 = {0, 0, 2, 6, 10, 14, 19, 23, 28, 32, 36, 41, 45, 49, 53, 57},

table 1 = {0, 0, 5, 12, 20, 29, 38, 47, 56, 65, 73, 82, 90, 98, 107, 115},

table 2 = {0, 0, 1, 4, 9, 16, 24, 32, 41, 50, 59, 68, 77, 86, 94, 103},

table 3 = {0, 0, 3, 9, 19, 32, 47, 64, 81, 99, 117, 135, 154, 179, 205, 230},

table 4 = {0, 0, 0, 2, 6, 11, 18, 27, 38, 51, 64, 96, 128, 160, 192, 224},

fig. 18 is a block diagram of an apparatus 600 that may be used to implement various embodiments. The apparatus 600 may be the encoding apparatus shown in fig. 1 and the decoding apparatus shown in fig. 2. Additionally, apparatus 600 may include one or more of the described elements. In some embodiments, apparatus 600 is equipped with one or more input/output devices, such as speakers, microphones, mice, touch screens, keypads, keyboards, printers, displays, and the like. The apparatus 600 may include one or more Central Processing Units (CPUs) 610, a memory 620, a mass storage 630, a video adapter 640, and an I/O interface 660 connected to a bus. The bus is one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a video bus, and the like.

The CPU 610 may comprise any type of electronic data processor. The memory 620 may include or may be any type of system memory such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous DRAM (SDRAM), read-only memory (ROM), combinations thereof, and the like. In one embodiment, the memory 620 may include ROM for use at boot-up and DRAM for program and data storage for use during program execution. In an embodiment, memory 620 is a non-transitory memory. Mass storage 630 includes any type of storage device that stores data, programs, and other information and enables the data, programs, and other information to be accessed over the bus. The mass storage 630 includes one or more of a solid state disk, hard disk drive, magnetic disk drive, optical disk drive, etc.

Video adapter 640 and I/O interface 660 provide interfaces to couple external input and output devices to apparatus 600. For example, the apparatus 600 may provide a SQL command interface to a client. As shown, examples of input and output devices include any combination of a display 690 coupled with the video adapter 640 and a mouse/keyboard/printer 670 coupled with the I/O interface 660. Other devices may be coupled to apparatus 600 and may utilize additional or fewer interface cards. For example, a serial interface card (not shown) may be used to provide a serial interface for the printer.

The apparatus 600 also includes one or more network interfaces 650, where the network interfaces 650 include wired links such as ethernet lines, and/or wireless links to access nodes or one or more networks 680. Network interface 650 allows device 600 to communicate with remote units over a network 680. For example, the network interface 650 may communicate with a database. In one embodiment, the apparatus 600 is coupled to a local or wide area network for data processing and communication with other processing units, the internet, remote storage devices, and the like.

The design of the provided in-loop filter or prediction filter has the following advantages over conventional adaptive filtering methods (e.g., ALF):

the provided frequency-domain filter derives the filtering parameters (frequency-domain gain coefficients) from the reconstructed frame or the predicted block at the decoding end, so that there is no need to transmit the filtering parameters from the encoding end to the decoding end.

ALF requires complex rate-distortion optimization (RDO) at the encoding end to derive the weighting coefficients to be transmitted. The provided method requires no complex RDO at the encoding end (no parameters are transmitted) and applies to all blocks that satisfy a predefined condition.

ALF is a linear filter in the pixel domain. The provided filter is non-linear, because the gain coefficient for each 1D spectral component is determined from the value of that spectral component. This yields additional coding gain from the non-linear processing.

At the decoding end, ALF requires generic multiplications. In the provided method, the filtering can be implemented as a lookup table, since the gain of each spectral coefficient is less than 1; the provided method can therefore be implemented without multiplications.
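A sketch of this multiplication-free per-component filtering follows. Its form is an assumption based on the claims quoted at the head of this document (components at or above the threshold pass unchanged; smaller magnitudes are replaced by a lookup in a downsampled table); the function name and the sign handling are hypothetical:

```c
#define THR         128
#define HTDF_TBL_SH 3

/* Filter one spectral component r with a 16-entry downsampled LUT.
 * No multiplication is needed: the gain (< 1) is folded into the table. */
int filter_component(int r, const int tbl[16]) {
    int a = r < 0 ? -r : r;          /* magnitude of the component */
    if (a >= THR) return r;          /* large components are kept as-is */
    int f = tbl[a >> HTDF_TBL_SH];   /* table lookup replaces a multiply */
    return r < 0 ? -f : f;           /* LUT stores magnitudes (assumption) */
}
```

For example, with the "table 2" values listed earlier, a component of 20 maps to 1, while a component of 200 passes through unchanged.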

Thus, the filter helps to improve video codec efficiency at low complexity.

While a particular feature or aspect of the invention may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "includes," "has," or other variants of these terms are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising." Also, the terms "exemplary" and "e.g." are merely meant as examples, rather than the best or optimal. The terms "coupled" and "connected," along with their derivatives, may be used. It will be understood that these terms may be used to indicate that two elements cooperate or interact with each other, whether or not they are in direct physical or electrical contact.

Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.

Although the elements of the above claims are recited in a particular sequence with corresponding labeling, unless the recitation of the claims otherwise implies a particular sequence for implementing some or all of the elements, the elements are not necessarily limited to being implemented in the particular sequence described.

Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing teachings. However, those skilled in the art will readily appreciate that there are numerous other applications of the present invention in addition to those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art will recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and equivalents thereto, the invention may be practiced otherwise than as specifically described herein.
