Method and apparatus for quantizing coefficients for matrix-based intra prediction techniques

Document No.: 246882    Publication date: 2021-11-12

Note: this technology, "Method and apparatus for quantizing coefficients for matrix-based intra prediction techniques", was created by Alexey Konstantinovich Filippov, Vasily Alexeevich Rufitskiy, Jianle Chen, and Semih Esenlik on 2020-03-24. Abstract: A method of intra prediction of a block, comprising: obtaining two rows of reconstructed neighboring samples; deriving a set of reference samples based on the two rows of reconstructed neighboring samples; obtaining a set of MIP coefficients based on an intra prediction mode obtained from the bitstream, wherein a MIP coefficient C_MIP of the set of MIP coefficients is obtained (restored) as C_MIP = v_sgn · (q << s), where q is the numerical value of the MIP coefficient, s is a left-shift value, and v_sgn is the sign value of the MIP coefficient; obtaining a prediction block based on the set of reference samples and the set of MIP coefficients; and obtaining a reconstructed picture based on the prediction block.

1. A method of intra prediction, wherein the method of intra prediction is a directional intra prediction method or an Affine Linear Weighted Intra Prediction (ALWIP) method, wherein the method comprises the steps of:

(1601) preparing a set of reference samples;

(1603) in case that the method of intra prediction of the first block is directional intra prediction:

(1605) obtaining a first prediction signal for the first block of a first picture by convolving the set of reference samples with a first set of coefficients;

(1607) obtaining a first reconstructed block of the first picture according to the first prediction signal; and

(1603) in case the method of intra prediction of the second block is ALWIP:

(1609) obtaining a second prediction signal for the second block of a second picture by convolving the set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix A of the ALWIP, and the coefficients of the core matrix A and the first set of coefficients have the same precision;

(1611) upsampling the second prediction signal; and

(1613) obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

2. The method of claim 1, wherein the first set of coefficients and/or the second set of coefficients are adaptively defined according to a position of a prediction sample, respectively.

3. The method of claim 1 or 2, wherein the coefficients of the core matrix A have a 6-bit precision, such that processing of 10-bit samples is suitable for 16-bit operations.

4. The method according to any of claims 1 to 3, wherein the step of upsampling is skipped for directional intra prediction.

5. The method of intra prediction according to any of claims 1 to 4, further comprising:

obtaining two rows of reconstructed adjacent samples;

deriving the set of reference samples based on the two rows of reconstructed neighboring samples;

a set of MIP coefficients is obtained based on an intra prediction mode obtained from the bitstream,

wherein a MIP coefficient C_MIP of the set of MIP coefficients is obtained as follows:

C_MIP = v_sgn · (q << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient;

obtaining a prediction block based on the set of reference samples and the set of MIP coefficients;

wherein a reconstructed picture is obtained based on the prediction block.

6. The method of claim 5, wherein obtaining a prediction block based on the set of reference samples and the set of MIP coefficients comprises a matrix multiplication of the reference samples and the set of MIP coefficients, wherein a multiplication operation in the matrix multiplication is performed at a reduced bit depth by relocating a shift operation after the multiplication:

p · C_MIP = v_sgn · ((p · q) << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; p is a reference sample.

7. The method according to claim 5 or 6, wherein the numerical value q of the MIP-coefficient is a 6-bit depth value.

8. The method of any of claims 5 to 7, wherein the left-shift value is a 2-bit depth value.

9. The method according to any one of claims 5 to 8, wherein the multiplication is performed by means of a multiplier used in an intra interpolation process of the angular intra prediction.

10. An encoder (20) comprising processing circuitry for performing the method according to any one of claims 1 to 9.

11. A decoder (30) comprising processing circuitry for performing the method according to any one of claims 1 to 9.

12. A computer program product comprising program code for performing the method according to any one of claims 1 to 9.

13. A decoder (30) comprising:

one or more processors; and

a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the decoder to perform the method of any of claims 1-9.

14. An encoder (20), comprising:

one or more processors; and

a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the encoder to perform the method of any of claims 1-9.

15. An encoder (20), comprising:

a preparation unit (2001) configured to prepare a set of reference samples;

a first obtaining unit (2003) configured to:

in the case of intra-predicting the first block by directional intra-prediction:

obtaining a first prediction signal for the first block of a first picture by convolving the set of reference samples with a first set of coefficients, an

Obtaining a first reconstructed block of the first picture according to the first prediction signal;

a second obtaining unit (2005) configured to:

in the case of intra-predicting a second block of a second picture by affine linear weighted intra-prediction (ALWIP):

obtaining a second prediction signal for the second block of the second picture by convolving the set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix A of the ALWIP, and the coefficients of the core matrix A and the first set of coefficients have the same precision;

upsampling the second prediction signal; and

obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

16. Encoder (20) according to claim 15, wherein the first set of coefficients and/or the second set of coefficients are adaptively defined according to the position of the prediction samples, respectively.

17. Encoder (20) according to claim 15 or 16, wherein the coefficients of the core matrix A have a 6-bit precision, such that the processing of 10-bit samples is suitable for 16-bit operations.

18. Encoder (20) according to any of claims 15 to 17, wherein the second obtaining unit (2005) is configured to skip the step of upsampling for directional intra prediction.

19. The encoder (20) according to any one of claims 15 to 18, further comprising:

a third obtaining unit configured to obtain two rows of reconstructed adjacent samples;

a deriving unit configured to derive the set of reference samples based on the two rows of reconstructed neighboring samples;

a fourth obtaining unit configured to obtain a set of MIP coefficients based on an intra prediction mode obtained from the bitstream,

wherein a MIP coefficient C_MIP of the set of MIP coefficients is obtained as follows:

C_MIP = v_sgn · (q << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient;

a prediction unit configured to obtain a prediction block based on the set of reference samples and the set of MIP coefficients;

wherein a reconstructed picture is obtained based on the prediction block.

20. Encoder (20) according to claim 19, wherein the prediction unit is configured to obtain the prediction block based on a matrix multiplication of the reference sample and the set of MIP coefficients, wherein the multiplication operation in the matrix multiplication is performed at a reduced bit depth by relocating a shift operation after the multiplication:

p · C_MIP = v_sgn · ((p · q) << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; p is a reference sample.

21. Encoder (20) according to claim 19 or 20, wherein the value q of the MIP coefficients is a 6-bit depth value.

22. Encoder (20) according to any of claims 19-21, wherein the left-shifted value is a 2-bit depth value.

23. Encoder (20) according to any of claims 19-22, wherein the multiplication is performed by means of a multiplier used in an intra interpolation process of angular intra prediction.

24. A decoder (30) comprising:

a preparation unit (3001) configured to prepare a set of reference samples;

a first obtaining unit (3003) configured to:

in the case of intra-predicting the first block by directional intra-prediction:

obtaining a first prediction signal for the first block of a first picture by convolving the set of reference samples with a first set of coefficients, an

Obtaining a first reconstructed block of the first picture according to the first prediction signal;

a second obtaining unit (3005) configured to:

in the case of intra-predicting a second block of a second picture by affine linear weighted intra-prediction (ALWIP):

obtaining a second prediction signal for the second block of the second picture by convolving the set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix A of the ALWIP, and the coefficients of the core matrix A and the first set of coefficients have the same precision;

upsampling the second prediction signal; and

obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

25. Decoder (30) in accordance with claim 24, in which the first set of coefficients and/or the second set of coefficients are adaptively defined in dependence on the position of the prediction samples, respectively.

26. Decoder (30) according to claim 24 or 25, wherein the coefficients of the core matrix A have a 6-bit precision, such that the processing of 10-bit samples is suitable for 16-bit operations.

27. The decoder (30) according to any of claims 24-26, wherein the second obtaining unit (3005) is configured to skip the step of upsampling for directional intra prediction.

28. The decoder (30) according to any one of claims 24 to 27, further comprising:

a third obtaining unit configured to obtain two rows of reconstructed adjacent samples;

a deriving unit configured to derive the set of reference samples based on the two rows of reconstructed neighboring samples;

a fourth obtaining unit configured to obtain a set of MIP coefficients based on an intra prediction mode obtained from the bitstream,

wherein a MIP coefficient C_MIP of the set of MIP coefficients is obtained as follows:

C_MIP = v_sgn · (q << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient;

a prediction unit configured to obtain a prediction block based on the set of reference samples and the set of MIP coefficients;

wherein a reconstructed picture is obtained based on the prediction block.

29. Decoder (30) according to claim 28, wherein the prediction unit is configured to obtain the prediction block based on a matrix multiplication of the reference sample and the set of MIP coefficients, wherein the multiplication operation in the matrix multiplication is performed at a reduced bit depth by relocating a shift operation after the multiplication:

p · C_MIP = v_sgn · ((p · q) << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; p is a reference sample.

30. Decoder (30) according to claim 28 or 29, wherein the value q of the MIP coefficients is a 6-bit depth value.

31. The decoder (30) of any of claims 28-30, wherein the left-shifted value is a 2-bit depth value.

32. Decoder (30) according to one of the claims 28 to 31, wherein the multiplication is performed by means of a multiplier used in the intra interpolation process of the angular intra prediction.

33. The encoder (20) of claim 15 or the decoder (30) of claim 24, wherein the first obtaining unit (2003, 3003) and the second obtaining unit (2005, 3005) are identical.

Technical Field

Embodiments of the present disclosure relate generally to the field of picture processing, and more particularly, to a method of intra prediction in video coding. In other words, embodiments of the present disclosure relate to mechanisms to signal intra prediction modes.

Background

Video coding (video encoding and decoding) is used for a wide range of digital video applications, such as video transmission over broadcast digital TV, the internet, and mobile networks; real-time conversational applications (e.g., video chat, video conferencing); DVD and Blu-ray discs; video content capture and editing systems; and cameras for security applications.

The amount of video data required to render even relatively short video can be large, which can lead to difficulties when the data is to be streamed or otherwise communicated over a communication network with limited bandwidth capacity. Thus, video data is typically compressed before being transmitted over modern telecommunication networks. When video is stored on a storage device, the size of the video may also be a problem, as storage resources may be limited. Video compression devices typically encode video data using software and/or hardware at a source location prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. With limited network resources and the ever-increasing demand for higher video quality, there is a need for improved compression and decompression techniques that improve compression rates with little sacrifice in picture quality.

Disclosure of Invention

Embodiments of the present disclosure provide devices and methods for encoding and decoding according to the independent claims.

The foregoing and other objects are achieved by the subject matter of the independent claims. Further forms of realization are apparent from the dependent claims, the description and the drawings.

The present disclosure provides:

a method of intra prediction, wherein the method of intra prediction is a directional intra prediction method or an ALWIP method, wherein the method comprises the steps of:

preparing a set of reference samples;

in the case where the method of intra prediction for the first block is directional intra prediction:

obtaining a first prediction signal for a first block of a first picture by convolving a set of reference samples with a first set of coefficients;

obtaining a first reconstruction block of a first picture according to a first prediction signal; and

in case the method of intra prediction of the second block is ALWIP:

obtaining a second prediction signal for a second block of the second picture by convolving a set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix A of the ALWIP, and the coefficients of the core matrix A and the first set of coefficients have the same precision;

upsampling the second prediction signal; and

a second reconstructed block of the second picture is obtained from the upsampled second prediction signal.

Thus, the present disclosure unifies directional intra prediction and ALWIP by adjusting the precision of the multiplication operations. This unification allows both methods to share a single convolution step and thus eliminates hardware redundancy. Each of the above steps includes adjustable parameters; by defining a set of parameters, the sequence of steps can operate either as ALWIP or as directional intra prediction.
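
The following C++ sketch illustrates this unified flow. It is only an illustration of the idea above, not code from any reference implementation: the names (predictBlock, convolve, upsample) and the nearest-neighbor upsampling are assumptions introduced here.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using Vec = std::vector<int32_t>;

    // Shared convolution step: each predicted sample is a weighted sum of
    // the reference samples. Both directional intra prediction and ALWIP
    // reduce to this form when their coefficient sets share one precision.
    // coeffs holds one row of weights (one per reference sample) per
    // predicted sample.
    static Vec convolve(const Vec& ref, const std::vector<Vec>& coeffs) {
        Vec pred(coeffs.size(), 0);
        for (std::size_t i = 0; i < coeffs.size(); ++i)
            for (std::size_t j = 0; j < ref.size(); ++j)
                pred[i] += ref[j] * coeffs[i][j];
        return pred;
    }

    // Placeholder upsampling for the ALWIP branch (1-D nearest neighbor).
    static Vec upsample(const Vec& pred, int factor) {
        Vec out;
        for (int32_t v : pred)
            for (int i = 0; i < factor; ++i)
                out.push_back(v);
        return out;
    }

    Vec predictBlock(const Vec& ref, const std::vector<Vec>& coeffs, bool isAlwip) {
        Vec pred = convolve(ref, coeffs);  // one convolution step for both methods
        if (isAlwip)
            pred = upsample(pred, 4);      // skipped for directional intra prediction
        return pred;
    }

With one parameter set (the coefficients plus the isAlwip switch), the same hardware path can thus serve both prediction methods.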

Affine linear weighted intra prediction (ALWIP) may also be referred to as matrix-based intra prediction (MIP).

In the method as described above, the first set of coefficients and/or the second set of coefficients may be adaptively defined according to the position of the prediction sample, respectively.

In the method described above, the coefficients of the core matrix A may have 6-bit precision, so that the processing of 10-bit samples is suitable for 16-bit operations.

In the method as described above, the upsampling step may be skipped for directional intra prediction.

In the method as described above, the method may further include:

obtaining two rows of reconstructed adjacent samples;

deriving a set of reference samples based on the two rows of reconstructed neighboring samples;

a set of MIP coefficients is obtained based on an intra prediction mode obtained from the bitstream,

wherein a MIP coefficient C_MIP of the set of MIP coefficients can be obtained as follows:

C_MIP = v_sgn · (q << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient;

obtaining a prediction block based on a set of reference samples and a set of MIP coefficients;

wherein the reconstructed picture may be obtained based on the prediction block.

In the method as described above, obtaining the prediction block based on the set of reference samples and the set of MIP coefficients may comprise a matrix multiplication of the reference samples and the set of MIP coefficients, wherein the multiplication operation in the matrix multiplication may be performed at a reduced bit depth by relocating the shift operation after the multiplication:

p · C_MIP = v_sgn · ((p · q) << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; p is a reference sample.

In the method described above, the value q of the MIP coefficient may be a 6-bit depth value.

In the method described above, the left-shift value may be a 2-bit depth value.

In the method as described above, multiplication may be performed by means of a multiplier used in the intra interpolation process of angular intra prediction.
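
A minimal C++ sketch of the two formulas above, assuming an illustrative (non-normative) storage layout for the quantized coefficient: a sign, a 6-bit value q, and a 2-bit shift s.

    #include <cstdint>

    // Illustrative coefficient layout; not a normative storage format.
    struct MipCoeff {
        int8_t  sign;  // v_sgn: +1 or -1
        uint8_t q;     // 6-bit numerical value of the coefficient
        uint8_t s;     // 2-bit left-shift value
    };

    // Restore the full coefficient: C_MIP = v_sgn · (q << s).
    static int32_t restoreCoeff(const MipCoeff& c) {
        return c.sign * static_cast<int32_t>(c.q << c.s);
    }

    // Multiply a reference sample p by C_MIP at reduced bit depth by
    // relocating the shift after the multiplication:
    //     p · C_MIP = v_sgn · ((p · q) << s)
    // The multiplier only ever sees the 6-bit value q, so a 10-bit sample
    // times a 6-bit value stays within 16 bits, and the same multiplier
    // used for intra interpolation filtering can be reused.
    static int32_t mulCoeff(int32_t p, const MipCoeff& c) {
        return c.sign * ((p * static_cast<int32_t>(c.q)) << c.s);
    }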

The present disclosure also provides an encoder comprising processing circuitry for performing the above method.

The present disclosure also provides a decoder comprising processing circuitry for performing the above method.

The present disclosure also provides a computer program product comprising program code for performing the above method.

The present disclosure also provides a decoder comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the decoder to perform the above-described method.

The present disclosure also provides an encoder comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the encoder to perform the above-described method.

The present disclosure also provides an encoder comprising: a preparation unit configured to prepare a set of reference samples; a first obtaining unit configured to: in the case of intra-predicting the first block by directional intra-prediction: obtaining a first prediction signal for a first block of the first picture by convolving a set of reference samples with a first set of coefficients, and obtaining a first reconstructed block of the first picture from the first prediction signal; a second obtaining unit configured to: in the case of intra-prediction of a second block of a second picture by ALWIP: obtaining a second prediction signal for a second block of the second picture by convolving a set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix A of the ALWIP, and the coefficients of the core matrix A and the first set of coefficients have the same precision; upsampling the second prediction signal; and obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

In the encoder as described above, the first set of coefficients and/or the second set of coefficients may be adaptively defined according to the position of the prediction sample, respectively.

In the encoder described above, the coefficients of the core matrix A may have 6-bit precision, so that the processing of 10-bit samples is suitable for 16-bit operations.

In the encoder as described above, the second obtaining unit is configured to skip the upsampling step for directional intra prediction.

The encoder as described above may further include: a third obtaining unit configured to obtain two rows of reconstructed neighboring samples; a deriving unit configured to derive a set of reference samples based on the two rows of reconstructed neighboring samples; a fourth obtaining unit configured to obtain a set of MIP coefficients based on an intra prediction mode obtained from the bitstream, wherein a MIP coefficient C_MIP of the set of MIP coefficients can be obtained as follows:

C_MIP = v_sgn · (q << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; a prediction unit configured to obtain a prediction block based on a set of reference samples and a set of MIP coefficients; wherein the reconstructed picture may be obtained based on the prediction block.

In the encoder as described above, the prediction unit may be configured to obtain the prediction block based on a matrix multiplication of the reference samples and a set of MIP coefficients, wherein the multiplication operation in the matrix multiplication may be performed at a reduced bit depth by relocating the shift operation after the multiplication:

p · C_MIP = v_sgn · ((p · q) << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; p is a reference sample.

In the encoder described above, the value q of the MIP coefficient may be a 6-bit depth value.

In the encoder described above, the left-shift value may be a 2-bit depth value.

In the encoder as described above, multiplication may be performed by means of a multiplier used in the intra interpolation process of angular intra prediction.

The present disclosure also provides a decoder comprising: a preparation unit configured to prepare a set of reference samples; a first obtaining unit configured to: in the case of intra-predicting the first block by directional intra-prediction: obtaining a first prediction signal for a first block of the first picture by convolving a set of reference samples with a first set of coefficients, and obtaining a first reconstructed block of the first picture from the first prediction signal; a second obtaining unit configured to: in the case of intra-prediction of a second block of a second picture by ALWIP: obtaining a second prediction signal for a second block of the second picture by convolving a set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix A of the ALWIP, and the coefficients of the core matrix A and the first set of coefficients have the same precision; upsampling the second prediction signal; and obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

In the decoder as described above, the first set of coefficients and/or the second set of coefficients may be adaptively defined according to the positions of the prediction samples, respectively.

In the decoder described above, the coefficients of the core matrix A may have 6-bit precision, so that the processing of 10-bit samples is suitable for 16-bit operations.

In the decoder as described above, the second obtaining unit (3005) may be configured to skip the upsampling step for directional intra prediction.

The decoder as described above may further include: a third obtaining unit configured to obtain two rows of reconstructed neighboring samples; a deriving unit configured to derive a set of reference samples based on the two rows of reconstructed neighboring samples; a fourth obtaining unit configured to obtain a set of MIP coefficients based on an intra prediction mode obtained from the bitstream, wherein a MIP coefficient C_MIP of the set of MIP coefficients can be obtained as follows:

C_MIP = v_sgn · (q << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; a prediction unit configured to obtain a prediction block based on a set of reference samples and a set of MIP coefficients; wherein the reconstructed picture may be obtained based on the prediction block.

In the decoder as described above, the prediction unit may be configured to obtain the prediction block based on a matrix multiplication of the reference samples and a set of MIP coefficients, wherein the multiplication operation in the matrix multiplication may be performed at a reduced bit depth by relocating the shift operation after the multiplication:

p · C_MIP = v_sgn · ((p · q) << s),

wherein q is the numerical value of the MIP coefficient; s is a left-shift value; v_sgn is the sign value of the MIP coefficient; p is a reference sample.

In the decoder described above, the value q of the MIP coefficient may be a 6-bit depth value.

In the decoder described above, the left-shifted value may be a 2-bit depth value.

In the decoder as described above, the multiplication may be performed by means of a multiplier used in the intra interpolation process of angular intra prediction.

In the encoder and decoder as described above, the first obtaining unit and the second obtaining unit may be the same.

It should be noted that MIP may use a dedicated MPM signaling mechanism, as well as a redefined MPM list derivation mechanism, which may be enabled when intra_lwip_flag is 1.

The present disclosure reduces the number of checks in the parsing process by unifying the signaling of the cases where intra_lwip_flag is 0 and where intra_lwip_flag is 1.

According to a first aspect, the present invention relates to a method of unified MPM list construction, and thus to a new correlation between the number of signaled directional intra prediction modes and the size of intra prediction blocks.

According to a second aspect, the invention relates to a device for decoding a video stream, comprising a processor and a memory. The memory stores instructions for causing the processor to perform the method according to the first aspect.

According to a third aspect, the invention relates to a device for encoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method according to the second aspect.

According to a fourth aspect, a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to be configured to encode video data is presented. The instructions cause the one or more processors to perform a method according to the first or second aspect or any possible implementation of the first or second aspect.

According to a fifth aspect, the invention relates to a computer program comprising program code for performing a method according to the first or second aspect or any possible implementation of the first or second aspect when executed on a computer.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Drawings

Embodiments of the invention are described in more detail below with reference to the accompanying drawings, in which:

FIG. 1A is a block diagram illustrating an example of a video encoding system configured to implement embodiments of the present invention;

FIG. 1B is a block diagram illustrating another example of a video encoding system configured to implement embodiments of the present invention;

FIG. 2 is a block diagram illustrating an example of a video encoder configured to implement embodiments of the present invention;

FIG. 3 is a block diagram illustrating an example structure of a video decoder configured to implement an embodiment of the present invention;

fig. 4 is a block diagram showing an example of an encoding apparatus or a decoding apparatus;

fig. 5 is a block diagram showing another example of an encoding apparatus or a decoding apparatus;

fig. 6 shows the directional modes and their corresponding directions. In other words, fig. 6 is a diagram showing angular intra prediction directions and the associated intra prediction modes in VTM 4.0 and VVC specification draft v.4;

fig. 7 is a simplified block diagram illustrating a method of matrix-based intra prediction (MIP), also known as Affine Linear Weighted Intra Prediction (ALWIP);

FIG. 8 is a detailed block diagram illustrating a method of matrix-based intra prediction (MIP), also known as Affine Linear Weighted Intra Prediction (ALWIP);

fig. 9 shows 10-bit long MIP coefficients comprising a 9-bit value and a sign (1 bit);

fig. 10 shows the 6-bit values of the MIP coefficients extracted from the 9-bit values, depending on the position of the most significant non-zero bit (MSB);

FIG. 11 shows a representation of MIP coefficients having a 6-bit value;

fig. 12 shows how multipliers for intra prediction interpolation filtering are reused by MIP;

FIG. 13 illustrates a flow chart of an embodiment of an intra prediction method of the present disclosure, the intra prediction method being a directional intra prediction method or an ALWIP method;

FIG. 14 shows an encoder according to an embodiment of the present disclosure;

fig. 15 shows a decoder according to an embodiment of the present disclosure.

In the following, the same reference numerals indicate identical or at least functionally equivalent features, if not explicitly stated otherwise.

Detailed Description

In the following description, reference is made to the accompanying drawings which form a part hereof and which show by way of illustration, specific aspects of embodiments of the invention or which may be used. It should be understood that embodiments of the present invention may be used in other respects, and include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.

For example, it should be understood that the disclosure relating to the described method may also apply to a corresponding apparatus or system configured to perform the method, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may include one or more units, e.g., functional units, to perform the described one or more method steps (e.g., one unit performs one or more steps, or each of the plurality of units performs one or more of the plurality of steps), even if such one or more units are not explicitly described or shown in the figures. On the other hand, for example, if a particular device is described based on one or more units, e.g., functional units, the corresponding method may include one step to perform the function of the one or more units (e.g., one step performs the function of the one or more units, or each of the plurality of steps performs the function of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. Furthermore, it should be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Video coding generally refers to the processing of a sequence of pictures that forms a video or video sequence. Instead of the term "picture", the terms "frame" or "image" may be used as synonyms in the field of video coding. Video coding (or coding in general) comprises two parts, video encoding and video decoding. Video encoding is performed at the source side, typically including processing (e.g., by compression) the original video pictures to reduce the amount of data required to represent the video pictures (for more efficient storage and/or transmission). Video decoding is performed at the destination side and typically involves inverse processing compared to the encoder to reconstruct the video pictures. Embodiments referring to the "coding" of video pictures (or pictures in general) should be understood as relating to the "encoding" or "decoding" of video pictures or corresponding video sequences. The combination of the encoding part and the decoding part is also called CODEC (Coding and Decoding).

In the case of lossless video coding, the original video picture can be reconstructed, i.e. the reconstructed video picture has the same quality as the original video picture (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression (e.g., by quantization) is performed to reduce the amount of data representing video pictures that cannot be fully reconstructed at the decoder, i.e., the quality of the reconstructed video pictures is lower or worse than the quality of the original video pictures.

Several video coding standards belong to the group of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the sample domain with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is typically divided into a set of non-overlapping blocks, and encoding is typically performed at the block level. In other words, at the encoder the video is typically processed (i.e., encoded) at the block (video block) level, for example by generating a prediction block using spatial (intra-picture) prediction and/or temporal (inter-picture) prediction, subtracting the prediction block from the current block (currently processed/to-be-processed block) to obtain a residual block, and transforming the residual block and quantizing it in the transform domain to reduce the amount of data to be transmitted (compression), while at the decoder the inverse processing compared to the encoder is applied to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder replicates the decoder processing loop so that both will generate the same predictions (e.g., intra-prediction and inter-prediction) and/or reconstructions to process (i.e., encode) subsequent blocks.
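
The block-level loop just described can be sketched in a few lines of C++. This is only a schematic, scalar illustration (the transform step is omitted and a plain division stands in for quantization); none of the names come from a real codec.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using Block = std::vector<int32_t>;

    // One pass of the hybrid coding loop for a single block: predict,
    // form the residual, quantize (the lossy step), then reconstruct
    // exactly as the decoder would, so subsequent predictions stay in sync.
    Block encodeBlock(const Block& cur, const Block& pred, int32_t qStep) {
        Block level(cur.size()), recon(cur.size());
        for (std::size_t i = 0; i < cur.size(); ++i)
            level[i] = (cur[i] - pred[i]) / qStep;   // residual, then quantize
        for (std::size_t i = 0; i < cur.size(); ++i)
            recon[i] = pred[i] + level[i] * qStep;   // dequantize and reconstruct
        return recon;  // the encoder keeps this, mirroring the decoder
    }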

In the following embodiments of the video encoding system 10, the video encoder 20 and the video decoder 30 are described based on fig. 1 to 3.

Fig. 1A is a schematic block diagram illustrating an example encoding system 10, e.g., a video encoding system 10 (or encoding system 10 for short) that may utilize the techniques of the present application. Video encoder 20 (or encoder 20 for short) and video decoder 30 (or decoder 30 for short) of video encoding system 10 represent examples of apparatus that may be configured to perform techniques in accordance with various examples described in this application.

As shown in fig. 1A, encoding system 10 includes a source device 12 configured to provide encoded picture data 21, e.g., to a destination device 14, for decoding the encoded picture data 13.

Source device 12 includes an encoder 20 and may additionally (i.e., optionally) include a picture source 16, a preprocessor (or preprocessing unit) 18 (e.g., picture preprocessor 18), and a communication interface or unit 22.

The picture source 16 may include or be any kind of picture capturing device (e.g., a camera for capturing real-world pictures) and/or any kind of picture generating device (e.g., a computer graphics processor for generating computer animated pictures) or any kind of other device for obtaining and/or providing real-world pictures, computer generated pictures (e.g., screen content, Virtual Reality (VR) pictures), and/or any combination thereof (e.g., Augmented Reality (AR) pictures). The picture source may be any kind of memory or storage device that stores any of the aforementioned pictures.

To distinguish it from the preprocessor 18 and the processing performed by the preprocessing unit 18, the picture or picture data 17 may also be referred to as the original (raw) picture or original picture data 17.

Pre-processor 18 is configured to receive (raw) picture data 17 and perform pre-processing on picture data 17 to obtain pre-processed picture 19 or pre-processed picture data 19. The pre-processing performed by pre-processor 18 may include, for example, trimming, color format conversion (e.g., from RGB to YCbCr), color correction, or de-noising. It is to be understood that the pre-processing unit 18 may be an optional component.

Video encoder 20 is configured to receive pre-processed picture data 19 and provide encoded picture data 21 (additional details will be described below, e.g., based on fig. 2). Communication interface 22 of source device 12 may be configured to receive encoded picture data 21 and send encoded picture data 21 (or any further processed version thereof) over communication channel 13 to another device, such as destination device 14 or any other device for storage or direct reconstruction.

Destination device 14 includes a decoder 30 (e.g., video decoder 30), and may additionally (i.e., optionally) include a communication interface or communication unit 28, a post-processor 32 (or post-processing unit 32), and a display device 34.

Communication interface 28 of destination device 14 is configured to receive encoded picture data 21 (or any further processed version thereof), e.g., directly from source device 12 or from any other source (e.g., a storage device such as an encoded picture data storage device), and provide encoded picture data 21 to decoder 30.

Communication interface 22 and communication interface 28 may be configured to send or receive encoded picture data 21 or encoded data 13 via a direct communication link (e.g., a direct wired or wireless connection) between source device 12 and destination device 14 or via any kind of network (e.g., a wired or wireless network or any combination thereof, or any kind of private and public network or any kind of combination thereof).

The communication interface 22 may, for example, be configured to package the encoded picture data 21 into a suitable format (e.g., packets) and/or process the encoded picture data using any kind of transport encoding or processing for transmission over a communication link or communication network.

Communication interface 28, which forms a counterpart of communication interface 22, may, for example, be configured to receive the transmitted data and process the transmitted data using any type of corresponding transmission decoding or processing and/or unpacking to obtain encoded picture data 21.

Both communication interface 22 and communication interface 28 may be configured as a one-way communication interface or a two-way communication interface as indicated by the arrows of communication channel 13 pointing from source device 12 to destination device 14 in fig. 1A, and may be configured to, for example, send and receive messages, e.g., to establish a connection to acknowledge and exchange any other information related to the communication link and/or data transmission (e.g., encoded picture data transmission).

Decoder 30 is configured to receive encoded picture data 21 and provide decoded picture data 31 or decoded picture 31 (further details will be described below, e.g., based on fig. 3 or fig. 5).

Post-processor 32 of destination device 14 is configured to post-process decoded picture data 31 (also referred to as reconstructed picture data), e.g., decoded picture 31, to obtain post-processed picture data 33 (e.g., post-processed picture 33). The post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color correction, trimming or resampling, or any other processing, for example, for preparing decoded picture data 31 for display, for example, by display device 34.

Display device 34 of destination device 14 is configured to receive post-processed picture data 33 for displaying pictures, e.g., to a user or viewer. The display device 34 may be or comprise any kind of display for presenting the reconstructed picture, such as an integrated or external display or monitor. The display may, for example, include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), a Digital Light Processor (DLP), or any other type of display.

Although fig. 1A depicts the source device 12 and the destination device 14 as separate devices, embodiments of the devices may also include the functionality of both, i.e., the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, or by separate hardware and/or software, or any combination thereof.

Based on this description, it will be apparent to those skilled in the art that the existence and (exact) division of the functions of the different elements, or functions within the source device 12 and/or destination device 14 as shown in fig. 1A, may vary depending on the actual device and application.

Encoder 20 (e.g., video encoder 20) or decoder 30 (e.g., video decoder 30) or both encoder 20 and decoder 30 may be implemented via processing circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, dedicated video-coding circuitry, or any combinations thereof, as shown in fig. 1B. Encoder 20 may be implemented via processing circuitry 46 to embody the various modules as discussed with respect to encoder 20 of fig. 2 and/or any other encoder system or subsystem described herein. Decoder 30 may be implemented via processing circuitry 46 to embody the various modules as discussed with respect to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. The processing circuitry may be configured to perform various operations discussed later. As shown in fig. 5, if the techniques are implemented in part in software, the apparatus may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Either of video encoder 20 and video decoder 30 may be integrated as part of a combined encoder/decoder (CODEC) in a single device, for example as shown in fig. 1B.

Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or fixed device, such as a notebook or laptop computer, a mobile phone, a smart phone, a tablet or tablet computer, a camcorder, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or a content delivery server), a broadcast receiver device, a broadcast transmitter device, etc., and may use no operating system or any type of operating system. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices.

The video encoding system 10 shown in fig. 1A is merely an example, and the techniques of this application may be applied to video encoding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, the data is retrieved from local storage, streamed over a network, and so on. The video encoding device may encode and store data to the memory, and/or the video decoding device may retrieve and decode data from the memory. In some examples, the encoding and decoding are performed by devices that do not communicate with each other, but simply encode data into memory and/or retrieve data from memory and decode the data.

For convenience of description, embodiments of the present invention are described herein, for example, by referring to reference software for High-Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC), the next-generation video coding standard developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). One of ordinary skill in the art will appreciate that embodiments of the present invention are not limited to HEVC or VVC.

Encoder and encoding method

Fig. 2 shows a schematic block diagram of an example video encoder 20 configured to implement the techniques of the present application. In the example of fig. 2, video encoder 20 includes an input 201 (or input interface 201), a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a loop filter unit 220, a Decoded Picture Buffer (DPB) 230, a mode selection unit 260, an entropy encoding unit 270, and an output 272 (or output interface 272). The mode selection unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a partition unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 as shown in fig. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.

The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 may be referred to as forming a forward signal path of the encoder 20, while the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 may be referred to as forming a backward signal path of the video encoder 20, wherein the backward signal path of the video encoder 20 corresponds to the signal path of a decoder (see the video decoder 30 in fig. 3). The inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 are also referred to as forming the "built-in decoder" of video encoder 20.

Picture and picture segmentation (pictures and blocks)

Encoder 20 may be configured to receive, e.g., via input 201, picture 17 (or picture data 17), e.g., a picture of a sequence of pictures forming a video or video sequence. The received picture or picture data may also be a pre-processed picture 19 (or pre-processed picture data 19). For simplicity, the following description refers to picture 17. Picture 17 may also be referred to as the current picture or the picture to be encoded (in particular in video coding, to distinguish the current picture from other pictures, e.g., previously encoded and/or decoded pictures of the same video sequence, i.e., the video sequence that also includes the current picture).

A (digital) picture is or can be considered as a two-dimensional array or matrix of samples having intensity values. The samples in the array may also be referred to as pixels (short for picture elements) or pels. The number of samples in the horizontal and vertical directions (or axes) of the array or picture defines the size and/or resolution of the picture. For the representation of color, three color components are typically employed, i.e., the picture may be represented as or include three sample arrays. In the RGB format or color space, a picture includes corresponding arrays of red, green, and blue samples. However, in video coding, each pixel is typically represented in a luminance and chrominance format or color space (e.g., YCbCr), which includes a luminance component indicated by Y (sometimes L is used instead) and two chrominance components indicated by Cb and Cr. The luminance (or luma for short) component Y represents the brightness or gray-level intensity (e.g., as in a gray-scale picture), while the two chrominance (or chroma for short) components Cb and Cr represent the chrominance or color information components. Accordingly, a picture in YCbCr format includes a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (Cb and Cr). Pictures in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also referred to as color transformation or conversion. If a picture is monochrome, the picture may include only an array of luma samples. Accordingly, a picture may be, for example, an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format.
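
As a concrete illustration of the color-space relationship described above, the following sketch converts an 8-bit RGB pixel to YCbCr using one common definition (full-range ITU-R BT.601 coefficients); codecs and standards differ in the exact matrix, so treat the constants as an example rather than the conversion.

    #include <algorithm>
    #include <cstdint>

    struct YCbCr { uint8_t y, cb, cr; };

    YCbCr rgbToYCbCr(uint8_t r, uint8_t g, uint8_t b) {
        double y  = 0.299 * r + 0.587 * g + 0.114 * b;  // luma: gray-level intensity
        double cb = 0.564 * (b - y) + 128.0;            // blue-difference chroma
        double cr = 0.713 * (r - y) + 128.0;            // red-difference chroma
        auto clip = [](double v) {
            return static_cast<uint8_t>(std::clamp(v, 0.0, 255.0));
        };
        return { clip(y), clip(cb), clip(cr) };
    }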

Embodiments of video encoder 20 may include a picture partitioning unit (not depicted in fig. 2) configured to partition picture 17 into multiple (typically non-overlapping) picture blocks 203. These blocks may also be referred to as root blocks, macroblocks (H.264/AVC), or Coding Tree Blocks (CTBs) or Coding Tree Units (CTUs) (H.265/HEVC and VVC). The picture partitioning unit may be configured to use the same block size and a corresponding grid defining the block size for all pictures of the video sequence, or to change the block size between pictures or subsets or groups of pictures and partition each picture into the corresponding blocks. A toy sketch of such a fixed-grid partitioning is shown below.
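
The sketch clips blocks at the picture border; the CTU size of 128 is just an example value, and the names are illustrative.

    #include <algorithm>
    #include <vector>

    struct BlockPos { int x, y, w, h; };

    // Split a picture into non-overlapping, grid-aligned blocks of at most
    // ctuSize x ctuSize samples (smaller at the right and bottom borders).
    std::vector<BlockPos> partitionPicture(int picW, int picH, int ctuSize = 128) {
        std::vector<BlockPos> blocks;
        for (int y = 0; y < picH; y += ctuSize)
            for (int x = 0; x < picW; x += ctuSize)
                blocks.push_back({ x, y,
                                   std::min(ctuSize, picW - x),
                                   std::min(ctuSize, picH - y) });
        return blocks;
    }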

In further embodiments, the video encoder may be configured to receive a block 203 of picture 17 directly, e.g., one, several, or all blocks forming picture 17. The picture block 203 may also be referred to as the current picture block or the picture block to be encoded.

Like picture 17, picture block 203 is also or can be considered a two-dimensional array or matrix of samples having intensity values (sample values), although its size is smaller than that of picture 17. In other words, block 203 may comprise, for example, one sample array (e.g., a luma array in the case of a monochrome picture 17, or a luma or chroma array in the case of a color picture) or three sample arrays (e.g., a luma array and two chroma arrays in the case of a color picture 17) or any other number and/or kind of arrays, depending on the color format applied. The number of samples in the horizontal and vertical directions (or axes) of the block 203 defines the size of the block 203. Thus, a block may be, for example, an M × N (M columns by N rows) array of samples, or an M × N array of transform coefficients.

The embodiment of video encoder 20 as shown in fig. 2 may be configured to encode picture 17 on a block-by-block basis, e.g., performing encoding and prediction for each block 203.

The embodiment of the video encoder 20 as shown in fig. 2 may also be configured to partition and/or encode a picture by using slices (also referred to as video slices), wherein the picture may be partitioned into one or more slices (typically non-overlapping) or encoded using one or more slices, and each slice may include one or more blocks (e.g., CTUs).

The embodiment of the video encoder 20 as shown in fig. 2 may also be configured to partition and/or encode a picture by using tile groups (also referred to as video tile groups) and/or tiles (also referred to as video tiles), wherein a picture may be partitioned into one or more tile groups (typically non-overlapping) or encoded using one or more tile groups, and each tile group may comprise, for example, one or more blocks (e.g., CTUs) or one or more tiles, wherein each tile may be, for example, rectangular in shape and may comprise one or more blocks (e.g., CTUs), e.g., complete or partial blocks.

Residual calculation

The residual calculation unit 204 may be configured to obtain the residual block 205 in the sample domain, e.g. by calculating the residual block 205 (also referred to as residual 205) sample by sample (pixel by pixel) by subtracting sample values of the prediction block 265 from sample values of the picture block 203, based on the picture block 203 and the prediction block 265 (further details regarding the prediction block 265 are provided later).

Transformation of

The transform processing unit 206 may be configured to apply a transform, such as a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST), to the sample values of the residual block 205 to obtain transform coefficients 207 in the transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent a residual block 205 in the transform domain.

The transform processing unit 206 may be configured to apply integer approximations of DCT/DST, such as the transforms specified for h.265/HEVC. Such integer approximations are typically scaled by a factor compared to the orthogonal DCT transform. To preserve the norm of the residual block processed by the forward and inverse transform, an additional scaling factor is applied as part of the transform process. The scaling factor is typically selected based on certain constraints, such as scaling factors for shift operations that are powers of 2, a trade-off between bit depth of transform coefficients, accuracy and implementation cost, and so on. The particular scaling factor is specified, for example, for the inverse transform as performed by inverse transform processing unit 212 (and a corresponding inverse transform as performed, for example, by inverse transform processing unit 312 at video decoder 30), and may accordingly specify a corresponding scaling factor, for example, for the forward transform as performed by transform processing unit 206 at encoder 20.
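
To make the integer-approximation idea concrete, the sketch below applies a 1-D integer transform using the well-known 4x4 core matrix of H.265/HEVC, with the power-of-two scaling applied as a rounded right shift; the shift value passed by the caller is illustrative, not the normative one.

    #include <array>
    #include <cstdint>

    // 4x4 integer core transform matrix from H.265/HEVC (a scaled
    // integer approximation of the DCT).
    static const std::array<std::array<int32_t, 4>, 4> kCore = {{
        {  64,  64,  64,  64 },
        {  83,  36, -36, -83 },
        {  64, -64, -64,  64 },
        {  36, -83,  83, -36 },
    }};

    // 1-D forward transform; assumes shift >= 1.
    std::array<int32_t, 4> forward1d(const std::array<int32_t, 4>& x, int shift) {
        std::array<int32_t, 4> y{};
        for (int i = 0; i < 4; ++i) {
            for (int j = 0; j < 4; ++j)
                y[i] += kCore[i][j] * x[j];
            // scaling factor chosen as a power of two, applied as a shift
            // (with rounding) to keep the norm of the residual in range
            y[i] = (y[i] + (1 << (shift - 1))) >> shift;
        }
        return y;
    }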

Embodiments of video encoder 20 (and accordingly transform processing unit 206) may be configured to output transform parameters (e.g., one or more transforms) directly or to encode or compress via entropy encoding unit 270, for example, such that video decoder 30 may receive the transform parameters and use the transform parameters for decoding, for example.

Quantization

Quantization unit 208 may be configured to quantize transform coefficients 207 to obtain quantized coefficients 209, e.g., by applying scalar quantization or vector quantization. Quantized coefficients 209 may also be referred to as quantized transform coefficients 209 or quantized residual coefficients 209.

The quantization process may reduce the bit depth associated with some or all of transform coefficients 207. For example, during quantization, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient, where n is greater than m. The degree of quantization may be modified by adjusting a Quantization Parameter (QP). For example, for scalar quantization, different scaling may be applied to achieve finer or coarser quantization. Smaller quantization steps correspond to finer quantization and larger quantization steps correspond to coarser quantization. The applicable quantization step size may be indicated by a Quantization Parameter (QP). The quantization parameter may for example be an index to a set of predefined applicable quantization steps. For example, a small quantization parameter may correspond to a fine quantization (small quantization step size) and a large quantization parameter may correspond to a coarse quantization (large quantization step size), or vice versa. The quantization may comprise a division by a quantization step size, and the corresponding or inverse dequantization, e.g., by the inverse quantization unit 210, may comprise a multiplication by the quantization step size. Embodiments according to some standards (e.g., HEVC) may be configured to use a quantization parameter to determine a quantization step size. In general, the quantization step size may be calculated based on the quantization parameter using a fixed point approximation of an equation including division. Additional scaling factors may be introduced for quantization and dequantization to recover the norm of the residual block, which may be modified due to the scaling used in the fixed point approximation of the equation for the quantization step size and the quantization parameter. In one example implementation, scaling and dequantization of the inverse transform may be combined. Alternatively, a customized quantization table may be used and signaled from the encoder to the decoder, e.g., in a bitstream. Quantization is a lossy operation in which the loss increases as the quantization step size increases.
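
The relationship between the quantization parameter and the step size can be sketched as follows. The formula Qstep = 2^((QP - 4) / 6), in which the step size doubles for every increase of 6 in QP, follows the HEVC convention; the floating-point arithmetic here stands in for the fixed-point approximation real codecs use.

    #include <cmath>
    #include <cstdint>

    // HEVC-style mapping from quantization parameter to step size.
    double qStepFromQp(int qp) {
        return std::pow(2.0, (qp - 4) / 6.0);
    }

    // Quantization divides by the step size (rounding to nearest) ...
    int32_t quantize(int32_t coeff, double qStep) {
        return static_cast<int32_t>(std::lround(coeff / qStep));
    }

    // ... and dequantization multiplies by it; the rounding loss in
    // quantize() is exactly why the operation is lossy, and the loss
    // grows with the step size.
    int32_t dequantize(int32_t level, double qStep) {
        return static_cast<int32_t>(std::lround(level * qStep));
    }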

Embodiments of video encoder 20 (and accordingly quantization unit 208) may be configured to output Quantization Parameters (QPs), for example, directly or via entropy encoding unit 270, such that, for example, video decoder 30 may receive the quantization parameters and apply the quantization parameters for decoding.

Inverse quantization

The inverse quantization unit 210 is configured to apply inverse quantization of the quantization unit 208 on the quantized coefficients, e.g., by applying an inverse operation of the quantization scheme applied by the quantization unit 208 based on or using the same quantization step as the quantization unit 208, to obtain dequantized coefficients 211. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to transform coefficients 207 (although usually not identical to the transform coefficients due to loss of quantization).

Inverse transformation

Inverse transform processing unit 212 is configured to apply an inverse transform of the transform applied by transform processing unit 206 (e.g., an inverse Discrete Cosine Transform (DCT) or an inverse Discrete Sine Transform (DST) or other inverse transform) to obtain a reconstructed residual block 213 (or corresponding dequantized coefficients 213) in the sample domain. The reconstructed residual block 213 may also be referred to as a transform block 213.

Reconstruction

The reconstruction unit 214 (e.g. an adder or summer 214) is configured to add the transform block 213 (i.e. the reconstructed residual block 213) to the prediction block 265 to obtain a reconstructed block 215 in the sample domain, e.g. by adding sample values of the reconstructed residual block 213 and sample values of the prediction block 265 sample by sample.

Filtering

The loop filter unit 220 (or simply "loop filter" 220) is configured to filter the reconstructed block 215 to obtain a filtered block 221, or typically to filter the reconstructed samples to obtain filtered samples. The loop filter unit is for example configured to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 may include one or more loop filters (e.g., a deblocking filter, a sample-adaptive offset (SAO) filter) or one or more other filters (e.g., a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter), or any combination thereof. Although loop filter unit 220 is shown in fig. 2 as an in-loop filter, in other configurations, loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstruction block 221.

Embodiments of video encoder 20 (and accordingly loop filter unit 220) may be configured to output loop filter parameters (e.g., sample adaptive offset information), for example, directly or encoded via entropy encoding unit 270, such that, for example, decoder 30 may receive and apply the same loop filter parameters or respective loop filters for decoding.

Decoded picture buffer

Decoded Picture Buffer (DPB)230 may be a memory that stores reference pictures or general reference picture data for use in encoding video data by video encoder 20. DPB 230 may be formed from any of a variety of memory devices, such as a Dynamic Random Access Memory (DRAM) including a Synchronous DRAM (SDRAM), a Magnetoresistive RAM (MRAM), a Resistive RAM (RRAM), or other types of memory devices. The Decoded Picture Buffer (DPB)230 may be configured to store one or more filtered blocks 221. Decoded picture buffer 230 may also be configured to store other previously filtered blocks (e.g., previously reconstructed and filtered blocks 221) of the same current picture or a different picture (e.g., a previously reconstructed picture), and may provide a complete previously reconstructed (i.e., decoded) picture (and corresponding reference blocks and samples) and/or a partially reconstructed current picture (and corresponding reference blocks and samples), for example, for inter prediction. Decoded Picture Buffer (DPB)230 may also be configured to store one or more non-filtered reconstructed blocks 215, or generally non-filtered reconstructed samples, or any other further processed version of a reconstructed block or sample, e.g., if reconstructed block 215 is not filtered by loop filter unit 220.

Mode selection (segmentation and prediction)

Mode selection unit 260 includes a segmentation unit 262, an inter prediction unit 244, and an intra prediction unit 254, and is configured to receive or obtain original picture data, e.g., original block 203 (current block 203 of current picture 17), and reconstructed picture data, e.g., filtered and/or unfiltered reconstructed samples or blocks of the same (current) picture and/or from one or more previously decoded pictures, e.g., from decoded picture buffer 230 or other buffers (e.g., line buffers, not shown). The reconstructed picture data is used as reference picture data for prediction (e.g., inter prediction or intra prediction) to obtain a prediction block 265 or a predictor 265.

The mode selection unit 260 may be configured to determine or select a partitioning and a prediction mode (e.g., an intra or inter prediction mode) for a current block (including no partitioning) and to generate a corresponding prediction block 265, which is used for the calculation of the residual block 205 and for the reconstruction of the reconstructed block 215.

Embodiments of mode selection unit 260 may be configured to select a partitioning and prediction mode (e.g., from among the modes supported by or available to mode selection unit 260) that provides the best match, or in other words the smallest residual (which means better compression for transmission or storage), or the smallest signaling overhead (which also means better compression for transmission or storage), or that considers or balances both. The mode selection unit 260 may be configured to determine the partitioning and prediction mode based on Rate Distortion Optimization (RDO), i.e., to select the prediction mode that provides the smallest rate distortion. Terms such as "best," "minimum," and "optimal" herein do not necessarily refer to an overall "best," "minimum," or "optimal," but may also refer to meeting a termination or selection criterion, such as a value exceeding or falling below a threshold or another constraint, potentially resulting in a "suboptimal selection" but reducing complexity and processing time.

In other words, the partitioning unit 262 may be configured to partition the block 203 into smaller block partitions or sub-blocks (which again form blocks), for example, iteratively using quad-tree partitioning (QT), binary-tree partitioning (BT), or triple-tree partitioning (TT), or any combination thereof, and to perform prediction, for example, for each of the block partitions or sub-blocks, wherein the mode selection includes selection of the tree structure of the partitioned block 203 and the prediction mode is applied to each of the block partitions or sub-blocks.

In the following, the partitioning performed by the example video encoder 20 (e.g., by partitioning unit 262) and the prediction processing (performed by inter prediction unit 244 and intra prediction unit 254) will be described in more detail.

Segmentation

The segmentation unit 262 may segment (or divide) the current block 203 into smaller partitions, e.g., smaller blocks of square or rectangular size. These smaller blocks (which may also be referred to as sub-blocks) may also be partitioned into even smaller partitions. This is also referred to as tree splitting or hierarchical tree splitting, wherein a root block, e.g. at root tree level 0 (level 0, depth 0), may be recursively split, e.g. into two or more blocks of the next lower tree level, e.g. nodes at tree level 1 (level 1, depth 1), wherein these blocks may again be split into two or more blocks of the next lower level, e.g. tree level 2 (level 2, depth 2), etc., until the splitting terminates, e.g. due to the termination criteria being met (e.g. maximum tree depth or minimum block size being reached). A block that is not further divided is also referred to as a leaf block or leaf node of the tree. The use of a tree partitioned into two partitions is called a Binary Tree (BT), the use of a tree partitioned into three partitions is called a Ternary Tree (TT), and the use of a tree partitioned into four partitions is called a Quadtree (QT).
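As an illustration of the hierarchical tree splitting described above, the following is a minimal sketch in C of recursive quadtree partitioning; shouldSplit() is a hypothetical placeholder for the encoder's actual splitting decision (e.g., the RDO-based selection discussed above), and binary and ternary splits are omitted for brevity.

typedef struct { int x, y, w, h; } Block;

/* Recursively split a root block until a termination criterion is met
 * (maximum tree depth or minimum block size reached), then emit the leaf block. */
static void splitQuadTree(Block b, int depth, int maxDepth, int minSize,
                          int (*shouldSplit)(Block), void (*emitLeaf)(Block))
{
    if (depth < maxDepth && b.w > minSize && b.h > minSize && shouldSplit(b)) {
        int hw = b.w / 2, hh = b.h / 2;
        Block child[4] = { { b.x,      b.y,      hw, hh },
                           { b.x + hw, b.y,      hw, hh },
                           { b.x,      b.y + hh, hw, hh },
                           { b.x + hw, b.y + hh, hw, hh } };
        for (int i = 0; i < 4; i++)
            splitQuadTree(child[i], depth + 1, maxDepth, minSize, shouldSplit, emitLeaf);
    } else {
        emitLeaf(b);  /* leaf node: the block is not further divided */
    }
}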

As previously mentioned, the term "block" as used herein may be a portion of a picture, in particular a square or rectangular portion. For example, referring to HEVC and VVC, the block may be or correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU) and a Transform Unit (TU) and/or a corresponding block, such as a Coding Tree Block (CTB), a Coding Block (CB), a Transform Block (TB), or a Prediction Block (PB).

For example, a Coding Tree Unit (CTU) may be or comprise a CTB of luma samples and two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or of a picture that is coded using three separate color planes and the syntax structures used to code the samples. Correspondingly, a Coding Tree Block (CTB) may be an N × N block of samples for some value of N, such that the division of a component into CTBs is a partitioning. A Coding Unit (CU) may be or comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or of a picture that is coded using three separate color planes and the syntax structures used to code the samples. Correspondingly, a Coding Block (CB) may be an M × N block of samples for some values of M and N, such that the division of a CTB into coding blocks is a partitioning.

In an embodiment, for example, according to HEVC, a Coding Tree Unit (CTU) may be divided into CUs by using a quadtree structure represented as a coding tree. The decision whether to encode a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction is made at the CU level. Each CU may be further partitioned into one, two, or four PUs, depending on the PU partition type. Within one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying a prediction process based on the PU partition type, the CU may be divided into Transform Units (TUs) according to another quadtree structure similar to a coding tree used for the CU.

In an embodiment, for example, according to the latest video coding standard currently in development, which is referred to as Versatile Video Coding (VVC), combined quad-tree and binary-tree (QTBT) partitioning is used to partition a coding block. In the QTBT block structure, a CU may have either a square or a rectangular shape. For example, a Coding Tree Unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary-tree or ternary- (or triple-) tree structure. The partitioning tree leaf nodes are called Coding Units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In parallel, multiple partitioning modes (e.g., ternary-tree partitioning) may be used together with the QTBT block structure.

In one example, mode select unit 260 of video encoder 20 may be configured to perform any combination of the segmentation techniques described herein.

As described above, video encoder 20 is configured to determine or select a best or optimal prediction mode from a set of (e.g., predetermined) prediction modes. The set of prediction modes may include, for example, intra-prediction modes and/or inter-prediction modes.

Intra prediction

The set of intra prediction modes may comprise 35 different intra prediction modes, e.g. non-directional modes such as DC (or mean) mode and planar mode, or directional modes such as defined in HEVC, or may comprise 67 different intra prediction modes, e.g. non-directional modes such as DC (or mean) mode and planar mode, or directional modes such as defined for VVC, for example.

The intra-prediction unit 254 is configured to generate the intra-prediction block 265 using reconstructed samples of neighboring blocks of the same current picture according to an intra-prediction mode of the set of intra-prediction modes.

Intra-prediction unit 254 (or mode selection unit 260 in general) is also configured to output intra-prediction parameters (or information indicating the selected intra-prediction mode for the block in general) to entropy encoding unit 270 in the form of syntax elements 266 for inclusion in encoded picture data 21 so that, for example, video decoder 30 may receive and use the prediction parameters for decoding.

Inter prediction

The set of inter prediction modes (or possible inter prediction modes) depends on the available reference pictures (i.e., previous, at least partially decoded pictures, e.g., stored in the DPB 230) and other inter prediction parameters, e.g., whether the whole reference picture or only a part of the reference picture (e.g., the search window area around the area of the current block) is used for searching for the best matching reference block, and/or, e.g., whether pixel interpolation (e.g., half-pel and/or quarter-pel interpolation) is applied.

In addition to the prediction mode described above, a skip mode and/or a direct mode may be applied.

The inter prediction unit 244 may include a Motion Estimation (ME) unit and a Motion Compensation (MC) unit (neither shown in fig. 2). The motion estimation unit may be configured to receive or obtain the picture block 203 (the current picture block 203 of the current picture 17) and the decoded picture 231, or at least one or more previously reconstructed blocks (e.g. reconstructed blocks of one or more other/different previously decoded pictures 231) for motion estimation. For example, the video sequence may include a current picture and a previously decoded picture 231, or in other words, the current picture and the previously decoded picture 231 may be part of or may form a sequence of pictures that form the video sequence.

The encoder 20 may, for example, be configured to select a reference block from a plurality of reference blocks of the same or different pictures among a plurality of other pictures, and to provide a reference picture (or reference picture index) and/or an offset (spatial offset) between the position (x, y coordinates) of the reference block and the position of the current block as inter prediction parameters to the motion estimation unit. This offset is also called a Motion Vector (MV).

The motion compensation unit is configured to obtain (e.g., receive) inter-prediction parameters and perform inter-prediction based on or using the inter-prediction parameters to obtain an inter-prediction block 265. The motion compensation performed by the motion compensation unit may involve obtaining or generating a prediction block based on a motion/block vector determined by motion estimation, and interpolation may be performed on sub-pixel precision. Interpolation filtering may generate additional pixel samples from known pixel samples, thus potentially increasing the number of candidate prediction blocks that may be used to encode a picture block. When a motion vector for a PU of a current picture block is received, the motion compensation unit may locate, in one of the reference picture lists, a prediction block to which the motion vector points.

The motion compensation unit may also generate syntax elements associated with the blocks and the video slice for use by video decoder 30 in decoding the picture blocks of the video slice. In addition to or as an alternative to slices and the respective syntax elements, tile groups and/or tiles and the respective syntax elements may be generated or used.

Entropy coding

Entropy encoding unit 270 is configured to apply, for example, an entropy encoding algorithm or scheme (e.g., a Variable Length Coding (VLC) scheme, a context adaptive VLC scheme (CAVLC), an arithmetic coding scheme, binarization, Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique) or bypass (no compression) to quantized coefficients 209, inter-prediction parameters, intra-prediction parameters, loop filter parameters, and/or other syntax elements to obtain encoded picture data 21 that may be output via output 272, for example, in the form of encoded bitstream 21, such that, for example, video decoder 30 may receive and use the parameters for decoding. The encoded bitstream 21 may be transmitted to the video decoder 30 or stored in memory for later transmission or retrieval by the video decoder 30.

Other structural changes of video encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may quantize the residual signal directly for certain blocks or frames without the transform processing unit 206. In another implementation, encoder 20 may combine quantization unit 208 and inverse quantization unit 210 into a single unit.

Decoder and decoding method

Fig. 3 shows an example of a video decoder 30 configured to implement the techniques of this application. The video decoder 30 is configured to receive encoded picture data 21 (e.g., encoded bitstream 21), e.g., encoded by the encoder 20, to obtain a decoded picture 331. The encoded picture data or bitstream includes information for decoding the encoded picture data, e.g., data representing picture blocks of an encoded video slice (and/or tile group or tile) and associated syntax elements.

In the example of fig. 3, decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (e.g., summer 314), a loop filter 320, a decoded picture buffer (DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354. The inter prediction unit 344 may be or include a motion compensation unit. In some examples, video decoder 30 may perform a decoding process that is generally the inverse of the encoding process described with respect to video encoder 20 of fig. 2.

As explained with respect to encoder 20, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, Decoded Picture Buffer (DPB) 230, inter prediction unit 244, and intra prediction unit 254 are also referred to as forming the "built-in decoder" of video encoder 20. Accordingly, inverse quantization unit 310 may be functionally identical to inverse quantization unit 210, inverse transform processing unit 312 may be functionally identical to inverse transform processing unit 212, reconstruction unit 314 may be functionally identical to reconstruction unit 214, loop filter 320 may be functionally identical to loop filter 220, and decoded picture buffer 330 may be functionally identical to decoded picture buffer 230. Accordingly, the explanations provided for the respective units and functions of video encoder 20 apply correspondingly to the respective units and functions of video decoder 30.

Entropy decoding

The entropy decoding unit 304 is configured to parse the bitstream 21 (or generally the encoded picture data 21) and perform entropy decoding, for example, on the encoded picture data 21 to obtain, for example, quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), such as any or all of inter-prediction parameters (e.g., reference picture indices and motion vectors), intra-prediction parameters (e.g., intra-prediction modes or indices), transform parameters, quantization parameters, loop filter parameters, and/or other syntax elements. The entropy decoding unit 304 may be configured to apply a decoding algorithm or scheme corresponding to the encoding scheme described with respect to the entropy encoding unit 270 of the encoder 20. Entropy decoding unit 304 may also be configured to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360, and to provide other parameters to other units of decoder 30. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level. In addition to or as an alternative to slices and respective syntax elements, groups of tiles and/or tiles and respective syntax elements may be received and/or used.

Inverse quantization

Inverse quantization unit 310 may be configured to receive Quantization Parameters (QPs) (or information related to inverse quantization in general) and quantized coefficients from encoded picture data 21 (e.g., by parsing and/or decoding by entropy decoding unit 304), and apply inverse quantization to decoded quantized coefficients 309 based on the quantization parameters to obtain dequantized coefficients 311 (which may also be referred to as transform coefficients 311). The inverse quantization process may include determining a degree of quantization using a quantization parameter determined by video encoder 20 for each video block in a video slice (or tile or group of tiles), and likewise determining a degree of inverse quantization that should be applied.

Inverse transformation

The inverse transform processing unit 312 may be configured to receive the dequantized coefficients 311 (also referred to as transform coefficients 311) and apply a transform to the dequantized coefficients 311 in order to obtain reconstructed residual blocks 313 in the sample domain. The reconstructed residual blocks 313 may also be referred to as transform blocks 313. The transform may be an inverse transform, such as an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. Inverse transform processing unit 312 may also be configured to receive transform parameters or corresponding information from encoded picture data 21 (e.g., by parsing and/or decoding, e.g., by entropy decoding unit 304) to determine the transform to be applied to the dequantized coefficients 311.

Reconstruction

The reconstruction unit 314 (e.g., adder or summer 314) may be configured to add the reconstructed residual block 313 and the prediction block 365 to obtain a reconstructed block 315 in the sample domain, e.g., by adding sample values of the reconstructed residual block 313 and sample values of the prediction block 365.

Filtering

Loop filter unit 320 (in or after the encoding loop) is configured to filter reconstructed block 315 to obtain filtered block 321, e.g., to smooth pixel transitions, or otherwise improve video quality. Loop filter unit 320 may include one or more loop filters (e.g., a deblocking filter, a Sample Adaptive Offset (SAO) filter) or one or more other filters (e.g., a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter), or any combination thereof. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations, loop filter unit 320 may be implemented as a post-loop filter.

Decoded picture buffer

The decoded video blocks 321 of a picture are then stored in decoded picture buffer 330, which stores the decoded pictures 331 as reference pictures for subsequent motion compensation for other pictures and/or for output or display, respectively.

Decoder 30 is configured to output the decoded picture 331, e.g., via output 312, for presentation to or viewing by a user.

Prediction

The inter prediction unit 344 may be functionally identical to the inter prediction unit 244 (in particular, to the motion compensation unit), and the intra prediction unit 354 may be functionally identical to the intra prediction unit 254, and they perform split or partitioning decisions and prediction based on the partitioning and/or prediction parameters or corresponding information received from the encoded picture data 21 (e.g., by parsing and/or decoding, for example, by entropy decoding unit 304). The mode application unit 360 may be configured to perform the prediction (intra or inter prediction) per block based on reconstructed pictures, blocks, or corresponding samples (filtered or unfiltered) to obtain the prediction block 365.

When the video slice is encoded as an intra-coded (I) slice, the intra-prediction unit 354 of the mode application unit 360 is configured to generate a prediction block 365 for a picture block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current picture. When the video picture is encoded as an inter-coded (i.e., B or P) slice, the inter-prediction unit 344 (e.g., motion compensation unit) of the mode application unit 360 is configured to generate the prediction block 365 for the video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 304. For inter prediction, a prediction block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may use a default construction technique based on the reference pictures stored in DPB 330 to construct the reference frame lists, list 0 and list 1. In addition to or as an alternative to slices (e.g., video slices), the same or similar techniques may be applied to or by implementations that use tile groups (e.g., video tile groups) and/or tiles (e.g., video tiles), e.g., I, P or B tile groups and/or tiles may be used to encode video.

The mode application unit 360 is configured to determine prediction information for video blocks of the current video slice by parsing motion vectors or related information and other syntax elements, and to generate a prediction block for the current video block being decoded using the prediction information. For example, mode application unit 360 uses some received syntax elements to determine a prediction mode (e.g., intra or inter prediction) for encoding video blocks of a video slice, an inter prediction slice type (e.g., B-slice, P-slice, or GPB-slice), construction information for one or more reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter prediction state for each inter-coded video block of the slice, and other information for decoding video blocks in a current video slice. In addition to or as an alternative to slices (e.g., video slices), the same or similar techniques may be applied to or used by embodiments that use tile groups (e.g., video tile groups) and/or slices (e.g., video slices), e.g., I, P or B tile groups and/or tiles may be used to encode video.

The embodiment of the video decoder 30 as shown in fig. 3 may be configured to segment and/or decode a picture by using slices (also referred to as video slices), wherein the picture may be segmented into one or more slices (typically non-overlapping) or decoded using one or more slices, and each slice may include one or more blocks (e.g., CTUs).

The embodiment of video decoder 30 as shown in fig. 3 may be configured to partition and/or decode a picture by using groups of tiles (also referred to as video tiles) and/or tiles (also referred to as video tiles), wherein a picture may be partitioned into one or more groups of tiles (typically non-overlapping) or decoded using one or more groups of tiles, and each group of tiles may comprise, for example, one or more blocks (e.g., CTUs) or one or more tiles, wherein each tile may be, for example, rectangular in shape and may comprise one or more blocks (e.g., CTUs), e.g., complete or partial blocks.

Other variations of video decoder 30 may be used to decode encoded picture data 21. For example, decoder 30 may generate an output video stream without loop filtering unit 320. For example, the non-transform based decoder 30 may inverse quantize the residual signal directly for certain blocks or frames without the inverse transform processing unit 312. In another implementation, video decoder 30 may combine inverse quantization unit 310 and inverse transform processing unit 312 into a single unit.

It should be understood that in the encoder 20 and the decoder 30, the processing result of the current step may be further processed and then output to the next step. For example, after interpolation filtering, motion vector derivation, or loop filtering, additional operations such as clipping or shifting may be performed on the processing results of the interpolation filtering, motion vector derivation, or loop filtering.

It should be noted that additional operations may be applied to the derived motion vectors of the current block (including but not limited to control-point motion vectors in affine mode, sub-block motion vectors in affine, planar, and ATMVP modes, temporal motion vectors, and so on). For example, the value of a motion vector is constrained to a predefined range according to its representation bit depth. If the representation bit depth of the motion vector is bitDepth, the range is -2^(bitDepth-1) to 2^(bitDepth-1) - 1, where "^" denotes exponentiation. For example, if bitDepth is set equal to 16, the range is -32768 to 32767; if bitDepth is set equal to 18, the range is -131072 to 131071. For example, the values of the derived motion vectors (e.g., the MVs of the four 4 × 4 sub-blocks within an 8 × 8 block) are constrained such that the maximum difference between the integer parts of the four 4 × 4 sub-block MVs does not exceed N pixels, e.g., 1 pixel. Two methods of constraining the motion vector according to bitDepth are provided herein.

Method 1: remove the overflow MSB (most significant bit) by flowing (wrap-around) operations

ux = ( mvx + 2^bitDepth ) % 2^bitDepth (1)

mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux (2)

uy = ( mvy + 2^bitDepth ) % 2^bitDepth (3)

mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy (4)

where mvx is the horizontal component of the motion vector of an image block or sub-block, mvy is the vertical component of the motion vector of an image block or sub-block, and ux and uy indicate intermediate values.

for example, if the value of mvx is-32769, the resulting value is 32767 after applying equations (1) and (2). In a computer system, decimal numbers are stored as two's complement. The two's complement of-32769 is 1, 0111, 1111, 1111, 1111(17 bits), and then the MSB is discarded, so the resulting two's complement is 0111, 1111, 1111, 1111 (decimal is 32767), which is the same as the output of applying equations (1) and (2).

ux = ( mvpx + mvdx + 2^bitDepth ) % 2^bitDepth (5)

mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux (6)

uy = ( mvpy + mvdy + 2^bitDepth ) % 2^bitDepth (7)

mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy (8)

These operations may be applied during the summation of mvp and mvd as shown in equations (5) to (8).
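A minimal sketch in C of Method 1, following equations (1) to (8); like those formulas, it assumes the input lies within one wrap-around period of the bitDepth-bit range.

#include <stdint.h>

/* Wrap a motion vector component into the signed bitDepth-bit range:
 * u = (mv + 2^bitDepth) % 2^bitDepth, followed by the conditional correction. */
static int32_t wrapMvComponent(int64_t mv, int bitDepth)
{
    const int64_t range = (int64_t)1 << bitDepth;          /* 2^bitDepth */
    int64_t u = (mv + range) % range;                      /* eq. (1)/(3)/(5)/(7) */
    return (int32_t)(u >= (range >> 1) ? u - range : u);   /* eq. (2)/(4)/(6)/(8) */
}

Consistent with the worked example above, wrapMvComponent(-32769, 16) yields 32767.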

Method 2: remove the overflow MSB by clipping the value

vx = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vx )

vy = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vy )

where vx is the horizontal component of the motion vector of an image block or sub-block and vy is the vertical component of the motion vector of an image block or sub-block; x, y, and z correspond to the three input values of the MV clipping process, and the function Clip3 is defined as follows:

Clip3( x, y, z ) = x if z < x; y if z > y; z otherwise
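A minimal sketch in C of Method 2, using the Clip3 function defined above.

/* Clip3(x, y, z): clamp z to the range [x, y]. */
static int clip3(int x, int y, int z) { return z < x ? x : (z > y ? y : z); }

/* Clip a motion vector component to the signed bitDepth-bit range
 * [-2^(bitDepth-1), 2^(bitDepth-1) - 1], e.g., [-32768, 32767] for bitDepth 16. */
static int clipMvComponent(int v, int bitDepth)
{
    return clip3(-(1 << (bitDepth - 1)), (1 << (bitDepth - 1)) - 1, v);
}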

fig. 4 is a schematic diagram of a video encoding apparatus 400 according to an embodiment of the present disclosure. The video encoding apparatus 400 is suitable for implementing the disclosed embodiments described herein. In an embodiment, the video encoding apparatus 400 may be a decoder (e.g., the video decoder 30 of fig. 1A) or an encoder (e.g., the video encoder 20 of fig. 1A).

The video encoding apparatus 400 comprises an ingress port 410 (or input port 410) and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or Central Processing Unit (CPU) 430 for processing the data; a transmitter unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting the data; and a memory 460 for storing the data. The video encoding apparatus 400 may also include optical-to-electrical (OE) and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450 for the output or input of optical or electrical signals.

The processor 430 is implemented by hardware and software. Processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 430 is in communication with inlet port 410, receiver unit 420, transmitter unit 440, outlet port 450, and memory 460. Processor 430 includes an encoding module 470. The encoding module 470 implements the embodiments disclosed above. For example, the encoding module 470 implements, processes, prepares, or provides various encoding operations. Thus, the inclusion of the encoding module 470 provides a substantial improvement in the functionality of the video encoding apparatus 400 and enables the transformation of the video encoding apparatus 400 into different states. Alternatively, the encoding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.

Memory 460 may include one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device to store programs when such programs are selected for execution and to store instructions and data that are read during program execution. The memory 460 may be, for example, volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).

Fig. 5 is a simplified block diagram of an apparatus 500 according to an example embodiment, which apparatus 500 may be used as either or both of the source device 12 and the destination device 14 in fig. 1.

The processor 502 in the device 500 may be a central processing unit. Alternatively, processor 502 may be any other type of device or devices capable of manipulating or processing information now existing or later developed. Although the disclosed implementations may be implemented with a single processor as shown, such as processor 502, advantages in speed and efficiency may be realized using more than one processor.

In one implementation, the memory 504 in the apparatus 500 may be a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of storage device may be used for memory 504. The memory 504 may include code and data 506 that are accessed by the processor 502 using a bus 512. The memory 504 may also include an operating system 508 and application programs 510, the application programs 510 including at least one program that allows the processor 502 to perform the methods described herein. For example, application 510 may include applications 1 through N, which also include video coding applications that perform the methods described herein. The device 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch inputs. A display 518 may be coupled to the processor 502 via the bus 512.

Although depicted here as a single bus, bus 512 of device 500 may be comprised of multiple buses. Further, the secondary memory 514 may be directly coupled to other components of the device 500 or may be accessed via a network and may comprise a single integrated unit such as a memory card or multiple units such as multiple memory cards. Thus, the device 500 may be implemented in a wide variety of configurations.

Directional intra-prediction is a well-known technique that involves propagating the values of neighboring samples into a prediction block specified by the prediction direction. Fig. 6 shows 93 prediction directions, where the dashed direction is associated with the wide-angle mode applied only to non-square blocks.

The direction may be specified by the increase of the offset between the positions of the prediction sample and the reference sample. A larger magnitude of this increase corresponds to a larger inclination of the prediction direction. Table 1 specifies the mapping between predModeIntra and the angle parameter intraPredAngle. This parameter is effectively the per-row (or per-column) increase of the offset, specified at 1/32-sample resolution.

TABLE 1 InterraPredAngle Specification

predModeIntra -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 2 3 4
intraPredAngle 512 341 256 171 128 102 86 73 64 57 51 45 39 35 32 29 26
predModeIntra 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
intraPredAngle 23 20 18 16 14 12 10 8 6 4 3 2 1 0 -1 -2 -3
predModeIntra 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
intraPredAngle -4 -6 -8 -10 -12 -14 -16 -18 -20 -23 -26 -29 -32 -29 -26 -23 -20
predModeIntra 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
intraPredAngle -18 -16 -14 -12 -10 -8 -6 -4 -3 -2 -1 0 1 2 3 4 6
predModeIntra 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
intraPredAngle 8 10 12 14 16 18 20 23 26 29 32 35 39 45 51 57 64
predModeIntra 73 74 75 76 77 78 79 80
intraPredAngle 73 86 102 128 171 256 341 512

The wide-angle modes can be identified by an absolute value of intraPredAngle greater than 32 (one sample), which corresponds to a slope of the prediction direction greater than 45 degrees.

Prediction samples ("predSamples") may be obtained from the neighboring samples "p" as follows: the values predSamples[ x ][ y ] of the prediction samples, with x = 0..nTbW - 1, y = 0..nTbH - 1, are derived as follows:

- if predModeIntra is greater than or equal to 34, the following ordered steps apply:

1. the reference sample array ref [ x ] is specified as follows:

the following applies:

ref[ x ] = p[ -1 - refIdx + x ][ -1 - refIdx ], with x = 0..nTbW + refIdx

If intraPredAngle is less than 0, then the main reference sample array is expanded as follows:

- when ( nTbH * intraPredAngle ) >> 5 is less than -1,

ref[x]=p[-1-refIdx][-1-refIdx+((x*invAngle+128)>>8)],

with x = -1..( nTbH * intraPredAngle ) >> 5

ref[((nTbH*intraPredAngle)>>5)-1]=ref[(nTbH*intraPredAngle)>>5]

ref[nTbW+1+refIdx]=ref[nTbW+refIdx]

- otherwise,

ref[ x ] = p[ -1 - refIdx + x ][ -1 - refIdx ], with x = nTbW + 1 + refIdx..refW + refIdx

ref[-1]=ref[0]

The additional samples ref[ refW + refIdx + x ], with x = 1..( Max( 1, nTbW / nTbH ) * refIdx + 1 ), are derived as follows:

ref[refW+refIdx+x]=p[-1+refW][-1-refIdx]

2. the values predSamples[ x ][ y ] of the prediction samples, with x = 0..nTbW - 1, y = 0..nTbH - 1, are derived as follows:

- the index variable iIdx and the multiplication factor iFact are derived as follows:

iIdx=((y+1+refIdx)*intraPredAngle)>>5+refIdx

iFact=((y+1+refIdx)*intraPredAngle)&31

- if cIdx is equal to 0, the following applies:

-the interpolation filter coefficients fT [ j ] (where j ═ 0..3) are derived as follows:

fT[ j ] = filterFlag ? fG[ iFact ][ j ] : fC[ iFact ][ j ]

- the values predSamples[ x ][ y ] of the prediction samples are derived as follows:

predSamples[ x ][ y ] = Clip1Y( ( ( Σ(i = 0..3) fT[ i ] * ref[ x + iIdx + i ] ) + 32 ) >> 6 )

- otherwise (cIdx is not equal to 0), depending on the value of iFact, the following applies:

if iFact is not equal to 0, the value predSamples [ x ] [ y ] of the prediction sample is derived as follows:

predSamples[ x ][ y ] = ( ( 32 - iFact ) * ref[ x + iIdx + 1 ] + iFact * ref[ x + iIdx + 2 ] + 16 ) >> 5

otherwise, the value predSamples [ x ] [ y ] of the prediction sample is derived as follows:

predSamples[x][y]=ref[x+iIdx+1]
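To summarize the derivation above for the luma case (cIdx equal to 0) with refIdx equal to 0, the following is a minimal sketch in C of the main prediction loop for predModeIntra greater than or equal to 34; fT denotes the per-phase 4-tap filters fC/fG of Table 2, assumed to be available, and clip1 plays the role of Clip1Y.

/* Clip a value to the valid sample range [0, maxVal] (the role of Clip1Y). */
static int clip1(int v, int maxVal) { return v < 0 ? 0 : (v > maxVal ? maxVal : v); }

/* Angular luma prediction, predModeIntra >= 34 branch, refIdx = 0:
 * each row y uses an integer offset iIdx and a 1/32-sample phase iFact. */
static void predictAngular(int *pred, int stride, const int *ref,
                           int nTbW, int nTbH, int intraPredAngle,
                           const int fT[32][4], int maxVal)
{
    for (int y = 0; y < nTbH; y++) {
        int iIdx  = ((y + 1) * intraPredAngle) >> 5;   /* integer sample offset */
        int iFact = ((y + 1) * intraPredAngle) & 31;   /* fractional phase */
        for (int x = 0; x < nTbW; x++) {
            int sum = 0;
            for (int i = 0; i < 4; i++)
                sum += fT[iFact][i] * ref[x + iIdx + i];
            pred[y * stride + x] = clip1((sum + 32) >> 6, maxVal);
        }
    }
}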

- otherwise (predModeIntra is less than 34), the following ordered steps apply:

1. the reference sample array ref [ x ] is specified as follows:

the following applies:

ref[ x ] = p[ -1 - refIdx ][ -1 - refIdx + x ], with x = 0..nTbH + refIdx

If intraPredAngle is less than 0, then the main reference sample array is expanded as follows:

- when ( nTbW * intraPredAngle ) >> 5 is less than -1,

ref[x]=p[-1-refIdx+((x*invAngle+128)>>8)][-1-refIdx],

with x = -1..( nTbW * intraPredAngle ) >> 5

ref[ ( ( nTbW * intraPredAngle ) >> 5 ) - 1 ] = ref[ ( nTbW * intraPredAngle ) >> 5 ]

ref[ nTbH + 1 + refIdx ] = ref[ nTbH + refIdx ]

- otherwise,

ref[ x ] = p[ -1 - refIdx ][ -1 - refIdx + x ], with x = nTbH + 1 + refIdx..refH + refIdx

ref[-1]=ref[0]

The additional samples ref[ refH + refIdx + x ], with x = 1..( Max( 1, nTbH / nTbW ) * refIdx + 1 ), are derived as follows:

ref[ refH + refIdx + x ] = p[ -1 - refIdx ][ -1 + refH ]

2. the values predSamples[ x ][ y ] of the prediction samples, with x = 0..nTbW - 1, y = 0..nTbH - 1, are derived as follows:

- the index variable iIdx and the multiplication factor iFact are derived as follows:

iIdx=((x+1+refIdx)*intraPredAngle)>>5

iFact=((x+1+refIdx)*intraPredAngle)&31

- if cIdx is equal to 0, the following applies:

-the interpolation filter coefficients fT [ j ] (where j ═ 0..3) are derived as follows:

fT[ j ] = filterFlag ? fG[ iFact ][ j ] : fC[ iFact ][ j ]

- the values predSamples[ x ][ y ] of the prediction samples are derived as follows:

predSamples[ x ][ y ] = Clip1Y( ( ( Σ(i = 0..3) fT[ i ] * ref[ y + iIdx + i ] ) + 32 ) >> 6 )

- otherwise (cIdx is not equal to 0), depending on the value of iFact, the following applies:

if iFact is not equal to 0, the value predSamples [ x ] [ y ] of the prediction sample is derived as follows:

predSamples[ x ][ y ] = ( ( 32 - iFact ) * ref[ y + iIdx + 1 ] + iFact * ref[ y + iIdx + 2 ] + 16 ) >> 5

otherwise, the value predSamples [ x ] [ y ] of the prediction sample is derived as follows:

predSamples[x][y]=ref[y+iIdx+1]

The interpolation filter coefficients fC[ phase ][ j ] and fG[ phase ][ j ] used in directional prediction are specified in Table 2, where phase = 0..31 and j = 0..3.

TABLE 2 Specification of interpolation Filter coefficients fC and fG

As shown in fig. 7 and 8, one method, called affine linear weighted intra prediction, uses reference samples to derive the values of the prediction samples. For predicting the samples of a rectangular block of width W and height H, Affine Linear Weighted Intra Prediction (ALWIP) takes one line of H reconstructed neighboring boundary samples to the left of the block and one line of W reconstructed neighboring boundary samples above the block as input. If the reconstructed samples are unavailable, they are generated as in conventional intra prediction.

The generation of the prediction signal is based on the following three steps:

1. Out of the boundary samples, 4 samples in the case of W = H = 4 and 8 samples in all other cases are extracted by averaging.

2. A matrix vector multiplication is performed with the averaged samples as input, followed by an addition of an offset. The result is a reduced prediction signal for the sub-sampled set of samples in the original block.

3. The prediction signal at the remaining positions is generated from the prediction signals of the sub-sampled set by linear interpolation, which is a single step linear interpolation in each direction.

The matrices and offset vectors needed to generate the prediction signal are taken from three sets of matrices, S_0, S_1, and S_2. The set S_0 consists of 18 matrices, each having 16 rows and 4 columns, and of 18 offset vectors of size 16. The matrices and offset vectors of that set are used for blocks of size 4 × 4. The set S_1 consists of 10 matrices, each having 16 rows and 8 columns, and of 10 offset vectors of size 16. The matrices and offset vectors of that set are used for blocks of sizes 4 × 8, 8 × 4, and 8 × 8. Finally, the set S_2 consists of 6 matrices, each having 64 rows and 8 columns, and of 6 offset vectors of size 64. The matrices and offset vectors of that set, or parts of these matrices and offset vectors, are used for all other block shapes.

The total number of multiplications required to compute the matrix-vector product is always less than or equal to 4 · W · H. In other words, for ALWIP mode, a maximum of four multiplications are required per sample.

Averaging of boundaries

In a first step, the input boundaries bdry^top and bdry^left are reduced to smaller boundaries bdry_red^top and bdry_red^left. Here, in the case of a 4 × 4 block, bdry_red^top and bdry_red^left both consist of 2 samples, and in all other cases they both consist of 4 samples.

In the case of a 4 × 4 block, for 0 <= i < 2, one defines

bdry_red^top[ i ] = ( bdry^top[ 2i ] + bdry^top[ 2i + 1 ] + 1 ) >> 1

and defines bdry_red^left analogously.

Otherwise, if the block width W is given as W = 4 * 2^k, for 0 <= i < 4, one defines

bdry_red^top[ i ] = ( ( Σ(j = 0..2^k - 1) bdry^top[ i * 2^k + j ] ) + ( 1 << ( k - 1 ) ) ) >> k

and defines bdry_red^left analogously.
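A minimal sketch in C of the boundary reduction just described, for a single boundary of size = outSize * 2^k samples; outSize is 2 for a 4 × 4 block and 4 otherwise (and 8 for the second version used on large blocks, described below).

/* Reduce a boundary of 'size' samples to 'outSize' samples by averaging
 * groups of 2^k neighbouring samples with rounding. */
static void reduceBoundary(const int *bdry, int size, int outSize, int *bdryRed)
{
    int k = 0;
    while ((outSize << k) < size)
        k++;                                   /* size == outSize * 2^k */
    for (int i = 0; i < outSize; i++) {
        int sum = 0;
        for (int j = 0; j < (1 << k); j++)
            sum += bdry[(i << k) + j];
        bdryRed[i] = k ? (sum + (1 << (k - 1))) >> k : sum;  /* rounded average */
    }
}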

The two reduced boundaries bdry_red^top and bdry_red^left are concatenated to a reduced boundary vector bdry_red, which is thus of size 4 for blocks of shape 4 × 4 and of size 8 for blocks of all other shapes. If mode refers to the ALWIP mode, this concatenation is defined as follows:

bdry_red = [ bdry_red^top, bdry_red^left ] for W = H = 4 and mode < 18
bdry_red = [ bdry_red^left, bdry_red^top ] for W = H = 4 and mode >= 18
bdry_red = [ bdry_red^top, bdry_red^left ] for max(W, H) = 8 and mode < 10
bdry_red = [ bdry_red^left, bdry_red^top ] for max(W, H) = 8 and mode >= 10
bdry_red = [ bdry_red^top, bdry_red^left ] for max(W, H) > 8 and mode < 6
bdry_red = [ bdry_red^left, bdry_red^top ] for max(W, H) > 8 and mode >= 6

Finally, for the interpolation of the subsampled prediction signal, a second version of the averaged boundary is needed on large blocks. Namely, if min(W, H) > 8 and W >= H, one writes W = 8 * 2^l and, for 0 <= i < 8, defines

bdry_redII^top[ i ] = ( ( Σ(j = 0..2^l - 1) bdry^top[ i * 2^l + j ] ) + ( 1 << ( l - 1 ) ) ) >> l.

If min(W, H) > 8 and H > W, bdry_redII^left is defined analogously.

Generating reduced prediction signal by matrix vector multiplication

Out of the reduced input vector bdry_red, a reduced prediction signal pred_red is generated.

The latter signal is a signal on the downsampled block of width W_red and height H_red. Here, W_red and H_red are defined as:

W_red = 4 for max(W, H) <= 8, and W_red = min(W, 8) for max(W, H) > 8;
H_red = 4 for max(W, H) <= 8, and H_red = min(H, 8) for max(W, H) > 8.

The reduced prediction signal pred_red is computed by calculating a matrix-vector product and adding an offset b:

pred_red = A · bdry_red + b.

Here, A is a matrix that has W_red * H_red rows and 4 columns in the case W = H = 4, and 8 columns in all other cases; b is a vector of size W_red * H_red.
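A minimal sketch in C of the reduced prediction pred_red = A · bdry_red + b; the matrix A is assumed to be stored row-major with inSize (4 or 8) columns, and any normalization shift tied to the coefficient precision is left out.

/* Reduced MIP prediction: one dot product per downsampled output sample. */
static void mipReducedPrediction(const int *A, const int *b, const int *bdryRed,
                                 int inSize, int wRed, int hRed, int *predRed)
{
    for (int r = 0; r < wRed * hRed; r++) {
        int acc = b ? b[r] : 0;                /* offset vector b (may be zero) */
        for (int c = 0; c < inSize; c++)
            acc += A[r * inSize + c] * bdryRed[c];
        predRed[r] = acc;                      /* normalization shift omitted */
    }
}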

The matrix A and the vector b are taken from one of the sets S_0, S_1, and S_2 as follows. One defines an index idx = idx(W, H) (also called the block size type) as follows:

idx(W, H) = 0 for W = H = 4; idx(W, H) = 1 for max(W, H) = 8; idx(W, H) = 2 for max(W, H) > 8.

Moreover, m is expressed as follows:

m = mode for W = H = 4 and mode < 18; m = mode - 17 for W = H = 4 and mode >= 18;
m = mode for max(W, H) = 8 and mode < 10; m = mode - 9 for max(W, H) = 8 and mode >= 10;
m = mode for max(W, H) > 8 and mode < 6; m = mode - 5 for max(W, H) > 8 and mode >= 6.

Then, if idx <= 1, or if idx = 2 and min(W, H) > 4, one takes A = A_idx^m and b = b_idx^m. In the case that idx = 2 and min(W, H) = 4, A is the matrix that arises by omitting every row of A_idx^m that, in the case W = 4, corresponds to an odd x-coordinate in the downsampled block or, in the case H = 4, corresponds to an odd y-coordinate in the downsampled block.

Finally, the reduced prediction signal is replaced by its transpose in the following cases:

- W = H = 4 and mode >= 18

- max(W, H) = 8 and mode >= 10

- max(W, H) > 8 and mode >= 6

The technical problem to be solved is as follows:

in ALWIP, the prediction samples are obtained using convolution of a set of coefficients.

Notably, the reduced prediction signal pred_red is computed by calculating a matrix-vector product and adding an offset:

pred_red = A · bdry_red + b.

If this step is placed side by side with the interpolation step used in directional intra prediction, some similarities may be noted.

First, multiplication is performed on a set of reference samples obtained from neighboring samples using reference sample filtering. Second, the prediction samples are obtained from the reference samples using a convolution operation. The two methods differ in their convolution kernels (A versus fT[ ]). Hence, despite the similarities, the steps of the two methods are different and would require separate hardware designs containing similar modules.

The solution and its advantages are as follows: the present disclosure unifies directional intra prediction and ALWIP by aligning the precision of the multiplication operations. This unification allows both methods to share a unified convolution step and thus eliminates hardware redundancy.

The core of the present disclosure may be expressed as an intra prediction method including the steps of:

-preparing a set of reference samples;

- obtaining a prediction signal by convolving the reference samples with a set of coefficients, where this set of coefficients may be adaptively defined depending on the position of the prediction sample,

-upsampling the prediction signal.

Each of these steps has parameters that can be adjusted. By defining a set of parameters, the sequence of steps can be operated as ALWIP or as directional intra prediction.

Fig. 7 shows a flow chart applicable to two methods: ALWIP and directional intra prediction. All steps shown in fig. 7 have the same input and output data, but the processing within each step varies according to the selected intra prediction method.

The first step of reference sample generation is performed as follows:

boundary averaging in the case of ALWIP

-sample selection and conditional filtering in case of directional intra prediction

The second step of convolving the reference samples with a convolution kernel requires that both methods use filter kernels of the same precision; the same number of coefficients is also expected to be used.

For the case of directional intra prediction, the last step of upsampling may be skipped. This design implies some limitations on the parameters used by both methods.

In particular, the set of coefficients should include coefficients of a given precision (i.e., bit depth).

In one embodiment, the coefficients of the directional interpolation filter are defined to have the same precision as the coefficients belonging to the matrix "A". For example, the set may be defined as given in table 1.

TABLE 1 Intra prediction interpolation Filter coefficients

In another embodiment, the coefficients of the matrix A have 6-bit precision, so that the processing of 10-bit samples fits into 16-bit operations.
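The 16-bit claim can be checked directly: a 6-bit unsigned coefficient magnitude is at most 63 and a 10-bit sample is at most 1023, so their product is at most 63 * 1023 = 64449, which fits into 16 bits. A minimal compile-time check in C:

/* 6-bit coefficient magnitude x 10-bit sample stays within 16 bits. */
_Static_assert(63 * 1023 <= 65535, "6-bit by 10-bit product fits in 16 bits");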

The ALWIP method may also be referred to as matrix-based intra prediction (MIP). The signaling of the intra prediction mode in the presence of MIP can be expressed as shown in table 1.

Table 1. signaling of intra prediction mode with MIP enabled.

The process of MPM list derivation requires the intra prediction modes of the neighboring blocks. However, even if MIP is not used for the current block, the neighboring blocks may be predicted using MIP and would thus have intra prediction modes that are inconsistent with the conventional non-MIP intra prediction modes. For this purpose, lookup tables (Tables 2-4) are introduced that map an input MIP mode index to a regular intra prediction mode.

Table 2 mode mapping lookup table for 4 x 4 block

Table 3 mode mapping lookup table for 8 x 8 block

MIP index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
intraPredMode 0 1 0 1 0 22 18 18 1 0 1 0 1 0 44 0 50 1 0

Table 4 mode mapping lookup table for 16 x 16 block

MIP index 0 1 2 3 4 5 6 7 8 9 10
intraPredMode 1 1 1 1 18 0 1 0 1 50 0
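A minimal sketch in C of this lookup-based mapping, using the 16 × 16 table (Table 4) above; the 4 × 4 and 8 × 8 block sizes use Tables 2 and 3 in the same way.

/* Table 4: MIP mode index -> regular intra prediction mode for 16x16 blocks. */
static const int mipToIntraMode16x16[11] = { 1, 1, 1, 1, 18, 0, 1, 0, 1, 50, 0 };

static int mapMipModeToIntraMode16x16(int mipIdx)
{
    return mipToIntraMode16x16[mipIdx];  /* mipIdx in 0..10 */
}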

When predicting a MIP block, its MPM list is constructed taking into account the neighboring non-MIP modes. These modes are mapped to MIP modes using two steps:

- in a first step, the directional intra prediction modes are mapped to a reduced set of directional modes (see Table 5);

- in a second step, a MIP mode is determined based on the directional mode determined from the reduced set of directional modes.

TABLE 5 mapping directional intra prediction modes to a reduced set of directional modes

intraPredMode 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
intraPredMode33 0 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9

intraPredMode 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
intraPredMode33 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18

intraPredMode 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
intraPredMode33 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27

intraPredMode 54 55 56 57 58 59 60 61 62 63 64 65 66 67
intraPredMode33 28 28 29 29 30 30 31 31 32 32 33 33 34 DM_CHROMA_IDX

As shown in FIG. 9, the MIP coefficients (the elements of the matrices A and of the offset vectors b) have a length of 10 bits, comprising a 9-bit value (magnitude) and a 1-bit sign. The statistics of the MIP coefficient values used in the reference software VTM-5.0 and in the H.266/VVC draft specification are shown in Table 6.

TABLE 6 statistics of MIP coefficients

Block size Values Range
4×4 -222...435 657
8×8 -207...476 683
16×16 -170...314 484

In the proposed invention, the values of the MIP coefficients are made consistent with the values of the filter coefficients used for interpolation filtering in intra prediction. This consistency enables the multipliers used for intra-prediction interpolation filtering to be reused for the matrix multiplication performed by MIP. To achieve this, the value of the MIP coefficients should not exceed 6 bits. As shown in fig. 10, the 6-bit value of the MIP coefficient, whose extraction depends on the position of the most significant non-zero bit (MSB), is extracted from the 9-bit value 1001. The 4 cases indicated by 1002-1005 may occur. To recover the 9-bit value and the sign of the MIP coefficient C_MIP, the following formula may be used:

C_MIP = v_sgn · ( q << s ),

where q is the 6-bit value of the MIP coefficient, s is the left-shift value, and v_sgn is the sign value of the MIP coefficient. The left-shift values, which depend on the most significant bit position, are shown in Table 7.

TABLE 7 left shift values affected by the most significant bit position

Case index in FIG. 10 Left shift value
1002 0
1003 1
1004 2
1005 3
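A minimal sketch in C of the coefficient recovery C_MIP = v_sgn · (q << s), with the left-shift value s taken from Table 7; here the sign v_sgn is passed as a flag (non-zero for negative).

/* Recover the 9-bit MIP coefficient magnitude (plus sign) from its packed form. */
static int recoverMipCoefficient(int negative, unsigned q, unsigned s)
{
    int magnitude = (int)(q << s);          /* q: 6-bit value, s: 0..3 (Table 7) */
    return negative ? -magnitude : magnitude;
}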

A representation of the MIP coefficient with a 6-bit value 1102 is shown in fig. 11. At the leftmost position, the MIP coefficient sign 1101 is placed. The 2-bit left-shift value is appended to the MIP coefficient value. Fig. 12 shows how the multipliers used for intra-prediction interpolation filtering are reused by MIP. The 6-bit value 1202 of the MIP coefficient and a pixel value 1203 (e.g., having a bit depth of 10 bits) are the inputs of a multiplier 1204, which provides a result having a value 1205 preceded by a sign 1206, where the sign 1206 has the same value as the sign 1201 of the MIP coefficient.

The multiplication operations in the matrix multiplication are performed with reduced bit depth by repositioning the shift operation after the multiplication:

p · C_MIP = v_sgn · ( ( p · q ) << s ),

where q is the value of the MIP coefficient, s is the left-shift value, v_sgn is the sign value of the MIP coefficient, and p is a reference sample.
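A minimal sketch in C of this reordered computation: the multiplier only ever sees the 6-bit magnitude q and the reference sample p, and the left shift and the sign are applied to the product afterwards, mirroring p · C_MIP = v_sgn · ((p · q) << s).

/* Multiply a reference sample by a packed MIP coefficient while keeping the
 * multiplier operands at 6-bit magnitude times sample bit depth. */
static int mipMultiplySample(int p, unsigned q, unsigned s, int negative)
{
    int product = p * (int)q;               /* reuses the interpolation multiplier */
    product <<= s;                          /* repositioned left shift */
    return negative ? -product : product;   /* sign applied last */
}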

Array 7. Exemplary values of MIP coefficients selected according to the proposed method.

Array 8. Exemplary values of MIP coefficients, defined in the C/C++ programming language, selected according to the proposed method.

Array 9. Exemplary values of MIP coefficients, defined in the C/C++ programming language, selected according to the proposed method.

The modified signaling is shown in table 9.

TABLE 9 unified signaling for Directional mode and MIP mode

If intra_lwip_flag is not signaled, it is inferred to be equal to 0.

The derivation of the MPM list in the case of joint signaling is represented as follows:

the MPM coding (intra _ luma _ MPM _ idx) is not modified, i.e. the processing for the directional intra prediction mode is invoked.

The non-MPM coding (intra_luma_mpm_remainder) is invoked. For the case where intra_lwip_flag is equal to 1, the symbol is encoded using either a truncated unary code or a fixed-length code, depending on the value of intra_lwip_mpm_flag. When this flag is set equal to 0, the number of modes encoded by the fixed-length code is set as follows:

cMax = ( cbWidth == 4 && cbHeight == 4 ) ? 31 : ( ( cbWidth <= 8 && cbHeight <= 8 ) ? 15 : 7 )

When intra_lwip_mpm_flag is equal to 1, the number of modes that can be encoded using the truncated unary code is fixed and does not depend on the size of the current block. Another embodiment of the invention makes use of a special flag for MPM coding. This flag may be referred to as intra_luma_planar_flag or intra_luma_not_planar_flag, indicating whether the intra prediction mode is planar or not planar, respectively. When intra_luma_mpm_flag is equal to 0, intra_luma_planar_flag is not signaled. According to this embodiment, the unified signaling mechanism may be defined as follows (see Table 10). In this embodiment, MIP MPM signaling is enabled only if intra_luma_not_planar_flag is set equal to 0, indicating that the intra prediction mode is not planar. Otherwise, when intra_luma_not_planar_flag is set equal to 1, it indicates that the intra prediction mode is planar. In this case, no additional MIP-related signaling is needed, since the planar mode is known not to belong to the set of MIP modes.

TABLE 10. Unified signaling for the directional modes and the MIP modes, making use of intra_luma_not_planar_flag

In another embodiment, a more general intra_luma_head_mpm_flag is used instead of intra_luma_not_planar_flag. As in the previous embodiment, it is assumed that, among the set of conventional intra prediction modes, the most probable mode is the planar mode. Therefore, when intra_lwip_flag is set to zero, intra_luma_head_mpm_flag set equal to 1 indicates that the intra prediction mode is planar. When intra_lwip_flag is set to 1, intra_luma_head_mpm_flag set equal to 1 indicates that the intra prediction mode is the most probable MIP mode. Table 11 gives the syntax of MIP, with the MPM-list part based on intra_luma_head_mpm_flag signaling.

TABLE 11. Unified signaling for the directional and MIP modes facilitating intra_luma_head_mpm_flag

For MIP block size types 0, 1 and 2, the most probable MIP mode is 17, 0 and 1, respectively.

The MIP mode is defined by a multiplication matrix A and an offset matrix b, which are applied to the reduced boundary bdry_red to obtain the reduced prediction samples pred_red:

pred_red = A·bdry_red + b.
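A minimal C sketch of this reduced prediction is shown below. The fixed-point precision follows the note under Table 12 (8-bit matrix precision for block size type 0, 9-bit for types 1 and 2); whether the offset b is applied before or after the right shift is not spelled out here, so applying it after the shift is our assumption, and all names are illustrative.

#include <stdint.h>

/* pred_red = A * bdry_red + b, with the matrix A stored at the given
 * fixed-point precision `shift`, so the accumulated product is shifted
 * right by `shift` (8 for Table 12, 9 for Tables 13 and 14). */
static void mip_reduced_pred(const int16_t *A, const int16_t *bdry_red,
                             int16_t b, int rows, int cols, int shift,
                             int16_t *pred_red)
{
    for (int i = 0; i < rows; i++) {
        int32_t acc = 0;
        for (int j = 0; j < cols; j++)
            acc += (int32_t)A[i * cols + j] * bdry_red[j];
        pred_red[i] = (int16_t)((acc >> shift) + b); /* b assumed post-shift */
    }
}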

Table 12 gives the multiplication matrix A and the offset value b for the most probable mode 17 of MIP block size type 0. The offset value b for MIP block size types 1 and 2 is equal to 0. Table 13 gives the multiplication matrix A for the most probable mode 0 of MIP block size type 1. Table 14 gives the multiplication matrix A for the most probable mode 1 of MIP block size type 2.

Table 12. Multiplication matrix A and offset value b for the most probable MIP mode 17 of block size type 0. (In Table 12, the multiplication matrix A is given with 8-bit precision, i.e., the multiplication result A·bdry_red should be shifted to the right by 8. The precision of the multiplication matrix A in Tables 13 and 14 is 9 bits.)

TABLE 12

Table 13. Multiplication matrix A for the most probable MIP mode 0 of block size type 1.

Table 14. Multiplication matrix A for the most probable MIP mode 1 of block size type 2.

When intra_luma_head_mpm_flag is signaled for MIP mode, the most probable MIP mode is indicated. This requires that the MPM list for MIP is constructed in such a way that the first element of the MPM list is always assigned the same most probable MIP mode (e.g., 17, 0 or 1 for MIP block size types 0, 1 and 2, respectively).
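A C sketch of such a construction is given below; only the fixed first entry follows from the text, while the list length of 3 and the derivation of the remaining entries are placeholders of ours.

/* The first MPM entry is fixed to the most probable MIP mode of the block
 * size type (17, 0 and 1 for types 0, 1 and 2), so intra_luma_head_mpm_flag
 * equal to 1 always denotes that mode. */
static void mip_build_mpm_list(int mipSizeType, int mpmList[3])
{
    static const int headMode[3] = { 17, 0, 1 };
    mpmList[0] = headMode[mipSizeType];
    /* mpmList[1] and mpmList[2] would be derived from neighbouring blocks
     * as usual (not shown). */
}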

The order in which the MRLP index (intra_luma_ref_idx), the ISP flag (intra_subpartitions_flag) and the MIP flag (intra_lwip_flag) are signaled may differ. Table 15 shows a variant of Table 10 in which the MIP flag (intra_lwip_flag) is followed by the MRLP index (intra_luma_ref_idx) and the ISP flag (intra_subpartitions_flag).

Table 15. Unified signaling for the directional and MIP modes facilitating intra_luma_not_planar_flag, with an alternative flag coding order.

Fig. 13 illustrates a method of intra prediction according to the present disclosure, wherein the method of intra prediction is a directional intra prediction method or an ALWIP method, wherein the method includes the steps of: preparing a set of reference samples (step 1601); in the case where the method of intra prediction for the first block is directional intra prediction (step 1603: yes): obtaining (step 1605) a first prediction signal for a first block of a first picture by convolving a set of reference samples with a first set of coefficients; obtaining (step 1607) a first reconstructed block of the first picture from the first prediction signal; and in the case that the method of intra prediction of the second block is ALWIP (step 1603: NO): obtaining (step 1609) a second prediction signal for a second block of the second picture by convolving a set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix a of the ALWIP, and the coefficients of the core matrix a and the first set of coefficients have the same precision; upsampling (1611) the second prediction signal; and obtaining (1613) a second reconstructed block of the second picture from the upsampled second prediction signal.

Fig. 14 shows an encoder 20 according to the present disclosure. The encoder 20 of fig. 14 includes: a preparation unit 2001 configured to prepare a set of reference samples; a first obtaining unit 2003 configured to: in the case of intra-predicting the first block by directional intra-prediction: obtaining a first prediction signal for a first block of the first picture by convolving a set of reference samples with a first set of coefficients, and obtaining a first reconstructed block of the first picture from the first prediction signal; a second obtaining unit 2005 configured to: in the case of intra-prediction of a second block of a second picture by ALWIP: obtaining a second prediction signal for a second block of the second picture by convolving a set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix a of the ALWIP, and the coefficients of the core matrix a have the same precision as the first set of coefficients; upsampling the second prediction signal; and obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

Fig. 15 shows a decoder 30 according to the present disclosure. The decoder 30 of fig. 15 includes: a preparation unit 3001 configured to prepare a set of reference samples; a first obtaining unit 3003 configured to: in the case of intra-predicting the first block by directional intra-prediction: obtaining a first prediction signal for a first block of the first picture by convolving a set of reference samples with a first set of coefficients, and obtaining a first reconstructed block of the first picture from the first prediction signal; a second obtaining unit (3005) configured to: in the case of intra-prediction of a second block of a second picture by ALWIP: obtaining a second prediction signal for a second block of the second picture by convolving a set of reference samples with a second set of coefficients, wherein the second set of coefficients comprises coefficients of a core matrix a of the ALWIP, and the coefficients of the core matrix a and the first set of coefficients have the same precision; upsampling the second prediction signal; and obtaining a second reconstructed block of the second picture from the upsampled second prediction signal.

In the encoder 20 according to fig. 14 and/or in the decoder 30 according to fig. 15, the first obtaining unit and the second obtaining unit may be identical.

Mathematical operators

The mathematical operators used in this application are similar to those used in the C programming language. However, the results of integer division and arithmetic shift operations are more precisely defined, and additional operations are defined, such as exponentiation and real-valued division. The numbering and counting convention generally starts with 0, e.g., "first" corresponds to 0 th, "second" corresponds to 1 st, and so on.

Arithmetic operator

The following arithmetic operators are defined as follows:

+ Addition

- Subtraction (as a two-argument operator) or negation (as a unary prefix operator)

· Multiplication, including matrix multiplication

x^y Exponentiation. Specifies x to the power of y. In other contexts, such notation is used for superscripting and is not intended to be interpreted as exponentiation.

/ Integer division with the result truncated toward zero. For example, 7/4 and -7/-4 are truncated to 1, while -7/4 and 7/-4 are truncated to -1.

÷ Used to denote division in mathematical equations where no truncation or rounding is intended.

x/y Used to denote division in mathematical equations where no truncation or rounding is intended.

Σ f(i), i = x..y The summation of f(i) with i taking all integer values from x up to and including y.

x % y Modulus. Remainder of x divided by y, defined only for integers x and y with x >= 0 and y > 0.
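As these division and modulus conventions coincide with C99 integer semantics, the small check below merely evaluates the examples given above; it is provided for illustration only.

#include <stdio.h>

int main(void)
{
    /* Integer division truncates toward zero, as in C99: */
    printf("%d %d %d %d\n", 7 / 4, -7 / -4, -7 / 4, 7 / -4); /* 1 1 -1 -1 */
    /* Modulus, defined here only for x >= 0 and y > 0: */
    printf("%d\n", 7 % 4);                                    /* 3 */
    return 0;
}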

Logical operators

The following logical operators are defined as follows:

x && y Boolean logical "and" of x and y

x || y Boolean logical "or" of x and y

! Boolean logical "not"

x ? y : z If x is true or not equal to 0, evaluates to the value of y; otherwise, evaluates to the value of z.

Relational operators

The following relational operators are defined as follows:

> Greater than

>= Greater than or equal to

< Less than

<= Less than or equal to

== Equal to

!= Not equal to

When a relational operator is applied to a syntax element or variable that has been assigned the value "na" (not applicable), the value "na" is treated as a distinct value for the syntax element or variable. The value "na" is considered not to be equal to any other value.

Bitwise operator

The following bitwise operators are defined as follows:

& Bitwise "and". When operating on integer parameters, operates on the two's complement representation of the integer value. When operating on a binary parameter that contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.

| Bitwise "or". When operating on integer parameters, operates on the two's complement representation of the integer value. When operating on a binary parameter that contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.

^ Bitwise "exclusive or". When operating on integer parameters, operates on the two's complement representation of the integer value. When operating on a binary parameter that contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.

x >> y Arithmetic right shift of the two's complement integer representation of x by y binary digits. This function is defined only for non-negative integer values of y. Bits shifted into the most significant bits (MSBs) as a result of the right shift have a value equal to the MSB of x prior to the shift operation.

x << y Arithmetic left shift of the two's complement integer representation of x by y binary digits. This function is defined only for non-negative integer values of y. Bits shifted into the least significant bits (LSBs) as a result of the left shift have a value equal to 0.
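The right-shift semantics defined above (sign extension into the MSBs) match what most C compilers produce for signed integers, but the C standard leaves signed right shift implementation-defined; the portable sketch below is one way of obtaining the defined behaviour and is not taken from the specification.

#include <stdint.h>

/* x >> y with the sign-extension semantics defined above, portable even
 * where C's right shift of negative values is implementation-defined. */
static int32_t arith_shr(int32_t x, unsigned y)
{
    if (x >= 0)
        return (int32_t)((uint32_t)x >> y);
    /* Shift the bitwise complement (which is non-negative), then complement
     * back, which fills the vacated MSBs with ones. */
    return (int32_t)~(~(uint32_t)x >> y);
}

/* x << y: bits shifted into the LSBs are equal to 0. */
static int32_t arith_shl(int32_t x, unsigned y)
{
    return (int32_t)((uint32_t)x << y);
}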

Assignment operators

The following assignment operators are defined as follows:

= Assignment operator

++ Increment, i.e., x++ is equivalent to x = x + 1; when used in an array index, evaluates to the value of the variable prior to the increment operation.

-- Decrement, i.e., x-- is equivalent to x = x - 1; when used in an array index, evaluates to the value of the variable prior to the decrement operation.

+= Increment by the amount specified, i.e., x += 3 is equivalent to x = x + 3, and x += (-3) is equivalent to x = x + (-3).

-= Decrement by the amount specified, i.e., x -= 3 is equivalent to x = x - 3, and x -= (-3) is equivalent to x = x - (-3).

Symbol of range

The following notation is used to designate ranges of values:

x = y..z x takes integer values from y to z, inclusive, where x, y and z are integers and z is greater than y.

Mathematical function

The following mathematical functions are defined:

Asin(x) Trigonometric inverse sine function, operating on a parameter x in the range of -1.0 to 1.0, inclusive, with an output value in radians in the range of -π÷2 to π÷2, inclusive

Atan(x) Trigonometric inverse tangent function, operating on a parameter x, with an output value in radians in the range of -π÷2 to π÷2, inclusive

Ceil(x) The smallest integer greater than or equal to x

Clip1Y(x) = Clip3(0, (1 << BitDepthY) - 1, x)

Clip1C(x) = Clip3(0, (1 << BitDepthC) - 1, x)

Cos(x) Trigonometric cosine function operating on a parameter x in radians

Floor(x) The largest integer less than or equal to x

Ln(x) The natural logarithm of x (the base-e logarithm, where e is the natural logarithm base constant 2.718281828...)

Log2(x) The base-2 logarithm of x

Log10(x) The base-10 logarithm of x

Round(x) = Sign(x) * Floor(Abs(x) + 0.5)

Sin(x) Trigonometric sine function operating on a parameter x in radians

Swap(x, y) = (y, x)

Tan(x) Trigonometric tangent function operating on a parameter x in radians
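For reference, a C transcription of a few of these functions follows. Clip3, on which Clip1Y and Clip1C rely, is not reproduced in this excerpt, so the conventional definition Clip3(x, y, z) = Min(Max(x, z), y) is assumed here; the lower-case function names are ours.

#include <math.h>
#include <stdint.h>

/* Clip3(x, y, z): z clipped to the range [x, y] (assumed definition). */
static int32_t clip3(int32_t x, int32_t y, int32_t z)
{
    return z < x ? x : (z > y ? y : z);
}

/* Clip1Y(x) = Clip3(0, (1 << BitDepthY) - 1, x); same form for chroma. */
static int32_t clip1(int32_t x, int bitDepth)
{
    return clip3(0, (1 << bitDepth) - 1, x);
}

/* Round(x) = Sign(x) * Floor(Abs(x) + 0.5) */
static double round_spec(double x)
{
    double r = floor(fabs(x) + 0.5);
    return (x < 0) ? -r : r;
}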

Order of operation priority

When the priority order in the expression is not explicitly indicated with parentheses, the following rule applies:

-the operation with the higher priority is performed before any operation with the lower priority.

-performing the operations of the same priority in turn from left to right.

The following table specifies the operational priorities from highest to lowest; a higher position in the table indicates a higher priority.

For those operators that are also used in the C programming language, the priority order used in this specification is the same as the priority order used in the C programming language.

Table: operation priority from highest (at the top of the table) to lowest (at the bottom of the table)

Textual description of logical operations

In the text, a statement of logical operations as would be described mathematically in the following form:

if(condition 0)
statement 0
else if(condition 1)
statement 1
...
else /* informative remark on remaining condition */
statement n

may be described in the following manner:

... as follows / ... the following applies:

- If condition 0, statement 0

- Otherwise, if condition 1, statement 1

- ...

- Otherwise (informative remark on remaining condition), statement n

Each "If ... Otherwise, if ... Otherwise, ..." statement in the text is introduced with "... as follows" or "... the following applies" immediately followed by "If ...". The last condition of "If ... Otherwise, if ... Otherwise, ..." is always an "Otherwise, ...". Interleaved "If ... Otherwise, if ... Otherwise, ..." statements can be identified by matching "... as follows" or "... the following applies" with the ending "Otherwise, ...".

In the text, a statement of logical operations as would be described mathematically in the following form:

if(condition 0a && condition 0b)
statement 0
else if(condition 1a || condition 1b)
statement 1
...
else
statement n

may be described in the following manner:

... as follows / ... the following applies:

- If all of the following conditions are true, statement 0:

- condition 0a

- condition 0b

- Otherwise, if one or more of the following conditions are true, statement 1:

- condition 1a

- condition 1b

- ...

- Otherwise, statement n

In the text, a statement of logical operations as would be described mathematically in the following form:

if(condition 0)

statement 0

if(condition 1)

statement 1

may be described in the following manner:

When condition 0, statement 0

When condition 1, statement 1

Although embodiments of the present invention have been described primarily based on video coding, it should be noted that embodiments of the coding system 10, encoder 20 and decoder 30 (and, correspondingly, system 10) and the other embodiments described herein may also be configured for still picture processing or coding, i.e., the processing or coding of an individual picture independent of any preceding or consecutive picture, as in video coding. In general, in case the picture processing coding is limited to a single picture 17, only the inter prediction units 244 (encoder) and 344 (decoder) may be unavailable. All other functions (also referred to as tools or technologies) of the video encoder 20 and the video decoder 30 may equally be used for still picture processing, e.g., residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, partitioning 262/362, intra prediction 254/354 and/or loop filtering 220, 320, as well as entropy coding 270 and entropy decoding 304.

For example, embodiments of encoder 20 and decoder 30 and functions described herein, for example, with reference to encoder 20 and decoder 30, may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code over a communication medium and executed by a hardware-based processing unit. The computer readable medium may include: a computer-readable storage medium corresponding to a tangible medium such as a data storage medium; or communications media including any medium that facilitates transfer of a computer program from one place to another, such as according to a communications protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in certain aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be implemented entirely in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in various apparatuses or devices including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but the various components, modules, or units do not necessarily need to be implemented by different hardware units. In particular, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units (including one or more processors as described above) in combination with appropriate software and/or firmware.

The present disclosure also discloses the following seven aspects:

a first aspect of the intra prediction method includes the steps of:

preparing a set of reference samples; convolving the reference samples; and upsampling the prediction signal.

For this aspect, the set of coefficients may be adaptively defined according to the position of the prediction sample.

A second aspect of the method according to the first aspect, wherein the set of coefficients used for directional intra prediction has the same precision as the coefficients of the core matrix A of the ALWIP.

A third aspect of the method according to the first aspect, wherein upsampling is skipped for directional intra prediction.

A fourth aspect of the decoder comprises processing circuitry for performing the method according to any of the first to third aspects.

A fifth aspect of the computer program product comprises program code for performing the method according to any of the first to third aspects.

A sixth aspect of the decoder, comprising: one or more processors; and

a non-transitory computer readable storage medium coupled to a processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the decoder to perform the method according to any one of the first to third aspects.

A seventh aspect of the encoder, comprising: one or more processors; and

a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the encoder to perform the method according to any one of the first to third aspects.

In addition, the present disclosure discloses the following thirteen aspects.

A first aspect of a method for intra prediction of a block, comprising:

-obtaining two rows of reconstructed neighboring samples;

-deriving a set of reference samples based on the two rows of reconstructed neighboring samples;

- obtaining a set of MIP coefficients based on an intra prediction mode obtained from the bitstream,

wherein a MIP coefficient C_MIP of the set of MIP coefficients is obtained (recovered) by:

C_MIP = v_sgn·(q << s),

wherein q is the value of the MIP coefficient; s is the left-shift value; v_sgn is the sign value of the MIP coefficient;

-obtaining a prediction block based on a set of reference samples and a set of MIP coefficients;

-wherein the reconstructed picture is obtained based on the prediction block.

A second aspect of the method according to the first aspect, wherein obtaining the prediction block based on the set of reference samples and the set of MIP coefficients comprises a matrix multiplication of the reference samples and the set of MIP coefficients, wherein the multiplication operations in the matrix multiplication are performed at a reduced bit depth by relocating the shift operation to after the multiplication:

p·C_MIP = v_sgn·((p·q) << s),

wherein q is the value of the MIP coefficient; s is the left-shift value; v_sgn is the sign value of the MIP coefficient; and p is a reference sample.

A third aspect of the method according to any of the first or second aspects, wherein the value q of the MIP coefficients is a 6-bit depth value.

A fourth aspect of the method according to any of the preceding aspects, wherein the left-shifted value is a 2-bit depth value.

A fifth aspect of the method according to any one of the preceding aspects, wherein the multiplication is performed by means of a multiplier used in an intra interpolation process of the angular intra prediction.

A sixth aspect of the method according to any preceding aspect, wherein the MIP coefficients are any one of the following arrays:

a seventh aspect of the method according to any of the preceding aspects, wherein the MIP coefficients are any one of the following arrays:

an eighth aspect of the method according to any of the preceding aspects, wherein the MIP coefficients are any one of the following arrays:

a ninth aspect of the encoder comprises processing circuitry for performing the method according to any one of the first to eighth aspects.

A tenth aspect of the decoder comprises processing circuitry for performing the method according to any one of the first to eighth aspects.

An eleventh aspect of the computer program product comprises program code for performing the method according to any of the first to eighth aspects.

A twelfth aspect of the decoder, comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the decoder to perform the method according to any one of the first to eighth aspects.

A thirteenth aspect of the encoder, comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, configures the encoder to perform the method according to any one of the first to eighth aspects.
