Method and apparatus for bit depth control of bi-directional optical flow

Document No.: 214976 · Published: 2021-11-05

Note: This technology, "Method and apparatus for bit depth control of bi-directional optical flow", was devised by Xiaoyu Xiu (修晓宇), Yi-Wen Chen (陈漪纹), and Xianglin Wang (王祥林) on 2020-03-26. Abstract: A bit depth control method, apparatus, and non-transitory computer-readable storage medium are provided. The method includes: obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block; obtaining a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0); obtaining a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1); controlling the internal bit depth of bi-directional optical flow (BDOF) by applying a right shift to internal BDOF parameters; based on BDOF being applied to the video block, obtaining motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j); and obtaining bi-directional prediction samples of the video block based on the motion refinements.

1. A bi-directional optical flow (BDOF) bit depth control method for encoding and decoding a video signal, the bit depth control method comprising:

obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block, wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;

obtaining a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0), wherein i and j represent the coordinates of a sample of the current picture;

obtaining a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1);

controlling an internal bit depth of BDOF by applying a right shift to internal BDOF parameters when a coded bit depth is greater than 12 bits, wherein the internal BDOF parameters comprise horizontal gradient values and vertical gradient values obtained based on the first prediction sample I^(0)(i, j), the second prediction sample I^(1)(i, j), and the sample difference between the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j);

based on BDOF being applied to the video block, obtaining motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j); and

obtaining bi-directional prediction samples of the video block based on the motion refinements.

2. The method of claim 1, wherein controlling the internal bit depth of BDOF by applying the right shift to the internal BDOF parameters when the coded bit depth is greater than 12 bits comprises:

obtaining a first horizontal gradient value of the first prediction sample I^(0)(i, j) based on a difference between the first prediction sample I^(0)(i+1, j) and the first prediction sample I^(0)(i-1, j);

obtaining a second horizontal gradient value of the second prediction sample I^(1)(i, j) based on a difference between the second prediction sample I^(1)(i+1, j) and the second prediction sample I^(1)(i-1, j);

obtaining a first vertical gradient value of the first prediction sample I^(0)(i, j) based on a difference between the first prediction sample I^(0)(i, j+1) and the first prediction sample I^(0)(i, j-1);

obtaining a second vertical gradient value of the second prediction sample I^(1)(i, j) based on a difference between the second prediction sample I^(1)(i, j+1) and the second prediction sample I^(1)(i, j-1);

right-shifting the first horizontal gradient value and the second horizontal gradient value by a first shift value; and

right-shifting the first vertical gradient value and the second vertical gradient value by the first shift value.

3. The method of claim 2, wherein the first shift value is equal to the coded bit depth minus 6.
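As a concrete illustration of claims 2 and 3, the gradient step can be sketched as follows. This is a minimal sketch, not the normative derivation: the array `pred`, the function name, and the sample layout are illustrative, and Python's arithmetic right shift on signed integers stands in for the claimed right-shift operation.

```python
def bdof_gradients(pred, i, j, bit_depth):
    """Central-difference gradients of a prediction sample, right-shifted by
    the first shift value (coded bit depth minus 6, per claims 2-3)."""
    shift1 = bit_depth - 6                                # first shift value
    grad_h = (pred[j][i + 1] - pred[j][i - 1]) >> shift1  # horizontal gradient
    grad_v = (pred[j + 1][i] - pred[j - 1][i]) >> shift1  # vertical gradient
    return grad_h, grad_v
```

The same routine would be applied once per reference list, i.e., to both the first and second prediction samples.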

4. The method of claim 1, further comprising:

obtaining a first correlation value, wherein the first correlation value is a sum of the horizontal gradient value based on the first prediction sample I^(0)(i, j) and the horizontal gradient value based on the second prediction sample I^(1)(i, j);

obtaining a second correlation value, wherein the second correlation value is a sum of the vertical gradient value based on the first prediction sample I^(0)(i, j) and the vertical gradient value based on the second prediction sample I^(1)(i, j);

modifying the first correlation value by right-shifting the first correlation value using a second shift value; and

modifying the second correlation value by right-shifting the second correlation value using the second shift value.

5. The method of claim 4, wherein the second shift value is equal to the coded bit depth minus 11.
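Claims 4 and 5 can be sketched the same way: the two correlation values are gradient sums, right-shifted by a second shift value of the coded bit depth minus 11 (non-negative because the claims apply only when the coded bit depth exceeds 12 bits). All names here are illustrative, not from the text.

```python
def bdof_correlations(grad_h0, grad_h1, grad_v0, grad_v1, bit_depth):
    """First/second correlation values: sums of the list-0 and list-1
    horizontal/vertical gradients, right-shifted by the second shift value
    (coded bit depth minus 11, per claims 4-5)."""
    shift2 = bit_depth - 11                 # second shift value
    first = (grad_h0 + grad_h1) >> shift2   # first correlation value
    second = (grad_v0 + grad_v1) >> shift2  # second correlation value
    return first, second
```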

6. The method of claim 4, further comprising:

obtaining a first modified prediction sample by right-shifting the first prediction sample I^(0)(i, j) using a third shift value;

obtaining a second modified prediction sample by right-shifting the second prediction sample I^(1)(i, j) using the third shift value; and

obtaining a third correlation value, wherein the third correlation value is a difference between the first modified prediction sample and the second modified prediction sample.

7. The method of claim 6, wherein the third shift value is equal to the coded bit depth minus 8.
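Claims 6 and 7 describe the third correlation value: each prediction sample is right-shifted by a third shift value (coded bit depth minus 8) before their difference is taken. A hedged sketch with illustrative names:

```python
def bdof_third_correlation(p0, p1, bit_depth):
    """Third correlation value: difference between the two prediction samples
    after each is right-shifted by the third shift value (claims 6-7)."""
    shift3 = bit_depth - 8                  # third shift value
    return (p0 >> shift3) - (p1 >> shift3)  # third correlation value
```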

8. The method of claim 6, further comprising:

obtaining a first internal sum value based on a sum of squares of the first correlation values within each 4 x 4 sub-block of the video block;

obtaining a second internal sum value based on a sum of products of the first correlation value and the second correlation value within each 4 x 4 sub-block of the video block;

obtaining a third internal sum value based on a sum of products of the first correlation value and the third correlation value within each 4 x 4 sub-block of the video block;

obtaining a fourth internal sum value based on a sum of squares of the second correlation values within each 4 x 4 sub-block of the video block;

obtaining a fifth internal sum value based on a sum of products of the second correlation value and the third correlation value within each 4 x 4 sub-block of the video block;

obtaining a horizontal motion refinement value based on a quotient of the third internal sum value and the first internal sum value, wherein the motion refinements comprise the horizontal motion refinement value;

obtaining a vertical motion refinement value based on the second internal sum value, the fourth internal sum value, the fifth internal sum value, and the horizontal motion refinement value, wherein the motion refinements comprise the vertical motion refinement value; and

clipping the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.

9. The method of claim 8, wherein the motion refinement threshold is determined as 2 raised to the power of the coded bit depth minus 7.
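Claims 8 and 9 accumulate the three correlation values over each 4 × 4 sub-block and derive the clipped motion refinement. The sketch below follows the claim language only: the exact rounding of the two divisions and any additional scaling in the normative derivation are simplified to plain integer division, and all names are illustrative.

```python
def bdof_motion_refinement(corr1, corr2, corr3, bit_depth):
    """Five internal sums over a 4x4 sub-block, then horizontal/vertical
    motion refinement values clipped to +/- 2**(bit_depth - 7) per claim 9.
    corr1..corr3 are 4x4 grids of the first/second/third correlation values."""
    s1 = s2 = s3 = s4 = s5 = 0
    for y in range(4):
        for x in range(4):
            s1 += corr1[y][x] * corr1[y][x]  # sum of squares of corr1
            s2 += corr1[y][x] * corr2[y][x]  # sum of corr1*corr2 products
            s3 += corr1[y][x] * corr3[y][x]  # sum of corr1*corr3 products
            s4 += corr2[y][x] * corr2[y][x]  # sum of squares of corr2
            s5 += corr2[y][x] * corr3[y][x]  # sum of corr2*corr3 products
    vx = 0 if s1 == 0 else s3 // s1              # horizontal refinement (s3/s1)
    vy = 0 if s4 == 0 else (s5 - vx * s2) // s4  # vertical refinement
    threshold = 1 << (bit_depth - 7)             # motion refinement threshold
    clip = lambda v: max(-threshold, min(threshold, v))
    return clip(vx), clip(vy)
```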

10. A computing device, comprising:

one or more processors;

a non-transitory computer-readable storage medium storing instructions executable by the one or more processors, wherein the one or more processors are configured to:

obtain a first reference picture I^(0) and a second reference picture I^(1) associated with a video block, wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;

obtain a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0), wherein i and j represent the coordinates of a sample of the current picture;

obtain a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1);

control an internal bit depth of bi-directional optical flow (BDOF) by applying a right shift to internal BDOF parameters when a coded bit depth is greater than 12 bits, wherein the internal BDOF parameters comprise horizontal gradient values and vertical gradient values obtained based on the first prediction sample I^(0)(i, j), the second prediction sample I^(1)(i, j), and the sample difference between the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j);

based on BDOF being applied to the video block, obtain motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j); and

obtain bi-directional prediction samples of the video block based on the motion refinements.

11. The computing device of claim 10, wherein, to control the internal bit depth of BDOF by applying the right shift to the internal BDOF parameters when the coded bit depth is greater than 12 bits, the one or more processors are further configured to:

obtain a first horizontal gradient value of the first prediction sample I^(0)(i, j) based on a difference between the first prediction sample I^(0)(i+1, j) and the first prediction sample I^(0)(i-1, j);

obtain a second horizontal gradient value of the second prediction sample I^(1)(i, j) based on a difference between the second prediction sample I^(1)(i+1, j) and the second prediction sample I^(1)(i-1, j);

obtain a first vertical gradient value of the first prediction sample I^(0)(i, j) based on a difference between the first prediction sample I^(0)(i, j+1) and the first prediction sample I^(0)(i, j-1);

obtain a second vertical gradient value of the second prediction sample I^(1)(i, j) based on a difference between the second prediction sample I^(1)(i, j+1) and the second prediction sample I^(1)(i, j-1);

right-shift the first horizontal gradient value and the second horizontal gradient value by a first shift value; and

right-shift the first vertical gradient value and the second vertical gradient value by the first shift value.

12. The computing device of claim 11, wherein the first shift value is equal to the coded bit depth minus 6.

13. The computing device of claim 10, wherein the one or more processors are further configured to:

obtain a first correlation value, wherein the first correlation value is a sum of the horizontal gradient value based on the first prediction sample I^(0)(i, j) and the horizontal gradient value based on the second prediction sample I^(1)(i, j);

obtain a second correlation value, wherein the second correlation value is a sum of the vertical gradient value based on the first prediction sample I^(0)(i, j) and the vertical gradient value based on the second prediction sample I^(1)(i, j);

modify the first correlation value by right-shifting the first correlation value using a second shift value; and

modify the second correlation value by right-shifting the second correlation value using the second shift value.

14. The computing device of claim 13, wherein the second shift value is equal to the coded bit depth minus 11.

15. The computing device of claim 13, wherein the one or more processors are further configured to:

obtain a first modified prediction sample by right-shifting the first prediction sample I^(0)(i, j) using a third shift value;

obtain a second modified prediction sample by right-shifting the second prediction sample I^(1)(i, j) using the third shift value; and

obtain a third correlation value, wherein the third correlation value is a difference between the first modified prediction sample and the second modified prediction sample.

16. The computing device of claim 15, wherein the third shift value is equal to the coded bit depth minus 8.

17. The computing device of claim 15, wherein the one or more processors are further configured to:

obtain a first internal sum value based on a sum of squares of the first correlation values within each 4 × 4 sub-block of the video block;

obtain a second internal sum value based on a sum of products of the first correlation value and the second correlation value within each 4 × 4 sub-block of the video block;

obtain a third internal sum value based on a sum of products of the first correlation value and the third correlation value within each 4 × 4 sub-block of the video block;

obtain a fourth internal sum value based on a sum of squares of the second correlation values within each 4 × 4 sub-block of the video block;

obtain a fifth internal sum value based on a sum of products of the second correlation value and the third correlation value within each 4 × 4 sub-block of the video block;

obtain a horizontal motion refinement value based on a quotient of the third internal sum value and the first internal sum value, wherein the motion refinements comprise the horizontal motion refinement value;

obtain a vertical motion refinement value based on the second internal sum value, the fourth internal sum value, the fifth internal sum value, and the horizontal motion refinement value, wherein the motion refinements comprise the vertical motion refinement value; and

clip the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.

18. The computing device of claim 17, wherein the motion refinement threshold is determined as 2 raised to the power of the coded bit depth minus 7.

19. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to:

obtain a first reference picture I^(0) and a second reference picture I^(1) associated with a video block, wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;

obtain a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0), wherein i and j represent the coordinates of a sample of the current picture;

obtain a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1);

control an internal bit depth of bi-directional optical flow (BDOF) by applying a right shift to internal BDOF parameters when a coded bit depth is greater than 12 bits, wherein the internal BDOF parameters comprise horizontal gradient values and vertical gradient values obtained based on the first prediction sample I^(0)(i, j), the second prediction sample I^(1)(i, j), and the sample difference between the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j);

based on BDOF being applied to the video block, obtain motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j); and

obtain bi-directional prediction samples of the video block based on the motion refinements.

20. The non-transitory computer-readable storage medium of claim 19, wherein, to control the internal bit depth of BDOF by applying the right shift to the internal BDOF parameters when the coded bit depth is greater than 12 bits, the plurality of programs further cause the computing device to:

obtain a first horizontal gradient value of the first prediction sample I^(0)(i, j) based on a difference between the first prediction sample I^(0)(i+1, j) and the first prediction sample I^(0)(i-1, j);

obtain a second horizontal gradient value of the second prediction sample I^(1)(i, j) based on a difference between the second prediction sample I^(1)(i+1, j) and the second prediction sample I^(1)(i-1, j);

obtain a first vertical gradient value of the first prediction sample I^(0)(i, j) based on a difference between the first prediction sample I^(0)(i, j+1) and the first prediction sample I^(0)(i, j-1);

obtain a second vertical gradient value of the second prediction sample I^(1)(i, j) based on a difference between the second prediction sample I^(1)(i, j+1) and the second prediction sample I^(1)(i, j-1);

right-shift the first horizontal gradient value and the second horizontal gradient value by a first shift value; and

right-shift the first vertical gradient value and the second vertical gradient value by the first shift value.

21. The non-transitory computer-readable storage medium of claim 20, wherein the first shift value is equal to the coded bit depth minus 6.

22. The non-transitory computer readable storage medium of claim 19, wherein the plurality of programs further cause the computing device to:

obtain a first correlation value, wherein the first correlation value is a sum of the horizontal gradient value based on the first prediction sample I^(0)(i, j) and the horizontal gradient value based on the second prediction sample I^(1)(i, j);

obtain a second correlation value, wherein the second correlation value is a sum of the vertical gradient value based on the first prediction sample I^(0)(i, j) and the vertical gradient value based on the second prediction sample I^(1)(i, j);

modify the first correlation value by right-shifting the first correlation value using a second shift value; and

modify the second correlation value by right-shifting the second correlation value using the second shift value.

23. The non-transitory computer readable storage medium of claim 22, wherein the second shift value is equal to the coded bit depth minus 11.

24. The non-transitory computer readable storage medium of claim 22, wherein the plurality of programs further cause the computing device to:

obtain a first modified prediction sample by right-shifting the first prediction sample I^(0)(i, j) using a third shift value;

obtain a second modified prediction sample by right-shifting the second prediction sample I^(1)(i, j) using the third shift value; and

obtain a third correlation value, wherein the third correlation value is a difference between the first modified prediction sample and the second modified prediction sample.

25. The non-transitory computer-readable storage medium of claim 24, wherein the third shift value is equal to the coded bit depth minus 8.

26. The non-transitory computer readable storage medium of claim 24, wherein the plurality of programs further cause the computing device to:

obtain a first internal sum value based on a sum of squares of the first correlation values within each 4 × 4 sub-block of the video block;

obtain a second internal sum value based on a sum of products of the first correlation value and the second correlation value within each 4 × 4 sub-block of the video block;

obtain a third internal sum value based on a sum of products of the first correlation value and the third correlation value within each 4 × 4 sub-block of the video block;

obtain a fourth internal sum value based on a sum of squares of the second correlation values within each 4 × 4 sub-block of the video block;

obtain a fifth internal sum value based on a sum of products of the second correlation value and the third correlation value within each 4 × 4 sub-block of the video block;

obtain a horizontal motion refinement value based on a quotient of the third internal sum value and the first internal sum value, wherein the motion refinements comprise the horizontal motion refinement value;

obtain a vertical motion refinement value based on the second internal sum value, the fourth internal sum value, the fifth internal sum value, and the horizontal motion refinement value, wherein the motion refinements comprise the vertical motion refinement value; and

clip the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.

27. The non-transitory computer-readable storage medium of claim 26, wherein the motion refinement threshold is determined as 2 raised to the power of the coded bit depth minus 7.

Technical Field

The present application relates to video coding and compression. More particularly, the present disclosure relates to methods and apparatus for bi-directional optical flow (BDOF) in video coding and decoding.

Background

Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), Moving Picture Experts Group (MPEG) coding, and so forth. Video coding is typically performed using prediction methods (e.g., inter prediction, intra prediction) that exploit the redundancy present in video images or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.

Disclosure of Invention

Examples of the present disclosure provide methods and apparatus for bit depth control of bi-directional optical flow. According to a first aspect of the present disclosure, there is provided a BDOF bit depth control method for encoding and decoding a video signal. The method may include obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block. In display order, the first reference picture I^(0) may precede the current picture and the second reference picture I^(1) may follow the current picture. The method may include obtaining a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0), where i and j may represent the coordinates of one sample of the current picture. The method may include obtaining a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1). The method may include controlling an internal bit depth of BDOF by applying a right shift to internal BDOF parameters when the coded bit depth is greater than 12 bits. BDOF may use internal BDOF parameters that include horizontal gradient values and vertical gradient values obtained based on the first prediction sample I^(0)(i, j), the second prediction sample I^(1)(i, j), and the sample difference between the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j). The method may include, based on BDOF being applied to the video block, obtaining motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j). The method may include obtaining bi-directional prediction samples of the video block based on the motion refinements.
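The last step of the first aspect, turning the motion refinement back into bi-directional prediction samples, could look roughly like the following. The correction term and especially the shift/offset values (a VVC-style `max(3, 15 - bit_depth)` averaging shift) are assumptions for illustration only; the text itself does not specify them, and all names are hypothetical.

```python
def bdof_bi_prediction(i0, i1, vx, vy, gh0, gh1, gv0, gv1, bit_depth):
    """Combine the two prediction samples with a gradient-based correction
    weighted by the motion refinement (vx, vy). The shift/offset are assumed
    VVC-style values, not taken from the text."""
    b = (vx * (gh0 - gh1) + vy * (gv0 - gv1)) >> 1  # per-sample correction
    shift = max(3, 15 - bit_depth)                  # assumed averaging shift
    offset = 1 << (shift - 1)                       # assumed rounding offset
    return (i0 + i1 + b + offset) >> shift
```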

According to a second aspect of the present disclosure, a computing device is provided. The computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors. The one or more processors may be configured to obtain a first reference picture I^(0) and a second reference picture I^(1) associated with a video block. In display order, the first reference picture I^(0) may precede the current picture and the second reference picture I^(1) may follow the current picture. The one or more processors may be configured to obtain a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0), where i and j may represent the coordinates of one sample of the current picture. The one or more processors may be configured to obtain a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1). The one or more processors may be configured to control an internal bit depth of BDOF by applying a right shift to internal BDOF parameters when the coded bit depth is greater than 12 bits. BDOF may use internal BDOF parameters that include horizontal gradient values and vertical gradient values obtained based on the first prediction sample I^(0)(i, j), the second prediction sample I^(1)(i, j), and the sample difference between the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j). The one or more processors may be configured to, based on BDOF being applied to the video block, obtain motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j). The one or more processors may be configured to obtain bi-directional prediction samples of the video block based on the motion refinements.

According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium having instructions stored therein is provided. When executed by one or more processors of an apparatus, the instructions may cause the apparatus to: obtain a first reference picture I^(0) and a second reference picture I^(1) associated with a video block. In display order, the first reference picture I^(0) may precede the current picture and the second reference picture I^(1) may follow the current picture. The instructions may cause the apparatus to: obtain a first prediction sample I^(0)(i, j) of the video block from a reference block in the first reference picture I^(0), where i and j may represent the coordinates of one sample of the current picture. The instructions may cause the apparatus to: obtain a second prediction sample I^(1)(i, j) of the video block from a reference block in the second reference picture I^(1). The instructions may cause the apparatus to: control an internal bit depth of BDOF by applying a right shift to internal BDOF parameters when the coded bit depth is greater than 12 bits. BDOF may use internal BDOF parameters that include horizontal gradient values and vertical gradient values obtained based on the first prediction sample I^(0)(i, j), the second prediction sample I^(1)(i, j), and the sample difference between the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j). The instructions may cause the apparatus to: based on BDOF being applied to the video block, obtain motion refinements of samples in the video block according to the first prediction sample I^(0)(i, j) and the second prediction sample I^(1)(i, j). The instructions may cause the apparatus to: obtain bi-directional prediction samples of the video block based on the motion refinements.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a block diagram of an encoder according to an example of the present disclosure.

Fig. 2 is a block diagram of a decoder according to an example of the present disclosure.

Fig. 3A is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.

Fig. 3B is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.

Fig. 3C is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.

Fig. 3D is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.

Fig. 3E is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.

Fig. 4 is a graphical illustration of a bi-directional optical flow (BDOF) model according to an example of the present disclosure.

Fig. 5 is a flowchart illustrating a bit depth control method of encoding and decoding a video signal according to an example of the present disclosure.

Fig. 6 is a flowchart illustrating a method for controlling an internal bit depth of a BDOF according to an example of the present disclosure.

Fig. 7 is a diagram illustrating a computing environment coupled with a user interface according to an example of the present disclosure.

Detailed Description

Reference will now be made in detail to example embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same reference numerals in different drawings represent the same or similar elements, unless otherwise specified. The implementations set forth in the following description of example embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects set forth in the claims below related to the present disclosure.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is intended to mean and include any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are merely used to distinguish one type of information from another. For example, first information may be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood to mean "when", "upon", or "in response to determining", depending on the context.

The first version of the HEVC standard was finalized in October 2013 and provides a bit rate saving of approximately 50%, or equivalent perceptual quality, compared to the previous-generation video coding standard H.264/MPEG AVC. Although the HEVC standard provides significant coding improvements over its predecessor, there is evidence that additional coding tools can achieve coding efficiency beyond that of HEVC. On this basis, both VCEG and MPEG began exploring new coding technologies for future video coding standardization. ITU-T VCEG and ISO/IEC MPEG established the Joint Video Exploration Team (JVET) in October 2015, and significant research into advanced technologies that could greatly improve coding efficiency began. JVET maintains a reference software called the Joint Exploration Model (JEM) by integrating a number of additional coding tools on top of the HEVC test model (HM).

In October 2017, ITU-T and ISO/IEC issued a joint Call for Proposals (CfP) on video compression with capability beyond HEVC. In April 2018, 23 CfP responses were received and evaluated at the 10th JVET meeting, demonstrating a compression efficiency gain of about 40% over HEVC. Based on such evaluation results, JVET launched a new project to develop the new-generation video coding standard, named Versatile Video Coding (VVC). In the same month, a reference software codebase, called the VVC Test Model (VTM), was established for demonstrating a reference implementation of the VVC standard.

Like HEVC, VVC is built on a block-based hybrid video codec framework. Fig. 1 shows a block diagram of a generic block-based hybrid video coding system. In particular, fig. 1 shows a typical encoder 100. The encoder 100 has a video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block prediction value 140, adder 128, transform 130, quantization 132, prediction related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, loop filter 122, entropy coding 138, and bitstream 144.

The input video signal is processed block by block (called coding units (CUs)). In VTM-1.0, a CU can be up to 128 x 128 pixels. However, unlike HEVC, which partitions blocks based only on quadtrees, in VVC one coding tree unit (CTU) is split into CUs based on quad/binary/ternary trees to adapt to varying local characteristics. In addition, the concept of multiple partition unit types in HEVC is removed, i.e., the separation of CU, prediction unit (PU), and transform unit (TU) no longer exists in VVC; instead, each CU is always used as the basic unit for both prediction and transform, without further partitioning. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Then, each quadtree leaf node can be further partitioned by a binary or ternary tree structure.

As shown in figs. 3A, 3B, 3C, 3D, and 3E, there are five split types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.

Fig. 3A shows a diagram illustrating block quaternary partitioning in a multi-type tree structure according to the present disclosure.

FIG. 3B shows a diagram illustrating block vertical binary partitioning in a multi-type tree structure according to the present disclosure.

FIG. 3C shows a diagram illustrating block horizontal binary partitioning in a multi-type tree structure according to the present disclosure.

FIG. 3D shows a diagram illustrating block vertical ternary partitioning in a multi-type tree structure according to the present disclosure.

FIG. 3E shows a diagram illustrating block horizontal ternary partitioning in a multi-type tree structure according to the present disclosure.
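For illustration, the geometry of these five split types can be sketched as follows. This is a minimal illustrative model only (the enum and function names are ours, not from the disclosure); the ternary splits are assumed to follow the 1/4, 1/2, 1/4 pattern:

```python
from enum import Enum

class Split(Enum):
    QUAD = 0      # quaternary partitioning
    HOR_BIN = 1   # horizontal binary partitioning
    VER_BIN = 2   # vertical binary partitioning
    HOR_TRI = 3   # horizontal ternary partitioning
    VER_TRI = 4   # vertical ternary partitioning

def child_sizes(w, h, split):
    """Return the (width, height) of the child blocks produced by each
    split type applied to a w x h block."""
    if split is Split.QUAD:
        return [(w // 2, h // 2)] * 4
    if split is Split.HOR_BIN:
        return [(w, h // 2)] * 2
    if split is Split.VER_BIN:
        return [(w // 2, h)] * 2
    if split is Split.HOR_TRI:
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if split is Split.VER_TRI:
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
```

Regardless of the split type, the child blocks tile the parent exactly, which is why each leaf of the multi-type tree can serve directly as a CU.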

In fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from samples (referred to as reference samples) of already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from an already encoded video picture to predict a current video block. Temporal prediction reduces the temporal redundancy inherent in video signals. The temporal prediction signal for a given CU is typically signaled by one or more Motion Vectors (MVs) that indicate the amount and direction of motion between the current CU and the temporal reference of the current CU. Also, if multiple reference pictures are supported, one reference picture index is additionally transmitted for identifying from which reference picture in the reference picture store the temporal prediction signal comes. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode, e.g., based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block and the prediction residual is decorrelated and quantized using a transform. The quantized residual coefficients are inverse quantized and inverse transformed to form a reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further, before the reconstructed CU is placed in a reference picture store and used for coding future video blocks, loop filtering such as deblocking filters, Sample Adaptive Offset (SAO), and Adaptive Loop Filters (ALF) may be applied to the reconstructed CU. 
To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy coding unit for further compression and packing to form the bitstream.
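The residual path described above (prediction subtraction, quantization, inverse quantization, reconstruction) can be illustrated with a toy sketch; the transform, loop filtering, and entropy coding stages are omitted, and the scalar quantizer is a deliberate simplification of what a real encoder does:

```python
def encode_block(block, pred, qstep):
    """Toy residual path of the hybrid encoder in fig. 1: the prediction is
    subtracted from the input block, the residual is scalar-quantized, and
    the reconstruction adds the dequantized residual back to the prediction
    (transform and entropy coding omitted for clarity)."""
    resid = [b - p for b, p in zip(block, pred)]
    levels = [round(r / qstep) for r in resid]            # quantization
    recon_resid = [lv * qstep for lv in levels]           # inverse quantization
    recon = [p + rr for p, rr in zip(pred, recon_resid)]  # reconstructed signal
    return levels, recon
```

Note that the reconstruction is formed from the quantized residual, not the original one, so the encoder's reference samples match what the decoder will see.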

Fig. 2 presents a general block diagram of a block-based video decoder. In particular, fig. 2 shows a block diagram of a typical decoder 200. The decoder 200 has a bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, loop filter 228, motion compensation 224, picture buffer 226, prediction related information 234, and video output 232.

In fig. 2, a video bitstream is first entropy decoded at an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (in the case of intra-coding) or a temporal prediction unit (in the case of inter-coding) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct a residual block. Then, the prediction block and the residual block are added. The reconstructed block may be further loop filtered and then stored in a reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
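The adder stage of the decoder can be sketched as follows; the clipping to the valid sample range is our assumption (fig. 2 shows only the addition), and the list-of-lists layout is purely illustrative:

```python
def reconstruct_block(pred, resid, bit_depth=10):
    """Adder stage of the decoder in fig. 2: prediction block plus
    reconstructed residual block, clipped to the valid sample range
    [0, 2^bit_depth - 1] (clipping assumed, not stated in fig. 2)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```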

Bi-directional optical flow (BDOF)

Conventional bi-prediction in video coding is a simple combination of two temporally predicted blocks obtained from already reconstructed reference pictures. However, due to the limitation of block-based motion compensation, there may still be residual small motion that may be observed between the samples of the two prediction blocks, thus reducing the efficiency of motion compensated prediction. To solve this problem, BDOF is applied in VVC to reduce the effect of such motion for each sample within a block.

Specifically, when bi-prediction is used, BDOF is motion refinement at the sample level performed on top of block-based motion compensated prediction, as shown in fig. 4. Fig. 4 shows a diagram of a BDOF model according to the present disclosure.

When BDOF is applied, the motion refinement (v_x, v_y) of each 4 x 4 sub-block is calculated by minimizing the difference between the L0 and L1 prediction samples inside a 6 x 6 window Ω around the sub-block. Specifically, (v_x, v_y) is derived as

    v_x = S_1 > 0 ? clip3(-th_BDOF, th_BDOF, -((S_3 << 3) >> ⌊log2(S_1)⌋)) : 0
    v_y = S_5 > 0 ? clip3(-th_BDOF, th_BDOF, -(((S_6 << 3) - ((v_x · S_2,m) << 12 + v_x · S_2,s) / 2) >> ⌊log2(S_5)⌋)) : 0        (1)

where ⌊·⌋ is the floor function; clip3(min, max, x) is a function that clips a given value x inside the range [min, max]; the symbol >> represents a bitwise right-shift operation; the symbol << represents a bitwise left-shift operation; S_2,m = S_2 >> 12 and S_2,s = S_2 & (2^12 - 1); th_BDOF is the motion refinement threshold to prevent propagation errors due to irregular local motion, which is equal to 2^(13-BD), where BD is the bit depth of the input video. For example, the bit depth represents the number of bits used to define each pixel.

In (1), S_1, S_2, S_3, S_5, and S_6 are calculated as

    S_1 = Σ_{(i,j)∈Ω} ψ_x(i,j) · ψ_x(i,j),    S_3 = Σ_{(i,j)∈Ω} θ(i,j) · ψ_x(i,j)
    S_2 = Σ_{(i,j)∈Ω} ψ_x(i,j) · ψ_y(i,j)
    S_5 = Σ_{(i,j)∈Ω} ψ_y(i,j) · ψ_y(i,j),    S_6 = Σ_{(i,j)∈Ω} θ(i,j) · ψ_y(i,j)        (2)

where

    ψ_x(i,j) = (∂I^(1)/∂x(i,j) + ∂I^(0)/∂x(i,j)) >> 3
    ψ_y(i,j) = (∂I^(1)/∂y(i,j) + ∂I^(0)/∂y(i,j)) >> 3
    θ(i,j) = (I^(1)(i,j) >> 6) - (I^(0)(i,j) >> 6)        (3)

where I^(k)(i,j) is the sample value at coordinate (i,j) of the prediction signal in list k (k = 0, 1), which is generated at intermediate high precision (i.e., 16 bits); ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j) are the horizontal and vertical gradients of the sample, obtained by directly calculating the difference between its two neighboring samples, that is,

    ∂I^(k)/∂x(i,j) = (I^(k)(i+1,j) - I^(k)(i-1,j)) >> 4
    ∂I^(k)/∂y(i,j) = (I^(k)(i,j+1) - I^(k)(i,j-1)) >> 4        (4)

Based on the motion refinement derived in (1), the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory based on the optical flow model, as indicated by

    b(i,j) = rnd((v_x · (∂I^(1)/∂x(i,j) - ∂I^(0)/∂x(i,j))) / 2) + rnd((v_y · (∂I^(1)/∂y(i,j) - ∂I^(0)/∂y(i,j))) / 2)
    pred_BDOF(i,j) = (I^(0)(i,j) + I^(1)(i,j) + b(i,j) + o_offset) >> shift        (5)

where rnd(·) denotes rounding; shift and o_offset are the right-shift value and the offset value applied to combine the L0 and L1 prediction signals for bi-prediction, equal to 15 - BD and (1 << (14 - BD)) + 2 · (1 << 13), respectively. Table 1 shows the specific bit widths of the intermediate parameters involved in the BDOF process. As shown in the table, the internal bit width of the entire BDOF process does not exceed 32 bits. In addition, the multiplication with the worst possible input occurs at the product v_x · S_2,m in equation (1), where the inputs S_2,m and v_x are 15 bits and 4 bits, respectively. Therefore, a 15-bit multiplier is sufficient for BDOF.
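The BDOF derivation described above (the gradients, the window sums S_1 to S_6, and the motion refinement (v_x, v_y)) can be sketched in Python. This is a simplified illustrative model under stated assumptions (the 8 x 8 array layout with a one-sample border, the S_2,m/S_2,s split, and the helper names are ours), not the normative VVC process:

```python
def clip3(lo, hi, x):
    return lo if x < lo else hi if x > hi else x

def floor_log2(x):
    return x.bit_length() - 1  # x must be a positive int

def bdof_subblock(I0, I1, bd=10):
    """Motion refinement (v_x, v_y) of one 4x4 sub-block following
    equations (1)-(4). I0/I1 are 8x8 integer grids of intermediate
    (16-bit) prediction samples: the 6x6 window plus a one-sample
    border so the gradients at the window edge are defined."""
    th_bdof = 1 << (13 - bd)          # motion refinement threshold
    s1 = s2 = s3 = s5 = s6 = 0
    for j in range(1, 7):             # the 6x6 window positions
        for i in range(1, 7):
            # gradients from neighboring-sample differences
            gx0 = (I0[j][i + 1] - I0[j][i - 1]) >> 4
            gx1 = (I1[j][i + 1] - I1[j][i - 1]) >> 4
            gy0 = (I0[j + 1][i] - I0[j - 1][i]) >> 4
            gy1 = (I1[j + 1][i] - I1[j - 1][i]) >> 4
            # gradient sums and L1/L0 sample difference
            psi_x = (gx1 + gx0) >> 3
            psi_y = (gy1 + gy0) >> 3
            theta = (I1[j][i] >> 6) - (I0[j][i] >> 6)
            # accumulations over the window
            s1 += psi_x * psi_x
            s2 += psi_x * psi_y
            s3 += theta * psi_x
            s5 += psi_y * psi_y
            s6 += theta * psi_y
    # motion refinement; S2 split into 12-bit MSB/LSB parts S2,m / S2,s
    vx = clip3(-th_bdof, th_bdof,
               -((s3 << 3) >> floor_log2(s1))) if s1 > 0 else 0
    s2_m, s2_s = s2 >> 12, s2 & ((1 << 12) - 1)
    num = (s6 << 3) - (((vx * s2_m) << 12) + vx * s2_s) // 2
    vy = clip3(-th_bdof, th_bdof,
               -(num >> floor_log2(s5))) if s5 > 0 else 0
    return vx, vy
```

When the two prediction signals are identical, every θ(i,j) is zero and the refinement collapses to (0, 0), which matches the intuition that no residual motion is observed between the two prediction blocks.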

Table 1. Bit widths of the intermediate parameters of BDOF in VVC

Efficiency of bi-directional prediction

Although BDOF can enhance the efficiency of bi-prediction, there is still an opportunity to further improve its design. In particular, the present disclosure identifies the following issues in the control of the bit widths of intermediate parameters in the existing BDOF design of VVC.

As shown in Table 1, the parameter θ(i,j) (i.e., the difference between the L0 and L1 prediction samples) and the parameters ψ_x(i,j) and ψ_y(i,j) (i.e., the sums of the horizontal/vertical L0 and L1 gradient values) are all represented in the same 11-bit width. While this approach can facilitate the overall control of the internal bit width of the BDOF, it is suboptimal with respect to the precision of the derived motion refinement. As shown in equation (4), this is in part because the gradient values are calculated as the difference between neighboring prediction samples. Due to the high-pass nature of such a process, the reliability of the derived gradients is lower in the presence of noise (e.g., noise captured in the original video and coding noise generated during the coding process). Therefore, it may not always be beneficial to represent the gradient values with a high bit width.

As shown in Table 1, the maximum bit-width usage of the entire BDOF process occurs in the calculation of the vertical motion refinement v_y, where S_6 (27 bits) is first left-shifted by 3 bits and then ((v_x · S_2,m) << 12 + v_x · S_2,s) / 2 (30 bits) is subtracted from it. Therefore, the maximum bit width of the current design is equal to 31 bits. In practical hardware implementations, a coding process whose maximum internal bit width exceeds 16 bits is typically implemented with 32-bit arithmetic. Hence, the existing design does not fully utilize the effective dynamic range of a 32-bit implementation. This can lead to an unnecessary precision loss in the motion refinement derived by BDOF.
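The 31-bit worst case described above can be checked numerically. The sketch below uses the worst-case magnitudes quoted from Table 1 in the text (27-bit S_6, 30-bit subtrahend); the helper name is ours:

```python
def signed_width(max_abs):
    """Bits needed to hold a signed two's-complement value whose
    magnitude can reach max_abs."""
    return max_abs.bit_length() + 1

s6_max = (1 << 26) - 1    # S6: 27-bit signed per Table 1
shifted = s6_max << 3     # S6 << 3: grows to 30-bit signed
sub_term = (1 << 29) - 1  # ((vx*S2,m)<<12 + vx*S2,s)/2: 30-bit signed
worst = shifted + sub_term  # subtracting signed terms can add magnitudes
```

Summing the two 30-bit magnitudes yields a 31-bit signed worst case, one bit short of the 32-bit dynamic range the hardware already pays for.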

Improving efficiency of bi-directional prediction using BDOF

In the present disclosure, an improved bit-width control method is proposed to solve the two issues of the existing BDOF bit-width control design identified above.

Fig. 5 illustrates a bit depth control method of encoding and decoding a video signal according to the present disclosure.

In step 510, a first reference picture I^(0) and a second reference picture I^(1) associated with a video block are obtained. In display order, the first reference picture I^(0) precedes the current picture, and the second reference picture I^(1) follows the current picture. For example, a reference picture may be a video picture neighboring the current picture being coded.

In step 512, a first prediction sample I^(0)(i,j) of the video block is obtained from a reference block in the first reference picture I^(0), where i and j represent the coordinates of one sample in the current picture. For example, the first prediction sample I^(0)(i,j) may be a prediction sample, obtained using a motion vector, in the L0 list of the preceding reference picture in display order.

In step 514, a second prediction sample I^(1)(i,j) of the video block is obtained from a reference block in the second reference picture I^(1). For example, the second prediction sample I^(1)(i,j) may be a prediction sample, obtained using a motion vector, in the L1 list of the following reference picture in display order.

In step 516, when the coded bit depth is greater than 12 bits, the internal bit depth of the BDOF is controlled by applying right shifts to the internal BDOF parameters. For example, the BDOF uses internal BDOF parameters that include the first prediction sample I^(0)(i,j), the second prediction sample I^(1)(i,j), and the horizontal and vertical gradient values derived from sample differences based on the first prediction sample I^(0)(i,j) and the second prediction sample I^(1)(i,j).

In step 518, motion refinement of samples in the video block is obtained, based on BDOF being applied to the video block, according to the first prediction sample I^(0)(i,j) and the second prediction sample I^(1)(i,j).

In step 520, bi-predictive samples for the video block are obtained based on the motion refinement.

First, to overcome the negative impact of gradient estimation errors, the proposed method introduces an additional right shift n_grad when calculating the gradient values ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j) in equation (4), i.e., to lower the internal bit width of the gradient values. Specifically, the horizontal and vertical gradients at each sample position are calculated as

    ∂I^(k)/∂x(i,j) = (I^(k)(i+1,j) - I^(k)(i-1,j)) >> (4 + n_grad)
    ∂I^(k)/∂y(i,j) = (I^(k)(i,j+1) - I^(k)(i,j-1)) >> (4 + n_grad)        (6)

In addition, an additional bit shift n_adj is introduced into the calculation of the variables ψ_x(i,j), ψ_y(i,j), and θ(i,j), in order to control the entire BDOF process to operate at an appropriate internal bit width, as depicted below:

    ψ_x(i,j) = (∂I^(1)/∂x(i,j) + ∂I^(0)/∂x(i,j)) >> (3 - n_adj)
    ψ_y(i,j) = (∂I^(1)/∂y(i,j) + ∂I^(0)/∂y(i,j)) >> (3 - n_adj)
    θ(i,j) = (I^(1)(i,j) >> (6 - n_adj)) - (I^(0)(i,j) >> (6 - n_adj))        (7)

as will be seen in Table 2, the parameter ψ is due to the modification applied to the number of bits shifted to the right in equations (6) and (7)x(i,j)、ψyThe dynamic ranges of (i, j) and θ (i, j) will be different, in contrast to the prior BDOF design shown in Table 1, where all three parameters are represented by the same dynamic range (i.e., 21 bits). Such variations may increase the internal parameter S1、S2、S3、S5And S6Which may increase the maximum bit width of the internal BDOF process to above 32 bits. Therefore, to ensure a 32-bit implementation, two additional clipping operations are introduced to the pair S2And S6In the calculation of the value of (c). In particular, in the proposed method, the values of these two parameters are calculated as:

where B_2 and B_6 are the parameters that control the output dynamic ranges of S_2 and S_6, respectively. It should be noted that, unlike the gradient calculation, the clipping operation in equation (8) is applied only once to calculate the motion refinement of each 4 x 4 sub-block inside one BDOF CU, i.e., it is invoked on a 4 x 4-unit basis. Therefore, the corresponding complexity increase due to the clipping operations introduced in the proposed method is quite negligible.

In practice, different values of n_grad, n_adj, B_2, and B_6 may be applied to achieve different trade-offs between the intermediate bit width and the precision of the internal BDOF derivation. In one embodiment of the present disclosure, it is proposed to set n_grad and n_adj to 2, B_2 to 25, and B_6 to 27. In another embodiment of the present disclosure, it is proposed to set n_grad to 1, n_adj to 4, B_2 to 26, and B_6 to 28.
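The modified shifts and clippings of equations (6), (7), and (8) can be sketched as follows, using the first embodiment's settings (n_grad = n_adj = 2, B_2 = 25, B_6 = 27). The function names and the array layout are illustrative assumptions:

```python
def clip3(lo, hi, x):
    return lo if x < lo else hi if x > hi else x

N_GRAD, N_ADJ, B2, B6 = 2, 2, 25, 27   # first embodiment's settings

def gradients(Ik, i, j, n_grad=N_GRAD):
    """Equation (6): horizontal/vertical gradients with the extra
    right shift n_grad on top of the base shift of 4."""
    gx = (Ik[j][i + 1] - Ik[j][i - 1]) >> (4 + n_grad)
    gy = (Ik[j + 1][i] - Ik[j - 1][i]) >> (4 + n_grad)
    return gx, gy

def correlation_params(I0, I1, i, j, n_adj=N_ADJ):
    """Equation (7): psi_x, psi_y and theta with the n_adj adjustment
    applied to the base shifts of 3 and 6."""
    gx0, gy0 = gradients(I0, i, j)
    gx1, gy1 = gradients(I1, i, j)
    psi_x = (gx1 + gx0) >> (3 - n_adj)
    psi_y = (gy1 + gy0) >> (3 - n_adj)
    theta = (I1[j][i] >> (6 - n_adj)) - (I0[j][i] >> (6 - n_adj))
    return psi_x, psi_y, theta

def clip_accum(acc, b):
    """Equation (8): clip an accumulated S2/S6 window sum to the
    dynamic range controlled by B2/B6."""
    return clip3(-(1 << b), (1 << b) - 1, acc)
```

Note that `clip_accum` is applied to the finished window sum, once per 4 x 4 sub-block, which is why the added complexity is negligible compared with the per-sample gradient work.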

Table 2 shows the corresponding bit width of each intermediate parameter when the proposed bit-width control method is applied to the BDOF. In Table 2, grey highlighting indicates the changes applied in the proposed bit-width control method compared to the existing BDOF design in VVC. As shown in Table 2, with the proposed bit-width control method, the internal bit width of the entire BDOF process does not exceed 32 bits. In addition, with the proposed design, the maximum bit width is exactly 32 bits, so the available dynamic range of a 32-bit hardware implementation can be fully utilized. On the other hand, as shown in the table, the multiplication with the worst possible input occurs at the product v_x · S_2,m, where the input S_2,m is 14 bits and the input v_x is 6 bits. Therefore, as for the existing BDOF design, a 16-bit multiplier is also large enough when the proposed method is applied.

Table 2. Bit widths of the intermediate parameters of the proposed method

Fig. 6 illustrates an example method for controlling the internal bit depth of a BDOF according to this disclosure.

In step 610, a first horizontal gradient value of the first prediction sample I^(0)(i,j) is obtained based on the first prediction sample I^(0)(i+1,j) and the first prediction sample I^(0)(i-1,j).

In step 612, a second horizontal gradient value of the second prediction sample I^(1)(i,j) is obtained based on the second prediction sample I^(1)(i+1,j) and the second prediction sample I^(1)(i-1,j).

In step 614, a first vertical gradient value of the first prediction sample I^(0)(i,j) is obtained based on the first prediction sample I^(0)(i,j+1) and the first prediction sample I^(0)(i,j-1).

In step 616, a second vertical gradient value of the second prediction sample I^(1)(i,j) is obtained based on the second prediction sample I^(1)(i,j+1) and the second prediction sample I^(1)(i,j-1).

In step 618, the first horizontal gradient value and the second horizontal gradient value are right-shifted by a first shift value.

In step 620, the first vertical gradient value and the second vertical gradient value are right-shifted by a first shift value.
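Steps 610 to 620 amount to four neighboring-sample differences followed by a common right shift. A minimal sketch (the array layout and the parameter name `shift1` are our assumptions; the disclosure only calls it the first shift value):

```python
def fig6_gradients(I0, I1, i, j, shift1):
    """Steps 610-620: horizontal and vertical gradient values of the
    first (I0) and second (I1) prediction samples at (i, j), each
    right-shifted by the common first shift value."""
    first_h = (I0[j][i + 1] - I0[j][i - 1]) >> shift1   # steps 610 and 618
    second_h = (I1[j][i + 1] - I1[j][i - 1]) >> shift1  # steps 612 and 618
    first_v = (I0[j + 1][i] - I0[j - 1][i]) >> shift1   # steps 614 and 620
    second_v = (I1[j + 1][i] - I1[j - 1][i]) >> shift1  # steps 616 and 620
    return first_h, second_h, first_v, second_v
```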

In the above method, a clipping operation as in equation (8) is added to avoid overflow of the intermediate parameters used to derive v_x and v_y. However, such clipping is only required when the relevant parameters are accumulated over a large local window. When a small window is applied, overflow may not occur. Therefore, in another embodiment of the present disclosure, the following bit-depth control method is proposed for a BDOF method that does not require clipping, as described below.

First, at each sample position, the gradient values ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j) in equation (4) are calculated as

Second, the relevant parameters ψ_x(i,j), ψ_y(i,j), and θ(i,j) for the BDOF process are calculated as:

Third, S_1, S_2, S_3, S_5, and S_6 are calculated as

Fourth, the motion refinement (v_x, v_y) of each 4 x 4 sub-block is derived as

Fifth, based on the optical flow model, the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory, as indicated by the following equation

The BDOF bit-width control methods described above are based on the assumption that the internal bit depth used to code the video does not exceed 12 bits, so that the precision of the output signal from motion compensation (MC) is 14 bits. In other words, when the internal bit depth is greater than 12 bits, the BDOF bit-width control methods as specified in equations (9) to (13) cannot guarantee that the bit depths of all internal BDOF operations stay within 32 bits. To solve this overflow problem at high internal bit depths, an improved BDOF bit-depth control method is disclosed below, which introduces, after the MC stage, an additional right shift whose amount depends on the internal bit depth. In this method, when the internal bit depth is greater than 12 bits, the output signal of the MC is always shifted to 14-bit precision, so that the existing BDOF bit-depth control method, designed for internal bit depths of 8 to 12 bits, can be reused for the BDOF process of high-bit-depth video. Specifically, with bit-depth denoting the internal bit depth, the proposed method can be implemented by:

First, at each sample position, the gradient values ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j) in equation (4) are calculated as

Second, the relevant parameters ψ_x(i,j), ψ_y(i,j), and θ(i,j) for the BDOF process are calculated as:

Third, S_1, S_2, S_3, S_5, and S_6 are calculated as

Fourth, the motion refinement (v_x, v_y) of each 4 x 4 sub-block is derived as

where th_BDOF is the motion refinement threshold, which is calculated from the internal bit depth as 1 << max(5, bit-depth - 7). In another example, th_BDOF may be calculated from the internal bit depth as 1 << (bit-depth - 7). In other words, to control the dynamic range of the BDOF motion refinement, the motion refinement threshold is determined as 2 raised to the power of the coded bit depth minus 7.
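Two pieces of this high-bit-depth method can be written directly: the normalization of the MC output to 14-bit precision, and the two threshold formulas for th_BDOF. Note the shift amount max(0, bit-depth - 12) in the first function is our assumption; the disclosure states only the 14-bit target precision:

```python
def normalize_mc_output(sample, bit_depth):
    """Shift the MC output signal down to 14-bit precision when the
    internal bit depth exceeds 12 bits (shift amount assumed; the text
    states only the 14-bit target)."""
    return sample >> max(0, bit_depth - 12)

def motion_refine_threshold(bit_depth, variant="max"):
    """th_BDOF per the two examples in the text: 1 << max(5, bit_depth - 7),
    or simply 1 << (bit_depth - 7) in the alternative example."""
    if variant == "max":
        return 1 << max(5, bit_depth - 7)
    return 1 << (bit_depth - 7)
```

With the "max" variant the threshold stays at 32 for all bit depths up to 12, and only starts to grow for high-bit-depth video, which is consistent with reusing the 8-to-12-bit design unchanged.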

FIG. 7 illustrates a computing environment 710 coupled with a user interface 760. The computing environment 710 may be part of a data processing server. The computing environment 710 includes a processor 720, a memory 740, and an I/O interface 750.

The processor 720 typically controls the overall operations of the computing environment 710, such as operations associated with display, data acquisition, data communication, and image processing. The processor 720 may include one or more processors to execute instructions to perform all or some of the steps of the methods described above. Further, the processor 720 may include one or more modules that facilitate interaction between the processor 720 and other components. The processor may be a central processing unit (CPU), a microprocessor, a microcontroller, a GPU, or the like.

The memory 740 is configured to store various types of data to support the operation of the computing environment 710. The memory 740 may include predetermined software 742. Examples of such data include instructions for any application or method operating on the computing environment 710, video data sets, image data, and so forth. The memory 740 may be implemented using any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

I/O interface 750 provides an interface between processor 720 and peripheral interface modules such as a keyboard, click wheel, buttons, etc. The buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 750 may be coupled to an encoder and a decoder.

In an embodiment, a non-transitory computer readable storage medium is also provided, comprising a plurality of programs, such as embodied in the memory 740, executable by the processor 720 in the computing environment 710 for performing the above-described methods. For example, the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The non-transitory computer readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the motion prediction method described above.

In embodiments, the computing environment 710 may be implemented with one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.

The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will become apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.

The embodiments were chosen and described in order to explain the principles of the disclosure, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments and best utilize the underlying principles with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not limited to the specific examples of the embodiments disclosed, and that modifications and other embodiments are intended to be included within the scope of this disclosure.
