Video encoding and decoding

Document No.: 539064. Publication date: 2021-06-01.

Reading note: this technology, Video encoding and decoding, was devised by G. Laroche, C. Gisquet, P. Onno and Jonathan Taquet on 2019-10-18. Its main content is as follows. A method of encoding a motion information predictor index for affine merge mode, the method comprising: generating a list of motion information predictor candidates; selecting one of the motion information predictor candidates in the list as an affine merge mode predictor; and generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded.

1. A method for encoding a motion information predictor index for affine merge mode, the method comprising:

generating a list of motion information predictor candidates;

selecting one of the motion information predictor candidates in the list as an affine merge mode predictor; and

generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded.

2. The method of claim 1, further comprising, when a non-affine merge mode is used: selecting one of the motion information predictor candidates in the list as a non-affine merge mode predictor.

3. The method of claim 2, wherein,

the CABAC encoding includes: for at least one bit of a motion information predictor index for a current block, using a first context variable if the affine merge mode is used or using a second context variable if the non-affine merge mode is used; and

the method further comprises: including, in a bitstream, data indicating use of the affine merge mode when the affine merge mode is used.

4. The method of claim 2, wherein the CABAC encoding comprises: using the same context variable for at least one bit of a motion information predictor index of the current block both when the affine merge mode is used and when the non-affine merge mode is used.

5. The method of any preceding claim, further comprising: including, in the bitstream, data for determining the maximum number of motion information predictor candidates that can be included in the generated list of motion information predictor candidates.

6. The method of any preceding claim, wherein all bits of the motion information predictor index except the first bit are bypass CABAC encoded.

7. The method of any preceding claim, wherein the motion information predictor index for the selected motion information predictor candidate is encoded using the same syntax elements in case of using the affine merge mode and in case of using the non-affine merge mode.

8. A method for decoding a motion information predictor index for affine merge mode, the method comprising:

generating a list of motion information predictor candidates;

decoding the motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; and

identifying, in the case where the affine merge mode is used, one of the motion information predictor candidates in the list as an affine merge mode predictor using the decoded motion information predictor index.

9. The method of claim 8, wherein, in the case of using a non-affine merge mode, the method further comprises: identifying one of the motion information predictor candidates in the list as a non-affine merge mode predictor using the decoded motion information predictor index.

10. The method of claim 9, further comprising:

obtaining, from a bitstream, data indicating use of the affine merge mode, wherein the CABAC decoding comprises, for at least one bit of a motion information predictor index of a current block:

using a first context variable in the case where the obtained data indicates that the affine merge mode is used; and

using a second context variable in the case where the obtained data indicates that the non-affine merge mode is used.

11. The method of claim 9, wherein,

the CABAC decoding includes: using the same context variable for at least one bit of a motion information predictor index of the current block both when the affine merge mode is used and when the non-affine merge mode is used.

12. The method of any of claims 9 to 11, further comprising: obtaining, from the bitstream, data indicating use of the affine merge mode, wherein the generated list of motion information predictor candidates:

includes an affine merge mode predictor candidate in the case where the obtained data indicates that the affine merge mode is used; and

includes a non-affine merge mode predictor candidate in the case where the obtained data indicates that the non-affine merge mode is used.

13. The method of any of claims 9 to 12, wherein decoding the motion information predictor index comprises: parsing the same syntax elements from a bitstream both when the affine merge mode is used and when the non-affine merge mode is used.

14. The method of any of claims 8 to 13, further comprising: obtaining, from the bitstream, data for determining the maximum number of motion information predictor candidates that can be included in the generated list of motion information predictor candidates.

15. The method of any of claims 8 to 14, wherein all bits of the motion information predictor index except the first bit are bypass CABAC decoded.

16. The method of any preceding claim, wherein the motion information predictor candidate comprises information for obtaining a motion vector.

17. The method of any preceding claim, wherein the generated list of motion information predictor candidates comprises ATMVP candidates.

18. The method of any preceding claim, wherein the generated list of motion information predictor candidates has the same maximum number of motion information predictor candidates that can be included therein both when the affine merge mode is used and when the non-affine merge mode is used.

19. An apparatus for encoding a motion information predictor index for affine merge mode, the apparatus comprising:

means for generating a list of motion information predictor candidates;

means for selecting one of the motion information predictor candidates in the list as an affine merge mode predictor; and

means for generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded.

20. The apparatus according to claim 19, comprising means for performing the method of encoding a motion information predictor index according to any one of claims 1 to 7 or any one of claims 16 to 18 when dependent on any one of claims 1 to 7.

21. An apparatus for decoding a motion information predictor index for affine merge mode, the apparatus comprising:

means for generating a list of motion information predictor candidates;

means for decoding the motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; and

means for identifying one of the motion information predictor candidates in the list as an affine merge mode predictor using the decoded motion information predictor index if the affine merge mode is used.

22. The apparatus of claim 21, comprising means for performing the method of decoding motion information predictor indices according to any of claims 8 to 15 or any of claims 16 to 18 when dependent on any of claims 8 to 15.

23. A program which, when run on a computer or processor, causes the computer or processor to perform the method of any one of claims 1 to 18.

24. A carrier medium carrying a program according to claim 23.

Technical Field

The present invention relates to video encoding and decoding.

Background

Recently, the Joint Video Experts Team (JVET), a collaborative team formed by MPEG and VCEG of ITU-T Study Group 16, began work on a new video coding standard called Versatile Video Coding (VVC). The goal of VVC is to provide a significant improvement in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. Primary target applications and services include, but are not limited to, 360-degree and High Dynamic Range (HDR) video. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test laboratories. Some proposals demonstrated compression efficiency gains of typically 40% or more compared with HEVC. Particular effectiveness was shown on Ultra High Definition (UHD) video test material. Thus, compression efficiency gains well beyond the targeted 50% may be expected for the final standard.

The JVET Exploration Model (JEM) uses all the HEVC tools. A further tool, not present in HEVC, is the use of an "affine motion mode" when applying motion compensation. Motion compensation in HEVC is limited to translation only, but in practice there are many kinds of motion, such as zoom in/out, rotation, perspective motion, and other irregular motion. When the affine motion mode is used, a more complex transform is applied to the block in an attempt to predict these forms of motion more accurately. It would therefore be desirable if the affine motion mode could be used while achieving good coding efficiency with less complexity.
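As an illustration of what such a more complex transform can look like, the following minimal sketch derives a per-position motion vector from two control-point motion vectors using the four-parameter affine model. The choice of this particular model, the function name `affine_mv` and its parameters are assumptions made for the example; the passage above does not specify them.

```python
from typing import Tuple

Vector = Tuple[float, float]


def affine_mv(x: float, y: float, v0: Vector, v1: Vector, width: float) -> Vector:
    """Motion vector at position (x, y) inside a block of the given width.

    v0 is the motion vector at the top-left corner of the block and v1 the
    motion vector at the top-right corner (four-parameter affine model,
    assumed here for illustration only).
    """
    a = (v1[0] - v0[0]) / width  # zoom-like component
    b = (v1[1] - v0[1]) / width  # rotation-like component
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])


# Identical control-point vectors give the same vector everywhere,
# i.e. the field reduces to a pure translation.
translation = affine_mv(7, 3, (2, 3), (2, 3), 16)
```

When `v0` equals `v1` the field degenerates to the translation-only motion that HEVC supports; differing control-point vectors let the vector vary across the block.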

Another tool not present in HEVC is Alternative Temporal Motion Vector Prediction (ATMVP), a particular form of motion compensation. Instead of considering only one item of motion information for the current block from a temporal reference frame, the motion information of each collocated block is considered. This temporal motion vector prediction therefore yields a partitioning of the current block, with associated motion information for each sub-block. In the current VTM (VVC Test Model) reference software, ATMVP is signaled as a merge candidate inserted in the merge candidate list. When ATMVP is enabled at the SPS level, the maximum number of merge candidates is increased by one, so that 6 candidates are considered instead of the 5 considered when this mode is disabled.
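The effect of enabling ATMVP on the size of the merge candidate list can be sketched as follows. The function name and the flag `atmvp_enabled` are hypothetical; the values 5 and 6 are those stated above.

```python
BASE_MAX_MERGE_CANDIDATES = 5  # value given in the text for ATMVP disabled


def max_merge_candidates(atmvp_enabled: bool) -> int:
    """Maximum merge list size; one extra slot when ATMVP is enabled."""
    return BASE_MAX_MERGE_CANDIDATES + (1 if atmvp_enabled else 0)
```

A longer list means the merge index can take more values, which is why the cost of coding that index matters.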

These and other tools described later raise problems relating to the coding efficiency and complexity of coding an index (e.g., a merge index) used to signal which candidate is selected from a list of candidates (e.g., from a merge candidate list used with merge mode coding).

Disclosure of Invention

Accordingly, a solution to at least one of the above problems is desired.

According to a first aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates including an ATMVP candidate;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index (merge index) for the selected motion vector predictor candidate using CABAC coding, one or more bits of the motion vector predictor index being bypass CABAC coded.

In one embodiment, all bits of the motion vector predictor index except the first bit are bypass CABAC encoded.
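A minimal sketch of this embodiment is given below. It assumes that the merge index is binarized with a truncated unary code (a run of 1s ended by a 0, with the terminating 0 omitted for the last index), as is done for the merge index in HEVC; that binarization and the helper names are assumptions of the example and are not stated in the aspect itself. Each bin is tagged with the coding mode it would use; the arithmetic coding engine is not modelled.

```python
from typing import List, Tuple


def binarize_merge_index(index: int, max_candidates: int) -> List[int]:
    """Truncated unary binarization of a merge index (assumed scheme)."""
    c_max = max_candidates - 1
    if not 0 <= index <= c_max:
        raise ValueError("index out of range for the candidate list")
    bins = [1] * index
    if index < c_max:
        bins.append(0)  # terminating 0, omitted for the last index
    return bins


def tag_coding_modes(bins: List[int]) -> List[Tuple[int, str]]:
    """First bin uses a context variable, all remaining bins are bypass coded."""
    return [(b, "context" if pos == 0 else "bypass") for pos, b in enumerate(bins)]


# Index 3 in a list of 6 candidates (ATMVP enabled).
tagged = tag_coding_modes(binarize_merge_index(3, 6))
```

With this arrangement only one context variable has to be stored and updated for the merge index, whatever the length of the list.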

According to a second aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates including an ATMVP candidate;

decoding the motion vector predictor index using CABAC decoding, one or more bits of the motion vector predictor index being bypass CABAC decoded; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

In one embodiment, all bits of the motion vector predictor index except the first bit are bypass CABAC decoded.
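The decoding side of this embodiment can be sketched in the same spirit. The example assumes a truncated unary binarization of the index and a caller-supplied `read_bin(mode)` function standing in for the CABAC decoding engine; both are assumptions made for illustration, as is the helper `make_reader`.

```python
from typing import Callable, Iterable, List


def decode_merge_index(read_bin: Callable[[str], int], max_candidates: int) -> int:
    """Parse a truncated unary merge index.

    The first bin is requested in "context" mode and every later bin in
    "bypass" mode.
    """
    c_max = max_candidates - 1
    index = 0
    while index < c_max:
        mode = "context" if index == 0 else "bypass"
        if read_bin(mode) == 0:
            break
        index += 1
    return index


def make_reader(bins: Iterable[int], log: List[str]) -> Callable[[str], int]:
    """Hypothetical stand-in for the decoding engine: replays stored bins."""
    it = iter(bins)

    def read_bin(mode: str) -> int:
        log.append(mode)  # record which mode each bin was read in
        return next(it)

    return read_bin


modes: List[str] = []
decoded = decode_merge_index(make_reader([1, 1, 0], modes), 6)
```

The decoded index then selects the candidate at that position in the generated list.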

According to a third aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates including an ATMVP candidate;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index (merge index) for the selected motion vector predictor candidate using CABAC coding, one or more bits of the motion vector predictor index being bypass CABAC coded.

According to a fourth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates including an ATMVP candidate;

means for decoding the motion vector predictor index using CABAC decoding, one or more bits of the motion vector predictor index being bypass CABAC decoded; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a fifth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC coding, two or more bits of the motion vector predictor index sharing the same context.

In one embodiment, all bits of the motion vector predictor index share the same context.
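The sharing of a context between bits can be sketched as a mapping from bin position to context number. In the sketch below the first `num_shared` bins share context 0; giving each later bin its own context is an assumption of the example, since the aspect only states that two or more bits share a context. The function name is hypothetical.

```python
def context_for_bin(bin_pos: int, num_shared: int) -> int:
    """Context number used for the bin at position bin_pos.

    Bins 0 .. num_shared-1 share context 0; later bins are numbered from 1
    (assumed here purely for illustration).
    """
    if bin_pos < 0 or num_shared < 2:
        raise ValueError("need a non-negative position and at least two shared bins")
    if bin_pos < num_shared:
        return 0
    return bin_pos - num_shared + 1


# Five bins, all sharing the same context (the embodiment above).
all_shared = [context_for_bin(pos, 5) for pos in range(5)]
```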

According to a sixth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, two or more bits of the motion vector predictor index sharing a same context; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

In one embodiment, all bits of the motion vector predictor index share the same context.

According to a seventh aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC coding, two or more bits of the motion vector predictor index sharing the same context.

According to an eighth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, two or more bits of the motion vector predictor index sharing a same context; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a ninth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on the motion vector predictor index of at least one neighboring block of the current block.

In one embodiment, a context variable of at least one bit of the motion vector predictor index depends on the respective motion vector predictor indices of at least two neighboring blocks.

In another embodiment, the context variable of at least one bit of the motion vector predictor index depends on the motion vector predictor index of the left neighboring block on the left side of the current block and the motion vector predictor index of the above neighboring block above the current block.

In another embodiment, the left neighboring block is A2 and the above neighboring block is B3.

In another embodiment, the left neighboring block is A1 and the above neighboring block is B1.

In another embodiment, the context variable has 3 different possible values.

Another embodiment includes: comparing the motion vector predictor index of at least one neighboring block with an index value of the motion vector predictor index of the current block, and setting the context variable according to the comparison result.

Another embodiment includes: comparing a motion vector predictor index of at least one neighboring block with a parameter representing a bit or a bit position of one bit in the motion vector predictor index of the current block; and setting the context variable according to the comparison result.

Yet another embodiment includes: performing a first comparison of the motion vector predictor index of the first neighboring block with a parameter indicating a bit or a bit position of the bit in the motion vector predictor index of the current block; performing a second comparison, comparing the motion vector predictor index of a second neighboring block to the parameter; and setting the context variable according to the results of the first and second comparisons.
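One way to realize the two-comparison embodiment is sketched below. Each neighbor's merge index is compared with the position of the bit being coded, and the context variable is the number of comparisons that succeed, which gives the 3 possible values mentioned above. The "greater than" test, the treatment of an unavailable neighbor (`None`) as a failed comparison, and the function name are assumptions of the example.

```python
from typing import Optional


def neighbor_context(bit_pos: int, left_idx: Optional[int], above_idx: Optional[int]) -> int:
    """Context variable (0, 1 or 2) for the bit at position bit_pos.

    left_idx and above_idx are the merge indices of the left and above
    neighboring blocks, or None when a neighbor is unavailable.
    """

    def exceeds(idx: Optional[int]) -> bool:
        return idx is not None and idx > bit_pos

    return int(exceeds(left_idx)) + int(exceeds(above_idx))


# Both neighbors chose an index greater than 0, so the first bit of the
# current index is coded with context 2.
ctx = neighbor_context(0, 2, 3)
```

The encoder and the decoder can both evaluate this rule, because the neighbors' indices are known before the current index is coded.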

According to a tenth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index of a current block depends on the motion vector predictor index of at least one neighboring block of the current block; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

In one embodiment, a context variable of at least one bit of the motion vector predictor index depends on the respective motion vector predictor indices of at least two neighboring blocks.

In another embodiment, the context variable of at least one bit of the motion vector predictor index depends on the motion vector predictor index of a left neighboring block to the left of the current block and the motion vector predictor index of an upper neighboring block above the current block.

In another embodiment, the left neighboring block is A2 and the above neighboring block is B3.

In another embodiment, the left neighboring block is A1 and the above neighboring block is B1.

In another embodiment, the context variable has 3 different possible values.

Another embodiment includes: comparing the motion vector predictor index of at least one neighboring block with an index value of the motion vector predictor index of the current block, and setting the context variable according to the comparison result.

Another embodiment includes: comparing a motion vector predictor index of at least one neighboring block with a parameter representing a bit or a bit position of one bit in the motion vector predictor index of the current block; and setting the context variable according to the comparison result.

Yet another embodiment includes: performing a first comparison of the motion vector predictor index of the first neighboring block with a parameter indicating a bit or a bit position of the bit in the motion vector predictor index of the current block; performing a second comparison, comparing the motion vector predictor index of a second neighboring block to the parameter; and setting the context variable according to the results of the first and second comparisons.

According to an eleventh aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on the motion vector predictor index of at least one neighboring block of the current block.

According to a twelfth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of a motion vector predictor index of a current block depends on a motion vector predictor index of at least one neighboring block of the current block; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a thirteenth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index of the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on a skip flag of the current block.
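The dependence on the skip flag can be sketched with two adaptive models, one used for skipped blocks and one for non-skipped blocks. The class `CountingContext` is a toy frequency counter standing in for a CABAC context variable; it is not the CABAC state machine, and all names are hypothetical.

```python
from typing import List


class CountingContext:
    """Toy adaptive model: estimates the probability that a bin equals 1."""

    def __init__(self) -> None:
        self.ones = 1   # start from a uniform prior
        self.total = 2

    def p_one(self) -> float:
        return self.ones / self.total

    def update(self, bin_value: int) -> None:
        self.ones += bin_value
        self.total += 1


def context_for_first_bin(contexts: List[CountingContext], skip_flag: bool) -> CountingContext:
    """Select the model by the skip flag of the current block."""
    return contexts[1 if skip_flag else 0]


contexts = [CountingContext(), CountingContext()]
# Made-up sequence: skipped blocks pick index 0, non-skipped blocks do not.
for skip_flag, first_bin in [(True, 0), (False, 1)] * 3:
    context_for_first_bin(contexts, skip_flag).update(first_bin)
```

Keeping the two populations in separate models lets each estimate converge to its own statistics instead of averaging them.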

According to a fourteenth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on another parameter or syntax element of the current block available before decoding the motion vector predictor index.

According to a fifteenth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of a current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block.

According to a sixteenth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index of a current block depends on a skip flag of the current block; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a seventeenth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on another parameter or syntax element of the current block available before decoding the motion vector predictor index; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to an eighteenth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a nineteenth aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on a skip flag of the current block.

According to a twentieth aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of a motion vector predictor index of a current block depends on another parameter or syntax element of the current block available before decoding the motion vector predictor index.

According to a twenty-first aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block.

According to a twenty-second aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on a skip flag of the current block; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a twenty-third aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of a motion vector predictor index for a current block depends on another parameter or syntax element of the current block available before decoding the motion vector predictor index; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a twenty-fourth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a twenty-fifth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC coding, wherein a context variable of at least one bit of the motion vector predictor index for the current block depends on an affine motion vector predictor candidate (if present) in the list.

In one embodiment, the context variable depends on the position of the first affine motion vector predictor candidate in the list.
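A sketch of this embodiment follows. The list is represented by one boolean per candidate saying whether that candidate is affine. The mapping from the position of the first affine candidate to a context number (0 when there is none, 1 when it is first in the list, 2 otherwise) is purely hypothetical, as are the function names; the embodiment only states that the context depends on that position.

```python
from typing import Optional, Sequence


def first_affine_position(is_affine: Sequence[bool]) -> Optional[int]:
    """Position of the first affine candidate, or None if the list has none."""
    for pos, flag in enumerate(is_affine):
        if flag:
            return pos
    return None


def context_from_affine_position(is_affine: Sequence[bool]) -> int:
    """Hypothetical three-way context choice based on that position."""
    pos = first_affine_position(is_affine)
    if pos is None:
        return 0  # no affine candidate in the list
    return 1 if pos == 0 else 2


ctx = context_from_affine_position([False, True, True])
```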

According to a twenty-sixth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for the current block depends on an affine motion vector predictor candidate (if present) in the list; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

In one embodiment, the context variable depends on the position of the first affine motion vector predictor candidate in the list.

According to a twenty-seventh aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC coding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on an affine motion vector predictor candidate (if present) in the list.

According to a twenty-eighth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for the current block depends on an affine motion vector predictor candidate (if present) in the list; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a twenty-ninth aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates comprising affine motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block depends on an affine flag of the current block and/or of at least one neighboring block of the current block.
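By way of illustration only, a context derivation based on affine flags might look like the sketch below. The `BlockInfo` type, the choice of the left and above neighbours, and the layout of six contexts are assumptions made for this example; the aspect itself only requires that the context depend on the affine flag of the current block and/or of at least one neighboring block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockInfo:
    """Hypothetical per-block record holding only the affine flag."""
    affine_flag: bool


def merge_index_context(current: BlockInfo,
                        left: Optional[BlockInfo],
                        above: Optional[BlockInfo]) -> int:
    """Return a context index in the range 0..5 (assumed layout).

    The count of affine neighbours (0, 1 or 2) selects a context within a
    group of three; the affine flag of the current block selects the group.
    Unavailable neighbours (None) are treated as non-affine.
    """
    affine_neighbours = sum(
        1 for block in (left, above) if block is not None and block.affine_flag
    )
    group_offset = 3 if current.affine_flag else 0
    return group_offset + affine_neighbours
```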

According to a thirtieth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates comprising affine motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on an affine flag of the current block and/or at least one neighboring block of the current block; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a thirty-first aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates comprising affine motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of a motion vector predictor index of a current block depends on an affine flag of the current block and/or of at least one neighboring block of the current block.

According to a thirty-second aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates comprising affine motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index for a current block depends on an affine flag of the current block and/or at least one neighboring block of the current block; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a thirty-third aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block is derived from a context variable of at least one of a skip flag and an affine flag of the current block.

According to a thirty-fourth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index of a current block is derived from a context variable of at least one of a skip flag and an affine flag of the current block; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a thirty-fifth aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC encoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block is derived from a context variable of at least one of a skip flag and an affine flag of the current block.

According to a thirty-sixth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of a motion vector predictor index of a current block is derived from a context variable of at least one of a skip flag and an affine flag of the current block; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a thirty-seventh aspect of the present invention, there is provided a method of encoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

selecting one of the motion vector predictor candidates in the list; and

generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC coding, wherein a context variable of at least one bit of the motion vector predictor index of the current block has only two different possible values.

According to a thirty-eighth aspect of the present invention, there is provided a method of decoding a motion vector predictor index, the method comprising:

generating a list of motion vector predictor candidates;

decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block has only two different possible values; and

identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

According to a thirty-ninth aspect of the present invention, there is provided an apparatus for encoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for selecting one of the motion vector predictor candidates in the list; and

means for generating a motion vector predictor index for the selected motion vector predictor candidate using CABAC coding, wherein the context variable of at least one bit of the motion vector predictor index of the current block has only two different possible values.

According to a fortieth aspect of the present invention, there is provided an apparatus for decoding a motion vector predictor index, the apparatus comprising:

means for generating a list of motion vector predictor candidates;

means for decoding the motion vector predictor index using CABAC decoding, wherein a context variable of at least one bit of the motion vector predictor index of the current block has only two different possible values; and

means for identifying one of the motion vector predictor candidates in the list using the decoded motion vector predictor index.

Yet another aspect of the invention relates to programs which, when executed by a computer or processor, cause the computer or processor to perform any of the methods of the preceding aspects. The program may be provided alone or may be carried on, by or in a carrier medium. The carrier medium may be non-transitory, e.g. a storage medium, in particular a computer readable storage medium. The carrier medium may also be transitory, such as a signal or other transmission medium. The signals may be transmitted via any suitable network, including the internet.

Yet another aspect of the invention relates to a camera comprising any of the apparatus aspects described above. In one embodiment, the camera further comprises a zoom component.

According to a forty-first aspect of the present invention, there is provided a method of encoding a motion information predictor index, the method comprising: generating a list of motion information predictor candidates; when an affine merge mode is used, selecting one of the motion information predictor candidates in the list as an affine merge mode predictor; when a non-affine merge mode is used, selecting one of the motion information predictor candidates in the list as a non-affine merge mode predictor; and generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded.

Suitably, the CABAC encoding comprises using the same context variable for at least one bit of the motion information predictor index of the current block both when the affine merge mode is used and when the non-affine merge mode is used. Alternatively, the CABAC encoding comprises, for at least one bit of the motion information predictor index of the current block, using a first context variable if the affine merge mode is used or a second context variable if the non-affine merge mode is used; and the method further comprises, when the affine merge mode is used, including in the bitstream data indicating that the affine merge mode is used.

Suitably, the method further comprises including in the bitstream data for determining the maximum number of motion information predictor candidates that can be included in the generated list of motion information predictor candidates. Suitably, all bits of the motion information predictor index except the first bit are bypass CABAC encoded. Suitably, the first bit is CABAC encoded. Suitably, the motion information predictor index for the selected motion information predictor candidate is encoded using the same syntax elements both when the affine merge mode is used and when the non-affine merge mode is used.
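By way of illustration only, the split between a context-coded first bit and bypass-coded remaining bits can be sketched as follows. The sketch assumes a truncated unary binarization of the index (as used for the merge index in HEVC) and does not model the arithmetic coder itself; the function names and the context labels are hypothetical.

```python
from typing import List, Tuple


def binarize_truncated_unary(index: int, max_index: int) -> List[int]:
    """Truncated unary binarization: `index` ones, then a terminating zero
    unless `index` equals `max_index`."""
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bins = [1] * index
    if index < max_index:
        bins.append(0)
    return bins


def plan_index_bins(index: int,
                    max_num_candidates: int,
                    affine_merge: bool,
                    shared_context: bool = True) -> List[Tuple[int, str]]:
    """Pair each bin with the coding engine it would use (assumed labels).

    The first bin is context coded, either with one context shared by both
    modes or with a per-mode context; every later bin is bypass coded.
    """
    bins = binarize_truncated_unary(index, max_num_candidates - 1)
    plan = []
    for position, bin_value in enumerate(bins):
        if position > 0:
            plan.append((bin_value, "bypass"))
        elif shared_context:
            plan.append((bin_value, "ctx_shared"))
        else:
            plan.append((bin_value, "ctx_affine" if affine_merge else "ctx_merge"))
    return plan
```

For example, with a maximum of five candidates, index 2 is binarized as the bins 1, 1, 0, of which only the first consults a context variable.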

According to a forty-second aspect of the present invention, there is provided a method of decoding a motion information predictor index, the method comprising: generating a list of motion information predictor candidates; decoding the motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; identifying one of the motion information predictor candidates in the list as an affine merge mode predictor using the decoded motion information predictor index, in case of using an affine merge mode; and in the case of using a non-affine merge mode, identifying one of the motion information predictor candidates in the list as a non-affine merge mode predictor using the decoded motion information predictor index.

Suitably, the CABAC decoding comprises using the same context variable for at least one bit of the motion information predictor index of the current block both when the affine merge mode is used and when the non-affine merge mode is used. Alternatively, the method further comprises obtaining, from a bitstream, data indicating that an affine merge mode is used, and the CABAC decoding comprises, for at least one bit of the motion information predictor index of the current block: using a first context variable if the obtained data indicates that the affine merge mode is used; and using a second context variable if the obtained data indicates that the non-affine merge mode is used.

Suitably, the method further comprises obtaining, from the bitstream, data indicating that an affine merge mode is used, wherein, where the obtained data indicates that the affine merge mode is used, the generated list of motion information predictor candidates comprises affine merge mode predictor candidates; and where the obtained data indicates that a non-affine merge mode is used, the generated list of motion information predictor candidates comprises non-affine merge mode predictor candidates.

Suitably, the method further comprises obtaining data from the bitstream for determining a maximum number of motion information predictor candidates that can be included in the generated list of motion information predictor candidates. Suitably, all bits of the motion information predictor index except the first bit are bypass CABAC decoded. Suitably, the first bit is CABAC decoded. Suitably, decoding the motion information predictor index comprises parsing the same syntax elements from the bitstream both when the affine merge mode is used and when the non-affine merge mode is used. Suitably, the motion information predictor candidate comprises information for obtaining a motion vector. Suitably, the generated list of motion information predictor candidates comprises ATMVP candidates. Suitably, the generated list of motion information predictor candidates has the same maximum number of motion information predictor candidates that can be included therein, both when the affine merge mode is used and when the non-affine merge mode is used.
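By way of illustration only, the decoder-side parsing of such an index might be sketched as below. The sketch again assumes a truncated unary binarization; the two callables stand in for the context-coded and bypass decoding engines of a CABAC decoder and are hypothetical, as is the function name.

```python
from typing import Callable


def decode_predictor_index(read_context_bin: Callable[[], int],
                           read_bypass_bin: Callable[[], int],
                           max_num_candidates: int) -> int:
    """Parse a truncated unary index whose first bin is context coded and
    whose remaining bins are bypass coded.

    Parsing stops at the first zero bin, or once the index reaches
    max_num_candidates - 1, in which case no terminating zero is present.
    """
    max_index = max_num_candidates - 1
    index = 0
    while index < max_index:
        bin_value = read_context_bin() if index == 0 else read_bypass_bin()
        if bin_value == 0:
            break
        index += 1
    return index
```

The same parsing loop serves both the affine and the non-affine merge mode when the two modes share one syntax element and one maximum list size; only the candidate list that the decoded index points into differs.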

According to a forty-third aspect of the present invention, there is provided an apparatus for encoding a motion information predictor index, the apparatus comprising: means for generating a list of motion information predictor candidates; means for selecting one of the motion information predictor candidates in the list as an affine merge mode predictor if an affine merge mode is used; means for selecting one of the motion information predictor candidates in the list as a non-affine merge mode predictor if a non-affine merge mode is used; and means for generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded. Suitably, the apparatus comprises means for performing the method of encoding a motion information predictor index according to the forty-first aspect.

According to a forty-fourth aspect of the present invention, there is provided an apparatus for decoding a motion information predictor index, the apparatus comprising: means for generating a list of motion information predictor candidates; means for decoding the motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; means for identifying one of the motion information predictor candidates in the list as an affine merge mode predictor using the decoded motion information predictor index if affine merge mode is used; and means for identifying one of the motion information predictor candidates in the list as a non-affine merge mode predictor using the decoded motion information predictor index if a non-affine merge mode is used. Suitably, the apparatus comprises means for performing a method of decoding a motion information predictor index according to the forty-second aspect.

According to a forty-fifth aspect of the present invention, there is provided a method of encoding a motion information predictor index for an affine merge mode, the method comprising: generating a list of motion information predictor candidates; selecting one of the motion information predictor candidates in the list as an affine merge mode predictor; and generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded.

Suitably, where a non-affine merge mode is used, the method further comprises selecting one of the motion information predictor candidates in the list as a non-affine merge mode predictor. Suitably, the CABAC encoding comprises, for at least one bit of the motion information predictor index of the current block, using a first context variable if the affine merge mode is used or a second context variable if the non-affine merge mode is used; and the method further comprises, when the affine merge mode is used, including in the bitstream data indicating that the affine merge mode is used. Alternatively, the CABAC encoding comprises using the same context variable for at least one bit of the motion information predictor index of the current block both when the affine merge mode is used and when the non-affine merge mode is used.

Suitably, the method further comprises including in the bitstream data for determining the maximum number of motion information predictor candidates that can be included in the generated list of motion information predictor candidates.

Suitably, all bits of the motion information predictor index except the first bit are bypass CABAC encoded. Suitably, the first bit is CABAC encoded. Suitably, the motion information predictor index for the selected motion information predictor candidate is encoded using the same syntax elements in case of using an affine merge mode and in case of using a non-affine merge mode.

According to a forty-sixth aspect of the present invention, there is provided a method of decoding a motion information predictor index for an affine merge mode, the method comprising: generating a list of motion information predictor candidates; decoding the motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; and, where an affine merge mode is used, using the decoded motion information predictor index to identify one of the motion information predictor candidates in the list as an affine merge mode predictor.

Suitably, where a non-affine merge mode is used, the method further comprises using the decoded motion information predictor index to identify one of the motion information predictor candidates in the list as a non-affine merge mode predictor. Suitably, the method further comprises obtaining, from the bitstream, data indicating that an affine merge mode is used, and the CABAC decoding comprises, for at least one bit of the motion information predictor index of the current block: using a first context variable if the obtained data indicates that the affine merge mode is used; and using a second context variable if the obtained data indicates that the non-affine merge mode is used. Alternatively, the CABAC decoding comprises using the same context variable for at least one bit of the motion information predictor index of the current block both when the affine merge mode is used and when the non-affine merge mode is used.

Suitably, the method further comprises: obtaining data indicating use of an affine merge mode from the bitstream, wherein the generated list of motion information predictor candidates comprises affine merge mode predictor candidates in case the obtained data indicates use of an affine merge mode, and the generated list of motion information predictor candidates comprises non-affine merge mode predictor candidates in case the obtained data indicates use of a non-affine merge mode.

Suitably, decoding the motion information predictor index comprises parsing the same syntax elements from the bitstream both when the affine merge mode is used and when the non-affine merge mode is used. Suitably, the method further comprises obtaining, from the bitstream, data for determining the maximum number of motion information predictor candidates that can be included in the generated list of motion information predictor candidates. Suitably, all bits of the motion information predictor index except the first bit are bypass CABAC decoded. Suitably, the first bit is CABAC decoded. Suitably, the motion information predictor candidate comprises information for obtaining a motion vector. Suitably, the generated list of motion information predictor candidates comprises ATMVP candidates. Suitably, the generated list of motion information predictor candidates has the same maximum number of motion information predictor candidates that can be included therein, both when the affine merge mode is used and when the non-affine merge mode is used.

According to a forty-seventh aspect of the present invention, there is provided an apparatus for encoding a motion information predictor index for an affine merge mode, the apparatus comprising: means for generating a list of motion information predictor candidates; means for selecting one of the motion information predictor candidates in the list as an affine merge mode predictor; and means for generating a motion information predictor index for the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded. Suitably, the apparatus comprises means for performing the method of encoding a motion information predictor index according to the forty-fifth aspect.

According to a forty-eighth aspect of the present invention, there is provided an apparatus for decoding a motion information predictor index for an affine merge mode, the apparatus comprising: means for generating a list of motion information predictor candidates; means for decoding a motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; and means for identifying one of the motion information predictor candidates in the list as an affine merge mode predictor using the decoded motion information predictor index, if an affine merge mode is used. Suitably, the apparatus comprises means for performing the method of decoding a motion information predictor index according to the forty-sixth aspect.

In an embodiment, the camera is adapted to indicate when said zoom component is operable and to signal an affine mode in dependence upon said indication that the zoom component is operable.

In another embodiment, the camera further comprises a pan component.

In another embodiment, the camera is adapted to indicate when said pan component is operable and to signal an affine mode in dependence upon said indication that the pan component is operable.

According to yet another aspect of the present invention there is provided a mobile device comprising a camera embodying any of the camera aspects described above.

In one embodiment, the mobile device further comprises at least one position sensor adapted to sense a change in orientation of the mobile device.

In one embodiment, the mobile device is adapted to signal an affine mode in dependence upon said sensed change in orientation of the mobile device.

Other features of the invention are characterized by the other independent and dependent claims.

Any feature in one aspect of the invention may be applied to other aspects of the invention in any suitable combination. In particular, method aspects may apply to apparatus aspects and vice versa.

Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference herein to software and hardware features should be construed accordingly.

Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means-plus-function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.

It should also be understood that particular combinations of the various features described and defined in any aspect of the invention may be implemented, provided and/or used independently.

Drawings

Reference will now be made, by way of example, to the accompanying drawings, in which:

FIG. 1 is a diagram illustrating a coding structure used in HEVC;

FIG. 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the present invention may be implemented;

FIG. 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;

FIG. 4 is a flow chart illustrating the steps of an encoding method according to an embodiment of the present invention;

FIG. 5 is a flow chart illustrating the steps of a decoding method according to an embodiment of the invention;

FIGS. 6a and 6b illustrate spatial and temporal blocks that may be used to generate a motion vector predictor;

FIG. 7 shows simplified steps of an AMVP predictor subset derivation process;

FIG. 8 is a diagram of a motion vector derivation process in the merge mode;

FIG. 9 illustrates the partitioning and temporal motion vector prediction of a current block;

FIG. 10(a) illustrates coding of a merge index for HEVC, or coding when ATMVP is not enabled at the SPS level;

FIG. 10(b) illustrates coding of a merge index when ATMVP is enabled at the SPS level;

FIG. 11(a) illustrates a simple affine motion field;

FIG. 11(b) illustrates a more complex affine motion field;

FIG. 12 is a flow chart of a partial decoding process of some syntax elements related to the coding mode;

FIG. 13 is a flowchart illustrating merge candidate derivation;

FIG. 14 is a flow chart illustrating a first embodiment of the present invention;

FIG. 15 is a flowchart of a partial decoding process of some syntax elements related to a coding mode in a twelfth embodiment of the present invention;

FIG. 16 is a flowchart illustrating generation of a merge candidate list in the twelfth embodiment of the present invention;

FIG. 17 is a block diagram illustrating a CABAC encoder suitable for use in an embodiment of the present invention;

FIG. 18 is a schematic block diagram of a communication system for implementing one or more embodiments of the present invention;

FIG. 19 is a schematic block diagram of a computing device;

FIG. 20 is a diagram illustrating a network camera system;

FIG. 21 is a diagram illustrating a smartphone;

FIG. 22 is a flowchart of a partial decoding process of some syntax elements related to a coding mode according to the sixteenth embodiment;

FIG. 23 is a flow diagram illustrating a single index signaling scheme for both merge mode and affine merge mode; and

FIG. 24 is a flowchart illustrating an affine merge candidate derivation process for the affine merge mode.

Detailed Description

Embodiments of the present invention described below relate to improving encoding and decoding of indices using CABAC. It should be appreciated that implementations for improving other context-based arithmetic coding schemes that are functionally similar to CABAC are also possible according to alternative embodiments of the invention. Prior to describing the embodiments, video encoding and decoding techniques and related encoders and decoders will be described.

Fig. 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard. The video sequence 1 consists of a series of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.

The images 2 of the sequence may be segmented into slices 3. In some cases, a slice may constitute the entirety of an image. The slices are partitioned into non-overlapping Coding Tree Units (CTUs). A CTU is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to the macroblock units used in several previous video standards. A CTU is sometimes also referred to as a Largest Coding Unit (LCU). The CTU has luminance and chrominance component parts, each of which is called a Coding Tree Block (CTB). These different color components are not shown in Fig. 1.

For HEVC, the CTU typically has a size of 64 pixels by 64 pixels, but for VVC, the size may be 128 pixels by 128 pixels. The respective CTUs may be iteratively partitioned into smaller variable size Coding Units (CUs) 5 using a quadtree decomposition.
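By way of illustration only, the iterative quadtree decomposition of a CTU into CUs can be sketched as a recursion. The split decision is passed in as a callable because a real encoder would make it by rate-distortion optimisation, which is not modelled here; the function name, the tuple layout and the minimum size parameter are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Leaf = Tuple[int, int, int]  # (x, y, size) of a square coding unit


def quadtree_partition(x: int, y: int, size: int, min_size: int,
                       should_split: Callable[[int, int, int], bool]) -> List[Leaf]:
    """Recursively split a square block into four equal quadrants.

    A block is split only while it is larger than min_size and the
    caller-supplied predicate asks for a split; otherwise it becomes a leaf.
    Leaves are returned in raster order of the quadrants.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Leaf] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_partition(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    return [(x, y, size)]
```

For instance, calling the function on a 64 by 64 block with a predicate that splits the whole block once and then splits only its top-left 32 by 32 quadrant yields seven leaves whose areas sum to the area of the original block.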

A coding unit is a basic coding element and is composed of two types of sub-units called a Prediction Unit (PU) and a Transform Unit (TU). The maximum size of a PU or TU is equal to the CU size. The prediction unit corresponds to a partition of the CU used for prediction of pixel values. It is possible to partition a CU into various different partitions of PUs, including a partition divided into 4 square PUs, and two different partitions divided into 2 rectangular PUs, as shown at 6. The transform unit is a basic unit for spatial transform using DCT. A CU may be partitioned into TUs based on a quadtree representation 7.
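By way of illustration only, the square and rectangular PU partitions mentioned above can be enumerated as rectangles within a CU. The mode labels follow the symmetric partition names commonly used for HEVC (2Nx2N, 2NxN, Nx2N, NxN); the function name and the tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) relative to the CU


def pu_partitions(cu_size: int, mode: str) -> List[Rect]:
    """Return the prediction unit rectangles for a symmetric partition mode."""
    n = cu_size // 2
    if mode == "2Nx2N":  # a single PU covering the whole CU
        return [(0, 0, cu_size, cu_size)]
    if mode == "2NxN":   # two rectangular PUs stacked vertically
        return [(0, 0, cu_size, n), (0, n, cu_size, n)]
    if mode == "Nx2N":   # two rectangular PUs side by side
        return [(0, 0, n, cu_size), (n, 0, n, cu_size)]
    if mode == "NxN":    # four square PUs
        return [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)]
    raise ValueError(f"unknown partition mode: {mode}")
```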

Each slice is embedded in a Network Abstraction Layer (NAL) unit. In addition, the coding parameters of a video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264/AVC, two parameter set NAL units are employed: first, a Sequence Parameter Set (SPS) NAL unit, which collects all parameters that are invariant during the entire video sequence. Typically, it handles the encoding profile, the size of the video frame and other parameters. Second, a Picture Parameter Set (PPS) NAL unit, which includes parameters that can be changed from one picture (or frame) to another picture (or frame) of a sequence. HEVC also includes a Video Parameter Set (VPS) NAL unit that contains parameters that describe the overall structure of the bitstream. VPS is a new type of parameter set defined in HEVC and applies to all layers of the bitstream. A layer may contain multiple temporal sub-layers and all version 1 bitstreams are limited to a single layer. HEVC has certain layered extensions for scalability and multiview, and these extensions will allow multiple layers with a backward compatible version 1 base layer.
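By way of illustration only, the parameter set NAL units can be told apart by reading the two-byte HEVC NAL unit header, in which a 1-bit forbidden zero bit is followed by a 6-bit NAL unit type, a 6-bit layer identifier and a 3-bit temporal identifier plus one. The sketch below applies to HEVC only (the H.264/AVC header is laid out differently); the function name and the returned dictionary are assumptions made for this example.

```python
from typing import Dict, Union

# HEVC NAL unit type values for the three parameter sets.
PARAMETER_SET_NAMES = {32: "VPS", 33: "SPS", 34: "PPS"}


def parse_hevc_nal_header(header: bytes) -> Dict[str, Union[int, str]]:
    """Decode the fields of a two-byte HEVC NAL unit header."""
    if len(header) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    value = (header[0] << 8) | header[1]
    nal_unit_type = (value >> 9) & 0x3F
    return {
        "forbidden_zero_bit": value >> 15,
        "nal_unit_type": nal_unit_type,
        "nuh_layer_id": (value >> 3) & 0x3F,
        "temporal_id": (value & 0x07) - 1,
        # "other" covers slice data and every remaining NAL unit type.
        "kind": PARAMETER_SET_NAMES.get(nal_unit_type, "other"),
    }
```

A layer identifier of zero in every NAL unit is consistent with the single-layer restriction on version 1 bitstreams noted above.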

Figs. 2 and 18 illustrate data communication systems in which one or more embodiments of the present invention may be implemented. The data communication system comprises a transmitting device 191, such as a server 201, operable to transmit data packets of a data stream, such as a bitstream 101, to a receiving device 195, such as a client terminal 202, via a data communication network 200. The data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be, for example, a wireless network (Wi-Fi/802.11a, b or g), an Ethernet network, an Internet network or a hybrid network composed of several different networks. In a particular embodiment of the invention, the data communication system may be a digital television broadcast system in which the server 201 transmits the same data content to a plurality of clients.

The data stream 204 (or bitstream 101) provided by the server 201 may be composed of multimedia data representing video (e.g., image sequence 151) and audio data. In some embodiments of the invention, the audio and video data streams may be captured by the server 201 using a microphone and camera, respectively. In some embodiments, the data stream may be stored on server 201 or received by server 201 from other data providers, or generated at server 201. The server 201 is provided with an encoder 150 for encoding video and audio streams, in particular to provide a compressed bitstream 101 for transmission, which compressed bitstream 101 is a more compact representation of the data presented as input to the encoder.

To obtain a better ratio of the quality of the transmitted data to the quantity of the transmitted data, the video data may be compressed, for example, according to the HEVC format, the H.264/AVC format or the VVC format.

The client 202 receives the transmitted bitstream 101, and its decoder 100 decodes the received bitstream to reproduce video images (e.g., video signal 109) on a display device and audio data using a speaker.

Although a streaming scenario is considered in the examples of Figs. 2 and 18, it will be appreciated that in some embodiments of the invention, data communication between the encoder and decoder may be performed using, for example, a media storage device such as an optical disc.

In one or more embodiments of the invention, a video image is transmitted with data representing compensation offsets to be applied to reconstructed pixels of the image to provide filtered pixels in a final image.

Fig. 3 schematically illustrates a processing apparatus 300 configured to implement at least one embodiment of the invention. The processing device 300 may be a device such as a microcomputer, a workstation, or a lightweight portable device. The apparatus 300 includes a communication bus 313 connected to:

a central processing unit 311, such as a microprocessor or the like, denoted as CPU;

a read only memory 307, denoted ROM, for storing a computer program implementing the invention;

a random access memory 312, denoted RAM, for storing executable code of the method of an embodiment of the invention, and registers adapted to record variables and parameters required for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to an embodiment of the invention; and

a communication interface 302 connected to a communication network 303, through which the digital data to be processed are transmitted or received.

Optionally, the device 300 may also include the following components:

a data storage component 304, such as a hard disk or the like, for storing computer programs implementing the methods of one or more embodiments of the invention and data used or generated during implementation of one or more embodiments of the invention;

a disc drive 305 for a disc 306, the disc drive being adapted to read data from or write data to the disc 306; and

a screen 309 for displaying data and/or for serving as a graphical interface for interaction with a user, by means of a keyboard 310 or any other pointing device.

The device 300 may be connected to various peripheral devices such as a digital camera 320 or a microphone 308, each of which is connected to an input/output card (not shown) to provide multimedia data to the device 300.

The communication bus 313 provides communication and interoperability between various elements included in the device 300 or connected to the device 300. The representation of the bus is not limiting and, in particular, the central processing unit is operable to communicate instructions to any element of the device 300, either directly or by means of other elements of the device 300.

The disc 306 may be replaced by any information medium, such as a compact disc-ROM (CD-ROM), ZIP disc or memory card, rewritable or not, and in general by an information storage means readable by a microcomputer or microprocessor, the disc 306 being integrated or not in the device, possibly removable, and being adapted to store one or more programs which perform the method enabling the encoding of a sequence of digital images and/or the decoding of a bitstream according to the invention.

The executable code may be stored in read only memory 307, on hard disk 304, or on a removable digital medium such as, for example, disk 306 as previously described. According to a variant, the executable code of the program may be received via the interface 302 by means of a communication network 303 to be stored in one of the storage means of the device 300, such as a hard disk 304 or the like, before execution.

The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the one or more programs according to the invention, which instructions are stored in one of the aforementioned storage means. At power-on, the one or more programs stored in a non-volatile memory (e.g., on the hard disk 304 or in the read-only memory 307) are transferred into the random access memory 312, which then contains the executable code of the one or more programs, as well as registers for storing the variables and parameters necessary for implementing the invention.

In this embodiment, the device is a programmable device that uses software to implement the invention. However, the invention may alternatively be implemented in hardware (e.g., in the form of an application specific integrated circuit or ASIC).

Fig. 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programmed instructions executed by the CPU 311 of the device 300, at least one corresponding step of a method of encoding an image of a sequence of images according to one or more embodiments of the invention.

The encoder 400 receives as input an original sequence 401 of digital images i0 to in. Each digital image is represented by a set of samples, sometimes also referred to as pixels (hereinafter, they are referred to as pixels).

The encoder 400 outputs a bitstream 410 after implementing the encoding process. The bitstream 410 includes a plurality of coding units or slices, each slice including a slice header for transmitting an encoded value of an encoding parameter used for encoding the slice, and a slice body including encoded video data.

Module 402 segments the input digital images i0 to in 401 into blocks of pixels. The blocks correspond to image portions and may be of variable size (e.g., 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64 or 128 × 128 pixels; several rectangular block sizes may also be considered). A coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction (intra prediction) and coding modes based on temporal prediction (inter coding, merge, skip). The possible coding modes are tested.

Module 403 implements an intra prediction process in which a block to be encoded is predicted by predictors calculated from neighboring pixels of the given block to be encoded. If intra coding is selected, the selected intra predictor and an indication of the difference between the given block and its predictor are encoded to provide a residual.

Temporal prediction is implemented by a motion estimation module 404 and a motion compensation module 405. First, a reference image from a reference image set 416 is selected, and a portion of the reference image (also referred to as a reference region or image portion), which is the region closest (in terms of similarity in pixel values) to a given block to be encoded, is selected by the motion estimation module 404. The motion compensation module 405 then uses the selected region to predict the block to be encoded. The difference between the selected reference area and a given block (also called residual block) is calculated by the motion compensation module 405. The selected reference area is indicated using a motion vector.

Thus, in both cases (spatial and temporal prediction), the residual is calculated by subtracting the predictor from the original block.
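By way of illustration only, the subtraction described above, and the matching addition performed on the decoding side, can be sketched as follows. The function names and the sample values are hypothetical and are not taken from the figures.

```python
def residual_block(original, predictor):
    """Residual = original block minus predictor block, sample by sample."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, predictor)]


def reconstruct_block(predictor, residual):
    """Inverse operation: predictor plus residual gives the block back."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predictor, residual)]


# Hypothetical 2 x 2 block and its (intra or inter) predictor.
original = [[52, 55], [61, 59]]
predictor = [[50, 54], [60, 60]]

residual = residual_block(original, predictor)
rebuilt = reconstruct_block(predictor, residual)
```

In this sketch the reconstruction is exact because the residual is neither transformed nor quantized; in the encoder of fig. 4 the residual passes through modules 407 and 408 first, so the reconstruction is generally only approximate.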

In the intra prediction implemented by module 403, the prediction direction is encoded. In the inter prediction implemented by the modules 404, 405, 416, 418, 417, at least one motion vector or data for identifying such a motion vector is encoded for temporal prediction.

If inter prediction is selected, information on a motion vector and a residual block is encoded. To further reduce the bitrate, the motion vector is encoded by the difference with respect to the motion vector predictor, assuming that the motion is homogenous. The motion vector predictor from the set of motion information predictor candidates is obtained from the motion vector field 418 by a motion vector prediction and coding module 417.

The encoder 400 also includes a selection module 406 for selecting an encoding mode by applying an encoding cost criterion, such as a rate-distortion criterion. To further reduce redundancy, a transform (such as a DCT or the like) is applied to the residual block by transform module 407, and the resulting transformed data is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bitstream 410.

The encoder 400 also performs decoding of the encoded image to produce a reference image (e.g., one of the reference images/pictures 416) for motion estimation of subsequent images. This enables the encoder and a decoder receiving the bitstream to have the same reference frames (reconstructed images or image portions are used). The inverse quantization ("dequantization") module 411 performs inverse quantization of the quantized data, followed by an inverse transform by the inverse transform module 412. The intra prediction module 413 uses the prediction information to determine which predictor to use for a given block, and the motion compensation module 414 adds the residual obtained by module 412 to the reference region obtained from the reference image set 416.

Post-filtering is then applied by module 415 to filter the reconstructed frame of pixels (image or image portion). In an embodiment of the invention, an SAO loop filter is used, wherein a compensation offset is added to the pixel values of the reconstructed pixels of the reconstructed image. It will be appreciated that post-filtering is not always necessary. Furthermore, any other type of post-filtering may be performed in addition to or instead of SAO loop filtering.

Fig. 5 shows a block diagram of a decoder 60 according to an embodiment of the invention, the decoder 60 being operable to receive data from an encoder. The decoder is represented by connected modules, each adapted to implement a respective step of the method implemented by the decoder 60, for example in the form of programmed instructions to be executed by the CPU 311 of the device 300.

The decoder 60 receives a bitstream 61 comprising coding units (e.g., data corresponding to blocks or coding units), each coding unit consisting of a header containing information on the encoded parameters and a body containing the encoded video data. As explained with respect to fig. 4, for a given block, the encoded video data is entropy encoded over a predetermined number of bits, and the index of the motion vector predictor is encoded. The received encoded video data is entropy decoded by module 62. The residual data is then dequantized by a module 63, after which an inverse transform is applied by a module 64 to obtain pixel values.

Mode data indicating an encoding mode is also entropy-decoded, and based on the mode, an encoded block (unit/set/group) of image data is intra-type-decoded or inter-type-decoded.

In the case of intra mode, the intra inverse prediction module 65 determines an intra predictor based on the intra prediction mode specified in the bitstream.

If the mode is inter, motion prediction information is extracted from the bitstream to find (identify) a reference region used by the encoder. The motion prediction information includes a reference frame index and a motion vector residual. The motion vector predictor is added to the motion vector residual by the motion vector decoding module 70 to obtain a motion vector.

The motion vector decoding module 70 applies motion vector decoding to each current block encoded by motion prediction. Once the index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded and used to apply motion compensation by the module 66. The reference picture portion indicated by the decoded motion vector is extracted from the reference picture 68 to apply the motion compensation 66. The motion vector field data 71 is updated with the decoded motion vectors for the prediction of the subsequently decoded motion vectors.

Finally, a decoded block is obtained. Suitably, post-filtering is applied by a post-filtering module 67. Decoder 60 ultimately provides or obtains decoded video signal 69.

CABAC

HEVC uses several types of entropy coding, such as context-based adaptive binary arithmetic coding (CABAC), Golomb-Rice codes, or a simple binary representation known as fixed-length coding. Most of the time, a binarization process is performed to represent the different syntax elements. This binarization process is also very specific and depends on the syntax element concerned. Arithmetic coding represents a syntax element according to its current probability. CABAC is an extension of arithmetic coding that separates the probabilities of a syntax element according to a "context" defined by a context variable. This corresponds to a conditional probability. The context variable may be derived from the values of the current syntax element for the already decoded top left block (A2 in fig. 6b, as described in more detail below) and above left block (B3 in fig. 6b).

CABAC has been adopted as a normative part of the H.264/AVC and H.265/HEVC standards. In H.264/AVC, it is one of two alternative methods of entropy coding. The other method specified in H.264/AVC is a low-complexity entropy coding technique based on context-adaptively switched sets of variable-length codes, so-called Context Adaptive Variable Length Coding (CAVLC). Compared to CABAC, CAVLC offers reduced implementation cost at the price of lower compression efficiency. For TV signals in standard or high definition resolution, CABAC typically provides bit rate savings of 10% to 20% relative to CAVLC at the same objective video quality. In HEVC, CABAC is one of the entropy coding methods used. Many bits are also bypass CABAC coded. Furthermore, some syntax elements are coded with unary codes or Golomb codes, which are other types of entropy code.

Fig. 17 shows the main blocks of the CABAC encoder.

Input syntax elements of non-binary values are binarized by the binarizer 1701. The coding strategy of CABAC is based on the following findings: very efficient encoding of syntax element values (such as motion vector differences or components of transform coefficient level values) in a hybrid block-based video encoder can be achieved by employing a binarization scheme as a pre-processing unit for the subsequent stages of context modeling and binary arithmetic coding. In general, binarization schemes define a unique mapping of syntax element values to binary decision sequences (so-called bins), which can also be interpreted in terms of binary coding trees. The design of the binarization scheme in CABAC is based on several basic prototypes, whose structure enables simple on-line calculations, and which are adapted to some suitable model probability distribution.

The bins may be processed in one of two basic ways depending on the setting of the switch 1702. When the switch is in the "regular" setting, the bin is supplied to the context modeler 1703 and the regular coding engine 1704. When the switch is in the "bypass" setting, the context modeler is bypassed and the bin is supplied to the bypass coding engine 1705. Another switch 1706 has "regular" and "bypass" settings similar to those of the switch 1702, so that the bins coded by the applicable one of the coding engines 1704 and 1705 can form a bitstream as the output of the CABAC encoder.

It is to be understood that the other switch 1706 may be used with a storage device to group some of the bins coded by the coding engine 1705 (e.g., the bins used to encode a block or coding unit) to provide a block of bypass-coded data in the bitstream, and to group some of the bins coded by the coding engine 1704 (e.g., the bins used to encode a block or coding unit) to provide another block of "regular" (or arithmetically) coded data in the bitstream. Such separate grouping of bypass-coded and regular-coded data may result in improved throughput during the decoding process.

Once each syntax element value has been decomposed into a sequence of bins, the further processing of each bin value in CABAC depends on the associated coding mode decision, which may be either the regular mode or the bypass mode. The latter is chosen for bins related to sign information or for less significant bins, which are assumed to be uniformly distributed and for which the whole regular binary arithmetic coding process is therefore simply bypassed. In the regular coding mode, each bin value is coded using the regular binary arithmetic coding engine, where the associated probability model is either determined by a fixed choice, without any context modeling, or adaptively chosen depending on the related context model. As an important design decision, the latter case is generally applied only to the most frequently observed bins, whereas the other, usually less frequently observed, bins are treated using a joint, typically zero-order, probability model. In this way, CABAC enables selective context modeling at the sub-symbol level, and thus provides an efficient tool for exploiting inter-symbol redundancy at significantly reduced overall modeling or learning cost. For the specific choice of context models, four basic design types are employed in CABAC, of which only two are applied to the coding of transform coefficient levels. The design of these four prototypes is based on a priori knowledge of the typical characteristics of the source data to be modeled, and reflects the aim of finding a good compromise between the conflicting objectives of avoiding unnecessary modeling cost overhead and exploiting statistical dependencies to a large extent.

At the lowest level of processing in CABAC, each bin value enters the binary arithmetic encoder in either the regular or the bypass coding mode. For the latter, a fast branch of the coding engine with considerably reduced complexity is used, whereas for the former, the coding of the given bin value depends on the actual state of the associated adaptive probability model, which is passed along with the bin value to the M-coder (the term chosen for the table-based adaptive binary arithmetic coding engine in CABAC).
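By way of illustration only, the trade-off between the two coding modes can be sketched with an idealized cost model: a bypass bin is assumed to cost exactly one bit, whereas a bin coded with an adaptive context is assumed to cost about -log2(p) bits, p being the probability currently estimated for its value. The class `ToyContext` and its exponential update rule are hypothetical simplifications; they do not reproduce the table-based state machine actually used by the CABAC engine.

```python
import math


class ToyContext:
    """Toy adaptive context tracking P(bin == 1); not the real CABAC state table."""

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one
        self.rate = rate

    def code(self, bin_val):
        # Ideal arithmetic-coding cost of this bin under the current estimate.
        p = self.p_one if bin_val else 1.0 - self.p_one
        bits = -math.log2(p)
        # Move the estimate towards the value just observed.
        self.p_one += self.rate * (bin_val - self.p_one)
        return bits


def estimated_bits(bins):
    """bins: list of (bin_value, context_id); context_id None means bypass."""
    contexts = {}
    total = 0.0
    for bin_val, ctx_id in bins:
        if ctx_id is None:
            total += 1.0  # bypass: fixed probability of 1/2, one bit per bin
        else:
            total += contexts.setdefault(ctx_id, ToyContext()).code(bin_val)
    return total


# A strongly skewed source: one hundred bins that are all equal to 0.
skewed = [0] * 100
regular_cost = estimated_bits([(b, "ctx0") for b in skewed])
bypass_cost = estimated_bits([(b, None) for b in skewed])
```

Under these assumptions the context-coded cost of the skewed source is far below the 100 bits of the bypass path, while for bins that really are uniformly distributed the context brings no gain and only adds modeling work, which is the motivation for bypassing it.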

Interframe coding

HEVC uses 3 different inter modes: inter mode (advanced motion vector prediction (AMVP)), a "classical" merge mode (i.e., "non-affine merge mode", also referred to as "regular" merge mode), and a "classical" merge skip mode (i.e., "non-affine merge skip mode", also referred to as "regular" merge skip mode). The main difference between these modes is the data signaled in the bitstream. For motion vector coding, the current HEVC standard includes a competition-based scheme for motion vector prediction that is not present in earlier versions of the standard. This means that several candidates compete on the encoder side, under a rate-distortion criterion, to find the best motion vector predictor or the best motion information for the inter mode or the merge modes (i.e., the "classical/regular" merge mode or the "classical/regular" merge skip mode), respectively. The index corresponding to the best predictor or the best candidate of motion information is then inserted into the bitstream. The decoder can derive the same set of predictors or candidates and uses the best one according to the decoded index. In the screen content extension of HEVC, a new coding tool called Intra Block Copy (IBC) is signaled as any of these three inter modes, the distinction between IBC and the equivalent inter mode being made by checking whether the reference frame is the current frame. This may be implemented, for example, by checking the reference index of the list L0 and inferring Intra Block Copy if it designates the last frame in that list. Another approach is to compare the picture order counts of the current and reference frames: if they are equal, the mode is Intra Block Copy.

The design of the derivation of predictors and candidates is important in achieving the best coding efficiency without a disproportionate impact on complexity. In HEVC, two motion vector derivations are used: one for inter mode (advanced motion vector prediction (AMVP)) and one for the merge modes (merge derivation process for the classical merge mode and the classical merge skip mode). These processes are described below.

Fig. 6a and 6b illustrate spatial and temporal blocks that may be used to generate motion vector predictors in Advanced Motion Vector Prediction (AMVP) and merge modes of an HEVC encoding and decoding system, and fig. 7 shows simplified steps of a process of AMVP predictor set derivation.

As represented in fig. 6a, two spatial predictors (i.e., the two spatial motion vectors of the AMVP mode) are selected among the motion vectors of the top blocks (indicated by the letter "B") and the left blocks (indicated by the letter "A"), including the top corner block (block B2) and the left corner block (block A0), and one temporal predictor is selected among the motion vectors of the bottom right block (H) and the center block (Center) of the collocated block.

Table 1 below summarizes the nomenclature used when referring to blocks relative to the current block as shown in figs. 6a and 6b. This nomenclature is used as a shorthand, but it should be understood that other labeling systems may be used, particularly in future versions of the standard.

TABLE 1

It should be noted that the "current block" may be variable in size, e.g., 4 × 4, 16 × 16, 32 × 32, 64 × 64, 128 × 128, or any size in between. The dimensions of a block are preferably powers of 2 (i.e., 2^n × 2^m, where n and m are positive integers), as this results in a more efficient use of bits when binary coding is used. The current block need not be square, although this is often a preferable embodiment for coding complexity.

Turning to fig. 7, the first step aims at selecting a first spatial predictor (Cand 1, 706) among the bottom left blocks A0 and A1 (spatial positions are illustrated in fig. 6a). To this end, the blocks are selected one after the other in a given order (700, 702), and for each selected block the following conditions are evaluated in the given order (704), the first block that satisfies a condition being set as predictor:

-motion vectors from the same reference list and the same reference picture;

-motion vectors from other reference lists and the same reference picture;

-scaled motion vectors from the same reference list and different reference images; or

-scaled motion vectors from other reference lists and different reference images.

If no value is found, the left predictor is considered to be unavailable. This indicates that the related blocks are intra coded or do not exist.
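By way of illustration only, the scan over the candidate blocks and over the four conditions listed above can be sketched as follows. The data layout (each neighboring block is `None` or a dictionary mapping a reference list index to a hypothetical `{"ref_poc", "mv"}` entry), the function names and the purely linear scaling by picture order count distance are assumptions made for this sketch; the standard itself specifies a fixed-point scaling with clipping.

```python
def scale_mv(mv, cur_poc, target_ref_poc, cand_ref_poc):
    """Simplified linear scaling of a motion vector by temporal distance."""
    tb = cur_poc - target_ref_poc   # distance to the wanted reference picture
    td = cur_poc - cand_ref_poc     # distance covered by the candidate vector
    if td == 0:
        return mv
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


def first_spatial_predictor(blocks, cur_poc, target_list, target_ref_poc):
    """Return the first usable motion vector, or None if none is found."""
    other_list = 1 - target_list
    for block in blocks:                       # e.g. [A0, A1] or [B0, B1, B2]
        if block is None:                      # intra coded or non-existent
            continue
        # Conditions 1 and 2: same reference picture, no scaling needed.
        for lst in (target_list, other_list):
            entry = block.get(lst)
            if entry is not None and entry["ref_poc"] == target_ref_poc:
                return entry["mv"]
        # Conditions 3 and 4: different reference picture, scaled vector.
        for lst in (target_list, other_list):
            entry = block.get(lst)
            if entry is not None:
                return scale_mv(entry["mv"], cur_poc, target_ref_poc,
                                entry["ref_poc"])
    return None


# Hypothetical neighbors: A0 is intra coded, A1 points two pictures further back.
a0 = None
a1 = {0: {"ref_poc": 4, "mv": (8, -4)}}
left_predictor = first_spatial_predictor([a0, a1], cur_poc=8,
                                         target_list=0, target_ref_poc=6)
```

In this example only the third condition is met, so the vector of A1 is halved to account for the shorter temporal distance.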

The following steps are aimed at selecting a second spatial predictor (Cand 2, 716) among the upper right block B0, the upper block B1 and the upper left block B2 (spatial positions are illustrated in fig. 6 a). To this end, the blocks are selected one after the other in a given order (708, 710, 712), and for each selected block the above-mentioned conditions are evaluated in the given order (714), the first block that satisfies the above-mentioned conditions being set as predictor.

Likewise, if no value is found, the top predictor is considered to be unavailable. This indicates that the related blocks are intra coded or do not exist.

In a next step (718), if both predictors are available, the two predictors are compared to each other to remove one of the two predictors if they are equal (i.e., same motion vector value, same reference list, same reference index, and same direction type). If only one spatial predictor is available, the algorithm looks for a temporal predictor in the next step.

The temporal motion predictor (Cand 3, 726) is derived as follows: the bottom right (H, 720) position of the collocated block in a previous/reference frame is first considered in the availability check module 722. If it does not exist, or if its motion vector predictor is not available, the center of the collocated block (Centre, 724) is selected to be checked. These temporal positions (Centre and H) are depicted in fig. 6a. In any case, scaling 723 is applied to these candidates to match the temporal distance between the current frame and the first frame in the reference list.

The motion predictor value is then added to the set of predictors. Next, the number of predictors (Nb_Cand) is compared to the maximum number of predictors (Max_Cand) (728). As mentioned above, in the current version of the HEVC standard, the AMVP derivation process must generate a maximum number (Max_Cand) of two motion vector predictors.

If this maximum number is reached, a final list or set of AMVP predictors is constructed (732). Otherwise, the zero predictor is added to the list (730). A zero predictor is a motion vector equal to (0, 0).

As illustrated in fig. 7, a final list or set of AMVP predictors (732) is constructed from a subset of spatial motion predictor candidates (700 to 712) and from a subset of temporal motion predictor candidates (720, 724).
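By way of illustration only, the assembly of the final AMVP predictor set can be sketched as follows. For brevity, each predictor is reduced to a hypothetical `(mvx, mvy)` tuple, so that the equality test stands in for the comparison of motion vector value, reference list, reference index and direction type described above; the function name is likewise an assumption.

```python
def build_amvp_list(left, above, temporal, max_cand=2):
    """Assemble the predictor set: pruned spatial, then temporal, then zeros."""
    candidates = []
    # Spatial predictors, with the duplicate removed when both are equal (718).
    for cand in (left, above):
        if cand is not None and cand not in candidates:
            candidates.append(cand)
    # The temporal predictor is only needed when the set is not yet full.
    if len(candidates) < max_cand and temporal is not None:
        candidates.append(temporal)
    # Zero predictors (0, 0) complete the set (730).
    while len(candidates) < max_cand:
        candidates.append((0, 0))
    return candidates[:max_cand]


duplicated = build_amvp_list((1, 2), (1, 2), (5, 5))
nothing_available = build_amvp_list(None, None, None)
```

With two identical spatial predictors, the duplicate is dropped and the temporal predictor takes the second place; with nothing available, the set consists of two zero predictors.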

As mentioned above, a motion predictor candidate of the classical merge mode or of the classical merge skip mode represents all the required motion information: direction, list, reference frame index, and motion vector. An indexed list of several candidates is generated by the merge derivation process. In the current HEVC design, the maximum number of candidates for the two merge modes (i.e., classical merge mode and classical merge skip mode) is equal to five (4 spatial candidates and 1 temporal candidate).

Fig. 8 is a schematic diagram of the motion vector derivation process of the merge modes (the classical merge mode and the classical merge skip mode). In the first step of the derivation process, five block positions (800 to 808) are considered. These positions are the spatial positions depicted in fig. 6a with reference numerals A1, B1, B0, A0 and B2. In the next step, the availability of the spatial motion vectors is checked and at most five motion vectors are selected/obtained for consideration (810). A predictor is considered to be available if it exists and if the block is not intra coded. The motion vectors corresponding to the five blocks are therefore selected as candidates according to the following conditions:

if the "left" A1 motion vector (800) is available (810), i.e. if it exists and if the block is not intra coded, the motion vector of the "left" block is selected and used as the first candidate in the candidate list (814);

if the "up" B1 motion vector (802) is available (810), the candidate "up" block motion vector is compared to the "left" A1 motion vector (if present) (812). If the B1 motion vector is equal to the A1 motion vector, then B1 is not added to the list of spatial candidates (814). Conversely, if the B1 motion vector is not equal to the A1 motion vector, then B1 is added to the list of spatial candidates (814);

if the "top right" B0 motion vector (804) is available (810), the "top right" motion vector is compared to the B1 motion vector (812). If the B0 motion vector is equal to the B1 motion vector, then the B0 motion vector is not added to the list of spatial candidates (814). Conversely, if the B0 motion vector is not equal to the B1 motion vector, then the B0 motion vector is added to the list of spatial candidates (814);

if the "bottom left" A0 motion vector (806) is available (810), the "bottom left" motion vector is compared to the A1 motion vector (812). If the A0 motion vector is equal to the A1 motion vector, then the A0 motion vector is not added to the list of spatial candidates (814). Conversely, if the A0 motion vector is not equal to the A1 motion vector, then the A0 motion vector is added to the list of spatial candidates (814); and

if the list of spatial candidates does not contain four candidates, the availability of the "top left" B2 motion vector (808) is checked (810). If available, it is compared to the A1 motion vector and the B1 motion vector. If the B2 motion vector is equal to the A1 motion vector or the B1 motion vector, then the B2 motion vector is not added to the list of spatial candidates (814). Conversely, if the B2 motion vector is equal to neither the A1 motion vector nor the B1 motion vector, then the B2 motion vector is added to the list of spatial candidates (814).

At the end of this phase, the list of spatial candidates includes up to four candidates.
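By way of illustration only, the five availability checks and the pairwise comparisons described above can be sketched as follows. Each argument is either `None` (block absent or intra coded) or a value standing for the motion information of the block; the function name and the tuple representation are assumptions made for this sketch.

```python
def spatial_merge_candidates(a1, b1, b0, a0, b2):
    """Return the list of spatial merge candidates (at most four)."""
    candidates = []
    if a1 is not None:                      # "left" block (800)
        candidates.append(a1)
    if b1 is not None and b1 != a1:         # "up" block, compared to A1
        candidates.append(b1)
    if b0 is not None and b0 != b1:         # "top right" block, compared to B1
        candidates.append(b0)
    if a0 is not None and a0 != a1:         # "bottom left" block, compared to A1
        candidates.append(a0)
    # B2 is considered only if fewer than four candidates have been found.
    if len(candidates) < 4 and b2 is not None and b2 != a1 and b2 != b1:
        candidates.append(b2)
    return candidates


# B1 and A0 duplicate A1, so B2 is still examined and added.
pruned = spatial_merge_candidates((1, 0), (1, 0), (2, 0), (1, 0), (3, 0))
# Four distinct candidates are found first, so B2 is not considered.
full = spatial_merge_candidates((1, 0), (2, 0), (3, 0), (4, 0), (5, 0))
```

Each candidate is compared only with the one or two positions named in the conditions rather than with the whole list, which keeps the number of comparisons small.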

For the temporal candidate, two positions may be used: the bottom right position of the collocated block (816, denoted H in fig. 6a) and the center of the collocated block (818). These positions are depicted in fig. 6a.

As with the temporal motion predictor of the AMVP motion vector derivation process described in relation to fig. 7, the first step aims at checking the availability of the block at the H position (820). Next, if that block is unavailable, the availability of the block at the center position is checked (820). If at least one motion vector of these positions is available, the temporal motion vector may be scaled (822), if needed, to the reference frame having index 0, for both lists L0 and L1, to create a temporal candidate that is added to the list of merged motion vector predictor candidates (824). The temporal candidate is placed after the spatial candidates in the list. Lists L0 and L1 are 2 reference frame lists containing zero, one, or multiple reference frames.

If the number of candidates (Nb_Cand) is strictly less than the maximum number of candidates (Max_Cand, which is signaled in the bitstream slice header and is equal to five in the current HEVC design) (826), and if the current frame is of B type, combined candidates are generated (828). Combined candidates are generated based on the available candidates of the list of merged motion vector predictor candidates. This mainly consists in combining (pairing) the motion information of one candidate of list L0 with the motion information of one candidate of list L1.

If the number of candidates (Nb_Cand) remains strictly less than the maximum number of candidates (Max_Cand) (830), then zero motion candidates are generated (832) until the number of candidates of the merged motion vector predictor candidate list reaches the maximum number of candidates.

At the end of the process, a list or set of merged motion vector predictor candidates (i.e., a list or set of candidates for merge mode (classical merge mode and classical merge skip mode)) is constructed (834). As illustrated in fig. 8, the list or set of merged motion vector predictor candidates is constructed from a subset of spatial candidates (800 to 808) and from a subset of temporal candidates (816, 818) (834).
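By way of illustration only, the completion of the candidate list after the spatial stage can be sketched as follows. Each candidate is modeled as a hypothetical pair `(l0_motion, l1_motion)` in which either member may be `None`; reference indices are ignored, and the order in which pairs are combined is an assumption of this sketch rather than the order specified in the standard.

```python
from itertools import permutations


def complete_merge_list(spatial, temporal, max_cand=5, is_b_frame=False):
    """Append temporal, combined and zero candidates up to max_cand."""
    candidates = list(spatial)
    # The temporal candidate is placed after the spatial candidates (824).
    if temporal is not None and len(candidates) < max_cand:
        candidates.append(temporal)
    # Combined candidates (828): L0 motion of one candidate paired with the
    # L1 motion of another candidate, for B frames only.
    if is_b_frame:
        for first, second in permutations(list(candidates), 2):
            if len(candidates) >= max_cand:
                break
            combined = (first[0], second[1])
            if None not in combined and combined not in candidates:
                candidates.append(combined)
    # Zero motion candidates (832) fill the remaining positions.
    zero = ((0, 0), (0, 0)) if is_b_frame else ((0, 0), None)
    while len(candidates) < max_cand:
        candidates.append(zero)
    return candidates


b_list = complete_merge_list([((1, 0), None), (None, (2, 0))], None,
                             is_b_frame=True)
p_list = complete_merge_list([((1, 0), None)], ((3, 3), None))
```

In the B-frame example, one combined candidate is formed from the L0 motion of the first candidate and the L1 motion of the second, and two zero candidates complete the list of five.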

Alternative Temporal Motion Vector Prediction (ATMVP)

Alternative Temporal Motion Vector Prediction (ATMVP) is a special type of motion compensation. Instead of considering only one piece of motion information for the current block from the temporal reference frame, the motion information of each collocated sub-block is considered. Thus, as depicted in fig. 9, this temporal motion vector prediction yields a segmentation of the current block together with the associated motion information of each sub-block.

In the current VTM reference software, ATMVP is signaled as a merge candidate inserted into the merge candidate list, i.e., the list or set of candidates for the merge modes (classical merge mode and classical merge skip mode). When ATMVP is enabled at the SPS level, the maximum number of merge candidates is increased by one. Thus, 6 candidates are considered instead of the 5 candidates considered when the ATMVP mode is disabled.

In addition, when this prediction is enabled at the SPS level, all bins of the merge index (i.e., the identifier or index used to identify a candidate from the list of merge candidates) are context coded by CABAC. In HEVC, or when ATMVP is not enabled at the SPS level in JEM, only the first bin is context coded and the remaining bins are bypass coded (i.e., bypass CABAC coded). Fig. 10(a) illustrates the coding of the merge index for HEVC, or when ATMVP is not enabled at the SPS level in JEM. This corresponds to a unary max code (truncated unary code). The first bit is CABAC coded and the other bits are bypass CABAC coded.

Fig. 10(b) illustrates the coding of the merge index when ATMVP is enabled at the SPS level. Here, all bits (from the 1st to the 5th) are CABAC coded. It should be noted that each bit used to code the index has its own context; in other words, their probabilities are separate.

Affine mode

In HEVC, only a translational motion model is applied for Motion Compensated Prediction (MCP), whereas in the real world there are many kinds of motion, such as zooming in/out, rotation, perspective motion and other irregular motions.

In JEM, a simplified affine transform motion compensated prediction is applied. The general principles of the affine mode are described below based on an excerpt of document JVET-G1001, presented at the JVET meeting in Torino, 13-21 July 2017. This document is incorporated by reference herein in its entirety to the extent that it describes other algorithms used in JEM.

As shown in fig. 11(a), the affine motion field of a block is described by two control point motion vectors.

The Motion Vector Field (MVF) of a block is described by the following equation:

where (v0x, v0y) is the motion vector of the top left corner control point, (v1x, v1y) is the motion vector of the top right corner control point, and w is the width of the current block Cur.
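For reference, equation 1 is taken here to be the four-parameter affine model of JVET-G1001; this is an assumption, since the equation itself is not reproduced in the text above. With (x, y) denoting the position of a sample relative to the top left sample of the current block (symbols introduced here for illustration only), the model is commonly written as:

$$
\begin{cases}
v_x = \dfrac{v_{1x}-v_{0x}}{w}\,x - \dfrac{v_{1y}-v_{0y}}{w}\,y + v_{0x} \\[6pt]
v_y = \dfrac{v_{1y}-v_{0y}}{w}\,x + \dfrac{v_{1x}-v_{0x}}{w}\,y + v_{0y}
\end{cases}
$$

At the top left corner (x = 0, y = 0) this yields (v0x, v0y), and at the top right corner (x = w, y = 0) it yields (v1x, v1y), which is consistent with the two control points described above.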

To further simplify the motion compensated prediction, sub-block based affine transform prediction is applied. The sub-block size M × N is derived as in equation 2, where MvPre is the motion vector fractional precision (1/16 in JEM) and (v2x, v2y) is the motion vector of the top left control point, calculated according to equation 1.

After being derived by equation 2, M and N may be adjusted downward, if necessary, so that they are divisors of w and h, respectively, where h is the height of the current block Cur.

To derive the motion vector for each M × N sub-block, the motion vector of the center sample of each sub-block as shown in fig. 6a is calculated according to equation 1 and rounded to 1/16 fractional precision. A motion compensated interpolation filter is then applied to generate a prediction for each sub-block having a derived motion vector.
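The per-sub-block derivation can be sketched as follows. This is a minimal illustration only: it assumes the four-parameter affine model of JVET-G1001 for equation 1, takes the centre of a sub-block as its top left position offset by half the sub-block size, and uses integer control point vectors expressed in whole samples. The function name `affine_subblock_mvs` and its parameters are hypothetical.

```python
from fractions import Fraction


def affine_subblock_mvs(v0, v1, w, h, m=4, n=4, precision=16):
    """Map each sub-block's top left (x, y) to its MV in 1/precision units.

    v0, v1: integer control point MVs (top left, top right) in whole samples.
    Assumes the four-parameter affine model; m and n must divide w and h.
    """
    if w % m or h % n:
        raise ValueError("m and n must be divisors of w and h")
    a = Fraction(v1[0] - v0[0], w)  # (v1x - v0x) / w
    b = Fraction(v1[1] - v0[1], w)  # (v1y - v0y) / w
    mvs = {}
    for y0 in range(0, h, n):
        for x0 in range(0, w, m):
            # Centre of the sub-block (assumed: offset by half its size).
            cx = x0 + Fraction(m, 2)
            cy = y0 + Fraction(n, 2)
            vx = a * cx - b * cy + v0[0]
            vy = b * cx + a * cy + v0[1]
            # Round to the nearest 1/precision; Python rounds exact ties
            # to even, which is a choice of this sketch only.
            mvs[(x0, y0)] = (round(vx * precision), round(vy * precision))
    return mvs
```

With identical control point vectors the field is purely translational, so every sub-block receives the same vector; differing control points produce a vector that varies from sub-block to sub-block.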

Affine mode is a motion compensation mode like inter mode (AMVP, "classical" merge skip). The principle is to generate one motion information per pixel from 2 or 3 neighboring motion information. In JEM, as depicted in fig. 11(a)/(b), the affine mode derives one motion information for each 4 × 4 block (each square is a 4 × 4 block, and the entire block in fig. 11(a)/(b) is a 16 × 16 block divided into 16 such 4 × 4 squares, each of which has a motion vector associated with it). It should be understood that, in embodiments of the present invention, the affine mode may derive one motion information for blocks of a different size or shape, provided that the one motion information can be derived. The affine mode is enabled by means of a flag and is available for the AMVP mode and for the merge modes, i.e., the classical merge mode (also referred to as "non-affine merge mode") and the classical merge skip mode (also referred to as "non-affine merge skip mode"). The flag is CABAC encoded. In an embodiment, the context depends on the sum of the affine flags of the left block (position A2 of fig. 6b) and the upper left block (position B3 of fig. 6b).

Thus, for the affine flag, three context variable values (0, 1 or 2) are possible in JEM, given by the following formula:

Ctx=IsAffine(A2)+IsAffine(B3)

where IsAffine(block) is a function that returns 0 if the block is not an affine block and 1 if the block is an affine block.
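The formula can be illustrated with a short sketch. The representation of a block as a dictionary with an "affine" entry, and the treatment of a missing neighbor (None) as non-affine, are assumptions made for illustration only.

```python
def is_affine(block):
    """Return 1 if the block is an affine block, otherwise 0.

    A missing neighbour (None) is treated as non-affine in this sketch.
    """
    return 1 if block is not None and block.get("affine", False) else 0


def affine_flag_ctx(block_a2, block_b3):
    """Ctx = IsAffine(A2) + IsAffine(B3), giving a value of 0, 1 or 2."""
    return is_affine(block_a2) + is_affine(block_b3)
```

For instance, an affine left block together with a non-affine or missing upper left block selects context 1.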

Affine merge candidate derivation

In JEM, the affine merge mode (or affine merge skip mode) derives the motion information for the current block from the first neighboring block that is affine (i.e., the first neighboring block encoded using affine mode) among the blocks at positions A1, B1, B0, A0 and B2. These positions are depicted in figs. 6a and 6b. However, how the affine parameters are derived is not fully defined, and the present invention aims to improve at least this aspect, for example by defining the affine parameters of the affine merge mode so as to enable a wider choice of affine merge candidates (i.e., not only the first affine neighboring block but also at least one other candidate is available for selection, identified by an identifier such as an index).

For example, according to some embodiments of the present invention, an affine merge mode having its own list of affine merge candidates (candidates for deriving/obtaining motion information of an affine mode) and an affine merge index (for identifying one affine merge candidate from the list of affine merge candidates) is used for encoding or decoding a block.

Signaling affine merging

Fig. 12 is a flow diagram of a partial decoding process of some syntax elements related to an encoding mode for signaling the use of an affine merge mode. In the figure, a skip flag (1201), a prediction mode (1211), a merge flag (1203), a merge index (1208), and an affine flag (1206) may be decoded.

The skip flag is decoded for all CUs in an inter slice (1201). If the CU is not skipped (1202), the pred mode (prediction mode) is decoded (1211). This syntax element indicates whether the current CU is encoded (and is to be decoded) in inter mode or intra mode. Note that if the CU is skipped (1202), its current mode is inter mode. If the CU is not skipped (1202: No), the CU is encoded in AMVP or merge mode. If the CU is in inter mode (1212), the merge flag is decoded (1203). If the CU is a merge CU (1204) or a skipped CU (1202: Yes), it is checked (1205) whether the affine flag (1206) needs to be decoded, i.e., it is determined at (1205) whether the current CU may have been encoded in affine mode. The flag is decoded if the current CU is a 2N×2N CU, which means that the height and width of the CU are equal in the current VVC. Furthermore, at least one neighboring CU A1, B1, B0, A0 or B2 must be encoded in an affine mode (affine merge mode or AMVP mode with affine mode enabled). Finally, the current CU must not be a 4 × 4 CU, although 4 × 4 CUs are disabled by default in the VTM reference software. If the condition (1205) is false, it is determined that the current CU is encoded in the classical merge mode (or classical merge skip mode) as specified in HEVC, and the merge index is decoded (1208). If the affine flag (1206) is equal to 1 (1207), the CU is a merge affine CU (i.e., a CU encoded in affine merge mode) or a merge skip affine CU (i.e., a CU encoded in affine merge skip mode), and the merge index (1208) does not need to be decoded (because the affine merge mode is used, i.e., the CU is decoded using the affine mode of the first neighboring block that is affine). Otherwise, the current CU is a classical (basic) merge or merge skip CU (i.e., a CU encoded in the classical merge or merge skip mode), and the merge index is decoded (1208).
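The parsing order of fig. 12 can be summarized in a short sketch. The `read` callable, the syntax element names passed to it and the returned dictionary are hypothetical stand-ins for an entropy decoder and its output; only the branching order follows the description above.

```python
def parse_cu_mode(read, width, height, any_affine_neighbour):
    """Parse the mode-related syntax elements of one CU in an inter slice.

    read(name) is assumed to return the decoded value of the named
    syntax element; any_affine_neighbour tells whether at least one of
    A1, B1, B0, A0, B2 is coded in an affine mode.
    """
    parsed = {"skip": read("skip_flag")}                  # 1201
    if not parsed["skip"]:                                # 1202: No
        parsed["pred_mode"] = read("pred_mode")           # 1211
        if parsed["pred_mode"] != "inter":                # 1212
            return parsed                                 # intra CU
        parsed["merge"] = read("merge_flag")              # 1203
        if not parsed["merge"]:                           # 1204
            return parsed                                 # AMVP CU
    # Merge or skip CU: condition 1205 decides whether the affine flag
    # is present in the bitstream.
    affine_flag_present = (
        width == height
        and (width, height) != (4, 4)
        and any_affine_neighbour
    )
    parsed["affine"] = read("affine_flag") if affine_flag_present else 0  # 1206
    if not parsed["affine"]:                              # 1207
        parsed["merge_index"] = read("merge_index")       # 1208
    return parsed
```

In this sketch an affine merge CU never reads a merge index, which matches the behavior described for fig. 12, in which the first affine neighboring block is used implicitly.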

In this specification, "signaling" may refer to inserting (providing/including) one or more syntax elements representing an enabled or disabled mode or other information into or extracting/obtaining from a bitstream.

Merge candidate derivation

Fig. 13 is a flowchart illustrating the derivation of merge candidates (i.e., candidates for the classical merge mode or the classical merge skip mode). This derivation is built on the motion vector derivation process for merge mode shown in fig. 8 (i.e., the merge candidate list derivation of HEVC). The main changes compared to HEVC are the addition of the ATMVP candidate (1319, 1321, 1323), a full redundancy check of the candidates (1325) and a new order of the candidates. The ATMVP prediction is set as a special candidate because it represents several pieces of motion information of the current CU. The value of its first sub-block (top left) is compared with the temporal candidate and, if they are equal, the temporal candidate is not added to the merge list (1320). The ATMVP candidate is not compared with the other spatial candidates. By contrast, the temporal candidate is compared (1325) with each spatial candidate already in the list and is not added to the merge candidate list if it is redundant.

When a spatial candidate is added to the list, it is compared to other spatial candidates in the list (1312), which is not the case in the final version of HEVC.

In the current VTM version, the list of merging candidates is set in the following order, since it has been determined to provide the best results under encoding test conditions:

·A1

·B1

·B0

·A0

·ATMVP

·B2

·Temporal

·Combined

·Zero_MV

It is important to note that spatial candidate B2 is placed after the ATMVP candidate.

In addition, when ATMVP is enabled at the slice level, the maximum number of candidates in the list is 6, instead of 5 as in HEVC.

An exemplary embodiment of the present invention will now be described with reference to fig. 12 to 16 and fig. 22 to 24. It should be noted that the embodiments may be combined unless explicitly stated otherwise; for example, certain combinations of embodiments may improve coding efficiency with increased complexity, but this may be acceptable in certain use cases.

First embodiment

As described above, in the current VTM reference software, ATMVP is signaled as a merge candidate inserted into the list of merge candidates. ATMVP may be enabled or disabled for the entire sequence (at the SPS level). When ATMVP is disabled, the maximum number of merge candidates is 5. When ATMVP is enabled, the maximum number of merge candidates increases from 5 by 1 to 6.

In the encoder, a list of merge candidates is generated using the method of fig. 13. One merge candidate is selected from the merge candidate list, e.g., based on a rate-distortion criterion. The selected merge candidate is signaled to the decoder in the bitstream using a syntax element called the merge index.

In the current VTM reference software, the way in which the merge index is encoded differs depending on whether ATMVP is enabled or disabled.

Fig. 10(a) illustrates the coding of the merge index when ATMVP is not enabled at the SPS level. The 5 merge candidates Cand 0, Cand 1, Cand 2, Cand 3 and Cand 4 are encoded as 0, 10, 110, 1110 and 1111, respectively. This corresponds to unary max coding. In addition, the first bit is encoded by CABAC using a single context and the other bits are bypass coded.

Fig. 10(b) illustrates the coding of the merge index when ATMVP is enabled. The 6 merge candidates Cand 0, Cand 1, Cand 2, Cand 3, Cand 4 and Cand 5 are encoded as 0, 10, 110, 1110, 11110 and 11111, respectively. In this case, all bits of the merge index (from the 1st to the 5th bit) are context coded by CABAC. Each bit has its own context, and there are separate probability models for the different bits.
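The unary max binarization of figs. 10(a) and 10(b) can be reproduced with a minimal sketch. The function name `merge_index_bins` is hypothetical, and the bins are returned as a character string purely for readability.

```python
def merge_index_bins(index, max_cand):
    """Return the unary max code of a merge index as a string of bins.

    Each candidate before the selected one contributes a '1'; a
    terminating '0' is appended unless the last candidate is selected.
    """
    if not 0 <= index < max_cand:
        raise ValueError("merge index out of range")
    bins = "1" * index
    if index < max_cand - 1:
        bins += "0"
    return bins
```

With 5 candidates the longest code has 4 bins, and with 6 candidates it has 5 bins, which is why the ATMVP-enabled case involves bits 1 to 5.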

In the first embodiment of the present invention, as shown in fig. 14, when ATMVP is included as a merge candidate in the merge candidate list (e.g., when ATMVP is enabled at the SPS level), the coding of the merge index is modified so that only the first bit of the merge index is coded by CABAC using a single context. This context is set in the same manner as in the current VTM reference software when ATMVP is not enabled at the SPS level. The other bits (from the 2nd to the 5th bit) are bypass coded. When ATMVP is not included as a merge candidate in the merge candidate list (e.g., when ATMVP is disabled at the SPS level), there are 5 merge candidates. Again, only the first bit of the merge index is coded by CABAC using a single context, set in the same manner as in the current VTM reference software when ATMVP is not enabled at the SPS level, and the other bits (from the 2nd to the 4th bit) are bypass coded.

The decoder generates the same merge candidate list as the encoder. This can be done using the method of fig. 13. When ATMVP is not included as a merge candidate in the merge candidate list (e.g., when ATMVP is disabled at the SPS level), there are 5 merge candidates. Only the first bit of the merge index is decoded by CABAC using a single context. The other bits (from the 2nd to the 4th bit) are bypass decoded. In contrast to the current reference software, when ATMVP is included as a merge candidate in the merge candidate list (e.g., when ATMVP is enabled at the SPS level), only the first bit of the merge index is decoded by CABAC using a single context. The other bits (from the 2nd to the 5th bit) are bypass decoded. The decoded merge index is used to identify, in the merge candidate list, the merge candidate selected by the encoder.
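The decoding side of the first embodiment can be sketched as follows. The two callables stand in for the regular (context coded) and bypass decoding engines of a CABAC decoder; they, and the function name, are hypothetical and are not taken from any reference software.

```python
def decode_merge_index(decode_context_bin, decode_bypass_bin, max_cand):
    """Decode a unary max coded merge index.

    decode_context_bin(ctx) decodes one bin with context ctx, and
    decode_bypass_bin() decodes one bypass bin. Only the first bin uses
    a context (context 0); every later bin is bypass decoded.
    """
    index = 0
    while index < max_cand - 1:
        if index == 0:
            bin_value = decode_context_bin(0)
        else:
            bin_value = decode_bypass_bin()
        if bin_value == 0:
            break
        index += 1
    return index
```

The same routine serves both the 5-candidate and the 6-candidate case; only `max_cand` changes, and the number of context coded bins stays at one.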

An advantage of this embodiment over VTM2.0 reference software is that the complexity of the merge index decoding and decoder design (and encoder design) is reduced without affecting the coding efficiency. Indeed, with this embodiment, only 1 CABAC state is needed for the merge index instead of 5 for the current VTM merge index encoding/decoding. Furthermore, the worst case complexity is reduced because the other bits are CABAC bypass encoded, which reduces the number of operations compared to encoding all bits with CABAC.

Second embodiment

In a second embodiment, all bits of the merge index are CABAC encoded, but they all share the same context. There may be a single context as in the first embodiment, in which case that single context is shared between the bits. Thus, when ATMVP is included as a merge candidate in the merge candidate list (e.g., when ATMVP is enabled at the SPS level), only one context is used, compared to 5 in the VTM2.0 reference software. An advantage of this embodiment over the VTM2.0 reference software is that the complexity of the merge index decoding and of the decoder design (and encoder design) is reduced without affecting the coding efficiency.

Alternatively, as described below in connection with the third to fifteenth embodiments, a context variable may be shared among bits so that two or more contexts are available, but the current context is shared by the bits.

When ATMVP is disabled, the same context is still used for all bits.

This and all subsequent embodiments may be applied even if ATMVP is not an available mode or is disabled.

In a variation of the second embodiment, any two or more bits of the merge index are CABAC encoded and share the same context. The other bits of the merge index are bypass coded. For example, the first N bits of the merge index may be CABAC encoded, where N is two or more.
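The context assignments discussed so far can be compared side by side with a small sketch. The scheme names and the representation (one entry per bin, with None marking a bypass coded bin) are illustrative assumptions.

```python
def bin_contexts(num_bins, scheme, n_shared=2):
    """Return one entry per bin: a context id, or None for a bypass bin."""
    if scheme == "vtm_atmvp":         # every bin has its own context
        return list(range(num_bins))
    if scheme == "first":             # first bin context coded, rest bypass
        return [0] + [None] * (num_bins - 1)
    if scheme == "second":            # all bins share a single context
        return [0] * num_bins
    if scheme == "second_variation":  # first n_shared bins share a context
        return [0] * n_shared + [None] * (num_bins - n_shared)
    raise ValueError(f"unknown scheme: {scheme}")


def count_contexts(assignment):
    """Number of distinct contexts (CABAC states) used by an assignment."""
    return len({ctx for ctx in assignment if ctx is not None})
```

For the 5 bins of a 6-candidate merge index, the VTM2.0 assignment uses 5 contexts, whereas each of the other three assignments uses a single context; they differ only in how many bins are context coded.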

Third embodiment

In the first embodiment, the first bit of the merge index is CABAC encoded using a single context.

In a third embodiment, the context variable of the bit of the merge index depends on the value of the merge index of the neighboring block. This allows more than one context for the target bit, where each context corresponds to a different value of the context variable.

The neighboring block may be any block that has already been decoded, so that its merge index is available to the decoder when the current block is decoded. For example, the neighboring block may be any one of the blocks A0, A1, A2, B0, B1, B2 and B3 shown in fig. 6b.

In a first variant, only the first bit is CABAC encoded using the context variable.

In a second variation, the first N bits of the merge index (where N is two or more) are CABAC encoded and the context variable is shared between these N bits.

In a third variation, any N bits of the merge index (where N is two or more) are CABAC encoded and the context variable is shared between these N bits.

In a fourth variation, the first N bits of the merge index (where N is two or more) are CABAC encoded and N context variables are used for these N bits. Assuming that each context variable has K values, K × N CABAC states are used. For example, in the present embodiment, with one neighboring block, the context variable may conveniently have 2 values, e.g., 0 and 1. In other words, 2N (i.e., 2 × N) CABAC states are used.

In a fifth variation, any N bits of the merge index (where N is two or more) are CABAC encoded, and N context variables are used for these N bits.

The same variants are applicable to the fourth to sixteenth embodiments described below.

Fourth embodiment

In a fourth embodiment, the context variable for a bit of the merge index depends on the respective values of the merge indices of two or more neighboring blocks. For example, the first neighboring block may be a left block A0, A1 or A2, and the second neighboring block may be an upper block B0, B1, B2 or B3. The manner of combining the two or more merge index values is not particularly limited. Examples are given below.

When there are two neighboring blocks, the context variable may conveniently have 3 different values, for example 0, 1 and 2. If the fourth variation described in connection with the third embodiment is applied to this embodiment with 3 different values, K is 3 instead of 2. In other words, 3N (i.e., 3 × N) CABAC states are used.

Fifth embodiment

In the fifth embodiment, the context variable for a bit of the merge index depends on the respective values of the merge indices of the neighboring blocks A2 and B3.

Sixth embodiment

In the sixth embodiment, the context variable for a bit of the merge index depends on the respective values of the merge indices of the neighboring blocks A1 and B1. The advantage of this variant is its alignment with the merge candidate derivation. As a result, in some decoder and encoder implementations, a reduction in memory accesses may be achieved.

Seventh embodiment

In the seventh embodiment, the context variable for the bit at bit position idx_num in the merge index of the current block is obtained according to the following formula:

ctxIdx=(Merge_index_left==idx_num)+(Merge_index_up==idx_num)

where Merge_index_left is the merge index of the left block, Merge_index_up is the merge index of the upper block, and the symbol == denotes equality.

For example, when there are 6 merge candidates, 0 <= idx_num <= 5.

The left block may be block A1 and the upper block may be block B1 (as in the sixth embodiment). Alternatively, the left block may be block A2 and the upper block may be block B3 (as in the fifth embodiment).

If the merge index of the left block is equal to idx_num, the expression (Merge_index_left == idx_num) is equal to 1. The following table gives the results of this expression (Merge_index_left == idx_num):

Of course, the table for the expression (Merge_index_up == idx_num) is the same.

The following table gives the unary maximum code for each merge index value and the relative bit positions of the bits. This table corresponds to fig. 10 (b).

If the left block is not a merge block or an affine merge block (i.e., encoded using affine merge mode), the left block is considered unavailable. The same conditions apply for the upper block.

For example, when only the first bit is CABAC encoded, the context variable ctxIdx is set to:

equal to 0 if neither the left block nor the upper block has a merge index, or if the left block merge index is not the first index (i.e., not 0) and the upper block merge index is not the first index (i.e., not 0);

equal to 1 if one of the left block and the upper block has a merge index equal to the first index and the other block does not; and

equal to 2 if, for each of the left block and the upper block, the merge index is equal to the first index.

More generally, for a target bit at position idx _ num that is CABAC encoded, the context variable ctxIdx is set to:

equal to 0 if neither the left block nor the upper block has a merge index, or if the left block merge index is not the ith index (where i = idx_num) and the upper block merge index is not the ith index;

equal to 1 if one of the left block and the upper block has a merge index equal to the ith index and the other block does not; and

equal to 2 if, for each of the left block and the upper block, the merge index is equal to the ith index. Here, the ith index denotes the first index when i is 0, the second index when i is 1, and so on.
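The rule of the seventh embodiment can be written as a short sketch. Representing an unavailable neighbor (one that is not a merge block or an affine merge block) by None is an assumption made for illustration; the function name is hypothetical.

```python
def ctx_idx_equal(merge_index_left, merge_index_up, idx_num):
    """ctxIdx = (Merge_index_left == idx_num) + (Merge_index_up == idx_num).

    A neighbour passed as None is unavailable and contributes 0.
    """
    left = merge_index_left is not None and merge_index_left == idx_num
    up = merge_index_up is not None and merge_index_up == idx_num
    return int(left) + int(up)
```

For the first bit (idx_num equal to 0), two neighbors that both selected the first candidate give context 2, and a single such neighbor gives context 1.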

Eighth embodiment

In the eighth embodiment, the context variable for the bit at bit position idx_num in the merge index of the current block is obtained according to the following formula:

ctxIdx = (Merge_index_left > idx_num) + (Merge_index_up > idx_num), where Merge_index_left is the merge index of the left block, Merge_index_up is the merge index of the upper block, and the symbol > means "greater than".

For example, when there are 6 merge candidates, 0 <= idx_num <= 5.

The left block may be block A1 and the upper block may be block B1 (as in the sixth embodiment). Alternatively, the left block may be block A2 and the upper block may be block B3 (as in the fifth embodiment).

If the merge index of the left block is greater than idx_num, the expression (Merge_index_left > idx_num) is equal to 1. If the left block is not a merge block or an affine merge block (i.e., encoded using affine merge mode), the left block is considered unavailable. The same conditions apply for the upper block.

The following table gives the results of this expression (Merge_index_left > idx_num):

For example, when only the first bit is CABAC encoded, the context variable ctxIdx is set to:

equal to 0 if neither the left block nor the upper block has a merge index, or if the left block merge index is less than or equal to the first index (i.e., equal to 0) and the upper block merge index is less than or equal to the first index (i.e., equal to 0);

equal to 1 if one of the left block and the upper block has a merge index greater than the first index and the other block does not; and

if the merge index is greater than the first index for each of the left and top blocks, then it equals 2.

More generally, for a target bit at position idx _ num that is CABAC encoded, the context variable ctxIdx is set to:

equal to 0 if neither the left block nor the upper block has a merge index, or if the left block merge index is less than or equal to the ith index (where i = idx_num) and the upper block merge index is less than or equal to the ith index;

equal to 1 if one of the left block and the upper block has a merge index greater than the ith index and the other block does not; and

if the merge index is greater than the ith index for each of the left and top blocks, then it equals 2.
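The "greater than" rule of the eighth embodiment can be sketched in the same way, here evaluated for every bit position at once. As before, None stands for an unavailable neighbor, and the function names are hypothetical.

```python
def ctx_idx_greater(merge_index_left, merge_index_up, idx_num):
    """ctxIdx = (Merge_index_left > idx_num) + (Merge_index_up > idx_num).

    A neighbour passed as None is unavailable and contributes 0.
    """
    left = merge_index_left is not None and merge_index_left > idx_num
    up = merge_index_up is not None and merge_index_up > idx_num
    return int(left) + int(up)


def ctx_per_bit(merge_index_left, merge_index_up, num_bins):
    """Context variable for each bit position 0 .. num_bins - 1."""
    return [ctx_idx_greater(merge_index_left, merge_index_up, i)
            for i in range(num_bins)]
```

With a left merge index of 3 and an upper merge index of 1, the context variable for the five bit positions is 2, 1, 1, 0, 0: it decreases as the bit position moves past the neighbors' indices.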

The eighth embodiment provides a further increase in coding efficiency compared to the seventh embodiment.

Ninth embodiment

In the fourth to eighth embodiments, the context variables for the bits of the merge index of the current block depend on the respective values of the merge indices of two or more neighboring blocks.

In a ninth embodiment, the context variable for a bit of the merge index of the current block depends on the respective merge flags of two or more neighboring blocks. For example, the first neighboring block may be a left block A0, A1 or A2, and the second neighboring block may be an upper block B0, B1, B2 or B3.

The merge flag is set to 1 when a block is encoded using the merge mode, and is set to 0 when another mode such as the skip mode or the affine merge mode is used. Note that in VTM2.0, affine merge is a distinct mode from the basic or "classical" merge mode. The affine merge mode can be signaled using a dedicated affine flag. Alternatively, the list of merge candidates may comprise an affine merge candidate, in which case the affine merge mode may be selected and signaled using the merge index.

The context variable is then set to:

0 if neither the left neighboring block nor the upper neighboring block has its merge flag set to 1;

1 if one of the left neighboring block and the upper neighboring block has its merge flag set to 1 and the other does not; and

2 if the left neighboring block and the upper neighboring block each have their merge flag set to 1.

This simple measure achieves an improvement in coding efficiency compared to VTM2.0. Another advantage compared to the seventh and eighth embodiments is the lower complexity, since only the merge flags of the neighboring blocks need to be checked, rather than their merge indices.

In a variant, the context variable for the bits of the merge index of the current block depends on the merge flag of a single neighboring block.
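Both the two-neighbor rule and the single-neighbor variant reduce to counting merge flags, as the following sketch illustrates. The function name and the variadic interface are illustrative assumptions.

```python
def ctx_from_merge_flags(*neighbour_merge_flags):
    """Count the neighbouring blocks whose merge flag is set to 1.

    Pass two flags (left, upper) for the main rule, giving 0, 1 or 2,
    or a single flag for the variant, giving 0 or 1.
    """
    return sum(1 for flag in neighbour_merge_flags if flag == 1)
```

With two neighbors there are 3 possible values, and with a single neighbor only 2, which correspondingly reduces the number of CABAC states.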

Tenth embodiment

In the third to ninth embodiments, the context variable for the bit of the merge index of the current block depends on the merge index value or the merge flag of one or more neighboring blocks.

In the tenth embodiment, the context variable for the bits of the merge index of the current block depends on the value of the skip flag of the current block (current coding unit or CU). The skip flag is equal to 1 when the current block uses the merge skip mode, and is equal to 0 otherwise.

The skip flag is a first example of another variable or syntax element that has been decoded or parsed for the current block. The other variable or syntax element is preferably an indicator of the complexity of the motion information in the current block. Since the occurrence of a merge index value depends on the complexity of motion information, a variable or syntax element such as a skip flag is typically associated with the merge index value.

More specifically, the merge skip mode is typically selected for static scenes or scenes involving constant motion. As a result, the merge index value is typically lower for the merge skip mode than for the classical merge mode, which is used to encode an inter prediction that includes a block residual and which typically occurs for more complex motion. However, the choice between these modes is also often related to the quantization and/or the RD criterion.

This simple measure provides an increase in coding efficiency compared to VTM2.0. It is also very simple to implement, since it involves neither neighboring blocks nor checking merge index values.

In a first variant, the context variable for the bits of the merge index of the current block is simply set equal to the skip flag of the current block. This bit may be the first bit only. The other bits are bypass-coded as in the first embodiment.

In a second variant, all bits of the merge index are CABAC encoded and each has its own context variable dependent on the skip flag. When there are 5 CABAC coded bits in the merge index (corresponding to 6 merge candidates), this requires 10 probability states.

In a third variant, to limit the number of states, only N bits of the merge index are CABAC encoded, where N is two or more, e.g., the first N bits. This requires 2N states. For example, when the first 2 bits are CABAC encoded, 4 states are required.

In general, instead of the skip flag, any other variable or syntax element that has already been decoded or parsed for the current block and that is an indicator of the complexity of the motion information in the current block may be used.
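The state counts of the second and third variants can be checked with a short sketch. The numbering of the contexts (two per CABAC coded bit, selected by the skip flag) is an illustrative assumption, not a layout taken from any reference software.

```python
def skip_dependent_context(bin_pos, skip_flag, num_ctx_bins):
    """Context id for a bit of the merge index, or None if bypass coded.

    Each of the first num_ctx_bins bits has two contexts, one per
    value of the skip flag of the current block.
    """
    if bin_pos >= num_ctx_bins:
        return None
    return 2 * bin_pos + skip_flag


def count_states(num_bins, num_ctx_bins):
    """Number of distinct contexts over all bit positions and flag values."""
    contexts = {skip_dependent_context(b, s, num_ctx_bins)
                for b in range(num_bins) for s in (0, 1)}
    contexts.discard(None)
    return len(contexts)
```

For a 5-bit merge index, coding all bits in this way gives 10 states, and restricting the scheme to the first 2 bits gives 4 states, in line with the figures stated above.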

Eleventh embodiment

The eleventh embodiment relates to affine merge signaling as previously described with reference to fig. 11(a), 11(b), and 12.

In the eleventh embodiment, the context variable of the CABAC-encoded bit for the merge index of the current block (current CU) depends on the affine merge candidate (if any) in the merge candidate list. This bit may be the first bit of the merge index only, or the first N bits, where N is two or more, or any N bits. The other bits are bypass coded.

Affine prediction is designed to compensate for complex motion. Thus, for complex motions, the merge index typically has a higher value than for less complex motions. If the first affine merge candidate is located lower in the list, or if no affine merge candidate exists at all, the merge index of the current CU may have a small value.

Thus, effectively, the context variable depends on the presence and/or location of the at least one affine merging candidate in the list.

For example, the context variable may be set to:

equal to 1 if A1 is affine;

equal to 2 if B1 is affine;

equal to 3 if B0 is affine;

equal to 4 if A0 is affine;

equal to 5 if B2 is affine; and

equal to 0 if no neighboring block is affine.
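One way to read this list is sketched below. The text does not state which value applies when several neighboring blocks are affine; the sketch assumes that the first affine block in the order A1, B1, B0, A0, B2 decides, by analogy with the affine merge candidate derivation. The function name and the dictionary input are hypothetical.

```python
CHECK_ORDER = ("A1", "B1", "B0", "A0", "B2")


def affine_position_ctx(affine_flags):
    """Context variable from the first affine neighbour in CHECK_ORDER.

    affine_flags maps a position name to True if that neighbouring
    block is affine; missing positions count as non-affine.
    """
    for value, name in enumerate(CHECK_ORDER, start=1):
        if affine_flags.get(name, False):
            return value
    return 0  # no neighbouring block is affine
```

Under this reading the context variable takes one of 6 values, so 6 CABAC states are needed for each bit that uses it.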

The affine flags of the merge candidates at these positions have been checked when decoding or parsing the merge index of the current block. As a result, no further memory access is required to derive the context for the merge index of the current block.

This embodiment provides an increase in coding efficiency compared to VTM2.0. No additional memory access is needed, because step 1205 already involves checking the affine mode of the neighboring CUs.

In a first variant, to limit the number of states, the context variable may be set to:

equal to 0 if no neighboring block is affine, or if A1 or B1 is affine; and

equal to 1 if B0, A0 or B2 is affine.

In a second variant, to limit the number of states, the context variable may be set to:

equal to 0 if no neighboring block is affine;

equal to 1 if A1 or B1 is affine; and

equal to 2 if B0, A0 or B2 is affine.

In a third variant, the context variable may be set to:

equal to 1 if A1 is affine;

equal to 2 if B1 is affine;

equal to 3 if B0 is affine;

equal to 4 if A0 or B2 is affine; and

equal to 0 if no neighboring block is affine.
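The three variants differ only in how the neighboring positions are grouped, which a table-driven sketch makes explicit. As an assumption not stated in the text, the first affine block in the order A1, B1, B0, A0, B2 is taken to decide when several neighbors are affine. All names are hypothetical.

```python
CHECK_ORDER = ("A1", "B1", "B0", "A0", "B2")

# Context value assigned to each position by each variant.
VARIANT_TABLES = {
    "first":  {"A1": 0, "B1": 0, "B0": 1, "A0": 1, "B2": 1},
    "second": {"A1": 1, "B1": 1, "B0": 2, "A0": 2, "B2": 2},
    "third":  {"A1": 1, "B1": 2, "B0": 3, "A0": 4, "B2": 4},
}


def variant_ctx(affine_positions, variant):
    """Context variable for a set of affine neighbour positions."""
    table = VARIANT_TABLES[variant]
    for name in CHECK_ORDER:
        if name in affine_positions:
            return table[name]
    return 0  # no neighbouring block is affine


def num_values(variant):
    """Number of distinct context values (0 is always possible)."""
    return len(set(VARIANT_TABLES[variant].values()) | {0})
```

The first, second and third variants therefore use 2, 3 and 5 context values respectively, compared with 6 when each position has its own value.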

Note that these positions have already been checked when the merge index is decoded or parsed, since the decoding of the affine flag depends on these positions. As a result, no additional memory access is required to derive the context of the merge index, which is coded after the affine flag.

Twelfth embodiment

In a twelfth embodiment, signaling the affine mode comprises inserting the affine mode as a candidate motion predictor.

In one example of the twelfth embodiment, affine merging (and affine merging skip) is signaled as a merging candidate (i.e., as one of the merging candidates used with either the classical merging mode or the classical merging skip mode). In this case, blocks 1205, 1206, and 1207 of FIG. 12 are removed. In addition, in order not to affect the coding efficiency of the merge mode, the maximum possible number of merge candidates is incremented. For example, in the current version of the VTM, the value is set equal to 6, so if this embodiment is applied to the current version of the VTM, the value will be 7.

An advantage is a simplified design of syntax elements for the merge mode, since fewer syntax elements need to be decoded. In some cases, coding efficiency improvements/changes may be observed.

Two possible ways of implementing this example will now be described:

the merge index of an affine merge candidate always has the same position within the list, regardless of the values of the other merge MVs. The position of a candidate motion predictor indicates the likelihood that it is selected and, therefore, if it is placed higher (lower index value) in the list, it is more likely to select that motion vector predictor.

In a first example, the merge index of an affine merge candidate always has the same position within the merge candidate list. This means that it has a fixed "merged idx" value. For example, the value may be set equal to 5, since the affine merge mode should represent a complex motion that is not the most likely content. An additional advantage of this embodiment is that the current block can be set as an affine block when parsing the current block (decoding/reading only syntax elements and not decoding the data itself). As a result, this value may be used to determine the CABAC context for the affine flag of AMVP. Therefore, the conditional probability should be improved for this affine flag and the coding efficiency should be better.

In a second example, affine merge candidates are derived along with other merge candidates. In this example, the new affine merge candidate is added to the merge candidate list (for either classical merge mode or classical merge skip mode). Fig. 16 illustrates this example. In contrast to fig. 13, the affine merge candidate is the first affine neighboring block from a1, B1, B0, a0, and B2 (1917). If the same condition as 1205 of fig. 12 is valid (1927), a motion vector field generated using affine parameters is generated to obtain an affine merge candidate (1929). The list of initial merge candidates may have 4, 5, 6, or 7 candidates depending on ATMVP, time, and the use of affine merge candidates.

The order of all these candidates is important because the more likely candidates should be processed first, so that they are more likely to be retained as motion vector candidates. The preferred ordering is as follows:

A1

B1

B0

A0

Affine merge

ATMVP

B2

Temporal

Combined

Zero_MV

It is important to note that the affine merge candidate is located before the ATMVP candidate but after the four primary neighboring blocks. The advantage of placing the affine merge candidate before the ATMVP candidate is that the coding efficiency is increased compared to placing the affine merge candidate after the ATMVP and temporal predictor candidates. This increase in coding efficiency depends on the GOP (group of pictures) structure and the Quantization Parameter (QP) setting of each picture in the GOP. But this order gives an increase in coding efficiency for the most common GOP and QP settings.

Another advantage of this solution is the compact design of the classical merge and classical merge skip modes (i.e., merge modes with additional candidates such as ATMVP or affine merge candidates) for both the syntax and the derivation process. Further, the merge index of an affine merge candidate may change depending on the availability or value (redundancy check) of the preceding candidates in the merge candidate list. Thus, efficient signaling can be obtained.
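The preferred ordering and the effect of availability and redundancy checks on the affine candidate's merge index can be illustrated with a minimal sketch. The labels, the tuple-valued motion data, and the function name `build_merge_list` are hypothetical; the redundancy check is simplified to an exact comparison against every earlier entry, and the combined candidates are omitted.

```python
# Preferred candidate order from the text; labels are illustrative only.
PREFERRED_ORDER = ["A1", "B1", "B0", "A0", "AFFINE", "ATMVP", "B2", "TEMPORAL"]


def build_merge_list(available, max_cand):
    """Build a merge candidate list in the preferred order.

    available: dict mapping a label to its motion data (hypothetical tuples);
    labels missing from the dict are treated as unavailable.
    """
    merge_list = []
    for label in PREFERRED_ORDER:
        motion = available.get(label)
        if motion is None:
            continue  # candidate not available
        # Simplified redundancy check: skip motion already in the list.
        if any(motion == m for _, m in merge_list):
            continue
        merge_list.append((label, motion))
        if len(merge_list) == max_cand:
            return merge_list
    # Pad with zero motion vectors up to the target size.
    while len(merge_list) < max_cand:
        merge_list.append(("ZERO_MV", (0, 0)))
    return merge_list


example = build_merge_list(
    {"A1": (1, 0), "B1": (1, 0), "AFFINE": (2, 2), "ATMVP": (3, 1)}, 6
)
labels = [label for label, _ in example]
affine_merge_index = labels.index("AFFINE")
```

In this example B1 duplicates A1 and is pruned, while B0 and A0 are unavailable, so the affine candidate ends up at a lower index than its nominal fifth position.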

In another example, the merge index of an affine merge candidate may vary depending on one or several conditions.

For example, the merge index or position within the list associated with the affine merge candidate changes according to a criterion. The principle is that when an affine merge candidate has a high probability of being selected, a low value is set for the merge index corresponding to the affine merge candidate (and when there is a low probability to be selected, a higher value is set).

In the twelfth embodiment, affine merge candidates have a merge index value. To improve the coding efficiency of the merge index, it is efficient to make the context variables for the bits of the merge index dependent on the affine flag for the neighboring blocks and/or for the current block.

For example, the context variable may be determined using the following formula:

ctxIdx=IsAffine(A1)+IsAffine(B1)+IsAffine(B0)+IsAffine(A0)+IsAffine(B2)

The resulting context value may have a value of 0, 1, 2, 3, 4, or 5.

Using the affine flags in this way increases coding efficiency.

In a first variant, to involve fewer neighboring blocks, ctxIdx = IsAffine(A1) + IsAffine(B1). The resulting context value may have a value of 0, 1, or 2.

In a second variant, which also involves fewer neighboring blocks, ctxIdx = IsAffine(A2) + IsAffine(B3). Again, the resulting context value may have a value of 0, 1, or 2.

In a third variant, no neighboring blocks are involved: ctxIdx = IsAffine(current block). The resulting context value may have a value of 0 or 1.
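The context index formula and its variants can be sketched as follows. This is an illustration only: the dictionary of per-position affine flags and the function names are assumptions, and a missing position is treated as non-affine.

```python
def ctx_idx(is_affine, positions):
    """Sum of the affine flags of the blocks at the given positions.

    is_affine: dict mapping a position label to True/False (hypothetical
    representation); a missing position counts as not affine.
    """
    return sum(int(is_affine.get(pos, False)) for pos in positions)


def ctx_idx_current(current_is_affine):
    """Third variant: only the affine flag of the current block is used."""
    return int(current_is_affine)


flags = {"A1": True, "B1": False, "B0": True, "A0": False, "B2": True,
         "A2": True, "B3": True}

full = ctx_idx(flags, ("A1", "B1", "B0", "A0", "B2"))  # range 0..5
variant1 = ctx_idx(flags, ("A1", "B1"))                # range 0..2
variant2 = ctx_idx(flags, ("A2", "B3"))                # range 0..2
variant3 = ctx_idx_current(True)                       # range 0..1
```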

Fig. 15 is a flow chart of a partial decoding process of some syntax elements relating to the coding mode with the third variant. In the figure, a skip flag (1601), a prediction mode (1611), a merge flag (1603), a merge index (1608), and an affine flag (1606) may be decoded. This flowchart is similar to the previously described flowchart of fig. 12, and thus detailed description is omitted. The difference lies in that: the merge index decoding process takes into account the affine flag so that when obtaining the context variables of the merge index, the affine flag decoded before the merge index can be used, which is not the case in VTM 2.0. In VTM2.0, the affine flag of the current block cannot be used to obtain the context variable of the merge index because it always has the same value "0".

Thirteenth embodiment

In a tenth embodiment, the context variables of the bits of the merge index of the current block depend on the value of the skip flag for the current block (current coding unit or CU).

In the thirteenth embodiment, instead of directly using the skip flag value to derive the context variable of the target bit of the merge index, the context value of the target bit is derived from the context variable used to encode the skip flag of the current CU. This is possible because the skip flag itself is CABAC coded and therefore has a context variable.

Preferably, the context variable of the target bit of the merge index of the current CU is set equal to the context variable used to encode the skip flag of the current CU (the context variable of the target bit of the merge index of the current CU is copied from the context variable used to encode the skip flag of the current CU).

The target bit may be the first bit only. The other bits may be bypass encoded as in the first embodiment.

The context variable for the skip flag of the current CU is derived in the manner specified in VTM 2.0. An advantage of this embodiment over VTM2.0 reference software is that the complexity of the merge index decoding and decoder design (and the encoder design) is reduced without affecting the coding efficiency. Indeed, with this embodiment, at least, only 1 CABAC state is needed to encode the merge index, instead of 5 CABAC states for the current VTM merge index encoding (encoding/decoding). Furthermore, this reduces the worst case complexity because the other bits are CABAC bypass encoded, which reduces the number of operations compared to encoding all bits with CABAC.
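As a rough sketch of this embodiment, the bins of the merge index can be tagged with the way each one would be coded: the first bin reuses the context variable of the skip flag, and the remaining bins are bypass coded. A truncated-unary binarization is assumed here, the skip-flag context is passed in as an opaque value rather than derived, and the function name is hypothetical. No arithmetic coding is performed.

```python
def binarize_merge_index(idx, max_cand, skip_flag_ctx):
    """Return a list of (bin_value, coding_mode) pairs for a merge index.

    Assumes truncated-unary binarization: idx ones followed by a zero,
    the zero being omitted for the last possible index.
    skip_flag_ctx: the context variable used for the skip flag of the
    current CU (opaque here); it is copied for the first bin only.
    """
    if not 0 <= idx < max_cand:
        raise ValueError("index out of range")
    c_max = max_cand - 1
    bins = [1] * idx + ([0] if idx < c_max else [])
    return [
        (b, skip_flag_ctx if position == 0 else "bypass")
        for position, b in enumerate(bins)
    ]


tagged = binarize_merge_index(2, 6, "skip_ctx_1")
```

Only one context-coded bin remains per merge index, which is what reduces the number of CABAC states in this sketch.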

Fourteenth embodiment

In a thirteenth embodiment, the context value of the target bit is derived from the context variable of the skip flag of the current CU.

In a fourteenth embodiment, the context value of the target bit is derived from the context variable of the affine flag of the current CU.

This is possible because the affine flags themselves are CABAC encoded and therefore have context variables.

Preferably, the context variable of the target bit of the merge index of the current CU is set equal to the context variable of the affine flag of the current CU (the context variable of the target bit of the merge index of the current CU is copied from the context variable of the affine flag of the current CU).

The target bit may be the first bit only. The other bits are bypass-coded as in the first embodiment.

The context variables for the affine flag of the current CU are derived in the manner specified in VTM 2.0.

An advantage of this embodiment over VTM2.0 reference software is that the complexity of the merge index decoding and decoder design (and the encoder design) is reduced without affecting the coding efficiency. Indeed, with the present embodiment, at least, only 1 CABAC state is needed for merge index, rather than 5 CABAC states for current VTM merge index encoding (encoding/decoding). Furthermore, this reduces the worst case complexity because the other bits are CABAC bypass encoded, which reduces the number of operations compared to encoding all bits with CABAC.

Fifteenth embodiment

In several of the foregoing embodiments, the context variable has more than 2 values, e.g., three values 0, 1, and 2. However, to reduce complexity and reduce the number of states to be processed, the number of permitted context variable values may be limited to 2, e.g., 0 and 1. This may be accomplished, for example, by changing any initial context variable with a value of 2 to 1. In practice, this simplification has no or only a limited effect on the coding efficiency.
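A minimal sketch of this limitation, assuming the initial context variable is a small non-negative integer (the function name is hypothetical):

```python
def limit_context(ctx_idx, max_value=1):
    """Clamp a context variable so that only values 0..max_value remain."""
    return min(ctx_idx, max_value)


limited = [limit_context(c) for c in (0, 1, 2)]
```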

Combinations of embodiments with other embodiments

Any two or more of the foregoing embodiments may be combined.

The foregoing description has focused on the encoding and decoding of the merge index. For example, the first embodiment relates to: generating a merge candidate list comprising ATMVP candidates (for either classical merge mode or classical merge skip mode, i.e., non-affine merge mode or non-affine merge skip mode); selecting one of the merge candidates in the list; and generating a merge index for the selected merge candidate using CABAC encoding, one or more bits of the merge index being bypass CABAC encoded. In principle, the present invention can be applied to modes other than the merge mode (e.g., affine merge mode) that involve: generating a list of motion information predictor candidates (e.g., a list of affine merge candidates or Motion Vector Predictor (MVP) candidates); selecting one of the motion information predictor candidates (e.g., MVP candidates) in the list; and generating an identifier or index for the selected motion information predictor candidate in the list (e.g., the selected affine merge candidate or the selected MVP candidate for predicting the motion vector of the current block). Thus, the present invention is not limited to the merge mode (i.e., the classical merge mode and the classical merge skip mode), and the index to be encoded or decoded is not limited to the merge index. For example, in the development of VVC, it is conceivable that the techniques of the foregoing embodiments may be applied to (or extended to) modes other than merge modes, such as the AMVP mode of HEVC or its equivalent in VVC, or affine merge modes, and so forth. The claims to follow should be construed accordingly.

As discussed, in the foregoing embodiments, one or more motion information candidates (e.g., motion vectors) and/or one or more affine parameters for an affine merge mode (affine merge or affine merge skip mode) are obtained from a first neighboring block that is affine-coded among spatially neighboring blocks (e.g., at positions A1, B1, B0, A0, B2) or temporally associated blocks (e.g., a "Center" block of a collocated block, or a spatial neighbor thereof such as "H"). These positions are depicted in figs. 6a and 6b. To enable obtaining (e.g., deriving or sharing or "merging") one or more motion information and/or affine parameters between a current block (or a group of samples/pixel values currently being encoded/decoded, e.g., a current CU) and a neighboring block (spatially neighboring or temporally associated with the current block), one or more affine merge candidates are added to a list of merge candidates (i.e., classical merge mode candidates), such that when the selected merge candidate (which is then signaled using a merge index, e.g., using a syntax element such as "merge idx" in HEVC or a functionally equivalent syntax element) is an affine merge candidate, the current CU/block is encoded/decoded using affine merge mode with that affine merge candidate.

As described above, such one or more affine merge candidates for obtaining (e.g., deriving or sharing) one or more motion information and/or affine parameters for an affine merge mode may also be signaled using a separate affine merge candidate list (or set) that may be the same as or different from the merge candidate list used for the classical merge mode.

According to an embodiment of the present invention, when the technique of the above-described embodiment is applied to the affine merge mode, the affine merge candidate list may be generated using the same technique as the motion vector derivation process for the classical merge mode as shown and described with respect to fig. 8, or the merge candidate derivation process as shown and described with respect to fig. 13. An advantage of sharing the same technique for generating/compiling a list of affine merging candidates (for affine merging mode or affine merging skip mode) and a list of merging candidates (for classical merging mode or classical merging skip mode) is that the complexity in the encoding/decoding process is reduced compared to having separate techniques.

According to another embodiment, a separate technique, shown below with respect to FIG. 24, may be used to generate/compile the list of affine merge candidates.

Fig. 24 is a flowchart illustrating the affine merge candidate derivation process for affine merge modes (affine merge mode and affine merge skip mode). In the first step of the derivation process, five block positions (2401 to 2405) are considered to obtain/derive spatial affine merge candidates 2413. These positions are the spatial positions depicted in fig. 6a (and fig. 6b) with reference labels A1, B1, B0, A0, and B2. In the next step, the availability of spatial motion vectors is checked, and it is determined whether the inter-mode coded blocks associated with the respective positions A1, B1, B0, A0, and B2 are each coded with an affine mode (e.g., using any of affine merge, affine merge skip, or affine AMVP mode) (2410). A maximum of five motion vectors (i.e., spatial affine merge candidates) are selected/obtained/derived. A predictor is considered available if it exists (e.g., there is information from which to obtain/derive a motion vector associated with that position), the block is not intra-coded, and the block is affine (i.e., coded using an affine mode).

Affine motion information is then derived/obtained (2411) for each available block position (2410). This derivation is done for the current block based on an affine model of the block location (and affine model parameters such as discussed with respect to fig. 11(a) and 11 (b)). Then, a pruning process is applied (2412) to remove candidates that give the same affine motion compensation (or have the same affine model parameters) as another candidate previously added to the list.

At the end of this stage, the list of spatially affine merge candidates includes up to five candidates.
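The spatial stage described above (availability check 2410, derivation 2411, pruning 2412) might be sketched as follows. The `Block` type and its fields are hypothetical, and the derivation step is reduced to a placeholder that reuses the neighbor's affine parameters unchanged, whereas the actual process derives an affine model for the current block.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SPATIAL_POSITIONS = ("A1", "B1", "B0", "A0", "B2")


@dataclass(frozen=True)
class Block:
    """Hypothetical summary of a neighboring block."""
    is_intra: bool
    is_affine: bool
    affine_params: Tuple[int, ...]  # stand-in for affine model parameters


def spatial_affine_candidates(
    neighbours: Dict[str, Optional[Block]]
) -> List[Tuple[int, ...]]:
    candidates: List[Tuple[int, ...]] = []
    for pos in SPATIAL_POSITIONS:
        blk = neighbours.get(pos)
        # Availability (2410): block exists, is not intra-coded, is affine.
        if blk is None or blk.is_intra or not blk.is_affine:
            continue
        # Derivation (2411): placeholder, parameters reused as-is.
        params = blk.affine_params
        # Pruning (2412): drop a candidate identical to an earlier one.
        if params in candidates:
            continue
        candidates.append(params)
    return candidates  # at most five entries


spatial = spatial_affine_candidates({
    "A1": Block(False, True, (1, 2, 3, 4)),
    "B1": Block(False, False, ()),
    "B0": Block(False, True, (1, 2, 3, 4)),   # pruned duplicate
    "A0": None,                               # no block at this position
    "B2": Block(False, True, (5, 6, 7, 8)),
})
```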

If the number of candidates (Nb_Cand) is strictly less than (2426) the maximum number of candidates (here, Max_Cand is the value signaled in the bitstream slice header and is equal to five for the affine merge mode, but may be different/variable depending on the implementation), then constructed affine merge candidates (i.e., additional affine merge candidates generated to provide some diversity and to approach the target number, playing a role similar to that of the combined bi-predictive merge candidates in HEVC, for example) are generated (2428). These constructed affine merge candidates are based on motion vectors associated with neighboring spatial and temporal positions of the current block. First, control points (2418, 2419, 2420, 2421) are defined to generate the motion information used for generating an affine model. Two of these control points correspond to v0 and v1 of, for example, figs. 11(a) and 11(b). These four control points correspond to the four corners of the current block.

If there is a block at position B2 (2405) and if that block is coded in inter mode (2414), the motion information of the top-left control point (2418) is obtained from the motion information of the block at position B2 (e.g., it is set equal to the motion information of the block at position B2). Otherwise, if there is a block at position B3 (2406) (as depicted in fig. 6b) and if that block is coded in inter mode (2414), the motion information of the top-left control point (2418) is obtained from the motion information of the block at position B3 (e.g., it is set equal to the motion information of the block at position B3). If this is not the case either, and if there is a block at position A2 (as depicted in fig. 6b) that is coded in inter mode (2414), the motion information of the top-left control point (2418) is obtained from the motion information of the block at position A2 (e.g., it is set equal to the motion information of the block at position A2). When no block is available for the control point, the control point is considered unavailable.

If there is a block at position B1 (2402) and if that block is coded in inter mode (2415), the motion information of the top-right control point (2419) is obtained from the motion information of the block at position B1 (e.g., it is set equal to the motion information of the block at position B1). Otherwise, if there is a block at position B0 (2403) and if that block is coded in inter mode (2415), the motion information of the top-right control point (2419) is obtained from the motion information of the block at position B0 (e.g., it is set equal to the motion information of the block at position B0). When no block is available for the control point, the control point is considered unavailable.

If there is a block at position A1 (2401) and if that block is coded in inter mode (2416), the motion information of the bottom-left control point (2420) is obtained from the motion information of the block at position A1 (e.g., it is set equal to the motion information of the block at position A1). Otherwise, if there is a block at position A0 (2404) and if that block is coded in inter mode (2416), the motion information of the bottom-left control point (2420) is obtained from the motion information of the block at position A0 (e.g., it is set equal to the motion information of the block at position A0). When no block is available for the control point, the control point is considered unavailable.

If there is a collocated block at position H (2408) (as depicted in fig. 6a) and if that block is coded in inter mode (2417), the motion information of the bottom-right control point (2421) is obtained from the motion information of the temporal candidate, e.g., the collocated block at position H (e.g., it is set equal to the motion information of the collocated block at position H). When no block is available for the control point, the control point is considered unavailable.
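The fallback order for the four control points can be summarized in a short sketch. The control point names, the dictionary representation, and the function name are assumptions made for illustration; the input is taken to contain only positions where a block exists and is inter-coded.

```python
# Source positions tried in order for each control point, as described above.
CONTROL_POINT_SOURCES = {
    "top_left": ("B2", "B3", "A2"),
    "top_right": ("B1", "B0"),
    "bottom_left": ("A1", "A0"),
    "bottom_right": ("H",),  # temporal (collocated) position
}


def derive_control_points(inter_motion):
    """Pick, for each control point, the motion of the first usable source.

    inter_motion: dict mapping a position label to the motion information
    of a block that exists and is inter-coded (hypothetical tuples).
    A control point with no usable source is reported as None (unavailable).
    """
    control_points = {}
    for cp, sources in CONTROL_POINT_SOURCES.items():
        control_points[cp] = next(
            (inter_motion[pos] for pos in sources if pos in inter_motion),
            None,
        )
    return control_points


cps = derive_control_points(
    {"B3": (1, 1), "A2": (9, 9), "B0": (2, 0), "A1": (0, 3)}
)
```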

Based on these control points, up to 10 constructed affine merge candidates may be generated (2428). These candidates are generated based on affine models with 4, 3, or 2 control points. For example, the first constructed affine merge candidate may be generated using all 4 control points. The next 4 constructed affine merge candidates are then those that can be generated using the 4 different sets of 3 control points (i.e., the 4 possible combinations of 3 of the 4 available control points). The remaining constructed affine merge candidates are generated using different sets of 2 control points (i.e., different possible combinations of 2 of the 4 control points).
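One way to enumerate the control point sets in the order described (4 points, then sets of 3, then sets of 2) is sketched below. The text does not state which 2-point sets are retained to stay within 10 candidates, so truncating the enumeration at 10 and the particular order produced by `itertools.combinations` are assumptions of this sketch, as are the names used.

```python
from itertools import combinations

CONTROL_POINTS = ("top_left", "top_right", "bottom_left", "bottom_right")


def constructed_control_point_sets(available, limit=10):
    """List control point sets usable for constructed affine candidates.

    available: collection of control point names that were derived.
    Sets containing an unavailable control point are skipped.
    """
    sets = []
    for size in (4, 3, 2):
        for combo in combinations(CONTROL_POINTS, size):
            if all(cp in available for cp in combo):
                sets.append(combo)
    return sets[:limit]  # assumed cap, matching "up to 10"


all_sets = constructed_control_point_sets(set(CONTROL_POINTS))
three_only = constructed_control_point_sets(
    {"top_left", "top_right", "bottom_left"}
)
```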

If, after adding these additional (constructed) affine merge candidates, the number of candidates (Nb_Cand) remains strictly smaller (2430) than the maximum number of candidates (Max_Cand), then further additional virtual motion information candidates, such as zero motion vector candidates (or even combined bi-predictive merge candidates if applicable), are added/generated (2432) until the number of candidates in the affine merge candidate list reaches the target number (e.g., the maximum number of candidates).
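The overall filling logic (spatial candidates, then constructed candidates, then zero candidates) can be sketched as follows. Candidates are represented by opaque values, and the function name and the `ZERO_CANDIDATE` marker are hypothetical.

```python
ZERO_CANDIDATE = "zero"  # stand-in for a zero motion vector candidate


def fill_affine_merge_list(spatial, constructed, max_cand=5):
    """Assemble the affine merge candidate list up to max_cand entries."""
    candidates = list(spatial)[:max_cand]
    # Test 2426: constructed candidates are only needed if the list is short.
    if len(candidates) < max_cand:
        for cand in constructed:          # step 2428
            if len(candidates) == max_cand:
                break
            candidates.append(cand)
    # Test 2430 / step 2432: pad with zero candidates up to the target.
    while len(candidates) < max_cand:
        candidates.append(ZERO_CANDIDATE)
    return candidates


short_list = fill_affine_merge_list(["s0", "s1"], ["c0"])
full_list = fill_affine_merge_list(["s0", "s1", "s2", "s3", "s4"], ["c0"])
```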

At the end of this process, a list or set of affine merge mode candidates (i.e., a list or set of candidates for affine merge modes (affine merge mode and affine merge skip mode)) is generated/constructed (2434). As illustrated in fig. 24, the list or set of affine merge (motion vector predictor) candidates is constructed/generated (2434) from a subset of the spatial candidates (2401 to 2407) and the temporal candidate (2408). It should be understood that other affine merge candidate derivation processes, with a different order for checking availability, a different pruning process, or a different number/type of potential candidates (e.g., ATMVP candidates may also be added in a manner similar to the merge candidate list derivation process in fig. 13 or fig. 16), may also be used to generate the list/set of affine merge candidates according to embodiments of the present invention.

The following embodiments illustrate how a list (or set) of affine merge candidates may be used to signal (e.g., encode or decode) a selected affine merge candidate (which may be signaled using a merge index for a merge mode or a separate affine merge index specifically used with an affine merge mode).

In the following embodiments, the terms below are used as follows.

A merge mode (i.e., a merge mode other than the affine merge mode defined below; in other words, a classical non-affine merge mode or a classical non-affine merge skip mode) is a mode in which motion information of spatially neighboring or temporally associated blocks is obtained for (or derived for, or shared with) the current block.

Merge mode predictor candidates (i.e., merge candidates) are information related to one or more spatially neighboring or temporally associated blocks from which the current block may obtain/derive motion information in merge mode.

The merge mode predictor is the selected merge mode predictor candidate, whose information is used when predicting motion information of the current block and when signaling (e.g., encoding or decoding), in a merge mode process, an index (e.g., a merge index) identifying the merge mode predictor in a list (or set) of merge mode predictor candidates.

The affine merge mode is a merge mode in which motion information of spatially neighboring or temporally associated blocks is obtained for (or derived for, or shared with) the current block, so that the obtained/derived/shared motion information can be used as motion information and/or affine parameters for affine mode processing (or affine motion model processing) of the current block.

Affine merge mode predictor candidates (i.e., affine merge candidates) are information related to one or more spatially neighboring or temporally associated blocks from which the current block may obtain/derive motion information in affine merge mode.

The affine merge mode predictor is the selected affine merge mode predictor candidate, whose information can be used in an affine motion model when predicting motion information of the current block and when signaling (e.g., encoding or decoding), in an affine merge mode process, an index (e.g., an affine merge index) identifying the affine merge mode predictor in a list (or set) of affine merge mode predictor candidates.

It should be understood that in the following embodiments, an affine merge mode is a merge mode with its own affine merge index (an identifier that is a variable) for identifying one affine merge mode predictor candidate from a list/set of candidates (also referred to as an "affine merge list" or "sub-block merge list"), as opposed to having a single index value associated with it; the affine merge index is signaled to identify that particular affine merge mode predictor candidate.

It is to be understood that in the following embodiments, a "merge mode" refers to any of the classical merge mode or the classical merge skip mode in HEVC/JEM/VTM, or any functionally equivalent mode, provided that such obtaining (e.g., deriving or sharing) of motion information and signaling of merge indices as described above is used in said mode. "Affine merge mode" likewise refers to any of an affine merge mode or an affine merge skip mode (if present and using such obtaining/derivation), or any other functionally equivalent mode (provided that the same features are used in the mode).

Sixteenth embodiment

In a sixteenth embodiment, motion information predictor indices for identifying affine merge mode predictors (candidates) from an affine merge candidate list are signaled using CABAC coding, wherein one or more bits of the motion information predictor indices are bypass CABAC coded.

According to a first variant of this embodiment, at the encoder, the motion information predictor index for affine merging mode is encoded by: generating a list of motion information predictor candidates; selecting one of the motion information predictor candidates in the list as an affine merging mode predictor; and generating a motion information predictor index of the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded. Data indicating the index of the selected motion information predictor candidate is then included in the bitstream. The decoder then decodes the motion information predictor index for the affine merge mode from the bitstream comprising the data by: generating a list of motion information predictor candidates; decoding a motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; when the affine merge mode is used, one of the motion information predictor candidates in the list is identified as an affine merge mode predictor using the decoded motion information predictor index.
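The encoder and decoder steps for the index itself can be illustrated at the binarization level. This sketch assumes a truncated-unary binarization of the index and models only the sequence of bins, not the arithmetic coding engine; in the embodiment, one or more of these bins would be bypass coded. The function names are hypothetical.

```python
def encode_index(idx, num_cand):
    """Truncated-unary bins for a predictor index in a list of num_cand."""
    if not 0 <= idx < num_cand:
        raise ValueError("index out of range")
    c_max = num_cand - 1
    return [1] * idx + ([0] if idx < c_max else [])


def decode_index(bins, num_cand):
    """Recover the index by reading bins until a 0 or the maximum length."""
    c_max = num_cand - 1
    reader = iter(bins)
    idx = 0
    while idx < c_max and next(reader) == 1:
        idx += 1
    return idx


candidates = ["cand0", "cand1", "cand2", "cand3", "cand4"]
bins = encode_index(3, len(candidates))
selected = candidates[decode_index(bins, len(candidates))]
```

Both sides generate the same candidate list, so the decoded index identifies the same candidate that the encoder selected.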

According to another variant of the first variant, when using merge mode, one of the motion information predictor candidates in the list may also be selected as a merge mode predictor, such that when using merge mode, the decoder may use the decoded motion information predictor index (e.g., merge index) to identify one of the motion information predictor candidates in the list as a merge mode predictor. In this further variant, an affine merge index is used to signal an affine merge mode predictor (candidate), and the signaling of the affine merge index is implemented using an index signaling scheme similar to the signaling of the merge index according to any one of the first to fifteenth embodiments, or to the signaling of the merge index used in the current VTM or HEVC.

In this variant, when using merge mode, the signaling of the merge index may be implemented using the signaling of the merge index according to any of the first to fifteenth embodiments or the signaling of the merge index used in the current VTM or HEVC. In this variant, the signaling of the affine merge index and the signaling of the merge index may use different index signaling schemes. The advantage of this variant is that better coding efficiency is achieved by using efficient index coding/signaling for both the affine merge mode and the merge mode. Furthermore, in this variant, separate syntax elements may be used for the merge index (such as "Merge_idx[ ][ ]" in HEVC or its functional equivalent) and the affine merge index (such as "A_Merge_idx[ ][ ]" or the like). This enables the merge index and the affine merge index to be signaled (encoded/decoded) independently.

According to yet another variant, when merge mode is used and one of the motion information predictor candidates in the list can also be selected as a merge mode predictor, CABAC encoding uses the same context variable for both modes (i.e., when affine merge mode is used and when merge mode is used) for at least one bit of the motion information predictor index (e.g., merge index or affine merge index) of the current block, such that the affine merge index and the at least one bit of the merge index share the same context variable. The decoder then identifies one of the motion information predictor candidates in the list as a merge mode predictor using the decoded motion information predictor index when merge mode is used, wherein CABAC decoding uses the same context variable for both modes (i.e., when affine merge mode is used and when merge mode is used) for at least one bit of the motion information predictor index for the current block.

According to a second variant of this embodiment, at the encoder, the motion information predictor index is encoded by: generating a list of motion information predictor candidates; when the affine merging mode is used, selecting one of the motion information predictor candidates in the list as an affine merging mode predictor; selecting one of the motion information predictor candidates in the list as a merge mode predictor when the merge mode is used; and generating a motion information predictor index of the selected motion information predictor candidate using CABAC coding, one or more bits of the motion information predictor index being bypass CABAC coded. Data indicating the index of the selected motion information predictor candidate is then included in the bitstream. The decoder then decodes the motion information predictor index from the bitstream by: generating a list of motion information predictor candidates; decoding the motion information predictor index using CABAC decoding, one or more bits of the motion information predictor index being bypass CABAC decoded; identifying one of the motion information predictor candidates in the list as an affine merge mode predictor using the decoded motion information predictor index when using the affine merge mode; and when the merge mode is used, identifying one of the motion information predictor candidates in the list as a merge mode predictor using the decoded motion information predictor index.

According to another variant of the second variant, the signaling of the affine merging index and the signaling of the merging index uses the same signaling indexing scheme according to any one of the first to fifteenth embodiments or the signaling of the merging index used in current VTM or HEVC. The advantage of this further variant is a simple design during implementation, which may also result in lower complexity. In this variant, when the affine merging mode is used, the CABAC encoding of the encoder comprises: for at least one bit of a motion information predictor index (affine merge index) of the current block, using a context variable that is separable from another context variable for the at least one bit of the motion information predictor index (merge index) when using the merge mode; and including data indicating that the affine merge mode is used in the bitstream such that context variables for the affine merge mode and the merge mode can be distinguished (clearly identified) for CABAC decoding processing. The decoder then obtains from the bitstream data indicating that an affine merge mode is used in the bitstream; and when affine merge mode is used, CABAC decoding uses this data to distinguish between context variables for affine merge indices and merge indices. Furthermore, at the decoder, the data indicating that the affine merging mode is used may also be used for generating a list (or set) of affine merging mode predictor candidates if the obtained data indicates that the affine merging mode is used, or for generating a list (or set) of merging mode predictor candidates if the obtained data indicates that the merging mode is used.

This variation enables the merge index and the affine merge index to be signaled using the same signaling indexing scheme, while the merge index and the affine merge index are still encoded/decoded (e.g., by using separate context variables) independently of each other.

One way to use the same signaling indexing scheme is to use the same syntax elements for both the affine merge index and the merge index, i.e., to encode the motion information predictor index of the selected motion information predictor candidate using the same syntax elements for both cases when affine merge mode is used and when merge mode is used. Then, at the decoder, the motion information predictor index is decoded by parsing the same syntax elements from the bitstream, regardless of whether the current block is encoded (and being decoded) using affine merge mode or merge mode.

Fig. 22 shows a partial decoding process of some syntax elements related to the encoding mode (i.e., the same signaling indexing scheme) according to this variation of the sixteenth embodiment. The figure shows that the affine merge index (2255- "merge idx affine") of the affine merge mode (2257: yes) and the merge index (2258- "merge idx") of the merge mode (2257: no) are signaled with the same signaling index scheme. It should be understood that in some variations, the affine merge candidate list may include ATMVP candidates, as in the merge candidate list of the current VTM. The encoding of affine merge index is similar to the encoding of merge index for merge mode depicted in fig. 10(a) and 10 (b). In some variations, even if affine merge candidate derivation does not define ATMVP merge candidates, when ATMVP is enabled for a merge mode with at most 5 other candidates (i.e., 6 candidates in total), the affine merge index is encoded as described in fig. 10(b) such that the maximum number of candidates in the affine merge candidate list matches the maximum number of candidates in the merge candidate list. Thus, each bit of the affine merge index has its own context. All context variables used to signal the bits of the merge index are independent of the context variables used to signal the bits of the affine merge index.

According to another variant, the same signaling indexing scheme shared by the merge index and the affine merge index uses CABAC coding only for the first bin, as in the first embodiment. That is, all bits of the motion information predictor index except the first bit are bypass CABAC encoded. In this other variation of the sixteenth embodiment, when ATMVP is included as a candidate in one of the merge candidate list or the affine merge candidate list (e.g., when ATMVP is enabled at the SPS level), the encoding of each index (i.e., the merge index or the affine merge index) is modified so that only the first bit of the index is encoded by CABAC using a single context variable, as shown in fig. 14. When ATMVP is not enabled at the SPS level, this single context is set in the same way as in the current VTM reference software. The other bits (from the 2nd to the 5th bit, or to the 4th bit if there are only 5 candidates in the list) are bypass encoded. When ATMVP is not included as a candidate in the merge candidate list (e.g., when ATMVP is disabled at the SPS level), there are 5 merge candidates and 5 affine merge candidates available. Only the first bit of the merge index of the merge mode is encoded by CABAC using the first single context variable, and only the first bit of the affine merge index of the affine merge mode is encoded by CABAC using the second single context variable. These first and second context variables are set in the same way as in the current VTM reference software when ATMVP is not enabled at the SPS level for both the merge index and the affine merge index. The other bits (the 2nd to 4th bits) are bypass encoded.
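The encoder-side behaviour described above can be illustrated with a minimal sketch. It assumes a truncated-unary binarization of the index (as used for the merge index in HEVC) and does not perform any arithmetic coding: it only labels each bin with the way it would be coded. The function name and the context names `ctx_merge_idx` and `ctx_affine_merge_idx` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# (bin value, coding mode), where the mode is a context name or "bypass"
Bin = Tuple[int, str]


def binarize_index(index: int, max_candidates: int, is_affine: bool) -> List[Bin]:
    """Label the bins of a merge or affine merge index.

    Assumption: truncated-unary binarization, i.e. `index` ones followed by
    a terminating zero, the zero being omitted for the largest index.
    Only the first bin is context coded; all other bins are bypass coded.
    """
    if not 0 <= index < max_candidates:
        raise ValueError("index out of range for the candidate list")
    c_max = max_candidates - 1
    values = [1] * index + ([0] if index < c_max else [])
    # The affine flag selects which of the two single contexts is used.
    first_ctx = "ctx_affine_merge_idx" if is_affine else "ctx_merge_idx"
    return [(v, first_ctx if pos == 0 else "bypass") for pos, v in enumerate(values)]
```

For example, under these assumptions an affine merge index of 2 in a list of 5 candidates yields one context-coded bin followed by two bypass bins.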

The decoder generates the same merge candidate list and the same affine merge candidate list as the encoder. This is accomplished using the method of fig. 22. Although the same signaling indexing scheme is used for both merge mode and affine merge mode, the affine flag (2256) is used to determine whether the data currently being decoded is for a merge index or an affine merge index, such that the first and second context variables may be separate (or distinguishable) from each other for the CABAC decoding process. That is, the affine flag (2256) is used (i.e., at step 2257) during the index decoding process to determine whether to decode "merge idx 2258" or "merge idx affine 2255". When ATMVP is not included as a candidate in the merge candidate list (e.g., when ATMVP is disabled at the SPS level), there are 5 candidates in each of the two candidate lists (for merge mode and affine merge mode). Only the first bit of the merge index is decoded by CABAC using the first single context variable, and only the first bit of the affine merge index is decoded by CABAC using the second single context variable. All other bits (from the 2nd to the 4th bit) are bypass decoded. In contrast to the current reference software, when ATMVP is included as a candidate in the merge candidate list (e.g., when ATMVP is enabled at the SPS level), only the first bit of the index is decoded by CABAC, using the first single context variable when decoding the merge index and the second single context variable when decoding the affine merge index. The other bits (from the 2nd to the 5th or 4th bit) are bypass decoded. The decoded index is then used to identify the candidate selected by the encoder from the respective candidate list (i.e., merge candidates or affine merge candidates).
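The decoder-side parsing described above can be sketched as follows. This is an illustration under stated assumptions, not the reference decoder: the index is assumed to be truncated-unary binarized, the arithmetic decoding engine is replaced by a hypothetical `read_bin(mode)` callback that returns the next bin, and the context names are invented for the example.

```python
from typing import Callable, List


def parse_index(read_bin: Callable[[str], int], max_candidates: int, is_affine: bool) -> int:
    """Parse a merge or affine merge index from successive bins.

    The affine flag only selects the context of the first bin; every
    later bin is requested in bypass mode.
    """
    first_ctx = "ctx_affine_merge_idx" if is_affine else "ctx_merge_idx"
    c_max = max_candidates - 1
    index = 0
    while index < c_max:
        mode = first_ctx if index == 0 else "bypass"
        if read_bin(mode) == 0:
            break  # terminating zero of the truncated-unary code
        index += 1
    return index


def make_reader(bins: List[int], log: List[str]) -> Callable[[str], int]:
    """Hypothetical stand-in for the CABAC engine: replays `bins` and logs the mode of each request."""
    it = iter(bins)

    def read_bin(mode: str) -> int:
        log.append(mode)
        return next(it)

    return read_bin
```

In use, the logged modes show that a single context-coded bin is followed only by bypass bins, whichever of the two indices is being parsed.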

The advantage of this variant is that using the same signaling indexing scheme for both the merge index and the affine merge index reduces the complexity of the index decoding and of the decoder design (and encoder design) for implementing these two different modes, without significantly impacting the coding efficiency. In fact, for this variant, only 2 CABAC states (one for each of the first and second single context variables) are required to signal the index, rather than the 9 or 10 that would be required if all bits of the merge index and all bits of the affine merge index were CABAC encoded/decoded. Furthermore, the worst-case complexity is reduced because all other bits (except the first bit) are CABAC bypass encoded, which reduces the number of operations required during the CABAC encoding/decoding process compared to encoding all bits with CABAC.

According to a further variant, CABAC encoding or decoding uses the same context variable for at least one bit of the motion information predictor index of the current block, both when affine merge mode is used and when merge mode is used. In this further variant, the context variable for the first bit of the merge index and the first bit of the affine merge index is independent of which index is being encoded or decoded, i.e. the first and second single context variables (from the previous variant) are not distinguishable/separable and are one and the same context variable. Thus, in contrast to the previous variant, the merge index and the affine merge index share one context variable during CABAC processing. As shown in fig. 23, the signaling indexing scheme is the same for both the merge index and the affine merge index, i.e., only one type of index "merge idx (2308)" is encoded or decoded for both modes. In the case of a CABAC decoder, the same syntax elements are used for both the merge index and the affine merge index, and there is no need to distinguish them when considering the context variables. Therefore, it is not necessary to use the affine flag (2306) to determine whether the current block is encoded (to be decoded) in the affine merge mode as in step (2257) of fig. 22, and there is no branch after step 2306 of fig. 23 because only one index ("merge idx") needs to be decoded. The affine flag is used for motion information prediction with affine merge mode (i.e., during the prediction process after the CABAC decoder has decoded the index ("merge idx")). Furthermore, only the first bit of the index (i.e., the merge index and the affine merge index) is CABAC encoded using one single context, and the other bits are bypass encoded, as described for the first embodiment. Thus, in this further variant, one context variable for the first bit of the index is shared by the signaling of both the merge index and the affine merge index.
If the size of the candidate list is different for the merge index and the affine merge index, the maximum number of bits used to signal the relevant index for each case may also be different, i.e. they are independent of each other. Thus, if desired, the number of bits to be bypass encoded may be adapted accordingly according to the value of the affine flag (2306), e.g. to enable parsing of the relevant indexed data from the bitstream.
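As an illustration of how the number of bypass bits could follow the affine flag when the two lists have different sizes, the sketch below computes the worst-case bin budget for one index. It assumes a truncated-unary binarization with a single, shared context-coded first bin; the function and key names are hypothetical.

```python
from typing import Dict


def index_bin_budget(affine_flag: bool, max_merge_cand: int, max_affine_merge_cand: int) -> Dict[str, int]:
    """Worst-case number of context-coded and bypass bins for one index.

    Assumption: truncated-unary binarization, so a list of N candidates
    needs at most N - 1 bins, of which only the first is context coded.
    """
    max_cand = max_affine_merge_cand if affine_flag else max_merge_cand
    max_bins = max(max_cand - 1, 0)
    context_coded = min(max_bins, 1)
    return {"context_coded": context_coded, "bypass": max_bins - context_coded}
```

With, say, 6 merge candidates and 5 affine merge candidates, the merge index has at most 4 bypass bins and the affine merge index at most 3, while the context-coded first bin is common to both.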

The advantage of this variant is that the complexity of the merge index and affine merge index decoding process and of the decoder design (and encoder design) is reduced without significant impact on the coding efficiency. Indeed, for this further variant, only 1 CABAC state is required when signaling both the merge index and the affine merge index, instead of the 2 CABAC states of the previous variant, or the 9 or 10 required if all bits of both indices were CABAC encoded/decoded. Furthermore, the worst-case complexity is reduced because all other bits (except the first bit) are CABAC bypass encoded, which reduces the number of operations required during the CABAC encoding/decoding process compared to encoding all bits with CABAC.
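The state counts quoted above can be reproduced with a small calculation. The sketch assumes one context variable per bin when every bin is context coded, and a truncated-unary index of at most N - 1 bins for a list of N candidates; the scheme labels are hypothetical names for the three cases compared in the text.

```python
def context_count(max_merge_cand: int, max_affine_merge_cand: int, scheme: str) -> int:
    """Number of CABAC context variables needed to signal both indices."""
    if scheme == "all_bins_context_coded":
        # One context per bin, with separate contexts for each index.
        return (max_merge_cand - 1) + (max_affine_merge_cand - 1)
    if scheme == "separate_first_bin":
        # First bin of each index has its own context; the rest is bypass.
        return 2
    if scheme == "shared_first_bin":
        # Both indices share a single context for the first bin.
        return 1
    raise ValueError(f"unknown scheme: {scheme}")
```

Under these assumptions, 6 merge candidates combined with 5 or 6 affine merge candidates give 9 or 10 contexts when all bins are context coded, against 2 and 1 for the two variants described above.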

In the foregoing variation of this embodiment, the signaled affine merge index and merge index may share one or more contexts as described in any of the first through fifteenth embodiments. The advantage of this scenario is the reduction in complexity due to the reduction in the number of contexts needed to encode or decode these indices.

In the aforementioned variant of the present embodiment, the motion information predictor candidate comprises information for obtaining (or deriving) one or more of: direction, identification of the list, reference frame index, and motion vector. Preferably, the motion information predictor candidate comprises information for obtaining a motion vector predictor candidate. In a preferred variant, the motion information predictor index (e.g. affine merge index) is used to signal an affine merge mode predictor candidate, and the signaling of the affine merge index is implemented using a similar signaling index as the signaling of the merge index according to any one of the first to fifteenth embodiments or the signaling of the merge index used in current VTM or HEVC, with the motion information predictor candidate of affine merge mode as the merge candidate.

In the foregoing variation of the present embodiment, the generated motion information predictor candidate list includes ATMVP candidates as in the first embodiment or as in some of the other foregoing second to fifteenth embodiments. Optionally, the generated motion information predictor candidate list does not include ATMVP candidates.

In the foregoing variation of this embodiment, the maximum number of candidates that may be included in the candidate lists of the merge index and affine merge index is fixed. The maximum number of candidates that may be included in the candidate lists of the merge index and the affine merge index may be the same. Then, data for determining (or indicating) the maximum number (or target number) of motion information predictor candidates that can be included in the generated motion information predictor candidate list is included in the bitstream by the encoder, and the decoder obtains from the bitstream the data for determining the maximum number (or target number) of motion information predictor candidates that can be included in the generated motion information predictor candidate list. This enables parsing of data for decoding a merge index or an affine merge index from the bitstream. This data used to determine (or indicate) the maximum number (or target number) may be the maximum number (or target number) itself, or may enable a decoder to determine this maximum/target number in conjunction with other parameters/syntax elements (e.g., "five_minus_max_num_merge_cand" or "MaxNumMergeCand-1" as used in HEVC, or functionally equivalent parameters).
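As a brief illustration of the HEVC case mentioned above, the maximum number of merge candidates is derived from the signaled syntax element as MaxNumMergeCand = 5 - five_minus_max_num_merge_cand, with a result between 1 and 5. The Python helper below is only a sketch of that derivation; the function name is hypothetical.

```python
def max_num_merge_cand(five_minus_max_num_merge_cand: int) -> int:
    """Derive MaxNumMergeCand from the HEVC slice header syntax element."""
    if not 0 <= five_minus_max_num_merge_cand <= 4:
        raise ValueError("MaxNumMergeCand must lie in the range 1 to 5")
    return 5 - five_minus_max_num_merge_cand
```

The decoder needs this value before parsing the index because, with a truncated-unary code, the largest index value has no terminating bin.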

Alternatively, if the maximum number (or target number) of candidates in the candidate lists of the merge index and the affine merge index may vary or may differ (e.g., because the use of ATMVP candidates or any other optional candidate may be enabled or disabled for one list and not for the other list, or because the lists use different candidate list generation/derivation processes), then the maximum number (or target number) of motion information predictor candidates that may be included in the generated motion information predictor candidate lists in the case of using the affine merge mode and in the case of using the merge mode may be determined separately, and the encoder includes data for determining the maximum number/target number in the bitstream. The decoder then obtains data for determining the maximum/target number from the bitstream and parses or decodes the motion information predictor index using the obtained data. The affine flag may then be used to switch between parsing or decoding the merge index and the affine merge index, for example.

Implementation of embodiments of the invention

One or more of the foregoing embodiments are implemented by the processor 311 of the processing device 300 in fig. 3, or the respective functional modules/units of the encoder 400 in fig. 4, the decoder 60 in fig. 5, the CABAC encoder in fig. 17, or its respective CABAC decoder, performing the method steps of one or more of the foregoing embodiments.

FIG. 19 is a schematic block diagram of a computing device 1300 for implementing one or more embodiments of the present invention. Computing device 1300 may be a device such as a microcomputer, workstation, or lightweight portable device. Computing device 1300 includes a communication bus connected to: a Central Processing Unit (CPU)2001, such as a microprocessor or the like; a Random Access Memory (RAM)2002 for storing executable code of the method of an embodiment of the invention and registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least a part of an image according to an embodiment of the invention, the storage capacity of which can be expanded, for example, by means of an optional RAM connected to an expansion port; a Read Only Memory (ROM)2003 for storing a computer program for implementing an embodiment of the present invention; a network interface (NET)2004, which is typically connected to a communication network through which digital data to be processed is transmitted or received, the network interface (NET)2004 may be a single network interface or consist of a set of different network interfaces (e.g. wired and wireless interfaces, or different kinds of wired or wireless interfaces) into which data packets are written for transmission or from which data packets are read for reception under the control of a software application running in the CPU 2001; a User Interface (UI)2005, which may be used to receive input from a user or display information to a user; a Hard Disk (HD)2006, which may be provided as a mass storage device; an input/output module (IO)2007, which may be used to receive/transmit data from/to external devices (such as a video source or display, etc.). The executable code may be stored in the ROM 2003, on the HD 2006, or on a removable digital medium such as a disk. 
According to a variant, the executable code of the program may be received by means of the communication network via NET2004 to be stored in one of the storage means (such as HD 2006, etc.) of the communication device 1300 before being executed. The CPU 2001 is adapted to control and direct the execution of instructions or portions of software code of one or more programs according to embodiments of the present invention, which instructions are stored in one of the aforementioned memory components. For example, after power-on, the CPU 2001 can execute those instructions related to the software application from the main RAM memory 2002 after the instructions are loaded from the program ROM 2003 or HD 2006. Such software applications cause the steps of the method according to the invention to be performed when executed by the CPU 2001.

It will also be appreciated that according to other embodiments of the invention, the decoder according to the above embodiments is provided in a user terminal such as a computer, a mobile phone (cellular phone), a tablet or any other type of device (e.g. display device) capable of providing/displaying content to a user. According to a further embodiment, the encoder according to the above-described embodiments is provided in an image capturing device further comprising a camera, video camera or web camera (e.g. a closed circuit television or video surveillance camera) for capturing and providing content for encoding by the encoder. Two such examples are provided below with reference to fig. 20 and 21.

Fig. 20 is a diagram illustrating a network camera system 2100 including a network camera 2102 and a client device 2104.

The network camera 2102 includes an image capturing unit 2106, an encoding unit 2108, a communication unit 2110, and a control unit 2112.

The network camera 2102 and the client device 2104 are connected to each other via the network 200 to be able to communicate with each other.

The image capturing unit 2106 includes a lens and an image sensor (e.g., a Charge Coupled Device (CCD) or a Complementary Metal Oxide Semiconductor (CMOS) sensor), captures an image of a subject, and generates image data based on the image. The image may be a still image or a video image. The image capturing unit may further comprise a zoom component and/or a pan component adapted to zoom or pan (optically or digitally), respectively.

The encoding unit 2108 encodes the image data by using at least one of the encoding methods described in the first to sixteenth embodiments. As another example, the encoding unit 2108 may use a combination of the encoding methods described in the first to sixteenth embodiments.

The communication unit 2110 of the network camera 2102 transmits the encoded image data encoded by the encoding unit 2108 to the client device 2104.

Further, the communication unit 2110 receives a command from the client device 2104. The command includes a command for setting parameters for encoding of the encoding unit 2108.

The control unit 2112 controls other units in the network camera 2102 in accordance with the command received by the communication unit 2110.

The client device 2104 includes a communication unit 2114, a decoding unit 2116, and a control unit 2118.

The communication unit 2114 of the client device 2104 transmits a command to the network camera 2102.

Further, the communication unit 2114 of the client device 2104 receives the encoded image data from the network camera 2102.

The decoding unit 2116 decodes the encoded image data by using the decoding method described in any of the first to sixteenth embodiments. As another example, the decoding unit 2116 may use a combination of the decoding methods described in the first to sixteenth embodiments.

The control unit 2118 of the client device 2104 controls other units in the client device 2104 in accordance with a user operation or command received by the communication unit 2114.

The control unit 2118 of the client device 2104 controls the display device 2120 to display the image decoded by the decoding unit 2116.

The control unit 2118 of the client device 2104 also controls the display device 2120 to display a GUI (graphical user interface) for specifying the values of the parameters of the network camera 2102 (including the encoding parameters for the encoding unit 2108).

The control unit 2118 of the client device 2104 also controls other units in the client device 2104 in accordance with user operation input to the GUI displayed by the display device 2120.

The control unit 2118 of the client device 2104 controls the communication unit 2114 of the client device 2104, in accordance with user operation input to the GUI displayed by the display device 2120, to transmit to the network camera 2102 a command specifying the values of the parameters of the network camera 2102.

The network camera system 2100 can determine whether the camera 2102 utilizes zoom or pan during recording of video, and can use such information when encoding a video stream, as zoom or pan during capture can benefit from the use of an affine mode that is well suited for encoding complex motions such as zoom, rotation, and/or stretch (which can be a side effect of pan, especially if the lens is a "fisheye" lens).

Fig. 21 is a diagram illustrating a smartphone 2200.

The smartphone 2200 includes a communication unit 2202, a decoding/encoding unit 2204, a control unit 2206, and a display unit 2208.

The communication unit 2202 receives encoded image data via a network.

The decoding/encoding unit 2204 decodes the encoded image data received by the communication unit 2202.

The decoding/encoding unit 2204 decodes the encoded image data by using at least one of the decoding methods described in the first to sixteenth embodiments. As another example, the decoding/encoding unit 2204 may use a combination of the decoding methods described in the first to sixteenth embodiments.

The control unit 2206 controls other units in the smartphone 2200 in accordance with user operations or commands received by the communication unit 2202.

For example, the control unit 2206 controls the display unit 2208 to display the image decoded by the decoding/encoding unit 2204.

The smartphone may also include an image recording device 2210 (e.g., a digital camera and associated circuitry) for recording images or video. Such recorded images or videos may be encoded by the decoding/encoding unit 2204 under the instruction of the control unit 2206.

The smartphone may also include a sensor 2212 adapted to sense the orientation of the mobile device. Such sensors may include accelerometers, gyroscopes, compasses, Global Positioning System (GPS) units, or similar location sensors. Such a sensor 2212 can determine whether the smartphone changes orientation, and such information can be used when encoding a video stream, as changes in orientation during capture can benefit from the use of an affine mode that is well suited for encoding complex motions such as rotations.

Substitutions and modifications

It will be appreciated that it is an object of the present invention to ensure that the affine mode is utilised in the most efficient manner, and that certain examples discussed above relate to signalling the use of the affine mode according to the perceived likelihood that the affine mode is useful. Further examples of the invention may be applied to encoders when it is known that complex motion is being encoded, in which case affine transformations may be particularly effective. Examples of such cases include:

a) camera zoom in/out

b) A portable camera (e.g., a mobile phone) changes orientation (i.e., rotational movement) during shooting.

c) "fisheye" lens camera panning (e.g., stretching/distorting a portion of an image)

In this way, an indication of complex motion can be raised during the recording process, so that the affine mode can be given a higher probability of being used for a slice, a sequence of frames or indeed the video stream as a whole.

In a further example, the affine mode may be given a higher probability of being used depending on the characteristics or functions of the apparatus used for recording the video. For example, a mobile device may be more likely to change orientation than, for example, a fixed security camera, and thus the affine mode may be more suitable for encoding video from the former. Examples of features or functions include: presence/use of a zoom component, presence/use of a position sensor, presence/use of a pan component, whether the device is portable, or a user selection on the device.
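One way to picture the use of such device features is the sketch below, which returns a simple yes/no hint on whether an encoder might favour the affine mode. The class, its field names and the "any feature present" rule are hypothetical choices made for illustration; the description does not specify how the features are combined.

```python
from dataclasses import dataclass


@dataclass
class CaptureDeviceFeatures:
    """Hypothetical summary of the features listed above."""
    has_zoom: bool = False
    has_position_sensor: bool = False
    has_pan: bool = False
    is_portable: bool = False
    user_selected_affine: bool = False


def favour_affine_mode(features: CaptureDeviceFeatures) -> bool:
    """Return True if any listed feature suggests complex motion is likely.

    Assumption: a single feature is enough to raise the hint; a real
    encoder could weight or combine the features differently.
    """
    return any((
        features.has_zoom,
        features.has_position_sensor,
        features.has_pan,
        features.is_portable,
        features.user_selected_affine,
    ))
```

Such a hint could then be applied at the level of a slice, a sequence of frames or the whole stream, as discussed above.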

While the invention has been described with reference to the embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. It will be understood by those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

It will also be appreciated that any of the results of the above-described comparison, determination, evaluation, selection, execution, performance, or consideration (e.g., selections made during encoding or filtering processes) may be indicated in or may be determined/inferred from data in the bitstream (e.g., flags or data indicative of the results) such that the indicated or determined/inferred results may be used for processing rather than actually being compared, determined, evaluated, selected, executed, performed, or considered, e.g., during a decoding process.

In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be used to advantage.

Reference signs appearing in the claims are provided by way of illustration only and shall not be construed as limiting the scope of the claims.

In the foregoing embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and may be executed by a hardware-based processing unit.

The computer readable medium may include a computer readable storage medium corresponding to a tangible medium such as a data storage medium or a communication medium including any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic elements.
