Method and apparatus for video encoding and decoding using bi-prediction

Document number: 690453    Publication date: 2021-04-30

Reading note: This technology, "Method and apparatus for video encoding and decoding using bi-prediction", was created by F. Le Léannec, T. Poirier, and P. Bordes on 2019-09-18. Its main content: Various implementations are described; in particular, implementations of video encoding and decoding using motion compensation with bi-prediction are presented. The encoding method comprises: for a picture, obtaining a first prediction value for a block of the picture using a first reference picture; obtaining a second prediction value for the block of the picture using a second reference picture; forming a third predictor for the block in bi-predictive inter prediction using the first predictor and the second predictor, wherein the third predictor is obtained as a weighted average of the first predictor and the second predictor; and wherein the weights used in the weighted prediction depend on the position of the sample in the block. Other embodiments are presented for implementing block triangle partition prediction, for implementing block partition prediction using multiple modes, and for corresponding motion compensation in a decoding method.

1. A method for video encoding, comprising:

obtaining (1510) a first prediction value for a block of a picture using a first reference picture;

obtaining (1510) a second prediction value for the block of the picture using a second reference picture;

using (1540, 1550) the first prediction value and the second prediction value to form a third prediction value for the block of the picture in bi-predictive inter prediction, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and

wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

2. A method for video decoding, comprising:

obtaining (1510) a first prediction value for a block of a picture using a first reference picture;

obtaining (1510) a second prediction value for the block of the picture using a second reference picture;

using (1540, 1550) the first prediction value and the second prediction value to form a third prediction value for the block of the picture in bi-predictive inter prediction, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and

wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

3. An apparatus for video encoding, comprising:

one or more processors, wherein the one or more processors are configured to:

obtaining a first prediction value for a block of a picture using a first reference picture;

obtaining a second prediction value for the block of the picture using a second reference picture;

forming a third prediction value for the block of the picture in bi-predictive inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and

wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

4. An apparatus for video decoding, comprising:

one or more processors, wherein the one or more processors are configured to:

obtaining a first prediction value for a block of a picture using a first reference picture;

obtaining a second prediction value for the block of the picture using a second reference picture;

forming a third prediction value for the block of the picture in bi-predictive inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and

wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

5. The method of claim 1 or 2, or the apparatus of claims 3 or 4, wherein the weighted average of the first predictor and the second predictor is processed at a bit depth greater than or equal to a bit depth of the first predictor, the second predictor, and the third predictor.

6. The method of any of claims 1, 2, and 5, or the apparatus of any of claims 3-5, wherein a first information indicating that the block of the picture is partitioned with triangle partitions is obtained, the first and second weights depending on a distance between the sample and an edge of the triangle partition of the block.

7. The method of claim 6 or the apparatus of claim 6, wherein a second information indicating a direction of the edge of the triangle partition of the block of the picture is obtained.

8. The method of claim 6 or 7, or the apparatus of claim 6 or 7, wherein a third information indicating the location of the edge of the triangle partition in the block is obtained.

9. The method of any of claims 6, 7, and 8, or the apparatus of any of claims 6, 7, and 8, wherein a fourth information indicating whether the edge of the triangle partition is vertical or horizontal in the block is obtained.

10. The method of any of claims 6-9 or the apparatus of any of claims 6-9, wherein at least one of the first information, second information, third information, or fourth information is entropy encoded or entropy decoded.

11. The method of any of claims 1, 2, and 5-10, or the apparatus of any of claims 3-10, wherein the block of the picture comprises a luma component and two chroma components, and wherein the first weight and the second weight further depend on the luma component or chroma components.

12. A non-transitory computer readable medium containing data content generated by the apparatus of any one of claims 3-11 or by the method of any one of claims 1, 2 and 5-11, for playback using a processor.

13. A computer program product comprising computing instructions for performing the method of any one of claims 1, 2, and 5-11 when executed by one or more processors.

14. A signal comprising encoded video, the signal being formed by performing the steps of:

obtaining a first prediction value for a block of a picture using a first reference picture;

obtaining a second prediction value for the block of the picture using a second reference picture;

forming a third prediction value for the block of the picture in bi-predictive inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and

wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

1. Field of the invention

A method and apparatus for encoding video into a bitstream is disclosed. Corresponding decoding methods and apparatus are also disclosed. At least some embodiments further relate to bi-prediction (bi-prediction) of inter-coded blocks in video compression schemes.

2. Background of the invention

The technical field of one or more implementations relates generally to video compression. To achieve high compression efficiency, image and video coding schemes typically employ prediction and transform to exploit spatial and temporal redundancy in video content. Generally, intra or inter prediction is used to exploit intra or inter correlation, and then the difference between the original block and the predicted block (often denoted as the prediction error or prediction residual) is transformed, quantized, and entropy encoded. To reconstruct the video, the compressed data is decoded by inverse processes corresponding to the entropy encoding, quantization, transformation, and prediction. In the HEVC video compression standard (also referred to as Recommendation ITU-T H.265), the bi-prediction process used in inter prediction involves averaging 2 uni-directional prediction signals. Fig. 1 illustrates the bi-prediction process in HEVC. As shown in fig. 1, the averaging of the 2 uni-directional predictions is done with a higher precision than the input bit depth or the internal bit depth. The bi-prediction equation is shown in equation 1, where an offset (offset) and a shift (shift) are used to normalize the final predictor (predictor) to the input bit depth:

Pbidir = (PL0 + PL1 + offset) >> shift    (Equation 1)

HEVC interpolation filters allow for certain implementation optimizations due to the absence of rounding in the intermediate stages.
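Equation 1 can be sketched in integer arithmetic as follows. This is an illustrative sketch, not reference HEVC code; the shift of 1 and offset of 1 shown here assume the two predictors are at the same precision (in HEVC the actual shift and offset depend on the input and internal bit depths).

```python
def bidir_average(p_l0, p_l1, shift=1, offset=1):
    """Equation 1: average two uni-directional predictor samples kept at
    extended precision, normalizing back with an offset and a right shift."""
    return (p_l0 + p_l1 + offset) >> shift

# Two intermediate predictor samples are averaged with rounding.
assert bidir_average(100, 102) == 101
assert bidir_average(100, 101) == 101  # the offset rounds 100.5 up
```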

Recent additions to video compression techniques include various industry standards, various versions of reference software and/or documents, such as the Joint Exploration Model (JEM) developed by the JVET (Joint Video Exploration Team) group and the later VTM (Versatile Video Coding (VVC) Test Model). The goal is to make further improvements to the existing HEVC (High Efficiency Video Coding) standard. For example, in a more recent approach to video codecs, multiple weights are used to average the 2 uni-directional predictions to obtain the bi-directional prediction. Typically, the weights used are {-1/4, 5/4}, {3/8, 5/8} or {1/2, 1/2} ({1/2, 1/2} being the weights implemented in HEVC), and the bi-prediction formula is modified as in equation 2. Only one weight is used for the whole block.

Pbidir = ((1 - w1) * PL0 + w1 * PL1 + offset) >> shift    (Equation 2)
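Equation 2 can be sketched as follows, expressing w1 in units of 1/8 so that the normalizing shift is 3. This is an illustrative sketch under that assumption, not reference code.

```python
def weighted_bidir(p_l0, p_l1, w1_eighths):
    """Equation 2 with weights in units of 1/8:
    Pbidir = ((8 - w1) * PL0 + w1 * PL1 + offset) >> shift, with shift = 3."""
    shift = 3
    offset = 1 << (shift - 1)
    return ((8 - w1_eighths) * p_l0 + w1_eighths * p_l1 + offset) >> shift

# w1 = 4/8 reproduces the plain average of Equation 1.
assert weighted_bidir(80, 120, 4) == 100
# w1 = 5/8 leans the result toward the L1 predictor.
assert weighted_bidir(80, 120, 5) == 105
```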

In another approach of video codecs, triangle prediction is used in merge mode. Fig. 2 shows the partitioning of a coding unit CU into two triangle prediction units. As shown in fig. 2, a CU is partitioned into two triangle prediction units PU0 and PU1, in either the diagonal or the inverse diagonal direction, along a diagonal edge. Each triangle prediction unit in the CU is inter-predicted using its own motion vector and a reference frame index derived from the merge candidate list. An adaptive weighting process is then applied to the diagonal or inverse diagonal edge between the two triangle prediction units to derive a final prediction for the entire CU. Fig. 3 shows the weighting process for the diagonal edge between the two triangle prediction units. The triangle prediction unit mode is applied only to CUs in skip or merge mode. When the triangle prediction unit mode is applied to a CU, an index indicating the direction in which the CU is partitioned into two triangle prediction units and the motion vectors for the two triangle prediction units are signaled. For the two prediction units, a common list with 5 uni-directional predictors is derived, examining the same spatial and temporal positions as in the classical merge process, but using only uni-directional vectors. Redundant motion vectors are not added to the list, and if there are not enough candidates, zero motion vectors are added at the end of the list. For a given prediction unit, the number of motion vector predictors is 5, and for each diagonal 20 combinations are tested (5 × 4 = 20; the same motion vector predictor cannot be used for both PUs). The indices range from 0 to 39 and, referring to table 2, this lookup table is used to derive the partitioning direction and motion vector for each PU from the index. The first element of a given triplet gives the diagonal direction, and the second and third elements give the predictor indices of PU0 and PU1, respectively.
The index syntax is shown in table 1.

Table 1: triangle partitions and corresponding merge index syntax

Table 2: lookup table for determining diagonal directions and predicted values

Fig. 4 illustrates sub-block motion vector storage for triangle partitions according to a particular compression scheme. In one implementation, a motion vector is stored for each 4 x 4 sub-block. When triangular partitions are used for a CU, the motion vectors for each partition are stored in the same way for each sub-block, but for sub-blocks on the edges, only the motion vectors from one PU are stored, as shown in fig. 4.
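The storage rule of fig. 4 can be sketched as follows. This is a hypothetical illustration assuming a square CU split from top-left to bottom-right, and assuming the sub-blocks on the edge keep the PU0 vector; the scheme only specifies that a single PU's vector is kept for edge sub-blocks.

```python
def store_triangle_mvs(n_subblocks, mv_pu0, mv_pu1):
    """Store one motion vector per 4x4 sub-block of a CU split along its
    top-left to bottom-right diagonal into an n_subblocks x n_subblocks grid.
    Sub-blocks on or above the edge store the PU0 vector; sub-blocks below
    the edge store the PU1 vector, even though edge samples were predicted
    from both PUs."""
    return [
        [mv_pu0 if x >= y else mv_pu1 for x in range(n_subblocks)]
        for y in range(n_subblocks)
    ]

grid = store_triangle_mvs(4, "mv0", "mv1")
assert grid[0][0] == "mv0"   # on the edge: only one vector is stored
assert grid[3][0] == "mv1"   # below the edge: PU1's vector
assert grid[0][3] == "mv0"   # above the edge: PU0's vector
```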

The combination of bi-prediction and triangle partitioning of inter-coded blocks raises implementation issues. Therefore, a less computationally complex method for bi-prediction is needed. Accordingly, embodiments are disclosed to improve bi-prediction of inter-coded blocks.

3. Summary of the invention

According to an aspect of the present disclosure, a method for encoding a picture is disclosed. The method comprises the following steps: obtaining a first prediction value for a block of a picture using a first reference picture; obtaining a second prediction value for the block of the picture using a second reference picture; forming a third prediction value for the block of the picture in bi-predictive inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and the first weight and the second weight depend on the position of the sample in the block.

According to another aspect of the present disclosure, an apparatus for encoding a picture is disclosed. The apparatus comprises: means for obtaining a first prediction value for a block of a picture using a first reference picture; means for obtaining a second prediction value for the block of the picture using a second reference picture; means for forming a third prediction value for the block of the picture in bi-predictive inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and the first weight and the second weight depend on the position of the sample in the block.

According to an aspect of the present disclosure, there is provided an apparatus for encoding a picture, the apparatus comprising a processor and at least one memory coupled to the processor, the processor being configured to implement any variant of the encoding method.

In accordance with another aspect of the present disclosure, a method for decoding video is disclosed. The method comprises the following steps: receiving encoded video data in a bitstream and, for motion compensation, obtaining a first prediction value for a block of the picture using a first reference picture; obtaining a second prediction value for the block of the picture using a second reference picture; forming a third prediction value for the block of the picture in bi-predictive inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; and wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

According to another aspect of the present disclosure, an apparatus for decoding video is disclosed. The device comprises: means for receiving encoded video data in a bitstream and means for processing motion compensation, the means for processing motion compensation further comprising means for obtaining a first prediction value for a block of a picture using a first reference picture; means for obtaining a second prediction value for the block of the picture using a second reference picture; means for forming a third prediction value for the block of the picture in bi-prediction inter prediction using the first prediction value and the second prediction value, wherein the third prediction value is obtained as a weighted average of the first prediction value and the second prediction value; wherein the samples of the third predictor are obtained by applying a first weight to the samples of the first predictor and by applying a second weight to the samples of the second predictor; the samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; and wherein the first weight and the second weight depend on the position of the sample in the block.

According to an aspect of the present disclosure, there is provided an apparatus for decoding video, the apparatus comprising a processor and at least one memory coupled to the processor, the processor being configured to receive encoded video data in a bitstream and to implement any variant of the decoding method.

The present disclosure also provides a signal comprising video data generated according to the method or apparatus of any of the preceding description. Embodiments of the present invention also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the described method.

The present disclosure also provides a computer-readable storage medium having stored thereon a bitstream generated according to the above-described method. The present disclosure also provides a method and apparatus for transmitting a bitstream generated according to the above method.

The foregoing presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter. Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.

Additional features and advantages of the present disclosure will become apparent from the following detailed description of illustrative embodiments thereof, which proceeds with reference to the accompanying drawings.

4. Brief description of the drawings

Fig. 1 illustrates a bi-prediction process according to the HEVC standard;

fig. 2 illustrates a partitioning of a coding unit CU into two triangle prediction units according to a certain compression scheme;

FIG. 3 illustrates a process of weighting diagonal edges between two triangle prediction units according to a particular compression scheme;

FIG. 4 illustrates sub-block motion vector storage for triangle partitions according to a particular compression scheme;

FIG. 5 illustrates a motion compensation process for bi-predicted triangle partitions according to a particular compression scheme;

FIG. 6 illustrates a motion compensation process for a uni-predicted triangle partition according to a particular compression scheme;

FIG. 7 illustrates an example of a modified motion compensation process suitable for triangle prediction according to an embodiment of the present invention;

FIG. 8 illustrates an example of a plurality of diagonal patterns in accordance with an embodiment of the present disclosure;

FIGS. 9 and 10 show other examples of patterns according to embodiments of the invention;

fig. 11 shows an embodiment of the proposed motion vector storage;

FIG. 12 illustrates an exemplary encoder according to an embodiment of the present disclosure;

fig. 13 illustrates an exemplary decoder according to an embodiment of the present disclosure;

FIG. 14 illustrates a block diagram of an example of a system in which the various aspects and embodiments are implemented;

fig. 15 illustrates a weighted bi-prediction implemented in either a decoding method or an encoding method according to an embodiment of the present disclosure.

Detailed Description

It is to be understood that the figures and descriptions have been simplified to illustrate elements that are relevant for a clear understanding of the present principles, while eliminating, for purposes of clarity, many other elements found in typical encoding and/or decoding devices. It will be understood that, although the terms first and second may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.

Various embodiments are described with respect to encoding/decoding of pictures. They may be applied to encode/decode a portion of a picture (such as a slice or tile), or an entire sequence of pictures.

Various methods are described above, and each method includes one or more steps or actions for implementing the described method. The order and/or use of specific steps and/or actions may be modified or combined unless a specific order of steps or actions is required for proper operation of the method.

At least some embodiments relate to a method for video encoding or video decoding comprising weighted bi-prediction, wherein weights of the weighted bi-prediction depend on locations of samples in blocks of pictures of the encoded or decoded video.

A first problem of weighted motion compensation for bi-predictive triangle partitions is that each PU is bi-predictive and therefore involves a weighting process on the edge between the 2 predictions. Fig. 5 shows a motion compensation process for bi-predicted triangle partitions. For samples on the edge as shown in fig. 3, 4 motion compensations are required as shown in fig. 5. Therefore, a less computationally complex approach is needed.

A first solution has been proposed that limits the triangle PU to uni-prediction to reduce the memory bandwidth. Fig. 6 shows a motion compensation process for uni-predictive triangular partitions. In this case the motion compensation process is simplified as shown in fig. 6, however, this solution may still benefit from improvements regarding process accuracy. At least one embodiment relates to improving the accuracy of a bi-prediction process of an inter-coded block.

In the enhancement of bi-predictive triangular partitioning, it is desirable to improve compression efficiency by adding more patterns. A second problem with weighted motion compensation for bi-predictive partitions is then to signal the added mode without the large cost of coding the index on the combination of partition and motion vector. At least one embodiment relates to improving signaling of a bi-prediction process for inter-coded blocks.

The third problem is the storage of motion vectors for each PU in sub-blocks on the edge. For samples on the edge, weighting is done between 2 predictions, which means that at least 2 motion vectors are used to predict the sample, but only the motion vector used to predict the current PU is stored in memory, which may result in sub-optimal motion propagation with neighboring blocks. At least one embodiment relates to improving the storage of motion vectors for the bi-prediction process of inter-coded blocks.

Accordingly, embodiments are disclosed to improve bi-prediction of inter-coded blocks.

General examples

At least one embodiment of a general method of weighted bi-prediction 1500 is shown in fig. 15. A person skilled in the art can easily implement such a method in any of the motion compensation processes of a video encoding method or a video decoding method, wherein the acquisition of the input information can be determined in the RDO loop in the encoding method or decoded from the received data in the decoder. In accordance with the present principles, the weights of weighted bi-prediction depend on the position of a sample in a block of a picture of the encoded or decoded video. Thus, the processing of triangle partitions is advantageously performed as weighted prediction and results in an increased accuracy of the bi-prediction process of inter-coded blocks, as explained later.

First, at 1510 of fig. 15, for a block of a picture to be encoded/decoded, a first prediction value and a second prediction value are obtained. The first prediction value for the block uses a first reference picture stored in list L0, and the second prediction value for the block uses a second reference picture stored in list L1. These 2 uni-directional predictors are combined to form a third predictor by bi-directional inter prediction.

According to various embodiments described hereinafter, at 1520, at least one piece of information for determining the position-dependent weights is obtained. This step is optional. According to a non-limiting embodiment, a first information indicates that the block of the picture is partitioned with triangle partitions, a second information indicates the direction of the edge of the triangle partition of the block, and a third information indicates the location of the edge of the triangle partition. According to another embodiment, the partitioning of the block is not limited to triangle partitions; a more general partitioning of a block into 2 partitions along an edge is shown in fig. 9. The information for determining the position-dependent weights is then information indicating that the block of the picture is partitioned along an edge. According to a non-limiting embodiment, a fourth information indicates whether the edge of the so-called "triangle partition" is vertical or horizontal (rather than diagonal). According to another embodiment, the information used in determining the position-dependent weights is information relating to a color component: since the number of samples in the block may differ depending on whether the block is a luma block or a chroma block, the weights are also determined according to the color component. In addition, the weight is also determined according to the size of the block. According to a further embodiment, the position-dependent weights in bi-prediction are not limited to triangle or edge partitions, so the information is, more generally, any information used in determining position-dependent weights.

As previously described, in a particular embodiment, the 2 uni-directional predictions are averaged using a plurality of weights to obtain the bi-directional prediction. According to a non-limiting example, the weights used are {-1/4, 5/4}, {3/8, 5/8} or {1/2, 1/2}, where only one pair of weights is used for the entire block. The present principles are advantageously compatible with selecting a block-based weight from a set of weights, wherein the position-dependent weight of a sample is derived from the selected block-based weight. In other words, the weight of a sample is determined according to its position and the selected block-based weight of the predictor. Thus, at 1530, a block-based weight is optionally selected from the set of weights.

At 1540, position-dependent weights for the samples in the block are determined. According to a particular embodiment, the position-dependent weight of a sample in the block is further derived from at least one of the obtained information, the selected block-based weight, the component of the block, or the size of the block. Samples of a third predictor are obtained by applying a first weight to samples of the first predictor and by applying a second weight to samples of the second predictor. Thus, the first and second weights are determined depending on the position of the sample in the block. The samples of the third predictor, the samples of the first predictor, and the samples of the second predictor share a same position in the block; that is, these samples are co-located in the block.
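Steps 1540-1550 can be sketched as follows, assuming a simple 2-D list representation and per-sample first weights expressed in eighths (the second weight being the complement). This is an illustrative sketch, not the normative process.

```python
def position_dependent_bipred(pred0, pred1, weights, shift=3):
    """Combine two uni-directional predictors with a per-sample weight map.
    weights[y][x] is the first weight in units of 1/(1 << shift); the second
    weight is its complement, so co-located samples of the first, second,
    and third predictors are combined at the same position in the block."""
    offset = 1 << (shift - 1)
    denom = 1 << shift
    h, w = len(pred0), len(pred0[0])
    return [
        [
            (weights[y][x] * pred0[y][x]
             + (denom - weights[y][x]) * pred1[y][x]
             + offset) >> shift
            for x in range(w)
        ]
        for y in range(h)
    ]

out = position_dependent_bipred(
    [[80, 80], [80, 80]],     # first predictor
    [[120, 120], [120, 120]], # second predictor
    [[4, 8], [0, 4]],         # per-sample first weights, in eighths
)
```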

According to an embodiment, position dependent weighted bi-directional inter prediction is used for triangle prediction. Since each triangle prediction unit is limited to uni-prediction, the triangle prediction is implemented as bi-prediction, with the first and second weights depending on the sample position. In the case of a triangle partition, the first and second weights depend on the distance between the sample and the edge of the triangle partition of the block. However, the present principles are not limited to triangular partitions and may be readily extended to other partitions of the block, including horizontal/vertical edges and including multiple patterns. Various modifications and improvements are described below. The position of the edge in the block for calculating the weight is obtained from at least one information indicating that the block is divided into 2 partitions. Furthermore, in a variant, the at least one information is signaled to allow a decoding method corresponding to the encoding method to use the same information for bi-prediction. For example, in the encoding method, the at least one signaled information is entropy encoded. For example, in the decoding method, the at least one information is obtained from entropy decoding of the signaled information.

However, the position-dependent weights in bi-prediction are not limited to triangle or edge partitions. For example, the present principles are also compatible with combined intra-inter prediction (where the intermediate weights depend on sample positions without other partitions).

Once the sample-dependent weights are obtained, the bi-prediction process is performed at 1550. The third predictor (also referred to as the bi-directional predictor) is obtained as a weighted average of the first predictor and the second predictor. Advantageously, the weighted average is processed at an increased bit depth compared to the bit depth of the predictors. Then, at 1560, the weighted average at the larger bit depth is shifted and clipped to obtain the third predictor at the same bit depth as the first and second predictors.
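Step 1560 can be sketched as follows; the shift value and the 10-bit sample range used in the example are illustrative assumptions.

```python
def normalize_and_clip(acc, shift, bit_depth):
    """The weighted sum `acc` is kept at extended precision; a single final
    right shift brings it back to the predictor bit depth, and the result is
    clipped to the valid sample range [0, 2**bit_depth - 1]."""
    offset = 1 << (shift - 1)
    val = (acc + offset) >> shift
    return max(0, min((1 << bit_depth) - 1, val))

# 10-bit example: extended-precision sum 8200 shifts to 1025, clips to 1023.
assert normalize_and_clip(8200, 3, 10) == 1023
assert normalize_and_clip(800, 3, 10) == 100
```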

At 1570, the bi-prediction ends and the third prediction value for the encoded or decoded image block is output.

Example 1

Thus, at least one embodiment of an encoding or decoding method involves weighting samples on the edges of a triangle partition with increased precision. Fig. 7 shows an example of a modified motion compensation procedure for triangle prediction according to an embodiment. Since each triangle prediction unit is limited to uni-prediction, bi-prediction with sample-position-dependent weighting factors is processed to obtain the triangle prediction. This modified motion compensation process is implemented with the position-dependent weights shown in figs. 3 and 4: the weights used for averaging the 2 uni-directional predictions P1 and P2 may be different for each sample. The weight depends on the distance between the current sample S0, S1, S2 or S3 and the edge between the 2 triangular PUs P1 and P2. For example, for sample S0 on the edge of the partition, the first weight W1 is equal to 4/8 and the second weight W2 is equal to 4/8. For sample S1, far from the edge of the partition, the first weight W1 is equal to 1/8 and the second weight W2 is equal to 7/8. Conversely, for sample S2, at the same distance but on the other side of the edge of the partition, the first weight W1 is equal to 7/8 and the second weight W2 is equal to 1/8. And when the distance between sample S3 and the edge of the partition is above a limit value, the first weight W1 is equal to 8/8 and the second weight W2 is equal to 0/8. Advantageously, shifting to the input bit depth and clipping are postponed until after weighting the samples, as shown in fig. 7, in favor of extended precision. In this embodiment, first information indicates that the block of the picture is partitioned using a triangular partition, for example according to the top-left to bottom-right direction shown in fig. 3. For example, a dedicated syntax element, such as a triangle flag, is used to indicate the triangle partition of the block. In a block with a luminance component of size N×N (e.g., N=8 as in fig. 3), the weight W1 at location (x, y) in the block (x and y being in the range [0, N-1]) is for example obtained as:

W1 = Clip(0, 8, (x - y) + 4) and W2 = 8 - W1

The resulting weights are in the range [0, 8]; thus, an increased precision of the weighted sum is obtained, and the shift and clip operations are then performed to reduce the bit depth to that of the third prediction value. As shown in fig. 3, the weight W1 at position (x, y) is different for the chrominance components.
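The luma weight derivation above can be illustrated with a short sketch, assuming the top-left to bottom-right edge of fig. 3; Clip(lo, hi, v) clamps v to [lo, hi], and the helper names are chosen for this example.

```python
def clip(lo, hi, v):
    """Clamp v to the range [lo, hi]."""
    return max(lo, min(hi, v))

def luma_weight_map(n=8):
    """W1 at each (x, y) of an n x n luma block for the TL-to-BR edge,
    per the formula W1 = Clip(0, 8, (x - y) + 4); W2 = 8 - W1.
    Returned as rows indexed [y][x]."""
    return [[clip(0, 8, (x - y) + 4) for x in range(n)] for y in range(n)]
```

On the diagonal (x == y) the weight is 4/8, it ramps to 7/8 and 8/8 on one side of the edge and down to 1/8 and 0/8 on the other, matching the S0 to S3 examples above.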

Example 2

At least one embodiment of the encoding or decoding method further involves weighting samples on the edges of triangular partitions adapted to multiple partition patterns. Advantageously, information indicating the arrangement of the plurality of partition patterns is used to determine the edge in the block for the different partition patterns. Thus, the position of the sample in the block relative to the edge, e.g. its distance, is determined. Fig. 8 illustrates an example of a plurality of diagonal patterns according to an embodiment. Such multiple diagonal patterns, in which 2 (TL2BR or TR2BL) or more (TL2BR_1_4, TL2BR_3_4, TR2BL_1_4, TR2BL_3_4) patterns are defined, are desirable to improve coding efficiency. For example, as shown in fig. 8, the diagonal may be shifted (by 1/4 or 3/4 of the block) or rotated (top-left to bottom-right TL2BR or top-right to bottom-left TR2BL).

At least one embodiment relates to bi-prediction with multiple partitions (e.g., triangle partitions). According to a variant feature, the encoding of the plurality of patterns is separated from the encoding of the index of the motion vector predictor. In fact, if the 6 patterns from fig. 8 were implemented as bi-predictive triangular partitions, then 6 × 20 = 120 combinations would have to be tested at the encoder. Various encoder acceleration implementations are described below in the variants of embodiment 5.
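For instance, the combination count quoted above can be reproduced by treating each combination as an ordered pair of distinct motion vector predictors from the 5-candidate list, one per triangle PU. That interpretation is an assumption here, but it is consistent with the 20-per-pattern figure used throughout the text.

```python
from itertools import permutations

def num_combinations(num_patterns=6, num_candidates=5):
    """Ordered pairs of distinct MV predictors per pattern (one per
    triangle PU), times the number of edge patterns."""
    per_pattern = len(list(permutations(range(num_candidates), 2)))  # 5 * 4 = 20
    return num_patterns * per_pattern
```

With 6 patterns this gives 120 candidate combinations, which motivates the encoder speed-ups of embodiment 5.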

Thus, a dedicated syntax is described that separates the signaling of the pattern, using the syntax elements diagonal_dir[x0][y0] and diagonal_pos[x0][y0], from the signaling of the motion vector indices most_basic_idx in the motion vector candidate list. Thus, the second information indicating the direction of the edge of the triangle partition of the block is the diagonal_dir[x0][y0] syntax element, and the third information indicating the position of the edge of the triangle partition is the diagonal_pos[x0][y0] syntax element. Such syntax elements are entropy encoded, decoded separately, and binarized as proposed in tables 4 and 5.

Table 3: modified syntax for multiple patterns

diagonal_dir[x0][y0] specifies the direction of the diagonal separating the 2 prediction units of the block, where x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture. Examples of various diagonal directions are shown in figs. 2 and 8.

Table 4: binarization for Diagnonal _ dir syntax element

diagonal_pos[x0][y0] specifies the position of the diagonal separating the 2 prediction units of the block, where x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture. Fig. 8 shows examples of various diagonal positions.

Table 5: binarization for Diagnonal _ pos syntax element

Example 3

In another variant, other edges may be used, such as horizontal or vertical edges, or edges running from a corner to the middle of the block, as shown in figs. 9 and 10. For example, fig. 9 shows additional patterns where the edge is horizontal (HOR, HOR_1_4, HOR_3_4) or vertical (VER, VER_1_4, VER_3_4) and located at the middle (HOR, VER) or at a quarter (HOR_1_4, HOR_3_4, VER_1_4, VER_3_4) of the block. Fig. 10 shows further patterns where the edge starts at a corner of the block and ends in the middle of the block. Of course, partitioning along edges compatible with the present principles is not limited to the described patterns, and those skilled in the art will readily apply the modified motion compensation process using position-dependent weighting factors to other partition patterns. According to the present embodiment, the syntax is modified to add the new patterns as shown in table 6.
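To illustrate how the position-dependent weight generalizes to other edge patterns, here is a hypothetical sketch. Only the diagonal formula comes from this description; the horizontal and vertical variants are assumed analogues with the edge placed at mid-block, and the pattern names mirror those of fig. 9.

```python
def clip(lo, hi, v):
    """Clamp v to the range [lo, hi]."""
    return max(lo, min(hi, v))

def weight(x, y, n, pattern):
    """Illustrative W1 at (x, y) in an n x n block for a few edge patterns.

    'TL2BR' uses the formula from the text; 'HOR' and 'VER' are assumed
    transpositions of it with the edge at the middle of the block."""
    if pattern == "TL2BR":            # top-left to bottom-right diagonal
        return clip(0, 8, (x - y) + 4)
    if pattern == "HOR":              # horizontal edge at mid-block
        return clip(0, 8, (y - n // 2) + 4)
    if pattern == "VER":              # vertical edge at mid-block
        return clip(0, 8, (x - n // 2) + 4)
    raise ValueError(pattern)
```

The same 4/8-on-the-edge, ramp-to-0/8-and-8/8 behavior carries over; only the distance term changes with the edge orientation.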

Table 6: proposed syntax for diagonal plus horizontal and vertical partitioning

In another variant, syntax elements indicating other weights applied to the average over the edges are encoded.

Example 4

At least one embodiment further adapts the storage of motion vectors on the edges of the partitions. In the classical inter mode, when bi-prediction is used, 2 motion vectors (1 per list) are stored per 4 × 4 sub-block. In triangle merge mode, in the variant where each PU is restricted to uni-prediction, the 2 motion vectors used on the edge as shown in fig. 11 are advantageously stored in the corresponding 2 lists. Thus, the method can be implemented without additional cost. In conjunction with equation 2, the motion vectors can be stored with given weights w0 and w1 for the motion vectors from list 0 and list 1, respectively, which allows better propagation of the motion vectors and thus better prediction for neighboring blocks.
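A sketch of such a motion field storage for a TL2BR split might look as follows. The rule used here to classify a 4×4 sub-block as lying on the edge is an assumption for illustration, as are the tuple representation of a motion vector and the function name.

```python
def store_motion(block_size, mv_p1, mv_p2):
    """Illustrative 4x4 sub-block motion field for a TL2BR triangle split.

    mv_p1 / mv_p2: the uni-directional MVs of the two triangle PUs, given
    as (list_name, mv) tuples. Sub-blocks on the edge store both MVs (one
    per list, as in classical bi-prediction); the others store only the MV
    of the PU they belong to."""
    n = block_size // 4
    field = []
    for by in range(n):
        row = []
        for bx in range(n):
            if bx == by:                        # sub-block straddles the edge
                row.append([mv_p1, mv_p2])      # store both MVs, one per list
            elif bx > by:
                row.append([mv_p1])             # inside PU P1
            else:
                row.append([mv_p2])             # inside PU P2
        field.append(row)
    return field
```

Because edge sub-blocks reuse the standard two-list storage, no extra motion buffer is needed, which is the "no additional cost" point above.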

Example 5

Embodiment 5 relates to encoder acceleration for motion compensation. At the encoder, the maximum number of combinations to test grows quickly when more patterns are used, because 20 combinations of motion vector predictors are tested for a given pattern. For each combination, the averaging is processed to evaluate the candidate. At least one embodiment therefore relates to reducing the number of tested combinations. According to one feature, only the best predictors, selected using a fast estimate based on SATD (sum of absolute transformed differences), are subjected to RDOQ (rate distortion optimized quantization) processing.

In a first variant, no test is performed on combinations that use the zero motion vector predictors added at the end of the list. For example, when predictors 4 and 5 of the list are additional zero motion vectors, only 3 × 2 = 6 combinations are tested for a given pattern instead of 20. With 6 patterns, 6 × 6 = 36 combinations are tested instead of 6 × 20 = 120.

In a second variant, only the combinations using the motion vectors selected in classical merge are tested, while the other combinations are not. First, the best merge candidate for classical merge is determined. One or 2 motion vectors, depending on whether the best candidate is uni-directional or bi-directional, are then retrieved. If these motion vectors are in the list of motion vector predictors for the triangle PUs, the set of tested combinations is reduced to the combinations containing these motion vector predictors.

In a third variant, the motion vector predictors are ordered using SAD or SATD. Since all combinations are formed from at most 5 motion vector predictors, all 5 uni-directional motion vector predictors are sorted according to their SAD or SATD cost. The N (N < 5) best motion vectors are retained, or the predictors whose cost is larger than a threshold are removed. Thus, the number of possible combinations is reduced.
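This third variant can be sketched as a sort-and-prune over predictor costs. The cost callable is supplied by the caller in this sketch; computing the SAD or SATD of each uni-directional prediction itself is out of scope here.

```python
def prune_predictors(predictors, cost_of, keep=3, threshold=None):
    """Order uni-directional MV predictors by SAD/SATD cost and keep the
    N best (optionally dropping those above a threshold) before forming
    pairs.

    cost_of: callable returning the SAD or SATD of the prediction produced
    by one predictor."""
    ranked = sorted(predictors, key=cost_of)
    if threshold is not None:
        ranked = [p for p in ranked if cost_of(p) <= threshold]
    return ranked[:keep]
```

Pairing only the retained predictors shrinks the combination count roughly quadratically, e.g. from 5 × 4 = 20 pairs to 3 × 2 = 6 when N = 3.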

Additional embodiments and information

This application describes aspects including tools, features, embodiments, models, methods, and the like. Many of these aspects are described with specificity and, at least to show individual characteristics, are often described in a manner that may sound limiting. However, this is for clarity of description and does not limit the application or scope of those aspects. Indeed, all of the different aspects may be combined and interchanged to provide further aspects. Furthermore, these aspects may also be combined and interchanged with aspects described in earlier documents.

The aspects described and contemplated in this application can be embodied in many different forms. Fig. 12, 13, and 14 below provide some embodiments, but other embodiments are contemplated and the discussion of fig. 12, 13, and 14 does not limit the breadth of the implementation. At least one of the aspects relates generally to video encoding and decoding, and at least one other aspect relates generally to transmitting a generated or encoded bitstream. These and other aspects may be implemented as a method, apparatus, computer-readable storage medium having stored thereon instructions for encoding or decoding video data according to any of the described methods, and/or computer-readable storage medium having stored thereon a bitstream generated according to any of the described methods.

In this application, the terms "reconstruction" and "decoding" are used interchangeably, the terms "pixel" and "sample" are used interchangeably, and the terms "image", "picture" and "frame" are used interchangeably. Typically, but not necessarily, the term "reconstruction" is used at the encoder side, while "decoding" is used at the decoder side.

Various methods are described herein, and each method includes one or more steps or actions for achieving the described method. The order and/or use of specific steps and/or actions may be modified or combined unless a specific order of steps or actions is required for proper operation of the method.

Various methods and other aspects described in this application can be used to modify modules, for example, the entropy encoding 145, motion compensation 170, and motion estimation 175 modules of the video encoder 100 shown in fig. 12 and the entropy decoding 230 and motion compensation 275 modules of the video decoder 200 shown in fig. 13. Moreover, the present aspects are not limited to VVC or HEVC, and can be applied, for example, to other standards and recommendations, whether pre-existing or developed in the future, and to extensions of any such standards and recommendations (including VVC and HEVC). Unless indicated otherwise, or technically precluded, the aspects described in this application can be used individually or in combination.

Various numerical values are used in this application, such as the number of partitions or the value of relative weights. The specific values are for example purposes and the described aspects are not limited to these specific values.

Fig. 12 shows an encoder 100. Variations of this encoder 100 are contemplated, but for clarity, the encoder 100 is described below without describing all contemplated variations.

Before being encoded, the video sequence may undergo a pre-encoding process (101), for example, applying a color transform to the input color pictures (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components). Metadata may be associated with the pre-processing and attached to the bitstream.

In the encoder 100, a picture is encoded by the encoder elements as described below. The picture to be encoded is partitioned (102) and processed in units of, for example, CUs. Each unit is encoded using, for example, an intra or inter mode. When a unit is encoded in intra mode, it performs intra prediction (160). In inter mode, motion estimation (175) and compensation (170) are performed. The encoder decides (105) which of intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag. A prediction residual is calculated, for example, by subtracting (110) the predicted block from the original image block.

The prediction residual is then transformed (125) and quantized (130). The quantized transform coefficients are entropy encoded (145) along with motion vectors and other syntax elements to output a bitstream. The encoder may skip the transform and apply quantization directly to the untransformed residual signal. The encoder may bypass both transform and quantization, i.e. directly encode the residual without applying the transform or quantization process.

The encoder decodes the encoded block to provide a reference for further prediction. The quantized transform coefficients are dequantized (140) and inverse transformed (150) to decode the prediction residual. The decoded prediction residual and the prediction block are combined (155) to reconstruct the image block. An in-loop filter (165) is applied to the reconstructed picture to perform, for example, deblocking/SAO (sample adaptive offset) filtering to reduce coding artifacts. The filtered image is stored in a reference picture buffer (180).

Fig. 13 shows a block diagram of the video decoder 200. In the decoder 200, a bitstream is decoded by the decoder elements as described below. The video decoder 200 generally performs a decoding pass reciprocal to the encoding pass described in fig. 12. The encoder 100 also generally performs video decoding as part of encoding video data.

In particular, the input of the decoder comprises a video bitstream, which may be generated by the video encoder 100. The bitstream is first entropy decoded (230) to obtain transform coefficients, motion vectors, and other encoded information. The picture partition information indicates how the picture is partitioned. The decoder may thus divide (235) the picture according to the decoded picture partition information. The transform coefficients are dequantized (240) and inverse transformed (250) to decode the prediction residual. The decoded prediction residual is combined (255) with the prediction block, reconstructing the block. The prediction block may be obtained (270) from intra-prediction (260) or motion compensated prediction (i.e., inter-prediction) (275). An in-loop filter (265) is applied to the reconstructed image. The filtered image is stored in a reference picture buffer (280).

The decoded pictures may further undergo post-decoding processing (285), such as an inverse color transform (e.g., conversion from YCbCr 4:2:0 to RGB 4:4: 4) or performing an inverse remapping of the remapping process performed in the pre-encoding processing (101). The post-decoding processing may use metadata derived in the pre-encoding processing and signaled in the bitstream.

FIG. 14 illustrates a block diagram of an example of a system in which aspects and embodiments are implemented. The system 1000 may be implemented as a device including the various components described below and configured to perform one or more aspects described herein. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smart phones, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. Elements of system 1000 may be implemented in a single Integrated Circuit (IC), multiple ICs, and/or discrete components, alone or in combination. For example, in at least one embodiment, the processing and encoder/decoder elements of system 1000 are distributed across multiple ICs and/or discrete components. In various embodiments, the system 1000 is communicatively coupled to one or more other systems or other electronic devices via, for example, a communications bus or through dedicated input and/or output ports. In various embodiments, the system 1000 is configured to implement one or more aspects described herein.

The system 1000 includes at least one processor 1010 configured to execute instructions loaded therein for implementing various aspects described herein, for example. The processor 1010 may include embedded memory, an input-output interface, and various other circuits known in the art. The system 1000 includes at least one memory 1020 (e.g., volatile memory devices and/or non-volatile memory devices). System 1000 includes a storage device 1040 that may include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read Only Memory (EEPROM), Read Only Memory (ROM), Programmable Read Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, magnetic disk drives, and/or optical disk drives. By way of non-limiting example, the storage 1040 may include an internal storage, an attached storage (including removable and non-removable storage), and/or a network accessible storage.

The system 1000 includes an encoder/decoder module 1030 configured to, for example, process data to provide encoded video or decoded video, and the encoder/decoder module 1030 may include its own processor and memory. The encoder/decoder module 1030 represents module(s) that may be included in a device to perform encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 1030 may be implemented as a separate element of system 1000 or may be incorporated within processor 1010 as a combination of hardware and software as known to those skilled in the art.

Program code to be loaded onto the processor 1010 or the encoder/decoder 1030 to perform the various aspects described in this document can be stored in the storage device 1040 and subsequently loaded onto the memory 1020 for execution by the processor 1010. In accordance with various embodiments, one or more of the processor 1010, the memory 1020, the storage device 1040, and the encoder/decoder module 1030 can store one or more of various items during the performance of the processes described in this document. Such stored items can include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.

In some embodiments, memory internal to the processor 1010 and/or the encoder/decoder module 1030 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding. In other embodiments, however, a memory external to the processing device (for example, the processing device can be either the processor 1010 or the encoder/decoder module 1030) is used for one or more of these functions. The external memory can be the memory 1020 and/or the storage device 1040, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of, for example, a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for MPEG-2 (MPEG refers to the Moving Picture Experts Group; MPEG-2 is also referred to as ISO/IEC 13818, with 13818-1 also known as H.222 and 13818-2 also known as H.262), HEVC (HEVC refers to High Efficiency Video Coding, also known as H.265 and MPEG-H Part 2), or VVC (Versatile Video Coding, a new standard being developed by the Joint Video Experts Team, JVET).

As shown in block 1130, input to the elements of system 1000 may be provided through various input devices. Such input devices include, but are not limited to: (i) an RF portion that receives a radio frequency (RF) signal transmitted, for example, over the air by a broadcaster, (ii) a Component (COMP) input terminal (or a set of COMP input terminals), (iii) a Universal Serial Bus (USB) input terminal, and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal. Other examples, not shown in fig. 14, include composite video.

In various embodiments, the input device of block 1130 has associated corresponding input processing elements known in the art. For example, the RF part may be associated with elements suitable for: (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a frequency band), (ii) down-converting the selected signal, (iii) band-limiting the frequency band again to a narrower frequency band to select, for example, a signal band that may be referred to as a channel in some embodiments, (iv) demodulating the down-converted, band-limited signal, (v) performing error correction, and (vi) demultiplexing to select a desired stream of data packets. The RF section of various embodiments includes one or more elements to perform these functions, such as frequency selectors, signal selectors, band limiters, channel selectors, filters, down-converters, demodulators, error correctors, and demultiplexers. The RF section may include a tuner that performs various of these functions including, for example, down-converting the received signal to a lower frequency (e.g., an intermediate or near baseband frequency) or baseband. In one set-top box embodiment, the RF section and its associated input processing elements receive RF signals transmitted over a wired (e.g., cable) medium and perform frequency selection to a desired frequency band by filtering, down-converting, and re-filtering. Various embodiments rearrange the order of the above (and other) elements, remove some of these elements, and/or add other elements that perform similar or different functions. Adding components may include inserting components between existing components, such as an amplifier and an analog-to-digital converter. In various embodiments, the RF section includes an antenna.

Additionally, USB and/or HDMI terminals may include respective interface processors for connecting the system 1000 to other electronic devices across USB and/or HDMI connections. It should be understood that various aspects of the input processing, for example, Reed-Solomon error correction, may be implemented, as desired, within a separate input processing IC or within the processor 1010. Similarly, aspects of USB or HDMI interface processing may be implemented within a separate interface IC or within the processor 1010, as desired. The demodulated, error corrected and demultiplexed stream is provided to various processing elements, including, for example, the processor 1010 and the encoder/decoder 1030 operating in combination with the memory and storage elements to process the data stream as needed for presentation on an output device.

The various elements of system 1000 may be disposed within an integrated housing. Within the integrated housing, the various components may be interconnected and communicate data therebetween using a suitable connection arrangement (e.g., internal buses known in the art, including inter-IC (I2C) buses, wiring, and printed circuit boards).

The system 1000 includes a communication interface 1050 that enables communication with other devices via a communication channel 1060. The communication interface 1050 may include, but is not limited to, a transceiver configured to transmit and receive data over a communication channel 1060. The communication interface 1050 can include, but is not limited to, a modem or a network card, and the communication channel 1060 can be implemented, for example, within a wired and/or wireless medium.

In various embodiments, data is streamed or otherwise provided to the system 1000 using a wireless network (e.g., a Wi-Fi network, such as IEEE 802.11(IEEE refers to the institute of electrical and electronics engineers)). The Wi-Fi signals of these embodiments are received over a communication channel 1060 and a communication interface 1050 suitable for Wi-Fi communication. The communication channel 1060 of these embodiments is typically connected to an access point or router that provides access to external networks including the internet to allow streaming applications and other on-cloud communications. Other embodiments provide streamed data to the system 1000 using a set-top box that passes the data over the HDMI connection of input block 1130. Still other embodiments provide streamed data to the system 1000 using the RF connection of input block 1130. As described above, various embodiments provide data in a non-streaming manner. In addition, various embodiments use wireless networks other than Wi-Fi, e.g., a cellular network or a Bluetooth network.

The system 1000 may provide an output signal to various output devices, including a display 1100, speakers 1110, and other peripheral devices 1120. The display 1100 of various embodiments includes one or more of, for example, a touch screen display, an organic light-emitting diode (OLED) display, a curved display, and/or a foldable display. The display 1100 may be for a television, a tablet, a laptop, a cell phone (mobile phone), or other device. The display 1100 may also be integrated with other components (for example, as in a smart phone), or separate (for example, an external monitor for a laptop). The other peripheral devices 1120 include, in various examples of embodiments, one or more of a stand-alone digital video disc (or digital versatile disc) (DVD, for both terms) player, a disc player, a stereo system, and/or a lighting system. Various embodiments use one or more peripheral devices 1120 that provide a function based on the output of the system 1000. For example, a disc player performs the function of playing the output of the system 1000.

In various embodiments, control signals are communicated between the system 1000 and the display 1100, speakers 1110, or other peripheral devices 1120 using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communications protocols that enable device-to-device control with or without user intervention. The output devices may be communicatively coupled to the system 1000 via dedicated connections through respective interfaces 1070, 1080, and 1090. Alternatively, the output devices may be connected to the system 1000 using the communications channel 1060 via the communications interface 1050. The display 1100 and speakers 1110 may be integrated in a single unit with the other components of the system 1000 in an electronic device such as, for example, a television. In various embodiments, the display interface 1070 includes a display driver, such as, for example, a timing controller (T-Con) chip.

For example, if the RF portion of input 1130 is part of a separate set-top box, the display 1100 and speaker 1110 may alternatively be separate from one or more of the other components. In various embodiments where the display 1100 and speaker 1110 are external components, the output signals may be provided via a dedicated output connection, including, for example, an HDMI port, USB port, or COMP output.

These embodiments may be implemented by the processor 1010 or by hardware-implemented computer software, or by a combination of hardware and software. The embodiments may be implemented by one or more integrated circuits, as non-limiting examples. The memory 1020 may be of any type suitable to the technical environment and may be implemented using any suitable data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory and removable memory, as non-limiting examples. The processor 1010 may be of any type suitable to the technical environment, and may include, by way of non-limiting example, one or more of the following: microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture.

Various implementations relate to decoding. As used herein, "decoding" may include, for example, all or part of the processes performed on a received encoded sequence in order to produce a final output suitable for display. In various embodiments, such processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding. In various embodiments, such processes also, or alternatively, include processes performed by a decoder of the various implementations described in this application, for example, decoding bi-prediction flags, decoding partitions for bi-prediction and indices in a list of predictors, determining weights from the locations of pixels in a block (in particular along the edges of the partitions of a PU), and performing motion compensation in inter frames using bi-prediction and the determined weights.

As a further example, "decoding" in one embodiment refers to entropy decoding only, in another embodiment refers to differential decoding only, and in another embodiment "decoding" refers to a combination of entropy decoding and differential decoding. The phrase "decoding process" is intended to refer specifically to a subset of operations or to a broader decoding process in general, as will be clear based on the context of the specific description and is believed to be well understood by those skilled in the art.

Various implementations relate to encoding. In a manner similar to the discussion above regarding "decoding," "encoding" as used in this application may include, for example, all or part of the processes performed on an input video sequence in order to produce an encoded bitstream. In various embodiments, such processes include one or more of the processes typically performed by an encoder, for example, partitioning, differential encoding, transformation, quantization, and entropy encoding. In various embodiments, such processes also, or alternatively, include processes performed by an encoder of the various implementations described in this application, for example, determining prediction values in a bi-prediction scheme using motion compensation and motion prediction, where the weights of the bi-prediction are based on the locations of pixels in a block (in particular along the edges of the partitions of a PU), performing motion compensation in inter frames using bi-prediction, encoding bi-prediction flags, encoding partitions for bi-prediction, and encoding indices in a list of predictors.

As a further example, "encoding" in one embodiment refers only to entropy encoding, in another embodiment "encoding" refers only to differential encoding, and in another embodiment "encoding" refers to a combination of differential encoding and entropy encoding. The phrase "encoding process" is intended to refer specifically to a subset of operations or to a broader encoding process in general, as will become clear based on the context of the specific description and is believed to be well understood by those skilled in the art.

Note that the syntax elements (e.g., triangle_dir, triangle_pos, triangle_flag, partition_dir, partition_pos) as used herein are descriptive terms. As such, they do not preclude the use of other syntax element names.

When a figure is presented as a flow chart, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow chart of a corresponding method/process.

Various embodiments relate to rate-distortion optimization, for example, when testing combinations of bi-predictive multi-partition PUs at an encoder. In particular, during the encoding process, a balance or trade-off between rate and distortion is typically considered, often given constraints of computational complexity. Rate-distortion optimization is usually formulated as minimizing a rate-distortion function, which is a weighted sum of the rate and of the distortion. There are different approaches to solve the rate-distortion optimization problem. For example, the approaches may be based on an extensive testing of all encoding options, including all considered modes or coding parameter values, with a complete evaluation of their coding cost and of the associated distortion of the reconstructed signal after encoding and decoding. Faster approaches may also be used to save encoding complexity, in particular by computing an approximate distortion based on the prediction or the prediction residual signal rather than the reconstructed one. A mix of these two approaches may also be used, for example by using an approximate distortion for only some of the possible encoding options, and a complete distortion for the other encoding options. Other approaches only evaluate a subset of the possible encoding options. More generally, many approaches employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete evaluation of both the coding cost and the associated distortion.
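The rate-distortion trade-off above can be sketched as a simple Lagrangian cost comparison. The mode names, distortion values, rates, and lambda values below are hypothetical numbers chosen for illustration only, not measured encoder results.

```python
def rd_cost(distortion, rate_bits, lagrange_lambda):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lagrange_lambda * rate_bits

def best_mode(candidates, lagrange_lambda):
    """Return the (mode, D, R) candidate minimizing the RD cost J."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lagrange_lambda))

# hypothetical candidates: (mode name, distortion, rate in bits)
modes = [("uni_L0", 1200, 40), ("uni_L1", 1100, 42), ("bi_triangle", 700, 95)]
```

A larger lambda penalizes rate more heavily: with lambda = 10 the costs are 1600, 1520, and 1650, so the cheaper uni-directional mode wins; with lambda = 2 they are 1280, 1184, and 890, and the better-predicting but more expensive bi-predicted partition wins.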

The implementations and aspects described herein may be implemented in, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (e.g., discussed only as a method), implementation of the features discussed may also be implemented in other forms (e.g., an apparatus or program). For example, the apparatus may be implemented in appropriate hardware, software and firmware. The method may be implemented, for example, in a processor, which refers generally to a processing device, including, for example, a computer, microprocessor, integrated circuit, or programmable logic device. Processors also include communication devices such as computers, cellular telephones, portable/personal digital assistants ("PDAs"), and other devices that facilitate the communication of information between end-users.

Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" as well as other variations means that a particular feature, structure, characteristic, and the like described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation," as well as any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.

In addition, the present application may relate to "determining" various information. Determining this information may include, for example, one or more of: estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

Further, the present application may relate to "accessing" various information. Accessing this information may include, for example, one or more of: receiving the information, retrieving the information (e.g., retrieving the information from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.

In addition, this application may refer to "receiving" various information. As with "accessing," receiving is intended to be a broad term. Receiving the information may include, for example, one or more of: accessing the information or retrieving the information (e.g., from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

It should be understood that the use of any of the following "/", "and/or", and "at least one of", for example in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of only the first listed option (A), or only the second listed option (B), or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of only the first listed option (A), or only the second listed option (B), or only the third listed option (C), or only the first and second listed options (A and B), or only the first and third listed options (A and C), or only the second and third listed options (B and C), or all three options (A and B and C). This may be extended for as many items as are listed, as will be clear to one of ordinary skill in this and related arts.

Furthermore, as used herein, the word "signal" refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a particular one of a plurality of parameters for improved bi-prediction, e.g., for signaling the partition of a block and the associated weights. In this way, in an embodiment, the same parameter is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word "signal," the word "signal" can also be used herein as a noun.
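The distinction between explicit and implicit signaling can be sketched with a toy bit writer and reader. The element names and the inference rule (partition_dir defaulting to 0 when the flag is absent) are assumptions made for illustration; they are not the actual bitstream syntax of any standard.

```python
class BitWriter:
    def __init__(self):
        self.bits = []

    def put(self, bit):
        self.bits.append(bit & 1)

class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def get(self):
        bit = self.bits[self.pos]
        self.pos += 1
        return bit

def write_partition_info(bw, use_triangle, partition_dir):
    # explicit signaling: a flag, then the split direction only if needed
    bw.put(1 if use_triangle else 0)
    if use_triangle:
        bw.put(partition_dir)

def read_partition_info(br):
    use_triangle = br.get() == 1
    # implicit signaling: when the flag is 0, partition_dir is not
    # transmitted; the decoder infers a default value at no bit cost
    partition_dir = br.get() if use_triangle else 0
    return use_triangle, partition_dir
```

Here the one-bit flag lets the decoder know whether the direction bit follows, so blocks that do not use the partitioned mode spend no bits on its parameters.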

As will be apparent to one of ordinary skill in the art, implementations may produce various signals formatted to carry information that may be stored or transmitted, for example. The information may include, for example, instructions for performing a method, or data generated by one of the described implementations. For example, the signal may be formatted to carry a bitstream of the described embodiments. Such signals may be formatted, for example, as electromagnetic waves (e.g., using the radio frequency portion of the spectrum) or as baseband signals. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information carried by the signal may be, for example, analog or digital information. The signals may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor readable medium.

We describe a number of embodiments. The features of these embodiments may be provided separately or in any combination. Furthermore, across the various claim categories and types, embodiments may include one or more of the following features, devices, or aspects, alone or in any combination:

Modifying the bi-prediction process applied in the decoder and/or encoder.

Applying bi-predictive methods with increased accuracy in the decoder and/or encoder.

Enabling several weights within the same PU for bi-predictive methods in the decoder and/or encoder.

Determining, in the decoder and/or encoder, the weights in the PU of the bi-prediction method according to the position of the pixels with respect to the edges of the partition of the PU.

Determining, in the decoder and/or encoder, weights in the PU of the bi-prediction method according to the position of the pixels with respect to the edges of the plurality of partitions of the PU.

Inserting, in the signaling, syntax elements that enable the decoder to identify the PU partition used by the bi-predictive method and, optionally, the weight of each pixel.

Selecting, based on these syntax elements, the partition and the weights to apply for the bi-predictive method at the decoder.

Using uni-predictive motion models and combining them into a weighted bi-prediction at the encoder and/or decoder according to any of the embodiments discussed.

A bitstream or signal comprising one or more of the described syntax elements or variants thereof.

Inserting, in the signaling, syntax elements that enable the decoder to perform motion compensation in a manner corresponding to the manner used by the encoder.

Creating and/or transmitting and/or receiving and/or decoding a bitstream or signal comprising one or more of the described syntax elements or variants thereof.

TV, set-top box, cellular phone, tablet or other electronic device, which performs inter-frame bi-prediction according to any of the embodiments described.

A TV, set-top box, cellular phone, tablet or other electronic device that performs inter-frame bi-prediction according to any of the described embodiments, and displays (e.g., using a monitor, screen or other type of display) the resulting image.

A TV, set-top box, cellular phone, tablet or other electronic device that tunes (e.g., using a tuner) a channel to receive a signal including an encoded image, and performs inter-frame bi-prediction according to any of the embodiments described.

A TV, set-top box, cellular phone, tablet or other electronic device that receives over the air (e.g., using an antenna) a signal including an encoded image, and performs inter-frame bi-prediction according to any of the embodiments described.
