Use-case driven context model selection for hybrid video coding tools

Document No. 246885 — published 2021-11-12

Note: This technique, "Use-case driven context model selection for hybrid video coding tools", was created on 2020-03-04 by Jonathan Pfaff, Philipp Helle, Michael Schäfer, Tobias Hinz, Björn Stallenberger, Phili… . Abstract: An apparatus comprising an encoder is described. An encoder receives image or video data, encodes the received image or video data, and provides a bitstream representing the image or video data. The encoder comprises a CABAC encoder. The CABAC encoder receives a binary-valued syntax element associated with a block of image or video data to be encoded and encodes the binary-valued syntax element into coded bits of a bitstream using a selected context model. The binary-valued syntax element includes a tool flag that indicates whether a particular coding tool, e.g., linear weighted intra prediction, LWIP, is employed in encoding a block of image or video data. For blocks of image or video data having an aspect ratio greater than 2 and for which a particular coding tool applies, a first context model for coding a tool flag is selected from a set of one or more first context models, and for blocks of image or video data having an aspect ratio less than or equal to 2 and for which a particular coding tool applies, a second context model for coding a tool flag is selected from a set of one or more second context models.

1. An apparatus, comprising:

an encoder that receives image or video data, encodes the received image or video data, and provides a bitstream representing the image or video data,

the encoder comprises a CABAC encoder that receives a binary-valued syntax element associated with a block of image or video data to be encoded and encodes the binary-valued syntax element into coded bits of the bitstream using a selected context model,

wherein the binary-valued syntax element comprises a tool flag indicating whether a particular encoding tool, e.g., Linear Weighted Intra Prediction (LWIP), is employed in encoding the block of image or video data,

wherein for a block of image or video data having an aspect ratio greater than 2 and to which the particular encoding tool applies, a first context model for encoding the tool flag is selected from a set of one or more first context models,

wherein for a block of image or video data having an aspect ratio less than or equal to 2 and to which the particular encoding tool applies, a second context model for encoding the tool flag is selected from a set of one or more second context models.

2. The apparatus of claim 1, wherein the particular encoding tool is Linear Weighted Intra Prediction (LWIP) and the first context model comprises an additional CABAC context for sending the tool flag for blocks of image or video data having an aspect ratio greater than 2, wherein

if the LWIP mode is not tested for blocks with an aspect ratio greater than 2, the flag is always 0 and the additional CABAC context adapts its probability of a 1 toward zero, and

if the LWIP mode is tested for blocks with an aspect ratio greater than 2, the flag is 1 with a certain probability.

3. The apparatus of claim 1 or 2, wherein the CABAC encoder selects the first context model or the second context model for the currently processed block in response to a selection index indicating whether the currently processed block has an aspect ratio greater than 2 or an aspect ratio less than or equal to 2.

4. An apparatus, comprising:

a decoder that receives a bitstream comprising encoded image or video data, decodes the encoded image or video data from the received bitstream, and provides decoded image or video data,

the decoder comprises a CABAC decoder that decodes, from the bitstream, a binary-valued syntax element associated with a block of the encoded image or video data using a selected context model,

wherein the binary-valued syntax element comprises a tool flag indicating whether a particular encoding tool, e.g., Linear Weighted Intra Prediction (LWIP), is employed in encoding the block of image or video data,

wherein for a block of image or video data having an aspect ratio greater than 2 and to which the particular encoding tool applies, a first context model for decoding the tool flag is selected from a set of one or more first context models,

wherein for a block of image or video data having an aspect ratio less than or equal to 2 and to which the particular encoding tool applies, a second context model for decoding the tool flag is selected from a set of one or more second context models.

5. A method for encoding image or video data, the method comprising:

receiving the image or video data,

encoding the received image or video data, and

providing a bitstream representing said image or video data,

wherein encoding the received image or video data comprises:

receiving, using a CABAC encoder, a binary-valued syntax element associated with a block of image or video data to be encoded, and

encoding the binary-valued syntax element into encoded bits of the bitstream using a selected context model,

wherein the binary-valued syntax element comprises a tool flag indicating whether a particular encoding tool, e.g., Linear Weighted Intra Prediction (LWIP), is employed in encoding the block of image or video data,

wherein for a block of image or video data having an aspect ratio greater than 2 and to which the particular encoding tool applies, a first context model for encoding the tool flag is selected from a set of one or more first context models,

wherein for a block of image or video data having an aspect ratio less than or equal to 2 and to which the particular encoding tool applies, a second context model for encoding the tool flag is selected from a set of one or more second context models.

6. A method for decoding image or video data, the method comprising:

receiving a bitstream comprising encoded image or video data,

decoding said encoded image or video data from the received bitstream, and

providing decoded image or video data,

wherein decoding the received image or video data comprises: decoding, from the bitstream, a binary-valued syntax element associated with a block of the encoded image or video data using a CABAC decoder and a selected context model,

wherein the binary-valued syntax element comprises a tool flag indicating whether a particular encoding tool, e.g., Linear Weighted Intra Prediction (LWIP), is employed in encoding the block of image or video data,

wherein for a block of image or video data having an aspect ratio greater than 2 and to which the particular encoding tool applies, a first context model for decoding the tool flag is selected from a set of one or more first context models,

wherein for a block of image or video data having an aspect ratio less than or equal to 2 and to which the particular encoding tool applies, a second context model for decoding the tool flag is selected from a set of one or more second context models.

7. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method according to claim 5 or 6.

8. An apparatus, comprising:

an encoder that receives image or video data, encodes the received image or video data, and provides a bitstream representing the image or video data,

the encoder comprises a CABAC encoder that receives a binary-valued syntax element associated with a particular data block of image or video data to be encoded, encodes the binary-valued syntax element into coded bits of the bitstream using a selected context model,

wherein the binary-valued syntax element includes a tool flag indicating whether a particular encoding tool is employed in encoding the image or video data,

wherein a set of first context models for encoding the tool flag is selected for one or more first portions of the particular data block that are application-independent and for which the encoding tool is always applicable,

wherein a set of second context models for encoding the tool flag is selected for one or more second portions of the particular data block for which, depending on the application, the encoding tool is applicable or not applicable.

9. The apparatus of claim 8, wherein the CABAC encoder selects the first context model or the second context model for the currently processed portion of the particular data block in response to a selection index having a first value indicating that the currently processed portion of the particular data block is the first portion and having a second value indicating that the currently processed portion of the particular data block is the second portion.

10. An apparatus, comprising:

a decoder that receives a bitstream comprising encoded image or video data, decodes the encoded image or video data from the received bitstream, and provides decoded image or video data,

the decoder comprises a CABAC decoder that decodes, from the bitstream, a binary-valued syntax element associated with a particular data block of the encoded image or video data using a selected context model,

wherein the binary-valued syntax element includes a tool flag indicating whether a particular encoding tool is employed in encoding the image or video data,

wherein a set of first context models for decoding the tool flag is selected for a portion of the particular data block that is application-independent and for which the encoding tool is always applicable,

wherein a set of second context models for decoding the tool flag is selected for a portion of the particular data block for which, depending on the application, the encoding tool is applicable or not applicable.

11. The apparatus of claim 10, wherein the CABAC decoder is to select the first context model or the second context model for the currently processed portion of the particular data block in response to a selection index having a first value indicating that the currently processed portion of the particular data block is the first portion and having a second value indicating that the currently processed portion of the particular data block is the second portion.

12. The apparatus according to any of claims 8 to 11, wherein the set of first context models comprises one or more first context models, and wherein the set of second context models comprises one or more second context models.

Technical Field

The present invention relates to the field of encoding/decoding pictures, images or video, and more particularly to signaling one or more coding tools, such as affine linear weighted intra prediction (LWIP) or matrix-based intra prediction (MIP) of the Versatile Video Coding (VVC) standard, using a context or context model of a Context Adaptive Binary Arithmetic Coding (CABAC) engine. Embodiments relate to encoding a flag (e.g., intra_mip_flag) indicating the applicability of LWIP or MIP of the VVC standard based on a context model selected according to the aspect ratio of the block of image or video data to be processed.

Background

In the most advanced video coding standards, such as ITU-T H.265 | MPEG-H HEVC [1], pictures are divided into Coding Tree Units (CTUs) of a fixed square size, which can be further subdivided into smaller blocks. The reconstructed signal of such a block is typically a superposition of a prediction signal and a residual signal. The prediction signal is obtained by extrapolating samples from the adjacent neighborhood into the current block (intra prediction) or by copying filtered or unfiltered sample representations from one or two reference pictures (inter prediction). A reference picture is a picture that has already been reconstructed from the bitstream and stored in a picture buffer for reference. The residual signal is obtained by inverse transforming the dequantized transform coefficients read from the bitstream. After the block reconstruction process, loop filters are applied to enhance the signal of the reconstructed blocks and obtain the reconstructed picture.

The entropy decoding process of reading symbols such as transform coefficients, deltaQP, intra prediction modes, motion vector differences, etc. from the bitstream is performed by a parser that converts the bits read from the bitstream into binary decisions (bins) using a Context Adaptive Binary Arithmetic Coding (CABAC) engine. The parser then assembles these bins into symbols or syntax elements. The adaptivity of the entropy coding process is achieved by using CABAC Contexts (CCs). Each context represents an adaptive probability model that models the entropy of a particular symbol or set of symbols. The term adaptive indicates a continuous update of the model toward the current coding state; thus, the model adapts to the local statistics of the respective symbols. The update step is typically embedded in the arithmetic coding operation. First, the current state of the CC is used to parameterize the arithmetic coding process. Then, once the decoded symbol is derived, it is used to update the CC toward the current decoded probability in a given step size.
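The adaptive update step described above can be sketched as follows. This is a minimal illustration, not the normative CABAC engine: the class name, the update rate ALPHA, and the idealized fractional-bit cost are assumptions chosen for clarity.

```python
import math

ALPHA = 1 / 16  # hypothetical adaptation rate (step size of the update)

class AdaptiveContext:
    """One CABAC context (CC): an adaptive probability model for a bin."""

    def __init__(self, p_one=0.5):
        self.p_one = p_one  # current estimate of P(bin == 1)

    def cost_bits(self, bin_value):
        # Ideal arithmetic-coding cost of this bin under the current model.
        p = self.p_one if bin_value == 1 else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bin_value):
        # Move the estimate one step toward the observed bin value.
        target = 1.0 if bin_value == 1 else 0.0
        self.p_one += ALPHA * (target - self.p_one)

ctx = AdaptiveContext()
for b in [1, 1, 1, 0, 1, 1]:
    ctx.update(b)
# The model has adapted toward the mostly-1 statistics of the observed bins.
```

A real CABAC engine operates on quantized probability states and, in VVC, with two update rates per context, but the principle of stepping the model toward each decoded bin is the same.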

In the JEM software [2] and the upcoming VVC standard [3], various improvements to the arithmetic coding process have been evaluated and adopted. The arithmetic coding engine has been changed, and the initialization and update process of the CCs has been improved, as have the dynamic range of the probability representation and the behavior of the CC update process. Each CC has a separate two-level update step that controls the strength of the adaptation of the CC toward the current probability. This refinement helps to tailor the CC update procedure to the expected CC usage statistics.

Due to the large number of binary decisions required to transmit the syntax elements, and due to the number of syntax elements themselves, binary decisions must be grouped so that several share the same CC, keeping the total number of CCs at an amount that can be processed by the decoder. In addition, grouping helps the update process exploit local statistics and improves the stability of the underlying probability models.

Binary decisions belonging to the same syntax element and having statistically similar probabilities are usually grouped into one CC. Where the binary decisions may have different probabilities that can be predicted from decoded symbols in the adjacent neighborhood, the CC is instead selected based on those decoded symbols. Such a procedure is typically applied to symbols transmitted relatively frequently in the bitstream.
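The neighbor-driven selection just described follows a common pattern that can be sketched as below. Treating the context increment as the count of neighboring blocks whose decoded flag is 1 is an assumption for illustration, not a quote from any standard text.

```python
def neighbor_context_increment(left_flag, above_flag):
    """Select among three CCs for a flag based on the decoded flags of the
    left and above neighbors (pass 0 if a neighbor is unavailable)."""
    return int(bool(left_flag)) + int(bool(above_flag))  # yields 0, 1 or 2

# The more neighbors already use the flagged mode, the likelier the current
# block uses it too, so each count gets its own adaptive probability model.
assert neighbor_context_increment(0, 0) == 0
assert neighbor_context_increment(1, 0) == 1
assert neighbor_context_increment(1, 1) == 2
```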

In addition to context-controlled arithmetic coding, there is a bypass mode with a fixed probability of 0.5. This bypass mode, which is incorporated in the arithmetic encoder, is a low complexity mode for high throughput. For example, the bypass mode is widely used for transform coding.

Disclosure of Invention

The evolution of video coding methods has brought more and more block shapes and coding tools, resulting in a large amount of algorithmic complexity required by the encoder to find a good coding representation. Therefore, it may be beneficial to skip the evaluation of (i.e., turn off) coding tools in particular situations at the encoder to achieve a better tradeoff between complexity and compression efficiency. The use of a coding tool on a block is typically communicated to the decoder by transmitting a context-modeled tool enable flag in the bitstream.

Problem(s)

Ideally, the decoder imposes minimal constraints on whether a tool enable flag, i.e., a flag that determines whether a tool (e.g., a coding mode or prediction mode) applies to a particular block, is sent in the bitstream. The reason is that disabling a tool for a particular situation may deteriorate compression performance in some scenarios, even though these scenarios are unlikely to occur. In fact, one of the main reasons for the efficiency of hybrid video codecs is that a very large number of competing coding tools is always available and only one of these tools is selected in a given situation.

For example, a constraint that a tool may only be used for small block sizes (and thus the tool enable flag only sent for these) would potentially reduce the coding efficiency for future applications with very high resolutions, which typically contain only a small fraction of small blocks.

On the other hand, for application scenarios in which a fast encoder search strategy does not test the tool in some cases, it is inefficient to send the tool enable flag for all possible cases: the tool is unlikely to be selected for those cases, either because testing it is too expensive in terms of run-time or because its impact on overall coding efficiency is rather small there. Not testing the tool for particular cases yields a faster encoder, but at the cost of reduced coding efficiency: for those cases, the tool enable flag is still sent in the bitstream, although in the given scenario the tool is never used. Thus, in this scenario, coding efficiency would be higher if the encoder search constraint were mirrored by a corresponding constraint on sending the tool enable flag in the bitstream.

Starting from the prior art as described above, there may be a need for improving or enhancing the encoding of one or more encoding tools used for encoding/decoding pictures, images or video.

Drawings

Embodiments of the invention are described in further detail with reference to the accompanying drawings, in which:

Fig. 1 illustrates an apparatus for encoding image or video data according to an embodiment of the present invention;

Fig. 2 illustrates an apparatus for decoding encoded image or video data according to an embodiment of the present invention;

Fig. 3 illustrates an apparatus for encoding image or video data that introduces a separate, additional CABAC context for sending the flag for blocks with aspect ratios greater than 2, in accordance with an embodiment of the invention;

Fig. 4 shows an apparatus for decoding image or video data, incorporating a separate, additional CABAC context for the flag for blocks having aspect ratios greater than 2 and encoded using the apparatus of Fig. 3, in accordance with an embodiment of the invention; and

Fig. 5 shows an example of a computer system on which the described units or modules and the steps of the inventive methods can be executed.

Detailed Description

Embodiments of the invention will now be described in more detail with reference to the drawings, in which the same or similar elements have the same reference numerals.

As described above, in the design of how binary decisions are grouped into context models, previous standards only considered reducing the number of context models weighed against the overall entropy. In contrast to this approach, a new aspect is proposed according to the present invention for how to group binary decisions into context models, taking into account the increasing complexity of the algorithms themselves. The present invention adjusts the context modeling, for example by inserting contexts, to align the coding contexts with the situations in which a particular coding tool is turned off. This enables the encoder to select between operating points with different algorithmic complexity while avoiding a deterioration of compression efficiency.

The inventive method is illustrated using the following example. Assume that the binary decision to be probability-modeled switches between a first tool represented by algorithm 1 and a second tool represented by algorithm 2. Here, the second tool is considered a baseline tool, while the first tool is considered a more specialized tool. Under this assumption, the second tool is overall more likely to be selected than the first tool. For example, assume that the total probability of the first tool performing better than the second tool is 0.3.

Now assume two application scenarios. In the first application A, there are N cases in which both tools are tested and the one with the better performance is selected. In the second application B, for some reason, both tools are tested only for a certain portion of all N cases, while for the remaining cases only the baseline tool, i.e., the second tool, is selected. For both applications, the decisions for all N cases must be context-modeled and transmitted in the bitstream. For example, assume that the number of cases in which both tools are tested in application B is equal to N/2. For the other N/2 cases, the tool is tested in application A, but not in application B.

If the tool flag is context-coded using a single CC, then in the first application A the probability of algorithm 1 has a stable value of 0.3, while in the second application B the average probability drops to 0.15, introducing a fixed probability mismatch of 0.15 for all N cases: in the cases where the tool is tested in application B, the true probability is 0.3 instead of 0.15, whereas in the cases where the tool is not tested in application B, the true probability is 0. In other words, in application B, using a single CC, the probability of the actual decision is modeled in a non-optimal way, which results in a higher bit rate when transmitted in the bitstream.
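The bit-rate penalty of the single-CC modeling can be quantified with the cross-entropy of the true bin probability against the modeled one. The probabilities 0.3 and 0.15 come from the example above; the small floor probability of 0.01 for the never-tested cases is an assumption standing in for an adapted, near-zero context state.

```python
import math

def avg_bits(p_true, p_model):
    """Expected bits per binary decision when the true probability of a 1
    is p_true but the arithmetic coder models it as p_model (cross-entropy)."""
    bits = 0.0
    if p_true > 0:
        bits += -p_true * math.log2(p_model)
    if p_true < 1:
        bits += -(1 - p_true) * math.log2(1 - p_model)
    return bits

# Application B, single CC: the model settles at 0.15 for all N cases,
# although half the cases have true probability 0.3 and half have 0.0.
single_cc = 0.5 * avg_bits(0.3, 0.15) + 0.5 * avg_bits(0.0, 0.15)

# Two CCs selected by the index: each model matches its true probability.
# (For p_true = 0.0 an adapted context approaches near-zero cost; a small
# floor of 0.01 is assumed here.)
two_ccs = 0.5 * avg_bits(0.3, 0.3) + 0.5 * avg_bits(0.0, 0.01)

assert two_ccs < single_cc  # the split contexts spend fewer bits per flag
```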

In the inventive method, the disadvantage of such non-optimal modeling of decision probabilities is overcome in the following manner. Instead of using one probability model for all decisions, two (or more) probability models or CABAC contexts are assigned to the same decision. A selection index is used to select which probability model to use for each of the N cases.

When considering the above example again, the selection index is chosen such that the portion of cases in which both algorithm 1 and algorithm 2 are tested in the second application B is distinguished from the remaining cases in which only algorithm 2 is tested. In other words, the two different sets of cases are separated and represented by different values of the selection index.

When this selection index is used with two probability models in application A, the statistics of the tool flag are still modeled well, although the selection index switches between the two probability models. In both models, the probability of the tool represented by algorithm 1 is 0.3. This results in modeling equivalent to the original case using only one probability model.

However, using the above selection index with two probability models in the second application B leads to an optimal modeling of this situation as well. The probability of algorithm 1 is 0.3 for the N/2 cases in which both algorithms are tested, while the probability of algorithm 1 is 0.0 for the remaining cases in which only the baseline algorithm 2 is tested. These two probabilities are captured separately in the two probability models, resulting in modeling without any penalty, which yields a low bit rate when transmitted in the bitstream.

Thus, embodiments of the inventive method are based on introducing an additional CABAC context for transmitting the tool enable flag. The additional context is only applied when a condition is satisfied; otherwise, the CABAC context is selected as usual. According to an embodiment, the condition may be that the size of the current block belongs to a predefined subset of block sizes that may be skipped by a fast encoder search strategy but may be beneficial for applications requiring high coding efficiency. According to another embodiment, the condition may be that the aspect ratio of the current block is higher than a certain value (e.g., 2); such blocks may again be skipped by a fast encoder search strategy but may be beneficial for applications that require high coding efficiency, and this can be controlled at the block level instead of the picture or slice level.
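The aspect-ratio condition of the last embodiment can be sketched as follows. The threshold 2 comes from the text; defining the aspect ratio as max(width, height) / min(width, height), the function name, and the index values are assumptions for illustration.

```python
def tool_flag_context(width, height, num_regular_contexts=1):
    """Return the context index used to code the tool enable flag.

    Blocks with aspect ratio > 2 get a separate, additional context, so an
    encoder that never tests the tool for such blocks pays almost no
    signaling cost for the flag once the context has adapted.
    """
    aspect_ratio = max(width, height) / min(width, height)
    if aspect_ratio > 2:
        return num_regular_contexts  # the additional context
    return 0  # the regular context, selected "as usual"

assert tool_flag_context(32, 4) == 1   # aspect ratio 8 > 2
assert tool_flag_context(16, 16) == 0  # aspect ratio 1
```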

On the one hand, due to the probability adaptation of the CABAC context, if the tool is never tested for specific cases defined for this condition in the application scenario, the signaling overhead of sending the tool flags for these cases will be very small. Therefore, for these cases, the coding efficiency is almost as good as not sending the tool flag.

On the other hand, again due to the probability adaptation of CABAC contexts, if in a different application scenario the tool is also tested by the encoder for the cases determined by the condition, the coding efficiency of sending the tool flags will not be significantly reduced, since a separate CABAC context is used for those cases.

Thus, in contrast to prior art methods, the assignment of the different CABAC contexts proposed by the present invention is not guided by attempting to model the overall conditional probability distribution of the tool flags. In contrast, as described in the above example, the assignment of different CABAC contexts corresponds to different application scenarios of the tool. Here, each application scenario is defined as a specific condition under which execution of a given tool is possible in principle, but never tested by the encoder in a given scenario.

There may be different reasons for excluding a tool or algorithm under a specific condition; some examples of condition-determined cases are given below, although the present invention is not limited to these examples. First, the excluded algorithm may be too complex for these cases. Second, for these cases, the algorithm may not be efficiently implementable, or not implementable at all, e.g., due to hardware or resource limitations. Third, in some scenarios, the algorithm may only slightly improve compression performance in these cases. Fourth, the use of the underlying algorithm in these cases may essentially always provide very limited compression gains, and thus be feasible only when maximum compression performance is targeted. Fifth, these cases may not belong to the core application domain for which the algorithm or tool was originally designed.

More than one context split for tools

Embodiments of the present invention also incorporate splitting a context into multiple contexts, where each additional context corresponds to a different use-case scenario of the underlying tool. One such embodiment may be described as follows. Originally, the regular tool enable flag is modeled by a single context. The inventive method instead uses three contexts, where the selection index is controlled, for example, by a quantized version of the block area. Here, the selection index may be assigned as follows:
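The assignment table itself is not reproduced in this text, so the mapping below is only a hypothetical illustration of a three-way selection index driven by a quantized block area; the area thresholds are assumed.

```python
def area_selection_index(width, height):
    """Hypothetical selection index from a quantized block area (3 contexts)."""
    area = width * height
    if area <= 64:    # assumed threshold for "small" blocks
        return 0
    if area <= 256:   # assumed threshold for "medium" blocks
        return 1
    return 2          # "large" blocks

assert area_selection_index(8, 8) == 0
assert area_selection_index(16, 16) == 1
assert area_selection_index(32, 32) == 2
```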

According to other embodiments, the inventive method uses four contexts or context models for the binarization of the syntax element intra_mip_flag representing affine linear weighted intra prediction (LWIP) or matrix-based intra prediction (MIP) of the Versatile Video Coding (VVC) standard. The selection index is controlled by the aspect ratio (width/height or height/width) of the current block:
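Again, the assignment is not reproduced in this text; the sketch below is one plausible four-way mapping from the aspect ratio of the current block to a selection index for intra_mip_flag, with assumed ratio breakpoints.

```python
def mip_flag_selection_index(width, height):
    """Hypothetical 4-context selection index from the block aspect ratio."""
    ratio = max(width, height) // min(width, height)
    if ratio <= 1:
        return 0   # square blocks
    if ratio == 2:
        return 1
    if ratio == 4:
        return 2
    return 3       # ratio >= 8

assert mip_flag_selection_index(8, 8) == 0
assert mip_flag_selection_index(16, 8) == 1
assert mip_flag_selection_index(32, 4) == 3
```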

Fig. 1 shows an apparatus 100 for encoding image or video data according to an embodiment of the invention. The apparatus 100 includes an encoder 102. The encoder 102 receives image or video data 104 and encodes the received image or video data 104 to provide a bitstream 106 representing the encoded image or video data. The encoder 102 includes a CABAC encoder 108. The CABAC encoder 108 receives a binary-valued syntax element 110 associated with a particular data block of the image or video data to be encoded and encodes the binary-valued syntax element into coded bits 112 of the bitstream using a selected context model. The binary-valued syntax element includes a tool flag that indicates whether a particular encoding tool is employed in encoding the image or video data. A first set of context models for encoding the tool flag is selected for one or more first portions of the particular data block that are application-independent and for which the encoding tool is always applicable. A second set of context models for encoding the tool flag is selected for one or more second portions of the particular data block for which, depending on the application, the encoding tool is applicable or not applicable. According to an embodiment, the CABAC encoder 108 selects either the first context model or the second context model for the currently processed portion of the particular data block in response to a selection index, also described below. The selection index has a first value indicating that the currently processed portion of the particular data block is the first portion and a second value indicating that the currently processed portion of the particular data block is the second portion.

Fig. 2 shows an apparatus 200 for decoding encoded image or video data according to an embodiment of the invention. The apparatus 200 includes a decoder 202. The decoder 202 receives a bitstream 106, like the bitstream provided by the encoder 102 of Fig. 1. The bitstream 106 includes encoded image or video data, and the decoder 202 decodes the encoded image or video data from the received bitstream and provides decoded image or video data 204. The decoder includes a CABAC decoder 206 that decodes, from the bitstream 106, the binary-valued syntax element 110 associated with a particular data block of the encoded image or video data using a selected context model. The binary-valued syntax element includes a tool flag that indicates whether a particular encoding tool is employed in encoding the image or video data. A first set of context models for decoding the tool flag is selected for portions of the particular data block that are application-independent and for which the encoding tool is always applicable, and a second set of context models for decoding the tool flag is selected for portions of the particular data block for which, depending on the application, the encoding tool is applicable or not applicable. According to an embodiment, the CABAC decoder 206 selects either the first context model or the second context model for the currently processed portion of the particular data block in response to a selection index, also described below. The selection index has a first value indicating that the currently processed portion of the particular data block is the first portion and a second value indicating that the currently processed portion of the particular data block is the second portion.

According to embodiments, the first set of context models comprises a first context model or a plurality of first context models, and the second set of context models comprises a second context model or a plurality of second context models.

Combining with original context index

As previously described, in many cases, the tool-enabled flag is modeled by more than one context model when the binary decisions may have different probabilities, which can be predicted from the decoded symbols in the neighboring neighborhood. The inventive method can also be applied as a combination of such entropy driven context selection and the inventive selection index. The motivation for this combination is obvious, as the original entropy driven context index may also be applicable after separation.

An embodiment in which the context model for the tool flag is selected both by entropy-driven context selection and by the inventive selection index is as follows.

CombinedIndex = EntropyIndex + 3 * SelectionIndex

Thus, in this embodiment, pure entropy-driven context selection would produce three possible context models for a given tool flag, while its combination with the inventive context model selection produces six possible context models, indexed by CombinedIndex.
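A sketch of this combined index computation, assuming three entropy-driven models as in the embodiment above:

```python
def combined_index(entropy_index: int, selection_index: int) -> int:
    """CombinedIndex = EntropyIndex + 3 * SelectionIndex, yielding six models."""
    assert 0 <= entropy_index < 3 and selection_index in (0, 1)
    return entropy_index + 3 * selection_index

# All six possible context indices produced by the combination:
all_indices = sorted(combined_index(e, s) for s in (0, 1) for e in range(3))
```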

Replacement of bypass mode encoding enable flag

The inventive method is also applicable to flags that were originally encoded using bypass mode. In this case, one or two additional context models are needed for the use-case driven implementation. If only one additional context model is used, the selection index distinguishes between bypass mode, used for coding the cases unaffected by the use case, and the single context model used for all other cases (where the tool can now be switched on and off).

In case two context models are used, the bypass mode is completely replaced by context-modeled arithmetic coding, and the selection index distinguishes between the two context models. It should be mentioned that, with the improved update techniques in the upcoming VVC standard, the context modeling the non-switchable cases can be updated with a lower update strength to implement a quasi-static model.

It should also be mentioned that the additional model for the switchable cases is preferably used with a very strong update strength, to enable fast adaptation of the context model to the tool-on or tool-off probability.
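The effect of the update strength can be illustrated with a simplified exponential probability update. This is not the exact VVC update rule, only a sketch: larger shift values give a weaker, quasi-static update, smaller shift values a stronger, fast-adapting one.

```python
def update_probability(p: float, bin_val: int, shift: int) -> float:
    """One-bin probability update: move p toward the observed bin value by a
    step of relative size 1 / 2**shift (smaller shift = stronger update)."""
    return p + (bin_val - p) / (1 << shift)

# A quasi-static model (large shift) barely moves after eight 1-bins,
# while a fast-adapting model (small shift) approaches probability 1.
p_static = p_fast = 0.5
for _ in range(8):
    p_static = update_probability(p_static, 1, 7)
    p_fast = update_probability(p_fast, 1, 2)
```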

Alternative ways of signaling partial tool enablement in parameter sets

The aforementioned partial tool enabling behavior may also be signaled in a parameter set that is transmitted in the bitstream for each predetermined portion of one or several frames (e.g., for each slice). In this case, no context separation is required, since the flag transmitted in the parameter set contains all the necessary information. However, this signaling has a disadvantage compared to use-case driven context model selection: in the former case, the tool can only be enabled or disabled completely, for the cases corresponding to the application scenario, throughout the entire portion of the video sequence to which the parameter set applies, while in the latter case the tool can also be disabled in any variable portion of the video sequence that need not be predetermined or known by the decoder. The reason is that in the latter case, where a dedicated context model for the tool flag is assigned to all cases in which disabling the tool is sometimes desirable for a particular application scenario, an encoder operating in that application scenario can simply refrain from testing the tool for the respective cases, from any position in the sequential encoding up to any flexible later point, with little signaling overhead compared to completely disabling the tool for all cases corresponding to the application scenario in this particular part of the video sequence.

Applications using arbitrary coding tools

The context splitting of the present invention can be applied to any coding tool controlled by an enable flag. Current candidate tools that may be present in the future VVC standard and could be used with the inventive method are listed below; however, the application of the inventive method is not limited to these tools. Candidate tools include DMVR, OBMC, BIO, FRUC, LIC, ISP, ALF, SAO, MTS inter or MTS intra, 65 angular intra modes, MRL, and partitioning tools such as QTBT, MTT, or QTBT+TT.

The inventive method can also be applied to tools having different configurations represented by index values. Here, the inventive CC allocation is determined by the fact that, in some application scenarios, only a subset of all configurations of the tool is feasible, either for particular cases or in general. The inventive CC allocation takes these different application scenarios into account by allocating additional CCs to configurations for which the tool is not feasible in the scenario. An embodiment of this aspect of the invention is a tool that applies one of n transforms as the inverse transform of the prediction residual, where the index of the transform is transmitted in the bitstream.

Using the inventive method, many tools can be provided with a context split of their enable flag for certain cases. Here, the context split according to the present invention covers the specific cases in which a particular tool may not be feasible in a particular application scenario. These specific cases depend on key attributes of the tool. A non-limiting list of attributes that may be evaluated for, or combined with, certain scenarios of the inventive use cases is: block size, block shape, block aspect ratio, temporal level, QP, picture type, picture resolution, dynamic range of the picture, reference picture, leading picture of a GOP.

A specific case may also be a combination of the attributes listed above.
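As a hypothetical sketch, a selection index could be derived from a combination of such attributes, here block aspect ratio and QP. The thresholds are illustrative assumptions, not taken from any standard.

```python
def selection_index(width: int, height: int, qp: int,
                    max_aspect: float = 2.0, max_qp: int = 40) -> int:
    """Return 1 for the cases where the tool is assumed infeasible in the
    application scenario (triggering the context split), 0 otherwise."""
    aspect = max(width, height) / min(width, height)
    return 1 if (aspect > max_aspect or qp > max_qp) else 0
```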

Application embodiment of context allocation method of the invention

Affine Linear Weighted Intra Prediction (LWIP) [4] is a new intra prediction technique. Like conventional intra prediction, LWIP consists of a set of prediction modes. Given the reconstructed (reference) samples on the left and top, each of the signaled prediction modes corresponds to a prediction function that generates a different prediction signal.

In the case where both the legacy modes and the LWIP modes are available, the encoder compares the rate-distortion costs of the legacy modes and the LWIP modes and selects the mode with the lowest total cost. The selected prediction mode is then sent to the decoder in the bitstream, where it is used to select the corresponding prediction function to predict the block. The prediction mode is signaled with the following syntax: first, a flag is sent indicating whether the block is predicted in a legacy mode or an LWIP mode. If legacy prediction is selected, the legacy prediction mode is read from the bitstream according to the intra prediction signaling. Otherwise, if LWIP prediction is selected, the mode index within the set of available LWIP modes is sent after the flag. Since both the legacy modes and the LWIP modes are available for all block sizes that the codec typically supports for intra prediction, the flag must be sent for each block.
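The signaling order described above can be sketched as follows; the reader callbacks are hypothetical, standing in for the actual bitstream parser.

```python
def parse_intra_mode(read_flag, read_legacy_mode, read_lwip_index):
    """First parse the flag; then either the legacy mode or the LWIP mode
    index, depending on the flag's value."""
    if read_flag():                          # flag == 1: LWIP prediction
        return ("lwip", read_lwip_index())
    return ("legacy", read_legacy_mode())    # flag == 0: legacy prediction
```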

The VVC coding standard currently supports intra prediction for luminance block sizes in the range W × H ∈ {4, 8, 16, 32, 64} × {4, 8, 16, 32, 64}. Clearly, an encoder search using rate-distortion optimization (RDO) can become very complex, since the rate-distortion cost of all different block sizes (resulting from splitting a large block into smaller blocks) is evaluated for all different prediction modes. To reduce the complexity of the encoder, optimizations typically reduce the number of tested combinations, excluding cases that are statistically unlikely to yield the lowest rate-distortion cost.

The core prediction of LWIP supports only W_LWIP = H_LWIP ∈ {4, 8, 16}. To enable LWIP prediction for all other block sizes, the reference samples are downsampled to match the core prediction size and the output is upsampled to match the block size. This has the following effect: for block shapes with highly unequal downsampling and upsampling ratios in the horizontal and vertical directions, the prediction quality is reduced. This means that for blocks with an aspect ratio greater than 2, i.e., W/H > 2 or H/W > 2, an LWIP mode is less likely to result in a lower rate-distortion cost than a conventional prediction mode.
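The unequal resampling ratios can be illustrated as follows. The mapping of a block to a core size is a simplifying assumption here, not the exact LWIP/MIP rule.

```python
def resampling_ratios(width: int, height: int):
    """Map a block to a square core size in {4, 8, 16} (assumed here: the
    largest core not exceeding the shorter side) and return the ratio of the
    block dimensions to the core size in each direction."""
    core = min(16, width, height)
    return width / core, height / core

# A 32x4 block (aspect ratio 8) gets the highly unequal ratios (8.0, 1.0),
# while a 16x16 block gets the equal ratios (1.0, 1.0).
```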

This effect can be used to reduce the complexity of the encoder by limiting the LWIP modes to blocks with an aspect ratio less than or equal to 2 and not testing the LWIP modes for blocks with higher aspect ratios. However, this also reduces coding efficiency. Not sending the flag for blocks with an aspect ratio greater than 2 reduces the loss in coding efficiency, but rules out an encoder that achieves higher overall coding efficiency by testing the LWIP modes for blocks with an aspect ratio greater than 2 (which may be required for different applications).

A solution that supports both fast encoders and encoders with high compression efficiency is to introduce a separate, additional CABAC context for sending the flag for blocks with an aspect ratio greater than 2. Now, if the LWIP modes are not tested at the encoder for these blocks, the flag is always 0 (i.e., no block is predicted using an LWIP mode) and sending the flag causes almost no overhead (only the overhead incurred while the context adapts its probability of a 1 toward zero), which means that the coding efficiency is very close to a solution in which no flag is sent for those blocks. If the LWIP modes are tested at the encoder for these blocks, the flag is 1 with a certain probability (the block is predicted using an LWIP mode) and sending the flag causes little overhead, which means that the coding efficiency is very close to the solution of sending the flag with the same context for all block sizes.
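The "little overhead" argument can be quantified with the ideal arithmetic-coding cost of a bin, -log2(p):

```python
import math

def flag_bits(p_one: float, bin_val: int) -> float:
    """Ideal cost in bits of coding bin_val with a context whose estimated
    probability of a 1 is p_one."""
    return -math.log2(p_one if bin_val == 1 else 1.0 - p_one)

# Once the context has adapted its probability of a 1 close to zero,
# each always-0 flag costs only a small fraction of a bit, compared to
# a full bit under a uniform (untrained) probability estimate.
cost_adapted = flag_bits(0.01, 0)
cost_uniform = flag_bits(0.5, 0)
```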

In other words, as described above, according to an embodiment, the inventive method uses four contexts or context models for the binarization of the syntax element intra_mip_flag representing affine linear weighted intra prediction (LWIP), also called matrix-based intra prediction (MIP) in the versatile video coding (VVC) standard. The selection index is controlled by the aspect ratio (width/height or height/width) of the current block.

Fig. 3 illustrates an apparatus (e.g., similar to the apparatus 100 of fig. 1) for encoding image or video data that introduces a separate, additional CABAC context for sending the flag for blocks with an aspect ratio greater than 2, according to an embodiment of the invention. The encoder 102 receives image or video data 104 and encodes the received image or video data 104 to provide a bitstream 106 representing the encoded image or video data. The CABAC encoder 108 receives a tool flag 110 indicating whether a particular encoding tool, e.g., affine linear weighted intra prediction (LWIP), is employed in encoding a block of the image or video data. Affine linear weighted intra prediction LWIP is also referred to as matrix-based intra prediction MIP in the versatile video coding VVC standard, and the tool flag is also referred to as intra_mip_flag, which indicates the applicability of affine LWIP or of MIP of the VVC standard. For a block 300 of image or video data having an aspect ratio greater than 2 and for which the particular coding tool is applicable, a first context model for encoding the tool flag is selected from a set of one or more first context models and provided to the CABAC encoder 108. For a block 302 of image or video data having an aspect ratio less than or equal to 2 and for which the particular coding tool is applicable, a second context model for encoding the tool flag is selected from a set of one or more second context models and provided to the CABAC encoder 108. For example, if the LWIP modes are not tested for blocks 300 with an aspect ratio greater than 2, the flag is always 0 and the additional CABAC context adapts its probability of a 1 toward zero, whereas if the LWIP modes are tested for blocks 300 with an aspect ratio greater than 2, the flag is 1 with a certain probability. According to an embodiment, the CABAC encoder 108 may select the first context model or the second context model for the currently processed block in response to the selection index. The selection index indicates whether the currently processed block has an aspect ratio greater than 2 or an aspect ratio less than or equal to 2.

Fig. 4 illustrates an apparatus (e.g., similar to the apparatus 200 of fig. 2) for decoding image or video data that introduces a separate, additional CABAC context for the flag for blocks with an aspect ratio greater than 2, encoded using the apparatus of fig. 3, in accordance with an embodiment of the present invention. For a block 300 of image or video data having an aspect ratio greater than 2 and for which the particular coding tool is applicable, a first context model for decoding the tool flag is selected from a set of one or more first context models and provided to the CABAC decoder 206. For a block 302 of image or video data having an aspect ratio less than or equal to 2 and for which the particular coding tool is applicable, a second context model for decoding the tool flag is selected from a set of one or more second context models and provided to the CABAC decoder 206.

For example, the binarization of the syntax element intra_mip_flag may employ a total of four context models with context indices {0, 1, 2, 3}, as follows:

- if the aspect ratio (width/height or height/width) of the current block is greater than 2, the context model with index 3 is used,

- otherwise, one of the context models {0, 1, 2} is used, where the selection may depend on the intra_mip_flag of the left and above neighbouring blocks of the current block, as is known and used for several other syntax elements.
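This selection rule can be sketched as follows. For power-of-two block sizes, the aspect-ratio test is equivalent to comparing the difference of the log2 block dimensions against 1 (cf. the derivation process in [5]).

```python
import math

def intra_mip_flag_ctx(width: int, height: int,
                       left_mip: bool, above_mip: bool) -> int:
    """Return the context index in {0, 1, 2, 3} for intra_mip_flag."""
    if abs(math.log2(width) - math.log2(height)) > 1:  # aspect ratio > 2
        return 3
    # Otherwise count how many of the left/above neighbours use MIP.
    return int(left_mip) + int(above_mip)
```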

In the VVC specification, the corresponding CU syntax can be found, e.g., in the coding unit syntax of clause 7.3.10.5 of [5].

intra_mip_flag[x0][y0] equal to 1 specifies that the intra prediction type for luma samples is matrix-based intra prediction. intra_mip_flag[x0][y0] equal to 0 specifies that the intra prediction type for luma samples is not matrix-based intra prediction (see, e.g., the coding unit semantics in clause 7.4.11.5 of [5]).

The binarization of the syntax element intra_mip_flag can be found, e.g., in the derivation process for ctxTable, ctxIdx and bypassFlag in clause 9.3.4.2 of [5].

although some aspects of the described concepts have been described in the context of an apparatus, it will be apparent that these aspects also represent a description of a corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of method steps also represent a description of a respective block or item or a feature of a respective apparatus.

The various elements and features of the invention may be implemented in hardware using analog and/or digital circuitry, in software through the execution of instructions by one or more general-purpose or special-purpose processors, or as a combination of hardware and software. For example, embodiments of the invention may be implemented in the context of a computer system or another processing system. Fig. 5 shows an example of a computer system 400. The units or modules described above, and the steps of the methods performed by them, may be executed on one or more computer systems 400. The computer system 400 includes one or more processors 402, such as a special-purpose or general-purpose digital signal processor. The processor 402 is connected to a communication infrastructure 404, such as a bus or a network. The computer system 400 includes a main memory 406, e.g., a random access memory (RAM), and a secondary memory 408, e.g., a hard disk drive and/or a removable storage drive. The secondary memory 408 may allow computer programs or other instructions to be loaded into the computer system 400. The computer system 400 may also include a communications interface 410 to allow software and data to be transferred between the computer system 400 and external devices. The communication may be in the form of electronic, electromagnetic, optical, or other signals capable of being handled by the communications interface. The communication may use a wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and other communication channels 412.

The terms "computer program medium" and "computer-readable medium" are used generally to refer to tangible storage media, such as removable storage units or a hard disk installed in a hard disk drive. These computer program products are means for providing software to the computer system 400. Computer programs, also called computer control logic, are stored in the main memory 406 and/or the secondary memory 408. Computer programs may also be received via the communications interface 410. The computer programs, when executed, enable the computer system 400 to implement the present invention. In particular, the computer programs, when executed, enable the processor 402 to implement processes of the present invention, such as any of the methods described herein. Accordingly, such computer programs may represent controllers of the computer system 400. Where the disclosure is implemented using software, the software may be stored in a computer program product and loaded into the computer system 400 using a removable storage drive or an interface, such as the communications interface 410.

Implementation in hardware or in software may be performed using a digital storage medium, such as a cloud storage, a floppy disk, a DVD, a blu-ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory, on which electronically readable control signals are stored, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Accordingly, the digital storage medium may be computer-readable.

Some embodiments according to the invention comprise a data carrier with electronically readable control signals capable of cooperating with a programmable computer system so as to perform one of the methods described herein.

Generally, embodiments of the invention can be implemented as a computer program product having a program code operable to perform one of the methods when the computer program product runs on a computer. The program code may be stored, for example, on a machine-readable carrier.

Other embodiments include a computer program stored on a machine-readable carrier for performing one of the methods described herein. In other words, an embodiment of the inventive method is thus a computer program with a program code for performing one of the methods described herein, when the computer program runs on a computer.

Thus, another embodiment of the inventive method is a data carrier (or digital storage medium or computer readable medium) having a computer program recorded thereon for performing one of the methods described herein. Thus, other embodiments of the inventive method are a data stream or a signal sequence representing a computer program for performing one of the methods described herein. The data stream or signal sequence may for example be arranged to be transmitted via a data communication connection (e.g. via the internet). Other embodiments include a processing device, such as a computer or programmable logic device, configured or adapted to perform one of the methods described herein. Another embodiment comprises a computer having a computer program installed thereon for performing one of the methods described herein.

In some embodiments, a programmable logic device (e.g., a field programmable gate array) may be used to perform some or all of the functions of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. In general, the method is preferably performed by any hardware device.

The above-described embodiments are merely illustrative of the principles of the present invention. It should be understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is therefore intended that the invention be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

References

[1] ISO/IEC, ITU-T, "High efficiency video coding", ITU-T Recommendation H.265 | ISO/IEC 23008-2 (HEVC), edition 1, 2013; edition 2, 2014.

[2] JEM reference software, https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/.

[3] B. Bross, J. Chen, Shan Liu, "Versatile Video Coding (Draft 4)", JVET-M1001-v5, February 2019, Marrakesh, Morocco.

[4] J. Pfaff, B. Stallenberger, M. Schafer, P. Merkle, P. Helle, R. Rische, H. Schwarz, D. Marpe, T. Wiegand, "Affine Linear Weighted Intra Prediction", JVET-M0043, February 2019, Marrakesh, Morocco.

[5] B. Bross, J. Chen, Shan Liu, "Versatile Video Coding (Draft 8)", JVET-Q2001-vD, February 2020, Brussels, Belgium.
