Intra block copy for screen content coding and decoding

Document No.: 214974 | Published: 2021-11-05

Note: This technology, "Intra block copy for screen content coding and decoding", was devised by Xiu Xiaoyu, Chen Yiwen, Wang Xianglin, and Ma Zongquan on 2020-03-11. Abstract: A method for encoding video data includes receiving a video picture including a plurality of coding units. The picture is divided into a plurality of non-overlapping blocks. The encoder calculates a hash value for each block of the plurality of non-overlapping blocks. All non-overlapping blocks are classified into at least two categories including a first category and a second category. The first category includes one or more blocks having hash values covered by a first set of hash values, and the second category includes all remaining blocks. The blocks in the second category are classified into at least two groups including a first group. The first group includes one or more blocks having the same hash value as another block in the second category. An associated computing device and non-transitory computer-readable storage medium are also provided.

1. A video encoding method, comprising:

receiving a video picture including a plurality of coding units, wherein each coding unit of the plurality of coding units is predicted from a reference coding unit in the same picture by an intra block copy, IBC, mode, the reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit;

dividing the picture into a plurality of non-overlapping blocks and calculating, by an encoder, a hash value for each block of the plurality of non-overlapping blocks;

classifying all non-overlapping blocks into at least two categories including a first category and a second category, wherein the first category includes one or more blocks having hash values covered by a first set of hash values and the second category includes all remaining blocks;

classifying blocks in the second category into at least two groups including a first group, wherein the first group includes one or more blocks having a same hash value as another block in the second category;

determining a distortion metric for calculating a difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit; and

obtaining an optimal Block Vector (BV) of a first coding unit in the picture based on the distortion metric, wherein the BV of the first coding unit is a displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.
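The hash-based classification steps of claim 1 can be sketched as follows. This is an illustrative toy implementation, not taken from any encoder: the hash function, block contents, and all names are assumptions, and a practical encoder would typically use a stronger hash (e.g., a CRC over 4 × 4 blocks). The "first set of hash values" is modeled here as an ordinary Python set; in practice it might cover, for example, the hashes of uniform (flat) blocks.

```python
# Hypothetical sketch of the block classification in claim 1.
from collections import Counter

def block_hash(samples):
    """Toy hash over a block's samples (a real encoder would use e.g. a CRC)."""
    h = 0
    for row in samples:
        for s in row:
            h = (h * 31 + s) & 0xFFFF
    return h

def classify(blocks, first_set):
    """Split blocks into two categories, then group the second category."""
    # First category: blocks whose hash falls in the given set of hash values.
    first_cat = [b for b in blocks if block_hash(b) in first_set]
    # Second category: all remaining blocks.
    second_cat = [b for b in blocks if block_hash(b) not in first_set]
    counts = Counter(block_hash(b) for b in second_cat)
    # First group: blocks whose hash is shared with another second-category block.
    first_group = [b for b in second_cat if counts[block_hash(b)] > 1]
    rest = [b for b in second_cat if counts[block_hash(b)] == 1]
    return first_cat, first_group, rest
```

The fraction of first-group blocks is what drives the distortion-metric choice in claim 2 below.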

2. The video encoding method of claim 1, wherein determining the distortion metric comprises:

using sum of absolute differences (SAD) as the distortion metric when the percentage of blocks of the first group in the second category is greater than a predetermined threshold; and

using sum of absolute transformed differences (SATD) as the distortion metric when the percentage of blocks of the first group in the second category is not greater than the predetermined threshold.
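The selection rule of claim 2 amounts to a simple branch on the fraction of repeated-hash blocks (repetition suggests screen-like content). A minimal sketch, assuming a placeholder threshold of 0.5; the disclosure does not fix the threshold value:

```python
# Illustrative distortion-metric selection per claim 2; threshold is assumed.
def select_distortion_metric(num_first_group, num_second_category, threshold=0.5):
    """Return "SAD" when many second-category blocks repeat in the picture,
    otherwise fall back to "SATD"."""
    if num_second_category == 0:
        return "SATD"
    ratio = num_first_group / num_second_category
    return "SAD" if ratio > threshold else "SATD"

def sad(a, b):
    """Sum of absolute differences between two equal-length sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))
```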

3. The video encoding method of claim 1, wherein obtaining the optimal BV of the first coding unit comprises:

identifying a second coding unit corresponding to the first coding unit by matching the hash value of each block in the first coding unit with the hash value of the co-located block in a second coding unit, wherein the hash value of the co-located block in the second coding unit is the same as the hash value of the block in the first coding unit, and the plurality of coding units includes the second coding unit.

4. The video encoding method of claim 3, wherein identifying the second coding unit corresponding to the first coding unit comprises:

identifying a lead block in the first coding unit, wherein the lead block is the block in the first coding unit that corresponds to the minimum number of matching blocks in the picture, a matching block being a block having the same hash value as the lead block;

identifying a second coding unit comprising a co-located block of the lead block, wherein the second coding unit has the same size as the first coding unit and the hash value of the co-located block is the same as the hash value of the lead block; and

determining the second coding unit as the reference coding unit, wherein the hash value of each block in the first coding unit is the same as the hash value of the co-located block in the reference coding unit.
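The lead-block search of claims 3-4 can be sketched as follows. All names are hypothetical; the key idea is that starting from the block with the fewest hash matches in the picture minimizes the number of candidate reference positions that must be fully verified:

```python
# Illustrative sketch of the lead-block hash matching in claims 3-4.
def find_reference_cu(cu_hashes, picture_hash_index):
    """cu_hashes: dict mapping block offset (dx, dy) within the CU -> hash.
    picture_hash_index: dict mapping hash -> list of (x, y) block positions.
    Returns candidate top-left positions of same-size CUs whose every block
    hash matches the co-located block hash of the current CU."""
    # Lead block: fewest matches in the picture, hence fewest candidates.
    lead = min(cu_hashes,
               key=lambda off: len(picture_hash_index.get(cu_hashes[off], [])))
    candidates = []
    for (x, y) in picture_hash_index.get(cu_hashes[lead], []):
        origin = (x - lead[0], y - lead[1])  # implied CU top-left position
        # Verify every remaining block against its co-located counterpart.
        if all((origin[0] + dx, origin[1] + dy) in picture_hash_index.get(h, [])
               for (dx, dy), h in cu_hashes.items()):
            candidates.append(origin)
    return candidates
```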

5. The video encoding method of claim 1, wherein obtaining the optimal BV of the first coding unit comprises:

selecting and maintaining a set of BV candidates based on the distortion metric when the first coding unit is encoded a first time based on a first block partitioning path;

calculating a rate-distortion cost for each BV candidate in the maintained set of BV candidates when the first coding unit is coded a second time based on a second block partitioning path;

selecting a BV from the set of BV candidates, wherein the selected BV has a minimum rate-distortion cost among the maintained set of BV candidates; and

determining the selected BV as an optimal BV of the first coding unit.
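The two-pass reuse of claim 5 can be illustrated with two placeholder cost functions: a cheap distortion metric filters candidates when the CU is first encoded under one partitioning path, and the full rate-distortion cost is evaluated only for the retained candidates when the same CU is revisited under another partitioning path. Function names and the number of kept candidates are assumptions:

```python
# Illustrative two-pass BV candidate reuse, per claim 5.
def first_pass(bv_candidates, distortion, keep=8):
    """Keep the `keep` best BVs by the (cheap) distortion metric."""
    return sorted(bv_candidates, key=distortion)[:keep]

def second_pass(kept_bvs, rd_cost):
    """Pick the kept BV with the minimum (expensive) rate-distortion cost."""
    return min(kept_bvs, key=rd_cost)
```

This avoids re-running the full BV search each time block partitioning revisits the same coding unit.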

6. The video encoding method of claim 1, wherein obtaining the optimal BV of the first coding unit comprises:

maintaining, at the encoder, a BV library, wherein the BV library comprises one or more BV candidates obtained from a BV search of previously encoded coding units, the number of the one or more BV candidates being N, and N being a positive integer;

generating a BV candidate list, wherein the BV candidate list comprises all BVs in the BV library, BVs of spatially neighboring coding units, and derived BVs of the first coding unit;

calculating a rate-distortion cost for each BV in the BV candidate list, and selecting the BV with the smallest rate-distortion cost as the optimal BV for the first coding unit; and

updating the BV library by adding one or more BVs from the BV candidate list to replace one or more existing BVs in the BV library, wherein the updated BV library is used to determine an optimal BV of future coding units, a number of the added one or more BVs and a number of the replaced one or more existing BVs are each K, and K is a positive integer.
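The library bookkeeping of claims 6-7 might look as follows. The FIFO replacement policy and the deduplication step are assumptions; the disclosure only states that K of the N stored BVs are replaced per update (with N = 64 and K = 8 in claim 7):

```python
# Illustrative BV library per claims 6-7; replacement policy is an assumption.
from collections import deque

class BVLibrary:
    def __init__(self, n=64, k=8):
        self.n, self.k = n, k
        self.bvs = deque(maxlen=n)  # oldest entries are dropped first

    def candidate_list(self, spatial_bvs, derived_bvs):
        """Library BVs + spatial-neighbor BVs + derived BVs, deduplicated."""
        seen, out = set(), []
        for bv in list(self.bvs) + spatial_bvs + derived_bvs:
            if bv not in seen:
                seen.add(bv)
                out.append(bv)
        return out

    def update(self, candidate_list, rd_cost):
        """Add the K best candidates, displacing up to K oldest entries."""
        for bv in sorted(candidate_list, key=rd_cost)[:self.k]:
            if bv in self.bvs:
                self.bvs.remove(bv)  # avoid storing duplicates
            self.bvs.append(bv)
```

The updated library then seeds the BV candidate list for subsequently coded units.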

7. The video encoding method of claim 6, wherein the value of N is 64 and the value of K is 8.

8. The video encoding method of claim 6, wherein the derived BV of the first coding unit is generated by:

identifying a first reference coding unit encoded in the IBC mode, wherein the first reference coding unit is pointed to by a first BV of spatially neighboring coding units of the first coding unit encoded in the IBC mode;

identifying a second BV, wherein the second BV is a BV of the first reference coding unit;

generating a first derived BV by adding the first BV and the second BV;

identifying a second reference coding unit encoded in the IBC mode, wherein the second reference coding unit is pointed to by a second BV from the first reference coding unit;

identifying a third BV, wherein the third BV is a BV of the second reference coding unit;

generating a second derived BV by adding the first derived BV and the third BV; and

generating one or more derived BVs by repeating the above process until the corresponding reference coding unit is not coded in the IBC mode.
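The chained derivation of claim 8 can be sketched as a loop that accumulates BVs until the referenced coding unit is no longer IBC-coded. The `bv_of` lookup and the depth limit are hypothetical; in an encoder, the lookup would query the coding mode and stored BV at the referenced position:

```python
# Illustrative derived-BV chaining per claim 8; names are hypothetical.
def derive_bvs(start_bv, bv_of, max_depth=8):
    """start_bv: a spatial neighbor's BV. bv_of: maps an accumulated BV (i.e.
    a reference position offset) to that reference CU's own BV, or None if
    that CU is not IBC-coded. Returns the list of derived BVs."""
    derived = []
    acc = start_bv
    for _ in range(max_depth):
        nxt = bv_of(acc)
        if nxt is None:  # reference not coded in IBC mode: stop
            break
        acc = (acc[0] + nxt[0], acc[1] + nxt[1])  # sum of BVs along the chain
        derived.append(acc)
    return derived
```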

9. The video encoding method of claim 6, wherein the spatially neighboring coding units comprise the left, below-left, above, above-right, and above-left neighboring coding units of the first coding unit.

10. A computing device, comprising:

one or more processors;

a non-transitory storage device coupled to the one or more processors; and

a plurality of programs stored in the non-transitory storage device, which when executed by the one or more processors, cause the one or more processors to perform acts comprising:

receiving a video picture including a plurality of coding units, wherein each coding unit of the plurality of coding units is predicted from a reference coding unit in the same picture by an intra block copy, IBC, mode, the reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit;

dividing the picture into a plurality of non-overlapping blocks and calculating, by an encoder, a hash value for each block of the plurality of non-overlapping blocks;

classifying all non-overlapping blocks into at least two categories including a first category and a second category, wherein the first category includes one or more blocks having hash values covered by a first set of hash values and the second category includes all remaining blocks;

classifying blocks in the second category into at least two groups including a first group, wherein the first group includes one or more blocks having a same hash value as another block in the second category;

determining a distortion metric for calculating a difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit; and

obtaining an optimal Block Vector (BV) of a first coding unit in the picture based on the distortion metric, wherein the BV of the first coding unit is a displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.

11. The computing device of claim 10, wherein determining the distortion metric comprises:

using sum of absolute differences (SAD) as the distortion metric when the percentage of blocks of the first group in the second category is greater than a predetermined threshold; and

using sum of absolute transformed differences (SATD) as the distortion metric when the percentage of blocks of the first group in the second category is not greater than the predetermined threshold.

12. The computing device of claim 10, wherein obtaining the optimal BV of the first coding unit comprises:

identifying a second coding unit corresponding to the first coding unit by matching the hash value of each block in the first coding unit with the hash value of the co-located block in a second coding unit, wherein the hash value of the co-located block in the second coding unit is the same as the hash value of the block in the first coding unit, and the plurality of coding units includes the second coding unit.

13. The computing device of claim 12, wherein identifying the second coding unit corresponding to the first coding unit comprises:

identifying a lead block in the first coding unit, wherein the lead block is the block in the first coding unit that corresponds to the minimum number of matching blocks in the picture, a matching block being a block having the same hash value as the lead block;

identifying a second coding unit comprising a co-located block of the lead block, wherein the second coding unit has the same size as the first coding unit and the hash value of the co-located block is the same as the hash value of the lead block; and

determining the second coding unit as the reference coding unit, wherein the hash value of each block in the first coding unit is the same as the hash value of the co-located block in the second coding unit.

14. The computing device of claim 10, wherein obtaining the optimal BV of the first coding unit comprises:

selecting and maintaining a set of BV candidates based on the distortion metric when the first coding unit is encoded a first time based on a first block partitioning path;

calculating a rate-distortion cost for each BV candidate in the maintained set of BV candidates when the first coding unit is coded a second time based on a second block partitioning path;

selecting a BV from the set of BV candidates, wherein the selected BV has a minimum rate-distortion cost; and

determining the selected BV as an optimal BV of the first coding unit.

15. The computing device of claim 10, wherein obtaining the optimal BV of the first coding unit comprises:

maintaining, at the encoder, a BV library comprising one or more BV candidates obtained from a BV search of previously coded coding units, the number of the one or more BV candidates being N, and N being a positive integer;

generating a BV candidate list, wherein the BV candidate list comprises all BVs in the BV library, BVs of spatially neighboring coding units, and derived BVs of the first coding unit;

calculating a rate-distortion cost for each BV in the BV candidate list, and selecting the BV with the smallest rate-distortion cost as the optimal BV for the first coding unit; and

updating the BV library by adding one or more BVs from the BV candidate list to replace one or more existing BVs in the BV library, wherein the updated BV library is used to determine an optimal BV of future coding units, a number of the added one or more BVs and a number of the replaced one or more existing BVs are each K, and K is a positive integer.

16. The computing device of claim 15, wherein the value of N is 64 and the value of K is 8.

17. The computing device of claim 15, wherein the derived BV of the first coding unit is generated by:

identifying a first reference coding unit encoded in the IBC mode, wherein the first reference coding unit is pointed to by a first BV of spatially neighboring coding units of the first coding unit encoded in the IBC mode;

identifying a second BV, wherein the second BV is a BV of the first reference coding unit;

generating a first derived BV by adding the first BV and the second BV;

identifying a second reference coding unit encoded in the IBC mode, wherein the second reference coding unit is pointed to by a second BV from the first reference coding unit;

identifying a third BV, wherein the third BV is a BV of the second reference coding unit;

generating a second derived BV by adding the first derived BV and the third BV; and

generating one or more derived BVs by repeating the above process until the corresponding reference coding unit is not coded in the IBC mode.

18. The computing device of claim 15, wherein the spatially neighboring coding units comprise the left, below-left, above, above-right, and above-left neighboring coding units of the first coding unit.

19. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the one or more processors to perform acts comprising:

receiving a video picture including a plurality of coding units, wherein each coding unit of the plurality of coding units is predicted from a reference coding unit in the same picture by an intra block copy, IBC, mode, the reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit;

dividing the picture into a plurality of non-overlapping blocks and calculating, by an encoder, a hash value for each block of the plurality of non-overlapping blocks;

classifying all non-overlapping blocks into at least two categories including a first category and a second category, wherein the first category includes one or more blocks characterizing one or more hash values encompassed by a first set of hash values and the second category includes all remaining blocks;

classifying blocks in the second category into at least two groups including a first group, wherein the first group includes one or more blocks that characterize a same hash value as another block in the second category;

determining a distortion metric for calculating a difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit; and

obtaining an optimal Block Vector (BV) of a first coding unit in the picture based on the distortion metric, wherein the BV of the first coding unit is a displacement between the first coding unit and a reference coding unit in the same picture.

20. The non-transitory computer-readable storage medium of claim 19, wherein determining the distortion metric comprises:

using sum of absolute differences (SAD) as the distortion metric when the percentage of blocks of the first group in the second category is greater than a predetermined threshold; and

using sum of absolute transformed differences (SATD) as the distortion metric when the percentage of blocks of the first group in the second category is not greater than the predetermined threshold.

Technical Field

The present disclosure relates generally to video coding (e.g., encoding and decoding) and compression. More particularly, the present disclosure relates to a method, computing device, and non-transitory computer-readable storage medium for Intra Block Copy (IBC) for screen content coding.

Background

This section provides background information related to the present disclosure. The information contained in this section is not necessarily to be construed as prior art.

Various video codec techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), the Moving Picture Experts Group (MPEG) codecs, and the like. Video coding typically employs prediction methods (e.g., inter-prediction, intra-prediction, etc.) that exploit redundancy present in video images or sequences. An important goal of video codec techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.

The first version of the HEVC standard was finalized in October 2013, and provides a bit rate saving of about 50%, or equivalent perceptual quality, compared to the prior-generation video codec standard H.264/MPEG AVC. Although the HEVC standard provides significant coding improvements over its predecessors, there is evidence that additional coding tools can achieve coding efficiency beyond HEVC. On this basis, both VCEG and MPEG began exploring new coding technologies for future video codec standardization. The Joint Video Exploration Team (JVET) was formed by ITU-T VCEG and ISO/IEC MPEG in October 2015, and began extensive study of advanced technologies that could substantially improve coding efficiency. JVET maintains a reference software called the Joint Exploration Model (JEM) by integrating a number of additional coding tools on top of the HEVC test model (HM).

In October 2017, ITU-T and ISO/IEC issued a joint Call for Proposals (CfP) on video compression with capability beyond HEVC. In April 2018, 23 CfP responses were received and evaluated at the 10th JVET meeting, demonstrating an improvement in compression efficiency of about 40% over HEVC. Based on the results of this evaluation, JVET launched a new project to develop a new-generation video coding standard known as Versatile Video Coding (VVC). In the same month, a reference software codebase called the VVC Test Model (VTM) was established to demonstrate a reference implementation of the VVC standard.

Disclosure of Invention

This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.

According to a first aspect of the present disclosure, there is provided a video encoding method, performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. According to the video encoding method, a video picture comprising a plurality of coding units is received. Each of the plurality of coding units is predicted from a reference coding unit in the same picture by an Intra Block Copy (IBC) mode. The reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit. The picture is divided into a plurality of non-overlapping blocks. The encoder calculates a hash value for each block of the plurality of non-overlapping blocks. All non-overlapping blocks are classified into at least two categories including a first category and a second category. The first category includes one or more blocks having hash values covered by the first set of hash values, and the second category includes all remaining blocks.

In addition, the blocks in the second category are classified into at least two groups including a first group. The first group includes one or more blocks having the same hash value as another block in the second category. A distortion metric is determined for calculating the difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit. Based on the distortion metric, an optimal block vector (BV) of a first coding unit in the picture is obtained. The BV of the first coding unit is the displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.

According to a second aspect of the present disclosure, there is provided a computing device comprising one or more processors, a non-transitory storage device coupled to the one or more processors, and a plurality of programs stored in the non-transitory storage device. The plurality of programs, when executed by the one or more processors, cause the computing device to perform the following actions. A video picture comprising a plurality of coding units is received. Each of the plurality of coding units is predicted from a reference coding unit in the same picture by an Intra Block Copy (IBC) mode. The reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit. The picture is divided into a plurality of non-overlapping blocks. The encoder calculates a hash value for each block of the plurality of non-overlapping blocks. All non-overlapping blocks are classified into at least two categories including a first category and a second category. The first category includes one or more blocks having hash values covered by the first set of hash values, and the second category includes all remaining blocks.

In addition, the blocks in the second category are classified into at least two groups including a first group. The first group includes one or more blocks having the same hash value as another block in the second category. A distortion metric is determined for calculating the difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit. Based on the distortion metric, an optimal block vector (BV) of a first coding unit in the picture is obtained. The BV of the first coding unit is the displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.

According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium stores a plurality of programs for execution by a computing device having one or more processors. The plurality of programs, when executed by the one or more processors, cause the computing device to code screen content in the IBC mode. A video picture comprising a plurality of coding units is received. Each of the plurality of coding units is predicted from a reference coding unit in the same picture by an Intra Block Copy (IBC) mode. The reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit. The picture is divided into a plurality of non-overlapping blocks. The encoder calculates a hash value for each block of the plurality of non-overlapping blocks. All non-overlapping blocks are classified into at least two categories including a first category and a second category. The first category includes one or more blocks having hash values covered by the first set of hash values, and the second category includes all remaining blocks.

In addition, the blocks in the second category are classified into at least two groups including a first group. The first group includes one or more blocks having the same hash value as another block in the second category. A distortion metric is determined for calculating the difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit. Based on the distortion metric, an optimal block vector (BV) of a first coding unit in the picture is obtained. The BV of the first coding unit is the displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.

Drawings

Various illustrative, non-limiting examples of the present disclosure are described below in connection with the following figures. Variations in structure, method, or function may be implemented by persons of ordinary skill in the relevant art based on the examples presented herein, and such variations are included within the scope of the present disclosure. The teachings of the different examples may, but need not, be combined with each other in case of conflict.

Fig. 1 is a block diagram illustrating an illustrative block-based video encoder that may be used in connection with many video codec standards, including VVC;

fig. 2 is a block diagram illustrating an illustrative block-based video decoder that may be used in connection with many video codec standards, including VVC;

FIGS. 3A-3E illustrate example partition types, namely quaternary partition (FIG. 3A), horizontal binary partition (FIG. 3B), vertical binary partition (FIG. 3C), horizontal ternary partition (FIG. 3D), and vertical ternary partition (FIG. 3E), according to some examples;

FIG. 4 illustrates valid prediction sample regions for IBC-coded coding units;

fig. 5 is a flow diagram of IBC signaling in a VVC according to an example;

FIG. 6 is a block diagram illustrating a decoding process using luma mapping with chroma scaling (LMCS) according to one example;

fig. 7 is a flow diagram of a Block Vector (BV) estimation process for IBC mode according to one example;

FIG. 8 shows a diagram of locating IBC reference encoding units based on hash values of 4 x 4 sub-blocks, according to an example;

fig. 9 illustrates spatially neighboring CUs for predictor-based IBC search, according to some examples;

fig. 10 illustrates a BV derivation process used in a predictor-based BV search, according to one example;

FIG. 11 illustrates a comparison between the current hash-based matching method and the proposed hash-based matching method, according to an example;

FIG. 12A illustrates a method of generating an IBC hash table by generating the hash table using original luma samples, according to one example;

FIG. 12B illustrates a method of generating an IBC hash table by generating the hash table using mapped luma samples, according to one example;

FIG. 13 illustrates an exemplary process for updating the BV library;

fig. 14 illustrates an exemplary extended BV derivation;

fig. 15A to 15C show examples of dividing the same coding unit by different division paths;

fig. 16 is a flow diagram of BV estimation according to an example;

FIG. 17 illustrates an exemplary IBC reference region that takes into account reconstructed samples in a line buffer;

fig. 18A shows an Adaptive Loop Filter (ALF) applied to chroma reconstruction samples in VVC according to an example;

fig. 18B shows an Adaptive Loop Filter (ALF) applied to luma reconstruction samples in VVC according to an example;

fig. 19 shows a deblocking process in VVC according to an example.

Detailed Description

The terminology used in the present disclosure is intended to be illustrative of particular examples and is not intended to be limiting of the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be understood that the term "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood to mean "when", "upon", or "in response to", depending on the context.

Reference throughout this specification to "one example," "an example," "another example," or the like, in the singular or plural, means that a particular feature, structure, or characteristic described in connection with the example is included in at least one example of the present disclosure. Thus, the appearances of the phrases "in one example" or "in an example," "in another example," and the like, in the singular or plural, in various places throughout this specification are not necessarily all referring to the same example. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more examples.

The present disclosure relates generally to coding (e.g., encoding and decoding) video data. More particularly, the present disclosure relates to a method, computing device, and non-transitory computer-readable storage medium for Intra Block Copy (IBC) for screen content coding.

Like HEVC, VVC is built upon a block-based hybrid video codec framework. Fig. 1 is a block diagram illustrating an illustrative block-based video encoder 100 that may be used in connection with many video codec standards, including VVC. The input video signal is processed block by block; each block is referred to as a coding unit (CU). In VTM-1.0, a CU may be up to 128 × 128 pixels. However, unlike HEVC, which partitions blocks based only on quadtrees, in VVC one coding tree unit (CTU) is split into CUs based on a quadtree with a nested multi-type tree, to adapt to varying local characteristics. In addition, the concept of multiple partition unit types in HEVC is removed, i.e., the separation of CU, prediction unit (PU), and transform unit (TU) no longer exists in VVC; instead, each CU is always used as the basic unit for both prediction and transform without further partitioning. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Each quadtree leaf node may then be further partitioned by a binary tree or a ternary tree structure.
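The five split shapes of the multi-type tree (quaternary, binary, and ternary, as in FIGS. 3A-3E) can be written out as simple coordinate arithmetic. This is an illustrative sketch, not text from the VVC specification; sub-blocks are returned as (x, y, width, height) tuples relative to the parent block:

```python
# Illustrative multi-type tree split shapes (cf. FIGS. 3A-3E); not spec text.
def split(w, h, mode):
    if mode == "quad":       # quaternary: four equal quadrants
        return [(0, 0, w // 2, h // 2), (w // 2, 0, w // 2, h // 2),
                (0, h // 2, w // 2, h // 2), (w // 2, h // 2, w // 2, h // 2)]
    if mode == "hor_bin":    # horizontal binary: two halves stacked
        return [(0, 0, w, h // 2), (0, h // 2, w, h // 2)]
    if mode == "ver_bin":    # vertical binary: two halves side by side
        return [(0, 0, w // 2, h), (w // 2, 0, w // 2, h)]
    if mode == "hor_tri":    # horizontal ternary: 1/4, 1/2, 1/4 heights
        return [(0, 0, w, h // 4), (0, h // 4, w, h // 2),
                (0, 3 * h // 4, w, h // 4)]
    if mode == "ver_tri":    # vertical ternary: 1/4, 1/2, 1/4 widths
        return [(0, 0, w // 4, h), (w // 4, 0, w // 2, h),
                (3 * w // 4, 0, w // 4, h)]
    raise ValueError(mode)
```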

In the encoder 100, a video frame is partitioned into a plurality of blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method. In inter-frame prediction, one or more prediction values are formed by motion estimation and motion compensation based on pixels from previously reconstructed frames. In intra prediction, a prediction value is formed based on reconstructed pixels in a current frame. Through the mode decision, the best predictor can be selected to predict the current block.

The prediction residual, which represents the difference between the current video block and its prediction value, is sent to the transform circuit 102. The term "circuitry" as used herein includes both hardware and software for operating the hardware. The transform coefficients are then sent from transform circuit 102 to quantization circuit 104 for entropy reduction. The quantized coefficients are then fed to entropy coding circuitry 106 to generate a compressed video bitstream. As shown in fig. 1, prediction related information 110 (such as block partition information, motion vectors, reference picture indices, and intra prediction modes, etc.) from inter prediction circuitry and/or intra prediction circuitry 112 is also fed through entropy encoding circuitry 106 and saved into a compressed video bitstream 114.

In the encoder 100, decoder-related circuitry is also required to reconstruct the pixels for prediction purposes. First, the prediction residual is reconstructed by the inverse quantization circuit 116 and the inverse transformation circuit 118. The reconstructed prediction residual is combined with the block prediction value 120 to generate unfiltered reconstructed pixels of the current block. Inverse quantization circuit 116 and inverse transform circuit 118 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain. In certain embodiments, the adder adds the reconstructed residual block to the motion compensated prediction block generated earlier by motion compensation circuit 122 or intra prediction circuit 112 to generate a reconstructed video block, which is stored into reference picture memory 111. The picture memory 111 may be connected to a loop filter 115, which is coupled to a picture buffer 117. Motion estimation circuit 124 and motion compensation circuit 122 may use the reconstructed video block as a reference block for inter-coding a block in a subsequent video frame.

As shown in fig. 1, intra prediction (also referred to as "spatial prediction") and/or inter prediction (also referred to as "temporal prediction" or "motion compensated prediction") may be performed. Intra-prediction uses pixels from samples (referred to as reference samples) of already coded neighboring blocks in the same video picture or slice to predict the current video block. Intra-prediction reduces the spatial redundancy inherent in video signals. Inter prediction uses reconstructed pixels from a coded video picture to predict the current video block.

Inter-frame prediction reduces temporal redundancy inherent in video signals. The inter prediction signal for a given CU is typically signaled by one or more Motion Vectors (MVs) that indicate the amount and direction of motion between the current CU and its temporal reference. Also, if a plurality of reference pictures are supported, one reference picture index for identifying from which reference picture in the reference picture store the temporal prediction signal comes is additionally transmitted.

After intra-prediction and/or inter-prediction, the intra/inter mode decision circuit 121 in the encoder 100 selects the best prediction mode based on, for example, a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using a transform and quantized. The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further loop filtering, such as a deblocking filter, Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before it is placed in the reference picture buffer 117 and used to encode future video blocks. To form the output video bitstream, the codec mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding circuit 106 for further compression and packing to form the bitstream.

Fig. 2 is a block diagram illustrating an illustrative block-based video decoder that may be used in connection with many video codec standards, including VVC. In some examples, the decoder 200 is similar to the reconstruction related parts located in the encoder 100 of fig. 1.

Referring to fig. 2, in a decoder 200, an incoming video bitstream 201 is first decoded by an entropy decoding circuit 202 to derive quantized coefficient levels and prediction related information. The quantized coefficient levels are then processed by inverse quantization circuitry 204 and inverse transform circuitry 206 to obtain reconstructed prediction residuals. The codec mode and prediction information are sent to a spatial prediction circuit (if intra-coded) or a temporal prediction circuit (if inter-coded) to form a prediction block. The residual transform coefficients are sent to inverse quantization circuit 204 and inverse transform circuit 206 to reconstruct the residual block. Then, the prediction block and the residual block are added. The reconstructed block may further pass through a loop filter 216 and then be stored in a reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.

The block predictor mechanisms that may be implemented in the intra/inter mode selection circuitry 208 include: an intra-prediction circuit 210 configured to perform an intra-prediction process, and/or a motion compensation circuit 212 configured to perform a motion compensation process based on decoded prediction information. A set of unfiltered reconstructed pixels is obtained by summing the reconstructed prediction residual from the inverse transform circuit 206 with the prediction output generated by the block predictor mechanism using an addition 214. With the loop filter 216 turned on, a filtering operation is performed on these reconstructed pixels to derive the final reconstructed video for output.

Fig. 3A to 3E show five example partition types, i.e., a quad partition (fig. 3A), a horizontal binary partition (fig. 3B), a vertical binary partition (fig. 3C), a horizontal ternary partition (fig. 3D), and a vertical ternary partition (fig. 3E).

Fig. 4 illustrates valid prediction sample areas of coding units for one IBC codec. Due to the rapid development of video applications such as wireless display, video conferencing, live game play, and cloud computing, Screen Content Codec (SCC) has received wide attention in recent years in academia and industry. Although VVCs have achieved significant improvements in codec efficiency compared to the previous video codec standard HEVC, most codec tools in VVCs are primarily designed for video captured by natural cameras. However, screen content videos, which are typically composed of computer-generated content such as text and graphics, show attributes that are distinct from those of natural content. For example, natural video signals captured by a camera typically show smooth boundaries across different objects due to the nature of the camera lens, while screen content presents sharp edges.

IBC was first proposed during the development of the HEVC SCC extension. It is a block matching technique that predicts the samples of one current video coding unit from a reconstructed coding unit in the same picture. The reconstructed coding unit is also referred to as the reference coding unit of the current video coding unit. The displacement between the current video coding unit and the reference coding unit is called a Block Vector (BV). The BV, along with the prediction residual of the current video coding unit, needs to be transmitted from the encoder to the decoder for sample reconstruction at the decoder. Since the IBC codec tool provides superior codec performance, with BD-rate reductions of more than 30% for typical screen content video, it was adopted into the VVC working draft at the 12th JVET meeting. Because IBC uses unfiltered reconstructed samples in the current picture as a reference, both the encoder and decoder need to keep samples of the reconstructed region before loop filtering (e.g., deblocking, SAO, and ALF) is done in the current picture. This may significantly increase hardware implementation complexity due to the need for additional memory and bandwidth for IBC-related read and write operations.
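
The basic IBC prediction described above can be sketched as follows. This is a minimal illustration, not the VVC implementation; all names (`ibc_predict`, `recon`) are invented here, and the BV is assumed to point into an already-reconstructed region of the same picture.

```python
# Minimal sketch of IBC prediction: samples of the current block are
# copied from an already-reconstructed region of the same picture,
# displaced by a block vector (bv_x, bv_y). Illustrative only.

def ibc_predict(recon, cu_x, cu_y, cu_w, cu_h, bv_x, bv_y):
    """Copy a cu_w x cu_h block whose top-left corner sits at
    (cu_x + bv_x, cu_y + bv_y) in the reconstructed picture."""
    ref_x, ref_y = cu_x + bv_x, cu_y + bv_y
    return [row[ref_x:ref_x + cu_w]
            for row in recon[ref_y:ref_y + cu_h]]

# 8x8 "reconstructed" picture with a recognizable gradient pattern.
recon = [[x + 10 * y for x in range(8)] for y in range(8)]
# Predict a 2x2 block at (4, 4) from the block at (0, 0), i.e. BV = (-4, -4).
pred = ibc_predict(recon, 4, 4, 2, 2, -4, -4)
```

Only the BV and the residual are then signaled; the decoder repeats the same copy from its own reconstructed samples.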

As shown in fig. 4, in order to achieve a good tradeoff between codec performance and implementation complexity, in the IBC design of VVC, only reconstructed samples in the left-neighboring CTU and the current CTU are allowed to be used as references for IBC prediction of the current CU.

Fig. 5 is a flow diagram of IBC signaling in a VVC according to an example. When the prediction mode of a CU is signaled, IBC is signaled as a third mode in addition to the intra-prediction mode and the inter-prediction mode. This is achieved by adding a CU level flag (in addition to the original flag indicating intra mode versus inter mode) to indicate whether the current CU is coded by IBC mode.

Specifically, there are two different ways to signal the IBC mode in VVC. First, if a CU is coded in merge mode, a merge candidate index is signaled to indicate which BV in the merge candidate list is inherited by the current CU. The IBC merge candidate list consists of up to five BVs of spatially neighboring CUs/blocks and history-based BVs, constructed in a manner similar to the conventional inter merge mode. Second, if a CU is coded in non-merge mode, the BV of the CU is predicted and the corresponding BV difference (BVD) is coded in the same manner as a conventional MV. The BV prediction method uses two candidates as predictors, one from the left neighbor and one from the above neighbor (both IBC-coded). When either neighbor is not available, a zero BV is used as the predictor. A one-bit flag is signaled to indicate the block vector predictor index. In addition, when a CU is coded in non-merge IBC mode, the BVD resolution can switch between 1-pixel and 4-pixel integer precision at the CU level.

The IBC mode is very similar to the inter mode except that the IBC mode uses samples of reconstructed regions in the current picture as reference samples, while the normal inter mode uses samples of other coded pictures to predict the current CU. Therefore, some codec tools for inter-coding may also be applied to IBC mode. Specifically, the following design aspects are included in the current IBC to handle its interaction with inter-frame coding tools in VVCs: interaction with spatial merge mode, interaction with temporal motion vector prediction and sub-block-based temporal motion vector prediction, interaction with pairwise merge mode, interaction with history-based motion vector prediction, interaction with separate luma-chroma partition trees, interaction with adaptive motion vector resolution, and interaction with luma mapping with chroma scaling.

In interaction with the spatial merge mode, the BV of the current CU is allowed to inherit from the BV of its spatial neighboring CUs. The derivation process of the IBC merging candidate remains almost the same as the conventional merging candidate (i.e., non-IBC merging mode), except that the derivation of the BV candidate of the IBC mode excludes all neighboring CUs coded by inter mode, and vice versa. Specifically, if the current CU is an IBC-coded CU, only neighboring CUs coded by the IBC mode are considered when generating the merging candidates of the CUs. Conversely, if the current CU is an inter-coded CU, only neighboring CUs coded by the inter mode are considered to form a merging candidate list of CUs.

In the interaction with temporal motion vector prediction and subblock-based temporal motion vector prediction, Temporal Motion Vector Prediction (TMVP) is supported in VVC. Under TMVP, the MV of the current CU is predicted by the MVs of the co-located CUs in one temporal reference picture (also referred to as co-located picture). Additionally, VVCs also support subblock-based temporal motion vector prediction (SbTMVP). Like TMVP, SbTMVP derives motion information of the current CU using motion information of the co-located picture through the merge mode. However, the motion derivation for SbTMVP mode is performed at the sub-block level, rather than just deriving a single MV for the current CU. Both TMVP and SbTMVP are enabled only for inter-coded CUs, but disabled for IBC-coded CUs.

In interaction with the pairwise merge mode, pairwise merge candidates are supported under IBC merge mode. Specifically, similar to the inter-frame merging mode (in which the pair merging candidates may be generated by averaging MVs of the two inter-frame merging candidates), for the IBC merging mode, the pair merging candidates may be generated by averaging BV of the two IBC merging candidates. However, combining one IBC merging candidate and one inter merging candidate is prohibited, i.e., averaging one BV with one MV is not allowed.

In the interaction with history-based motion vector prediction, as in the normal inter mode, history-based motion vector prediction (HMVP) is applied to IBC mode by adding the BV of a previously coded IBC CU to a history candidate list as a reference for predicting the BVs of future IBC CUs. However, instead of sharing the same buffer, two separate candidate tables are maintained and updated at both the encoder and decoder: one contains the MVs of previously inter-coded CUs (i.e., the HMVP MV table), and the other contains the BVs of previously IBC-coded CUs (i.e., the HMVP BV table). After an inter/IBC CU is coded, the HMVP MV/BV table is updated by adding the corresponding MV/BV as a new candidate at the last entry of the corresponding table. In addition, the candidates in these HMVP tables may be used as merge candidates or AMVP prediction candidates for the normal inter mode and IBC mode, respectively.
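
The HMVP BV table update described above can be sketched as follows. This is an illustrative sketch with invented names; it assumes the usual history-table behavior of appending the newest candidate at the end, removing a duplicate if present, and evicting the oldest entry when the table is full (the normative pruning rule may differ in detail).

```python
def update_hmvp_table(table, new_bv, max_size=5):
    """Append new_bv as the most recent (last) entry; remove a duplicate
    if present, and drop the oldest entry when the table is full."""
    if new_bv in table:
        table.remove(new_bv)       # duplicate moves to the newest slot
    elif len(table) >= max_size:
        table.pop(0)               # oldest entry sits at the front
    table.append(new_bv)           # newest entry is the last one
    return table

bv_table = [(-4, 0), (0, -4)]
update_hmvp_table(bv_table, (-8, 0))
update_hmvp_table(bv_table, (-4, 0))   # duplicate is re-inserted at the end
```

A separate table of the same shape would hold the MVs of inter-coded CUs, so BVs and MVs never mix.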

In interaction with separate luma-chroma partition trees, IBC mode may still be applied to both luma and chroma when separate partition trees are applied for the luma and chroma components, with the limitation that the BV of a chroma CU is directly derived from the BV of the corresponding luma CU without signaling. More specifically, before coding a chroma CU, the IBC mode coverage of the luma samples corresponding to the chroma CU is first checked. IBC mode may be enabled for a chroma CU only if all luma samples in the corresponding luma region of the chroma CU are coded in IBC mode. When IBC mode is enabled, the BV for each chroma sub-block (e.g., 2 × 2 sub-block) is derived from the corresponding luma BV with appropriate scaling and rounding.
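
The luma-to-chroma BV derivation can be sketched as follows for 4:2:0 content, where the chroma planes are subsampled by two in each direction. The function name and the rounding rule (truncation toward zero) are stand-ins chosen for illustration; the normative scaling/rounding in VVC may differ.

```python
def derive_chroma_bv(luma_bv, sub_x=2, sub_y=2):
    """Derive the chroma BV from the co-located luma BV by scaling with
    the chroma subsampling factors (4:2:0 by default). Truncation toward
    zero is used here as a stand-in for the normative rounding rule."""
    bx, by = luma_bv
    return (int(bx / sub_x), int(by / sub_y))

# A luma BV of (-8, 4) maps to a chroma BV of (-4, 2) under 4:2:0.
chroma_bv = derive_chroma_bv((-8, 4))
```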

In the interaction with adaptive motion vector resolution, all BVs are limited to integer-pixel resolution, so that direct sample copying from the reference CU (i.e., without any pixel interpolation) can be used to generate the IBC prediction. In addition to integer-pixel BV precision, Adaptive Block Vector Resolution (ABVR) is applied to introduce 4-pixel BV precision for IBC mode. ABVR is conditionally enabled based on whether the current CU has at least one non-zero BVD; if both the horizontal and vertical BVDs are zero, integer-pixel BVD precision is always inferred. Similar to Adaptive Motion Vector Resolution (AMVR), to ensure that the reconstructed BV has the desired precision, the selected BV predictor of an IBC CU is rounded to the same precision as that of the BVD and then added to the BVD to generate the final BV.
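
The BV reconstruction step just described, i.e. rounding the predictor to the BVD precision and then adding the BVD, can be sketched per component as follows. This is a one-dimensional illustration with invented names; flooring is used as the rounding rule, whereas the normative rule may round differently.

```python
def reconstruct_bv(bv_pred, bvd_units, precision):
    """Round the selected BV predictor to the BVD precision (1 or 4
    pixels), then add the BVD expressed in units of that precision.
    Flooring is used here as a stand-in rounding rule."""
    pred_rounded = (bv_pred // precision) * precision
    return pred_rounded + bvd_units * precision

# With 4-pixel precision, a predictor of -7 rounds to -8; a BVD of one
# 4-pixel unit then yields a final BV component of -4.
final_bv = reconstruct_bv(-7, 1, 4)
```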

In the interaction with luma mapping with chroma scaling, a codec tool known as luma mapping with chroma scaling (LMCS) is applied before the loop filter. LMCS has two main components: 1) a loop mapping of the luminance component based on the adaptive piecewise linear model; and 2) for the chroma component, applying luma-dependent chroma residual scaling.

Fig. 6 is a block diagram illustrating a decoding process using luma mapping with chroma scaling (LMCS) according to one example. As shown in Fig. 6, the Q⁻¹ and T⁻¹ circuit 601, reconstruction circuit 602, and intra prediction circuit 603 indicate circuits that apply processing in the mapped domain, including inverse quantization, inverse transformation, luma intra prediction, and addition of the luma prediction to the luma residual. In addition, the loop filter circuits 604 and 607, the DPB circuits 605 and 608, the motion compensation circuits 606 and 611, the intra prediction circuit 610, and the reconstruction circuit 609 indicate circuits that apply processing in the original (i.e., unmapped) domain, including loop filters such as deblocking, ALF, and SAO, motion compensated prediction, chroma intra prediction, addition of the chroma prediction to the chroma residual, and storage of decoded pictures as reference pictures.

In VVC, IBC is allowed to be combined with LMCS. When the two codec tools are jointly enabled, IBC luma prediction is performed in the mapped domain and IBC chroma prediction is done in the original domain, similar to intra mode. Furthermore, as will be mentioned later, it is necessary to generate a hash table for each current picture and use the hash table for a hash-based BV search. In VTM-4.0, a hash table is generated in the mapped luma sample field. In particular, it is necessary to convert the luminance samples of the current picture to a mapped sample domain using an LMCS piecewise linear model and then use the mapped luminance samples to generate a hash table of the picture. In interaction with other inter tools, IBC mode cannot be jointly enabled with inter coding tools including affine mode, merge mode with motion vector differences, combined intra-inter prediction and triangle mode on a given CU.

In the VVC Test Model (VTM) -4.0, a hash-based BV search method is performed for IBC mode at the encoder side. The encoder performs a rate-distortion (RD) check on CUs whose width and height are each no greater than 16 luma samples. For non-merge IBC mode, a BV search is first performed using the hash-based search. If the hash search fails, the current CU's cached BVs from a previous partition path are checked. If the cached BVs still fail to provide a valid BV candidate, a local BV search is finally performed based on conventional block matching.

Fig. 7 is a flow diagram of a Block Vector (BV) estimation process for IBC mode according to one example. For the hash-based BV search, a hash value (i.e., a 32-bit CRC) is calculated for each 4 × 4 block in the original picture and extended to all allowed CU sizes for IBC mode. In particular, for a given CU, a perfect match with another reference CU (of the same size as the given CU) is declared only if the hash values of all 4 × 4 sub-blocks within the given CU match the hashes of the corresponding co-located sub-blocks within the reference CU. To locate the position of the reference block, the hash value (denoted hash*) of the 4 × 4 sub-block within the current CU that is associated with the minimum number of matching 4 × 4 blocks, also referred to as the "leading sub-block," is identified. Then, for each 4 × 4 block in the current picture whose hash value is equal to hash*, a reference CU may be determined whose starting position is set to the top-left position of that 4 × 4 block and whose size is set equal to the width and height of the current CU.
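
The hash-table construction described above can be sketched as follows, using Python's `zlib.crc32` as a stand-in for the codec's 32-bit CRC. All names are illustrative; hashes are computed here for 4 × 4 blocks at every pixel position, and sample values are assumed to fit in one byte.

```python
import zlib

def block_hash(pic, x, y):
    """32-bit CRC of a 4x4 block's samples (zlib.crc32 stands in for
    the codec's CRC definition)."""
    data = bytes(pic[y + j][x + i] & 0xFF for j in range(4) for i in range(4))
    return zlib.crc32(data)

def build_hash_table(pic, w, h):
    """Map each 4x4 block hash to the list of top-left positions at
    which that hash occurs."""
    table = {}
    for y in range(h - 3):
        for x in range(w - 3):
            table.setdefault(block_hash(pic, x, y), []).append((x, y))
    return table

# Tiny 8x8 picture whose content repeats with period 4 in both directions,
# so the 4x4 block at (0, 0) recurs at (4, 0), (0, 4) and (4, 4).
pic = [[(x % 4) + 4 * (y % 4) for x in range(8)] for y in range(8)]
table = build_hash_table(pic, 8, 8)
matches = table[block_hash(pic, 0, 0)]
```

At search time, the encoder looks up the hash of the leading sub-block in such a table to obtain all candidate match positions at once, instead of scanning the picture.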

Fig. 8 shows an example to illustrate how to locate an IBC reference block based on the hash values of the 4 x 4 sub-blocks. If there are multiple reference CUs whose hash values match the hash value of the current CU, the BV signaling cost corresponding to each reference CU is calculated and the one with the smallest cost is selected.

Due to the quad/bi/tri-tree partitioning structure used in VVC, the same block partitions can be obtained by different partition combinations. To speed up the BV estimation process, when a hash-based BV search fails to provide valid BV candidates, a fast BV search algorithm is applied in VTM-4.0 by reusing BV candidates of one particular CU in different partition choices. In particular, when a CU is first encoded, the determined BV for a particular CU will be stored. Then, when the same CU is encoded through another partition path, the stored BV will be directly reused, instead of estimating BV again through a hash-based BV search and a local BV search.

Fig. 9 illustrates spatially neighboring CUs for predictor-based IBC search, according to some examples. If both the hash-based BV search and the cached BV search fail, a local block matching-based BV search will be performed once based on the conventional block matching scheme. Specifically, the local BV search process consists of two separate steps, namely a predictor-based BV search and a spatial BV search. For predictor-based BV search, the BV of five spatially neighboring CUs at positions a0, a1, B0, B1, and B2 (same neighboring positions for inter and IBC merging modes) as shown in fig. 9 are used as BV candidates for the current CU. In addition, for each spatial BV candidate, if the corresponding reference CU of the current CU is also IBC coded, a derived BV candidate may be generated by adding the current BV and the BV of the reference CU.

Fig. 10 illustrates a BV derivation process used in a predictor-based BV search. All of the above BV candidates are evaluated and sorted based on RD cost, which is a Lagrangian-weighted sum of the Sum of Absolute Differences (SAD) of the luma component and the BV signaling bits, and a BV candidate list containing the eight BV candidates with the smallest RD cost is maintained. In the spatial BV search, candidate positions within a predetermined range of reconstructed samples to the left of and above the current CU are examined, and the BV candidate list is updated in the same manner as in the predictor-based BV search. Finally, BV refinement is applied to the BV candidate list by considering the RD cost of both the luma and chroma components. The BV that minimizes the RD cost is selected as the final BV for the CU.
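
The candidate ranking described above can be sketched as follows: each BV candidate is scored with a Lagrangian cost of the form SAD + λ · bits, and the cheapest candidates are kept. This is an illustrative sketch with invented names; the SAD values, bit counts, and λ would come from the encoder.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def rank_bv_candidates(candidates, lam, keep=8):
    """candidates: list of (bv, sad_value, signal_bits) tuples.
    Return the `keep` BVs with the smallest Lagrangian cost
    sad_value + lam * signal_bits."""
    costed = sorted(candidates, key=lambda c: c[1] + lam * c[2])
    return [c[0] for c in costed[:keep]]

# Three toy candidates: (BV, luma SAD, BV signaling bits).
cands = [((-4, 0), 100, 6), ((0, -4), 90, 10), ((-8, -8), 120, 4)]
best = rank_bv_candidates(cands, lam=4.0, keep=2)
```

The final refinement step would then re-evaluate the surviving candidates with both luma and chroma distortion before picking the winner.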

Referring to fig. 8, in order to locate the position of a reference block for a hash-based BV search, a hash value of a leading sub-block within a current CU is first identified. Then, for each 4 × 4 block that possesses the same hash value as the leader sub-block, the reference CU is determined by using the 4 × 4 matched block as the starting position (i.e., the position of the upper left corner). This approach is suboptimal because the leading 4 x 4 sub-block may not always be located in the upper left corner of the current CU, since the relative position of the leading 4 x 4 sub-block within the current CU is not taken into account when locating the reference CU of the current CU. This disregard may reduce the likelihood of hash-based block matching (i.e., reduce the number of matching blocks that can be found), and thus compromise the efficiency of the hash-based BV search.

To address this problem, an improved method of hash-based block matching is proposed herein, as shown in Fig. 11. Fig. 11 illustrates a comparison between the current hash-based matching method and the proposed hash-based matching method, according to an example. Specifically, as shown in Fig. 11, the leading 4 × 4 sub-block within the current CU is first determined in the same manner as in the current hash-based block matching method. However, instead of using each matching 4 × 4 block as the starting position, the proposed method determines the region of the corresponding reference CU by treating the matching 4 × 4 block as the co-located block of the leading sub-block within the reference block.
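
The positional correction this implies is simple: the reference CU's top-left corner is the matched 4 × 4 block's position minus the leading sub-block's offset inside the current CU. The sketch below uses invented names and assumes (x, y) pixel coordinates.

```python
def reference_cu_start(match_pos, leading_offset):
    """Current behavior treats match_pos itself as the reference CU's
    top-left corner; the proposed method subtracts the leading
    sub-block's offset inside the current CU, so the matched 4x4 block
    lands at the co-located position within the reference CU."""
    mx, my = match_pos
    ox, oy = leading_offset
    return (mx - ox, my - oy)

# Leading sub-block at offset (8, 4) inside the CU; matched 4x4 block
# found at picture position (40, 20) -> reference CU starts at (32, 16).
ref_start = reference_cu_start((40, 20), (8, 4))
```

When the leading sub-block happens to sit at the CU's top-left corner, the offset is (0, 0) and the two methods coincide.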

The hash table may be generated by using the mapped luminance samples. This design increases coding complexity due to additional sample level mapping operations applied to compute luma samples for each picture's hash table. To reduce the complexity of IBC hash table generation, a hash table may be generated based on the original luminance samples, i.e., without LMCS mapping.

In addition to hash-based BV estimation for IBC, the hash-based search method can also be applied to the inter-frame MV search, where one hash table is computed for each temporal reference picture of the current picture. The reference CU of the current CU is then identified by comparing the hash values of the two CUs. However, hash table generation for the hash-based inter-frame search is not uniform. Specifically, for intra reference pictures, the corresponding hash table is generated using the mapped luma samples, while for inter reference pictures the hash table is generated using the original luma samples. Given that the hash values of the reference block and the current block may thus be computed in different luma sample domains, this may potentially affect the efficiency of inter-frame hash matching.

Two hash table coordination methods for IBC mode can be applied to inter-frame hash search to solve this problem. First, the hash tables for both intra and inter reference pictures are generated based on the original luma samples, i.e., there is no LMCS mapping. Second, hash tables for both intra-frame reference pictures and inter-frame reference pictures are established based on the mapped luma samples, i.e., there is LMCS mapping.

In the predictor-based BV search, only the BVs of the five blocks directly adjacent to the current CU and their corresponding derived BVs are checked as BV candidates. Such a scheme helps only when at least one of the five neighboring CUs is available. In addition, if none of these CUs is coded in IBC mode, the predictor-based BV search is not applicable, because non-IBC neighboring CUs have no BV candidates available. On the other hand, due to the flexible block partition structure applied in VVC, each CU may be further partitioned by multiple tree partitions (i.e., quadtree, binary tree, and ternary tree).

Accordingly, there may be a strong correlation between BVs of CUs at different coding tree levels or BVs of spatially non-adjacent CUs. For example, there may be a strong correlation between CUs within the same area (e.g., a relatively flat area with less texture). In this case, one CU may select the same or similar BV as that of its parent CU. In another example, for the ternary tree partitioning as shown in fig. 3D and 3E, a CU may be divided into three sub-partitions at a ratio of 1:2:1 in the horizontal or vertical direction. It is generally assumed that in such a partition, there is one foreground object located in the central sub-partition of the block, while the left and right sub-partitions belong to the background. In this case, the BV correlation between the left and right sub-partitions will be stronger than the BV correlation between the center sub-partition and the left (or right) sub-partition.

To further improve IBC coding efficiency, a BV library-based approach is provided, where the BV library contains multiple (e.g., N) BV candidates examined by previous IBC CUs. The candidates in the BV library may be used as BV candidates for predictor-based BV search of the current CU. In particular, the BV library may be set to empty or initialized with some representative BV value at the beginning of a picture. Then, after completion of the BV estimation for the CU, the BV library may be updated by merging another number (e.g., K) of BV candidates examined by the current CU with BV candidates in the BV library to form an updated BV library. The updated BV library will be used for predictor-based BV searches of future CUs.

In addition, to further improve IBC codec efficiency while maintaining the BV library at a reasonable size, pruning may be applied when updating the BV library so that only BV candidates for recently coded CUs that were not present in the BV library prior to the update may be added. In addition, due to the strong correlation between BVs of spatially neighboring CUs, the BV candidates of the most recently coded CU are first included in the updated BV library, followed by the BV in the original BV library.

Fig. 13 illustrates an exemplary process for updating the BV library. The BV library size (i.e., N) and the number of newly added BV candidates (i.e., K) may be set to different values, providing different tradeoffs between coding efficiency and coding complexity. In one example, the values of N and K may be set to 64 and 8, respectively.
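
The BV library update described above, with pruning of duplicates and most-recent-first ordering, can be sketched as follows. This is an illustrative sketch with invented names; the small N and K used in the example are for demonstration only.

```python
def update_bv_library(library, new_bvs, max_size=64, max_new=8):
    """Merge up to max_new candidates from the just-coded CU into the
    library: pruning keeps only candidates not already present, new
    candidates are placed first (most recent CU first), and the result
    is capped at max_size entries."""
    added = []
    for bv in new_bvs[:max_new]:
        if bv not in library and bv not in added:
            added.append(bv)
    kept = [bv for bv in library if bv not in added]
    return (added + kept)[:max_size]

lib = [(-4, 0), (0, -4)]
# The current CU examined (-8, 0), (-4, 0) and (-8, 0) again; only the
# genuinely new candidate (-8, 0) is added, at the front.
lib = update_bv_library(lib, [(-8, 0), (-4, 0), (-8, 0)], max_size=4, max_new=8)
```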

In one example, a BV library is maintained at the encoder. The BV library includes one or more BV candidates obtained from the BV searches of previously coded CUs. A BV candidate list is then generated. The BV candidate list may include all BVs in the BV library, the BVs of the spatially neighboring CUs, and the derived BVs of the current CU. A rate-distortion cost is then calculated for each BV in the BV candidate list. Based on the calculated rate-distortion costs, the BV with the smallest rate-distortion cost may be determined to be the optimal BV for the current CU. In addition, the BV library is updated by adding K BVs from the BV candidate list, replacing K existing BVs in the BV library that were examined by previously coded CUs.

Referring to fig. 10, if a reference CU identified by a neighboring BV of a current CU is also coded by the IBC mode, a derived BV candidate may be generated for a predictor-based BV search of the current CU by adding the current BV and the BV of the reference CU. The derived BV may be generated from a combination of two BVs. In other words, the current BV derivation chain may contain at most two consecutive IBC reference CUs.

However, due to the highly repetitive patterns in screen content video, it is highly likely that one IBC CU has multiple well-matched blocks in one picture. Therefore, to further exploit this characteristic, an extended BV derivation method is proposed in the present disclosure for the predictor-based BV search, which allows multiple BVs to be combined when generating the derived BV. Specifically, similar to the current BV derivation method, the proposed method generates a derived BV if the reference CU indicated by the selected neighboring CU's BV is IBC-coded. However, the proposed method does not stop after a single derivation step; rather, it continues and repeats the BV derivation process until the reference CU indicated by the newly derived BV is no longer coded in IBC mode.

Fig. 14 shows an exemplary extended BV derivation. As shown in Fig. 14, there are L + 1 consecutive IBC reference CUs on the BV derivation chain starting from the current CU, i.e., reference CU_0, reference CU_1, …, reference CU_L. Accordingly, with the proposed method, L different BVs can be derived for the predictor-based BV search, i.e.,

BV_derived(i) = BV_0 + BV_1 + … + BV_i, for i = 1, 2, …, L,

where BV_0 is the BV of the current CU pointing to reference CU_0, and BV_j (j ≥ 1) is the BV of reference CU_(j-1) pointing to reference CU_j.
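
The chain accumulation can be sketched as follows: each derived BV is the running sum of the BVs along the chain of consecutive IBC-coded reference CUs. Names are invented for illustration.

```python
def extended_derived_bvs(bv0, chain_bvs):
    """bv0: BV from the current CU to reference CU_0.
    chain_bvs: BVs of the consecutive IBC-coded reference CUs along the
    derivation chain (reference CU_0's BV first). Returns the L derived
    BVs, each a running sum of BVs along the chain."""
    derived, acc = [], bv0
    for bx, by in chain_bvs:
        acc = (acc[0] + bx, acc[1] + by)
        derived.append(acc)
    return derived

# A chain with two further IBC reference CUs yields two derived BVs:
# (-4,0)+(-2,-2) = (-6,-2) and (-6,-2)+(0,-4) = (-6,-6).
derived = extended_derived_bvs((-4, 0), [(-2, -2), (0, -4)])
```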

when the current CU cannot find a matching reference CU through hash-based matching, a cached BV search is applied by reusing BV candidates of the same block obtained in a previous segmentation path. In the current VTM-4.0, only the best BV is stored when the current block is encoded based on one particular partition structure. Although the same CU may be obtained by different partition combinations, neighboring blocks around the CU may be different when different partition paths are applied.

Fig. 15A to 15C show examples of dividing the same CU by different dividing paths. Specifically, in fig. 15A, CU X is obtained by one quadtree division; in fig. 15B, CU X is obtained by performing one horizontal binary division and then one vertical binary division on the second sub-partition; in fig. 15C, CU X is obtained by performing one vertical binary division and then one horizontal binary division on the second sub-partition.

In addition, as shown in fig. 15A to 15C, neighboring CUs of the current block are different under different partition paths. For example, in fig. 15A, all three neighboring CUs of CU X are the CUs of the IBC codec, whereas in fig. 15B, there are no IBC neighboring CUs around CU X. Due to the different neighboring CUs, different BV predictions may be obtained when predicting BV for CU X, which may result in different BVD signaling overhead. Therefore, when the same CU is encoded based on another partition path, the best BV of one CU generated based on one partition path may not always be optimal.

To further improve IBC coding efficiency, it is proposed in the present disclosure to increase the number of stored BVs used for the cached BV search. In particular, when testing one IBC CU at the encoder, the proposed method does not store only a single best BV, but stores and maintains the first M (where M is a value greater than 1) BV candidates selected based on the corresponding RD costs. Then, when the same CU is coded again through another partition path, the local BV estimation is skipped; instead, the RD costs of the stored BV candidates from the previous partition path are computed, and the BV candidate that minimizes the RD cost is selected as the current best BV of the CU.
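A minimal sketch of storing and reusing the top-M BV candidates across partition paths is given below; the class name `BvCache` and the RD cost values used are illustrative assumptions, not part of the disclosure:

```python
import heapq

class BvCache:
    """Keep the M best (lowest RD cost) BV candidates for one CU.

    Sketch of the proposed cached BV search: on the first partition
    path the top-M candidates are stored; on later paths only these
    stored candidates are re-scored instead of running a full local
    BV search.
    """
    def __init__(self, m=8):
        self.m = m
        self.heap = []  # min-heap of (-rd_cost, bv): heap[0] is worst

    def offer(self, bv, rd_cost):
        entry = (-rd_cost, bv)
        if len(self.heap) < self.m:
            heapq.heappush(self.heap, entry)
        elif entry > self.heap[0]:  # lower cost than current worst
            heapq.heapreplace(self.heap, entry)

    def best(self, rd_cost_fn=None):
        """Best stored BV; optionally re-score with a new cost
        function (e.g., the RD cost of another partition path)."""
        if not self.heap:
            return None
        if rd_cost_fn is None:
            return max(self.heap)[1]
        return min((bv for _, bv in self.heap), key=rd_cost_fn)
```

When the same CU reappears on a second partition path, calling `best` with that path's cost function replaces the skipped local estimation.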

In another example, to further improve the accuracy of the estimated BV, predictor-based BV candidates may additionally be tested during the cached BV search. Specifically, in addition to the BV candidates of the current CU obtained from the previous partition path, the BVs of the five spatially neighboring CUs in Fig. 9 and their corresponding derived BVs are tested to determine the optimal BV of the current CU.

Furthermore, all proposed BV search improvements (e.g., the BV-library-based enhancement and the extended BV derivation for the predictor-based BV search) may be freely combined, and different combinations may provide different trade-offs between coding performance and coding complexity. In one example, all of the above improvements are combined to maximize the efficiency of the BV estimation for the IBC mode.

Fig. 16 is a flowchart of BV estimation when all of the above BV search improvement methods are applied jointly.

In modern video encoders, the Sum of Absolute Differences (SAD) and the Sum of Absolute Transformed Differences (SATD) are two widely used distortion metrics for determining certain coding parameters (e.g., MV/BV, coding mode, and so on) during the encoder rate-distortion (RD) optimization process. SAD measures the similarity between two video CUs by simply accumulating the absolute differences between the samples in one CU and their corresponding samples in the other CU. Because of its low complexity, SAD has been used for RD processes involving a large number of similarity comparisons, e.g., motion estimation at integer sample positions. SATD works by frequency-transforming the corresponding video CUs (usually with the Hadamard transform) and then measuring the similarity of the two transformed CUs (i.e., summing the absolute differences between corresponding transformed coefficients).

Although SATD is more complex due to the additional transform operations, SATD provides a better estimate of the RD cost than SAD because it accounts for the number of bits needed to transmit the residual signal. Thus, SATD is typically used for RD processes that require more accurate RD cost estimation (e.g., motion estimation at fractional sample positions, pre-selection of inter merge candidates and intra prediction modes, and so on).
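For reference, the two metrics may be sketched as follows for 4 × 4 blocks; note that practical encoders (e.g., the VTM) apply additional normalization factors to SATD, which are omitted in this sketch:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def satd_4x4(a, b):
    """SATD of two 4x4 blocks: apply a 2D Hadamard transform to the
    residual, then sum the absolute transformed coefficients."""
    r = [[a[i][j] - b[i][j] for j in range(4)] for i in range(4)]
    h = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    # Separable 2D transform: t = H * r, then t = t * H
    t = [[sum(h[i][k] * r[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]
    t = [[sum(t[i][k] * h[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]
    return sum(abs(v) for row in t for v in row)
```

Transforming each CU and differencing, as described above, is equivalent to transforming the residual, since the Hadamard transform is linear.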

SATD is more accurate than SAD in estimating the RD cost of natural video content, as transforms are usually helpful for coding such content. Due to the nature of camera lenses, natural video content typically shows smooth transitions between adjacent captured sample values. This results in gradually changing residual planes after inter/intra prediction, for which the 2D transform applied to the residual signal is effective for energy compaction. However, because screen content is directly generated by computers, it shows very different characteristics from natural content (e.g., extremely sharp edges or color transitions, large uniform regions with repetitive patterns, and a large number of identical CUs or regions in the same picture). These characteristics make the various prediction schemes (e.g., intra, inter, and IBC prediction) highly efficient for screen content.

Meanwhile, the strong edges in typical screen content video generate a large amount of high-frequency residual energy after transformation, which makes the conventional 2D transform less suitable for coding screen content. Therefore, screen content CUs generally prefer the transform skip mode (i.e., skipping the 2D transform). In this scenario, because the transform is skipped, the RD cost may be calculated using SAD instead of SATD.

To estimate the RD cost more accurately, the SAD and SATD metrics are adaptively selected to measure the difference between two CUs during the encoder RD process. In some examples, a SAD/SATD adaptation method is provided using tile groups as the base unit. In addition, the SAD/SATD adaptation method may also be applied at other coding levels, e.g., the sequence level, the picture/slice level, or even the region level (e.g., each region may contain a certain number of CTUs).

In one example, the SAD/SATD adaptation method may include the following steps to decide whether the RD cost calculation should use SAD or SATD. For each non-overlapping 4 × 4 block in a tile group, the encoder calculates a hash value (e.g., a 32-bit CRC) for the block; meanwhile, for each hash value, the encoder counts the number of 4 × 4 blocks (i.e., the usage) associated with that hash value.

All non-overlapping 4 × 4 blocks in the tile group are then classified into two categories. The first category contains the 4 × 4 blocks covered by the first N most common hash values, and the second category contains all the remaining 4 × 4 blocks not belonging to the first category.

For each 4 × 4 block in the second category, it is checked whether there is another 4 × 4 block in the same category that has the same hash value. If there is at least one matching block, the 4 × 4 block is regarded as a screen content block; otherwise (if there is no matching block), the 4 × 4 block is regarded as a non-screen-content (i.e., natural video content) block. If the percentage of screen content blocks is greater than a predefined threshold, the RD cost calculation will be performed using SAD. Otherwise, SATD is applied to the RD cost calculation.
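The three steps above may be sketched as follows; the use of CRC-32 over raw block bytes and the default values of N and the threshold are illustrative assumptions:

```python
from collections import Counter
import zlib

def choose_distortion_metric(blocks, n_top=2, threshold=0.5):
    """Decide SAD vs. SATD for a tile group, per the described
    adaptation steps.

    blocks: list of 4x4 blocks, each a bytes object of 16 samples
            (the real encoder would hash raw luma; CRC-32 stands in
            here as the example hash).
    Returns "SAD" when the fraction of screen-content blocks among
    the second category exceeds the threshold, else "SATD".
    """
    hashes = [zlib.crc32(blk) for blk in blocks]
    counts = Counter(hashes)
    # First category: blocks covered by the N most common hash values
    top_hashes = {h for h, _ in counts.most_common(n_top)}
    second = [h for h in hashes if h not in top_hashes]
    if not second:
        return "SATD"
    second_counts = Counter(second)
    # A block is a screen-content block if another block in the same
    # category shares its hash value
    sc = sum(1 for h in second if second_counts[h] > 1)
    return "SAD" if sc / len(second) > threshold else "SATD"
```

The first category (highly common hashes, e.g., flat background) is excluded so that only genuinely repeated, non-trivial blocks count toward the screen-content percentage.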

For temporal MV prediction, one temporal motion field is maintained for each reference picture, which stores the MVs of inter-coded CUs and the BVs of IBC-coded CUs and is used for motion derivation in TMVP and SbTMVP. In addition, both TMVP and SbTMVP are always disabled for IBC-coded CUs. This means that only the MVs stored in the temporal motion field buffer can be used by the TMVP and SbTMVP prediction modes; the BVs of IBC-coded CUs stored in the temporal motion field buffer cannot be used for TMVP or SbTMVP. To distinguish MVs from BVs in the motion field buffer, 2 bits of storage are required for each 8 × 8 block to store the corresponding prediction mode (i.e., intra, inter, or IBC mode). This is more than the prediction mode storage used by HEVC, which requires only 1 bit to distinguish the intra mode from the inter mode. In order to reduce the number of bits used for the temporal motion field buffer, it is proposed in the present disclosure to convert the prediction mode of an IBC block into the intra mode before storing it into the temporal motion field.

In another example, it is proposed to enable Temporal Block Vector Prediction (TBVP) for the IBC mode. Specifically, in this method, the temporal BV derivation process (e.g., the selection of the collocated CU) remains the same as in TMVP, except that the BV scaling operation is always skipped. In addition, similar to TMVP, the TBVP predictor can be used in two different ways: 1) if the current CU is coded in the IBC merge mode, the TBVP predictor may be added as one additional IBC merge candidate, besides the BVs of spatially neighboring CUs, from which the BV of the current CU is inherited; 2) if the current CU is coded in the explicit IBC mode, the TBVP predictor may be used as one BV predictor to predict the BV of the current CU, where the value of the resulting BVD is signaled in the bitstream.

HMVP applies to both the inter and IBC modes. In addition, according to some examples, two separate candidate tables are maintained and updated at both the encoder and the decoder: one containing the MVs of previous inter CUs (i.e., the HMVP MV table) and the other containing the BVs of previous IBC CUs (i.e., the HMVP BV table). This doubles the buffer size required for HMVP. To reduce the HMVP storage, in one example of the present disclosure, it is proposed to use the same buffer to store both the MVs and the BVs of previously coded inter and IBC CUs.
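A sketch of such a shared table is given below; tagging each entry with its prediction mode, the table size, and the first-in-first-out pruning details are assumptions for illustration:

```python
class SharedHmvpTable:
    """Single HMVP table storing both inter MVs and IBC BVs, tagged
    by mode, as a sketch of the proposed shared buffer.  Redundancy
    check and FIFO eviction follow the usual HMVP behavior; exact
    details here are assumptions."""
    def __init__(self, size=5):
        self.size = size
        self.table = []            # newest entry last

    def push(self, mode, vec):
        entry = (mode, vec)        # mode: "inter" or "ibc"
        if entry in self.table:    # redundancy check: move to newest
            self.table.remove(entry)
        elif len(self.table) == self.size:
            self.table.pop(0)      # evict the oldest entry
        self.table.append(entry)

    def candidates(self, mode):
        """Candidates for one mode, newest first."""
        return [v for m, v in reversed(self.table) if m == mode]
```

When building the candidate list for a given CU, only entries of the matching mode are taken, so one buffer serves both prediction modes.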

Referring to Fig. 4, according to the IBC design in VVC, in the IBC mode, only the reconstructed samples in the current CTU and the left CTU are allowed to be used as reference samples to predict the current CU. This restriction is enforced to avoid additional data exchanges with external (in other words, non-local) memory, since all reconstructed samples of the current CTU and the left CTU can be stored in on-chip memory. On the other hand, besides the reconstructed samples in the current CTU and the left CTU, there are other reconstructed samples that are typically stored in local on-chip memory. In the present disclosure, these reconstructed samples in the local buffers, in addition to the samples in the current CTU and the left CTU, are also allowed to be used as reference samples for the IBC prediction of the current CU.

One example is to additionally include the reconstructed samples stored in the line buffer for IBC prediction. For intra prediction in VVC, at least one row of reconstructed samples from the above CTU row, spanning the width of the current picture and/or current tile, needs to be kept. These samples may be used as reference samples for the intra prediction of CUs in the next CTU row. The buffer used to hold these reconstructed samples is commonly referred to as the "line buffer". In other words, with the proposed method, the region of reconstructed samples available to IBC comprises the reconstructed samples in the current CTU and the left CTU, plus at least one row of reconstructed samples in the line buffer.

Fig. 17 illustrates the corresponding IBC reference region after considering the reconstructed samples in the line buffer.

Another example is to additionally include, for IBC prediction, the reconstructed samples stored in local buffers prior to loop filtering. For practical hardware codec implementations of VVC, other coding tools in the current VVC require some additional local on-chip memory beyond the reconstructed samples (before the loop filters) stored in the line buffer. In some examples, an Adaptive Loop Filter (ALF) is applied to enhance the reconstructed luma and chroma samples based on finite impulse response (FIR) filters determined by the encoder and transmitted to the decoder.

As shown in Figs. 18A and 18B, in the current ALF, the reconstructed luma and chroma samples are filtered using a 7 × 7 diamond filter and a 5 × 5 diamond filter, respectively. Accordingly, as shown in Figs. 18A and 18B, 3 rows of reconstructed luma samples and 2 rows of reconstructed chroma samples from the above CTU row need to be stored in order to perform the ALF operation on the samples of CUs in the first CU row of the current CTU.

In another example, deblocking is applied as a loop filter in VVC to reduce/remove the blocking artifacts caused by motion compensation and quantization. As illustrated in Fig. 19, in the current deblocking design, up to 4 rows of reconstructed luma samples and 2 rows of reconstructed chroma samples from the above CTU row are needed for deblocking the samples adjacent to the boundary between the current CTU and the above CTU. To further improve IBC performance, in addition to the local storage for the reconstructed samples in the current CTU and the left CTU, it is proposed in this disclosure to include the reconstructed samples stored in all other on-chip memories (e.g., the rows of reconstructed samples kept for ALF, deblocking, SAO, and so on) as additional reference samples for the IBC prediction of the current CU.

In some examples, a video encoding method is provided. The method comprises the following steps: receiving a video picture including a plurality of coding units, wherein each coding unit of the plurality of coding units is predicted from a reference coding unit in the same picture by an Intra Block Copy (IBC) mode, the reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit; dividing the picture into a plurality of non-overlapping blocks and calculating, by an encoder, a hash value for each block of the plurality of non-overlapping blocks; classifying all non-overlapping blocks into at least two categories including a first category and a second category, wherein the first category includes one or more blocks characterizing one or more hash values encompassed by a first set of hash values and the second category includes all remaining blocks; classifying blocks in the second category into at least two groups including a first group, wherein the first group includes one or more blocks that characterize a same hash value as another block in the second category; determining a distortion metric for calculating a difference between samples in a coding unit and samples of a reference coding unit of the coding unit in the same picture; and obtaining an optimal Block Vector (BV) for a first coding unit in the picture based on the distortion metric, wherein the BV for the first coding unit is a displacement between the first coding unit and a reference coding unit in the same picture.

In some examples, the at least two groups may further include a second group, wherein the second group includes all remaining blocks in the second category.

In some examples, the step of determining the distortion metric comprises: using the Sum of Absolute Differences (SAD) as the distortion metric when the percentage of blocks in the first group of the second category is greater than a predetermined threshold; and using the Sum of Absolute Transformed Differences (SATD) as the distortion metric when the percentage of blocks in the first group of the second category is not greater than the predetermined threshold.

In some examples, the obtaining the optimal BV for the first coding unit comprises: identifying a second coding unit corresponding to the first coding unit by matching the hash value of each block in the first coding unit with the hash value of the co-located block in the second coding unit, wherein the hash value of the co-located block in the second coding unit is the same as the hash value of the block in the first coding unit, and the plurality of coding units include the second coding unit.

In some examples, the step of identifying the second coding unit corresponding to the first coding unit further comprises: identifying a lead block in the first coding unit, wherein the lead block is the block in the first coding unit that has the fewest matching blocks with the same hash value in the picture; identifying a second coding unit comprising the co-located block of the lead block, wherein the second coding unit has the same size as the first coding unit and the hash value of the co-located block is the same as the hash value of the lead block; and determining that the second coding unit is a reference coding unit, wherein the hash value of each block in the first coding unit is the same as the hash value of the co-located block in the second coding unit.
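The lead-block matching strategy may be sketched as follows; note that, for simplicity, this sketch searches candidate positions over the whole picture, whereas a real encoder would restrict them to already-reconstructed regions:

```python
from collections import defaultdict

def find_reference_by_hash(cu_hashes, picture_hashes):
    """Hash-based CU matching, a sketch of the lead-block strategy.

    cu_hashes:      2D list of per-4x4-block hashes of the current CU.
    picture_hashes: 2D list of per-4x4-block hashes of the picture.
    Returns the (row, col) of the top-left 4x4 block of a fully
    matching reference CU, or None when no match exists.
    """
    H, W = len(picture_hashes), len(picture_hashes[0])
    h, w = len(cu_hashes), len(cu_hashes[0])
    index = defaultdict(list)
    for i in range(H):
        for j in range(W):
            index[picture_hashes[i][j]].append((i, j))
    # Lead block: the CU block whose hash occurs least often in the
    # picture, minimizing the number of candidates to verify.
    li, lj = min(((i, j) for i in range(h) for j in range(w)),
                 key=lambda p: len(index[cu_hashes[p[0]][p[1]]]))
    for (bi, bj) in index[cu_hashes[li][lj]]:
        oi, oj = bi - li, bj - lj          # candidate CU top-left
        if oi < 0 or oj < 0 or oi + h > H or oj + w > W:
            continue
        # Verify every co-located block of the candidate CU
        if all(picture_hashes[oi + i][oj + j] == cu_hashes[i][j]
               for i in range(h) for j in range(w)):
            return (oi, oj)
    return None
```

Using the rarest hash as the lead block keeps the number of full-CU verifications small even in pictures with many repeated patterns.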

In some examples, the obtaining the optimal BV for the first coding unit further comprises: selecting and maintaining a set of BV candidates based on the determined distortion metric when the first coding unit is encoded a first time based on a first partition path; calculating a rate-distortion cost for each BV candidate in the maintained set of BV candidates when the first coding unit is encoded a second time based on a second partition path; selecting a BV from the set of BV candidates, wherein the selected BV has the minimum rate-distortion cost; and determining the selected BV as the optimal BV for the first coding unit.

In some examples, the obtaining the optimal BV for the first coding unit further comprises: maintaining, at the encoder, a BV library, wherein the BV library comprises one or more BV candidates obtained from the BV searches of previously encoded coding units, the number of the one or more BV candidates is N, and N is a positive integer; generating a BV candidate list, wherein the BV candidate list comprises all BVs in the BV library, the BVs of spatially neighboring coding units, and the derived BVs of the first coding unit; calculating a rate-distortion cost for each BV in the BV candidate list and selecting a BV as the optimal BV for the first coding unit, wherein the selected BV has the minimum rate-distortion cost; and updating the BV library by adding one or more BVs from the BV candidate list in place of one or more existing BVs in the BV library, wherein the updated BV library is used to determine the optimal BV of a future coding unit, the number of the one or more BVs added and the number of the one or more existing BVs replaced are each K, and K is a positive integer.

In some examples, the value of N is 64 and the value of K is 8.
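A minimal sketch of the BV library with N = 64 and K = 8 is given below; the first-in-first-out replacement policy is an assumption, as the disclosure specifies only the counts of added and replaced BVs:

```python
class BvLibrary:
    """BV library sketch: keep up to N candidates from previously
    coded CUs; after each CU, the K best BVs of that CU are added,
    displacing the K oldest entries when the library is full (FIFO
    replacement is an assumption)."""
    def __init__(self, n=64, k=8):
        self.n, self.k = n, k
        self.entries = []          # oldest entry first

    def candidates(self):
        """All stored BVs, for inclusion in the BV candidate list."""
        return list(self.entries)

    def update(self, ranked_bvs):
        """ranked_bvs: BVs of the just-coded CU, best (lowest RD
        cost) first; only the first K are kept."""
        self.entries.extend(ranked_bvs[:self.k])
        if len(self.entries) > self.n:
            # Drop the oldest entries beyond capacity N
            self.entries = self.entries[len(self.entries) - self.n:]
```

Per the example above, `candidates()` would be merged with the spatially neighboring BVs and the derived BVs before the RD cost comparison.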

In some examples, the derived BV of the first coding unit is generated by: identifying a first reference coding unit coded in the IBC mode, wherein the first reference coding unit is pointed to by a first BV of spatially neighboring coding units of the first coding unit coded in the IBC mode; identifying a second BV, wherein the second BV is a BV of the first reference coding unit; generating a first derived BV by adding the first BV and the second BV; identifying a second reference coding unit coded in the IBC mode, wherein the second reference coding unit is pointed to by a second BV from the first reference coding unit; identifying a third BV, wherein the third BV is a BV of the second reference coding unit; generating a second derived BV by adding the first derived BV and the third BV; and generating one or more derived BVs by repeating the above process until the corresponding reference block is not coded by the IBC mode.

In some examples, the spatially neighboring coding units include the following neighboring coding units of the first coding unit: the left, below-left, above, above-right, and above-left neighboring coding units.

In some examples, a computing device is provided. The computing device includes: one or more processors; a non-transitory storage device coupled to the one or more processors; and a plurality of programs stored in the non-transitory storage device, which, when executed by the one or more processors, cause the computing device to perform acts comprising: receiving a video picture including a plurality of coding units, wherein each coding unit of the plurality of coding units is predicted from a reference coding unit in the same picture by an Intra Block Copy (IBC) mode, the reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit; dividing the picture into a plurality of non-overlapping blocks and calculating, by an encoder, a hash value for each block of the plurality of non-overlapping blocks; classifying all non-overlapping blocks into at least two categories including a first category and a second category, wherein the first category includes one or more blocks characterizing one or more hash values encompassed by a first set of hash values and the second category includes all remaining blocks; classifying blocks in the second category into at least two groups including a first group, wherein the first group includes one or more blocks that characterize a same hash value as another block in the second category; determining a distortion metric for calculating a difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit; and obtaining an optimal Block Vector (BV) of a first coding unit in the picture based on the distortion metric, wherein the BV of the first coding unit is a displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.

In some examples, a non-transitory computer-readable storage medium is provided that stores a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform acts comprising: receiving a video picture including a plurality of coding units, wherein each coding unit of the plurality of coding units is predicted from a reference coding unit in the same picture by an Intra Block Copy (IBC) mode, the reference coding unit is a reconstructed coding unit, and the plurality of coding units includes a first coding unit; dividing the picture into a plurality of non-overlapping blocks and calculating, by an encoder, a hash value for each block of the plurality of non-overlapping blocks; classifying all non-overlapping blocks into at least two categories including a first category and a second category, wherein the first category includes one or more blocks characterizing one or more hash values encompassed by a first set of hash values and the second category includes all remaining blocks; classifying blocks in the second category into at least two groups including a first group, wherein the first group includes one or more blocks that characterize a same hash value as another block in the second category; determining a distortion metric for calculating a difference between samples in one coding unit and samples of a reference coding unit in the same picture as the coding unit; and obtaining an optimal Block Vector (BV) of a first coding unit in the picture based on the distortion metric, wherein the BV of the first coding unit is a displacement between the first coding unit and a reference coding unit in the same picture as the first coding unit.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer readable media may include computer readable storage media corresponding to tangible media such as data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the embodiments described herein. The computer program product may include a computer-readable medium.

Further, the above-described methods may be implemented using an apparatus comprising one or more circuits including an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components. The apparatus may use these circuits in conjunction with other hardware or software components for performing the methods described above. Each module, sub-module, unit or sub-unit disclosed above may be implemented, at least in part, using the one or more circuits.

Other examples of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles thereof, including such departures from the present disclosure as come within known or customary practice within the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise examples described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the invention be limited only by the claims appended hereto.
