System and method for performing planar intra-prediction video coding

Document No.: 1722494    Publication date: 2019-12-17

Note: This technology, "System and method for performing planar intra-prediction video coding", was created by Kiran Mukesh Misra, Jie Zhao, and Christopher Andrew Segall on 2017-12-21. Abstract: A method of generating a prediction for a region of video data, the method comprising: receiving a rectangular video block comprising sample values; and for each sample included in the video block, generating a predictive sample value by averaging horizontal and vertical interpolations corresponding to locations of samples within the video block.

1. A method of generating a prediction for a region of video data, the method comprising:

receiving a rectangular video block comprising sample values; and

for each sample included in the video block, generating a predictive sample value by averaging horizontal and vertical interpolations corresponding to positions of samples within the video block.

2. The method of claim 1, wherein the horizontal interpolation is based on a width of the video block.

3. The method of any of claims 1 or 2, wherein the vertical interpolation is based on a height of the video block.

4. The method of any of claims 1-3, wherein the width and the height of the video block are not equal.

5. An apparatus for encoding video data, the apparatus comprising one or more processors configured to perform any one or all combinations of the steps of claims 1-4.

6. The apparatus of claim 5, wherein the apparatus comprises a video encoder.

7. The apparatus of claim 5, wherein the apparatus comprises a video decoder.

8. A system, comprising:

The apparatus of claim 6; and

The apparatus of claim 7.

9. An apparatus for encoding video data, the apparatus comprising means for performing any one or all combinations of the steps of claims 1-4.

10. A non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed, cause one or more processors of a device that encodes video data to perform any and all combinations of the steps of claims 1-4.

Technical Field

The present disclosure relates to video coding, and more particularly, to techniques for partitioning pictures of video data.

Background

Digital video functionality may be incorporated into a variety of devices, including digital televisions, portable or desktop computers, tablet computers, digital recording devices, digital media players, video game devices, cellular telephones including so-called smart phones, medical imaging devices, and the like. Digital video may be encoded according to a video coding standard. Video coding standards may incorporate video compression techniques. Examples of video coding standards include ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), and High Efficiency Video Coding (HEVC). HEVC is described in High Efficiency Video Coding (HEVC), Rec. ITU-T H.265, April 2015, which is incorporated herein by reference and is referred to herein as ITU-T H.265. Extensions and improvements of ITU-T H.265 are currently being considered for the development of next-generation video coding standards. For example, the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), collectively referred to as the Joint Video Exploration Team (JVET), are studying the potential requirements for standardization of future video coding technology with compression capability that significantly exceeds that of the current HEVC standard. The Joint Exploration Model 3 (JEM 3), Algorithm Description of Joint Exploration Test Model 3 (JEM 3), ISO/IEC JTC1/SC29/WG11 document JVET-C1001v3, May 2016, Geneva, CH, which is incorporated herein by reference, describes the coding features under coordinated test model study by JVET as potentially enhancing video coding technology beyond the capabilities of ITU-T H.265. It should be noted that the coding features of JEM 3 are implemented in JEM reference software maintained by the Fraunhofer research organization. Currently, the updated JEM reference software version 3 (JEM 3.0) is available. As used herein, the term JEM is used to refer collectively to the algorithms included in JEM 3 and implementations of the JEM reference software.

Video compression techniques enable a reduction in the data requirements for storing and transmitting video data. Video compression techniques can reduce data requirements by exploiting the inherent redundancies in a video sequence. Video compression techniques may subdivide a video sequence into successively smaller portions (i.e., groups of frames within the video sequence, a frame within a group of frames, slices within a frame, coding tree units (e.g., macroblocks) within a slice, coding blocks within a coding tree unit, etc.). Intra prediction coding techniques (i.e., intra-picture (spatial)) and inter prediction techniques (i.e., inter-picture (temporal)) may be used to generate difference values between a unit of video data to be encoded and a reference unit of video data. The difference values may be referred to as residual data. Residual data may be encoded as quantized transform coefficients. Syntax elements may relate residual data and a reference coding unit (e.g., intra prediction mode indices, motion vectors, and block vectors). Residual data and syntax elements may be entropy encoded. Entropy encoded residual data and syntax elements may be included in a compatible bitstream.

Disclosure of Invention

A method of generating a prediction for a region of video data comprises: receiving a rectangular video block comprising sample values; and for each sample included in the video block, generating a predictive sample value by averaging horizontal and vertical interpolations corresponding to locations of samples within the video block.

drawings

Fig. 1 is a conceptual diagram illustrating an example of a set of pictures encoded according to quadtree plus binary tree (QTBT) partitioning in accordance with one or more techniques of this disclosure.

Fig. 2 is a conceptual diagram illustrating an example of a quadtree plus binary tree (QTBT) in accordance with one or more techniques of this disclosure.

Fig. 3 is a conceptual diagram illustrating quadtree plus binary tree partitioning of video components according to one or more techniques of this disclosure.

Fig. 4 is a conceptual diagram illustrating an example of a video component sampling format in accordance with one or more techniques of this disclosure.

Fig. 5 is a conceptual diagram illustrating a possible coding structure of a block of video data according to one or more techniques of this disclosure.

Fig. 6A is a conceptual diagram illustrating an example of encoding a block of video data according to one or more techniques of this disclosure.

Fig. 6B is a conceptual diagram illustrating an example of encoding a block of video data according to one or more techniques of this disclosure.

Fig. 7 is a block diagram illustrating an example of a system that may be configured to encode and decode video data in accordance with one or more techniques of this disclosure.

Fig. 8 is a block diagram illustrating an example of a video encoder that may be configured to encode video data in accordance with one or more techniques of this disclosure.

Fig. 9 is a conceptual diagram illustrating quadtree plus binary tree partitioning of video components according to one or more techniques of this disclosure.

Fig. 10 is a conceptual diagram illustrating quadtree plus binary tree partitioning of video components according to one or more techniques of this disclosure.

Fig. 11 is a conceptual diagram illustrating an example of a quadtree plus binary tree (QTBT) in accordance with one or more techniques of this disclosure.

Fig. 12 is a conceptual diagram illustrating quadtree plus binary tree partitioning in accordance with one or more techniques of this disclosure.

Fig. 13 is a conceptual diagram illustrating quadtree plus binary tree partitioning in accordance with one or more techniques of this disclosure.

Fig. 14 is a block diagram illustrating an example of a video decoder that may be configured to decode video data in accordance with one or more techniques of this disclosure.

Fig. 15A is a conceptual diagram illustrating an example of performing intra prediction according to one or more techniques of this disclosure.

Fig. 15B is a conceptual diagram illustrating an example of performing intra prediction according to one or more techniques of this disclosure.

Fig. 16A is a conceptual diagram illustrating an example of performing intra prediction according to one or more techniques of this disclosure.

Fig. 16B is a conceptual diagram illustrating an example of performing intra prediction according to one or more techniques of this disclosure.

Detailed Description

In general, this disclosure describes various techniques for encoding video data. In particular, this disclosure describes techniques for partitioning pictures of video data. It should be noted that although the techniques of this disclosure are described with respect to ITU-T H.264, ITU-T H.265, and JEM, the techniques of this disclosure are generally applicable to video coding. For example, the coding techniques described herein may be incorporated into video coding systems (including video coding systems based on future video coding standards) that include block structures, intra-prediction techniques, inter-prediction techniques, transform techniques, filtering techniques, and/or entropy coding techniques other than those included in ITU-T H.265 and JEM. Accordingly, references to ITU-T H.264, ITU-T H.265, and/or JEM are for descriptive purposes and should not be construed as limiting the scope of the techniques described herein. Further, it should be noted that documents incorporated by reference herein are for descriptive purposes and should not be construed as limiting or creating ambiguity regarding the terms used herein. For example, where an incorporated reference provides a different definition of a term than another incorporated reference and/or than that used herein, that term should be interpreted in a manner that broadly includes each respective definition and/or includes each particular definition in the alternative.

In one example, an apparatus for generating a prediction for a region of video data comprises: one or more processors configured to: receive a rectangular video block comprising sample values; and for each sample included in the video block, generate a predictive sample value by averaging horizontal and vertical interpolations corresponding to locations of samples within the video block.

In one example, a non-transitory computer-readable storage medium comprises instructions stored thereon that, when executed, cause one or more processors of a device to: receive a rectangular video block comprising sample values; and for each sample included in the video block, generate a predictive sample value by averaging horizontal and vertical interpolations corresponding to locations of samples within the video block.

In one example, an apparatus comprises: means for receiving a rectangular video block comprising sample values; and means for generating, for each sample included in the video block, a predictive sample value by averaging horizontal and vertical interpolations corresponding to positions of samples within the video block.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Video content typically comprises a video sequence consisting of a series of frames (or pictures). A series of frames may also be referred to as a group of pictures (GOP). Each video frame or picture may include a plurality of slices or tiles, where a slice or tile includes a plurality of video blocks. As used herein, the term video block may generally refer to a region of a picture, or may more specifically refer to a maximum array of sample values, sub-partitions thereof, and/or corresponding structures that may be predictively encoded. Furthermore, the term "current video block" may refer to a region of a picture being encoded or decoded. A video block may be defined as an array of sample values that may be predictively coded. It should be noted that in some cases, pixel values may be described as including sample values for various components of video data, which may also be referred to as color components (e.g., luminance (Y) and chrominance (Cb and Cr) components or red, green, and blue components). It should be noted that in some cases, the terms pixel values and sample values may be used interchangeably. Video blocks may be ordered within a picture according to a scanning pattern (e.g., a raster scan). A video encoder may perform predictive encoding on video blocks and their sub-partitions. A video block and its sub-partitions may be referred to as nodes.

ITU-T H.264 specifies a macroblock comprising 16 × 16 luma samples. That is, in ITU-T H.264, a picture is segmented into macroblocks. ITU-T H.265 specifies an analogous Coding Tree Unit (CTU) structure. In ITU-T H.265, pictures are segmented into CTUs. In ITU-T H.265, for a picture, the CTU size may be set to include 16 × 16, 32 × 32, or 64 × 64 luma samples. In ITU-T H.265, a CTU consists of a respective Coding Tree Block (CTB) for each component of the video data, e.g., luma (Y) and chroma (Cb and Cr). Furthermore, in ITU-T H.265, a CTU may be partitioned according to a quadtree (QT) partitioning structure, which results in the CTBs of the CTU being partitioned into Coding Blocks (CBs). That is, in ITU-T H.265, a CTU may be partitioned into quadtree leaf nodes. According to ITU-T H.265, one luma CB together with two corresponding chroma CBs and associated syntax elements is referred to as a Coding Unit (CU). In ITU-T H.265, the minimum allowed size of a CB may be signaled. In ITU-T H.265, the minimum allowed size for a luma CB is 8 × 8 luma samples. In ITU-T H.265, the decision to encode a picture region using intra prediction or inter prediction is made at the CU level.

In ITU-T H.265, a CU is associated with a Prediction Unit (PU) structure whose root is at the CU. In ITU-T H.265, the PU structure allows luma and chroma CBs to be split in order to generate corresponding reference samples. That is, in ITU-T H.265, luma and chroma CBs may be split into respective luma and chroma Prediction Blocks (PBs), where a PB includes a block of sample values to which the same prediction is applied. In ITU-T H.265, a CB may be partitioned into 1, 2, or 4 PBs. ITU-T H.265 supports PB sizes from 64 × 64 samples down to 4 × 4 samples. In ITU-T H.265, square PBs are supported for intra prediction, where a CB may form the PB or the CB may be split into four square PBs (i.e., intra prediction PB size types include M × M or M/2 × M/2, where M is the height and width of the square CB). In ITU-T H.265, in addition to square PBs, rectangular PBs are supported for inter prediction, where a CB may be halved vertically or horizontally to form PBs (i.e., inter prediction PB types include M × M, M/2 × M/2, M/2 × M, or M × M/2). Furthermore, it should be noted that in ITU-T H.265, for inter prediction, four asymmetric PB partitions are supported, where the CB is partitioned into two PBs at one quarter of the height (at the top or bottom) or one quarter of the width (at the left or right side) of the CB (i.e., asymmetric partitions include M/4 × M left, M/4 × M right, M × M/4 top, and M × M/4 bottom). Intra prediction data (e.g., intra prediction mode syntax elements) or inter prediction data (e.g., motion data syntax elements) corresponding to a PB are used to generate reference and/or prediction sample values for the PB.

JEM specifies that the maximum size of a CTU is 256 × 256 luma samples. JEM specifies a quadtree plus binary tree (QTBT) block structure. In JEM, the QTBT structure enables quadtree leaf nodes to be further partitioned by a binary tree (BT) structure. That is, in JEM, the binary tree structure enables quadtree leaf nodes to be recursively divided vertically or horizontally. Fig. 1 shows an example in which a CTU (e.g., a CTU having a size of 256 × 256 luma samples) is partitioned into quadtree leaf nodes and the quadtree leaf nodes are further partitioned according to a binary tree. That is, in Fig. 1, the dashed lines represent additional binary tree partitions within the quadtree. Thus, the binary tree structure in JEM enables square and rectangular leaf nodes, where each leaf node includes a CB. As shown in Fig. 1, a picture included in a GOP may include slices, where each slice includes a sequence of CTUs, and each CTU may be partitioned according to a QTBT structure. Fig. 1 shows an example of QTBT partitioning for one CTU included in a slice. Fig. 2 is a conceptual diagram illustrating an example of a QTBT corresponding to the example QTBT partitioning shown in Fig. 1.

In JEM, the QTBT is signaled by signaling QT split flag and BT split mode syntax elements. When a QT split flag has a value of 1, a QT split is indicated. When a QT split flag has a value of 0, a BT split mode syntax element is signaled. When a BT split mode syntax element has a value of 0, no binary split is indicated. When a BT split mode syntax element has a value of 1, a vertical split mode is indicated. When a BT split mode syntax element has a value of 2, a horizontal split mode is indicated. Further, BT splitting may be performed until a maximum BT depth is reached. Thus, according to JEM, the QTBT shown in Fig. 2 may be signaled based on the pseudo syntax provided in Table 1:

TABLE 1

In one example, when the maximum QT depth is reached, the signaling of the QT flag may be skipped and its value may be inferred, e.g., to be 0. In one example, when the current depth is less than the minimum QT depth, the signaling of the QT flag may be skipped and its value may be inferred, e.g., to be 1. In one example, when a maximum depth for the signaling of a partition type is reached, the associated syntax element may not be signaled in the bitstream and its value may be inferred. In one example, when a minimum depth for the signaling of a partition type has not been reached, the associated syntax element may not be signaled in the bitstream and its value may be inferred. In one example, when QT splitting is not allowed and the current depth is less than the minimum BT depth, then the signaling of the BT split may be modified such that the BT split mode cannot be equal to 0.
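
As a rough illustration of the split signaling and inference rules just described, the following Python sketch shows how a QT split flag and a BT split mode might be written or inferred; the function and parameter names (e.g., max_qt_depth, min_qt_depth) are hypothetical and do not correspond to JEM source code or normative syntax.

```python
def code_qt_split_flag(qt_split, qt_depth, max_qt_depth, min_qt_depth, bitstream):
    """Write the QT split flag, or infer its value when signaling is skipped."""
    if qt_depth >= max_qt_depth:
        return 0                               # inferred: no further QT split
    if qt_depth < min_qt_depth:
        return 1                               # inferred: a QT split must occur
    bitstream.append(1 if qt_split else 0)     # explicitly signaled
    return 1 if qt_split else 0

def code_bt_split_mode(bt_mode, bt_depth, max_bt_depth, bitstream):
    """Write the BT split mode (0 = no split, 1 = vertical, 2 = horizontal)."""
    if bt_depth >= max_bt_depth:
        return 0                               # inferred: no further BT split
    bitstream.append(bt_mode)                  # entropy coding omitted for brevity
    return bt_mode
```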

In one example, the following tree traversal may be used to signal the split decisions. For example:

1. Signal the split decision of the current node.

2. For i from 1 to the number of child nodes of the current node (step size 1), perform the following:

a. Determine the child node n corresponding to i (this may be based on a lookup, e.g., based on the split mode of the current node).

b. Traverse the subtree rooted at the child node n by recursively invoking the traversal function.

In one example, the following tree traversal may be used to signal the split decisions. For example:

1. For i from 1 to the number of child nodes of the current node (step size 1), perform the following:

a. Determine the child node n corresponding to i (this may be based on a lookup, e.g., based on the split mode of the current node).

b. Traverse the subtree rooted at the child node n by recursively invoking the traversal function.

c. Signal the split decision of the current node.

In one example, the following tree traversal may be used to signal the split decisions. For example:

1. For i from 1 to the number of child nodes of the current node (step size 1), perform the following:

a. Determine the child node n corresponding to i (this may be based on a lookup, e.g., based on the split mode of the current node).

b. Traverse the subtree rooted at the child node n by recursively invoking the traversal function.

2. Signal the split decision of the current node.

In one example, the tree may be traversed at increasing depths. In this case, all split decisions for nodes at a particular depth may be signaled before proceeding to the next depth.
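
A compact sketch of the traversal orders described above (split decision before the child subtrees, after the child subtrees, and depth by depth) is given below. The Node class and the use of a plain list as the "signaled" output are illustrative simplifications; the entropy coding of each decision is omitted.

```python
from collections import deque

class Node:
    def __init__(self, split_mode=0, children=None):
        self.split_mode = split_mode          # 0 = leaf / no split
        self.children = children or []        # ordering determined by split_mode

def traverse_split_first(node, out):
    """Signal the split decision of the current node, then recurse into each child."""
    out.append(node.split_mode)
    for child in node.children:               # i = 1 .. number of child nodes
        traverse_split_first(child, out)

def traverse_children_first(node, out):
    """Recurse into each child subtree, then signal the split decision of the node."""
    for child in node.children:
        traverse_children_first(child, out)
    out.append(node.split_mode)

def traverse_breadth_first(root, out):
    """Depth by depth: all split decisions at one depth precede the next depth."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        out.append(node.split_mode)
        queue.extend(node.children)
```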

As shown in Fig. 2 and Table 1, the QT split flag syntax element and the BT split mode syntax element are associated with depths, where a depth of zero corresponds to the root of the QTBT and higher-value depths correspond to subsequent depths beyond the root. Furthermore, in JEM, the luma and chroma components may have separate QTBT partitions. That is, in JEM, the luma and chroma components may be partitioned independently by signaling respective QTBTs. Fig. 3 shows an example of a CTU partitioned according to a QTBT for the luma component and an independent QTBT for the chroma components. As shown in Fig. 3, when independent QTBTs are used to partition a CTU, the CBs of the luma component are not required to be, and need not be, aligned with the CBs of the chroma components. Currently, in JEM, an independent QTBT structure is enabled for slices coded using intra prediction techniques. It should be noted that in some cases it may be desirable to derive the value of a chroma variable from an associated luma variable value. In these cases, the sample position in chroma and the chroma format may be used to determine the corresponding sample position in luma in order to determine the associated luma variable value.

In addition, it should be noted that JEM includes the following parameters for the signaling of the QTBT tree:

CTU size: a root node size of the quadtree (e.g., 256 × 256, 128 × 128, 64 × 64, 32 × 32, 16 × 16 luma samples);

Minimum QT size (MinQTSize): the minimum allowed quadtree leaf node size (e.g., 16 × 16, 8 × 8 luma samples);

Maximum BT size (MaxBTSize): the maximum allowed binary tree root node size, i.e., the maximum size of a leaf quadtree node that can be partitioned by binary splitting (e.g., 64 × 64 luma samples);

Maximum BT depth (MaxBTDepth): maximum allowed binary tree depth, i.e., the lowest level at which binary splitting can occur, where a quadtree leaf node is the root (e.g., 3);

Minimum BT size (MinBTSize): minimum allowed binary tree leaf node size; i.e. the minimum width or height of a binary leaf node (e.g. 4 luma samples).

It should be noted that in some examples, MinQTSize, MaxBTSize, MaxBTDepth, and/or MinBTSize may be different for different components of the video.
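
As an illustration, the QTBT signaling parameters listed above can be collected in a small configuration record and used to decide whether a given split is allowed; the checks below are a simplified reading of the parameter definitions above, not the JEM decision logic, and the example values are only of the kind quoted in this disclosure.

```python
from dataclasses import dataclass

@dataclass
class QTBTParams:
    ctu_size: int       # quadtree root node size (luma samples per side)
    min_qt_size: int    # minimum allowed quadtree leaf node size
    max_bt_size: int    # maximum size of a quadtree leaf that may be split by BT
    max_bt_depth: int   # maximum allowed binary tree depth
    min_bt_size: int    # minimum allowed width or height of a binary leaf node

def qt_split_allowed(node_size, params):
    # A quadtree node may be split as long as its children are not smaller than MinQTSize.
    return node_size // 2 >= params.min_qt_size

def bt_split_allowed(width, height, bt_depth, vertical, params):
    # Binary splitting starts on quadtree leaves no larger than MaxBTSize and may
    # continue until MaxBTDepth is reached or the split would violate MinBTSize.
    if max(width, height) > params.max_bt_size or bt_depth >= params.max_bt_depth:
        return False
    halved = width // 2 if vertical else height // 2
    return halved >= params.min_bt_size

# Example values of the kind quoted in this disclosure (illustrative only).
params = QTBTParams(ctu_size=128, min_qt_size=8, max_bt_size=64,
                    max_bt_depth=3, min_bt_size=4)
assert qt_split_allowed(16, params) is True
assert bt_split_allowed(64, 64, 0, vertical=True, params=params) is True
```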

In JEM, a CB is used for prediction without any further partitioning. That is, in JEM, a CB may be a block of sample values to which the same prediction is applied. Thus, in JEM, a QTBT leaf node may be analogous to a PB in ITU-T H.265.

The video sampling format, which may also be referred to as a chroma format, may define the number of chroma samples included in a CU relative to the number of luma samples included in the CU. For example, for the 4:2:0 sampling format, the sampling rate of the luma component is twice that of the chroma components in both the horizontal and vertical directions. As a result, for a CU formatted according to the 4:2:0 format, the width and height of the sample array for the luma component are twice the width and height of each sample array for the chroma components. Fig. 4 is a conceptual diagram illustrating an example of a coding unit formatted according to the 4:2:0 sample format. Fig. 4 shows the relative positions of the chroma samples with respect to the luma samples within a CU. As described above, a CU is typically defined in terms of the number of horizontal and vertical luma samples. Thus, as shown in Fig. 4, a 16 × 16 CU formatted according to the 4:2:0 sample format includes 16 × 16 samples for the luma component and 8 × 8 samples for each chroma component. Further, in the example shown in Fig. 4, the relative positions of chroma samples with respect to luma samples for video blocks neighboring the 16 × 16 CU are shown. For a CU formatted according to the 4:2:2 format, the width of the sample array for the luma component is twice the width of the sample array for each chroma component, but the height of the sample array for the luma component is equal to the height of the sample array for each chroma component. Furthermore, for a CU formatted according to the 4:4:4 format, the sample array for the luma component has the same width and height as the sample array for each chroma component.
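
The relationship between luma and chroma block dimensions implied by the sampling formats described above can be expressed directly; the sketch below is illustrative and assumes the common subsampling factors for the 4:2:0, 4:2:2, and 4:4:4 formats.

```python
# Chroma block dimensions implied by the luma dimensions and the chroma format.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),   # luma sampled at 2x the chroma rate horizontally and vertically
    "4:2:2": (2, 1),   # 2x horizontally, same rate vertically
    "4:4:4": (1, 1),   # same resolution for all components
}

def chroma_dimensions(luma_width, luma_height, chroma_format):
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    return luma_width // sub_w, luma_height // sub_h

# A 16x16 CU in 4:2:0 carries 16x16 luma samples and 8x8 samples per chroma plane.
assert chroma_dimensions(16, 16, "4:2:0") == (8, 8)
assert chroma_dimensions(16, 16, "4:2:2") == (8, 16)
assert chroma_dimensions(16, 16, "4:4:4") == (16, 16)
```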

As described above, intra prediction data or inter prediction data is used to generate reference sample values for a block of sample values. The difference between sample values included in a current PB, or another type of picture region structure, and associated reference samples (e.g., reference samples generated using a prediction) may be referred to as residual data. The residual data may include respective arrays of difference values corresponding to each component of the video data. The residual data may be in the pixel domain. A transform, such as a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), an integer transform, a wavelet transform, or a conceptually similar transform, may be applied to an array of difference values to generate transform coefficients. It should be noted that in ITU-T H.265, a CU is associated with a Transform Unit (TU) structure whose root is at the CU level. That is, in ITU-T H.265, in order to generate transform coefficients, an array of difference values may be subdivided (e.g., four 8 × 8 transforms may be applied to a 16 × 16 array of residual values). Such subdivisions of difference values may be referred to as Transform Blocks (TBs) for each component of the video data. It should be noted that in ITU-T H.265, TBs are not necessarily aligned with PBs. Fig. 5 shows an example of alternative PB and TB combinations that may be used to encode a particular CB. Further, it should be noted that in ITU-T H.265, a TB may have the following sizes: 4 × 4, 8 × 8, 16 × 16, and 32 × 32.

It should be noted that in JEM, the residual values corresponding to a CB are used to generate transform coefficients without any further partitioning. That is, in JEM, a QTBT leaf node may be analogous to both a PB and a TB in ITU-T H.265. It should be noted that in JEM, a core transform and a subsequent secondary transform may be applied (in the video encoder) to generate transform coefficients. For a video decoder, the order of the transforms is reversed. Furthermore, in JEM, whether or not a secondary transform is applied to generate transform coefficients may depend on the prediction mode.

A quantization process may be performed on the transform coefficients. Quantization scales the transform coefficients in order to change the amount of data required to represent a group of transform coefficients. Quantization may include dividing the transform coefficients by a quantization scale factor and applying any associated rounding function (e.g., rounding to the nearest integer). The quantized transform coefficients may be referred to as coefficient level values. Inverse quantization (or "dequantization") may include multiplying the coefficient level values by the quantization scale factor. It should be noted that, as used herein, the term quantization process may refer in some cases to division by a scale factor to generate level values, and in some cases to multiplication by a scale factor to recover transform coefficients. That is, the quantization process may refer to quantization in some cases and to inverse quantization in other cases. Furthermore, it should be noted that although the quantization processes in the following examples are described with respect to arithmetic operations associated with decimal notation, such descriptions are for illustrative purposes and should not be construed as limiting. For example, the techniques described herein may be implemented in a device using binary operations or the like. For example, the multiplication and division operations described herein may be implemented using bit shifting operations and the like.
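
A minimal sketch of the scalar quantization and inverse quantization described above, including a variant that uses only bit shifting when the scale factor is restricted to a power of two, is shown below; it is illustrative only and is not the normative ITU-T H.265 scaling process.

```python
def quantize(coeff, scale):
    """Forward quantization: divide by the scale factor and round to nearest."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / scale + 0.5)     # coefficient level value

def dequantize(level, scale):
    """Inverse quantization: multiply the level value by the scale factor."""
    return level * scale

# The same idea using only binary operations (scale restricted to a power of two),
# as suggested by the bit-shifting remark above.
def quantize_shift(coeff, shift):
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + (1 << (shift - 1))) >> shift)

def dequantize_shift(level, shift):
    return level << shift
```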

Fig. 6A and Fig. 6B are conceptual diagrams illustrating examples of encoding a block of video data. As shown in Fig. 6A, a current block of video data (e.g., a CB corresponding to a video component) is encoded by generating a residual by subtracting a set of prediction values from the current block of video data, performing a transform on the residual, and quantizing the transform coefficients to generate level values. As shown in Fig. 6B, the current block of video data is decoded by performing inverse quantization on the level values, performing an inverse transform, and adding a set of prediction values to the resulting residual. It should be noted that in the examples of Fig. 6A and Fig. 6B, the sample values of the reconstructed block differ from the sample values of the current video block that is encoded. In this way, the coding may be said to be lossy. However, the difference in sample values may be considered acceptable or imperceptible to a viewer of the reconstructed video. Further, as shown in Fig. 6A and Fig. 6B, scaling is performed using an array of scale factors.
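
The encode/decode round trip of Fig. 6A and Fig. 6B can be sketched as follows, using an orthonormal DCT-II matrix and a single scalar scale factor as stand-ins for the actual core transform and scaling; the reconstruction is generally close to, but not identical to, the source block, illustrating the lossy nature of the process.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis, used here as a stand-in for the core transform."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_block(current, prediction, scale):
    residual = current.astype(np.int64) - prediction      # Fig. 6A: subtract prediction
    d = dct_matrix(current.shape[0])
    coeffs = d @ residual @ d.T                            # separable 2-D transform
    return np.round(coeffs / scale).astype(np.int64)       # quantize to level values

def decode_block(levels, prediction, scale):
    d = dct_matrix(levels.shape[0])
    coeffs = levels * scale                                # inverse quantization
    residual = d.T @ coeffs @ d                            # inverse transform
    return np.round(residual + prediction).astype(np.int64)  # Fig. 6B: add prediction

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(8, 8))
pred = np.full((8, 8), int(cur.mean()))
rec = decode_block(encode_block(cur, pred, scale=16), pred, scale=16)
# The reconstruction is close to, but generally not identical to, the source (lossy).
```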

In ITU-T H.265, an array of scale factors is generated by selecting a scaling matrix and multiplying each entry in the scaling matrix by a quantization scale factor. In ITU-T H.265, a scaling matrix is selected based on the prediction mode and the color component, where scaling matrices of the following sizes are defined: 4 × 4, 8 × 8, 16 × 16, and 32 × 32. It should therefore be noted that ITU-T H.265 does not define scaling matrices for sizes other than 4 × 4, 8 × 8, 16 × 16, and 32 × 32. In ITU-T H.265, the value of the quantization scale factor may be determined by a quantization parameter, QP. In ITU-T H.265, the QP can take 52 values from 0 to 51, and a change of 1 in QP generally corresponds to a change of approximately 12% in the quantization scale factor value. Further, in ITU-T H.265, a QP value for a set of transform coefficients may be derived using a predictive quantization parameter value (which may be referred to as a predictive QP value or a QP predictor) and an optionally signaled quantization parameter delta value (which may be referred to as a QP delta value or a delta QP value). In ITU-T H.265, the quantization parameter may be updated for each CU, and a quantization parameter may be derived for each of the luma (Y) and chroma (Cb and Cr) components.
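
One commonly cited approximation consistent with the figures above (52 QP values and roughly a 12% change in the scale factor per QP step, i.e., a doubling every 6 steps) is sketched below; it is an approximation for illustration only, not the normative derivation of the ITU-T H.265 quantization scale factor.

```python
def quant_step(qp):
    """Approximate quantization step size: doubles every 6 QP (about +12% per step)."""
    return 2.0 ** ((qp - 4) / 6.0)

ratio = quant_step(27) / quant_step(26)   # ~1.122, i.e. roughly a 12% increase per QP
```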

As shown in Fig. 6A, the quantized transform coefficients are encoded into a bitstream. The quantized transform coefficients and syntax elements (e.g., syntax elements indicating the coding structure of a video block) may be entropy encoded according to an entropy encoding technique. Examples of entropy encoding techniques include Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), Probability Interval Partitioning Entropy coding (PIPE), and the like. The entropy encoded quantized transform coefficients and corresponding entropy encoded syntax elements may form a compatible bitstream that can be used to reproduce video data at a video decoder. An entropy encoding process may include performing binarization on syntax elements. Binarization refers to a process of converting the value of a syntax element into a sequence of one or more bits. These bits may be referred to as "bins". Binarization is a lossless process and may include one or a combination of the following coding techniques: fixed length coding, unary coding, truncated Rice coding, Golomb coding, k-th order exponential Golomb coding, and Golomb-Rice coding. For example, binarization may include representing the integer value 5 of a syntax element as 00000101 using an 8-bit fixed length binarization technique, or representing the integer value 5 as 11110 using a unary coding binarization technique. As used herein, the terms fixed length coding, unary coding, truncated Rice coding, Golomb coding, k-th order exponential Golomb coding, and Golomb-Rice coding may each refer to general implementations of these techniques and/or more specific implementations of these coding techniques. For example, a Golomb-Rice coding implementation may be specifically defined according to a video coding standard, such as ITU-T H.265. The entropy encoding process further includes encoding bin values using lossless data compression algorithms. In the example of CABAC, for a particular bin, a context model may be selected from a set of available context models associated with the bin. In some examples, a context model may be selected based on the values of previous bins and/or previous syntax elements. A context model may identify the probability of a bin having a particular value. For example, a context model may indicate a 0.7 probability of coding a 0-valued bin and a 0.3 probability of coding a 1-valued bin. It should be noted that in some cases the sum of the probability of coding a 0-valued bin and the probability of coding a 1-valued bin may not equal 1. After selecting an available context model, a CABAC entropy encoder may arithmetically encode the bin based on the identified context model. The context model may be updated based on the value of the encoded bin. The context model may be updated based on associated variables stored with the context, e.g., an adaptive window size or the number of bins encoded using the context. It should be noted that, according to ITU-T H.265, a CABAC entropy encoder may be implemented such that some syntax elements may be entropy encoded using arithmetic coding without using an explicitly assigned context model; such coding may be referred to as bypass coding.
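
A small sketch of the two binarization examples quoted above is given below. Note that unary binarization conventions vary across texts (for example, in the number of leading ones and the terminating bit); the sketch follows the convention implied by the example above, in which the integer value 5 is represented as 11110.

```python
def fixed_length(value, num_bits):
    """Fixed-length binarization, most significant bit first."""
    return format(value, "0{}b".format(num_bits))

def unary(value):
    """Unary binarization as in the example above: (value - 1) ones followed by a zero."""
    return "1" * (value - 1) + "0"

assert fixed_length(5, 8) == "00000101"   # integer value 5 as an 8-bit fixed-length code
assert unary(5) == "11110"                # integer value 5 as a unary code
```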

As described above, intra prediction data or inter prediction data may associate a region of a picture (e.g., a PB or a CB) with corresponding reference samples. For intra prediction coding, an intra prediction mode may specify the location of reference samples within a picture. In ITU-T H.265, the defined possible intra prediction modes include a planar (i.e., surface fitting) prediction mode (predMode: 0), a DC (i.e., flat ensemble averaging) prediction mode (predMode: 1), and 33 angular prediction modes (predMode: 2-34). In JEM, the defined possible intra prediction modes include a planar prediction mode (predMode: 0), a DC prediction mode (predMode: 1), and 65 angular prediction modes (predMode: 2-66). It should be noted that the planar and DC prediction modes may be referred to as non-directional prediction modes, and the angular prediction modes may be referred to as directional prediction modes. It should be noted that the techniques described herein are generally applicable regardless of the number of defined possible prediction modes.

As described above, the planar prediction mode defined according to ITU-T H.265 may be described as a surface fit. The planar prediction mode defined according to ITU-T H.265 consists of averaging two linear predictions. That is, in ITU-T H.265, for each sample included in a CB, the corresponding prediction is determined as the average of two linear predictions. The first, horizontal, linear prediction is generated by interpolating between the value of the reconstructed sample located in the column adjacent to the left of the CB and having the same vertical position as the current sample (i.e., p[-1][y], defined below) and the value of the reconstructed sample located in the row adjacent above the CB at the position immediately to the right of the rightmost column of the CB (shown as T in Fig. 15A). The second, vertical, linear prediction is generated by interpolating between the value of the reconstructed sample located in the row adjacent above the CB and having the same horizontal position as the current sample (i.e., p[x][-1], defined below) and the value of the reconstructed sample located in the column adjacent to the left of the CB at the position immediately below the bottommost row of the CB (shown as L in Fig. 15A). Thus, referring to Fig. 15A, the planar prediction mode defined according to ITU-T H.265 can be generally described as the average of (1) the interpolation of T and p[-1][y] and (2) the interpolation of L and p[x][-1]. The following equation provides the formal definition of the planar prediction mode provided in ITU-T H.265.

predSamples[x][y]=((nTbS-1-x)*p[-1][y]+(x+1)*p[nTbS][-1]+(nTbS-1-y)*p[x][-1]+(y+1)*p[-1][nTbS]+nTbS)>>(Log2(nTbS)+1)

where:

nTbS specifies the size of the corresponding transform block;

p[-1][y] is the sample value of the reconstructed sample located in the column adjacent to the left of the CB and having the same vertical position as the current sample;

p[nTbS][-1] is the sample value of T;

p[x][-1] is the sample value of the reconstructed sample located in the row adjacent above the CB and having the same horizontal position as the current sample;

p[-1][nTbS] is the sample value of L;

x >> y is an arithmetic right shift of a two's complement integer representation of x by y binary digits; and

Log2(x) is the base-2 logarithm of x.

Fig. 15B shows an example in which, for the current sample C, p[-1][y] is denoted as B and p[x][-1] is denoted as A.
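
The planar prediction equation above can be transcribed directly into code. The sketch below is a straightforward reading of that equation for a square nTbS × nTbS block; the array names used for the neighboring reconstructed samples are illustrative.

```python
import numpy as np

def planar_prediction_h265(p_left, p_above, p_bottom_left, p_above_right, nTbS):
    """Planar prediction for an nTbS x nTbS block, following the equation above.

    p_left[y]     = p[-1][y]     (reconstructed column immediately left of the block)
    p_above[x]    = p[x][-1]     (reconstructed row immediately above the block)
    p_bottom_left = p[-1][nTbS]  (sample L in Fig. 15A)
    p_above_right = p[nTbS][-1]  (sample T in Fig. 15A)
    """
    pred = np.zeros((nTbS, nTbS), dtype=np.int64)
    shift = int(np.log2(nTbS)) + 1
    for y in range(nTbS):
        for x in range(nTbS):
            pred[y][x] = ((nTbS - 1 - x) * p_left[y] + (x + 1) * p_above_right +
                          (nTbS - 1 - y) * p_above[x] + (y + 1) * p_bottom_left +
                          nTbS) >> shift
    return pred

# Example: a 4x4 block with flat neighboring samples predicts a flat surface.
pred = planar_prediction_h265(p_left=[128] * 4, p_above=[128] * 4,
                              p_bottom_left=128, p_above_right=128, nTbS=4)
assert (pred == 128).all()
```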

For inter-prediction coding, Motion Vectors (MVs) identify reference samples in pictures other than the picture of the video block to be encoded, thereby exploiting temporal redundancy in the video. For example, the current video block may be predicted from a reference block located in a previously encoded frame, and a motion vector may be used to indicate the location of the reference block. The motion vectors and related data may describe, for example, a horizontal component of the motion vectors, a vertical component of the motion vectors, a resolution of the motion vectors (e.g., quarter-pixel precision, half-pixel precision, one-pixel precision, two-pixel precision, four-pixel precision), a prediction direction, and/or a reference picture index value. In addition, coding standards such as ITU-T H.265 may support motion vector prediction. Motion vector prediction enables specifying a motion vector using motion vectors of adjacent blocks. Examples of motion vector prediction include Advanced Motion Vector Prediction (AMVP), Temporal Motion Vector Prediction (TMVP), so-called "merge" mode, and "skip" and "direct" motion inference. In addition, the JEM supports Advanced Temporal Motion Vector Prediction (ATMVP) and spatio-temporal motion vector prediction (STMVP).

As described above, in JEM, QTBT leaf nodes, which allow arbitrary rectangular CBs, may be analogous to both PBs and TBs in ITU-T H.265. Thus, in some cases, JEM may provide less flexibility with respect to possible PB and TB structures than is provided in ITU-T H.265. As further described above, in ITU-T H.265, only square TBs are allowed, and, for intra prediction, only square PBs are allowed. Accordingly, some processes in ITU-T H.265 are defined based on the assumption that the array of sample values input to the process must be square, and thus some processes in ITU-T H.265 may not provide adequate support for encoding arbitrary rectangular video blocks. Furthermore, QTBT partitioning and the related signaling as defined in JEM may be less than ideal. This disclosure describes techniques for performing video encoding using arbitrary rectangular video blocks.
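
The rectangular-block formulations developed later in this disclosure (cf. Fig. 16A and Fig. 16B) are not reproduced here; purely as an illustration of the idea expressed in claims 1-4 (a horizontal interpolation based on the block width and a vertical interpolation based on the block height, averaged per sample), one plausible generalization of the square formula is sketched below. It should not be taken as the exact formulation of the disclosure.

```python
def planar_prediction_rect(p_left, p_above, p_bottom_left, p_above_right, nTbW, nTbH):
    """One possible planar prediction for an nTbW x nTbH rectangular block.

    The horizontal interpolation uses the block width, the vertical interpolation
    uses the block height, and the two are averaged (cf. claims 1-4).  This is an
    illustrative generalization, not the exact formulation of the disclosure.
    """
    pred = [[0] * nTbW for _ in range(nTbH)]
    for y in range(nTbH):
        for x in range(nTbW):
            horizontal = (nTbW - 1 - x) * p_left[y] + (x + 1) * p_above_right
            vertical = (nTbH - 1 - y) * p_above[x] + (y + 1) * p_bottom_left
            # Average the width-normalized and height-normalized interpolations.
            pred[y][x] = (horizontal / nTbW + vertical / nTbH) / 2
    return pred
```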

Fig. 7 is a block diagram illustrating an example of a system that may be configured to code (i.e., encode and/or decode) video data in accordance with one or more techniques of this disclosure. System 100 represents an example of a system that may perform video encoding using arbitrary rectangular video blocks in accordance with one or more techniques of this disclosure. As shown in Fig. 7, system 100 includes a source device 102, a communication medium 110, and a destination device 120. In the example shown in Fig. 7, source device 102 may include any device configured to encode video data and transmit the encoded video data to communication medium 110. Destination device 120 may include any device configured to receive encoded video data via communication medium 110 and decode the encoded video data. Source device 102 and/or destination device 120 may comprise computing devices equipped for wired and/or wireless communication, and may include set-top boxes, digital video recorders, televisions, desktop, laptop or tablet computers, gaming consoles, mobile devices including, for example, "smart" phones, cellular telephones, personal gaming devices, and medical imaging devices.

Communication medium 110 may include any combination of wireless and wired communication media and/or storage devices. Communication medium 110 may include coaxial cables, fiber optic cables, twisted pair wires, wireless transmitters and receivers, routers, switches, repeaters, base stations, or any other equipment that may be useful to facilitate communications between various devices and sites. Communication medium 110 may include one or more networks. For example, communication medium 110 may include a network configured to enable access to the World Wide Web (e.g., the Internet). A network may operate according to a combination of one or more telecommunication protocols. Telecommunication protocols may include proprietary aspects and/or may include standardized telecommunication protocols. Examples of standardized telecommunication protocols include Digital Video Broadcasting (DVB) standards, Advanced Television Systems Committee (ATSC) standards, Integrated Services Digital Broadcasting (ISDB) standards, Data Over Cable Service Interface Specification (DOCSIS) standards, Global System for Mobile Communications (GSM) standards, Code Division Multiple Access (CDMA) standards, 3rd Generation Partnership Project (3GPP) standards, European Telecommunications Standards Institute (ETSI) standards, Internet Protocol (IP) standards, Wireless Application Protocol (WAP) standards, and Institute of Electrical and Electronics Engineers (IEEE) standards.

The storage device may include any type of device or storage medium capable of storing data. The storage medium may include a tangible or non-transitory computer readable medium. The computer readable medium may include an optical disc, flash memory, magnetic memory, or any other suitable digital storage medium. In some examples, the memory device or portions thereof may be described as non-volatile memory, and in other examples, portions of the memory device may be described as volatile memory. Examples of volatile memory may include Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), and Static Random Access Memory (SRAM). Examples of non-volatile memory may include magnetic hard disks, optical disks, floppy disks, flash memory, or forms of electrically programmable memory (EPROM) or Electrically Erasable and Programmable (EEPROM) memory. The storage device may include a memory card (e.g., a Secure Digital (SD) memory card), an internal/external hard disk drive, and/or an internal/external solid state drive. The data may be stored on the storage device according to a defined file format.

Referring again to Fig. 7, the source device 102 includes a video source 104, a video encoder 106, and an interface 108. Video source 104 may include any device configured to capture and/or store video data. For example, video source 104 may include a video camera and a storage device operatively coupled thereto. Video encoder 106 may include any device configured to receive video data and generate a compatible bitstream representing the video data. A compatible bitstream may refer to a bitstream from which a video decoder can receive and reproduce video data. Aspects of a compatible bitstream may be defined according to a video coding standard. The video encoder 106 may compress the video data when generating a compatible bitstream. The compression may be lossy (discernible or indiscernible to a viewer) or lossless. Interface 108 may include any device configured to receive a compatible video bitstream and transmit and/or store the compatible video bitstream to a communication medium. Interface 108 may include a network interface card, such as an Ethernet card, and may include an optical transceiver, a radio frequency transceiver, or any other type of device that can send and/or receive information. Further, interface 108 may include a computer system interface that enables a compatible video bitstream to be stored on a storage device. For example, interface 108 may include a chipset supporting Peripheral Component Interconnect (PCI) and Peripheral Component Interconnect Express (PCIe) bus protocols, proprietary bus protocols, Universal Serial Bus (USB) protocols, I2C, or any other logical and physical structure that may be used to interconnect peer devices.

Referring again to Fig. 7, destination device 120 includes an interface 122, a video decoder 124, and a display 126. Interface 122 may include any device configured to receive a compatible video bitstream from a communication medium. Interface 122 may include a network interface card, such as an Ethernet card, and may include an optical transceiver, a radio frequency transceiver, or any other type of device that can receive and/or send information. Further, interface 122 may include a computer system interface that enables a compatible video bitstream to be retrieved from a storage device. For example, interface 122 may include a chipset supporting PCI and PCIe bus protocols, proprietary bus protocols, USB protocols, or any other logical and physical structure that may be used to interconnect peer devices. Video decoder 124 may include any device configured to receive a compatible bitstream and/or acceptable variations thereof and reproduce video data therefrom. Display 126 may include any device configured to display video data. Display 126 may comprise one of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display. Display 126 may include a high definition display or an ultra high definition display. It should be noted that although in the example shown in Fig. 7 video decoder 124 is described as outputting data to display 126, video decoder 124 may be configured to output video data to various types of devices and/or subcomponents thereof. For example, video decoder 124 may be configured to output video data to any communication medium, as described herein.

Fig. 8 is a block diagram illustrating an example of a video encoder 200 that may implement the techniques for encoding video data described herein. It should be noted that although the example video encoder 200 is shown as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit video encoder 200 and/or its sub-components to a particular hardware or software architecture. The functions of video encoder 200 may be realized using any combination of hardware, firmware, and/or software implementations. In one example, video encoder 200 may be configured to encode video data according to the techniques described herein. Video encoder 200 may perform intra prediction coding and inter prediction coding of picture regions, and as such may be referred to as a hybrid video encoder. In the example shown in Fig. 8, video encoder 200 receives source video blocks. In some examples, a source video block may include a picture region that has been divided according to a coding structure. For example, the source video data may include macroblocks, CTUs, CBs, sub-divisions thereof, and/or other equivalent coding units. In some examples, video encoder 200 may be configured to perform additional subdivisions of source video blocks. It should be noted that some of the techniques described herein may be generally applicable to video encoding, regardless of how the source video data is partitioned prior to and/or during encoding. In the example shown in Fig. 8, video encoder 200 includes a summer 202, a transform coefficient generator 204, a coefficient quantization unit 206, an inverse quantization/transform processing unit 208, a summer 210, an intra prediction processing unit 212, an inter prediction processing unit 214, a post filter unit 216, and an entropy encoding unit 218.

As shown in Fig. 8, video encoder 200 receives a source video block and outputs a bitstream. As described above, JEM includes the following parameters for signaling of the QTBT tree: CTU size, MinQTSize, MaxBTSize, MaxBTDepth, and MinBTSize. Table 2 shows the block sizes of the QT leaf nodes at various QT depths for different CTU sizes (in this example, MinQTSize is 8). Further, Table 3 shows the allowed block sizes of BT leaf nodes at various BT depths for given binary tree root node sizes (i.e., leaf quadtree node sizes).

TABLE 2

TABLE 3

Thus, referring to Table 2, the size of the quadtree node forming the root of the binary tree may be determined based on the CTU size and the QT depth. If the quadtree node is further partitioned according to a binary tree, the binary tree leaf node sizes may be determined based on the QT node size and the BT depth, as shown in Table 3. Each of MaxBTSize, MaxBTDepth, and MinBTSize may be used to determine a minimum allowed binary tree leaf node size. For example, if the CTU size is 128 × 128, the QT depth is 3, MaxBTSize is 16 × 16, and MaxBTDepth is 2, then the minimum allowed binary tree leaf node size includes 64 samples (i.e., 8 × 8, 16 × 4, or 4 × 16). In this case, if MaxBTDepth is 1, the minimum allowed binary tree leaf node size includes 128 samples (i.e., 16 × 8 or 8 × 16). Table 4 shows the block sizes of BT leaf nodes for various combinations of QT depth and BT depth with a CTU size of 128 × 128.

TABLE 4
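
The arithmetic behind Tables 2-4 can be reproduced with a few lines of code: each QT split halves the node side, and each BT split halves either the width or the height. The sketch below is illustrative and reproduces the 64-sample example discussed above.

```python
def qt_leaf_size(ctu_size, qt_depth):
    """QT leaf node size at a given QT depth (cf. Table 2): each split halves the side."""
    return ctu_size >> qt_depth

def bt_leaf_sizes(root_size, bt_depth):
    """Possible BT leaf node sizes (width, height) after bt_depth binary splits of a
    square BT root node (cf. Table 3): each split halves the width or the height."""
    sizes = {(root_size, root_size)}
    for _ in range(bt_depth):
        nxt = set()
        for w, h in sizes:
            nxt.add((w // 2, h))   # vertical split
            nxt.add((w, h // 2))   # horizontal split
        sizes = nxt
    return sorted(sizes)

# CTU 128x128 at QT depth 3 gives a 16x16 BT root; two BT splits give 64-sample leaves.
assert qt_leaf_size(128, 3) == 16
assert bt_leaf_sizes(16, 2) == [(4, 16), (8, 8), (16, 4)]
```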

As mentioned above, the QTBT partitioning and related signaling defined in JEM may be less than ideal. For example, as described above with reference to Fig. 3, in JEM, when independent QTBTs are used to partition a CTU, the CBs of the luma component are not required to be, and do not have to be, aligned with the CBs of the chroma components. That is, in JEM, when independent QTBTs are used to partition a CTU, each of the luma component and chroma component partitions is signaled using a separate set of QT split flag and BT split mode syntax elements, and such signaling may be less than ideal.

In some examples, in accordance with the techniques described herein, video encoder 200 may be configured to partition a CTU such that the luma and chroma components have a common partitioning structure up to a particular depth, and thus share a common set of QT split flag and BT split mode syntax elements up to that depth. It should be noted that in this case the depth may correspond to the absolute depth of the QTBT (i.e., the depth formed by the sum of the QT depth and the BT depth). It should be noted that in some cases the depth may correspond to a number of samples of a component (e.g., luma and/or chroma) in a block, and may optionally be indicated in terms of a minimum width and/or a minimum height. For example, the QTBT may be shared until an array of chroma samples is partitioned to a particular size. For example, the QTBT may be shared until one of the height or the width of a node is less than a specified number of samples of a component, e.g., 8 samples. For example, the QTBT may be shared until the number of samples of a component (e.g., luma and/or chroma) of a node is less than a specified number, e.g., 64. In one example, the depth may be predetermined for a group of CTUs. For example, the depth may be set to 2 for a slice of video data, or set to 2 for a picture of video data. In one example, a syntax element (e.g., shared_depth or the like) may be used to signal the depth. In one example, the shared depth syntax element may be signaled at the CTU level. In one example, the shared depth syntax element may be signaled at the slice level. In one example, the shared depth syntax element may be signaled at a parameter set level (e.g., a Picture Parameter Set (PPS) or a Sequence Parameter Set (SPS)). In one example, a higher-level flag may be used to indicate the presence of a shared depth syntax element at a lower level. For example, a syntax element included at the slice level may indicate whether a shared depth syntax element is included for each CTU included in the slice. It should be noted that, in a similar manner, a CTU-level flag may be used to indicate one or more of a shared QTBT, a partially shared QTBT, or independent QTBTs for the luma and chroma components.

In one example, the shared depth syntax element may be a flag at the split level. For example, for each QT split flag and/or BT split mode, a respective flag may indicate whether the indicated split is shared. In one example, a higher-level shared depth syntax element may be used to set the shared depth, and a lower-level flag may be used to indicate sharing beyond the depth specified by that syntax element. For example, the shared depth may be set to a depth of 1 at the slice level, and each CTU within the slice may include a flag indicating whether that particular CTU shares a depth extending beyond 1 to a depth of 2.
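
A minimal sketch of the shared-depth idea is given below: splits at an absolute QTBT depth below the shared depth are common to luma and chroma, while partitioning beyond that depth is handled per component. The function names and exact semantics are illustrative, not a normative syntax definition.

```python
def qtbt_depth(qt_depth, bt_depth):
    """Absolute QTBT depth as described above: the QT depth plus the BT depth."""
    return qt_depth + bt_depth

def chroma_split_is_shared(qt_depth, bt_depth, shared_depth):
    """True if the chroma partitioning at this node is inherited from luma, i.e.
    no separate chroma split syntax is signaled for nodes above the shared depth."""
    return qtbt_depth(qt_depth, bt_depth) < shared_depth

# With shared_depth = 1 (as in Figs. 9 and 10) only the first split is common;
# any partitioning beyond depth 1 is signaled (or omitted) per component.
assert chroma_split_is_shared(0, 0, 1) is True
assert chroma_split_is_shared(1, 0, 1) is False
```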

Fig. 9 and Fig. 10 are conceptual diagrams illustrating examples of common partitioning in which the luma and chroma components have a shared depth. In the example shown in Fig. 9, the luma component is additionally partitioned beyond the shared depth of 1, and the chroma component is not partitioned beyond depth 1. In the example shown in Fig. 10, both the luma component and the chroma component are independently partitioned beyond the shared depth of 1. As described above, the video sampling format may define the number of chroma samples included in a CU relative to the number of luma samples included in the CU. In one example, video encoder 200 may be configured to selectively partition the chroma components beyond a shared depth based on the sampling format. For example, where a CTU is formatted according to the 4:2:0 sample format, in one example, video encoder 200 may be configured such that the chroma components may not be further partitioned beyond the shared depth. Further, where a CTU is formatted according to the 4:4:4 sample format, in one example, video encoder 200 may be configured such that the chroma components may be further partitioned beyond the shared depth. Further, in addition or as an alternative to the sampling format, one or more of CTU size, MinQTSize, MaxBTSize, MaxBTDepth, and/or MinBTSize may be used to determine whether the chroma components are allowed to be partitioned beyond the shared depth.

Fig. 11 is a conceptual diagram illustrating an example of a QTBT corresponding to the example QTBT partitioning shown in Fig. 10. As shown in Fig. 11, the QTBT for luma and the QTBT for chroma are the same up to depth 1, i.e., the shared depth is 1. Further, it should be noted that, for purposes of explanation, the luma tree shown in Fig. 11 is the same as the QTBT shown in Fig. 2. As such, for the example shown in Fig. 11, video encoder 200 may be configured to signal the luma QTBT based on the pseudo syntax provided in Table 1. In one example, video encoder 200 may be configured to signal the chroma QTBT beyond the shared QTBT based on the pseudo syntax provided in Table 5.

TABLE 5

In the example shown in Table 5, the additional partitioning condition may include a condition based on one or more of the following: the sample format, CTU size, MinQTSize, MaxBTSize, MaxBTDepth, and/or MinBTSize, as described above. It should be noted that, in one example, video encoder 200 may be configured to signal the chroma QTBT beyond the shared QTBT by multiplexing the syntax elements illustrated in Table 1 and Table 5. For example, the syntax elements for chroma component nodes beyond the shared node, and syntax elements that are descendants of the shared node, may be signaled after the syntax elements for luma component nodes beyond the shared node and syntax elements that are descendants of the shared node. Table 6 shows an example of pseudo syntax in which the syntax elements for the chroma components are signaled after the shared node terminates as a leaf node for the luma component. In one example, the chroma syntax elements may be signaled before the luma syntax elements.

TABLE 6

In this manner, video encoder 200 represents an example of a device configured to: receive a video block comprising sample values for a first component of video data and a second component of video data; partition the sample values for the first component of video data according to a first quadtree plus binary tree partitioning structure; and partition the sample values for the second component of video data according to the first quadtree plus binary tree partitioning structure up to a shared depth.

As described above, ITU-T H.265 supports four asymmetric PB partitions for inter prediction. It should be noted that the asymmetric PB partitions provided in ITU-T H.265 may be less than ideal. That is, the asymmetric PB partitioning provided in ITU-T H.265 is limited to PBs having one quarter of the width or height of a square CB. For example, for a 32×32 CB in ITU-T H.265, the M/4×M left partition divides the CB into an 8×32 PB and a 24×32 PB. ITU-T H.265 does not provide a mechanism for partitioning CBs into PBs based on arbitrary offsets. That is, a PB is not allowed to have an arbitrary width or height. In some cases, it may be useful to partition CTBs according to arbitrary offsets. For example, in the example above, for the 32×32 CB, it may be useful in some cases to partition the CB into a 10×32 PB and a 22×32 PB based on the attributes of the image. Further, referring to table 3 above, in some cases it may be useful to further divide binary leaf nodes according to arbitrary offsets. That is, in JEM, the potential leaf node sizes are limited to those shown in table 3. For example, in the case where a binary leaf node is 32×128, it may be useful to further partition the binary leaf node into a 32×28 CB and a 32×100 CB. It should be noted that partitioning a block of video data according to an arbitrary offset in accordance with the techniques described herein may be applied to at least one or more of the following: (1) in the case where a CU (or CB) forms the root of a PU (or PB), arbitrary offset partitioning may be applied to partition a CTU (or CTB) into CUs (or CBs); (2) in the case where a CU (or CB) does not form the root of a PU (or PB), i.e., where prediction is determined at the CB level, arbitrary offset partitioning may be applied to partition a CTU (or CTB) into CUs (or CBs); (3) arbitrary offset partitioning may be applied to the partitioning of a PU (or PB); and (4) arbitrary offset partitioning may be applied to partition blocks of samples corresponding to nodes of the coding tree. It should be noted that in some cases, arbitrary offset partitioning may be selectively enabled for CTU partitioning and/or PU partitioning.

Figure 12 shows an example of further horizontally partitioning a binary leaf node according to an offset. It should be noted that although the example shown in fig. 12 includes partitioning a binary leaf node according to an arbitrary offset, such an example should not be construed as limiting and, as described herein, arbitrary offset partitioning may be applicable to various scenarios in which video data has been partitioned. In the example shown in fig. 12, the CTB may correspond to a luminance CTB having a size of 256×256. In this case, the size of the binary leaf node in the upper right corner is 32×128. As described above, it may be useful to further partition the 32×128 binary leaf node into a 32×28 CB and a 32×100 CB. In the example partition shown in fig. 12, the offset has a value of 28. In one example, video encoder 200 may be configured to partition leaf nodes of the QTBT according to an offset. In one example, video encoder 200 may be configured such that any number of asymmetric offset partitioning structures may be allowed. That is, in some examples, the offset may be in the range of 2 to the block height minus 2 for vertical offsets and 2 to the block width minus 2 for horizontal offsets. In some examples, the offset may be in the range of 1 to the block height minus 1 for vertical offsets and 1 to the block width minus 1 for horizontal offsets. In some examples, the allowed asymmetric offset partitioning may be restricted based on attributes associated with the CTU and/or the prediction mode. For example, asymmetric offset partitioning may be limited based on whether a CU is encoded according to intra prediction or inter prediction. Further, in some examples, asymmetric offset partitioning may be limited based on the size of the CU or CB. In one example, the value of the offset may be limited to integer multiples of a set value. In one example, the value of the offset may be limited to integer multiples of a set value and some additional integer value (e.g., 2). In some examples, the set of integer multiples may be based on the size of the leaf node to which the offset is being applied. For example, consider the case where a 32×128 leaf node is horizontally partitioned as described above. In one example, the value of the offset may be limited to multiples of 4 (i.e., the allowed offset values include 4, 8, 12, 16, ..., 120, 124). In one example, an indexed set of offset values may be used to specify the value of the offset. For example, with respect to the case where a 32×128 leaf node is horizontally partitioned as described above, in one example, the value of the offset may be limited to the following set of offset values: 28, 42, 84, and 100. In some examples, an indexed set of offset values may be selected to avoid partitions that could otherwise be signaled using QTBT signaling, or close approximations thereof. For example, where a 32×128 leaf node is partitioned horizontally, in some cases (e.g., depending on the value of MaxBTDepth), the BT structure may allow the 32×128 leaf node to be partitioned into two 32×64 partitions. In this case, an indexed set of offset values may be selected such that no offset is within a specified range of 64. Further, in some examples, an indexed set of offset values may be based on the value of MaxBTDepth.
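The following is a minimal sketch of how a set of allowed offsets for horizontally partitioning a leaf node might be derived under the constraints described above: offsets restricted to integer multiples of a set value within the node, optionally excluding offsets near a split that the BT structure could already signal. The function name, parameter names, and the size of the exclusion window are assumptions made for illustration.

```python
# Sketch only: not taken from any standard or reference software.

def allowed_horizontal_offsets(block_height, multiple=4,
                               excluded_center=None, exclusion_window=0):
    offsets = []
    for off in range(multiple, block_height, multiple):
        # Optionally avoid offsets close to a partition already expressible by
        # BT signaling (e.g., the midpoint split at block_height // 2).
        if excluded_center is not None and abs(off - excluded_center) <= exclusion_window:
            continue
        offsets.append(off)
    return offsets

# For the 32x128 leaf node discussed above: multiples of 4 in [4, 124],
# optionally excluding offsets within 8 samples of the 64-sample midpoint.
print(allowed_horizontal_offsets(128))
print(allowed_horizontal_offsets(128, excluded_center=64, exclusion_window=8))
```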

It should be noted that in some examples, the allowed asymmetric offset partitions may include horizontal or vertical partitions. For example, in one example, with respect to the 32×128 binary leaf node, video encoder 200 may be configured to further partition the 32×128 binary leaf node into an 8×128 CB and a 24×128 CB. In this manner, the offset may indicate an offset value relative to an anchor point. For example, an anchor point may include the left edge for vertical partitioning and the top edge for horizontal partitioning. It should be noted that in some examples, the anchor may be a set number of samples from the edge. For example, the anchor may be set to 4 samples from the edge. In this case, an offset value of zero would indicate a partition boundary 4 samples from the edge. In one example, the offset may be coded using fixed-length binarization. In one example, the offset may be coded using truncated unary binarization.
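As a minimal sketch of the two binarizations mentioned above, the routines below produce fixed-length and truncated unary bin strings. These are generic binarization helpers written for illustration; the bit ordering, function names, and parameters are assumptions and do not come from any standard.

```python
# Sketch only: generic binarization routines.

def fixed_length_bin(value: int, num_bits: int) -> str:
    """Fixed-length binarization: write the value using a known bit width."""
    return format(value, f"0{num_bits}b")

def truncated_unary_bin(value: int, max_value: int) -> str:
    """Truncated unary binarization: 'value' ones followed by a terminating
    zero, except that the terminating zero is omitted at the maximum value."""
    if value < max_value:
        return "1" * value + "0"
    return "1" * max_value

# Example: an offset index in the range [0, 3].
print(fixed_length_bin(2, 2))      # '10'
print(truncated_unary_bin(2, 3))   # '110'
```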

As described above, in one example, an indexed set of offset values may be used to specify the value of the offset. In one example, an indexed set of offset values may correspond to fractional partitions. Tables 7 and 8 provide examples of indexed sets of offset values corresponding to fractional partitions. With respect to tables 7 and 8, it should be noted that in some examples, fractional partitions may be rounded to the nearest sample value. For example, with respect to the case where a 32×128 leaf node is horizontally partitioned as described above, in one example, an offset of 1/3 of the block size from the edge may be rounded to 43. With respect to tables 7 and 8, it should be noted that in an example, fractional partitions may be rounded to the nearest integer multiple of a number of samples. For example, with respect to the case where a 32×128 leaf node is horizontally partitioned as described above, in one example, an offset of 1/3 of the block size from the edge may be rounded to 44, which is the nearest multiple of 4 samples. With respect to tables 7 and 8, it should be noted that in an example, fractional partitions may be rounded down to the nearest integer multiple of a number of samples. For example, with respect to the case where a 32×128 leaf node is horizontally partitioned as described above, in one example, an offset of 1/3 of the block size from the edge may be rounded down to 40, which is the nearest lower multiple of 4 samples. A sketch of these rounding rules is provided after table 8.

Offset from edge                           Binary representation of the offset
1/4 of the block size under consideration  01
1/2 of the block size under consideration  1
3/4 of the block size under consideration  00

TABLE 7

Offset from edge                           Binary representation of the offset
1/3 of the block size under consideration  01
1/2 of the block size under consideration  1
2/3 of the block size under consideration  00

TABLE 8
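The following is a minimal sketch of the fractional-offset rounding discussed before tables 7 and 8: a fractional offset (e.g., 1/3 of the block size) is converted to a sample offset by rounding to the nearest sample, rounding to the nearest multiple of N samples, or rounding down to the nearest multiple of N samples. The function name and parameters are assumptions made for illustration.

```python
# Sketch only: rounding rules for fractional partitions.
from fractions import Fraction

def fractional_offset_to_samples(block_size: int, fraction: Fraction,
                                 multiple: int = 1, round_down: bool = False) -> int:
    exact = fraction * block_size
    if round_down:
        # Round down to the nearest multiple of 'multiple'.
        return int(exact // multiple) * multiple
    # Round to the nearest multiple of 'multiple'.
    return int(round(exact / multiple)) * multiple

# For the 32x128 leaf node horizontally partitioned at 1/3 of its height:
print(fractional_offset_to_samples(128, Fraction(1, 3)))                               # 43
print(fractional_offset_to_samples(128, Fraction(1, 3), multiple=4))                   # 44
print(fractional_offset_to_samples(128, Fraction(1, 3), multiple=4, round_down=True))  # 40
```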

As described above, video encoder 200 may be configured to signal the QTBT. In one example, video encoder 200 may be configured to indicate the offset value by incorporating offset signaling within the signaling of the QTBT. For example, the example shown in fig. 12 includes the same QTBT structure as the example shown in fig. 1. As such, the offset signaling may be based on the example pseudo syntax shown in table 1, where in one example the offset signaling is included after the syntax indicating a leaf node. Table 9 shows an example pseudo syntax corresponding to the case where, for a 256×256 CTB, the 32×128 binary leaf node in the upper right corner is further partitioned into a 32×28 CB and a 32×100 CB.

TABLE 9

Thus, according to the example shown in table 9, video encoder 200 may be configured to: signal a flag indicating that offset partitioning applies to a QTBT leaf node; signal a syntax element indicating whether the offset partition is a vertical partition or a horizontal partition; and signal a value indicating the offset. It should be noted that in other examples, video encoder 200 may be configured to use other signaling techniques to indicate the offset value. For example, video encoder 200 may be configured to signal an offset value at the CB level. It should be noted that in some examples, the offset may be signaled as an extension of the current BT split mode signaling. That is, for example, in JEM, the BT split mode syntax element results in a node being halved. In one example, BT split mode signaling may include signaling a split type and offset pair in accordance with the techniques described herein. For example, referring to the example shown in fig. 12, in one example, the offset may be signaled as follows: (BT split = 2, offset value = 28).
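The following is a minimal sketch of the three syntax elements described above for signaling an offset partition of a QTBT leaf node: a flag enabling offset partitioning, a direction, and the offset value itself. The toy bit writer, the function names, and the fixed-length coding of the offset are all assumptions made to keep the sketch self-contained; this is not syntax from any standard or reference software.

```python
# Sketch only: hypothetical writer and syntax element ordering.

class BitWriter:
    """Toy bit writer used only to make the sketch runnable."""
    def __init__(self):
        self.bits = []
    def write_flag(self, b: int):
        self.bits.append(b & 1)
    def write_bits(self, value: int, n: int):
        self.bits.extend((value >> (n - 1 - i)) & 1 for i in range(n))

def signal_offset_partition(writer, use_offset, vertical=False, offset=0, offset_bits=7):
    writer.write_flag(1 if use_offset else 0)       # offset partitioning applied?
    if use_offset:
        writer.write_flag(1 if vertical else 0)     # vertical (1) or horizontal (0)
        writer.write_bits(offset, offset_bits)      # offset value (fixed-length here)

w = BitWriter()
signal_offset_partition(w, True, vertical=False, offset=28)  # the 32x28 / 32x100 example
print(w.bits)
```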

Further, in one example, each CB of the CTB may be indexed according to a defined scan order, and video encoder 200 may be configured to signal the offset value by signaling the index value of the CB. For example, referring to fig. 13, the binary leaf node in the upper right corner is shown indexed as CB8. Thus, in one example, video encoder 200 may be configured to use this index value to indicate that offset partitioning is performed for this leaf node. In this manner, video encoder 200 represents an example of a device configured to determine an offset value and partition a leaf node according to the offset value.

in one example, a predetermined order set of split decisions (arbitrary offset chunks and/or QT chunks) may be applied to a sample chunk and indicated in the bitstream using a single indicator.

Referring again to fig. 8, video encoder 200 may generate residual data by subtracting the predictive video block from the source video block. Summer 202 represents a component configured to perform this subtraction operation. In one example, the subtraction of the video block occurs in the pixel domain. Transform coefficient generator 204 applies a transform, such as a discrete cosine transform (DCT), a discrete sine transform (DST), or a conceptually similar transform, to the residual block or sub-partitions thereof (e.g., four 8×8 transforms may be applied to a 16×16 array of residual values) to produce a set of residual transform coefficients. Transform coefficient generator 204 may be configured to perform any and all combinations of the transforms included in the family of discrete trigonometric transforms. As described above, in ITU-T H.265, the TB is limited to the following sizes: 4×4, 8×8, 16×16, and 32×32. In one example, transform coefficient generator 204 may be configured to perform transforms according to arrays having sizes of 4×4, 8×8, 16×16, and 32×32. In one example, transform coefficient generator 204 may also be configured to perform transforms on arrays having other dimensions. In particular, in some cases it may be useful to perform a transform on a rectangular array of difference values. In one example, transform coefficient generator 204 may be configured to perform transforms according to arrays of the following sizes: 2×2, 2×4N, 4M×2, and/or 4M×4N. In one example, a 2-dimensional (2D) M×N inverse transform may be implemented as a 1-dimensional (1D) M-point inverse transform followed by a 1D N-point inverse transform. In one example, the 2D inverse transform may be implemented as a 1D N-point vertical transform followed by a 1D N-point horizontal transform. In one example, the 2D inverse transform may be implemented as a 1D N-point horizontal transform followed by a 1D N-point vertical transform. Transform coefficient generator 204 may output the transform coefficients to coefficient quantization unit 206.
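As a minimal sketch of the separable 2D transform factorization described above, the code below applies a 1D DCT-II to the columns of a rectangular M×N residual block and then to its rows. This is a floating-point orthonormal DCT written for illustration; it is not the integer transform of ITU-T H.265, JEM, or any other specific codec, and the function names are assumptions.

```python
# Sketch only: separable 2D transform as two 1D passes.
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= 1.0 / np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def forward_transform_2d(residual: np.ndarray) -> np.ndarray:
    h, w = residual.shape            # rectangular block, e.g., 2x8 or 4x16
    vert = dct2_matrix(h)            # 1D transform applied over columns (height-point)
    horiz = dct2_matrix(w)           # 1D transform applied over rows (width-point)
    return vert @ residual @ horiz.T

residual = np.arange(2 * 8, dtype=float).reshape(2, 8)   # a 2x8 rectangular block
print(np.round(forward_transform_2d(residual), 2))
```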

Coefficient quantization unit 206 may be configured to perform quantization of the transform coefficients. As described above, the degree of quantization may be modified by adjusting a quantization parameter. Coefficient quantization unit 206 may further be configured to determine quantization parameters and output QP data (e.g., data used to determine a quantization group size and/or delta QP values) that may be used by a video decoder to reconstruct the quantization parameters to perform inverse quantization during video decoding. It should be noted that in other examples, one or more additional or alternative parameters may be used to determine the level of quantization (e.g., a scaling factor). The techniques described herein are generally applicable to determining a level of quantization for transform coefficients corresponding to one component of video data based on a level of quantization for transform coefficients corresponding to another component of video data.
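The following is a minimal sketch relating a quantization parameter to a quantization step size and applying scalar quantization to transform coefficients. The step-size relation shown (the step doubling every 6 QP values, as in HEVC-style designs) and the simple round-to-nearest rule are illustrative assumptions; they are not the exact integer quantization of any particular codec.

```python
# Sketch only: QP-to-step-size mapping and scalar quantization.

def quantization_step(qp: int) -> float:
    # Step size doubles every 6 QP values (HEVC-style behavior, assumed here).
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp: int):
    step = quantization_step(qp)
    # Round-to-nearest scalar quantization of each transform coefficient.
    return [int(round(c / step)) for c in coeffs]

print(quantize([100.0, -37.5, 12.0, 0.4], qp=28))   # step = 16 -> [6, -2, 1, 0]
```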

As shown in fig. 8, the quantized transform coefficients are output to inverse quantization/transform processing unit 208. Inverse quantization/transform processing unit 208 may be configured to apply inverse quantization and an inverse transform to generate reconstructed residual data. As shown in fig. 8, at summer 210, the reconstructed residual data may be added to the predictive video block. In this manner, an encoded video block may be reconstructed, and the resulting reconstructed video block may be used to evaluate the encoding quality for a given prediction, transform, and/or quantization. Video encoder 200 may be configured to perform multiple coding passes (e.g., performing encoding while varying one or more of prediction, transform parameters, and quantization parameters). Rate-distortion or other system parameters of the bitstream may be optimized based on the evaluation of reconstructed video blocks. Further, reconstructed video blocks may be stored and used as references for predicting subsequent blocks.

As described above, video blocks may be encoded using intra prediction. Intra prediction processing unit 212 may be configured to select an intra prediction mode for a video block to be encoded. Intra prediction processing unit 212 may be configured to evaluate a frame and/or a region thereof and determine an intra prediction mode to use to encode the current block. As shown in fig. 8, intra prediction processing unit 212 outputs intra prediction data (e.g., syntax elements) to entropy encoding unit 218 and transform coefficient generator 204. As described above, the transform performed on the residual data may be mode dependent. As described above, possible intra prediction modes may include a planar prediction mode, a DC prediction mode, and angular prediction modes. Further, in some examples, a prediction mode for a chroma component may be inferred from the intra prediction mode for the luma component.

As mentioned above, ITU-T H.265 provides a formal definition of the planar prediction mode that is based on the variable nTbS, which specifies the size of the corresponding transform block. As further described above, in ITU-T H.265, the TB is limited to the following sizes: 4×4, 8×8, 16×16, and 32×32. Thus, nTbS may have a value of 4, 8, 16, or 32 to indicate the size of a square, and therefore cannot indicate a rectangle of arbitrary size. As a result, the planar prediction mode defined according to ITU-T H.265 may be less than ideal for performing planar prediction with respect to an arbitrarily sized rectangle. In accordance with the techniques described herein, video encoder 200 may be configured to perform planar prediction with respect to arbitrarily sized rectangular CBs.

In one example, video encoder 200 may be configured to perform planar prediction with respect to an arbitrarily sized rectangular CB by averaging a horizontal interpolation and a vertical interpolation. Such planar prediction may generally be described as follows:

predSamples[x][y] = (Hor_Interpolation[x][y] + Ver_Interpolation[x][y] + 1) / 2

In one example, Hor_Interpolation[x][y] and Ver_Interpolation[x][y] may be based on the width and the height of the CB, respectively, according to the following equations:

Hor_Interpolation[x][y] = ((nCbSW - 1 - x) * p[-1][y] + (x + 1) * p[nCbSW][-1]) / nCbSW

and

Ver_Interpolation[x][y] = ((nCbSH - 1 - y) * p[x][-1] + (y + 1) * p[-1][nCbSH]) / nCbSH

This can be expressed as:

predSamples[x][y] = (((nCbSW - 1 - x) * p[-1][y] + (x + 1) * p[nCbSW][-1]) * nCbSH + ((nCbSH - 1 - y) * p[x][-1] + (y + 1) * p[-1][nCbSH]) * nCbSW + nCbSW * nCbSH) / (2 * nCbSW * nCbSH)

where:

nCbSW specifies the width of the corresponding coding block;

nCbSH specifies the height of the corresponding coding block;

p[-1][y] is the sample value of the reconstructed sample located in the column adjacent to the left of the CB and having the same vertical position as the current sample;

p[nCbSW][-1] is the sample value of T;

p[x][-1] is the sample value of the reconstructed sample located in the row adjacent above the CB and having the same horizontal position as the current sample;

p[-1][nCbSH] is the sample value of L; and

/ is an integer division operation with the result truncated toward zero.
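The following is a minimal sketch that restates the rectangular planar prediction defined by the equations above as runnable code. The neighbor arrays are passed explicitly: top_row has nCbSW + 1 entries (the last entry being T = p[nCbSW][-1]) and left_col has nCbSH + 1 entries (the last entry being L = p[-1][nCbSH]). The function name and data layout are assumptions made for illustration; this is not reference code.

```python
# Sketch only: rectangular planar prediction per the equations above.

def planar_predict(top_row, left_col, nCbSW, nCbSH):
    T = top_row[nCbSW]           # p[nCbSW][-1]
    L = left_col[nCbSH]          # p[-1][nCbSH]
    pred = [[0] * nCbSW for _ in range(nCbSH)]
    for y in range(nCbSH):
        for x in range(nCbSW):
            hor = (nCbSW - 1 - x) * left_col[y] + (x + 1) * T
            ver = (nCbSH - 1 - y) * top_row[x] + (y + 1) * L
            num = hor * nCbSH + ver * nCbSW + nCbSW * nCbSH
            # Integer division; '//' matches truncation toward zero here since
            # all sample values and weights are non-negative.
            pred[y][x] = num // (2 * nCbSW * nCbSH)
    return pred

# Example: an 8x4 block (width 8, height 4) with constant neighboring samples,
# which yields a constant prediction of 100.
top = [100] * 9       # p[x][-1] for x = 0..7, plus T = p[8][-1]
left = [100] * 5      # p[-1][y] for y = 0..3, plus L = p[-1][4]
print(planar_predict(top, left, nCbSW=8, nCbSH=4))
```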

With regard to the example equations above, it should be noted that although the equations are described with regard to CB, in other examples, the equations may be described based on PB, TB, and/or other coding structures or picture regions.

With respect to the example equations above, it should be noted that in some cases the coding block may correspond to a transform block, and in other cases the coding block and transform block structures may be independent. Fig. 16A shows the positions of T and L relative to an example rectangular CB according to the above equations. Fig. 16B shows an example where, for a current sample C, p[-1][y] is denoted as b and p[x][-1] is denoted as a. It should be noted that according to the above equations, in the case where nCbSW is greater than nCbSH, a relatively higher weight is applied to a than to b, and in the case where nCbSH is greater than nCbSW, a relatively higher weight is applied to b than to a. Accordingly, video encoder 200 may be configured to perform planar prediction in a manner that takes into account the direction of the rectangular array of sample values. It should be noted that in some examples, a weighted average may be applied to the horizontal interpolation and the vertical interpolation. For example, such planar prediction may generally be described as follows:

predSamples[x][y] = (α * Hor_Interpolation[x][y] + β * Ver_Interpolation[x][y] + (α + β) / 2) / (α + β),

where α and β depend on nCbSH and/or nCbSW. Further, in other examples, α and β may depend on the PB, TB, and/or other coding structures or picture regions.
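The following is a minimal sketch of the weighted-average variant above for a single sample: the horizontal and vertical interpolations are combined with weights α and β. The particular choice shown, α equal to nCbSH and β equal to nCbSW, is only one possible assumption about how the weights could depend on the block dimensions.

```python
# Sketch only: weighted combination of the two interpolations for one sample.

def weighted_planar_sample(hor, ver, nCbSW, nCbSH):
    alpha, beta = nCbSH, nCbSW          # assumed dependence on block dimensions
    # Integer weighted average with rounding offset, per the equation above.
    return (alpha * hor + beta * ver + (alpha + beta) // 2) // (alpha + beta)

print(weighted_planar_sample(hor=96, ver=108, nCbSW=8, nCbSH=4))   # 104
```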

Inter prediction processing unit 214 may be configured to perform inter prediction coding for the current video block. Inter prediction processing unit 214 may be configured to receive a source video block and calculate motion vectors for PUs of the video block. A motion vector may indicate the displacement of a PU (or similar coding structure) of a video block within the current video frame relative to a predictive block within a reference frame. Inter prediction coding may use one or more reference pictures. Further, motion prediction may be uni-predictive (using one motion vector) or bi-predictive (using two motion vectors). Inter prediction processing unit 214 may be configured to select a predictive block by computing a pixel difference determined by, for example, sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. As described above, a motion vector may be determined and specified based on motion vector prediction. Inter prediction processing unit 214 may be configured to perform motion vector prediction, as described above. Inter prediction processing unit 214 may be configured to generate a predictive block using the motion prediction data. For example, inter prediction processing unit 214 may locate the predictive video block within a frame buffer (not shown in fig. 8). It should be noted that inter prediction processing unit 214 may be further configured to apply one or more interpolation filters to a reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Inter prediction processing unit 214 may output motion prediction data for a calculated motion vector to entropy encoding unit 218. As shown in fig. 8, inter prediction processing unit 214 may receive reconstructed video blocks via post filter unit 216. Post filter unit 216 may be configured to perform deblocking and/or sample adaptive offset (SAO) filtering. Deblocking refers to the process of smoothing the boundaries of reconstructed video blocks (e.g., making the boundaries imperceptible to a viewer). SAO filtering is a non-linear amplitude mapping that may be used to improve reconstruction by adding an offset to reconstructed video data.
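The following is a minimal sketch of the two block-matching cost metrics mentioned above, SAD and SSD, computed between a source block and a candidate predictive block. The function names and the list-of-lists block representation are assumptions made for illustration.

```python
# Sketch only: SAD and SSD between two equally sized blocks.

def sad(block_a, block_b):
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def ssd(block_a, block_b):
    return sum((a - b) ** 2 for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

src = [[10, 12], [14, 16]]
cand = [[11, 12], [13, 18]]
print(sad(src, cand))   # 1 + 0 + 1 + 2 = 4
print(ssd(src, cand))   # 1 + 0 + 1 + 4 = 6
```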

Referring again to fig. 8, entropy encoding unit 218 receives quantized transform coefficients and predictive syntax data (i.e., intra prediction data, motion prediction data, QP data, etc.). It should be noted that in some examples, coefficient quantization unit 206 may perform a scan of a matrix including quantized transform coefficients before outputting the coefficients to entropy encoding unit 218. In other examples, entropy encoding unit 218 may perform the scan. Entropy encoding unit 218 may be configured to perform entropy encoding in accordance with one or more of the techniques described herein. Entropy encoding unit 218 may be configured to output a compatible bitstream, i.e., a bitstream that a video decoder can receive and from which it can reproduce video data.

Fig. 14 is a block diagram illustrating an example of a video decoder that may be configured to decode video data in accordance with one or more techniques of this disclosure. In one example, the video decoder 300 may be configured to reconstruct video data based on one or more of the techniques described above. That is, the video decoder 300 may operate in a reverse manner to the video encoder 200 described above. The video decoder 300 may be configured to perform intra-prediction decoding and inter-prediction decoding, and thus may be referred to as a hybrid decoder. In the example shown in fig. 14, the video decoder 300 includes an entropy decoding unit 302, an inverse quantization unit 304, an inverse transform processing unit 306, an intra prediction processing unit 308, an inter prediction processing unit 310, a summer 312, a post filter unit 314, and a reference buffer 316. The video decoder 300 may be configured to decode video data in a manner consistent with a video coding system that may implement one or more aspects of a video coding standard. It should be noted that although the example video decoder 300 is shown as having different functional blocks, such illustration is for descriptive purposes and does not limit the video decoder 300 and/or its subcomponents to a particular hardware or software architecture. The functionality of the video decoder 300 may be implemented using any combination of hardware, firmware, and/or software implementations.

As shown in fig. 14, entropy decoding unit 302 receives an entropy-encoded bitstream. Entropy decoding unit 302 may be configured to decode syntax elements and quantized coefficients from the bitstream according to a process reciprocal to the entropy encoding process. Entropy decoding unit 302 may be configured to perform entropy decoding according to any of the entropy encoding techniques described above. Entropy decoding unit 302 may parse the encoded bitstream in a manner consistent with a video coding standard. Video decoder 300 may be configured to parse an encoded bitstream, where the encoded bitstream is generated based on the techniques described above. That is, for example, video decoder 300 may be configured to determine a QTBT partitioning structure generated and/or signaled based on one or more of the techniques described above for the purpose of reconstructing video data. For example, video decoder 300 may be configured to parse syntax elements and/or evaluate attributes of the video data in order to determine the shared depth of a QTBT. Further, video decoder 300 may be configured to determine an offset value and partition a block of video data according to the offset value.

Referring again to fig. 14, inverse quantization unit 304 receives quantized transform coefficients (i.e., level values) and quantization parameter data from entropy decoding unit 302. The quantization parameter data may include any and all combinations of the delta QP values and/or quantization group size values, etc., described above. Video decoder 300 and/or inverse quantization unit 304 may be configured to determine QP values used for inverse quantization based on values signaled by a video encoder and/or based on video properties and/or coding parameters. That is, inverse quantization unit 304 may operate in a manner reciprocal to coefficient quantization unit 206 described above. For example, inverse quantization unit 304 may be configured to infer predetermined values (e.g., determine the sum of the QT depth and the BT depth based on coding parameters), allowed quantization group sizes, and the like, according to the techniques described above. Inverse quantization unit 304 may be configured to apply inverse quantization. Inverse transform processing unit 306 may be configured to perform an inverse transform to generate reconstructed residual data. The techniques performed by inverse quantization unit 304 and inverse transform processing unit 306, respectively, may be similar to the techniques performed by inverse quantization/transform processing unit 208 described above. Inverse transform processing unit 306 may be configured to apply an inverse DCT, an inverse DST, an inverse integer transform, a non-separable secondary transform (NSST), or a conceptually similar inverse transform process to the transform coefficients in order to generate residual blocks in the pixel domain. Further, as described above, whether a particular transform (or a particular type of transform) is performed may depend on the intra prediction mode. As shown in fig. 14, the reconstructed residual data may be provided to summer 312. Summer 312 may add the reconstructed residual data to a predictive video block and generate reconstructed video data. The predictive video block may be determined according to predictive video techniques (i.e., intra prediction and inter prediction). In one example, video decoder 300 and post filter unit 314 may be configured to determine QP values and use them for post filtering (e.g., deblocking). In one example, other functional blocks of video decoder 300 that utilize QP may determine QP based on received signaling and use it for decoding.

Intra prediction processing unit 308 may be configured to receive intra prediction syntax elements and retrieve a predictive video block from reference buffer 316. Reference buffer 316 may include a memory device configured to store one or more frames of video data. An intra prediction syntax element may identify an intra prediction mode, such as the intra prediction modes described above. In one example, intra prediction processing unit 308 may reconstruct a video block using one or more of the intra prediction coding techniques described herein. Inter prediction processing unit 310 may receive inter prediction syntax elements and generate motion vectors to identify prediction blocks in one or more reference frames stored in reference buffer 316. Inter prediction processing unit 310 may produce motion compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers of interpolation filters to be used for motion estimation with sub-pixel precision may be included in the syntax elements. Inter prediction processing unit 310 may use the interpolation filters to calculate interpolated values for sub-integer pixels of a reference block. Post filter unit 314 may be configured to perform filtering on reconstructed video data. For example, post filter unit 314 may be configured to perform deblocking and/or SAO filtering, as described above with respect to post filter unit 216. Further, it should be noted that in some examples, post filter unit 314 may be configured to perform proprietary discretionary filtering (e.g., visual enhancement). As shown in fig. 14, a reconstructed video block may be output by video decoder 300. In this manner, video decoder 300 may be configured to generate reconstructed video data in accordance with one or more of the techniques described herein. In this manner, video decoder 300 may be configured to parse a first quadtree binary tree partitioning structure, apply the first quadtree binary tree partitioning structure to a first component of video data, determine a shared depth, and apply the first quadtree binary tree partitioning structure to a second component of the video data up to the shared depth. In this manner, video decoder 300 represents an example of a device configured to determine an offset value and partition a leaf node according to the offset value.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer readable medium may include a computer readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, these techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in various devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Rather, as noted above, the various units may be combined in a codec hardware unit, or provided by a set of interoperative hardware units, including one or more processors as described above, as well as suitable software and/or firmware.

Further, each functional block or various features of the base station device and the terminal device used in each of the above-described embodiments may be implemented or executed by circuitry, typically one integrated circuit or a plurality of integrated circuits. Circuitry designed to execute the functions described in this specification may include a general purpose processor, a digital signal processor (DSP), an application specific or general purpose integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic, or discrete hardware components, or a combination thereof. A general purpose processor may be a microprocessor, or alternatively, the processor may be a conventional processor, a controller, a microcontroller, or a state machine. The general purpose processor or each circuit described above may be configured by digital circuitry or may be configured by analog circuitry. Further, when a technology for making integrated circuits that supersedes current integrated circuit technology emerges due to advances in semiconductor technology, integrated circuits produced by that technology may also be used.

Various examples have been described. These examples and other examples are within the scope of the following claims.

< Cross reference >

This non-provisional application claims priority under 35 U.S.C. § 119 to provisional Application No. 62/452879, filed on January 31, 2017, the entire contents of which are incorporated herein by reference.
