Method and apparatus for block shape based video encoding and decoding

Document No.: 1358578    Publication date: 2020-07-24

Note: This technique, "Method and apparatus for block shape based video encoding and decoding", was created by F. Racape, G. Rath and F. Urban on 2018-12-04. Abstract: Methods (1300, 1500) and apparatus (700, 1400, 1600) for video encoding and decoding are provided. A video encoding method (1300) includes accessing (1310) a set of reference samples for predicting a block in a picture of a video, processing (1320) the set of reference samples based on a shape of the block, generating (1330) a prediction block for the block based on the processed set of reference samples, and encoding the block based on the prediction block. Also described are a bitstream formatted to include encoded data, a computer-readable storage medium, and a computer program product.

1. A method (1300) of video encoding, comprising:

accessing (1310) a reference sample set for prediction of a block in a picture of video;

processing (1320) the set of reference samples based on a shape of the block;

generating (1330) a prediction block for the block based on the processed reference sample set; and

encoding (1340) the block based on the prediction block.

2. An apparatus for video encoding, comprising:

means for accessing a reference sample set for predicting a block in a picture of a video;

means for processing the reference sample set based on a shape of the block;

means for generating a prediction block for the block based on the processed reference sample set; and

means for encoding the block based on the prediction block.

3. A method (1500) of video decoding, comprising:

accessing (1510) a set of reference samples for predicting a block in a picture of encoded video;

processing (1520) the reference sample set based on a shape of the block;

generating (1530) a prediction block for the block based on the processed reference sample set; and

decoding (1540) the block based on the prediction block.

4. An apparatus for video decoding, comprising:

means for accessing a reference sample set for predicting a block in a picture of encoded video;

means for processing the reference sample set based on a shape of the block;

means for generating a prediction block for the block based on the processed reference sample set; and

means for decoding the block based on the prediction block.

5. The method of any of claims 1 and 3 or the apparatus of any of claims 2 and 4, wherein the processing is enabled when a function associated with the block shape is greater than a value.

6. The method of claim 5 or the apparatus of claim 5, wherein the function is one of a maximum size of the block, a diagonal of the block, and a weighted sum of the sizes of the block.

7. The method of any of claims 5-6 or the apparatus of any of claims 5-6, wherein the processing is further based on a prediction mode.

8. The method of claim 7 or the apparatus of claim 7, wherein the processing is enabled for at least one prediction mode when the function is greater than the value.

9. The method according to any one of claims 1, 3 and 5-8 or the apparatus according to any one of claims 2, 4 and 5-8, wherein for non-square blocks, the processing is disabled for at least one directional prediction mode in the direction of the smallest size of the block.

10. The method according to any of claims 5-9 or the apparatus according to any of claims 5-9, wherein at least one flag is included in the encoded video, the at least one flag indicating at least one of the value and whether the processing is enabled.

11. The method according to any one of claims 1, 3 and 5-10 or the apparatus according to any one of claims 2, 4 and 5-10, wherein the processing comprises smoothing filtering the reference sample set.

12. The method of any one of claims 1, 3, and 5-11 or the apparatus of any one of claims 2, 4, and 5-11, wherein the prediction is based on the block shape.

13. A computer program product comprising program code instructions for performing the method according to any one of claims 1, 3 and 5-12.

14. A computer readable storage medium carrying a software program comprising program code instructions for the method according to any one of claims 1, 3 and 5-12.

15. A bitstream formatted to include encoded data representing blocks of a picture, the encoded data encoded according to any of claims 1 and 5-12.

Technical Field

The present embodiments relate generally to video encoding and decoding and, more particularly, to prediction of blocks based on block shapes.

Background

Any background information described herein is intended to introduce the reader to various aspects of art, which may be related to the present embodiments described below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light.

To achieve high compression efficiency, image and video coding schemes typically employ prediction and transform to exploit spatial and temporal redundancy in video content. Typically, intra or inter prediction is used to exploit intra or inter correlation. The difference between the original and predicted images (usually expressed as prediction error or prediction residual) is then transformed, quantized and entropy encoded. To reconstruct video, the compressed data is decoded by an inverse process corresponding to prediction, transformation, quantization, and entropy coding.

In the High Efficiency Video Coding (HEVC) standard ("ITU-T H.265, ITU Telecommunication Standardization Sector (10/2014), Series H: Audiovisual and multimedia systems, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding, Recommendation ITU-T H.265"), a picture is partitioned into square Coding Tree Units (CTUs) of configurable size, typically 64x64, 128x128 or 256x256 pixels. As shown in fig. 1, each CTU is represented by a coding tree in the compressed domain, namely a quadtree division of the CTU in which each leaf is called a Coding Unit (CU). Each CU is then given intra or inter prediction parameters. To do so, a CU is spatially partitioned into one or more Prediction Units (PUs), each PU being assigned some prediction information, and is also the root of a quadtree partitioning into Transform Units (TUs) carrying the residual transform coefficients, as shown in fig. 2. The intra or inter coding mode is assigned at the CU level.

The Quad-Tree plus Binary-Tree (QTBT) coding tool is a new video coding tool that provides more flexible CTU representation and higher compression efficiency than the CU/PU/TU arrangement of the HEVC standard. As shown in fig. 3, the quadtree plus binary tree (QTBT) coding tool defines a coding tree 310 in which coding units may be partitioned in a quadtree and binary tree fashion. Fig. 3 shows an exemplary coding tree representation of coding tree unit 320, where solid lines indicate quadtree partitions and dashed lines indicate binary partitions of CUs 330 within CTU 320, CU 330 being spatially embedded in quadtree leaves.

The partitioning of a CTU into coding units is decided at the encoder side, e.g., by a rate-distortion optimization procedure, which comprises determining the QTBT representation of the CTU with minimum rate-distortion cost. In the QTBT representation, a CU has either a square or a rectangular shape. The size of a coding unit is always a power of 2, typically ranging from 4 to 128. The QTBT decomposition of a CTU comprises two stages: the CTU is first partitioned into 4 CUs in a quadtree manner, and then each quadtree leaf may be further partitioned into two CUs in a binary manner or into 4 CUs in a quadtree manner, as shown in fig. 3.

However, as shown in fig. 3, this QTBT representation only allows symmetric partitioning of CUs. The four partitioning modes allowed by QTBT are NO_SPLIT (the CU is not split), QT_SPLIT (split into four quadrants), HOR (split horizontally into two equally sized CUs), and VER (split vertically into two equally sized CUs).
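As an illustrative, non-normative sketch, the child CU sizes produced by the four QTBT split modes named above can be expressed as follows (the function name is ours, not from the text):

```python
def qtbt_children(w, h, mode):
    """Return the (width, height) child CUs of a (w, h) CU for a QTBT split mode."""
    if mode == "NO_SPLIT":
        return [(w, h)]                      # CU is not split
    if mode == "QT_SPLIT":
        return [(w // 2, h // 2)] * 4        # four equal quadrants
    if mode == "HOR":
        return [(w, h // 2)] * 2             # two equally sized CUs, split horizontally
    if mode == "VER":
        return [(w // 2, h)] * 2             # two equally sized CUs, split vertically
    raise ValueError(mode)
```

Note that all child sizes remain powers of 2, consistent with the symmetric-only splitting described above.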

Recently, CUs with new rectangular shapes, generated by new binary split modes called asymmetric split modes, have been proposed, as shown in figs. 4 and 5. Fig. 4 shows the binary split modes of CUs in QTBT referred to as asymmetric split modes, and depicts 4 exemplary split modes 410 to 440. In fig. 4, the new rectangular shapes have a width and/or height equal to 3·2^n. In addition, a CU whose width or height is a multiple of 3 may be further split horizontally or vertically in a binary manner.

A square CU of size (w, h) (width and height) that is split by one of the proposed asymmetric binary splitting modes (e.g., HOR_UP 410) results in two rectangular sub-CUs of respective sizes (w, h/4) and (w, 3h/4), so that the encoder can choose CUs with a width or height equal to 3·2^n. In this case, the intra prediction or inter prediction process is performed on rectangular blocks whose size is a multiple of 3. In addition, a 2D transform of width or height 3·2^n, and the subsequent transform coefficient entropy coding process, are performed.
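The 1/4 - 3/4 size relation above can be sketched as follows. Only HOR_UP (410) is named in the text; the other three mode names are assumed by analogy with the four exemplary modes 410 to 440:

```python
def asymmetric_split(w, h, mode):
    """Child CU sizes for the four asymmetric binary split modes (names assumed)."""
    if mode == "HOR_UP":       # top sub-CU takes 1/4 of the height
        return [(w, h // 4), (w, 3 * h // 4)]
    if mode == "HOR_DOWN":     # bottom sub-CU takes 1/4 of the height
        return [(w, 3 * h // 4), (w, h // 4)]
    if mode == "VER_LEFT":     # left sub-CU takes 1/4 of the width
        return [(w // 4, h), (3 * w // 4, h)]
    if mode == "VER_RIGHT":    # right sub-CU takes 1/4 of the width
        return [(3 * w // 4, h), (w // 4, h)]
    raise ValueError(mode)
```

For example, a 32x32 CU split with HOR_UP yields sub-CUs of 32x8 and 32x24, the latter height being 3·2^3.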

Other CU partitioning modes, referred to as the horizontal and vertical ternary tree partitioning modes 510 and 520 shown in fig. 5, partition a CU into 3 sub-coding units (sub-CUs) whose sizes are equal to 1/4, 1/2, and 1/4 of the parent CU size in the direction of the considered spatial partitioning.
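A minimal sketch of the 1/4, 1/2, 1/4 ternary split described above (function and direction names are ours):

```python
def ternary_split(w, h, direction):
    """Child CU sizes for the horizontal (510) and vertical (520) ternary splits."""
    if direction == "HOR":   # split along the height: 1/4, 1/2, 1/4 of h
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if direction == "VER":   # split along the width: 1/4, 1/2, 1/4 of w
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(direction)
```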

Disclosure of Invention

According to an aspect of the present disclosure, there is provided a video encoding method, the method including: the method includes accessing a reference sample set for prediction of a block in a picture of the video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and encoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided an apparatus for video encoding, the apparatus including: means for accessing a reference sample set for prediction of a block in a picture of a video; means for processing a reference sample set based on a shape of a block; means for generating a prediction block for the block based on the processed reference sample set; and means for encoding the block based on the prediction block.

According to an aspect of the disclosure, there is provided an apparatus for video encoding, the apparatus comprising a processor and at least one memory coupled to the processor, the processor configured to access a reference sample set for prediction of a block in a picture of a video, process the reference sample set based on a shape of the block, generate a prediction block for the block based on the processed reference sample set, and encode the block based on the prediction block.

According to an aspect of the present disclosure, a bitstream formatted to include encoded data representing blocks of a picture, the encoded data encoded by: the method includes accessing a reference sample set for prediction of a block in a picture of the video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and encoding the block based on the prediction block.

According to an aspect of the disclosure, a signal comprising a bitstream formatted to include encoded data representing blocks of a picture, the encoded data encoded by: the method includes accessing a reference sample set for prediction of a block in a picture of the video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and encoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided a video decoding method including: the method includes accessing a reference sample set for prediction of a block in a picture of encoded video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and decoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided an apparatus for video decoding, the apparatus including: means for accessing a reference sample set for prediction of a block in a picture of encoded video; means for processing a reference sample set based on a shape of a block; means for generating a prediction block for a block based on the processed set of reference samples; and means for decoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided an apparatus for video decoding, the apparatus comprising a processor and at least one memory coupled to the processor, the processor configured to: the method includes accessing a reference sample set for prediction of a block in a picture of coded video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and decoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided a computer program product comprising program code instructions for performing the steps of: the method includes accessing a reference sample set for prediction of a block in a picture of the video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and encoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided a computer program product comprising program code instructions for performing the steps of: the method includes accessing a reference sample set for prediction of a block in a picture of encoded video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and decoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided a computer readable storage medium carrying a software program comprising program code instructions for performing the steps of: the method includes accessing a reference sample set for prediction of a block in a picture of the video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and encoding the block based on the prediction block.

According to an aspect of the present disclosure, there is provided a computer readable storage medium carrying a software program comprising program code instructions for performing the steps of: the method includes accessing a reference sample set for prediction of a block in a picture of coded video, processing the reference sample set based on a shape of the block, generating a prediction block for the block based on the processed reference sample set, and decoding the block based on the prediction block.

The foregoing presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter. Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.

Additional features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The disclosure may be better understood from the following brief description of exemplary drawings:

fig. 1 illustrates CTUs partitioned into CUs according to the HEVC standard;

fig. 2 illustrates partitioning of a CTU into CUs, PUs and TUs according to the HEVC standard;

FIG. 3 shows a CTU according to the QTBT tool;

FIG. 4 shows a binary split mode of CUs in QTBT, called asymmetric split mode;

FIG. 5 shows horizontal (left) and vertical (right) ternary tree CU partitioning patterns in QTBT;

fig. 6 illustrates an exemplary set of CU partitioning patterns according to an embodiment of the present disclosure;

fig. 7 shows a simplified block diagram of an exemplary video encoder in accordance with an embodiment of the present disclosure;

FIG. 8 shows a simplified block diagram of an exemplary intra prediction module in accordance with an embodiment of the present disclosure;

FIG. 9 illustrates an exemplary reference sample for a current block according to the present disclosure;

fig. 10 shows intra prediction directions according to the HEVC standard;

fig. 11 shows exemplary intra prediction modes according to the square block shape of the HEVC standard;

FIG. 12 illustrates exemplary intra prediction modes for rectangular block shapes according to this disclosure;

fig. 13 shows a flow diagram of an exemplary method of video encoding in accordance with an embodiment of the present disclosure;

fig. 14 shows a simplified block diagram of an exemplary video decoder in accordance with an embodiment of the present disclosure;

fig. 15 shows a flow diagram of an exemplary method of video decoding according to an embodiment of the present disclosure; and

FIG. 16 illustrates a block diagram of a computing environment in which aspects of the disclosure may be implemented and executed.

Detailed Description

It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software, or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory, and input/output interfaces. The term "coupled" is defined herein as directly connected or indirectly connected through one or more intermediate components. Such intermediate components may include hardware- and software-based components.

This specification illustrates the principles of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its scope.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, Digital Signal Processor (DSP) hardware, Read Only Memory (ROM) for storing software, Random Access Memory (RAM), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Also, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Thus, any means that can provide those functionalities may be deemed equivalent to those shown herein.

It is to be understood that the figures and descriptions have been simplified to illustrate elements that are relevant for a clear understanding of the present disclosure, while eliminating, for purposes of clarity, many other elements found in typical encoding and/or decoding devices.

It will be understood that, although the terms first and second may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Various methods are described above, and each method includes one or more steps or actions for implementing the described method. The order and/or use of specific steps and/or actions may be modified or combined unless the proper operation of the method requires a specific order of steps or actions.

It should be understood that a picture may be an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, and 4:4:4 color formats, or three arrays of three color components (e.g., RGB). In HEVC, a "block" addresses a specific area in a sample array (e.g., luma Y), and a "unit" includes the collocated blocks of all encoded color components (luma Y and possibly chroma Cb and chroma Cr), syntax elements, and prediction data (e.g., motion vectors) associated with the blocks. However, the term "block" is used more generally herein to refer to a block (e.g., a Coding Block (CB), a Transform Block (TB), a Coding Group (CG), etc.) or a unit (e.g., a CU).

It should be understood that a picture or a block of pixels or transform coefficients is a two-dimensional array or matrix. The horizontal or x-direction (or axis) represents the width and the vertical or y-direction (or axis) represents the height. The index starts at 0. The x direction represents columns and the y direction represents rows. The largest x index is width-1. The largest y index is height-1.
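The indexing convention above can be illustrated with a small example, where a block is held as a 2D array indexed [y][x]:

```python
width, height = 4, 3
# each sample value encodes its coordinates: value = x + 10 * y
block = [[x + 10 * y for x in range(width)] for y in range(height)]

top_left = block[0][0]                       # first row (y = 0), first column (x = 0)
bottom_right = block[height - 1][width - 1]  # largest indices are height-1 and width-1
```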

In the following sections, the words "reconstruction" and "decoding" may be used interchangeably. Usually, but not necessarily, "reconstruction" is used at the encoder side and "decoding" at the decoder side. In addition, the words "encoded" and "coded" may be used interchangeably. Furthermore, the terms "image," "picture," "frame," and "slice" (i.e., a portion of a picture) may be used interchangeably. Furthermore, the words "encoding," "source encoding," and "compressing" may be used interchangeably.

The present disclosure is directed to prediction in video encoding and decoding. Intra prediction in video compression refers to the spatial prediction of a block of pixels using information from causal neighboring blocks, i.e., neighboring blocks in the same frame that have already been decoded. This is a powerful coding tool because it allows high compression efficiency within frames, and also between frames whenever there is no better temporal prediction. Therefore, intra prediction has been included as a core coding tool in video compression standards, including H.264/Advanced Video Coding (AVC), H.265/HEVC, etc. In HEVC intra prediction, a CU is spatially predicted from causal neighboring CUs (i.e., the above and above-right CUs, the left and below-left CUs, and the above-left CU).

In particular, the present disclosure is directed to smooth filtering of reference samples. In HEVC and earlier video coding standards, intra prediction of a target block uses filtered reference samples. Specifically, the reference samples may be, for example, a decoded pixel row (or sample row) on the upper side and a decoded pixel column on the left side of the current block. Based on the decoded pixel values in these CUs, the encoder constructs different predictions for the target block and selects the prediction that results in the best RD performance.
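As a non-normative sketch of the reference samples described above (the decoded row above and the decoded column to the left of the current block), the following collects them from a reconstructed picture. The function name is ours, and for simplicity the row and column are limited to w and h samples, whereas HEVC in practice extends them toward the top-right and bottom-left:

```python
def gather_reference_samples(recon, x0, y0, w, h):
    """Reference samples for a w x h block whose top-left sample is at (x0, y0).

    recon is the reconstructed picture as a 2D array indexed [y][x].
    """
    top = [recon[y0 - 1][x] for x in range(x0, x0 + w)]   # decoded row above
    left = [recon[y][x0 - 1] for y in range(y0, y0 + h)]  # decoded column on the left
    corner = recon[y0 - 1][x0 - 1]                        # top-left corner sample
    return corner, top, left
```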

In HEVC, the number of intra prediction modes has increased to 35: one is the planar mode (mode index 0), one is the DC mode (mode index 1), and the remaining 33 (mode indices 2-34) are directional or angular modes.
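The mode indexing above can be expressed as a small classifier (an illustration only; the function name is ours):

```python
def intra_mode_kind(mode):
    """Classify an HEVC intra prediction mode index: 0 planar, 1 DC, 2-34 angular."""
    if mode == 0:
        return "PLANAR"
    if mode == 1:
        return "DC"
    if 2 <= mode <= 34:
        return "ANGULAR"
    raise ValueError("HEVC defines intra prediction modes 0..34")

angular_count = sum(intra_mode_kind(m) == "ANGULAR" for m in range(35))
```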

In HEVC and some previous video coding standards, the intra prediction mode is signaled for each CU. The reference samples are filtered according to the block size. The prediction mode and filter (e.g., Position Dependent Prediction Combination (PDPC) or Reference Sample Adaptive Filtering (RSAF)) are signaled.

Fig. 6 illustrates an exemplary set of CU partitioning modes according to the present disclosure. This rich set of CU topologies produces coding structures that spatially match the structures and discontinuities contained in the pictures of the bitstream. In the case of rectangular block shapes (instead of the traditional square shapes), it is of interest to reduce the number of signaling flags by activating some prediction modes and filters depending on the block shape (e.g., horizontal or vertical rectangular blocks) and, e.g., the intra prediction direction.

The present disclosure addresses some of the shortcomings present in the prior art. In particular, in at least some embodiments of the present disclosure, having shape-dependent modes may result in better prediction and, thus, higher coding gain. Although the description generally refers to intra-prediction, similar concepts may be applied to inter-prediction without departing from the scope of the present disclosure. In particular, similar concepts may be applied to filtering of reference samples for intra-prediction and/or inter-prediction modes.

Encoding

Fig. 7 shows a simplified block diagram of an exemplary video encoder 700 in accordance with an embodiment of the present disclosure. The encoder 700 may be included in a transmitter or head end in a communication system. To encode a video sequence with one or more pictures, a picture may be partitioned into CTUs of square shape with a configurable size. A set of consecutive CTUs may be grouped into a slice. The CTU is the root of the QTBT partitioning into CUs. In the exemplary encoder 700, pictures are encoded by the encoder modules as described below. Each block is encoded using either an intra mode or an inter mode. When a block is encoded in intra mode, the encoder 700 performs intra prediction (block 760), or spatial prediction, based on at least one block in the same picture or frame. When a block is encoded in inter mode, the encoder 700 performs inter prediction, or temporal prediction, based on at least one reference block from at least one reference picture or frame. In uni-directional inter prediction, the prediction is generally (but not necessarily) based on an earlier reference picture or frame. In bi-directional inter prediction, the prediction is generally (but not necessarily) based on earlier and later pictures or frames. In inter mode, motion estimation (block 775) and compensation (block 770) are performed. The encoder decides (block 705) which of the intra mode or inter mode to use for encoding the block, and indicates the intra/inter decision by a prediction mode flag. The residual is calculated by subtracting (block 710) a block of prediction samples (also known as a predictor) from the original image block.

As an example, a block in intra mode is predicted from reconstructed neighboring samples. Inter prediction is performed by motion estimation (block 775) and motion compensation (block 770) of reference blocks stored in the reference picture buffer 780. The motion estimation module 775 may include motion compensation, as its purpose is to determine the best motion vectors; it may use an iterative search that typically terminates when the rate-distortion cost (RD cost) is sufficiently low or has reached a minimum.

The residual is transformed (block 725) and quantized (block 730). The transform module 725 may transform the image from the pixel or spatial domain to the transform or frequency domain. The transform may be, for example, a cosine transform, a sine transform, a wavelet transform, etc. Quantization may be performed according to, for example, a rate-distortion criterion.
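A deliberately simplified sketch of the quantize/dequantize pair above: a scalar quantizer that divides each transform coefficient by a step size and rounds. Real codecs use integer transforms, rounding offsets, and rate-distortion-optimized quantization, all omitted here:

```python
def quantize(coeffs, qstep):
    """Map transform coefficients to quantization levels (scalar quantizer)."""
    # Python's round() uses banker's rounding; codecs use integer offsets instead
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    """Approximate reconstruction of coefficients from quantization levels."""
    return [lvl * qstep for lvl in levels]
```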

The encoder includes a decoding loop and therefore decodes the encoded blocks to provide a reference for further prediction. The quantized transform coefficients are dequantized (block 740) and inverse transformed (block 750) to decode the residual. The picture block is reconstructed by combining (block 755) the decoded residual and the block of prediction samples. An in-loop filter (765) may be applied to the reconstructed picture, for example, to perform deblocking/Sample Adaptive Offset (SAO) filtering to reduce coding artifacts. The filtered picture is stored in the reference picture buffer 780.
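The reconstruction step of the decoding loop (block 755) can be sketched as adding the decoded residual back to the prediction and clipping to the valid sample range (the clipping and bit depth are standard codec practice, assumed here rather than quoted from the text):

```python
def reconstruct(pred, residual, bit_depth=8):
    """Combine a prediction block and a decoded residual block, with clipping."""
    lo, hi = 0, (1 << bit_depth) - 1
    return [[min(hi, max(lo, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, residual)]
```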

The modules of the video encoder 700 may be implemented in software and executed by a processor or may be implemented using circuit components well known to those skilled in the art of compression. In particular, the video encoder 700 may be implemented as an Integrated Circuit (IC).

In addition to the differences described in this disclosure, the modules of the video encoder 700 may also be present in other video encoders (e.g., HEVC encoders), particularly in the intra prediction 760 and/or the inter prediction modules 770, 775 as will be described in more detail in the following paragraphs and figures. The video encoder 700 may be similar to an HEVC video encoder for functions other than intra prediction 760 and/or inter prediction 770, 775, and these functions are not described in detail herein.

Fig. 8 shows a simplified block diagram 800 of an exemplary intra prediction module in accordance with an embodiment of the present disclosure. The intra-prediction module 800 may be similar to the intra-prediction module 760 of fig. 7. The intra prediction module 800 receives the reconstructed block 811 and outputs a prediction block 835. The intra prediction module 800 includes three modules: a reference sample generation module 810, a block intra-prediction module 820, and a post-prediction processing (post-processing) module 830.

In the reference sample generation module 810, reference samples 813 may first be calculated or determined from the reconstructed neighboring blocks of the current block in a reference sample calculation module 812. At this stage, the reference pixel or sample values may be copied into a first reference sample buffer, with padding if the bottom-left or top-right pixels are not available. Next, in the reference sample smoothing module 814, a smoothed version of the original reference samples/buffer 813 is computed as the filtered reference samples/buffer 815. Alternatively, module 814 may be bypassed. Then, in the reference sample selection module 816, reference samples 819 are selected from the original reference samples 813 or the filtered reference samples 815 according to the selection parameters 817. Next, in the block intra prediction module 820, prediction is performed to calculate the intra prediction of the current block from the selected reference samples/buffer 819. Finally, depending on the prediction direction, post-processing (e.g., filtering) may be applied to the intra block prediction in a post-prediction processing module 830 to output a prediction block 835. The post-processing module 830 may be bypassed or removed.

Apart from the differences described in this disclosure, particularly the differences in modules 816, 820, and 830 and in the selection parameters 817, which will be described in more detail in the following paragraphs and figures, the modules of the intra prediction module 800 are also present in other video encoders (e.g., HEVC encoders). For example, in the HEVC standard, the selection parameters 817 are the number of pixels (e.g., block size) of the current block and the determined prediction direction. Furthermore, in the HEVC standard, modules 820 and 830 are a function of the determined prediction direction. In accordance with the present disclosure, the block shape (e.g., horizontal or vertical rectangle) may be a selection parameter 817 and may also affect modules 816, 820, and 830, as will be described in more detail in the following paragraphs and figures. The block shape is characterized by the ratio of the width to the height of the block, which is why horizontal and vertical rectangular blocks are considered different shapes. Thus, the orientation of rectangular blocks is taken into account.

Fig. 9 shows exemplary reference samples (920, 930) of a current block 910 according to the present disclosure. Fig. 9 also applies to the HEVC standard in block 812 of fig. 8. For a current block 910 of size N × N (with x the horizontal direction and y the vertical direction), a row of 2N reconstructed reference samples on the top side 920 may be formed from the previously reconstructed top and top-right neighbors of the current block 910. The top side reference samples 920 include samples P(0,-1), …, P(N-1,-1), …, P(2N-1,-1). Similarly, a column of 2N samples on the left side 930 may be formed from the reconstructed left and bottom-left neighbors of the current block 910. The left side reference samples 930 include samples P(-1,0), …, P(-1,N-1), …, P(-1,2N-1). The corner pixel at the top-left position P(-1,-1) may also be used. If some reference samples are unavailable, for example because the corresponding blocks lie outside the picture boundary or have not yet been reconstructed, the gaps in the reference samples may be filled by substitution from the available neighboring reference samples, or the corresponding prediction modes may not be performed.

According to the present disclosure, the block size need not be limited to N × N (i.e., a square block) and may be M × N (i.e., a general rectangular block, including square blocks), where M may be equal to or different from N. For a general block size of M × N, there may be a row of 2M reconstructed reference samples above the current block and a column of 2N reconstructed reference samples to the left of the current block.
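As an illustration of the reference layout just described, the following Python sketch (all names hypothetical, not taken from any codec reference software) gathers a row of 2M top reference samples and a column of 2N left reference samples for an M × N block, with a simple substitution rule for unavailable samples:

```python
def build_reference_buffers(recon, M, N, x0, y0):
    """Gather intra reference samples for an M x N block at (x0, y0).

    recon: dict mapping (x, y) -> reconstructed sample value (absolute coords).
    Returns (top, left): a row of 2M top/top-right samples and a column of
    2N left/bottom-left samples. Unavailable samples are padded by
    propagating the last available value (a simplified substitution rule).
    """
    def fetch(positions):
        out, last = [], None
        for pos in positions:
            v = recon.get(pos)
            if v is None:
                # substitute: repeat the last seen value, or a mid-gray default
                v = last if last is not None else 128
            out.append(v)
            last = v
        return out

    top = fetch([(x0 + dx, y0 - 1) for dx in range(2 * M)])
    left = fetch([(x0 - 1, y0 + dy) for dy in range(2 * N)])
    return top, left
```

For an 8 × 4 block, the top buffer then holds 16 samples and the left buffer 8 samples, as stated above.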

Intra sample prediction is performed in block 820 of fig. 8 and comprises predicting pixels of the target CU based on the reference samples. As previously described, HEVC supports a range of prediction models in order to efficiently predict different kinds of content. The planar and DC prediction modes are used to predict smooth and gradual regions, while the angular prediction mode is used to capture different directional structures. HEVC supports 33 directional prediction modes with indices from 2 to 34. The directional prediction modes correspond to different prediction directions.

Fig. 10 shows intra prediction directions according to the HEVC standard and according to the present disclosure. However, the present disclosure is not limited to the HEVC direction. Numerals 2 to 34 denote prediction mode indexes associated with the corresponding directions. Modes 2 to 17 indicate horizontal prediction (H-26 to H +32), and modes 18 to 34 indicate vertical prediction (V-32 to V + 32). The symbols "H" and "V" in fig. 10 are used to indicate the horizontal and vertical directions, respectively, and the numerical part of the identifier represents the displacement of the pixel (also referred to as "angle parameter").

The angle parameter A represents the position of the reference sample (at 1/32-pixel resolution) relative to the target pixel, on the first reference row or column. The values of A for the different prediction modes are shown in Table 1 and Table 2.

TABLE 1

TABLE 2

The directions with non-negative displacements (i.e., H0 to H +32 and V0 to V +32) are also denoted as positive directions, and the directions with negative displacements (i.e., H-2 to H-26 and V-2 to V-32) are also denoted as negative directions. A positive prediction direction may also be defined as a direction having a positive a value and a negative prediction direction may be defined as a direction having a negative a value.

As shown in fig. 10, the sample accuracy of the defined angular direction is 1/32. That is, the interval between two pixels in the horizontal or vertical direction is divided into 32 sub-intervals. As mentioned above, the defined direction can be distinguished as vertical or horizontal. The prediction mode in the horizontal direction uses only the left reference samples, or uses some left reference samples and some upper reference samples. Similarly, the prediction mode in the vertical direction uses only the upper side reference samples, or uses some upper side reference samples and some left side reference samples. The positive horizontal direction from H0 to H +32 uses only the left reference sample for prediction. Similarly, the positive vertical direction from V0 to V +32 uses only the upper reference sample for prediction. The negative horizontal and negative vertical directions (H-2 to H-26 and V-2 to V-32) are predicted using both the left side reference sample and the upper side reference sample.
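The mode ranges described above (horizontal modes 2 to 17, vertical modes 18 to 34, positive-angle modes 2 to 10 and 26 to 34, negative-angle modes 11 to 25) can be captured in a small helper. This is a sketch with a hypothetical name; the boundaries are those stated in the surrounding text:

```python
def classify_mode(k):
    """Classify an angular mode index k (2..34) as stated in the text:
    modes 2-17 are horizontal and 18-34 vertical; modes 2-10 and 26-34
    have a non-negative angle parameter A (positive directions, one
    reference side only), while modes 11-25 have negative A and use both
    the left and top reference samples."""
    assert 2 <= k <= 34
    orientation = 'horizontal' if k <= 17 else 'vertical'
    sign = 'positive' if (k <= 10 or k >= 26) else 'negative'
    return orientation, sign
```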

In HEVC reference code, the reference array is first constructed using the upper and left reference samples. For vertical prediction, the reference array is horizontal, and for horizontal prediction, the reference array is vertical. For modes with a positive angle parameter a (modes 2 to 10 and 26 to 34), the reference array is simply the upper or left side reference sample, depending on the direction:

For vertical prediction: topRef[x] = P[x-1][-1], 0 ≤ x ≤ 2N    (1)

For horizontal prediction: leftRef[y] = P[-1][y-1], 0 ≤ y ≤ 2N    (2)

Where N is the CU size. The sample coordinates are typically initialized to (0,0) at the top left pixel of the target CU. Thus, the upper reference sample has a y coordinate of-1 and the left reference sample has an x coordinate of-1.

For modes with negative angle parameter a (modes 11 to 25), the reference array requires pixels from both the upper and left side references. In this case, the reference array would be further extended to negative indices beyond-1. The sample values on the reference array with positive indices are obtained as described above from vertical or horizontal prediction. Those pixels on the reference array with negative indices are obtained by projecting either the left-side reference pixel (for vertical prediction) or the upper-side reference pixel (for horizontal prediction) on the reference array along the prediction direction.

Once the reference array is constructed, the prediction for any pixel location (x, y) inside the target CU is obtained by projecting that location onto the reference array along the selected direction and copying the reference array sample value at the projected position. The reference sample value is calculated at 1/32 sample resolution by interpolating between two adjacent samples, as follows:

For vertical prediction: P[x][y] = ((32-f) × topRef[x+i+1] + f × topRef[x+i+2] + 16) >> 5, 0 ≤ x, y < N    (3)

For horizontal prediction: P[x][y] = ((32-f) × leftRef[y+i+1] + f × leftRef[y+i+2] + 16) >> 5, 0 ≤ x, y < N    (4)

where i and f denote the integer and fractional parts of the displacement projected from the pixel location (x, y), and >> denotes a bitwise right shift. If Δ denotes the projection displacement, then

For horizontal prediction: Δ = (x+1) × A    (5)

For vertical prediction: Δ = (y+1) × A    (6)

Then, the integer and fractional parts of the displacement are obtained as follows:

i = Δ >> 5    (7)

f = Δ & 31    (8)

where & denotes the bitwise AND operation. Note that if f = 0, i.e., there is no fractional part, the prediction is equal to the reference array sample value in the prediction direction, and no interpolation is required.
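Equations (3) and (6) to (8) can be sketched for a vertical mode with a non-negative angle parameter A as follows. This is a simplified Python illustration, not HEVC reference code; the reference array is assumed long enough, with a simple clamp at its last entry:

```python
def predict_vertical(top_ref, A, width, height):
    """Angular intra prediction for a vertical mode with non-negative
    angle parameter A, per equations (3) and (6)-(8): the projection
    displacement Delta = (y+1)*A is split into an integer part
    i = Delta >> 5 and a fractional part f = Delta & 31, then two
    neighboring reference samples are blended at 1/32-pel accuracy.

    top_ref: list indexed as topRef[x] = P[x-1][-1] (index 0 is the corner).
    Returns pred[y][x] for 0 <= x < width, 0 <= y < height.
    """
    pred = [[0] * width for _ in range(height)]
    for y in range(height):
        delta = (y + 1) * A          # equation (6)
        i, f = delta >> 5, delta & 31  # equations (7) and (8)
        for x in range(width):
            a = top_ref[x + i + 1]
            b = top_ref[min(x + i + 2, len(top_ref) - 1)]  # clamp at the end
            pred[y][x] = ((32 - f) * a + f * b + 16) >> 5  # equation (3)
    return pred
```

With A = 0 (the pure vertical direction), f = 0 on every row and each predicted row is a plain copy of the top reference row, matching the no-interpolation remark above.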

Some prediction modes, such as the DC mode and the direct horizontal mode (H0, or 10) and direct vertical mode (V0, or 26), may cause discontinuities at the CU boundary after prediction. Thus, in HEVC and in the present disclosure, these prediction modes are followed by a post-processing step performed by the post-processing module 830 of fig. 8, in which the boundary prediction samples are smoothed using a low-pass filter.

As previously described, to construct the prediction of the target block, the encoder (and decoder) may use only one row of reference samples above the block and one column of reference samples to its left. The reference samples in fig. 9 closest to the target block carry the greatest correlation with the content of the target block, so the use of additional decoded rows and columns has been considered unnecessary given the higher complexity and memory requirements. However, this reasoning holds when the target block size is small and there are only a few angular prediction modes. As the block size and the number of prediction modes increase, the prediction from one reference row and one reference column can be made more accurate by using direction information from additional reference rows and columns.

Since blocks need no longer be square, the shape of a rectangular block may be disproportionately narrow in the horizontal or vertical direction. Therefore, the performance of a prediction mode and of reference sample filtering may vary depending on the shape of the block, as shown by comparing fig. 11 and fig. 12.

Fig. 11 shows exemplary intra prediction modes for the square block shape of the HEVC standard. Diagram 1100 shows: a) a horizontal prediction example and b) a vertical prediction example, with the corresponding prediction directions. It can be observed that the block shape does not discriminate between the prediction directions; that is, the horizontal and vertical predictions of a square block behave similarly. The same applies to intra prediction modes for square block shapes according to the present disclosure.

Fig. 12 illustrates an exemplary intra prediction mode of a rectangular block shape according to the present disclosure. The diagram 1200 shows: a) a horizontally predicted instance of a vertical rectangular block, b) a vertically predicted instance of a vertical rectangular block, c) a horizontally predicted instance of a horizontal rectangular block, and d) a vertically predicted instance of a horizontal rectangular block, with corresponding prediction directions. It can be observed that in examples a) and d), the predicted pixels are close to the reference sample. However, in cases b) and c), the predicted pixels are far from the reference sample, which is undesirable because the correlation tends to decrease with distance.

Furthermore, propagating strong edges that are not aligned with the block structure or texture may generate high frequencies in the residual, and thus a residual that is costly to encode. In order to achieve a good balance between generating accurate predictions and avoiding false strong edges, it is useful in some cases to low-pass filter the reference samples. In the HEVC standard, intra reference samples are filtered according to the absolute value of the (smallest) angle between the intra prediction direction and the pure horizontal or pure vertical direction (block 814). The threshold value depends on the block size (the number of pixels in the block) and is independent of the block shape. If the angle is greater than the threshold value, a reference sample smoothing filter is applied.

As a result, the reference samples used for intra prediction are sometimes filtered by a three-tap [1 2 1]/4 smoothing filter. The HEVC standard applies the smoothing operation adaptively according to directionality and block size. The smoothing filter is not applied to 4 × 4 blocks. For 8 × 8 blocks, only the diagonal directions (Intra_Angular[k], i.e., k = 2, 18, or 34) use reference sample smoothing. For 16 × 16 blocks, the reference samples are filtered for most directions, except the near-horizontal and near-vertical directions (i.e., k in the ranges 9-11 and 25-27). For 32 × 32 blocks, all directions use the smoothing filter except the exactly horizontal direction (k = 10) and the exactly vertical direction (k = 26). The planar mode also uses the smoothing filter when the block size is greater than or equal to 8 × 8, and smoothing is not used for (or not useful in) the DC intra case.
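The [1 2 1]/4 filter and the size/direction gate described above can be sketched as follows. The helper names are hypothetical, and the gate mirrors only the HEVC rules as summarized in this paragraph (square blocks, angular mode index k):

```python
def smooth_references(ref):
    """Three-tap [1 2 1]/4 smoothing of a reference sample list; the
    first and last samples are kept unfiltered, as in HEVC."""
    out = list(ref)
    for n in range(1, len(ref) - 1):
        out[n] = (ref[n - 1] + 2 * ref[n] + ref[n + 1] + 2) >> 2
    return out

def hevc_uses_smoothing(block_size, k):
    """Size/direction gate as summarized above: no filtering at 4x4;
    only the diagonals (k = 2, 18, 34) at 8x8; everything except the
    near-horizontal/near-vertical directions (k in 9-11 and 25-27) at
    16x16; everything except exactly horizontal (k = 10) and exactly
    vertical (k = 26) at 32x32."""
    if block_size <= 4:
        return False
    if block_size == 8:
        return k in (2, 18, 34)
    if block_size == 16:
        return not (9 <= k <= 11 or 25 <= k <= 27)
    return k not in (10, 26)
```

Note that the [1 2 1]/4 filter leaves a linear ramp of samples unchanged, which is precisely why it removes spurious strong edges without distorting smooth gradients.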

According to the present disclosure, it is therefore desirable to determine an intra prediction mode based on block shape. Furthermore, it is desirable to determine the filtering tools based on block shape and possible intra prediction direction.

In one embodiment according to the present disclosure, the function, characteristic, or condition is determined according to, or in association with, the block shape for enabling or disabling reference smoothing filtering (module 810). The smoothing filtering may be a low-pass filtering process and may be performed by filters similar to those used in the HEVC standard. The function or characteristic may be defined as the block width for a horizontal rectangular block, the block height for a vertical rectangular block, and the block width or block height for a square block. This function or characteristic may equivalently be defined, for example, as the maximum dimension of the block and applied to the reference smoothing filter. In one embodiment, if the maximum dimension is the width of the block (e.g., a horizontal rectangular block, as in items "c" and "d" of fig. 12), then at least one vertical intra prediction mode (V in fig. 10) may be disabled for reference smoothing filtering of the reference sample set associated with the block (810). If the maximum dimension is the height of the block (e.g., a vertical rectangular block, as in items "a" and "b" of fig. 12), then at least one horizontal intra prediction mode (H in fig. 10) may be disabled for reference smoothing filtering of the reference sample set associated with the block (810). Otherwise, if the width and height of the block are of similar size, both horizontal and vertical intra prediction modes may be enabled for reference smoothing filtering of the reference sample set associated with the block (810).

In other words, if the prediction directionality (horizontal or vertical) is the same as the block shape (defined by the largest dimension, e.g., horizontal prediction and width > height, or vertical prediction and width < height, cases "c" and "b" in fig. 12), then the prediction is considered long. Otherwise (cases "a" and "d" in fig. 12), the prediction is considered short. For short prediction, at least one directional intra prediction mode may be disabled for reference smoothing filtering (810).
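The long/short classification above can be written as a small predicate. This is a sketch with hypothetical names; the four cases correspond to items "a" through "d" of fig. 12, and the square case is treated as neither long nor short:

```python
def is_short_prediction(width, height, k):
    """Short vs. long prediction per the text above: a prediction whose
    directionality (horizontal for k <= 17, vertical for k >= 18) matches
    the block's largest dimension (cases "c" and "b" of fig. 12) is
    'long'; otherwise (cases "a" and "d") it is 'short', and reference
    smoothing may be disabled for it."""
    horizontal = k <= 17
    if width == height:
        return False  # square block: the long/short distinction does not apply
    long_pred = (horizontal and width > height) or (not horizontal and height > width)
    return not long_pred
```

Under the embodiment that disables the smoothing filter for short predictions, the filter would then be applied only when this predicate is false for a non-square block.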

In one embodiment, the reference sample smoothing filter may be disabled for short predictions. For example, the reference smoothing filter is applied only to the terms "b" and "c" in fig. 12.

In one embodiment according to the present disclosure, a function, characteristic, or condition may be defined as the diagonal of the block. In one embodiment, the function, characteristic, or condition may be defined as a weighted sum of the dimensions of the block, e.g., 2 × width + height. In one embodiment, other more complex functions may be designed to distinguish blocks of different shapes.

In one embodiment according to the present disclosure, the function, characteristic, or condition may include a comparison against a shape threshold/value or range of values associated with the shape of the block. In one embodiment, the shape threshold/value or range of values may be associated with the block width if the prediction is horizontal, or with the block height if the prediction is vertical. In other words, the shape threshold/value or range of values may be associated with the maximum dimension of the block. For example, a reference smoothing filter may be applied if the maximum dimension of the block is greater than the shape threshold p = 4.

In one embodiment according to the present disclosure, the function, characteristic, or condition may further comprise a comparison against a direction threshold/value or range of values associated with the prediction direction of the block. For example, a reference smoothing filter may be applied if the maximum dimension of the block is greater than the shape threshold p = 4 and the prediction mode index k is less than 10 for horizontally oriented blocks or greater than 26 for vertically oriented blocks.

In one embodiment according to the present disclosure, a more complex function of the block shape and the prediction direction of the block may be established. For example, the shape threshold may depend on the prediction direction. For the horizontal directions, a filter is applied if the block width is greater than p, where p = 4 for k = 9 to 11 and p = 16 for the other horizontal modes; that is, p = 16 for almost all horizontal directions, except k = 9 to 11 (where p = 4). For the vertical directions, a filter is applied if the block height is greater than p, where p = 4 for k = 25 to 27 and p = 16 for the other vertical modes (k > 18); that is, p = 16 for almost all vertical directions, except k = 25 to 27 (where p = 4).
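The direction-dependent threshold just described can be sketched as a single decision function (a hypothetical helper; the thresholds p = 4 and p = 16 and the mode ranges are the example values from this embodiment, not HEVC rules):

```python
def apply_smoothing_filter(width, height, k):
    """Direction-dependent shape threshold from the embodiment above:
    horizontal modes (k < 18) compare the block width against p, vertical
    modes compare the block height, with p = 4 near the pure horizontal
    and vertical directions (k = 9..11 and k = 25..27) and p = 16 for the
    other directions. Returns True when the filter should be applied."""
    if k < 18:  # horizontal directions
        p = 4 if 9 <= k <= 11 else 16
        return width > p
    p = 4 if 25 <= k <= 27 else 16  # vertical directions
    return height > p
```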

In one embodiment, at least one flag may be included and optionally encoded in the bitstream (e.g., as a syntax element) to indicate the shape threshold(s) or value(s) of at least one of the current picture, slice, or block. At least one flag may be retrieved at a decoder and used to decode the encoded block.

In one embodiment according to the present disclosure, prediction, residual, transform, and encoding may be performed with and without reference sample smoothing filtering. Between the two options, the option yielding better Rate Distortion (RD) performance may be selected.

In one embodiment, at least one flag may be included and optionally encoded in the bitstream (e.g., as a syntax element) to indicate whether reference sample smoothing filtering is enabled/disabled for at least one of the current picture, slice, or block. At least one flag may be retrieved at a decoder and used to decode the encoded block.

In one embodiment, the selected prediction may be signaled to the decoder using a one bit flag at the CU level. Flags may be coded with CABAC using contexts that depend on the prediction direction and block shape. For example, if the prediction is horizontal and width > height, or the prediction is vertical and height > width, context 1 may be used, otherwise context 2 may be used. The flag may be retrieved at the decoder and used to decode the encoded block.
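The context selection rule just described can be expressed directly (a sketch with a hypothetical name; the two context indices are those from the example above):

```python
def flag_context(width, height, k):
    """CABAC context selection for the per-CU flag, per the example above:
    context 1 when the prediction direction agrees with the block's longer
    side (horizontal prediction with width > height, or vertical
    prediction with height > width), context 2 otherwise."""
    horizontal = k <= 17  # modes 2-17 horizontal, 18-34 vertical
    if (horizontal and width > height) or (not horizontal and height > width):
        return 1
    return 2
```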

In one embodiment according to the present disclosure, the function, characteristic, or condition is determined according to a block shape used to enable or disable the intra prediction mode (modules 820, 830). This function or characteristic may be defined, for example, as the maximum size of the block and applied to intra prediction. In one embodiment, if the maximum size is the width of the block (e.g., a horizontal rectangular block, such as items "c" and "d" in fig. 12), at least one vertical intra-prediction mode may be disabled to generate intra-prediction associated with the block (820, 830). If the maximum size is the height of the block (e.g., a vertical rectangular block, such as items "a" and "b" in fig. 12), at least one horizontal intra-prediction mode may be disabled to generate intra-prediction associated with the block (820, 830). Otherwise, if both the width and the height of the block have similar sizes, horizontal and vertical intra prediction modes may be enabled to generate intra prediction associated with the block (820, 830).

In other words, if the prediction is short, at least one prediction mode may be disabled for intra prediction (820, 830).

In one embodiment, for short prediction, all (vertical or horizontal) prediction modes may be disabled. For example, a horizontal rectangular block is only horizontally predicted (item "c" in fig. 12), and a vertical rectangular block is only vertically predicted (item "b" in fig. 12).

In one embodiment according to the present disclosure, the function, characteristic, or condition may include a comparison against a shape threshold/value or range of values associated with the shape of the block. In one embodiment, the shape threshold/value or range of values may be associated with the block width if the prediction is horizontal, or with the block height if the prediction is vertical. In other words, the shape threshold/value or range of values may be associated with the maximum dimension of the block. For example, the function, characteristic, or condition may be that the maximum dimension of the block is greater than the shape threshold p = 4. If the condition is true, at least one intra prediction mode is allowed/selected for the block, slice, or picture. In one example, a prediction direction (index) is available only when the condition on the block shape is true. In another example, only one out of every 4 prediction directions (indices that are multiples of 4) is available unless the condition is true.

In one embodiment according to the present disclosure, the function, characteristic, or condition may further comprise a comparison against a direction threshold/value or range of values associated with the prediction direction of the block. For example, the function, characteristic, or condition may be that the maximum dimension of the block is greater than the shape threshold p = 4 and the prediction mode index k is less than 10 for horizontally oriented blocks or greater than 26 for vertically oriented blocks. If the condition is true, at least one intra prediction mode is allowed/selected for the block, slice, or picture. In one example, a prediction direction (index) is available only when the condition on the block shape is true. In another example, only one out of every 4 prediction directions (indices that are multiples of 4) is available unless the condition is true.

In one embodiment according to the present disclosure, a more complex function of the block shape and the prediction direction of the block may be established. If the condition is true, at least one intra prediction mode is allowed/selected for the block, slice, or picture. In one example, a prediction direction (index) is available only when the condition on the block shape is true. In another example, only one out of every 4 prediction directions (indices that are multiples of 4) is available unless the condition is true.
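One possible reading of the shape-conditioned mode availability in the embodiments above can be sketched as follows. This is an interpretation, not a definitive implementation: the shape condition (maximum dimension greater than the example threshold p = 4) is assumed to unlock the full set of angular indices, with only every fourth index (multiples of 4) remaining otherwise; the helper name is hypothetical:

```python
def allowed_directions(width, height, p=4):
    """Shape-conditioned mode availability (one reading of the examples
    above, with p = 4 as the example shape threshold): when the block's
    maximum dimension exceeds p, all angular indices 2..34 are allowed;
    otherwise only the indices that are multiples of 4 remain."""
    if max(width, height) > p:
        return list(range(2, 35))
    return [k for k in range(2, 35) if k % 4 == 0]
```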

In one embodiment according to the present disclosure, any of the above embodiments associated with the intra prediction modes (820, 830) is applicable to PDPC index coding. In another embodiment, any of the above embodiments is applicable to RSAF index coding. In a further embodiment, any of the above embodiments is applicable to multiple-reference-sample switching. That is, if the condition is not met, intra prediction is performed using a single reference row or column as in HEVC; otherwise, multi-reference prediction is used. Multi-reference intra prediction refers to intra prediction using multiple rows and columns of reference pixels. It is also referred to as arbitrary-tier reference intra prediction or multi-line intra prediction. In known approaches, it has been proposed to use weighted multiple references, where a weight is associated with each reference line (or tier).

As a variant, this condition may apply to any of the coding tools (PDPC, RSAF, multi-reference samples) associated with intra prediction described in the above embodiments. When the condition is met, the tool (i.e., the prediction method) is always used; otherwise, the tool is not used. In another variant, when the condition is satisfied, the encoder chooses whether or not to use the tool (with a classic RDO loop) and sends a flag if the tool has been used; otherwise (if the condition is not met), the tool is not used.

Fig. 13 shows a flowchart 1300 of an exemplary method of video encoding according to an embodiment of the present disclosure. The method 1300 includes, at step 1310, accessing a reference sample set for prediction of a block in a picture of video. Then, at step 1320, method 1300 includes processing the reference sample set based on the shape of the block. Next, at step 1330, method 1300 includes generating a prediction block for the block based on the processed set of reference samples. Finally, at step 1340, the method 1300 includes encoding the block based on the prediction block. Steps 1310-1340 may be performed, for example, by an encoder 700 (e.g., 760, 770, 775), including module 800. Specifically, steps 1310-1330 may be performed by, for example, modules 760, 800, 770, 775, including module 810 for step 1320 and modules 820, 830 for step 1330. The block shape may be, for example, one of a square shape, a vertical rectangular shape, and a horizontal rectangular shape.

According to one embodiment of the method, the prediction may be intra prediction or inter prediction.

According to one embodiment of the method, the process may be enabled or selected when a function or characteristic of or associated with a block shape is greater than a value or threshold. Thus, the condition is true when the function or characteristic is greater than a value or threshold. In one embodiment, when a function or characteristic is less than or equal to a value, the process may be disabled or not selected.

According to one embodiment of the method, the function or characteristic of the block shape or the function or characteristic associated with the block shape may be the maximum size of the block. In one embodiment, when the block shape is a horizontal rectangular shape, the maximum dimension is the block width; when the block shape is a vertical rectangular shape, the maximum dimension is the block height; when the block shape is square, the maximum dimension is the block width or block height.

According to one embodiment of the method, the function or characteristic may be a diagonal length of the block.

According to one embodiment of the method, the function or characteristic may be a weighted sum of the dimensions of the block (e.g., 2 × width + height).
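The three shape characteristics named in the preceding embodiments (maximum dimension, diagonal length, and weighted sum of the dimensions) can be sketched as simple functions; the names and the default weights in the third helper are illustrative only:

```python
import math

def max_size(width, height):
    """Maximum dimension: the block width for horizontal rectangles, the
    block height for vertical rectangles, and either for squares."""
    return max(width, height)

def diagonal(width, height):
    """Diagonal length of the block."""
    return math.hypot(width, height)

def weighted_size(width, height, wx=2, wy=1):
    """Weighted sum of the block dimensions, e.g., 2 x width + height."""
    return wx * width + wy * height
```

Any of these values can then be compared against a shape threshold (such as the example p = 4 above) to enable or disable the processing.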

According to one embodiment of the method, the processing may be further based on a prediction mode. In one embodiment, the processing may be further based on a predicted directional mode of the block.

According to one embodiment of the method, the processing may be enabled or selected for at least one prediction mode when the function or characteristic is greater than the value. For example, the process may be enabled for a horizontal prediction mode in a horizontal rectangular block and a vertical prediction mode in a vertical rectangular block.

According to one embodiment of the method, for non-square blocks, the processing may be disabled or not selected for at least one directional prediction mode in the direction of the minimum dimension of the block. For example, the processing may be disabled for a horizontal prediction mode in a vertically oriented block and/or a vertical prediction mode in a horizontally oriented block.

According to one embodiment of the method, at least one flag is included in the encoded video, the at least one flag indicating at least one of a value and whether processing is enabled. The flag may be retrieved at the decoder prior to or during prediction.

According to one embodiment of the method, the processing includes smoothing filtering the set of reference samples. The smoothing filter may be a low pass filter.

According to one embodiment of the method, the prediction is based on block shape. In one embodiment, for at least one block shape, at least one prediction mode (e.g., intra prediction direction) is allowed. In an embodiment, for at least one block shape, at least one prediction mode is not allowed. In one embodiment, as previously described, at least one prediction mode is enabled or disabled when the condition of the block shape is true.

It should be understood that method 1300 is also applicable to any of the additional embodiments and examples previously described in this disclosure.

According to one embodiment, the method may further include receiving a picture, partitioning the picture into a plurality of blocks including a block, determining a prediction residual for the block, transforming and quantizing the residual to obtain a plurality of transform coefficients, and entropy encoding the transform coefficients. The steps of transforming and quantizing may be performed by, for example, modules 725 and 730 of encoder 700. The step of entropy encoding may be performed by, for example, module 745 of encoder 700. The steps of receiving, transforming and quantizing may be optional, bypassed or removed as they may have been previously performed by another device and/or the results may have been stored in memory.

It should be appreciated that any of the embodiments of the method 1300 described above may be implemented by an encoder 700 (e.g., 760, 770, 775) that includes an intra-prediction module 800. The blocks of the encoder 700, including the intra prediction module 800, may be implemented by hardware (e.g., an integrated circuit) or software that is stored in a memory and executed by a processor.

Decoding

Fig. 14 shows a simplified block diagram of an exemplary video decoder 1400 in accordance with an embodiment of the present disclosure. The video decoder 1400 may be included in a receiver of a communication system. Although not all operations in the video decoder 1400 are the inverse of the encoding operations (e.g., intra prediction and inter prediction) performed by the video encoder 700 described in fig. 7, the decoder generally performs a decoding process that is reciprocal to the encoding process. In particular, the input to the decoder 1400 comprises a video bitstream that may be generated by the video encoder 700. The bitstream is first entropy decoded (block 1430) to obtain transform coefficients, motion vectors, syntax elements, and other coding information. The transform coefficients are dequantized (block 1440) and inverse transformed (block 1450) to decode the residual. The decoded residual is then combined (block 1455) with the prediction sample block (also referred to as the predictor) to obtain a decoded/reconstructed image block. Whether a block was encoded in intra or inter mode was decided at the encoder (e.g., by module 705) and is indicated by the prediction mode flag, so the prediction sample block may be obtained (block 1405) from intra prediction (block 1460) or motion-compensated prediction (i.e., inter prediction) (block 1470). An in-loop filter may be applied to the reconstructed image (block 1465). The in-loop filter may include a deblocking filter and an SAO filter. The filtered image is stored in a reference picture buffer 1480.

The modules of the video decoder 1400 may be implemented in software and executed by a processor, or may be implemented using circuit components well known to those skilled in the art of compression. In particular, the video decoder 1400 may be implemented as an Integrated Circuit (IC), alone or combined with the video encoder 700 as a codec.

The modules of the video decoder 1400 are also present in other video decoders (e.g., HEVC decoders), apart from the differences described in this disclosure, in particular in the intra prediction module 1460 (cf. module 760 of Fig. 7) and the inter prediction module 1475 (cf. modules 770, 775 of Fig. 7), as will be described in more detail in the following paragraphs and drawings. For functions other than the intra prediction module 1460 and/or the inter prediction module 1475, the video decoder 1400 may be similar to an HEVC video decoder, and those functions are not described in detail herein.

Further, intra prediction module 1460 may be similar to intra prediction module 760 of Fig. 7 and module 800 of Fig. 8, and motion compensation module 1470 may be similar to motion compensation module 770 of Fig. 7.

Fig. 15 shows a flowchart 1500 of an exemplary method of video decoding according to one embodiment of the present disclosure. The method 1500 includes, at step 1510, accessing a set of reference samples for prediction of a block in a picture of the encoded video. Then, at step 1520, the method 1500 includes processing the reference sample set based on the shape of the block. Next, at step 1530, method 1500 includes generating a prediction block for the block based on the processed reference sample set. Finally, at step 1540, the method 1500 includes decoding the block based on the prediction block. Steps 1510 through 1540 may be performed by decoder 1400 (e.g., including module 800), for example. In particular, steps 1510 to 1530 may be performed by, for example, modules 1460, 800, 1475, including module 810 for step 1520 and modules 820, 830 for step 1530. The block shape may be, for example, one of a square shape, a vertical rectangular shape, and a horizontal rectangular shape.

According to one embodiment of the method, the prediction may be intra prediction or inter prediction.

According to one embodiment of the method, the processing may be enabled or selected when a function or characteristic of, or associated with, the block shape is greater than a value or threshold; that is, the condition is true when the function or characteristic exceeds the threshold. In one embodiment, when the function or characteristic is less than or equal to the value, the processing may be disabled or not selected.

According to one embodiment of the method, the function or characteristic of, or associated with, the block shape may be the maximum dimension of the block. In one embodiment, when the block shape is a horizontal rectangle, the maximum dimension is the block width; when the block shape is a vertical rectangle, the maximum dimension is the block height; and when the block shape is a square, the maximum dimension is either the block width or the block height (which are equal).

According to one embodiment of the method, the function or characteristic may be a diagonal length of the block.

According to one embodiment of the method, the function or characteristic may be a weighted sum of the dimensions of the block (e.g., 2 × width + height).
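The candidate functions above (maximum dimension, diagonal length, weighted sum of dimensions) and the threshold test of the preceding embodiments can be sketched as follows; the function names and any particular threshold value are illustrative assumptions, not values mandated by the disclosure.

```python
import math

def max_dimension(width, height):
    # Horizontal rectangle -> width; vertical rectangle -> height;
    # square -> either one (width == height).
    return max(width, height)

def diagonal_length(width, height):
    return math.sqrt(width * width + height * height)

def weighted_size(width, height, wx=2, wy=1):
    # e.g., 2 * width + height, as in the example above
    return wx * width + wy * height

def processing_enabled(width, height, shape_fn, threshold):
    # Enable (or select) reference-sample processing only when the
    # chosen shape function exceeds the threshold.
    return shape_fn(width, height) > threshold
```

For instance, with an assumed threshold of 8, `processing_enabled(16, 4, max_dimension, 8)` is true for a 16×4 horizontal rectangular block, while a 4×4 square block would leave the processing disabled.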

According to one embodiment of the method, the processing may be further based on a prediction mode. In one embodiment, the processing may be further based on a directional prediction mode of the block.

According to one embodiment of the method, the processing may be enabled or selected for at least one prediction mode when the function or characteristic is greater than a value. For example, the process may be enabled for a horizontal prediction mode in a horizontal rectangular block and a vertical prediction mode in a vertical rectangular block.

According to one embodiment of the method, for non-square blocks, the processing may be disabled or not selected for at least one directional prediction mode oriented along the minimum dimension of the block. For example, the processing may be disabled for a horizontal prediction mode in a vertical rectangular block and/or a vertical prediction mode in a horizontal rectangular block.
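A sketch of the mode-and-shape gating described in the last two embodiments is below. The string mode labels stand in for the codec's actual directional-mode indices, and `None` denotes "fall back to the shape-only rule"; both are assumptions made for illustration.

```python
def is_processing_selected(width, height, mode):
    """Enable processing for the directional mode along a non-square
    block's larger dimension; disable it for the mode along the
    minimum dimension. Square blocks defer to the shape-only rule."""
    if width > height:            # horizontal rectangular block
        if mode == 'horizontal':  # direction of the larger dimension
            return True
        if mode == 'vertical':    # direction of the minimum dimension
            return False
    elif height > width:          # vertical rectangular block
        if mode == 'vertical':
            return True
        if mode == 'horizontal':
            return False
    # Square blocks, or other modes: no mode-based decision here.
    return None
```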

According to one embodiment of the method, at least one flag is included in the encoded video, the at least one flag indicating at least one of a value and whether the processing is enabled. The flag may be retrieved at the decoder prior to or during prediction.

According to one embodiment of the method, the processing includes smoothing filtering the set of reference samples. The smoothing filter may be a low pass filter.
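As a concrete example of such a smoothing filter, the sketch below applies a [1, 2, 1]/4 low-pass kernel to a one-dimensional array of reference samples, in the spirit of HEVC-style intra reference smoothing. The function name and the choice to leave the two end samples unfiltered follow HEVC convention but are assumptions here, not part of this disclosure.

```python
def smooth_reference_samples(ref):
    """Apply a [1, 2, 1] / 4 low-pass filter (with rounding) to a 1-D
    list of integer reference samples, leaving the end samples intact."""
    if len(ref) < 3:
        return list(ref)
    out = [ref[0]]
    for i in range(1, len(ref) - 1):
        # (a + 2b + c + 2) >> 2 implements rounding division by 4.
        out.append((ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2)
    out.append(ref[-1])
    return out
```

A flat sample row passes through unchanged, while isolated peaks are attenuated, which is the intended low-pass behavior.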

According to one embodiment of the method, the prediction is based on the block shape. In one embodiment, at least one prediction mode (e.g., intra prediction direction) is allowed for at least one block shape. In one embodiment, at least one prediction mode is not allowed for at least one block shape. In one embodiment, as previously described, at least one prediction mode is enabled or disabled when the condition on the block shape is true.

It should be understood that method 1500 is also applicable to any of the additional embodiments and examples previously described in this disclosure in association with method 1300.

According to one embodiment, the method may further comprise receiving the encoded picture, entropy decoding the encoded block, inverse transforming the transform coefficient block to obtain a decoded residual, and combining the decoded residual with the prediction sample block to obtain a decoded/reconstructed image block. The transform coefficients may further be inverse quantized prior to the inverse transformation. The steps of entropy decoding, inverse transforming, and inverse quantizing may be performed by, for example, modules 1430, 1450, and 1440 of the decoder 1400, respectively. The steps of receiving, entropy decoding, inverse transforming and inverse quantizing, and combining may be optional, bypassed, or removed, as they may have been previously performed by and/or provided to another device, or the results may have been retrieved from and/or stored in memory.

It is to be appreciated that any of the embodiments of the method 1500 described above can be implemented by the decoder 1400 (e.g., 1460, 800, 1475). The modules of the decoder 1400 may be implemented by hardware (e.g., an integrated circuit) or in software, stored in a memory, and executed by a processor.

Fig. 16 illustrates a block diagram 1600 of an exemplary system in which aspects of the exemplary embodiments of the present disclosure may be implemented. The system 1600 may be embodied as a device that includes the various components described below and is configured to perform the processes described above. Examples of such devices include, but are not limited to, personal computers, laptop computers, smart phones, smart watches, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. System 1600 may be communicatively coupled to other similar systems and to a display via a communication channel as shown in fig. 16 and as known to those skilled in the art for implementing the exemplary video system described above. System 1600 may implement encoder 700 (e.g., including module 800), decoder 1400 (e.g., including module 800), or encoder(s) and decoder(s), either independently or in combination. Further, system 1600 may be implemented independently or jointly and configured to perform any of the processes of the present disclosure, including methods 1300 and/or 1500.

The system 1600 may include at least one processor 1610 configured to execute instructions loaded therein for performing various processes as described above. Processor 1610 may include embedded memory, input-output interfaces, and various other circuits known in the art. The system 1600 can also include at least one memory 1620 (e.g., a volatile memory device such as RAM, a non-volatile memory device such as ROM). The system 1600 may additionally include a storage device 1640, which storage device 1640 may include non-volatile memory, including, but not limited to, erasable programmable read-only memory (EPROM), ROM, programmable read-only memory (PROM), Dynamic RAM (DRAM), Static RAM (SRAM), flash memory, a magnetic disk drive, and/or an optical disk drive. As non-limiting examples, the storage device 1640 may comprise an internal storage device, an attached storage device, and/or a network accessible storage device. System 1600 may also include an encoder/decoder module 1630 configured to process data to provide encoded video or decoded video.

Encoder/decoder module 1630 represents a module that may be included in a device to perform encoding and/or decoding functions, e.g., in accordance with fig. 7 (e.g., including fig. 8) and fig. 14 (e.g., including fig. 8), respectively. The device may include one or both of an encoding and decoding module, as is known in the compression arts. Further, encoder/decoder module 1630 may be implemented as a separate element of system 1600 or may be incorporated within processor 1610 as a combination of hardware and software as is known to those of skill in the art. For example, the encoder/decoder module 1630 may be implemented as one or two separate integrated circuits and/or a field-programmable gate array (FPGA).

Program code to be loaded onto processor 1610 to perform the various processes described above may be stored in storage device 1640 and subsequently loaded into memory 1620 for execution by processor 1610. According to an example embodiment of the present disclosure, one or more of the processor(s) 1610, the memory 1620, the storage devices 1640, and the encoder/decoder module 1630 may store one or more of various items including, but not limited to, input video, decoded video, bitstreams, equations, formulas, matrices, variables, operations, and operating logic during execution of the processes discussed above.

System 1600 can also include a communication interface 1650 that enables communication with other devices via a communication channel 1660. Communication interface 1650 may include, but is not limited to, a transceiver configured to transmit and receive data from communication channel 1660. The communication interface may include, but is not limited to, a modem or network card, and the communication channel may be implemented in a wired and/or wireless medium. The various components of system 1600 may be connected or communicatively coupled together using various suitable connections, including but not limited to internal buses, wires, and printed circuit boards.

Exemplary embodiments according to the present disclosure may be performed by computer software executed by the processor 1610, or by hardware, or by a combination of hardware and software. As a non-limiting example, exemplary embodiments according to the present disclosure may be implemented by one or more integrated circuits. By way of non-limiting example, the memory 1620 may be of any type suitable to the technical environment and may be implemented using any suitable data storage technology, such as optical storage, magnetic storage, semiconductor-based storage, fixed memory and removable memory. By way of non-limiting example, the processor 1610 may be of any type suitable to the technical environment and may include one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture.

The embodiments described herein may be implemented in, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (e.g., discussed only as a method), the implementation of the features discussed can also be implemented in other forms (e.g., an apparatus or a program). The apparatus may be implemented in, for example, appropriate hardware, software and firmware. The methods may be implemented, for example, in an apparatus such as, for example, a processor, which refers generally to a processing device including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cellular telephones, portable/Personal Digital Assistants (PDAs), and other devices that facilitate the communication of information between end-users.

According to an aspect of the present disclosure, there is provided an apparatus 1600 for video encoding, the apparatus comprising a processor 1610 and at least one memory 1620, 1640 coupled to the processor, the processor 1610 being configured to perform any embodiment of the video encoding method 1300 described above.

According to an aspect of the present disclosure, there is provided a device 1600 for video decoding, the device comprising a processor 1610 and at least one memory 1620, 1640 coupled to the processor, the processor 1610 being configured to perform any embodiment of the video decoding method 1500 described above.

According to an aspect of the present disclosure, there is provided an apparatus for video encoding, including: means for accessing a reference sample set for prediction of a block in a picture of a video; means for processing the reference sample set based on a shape of a block; means for generating a prediction block for a block based on the processed set of reference samples; and means for encoding the block based on the prediction block. The video encoders of fig. 7 (e.g., including fig. 8) and fig. 16 may include structures or components of an apparatus, particularly modules 760 (e.g., 800), 770, 775, 1710, and 1730. The apparatus for video encoding may perform any embodiment of any of the methods 1300 of video encoding.

According to an aspect of the present disclosure, there is provided an apparatus for video decoding, including: means for accessing a reference sample set for prediction of a block in a picture of encoded video; means for processing the reference sample set based on a shape of the block; means for generating a prediction block for the block based on the processed reference sample set; and means for decoding the block based on the prediction block. Fig. 14 (e.g., including Fig. 8) and Fig. 17 may include structures or components of the apparatus for video decoding, particularly blocks 1460 (e.g., 800), 1475, 1710, and 1730. The apparatus for video decoding may perform any embodiment of any of the methods 1500 of video decoding.

It will be apparent to those skilled in the art that embodiments may produce signals in a variety of formats to carry information that may be stored or transmitted, for example. This information may include, for example, instructions for performing a method, or data generated by one of the described embodiments. For example, the signal may be formatted to carry a bitstream of the described embodiments. Such signals may be formatted, for example, as electromagnetic waves (e.g., using the radio frequency portion of the spectrum) or as baseband signals. Formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information carried by the signal may be, for example, analog or digital information. As is well known, signals may be transmitted over a variety of different wired or wireless links. The signal may be stored on a processor readable medium.

According to an aspect of the disclosure, a signal comprises a bitstream formatted to include encoded data representing blocks of a picture, the encoded data being encoded according to any embodiment of any one of methods 1300 of video encoding.

According to an aspect of the disclosure, a bitstream is formatted to include encoded data representing blocks of a picture, the encoded data being encoded according to any of the embodiments of any of the methods 1300 of video encoding.

Further, any of methods 1300 and/or 1500 may be implemented as a computer program product comprising computer-executable instructions that may be executed (independently or in conjunction) by a processor. A computer program product having computer-executable instructions may be stored in a corresponding transitory or non-transitory computer-readable storage medium of system 1600, encoder 700 (e.g., including module 800), and/or decoder 1400 (e.g., including module 800).

According to an aspect of the present disclosure, there is provided a computer program product comprising program code instructions for performing (independently or jointly) any embodiment of any of the methods 1300 and/or 1500 of the present disclosure.

It is important to note that in some embodiments, one or more elements of processes 1300 and/or 1500 may be combined, performed in a different order, or eliminated while still implementing aspects of the present disclosure. Other steps may be performed in parallel, with the processor not waiting for one step to fully complete before beginning another.

Furthermore, aspects of the present disclosure may take the form of a computer-readable storage medium. Any combination of one or more computer-readable storage media may be utilized. The computer-readable storage medium may take the form of a computer-readable program product embodied in one or more computer-readable media and having computer-readable program code embodied thereon that is executable by a computer. Computer-readable storage media, as used herein, is considered to be non-transitory storage media that have the inherent capability of storing information therein and the inherent capability of providing retrieval of information therefrom. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.

It should be understood that the following list, while providing more specific examples of computer readable storage media to which the present disclosure may be applied, is merely an illustrative and non-exhaustive list, as would be readily understood by one of ordinary skill in the art. An exemplary list includes a portable computer diskette, a hard disk, a ROM, an EPROM, a flash memory, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to an aspect of the present disclosure, there is provided a computer readable storage medium carrying a software program comprising program code instructions for performing any embodiment of any one of the methods (including methods 1300 and/or 1500) of the present disclosure.

It should be appreciated that reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the disclosure, as well as other variations thereof, means that a particular feature, structure, characteristic, etc. described in connection with the embodiment is included in at least one embodiment of the disclosure. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation," as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

In addition, the present disclosure or claims hereof may refer to "determining" various information. Determining the information may include, for example, one or more of estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

In addition, the present disclosure or claims hereof may refer to "providing" various information. Providing information may include, for example, one or more of outputting information, storing information, transmitting information, displaying information, showing information, or moving information.

In addition, the present disclosure or claims thereof may refer to the word "accessing" various information. Accessing information may include, for example, one or more of receiving information, retrieving information (e.g., from memory), storing information, processing information, moving information, copying information, erasing information, calculating information, determining information, predicting information, or estimating information.

In addition, the present disclosure or claims hereof may refer to "receiving" various information. Like "accessing," "receiving" is intended to be a broad term. Receiving the information may include, for example, one or more of accessing the information or retrieving the information (e.g., from memory). Further, "receiving" is typically involved, in one way or another, during operations such as storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

It should be understood that the various features shown and described are interchangeable. Features shown in one embodiment may be incorporated into another embodiment unless otherwise specified. Furthermore, features described in the various embodiments may be combined or separated unless otherwise indicated as being inseparable or not combined.

As previously mentioned, the functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. Further, when provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared.

It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the processes of the present disclosure are programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present disclosure.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present disclosure is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope of the present disclosure. Furthermore, the various embodiments may be combined without departing from the scope of the present disclosure. All such variations and modifications are intended to be included herein within the scope of this disclosure as set forth in the appended claims.
