DMVR using decimated prediction blocks

Document No.: 214973    Publication date: 2021-11-05

Note: This technology, "DMVR using decimated prediction blocks", was designed and created by Sriram Sethuraman, Semih Esenlik, Jeeva Raj A, and Sagar Kotecha on 2020-03-05. Its main content is as follows: The present disclosure provides an inter-frame prediction method, including the steps of: acquiring an initial motion vector and a reference image for bidirectional prediction; obtaining a set of candidate sample positions in the reference image from the initial motion vector and the candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector; obtaining a respective sample position set from each candidate sample position set; calculating a matching cost for each candidate motion vector within each set of sample positions; obtaining a refined motion vector based on the calculated matching cost of each candidate motion vector; and obtaining a prediction value of the current block based on the refined motion vector.

1. An inter prediction method comprising:

acquiring an initial motion vector and a reference image for bidirectional prediction;

obtaining a set of candidate sample positions in the reference image from the initial motion vector and candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective preset motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector;

obtaining a respective set of sample positions from each set of candidate sample positions;

calculating a matching cost for each candidate motion vector within each set of sample positions;

obtaining a refined motion vector based on the calculated matching cost of each candidate motion vector; and

obtaining a prediction value of the current block based on the refined motion vector.

2. The method according to claim 1, wherein the initial motion vector and/or the reference picture are obtained based on signaled indication information in a bitstream.

3. The method of claim 1, wherein the initial motion vector is obtained from a motion vector predictor and a motion vector difference signaled in the bitstream.

4. The method of claim 3, wherein the motion vector predictor is indicated by an index signaled in the bitstream, wherein the index is used to indicate a position in a candidate vector list.

5. The method of any of claims 1 to 4, wherein the matching cost is calculated using a similarity measure, a dissimilarity measure, a sum of absolute differences (SAD), a mean-removed sum of absolute differences (MRSAD), or a sum of squared errors (SSE).

6. The method of any of claims 1-5, wherein the set of candidate sample locations is within a bounding rectangular region, and wherein the bounding rectangular region is calculated using the initial motion vector, an upper left corner location of the current block, and motion vector refinement ranges in the horizontal and vertical directions.

7. The method of any of claims 1 to 6, wherein the set of sample locations is obtained by regular decimation of the set of candidate sample locations.

8. The method of any of claims 1-7, wherein sample positions on alternate rows in the set of candidate sample positions are selected as the set of sample positions.

9. The method of any of claims 1-8, wherein a zipper pattern of interpolated sample locations in the set of candidate sample locations is selected as the set of sample locations.

10. The method of any of claims 1-9, wherein the initial motion vector is a first motion vector or a second motion vector, and the first motion vector and the second motion vector correspond to different reference picture lists.

11. An encoder (20) comprising processing circuitry for performing the method according to any one of claims 1 to 10.

12. A decoder (30) comprising processing circuitry for performing the method according to any one of claims 1 to 10.

13. A computer program product comprising program code for performing the method according to any one of claims 1 to 10.

14. A decoder, comprising:

one or more processors; and

a non-transitory computer readable storage medium coupled to the processor and storing programming for execution by the processor, wherein the programming, when executed by the processor, configures the decoder to perform the method of any of claims 1-10.

15. An encoder, comprising:

one or more processors; and

a non-transitory computer readable storage medium coupled to the processor and storing programming for execution by the processor, wherein the programming, when executed by the processor, configures the encoder to perform the method of any of claims 1-10.

16. An apparatus (1200) comprising motion vector refinement means (1210) for: acquiring an initial motion vector and a reference image for bidirectional prediction; obtaining a set of candidate sample positions in the reference image from the initial motion vector and candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective preset motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector; obtaining a respective set of sample positions from each set of candidate sample positions; calculating a matching cost for each candidate motion vector within each set of sample positions; obtaining a refined motion vector based on the calculated matching cost of each candidate motion vector; and obtaining a prediction value of the current block based on the refined motion vector.

Technical Field

Embodiments of the present application (disclosure) relate generally to the field of image processing and, more particularly, to the set of prediction samples used during the search in a decoder-side motion vector refinement method.

Background

Video coding (video encoding and decoding) is used in a wide range of digital video applications, such as broadcast digital TV, video transmission over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.

Rendering even relatively short video may require large amounts of video data, which may cause difficulties when the data is to be streamed or otherwise communicated over a communication network having limited bandwidth capacity. Therefore, video data is typically compressed before transmission over modern telecommunication networks. When video is stored on a storage device, the size of the video may also be an issue because memory resources may be limited. Video compression devices typically encode video data at a source using software and/or hardware prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. With limited network resources and growing demand for higher video quality, improved compression and decompression techniques are desired to increase the compression ratio with little sacrifice in image quality.

Disclosure of Invention

Embodiments of the present disclosure provide apparatuses and methods for encoding and decoding according to the independent claims.

The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementations are apparent from the dependent claims, the description and the drawings.

According to a first aspect, the present disclosure provides an inter prediction method, comprising the steps of: acquiring an initial motion vector and a reference image for bidirectional prediction; obtaining a set of candidate sample positions in the reference image from the initial motion vector and the candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector; obtaining a respective set of sample positions from each set of candidate sample positions; calculating a matching cost for each candidate motion vector within each set of sample positions; obtaining a refined motion vector based on the calculated matching cost of each candidate motion vector; and obtaining a prediction value of the current block based on the refined motion vector.

The method according to the first aspect reduces the complexity of the prediction process associated with generating prediction samples for calculating the matching cost value.
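
By way of illustration only, the following is a minimal sketch, in Python with NumPy, of one possible reading of the method of the first aspect. It assumes integer-pel motion vectors and offsets, mirrored offsets between the two reference lists, a square search range, and reference sample arrays that are already padded so that every candidate position lies inside them; the decimation of the matching cost keeps every other row. Names such as refine_mv, ref0, and ref1 are illustrative and are not part of the disclosure.

    import numpy as np

    def refine_mv(ref0, ref1, block_pos, block_size, mv0, mv1, search_range=2):
        # One reading of the first-aspect method: evaluate integer offsets around the
        # initial bi-prediction motion vectors, compute the matching cost on
        # row-decimated candidate predictions, and keep the offset with the lowest cost.
        (x, y), (w, h) = block_pos, block_size
        best_cost, best_offset = None, (0, 0)
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                # candidate motion vectors: initial MV plus a preset offset,
                # mirrored between reference list 0 and reference list 1
                p0 = ref0[y + mv0[1] + dy:y + mv0[1] + dy + h,
                          x + mv0[0] + dx:x + mv0[0] + dx + w]
                p1 = ref1[y + mv1[1] - dy:y + mv1[1] - dy + h,
                          x + mv1[0] - dx:x + mv1[0] - dx + w]
                # decimated matching cost: only alternate rows enter the SAD
                cost = np.abs(p0[::2].astype(np.int64) - p1[::2]).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_offset = cost, (dx, dy)
        dx, dy = best_offset
        # refined motion vectors, used afterwards to obtain the prediction of the current block
        return (mv0[0] + dx, mv0[1] + dy), (mv1[0] - dx, mv1[1] - dy)

In such a sketch, the prediction value of the current block would then be generated with the refined motion vectors, e.g., by averaging the two motion-compensated blocks.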

In a possible implementation form of the method according to the first aspect, the initial motion vector and/or the reference picture is obtained based on signaled indication information in the bitstream. Since the initial motion vector is signaled in the bitstream, it cannot be represented with very high accuracy, as this would increase the bitrate; the initial motion vector is therefore improved by a motion vector refinement procedure even when its accuracy is not very high.

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, the initial motion vector is obtained from a motion vector predictor and a motion vector difference signaled in the bitstream. Thus, the motion vector may be determined based on the indication information defining the initial motion vector in the bitstream and the signaled difference value.

In a possible implementation form of the method according to the previous implementation, the motion vector predictor is indicated by an index signaled in the bitstream, wherein the index is used to indicate a position in the candidate vector list. This is an efficient way of including the initial motion vector in the bitstream with respect to the number of bits.

The initial motion vector may be determined based on indication information in the bitstream. For example, an index may be signaled in the bitstream, where the index indicates a position in the candidate motion vector list. In another example, the motion vector predictor index and the motion vector difference value may be signaled in a bitstream.
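
As an informal illustration of the two signaling variants above, the following sketch (Python, with illustrative names only) derives the initial motion vector from a candidate list index, optionally adding a signaled motion vector difference.

    def derive_initial_mv(candidate_list, mvp_index, mvd=(0, 0)):
        # candidate_list: motion vector predictor candidates built by the decoder;
        # mvp_index and mvd are assumed to have been parsed from the bitstream.
        mvp = candidate_list[mvp_index]
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])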

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, the matching cost is calculated using a similarity measure, a dissimilarity measure, a Sum of Absolute Differences (SAD), a Mean Removed Sum of Absolute Differences (MRSAD) or a Sum of Squared Errors (SSE). These are useful functions for calculating the matching cost.
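
The following are straightforward NumPy versions of the SAD, MRSAD, and SSE measures mentioned above, shown only as an illustration of how such matching costs could be evaluated over two blocks of prediction samples.

    import numpy as np

    def sad(a, b):
        return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

    def mrsad(a, b):
        a = a.astype(np.float64); b = b.astype(np.float64)
        # subtracting the block means first removes a constant brightness offset
        return np.abs((a - a.mean()) - (b - b.mean())).sum()

    def sse(a, b):
        d = a.astype(np.int64) - b.astype(np.int64)
        return (d * d).sum()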

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, the set of candidate sample positions is within a bounding rectangular region, and the bounding rectangular region is calculated using the initial motion vector, an upper left corner position of the current block, and motion vector refinement ranges in horizontal and vertical directions.
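
A simple sketch of how such a bounding rectangle could be computed is given below; it assumes an integer-pel initial motion vector and symmetric refinement ranges sx and sy in the horizontal and vertical directions (names are illustrative only).

    def bounding_rect(top_left_x, top_left_y, width, height, init_mv, sx, sy):
        # Rectangle in the reference image that contains every candidate sample position
        # reachable from the initial motion vector within the refinement range.
        left   = top_left_x + init_mv[0] - sx
        top    = top_left_y + init_mv[1] - sy
        right  = top_left_x + init_mv[0] + width  - 1 + sx
        bottom = top_left_y + init_mv[1] + height - 1 + sy
        return left, top, right, bottom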

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, the set of sample positions is obtained by regular decimation of the set of candidate sample positions. This method improves the compression efficiency compared to a method that does not use any decimation.

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, the sample positions on alternate rows of the set of candidate sample positions are selected as the set of sample positions. This increases the speed of the method and reduces the storage requirements.

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, a zipper pattern of interpolated sample positions in the set of candidate sample positions is selected as the set of sample positions. For example, even-parity rows may produce first prediction samples at a predetermined number of consecutive sample positions aligned to the left (or right), and odd-parity rows may produce prediction samples at a predetermined number of consecutive sample positions aligned to the right (or left).
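
The two selection patterns discussed above (alternate rows and a zipper pattern) can be pictured with the following boolean masks over a grid of candidate sample positions. This is only a sketch of one possible reading of the zipper pattern, in which even-parity rows keep k consecutive positions aligned left and odd-parity rows keep k consecutive positions aligned right.

    import numpy as np

    def alternate_row_mask(height, width):
        mask = np.zeros((height, width), dtype=bool)
        mask[::2, :] = True            # keep every other row of candidate positions
        return mask

    def zipper_mask(height, width, k):
        mask = np.zeros((height, width), dtype=bool)
        mask[0::2, :k] = True          # even rows: k consecutive positions, left-aligned
        mask[1::2, width - k:] = True  # odd rows: k consecutive positions, right-aligned
        return mask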

In a possible implementation form of the method according to any of the preceding implementation forms or the first aspect, the initial motion vector is a first motion vector or a second motion vector, and the first motion vector and the second motion vector correspond to different reference picture lists.

According to a second aspect, the present disclosure provides an encoder comprising processing circuitry for performing a method according to the first aspect or any preceding implementation.

According to a third aspect, the present disclosure provides a decoder comprising processing circuitry for performing a method according to the first aspect or any preceding implementation.

According to a fourth aspect, the present disclosure provides a computer program product comprising program code for performing the method according to the first aspect or any of the preceding implementations.

According to a fifth aspect, the present invention provides a decoder comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing programming for execution by the processor, wherein the programming, when executed by the processor, configures the decoder to perform a method according to the first aspect or any of the preceding implementations.

According to a sixth aspect, the present disclosure provides an encoder comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing programming for execution by the processor, wherein the programming, when executed by the processor, configures the encoder to perform a method according to the first aspect or any of the preceding implementations.

According to a seventh aspect, the present disclosure provides an apparatus comprising: motion vector refinement means for: acquiring an initial motion vector and a reference image for bidirectional prediction; obtaining a set of candidate sample positions in the reference image from the initial motion vector and the candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective preset motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector; obtaining a respective set of sample positions from each set of candidate sample positions; calculating a matching cost for each candidate motion vector within each set of sample positions; obtaining a refined motion vector based on the calculated matching cost of each candidate motion vector; and obtaining a prediction value of the current block based on the refined motion vector. Thus, the apparatus (in particular the motion vector refinement means) is configured to perform the method steps of the method according to the first aspect.

The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Drawings

Embodiments of the invention are described in more detail below with reference to the attached drawing figures, wherein:

FIG. 1A is a block diagram illustrating an example of a video encoding system for implementing an embodiment of the present invention;

FIG. 1B is a block diagram illustrating another example of a video encoding system for implementing an embodiment of the present invention;

FIG. 2 is a block diagram illustrating an example of a video encoder for implementing an embodiment of the present invention;

FIG. 3 is a block diagram showing an example structure of a video decoder for implementing an embodiment of the present invention;

FIG. 4 is a block diagram showing an example of an encoding apparatus or a decoding apparatus;

FIG. 5 is a block diagram showing another example of an encoding apparatus or a decoding apparatus;

FIG. 6 is an example of interpolated samples used for refinement, with example vertical decimation;

FIG. 7A is an example of samples used for calculating the matching cost;

FIG. 7B is another example of samples used for calculating the matching cost;

FIG. 8A is another example of interpolated samples used for refinement, with example vertical decimation;

FIG. 8B is an example of samples used for calculating the matching cost;

FIG. 8C is another example of samples used for calculating the matching cost;

FIG. 9 is a flow diagram of one embodiment for obtaining the final prediction samples;

FIG. 10A is an example of interpolated samples selected in a zipper pattern;

FIG. 10B is an example of samples used for calculating the overlapped matching cost;

FIG. 10C is another example of samples used for calculating the overlapped matching cost;

FIG. 11 is a flow chart for an embodiment according to the first aspect of the present disclosure; and

FIG. 12 illustrates an embodiment of an inter prediction unit including a motion vector refinement apparatus.

In the following, identical reference numerals denote identical or at least functionally equivalent features, if not explicitly stated otherwise.

Detailed Description

In the following description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific aspects of embodiments of the invention or which may be used. It should be understood that embodiments of the invention may be used in other ways and include structural or logical changes not shown in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.

For example, it should be understood that the disclosure in connection with the described method is also true for a corresponding apparatus or system for performing the method, and vice versa. For example, if one or more particular method steps are described, a corresponding apparatus may comprise one or more units, e.g., functional units, to perform the described one or more method steps (e.g., one unit performs one or more steps, or each of multiple units performs one or more of the multiple steps), even if the one or more units are not explicitly described or shown in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units (e.g., functional units), the corresponding method may include one step of performing the function of the one or more units (e.g., one step performs the function of the one or more units, or each of the plurality of steps performs the function of one or more of the plurality of units), even if the one or more steps are not explicitly described or illustrated in the figures. Further, it should be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Video coding typically refers to the processing of a sequence of images which forms the video or video sequence. Instead of the term "image", the terms "frame" or "picture" may be used as synonyms in the field of video coding. Video coding (or coding in general) comprises two parts, video encoding and video decoding. Video encoding is performed on the source side, typically involving processing (e.g., by compression) of the original video images to reduce the amount of data required to represent the video images (for more efficient storage and/or transmission). Video decoding is performed at the destination side and typically involves the inverse processing compared to the encoder to reconstruct the video images. Embodiments that refer to the "coding" of video images (or images in general) are understood to relate to the "encoding" or "decoding" of video images or respective video sequences. The combination of the encoding part and the decoding part is also called CODEC (Coding and Decoding).

In the case of lossless video coding, the original video image can be reconstructed, i.e., the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed, for example by quantization, to reduce the amount of data representing the video image, which cannot be fully reconstructed at the decoder, i.e. the quality of the reconstructed video image is lower or inferior to the quality of the original video image.

Several video coding standards belong to the group of "lossy hybrid video codecs" (i.e., spatial and temporal prediction in the sample domain is combined with 2D transform coding in the transform domain for applying quantization). Each image of a video sequence is typically partitioned into a set of non-overlapping blocks, and encoding is typically performed at the block level. In other words, at the encoder, the video is typically processed (i.e. encoded) at the block (video block) level, for example by generating a prediction block using spatial (intra picture) prediction and/or temporal (inter picture) prediction, subtracting the prediction block from the current block (currently processed/block to be processed) to obtain a residual block, transforming the residual block and quantizing the residual block in the transform domain to reduce the amount of data to be transmitted (compression), while at the decoder, the inverse process compared to the encoder is applied to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder replicates the decoder processing loop so that both will generate the same prediction (e.g., intra and inter prediction) and/or reconstruction for processing (i.e., encoding) subsequent blocks.

In the following embodiments of the video encoding system 10, the video encoder 20 and the video decoder 30 are described based on fig. 1 to 3.

Fig. 1A is a schematic block diagram illustrating an example encoding system 10, e.g., a video encoding system 10 (or simply encoding system 10), that may utilize the techniques of the present application. Video encoder 20 (or simply encoder 20) and video decoder 30 (or simply decoder 30) of video encoding system 10 represent examples of devices that may be used to perform techniques in accordance with various examples described in this application.

As shown in fig. 1A, encoding system 10 includes a source device 12 for providing encoded image data 21 (e.g., to a destination device 14 for decoding encoded image data 13).

Source device 12 includes an encoder 20 and may additionally, i.e., optionally, include an image source 16, a pre-processor (or pre-processing unit) 18 (e.g., image pre-processor 18), and a communication interface or unit 22.

Image source 16 may include or may be any type of image capture device, such as a camera for capturing real-world images, and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of other device for acquiring and/or providing real-world images, computer-generated images (e.g., screen content, Virtual Reality (VR) images), and/or any combination thereof (e.g., Augmented Reality (AR) images). The image source can be any type of memory or storage that stores any of the aforementioned images.

In distinction to the preprocessing performed by preprocessor 18 (or preprocessing unit 18), the image or image data 17 may also be referred to as the raw image or raw image data 17.

Preprocessor 18 is to receive (raw) image data 17 and perform preprocessing on image data 17 to obtain a preprocessed image 19 or preprocessed image data 19. The pre-processing performed by pre-processor 18 may include, for example, trimming, color format conversion (e.g., from RGB to YCbCr), color correction, or de-noising. It is to be understood that the pre-processing unit 18 may be an optional component.

Video encoder 20 is operative to receive pre-processed image data 19 and provide encoded image data 21 (further details are described below, e.g., based on fig. 2).

Communication interface 22 of source device 12 may be used to receive encoded image data 21 and send encoded image data 21 (or any further processed version thereof) over communication channel 13 to another device, such as destination device 14 or any other device for storage or direct reconstruction.

Destination device 14 includes a decoder 30 (e.g., a video decoder 30) and may additionally, i.e., optionally, include a communication interface or communication unit 28, a post-processor 32 (or post-processing unit 32), and a display device 34.

Communication interface 28 of destination device 14 is used to receive encoded image data 21 (or any further processed version thereof), e.g., directly from source device 12 or from any other source (e.g., a storage device such as an encoded image data storage device), and provide encoded image data 21 to decoder 30.

Communication interface 22 and communication interface 28 may be used to send or receive encoded image data 21 or encoded data 13 via a direct communication link (e.g., a direct wired or wireless connection, or via any type of network, such as a wired or wireless network or any combination thereof, or any type of private and public network, or any combination thereof) between source device 12 and destination device 14.

Communication interface 22 may, for example, be used to package encoded image data 21 into a suitable format, such as packets, and/or process the encoded image data using any type of transport encoding or processing for transmission over a communication link or communication network.

Communication interface 28, which forms a counterpart of communication interface 22, may for example be used to receive transmitted data and process the transmitted data using any type of corresponding transmission decoding or processing and/or de-encapsulation to obtain encoded image data 21.

Both communication interface 22 and communication interface 28 may be configured as a unidirectional communication interface (as indicated by the arrow of communication channel 13 pointing from source device 12 to destination device 14 in fig. 1A) or a bidirectional communication interface, and may be used, for example, to send and receive messages, for example, to establish a connection, to acknowledge and exchange any other information related to a communication link and/or a data transmission (e.g., an encoded image data transmission).

Decoder 30 is operative to receive encoded image data 21 and provide decoded image data 31 or decoded image 31 (further details are described below, e.g., based on fig. 3 or 5).

The post-processor 32 of the destination device 14 is configured to post-process the decoded image data 31 (also referred to as reconstructed image data), such as the decoded image 31, to obtain post-processed image data 33, such as the post-processed image 33. Post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color correction, trimming or resampling, or any other processing, e.g., to prepare decoded image data 31 for display, e.g., by display device 34.

The display device 34 of the destination device 14 is used to receive the post-processed image data 33 for displaying the image (e.g., to a user or viewer). The display device 34 may be or may include any type of display, such as an integrated or external display or monitor, for representing the reconstructed image. The display may, for example, include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), a Digital Light Processor (DLP), or any other type of display.

Although fig. 1A depicts source device 12 and destination device 14 as separate devices, embodiments of devices may also include both devices or both functionalities, i.e., source device 12 or corresponding functionality and destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software or by separate hardware and/or software or any combination thereof.

Based on this description, it will be apparent to those skilled in the art that the existence and (exact) division of functionality of different units or functions within source device 12 and/or destination device 14, as shown in fig. 1A, may vary depending on the actual device and application.

Encoder 20 (e.g., video encoder 20), decoder 30 (e.g., video decoder 30), or both encoder 20 and decoder 30 may be implemented via processing circuitry as shown in fig. 1B, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, dedicated video coding hardware, or any combination thereof. Encoder 20 may be implemented via processing circuitry 46 to embody the various modules as discussed with respect to encoder 20 of fig. 2 and/or any other encoder system or subsystem described herein. Decoder 30 may be implemented via processing circuitry 46 to embody the various modules as discussed with respect to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. The processing circuitry may be used to perform the various operations discussed later. As shown in fig. 5, if the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Either of video encoder 20 and video decoder 30 may be integrated as part of a combined encoder/decoder (CODEC) in a single device, for example, as shown in fig. 1B.

Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or fixed device, such as a notebook or laptop computer, a mobile phone, a smart phone, a tablet or tablet computer, a camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (such as a content service server or a content delivery server), a broadcast receiver device, a broadcast transmitter device, etc., and may not use or use any type of operating system. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices.

In some cases, the video encoding system 10 shown in fig. 1A is merely one example, and the techniques of this application may be applied to a video encoding setup (e.g., video encoding or video decoding), which does not necessarily include any data communication between an encoding device and a decoding device. In other examples, data is retrieved from local memory, streamed over a network, and so forth. The video encoding device may encode and store data in memory, and/or the video decoding device may retrieve and decode data from memory. In some examples, the encoding and decoding are performed by devices that are not in communication with each other, but simply encode data to and/or retrieve and decode data from memory.

For ease of description, embodiments of the present invention are described herein with reference, for example, to High-Efficiency Video Coding (HEVC) or to the reference software of Versatile Video Coding (VVC), the next-generation video coding standard developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). One of ordinary skill in the art will appreciate that embodiments of the present invention are not limited to HEVC or VVC.

Encoder and encoding method

Fig. 2 shows a schematic block diagram of an example video encoder 20 for implementing the techniques of the present application. In the example of fig. 2, video encoder 20 includes an input 201 (or input interface 201), a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a loop filter unit 220, a Decoded Picture Buffer (DPB)230, a mode selection unit 260, an entropy coding unit 270, and an output 272 (or output interface 272). The mode selection unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a partition unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 as shown in fig. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.

The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, the mode selection unit 260 may be referred to as forming a forward signal path of the encoder 20, and the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB)230, the inter prediction unit 244, and the intra prediction unit 254 may be referred to as forming a backward signal path of the video encoder 20, wherein the backward signal path of the video encoder 20 corresponds to a signal path of a decoder (see the video decoder 30 in fig. 3). Inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, Decoded Picture Buffer (DPB)230, inter prediction unit 244, and intra prediction unit 254 are also referred to as "built-in decoders" that form video encoder 20.

Image & image partition (image & block)

The encoder 20 may be used for receiving images 17 (or image data 17), e.g. via input 201, e.g. an image in a sequence of images forming a video or video sequence. The received image or image data may also be a pre-processed image 19 (or pre-processed image data 19). For simplicity, the following description refers to image 17. The image 17 may also be referred to as the current image or the image to be encoded (in particular in video encoding, to distinguish the current image from other images, for example previously encoded and/or decoded images of the same video sequence, i.e. a video sequence that also comprises the current image).

The (digital) image is or can be regarded as a two-dimensional array or matrix of samples having intensity values. The samples in the array may also be referred to as pixels (short for picture elements) or pels. The number of samples in the horizontal and vertical directions (or axes) of the array or image defines the size and/or resolution of the image. For the representation of color, three color components are typically used, i.e., the image may be represented by or include three sample arrays. In the RGB format or color space, the image includes corresponding arrays of red, green, and blue samples. However, in video coding, each pixel is typically represented in a luminance and chrominance format or color space, such as YCbCr, which includes a luminance component indicated by Y (sometimes L is also used) and two chrominance components indicated by Cb and Cr. The luminance (or luma) component Y represents the brightness or gray level intensity (e.g., as in a gray-scale image), while the two chrominance (or chroma) components Cb and Cr represent the chrominance or color information components. Accordingly, an image in YCbCr format includes a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (Cb and Cr). An image in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also known as color transformation or conversion. If an image is monochrome, the image may include only an array of luma samples. Accordingly, an image may be, for example, an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format.
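
As an illustration of the color transformation mentioned above, the following sketch converts full-range 8-bit RGB samples to YCbCr using the BT.601 coefficients; this is only one of several conventions (others differ in coefficients, offsets, and range).

    import numpy as np

    def rgb_to_ycbcr(rgb):
        # rgb: array of shape (..., 3) with 8-bit full-range samples (BT.601 assumption)
        r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
        y  = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 0.564 * (b - y) + 128.0
        cr = 0.713 * (r - y) + 128.0
        return np.stack([y, cb, cr], axis=-1)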

Embodiments of the video encoder 20 may include an image partitioning unit (not shown in fig. 2) for partitioning the image 17 into a plurality of (typically non-overlapping) image blocks 203. These blocks may also be referred to as root blocks, macroblocks (h.264/AVC), or Coding Tree Blocks (CTBs) or Coding Tree Units (CTUs) (h.265/HEVC and VVC). The image partitioning unit may be adapted to use the same block size and corresponding grid defining the block size for all images of the video sequence, or to vary the block size between images or subsets or groups of images and partition each image into corresponding blocks.

In further embodiments, the video encoder may be adapted to receive the blocks 203 of the image 17 directly, e.g. forming one, several or all blocks of the image 17. The image block 203 may also be referred to as a current image block or an image block to be encoded.

Like image 17, image block 203 is also or can be thought of as a two-dimensional array or matrix of samples having intensity values (sample values), but of a smaller size than image 17. In other words, block 203 may include, for example, one sample array (e.g., a luma array in the case of a monochrome image 17, or a luma or chroma array in the case of a color image) or three sample arrays (e.g., a luma array and two chroma arrays in the case of a color image 17), or any other number and/or type of arrays, depending on the color format applied. The number of samples in the horizontal and vertical directions (or axes) of the block 203 defines the size of the block 203. Thus, a block may be, for example, an array of MxN (M columns by N rows) samples, or an array of MxN transform coefficients.

The embodiment of video encoder 20 as shown in fig. 2 may be used to encode image 17 block by block, e.g., encoding and prediction are performed in blocks 203.

The embodiment of video encoder 20 as shown in fig. 2 may further be used to partition and/or encode images by using slices (also referred to as video slices), wherein an image may be partitioned into or encoded using one or more slices (typically non-overlapping), and each slice may include one or more blocks (e.g., CTUs).

The embodiment of the video encoder 20 as shown in fig. 2 may further be used for partitioning and/or encoding an image by using groups of tiles (also referred to as video tiles) and/or tiles (also referred to as video tiles), wherein an image may be partitioned into or encoded using one or more groups of tiles (typically non-overlapping) and each group of tiles may comprise for example one or more blocks (e.g. CTUs) or one or more tiles, wherein each tile may for example be rectangular shaped and may comprise one or more blocks (e.g. CTUs), e.g. complete or partial blocks.

Residual calculation

The residual calculation unit 204 may be configured to calculate a residual block 205 (also referred to as a residual 205) based on the image block 203 and the prediction block 265 (further details regarding the prediction block 265 are provided later), e.g. by subtracting sample values of the prediction block 265 from sample values of the image block 203, obtaining the residual block 205 in the sample domain sample-by-sample (pixel-by-pixel).
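
A sketch of the sample-by-sample residual calculation (and of the corresponding addition later performed by reconstruction unit 214) on a hypothetical 4x4 block with a constant prediction:

    import numpy as np

    image_block      = np.array([[52, 55, 61, 66],
                                 [70, 61, 64, 73],
                                 [63, 59, 55, 90],
                                 [67, 61, 68, 104]], dtype=np.int16)
    prediction_block = np.full((4, 4), 64, dtype=np.int16)

    residual_block = image_block - prediction_block              # encoder: subtract prediction
    reconstructed  = np.clip(residual_block + prediction_block,  # decoder/encoder loop: add back
                             0, 255)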

Transform

The transform processing unit 206 may be configured to apply a transform, such as a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST), to the sample values of the residual block 205 to obtain transform coefficients 207 in the transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent a residual block 205 in the transform domain.

The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transforms specified for h.265/HEVC. Such integer approximations are typically scaled by a factor compared to the orthogonal DCT transform. To preserve the norm of the residual block processed by the forward and inverse transforms, an additional scaling factor is applied as part of the transform process. The scaling factor is typically selected based on certain constraints, such as the scaling factor being a power of 2 for the shift operation, the scaling factor being a trade-off between bit depth of the transform coefficients, precision and implementation cost, etc. The particular scaling factor is specified for the inverse transform, e.g., by inverse transform processing unit 212 (and corresponding inverse transform, e.g., performed by inverse transform processing unit 312 at video decoder 30), and the corresponding scaling factor for the forward transform at encoder 20 may be specified accordingly, e.g., by transform processing unit 206.
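
For illustration, the sketch below applies an orthonormal floating-point 2-D DCT-II to a square residual block. As noted above, the transforms specified for HEVC/VVC are integer approximations with additional scaling, so this is not the normative transform, only a minimal model of the forward and inverse operations.

    import numpy as np

    def dct_matrix(n):
        # orthonormal DCT-II basis matrix of size n x n
        k = np.arange(n).reshape(-1, 1)
        i = np.arange(n).reshape(1, -1)
        m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
        m[0, :] /= np.sqrt(2.0)
        return m

    def forward_dct2d(residual):        # residual: square block (assumption)
        c = dct_matrix(residual.shape[0])
        return c @ residual @ c.T       # separable transform: columns, then rows

    def inverse_dct2d(coeffs):
        c = dct_matrix(coeffs.shape[0])
        return c.T @ coeffs @ c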

Embodiments of video encoder 20 (transform processing unit 206, respectively) may be configured to output transform parameters, e.g., a type of one or more transforms, e.g., directly output or encoded or compressed via entropy coding unit 270, such that, e.g., video decoder 30 may receive and use the transform parameters for decoding.

Quantization

Quantization unit 208 may be used to quantize transform coefficients 207 (e.g., by applying scalar quantization or vector quantization) to obtain quantized coefficients 209. Quantized coefficients 209 may also be referred to as quantized transform coefficients 209 or quantized residual coefficients 209.

The quantization process may reduce the bit depth associated with some or all of transform coefficients 207. For example, during quantization, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient, where n is greater than m. The degree of quantization may be modified by adjusting a Quantization Parameter (QP). For example, for scalar quantization, different scaling may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization and larger quantization step sizes correspond to coarser quantization. The applicable quantization step size may be indicated by a Quantization Parameter (QP). The quantization parameter may, for example, be an index into a predefined set of applicable quantization step sizes. For example, a small quantization parameter may correspond to fine quantization (a small quantization step size) and a large quantization parameter may correspond to coarse quantization (a large quantization step size), or vice versa. Quantization may comprise division by a quantization step size, and the corresponding and/or inverse dequantization (e.g., performed by inverse quantization unit 210) may comprise multiplication by the quantization step size. According to some standards, e.g., HEVC, a quantization parameter may be used to determine the quantization step size. In general, the quantization step size may be calculated based on the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which might be modified because of the scaling used in the fixed-point approximation of the equation for the quantization step size and the quantization parameter. In one example implementation, the scaling of the inverse transform and the dequantization may be combined. Alternatively, customized quantization tables may be used and signaled (e.g., in the bitstream) from the encoder to the decoder. Quantization is a lossy operation, where the loss increases with increasing quantization step size.
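
A simplified floating-point sketch of the QP-to-step-size relationship described above (in HEVC the step size approximately doubles for every increase of the quantization parameter by 6); the standards themselves use fixed-point arithmetic and scaling tables, so this is illustrative only.

    import numpy as np

    def quantization_step(qp):
        return 2.0 ** ((qp - 4) / 6.0)         # approximate HEVC-style relationship

    def quantize(coefficients, qp):
        return np.round(coefficients / quantization_step(qp)).astype(np.int32)

    def dequantize(levels, qp):
        return levels * quantization_step(qp)  # inverse quantization (lossy round trip)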

Embodiments of video encoder 20 (quantization unit 208, respectively) may be used to output Quantization Parameters (QPs), e.g., directly output or encoded via entropy coding unit 270, such that, for example, video decoder 30 may receive and apply the quantization parameters for decoding.

Inverse quantization

The inverse quantization unit 210 is configured to apply an inverse quantization of the quantization unit 208 on the quantized coefficients to obtain dequantized coefficients 211, e.g. by applying an inverse of the quantization scheme applied by the quantization unit 208 based on or using the same quantization step as the quantization unit 208. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to the transform coefficients 207, although they are usually not identical to the transform coefficients due to the loss of quantization.

Inverse transformation

Inverse transform processing unit 212 is to apply an inverse transform of the transform applied by transform processing unit 206, such as an inverse Discrete Cosine Transform (DCT) or an inverse Discrete Sine Transform (DST) or other inverse transform, to obtain a reconstructed residual block 213 (or corresponding dequantized coefficients 213) in the sample domain. The reconstructed residual block 213 may also be referred to as a transform block 213.

Reconstruction

The reconstruction unit 214 (e.g., an adder or adder 214) is configured to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain a reconstructed block 215 in the sample domain, e.g., by adding sample values in the reconstructed residual block 213 and sample values in the prediction block 265 sample by sample.

Filtering

Loop filter unit 220 (or simply "loop filter" 220) is used to filter reconstructed block 215 to obtain filtered block 221, or generally, to filter reconstructed samples to obtain filtered samples. The loop filtering unit is used, for example, to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter, or any combination thereof. Although loop filter unit 220 is shown in fig. 2 as an in-loop (in loop) filter, in other configurations, loop filter unit 220 may be implemented as a post-loop (post loop) filter. The filtered block 221 may also be referred to as a filtered reconstruction block 221.

Embodiments of video encoder 20 (loop filter unit 220, respectively) may be configured to output loop filter parameters (e.g., sample adaptive offset information), e.g., directly output or encoded via entropy coding unit 270, such that, e.g., decoder 30 may receive and apply the same loop filter parameters or respective loop filters for decoding.

Decoded picture buffer

Decoded Picture Buffer (DPB)230 may be a memory that stores reference pictures, or in general, reference picture data, for encoding video data by video encoder 20. DPB 230 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) (including Synchronous DRAM (SDRAM)), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. A Decoded Picture Buffer (DPB)230 may be used to store the one or more filtered blocks 221. The decoded picture buffer 230 may also be used to store other previously filtered blocks, e.g., previously reconstructed and filtered blocks 221, of the same current picture or of a different picture (e.g., a previously reconstructed picture), and may provide a complete previously reconstructed (i.e., decoded) picture (and corresponding reference blocks and samples) and/or a partially reconstructed current picture (and corresponding reference blocks and samples), e.g., for inter prediction. The Decoded Picture Buffer (DPB)230 may also be used to store one or more unfiltered reconstructed blocks 215, or in general, unfiltered reconstructed samples, e.g. if the reconstructed block 215 is not filtered by the loop filter unit 220, or any other further processed version of the reconstructed block or sample.

Mode selection (partition & prediction)

Mode selection unit 260 includes a partition unit 262, an inter prediction unit 244, and an intra prediction unit 254, and mode selection unit 260 is used to receive or obtain raw image data, such as raw block 203 (current block 203 of current image 17), and reconstructed image data, such as filtered and/or unfiltered reconstructed samples or blocks of the same (current) image and/or from one or more previously decoded images, such as from decoded image buffer 230 or other buffers (e.g., line buffers, not shown). The reconstructed image data is used as reference image data for prediction (e.g., inter prediction or intra prediction) to obtain a prediction block 265 or a prediction value 265.

The mode selection unit 260 may be used to determine or select a partition (including no partition) and a prediction mode (e.g., intra or inter prediction mode) for the current block prediction mode and generate a corresponding prediction block 265 for the calculation of the residual block 205 and reconstruction of the reconstructed block 215.

Embodiments of the mode selection unit 260 may be used to select partition and prediction modes (e.g., from those supported or available by the mode selection unit 260) that provide the best match or in other words the smallest residual (the smallest residual means better compression of transmission or storage), or the smallest signaling overhead (the smallest signaling overhead means better compression of transmission or storage), or both, which are considered or balanced. The mode selection unit 260 may be configured to determine the partition and the prediction mode based on Rate Distortion Optimization (RDO), i.e. to select the prediction mode that provides the smallest rate distortion. Terms such as "best," "minimum," "optimal," and the like, herein do not necessarily refer to "best," "minimum," "optimal," and the like as a whole, but may also refer to termination or satisfaction of selection criteria, such as values above or below a threshold or other constraints that may result in "sub-optimal selection" but reduce complexity and processing time.
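
Rate distortion optimization as described above can be summarized by the Lagrangian cost J = D + lambda * R; the sketch below simply selects the candidate with the smallest such cost (the names and the candidate representation are illustrative, not part of the disclosure).

    def rd_cost(distortion, rate_bits, lam):
        return distortion + lam * rate_bits

    def select_mode(candidates, lam):
        # candidates: iterable of (mode, distortion, rate_bits) measured by the encoder
        return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))

    # e.g. select_mode([("intra", 1200, 96), ("inter", 900, 150)], lam=3.0)
    # picks the "inter" candidate (900 + 3.0 * 150 = 1350 < 1200 + 3.0 * 96 = 1488)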

In other words, the partition unit 262 may be used to partition the block 203 into smaller block partitions or sub-blocks (which again form blocks), e.g., iteratively using quad-tree partitions (QT), binary-tree partitions (BT), or triple-tree partitions (TT), or any combination thereof, and perform, e.g., prediction of each block partition or sub-block, where the pattern selection includes selection of the tree structure of the partitioned block 203 and the prediction pattern is applied to each block partition or sub-block.

In the following, the partitioning (e.g., by partition unit 260) and prediction processing (performed by inter prediction unit 244 and intra prediction unit 254) performed by example video encoder 20 will be explained in more detail.

Partitioning

The partition unit 262 may partition (or divide) the current block 203 into smaller partitions, such as smaller blocks of square or rectangular size. These smaller blocks (which may also be referred to as sub-blocks) may be further partitioned into even smaller partitions. This is also referred to as tree partitioning or hierarchical tree partitioning, wherein, e.g., a root block at root tree level 0 (hierarchy level 0, depth 0) may be recursively partitioned, e.g., into two or more blocks of the next lower tree level, e.g., nodes at tree level 1 (hierarchy level 1, depth 1), wherein these blocks may again be partitioned into two or more blocks of the next lower level, e.g., tree level 2 (hierarchy level 2, depth 2), etc., until the partitioning is terminated, e.g., because a termination criterion is met, e.g., a maximum tree depth or a minimum block size is reached. Blocks that are not further partitioned are also referred to as leaf blocks or leaf nodes of the tree. A tree using partitioning into two partitions is referred to as a binary tree (BT), a tree using partitioning into three partitions is referred to as a ternary tree (TT), and a tree using partitioning into four partitions is referred to as a quad tree (QT).
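
A minimal sketch of recursive quad-tree partitioning as described above; the split decision is abstracted into a caller-provided predicate, and power-of-two block sizes are assumed (BT and TT splits would follow the same recursive scheme with two or three child blocks).

    def quadtree_partition(x, y, width, height, should_split, min_size=8):
        # Returns the leaf blocks (x, y, width, height) of the partitioning tree.
        if width <= min_size or height <= min_size or not should_split(x, y, width, height):
            return [(x, y, width, height)]                      # leaf node, no further split
        hw, hh = width // 2, height // 2
        leaves = []
        for qx, qy in [(x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)]:
            leaves += quadtree_partition(qx, qy, hw, hh, should_split, min_size)
        return leaves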

As previously mentioned, the term "block" as used herein may be a portion of an image, in particular a square or rectangular portion. For example, referring to HEVC and VVC, a block may be or correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), and a Transform Unit (TU) and/or a corresponding block, such as a Coding Tree Block (CTB), a Coding Block (CB), a Transform Block (TB), or a Prediction Block (PB).

For example, a Coding Tree Unit (CTU) may be or include a CTB of luma samples and two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or of a picture that is coded using three separate color planes and syntax structures used to code the samples. Correspondingly, a Coding Tree Block (CTB) may be a block of NxN samples for some value of N, such that the division of a component into CTBs is a partitioning. A Coding Unit (CU) may be or include a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or of a picture that is coded using three separate color planes and syntax structures used to code the samples. Correspondingly, a Coding Block (CB) may be a block of MxN samples for some values of M and N, such that the division of a CTB into coding blocks is a partitioning.

In an embodiment, for example, according to HEVC, a Coding Tree Unit (CTU) may be divided into CUs by using a quadtree structure represented as a coding tree. The decision whether to encode an image region using inter-image (temporal) prediction or intra-image (spatial) prediction is made at the CU level. Each CU may be further partitioned into one, two, or four PUs depending on the PU partition type. Within one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying a prediction process based on the PU partition type, the CU may be divided into Transform Units (TUs) according to another quadtree structure similar to a coding tree used for the CU.

In an embodiment, for example, according to the latest video coding standard currently in development, which is referred to as Versatile Video Coding (VVC), combined quad-tree and binary-tree (QTBT) partitioning is used to partition the coding blocks. In the QTBT block structure, a CU may be square or rectangular. For example, a Coding Tree Unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary-tree or ternary (or triple) tree structure. The partitioned leaf nodes, called Coding Units (CUs), are used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In parallel, multiple partition types, e.g., ternary-tree partitioning, may be used together with the QTBT block structure.

In some embodiments, for example in the VVC draft standard, Virtual Pipeline Data Units (VPDUs) are defined to facilitate processing pipelines in hardware with internal memory that is limited compared to the CTU size. A CTU is virtually partitioned into uniform sub-blocks of luma samples and corresponding chroma samples, called VPDUs, with a particular processing order among the partitions within the CTU, such that the processing of a given VPDU does not depend on the processing of any other future VPDU in the processing order. However, some syntax elements can still be signaled in the bitstream at the CTU level and then apply to all VPDUs of that CTU. Certain constraints may be imposed on the partitioning to ensure that coding units completely span one or more VPDUs and do not partially cover a VPDU.

In one example, mode select unit 260 of video encoder 20 may be used to perform any combination of the partitioning techniques described herein.

As described above, video encoder 20 is used to determine or select a best or optimal prediction mode from a (e.g., predetermined) set of prediction modes. The set of prediction modes may include, for example, intra-prediction modes and/or inter-prediction modes.

Intra prediction

The set of intra prediction modes may comprise 35 different intra prediction modes, e.g. non-directional modes like DC (or mean) mode and planar mode, or directional modes (e.g. as defined by HEVC), or may comprise 67 different intra prediction modes, e.g. non-directional modes like DC (or mean) mode and planar mode, or directional modes (e.g. as defined by VVC).

The intra-prediction unit 254 is configured to generate the intra-prediction block 265 using reconstructed samples of neighboring blocks of the same current picture according to an intra-prediction mode in the set of intra-prediction modes.

Intra-prediction unit 254 (or, generally, mode selection unit 260) is also to output intra-prediction parameters (or, generally, information indicating the intra-prediction mode selected for the block) to entropy coding unit 270 in the form of syntax elements 266 for inclusion in encoded image data 21 so that, for example, video decoder 30 may receive and use the prediction parameters for decoding.

Inter prediction

The set of (or possible) inter prediction modes depends on the available reference pictures (i.e. previously at least partially decoded pictures, e.g. stored in the DPB 230) and other inter prediction parameters, e.g. whether the entire reference picture or only a part of the reference picture (e.g. a search window area around the area of the current block) is used for searching for the best matching reference block, and/or e.g. whether pixel interpolation, e.g. half pixel (half-pel) and/or quarter pixel interpolation, is applied.

In addition to the prediction mode described above, a skip mode and/or a direct mode may be applied.

The inter prediction unit 244 may include a Motion Estimation (ME) unit and a Motion Compensation (MC) unit (neither shown in fig. 2). The motion estimation unit may be configured to receive or retrieve an image block 203 (a current image block 203 of a current image 17) and a decoded image 231, or at least one or more previously reconstructed blocks, e.g. reconstructed blocks of one or more other/different previously decoded images 231, for motion estimation. For example, the video sequence may include a current picture and a previously decoded picture 231, or in other words, the current picture and the previously decoded picture 231 may be part of or may form a sequence of pictures forming the video sequence.

The encoder 20 may for example be configured to select a reference block from a plurality of reference blocks of the same or different ones of a plurality of other images and to provide the motion estimation unit with the reference image (or reference image index) and/or an offset (spatial offset) between the position (x, y coordinates) of the reference block and the position of the current block as an inter prediction parameter. This offset is also called a Motion Vector (MV).

The motion compensation unit is configured to obtain (e.g., receive) inter-prediction parameters and perform inter-prediction based on or using the inter-prediction parameters to obtain an inter-prediction block 265. The motion compensation performed by the motion compensation unit may comprise taking or generating a prediction block based on a motion/block vector determined by motion estimation, possibly performing an interpolation to sub-pixel accuracy. Interpolation filtering may generate additional pixel samples from known pixel samples, thus potentially increasing the number of candidate prediction blocks that may be used to encode an image block. When receiving the motion vector of the PU of the current image block, the motion compensation unit may locate the prediction block to which the motion vector points in one of the reference picture lists.

Motion compensation unit may also generate syntax elements associated with the block and the video slice for use by video decoder 30 in decoding image blocks of the video slice. The tile groups and/or tiles and respective syntax elements may be generated or used in addition to or instead of slices and respective syntax elements.

Entropy coding

Entropy coding unit 270 is configured to apply, for example, an entropy encoding algorithm or scheme (e.g., a Variable Length Coding (VLC) scheme, a context adaptive VLC scheme (CAVLC), an arithmetic coding scheme, binarization, Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), or another entropy coding method or technique) or bypass (no compression) to the quantized coefficients 209, inter-prediction parameters, intra-prediction parameters, loop filter parameters, and/or other syntax elements to obtain encoded image data 21, which may be output via output 272, e.g., in the form of an encoded bitstream 21, so that, for example, video decoder 30 may receive and use these parameters for decoding. The encoded bitstream 21 may be transmitted to video decoder 30 or stored in memory for later transmission or retrieval by video decoder 30.

Other structural variations of video encoder 20 may be used to encode the video stream. For example, for some blocks or frames, the non-transform based encoder 20 may quantize the residual signal directly without the transform processing unit 206. In another implementation, encoder 20 may combine quantization unit 208 and inverse quantization unit 210 into a single unit.

Decoder and decoding method

Fig. 3 shows an example of a video decoder 30 for implementing the techniques of the present application. The video decoder 30 is configured to receive encoded image data 21 (e.g., encoded bitstream 21), e.g., encoded by the encoder 20, to obtain a decoded image 331. The encoded image data or bitstream includes information for decoding the encoded image data, such as data representing image blocks and related syntax elements of an encoded video slice (and/or tile group or tile).

In the example of fig. 3, the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (e.g., an adder 314), a loop filter 320, a decoded picture buffer (DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354. The inter prediction unit 344 may be or may include a motion compensation unit. In some examples, video decoder 30 may perform a decoding process that is generally the inverse of the encoding process described with respect to video encoder 20 of fig. 2.

As explained with respect to encoder 20, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, Decoded Picture Buffer (DPB) 230, inter prediction unit 244, and intra prediction unit 254 are also referred to as the "built-in decoder" of video encoder 20. Accordingly, the inverse quantization unit 310 may be functionally identical to the inverse quantization unit 210, the inverse transform processing unit 312 may be functionally identical to the inverse transform processing unit 212, the reconstruction unit 314 may be functionally identical to the reconstruction unit 214, the loop filter 320 may be functionally identical to the loop filter 220, and the decoded picture buffer 330 may be functionally identical to the decoded picture buffer 230. Accordingly, the explanations provided for the various units and functions of video encoder 20 apply accordingly to the various units and functions of video decoder 30.

Entropy decoding

Entropy decoding unit 304 is to parse bitstream 21 (or, in general, encoded image data 21) and perform, for example, entropy decoding on encoded image data 21 to obtain, for example, quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), e.g., any or all of inter-prediction parameters (such as reference image indices and motion vectors), intra-prediction parameters (such as intra-prediction modes or indices), transform parameters, quantization parameters, loop filter parameters, and/or other syntax elements. Entropy decoding unit 304 may be used to apply a decoding algorithm or scheme corresponding to the encoding scheme described with respect to entropy coding unit 270 of encoder 20. Entropy decoding unit 304 may also be used to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360, and to provide other parameters to other units of decoder 30. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level. In addition to or instead of slices and respective syntax elements, groups of tiles and/or tiles and respective syntax elements may be received and/or used.

Inverse quantization

Inverse quantization unit 310 may be used to receive Quantization Parameters (QPs) (or, in general, information related to inverse quantization) and quantized coefficients from encoded image data 21 (e.g., by parsing and/or decoding such as performed by entropy decoding unit 304) and apply inverse quantization to decoded quantized coefficients 309 based on the quantization parameters to obtain dequantized coefficients 311, which may also be referred to as transform coefficients 311. The inverse quantization process may include determining a degree of quantization using a quantization parameter determined by video encoder 20 for each video block in a video slice (or tile or group of tiles) and, likewise, a degree of inverse quantization that should be applied.

Inverse transformation

The inverse transform processing unit 312 may be configured to receive the dequantized coefficients 311, also referred to as transform coefficients 311, and to apply a transform to the dequantized coefficients 311 in order to obtain a reconstructed residual block 313 in the sample domain. The reconstructed residual block 313 may also be referred to as a transform block 313. The transform may be an inverse transform, such as an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. Inverse transform processing unit 312 may also be used to receive transform parameters or corresponding information from encoded image data 21 (e.g., by parsing and/or decoding such as performed by entropy decoding unit 304) to determine the transform to be applied to dequantized coefficients 311.

Reconstruction

The reconstruction unit 314 (e.g., an adder or adder 314) may be used to add the reconstructed residual block 313 to the prediction block 365 to obtain a reconstructed block 315 in the sample domain, e.g., by adding sample values in the reconstructed residual block 313 and sample values in the prediction block 365.

Filtering

Loop filter unit 320 (in or after the encoding loop) is used to filter reconstructed block 315 to obtain filtered block 321, e.g., to smooth pixel transitions, or otherwise improve video quality. Loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a Sample Adaptive Offset (SAO) filter, or one or more other filters, such as a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter, or any combination thereof. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations, loop filter unit 320 may be implemented as a post-loop filter.

Decoded picture buffer

The decoded video blocks 321 of the picture are then stored in a decoded picture buffer 330, which stores the decoded picture 331 as a reference picture for subsequent motion compensation of other pictures and/or for separate output display.

Decoder 30 is operative to output (e.g., via output 332) the decoded image 331 for presentation to or viewing by a user.

Prediction

Inter-prediction unit 344 may be functionally identical to inter-prediction unit 244 (in particular, to the motion compensation unit), and intra-prediction unit 354 may be functionally identical to intra-prediction unit 254, and they perform splitting or partitioning decisions and prediction based on the partitioning and/or prediction parameters or respective information received from encoded image data 21 (e.g., received through parsing and/or decoding such as performed by entropy decoding unit 304). The mode application unit 360 may be used to perform prediction (intra or inter prediction) on a block-by-block basis based on reconstructed images, blocks, or respective samples (filtered or unfiltered) to obtain a prediction block 365.

When the video slice is encoded as an intra-coded (I) slice, the intra prediction unit 354 of the mode application unit 360 is used to generate a prediction block 365 for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current image. When the video image is encoded as an inter-coded (i.e., B or P) slice, the inter prediction unit 344 (e.g., motion compensation unit) of the mode application unit 360 is used to generate a prediction block 365 for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 304. For inter prediction, the prediction block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may use a default construction technique based on the reference pictures stored in DPB 330 to construct the reference frame lists, list 0 and list 1. The same or similar content may apply to embodiments using groups of tiles (e.g., video tiles) and/or tiles (e.g., video tiles), e.g., I, P or B groups of tiles and/or tiles may be used to encode video, in addition to or instead of slices (e.g., video slices).

The mode application unit 360 is used to determine prediction information of a video block of a current video slice by parsing motion vectors or related information and other syntax elements, and generate a prediction block of the current video block being decoded using the prediction information. For example, mode application unit 360 uses some received syntax elements to determine a prediction mode (e.g., intra or inter prediction) for encoding video blocks of a video slice, an inter prediction slice type (e.g., a B-slice, a P-slice, or a GPB-slice), construction information for one or more reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter prediction state for each inter-coded video block of the slice, and other information for decoding video blocks in the current video slice. The same or similar content may apply to embodiments using groups of tiles (e.g., video tiles) and/or tiles (e.g., video tiles), e.g., I, P or B groups of tiles and/or tiles may be used to encode video, in addition to or instead of slices (e.g., video slices).

The embodiment of video decoder 30 as shown in fig. 3 may be used for partitioning and/or decoding an image by using slices (also referred to as video slices), wherein the image may be partitioned into or decoded using one or more slices (typically non-overlapping), and each slice may include one or more blocks (e.g., CTUs).

The embodiment of video decoder 30 as shown in fig. 3 may be used for partitioning and/or decoding an image by using groups of tiles (also referred to as video tiles) and/or tiles (also referred to as video tiles), wherein an image may be partitioned into one or more groups of tiles (typically non-overlapping) or decoded using one or more groups of tiles (typically non-overlapping), and each group of tiles may comprise, for example, one or more blocks (e.g., CTUs) or one or more tiles, wherein each tile may be, for example, rectangular in shape and may comprise one or more blocks (e.g., CTUs), e.g., complete or partial blocks.

Other variations of video decoder 30 may be used to decode encoded image data 21. For example, decoder 30 may generate an output video stream without loop filtering unit 320. For example, for some blocks or frames, the non-transform based decoder 30 may inverse quantize the residual signal directly without the inverse transform processing unit 312. In another implementation, video decoder 30 may combine inverse quantization unit 310 and inverse transform processing unit 312 into a single unit.

It should be understood that in the encoder 20 and the decoder 30, the processing result of the current step may be further processed and then output to the next step. For example, after interpolation filtering, motion vector derivation, or loop filtering, further operations such as clipping (Clip) or shifting may be performed on the processing results of interpolation filtering, motion vector derivation, or loop filtering.

It should be noted that further operations may be applied to the derived motion vector of the current block (including but not limited to control point motion vectors of affine mode, sub-block motion vectors of affine, planar, and ATMVP modes, temporal motion vectors, and so on). For example, the value of a motion vector is constrained to a predefined range according to its representation bit depth. If the representation bit depth of the motion vector is bitDepth, the range is -2^(bitDepth-1) to 2^(bitDepth-1)-1, where "^" represents exponentiation. For example, if bitDepth is set equal to 16, the range is -32768 to 32767; if bitDepth is set equal to 18, the range is -131072 to 131071. For example, the values of the derived motion vectors (e.g., the MVs of the four 4x4 sub-blocks within an 8x8 block) are constrained such that the maximum difference between the integer parts of the four 4x4 sub-block MVs does not exceed N pixels, e.g., 1 pixel. Two methods of constraining motion vectors according to bitDepth are provided herein.

The method comprises the following steps: removal of overflow MSB (most significant bit) by streaming operation

ux = ( mvx + 2^bitDepth ) % 2^bitDepth        (1)

mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux        (2)

uy = ( mvy + 2^bitDepth ) % 2^bitDepth        (3)

mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy        (4)

Where mvx is the horizontal component of the motion vector of an image block or sub-block, mvy is the vertical component of the motion vector of an image block or sub-block, and ux and uy indicate intermediate values.

For example, if the value of mvx is -32769, the resulting value is 32767 after applying equations (1) and (2). In a computer system, signed integers are stored in two's complement form. The two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits); the MSB is then discarded, so the resulting two's complement value is 0111,1111,1111,1111 (32767 in decimal), which is the same as the output of applying equations (1) and (2).

ux = ( mvpx + mvdx + 2^bitDepth ) % 2^bitDepth        (5)

mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux        (6)

uy = ( mvpy + mvdy + 2^bitDepth ) % 2^bitDepth        (7)

mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy        (8)

These operations may be applied during the summation of mvp and mvd as shown in equations (5) to (8).

Method 2: remove the overflow MSB by clipping the value.

vx = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vx )

vy = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vy )

Where vx is the horizontal component of the motion vector of the image block or sub-block and vy is the vertical component of the motion vector of the image block or sub-block; x, y, and z respectively correspond to the three input values of the MV clipping process, and the function Clip3 is defined as follows:

Clip3( x, y, z ) = ( z < x ) ? x : ( ( z > y ) ? y : z )
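The following is a minimal C++ sketch of the two constraint methods above; the function and variable names are illustrative and not taken from any standard text.

#include <cstdint>

// Generic Clip3 as defined above: clamp z to the range [x, y].
static int64_t Clip3(int64_t x, int64_t y, int64_t z) {
    return (z < x) ? x : ((z > y) ? y : z);
}

// Method 1: remove the overflow MSB via a wrap-around (modulo) operation,
// following equations (1)/(2) (or (3)/(4)) for one MV component.
static int32_t wrapMvComponent(int64_t mv, int bitDepth) {
    const int64_t range = 1LL << bitDepth;                 // 2^bitDepth
    const int64_t u = (mv + range) % range;                // eq. (1)/(3)
    return (u >= (range >> 1)) ? (int32_t)(u - range)      // eq. (2)/(4)
                               : (int32_t)u;
}

// Method 2: clip the component to [-2^(bitDepth-1), 2^(bitDepth-1) - 1].
static int32_t clipMvComponent(int64_t mv, int bitDepth) {
    const int64_t lo = -(1LL << (bitDepth - 1));
    const int64_t hi =  (1LL << (bitDepth - 1)) - 1;
    return (int32_t)Clip3(lo, hi, mv);
}

For example, wrapMvComponent(-32769, 16) returns 32767, matching the two's complement example above, while clipMvComponent(-32769, 16) returns -32768.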

fig. 4 is a schematic diagram of a video encoding device 400 according to an embodiment of the present disclosure. The video encoding device 400 is suitable for implementing the disclosed embodiments described herein. In one embodiment, video encoding device 400 may be a decoder, such as video decoder 30 of FIG. 1A, or an encoder, such as video encoder 20 of FIG. 1A.

The video encoding apparatus 400 includes: an ingress port 410 (or input port 410) and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or Central Processing Unit (CPU) 430 for processing the data; a transmitter unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting the data; and a memory 460 for storing the data. The video encoding device 400 may also include optical-to-electrical (OE) and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450 for egress or ingress of optical or electrical signals.

The processor 430 is implemented by hardware and software. Processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 430 is in communication with ingress port 410, receiver unit 420, transmitter unit 440, egress port 450, and memory 460. Processor 430 includes an encoding module 470. The encoding module 470 implements the embodiments disclosed above. For example, the encoding module 470 implements, processes, prepares, or provides various encoding operations. Thus, the inclusion of the encoding module 470 provides a substantial improvement in the functionality of the video encoding apparatus 400 and enables the transformation of the video encoding apparatus 400 into different states. Alternatively, the encoding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.

The memory 460 may include one or more disks, tape drives, and solid state drives, and may serve as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 460 may be, for example, volatile and/or non-volatile, and may be read-only memory (ROM), Random Access Memory (RAM), ternary content-addressable memory (TCAM), and/or Static Random Access Memory (SRAM).

Fig. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of source device 12 and destination device 14 of fig. 1, according to an example embodiment.

The processor 502 in the apparatus 500 may be a central processing unit. Alternatively, processor 502 may be any other type of device or devices capable of manipulating or processing information now existing or later developed. Although the disclosed implementations may be implemented with a single processor as shown, for example, processor 502, speed and efficiency advantages may be achieved using more than one processor.

In one implementation, the memory 504 in the apparatus 500 may be a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of storage device may be used for memory 504. The memory 504 may include code and data 506 that are accessed by the processor 502 using a bus 512. The memory 504 may further include an operating system 508 and application programs 510, the application programs 510 including at least one program that allows the processor 502 to perform the methods described herein. For example, application 510 may include application 1 through application N, which also include video coding applications that perform the methods described herein.

The apparatus 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines the display with a touch-sensitive element operable to sense touch input. A display 518 may be coupled to the processor 502 by the bus 512.

Although depicted here as a single bus, the bus 512 of the apparatus 500 may be comprised of multiple buses. Further, the secondary memory 514 may be directly coupled to other components of the apparatus 500 or may be accessed via a network and may comprise a single integrated unit such as a memory card or multiple units such as multiple memory cards. Accordingly, the apparatus 500 may be implemented in a wide variety of configurations.

Motion Vector Refinement (MVR)

The motion vectors are typically determined at least partially at the encoder side and signaled to the decoder within the encoded bitstream. However, the motion vectors can also be refined at the decoder (and also at the encoder) starting from the initial motion vectors indicated in the bitstream. In that case, for example, the similarity between already decoded pixel patches pointed to by the initial motion vectors may be used to improve the accuracy of the initial motion vectors. This motion refinement provides the advantage of reduced signaling overhead: the accuracy of the initial motion is improved in the same way at both the encoder and the decoder, and therefore no additional signaling is needed for the refinement.

Note that the initial motion vector before refinement may not be the best motion vector yielding the best prediction. Since the initial motion vector is signaled in the bitstream, it cannot be represented with very high accuracy (which would increase the bitrate), and the initial motion vector is therefore improved by means of the motion vector refinement procedure. For example, the initial motion vector may be the motion vector used to predict a neighboring block of the current block. In this case, it is sufficient to signal in the bitstream an indication of which neighboring block's motion vector the current block uses. This prediction mechanism is very effective in reducing the number of bits needed to represent the initial motion vector. However, the accuracy of the initial motion vector may be low, because in general the motion vectors of two neighboring blocks are not expected to be identical.

To further improve the accuracy of the motion vectors without further increasing the signaling overhead, it may be beneficial to further refine the motion vectors derived at the encoder side and provided (signaled) in the bitstream. Motion vector refinement can be performed at the decoder without assistance from the encoder. The encoder, in its decoder loop, can employ the same refinement as would be performed at the decoder to obtain the corresponding refined motion vector. Refinement of a current block being reconstructed in a current image is performed by determining a template of reconstructed samples, determining a search space around the initial motion information of the current block, and finding the reference image portion in the search space that best matches the template. The best match determines the refined motion vector of the current block, which is then used to obtain the inter prediction samples of the current block (i.e., the current block being reconstructed).

The motion vector refinement is part of the inter prediction unit (244) in fig. 2 and 344 in fig. 3.

Motion vector refinement may be performed according to the following steps:

Typically, the initial motion vector may be determined based on an indication in the bitstream. For example, an index may be signaled in the bitstream, the index indicating a position in a candidate motion vector list. In another example, a motion vector predictor index and a motion vector difference value may be signaled in the bitstream. A motion vector determined based on the indication in the bitstream is defined as the initial motion vector. In the case of bi-prediction, where the inter prediction of the current block is obtained as a weighted combination of prediction blocks of samples determined from two motion vectors, let the initial motion vector in the first reference picture in list L0 be MV0 and the initial motion vector in the second reference picture in list L1 be MV1.

Using the initial motion vectors, refinement candidate motion vector pairs are determined. At least two refinement candidate pairs need to be determined. Typically, the refinement candidate motion vector pairs are determined based on the initial motion vector pair (MV0, MV1). Specifically, candidate MV pairs are determined by adding small motion vector differences to MV0 and MV1. For example, the candidate MV pairs may include the following:

·(MV0,MV1)

·(MV0+(0,1),MV1+(0,-1))

·(MV0+(1,0),MV1+(-1,0))

·(MV0+(0,-1),MV1+(0,1))

·(MV0+(-1,0),MV1+(1,0))

·…

where (1, -1) represents a vector displaced by 1 in the horizontal (or x) direction and-1 in the vertical (or y) direction.

Note that the above list of candidate pairs is merely an example for explanation, and the present invention is not limited to a specific candidate list.
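A small C++ sketch of how such candidate pairs could be constructed from an initial pair by applying mirrored offsets, as in the list above; the offset pattern and type names are purely illustrative.

#include <utility>
#include <vector>

struct Mv { int x, y; };

// Build candidate MV pairs by adding a small offset to MV0 and the mirrored
// (negated) offset to MV1, as in the example list above. The chosen offsets
// are illustrative only; an actual search pattern may differ.
static std::vector<std::pair<Mv, Mv>> buildCandidatePairs(Mv mv0, Mv mv1) {
    static const Mv offsets[] = { {0, 0}, {0, 1}, {1, 0}, {0, -1}, {-1, 0} };
    std::vector<std::pair<Mv, Mv>> pairs;
    for (const Mv& d : offsets) {
        Mv c0 { mv0.x + d.x, mv0.y + d.y };
        Mv c1 { mv1.x - d.x, mv1.y - d.y };   // mirrored offset in the other list
        pairs.push_back({ c0, c1 });
    }
    return pairs;
}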

The refinement candidate Motion Vector (MV) pair forms a search space for the motion vector refinement process.

In the bi-prediction of the current block, the two prediction blocks obtained using the respective first motion vector of list L0 and the second motion vector of list L1 are combined into a single prediction signal, which may provide better adaptation to the original signal, resulting in less residual information and possibly more efficient compression, than uni-prediction.

In motion vector refinement, for each refinement candidate MV pair, the two prediction blocks obtained using the first motion vector and the second motion vector of the candidate MV pair, respectively, are compared based on a similarity metric. The candidate MV pair that yields the highest similarity is usually selected as the refined motion vector pair. The refined motion vector in the first reference picture in list L0 and the refined motion vector in the second reference picture in list L1 are denoted MV0' and MV1', respectively. In other words, the predictions corresponding to the list L0 motion vector and the list L1 motion vector of a candidate motion vector pair are obtained and then compared based on the similarity metric. The candidate motion vector pair with the highest associated similarity is selected as the refined MV pair.

Typically, the output of the refinement process is a refined MV pair. The refined MVs may be the same as or different from the initial MVs, depending on which candidate MV pair achieves the highest similarity; the candidate MV pair formed by the initial MVs is itself among the candidate MV pairs. In other words, if the candidate MV pair that achieves the highest similarity is formed by the initial MVs, the refined MVs and the initial MVs are equal.

Instead of selecting the position that maximizes a similarity measure, another approach is to select the position that minimizes a dissimilarity measure. The dissimilarity comparison measure may be SAD (sum of absolute differences), MRSAD (mean removed sum of absolute differences), SSE (sum of squared errors), etc. The SAD between two prediction blocks obtained using a candidate MV pair (CMV0, CMV1) can be calculated as follows:

SAD = sum over x = 0..nCbW-1 and y = 0..nCbH-1 of abs( predSamplesL0[x][y] - predSamplesL1[x][y] ),

where nCbH and nCbW are the height and width of the prediction block, the function abs(a) specifies the absolute value of the argument a, and predSamplesL0 and predSamplesL1 are the prediction block samples obtained with the candidate MV pair denoted (CMV0, CMV1).

Alternatively, the dissimilarity comparison measure may be obtained by evaluating only a subset of samples in the prediction block in order to reduce the amount of computations. The following is an example in which sample lines are alternately included in the SAD calculation (every other line is evaluated).
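As a sketch of such an alternate-row (decimated) SAD, the following C++ fragment evaluates only every other row of the two prediction blocks; the exact decimation used by a particular codec draft may differ, and the names are illustrative.

#include <cstdint>
#include <cstdlib>

// SAD between two prediction blocks, evaluating only every other row
// (rows 0, 2, 4, ...). predL0/predL1 hold the prediction samples obtained
// with the candidate MV pair (CMV0, CMV1); stride is the row pitch in samples.
static int64_t sadDecimatedRows(const int16_t* predL0, const int16_t* predL1,
                                int nCbW, int nCbH, int stride) {
    int64_t sad = 0;
    for (int y = 0; y < nCbH; y += 2) {          // every other line
        const int16_t* r0 = predL0 + y * stride;
        const int16_t* r1 = predL1 + y * stride;
        for (int x = 0; x < nCbW; ++x)
            sad += std::abs(r0[x] - r1[x]);
    }
    return sad;
}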

An example of motion vector refinement is described in document JVET-M1001-v3, Versatile Video Coding (Draft 4), of JVET (the Joint Video Experts Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11), published at http://phenix. The motion vector refinement is described in the section "8.4.3 Decoder side motion vector refinement process" of that document.

To reduce the internal memory requirements of the refinement, in some embodiments, the motion vector refinement process may be performed independently on blocks of luma samples obtained by partitioning coded blocks whose luma width or height exceeds a predetermined value into sub-blocks whose luma width and height are less than or equal to that predetermined value. The refined MV pair of each sub-block within a partitioned coded block may be different. Inter prediction of both luma and chroma is then performed for each sub-block using the refined MV pair of that sub-block.

Given max_sb_width and max_sb_height, which indicate the maximum allowed sub-block width and height, respectively, a current coding unit of size cbWidth x cbHeight that is eligible for applying MVR is typically partitioned into numSbs sub-blocks, each of size sbWidth x sbHeight (sub-block width x sub-block height), as follows:

numSbs = numSbX * numSbY

numSbX = ( cbWidth > max_sb_width ) ? ( cbWidth / max_sb_width ) : 1

numSbY = ( cbHeight > max_sb_height ) ? ( cbHeight / max_sb_height ) : 1

sbWidth = ( cbWidth > max_sb_width ) ? max_sb_width : cbWidth

sbHeight = ( cbHeight > max_sb_height ) ? max_sb_height : cbHeight
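The expressions above transcribe directly into code; the following C++ sketch (with illustrative names mirroring the document's variables) computes the sub-block partition for a coding block.

// Partition a coding block of cbWidth x cbHeight into MVR sub-blocks,
// following the expressions above.
struct SubBlockPartition { int numSbX, numSbY, numSbs, sbWidth, sbHeight; };

static SubBlockPartition partitionForMvr(int cbWidth, int cbHeight,
                                         int max_sb_width, int max_sb_height) {
    SubBlockPartition p;
    p.numSbX   = (cbWidth  > max_sb_width)  ? (cbWidth  / max_sb_width)  : 1;
    p.numSbY   = (cbHeight > max_sb_height) ? (cbHeight / max_sb_height) : 1;
    p.numSbs   = p.numSbX * p.numSbY;
    p.sbWidth  = (cbWidth  > max_sb_width)  ? max_sb_width  : cbWidth;
    p.sbHeight = (cbHeight > max_sb_height) ? max_sb_height : cbHeight;
    return p;
}

For example, with max_sb_width = max_sb_height = 16, a 32x8 coding unit is partitioned into two 16x8 sub-blocks.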

each MV in the initial MV pair may have fractional pixel precision. In other words, the MV indicates the displacement between the current block of samples and the resampled reference area, and the displacement may point from the integer grid of reconstructed reference samples to fractional positions in the horizontal and vertical directions. Typically, a two-dimensional interpolation of the reconstructed reference integer sample grid values is performed to obtain sample values at fractional sample offset positions. The process of obtaining prediction samples from the reconstructed reference image using the candidate MV pairs may be performed by one of the following methods:

rounding the fractional part of the initial MV pair to the nearest integer position and obtaining the integer grid values of the reconstructed reference image.

Perform 2-tap separable interpolation (e.g., bilinear interpolation) to obtain the prediction sample values at the fractional pixel precision indicated by the initial MV pair.

Perform higher-tap (e.g., 8-tap or 6-tap) separable interpolation to obtain the prediction sample values at the fractional pixel precision indicated by the initial MV pair.

Although the candidate MV pairs may have arbitrary sub-pixel offsets relative to the initial MV pair, in some embodiments, the candidate MV pairs are selected at integer-pixel distances relative to the initial MV pair for simplicity of the search. In this case, the prediction samples over all candidate MV pairs may be obtained by performing prediction on the sample blocks around the initial MV pair to cover all refinement positions around the initial MV pair.

In some embodiments, once the dissimilarity cost values of all candidate MV pairs at integer distances from the initial MV have been evaluated, additional candidate MV pairs at sub-pixel distance offsets from the best cost value location are added. A prediction sample is obtained for each of these locations using one of the methods described above, and the dissimilarity costs are evaluated and compared to obtain the lowest dissimilarity location. In certain other embodiments, to avoid such computationally expensive prediction processes for each sub-pixel distance position around the best-cost integer distance position, the estimated integer distance cost value is remembered, and a parametric error surface is fitted around the best integer distance position. The minimum of the error surface is then analytically computed and used as the location with the least dissimilarity. In this case, the dissimilarity cost value is said to be derived from the calculated integer distance cost values.
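One common parametric error-surface formulation fits a separable parabola to the best integer-distance cost and its four neighbours; the C++ sketch below only illustrates that idea and is not taken from the document or from any specific standard text.

#include <cstdint>

// Sub-pixel offset of the minimum of a parabola fitted per dimension to the
// centre cost cC and its left/right (cL, cR) and top/bottom (cT, cB) costs.
// When the centre is the best integer position, each offset lies in (-0.5, 0.5).
static void errorSurfaceOffset(int64_t cC, int64_t cL, int64_t cR,
                               int64_t cT, int64_t cB,
                               double& dx, double& dy) {
    const int64_t denX = cL + cR - 2 * cC;   // curvature in x
    const int64_t denY = cT + cB - 2 * cC;   // curvature in y
    dx = (denX > 0) ? (double)(cL - cR) / (2.0 * denX) : 0.0;
    dy = (denY > 0) ? (double)(cT - cB) / (2.0 * denY) : 0.0;
}

The sub-pixel refined MV is then the best integer-distance position plus (dx, dy), avoiding any further prediction or cost evaluation at sub-pixel positions.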

For a given coded block of samples, the application of motion vector refinement may be conditioned on certain coding properties of the coded block of samples. Some examples of such encoding properties may be:

the distances in number of pictures (when sampled at a uniform frame rate) from the current picture to the two reference pictures used for bi-directional prediction of the block of coded samples are equal and located on opposite sides of the current picture.

The initial dissimilarity between the two prediction blocks obtained using the initial MV pair is less than a predetermined per-sample threshold.

Bi-directional predictive optical flow refinement

Bi-directional prediction optical flow refinement is a process that improves the accuracy of the bi-directional prediction of a block without requiring anything to be explicitly signaled in the bitstream other than what is normally signaled for bi-directional prediction. It is part of the inter prediction unit (244 in fig. 2 and 344 in fig. 3).

In bi-directional prediction, two inter predictions are obtained from two motion vectors, after which the predictions are combined by applying a weighted average. Since quantization noise in the two reference patches is cancelled, the combined prediction can result in reduced residual energy, providing higher coding efficiency compared to uni-prediction. The weighted combination in bi-directional prediction can be performed by the following equation:

Bi-prediction = Prediction1 * W1 + Prediction2 * W2 + K,

where W1 and W2 are weighting factors that may be signaled or may be predefined. K is an additive factor, which may also be signaled or predefined. For example, bi-directional prediction may be obtained by

Bi-prediction = ( Prediction1 + Prediction2 ) / 2,

where W1 and W2 are set to 1/2 and K is set to 0.
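In integer sample arithmetic this average is usually implemented with a rounding offset and a right shift; the following one-liner is a hedged sketch of that simple case (W1 = W2 = 1/2, K = 0), not the normative combination of any particular standard.

// Simple averaging bi-prediction of one sample with round-to-nearest:
// (p1 + p2 + 1) >> 1 corresponds to (Prediction1 + Prediction2) / 2.
static inline int biPredAverageSample(int p1, int p2) {
    return (p1 + p2 + 1) >> 1;
}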

The goal of optical flow refinement is to improve the accuracy of bi-directional prediction. Optical flow is a pattern of apparent motion of an image object between two consecutive frames caused by motion of the object or camera. The optical flow refinement process improves the accuracy of the bi-directional prediction by applying optical flow equations (solutions of optical flow equations).

Consider a pixel I(x, y, t) in the first frame (x and y correspond to the spatial coordinates and t corresponds to the time dimension). It moves by a distance (dx, dy) in the next frame, taken after time dt. Since these pixels are the same and their intensity does not change, the optical flow equation is given by:

I( x, y, t ) = I( x + dx, y + dy, t + dt )

i (x, y, t) specifies the intensity (sample value) of the pixel at the (x, y, t) coordinate. Assuming small displacements and negligible higher order terms in the taylor series expansion, the optical flow equation can also be written as:

∂I/∂t + vx * ∂I/∂x + vy * ∂I/∂y = 0,

where ∂I/∂x and ∂I/∂y are the horizontal and vertical spatial sample gradients at position (x, y), and ∂I/∂t is the temporal partial derivative at position (x, y).

Optical flow refinement exploits the above principles to improve the quality of bi-directional prediction.

The implementation of optical flow refinement typically includes the following steps:

1. the sample gradient is calculated.

2. A difference between the first prediction and the second prediction is calculated.

3. The displacement of a pixel or group of pixels is calculated such that the error Δ between the two reference patches, obtained using the optical flow equation, is minimized:

Δ = ( I^(0) - I^(1) ) + vx * ( τ1 * ∂I^(1)/∂x + τ0 * ∂I^(0)/∂x ) + vy * ( τ1 * ∂I^(1)/∂y + τ0 * ∂I^(0)/∂y ),

where I^(0) corresponds to the sample values in the first prediction, I^(1) to the sample values in the second prediction, vx and vy are the calculated displacements in the x and y directions, ∂I^(k)/∂x and ∂I^(k)/∂y are the gradients of the two predictions in the x and y directions, and τ0 and τ1 indicate the distances to the reference images from which the first and second predictions are obtained. Some methods minimize the sum of squared errors, while other methods minimize the sum of absolute errors. The minimization problem is solved using a patch of samples around a given position (x, y).

4. A specific implementation of the optical flow equation is used, for example:

pred_BIO = ( I^(0) + I^(1) + vx/2 * ( τ1 * ∂I^(1)/∂x - τ0 * ∂I^(0)/∂x ) + vy/2 * ( τ1 * ∂I^(1)/∂y - τ0 * ∂I^(0)/∂y ) ) / 2,

where pred_BIO specifies the modified prediction, which is the output of the optical flow refinement process.

The sample gradients can be obtained, for example, by central differences:

∂I(x, y)/∂x = ( I(x + 1, y) - I(x - 1, y) ) / 2

∂I(x, y)/∂y = ( I(x, y + 1) - I(x, y - 1) ) / 2

In some embodiments, to simplify the complexity of estimating the displacement of each pixel, the displacement is estimated for a group of pixels. In some examples, to compute improved bi-prediction for a 4 × 4 luma sample block, the displacement is estimated using sample values of an 8 × 8 luma sample block, where the 4 × 4 sample block is located at its center.

The input to the optical flow refinement process is the prediction samples from the two reference images, and the output of the optical flow refinement is the combined prediction (pred_BIO) calculated according to the optical flow equation.

An example of optical flow refinement is described in section 8.4.7.4, "Bidirectional optical flow prediction process", of Versatile Video Coding (Draft 4), document JVET-M1001.

Some methods use a decimated set of prediction samples to compute the matching cost in the motion vector refinement process, and perform a first prediction to obtain prediction samples for all locations in the search or refinement region. Then, for the candidate block corresponding to the candidate MV pair, a decimated set of samples within the candidate block is used to evaluate the matching cost. While decimating the match cost estimate reduces the complexity of the match cost calculation, the complexity associated with generating the prediction sample values using an M-tap filter is not reduced. The complexity of DMVR is affected by both the prediction process to obtain the prediction sample values for refinement and the matching cost evaluation. For a single-sided vertical refinement of Sy, at least (2 × Sy +1) line buffers of predicted sample values need to be kept to perform cost calculations for all search locations. Each line buffer is (sbWidth +2 Sx) samples wide, where Sx is the single-sided horizontal refinement range.

Some approaches attempt to decimate the prediction sample positions in the vertical direction by decimating the candidate search positions in the vertical direction, such that all candidate search positions fall on uniform vertical offsets. This adversely affects the compression efficiency of the MVR implementation. Although a parametric error surface is used to obtain the minimum matching cost position in the vertical direction between these uniform vertical offset positions, the compression efficiency penalty remains high compared to using all candidate search positions.

Therefore, there is a need for a method that can reduce the complexity of the prediction process and also reduce the line buffer requirements. The impact of such a method on the compression efficiency gain should also be minimal compared to a method that does not use any decimation. The present invention reduces the complexity associated with the prediction process by generating prediction sample values only at a decimated set of positions, and also reduces the line buffer storage requirements.

Embodiments of the present invention generate a first set of prediction sample values at a decimated set of locations within a refinement/search region during a motion vector refinement process for obtaining a refined motion vector for inter-bi-prediction of a current sub-block within a current coding unit in a current picture. Then, using the prediction sample values at the extracted set of locations within the search area, a matching cost evaluation of the offsets around the initial MV pair corresponding to different candidate MV pairs is performed.

In one exemplary embodiment of the invention, the decimated set of locations within the search area corresponds to alternate rows of sample positions within the search area. When the matching cost is calculated for a candidate MV pair with an even vertical MV offset value, the corresponding candidate block of prediction samples in each reference is identified; the decimated set of locations at which prediction sample values are computed falls on rows of a given parity (odd or even) within the candidate block. When the matching cost is calculated for a candidate MV pair with an odd vertical MV offset value, rows of the opposite parity within the candidate block are used. With this arrangement, the matching cost evaluation requires prediction sample values on only half the number of rows in the search region, independent of the vertical MV offset value. The method exploits the fact that an offset in one reference is mirrored in the other reference (i.e., equal in magnitude but opposite in sign). Thus, even though the decimated candidate block for an odd vertical MV offset may coincide with that for an even vertical MV offset in one reference, the corresponding blocks in the other reference will differ. Hence the decimated matching costs evaluated for even and odd vertical MV offsets are both useful for performing refinement, while prediction sample values need to be computed for only half the number of rows in the search area.
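The parity-based row selection described above can be sketched as follows in C++; it assumes, purely as an illustration and not as the normative procedure, that the decimated search-region buffer holds prediction samples only on even rows of the search region.

#include <vector>

// Select the rows of the decimated search-region prediction buffer that are
// used for the matching cost of one candidate. dy is the candidate's vertical
// offset relative to the initial MV, Sy the single-sided vertical refinement
// range, and sbHeight the sub-block height. Only even rows of the search
// region were predicted, so candidates with even and odd dy automatically use
// rows of opposite parity within their candidate blocks.
static std::vector<int> rowsForCandidate(int dy, int Sy, int sbHeight) {
    std::vector<int> rows;
    const int top = Sy + dy;                 // top row of the candidate block
    for (int y = top; y < top + sbHeight; ++y)
        if ((y & 1) == 0)                    // keep rows present in the buffer
            rows.push_back(y);
    return rows;
}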

In some embodiments, an expanded set of prediction samples is generated around the selected search area. These extended sets of prediction samples are used for matching cost evaluation. Samples within the extended search area are also generated for the extracted set of locations, and the matching cost evaluation uses an extended block (i.e., a block with more samples than the number of samples in the current sub-block within the current coded block).

Embodiments of the present invention may reduce the complexity associated with generating prediction samples for computing matching cost values. Some embodiments may reduce the matching cost complexity while having minimal impact on coding efficiency.

Let the luma width of the current sub-block, within the current coding unit in the current image, that is eligible for decoder-side motion vector refinement be sbWidth, and let its luma height be sbHeight. According to the first exemplary embodiment of the present invention, inter bi-directional prediction of the current coding block includes the following steps:

step 0: an initial MV pair (MV0, MV1) is obtained as a starting point for refinement with respect to a pair of reference pictures.

Step 1: a first set of candidate sample positions in each reference image is obtained, wherein the first set of candidate sample positions falls within a bounding rectangular region calculated using the initial MV pair (MV0, MV1), the top left position (x, y) of the current sub-block in the current image, the single-sided motion vector refinement range Sx in the horizontal direction, and the single-sided motion vector refinement range Sy in the vertical direction.

Step 2: performing a first prediction using the initial MV pair (MV0, MV1) and the reconstructed reference luma samples of the reference image pair to obtain predicted sample values at a subset of positions within the first set of candidate sample positions, wherein the subset of sample positions is obtained by a regular decimation of the first set of candidate sample positions.

Step 3: For each candidate MV pair (CMV0, CMV1) within the MV refinement range:

-determining a rectangular region within the first set of candidate sample positions in each reference based on the MV offset between the candidate MV and the initial MV in each reference;

-calculating a matching cost value using a subset of the prediction sample values obtained in each reference, the subset falling within the rectangular area determined for that reference.

Step 4: Based on the calculated matching cost values, the refined MV pair (MV0', MV1') for the sub-block is determined.

Step 5: A second inter prediction is performed based on the refined MV pair (MV0', MV1') to obtain second prediction samples from each reconstructed reference picture, and bi-prediction is performed using the second prediction samples.

In one embodiment, the steps are explained as follows:

in step 0, two initial motion vectors are obtained as input. The initial motion vector may be determined based on indication information in the bitstream. For example, an index may be signaled in the bitstream, the index indicating a position in the candidate motion vector list. In another example, the motion vector predictor index and the motion vector difference value may be signaled in a bitstream. A motion vector determined based on the indication information in the bitstream is defined as an initial motion vector.

In another example, a reference picture indication may be obtained from the bitstream, the initial motion vector being obtained based on the reference picture indication. The reference picture indicates a reference picture used for determining the reference picture pointed to by the initial motion vector.

Step 1 to step 4 correspond to the motion vector refinement procedure as explained in the above example. The initial motion vector is refined according to the motion vector refinement. In one example, the matching cost is a dissimilarity measure used in the motion vector refinement process.

In step 1, given the horizontal one-sided refinement range Sx and the vertical one-sided refinement range Sy, the upper left corner luma position of the current sub-block within the current image, and the initial MV pair obtained in step 0, a rectangular block is determined in the reference image. The set of sample positions within the rectangular block forms a first set of candidate positions at which a first set of prediction samples needs to be generated.
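As an illustration of step 1, the following C++ sketch derives the bounding rectangle of the first set of candidate sample positions in one reference picture from the integer part of the initial MV; the names (and the optional extensions Ox, Oy used later) are illustrative.

// Bounding rectangle of candidate sample positions in one reference picture,
// derived from the sub-block's top-left luma position (x, y), the integer part
// of the initial MV (mvIntX, mvIntY), the one-sided refinement ranges Sx/Sy,
// and optional block extensions Ox/Oy.
struct Rect { int x0, y0, width, height; };

static Rect searchRegion(int x, int y, int mvIntX, int mvIntY,
                         int sbWidth, int sbHeight,
                         int Sx, int Sy, int Ox = 0, int Oy = 0) {
    Rect r;
    r.x0     = x + mvIntX - Sx - Ox;
    r.y0     = y + mvIntY - Sy - Oy;
    r.width  = sbWidth  + 2 * Sx + 2 * Ox;
    r.height = sbHeight + 2 * Sy + 2 * Oy;
    return r;
}

Without extension (Ox = Oy = 0) the region size is (sbWidth + 2*Sx) x (sbHeight + 2*Sy), matching the rectangular block described below.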

According to step 2, a first prediction corresponding to the initial motion vectors is performed at a subset of positions within the first set of candidate sample positions. In one example, there are at least 2 candidate motion vector pairs in the motion vector refinement process, one of which is typically formed by the initial motion vectors (MV0, MV1). In other words, the set of candidate motion vectors typically comprises more than one pair, one pair usually being (MV0, MV1). The other candidate motion vector pairs are based on (MV0, MV1) and are determined by adding small perturbations to the motion vectors (as explained in the example above).

In step 2, a first prediction corresponding to each candidate motion vector pair is performed based on an M-tap interpolation filter. As an example, one prediction corresponding to MV0 may be obtained for a subset of the first set of candidate sample positions in a reference picture (a picture that has already been encoded at the encoder or decoded at the decoder), where the block is pointed to by MV0. An interpolation filter is then applied to the samples within the block pointed to by MV0. To provide more accurate motion estimation, the resolution of the reference image may be increased by interpolating samples between pixels. Fractional pixel interpolation may be performed by a weighted averaging of the nearest pixels. Here, the M-tap filter may typically be a 2-, 4-, 6-, or 8-tap filter (not limited to these options), which means that the filter has M multiplication coefficients. The prediction corresponding to MV1 may similarly be obtained using the reconstructed reference samples of the other reference picture. In one example, the subset of positions within the first set of candidate sample positions is selected as the sample positions on alternate rows. This is shown in fig. 6 and fig. 8A. In fig. 7, no extension samples are used. The filled circles correspond to sample positions at which prediction values are obtained; unfilled circles correspond to sample positions for which no prediction value needs to be calculated. In this example, the size of rectangular block 710 is (sbWidth + 2*Sx) x (sbHeight + 2*Sy), where sub-block 720 has size sbWidth x sbHeight. The number of prediction samples produced is half the number of positions in this block. Fig. 8A shows the case where extension samples are used, with a horizontal one-sided extension Ox and a vertical one-sided extension Oy. The size of the rectangular block in this example is (sbWidth + 2*Sx + 2*Ox) x (sbHeight + 2*Sy + 2*Oy). Again, first prediction sample values need to be generated only at the filled positions, and the other half of the first prediction sample values need not be generated.

In one example, with regular vertical decimation by a factor of 2 (with or without extension samples) and using an M = 2 tap interpolation filter to obtain the prediction sample values at fractional positions, the interpolation filtering process is performed in the following manner.

Integer MV sections (IMV0, IMV1) and fractional MV sections (FMV0, FMV1) are obtained from the initial MV pair (MV0, MV 1). To obtain prediction sample values at fractional positions from an integer grid of reconstructed reference luma sample values in a refinement range in a reference image, the following steps are applied:

-if the fractional position is zero in both dimensions, obtaining a prediction sample value for the search area using the integer sample values.

-if the fractional position has a non-zero horizontal value fracx and a zero vertical value, the predicted sample value for each fractional sample position in the search area is obtained as follows:

( ( FILT_SUMH - fracx ) * a + fracx * b + r ) >> FILT_QH,

where a and b are the integer grid sample values to the left and right of the current fractional sample position, FILT_SUMH corresponds to the sum of all filter taps (typically chosen to be a power of 2, based on the fractional pixel precision associated with fracx), r represents a rounding offset, and FILT_QH corresponds to log2(FILT_SUMH).

-if the fractional position corresponds to a zero horizontal value and a non-zero vertical value fracy, the predicted sample value for each fractional sample position in the search area is obtained as follows:

( ( FILT_SUMV - fracy ) * a + fracy * b + r ) >> FILT_QV,

where a and b are the integer grid sample values above and below the current fractional sample position, FILT_SUMV corresponds to the sum of all filter taps (typically chosen to be a power of 2, based on the fractional pixel precision associated with fracy), r represents a rounding offset, and FILT_QV corresponds to log2(FILT_SUMV).

-if the fractional positions correspond to non-zero values fracx and fracy in the horizontal and vertical directions, respectively, the predicted sample value for each fractional sample position in the search area is obtained as follows:

( ( FILT_SUMH - fracx ) * ( FILT_SUMV - fracy ) * a + fracx * ( FILT_SUMV - fracy ) * b + ( FILT_SUMH - fracx ) * fracy * c + fracx * fracy * d + r ) >> FILT_QHV,

where a, b, c, and d are the integer grid sample values to the top left, top right, bottom left, and bottom right of the current fractional sample position, respectively; FILT_SUMH corresponds to the sum of the horizontal filter taps (typically chosen to be a power of 2, based on the fractional pixel precision associated with fracx); FILT_SUMV corresponds to the sum of the vertical filter taps (typically chosen to be a power of 2, based on the fractional pixel precision associated with fracy); r is the rounding offset; and FILT_QHV is log2(FILT_SUMH) + log2(FILT_SUMV).

In particular, in the above example, the 2-D separable interpolation is performed in the horizontal and vertical directions without any intermediate rounding and right shifting after the horizontal interpolation. In one example, when the integer sample bit depth is 10 bits and the fractional MV precision is 1/16 of a pixel position, the fractional MV precision is reduced to 1/8 of a pixel position before performing the interpolation, so that the intermediate values remain within the unsigned 16-bit range before the right shift.
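The expressions above can be collected into a single helper; the following C++ sketch is an illustration under the assumption of a rounding offset r = 1 << (Q - 1), and is not taken from any normative text.

// 2-tap separable bilinear interpolation for one fractional sample position,
// mirroring the expressions above. fracX/fracY are in units of 1/FILT_SUM of a
// pixel (e.g. FILT_SUM = 8 for 1/8-pel precision, FILT_Q = 3); a, b, c, d are
// the integer-grid samples at top-left, top-right, bottom-left, bottom-right.
static int bilinearSample(int a, int b, int c, int d,
                          int fracX, int fracY, int FILT_SUM, int FILT_Q) {
    if (fracX == 0 && fracY == 0)
        return a;                                      // integer position
    if (fracY == 0) {                                  // horizontal-only case
        const int r = 1 << (FILT_Q - 1);
        return ((FILT_SUM - fracX) * a + fracX * b + r) >> FILT_Q;
    }
    if (fracX == 0) {                                  // vertical-only case
        const int r = 1 << (FILT_Q - 1);
        return ((FILT_SUM - fracY) * a + fracY * c + r) >> FILT_Q;
    }
    const int r = 1 << (2 * FILT_Q - 1);               // 2-D rounding offset
    return ((FILT_SUM - fracX) * (FILT_SUM - fracY) * a
          + fracX * (FILT_SUM - fracY) * b
          + (FILT_SUM - fracX) * fracY * c
          + fracX * fracY * d + r) >> (2 * FILT_Q);
}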

In another example, the subset of positions within the first set of candidate sample positions is selected in the following manner. Even parity rows (i.e., rows with even indices) produce first prediction samples at a predetermined number of consecutive sample positions that are left aligned (or right aligned). Odd parity rows produce prediction samples at a predetermined number of consecutive sample positions that are right aligned (or left aligned). The predetermined number of consecutive sample positions is typically chosen to be an even number smaller than the block width. In some embodiments, the predetermined number of consecutive sample positions is chosen to be single instruction multiple data (SIMD) friendly (e.g., a multiple of 4, 8, or 16). This example is shown in fig. 10A. Such decimated sample positions are hereinafter referred to as a zipper pattern decimation.
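A small C++ sketch of generating such a zipper pattern of kept sample positions; the parameter names are illustrative (see fig. 10A).

#include <utility>
#include <vector>

// "Zipper" pattern decimation of a search region: even rows keep numKept
// left-aligned consecutive positions, odd rows keep numKept right-aligned
// positions. numKept is assumed to be an even, SIMD-friendly count smaller
// than the region width.
static std::vector<std::pair<int, int>> zipperPositions(int regionWidth,
                                                        int regionHeight,
                                                        int numKept) {
    std::vector<std::pair<int, int>> pos;   // (x, y) sample positions
    for (int y = 0; y < regionHeight; ++y) {
        const int x0 = (y % 2 == 0) ? 0 : (regionWidth - numKept);
        for (int x = x0; x < x0 + numKept; ++x)
            pos.push_back({ x, y });
    }
    return pos;
}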

In step 3, a matching cost associated with each pair of candidate motion vectors is determined from the first prediction.

Fig. 7A, 7B, 8C and 10B, 10C provide examples of sample positions from the predicted sample block in step 2, which are used to evaluate the matching cost for a given candidate MV pair.

In the example of figs. 7A and 7B, corresponding to vertical decimation by a factor of 2 without extension, the matching cost block size of the cost blocks 730, 740 is sbWidth x sbHeight, where alternate rows of sample positions do not participate in the matching cost evaluation because no prediction sample values are available at these positions. The sample positions for which prediction sample values are available are used to evaluate the matching cost. Thus, in the case where the subset of sample positions corresponds to a vertical decimation by a factor of 2, half of the samples are used for the matching cost evaluation. This is shown in figs. 7A and 7B.

For the case of a search region with extension samples, figs. 8B and 8C show the set of sample positions that contribute to the matching cost evaluation for a given candidate MV pair, and show the rectangular block 810 and the sub-block 820. Figs. 10B and 10C illustrate the case in which the zipper pattern extraction is used and describe the additional use case with block extension.

In the embodiment shown in figs. 8A, 8B and 8C, rows 0, 1, 2, 3, 4, 5, 6, ... are the rows of interpolated reference samples to be considered according to the present invention. For example, taking every other row yields the rows {0, 2, 4, 6}, shown as the filled samples in fig. 8A. Fig. 8A shows the interpolated samples used for refinement with an example decimation factor of 2, where the filled samples are computed and the unfilled samples are not computed. When the vertical search positions consist of even rows, e.g., {2, 4, 6}, fig. 8B applies, which shows the samples used to compute the matching cost for candidate MV pairs with an even vertical displacement. Thus, the two row sets in this example are {2, 4, 6} and {6, 4, 2}. When the vertical search positions consist of odd rows, e.g., {3, 5}, fig. 8C applies, which shows the samples used to compute the matching cost for candidate MV pairs with an odd vertical displacement. Thus, the two row sets in this example are {3, 5} and {5, 3}.
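The row-parity behaviour described above can be sketched as follows (non-normative; the function name cost_rows and the parameterization are assumptions for illustration). Because the two candidate displacements of a pair are mirrored (+dy and -dy) and therefore have the same parity, the interpolated rows available in the two references always coincide, and their local parity within the cost block follows the parity of the vertical displacement.

def cost_rows(sb_height, s_y, dy):
    # Local row offsets within the sbHeight-tall cost block at which
    # interpolated prediction samples exist in both references, assuming
    # that only even-indexed rows of the search area are interpolated
    # (vertical decimation by a factor of 2, as in fig. 8A).
    avail0 = [(s_y + dy + r) % 2 == 0 for r in range(sb_height)]   # reference 0, offset +dy
    avail1 = [(s_y - dy + r) % 2 == 0 for r in range(sb_height)]   # reference 1, offset -dy
    return [r for r in range(sb_height) if avail0[r] and avail1[r]]

print(cost_rows(4, 2, 0))   # even displacement -> even local rows [0, 2]
print(cost_rows(4, 2, 1))   # odd displacement  -> odd local rows  [1, 3]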

According to step 3, at least one matching cost (e.g., a similarity measure) corresponding to one of the candidate motion vector (MV) pairs of the refinement is obtained. The higher the similarity between the two prediction blocks, the smaller the matching cost.

The matching cost may be a measure such as the sum of absolute differences, the mean-removed sum of absolute differences, or the sum of squared differences.

The matching cost values are used for the refinement of the initial motion vectors in step 4. In one example, the refined MV pair (MV0', MV1') is selected as the candidate MV pair with the highest similarity (i.e., the smallest matching cost). In another example, the matching cost values are used to fit a parametric equation to the cost values in the vicinity of the position where the true sub-pixel-accurate refined MV pair is expected to lie. The sub-pixel-accurate refined MV pair is then determined by solving for the unknowns of the parametric equation using the candidate positions and the estimated matching cost values.
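A minimal sketch of the integer refinement search of step 4 is given below: the SAD is evaluated only at the rows kept by the decimation, and the candidate offset with the smallest cost is selected. The (2*search_range + 1)^2 offset grid, the even-row decimation rule and the function names are assumptions made for illustration; the parametric (sub-pixel) fit mentioned above is not shown.

def sad(block0, block1, rows):
    # Sum of absolute differences restricted to the given local rows.
    return sum(abs(p0 - p1) for r in rows for p0, p1 in zip(block0[r], block1[r]))

def refine_mv_pair(get_pred0, get_pred1, sb_height, s_y, search_range=2):
    # get_pred0(dx, dy) / get_pred1(dx, dy) are assumed to return the
    # sbHeight-row first prediction block of the respective reference for
    # the given offset; the two offsets of a candidate pair are mirrored.
    best = None
    for dy in range(-search_range, search_range + 1):
        rows = [r for r in range(sb_height) if (s_y + dy + r) % 2 == 0]
        for dx in range(-search_range, search_range + 1):
            cost = sad(get_pred0(dx, dy), get_pred1(-dx, -dy), rows)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best   # (minimum cost, dx, dy); MV0' = MV0 + (dx, dy), MV1' = MV1 - (dx, dy)

blk = [[100] * 4 for _ in range(4)]
print(refine_mv_pair(lambda dx, dy: blk, lambda dx, dy: blk, 4, 2))
# all costs are zero here, so the first candidate (-2, -2) is kept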

In step 5, a second prediction is obtained from the refined motion vector and a K-tap interpolation filter. In the case of two refined motion vectors (MV0' and MV1'), which is the case of bi-prediction, two second predictions are obtained.

The second prediction is obtained by applying a second interpolation filter (the K-tap filter), which may or may not be the same as the first interpolation filter (the M-tap filter). Similar to the first prediction, the second prediction is obtained by applying the second interpolation filter to the blocks in the reference images pointed to by MV0' and MV1'.
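To illustrate what a K-tap interpolation filter for the second prediction may look like, the sketch below applies an 8-tap half-pel filter in the horizontal direction. The coefficients shown are those of the HEVC/VVC luma half-pel filter and are given purely as an example of a possible K-tap filter; the function name and the absence of output clipping are simplifications made for illustration.

HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]   # sum = 64, so normalize with >> 6

def interp_half_pel_horizontal(row, x, taps=HALF_PEL_TAPS):
    # Half-pel sample between row[x] and row[x + 1]; no output clipping shown.
    k = len(taps)
    acc = sum(taps[i] * row[x - k // 2 + 1 + i] for i in range(k))
    return (acc + 32) >> 6

row = [100, 102, 104, 300, 310, 320, 330, 340, 120, 118]
print(interp_half_pel_horizontal(row, 4))   # value interpolated between row[4] and row[5]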

This embodiment is illustrated by the flow chart in fig. 9. Block 910 corresponds to step 0, where an initial MV pair is obtained with respect to a pair of reference pictures for the current sub-block in the current coding unit. Block 920 corresponds to step 1, where a first set of candidate sample positions is obtained using the initial MV pair, the top-left luma sample position of the current sub-block in the current image, the horizontal and vertical single-sided refinement range values Sx and Sy, and the desired block extensions Ox and Oy in the horizontal and vertical directions, respectively. Block 930 corresponds to step 2, where a first set of prediction samples is generated from the reconstructed reference luma samples in the two reference pictures corresponding to the initial MV pair, at a subset of sample positions within the first set of candidate sample positions. The subset of sample positions is obtained by decimation of the first set of candidate sample positions. Block 940 corresponds to step 3, where, for each candidate MV pair used by the motion vector refinement, a matching cost value is calculated using the prediction sample values at the subset of sample positions within the first set of candidate sample positions. Block 950 corresponds to step 4, where the refined MV pair is determined based on the matching cost values calculated for each candidate MV pair. Block 960 corresponds to step 5, where the refined MV pair is used to generate a second set of prediction samples, and these samples are combined by the bi-directional prediction process.

Embodiments of the present invention reduce the complexity of generating the interpolated prediction samples, since only a reduced set of prediction samples needs to be generated. These embodiments also reduce the line buffer requirements. For a vertical single-sided refinement range Sy and vertical decimation by a factor of 2, (Sy + 1) line buffers are sufficient instead of the (2*Sy + 1) line buffers required previously. When horizontal extension is not used, each line buffer holds (sbWidth + 2*Sx) samples. If a single-sided horizontal extension of Ox samples is used, each line buffer holds (sbWidth + 2*Sx + 2*Ox) samples. Meanwhile, by modifying the decimation pattern within the candidate block to even parity rows for even vertical displacements and odd parity rows for odd vertical displacements, the impact on coding gain is minimal (e.g., a 0.02% BD-rate reduction).
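As a purely illustrative numerical example (the values are chosen for illustration and are not taken from the disclosure): with sbWidth = 16, single-sided refinement ranges Sx = Sy = 2 and no block extension, (Sy + 1) = 3 line buffers of (16 + 2*2) = 20 samples each are sufficient, i.e., 60 samples, instead of (2*Sy + 1) = 5 line buffers of 20 samples each, i.e., 100 samples. With a single-sided horizontal extension of Ox = 1, each line buffer grows to (16 + 2*2 + 2*1) = 22 samples.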

According to an embodiment of the present disclosure, as shown in fig. 11, the method according to the first aspect comprises the steps of: (1110) acquiring an initial motion vector and a reference image for bidirectional prediction; (1120) obtaining a set of candidate sample positions in the reference image from the initial motion vector and the candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective preset motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector; (1130) obtaining a respective sample position set from each candidate sample position set; (1140) calculating a matching cost for each candidate motion vector within each sample position set; (1150) obtaining a refined motion vector based on the calculated matching cost of each candidate motion vector; and (1160) obtaining a prediction value of the current block based on the refined motion vector.
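For steps 1120 and 1130, the following non-normative sketch enumerates the candidate motion vectors from the initial motion vector and a preset grid of offsets, and computes the bounding rectangle that encloses all candidate sample positions (cf. claim 6). The integer-pel offsets, the single-sided ranges Sx and Sy, and the function names are assumptions made for illustration; an actual implementation operates at fractional MV precision.

def candidate_mvs(mv_init, s_x, s_y):
    # Step 1120: one candidate MV per preset offset within the refinement range.
    mx, my = mv_init
    return [(mx + dx, my + dy)
            for dy in range(-s_y, s_y + 1)
            for dx in range(-s_x, s_x + 1)]

def bounding_rectangle(block_top_left, block_w, block_h, mv_init, s_x, s_y):
    # Rectangle enclosing every candidate sample position of steps 1120/1130.
    x0, y0 = block_top_left
    mx, my = mv_init
    return ((x0 + mx - s_x, y0 + my - s_y),
            (x0 + mx + block_w - 1 + s_x, y0 + my + block_h - 1 + s_y))

print(len(candidate_mvs((3, -1), 2, 2)))                    # 25 candidate MVs
print(bounding_rectangle((64, 32), 16, 16, (3, -1), 2, 2))  # ((65, 29), (84, 48))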

Fig. 12 illustrates an embodiment of an inter prediction unit 1200 according to the present disclosure. The inter prediction unit 1200 may be the inter prediction unit 244 in fig. 2 and/or the inter prediction unit 344 in fig. 3. The inter prediction unit 1200 includes a motion vector refinement means 1210. The motion vector refinement means 1210 is configured to: acquire an initial motion vector and a reference image for bidirectional prediction; obtain a set of candidate sample positions in the reference image from the initial motion vector and the candidate motion vectors, wherein each candidate motion vector is derived from the initial motion vector and a respective preset motion vector offset, and wherein each set of candidate sample positions corresponds to one candidate motion vector; obtain a respective sample position set from each candidate sample position set; calculate a matching cost for each candidate motion vector within each sample position set; obtain a refined motion vector based on the calculated matching cost of each candidate motion vector; and obtain a prediction value of the current block based on the refined motion vector.

The present invention provides the following further embodiments:

A method for obtaining prediction samples to be used in inter prediction of a current block of a video image (or frame), comprising: obtaining an initial motion vector; obtaining a first set of candidate sample positions in a reference image (or frame) from said initial motion vector and a plurality of candidate motion vectors, which have MV offsets relative to the initial motion vector (defining a search area); obtaining first prediction sample values at a subset of positions in the first set of candidate sample positions from the initial motion vector and the reconstructed luma samples of the reference image (or frame); calculating a matching cost value for each candidate motion vector from the obtained prediction sample values and the motion vector offset between the candidate motion vector and the initial motion vector; obtaining a refined motion vector based on the calculated matching cost value of each candidate motion vector; and obtaining second prediction samples of the current block based on the refined motion vector.

In the above method, the initial motion vector (in one example, two initial motion vectors are taken as input) is obtained based on indication information in the bitstream (e.g., an index may be signaled in the bitstream that indicates a position in a candidate motion vector list).

In the above method, the first set of candidate sample positions falls within a bounding rectangular region that is calculated using the initial MV pair, the position of the upper left corner of the current sub-block in the current image, and the one-sided motion vector refinement range in the horizontal direction and the one-sided motion vector refinement range in the vertical direction.

In the above method, the subset of sample positions is obtained by a regular extraction of the first set of candidate sample positions.

In the above method, a subset of sample positions in the first set of candidate sample positions is selected as sample positions on alternating rows.

An encoder (20) comprising processing circuitry for performing the above method.

A decoder (30) comprising processing circuitry for performing the above method.

A computer program product comprising program code for performing the above method.

A decoder, comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing programming for execution by the processor, wherein the decoder is configured to perform the above-described method when the programming is executed by the processor.

An encoder, comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the processor and storing programming for execution by the processor, wherein the encoder is configured to perform the above-described method when the programming is executed by the processor.

Furthermore, the present invention provides a method for obtaining prediction samples to be used in inter-frame bi-directional prediction of a current block of a video picture, comprising: obtaining an initial pair of motion vectors for bi-directional prediction with respect to a pair of reference pictures; obtaining a first set of candidate sample positions for each reference picture of the pair from the respective initial motion vector and a plurality of candidate motion vectors having motion vector offsets relative to that initial motion vector; obtaining first prediction sample values at a subset of positions in each first set of candidate sample positions from the respective initial motion vector and the reconstructed luma samples of the respective reference picture; calculating a matching cost value for each candidate motion vector from the obtained prediction sample values and the motion vector offset between the candidate motion vector and the corresponding initial motion vector; obtaining refined motion vectors based on the calculated matching cost value of each candidate motion vector and each initial motion vector; and obtaining second prediction samples of the current block based on the refined motion vectors.

Mathematical operators

The mathematical operators used in this application are similar to those used in the C programming language. However, the results of integer division and arithmetic shift operations are defined more precisely, and additional operations, such as exponentiation and real-valued division, are defined. The numbering and counting conventions generally begin from 0, e.g., "the first" corresponds to the 0-th, "the second" corresponds to the 1-st, and so on.

Arithmetic operators

The following arithmetic operators are defined as follows:

+ Addition

- Subtraction (as a two-argument operator) or negation (as a unary prefix operator)

* Multiplication, including matrix multiplication

x^y Exponentiation. Specifies x to the power of y. In other contexts, such notation is used for superscripting and is not intended to be interpreted as exponentiation.

/ Integer division with the result truncated towards zero. For example, 7/4 and -7/-4 are truncated to 1, and -7/4 and 7/-4 are truncated to -1.

÷ Used to denote division in mathematical equations where no truncation or rounding is intended.

The fraction notation (x over y) is likewise used to denote division in mathematical equations where no truncation or rounding is intended.

Σ f(i) The summation of f(i), with i taking all integer values from x up to and including y.

x % y Modulus. The remainder of x divided by y, defined only for integers x and y with x > 0 and y > 0.

Logical operators

The following logical operators are defined as follows:

x && y Boolean logical "and" of x and y

x || y Boolean logical "or" of x and y

! Boolean logical "not"

x ? y : z If x is TRUE or not equal to 0, evaluates to the value of y; otherwise, evaluates to the value of z.

Relational operators

The following relational operators are defined as follows:

> Greater than

>= Greater than or equal to

< Less than

<= Less than or equal to

== Equal to

!= Not equal to

When the relationship operator is applied to a syntax element or variable that has been assigned "na" (not applicable), the value "na" is treated as a unique value of the syntax element or variable. The value "na" is considered not equal to any other value.

Bit-wise operators

The following operators are defined as follows:

& Bit-wise "and". When operating on integer arguments, operates on the two's complement representation of the integer value. When operating on a binary argument that contains fewer bits than another argument, the shorter argument is extended by adding more significant bits equal to 0.

| Bit-wise "or". When operating on integer arguments, operates on the two's complement representation of the integer value. When operating on a binary argument that contains fewer bits than another argument, the shorter argument is extended by adding more significant bits equal to 0.

^ Bit-wise "exclusive or". When operating on integer arguments, operates on the two's complement representation of the integer value. When operating on a binary argument that contains fewer bits than another argument, the shorter argument is extended by adding more significant bits equal to 0.

x >> y Arithmetic right shift of the two's complement integer representation of x by y binary digits. This function is defined only for non-negative integer values of y. Bits shifted into the most significant bits (MSBs) as a result of the right shift have a value equal to the MSB of x prior to the shift operation.

x << y Arithmetic left shift of the two's complement integer representation of x by y binary digits. This function is defined only for non-negative integer values of y. Bits shifted into the least significant bits (LSBs) as a result of the left shift have a value equal to 0.
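As an illustrative, non-normative aside, the integer division, modulus and shift operators defined above can be mimicked in Python as follows; note that Python's native // and % operators floor rather than truncate, so the truncating division has to be emulated. The function name is chosen for illustration only.

def spec_div(x, y):
    # "/" of this specification: integer division truncated towards zero.
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q

print(spec_div(7, 4), spec_div(-7, 4), spec_div(-7, -4))   # 1 -1 1
print(7 // 4, -7 // 4)                                     # 1 -2 (Python floors instead)
print(5 % 3)                                               # 2 (matches for x > 0 and y > 0)
print(-8 >> 2)                                             # -2 (arithmetic shift, MSB replicated)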

Assignment operators

The following assignment operators are defined as follows:

= Assignment operator

++ Increment, i.e., x++ is equivalent to x = x + 1; when used in an array index, evaluates to the value of the variable prior to the increment operation.

-- Decrement, i.e., x-- is equivalent to x = x - 1; when used in an array index, evaluates to the value of the variable prior to the decrement operation.

+= Increment by the specified amount, i.e., x += 3 is equivalent to x = x + 3, and x += (-3) is equivalent to x = x + (-3).

-= Decrement by the specified amount, i.e., x -= 3 is equivalent to x = x - 3, and x -= (-3) is equivalent to x = x - (-3).

Range notation

The following notation is used to designate ranges of values:

x = y..z x takes integer values from y to z, inclusive, where x, y and z are integers and z is greater than y.

Mathematical functions

The following mathematical functions are defined:

Asin(x) The trigonometric inverse sine function, operating on an argument x that is in the range of -1.0 to 1.0, inclusive, with an output value in the range of -π/2 to π/2, inclusive, in units of radians.

Atan(x) The trigonometric inverse tangent function, operating on an argument x, with an output value in the range of -π/2 to π/2, inclusive, in units of radians.

Ceil (x) is the smallest integer greater than or equal to x.

Clip1Y(x)=Clip3(0,(1<<BitDepthY)-1,x)

Clip1C(x)=Clip3(0,(1<<BitDepthC)-1,x)

Cos (x) is a trigonometric cosine function that operates on the parameter x in radians.

Floor(x) is the largest integer less than or equal to x.

Ln(x) is the natural logarithm of x (the base-e logarithm, where e is the natural logarithm base constant 2.718281828...).

Log2(x) is the base-2 logarithm of x.

Log10(x) is the base-10 logarithm of x.

Round(x)=Sign(x)*Floor(Abs(x)+0.5)

Sin (x) a trigonometric sine function operating on the parameter x in radians.

Swap(x,y)=(y,x)

Tan (x) is a trigonometric tangent function that operates on the parameter x in radians.
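As an illustrative, non-normative sketch, a few of the above functions can be written in Python. Note that Round(x) as defined here rounds halves away from zero, unlike Python's built-in round. Clip3, which is used by Clip1Y and Clip1C, is the three-argument clipping function defined earlier in this specification; the version below assumes the usual definition Clip3(x, y, z) = min(max(z, x), y), and the 10-bit default for BitDepthY is an assumption made for illustration.

import math

def Sign(x):
    return (x > 0) - (x < 0)

def Round(x):
    # Round(x) = Sign(x) * Floor(Abs(x) + 0.5): halves are rounded away from zero.
    return Sign(x) * math.floor(abs(x) + 0.5)

def Clip3(x, y, z):
    # Assumed standard definition: clip z to the inclusive range [x, y].
    return x if z < x else (y if z > y else z)

def Clip1_Y(x, bit_depth_y=10):
    return Clip3(0, (1 << bit_depth_y) - 1, x)

print(Round(0.5), Round(-0.5), round(0.5))   # 1 -1 0 (built-in round uses banker's rounding)
print(Clip1_Y(1500), Clip1_Y(-20))           # 1023 0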

Operation priority order

When the order of priority in the expression is not explicitly indicated with parentheses, the following rule applies:

-the higher priority operations are calculated before the lower priority operations.

-computing the operations with the same priority in turn from left to right.

The following table specifies the operation priorities from highest to lowest; a higher position in the table indicates a higher priority.

For those operators that are also used in the C programming language, the precedence order used in this specification is the same as that used in the C programming language.

Table: Operation priorities from highest (at the top of the table) to lowest (at the bottom of the table)

Textual description of logical operations

In the text, a statement of logical operations as would be described mathematically in the following form:

if( condition 0 )
statement 0
else if( condition 1 )
statement 1
...
else /* informative remark on the remaining condition */
statement n

may be described in the following manner:

... the following applies:

- If condition 0, statement 0

- Else, if condition 1, statement 1

- ...

- Else (informative remark on the remaining conditions), statement n

Each "If ... Else, if ... Else, ..." statement in the text is introduced with "... as follows" or "... the following applies", immediately followed by "If ...". The last condition of an "If ... Else, if ... Else, ..." statement is always an "Else, ...". Interleaved "If ... Else, if ... Else, ..." statements can be identified by matching "... as follows" or "... the following applies" with the ending "Else, ...".

In the text, a statement of logical operations as would be described mathematically in the following form:

if( condition 0a && condition 0b )
statement 0
else if( condition 1a || condition 1b )
statement 1
...
else
statement n

may be described in the following manner:

.. the following applies:

- If all of the following conditions are true, statement 0:

condition 0a

Condition 0b

- Else, if one or more of the following conditions are true, statement 1:

condition 1a

Condition 1b

–...

Else, statement n

In the text, the following logical operations will be described mathematically:

if (Condition 0)

Statement 0

if (Condition 1)

Statement 1

Its statements may be described in the following way:

when condition 0, statement 0

When condition 1, statement 1.

Although embodiments of the present invention have been described primarily based on video coding, it should be noted that embodiments of the encoding system 10, the encoder 20 and the decoder 30 (and, correspondingly, the system 10) and the other embodiments described herein may also be configured for still image processing or encoding, i.e., the processing or encoding of an individual image independent of any preceding or consecutive image, as in video coding. In general, in case the image processing encoding is limited to a single image 17, only the inter prediction units 244 (encoder) and 344 (decoder) may not be available. All other functions (also referred to as tools or techniques) of the video encoder 20 and the video decoder 30 may equally be used for still image processing, such as residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, partitioning 262/362, intra prediction 254/354 and/or loop filtering 220, 320, as well as entropy encoding 270 and entropy decoding 304.

Embodiments such as encoder 20 and decoder 30 and functions described herein, for example, with reference to encoder 20 and decoder 30, may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code over a communication medium and executed by a hardware-based processing unit. The computer readable medium may comprise a computer readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, the computer-readable medium may generally correspond to: (1) a non-transitory, tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures to implement the techniques described in this disclosure. The computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Further, in some aspects, the functionality described herein may be provided in dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be implemented entirely in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as noted above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units including one or more processors as noted above, in conjunction with suitable software and/or firmware.
