Affine limitation for worst-case bandwidth reduction in video coding

Document No. 621639, published 2021-05-07.

Note: This technology, "Affine limitation for worst-case bandwidth reduction in video coding," was created by W-J. Chien, H. Huang, M. Karczewicz, L. Pham Van, and V. Seregin on 2019-10-03. Its main content includes the following: An example method includes obtaining values of luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation; determining a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks; predicting, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and predicting samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

1. A method for coding video data, the method comprising:

obtaining values of luma motion vectors for a plurality of luma sub-blocks of a current block of the video data selected for coding using affine motion compensation;

determining a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks;

predicting, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and

predicting samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

2. The method of claim 1, wherein the plurality of luma sub-blocks comprises an upper left sub-block, an upper right sub-block, a lower left sub-block, and a lower right sub-block.

3. The method of claim 2, wherein the subset of the plurality of luma sub-blocks comprises two diagonally positioned luma sub-blocks.

4. The method of claim 3, wherein the two diagonally positioned luma sub-blocks comprise the upper left sub-block and the lower right sub-block.

5. The method of claim 4, wherein the value of the chroma motion vector is not determined based on a value of the luma motion vector of the upper right sub-block or a value of the luma motion vector of the lower left sub-block.

6. The method of claim 1, wherein each of the luma sub-blocks is 4 x 4 samples, and wherein the chroma sub-block is 4 x 4 samples.

7. The method of claim 1, wherein determining the value of the chroma motion vector comprises:

determining the value of the chroma motion vector as an average of values of the luma motion vectors of the subset of the plurality of luma sub-blocks.

8. The method of claim 7, wherein determining the value of the chroma motion vector as the average of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks comprises:

determining a sum of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks; and

right-shifting the determined sum to calculate the value of the chroma motion vector.

9. An apparatus for coding video data, the apparatus comprising:

a memory configured to store the video data; and

one or more processors implemented in circuitry and configured to:

obtain values of luma motion vectors for a plurality of luma sub-blocks of a current block of the video data selected for coding using affine motion compensation;

determine a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks;

predict, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and

predict samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

10. The apparatus of claim 9, wherein the plurality of luma sub-blocks comprises an upper left sub-block, an upper right sub-block, a lower left sub-block, and a lower right sub-block.

11. The apparatus of claim 10, wherein the subset of the plurality of luma sub-blocks comprises two diagonally positioned luma sub-blocks.

12. The apparatus of claim 11, wherein the two diagonally positioned luma sub-blocks comprise the upper left sub-block and the lower right sub-block.

13. The apparatus of claim 12, wherein the one or more processors do not determine the value of the chroma motion vector based on a value of the luma motion vector of the upper right sub-block or a value of the luma motion vector of the lower left sub-block.

14. The apparatus of claim 9, wherein each of the luma sub-blocks is 4 x 4 samples, and wherein the chroma sub-block is 4 x 4 samples.

15. The apparatus of claim 9, wherein, to determine the value of the chroma motion vector, the one or more processors are configured to:

determine the value of the chroma motion vector as an average of values of the luma motion vectors of the subset of the plurality of luma sub-blocks.

16. The apparatus of claim 15, wherein, to determine the value of the chroma motion vector as the average of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks, the one or more processors are configured to:

determine a sum of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks; and

right-shift the determined sum to calculate the value of the chroma motion vector.

17. A computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of a video coder to perform the steps of:

obtaining values of luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation;

determining a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks;

predicting, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and

predicting samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

18. The computer-readable storage medium of claim 17, wherein:

the plurality of luma sub-blocks comprises an upper left sub-block, an upper right sub-block, a lower left sub-block, and a lower right sub-block;

the subset of the plurality of luma sub-blocks comprises two diagonally positioned luma sub-blocks; and

the two diagonally positioned luma sub-blocks comprise the upper left sub-block and the lower right sub-block.

19. The computer-readable storage medium of claim 18, wherein the instructions that cause the one or more processors to determine the value of the chroma motion vector comprise instructions that cause the one or more processors to:

determine the value of the chroma motion vector as an average of values of the luma motion vectors of the subset of the plurality of luma sub-blocks.

20. An apparatus for coding video data, the apparatus comprising:

means for obtaining values of luma motion vectors for a plurality of luma sub-blocks of a current block of the video data selected for coding using affine motion compensation;

means for determining a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks;

means for predicting, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and

means for predicting samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

Technical Field

The present disclosure relates to video encoding and video decoding.

Background

Digital video functionality can be integrated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), portable or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. By implementing such video coding techniques, video devices may more efficiently transmit, receive, encode, decode, and/or store digital video information.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or eliminate redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as Coding Tree Units (CTUs), Coding Units (CUs) and/or coding nodes. For video blocks in an intra-coded (I) slice of a picture, spatial prediction with respect to reference samples in neighboring blocks in the same picture may be used for encoding. For video blocks in inter-coded (P or B) slices of a picture, spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures may be used. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.

Disclosure of Invention

The present disclosure relates generally to techniques for reducing the amount of memory bandwidth used to predict samples of video data. A video coder (e.g., a video encoder or a video decoder) may predict samples for a current block of video data based on samples of one or more reference blocks of video data (referred to as reference samples). To predict samples for a current block from reference samples, a video coder may retrieve the reference samples from memory. The amount of memory bandwidth that the video coder uses to predict samples for a current block of video data may be a function of the number of reference samples retrieved. Retrieving reference samples from memory may consume power and increase processing time. As such, in some examples, it may be desirable to minimize the memory bandwidth used by the video coder.

In accordance with one or more techniques of this disclosure, a video coder may impose one or more constraints to reduce an amount of memory bandwidth used to predict samples of a current block of video data. For example, a video coder may determine a memory bandwidth required for a current block, and may selectively modify a motion compensation method used to predict samples of the current block based on whether the determined memory bandwidth for the current block satisfies a bandwidth threshold. In this way, the video coder may reduce the amount of power consumed and/or the processing time required to predict the samples of the current block.
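As a rough illustration of the kind of gating described above, the following C++ sketch estimates the number of reference samples fetched for sub-block motion compensation and falls back to a cheaper whole-block prediction when a bandwidth budget is exceeded. The block sizes, the 8-tap filter length, and the threshold choice are illustrative assumptions, not values mandated by this disclosure.

```cpp
#include <cstdint>
#include <iostream>

// Reference samples needed to interpolate one W x H block with an N-tap
// separable filter: (W + N - 1) x (H + N - 1).
int64_t refSamplesForBlock(int w, int h, int taps) {
    return static_cast<int64_t>(w + taps - 1) * (h + taps - 1);
}

// Estimate the worst-case fetch for sub-block prediction of a blockW x blockH
// block and compare it against a threshold (here: the whole-block fetch).
bool exceedsBandwidthThreshold(int blockW, int blockH, int subW, int subH,
                               int taps) {
    int numSubBlocks = (blockW / subW) * (blockH / subH);
    int64_t subBlockFetch = numSubBlocks * refSamplesForBlock(subW, subH, taps);
    int64_t threshold = refSamplesForBlock(blockW, blockH, taps);
    return subBlockFetch > threshold;  // over budget -> modify the MC method
}

int main() {
    // 16 x 16 block, 4 x 4 sub-blocks, 8-tap interpolation filter:
    // 16 sub-blocks x (11 x 11) = 1936 samples vs. 23 x 23 = 529 samples.
    std::cout << "exceeds threshold: "
              << exceedsBandwidthThreshold(16, 16, 4, 4, 8) << '\n';
}
```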

In one example, a method includes obtaining values of luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation; determining a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks; predicting, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and predicting samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

In another example, an apparatus for coding video data comprises: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: obtain values of luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation; determine a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks; predict, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and predict samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

In another example, a computer-readable storage medium stores instructions that, when executed, cause one or more processors of a video coder to: obtain values of luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation; determine a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks; predict, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and predict samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

In another example, an apparatus for coding video data comprises: means for obtaining values of luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation; means for determining a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks based on values of luma motion vectors of a subset of the plurality of luma sub-blocks; means for predicting, using affine motion compensation, a respective sample of each luma sub-block of the plurality of luma sub-blocks based on a respective value of the luma motion vector; and means for predicting samples of the chroma sub-block based on values of the chroma motion vector using affine motion compensation.

The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of various aspects of the technology will be apparent from the description and drawings, and from the claims.

Drawings

Fig. 1 is a block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.

Fig. 2A and 2B are conceptual diagrams illustrating an example quadtree-plus-binary tree (QTBT) structure and corresponding Coding Tree Unit (CTU).

Fig. 3A-3E are conceptual diagrams illustrating example segmentation of video data.

Fig. 4 is a block diagram illustrating an example video encoder that may perform techniques of this disclosure.

Fig. 5 is a block diagram illustrating an example video decoder that may perform techniques of this disclosure.

Fig. 6A and 6B are conceptual diagrams illustrating control points in the affine mode.

Fig. 7 is a conceptual diagram illustrating a non-overlapping reference region for reconstructing a current block according to one or more aspects of the present disclosure.

Fig. 8 is a conceptual diagram illustrating an overlapping reference region for reconstructing a current block according to one or more aspects of the present disclosure.

Fig. 9 is a conceptual diagram illustrating determining a chroma motion vector from a luma motion vector according to one or more techniques of this disclosure.

FIG. 10 is a flow diagram illustrating an example process for encoding a current block.

FIG. 11 is a flow diagram illustrating an example process for decoding a current block.

Fig. 12 is a flow diagram illustrating an example process for managing memory bandwidth for prediction of video data in accordance with one or more techniques of this disclosure.

FIG. 13 is a conceptual diagram illustrating simplified memory bandwidth testing according to one or more aspects of the present disclosure.

Fig. 14 is a flow diagram illustrating an example method for managing memory bandwidth for prediction of video data in accordance with one or more techniques of this disclosure.

Detailed Description

Fig. 1 is a block diagram illustrating an example video encoding and decoding system 100 that may perform techniques of the present disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) video data. In general, video data includes any data used to process video. Thus, the video data may include original unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata (such as signaled data).

As shown in fig. 1, in this example, system 100 includes a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. In particular, source device 102 provides video data to destination device 116 via computer-readable medium 110. Source device 102 and destination device 116 may comprise any of a variety of devices, including desktop computers, notebooks (i.e., laptop computers), tablets, set-top boxes, hand-held telephones (e.g., smartphones), televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and the like. In some cases, source device 102 and destination device 116 may be equipped for wireless communication and may therefore be referred to as wireless communication devices.

In the example of fig. 1, the source device 102 includes a video source 104, a memory 106, a video encoder 200, and an output interface 108. The destination device 116 includes an input interface 122, a video decoder 300, a memory 120, and a display device 118. In accordance with this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 may be configured to apply the techniques for memory bandwidth reduction for affine-coded video data. Thus, source device 102 represents an example of a video encoding device, and destination device 116 represents an example of a video decoding device. In other examples, the source device and the destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source, such as an external video camera. Likewise, the destination device 116 may interface with an external display device rather than including an integrated display device.

The system 100 as shown in fig. 1 is merely an example. In general, any digital video encoding and/or decoding device may perform the techniques for memory bandwidth reduction for affine-coded video data. Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates coded video data for transmission to destination device 116. This disclosure refers to a "coding" device as a device that performs coding (encoding and/or decoding) of data. Thus, the video encoder 200 and the video decoder 300 represent examples of coding devices, in particular, a video encoder and a video decoder, respectively. In some examples, the devices 102, 116 may operate in a substantially symmetric manner such that each of the devices 102, 116 includes video encoding and decoding components. The system 100 may thus support one-way or two-way video transmission between the video devices 102, 116, for example for video streaming, video playback, video broadcasting, or video telephony.

In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequence of consecutive pictures (also referred to as "frames") of video data to video encoder 200, which encodes the data for the pictures. The video source 104 of the source device 102 may include a video capture device, such as a video camera, a video archive including previously captured raw video, and/or a video feed interface that receives video from a video content provider. As a further alternative, video source 104 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In each case, the video encoder 200 encodes captured, pre-captured, or computer-generated video data. The video encoder 200 may rearrange the pictures from the receiving order (sometimes referred to as "display order") to the coding order for coding. The video encoder 200 may generate a bitstream that includes encoded video data. Source device 102 may then output the encoded video data onto computer-readable medium 110 via output interface 108, for receipt and/or retrieval by, for example, input interface 122 of destination device 116.

Memory 106 of source device 102 and memory 120 of destination device 116 represent general purpose memory. In some examples, the memories 106, 120 may store raw video data, such as raw video from the video source 104 and raw decoded video data from the video decoder 300. Additionally or alternatively, the memories 106, 120 may store software instructions executable by, for example, the video encoder 200 and the video decoder 300, respectively. Although shown separately from the video encoder 200 and the video decoder 300 in this example, it is understood that the video encoder 200 and the video decoder 300 may also include internal memory to achieve a functionally similar or equivalent purpose. Further, the memories 106, 120 may store, for example, encoded video data output from the video encoder 200 and input to the video decoder 300. In some examples, portions of the memory 106, 120 may be allocated as one or more video buffers, e.g., to store raw decoded and/or encoded video data.

Computer-readable medium 110 may represent any type of medium or device capable of transporting encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmission signal including the encoded video data, and input interface 122 may demodulate the received transmission signal, according to a communication standard, such as a wireless communication protocol. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include routers, switches, base stations, or any other equipment that facilitates communication from source device 102 to destination device 116.

In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, the destination device 116 may access encoded data from the storage device 112 via the input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.

In some examples, source device 102 may output the encoded video data to file server 114 or another intermediate storage device that may store the encoded video data generated by source device 102. The destination device 116 may access the stored video data from the file server 114 via streaming or download. File server 114 may be any type of server device capable of storing encoded video data and transmitting that encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a File Transfer Protocol (FTP) server, a content delivery network device, or a Network Attached Storage (NAS) device. The destination device 116 may access the encoded video data from the file server 114 through any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a Digital Subscriber Line (DSL), cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on file server 114. File server 114 and input interface 122 may be configured to operate according to a streaming protocol, a download transfer protocol, or a combination thereof.

Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components operating in accordance with any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 include wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, in accordance with a cellular communication standard, such as 4G, 4G-LTE (Long Term Evolution), LTE Advanced, 5G, or similar standards. In some examples in which output interface 108 includes a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to other wireless standards, such as the IEEE 802.11 specification, the IEEE 802.15 specification (e.g., ZigBee™), Bluetooth™ standards, and the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include a SoC device to perform the functionality attributed to video encoder 200 and/or output interface 108, and destination device 116 may include a SoC device to perform the functionality attributed to video decoder 300 and/or input interface 122.

The techniques of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions such as dynamic adaptive streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding digital video stored on a data storage medium, or other applications.

The input interface 122 of the destination device 116 receives the encoded video bitstream from the computer-readable medium 110 (e.g., the storage device 112, the file server 114, etc.). The encoded video bitstream may include signaling information defined by the video encoder 200, which is also used by the video decoder 300, such as syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (e.g., slices, pictures, groups of pictures, sequences, and the like). The display device 118 displays the decoded pictures of the decoded video data to the user. Display device 118 may represent any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.

Although not shown in fig. 1, in some examples, each of the video encoder 200 and the video decoder 300 may be integrated with an audio encoder and/or an audio decoder, and may include appropriate MUX-DEMUX units or other hardware and/or software to handle multiplexed streams including both audio and video in a common data stream. If applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).

Each of video encoder 200 and video decoder 300 may be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the video encoder 200 and the video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. A device including video encoder 200 and/or video decoder 300 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device such as a cellular telephone.

The video encoder 200 and the video decoder 300 may operate according to a video coding standard, such as ITU-T H.265, also known as High Efficiency Video Coding (HEVC), or extensions thereof, such as the multi-view and/or scalable video coding extensions. Alternatively, the video encoder 200 and the video decoder 300 may operate according to other proprietary or industry standards, such as the Joint Exploration Test Model (JEM) or ITU-T H.266, also known as Versatile Video Coding (VVC). A recent draft of the VVC standard is described in Bross et al., "Versatile Video Coding (Draft 2)," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Ljubljana, SI, 10-18 July 2018, JVET-K1001 (hereinafter "VVC Draft 2"). However, the techniques of this disclosure are not limited to any particular coding standard.

In general, the video encoder 200 and the video decoder 300 may perform block-based coding of pictures. The term "block" generally refers to a structure that includes data to be processed (e.g., encoded, decoded, or otherwise used in an encoding and/or decoding process). For example, a block may comprise a two-dimensional matrix of samples of luminance and/or chrominance data. In general, the video encoder 200 and the video decoder 300 may code video data represented in YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, video encoder 200 and video decoder 300 may code luma and chroma components, where the chroma components may include red and blue chroma components. In some examples, the video encoder 200 converts the received RGB-format data to a YUV representation prior to encoding, and the video decoder 300 converts the YUV representation to an RGB format. Alternatively, a pre-processing and post-processing unit (not shown) may perform these conversions.

This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the pictures. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, such as prediction and/or residual coding. An encoded video bitstream generally includes a series of values for syntax elements representing coding decisions (e.g., coding modes) and partitioning of a picture into blocks. Thus, references to coding a picture or a block should generally be understood as coding values of syntax elements that form the picture or block.

HEVC defines various blocks, including Coding Units (CUs), Prediction Units (PUs), and Transform Units (TUs). According to HEVC, a video coder, such as video encoder 200, partitions a Coding Tree Unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions the CTU and CU into four equal, non-overlapping squares, and each node of the quadtree has zero or four child nodes. A node without child nodes may be referred to as a "leaf node," and a CU of such a leaf node may include one or more PUs and/or one or more TUs. The video coder may further partition the PU and TU. For example, in HEVC, the Residual Quadtree (RQT) represents the partitioning of a TU. In HEVC, PU represents inter prediction data and TU represents residual data. The intra-predicted CU includes intra-prediction information, such as an intra-mode indication.

As another example, the video encoder 200 and the video decoder 300 may be configured to operate in accordance with JEM or VVC. According to some examples of JEM or VVC, a video coder, such as video encoder 200, partitions a picture into multiple Coding Tree Units (CTUs). The video encoder 200 may partition the CTUs according to a tree structure such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure eliminates the concept of multiple partition types, such as the differentiation between CU, PU and TU of HEVC. The QTBT structure comprises two levels: a first level partitioned according to a quadtree partition, and a second level partitioned according to a binary tree partition. The root node of the QTBT structure corresponds to the CTU. Leaf nodes of the binary tree correspond to Coding Units (CUs).

In the MTT partitioning structure, a block may be partitioned using a quadtree (QT) partition, a binary tree (BT) partition, and one or more types of triple tree (TT) (also called ternary tree (TT)) partitions. A triple or ternary tree partition is a partition in which a block is split into three sub-blocks. In some examples, a triple or ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partition types in MTT (e.g., QT, BT, and TT) may be symmetric or asymmetric.

In some examples, the video encoder 200 and the video decoder 300 may represent each of the luma component and the chroma component using a single QTBT or MTT structure, while in other examples, the video encoder 200 and the video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT/MTT structure for the luma component and another QTBT/MTT structure for the two chroma components (or two QTBT/MTT structures for the respective chroma components).

The video encoder 200 and the video decoder 300 may be configured to use quadtree partitioning per HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For purposes of explanation, the description of the techniques of this disclosure is presented with respect to QTBT partitioning. However, it should be understood that the techniques of this disclosure may also be applied to video coders configured to use quadtree partitioning, or other types of partitioning as well.

Blocks (e.g., CTUs or CUs) may be grouped in various ways in a picture. As one example, a brick may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements (e.g., such as in a picture parameter set). A tile row refers to a rectangular region of CTUs having a height specified by syntax elements (e.g., such as in a picture parameter set) and a width equal to the width of the picture.

In some examples, a tile may be partitioned into multiple bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a true subset of a tile may not be referred to as a tile.

The bricks in a picture may also be arranged in slices. A slice may be an integer number of bricks of a picture that are exclusively contained in a single Network Abstraction Layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.

The present disclosure may use "N × N" and "N by N" interchangeably to represent the sample dimensions of a block (such as a CU or other video block) in terms of vertical and horizontal dimensions, e.g., 16 × 16 samples or 16 by 16 samples. In general, a 16 × 16 CU will have 16 samples in the vertical direction (y = 16) and 16 samples in the horizontal direction (x = 16). Likewise, an N × N CU typically has N samples in the vertical direction and N samples in the horizontal direction, where N represents a non-negative integer value. The samples in a CU may be arranged in rows and columns. Furthermore, a CU does not necessarily have the same number of samples in the horizontal direction as in the vertical direction. For example, a CU may include N × M samples, where M is not necessarily equal to N.

Video encoder 200 encodes video data of a CU representing prediction and/or residual information, as well as other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents the sample-by-sample differences between the samples of the CU prior to encoding and the samples of the prediction block.

To predict a CU, video encoder 200 may generally form a prediction block for the CU through inter prediction or intra prediction. Inter prediction generally refers to predicting a CU from data of a previously coded picture, while intra prediction generally refers to predicting a CU from previously coded data of the same picture. To perform inter prediction, the video encoder 200 may generate a prediction block using one or more motion vectors. Video encoder 200 may typically perform a motion search to identify a reference block that closely matches a CU, e.g., in terms of differences between the CU and the reference block. Video encoder 200 may calculate a difference metric using Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Mean Absolute Differences (MAD), Mean Squared Differences (MSD), or other such difference calculations to determine whether the reference block closely matches the current CU. In some examples, video encoder 200 may predict the current CU using uni-prediction or bi-prediction.

Some examples of JEM and VVC may also provide an affine motion compensation mode, which may be considered an inter prediction mode. In affine motion compensation mode, video encoder 200 may determine two or more motion vectors that represent non-translational motion, such as zoom-in or zoom-out, rotation, perspective motion, or other irregular motion types.

To perform intra-prediction, video encoder 200 may select an intra-prediction mode to generate the prediction block. Some examples of JEM and VVC provide sixty-seven intra-prediction modes, including various directional modes as well as planar and DC modes. In general, video encoder 200 selects an intra-prediction mode that describes neighboring samples to a current block (e.g., a block of a CU) from which to predict samples of the current block. Assuming that the video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom), such samples may generally be above, above and to the left of, or to the left of the current block in the same picture as the current block.

The video encoder 200 encodes data representing the prediction mode of the current block. For example, for inter-prediction modes, video encoder 200 may encode data representing which of the various available inter-prediction modes is used along with motion information for the corresponding mode. For uni-directional or bi-directional inter prediction, for example, video encoder 200 may encode motion vectors using Advanced Motion Vector Prediction (AMVP) or merge mode. The video encoder 200 may use a similar mode to encode the motion vectors for the affine motion compensation mode.

After prediction (such as intra-prediction or inter-prediction of a block), video encoder 200 may calculate residual data for the block. The residual data (such as a residual block) represents the sample-by-sample differences between the block and a prediction block for the block, the prediction block being formed using the corresponding prediction mode. The video encoder 200 may apply one or more transforms to the residual block to produce transformed data in a transform domain rather than the sample domain. For example, video encoder 200 may apply a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video data. In addition, video encoder 200 may apply a secondary transform following the first transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve transform (KLT), or the like. The video encoder 200 produces transform coefficients following application of the one or more transforms.

As described above, following any transforms to produce transform coefficients, video encoder 200 may perform quantization of the transform coefficients. Quantization generally refers to the process of quantizing transform coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression. By performing the quantization process, video encoder 200 may reduce the bit depth associated with some or all of the coefficients. For example, the video encoder 200 may round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, video encoder 200 may perform a bit-wise right shift of the value to be quantized.
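A minimal sketch of the right-shift quantization mentioned above is shown below; the bit depths, the round-to-nearest offset, and the assumption of non-negative coefficients are illustrative simplifications rather than the exact quantizer of any standard.

```cpp
#include <cstdint>
#include <iostream>

// Round a non-negative n-bit value down to an m-bit value (n > m) using a
// bit-wise right shift with a round-to-nearest offset.
int32_t quantizeByShift(int32_t coeff, int n, int m) {
    int shift = n - m;                   // number of bits removed
    int32_t offset = 1 << (shift - 1);   // rounding offset
    return (coeff + offset) >> shift;    // quantized m-bit level
}

int main() {
    // Illustrative only: shrink a 12-bit coefficient value to 8 bits.
    std::cout << quantizeByShift(1234, 12, 8) << '\n';  // prints 77 (1234 / 16, rounded)
}
```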

After quantization, video encoder 200 may scan the transform coefficients, thereby generating a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scanning may be designed to place higher energy (and therefore lower frequency) coefficients in front of the vector and lower energy (and therefore higher frequency) transform coefficients behind the vector. In some examples, video encoder 200 may scan the quantized transform coefficients with a predefined scan order to produce serialized vectors and then entropy encode the quantized transform coefficients of the vectors. In other examples, video encoder 200 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 200 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Binary Arithmetic Coding (CABAC). Video encoder 200 may also entropy encode values for syntax elements that describe metadata associated with the encoded video data used by video decoder 300 in decoding the video data.

To perform CABAC, the video encoder 200 may assign a context within the context model to a symbol to be transmitted. For example, the context may relate to whether adjacent values of a symbol are zero values. The probability determination may be based on the context assigned to the symbol.

The video encoder 200 may further generate syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, to the video decoder 300, e.g., in a picture header, a block header, or a slice header, or may generate other syntax data, such as a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), or a Video Parameter Set (VPS). Video decoder 300 may likewise decode such syntax data to determine how to decode the corresponding video data.

In this way, the video encoder 200 may generate a bitstream that includes encoded video data, e.g., syntax elements describing the partitioning of a picture into blocks (e.g., CUs) and prediction and/or residual information for the blocks. Finally, the video decoder 300 may receive the bitstream and decode the encoded video data.

In general, the video decoder 300 performs the reverse process performed by the video encoder 200 to decode the encoded video data of the bitstream. For example, video decoder 300 may use CABAC to decode values of syntax elements of a bitstream in a manner substantially similar to (although opposite to) the CABAC encoding process of video encoder 200. The syntax elements may define segmentation information about the segmentation of the picture into CTUs and the segmentation of each CTU according to a corresponding segmentation structure, such as a QTBT structure, to define the CU of the CTU. The syntax elements may further define prediction and residual information for blocks of video data (e.g., CUs).

For example, the residual information may be represented by quantized transform coefficients. The video decoder 300 may inverse quantize and inverse transform the quantized transform coefficients of a block to reproduce a residual block for the block. The video decoder 300 uses the signaled prediction mode (intra- or inter-prediction) and related prediction information (e.g., motion information for inter-prediction) to form a prediction block for the block. The video decoder 300 may then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. The video decoder 300 may perform additional processing, such as performing a deblocking process, to reduce visual artifacts along block boundaries.
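The sample-wise reconstruction step described above can be summarized by the short sketch below; the 8-bit sample range and clipping behavior are assumptions for illustration, and the deblocking stage is omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Reconstruct a block by adding the residual to the prediction sample-by-sample
// and clipping the result to the valid 8-bit sample range.
std::vector<uint8_t> reconstruct(const std::vector<uint8_t>& pred,
                                 const std::vector<int16_t>& resid) {
    std::vector<uint8_t> recon(pred.size());
    for (size_t i = 0; i < pred.size(); ++i) {
        int v = pred[i] + resid[i];
        recon[i] = static_cast<uint8_t>(std::clamp(v, 0, 255));
    }
    return recon;
}

int main() {
    std::vector<uint8_t> pred = {100, 200, 250, 10};
    std::vector<int16_t> resid = {-20, 30, 40, -15};
    for (uint8_t s : reconstruct(pred, resid)) std::printf("%d ", s);  // 80 230 255 0
    std::printf("\n");
}
```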

In general, the present disclosure may refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used to decode encoded video data. That is, the video encoder 200 may signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As described above, source device 102 may transmit the bitstream to destination device 116 substantially in real time (or non-real time, such as may occur when syntax elements are stored to storage device 112 for later retrieval by destination device 116).

Fig. 2A and 2B are conceptual diagrams illustrating an example quadtree plus binary tree (QTBT) structure 130 and a corresponding Coding Tree Unit (CTU) 132. The solid lines represent quadtree splitting, and the dashed lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where in this example 0 indicates horizontal splitting and 1 indicates vertical splitting. For quadtree splitting, there is no need to indicate the splitting type, since a quadtree node splits the block horizontally and vertically into 4 sub-blocks of equal size. Accordingly, the video encoder 200 and the video decoder 300 may encode and decode, respectively, syntax elements (such as splitting information) for the region tree level (i.e., the solid lines) of the QTBT structure 130 and syntax elements (such as splitting information) for the prediction tree level (i.e., the dashed lines) of the QTBT structure 130. The video encoder 200 and the video decoder 300 may encode and decode, respectively, video data (such as prediction and transform data) for CUs represented by terminal leaf nodes of the QTBT structure 130.

Generally, the CTUs 132 of fig. 2B may be associated with parameters that define the size of the blocks corresponding to the nodes of the QTBT structure 130 of the first and second levels. These parameters may include CTU size (representing the size of CTU 132 in the sample), minimum quadtree size (MinQTSize, representing the minimum allowed quadtree leaf node size), maximum binary tree size (MaxBTSize, representing the maximum allowed binary tree root node size), maximum binary tree depth (MaxBTDepth, representing the maximum allowed binary tree depth), and minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size).

The root node of the QTBT structure corresponding to the CTU may have four child nodes at the first level of the QTBT structure, each of which may be partitioned according to quadtree partitioning. That is, a node of the first level is either a leaf node (having no child nodes) or has four child nodes. The example of the QTBT structure 130 represents such nodes as including the parent node and child nodes with solid lines for branches. If the nodes of the first level are not larger than the maximum allowed binary tree root node size (MaxBTSize), the nodes can be further partitioned by respective binary trees. The binary tree splitting of a node can be iterated until the nodes resulting from the splitting reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of the QTBT structure 130 represents such nodes as having dashed lines for branches. A binary tree leaf node is referred to as a Coding Unit (CU), which is used for prediction (e.g., intra-picture or inter-picture prediction) and transform without any further partitioning. As described above, a CU may also be referred to as a "video block" or "block".

In one example of the QTBT partitioning structure, the CTU size is set to 128 × 128 (luma samples and two corresponding 64 × 64 chroma samples), MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. First, quadtree partitioning is applied to the CTU to generate quadtree leaf nodes. The quadtree leaf nodes may have sizes from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., the CTU size). If a leaf quadtree node is 128 × 128, it will not be further split by the binary tree, since its size exceeds MaxBTSize (64 × 64, in this example). Otherwise, the leaf quadtree node is further partitioned by the binary tree. Therefore, a quadtree leaf node is also the root node of a binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4, in this example), no further splitting is permitted. A binary tree node having a width equal to MinBTSize (4, in this example) implies that no further horizontal splitting is permitted for that node. Similarly, a binary tree node having a height equal to MinBTSize implies that no further vertical splitting is permitted for that node. As described above, the leaf nodes of the binary tree are referred to as CUs and are further processed according to prediction and transform without further partitioning.
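The sketch below checks whether the QTBT splits described in this example are allowed under the parameters MinQTSize, MaxBTSize, MaxBTDepth, and MinBTSize; it follows the convention used above (a width equal to MinBTSize blocks further horizontal splitting, a height equal to MinBTSize blocks further vertical splitting) and is only a simplified check, not the full JEM/VVC partitioning process.

```cpp
#include <iostream>

struct QtbtParams {
    int minQtSize  = 16;  // minimum allowed quadtree leaf node size
    int maxBtSize  = 64;  // maximum allowed binary tree root node size
    int maxBtDepth = 4;   // maximum allowed binary tree depth
    int minBtSize  = 4;   // minimum allowed binary tree leaf node size
};

// Whether a square quadtree node of the given size may be split into four.
bool canQtSplit(int size, const QtbtParams& p) {
    return size / 2 >= p.minQtSize;
}

// Whether a w x h node at binary tree depth d may be split horizontally
// (halving the width) or vertically (halving the height), per the text above.
bool canBtSplit(int w, int h, int d, bool horizontal, const QtbtParams& p) {
    if (w > p.maxBtSize || h > p.maxBtSize) return false;  // exceeds MaxBTSize
    if (d >= p.maxBtDepth) return false;                    // depth exhausted
    return horizontal ? (w / 2 >= p.minBtSize) : (h / 2 >= p.minBtSize);
}

int main() {
    QtbtParams p;
    std::cout << canQtSplit(128, p) << '\n';                 // 1: 128 -> four 64s
    std::cout << canBtSplit(128, 128, 0, true, p) << '\n';   // 0: exceeds MaxBTSize
    std::cout << canBtSplit(8, 4, 2, true, p) << '\n';       // 1: width 8 -> two 4s
    std::cout << canBtSplit(4, 8, 2, true, p) << '\n';       // 0: width already MinBTSize
}
```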

Fig. 3A-3E are conceptual diagrams illustrating example partitioning of a block of video data. As described above, the tree structure used in VVC is a generalization of QT-BTT (quadtree plus binary and ternary tree). The basic structure may involve two types of tree nodes: Region Trees (RT) and Prediction Trees (PT), supporting five types of partitions, as shown in fig. 3A-3E. In particular, fig. 3A shows quadtree partitioning, fig. 3B shows vertical binary tree partitioning, fig. 3C shows horizontal binary tree partitioning, fig. 3D shows vertical ternary tree partitioning, and fig. 3E shows horizontal ternary tree partitioning. A region tree can recursively split a CTU into square blocks down to region leaf nodes of size 4 × 4. At each node in the region tree, a prediction tree can be formed using one of the binary tree or ternary tree partition types to form a Coding Unit (CU). In PT partitioning, quadtree partitioning may be prohibited within the branches of a prediction tree.

The CTU may include one luma Coding Tree Block (CTB) and two chroma coding tree blocks. At the CU level, a CU may be associated with one luma Coding Block (CB) and two chroma coding blocks. As in JEM (the reference software for VVC), the luma tree and the chroma tree may be separate in intra slices, while the luma tree and the chroma tree are shared in inter slices. The size of the CTU may be 128 × 128 (luma component), and the size of a coding unit may range from 4 × 4 up to the CTU size. In such a scenario, the smallest chroma block size may be 2 × 2 in the 4:2:0 color format.

Like HEVC, VVC supports a transform skip (skip) mode. When the video coder applies the transform skip mode to the residual of the CU, the video coder may not perform the transform and may quantize the residual. To select the best transform mode for a TU of a CU, a video encoder may test both the transform mode and the transform skip mode. A video encoder may encode (e.g., signal in a bitstream) a syntax element (e.g., transform _ skip _ flag) to a decoder to indicate a transform mode of a TU. At the picture level, the video encoder may signal a syntax element (e.g., a flag) in a Picture Parameter Set (PPS) to indicate the use of the transform skip mode.

In VVC, a video encoder may encode a block into an inter slice using an inter prediction mode in which a prediction value for the block is obtained using a block matching algorithm. The video encoder may search the reference frame within a window centered on the motion vector predictor (derived by the AMVP process) to find the best match for the block. For example, a video encoder may evaluate a plurality of motion vectors at an integer precision level as part of a motion estimation process. Once the best match in the integer level is obtained, the video encoder may further refine the best match through an interpolation process (e.g., half-pixel and quarter-pixel).

In JEM, locally adaptive motion vector resolution (LAMVR) is introduced. With LAMVR, a Motion Vector Difference (MVD) can be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution may be controlled at the CU level, and MVD resolution flags may be conditionally signaled for each CU that has at least one non-zero MVD component.

For a CU with at least one non-zero MVD component (e.g., the x-component or the y-component is non-zero), the video coder may signal a first flag to indicate whether quarter luma sample MV precision is used in the CU. When the first flag indicates that quarter luma sample MV precision is not used (e.g., the first flag is equal to 1), then the video coder may signal another flag to indicate whether integer luma sample MV precision or four luma sample MV precision is used.

When the first MVD resolution flag of a CU is zero, or is not coded for the CU (meaning that all MVDs in the CU are zero), the video coder may use quarter luma sample MV resolution for the CU. When a CU uses integer luma sample MV precision or four luma sample MV precision, the video coder may round the MVPs in the AMVP candidate list for the CU to the corresponding precision.
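The following sketch illustrates rounding an MVP component stored in quarter-luma-sample units to integer-sample or four-sample precision when LAMVR selects a coarser MVD resolution; the quarter-pel storage unit and the round-half-away-from-zero rule are assumptions for illustration rather than the normative rounding of any particular draft.

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>

// MV components are assumed to be stored in quarter-luma-sample units.
// precisionShift: 0 = quarter-sample (no-op), 2 = integer-sample, 4 = four-sample.
int32_t roundMvComponent(int32_t mvQuarterPel, int precisionShift) {
    if (precisionShift == 0) return mvQuarterPel;
    int32_t offset = 1 << (precisionShift - 1);
    int32_t mag = (std::abs(mvQuarterPel) + offset) >> precisionShift;
    int32_t rounded = mag << precisionShift;  // back to quarter-pel units
    return mvQuarterPel < 0 ? -rounded : rounded;
}

int main() {
    std::cout << roundMvComponent(13, 2) << '\n';   // 3.25 pel -> 3 pel (12 quarter-pel)
    std::cout << roundMvComponent(-13, 4) << '\n';  // -3.25 pel -> -4 pel (-16 quarter-pel)
}
```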

The video coder may perform sub-block motion compensation (e.g., in VVC) for Advanced Temporal Motion Vector Prediction (ATMVP) mode. In ATMVP mode, a video coder may split a CU into sub-blocks, called Prediction Units (PUs). The video coder may use the motion vectors of temporally collocated blocks in previously coded frames to independently evaluate these PUs. The motion vectors of these PUs may or may not be different. In some examples, the block size of a PU may be fixed to 4 × 4. In such an example, to reconstruct each PU on the decoder side, the video decoder may access a block of size 11 × 11 ((4+4+3) × (4+4+3)) samples in memory (e.g., memory 120).

A video coder may code a CU using an affine coding mode (e.g., in VVC). An affine CU (e.g., a CU encoded using an affine pattern) may be split into sub-PUs that are evaluated independently. In contrast to ATMVP, which uses motion vectors of temporally collocated blocks to obtain motion vectors for PUs, a video coder may use motion vectors of spatially neighboring CUs of a CU to derive a motion vector for each affine PU.

Affine motion models can be described as:

vx = a*x + b*y + e
vy = c*x + d*y + f

where (vx, vy) is the motion vector at coordinates (x, y), and a, b, c, d, e, and f are the six parameters. The affine motion model for a block may also be described by three motion vectors v0 = (v0x, v0y), v1 = (v1x, v1y), and v2 = (v2x, v2y) for the three corners of block 600 as shown in FIG. 6A. These three motion vectors may be referred to as Control Point Motion Vectors (CPMVs). The motion field is then described as:

vx = (v1x-v0x)*x/w + (v2x-v0x)*y/h + v0x
vy = (v1y-v0y)*x/w + (v2y-v0y)*y/h + v0y (1)

where w and h are the width and height of the block. This affine motion model is referred to in this disclosure as the 6-parameter affine motion model.

The simplified 4-parameter affine model is described as:

vx = a*x - b*y + c
vy = b*x + a*y + d

Similarly, the simplified 4-parameter affine model for a block may be described by two motion vectors v0 = (v0x, v0y) and v1 = (v1x, v1y) for two corners of the block. The motion field is then described as:

vx = (v1x-v0x)*x/w - (v1y-v0y)*y/w + v0x
vy = (v1y-v0y)*x/w + (v1x-v0x)*y/w + v0y (2)

Currently, VVC allows affine type prediction. In some examples, affine type prediction in VVC may utilize one or both of the 6-parameter affine model and the simplified 4-parameter affine model.

To reduce complexity, a video coder (e.g., video encoder 200 and/or video decoder 300) may perform sub-block-based motion compensation for affine motion compensation. The video coder may divide the current block into non-overlapping sub-blocks. For each sub-block, the video coder may derive a Motion Vector (MV) through the determined affine motion model. The video coder may then perform block-based motion compensation (block matching) using the derived MVs.
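
For illustration, the following C++ sketch derives one MV per sub-block from the three CPMVs of the 6-parameter model in equation (1). The use of the sub-block center as the sampling position, the 4 × 4 default sub-block size, and the floating-point arithmetic are simplifying assumptions; a real coder would use an equivalent fixed-point formulation.

#include <vector>

struct Mv { int x; int y; };  // MV components, e.g., in 1/16-luma-sample units

// Sketch of sub-block MV derivation from the CPMVs v0, v1, v2 of a w x h block
// (equation (1)). One MV is produced per subW x subH sub-block.
std::vector<Mv> deriveSubBlockMvs(Mv v0, Mv v1, Mv v2, int w, int h,
                                  int subW = 4, int subH = 4) {
    std::vector<Mv> mvs;
    for (int y = 0; y < h; y += subH) {
        for (int x = 0; x < w; x += subW) {
            double cx = x + subW / 2.0;  // sub-block center
            double cy = y + subH / 2.0;
            double vx = (v1.x - v0.x) * cx / w + (v2.x - v0.x) * cy / h + v0.x;
            double vy = (v1.y - v0.y) * cx / w + (v2.y - v0.y) * cy / h + v0.y;
            mvs.push_back(Mv{static_cast<int>(vx), static_cast<int>(vy)});
        }
    }
    return mvs;
}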

Typically, the size of the sub-blocks is fixed. However, if the difference between the MVs of two adjacent sub-blocks is large, using a small sub-block size (e.g., 4 × 4) may significantly increase the memory bandwidth. On the other hand, using a large sub-block size may reduce the motion compensated prediction accuracy. To address this problem, video coders may utilize adaptive sub-block sizes. For example, the video coder may apply one or more constraints to the affine motion model. The video coder may use a small sub-block size if the affine motion model satisfies the one or more constraints. Otherwise, if the affine motion model does not satisfy the one or more constraints, the video coder may use a relatively large sub-block size. Examples of such constraints are described in U.S. Provisional Application No. 62/754,463, filed November 1, 2018, U.S. Provisional Application No. 62/786,023, filed December 28, 2018, and U.S. Provisional Application No. 62/797,723, filed January 28, 2019. When a large sub-block size is used (e.g., 8 × 8), the video coder may need to derive the MV used for motion compensation from the MVs of the four 4 × 4 sub-blocks.

In some color formats, the affine block size of the chroma component may be different from the affine block size of the luma component. For example, for the 4:2:2 or 4:2:0 color format, the affine block size of the chroma component may be half the size of the luma block. Thus, one 4 × 4 chroma sub-block may correspond to four 4 × 4 luma sub-blocks. The video coder may derive MVs for chroma sub-blocks based on the MVs of the luma sub-blocks. For example, the video coder may derive the MV of a 4 × 4 chroma sub-block as the average of the MVs of all four corresponding 4 × 4 luma sub-blocks. However, calculating MVs for chroma sub-blocks based on the MVs of all corresponding luma sub-blocks may be complex and requires the video coder to access all of the luma sub-block MVs from memory, which may be undesirable.

In accordance with one or more techniques of this disclosure, a video coder (e.g., video encoder 200 and/or video decoder 300) may determine the value of a chroma MV of a chroma sub-block based on the values of the luma MVs of a subset of the plurality of luma sub-blocks corresponding to the chroma sub-block. The subset of luma sub-blocks is a strict subset that includes fewer than all of the luma sub-blocks corresponding to the chroma sub-block. By determining the value of the chroma MV based on the subset of the luma MVs, the video coder may simplify the chroma MV determination process and avoid accessing the values of all of the luma MVs. In this way, the techniques of this disclosure may reduce the complexity and memory bandwidth requirements of affine motion compensation.

For purposes of the following discussion, the memory bandwidth is calculated as the number of reference pixels (e.g., reference samples) necessary for interpolation. In an actual hardware implementation, the actual bandwidth may also depend on the hardware architecture and may be greater than the number of accessed pixels.

For memory bandwidth purposes, the worst-case coding scenario for inter modes (e.g., merge and AMVP modes) may be a 4 × 4 bi-prediction block with fractional-pixel MVs in both directions. For this case, two 11 × 11 (121-pixel) luma blocks may need to be accessed from memory to perform interpolation, and two corresponding 5 × 5 chroma blocks of the U and V color components must be obtained. In this calculation, it is assumed that an 8-tap filter is used for luma component interpolation and a 6-tap filter is used for chroma component interpolation. The number of pixels required for 2D interpolation depending on block size is summarized in Table 1 below. Another case that contributes to the worst-case bandwidth is the ATMVP and affine modes, where the motion vectors of the sub-PUs associated with a CU may result in fetching non-overlapping regions from memory.

TABLE 1 number of pixels for 2D interpolation in JEM
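
The per-block figures discussed above follow from the usual interpolation footprint: a W × H block interpolated with an N-tap filter needs (W + N - 1) × (H + N - 1) reference samples per prediction direction. The following small C++ sketch reproduces that count; the function name is illustrative only.

// Number of reference samples needed for 2D interpolation of a width x height
// block with an nTap filter, per prediction direction.
int refPixelsFor2dInterp(int width, int height, int nTap) {
    return (width + nTap - 1) * (height + nTap - 1);
}

// Example: refPixelsFor2dInterp(4, 4, 8) = 11 * 11 = 121 luma reference
// samples for a 4x4 block, or 242 for bi-prediction.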

Fig. 7 is a conceptual diagram illustrating non-overlapping reference regions for reconstructing a current block according to one or more aspects of the present disclosure. As shown in fig. 7, to reconstruct the current CU 706 of the current frame 702, the video coder may access several regions of the reference frame 704 (e.g., regions of the reference frame 704 stored in a reference picture buffer, such as the decoded picture buffer 218 of fig. 4 or the decoded picture buffer 213 of fig. 5). In particular, current CU 706 may be divided into sub-blocks 708A-708D (collectively, "sub-blocks 708"), each of which is associated with a respective one of sub-block motion vectors mvA-mvD. Each of the sub-blocks 708 may be a prediction unit (PU) (e.g., 4 × 4 in size). As shown in fig. 7, the sub-block motion vectors mvA-mvD identify respective ones of the reference regions 710A-710D (collectively, "reference regions 710") in the reference frame 704. Notably, each reference region 710 includes a block directly identified by the sub-block motion vector and a region around the block that includes samples accessed by the filter taps. To reconstruct current CU 706, the video coder may need to fetch, access, or otherwise obtain samples of each reference region 710 from memory. As shown in the example of fig. 7, the reference regions 710 may not overlap at all (e.g., meaning that no samples from a particular one of the reference regions 710 are included in any other one of the reference regions 710). For the worst-case bandwidth scenarios mentioned above, the number of pixels to be fetched per pixel for 2D interpolation is summarized in Table 2 below.

TABLE 2 number of pixels for 2D interpolation in JEM

As described above, in some examples, the reference regions accessed to reconstruct the coding unit may be non-overlapping. In other examples, the reference regions accessed to reconstruct the coding units may overlap. Overlapping reference regions may provide some efficiency for a video coder. For example, a video coder may only need to access samples of overlapping reference regions once, resulting in memory bandwidth savings.

FIG. 8 is a conceptual diagram illustrating overlapping reference regions for reconstructing a current block according to one or more aspects of the present disclosure. As shown in fig. 8, to reconstruct a current block 806 (e.g., a current CU) of a current frame 802, the video coder may access regions of a reference frame 804 (e.g., regions stored in a reference picture buffer, such as decoded picture buffer 218 of fig. 4 or decoded picture buffer 213 of fig. 5). In particular, the current block 806 may be divided into sub-blocks 808A-808D, each of which is associated with a respective one of sub-block motion vectors mvA-mvD. As shown in fig. 8, the sub-block motion vectors mvA-mvD identify respective ones of the reference regions 810A-810D (collectively referred to as "reference regions 810") in the reference frame 804. It should be noted that each reference region 810 includes a block directly identified by the sub-block motion vector (dashed line) and a region around the block that includes samples accessed by the filter taps (solid line). To reconstruct the current block 806, the video coder may need to fetch, access, or otherwise obtain samples of each reference region 810 from memory. As shown in the example of fig. 8, the reference regions 810 may partially overlap (e.g., meaning that some samples from a first one of the reference regions 810 are also located in a second one of the reference regions 810).

A larger overlap area may provide higher bandwidth savings than a separate fetch for each 4 × 4 PU. The bandwidth savings BS may be calculated as:

BS = (K*N - F)/(K*N)

where K, N, and F are the number of fetch points for a 4 × 4 PU, the number of PUs associated with the CU, and the number of fetch points needed to encode the CU in affine mode, respectively.

The present disclosure describes a variety of techniques to address the above-described problems, including simplification of the transform for 2 x 2 blocks and reduction of bandwidth for memory accesses. The techniques of this disclosure may be used independently or in combination.

According to a first technique, a video coder (e.g., video encoder 200 and/or video decoder 300) may handle transforms for 2 × 2 blocks as follows. As one example, the video coder may force a 2 × 2 chroma block to be coded in transform skip mode. In this way, the residual of the 2 × 2 chroma block can be quantized directly in the pixel domain. Because transform skip is always applied for 2 × 2 chroma blocks, the transform skip flag is no longer signaled in the bitstream for these blocks. With this technique, processing time can be reduced, since no transform is applied and transform skip can be implemented with a simple shift operation. In addition, removing the transform_skip_flag syntax may improve compression efficiency. For example, in response to determining to partition the video data into at least one 2 × 2 chroma block, the video coder may code the 2 × 2 chroma block using the transform skip mode.
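
A minimal C++ sketch of this decision is shown below; the structure and function names are hypothetical and only illustrate that, for 2 × 2 chroma blocks, transform skip is inferred rather than parsed from a signaled flag.

// Hypothetical sketch: 2x2 chroma blocks always use transform skip, and the
// transform_skip_flag is neither signaled nor parsed for them.
struct BlockInfo {
    int width;
    int height;
    bool isChroma;
};

bool useTransformSkip(const BlockInfo& blk, bool signaledTransformSkipFlag) {
    if (blk.isChroma && blk.width == 2 && blk.height == 2) {
        return true;                     // inferred; no flag in the bitstream
    }
    return signaledTransformSkipFlag;    // otherwise follow the coded flag
}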

As another example, for a 2 × 2 chroma block, the residual may always be forced to zero, in which case the transform may also be bypassed, and the Coded Block Pattern (CBP) flag that would otherwise be signaled for such a chroma block to indicate whether the residual is zero may be omitted to reduce overhead. In one example, in inter mode, a zero residual may be indicated by using the skip mode, which always has a zero residual. For example, in response to determining to partition the video data into at least one 2 × 2 chroma block, the video coder may code the 2 × 2 chroma block with zero residual values (e.g., select a predictor block that exactly matches the 2 × 2 chroma block, at least after quantization).

If separate tree coding is applied, meaning that the luma and chroma components may have different partition structures, splitting a partition into 2 × 2 chroma blocks may be prohibited, in which case no 2 × 2 chroma blocks would be present. Whether the 2 × 2 chroma block is to be disabled may depend on the prediction mode. For example, 2 × 2 chroma blocks may be disabled for intra mode but enabled for inter mode. Intra mode has additional dependencies because the prediction is performed using neighboring samples, which adds additional burden compared to inter mode.

The described techniques may be applied to one of the following cases, or in any combination: an intra-coded 2 × 2 chroma block in an I-slice, a 2 × 2 chroma block coded in intra mode regardless of the slice type, or a 2 × 2 chroma block coded in inter mode regardless of the slice type.

As described above, and in accordance with one or more techniques of this disclosure, a video coder (e.g., video encoder 200 and/or video decoder 300) may impose one or more constraints to reduce an amount of memory bandwidth used for predicting video data samples. For example, a video coder may determine a memory bandwidth required for a current block, and may selectively modify a motion compensation method for predicting samples of the current block based on whether the determined memory bandwidth for the current block satisfies a bandwidth threshold. In this way, the video coder may reduce the amount of power consumed and/or the processing time required to predict the samples of the current block.

As described above, in some examples, a video coder may determine the memory bandwidth required for a current block. For a block coded using affine mode, the video coder may determine the required memory bandwidth based on the values of the affine motion model parameters of the block, e.g., based on the Control Point Motion Vectors (CPMVs) of the block. As discussed above with reference to equations (1) and (2), the affine motion model may include four or six parameters. The 4-parameter affine motion model can be equivalently represented by two CPMVs (e.g., v0 and v1), and the 6-parameter affine motion model can be equivalently represented by three CPMVs (e.g., v0, v1, and v2). Thus, a CPMV-based determination may be considered an affine model parameter-based determination, and vice versa.

The affine model parameters a, b, c, d determine how far the sub-block motion vectors can be from each other in the PU. In the worst case, the sub-block vectors may be so far apart that there is zero overlap between the reference regions used for motion compensation (e.g., as shown in the example of fig. 7). As described above, the reduced overlap between reference regions results in increased memory bandwidth requirements. A scenario in which there is zero overlap between reference regions may be referred to as a "worst case" scenario of memory bandwidth, as a maximum number of reference samples may need to be accessed from memory (e.g., memory 106, memory 120). As such, this disclosure presents a technique in which a video coder selectively adjusts a motion compensation method for predicting samples of a current block based on a memory bandwidth required to access reference samples of the current block coded using affine mode.

In some examples, a video coder may determine the memory bandwidth required for the current block based on the area of the minimum region that includes all reference blocks from which samples of the current block are to be predicted. An example of such a minimum region is shown as region 820 of fig. 8, which is the minimum region that includes all samples of the reference regions 810. To determine the area of the minimum region, the video coder may determine the size of the minimum region. For example, the video coder may derive the locations of the reference blocks from the CPMVs of the current block. In the example of fig. 8, the video coder may obtain the values of the CPMVs of current block 806, and may derive the values of the sub-block motion vectors mvA-mvD based on the values of the CPMVs. As described above, each of the sub-block motion vectors mvA-mvD in fig. 8 identifies a respective one of the reference regions 810, each of which includes a reference block.

The video coder may determine the boundary of the minimum region based on the location of the reference block. For example, the video coder may determine a top boundary, a bottom boundary, a left boundary, and a right boundary of the minimum region 820 based on the identified reference blocks. To determine the boundaries of the minimum regions, the video coder may determine the boundaries of each reference region. For example, the video coder may determine a top boundary, a bottom boundary, a left boundary, and a right boundary of each of the reference regions 810. As described above, each reference region may include one reference block (e.g., shown with a dashed line having an upper left corner identified by a sub-block motion vector) and additional samples around the reference block for interpolation (e.g., shown with a solid line around the dashed line). The left and top boundaries of the reference area 810A of sub-block 808A may be calculated as:

LeftR810A=x+mvAX–interpolationX/2 (6)

TopR810A=y+mvAY–interpolationY/2 (7)

the width (wR) and height (hR) of the reference region 810 may be calculated as:

wR810A=interpolationX+w–1 (8)

hR810A=interpolationY+h–1 (9)

the right and bottom boundaries of the reference region may be calculated as:

RightR810A=LeftR810A+wR810A–1 (10)

BottomR810A=TopR810A+hR810A–1 (11)

where (x, y) is the position of sub-block 808A, mvA = (mvAX, mvAY) is the integer-precision motion vector of sub-block 808A, interpolationX and interpolationY are the lengths of the interpolation filters for the horizontal and vertical directions, respectively, w is the width of sub-block 808A, and h is the height of sub-block 808A.

The video coder may determine the size of the minimum region based on the determined boundaries of the reference regions. For example, the video coder may determine the top and left boundaries of the minimum region as the minimum of the top and left boundaries of the reference regions, respectively, and the bottom and right boundaries of the minimum region as the maximum of the bottom and right boundaries of the reference regions, respectively. The video coder may determine the boundaries of the minimum region 820 as follows:

Top820=min(TopR810A,TopR810B,TopR810C,TopR810D) (12)

Left820=min(LeftR810A,LeftR810B,LeftR810C,LeftR810D) (13)

Right820=max(RightR810A,RightR810B,RightR810C,RightR810D) (14)

Bottom820=max(BottomR810A,BottomR810B,BottomR810C,BottomR810D) (15)

In this way, the video coder may determine the size of the smallest region that includes the multiple reference blocks based on the values of the CPMVs. As described above, a video coder may determine the memory bandwidth required for the current block based on the area of the minimum region that includes all reference blocks from which samples of the current block are to be predicted. For example, the video coder may determine the memory bandwidth required for the current block 806 based on the area of the minimum region 820. The video coder may determine the area of the minimum region by multiplying the height of the minimum region by the width of the minimum region. The video coder may determine the width of the minimum region by subtracting the left boundary from the right boundary, and may determine the height of the minimum region by subtracting the top boundary from the bottom boundary. For example, the video coder may determine the area of the minimum region 820 as follows:

Area820=(Right820-Left820+1)*(Bottom820-Top820+1) (16)

In other examples, the video coder may determine the area of the minimum region 820 as follows:

Area820=(max(LeftR810A,LeftR810B,LeftR810C,LeftR810D)-Left820+wR)*(max(TopR810A,TopR810B,TopR810C,TopR810D)-Top820+hR) (17)

the video coder may determine the memory bandwidth required for the current block based on the area of the smallest region. In some examples, the video coder may directly use the determined area of the minimum region as the current blockThe required memory bandwidth. For example, the video coder may determine that the bandwidth required for block 806 is equal to Area820. In other examples, the video coder may scale or otherwise modify the area of the minimum region to determine the memory bandwidth required for the current block. In this way, the video coder may determine the memory bandwidth required to access samples of multiple reference blocks derived based on the value of the CPMV of the current block.

As described above, the video coder may selectively modify a motion compensation method for predicting samples of the current block based on whether the determined memory bandwidth for the current block satisfies a bandwidth threshold. The video coder may compare the determined memory bandwidth to a predetermined bandwidth threshold. The bandwidth threshold may be predefined in a configuration file or may be passed as a parameter to the video decoder.

To selectively modify the motion compensation method, the video coder may modify (e.g., change, alter, or otherwise adjust) the motion compensation method in response to determining that the memory bandwidth does not satisfy the bandwidth threshold. Similarly, the video coder may not modify the motion compensation method in response to determining that the memory bandwidth satisfies the bandwidth threshold. In some examples, the video coder may determine that the memory bandwidth satisfies the bandwidth threshold when the determined memory bandwidth is less than or equal to (e.g., <=) the bandwidth threshold. In some examples, the video coder may determine that the memory bandwidth satisfies the bandwidth threshold when the determined memory bandwidth is less than (e.g., <) the bandwidth threshold. In some examples, the video coder may determine that the memory bandwidth does not satisfy the bandwidth threshold when the determined memory bandwidth is greater than (e.g., >) the bandwidth threshold. In some examples, the video coder may determine that the memory bandwidth does not satisfy the bandwidth threshold when the determined memory bandwidth is greater than or equal to (e.g., >=) the bandwidth threshold.

The video coder may modify the motion compensation method in a variety of ways. For example, the video coder may modify the motion compensation method used to predict the samples of the current block in a manner that will reduce the memory bandwidth required to predict the samples of the current block (e.g., relative to an unmodified motion compensation method). Some example modifications to the motion compensation method include, but are not limited to, modifying the sub-block size, modifying the number of filter taps used for interpolation, coding the current block using a simple mode, or any other modification that reduces the memory bandwidth required to predict the samples of the current block. In this way, the video coder may selectively modify a motion compensation method for predicting samples of a current block of video data based on whether the determined memory bandwidth satisfies a bandwidth threshold.

To modify the sub-block size, the video coder may group one or more sub-blocks together in order to reduce the number of sub-blocks of the current block. For example, the video coder may change from using 4 × 4 sub-blocks to using 8 × 8 sub-blocks. The video coder may predict a grouped block using a motion vector derived from the motion vectors of the corresponding sub-blocks (e.g., the sub-blocks included in the grouped block). In some examples, the video coder may derive the motion vector by averaging the motion vectors of the corresponding sub-blocks. As one example, if the grouped block is square (same width and height), the video coder may use the average motion vector of all corresponding sub-blocks as the motion vector of the grouped block. As another example, the video coder may use an average of a subset of the motion vectors of the corresponding sub-blocks as the motion vector of the grouped block. For example, the video coder may use an average of the motion vectors of two diagonally located sub-blocks (e.g., top-left and bottom-right, or top-right and bottom-left) as the motion vector of the grouped block. In some examples, the video coder may use the motion vector of a particular one of the corresponding sub-blocks as the derived motion vector. Modifying the sub-block size may reduce memory bandwidth because fewer reference regions may be needed.

To modify the number of filter taps, the video coder may reduce the number of filter taps used for interpolation. In one example, a 4-tap chroma interpolation filter may be used to interpolate the luma component instead of the original 8-tap luma interpolation filter. By reducing the number of filter taps, the video coder may reduce the size of the reference regions (e.g., because interpolationX in equations (6) and (8) and/or interpolationY in equations (7) and (9) will be reduced).

The video coder may predict samples for a current block of video data from samples of multiple reference blocks using a selectively modified motion compensation method. For example, a video coder may obtain (e.g., from memory) samples of multiple reference blocks and add the obtained samples to residual data to reconstruct samples of a current block.

Fig. 9 is a conceptual diagram illustrating determining a chroma motion vector from luma motion vectors according to one or more techniques of this disclosure. As described above, a current block of video data may include a luma block and a corresponding chroma block. In the case of a 4:2:2 or 4:2:0 color format, the size of the chroma block may be half the size of the luma block. As shown in fig. 9, chroma sub-block 904 may correspond to luma block 900, and luma block 900 may be divided into luma sub-blocks 902A-902D (collectively, "luma sub-blocks 902"). The luma sub-blocks 902 may be referenced based on their relative positions within luma block 900. For example, luma sub-block 902A may be referred to as the top-left (TL) sub-block, luma sub-block 902B may be referred to as the top-right (TR) sub-block, luma sub-block 902C may be referred to as the bottom-left (BL) sub-block, and luma sub-block 902D may be referred to as the bottom-right (BR) sub-block. In the example of fig. 9, luma block 900 may be 8 × 8 samples, and chroma sub-block 904 and each luma sub-block 902 may be 4 × 4 samples.

A video coder (e.g., video encoder 200 and/or video decoder 300) may obtain a respective luma motion vector for each luma sub-block 902. For example, the video coder may obtain a first luma MV for luma sub-block 902A, a second luma MV for luma sub-block 902B, a third luma MV for luma sub-block 902C, and a fourth luma MV for luma sub-block 902D. In some examples, the video coder may obtain the luma MV based on an affine motion model of the luma block 900.

The video coder may determine chroma MVs for chroma sub-blocks 904 based on the luma MVs. In accordance with one or more techniques of this disclosure, rather than determining chroma MVs based on each MV and each luma sub-block, the video coder may determine chroma MVs based on MVs of a subset of the luma sub-blocks. In some examples, the subset of luma sub-blocks may include two diagonally located luma sub-blocks. As one example, the video coder may determine the chroma MV based on the MV of the luma sub-block 902A and the MV of the luma sub-block 902D (e.g., top-left and bottom-right sub-blocks). As another example, the video coder may determine the chroma MV based on the MV of the luma sub-block 902B and the MV of the luma sub-block 902C (e.g., the upper-right and lower-left sub-blocks).

As described above, the video coder may determine the chroma MV based on the MVs of a subset of the luma sub-blocks. For example, the video coder may determine the chroma MV as the average of the MVs of the subset of the luma sub-blocks. For illustration, let (vx0, vy0) and (vx1, vy1) denote the two MVs from the selected sub-blocks used for averaging. In one example, the video coder may perform the averaging as ((vx0+vx1)>>1, (vy0+vy1)>>1). In another example, the video coder may perform the averaging as ((vx0+vx1+1)>>1, (vy0+vy1+1)>>1). In another example, the video coder may perform the averaging as ((vx0+vx1)/2, (vy0+vy1)/2). In another example, the video coder may perform the averaging as ((vx0+vx1+1)/2, (vy0+vy1+1)/2).

In some examples, a video coder may derive the MV of a chroma sub-block by averaging the high-precision motion vectors of the corresponding luma blocks (e.g., at a high precision level). The video coder may perform the averaging with a motion vector rounding process. Where (vxHi, vhHi) is the high-precision motion vector of the i-th luma block, the video coder may calculate the sum of the motion vectors using the four luma blocks as follows:

(sumX,sumY)=(vxH0+vxH1+vxH2+vxH3,vhH0+vhH1+vhH2+vhH3)

as described above, a video coder may determine the sum of motion vectors using two diagonal blocks. As one example, where two diagonal blocks include an upper left sub-block and a lower right sub-block, the video coder may calculate the sum motion vector as follows:

(sumX,sumY)=(vxH0+vxH3,vhH0+vhH3)

as another example, where two diagonal blocks include an upper-right sub-block and a lower-left sub-block, the video coder may calculate the sum motion vector as follows:

(sumX,sumY)=(vxH1+vxH2,vhH1+vhH2)

the video coder may round the sum of the motion vectors to form a scaled high precision motion vector for the chroma block (mvScX, mvScY). As an example, a video coder may round the sum motion vector as follows:

mvScX = sumX >= 0 ? (sumX + offset) >> nShift : -((-sumX + offset) >> nShift)
mvScY = sumY >= 0 ? (sumY + offset) >> nShift : -((-sumY + offset) >> nShift)

where nShift and offset are integers that can be determined based on the number of luma motion vectors involved in the sum motion vector (sumX, sumY). For example, if the sum motion vector is calculated using four luma motion vectors, nShift may be set to two and offset may be zero or two. In another example, if (sumX, sumY) is the sum of two luma motion vectors, nShift may be set to one and offset may be set to zero or one.

The video coder may derive the motion vector for the chroma block by downscaling (mvScX, mvScY). For example, if the luma motion vector is 1/16 pixels accurate and the chroma motion vector is 1/32 pixels accurate, the video coder may derive the integer motion vectors (imvCX, imvCY) for the chroma blocks as follows:

(imvCX,imvCY)=(mvScX>>5,mvScY>>5)
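
The following C++ sketch combines the steps above: it sums the high-precision MVs of two diagonally located 4 × 4 luma sub-blocks, rounds the sum with nShift = 1, and downscales with a right shift of five under the 1/16-luma-sample and 1/32-chroma-sample precision assumption stated above. The names and the choice of offset = 1 are illustrative.

struct Mv { int x; int y; };

// Rounding of one summed component, as in the mvScX/mvScY expressions above.
int roundSummedComponent(int sum, int nShift, int offset) {
    return sum >= 0 ? (sum + offset) >> nShift
                    : -((-sum + offset) >> nShift);
}

// Derive an integer chroma MV from the high-precision MVs of two diagonally
// located luma sub-blocks (e.g., top-left and bottom-right).
Mv deriveChromaMvFromDiagonalLumaMvs(Mv lumaA, Mv lumaB) {
    int sumX = lumaA.x + lumaB.x;
    int sumY = lumaA.y + lumaB.y;
    const int nShift = 1;   // two luma MVs contribute to the sum
    const int offset = 1;   // offset may also be chosen as zero
    Mv mvSc = { roundSummedComponent(sumX, nShift, offset),
                roundSummedComponent(sumY, nShift, offset) };
    return Mv{ mvSc.x >> 5, mvSc.y >> 5 };  // downscale to an integer chroma MV
}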

in some examples, the foregoing techniques may be used to derive MVs for motion compensation of large sub-blocks, typically from MVs of its multiple small sub-blocks. For example, a video coder may derive MVs for motion compensation of a large sub-block by averaging MVs of two small sub-blocks at a diagonal of the large sub-block.

In some examples, the size of the small sub-blocks may be M × N, while the size of the large sub-block may be 2M × 2N. M and N may be 4, but other values of M and N may be used (in some cases M may be equal to N; in other cases M may be different from N). As described above, in the case where a large sub-block includes four small sub-blocks, the four small sub-blocks may be referred to as the upper-left sub-block, the upper-right sub-block, the lower-left sub-block, and the lower-right sub-block. In one example, the video coder may derive the MV for motion compensation of the 2M × 2N sub-block by averaging the MVs of the top-left and bottom-right M × N sub-blocks. In another example, the video coder may derive the MV for motion compensation of the 2M × 2N sub-block by averaging the MVs of the top-right and bottom-left M × N sub-blocks.

Note that the size of the large sub-block does not have to be 2M × 2N. The above technique can also be applied if the large sub-block size is (s1 × M) × (s2 × N), where s1 and s2 are the numbers of small sub-blocks in the large sub-block in a row and a column, respectively. For example, denoting (vx0, vy0) and (vx1, vy1) as the two MVs from the small sub-blocks selected for averaging, the video coder may derive the MV for motion compensation of the large sub-block as ((vx0+vx1)>>1, (vy0+vy1)>>1). By comparing the foregoing technique with using all small sub-blocks, it can be seen that the techniques of the present disclosure can significantly reduce computational complexity.

Fig. 4 is a block diagram illustrating an example video encoder 200 that may perform the techniques of this disclosure. Fig. 4 is provided for purposes of explanation and should not be considered a limitation on the techniques broadly illustrated and described in this disclosure. For purposes of illustration, this disclosure describes video encoder 200 in the context of video coding standards such as the HEVC video coding standard and the h.266 video coding standard under development. However, the techniques of this disclosure are not limited to these video coding standards and are generally applicable to video encoding and decoding.

In the example of fig. 4, the video encoder 200 includes a video data memory 230, a mode selection unit 202, a residual generation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a filter unit 216, a Decoded Picture Buffer (DPB) 218, and an entropy encoding unit 220. Any or all of video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, and entropy encoding unit 220 may be implemented in one or more processors or in processing circuitry. Further, the video encoder 200 may include additional or alternative processors or processing circuitry to perform these or other functions.

Video data memory 230 may store video data to be encoded by the components of video encoder 200. Video encoder 200 may receive the video data stored in video data memory 230 from, for example, video source 104 (fig. 1). DPB 218 may act as a reference picture memory that stores reference video data for use by video encoder 200 in predicting subsequent video data. Video data memory 230 and DPB 218 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 230 and DPB 218 may be provided by the same memory device or separate memory devices. In various examples, video data memory 230 may be on-chip with other components of video encoder 200, as shown, or off-chip relative to those components.

In this disclosure, references to video data memory 230 should not be construed as being limited to memory internal to video encoder 200 (unless specifically described as such) or to memory external to video encoder 200 (unless specifically described as such). Rather, references to video data memory 230 should be understood as referring to memory that stores video data received by video encoder 200 for encoding (e.g., video data for a current block to be encoded). The memory 106 of fig. 1 may also provide temporary storage of the outputs from the various units of video encoder 200.

The various units of fig. 4 are shown to assist in understanding the operations performed by video encoder 200. These units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality and are preset in terms of the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, a programmable circuit may execute software or firmware that causes the programmable circuit to operate in a manner defined by the instructions of the software or firmware. A fixed-function circuit may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations performed by the fixed-function circuit are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

The video encoder 200 may include an Arithmetic Logic Unit (ALU), an Elementary Function Unit (EFU), digital circuitry, analog circuitry, and/or a programmable core formed from programmable circuitry. In examples where the operations of video encoder 200 are performed using software executed by the programmable circuitry, memory 106 (fig. 1) may store the object code of the software that video encoder 200 receives and executes, or another memory within video encoder 200 (not shown) may store such instructions.

The video data memory 230 is configured to store the received video data. The video encoder 200 may retrieve pictures of video data from the video data memory 230 and provide the video data to the residual generation unit 204 and the mode selection unit 202. The video data in the video data memory 230 may be original video data to be encoded.

Mode selection unit 202 includes motion estimation unit 222, motion compensation unit 224, and intra prediction unit 226. The mode selection unit 202 may comprise additional functional units for performing video prediction according to other prediction modes. As an example, the mode selection unit 202 may include a palette unit, an intra block copy unit (which may be part of the motion estimation unit 222 and/or the motion compensation unit 224), an affine unit, a Linear Model (LM) unit, and the like.

In general, the mode selection unit 202 coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values for such combinations. The encoding parameters may include a partitioning of CTUs into CUs, prediction modes for the CUs, transform types for residual data of the CUs, quantization parameters for residual data of the CUs, and so on. The mode selection unit 202 may ultimately select the combination of encoding parameters having rate-distortion values that are better than those of the other tested combinations.

Video encoder 200 may partition a picture retrieved from video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. The mode selection unit 202 may partition a CTU of the picture according to a tree structure, such as the quadtree structure of HEVC or the QTBT structure described above. As described above, the video encoder 200 may form one or more CUs by partitioning a CTU according to the tree structure. Such CUs may also be commonly referred to as "video blocks" or "blocks".

In general, mode select unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra prediction unit 226) to generate a prediction block for the current block (e.g., the current CU or an overlapping portion of a PU and a TU in HEVC). To inter-predict the current block, the motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously coded pictures stored in the DPB 218). In particular, the motion estimation unit 222 may calculate a value representing how similar the potential reference block is to the current block based on, for example, Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Mean Absolute Differences (MAD), Mean Squared Differences (MSD), etc. The motion estimation unit 222 may typically perform these calculations using the sample-by-sample difference between the current block and the reference block under consideration. The motion estimation unit 222 may identify the reference block having the lowest value generated by these calculations, indicating the reference block that most closely matches the current block.

The motion estimation unit 222 may form one or more Motion Vectors (MVs) that define the position of a reference block in a reference picture relative to a current block in a current picture. The motion estimation unit 222 may then provide the motion vectors to the motion compensation unit 224. For example, for uni-directional inter prediction, motion estimation unit 222 may provide a single motion vector, while for bi-directional inter prediction, motion estimation unit 222 may provide two motion vectors. Then, the motion compensation unit 224 may generate a prediction block using the motion vector. For example, the motion compensation unit 224 may use the motion vectors to retrieve data of the reference block. As another example, if the motion vector has fractional sample precision, the motion compensation unit 224 may interpolate the prediction block according to one or more interpolation filters. Further, for bi-directional inter prediction, the motion compensation unit 224 may retrieve data for two reference blocks identified by respective motion vectors and combine the retrieved data (e.g., by sample-by-sample averaging or weighted averaging).

As another example, for intra prediction or intra-prediction coding, the intra prediction unit 226 may generate a prediction block from samples neighboring the current block. For example, for directional modes, the intra prediction unit 226 may generally mathematically combine values of neighboring samples and populate these calculated values across the current block in the defined direction to generate the prediction block. As another example, for DC mode, the intra prediction unit 226 may calculate the average of the neighboring samples of the current block and generate the prediction block to include this resulting average for each sample of the prediction block.

The mode selection unit 202 supplies the prediction block to the residual generation unit 204. The residual generation unit 204 receives the original unencoded version of the current block from the video data memory 230 and the prediction block from the mode selection unit 202. The residual generation unit 204 calculates a sample-by-sample difference between the current block and the prediction block. The resulting sample-by-sample difference defines a residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values in the residual block using Residual Differential Pulse Code Modulation (RDPCM) to generate the residual block. In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.

In examples where mode selection unit 202 partitions a CU into PUs, each PU may be associated with a luma prediction unit and corresponding chroma prediction units. The video encoder 200 and the video decoder 300 may support PUs having various sizes. As described above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of the luma prediction unit of the PU. Assuming that the size of a particular CU is 2N × 2N, video encoder 200 may support PU sizes of 2N × 2N or N × N for intra prediction, and symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, N × N, or similar for inter prediction. The video encoder 200 and the video decoder 300 may also support asymmetric partitioning for PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N for inter prediction.

In examples where the mode selection unit does not further partition a CU into PUs, each CU may be associated with a luma coding block and a corresponding chroma coding block. As described above, the size of a CU may refer to the size of the luma coding block of the CU. The video encoder 200 and the video decoder 300 may support CU sizes of 2N × 2N, 2N × N, or N × 2N.

For other video coding techniques, such as intra block copy mode coding, affine mode coding, and Linear Model (LM) mode coding as some examples, mode selection unit 202 generates a prediction block for the current block being encoded via respective units associated with the coding techniques. In some examples, such as palette mode coding, mode selection unit 202 may not generate a prediction block, but generate a syntax element that indicates the manner in which a block is reconstructed based on the selected palette. In such a mode, mode selection unit 202 may provide these syntax elements to entropy encoding unit 220 for encoding.

As described above, the residual generation unit 204 receives video data of the current block and the corresponding prediction block. Then, the residual generation unit 204 generates a residual block for the current block. To generate the residual block, the residual generation unit 204 calculates a sample-by-sample difference between the prediction block and the current block.

Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transforms to the residual block to form a block of transform coefficients. For example, the transform processing unit 206 may apply Discrete Cosine Transform (DCT), directional transform, Karhunen-Loeve transform (KLT), or conceptually similar transform to the residual block. In some examples, transform processing unit 206 may perform multiple transforms on the residual block, e.g., a primary transform and a secondary transform such as a rotational transform. In some examples, transform processing unit 206 does not apply the transform to the residual block.

The quantization unit 208 may quantize transform coefficients in a transform coefficient block to produce a quantized transform coefficient block. The quantization unit 208 may quantize transform coefficients of a transform coefficient block according to a Quantization Parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) may adjust the degree of quantization applied to the transform coefficient block associated with the current block by adjusting the QP value associated with the CU. Quantization may introduce information loss and, thus, quantized transform coefficients may have lower precision than the original transform coefficients produced by transform processing unit 206.

The inverse quantization unit 210 and the inverse transform processing unit 212 may apply inverse quantization and inverse transform, respectively, to the quantized transform coefficient block to reconstruct a residual block from the transform coefficient block. The reconstruction unit 214 may generate a reconstructed block (although potentially with some degree of distortion) corresponding to the current block based on the reconstructed residual block and the prediction block generated by the mode selection unit 202. For example, the reconstruction unit 214 may add samples of the reconstructed residual block to corresponding samples of the prediction block generated by the mode selection unit 202 to produce a reconstructed block.

The filter unit 216 may perform one or more filter operations on the reconstructed block. For example, filter unit 216 may perform deblocking operations to reduce blocking artifacts along edges of a CU. In some examples, the operation of the filter unit 216 may be skipped.

The video encoder 200 stores the reconstructed blocks in the DPB 218. For example, in examples where operation of the filter unit 216 is not needed, the reconstruction unit 214 may store the reconstructed blocks to the DPB 218. In examples where operation of the filter unit 216 is needed, the filter unit 216 may store the filtered reconstructed blocks to the DPB 218. The motion estimation unit 222 and the motion compensation unit 224 may retrieve, from the DPB 218, reference pictures formed from the reconstructed (and potentially filtered) blocks to inter-predict blocks of subsequently encoded pictures. In addition, the intra prediction unit 226 may use reconstructed blocks of the current picture in the DPB 218 to intra-predict other blocks in the current picture.

In general, the entropy encoding unit 220 may entropy encode syntax elements received from other functional components of the video encoder 200. For example, entropy encoding unit 220 may entropy encode the blocks of quantized transform coefficients from quantization unit 208. As another example, entropy encoding unit 220 may entropy encode prediction syntax elements (e.g., motion information for inter prediction or intra mode information for intra prediction) from mode selection unit 202. Entropy encoding unit 220 may perform one or more entropy encoding operations on the syntax elements, which are another example of video data, to generate entropy encoded data. For example, entropy encoding unit 220 may perform a Context Adaptive Variable Length Coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, an exponential-Golomb encoding operation, or another type of entropy encoding operation on the data. In some examples, entropy encoding unit 220 may operate in a bypass mode without entropy encoding the syntax elements.

The video encoder 200 may output a bitstream that includes entropy encoded syntax elements needed to reconstruct blocks of a slice or picture. In particular, the entropy encoding unit 220 may output a bitstream.

The above operations are described with respect to blocks. Such description should be understood as describing operations for a luma coding block and/or chroma coding blocks. As described above, in some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a CU. In some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a PU.

In some examples, the operations performed for a luma coding block need not be repeated for the chroma coding blocks. As one example, the operations for identifying a Motion Vector (MV) and a reference picture for the luma coding block need not be repeated to identify an MV and a reference picture for the chroma blocks. Instead, the MV of the luma coding block may be scaled to determine the MV of the chroma blocks, and the reference picture may be the same. As another example, the intra prediction process may be the same for the luma coding block and the chroma coding blocks.

As described above, and in accordance with one or more techniques of this disclosure, a video coder (e.g., video encoder 200 and/or video decoder 300) may reduce an amount of memory bandwidth used to predict samples of video data. In some embodiments, bandwidth is reduced by limiting the number of pixels accessed for interpolation.

As an exemplary bandwidth reduction technique, the video coder may round the Motion Vectors (MVs) for the merge and AMVP modes of a 4 × 4 CU. The video coder may round the MV candidates in the merge candidate list and the AMVP motion vector candidate list. In some examples, the video coder may not signal (e.g., in the bitstream) the first Motion Vector Difference (MVD) resolution flag that indicates whether quarter luma sample MV precision is used. Extending this rounding, the MVD can be made integer for the corresponding rounded MV component, so the video coder can signal MVD >> 2. A video coder (such as video decoder 300) may reconstruct the motion vector according to the following formula:

MV=[MVP]+(MVD<<2) (18)

where the brackets indicate the rounding operation, MV is the value of the motion vector, MVP is the value of the motion vector predictor, and MVD is the value of the motion vector difference. MVD parsing is not affected by this shift, and the MV can be adjusted in the reconstruction stage. The number of reference pixels accessed and the worst-case bandwidth reduction are summarized in Table 2 below.
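
A C++ sketch of the reconstruction in equation (18) is shown below. It assumes MVs are stored in quarter-luma-sample units and that the bracketed rounding operation rounds the MVP to integer-sample precision; both assumptions are for illustration only.

#include <cstdlib>

// Sketch of equation (18): MV = [MVP] + (MVD << 2), where [.] rounds the MVP
// and the signaled MVD is at integer-sample resolution.
int reconstructMvComponent(int mvpQuarterPel, int mvdIntegerPel) {
    int sign = mvpQuarterPel >= 0 ? 1 : -1;
    int roundedMvp = sign * (((std::abs(mvpQuarterPel) + 2) >> 2) << 2);  // [MVP]
    return roundedMvp + (mvdIntegerPel << 2);
}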

As another exemplary bandwidth reduction technique, the video coder may fix the sub-PU size in ATMVP mode to 8 × 8. When the size is set to 4 × 4, the video coder may access (e.g., from memory) an 11 × 11 block (121 pixels) to perform motion compensation for each sub-PU. An 8 × 8 region contains four 4 × 4 PUs. Thus, the memory would be accessed four times and the total number of accessed points is 484 (4 × 121). However, when the size of the PU is set to 8 × 8, the video coder may only need to access (from memory) one block of (7+8) × (7+8) = 225 points for the luma component and two 7 × 7 chroma blocks. This memory access may only need to be performed once. The number of reference pixels accessed and the bandwidth reduction for this worst case are summarized in Table 2 below. As shown in Table 2, by fixing the sub-PU size of ATMVP to 8 × 8, the bandwidth can be reduced by 53.3% compared to the number of points accessed when the size of the PU is set to 4 × 4.

Table 2-recommended worst case bandwidth reduction

In some examples, the video coder may apply one or more constraints to reduce an amount of memory bandwidth used to predict samples of video data in an affine mode. As one example, a video coder may reduce memory bandwidth for affine mode by limiting motion vector differences (also referred to as control point differences) between affine control points. For example, the video coder may determine a memory bandwidth required for the current block (e.g., based on a control point for the current block). The video coder may compare the determined memory bandwidth to a predetermined bandwidth threshold (e.g., to ensure that the memory bandwidth savings should not be less than a predetermined memory bandwidth savings amount). The bandwidth threshold may be predefined in a configuration file or may be passed as a parameter to the video decoder. The video coder may selectively modify a motion compensation method for predicting samples of the current block based on whether the determined memory bandwidth for the current block satisfies a bandwidth threshold.

The video coder may modify the motion compensation method in a variety of ways. As one example, if the determined bandwidth (e.g., control point difference) is less than the bandwidth threshold, the video coder may affine test the CU using 4 × 4 sub-blocks (i.e., 4 × 4 affine mode). Otherwise, the video coder may affine test the CU using 8 × 8 sub-blocks (8 × 8 affine mode) or SBWidth × SBHeight sub-blocks (where SBWidth or SBHeight is greater than 4) instead of the 4 × 4 affine mode, or may simply disallow the 4 × 4 affine mode, in order to meet the target bandwidth reduction.

The video coder may apply the constraints to the L0 and L1 motion directions separately. Thus, it is possible to have 4 × 4 sub-blocks in one motion direction and SBWidth × SBHeight sub-blocks in the other motion direction. In some examples, the video coder may apply the constraints to both motion lists to determine the sub-block size, meaning that L0 and L1 should have the same sub-block size depending on whether the constraints are satisfied for both directions.

The video coder may perform normalization of the motion vector differences. For example, since the motion vector for each affine sub-block is calculated based on the size of the CU, the video coder may normalize the motion vector difference based on the size of the CU. In some examples, normalization may simplify prediction, as the constraints on each size may maximize the utilization of the 4 x 4 sub-blocks in affine motion.

The following are examples of normalization and constraint conditions for the 6-parameter and 4-parameter affine models (e.g., as used in VTM):

Normalization of motion vector differences:

An example of the norm of the motion vector differences, based on the size (w × h) of the CU, can be given as follows:

Norm(v1x-v0x)=(v1x-v0x)*S/w

Norm(v1y-v0y)=(v1y-v0y)*S/w

Norm(v2x-v0x)=(v2x-v0x)*S/h

Norm(v2y-v0y)=(v2y-v0y)*S/h (19)

where S is a scaling factor for a fixed-point implementation and Norm(.) is at full-pixel scale. Other pixel resolutions may also be applied.

Limitation for the 6-parameter model:

The 4 × 4 affine mode is tested for the 6-parameter model if the following condition is met:

|Norm(v1x-v0x)+Norm(v2x-v0x)+X|+|Norm(v1y-v0y)+Norm(v2y-v0y)+Y|+|Norm(v1x-v0x)-Norm(v2x-v0x)|+|Norm(v1y-v0y)-Norm(v2y-v0y)|<N (20)

The bandwidth savings may be symmetric; however, the bandwidth savings is shifted by the values of X and Y in (20). Also in (20), N represents the bound of the constraint. This value can be adjusted to achieve a minimum bandwidth savings. For example, if the operation is applied at full-pixel scale, to ensure that the bandwidth savings is not less than 50%, X, Y, and N are set to S, S, and S × 7/2, respectively. For fractional-pixel scale, the values of X, Y, and N should be adjusted accordingly. N may also depend on the block size; for larger block sizes, a larger N may be used (e.g., a larger overlap region may be achieved because larger block sizes contain more 4 × 4 sub-blocks).
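
For illustration, the following C++ sketch evaluates the normalized condition in (19) and (20) for the 6-parameter model, using the full-pixel-scale example values X = S, Y = S, and N = S × 7/2 given above. The integer CPMV representation and the function name are assumptions.

#include <cstdlib>

// Check of the 6-parameter constraint in equations (19) and (20).
// (v0x, v0y), (v1x, v1y), (v2x, v2y) are the CPMVs of a w x h block;
// S is the fixed-point scaling factor.
bool allow4x4AffineSixParam(int v0x, int v0y, int v1x, int v1y,
                            int v2x, int v2y, int w, int h, int S) {
    int normDx1 = (v1x - v0x) * S / w;   // Norm(v1x - v0x)
    int normDy1 = (v1y - v0y) * S / w;   // Norm(v1y - v0y)
    int normDx2 = (v2x - v0x) * S / h;   // Norm(v2x - v0x)
    int normDy2 = (v2y - v0y) * S / h;   // Norm(v2y - v0y)
    int X = S, Y = S, N = S * 7 / 2;     // example values for >= 50% savings
    int lhs = std::abs(normDx1 + normDx2 + X)
            + std::abs(normDy1 + normDy2 + Y)
            + std::abs(normDx1 - normDx2)
            + std::abs(normDy1 - normDy2);
    return lhs < N;                      // condition (20): true -> 4x4 affine mode allowed
}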

Constraints for 4-parameter model:

A 4-parameter affine model is a special case of the 6-parameter model, in which only two control points are considered to obtain the motion vectors of the affine sub-blocks. In this model, (v2x-v0x) and (v2y-v0y) are set as follows:

(v2x-v0x)=-(v1y-v0y)

(v2y-v0y)=(v1x-v0x) (21)

In this case, the norms of (v2x-v0x) and (v2y-v0y) are given as follows:

Norm(v2x-v0x)=-Norm(v1y-v0y)

Norm(v2y-v0y)=Norm(v1x-v0x) (22)

By applying (22) together with (19), the constraint of the 4-parameter affine model can be given as follows:

|Norm(v1x-v0x)+Norm(v1y-v0y)+X|+|Norm(v1y-v0y)+Norm(v1x-v0x)+Y|+|Norm(v1x-v0x)-Norm(v1y-v0y)|+|Norm(v1y-v0y)-Norm(v1x-v0x)|<N (23)

In some examples, the video coder may apply other restrictions (e.g., constraints) to the 4-parameter affine mode. As an example, a video coder may apply the following restriction:

|(v1x-v0x)+w|+|(v1y-v0y)|≤(N+log2(h/8))(w/8) (24)

where the (+w) term on the left side of the condition indicates that the bandwidth reduction is symmetric but shifted by a factor of w. The term log2(h/8) is a bias normalization factor that represents the effect of h on the bandwidth savings. The factor w/8 is the normalization for the block size. N represents a bandwidth-saving level. Condition (24) can be verified by computing the bandwidth savings (%) for blocks having w equal to 8 and various values of h.

Additional or alternative constraints for a 4-parameter affine model can be given as follows:

|Norm(v1x-v0x)+X|+|Norm(v1y-v0y)+Y|<N (25)

where X, Y and N correspond to a bandwidth savings level.

Additional or alternative constraints for a 4-parameter affine model can be defined as follows:

c1=|(v1x-v0x)+4*log2(w/8)+2|+|(v1y-v0y)+8*log2(w/8)+2*2/8|<N+4+w (26)

c2=|(v1x-v0x)+w-4*log2(w/8)-2|+|(v1y-v0y)-8*log2(w/8)-2*w/8|<N+4+w (27)

If both c1 and c2 are satisfied, the 4 × 4 affine mode may be tested. Otherwise (e.g., if one or both of c1 and c2 are not satisfied), the affine sub-block size will be larger.
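
A Python sketch of the c1/c2 test follows. The constants are transcribed from (26) and (27) as printed above (including the 2*2/8 term in c1) and have not been verified against any reference implementation; the function name is illustrative.

import math

def allow_4x4_affine_c1_c2(v0, v1, w, N):
    # v0, v1 are (x, y) control-point motion vectors; w is the block width;
    # N is the bandwidth-saving level.
    dx = v1[0] - v0[0]
    dy = v1[1] - v0[1]
    bound = N + 4 + w
    c1 = abs(dx + 4 * math.log2(w / 8) + 2) + abs(dy + 8 * math.log2(w / 8) + 2 * 2 / 8) < bound
    c2 = abs(dx + w - 4 * math.log2(w / 8) - 2) + abs(dy - 8 * math.log2(w / 8) - 2 * w / 8) < bound
    return c1 and c2  # the 4 x 4 affine mode may be tested only if both hold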

In yet another additional or alternative constraint method: the variable blkW specifies the width of the block and the variable blkH specifies the height of the block. The variable subBlkW specifies the default width of the sub-block for affine motion compensation, and the variable subBlkH specifies the default height of the sub-block for affine motion compensation. In this example, if the constraint is satisfied, the sub-block used for affine motion compensation has size (subBlkW, subBlkH); otherwise, the sub-block used for affine motion compensation has size (subBlkW × 2, subBlkH × 2). Typically, subBlkW = 4 and subBlkH = 4, but they may have other integer values.

Some example constraints for the 6-parameter affine model include:

In one example, the constraints are given by equations (28) to (33).

When normalization is applied, the constraints become:

-subBlkW×S≤Norm(v1x-v0x)*subBlkW<S (34)

-S<Norm(v1y-v0y)*subBlkW<S (35)

-S<Norm(v2x-v0x)*subBlkH<S (36)

-S≤Norm(v2y-v0y)*subBlkH<S (37)

-blkW*S≤Norm(v1x-v0x)*subBlkW+Norm(v2x-v0x)*subBlkH<S (38)

-blkH*S≤Norm(v1y-v0y)*subBlkW+Norm(v2y-v0y)*subBlkH<S (39)

In another example, the "less than or equal to" operation "≤" may be replaced with the "less than" operation "<". For example:

Equation (28) may be replaced by

Equations (31) to (33) may be replaced by

The video coder may similarly apply the normalization techniques described above. Where the resolution of the motion vectors is at the sub-pixel level and vix, viy are in units of sub-pixels, the video coder may scale the corresponding equations accordingly. For example, if normalization is applied, the video coder may scale S.

In the 4-parameter affine model, the video coder may set (v2x-v0x) and (v2y-v0y) as follows:

(v2x-v0x)=-(v1y-v0y)

(v2y-v0y)=(v1x-v0x)

In this case, the norms of (v2x-v0x) and (v2y-v0y) can be given as follows:

Norm(v2x-v0x)=-Norm(v1y-v0y)

Norm(v2y-v0y)=Norm(v1x-v0x)

The constraints of the 4-parameter affine model can then be established accordingly.

Additional or alternative constraints for unidirectional prediction are described below. The constraints for unidirectional prediction may be the same as described above, or other alternative constraints may be used.

In one example, the limit includes equations (28) and (29). The subblock for affine motion compensation has a size (subBlkW, subBlkH) if the constraint is satisfied. Otherwise, the subblock used for affine motion compensation has a size (subBlkW × 2, subBlkH).

In yet another example, the limit includes equations (40) and (29). The subblock for affine motion compensation has a size (subBlkW, subBlkH) if the constraint is satisfied. Otherwise, the subblock used for affine motion compensation has a size (subBlkW × 2, subBlkH).

In yet another example, the limit includes equations (30) and (31). The subblock for affine motion compensation has a size (subBlkW, subBlkH) if the constraint is satisfied. Otherwise, the subblocks used for affine motion compensation have a size (subBlkW, subBlkH × 2).

In yet another example, the limit includes equations (30) and (41). The subblock for affine motion compensation has a size (subBlkW, subBlkH) if the constraint is satisfied. Otherwise, the subblocks used for affine motion compensation have a size (subBlkW, subBlkH × 2). Additionally, normalization may be applied for the above example.
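
The following Python sketch generalizes the per-dimension fallback described in these examples. The width and height constraints (e.g., equations (28)-(31), (40), and (41)) are assumed to be evaluated elsewhere and are passed in as booleans; the 4 × 4 defaults and the doubling rule mirror the examples above but are not a normative specification.

def affine_subblock_size(sub_blk_w, sub_blk_h, width_constraint_ok, height_constraint_ok):
    # If a constraint fails, the corresponding sub-block dimension is doubled,
    # mirroring the (subBlkW*2, subBlkH) and (subBlkW, subBlkH*2) cases above.
    w = sub_blk_w if width_constraint_ok else sub_blk_w * 2
    h = sub_blk_h if height_constraint_ok else sub_blk_h * 2
    return (w, h)

# Example: default 4 x 4 sub-blocks and a failing width constraint -> 8 x 4 sub-blocks.
print(affine_subblock_size(4, 4, width_constraint_ok=False, height_constraint_ok=True))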

Video encoder 200 represents an example of a device configured to encode video data, the device including a memory configured to store video data, and one or more processing units implemented in circuitry and configured to obtain CPMV values for a current block of video data; determining whether a memory bandwidth required to access samples of a plurality of reference blocks derived based on the values of the CPMV satisfies a bandwidth threshold; selectively modifying a motion compensation method for predicting samples of a current block of video data based on whether the determined memory bandwidth satisfies a bandwidth threshold; and predicting samples of a current block of video data from samples of a plurality of reference blocks using a selectively modified motion compensation method.

Fig. 5 is a block diagram illustrating an example video decoder 300 that may perform techniques of this disclosure. Fig. 5 is provided for purposes of explanation and is not limiting of the techniques as broadly illustrated and described in this disclosure. For purposes of illustration, this disclosure describes video decoder 300 in terms of JEM, VVC, and HEVC techniques. However, the techniques of this disclosure may be performed by video coding devices configured to operate according to other video coding standards.

In the example of fig. 5, the video decoder 300 includes a Coded Picture Buffer (CPB) memory 320, an entropy decoding unit 302, a prediction processing unit 304, an inverse quantization unit 306, an inverse transform processing unit 308, a reconstruction unit 310, a filter unit 312, and a Decoded Picture Buffer (DPB) 314. Any or all of the CPB memory 320, the entropy decoding unit 302, the prediction processing unit 304, the inverse quantization unit 306, the inverse transform processing unit 308, the reconstruction unit 310, the filter unit 312, and the DPB 314 may be implemented in one or more processors or in processing circuitry. Further, the video decoder 300 may include additional or alternative processors or processing circuits to perform these and other functions.

The prediction processing unit 304 includes a motion compensation unit 316 and an intra prediction unit 318. The prediction processing unit 304 may include additional units to perform prediction in other prediction modes. As an example, the prediction processing unit 304 may include a palette unit, an intra block copy unit (which may form part of the motion compensation unit 316), an affine unit, a Linear Model (LM) unit, and the like. In other examples, video decoder 300 may include more, fewer, or different functional components.

The CPB memory 320 may store video data, such as an encoded video bitstream, to be decoded by the components of the video decoder 300. For example, the video data stored in the CPB memory 320 may be obtained from the computer-readable medium 110 (fig. 1). The CPB memory 320 may include CPBs that store encoded video data (e.g., syntax elements) from an encoded video bitstream. Also, the CPB memory 320 may store video data other than syntax elements of coded pictures, such as temporary data representing output from the respective units of the video decoder 300. In general, the DPB 314 stores decoded pictures, which the video decoder 300 may output and/or use as reference video data when decoding subsequent data or pictures of the encoded video bitstream. The CPB memory 320 and DPB 314 may be formed from any of a variety of memory devices, such as DRAMs, including SDRAMs, MRAMs, RRAMs, or other types of memory devices. The CPB memory 320 and DPB 314 may be provided by the same memory device or separate memory devices. In various examples, the CPB memory 320 may be located on-chip with other components of the video decoder 300 or off-chip with respect to those components.

Additionally or alternatively, in some examples, video decoder 300 may retrieve the coded video data from memory 120 (fig. 1). That is, memory 120 may store data as discussed above with respect to CPB memory 320. Also, when some or all of the functionality of the video decoder 300 is implemented in software to be executed by processing circuitry of the video decoder 300, the memory 120 may store instructions to be executed by the video decoder 300.

The various units shown in fig. 5 are shown to help understand the operations performed by the video decoder 300. These units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Similar to fig. 4, fixed-function circuits refer to circuits that provide particular functionality and are preset on the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, a programmable circuit may execute software or firmware that causes the programmable circuit to operate in a manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

The video decoder 300 may include an ALU, an EFU, digital circuitry, analog circuitry, and/or a programmable core formed of programmable circuitry. In examples where the operations of video decoder 300 are performed by software executing on programmable circuitry, on-chip or off-chip memory may store instructions (e.g., object code) of the software received and executed by video decoder 300.

The entropy decoding unit 302 may receive the encoded video data from the CPB and entropy decode the video data to reproduce syntax elements. The prediction processing unit 304, the inverse quantization unit 306, the inverse transform processing unit 308, the reconstruction unit 310, and the filter unit 312 may generate decoded video data based on the syntax elements extracted from the bitstream.

In general, the video decoder 300 reconstructs pictures on a block-by-block basis. The video decoder 300 may perform a reconstruction operation on each block separately (where the block currently being reconstructed (i.e., decoded) may be referred to as a "current block").

Entropy decoding unit 302 may entropy decode syntax elements that define quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as Quantization Parameters (QPs) and/or transform mode indications. The inverse quantization unit 306 may use a QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for the inverse quantization unit 306 to apply. The inverse quantization unit 306 may inverse quantize the quantized transform coefficients (e.g., by performing a bitwise left-shift operation). The inverse quantization unit 306 may thereby form a transform coefficient block comprising transform coefficients.

After the inverse quantization unit 306 forms the transform coefficient block, the inverse transform processing unit 308 may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, the inverse transform processing unit 308 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotation transform, an inverse direction transform, or another inverse transform to the coefficient block.

Further, prediction processing unit 304 generates a prediction block according to the prediction information syntax element entropy-decoded by entropy decoding unit 302. For example, if the prediction information syntax element indicates that the current block is inter-predicted, the motion compensation unit 316 may generate a prediction block. In this case, the prediction information syntax element may indicate a reference picture in the DPB 314 from which to retrieve the reference block, as well as a motion vector identifying the position of the reference block in the reference picture relative to the current block in the current picture. The motion compensation unit 316 may generally perform the inter prediction process in a manner substantially similar to that described for the motion compensation unit 224 (fig. 4).

As another example, if the prediction information syntax element indicates that the current block is intra-predicted, the intra prediction unit 318 may generate the prediction block according to the intra prediction mode indicated by the prediction information syntax element. Again, intra-prediction unit 318 may generally perform the intra-prediction process in a manner substantially similar to that described for intra-prediction unit 226 (fig. 4). The intra prediction unit 318 may retrieve data of neighboring samples of the current block from the DPB 314.

The reconstruction unit 310 may reconstruct the current block using the prediction block and the residual block. For example, the reconstruction unit 310 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.

The filter unit 312 may perform one or more filter operations on the reconstructed block. For example, filter unit 312 may perform deblocking operations to reduce block artifacts along the edges of reconstructed blocks. The operation of the filter unit 312 is not necessarily performed in all examples.

The video decoder 300 may store the reconstructed block in the DPB 314. As described above, the DPB 314 may provide reference information, such as samples of a current picture for intra prediction and previously decoded pictures for subsequent motion compensation, to the prediction processing unit 304. In addition, video decoder 300 may output decoded pictures from DPB 314 for subsequent presentation on a display device, such as display device 118 of fig. 1.

In this manner, video decoder 300 represents an example of a video decoding device that includes a memory configured to store video data, and one or more processing units implemented in circuitry and configured to obtain CPMV values for a current block of video data; determining whether a memory bandwidth required to access samples of a plurality of reference blocks derived based on the values of the CPMV satisfies a bandwidth threshold; selectively modifying a motion compensation method for predicting samples of a current block of video data based on whether the determined memory bandwidth satisfies a bandwidth threshold; and predicting samples of a current block of video data from samples of a plurality of reference blocks using a selectively modified motion compensation method.

FIG. 10 is a flow diagram illustrating an example method for encoding a current block. The current block may include a current CU. Although described with respect to video encoder 200 (fig. 1 and 4), it should be understood that other devices may be configured to perform methods similar to fig. 10.

In this example, the video encoder 200 initially predicts the current block (1050). For example, the video encoder 200 may form a prediction block for the current block. As described above, in some examples, the video encoder 200 may predict the current block using an affine mode. In accordance with one or more techniques of this disclosure, video encoder 200 may perform various techniques to manage memory bandwidth for predicting a current block of video data. One example of such a memory bandwidth management technique is discussed below with reference to FIG. 11.

The video encoder 200 may then calculate a residual block for the current block (1052). To calculate the residual block, the video encoder 200 may calculate the difference between the original, unencoded block and the prediction block for the current block. The video encoder 200 may then transform and quantize the coefficients of the residual block (1054). Next, video encoder 200 may scan the quantized transform coefficients of the residual block (1056). During or after scanning, video encoder 200 may entropy encode the coefficients (1058). For example, video encoder 200 may encode the coefficients using CAVLC or CABAC. The video encoder 200 may then output entropy encoded data for the coefficients of the block (1060).

Fig. 11 is a flow diagram illustrating an example method for decoding a current block of video data. The current block may include a current CU. Although described with respect to video decoder 300 (fig. 1 and 5), it should be understood that other devices may be configured to perform methods similar to fig. 11.

The video decoder 300 may receive entropy encoded data for the current block, such as entropy encoded prediction information and entropy encoded data for coefficients of a residual block corresponding to the current block (1170). The video decoder 300 may decode the entropy encoded data to determine prediction information for the current block and reproduce the coefficients of the residual block (1172). The video decoder 300 may predict the current block (1174), e.g., using an intra or inter prediction mode indicated by the prediction information for the current block, to calculate a prediction block for the current block. As described above, in some examples, the video decoder 300 may predict the current block using an affine mode. In accordance with one or more techniques of this disclosure, video decoder 300 may perform various techniques to manage memory bandwidth for predicting a current block of video data. One example of such a memory bandwidth management technique is discussed below with reference to FIG. 12.

The video decoder 300 may then inverse scan the rendered coefficients (1176) to create a block of quantized transform coefficients. The video decoder 300 may then inverse quantize and inverse transform the coefficients to produce a residual block (1178). The video decoder 300 may finally decode the current block by combining the prediction block and the residual block (1180).

Fig. 12 is a flow diagram illustrating an example method for managing memory bandwidth for predicting video data in accordance with one or more techniques of this disclosure. The techniques of fig. 12 may be performed by a video coder, such as video encoder 200 of fig. 1 and 4 and/or video coder 300 of fig. 1 and 5. For simplicity of explanation, the technique of fig. 12 is described as being performed by the video decoder 300 of fig. 1 and 5.

The video decoder 300 may obtain values of Control Point Motion Vectors (CPMVs) for a current block of video data (1202). For example, video decoder 300 may obtain the values of the CPMVs of block 600 of fig. 6. The video decoder 300 may obtain the values of the CPMVs based on the values of the motion vectors (which may or may not be CPMVs) of spatial neighboring blocks of the current block.

The video decoder 300 may determine a memory bandwidth required to access samples of reference blocks derived based on the values of the CPMVs (1204). For example, where the current block is block 806 of fig. 8, the video decoder 300 may determine the memory bandwidth required to access samples of reference region 810 (including the reference blocks and additional samples used for interpolation). As described above, reference region 810 is identified by the sub-block motion vectors mvA-mvD, which are derived from the CPMVs of block 806.

In some examples, the video decoder 300 may determine the memory bandwidth by determining a size of a minimum area including the plurality of reference blocks based on the value of the CPMV. For example, the video decoder 300 may determine the size of the region 820 of fig. 8 (which is the smallest region that includes the reference region 810). As described above, in some examples, video decoder 300 may use equations (6) - (15) to determine the size of region 820. Based on the size of the minimum region, the video decoder 300 may determine the area of the minimum region as the memory bandwidth required to access the samples of the reference block. For example, video decoder 300 may determine the area of region 820 as the memory bandwidth required to access samples of a reference block derived based on the value of CPMV. As described above, in some examples, video decoder 300 may use equation (16) or equation (17) to determine the area of region 820.
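
The following Python sketch shows one way such an area-based estimate could be computed: take the bounding rectangle of the reference regions fetched for the sub-blocks and return its area. The interpolation margin of 7 samples (assumed for an 8-tap filter) and the simplified bookkeeping are assumptions; the sketch does not reproduce equations (6)-(17).

FILTER_MARGIN = 7  # assumed extra samples per dimension for 8-tap interpolation

def estimate_memory_bandwidth(sub_blocks):
    # sub_blocks: list of (x, y, w, h, mv_x, mv_y) tuples in integer-sample units.
    # Returns the area of the minimum region enclosing all reference regions.
    x_min = min(x + mv_x for (x, y, w, h, mv_x, mv_y) in sub_blocks)
    y_min = min(y + mv_y for (x, y, w, h, mv_x, mv_y) in sub_blocks)
    x_max = max(x + mv_x + w + FILTER_MARGIN for (x, y, w, h, mv_x, mv_y) in sub_blocks)
    y_max = max(y + mv_y + h + FILTER_MARGIN for (x, y, w, h, mv_x, mv_y) in sub_blocks)
    return (x_max - x_min) * (y_max - y_min)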

The video decoder 300 may determine whether the determined memory bandwidth satisfies a bandwidth threshold (1206). As described above, the bandwidth threshold may be a predetermined bandwidth threshold. In some examples, video decoder 300 may determine that the memory bandwidth satisfies the bandwidth threshold when the determined memory bandwidth is less than or equal to (e.g., ≤) the bandwidth threshold. In some examples, video decoder 300 may determine that the memory bandwidth satisfies the bandwidth threshold when the determined memory bandwidth is less than (e.g., <) the bandwidth threshold. In some examples, video decoder 300 may determine that the memory bandwidth does not satisfy the bandwidth threshold when the determined memory bandwidth is greater than (e.g., >) the bandwidth threshold. In some examples, video decoder 300 may determine that the memory bandwidth does not satisfy the bandwidth threshold when the determined memory bandwidth is greater than or equal to (e.g., ≥) the bandwidth threshold.

The video decoder 300 may selectively modify a motion compensation method for predicting samples of a current block of video data based on whether the determined memory bandwidth satisfies a bandwidth threshold. As one example, video decoder 300 may modify the motion compensation method in response to determining that the memory bandwidth does not satisfy the bandwidth threshold ("no" branch of 1206, 1208). As another example, video decoder 300 may maintain (e.g., not modify) the motion compensation method in response to determining that the memory bandwidth does meet the bandwidth threshold (the "yes" branch of 1206, 1210). In the event that the video decoder 300 determines that the motion compensation method is to be modified, the video decoder 300 may modify the motion compensation method in any manner. In general, by modifying the motion compensation method, the video decoder 300 will reduce the memory bandwidth required to predict samples of the current block (i.e., the memory bandwidth required to access samples of the predictor block for the current block).

The video decoder 300 may predict samples for a current block of video data using a selectively modified motion compensation method (1212). For example, prediction processing unit 304 of video decoder 300 may access samples of a plurality of reference blocks derived based on the CPMV values from a memory (e.g., decoded picture buffer 314 of video decoder 300). The video decoder 300 may determine the values of the samples of the current block based on the samples of the reference block and the residual data.

As described above, in some examples, the video decoder 300 may apply a bandwidth reduction technique separately to each prediction direction (e.g., to L0 and L1 motion, respectively). For example, when the current block is bidirectionally predicted, the video decoder 300 may obtain the CPMV values of the current block for a first prediction direction (e.g., L0) and obtain the CPMV values of the current block for a second prediction direction (e.g., L1). The video decoder 300 may independently test the bandwidth requirements for each prediction direction. For example, video decoder 300 may determine whether a memory bandwidth required to access samples of a plurality of reference blocks derived based on the values of the CPMVs for the first prediction direction satisfies a bandwidth threshold; and determine whether a memory bandwidth required to access samples of a plurality of reference blocks derived based on the values of the CPMVs for the second prediction direction satisfies the bandwidth threshold. Based on the testing, the video decoder 300 may selectively modify the motion compensation method for each prediction direction independently. For example, the video decoder 300 may selectively modify the motion compensation method used to predict samples of the current block in the first prediction direction based on whether the determined memory bandwidth for the first prediction direction satisfies the bandwidth threshold; and selectively modify the motion compensation method used to predict samples of the current block in the second prediction direction based on whether the determined memory bandwidth for the second prediction direction satisfies the bandwidth threshold. As a result, the video decoder 300 may modify the motion compensation methods for the two prediction directions separately and independently (e.g., change one without changing the other, change both, or change neither).
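
A Python sketch of the per-direction decision follows. The bandwidth values, the threshold, and the fallback to 8 × 8 sub-blocks are illustrative assumptions rather than values taken from any specification.

def select_subblock_sizes_biprediction(bandwidth_l0, bandwidth_l1, threshold):
    # Each prediction direction (L0, L1) keeps 4 x 4 affine sub-blocks only if
    # its own memory bandwidth satisfies the threshold; otherwise that
    # direction independently falls back to 8 x 8 sub-blocks.
    sizes = []
    for bandwidth in (bandwidth_l0, bandwidth_l1):
        sizes.append((4, 4) if bandwidth <= threshold else (8, 8))
    return sizes  # [size_for_L0, size_for_L1]

# Example: L0 passes the test while L1 does not.
print(select_subblock_sizes_biprediction(900, 1400, threshold=1024))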

In some examples, video decoder 300 selectively adjusts the motion compensation method used to predict each respective block independently based on the memory bandwidth requirements of the respective block. For example, where a Coding Unit (CU) includes a plurality of 8 x 8 sub-blocks, video decoder 300 may determine, for each sub-block of the plurality of sub-blocks, whether a respective memory bandwidth required to access samples of a respective plurality of reference blocks of the respective sub-block, respectively, satisfies a bandwidth threshold. The video decoder 300 may selectively modify the motion compensation method used to predict the samples of a particular sub-block based on whether the respective memory bandwidth for the respective sub-block satisfies a bandwidth threshold. In other words, the video decoder 300 may selectively adjust the motion compensation method for each sub-block independently of other sub-blocks. In this way, depending on the memory bandwidth requirements of each sub-block, video decoder 300 may adjust the motion compensation methods for some sub-blocks of a CU while not adjusting the motion compensation methods for other sub-blocks of the same CU.

However, in some examples, it may not be desirable for the video decoder 300 to selectively adjust the motion compensation method used to predict each respective block independently based on the memory bandwidth requirements of the respective block. For example, evaluating and testing the memory bandwidth of each sub-block may be computationally intensive, which may slow down the decoding process.

In accordance with one or more techniques of this disclosure, video decoder 300 may selectively adjust a motion compensation method used to predict a plurality of sub-blocks based on a memory bandwidth requirement of a particular sub-block of the plurality of sub-blocks. For example, where a CU includes multiple 8 x 8 sub-blocks, video decoder 300 may determine, for a particular sub-block of the multiple sub-blocks, whether a respective memory bandwidth required to predict the particular sub-block satisfies a bandwidth threshold. Based on whether the memory bandwidth for the particular sub-block satisfies the bandwidth threshold, the video decoder 300 may modify a motion compensation method used to predict all samples of the plurality of sub-blocks. In other words, the video decoder 300 may selectively adjust the motion compensation method of each sub-block based on the evaluation of the individual sub-blocks. In this way, the video decoder 300 may avoid having to determine the memory bandwidth requirements of each sub-block separately.

FIG. 13 is a conceptual diagram illustrating simplified memory bandwidth testing according to one or more aspects of the present disclosure. As shown in fig. 13, a Coding Unit (CU) 1304 includes four sub-blocks 1306A-1306D (collectively, "sub-blocks 1306"). For example, CU 1304 may be a 16 × 16 CU, and each sub-block 1306 may be an 8 × 8 coding block. As shown in fig. 13, each sub-block 1306 may include its own sub-blocks. For example, each sub-block 1306 may be divided into four sub-blocks, which may be similar to sub-blocks 808 of fig. 8.

As described above, a video coder (e.g., video encoder 200 and/or video decoder 300) may selectively adjust motion compensation for sub-blocks of a CU, either independently or dependently. To selectively adjust motion compensation independently, the video coder may determine the memory bandwidth requirements of each sub-block 1306 separately. For example, the video coder may determine a first memory bandwidth for sub-block 1306A, a second memory bandwidth for sub-block 1306B, a third memory bandwidth for sub-block 1306C, and a fourth memory bandwidth for sub-block 1306D. The video coder may determine whether to adjust the motion compensation method for each respective sub-block of sub-blocks 1306, respectively, based on the memory bandwidth of the respective sub-block. For example, the video coder may determine whether to adjust the motion compensation method for sub-block 1306A based on a first memory bandwidth, determine whether to adjust the motion compensation method for sub-block 1306B based on a second memory bandwidth, determine whether to adjust the motion compensation method for sub-block 1306C based on a third memory bandwidth, and determine whether to adjust the motion compensation method for sub-block 1306D based on a fourth memory bandwidth.

To adjust motion compensation dependently, the video coder may determine the memory bandwidth requirements of a single sub-block of sub-blocks 1306 and selectively adjust the motion compensation methods for all sub-blocks 1306 based on the memory bandwidth requirements of the single sub-block. For example, the video coder may determine the memory bandwidth of sub-block 1306A and selectively adjust the motion compensation method for all sub-blocks 1306 based on the memory bandwidth requirements of sub-block 1306A. In some examples, the video coder may select which sub-block to test based on location. For example, the video coder may select the top-left sub-block for testing (e.g., sub-block 1306A in fig. 13). In this way, the video coder may avoid having to determine the memory bandwidth requirements of sub-blocks 1306B, 1306C, and 1306D.
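
The following Python sketch contrasts the independent and dependent testing strategies for the four sub-blocks 1306 of fig. 13. The bandwidth values and threshold are illustrative, and the per-sub-block bandwidth estimate is assumed to be computed elsewhere.

def adjust_independently(sub_block_bandwidths, threshold):
    # Each sub-block is tested on its own: True means its motion compensation
    # method is kept, False means it is modified.
    return [bw <= threshold for bw in sub_block_bandwidths]

def adjust_dependently(sub_block_bandwidths, threshold):
    # Only the top-left sub-block (index 0) is tested; its result is applied
    # to all sub-blocks of the CU.
    top_left_ok = sub_block_bandwidths[0] <= threshold
    return [top_left_ok] * len(sub_block_bandwidths)

# Example with the four 8 x 8 sub-blocks of a 16 x 16 CU.
bandwidths = [900, 1400, 800, 1300]
print(adjust_independently(bandwidths, threshold=1024))  # [True, False, True, False]
print(adjust_dependently(bandwidths, threshold=1024))    # [True, True, True, True]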

Fig. 14 is a flow diagram illustrating an example method for managing memory bandwidth for predicting video data in accordance with one or more techniques of this disclosure. The technique of fig. 14 may be performed by a video coder, such as the video encoder 200 of fig. 1 and 4 and/or the video decoder 300 of fig. 1 and 5. For simplicity of explanation, the technique of fig. 14 is described as being performed by the video decoder 300 of fig. 1 and 5.

The video decoder 300 may obtain values for luma motion vectors for a plurality of luma sub-blocks of a current block of video data selected for coding using affine motion compensation (1402). For example, the video decoder 300 may obtain values for respective motion vectors for each of the luma sub-blocks 902 of fig. 9.

The video decoder 300 may determine a value of a chroma motion vector for a chroma sub-block corresponding to the plurality of luma sub-blocks, based on values of the luma motion vectors of a subset of the plurality of luma sub-blocks (1404). As described above, the subset of the plurality of luma sub-blocks may include two diagonally positioned luma sub-blocks (e.g., excluding all other luma sub-blocks of the plurality of luma sub-blocks). For example, the video decoder 300 may determine the value of the chroma motion vector of chroma sub-block 904 based on the value of the luma motion vector of luma sub-block 902A and the value of the luma motion vector of luma sub-block 902D.

As described above, in some examples, the video decoder 300 may determine the value of the chroma motion vector as an average of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks. For example, video decoder 300 may determine a sum of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks and round the determined sum to calculate the value of the chroma motion vector. In some examples, video decoder 300 may perform the rounding symmetrically. For example, the video decoder 300 may determine the average of the values of the luma motion vectors of the subset of the plurality of luma sub-blocks as follows:

sMV=MVA+MVD

MVChrom.hor=sMV.hor>=0?(sMV.hor+offset)>>shift:-((-sMV.hor+offset)>>shift)

MVChrom.ver=sMV.ver>=0?(sMV.ver+offset)>>shift:-((-sMV.ver+offset)>>shift)

where MVA is the value of the first luma sub-block MV (e.g., the value of the MV of luma sub-block 902A), MVD is the value of the second luma sub-block MV (e.g., the value of the MV of luma sub-block 902D), X.hor is the horizontal component of motion vector X, X.ver is the vertical component of motion vector X, offset is one, and shift is one.
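
A Python sketch of this derivation follows, using offset equal to one and shift equal to one as stated above. The function and variable names are illustrative.

OFFSET = 1
SHIFT = 1

def round_symmetric(value):
    # Right shift with rounding applied symmetrically to positive and negative values.
    if value >= 0:
        return (value + OFFSET) >> SHIFT
    return -((-value + OFFSET) >> SHIFT)

def derive_chroma_mv(mv_a, mv_d):
    # mv_a, mv_d: (hor, ver) motion vectors of the two diagonal luma sub-blocks
    # (e.g., luma sub-blocks 902A and 902D).
    s_hor = mv_a[0] + mv_d[0]
    s_ver = mv_a[1] + mv_d[1]
    return (round_symmetric(s_hor), round_symmetric(s_ver))

# Example: averaging (5, -3) and (6, -4) gives the chroma motion vector (6, -4).
print(derive_chroma_mv((5, -3), (6, -4)))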

The video decoder 300 may predict values of samples of the luma sub-block and the chroma sub-block using the determined motion vectors. As one example, video decoder 300 may predict respective samples of each luma sub-block of the plurality of luma sub-blocks based on respective values of luma motion vectors using affine motion compensation (1406). For example, the video decoder 300 may predict values of samples of the luma sub-block 902 based on the determined luma motion vector of the luma sub-block 902. As another example, video decoder 300 may predict samples of chroma sub-blocks based on values of chroma motion vectors using affine motion compensation (1408). For example, video decoder 300 may predict values of samples of chroma sub-blocks 904 based on the determined chroma motion vectors of chroma sub-blocks 904.

It will be recognized that, according to an example, some acts or events of any of the techniques described herein can be performed in a different sequence, added, combined, or left out altogether (e.g., not all described acts or events are necessary for the practice of the technique). Further, in some examples, acts or events may be processed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may comprise a computer-readable storage medium, corresponding to a tangible medium such as a data storage medium, or a communication medium, including any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures to implement the techniques described in this disclosure. The computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Rather, as noted above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units, including one or more processors as described above in combination with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.
