Method, device and system for encoding and decoding a tree of blocks of video samples

Publication No.: 621645    Publication date: 2021-05-07

Note: This technology, "Method, device and system for encoding and decoding a tree of blocks of video samples", was created by Christopher James Rosewarne on 2019-06-25. Its abstract is as follows:

A system and method for decoding an encoded block of a coding tree unit in an image frame from a bitstream. The method comprises: receiving an image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame; and determining luma split options for a luma channel of the coding tree unit according to the size of a region of the coding tree unit. The method further comprises: determining chroma split options for a chroma channel of the coding tree unit according to the size of the region, the chroma split options being different from the luma split options, the allowed chroma split options resulting in chroma intra prediction blocks having a minimum size of 16 samples; and decoding the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

1. A method of decoding an encoded block of a coding tree unit in an image frame from a bitstream, the method comprising:

receiving the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame;

determining luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit;

determining chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, the chroma splitting options being different from the luma splitting options, the allowed chroma splitting options resulting in a chroma intra prediction block having a minimum size of 16 samples; and

decoding the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

2. The method of claim 1, wherein the determined chroma splitting options result in a chroma block size that is a multiple of 16 samples for the chroma channel of the image frame.

3. The method of claim 1, wherein the determined luma splitting options result in a luma block size that is a multiple of 16 samples for the luma channel of the image frame.

4. The method of claim 1, wherein the determined luma splitting options result in a luma block size that is a multiple of 16 samples for the luma channel of the image frame, and a chroma block having a width of two samples is encoded by partitioning the block into sub-blocks, each sub-block being 2 × 8 samples in size.

5. The method of claim 1, wherein the determined luma splitting options result in a luma block size that is a multiple of 16 samples for the luma channel of the image frame, and a chroma block having a height of two samples is encoded by partitioning the block into sub-blocks, each sub-block being 8 × 2 samples in size.

6. A non-transitory computer-readable medium having stored thereon a computer program to implement a method of decoding an encoded block of a coding tree unit in an image frame from a bitstream, the program comprising:

code for receiving the image frame, the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame;

code for determining luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit;

code for determining chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, the chroma splitting options being different from the luma splitting options, the allowed chroma splitting options resulting in a chroma intra prediction block having a minimum size of 16 samples; and

code for decoding an encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

7. A video decoder configured to:

receive a coding tree unit of an image frame from a bitstream, the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame;

determine luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit;

determine chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, the chroma splitting options being different from the luma splitting options, the allowed chroma splitting options resulting in a chroma intra prediction block having a minimum size of 16 samples; and

decode the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

8. A system, comprising:

a memory; and

a processor, wherein the processor is configured to execute code stored on the memory to implement a method of decoding an encoded block of a coding tree unit in an image frame from a bitstream, the method comprising:

receiving the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame;

determining luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit;

determining chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, wherein the chroma splitting options are different from the luma splitting options, and the allowed chroma splitting options result in a chroma block having a minimum size of 16 samples; and

decoding the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined allowed chroma split options.

Technical Field

The present invention relates generally to digital video signal processing, and more particularly to methods, devices and systems for encoding and decoding a tree of blocks of video samples. The invention also relates to a computer program product comprising a computer readable medium having recorded thereon a computer program for encoding and decoding a tree of blocks of video samples.

Background

There are currently many applications for video coding, including applications for transmitting and storing video data. Many video coding standards have been developed and others are currently under development. Recent advances in video coding standardization have led to the formation of a group known as the Joint Video Experts Team (JVET). The Joint Video Experts Team (JVET) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardization Sector (ITU-T) of the International Telecommunication Union (ITU), also known as the Video Coding Experts Group (VCEG), and members of the International Organization for Standardization / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).

The Joint Video Experts Team (JVET) published a Call for Proposals (CfP), and the responses were analyzed at its 10th meeting held in San Diego, USA. The submitted responses demonstrated video compression capability significantly better than that of the current state-of-the-art video compression standard, "High Efficiency Video Coding" (HEVC). On the basis of this performance, a project to develop a new video compression standard, named "Versatile Video Coding" (VVC), was commenced. VVC is expected to address the continuing demand for ever higher compression performance and the increasing market demand for service delivery over WANs (where bandwidth costs are relatively high), particularly as the capabilities of video formats increase (e.g. with higher resolution and higher frame rates). At the same time, VVC must be implementable in contemporary silicon processes and offer an acceptable trade-off between achieved performance and implementation cost (e.g. in terms of silicon area, CPU processor load, memory utilization, and bandwidth).

The video data comprises a sequence of frames of image data, each of which includes one or more color channels. Typically, one primary color channel and two secondary color channels are needed. The primary color channel is generally referred to as the "luminance" channel, and the secondary color channels are generally referred to as the "chrominance" channels. Although video data is typically displayed in an RGB (red-green-blue) color space, this color space has a high degree of correlation between its three components. The video data representation seen by an encoder or decoder typically uses a color space such as YCbCr. YCbCr concentrates luminosity (mapped to "luma" according to a transform equation) in the Y (primary) channel and chroma in the Cb and Cr (secondary) channels. Furthermore, the Cb and Cr channels may be spatially sampled at a lower rate than the luma channel, e.g. half the rate horizontally and half the rate vertically, known as the "4:2:0 chroma format". The 4:2:0 chroma format is commonly used in "consumer" applications such as internet video streaming, broadcast television, and storage on Blu-ray™ discs. Sub-sampling the Cb and Cr channels at half the rate horizontally but not vertically is known as the "4:2:2 chroma format". The 4:2:2 chroma format is commonly used in professional applications, including the capture of footage for film production and the like. The higher sampling rate of the 4:2:2 chroma format makes the resulting video more resilient to editing operations such as color grading. 4:2:2 chroma format material is often converted to the 4:2:0 chroma format and then encoded prior to distribution to consumers. In addition to chroma format, video is characterized by resolution and frame rate.
Example resolutions are Ultra High Definition (UHD), with a resolution of 3840 × 2160, or "8K", with a resolution of 7680 × 4320, and example frame rates are 60 Hz or 120 Hz. The luma sampling rate may range from about 500 megasamples per second to several thousand megasamples per second. For the 4:2:0 chroma format, the sampling rate of each chroma channel is one quarter of the luma sampling rate, and for the 4:2:2 chroma format, the sampling rate of each chroma channel is one half of the luma sampling rate.
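The relationship between chroma format, chroma channel dimensions, and per-channel sample counts described above can be sketched as follows (a hypothetical helper for illustration, not part of any codec implementation):

```python
def chroma_dimensions(luma_width, luma_height, chroma_format):
    """Return the (width, height) of each chroma channel for a luma size.

    4:2:0 halves the chroma resolution both horizontally and vertically
    (one quarter of the luma sample count per chroma channel); 4:2:2
    halves it horizontally only (one half of the luma sample count);
    4:4:4 applies no subsampling.
    """
    subsampling = {
        "4:2:0": (2, 2),
        "4:2:2": (2, 1),
        "4:4:4": (1, 1),
    }
    sx, sy = subsampling[chroma_format]
    return luma_width // sx, luma_height // sy

# A UHD (3840 x 2160) frame in the 4:2:0 chroma format: each chroma
# channel carries one quarter of the luma sample count.
w, h = chroma_dimensions(3840, 2160, "4:2:0")
assert w * h == (3840 * 2160) // 4
```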

The VVC standard is a "block-based" codec, in which a frame is first partitioned into an array of square regions known as "coding tree units" (CTUs). The CTUs generally occupy a relatively large area, such as 128 × 128 luma samples. However, the CTUs at the right and bottom edges of each frame may have a smaller area. Associated with each CTU is a "coding tree" for the luma channel and an additional coding tree for the chroma channels. A coding tree defines the decomposition of the area of a CTU into a set of blocks, also referred to as "coding blocks" (CBs). A single coding tree may also specify blocks for both the luma channel and the chroma channels, in which case the blocks are referred to as "coding units" (CUs), each CU having a coding block for each color channel. The CBs are processed in a particular order for encoding or decoding. As a consequence of the use of the 4:2:0 chroma format, a CTU with a luma coding tree for a 128 × 128 luma sample area has a corresponding chroma coding tree for a 64 × 64 chroma sample area, co-located with the 128 × 128 luma sample area. When a single coding tree is in use for both the luma and chroma channels, the collections of co-located blocks for a given area are generally referred to as "units", e.g. the above-mentioned CUs, as well as "prediction units" (PUs) and "transform units" (TUs). When separate coding trees are used for a given area, the above-mentioned CBs are used, as well as "prediction blocks" (PBs) and "transform blocks" (TBs).
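The recursive decomposition of a CTU area into coding blocks can be sketched with a simplified quadtree (hypothetical helper names; this sketch models only quadtree splits, whereas VVC coding trees additionally allow binary and ternary splits):

```python
def quadtree_leaves(x, y, size, split_decision, min_size=4):
    """Recursively decompose a square CTU region into coding blocks.

    split_decision(x, y, size) -> bool decides whether a region is split
    into four equal quadrants; each unsplit region becomes a leaf block,
    reported as an (x, y, size) tuple.
    """
    if size > min_size and split_decision(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_leaves(x + dx, y + dy, half,
                                          split_decision, min_size)
        return leaves
    return [(x, y, size)]

# Splitting every region larger than 32 samples decomposes a 128x128
# CTU into sixteen 32x32 coding blocks.
blocks = quadtree_leaves(0, 0, 128, lambda x, y, s: s > 32)
assert len(blocks) == 16 and all(s == 32 for _, _, s in blocks)
```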

Notwithstanding the above distinction between "units" and "blocks", the term "block" may be used as a general term for an area or region of a frame for which an operation is applied to all color channels.

For each CU, a prediction unit (PU) of the contents (sample values) of the corresponding area of the frame data is generated. Furthermore, a representation of the difference (or "residual" in the spatial domain) between the prediction and the contents of the area as seen at the input to the encoder is formed. The difference in each color channel may be transform coded as a sequence of residual coefficients, forming one or more TUs for a given CU. The applied transform may be a Discrete Cosine Transform (DCT) or another transform, applied to each block of residual values. The primary transform is applied separably, i.e. the two-dimensional transform is performed in two passes. The block is first transformed by applying a one-dimensional transform to each row of samples in the block. Then, the partial result is transformed by applying a one-dimensional transform to each column of the partial result to produce a final block of transform coefficients that substantially decorrelates the residual samples. The VVC standard supports transforms of various sizes, including transforms of rectangular blocks, with each side dimension being a power of two. The transform coefficients are quantized for entropy encoding into the bitstream.
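The two-pass separable transform described above can be sketched as follows. This is a minimal illustration using an orthonormal DCT-II, not the integer transforms actually specified by VVC:

```python
import math

def dct_1d(vec):
    """Orthonormal one-dimensional DCT-II of a sequence of samples."""
    n = len(vec)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(vec))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def transform_2d(block):
    """Apply the 1-D transform to each row, then to each column of the
    partial result, as in the separable primary transform described above."""
    rows = [dct_1d(row) for row in block]          # first pass: rows
    cols = [dct_1d(col) for col in zip(*rows)]     # second pass: columns
    return [list(r) for r in zip(*cols)]           # transpose back

# A flat 4x4 residual block concentrates all of its energy in the DC
# coefficient after the two passes.
coeffs = transform_2d([[1.0] * 4 for _ in range(4)])
assert abs(coeffs[0][0] - 4.0) < 1e-9
```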

When spatial prediction ("intra prediction") is used to generate a PB, a set of reference samples is used to generate the prediction samples for the current PB. The reference samples include samples adjacent to the PB that have already been "reconstructed" (by adding the residual samples to the intra-predicted samples). These adjacent samples form a row above the PB and a column to the left of the PB. The row and column also extend beyond the PB boundary to include additional nearby samples. Since the blocks are scanned in a Z-order scan, some of the reference samples were reconstructed in the immediately preceding block. Using samples from an immediately preceding block creates a feedback dependency that may limit the throughput of blocks through the video encoder or decoder.
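The layout of the reference samples described above, one row above the block and one column to its left, each extended beyond the block boundary, can be sketched as follows (a simplified illustration with hypothetical names; real codecs also substitute for unavailable samples):

```python
def reference_sample_positions(x0, y0, width, height):
    """Coordinates of the intra-prediction reference samples for a PB
    whose top-left sample is at (x0, y0).

    The row above spans from the above-left corner to twice the block
    width; the column to the left extends to twice the block height.
    """
    above = [(x0 + dx, y0 - 1) for dx in range(-1, 2 * width)]
    left = [(x0 - 1, y0 + dy) for dy in range(2 * height)]
    return above, left

# For a 4x4 PB at (8, 8), the above row includes the corner sample
# (7, 7) and holds 2*width + 1 = 9 positions.
above, left = reference_sample_positions(8, 8, 4, 4)
assert (7, 7) in above and len(above) == 9 and len(left) == 8
```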

Disclosure of Invention

It is an object of the present invention to substantially overcome or at least ameliorate one or more disadvantages of existing arrangements.

One aspect of the present invention provides a method of decoding an encoded block of a coding tree unit in an image frame from a bitstream, the method comprising: receiving the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame; determining luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit; determining chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, the chroma splitting options being different from the luma splitting options, the allowed chroma splitting options resulting in a chroma intra prediction block having a minimum size of 16 samples; and decoding the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

According to another aspect, the chroma block size is a multiple of 16 samples for the chroma channel of the image frame.

According to another aspect, the determined luma splitting options result in a luma block size that is a multiple of 16 samples for the luma channel of the image frame.

According to another aspect, a chroma block having a width of two samples is encoded by partitioning the block into sub-blocks, each sub-block being 2 × 8 samples in size.

According to another aspect, a chroma block having a height of two samples is encoded by partitioning the block into sub-blocks, each sub-block being 8 × 2 samples in size.
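The sub-block partitioning described in the two aspects above can be sketched as follows (hypothetical helper names; the 2 × 8 and 8 × 2 sub-block sizes, each holding 16 samples, follow the text):

```python
def partition_narrow_chroma_block(width, height):
    """Split a chroma block with a width or height of two samples into
    16-sample sub-blocks: 2x8 sub-blocks when the block is two samples
    wide, 8x2 sub-blocks when it is two samples tall.  Other shapes are
    returned unsplit in this sketch.
    """
    if width == 2 and height > 8:
        return [(2, 8)] * (height // 8)
    if height == 2 and width > 8:
        return [(8, 2)] * (width // 8)
    return [(width, height)]

# A 2x32 chroma block is coded as four 2x8 sub-blocks, so every
# sub-block holds exactly 16 samples.
assert partition_narrow_chroma_block(2, 32) == [(2, 8)] * 4
```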

Another aspect of the present invention provides a non-transitory computer-readable medium having stored thereon a computer program to implement a method of decoding an encoded block of a coding tree unit in an image frame from a bitstream, the program comprising: code for receiving the image frame, the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame; code for determining luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit; code for determining chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, the chroma splitting options being different from the luma splitting options, the allowed chroma splitting options resulting in a chroma intra prediction block having a minimum size of 16 samples; and code for decoding the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

Another aspect of the present invention provides a video decoder configured to: receive a coding tree unit of an image frame from a bitstream, the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame; determine luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit; determine chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, the chroma splitting options being different from the luma splitting options, the allowed chroma splitting options resulting in a chroma intra prediction block having a minimum size of 16 samples; and decode the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined chroma split options.

Another aspect of the invention provides a system comprising: a memory; and a processor, wherein the processor is configured to execute code stored on the memory to implement a method of decoding an encoded block of a coding tree unit in an image frame from a bitstream, the method comprising: receiving the image frame having a chroma format in which a chroma channel of the image frame is sub-sampled relative to a luma channel of the image frame; determining luma splitting options for a luma channel of the coding tree unit according to a size of a region of the coding tree unit; determining chroma splitting options for a chroma channel of the coding tree unit according to the size of the region, wherein the chroma splitting options are different from the luma splitting options, and the allowed chroma splitting options result in a chroma block having a minimum size of 16 samples; and decoding the encoded block of the coding tree unit by determining a flag from the bitstream to select one of the determined luma split options and one of the determined allowed chroma split options.

Other aspects are also disclosed.

Drawings

At least one embodiment of the invention will now be described with reference to the following drawings and appendices, in which:

FIG. 1 is a schematic block diagram illustrating a video encoding and decoding system;

FIGS. 2A and 2B constitute a schematic block diagram of a general-purpose computer system in which one or both of the video encoding and decoding systems of FIG. 1 may be practiced;

FIG. 3 is a schematic block diagram showing functional modules of a video encoder;

FIG. 4 is a schematic block diagram showing functional modules of a video decoder;

FIG. 5 is a schematic block diagram illustrating the available divisions of a block into one or more blocks in the tree structure of versatile video coding;

FIG. 6 is a schematic illustration of a data flow to achieve the permitted divisions of a block into one or more blocks in a tree structure of versatile video coding;

FIGS. 7A and 7B illustrate an example division of a Coding Tree Unit (CTU) into multiple Coding Units (CUs);

FIG. 8 is a diagram showing a set of transform block sizes and associated scan patterns;

FIG. 9 is a diagram showing rules for generating a list of allowed splits in a luma coding tree and a chroma coding tree;

FIG. 10 is a flow diagram of a method for encoding a coding tree of image frames into a video bitstream;

FIG. 11 is a flow diagram of a method for decoding a coding tree of an image frame from a video bitstream;

FIG. 12 is a flow diagram of a method for encoding luma and chroma coding trees for an image frame into a video bitstream; and

FIG. 13 is a flow diagram of a method for decoding luma and chroma coding trees for an image frame from a video bitstream.

Detailed Description

Where reference is made to steps and/or features having the same reference number in any one or more of the figures, those steps and/or features have the same function(s) or operation(s) for the purposes of this specification unless the contrary intention appears.

As described above, using samples from an immediately preceding block creates a feedback dependency that may limit the throughput of blocks through a video encoder or decoder. Methods for mitigating the severity of the resulting feedback dependency loop are desirable to ensure that the high rate of block processing required by typical real-time encoding and decoding applications can be maintained. Feedback dependency loops are especially problematic at the high sampling rates of modern video formats, e.g. from about 500 megasamples per second to several thousand megasamples per second.

Fig. 1 is a schematic block diagram illustrating functional modules of a video encoding and decoding system 100. The system 100 may use different rules for the permitted subdivisions of regions in the luma and chroma coding trees in order to reduce the worst-case block processing rate encountered. For example, the system 100 may operate such that blocks are always a multiple of 16 (sixteen) samples in size, regardless of the aspect ratio of the block. Residual coefficient coding may also accommodate block sizes that are multiples of 16 samples, including cases where a block has a width or height of two samples.
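The minimum-size rule on chroma blocks can be illustrated by filtering candidate split options according to the size of the sub-regions they would produce. This is a minimal sketch: the split names and the candidate list are illustrative, not the actual VVC coding-tree syntax:

```python
# Candidate split options and the (width, height) divisors of the
# sub-regions each one produces.
SPLITS = {
    "none": [(1, 1)],
    "quad": [(2, 2)] * 4,
    "horizontal_binary": [(1, 2)] * 2,
    "vertical_binary": [(2, 1)] * 2,
}

def allowed_chroma_splits(region_width, region_height, min_samples=16):
    """Return the split options for which every resulting chroma block
    contains at least min_samples samples, per the minimum-size rule
    described above."""
    allowed = []
    for name, parts in SPLITS.items():
        sizes = [(region_width // dw) * (region_height // dh)
                 for dw, dh in parts]
        if all(s >= min_samples for s in sizes):
            allowed.append(name)
    return allowed

# An 8x4 chroma region (32 samples) may stay unsplit or be split once
# into two 16-sample halves, but a quad split would yield 8-sample
# blocks and is therefore disallowed.
assert "quad" not in allowed_chroma_splits(8, 4)
assert "none" in allowed_chroma_splits(8, 4)
```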

System 100 includes a source device 110 and a destination device 130. Communication channel 120 is used to communicate encoded video information from source device 110 to destination device 130. In some configurations, one or both of source device 110 and destination device 130 may each comprise a mobile telephone handset or "smartphone," where in this case, communication channel 120 is a wireless channel. In other configurations, source device 110 and destination device 130 may comprise video conferencing equipment, where in this case, communication channel 120 is typically a wired channel such as an internet connection or the like. Furthermore, source device 110 and destination device 130 may comprise any of a wide range of devices, including devices that support over-the-air television broadcasts, cable television applications, internet video applications (including streaming), and applications that capture encoded video data on some computer-readable storage medium, such as a hard drive in a file server, etc.

As shown in FIG. 1, source device 110 includes a video source 112, a video encoder 114, and a transmitter 116. The video source 112 typically comprises a source of captured video frame data (denoted 113), such as a camera sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote camera sensor. The video source 112 may also be the output of a computer graphics card (e.g., the video output of various applications that display the operating system and execute on a computing device, such as a tablet computer). Examples of source devices 110 that may include a camera sensor as the video source 112 include smart phones, video camcorders, professional video cameras, and web video cameras.

The video encoder 114 converts (or "encodes") the captured frame data (indicated by arrow 113) from the video source 112 into a bitstream (indicated by arrow 115). The bitstream 115 is transmitted by the transmitter 116 as encoded video data (or "encoded video information") via the communication channel 120. The bit stream 115 may also be stored in a non-transitory storage device 122, such as a "flash" memory or hard drive, until subsequently transmitted over the communication channel 120 or as an alternative to transmission over the communication channel 120.

Destination device 130 includes a receiver 132, a video decoder 134, and a display device 136. Receiver 132 receives encoded video data from communication channel 120 and passes the received video data as a bitstream (indicated by arrow 133) to video decoder 134. The video decoder 134 then outputs the decoded frame data (indicated by arrow 135) to the display device 136. The decoded frame data 135 has the same chroma format as the frame data 113. Examples of display device 136 include a cathode ray tube, a liquid crystal display (such as in a smart phone, a tablet computer, a computer monitor, or a stand-alone television, etc.). The respective functions of the source device 110 and the destination device 130 may also be embodied in a single device, examples of which include a mobile telephone handset and a tablet computer.

Although example apparatuses are described above, source apparatus 110 and destination apparatus 130 may each be configured within a general purpose computer system, typically via a combination of hardware and software components. Fig. 2A illustrates such a computer system 200, the computer system 200 comprising: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227 which may be configured as a video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and speakers 217. The computer module 201 may use an external modulator-demodulator (modem) transceiver device 216 to communicate with the communication network 220 via connection 221. The communication network 220, which may represent the communication channel 120, may be a Wide Area Network (WAN), such as the internet, a cellular telecommunications network, or a private WAN. Where connection 221 is a telephone line, modem 216 may be a conventional "dial-up" modem. Alternatively, where the connection 221 is a high capacity (e.g., cable or optical) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communication network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may be embodied in the wiring 221.

The computer module 201 typically includes at least one processor unit 205 and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces, including: an audio-video interface 207 that couples to the video display 214, the loudspeakers 217 and the microphone 280; an I/O interface 213 that couples to the keyboard 202, the mouse 203, the scanner 226, the camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and the printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in FIG. 2A, the local communications network 222 may also couple to the wide area network 220 via a connection 224, which would typically include a so-called "firewall" device or a device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practised for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may also be embodied in the local communications network 222.

The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices, such as a floppy disk drive and a magnetic tape drive (not illustrated), may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives and floppy disks, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, the optical disk drive 212, and the networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 130 of the system 100 may be embodied in the computer system 200.

The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and the optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.

The video encoder 114 and video decoder 134, and the methods described below, may be implemented using the computer system 200 where appropriate or desired. In particular, the video encoder 114, the video decoder 134 and the method to be described may be implemented as one or more software applications 233 executable within the computer system 200. In particular, the video encoder 114, the video decoder 134, and the steps of the method are implemented with instructions 231 (see fig. 2B) in software 233 that are executed within the computer system 200. The software instructions 231 may be formed as one or more modules of code, each for performing one or more particular tasks. It is also possible to divide the software into two separate parts, wherein a first part and a corresponding code module perform the method and a second part and a corresponding code module manage the user interface between the first part and the user.

For example, the software may be stored in a computer-readable medium including a storage device described below. The software is loaded into the computer system 200 from a computer-readable medium and then executed by the computer system 200. A computer-readable medium with such software or a computer program recorded on the computer-readable medium is a computer program product. The use of this computer program product in computer system 200 preferably enables advantageous apparatus for implementing video encoder 114, video decoder 134, and the described methods.

The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer-readable medium and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.

In some instances, the application 233 is supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively, the application 233 may be read by the user from the network 220 or 222. Still further, the software may also be loaded into the computer system 200 from other computer-readable media. A computer-readable storage medium refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tapes, CD-ROMs, DVDs, Blu-ray Discs™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer-readable card such as a PCMCIA card, etc., regardless of whether such devices are internal or external to the computer module 201. Examples of transitory or non-tangible computer-readable transmission media that may also participate in providing software, applications, instructions, and/or video data or encoded video data to the computer module 201 include: radio or infrared transmission channels and network connections to other computers or networked devices, and the Internet or intranets including e-mail transmissions and information recorded on websites and the like.

The second portion of the application 233 and the corresponding code modules described above can be executed to implement one or more Graphical User Interfaces (GUIs) to be rendered or otherwise presented on the display 214. Through manipulation of, typically, the keyboard 202 and the mouse 203, a user of the computer system 200 and the applications can operate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interface may also be implemented, such as an audio interface utilizing voice prompts output via the speaker 217 and user voice commands input via the microphone 280.

Fig. 2B is a detailed schematic block diagram of processor 205 and "memory" 234. The memory 234 represents a logical aggregation of all memory modules (including the HDD 209 and the semiconductor memory 206) that can be accessed by the computer module 201 in fig. 2A.

With the computer module 201 initially powered on, a power-on self-test (POST) program 250 is executed. The POST program 250 is typically stored in the ROM 249 of the semiconductor memory 206 of fig. 2A. A hardware device such as the ROM 249 in which software is stored is sometimes referred to as firmware. The POST program 250 checks the hardware within the computer module 201 to ensure proper operation, and typically checks the processor 205, the memory 234 (209, 206), and the basic input-output system software (BIOS) module 251, which is also typically stored in the ROM 249, for proper operation. Once the POST program 250 has run successfully, the BIOS 251 boots the hard disk drive 210 of fig. 2A. Booting the hard disk drive 210 causes the boot loader 252 resident on the hard disk drive 210 to be executed via the processor 205. This loads the operating system 253 into the RAM memory 206, upon which the operating system 253 begins operation. The operating system 253 is a system-level application executable by the processor 205 to implement various high-level functions including processor management, memory management, device management, storage management, software application interfaces, and a general-purpose user interface.

The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of fig. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible to the computer system 200 and how such memory is used.

As shown in fig. 2B, the processor 205 includes a plurality of functional modules including a control unit 239, an Arithmetic Logic Unit (ALU) 240, and a local or internal memory 248, sometimes referred to as a cache memory. The cache memory 248 typically includes a plurality of storage registers 244-246 in a register section. One or more internal buses 241 functionally interconnect these functional modules. The processor 205 also typically has one or more interfaces 242 for communicating with external devices via the system bus 204 using connections 218. The memory 234 is connected to the bus 204 using connections 219.

The application 233 includes the instruction sequence 231, which may include conditional branch instructions and loop instructions. The program 233 may also include data 232 used when executing the program 233. The instructions 231 and the data 232 are stored in memory locations 228-230 and 235-237, respectively. Depending on the relative sizes of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 230. Alternatively, an instruction may be split into multiple portions that are each stored in separate memory locations, as depicted by the instruction segments shown in the memory locations 228 and 229.

Typically, the processor 205 is given a set of instructions, which are executed within the processor 205. The processor 205 then waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a plurality of sources, including data generated by one or more of the input devices 202, 203, data received from an external source via one of the networks 220, 222, data retrieved from one of the storage devices 206, 209, or data retrieved from a storage medium 225 inserted into the corresponding reader 212 (all shown in fig. 2A). The execution of a set of instructions may in some cases result in the output of data. Execution may also involve storing data or variables to the memory 234.

The video encoder 114, the video decoder 134, and the method may use the input variables 254 stored in respective memory locations 255, 256, 257 within the memory 234. The video encoder 114, video decoder 134, and the method generate output variables 261 stored in respective memory locations 262, 263, 264 within the memory 234. Intermediate variables 258 may be stored in memory locations 259, 260, 266, and 267.

Referring to the processor 205 of FIG. 2B, registers 244, 245, 246, Arithmetic Logic Unit (ALU)240, and control unit 239 work together to perform a sequence of micro-operations required to perform "fetch, decode, and execute" cycles for each of the instructions in the instruction set that make up the program 233. Each fetch, decode, and execute cycle includes:

a fetch operation to fetch or read instruction 231 from memory locations 228, 229, 230;

a decode operation in which the control unit 239 determines which instruction is fetched; and

an execution operation in which control unit 239 and/or ALU 240 execute the instruction.

Thereafter, further fetch, decode, and execute cycles for the next instruction may be performed. Also, a memory cycle may be performed by which the control unit 239 stores or writes values to the memory locations 232.

Each step or sub-process in the methods of figs. 10-13, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.

Fig. 3 is a schematic block diagram showing functional modules of the video encoder 114. Fig. 4 is a schematic block diagram showing functional modules of the video decoder 134. Typically, data is passed between functional modules within the video encoder 114 and the video decoder 134 in groups of samples or coefficients (such as divisions of blocks into fixed-size sub-blocks) or as arrays. As shown in figs. 2A and 2B, the video encoder 114 and the video decoder 134 may be implemented using a general-purpose computer system 200, wherein the various functional modules may be implemented using dedicated hardware within the computer system 200, or using software executable within the computer system 200, such as one or more software code modules of the software application 233 residing on the hard disk drive 210 and controlled in execution by the processor 205. Alternatively, the video encoder 114 and the video decoder 134 may be implemented with a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114, the video decoder 134, and the methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits that perform the functions or sub-functions of the methods. Such dedicated hardware may include a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Standard Product (ASSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or one or more microprocessors and associated memory. In particular, the video encoder 114 includes the modules 310-386.

Although the video encoder 114 of fig. 3 is an example of a versatile video coding (VVC) video encoding pipeline, other video codecs may also be used to perform the processing stages described herein. The video encoder 114 receives captured frame data 113, such as a series of frames (each frame including one or more color channels). The frame data 113 may be in a 4:2:0 chroma format or a 4:2:2 chroma format. The block partitioner 310 first partitions the frame data 113 into CTUs, which are generally square in shape and are configured such that a particular size of CTU is used. For example, the size of the CTUs may be 64×64, 128×128, or 256×256 luminance samples. The block partitioner 310 further partitions each CTU into one or more CBs according to the luma coding tree and the chroma coding tree. The CBs have a variety of sizes and may include both square and non-square aspect ratios. The operation of the block partitioner 310 is further described with reference to fig. 10. However, in the VVC standard, the CBs, CUs, PUs, and TUs always have side lengths that are powers of two. Thus, a current CB, denoted 312, is output from the block partitioner 310, progressing in accordance with an iteration over the one or more blocks of the CTU, in accordance with the luma coding tree and the chroma coding tree of the CTU. The options for partitioning a CTU into CBs are further explained below with reference to figs. 5 and 6.
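
The initial division of a frame into CTUs described above can be sketched as follows. This is an illustrative sketch only; the function name and the 1920×1080 frame size used in the example are assumptions, not taken from the description:

```python
def ctu_grid(frame_width, frame_height, ctu_size=128):
    """Return (x, y, w, h) rectangles covering the frame with CTUs.

    CTUs along the right and bottom edges are clipped to the frame
    boundary when the frame size is not a multiple of the CTU size.
    """
    ctus = []
    for y in range(0, frame_height, ctu_size):
        for x in range(0, frame_width, ctu_size):
            w = min(ctu_size, frame_width - x)
            h = min(ctu_size, frame_height - y)
            ctus.append((x, y, w, h))
    return ctus

# A 1920x1080 frame with 128x128 CTUs gives a 15x9 grid of 135 CTUs,
# the bottom row being clipped to 1080 - 8 * 128 = 56 samples high.
grid = ctu_grid(1920, 1080, 128)
```

The returned list is in raster scan order, matching the CTU scan order used for grouping into slices.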

The CTUs resulting from the first segmentation of the frame data 113 may be scanned in raster scan order and may be grouped into one or more "slices". A slice may be an "intra" (or "I") slice. An intra slice (I slice) indicates that every CU in the slice is intra-predicted. Alternatively, a slice may be uni-predictive or bi-predictive (a "P" or "B" slice, respectively), indicating the additional availability of uni-prediction and bi-prediction in the slice, respectively.

For each CTU, the video encoder 114 operates in two phases. In the first phase (referred to as the "search" phase), the block partitioner 310 tests various potential configurations of the coding tree. Each potential configuration of the coding tree has associated "candidate" CBs. The first phase involves testing various candidate CBs to select a CB that provides high compression efficiency and low distortion. The testing typically involves a Lagrangian optimization whereby candidate CBs are evaluated based on a weighted combination of rate (coding cost) and distortion (error with respect to the input frame data 113). The "best" candidate CB (the CB with the lowest evaluated rate/distortion) is selected for subsequent encoding in the bitstream 115. Included in the evaluation of a candidate CB are the options of using a CB for a given region, or further splitting the region according to various splitting options and encoding each smaller resulting region with further CBs, or splitting those regions further still. As a result, both the CBs and the coding tree itself are selected in the search phase.
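
The Lagrangian evaluation used in the search phase can be sketched as follows; the candidate (distortion, rate) pairs and the lambda value are hypothetical numbers for illustration:

```python
def rd_cost(distortion, rate_bits, lmbda):
    """Lagrangian cost J = D + lambda * R used to compare candidates."""
    return distortion + lmbda * rate_bits

def select_best(candidates, lmbda):
    """Return the (distortion, rate) candidate with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[0], c[1], lmbda))

# Three hypothetical candidate CBs, ranging from low-rate/high-distortion
# to high-rate/low-distortion.  With lambda = 2.0 the middle one wins.
cands = [(1000, 40), (800, 120), (600, 300)]
best = select_best(cands, lmbda=2.0)   # -> (800, 120)
```

A larger lambda shifts the selection toward cheaper (lower-rate) candidates, which is how the rate/distortion trade-off is steered.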

Video encoder 114 generates a Prediction Block (PB), indicated by arrow 320, for each CB (e.g., CB 312). The PB 320 is a prediction of the content of the associated CB 312. The subtractor module 322 produces a difference (or "residual", which means the difference is in the spatial domain) between PB 320 and CB 312, denoted 324. The difference 324 is a block-sized array of differences between corresponding samples in the PB 320 and the CB 312. The difference 324 is transformed, quantized and represented as a Transform Block (TB) indicated by arrow 336. PB 320 and associated TB 336 are typically selected from one of a plurality of possible candidate CBs, e.g., based on evaluated cost or distortion.

A candidate Coding Block (CB) is a CB derived from one of the prediction modes available to video encoder 114 for the associated PB and the resulting residual. Each candidate CB results in one or more corresponding TBs, as described below with reference to fig. 8. TB 336 is a quantized and transformed representation of difference 324. When combined with the predicted PB in the video decoder 134, the TB 336 reduces the difference between the decoded CB and the original CB 312, at the expense of additional signaling in the bitstream.

Thus, each candidate Coding Block (CB), i.e. the combination of a Prediction Block (PB) and a Transform Block (TB), has an associated coding cost (or "rate") and an associated difference (or "distortion"). The rate is typically measured in bits. The distortion of a CB is typically estimated as a difference in sample values, such as the Sum of Absolute Differences (SAD) or the Sum of Squared Differences (SSD). The mode selector 386 uses the difference 324 arising from each candidate PB to determine the intra prediction mode with the lowest estimated cost (represented by arrow 388). The coding cost associated with each candidate prediction mode and the corresponding residual coding can be estimated at a significantly lower cost than entropy coding the residual. Thus, a number of candidate modes may be evaluated to determine an optimum mode in a rate-distortion sense.
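
The SAD and SSD distortion estimates mentioned above can be sketched as follows, with illustrative sample values:

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two same-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def ssd(block_a, block_b):
    """Sum of Squared Differences between two same-sized blocks."""
    return sum((a - b) ** 2 for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

original = [[10, 12], [14, 16]]
predicted = [[11, 10], [14, 20]]
# sad = 1 + 2 + 0 + 4 = 7; ssd = 1 + 4 + 0 + 16 = 21
```

SSD penalizes large individual sample errors more heavily than SAD, which is why the two measures can rank candidates differently.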

Determining the best mode in a rate-distortion sense is typically achieved using a variant of Lagrangian optimization. Selecting the intra-prediction mode 388 generally involves determining the coding cost of the residual data resulting from applying a particular intra-prediction mode. The coding cost can be approximated by using the "sum of absolute transformed differences" (SATD), whereby the estimated transformed residual cost is obtained using a relatively simple transform such as the Hadamard transform. In some implementations using relatively simple transforms, the cost resulting from the simplified estimation method is monotonically related to the actual cost that would otherwise be determined from a full evaluation. In implementations with monotonically related estimated costs, the simplified estimation method may be used to make the same decision (i.e., intra prediction mode) with reduced complexity in the video encoder 114. To allow for possible non-monotonicity in the relationship between the estimated cost and the actual cost, the simplified estimation method may be used to generate a list of best candidates. For example, the non-monotonicity may result from further mode decisions available in the coding of the residual data. The list of best candidates may have an arbitrary number of entries. A more complete search may be performed using the best candidates to establish the optimal mode choices for encoding the residual data of each candidate, allowing a final selection of the intra prediction mode along with other mode decisions.
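
The SATD estimate can be sketched for a 4×4 residual block using an unnormalized Hadamard transform; this illustrates the approximation principle only and is not a transform definition taken from any standard:

```python
def hadamard1d(v):
    """Length-4 unnormalized Hadamard transform via butterflies."""
    a = [v[0] + v[2], v[1] + v[3], v[0] - v[2], v[1] - v[3]]
    return [a[0] + a[1], a[0] - a[1], a[2] + a[3], a[2] - a[3]]

def satd4x4(residual):
    """SATD of a 4x4 residual: 2-D Hadamard, then sum of magnitudes."""
    rows = [hadamard1d(r) for r in residual]              # rows first
    cols = [hadamard1d([rows[i][j] for i in range(4)])    # then columns
            for j in range(4)]
    return sum(abs(c) for col in cols for c in col)

# A flat residual of all ones concentrates into a single coefficient,
# so its SATD is 16 rather than the raw sample sum scaled by 16.
flat = [[1, 1, 1, 1]] * 4
```

The Hadamard transform needs only additions and subtractions, which is what makes SATD much cheaper than evaluating the full forward transform and entropy coding.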

Other mode decisions include the ability to skip the forward transform (referred to as "transform skip"). Skipping the transform is suited to residual data that lacks sufficient correlation to reduce coding cost via expression as transform basis functions. Certain types of content, such as relatively simple computer-generated graphics, may exhibit such behavior. For a "transform skip" block, the residual coefficients are encoded without the transform itself being applied.

Both the selection of CTU to CB partitions (using block partitioner 310) and the selection of the best prediction mode from among multiple possibilities may be done using lagrangian or similar optimization process. The intra prediction mode with the lowest cost measure is selected as the best mode by applying a lagrangian optimization process of the candidate modes in the mode selector module 386. The best mode is the selected intra prediction mode 388 and is also encoded in the bitstream 115 by the entropy encoder 338. The selection of the intra prediction mode 388 by operation of the mode selector module 386 extends to the operation of the block partitioner 310. For example, the selected candidates for the intra prediction mode 388 may include a mode applicable to a given block and an additional mode applicable to a plurality of smaller blocks that are co-located with the given block as a whole. In the case of a pattern including blocks applicable to a given block and smaller co-located blocks, the process of selection of candidates is implicitly also the process of determining the optimal hierarchical decomposition of the CTU into CBs.

In a second phase of operation of the video encoder 114, referred to as the "encoding" phase, iterations of the selected luma coding tree and the selected chroma coding tree (and thus of each selected CB) are performed in the video encoder 114. As described further herein, in an iteration, CBs are encoded in the bitstream 115.

The entropy encoder 338 supports both variable-length coding of syntax elements and arithmetic coding of syntax elements. Arithmetic coding is supported using a context-adaptive binary arithmetic coding process. Arithmetically coded syntax elements consist of sequences of one or more "bins". Like bits, bins have a value of "0" or "1". However, bins are not encoded as discrete bits in the bitstream 115. Bins have an associated predicted (or "likely" or "most probable") value and an associated probability, known as a "context". When the actual bin to be coded matches the predicted value, a "most probable symbol" (MPS) is coded. Coding a most probable symbol is relatively inexpensive in terms of consumed bits. When the actual bin to be coded does not match the likely value, a "least probable symbol" (LPS) is coded. Coding a least probable symbol has a relatively high cost in terms of consumed bits. These bin coding techniques enable efficient coding of bins whose probability of "0" versus "1" is skewed. For a syntax element with two possible values (i.e., a "flag"), a single bin is sufficient. For syntax elements with many possible values, a sequence of bins is required.

The presence of later bins in the sequence may be determined based on the values of earlier bins in the sequence. In addition, each bin may be associated with more than one context. The selection of a particular context may depend on earlier bins in the syntax element, the bin values of neighboring syntax elements (i.e., bin values from neighboring blocks), and the like. Each time a context-coded bin is encoded, the context selected for that bin (if any) is updated in a manner that reflects the new bin value. As such, the binary arithmetic coding scheme is said to be adaptive.
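
The adaptation principle can be illustrated with a toy context model. Real CABAC contexts use table-driven probability state transitions; the simple exponential update below is an assumption made only to show how a run of equal bin values skews a context:

```python
class ContextModel:
    """Toy adaptive context tracking the probability of a '1' bin."""

    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one    # current estimate of P(bin == 1)
        self.rate = rate      # adaptation speed (illustrative value)

    def mps(self):
        """Most probable symbol implied by the current estimate."""
        return 1 if self.p_one >= 0.5 else 0

    def update(self, bin_value):
        """Move the probability estimate toward the value just coded."""
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)

ctx = ContextModel()
for _ in range(20):       # a run of '1' bins skews the context
    ctx.update(1)
# ctx.mps() is now 1 and ctx.p_one has risen above 0.8
```

Once the context is skewed, coding the most probable symbol becomes cheap, which is the source of the compression gain described above.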

Video encoder 114 also supports context-less bins ("bypass bins"). The bypass bin is encoded assuming an equal probability distribution between "0" and "1". Thus, each bin occupies one bit in the bitstream 115. The absence of context saves memory and reduces complexity, thus using bypass bins whose distribution of values for a particular bin is not skewed. One example of an entropy encoder that employs context and adaptation is known in the art as a CABAC (context adaptive binary arithmetic encoder), and many variations of this encoder are employed in video coding.

The entropy encoder 338 encodes the intra prediction mode 388 using a combination of context-coded and bypass-coded bins. Typically, a list of "most probable modes" is generated in the video encoder 114. The list of most probable modes is typically of a fixed length, such as three or six modes, and may include modes encountered in earlier blocks. A context-coded bin encodes a flag indicating whether the intra prediction mode is one of the most probable modes. If the intra prediction mode 388 is one of the most probable modes, further signaling is encoded using bypass-coded bins. For example, the encoded further signaling uses a truncated unary bin string to indicate which most probable mode corresponds to the intra prediction mode 388. Otherwise, the intra prediction mode 388 is encoded as a "remaining mode". Encoding as a remaining mode uses an alternative syntax, such as a fixed-length code, also encoded using bypass-coded bins, to represent intra prediction modes other than those present in the most probable mode list.
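
The most probable mode signaling structure described above can be sketched as follows. The returned strings only illustrate the bin structure (an MPM flag, then a truncated unary index or a fixed-length remainder); a real encoder arithmetically codes these bins, and the 67-mode count and the example MPM list are assumptions:

```python
def encode_intra_mode(mode, mpm_list, num_modes=67):
    """Return an illustrative bin string for an intra prediction mode."""
    if mode in mpm_list:
        idx = mpm_list.index(mode)
        # Truncated unary: idx '1' bins, then a terminating '0'
        # (the terminator is omitted for the final index).
        tu = '1' * idx + ('0' if idx < len(mpm_list) - 1 else '')
        return '1' + tu                      # MPM flag = 1
    # Remaining mode: re-index to skip the MPM entries, then use a
    # fixed-length code wide enough for the remaining modes.
    rem = mode - sum(1 for m in mpm_list if m < mode)
    width = (num_modes - len(mpm_list) - 1).bit_length()
    return '0' + format(rem, '0{}b'.format(width))

# With a hypothetical MPM list [0, 50, 18]:
# encode_intra_mode(50, [0, 50, 18]) -> '110'     (flag 1, index 1)
# encode_intra_mode(5,  [0, 50, 18]) -> '0000100' (flag 0, remainder 4)
```

Modes in the MPM list cost only a few bins, while remaining modes pay a fixed-length penalty, mirroring the cost asymmetry described above.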

The multiplexer module 384 outputs the PB 320 according to the determined best intra prediction mode 388 selected from the tested prediction modes of the candidate CBs. The candidate prediction modes need not include every conceivable prediction mode supported by video encoder 114.

Prediction modes are roughly classified into two categories. The first category is "intra-frame prediction" (also referred to as "intra prediction"). In intra prediction, a prediction of a block is generated, and the generation method may use other samples obtained from the current frame. For intra prediction PB, it is possible to use different intra prediction modes for luminance and chrominance, and thus intra prediction is described mainly in terms of operation on PB.

The second category of prediction modes is "inter-frame prediction" (also referred to as "inter prediction"). In inter prediction, samples from one or two frames preceding the current frame in the order of the encoded frames in the bitstream are used to generate a prediction of the block. Furthermore, for inter prediction, a single coding tree is typically used for both the luminance and chrominance channels. The order of the encoded frames in the bitstream may be different from the order of the frames when captured or displayed. When one frame is used for prediction, the block is said to be "uni-predicted" and has one associated motion vector. When two frames are used for prediction, the block is said to be "bi-predicted" and has two associated motion vectors. For P slices, each CU may be intra predicted or uni-predicted. For B slices, each CU may be intra predicted, uni-predicted, or bi-predicted. Frames are typically encoded using a "group of pictures" structure, thereby achieving a temporal hierarchy of frames. The temporal hierarchy of frames allows a frame to reference preceding and subsequent pictures in the order in which the frames are displayed. The pictures are encoded in the order necessary to ensure that the dependencies for decoding each frame are satisfied.

A sub-category of inter prediction is referred to as "skip mode". Inter prediction and skip mode are described as two distinct modes. However, both the inter prediction mode and the skip mode involve motion vectors referencing blocks of samples from previous frames. Inter prediction involves encoding a motion vector delta (delta) to specify a motion vector relative to a motion vector predictor. The motion vector predictor is obtained from a list of one or more candidate motion vectors selected with a "merge index". The coded motion vector delta provides a spatial offset relative to the selected motion vector prediction. Inter prediction also uses an encoded residual in the bitstream 115. The skip mode uses only an index (also referred to as a "merge index") to select one from several motion vector candidates. The selected candidate is used without any further signaling. Also, skip mode does not support encoding any residual coefficients. The absence of coded residual coefficients when using skip mode means that no transform is required for skip mode. Therefore, skip mode does not generally cause the pipeline processing problems that may arise for intra-predicted CUs and inter-predicted CUs. Due to the limited signaling of the skip mode, the skip mode is useful for achieving very high compression performance when relatively high quality reference frames are available. Bi-predicted CUs in higher temporal layers of a random access group-of-pictures structure typically have high quality reference pictures and motion vector candidates that accurately reflect the underlying motion.

Samples are selected according to the motion vector and the reference picture index. The motion vector and reference picture index apply to all color channels, and thus inter prediction is described mainly in terms of operation on PUs rather than PBs. Within each category (i.e., intra and inter prediction), different techniques may be applied to generate the PU. For example, intra prediction may use values from adjacent rows and columns of previously reconstructed samples, in combination with a direction, to generate a PU according to a prescribed filtering and generation process. Alternatively, the PU may be described using a small number of parameters. Inter prediction methods may vary in the number of motion parameters and their accuracy. The motion parameters typically include reference frame indices (which indicate which reference frames from the reference frame lists will be used, plus the respective spatial translations of the reference frames), but may include more frames, special frames, or complex affine parameters such as scaling and rotation. In addition, a predetermined motion refinement process may be applied to generate dense motion estimates based on the referenced sample blocks.

In the case where the best PB 320 is determined and selected and the PB 320 is subtracted from the original sample block at subtractor 322, the residue with the lowest coding cost is obtained (denoted 324) and lossy compressed. The lossy compression process includes the steps of transform, quantization and entropy coding. The transform module 326 applies a forward transform to the difference 324, thereby converting the difference 324 to the frequency domain and producing transform coefficients represented by arrow 332. The forward transform is typically separable, such that a set of rows and then a set of columns of each block are transformed. The transformation of the sets of rows and columns is performed by first applying a one-dimensional transformation to the rows of the block to produce a partial result and then applying a one-dimensional transformation to the columns of the partial result to produce a final result.
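
The row-then-column application of a separable forward transform described above can be sketched as follows; the tiny 2-point basis in the example is illustrative only and is not the transform of any codec:

```python
def transform1d(vec, basis):
    """Apply a 1-D transform given as a list of basis rows."""
    return [sum(b * v for b, v in zip(row, vec)) for row in basis]

def transform2d(block, basis):
    """Separable 2-D transform: rows first, then columns of the
    partial result, as described for the forward transform."""
    n = len(block)
    partial = [transform1d(row, basis) for row in block]
    cols = [transform1d([partial[i][j] for i in range(n)], basis)
            for j in range(n)]
    # Transpose back so result[i][j] holds (row frequency i,
    # column frequency j).
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# With the 2-point basis [[1, 1], [1, -1]] the DC coefficient collects
# the sum of all samples:
# transform2d([[1, 2], [3, 4]], [[1, 1], [1, -1]]) -> [[10, -2], [-4, 0]]
```

Separability is what keeps the cost manageable: two passes of 1-D transforms instead of one full 2-D matrix multiplication.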

The transform coefficients 332 are passed to a quantizer module 334. In the quantizer module 334, quantization according to a "quantization parameter" is performed to produce residual coefficients represented by arrow 336. The quantization parameter is constant for a given TB, thus resulting in a uniform scaling in the production of the residual coefficients for the TB. Non-uniform scaling may also be achieved by applying a "quantization matrix", whereby the scaling factor applied to each residual coefficient is derived from a combination of the quantization parameter and a corresponding entry in a scaling matrix of size generally equal to the size of the TB. The residual coefficients 336 are supplied to the entropy encoder 338 to be encoded in the bitstream 115. Typically, the residual coefficients of each TB of a TU having at least one significant residual coefficient are scanned according to a scan pattern to generate an ordered list of values. The scan pattern typically scans the TB as a sequence of 4×4 "sub-blocks", providing a regular scanning operation at the granularity of 4×4 sets of residual coefficients, with the arrangement of the sub-blocks dependent on the size of the TB. In addition, the prediction mode 388 and the corresponding block partitioning are also encoded in the bitstream 115.
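
Uniform scaling controlled by a quantization parameter can be sketched as follows. The step size doubling every six QP values mirrors HEVC/VVC behavior, but the exact scaling constants and rounding offsets of the standards are omitted, so this is an assumption-laden sketch:

```python
def qp_step(qp):
    """Quantization step size, doubling every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Uniform scalar quantization of a block of transform coefficients."""
    step = qp_step(qp)
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(levels, qp):
    """Inverse of the uniform scaling (up to rounding loss)."""
    step = qp_step(qp)
    return [[level * step for level in row] for row in levels]

# At QP 22 the step size is exactly 8.0:
# quantize([[80, -13]], 22)  -> [[10, -2]]
# dequantize([[10, -2]], 22) -> [[80.0, -16.0]]
```

The rounding loss visible in the example (-13 reconstructed as -16.0) is the source of the lossy part of the compression pipeline.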

As described above, the video encoder 114 needs access to a frame representation corresponding to the frame representation seen in the video decoder 134. Thus, the residual coefficients 336 are also inverse quantized by the dequantizer module 340 to produce inverse transform coefficients represented by arrow 342. The inverse transform coefficients 342 pass through the inverse transform module 348 to generate residual samples of the TU, represented by arrow 350. The summation module 352 adds the residual samples 350 and the PB 320 to generate reconstructed samples (indicated by arrow 354) of the CU.

Reconstructed samples 354 are passed to a reference sample cache 356 and an in-loop filter module 368. The reference sample cache 356, which is typically implemented using static RAM on an ASIC (thus avoiding expensive off-chip memory accesses), provides the minimum sample storage needed to satisfy the dependencies for generating the intra-frame PBs for subsequent CUs in the frame. The minimal dependencies typically include a "line buffer" of samples along the bottom of a row of CTUs for use by the next row of CTUs, as well as column buffering whose extent is set by the height of the CTU. The reference sample cache 356 supplies reference samples (represented by arrow 358) to a reference sample filter 360. The reference sample filter 360 applies a smoothing operation to generate filtered reference samples (indicated by arrow 362). The filtered reference samples 362 are used by an intra-prediction module 364 to generate an intra-predicted block of samples represented by arrow 366. For each candidate intra prediction mode, the intra-prediction module 364 generates a block of samples 366.

The in-loop filter module 368 applies several filtering stages to the reconstructed samples 354. The filtering stage includes a "deblocking filter" (DBF) that applies smoothing aligned with CU boundaries to reduce artifacts created by discontinuities. Another filtering stage present in the in-loop filter module 368 is an "adaptive loop filter" (ALF) that applies a Wiener-based adaptive filter to further reduce distortion. Another available filtering stage in the in-loop filter module 368 is a "sample adaptive offset" (SAO) filter. The SAO filter operates by first classifying reconstructed samples into one or more classes, and applying an offset at the sample level according to the assigned class.

The filtered samples represented by arrow 370 are output from the in-loop filter module 368. Filtered samples 370 are stored in frame buffer 372. Frame buffer 372 typically has the capacity to store several (e.g., up to 16) pictures, and thus is stored in memory 206. Frame buffer 372 is typically not stored using on-chip memory due to the large memory consumption required. As such, access to frame buffer 372 is expensive in terms of memory bandwidth. Frame buffer 372 provides reference frames (represented by arrow 374) to motion estimation module 376 and motion compensation module 380.

The motion estimation module 376 estimates a plurality of "motion vectors" (denoted 378), each of which is a Cartesian spatial offset relative to the position of the current CB, referencing a block in one of the reference frames in the frame buffer 372. A filtered block of reference samples (denoted 382) is generated for each motion vector. The filtered reference samples 382 form further candidate modes available for potential selection by the mode selector 386. Furthermore, for a given CU, the PU 320 may be formed using one reference block ("uni-prediction") or may be formed using two reference blocks ("bi-prediction"). For the selected motion vector, the motion compensation module 380 generates the PB 320 according to a filtering process that supports sub-pixel precision in the motion vector. As such, the motion estimation module 376 (which operates on many candidate motion vectors) may use a simplified filtering process to achieve reduced computational complexity, as compared to the motion compensation module 380 (which operates on only the selected candidate).

Although the video encoder 114 of fig. 3 is described with reference to versatile video coding (VVC), other video coding standards or implementations may employ the processing stages of modules 310-386. The frame data 113 (and the bitstream 115) may also be read from (or written to) memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray disc™, or other computer-readable storage medium. In addition, the frame data 113 (and the bitstream 115) may be received from (or transmitted to) an external source, such as a server or a radio frequency receiver connected to the communication network 220.

A video decoder 134 is shown in fig. 4. Although the video decoder 134 of fig. 4 is an example of a versatile video coding (VVC) video decoding pipeline, other video codecs may also be used to perform the processing stages described herein. As shown in fig. 4, the bitstream 133 is input to the video decoder 134. The bitstream 133 can be read from memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray disc, or other non-transitory computer-readable storage medium. Alternatively, the bitstream 133 can be received from an external source, such as a server or a radio frequency receiver connected to the communication network 220. The bitstream 133 contains encoded syntax elements representing the captured frame data to be decoded.

The bitstream 133 is input to the entropy decoder module 420. The entropy decoder module 420 extracts syntax elements from the bitstream 133 and passes the values of the syntax elements to other modules in the video decoder 134. The entropy decoder module 420 applies a CABAC algorithm to decode syntax elements from the bitstream 133. The decoded syntax elements are used to reconstruct parameters within video decoder 134. The parameters include residual coefficients (represented by arrow 424) and mode selection information (represented by arrow 458) such as intra prediction mode. The mode selection information also includes information such as motion vectors, and partitioning of each CTU into one or more CBs. The parameters are used to generate the PB, typically in combination with sample data from previously decoded CBs.

The residual coefficients 424 are input to a dequantizer module 428. The dequantizer module 428 dequantizes (or "scales") the residual coefficients 424 according to the quantization parameters to create reconstructed transform coefficients (represented by arrow 440). If the use of a non-uniform inverse quantization matrix is indicated in the bitstream 133, the video decoder 134 reads the quantization matrix from the bitstream 133 as a sequence of scaling factors and arranges the scaling factors into a matrix. The inverse scaling uses the quantization matrix in combination with the quantization parameters to create the reconstructed transform coefficients 440.
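As a rough illustration of the inverse scaling just described, the sketch below multiplies each decoded level by the quantization step and an optional per-coefficient scaling factor. This is a simplified model, not the specification's exact procedure: the function name `dequantize` and the linear step-size argument are assumptions, whereas the actual VVC scaling is defined via QP-derived tables and bit shifts.

```python
def dequantize(levels, qstep, scaling_matrix=None):
    """Scale decoded residual levels back into transform coefficients.

    levels: quantized residual coefficients from the entropy decoder.
    qstep: quantization step size derived from the quantization parameter.
    scaling_matrix: optional per-coefficient factors from a non-uniform
    quantization matrix signaled in the bitstream (None => uniform scaling).
    """
    reconstructed = []
    for i, level in enumerate(levels):
        weight = scaling_matrix[i] if scaling_matrix is not None else 1
        reconstructed.append(level * qstep * weight)
    return reconstructed
```

For example, with a uniform matrix, `dequantize([2, 0, -1], 8)` yields `[16, 0, -8]`.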

The reconstructed transform coefficients 440 are passed to an inverse transform module 444. Module 444 transforms the coefficients from the frequency domain back to the spatial domain. The TB is effectively composed of the significant residual coefficients and the non-significant (zero-valued) residual coefficient values. The result of the operation of module 444 is a block of residual samples, represented by arrow 448. The residual samples 448 are equal in size to the corresponding CU. The residual samples 448 are supplied to a summation module 450. At the summation module 450, the residual samples 448 are added to the decoded PB (denoted 452) to produce a block of reconstructed samples, represented by arrow 456. The reconstructed samples 456 are supplied to a reconstructed sample cache 460 and an in-loop filter module 488. The in-loop filter module 488 produces a reconstructed block of frame samples denoted 492. The frame samples 492 are written to a frame buffer 496.

The reconstructed sample cache 460 operates in a similar manner to the reconstructed sample cache 356 of the video encoder 114. The reconstructed sample cache 460 provides storage for reconstructed samples needed for intra-prediction of subsequent CBs without requiring access to the memory 206 (e.g., by instead using the data 232, which is typically on-chip memory). The reference samples, represented by arrow 464, are obtained from the reconstructed sample cache 460 and supplied to a reference sample filter 468 to produce filtered reference samples, represented by arrow 472. The filtered reference samples 472 are supplied to an intra-prediction module 476. Module 476 generates a block of intra-predicted samples, represented by arrow 480, according to the intra-prediction mode parameters 458, which are represented in the bitstream 133 and decoded by the entropy decoder 420.

When intra prediction is indicated in bitstream 133 for the current CB, the intra predicted samples 480 form decoded PB 452 via multiplexer module 484.

When inter prediction is indicated in the bitstream 133 for the current CB, a motion compensation module 434 selects and filters a block of samples from the frame buffer 496 using a motion vector and a reference frame index to produce a block of inter-predicted samples, denoted 438. The block of samples 438 is obtained from a previously decoded frame stored in the frame buffer 496. For bi-prediction, two blocks of samples are generated and blended together to produce samples for the decoded PB 452. The frame buffer 496 is filled with filtered block data 492 from the in-loop filter module 488. As with the in-loop filter module 368 of the video encoder 114, the in-loop filter module 488 applies any or all of the DBF, ALF, and SAO filtering operations. The in-loop filter module 488 generates the filtered block data 492 from the reconstructed samples 456.

Fig. 5 is a schematic block diagram illustrating a set 500 of available divisions or splits of a region into one or more sub-regions in the tree structure of versatile video coding. As described with reference to fig. 3, the divisions shown in the set 500 may be utilized by the block partitioner 310 of the encoder 114 to divide each CTU into one or more CUs or CBs according to a coding cost determined by Lagrangian optimization.

Although the set 500 only shows the division of a square region into other, possibly non-square, sub-regions, it should be understood that the set 500 illustrates the potential divisions and does not require the containing region to be square. If the containing region is non-square, the dimensions of the blocks resulting from the division are scaled according to the aspect ratio of the containing block. Once a region is not further split, i.e., at a leaf node of the coding tree, a CU occupies that region. The particular subdivision of a CTU into one or more CUs by the block partitioner 310 is referred to as the "coding tree" of the CTU. The process of subdividing regions into sub-regions must terminate when the resulting sub-regions reach the minimum CU size. In addition to constraining CUs to be no smaller than, for example, 4 × 4 in size, CUs are constrained to have a minimum width or height of four. Other minimums, in terms of both width and height or in terms of width or height, are also possible. The subdivision process may also terminate before the deepest level of decomposition, resulting in CUs larger than the minimum CU size. It is possible for no splitting to occur, resulting in a single CU occupying the entire CTU. A single CU occupying the entire CTU is the largest available coding unit size. Moreover, a CU that is not split may be larger than the processing region size. CU sizes such as 64 × 128, 128 × 64, 32 × 128, and 128 × 32 are possible as a result of binary or ternary splits at the top level of the coding tree, each also being larger than the processing region size. An example of CUs larger than the processing region size is described further with reference to fig. 10.

In case there is no further sub-division, there is a CU at a leaf node of the coding tree. For example, leaf node 510 contains one CU. At non-leaf nodes of the coding tree, there is a split to two or more other nodes, where each node may contain a leaf node (and thus one CU), or a further split to a smaller area.

As shown in FIG. 5, a quadtree split 512 divides the containing region into four equal-sized regions. Compared to HEVC, versatile video coding (VVC) achieves additional flexibility with the addition of a horizontal binary split 514 and a vertical binary split 516. The splits 514 and 516 each divide the containing region into two equal-sized regions. The division occurs along either a horizontal boundary (514) or a vertical boundary (516) within the containing block.

Further flexibility is achieved in versatile video coding with the addition of a ternary horizontal split 518 and a ternary vertical split 520. The ternary splits 518 and 520 divide the containing region into three regions, bounded at 1/4 and 3/4 of the containing region's height (518) or width (520). The combination of the quadtree, binary tree, and ternary tree is referred to as "QTBTTT". The root of the tree includes zero or more quadtree splits (the "QT" section of the tree). Once the QT section terminates, zero or more binary or ternary splits may occur (the "multi-tree" or "MT" section of the tree), finally ending in CBs or CUs at leaf nodes of the tree. Where the tree describes all color channels, the leaf nodes are CUs. Where the tree describes only the luminance channel or only the chrominance channels, the leaf nodes are CBs.
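The five split types can be summarized by the sub-region sizes they produce. The following sketch (illustrative only; the function name and the mode labels are assumptions, not terms from the source) computes the sub-region dimensions for a containing region of width `w` and height `h`:

```python
def split_subregions(w, h, mode):
    """Return the list of (width, height) sub-regions produced by a split."""
    if mode == "QT":    # quadtree split 512: four equal quadrants
        return [(w // 2, h // 2)] * 4
    if mode == "HBT":   # horizontal binary split 514: two halves, stacked
        return [(w, h // 2)] * 2
    if mode == "VBT":   # vertical binary split 516: two halves, side by side
        return [(w // 2, h)] * 2
    if mode == "HTT":   # ternary horizontal split 518: bounded at 1/4 and 3/4 of height
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "VTT":   # ternary vertical split 520: bounded at 1/4 and 3/4 of width
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError("unknown split mode: " + mode)
```

For example, a vertical ternary split of a 32 × 32 region yields sub-regions of 8 × 32, 16 × 32, and 8 × 32, consistent with the 1/4, 1/2, 1/4 partitioning.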

The QTBTTT results in many more possible CU sizes than HEVC, which supports only quadtrees and thus only square blocks, especially in view of the possible recursive application of binary tree and/or ternary tree splits. The likelihood of unusual (non-square) block sizes can be reduced by constraining the split options to eliminate splits that would result in a block width or height either less than four samples or not a multiple of four samples. Generally, the constraint applies when considering luma samples. However, in the arrangements described, the constraint may also be applied separately to the blocks of the chroma channels, and applying the constraint to the split options of the chroma channels can result in different minimum block sizes for luma versus chroma, for example when the frame data is in the 4:2:0 or 4:2:2 chroma format. Each split produces sub-regions with a side dimension that is either unchanged, halved, or quartered relative to the containing region. Then, since the CTU size is a power of two, the side dimensions of all CUs are also powers of two.

Fig. 6 is a schematic flow diagram of a data stream 600 showing a QTBTTT (or "coding tree") structure used in general video coding. A QTBTTT structure is used for each CTU to define the partitioning of the CTU into one or more CUs. The QTBTTT structure of each CTU is determined by the block partitioner 310 in the video encoder 114 and encoded into the bitstream 115 or decoded from the bitstream 133 by the entropy decoder 420 in the video decoder 134. The data stream 600 further characterizes the allowable combinations available to the block partitioner 310 for partitioning a CTU into one or more CUs according to the partitioning shown in fig. 5.

Starting from the top level of the hierarchy, i.e., at the CTU, zero or more quadtree splits are first performed. Specifically, a quadtree (QT) split decision 610 is made by the block partitioner 310. A "1" symbol returned at the decision 610 indicates a decision to split the current node into four sub-nodes according to the quadtree split 512. The result is the generation of four new nodes, such as at 620, and for each new node, recursion back to the QT split decision 610 occurs. Each new node is considered in raster (or Z-scan) order. Alternatively, if the QT split decision 610 indicates that no further split is to be performed (a "0" symbol is returned), quadtree partitioning stops and multi-tree (MT) splits are then considered.

First, an MT split decision 612 is made by the block partitioner 310. The decision 612 indicates whether an MT split is to be performed. A "0" symbol returned at the decision 612 indicates that no further splitting of the node into sub-nodes is to be performed. If a node is not to be split further, the node is a leaf node of the coding tree and corresponds to a CU. The leaf node is output at 622. Alternatively, if the MT split 612 indicates a decision to perform an MT split (a "1" symbol is returned), the block partitioner 310 proceeds to a direction decision 614.

The direction decision 614 indicates the direction of MT split as horizontal ("H" or "0") or vertical ("V" or "1"). If decision 614 returns a "0" indicating the horizontal direction, then block partitioner 310 proceeds to decision 616. If decision 614 returns a "1" indicating the vertical direction, then the block partitioner 310 proceeds to decision 618.

In each of decisions 616 and 618, the number of partitions for MT split is indicated as two (binary split or "BT" node) or three (ternary split or "TT") at BT/TT split. That is, when the direction indicated from 614 is horizontal, a BT/TT split decision 616 is made by the block partitioner 310, and when the direction indicated from 614 is vertical, a BT/TT split decision 618 is made by the block partitioner 310.

The BT/TT split decision 616 indicates whether the horizontal split is a binary split 514, indicated by returning a "0", or a ternary split 518, indicated by returning a "1". When the BT/TT split decision 616 indicates a binary split, at a generate HBT CTU node step 625, the block partitioner 310 generates two nodes according to the horizontal binary split 514. When the BT/TT split 616 indicates a ternary split, the block partitioner 310 generates three nodes according to the ternary horizontal split 518 at a generate HTT CTU node step 626.

The BT/TT split decision 618 indicates whether the vertical split is a binary split 516, indicated by returning a "0", or a ternary split 520, indicated by returning a "1". When the BT/TT split 618 indicates a binary split, the block partitioner 310 generates two nodes according to the vertical binary split 516 at a generate VBT CTU node step 627. When the BT/TT split 618 indicates a ternary split, the block partitioner 310 generates three nodes according to the vertical ternary split 520 at a generate VTT CTU node step 628. For each node resulting from steps 625-628, the recursion of the data flow 600 back to the MT split decision 612 is applied in left-to-right or top-to-bottom order, depending on the direction 614. As a result, binary tree and ternary tree splits may be applied to generate CUs of various sizes.
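The decisions 610-618 can be modeled as a recursive parse of a decision-symbol sequence. The sketch below is an illustration only, not the VVC syntax: the function name, the list-of-integers input, and the tuple/string tree representation are assumptions. It reconstructs a nested coding tree from the symbols in the order the data flow 600 visits them:

```python
def parse_coding_tree(symbols, qt_stage=True):
    """Consume decision symbols per the data flow 600; leaves are "CU"."""
    if qt_stage:
        if symbols.pop(0) == 1:          # QT split decision 610: split in four
            return ("QT", [parse_coding_tree(symbols, True) for _ in range(4)])
        # a "0" at decision 610 falls through to the MT stage
    if symbols.pop(0) == 0:              # MT split decision 612: stop splitting
        return "CU"                      # leaf node, output at 622
    direction = symbols.pop(0)           # direction decision 614: 0 = H, 1 = V
    btt = symbols.pop(0)                 # BT/TT decision 616/618: 0 = BT, 1 = TT
    n = 2 if btt == 0 else 3
    label = ("H" if direction == 0 else "V") + ("BT" if btt == 0 else "TT")
    # children are in the MT stage: no further quadtree splits are allowed
    return (label, [parse_coding_tree(symbols, False) for _ in range(n)])
```

For example, the symbol sequence `[0, 1, 0, 0, 0, 0]` (no QT split, then a horizontal binary split into two leaf CUs) parses to `("HBT", ["CU", "CU"])`.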

The sets of allowed and disallowed splits at each node of the coding tree are further described with reference to fig. 9.

Fig. 7A and 7B provide an example division 700 of a CTU 710 into multiple CUs or CBs. An example CU 712 is shown in fig. 7A. Fig. 7A shows the spatial arrangement of CUs in the CTU 710. The example division 700 is also shown as a coding tree 720 in fig. 7B.

At each non-leaf node (e.g., nodes 714, 716, and 718) in the CTU 710 of fig. 7A, the contained nodes (which may be further divided or may be CUs) are scanned or traversed in "Z-order" to create lists of nodes, represented as columns in the coding tree 720. For a quadtree split, the Z-order scan results in the order top-left, top-right, bottom-left, bottom-right. For horizontal and vertical splits, the Z-order scan (traversal) simplifies to a top-to-bottom scan and a left-to-right scan, respectively. The coding tree 720 of fig. 7B lists all the nodes and CUs according to the applied scan order. Each split generates a list of two, three, or four new nodes at the next level of the tree until a leaf node (CU) is reached.
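The traversal described above can be sketched as a depth-first walk that emits leaves in scan order. This is a sketch only; the tuple representation of non-leaf nodes and the use of strings for leaves are assumptions for illustration:

```python
def scan_order(node):
    """List the CUs of a coding tree in Z / scan order.

    A node is either a leaf (any string naming a CU) or a tuple
    (split_type, children), with the children already ordered
    top-left, top-right, bottom-left, bottom-right for quadtree splits,
    and top-to-bottom / left-to-right for horizontal / vertical splits.
    """
    if isinstance(node, str):
        return [node]
    _, children = node
    out = []
    for child in children:
        out.extend(scan_order(child))
    return out
```

For example, a quadtree whose second quadrant is further binary-split lists its CUs in the order the coding tree 720 would: `scan_order(("QT", ["a", ("HBT", ["b", "c"]), "d", "e"]))` yields `["a", "b", "c", "d", "e"]`.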

In the case where the image is decomposed into CTUs and further into CUs by the block partitioner 310 as described with reference to fig. 3, and each residual block is generated using the CUs (324), the residual block is forward transformed and quantized by the video encoder 114. The TB 336 thus obtained is then scanned to form an ordered list of residual coefficients as part of the operation of the entropy coding module 338. Equivalent processing is done in the video decoder 134 to obtain the TB from the bitstream 133.

The examples of fig. 7A and 7B describe coding trees applicable to both the luminance channel and the chrominance channels. However, the examples of fig. 7A and 7B also illustrate the behavior when traversing a coding tree that applies only to the luminance channel or a coding tree that applies only to the chrominance channels. For coding trees with many nested splits, the available split options at the deeper levels are constrained by limits on the available block sizes for the corresponding small regions. Limits on the available block sizes of small regions are imposed to prevent the worst-case block processing rate from being so high as to impose an unreasonable burden on implementations. In particular, constraining block sizes to be multiples of 16 (sixteen) samples in chroma enables processing samples at a granularity of 16 (sixteen) samples. Constraining block sizes to multiples of sixteen samples is particularly relevant to the "intra reconstruction" feedback loop (i.e., the path involving modules 450, 460, 468, 476, and 484 in the video decoder 134 of fig. 4, and the equivalent path in the video encoder 114). In particular, constraining block sizes to multiples of 16 (sixteen) samples helps maintain throughput in intra-prediction modes. For example, "single instruction, multiple data" (SIMD) microprocessor architectures typically operate on wide words that may contain 16 samples. Further, hardware architectures may use wide buses (such as a bus with a width of 16 samples) to transfer samples along the intra-reconstruction feedback loop. If a smaller block size is used (e.g., four samples), the bus is under-utilized, e.g., only one quarter of the bus width contains sample data.
Although an under-utilized bus can handle smaller blocks (i.e., fewer than sixteen samples), in worst-case scenarios (such as many or all blocks having a relatively small size), the under-utilization may prevent real-time operation of the encoder (114) or decoder (134). For inter prediction, each block depends on reference samples obtained from a frame buffer (such as buffer 372 or 496). Since the frame buffer is populated with reference samples when the previous frame is processed, there is no feedback dependency loop affecting the block-by-block operation of generating inter-predicted blocks. In addition to the feedback dependency loop associated with intra reconstruction, there is a further, concurrent feedback loop associated with determining the intra-prediction mode 458. The intra-prediction mode 458 is determined by selecting a mode from a most-probable-mode list or selecting a mode from a remaining-mode list. Determining the most-probable-mode list and the remaining-mode list requires the intra-prediction modes of neighboring blocks. When relatively small block sizes are used, the most-probable-mode list and the remaining-mode list need to be determined more frequently, i.e., at a frequency governed by the block size in samples and the sampling rate of the channel.

Fig. 8 is a diagram illustrating a set 800 of transform block sizes and associated scan patterns for the chroma channels resulting from use of the 4:2:0 chroma format. The set 800 may also be used for the 4:2:2 chroma format. The described arrangements are suitable for image frames having a chroma format in which the chrominance channels of the image frame are sub-sampled with respect to the luminance channel, particularly the 4:2:0 and 4:2:2 formats. The set 800 does not include all possible chroma transform block sizes. Only chroma transform blocks with a width less than or equal to sixteen and a height less than or equal to eight are shown in fig. 8. Chroma blocks with larger widths and heights may occur but are not shown in fig. 8 for ease of reference. For the case where the coding tree is shared between the luminance and chrominance channels, the additional chroma transform sizes are 2 × 16, 4 × 16, 8 × 16, 16 × 16, and 32 × 32. For the case where the coding tree for the chroma channels is separate from the coding tree for the luma channel (a "dual coding tree"), the following additional chroma transform sizes are also available: 2 × 32, 4 × 32, 8 × 32, 16 × 32, 32 × 2, 32 × 4, 32 × 8, and 32 × 16. Nevertheless, the set 800 demonstrates a method for scanning a TB that can be applied similarly to scan larger TBs.

A set of prohibited transform sizes 810 includes the transform block sizes 2 × 2, 2 × 4, and 4 × 2, all of which have areas of fewer than sixteen samples. In other words, a minimum transform size of 16 (sixteen) chroma samples results from the operation of the described arrangements, in particular for intra-predicted CBs. Instances of the prohibited transform sizes 810 are avoided by the determination of split options described with reference to fig. 9. The residual coefficients in a transform are scanned in a two-level manner, with the transform divided into "sub-blocks" (or "coefficient groups"). Scanning occurs along a scan path from the last significant (non-zero) coefficient back toward the DC (top-left) coefficient. The scan path is defined as a progression within each sub-block (the "lower level") and from one sub-block to the next (the "upper level"). In the set 800, the 8 × 2 TB 820 uses an 8 × 2 sub-block, i.e., a sub-block containing sixteen residual coefficients. The 2 × 8 TB 822 uses a 2 × 8 sub-block, which also contains sixteen residual coefficients.
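The choice of sub-block shape described above and in the following paragraphs always yields sixteen coefficients per sub-block. A minimal sketch of the selection (the function name is an assumption, and the rule is simplified to the cases discussed here):

```python
def subblock_size(tb_w, tb_h):
    """Select the sub-block (coefficient group) shape for a chroma TB.

    TBs with a width of two use 2x8 sub-blocks, TBs with a height of two
    use 8x2 sub-blocks, and all TBs with width and height >= 4 use 4x4
    sub-blocks. Every shape holds sixteen residual coefficients.
    """
    if tb_w == 2:
        return (2, 8)
    if tb_h == 2:
        return (8, 2)
    return (4, 4)
```

For example, the 16 × 2 TB 816 uses 8 × 2 sub-blocks, and every returned shape contains 8 × 2 = 2 × 8 = 4 × 4 = 16 coefficients.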

TBs with a width or height of two and the other dimension a multiple of eight use multiple 2 × 8 or 8 × 2 sub-blocks. Thus, chroma blocks having a width of two samples are encoded using a division of the block into sub-blocks each of size 2 × 8 samples, and chroma blocks having a height of two samples are encoded using a division into sub-blocks each of size 8 × 2 samples. For example, the 16 × 2 TB 816 has two 8 × 2 sub-blocks, each scanned as shown for the TB 820. The progression of the scan from one sub-block to the next is shown in the sub-block progression 817.

A 2 x 32TB (not shown in fig. 8) uses four 2 x 8 sub-blocks arranged in a one-by-four array. The residual coefficients in each sub-block are scanned as shown for the 2 x 8TB 822, with the sub-blocks proceeding from the lowest to the highest sub-block of the one-by-four array.

Larger TBs follow a similar scan progression. For all TBs with both width and height greater than or equal to four, a 4 × 4 sub-block scan is used. For example, the 4 × 8 TB 823 uses a 4 × 4 sub-block scan 824 with a progression from the lower sub-block to the upper sub-block. The 4 × 4 TB 825 is scanned in a similar manner. The 8 × 8 TB 829 uses the progression 830 across four 4 × 4 sub-blocks. In all cases, the scan within a sub-block and the progression from sub-block to sub-block follow an inverse diagonal scan, i.e., the scan proceeds from the "last" significant residual coefficient of the TB back toward the top-left residual coefficient. Fig. 8 also shows the scan order across, for example, the 8 × 4 TB 832, the 16 × 4 TB 834, and the 16 × 8 TB 836. Furthermore, depending on the position of the last significant coefficient along the scan path, only the sub-blocks from the sub-block containing the last significant residual coefficient back to the top-left residual coefficient need to be scanned. Sub-blocks further along the scan path in the forward direction (i.e., closer to the bottom-right of the block) need not be scanned. The set 800, and in particular the prohibited transform sizes 810, imposes limits on the ability to split a region (or node) of the coding tree into sub-regions (or sub-nodes), as described with reference to fig. 9.

In VVC systems using 2 × 2, 2 × 4, and 4 × 2 TBs (the group of TBs 810), 2 × 2 sub-blocks could be employed for TBs that are two samples in width and/or height. As described above, use of the TBs 810 increases the throughput constraints in the intra-reconstruction feedback dependency loop. Furthermore, using sub-blocks with only four coefficients increases the difficulty of parsing residual coefficients at higher throughput. In particular, for each sub-block, a "significance map" indicates the significance of each residual coefficient contained in it. Encoding a one-valued significance flag establishes the magnitude of a residual coefficient as at least one, and encoding a zero-valued flag establishes the magnitude of the residual coefficient as zero. The residual coefficient magnitude (offset from one) and sign are encoded only for "significant" residual coefficients. For the DC coefficient, no significance flag is coded and the magnitude (offset from zero) is always coded. High-throughput encoders and decoders may need to encode or decode multiple significance bins per clock cycle to maintain real-time operation. The difficulty of encoding and decoding multiple bins per cycle increases when the inter-bin dependencies are strong (e.g., when smaller sub-block sizes are used). In the system 100, the sub-block size is sixteen coefficients regardless of the block size (although the sub-block containing the last significant coefficient is excluded).

Fig. 9 is a diagram illustrating a set of rules 900 for generating a list of allowed splits in a luma coding tree and a chroma coding tree when intra prediction is used. For a particular frame (including the first frame of a sequence of frames), all blocks use intra prediction. Other frames may allow for a mix of inter-predicted blocks and intra-predicted blocks. Although the complete set of available splits of the coding tree has been described with reference to fig. 6, the limitation on the available transform size imposes constraints on the particular split option for a given region size. As described below, the splitting options of the luminance channel and the chrominance channel are determined according to the size of the region corresponding to the coding tree unit.

Since VVC allows different coding trees to be used for luma samples and chroma samples, the allowable split options for chroma differ from those for luma. The set of rules 900 is accordingly divided into a set of rules 920 for chroma regions and a set of rules 910 for luma regions. Showing separate rules for the luma coding tree and the chroma coding tree enables different sets of transform blocks to be used for the luma channel and the chroma channels. In particular, the sets of blocks available for the luminance channel and the chrominance channels are not required to be related by the chroma format. When traversing the nodes of a coding tree, the list of allowed splits is obtained by checking the availability of a set of split options against the region size. Split options resulting in regions that can themselves be coded using a CB are added to the list of allowed splits. For a region to be coded using a CB, the region size must be such that the region can be coded with an integer number of transforms of a particular size from the set 800. The particular size is selected as the largest size not exceeding the size of the region (considering both width and height). As such, for smaller regions a single transform is used, and where the region size exceeds the size of the largest available transform, the largest available transform is tiled to occupy the entire region.

When a chroma region is processed with the set 920, an initial list of split options is generated. Each split option is tested against the region size to determine whether the split option would result in a sub-region of a prohibited size, i.e., smaller than the sizes of the transforms of the set 800. Split options whose resulting sub-regions have allowed sizes, i.e., sizes matching an integer number of the minimum transform sizes of the set 800, are added to the allowed chroma split list 970.

For example, in the case of QT mode (corresponding to decision 610 of fig. 6), if a region has a size of 8 × 8 in the 4:2:0 format or 8 × 16 in the 4:2:2 format, then quad-tree splitting is not allowed, since splitting would result in a transform size of 2 × 2 or 2 × 4 for the chroma channels, respectively. The allowable region size is indicated by arrow 921. Similarly, other allowable splits for chroma rule set 920 are indicated by arrows 922, 923, 924, 925, and 926 and are discussed below with respect to fig. 10 and 11.
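The chroma rule illustrated by the 8 × 8 quadtree example amounts to rejecting any split whose smallest resulting chroma block would fall below sixteen samples. The sketch below is a simplified illustration, not the rule set 920 itself: the names, the split table, and the area-only test are assumptions (the prohibited sizes 2 × 2, 2 × 4, and 4 × 2 all have areas below sixteen, so an area test suffices for this illustration). Region sizes are given in luma samples, as in the text.

```python
MIN_CHROMA_AREA = 16  # minimum chroma intra-prediction block of 16 samples

# distinct sub-region sizes (in luma samples) produced by each split type
SPLITS = {
    "QT":  lambda w, h: [(w // 2, h // 2)],
    "HBT": lambda w, h: [(w, h // 2)],
    "VBT": lambda w, h: [(w // 2, h)],
    "HTT": lambda w, h: [(w, h // 4), (w, h // 2)],
    "VTT": lambda w, h: [(w // 4, h), (w // 2, h)],
}
# (horizontal, vertical) chroma subsampling factors per chroma format
SUBSAMPLE = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}

def chroma_split_allowed(w, h, split, fmt):
    """True if every chroma sub-block from this split has >= 16 samples."""
    sx, sy = SUBSAMPLE[fmt]
    for luma_w, luma_h in SPLITS[split](w, h):
        if (luma_w // sx) * (luma_h // sy) < MIN_CHROMA_AREA:
            return False
    return True
```

For example, quadtree-splitting an 8 × 8 region in the 4:2:0 format is rejected (each chroma sub-block would be 2 × 2, i.e., four samples), while quadtree-splitting a 16 × 16 region is allowed (4 × 4 chroma sub-blocks of sixteen samples).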

The region size of the chrominance channels is described in terms of a grid of luminance samples. For example, when a 4:2:0 chroma format is used, the 8 × 4 region corresponds to a 4 × 2 transform for the chroma channel. When the 4:2:2 chroma format is used, the 8 × 4 region corresponds to a 4 × 4 transform in chroma. When the 4:4:4 chroma format is used, the chroma is not subsampled with respect to the luma, so the transform size in chroma corresponds to the region size.

The allowable luma splits involve a different minimum size constraint, whereby 4 × 4 is not allowed. Although a 4 × 4 luma PB meets the requirement of being a multiple of 16 samples, for video in the 4:2:0 chroma format the sampling rate in luma is four times that of each chroma channel. Even though a 4 × 4 luma prediction block does not result in bus under-utilization (e.g., in SIMD architectures or bus architectures with a width of 16 samples), the intra-reconstruction feedback loop and the intra-prediction mode determination feedback loop are difficult to operate at such relatively high sampling rates. Prohibiting 4 × 4 blocks in the luminance channel reduces the severity of the feedback loops to a level at which implementation at high sampling rates is feasible. Similarly to the rule set 920, the allowable splits in the set 910 are illustrated by arrows 901-906 and are used to generate a list 972 of allowable splits. The allowable split options are further described below with respect to figs. 10 and 11.

Fig. 10 illustrates a method 1000 of encoding a coding tree of an image frame into a video bitstream. As described with reference to fig. 12, the method 1000 is performed for each of a luma coding tree and a chroma coding tree, such that each coding tree of a CTU is determined and the resulting coding tree is encoded into the bitstream 115. The method 1000 may be embodied by an apparatus such as a configured FPGA, ASIC, ASSP, or the like. Additionally, the method 1000 may be performed by the video encoder 114 under execution of the processor 205. As such, the method 1000 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1000 is invoked for each of the luma and chroma coding trees of each CTU, beginning at a generate initial split options step 1010, where the "current node" (or current region) is the root of the luma or chroma coding tree, i.e., the region occupying the entire CTU. When the block partitioner 310 receives the frame data 113, the video encoder 114 performs the method 1000 for each of the luma and chroma coding trees.

At the generate initial split options step 1010, the processor 205 generates the split options for the current node of the coding tree. The split options are generated for either the luma channel or the chroma channel, depending on the invocation of the method 1000. Initially, the coding tree is in the quadtree (QT) phase, where the only allowed options are a quadtree split (split 512 of fig. 5) or a stop of splitting (see 510 of fig. 5). Furthermore, for frames or slices encoded to use only intra-predicted blocks, the luma and chroma coding trees include a quadtree split at their root nodes. As a result, for a 128 × 128 CTU using the 4:2:0 chroma format, the maximum luma intra prediction CB is 64 × 64 and the maximum chroma intra prediction CB is 32 × 32. For slices of frames encoded to use either or both of intra-predicted and inter-predicted blocks, the luma and chroma coding trees need not include a quadtree split at their root nodes. However, an intra prediction CB is not allowed to cross the boundaries of the 64 × 64 luma sample grid. When quadtree splitting has stopped, the coding tree is considered to be in the multi-tree (MT) phase, corresponding to decision 612 of fig. 6. In the multi-tree phase, the split options are (i) to stop splitting (i.e., 510), in which case the region corresponding to the current node is encoded using a CB, or (ii) to continue splitting. As initial split options, binary and ternary splits in both the horizontal and vertical directions may be used (splits 514-520 of fig. 5). As a result of step 1010, a list of all possible splits for the coding tree phase (i.e., QT or MT) is created. Control in the processor 205 proceeds from step 1010 to a determine chroma format step 1020.
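The staging of the candidate splits at step 1010 can be sketched as follows. The option names and the `initial_split_options` helper are illustrative, not VVC syntax.

```python
# Candidate splits per coding tree phase: QT phase offers only "no split"
# and quadtree split; the MT phase offers "no split" plus binary and ternary
# splits in each direction (cf. splits 510-520 of fig. 5).

QT_OPTIONS = ["no_split", "quad_split"]
MT_OPTIONS = ["no_split",
              "binary_horizontal", "binary_vertical",
              "ternary_horizontal", "ternary_vertical"]

def initial_split_options(stage):
    """Candidate splits for a node, before the allowed-split rules
    of step 1030 are applied."""
    return list(QT_OPTIONS if stage == "QT" else MT_OPTIONS)

print(initial_split_options("QT"))  # → ['no_split', 'quad_split']
```

Step 1030 then filters this candidate list down to the allowed splits list (970 or 972).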

At the determine chroma format step 1020, the processor 205 determines the chroma format of the frame data 113 as one of the 4:2:0 chroma format and the 4:2:2 chroma format. The chroma format is an attribute of the frame data and does not change during operation. The chroma format is thus provided to the video encoder 114 by means such as a configuration file or a register. The determined chroma format is encoded into the bitstream 115, for example using a "chroma_format_idc" syntax element, only once for the video. Control in the processor 205 proceeds from step 1020 to a generate allowed splits step 1030.

At the generate allowed splits step 1030, the processor 205 applies rules constraining the allowed split types to the split options of step 1010 to produce a list of allowed splits. When processing the luma coding tree, a list 972 of allowed luma splits is created by performing step 1030. When processing the chroma coding tree, a list 970 of allowed chroma splits is created in the execution of step 1030. The rules constraining the allowed split types take into account the available transform sizes of the luma channel and the chroma channels, respectively.

In general, for an N × M transform in the luma channel, there is an N/2 × M/2 transform available for the chroma channels when the 4:2:0 chroma format is used, or an N/2 × M transform when the 4:2:2 chroma format is used. As such, the splitting rules are generally equivalent for the luma and chroma channels. However, there are exceptions for small block sizes. In particular, the 4 × 8 and 8 × 4 luma transforms have no corresponding 2 × 4 and 4 × 2 chroma transforms. Splits resulting in a 4 × 4 luma transform or a 2 × 2 chroma transform are also not allowed. This rule is equivalent in the luma and chroma channels because the region size of a 2 × 2 chroma transform is 4 × 4 luma samples for the 4:2:0 chroma format.

To the extent that the transform set of luma differs from the transform set of chroma, there is a difference in the allowed split options between luma and chroma. When processing luma nodes in the coding tree, the region size of the luma node is evaluated for each split option (splits 510-520, as shown in fig. 9). The no split case (510) is always allowed and is therefore always added to the allowed luma splits list 972, as indicated by arrow 912. When the region size is 8 × 8, quadtree splitting (512) is not allowed, avoiding use of the disallowed 4 × 4 luma transform size. For larger region sizes, quadtree splitting is allowed and added to the allowed luma splits list 972, as indicated by arrow 911. When in the MT phase of the luma coding tree, the following splits are not allowed, preventing the use of 4 × 4 transforms in luma:

horizontal binary splitting of 4 × 8 regions (avoiding pairs of 4 × 4 blocks). The remaining splits are allowed as indicated by arrow 913.

Vertical binary splitting of 8 × 4 regions (avoiding pairs of 4 × 4 blocks). The remaining splits are allowed, as indicated by arrow 914.

Horizontal ternary splitting of 4 × 16 or smaller regions (avoiding the first and third sub-blocks of the split being 4 × 4 blocks). The remaining splits are allowed, as indicated by arrow 915.

Vertical ternary splitting of 16 × 4 or smaller regions (avoiding the first and third sub-blocks of the split being 4 × 4 blocks). The remaining splits are allowed, as indicated by arrow 916.

In addition, any split in luma that would result in a block having a width or height less than four is prohibited. Assuming no restriction is encountered due to avoiding widths or heights less than four or a 4 × 4 block size, the split is added to the allowed luma splits list 972.
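The luma rules above can be collected into a single filter. This sketch is illustrative: the function name, option names, and the use of `min(w, h)` for the QT check (QT regions are square) are assumptions, not quoted from the described arrangement.

```python
# Allowed luma splits for a w × h region: no 4 × 4 luma blocks and no
# sub-block width or height below four, per the rules above.

def allowed_luma_splits(w, h, stage):
    if stage == "QT":                          # quadtree phase
        opts = ["no_split"]
        if min(w, h) > 8:                      # 8 × 8 quad split would yield 4 × 4 CBs
            opts.append("quad_split")
        return opts
    opts = ["no_split"]                        # multi-tree phase
    if h >= 8 and (w, h) != (4, 8):            # halves are w × h/2
        opts.append("binary_horizontal")
    if w >= 8 and (w, h) != (8, 4):            # halves are w/2 × h
        opts.append("binary_vertical")
    if h >= 16 and (w, h) != (4, 16):          # parts are w × h/4, w × h/2, w × h/4
        opts.append("ternary_horizontal")
    if w >= 16 and (w, h) != (16, 4):
        opts.append("ternary_vertical")
    return opts

print(allowed_luma_splits(8, 8, "MT"))
# → ['no_split', 'binary_horizontal', 'binary_vertical']
```

For example, a 4 × 8 region in the MT phase admits only "no split": the horizontal binary split would yield a pair of 4 × 4 blocks and the vertical binary split would yield width-2 blocks.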

When processing chroma nodes in the chroma coding tree, for each split option a corresponding rule concerning the region size of the node is consulted to determine whether to add the split option to the allowed chroma splits list 970. As with the luma coding tree, the chroma coding tree starts in the "QT" phase (corresponding to decision 610 of fig. 6), in which either a quadtree split 512 or no split 510 is allowed. Once no split 510 occurs, the coding tree enters the "MT" phase (corresponding to decision 612 of fig. 6). In the MT phase, either (i) no split occurs, indicating the presence of a CB occupying the region associated with the node, or (ii) one of the splits 514-520 occurs, dividing the region into sub-regions. The resulting sub-regions are also evaluated to determine their allowed split options.

If, in the QT phase of the coding tree with the 4:2:0 chroma format in use, a node has reached a region size of 8 × 8 (i.e., a 4 × 4 chroma transform), no further quadtree splitting is possible. Moreover, no other split options are available; the only available option, "no split", is added to the node's allowed chroma splits. The node thus holds a single 4 × 4 CB.

If, in the QT phase of the coding tree with the 4:2:2 chroma format in use, a node has a region size of 8 × 16 (i.e., a 4 × 8 chroma transform), no further quadtree splitting is possible and step 1030 enters the MT phase of the coding tree. The 8 × 16 region in the MT phase may hold a single 4 × 8 chroma transform, may be split horizontally into two 8 × 8 regions and thus a pair of 4 × 4 chroma transforms, or may be split vertically into two 4 × 16 regions and thus a pair of 2 × 8 chroma transforms. In the MT phase of the chroma coding tree, splits that would result in regions of size 4 × 4, 4 × 8, or 8 × 4, and would thus introduce transforms of size 2 × 2, 2 × 4, or 4 × 2, are prohibited, as listed below:

horizontal binary splitting of 8 × 8 regions (avoiding pairs of 4 × 2 chroma transforms) or 4 × 16 regions (avoiding pairs of 2 × 4 chroma transforms). The remaining splits are allowed, as indicated by arrow 923.

Vertical binary splitting of 8 × 8 regions (avoiding pairs of 2 × 4 chroma transforms) or 16 × 4 regions (avoiding pairs of 4 × 2 chroma transforms). The remaining splits are allowed, as indicated by arrow 924.

Horizontal ternary splitting of 4 × 16 regions (avoiding 2 × 2 chroma transforms for the first and third sub-regions and a 2 × 4 chroma transform for the central sub-region) or 8 × 16 regions (avoiding 2 × 4 chroma transforms in the first and third sub-regions). The remaining splits are allowed, as indicated by arrow 925.

Vertical ternary splitting of 16 × 4 regions (avoiding 2 × 2 chroma transforms for the first and third sub-regions and a 4 × 2 chroma transform for the central sub-region) or 16 × 8 regions (avoiding 4 × 2 chroma transforms in the first and third sub-regions). The remaining splits are allowed, as indicated by arrow 926.

In addition to the above constraints, any split that would result in a sub-region having a width or height less than two is prohibited. Considering the split options of step 1010, the rules described above are consulted when step 1030 is performed, and the splits that are not prohibited are added to the chroma split options list 970. Once the initial split options have been refined to a list of allowed splits (list 970 for chroma and list 972 for luma), the block partitioner 310 selects one of the allowed splits by evaluating the prediction mode and coding cost according to a Lagrangian optimisation. Control in the processor 205 proceeds from step 1030 to a zero allowed splits test step 1040.
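The chroma MT-phase rules above reduce to checking the sub-regions that each split would produce. The following sketch assumes the 4:2:0 chroma format, with region sizes given on the luma sample grid; the function and option names are illustrative.

```python
# Sub-region sizes (in luma samples) that would introduce the prohibited
# 2 × 2, 2 × 4 and 4 × 2 chroma transforms under the 4:2:0 format.
FORBIDDEN = {(4, 4), (4, 8), (8, 4)}

def sub_regions(w, h, split):
    """Sub-region sizes produced by an MT split of a w × h region."""
    if split == "binary_horizontal":
        return [(w, h // 2)] * 2
    if split == "binary_vertical":
        return [(w // 2, h)] * 2
    if split == "ternary_horizontal":
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if split == "ternary_vertical":
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    return [(w, h)]

def allowed_chroma_splits(w, h):
    """MT-phase chroma split options: reject any split whose sub-regions
    are forbidden sizes or have a chroma width/height below two (i.e.
    a luma-grid dimension below four)."""
    opts = ["no_split"]
    for split in ("binary_horizontal", "binary_vertical",
                  "ternary_horizontal", "ternary_vertical"):
        subs = sub_regions(w, h, split)
        if all(sw >= 4 and sh >= 4 and (sw, sh) not in FORBIDDEN
               for sw, sh in subs):
            opts.append(split)
    return opts

print(allowed_chroma_splits(8, 16))
# → ['no_split', 'binary_horizontal', 'binary_vertical']
```

The 8 × 16 case reproduces the options described above: a single 4 × 8 chroma CB, two 8 × 8 regions, or two 4 × 16 regions, with the ternary splits excluded.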

At the zero allowed splits test step 1040, the processor 205 tests whether the split options list (970 or 972) contains only the "no split" entry (split 510). If so ("yes" at step 1040), no further splits are possible. There is a CB at the current node and control in the processor 205 proceeds to an encode coded block step 1070. If splitting is possible ("no" at step 1040), control in the processor 205 proceeds to a coding tree phase test step 1050.

At the coding tree phase test step 1050, the processor 205 checks the phase of the current node in the coding tree, i.e., whether it is QT or MT. If the node is in the QT phase and the decision of the block partitioner 310 is to remain in the QT phase, control in the processor 205 proceeds to an encode QT split step 1055. If the node is in the MT phase, or the decision of the block partitioner 310 is to change from the QT phase to the MT phase for the current node of the coding tree, control in the processor 205 proceeds to an encode MT split step 1060.

At the encode QT split step 1055, the entropy encoder 338, under execution of the processor 205, encodes a QT split flag (as described with respect to decision 610 of fig. 6) having a value of "1" into the bitstream 115. The QT split flag with a value of "1" indicates that the current node is partitioned into four sub-regions, i.e., a quadtree split 512 occurs. Control in the processor 205 proceeds from step 1055 to a recursion sub-region step 10100.

At the encode MT split step 1060, the entropy encoder 338, under execution of the processor 205, encodes additional flags into the bitstream 115 to indicate the type of MT split. If the current node is at the transition from the QT phase to the MT phase of the coding tree, a QT split flag (as described with respect to decision 610 of fig. 6) having a value of "0" is encoded into the bitstream 115. If at least one split other than the "no split" case is allowed, as determined at step 1030, an MT split flag is coded; encoding a "0" for the MT split flag (see decision 612 of fig. 6) indicates selection of no split 510. In that case, step 1060 returns "no" and control in the processor 205 proceeds to the encode coded block step 1070.

Otherwise, selection of one of the splits 514-520 by the block partitioner 310 is indicated by encoding a "1" for the MT split flag (i.e., 612). Step 1060 returns "yes" and control in the processor 205 proceeds to an encode B/T H/V split step 1090.

At an encode coded block step 1070, the entropy encoder 338, under execution by the processor 205, encodes the prediction mode and residual coefficients of the coded block into the bitstream 115. For intra prediction CB, the intra prediction mode is encoded, and for inter prediction CB, the motion vector is encoded. Residual coefficients are encoded according to a scan proceeding from the last significant residual coefficient in the scan path back toward the DC coefficient of the block.

Furthermore, the coefficients are grouped into "sub-blocks", and for a sub-block, where appropriate, an encoded sub-block flag is encoded indicating that there is at least one significant residual coefficient in the sub-block. If no significant residual coefficients are present in a sub-block, the individual significance flags for the residual coefficients in that sub-block need not be encoded. The sub-block containing the last significant residual coefficient does not need an encoded sub-block flag, and no encoded sub-block flag is encoded for the sub-block containing the DC (top-left of the block) residual coefficient. As shown in fig. 8, for a given block, the sub-block size is always 4 × 4 in luma and always one of 2 × 8, 4 × 4, and 8 × 2 in chroma. Thus, the sub-block size is always 16, consistent with block sizes that are always multiples of 16 samples, as in the set 800. Control in the processor 205 proceeds from step 1070 to a last coded block test step 1080.
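An illustrative helper shows why the grouping above always yields whole sub-blocks of 16 coefficients: the sub-block shape is 4 × 4 in luma and one of 2 × 8, 4 × 4, or 8 × 2 in chroma, chosen to fit narrow blocks. The shape-selection logic here is a plausible sketch under those stated shapes, not quoted from the described arrangement.

```python
def subblock_shape(w, h, channel):
    """Sub-block shape for a w × h transform block (a sketch)."""
    if channel == "luma":
        return (4, 4)
    if w == 2:                 # narrow chroma block: 2 × 8 sub-blocks
        return (2, 8)
    if h == 2:                 # short chroma block: 8 × 2 sub-blocks
        return (8, 2)
    return (4, 4)

def subblock_count(w, h, channel):
    """Number of 16-coefficient sub-blocks in a block whose size is a
    multiple of 16 samples."""
    sw, sh = subblock_shape(w, h, channel)
    assert (w * h) % 16 == 0   # block sizes are multiples of 16 samples
    return (w * h) // (sw * sh)

print(subblock_count(2, 8, "chroma"))   # → 1
print(subblock_count(16, 4, "luma"))    # → 4
```

Because every block area is a multiple of 16 and every sub-block holds exactly 16 coefficients, the division is always exact.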

At the last coded block test step 1080, the processor 205 determines whether the current coded block is the last CB in the coding tree. With the hierarchical Z-order scan, the last CB is the CB occupying the bottom-right corner of the CTU. If the current CB is the last CB in the coding tree ("yes" at step 1080), the method 1000 terminates. Once the method 1000 has processed the luma coding tree, the method 1000 is invoked to process the chroma coding tree. The processor 205 may execute two instances of the method 1000 in parallel to process the luma and chroma coding trees. If the two instances are performed in parallel, the entropy encoder 338 operates on luma and chroma in a serialised manner to produce a deterministic bitstream; that is, the bitstream produced by a parallel encoder must be decodable by a serial decoder. Otherwise, if step 1080 returns "no", the current node proceeds to the next node according to the hierarchical Z-order scan, as illustrated in figs. 7A and 7B, and control in the processor 205 proceeds to the generate initial split options step 1010.

At the encode B/T H/V split step 1090, the entropy encoder 338, under execution of the processor 205, encodes into the bitstream 115 an additional flag indicating which split of the allowed splits list was selected by the block partitioner 310. If the allowed splits list includes only one split other than the "no split" case, that split must have been selected by the block partitioner 310 and no additional flag needs to be encoded to identify it. If the allowed splits list includes splits in both the horizontal and vertical directions, the entropy encoder 338 encodes a flag indicating the direction of the split selected by the block partitioner 310. If the allowed splits list includes both binary and ternary splits, the entropy encoder 338 encodes a flag indicating the type of split (i.e., binary or ternary) selected by the block partitioner 310. Control in the processor 205 proceeds from step 1090 to the recursion sub-region step 10100.

At the recursion sub-region step 10100, the processor 205 generates sub-regions according to the split determined at step 1030. The method 1000 is invoked recursively for each generated sub-region or node, resulting in a recursion through the coding tree. The recursive invocation of the method 1000 proceeds from one sub-region or node to the next according to a hierarchical Z-order scan of the coding tree, as shown in figs. 7A and 7B. When a child node resulting from the split has been processed by the method 1000, the recursion proceeds to the next sibling node in the coding tree. If no further sibling exists, the recursion returns to the parent node, whereupon the next node (e.g., a sibling of the parent node) is selected as the next node to process. Where the parent node is in the QT phase of the coding tree, returning to the parent node results in returning to the QT phase of the coding tree.
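The recursion described above is a depth-first traversal: each node's children are visited in order before moving to the next sibling. The following sketch uses a tuple representation of nodes that is purely illustrative.

```python
# Depth-first traversal matching the hierarchical Z-order recursion:
# a node is (region_label, children), and a leaf has an empty child list.

def traverse(node, visit):
    region, children = node
    visit(region, is_leaf=not children)   # leaves hold CBs; internal nodes hold splits
    for child in children:
        traverse(child, visit)

# A CTU whose root quad-splits, with the first child binary-split:
ctu = ("64x64", [("32x32", [("32x16", []), ("32x16", [])]),
                 ("32x32", []), ("32x32", []), ("32x32", [])])
order = []
traverse(ctu, lambda r, is_leaf: order.append(r))
print(order)
# → ['64x64', '32x32', '32x16', '32x16', '32x32', '32x32', '32x32']
```

The visit order corresponds to the order in which split flags and coded blocks reach the entropy encoder.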

Thus, steps 1055, 1060, 1090, and 1070 each operate to encode into the bitstream a flag for the determined allowed split options. Each split option is determined for one of the chroma channel and the luma channel. The split options may be determined based on the size of a region of the coding tree.

Fig. 11 illustrates a method 1100 for decoding a coding tree of an image frame from a video bitstream. As described with reference to fig. 13, the method 1100 is performed for each of a luma coding tree and a chroma coding tree, such that each coding tree of a CTU is decoded from the bitstream 133. The method 1100 may be embodied by an apparatus such as a configured FPGA, ASIC, ASSP, or the like. Additionally, the method 1100 may be performed by the video decoder 134 under execution of the processor 205. As such, the method 1100 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1100 is invoked for each of the luma and chroma coding trees of each CTU, beginning at a generate initial split options step 1110, where the "current node" (or current region) is the root of the luma or chroma coding tree, i.e., the region occupying the entire CTU.

At the generate initial split options step 1110, the processor 205 generates the split options for the current node of the coding tree. Initially, the coding tree is in the quadtree (QT) phase, where the only allowed options are a quadtree split (split 512) or a stop of splitting (split 510). Furthermore, for frames or slices encoded to use only intra-predicted blocks, the luma and chroma coding trees include a quadtree split at their root nodes. When quadtree splitting has stopped, the coding tree is considered to be in the multi-tree (MT) phase. In the multi-tree phase, the split options are to stop splitting (i.e., 510), in which case the region corresponding to the current node is coded with a CB, or to continue splitting. As initial split options, binary and ternary splits (splits 514-520) in the horizontal and vertical directions are available. As a result of step 1110, a list of all possible splits for the coding tree phase (i.e., QT or MT) is created. Control in the processor 205 proceeds to a determine chroma format step 1120.

At the determine chroma format step 1120, the processor 205 determines the chroma format of the frame data 135 as one of the 4:2:0 chroma format and the 4:2:2 chroma format. For example, a "chroma_format_idc" syntax element may be read from the bitstream 133 by the entropy decoder 420, under execution of the processor 205, to determine the chroma format. Control in the processor 205 proceeds from step 1120 to a generate allowed splits step 1130.

At the generate allowed splits step 1130, the processor 205 applies rules constraining the allowed split types to the split options of step 1110 to produce a list of allowed splits. When processing the luma coding tree, a list 972 of allowed luma splits is created. Step 1130 operates in the same manner as step 1030 of the method 1000, so the allowed splits lists for nodes in the luma and chroma coding trees in the video decoder 134 are the same as those for the corresponding nodes in the video encoder 114. Step 1130 operates to generate one of the allowed splits lists 970 and 972, depending on whether the luma or the chroma coding tree is being processed. As described with respect to step 1030 and fig. 9, the chroma split options differ from the luma split options, and the chroma split options result in blocks having a minimum size of 16 samples. Control in the processor 205 proceeds to a QT/MT test step 1140.

At a QT/MT test step 1140, the processor 205 tests whether the current node (region) is in the QT stage or the MT stage of the coding tree. If the current node is in the QT stage of the coding tree and the list of allowed splits includes a "quadtree split" option (split 512), control in the processor proceeds to a decode QT split step 1155. If the current node is in the QT phase of the code tree and the list of allowed splits does not include the "quadtree split" option, i.e., only the "no split" option is allowed, the code tree phase transitions to the "MT" phase and control in the processor 205 proceeds to the zero allowed split test 1150. If the code tree is already in the MT phase, control in processor 205 proceeds to a zero allowed split test 1150.

At the zero allowed splits test step 1150, the processor 205 tests whether the split options list (i.e., 970 or 972 for the chroma and luma coding trees, respectively) contains only the "no split" entry (510). If so, no further splits are possible and there is a CB at the current node. Step 1150 returns "yes" and control in the processor proceeds to a decode coded block step 1170. If further splitting is possible ("no" at step 1150), control in the processor proceeds to a decode MT split step 1160.

At a decode QT split step 1155, the entropy decoder 420, under execution of the processor 205, decodes a QT split flag (i.e., 610) from the bitstream 133 indicating whether the current node is split into four sub-regions, i.e., whether a quadtree split 512 occurs. If no quadtree split occurs ("no" at step 1155), control in the processor 205 proceeds to the zero allowed splits test step 1150. If a quadtree split occurs ("yes" at step 1155), control in the processor 205 proceeds to a recursion sub-region step 11100.

At the decode MT split step 1160, the entropy decoder 420, under execution of the processor 205, decodes further flags from the bitstream 133 indicating the type of MT split. If at least one split other than the "no split" case is allowed, as determined at step 1130, an MT split flag is decoded; a "0" for the MT split flag (i.e., 612) indicates selection of no split 510. In that case, step 1160 returns "no" and control in the processor 205 proceeds to a decode coded block step 1170. Otherwise, the need to select one of the allowed splits 514-520 (from list 970 or 972) is indicated by decoding a "1" for the MT split flag (i.e., 612). Step 1160 returns "yes" and control in the processor 205 proceeds to a decode B/T H/V split step 1190.

At the decode coded block step 1170, the entropy decoder 420, under execution of the processor 205, decodes the prediction mode and residual coefficients for the coded block from the bitstream 133. For an intra prediction CB, the intra prediction mode is decoded, and for an inter prediction CB, the motion vector is decoded. Residual coefficients are decoded according to a scan proceeding from the last significant residual coefficient in the scan path back toward the DC coefficient of the block. Furthermore, the coefficients are grouped into "sub-blocks", for which an encoded sub-block flag may be decoded, indicating that there is at least one significant residual coefficient in the sub-block. If no significant residual coefficients are present in a sub-block, the individual significance flags for the residual coefficients in that sub-block need not be decoded. The sub-block containing the last significant residual coefficient does not require an encoded sub-block flag, and no encoded sub-block flag is decoded for the sub-block containing the DC (top-left of the block) residual coefficient. As shown in fig. 8, for a given block, the sub-block size is always 4 × 4 in luma and always one of 2 × 8, 4 × 4, and 8 × 2 in chroma. Thus, the sub-block size is always 16, consistent with block sizes that are always multiples of 16 samples, as in the set 800. Control in the processor 205 proceeds to a last coded block test step 1180.

At the last coded block test step 1180, the processor 205 determines whether the current coded block is the last CB in the coding tree. With the hierarchical Z-order scan, the last CB is the CB occupying the bottom-right corner of the CTU. If the current CB is the last CB in the coding tree, step 1180 returns "yes" and the method 1100 terminates. Once the method 1100 has decoded the luma coding tree, the method 1100 is invoked to decode the chroma coding tree. Otherwise, the current node proceeds to the next node according to the hierarchical Z-order scan, as illustrated in figs. 7A and 7B; step 1180 returns "no" and control in the processor 205 proceeds to the generate initial split options step 1110. To correctly parse the bitstream 133, the video decoder 134 generally reads the flags and other syntax elements in the same order as they were written by the video encoder 114. However, other operations may be performed in a different order and/or concurrently, provided their dependencies are satisfied. For example, sets of such operations, such as luma and chroma intra reconstruction, may be performed in parallel.

At the decode B/T H/V split step 1190, the entropy decoder 420, under execution of the processor 205, decodes from the bitstream 133 an additional flag indicating which of the splits in the allowed splits list is to be performed by the video decoder 134. When the allowed splits list includes only one split other than the "no split" case, that split must be performed, since no alternative exists; therefore no additional flag needs to be decoded to identify it. When the allowed splits list contains splits in both the horizontal and vertical directions, a flag indicating the direction of the split is decoded from the bitstream 133. If the allowed splits list includes both binary and ternary splits, a flag indicating the type of split (i.e., binary or ternary) is decoded from the bitstream 133. Control in the processor 205 then proceeds to a recursion sub-region step 11100.
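The flag-conditional decoding at step 1190 can be sketched as follows: flags are read only when more than one alternative remains. The helper `decode_mt_split`, its option names, and the `read_flag` callback (standing in for the entropy decoder) are illustrative.

```python
def decode_mt_split(allowed, read_flag):
    """Select an MT split from the allowed splits list (excluding
    "no_split"), reading a direction flag and/or a binary-vs-ternary
    type flag only when needed."""
    candidates = [s for s in allowed if s != "no_split"]
    if len(candidates) == 1:
        return candidates[0]              # only one alternative: no flag coded
    dirs = {s.split("_")[1] for s in candidates}
    direction = read_flag("direction") if len(dirs) > 1 else dirs.pop()
    candidates = [s for s in candidates if s.endswith(direction)]
    if len(candidates) == 1:
        return candidates[0]              # direction alone resolves the split
    kind = read_flag("type")              # "binary" or "ternary"
    return f"{kind}_{direction}"

flags = iter(["horizontal", "ternary"])   # flags as read from the bitstream
pick = decode_mt_split(
    ["no_split", "binary_horizontal", "binary_vertical",
     "ternary_horizontal", "ternary_vertical"],
    lambda name: next(flags))
print(pick)  # → ternary_horizontal
```

The encoder-side step 1090 mirrors the same logic, writing a flag only where this routine would read one, which keeps the two in lockstep.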

At a recursion sub-region step 11100, the processor 205 generates sub-regions from the determined split of steps 1190 or 1155 and calls the method 1100 for each of these sub-regions or nodes, resulting in a recursion through the coding tree. The recursive invocation of the method 1100 proceeds from one sub-region or node to the next according to a hierarchical Z-order scan of the coding tree, as shown in fig. 7A and 7B. If the child nodes resulting from the split have been processed to generate a block or sub-region, the recursion proceeds to the next sibling in the coding tree. If no other sibling nodes exist, the recursion proceeds to the parent node or region. The next node (e.g., siblings of the parent node) is selected as the next node to be processed. If the parent node is in the QT phase of the coding tree, then returning to the parent node results in a return to the QT phase of the coding tree.

Thus, steps 1155, 1160, 1190 and 1170 operate to decode the coding units of the coding tree unit by determining a flag from the bitstream 133 to select one of the luma split options and one of the chroma split options determined at step 1130.

As a result of methods 1000 and 1100 and particularly steps 1030 and 1130, relatively small intra-predicted blocks of chroma channels that are difficult to process at high rates in an intra-reconstruction loop are avoided. Avoiding small blocks facilitates implementation at high resolutions and/or frame rates where the "pixel rate" (pixels per second that need to be processed) is high without an unacceptable degradation in quality.

Fig. 12 illustrates a method 1200 for encoding the luma and chroma coding trees of an image frame into a video bitstream. The method 1200 may be embodied by an apparatus such as a configured FPGA, ASIC, ASSP, or the like. Additionally, the method 1200 may be performed by the video encoder 114 under execution of the processor 205. As such, the method 1200 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1200 begins at a partition frame into CTUs step 1210.

At the partition frame into CTUs step 1210, the block partitioner 310, under execution of the processor 205, partitions the current frame of the frame data 113 into an array of CTUs. Encoding then proceeds over the CTUs resulting from the partitioning. Control in the processor proceeds from step 1210 to an encode luma coding tree step 1220.

At the encode luma coding tree step 1220, the video encoder 114, under execution of the processor 205, performs the method 1000 to determine and encode the luma coding tree of the current CTU into the bitstream 115. The current CTU is a selected one of the CTUs resulting from performing step 1210. Control in the processor 205 proceeds from step 1220 to an encode chroma coding tree step 1230.

At the encode chroma coding tree step 1230, the video encoder 114, under execution of the processor 205, performs the method 1000 a second time to determine and encode the chroma coding tree of the current CTU into the bitstream 115. Control in the processor 205 proceeds from step 1230 to a last CTU test step 1240.

At a last CTU test step 1240, the processor 205 tests whether the current CTU is the last CTU in a slice or frame. If not ("no" at step 1240), video encoder 114 enters the next CTU in the frame and control in processor 205 proceeds from step 1240 back to step 1220 to continue processing the remaining CTUs in the frame. If the CTU is the last CTU in a frame or slice, step 1240 returns yes and method 1200 terminates.

Fig. 13 is a flow diagram of a method 1300 for decoding luma and chroma coding trees for an image frame from a video bitstream. The method 1300 may be implemented by a device such as a configured FPGA, ASIC, ASSP, or the like. Additionally, method 1300 may be performed by video decoder 134 under execution of processor 205. As such, the method 1300 may be stored on a computer-readable storage medium and/or in the memory 206. Method 1300 begins at step 1310 with partitioning a frame into CTUs.

At the partition frame into CTUs step 1310, the video decoder 134, under execution of the processor 205, determines the partitioning of the current frame to be decoded from the bitstream 133 into an array of CTUs. Decoding then proceeds over the CTUs resulting from the determined partitioning. Control in the processor proceeds from step 1310 to a decode luma coding tree step 1320.

At the decode luma coding tree step 1320, the video decoder 134, under execution of the processor 205, performs the method 1100 a first time for the current CTU to decode the luma coding tree of the current CTU from the bitstream 133. The current CTU is a selected one of the CTUs resulting from performing step 1310. Control in the processor 205 proceeds from step 1320 to a decode chroma coding tree step 1330.

At the decode chroma coding tree step 1330, the video decoder 134, under execution of the processor 205, performs the method 1100 a second time for the current CTU to decode the chroma coding tree of the current CTU from the bitstream 133. Control in the processor 205 proceeds from step 1330 to a last CTU test step 1340.

At a last CTU test step 1340, the processor 205 tests whether the current CTU is the last CTU in the slice or frame. If not ("no" at step 1340), the video decoder 134 advances to the next CTU in the frame and control in the processor 205 proceeds from step 1340 back to step 1320 to continue decoding CTUs from the bitstream. If the CTU is the last CTU in the frame or slice, step 1340 returns "yes" and the method 1300 terminates.

Grouping the residual coefficients into sub-blocks of size 16 (sixteen) facilitates implementation of the entropy encoder 338 and the entropy decoder 420, e.g., as described with respect to the TBs 816 and 823 of Fig. 8. In particular, grouping the residual coefficients into sub-blocks of size 16 facilitates implementation of arithmetic coding of context-coded bins (such as for significance maps) by allowing a fixed pattern of contexts to be used within each sub-block.
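As an illustration of the grouping, a sketch of mapping a coefficient position within a transform block (TB) to its sub-block index is shown below, assuming square 4x4 sub-blocks (16 coefficients each) arranged in raster order across the TB. The function name and the raster arrangement are simplifying assumptions for illustration; actual scan patterns in an entropy coder may differ.

```python
def subblock_index(x, y, tb_width):
    """Map coefficient position (x, y) in a TB of width tb_width to the
    index of its 4x4 sub-block, assuming sub-blocks in raster order.
    Each sub-block groups 4*4 = 16 residual coefficients, so a context
    pattern can be selected once per sub-block."""
    sb_per_row = tb_width // 4          # sub-blocks per row of the TB
    return (y // 4) * sb_per_row + (x // 4)
```

Because every sub-block holds exactly 16 coefficients, the entropy coder can process coefficients one sub-block at a time with a fixed context assignment inside each sub-block, which simplifies both software and hardware implementations.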

Industrial applicability

The described arrangement is applicable to the computer and data processing industries, and in particular to digital signal processing for encoding or decoding signals such as video and image signals, thereby achieving high compression efficiency.

In contrast to HEVC, VVC systems allow the use of separate coding trees for the luma and chroma channels to increase flexibility. However, as described above, a problem can arise because the use of smaller chroma blocks may reduce throughput. The arrangements described herein determine appropriate rules by which each coding tree unit is processed to help avoid throughput problems. In addition, as described above, given the rules to avoid throughput issues, the described arrangements may help provide improved efficiency and accuracy for arithmetic coding of the context-coded bins describing the coding trees.

The foregoing is illustrative of only some embodiments of the invention, which are exemplary only and not limiting, and modifications and/or changes may be made thereto without departing from the scope and spirit of the invention.

(Australia only) In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.
