Wavefront parallel processing for tiles, bricks, and slices

Document No.: 453565    Publication date: 2021-12-28

Note: This technology, "Wavefront parallel processing for tiles, bricks, and slices", was designed and created by FNU Hendry and Ye-Kui Wang on 2020-04-27. Its main content includes: A decoding method, comprising: obtaining, in a video bitstream, an end-of-tile bit having a first value and a byte alignment bit, wherein the end-of-tile bit having the first value and the byte alignment bit are used to indicate that a current coding tree block (CTB) is the last CTB in a tile; obtaining, in the video bitstream, an end-of-CTB-row bit having the first value and a byte alignment bit, wherein the end-of-CTB-row bit having the first value and the byte alignment bit are used to indicate that wavefront parallel processing (WPP) is enabled and that the current CTB is the last CTB in a CTB row but not the last CTB in the tile; and reconstructing a plurality of CTBs in the tile based on the end-of-tile bit having the first value, the end-of-CTB-row bit having the first value, and the byte alignment bit.

1. A decoding method implemented by a video decoder, the method comprising:

the video decoder receiving an encoded video bitstream, wherein the encoded video bitstream comprises a picture comprising one or more slices having one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs);

the video decoder obtaining, in the encoded video bitstream, an end-of-tile bit having a first value and a byte alignment bit, wherein the end-of-tile bit having the first value and the byte alignment bit are used to indicate that a current CTB of the plurality of CTBs is the last CTB in a tile;

the video decoder obtaining, in the encoded video bitstream, an end-of-CTB-row bit having the first value and the byte alignment bit, wherein the end-of-CTB-row bit having the first value and the byte alignment bit are used to indicate that Wavefront Parallel Processing (WPP) is enabled and that the current CTB of the plurality of CTBs is the last CTB in a CTB row but not the last CTB in the tile;

the video decoder reconstructing the plurality of CTBs in the tile based on the end-of-tile bit having the first value, the end-of-CTB-row bit having the first value, and the byte alignment bit.

2. The method of claim 1, wherein the end-of-tile bit is represented by an end_of_tile_one_bit.

3. The method according to claim 1 or 2, wherein the end-of-CTB-row bit is represented by an end_of_subset_bit.

4. The method according to any of claims 1 to 3, wherein the WPP is enabled by a flag set in a parameter set.

5. The method of claim 4, wherein WPP is enabled via a flag denoted as entropy_coding_sync_enabled_flag.

6. The method of claim 4, wherein the first value is 1 (one) when WPP is enabled.

7. The method according to any one of claims 1 to 6, further comprising: displaying an image generated from the plurality of reconstructed CTBs.

8. An encoding method implemented by a video encoder, the method comprising:

the video encoder segmenting an image into one or more slices, wherein each slice comprises one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs);

when a current CTB in the plurality of CTBs is the last CTB in a tile, the video encoder encoding an end-of-tile bit having a first value and a byte alignment bit into a video bitstream;

when Wavefront Parallel Processing (WPP) is enabled and the current CTB is the last CTB in a CTB row but not the last CTB in the tile, the video encoder encoding an end-of-CTB-row bit having the first value and the byte alignment bit into the video bitstream;

the video encoder storing the video bitstream for transmission to a video decoder.

9. The method of claim 8, wherein the end-of-tile bit is represented by an end_of_tile_one_bit.

10. The method according to claim 8 or 9, wherein the end-of-CTB-row bit is represented by an end_of_subset_bit.

11. The method according to any one of claims 8 to 10, wherein WPP is enabled by a flag set in a parameter set.

12. The method of claim 11, wherein WPP is enabled via a flag denoted as entropy_coding_sync_enabled_flag.

13. The method of claim 11, wherein the first value is 1 (one) when the WPP is enabled.

14. The method of claim 8, further comprising: sending the video bitstream to the video decoder.

15. A decoding device, characterized in that the decoding device comprises:

a receiver for receiving an encoded video stream;

a memory coupled to the receiver, wherein the memory stores instructions;

a processor coupled to the memory, wherein the processor is configured to execute the instructions to cause the decoding apparatus to:

receive the encoded video stream, wherein the encoded video stream comprises a picture comprising one or more slices having one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs);

obtain, in the encoded video bitstream, an end-of-tile bit having a first value and a byte alignment bit, wherein the end-of-tile bit having the first value and the byte alignment bit are used to indicate that a current CTB of the plurality of CTBs is the last CTB in a tile;

obtain, in the encoded video bitstream, an end-of-CTB-row bit having the first value and the byte alignment bit, wherein the end-of-CTB-row bit having the first value and the byte alignment bit are used to indicate that Wavefront Parallel Processing (WPP) is enabled and that the current CTB of the plurality of CTBs is the last CTB in a CTB row but not the last CTB in the tile;

reconstruct the plurality of CTBs in the tile based on the end-of-tile bit having the first value, the end-of-CTB-row bit having the first value, and the byte alignment bit.

16. The decoding device according to claim 15, wherein the end-of-tile bit is represented by an end_of_tile_one_bit, the end-of-CTB-row bit is represented by an end_of_subset_bit, and the first value is 1.

17. An encoding device, characterized in that the encoding device comprises:

a memory storing instructions;

a processor coupled to the memory, wherein the processor is configured to execute the instructions to cause the encoding device to:

segment an image into one or more slices, wherein each slice comprises one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs);

when a current CTB in the plurality of CTBs is the last CTB in a tile, encode an end-of-tile bit having a first value and a byte alignment bit into a video bitstream;

when Wavefront Parallel Processing (WPP) is enabled and the current CTB is the last CTB in a CTB row but not the last CTB in the tile, encode an end-of-CTB-row bit having the first value and the byte alignment bit into the video bitstream;

store the video bitstream for transmission to a video decoder.

18. The encoding device of claim 17, further comprising a transmitter, wherein the transmitter is coupled to the processor and configured to transmit the video bitstream to the video decoder.

19. The encoding device according to claim 17 or 18, wherein the end-of-tile bit is represented by an end_of_tile_one_bit, the end-of-CTB-row bit is represented by an end_of_subset_bit, and the first value is 1.

20. A coding apparatus, characterized in that the coding apparatus comprises:

a receiver configured to receive an image for encoding or receive a bitstream for decoding;

a transmitter coupled to the receiver, wherein the transmitter is configured to transmit the bitstream to a decoder or to transmit a decoded image to a display;

a memory coupled to at least one of the receiver or the transmitter, wherein the memory is to store instructions;

a processor coupled to the memory, wherein the processor is to execute the instructions stored in the memory to perform the method of any of claims 1-7 and 8-14.

21. The coding apparatus according to claim 20, wherein the coding apparatus further comprises a display configured to display an image.

22. A system, characterized in that the system comprises:

an encoder;

a decoder in communication with the encoder, wherein the encoder or the decoder comprises a decoding device, an encoding device or a coding apparatus according to any one of claims 15 to 21.

23. A coding module, wherein the coding module comprises:

a receiving module configured to receive an image for encoding or receive a bitstream for decoding;

a sending module coupled to the receiving module, wherein the sending module is configured to send the bitstream to a decoding module or send a decoded image to a display module;

a storage module coupled to at least one of the receiving module or the transmitting module, wherein the storage module is configured to store instructions;

a processing module coupled to the storage module, wherein the processing module is configured to execute the instructions stored in the storage module to perform the method of any of claims 1-7 and 8-14.

Technical Field

This disclosure generally describes techniques to support wavefront parallel processing (WPP) in video coding. More specifically, the present invention avoids unnecessary repetition of signaling bits and byte alignment in WPP.

Background

Even a relatively short video can require a large amount of data to describe, which may cause difficulties when the data is streamed or otherwise transmitted over a communication network with limited bandwidth capacity. Video data therefore typically needs to be compressed before being transmitted over modern telecommunication networks. Because memory resources may be limited, the size of the video can also be a problem when the video is stored on a storage device. Video compression devices typically use software and/or hardware at the source to code the video data prior to transmission or storage, thereby reducing the amount of data needed to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. With limited network resources and ever-increasing demand for higher video quality, there is a need for improved compression and decompression techniques that increase the compression ratio with little or no impact on image quality.

Disclosure of Invention

A first aspect relates to a method of decoding an encoded video bitstream implemented by a video decoder. The method comprises: the video decoder receiving the encoded video bitstream, wherein the encoded video bitstream comprises a picture comprising one or more slices having one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs); the video decoder obtaining, in the encoded video bitstream, an end-of-tile bit having a first value and a byte alignment bit, wherein the end-of-tile bit having the first value and the byte alignment bit are used to indicate that a current CTB of the plurality of CTBs is the last CTB in a tile; the video decoder obtaining, in the encoded video bitstream, an end-of-CTB-row bit having the first value and the byte alignment bit, wherein the end-of-CTB-row bit having the first value and the byte alignment bit are used to indicate that Wavefront Parallel Processing (WPP) is enabled and that the current CTB of the plurality of CTBs is the last CTB in a CTB row but not the last CTB in the tile; and the video decoder reconstructing the plurality of CTBs in the tile based on the end-of-tile bit having the first value, the end-of-CTB-row bit having the first value, and the byte alignment bit.

The technique provided by the method avoids duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.
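To make the parsing order concrete, the following is a minimal, hypothetical Python sketch of a decoder-side loop that reads these markers; it is not the VVC reference decoder, and helper names such as read_bit() and byte_align() are assumptions introduced here purely for illustration.

```python
class BitReader:
    """Toy big-endian bit reader over a byte string."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def byte_align(self) -> None:
        # Skip padding bits up to the next byte boundary.
        while self.pos % 8 != 0:
            self.pos += 1


def parse_tile(reader: BitReader, num_ctbs: int, ctbs_per_row: int,
               wpp_enabled: bool, decode_ctb) -> None:
    """Parse the CTBs of one tile; each boundary marker is read exactly once."""
    for i in range(num_ctbs):
        decode_ctb(i)                      # decode the coded data of CTB i
        last_in_tile = (i == num_ctbs - 1)
        last_in_row = ((i + 1) % ctbs_per_row == 0)
        if last_in_tile:
            assert reader.read_bit() == 1  # end_of_tile_one_bit (first value = 1)
            reader.byte_align()            # followed by byte-alignment padding
        elif wpp_enabled and last_in_row:
            assert reader.read_bit() == 1  # end_of_subset_bit (first value = 1),
            reader.byte_align()            # read only when this CTB is not also
                                           # the last CTB of the tile
```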

Optionally, according to any one of the above aspects, in another implementation of the aspect, the end-of-tile bit is represented by an end_of_tile_one_bit.

Optionally, according to any one of the above aspects, in another implementation of this aspect, the end-of-CTB-row bit is represented by an end_of_subset_bit.

Optionally, according to any one of the above aspects, in another implementation of this aspect, the WPP is enabled by a flag set in a parameter set.

Optionally, according to any one of the above aspects, in another implementation of this aspect, WPP is enabled via a flag denoted as entropy_coding_sync_enabled_flag.

Optionally, according to any of the above aspects, in another implementation of this aspect, the first value is 1 (one) when WPP is enabled.

Optionally, according to any one of the above aspects, in another implementation of this aspect, the method further includes: displaying an image generated from the plurality of reconstructed CTBs.

A second aspect relates to a method of encoding a video bitstream implemented by a video encoder. The method comprises: the video encoder segmenting an image into one or more slices, wherein each slice comprises one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs); when a current CTB in the plurality of CTBs is the last CTB in a tile, the video encoder encoding an end-of-tile bit having a first value and a byte alignment bit into the video bitstream; when Wavefront Parallel Processing (WPP) is enabled and the current CTB is the last CTB in a CTB row but not the last CTB in the tile, the video encoder encoding an end-of-CTB-row bit having the first value and the byte alignment bit into the video bitstream; and the video encoder storing the video bitstream for transmission to a video decoder.

The technique provided by the method avoids duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.
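The encoder side mirrors this logic. The sketch below is a hypothetical Python illustration (again, not any real encoder) assuming a simple BitWriter with write_bit() and byte_align() helpers: at most one marker plus one run of padding is emitted per CTB, so the end-of-tile bit and the end-of-CTB-row bit are never written back to back for the same CTB.

```python
class BitWriter:
    """Toy big-endian bit writer."""
    def __init__(self):
        self.bits = []

    def write_bit(self, b: int) -> None:
        self.bits.append(b & 1)

    def byte_align(self) -> None:
        # Pad with zero bits up to the next byte boundary.
        while len(self.bits) % 8 != 0:
            self.bits.append(0)

    def to_bytes(self) -> bytes:
        self.byte_align()  # ensure a whole number of bytes
        out = bytearray()
        for i in range(0, len(self.bits), 8):
            byte = 0
            for b in self.bits[i:i + 8]:
                byte = (byte << 1) | b
            out.append(byte)
        return bytes(out)


def write_tile(writer: BitWriter, coded_ctbs, ctbs_per_row: int, wpp_enabled: bool) -> None:
    """coded_ctbs: list of bit lists, one per CTB of the tile, in raster-scan order."""
    for i, ctb_bits in enumerate(coded_ctbs):
        for b in ctb_bits:
            writer.write_bit(b)
        last_in_tile = (i == len(coded_ctbs) - 1)
        last_in_row = ((i + 1) % ctbs_per_row == 0)
        if last_in_tile:
            writer.write_bit(1)            # end_of_tile_one_bit
            writer.byte_align()
        elif wpp_enabled and last_in_row:
            writer.write_bit(1)            # end_of_subset_bit
            writer.byte_align()
```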

Optionally, according to any one of the above aspects, in another implementation of the aspect, the end-of-tile bit is represented by an end_of_tile_one_bit.

Optionally, according to any one of the above aspects, in another implementation of this aspect, the end-of-CTB-row bit is represented by an end_of_subset_bit.

Optionally, according to any one of the above aspects, in another implementation of this aspect, the WPP is enabled by a flag set in a parameter set.

Optionally, according to any one of the above aspects, in another implementation of this aspect, WPP is enabled via a flag denoted as entropy_coding_sync_enabled_flag.

Optionally, according to any of the above aspects, in another implementation of this aspect, the first value is 1 (one) when WPP is enabled.

Optionally, according to any one of the above aspects, in another implementation of this aspect, the method further includes: sending the video bitstream to the video decoder.

A third aspect relates to a decoding device. The decoding device includes: a receiver configured to receive an encoded video bitstream; a memory coupled to the receiver, wherein the memory stores instructions; and a processor coupled to the memory, wherein the processor is configured to execute the instructions to cause the decoding device to: receive the encoded video bitstream, wherein the encoded video bitstream comprises a picture comprising one or more slices having one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs); obtain, in the encoded video bitstream, an end-of-tile bit having a first value and a byte alignment bit, wherein the end-of-tile bit having the first value and the byte alignment bit are used to indicate that a current CTB of the plurality of CTBs is the last CTB in a tile; obtain, in the encoded video bitstream, an end-of-CTB-row bit having the first value and the byte alignment bit, wherein the end-of-CTB-row bit having the first value and the byte alignment bit are used to indicate that Wavefront Parallel Processing (WPP) is enabled and that the current CTB of the plurality of CTBs is the last CTB in a CTB row but not the last CTB in the tile; and reconstruct the plurality of CTBs in the tile based on the end-of-tile bit having the first value, the end-of-CTB-row bit having the first value, and the byte alignment bit.

The decoding device provides a technique that avoids duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.

Optionally, according to any one of the above aspects, in another implementation of the aspect, the end-of-tile bit is represented by an end_of_tile_one_bit, the end-of-CTB-row bit is represented by an end_of_subset_bit, and the first value is 1.

A fourth aspect relates to an encoding device. The encoding device includes: a memory, wherein the memory stores instructions; and a processor coupled to the memory, wherein the processor is configured to execute the instructions to cause the encoding device to: segment an image into one or more slices, wherein each slice comprises one or more tiles, each tile comprising a plurality of Coding Tree Blocks (CTBs); when a current CTB in the plurality of CTBs is the last CTB in a tile, encode an end-of-tile bit having a first value and a byte alignment bit into a video bitstream; when Wavefront Parallel Processing (WPP) is enabled and the current CTB is the last CTB in a CTB row but not the last CTB in the tile, encode an end-of-CTB-row bit having the first value and the byte alignment bit into the video bitstream; and store the video bitstream for transmission to a video decoder.

The technique provided by the encoding device avoids duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.

Optionally, according to any one of the above aspects, in another implementation manner of this aspect, the encoding device further includes a transmitter, where the transmitter is coupled to the processor and configured to transmit the video bitstream to the video decoder.

Optionally, according to any one of the above aspects, in another implementation of the aspect, the end-of-tile bit is represented by an end_of_tile_one_bit, the end-of-CTB-row bit is represented by an end_of_subset_bit, and the first value is 1.

A fifth aspect relates to a coding apparatus. The coding apparatus includes: a receiver configured to receive an image for encoding or receive a bitstream for decoding; a transmitter coupled to the receiver, wherein the transmitter is configured to transmit the bitstream to a decoder or to transmit a decoded image to a display; a memory coupled to at least one of the receiver or the transmitter, wherein the memory is configured to store instructions; and a processor coupled to the memory, wherein the processor is configured to execute the instructions stored in the memory to perform the methods disclosed herein.

The coding apparatus provides a technique that avoids duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.

Optionally, according to any of the above aspects, in another implementation of this aspect, the coding apparatus further comprises a display for displaying an image.

A sixth aspect relates to a system. The system comprises: an encoder; a decoder in communication with the encoder; wherein the encoder or decoder comprises a decoding device, an encoding device or a coding apparatus as disclosed herein.

The techniques provided by the system avoid duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.

A seventh aspect relates to a coding module. The coding module comprises: a receiving module configured to receive an image for encoding or receive a bitstream for decoding; a sending module coupled to the receiving module, wherein the sending module is configured to send the bitstream to a decoding module or send a decoded image to a display module; a storage module coupled to at least one of the receiving module or the sending module, wherein the storage module is configured to store instructions; and a processing module coupled to the storage module, wherein the processing module is configured to execute the instructions stored in the storage module to perform the methods disclosed herein.

The techniques provided by the coding module avoid duplicate signaling and duplicate byte alignment in WPP. By eliminating the duplicate signaling and byte alignment in WPP, the number of bits used to indicate the end of a row/tile and the number of bits used for padding are reduced. Because fewer bits are needed in WPP, the encoder/decoder (also known as a "codec") in video coding is improved relative to existing codecs. As a practical matter, the improved video coding process offers a better user experience when video is sent, received, and/or viewed.

Drawings

For a more complete understanding of the present invention, reference is made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

Fig. 1 is a block diagram of an exemplary coding system that may utilize video coding techniques.

Fig. 2 is a block diagram of an exemplary video encoder that may implement video coding techniques.

Fig. 3 is a block diagram of an exemplary video decoder that may implement video coding techniques.

Fig. 4 shows a video bitstream for implementing wavefront parallel processing.

Fig. 5 illustrates an embodiment of a method of decoding an encoded video bitstream.

Fig. 6 illustrates an embodiment of a method of encoding a video bitstream.

FIG. 7 is a schematic diagram of a video coding apparatus.

FIG. 8 is a diagram of an embodiment of a decoding module.

Detailed Description

It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The invention should not be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

The following terms are defined as follows, unless used herein in a context to the contrary. In particular, the following definitions are intended to describe the present invention more clearly. However, a term may be described differently in different contexts. The following definitions should therefore be regarded as supplementary information and should not be considered to limit any other definitions provided for these terms.

A bitstream is a sequence of bits comprising video data that is transmitted between an encoder and a decoder after compression. An encoder is a device that uses an encoding process to compress video data into a bitstream. A decoder is a device that uses a decoding process to reconstruct video data from a bitstream for display. A picture is a complete image in a video sequence that is intended to be displayed to a user, in whole or in part, at a corresponding moment in time. A reference picture is a picture that includes reference samples that can be used when coding other pictures by reference according to inter prediction. A coded picture is a coded representation of a picture, coded according to inter prediction or intra prediction, that is contained in a single access unit in the bitstream and includes the full set of Coding Tree Units (CTUs) of the picture. A slice is a partition of a picture comprising an integer number of complete tiles or an integer number of consecutive complete CTU rows of the picture, where these are contained in only a single Network Abstraction Layer (NAL) unit. A reference slice is a slice of a reference picture that includes reference samples that can be used when coding other slices by reference according to inter prediction. A slice header is a portion of a coded slice that includes the data elements associated with all or one of the tiles or CTU rows represented in the slice. An entry point is a bit position in the bitstream that contains the first bit of video data of a corresponding subset of a coded slice. An offset refers to the bit distance between a known bit position and an entry point. A subset is a subdivision of a set, such as a tile, a row of CTUs, or a CTU; a CTU is a subset of a slice. A Coding Tree Unit (CTU) is a set of samples of a predefined size that can be partitioned by a coding tree. For each luminance/chrominance component, a CTU is divided into Coding Tree Blocks (CTBs). A CTB may be a 64 × 64, 32 × 32, or 16 × 16 block of pixels. Generally, larger pixel blocks increase coding efficiency. A CTB is then divided into one or more Coding Units (CUs), such that the size of the CTU is the size of the largest coding unit.
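As a simple illustration of the CTB sizes mentioned above, the hypothetical Python snippet below computes how many CTBs cover a picture of a given size; the rounding up reflects that the right-most and bottom-most CTBs may extend past the picture boundary.

```python
import math

def ctb_grid(pic_width: int, pic_height: int, ctb_size: int = 64):
    """Number of CTB columns, CTB rows, and total CTBs covering the picture."""
    cols = math.ceil(pic_width / ctb_size)
    rows = math.ceil(pic_height / ctb_size)
    return cols, rows, cols * rows

print(ctb_grid(1920, 1080, 64))   # (30, 17, 510) for a 1080p picture with 64x64 CTBs
print(ctb_grid(1920, 1080, 32))   # (60, 34, 2040) with 32x32 CTBs
```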

A CTU row is a set of CTUs that extends horizontally from the left boundary of a slice to the right boundary of the slice. A CTB row is a set of CTBs that extends horizontally from the left boundary of a slice to the right boundary of the slice. A CTU column is a set of CTUs that extends vertically from the upper boundary of a slice to the lower boundary of the slice. A CTB column is a set of CTBs that extends vertically from the upper boundary of a slice to the lower boundary of the slice. An end-of-CTB-row bit is a bit located at the end of a CTB row. Byte alignment bits are bits added as padding at the end of a subset of data, such as a CTU row, a CTB row, or a tile. Byte alignment bits may be used to account for and compensate for delays introduced by WPP. WPP is a mechanism that delays the coding of the CTU rows of a slice so that different rows can be decoded in parallel by different threads. A slice address refers to an identifiable location of a slice or a sub-portion thereof.
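The delay mentioned above is what produces the wavefront: each CTB row can be handled by its own thread, but a given CTB can start only after the CTB above and to its right has finished. The hypothetical Python sketch below assumes a two-CTB lag per row (as in HEVC-style WPP; the exact lag is codec-specific) and shows the resulting staggered schedule.

```python
def wpp_schedule(rows: int, cols: int):
    """Group CTBs into 'waves'; all CTBs within one wave can be processed in parallel.

    CTB (r, c) depends on (r, c-1) in the same row and on (r-1, c+1) in the row above,
    so with a per-row lag of 2 CTBs its earliest wave is c + 2 * r.
    """
    waves = {}
    for r in range(rows):
        for c in range(cols):
            waves.setdefault(c + 2 * r, []).append((r, c))
    return [waves[w] for w in sorted(waves)]


if __name__ == "__main__":
    for i, wave in enumerate(wpp_schedule(rows=3, cols=6)):
        print(f"wave {i}: {wave}")
```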

The following abbreviations are used herein: Coding Tree Block (CTB), Coding Tree Unit (CTU), Coding Unit (CU), Coded Video Sequence (CVS), Joint Video Experts Team (JVET), Motion-Constrained Tile Set (MCTS), Maximum Transfer Unit (MTU), Network Abstraction Layer (NAL), Picture Order Count (POC), Raw Byte Sequence Payload (RBSP), Sequence Parameter Set (SPS), Versatile Video Coding (VVC), and Working Draft (WD).

Fig. 1 is a block diagram of an exemplary coding system 10 that may utilize the video coding techniques described herein. As shown in fig. 1, coding system 10 includes a source device 12, where source device 12 provides encoded video data that is then decoded by a destination device 14. In particular, source device 12 may provide video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a variety of devices, including desktop computers, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as "smart" phones and "smart" tablets, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and so forth. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may include any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet network, such as a local area network, a wide area network, or a global network, such as the internet. The communication medium may include a router, switch, base station, or any other device that facilitates communication from source device 12 to destination device 14.

In some examples, the encoded data may be output from output interface 22 to a storage device. Similarly, the encoded data may be accessed from the storage device through the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard disk drive, blu-ray discs, Digital Video Discs (DVDs), compact disc read-only memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In another example, the storage device may correspond to a file server or other intermediate storage device, wherein the other intermediate storage device may store the encoded video generated by source device 12. Destination device 14 may access the stored video data from the storage device by streaming or downloading. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Exemplary file servers include a web server (e.g., for a website), a File Transfer Protocol (FTP) server, a Network Attached Storage (NAS) device, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an internet connection. The standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., Digital Subscriber Line (DSL), cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data in the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions such as dynamic adaptive streaming over HTTP (DASH), encoding digital video for storage in a data storage medium, decoding digital video stored in a data storage medium, or other applications. In some examples, coding system 10 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 of source device 12 and/or video decoder 30 of destination device 14 may be used to apply these techniques for video coding. In other examples, the source device and the destination device may include other components or apparatuses. For example, source device 12 may receive video data from an external video source, such as an external video camera. Likewise, destination device 14 may be connected with an external display device instead of including an integrated display device.

The coding system 10 shown in fig. 1 is merely one example. These techniques for video coding may be implemented by any digital video encoding and/or decoding device. Although the techniques of this disclosure are typically implemented by video coding devices, the techniques may also be implemented by a video encoder/decoder (commonly referred to as a "codec"). Furthermore, the techniques of this disclosure may also be implemented by a video preprocessor. The video encoder and/or the video decoder may be a Graphics Processing Unit (GPU) or a similar device.

Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates encoded video data for transmission to destination device 14. In some examples, source device 12 and destination device 14 may operate in a substantially symmetric manner, and thus both source device 12 and destination device 14 include video encoding and decoding components. Accordingly, coding system 10 may support one-way or two-way video transmission between video devices 12 and 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 in source device 12 may include a video capture device (e.g., a video camera), a video archive containing previously captured video, and/or a video feed interface that receives video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video.

In some cases, when video source 18 is a video camera, source device 12 and destination device 14 may constitute a camera handset or video handset. However, as noted above, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In various instances, captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output to the computer-readable medium 16 via the output interface 22.

Computer-readable medium 16 may include transitory media such as wireless broadcast or wired network transmission, and may also include storage media (i.e., non-transitory storage media) such as a hard disk, flash drive, compact disk, digital video disk, blu-ray disk, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14 via a network transmission or the like. Similarly, a computing device in a media production facility, such as an optical disc stamping facility, may receive encoded video data from source device 12 and generate an optical disc that includes the encoded video data. Thus, in various examples, computer-readable media 16 may be understood to include one or more computer-readable media in various forms.

The input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30 and includes syntax elements that describe characteristics and/or processing of blocks and other coded units (e.g., groups of pictures (GOPs)). The display device 32 displays the decoded video data to a user and may include any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate in accordance with a video coding standard, such as the High Efficiency Video Coding (HEVC) standard currently being developed, and may conform to the HEVC test model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264 standard, also known as Moving Picture Experts Group (MPEG)-4 Part 10, Advanced Video Coding (AVC), H.265/HEVC, or an extended version of such a standard. However, the techniques of this disclosure are not limited to any particular encoding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and an audio decoder, and may include suitable multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to encode both audio and video in a common data stream or in separate data streams. The MUX-DEMUX units may comply with the ITU-T H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application-Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store the instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to implement the techniques of the present invention. Video encoder 20 and video decoder 30 may each be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the respective device. A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device such as a cellular telephone.

Fig. 2 is a block diagram of an exemplary video encoder 20 that may implement video coding techniques. Video encoder 20 may perform intra-coding and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra mode (I-mode) may refer to any of several spatial-based coding modes. Inter modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based coding modes.

As shown in fig. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 2, video encoder 20 includes a mode selection unit 40, a reference frame memory 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The mode selection unit 40 in turn comprises a motion compensation unit 44, a motion estimation unit 42, an intra-prediction (intra-prediction) unit 46 and a segmentation unit 48. To reconstruct the video block, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and an adder 62. A deblocking filter (not shown in fig. 2) is also included to filter block boundaries to remove blockiness from the reconstructed video. A deblocking filter is typically used to filter the output of the adder 62, if desired. In addition to the deblocking filter, other (in-loop or post-loop) filters may be used. Such filters are not shown for simplicity, but may filter the output of the adder 50 (as an in-loop filter) if desired.

In the encoding process, video encoder 20 receives a video frame or slice to be encoded. The frame or slice may be divided into a plurality of video blocks. Motion estimation unit 42 and motion compensation unit 44 inter-prediction encode the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may also intra-prediction encode the received video block relative to one or more neighboring blocks located within the same frame or slice as the block to be encoded to provide spatial prediction. Video encoder 20 may perform multiple encoding processes, for example, to select an appropriate encoding mode for each block of video data.

Furthermore, partition unit 48 may partition a block of video data into sub-blocks based on an evaluation of a previous partition scheme in a previous encoding process. For example, the partition unit 48 may initially partition a frame or slice into a plurality of Largest Coding Units (LCUs), and partition each LCU into a plurality of sub-coding units (sub-CUs) according to a rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further generate a quadtree data structure indicating a partitioning of the LCU into a plurality of sub-CUs. Leaf-node CUs of a quadtree may include one or more Prediction Units (PUs) and one or more Transform Units (TUs).
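As a rough illustration of this split-or-keep decision (not the actual rate-distortion optimization used by any particular encoder), the hypothetical Python recursion below keeps a block whole or splits it into four equal sub-blocks, whichever has the lower cost under a caller-supplied cost function.

```python
def partition(x: int, y: int, size: int, min_size: int, cost_whole, split_overhead: float = 1.0):
    """Return (leaf_blocks, cost) for a toy quadtree split decision.

    cost_whole(x, y, size) stands in for the rate-distortion cost of coding the
    block at (x, y) without splitting it further.
    """
    whole_cost = cost_whole(x, y, size)
    if size <= min_size:
        return [(x, y, size)], whole_cost
    half = size // 2
    leaves, split_cost = [], split_overhead
    for dx in (0, half):
        for dy in (0, half):
            sub_leaves, sub_cost = partition(x + dx, y + dy, half, min_size,
                                             cost_whole, split_overhead)
            leaves += sub_leaves
            split_cost += sub_cost
    if split_cost < whole_cost:
        return leaves, split_cost
    return [(x, y, size)], whole_cost


# Toy cost: pretend the top-left corner of the LCU is "busy" and expensive to code whole.
def busy(x, y, s):
    return (s * s) * (0.05 if (x < 32 and y < 32) else 0.01)

blocks, cost = partition(0, 0, 64, 8, busy)
print(len(blocks), round(cost, 2))   # the busy corner makes splitting the 64x64 LCU cheaper
```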

The term "block" is used in this disclosure to refer to any of a CU, PU, or TU in the HEVC context, or similar data structures in other standard contexts (e.g., macroblocks and sub-blocks thereof in h.264/AVC). A CU includes an encoding node, a PU associated with the encoding node, and a TU. The size of the CU corresponds to the size of the coding node and is square. The CU may range in size from 8 × 8 pixels to a maximum of 64 × 64 pixels or larger treeblock size. Each CU may include one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe partitioning the CU into one or more PUs. When a CU is encoded in a skip mode or a direct mode, an intra-prediction mode, or an inter-prediction (inter-prediction/inter-prediction) mode, the partition mode may be different. The PU may be segmented into non-square shapes. For example, syntax data associated with a CU may also describe partitioning the CU into one or more TUs according to a quadtree. TUs may be square or non-square (e.g., rectangular).

Mode select unit 40 may select one of the intra or inter coding modes based on the error result, etc., provide the resulting intra or inter coded block to adder 50 to generate residual block data, and provide to adder 62 to reconstruct the coded block for use as a reference frame. Mode select unit 40 also provides syntax elements such as motion vectors, intra-mode indicators, partition information, and other such syntax information to entropy encoding unit 56.

The motion estimation unit 42 and the motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. The motion estimation performed by motion estimation unit 42 is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may represent the displacement of a PU of a video block within a current video frame or picture relative to a prediction block within a reference frame (or other coded unit), relative to the current block being coded within the current frame (or other coded unit). A prediction block is a block that is found to closely match the block to be coded in terms of pixel difference, which may be determined by a Sum of Absolute Differences (SAD), a Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference frame memory 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional-pixel positions of a reference picture. Accordingly, motion estimation unit 42 may perform a motion search with respect to integer pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
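For instance, the SAD metric mentioned above can be computed as in the following illustrative Python snippet, with tiny 2x2 blocks standing in for real prediction blocks.

```python
def sad(current_block, candidate_block) -> int:
    """Sum of Absolute Differences between two equally sized blocks of samples."""
    return sum(abs(c - p)
               for cur_row, cand_row in zip(current_block, candidate_block)
               for c, p in zip(cur_row, cand_row))

current = [[10, 12],
           [11, 13]]
candidate = [[9, 12],
             [13, 13]]
print(sad(current, candidate))   # |10-9| + |12-12| + |11-13| + |13-13| = 3
```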

Motion estimation unit 42 calculates the motion vector for a PU of a video block in an inter-coded slice by comparing the location of the PU to the location of a prediction block of a reference picture. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), where each list is used to identify one or more reference pictures stored in the reference frame memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy coding unit 56 and the motion compensation unit 44.

The motion compensation performed by motion compensation unit 44 may include extracting or generating a prediction block from the motion vector determined by motion estimation unit 42. Additionally, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may find the prediction block to which the motion vector points in one of the reference picture lists. Adder 50 forms a residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being encoded, resulting in pixel difference values, as described below. In general, motion estimation unit 42 performs motion estimation with respect to the luminance component, and motion compensation unit 44 uses a motion vector calculated from the luminance component for both the chrominance component and the luminance component. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slices for use by video decoder 30 in decoding the video blocks of the video slices.

Intra-prediction unit 46 may intra-predict the current block in place of the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, the intra prediction unit 46 may determine an intra prediction mode for encoding the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., in separate encoding processes, and intra-prediction unit 46 (or mode selection unit 40 in some examples) may select an appropriate intra-prediction mode from the tested modes for use.

For example, intra-prediction unit 46 may calculate rate-distortion values using rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode having the best rate-distortion characteristics from among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
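A common way to express this trade-off is the Lagrangian cost J = D + λ·R (distortion plus λ times the bit cost). The mode names and numbers in the hypothetical sketch below are made up purely for illustration.

```python
def best_mode(candidates, lam: float):
    """candidates: iterable of (mode_name, distortion, bits); return the lowest-cost entry."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])

modes = [
    ("planar",     1200.0, 40),   # J = 1200 + 10*40 = 1600
    ("dc",         1500.0, 30),   # J = 1500 + 10*30 = 1800
    ("angular_26",  900.0, 75),   # J =  900 + 10*75 = 1650
]
print(best_mode(modes, lam=10.0))   # ('planar', 1200.0, 40)
```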

In addition, the intra prediction unit 46 may be used to encode a depth block of the depth map using a Depth Modeling Mode (DMM). The mode selection unit 40 may determine whether the available DMM mode produces better encoding results than the intra prediction mode and other DMM modes (e.g., using rate-distortion optimization (RDO)). Data of the texture image corresponding to the depth map may be stored in the reference frame memory 64. Motion estimation unit 42 and motion compensation unit 44 may also be used to inter-predict depth blocks of the depth map.

After selecting an intra-prediction mode (e.g., a conventional intra-prediction mode or one of the DMM modes) for a block, intra-prediction unit 46 may provide information indicating the intra-prediction mode selected for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include configuration data in the transmitted bitstream, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of coding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the coding contexts.

Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being encoded. Adder 50 represents one or more components that perform this subtraction operation.

The transform processing unit 52 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, thereby generating a video block including residual transform coefficient values. Transform processing unit 52 may perform other transforms conceptually similar to DCT, and may also apply wavelet transforms, integer transforms, sub-band transforms, or other types of transforms.

The transform processing unit 52 applies a transform to the residual block, thereby generating a block having residual transform coefficients. The transform process may convert the residual information from a pixel value domain to a transform domain, such as the frequency domain. The transform processing unit 52 may send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the code rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. Then, in some examples, quantization unit 54 may perform a scan of a matrix including quantized transform coefficients. Alternatively, the entropy encoding unit 56 may also perform scanning.
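To illustrate the quantization step controlled by the quantization parameter (QP), the snippet below uses the approximate relation Qstep ≈ 2^((QP−4)/6), under which the step size roughly doubles every six QP values; real codecs use integer scaling tables, so this is only a hedged sketch.

```python
def qstep(qp: int) -> float:
    """Approximate quantization step size for a given QP (step doubles every 6 QP values)."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp: int):
    return [round(c / qstep(qp)) for c in coeffs]

def dequantize(levels, qp: int):
    return [level * qstep(qp) for level in levels]

coeffs = [100.0, -35.0, 12.0, 3.0]
levels = quantize(coeffs, qp=22)
print(levels)                      # coarser representation: fewer bits to entropy-code
print(dequantize(levels, qp=22))   # reconstruction differs from the original: lossy
```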

After quantization, the entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, the entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding technique. In the case of context-based entropy coding, the context may be based on neighboring blocks. After entropy encoding by entropy encoding unit 56, the encoded bitstream may be sent to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for subsequent use as a reference block. Motion compensation unit 44 may calculate the reference block by adding the residual block to a predicted block of one of the frames in reference frame store 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Adder 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to produce a reconstructed video block, which is stored in reference frame store 64. Motion estimation unit 42 and motion compensation unit 44 may use the reconstructed video block as a reference block to inter-code a block in a subsequent video frame.

Fig. 3 is a block diagram of an exemplary video decoder 30 that may implement video coding techniques. In the example of fig. 3, video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference frame memory 82, and an adder 80. In some examples, video decoder 30 may perform a decoding process that is generally the inverse of the encoding process performed by video encoder 20 (fig. 2). Motion compensation unit 72 may generate prediction data from the motion vectors received from entropy decoding unit 70, and intra-prediction unit 74 may generate prediction data from the intra-prediction mode indicator received from entropy decoding unit 70.

In the decoding process, video decoder 30 receives an encoded video bitstream from video encoder 20, which represents video blocks of an encoded video slice and associated syntax elements. Entropy decoding unit 70 of video decoder 30 entropy decodes the code stream to generate quantized coefficients, motion vectors, or intra prediction mode indicators, as well as other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive video slice-level and/or video block-level syntax elements.

When a video slice is encoded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for video blocks of the current video slice according to the indicated intra-prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is encoded as an inter-coded (e.g., B, P or GPB) slice, motion compensation unit 72 generates prediction blocks for the video blocks of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The prediction blocks may be generated from one of the reference pictures in one of the reference picture lists. Video decoder 30 may use a default construction technique to construct reference frame list 0 and list 1 from the reference pictures stored in reference frame memory 82.

Motion compensation unit 72 determines prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements and uses the prediction information to generate a prediction block for the current video block being decoded. For example, motion compensation unit 72 uses some of the syntax elements received to determine a prediction mode (e.g., intra-prediction or inter-prediction) for encoding video blocks of a video slice, an inter-prediction slice type (e.g., B-slice, P-slice, or GPB-slice), construction information for one or more reference picture lists of the slice, a motion vector for each inter-coded video block in the slice, an inter-prediction state for each inter-coded video block in the slice, and other information to decode video blocks within the current video slice.

The motion compensation unit 72 may also interpolate according to interpolation filters. Motion compensation unit 72 may use interpolation filters used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of the reference block. In this case, motion compensation unit 72 may determine interpolation filters used by video encoder 20 based on the received syntax elements and use these interpolation filters to generate prediction blocks.
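To make the sub-integer interpolation step concrete, a minimal Python sketch using simple bilinear weighting at half-pel positions is given below. It is illustrative only: HEVC/VVC-style codecs use longer separable filters, and, as noted above, the decoder uses the same interpolation filters that the encoder used.

    def bilinear_half_pel(samples, frac_x, frac_y):
        # Illustrative half-pel interpolation over a 2-D list of integer samples.
        # frac_x / frac_y are 0 (integer position) or 1 (half-pel offset).
        h, w = len(samples), len(samples[0])
        out = [[0] * (w - 1) for _ in range(h - 1)]
        for y in range(h - 1):
            for x in range(w - 1):
                a = samples[y][x]
                b = samples[y][x + 1]
                c = samples[y + 1][x]
                d = samples[y + 1][x + 1]
                if frac_x and frac_y:          # diagonal half-pel position
                    out[y][x] = (a + b + c + d + 2) >> 2
                elif frac_x:                   # horizontal half-pel position
                    out[y][x] = (a + b + 1) >> 1
                elif frac_y:                   # vertical half-pel position
                    out[y][x] = (a + c + 1) >> 1
                else:                          # integer-pel position
                    out[y][x] = a
        return out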

Data of the texture image corresponding to the depth map may be stored in the reference frame memory 82. Motion compensation unit 72 may also be used to inter-predict depth blocks of the depth map.

In one embodiment, video decoder 30 includes a User Interface (UI) 84. User interface 84 is used to receive input from a user (e.g., a network administrator) of video decoder 30. Through the user interface 84, the user is able to manage or change settings on the video decoder 30. For example, a user can input or otherwise provide values for parameters (e.g., flags) to control the configuration and/or operation of video decoder 30 in accordance with the user's preferences. For example, user interface 84 may be a Graphical User Interface (GUI) that enables a user to interact with video decoder 30 through graphical icons, drop down menus, check boxes, and the like. In some cases, the user interface 84 may receive information from a user through a keyboard, mouse, or other peripheral device. In one embodiment, the user can access the user interface 84 through a smartphone, a tablet device, a personal computer remote from the video decoder 30, or the like. The user interface 84 described herein may be referred to as an external input module or an external module.

In view of the above, video compression techniques are employed to perform spatial (intra) prediction and/or temporal (inter) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video picture or a portion of a video picture) may be partitioned into multiple video blocks, which may also be referred to as treeblocks, Coding Treeblocks (CTBs), Coding Tree Units (CTUs), Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples of neighboring blocks of the same picture. A video block in an inter-coded (P or B) slice of a picture may use spatial prediction for reference samples of neighboring blocks of the same picture or temporal prediction for reference samples of other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.

Spatial prediction or temporal prediction is used to generate a prediction block for a block to be encoded. The residual data represents pixel differences between the original block to be encoded and the prediction block. An inter-coded block is coded according to a motion vector pointing to the block of reference samples constituting the prediction block and residual data representing the difference between the coded block and the prediction block. An intra-coded block is coded according to the intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients that may then be quantized. The quantized transform coefficients are initially arranged in a two-dimensional array and may be scanned to produce a one-dimensional vector of transform coefficients. Entropy coding may be used to achieve further compression.
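To make the block-based pipeline concrete, the following is a minimal Python sketch of the residual, transform, quantization, and scan steps described above. It is a toy illustration under simplifying assumptions (an identity "transform" and uniform quantization), not the integer DCT/DST, QP-dependent scaling, or diagonal scans that real codecs use.

    def encode_block_sketch(original, prediction, q_step=8):
        # original and prediction are NxN 2-D lists of integer samples.
        n = len(original)
        # Residual: pixel-wise difference between the block to be encoded and its prediction.
        residual = [[original[y][x] - prediction[y][x] for x in range(n)] for y in range(n)]
        # Placeholder transform (identity); a real codec applies an integer DCT/DST here.
        coeffs = residual
        # Uniform quantization (rounding toward zero).
        quantized = [[int(c / q_step) for c in row] for row in coeffs]
        # Scan the 2-D array of quantized coefficients into a 1-D vector
        # (raster scan here; codecs typically use diagonal/zig-zag scans).
        scanned = [quantized[y][x] for y in range(n) for x in range(n)]
        return scanned  # this vector would then be entropy coded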

Image and video compression techniques have evolved rapidly, resulting in various coding standards. Such video coding standards include ITU-T H.261, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) MPEG-1 part 2, ITU-T H.262 or ISO/IEC MPEG-2 part 2, ITU-T H.263, ISO/IEC MPEG-4 part 2, Advanced Video Coding (AVC) (also known as ITU-T H.264 or ISO/IEC MPEG-4 part 10), and High Efficiency Video Coding (HEVC) (also known as ITU-T H.265 or MPEG-H part 2). AVC includes Scalable Video Coding (SVC), Multi-view Video Coding (MVC), and extensions of Multi-view Video Coding plus Depth (MVC+D) and 3D AVC (3D-AVC). HEVC includes Scalable HEVC (SHVC), Multi-view HEVC (MV-HEVC), and 3D-HEVC extensions.

Versatile Video Coding (VVC) is a new video coding standard being developed by the Joint Video Experts Team (JVET) of ITU-T and ISO/IEC. Although there are several working drafts (WDs) of the VVC standard, one of them (draft 5), namely "Versatile Video Coding (Draft 5)", JVET-N1001-v3, by B. Bross, J. Chen, and S. Liu, from the 13th JVET meeting on March 27, 2019, is incorporated herein by reference in its entirety.

The techniques disclosed herein are described in the context of the video coding standard being developed by the Joint Video Experts Team (JVET) of ITU-T and ISO/IEC, namely Versatile Video Coding (VVC). However, these techniques are also applicable to other video codec specifications.

The image segmentation scheme in HEVC will be described below.

HEVC includes four different image segmentation schemes, namely regular slices, dependent slices, tiles (blocks), and wavefront parallel processing (WPP). These schemes can be applied for maximum transmission unit (MTU) size matching, parallel processing, and end-to-end delay reduction.

The regular slices in HEVC are similar to the slices in H.264/AVC. Each regular slice is encapsulated in its own network abstraction layer (NAL) unit, and in-picture prediction (intra sample prediction, motion information prediction, coding mode prediction) and entropy coding dependencies across slice boundaries are disabled. Thus, a regular slice can be reconstructed independently of other regular slices within the same image (although interdependencies may still exist due to loop filtering operations).

Regular slices are the only parallelization tool that is also available, in almost the same form, in H.264/AVC. Regular-slice-based parallelization does not require much inter-processor or inter-core communication (except for inter-processor or inter-core data sharing for motion compensation when decoding a predictively coded picture, which is typically much heavier than the data sharing needed for in-picture prediction). However, for the same reason, using regular slices can incur substantial coding overhead due to the bit cost of the slice header and the lack of prediction across slice boundaries. Furthermore, because of their in-picture independence and the fact that each regular slice is encapsulated in its own NAL unit, regular slices (in contrast to the other tools mentioned below) also serve as the key mechanism for bitstream partitioning to match MTU size requirements. In many cases, the goal of parallelization and the goal of MTU size matching place conflicting demands on the slice layout within an image, and the realization of this situation led to the development of the parallelization tools mentioned below.

Dependent slices have shortened slice headers and allow splitting the bitstream at tree block boundaries without breaking any in-picture prediction. Basically, dependent slices fragment a regular slice into multiple NAL units, reducing end-to-end delay by allowing a portion of the regular slice to be sent out before encoding of the entire regular slice is completed.

In WPP, an image is divided into single rows of coding tree blocks (CTBs). Entropy decoding and prediction are allowed to use data from CTBs in other partitions. Parallel processing is possible by decoding CTB rows in parallel, where the start of decoding of a CTB row is delayed by two CTBs. This delay ensures that data related to the CTBs above and to the right of the current CTB is available before the current CTB is decoded. With this staggered start (which appears like a wavefront when represented graphically), parallelization with as many processors/cores as the image contains CTB rows can be supported. Because in-picture prediction between adjacent tree block rows within an image is permitted, the inter-processor/inter-core communication required to enable in-picture prediction can be substantial. WPP partitioning does not produce additional NAL units compared to the case where WPP is not used; therefore, WPP cannot be used for MTU size matching. When MTU size matching is required, regular slices can be used in conjunction with WPP, although this requires some coding overhead.

Tiles define horizontal and vertical boundaries of columns and rows used to segment the image into tiles. The scan order of CTBs is changed to be local within a tile (in the CTB raster scan order of the tile) before the top-left CTB of the next tile, in the tile raster scan order of the image, is decoded. Similar to regular slices, tiles break in-picture prediction dependencies as well as entropy decoding dependencies. However, tiles need not be included in individual NAL units (the same as WPP in this respect), and thus tiles cannot be used for MTU size matching. Each tile can be processed by one processor/core, and in the case of a slice spanning multiple tiles, the inter-processor/inter-core communication required for in-picture prediction between the processing units decoding neighbouring tiles is limited to conveying the shared slice header and sharing reconstructed samples and metadata related to loop filtering. When more than one tile or WPP segment is included in a slice, the entry point byte offset for each tile or WPP segment other than the first one in the slice is indicated in the slice header.

For simplicity, restrictions on the application of the four different image segmentation schemes have been specified in HEVC. A given coded video sequence cannot include both tiles and wavefronts for most of the profiles specified in HEVC. For each slice and tile, one or both of the following conditions shall be satisfied: (1) all coding tree blocks in a slice belong to the same tile; (2) all coding tree blocks in a tile belong to the same slice. Finally, a wavefront segment includes exactly one row of CTBs, and when WPP is used, if a slice starts within a row of CTBs, it must end in the same row of CTBs.

An image segmentation scheme in VVC will be described below.

Similar to HEVC, VVC includes four different image segmentation schemes, namely slices (stripes), tiles (blocks), bricks, and wavefront parallel processing (WPP). These schemes can be applied for maximum transmission unit (MTU) size matching, parallel processing, and end-to-end delay reduction.

The tiles (blocks) in VVC are similar to those in HEVC. Tiles define horizontal and vertical boundaries of columns and rows used to segment the image into tiles. In VVC, the tile concept is further refined by allowing a tile to be further divided horizontally to form bricks. A tile that is not further divided is also considered a brick. The CTB scan order is changed to be local within a brick (in the CTB raster scan order of the brick) before the top-left CTB of the next brick, in the brick raster scan order of the image, is decoded.

A slice (stripe) in VVC includes one or more bricks. Each slice is encapsulated in its own NAL unit, and in-picture prediction (intra sample prediction, motion information prediction, coding mode prediction) and entropy coding dependencies across slice boundaries are disabled. Thus, a slice can be reconstructed independently of other slices within the same image (although interdependencies may still exist due to loop filtering operations). VVC defines two types of slices: rectangular slices and raster-scan slices. A rectangular slice includes one or more bricks that occupy a rectangular region of the image. A raster-scan slice includes one or more bricks arranged in the brick raster scan order of the image.

The WPP characteristics in VVC are similar to those in HEVC, except that HEVC WPP has a two-CTU delay, whereas VVC WPP has only a one-CTU delay. For HEVC WPP, a new decoding thread may begin decoding the first CTU of its assigned CTU row after the first two CTUs of the previous CTU row have been decoded. For VVC WPP, on the other hand, a new decoding thread may begin decoding the first CTU of its assigned CTU row after the first CTU of the previous CTU row has been decoded.
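The difference between the two-CTU delay of HEVC WPP and the one-CTU delay of VVC WPP can be expressed as a simple dependency check. The following Python sketch (a hypothetical helper, not normative text) decides whether a given CTU may be decoded, given how far each CTU row has progressed:

    def can_decode_ctu(row, col, decoded_per_row, row_width, delay_ctus):
        # decoded_per_row[r] = number of CTUs already decoded in CTU row r.
        # delay_ctus = 2 for HEVC WPP, 1 for VVC WPP.
        if col != decoded_per_row[row]:
            return False                      # CTUs within a row are decoded left to right
        if row == 0:
            return True                       # the first CTU row has no wavefront dependency
        # The row above must be at least delay_ctus CTUs ahead (clamped at the row end).
        return decoded_per_row[row - 1] >= min(col + delay_ctus, row_width)

    # Example: with VVC's one-CTU delay, row 1 may start once row 0 has decoded one CTU.
    assert can_decode_ctu(1, 0, [1, 0, 0], row_width=8, delay_ctus=1)
    # With HEVC's two-CTU delay, row 1 must wait for two CTUs of row 0.
    assert not can_decode_ctu(1, 0, [1, 0, 0], row_width=8, delay_ctus=2)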

The indication of the rectangular strip will be described below.

The structure of rectangular slices is indicated in a Picture Parameter Set (PPS) by describing the number of rectangular slices in a picture. For each slice, the top-left brick index and a delta value used to derive the bottom-right brick index are indicated to describe the position of the slice in the image and its size (i.e., in brick units). For a raster-scan slice, its information is indicated in the slice header by the index of the first brick in the slice and the number of bricks in the slice.

The PPS syntax table portion shown below includes syntax elements for describing information indicating partitions, bricks, and rectangular strips in the PPS.

single_brick_per_slice_flag equal to 1 indicates that each slice referring to this PPS includes one brick; single_brick_per_slice_flag equal to 0 indicates that a slice referring to this PPS may include a plurality of bricks. When single_brick_per_slice_flag is not present, its value is inferred to be equal to 1.

rect_slice_flag equal to 0 indicates that the bricks in each slice are arranged in raster scan order and that slice information is not indicated in the PPS; rect_slice_flag equal to 1 indicates that the bricks in each slice cover a rectangular region of the image and that slice information is indicated in the PPS. When single_brick_per_slice_flag is equal to 1, rect_slice_flag is inferred to be equal to 1.

num_slices_in_pic_minus1 plus 1 indicates the number of slices in each picture referring to the PPS. The value of num_slices_in_pic_minus1 shall be in the range of 0 to NumBricksInPic − 1, inclusive. When num_slices_in_pic_minus1 is not present and single_brick_per_slice_flag is equal to 1, the value of num_slices_in_pic_minus1 is inferred to be equal to NumBricksInPic − 1.

top_left_brick_idx[i] represents the brick index of the brick located at the top-left corner of the i-th slice. For any i not equal to j, the value of top_left_brick_idx[i] shall not be equal to the value of top_left_brick_idx[j]. When top_left_brick_idx[i] is not present, its value is inferred to be equal to i. The top_left_brick_idx[i] syntax element is Ceil(Log2(NumBricksInPic)) bits in length.

bottom_right_brick_idx_delta[i] represents the difference between the brick index of the brick located at the bottom-right corner of the i-th slice and top_left_brick_idx[i]. When single_brick_per_slice_flag is equal to 1, the value of bottom_right_brick_idx_delta[i] is inferred to be equal to 0. The bottom_right_brick_idx_delta[i] syntax element is Ceil(Log2(NumBricksInPic − top_left_brick_idx[i])) bits in length.

It is a requirement of codestream conformance that a slice shall include either a number of complete blocks or only a consecutive sequence of complete bricks of one block.

The variables NumBricksInSlice[i] and BricksToSliceMap[j] represent the number of bricks in the i-th slice and the brick-to-slice mapping, respectively, and are derived from the slice information indicated in the PPS.
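The derivation pseudocode of the draft is not reproduced in this text. As an illustration only, the following Python sketch shows one plausible derivation under the assumptions that bricks are indexed in raster-scan order over a grid of brick_cols x brick_rows and that each rectangular slice covers the rectangle of bricks between top_left_brick_idx[i] and the bottom-right index obtained by adding bottom_right_brick_idx_delta[i]:

    def derive_slice_brick_maps(top_left_brick_idx, bottom_right_brick_idx_delta,
                                brick_cols, brick_rows):
        # Hypothetical helper, not the VVC draft pseudocode.
        num_bricks_in_pic = brick_cols * brick_rows
        bricks_to_slice_map = [-1] * num_bricks_in_pic
        num_bricks_in_slice = []
        for i, tl in enumerate(top_left_brick_idx):
            br = tl + bottom_right_brick_idx_delta[i]   # bottom-right brick index
            tl_row, tl_col = divmod(tl, brick_cols)
            br_row, br_col = divmod(br, brick_cols)
            count = 0
            for r in range(tl_row, br_row + 1):
                for c in range(tl_col, br_col + 1):
                    bricks_to_slice_map[r * brick_cols + c] = i
                    count += 1
            num_bricks_in_slice.append(count)
        return num_bricks_in_slice, bricks_to_slice_map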

the indication of WPP in VVC will be described below.

The indication of WPP in VVC is described by the syntax tables and semantics of the PPS, the slice header, and the slice data.

A flag in the PPS, referred to as entropy_coding_sync_enabled_flag, indicates whether WPP is used to encode pictures that refer to the PPS, as shown in the PPS syntax table portion described below.

When WPP is enabled for encoding a picture, the slice headers of all slices of the picture include entry point information (i.e., offsets into the slice data payload) for accessing each data subset of CTU rows so that the subsets can be processed according to the WPP method. The indication of this information is shown in the slice header syntax table portion described below.

When WPP is enabled, each row of CTUs is referred to as a data subset in the slice data payload. At the end of each data subset, one bit, denoted end_of_subset_one_bit, is indicated to mark the end of the data subset. Further, to ensure that the size of each data subset is a multiple of one byte (i.e., 8 bits), byte alignment is performed to add byte alignment bits at the end of each data subset. The indication of end_of_subset_one_bit and of the byte alignment at the end of each subset is shown in the slice data syntax table below.
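The following Python sketch illustrates how an encoder might terminate a data subset, assuming a bit-list representation of the slice data: the end_of_subset_one_bit is written, followed by byte alignment (a one bit and then zero bits) until the subset ends on a byte boundary. The helper name is hypothetical; the exact alignment syntax is defined by the draft's byte_alignment() process.

    def end_data_subset(bits):
        # bits: list of 0/1 values accumulated for the current data subset.
        bits.append(1)                  # end_of_subset_one_bit
        bits.append(1)                  # first byte-alignment bit (equal to one)
        while len(bits) % 8 != 0:
            bits.append(0)              # remaining byte-alignment bits (equal to zero)
        return bits

    # Example: a 13-bit subset is padded to 16 bits (two whole bytes).
    assert len(end_data_subset([0] * 13)) == 16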

Some of the problems with WPP and bricks will be described below.

First, when a slice includes a plurality of bricks and WPP is enabled for encoding the image that includes the slice, each CTU row of each brick in the slice is a data subset. The syntax element end_of_subset_one_bit, indicating the end of a CTU row, is indicated at the end of each data subset, and the syntax element end_of_brick_one_bit, indicating the end of the last CTU of a brick, is also indicated at the end of the last data subset of the brick. However, it is not necessary to indicate both syntax elements at the same position. Also, byte alignment should be performed at the end of each data subset, but it is not necessary to perform byte alignment repeatedly at the same position.

Second, when blocks (tiles), bricks, and WPP are used together, the implementation of WPP may become more complex, considering that a slice may include one or more blocks and each block may include one or more bricks.

In order to solve the above-described problems, the present invention provides the following aspects (each of which may be applied alone, or some of which may be applied in combination).

A first solution includes a method of decoding a video bitstream. In one embodiment, the video bitstream includes at least one image comprising a plurality of slices, each of the plurality of slices comprising a plurality of bricks, and each of the plurality of bricks comprising a plurality of coding tree units (CTUs). The method comprises the following steps: the parameter set is parsed to determine whether wavefront parallel processing (WPP) is enabled for the current image and/or current slice. The method includes parsing the slice data of the current slice to obtain the plurality of bricks and the CTUs in each brick. The method further comprises: parsing a current CTU, wherein the current CTU is within a brick, and determining the location of the current CTU. Further, the method comprises: indicating a bit representing the end of a CTU row and byte alignment bits when all of the following conditions are met: WPP is enabled for the current slice; the current CTU is not the last CTU of the current brick; and the next CTU in decoding order within the brick is the first CTU of a CTU row in the current brick. The method further comprises: when the current CTU is the last CTU of the current brick, indicating a bit representing the end of the brick; and indicating the byte alignment bits when the current CTU is the last CTU of the current brick but not the last CTU of the current slice.
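A minimal encoder-side Python sketch of the signalling conditions in this first solution is given below. The writer object and its methods (encode_ctu, write_bit, byte_align) are hypothetical stand-ins for actual bitstream writing, and the sketch assumes the CTUs of a brick are visited in decoding order with a fixed number of CTUs per CTU row:

    def write_brick(writer, brick_ctus, ctus_per_row, wpp_enabled, last_brick_of_slice):
        for idx, ctu in enumerate(brick_ctus):
            writer.encode_ctu(ctu)
            last_in_brick = (idx == len(brick_ctus) - 1)
            ends_ctu_row = ((idx + 1) % ctus_per_row == 0)
            if wpp_enabled and not last_in_brick and ends_ctu_row:
                writer.write_bit(1)      # bit representing the end of the CTU row
                writer.byte_align()      # byte alignment bits
            if last_in_brick:
                writer.write_bit(1)      # bit representing the end of the brick
                if not last_brick_of_slice:
                    writer.byte_align()  # byte alignment bits (omitted for the last brick of the slice)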

A second solution comprises a method of encoding a video bitstream. The video bitstream includes at least one image including a plurality of slices, each slice of the plurality of slices including one or more tiles (blocks), and each tile including one or more bricks. The method comprises the following: when WPP is enabled for encoding the current picture, each slice of the current picture is restricted to include only one tile, and each tile is restricted to include only one brick.

An alternative to the second solution comprises a method of encoding a video bitstream. The video bitstream includes at least one image including a plurality of slices, each slice including one or more tiles, and each tile including one or more bricks. The method comprises the following: when WPP is enabled for encoding the current picture, each tile of the current picture is restricted to include only one brick. That is, when the value of entropy_coding_sync_enabled_flag is equal to 1, the value of brick_splitting_present_flag should be equal to 0.

Another alternative to the second solution comprises a method of encoding a video bitstream. The video bitstream includes at least one image including a plurality of slices, each slice including one or more tiles, and each tile including one or more bricks. The method comprises the following: when WPP is enabled for encoding the current picture, each slice of the current picture is restricted to include only one brick. That is, when the value of entropy_coding_sync_enabled_flag is equal to 1, the value of the variable NumBricksInCurrentSlice should be equal to 1.
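The restrictions of the second solution and its alternatives can be summarized as simple bitstream checks. The following Python sketch is illustrative only (the variable names mirror those used above; it is not normative text):

    def wpp_constraint_satisfied(variant, entropy_coding_sync_enabled_flag,
                                 tiles_per_slice, bricks_per_tile, bricks_per_slice):
        # tiles_per_slice[i]  : number of tiles in slice i of the current picture.
        # bricks_per_tile[t]  : number of bricks in tile t of the current picture.
        # bricks_per_slice[i] : number of bricks in slice i (NumBricksInCurrentSlice).
        if not entropy_coding_sync_enabled_flag:
            return True                                    # WPP disabled: no restriction
        if variant == "one_tile_and_one_brick_per_slice":  # second solution
            return all(t == 1 for t in tiles_per_slice) and \
                   all(b == 1 for b in bricks_per_tile)
        if variant == "one_brick_per_tile":                # first alternative
            return all(b == 1 for b in bricks_per_tile)
        if variant == "one_brick_per_slice":               # second alternative
            return all(b == 1 for b in bricks_per_slice)
        raise ValueError("unknown variant: " + variant)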

Fig. 4 shows a video bitstream 400 for implementing WPP 450. The video bitstream 400 described herein may also be referred to as an encoded video bitstream, a bitstream, or variants thereof. As shown in fig. 4, the bitstream 400 includes a Sequence Parameter Set (SPS) 402, a Picture Parameter Set (PPS) 404, a slice header 406, and picture data 408.

The SPS 402 includes data common to all pictures in a sequence of pictures (SOP), while the PPS 404 includes data common to an entire picture. The slice header 406 includes information about the current slice, such as the slice type, the reference pictures to be used, and the like. The SPS 402 and the PPS 404 may be collectively referred to as parameter sets. The SPS 402, the PPS 404, and the slice header 406 are Network Abstraction Layer (NAL) unit types. A NAL unit is a syntax structure that includes an indication of the type of data (e.g., encoded video data) to follow. NAL units are divided into Video Coding Layer (VCL) NAL units and non-video coding layer (non-VCL) NAL units. The VCL NAL units include data representing the values of the samples in the video pictures, and the non-VCL NAL units include any associated additional information, such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance the usability of the decoded video signal but is not necessary for decoding the values of the samples in the video pictures). Those skilled in the art will appreciate that the codestream 400 may include other parameters and information in practical applications.

The image data 408 shown in fig. 4 includes data associated with an image or video being encoded or decoded. The image data 408 may simply be referred to as payload or data carried in the codestream 400. Image data 408 may be segmented into one or more images, such as image 410, image 412, and image 414. Although three images 410-414 are shown in FIG. 4, more or fewer images may be present in a practical application.

In one embodiment, the images 410-414 are segmented into slices (stripes), such as slice 416, slice 418, and slice 420, respectively. Although three slices are shown (e.g., slices 416-420), in actual practice more or fewer slices may be present. In one embodiment, the slices 416-420 are partitioned into partitions (blocks), such as partition 422, partition 424, and partition 426, respectively. Although three partitions are shown (e.g., partitions 422-426), in actual implementations more or fewer partitions may be present. In one embodiment, partitions 422-426 are divided into CTBs, such as CTB 428 and CTB 430. Although 40 CTBs (e.g., CTB 428 and CTB 430) are shown, in actual implementations more or fewer CTBs may be present.

WPP 450 may be used to encode and/or decode a slice (e.g., slices 416-420). Thus, WPP 450 may be used by an encoder (e.g., video encoder 20 described above) or a decoder (e.g., video decoder 30 described above).

In one embodiment, WPP 450 is applied to block 424, where block 424 is a partition of slice 416 and slice 416 is a partition of image 410. Block 424 includes a plurality of CTBs, e.g., CTB 428 and CTB 430. Each CTB (e.g., CTB 428 or CTB 430) is a set of samples of a predefined size that may be partitioned into coding blocks in a coding tree. The plurality of CTBs 428 and the plurality of CTBs 430 may be arranged into CTB rows 460, 462, 464, 466, and 468 and CTB columns 470, 472, 474, 476, 478, 480, 482, and 484. Each of CTB rows 460-468 is a set of CTBs 428 and CTBs 430 that extends horizontally between the left boundary of block 424 and the right boundary of block 424. Each of CTB columns 470-484 is a set of CTBs 428 and CTBs 430 that extends vertically between the upper boundary of block 424 and the lower boundary of block 424. In one embodiment, WPP 450 is applied to a slice (e.g., slice 416) rather than to a block (e.g., block 424). That is, in some embodiments, blocking is optional.

WPP 450 may operate with multiple computing threads in parallel to encode CTBs 428 and CTBs 430. In the example shown, CTBs 428 (shaded) have been encoded, while CTBs 430 (unshaded) have not. For example, a first thread may start encoding CTB row 460 at a first time. In VVC, after one CTB 428 has been encoded in the first CTB row 460, a second thread may begin encoding CTB row 462. After one CTB 428 has been encoded in the second CTB row 462, a third thread may begin encoding CTB row 464. After one CTB 428 has been encoded in the third CTB row 464, a fourth thread may begin encoding CTB row 466. After one CTB 428 has been encoded in the fourth CTB row 466, a fifth thread may begin encoding the fifth CTB row 468. Thus, the pattern shown in fig. 4 is formed. Other threads may be used as desired. In other words, the encoding process for a new row of CTBs may be started after a CTB in the previous row has been encoded. This mechanism creates a pattern with the appearance of a wavefront and is therefore named WPP 450. Some video coding mechanisms encode the current CTB 430 from encoded CTBs 428 located above or to the left of the current CTB 430. In VVC, WPP 450 has a one-CTB encoding delay between the start of each thread, ensuring that the relevant CTBs 428 have already been encoded when any current CTB 430 is encoded. In HEVC, WPP 450 has a two-CTB encoding delay between the start of each thread, likewise ensuring that the relevant CTBs 428 have already been encoded when any current CTB 430 is encoded.

CTBs 428 are encoded into a codestream (e.g., codestream 400) in CTB rows 460-468. Thus, each of CTB rows 460-468 may be an independently addressable subset of block 424 in the codestream 400. For example, each of CTB rows 460-468 may be addressed at an entry point 486. An entry point 486 is a bit position in the codestream 400 after block 424 has been encoded, where the codestream 400 includes the first bit of video data for the corresponding subset of block 424. When WPP 450 is employed, the entry points 486 are the bit positions that include the first bit corresponding to the respective CTB rows 460-468. Thus, the number of entry points (NumEntryPoints) 488 is the number of entry points 486 for CTB rows 460-468.
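To illustrate how entry points 486 make each CTB row independently addressable, the following Python sketch derives the byte range of each CTB-row subset from a list of subset sizes. It is a simplification: the actual slice header carries entry point offset values from which these sizes are derived, and the helper below simply assumes each value is the size in bytes of the corresponding subset:

    def subset_byte_ranges(subset_sizes, slice_data_num_bytes):
        # subset_sizes[k]: size in bytes of subset k, for k = 0 .. NumEntryPoints - 1.
        ranges = []
        start = 0
        for size in subset_sizes:
            ranges.append((start, start + size))  # byte range of subset k
            start += size
        ranges.append((start, slice_data_num_bytes))  # the last subset runs to the end
        return ranges

    # Example: three entry points partition 100 bytes of slice data into four subsets.
    assert subset_byte_ranges([30, 25, 20], 100) == [(0, 30), (30, 55), (55, 75), (75, 100)]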

Take block 424 in fig. 4 as an example. In WPP, the encoder adds a CTB row end bit at the end of each of CTB rows 460-468. The CTB row end bit is used to indicate the end of CTB rows 460-468 to the decoder. Then, the encoder performs byte alignment to add byte alignment bits as padding. In addition, in WPP, the encoder also adds a block end bit at the end of CTB row 468. The block end bit is used to indicate the end of block 424 to the decoder. Then, the encoder again performs byte alignment to add byte alignment bits as padding. Since the end of CTB row 468 is also the end of block 424, in WPP, after the last CTB 430 in CTB row 468 has been encoded, the encoder encodes both the CTB row end bit and the block end bit and performs byte alignment twice. Therefore, the indication and the byte alignment are duplicated in WPP.

Techniques to prevent duplicate indication and duplicate byte alignment in WPP are disclosed herein. By eliminating the duplicate indication and byte alignment in WPP, the number of bits used to indicate the end of a block and the number of bits used for padding are reduced. The encoder/decoder (also known as a "codec") in video coding is thereby improved relative to existing codecs because fewer bits are required in WPP. In practice, the improved video coding process provides a better user experience when sending, receiving, and/or viewing video.

Unlike the WPP process described above, the present invention indicates the block end bit and performs byte alignment only once, after the last CTB 430 in CTB row 468 has been encoded. Thus, the number of bits used for indication and the number of bits used for padding are reduced relative to WPP as described above.

Fig. 5 illustrates an embodiment of a method 500, implemented by a video decoder (e.g., video decoder 30 described above), of decoding an encoded video bitstream. The method 500 may be performed after an encoded codestream has been received directly or indirectly from a video encoder (e.g., the video encoder 20 described above). Method 500 improves the decoding process by reducing the number of bits used for indication and the number of bits used as padding after the last CTB (e.g., CTB 430) of the last CTB row (e.g., CTB row 468) of a block (e.g., block 424) has been encoded. This, therefore, effectively improves the performance of the codec, resulting in a better user experience.

Step 502: A video decoder receives an encoded video bitstream (e.g., bitstream 400). In one embodiment, the encoded video bitstream includes an image (e.g., image 410). In one embodiment, the image includes one or more slices (e.g., slices 416-420) having one or more partitions (e.g., partitions 422-426). In one embodiment, each partition includes a plurality of coding tree blocks (e.g., CTB 428 and CTB 430).

Step 504: The video decoder obtains a block end bit and a byte alignment bit having a first value in the encoded video bitstream. In one embodiment, the block end bit is represented by end_of_tile_one_bit. In one embodiment, the first value is 1 (1). In one embodiment, the byte alignment bits are the result of an encoder (e.g., video encoder 20 described above) performing a bit alignment process. In one embodiment, the block end bit and byte alignment bit having the first value are used to indicate that the current CTB (e.g., CTB 430) of the plurality of CTBs (e.g., CTB 428 and CTB 430) is the last CTB of a block (e.g., block 424).

Step 506: The video decoder obtains a CTB row end bit and a byte alignment bit having the first value in the encoded video bitstream. In one embodiment, the CTB row end bit is represented by end_of_subset_one_bit. In one embodiment, the first value is 1 (1). In one embodiment, the byte alignment bits are the result of an encoder (e.g., video encoder 20 described above) performing a bit alignment process. In one embodiment, the CTB row end bit and byte alignment bit having the first value are used to indicate that WPP is enabled and that the current CTB (e.g., CTB 430) of the plurality of CTBs (e.g., CTB 428 and CTB 430) is the last CTB of a CTB row (e.g., CTB rows 460-466) but not the last CTB of the block (e.g., block 424).

Step 508: the video decoder reconstructs a plurality of CTBs in the block based on a block end bit having a first value, a CTB row end bit having a first value, and a byte alignment bit. In one embodiment, one image is generated from the reconstructed multiple CTBs. In one embodiment, the image may be displayed to the user on an electronic device (e.g., a smartphone, a tablet, a laptop, a personal computer, etc.).
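A decoder-side Python sketch mirroring method 500 is given below. The reader object and its methods (decode_ctu, read_bit, byte_align) are hypothetical stand-ins for actual bitstream parsing; the sketch assumes the CTBs of a block are parsed in decoding order with a fixed number of CTBs per CTB row, and it consumes exactly one end bit plus one byte alignment at each CTB row end or block end, as described above:

    def parse_block(reader, num_ctbs_in_block, ctbs_per_row, wpp_enabled):
        ctbs = []
        for idx in range(num_ctbs_in_block):
            ctbs.append(reader.decode_ctu())
            last_in_block = (idx == num_ctbs_in_block - 1)
            ends_ctb_row = ((idx + 1) % ctbs_per_row == 0)
            if last_in_block:
                assert reader.read_bit() == 1   # block end bit with the first value
                reader.byte_align()             # byte alignment bits
            elif wpp_enabled and ends_ctb_row:
                assert reader.read_bit() == 1   # CTB row end bit with the first value
                reader.byte_align()             # byte alignment bits
        return ctbs                             # the reconstructed CTBs of the block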

Fig. 6 illustrates an embodiment of a method 600, implemented by a video encoder (e.g., video encoder 20 described above), of encoding a video bitstream. The method 600 may be performed when an image (e.g., from a video) needs to be encoded into a video bitstream and transmitted to a video decoder (e.g., the video decoder 30 described above). The method 600 improves the encoding process by reducing the number of bits used for indication and the number of bits used as padding after the last CTB (e.g., CTB 430) of the last CTB row (e.g., CTB row 468) of a partition (e.g., partition 424) has been encoded. This, therefore, effectively improves the performance of the codec, resulting in a better user experience.

Step 602: The video encoder partitions an image (e.g., image 410) into one or more slices (e.g., slices 416-420). In one embodiment, each slice includes one or more partitions (e.g., partitions 422-426). In one embodiment, each partition includes a plurality of coding tree blocks (e.g., CTB 428 and CTB 430).

Step 604: When a current CTB of the plurality of CTBs is the last CTB in a block, the video encoder encodes a block end bit having a first value and a byte alignment bit into a video bitstream. In one embodiment, the block end bit is represented by end_of_tile_one_bit. In one embodiment, the first value is 1 (1). In one embodiment, the byte alignment bits are the result of an encoder (e.g., video encoder 20 described above) performing a bit alignment process. In one embodiment, the block end bit and byte alignment bit having the first value are used to indicate that the current CTB (e.g., CTB 430) of the plurality of CTBs (e.g., CTB 428 and CTB 430) is the last CTB of a block (e.g., block 424).

Step 606: When WPP is enabled and the current CTB is the last CTB in a CTB row but not the last CTB in the block, the video encoder encodes a CTB row end bit having the first value and a byte alignment bit into the video bitstream. In one embodiment, the CTB row end bit is represented by end_of_subset_one_bit. In one embodiment, the first value is 1 (1). In one embodiment, the byte alignment bits are the result of an encoder (e.g., video encoder 20 described above) performing a bit alignment process. In one embodiment, the CTB row end bit and byte alignment bit having the first value are used to indicate that WPP is enabled and that the current CTB (e.g., CTB 430) of the plurality of CTBs (e.g., CTB 428 and CTB 430) is the last CTB of a CTB row (e.g., CTB rows 460-466) but not the last CTB of the block (e.g., block 424).

Step 608: the video encoder stores the video code stream for transmission to a video decoder. In one embodiment, a video encoder sends the video bitstream to a video decoder.

The following syntax and semantics may be used to implement the embodiments disclosed herein. The description is made relative to the base text, which is the latest VVC draft specification. In other words, only the newly added parts are described below; text of the base text that is not mentioned below applies as it is. Relative to the base text, newly added text is shown in bold and deleted text is shown in italics.

Fig. 7 is a schematic diagram of a video coding apparatus 700 (e.g., the video encoder 20 or the video decoder 30) according to an embodiment of the present invention. The video coding apparatus 700 is suitable for implementing the disclosed embodiments described herein. The video coding apparatus 700 includes: an ingress port 710 and a reception unit (Rx) 720 for receiving data; a processor, logic unit, or central processing unit (CPU) 730 for processing data; a transmission unit (Tx) 740 and an egress port 750 for transmitting data; and a memory 760 for storing data. The video coding apparatus 700 may further include an optical-to-electrical (OE) component and an electrical-to-optical (EO) component coupled to the ingress port 710, the reception unit 720, the transmission unit 740, and the egress port 750, serving as an egress or ingress for optical or electrical signals.

The processor 730 is implemented by hardware and software. The processor 730 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and Digital Signal Processors (DSPs). Processor 730 is in communication with ingress port 710, receiving unit 720, sending unit 740, egress port 750, and memory 760. Processor 730 includes a decode module 770. The decode module 770 implements the above disclosed embodiments. For example, the decode module 770 may implement, process, prepare, or provide various codec functions. Thus, the inclusion of the decode module 770 substantially improves the functionality of the video decoding apparatus 700 and affects the transition of the video decoding apparatus 700 to a different state. Alternatively, the decode module 770 is implemented as instructions stored in the memory 760 and executed by the processor 730.

The video coding apparatus 700 may also include input and/or output (I/O) devices 780 for exchanging data with a user. The I/O devices 780 may include output devices such as a display for displaying video data, speakers for outputting audio data, and the like. The I/O devices 780 may also include input devices such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with these output devices.

Memory 760, which includes one or more hard disks, tape drives, and solid-state drives, may be used as an overflow data storage device to store programs for execution when such programs are selected, as well as to store instructions and data that are read during program execution. The memory 760 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).

Fig. 8 is a diagram of an embodiment of a decoding module 800. In one embodiment, coding module 800 is implemented in a video coding device 802 (e.g., video encoder 20 described above or video decoder 30 described above). The video coding device 802 comprises a receiving module 801. The receiving module 801 is used for receiving an image for encoding or receiving a code stream for decoding. The video coding device 802 comprises a sending module 807 coupled to the receiving module 801. The sending module 807 is used to send the codestream to a decoder or the decoded image to a display module (e.g., one of the I/O devices 780 described above).

The video coding device 802 comprises a storage module 803. The storage module 803 is coupled to at least one of the receiving module 801 or the sending module 807. The storage module 803 is used to store instructions. The video coding device 802 also includes a processing module 805. The processing module 805 is coupled to the storage module 803. The processing module 805 is used to execute the instructions stored in the storage module 803 to perform the methods disclosed herein.

It should also be understood that the steps of the exemplary methods set forth herein do not necessarily need to be performed in the order described, and the order of the steps of these methods should be understood as being merely exemplary. Likewise, methods consistent with various embodiments of the present invention may include additional steps, and certain steps may be omitted or combined.

While several embodiments of the present invention have been provided, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the present invention. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details described herein. For example, various elements or components may be combined or integrated in another system, or some features may be omitted, or not implemented.

Furthermore, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or described as coupled or directly coupled or communicating with each other may also be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
