Method and apparatus for signaling prediction candidate list size

Document No. 246879, published 2021-11-12

Reading note: This technology, "Method and apparatus for signaling prediction candidate list size", was designed and created by Xiaozhong Xu, Xiang Li, and Shan Liu on 2020-05-04. Its main content is as follows: A video decoding method includes receiving an encoded video bitstream including a current picture. The method also includes determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode. The method also includes, in response to the current block being encoded in the IBC mode, determining a number of IBC prediction candidates associated with the current block. The method also includes constructing an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates. The method also includes selecting a block vector prediction from the IBC prediction candidate list. The method also includes decoding a block vector associated with the current block using the block vector prediction. The method also includes decoding the current block according to the block vector.

1. A video decoding method, comprising:

receiving an encoded video bitstream comprising a current picture;

determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode;

determining a number of IBC prediction candidates associated with the current block in response to the current block being encoded in the IBC mode;

constructing an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates;

selecting a block vector prediction from the list of IBC prediction candidates;

decoding a block vector associated with the current block using the block vector prediction; and

decoding the current block according to the block vector.

2. The method of claim 1, wherein the number of IBC prediction candidates is greater than or equal to M and less than or equal to N, where M is 2.

3. The method of claim 2, wherein N is 5.

4. The method of claim 2, wherein the number of IBC prediction candidates is equal to:

max(M, min(MaxNumMergeCand, N)),

where MaxNumMergeCand equals the number of candidates in the merge mode list.

5. The method of claim 1, wherein the number of IBC prediction candidates is signaled in a bitstream.

6. The method of claim 5, wherein the number of IBC prediction candidates is signaled using a truncated unary code in which the maximum value is the number of IBC prediction candidates minus 1.

7. The method of claim 1, wherein an index for selecting the IBC block vector prediction is not signaled when the number of IBC prediction candidates is equal to 1.

8. A video decoder for performing video decoding, comprising:

a processing circuit configured to:

receive an encoded video bitstream including a current picture,

determine whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode,

determine a number of IBC prediction candidates associated with the current block in response to the current block being encoded in the IBC mode,

construct an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates,

select a block vector prediction from the list of IBC prediction candidates,

decode a block vector associated with the current block using the block vector prediction, and

decode the current block according to the block vector.

9. The video decoder of claim 8, wherein the number of IBC prediction candidates is greater than or equal to M and less than or equal to N, where M is 2.

10. The video decoder of claim 9, wherein N is 5.

11. The video decoder of claim 9, wherein the number of IBC prediction candidates is equal to:

max(M, min(MaxNumMergeCand, N)),

where MaxNumMergeCand equals the number of candidates in the merge mode list.

12. The video decoder of claim 8, wherein the number of IBC prediction candidates is signaled in a bitstream.

13. The video decoder of claim 12, wherein the number of IBC prediction candidates is signaled using a truncated unary code in which the maximum value is the number of IBC prediction candidates minus 1.

14. The video decoder of claim 8, wherein an index for selecting the IBC block vector prediction is not signaled when the number of IBC prediction candidates is equal to 1.

15. A non-transitory computer readable medium having stored thereon instructions that, when executed by a processor in a video decoder, cause the processor to perform a method comprising:

receiving an encoded video bitstream comprising a current picture;

determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode;

determining a number of IBC prediction candidates associated with the current block in response to the current block being encoded in the IBC mode;

constructing an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates;

selecting a block vector prediction from the list of IBC prediction candidates;

decoding a block vector associated with the current block using the block vector prediction; and

decoding the current block according to the block vector.

16. The non-transitory computer-readable medium of claim 15, wherein the number of IBC prediction candidates is greater than or equal to M and less than or equal to N, where M is 2.

17. The non-transitory computer-readable medium of claim 16, wherein N is 5.

18. The non-transitory computer-readable medium of claim 16, wherein the number of IBC prediction candidates is equal to:

max(M, min(MaxNumMergeCand, N)),

where MaxNumMergeCand equals the number of candidates in the merge mode list.

19. The non-transitory computer-readable medium of claim 15, wherein the number of IBC prediction candidates is signaled in a bitstream.

20. The non-transitory computer-readable medium of claim 19, wherein the number of IBC prediction candidates is signaled using a truncated unary code in which the maximum value is the number of IBC prediction candidates minus 1.
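The signaling features in the claims above (the clamped list size of claims 2 to 4, truncated unary coding of claims 6, 13, and 20, and the omitted index of claims 7 and 14) can be illustrated together in a short Python sketch. It assumes M = 2 and N = 5 (claims 2 and 3), takes MaxNumMergeCand as a plain integer, and models the bitstream as a string of '0'/'1' bins; all function names are hypothetical. How the claimed truncated unary "maximum" maps onto a concrete cMax, and the binarization used for the candidate index, are likewise assumptions of this sketch rather than details stated in the claims.

```python
from typing import List, Tuple

BlockVector = Tuple[int, int]
M, N = 2, 5  # bounds taken from claims 2 and 3


def num_ibc_candidates(max_num_merge_cand: int) -> int:
    """Clamp the merge list size into [M, N]: max(M, min(MaxNumMergeCand, N))."""
    return max(M, min(max_num_merge_cand, N))


def tu_encode(value: int, c_max: int) -> str:
    """Truncated unary binarization: `value` ones, then a terminating zero
    that is omitted when value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    return "1" * value + ("0" if value < c_max else "")


def tu_decode(bins: str, c_max: int) -> Tuple[int, int]:
    """Parse one truncated unary value from the start of `bins`.
    Returns (value, number of bins consumed)."""
    value = 0
    while value < c_max and bins[value] == "1":
        value += 1
    consumed = value if value == c_max else value + 1
    return value, consumed


def select_bv_predictor(candidates: List[BlockVector], bins: str) -> Tuple[BlockVector, int]:
    """Pick the block vector predictor. With one candidate no index is parsed;
    otherwise the index is read as truncated unary with cMax = list size - 1
    (an assumed binarization). Returns (predictor, bins consumed)."""
    if not candidates:
        raise ValueError("empty IBC candidate list")
    if len(candidates) == 1:
        return candidates[0], 0
    idx, used = tu_decode(bins, len(candidates) - 1)
    return candidates[idx], used


print([num_ibc_candidates(k) for k in (1, 3, 6)])                # [2, 3, 5]
print(tu_encode(4, 4), tu_encode(2, 4))                          # 1111 110
print(select_bv_predictor([(-8, 0)], ""))                        # ((-8, 0), 0)
print(select_bv_predictor([(-8, 0), (0, -4), (-16, 0)], "10"))   # ((0, -4), 2)
```

Truncated unary drops the terminating zero for the largest value, which is why knowing the list size on both sides saves a bin, and why a single-entry list needs no bins at all.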

Technical Field

The present disclosure describes embodiments generally related to video coding.

Background

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

Video encoding and decoding may be performed using inter-picture prediction with motion compensation. Uncompressed digital video may comprise a series of images (pictures), each having spatial dimensions of, for example, 1920 x 1080 luma samples and associated chroma samples. The series of images may have a fixed or variable image rate (also informally known as frame rate) of, for example, 60 images per second (60 Hz). Uncompressed video has very high bit rate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920 x 1080 luma sample resolution at a 60 Hz frame rate) requires a bandwidth of close to 1.5 Gbit/s. An hour of such video requires more than 600 GB of storage space.
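The bandwidth and storage figures above follow from simple arithmetic. A minimal Python check, assuming 4:2:0 sampling (two chroma planes at a quarter of the luma resolution, so 1.5 samples per pixel), 8 bits per sample, and decimal units:

```python
# Raw bit rate of 1080p60 4:2:0 video at 8 bits per sample.
width, height = 1920, 1080
frame_rate = 60          # frames per second
bits_per_sample = 8

# 4:2:0 adds two chroma planes, each a quarter of the luma plane: 1.5 samples per pixel.
samples_per_frame = width * height * 3 // 2
bit_rate = samples_per_frame * bits_per_sample * frame_rate   # bits per second
storage_per_hour_gb = bit_rate * 3600 / 8 / 1e9                 # decimal gigabytes

print(f"{bit_rate / 1e9:.2f} Gbit/s")            # 1.49 Gbit/s
print(f"{storage_per_hour_gb:.0f} GB per hour")  # 672 GB per hour
```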

One purpose of video encoding and decoding can be the reduction of redundancy in the input video signal through compression. Compression can help reduce the bandwidth or storage space requirements mentioned above, in some cases by two orders of magnitude or more. Both lossless compression and lossy compression, as well as combinations thereof, may be employed. Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed original signal. When lossy compression is used, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television distribution applications. The achievable compression ratio can reflect this: higher allowable/tolerable distortion can yield higher compression ratios.

Motion compensation can be a lossy compression technique and can relate to techniques in which a block of sample data from a previously reconstructed picture or part thereof (the reference picture), after being spatially shifted in a direction indicated by a motion vector (MV henceforth), is used to predict a newly reconstructed picture or picture part. In some cases, the reference picture can be the same as the picture currently under reconstruction. MVs can have two dimensions, X and Y, or three dimensions, the third being an indication of the reference picture in use (the latter, indirectly, can be a time dimension).
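A minimal Python sketch of block-based motion compensation, assuming integer-sample MVs (no sub-sample interpolation), a single reference picture given as a list of sample rows, and no padding beyond the picture boundary. The function name is hypothetical. If the "reference" passed in is the already reconstructed part of the current picture, the same copy illustrates the case above where the reference picture is the picture under reconstruction.

```python
from typing import List

Picture = List[List[int]]  # rows of sample values


def motion_compensate(reference: Picture, x: int, y: int, w: int, h: int,
                      mv_x: int, mv_y: int) -> Picture:
    """Predict the w x h block at (x, y) by copying the reference block
    displaced by the integer-sample motion vector (mv_x, mv_y)."""
    rx, ry = x + mv_x, y + mv_y
    if rx < 0 or ry < 0 or ry + h > len(reference) or rx + w > len(reference[0]):
        raise ValueError("motion vector points outside the reference picture")
    return [row[rx:rx + w] for row in reference[ry:ry + h]]


ref = [[10 * r + c for c in range(8)] for r in range(8)]
print(motion_compensate(ref, x=4, y=4, w=2, h=2, mv_x=-3, mv_y=-2))  # [[21, 22], [31, 32]]
```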

In some video compression techniques, an MV applicable to a certain area of sample data can be predicted from other MVs, for example from MVs that relate to another area of sample data spatially adjacent to the area under reconstruction and that precede that MV in decoding order. Doing so can substantially reduce the amount of data required to code the MVs, thereby removing redundancy and increasing compression. MV prediction can work effectively, for example, because when coding an input video signal derived from a camera (known as natural video), there is a statistical likelihood that areas larger than the area to which a single MV applies move in a similar direction; therefore, in some cases, the MV can be predicted using a similar MV derived from the MVs of neighboring areas. As a result, the MV found for a given area is similar or identical to the MV predicted from the surrounding MVs, and after entropy coding it can in turn be represented in fewer bits than would be needed to code the MV directly. In some cases, MV prediction can be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself can be lossy, for example because of rounding errors when calculating a predictor from several surrounding MVs.
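The passage does not fix a particular predictor. The Python sketch below uses a component-wise median of three neighboring MVs, an H.264/AVC-style choice picked only for illustration, to show why coding the MV difference (MVD) rather than the MV itself tends to leave small values for the entropy coder. Function names are hypothetical. Because a median involves no division, this predictor is lossless; an averaging predictor is where the rounding errors mentioned above could arise.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def median_predictor(neighbors: List[MV]) -> MV:
    """Component-wise median of three neighboring MVs."""
    if len(neighbors) != 3:
        raise ValueError("expected exactly three neighboring MVs")
    xs = sorted(mv[0] for mv in neighbors)
    ys = sorted(mv[1] for mv in neighbors)
    return xs[1], ys[1]


def mv_difference(mv: MV, neighbors: List[MV]) -> MV:
    """Encoder side: the MV difference (MVD) that would be entropy coded."""
    px, py = median_predictor(neighbors)
    return mv[0] - px, mv[1] - py


def reconstruct_mv(mvd: MV, neighbors: List[MV]) -> MV:
    """Decoder side: predictor plus decoded MVD."""
    px, py = median_predictor(neighbors)
    return mvd[0] + px, mvd[1] + py


neighbors = [(12, -4), (13, -4), (11, -3)]
mvd = mv_difference((13, -4), neighbors)
print(mvd, reconstruct_mv(mvd, neighbors))  # (1, 0) (13, -4)
```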

Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Rec. H.265, "High Efficiency Video Coding", December 2016). Of the many MV prediction mechanisms that H.265 offers, described here is a technique henceforth referred to as "spatial merge".

Referring to Fig. 1, a current block (101) comprises samples that the encoder found during the motion search process to be predictable from a previous block of the same size that has been spatially shifted. Instead of coding that MV directly, the MV can be derived from metadata associated with one or more reference pictures, for example from the most recent (in decoding order) reference picture, using the MV associated with any one of five surrounding samples, denoted A0, A1, and B0, B1, B2 (102 through 106, respectively). In H.265, MV prediction can use predictors from the same reference picture that the neighboring block is using. The order of forming the candidate list may be A0 → B0 → B1 → A1 → B2.
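The candidate scan described above can be sketched in Python. The sketch follows the checking order given in the text (A0, B0, B1, A1, B2), treats an unavailable neighbor as None, drops any exact duplicate, and stops once a hypothetical maximum list size is reached. Pruning every duplicate and the function name are simplifications assumed here, not the exact H.265 procedure, which also adds temporal and other candidates.

```python
from typing import Dict, List, Optional, Tuple

MV = Tuple[int, int]

# Checking order for the five spatial neighbors, as given in the text.
SPATIAL_ORDER = ("A0", "B0", "B1", "A1", "B2")


def build_spatial_candidates(neighbors: Dict[str, Optional[MV]], max_cands: int) -> List[MV]:
    """Collect neighbor MVs in SPATIAL_ORDER, skipping unavailable positions
    (None or missing) and exact duplicates, up to max_cands entries."""
    cands: List[MV] = []
    for pos in SPATIAL_ORDER:
        if len(cands) == max_cands:
            break
        mv = neighbors.get(pos)
        if mv is not None and mv not in cands:
            cands.append(mv)
    return cands


neighbors = {"A0": None, "A1": (4, 0), "B0": (4, 0), "B1": (-2, 1), "B2": (0, 3)}
print(build_spatial_candidates(neighbors, max_cands=4))  # [(4, 0), (-2, 1), (0, 3)]
```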

Disclosure of Invention

According to an exemplary embodiment, a video decoding method includes receiving an encoded video bitstream including a current picture. The method also includes determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode. The method also includes, in response to the current block being encoded in IBC mode, determining a number of IBC prediction candidates associated with the current block. The method also includes constructing a list of IBC prediction candidates having a size corresponding to the number of IBC prediction candidates. The method also includes selecting a block vector prediction from the list of IBC prediction candidates. The method also includes decoding a block vector associated with the current block using the block vector prediction. The method also includes decoding the current block according to the block vector.
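To make the sequence of steps above concrete, the following Python sketch walks one IBC-coded block through them. Everything beyond the step order is an assumption for illustration: the helper names are hypothetical, the candidate list is filled from previously decoded block vectors and padded with zero vectors, the block vector is formed as predictor plus a signaled difference, and bitstream parsing is skipped (the index and difference are passed in directly). The clamping bounds M = 2 and N = 5 are taken from claims 2 to 4.

```python
from typing import List, Tuple

BlockVector = Tuple[int, int]
Picture = List[List[int]]  # rows of reconstructed sample values


def build_ibc_candidate_list(history: List[BlockVector], size: int) -> List[BlockVector]:
    """Hypothetical construction rule: most recent distinct block vectors first,
    padded with zero vectors up to `size` entries."""
    cands: List[BlockVector] = []
    for bv in reversed(history):
        if len(cands) == size:
            break
        if bv not in cands:
            cands.append(bv)
    cands += [(0, 0)] * (size - len(cands))
    return cands


def decode_ibc_block(pic: Picture, x: int, y: int, w: int, h: int,
                     bvp_index: int, bvd: BlockVector,
                     history: List[BlockVector], max_num_merge_cand: int) -> BlockVector:
    """Reconstruct one IBC block of `pic` in place and return its block vector."""
    # Number of IBC candidates: max(M, min(MaxNumMergeCand, N)) with M = 2, N = 5.
    size = max(2, min(max_num_merge_cand, 5))
    # Build the list and select the block vector predictor.
    bvp = build_ibc_candidate_list(history, size)[bvp_index]
    # Block vector = predictor + signaled difference (assumed AMVP-style coding).
    bv = (bvp[0] + bvd[0], bvp[1] + bvd[1])
    rx, ry = x + bv[0], y + bv[1]
    if rx < 0 or ry < 0 or ry + h > len(pic) or rx + w > len(pic[0]):
        raise ValueError("block vector points outside the current picture")
    # Copy from the already reconstructed area of the current picture. A valid
    # block vector keeps the reference disjoint from the current block; this
    # sketch does not check that.
    for dy in range(h):
        for dx in range(w):
            pic[y + dy][x + dx] = pic[ry + dy][rx + dx]
    history.append(bv)
    return bv


pic = [[10 * r + c for c in range(8)] for r in range(8)]
history = [(-4, 0)]
bv = decode_ibc_block(pic, x=4, y=0, w=2, h=2, bvp_index=0, bvd=(1, 0),
                      history=history, max_num_merge_cand=6)
print(bv, pic[0][4:6], pic[1][4:6])  # (-3, 0) [1, 2] [11, 12]
```

Keeping the list size fixed per the clamping rule means the decoder knows how many entries to build before parsing the index, independent of how many distinct block vectors happen to be available.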

According to an exemplary embodiment, a video decoder for performing video decoding includes a processing circuit configured to receive an encoded video bitstream including a current picture. The processing circuit is further configured to determine whether a current block included in a current picture is encoded in an Intra Block Copy (IBC) mode. In response to the current block being encoded in IBC mode, the processing circuit is further configured to determine a number of IBC prediction candidates associated with the current block. The processing circuit is further configured to construct a list of IBC prediction candidates having a size corresponding to the number of IBC prediction candidates. The processing circuit is further configured to select a block vector prediction from the IBC prediction candidate list. The processing circuit is also configured to decode a block vector associated with the current block using the block vector prediction. The processing circuit is further configured to decode the current block according to the block vector.

According to an exemplary embodiment, a non-transitory computer-readable medium has instructions stored thereon. The instructions, when executed by a processor in a video decoder, cause the processor to perform a method that includes receiving an encoded video bitstream including a current picture. The method also includes determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode. The method also includes, in response to the current block being encoded in IBC mode, determining a number of IBC prediction candidates associated with the current block. The method also includes constructing a list of IBC prediction candidates having a size corresponding to the number of IBC prediction candidates. The method also includes selecting a block vector prediction from the list of IBC prediction candidates. The method also includes decoding a block vector associated with the current block using the block vector prediction. The method also includes decoding the current block according to the block vector.

Drawings

Other features, properties, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:

Fig. 1 is a schematic diagram of a current block and its surrounding spatial merge candidates in one example.

Fig. 2 is a schematic diagram of a simplified block diagram of a communication system (200) according to an embodiment.

Fig. 3 is a schematic diagram of a simplified block diagram of a communication system (300) according to an embodiment.

Fig. 4 is a schematic diagram of a simplified block diagram of a decoder according to an embodiment.

Fig. 5 is a schematic diagram of a simplified block diagram of an encoder according to an embodiment.

Fig. 6 shows a block diagram of an encoder according to another embodiment.

Fig. 7 shows a block diagram of a decoder according to another embodiment.

Fig. 8 is a schematic diagram of intra picture block compensation according to an embodiment.

Figs. 9A-9D are schematic diagrams of intra picture block compensation with a search range of one coding tree unit (CTU) size according to an embodiment.

Figs. 10A-10D are schematic diagrams of how a buffer is updated according to an embodiment.

Fig. 11A is an illustration of a decoding flow diagram for a history-based MV prediction (HMVP) buffer.

Fig. 11B is a schematic diagram of updating the HMVP buffer.

Fig. 12 is an illustration of an exemplary decoding process according to an embodiment.

Fig. 13 is a schematic diagram of a computer system according to an embodiment of the present application.

Detailed Description

Fig. 2 shows a simplified block diagram of a communication system (200) according to one embodiment of the present disclosure. The communication system (200) includes a plurality of terminal devices that can communicate with each other through, for example, a network (250). For example, a communication system (200) includes a first pair of terminal devices (210) and (220) interconnected by a network (250). In the example of fig. 2, the first pair of terminal devices (210) and (220) performs unidirectional data transmission. For example, a terminal device (210) may encode video data, such as a stream of video images captured by the terminal device (210), for transmission over a network (250) to another terminal device (220). The encoded video data may be transmitted in the form of one or more encoded video bitstreams. The terminal device (220) may receive the encoded video data from the network (250), decode the encoded video data to recover the video image, and display the video image according to the recovered video data. Unidirectional data transmission is common in media service applications and the like.

In another example, the communication system (200) includes a second pair of terminal devices (230) and (240) that perform bi-directional transmission of encoded video data, which may occur, for example, during a video conference. For bi-directional data transmission, in an example, each of the terminal device (230) and the terminal device (240) may encode video data (e.g., a stream of video images captured by the terminal device) for transmission over the network (250) to the other of the terminal device (230) and the terminal device (240). Each of the terminal device (230) and the terminal device (240) may also receive encoded video data transmitted by the other of the terminal device (230) and the terminal device (240), and may decode the encoded video data to recover the video image, and may display the video image on an accessible display device according to the recovered video data.

In the example of Fig. 2, the terminal devices (210), (220), (230), and (240) may be illustrated as servers, personal computers, and smartphones, but the principles of the present disclosure are not limited thereto. Embodiments of the present disclosure find application with laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network (250) represents any number of networks that convey encoded video data among the terminal devices (210), (220), (230), and (240), including, for example, wired and/or wireless communication networks. The communication network (250) may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network (250) may be immaterial to the operation of the present disclosure unless explained herein below.

As an example of an application of the disclosed subject matter, fig. 3 shows the placement of a video encoder and a video decoder in a streaming environment. The disclosed subject matter is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV, storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.

The streaming system may include a capture subsystem (313) that may include a video source (301), such as a digital camera, that creates, for example, an uncompressed video image stream (302). In one example, the video image stream (302) includes samples taken by the digital camera. The video image stream (302), depicted as a thick line to emphasize its high data volume when compared to the encoded video data (304) (or encoded video bitstream), may be processed by an electronic device (320) that includes a video encoder (303) coupled to the video source (301). The video encoder (303) may include hardware, software, or a combination of hardware and software to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video data (304) (or encoded video bitstream (304)), depicted as a thin line to emphasize its lower data volume when compared to the video image stream (302), may be stored on a streaming server (305) for future use. One or more streaming client subsystems, such as the client subsystem (306) and the client subsystem (308) in Fig. 3, may access the streaming server (305) to retrieve copies (307) and (309) of the encoded video data (304). The client subsystem (306) may include, for example, a video decoder (310) in an electronic device (330). The video decoder (310) decodes the incoming copy (307) of the encoded video data and creates an outgoing video image stream (311) that may be rendered on a display (312) (e.g., a display screen) or another rendering device (not depicted). In some streaming systems, the encoded video data (304), (307), and (309) (e.g., video bitstreams) may be encoded according to certain video coding/compression standards. Examples of these standards include ITU-T Recommendation H.265. In one example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of VVC.

It should be noted that electronic device (320) and electronic device (330) may include other components (not shown). For example, the electronic device (320) may include a video decoder (not shown), and the electronic device (330) may also include a video encoder (not shown).

Fig. 4 shows a block diagram of a video decoder (410) according to one embodiment of the present disclosure. The video decoder (410) may be included in an electronic device (430). The electronic device (430) may include a receiver (431) (e.g., a receive circuit). The video decoder (410) may be used in place of the video decoder (310) in the example of fig. 3.

The receiver (431) may receive one or more encoded video sequences to be decoded by the video decoder (410); in the same or another embodiment, one encoded video sequence is received at a time, and the decoding of each encoded video sequence is independent of the decoding of the other encoded video sequences. The encoded video sequence may be received from a channel (401), which may be a hardware/software link to a storage device that stores the encoded video data. The receiver (431) may receive the encoded video data together with other data, such as encoded audio data and/or ancillary data streams, which may be forwarded to their respective using entities (not depicted). The receiver (431) may separate the encoded video sequence from the other data. To combat network jitter, a buffer memory (415) may be coupled between the receiver (431) and the entropy decoder/parser (420) (hereinafter "parser (420)"). In certain applications, the buffer memory (415) is part of the video decoder (410). In other cases, the buffer memory (415) may be outside the video decoder (410) (not depicted). In still other cases, a buffer memory (not depicted) may be provided outside the video decoder (410), for example to combat network jitter, and another buffer memory (415) may be provided inside the video decoder (410), for example to handle playout timing. When the receiver (431) receives data from a store/forward device of sufficient bandwidth and controllability, or from an isochronous network, the buffer memory (415) may not be needed or may be small. For use over best-effort packet networks such as the Internet, the buffer memory (415) may be required, may be comparatively large, may advantageously be of adaptive size, and may be implemented at least partially in an operating system or similar element (not depicted) outside the video decoder (410).

The video decoder (410) may include a parser (420) to reconstruct symbols (421) from the encoded video sequence. Categories of these symbols include information used to manage the operation of the video decoder (410), and potentially information to control a rendering device such as a rendering device (412) (e.g., a display screen) that is not an integral part of the electronic device (430) but may be coupled to the electronic device (430), as shown in Fig. 4. The control information for the rendering device(s) may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not depicted). The parser (420) may parse/entropy-decode the received encoded video sequence. The coding of the encoded video sequence may be in accordance with a video coding technology or standard and may follow various principles, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser (420) may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the group. Subgroups may include Groups of Pictures (GOPs), pictures, tiles, slices, macroblocks, Coding Units (CUs), blocks, Transform Units (TUs), Prediction Units (PUs), and so on. The parser (420) may also extract from the encoded video sequence information such as transform coefficients, quantizer parameter values, motion vectors, and so forth.

The parser (420) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (415), thereby creating symbols (421).

The reconstruction of the symbols (421) may involve multiple different units depending on the type of the encoded video image or parts thereof (such as inter and intra images, inter and intra blocks), and other factors. Which units are involved, and how, can be controlled by subgroup control information parsed from the encoded video sequence by the parser (420). For simplicity, the flow of such subgroup control information between the parser (420) and the multiple units below is not depicted.

In addition to the functional blocks already mentioned, the video decoder (410) may be conceptually subdivided into several functional units as described below. In a practical implementation operating under commercial constraints, many of these units interact closely with each other and may be at least partly integrated into each other. However, for the purpose of describing the disclosed subject matter, the conceptual subdivision into the functional units below is appropriate.

The first unit is the scaler/inverse transform unit (451). The scaler/inverse transform unit (451) receives quantized transform coefficients, as well as control information including which transform to use, block size, quantization factor, quantization scaling matrices, and so on, as symbols (421) from the parser (420). The scaler/inverse transform unit (451) may output blocks comprising sample values that may be input into the aggregator (455).

In some cases, the output samples of the scaler/inverse transform unit (451) may belong to an intra-coded block, that is, a block that does not use predictive information from previously reconstructed pictures but can use predictive information from previously reconstructed parts of the current picture. Such predictive information may be provided by an intra picture prediction unit (452). In some cases, the intra picture prediction unit (452) generates a block of the same size and shape as the block under reconstruction, using surrounding already reconstructed information fetched from the current picture buffer (458). The current picture buffer (458) buffers, for example, a partially reconstructed current picture and/or a fully reconstructed current picture. In some cases, the aggregator (455) adds, on a per-sample basis, the prediction information generated by the intra picture prediction unit (452) to the output sample information provided by the scaler/inverse transform unit (451).
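The per-sample addition performed by the aggregator (455) can be sketched as follows. Clipping the sum to the valid range for the sample bit depth is an assumption added here (the text does not mention it), and the function name is hypothetical.

```python
from typing import List

Block = List[List[int]]


def aggregate(prediction: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Add prediction and residual sample by sample, clipping each result
    to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


print(aggregate([[250, 10], [128, 128]], [[10, -20], [3, -3]]))  # [[255, 0], [131, 125]]
```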

In other cases, the output samples of the scaler/inverse transform unit (451) may belong to an inter-coded, and potentially motion-compensated, block. In such a case, a motion compensated prediction unit (453) may access the reference picture store (457) to fetch samples used for prediction. After motion compensating the fetched samples in accordance with the symbols (421) pertaining to the block, these samples may be added by the aggregator (455) to the output of the scaler/inverse transform unit (451) (in this case called the residual samples or residual signal) to generate output sample information. The addresses within the reference picture store (457) from which the motion compensated prediction unit (453) fetches prediction samples may be controlled by motion vectors, available to the motion compensated prediction unit (453) in the form of symbols (421) that can have, for example, X, Y, and reference picture components. Motion compensation may also include interpolation of sample values fetched from the reference picture store (457) when sub-sample-accurate motion vectors are in use, motion vector prediction mechanisms, and so forth.

The output samples of the aggregator (455) may be subjected to various loop filtering techniques in the loop filter unit (456). Video compression technologies may include in-loop filter technologies that are controlled by parameters included in the encoded video sequence (also referred to as the encoded video bitstream) and made available to the loop filter unit (456) as symbols (421) from the parser (420). However, video compression technologies may also be responsive to meta-information obtained during the decoding of previous (in decoding order) parts of the encoded image or encoded video sequence, as well as to previously reconstructed and loop-filtered sample values.

The output of the loop filter unit (456) may be a sample stream that may be output to a rendering device (412) and stored in a reference picture store (457) for subsequent inter picture prediction.

Certain coded pictures, once fully reconstructed, may be used as reference pictures for future prediction. For example, once a coded picture corresponding to a current picture is fully reconstructed and has been identified as a reference picture (by, for example, the parser (420)), the current picture buffer (458) may become part of the reference picture store (457), and a fresh current picture buffer may be reallocated before the reconstruction of the following coded picture begins.

The video decoder (410) may perform decoding operations according to a predetermined video compression technology, such as that of the ITU-T Rec. H.265 standard. The encoded video sequence may conform to the syntax specified by the video compression technology or standard being used, in the sense that the encoded video sequence adheres to both the syntax of the video compression technology or standard and the profiles documented in that technology or standard. Specifically, a profile may select certain tools from all the tools available in the video compression technology or standard as the only tools available for use under that profile. Compliance may also require that the complexity of the encoded video sequence be within bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, maximum frame rate, maximum reconstruction sample rate (measured in, for example, megasamples per second), maximum reference picture size, and so on. In some cases, the limits set by levels may be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the encoded video sequence.

In one embodiment, the receiver (431) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of the encoded video sequence. The additional data may be used by the video decoder (410) to properly decode the data and/or to more accurately reconstruct the original video data. The additional data may be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so on.

Fig. 5 shows a block diagram of a video encoder (503) according to one embodiment of the present disclosure. The video encoder (503) is included in an electronic device (520). The electronic device (520) includes a transmitter (540) (e.g., a transmitting circuit). The video encoder (503) may be used in place of the video encoder (303) in the example of Fig. 3.

The video encoder (503) may receive video samples from a video source (501) (which is not part of the electronic device (520) in the example of Fig. 5) that may capture video images to be encoded by the video encoder (503). In another example, the video source (501) is part of the electronic device (520).

The video source (501) may provide the source video sequence to be encoded by the video encoder (503) in the form of a digital video sample stream, which may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, ...), any color space (e.g., BT.601 Y CrCb, RGB, ...), and any suitable sampling structure (e.g., Y CrCb 4:2:0, Y CrCb 4:4:4). In a media serving system, the video source (501) may be a storage device that stores previously prepared video. In a video conferencing system, the video source (501) may be a camera that captures local image information as a video sequence. The video data may be provided as a plurality of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as spatial arrays of pixels, where each pixel may comprise one or more samples, depending on the sampling structure, color space, and so on being used. A person skilled in the art can readily understand the relationship between pixels and samples. The following description focuses on samples.
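To make the relationship between sampling structure, bit depth and sample count concrete, the following minimal sketch computes the size of each sample array of one picture. The function names are hypothetical, and the storage rule (samples deeper than 8 bits occupy 2 bytes) is an assumption about a typical raw buffer layout, not something the description specifies.

```python
from typing import List, Tuple

# Chroma subsampling factors (horizontal, vertical) for common sampling structures.
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def plane_sizes(width: int, height: int, sampling: str) -> List[Tuple[int, int]]:
    """Return the (width, height) of the Y, Cb and Cr sample arrays."""
    sx, sy = SUBSAMPLING[sampling]
    # Ceiling division so odd picture dimensions still get a last chroma column/row.
    chroma = (-(-width // sx), -(-height // sy))
    return [(width, height), chroma, chroma]


def bytes_per_picture(width: int, height: int, sampling: str, bit_depth: int) -> int:
    """Raw storage for one picture, assuming >8-bit samples occupy 2 bytes each."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    samples = sum(w * h for w, h in plane_sizes(width, height, sampling))
    return samples * bytes_per_sample


if __name__ == "__main__":
    print(plane_sizes(1920, 1080, "4:2:0"))        # [(1920, 1080), (960, 540), (960, 540)]
    print(bytes_per_picture(1920, 1080, "4:2:0", 8))  # 3110400
```

With 4:2:0 sampling each chroma array holds a quarter as many samples as the luma array, so a picture carries 1.5 samples per pixel on average, versus 3 samples per pixel with 4:4:4.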

According to one embodiment, the video encoder (503) may encode and compress the pictures of the source video sequence into an encoded video sequence (543) in real time or under any other time constraints required by the application. Enforcing an appropriate encoding speed is one function of the controller (550). In some embodiments, the controller (550) controls, and is functionally coupled to, other functional units as described below. For simplicity, the coupling is not depicted in the figures. Parameters set by the controller (550) may include rate-control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, ...), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so on. The controller (550) may be configured to have other suitable functions that pertain to the video encoder (503) optimized for a certain system design.

In some embodiments, the video encoder (503) is configured to operate in a coding loop. As a brief description, in one example, the coding loop may include a source encoder (530) (e.g., responsible for creating symbols, such as a symbol stream, based on an input picture to be encoded and reference pictures) and a (local) decoder (533) embedded in the video encoder (503). The decoder (533) reconstructs the symbols to create sample data in a manner similar to how a (remote) decoder would create it (because any compression between the symbols and the encoded video bitstream is lossless in the video compression techniques contemplated by the disclosed subject matter). The reconstructed sample stream (sample data) is input to the reference picture memory (534). Because the decoding of a symbol stream produces bit-exact results independent of the decoder location (local or remote), the contents of the reference picture memory (534) are also bit-exact between the local encoder and the remote decoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values that a decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift if synchronicity cannot be maintained, for example because of channel errors) is also used in some related art.
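The reference picture synchronicity described above can be illustrated with a toy coding loop. This is a minimal sketch under heavy simplifying assumptions: each "block" is a short list of samples, the prediction is simply the previous reconstructed block, and the hypothetical constant `STEP` stands in for a quantizer. The point it shows is that the encoder predicts from its own reconstruction (produced by the same `reconstruct` function a remote decoder uses), not from the original source samples, so both sides hold identical references even though the coding is lossy.

```python
from typing import List, Tuple

STEP = 4  # hypothetical quantizer step size; larger steps lose more detail


def reconstruct(levels: List[int], reference: List[int]) -> List[int]:
    """Decoding step shared by the local (in-encoder) and the remote decoder."""
    return [r + q * STEP for r, q in zip(reference, levels)]


def encode(blocks: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Toy encoder: predict each block from the previous *reconstructed* block."""
    reference = [0] * len(blocks[0])  # empty reference memory
    symbols, recon = [], []
    for block in blocks:
        residual = [s - r for s, r in zip(block, reference)]
        # Lossy quantization (Python's round() rounds halves to even).
        levels = [round(d / STEP) for d in residual]
        symbols.append(levels)
        # Local decoder: rebuild the reference exactly as the remote decoder will,
        # instead of using the original (unquantized) source samples.
        reference = reconstruct(levels, reference)
        recon.append(reference)
    return symbols, recon


def decode(symbols: List[List[int]]) -> List[List[int]]:
    """Remote decoder: sees only the symbols."""
    reference = [0] * len(symbols[0])
    out = []
    for levels in symbols:
        reference = reconstruct(levels, reference)
        out.append(reference)
    return out
```

Calling `encode([[10, 21, 33], [12, 20, 37]])` yields reconstructions that differ from the source (the loop is lossy), yet `decode` applied to the returned symbols reproduces exactly the encoder's reconstructions. If the encoder instead predicted from the original samples, the two references would drift apart.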

The operation of the "local" decoder (533) may be the same as that of a "remote" decoder, such as the video decoder (410), which has already been described in detail above in conjunction with Fig. 4. However, referring briefly also to Fig. 4, when symbols are available and the encoding/decoding of symbols to/from an encoded video sequence by the entropy encoder (545) and the parser (420) can be lossless, the entropy decoding portions of the video decoder (410), including the buffer memory (415) and the parser (420), may not be fully implemented in the local decoder (533).

At this point it is observed that any decoder technique other than the parsing/entropy decoding present in the decoder must also be present in the corresponding encoder in substantially the same functional form. For this reason, the disclosed subject matter focuses on decoder operation. The description of the encoder techniques may be simplified because the encoder techniques are reciprocal to the fully described decoder techniques. A more detailed description is needed only in certain areas and is provided below.

During operation, in some examples, the source encoder (530) may perform motion compensated predictive coding. The motion compensated predictive coding predictively codes an input picture with reference to one or more previously coded pictures from the video sequence that are designated as "reference pictures". In this way, the encoding engine (532) encodes differences between blocks of pixels of an input image and blocks of pixels of a reference image, which may be selected as a prediction reference for the input image.

The local video decoder (533) may decode encoded video data of pictures that may be designated as reference pictures, based on the symbols created by the source encoder (530). The operation of the encoding engine (532) may advantageously be a lossy process. When the encoded video data is decoded at a video decoder (not shown in Fig. 5), the reconstructed video sequence typically is a replica of the source video sequence with some errors. The local video decoder (533) replicates the decoding processes that may be performed by the video decoder on reference pictures, and may cause the reconstructed reference pictures to be stored in the reference picture cache (534). In this way, the video encoder (503) can locally store copies of reconstructed reference pictures that have the same content as the reconstructed reference pictures that will be obtained by the far-end video decoder (absent transmission errors).

The predictor (535) may perform prediction searches for the coding engine (532). That is, for a new picture to be encoded, the predictor (535) may search the reference picture memory (534) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on, that may serve as an appropriate prediction reference for the new picture. The predictor (535) may operate on a sample block-by-block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor (535), the input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (534).

The controller (550) may manage encoding operations of the source encoder (530), including, for example, setting parameters and subgroup parameters for encoding video data.

The outputs of all of the above-mentioned functional units may be entropy encoded in the entropy encoder (545). The entropy encoder (545) translates the symbols generated by the various functional units into an encoded video sequence by losslessly compressing them according to techniques such as Huffman coding, variable length coding, arithmetic coding, and so on.

The transmitter (540) may buffer the encoded video sequence created by the entropy encoder (545) in preparation for transmission over a communication channel (560), which may be a hardware/software link to a storage device that will store the encoded video data. The transmitter (540) may combine the encoded video data from the video encoder (503) with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (sources not shown).

The controller (550) may manage the operation of the video encoder (503). During encoding, the controller (550) may assign to each encoded picture a certain encoded picture type, which may affect the encoding techniques that can be applied to the respective picture. For example, pictures may often be assigned one of the following picture types:

An intra picture (I picture), which may be a picture that can be encoded and decoded without using any other picture in the sequence as a source of prediction. Some video codecs allow different types of intra pictures, including, for example, Instantaneous Decoding Refresh ("IDR") pictures. A person skilled in the art is aware of those variants of I pictures and their respective applications and features.

A predictive picture (P picture), which may be a picture that can be encoded and decoded using intra prediction or inter prediction that predicts sample values of each block using at most one motion vector and a reference index.

A bi-directionally predictive picture (B picture), which may be a picture that can be encoded and decoded using intra prediction or inter prediction that predicts the sample values of each block using at most two motion vectors and reference indices. Similarly, multiple-predictive pictures may use more than two reference pictures and associated metadata for the reconstruction of a single block.

A source picture may commonly be subdivided spatially into blocks of samples (e.g., blocks of 4 × 4, 8 × 8, 4 × 8, or 16 × 16 samples each) and encoded on a block-by-block basis. These blocks may be predictively encoded with reference to other (already encoded) blocks as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of an I picture may be non-predictively encoded, or they may be predictively encoded with reference to already encoded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of a P picture may be predictively encoded via spatial prediction or via temporal prediction with reference to one previously encoded reference picture. Blocks of a B picture may be predictively encoded via spatial prediction or via temporal prediction with reference to one or two previously encoded reference pictures.

The video encoder (503) may perform encoding operations according to a predetermined video coding technique or standard, such as ITU-T Rec. H.265. In operation, the video encoder (503) may perform various compression operations, including predictive encoding operations that exploit temporal and spatial redundancies in the input video sequence. The encoded video data may therefore conform to the syntax specified by the video coding technique or standard being used.

In one embodiment, the transmitter (540) may transmit additional data along with the encoded video. The source encoder (530) may include such data as part of the encoded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, Supplementary Enhancement Information (SEI) messages, Video Usability Information (VUI) parameter set fragments, and so on.

The captured video may be provided as a plurality of source images (video images) in a time sequence. Intra-picture prediction, often abbreviated as intra-prediction, exploits spatial correlation in a given picture, and inter-picture prediction exploits (temporal or other) correlation between pictures. In one example, a particular image being encoded/decoded, referred to as a current image, is partitioned into blocks. When a block in a current picture is similar to a reference block in a reference picture that has been previously encoded in the video and is still buffered, the block in the current picture may be encoded by a vector called a motion vector. The motion vector points to a reference block in a reference picture, and in the case of multiple reference pictures, the motion vector may have a third dimension that identifies the reference picture.

In some embodiments, bi-directional prediction techniques may be used for inter-picture prediction. According to bi-directional prediction techniques, two reference pictures are used, e.g., a first reference picture and a second reference picture that are both prior to the current picture in video in decoding order (but may be past and future in display order, respectively). A block in a current picture may be encoded by a first motion vector pointing to a first reference block in a first reference picture and a second motion vector pointing to a second reference block in a second reference picture. The block may be predicted by a combination of the first reference block and the second reference block.
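The bi-directional prediction just described can be sketched as follows. This is an illustration only: it assumes integer-sample motion vectors (no fractional interpolation), reference blocks that lie fully inside their pictures (no padding), and a plain rounded average as the "combination" of the two reference blocks, whereas practical codecs may use weighted combinations. `fetch_block` and `bi_predict` are hypothetical names.

```python
from typing import List, Tuple

Block = List[List[int]]


def fetch_block(picture: Block, x: int, y: int, mv: Tuple[int, int], w: int, h: int) -> Block:
    """Copy the w x h reference block displaced from (x, y) by an integer motion vector."""
    dx, dy = mv
    return [row[x + dx : x + dx + w] for row in picture[y + dy : y + dy + h]]


def bi_predict(ref0: Block, ref1: Block) -> Block:
    """Combine two reference blocks by a rounded average: (a + b + 1) >> 1."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(ref0, ref1)]
```

For a 2 × 2 block at (1, 1), `fetch_block(pic0, 1, 1, (-1, -1), 2, 2)` takes the first reference block from the first reference picture and `fetch_block(pic1, 1, 1, (1, 0), 2, 2)` takes the second one from the second reference picture; `bi_predict` then averages them sample by sample.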

Furthermore, merge mode techniques may be used for inter-picture prediction to improve coding efficiency.

According to some embodiments of the present disclosure, predictions such as inter-picture prediction and intra-picture prediction are performed in units of blocks. For example, according to the HEVC standard, a picture in a sequence of video pictures is partitioned into coding tree units (CTUs) for compression, and the CTUs in a picture have the same size, such as 64 × 64 pixels, 32 × 32 pixels, or 16 × 16 pixels. In general, a CTU includes three coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU may be recursively quadtree-split into one or more coding units (CUs). For example, a 64 × 64-pixel CTU may be split into one 64 × 64-pixel CU, four 32 × 32-pixel CUs, or sixteen 16 × 16-pixel CUs. In one example, each CU is analyzed to determine a prediction type for the CU, such as an inter prediction type or an intra prediction type. Depending on temporal and/or spatial predictability, the CU is split into one or more prediction units (PUs). In general, each PU includes one luma prediction block (PB) and two chroma PBs. In one embodiment, a prediction operation in coding (encoding/decoding) is performed in units of prediction blocks. Using a luma prediction block as an example, the prediction block includes a matrix of values (e.g., luma values) for pixels, such as 8 × 8 pixels, 16 × 16 pixels, 8 × 16 pixels, 16 × 8 pixels, and so on.
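The recursive quadtree split of a CTU into CUs can be sketched as below. The `should_split` callback is a hypothetical stand-in for the encoder's split decision (for example, a rate-distortion choice) or, on the decoder side, for parsed split flags; `min_size` is an assumed minimum CU size. Quadrants are visited top-left, top-right, bottom-left, bottom-right, giving z-scan order.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) of a square coding unit


def split_ctu(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[CU]:
    """Recursively quadtree-split a square region into CUs, in z-scan order."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus: List[CU] = []
        for dy in (0, half):          # top row of quadrants first,
            for dx in (0, half):      # left quadrant before right
                cus += split_ctu(x + dx, y + dy, half, min_size, should_split)
        return cus
    return [(x, y, size)]
```

Matching the example in the text, a 64 × 64 CTU yields one CU when never split, four 32 × 32 CUs when split once (`lambda x, y, s: s > 32`), and sixteen 16 × 16 CUs when split twice. Splits need not be uniform: splitting only the top-left 32 × 32 quadrant again gives seven CUs.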

Fig. 6 shows a diagram of a video encoder (603) according to another embodiment of the present disclosure. The video encoder (603) is configured to receive a processing block (e.g., a prediction block) of sample values within a current video picture in a sequence of video pictures, and to encode the processing block into an encoded picture that is part of an encoded video sequence. In one example, the video encoder (603) is used in place of the video encoder (303) in the example of fig. 3.

In the HEVC example, the video encoder (603) receives a matrix of sample values for a processing block, such as a prediction block of 8 × 8 samples. The video encoder (603) determines, using, for example, rate-distortion (RD) optimization, whether the processing block is best encoded using intra mode, inter mode, or bi-prediction mode. When the processing block is to be encoded in intra mode, the video encoder (603) may use an intra prediction technique to encode the processing block into the encoded picture; when the processing block is to be encoded in inter mode or bi-prediction mode, the video encoder (603) may use an inter prediction or bi-prediction technique, respectively, to encode the processing block into the encoded picture. In some video coding techniques, merge mode may be an inter-picture prediction sub-mode in which the motion vector is derived from one or more motion vector predictors without the benefit of a coded motion vector component outside the predictors. In some other video coding techniques, a motion vector component applicable to the subject block may be present. In one example, the video encoder (603) includes other components, such as a mode decision module (not shown) for determining the mode of the processing blocks.

In the example of fig. 6, the video encoder (603) includes an inter encoder (630), an intra encoder (622), a residual calculator (623), a switch (626), a residual encoder (624), a general controller (621), and an entropy encoder (625) coupled together as shown in fig. 6.

The inter encoder (630) is configured to receive the samples of a current block (e.g., a processing block), compare the block to one or more reference blocks in reference pictures (e.g., blocks in previous and subsequent pictures), generate inter prediction information (e.g., descriptions of redundant information according to inter encoding techniques, motion vectors, merge mode information), and calculate an inter prediction result (e.g., a prediction block) based on the inter prediction information using any suitable technique. In some examples, the reference pictures are decoded reference pictures that are decoded based on the encoded video information.

The intra encoder (622) is configured to receive samples of a current block (e.g., a processing block), in some cases compare the block to an already encoded block in the same image, generate quantized coefficients after transformation, and in some cases also generate intra prediction information (e.g., intra prediction direction information according to one or more intra coding techniques). In one example, the intra encoder (622) also computes an intra prediction result (e.g., a predicted block) based on the intra prediction information and a reference block in the same picture.

The general controller (621) is configured to determine general control data and control other components of the video encoder (603) based on the general control data. In one example, the general controller (621) determines the mode of the block and provides a control signal to the switch (626) based on the mode. For example, when the mode is intra mode, the general controller (621) controls the switch (626) to select the intra mode result for use by the residual calculator (623), and controls the entropy encoder (625) to select the intra prediction information and include it in the bitstream; when the mode is inter mode, the general controller (621) controls the switch (626) to select the inter prediction result for use by the residual calculator (623), and controls the entropy encoder (625) to select the inter prediction information and include it in the bitstream.

The residual calculator (623) is configured to calculate the difference (residual data) between the received block and the prediction result selected from the intra encoder (622) or the inter encoder (630). The residual encoder (624) is configured to operate on the residual data to encode the residual data and generate transform coefficients. In one example, the residual encoder (624) is configured to convert the residual data from the spatial domain to the frequency domain and generate the transform coefficients. The transform coefficients are then subjected to quantization processing to obtain quantized transform coefficients. In various embodiments, the video encoder (603) further includes a residual decoder (628). The residual decoder (628) is configured to perform an inverse transform and generate decoded residual data. The decoded residual data may be suitably used by the intra encoder (622) and the inter encoder (630). For example, the inter encoder (630) may generate decoded blocks based on the decoded residual data and inter prediction information, and the intra encoder (622) may generate decoded blocks based on the decoded residual data and intra prediction information. The decoded blocks are suitably processed to generate decoded pictures, and in some examples, the decoded pictures may be buffered in a memory circuit (not shown) and used as reference pictures.

The entropy encoder (625) is configured to format the bitstream to include encoded blocks. The entropy encoder (625) is configured to include various information according to a suitable standard, such as the HEVC standard. In one example, the entropy encoder (625) is configured to include general control data, selected prediction information (e.g., intra prediction information or inter prediction information), residual information, and other suitable information in the bitstream. It should be noted that, according to the disclosed subject matter, there is no residual information when a block is encoded in a merge sub-mode of an inter mode or a bi-directional prediction mode.

Fig. 7 shows a diagram of a video decoder (710) according to another embodiment of the present disclosure. A video decoder (710) is configured to receive an encoded image that is part of an encoded video sequence and decode the encoded image to generate a reconstructed image. In one example, the video decoder (710) is used in place of the video decoder (310) in the example of fig. 3.

In the example of fig. 7, the video decoder (710) includes an entropy decoder (771), an inter-frame decoder (780), a residual decoder (773), a reconstruction module (774), and an intra-frame decoder (772) coupled together as shown in fig. 7.

The entropy decoder (771) may be configured to reconstruct, from the encoded picture, certain symbols that represent the syntax elements making up the encoded picture. Such symbols may include, for example, the mode in which the block is encoded (e.g., intra mode, inter mode, bi-prediction mode, a merge sub-mode of the latter two, or another sub-mode), prediction information (e.g., intra prediction information or inter prediction information) that may identify certain samples or metadata used for prediction by the intra decoder (772) or the inter decoder (780), respectively, residual information in the form of, for example, quantized transform coefficients, and so on. In one example, when the prediction mode is inter or bi-prediction mode, the inter prediction information is provided to the inter decoder (780); when the prediction type is an intra prediction type, the intra prediction information is provided to the intra decoder (772). The residual information may be subjected to inverse quantization and provided to the residual decoder (773).

An inter-frame decoder (780) is configured to receive the inter-frame prediction information and generate an inter-frame prediction result based on the inter-frame prediction information.

An intra-frame decoder (772) is configured to receive intra-frame prediction information and generate a prediction result based on the intra-frame prediction information.

A residual decoder (773) is configured to perform inverse quantization to extract dequantized transform coefficients, and to process the dequantized transform coefficients to convert the residual from the frequency domain to the spatial domain. The residual decoder (773) may also require certain control information (such as the quantizer parameter (QP)), and that information may be provided by the entropy decoder (771) (data path not depicted, as this may be only low-volume control information).

The reconstruction module (774) is configured to combine in the spatial domain the residuals output by the residual decoder (773) with the prediction results (which may be output by the inter prediction module or the intra prediction module as the case may be) to form a reconstructed block, which may be part of a reconstructed image, which in turn may be part of a reconstructed video. It should be noted that other suitable operations, such as deblocking operations, may be performed to improve visual quality.

It should be noted that the video encoder (303), the video encoder (503), and the video encoder (603), as well as the video decoder (310), the video decoder (410), and the video decoder (710), may be implemented using any suitable technique. In one embodiment, the video encoder (303), the video encoder (503), and the video encoder (603), as well as the video decoder (310), the video decoder (410), and the video decoder (710), may be implemented using one or more integrated circuits. In another embodiment, the video encoder (303), the video encoder (503), and the video encoder (603), as well as the video decoder (310), the video decoder (410), and the video decoder (710), may be implemented using one or more processors executing software instructions.

Block-based compensation from a different picture may be referred to as motion compensation. Block compensation may also be performed from a previously reconstructed region within the same picture, which may be referred to as intra block compensation, intra block copy (IBC), or current picture referencing (CPR). A displacement vector that indicates the offset between the current block and the reference block is referred to as a block vector. According to some embodiments, the block vector points to a reference block that has already been reconstructed and is available for reference. In addition, for parallel processing considerations, reference regions outside slice/tile boundaries or wavefront ladder-shaped boundaries may be excluded from being referenced by the block vector. Due to these constraints, a block vector may differ from a motion vector in motion compensation, which can take any value (positive or negative, in either the x or y direction).

The coding of a block vector may be explicit or implicit. In the explicit mode, sometimes referred to as the AMVP (Advanced Motion Vector Prediction) mode in inter coding, the difference between a block vector and its predictor is signaled. In the implicit mode, the block vector is recovered purely from its predictor, in a manner similar to the motion vector in merge mode. In some embodiments, the resolution of the block vector is restricted to integer positions. In other embodiments, the block vector may be allowed to point to fractional positions.
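The difference between explicit and implicit block vector coding can be sketched as follows. This is a minimal illustration, not the normative decoding process: the vector units (1/16 sample, chosen to match the resolution of mvL in the conformance conditions of this description), the function name, and the `integer_only` switch modelling the integer-resolution restriction are all assumptions.

```python
from typing import Optional, Tuple

Vector = Tuple[int, int]  # (x, y) in 1/16-sample units (assumed)


def decode_block_vector(bvp: Vector, bvd: Optional[Vector] = None,
                        integer_only: bool = True) -> Vector:
    """Explicit mode: bv = predictor + signaled difference; implicit mode: bv = predictor."""
    if bvd is None:
        bv = bvp  # implicit (merge-like): the selected predictor is used as-is
    else:
        bv = (bvp[0] + bvd[0], bvp[1] + bvd[1])  # explicit (AMVP-like)
    # With integer-only resolution, both components must be multiples of 16 (1/16 units).
    if integer_only and (bv[0] % 16 or bv[1] % 16):
        raise ValueError(f"block vector {bv} points to a fractional position")
    return bv
```

For example, a predictor of (-64, 0) plus a signaled difference of (-16, 16) gives the block vector (-80, 16), that is, 5 samples to the left and 1 sample down. In the implicit mode, only the choice of predictor is conveyed, so no difference is decoded.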

The use of intra block copy at the block level may be signaled using a block-level flag referred to as the IBC flag. In one embodiment, the IBC flag is signaled when the current block is not encoded in merge mode. Alternatively, the use of IBC may be signaled through a reference index approach, in which the currently decoded picture is treated as a reference picture. In HEVC Screen Content Coding (SCC), such a reference picture is placed at the last position of the reference picture list. This special reference picture may also be managed together with other temporal reference pictures in the decoded picture buffer (DPB). IBC may also include variants such as flipped IBC (e.g., the reference block is flipped horizontally or vertically before being used to predict the current block) or line-based IBC (e.g., each compensation unit within an M × N coded block is an M × 1 or 1 × N line).

Fig. 8 illustrates an embodiment of intra block compensation (e.g., intra block copy mode). In fig. 8, a current picture 800 includes a set of block regions that have been encoded/decoded (i.e., gray squares) and a set of block regions that have not been encoded/decoded (i.e., white squares). A block 802 of one of the block regions that has not yet been encoded/decoded may be associated with a block vector 804 that points to another block 806 that has been previously encoded/decoded. Thus, any motion information associated with block 806 may be used for encoding/decoding of block 802.

In some embodiments, the search range of the CPR mode is limited to the current CTU. The effective memory requirement for storing reference samples for the CPR mode is then one CTU's worth of samples. Taking into account the existing reference sample memory for storing reconstructed samples in the current 64 × 64 region, three more reference sample memories of 64 × 64 size are required (for a 128 × 128 CTU). Embodiments of the present disclosure extend the effective search range of the CPR mode to a portion of the left CTU, while the total memory requirement for storing reference pixels remains unchanged (one CTU size, that is, four 64 × 64 reference sample memories in total).

In fig. 9A, the upper left region of the CTU 900 is the current region being decoded. When the top left region of the CTU 900 is decoded, entry [1] of the reference sample memory is overwritten by samples from that region, as shown in fig. 10A (e.g., the overwritten memory locations have diagonal cross-hatching). In fig. 9B, the upper right region of the CTU 900 is the next current region being decoded. When the upper right region of the CTU 900 is decoded, entry [2] of the reference sample memory is overwritten by samples from that region, as shown in fig. 10B. In fig. 9C, the lower left region of the CTU 900 is the next current region being decoded. When the lower left area of the CTU 900 is decoded, entry [3] of the reference sample memory is overwritten by samples from that area, as shown in fig. 10C. In fig. 9D, the lower right region of the CTU 900 is the next current region being decoded. When the lower right region of the CTU 900 is decoded, entry [3] of the reference sample memory is overwritten by samples from that region, as shown in fig. 10D.
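The memory reuse described above can be modelled at a coarse level as follows. This sketch assumes a 128 × 128 CTU and four 64 × 64 memories, and it assumes, for illustration only, that each memory is tied to one quadrant position: it initially holds that quadrant of the left CTU and is overwritten when the collocated quadrant of the current CTU is decoded. The quadrant labels are hypothetical and are not meant to reproduce the entry numbering of Figs. 10A to 10D.

```python
from typing import Dict, List, Sequence

CTU_SIZE = 128      # assumed CTU size (CtbLog2SizeY = 7)
REGION = 64         # size of one reference sample memory
NUM_MEMORIES = (CTU_SIZE // REGION) ** 2   # 4 memories = one CTU worth of samples
QUADRANTS = ("top-left", "top-right", "bottom-left", "bottom-right")


def simulate(order: Sequence[str] = QUADRANTS) -> List[Dict[str, str]]:
    """Track which CTU (left or current) each quadrant memory holds after each region."""
    # Before the current CTU starts, every memory holds the left CTU's collocated quadrant.
    memory = {q: "left" for q in QUADRANTS}
    history = []
    for q in order:
        memory[q] = "current"  # decoding this region overwrites its collocated memory
        history.append(dict(memory))
    return history


def referenceable_left_quadrants(state: Dict[str, str]) -> List[str]:
    """Left-CTU quadrants still held in memory, i.e. still usable as IBC references."""
    return [q for q in QUADRANTS if state[q] == "left"]
```

Under this model, while the top-left region of the current CTU is being decoded, the top-right, bottom-left and bottom-right quadrants of the left CTU remain referenceable; once the whole current CTU has been decoded, none of the left CTU remains, and the total memory stays at one CTU.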

In some embodiments, it is a requirement of bitstream conformance that a valid block vector mvL (in 1/16-sample resolution) satisfies the following conditions. In some embodiments, the luma motion vector mvL obeys the following constraints A1, A2, B1, C1, and C2.

In the first constraint (A1), when the derivation process for block availability (e.g., a neighboring block availability checking process) is invoked with the current luma location (xCurr, yCurr) set equal to (xCb, yCb) and the neighboring luma location (xCb + (mvL[0] >> 4), yCb + (mvL[1] >> 4)) as inputs, the output shall be equal to TRUE.

In the second constraint (A2), when the derivation process for block availability (e.g., a neighboring block availability checking process) is invoked with the current luma location (xCurr, yCurr) set equal to (xCb, yCb) and the neighboring luma location (xCb + (mvL[0] >> 4) + cbWidth - 1, yCb + (mvL[1] >> 4) + cbHeight - 1) as inputs, the output shall be equal to TRUE.

In the third constraint (B1), one or both of the following conditions are true:

(i) The value of (mvL[0] >> 4) + cbWidth is less than or equal to 0.

(ii) The value of (mvL[1] >> 4) + cbHeight is less than or equal to 0.

In the fourth constraint (C1), all of the following conditions are true:

(i) (yCb + (mvL[1] >> 4)) >> CtbLog2SizeY = yCb >> CtbLog2SizeY

(ii) (yCb + (mvL[1] >> 4) + cbHeight - 1) >> CtbLog2SizeY = yCb >> CtbLog2SizeY

(iii) (xCb + (mvL[0] >> 4)) >> CtbLog2SizeY >= (xCb >> CtbLog2SizeY) - 1

(iv) (xCb + (mvL[0] >> 4) + cbWidth - 1) >> CtbLog2SizeY <= (xCb >> CtbLog2SizeY)

In the fifth constraint (C2), when (xCb + (mvL[0] >> 4)) >> CtbLog2SizeY is equal to (xCb >> CtbLog2SizeY) - 1, the derivation process for block availability (e.g., a neighboring block availability checking process) is invoked with the current luma location (xCurr, yCurr) set equal to (xCb, yCb) and the neighboring luma location (((xCb + (mvL[0] >> 4) + CtbSizeY) >> (CtbLog2SizeY - 1)) << (CtbLog2SizeY - 1), ((yCb + (mvL[1] >> 4)) >> (CtbLog2SizeY - 1)) << (CtbLog2SizeY - 1)) as inputs, and the output shall be equal to FALSE.

In the above conditions, xCb and yCb are the x and y coordinates of the current block, respectively. The variables cbHeight and cbWidth are the height and width of the current block, respectively. The variable CtbLog2SizeY is the CTU size in the log2 domain, and CtbSizeY = 1 << CtbLog2SizeY is the CTU size in luma samples; for example, CtbLog2SizeY = 7 means that the CTU size is 128 × 128. The variables mvL[0] and mvL[1] are the x and y components of the block vector mvL, respectively. In constraint C2, an output of FALSE indicates that the collocated region of the current CTU has not yet been reconstructed, so the samples of the reference block in the left CTU are determined to be available (e.g., usable for intra block copy); an output of TRUE indicates that the samples of the reference block are not available.
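Constraints A1, A2, B1, C1 and C2 can be collected into a single validity check, sketched below. The block availability derivation process is not defined in this section, so it is passed in as a caller-supplied function `available(xCurr, yCurr, xNb, yNb)`; that callback and the snake_case names are assumptions for illustration. Python's `>>` on negative integers is an arithmetic (flooring) shift, which matches the shifts used in the conditions.

```python
from typing import Callable, Tuple

# (xCurr, yCurr, xNbY, yNbY) -> True if the neighboring luma location is available.
Availability = Callable[[int, int, int, int], bool]


def is_valid_bv(x_cb: int, y_cb: int, cb_width: int, cb_height: int,
                mv_l: Tuple[int, int], ctb_log2_size_y: int,
                available: Availability) -> bool:
    """Check constraints A1, A2, B1, C1 and C2 for a block vector in 1/16-sample units."""
    bx, by = mv_l[0] >> 4, mv_l[1] >> 4      # integer-sample displacement
    ctb_size_y = 1 << ctb_log2_size_y
    ctb_col, ctb_row = x_cb >> ctb_log2_size_y, y_cb >> ctb_log2_size_y

    # A1 / A2: top-left and bottom-right corners of the reference block are available.
    if not available(x_cb, y_cb, x_cb + bx, y_cb + by):
        return False
    if not available(x_cb, y_cb, x_cb + bx + cb_width - 1, y_cb + by + cb_height - 1):
        return False

    # B1: the reference block lies entirely to the left of or above the current block.
    if not (bx + cb_width <= 0 or by + cb_height <= 0):
        return False

    # C1: same CTU row, and horizontally within the left CTU or the current CTU.
    if (y_cb + by) >> ctb_log2_size_y != ctb_row:
        return False
    if (y_cb + by + cb_height - 1) >> ctb_log2_size_y != ctb_row:
        return False
    if (x_cb + bx) >> ctb_log2_size_y < ctb_col - 1:
        return False
    if (x_cb + bx + cb_width - 1) >> ctb_log2_size_y > ctb_col:
        return False

    # C2: for references into the left CTU, the collocated half-CTU-size region of
    # the current CTU must not be available yet (its memory not yet overwritten).
    if (x_cb + bx) >> ctb_log2_size_y == ctb_col - 1:
        half = ctb_log2_size_y - 1
        x_nb = ((x_cb + bx + ctb_size_y) >> half) << half
        y_nb = ((y_cb + by) >> half) << half
        if available(x_cb, y_cb, x_nb, y_nb):
            return False
    return True
```

As a usage example, consider a 128 × 128 CTU grid where the left CTU (x from 0 to 127) and the top-left 64 × 64 region of the current CTU are decoded, and a 16 × 16 block at (192, 0) is being coded. Under this setup, a block vector into the left CTU's top-left quadrant is rejected by C2, because that memory has been reused. A block vector into the left CTU's bottom-left quadrant, or into the already-decoded top-left region of the current CTU, passes all of the checks.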

According to some embodiments, a history-based MVP (HMVP) method uses HMVP candidates, which are defined as the motion information of previously coded blocks. A table with multiple HMVP candidates is maintained during the encoding/decoding process, and the table is emptied when a new slice is encountered. Whenever an inter-coded non-affine block is coded, its motion information is added to the last entry of the table as a new HMVP candidate. The coding flow of the HMVP method is shown in fig. 11A.

The size S of the table is set to 6, which means that up to 6 HMVP candidates can be added to the table. When a new motion candidate is inserted into the table, a constrained first-in-first-out (FIFO) rule is applied: a redundancy check is first performed to determine whether an identical HMVP candidate already exists in the table. If one is found, that identical candidate is removed from the table and all HMVP candidates after it are moved forward, i.e., their indices are decreased by 1. Fig. 11B shows an example of inserting a new motion candidate into the HMVP table.
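The constrained-FIFO update described above can be sketched as follows. This is a minimal illustration, not a codec implementation: `MotionInfo` is a hypothetical, simplified record, and evicting the oldest entry when the table is full and no duplicate exists is assumed here as ordinary FIFO behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MotionInfo:
    """Hypothetical, simplified motion record (real codecs store more fields)."""
    mv: tuple
    ref_idx: int


HMVP_TABLE_SIZE = 6  # S = 6, as stated above


def hmvp_insert(table: List[MotionInfo], cand: MotionInfo,
                size: int = HMVP_TABLE_SIZE) -> List[MotionInfo]:
    """Constrained-FIFO update; table[0] is the oldest entry, table[-1] the newest."""
    if cand in table:
        # Redundancy check hit: remove the identical entry; the entries after it
        # move forward by one position (index - 1).
        table.remove(cand)
    elif len(table) == size:
        # Table full and no duplicate: evict the oldest entry (plain FIFO).
        table.pop(0)
    table.append(cand)  # the new candidate becomes the last entry
    return table
```

Re-inserting a candidate that is already present therefore moves it to the newest position without growing the table.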

HMVP candidates may be used in the merge candidate list construction process. The latest several HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. Pruning may be applied between the HMVP candidates and the spatial or temporal merge candidates, excluding the sub-block motion candidates (i.e., ATMVP).

In some embodiments, to reduce the number of pruning operations, the number of HMVP candidates to be checked (denoted by L) is set to L = (N <= 4) ? M : (8 - N), where N indicates the number of available non-sub-block merge candidates and M indicates the number of available HMVP candidates in the table. Furthermore, merge candidate list construction from the HMVP list terminates as soon as the total number of available merge candidates reaches the signaled maximum number of allowed merge candidates minus 1. In addition, the number of candidate pairs used for combined bi-predictive merge candidate derivation is reduced from 12 to 6.
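The two complexity limits above (checking at most L HMVP candidates, and stopping one entry short of the maximum list size) can be sketched as follows. This is a hypothetical helper, assuming that `merge_list` already holds the non-sub-block spatial/temporal candidates as hashable values and that the table is ordered oldest-first; pruning is simplified to a comparison against every candidate already in the list.

```python
from typing import Hashable, List


def append_hmvp_candidates(merge_list: List[Hashable], hmvp_table: List[Hashable],
                           max_num_merge_cand: int) -> List[Hashable]:
    """Append HMVP candidates (newest first) to a merge list under the limits above."""
    n = len(merge_list)   # N: available non-sub-block merge candidates
    m = len(hmvp_table)   # M: available HMVP candidates in the table
    num_to_check = m if n <= 4 else max(0, 8 - n)   # L = (N <= 4) ? M : (8 - N)
    for cand in hmvp_table[::-1][:num_to_check]:    # latest entries first
        if len(merge_list) >= max_num_merge_cand - 1:
            break                                    # early termination
        if cand not in merge_list:                   # simplified pruning
            merge_list.append(cand)
    return merge_list
```

For instance, with spatial candidates ['A', 'B'], an HMVP table ['H1', 'H2', 'A', 'H3'] and a maximum of 6 candidates, the sketch appends H3, skips the duplicate A, then appends H2 and H1, and stops at 5 entries.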

HMVP candidates may also be used in the AMVP candidate list construction process. The motion vectors of the last K HMVP candidates in the table are inserted after the TMVP candidate. Only HMVP candidates having the same reference picture as the AMVP target reference picture are used to construct the AMVP candidate list. Pruning is applied to the HMVP candidates. In some applications, K is set to 4 and the AMVP list size remains unchanged, i.e., equal to 2.

According to some embodiments, when intra block copy operates as a mode separate from inter mode, an independent history buffer, referred to as HBVP, may be used to store the block vectors of previously coded intra block copy blocks. Because IBC is a mode separate from inter prediction, a simplified block vector derivation process for IBC mode is desirable. The candidate list for IBC BV prediction in AMVP mode may share the candidate list used in IBC merge mode (the merge candidate list), which has 2 spatial candidates plus 5 HBVP candidates.

The merge candidate list size for IBC mode may be specified as MaxNumMergeCand, i.e., the same as the inter-mode merge candidate list size MaxNumMergeCand, which in some examples is derived from the syntax element six_minus_max_num_merge_cand. The syntax element six_minus_max_num_merge_cand specifies the maximum number of merge motion vector prediction (MVP) candidates supported in a slice subtracted from 6.

In some examples, the maximum number of merge MVP candidates, MaxNumMergeCand, may be derived as:

MaxNumMergeCand = 6 - six_minus_max_num_merge_cand

The value of MaxNumMergeCand may range from 1 to 6, inclusive. BV prediction in non-merge mode may share the same list generated for IBC merge mode. However, in some examples, the candidate list size for the non-merge mode is typically 2. Therefore, when MaxNumMergeCand is set to different values, and when the maximum size of the IBC merge candidate list is set differently from the inter merge candidate list size, an appropriate method is needed to handle both the IBC merge candidate list size and the prediction list size for IBC non-merge mode (AMVP mode).
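The derivation of MaxNumMergeCand and its allowed range can be written as a short worked example. The Python function name is hypothetical; the formula and the 1 to 6 range are the ones given above.

```python
def derive_max_num_merge_cand(six_minus_max_num_merge_cand: int) -> int:
    """MaxNumMergeCand = 6 - six_minus_max_num_merge_cand, expected in 1..6."""
    max_num_merge_cand = 6 - six_minus_max_num_merge_cand
    if not 1 <= max_num_merge_cand <= 6:
        raise ValueError(f"MaxNumMergeCand={max_num_merge_cand} is outside 1..6")
    return max_num_merge_cand
```

For example, a signaled value of 5 gives MaxNumMergeCand = 1. An IBC list that simply reused this size would hold a single entry, fewer than the two predictors typically used in the non-merge case.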

The embodiments of the present disclosure may be used separately or combined in any order. Furthermore, each of the methods, the encoder, and the decoder according to embodiments of the present disclosure may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program that is stored in a non-transitory computer-readable medium. According to some embodiments, the term block may be interpreted as a prediction block, a coding block, or a coding unit (i.e., CU). When discussing merge mode, skip mode may be considered a special merge mode, and all disclosed embodiments for merge mode may also be applied to skip mode.

In some embodiments, when MaxNumMergeCand for inter merge mode is signaled as 1, the corresponding merge candidate list size for IBC is also 1. This can lead to undesirable behavior, especially when IBC AMVP mode uses the same candidate list for BV prediction. In this case, a candidate list with at least two entries is desired, because AMVP mode uses two predictors. Embodiments of the present disclosure address these issues.

According to some embodiments, in the first method, the merge list size for IBC mode is at least M, where M is an integer. In some embodiments, the IBC merge list size MaxNumIBCMergeCand is set to:

MaxNumIBCMergeCand = max(MaxNumMergeCand, M)

In one embodiment, M is set equal to 1 to ensure that at least 1 candidate is used for IBC merge mode. In another embodiment, M is set equal to 2 to guarantee at least 2 candidates for IBC merge mode; IBC AMVP mode, which shares the same candidate list with IBC merge mode, is then guaranteed to have two entries in the candidate list for BV prediction. In another embodiment, when IBC AMVP mode shares the same candidate list with IBC merge mode, M is set to 1 if the current CU is coded in IBC merge mode, and M is set to 2 if the current CU is coded in IBC AMVP mode.
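The first method, including its mode-dependent variant, reduces to a one-line maximum. A minimal sketch with hypothetical function names:

```python
def ibc_merge_list_size(max_num_merge_cand: int, m: int) -> int:
    """First method: MaxNumIBCMergeCand = max(MaxNumMergeCand, M)."""
    return max(max_num_merge_cand, m)


def ibc_merge_list_size_for_cu(max_num_merge_cand: int, cu_is_ibc_amvp: bool) -> int:
    """Shared-list variant: M = 1 for IBC merge CUs, M = 2 for IBC AMVP CUs."""
    m = 2 if cu_is_ibc_amvp else 1
    return ibc_merge_list_size(max_num_merge_cand, m)
```

With MaxNumMergeCand = 1, an IBC merge CU keeps a one-entry list while an IBC AMVP CU gets a two-entry list, which provides the two predictors that AMVP mode expects.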

In accordance with some embodiments, in the second method, the index for IBC merge mode (i.e., merge_idx) is signaled only when the merge candidate list size for IBC is greater than 1. In this second method, the IBC merge candidate list size (i.e., MaxNumIBCMergeCand) is derived first, which implies that the variable MaxNumIBCMergeCand for IBC may differ from MaxNumMergeCand for inter mode.

In some embodiments, if only the merge candidate list size for IBC is considered, the following applies:

When general_merge_flag is true:

TABLE 1

In some embodiments, if MaxNumIBCMergeCand is set equal to MaxNumMergeCand - 1, the following applies:

When merge_flag is true:

TABLE 2

In some embodiments, if IBC AMVP mode shares the same candidate list with IBC merge mode, at least two entries in the candidate list are expected for BV prediction in AMVP mode. To satisfy this condition, the IBC merge candidate list size may be assigned as follows:

MaxNumIBCMergeCand = max(MaxNumMergeCand, 2)

Since MaxNumIBCMergeCand >= 2 is guaranteed in this embodiment, the signaling of merge_idx does not need to depend on the inter merge candidate list size MaxNumMergeCand. An example syntax table is as follows:

TABLE 3

In a third method, according to some embodiments, a separate merge candidate list size for IBC is signaled, and the size MaxNumIBCMergeCand has a lower bound of M, where M is an integer. In this third method, the IBC merge candidate list size MaxNumIBCMergeCand may be signaled with a separate syntax element. In one embodiment, MaxNumIBCMergeCand ranges from 2 to MaxNumMergeCand.

The following is one embodiment of the signaling for the third method. In some examples, MaxNumIBCMergeCand <= MaxNumMergeCand. At the picture or slice level:

TABLE 4

In some examples, the syntax element max_num_merge_cand_minus_max_num_ibc_cand specifies the maximum number of IBC merge mode candidates supported in a slice subtracted from MaxNumMergeCand. The maximum number of IBC merge mode candidates, MaxNumIBCMergeCand, can be derived as follows:

MaxNumIBCMergeCand = MaxNumMergeCand - max_num_merge_cand_minus_max_num_ibc_cand

When max_num_merge_cand_minus_max_num_ibc_cand is present, in some examples, the value of MaxNumIBCMergeCand is within the range of 2 to MaxNumMergeCand, inclusive. When max_num_merge_cand_minus_max_num_ibc_cand is not present, MaxNumIBCMergeCand is set to 0 in some examples. When MaxNumIBCMergeCand is equal to 0, in some examples, IBC merge mode is not allowed for the current slice.
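The slice-level derivation of the third method, including the case in which the syntax element is absent, can be sketched as follows. The function name is hypothetical, and representing an absent syntax element as `None` and a range violation as a `ValueError` are choices made for this sketch only.

```python
from typing import Optional


def derive_max_num_ibc_merge_cand(max_num_merge_cand: int,
                                  delta: Optional[int]) -> int:
    """Third method: delta is max_num_merge_cand_minus_max_num_ibc_cand,
    or None when the syntax element is not present."""
    if delta is None:
        # Not present: MaxNumIBCMergeCand = 0, i.e. IBC merge mode is
        # not allowed for the current slice.
        return 0
    value = max_num_merge_cand - delta
    # When present, MaxNumIBCMergeCand is expected in [2, MaxNumMergeCand].
    if not 2 <= value <= max_num_merge_cand:
        raise ValueError(f"MaxNumIBCMergeCand={value} outside [2, {max_num_merge_cand}]")
    return value
```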

In some embodiments, at the CU level, MergeIBCFlag[x0][y0] is set equal to 1 if all of the following conditions are true:

(i) sps_ibc_enabled_flag is equal to 1.

(ii) general_merge_flag[x0][y0] is equal to 1.

(iii) CuPredMode[x0][y0] is equal to MODE_IBC.

(iv) MaxNumIBCMergeCand is greater than or equal to 2.

TABLE 5

The variable merge_ibc_idx is an index into the IBC merge candidate list and, in some examples, is in the range of 0 to MaxNumIBCMergeCand - 1, inclusive. If MaxNumIBCMergeCand is greater than MaxNumMergeCand, the above CU-level embodiment does not hold.
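The CU-level conditions and the index range above can be expressed as two small predicates. This sketch is illustrative only: the function names are hypothetical, and `MODE_IBC` is represented by a plain string stand-in rather than a real decoder enumeration.

```python
MODE_IBC = "MODE_IBC"  # hypothetical stand-in for the CuPredMode value


def merge_ibc_flag(sps_ibc_enabled_flag: int, general_merge_flag: int,
                   cu_pred_mode: str, max_num_ibc_merge_cand: int) -> int:
    """MergeIBCFlag[x0][y0]: 1 only if all four CU-level conditions hold."""
    return int(sps_ibc_enabled_flag == 1
               and general_merge_flag == 1
               and cu_pred_mode == MODE_IBC
               and max_num_ibc_merge_cand >= 2)


def merge_ibc_idx_in_range(merge_ibc_idx: int, max_num_ibc_merge_cand: int) -> bool:
    """merge_ibc_idx is expected in 0..MaxNumIBCMergeCand - 1."""
    return 0 <= merge_ibc_idx <= max_num_ibc_merge_cand - 1
```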

In some embodiments, skip_flag or general_merge_flag is signaled first. To ensure that these flags are used correctly, a constraint should be imposed such that when IBC mode is selected and the merge candidate list size is less than a desired minimum number M (e.g., M = 2), these flags are either not signaled or are required to be false. In one embodiment, when MaxNumIBCMergeCand is less than 2, neither IBC merge mode nor skip mode is used. If the skip flag is signaled before the block-level IBC flag, a constraint may be placed on the IBC flag signaling such that when skip_flag is true, the IBC flag is not signaled and is inferred to be false. In another example, if the skip flag is signaled before the block-level IBC flag, the IBC flag is signaled but its value is still required to be false.

In some embodiments, if the block-level IBC flag is signaled before general_merge_flag, a constraint is imposed on the general_merge_flag signaling such that when the IBC flag is true and MaxNumIBCMergeCand is less than 2, general_merge_flag is either not signaled and inferred to be false, or is signaled but required to be false. An exemplary syntax table design is shown below:

TABLE 6

In some examples, when CuPredMode[x0][y0] is equal to MODE_IBC and MaxNumIBCMergeCand is less than 2, general_merge_flag[x0][y0] is inferred to be 0.
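The inference rule above amounts to skipping the parse for IBC CUs whose IBC merge list would be too short. A minimal sketch, assuming a hypothetical `read_flag` callback that stands in for the entropy decoder and string stand-ins for the CuPredMode values:

```python
from typing import Callable


def general_merge_flag_value(read_flag: Callable[[], int],
                             cu_pred_mode: str,
                             max_num_ibc_merge_cand: int) -> int:
    """Return general_merge_flag, parsing it only when it is actually signaled."""
    if cu_pred_mode == "MODE_IBC" and max_num_ibc_merge_cand < 2:
        return 0  # not signaled, inferred to be 0
    return read_flag()  # otherwise parsed from the bitstream
```

Because the callback is not invoked in the inferred case, no bits are consumed for the flag in that case.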

According to some embodiments, in the fourth method, the merge list size for IBC mode is at least M and at most N, where M and N are integers and M < N. In this fourth method, the IBC merge list size MaxNumIBCMergeCand may be set to:

MaxNumIBCMergeCand = max(M, min(MaxNumMergeCand, N))

In one embodiment, M is set equal to 2, so there are at least 2 candidates for IBC merge mode. In another embodiment, N is set equal to 5, so there are at most 5 candidates for IBC merge mode.
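The fourth method clamps the inter merge list size into [M, N]. A minimal sketch with a hypothetical function name, using the example values M = 2 and N = 5 from the embodiments above as defaults:

```python
def ibc_merge_list_size_clamped(max_num_merge_cand: int, m: int = 2, n: int = 5) -> int:
    """Fourth method: MaxNumIBCMergeCand = max(M, min(MaxNumMergeCand, N)), M < N."""
    if not m < n:
        raise ValueError("the fourth method requires M < N")
    return max(m, min(max_num_merge_cand, n))
```

Over the allowed MaxNumMergeCand values 1 to 6, this yields IBC list sizes of 2, 2, 3, 4, 5 and 5.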

According to some embodiments, in a fifth method, the maximum number of IBC prediction candidates, MaxNumIBCCand, is signaled in the bitstream, e.g., in the slice header. This number may be used for both IBC merge mode and IBC AMVP mode. The IBC merge index and the IBC AMVP predictor index may be signaled using the same syntax element (a candidate index) binarized with a truncated unary code, where MaxNumIBCCand - 1 is the maximum value of the truncated unary code.
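The truncated unary binarization of the shared candidate index can be sketched as a matched encoder/decoder pair. The function names and the example value `MAX_NUM_IBC_CAND = 5` are hypothetical; context modeling and arithmetic coding of the bins are outside the scope of this sketch.

```python
from typing import Iterator, List


def tu_binarize(value: int, c_max: int) -> List[int]:
    """Truncated unary: `value` ones, then a terminating 0 unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    return [1] * value + ([0] if value < c_max else [])


def tu_parse(bins: Iterator[int], c_max: int) -> int:
    """Inverse of tu_binarize: count ones until a 0 or until c_max is reached."""
    value = 0
    while value < c_max and next(bins) == 1:
        value += 1
    return value


MAX_NUM_IBC_CAND = 5          # hypothetical slice-header value
C_MAX = MAX_NUM_IBC_CAND - 1  # maximum value of the truncated unary code
```

With five candidates, indices 0 to 4 map to the bin strings 0, 10, 110, 1110 and 1111. With a single candidate (c_max = 0), the index produces no bins at all, which is consistent with not signaling the index in that case.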

According to some embodiments, in the sixth method, regardless of whether MaxNumIBCMergeCand is equal to MaxNumMergeCand, the predictor index for IBC AMVP mode (mvp_l0_flag) is not signaled when MaxNumIBCMergeCand is equal to 1. That is, according to the sixth method, if the BV predictor list size is equal to 1 (as determined by MaxNumIBCMergeCand = 1), IBC AMVP mode has only one prediction candidate, and there is no need to signal the predictor index.

The following syntax is an example according to the sixth method.

TABLE 7
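The parsing rule of the sixth method can be sketched as follows. The function name and the `read_mvp_l0_flag` callback are hypothetical stand-ins for the entropy decoder, and block vectors are modeled as plain integer tuples.

```python
from typing import Callable, Sequence, Tuple

BlockVector = Tuple[int, int]


def select_ibc_amvp_predictor(read_mvp_l0_flag: Callable[[], int],
                              candidates: Sequence[BlockVector],
                              max_num_ibc_merge_cand: int) -> BlockVector:
    """Sixth method: mvp_l0_flag is parsed only when more than one predictor exists."""
    if max_num_ibc_merge_cand == 1:
        mvp_idx = 0  # single predictor: index not signaled, implicitly 0
    else:
        mvp_idx = read_mvp_l0_flag()  # 0 or 1 selects one of two predictors
    return candidates[mvp_idx]
```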

Fig. 12 shows a flow chart of a video decoding process performed by a video decoder, such as the video decoder (710). The process may begin at step (S1200), where an encoded video bitstream including a current picture is received. The process proceeds to step (S1202) to determine whether the current block in the current picture is coded in IBC mode. If the current block is coded in IBC mode, the process proceeds from step (S1202) to step (S1204) to determine the number of IBC prediction candidates for the current block. For example, the number of IBC prediction candidates may be determined according to any one of the first to sixth methods disclosed above. The process proceeds to step (S1206), where an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates is constructed. The process proceeds to step (S1208), where a block vector prediction is selected from the IBC prediction candidate list. The process proceeds to step (S1210), where a block vector associated with the current block is decoded using the block vector prediction. The process proceeds to step (S1212), where the current block is decoded according to the block vector decoded in the previous step. For example, a block vector prediction may be selected from the IBC prediction candidate list, a block vector may be decoded using the selected block vector prediction, and IBC decoding may be performed on the current block using the block vector. In this regard, the decoded block vector points to another block in the current picture that is used to decode the current block.

Returning to step (S1202), if the current block is not coded in IBC mode, the process proceeds to step (S1214), where the current block is decoded according to its coding mode. For example, the current block may be decoded based on an intra prediction mode or an inter prediction mode. The process shown in fig. 12 may end after step (S1212) or step (S1214) is completed.
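Steps (S1208) to (S1212) can be sketched for a single block as follows. This is a simplified, hypothetical illustration rather than the decoder (710): it assumes the candidate list has already been built with the size from steps (S1204) and (S1206), that the block vector is reconstructed as predictor plus a decoded difference (an AMVP-style assumption), that block vectors are in integer-sample units, that the constraints described earlier have already been checked, and that no residual is added.

```python
from typing import List, Sequence, Tuple

BlockVector = Tuple[int, int]


def decode_ibc_block(picture: List[List[int]], x_cb: int, y_cb: int,
                     width: int, height: int,
                     cand_list: Sequence[BlockVector], cand_idx: int,
                     bvd: BlockVector) -> BlockVector:
    """Hypothetical sketch of S1208-S1212 for one IBC block (integer-sample BVs)."""
    # S1208: select the block vector predictor from the IBC candidate list.
    bvp = cand_list[cand_idx]
    # S1210: reconstruct the block vector as predictor + difference.
    bv = (bvp[0] + bvd[0], bvp[1] + bvd[1])
    # S1212: copy the already-decoded reference block from the same picture.
    for dy in range(height):
        for dx in range(width):
            picture[y_cb + dy][x_cb + dx] = picture[y_cb + bv[1] + dy][x_cb + bv[0] + dx]
    return bv
```

For example, on a 2 × 8 picture whose left half is decoded, a 4 × 2 block at (4, 0) with predictor (-2, 0) and difference (-2, 0) resolves to the block vector (-4, 0) and copies the left half into the right half.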

The techniques described above may be implemented as computer software using computer readable instructions and physically stored in one or more computer readable media. For example, fig. 13 illustrates a computer system (1300) suitable for implementing certain embodiments of the disclosed subject matter.

Computer software may be encoded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code that includes instructions that may be executed directly by one or more computer Central Processing Units (CPUs), Graphics Processing Units (GPUs), etc., or by interpretation, microcode execution, etc.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smart phones, gaming devices, internet of things devices, and so forth.

The components of computer system (1300) shown in FIG. 13 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of the components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (1300).

The computer system (1300) may include certain human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), or olfactory input (not depicted). The human interface devices may also be used to capture certain media not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photographic images obtained from a still image camera), and video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

The human interface input devices may include one or more of the following (only one of each is depicted): keyboard (1301), mouse (1302), touch pad (1303), touch screen (1310), data glove (not shown), joystick (1305), microphone (1306), scanner (1307), and camera (1308).

The computer system (1300) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback from the touch screen (1310), the data glove (not shown), or the joystick (1305), although there may also be tactile feedback devices that do not serve as input devices), audio output devices (e.g., speakers (1309), headphones (not depicted)), visual output devices (e.g., screens (1310), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch screen input capability, some of which may be capable of two-dimensional visual output or more than three-dimensional output through means such as stereographic output; virtual reality glasses (not depicted); holographic displays and smoke tanks (not depicted)), and printers (not depicted).

The computer system (1300) may also include human-accessible storage devices and their associated media, such as optical media including CD/DVD ROM/RW (1320) with CD/DVD or similar media (1321), thumb drives (1322), removable hard drives or solid-state drives (1323), legacy magnetic media (not depicted) such as tape and floppy disks, specialized ROM/ASIC/PLD-based devices (not depicted) such as security dongles, and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

The computer system (1300) may also include an interface to one or more communication networks. The networks may be, for example, wireless, wireline, or optical. The networks may further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so forth. Examples of networks include local area networks such as Ethernet and wireless LANs; cellular networks including GSM, 3G, 4G, 5G, LTE, and the like; TV wireline or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANBus. Certain networks typically require external network interface adapters attached to certain general-purpose data ports or peripheral buses (1349) (e.g., USB ports of the computer system (1300)); others are typically integrated into the core of the computer system (1300) by attachment to a system bus as described below (e.g., an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system (1300) may communicate with other entities. Such communication may be unidirectional and receive-only (e.g., broadcast TV), unidirectional and send-only (e.g., CANbus to certain CANbus devices), or bidirectional, e.g., to other computer systems using local or wide-area digital networks. Certain protocols and protocol stacks may be used on each of those networks and network interfaces, as described above.

The human interface devices, human-accessible storage devices, and network interfaces described above may be attached to the core (1340) of the computer system (1300).

The core (1340) may include one or more Central Processing Units (CPUs) (1341), Graphics Processing Units (GPUs) (1342), specialized programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (1343), hardware accelerators (1344) for certain tasks, and the like. These devices, along with read-only memory (ROM) (1345), random access memory (RAM) (1346), and internal mass storage (1347) such as internal non-user-accessible hard drives, SSDs, and the like, may be connected through a system bus (1348). In some computer systems, the system bus (1348) may be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus (1348) or through a peripheral bus (1349). Architectures for a peripheral bus include PCI, USB, and the like.

The CPUs (1341), GPUs (1342), FPGAs (1343), and accelerators (1344) may execute certain instructions that, in combination, can make up the computer code described above. The computer code may be stored in ROM (1345) or RAM (1346). Transitional data may also be stored in RAM (1346), whereas persistent data may be stored, for example, in the internal mass storage (1347). Fast storage and retrieval for any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs (1341), GPUs (1342), mass storage (1347), ROM (1345), RAM (1346), and the like.

The computer-readable medium may have thereon computer code for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, a computer system having the architecture (1300), and in particular the core (1340), may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain non-transitory storage of the core (1340), such as the core-internal mass storage (1347) or ROM (1345). Software implementing embodiments of the present disclosure may be stored in such devices and executed by the core (1340). A computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (1340), and in particular the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (1346) and modifying such data structures according to the processes defined by the software. Additionally or alternatively, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in circuitry (e.g., the accelerator (1344)), which may operate in place of or together with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to software may encompass logic, and vice versa. Where appropriate, reference to a computer-readable medium may encompass circuitry (e.g., an Integrated Circuit (IC)) storing software for execution, circuitry embodying logic for execution, or both. The present disclosure encompasses any suitable combination of hardware and software.

Appendix A: abbreviations

JEM: Joint Exploration Model

VVC: Versatile Video Coding

BMS: Benchmark Set

MV: Motion Vector

HEVC: High Efficiency Video Coding

SEI: Supplementary Enhancement Information

VUI: Video Usability Information

GOPs: Groups of Pictures

TUs: Transform Units

PUs: Prediction Units

CTUs: Coding Tree Units

CTBs: Coding Tree Blocks

PBs: Prediction Blocks

HRD: Hypothetical Reference Decoder

SNR: Signal-to-Noise Ratio

CPUs: Central Processing Units

GPUs: Graphics Processing Units

CRT: Cathode Ray Tube

LCD: Liquid-Crystal Display

OLED: Organic Light-Emitting Diode

CD: Compact Disc

DVD: Digital Video Disc

ROM: Read-Only Memory

RAM: Random Access Memory

ASIC: Application-Specific Integrated Circuit

PLD: Programmable Logic Device

LAN: Local Area Network

GSM: Global System for Mobile communications

LTE: Long-Term Evolution

CANBus: Controller Area Network Bus

USB: Universal Serial Bus

PCI: Peripheral Component Interconnect

FPGA: Field Programmable Gate Array

SSD: Solid-State Drive

IC: Integrated Circuit

CU: Coding Unit

While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope of the disclosure.

(1) A video decoding method, comprising: receiving an encoded video bitstream comprising a current picture; determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode; determining a number of IBC prediction candidates associated with the current block in response to the current block being encoded in IBC mode; constructing an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates; selecting a block vector prediction from the IBC prediction candidate list; decoding a block vector associated with the current block using the block vector prediction; and decoding the current block according to the block vector.

(2) The method according to feature (1), wherein the number of IBC prediction candidates is greater than or equal to M and less than or equal to N, where M is 2.

(3) The method according to feature (2), wherein N is 5.

(4) The method according to feature (2), wherein the number of IBC prediction candidates is equal to max (M, min (MaxNumMergeCand, N)), wherein MaxNumMergeCand is equal to the number of candidates in the merge mode list.

(5) The method according to any of the features (1) - (4), wherein the number of IBC prediction candidates is signaled in the bitstream.

(6) The method according to feature (5), wherein the number of IBC prediction candidates is signaled using a truncated unary code, and the maximum value of the truncated unary code is the number of IBC prediction candidates minus 1.

(7) The method according to any of features (1) - (6), wherein the index associated with selecting IBC block vector prediction is not signaled when the number of IBC prediction candidates is equal to 1.

(8) A video decoder for performing video decoding, comprising processing circuitry configured to: receive an encoded video bitstream including a current picture; determine whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode; determine a number of IBC prediction candidates associated with the current block in response to the current block being encoded in the IBC mode; construct an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates; select a block vector prediction from the IBC prediction candidate list; decode a block vector associated with the current block using the block vector prediction; and decode the current block according to the block vector.

(9) The video decoder according to feature (8), wherein the number of IBC prediction candidates is greater than or equal to M and less than or equal to N, where M is 2.

(10) The video decoder according to feature (9), wherein N is 5.

(11) The video decoder according to feature (9), wherein the number of IBC prediction candidates is equal to max (M, min (MaxNumMergeCand, N)), wherein MaxNumMergeCand is equal to the number of candidates in the merge mode list.

(12) The video decoder according to any one of the features (8) to (11), wherein the number of IBC prediction candidates is signaled in a bitstream.

(13) The video decoder according to feature (12), wherein the number of IBC prediction candidates is signaled using a truncated unary code, and the maximum value of the truncated unary code is the number of IBC prediction candidates minus 1.

(14) The video decoder according to any of features (8) - (13), wherein the index associated with selecting IBC block vector prediction is not signaled when the number of IBC prediction candidates is equal to 1.

(15) A non-transitory computer readable medium having stored thereon instructions that, when executed by a processor in a video decoder, cause the processor to perform a method comprising: receiving an encoded video bitstream comprising a current picture; determining whether a current block included in the current picture is encoded in an Intra Block Copy (IBC) mode; determining a number of IBC prediction candidates associated with the current block in response to the current block being encoded in IBC mode; constructing an IBC prediction candidate list having a size corresponding to the number of IBC prediction candidates; selecting a block vector prediction from the IBC prediction candidate list; decoding a block vector associated with the current block using the block vector prediction; and decoding the current block according to the block vector.

(16) The non-transitory computer readable medium of feature (15), wherein the number of IBC prediction candidates is greater than or equal to M and less than or equal to N, wherein M is 2.

(17) The non-transitory computer readable medium according to feature (16), wherein N is 5.

(18) The non-transitory computer readable medium of feature (16), wherein the number of IBC prediction candidates is equal to max (M, min (MaxNumMergeCand, N)), wherein MaxNumMergeCand is equal to the number of candidates in the merge mode list.

(19) The non-transitory computer readable medium according to feature (15), wherein the number of IBC prediction candidates is signaled in the bitstream.

(20) The non-transitory computer readable medium according to feature (19), wherein the number of IBC prediction candidates is signaled using a truncated unary code, and the maximum value of the truncated unary code is the number of IBC prediction candidates minus 1.
