Method and apparatus for small sub-block affine inter prediction

Document number: 98612    Publication date: 2021-10-12

Reading note: This technology, "Method and apparatus for small sub-block affine inter prediction", was created by Li Guichun, Li Xiang, Xu Xiaozhong, and Liu Shan on 2020-03-13. Its main content is as follows: Aspects of the present application provide methods and apparatus for video decoding. In some examples, an apparatus for video decoding includes a processing circuit. The processing circuit is used for decoding the prediction information of a block in a current picture in an encoded video code stream. The prediction information indicates an affine model in the inter prediction mode. The processing circuit is configured to determine motion vectors for control points of the block based on the affine model. The processing circuit is configured to determine a motion vector for a sub-block of the block based on the determined motion vectors for the control points, one of a width and a height of the sub-block being less than 4. Furthermore, the processing circuit is configured to reconstruct at least one sample of the sub-block based on the determined motion vector.

1. A method of video decoding in a decoder, comprising:

decoding prediction information of a block in a current image in an encoded video code stream, wherein the prediction information indicates an affine model in an inter-frame prediction mode;

determining a motion vector of a control point of the block according to the affine model;

determining a motion vector of a sub-block of the block according to the determined motion vector of the control point, one of a width and a height of the sub-block being smaller than 4; and

reconstructing at least one sample of the sub-block according to the determined motion vector.

2. The method of claim 1, further comprising:

determining a motion vector of a 4x4 block in the block, the 4x4 block comprising the sub-block, according to the determined motion vector of the control point of the block.

3. The method of claim 2, further comprising:

storing the determined motion vector of the 4x4 block in a memory.

4. The method of claim 1, further comprising:

storing the determined motion vector of the sub-block in a memory when the sub-block is at an upper left region of a 4x4 block in the block.

5. The method of claim 1, further comprising:

storing the determined motion vector of the sub-block in a memory when the sub-block is at a lower right region of a 4x4 block in the block.

6. The method of claim 1, further comprising:

storing the determined motion vector of the sub-block in a memory when the sub-block includes a center sample of a 4x4 block in the block.

7. The method of claim 1, wherein:

when the sub-block is a chroma block, the chroma block has a fixed size regardless of the size of the corresponding luma block.

8. The method of claim 1, wherein:

the inter prediction mode is uni-directional prediction.

9. The method of claim 1, wherein determining the motion vector of the sub-block comprises:

determining the motion vector of the sub-block of the block according to a motion vector of the control point when the inter prediction mode is uni-directional prediction and a memory bandwidth of motion compensation of an 8 x 8 block including the sub-block is less than or equal to a threshold.

10. The method of claim 1, further comprising:

receiving an indicator indicating whether a small sub-block affine mode is used for the current image or a group of tiles in the current image; and

performing the steps of decoding, determining the motion vector of the control point, determining the motion vector of the sub-block, and reconstructing when the received indicator indicates that the small sub-block affine mode is used.

11. An apparatus, comprising:

a processing circuit configured to:

decode prediction information of a block in a current image in an encoded video code stream, wherein the prediction information indicates an affine model in an inter-frame prediction mode;

determine a motion vector of a control point of the block according to the affine model;

determine a motion vector of a sub-block of the block according to the determined motion vector of the control point, one of a width and a height of the sub-block being smaller than 4; and

reconstruct at least one sample of the sub-block according to the determined motion vector.

12. The apparatus of claim 11, wherein the processing circuit is further configured to: determine a motion vector of a 4x4 block in the block, the 4x4 block comprising the sub-block, according to the determined motion vector of the control point of the block.

13. The apparatus of claim 12, wherein the processing circuit is further configured to: store the determined motion vector of the 4x4 block in a memory.

14. The apparatus of claim 11, wherein the processing circuit is further configured to: store the determined motion vector of the sub-block in a memory when the sub-block is in an upper left region of a 4x4 block in the block.

15. The apparatus of claim 11, wherein the processing circuit is further configured to: store the determined motion vector of the sub-block in a memory when the sub-block is in a lower right region of a 4x4 block in the block.

16. The apparatus of claim 11, wherein the processing circuit is further configured to: store the determined motion vector of the sub-block in a memory when the sub-block includes a center sample of a 4x4 block in the block.

17. The apparatus of claim 11, wherein when the sub-block is a chroma block, the chroma block has a fixed size regardless of a size of a corresponding luma block.

18. The apparatus of claim 11, wherein the processing circuit is further configured to:

determining the motion vector of the sub-block of the block according to a motion vector of the control point when the inter prediction mode is uni-directional prediction and a memory bandwidth of motion compensation of an 8 x 8 block including the sub-block is less than or equal to a threshold.

19. The apparatus of claim 11, wherein the processing circuit is further configured to:

receiving an indicator indicating whether a small sub-block affine mode is used for the current image or for a tile group in the current image; and

performing the steps of decoding, determining the motion vector of the control point, determining the motion vector of the sub-block, and reconstructing when the received indicator indicates that the small sub-block affine mode is used.

20. A non-transitory computer-readable medium storing instructions that, when executed by a computer for video decoding, cause the computer to perform:

decoding prediction information of a block in a current image in an encoded video code stream, wherein the prediction information indicates an affine model in an inter-frame prediction mode;

determining a motion vector of a control point of the block according to the affine model;

determining a motion vector of a sub-block of the block according to the determined motion vector of the control point, one of a width and a height of the sub-block being smaller than 4; and

reconstructing at least one sample of the sub-block according to the determined motion vector.

Technical Field

The embodiment of the application relates to the field of video coding and decoding.

Background

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

Currently, video encoding and decoding may be performed using inter-picture prediction in conjunction with motion compensation. Uncompressed digital video typically comprises a series of pictures, each picture having, for example, luma samples and associated chroma samples at a resolution of 1920 × 1080. The series of pictures may have a fixed or variable picture rate (also referred to as frame rate) of, for example, 60 pictures per second or 60 Hz. Thus, uncompressed video has significant bit rate requirements. For example, 1080p60 4:2:0 video (1920 × 1080 luma sample resolution at a 60 Hz frame rate) with 8 bits per sample requires a bandwidth of approximately 1.5 Gbit/s. One hour of such video requires more than 600 GB of storage space.
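
The bandwidth figure above can be checked with a short calculation. The following Python sketch assumes 4:2:0 chroma subsampling (two chroma planes, each at a quarter of the luma resolution) and 8 bits per sample; it only illustrates the arithmetic and is not part of any codec.

    # Uncompressed 1080p60 4:2:0 video, 8 bits per sample.
    luma_samples = 1920 * 1080                  # luma samples per picture
    chroma_samples = 2 * (960 * 540)            # two chroma planes at quarter resolution
    bits_per_picture = (luma_samples + chroma_samples) * 8
    bits_per_second = bits_per_picture * 60     # 60 pictures per second
    print(bits_per_second / 1e9)                # ~1.49 Gbit/s
    print(bits_per_second * 3600 / 8 / 1e9)     # ~672 GB for one hour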

One purpose of video encoding and decoding may be to reduce redundancy in the input video signal through compression. Compression helps to reduce the bandwidth or storage space requirements described above, in some cases by two orders of magnitude or more. In general, lossless compression, lossy compression, and combinations thereof may be used. Lossless compression refers to techniques by which an exact copy of the original signal can be reconstructed from the compressed signal. When lossy compression is used, the reconstructed signal may differ from the original signal, but the distortion between the original signal and the reconstructed signal is small enough that the reconstructed signal is useful for the intended application. Lossy compression is widely used in the video domain. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television distribution applications. The achievable compression ratio generally reflects that the higher the allowable/tolerable distortion, the higher the compression ratio that can be achieved.

Motion compensation may be a lossy compression technique and may relate to the following technique: a block of sample data from a previously reconstructed picture or part thereof (a reference picture), after being spatially shifted in a direction indicated by a motion vector (MV), is used to predict a newly reconstructed picture or picture part. In some cases, the reference picture may be the same as the picture currently being reconstructed. An MV may have two dimensions, X and Y, or three dimensions, the third dimension indicating the reference picture in use (indirectly, the third dimension may also be a temporal dimension).

In some video compression techniques, an MV applicable to a certain region of sample data may be predicted from other MVs, for example from MVs related to another region of sample data that spatially neighbors the region being reconstructed and precedes that MV in decoding order. Doing so can substantially reduce the amount of data required to encode the MVs, thereby removing redundancy and improving compression. MV prediction is effective, for example, because when encoding an input video signal from a camera (known as natural video), there is a statistical likelihood that regions larger than the region to which a single MV applies move in a similar direction; therefore, in some cases, prediction can use a similar MV derived from the MVs of neighboring regions. As a result, the MV determined for a given region is similar or identical to the MV predicted from the surrounding MVs, and after entropy coding it can be represented with fewer bits than would be needed if the MV were coded directly. In some cases, MV prediction can be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself may be lossy, for example because of rounding errors when computing a predictor from several surrounding MVs.
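
As a toy illustration of the idea above (not the H.265 procedure), the following sketch predicts an MV from surrounding MVs with a simple component-wise median and codes only the small residual; the predictor rule is an assumption made for illustration.

    # Illustrative only: predict an MV from neighboring MVs and code the difference.
    def predict_mv(neighbor_mvs):
        # Component-wise median predictor; real codecs define their own derivation rules.
        xs = sorted(mv[0] for mv in neighbor_mvs)
        ys = sorted(mv[1] for mv in neighbor_mvs)
        mid = len(neighbor_mvs) // 2
        return (xs[mid], ys[mid])

    neighbor_mvs = [(4, 1), (5, 0), (4, 2)]       # MVs of surrounding regions
    actual_mv = (5, 1)
    predictor = predict_mv(neighbor_mvs)
    mvd = (actual_mv[0] - predictor[0], actual_mv[1] - predictor[1])
    print(predictor, mvd)                         # small difference -> fewer bits after entropy coding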

Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Recommendation H.265, "High Efficiency Video Coding", December 2016). Among the various MV prediction mechanisms provided by H.265, described herein is a technique referred to hereinafter as "spatial merging".

Referring to fig. 1, a current block (101) includes samples that have been found by an encoder during a motion search process to be predictable from a previous block of the same size that has been spatially shifted. Instead of coding the MV directly, the MV may be derived from metadata associated with one or more reference pictures, for example from the most recent (in decoding order) reference picture, using the MV associated with any of the five surrounding samples denoted A0, A1 and B0, B1, B2 (102 to 106, respectively). In H.265, MV prediction can use predictors from the same reference picture that the neighboring blocks are using.
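
For orientation only, the sketch below lists the customary positions of the five spatial candidates A0, A1, B0, B1, B2 relative to a current block whose top-left sample is at (x, y) with width w and height h; the exact coordinates here are an assumption for illustration, and fig. 1 governs.

    def spatial_merge_candidate_positions(x, y, w, h):
        # Customary layout of the five spatial merge candidate positions (see fig. 1).
        return {
            'A0': (x - 1, y + h),        # below the bottom-left corner
            'A1': (x - 1, y + h - 1),    # left of the bottom-left sample
            'B0': (x + w, y - 1),        # right of the top-right corner
            'B1': (x + w - 1, y - 1),    # above the top-right sample
            'B2': (x - 1, y - 1),        # above-left corner
        }

    print(spatial_merge_candidate_positions(64, 64, 16, 16))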

Disclosure of Invention

Aspects of the present application provide methods and apparatuses for video encoding/decoding. In some examples, an apparatus for video decoding includes a processing circuit. In one embodiment, prediction information for a block in a current picture is decoded from an encoded video bitstream. The prediction information indicates an affine model in the inter prediction mode. Motion vectors for control points of the block are determined from the affine model. And determining the motion vector of the sub-block of the block according to the determined motion vector of the control point. One of the width and height of the sub-block is less than 4. Furthermore, at least one sample of the sub-block is reconstructed from the determined motion vector.
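
A minimal sketch of the derivation summarized above, assuming a 4-parameter affine model with control-point motion vectors at the top-left and top-right corners and sub-block MVs evaluated at sub-block centers; the 2x4 sub-block size (one dimension smaller than 4), the floating-point arithmetic, and the absence of rounding and clipping are simplifications for illustration.

    def affine_subblock_mv(cpmv0, cpmv1, block_w, cx, cy):
        # 4-parameter affine model: cpmv0 at the top-left corner, cpmv1 at the
        # top-right corner of a block of width block_w; (cx, cy) is a position
        # (e.g. a sub-block center) relative to the top-left corner.
        ax = (cpmv1[0] - cpmv0[0]) / block_w
        ay = (cpmv1[1] - cpmv0[1]) / block_w
        return (cpmv0[0] + ax * cx - ay * cy,
                cpmv0[1] + ay * cx + ax * cy)

    block_w, block_h = 16, 16
    sub_w, sub_h = 2, 4                            # one of width/height is smaller than 4
    cpmv0, cpmv1 = (8.0, 2.0), (10.0, 4.0)         # example control-point MVs
    subblock_mvs = {}
    for sy in range(0, block_h, sub_h):
        for sx in range(0, block_w, sub_w):
            subblock_mvs[(sx, sy)] = affine_subblock_mv(
                cpmv0, cpmv1, block_w, sx + sub_w / 2, sy + sub_h / 2)
    print(subblock_mvs[(0, 0)], subblock_mvs[(14, 12)])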

In one embodiment, the motion vector of the 4x4 block of the block, which 4x4 block includes the sub-block, is determined according to the determined motion vector of the control point of the block.

In one embodiment, the determined motion vectors for the 4x4 block are stored in memory.

In one embodiment, the determined motion vector of the sub-block is stored in the memory when the sub-block is in the upper left region of a 4x4 block in the block.

In one embodiment, the determined motion vector of the sub-block is stored in the memory when the sub-block is in the lower right region of a 4x4 block in the block.

In one embodiment, the determined motion vector of the sub-block is stored in the memory when the sub-block comprises the center sample of a 4x4 block in the block.
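
The three storage embodiments above can be summarized in a small sketch: for each 4x4 grid unit, one representative sub-block MV is written to the MV buffer, chosen from the upper-left sub-block, the lower-right sub-block, or the sub-block covering the center sample. The function and rule names below are assumptions for illustration.

    def representative_mv(subblock_mvs, sub_w, sub_h, rule='center'):
        # subblock_mvs[(sx, sy)] holds the MV of the sub-block whose top-left corner
        # is (sx, sy) inside one 4x4 grid unit; exactly one MV per unit is stored.
        if rule == 'upper_left':
            key = (0, 0)
        elif rule == 'lower_right':
            key = (4 - sub_w, 4 - sub_h)
        else:  # 'center': the sub-block containing the center sample of the 4x4 unit
            key = ((2 // sub_w) * sub_w, (2 // sub_h) * sub_h)
        return subblock_mvs[key]

    mvs = {(0, 0): (1.0, 0.5), (2, 0): (1.2, 0.6)}       # two 2x4 sub-blocks in one 4x4 unit
    print(representative_mv(mvs, 2, 4, rule='center'))   # -> (1.2, 0.6)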

In one embodiment, when the sub-block is a chroma block, the chroma block has a fixed size regardless of the size of the corresponding luma block.

In one embodiment, the inter prediction mode is uni-directional prediction.

In one embodiment, when the inter prediction mode is uni-directional prediction and a memory bandwidth of motion compensation of an 8 × 8 block including sub-blocks is less than or equal to a threshold, a motion vector of a sub-block of the block is determined according to a motion vector of a control point.
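
A rough sketch of the kind of bandwidth condition described above, assuming that motion compensation of a WxH block with a fractional-pel MV and an N-tap interpolation filter reads a (W+N-1) x (H+N-1) reference region; the 8-tap filter length and the threshold value are assumptions chosen for illustration, not values mandated by this description.

    def mc_reference_samples(w, h, taps=8, bi_prediction=False):
        # Reference samples fetched to motion-compensate one w x h block;
        # doubled for bi-prediction (two reference blocks are fetched).
        samples = (w + taps - 1) * (h + taps - 1)
        return 2 * samples if bi_prediction else samples

    # Per-sample memory bandwidth for an 8x8 block with uni-directional prediction.
    bandwidth = mc_reference_samples(8, 8) / (8 * 8)   # 15*15/64, about 3.52 samples fetched per output sample
    threshold = 225 / 64                               # example threshold, assumed for illustration
    use_small_subblock_affine = bandwidth <= threshold
    print(bandwidth, use_small_subblock_affine)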

In one embodiment, an indicator is received that indicates whether a small sub-block affine mode is used for the current picture or for a group of tiles in the current picture. Further, when the received indicator indicates that the small sub-block affine mode is used, the above steps of decoding, determining the motion vector of the control point, determining the motion vector of the sub-block, and reconstructing are performed.

Aspects of the present application also provide a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform any one or combination of video decoding methods.

Drawings

Further features, the nature, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings. In the drawings:

fig. 1 is a diagram of a current block and its surrounding spatial merging candidates according to an embodiment.

Fig. 2 is a simplified schematic block diagram of a communication system (200) of an embodiment.

Fig. 3 is a simplified schematic block diagram of a communication system (300) of an embodiment.

FIG. 4 is a simplified schematic block diagram of a decoder of an embodiment.

FIG. 5 is a simplified schematic block diagram of an encoder of an embodiment.

Fig. 6 is a schematic block diagram of an encoder of another embodiment.

Fig. 7 is a schematic block diagram of a decoder of another embodiment.

Fig. 8 is a diagram illustrating the positions of spatial merge candidates according to an embodiment.

FIG. 9 is a schematic diagram of candidate pairs in an extended merge list used by the redundancy check process of one embodiment.

Fig. 10 is a diagram of deriving temporal merging candidates from an extended merge list in a current picture according to an embodiment.

FIG. 11 is a diagram of candidate locations from which temporal merge candidates in an extended merge list may be selected, according to one embodiment.

Fig. 12 is a diagram of predicted positions from which a prediction in a merge with motion vector difference (MMVD) mode can be selected, according to an embodiment.

Fig. 13 is a diagram of two Control Point Motion Vectors (CPMVs) representing a 4-parameter affine model according to an embodiment.

Fig. 14 shows three CPMVs representing a 6-parameter affine model.

Fig. 15 illustrates motion vectors obtained for sub-blocks divided in a current block encoded in an affine prediction mode.

Fig. 16 illustrates neighboring blocks of a current block for deriving inherited affine merge candidates.

Fig. 17 shows candidate block positions for deriving a constructed affine merge candidate.

Fig. 18 is a diagram of one embodiment of spatial neighboring blocks that may be used to determine predicted motion information for a current block using a sub-block based temporal motion vector prediction (SbTMVP) method.

Fig. 19 is a diagram of spatial neighboring blocks selected for the SbTMVP method, according to an embodiment.

FIG. 20 illustrates an example of partitioning a coding unit into two triangular prediction units of one embodiment.

Fig. 21 is a diagram of spatial neighboring blocks and temporal neighboring blocks used to create a list of uni-directional prediction candidates for the triangular prediction mode, according to one embodiment.

FIG. 22 is a diagram of a lookup table used to derive the partition direction and partition motion information based on a triangle partition index, according to an embodiment.

FIG. 23 illustrates weighting factors used for coding units in an adaptive blending process for one embodiment.

FIG. 24 is a schematic diagram of an interleaved affine prediction process of an embodiment.

FIG. 25 illustrates a pattern of weights of a weighted average operation in an interleaved affine prediction process of one embodiment.

FIG. 26 is a diagram of small sub-block affine inter prediction, according to an embodiment.

FIG. 27 is a schematic diagram of a decoding process of an embodiment.

FIG. 28 is a schematic diagram of a computer system of an embodiment.

Detailed Description

I. Encoder and decoder for video encoding and decoding

Fig. 2 is a simplified block diagram of a communication system (200) according to an embodiment disclosed herein. The communication system (200) includes a plurality of terminal devices that can communicate with each other through, for example, a network (250). For example, a communication system (200) includes a first terminal device (210) and a second terminal device (220) interconnected by a network (250). In the embodiment of fig. 2, the first terminal device (210) and the second terminal device (220) perform unidirectional data transmission. For example, a first terminal device (210) may encode video data, such as a stream of video images captured by the terminal device (210), for transmission over a network (250) to a second terminal device (220). The encoded video data is transmitted in the form of one or more encoded video streams. The second terminal device (220) may receive the encoded video data from the network (250), decode the encoded video data to recover the video data, and display a video image according to the recovered video data. Unidirectional data transmission is common in applications such as media services.

In another embodiment, a communication system (200) includes a third terminal device (230) and a fourth terminal device (240) that perform bidirectional transmission of encoded video data, which may occur, for example, during a video conference. For bi-directional data transmission, each of the third terminal device (230) and the fourth terminal device (240) may encode video data (e.g., a stream of video images captured by the terminal device) for transmission over the network (250) to the other of the third terminal device (230) and the fourth terminal device (240). Each of the third terminal device (230) and the fourth terminal device (240) may also receive encoded video data transmitted by the other of the third terminal device (230) and the fourth terminal device (240), and may decode the encoded video data to recover the video data, and may display a video image on an accessible display device according to the recovered video data.

In the embodiment of fig. 2, the first terminal device (210), the second terminal device (220), the third terminal device (230), and the fourth terminal device (240) may be, for example, servers, personal computers, and smart phones, but the principles disclosed herein are not limited thereto. Embodiments disclosed herein are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. Network (250) represents any number of networks that communicate encoded video data between first terminal device (210), second terminal device (220), third terminal device (230), and fourth terminal device (240), including, for example, wired and/or wireless communication networks. The communication network (250) may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For purposes of this application, the architecture and topology of the network (250) may be immaterial to the operation disclosed herein, unless explained below.

By way of example, fig. 3 illustrates the placement of a video encoder and a video decoder in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV, storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.

The streaming system may include an acquisition subsystem (313), which may include a video source (301), such as a digital camera, that creates an uncompressed video image stream (302). In an embodiment, the video image stream (302) includes samples taken by a digital camera. The video image stream (302) is depicted as a thick line to emphasize its high data volume compared to the encoded video data (304) (or encoded video code stream); the video image stream (302) can be processed by an electronic device (320), the electronic device (320) comprising a video encoder (303) coupled to a video source (301). The video encoder (303) may comprise hardware, software, or a combination of hardware and software to implement or embody aspects of the disclosed subject matter as described in more detail below. The encoded video data (304) (or encoded video stream (304)) is depicted as a thin line compared to the video image stream (302) to emphasize its lower data volume, and may be stored on the streaming server (305) for future use. One or more streaming client subsystems, such as client subsystem (306) and client subsystem (308) in fig. 3, may access streaming server (305) to retrieve copies (307) and copies (309) of encoded video data (304). The client subsystem (306) may include, for example, a video decoder (310) in an electronic device (330). A video decoder (310) decodes incoming copies (307) of the encoded video data and generates an output video image stream (311) that may be presented on a display (312), such as a display screen, or another presentation device (not depicted). In some streaming systems, encoded video data (304), video data (307), and video data (309) (e.g., video streams) may be encoded according to certain video encoding/compression standards. Examples of such standards include ITU-T H.265. In an embodiment, the video coding standard under development is informally referred to as Versatile Video Coding (VVC), and the present application may be used in the context of the VVC standard.

It should be noted that electronic device (320) and electronic device (330) may include other components (not shown). For example, the electronic device (320) may include a video decoder (not shown), and the electronic device (330) may also include a video encoder (not shown).

Fig. 4 is a block diagram of a video decoder (410) according to an embodiment of the present disclosure. The video decoder (410) may be disposed in an electronic device (430). The electronic device (430) may include a receiver (431) (e.g., a receive circuit). The video decoder (410) may be used in place of the video decoder (310) in the fig. 3 embodiment.

The receiver (431) may receive one or more encoded video sequences to be decoded by the video decoder (410); in the same or another embodiment, the encoded video sequences are received one at a time, wherein each encoded video sequence is decoded independently of the other encoded video sequences. The encoded video sequence may be received from a channel (401), which may be a hardware/software link to a storage device that stores encoded video data. The receiver (431) may receive encoded video data as well as other data, e.g., encoded audio data and/or auxiliary data streams, which may be forwarded to their respective usage entities (not labeled). The receiver (431) may separate the encoded video sequence from other data. To prevent network jitter, a buffer memory (415) may be coupled between the receiver (431) and the entropy decoder/parser (420) (hereinafter "parser (420)"). In some applications, the buffer memory (415) is part of the video decoder (410). In other cases, the buffer memory (415) may be disposed external (not labeled) to the video decoder (410). In still other cases, a buffer memory (not labeled) may be provided external to the video decoder (410), e.g., to prevent network jitter, and another buffer memory (415) may be configured internal to the video decoder (410), e.g., to handle playout timing. The buffer memory (415) may not be required to be configured or may be made smaller when the receiver (431) receives data from a store/forward device with sufficient bandwidth and controllability or from an isochronous network. For use over best-effort packet networks such as the Internet, a buffer memory (415) may also be required, which may be relatively large and may be of an adaptive size, and may be implemented at least partially in an operating system or similar element (not labeled) external to the video decoder (410).

The video decoder (410) may include a parser (420) to reconstruct symbols (421) from the encoded video sequence. The categories of these symbols include information for managing the operation of the video decoder (410), as well as potential information to control a display device, such as a display screen (412), that is not an integral part of the electronic device (430), but may be coupled to the electronic device (430), as shown in fig. 4. The control Information for the display device may be a parameter set fragment (not shown) of Supplemental Enhancement Information (SEI message) or Video Usability Information (VUI). The parser (420) may parse/entropy decode the received encoded video sequence. Encoding of the encoded video sequence may be performed in accordance with video coding techniques or standards and may follow various principles, including variable length coding, Huffman coding, arithmetic coding with or without contextual sensitivity, and so forth. A parser (420) may extract a subgroup parameter set for at least one of the subgroups of pixels in the video decoder from the encoded video sequence based on at least one parameter corresponding to the group. A subgroup may include a Group of Pictures (GOP), a picture, a tile, a slice, a macroblock, a Coding Unit (CU), a block, a Transform Unit (TU), a Prediction Unit (PU), and so on. The parser (420) may also extract information from the encoded video sequence, such as transform coefficients, quantizer parameter values, motion vectors, and so on.

The parser (420) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (415), thereby creating symbols (421).

The reconstruction of the symbol (421) may involve a number of different units depending on the type of the encoded video image or a portion of the encoded video image (e.g., inter and intra images, inter and intra blocks), among other factors. Which units are involved and the way in which they are involved can be controlled by subgroup control information parsed from the coded video sequence by a parser (420). For the sake of brevity, such a subgroup control information flow between parser (420) and a plurality of units below is not described.

In addition to the functional blocks already mentioned, the video decoder (410) may be conceptually subdivided into several functional units as described below. In a practical embodiment operating under business constraints, many of these units interact closely with each other and may be integrated with each other. However, for the purposes of describing the disclosed subject matter, a conceptual subdivision into the following functional units is appropriate.

The first unit is a scaler/inverse transform unit (451). The scaler/inverse transform unit (451) receives the quantized transform coefficients as symbols (421) from the parser (420) along with control information including which transform scheme to use, block size, quantization factor, quantization scaling matrix, etc. The scaler/inverse transform unit (451) may output a block comprising sample values, which may be input into the aggregator (455).

In some cases, the output samples of the scaler/inverse transform unit (451) may belong to an intra-coded block; namely: a block that does not use predictive information from previously reconstructed pictures, but may use predictive information from previously reconstructed portions of the current picture. Such predictive information may be provided by the intra picture prediction unit (452). In some cases, the intra picture prediction unit (452) uses surrounding, already reconstructed information fetched from the current picture buffer (458) to generate a block of the same size and shape as the block being reconstructed. For example, the current picture buffer (458) buffers a partially reconstructed current picture and/or a fully reconstructed current picture. In some cases, the aggregator (455) adds, on a per-sample basis, the prediction information generated by the intra prediction unit (452) to the output sample information provided by the scaler/inverse transform unit (451).

In other cases, the output samples of the scaler/inverse transform unit (451) may belong to inter-coded and potentially motion compensated blocks. In this case, motion compensated prediction unit (453) may access reference picture store (457) to extract samples for prediction. After motion compensating the extracted samples according to the symbols (421), the samples may be added to the output of the scaler/inverse transform unit (451), in this case referred to as residual samples or residual signals, by an aggregator (455), thereby generating output sample information. The fetching of prediction samples by motion compensated prediction unit (453) from addresses within reference picture memory (457) may be controlled by motion vectors, and the motion vectors are used by motion compensated prediction unit (453) in the form of the symbol (421), the symbol (421) comprising, for example, X, Y, and a reference picture component. Motion compensation may also include interpolation of sample values fetched from the reference picture store (457), motion vector prediction mechanisms, etc., when sub-sample exact motion vectors are in use.
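
As an illustration of the fetch-and-add described above (ignoring the sub-sample interpolation filters), the sketch below motion-compensates a block from a reference picture with an integer-pel MV and adds the residual produced by the scaler/inverse transform unit; array shapes and coordinates are assumptions for illustration.

    import numpy as np

    def motion_compensate(reference, x, y, w, h, mv):
        # Integer-pel fetch from the reference picture; a real decoder interpolates
        # when the MV has a fractional part.
        rx, ry = x + mv[0], y + mv[1]
        return reference[ry:ry + h, rx:rx + w]

    reference = np.arange(64 * 64, dtype=np.int32).reshape(64, 64)   # stand-in reference picture
    residual = np.zeros((8, 8), dtype=np.int32)                      # output of the scaler/inverse transform
    prediction = motion_compensate(reference, x=16, y=16, w=8, h=8, mv=(3, -2))
    reconstructed = prediction + residual                            # what the aggregator (455) would output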

The output samples of the aggregator (455) may be employed in a loop filter unit (456) by various loop filtering techniques. The video compression techniques may include in-loop filter techniques that are controlled by parameters included in the encoded video sequence (also referred to as an encoded video bitstream) and that are available to the loop filter unit (456) as symbols (421) from the parser (420). However, in other embodiments, the video compression techniques may also be responsive to meta-information obtained during decoding of previous (in decoding order) portions of the encoded image or encoded video sequence, as well as to sample values previously reconstructed and loop filtered.

The output of the loop filter unit (456) may be a sample stream that may be output to a display device (412) and stored in a reference picture store (457) for subsequent inter picture prediction.

Once fully reconstructed, some of the coded pictures may be used as reference pictures for future prediction. For example, once the encoded picture corresponding to the current picture is fully reconstructed and the encoded picture is identified (by, e.g., parser (420)) as a reference picture, the current picture buffer (458) may become part of the reference picture memory (457) and a new current picture buffer may be reallocated before starting reconstruction of a subsequent encoded picture.

The video decoder (410) may perform decoding operations according to predetermined video compression techniques, such as in the ITU-T H.265 standard. The encoded video sequence may conform to the syntax specified by the video compression technique or standard used, in the sense that the encoded video sequence follows both the syntax of the video compression technique or standard and the profiles documented therein. In particular, a profile may select certain tools from all tools available in the video compression technique or standard as the only tools available under that profile. For compliance, the complexity of the encoded video sequence is also required to be within the bounds defined by the level of the video compression technique or standard. In some cases, levels restrict the maximum picture size, the maximum frame rate, the maximum reconstruction sample rate (measured in units of, e.g., megasamples per second), the maximum reference picture size, and so forth. In some cases, the limits set by levels may be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the encoded video sequence.

In an embodiment, receiver (431) may receive additional (redundant) data along with the encoded video. The additional data may be part of an encoded video sequence. The additional data may be used by the video decoder (410) to properly decode the data and/or more accurately reconstruct the original video data. The additional data may be in the form of, for example, a temporal, spatial, or signal-to-noise ratio (SNR) enhancement layer, a redundant slice, a redundant picture, a forward error correction code, and so forth.

Fig. 5 is a block diagram of a video encoder (503) according to an embodiment of the present disclosure. The video encoder (503) is disposed in the electronic device (520). The electronic device (520) includes a transmitter (540) (e.g., a transmission circuit). The video encoder (503) may be used in place of the video encoder (303) in the fig. 3 embodiment.

Video encoder (503) may receive video samples from a video source (501) (not part of electronics (520) in the fig. 5 embodiment) that may capture video images to be encoded by video encoder (503). In another embodiment, the video source (501) is part of the electronic device (520).

The video source (501) may provide a source video sequence in the form of a stream of digital video samples to be encoded by the video encoder (503), which may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, etc.), any color space (e.g., BT.601 Y CrCb, RGB, etc.), and any suitable sampling structure (e.g., Y CrCb 4:2:0, Y CrCb 4:4:4). In a media serving system, the video source (501) may be a storage device that stores previously prepared video. In a video conferencing system, the video source (501) may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual images that impart motion when viewed in sequence. The image itself may be constructed as an array of spatial pixels, where each pixel may comprise one or more samples, depending on the sampling structure, color space, etc. used. The relationship between pixels and samples can be readily understood by those skilled in the art. The following text focuses on describing the samples.

According to an embodiment, the video encoder (503) may encode and compress images of a source video sequence into an encoded video sequence (543) in real time or under any other temporal constraints required by the application. Enforcing an appropriate encoding speed is one function of the controller (550). In some embodiments, the controller (550) controls and is functionally coupled to other functional units as described below. For simplicity, the couplings are not labeled in the figures. The parameters set by the controller (550) may include rate control related parameters (picture skip, quantizer, lambda value of rate distortion optimization technique, etc.), picture size, group of pictures (GOP) layout, maximum motion vector search range, etc. The controller (550) may be configured to have other suitable functions relating to the video encoder (503) optimized for a certain system design.

In some embodiments, the video encoder (503) operates in an encoding loop. As a brief description, in an embodiment, an encoding loop may include a source encoder (530) (e.g., responsible for creating symbols, e.g., a stream of symbols, based on the input picture to be encoded and reference pictures) and a (local) decoder (533) embedded in the video encoder (503). The decoder (533) reconstructs the symbols to create sample data in a similar manner as the (remote) decoder creates the sample data (since any compression between the symbols and the encoded video bitstream is lossless in the video compression techniques considered in this application). The reconstructed sample stream (sample data) is input to a reference picture memory (534). Since the decoding of the symbol stream produces bit accurate results independent of decoder location (local or remote), the contents of the reference picture store (534) also correspond bit-wise accurately between the local encoder and the remote encoder. In other words, the reference picture samples that the prediction portion of the encoder "sees" are exactly the same as the sample values that the decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the drift that results if synchronicity cannot be maintained, e.g., due to channel errors) is also used in some related techniques.

The operation of "local" decoder (533) may be the same as a "remote" decoder, such as video decoder (410) that has been described in detail above in connection with fig. 4. However, referring briefly also to fig. 4, when symbols are available and the entropy encoder (545) and parser (420) are able to losslessly encode/decode the symbols into an encoded video sequence, the entropy decoding portion of the video decoder (410), including the buffer memory (415) and parser (420), may not be fully implemented in the local decoder (533).

At this point it can be observed that any decoder technique other than the parsing/entropy decoding present in the decoder must also be present in the corresponding encoder in substantially the same functional form. For this reason, the present application focuses on decoder operation. The description of the encoder techniques may be simplified because the encoder techniques are reciprocal to the fully described decoder techniques. A more detailed description is only needed in certain areas and is provided below.

During operation, in some embodiments, the source encoder (530) may perform motion compensated predictive coding. The motion compensated predictive coding predictively codes an input picture with reference to one or more previously coded pictures from the video sequence, indicated as "reference pictures". In this way, the encoding engine (532) encodes differences between blocks of pixels of an input image and blocks of pixels of a reference image, which may be selected as a prediction reference for the input image.

The local video decoder (533) may decode encoded video data for a picture, which may be indicated as a reference picture, based on the symbols created by the source encoder (530). The operation of the encoding engine (532) may be a lossy process. When the encoded video data can be decoded at a video decoder (not shown in fig. 5), the reconstructed video sequence may typically be a copy of the source video sequence with some errors. The local video decoder (533) replicates the decoding process, which may be performed on the reference pictures by the video decoder, and may cause the reconstructed reference pictures to be stored in the reference picture cache (534). In this way, the video encoder (503) may locally store a copy of the reconstructed reference picture that has common content (no transmission errors) with the reconstructed reference picture to be obtained by the remote video decoder.

The predictor (535) may perform a prediction search against the coding engine (532). That is, for a new image to be encoded, the predictor (535) may search the reference picture memory (534) for sample data (as candidate reference pixel blocks) or some metadata, such as reference picture motion vectors, block shapes, etc., that may be referenced as appropriate predictions for the new image. The predictor (535) may operate on a block-by-block basis of samples to find a suitable prediction reference. In some cases, from search results obtained by predictor (535), it may be determined that the input image may have prediction references derived from multiple reference images stored in reference image memory (534).

The controller (550) may manage encoding operations of the source encoder (530), including, for example, setting parameters and subgroup parameters for encoding video data.

The outputs of all of the above functional units may be entropy encoded in an entropy encoder (545). The entropy encoder (545) losslessly compresses the symbols generated by the various functional units according to techniques such as huffman coding, variable length coding, arithmetic coding, etc., to convert the symbols into an encoded video sequence.

The transmitter (540) may buffer the encoded video sequence created by the entropy encoder (545) in preparation for transmission over a communication channel (560), which may be a hardware/software link to a storage device that will store the encoded video data. The transmitter (540) may combine the encoded video data from the video encoder (503) with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (sources not shown).

The controller (550) may manage the operation of the video encoder (503). During encoding, the controller (550) may assign to each encoded picture a certain encoded picture type, which may affect the encoding techniques applicable to the respective picture. For example, pictures can generally be assigned to any of the following picture types:

intra pictures (I pictures), which may be pictures that can be encoded and decoded without using any other picture in the sequence as a prediction source. Some video codecs allow for different types of intra pictures, including, for example, Independent Decoder Refresh ("IDR") pictures. Variations of I-pictures and their corresponding applications and features are known to those skilled in the art.

A predictive picture (P picture), which may be a picture that can be encoded and decoded using intra prediction or inter prediction that predicts sample values of each block using at most one motion vector and a reference index.

A bi-directional predictive picture (B picture), which may be a picture that can be encoded and decoded using intra prediction or inter prediction that predicts sample values of each block using at most two motion vectors and a reference index. Similarly, multiple predictive pictures may use more than two reference pictures and associated metadata for reconstructing a single block.

A source image may typically be spatially subdivided into blocks of samples (e.g., blocks of 4 × 4, 8 × 8, 4 × 8, or 16 × 16 samples) and encoded block-by-block. These blocks may be predictively coded with reference to other (coded) blocks, which are determined according to the coding allocation applied to their respective pictures. For example, a block of an I picture may be non-predictively encoded, or the block may be predictively encoded (spatial prediction or intra prediction) with reference to already encoded blocks of the same picture. The blocks of pixels of the P picture can be predictively coded by spatial prediction or by temporal prediction with reference to a previously coded reference picture. A block of a B picture may be prediction coded by spatial prediction or by temporal prediction with reference to one or two previously coded reference pictures.

The video encoder (503) may perform encoding operations according to a predetermined video encoding technique or standard, such as the ITU-T h.265 recommendation. In operation, the video encoder (503) may perform various compression operations, including predictive encoding operations that exploit temporal and spatial redundancies in the input video sequence. Thus, the encoded video data may conform to the syntax dictated by the video coding technique or standard used.

In an embodiment, the transmitter (540) may transmit the additional data while transmitting the encoded video. The source encoder (530) may treat such data as part of an encoded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, redundant pictures and slices, among other forms of redundant data, SEI messages, VUI parameter set fragments, and the like.

The captured video may be provided as a plurality of source images (video images) in a time sequence. Intra-picture prediction, often abbreviated as intra-prediction, exploits spatial correlation in a given picture, while inter-picture prediction exploits (temporal or other) correlation between pictures. In an embodiment, the particular image being encoded/decoded, referred to as the current image, is partitioned into blocks. When a block in a current picture is similar to a reference block in a reference picture that has been previously encoded in the video and is still buffered, the block in the current picture may be encoded by a vector called a motion vector. The motion vector points to a reference block in a reference picture, and in the case where multiple reference pictures are used, the motion vector may have a third dimension that identifies the reference pictures.

In some embodiments, bi-directional prediction techniques may be used in inter-picture prediction. According to bi-directional prediction techniques, two reference pictures are used, e.g., a first reference picture and a second reference picture that are both prior to the current picture in video in decoding order (but may be past and future in display order, respectively). A block in a current picture may be encoded by a first motion vector pointing to a first reference block in a first reference picture and a second motion vector pointing to a second reference block in a second reference picture. In particular, the block may be predicted by a combination of a first reference block and a second reference block.
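
A minimal sketch of combining the two reference blocks mentioned above by simple rounded averaging; actual codecs may instead apply weighted prediction with offsets and rounding rules defined by the standard.

    import numpy as np

    def bi_predict(ref_block0, ref_block1):
        # Plain rounded average of the two motion-compensated reference blocks.
        return (ref_block0.astype(np.int32) + ref_block1.astype(np.int32) + 1) >> 1

    block0 = np.full((8, 8), 100, dtype=np.int32)
    block1 = np.full((8, 8), 104, dtype=np.int32)
    print(bi_predict(block0, block1)[0, 0])   # 102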

Furthermore, merge mode techniques may be used in inter-picture prediction to improve coding efficiency.

According to some embodiments disclosed herein, prediction such as inter-image prediction and intra-image prediction is performed in units of blocks. For example, according to the HEVC standard, pictures in a sequence of video pictures are partitioned into Coding Tree Units (CTUs) for compression, the CTUs in the pictures having the same size, e.g., 64 × 64 pixels, 32 × 32 pixels, or 16 × 16 pixels. In general, a CTU includes three Coding Tree Blocks (CTBs), which are one luminance CTB and two chrominance CTBs. Further, each CTU may be further split into one or more Coding Units (CUs) in a quadtree. For example, a 64 × 64-pixel CTU may be split into one 64 × 64-pixel CU, or four 32 × 32-pixel CUs, or sixteen 16 × 16-pixel CUs. In an embodiment, each CU is analyzed to determine a prediction type for the CU, such as an inter prediction type or an intra prediction type. Furthermore, depending on temporal and/or spatial predictability, a CU is split into one or more Prediction Units (PUs). In general, each PU includes a luma Prediction Block (PB) and two chroma PBs. In an embodiment, a prediction operation in encoding (encoding/decoding) is performed in units of prediction blocks. Taking a luma prediction block as an example of a prediction block, the prediction block includes a matrix of pixel values (e.g., luma values), such as 8 × 8 pixels, 16 × 16 pixels, 8 × 16 pixels, 16 × 8 pixels, and so on.
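
A small sketch of the quadtree splitting described above: a 64 x 64 CTU is recursively split into square CUs. The split-decision callback here is arbitrary and only illustrates the counts (one 64 x 64 CU, four 32 x 32 CUs, or sixteen 16 x 16 CUs).

    def quadtree_split(x, y, size, min_size, decide_split):
        # Recursively split a square region into four quadrants while decide_split
        # returns True; returns the resulting CUs as (x, y, size) tuples.
        if size > min_size and decide_split(x, y, size):
            half = size // 2
            cus = []
            for dy in (0, half):
                for dx in (0, half):
                    cus += quadtree_split(x + dx, y + dy, half, min_size, decide_split)
            return cus
        return [(x, y, size)]

    # Splitting a 64x64 CTU once yields 4 CUs of 32x32; splitting twice yields 16 CUs of 16x16.
    print(len(quadtree_split(0, 0, 64, 16, lambda x, y, s: s > 32)))   # 4
    print(len(quadtree_split(0, 0, 64, 16, lambda x, y, s: s > 16)))   # 16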

Fig. 6 is a diagram of a video encoder (603) according to another embodiment of the present disclosure. A video encoder (603) is used to receive a processing block of sample values within a current video picture in a sequence of video pictures, such as a prediction block, and encode the processing block into an encoded picture that is part of an encoded video sequence. In this embodiment, a video encoder (603) is used in place of the video encoder (303) in the embodiment of fig. 3.

In an HEVC embodiment, a video encoder (603) receives a matrix of sample values for a processing block, e.g., a prediction block of 8 × 8 samples, etc. The video encoder (603) uses, for example, rate-distortion (RD) optimization to determine whether to encode the processing block using intra mode, inter mode, or bi-directional prediction mode. When encoding a processing block in intra mode, the video encoder (603) may use intra prediction techniques to encode the processing block into an encoded image; and when the processing block is encoded in inter mode or bi-prediction mode, the video encoder (603) may encode the processing block into the encoded picture using inter prediction or bi-prediction techniques, respectively. In some video coding techniques, the merge mode may be an inter picture prediction submode in which motion vectors are derived from one or more motion vector predictors without the aid of coded motion vector components outside the predictors. In some other video coding techniques, there may be motion vector components that are applicable to the subject block. In an embodiment, the video encoder (603) comprises other components, such as a mode decision module (not shown) for determining a processing block mode.

In the embodiment of fig. 6, the video encoder (603) comprises an inter encoder (630), an intra encoder (622), a residual calculator (623), a switch (626), a residual encoder (624), a general controller (621), and an entropy encoder (625) coupled together as shown in fig. 6.

The inter encoder (630) is used to receive samples of a current block (e.g., a processed block), compare the block to one or more reference blocks in a reference picture (e.g., blocks in previous and subsequent pictures), generate inter prediction information (e.g., redundant information descriptions, motion vectors, merge mode information according to inter coding techniques), and calculate inter prediction results (e.g., predicted blocks) using any suitable technique based on the inter prediction information. In some embodiments, the reference picture is a decoded reference picture that is decoded based on the encoded video information.

An intra encoder (622) is used to receive samples of a current block (e.g., a processing block), in some cases compare the block to a block already encoded in the same image, generate quantized coefficients after transformation, and in some cases also generate intra prediction information (e.g., intra prediction direction information according to one or more intra coding techniques). In an embodiment, the intra encoder (622) also computes intra prediction results (e.g., predicted blocks) based on the intra prediction information and reference blocks in the same picture.

The universal controller (621) is used to determine universal control data and control other components of the video encoder (603) based on the universal control data. In an embodiment, a general purpose controller (621) determines a mode of a block and provides a control signal to a switch (626) based on the mode. For example, when the mode is intra, the general purpose controller (621) controls the switch (626) to select an intra mode result for use by the residual calculator (623), and controls the entropy encoder (625) to select and add intra prediction information in the code stream; and when the mode is an inter mode, the general purpose controller (621) controls the switch (626) to select an inter prediction result for use by the residual calculator (623), and controls the entropy encoder (625) to select and add inter prediction information in the code stream.

A residual calculator (623) is used to calculate the difference (residual data) between the received block and the prediction result selected from the intra encoder (622) or the inter encoder (630). A residual encoder (624) is operative based on the residual data to encode the residual data to generate transform coefficients. In an embodiment, a residual encoder (624) is used to convert residual data from the time domain to the frequency domain and generate transform coefficients. The transform coefficients are then subjected to a quantization process to obtain quantized transform coefficients. In various embodiments, the video encoder (603) also includes a residual decoder (628). A residual decoder (628) is used to perform the inverse transform and generate decoded residual data. The decoded residual data may be suitably used by an intra encoder (622) and an inter encoder (630). For example, inter encoder (630) may generate a decoded block based on decoded residual data and inter prediction information, and intra encoder (622) may generate a decoded block based on decoded residual data and intra prediction information. The decoded blocks are processed appropriately to generate decoded pictures, and in some embodiments, the decoded pictures may be buffered in a memory circuit (not shown) and used as reference pictures.

An entropy coder (625) is used to format the code stream to include the encoded blocks. The entropy encoder (625) generates various information according to a suitable standard such as the HEVC standard. In an embodiment, the entropy encoder (625) is used to include general control data, selected prediction information (e.g., intra prediction information or inter prediction information), residual information, and other suitable information in the code stream. It should be noted that, according to the disclosed subject matter, there is no residual information when a block is encoded in the merge sub-mode of the inter mode or bi-prediction mode.

Fig. 7 is a diagram of a video decoder (710) according to another embodiment of the present disclosure. A video decoder (710) is for receiving encoded images as part of an encoded video sequence and decoding the encoded images to generate reconstructed images. In an embodiment, the video decoder (710) is used in place of the video decoder (310) in the fig. 3 embodiment.

In the fig. 7 embodiment, video decoder (710) includes an entropy decoder (771), an inter-frame decoder (780), a residual decoder (773), a reconstruction module (774), and an intra-frame decoder (772) coupled together as shown in fig. 7.

An entropy decoder (771) may be used to reconstruct, from the encoded image, certain symbols representing the syntax elements that constitute the encoded image. Such symbols may include, for example, the mode used to encode the block (e.g., intra mode, inter mode, bi-prediction mode, a merge sub-mode of the latter two, or another sub-mode), prediction information (e.g., intra prediction information or inter prediction information) that may identify certain samples or metadata for use by the intra decoder (772) or the inter decoder (780), respectively, residual information in the form of, for example, quantized transform coefficients, and so forth. In an embodiment, when the prediction mode is the inter or bi-directional prediction mode, the inter prediction information is provided to the inter decoder (780); and when the prediction type is an intra prediction type, the intra prediction information is provided to the intra decoder (772). The residual information may be inverse quantized and provided to the residual decoder (773).

An inter-frame decoder (780) is configured to receive inter-frame prediction information and generate an inter-frame prediction result based on the inter-frame prediction information.

An intra decoder (772) is used for receiving intra prediction information and generating a prediction result based on the intra prediction information.

A residual decoder (773) is used to perform inverse quantization to extract dequantized transform coefficients and process the dequantized transform coefficients to convert the residual from the frequency domain to the spatial domain. The residual decoder (773) may also need some control information (to obtain the quantizer parameter QP) and that information may be provided by the entropy decoder (771) (data path not labeled as this is only low-level control information).

A reconstruction module (774) is used to combine the residuals output by the residual decoder (773) and the prediction results (which may be output by the inter prediction module or the intra prediction module) in the spatial domain to form a reconstructed block, which may be part of a reconstructed image, which in turn may be part of a reconstructed video. It should be noted that other suitable operations, such as deblocking operations, may be performed to improve visual quality.

It should be noted that video encoder (303), video encoder (503), and video encoder (603) as well as video decoder (310), video decoder (410), and video decoder (710) may be implemented using any suitable techniques. In an embodiment, video encoder (303), video encoder (503), and video encoder (603), and video decoder (310), video decoder (410), and video decoder (710) may be implemented using one or more integrated circuits. In another embodiment, the video encoder (303), the video encoder (503), and the video encoder (603), and the video decoder (310), the video decoder (410), and the video decoder (710) may be implemented using one or more processors executing software instructions.

Inter prediction technique

For each inter-predicted CU, the motion parameters (including the motion vector, reference picture index, reference picture list usage index, and additional information needed for the new coding features of VVC) are used to generate the inter-predicted samples. These motion parameters may be signaled explicitly or implicitly. When a CU is coded in skip mode, the CU is associated with one PU and does not have significant residual coefficients, a coded motion vector delta, or a reference picture index. When a CU is encoded in merge mode, the motion parameters of the current CU are obtained from neighboring CUs (including spatial candidate CUs and temporal candidate CUs) and from the additional merge candidates introduced in VVC. The merge mode may be applied to any inter-predicted CU, including CUs encoded in skip mode. An alternative to the merge mode is explicit transmission of motion parameters, where, for each CU, the motion vectors, the corresponding reference picture index for each reference picture list, reference picture list usage flags, and other required information are explicitly signaled.

In addition to the inter-coding features in HEVC, VTM3 includes some of the modified inter-prediction coding tools listed below:

1) extended merge prediction;

2) merge mode with MVD (MMVD);

3) affine motion compensated prediction;

4) sub-block-based temporal motion vector prediction (SbTMVP);

5) triangular prediction; and

6) combined Inter and Intra Prediction (CIIP).

The following sections of the present application describe each of the inter prediction coding tools listed above.

1. Extended merge prediction mode

In some embodiments, the merge candidate list described above may be expanded, and the expanded merge candidate list may be used in the merge mode. For example, the extended merge candidate list may be constructed by sequentially adding the following five types of merge candidates until the merge candidates in the list reach the maximum allowable size:

1) a spatial Motion Vector Predictor (MVP) from a spatial neighboring Coding Unit (CU);

2) temporal MVP from co-located CUs;

3) history-based MVP from the history buffer;

4) pair-wise mean MVP; and

5) zero MVs.

The term "Coding Unit (CU)" may refer to a prediction block or a coding block divided from a picture.

In various embodiments, the size of the extended merge list may be signaled in a slice header, tile group header, or the like. For example, the maximum allowed size of the extended merge list is 6. In some embodiments, for a CU encoded in merge mode, the index of the best merge candidate may be encoded using truncated unary binarization (TU). The first bin of the merge index may be context coded while the other bins may be coded with bypass coding.
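
As an illustration, the following Python sketch shows truncated unary binarization of a merge index for a maximum list size of 6; the function name and the returned list of bins are illustrative assumptions, and the context/bypass coding of the individual bins is not modeled.

```python
def truncated_unary_bins(merge_idx, max_num_candidates=6):
    """Truncated unary binarization of a merge index (illustrative sketch).

    The index is coded as merge_idx '1' bins followed by a terminating '0',
    except that the terminating '0' is omitted for the largest possible index.
    """
    max_val = max_num_candidates - 1          # largest codable index
    bins = [1] * merge_idx
    if merge_idx < max_val:
        bins.append(0)
    return bins

# Example: with a 6-entry list, index 0 -> [0], index 2 -> [1, 1, 0], index 5 -> [1, 1, 1, 1, 1]
assert truncated_unary_bins(0) == [0]
assert truncated_unary_bins(2) == [1, 1, 0]
assert truncated_unary_bins(5) == [1, 1, 1, 1, 1]
```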

An embodiment of a generation process of different types of merge candidates in the extended merge candidate list is described below.

1.1 spatial candidate derivation

In one embodiment, the spatial merge candidates in the extended merge list are derived in a manner similar to the spatial merge candidates in HEVC. FIG. 8 shows the positions of the spatial merge candidates for a current block (810) according to one embodiment. A maximum of four merge candidates may be selected and derived from the candidate positions shown in fig. 8. In one example, the derivation order may be A1, B1, B0, A0, and B2. In one example, position B2 is considered only when a CU at any one of positions A1, B1, B0, A0 is unavailable or intra-coded. In one example, a CU at one of these positions may be unavailable because it belongs to another slice or tile.

After adding the candidate at position A1 to the expanded candidate list, redundancy checks may be performed on the addition of the other candidates. Through the redundancy check, merge candidates having the same motion information are excluded from the extended merge list, so that coding efficiency can be improved. To reduce computational complexity, in one example, not all possible candidate pairs are considered in the redundancy check. Instead, only the pairs connected by arrows in fig. 9 are considered. In some examples, a candidate is not added to the merge list if a corresponding entry indicated in fig. 9 is already in the merge list and that entry has the same or similar motion information as the candidate to be added.

1.2 derivation of temporal candidates

In one embodiment, only one temporal candidate is added to the extended merge list. FIG. 10 shows an example of deriving a temporal merging candidate (1031) for a current block (1011) in a current picture (1001), according to one embodiment. The temporal merging candidate (1031) is derived by scaling the motion vector (1032) of the co-located block (1012) of the current block (1011) in the picture (1002) (referred to as the co-located picture). In one example, the reference picture index of the co-located picture is explicitly signaled, e.g., in a slice header. In one example, the reference picture index of the temporal merging candidate (1031) is set to 0. In one embodiment, the scaling operation is performed based on the picture order count (POC) distances Tb (1041) and Td (1042). For example, Tb (1041) is defined as the POC distance between the reference picture (1003) of the current block (1011) and the current picture (1001), and Td (1042) is defined as the POC distance between the reference picture (1004) of the co-located block (1012) and the co-located picture (1002).
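
A simplified Python sketch of this POC-based scaling is shown below; it uses floating-point arithmetic and omits the clipping and fixed-point rounding a real codec would apply, so it is illustrative only.

```python
def scale_temporal_mv(mv_col, tb, td):
    """Scale the co-located block's MV by the ratio of POC distances Tb/Td (sketch).

    mv_col : (mvx, mvy) of the co-located block
    tb     : POC distance between the current block's reference picture and the current picture
    td     : POC distance between the co-located block's reference picture and the co-located picture
    """
    if td == 0:
        return mv_col
    scale = tb / td
    return (mv_col[0] * scale, mv_col[1] * scale)

# Example: co-located MV (8, -4) with tb = 2 and td = 4 gives the scaled MV (4.0, -2.0)
print(scale_temporal_mv((8, -4), tb=2, td=4))
```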

Fig. 11 shows the candidate positions C0 and C1 from which the temporal merging candidate of a current block (1110) may be selected, according to an embodiment. In one embodiment, position C0 is checked first to derive the temporal merging candidate. Position C1 is used when the merge candidate at position C0 is not available, is intra-coded, or is outside the current row of CTUs.

1.3 History-based merge candidate derivation

In some embodiments, history-based motion-vector prediction (HMVP) merge candidates are added to the extended merge list of the current CU after the spatial and temporal MVPs. In HMVP, the motion information of previously encoded blocks may be stored in a table (or history buffer) as MVP candidates for the current CU. Such motion information is referred to as HMVP candidates. A table with multiple HMVP candidates may be maintained during the encoding/decoding process. In one example, the table may be reset (cleared) when a new row of CTUs is encountered. In one embodiment, whenever there is a non-sub-block inter-coded CU, the associated motion information may be added to the last entry of the table as a new HMVP candidate.

In one embodiment, the size of the HMVP table, denoted by S, is set to 6. Accordingly, at most 6 HMVP candidates may be added to the table. When inserting new motion candidates into the table, a constrained first-in-first-out (FIFO) rule may be used in one embodiment. Furthermore, redundancy checks may be made when adding new HMVP candidates to find out if the same HMVP is present in the table. If the same HMVP is found in the table, the same HMVP candidate may be removed from the table and all HMVP candidates following the removed HMVP candidate are moved forward. A new HMVP candidate may then be added at the end of the table.
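
A minimal Python sketch of this constrained FIFO update is given below; the representation of motion information as simple tuples and the function name are illustrative assumptions.

```python
def hmvp_table_update(table, new_candidate, max_size=6):
    """Constrained FIFO update of an HMVP table (a minimal sketch).

    If an identical candidate already exists, it is removed first, so the most
    recent motion information always ends up at the end of the table.
    """
    if new_candidate in table:
        table.remove(new_candidate)            # redundancy check: drop the duplicate entry
    elif len(table) >= max_size:
        table.pop(0)                           # FIFO: drop the oldest entry when the table is full
    table.append(new_candidate)                # the newest candidate goes last
    return table

table = []
for mv in [(1, 0), (2, 1), (1, 0), (3, 3)]:    # motion info represented as simple tuples
    hmvp_table_update(table, mv)
print(table)                                   # [(2, 1), (1, 0), (3, 3)]
```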

In one embodiment, the HMVP candidates are used in the extended merge candidate list construction process. In one embodiment, several newly added HMVP candidates in the table are checked in sequence and inserted into the candidate list at a position after the TMVP candidate. A redundancy check may be performed to determine whether the HMVP candidate is similar or identical to a spatial candidate or a temporal merging candidate previously added to the extended merge list.

To reduce the number of redundancy check operations, the following simplification is introduced in one embodiment:

(i) The number of HMVP candidates used to generate the extended merge list is set to (N <= 4) ? M : (8 - N), where N indicates the number of existing candidates in the extended merge list and M indicates the number of available HMVP candidates in the history table.

(ii) Once the total number of available merge candidates in the extended merge list reaches the maximum allowed number of merge candidates minus 1, the process of constructing the merge candidate list using HMVP is terminated.

1.4 Pairwise average merge candidate derivation

In some embodiments, a pairwise average candidate may be generated by averaging predefined pairs of candidates in the current merge candidate list. For example, in one embodiment, the predefined pairs may be defined as {(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)}, where the numbers represent merge indices into the merge candidate list. For example, an average motion vector may be calculated separately for each reference picture list. If both motion vectors to be averaged are available in one list, they are averaged even when they point to different reference pictures. If only one motion vector is available, that motion vector is used directly. If no motion vector is available, the corresponding pair is skipped when constructing the merge candidate list.
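
A minimal Python sketch of this pairwise averaging is shown below; the dictionary representation of a merge candidate and the omission of reference indices and motion scaling are simplifying assumptions made for the sketch.

```python
PREDEFINED_PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]

def pairwise_average_candidates(merge_list):
    """Generate pairwise average candidates from a merge list (illustrative sketch).

    Each merge candidate is modeled as a dict {list_id: (mvx, mvy)} holding the MV
    for reference list 0 and/or 1; reference indices are ignored here.
    """
    averaged = []
    for i, j in PREDEFINED_PAIRS:
        if i >= len(merge_list) or j >= len(merge_list):
            continue
        cand = {}
        for lst in (0, 1):
            mv_i = merge_list[i].get(lst)
            mv_j = merge_list[j].get(lst)
            if mv_i and mv_j:                              # both available: average them
                cand[lst] = ((mv_i[0] + mv_j[0]) / 2, (mv_i[1] + mv_j[1]) / 2)
            elif mv_i or mv_j:                             # only one available: use it directly
                cand[lst] = mv_i or mv_j
        if cand:                                           # skip the pair if no MV is available
            averaged.append(cand)
    return averaged
```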

1.5 zero motion vector predictor

In some embodiments, when the extended merge list is still not full after adding the pairwise average merge candidates, zero MVPs are inserted at the end of the extended merge list until the maximum allowed number of merge candidates is reached.

2. Merge mode with motion vector difference (MMVD)

In addition to the merge mode (where implicitly derived motion information is directly used to generate the prediction samples of the current CU), in some embodiments a merge mode with motion vector differences (MMVD) is used. An MMVD flag is signaled immediately after the skip flag and the merge flag to indicate whether the MMVD mode is used for the CU.

In MMVD mode, after the merge candidate is selected, the merge candidate is further modified by signaled Motion Vector Difference (MVD) information to obtain modified motion information. The MVD information includes a merge candidate flag, a distance index indicating a motion magnitude, and an index indicating a motion direction.

One of the first two candidates in the merge list is selected as the MV base (starting MV). The merge candidate flag is signaled to indicate which candidate to use. As shown in fig. 12, the MV basis determines a starting point (1211) or (1221) in a reference image list (1202) or (1203), denoted as L0 or L1, respectively.

The distance index indicates motion amplitude information and indicates a predefined offset from the start point (1211) or (1221). As shown in fig. 12, an offset is added to the horizontal component or the vertical component of the starting MV (MV basis) pointing to the position (1211) or (1221). The mapping relationship of the distance index and the predefined offset is given in table 1.

TABLE 1

The direction index indicates a direction of the MVD with respect to the start point (1211) or (1221). The direction index may represent one of four directions shown in table 2.

TABLE 2

Direction IDX    00     01     10     11
x-axis           +      -      N/A    N/A
y-axis           N/A    N/A    +      -

Note that the meaning of the MVD sign may vary according to the information of the starting MV. When the starting MV is a uni-directionally predicted MV, or a bi-directionally predicted MV whose two MVs point to the same side of the current picture (i.e., the POCs of both reference pictures are greater than the POC of the current picture, or both are less than the POC of the current picture), the signs in Table 2 indicate the sign of the MV offset added to the starting MV. When the starting MV is a bi-directionally predicted MV whose two MVs point to different sides of the current picture (i.e., the POC of one reference picture is greater than the POC of the current picture and the POC of the other is less than the POC of the current picture), the sign in Table 2 indicates the sign of the MV offset added to the L0 MV component of the starting MV, while the sign applied to the L1 MV is the opposite.

Based on the base MV, the offset, and the MVD sign, a final MV may be determined for the current CU.
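
The following Python sketch illustrates this derivation. The direction signs follow Table 2 and the same-side/opposite-side handling follows the paragraph above, while the distance offset is taken as an input because Table 1 is not reproduced in this document; the names and data layout are illustrative assumptions.

```python
DIRECTION_SIGNS = {0b00: (+1, 0), 0b01: (-1, 0), 0b10: (0, +1), 0b11: (0, -1)}  # Table 2

def mmvd_final_mv(start_mv_l0, start_mv_l1, offset, direction_idx, same_side=True):
    """Derive the MMVD-modified MVs from a starting MV pair (illustrative sketch).

    offset        : the distance-index offset in luma samples (from Table 1, not reproduced here)
    direction_idx : two-bit direction index per Table 2
    same_side     : True if both reference pictures lie on the same side of the current picture
    """
    sx, sy = DIRECTION_SIGNS[direction_idx]
    delta = (sx * offset, sy * offset)
    mv_l0 = (start_mv_l0[0] + delta[0], start_mv_l0[1] + delta[1])
    if start_mv_l1 is None:                    # uni-directional prediction: only L0 is modified
        return mv_l0, None
    sign = 1 if same_side else -1              # opposite sides: the L1 offset gets the opposite sign
    mv_l1 = (start_mv_l1[0] + sign * delta[0], start_mv_l1[1] + sign * delta[1])
    return mv_l0, mv_l1

# Example: offset 2, direction 01 (negative x), bi-prediction with references on opposite sides
print(mmvd_final_mv((4, 0), (-4, 0), offset=2, direction_idx=0b01, same_side=False))
```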

3. Affine motion compensated prediction

In HEVC, only translational motion models may be used for Motion Compensation Prediction (MCP). However, in the real world, there are many kinds of motions including zoom in/out, rotation, perspective motion, and other irregular motions. In VTM3, block-based affine transform motion compensated prediction is used. As shown in fig. 13 and 14, the affine motion field of the block can be described by two control point motion vectors (i.e., 4-parameter) or three control point motion vectors (i.e., 6-parameter).

In some examples, the current block is divided into sub-blocks. In these sub-blocks, a position is selected, and the motion vector of the selected position is referred to as the Motion Vector Field (MVF) of the sub-block. In one example, the sub-block is the smallest unit for affine compensation. The MVF of the sub-block may be determined based on the motion vector at the control point of the current block.

FIG. 13 is a schematic diagram of a current block and two control points CP0 and CP1 of the current block according to some embodiments of the present application. As shown in FIG. 13, CP0 is the control point located at the upper-left corner of the current block, with motion vector MV0 = (mv_0x, mv_0y), and CP1 is the control point located at the upper-right corner of the current block, with motion vector MV1 = (mv_1x, mv_1y). When the position selected for the sub-block is (x, y) (where (x, y) is the position relative to the upper-left corner of the current block), the MVF of the sub-block is MV = (mv_x, mv_y) and can be calculated by (equation 1):

mv_x = ((mv_1x - mv_0x) / W) * x - ((mv_1y - mv_0y) / W) * y + mv_0x
mv_y = ((mv_1y - mv_0y) / W) * x + ((mv_1x - mv_0x) / W) * y + mv_0y

where W represents the width of the current block (when the current block is square, W is also equal to its height).
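
A minimal Python sketch of evaluating equation 1 at a sub-block position is given below; the function name and the floating-point arithmetic (a real codec would use fixed-point arithmetic with shifts) are illustrative assumptions.

```python
def affine_4param_mv(mv0, mv1, x, y, w):
    """Evaluate the 4-parameter affine motion field (equation 1) at position (x, y).

    mv0, mv1 : control point MVs at the top-left and top-right corners of the block
    (x, y)   : position relative to the top-left corner of the block
    w        : block width (the distance between CP0 and CP1)
    """
    ax = (mv1[0] - mv0[0]) / w     # horizontal gradient of the x component
    ay = (mv1[1] - mv0[1]) / w     # horizontal gradient of the y component
    mvx = ax * x - ay * y + mv0[0]
    mvy = ay * x + ax * y + mv0[1]
    return mvx, mvy

# Sanity check: at (w, 0) the field reproduces the CP1 motion vector
assert affine_4param_mv((1, 2), (5, 6), x=16, y=0, w=16) == (5, 6)
```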

FIG. 14 is a schematic diagram of a current block and three control points CP0, CP1, and CP2 of the current block according to some embodiments of the present application. As shown in FIG. 14, CP0 is the control point located at the upper-left corner of the current block, with motion vector MV0 = (mv_0x, mv_0y); CP1 is the control point located at the upper-right corner of the current block, with motion vector MV1 = (mv_1x, mv_1y); and CP2 is the control point located at the lower-left corner of the current block, with motion vector MV2 = (mv_2x, mv_2y). When the position selected for the sub-block is (x, y), which is the position relative to the upper-left corner of the current block, the MVF of the sub-block is MV = (mv_x, mv_y) and can be calculated by (equation 2):

mv_x = ((mv_1x - mv_0x) / W) * x + ((mv_2x - mv_0x) / H) * y + mv_0x
mv_y = ((mv_1y - mv_0y) / W) * x + ((mv_2y - mv_0y) / H) * y + mv_0y

where W denotes the width of the current block and H denotes the height of the current block.

To simplify motion compensated prediction, block-based affine transform prediction may be performed. To derive the motion vector for each 4 × 4 luma sub-block, the motion vector of the center sample of each sub-block is calculated according to the above equations and rounded to 1/16 fractional accuracy, as shown in fig. 15. A motion compensated interpolation filter may then be applied to generate the prediction of each sub-block with the resulting motion vector. The sub-block size of the chrominance components may also be set to 4 × 4. The MV of a 4 × 4 chroma sub-block may be calculated as the average of the MVs of the four corresponding 4 × 4 luma sub-blocks.
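
As an illustration, the following Python sketch derives one MV per 4 × 4 luma sub-block at the sub-block center and rounds it to 1/16 precision; the callable mv_at standing in for equation 1 or equation 2, the function names, and the floating-point rounding are assumptions made for the sketch.

```python
def round_to_1_16(v):
    """Round a motion component to 1/16 fractional-sample precision."""
    return round(v * 16) / 16

def luma_subblock_mvs(block_w, block_h, mv_at, sub=4):
    """Derive one MV per sub x sub luma sub-block, evaluated at the sub-block center (sketch).

    mv_at : a callable (x, y) -> (mvx, mvy) evaluating equation 1 or equation 2 at position (x, y)
    """
    mvs = {}
    for y0 in range(0, block_h, sub):
        for x0 in range(0, block_w, sub):
            cx, cy = x0 + sub / 2, y0 + sub / 2          # center of the sub-block
            mvx, mvy = mv_at(cx, cy)
            mvs[(x0, y0)] = (round_to_1_16(mvx), round_to_1_16(mvy))
    return mvs
```

For a 16 × 16 block this produces a 4 × 4 grid of sub-block MVs keyed by each sub-block's top-left position.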

According to an aspect of the present application, affine predictors for a current block can be generated using various techniques, either using model-based affine prediction from multiple neighboring affine encoded blocks or using multiple control point-based affine prediction from multiple neighboring MVs.

3.1 affine merge prediction

According to some embodiments, the AF _ MERGE mode may be applied to CUs having a width and height greater than or equal to 8. In this mode, Control Point Motion Vectors (CPMV) of the current CU may be generated based on motion information of spatially neighboring CUs. There may be up to five Control Point Motion Vector Predictor (CPMVP) candidates and an index is signaled to indicate the CPMVP for the current CU.

In some embodiments, the affine merge candidate list is constructed using the following three types of CPMV candidates:

(i) inherited affine merging candidates extrapolated from CPMV of neighboring CUs,

(ii) constructed affine merge candidate CPMVPs derived using the translational MVs of neighboring CUs, and

(iii) zero MVs.

According to some embodiments, there may be at most two inherited affine candidates in VTM3, derived from the affine motion models of neighboring blocks. The two inherited candidates may include one candidate from the left neighboring CUs and one candidate from the above neighboring CUs. In one example, the candidate blocks may be the blocks shown in fig. 8. For the left predictor, the scan order may be A0 → A1, and for the above predictor, the scan order may be B0 → B1 → B2. In some embodiments, only the first inherited candidate from each side is selected, and no pruning check is performed between the two inherited candidates. When a neighboring affine CU is identified, its control point motion vectors may be used to derive a CPMVP candidate in the affine merge list of the current CU. As shown in FIG. 16, which illustrates a current block (1600), if the neighboring lower-left block A is encoded in affine mode, the motion vectors v2, v3, and v4 of the upper-left, upper-right, and lower-left corners of the CU (1602) containing block A are obtained. When block A is encoded with a 4-parameter affine model, the two CPMVs of the current CU can be calculated from v2 and v3. When block A is encoded with a 6-parameter affine model, the three CPMVs of the current CU can be calculated from v2, v3, and v4.

According to some embodiments, the constructed affine candidates may be constructed by combining the neighboring translational motion information of each control point. The motion information of the control points may be derived from the specified spatial neighboring blocks and the temporal neighboring block (i.e., "T") of the current block (1700) shown in fig. 17. CPMVk (k = 1, 2, 3, 4) denotes the motion vector of the k-th control point. For CPMV1, the blocks B2 → B3 → A2 may be checked, and the MV of the first available block is used. For CPMV2, the blocks B1 → B0 may be checked, and for CPMV3, the blocks A1 → A0 may be checked. If TMVP is available, it can be used as CPMV4.

In some embodiments, after obtaining the MVs of the four control points, affine merge candidates may be constructed based on the motion information of these control points. The following example combinations of control point MVs can be used for the construction: {CPMV1, CPMV2, CPMV3}, {CPMV1, CPMV2, CPMV4}, {CPMV1, CPMV3, CPMV4}, {CPMV2, CPMV3, CPMV4}, {CPMV1, CPMV2}, and {CPMV1, CPMV3}.

The combination of three CPMVs constructs a 6-parameter affine merging candidate, while the combination of two CPMVs constructs a 4-parameter affine merging candidate. In some embodiments, to avoid the motion scaling process, relevant combinations of control points MV are discarded if the reference indices of the control points are different.

After checking the inherited affine merging candidate and the constructed affine merging candidate, if the list is still not full, a zero MV may be inserted at the end of the list.

3.2 affine AMVP prediction

In some embodiments, the affine AMVP mode may be applied to CUs having a width and height greater than or equal to 16. An affine flag at the CU level may be signaled in the codestream to indicate whether the affine AMVP mode is used, and then another flag may be signaled to indicate whether the 4-parameter affine or the 6-parameter affine model is used. The differences between the CPMVs of the current CU and their predictors can be signaled in the codestream. The affine AMVP candidate list size may be 2, and the list may be generated by sequentially using the following four types of CPMV candidates:

(i) inherited affine AMVP candidates extrapolated from CPMVs of neighboring CUs;

(ii) constructed affine AMVP candidates derived using the translational MVs of neighboring CUs;

(iii) translational MVs from neighboring CUs; and

(iv) zero MVs.

In one example, the checking order of the inherited affine AMVP candidates is similar to the checking order of the inherited affine merge candidates. The difference is that, for the AMVP candidates, only affine CUs that have the same reference picture as the current block are considered. In some embodiments, when an inherited affine motion predictor is inserted into the candidate list, no pruning process is performed.

The constructed AMVP candidates may be derived from the specified spatial neighboring blocks shown in fig. 17. The same checking order as in the construction process of the affine merge candidates may be used. In addition, the reference picture indices of the neighboring blocks may also be checked. The first block in the checking order that is inter-coded and has the same reference picture as the current CU is used. When the current CU is encoded with the 4-parameter affine model and both CPMV0 and CPMV1 are available, the two available CPMVs are added as one candidate to the affine AMVP list. When the current CU is encoded with the 6-parameter affine model and all three CPMVs (CPMV0, CPMV1, and CPMV2) are available, these available CPMVs are added as one candidate to the affine AMVP list. Otherwise, the constructed AMVP candidate is set as unavailable.

If there are still fewer than 2 candidates in the affine AMVP list after checking the inherited affine AMVP candidates and the constructed affine AMVP candidates, then translational motion vectors (when available) adjacent to the control points are added to predict all the control points MVs of the current CU. Finally, if the affine AMVP list is still not full, the affine AMVP list may be populated with zero MVs.

4. Subblock-based temporal motion vector predictor (SbTMVP)

In some embodiments, similar to temporal motion vector prediction (TMVP) in HEVC, a sub-block based temporal motion vector prediction (SbTMVP) method supported by VTM may use the motion field in the co-located picture to improve the motion vector prediction and merge modes of CUs in the current picture. The same co-located picture as in TMVP can be used for SbTMVP. SbTMVP differs from TMVP mainly in the following two aspects: (1) TMVP predicts motion at the CU level, while SbTMVP predicts motion at the sub-CU level; and (2) whereas TMVP extracts a temporal motion vector from the co-located block in the co-located picture (the co-located block is the bottom-right or center block relative to the current CU), SbTMVP applies a motion shift, obtained from the motion vector of one of the spatial neighboring blocks of the current CU, before extracting the temporal motion information from the co-located picture.

The SbTMVP process is shown in fig. 18 and fig. 19. In some embodiments, SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps. In the first step, as shown in FIG. 18, the spatial neighboring blocks of the current block (1800) are checked in the order of A1, B1, B0, and A0. Once the first available spatial neighboring block with a motion vector using the co-located picture as its reference picture is identified, that motion vector is selected as the motion shift to be applied. If no such motion vector is identified from the spatial neighboring blocks, the motion shift is set to (0, 0).

In the second step, the motion shift identified in the first step is performed (i.e., added to the coordinates of the current block) to obtain the motion information (e.g., motion vector and reference index) of the sub-CU level from the co-located image as shown in fig. 19. The example in fig. 19 assumes that the motion shift (1949) is set as the motion vector of the spatially adjacent block a1 (1943). Then, for a current sub-CU (e.g., sub-CU (1944)) in a current block (1942) of the current picture (1941), motion information of a corresponding co-located sub-CU (e.g., co-located sub-CU (1954)) in a co-located block (1952) of the co-located picture (1951) is used to derive motion information of the current sub-CU. In a similar manner to the TMVP process in HEVC, the motion information of the corresponding co-located sub-CU (e.g., co-located sub-CU (1954)) is converted to a motion vector and reference index of the current sub-CU (e.g., sub-CU (1944)), where temporal motion scaling is performed to align the reference picture of the temporal motion vector with the reference picture of the current CU.
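
The two steps can be sketched as follows in Python; the data structures (dicts and a callable motion field) are illustrative simplifications, and the temporal motion scaling mentioned above is omitted from the sketch.

```python
def sbtmvp_motion_shift(spatial_neighbors, colocated_poc):
    """Step 1: pick the motion shift (illustrative sketch).

    spatial_neighbors : neighbors in checking order A1, B1, B0, A0, each either None or a
                        dict {'mv': (mvx, mvy), 'ref_poc': int}
    Returns the first MV whose reference picture is the co-located picture, else (0, 0).
    """
    for nb in spatial_neighbors:
        if nb is not None and nb['ref_poc'] == colocated_poc:
            return nb['mv']
    return (0, 0)

def sbtmvp_sub_cu_mv(col_motion_field, cu_x, cu_y, sub_x, sub_y, shift, sub=8):
    """Step 2: fetch the co-located sub-CU motion after applying the motion shift (sketch).

    col_motion_field : callable (x, y) -> motion info stored at that position in the
                       co-located picture (temporal scaling is not modeled here)
    """
    x = cu_x + sub_x * sub + shift[0]
    y = cu_y + sub_y * sub + shift[1]
    return col_motion_field(x, y)
```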

According to some embodiments, a combined sub-block-based merge list comprising both the SbTMVP candidate and the affine merge candidates may be used in the sub-block-based merge mode. The SbTMVP mode may be enabled or disabled by a sequence parameter set (SPS) flag. When the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the sub-block-based merge list, followed by the affine merge candidates. In some applications, the maximum allowed size of the sub-block-based merge list is 5. For example, the sub-CU size used in SbTMVP is fixed to 8 × 8. Like the affine merge mode, the SbTMVP mode is applicable only to CUs having a width and a height both greater than or equal to 8.

The coding logic of the additional SbTMVP merge candidates is the same as the coding logic of the other merge candidates. That is, for each CU in a P slice or a B slice, an additional rate-distortion (RD) check may be performed to determine whether to use the SbTMVP candidate.

5. Triangular prediction

In some embodiments, a triangular prediction mode (TPM) may be used for inter prediction. In one embodiment, the TPM is applied to CUs that are 8 × 8 samples or larger in size and are encoded using the skip or merge mode. In one embodiment, for CUs that meet these conditions (size greater than or equal to 8 × 8 samples and coded in skip or merge mode), a CU-level flag is signaled to indicate whether the TPM is applied.

In some embodiments, when using the TPM, a CU may be evenly partitioned into two triangular partitions, either diagonally or anti-diagonally, as shown in fig. 20. In fig. 20, the first CU (2010) may be partitioned from the top-left corner to the bottom-right corner, resulting in two triangular prediction units, PU1 and PU2. The second CU (2020) may be partitioned from the top-right corner to the bottom-left corner, also resulting in two triangular prediction units, PU1 and PU2. Each of the triangular prediction units PU1 and PU2 in CU (2010) and CU (2020) may be inter predicted using its own motion information. In some embodiments, only uni-directional prediction is allowed for each of the triangular prediction units. Thus, each triangular prediction unit has one motion vector and one reference picture index. The uni-directional prediction motion constraint may be applied to ensure that no more than two motion compensated predictions are performed for each CU, the same as in conventional bi-directional prediction. In this way, processing complexity can be reduced. The uni-directional prediction motion information of each of the triangular prediction units may be derived from a uni-directional prediction merge candidate list. In some other embodiments, bi-directional prediction is allowed for each of the triangular prediction units. In that case, the bi-directional prediction motion information of each of the triangular prediction units can be derived from a bi-directional prediction merge candidate list.

In some embodiments, when the CU-level flag indicates that the current CU is encoded using the TPM, an index, referred to as a triangle partition index, is further signaled. For example, the triangle partition index may have a value in the range of [0, 39]. Using the triangle partition index, the direction (diagonal or anti-diagonal) of the triangle partition and the motion information of each partition (e.g., the merge index, also referred to as the TPM index, into the corresponding unidirectional prediction candidate list) can be obtained by a table lookup at the decoder side. In one embodiment, after each of the triangular prediction units is predicted based on the obtained motion information, the sample values along the diagonal or anti-diagonal edge of the current CU are adjusted by performing a blending process with adaptive weights. As a result of the blending process, a prediction signal for the entire CU is obtained. The transform and quantization processes may then be applied to the entire CU in a manner similar to other prediction modes. Finally, a motion field may be created for the CU predicted with the triangle partition mode, e.g., by storing motion information in a set of 4 × 4 units partitioned from the CU. The motion field may be used, for example, in a subsequent motion vector prediction process to construct a merge candidate list.

5.1 construction of the unidirectional prediction candidate list

In some embodiments, a merge candidate list for predicting the two triangular prediction units of a coding block coded with the TPM may be constructed based on a set of spatial and temporal neighboring blocks of the coding block. Such a merge candidate list may be referred to as a TPM candidate list, and the candidates in the list are referred to as TPM candidates. In one embodiment, the merge candidate list is a unidirectional prediction candidate list. The unidirectional prediction candidate list includes five unidirectional prediction candidate motion vectors. For example, the five uni-directional prediction candidate motion vectors may be derived from seven neighboring blocks, including five spatial neighboring blocks (numbered 1 through 5 in fig. 21) and two temporal neighboring blocks (numbered 6 through 7 in fig. 21).

In one embodiment, the motion vectors of the seven neighboring blocks are collected and put into the uni-directional prediction candidate list in the following order: first, the motion vectors of the uni-directionally predicted neighboring blocks; then, for the bi-directionally predicted neighboring blocks, the L0 motion vector (i.e., the L0 motion vector portion of the bi-predicted MV), the L1 motion vector (i.e., the L1 motion vector portion of the bi-predicted MV), and the average motion vector of the L0 and L1 motion vectors of the bi-predicted MV. In one embodiment, if the number of candidates is less than five, zero motion vectors are added to the end of the list. In some other embodiments, the merge candidate list may include fewer than 5 or more than 5 uni-directional prediction candidates or bi-directional prediction merge candidates, which may be selected from the same or different candidate positions as those shown in fig. 21.

5.2 lookup tables and Table indices

In one embodiment, a CU is encoded with the triangle partition mode using a TPM (or merge) candidate list of five TPM candidates. Accordingly, when 5 merge candidates are used per triangular PU, there are 40 possible ways to predict a CU. In other words, there are 40 different combinations of partition direction and merge (or TPM) indices: 2 (possible partition directions) × (5 × 5 − 5) = 40, where 5 × 5 counts the possible merge index pairs of the first and second triangular prediction units and the subtracted 5 excludes the pairs in which both units share the same merge index. For example, when the same merge index is determined for both triangular prediction units, the CU may be processed using the conventional merge mode instead of the triangular prediction mode.

Thus, in one embodiment, a triangle partition index with a value range of [0, 39] may be used to indicate which of the 40 combinations to use based on a lookup table. Fig. 22 is an example of a lookup table (2200) for obtaining the partition direction and the merge indices from a triangle partition index. As shown in the lookup table (2200), the first row (2201) includes the triangle partition indices ranging from 0 to 39; the second row (2202) includes the possible partition directions, represented by 0 or 1; the third row (2203) includes the possible first merge indices corresponding to the first triangular prediction unit, ranging from 0 to 4; and the fourth row (2204) includes the possible second merge indices corresponding to the second triangular prediction unit, ranging from 0 to 4.

For example, when a triangle partition index having a value of 1 is received at the decoder, based on column (2220) of the lookup table (2200), it may be determined that the partition direction is the direction represented by the value 1, and that the first and second merge indices are 0 and 1, respectively. Since the triangle partition index is associated with a lookup table, the triangle partition index is also referred to as a table index in this application.

5.3 blending along the triangular partition edge

In one embodiment, after prediction using the respective motion information of each triangular prediction unit, a blending process is performed on the two prediction signals of the two triangular prediction units to derive the samples around the diagonal or anti-diagonal edge. The blending process adaptively selects between two sets of weighting factors based on the motion vector difference between the two triangular prediction units. In one embodiment, the two weighting factor sets are as follows:

(1) first set of weighting factors: {7/8, 6/8, 4/8, 2/8, 1/8} for the luma component of the sample, and {7/8, 4/8, 1/8} for the chroma component of the sample; and

(2) second set of weighting factors: for the luma component of the sample {7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8}, and for the chroma component of the sample {6/8, 4/8, 2/8 }.

The second set of weighting factors has more luminance weighting factors and mixes more luminance samples along the partition edge.

In one embodiment, the following condition is used to select one from two weight factor sets. The second set of weighting factors may be selected when the reference images of the two triangular partitions are different, or when the motion vector difference between the two triangular partitions is greater than a threshold (e.g., 16 luma samples). Otherwise, a first set of weighting factors may be selected.

Fig. 23 is a diagram of a CU using a first weighting factor set. As shown, the first encoded block (2301) includes luma samples and the second encoded block (2302) includes chroma samples. The sets of pixels along the diagonal edge in the encoded block (2301) or (2302) are labeled with the numbers 1,2, 4, 6, and 7 corresponding to the weighting factors 1/8, 2/8, 4/8, 6/8, and 7/8, respectively. For example, for a pixel labeled 2, the sample value of the pixel after the blending operation can be obtained according to the following equation:

blended sample value = 2/8 × P1 + 6/8 × P2,

where P1 and P2 denote the sample values at the corresponding pixel position in the predictions of the first and second triangular prediction units, respectively.
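
A minimal Python sketch of this per-pixel blending is given below, using the first weighting-factor set as an example; the function name and the sample values are illustrative.

```python
def blend_triangle_sample(p1, p2, weight_eighths):
    """Blend the two triangular predictions at one pixel (illustrative sketch).

    weight_eighths : the weighting factor, in units of 1/8, applied to the first prediction
                     (e.g., 2 for a pixel labeled "2" in the first weighting-factor set).
    """
    return (weight_eighths * p1 + (8 - weight_eighths) * p2) / 8

# Pixel labeled 2: blended value = 2/8 * P1 + 6/8 * P2
print(blend_triangle_sample(p1=100, p2=60, weight_eighths=2))   # 70.0
```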

6. Combined Inter and Intra Prediction (CIIP)

In VTM3, when a CU is encoded in merge mode, and if the CU contains at least 64 luma samples (i.e., the width of the CU multiplied by the height of the CU is equal to or greater than 64), an additional flag is signaled to indicate whether a combined inter/intra prediction (CIIP) mode is used for the current CU.

To form CIIP prediction, the intra-prediction mode is first derived from two additional syntax elements. Up to four possible intra prediction modes may be used: DC mode, planar mode, horizontal mode, or vertical mode. Conventional intra and inter decoding processes may then be used to derive the inter-predicted signal and the intra-predicted signal. Finally, the inter-prediction signal and the intra-prediction signal may be weighted averaged to obtain the CIIP prediction.

6.1 Intra prediction mode derivation

In one embodiment, up to 4 intra prediction modes (including DC mode, planar mode, horizontal mode, and vertical mode) may be used to predict the luma component in the CIIP mode. If the CU shape is very wide (i.e., width is greater than twice the height), horizontal mode is not allowed. If the CU shape is very narrow (i.e., height is more than twice the width), the vertical mode is not allowed. In these cases, only 3 intra prediction modes are allowed.

The CIIP mode uses 3 Most Probable Modes (MPMs) for intra prediction. The CIIP MPM candidate list is formed as follows:

(i) the left neighboring block and the above neighboring block are denoted A and B, respectively;

(ii) the intra prediction modes of block A and block B (denoted IntraModeA and IntraModeB, respectively) are derived as follows:

a. let X be A or B,

b. if 1) block X is not available; or 2) block X is not predicted using the CIIP mode or an intra mode; or 3) block X is outside the current CTU, then IntraModeX is set to DC; and

c. otherwise, IntraModeX is set to: 1) DC or planar (if the intra prediction mode of block X is DC or planar); or 2) vertical (if the intra prediction mode of block X is a "vertical-like" angular mode (e.g., greater than 34)); or 3) horizontal (if the intra prediction mode of block X is a "horizontal-like" angular mode (e.g., less than or equal to 34));

(iii) if IntraModeA and IntraModeB are the same:

a. if IntraModeA is planar or DC, then the three MPMs are set to { planar, DC, vertical } in order;

b. otherwise, the three MPMs are set to {IntraModeA, planar, DC} in order; and

(iv) otherwise (i.e., IntraModeA and IntraModeB are different):

a. the first two MPMs are sequentially set as { IntraModeA, IntraModeB }; and is

b. the planar, DC, and vertical modes are checked in order against the first two MPM candidate modes; the first mode not already present among them is added as the third MPM (see the sketch following this list).
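
The following Python sketch illustrates steps (i) through (iv) above; the threshold of 34 separating "horizontal-like" from "vertical-like" angular modes is taken from the description, while the dictionary representation of a neighboring block and the function names are illustrative assumptions.

```python
PLANAR, DC, HORIZONTAL, VERTICAL = 'planar', 'dc', 'horizontal', 'vertical'

def ciip_neighbor_mode(block):
    """Map a neighboring block to IntraModeX following step (ii) (illustrative sketch).

    block : None, or a dict {'ciip_or_intra': bool, 'in_current_ctu': bool, 'mode': ...}
            where 'mode' is PLANAR, DC, or an angular mode number.
    """
    if block is None or not block['ciip_or_intra'] or not block['in_current_ctu']:
        return DC
    mode = block['mode']
    if mode in (PLANAR, DC):
        return mode
    return VERTICAL if mode > 34 else HORIZONTAL   # "vertical-like" vs "horizontal-like" angular modes

def ciip_mpm_list(block_a, block_b):
    """Build the 3-entry CIIP MPM candidate list following steps (iii) and (iv)."""
    mode_a, mode_b = ciip_neighbor_mode(block_a), ciip_neighbor_mode(block_b)
    if mode_a == mode_b:
        if mode_a in (PLANAR, DC):
            return [PLANAR, DC, VERTICAL]
        return [mode_a, PLANAR, DC]
    mpm = [mode_a, mode_b]
    for mode in (PLANAR, DC, VERTICAL):            # add the first mode not already present
        if mode not in mpm:
            mpm.append(mode)
            break
    return mpm
```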

As described above, if the CU shape is very wide or very narrow, the MPM flag is inferred to be 1 without signaling. Otherwise, an MPM flag is signaled to indicate whether the CIIP intra prediction mode is one of the CIIP MPM candidate modes.

If the MPM flag is 1, the MPM index is further signaled to indicate which MPM candidate mode is used in CIIP intra prediction. Otherwise, if the MPM flag is 0, the intra prediction mode is set to the "missing" mode of the MPM candidate list. For example, if the planar mode is not in the MPM candidate list, the planar mode is the missing mode, and the intra prediction mode is set to the planar mode. Since 4 possible intra prediction modes are allowed in CIIP and the MPM candidate list contains only 3 intra prediction modes, one of the 4 possible modes is the missing mode.

For the chroma component, the DM mode is applied without additional signaling; that is, chroma uses the same prediction mode as luma.

The intra-prediction mode of a CU encoded using CIIP will be saved and used for intra-mode coding of subsequent neighboring CUs.

6.2 merging of inter and Intra prediction signals

The inter prediction signal P_inter in the CIIP mode can be derived using the same inter prediction process as in the conventional merge mode, and the intra prediction signal P_intra can be derived using the CIIP intra prediction mode following the conventional intra prediction process. The intra prediction signal and the inter prediction signal may then be combined using a weighted average, where the weight value depends on the intra prediction mode and on the position of the sample within the coding block.

For example, if the intra prediction mode is the DC mode or the planar mode, or if the block width or height is less than 4, equal weights are applied to the intra prediction signal and the inter prediction signal.

Otherwise, the weights are determined based on the intra prediction mode (in this case the horizontal or vertical mode) and the sample position in the block. Take the horizontal prediction mode as an example (the weights for the vertical mode are derived in a similar manner, but in the orthogonal direction), and let W be the width of the block and H be the height of the block. The coding block is first divided into four equal-area parts, each of size (W/4) × H. The weight wt of each of the 4 regions is set to 6, 5, 3, and 2, respectively, starting from the part closest to the intra prediction reference samples and ending at the part farthest from them. The final CIIP prediction signal can be derived using the following formula:

P_CIIP = ((8 - wt) * P_inter + wt * P_intra + 4) >> 3 (equation 3)
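
A minimal Python sketch of this weighted combination is given below, assuming the horizontal CIIP mode and the region weights 6, 5, 3, and 2 described above; the function names and example values are illustrative.

```python
def ciip_weight_horizontal(x, block_w):
    """Weight wt for a sample at horizontal position x in the horizontal CIIP mode (sketch).

    The block is split into four equal-width parts; weights 6, 5, 3, 2 are used from the part
    closest to the intra reference samples (left edge) to the farthest part.
    """
    part = min(4 * x // block_w, 3)
    return (6, 5, 3, 2)[part]

def ciip_sample(p_inter, p_intra, wt):
    """Equation 3: integer weighted combination of the inter and intra prediction signals."""
    return ((8 - wt) * p_inter + wt * p_intra + 4) >> 3

# Example: a sample in the left-most quarter (wt = 6) with p_inter = 40 and p_intra = 80
print(ciip_sample(40, 80, ciip_weight_horizontal(x=1, block_w=16)))   # 70
```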

7. Interleaved affine prediction

In some embodiments, interleaved affine prediction is used. For example, as shown in fig. 24, the current block (2410), having a size of 16 × 16 samples, is divided into sub-blocks using two different division patterns, pattern 0 (2420) and pattern 1 (2430). With pattern 0 (2420), the current block (2410) is divided into 4 × 4 sub-blocks (2421) of equal size. Pattern 1 (2430) is shifted by a 2 × 2 offset toward the lower-right corner of the current block (2410) relative to pattern 0 (2420). With pattern 1 (2430), the current block (2410) is divided into full sub-blocks (2431), each of size 4 × 4, and partial sub-blocks (2432), each of size smaller than 4 × 4. In fig. 24, the partial sub-blocks (2432) form a shaded area surrounding a non-shaded area formed by the full sub-blocks (2431).

Subsequently, two auxiliary predictions P0 (2440) and P1 (2450), corresponding to the two division patterns (2420) and (2430), are generated by affine motion compensation (AMC). For example, the affine model may be determined according to an affine merge candidate in the sub-block-based merge candidate list. The MV of each sub-block divided according to pattern 0 (2420) or pattern 1 (2430) may be derived based on the affine model. For example, each MV may be derived at the center position of the corresponding sub-block.

Thereafter, the final prediction is calculated by merging the two predictions P0 (2440) and P1 (2450). For example, a weighted average operation (2461) may be performed to compute, pixel by pixel, the weighted average of two corresponding samples (represented by P0 and P1) in the two predictions P0 (2440) and P1 (2450) according to the following equation:

P = (ω_0 * P0 + ω_1 * P1) / (ω_0 + ω_1)

where ω_0 and ω_1 are the weights corresponding to a pair of co-located samples in the two predictions P0 (2440) and P1 (2450), respectively.

In one embodiment, the weight of each sample in the weighted average operation (2461) may be determined according to the pattern (2500) shown in fig. 25. Pattern (2500) includes 16 samples in sub-block 2510 (e.g., complete sub-block (2421) or (2431)). The prediction sample located at the center of the sub-block (2510) is associated with a weight value of 3, while the prediction sample located at the boundary of the sub-block (2510) is associated with a weight value of 1. Depending on the location of the sample within sub-block (2421) or (2431), the weight corresponding to the sample may be determined based on the pattern (2500).
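
A minimal Python sketch of the per-sample weighting and averaging is shown below; the weight assignment (3 for the center samples, 1 for the boundary samples) and the weighted average follow the description above, while the function names and the 4 × 4 sub-block assumption are illustrative.

```python
def interleave_weight(x_in_sub, y_in_sub, sub=4):
    """Weight of a sample inside a 4x4 sub-block following the pattern in fig. 25 (sketch):
    3 for the four center samples, 1 for the boundary samples."""
    return 3 if 0 < x_in_sub < sub - 1 and 0 < y_in_sub < sub - 1 else 1

def interleaved_sample(p0, w0, p1, w1):
    """Weighted average of the two auxiliary predictions P0 and P1 at one pixel."""
    return (w0 * p0 + w1 * p1) / (w0 + w1)

# For a given pixel, w0 and w1 are determined separately from its position inside the
# sub-block of pattern 0 and the sub-block of pattern 1, respectively.
print(interleaved_sample(p0=100, w0=interleave_weight(1, 2), p1=92, w1=interleave_weight(0, 2)))
```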

In one embodiment, to avoid motion compensation on very small blocks, interleaved prediction is only applied to regions where the sub-block size is 4 × 4 for both partition patterns, as shown in fig. 24. For example, interleaved prediction is not employed in the shaded region of pattern 1 (2430), while it is employed in the non-shaded region of pattern 1 (2430).

In one embodiment, interleaved prediction may be applied to the chrominance and luminance components. Furthermore, since the areas of the reference pictures for AMC of all sub-blocks are extracted together as a whole, the memory access bandwidth is not increased by the interleaved prediction. Therefore, no additional read operation is required.

Furthermore, for flexibility, a flag is signaled in the slice header indicating whether or not interleaved prediction is used. In one example, the flag is always signaled as 1. In various embodiments, interleaved affine prediction may be applied to a uni-directional predictive affine block, or to both uni-directional and bi-directional predictive affine blocks.

Inter prediction related signaling in VVC

8.1 inter prediction related syntax elements

Table 4 shows an example of inter prediction related syntax elements at the CU level in VVC. The array indices x0, y0 indicate the position (x0, y0) of the top-left luma sample of the current coding block relative to the top-left luma sample of the picture.

In table 4, cbWidth and cbHeight represent the width and height, respectively, of the luma sample of the current coding block.

TABLE 4-syntax elements related to inter prediction

8.2 inter prediction related semantics at CU level

Table 5 shows the inter prediction related semantics at the CU level. Specifically, according to Table 5, inter_pred_idc[x0][y0] indicates whether list0, list1, or bi-prediction is used for the current coding unit. The array indices x0, y0 in Table 4 indicate the position (x0, y0) of the top-left luma sample of the current coding block relative to the top-left luma sample of the picture.

TABLE 5 inter prediction related semantics

When inter_pred_idc[x0][y0] is not present, it is inferred to be equal to PRED_L0.

Table 6 below gives the binarization of the syntax element inter_pred_idc.

TABLE 6 - binarization of inter_pred_idc

In Table 4, ref_idx_l0[x0][y0] indicates the list 0 reference picture index of the current coding unit. The array indices x0, y0 indicate the position (x0, y0) of the top-left luma sample of the current coding block relative to the top-left luma sample of the picture. When ref_idx_l0[x0][y0] is not present, it is inferred to be equal to 0. In one embodiment, codestream conformance may require inter_pred_idc[x0][y0] to be equal to 0 when the current decoded picture is a reference picture for the current coded block.

9. Affine prediction of small subblock sizes

As described above, affine inter prediction can be performed on each 4 × 4 sub-block, and each sub-block has its own MV derived from the CPMVs. When predicting a block using affine inter prediction, conventional inter prediction may be applied to the sub-blocks using the sub-block MVs derived from the CPMVs. The coding efficiency can be further improved by using a reduced sub-block size. Techniques for changing the sub-block size of the affine inter prediction process of a video codec are described below.

In one embodiment, the size of the sub-block in affine inter prediction is set to 4 × 4 samples. However, the sub-block size for motion compensation may be set to a smaller rectangle in which the width of the sub-block is larger than the height of the sub-block, for example, 4 × 2, as shown in fig. 26. In fig. 26, the current block (2600) is divided into sixteen 4 × 4 sub-blocks, including the upper-left sub-block (2610). Affine inter prediction can be performed for each 4 × 4 subblock. In one embodiment, the sub-block size may be reduced to 4x 2 samples. For example, the current block (2620) may be divided into thirty-two 4 × 2 sub-blocks, including the top-left sub-block (2630). Affine inter prediction may be performed for each 4 × 2 sub-block in the current block (2620).

The small/reduced sub-block size is not limited to 4 × 2 samples, but may be set to other sizes such as 2 × 4, 4 × 1, 1 × 4, 2 × 1, or 1 × 2 and used for motion compensation.

In one embodiment, when a small sub-block size of, for example, 4 × 2 samples is used, sub-block motion vectors for interpolation can be derived from CPMV in a manner similar to that of the present application in section 3. For example, the motion vector of the small sub-block (2630) for interpolation may be derived from the CPMV of the current block (2620) using equation 1 or equation 2. Specifically, the motion vector of the center sample of each small sub-block in the current block (2620) may be derived using equation 1 or equation 2 and rounded to 1/16 fractional precision. The prediction for each small sub-block with the resulting motion vector may then be generated using a motion compensated interpolation filter.

In one embodiment, in order to store a sub-block motion vector for each 4 × 4 block in the current block, the motion vector of each 4 × 4 block in the current block may be derived directly from the CPMVs and stored as the motion vector of that 4 × 4 block. The stored motion vectors of the 4 × 4 blocks can be used for the merge mode of neighboring blocks. The encoder and decoder may derive a motion vector for each small sub-block when performing motion compensation.

In one embodiment, in order to store a sub-block motion vector for each 4 × 4 block in the current block, the motion vector of the upper-left small sub-block in each 4 × 4 block is stored and used as the motion vector of that 4 × 4 block. The upper-left small sub-block refers to the small sub-block that includes the top-left sample of the 4 × 4 block. The stored motion vectors of the 4 × 4 blocks can be used for the merge mode of neighboring blocks. The encoder and decoder may derive a motion vector for each small sub-block when performing motion compensation.

In one embodiment, in order to store a sub-block motion vector for each 4 × 4 block in the current block, the motion vector of the lower-right small sub-block in each 4 × 4 block is stored and used as the motion vector of that 4 × 4 block. The lower-right small sub-block refers to the small sub-block that includes the bottom-right sample of the 4 × 4 block. The stored motion vectors of the 4 × 4 blocks can be used for the merge mode of neighboring blocks. The encoder and decoder may derive a motion vector for each small sub-block when performing motion compensation.

In one embodiment, in order to store the sub-block motion vector of each 4 × 4 sub-block in the current block, the motion vector of the small sub-block at the center of each 4 × 4 block is stored and used as the motion vector of each 4 × 4 block. In one example, the small sub-block at the center of each 4x4 block includes the sample at the (2,2) position within the 4x4 block. In another example, a small sub-block may include samples near the (2,2) position of a 4x4 block. The stored motion vectors of the 4x4 block can be used for the merge mode of the neighboring blocks. The encoder and decoder may derive a motion vector for each small sub-block while performing motion compensation.
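
The following Python sketch illustrates, under illustrative names and a 4 × 2 small sub-block size, how one of the storage options above (top-left, bottom-right, or center small sub-block) could select the motion vector stored for a 4 × 4 block; it is a sketch of the described alternatives, not a normative procedure.

```python
def containing_subblock(x0, y0, dx, dy, small_w, small_h):
    """Top-left position of the small sub-block containing sample (dx, dy) of the 4x4 block."""
    return (x0 + (dx // small_w) * small_w, y0 + (dy // small_h) * small_h)

def stored_mv_for_4x4(small_mvs, x0, y0, policy='center', small_w=4, small_h=2):
    """Select the MV stored for the 4x4 block at (x0, y0) from its small sub-block MVs (sketch).

    small_mvs : dict mapping the top-left position of each small sub-block to its MV
    policy    : 'top_left', 'bottom_right', or 'center' per the embodiments above
    """
    if policy == 'top_left':
        key = containing_subblock(x0, y0, 0, 0, small_w, small_h)
    elif policy == 'bottom_right':
        key = containing_subblock(x0, y0, 3, 3, small_w, small_h)
    else:                                   # 'center': the small sub-block containing sample (2, 2)
        key = containing_subblock(x0, y0, 2, 2, small_w, small_h)
    return small_mvs[key]

# Example: 4x2 small sub-blocks inside the 4x4 block at (0, 0)
small_mvs = {(0, 0): (1.0, 0.5), (0, 2): (1.5, 0.75)}
print(stored_mv_for_4x4(small_mvs, 0, 0, policy='center'))   # (1.5, 0.75)
```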

When a small sub-block is used for affine mode, the chroma MV can be derived based on the co-located luma block. In one embodiment, the chroma sub-blocks have a fixed sub-block size, independent of the co-located luma block size. When the 4:2:2 chroma format is used, the fixed sub-block size may be 2 × 2 chroma samples, or 4 × 4 chroma samples, or 2 × 4 chroma samples. Each chroma sub-block may have at least one co-located luma sub-block. When the chroma sub-block has more than one co-located luma sub-block, in some examples, the MV of the chroma sub-block may be derived based on an average MV of the more than one co-located luma sub-blocks. In some examples, the MVs of the chroma sub-blocks may be derived from MVs of one co-located luma sub-block (e.g., an upper-left luma sub-block, a center luma sub-block, or a lower-right luma sub-block). In some examples, the MVs of the chroma sub-blocks may be derived from a weighted average of a subset of more than one co-located luma sub-blocks.
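
A hedged sketch of the chroma MV derivation options described above is given below; the representation of the co-located luma sub-block MVs as a simple list and the method names are illustrative assumptions.

```python
def chroma_subblock_mv(colocated_luma_mvs, method='average'):
    """Derive a chroma sub-block MV from its co-located luma sub-block MVs (illustrative sketch).

    colocated_luma_mvs : list of (mvx, mvy) for the co-located luma sub-blocks, ordered so that
                         index 0 is the top-left one and index -1 the bottom-right one.
    method             : 'average', 'top_left', or 'bottom_right' per the options above
    """
    if method == 'top_left':
        return colocated_luma_mvs[0]
    if method == 'bottom_right':
        return colocated_luma_mvs[-1]
    n = len(colocated_luma_mvs)                 # default: average of all co-located luma sub-block MVs
    return (sum(mv[0] for mv in colocated_luma_mvs) / n,
            sum(mv[1] for mv in colocated_luma_mvs) / n)

print(chroma_subblock_mv([(1, 0), (1, 0), (2, 1), (2, 1)]))   # (1.5, 0.5)
```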

In one embodiment, small sub-block affine inter prediction may be used for uni-directional prediction only. When the small sub-block affine inter prediction is used for only unidirectional prediction, the memory bandwidth can be reduced. In one embodiment, small sub-block affine inter prediction can be used for both uni-directional and bi-directional prediction.

In one embodiment, small sub-block affine is enabled when uni-directional prediction is used and the memory bandwidth of small sub-block motion compensation of an 8 x 8 block is less than or equal to a threshold. Otherwise, conventional 4 × 4 sub-block affine inter prediction is used. In one example, the threshold for the memory bandwidth of an 8 × 8 affine block may be set to a memory bandwidth of 15 × 15 samples.
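
The following Python sketch illustrates one possible way to apply this condition; the bounding-box approximation of the memory bandwidth and the 8-tap horizontal / 6-tap vertical filter lengths (consistent with the filters mentioned in the next paragraph) are assumptions made for illustration, not a normative definition of the bandwidth measure.

```python
def small_subblock_affine_enabled(sub_mvs, uni_pred, sub_w=4, sub_h=2,
                                  taps_h=8, taps_v=6, threshold=15 * 15):
    """Decide whether small sub-block affine is enabled for an 8x8 block (illustrative sketch).

    sub_mvs : dict {(x0, y0): (mvx, mvy)} with integer-rounded MVs of the small sub-blocks
              inside the 8x8 block. The memory bandwidth is approximated as the area of the
              bounding box of all reference regions needed for motion compensation, assuming
              an 8-tap horizontal and a 6-tap vertical interpolation filter.
    """
    if not uni_pred:
        return False
    xs, ys = [], []
    for (x0, y0), (mvx, mvy) in sub_mvs.items():
        # reference region of one sub-block, extended by the interpolation filter support
        xs += [x0 + mvx - (taps_h // 2 - 1), x0 + mvx + sub_w + taps_h // 2]
        ys += [y0 + mvy - (taps_v // 2 - 1), y0 + mvy + sub_h + taps_v // 2]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return area <= threshold
```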

In one example, when using small subblock affine inter prediction, a conventional 8-tap interpolation filter is used for the horizontal direction and a shorter-tap filter is used for the vertical direction. In one example, a 6-tap filter may be used for the vertical direction. In another example, a 4-tap filter may be used for the vertical direction.

In embodiments of the present application, a flag (e.g., small_subblock_affine_flag) may be signaled at a high level (e.g., slice, tile, group of tiles, picture, sequence) to indicate whether small sub-block affine inter prediction is used.

In one embodiment, an SPS flag (e.g., sps_small_subblock_affine_flag) may be signaled. If this flag is true, a picture-level or tile-group-level flag (e.g., picture_small_subblock_affine_flag) may be signaled to indicate whether the small sub-block size for affine inter prediction may be used for the currently decoded picture or tile group.

In one embodiment, a flag for the small sub-block size for affine inter prediction (e.g., small_subblock_affine_flag) may be signaled at a level below the sequence level (e.g., picture level, tile group level, slice level, block level, etc.). In this case, the small-sub-block-size flag may be signaled only when the sequence-level affine prediction enable flag is signaled as true. Otherwise, when the sequence-level affine prediction enable flag is signaled as false, the small-sub-block-size flag is inferred to be false.

In another embodiment, small sub-block sizes for affine prediction may be enabled by other methods (e.g., by predefined default settings) and may not be signaled.

FIG. 27 is a flowchart outlining a method (2700) for small sub-block affine prediction in accordance with some embodiments of the present application. In various embodiments, method (2700) may be performed by processing circuitry, such as processing circuitry in terminal devices (210), (220), (230), and (240), processing circuitry that performs the functions of video decoder (310), processing circuitry that performs the functions of video decoder (410), and so forth. In some embodiments, the method (2700) is implemented by software instructions, such that when the processing circuit executes the software instructions, the processing circuit performs the method (2700). The method starts at (S2701) and proceeds to (S2710).

At (S2710), prediction information of a block in a current image in the encoded video stream is decoded. The prediction information indicates an affine model in the inter prediction mode.

At (S2720), motion vectors of control points of the block are determined according to the affine model. The affine model may be a 4-parameter model described by the motion vectors of two control points or a 6-parameter model described by the motion vectors of three control points. The motion vectors of the control points may be determined using an affine merge mode or an affine AMVP mode.

At (S2730), a motion vector of a sub-block of the block is determined according to the determined motion vectors of the control points. For example, the sub-block is one of a plurality of sub-blocks of the block, and a motion vector is determined for each of the plurality of sub-blocks. One of the width and the height of the sub-block is less than 4 (e.g., 4 luma samples). In one embodiment, to derive the motion vector of each sub-block partitioned from the block, the motion vector at the center sample of each sub-block is calculated according to equations 1 and 2 and rounded to 1/16 fractional precision. The size of the sub-block is thus smaller than 4 × 4 samples; that is, one of its width and height is less than 4. For example, the sub-block size may be 4 × 2 samples.
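A Python sketch of this derivation for the two-control-point (4-parameter) case follows. The function name, the float MV representation, and the 4 × 2 sub-block size are illustrative assumptions; the motion field follows the standard two-control-point affine equations of the kind referenced above as equations 1 and 2.

```python
# Sketch: derive per-sub-block MVs at sub-block centers from two control-point
# MVs (4-parameter affine), rounded to 1/16 sample precision.
# cpmv0 is the MV at the block's top-left corner, cpmv1 at its top-right corner;
# MVs are given here as floats in luma-sample units for readability.

def affine_subblock_mvs(cpmv0, cpmv1, block_w, block_h, sub_w=4, sub_h=2):
    a = (cpmv1[0] - cpmv0[0]) / block_w   # horizontal gradient of the MV field
    b = (cpmv1[1] - cpmv0[1]) / block_w   # vertical gradient of the MV field
    mvs = {}
    for y in range(0, block_h, sub_h):
        for x in range(0, block_w, sub_w):
            cx, cy = x + sub_w / 2, y + sub_h / 2        # sub-block center
            mvx = a * cx - b * cy + cpmv0[0]
            mvy = b * cx + a * cy + cpmv0[1]
            # round to 1/16 fractional-sample precision
            mvs[(x, y)] = (round(mvx * 16) / 16, round(mvy * 16) / 16)
    return mvs
```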

At (S2740), at least one sample of the sub-block is reconstructed based on the determined motion vector. The method (2700) proceeds to and terminates at (S2799).

The techniques described above may be implemented as computer software using computer readable instructions and physically stored in one or more computer readable media. For example, fig. 28 illustrates a computer system (2800) suitable for implementing some embodiments of the present application.

The computer software may be encoded using any suitable machine code or computer language and may employ assembly, compilation, linking or similar mechanisms to generate instruction code. These instruction codes may be executed directly by one or more computer Central Processing Units (CPUs), Graphics Processing Units (GPUs), etc., or by operations of code interpretation, microcode execution, etc.

The instructions may be executed in various types of computers or computer components, including, for example, personal computers, tablets, servers, smart phones, gaming devices, internet of things devices, and so forth.

The components illustrated in FIG. 28 for the computer system (2800) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the application. Neither should the configuration of the components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (2800).

The computer system (2800) may include some human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example, tactile input (such as keystrokes, swipes, data glove movements), audio input (such as speech, taps), visual input (such as gestures), olfactory input (not shown). The human interface device may also be used to capture certain media that are not necessarily directly related to human conscious input, such as audio (such as speech, music, ambient sounds), images (such as scanned images, photographic images obtained from still image cameras), video (such as two-dimensional video, three-dimensional video including stereoscopic video).

The human interface input device may include one or more of the following (only one depicted each): keyboard (2801), mouse (2802), touch pad (2803), touch screen (2810), data glove (not shown), joystick (2805), microphone (2806), scanner (2807), camera (2808).

The computer system (2800) may also include some human interface output devices. Such human interface output devices may stimulate the perception of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback through a touch screen (2810), data glove (not shown), or joystick (2805), although there may also be tactile feedback devices that do not act as input devices), audio output devices (such as speakers (2809) and headphones (not shown)), visual output devices, and printers (not shown). The visual output devices include screens (2810), virtual reality glasses (not shown), holographic displays, and smoke tanks (not shown). The screen (2810) may be a Cathode Ray Tube (CRT) screen, a Liquid Crystal Display (LCD) screen, a plasma screen, or an Organic Light Emitting Diode (OLED) screen, each with or without touch screen input capability and each with or without haptic feedback capability. Some of these screens are capable of two-dimensional visual output, or of more than three-dimensional output by means such as stereoscopic image output.

The computer system (2800) may also include human-accessible storage devices and their associated media, such as optical media (including CD/DVD ROM/RW (2820) or similar media (2821) with CD/DVD), thumb drive (2822), removable hard or solid state drive (2823), conventional magnetic media (such as magnetic tape and floppy disk (not shown)), special purpose ROM/ASIC/PLD based devices (such as secure dongles (not shown)), and so forth.

Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the presently disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.

The computer system (2800) may also include an interface to connect to one or more communication networks. The network may be, for example, a wireless network, a wired network, or an optical network. The network may also be a local network, a wide area network, a metropolitan area network, a vehicular or industrial network, a real-time network, a delay-tolerant network, and so forth. Examples of networks include local area networks (such as Ethernet and wireless LANs), cellular networks (including Global System for Mobile communications (GSM), third generation (3G), fourth generation (4G), and fifth generation (5G) mobile communications, Long Term Evolution (LTE), etc.), wired or wireless wide-area digital networks for television (including cable, satellite, and terrestrial broadcast television), vehicular and industrial networks (including CANBus), and so forth. Some networks typically require an external network interface adapter attached to some general-purpose data port or peripheral bus (2849), such as a Universal Serial Bus (USB) port of the computer system (2800); others are typically integrated into the core of the computer system (2800) by attaching to a system bus as described below (e.g., an Ethernet interface into a personal computer system or a cellular network interface into a smartphone computer system). Using any of these networks, the computer system (2800) may communicate with other entities. Such communication may be unidirectional receive-only (e.g., broadcast TV), unidirectional send-only (e.g., CANBus to certain CANBus devices), or bidirectional, for example to other computer systems using a local or wide-area digital network. Certain protocols and protocol stacks may be used on each of those networks and network interfaces as described above.

The human interface device, the human accessible storage device, and the network interface may be connected to a core (2840) of the computer system (2800).

The core (2840) may include one or more Central Processing Units (CPUs) (2841), Graphics Processing Units (GPUs) (2842), special-purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (2843), hardware accelerators for specific tasks (2844), and so forth. These devices, along with Read-Only Memory (ROM) (2845), Random Access Memory (RAM) (2846), internal mass storage such as internal non-user-accessible hard drives and SSDs (2847), and the like, may be interconnected via a system bus (2848). In some computer systems, the system bus (2848) may be accessible in the form of one or more physical plugs, enabling expansion by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus (2848) or through a peripheral bus (2849). Architectures for the peripheral bus include PCI, USB, and the like.

The CPU (2841), GPU (2842), FPGA (2843), and accelerator (2844) may execute certain instructions that, in combination, may constitute the aforementioned computer code. The computer code may be stored in ROM (2845) or RAM (2846). Intermediate data may also be stored in RAM (2846), while permanent data may be stored, for example, in internal mass storage (2847). Fast storage in and retrieval from any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs (2841), GPUs (2842), mass storage (2847), ROM (2845), RAM (2846), and the like.

Computer readable media may have computer code thereon to perform various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present application, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, a computer system having the architecture (2800), and in particular the core (2840), may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain storage of the core (2840) that is of a non-transitory nature, such as the core-internal mass storage (2847) or the ROM (2845). Software implementing embodiments of the present application may be stored in such devices and executed by the core (2840). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (2840), and in particular the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (2846) and modifying those data structures according to processes defined by the software. Additionally or alternatively, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in a circuit (e.g., the accelerator (2844)), which may operate in place of or together with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to software may encompass logic, and vice versa. Where appropriate, reference to a computer-readable medium may encompass a circuit storing executable software (e.g., an Integrated Circuit (IC)), a circuit embodying executable logic, or both. The present application encompasses any suitable combination of hardware and software.

Appendix A: Abbreviations

AMVP: Advanced Motion Vector Prediction
ASIC: Application-Specific Integrated Circuit
BMS: Benchmark Set
CANBus: Controller Area Network Bus
CD: Compact Disc
CPU: Central Processing Unit
CRT: Cathode Ray Tube
CTBs: Coding Tree Blocks
CTUs: Coding Tree Units
CU: Coding Unit
DVD: Digital Video Disc
FPGA: Field Programmable Gate Array
GOPs: Groups of Pictures
GPUs: Graphics Processing Units
GSM: Global System for Mobile communications
HEVC: High Efficiency Video Coding
HMVP: History-based Motion Vector Prediction
HRD: Hypothetical Reference Decoder
IC: Integrated Circuit
JEM: Joint Exploration Model
LAN: Local Area Network
LCD: Liquid-Crystal Display
LTE: Long-Term Evolution
MMVD: Merge mode with Motion Vector Difference
MV: Motion Vector
MVD: Motion Vector Difference
MVP: Motion Vector Predictor
OLED: Organic Light-Emitting Diode
PBs: Prediction Blocks
PCI: Peripheral Component Interconnect
PLD: Programmable Logic Device
PUs: Prediction Units
RAM: Random Access Memory
ROM: Read-Only Memory
SEI: Supplemental Enhancement Information
SNR: Signal-to-Noise Ratio
SSD: Solid-State Drive
SbTMVP: Sub-block-based Temporal Motion Vector Prediction
TUs: Transform Units
TMVP: Temporal Motion Vector Prediction
USB: Universal Serial Bus
VTM: Versatile Test Model
VUI: Video Usability Information
VVC: Versatile Video Coding

While the present application has described several example embodiments, various alterations, permutations, and substitutions of the embodiments are within the scope of the present application. It will thus be appreciated that those skilled in the art will be able to devise various systems and methods which, although not explicitly shown or described herein, embody the principles of the application and are thus within its spirit and scope.
