Method and apparatus for multi-line intra prediction in video compression

文档序号：958986 发布日期：2020-10-30 浏览：3次中文

阅读说明：本技术 视频压缩中的多线帧内预测的方法和装置 (Method and apparatus for multi-line intra prediction in video compression ) 是由赵亮赵欣李翔刘杉于 2019-06-27 设计创作，主要内容包括：包括一种包括计算机代码的方法和装置,该计算机代码配置为使一个或多个硬件处理器在多个参考线中执行帧内预测,以针对距非零参考线中帧内预测的当前块最近的零参考线设置多个帧内预测模式,并且针对该多个非零参考线中的一个设置一种或多种最可能模式。(A method and apparatus are included that include computer code configured to cause one or more hardware processors to perform intra prediction in a plurality of reference lines, to set a plurality of intra prediction modes for a zero reference line that is closest to a current block that is intra predicted in a non-zero reference line, and to set one or more most probable modes for one of the plurality of non-zero reference lines.)

1. A method of video decoding, comprising:

decoding a video sequence based on intra-prediction contained in a plurality of reference lines of the video sequence;

setting a plurality of intra prediction modes for a zero reference line nearest to the intra predicted current block of a plurality of non-zero reference lines; and

setting one or more most probable modes for one of the plurality of non-zero reference lines.

2. The method of claim 1, further comprising:

signaling a reference line index before signaling a most probable mode flag and an intra mode;

signaling the most probable mode flag in response to determining that the reference line index is signaled and that the signaled index indicates the zero reference line; and

In response to determining that the reference line index is signaled and the signaled index indicates at least one of the plurality of non-zero reference lines, without signaling the most probable mode flag, deriving the most probable mode flag to be true and signaling the most probable mode index for the current block.

3. The method of claim 1, wherein the one or more most probable modes are included in a most probable mode list, and

wherein planar mode and DC mode are excluded from the list of most probable modes.

4. The method of claim 3, further comprising:

setting a length of the most probable mode list based on a reference line index value such that the length of the most probable mode list includes the number of the one or more most probable modes.

5. The method of claim 4, further comprising:

in response to detecting the non-zero reference line, setting the length of the most probable mode list to 1 or 4; and

setting the length of the most probable mode list to 3 or 6 in response to determining that a current reference line is a zero reference line.

6. The method of claim 4, further comprising:

In response to detecting the non-zero reference line, setting the length of the most probable mode list to include 5 most probable modes.

7. The method of claim 1, wherein one of the plurality of non-zero reference lines is adjacent to the current block and is further from the current block than the zero reference line.

8. The method of claim 1, wherein the one or more most probable modes include a first level most probable mode.

9. The method of claim 1, wherein the one or more most probable modes include a most probable mode at any level from a lowest level most probable mode to a highest level most probable mode.

10. The method of claim 1, wherein the one or more most probable modes include only levels of the most probable modes allowed by the non-zero reference line.

11. An apparatus, comprising:

at least one memory configured to store computer program code;

at least one hardware processor configured to access and operate as directed by the computer program code, the computer program code comprising:

Decoding code configured to cause the processor to decode a video sequence by performing intra prediction contained in a plurality of reference lines of the video sequence;

intra-prediction mode code configured to cause the at least one processor to set a plurality of intra-prediction modes for a zero reference line nearest to the intra-predicted current block of a plurality of non-zero reference lines; and

most probable mode code configured to cause the at least one processor to set one or more most probable modes for one of the plurality of non-zero reference lines.

12. The apparatus of claim 11, wherein the program code further comprises signaling code configured to cause the at least one processor to:

signaling a reference line index before signaling a most probable mode flag and an intra mode;

signaling the most probable mode flag in response to determining that the reference line index is signaled and that the signaled index indicates the zero reference line; and

in response to determining that the reference line index is signaled and that the signaled index indicates at least one of the plurality of non-zero reference lines, without signaling a most probable mode flag, the most probable mode flag is pushed to true and the most probable mode index of the current block is signaled.

13. The apparatus of claim 11, wherein the most probable mode code is further configured to cause the at least one processor to:

including the one or more most probable modes in a most probable mode list; and

planar mode and DC mode are excluded from the most probable mode list.

14. The apparatus of claim 13, wherein the most probable mode code is further configured to cause the at least one processor to:

setting a length of the most probable mode list based on a reference line index value such that the length of the most probable mode list includes the number of the one or more most probable modes.

15. The apparatus of claim 14, wherein the most probable mode code is further configured to cause the at least one processor to:

in response to detecting the non-zero reference line, setting the length of the most probable mode list to 1 or 4; and

setting the length of the most probable mode list to 3 or 6 in response to determining that a current reference line is a zero reference line.

16. The apparatus of claim 15, wherein the most probable mode code is further configured to cause the at least one processor to:

In response to detecting the non-zero reference line, setting the length of the most probable mode list to include 5 most probable modes.

17. The apparatus of claim 11, wherein one of the plurality of non-zero reference lines is adjacent to the current block and is further from the current block than the zero reference line.

18. The apparatus of claim 11, wherein the one or more most probable modes comprise a first level most probable mode.

19. The apparatus of claim 11, wherein the one or more most probable modes comprise the most probable mode at any level from the lowest level most probable mode to the highest level most probable mode.

20. A non-transitory computer-readable medium storing a program that causes a computer to execute a process, the process comprising:

decoding a video sequence based on intra-prediction included in a plurality of reference lines of the video sequence;

setting a plurality of intra prediction modes for a zero reference line nearest to the intra predicted current block of a plurality of non-zero reference lines; and

setting one or more most probable modes for one of the plurality of non-zero reference lines.

Technical Field

The present disclosure relates to next generation video coding techniques other than HEVC, and more particularly to improvements, for example, to intra prediction schemes using multiple reference lines.

Background

The video coding standard HEVC (high efficiency video coding) main specification (main profile) has been completed in 2013. Soon after, the international standards organizations ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) began exploring the need to develop future video coding standards that have the potential to significantly improve compression capabilities compared to the current HEVC standard, including its current extensions. In a Joint collaboration called Joint Video exploration team (jfet), groups are jointly conducting this exploration activity to evaluate compression technology designs proposed by experts in the field. Jfet has developed a Joint Exploration Model (JEM) to explore video coding techniques not offered by HEVC, and the current latest version of JEM is JEM-7.1.

The H.265/HEVC (high efficiency video coding) standards were published in 2013 (version 1), 2014 (version 2), 2015 (version 3) and 2016 (version 4) by ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11). Since then, they have been studying the potential need for standardization of future video coding techniques whose compression capabilities significantly exceed those of the HEVC standard (including extensions thereof). In 2017, month 10, they released a joint proposal symptom (CfP, Call for prosusals) on video compression with capabilities exceeding HEVC. By 2018, day 2, month 15, 22 CfP responses for Standard Dynamic Range (SDR), 12 CfP responses for High Dynamic Range (HDR), and 12 CfP responses for 360 video categories were submitted, respectively. In 2018, 4 months, all received CfP responses were evaluated at 122 MPEG/jvt (joint video exploration group-joint video experts group) conference. Through careful evaluation, jfet formally initiated standardization of next generation Video Coding other than HEVC, so-called universal Video Coding (VVC). The current version of the VVC Test Model (VTM) is VTM 1.

Even if multiple wires are available, various technical problems remain in the art. For example, there is a technical problem of finding that the first reference line is still the most selected line. However, each block with the first reference line always needs to signal one bin to indicate the line index of the current block.

Furthermore, multi-line intra prediction is applied only to luma intra prediction. The potential coding gain of multiline intra prediction with chroma components is not utilized.

Furthermore, reference samples with different line indices may have different characteristics, and thus setting the same number of intra prediction modes for different reference lines is not the best choice.

Furthermore, for multi-line intra prediction, pixels of multiple neighboring lines have been stored and accessed. However, the pixels in the current line are not smoothed with the pixels in the adjacent line.

Furthermore, for multi-line intra prediction, the encoder selects one reference line to predict the pixel values in the current block. However, the trend of the neighboring pixels is not utilized to predict the samples in the current block.

Furthermore, for multi-line intra prediction, when the number >1, there is no plane or DC mode. The exploration of other versions of DC or planar modes has not been fully utilized.

In addition, multiline reference pixels are applied for intra prediction. However, there are other places where reference pixels are also used, but the coding gain of the multiline reference pixels is not utilized.

Therefore, a technical solution is required to solve these problems.

Disclosure of Invention

Including a method and apparatus comprising a memory configured to store computer program code and one or more hardware processors configured to access the computer program code and to operate as directed by the computer program code. The computer program includes an intra prediction code, an intra prediction mode code, and a most probable mode code. The intra-prediction code is configured to cause the processor to encode or decode a video sequence by performing intra-prediction included in a plurality of reference lines of the video sequence. The intra prediction mode code is configured to cause the processor to set an intra prediction mode for a first reference line (zero reference line) nearest to a current block intra predicted in a plurality of non-zero reference lines. The most probable mode code is configured to cause the processor to set one or more most probable modes for a second reference line of the plurality of non-zero reference lines.

According to an exemplary embodiment, the program code further includes signaling code. The signaling code is configured to cause the processor to signal a reference line index prior to signaling a most probable mode flag and an intra mode, signal a most probable mode flag in response to determining that the reference line index is signaled and the signaled index indicates a zero reference line, and deduce the most probable mode flag to true without signaling the most probable mode flag and signal a most probable mode index for the current block in response to determining that the reference line index is signaled and the signaled index indicates at least one of a plurality of non-zero reference lines.

According to an exemplary embodiment, the most probable mode code is further configured to cause the processor to include one or more most probable modes in the most probable mode list, and exclude the planar mode and the DC mode from the most probable mode list.

According to an exemplary embodiment, the most probable mode code is further configured to cause the processor to set a length of the most probable mode list based on the reference line index value, such that the length of the most probable mode list comprises a number of the one or more most probable modes.

According to an exemplary embodiment, the most probable mode code is further configured to cause the processor to set a length of the most probable mode list to 1 or 4 in response to detecting the non-zero reference line, and to set the length of the most probable mode list to 3 or 6 in response to determining that the current reference line is a zero reference line.

According to an exemplary embodiment, the most probable mode code is further configured to cause the processor to set a length of the most probable mode list to include the 5 most probable modes in response to detecting the non-zero reference line.

According to an exemplary embodiment, wherein one of the plurality of non-zero reference lines is adjacent to the current block and is further away from the current block than the zero reference line.

According to an exemplary embodiment, the one or more most probable modes include any level of most probable modes from the lowest level most probable mode to the highest level most probable mode.

According to an exemplary embodiment, the one or more most probable modes include only the levels of most probable modes allowed by the non-zero reference line.

Drawings

Further features, properties, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:

fig. 1-8 are schematic illustrations according to embodiments.

Fig. 9-14 are simplified flow diagrams according to embodiments.

Fig. 15 is a schematic illustration according to an embodiment.

Fig. 16-25 are simplified flow diagrams according to embodiments.

Fig. 26 is a schematic illustration according to an embodiment.

Detailed Description

The proposed features discussed below can be used alone or in any combination thereof. Furthermore, embodiments may be implemented by processing circuitry (e.g., one or more processors, or one or more integrated circuits). In one example, one or more processors execute a program stored in a non-transitory computer readable medium. In the present disclosure, a Most Probable Mode (MPM) may refer to a primary MPM, a secondary MPM, or both the primary MPM and the secondary MPM.

Fig. 1 is a simplified block diagram of a communication system 100 in accordance with an embodiment of the present disclosure. The communication system 100 may comprise at least two terminals 102 and 103 interconnected via a network 105. For unidirectional transmission of data, the first terminal 103 may encode video data at a local location for transmission to another terminal 102 over the network 105. The second terminal 102 may receive encoded video data of another terminal from the network 105, decode the encoded data, and display the restored video data. Unidirectional data transmission may be common in media service applications and the like.

Fig. 1 shows a second pair of terminals 101 and 104 provided for supporting a bi-directional transmission of encoded video, which may occur, for example, during a video conference. For bi-directional transmission of data, each terminal 101 and 104 may encode video data captured at a local location for transmission to another terminal via the network 105. Each of the terminals 101 and 104 may also receive encoded video data transmitted by another terminal, may decode the encoded data, and may display the restored video data on a local display device.

In fig. 1, the terminals 101, 102, 103, and 104 may be illustrated as a server, a personal computer, and a smartphone, but the principles of the present disclosure are not limited thereto. Embodiments of the present disclosure are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing devices. Network 105 represents any number of networks that communicate encoded video data between terminals 101, 102, 103, and 104, including, for example, a wired communication network and/or a wireless communication network. The communication network 105 may exchange data in circuit-switched channels and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the internet. For purposes of this discussion, the architecture and topology of the network 105 may not be important to the operation of the present disclosure, unless explained below.

As an example of an application of the disclosed subject matter, fig. 2 illustrates the placement of a video encoder and decoder in a streaming environment. The disclosed subject matter may be equally applicable to other video-enabled applications including, for example, video conferencing, digital television, storing compressed video on digital media including CDs, DVDs, memory banks, etc., and so forth.

The streaming system may include an acquisition subsystem 203, which acquisition subsystem 203 may include a video source 201 (e.g., a digital camera) that creates, for example, a stream 213 of uncompressed video samples. A sample stream 213 depicted as a thick line to emphasize a high data amount may be processed by an encoder 202 coupled to the camera 201 compared to an encoded video stream. The encoder 202 may comprise hardware, software, or a combination of hardware and software to implement or embody aspects of the disclosed subject matter as described in greater detail below. The encoded video codestream 204, depicted as a thin line to emphasize lower data amounts, may be stored on the streaming server 205 for future use as compared to the sample stream. One or more streaming clients 212 and 207 may access the streaming server 205 to retrieve a copy 208 and a copy 206 of the encoded video codestream 204. The client 212 may include a video decoder 211, which video decoder 211 decodes an input copy 208 of the encoded video codestream and creates an output stream of video samples 210 that may be presented on a display 209 or other presentation device (not depicted). In some streaming systems, video streams 204, 206, and 208 may be encoded according to some video encoding/compression standard. Examples of such standards are noted above and are further described herein.

Fig. 3 may be a functional block diagram of a video decoder 300 according to an embodiment of the present invention.

The receiver 302 may receive one or more encoded video sequences to be decoded by the decoder 300; in the same or another embodiment, the encoded video sequences are received one at a time, wherein each encoded video sequence is decoded independently of the other encoded video sequences. The encoded video sequence may be received from a channel 301, which channel 301 may be a hardware/software link to a storage device that stores encoded video data. The receivers 302 may receive encoded video data, which may be forwarded to their respective usage entities (not depicted), as well as other data, such as encoded audio data and/or auxiliary data streams. The receiver 302 may separate the encoded video sequence from other data. To prevent network jitter, a buffer memory 303 may be coupled between the receiver 302 and an entropy decoder/parser 304 (hereinafter "parser"). The buffer memory 303 may not need to be configured or may be made smaller when the receiver 302 is receiving data from a store/forward device with sufficient bandwidth and controllability or from an isochronous network. For use on a traffic packet network like the internet, a buffer 303 may also be needed, which may be relatively large and may advantageously be of adaptive size.

The video decoder 300 may comprise a parser 304 to reconstruct symbols 313 from the entropy encoded video sequence. The categories of these symbols include information for managing the operation of the decoder 300, as well as potential information for controlling a rendering device, such as a display 312, which is not an integral part of the decoder, but may be coupled to the decoder. The control Information for the rendering device(s) may be supplemental enhancement Information (SEI message) or Video Usability Information (VUI) parameter set fragment (not depicted). Parser 304 may parse/entropy decode the received encoded video sequence. The encoding of the encoded video sequence may be in accordance with video coding techniques or standards and may follow principles well known to those skilled in the art, including variable length coding, huffman coding, arithmetic coding with or without environmental sensitivity, and the like. The parser 304 may extract a subgroup parameter set for at least one subgroup of subgroups of pixels in a video decoder from the encoded video sequence based on at least one parameter corresponding to the group. A subgroup may include a Group of Pictures (GOP), a picture, a tile, a slice, a macroblock, a Coding Unit (CU), a block, a Transform Unit (TU), a Prediction Unit (PU), and so on. The entropy decoder/parser may also extract information such as transform coefficients, quantizer parameter values, motion vectors, etc. from the encoded video sequence.

Parser 304 may perform entropy decoding/parsing operations on the video sequence received from buffer memory 303, thereby creating symbol 313. Parser 304 may receive the encoded data and selectively decode particular symbols 313. In addition, the parser 304 may determine whether to provide a specific symbol 313 to the motion compensation prediction unit 306, the scaler/inverse transform unit 305, the intra prediction unit 307, or the loop filter 311.

The reconstruction of the symbol 313 may involve a number of different units depending on the type of encoded video picture or portion thereof (e.g., inter and intra pictures, inter and intra blocks), and other factors. Which units are involved and the way in which they are involved can be controlled by the subgroup control information parsed from the coded video sequence by the parser 304. For clarity, such a subgroup control information flow between parser 304 and various elements below is not described.

In addition to the functional blocks already mentioned, the decoder 200 may be conceptually subdivided into a plurality of functional units as described below. In a practical embodiment operating under business constraints, many of these units interact closely with each other and may be at least partially integrated with each other. However, for the purpose of describing the disclosed subject matter, it is appropriate to subdivide the concepts into the following functional units.

The first unit is a sealer/inverse transform unit 305. The sealer/inverse transform unit 305 receives the quantized transform coefficients as symbol(s) 313 from the parser 304 along with control information including which transform to use, block size, quantization factor, quantization scaling matrix, etc. Scaler/inverse transform unit 305 may output a block including sample values, which may be input into aggregator 310.

In some cases, the output samples of sealer/inverse transform 305 may belong to intra-coded blocks, i.e., blocks that do not use prediction information from previously reconstructed pictures, but may use prediction information from previously reconstructed portions of the current picture. Such prediction information may be provided by the intra picture prediction unit 307. In some cases, the intra picture prediction unit 307 uses surrounding reconstructed information extracted from the current (partially reconstructed) picture 309 to generate a block having the same size and shape as the block being reconstructed. In some cases, aggregator 310 adds the prediction information that intra prediction unit 307 has generated to the output sample information provided by scaler/inverse transform unit 305 on a per sample basis.

In other cases, the output samples of sealer/inverse transform unit 305 may belong to inter-coded and possibly motion compensated blocks. In this case, the motion compensated prediction unit 306 may access the reference picture store 308 to extract samples for prediction. After motion compensating the extracted samples according to the sign 313 associated with the block, these samples may be added by the aggregator 310 to the output of the sealer/inverse transform unit (in this case referred to as residual samples or residual signals) to generate output sample information. The address in the reference picture memory from which the motion compensation unit takes the prediction samples may be controlled by a motion vector which may be used by the motion compensation unit in the form of a symbol 313, which symbol 313 may have for example X, Y and a reference picture component. Motion compensation may also include interpolation of sample values extracted from reference picture memory when using sub-sample exact motion vectors, motion vector prediction mechanisms, etc.

The output samples of aggregator 310 may be employed in loop filter unit 311 by various loop filtering techniques. The video compression technique may comprise an in-loop filtering technique controlled by parameters included in the encoded video stream and which may be used as symbols 313 from the parser 304 for the in-loop filtering unit 311. However, the video compression techniques may also be responsive to meta-information obtained during decoding of previous (in decoding order) portions of the encoded picture or encoded video sequence, as well as to previously reconstructed and loop filtered sample values.

The output of the loop filter unit 311 may be a sample stream, which may be output to the rendering device 312 and may be stored in the reference picture memory 557 for subsequent inter-picture prediction.

Some coded pictures, once fully reconstructed, may be used as reference pictures for future prediction. Once the encoded picture is fully reconstructed and identified (e.g., by parser 304) as a reference picture, current reference picture 309 may become part of reference picture buffer 308 and new current picture memory may be reallocated before reconstruction of a subsequent encoded picture begins.

The video decoder 300 may perform decoding operations according to a predetermined video compression technique, which may be recorded in a standard such as the ITU-t h.265 recommendation. The encoded video sequence may conform to the syntax specified by the video compression technique or standard used, in the sense of following the syntax of the video compression technique or standard as specified in the video compression technique document or standard (particularly the configuration file therein). For compliance, the complexity of the encoded video sequence is also required to be within the limits defined by the level of the video compression technique or standard. In some cases, the hierarchy limits the maximum picture size, the maximum frame rate, the maximum reconstruction sampling rate (measured in units of, e.g., mega samples per second), the maximum reference picture size, and so on. In some cases, the limits set by the hierarchy may be further defined by a Hypothetical Reference Decoder (HRD) specification and metadata signaled HRD buffer management in the encoded video sequence.

In one embodiment, receiver 302 may receive additional (redundant) data along with the encoded video. The additional data may be part of the encoded video sequence(s). The video decoder 300 may use the additional data to properly decode the data and/or more accurately reconstruct the original video data. The additional data may be presented in the form of, for example, a temporal, spatial, or signal-to-noise ratio (SNR) enhancement layer, a redundant slice, a redundant picture, a forward error correction code, etc.

Fig. 4 may be a functional block diagram of a video encoder 400 according to an embodiment of the present disclosure.

Encoder 400 may receive video samples from video source 401 (not part of the encoder), and video source 401 may capture video image(s) to be encoded by encoder 400.

The video source 401 may provide a source video sequence in the form of a digital video sample stream to be encoded by the encoder 303, which may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, etc.), any color space (e.g., bt.601Y CrCB, RGB, etc.), and any suitable sampling structure (e.g., Y CrCB 4: 2: 0, Y CrCB 4: 4: 4). In the media service system, the video source 401 may be a storage device that stores previously prepared video. In a video conferencing system, the video source 401 may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures that are given motion when viewed in sequence. The picture itself may be constructed as an array of spatial pixels, where each pixel may comprise one or more samples, depending on the sampling structure, color space, etc. used. The relationship between the pixel and the sample can be easily understood by those skilled in the art. The following description focuses on the sample.

According to an embodiment, encoder 400 may encode and compress pictures of a source video sequence into an encoded video sequence 410 in real time or under any other temporal constraints required by the application. It is a function of the controller 402 to implement the appropriate encoding speed. The controller controls other functional units as described below and is functionally coupled to these units. For clarity, the coupling is not labeled in the figures. The parameters set by the controller may include rate control related parameters (picture skip, quantizer, lambda value of rate distortion optimization technique, etc.), picture size, GOP layout, maximum motion vector search range, etc. Other functions of the controller 402 may be readily identified by those skilled in the art, as these functions may relate to the video encoder 400 optimized for a certain system design.

Some video encoders operate in a manner that is readily recognized by those skilled in the art as an "encoding loop". As a simple description, the encoding loop may consist of an encoding part of the encoder 402 (hereinafter referred to as "source encoder") (responsible for creating symbols based on the input picture to be encoded and the reference picture (s)) and a (local) decoder 406 embedded in the encoder 400, which (local) decoder 406 reconstructs the symbols to create sample data that the (remote) decoder will also create (since in the video compression technique considered by the presently disclosed subject matter any compression between the symbols and the encoded video code stream is lossless). The reconstructed sample stream is input to a reference picture memory 405. Since the decoding of the symbol stream produces bit accurate results independent of decoder location (local or remote), the reference picture buffer contents also correspond bit accurately between the local encoder and the remote encoder. In other words, the reference picture samples that the prediction portion of the encoder "sees" are identical to the sample values that the decoder would "see" when using prediction during decoding. This reference picture synchronization philosophy (and the drift that occurs if synchronization cannot be maintained, e.g., due to channel errors) is well known to those skilled in the art.

The operation of the "local" decoder 406 may be the same as the operation of the "remote" decoder 300, which has been described in detail above in connection with fig. 3. Referring briefly also to fig. 4, when symbols are available and the entropy encoder 408 and parser 304 are able to losslessly encode/decode the symbols into an encoded video sequence, the parts including the channel 301, the receiver 302, the buffer 303 and the parser 304 may not be fully implemented in the local decoder 406.

At this point it can be observed that any decoder technique other than the parsing/entropy decoding present in the decoder must also be present in the corresponding encoder in substantially the same functional form. The description of the encoder techniques may be reduced because the encoder techniques are reciprocal to the fully described decoder techniques. A more detailed description is only needed in certain areas and is provided below.

As part of the operation, the source encoder 403 may perform motion compensated predictive coding. The motion compensated predictive coding predictively codes an input frame with reference to one or more previously coded frames from the video sequence that are designated as "reference frames". In this way, the encoding engine 407 encodes the difference between the pixel block of the input frame and the pixel block of the reference frame(s), which may be selected as prediction reference(s) for the input frame.

The local video decoder 406 may decode encoded video data for a frame that may be designated as a reference frame based on the symbols created by the source encoder 403. The operation of the encoding engine 407 may advantageously be a lossy process. When the encoded video data can be decoded at a video decoder (not shown in fig. 4), the reconstructed video sequence may typically be a copy of the source video sequence with some errors. The local video decoder 406 replicates the decoding process that may be performed on the reference frames by the video decoder and may cause the reconstructed reference frames to be stored in the reference picture cache 405. In this way, the encoder 400 can locally store a copy of the reconstructed reference frame that has common content (no transmission errors) with the reconstructed reference frame that will be obtained by the far-end video decoder.

Predictor 404 may perform a prediction search against coding engine 407. That is, for a new frame to be encoded, the predictor 404 may search the reference picture memory 405 for sample data (as candidate reference pixel blocks) or some metadata (e.g., reference picture motion vectors, block shapes, etc.) that may be referenced as appropriate predictions for the new picture. The predictor 404 may operate on a block-by-block basis of samples to find a suitable prediction reference. In some cases, an input picture may have prediction references derived from multiple reference pictures stored in reference picture store 405, as determined by search results obtained by predictor 404.

The controller 402 may manage the encoding operations of the video encoder 403 including, for example, setting parameters and subgroup parameters for encoding the video data.

The outputs of all of the above functional units may be entropy encoded in an entropy encoder 408. The entropy encoder losslessly compresses the symbols generated by the various functional units according to techniques known to those skilled in the art (e.g., huffman coding, variable length coding, arithmetic coding, etc.) to convert the symbols into an encoded video sequence.

The transmitter 409 may buffer the encoded video sequence(s) created by the entropy encoder 408 in preparation for transmission over a communication channel 411, which communication channel 411 may be a hardware/software link to a storage device that will store the encoded video data. The transmitter 409 may combine the encoded video data from the video encoder 403 with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (sources not shown).

The controller 402 may manage the operation of the encoder 400. During encoding, the controller 405 may assign a certain encoded picture type to each encoded picture, which may affect the encoding techniques applicable to the respective picture. For example, a picture may be generally assigned to any of the following frame types:

Intra pictures (I pictures), which may be pictures that can be encoded and decoded without using any other frame in the sequence as a prediction source. Some video codecs tolerate different types of intra pictures, including, for example, Independent Decoder Refresh (IDR) pictures. Those skilled in the art are aware of variations of picture I and their corresponding applications and features.

A predictive picture (P picture), which may be a picture that can be encoded and decoded using intra prediction or inter prediction that uses at most one motion vector and a reference index to predict sample values of each block.

Bi-predictive pictures (B-pictures), which may be pictures that can be encoded and decoded using intra prediction or inter prediction that uses at most two motion vectors and a reference index to predict sample values of each block. Similarly, multiple predictive pictures may use more than two reference pictures and associated metadata for reconstructing a single block.

A source picture may typically be spatially subdivided into blocks of samples (e.g., blocks of 4 × 4, 8 × 8, 4 × 8, or 16 × 16 samples) and encoded block-by-block. These blocks may be predictively encoded with reference to other (encoded) blocks determined according to the encoding allocation applied to the respective pictures of the block. For example, blocks of an I picture may be non-predictive coded, or they may be predictive coded (spatial prediction or intra prediction) with reference to already coded blocks of the same picture. The pixel block of the P picture can be prediction-encoded by spatial prediction or by temporal prediction with reference to one previously-encoded reference picture. A block of a B picture may be prediction-coded by spatial prediction or by temporal prediction with reference to one or two previously-coded reference pictures.

Video encoder 400 may perform encoding operations according to a predetermined video encoding technique or standard, such as the ITU-T h.265 recommendation. In operation, the video encoder 400 may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. Thus, the encoded video data may conform to syntax specified by the video coding technique or standard used.

In one embodiment, the transmitter 409 may transmit additional data along with the encoded video. Video encoder 403 may treat such data as part of an encoded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, etc.

Fig. 5 shows intra prediction modes used in HEVC and JEM. To obtain arbitrary edge directions present in natural video, the number of directional intra modes extends from 33 to 65 used in HEVC. The additional directional modes in JEM at the top of HEVC are depicted in fig. 1(b) with dashed arrows, and the planar mode and DC mode remain unchanged. These denser directional intra prediction modes are applicable to all block sizes, as well as to luma intra prediction and chroma intra prediction. As shown in fig. 5, the directional intra prediction mode associated with the odd intra prediction mode index, which is identified by the dotted arrow, is referred to as an odd intra prediction mode. The directional intra-prediction mode associated with the even intra-prediction mode index identified by the solid arrow is referred to as an even intra-prediction mode. The directional intra prediction mode, as indicated by the solid-line arrow or the dashed-line arrow in fig. 5, is also referred to herein as an angular mode (angular mode).

In JEM, a total of 67 intra prediction modes are used for luminance intra prediction. To encode an intra-mode, a size 6 MPM list is built based on the intra-modes of neighboring blocks. If the intra mode is not from the MPM list, a flag is signaled to indicate whether the intra mode belongs to the selected mode. In JEM-3.0, there are 16 selected modes, which are collectively selected as every fourth angular mode. In JFET-D0114 and JFET-G0060, 16 auxiliary MPMs are derived to replace the uniformly selected mode.

Fig. 6 shows N reference layers utilized in intra directional mode. There is a chunk unit 611, a fragment a 601, a fragment B602, a fragment C603, a fragment D604, a fragment E605, a fragment F606, a first reference layer 610, a second reference layer 209, a third reference layer 608, and a fourth reference layer 607.

In HEVC and JEM, as well as some other standards (e.g., h.264/AVC), the reference samples used to predict the current block are restricted to the nearest reference line (row or column). In the method of multi-reference line intra prediction, the number of candidate reference lines (rows or columns) is increased from 1 (i.e., most recent) to N for intra directional mode, where N is an integer greater than or equal to 1. Fig. 2 illustrates the concept of the multi-line intra directional prediction method, with a 4 × 4 Prediction Unit (PU) as an example. The intra directional mode may arbitrarily select one of the N reference layers to generate the predictor. In other words, the predictor p (x, y) is generated from one of the reference samples S1, S2. A flag is signaled indicating which reference layer was selected for intra directional mode. If N is set to 1, the intra directional prediction method is the same as the conventional method in JEM 2.0. In fig. 6, the reference lines 610, 609, 608 and 607 are composed of six segments 601, 602, 603, 604, 605 and 606 and the top left reference sample. The reference layer is also referred to herein as a reference line. The coordinates of the upper-left pixel within the current block unit are (0, 0), and the coordinates of the upper-left pixel in the first reference line are (-1, -1).

In JEM, for the luminance component, neighboring samples for intra prediction sample generation are filtered prior to the generation process. The filtering is controlled by the given intra prediction mode and transform block size. If the intra prediction mode is DC or the transform block size is equal to 4 × 4, the neighboring samples are not filtered. If the distance between a given intra-prediction mode and a vertical mode (or horizontal mode) is greater than a predefined threshold, then the filtering process is enabled. For adjacent sample filtering, a [1, 2, 1] filter and a bilinear filter are used.

A position-dependent intra prediction combination (PDPC) method is an intra prediction method that invokes a combination of unfiltered boundary reference samples and HEVC-type intra prediction with filtered boundary reference samples. The calculation of each prediction sample pred [ x ] [ y ] at (x, y) is as follows:

pred[x][y]＝(wL*R_-1,y+wT*R_x,-1+wTL*R_-1,-1+(64-wL-wT-wTL)*pred[x][y]+32)＞＞6

(equation 2-1)

Wherein R is_x，-1，R_-1，yRespectively representing unfiltered reference samples located at the top and left of the current sample (x, y), and R_-1，-1Representing a position in the upper left corner of the current blockOf the reference sample. The weights are calculated as follows:

wT 32 > ((y < 1) > shift) (equation 2-2)

wL 32 > ((x < 1) > shift) (equation 2-3)

wTL ═ wL > 4) - (wT > 4 (EQUATION 2-4)

shift (log2(width) + log2(height) +2) > 2 (equation 2-5)

Fig. 7 shows a graph 700 of the weights (wL, wT, wTL) for the (0, 0) and (1, 0) positions within a 4 x 4 block.

Fig. 8 shows a Local Illumination Compensation (LIC) diagram 800 using a scaling factor a and an offset b, and fig. 8 is based on a linear model for illumination variation. And adaptively enabling or disabling local illumination compensation for each inter-mode coded Coding Unit (CU).

When LIC is applied to a CU, parameters a and b are derived using a least squares error method using neighboring samples of the current CU and their corresponding reference samples. More specifically, as shown in fig. 8, neighboring samples of sub-samples (2: 1 sub-samples) of a CU and corresponding samples in a reference picture (identified by motion information of the current CU or sub-CU) are used. IC parameters are derived and applied separately for each prediction direction.

When a CU is encoded in merge mode, the LIC flag is copied from neighboring blocks in a similar way as the motion information copy in merge mode; otherwise, the LIC flag is signaled to the CU to indicate whether LIC is applied.

Fig. 9 shows a flowchart 900 according to an example embodiment.

At block S901, for multi-line intra prediction, the number of reference layers may be adaptively selected for each block instead of setting the same number of reference layers for all blocks. Here, the index of the nearest reference line is denoted as 1.

At block S902, the number of reference layers of the current block may be determined using the block sizes of the upper/left blocks. For example, if the size of the upper and/or left block is greater than M × N, the number of reference layers of the current block is limited to L. M and N may be 4, 8, 16, 32, 64, 128, 256, and 512. L may be 1 to 8.

In one embodiment, L is set to 1 when M and/or N is equal to or greater than 64.

In another embodiment, the ratio of the number of upper candidate reference lines to the number of left candidate reference columns is the same as the ratio of block width to block height. For example, if the size of the current block is MxN, the number of candidates for the upper reference line is M, and the number of candidates for the left reference column is N, then M: N is M: N.

Alternatively, at block S903, the number of reference layers of the current block may be determined using the positions of the last coefficients of the blocks on the left and top. For example, if the position of the last coefficient is within the first MxN region for the upper and/or left block, the number of reference layers for the current block is limited to L (e.g., L may be 1 ~ 8), and M and N may be 1 ~ 1024.

In one embodiment, when there are no coefficients in the block above and/or on the left, the number of reference layers for the current block is limited to 1.

In another embodiment, when the coefficients in the upper and/or left blocks are within the 2 x 2 upper-left region, the number of reference layers for the current block is limited to 1-2.

Alternatively, at block S904, the number of reference layers of the current block may be determined using pixel values of reference samples in the upper and/or left block. For example, if the index is L_iIs given by reference line and index L_j(L_i<L_j) Will be deleted from the reference line list if the difference between the reference lines of (1) is small_j。L_iAnd L_jCan be 1 to 8. In some cases, a number of reference lines greater than 1 will be deleted since the difference between all reference lines is small. Methods of measuring the difference between two reference lines include, but are not limited to, gradient, SATD, SAD, MSE, SNR, and PSNR.

In one embodiment, if L_iAnd L_jIs less than 2, the reference line L is deleted from the reference line list_j。

Alternatively, at block S905, the number of reference layers of the current block may be determined using the prediction mode of the mode information on the upper and/or left side.

In one embodiment, if the prediction mode of the block above and/or left is the skip mode, the number of reference layers of the current block is limited to L. L may be 1 to 8.

Fig. 10 shows a flowchart 1000 according to an exemplary embodiment.

At block S1001, for either a separate tree or the same tree, the reference line index for chroma may be derived from luma. Here, the index of the nearest reference line is denoted as 1.

At block S1002, for the same tree, if the reference line index of the co-located luma block is ≧ 3, the reference line index of the current chroma block is set to 2. Otherwise, the reference line index of the current chroma block is set to 1.

At block S1003, for the split tree, if the chroma block covers only one block in the luma component, the derivation algorithm of the reference line index is the same as 2. a. If the chroma block covers multiple blocks in the luma component, the derivation algorithm of the reference line index may be one of:

for co-located blocks in the luma component, if the reference line indices of most of these blocks are less than 3, the reference line index of the current chroma block is derived to be 1. Otherwise, the reference line index of the current chroma block is derived to be 2. Most methods of measuring may include, but are not limited to, the region size of the block and the number of blocks.

Alternatively, for a co-located block in the luminance component, if the reference line index of one block is equal to or greater than 3, it is derived that the reference line index of the current chrominance block is 2. Otherwise, the reference line index of the current chroma block is derived to be 1.

Alternatively, for co-located blocks in the luma component, if the reference line indices of most of these blocks are less than 3, the reference line index of the current chroma block is derived to be 1. Otherwise, the reference line index of the current chroma block is derived to be 2.

Alternatively, at block S1004, consider whether to use selfAdaptive selection, and if adaptive selection is used, the method in fig. 9 may also be used to limit the number of reference layers for the current chroma block. After applying the method of fig. 9, the number of reference layers is set to L_C1. Then, the derivation algorithm in S1002 and S1003 or S1005 and S1006 in fig. 10 is also applied to obtain the derivation algorithm for the current block L_C2The line index of (c). Then, min (L)_C1，L_C2) Is the final reference line index of the current chroma block.

Fig. 11 shows a flowchart 1100 according to an example embodiment.

At block S1101, different reference lines are considered to have different numbers of intra prediction modes. Here, the index of the nearest reference line is denoted as 1.

For example, the first reference line has 67 patterns, the second reference line has 35 patterns, the third reference line has 17 patterns, and the fourth reference line has 9 patterns.

For example, the first reference line has 67 patterns, the second reference line has 33 patterns, the third reference line has 17 patterns, and the fourth reference line has 9 patterns.

Alternatively, at block S1102, reference lines with indices greater than 1 share the same number of intra modes, but are much fewer than the number of modes of the first reference line, e.g., equal to or less than half of the intra prediction mode of the first reference line.

At block S1103, for example, for reference lines with indices greater than 1, only directional intra prediction modes with even mode indices are allowed. As shown in fig. 5, directional intra-prediction modes with odd mode indices are marked with dashed arrows, while directional intra-prediction modes with even mode indices are marked with solid arrows.

At block S1104, in another example, for reference lines with an index greater than 1, only directional intra prediction modes with even mode indices and DC and planar modes are allowed.

In block S1105, in another example, the non-zero reference lines only allow Most Probable Mode (MPM), which includes a first level MPM and a second level MPM.

At block S1106, in another example, since only the reference line indices greater than 1 are enabled for the even (or odd mode) intra prediction modes, when encoding the intra prediction mode, if the reference line indices greater than 1 are signaled, the intra prediction modes (e.g., the plane/DC intra prediction mode, and the odd (or even) intra prediction modes) are excluded from the MPM derivation sum list, from the second level MPM derivation sum list, and from the remaining non-MPM mode list.

At block S1107, after the intra prediction mode is signaled, the reference line index is signaled, and whether the reference line index is signaled depends on the signaled intra prediction mode.

For example, for reference lines with an index greater than 1, only directional intra prediction modes with even mode indices are allowed. If the signaled intra prediction mode is directional prediction with even mode index, the selected reference line index is signaled. Otherwise, only one default reference line (e.g., the most recent reference line) is allowed to intra predict and no index is signaled.

In another example, for reference lines with an index greater than 1, only the Most Probable Mode (MPM) is allowed. If the signaled intra prediction is from the MPM, the selected reference line index needs to be signaled. Otherwise, only one default reference line (e.g., the most recent reference line) is allowed to intra predict and no index is signaled.

In another sub-embodiment, reference lines with indices greater than 1 are still enabled for all directional intra-prediction modes or all intra-prediction modes, and the intra-prediction mode index may be referenced to the context in which the reference line index is entropy encoded.

In another embodiment, for reference lines with an index greater than 1, only the Most Probable Mode (MPM) is allowed. In one approach, all MPMs are allowed to be available for reference lines with indices greater than 1. In another approach, for multiple reference lines with indices greater than 1, a subset of MPMs is allowed to be employed. When the MPMs are divided into multiple levels, in one approach, for reference lines with indices greater than 1, only certain levels of MPMs are allowed to be employed. In one example, for reference lines with an index greater than 1, only the lowest level MPM is allowed to be employed. In another example, for reference lines with an index greater than 1, only the highest level MPM is allowed to be employed. In another example, for reference lines with an index greater than 1, only predefined (or signaled, indicated) levels of MPM are allowed to be employed.

In another embodiment, for reference lines with an index greater than 1, only non-MPM is allowed. In one approach, for reference lines with indices greater than 1, all non-MPMs are allowed to be employed. In another approach, for multiple reference lines with indices greater than 1, a subset of non-MPMs is allowed to be employed. In one example, for reference lines with indices greater than 1, only non-MPMs associated with even (or odd) indices in descending (or ascending) order of all non-MPM intra-mode indices are allowed to be employed.

In another embodiment, the planes and DC modes are assigned predefined indices of the MPM mode list.

In one example, the predefined index also depends on coding information including, but not limited to, block width and height.

In another sub-embodiment, for reference lines with indices greater than 1, MPM with a given index is allowed. A given MPM index may be signaled or specified as a high level syntax element, for example in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, or as a common syntax element or parameter for a region of a picture. The reference line index is signaled only when the intra-mode of the current block is equal to one of the given MPM indices.

For example, the length of the MPM list is 6, and the indexes of the MPM list are 0, 1, 2, 3, 4, and 5. If the intra-mode of the current block is not equal to the modes with MPM indices of 0 and 5, a reference line with an index greater than 1 is allowed.

At block S1108, in one embodiment, all intra prediction modes are allowed for the nearest reference line of the current block, while only the most probable mode is allowed for reference lines with indices greater than 1 (or a particular index value, e.g., 1).

At block S1109, in one embodiment, the most probable mode includes only first level MPMs, e.g., 3 MPMs in HEVC, 6 MPMs in JEM (or VTM).

At block S1110, in another embodiment, the most probable mode may be any level of MPMs from the lowest level of MPMs to the highest level of MPMs.

At block S1111, in another embodiment, for reference lines with an index greater than 1, only certain levels of MPM are allowed.

At block S1112, in another embodiment, the most probable mode may be only one level of MPM, such as the lowest level of MPM, the highest level of MPM, or a predefined level of MPM.

At block S1113, in another embodiment, the reference line index is signaled prior to the MPM identification and intra mode. When the signaled reference line index is 1, the MPM flag is also signaled. When the signaled reference line index is greater than 1, the MPM identity of the current block is not signaled, and the MPM identity of the current block is derived to be 1. For reference lines with an index greater than 1, the MPM index of the current block is still signaled.

At block S1114, in one embodiment, the MPM list generation process depends on the reference line index value.

In one example, the MPM list generation procedure for reference lines with indices greater than 1 is different from the MPM list generation procedure for reference lines with indices equal to 1. For reference lines with an index greater than 1, the planar mode and the DC mode are excluded from the MPM list. The length of the MPM list is the same for all reference lines.

The default MPM used in the MPM list generation process depends on the reference line index. In one example, the default MPM associated with reference lines having an index greater than 1 is different from the default MPM associated with reference lines having an index equal to 1.

At block S1115, in one embodiment, the length of the MPM list (i.e., the MPMs) depends on the reference line index value.

In another embodiment, the length of the MPM list having the reference line index value of 1 and the reference line index value greater than 1 is set to be different. For example, the length of the MPM list of reference lines having an index greater than 1 is 1 or 2 shorter than the length of the MPM list having a reference line index of 1.

In another embodiment, for reference line indices greater than 1, the length of the MPM list (i.e., the number of MPMs) is 5. When 65 corner modes are applied, the default MPM for the MPM list generation process is { VER, HOR, 2, 66, 34 }. The order of the default MPMs may be any combination of the 5 listed modes.

At block S1116, for angular intra prediction modes (e.g., odd directional intra prediction mode, and/or Planar (Planar)/DC mode) for which reference line indices have been signaled, a predictor for the current block may be generated using multi-line reference samples.

For angular intra prediction modes for which reference line indices have been derived (not signaled), the prediction sample values are generated using a weighted sum of a plurality of predictors, each of which is a predictor generated using one of a plurality of reference lines.

In one example, the weighted sum uses {3, 1} weights applied to predictors generated from the first reference line and the second reference line, respectively.

In another example, the weights depend on block size, block width, block height, sample position within the current block to be predicted, and/or intra prediction mode.

In one example, for a given angular prediction mode with odd indices, the first reference line is used to generate one prediction block unit Pred₁And the second reference line is used to generate another prediction block unit. Then, the final prediction value of each pixel in the current block unit is a weighted sum of the two generated prediction block units. This process can be represented by equation (4-1), where W is for all pixels in the same block_iThe values of (a) are the same. For different blocks, W_iMay be the same (independent of intra prediction mode and block size) or may depend on intra prediction mode and block size.

Alternatively, at block S1117, the number of intra prediction modes for each reference line is derived from the difference between the reference samples in that reference line. Methods of measuring differences include, but are not limited to, gradient, SATD, SAD, MSE, SNR, and PSNR.

If the upper row and the left column of the reference sample are very similar, the number of patterns can be reduced to 4, 9, 17 or 35 patterns. These 4 modes are a planar mode, a DC mode, a vertical mode, and a horizontal mode.

If only the upper row of the reference sample is very similar, the mode in the vertical-like prediction mode is downsampled. In a special case, only the mode 50 is retained, and the modes 35 to 49 and 51 to 66 are excluded. In order to make the total intra prediction mode 9, 17, or 35, the intra prediction modes in the horizontal direction are reduced accordingly.

Otherwise, if only the left columns of the reference sample are very similar, the pattern in the horizontal-like direction is downsampled. In a special case, only the mode 18 is retained, and the modes 2 to 17 and 19 to 33 are excluded. In order to make the total intra prediction mode 9, 17, or 35, the intra prediction mode in the vertical direction is reduced accordingly.

Fig. 12 shows a flowchart 1200 according to an example embodiment.

At block S1201, each sample in the current reference line is smoothed based on the neighboring sample in the current line and its neighboring reference line (S) of the current line. Here, the index of the nearest reference line is denoted as 1.

At block S1202, for each pixel in the current line, all pixels in the reference lines 1-L may be used to smooth the pixels in the current line. L is the maximum allowed reference line number for intra prediction, and L can be 1 ~ 8.

At block S1203, for the boundary pixels, they may or may not be filtered. If the boundary pixels are filtered, each boundary pixel in the same reference line uses the same filter. Boundary pixels in different reference lines may use different filters. For example, boundary pixels in a first reference line may be filtered by a [3, 2, 2, 1] filter, boundary pixels in a second reference line may be filtered by a [2, 3, 2, 1] filter, boundary pixels in a third reference line may be filtered by a [1, 2, 3, 2] filter, and boundary pixels in a fourth reference line may be filtered by a [1, 2, 2, 3] filter.

At block S1204, for other pixels, the pixels in each reference line may use the same filter, while the pixels in different reference lines may use different filters. Alternatively, different filters may be used for pixels in different locations for other pixels. But these filters are predefined and the encoder does not need to signal the index of the filter.

At block S1205, the filtering operation for each reference line may alternatively be an intra prediction mode and depend on the transform size. The filtering operation is enabled only when the intra-prediction mode and the transform size satisfy certain conditions. For example, when the transform size is equal to 4x4 or less, the filtering operation is disabled.

At block S1206, alternatively, the filter used to smooth each pixel may have an irregular filter support shape in addition to a rectangular shape. The filter support shape may be predefined and may depend on any information available to both the encoder and decoder, including but not limited to: reference line index, intra mode, block height and/or width.

Alternatively, at block S1207, for each pixel in the first reference line, the pixel may be smoothed using the pixels in the first and second reference lines. For each pixel in the second reference line, the pixel may be smoothed using the pixels in the first reference line, the second reference line, and the third reference line. For each pixel in the third reference line, the pixel in the second reference line, the third reference line, and the fourth reference line may be used to smooth the pixel. For each pixel in the fourth reference line, the pixel in the third reference line and the fourth reference line may be used to smooth the pixel. In other words, for pixels in the first and fourth reference lines, each pixel is filtered using pixels in two reference lines, and for pixels in the second and third reference lines, each pixel is filtered using pixels in three reference lines.

For example, the filtered pixels in the second and third reference lines may be calculated according to equations 4-2 through 4-5.

p′(x,y)＝(p(x-1,y)+p(x,y-1)+p(x,y+1)+p(x+1,y)+4*p(x,y))＞＞3

(equation 4-2)

p' (x, y) ═ p (x, y +1) -p (x, y-1) + p (x, y)) (equation 4-3)

p′(x,y)＝(p(x-1,y)+p(x-1,y-1)+p(x-1,y+1)+p(x,y-1)+p(x,y+1)+p(x+1,y-1)+p(x+1,y)+p(x+1,y+1)+8*p(x,y))＞＞4

(equations 4-4)

The filtered pixels in the first reference line may be calculated according to equations 4-6 through 4-10.

p′(x,y)＝(p(x-1,y)+p(x,y-1)+p(x+1,y)+5*p(x,y))＞＞3

(equations 4 to 6)

p′(x,y)＝(p(x-1,y)+p(x,y-1)+p(x+1,y)+p(x,y))＞＞2

(equations 4 to 7)

p′(x,y)＝(2p(x,y)-p(x,y-1))

(equations 4 to 8)

p′(x,y)＝(p(x-1,y)+p(x-1,y-1)+p(x,y-1)+p(x+1,y-1)+p(x+1,y)+3*p(x,y))＞＞3

(equations 4 to 9)

The filtered pixels in the fourth reference line may be calculated according to equations 4-11 through 4-15.

p′(x,y)＝(p(x-1,y)+p(x,y+1)+p(x+1,y)+5*p(x,y))＞＞3

(equations 4 to 11)

p′(x,y)＝(p(x-1,y)+p(x,y+1)+p(x+1,y)+p(x,y))＞＞2

(equations 4 to 12)

p′(x,y)＝(2p(x,y)-p(x,y+1))

(equations 4 to 13)

p′(x,y)＝(p(x-1,y)+p(x-1,y+1)+p(x,y+1)+p(x+1,y+1)+p(x+1,y)+3*p(x,y))＞＞3

(equations 4 to 14)

In addition, rounding (e.g., to zero, to positive infinity, or to negative infinity) may be added to the above calculations.

Fig. 13 shows a flowchart 1300 according to an example embodiment.

At block S1301, samples at different positions in the current block may use different combinations of reference samples with different cue predictions. Here, the index of the nearest reference line is denoted as 1.

At block S1302, for a given intra prediction mode, each reference line i may generate one prediction block Pred_i. For each pixel of the position, the mode may use these generated prediction blocks Pred_iTo generate a final prediction block. Specifically, for the pixel at position (x, y), the predicted value may be calculated using equations 4-16.

Wherein, W_iDepending on the location. In other words, the weighting factors for the same location are the same, and the weighting factors for different locations are different.

Alternatively, given an intra prediction mode, for each sample, a reference sample set is selected from a plurality of reference lines, and a weighted sum of these selected reference sample sets is calculated as the final prediction value. The selection of the reference samples may depend on the intra mode and the position of the prediction samples, and the weights may depend on the intra mode and the position of the prediction samples.

At block S1303, when applying reference line x to intra prediction, for each sample, the prediction values of reference line 0 and reference line x are compared, and if reference line 1 generates a very different prediction value, the prediction value from reference line x is excluded and reference line 0 may be used instead. Methods of measuring the difference between the predicted value of the current position and the predicted value of the neighboring position of the current position include, but are not limited to, gradient, SATD, SAD, MSE, SNR, and PSNR.

Alternatively, more than two predicted values are generated from different reference lines and the median (or average or most frequently occurring value) is used as the prediction sample.

At block S1304, when applying reference line x for intra prediction, the prediction values of reference line 1 and reference line x are compared for each sample, and if reference line 1 generates a very different prediction value, the prediction value from reference line x is excluded and reference line 1 may be used instead. Methods of measuring the difference between the predicted value of the current position and the predicted value of the neighboring position of the current position include, but are not limited to, gradient, SATD, SAD, MSE, SNR, and PSNR.

Alternatively, more than two predicted values are generated from different reference lines and the median (or average or most frequently occurring value) is used as the prediction sample.

Fig. 14 shows a flowchart 1400 according to an example embodiment.

At block S1401, after intra prediction, the prediction value of each block is filtered using pixels in a plurality of lines, instead of using only pixels in the nearest reference line. Here, the index of the nearest reference line is denoted as 1.

For example, at block S1402, the PDPC may be extended for multi-line intra prediction. The calculation of each prediction sample pred [ x ] [ y ] at (x, y) is as follows:

wherein m may be-8 to-2.

In one example, the samples in the current block are filtered using the reference samples in the two nearest lines. For the top left pixel, only the top left sample in the first row is used. Which can be represented by equations 4-18.

Alternatively, at block S1403, the boundary filter may be extended to multiple lines.

After DC prediction, filtering is performed by neighboring reference pixels for pixels in the first few columns and rows. The pixels in the first column may be filtered by equations 4-19.

For the pixels in the first row, the filtering operation is as follows:

In some special cases, the pixels in the first column may be filtered by equations 4-21.

p′(0,y)＝p(0,y)+R_-1,y-R_-2,y(equations 4 to 21)

The pixels in the first column may also be filtered by equations 4-22.

p′(x,0)＝p(x,0)+R_x,-1-R_x,-2(equations 4 to 22)

After vertical prediction, the pixels in the first few columns may be filtered by equations 4-23.

After horizontal prediction, the pixels in the first few rows may be filtered by equations 4-24.

In another embodiment, for vertical/horizontal prediction, if a reference line with index greater than 1 is used to generate the prediction samples, then the first column/row and its corresponding line with index greater than 1 are used for boundary filtering. As shown in fig. 15, by means of reference lines 1503, 1502 and a block unit 1501, a second reference line 1503 is used to generate prediction samples of the current block unit, and pixels having a vertical direction are used for vertical prediction. After vertical prediction, the first few columns in the current block unit are filtered using the pixels with diagonal texture in reference line 1 and the pixels with diagonal texture in reference line 1503. The filtering process may be represented by equations 4-25, where m represents the selected line index, and m may be 2-8. n is a right shift number, and n may be 1 to 8.

p' (x, y) ═ p (x, y) + (p (-1, y) -p (-1, -m)) > n (equations 4-25)

For horizontal prediction, the filtering process can be represented by equations 4-26.

p' (x, y) ═ p (x, y) + (p (x, -1) -p (-m, -1)) > n (equations 4-26)

In another embodiment, when using reference lines with indices greater than 1, pixels in the first few columns/rows of the current block unit are filtered using pixels along the diagonal direction from the first reference line to the current reference line after diagonal prediction (e.g., mode 2 and mode 34 in FIG. 1 (a)). Specifically, after mode 2 prediction, pixels in the first few rows may be filtered by equations 4-27. After mode 34 prediction, the pixels in the first few columns may be filtered by equations 4-28. m denotes a reference line index of the current block, and m may be 2 ~ 8. n is a right shift number, and n may be 2 to 8. W_iIs a weighting coefficient, and W_iIs an integer.

Fig. 16 shows a flowchart 1600 according to an example embodiment.

At block S1601, for multi-reference line intra prediction, when the reference line index is greater than 1, a modified DC mode and a planar mode are added. Here, the index of the nearest reference line is denoted as 1.

At block S1602, for planar mode, when different reference lines are used, different predefined upper-right and lower-left reference samples are used to generate prediction samples.

At block S1603, alternatively, when different reference lines are used, different intra smoothing filters are used.

At block S1604, for the DC mode, for the first reference line, the DC value is calculated using all pixels in the upper row and the left column, and when the reference line index is greater than 1, the DC value is calculated using only some of the pixels.

For example, the DC value of the second reference line is calculated using the upper pixels in the first reference line, the DC value of the third reference line is calculated using the left pixels in the first reference line, and the DC value of the fourth reference line is calculated using half of the upper pixels and half of the left pixels in the first reference line.

At block S1605, for DC mode, a DC predictor is computed using all reference pixels in all available candidate lines (rows and columns).

Fig. 17 shows a flowchart 1700 according to an example embodiment.

At block S1701, implementation is to extend multiple reference lines to the IC mode. At block S1702, the IC parameters are calculated using multiple upper/left reference lines, and at block S1703, which reference line was used is signaled to calculate the IC parameters.

Fig. 18 shows a flowchart 1800 according to an example embodiment.

At block S1801, implementations are made to signal a plurality of reference line indices.

In one embodiment, at block S1802, the reference line index is signaled using variable length coding. The closer to the current block, the shorter the codeword. For example, if the reference line indexes are 0, 1, 2, and 3, where 0 is closest to the current block and 3 is farthest from the current block, their codewords are 1, 01, 001, and 000, where 0 and 1 may be used alternately.

In another embodiment, at S1806, the reference line index is signaled using a fixed length code. For example, if the reference line indexes are 0, 1, 2, 3, where 0 is closest to the current block and 3 is farthest from the current block, their codewords are 10, 01, 11, 00, where 0 and 1 may be used alternately and the order may be changed.

At block S1803, considering whether the codeword table is used in various ways, if not, at block S1804, in yet another embodiment, the reference line indices are signaled using variable length coding, where the order of indices in the codeword table (from shortest codeword to longest codeword) is as follows: 0. 2, 4, … …, 2k, 1, 3, 5, … …, 2k +1 (or 2 k-1). Index 0 indicates the reference line closest to the current block, and index 2k +1 indicates the reference line farthest from the current block.

In yet another embodiment, at block S1805, the reference line indices are signaled using variable length coding, where the order of the indices in the codeword table (from shortest codeword to longest codeword) is as follows: nearest, farthest, second nearest, second farthest, … …, etc. In a specific example, if the reference line index is 0, 1, 2, 3, where 0 is closest to the current block and 3 is farthest from the current block, the codeword for index 0 is 0, the codeword for index 3 is 10, the codeword for index 2 is 110, and the codeword for index 1 is 111. The codewords of reference line indices 1 and 2 may be switched. 0's and 1's in the codeword may change.

Fig. 19 shows a flowchart 1900 according to an example embodiment.

At block S1901, when the number of reference lines (rows) above is different from the number of reference lines (columns) on the left, a multi-reference line index is signaled.

At block S1902, in one embodiment, if the number of reference lines (rows) above is M and the number of reference lines (columns) to the left is N, the reference line index for max (M, N) may use any of the above methods or a combination thereof. min (M, N) reference line index a subset of codewords, typically shorter codewords, is extracted from the codewords indicating the reference line index of max (M, N). For example, if M is 4, N is 2, and the codeword used to signal M (4) reference line indices {0, 1, 2, 3} is 1, 01, 001, 000, the codeword used to signal N (2) reference line indices {0, 1} is 1, 01.

In another embodiment, at block S1903, if the number of reference lines (rows) above is M, the number of reference lines (columns) on the left is N, and if M and N are different, the reference line indices for signaling the reference line (row) index above and the reference line (column) index on the left may be separate, and any one or a combination of the above methods may be used independently.

Fig. 20 shows a flowchart 2000 according to an exemplary embodiment.

At block S2000, consider the number of reference lines found in various encoding tools, and at block S2001, the maximum number of reference lines available for intra prediction may be limited to not exceed the number of reference lines used in other encoding tools (e.g., deblocking filters or template matching based intra prediction) to potentially save pixel line buffering.

Fig. 21 shows a flowchart 2100 according to an example embodiment.

At block S2100, interaction between multi-line intra prediction and other coding tools/modes is enabled.

For example, at block S2101, in one embodiment, the use and/or signaling of other syntax elements/coding tools/modes may depend on the multi-line reference line index, including but not limited to: coding Block Flag (CBF), last position, transform skip, transform type, secondary transform index, primary transform index, PDPC index.

At block S2102, in one example, when the multiline reference index is non-zero, no transform skip is used and no transform skip identification is signaled.

At block S2103, in another example, the context used to signal other encoding tools (e.g., transform skip, CBF, primary transform index, secondary transform index) may depend on the value of the multi-line reference index.

At block S2104, in another embodiment, the multi-line reference index may be signaled after other syntax elements, including but not limited to: CBF, last position, transform skip, transform type, secondary transform index, primary transform index, PDPC index, and the use and/or signaling of the multi-line reference index may depend on other syntax elements.

Fig. 22 shows a flowchart 2200 in accordance with an example embodiment.

At block S2201, it is considered to obtain a reference line index, and at block S2202, the reference line index may be used as context for entropy encoding another syntax element including, but not limited to, an intra prediction mode, an MPM index, a primary transform index, a secondary transform index, a transform skip flag, a CBF, and transform coefficients, or vice versa.

Fig. 23 shows a flowchart 2300, according to an example embodiment.

At block S2301, it is proposed to include the reference line information in the MPM list. That is, if the prediction mode of the current block is the same as one candidate in the MPM list, both the intra prediction and the selected reference line among the selected candidates are applied to the current block, and the intra prediction mode and the reference line index are not signaled. Furthermore, the number of MPM candidates for different reference line indices is predefined. Here, the nearest reference line is denoted as 1.

At block S2302, in one embodiment, the number of MPMs per reference line index is predefined and may be signaled as a higher level syntax element, such as a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a tile header, a Coding Tree Unit (CTU) header, or as a general syntax element or parameter for a picture region. Thus, the length of the MPM list may be different in different sequences, pictures, slices, tiles, groups of coding blocks, or regions of a picture.

For example, the number of MPMs of reference line index 1 is 6, and the number of MPMs of each of the other reference line indexes is 2. Therefore, if the total number of reference lines is 4, the total number of MPM lists is 12.

In another embodiment, at block S2303, all intra prediction modes and their reference threads in the upper, left, upper right and lower left blocks are included in the MPM list. By means of the diagram 2400 in fig. 24, all neighboring blocks of the current block unit are shown, wherein a is the bottom left block, B, C, D and E are the left blocks, F is the top left block, G and H are the top blocks, and I is the top right block. After adding the modes of the neighboring blocks to the MPM list, if the number of MPM candidates having a given reference line number is less than a predefined number, the MPM list is populated using a default mode.

In another embodiment, at block S2304, if the mode of the current block is equal to one of the candidates in the MPM list, the reference line index is not signaled. If the mode of the current block is not equal to any candidate in the MPM list, the reference line index is signaled.

In one example, if line 1 is used for the current block, the second level MPM mode is still used, but the second level MPM includes only intra prediction mode information.

In another example, for other lines, the second level MPM is not used and the remaining modes are encoded using fixed length coding.

Fig. 25 shows a flowchart 2500 in accordance with an example embodiment.

At block S2501, in the current VVC test mode VTM-1.0, the chroma intra coding mode is the same as the mode in HEVC (including Derived (DM) mode (direct copy of luma mode) and 4 other angular intra prediction modes), and in the current BMS-1.0, the Cross Component Linear Model (CCLM) mode is also applicable to chroma intra coding. The CCLM modes include an LM mode, a multi-model LM (MMLM) mode, and 4 multi-filter LM (MFLM) modes, and thus, when the CCLM mode is not enabled, only the DM mode is used for the chroma blocks, and when the CCLM mode is enabled, only the DM mode and the CCLM mode are used for the chroma blocks.

At block S2502, in one embodiment, only one DM mode is used for chroma blocks, and no identification for chroma blocks is signaled, the chroma mode being derived as a DM mode.

In another embodiment, at block S2503, only one DM mode and one CCLM mode are used for chroma blocks, and one DM identifier is used to signal whether DM mode or LM mode is used for the current chroma block.

In one sub-embodiment, there are 3 contexts for signaling the DM identity. When both the left block and the top block use the DM mode, context 0 is used to signal the DM identity. When only one of the left block and the upper block uses the DM mode, context 1 is used to signal the DM identity. Otherwise, when neither the left block nor the top block uses the DM mode, context 2 is used to signal the DM identity.

In another embodiment, at block S2504, only the DM mode and the CCLM (when enabled) mode are used for small chroma blocks. When the width, height, or region size (width x height) of a chroma block is less than or equal to Th, the current chroma block is referred to as a small chroma block. Th may be 2, 4, 8, 16, 32, 64, 128, 256, 512, or 1024.

For example, when the area size of the current chroma block is less than or equal to 8, only the DM mode and the CCLM (when enabled) mode are used for the current chroma block.

In another example, when the area size of the current chroma block is less than or equal to 16, only the DM mode and the CCLM (when enabled) mode are used for the current chroma block.

In another example, only one DM mode and one CCLM (when enabled) mode are used for small chroma blocks.

In another embodiment, at block S2505, when the intra mode of the luma component is equal to one of the MPM modes, the chroma blocks can only use the DM mode and do not signal an identification for the chroma mode, otherwise the chroma blocks allow the DM mode and the CCLM mode.

In one example, the MPM mode can only be first-level MPM.

In another example, the MPM mode can only be second level MPM.

In another example, the MPM mode may be a first-level MPM or a second-level MPM.

In another embodiment, at block S2506, when the intra mode of the luma component is not equal to any of the MPM modes, the chroma block may use the DM mode and not signal an identification for the chroma mode, otherwise the chroma block allows the DM mode and the CCLM mode.

Therefore, with the exemplary embodiments described herein, the above technical problems can be advantageously ameliorated by these technical solutions.

The techniques described above may be implemented as computer software using computer readable instructions and stored physically on one or more computer readable media, or by one or more specially configured hardware processors. For example, fig. 26 illustrates a computer system 2600 suitable for implementing certain embodiments of the disclosed subject matter.

The computer software may be encoded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanism to create code that includes instructions that may be executed directly by a computer Central Processing Unit (CPU), Graphics Processing Unit (GPU), etc., or by transcoding, microcode execution, etc.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smart phones, gaming devices, internet of things devices, and so forth.

The components of computer system 2600 shown in fig. 26 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiment of computer system 2600.

Computer system 2600 may include some human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example: tactile input (e.g., keystrokes, strokes, data glove movements), audio input (e.g., speech, clapping hands), visual input (e.g., gestures), olfactory input (not depicted). The human interface device may also be used to capture certain media that are not necessarily directly related to human conscious input, such as audio (e.g., voice, music, ambient sounds), images (e.g., scanned images, captured images from a still image camera), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video), and so forth.

The input human interface device may include one or more of the following (only one depicted in each): a keyboard 2602, a mouse 2603, a touch pad 403, a touch screen 2604, a joystick 2605, a microphone 2606, a scanner 2608, and a camera 2607.

Computer system 2600 may also include some human interface output devices. Such human interface output devices may stimulate one or more human user's senses, for example, through tactile output, sound, light, and smell/taste. Such human interface output devices can include tactile output devices (e.g., tactile feedback for a touch screen 2610 or a joystick 2605, but may also be tactile feedback devices that do not act as input devices), audio output devices (e.g., speakers 2609, headphones (not depicted)), visual output devices (e.g., a screen 2610 including a CRT screen, an LCD screen, a plasma screen, an OLED screen, each with or without touch screen input functionality, each with or without tactile feedback functionality, some of which can output two-dimensional visual output or output in excess of three-dimensional output through devices such as stereoscopic image output, virtual reality glasses (not depicted), holographic displays and smoke boxes (not depicted), and printers (not depicted).

Computer system 2600 may also include human-accessible storage devices and their associated media: such as optical media including CD/DVD ROM/RW 2612 with CD/DVD like media 2611, finger drives 2613, removable hard or solid state drives 2614, conventional magnetic media (not depicted) such as tapes and floppy disks, dedicated ROM/ASIC/PLD based devices (not depicted) such as secure dongles, and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

Computer system 2600 can also include an interface to one or more communication networks. The network may be, for example, a wireless network, a wired network, an optical network. The network may further be a local network, a wide area network, a metropolitan area network, a vehicle and industrial network, a real time network, a delay tolerant network, etc. Examples of networks include local area networks such as ethernet, wireless LANs, cellular networks including GSM, 3G, 4G, 5G, LTE, etc., television wired or wireless wide area digital networks including cable television, satellite television, and terrestrial broadcast television, automotive and industrial television including CANBus, and so forth. Some networks typically require external network interface adapters (e.g., USB ports of computer system 2600) that connect to some general purpose data port or peripheral bus 2625; as described below, other network interfaces are typically integrated into the core of computer system 2600 by connecting to a system bus (e.g., to an Ethernet interface in a PC computer system or to a cellular network interface in a smartphone computer system). Computer system 2600 can communicate with other entities using any of these networks. Such communications may be received only one way (e.g., broadcast television), transmitted only one way (e.g., CANbus connected to certain CANbus devices), or bi-directional, e.g., connected to other computer systems using a local or wide area network digital network. As described above, certain protocols and protocol stacks may be used on each of those networks and network interfaces.

The human interface device, human accessible storage device, and network interface described above may be attached to the core 2612 of the computer system 2600.

The kernel 2612 may include one or more Central Processing Units (CPUs) 2612, Graphics Processing Units (GPUs) 2622, special purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) 2624, hardware accelerators 2624 for certain tasks, and so forth. These devices, as well as Read Only Memory (ROM)2619, random access memory 2618, internal mass storage memory 447 such as an internal non-user accessible hard drive, SSD, etc., may be connected via system bus 2626. In some computer systems, the system bus 226 may be accessed in the form of one or more physical plugs to enable expansion by additional CPUs, GPUs, and the like. Peripheral devices may be connected directly to the system bus 2626 of the core or connected to the system bus 848 of the core through the peripheral bus 2601. The architecture of the peripheral bus includes PCI, USB, etc.

CPU 2621, GPU 2622, FPGA 2624, and accelerators 2624 may execute certain instructions that may be combined to make up the computer code described above. The computer code may be stored in the ROM 2619 or RAM 2618. Transitional data can also be stored in RAM 2618, while persistent data can be stored in internal mass storage 2620, for example. Fast storage and retrieval to any storage device may be done by using a cache, which may be closely associated with: one or more CPUs 2621, GPUs 2622, mass storage 2620, ROM 2619, RAM 2618, and the like.

The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of non-limiting example, a computer system having architecture 2600, and in particular core 2616, may be caused to provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media can be media associated with user-accessible mass storage as described above, as well as memory of some non-transitory kernel 2616, such as kernel internal mass storage 2620 or ROM 2619. Software implementing various embodiments of the present disclosure may be stored in such devices and executed by the kernel 2616. The computer readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the kernel 2616, and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures 2618 stored in RAM and modifying such data structures according to processes defined by the software. Additionally or alternatively, the functionality provided by the computer system may be provided as a result of logic hardwired or otherwise embodied in circuitry (e.g., accelerator 2624) that may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to portions of software may encompass logic and vice versa. Where appropriate, reference to portions of a computer-readable medium may include circuitry (e.g., an Integrated Circuit (IC)) that stores software for execution, circuitry embodying logic for execution, or both. The present disclosure includes any suitable combination of hardware and software.

While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.

45页详细技术资料下载

上一篇：一种医用注射器针头装配设备

下一篇：使用自适应乘数系数进行图像滤波的方法及装置

Method and apparatus for multi-line intra prediction in video compression

相关技术

网友询问留言