Video image encoder, video image decoder and corresponding motion information encoding method
Reading note: This technology, "Video image encoder, video image decoder and corresponding motion information encoding method", was created on 2018-03-26 by Ruslan Faritovich Mullakhmetov, Sergey Yurievich Ikonin and Maxim Borisovich Sychev. Abstract: The present invention relates to the field of image (picture) processing. More particularly, the present invention relates to a video image decoding apparatus and a video image encoding apparatus, and in particular to reducing the amount of information transmitted from the encoding apparatus to the decoding apparatus. According to the present invention, the encoding apparatus transmits only the absolute values of the motion information to the decoding apparatus. The encoding apparatus and the decoding apparatus each construct motion information candidates from the absolute values of the motion information, wherein each motion information candidate is generated from a different sign combination of the absolute values; calculate a cost for each motion information candidate; and determine a rank value for each motion information candidate based on the calculated cost. The encoding apparatus transmits the absolute values of the motion information according to the determined rank value, and the decoding apparatus can determine a motion information candidate as the motion information according to the determined rank value.
1. A video image decoding apparatus (500), said apparatus (500) comprising:
a receiver (501) for receiving absolute values (506) of motion information (507);
a processor (502) configured to:
generating motion information candidates (503) based on the received absolute values (506), wherein each motion information candidate (503) is generated based on a different sign combination of the absolute values (506);
calculating a cost (504) for each motion information candidate (503);
determining a rank value (505) for each motion information candidate (503) based on the calculated cost (504);
determining a motion information candidate (503) as said motion information (507) in dependence on said determined rank value (505).
2. The apparatus (500) of claim 1,
the receiver (501) is further configured to receive a rank value;
the processor (502) is configured to:
determining a motion information candidate (503) having a rank value (505) as the motion information (507) in dependence on the received rank value.
3. The apparatus (500) of claim 2,
the received rank value is an index,
the processor (502) is configured to:
generating an index list of the motion information candidates (503) sorted by rank value;
determining an indexed motion information candidate (503) in the index list as the motion information (507) according to the received index.
4. The apparatus (500) of claim 1,
the processor (502) is configured to:
determining, as said motion information (507), a motion information candidate (503) with a rank value (505) corresponding to the calculated lowest cost (504).
5. The apparatus (500) of any of claims 1 to 4,
the processor (502) is configured to:
the cost (504) of each motion information candidate (503) is calculated by template or bi-directional matching, in particular based on the sum of absolute differences or other distortion measures.
6. The apparatus (500) of any of claims 1 to 5,
the processor (502) is configured to:
excluding one of two motion information candidates (503), wherein the two motion information candidates (503) differ only in the sign of at least one zero value.
7. The apparatus (500) of any of claims 1 to 6,
the processor (502) is configured to:
a cost (504) for each motion information candidate (503) is calculated based on the number of bits required to transmit the rank value (505) for each motion information candidate (503).
8. A video image encoding apparatus (400), said apparatus (400) comprising:
a processor (401) configured to:
generating motion information (402);
constructing motion information candidates (403) from absolute values (407) of the generated motion information (402), wherein each motion information candidate (403) is generated from a different sign combination of the absolute values (407);
calculating a cost (404) for each motion information candidate (403);
determining a rank value (405) for each motion information candidate (403) based on the calculated cost (404);
a transmitter (406) for transmitting the absolute value (407) of the generated motion information (402) in accordance with the determined rank value (405).
9. The apparatus (400) of claim 8,
the transmitter (406) is configured to transmit the rank value (405) of the motion information candidate (403) corresponding to the generated motion information (402).
10. The apparatus (400) of claim 8 or 9,
the processor (401) is configured to:
a cost (404) for each motion information candidate (403) is calculated based on the number of bits required to transmit the rank value (405) for each motion information candidate (403).
11. The apparatus (400) of claim 9 or 10,
the processor (401) is configured to:
generating an index list of the motion information candidates (403) sorted by rank value (405);
determining an index in the index list of the motion information candidate (403) corresponding to the generated motion information (402);
the transmitter (406) is configured to transmit the determined index.
12. The apparatus (400) of claim 8,
the processor (401) is configured to:
determining whether a motion information candidate (403) with a rank value (405) corresponding to the calculated lowest cost (404) corresponds to the generated motion information (402);
discarding the generated motion information (402) if the determined motion information candidate (403) does not correspond to the generated motion information (402).
13. The apparatus (400) of any of claims 8 to 12,
the processor (401) is configured to calculate a cost (404) for each motion information candidate (403) by template or bi-directional matching, in particular based on a sum of absolute differences or other distortion measure.
14. The apparatus (400) of any of claims 8 to 13,
the processor (401) is configured to:
excluding one of two motion information candidates (403), wherein the two motion information candidates (403) differ only in the sign of at least one zero value.
15. A method (700) for decoding video images, the method (700) comprising:
receiving (701) an absolute value (506) of motion information (507);
generating (702) motion information candidates (503) from the received absolute values, wherein each motion information candidate (503) is generated from a different sign combination of the absolute values;
calculating (703) a cost (504) for each motion information candidate (503);
determining a rank value (505) for each motion information candidate (503) based on the calculated cost (504);
determining (704) a motion information candidate (503) as said motion information (507) in dependence on said determined rank value (505).
16. A method (600) for encoding video images, the method (600) comprising:
generating (601) motion information (402);
constructing (602) motion information candidates (403) from absolute values (407) of the generated motion information (402), wherein each motion information candidate (403) is generated from a different sign combination of the absolute values (407);
calculating (603) a cost (404) for each motion information candidate (403);
determining (604) a rank value (405) for each motion information candidate (403) based on the calculated cost (404);
transmitting (605) the absolute value (407) of the generated motion information (402) in accordance with the determined rank value (405).
17. A computer program product storing program code for performing the method according to claim 15 or 16 when the program code is run on a computer.
Technical Field
Embodiments of the present invention relate to the field of video image processing (e.g., video image and/or still image coding). More particularly, the present invention relates to a video image decoding apparatus (e.g., a video image decoder) and a video image encoding apparatus (e.g., a video image encoder). The invention also relates to corresponding video image decoding and encoding methods.
Background
Video coding (video encoding and video decoding) is widely used in digital video applications, such as broadcast digital TV, video transmission over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
Since the development of the hybrid block-based video coding scheme in the H.261 standard in 1990, new video coding techniques and tools have been developed, forming the basis for new video coding standards. One of the goals of most video coding standards is to achieve a lower bitrate than the previous standard without sacrificing image quality. Other video coding standards include MPEG-1 Video, MPEG-2 Video, ITU-T H.262/MPEG-2, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of these standards, such as scalability and/or three-dimensional (3D) extensions.
In hybrid video coding, the encoder performs inter prediction supported by inter estimation, thereby exploiting temporal redundancy in the video sequence. This reduces the amount of information that needs to be transmitted from the encoder to the decoder. Specifically, the motion information resulting from inter estimation is transmitted from the encoder to the decoder along with other information. The motion information typically includes Motion Vectors (MVs) in different forms. Inter prediction in the encoder is identical to inter prediction in the decoder, which keeps the encoder and the decoder in sync. In the decoder, inter prediction is performed using the motion information transmitted from the encoder, thereby exploiting the temporal redundancy for reconstruction.
One particular form of transmitted motion information is a pair consisting of a Motion Vector Predictor (MVP) index and a Motion Vector Difference (MVD). An MVP is one vector in a vector list that is constructed in the same way for a given coding unit in the encoder and the decoder. The MVP index is the index of the MVP in the MVP list. The MVD is the difference between the MV generated through inter estimation and the selected MVP. Like the MV, an MVD is a 2D vector.
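The MV/MVP/MVD relation described above can be sketched as follows (a minimal illustration; the function names are ours, not from any standard):

```python
# Illustrative sketch of the MV/MVP/MVD relation (names are ours).

def mvd(mv, mvp):
    """Motion vector difference: component-wise MV minus the selected MVP."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def reconstruct_mv(mvp, d):
    """Decoder side: recover the MV from the MVP and the transmitted MVD."""
    return (mvp[0] + d[0], mvp[1] + d[1])

mv = (5, -3)        # MV produced by inter estimation
mvp = (4, -1)       # predictor selected from the MVP list
d = mvd(mv, mvp)    # the 2D vector actually signalled
assert reconstruct_mv(mvp, d) == mv
```

Only `d` (plus the MVP index) needs to be transmitted; the decoder rebuilds the MV from its identically constructed MVP list.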
Currently, the encoder transmits the MVD to the decoder as shown in Fig. 15. First, the absolute values (x, y) of the MVD are transmitted through an entropy encoder with a non-uniform probability model. Then, for each non-zero component, the sign is transmitted in Equal Probability (EP) mode, requiring 1 bit per sign. In most cases, the signs of the MVDs are uniformly distributed, so Context-Adaptive Binary Arithmetic Coding (CABAC) and the like cannot improve compression efficiency.
Disclosure of Invention
In view of the above, the present invention aims to further improve hybrid video coding. In particular, it is an object of the invention to reduce the amount of information transmitted from an encoder to a decoder while maintaining image quality. Accordingly, the present invention provides a video image encoding apparatus and a video image decoding apparatus whereby the transmitted information (i.e., the information encoded into the bitstream sent from the encoder to the decoder) can be further reduced.
The object of the invention is achieved according to the embodiments of the invention defined by the features of the independent claims. Further advantageous implementations of these embodiments are defined by the features of the dependent claims.
In particular, the invention proposes not to transmit the signs of the motion information, e.g. of the MVD components (x-component and y-component), from the encoder to the decoder, but only the absolute values of the motion information. Instead, the signs are derived in the decoder by template or bi-directional matching or the like, without significantly increasing computational complexity, possibly with the aid of some transmitted side information.
A first aspect of the present invention provides a video image decoding apparatus. The apparatus comprises: a receiver for receiving absolute values of motion information; and a processor configured to: generate motion information candidates from the received absolute values, wherein each motion information candidate is generated from a different sign combination of the absolute values; calculate a cost for each motion information candidate; determine a rank value for each motion information candidate based on the calculated cost; and determine a motion information candidate as the motion information according to the determined rank value.
The apparatus may be a video image decoder or may be implemented by such a decoder. Since the decoding apparatus is able to determine the motion information without receiving the signs of the motion information, the encoder does not need to transmit these signs. Thus, the amount of information encoded into the bitstream by the encoder and transmitted to the decoder is reduced. Determining the motion information in this way neither adds much computational complexity nor affects the decoding efficiency of the decoding apparatus.
The motion information may include MVs, MVPs, and/or MVDs. The invention can be applied to different motion models, for example translational, affine or perspective models. Accordingly, the motion information may include a directly transmitted MVD or MV. The invention may also be applied to affine motion models, wherein the motion information may comprise an MV/MVD list. In this case, there are 2^(2N) motion information candidates, where N is the length of the MV/MVD list generated by the motion model. Notably, the translational model may be considered to produce an MVD list of length 1.
The absolute values of the motion information may be the absolute values of the MV or MVD. The motion information candidates are determined from the received absolute values (e.g., the absolute MVD components). For example, for a received unsigned MVD (x, y), where x ≥ 0 and y ≥ 0, the candidates may be [(x, y), (x, -y), (-x, y), (-x, -y)]. For zero-valued components, insignificant combinations may be excluded from the list, e.g., (x, y) = (-x, y) where x = 0, and (x, y) = (x, -y) where y = 0.
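The candidate construction just described can be sketched as follows (a hedged illustration; `sign_combinations` is our name). A zero component contributes only one sign, which implements the exclusion of insignificant combinations:

```python
from itertools import product

def sign_combinations(abs_mvd):
    """All sign combinations of the received absolute MVD components.

    A non-zero component c contributes both +c and -c; a zero component
    contributes only 0, so candidates differing only in the sign of a
    zero value are excluded up front.
    """
    per_axis = [(0,) if c == 0 else (c, -c) for c in abs_mvd]
    return list(product(*per_axis))

assert sign_combinations((2, 3)) == [(2, 3), (2, -3), (-2, 3), (-2, -3)]
assert sign_combinations((0, 3)) == [(0, 3), (0, -3)]  # no duplicates for x = 0
```

The same construction extends to longer MV/MVD lists (e.g., affine models), where the number of candidates grows as 2^(2N).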
The cost of a motion information candidate may reflect the probability that the motion information candidate is the correct motion information: the lower the cost of a motion information candidate, the higher the probability. Accordingly, the rank value of a motion information candidate is information that relates the cost of that candidate to the costs of the other motion information candidates. For example, the lower the cost, the higher the rank value.
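Under the convention stated above (lower cost, higher rank value), rank assignment can be sketched like this; the cost values are placeholders for whatever matching metric is used:

```python
def rank_values(costs):
    """Assign rank values: the lower the cost, the higher the rank value.

    The cheapest candidate gets the highest rank value (len(costs) - 1);
    the most expensive gets rank value 0.
    """
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    ranks = [0] * len(costs)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

# costs 7, 2, 5 -> the candidate with cost 2 gets the highest rank value
assert rank_values([7, 2, 5]) == [0, 2, 1]
```

Since encoder and decoder compute the same costs over the same candidate list, they derive identical rank values without any extra signalling.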
In one implementation of the first aspect, the receiver is further configured to receive a rank value, and the processor is configured to determine a motion information candidate with that rank value as the motion information according to the received rank value.
Like the absolute values of the motion information, the rank value may be received from the bitstream encoded by the encoding apparatus, i.e. transmitted by the encoding apparatus to the decoding apparatus. The rank value is side information that enables the decoding apparatus to determine the motion information quickly and accurately.
In another implementation of the first aspect, the received rank value is an index, and the processor is configured to: generate an index list of the motion information candidates sorted by rank value; and determine the motion information candidate at the received index in the index list as the motion information.
As described above, a candidate with a high rank value (i.e., low cost) is more likely to be the correct motion information than candidates with lower rank values (higher cost). A method of encoding the rank-value index can exploit this fact: the amount of transmitted information is reduced by using an adaptive context of CABAC and/or a non-uniformly distributed code (e.g., a unary or Golomb code) that assigns shorter codewords to candidates with higher rank values.
In this implementation, only the index is additionally transmitted from the encoder to the decoder, thus adding only a small amount of side information, while the decoding apparatus can accurately determine the correct motion information from the received absolute values.
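On the decoder side, the index-based selection described in this implementation might look like the following sketch (the candidate costs are placeholders that would come from template or bi-directional matching):

```python
def select_by_index(candidates, costs, received_index):
    """Sort candidates by cost (best rank first) and take the received index."""
    index_list = [c for _, c in sorted(zip(costs, candidates))]
    return index_list[received_index]

cands = [(2, 3), (2, -3), (-2, 3), (-2, -3)]
costs = [10, 4, 8, 6]
assert select_by_index(cands, costs, 0) == (2, -3)   # lowest-cost candidate
assert select_by_index(cands, costs, 1) == (-2, -3)  # second-best candidate
```

Because index 0 (the best candidate) is by far the most probable, a unary or Golomb code over these indices keeps the average signalling cost well below the 2 bits needed for explicit sign transmission.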
In another implementation of the first aspect, the processor is configured to determine a motion information candidate with a rank value corresponding to the calculated lowest cost as the motion information.
In this implementation, the decoding apparatus does not need the encoding apparatus to provide any side information (such as the rank values or indices described above). Therefore, the amount of information transmitted from the encoder to the decoder is as small as possible. Notably, even if side information (rank value, index) is transmitted, in most cases the motion information candidate with the best rank value, lowest cost, or smallest index is the true motion information. This implementation thus avoids transmitting the rank value/index altogether.
In another implementation form of the first aspect, the processor is configured to calculate the cost for each motion information candidate by template or bi-directional matching, in particular based on a sum of absolute differences or other distortion measure.
Conventional template or two-way matching techniques may be used.
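A minimal SAD-based cost sketch, assuming the decoded template samples around the current block and a reference-fetching helper are already available (all names here are ours, for illustration only):

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def template_cost(template, fetch_reference, mvp, candidate_mvd):
    """Cost of one sign-combination candidate: SAD between the decoded
    template and the reference samples addressed by MVP + candidate MVD."""
    mv = (mvp[0] + candidate_mvd[0], mvp[1] + candidate_mvd[1])
    return sad(template, fetch_reference(mv))

assert sad([1, 2, 3], [1, 0, 5]) == 4
```

Other distortion measures (e.g., sum of squared differences) can be substituted for `sad` without changing the surrounding candidate-ranking logic.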
In another implementation form of the first aspect, the processor is configured to exclude one of two motion information candidates, wherein the two motion information candidates differ only in the sign of at least one zero value.
Therefore, the candidate list becomes shorter, determining the correct motion information is more efficient, and the number of matching operations is reduced.
In another implementation manner of the first aspect, the processor is configured to calculate a cost of each motion information candidate according to a number of bits required for transmitting the rank value of each motion information candidate.
Thus, to obtain better results, an improved cost metric is used.
A second aspect of the present invention provides a video image encoding apparatus. The apparatus comprises: a processor configured to: generate motion information; construct motion information candidates from the absolute values of the generated motion information, wherein each motion information candidate is generated from a different sign combination of the absolute values; calculate a cost for each motion information candidate; and determine a rank value for each motion information candidate based on the calculated cost; and a transmitter for transmitting the absolute values of the generated motion information according to the determined rank value.
The encoding apparatus transmits the absolute values but, notably, does not transmit the signs of the motion information. Thus, the amount of information encoded into the bitstream and transmitted to the decoding apparatus is reduced. The phrase "according to said determined rank value" does not mean that a rank value is necessarily transmitted as well; it only means that the encoding apparatus takes the determined rank value into account when performing the transmission step. Different ways of taking the determined rank value into account are described below.
In one implementation of the second aspect, the transmitter is configured to transmit the rank value of the motion information candidate corresponding to the generated motion information.
That is, the encoding apparatus transmits the absolute values of the generated motion information and, according to the determined rank value, additionally transmits the rank value of the corresponding motion information candidate. In this implementation, "according to the determined rank value" means that the rank value of the motion information candidate corresponding to the generated motion information is transmitted together with the absolute values. The rank value serves as side information that assists the decoding apparatus in determining the motion information.
In another implementation manner of the second aspect, the processor is configured to calculate a cost of each motion information candidate according to a number of bits required for transmitting the rank value of each motion information candidate.
Thus, to obtain better results, an improved cost metric is used.
In another implementation of the second aspect, the processor is configured to: generate an index list of the motion information candidates sorted by rank value; and determine the index in the index list of the motion information candidate corresponding to the generated motion information; and the transmitter is configured to transmit the determined index.
The rank value of the generated motion information thus corresponds to the determined index, and the transmitted index serves as side information on the decoding apparatus side. The determined rank values, list, and indices are identical on the encoding apparatus side and the decoding apparatus side.
In another implementation of the second aspect, the processor is configured to: determine whether the motion information candidate with the rank value corresponding to the calculated lowest cost corresponds to the generated motion information; and discard the generated motion information if the determined motion information candidate does not correspond to the generated motion information.
In this implementation, "according to the determined rank value" means that the encoding apparatus transmits the absolute values only if the rank value determined for the generated motion information is associated with the calculated lowest cost. Otherwise, the generated motion information is discarded and its absolute values are not transmitted. Discarding may mean that the encoder selects other motion information or another encoding mode. Since the decoding apparatus of the first aspect uses the lowest-cost motion information candidate as the correct motion information, this implementation prevents the decoding apparatus from determining the motion information incorrectly.
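The discard rule of this implementation can be sketched as follows (a hypothetical helper; the costs would come from the same matching procedure the decoder uses):

```python
def should_transmit(generated_mvd, candidates, costs):
    """Transmit the absolute values only if the lowest-cost candidate
    coincides with the generated motion information; otherwise the
    encoder discards it (choosing other motion info or another mode)."""
    best = min(range(len(candidates)), key=lambda i: costs[i])
    return candidates[best] == generated_mvd

cands = [(2, 3), (2, -3)]
assert should_transmit((2, -3), cands, [9, 4]) is True
assert should_transmit((2, 3), cands, [9, 4]) is False  # would mislead decoder
```

This guarantees that a decoder which always picks the lowest-cost candidate reconstructs exactly the motion information the encoder used.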
In another implementation form of the second aspect, the processor is configured to calculate the cost for each motion information candidate by template or bi-directional matching, in particular based on a sum of absolute differences or other distortion measure.
The advantages of this aspect are the same as those described above in connection with the decoding apparatus of the first aspect.
In another implementation form of the second aspect, the processor is configured to exclude one of two motion information candidates, wherein the two motion information candidates differ only in the sign of at least one zero value.
A third aspect of the present invention provides a video image decoding method. The method comprises: receiving absolute values of motion information; generating motion information candidates from the received absolute values, wherein each motion information candidate is generated from a different sign combination of the absolute values; calculating a cost for each motion information candidate; determining a rank value for each motion information candidate based on the calculated cost; and determining a motion information candidate as the motion information according to the determined rank value.
In one implementation of the third aspect, the method includes: receiving a rank value; and determining the motion information candidate with that rank value as the motion information according to the received rank value.
In another implementation of the third aspect, the received rank value is an index, and the method includes: generating an index list of the motion information candidates sorted by rank value; and determining the motion information candidate at the received index in the index list as the motion information.
In another implementation of the third aspect, the method includes: determining a motion information candidate with a rank value corresponding to the calculated lowest cost as the motion information.
In another implementation form of the third aspect, the method includes: the cost of each motion information candidate is calculated by template or bi-directional matching, in particular based on the sum of absolute differences or other distortion measures.
In another implementation form of the third aspect, the method includes: excluding one of two motion information candidates, wherein the two motion information candidates differ only in the sign of at least one zero value.
In another implementation form of the third aspect, the method includes: the cost of each motion information candidate is calculated based on the number of bits required to transmit the rank value of each motion information candidate.
The third aspect and its implementations provide methods that achieve the same advantages and effects as the decoding device provided by the first aspect and its corresponding implementations.
A fourth aspect of the present invention provides a video image encoding method. The method comprises: generating motion information; constructing motion information candidates from the absolute values of the generated motion information, wherein each motion information candidate is generated from a different sign combination of the absolute values; calculating a cost for each motion information candidate; determining a rank value for each motion information candidate based on the calculated cost; and transmitting the absolute values of the generated motion information according to the determined rank value.
In one implementation of the fourth aspect, the method comprises: transmitting the rank value of the motion information candidate corresponding to the generated motion information.
In another implementation form of the fourth aspect, the method comprises: the cost of each motion information candidate is calculated based on the number of bits required to transmit the rank value of each motion information candidate.
In another implementation of the fourth aspect, the method includes: generating an index list of the motion information candidates sorted by rank value; determining the index in the index list of the motion information candidate corresponding to the generated motion information; and transmitting the determined index.
In another implementation of the fourth aspect, the method includes: determining whether a motion information candidate with a rank value corresponding to the calculated lowest cost corresponds to the generated motion information; and discarding the generated motion information if the determined motion information candidate does not correspond to the generated motion information.
In another implementation form of the fourth aspect, the method includes: the cost of each motion information candidate is calculated by template or bi-directional matching, in particular based on the sum of absolute differences or other distortion measures.
In another implementation form of the fourth aspect, the method includes: excluding one of two motion information candidates, wherein the two motion information candidates differ only in the sign of at least one zero value.
The fourth aspect and its implementations provide a method that achieves the same advantages and effects as the encoding device provided by the second aspect and its corresponding implementations.
According to a fifth aspect, a computer program product is provided. The computer program product has program code stored thereon. The program code is adapted to perform the methods provided by the third and fourth aspects and their implementations when the computer program is run on a computer.
It should be noted that all devices, elements, units and components described in the present application may be implemented in software or hardware elements or any combination thereof. All steps performed by the various entities described in the present application, as well as the functions described as being performed by the various entities, are intended to indicate that the respective entities are adapted or configured to perform the respective steps and functions. Even if, in the following description of specific embodiments, a specific function or step performed by an entity is not reflected in the description of the specific element of that entity performing that step or function, it should be clear to a skilled person that these methods and functions may be implemented in respective software or hardware elements or any combination thereof.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Embodiments of the invention will be described in more detail below with reference to the attached drawings and schematic drawings, in which:
FIG. 1 is a block diagram of an exemplary architecture of a video encoder for implementing embodiments of the present invention;
FIG. 2 is a block diagram of an exemplary architecture of a video decoder for implementing embodiments of the present invention;
FIG. 3 is a block diagram of one example of a video encoding system for implementing an embodiment of the invention;
FIG. 4 is a block diagram of a video image encoding apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a video image decoding apparatus according to an embodiment of the present invention;
FIG. 6 schematically illustrates a video image encoding method provided by an embodiment of the present invention;
FIG. 7 schematically illustrates a video image decoding method provided by an embodiment of the present invention;
FIG. 8 is a flowchart illustrating MVD transmission provided by an embodiment of the present invention;
FIG. 9 is a flowchart illustrating MVD transmission provided by an embodiment of the present invention;
FIG. 10 shows a block diagram of one implementation of an embodiment of the invention in a video encoder;
FIG. 11 shows a block diagram of one implementation of an embodiment of the invention in a video encoder;
FIG. 12 shows a block diagram of one implementation of an embodiment of the invention in a video decoder;
FIG. 13 shows a block diagram of one implementation of an embodiment of the invention in a video decoder;
FIG. 14 is a flowchart illustrating MVD candidate list construction provided by an embodiment of the present invention;
FIG. 15 illustrates MVD transmission in a hybrid codec.
Detailed Description
In the following description, reference is made to the accompanying drawings which form a part hereof and in which is shown by way of illustration specific aspects of embodiments of the invention or in which embodiments of the invention may be practiced. It should be understood that embodiments of the invention may be used in other respects, and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
For example, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding apparatus or system configured to perform the method, and vice versa. For example, if one or more specific method steps are described, a corresponding apparatus may include one or more units, e.g. functional units, to perform the described one or more method steps (e.g. one unit performing the one or more steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, if a specific apparatus is described in terms of one or more units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or more units (e.g. one step performing the functionality of the one or more units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. Further, it is to be understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.
Video coding generally refers to the processing of a sequence of images which form a video or video sequence. In the field of video coding, the terms "frame" and "picture" may be used as synonyms. Video coding comprises two parts: video encoding and video decoding. Video encoding is performed on the source side, and typically involves processing (e.g. compressing) the original video images to reduce the amount of data required to represent the video images (for more efficient storage and/or transmission). Video decoding is performed on the destination side, and typically involves the inverse processing with respect to the encoder to reconstruct the video images. Embodiments referring to "coding" of video images (or images in general, as explained below) shall be understood to relate to both "encoding" and "decoding" of video images. The encoding part and the decoding part are together also referred to as a CODEC (COding and DECoding).
In the case of lossless video coding, the original video images can be reconstructed, i.e. the reconstructed video images have the same quality as the original video images (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed, e.g. by quantization, to reduce the amount of data representing the video images, and the video images cannot be completely reconstructed on the decoder side, i.e. the quality of the reconstructed video images is lower or worse than the quality of the original video images.
Several video coding standards since H.261 belong to the group of "lossy hybrid video codecs" (i.e. they combine spatial and temporal prediction in the pixel domain with 2D transform coding in the transform domain for applying quantization). Each image of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, on the encoder side the video is typically processed, i.e. encoded, at the block (video block) level, e.g. by generating a prediction block using spatial (intra) prediction and temporal (inter) prediction, subtracting the prediction block from the current block (currently processed block/block to be processed) to obtain a residual block, and transforming and quantizing the residual block in the transform domain to reduce the amount of data to be transmitted (compression), whereas on the decoder side the inverse processing with respect to the encoder is applied to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder duplicates the decoder processing steps, such that the encoder and the decoder generate identical predictions (e.g. intra- and inter-predictions) and/or reconstructions for processing, i.e. coding, subsequent blocks.
Since video image processing (also referred to as moving image processing) and still image processing (the term "processing" comprising coding) share many concepts and technologies or tools, the term "image" is used in the following to refer to video images of a video sequence (as explained above) and/or to still images, to avoid unnecessary repetition and distinction between video images and still images where it is not needed. If the description refers to still images only, the term "still image" (still picture/still image) shall be used.
An
Fig. 3 is a conceptual or schematic block diagram of one embodiment of an encoding system 300 (e.g., an image encoding system 300). The
The
A (digital) image is, or can be regarded as, a two-dimensional array or matrix of pixels with intensity values. The pixels in the array may also be referred to as pels (short for picture elements). The number of pixels of the array or image in the horizontal and vertical direction (or axis) defines the size and/or resolution of the image. For the representation of color, typically three color components are employed, i.e. the image may be represented as, or may include, three pixel arrays. In the RGB format or color space, an image comprises corresponding red, green and blue pixel arrays. However, in video coding each pixel is typically represented in a luminance/chrominance format or color space, e.g. YCbCr, which comprises a luminance component indicated by Y (sometimes also L is used instead) and two chrominance components indicated by Cb and Cr. The luminance (luma) component Y represents the brightness or grey-level intensity (e.g. as in a grey-scale image), while the two chrominance (chroma) components Cb and Cr represent the chromaticity or color information components. Accordingly, an image in YCbCr format comprises a luminance pixel array of luminance pixel values (Y) and two chrominance pixel arrays of chrominance values (Cb and Cr). Images in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also known as color transformation or color conversion. If an image is black and white, the image may comprise only a luminance pixel array.
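The color conversion described above can be sketched as follows. The coefficients below are the common BT.601 (analog-form) values and are shown purely as an illustrative choice; the exact matrix depends on the standard in use (e.g. BT.601, BT.709).

```python
# Illustrative RGB <-> YCbCr conversion (analog form, BT.601 coefficients).
# The exact coefficients depend on the applicable standard; these values
# are one common choice, shown only as a sketch.

def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    cb = 0.564 * (b - y)                   # blue-difference chroma
    cr = 0.713 * (r - y)                   # red-difference chroma
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + cr / 0.713
    b = y + cb / 0.564
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```

Note that for a grey pixel (R = G = B) both chroma components are zero, which is why a black-and-white image needs only the luminance array.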
For example, the
In order to distinguish the
The
The destination device 320 comprises a decoder 200 or decoding unit 200 and may additionally (i.e. optionally) comprise a communication interface or
The
For example, the
For example,
The decoder 200 is configured to receive encoded
Post-processor 326 in destination device 320 is configured to post-process decoded image data 231 (e.g., decoded image 231) to obtain post-processed image data 327 (e.g., post-processed image 327). Post-processing performed by
Although fig. 3 shows
It will be apparent to those skilled in the art from this description that the existence and division of different units or functions in the
Accordingly, the
Encoder and encoding method
Fig. 1 is a schematic/conceptual block diagram of one embodiment of an
For example, the residual calculation unit 104, the transform unit 106, the quantization unit 108, and the entropy encoding unit 170 form a forward signal path of the
The
Residual calculation
The residual calculation unit 104 is configured to calculate a residual block 105 based on the image block 103 and the prediction block 165 (further details about the prediction block 165 are provided below), e.g. by subtracting pixel values of the prediction block 165 from pixel values of the image block 103, sample by sample (pixel by pixel), to obtain the residual block 105 in the pixel domain.
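The residual calculation (and its inverse, the reconstruction of unit 114) can be sketched as a pixel-wise subtraction and addition. Plain Python lists stand in for real frame buffers; this is an illustration, not the codec implementation.

```python
# Sketch of the residual calculation of unit 104: the prediction block is
# subtracted from the current image block sample by sample.

def residual_block(current, prediction):
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]

# Inverse step (reconstruction unit 114): residual plus prediction.
def reconstruct_block(residual, prediction):
    return [[r + p for r, p in zip(rrow, prow)]
            for rrow, prow in zip(residual, prediction)]
```

Without quantization of the residual, reconstruction recovers the current block exactly; the loss described below is introduced only by quantizing the transformed residual.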
Transformation of
The transform unit 106 is configured to perform spatial frequency transform or linear spatial transform (for example, Discrete Cosine Transform (DCT) or Discrete Sine Transform (DST)) on the pixel values of the residual block 105 to obtain transform coefficients 107 in a transform domain. The transform coefficients 107, which may also be referred to as transform residual coefficients, represent the residual block 105 in the transform domain.
The transform unit 106 may be used to perform DCT/DST integer approximation, e.g., the core transform specified for HEVC/h.265. Such an integer approximation is typically scaled by a certain factor compared to the orthogonal DCT transform. To maintain the norm of the residual block that is processed by the forward and inverse transforms, other scaling factors are used as part of the transform process. The scaling factor is typically selected according to certain constraints, e.g., the scaling factor is a power of 2 for a shift operation, the bit depth of the transform coefficients, a trade-off between accuracy and implementation cost, etc. For example, on the decoder 200 side, a specific scaling factor is specified for the inverse transform by the
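To make the notion of an "integer approximation scaled by a certain factor" concrete, the sketch below uses the 4-point integer matrix of the HEVC/H.265 core transform. Its rows are mutually orthogonal, but their norms are only approximately equal, which is exactly why the additional scaling factors described above are applied as part of the transform process.

```python
# 4-point integer approximation of the DCT as used by the HEVC core
# transform (shown as a sketch). Rows are mutually orthogonal; row norms
# are close to, but not exactly, 128^2, hence the extra scaling factors.

HEVC_DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward_transform_1d(samples):
    """One 1-D stage of the transform: matrix-vector product, no scaling."""
    return [dot(row, samples) for row in HEVC_DCT4]
```

A flat (constant) residual line produces energy only in the first (DC) coefficient, e.g. `forward_transform_1d([1, 1, 1, 1])` yields `[256, 0, 0, 0]`.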
Quantization
The quantization unit 108 is configured to quantize the transform coefficients 107, e.g. by applying scalar quantization or vector quantization, to obtain quantized transform coefficients 109. The quantized coefficients 109 may also be referred to as quantized residual coefficients 109. For example, for scalar quantization, different degrees of scaling may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization, whereas larger quantization step sizes correspond to coarser quantization. The applicable quantization step size may be indicated by a quantization parameter (QP). The quantization parameter may, for example, be an index into a predefined set of applicable quantization step sizes. For example, small quantization parameters may correspond to fine quantization (small quantization step sizes) and large quantization parameters may correspond to coarse quantization (large quantization step sizes), or vice versa. The quantization may include division by a quantization step size, and the corresponding or inverse dequantization, e.g. by the inverse quantization unit 110, may include multiplication by the quantization step size. In embodiments according to HEVC, the quantization parameter may be used to determine the quantization step size. Generally, the quantization step size may be calculated based on the quantization parameter using a fixed-point approximation of an equation including a division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which might be modified because of the scaling used in the fixed-point approximation of the equation for the quantization step size and the quantization parameter. In one exemplary implementation, the scaling of the inverse transform and the dequantization may be combined. Alternatively, customized quantization tables may be used and signalled from the encoder to the decoder, e.g. in the code stream.
Quantization is a lossy operation, where the larger the quantization step, the greater the loss.
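The relation between the quantization parameter, the step size and the loss can be sketched as follows. The formula below is the idealized (non fixed-point) HEVC relation in which the step size roughly doubles for every increase of the QP by 6; it is shown for illustration only.

```python
# Scalar quantization sketch. Idealized HEVC-style relation between the
# quantization parameter (QP) and the quantization step size:
# the step size doubles for every QP increase of 6.

def q_step(qp):
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff, qp):      # division by the step size (quantization unit 108)
    return round(coeff / q_step(qp))

def dequantize(level, qp):    # multiplication by the step size (inverse quantization 110)
    return level * q_step(qp)
```

For example, a coefficient of 1000.0 survives the quantize/dequantize round trip exactly at QP 10 (step size 2), but is reconstructed as 1024.0 at QP 40 (step size 64): the larger the step, the greater the loss.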
Embodiments of the encoder 100 (or the quantization unit 108) may be configured to output a quantization scheme and a quantization step size by means of a corresponding quantization parameter, etc., such that the decoder 200 may receive and perform a corresponding inverse quantization. Embodiments of the encoder 100 (or quantization unit 108) may be used to output the quantization scheme and quantization step size directly or after entropy encoding by entropy encoding unit 170 or any other entropy encoding unit.
The inverse quantization unit 110 is configured to apply the inverse quantization of the quantization unit 108 to the quantized coefficients to obtain dequantized coefficients 111, e.g. by applying, based on or using the same quantization step size as the quantization unit 108, the inverse of the quantization scheme applied by the quantization unit 108. The dequantized coefficients 111, which may also be referred to as dequantized residual coefficients 111, correspond to the transform coefficients 107, although they are typically not identical to the transform coefficients due to the loss introduced by quantization.
The inverse transform unit 112 is configured to apply the inverse transform of the transform applied by the transform unit 106, e.g. an inverse discrete cosine transform (DCT) or inverse discrete sine transform (DST), to obtain an inverse transform block 113 in the pixel domain. The inverse transform block 113 may also be referred to as an inverse transform dequantized block 113 or an inverse transform residual block 113.
The reconstruction unit 114 is configured to combine the inverse transform block 113 and the prediction block 165 to obtain a reconstructed block 115 in the pixel domain, by: the pixel point value of the decoded residual block 113 and the pixel point value of the predicted block 165 are added in units of pixel points.
A buffer unit 116 (or simply "buffer" 116) (e.g., column buffer 116) is used to buffer or store reconstructed blocks and corresponding pixel values for intra estimation and/or intra prediction, etc. In other embodiments, the encoder may be configured to perform any type of estimation and/or prediction using the unfiltered reconstructed block and/or corresponding pixel point values stored in the buffer unit 116.
The loop filtering unit 120 (or "loop filter" 120 for short) is configured to filter the reconstructed block 115, e.g. by applying a de-blocking filter, a sample-adaptive offset (SAO) filter or other filters (e.g. a sharpening or smoothing filter, or a collaborative filter), to obtain a filtered block 121. The filtered block 121 may also be referred to as a filtered reconstructed block 121.
An embodiment of the loop filtering unit 120 may comprise (not shown in fig. 1) a filter analysis unit and the actual filter unit, wherein the filter analysis unit is configured to determine loop filter parameters for the actual filter. The filter analysis unit may be configured to apply fixed pre-determined filter parameters to the actual loop filter, to adaptively select filter parameters from a set of pre-determined filter parameters, or to adaptively calculate filter parameters for the actual loop filter.
Embodiments of the loop filtering unit 120 may comprise (not shown in fig. 1) one or more filters (e.g., loop filtering components and/or sub-filters), e.g., one or more of different kinds or types of filters connected in series or in parallel or any combination thereof, wherein each filter may comprise a filter analysis unit to determine the respective loop filter parameters either individually or in combination with other filters of the plurality of filters, e.g., as described in the paragraph above.
Embodiments of encoder 100 (correspondingly, loop filtering unit 120) may be configured to output the loop filter parameters directly or after entropy encoding by entropy encoding unit 170 or any other entropy encoding unit, such that decoder 200 may receive and use the same loop filter parameters for decoding, and so on.
A Decoded Picture Buffer (DPB) 130 is used to receive and store the filtering block 121. The decoded picture buffer 130 may also be used to store other previously reconstructed filter blocks (e.g., previously reconstructed filter block 121) in the same current picture or a different picture (e.g., previously reconstructed picture), and may provide the complete previously reconstructed (i.e., decoded) picture (and corresponding reference blocks and pixels) and/or the partially reconstructed current picture (and corresponding reference blocks and pixels) for inter estimation and/or inter prediction, etc.
Further embodiments of the present invention may also use the previously filtered blocks and the corresponding filtered pixel values of the decoded image buffer 130 for any kind of estimation or prediction, e.g. intra-frame estimation and prediction as well as inter-frame estimation and prediction.
Motion estimation and prediction
Prediction unit 160, also referred to as block prediction unit 160, is configured to receive or retrieve an image block 103 (current image block 103 in current image 101) and decoded image data or at least reconstructed image data, e.g., reference pixel points of the same (current) image from buffer 116 and/or decoded
Mode selection unit 162 may be used to select a prediction mode (e.g., intra or inter prediction mode) and/or a corresponding prediction block 145 or 155 to use as prediction block 165 to calculate residual block 105 and reconstruct block 115.
Embodiments of mode selection unit 162 may be used to select a prediction mode (e.g., from among the prediction modes supported by prediction unit 160) that provides the best match or the smallest residual (the smallest residual refers to better compression in transmission or storage), or that provides the smallest signaling overhead (the smallest signaling overhead refers to better compression in transmission or storage), or both. The mode selection unit 162 may be configured to determine the prediction mode according to Rate Distortion Optimization (RDO), i.e. to select the prediction mode providing the minimum rate distortion optimization, or to select the prediction mode having an associated rate distortion at least satisfying a prediction mode selection criterion.
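The rate-distortion optimization described above can be sketched as minimizing a Lagrangian cost over the candidate modes. The candidate tuples and the lambda value below are illustrative placeholders, not values from the standard.

```python
# Rate-distortion optimization sketch (mode selection unit 162): each
# candidate mode is described by a distortion D and a rate R (in bits);
# the mode minimizing the Lagrangian cost J = D + lambda * R is selected.

def select_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_in_bits)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])
```

With a small lambda the selection favors low distortion; with a large lambda it favors low rate, reflecting the compression/quality trade-off described above.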
The prediction processing (e.g., prediction unit 160) and mode selection (e.g., by mode selection unit 162) performed by
As described above, the
The intra prediction mode set may include 32 different intra prediction modes, for example, a non-directional mode like a DC (or mean) mode and a planar mode or a directional mode as defined by h.264, or may include 65 different intra prediction modes, for example, a non-directional mode like a DC (or mean) mode and a planar mode or a directional mode as defined by h.265.
The set of (possible) inter prediction modes depends on the available reference pictures (i.e., the previously at least partially decoded pictures stored in
In addition to the prediction mode described above, a skip mode and/or a direct mode may be applied.
The prediction unit 160 may further be configured to partition the block 103 into smaller block partitions or sub-blocks, e.g. by iteratively using quad-tree (QT) partitioning, binary-tree (BT) partitioning or ternary-tree (TT) partitioning, or any combination thereof, and to perform, e.g., the prediction for each of the block partitions or sub-blocks, wherein the mode selection comprises selecting the tree structure for partitioning the block 103 and selecting the prediction mode applied to each of the block partitions or sub-blocks.
The inter-frame estimation unit 142 (inter estimation unit/inter picture estimation unit) is configured to receive or obtain the image block 103 (the current image block 103 of the current image 101) and the decoded
For example, the
The inter prediction unit 144 is configured to obtain or receive the inter prediction parameters 143, and perform inter prediction according to or using the inter prediction parameters 143 to obtain an inter prediction block 145.
Although fig. 1 shows two distinct units (or steps) for inter-coding, namely the inter estimation unit 142 and the inter prediction unit 144, both functionalities may be performed as one (inter estimation typically requires/comprises calculating an inter prediction block, i.e. the above-mentioned or a "kind of" inter prediction 144), e.g. by iteratively testing all possible inter prediction modes or a predetermined subset of the possible inter prediction modes while storing the currently best inter prediction mode and the corresponding inter prediction block, and using the currently best inter prediction mode and the corresponding inter prediction block as the (final) inter prediction parameters 143 and inter prediction block 145, without performing the inter prediction 144 another time.
The intra-frame estimation unit 152 is used for obtaining or receiving the image block 103 (current image block) and one or more previous reconstructed blocks (e.g., reconstructed neighboring blocks) of the same image for intra-frame estimation. For example, the
Embodiments of the
The intra-prediction unit 154 is used to determine an intra-prediction block 155 according to the intra-prediction parameters 153 (e.g., the selected intra-prediction mode 153).
Although fig. 1 shows two different units (or steps) for intra coding, i.e., intra estimation unit 152 and intra prediction unit 154, these two functions may be performed as a whole by, among other things, (intra estimation typically requires/includes calculating intra prediction blocks, i.e., the above-mentioned or a "class" of intra prediction 154): by iteratively testing all possible intra prediction modes or a predetermined subset of the possible intra prediction modes, the currently best intra prediction mode and the corresponding intra prediction block are stored at the same time, and the currently best intra prediction mode and the corresponding intra prediction block are used as the (final) intra prediction parameters 153 and the intra prediction block 155 without performing the intra prediction 154 once more.
The entropy encoding unit 170 is configured to apply an entropy encoding algorithm or scheme (e.g. a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, or context adaptive binary arithmetic coding (CABAC)) to the quantized residual coefficients 109, the inter prediction parameters 143, the intra prediction parameters 153 and/or the loop filter parameters, individually or jointly (or not at all), to obtain encoded
Fig. 2 shows an exemplary video decoder 200. The video decoder 200 is configured to receive, for example, encoded image data (e.g., an encoded code stream) 171 encoded by the
Decoder 200 includes an
The
In an embodiment of the decoder 200, the
Specifically,
The
The decoder 200 is operative to output a decoded
Fig. 4 illustrates an
The
The
The
Fig. 5 illustrates an apparatus 500 provided by another embodiment of the present invention. In particular, the device 500 is used for decoding video images. The apparatus 500 may specifically be the decoder 200 shown in fig. 2, or may be implemented in the decoder 200 of fig. 2. The device 500 comprises a receiver 501. The receiver 501 is arranged to receive (among other things)
In particular, the processor 502 is configured to generate
Fig. 6 illustrates a
Fig. 7 illustrates a method 700 provided by an embodiment of the invention. Method 700 is particularly useful for decoding video images and may be performed by apparatus 500 shown in fig. 5 and/or decoder 200 shown in fig. 2. The method 700 comprises the steps 701: the absolute value of the motion information is received 506. Step 701 may be performed by the receiver 501 in the device 500 or by the
A detailed description is given below on the basis of the general embodiments of the present invention described in conjunction with fig. 4 to 7, respectively. Specifically, two specific embodiments of the present invention are described as examples. These two embodiments are described in conjunction with fig. 8 and 9 and fig. 10-13, respectively. In two specific embodiments, the transmission of motion information from the
In the first particular embodiment, the
In the second embodiment, the
Fig. 8 is a flow chart of a possible implementation of the first embodiment. The
The
In this implementation manner of the first specific embodiment, the apparatus 500 in the decoder 200 is configured to read (step 806 and step 807) the index (MVSD_idx) and the
Fig. 9 is a flow chart of a possible implementation of the second embodiment. The
Subsequently, the
In this implementation of the second specific embodiment, the device 500 in the decoder 200 is configured to read (step 907) the
Fig. 10 shows how in one possible implementation of the first particular embodiment the
Fig. 11 shows how in one possible implementation of the second particular embodiment the
Fig. 12 shows how in one possible implementation of the first particular embodiment, the device 500 is integrated into the decoder 200 of fig. 2, in particular how components in the device 500 are integrated into the
The
Fig. 13 shows how in one possible implementation of the second particular embodiment the device 500 is integrated into the decoder 200 of fig. 2, in particular how components in the device 500 are integrated into the
The
Fig. 14 shows in more detail how the
Specifically, in fig. 14, the absolute value of the MVD is used (step 1401) as an input, and a list of
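The candidate construction of FIG. 14 can be sketched as follows: from the absolute values of the two MVD components, one candidate is built per sign combination, each candidate is given a cost, and the candidates are ranked by ascending cost. The cost function here is a caller-supplied stand-in for the actual cost measure used by the codec (the specific cost measure is not reproduced here), and sign combinations that collapse to the same vector because a component is zero are deduplicated.

```python
from itertools import product

# Sketch of the MVD candidate list construction: candidates are the sign
# combinations of the absolute MVD values, ranked by ascending cost. Ties
# are broken deterministically so that encoder and decoder, which run the
# same procedure, derive the identical ordering (rank values).

def build_ranked_mvd_candidates(abs_mvd, cost_fn):
    ax, ay = abs(abs_mvd[0]), abs(abs_mvd[1])
    # One candidate per sign combination; a set removes duplicates that
    # arise when a component is zero.
    candidates = {(sx * ax, sy * ay) for sx, sy in product((1, -1), repeat=2)}
    return sorted(candidates, key=lambda mvd: (cost_fn(mvd), mvd))
```

In this sketch, the encoder would determine the rank of the actual MVD in this list and transmit only the absolute values together with that rank, while the decoder rebuilds the identical list and selects the candidate at the received rank.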
It should be noted that this specification provides explanations with regard to images (frames); in the case of an interlaced image signal, however, fields replace the images.
Those skilled in the art will understand that the "steps" ("units") of the various figures (methods and apparatuses) represent or describe the functionalities of embodiments of the invention (rather than necessarily individual "units" in hardware or software), and thus equally describe the functions or features of apparatus embodiments as well as of method embodiments (a unit corresponding to a step).
The term "unit" is used merely to illustrate the functionality of an embodiment of an encoder/decoder and is not intended to limit the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other manners. For example, the described apparatus embodiments are merely exemplary. For example, the unit division is merely a division by logical function; in actual implementation there may be other manners of division. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be implemented in electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units are integrated into one unit.
Embodiments of the invention may also include an apparatus, e.g., an encoder and/or decoder, including processing circuitry to perform any of the methods and/or processes described herein.
Embodiments of
The functionality of the encoder 100 (and corresponding encoding method 100) and/or the decoder 200 (and corresponding decoding method 200) may be implemented by program instructions stored on a computer readable medium. Which when executed cause a processing circuit, computer, processor, etc., to perform the steps of the encoding and/or decoding method. The computer readable medium may be any medium that stores the program, including non-transitory storage media such as a blu-ray disc, DVD, CD, USB (flash) drive, hard disk, server storage available via a network, and the like.
Embodiments of the invention include or are a computer program comprising program code. The program code is for performing any of the methods described herein when executed on a computer.
Embodiments of the invention include or are a computer readable medium containing program code. The program code, when executed by a processor, causes a computer system to perform any of the methods described herein.
REFERENCE SIGNS LIST
FIG. 1
100 encoder
103 image block
102 input (e.g., input port, input interface)
104 residual calculation [ units or steps ]
105 residual block
106 transformation (e.g., additionally including scaling) [ units or steps ]
107 transform coefficients
108 quantification [ units or steps ]
109 quantized coefficients
110 inverse quantization [ units or steps ]
111 dequantization coefficients
112 inverse transformation (e.g., additionally including scaling) [ units or steps ]
113 inverse transform block
114 reconstruction [ units or steps ]
115 reconstruction block
116 (column) buffer [ unit or step ]
117 reference pixel
120 loop filter [ unit or step ]
121 filter block
130 Decoded Picture Buffer (DPB) [ unit or step ]
142 inter-frame estimation (inter picture estimation) [ units or steps ]
143 inter-frame estimation parameters (e.g., reference picture/reference picture index, motion vector/offset)
144 inter prediction/inter picture prediction [ unit or step ]
145 inter-predicted block
152 intra estimation/intra picture estimation [ unit or step ]
153 Intra prediction parameters (e.g., Intra prediction mode)
154 intra prediction (intra prediction/intra frame/picture prediction) [ units or steps ]
155 intra prediction block
162 mode selection [ cell or step ]
165 prediction block (inter prediction block 145 or intra prediction block 155)
170 entropy coding [ units or steps ]
171 coded image data (e.g., codestream)
172 output (output port, output interface)
231 decoding images
FIG. 2
200 decoder
171 coded image data (e.g., codestream)
202 input (Port/interface)
204 entropy decoding
209 quantized coefficients
210 inverse quantization
211 dequantizing coefficients
212 inverse transformation (zoom)
213 inverse transform block
214 rebuild (unit)
215 reconstructed block
216 (column) buffer
217 reference pixel point
220 Loop filter (in-loop filter)
221 filter block
230 Decoded Picture Buffer (DPB)
231 decoding images
232 output (Port/interface)
244 inter prediction (inter prediction/inter frame/picture prediction)
245 interframe prediction block
254 intra prediction (intra prediction/intra frame/picture prediction)
255 intra prediction block
260 mode selection
265 prediction block (inter prediction block 245 or intra prediction block 255)
FIG. 3
300 coding system
310 source device
312 image source
313 (original) image data
314 preprocessor/preprocessing unit
315 pre-processing image data
318 communication unit/interface
320 destination device
322 communication unit/interface
326 post-processor/post-processing unit
327 post-processing image data
328 display device/unit
330 transmission/reception/communication (encoding) of image data
FIG. 4
400 video image encoding apparatus
401 processor
402 motion information
403 motion information candidates
404 cost of motion information candidates
405 rank value of motion information candidates
406 transmitter
407 absolute value of motion information
FIG. 5
500 video image decoding apparatus
501 receiver
502 processor
503 motion information candidates
504 cost of motion information candidates
505 rank value of motion information candidates
506 absolute value of motion information
507 motion information
FIG. 6
601 generating motion information
602 constructing motion information candidates
603 calculate the cost of the motion information candidates
604 determining rank values of motion information candidates
605 transmit absolute values according to permutation values
FIG. 7
701 receiving absolute values of motion information
702 constructing motion information candidates
703 calculating the cost of the motion information candidate
704 determining a rank value of a motion information candidate
705 determining motion information from permutation values
FIG. 10
1001 motion estimation
1002 code MVP index
1003 encodes the MVD absolute value
1004 construct MVD candidates
1005 calculating a permutation value
1006 compares the MVD to the MV
1007 mode selection
FIG. 11
1101 motion estimation
1102 encode the MVP index
1103 encodes the MVD absolute value
1104 construction of MVD candidates
1105 calculating a permutation value
1106 compares the MVD to the MV
1107 mode selection
FIG. 12
1201 parses the MVP index
1202 resolving MVD absolute values
1203 parsing indexes
1204 construction of MVD candidates
1205 calculates permutation values and selects MVD candidates
1206 construction of MVs
1207 mode selection
1208 motion compensation
FIG. 13
1301 parse MVP index
1302 parsing MVD absolute values
1303 construction of MVD candidates
1304 compute rank values and select MVD candidates
1305 construction of MV
1306 mode selection
1307 motion compensation