Motion information propagation in video coding

Document No.: 1581259 · Publication date: 2020-01-31

Reading note: This technology, "Motion Information Propagation in Video Coding," was created by Kai Zhang, Xiang Li, Jianle Chen, Li Zhang, and M. Karczewicz on 2018-06-21. Abstract: The techniques relate to a device for decoding a current block of video data in a currently coded picture. The device may include a memory configured to store video data. The device may also include a processor configured to generate a first prediction block for the current block of the video data in the current picture according to an intra prediction mode, and a second prediction block for the current block of the video data in the current picture according to an inter prediction mode. The processor may be configured to generate motion information that propagates from the second prediction block to the first prediction block, use the motion information to obtain a final prediction block, and then generate a reconstructed block based on a combination of the final prediction block and a residual block.

1. A device for decoding a current block of video data in a currently coded picture, the device comprising:

a memory configured to store video data; and

a processor configured to:

generate a first prediction block for the current block of the video data in the current picture according to an intra prediction mode;

generate a second prediction block for the current block of the video data in the current picture according to an inter prediction mode;

generate motion information propagated from the second prediction block to the first prediction block of the picture;

obtain a final prediction block using the motion information; and

generate a reconstructed block based on a combination of the final prediction block and a residual block.

2. The device of claim 1, wherein the first prediction block is used in construction of a candidate list.

3. The device of claim 2, wherein the candidate list is a merge candidate list.

4. The device of claim 2, wherein the candidate list is an AMVP list.

5. The device of claim 1, wherein the first prediction block and the second prediction block are neighboring blocks.

6. The device of claim 5, wherein the first prediction block and the second prediction block are spatially neighboring blocks.

7. The device of claim 5, wherein the first prediction block and the second prediction block are temporally neighboring blocks.

8. The device of claim 5, wherein the neighboring block is within the same slice, tile, LCU row, or picture.

9. The device of claim 5, wherein the neighboring block is located in one or a plurality of previously coded frames.

10. The device of claim 1, wherein a relative position of the second prediction block with respect to the first prediction block is predefined.

11. The device of claim 1, wherein the second prediction block is selected from a plurality of neighboring blocks according to a predetermined rule.

12. The device of claim 1, wherein the propagation of the motion information is performed at a sub-block level.

13. A method of processing video data, comprising:

generating a first prediction block for a block of a picture according to an intra prediction mode;

generating a second prediction block for the block of the picture according to an inter prediction mode;

propagating motion information to the first prediction block based on motion information from the second prediction block; and

generating a final prediction block for the block of the picture based on a combination of the first prediction block and the second prediction block.

14. The method of claim 13, wherein the first prediction block is used in construction of a candidate list.

15. The method of claim 14, wherein the candidate list is a merge candidate list.

16. The method of claim 14, wherein the candidate list is an AMVP list.

17. The method of claim 13, wherein the first prediction block and the second prediction block are neighboring blocks.

18. The method of claim 17, wherein the first prediction block and the second prediction block are spatially neighboring blocks.

19. The method of claim 17, wherein the first prediction block and the second prediction block are temporally neighboring blocks.

20. The method of claim 17, wherein the neighboring block is within the same slice, tile, LCU row, or picture.

21. The method of claim 17, wherein the neighboring block is located in one or a plurality of previously coded frames.

22. The method of claim 13, wherein the first prediction block inherits motion information from the second prediction block, and wherein a relative position of the second prediction block with respect to the first prediction block is predefined.

23. The method of claim 13, wherein the second prediction block is selected from a plurality of neighboring blocks according to a predetermined rule.

24. The method of claim 23, wherein the propagation of the motion information is performed at a sub-block level.

25. The method of claim 13, wherein the propagating of the motion information is performed after encoding a top block.

26. The method of claim 13, wherein the propagating of the motion information is performed after decoding a top block.

27. A device for encoding a current block of video data in a currently coded picture, the device comprising:

a memory configured to store video data; and

a processor configured to:

generate a first prediction block for the current block of the video data in the current picture according to an intra prediction mode;

generate a second prediction block for the current block of the video data in the current picture according to an inter prediction mode;

generate motion information propagated from the second prediction block to the first prediction block of the picture;

obtain a final prediction block using the motion information; and

generate a reconstructed block based on a combination of the final prediction block and a residual block.

28. The device of claim 27, wherein the first prediction block is used in construction of a merge candidate list.

29. The device of claim 27, wherein the first prediction block is used in construction of an AMVP list.

30. A computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform:

generating a first prediction block for a block of a picture according to an intra prediction mode;

generating a second prediction block for the block of the picture according to an inter prediction mode;

propagating motion information to the first prediction block based on motion information from the second prediction block; and

generating a final prediction block for the block of the picture based on a combination of the first prediction block and the second prediction block.

Technical Field

The present disclosure relates to video encoding and video decoding.

Background

Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. The latest joint draft of MVC is described in "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264, March 2010.

Furthermore, there is a newly developed video coding standard, High Efficiency Video Coding (HEVC), developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). A recent draft of HEVC is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the recently finalized High Efficiency Video Coding (HEVC) standard, and extensions of such standards.

For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as tree blocks, coding units (CUs), and/or coding nodes.

For further compression, the residual data may be transformed from the pixel domain to a transform domain, producing residual transform coefficients, which may then be quantized.

Disclosure of Invention

In some examples, the techniques relate to a device for decoding a current block of video data in a currently coded picture.

In another example, the techniques involve a method for processing video data that includes generating a first prediction block for a block of a picture according to an intra prediction mode, generating a second prediction block for the block of the picture according to an inter prediction mode, and propagating motion information to the first prediction block based on motion information from the second prediction block.

Other features, objects, and advantages of the technology described in this disclosure will be apparent from the description, drawings, and claims.

Drawings

Fig. 1A and 1B show examples of intra and inter frames.

Fig. 2 shows intra prediction modes representing different prediction directions in HEVC.

Fig. 3 shows reference frames using unidirectional prediction and bidirectional prediction.

FIG. 4 shows exemplary neighboring blocks of a current block.

Fig. 5A and 5B illustrate spatially neighboring MV candidates for the merge mode (fig. 5A) and the AMVP mode (fig. 5B).

Fig. 6A illustrates TMVP candidates and fig. 6B illustrates MV scaling.

Fig. 7 is an example of a candidate list constructed based on a predefined order (or with a predefined priority).

Fig. 8 illustrates an example of the construction of a merge candidate list in HEVC.

Fig. 9 shows an example of motion propagation between inter-coded blocks and intra-coded blocks, and an example of motion propagation between intra-coded blocks and inter-coded blocks.

Fig. 10 shows an example of how motion information is inherited based on the intra prediction mode of a current block.

Fig. 11 illustrates an example of inheriting motion information from temporal neighboring blocks.

Fig. 12 illustrates an example of constructing a merge candidate list with inherited motion information.

Fig. 13 illustrates another example of constructing a merge candidate list with inherited motion information.

Fig. 14 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure.

Fig. 15 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure.

Fig. 16 is a flow diagram illustrating an example video decoding process in accordance with the techniques of this disclosure.

Detailed Description

This disclosure describes techniques relating to filtering methods that may be used in a post-processing stage (as part of in-loop coding) or in a prediction stage. The techniques of this disclosure may be implemented into existing video codecs, such as HEVC (high efficiency video coding), or efficient coding tools for future video coding standards, such as the h.266 standard currently being developed.

Basics of video coding

Referring to Figs. 1A and 1B, in a video coding standard such as HEVC there are two categories of frames (or slices; a frame may be divided into slices for coding): inter frames and intra frames. In an intra frame, a block may only be coded as an intra block. See G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand (December 2012), "Overview of the High Efficiency Video Coding (HEVC) Standard" (PDF), IEEE Transactions on Circuits and Systems for Video Technology (IEEE) 22(12), retrieved 2012-09-14 (hereinafter "[1]"). In an inter frame, however, a block may be coded as an intra block or an inter block. Figs. 1A and 1B show examples of intra and inter frames. The term "block" as used herein may be a coding unit/block, prediction unit/block, sub-PU, transform unit/block, or any other coding structure. The term "frame" may be used interchangeably herein with "picture".

Referring to Fig. 2, intra prediction is used to predict an intra-coded block, where the current block is predicted from neighboring pixels in the current frame (e.g., pixels in neighboring blocks). Inter prediction is applied to an inter-coded block, which is predicted from pixels in a previously coded/decoded frame (named a "reference frame"). In HEVC, there are 35 intra prediction modes representing different prediction directions, as shown in Fig. 2. In JEM, the number of intra prediction modes is increased to 67. See J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, and J. Boyce, JVET-F1001, "Algorithm Description of Joint Exploration Test Model 6," April 2017 (hereinafter "[2]").

For inter prediction, motion compensation (MC) is performed from one reference block (uni-directional prediction) or two reference blocks (bi-directional prediction) in the reference frame(s), as shown in Fig. 3. Each inter-coded block has its own motion information, including a reference frame index and a motion vector (MV).
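The distinction can be sketched in a few lines of Python. This is a simplified illustration with integer-pel motion and hypothetical helper names; real codecs interpolate sub-pel positions and handle picture borders:

```python
# Simplified sketch of uni- vs. bi-directional motion compensation.
# Frames are lists of pixel rows; motion vectors are integer-pel; no
# sub-pixel interpolation or border handling (both exist in real codecs).

def fetch_reference_block(ref_frame, mv, x, y, w, h):
    """Fetch the w x h block displaced by motion vector mv from (x, y)."""
    mv_x, mv_y = mv
    return [row[x + mv_x: x + mv_x + w]
            for row in ref_frame[y + mv_y: y + mv_y + h]]

def uni_prediction(ref0, mv0, x, y, w, h):
    # One reference block: the prediction is the displaced block itself.
    return fetch_reference_block(ref0, mv0, x, y, w, h)

def bi_prediction(ref0, mv0, ref1, mv1, x, y, w, h):
    # Two reference blocks: average them with rounding.
    p0 = fetch_reference_block(ref0, mv0, x, y, w, h)
    p1 = fetch_reference_block(ref1, mv1, x, y, w, h)
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]
```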

Referring to Fig. 4, to code an intra prediction mode, the intra prediction modes of neighboring blocks are used to predict the mode of the current block. Fig. 4 shows exemplary neighboring blocks of a current block. In HEVC, the intra prediction modes of neighboring blocks A and B are used to predict the intra mode of the current block. In JEM, more neighboring blocks (A0, A1, B0, B1, B2) are used to predict the current mode. If a neighboring block does not exist (e.g., the current block is at the boundary of the frame) or the neighboring block is not intra coded (e.g., the current frame is an inter frame and the neighboring block is inter coded while the current block is intra coded), the neighboring block is marked as "unavailable" and its intra prediction mode is not used to predict the intra prediction mode of the current block.

After prediction, the residue (the difference between the block and its prediction) is typically generated and encoded at the encoder using a frequency transform. After prediction, at the decoder, the residual data in the bitstream may be used to reconstruct the residue, which is then combined with the prediction at the decoder to generate the decoded block.

Motion information

Here, forward and backward are the two prediction directions, corresponding to reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1) of the current picture or slice.

If only one reference picture list is used for a given picture or slice, every block inside the picture or slice is forward predicted.

For each prediction direction, the motion information contains a reference index and a motion vector. The reference index is used to identify a reference picture in the corresponding reference picture list (e.g., RefPicList0 or RefPicList1).
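As a concrete illustration, the motion information described above can be modeled as a small record type. The following Python sketch is only illustrative (the field names are assumptions, not taken from any standard):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative container for per-block motion information: one
# (reference index, motion vector) pair per prediction direction.
@dataclass
class MotionInfo:
    ref_idx_l0: Optional[int] = None          # index into RefPicList0
    mv_l0: Optional[Tuple[int, int]] = None   # motion vector for list 0
    ref_idx_l1: Optional[int] = None          # index into RefPicList1
    mv_l1: Optional[Tuple[int, int]] = None   # motion vector for list 1

    @property
    def is_bi_predicted(self) -> bool:
        # Bi-prediction uses both directions; uni-prediction uses one.
        return self.mv_l0 is not None and self.mv_l1 is not None
```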

POC

Although there may be cases where two pictures within one coded video sequence have the same POC value, this typically does not occur within a coded video sequence.

The POC values of pictures are typically used for reference picture list construction, derivation of the reference picture set (as in HEVC), and motion vector scaling.

Advanced Video Coding (AVC)

MB structure in AVC

In H.264/AVC, each inter macroblock (MB) may be partitioned in one of four different ways: (a) one 16x16 MB partition, (b) two 16x8 MB partitions, (c) two 8x16 MB partitions, and (d) four 8x8 MB partitions.

Different MB partitions in one MB may have different reference index values for each direction (RefPicList0 or RefPicList1). When an MB is not partitioned into four 8x8 MB partitions, it has only one motion vector for each MB partition in each direction. When an MB is partitioned into four 8x8 MB partitions, each 8x8 MB partition may be further partitioned into sub-blocks, each of which may have a different motion vector in each direction. There are four different ways to obtain sub-blocks from an 8x8 MB partition: (a) one 8x8 sub-block, (b) two 8x4 sub-blocks, (c) two 4x8 sub-blocks, and (d) four 4x4 sub-blocks. Each sub-block may have a different motion vector in each direction.

Temporal direct mode in AVC

In temporal direct mode, for each MB partition, the motion vectors of the block collocated with the current MB partition in RefPicList1[0] of the current block are used to derive the motion vectors.

Spatial direct mode in AVC

In AVC, the direct mode may also predict motion information from spatial neighbors.

HEVC

Coding structure in HEVC

In HEVC, the largest coding unit in a slice is referred to as a Coding Tree Block (CTB) or Coding Tree Unit (CTU). The CTB contains a quadtree whose nodes are coding units.

The size of a CTB may range from 16x16 to 64x64 in the HEVC main profile (although, technically, 8x8 CTB sizes can be supported). A coding unit (CU) may be the same size as the CTB, or as small as 8x8. Each coding unit is coded with one mode. When a CU is inter coded, it may be further partitioned into 2 or 4 prediction units (PUs), or become just one PU when further partitioning does not apply. When two PUs are present in one CU, they may be half-size rectangles or two rectangles with 1/4 or 3/4 the size of the CU.

When a CU is inter coded, one set of motion information is present for each PU. In addition, each PU is coded with a unique inter prediction mode to derive the set of motion information. Each unit includes one or more blocks of the luma or chroma components.

Motion vector prediction

In the HEVC standard, there are two inter prediction modes for a prediction unit (PU), named merge mode (skip is considered a special case of merge) and advanced motion vector prediction (AMVP) mode, respectively.

In AMVP mode, however, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MVP index into the MV candidate list, since an AMVP candidate contains only a motion vector.

Candidates for both modes are derived in a similar way from the same spatial and temporal neighboring blocks.

Spatial neighboring candidates

Fig. 5A and 5B illustrate spatially neighboring MV candidates for the merge mode (fig. 5A) and the AMVP mode (fig. 5B).

Referring to Figs. 5A and 5B, for a specific PU (PU0), spatial MV candidates are derived from the neighboring blocks shown; however, the methods of generating the candidates from the blocks differ for merge mode and AMVP mode.

In merge mode, up to four spatial MV candidates may be derived in the order shown with numbers in Fig. 5A: left (0, A1), top (1, B1), top right (2, B0), bottom left (3, A0), and top left (4, B2). In other words, Fig. 5A illustrates spatial neighboring MV candidates for merge mode, and Fig. 5B illustrates spatial neighboring MV candidates for AMVP mode.

In AMVP mode, as illustrated in Fig. 5B, the neighboring blocks are divided into two groups: a left group consisting of blocks 0 and 1, and an above group consisting of blocks 2, 3, and 4. For each group, the potential candidate in a neighboring block referring to the same reference picture as that indicated by the signaled reference index has the highest priority to be chosen to form the final candidate of the group.
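A minimal sketch of this per-group selection, using dictionaries as stand-in neighbor records (the field names are illustrative assumptions):

```python
def amvp_group_candidate(group, signaled_ref_poc):
    """Return the MV of the first available neighbor in the group whose
    reference picture matches the one indicated by the signaled reference
    index; such a candidate has the highest priority within the group."""
    for neighbor in group:  # e.g., the left group: [block_0, block_1]
        if neighbor is None or not neighbor["inter_coded"]:
            continue  # missing or intra-coded neighbors contribute nothing here
        if neighbor["ref_poc"] == signaled_ref_poc:
            return neighbor["mv"]
    return None  # otherwise fall back to a scaled candidate (not shown)

# Example: the left group, where block 1 references the signaled picture.
left_group = [{"inter_coded": True, "ref_poc": 4, "mv": (3, -1)},
              {"inter_coded": True, "ref_poc": 8, "mv": (6, -2)}]
assert amvp_group_candidate(left_group, 8) == (6, -2)
```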

Temporal motion vector prediction in HEVC

If enabled and available, a temporal motion vector predictor (TMVP) candidate is added to the MV candidate list after the spatial motion vector candidates. The process of motion vector derivation for the TMVP candidate is the same for both merge mode and AMVP mode; however, the target reference index for the TMVP candidate is always set to 0 in merge mode.

Fig. 6A illustrates TMVP candidates and fig. 6B illustrates MV scaling.

Referring to Fig. 6A, the primary block location for TMVP candidate derivation is the bottom-right block outside of the collocated PU, shown as block "T", to compensate for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located outside of the current CTB row or its motion information is not available, the block is substituted with a center block of the PU.

The motion vector of the TMVP candidate is derived from the collocated PU of the collocated picture, indicated at the slice level. The motion vector of the collocated PU is called the collocated MV.

Similar to temporal direct mode in AVC, to derive the motion vector of the TMVP candidate, the collocated MV needs to be scaled to compensate for temporal distance differences, as shown in Figs. 6A and 6B.

Other aspects of motion prediction in HEVC

Several aspects of merge mode and AMVP mode are worth mentioning. Motion vector scaling: it is assumed that the value of a motion vector is proportional to the distance between pictures in presentation time. A motion vector associates two pictures: the reference picture and the picture containing the motion vector (namely the containing picture). When a motion vector is utilized to predict another motion vector, the distance between the containing picture and the reference picture is calculated based on Picture Order Count (POC) values.

For a motion vector to be predicted, both its associated containing picture and its reference picture may be different. Therefore, a new distance (based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing pictures of the two motion vectors are the same, while the reference pictures are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
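The scaling itself reduces to a ratio of the two POC distances. The following Python sketch is a floating-point simplification; HEVC actually uses fixed-point arithmetic with clipping:

```python
def scale_mv(mv, poc_containing, poc_ref, poc_new_containing, poc_new_ref):
    """Scale a motion vector by the ratio of POC distances, as described
    above. Floating-point simplification of HEVC's fixed-point scaling."""
    td = poc_containing - poc_ref          # POC distance of the existing MV
    tb = poc_new_containing - poc_new_ref  # POC distance for the predicted MV
    if td == 0:
        return mv
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

# Example: a collocated MV (8, -4) spanning a POC distance of 4 is scaled
# to a target POC distance of 2, giving (4, -2).
assert scale_mv((8, -4), 8, 4, 6, 4) == (4, -2)
```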

Artificial motion vector candidate generation: if the motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the list until the list is full.

In merge mode, there are two types of artificial MV candidates: combined bi-predictive candidates, derived only for B slices, and default fixed (zero) candidates. Only zero candidates are used for AMVP if the first type does not provide enough artificial candidates.

For each pair of candidates that are already in the candidate list and have the necessary motion information, a bi-directional combined motion vector candidate may be derived by combining the motion vector of the first candidate, referring to a picture in list 0, with the motion vector of the second candidate, referring to a picture in list 1.

Candidates from different blocks may happen to be identical, which decreases the efficiency of the candidate list. A pruning process may be applied to solve this problem: it compares one candidate against the others in the current candidate list to avoid, to a certain extent, inserting identical candidates. To reduce complexity, only a limited number of pruning operations are applied, instead of comparing each potential candidate with all the other existing candidates.
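A sketch of such limited pruning in Python (the bound on comparisons is illustrative; the standard fixes specific candidate pairs to compare):

```python
def prune_and_insert(candidate_list, candidate, max_comparisons=4):
    """Insert a motion candidate unless it duplicates one of the first few
    existing candidates. Comparing against only a bounded prefix mirrors the
    limited pruning described above; the bound of 4 is illustrative."""
    for existing in candidate_list[:max_comparisons]:
        if existing == candidate:
            return False  # pruned: identical to an existing candidate
    candidate_list.append(candidate)
    return True

candidates = [(2, 0), (0, -1)]
assert prune_and_insert(candidates, (2, 0)) is False   # duplicate is pruned
assert prune_and_insert(candidates, (5, 3)) is True    # new candidate inserted
```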

For example, in HEVC, a merge candidate list or an AMVP candidate list is constructed by inserting candidates based on a predefined order (or with a predefined priority). As shown in Fig. 7, a merge candidate list is constructed by inserting spatial merge candidates in a predefined order (A1 → B1 → B0 → A0 → B2).

The term "available" means that the block exists, is inter coded, the candidate list is incomplete, and the motion information in the block is not pruned by existing candidates in the current candidate list. It should be noted that a candidate may be pruned against only part of the existing candidates in the current candidate list. B2 is checked only if there are fewer than 4 candidates after checking A1, B1, B0, and A0. If the merge candidate list is still incomplete after checking all spatial and temporal neighboring blocks, artificial candidates are filled in to complete the merge candidate list.
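Putting the order and the availability rules together, a merge-list construction along these lines might look as follows in Python. This is a sketch; the candidate records, the list size of 5, and the zero-MV fill are illustrative (HEVC varies the reference index of zero candidates, which is omitted here):

```python
MAX_MERGE_CANDIDATES = 5  # illustrative list size

def build_merge_list(neighbors, tmvp_candidate):
    """neighbors maps "A1"/"B1"/"B0"/"A0"/"B2" to a motion tuple or None."""
    merge_list = []

    def try_insert(motion):
        # "available" here: block exists, list incomplete, not a duplicate.
        if (motion is not None and motion not in merge_list
                and len(merge_list) < MAX_MERGE_CANDIDATES):
            merge_list.append(motion)

    for name in ("A1", "B1", "B0", "A0"):      # predefined spatial order
        try_insert(neighbors.get(name))
    if len(merge_list) < 4:                    # B2 checked only if fewer than 4
        try_insert(neighbors.get("B2"))
    try_insert(tmvp_candidate)                 # temporal candidate after spatial
    while len(merge_list) < MAX_MERGE_CANDIDATES:
        merge_list.append(("zero", (0, 0)))    # artificial fill candidates
    return merge_list
```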

Improving decoding efficiency

To improve prediction efficiency, embodiments include propagating motion information to intra-coded blocks in inter pictures. In other words, an intra-coded block itself may be used for motion vector prediction. For example, intra-coded blocks may be used in the construction of candidate lists, such as the merge candidate list and the AMVP list, for subsequently coded blocks. According to some embodiments, the following exemplary methods may be applied individually. According to alternative embodiments, any combination of the exemplary methods may be applied.

Fig. 9 illustrates an example of motion propagation between inter-coded blocks and intra-coded blocks, and an example of motion propagation between intra-coded blocks and inter-coded blocks.

An intra-coded block may inherit motion information from spatial and/or temporal neighboring blocks, where the neighboring blocks may be intra coded and/or inter coded.

In some examples, the neighboring blocks are neighboring blocks within the same slice, tile, LCU row, or picture.

Alternatively, a neighboring block may be a neighboring block located in one or a plurality of previously coded frames.

An intra-coded block may inherit motion information from one neighboring block, and the relative position of that neighboring block may be predefined.

A rule may further be defined to select one neighboring block from multiple neighboring blocks, and the intra-coded block will inherit the motion information from the selected neighboring block.

According to an example, each block is filled with motion information after being encoded/decoded, whether it is intra coded or inter coded.

According to an example, an intra-coded block inherits motion information from a neighboring block, and selection of a neighboring block from which to inherit the motion information of the current intra-coded block is based on the coding mode of the neighboring block.

According to an example, a priority-based approach may be defined to select motion information from neighboring blocks. When an intra-coded block chooses to inherit motion information from its neighboring blocks, the inter-coded neighboring blocks have a higher priority than the intra-coded neighboring blocks.

According to an example, if a neighboring block is intra coded, the intra-coded block inherits motion information from the neighboring block based on the intra prediction mode of the neighboring block.

In one embodiment, when an intra-coded block chooses to inherit motion information from its neighboring blocks, bi-directionally predicted neighboring blocks have a higher priority than uni-directionally predicted neighboring blocks.

In the example of FIG. 10, an intra-coded block inherits motion information from neighboring blocks based on the intra-prediction mode of the current intra-coded block.

Fig. 10 shows an example of how motion information is inherited based on the intra prediction mode of the current block. If the intra prediction mode is DC or Planar, the neighboring block priority is L > T > LT > LB > TR; if the intra prediction mode is a direction below the diagonal direction, the neighboring block priority is L > LB > LT > T > TR; if the intra prediction mode is a direction above the diagonal direction, the neighboring block priority is T > TR > LT > L > LB.

Fig. 11 illustrates an example of inheriting motion information from temporal neighboring blocks. The motion information of an intra-coded block may be propagated from temporally neighboring blocks. For example, an intra-coded block may inherit motion information from its collocated block in a collocated picture. Fig. 11 shows an example of propagation from a temporal neighboring block. A virtual reference block is located in a reference picture by a virtual motion vector, and the current block may inherit motion information from the virtual reference block. The virtual motion vector may be predefined, or it may be inherited from a spatial or temporal neighboring block.
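A sketch of the virtual-reference lookup in Python. The motion-field accessor is a stand-in for the codec's stored motion field of the reference picture; all names are illustrative:

```python
def inherit_from_virtual_reference(cur_x, cur_y, virtual_mv, ref_motion_field):
    """Locate the virtual reference block displaced from the current block by
    the virtual MV and inherit the motion information stored there.
    ref_motion_field maps an (x, y) position in the reference picture to the
    motion information stored at that position."""
    vx, vy = virtual_mv  # predefined, or inherited from a spatial/temporal neighbor
    return ref_motion_field((cur_x + vx, cur_y + vy))

# With a zero virtual MV this degenerates to the collocated-block case.
motion_field = {(16, 16): ("mv", (2, -1))}
assert inherit_from_virtual_reference(16, 16, (0, 0), motion_field.get) == ("mv", (2, -1))
```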

An exemplary method to inherit motion information of an intra-coded block in an inter-picture is described as follows.

First, five spatial neighboring blocks are accessed: L, T, TR, LB, and LT. Each neighboring block is checked for availability. Furthermore, in some examples, neighboring blocks outside the current slice/tile/LCU row are defined as unavailable and will not be used.

Next, the available neighboring blocks are classified into two categories: class 1 and class 2. A neighboring block is classified into class 1 if it is inter coded; otherwise (i.e., the neighboring block is intra coded), it is classified into class 2.

Next, a priority order list (POL) is constructed based on the intra prediction mode of the current block. For example, three different POLs are possible: (i) if the intra prediction mode is DC or Planar, POL = {L, T, LT}; (ii) if the intra prediction mode is a direction below the diagonal direction, POL = {L, LB, LT}; and (iii) if the intra prediction mode is a direction above the diagonal direction, POL = {T, TR, LT}.

Next, each neighboring block in the POL is checked one by one to find the first one in class 1. If the first one in class 1 can be found, the motion information is inherited from it and the algorithm stops.

Next, each neighboring block in the POL is checked one by one to find the first one in class 2. If the first one in class 2 can be found, the motion information is inherited from it and the algorithm stops.

Otherwise, default motion information is used. For example, for a B picture, bi-prediction with zero motion vectors, using reference index 0 into reference list 0 and reference index 0 into reference list 1, may be the default motion information.
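The whole procedure fits in a short Python sketch. Mode numbering follows HEVC's 35-mode scheme with Planar = 0, DC = 1, and the diagonal at mode 18; neighbor records are illustrative (coding_mode, motion) pairs, so the data layout is an assumption:

```python
PLANAR, DC = 0, 1
DIAGONAL = 18  # the diagonal angular direction in HEVC's 35-mode scheme

def priority_order_list(intra_mode, neighbors):
    """Build the POL from the current block's intra mode. neighbors maps
    "L"/"T"/"TR"/"LB"/"LT" to a (coding_mode, motion) pair, or None if the
    neighbor is unavailable (e.g., outside the slice/tile/LCU row)."""
    if intra_mode in (PLANAR, DC):
        order = ("L", "T", "LT")
    elif intra_mode < DIAGONAL:   # angular direction below the diagonal
        order = ("L", "LB", "LT")
    else:                         # angular direction above the diagonal
        order = ("T", "TR", "LT")
    return [neighbors[n] for n in order if neighbors.get(n) is not None]

def inherit_motion(intra_mode, neighbors, default_motion):
    pol = priority_order_list(intra_mode, neighbors)
    for mode, motion in pol:      # class 1 first: inter-coded neighbors
        if mode == "inter":
            return motion
    for mode, motion in pol:      # then class 2: intra-coded neighbors
        if mode == "intra" and motion is not None:
            return motion
    return default_motion         # e.g., zero-MV bi-prediction for a B picture

# Example: a horizontal-ish angular mode prefers the left/below-left neighbors.
nbrs = {"L": ("intra", (1, 1)), "LB": ("inter", (4, 0)), "LT": None}
assert inherit_motion(10, nbrs, None) == (4, 0)
```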

According to an embodiment, the motion information of the intra-coded block is populated with the motion information of the first candidate in the merge list of the current intra-coded block.

According to an embodiment, motion propagation may be performed at a sub-block level. A sub-block is an MxN block smaller than the current block. For example, the sub-block size may be 4x4, 4x8, 8x4, 8x8, etc. The current block consists of X non-overlapping sub-blocks. Motion information may be propagated from block to sub-block. Furthermore, motion information may be propagated from sub-block to sub-block.
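One possible sub-block propagation pattern is sketched below in Python. The left-to-right fill is purely illustrative of "block to sub-block, then sub-block to sub-block" propagation, not a pattern prescribed by the disclosure:

```python
def propagate_to_subblocks(block_w, block_h, sub_w, sub_h, left_neighbor_motion):
    """Split the current block into non-overlapping sub-blocks and fill each
    row left to right: the first sub-block inherits from the left neighboring
    block (block-to-sub-block), and each later sub-block copies its left
    sibling (sub-block-to-sub-block)."""
    cols, rows = block_w // sub_w, block_h // sub_h
    grid = []
    for r in range(rows):
        motion = left_neighbor_motion(r)  # motion entering this sub-block row
        row = []
        for _ in range(cols):
            row.append(motion)            # copy into the current sub-block
        grid.append(row)
    return grid

# An 8x8 block with 4x4 sub-blocks: two rows of two sub-blocks each.
grid = propagate_to_subblocks(8, 8, 4, 4, lambda r: (r, -r))
assert grid == [[(0, 0), (0, 0)], [(1, -1), (1, -1)]]
```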

According to an embodiment, the inherited motion information in an intra-coded block may be used in motion vector prediction. For example, it may be used to construct a merge candidate list and/or an AMVP candidate list for a subsequently coded block.

Fig. 12 illustrates an example of constructing a merge candidate list with inherited motion information.

According to another example, when constructing the candidate list, the order in which motion information from spatial or temporal neighboring blocks is inserted into the candidate list may depend on whether it is original information from an inter-coded block or inherited information from an intra-coded block.

The original information (i.e., associated with inter-coded blocks) and the inherited motion information (i.e., associated with intra-coded blocks) may have different priorities. Motion information with a higher priority may be added to the candidate list first.

The term "intra-valid" means that the neighboring block exists, is intra coded, the candidate list for the current block is incomplete, and the inherited motion information in the neighboring block is not pruned by existing candidates in the current candidate list.

Fig. 13 illustrates another example of constructing a merge candidate list with inherited motion information. In some examples, the original information associated with spatial neighboring blocks may have a higher priority than the inherited motion information, while the inherited motion information may have a higher priority than the original information associated with temporal neighboring blocks. Whether the spatial neighboring blocks are intra-valid is checked after checking the normal spatial candidates, and whether the temporal neighboring blocks are intra-valid is checked after checking the normal TMVP candidate.

In addition to considering whether motion information is original or inherited for the priority definition, further factors can be taken into account. In one example, the checking order is: available A1/B1/B0/A0 blocks, intra-valid A1/B1/B0/A0 blocks, available B2, intra-valid B2, TMVP.
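This checking order can be expressed directly in Python. This is a sketch; each neighbor record carries its coding mode and its original or inherited motion, with field names assumed:

```python
def build_merge_list_with_inherited(neighbors, tmvp, max_cands=5):
    """Checking order from the example above: available A1/B1/B0/A0, then
    intra-valid A1/B1/B0/A0, then available B2, intra-valid B2, then TMVP."""
    merge_list = []

    def try_add(blk, want_mode):
        # "available" = inter coded; "intra-valid" = intra coded with
        # inherited motion; both also require a non-full, duplicate-free list.
        if (blk is not None and blk["mode"] == want_mode
                and blk["motion"] is not None
                and blk["motion"] not in merge_list
                and len(merge_list) < max_cands):
            merge_list.append(blk["motion"])

    spatial = ("A1", "B1", "B0", "A0")
    for name in spatial:                   # original (inter) motion first
        try_add(neighbors.get(name), "inter")
    for name in spatial:                   # then inherited (intra-valid) motion
        try_add(neighbors.get(name), "intra")
    try_add(neighbors.get("B2"), "inter")  # available B2
    try_add(neighbors.get("B2"), "intra")  # intra-valid B2
    if tmvp is not None and tmvp not in merge_list and len(merge_list) < max_cands:
        merge_list.append(tmvp)            # TMVP last in this ordering
    return merge_list
```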

In another example, the inherited motion information may have a higher priority than motion information from non-spatially neighboring and/or non-temporally neighboring blocks.

According to another example, the inherited motion information may only be stored and used for coding the current slice/tile/picture.

FIG. 14 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure.

In the example of Fig. 14, video encoder 20 includes video data memory 33, partition unit 35, prediction processing unit 41, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Prediction processing unit 41 includes a motion estimation unit (MEU) 42, a motion compensation unit (MCU) 44, and an intra prediction unit 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, summer 62, filter unit 64, and decoded picture buffer (DPB) 66.

As shown in Fig. 14, video encoder 20 receives video data and stores the received video data in video data memory 33. Video data memory 33 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 33 may, for example, be obtained from video source 18. DPB 66 may be a reference picture memory that stores reference video data for use in encoding video data by video encoder 20, e.g., in intra or inter coding modes. Video data memory 33 and DPB 66 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 33 and DPB 66 may be provided by the same memory device or separate memory devices. In various examples, video data memory 33 may be on-chip with other components of video encoder 20, or off-chip relative to those components.

Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as part of a reference picture. Prediction processing unit 41 may be part of a processor configured to generate a first prediction block for the current block of the video data in the current picture according to an intra prediction mode, generate a second prediction block for the current block of the video data in the current picture according to an inter prediction mode, generate motion information propagated from the second prediction block to the first prediction block, obtain a final prediction block using the motion information, and generate a reconstructed block based on a combination of the final prediction block and a residual block.

In an example, the first prediction block and the second prediction block are neighboring blocks. In another example, the first prediction block and the second prediction block are spatially neighboring blocks. In another example, the first prediction block and the second prediction block are temporally neighboring blocks. In another example, the neighboring blocks are within the same slice, tile, LCU row, or picture.

Intra prediction unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

Motion estimation unit 42 may be configured to determine the inter prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices or B slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.

In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in DPB 66. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture.

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the location of the PU to the location of a predictive block of a reference picture.

The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in DPB 66. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation (possibly performing interpolations to sub-pixel precision). Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists.

After prediction processing unit 41 generates the predictive block for the current video block via intra prediction or inter prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. In another example, entropy encoding unit 56 may perform the scan.

After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed block.

Filter unit 64 filters the reconstructed block (e.g., the output of summer 62) and stores the filtered reconstructed block in DPB 66 for use as a reference block. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture. The filter unit 64 may perform any type of filtering such as deblocking filtering, SAO filtering, ALF and/or GALF, and/or other types of loop filters. For example, the deblocking filter may apply deblocking filtering to filter block boundaries to remove blockiness artifacts from the reconstructed video. The SAO filter may apply an offset to the reconstructed pixel values in order to improve the overall coding quality. Additional loop filters (in the loop or post-loop) may also be used.

Fig. 15 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. Video decoder 30 of Fig. 15 may, for example, be configured to receive the signaling described above with respect to video encoder 20 of Fig. 14. In the example of Fig. 15, video decoder 30 includes video data memory 78, entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, summer 90, and DPB 94. Prediction processing unit 81 includes motion compensation unit 82 and intra prediction unit 84. In some examples, video decoder 30 may perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 of Fig. 14.

During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Video decoder 30 stores the received encoded video bitstream in video data memory 78. Video data memory 78 may store video data, such as the encoded video bitstream, to be decoded by the components of video decoder 30. The video data stored in video data memory 78 may be obtained, for example, from storage device 26, via link 16, from a local video source such as a camera, or by accessing a physical data storage medium. Video data memory 78 may form a coded picture buffer (CPB) that stores encoded video data from the encoded video bitstream. DPB 94 may be a reference picture memory that stores reference video data for use in decoding video data by video decoder 30, e.g., in intra or inter coding modes. Video data memory 78 and DPB 94 may be formed by any of a variety of memory devices, such as DRAM, SDRAM, MRAM, RRAM, or other types of memory devices. Video data memory 78 and DPB 94 may be provided by the same memory device or separate memory devices. In various examples, video data memory 78 may be on-chip with other components of video decoder 30, or off-chip relative to those components.

Entropy decoding unit 80 of video decoder 30 entropy decodes the video data stored in video data memory 78 to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

When the video frame is coded as an intra-coded (I) slice, intra prediction unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded slice (e.g., a B slice or a P slice), motion compensation unit 82 of prediction processing unit 81 may generate a final predictive block for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The final predictive block may be produced from one of the reference pictures within one of the reference picture lists.

In an example, the first prediction block and the second prediction block are neighboring blocks. In another example, the first prediction block and the second prediction block are spatially neighboring blocks. In another example, the first prediction block and the second prediction block are temporally neighboring blocks. In another example, the neighboring blocks are within the same slice, tile, LCU row, or picture.

Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in DPB 94.

Inverse quantization unit 86 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

After prediction processing unit 81 generates the predictive block for the current video block using, for example, intra or inter prediction, video decoder 30 forms a reconstructed video block by summing the residual blocks from inverse transform processing unit 88 with the corresponding predictive blocks generated by motion compensation unit 82. Summer 90 represents the component or components that perform this summation operation.

Filter unit 92 filters the reconstructed block, such as the output of summer 90, and stores the filtered reconstructed block in DPB94 for use as a reference block. The reference block may be used by motion compensation unit 82 as a reference block to inter-predict a block in a subsequent video frame or picture. The filter unit 92 may perform any type of filtering such as deblocking filtering, SAO filtering, ALF and/or GALF, and/or other types of loop filters. For example, the deblocking filter may apply deblocking filtering to filter block boundaries to remove blockiness artifacts from the reconstructed video. The SAO filter may apply an offset to the reconstructed pixel values in order to improve the overall coding quality. Additional loop filters (in the loop or post-loop) may also be used.

Fig. 16 is a flow diagram illustrating an example video decoding process in accordance with the techniques of this disclosure. As illustrated in Fig. 16, a device for decoding a current block of video data in a currently coded picture includes a memory configured to store video data and a processor. The processor is configured to generate a first prediction block for the current block of video data in the current picture according to an intra prediction mode (122). The processor may be configured to generate a second prediction block for the current block of video data in the current picture according to an inter prediction mode (124). The processor may further be configured to generate motion information propagated from the second prediction block to the first prediction block of the picture (126). The processor may further be configured to obtain a final prediction block using the determined motion information (128), and to generate a reconstructed block based on a combination of the final prediction block and a residual block (130).
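The flow of Fig. 16 can be summarized as a small Python driver. The five arguments are stand-ins for the codec components, so their names and signatures are assumptions; step numbers from the figure appear as comments:

```python
def decode_current_block(intra_predict, inter_predict, propagate, combine, residual):
    """Mirror of the decoding flow in Fig. 16; all arguments except the
    residual block (a list of pixel rows) are callables standing in for
    real codec components."""
    first_pred = intra_predict()                  # (122) intra prediction block
    second_pred = inter_predict()                 # (124) inter prediction block
    motion = propagate(second_pred, first_pred)   # (126) propagated motion info
    final_pred = combine(first_pred, second_pred, motion)  # (128) final prediction
    return [[p + r for p, r in zip(pred_row, res_row)]     # (130) reconstruction
            for pred_row, res_row in zip(final_pred, residual)]
```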

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.

If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above.

The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.

The source device and the destination device may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like.

In some examples, the computer-readable medium may comprise a communication medium to enable the source device to transmit encoded video data directly to the destination device in real time.

In other examples, the destination device may access the stored video data from the storage device via streaming or download.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions (e.g., dynamic adaptive streaming over HTTP (DASH)), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, a system may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In some examples, a source device includes a video source, a video encoder, and an output interface. A destination device may include an input interface, a video decoder, and a display device.

The techniques of this disclosure may also be performed by a video pre-processor. The source device and the destination device are merely examples of such coding devices, in which the source device generates coded video data for transmission to the destination device.

The video source may include a video capture device, such as a video camera, a video archive containing pre-captured video, and/or a video feed interface to receive video from a video content provider.

As mentioned, the computer-readable medium may comprise a transitory medium, such as a wireless broadcast or wired network transmission, or a storage medium (i.e., a non-transitory storage medium), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable medium. In some examples, a network server (not shown) may receive encoded video data from a source device and provide the encoded video data to a destination device, e.g., via network transmission.

The display device displays the decoded video data to a user and may include any of various display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.

Various examples have been described. These and other examples are within the scope of the following claims.
