Low complexity frame rate up-conversion

Document No.: 1146568    Publication date: 2020-09-11

Reading note: This technology, Low complexity frame rate up-conversion, was created by Xiaoyu Xiu, Yuwen He, and Yan Ye on 2019-01-28. Abstract: Described are systems and methods for selecting motion vectors (MVs) for use in a Frame Rate Up Conversion (FRUC) coding process for video blocks. In one embodiment, a first set of motion vector candidates is identified for FRUC prediction of the block. A search center is defined based on the first set of motion vector candidates, and a search window is determined, the search window having a selected width and centered on the search center. A search for the selected MV is performed within the search window. In some embodiments, the initial set of MVs is processed using a clustering algorithm to generate a smaller number of MVs for use as the first set. The selected MV may undergo a motion refinement search, which may also be performed over a constrained search range. In additional embodiments, search iterations may be constrained to limit complexity.

1. A method for coding video, the method comprising: for at least one current block in the video:

identifying a first set of motion vector candidates for Frame Rate Up Conversion (FRUC) prediction of the block;

defining a search center based on the first set of motion vector candidates, wherein the search center is an average of one or more motion vectors in the first set of motion vector candidates;

determining a search window having a selected width and centered at the search center;

processing the first set of motion vector candidates to fall within the search window by clipping any motion vectors of the first set of motion vector candidates that fall outside the search window; and

performing a search for a selected motion vector from the processed first set of motion vector candidates.

2. The method of claim 1, further comprising:

performing a motion refinement search based on the selected motion vector candidate to generate a refined motion vector; and

predicting the block by frame rate up-conversion using the refined motion vector.

3. The method of any of claims 1-2, wherein identifying the first motion vector candidate set is performed by a method comprising:

aggregating the initial set of motion vector candidates into a plurality of clusters; and

for each cluster, calculating a centroid of the respective cluster and contributing a centroid motion vector representing the centroid of the respective cluster to the first motion vector candidate set.

4. The method of claim 3, wherein identifying the first set of motion vector candidates comprises limiting the first set of motion vector candidates to a selected maximum number of motion vectors.

5. The method according to any of claims 1-4, wherein said search center is one motion vector in said first motion vector candidate set.

6. The method according to any of claims 1-4, wherein the search center is an average of the first motion vector candidate set.

7. The method of claim 3, wherein the search center is the center of the cluster having the most motion vector candidates.

8. The method of any of claims 1-7, wherein performing a search for a selected motion vector from the processed first motion vector candidate set comprises: selecting a motion vector that achieves the lowest matching cost from the processed first motion vector candidate set.

9. The method of any of claims 1-8, wherein the current block is a Coding Unit (CU), further comprising:

performing a motion refinement search based on the selected motion vector candidate to generate a refined CU-level motion vector; and

for each sub-CU in the coding unit:

using the refined CU-level motion vector as a motion vector candidate in a sub-CU-level motion vector search for the selected sub-CU-level motion vector;

refining the sub-CU level motion vector; and

predicting the sub-CU using the refined sub-CU level motion vector.

10. The method of any of claims 1-8, wherein the current block is a Coding Unit (CU), further comprising:

performing a motion refinement search based on the selected motion vector candidate to generate a refined CU-level motion vector; and

for each sub-CU level block inside the coding unit:

identifying a sub-CU initial motion vector set comprising the refined CU-level motion vector;

aggregating the sub-CU initial motion vectors into a plurality of clusters, each cluster having an associated centroid motion vector;

processing any centroid motion vector in the set that falls outside a sub-CU search window by clipping the centroid motion vector to fall inside the sub-CU search window;

performing a search for a selected sub-CU motion vector from the set of processed centroid motion vectors;

performing a motion refinement search inside the search window to generate refined sub-CU motion vectors; and

predicting the sub-CU level block by Frame Rate Up Conversion (FRUC) using the refined sub-CU motion vector.

11. The method of claim 10, wherein the sub-CU search window is the same as a search window used to determine the CU-level motion vectors.

12. The method of any of claims 1 to 11, further comprising: selecting at least one reference picture for Frame Rate Up Conversion (FRUC) prediction of the block, wherein the selected width of the search window is determined based at least in part on a picture order count (POC) distance between the current picture and the at least one reference picture.

13. The method of any of claims 1 to 12, further comprising:

performing a motion refinement search based on the selected motion vector candidate to generate a refined motion vector, wherein the motion refinement search is limited to a selected maximum number of iterations.

14. The method of any of claims 1-13, wherein the method is performed by a decoder.

15. A system comprising a processor and a non-transitory computer-readable medium storing instructions operable to perform a method for coding video, the method comprising: for at least one current block in the video:

identifying a first set of motion vector candidates for Frame Rate Up Conversion (FRUC) prediction of the block;

defining a search center based on the first set of motion vector candidates, wherein the search center is an average of one or more motion vectors in the first set of motion vector candidates;

determining a search window having a selected width and centered at the search center;

processing the first set of motion vector candidates to fall within the search window by clipping any motion vectors of the first set of motion vector candidates that fall outside the search window; and

performing a search for a selected motion vector from the processed first set of motion vector candidates.

Background

Video coding systems are widely used to compress digital video signals in order to reduce the storage requirements and/or transmission bandwidth of such signals. Among the different types of video coding systems (e.g., block-based, wavelet-based, and object-based systems), the most widely used and deployed today are block-based hybrid video coding systems. Examples of block-based video coding systems include international video coding standards such as MPEG-1/2/4 Part 2, H.264/MPEG-4 Part 10 AVC, VC-1, and the latest video coding standard, High Efficiency Video Coding (HEVC), developed by the JCT-VC of ITU-T/SG16/Q.6/VCEG and ISO/IEC MPEG.

The first version of the HEVC standard, completed in October 2013, saves about 50% of the bit rate, or provides equivalent perceptual quality, compared to the previous-generation video coding standard H.264/MPEG-4 AVC. Although the HEVC standard provides significant coding improvements over its predecessors, there is evidence that higher coding efficiency can be achieved beyond HEVC by using additional coding tools. Accordingly, both VCEG and MPEG began exploratory work on new coding techniques for future video coding standardization. In October 2015, ITU-T VCEG and ISO/IEC MPEG formed the Joint Video Exploration Team (JVET), and significant research was started on advanced technologies that could greatly enhance coding efficiency compared to HEVC. In the same month, a software codebase named the Joint Exploration Model (JEM) was also established for future video coding exploration work. The JEM reference software is based on the HEVC test model (HM) developed for HEVC by JCT-VC. Any proposed additional coding tools can be integrated into the JEM software and tested under the JVET common test conditions (CTC).

Disclosure of Invention

Example embodiments include methods used in video encoding and decoding (collectively, "coding"). In one example, a method is provided for encoding video containing a plurality of pictures including a current picture. For at least one current block (which may be, as an example, a coding unit or a sub-coding unit block) in a current picture, a first set of motion vector candidates for Frame Rate Up Conversion (FRUC) prediction of the block is identified. The first candidate set of motion vectors may be dominant motion vectors, wherein each dominant motion vector is associated with a cluster of one or more initial motion vectors. A search center is defined based on the first set of motion vector candidates. The search center may be located at a position determined by an average of one or more motion vector candidates in the first set.

A search window is determined, the search window having a selected width and centered on a search center. The first candidate set of motion vectors is processed to fall within the search window by clipping any motion vectors of the first set that fall outside the search window. A search for the selected motion vector may be performed, which selects a motion vector from the processed first motion vector candidate set.
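As an illustration of the window-constrained candidate processing described above, the following sketch clips each candidate MV component to a window centered on the average of the candidates. All function and variable names (e.g. `process_candidates`, `half_width`) are assumptions for illustration, not taken from the patent.

```python
def clip_mv_to_window(mv, center, half_width):
    """Clamp each MV component to [center - half_width, center + half_width]."""
    mvx = max(center[0] - half_width, min(mv[0], center[0] + half_width))
    mvy = max(center[1] - half_width, min(mv[1], center[1] + half_width))
    return (mvx, mvy)

def process_candidates(candidates, half_width):
    # Search center defined here as the average of the candidate MVs,
    # one of the options described in the text.
    cx = sum(mv[0] for mv in candidates) / len(candidates)
    cy = sum(mv[1] for mv in candidates) / len(candidates)
    return [clip_mv_to_window(mv, (cx, cy), half_width) for mv in candidates]
```

A candidate far from the center is pulled onto the window boundary, so every candidate addresses reference samples inside the cached region.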

In some exemplary embodiments, the width of the search window may be selected to allow searches to be performed using a memory cache while limiting the number of accesses to external memory.

The motion refinement search may be performed starting with the selected motion vector. In some embodiments, the search scope of the refined search is constrained, thereby allowing the refined search to be performed using the memory cache and limiting the number of accesses to the main memory. In some embodiments, the number of search iterations may be constrained in order to limit the decoding complexity.

In some embodiments, the first motion vector candidate set is an initial candidate set identified using advanced motion vector prediction or another technique. In some embodiments, the initial candidate set of motion vectors is aggregated by performing a clustering algorithm, and a representative motion vector (e.g., the centroid) of each cluster is contributed to the first set. In some embodiments, the number of clusters may be constrained in order to limit coding complexity.

The selected (and in some cases refined) motion vector may be used to predict the current block. In a method performed by an encoder, the prediction for the current block is compared to an input block, a residual is determined, and the residual is encoded in a bitstream. In a method performed by a decoder, the residual is decoded from the bitstream and added to the prediction to generate a reconstructed block that can be displayed (possibly after filtering).

The disclosed embodiments further include an encoder and decoder having a processor and a non-transitory computer-readable storage medium storing instructions operable to perform the methods described herein. The disclosed embodiments further include a non-transitory computer-readable storage medium for storing a bitstream generated using any of the methods described herein.

Drawings

Fig. 1 is a functional block diagram of a block-based video encoder.

Fig. 2 is a functional block diagram of a video decoder.

Fig. 3A-3B illustrate Frame Rate Up Conversion (FRUC) using template matching (fig. 3A) and bilateral matching (fig. 3B).

Fig. 4 is a flow chart of a motion derivation process for FRUC in a JEM implementation.

Fig. 5 is an illustration regarding the positions of spatial motion vector candidates in merge mode.

Fig. 6 is an illustration of motion field interpolation as used in FRUC.

Fig. 7 shows an example of reference sample access after applying MV clustering for FRUC.

Fig. 8 is a diagram of memory accesses for sub-CU level motion search.

Fig. 9 is a flow diagram of a FRUC motion search process using a constrained search range for CU-level/sub-CU-level initial motion search, according to some embodiments.

Figs. 10A-10B illustrate an example of a search center selection method in some embodiments. Fig. 10A shows majority-based selection and fig. 10B shows average-based selection.

Fig. 11 illustrates memory access in an example embodiment after applying a constrained search range to an initial CU-level/sub-CU-level motion search.

Fig. 12 is a flow diagram of the FRUC motion search process in an example embodiment after applying the constrained search range to CU-level motion refinement, sub-CU-level initial motion search, and sub-CU-level motion refinement.

Fig. 13 is a flow diagram of an exemplary FRUC motion search process after applying a uniform search range to the entire FRUC process.

Fig. 14 is a diagram showing an example of a structure of a decoded bitstream.

Fig. 15 is a diagram showing an exemplary communication system.

Fig. 16 is a diagram illustrating an example wireless transmit/receive unit (WTRU).

Detailed Description

Block-based video coding

HEVC test model (HM) and Joint Exploration Model (JEM) software are both built on the block-based hybrid video coding framework. Fig. 1 is a block diagram of a block-based hybrid video coding system. The input video signal 102 is processed in a block-by-block manner. In HEVC, extended block sizes, referred to as "coding units" or CUs, are used to efficiently compress high-resolution (1080p and above) video signals. In HEVC, a CU may be up to 64x64 pixels. A CU may be further partitioned into Prediction Units (PUs), to which separate prediction methods are applied. Each input video block (MB or CU) may be spatially predicted (160) and/or temporally predicted (162). Spatial prediction (or "intra prediction") uses pixels from samples of already-coded neighboring blocks (referred to as reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in video signals. The temporal prediction signal for a given video block is typically signaled by one or more motion vectors that indicate the amount and direction of motion between the current block and its reference block. Furthermore, if multiple reference pictures are supported (as in recent video coding standards such as H.264/AVC and HEVC), a reference picture index is additionally sent for each video block; the reference index is used to identify from which reference picture in the reference picture store (164) the temporal prediction signal originates. After spatial and/or temporal prediction, a mode decision block (180) in the encoder selects the best prediction mode (e.g., based on a rate-distortion optimization method).
The prediction block is then subtracted from the current video block (116), and the prediction residual is de-correlated using transform (104) and quantization. The quantized residual coefficients are inverse quantized (110) and inverse transformed (112) to form a reconstructed residual, which is then added back to the prediction block (126) to form a reconstructed video block. Loop filtering processes, such as deblocking filters and adaptive loop filters, may further be applied (166) to the reconstructed video block before it is placed in the reference picture store (164) and used to code future video blocks. To form the output video bitstream 120, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy coding unit (108) for further compression and packing to form the bitstream.

Fig. 2 is a block diagram of a block-based video decoder. First, the video bitstream 202 is subjected to decapsulation and entropy decoding processing at the entropy decoding unit 208. The coding mode and prediction information are sent to spatial prediction unit 260 (if intra coding is performed) or temporal prediction unit 262 (if inter coding is performed) to form a prediction block. The residual transform coefficients are then sent to the inverse quantization unit 210 and the inverse transform unit 212 in order to reconstruct the residual block. Then, at 226, the prediction block and the residual block will be added together. The reconstructed block may further undergo loop filtering processing before being stored in the reference picture store 264. The reconstructed video in the reference picture store is then sent to drive the display device and used to predict future video blocks.

In HEVC, motion information (including MVs and reference picture indices) is determined by the encoder and explicitly signaled to the decoder. Therefore, a significant amount of overhead is spent on coding the motion parameters of inter-coded blocks. To save the overhead of signaling motion information, a coding mode known as Frame Rate Up Conversion (FRUC) is supported for inter coding in the current JEM. When this mode is enabled for a CU, the MVs and reference picture indices are not signaled; instead, this information is derived at the decoder side by template matching or bilateral matching techniques. Figs. 3A-3B illustrate the MV derivation processes used in FRUC. In the example of fig. 3A, template matching is used, which derives the MV of the current CU by finding the best match between the template 302 (top and/or left neighboring samples) of the current CU 304 in the current picture 306 and the reference template 308 of a reference block in the reference picture 310. The best match may be selected as the MV that achieves the lowest matching cost (e.g., lowest Sum of Absolute Differences (SAD)) between the current template and the reference template. In fig. 3B, bilateral matching is used, which derives the motion information of the current block 352 by finding the best match between two blocks 354, 356 along the motion trajectory of the current block in two different reference pictures. The motion search process for bilateral matching is based on motion trajectories, whereby the motion vectors MV0 and MV1 pointing to the two reference blocks 354, 356, respectively, should be proportional to the temporal distances between the current picture and each of the two reference pictures (T0 and T1). To determine the motion vector pair MV0 and MV1 used in the bilateral FRUC mode, the motion vector candidates are checked.
The list-0 and list-1 motion vectors are used separately for each candidate to perform a motion search and the pair of motion vectors with the lowest matching cost (e.g., SAD) is selected. The decision as to whether to use the template matching mode or the bilateral matching mode is based on rate-distortion (R-D) cost optimization. The mode that yields the smallest rate-distortion cost will be selected as the FRUC mode for the current CU.
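The matching-cost comparison underlying both FRUC modes can be illustrated with a minimal sketch. Blocks are modeled here as flat lists of luma samples and SAD is used as the cost, as in the text above; real codecs operate on 2-D sample arrays fetched via (possibly interpolated) motion compensation, and all names are illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def best_candidate(current_block, reference_blocks):
    """Return the index of the candidate reference block with the lowest SAD."""
    costs = [sad(current_block, ref) for ref in reference_blocks]
    return min(range(len(costs)), key=costs.__getitem__)
```

For template matching, `current_block` would hold the template samples; for bilateral matching, the cost would instead compare the two motion-compensated reference blocks against each other.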

The motion derivation process of the template matching mode and the bilateral matching mode comprises four steps: CU-level motion search, CU-level motion refinement, sub-CU-level motion search, and sub-CU-level motion refinement. Initial motion (including MV and reference picture indices) for the entire CU is derived based on template matching or bilateral matching in a CU-level motion search. In particular, a list of MV candidates will be generated first, and the candidate that yields the smallest matching cost will be selected as the starting MV for the CU. Then, a local search based on template matching or bilateral matching is performed around the starting point in the CU-level motion refinement stage, and the MV with the smallest matching cost is considered as the MV of the entire CU. Subsequently, in the sub-CU level motion search stage and the sub-CU level motion refinement stage, the granularity and precision of the motion information will be further refined by partitioning the current CU into a plurality of sub-CUs and deriving the motion information of each sub-CU with the MV derived from the CU level as a starting search point. Fig. 4 is a flowchart of motion derivation processing of FRUC mode in the current JEM.

Frame rate up-conversion

CU-level motion search

A candidate-based search method is applied in the CU-level motion search to derive the initial MV of the current CU. The method works by selecting a set of unique MV candidates for which a cost measure of its template match or bilateral match (depending on the FRUC mode applied for the current CU) will be computed; the MV candidate that minimizes the cost is selected as the initial MV for the entire CU. In particular, MV candidates are evaluated in the CU-level motion search as follows:

1) MV predictors derived from AMVP in JEM, when FRUC is applied to predict one or more MVs of a current CU coded in Advanced Motion Vector Prediction (AMVP) mode;

2) MVs of conventional merge candidates, including the five spatial candidates A1, B1, B0, A0 and B2 (as depicted in fig. 5), and the temporal candidate derived from the MV of the collocated block in a temporally neighboring picture by Temporal Motion Vector Prediction (TMVP);

3) four interpolated MVs at positions (0,0), (W/2,0), (0, H/2), and (W/2, H/2) generated by the image-level MV field interpolation process as described in the "image-level MV field interpolation" section, where W and H are the width and height of the current CU; and

4) the top and left neighboring MVs of the current CU.

Furthermore, when the MV candidate list is generated at the CU level, a pruning operation is performed so that an MV candidate is not added to the MV candidate list if it is redundant (i.e., if it has the same motion as an existing MV candidate in the list). This technique for generating the candidate list may be used in steps 904, 1204, and 1304, described below.

Furthermore, when applying the bilateral matching mode, each valid candidate MV is used as input to generate an MV pair, based on the assumption that the two MVs lie on the same motion trajectory of the current CU. For example, one valid MV is (MVa, refa), located in reference list A (A = 0, 1). Then, the reference picture refb of its matching bilateral MV is found in the other reference list B, such that refa and refb are temporally symmetric with respect to the current picture. If no reference picture refb in list B is symmetric to refa, refb is determined by selecting the reference picture in list B that is different from refa and whose temporal distance to the current picture is minimal. After refb is determined, MVb can be derived by scaling MVa according to the temporal distances between the current picture and refa and refb:

MVb_x = (τb / τa) · MVa_x,    MVb_y = (τb / τa) · MVa_y    (Equation 1)

where (MVa_x, MVa_y) and (MVb_x, MVb_y) are the horizontal and vertical motion components of MVa and MVb, and τa and τb are the temporal distances of the reference pictures refa and refb from the current picture. Based on the existing design, up to 15 MVs may be included in the CU-level motion search stage.
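The bilateral MV scaling described above (deriving MVb from MVa by the ratio of temporal distances τb/τa) can be sketched as follows; the helper name and tuple representation are assumptions for illustration.

```python
def derive_paired_mv(mv_a, tau_a, tau_b):
    """Scale MV_a along the motion trajectory to temporal distance tau_b,
    yielding the paired MV_b for bilateral matching."""
    scale = tau_b / tau_a
    return (mv_a[0] * scale, mv_a[1] * scale)
```

For temporally symmetric reference pictures (τb = −τa), this yields the mirrored motion vector, as expected for a linear motion trajectory through the current picture.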

sub-CU level motion search

In CU-level motion search, the corresponding MV is derived at the CU level (the derived MV represents the motion of the entire CU). However, such granularity may not be good enough considering that the current CU may cover areas corresponding to different objects associated with different motions. Therefore, to improve the accuracy of the derived MVs, each CU encoded using FRUC mode (template matching or bilateral matching) is further partitioned into M × M sub-CUs, and an individual MV is derived separately for each sub-CU. The value of M is calculated as in Equation 2, where D is a predefined split depth constraint, which is set to 3 in JEM.

M = max{ 4, min{ W/2^D, H/2^D } }    (Equation 2)

where W and H are the width and height of the current CU.
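Assuming the JEM sub-CU sizing formula M = max(4, min(W, H)/2^D) with default split depth D = 3 (a reconstruction; the exact expression appears only as an image in the source), the sub-CU size can be sketched as:

```python
def sub_cu_size(width, height, depth=3):
    """Sub-CU side length M for a width x height CU at split depth D."""
    # Right shift by `depth` divides by 2**depth; the floor of 4 keeps
    # sub-CUs from becoming smaller than 4x4.
    return max(4, min(width, height) >> depth)
```

A 64x64 CU at D = 3 is split into 8x8 sub-CUs, while a 16x8 CU hits the 4x4 floor.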

Furthermore, like CU-level motion search, the MVs of each sub-CU are derived by first generating a list of MV candidates and selecting the MV candidate that minimizes the matching cost as the MV of the sub-CU. In some embodiments, the sub-CU level MV candidate list comprises:

1) MV determined in CU-level motion search;

2) MVs from top, left, top left and top right spatially adjacent CUs;

3) a scaled version of collocated MVs from a temporal reference picture;

4) up to 4 MVs obtained from candidates derived by Advanced Temporal Motion Vector Prediction (ATMVP); and

5) up to 4 MVs obtained from candidates derived by spatio-temporal motion vector prediction (STMVP).

Furthermore, during the sub-CU motion search, a particular MV candidate may be included in the MV candidate list only if the MV candidate points to the same reference picture indicated by the starting CU-level MV; otherwise, the MV candidate is excluded from the matching cost calculation. Since the MVs of all sub-CUs inside a CU then point to the same reference picture (the reference picture associated with the starting CU-level MV), doing so reduces memory bandwidth consumption when FRUC is implemented in hardware, because only that reference picture needs to be accessed to find the optimal MVs at the sub-CU level. Based on existing designs, up to 17 MVs may be included in the sub-CU-level motion search stage. This technique for generating the sub-CU-level MV candidate list may be used in steps 918, 1218, and 1318 discussed below.

Image level MV field interpolation

For each 4×4 block in the reference picture, if its MV passes through a 4×4 block in the current picture and that 4×4 block has not yet been assigned an interpolated MV, the MV of the 4×4 reference block is scaled to the current picture according to the ratio between the temporal distance between the current and reference pictures (TD0) and the temporal distance between the reference picture and its own reference picture (TD1). If no scaled MV is assigned to a 4×4 block, the motion for that block is marked as unavailable in the interpolated motion field.

CU-level and sub-CU-level motion refinement

Due to the limited number of search candidates, the MVs derived from the CU-level and sub-CU-level motion searches are not always accurate enough to represent the true motion of the current CU, which can reduce the efficiency of motion-compensated prediction. To further improve the accuracy of the derived MVs, an MV refinement process is applied after both the CU-level motion search and the sub-CU-level motion search. MV refinement is a pattern-based local MV search process that minimizes a cost metric of template matching or bilateral matching. Specifically, two search patterns are supported in the current JEM: a diamond search pattern for CU-level motion refinement and a cross search pattern for sub-CU-level motion refinement. For both CU-level and sub-CU-level motion refinement, the MV search is first performed at quarter-sample precision, followed by additional local motion refinement (around the optimal quarter-sample MV) at one-eighth-sample precision. Furthermore, when an MV points to a fractional sample position, a bilinear interpolation filter is used for the template matching and bilateral matching modes instead of the 8-tap interpolation filter of HEVC, in order to reduce encoding/decoding complexity. It should also be mentioned that in CU-level and sub-CU-level motion refinement, the search process is unrestricted in the sense that the refinement search is repeated until the search center is no longer updated within a predefined range (set equal to 8 integer luma samples in the current JEM).
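The cross-pattern local refinement can be illustrated with a simple search loop: repeatedly test the four cross neighbors of the current best MV and move when the cost improves. This is an illustrative sketch on an integer grid; the explicit iteration cap is an assumption here (the JEM process described above is instead bounded by the search center staying within a range of 8 integer luma samples).

```python
def cross_refine(start_mv, cost_fn, max_iters=8):
    """Greedy cross-pattern refinement around start_mv, minimizing cost_fn."""
    best = start_mv
    best_cost = cost_fn(best)
    for _ in range(max_iters):
        moved = False
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # cross neighbors
            cand = (best[0] + dx, best[1] + dy)
            c = cost_fn(cand)
            if c < best_cost:
                best, best_cost, moved = cand, c, True
        if not moved:  # local minimum reached
            break
    return best
```

A diamond pattern (for the CU level) would use a wider neighbor set per step but follows the same descend-until-stable structure.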

FRUC CU/sub-CU motion search based on MV clustering

As described above, in the CU-level motion search and sub-CU-level motion search in the existing FRUC, the optimal MV is selected from a unique set of MV candidates. As an example, up to 15 and 17 MV candidates need to be checked in CU-level motion search and sub-CU-level motion search, respectively. This results in a significant increase in the complexity of the encoder and decoder, since the motion compensated prediction needs to be performed multiple times to generate a prediction signal for each MV candidate. Furthermore, this candidate-based search process greatly increases the memory bandwidth of the hardware implementation, since the reference samples are retrieved from the temporal reference image to compute the cost metric. Such memory bandwidth issues may become especially critical for CU-level motion searches where MV candidates are likely to come from different reference pictures. Therefore, the encoder/decoder will frequently switch memory access to a different reference picture. Doing so may increase the likelihood of cache read misses and thus may result in a significant increase in external memory accesses.

To reduce complexity and improve memory bandwidth usage, an MV clustering method is proposed for CU-level and sub-CU-level motion search processing. In particular, the proposed method comprises three steps: 1) reference picture selection, 2) MV scaling and pruning, and 3) MV clustering.

Reference picture selection

To avoid frequent switching of memory accesses across multiple reference pictures, a single reference picture is selected for a specified reference picture list in the MV cluster-based motion search at both the CU level and the sub-CU level. To select the reference picture, a majority rule is applied: the reference picture index of the specified reference picture list that is most frequently used by the MV candidates in the candidate list is chosen. In particular, assume there are K MV candidates and M reference pictures in reference picture list LX (X = 0, 1), and that the K MV candidates are associated with reference picture indices r0, r1, …, r(K-1) of LX, where ri ∈ [0, M−1]. The selected reference picture index r* is then determined as:

r* = arg max over n ∈ [0, M−1] of Σ(i=0..K−1) 1n(ri)

where 1n(ri) is an indicator function defined as:

1n(ri) = 1 if ri = n, and 0 otherwise.

Furthermore, since the above reference picture selection method determines the optimal reference picture index for a specified reference picture list (i.e., L0 or L1), it can be applied directly to one-sided motion derivation (e.g., the template matching mode), where the MVs in L0 and L1 are derived independently. When the bilateral matching mode is applied, MVs are derived in pairs based on the assumption that the two MVs lie on the same motion trajectory of the current block. Thus, when applying the proposed reference picture selection method to the bilateral matching mode, an additional constraint may be applied to ensure that the selected reference pictures in L0 and L1 satisfy the conditions that enable bilateral matching (the two reference pictures are temporally symmetric with respect to the current picture, or have the minimum temporal distance from the current picture within the reference picture list). This reference picture selection method may be used in steps 906, 1206 and 1306, described below.
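The majority rule above can be sketched as follows; names are illustrative, and `ref_indices` stands for the per-candidate reference indices r0, …, r(K-1) of the list LX under consideration.

```python
from collections import Counter

def select_reference_index(ref_indices):
    """Pick the reference picture index used most often by the MV candidates."""
    return Counter(ref_indices).most_common(1)[0][0]
```

For the bilateral mode, the result for one list would additionally be checked against the symmetry/minimum-distance constraint described above before being accepted.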

MV scaling and pruning

After the reference picture index is determined, the MV candidate list is updated by scaling the initial MV candidates to the same reference picture indicated by the selected reference picture index. Without loss of generality, reference picture list L0 is used as an example to illustrate the MV scaling process. Consider the i-th MV candidate MV_i: if its original L0 reference picture index r_i^{L0} equals the selected L0 reference picture index r*^{L0}, the updated MV candidate MV_i' is set directly to MV_i; otherwise (r_i^{L0} is not equal to r*^{L0}), MV_i' is set to a scaled version of MV_i calculated as:

MV_i' = (τ* / τ_i) · MV_i

where τ_i is the temporal distance between the reference picture indicated by r_i^{L0} and the current picture, and τ* is the temporal distance between the reference picture indicated by r*^{L0} and the current picture. Furthermore, since different MV candidates may become exactly equal after the MV scaling operation (e.g., due to loss of precision), a pruning process is performed to remove any duplicate entries, thereby keeping only unique MV candidates in the final list.
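The scaling and pruning steps can be sketched as follows. POC-based temporal distances, the tuple layout `(mvx, mvy, ref_idx)`, and integer rounding of the scaled components are assumptions of this illustration:

```python
def scale_and_prune(candidates, selected_ref, ref_poc, cur_poc):
    """candidates: list of (mvx, mvy, ref_idx) tuples.
    Each MV whose reference differs from selected_ref is scaled by
    tau*/tau_i, the ratio of temporal (POC) distances; duplicates are
    then pruned so only unique MVs remain."""
    tau_star = cur_poc - ref_poc[selected_ref]
    unique = []
    for mvx, mvy, ref_idx in candidates:
        if ref_idx != selected_ref:
            tau_i = cur_poc - ref_poc[ref_idx]
            mvx = round(mvx * tau_star / tau_i)
            mvy = round(mvy * tau_star / tau_i)
        mv = (mvx, mvy)
        if mv not in unique:   # pruning: drop exact duplicates
            unique.append(mv)
    return unique
```

Note that after scaling, a candidate from a different reference picture may coincide with an existing one, which is exactly the duplication the pruning step removes.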

MV clustering

To further reduce the complexity of the initial motion search, an MV clustering method may be used to reduce the total number of MV candidates for which cost metrics are calculated in the CU-level and sub-CU-level motion searches. In general, the objective of this MV clustering method is to divide the MVs in the candidate list into L groups, where L is the number of MV elements in the final candidate list, so as to minimize the average distance of all MV candidates falling into the same group. After reference picture selection and MV scaling/pruning, the MVs in the MV candidate list are denoted {MV_0', MV_1', ..., MV_{N-1}'}, where N is the total number of elements in the list. The proposed MV clustering method aims at dividing the N MVs into L (L ≤ N) groups S = {S_0, S_1, ..., S_{L-1}} so as to minimize the within-cluster distance, formulated as:

S* = argmin_S Σ_{i=0}^{L-1} Σ_{MV' ∈ S_i} ||MV' − μ_i||²   (6)

where μ_i is the centroid of S_i. To solve the clustering optimization problem in equation (6), the classical k-means refinement algorithm (also known as Lloyd's algorithm) may be adapted to cluster the MVs in the initial candidate list. In particular, the algorithm proceeds by repeating the following two steps in an alternating manner while traversing all MV candidates.

1) Assignment step: the distance between the MV candidate and the centroid of each existing MV cluster is calculated, and the cluster yielding the smallest distance is found. If this optimal distance is less than a predefined distance threshold, the MV candidate is added to that cluster; otherwise, a new cluster is created and the MV candidate is added to it.

2) Update step: when a new MV candidate is added to a cluster, the cluster mean is updated to the centroid of all MV candidates in the cluster (including the newly added candidate), as follows:

μ_i^{(t+1)} = (|S_i^{(t)}| · μ_i^{(t)} + MV_new) / (|S_i^{(t)}| + 1)

where |S_i^{(t)}| denotes the number of elements in MV group S_i at the t-th clustering iteration, and μ_i^{(t+1)} is the updated centroid of S_i.

Given the derived MV clusters, the centroid MVs of the resulting clusters can be used as the final MV candidates from which the initial MVs at the CU level and sub-CU level are derived by comparing cost metrics. In some embodiments, each initial MV candidate is assigned and updated once. In one such embodiment, the first motion vector candidate is initially set as the centroid of a first cluster. If the second motion vector candidate is less than a threshold distance from the first centroid, it is added to the first cluster and the centroid of the first cluster is recalculated; otherwise a second cluster is generated with the second motion vector as its centroid. The process continues with each subsequent candidate motion vector either being assigned to the existing cluster with the closest centroid (whose centroid is then updated) or used to spawn a new cluster. The candidate MVs may be processed in a predetermined order to ensure that the clustering gives the same result at the encoder and decoder sides. This clustering method may be used in steps 908, 920, 1208, 1220, 1308, and 1320, described below.
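A minimal sketch of the one-pass assignment/update procedure described above. The squared Euclidean distance metric, the list-based cluster representation, and the caller-supplied threshold are assumptions of this illustration:

```python
def cluster_mvs(mvs, dist_thresh):
    """One-pass clustering of scaled MV candidates.
    Each cluster is stored as [cx, cy, count]; candidates are visited
    in their fixed list order so that encoder and decoder reach the
    same clustering result. Returns the cluster centroids (master MVs)."""
    clusters = []
    for x, y in mvs:
        # assignment step: find the cluster with the closest centroid
        best, best_d = None, None
        for c in clusters:
            d = (x - c[0]) ** 2 + (y - c[1]) ** 2
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d < dist_thresh:
            # update step: incremental centroid recomputation
            best[0] = (best[0] * best[2] + x) / (best[2] + 1)
            best[1] = (best[1] * best[2] + y) / (best[2] + 1)
            best[2] += 1
        else:
            clusters.append([x, y, 1])  # spawn a new cluster
    return [(c[0], c[1]) for c in clusters]
```

With a threshold of 9, candidates (0,0) and (2,0) merge into one cluster with centroid (1,0), while a distant candidate such as (100,100) spawns its own cluster.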

Additional details and alternative clustering methods are also possible, as described in U.S. provisional patent application serial No. 62/599,238 entitled "low complexity frame rate up conversion," filed on 12/15/2017.

Problems addressed in some embodiments

As described above, the cluster-based motion search groups the initial MV candidates into several sets and determines only one master MV (e.g., the MV centroid) for each group. This reduces the total number of MV candidates for which the cost metric must be evaluated. While the MV clustering method can effectively reduce the complexity of FRUC motion derivation (at the CU level and sub-CU level), several aspects of existing FRUC designs remain difficult to implement in practical encoder/decoder hardware. In particular, the following complexity problems in current FRUC are identified in the present disclosure.

One problem arises at the start of the cluster-based motion search, where a reference picture selection process is applied to determine the best reference picture for each reference picture list based on the reference picture indices of the MVs in the initial candidate list (CU level or sub-CU level). To avoid frequent switching of memory accesses among different reference pictures, the reference picture selection process scales all MV candidates to the selected reference picture (in L0 and L1), so that all MV candidates forwarded to the MV clustering process are associated with the same reference picture. The MV clustering method then groups the MVs in the initial candidate list by minimizing intra-cluster distances. However, if the initial candidate list contains MVs with distinctly different characteristics, many clusters will be generated and the cluster centroids may lie far apart from one another. In that case, the encoder/decoder will frequently switch memory accesses to different regions in the selected reference picture. Doing so increases the likelihood of cache read misses and thus may significantly increase external memory accesses. Fig. 7 illustrates the memory access bandwidth problem that occurs after applying MV clustering to the FRUC CU/sub-CU-level initial motion search. In the example of fig. 7, a total of 5 distinct master MV candidates (centroids of MV clusters) remain after the MV clustering process. Since these MV candidates have different characteristics, 5 separate memory accesses to the selected reference picture are performed in order to derive the initial MV of the current CU/sub-CU. In a practical encoder/decoder implementation, this may require a large amount of memory bandwidth.

Another problem stems from the fact that each FRUC CU (template matching or bilateral matching) may be further divided into sub-CUs, each of which may derive its own motion in order to improve MV accuracy. Further, using the MV determined in the sub-CU-level initial search as a starting point, a subsequent MV refinement process is performed for each sub-CU. Although the MV refinement of each sub-CU is limited to a predefined search window around the corresponding starting point (e.g., 8 integer luma samples), the search range of the overall sub-CU-level motion search is not limited, because the initial MVs of the sub-CUs inside the CU are unbounded. Fig. 8 illustrates the memory access process performed at the sub-CU-level motion search stage, where the current CU is divided into 4 sub-CUs and each dashed square surrounds the local search window of one sub-CU (e.g., 8 integer luma samples from the starting MV). As can be seen from fig. 8, after the respective search starting point of each sub-CU is set, memory accesses frequently switch (four different memory accesses in fig. 8) among different areas of the reference picture. In practice, such a design may require a large amount of memory bandwidth in encoder/decoder hardware.

Another problem relates to FRUC motion refinement (CU-level or sub-CU-level), a local MV search process that improves the accuracy of the derived MVs by iteratively repeating pattern-based motion searches (e.g., diamond search and cross search) starting from the initial MV. However, the maximum number of iterations for FRUC motion refinement is not specified in existing designs; the refinement process continues until the search center remains unchanged in two consecutive search iterations. Based on analysis of decoded bitstreams, statistics show that the number of search iterations can reach approximately 100. This design is unfriendly to practical hardware implementations, since no bound is placed on the computational complexity of each FRUC CU. Considering that FRUC uses neighboring reconstructed samples to derive the motion information of the current block, this unconstrained motion search process reduces parallel processing capability and complicates the pipeline design of practical encoder/decoder hardware implementations.

To address these issues, the various methods presented in this disclosure operate to reduce the average and worst-case complexity of FRUC. The embodiments described herein include the following aspects. In some embodiments, methods are provided for constraining the search range used to derive the optimal MVs at the different FRUC motion search stages. The described methods may perform the FRUC-related motion search processes in a unified search area to minimize the total number of external memory accesses. In some embodiments, constraints are proposed to limit the maximum number of search iterations performed in the FRUC motion refinement process, and to limit the total number of master MV candidates produced by the MV clustering process. These embodiments aim to reduce the worst-case complexity of FRUC.

FRUC motion search with constrained search range

As described above, multiple initial MV candidates may be generated in the different FRUC motion search processes, e.g., the master MV candidates after the MV clustering process and the initial MV of each sub-CU after the sub-CU-level initial motion search. Since these initial MVs may be separated by large distances, multiple external memory accesses to the corresponding reference pictures may be needed to derive the optimal MV of the current CU/sub-CU. For a hardware codec implementation, this can significantly increase memory bandwidth. To address this problem, constrained search ranges for the different FRUC motion search stages are presented below, so that all reference samples needed for a given FRUC motion search can be obtained with a single access to external memory.

Constrained search range for CU-level initial motion search

In some embodiments, a constrained search range is applied to the CU/sub-CU-level initial motion search to reduce the memory bandwidth usage of the master MV candidates generated after the MV clustering process. Fig. 9 shows the modified FRUC motion search process after applying the proposed constrained search range to the CU/sub-CU-level initial motion search.

In fig. 9, an initial CU-level MV candidate list is generated (904) in the CU-level initial motion search (902), and a reference picture is selected (906). The MVs in the MV candidate list are clustered (908), where each cluster is represented by a master MV. The master MV of a cluster may be the centroid of the MV candidates in the cluster. The search center and the search range are determined based on the master MVs (910). The master MVs are processed (911) by clipping them as appropriate so that they fall within the search range, and a search is performed to select (912) the best processed master MV. CU-level motion refinement (914) is performed on the selected MV. After CU-level motion refinement, a sub-CU-level initial motion search is performed (916). An initial sub-CU-level MV candidate list is generated for each sub-CU in the CU (918), and the sub-CU-level MVs are clustered (920), where each cluster is represented by a master MV. The search center and search range for the sub-CU-level initial motion search are determined (922), and the MVs are processed (923) by clipping so that they fall within the sub-CU-level search range. Within the search area, the best processed master MV is selected (924), and sub-CU-level motion refinement is performed (926).

In more detail, as shown in fig. 9, after the master MV candidates (e.g., the centroids of the MV clusters) are generated by the MV clustering process (908), the search center is determined (910) for selecting the initial CU/sub-CU-level MV. To select the search center, in some embodiments a majority rule is applied to select the centroid of the MV cluster that contains the majority of the initial MV candidates in the CU/sub-CU-level motion candidate list. Specifically, assume that from an initial MV candidate list of M candidates (MV_0, MV_1, ..., MV_{M-1}), K MV clusters (C_0, C_1, ..., C_{K-1}) are generated. The selected search center is then determined as the centroid of cluster C_{k*}, where:

k* = argmax_{k ∈ [0, K-1]} Σ_{i=0}^{M-1} 1_k(MV_i)   (8)

where 1_k(MV_i) is an indicator function:

1_k(MV_i) = 1 if MV_i ∈ C_k, and 0 otherwise   (9)

Fig. 10A shows an example illustrating the above search center selection method. In fig. 10A, there are a total of 29 MV candidates in the initial MV candidate list (i.e., M = 29), and they are divided into 3 MV clusters (i.e., K = 3): clusters 1002, 1004, 1006. Further, in this example, the numbers of initial MV candidates covered by the MV clusters are 20 in the top cluster, 6 in the lower-left cluster, and 3 in the lower-right cluster. The centroids of clusters 1002, 1004, and 1006 are indicated at 1003, 1005, and 1007, respectively. According to the majority-based selection method in equation 8, the selected search center is set to the centroid 1003 (represented by a pentagon) of the MV cluster with 20 initial MV candidates. When the initial MV candidates are sparsely distributed, it is likely that the numbers of initial MV candidates included in the different MV clusters will be relatively similar. In this case, different MV clusters may play equal roles in deriving the optimal motion of the current CU/sub-CU.

In another embodiment of the present disclosure, when the resulting clusters show similar MV candidate coverage of the initial MV candidate list (e.g., no MV cluster dominates the others), it is proposed to take the average of the centroids of all MV clusters and use this average as the search center for the subsequent CU/sub-CU-level initial motion search. Fig. 10B shows an example illustrating this mean-based search center selection method, where the three MV clusters 1010, 1012, 1014 contain the same number of initial MV candidates. The respective centroids of the MV clusters are shown by triangles 1016, 1018, 1020. Whether the resulting clusters have similar initial MV candidate coverage can be determined in different ways. In one example, the difference between (i) the number of MV candidates included in the MV cluster with the largest coverage and (ii) the number of MV candidates included in the MV cluster with the smallest coverage may be used. If the difference is less than a predefined threshold, the resulting clusters are considered to have similar coverage (i.e., the mean-based process should be applied to determine the search center); otherwise, the coverages of the different clusters are considered unbalanced, and the majority-based process should be applied. In the example of fig. 10B, in response to determining that the clusters 1010, 1012, 1014 have similar initial MV candidate coverage, the selected search center is the average of the centroids 1016, 1018, 1020. The pentagon at 1022 shows the selected search center.
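The two selection methods and the coverage-balance test can be combined as sketched below. The `(centroid, coverage)` tuple layout and the threshold value are assumptions of this illustration:

```python
def select_search_center(clusters, balance_thresh):
    """clusters: list of (centroid, coverage) pairs, where coverage is
    the number of initial MV candidates in the cluster.
    If the gap between the largest and smallest coverage is below
    balance_thresh, the clusters are treated as balanced and the mean
    of all centroids is used (fig. 10B); otherwise the centroid of the
    majority cluster is used (fig. 10A)."""
    counts = [n for _, n in clusters]
    if max(counts) - min(counts) < balance_thresh:
        # mean-based selection: average of all cluster centroids
        cx = sum(c[0] for c, _ in clusters) / len(clusters)
        cy = sum(c[1] for c, _ in clusters) / len(clusters)
        return (cx, cy)
    # majority-based selection: centroid of the largest cluster
    return max(clusters, key=lambda cn: cn[1])[0]
```

With coverages 20/6/3 the majority centroid wins; with equal coverages the mean of the three centroids is returned.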

It should be noted that figs. 10A and 10B both show embodiments in which the selected search center is the average of one or more master MV candidates. In particular, in the embodiment of fig. 10A, the selected search center is the average of only one master MV candidate, and thus is equal to that candidate itself, whereas in the embodiment of fig. 10B, the selected search center is the average of more than one master MV candidate.

Referring again to fig. 9, after the search center is calculated, a search window is determined (910), with the center of the search window set to the selected search center and the width set to a selected range (e.g., a predefined or signaled range). The master MVs (e.g., the centroids of the MV clusters) are then updated by clipping each master MV into the search window (911). Using the same example as in fig. 7, fig. 11 shows the external memory accesses after applying the proposed search constraints to the CU/sub-CU-level initial motion search. As can be seen by comparing fig. 7 and fig. 11, the constrained motion search retrieves all reference samples needed to determine the optimal MV (912) for the initial CU/sub-CU-level motion search using only a single external memory access to the corresponding region (shown by search window 1102), whereas the unconstrained motion search uses five separate memory accesses to the reference picture in this example. This effectively reduces the memory bandwidth of the encoder/decoder hardware used for FRUC.
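The clipping step can be sketched as follows, assuming a square search window described by its center and half-width (the parameter names are illustrative):

```python
def clip_to_window(mv, center, search_range):
    """Clip one master MV into a square search window of half-width
    search_range centered on the selected search center, so that every
    candidate can be evaluated from a single region of the reference
    picture fetched with one external memory access."""
    cx, cy = center
    x = min(max(mv[0], cx - search_range), cx + search_range)
    y = min(max(mv[1], cy - search_range), cy + search_range)
    return (x, y)
```

A master MV already inside the window is returned unchanged; an outlying component is moved to the nearest window boundary.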

In the above description, the same search range is applied to the CU-level and sub-CU-level initial motion searches. However, given that the sub-CU-level motion search is based on MVs derived from the CU-level motion search, the initial MV candidates generated for each sub-CU are typically more correlated than those generated at the CU level. It is therefore reasonable to use a smaller search range for the sub-CU-level motion search than for the CU-level motion search. This can further reduce the size of the area to be accessed in the reference picture.

In the above, the proposed constrained CU/sub-CU-level initial motion search is described with MV clustering applied. In practice, however, the constrained motion search can be used independently; that is, it is applicable with or without MV clustering. When no MV clustering process is applied, the constrained motion search may further consider the reference picture from which each initial MV candidate originates, and may be applied separately to each set of MV candidates from the same reference picture.

Constrained search ranges for CU-level motion refinement, sub-CU-level initial motion search, and sub-CU-level motion refinement

As described above, the search area of the sub-CU-level motion search is usually not constrained, considering that each sub-CU inside the current CU can derive its own initial MV, and that these initial MVs may be far away from each other. In terms of hardware implementation, such a design may require a large amount of memory bandwidth. To address this issue, it is proposed in some embodiments to add search range constraints to all FRUC-related motion search processes after the CU-level initial motion search, including CU-level motion refinement, the sub-CU-level initial motion search, and sub-CU-level motion refinement.

Fig. 12 shows the modified FRUC motion search process after applying the proposed search range constraint to CU-level motion refinement, the sub-CU-level initial motion search, and sub-CU-level motion refinement. Specifically, in fig. 12, the search window is determined after the CU-level initial motion search by setting the CU-level initial MV as the search center (1212); subsequent CU-level motion refinement is then performed only inside the search window. At the sub-CU level, all MV candidates obtained for each sub-CU are clipped (1221) to the search window area, and motion refinement of each sub-CU is allowed only inside the search window. As such, a single memory access to the external reference picture may be used once the CU-level initial MV is obtained. In contrast, other FRUC designs may require an external memory access for the motion search of each sub-CU, and there may be as many as 64 sub-CUs inside one CU. Thus, after the CU-level initial MV is determined, other FRUC methods may use up to 64 separate memory accesses to generate the motion information of the CU. In this sense, the example embodiments may reduce memory bandwidth usage for FRUC.

In the method according to fig. 12, an initial CU-level MV candidate list is generated (1204) in the CU-level initial motion search (1202), and a reference picture is selected (1206). The MVs in the MV candidate list are clustered (1208), where each cluster is represented by a master MV. The master MV of a cluster may be the centroid of the MV candidates in the cluster, and the best master MV is selected by performing a search (1210). The search range for CU-level motion refinement is determined (1212), and CU-level motion refinement (1214) is performed inside the search range. After CU-level motion refinement, a sub-CU-level initial motion search is performed (1216). An initial sub-CU-level MV candidate list is generated for each sub-CU in the CU (1218), and a clustering process (1220) is performed on the sub-CU-level MVs, where each cluster is represented by a master MV. These master MVs are processed (1221) as needed by clipping so that they fall within the search window. The best MV is selected from the processed master MVs within the defined search range (1222). After the sub-CU-level motion search (1216), sub-CU-level motion refinement (1224) is performed inside the defined search range.

FRUC motion search in a unified search range

While the constrained motion search methods described above may provide a significant memory bandwidth reduction compared to other FRUC methods, some such embodiments still use at least two separate accesses to the external cache of the reference picture: one for the CU-level initial motion search to generate the CU-level initial MV, and the other for the remaining FRUC motion search processes to generate the MVs of the sub-CUs inside the current CU. To further reduce FRUC memory bandwidth, it is proposed to use a unified search range for all FRUC-related motion search processes, so that only a single external memory access to the reference picture is used for the entire FRUC process. Fig. 13 depicts the modified FRUC motion derivation process after applying a unified search range to the entire FRUC motion search. More specifically, after MV clustering (1308), a search center is determined from the master MV candidates based on equations 8 and 9, and a search window is also determined (1310), with its center set to the selected search center and its width set to a selected range.

Subsequent CU-level motion refinement (1314) and the sub-CU-level motion search, including the sub-CU-level initial motion search (1316) and sub-CU-level motion refinement (1324), are allowed to search MV candidates only inside the defined search window. In the example of fig. 13, with the proposed unified search range, only a single external memory access to the selected reference picture is used to retrieve the reference samples within the search window in order to generate the motion information of the entire CU. Furthermore, since the unified search method described above reduces the number of external memory accesses, it is reasonable to use a larger search range than that used by the constrained search methods in the embodiments described above, in order to better trade off coding performance against memory bandwidth usage.

As shown in fig. 13, an initial CU-level MV candidate list is generated (1304) in the CU-level initial motion search (1302), and a reference picture is selected (1306). The MVs in the MV candidate list are clustered (1308), where each cluster is represented by a master MV. The master MV of a cluster may be the centroid of the MV candidates in the cluster. The search center and the search range are determined based on the master MVs (1310). The master MVs are processed by clipping them as appropriate (1311) so that they fall within the search range, and a search is performed to select the best processed master MV (1312). CU-level motion refinement is performed on the selected MV (1314). A sub-CU-level initial motion search is performed after CU-level motion refinement (1316). An initial sub-CU-level MV candidate list is generated for each sub-CU in the CU (1318), and the sub-CU-level MVs are clustered (1320), where each cluster is represented by a master MV. The master MVs are processed by clipping (1321) as appropriate so that they fall within the search range. Inside the search region, the best processed master MV is selected (1322), and sub-CU-level motion refinement is performed (1324).

Adaptive search range for FRUC motion search

In some embodiments of the constrained FRUC motion search methods described above, the same search range is applied to all pictures in a video sequence. In alternative embodiments, however, the search range may be adaptively adjusted at different levels (e.g., sequence level, picture level, and block level), and each adaptation level may provide a different performance/complexity tradeoff. In addition, different methods may be applied to determine the optimal search range when the adaptive search range is applied. For example, when the search range is adapted at the picture level, depending on the correlation between the current picture and its reference pictures, video blocks in some pictures exhibit stable motion (e.g., pictures at a high temporal layer in a random access configuration) while video blocks in other pictures exhibit relatively unstable motion (e.g., pictures at a low temporal layer in a random access configuration). In this case, it is beneficial to use a smaller search range for pictures with stable motion than for pictures with unstable motion, in order to achieve a larger reduction in memory access while maintaining coding performance. In another example, when the search range is adapted at the block level, the optimal search range for a block may be determined based on the correlation of the MVs of the spatial neighbors of the current block. To measure motion correlation, one approach is to calculate the variance of the neighboring MVs of the current block. If the motion variance is less than a predefined threshold, it is reasonable to assume that the motion of the current block is highly correlated with that of its neighbors, and a small search range can safely be applied; otherwise, the motion of the current block is considered less correlated with its neighbors, and a large search range should be applied to ensure that the optimal MV of the current block can be identified inside the search window.
In another approach, it is proposed to adaptively adjust the search range based on block size. The rationale behind this approach is that when the current block is larger, it is more likely to contain complex content (e.g., rich texture and/or directional edges); a large search range can therefore be applied, which helps the current block find a good match in the reference picture. Otherwise, if the current block is relatively small, it may be reasonable to assume that the block contains little texture information, in which case a small search window is sufficient. In another embodiment of the present disclosure, it is proposed to adaptively adjust the search range based on the POC distance between the current picture and its reference pictures. In particular, with this method, if the POC distance between the current picture and its closest reference picture is less than a predefined threshold, the blocks in the current picture are expected to show stable motion and a smaller search range can be applied; otherwise (if the POC distance between the current picture and its closest reference picture is greater than or equal to the predefined threshold), the motion of the blocks inside the current picture is likely to be unstable, in which case a large search range should be applied.
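The block-size and POC-distance adaptations can be combined as sketched below. The specific thresholds, the halving rule, and the parameter names are assumptions of this illustration, not values taken from the disclosure:

```python
def adaptive_search_range(base_range, block_w, block_h, poc_dist,
                          small_block=64, near_poc=2):
    """Shrink the search range for small blocks (assumed to contain
    little texture) and for pictures whose closest reference is
    temporally near (assumed stable motion). Returns the adapted
    half-width of the search window, never less than 1."""
    r = base_range
    if block_w * block_h <= small_block:
        r = max(1, r // 2)   # small block: smaller window suffices
    if poc_dist < near_poc:
        r = max(1, r // 2)   # near reference: stable motion expected
    return r
```

A 4x4 block one POC away from its reference would use a quarter of the base range, while a 16x16 block with a distant reference keeps the full range.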

When applying the adaptive search range to FRUC, the corresponding search range is either signaled in the bitstream (e.g., in the Video Parameter Set (VPS), Sequence Parameter Set (SPS), Picture Parameter Set (PPS), and slice segment header) or derived at the encoder and decoder without signaling.

Complexity limited motion search for FRUC

Limiting number of search iterations for FRUC motion refinement

In known FRUC designs, the refinement search continues until the search center remains unchanged in two consecutive search cycles, meaning that both CU-level motion refinement and sub-CU-level motion refinement are computationally unconstrained. For the pipeline design of a practical encoder/decoder hardware implementation, such a design is impractical because the encoding/decoding complexity of a FRUC block is unbounded. Thus, to make the FRUC design more hardware-friendly, it is proposed to limit the maximum number of search iterations performed at the CU/sub-CU-level motion refinement stage. Similar to the constrained search range of the CU-level initial motion search described above, the maximum number of search iterations may be adapted or signaled at different coding levels (e.g., sequence, picture, and block). Furthermore, the maximum numbers of search iterations applied to CU-level motion refinement and sub-CU-level motion refinement may be the same. However, since sub-CU-level motion candidates typically show stronger correlation than CU-level motion candidates, it is also proposed in at least some embodiments to set a larger maximum search iteration value for CU-level motion refinement than for sub-CU-level motion refinement.
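The iteration cap can be sketched as below, using a one-sample cross-search pattern; the pattern, the `cost` callback interface, and integer MV positions are assumptions of this illustration:

```python
def refine_mv(start_mv, cost, max_iters):
    """Pattern-based MV refinement with a hard cap on iterations, so
    the worst-case complexity per CU/sub-CU is bounded. cost(mv)
    returns the matching cost of an (x, y) candidate. The search stops
    early when the center does not move between two iterations."""
    center = start_mv
    for _ in range(max_iters):
        x, y = center
        # cross pattern: center plus its four direct neighbors
        best = min([(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)],
                   key=cost)
        if best == center:   # search center unchanged: converged
            break
        center = best
    return center
```

Without the cap, the loop would run until convergence (reportedly up to roughly 100 iterations); with `max_iters` set, the per-block cost is bounded regardless of the cost surface.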

Limitation on the number of dominant MVs used for MV clustering

Although the MV clustering method described above can significantly reduce the average computational complexity of the initial MV derivation (at the CU level and sub-CU level), it does not necessarily change the maximum number of MV candidates that need to be tested. For example, in the worst case, up to 15 and 17 MV candidates must still be checked in the CU-level and sub-CU-level motion searches, respectively. In practice, the worst case is an important consideration for practical encoder/decoder implementations, as it directly determines the processing capability/conditions that the hardware design must satisfy. Thus, it is proposed in some embodiments to impose a constraint on the number of master MVs generated by the MV clustering process, in order to reduce both the average and the maximum number of MV candidates tested. Given a maximum number of master MVs (e.g., L), different criteria may be used to determine which master MVs should be selected for the FRUC motion search.

In some embodiments, one criterion is to select the master MVs based on MV candidate coverage. In some embodiments using this criterion, the encoder/decoder counts, for each MV cluster (CU level or sub-CU level), the number of initial MVs from the initial MV candidate list contained in the cluster (the MV candidate coverage); the encoder/decoder then sorts the generated master MVs by MV candidate coverage, placing the master MVs with larger coverage at the beginning of the list, and keeps only the first L master MVs as the output used for the subsequent CU/sub-CU-level motion search process.

In some embodiments, another criterion is to select the major MVs based on MV candidate variance. In some embodiments using this criterion, the encoder/decoder calculates, during the MV clustering process, the variance of the initial MVs contained in each MV cluster; the encoder/decoder then ranks the generated major MVs in ascending order of MV candidate variance, and keeps only the first L major MVs, i.e., those with the smallest variances, as the output for the subsequent CU-level or sub-CU-level motion search process.
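The two ranking criteria can be sketched together. This is a hypothetical illustration: the cluster records carrying `major_mv`, `coverage`, and `variance` fields are our own representation of the clustering output, not a structure defined in the text.

```python
def prune_major_mvs(clusters, L, criterion="coverage"):
    """Keep at most L major MVs, ranked by one of the two criteria above.

    "coverage": descending number of initial MVs covered by each cluster.
    "variance": ascending variance of the initial MVs inside each cluster.
    """
    if criterion == "coverage":
        ranked = sorted(clusters, key=lambda c: c["coverage"], reverse=True)
    else:  # "variance"
        ranked = sorted(clusters, key=lambda c: c["variance"])
    return [c["major_mv"] for c in ranked[:L]]
```

Capping the list at L bounds the worst-case candidate count, which is the hardware-oriented goal stated above.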

The maximum number of major MVs may be adapted or signaled at different coding levels, e.g., the sequence level, picture level, and block level. In addition, the maximum values used for the MV clustering processes applied in the CU-level motion search and the sub-CU-level motion search may be different.

Method for using constrained search range for CU-level initial motion search

In some embodiments, a method is provided for coding video comprising a plurality of pictures including a current picture. The method comprises, for at least one current block in the current picture: identifying a first set of motion vector candidates for Frame Rate Up Conversion (FRUC) prediction of the block; defining a search center based on the first set of motion vector candidates; determining a search window having a selected width and centered at the search center; processing the first set of motion vector candidates to fall within the search window by clipping any motion vectors of the first set that fall outside the search window; and performing a search for a selected motion vector from the processed first set of motion vector candidates. As an example, the search center may be the average of the first set of motion vector candidates.
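The clipping step above can be sketched as follows. This is an illustrative Python sketch under our own naming: the search center is taken as the average of the candidate MVs, per the example in the text, and `half_width` corresponds to half of the selected window width.

```python
def clip_candidates_to_window(candidates, half_width):
    """Clip every candidate MV into a window centered on the mean MV."""
    n = len(candidates)
    cx = sum(mv[0] for mv in candidates) / n   # search center = average MV (x)
    cy = sum(mv[1] for mv in candidates) / n   # search center = average MV (y)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    return [(clip(x, cx - half_width, cx + half_width),
             clip(y, cy - half_width, cy + half_width))
            for (x, y) in candidates]
```

After clipping, every candidate lies inside the window, so the subsequent matching-cost search touches only a bounded region of reference samples.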

Some such embodiments further comprise: performing a motion refinement search based on the selected motion vector candidate to generate a refined motion vector; and predicting the block using the refined motion vector.

As an example, the current block may be a coding unit or a sub-coding unit block.

The selected motion vector may be selected as the candidate that achieves the lowest matching cost, for example the lowest Sum of Absolute Differences (SAD).
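As a minimal illustration (the function names, block shapes, and the `fetch_ref` accessor are ours, not from the text), the SAD matching cost and candidate selection might look like:

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized 2-D sample blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def select_mv(candidates, fetch_ref, cur_block):
    """Pick the candidate MV whose referenced block yields the lowest SAD."""
    return min(candidates, key=lambda mv: sad(cur_block, fetch_ref(mv)))
```

In an actual FRUC search the compared blocks would be the template or bilateral-matching regions addressed by each MV; here `fetch_ref` abstracts that lookup.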

In some embodiments, the first set of motion vector candidates is generated by a method comprising: aggregating an initial set of motion vector candidates into a plurality of clusters; and, for each cluster, calculating the centroid of the respective cluster and contributing a centroid motion vector representing that centroid to the first set of motion vector candidates. In such embodiments, the search center may be the centroid of the cluster with the most motion vector candidates.
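One possible clustering sketch is shown below. The text does not mandate a specific clustering algorithm, so the one-pass threshold scheme and the L1 distance threshold `T` are assumptions of this illustration.

```python
def cluster_mvs(initial_mvs, T):
    """Aggregate initial MVs into clusters; return one centroid MV per cluster."""
    clusters = []  # each cluster: {"sum": [sx, sy], "members": [...]}
    for mv in initial_mvs:
        for c in clusters:
            n = len(c["members"])
            cx, cy = c["sum"][0] / n, c["sum"][1] / n
            # Join the first cluster whose running centroid is within T (L1).
            if abs(mv[0] - cx) + abs(mv[1] - cy) <= T:
                c["members"].append(mv)
                c["sum"][0] += mv[0]
                c["sum"][1] += mv[1]
                break
        else:
            clusters.append({"sum": [mv[0], mv[1]], "members": [mv]})
    # The centroid of each cluster becomes one candidate of the first set.
    return [(c["sum"][0] / len(c["members"]), c["sum"][1] / len(c["members"]))
            for c in clusters]
```

Because near-duplicate initial MVs collapse into one centroid, the first candidate set handed to the motion search is typically much smaller than the initial list.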

As an example, the selected width may be signaled in the VPS, SPS, PPS, a slice segment header, or at the block level. In some embodiments, the selected width for the current picture is determined based at least in part on a level of motion stability in the current picture. In some embodiments, the selected width for the current block is determined based at least in part on a level of correlation among the motion vectors of spatial neighbors of the current block. In some embodiments, the selected width for the current block is determined based at least in part on the size of the current block. In some embodiments, the selected width for the current picture is determined based at least in part on the Picture Order Count (POC) distance between the current picture and its reference picture.

Methods of using constrained search ranges for CU-level motion refinement, sub-CU-level initial motion search, and sub-CU-level motion refinement

In some embodiments, a method is provided for coding video comprising a plurality of pictures including a current picture. The method comprises, for at least one current Coding Unit (CU) in the current picture: identifying a first set of motion vector candidates for Frame Rate Up Conversion (FRUC) prediction of the coding unit; performing a search for a CU-level motion vector selected from the first set of motion vector candidates; determining a search window having a selected width and centered on the selected CU-level motion vector; and performing a motion refinement search inside the search window to generate a refined CU-level motion vector. The coding unit may be predicted using the refined CU-level motion vector.

In some embodiments, the method further comprises: identifying a set of sub-CU initial motion vectors and, for each sub-CU inside the coding unit: processing the set of sub-CU initial motion vectors to fall within the search window by clipping any motion vectors of the set that fall outside the search window; performing a search for a sub-CU motion vector selected from the processed set of sub-CU initial motion vectors; and performing a motion refinement search inside the search window to produce a refined sub-CU motion vector. The sub-CU may be predicted using the refined sub-CU-level motion vector.

The selected width may be signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice segment header, or at the block level. The width selected for the current picture may be determined based at least in part on a level of motion stability in the current picture. The width selected for the current block may be determined based at least in part on a level of correlation among the motion vectors of spatial neighbors of the current block. The width selected for the current block may be determined based at least in part on the size of the current block. The width selected for the current picture may be determined based at least in part on the POC distance between the current picture and its reference picture.

Method of using a unified search range for FRUC motion search

In some embodiments, a method is provided for coding video comprising a plurality of pictures including a current picture. The method comprises, for at least one Coding Unit (CU) in the current picture: identifying a first CU-level motion vector candidate set for Frame Rate Up Conversion (FRUC) prediction of the coding unit; defining a search center based on the first CU-level motion vector candidate set (e.g., as the average of that set); determining a search window having a selected width and centered at the search center; processing the first set to fall within the search window by clipping any motion vectors of the first CU-level motion vector candidate set that fall outside the search window; performing a search for a CU-level motion vector selected from the processed first motion vector candidate set; performing a motion refinement search inside the search window to produce a refined CU-level motion vector; identifying a set of sub-CU initial motion vectors; and, for each sub-CU inside the coding unit: processing the set of sub-CU initial motion vectors to fall within the search window by clipping any motion vectors of the set that fall outside the search window; performing a search for a sub-CU motion vector selected from the processed set of sub-CU initial motion vectors; and generating a refined sub-CU motion vector by performing a motion refinement search inside the search window.
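Under toy assumptions (an abstract `cost` function standing in for the matching cost, a window centered on the mean of the CU-level candidates, and a bounded cross-pattern refinement), the unified-search-range flow might be sketched as:

```python
def fruc_unified(cu_candidates, subcu_candidates, cost, half_width, max_iter):
    """Single window serves CU search, CU refinement, and all sub-CU stages."""
    n = len(cu_candidates)
    cx = sum(m[0] for m in cu_candidates) / n   # search center = mean CU-level MV
    cy = sum(m[1] for m in cu_candidates) / n
    lo_x, hi_x = cx - half_width, cx + half_width
    lo_y, hi_y = cy - half_width, cy + half_width
    clip = lambda mv: (max(lo_x, min(hi_x, mv[0])),
                       max(lo_y, min(hi_y, mv[1])))

    def refine(mv):  # bounded local refinement, kept inside the same window
        best, best_cost = mv, cost(mv)
        for _ in range(max_iter):
            moved = False
            for d in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                cand = clip((best[0] + d[0], best[1] + d[1]))
                if cost(cand) < best_cost:
                    best, best_cost, moved = cand, cost(cand), True
            if not moved:
                break
        return best

    cu_mv = refine(min((clip(m) for m in cu_candidates), key=cost))
    sub_mvs = [refine(min((clip(m) for m in cands), key=cost))
               for cands in subcu_candidates]
    return cu_mv, sub_mvs
```

Reusing one window for every stage means the reference samples fetched for the CU-level search can also serve the sub-CU stages, which is the memory-bandwidth benefit of the unified range.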

In some such embodiments, the first CU-level motion vector candidate set is generated by aggregating an initial CU-level motion vector candidate set into a plurality of clusters and, for each cluster, calculating the centroid of the respective cluster and contributing a centroid motion vector representing that centroid to the first motion vector candidate set. The search center may be the centroid of the cluster having the most CU-level motion vector candidates.

The selected width may be signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice segment header, or at the block level. The width selected for the current picture may be determined based at least in part on a level of motion stability in the current picture. The width selected for the current block may be determined based at least in part on a level of correlation among the motion vectors of spatial neighbors of the current block. The width selected for the current block may be determined based at least in part on the size of the current block. The width selected for the current picture may be determined based at least in part on the POC distance between the current picture and its reference picture.

Method of using a search iteration count constraint for FRUC motion refinement

In some embodiments, a method is provided for coding video comprising a plurality of pictures including a current picture. The method comprises, for at least one block in the current picture: identifying a first set of motion vector candidates for Frame Rate Up Conversion (FRUC) prediction of the block; performing a search for a selected motion vector from the first set of motion vector candidates; and performing a motion refinement search based on the selected motion vector candidate to generate a refined motion vector, wherein the motion refinement search is limited to a selected maximum number of iterations. The block may be a coding unit or a sub-coding-unit block.

The selected maximum number of iterations may depend at least in part on whether the block is a coding unit block or a sub-coding-unit block. The selected maximum number of iterations may be signaled at the sequence level, picture level, or block level, or in a slice segment header.

Method of using a constrained number of major MVs for MV clustering

In some embodiments, a method is provided for coding video comprising a plurality of pictures including a current picture. The method comprises, for at least one block (e.g., a coding unit or a sub-coding-unit block) in the current picture: aggregating an initial set of motion vector candidates into a plurality of clusters; for each cluster, calculating the centroid of the respective cluster and contributing a centroid motion vector representing that centroid to a set of major motion vectors; selecting no more than a selected maximum number of motion vectors from the set of major motion vectors to produce a constrained set of motion vectors; and performing a search for a selected motion vector from the constrained set of motion vector candidates.

In some embodiments, selecting no more than the selected maximum number of motion vectors comprises selecting, from the set of major motion vectors, the major motion vectors that represent the clusters with the greatest numbers of initial motion vector candidates. In some embodiments, selecting no more than the selected maximum number of motion vectors comprises selecting, from the set of major motion vectors, the major motion vectors that represent the clusters with the smallest variance among their motion vector candidates.

The selected maximum number of motion vectors may depend at least in part on whether the block is a coding unit block or a sub-coding-unit block. The selected maximum number of motion vectors may be signaled at the sequence level, picture level, or block level, or in a slice segment header.

Coded bitstream structure

Fig. 14 is a diagram showing an example of the structure of a coded bitstream. The coded bitstream 1300 includes a plurality of NAL (network abstraction layer) units 1301. A NAL unit may contain coded sample data (e.g., a coded slice 1306) or high-level syntax metadata (e.g., parameter set data, slice header data 1305, or supplemental enhancement information data 1307, which may be referred to as an SEI message). Parameter sets are high-level syntax structures containing essential syntax elements that may apply to multiple bitstream layers (e.g., the video parameter set 1302 (VPS)), to a coded video sequence within a layer (e.g., the sequence parameter set 1303 (SPS)), or to a number of coded pictures within a coded video sequence (e.g., the picture parameter set 1304 (PPS)). Parameter sets can be sent with the coded pictures of the video bitstream or by other means, including out-of-band transmission using a reliable channel, hard coding, and so on. The slice header 1305 is also a high-level syntax structure that may contain picture-related information that is relatively small or relevant only to certain slice or picture types. This information can also be used for other purposes, such as picture output timing or display, as well as loss detection and concealment.

Communication device and system

Fig. 15 is a schematic diagram showing an example of a communication system. The communication system 1400 may include an encoder 1402, a communication network 1404, and a decoder 1406. The encoder 1402 may communicate with the network 1404 via a connection 1408, which connection 1408 may be a wired connection or a wireless connection. The encoder 1402 may be similar to the block-based video encoder of fig. 1. The encoder 1402 may include a single layer codec (e.g., fig. 1) or a multi-layer codec. The decoder 1406 may communicate with the network 1404 via a connection 1410, which connection 1410 may be a wired connection or a wireless connection. The decoder 1406 may be similar to the block-based video decoder of fig. 2. The decoder 1406 may include a single layer codec (e.g., fig. 2) or a multi-layer codec.

The encoder 1402 and/or decoder 1406 may be incorporated into various wired communication devices and/or wireless transmit/receive units (WTRUs), such as, but not limited to, digital televisions, wireless broadcast systems, network components/terminals, servers (e.g., content or web servers (e.g., hypertext transfer protocol (HTTP) servers)), Personal Digital Assistants (PDAs), laptop or desktop computers, tablets, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and/or digital media players, among others.

The communication network 1404 may be an appropriate type of communication network. For example, the communication network 1404 may be a multiple-access system that provides content (e.g., voice, data, video, messaging, broadcast, etc.) to multiple wireless users. The communication network 1404 enables multiple wireless users to access such content by sharing system resources, including wireless bandwidth. By way of example, the communication network 1404 can employ one or more channel access methods, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal FDMA (OFDMA), and/or Single-Carrier FDMA (SC-FDMA), among others. The communication network 1404 may include a plurality of connected communication networks. The communication network 1404 may include the internet and/or one or more private business networks, such as a cellular network, a WiFi hotspot, and/or an Internet Service Provider (ISP) network, among others.

Figure 16 is a system diagram illustrating a WTRU. As shown, the exemplary WTRU 1500 may include a processor 1518, a transceiver 1520, transmit/receive components 1522, a speaker/microphone 1524, a keyboard or keypad 1526, a display/touch pad 1528, non-removable memory 1530, removable memory 1532, a power supply 1534, a Global Positioning System (GPS) chipset 1536, and other peripherals 1538. It should be appreciated that the WTRU 1500 may include any subcombination of the foregoing components while remaining consistent with an embodiment. Further, a terminal of an encoder (e.g., encoder 100) and/or a decoder (e.g., decoder 200) may include some or all of the components depicted in the WTRU 1500 of fig. 16 and described herein with reference to the WTRU 1500 of fig. 16.

The processor 1518 may be a general purpose processor, a special purpose processor, a conventional processor, a Digital Signal Processor (DSP), a Graphics Processing Unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) circuit, any other type of Integrated Circuit (IC), a state machine, or the like. The processor 1518 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1500 to operate in a wireless environment. The processor 1518 may be coupled to a transceiver 1520, and the transceiver 1520 may be coupled to a transmit/receive component 1522. Although fig. 16 depicts the processor 1518 and the transceiver 1520 as separate components, it should be understood that the processor 1518 and the transceiver 1520 may also be integrated together in a single electronic component or chip.

The transmit/receive component 1522 may be configured to transmit and/or receive signals to and/or from another terminal via the air interface 1515. For example, in one or more embodiments, the transmit/receive component 1522 can be an antenna configured to transmit and/or receive RF signals. As an example, in one or more embodiments, the transmit/receive component 1522 can be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals. In one or more embodiments, the transmit/receive component 1522 can be configured to transmit and/or receive both RF and optical signals. It should be appreciated that the transmit/receive component 1522 can be configured to transmit and/or receive any combination of wireless signals.

Further, although the transmit/receive component 1522 is depicted in fig. 16 as a single component, the WTRU 1500 may include any number of transmit/receive components 1522. More specifically, the WTRU 1500 may use MIMO technology. Thus, in one embodiment, the WTRU 1500 may include two or more transmit/receive components 1522 (e.g., multiple antennas) that transmit and receive radio signals over the air interface 1515.

Transceiver 1520 can be configured to modulate signals to be transmitted by transmit/receive component 1522 and to demodulate signals received by transmit/receive component 1522. As noted above, the WTRU 1500 may have multi-mode capabilities. Thus, the transceiver 1520 may include multiple transceivers that allow the WTRU 1500 to communicate via multiple RATs (e.g., UTRA and IEEE 802.11).

The processor 1518 of the WTRU 1500 may be coupled to, and may receive user input data from, a speaker/microphone 1524, a keypad 1526, and/or a display/touchpad 1528, such as a Liquid Crystal Display (LCD) display unit or an Organic Light Emitting Diode (OLED) display unit. The processor 1518 may also output user data to the speaker/microphone 1524, the keypad 1526, and/or the display/touchpad 1528. Further, the processor 1518 can access information from, and store data in, any suitable memory (e.g., non-removable memory 1530 and/or removable memory 1532). The non-removable memory 1530 may include Random Access Memory (RAM), Read Only Memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 1532 may include a Subscriber Identity Module (SIM) card, a memory stick, a Secure Digital (SD) memory card, and so forth. In one or more embodiments, the processor 1518 may access information from, and store data in, memory that is not physically located in the WTRU 1500, such as memory located in a server or a home computer (not shown).

The processor 1518 may receive power from the power supply 1534 and may be configured to distribute and/or control power for other components in the WTRU 1500. The power supply 1534 may be any suitable device for powering the WTRU 1500. For example, the power supply 1534 may include one or more dry cell batteries (e.g., nickel-cadmium (Ni-Cd), nickel-zinc (Ni-Zn), nickel metal hydride (NiMH), lithium ion (Li-ion), etc.), solar cells, and fuel cells, among others.

The processor 1518 may also be coupled with a GPS chipset 1536, which may be configured to provide location information (e.g., longitude and latitude) related to the current location of the WTRU 1500. In addition to, or in lieu of, information from the GPS chipset 1536, the WTRU 1500 may receive location information from a terminal (e.g., a base station) via the air interface 1515 and/or determine its location based on the timing of signals received from two or more nearby base stations. It should be appreciated that the WTRU 1500 may acquire location information via any suitable positioning method while remaining consistent with an embodiment.

The processor 1518 may be further coupled to other peripheral devices 1538, which may include one or more software and/or hardware modules providing additional features, functionality, and/or wired or wireless connectivity. For example, the peripheral devices 1538 may include an accelerometer, an orientation sensor, a motion sensor, a proximity sensor, an electronic compass, a satellite transceiver, a digital camera and/or video recorder (for photos and video), a Universal Serial Bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a Frequency Modulation (FM) radio unit, and software modules (e.g., a digital music player, a media player, a video game player module, and an Internet browser, among others).

By way of example, the WTRU 1500 may be configured to transmit and/or receive wireless signals and may include a User Equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a Personal Digital Assistant (PDA), a smartphone, a laptop, a netbook, a tablet, a personal computer, a wireless sensor, consumer electronics, or any other terminal capable of receiving and processing compressed video communications.

The WTRU 1500 and/or a communication network (e.g., communication network 804) may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may use Wideband CDMA (WCDMA) to establish the air interface 1515. WCDMA may include communication protocols such as High Speed Packet Access (HSPA) and/or evolved HSPA (HSPA+). HSPA may include High Speed Downlink Packet Access (HSDPA) and/or High Speed Uplink Packet Access (HSUPA). The WTRU 1500 and/or a communication network (e.g., communication network 804) may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1515 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).

The WTRU 1500 and/or a communication network (e.g., communication network 804) may implement radio technologies such as IEEE 802.16 (e.g., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), and GSM EDGE (GERAN), among others. The WTRU 1500 and/or a communication network (e.g., communication network 804) may implement radio technologies such as IEEE 802.11 or IEEE 802.15.

It should be noted that the various hardware components of one or more of the described embodiments are referred to as "modules," which perform the various functions described herein in connection with the respective module. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more Application Specific Integrated Circuits (ASICs), one or more Field Programmable Gate Arrays (FPGAs), one or more memory devices) deemed suitable by one of ordinary skill in the relevant art for a given implementation. Each of the modules described may also include instructions executable to implement one or more functions described as being performed by the respective module, and it should be noted that these instructions may take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, and/or software instructions, and may be stored in any suitable non-transitory computer-readable medium or media, such as media or media commonly referred to as RAM, ROM, and so forth.

Although features and elements are described above in particular combinations, one of ordinary skill in the art will recognize that each feature can be used alone or in any combination with other features and elements. Furthermore, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, Read Only Memory (ROM), Random Access Memory (RAM), registers, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and Digital Versatile Disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
