Video encoding device, video decoding device, video encoding method, video decoding method, program, and video system

Document No.: 1382835   Publication date: 2020-08-14

Summary: This technology, "Video encoding device, video decoding device, video encoding method, video decoding method, program, and video system", was devised by Keiichi Chono (蝶野庆一) on 2018-08-31. A video encoding apparatus performs video encoding using block-based affine transform motion compensated prediction, which includes a process of calculating a motion vector of each sub-block using the motion vectors of control points in the block. The video encoding apparatus is provided with block-based affine transform motion compensation prediction control means for controlling, using encoding parameters supplied from the outside, at least one of the block size, the prediction direction, and the motion vector precision of the sub-blocks in a block subjected to block-based affine transform motion compensated prediction.

1. A video encoding apparatus that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video encoding apparatus comprising:

block-based affine transform motion compensation prediction control means for controlling at least one of a block size, a prediction direction, and a motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensation prediction using encoding parameters supplied from the outside.

2. The video encoding apparatus of claim 1, wherein the block-based affine transform motion compensation prediction control means increases the block size of the sub-block when controlling the block size of the sub-block, restricts the prediction direction to one direction when controlling the prediction direction, and rounds the motion vector of the sub-block to an integer motion vector when controlling the motion vector precision.

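3. A video decoding apparatus that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video decoding apparatus comprising:

block-based affine transform motion compensation prediction control means for controlling at least one of a block size, a prediction direction, and a motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream.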

4. The video decoding apparatus of claim 3, wherein the block-based affine transform motion compensation prediction control means increases the block size of the sub-block when controlling the block size of the sub-block, restricts the prediction direction to one direction when controlling the prediction direction, and rounds the motion vector of the sub-block to an integer motion vector when controlling the motion vector precision.

5. A video encoding method for performing video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video encoding method comprising:

using the supplied encoding parameters to control at least one of a block size, a prediction direction, and a motion vector precision of the sub-block of the block subjected to the block-based affine transform motion compensation prediction.

6. A video decoding method for performing video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video decoding method comprising:

controlling at least one of a block size, a prediction direction, and a motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensated prediction using at least encoding parameters extracted from a bitstream.

7. The video decoding method of claim 6, wherein: when the block size of the sub-block is controlled, the block size of the sub-block is increased; when the prediction direction is controlled, the prediction direction is limited to one direction; and when the motion vector precision is controlled, the motion vector of the sub-block is rounded to an integer motion vector.

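8. A video encoding program executed in a video encoding apparatus that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video encoding program causing a computer to:

control at least one of a block size, a prediction direction, and a motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensation prediction using supplied encoding parameters.

9. A video decoding program executed in a video decoding apparatus that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video decoding program causing a computer to:

control at least one of a block size, a prediction direction, and a motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream.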

10. A video system using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector for each sub-block using motion vectors of control points in the block, the video system comprising:

a video encoding apparatus for performing video encoding using the block-based affine transform motion compensation prediction technique; and

a video decoding apparatus for performing video decoding using the block-based affine transform motion compensation prediction technique,

wherein the video encoding apparatus includes encoding-side block-based affine transform motion compensation prediction control means for controlling at least one of a block size, a prediction direction, and a motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensation prediction using encoding parameters supplied in the video system, and

wherein the video decoding apparatus includes decoding-side block-based affine transform motion compensation prediction control means for controlling at least one of the block size, the prediction direction, and the motion vector precision of the sub-block in the block subjected to the block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream from the video encoding apparatus.

Technical Field

The present invention relates to a video encoding apparatus, a video decoding apparatus, and a video system using block-based affine transform motion compensated prediction.

Background

Non-patent literature (NPL) 1 describes a video coding scheme based on the HEVC (High Efficiency Video Coding) standard. NPL 2 discloses a block-based affine transform motion compensated prediction technique for improving the compression efficiency of HEVC.

Affine transform motion compensated prediction can express motion involving deformation such as scaling or rotation, which the translational-model motion compensated prediction used in HEVC cannot express.

The affine transform motion compensated prediction technique is described in NPL 3.

The block-based affine transform motion compensated prediction of NPL 2 (hereinafter referred to as "typical block-based affine transform motion compensated prediction") is a simplified form of affine transform motion compensated prediction with the following features.

- The upper-left and upper-right positions of the block to be processed are used as control points.

- As the motion vector field of the block to be processed, a motion vector is derived for each sub-block obtained by dividing the block to be processed into sub-blocks of a fixed size.

Typical block-based affine transform motion compensated prediction is described below with reference to the explanatory diagrams in figs. 23 and 24. Fig. 23 is an explanatory diagram depicting an example of the positional relationship among a reference picture, a picture to be processed, and a block to be processed. In fig. 23, picWidth denotes the number of pixels of the picture in the horizontal direction, and picHeight denotes the number of pixels in the vertical direction.

Fig. 24 is an explanatory diagram depicting how a unidirectional motion vector is set at each control point (circles in (B) in fig. 24) of the block to be processed depicted in fig. 23 (see (A) in fig. 24), and how the motion vector of each sub-block is derived as the motion vector field of the block to be processed (see (C) in fig. 24).

For simplicity, fig. 24 depicts an example in which the block to be processed has w = 16 horizontal pixels and h = 16 vertical pixels, the prediction direction of the control-point motion vectors is dir = L0, and each sub-block is s = 4 pixels wide and s = 4 pixels high.

The control point motion vector setting unit 5051 and the sub-block motion vector derivation unit 5052 depicted in fig. 24 are included in a functional block for performing motion compensated prediction in a video encoding device.

The control point motion vector setting unit 5051 sets the two input motion vectors as the motion vectors of the upper-left and upper-right control points ($v_{TL}$ and $v_{TR}$ in (B) in fig. 24).

The motion vector at position $(x, y)$, where $0 \le x \le w-1$ and $0 \le y \le h-1$, in the block to be processed is expressed as follows, where $v(x)$ and $v(y)$ are its horizontal and vertical components.

$$v(x) = \frac{v_{TR}(x) - v_{TL}(x)}{w}\,x - \frac{v_{TR}(y) - v_{TL}(y)}{w}\,y + v_{TL}(x) \qquad (1)$$

$$v(y) = \frac{v_{TR}(y) - v_{TL}(y)}{w}\,x + \frac{v_{TR}(x) - v_{TL}(x)}{w}\,y + v_{TL}(y) \qquad (2)$$

In the above equations, $v_{TL}(x)$, $v_{TL}(y)$, $v_{TR}(x)$, and $v_{TR}(y)$ denote the x-direction (horizontal) component of $v_{TL}$, the y-direction (vertical) component of $v_{TL}$, the x-direction component of $v_{TR}$, and the y-direction component of $v_{TR}$, respectively.
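Equations (1) and (2) describe a four-parameter affine field (translation, rotation, and uniform scaling) fixed by the two control-point vectors. A minimal Python sketch that evaluates them, assuming integer motion vector components and using exact rational arithmetic in place of the fixed-point arithmetic a real codec would use; the function name `affine_mv` is hypothetical.

```python
from fractions import Fraction


def affine_mv(v_tl, v_tr, w, x, y):
    """Motion vector at position (x, y) of a block of width w, per equations (1) and (2).

    v_tl and v_tr are (x, y) integer component pairs of the upper-left and
    upper-right control-point motion vectors. Fraction keeps the division by w
    exact; a codec would use fixed-point integers instead (assumption).
    """
    a = Fraction(v_tr[0] - v_tl[0], w)  # (v_TR(x) - v_TL(x)) / w
    b = Fraction(v_tr[1] - v_tl[1], w)  # (v_TR(y) - v_TL(y)) / w
    mv_x = a * x - b * y + v_tl[0]      # equation (1)
    mv_y = b * x + a * y + v_tl[1]      # equation (2)
    return mv_x, mv_y


# Zoom example: v_TL = (0, 0), v_TR = (16, 0) on a 16-pixel-wide block.
print(affine_mv((0, 0), (16, 0), 16, 8, 4))
```

With v_TL = (0, 0) and v_TR = (16, 0) the field is a pure zoom, so the vector at (8, 4) is (8, 4); with v_TR = (0, 16) it is a rotation, and the vector at (0, 8) is (-8, 0).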

Next, for each sub-block, the sub-block motion vector derivation unit 5052 calculates the motion vector at the center position of the sub-block from the above expression for the motion vector at each position in the block to be processed, and uses it as the sub-block motion vector.

In this way, the control point motion vector setting unit 5051 and the sub-block motion vector derivation unit 5052 determine the sub-block motion vectors.
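The derivation performed by unit 5052 can be sketched as evaluating the same field once per sub-block. This sketch assumes the sub-block center lies at offset (s/2, s/2) from the sub-block's upper-left corner and uses a hypothetical function name; with the fig. 24 parameters (w = h = 16, s = 4) it yields 16 sub-block motion vectors.

```python
from fractions import Fraction


def subblock_mv_field(v_tl, v_tr, w, h, s):
    """One motion vector per s x s sub-block, keyed by the sub-block's top-left corner.

    Each vector is equations (1) and (2) evaluated at the sub-block center,
    taken here as (sx + s/2, sy + s/2); that center convention is an assumption.
    """
    a = Fraction(v_tr[0] - v_tl[0], w)
    b = Fraction(v_tr[1] - v_tl[1], w)
    field = {}
    for sy in range(0, h, s):          # sub-block rows
        for sx in range(0, w, s):      # sub-block columns
            cx = sx + Fraction(s, 2)
            cy = sy + Fraction(s, 2)
            field[(sx, sy)] = (a * cx - b * cy + v_tl[0],
                               b * cx + a * cy + v_tl[1])
    return field


# The fig. 24 setting: w = h = 16, s = 4.
field = subblock_mv_field((0, 0), (16, 0), 16, 16, 4)
print(len(field), field[(12, 12)])
```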

Reference list

Non-patent literature

NPL 1: R. Joshi et al., "HEVC Screen Content Coding Draft Text 5", document JCTVC-V1005, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 22nd Meeting: Geneva, CH, 15-21 October 2015.

NPL 2: J. Chen et al., "Algorithm Description of Joint Exploration Test Model 5 (JEM 5)", document JVET-E1001-v2, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Geneva, CH, 12-20 January 2017.

NPL 3: Zhang et al., "Video coding using affine motion compensated prediction", ICASSP 1996.

Disclosure of Invention

Technical problem

With typical block-based affine transform motion compensated prediction as described above, motion vectors are scattered within the block to be processed, so each sub-block refers to its own region of the reference picture. Therefore, in a video encoding apparatus using typical block-based affine transform motion compensated prediction, the amount of memory access to reference pictures during motion compensated prediction increases greatly compared with ordinary motion compensated prediction (translational-model-based motion compensated prediction, in which motion vectors are not scattered within the block to be processed).
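The increase in memory access can be made concrete with a rough count. This sketch assumes an 8-tap interpolation filter (the length HEVC uses for luma), the worst case of fractional motion vectors in both directions, and no reuse of reference samples shared by neighboring sub-blocks; the function name `ref_samples_fetched` is hypothetical.

```python
def ref_samples_fetched(w, h, part, taps=8):
    """Worst-case reference samples read to predict a w x h block.

    The block is split into part x part pieces, each with its own fractional
    motion vector and interpolated separately with a taps-tap separable
    filter, so each piece needs (part + taps - 1)**2 reference samples.
    Overlap between the pieces' reference regions is ignored (no caching).
    """
    pieces = (w // part) * (h // part)
    return pieces * (part + taps - 1) ** 2


translational = ref_samples_fetched(16, 16, 16)  # one motion vector for the block
affine_4x4 = ref_samples_fetched(16, 16, 4)      # 16 sub-block motion vectors
print(translational, affine_4x4)
```

For a 16 × 16 block this gives 529 samples with one translational vector versus 1936 with sixteen 4 × 4 sub-block vectors, roughly 3.7 times as many under these assumptions.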

For example, when typical block-based affine transform motion compensated prediction is applied to a video signal with a large image size such as 8K, the amount of memory access to reference pictures may exceed the peak bandwidth of the memory included in the device.

Here, a "large image size" means that at least one of picWidth (the number of pixels in the horizontal direction of the picture depicted in fig. 23), picHeight (the number of pixels in the vertical direction), or their product picWidth × picHeight (the area of the picture) is large.

As described above, typical block-based affine transform motion compensated prediction has the problem of increasing the implementation cost of video encoding apparatuses and video decoding apparatuses.

An object of the present invention is to provide a video encoding device, a video decoding device, a video encoding method, a video decoding method, a program, and a video system that can reduce the amount of memory access and the implementation cost when block-based affine transform motion compensated prediction is used.

Solution to the problem

A video encoding apparatus according to the present invention is a video encoding apparatus that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in a block, the video encoding apparatus including: block-based affine transform motion compensation prediction control means for controlling at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction using encoding parameters supplied from the outside.

A video decoding apparatus according to the present invention is a video decoding apparatus that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in a block, the video decoding apparatus including: block-based affine transform motion compensation prediction control means for controlling at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream.

A video encoding method according to the present invention is a video encoding method that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in a block, the video encoding method including: using supplied encoding parameters to control at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

A video decoding method according to the present invention is a video decoding method that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in a block, the video decoding method including: controlling at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream.

A video encoding program according to the present invention is a video encoding program executed in a video encoding apparatus that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in a block, the video encoding program causing a computer to use supplied encoding parameters to control at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

A video decoding program according to the present invention is a video decoding program executed in a video decoding apparatus that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in a block, the video decoding program causing a computer to control at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream.

A video system according to the present invention is a video system using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video system including: a video encoding apparatus for performing video encoding using the block-based affine transform motion compensation prediction technique; and a video decoding apparatus for performing video decoding using the block-based affine transform motion compensation prediction technique, wherein the video encoding apparatus includes encoding-side block-based affine transform motion compensation prediction control means for controlling at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction using encoding parameters supplied in the video system, and wherein the video decoding apparatus includes decoding-side block-based affine transform motion compensation prediction control means for controlling at least one of the block size, the prediction direction, and the motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream from the video encoding apparatus.
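The three kinds of control named above (sub-block size, prediction direction, motion vector precision, with the specific actions given in claims 2, 4, and 7) can be sketched as one function. Everything concrete here is an assumption for illustration: the switch names in the hypothetical `AffineControl` class, doubling the sub-block size, keeping the first listed direction when restricting to one direction, quarter-pel motion vector storage, and rounding half away from zero.

```python
from dataclasses import dataclass


@dataclass
class AffineControl:
    """Hypothetical on/off switches derived from the encoding parameters."""
    enlarge_subblock: bool = False  # control the sub-block size
    uni_prediction: bool = False    # control the prediction direction
    integer_mv: bool = False        # control the motion vector precision


def round_to_integer_pel(v, frac_bits=2):
    """Round one MV component held in 1/2**frac_bits-pel units to a whole pel.

    Quarter-pel storage (frac_bits=2) and rounding half away from zero are
    assumptions; the result stays in the same 1/2**frac_bits units.
    """
    unit = 1 << frac_bits
    magnitude = (abs(v) + unit // 2) // unit * unit
    return magnitude if v >= 0 else -magnitude


def apply_affine_control(ctrl, s, directions, mvs):
    """Return (sub-block size, prediction directions, sub-block MVs) after control."""
    if ctrl.enlarge_subblock:
        s *= 2                       # e.g. 4 x 4 -> 8 x 8 sub-blocks
    if ctrl.uni_prediction and len(directions) > 1:
        directions = directions[:1]  # keep a single direction; which one is assumed
    if ctrl.integer_mv:
        mvs = [(round_to_integer_pel(x), round_to_integer_pel(y)) for x, y in mvs]
    return s, directions, mvs


print(apply_affine_control(AffineControl(True, True, True), 4, ["L0", "L1"], [(5, -6)]))
```

Because the encoder and decoder apply the same control from the same parameters, a function of this shape would run identically on both sides, which is what keeps their reconstructed pictures in step.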

Advantageous effects of the invention

According to the present invention, the amount of memory access can be reduced, and the implementation cost can be reduced.

Further, since the video encoding apparatus and the video decoding apparatus reduce the amount of memory access by a common method, a video system that ensures interoperability between the video encoding apparatus and the video decoding apparatus can be provided.

Drawings

Fig. 1 is an explanatory diagram depicting an example of 33 types of angular intra prediction.

Fig. 2 is an explanatory diagram depicting an example of inter prediction.

Fig. 3 is an explanatory diagram depicting an example of a CTU partition of the frame t and an example of a CU partition of the CTU8 of the frame t.

Fig. 4 is an explanatory diagram depicting a quadtree structure corresponding to an example of a CU partition of the CTU 8.

Fig. 5 is a block diagram depicting the structure of an exemplary embodiment of a video encoding device.

Fig. 6 is a block diagram depicting an example of the structure of a block-based affine transform motion compensated prediction controller.

Fig. 7 is an explanatory diagram depicting a state in which a unidirectional motion vector is set in each control point of a block to be processed and a motion vector of each sub-block is derived as a motion vector field of the block to be processed in exemplary embodiment 1.

Fig. 8 is a flowchart depicting the operation of the block-based affine transformation motion compensation prediction controller in exemplary embodiment 1.

Fig. 9 is a block diagram depicting the structure of an exemplary embodiment of a video decoding apparatus.

Fig. 10 is an explanatory diagram depicting a state in which a unidirectional motion vector is set in each control point of a block to be processed and a motion vector of each sub-block is derived as a motion vector field of the block to be processed in exemplary embodiment 3.

Fig. 11 is a flowchart depicting the operation of the block-based affine transform motion compensation prediction controller in exemplary embodiment 3.

Fig. 12 is an explanatory diagram depicting an example of a positional relationship among a reference picture, a picture to be processed, and a block to be processed in bidirectional prediction.

Fig. 13 is an explanatory diagram depicting a state in which a typical block-based affine transformation motion compensation prediction controller sets a motion vector of a corresponding direction in each control point of a block to be processed and derives a motion vector of each sub-block as a motion vector field of the block to be processed.

Fig. 14 is an explanatory diagram depicting a state in which a motion vector of a corresponding direction is set in each control point of a block to be processed and a motion vector of each sub-block is derived as a motion vector field of the block to be processed in exemplary embodiment 4.

Fig. 15 is a flowchart depicting the operation of the block-based affine transform motion compensation prediction controller in exemplary embodiment 4.

Fig. 16 is a flowchart depicting the operation of the block-based affine transform motion compensation prediction controller in exemplary embodiment 5.

Fig. 17 is a flowchart depicting the operation of the block-based affine transform motion compensation prediction controller in exemplary embodiment 6.

Fig. 18 is a flowchart depicting the operation of the block-based affine transform motion compensation prediction controller in exemplary embodiment 7.

Fig. 19 is a block diagram depicting an example of the structure of a video system.

Fig. 20 is a block diagram depicting an example of the structure of an information processing system capable of realizing the functions of a video encoding apparatus and a video decoding apparatus.

Fig. 21 is a block diagram depicting the main components of the video encoding apparatus.

Fig. 22 is a block diagram depicting the main components of the video decoding apparatus.

Fig. 23 is an explanatory diagram depicting an example of a positional relationship between a reference picture, a picture to be processed, and a block to be processed.

Fig. 24 is an explanatory diagram depicting a state in which a unidirectional motion vector is set in each control point of a block to be processed and a motion vector of each sub-block is derived as a motion vector field of the block to be processed.

Detailed Description

Exemplary embodiment 1

First, intra prediction, inter prediction, and the signaling of CTUs and CUs used in the video encoding apparatus of this exemplary embodiment and in the video decoding apparatus described later are explained.

Each frame of the digitized video is divided into Coding Tree Units (CTUs), and the CTUs are encoded in raster scan order.

Each CTU is partitioned into Coding Units (CUs) in a quadtree structure and encoded. Each CU is predictively coded. Predictive coding includes intra prediction and inter prediction.

The prediction error of each CU is transform coded based on a frequency transform.

The largest-sized CU is referred to as "largest CU" (largest coding unit: LCU), and the smallest-sized CU is referred to as "smallest CU" (smallest coding unit: SCU). The LCU size and CTU size are the same.

Intra prediction generates a prediction image from a reconstructed image having the same display time as the frame to be encoded. NPL 1 defines the 33 types of angular intra prediction depicted in fig. 1. In angular intra prediction, reconstructed pixels near the block to be encoded are extrapolated in one of the 33 directions to generate an intra prediction signal. In addition to the 33 types of angular intra prediction, NPL 1 defines DC intra prediction, which averages the reconstructed pixels near the block to be encoded, and planar intra prediction, which linearly interpolates the reconstructed pixels near the block to be encoded. A CU encoded based on intra prediction is hereinafter referred to as an "intra CU".
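DC intra prediction fills the block with a single rounded average of the neighboring reconstructed samples. The sketch below follows the integer averaging formula of the HEVC DC mode for an n × n block with n a power of two; it omits the extra edge smoothing HEVC applies to small luma blocks, and the function name is hypothetical.

```python
def dc_intra_prediction(top, left):
    """Fill an n x n block with the rounded mean of its top and left neighbors.

    top: n reconstructed samples above the block; left: n samples to its left.
    n must be a power of two so the division is a right shift.
    """
    n = len(top)
    if n == 0 or len(left) != n or n & (n - 1):
        raise ValueError("need n top and n left samples, n a power of two")
    k = n.bit_length() - 1                      # log2(n)
    dc = (sum(top) + sum(left) + n) >> (k + 1)  # (sum + n) / (2n), rounded
    return [[dc] * n for _ in range(n)]


print(dc_intra_prediction([100] * 4, [104] * 4)[0])
```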

Inter prediction generates a prediction image from a reconstructed image (reference picture) whose display time differs from that of the frame to be encoded. Inter prediction is also called "inter-frame prediction". Fig. 2 is an explanatory diagram depicting an example of inter prediction. The motion vector MV = (mv_x, mv_y) indicates the amount of translation of a reconstructed image block of the reference picture relative to the block to be encoded. In inter prediction, an inter prediction signal is generated from the reconstructed image block of the reference picture, using pixel interpolation if necessary. A CU encoded based on inter prediction is hereinafter referred to as an "inter CU".
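For an integer motion vector, inter prediction reduces to copying a displaced block from the reference picture. This sketch assumes integer-pel motion (so the pixel interpolation mentioned above is skipped), represents pictures as lists of rows, and clamps coordinates at the picture boundary to imitate reference-picture padding; the function name is hypothetical.

```python
def inter_predict_integer(ref, x0, y0, w, h, mv):
    """Copy the w x h block at (x0 + mv_x, y0 + mv_y) from reference picture ref."""
    mv_x, mv_y = mv
    height, width = len(ref), len(ref[0])
    pred = []
    for y in range(h):
        ry = min(max(y0 + y + mv_y, 0), height - 1)  # clamp = edge padding
        row = []
        for x in range(w):
            rx = min(max(x0 + x + mv_x, 0), width - 1)
            row.append(ref[ry][rx])
        pred.append(row)
    return pred


ref = [[10 * r + c for c in range(8)] for r in range(8)]
print(inter_predict_integer(ref, 2, 2, 2, 2, (1, -1)))
```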

In this exemplary embodiment, the video encoding apparatus can use, as inter prediction, both the normal motion compensated prediction depicted in fig. 2 and the block-based affine transform motion compensated prediction described above. Whether normal motion compensated prediction or block-based affine transform motion compensated prediction is used is signaled by the inter_affine_flag syntax, which indicates whether an inter CU uses block-based affine transform motion compensated prediction.

A frame encoded to include only intra CUs is referred to as an "I frame" (or "I picture"). A frame that may include inter CUs in addition to intra CUs is referred to as a "P frame" (or "P picture"). A frame that may include inter CUs each using not just one but two reference pictures simultaneously for inter prediction of a block is referred to as a "B frame" (or "B picture").

Inter prediction using one reference picture is called "unidirectional prediction", and inter prediction using two reference pictures at the same time is called "bidirectional prediction".
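Bidirectional prediction combines one prediction from each reference picture. A minimal sketch, assuming plain rounded averaging of integer samples; HEVC actually averages at a higher intermediate precision and also supports weighted prediction, both of which this sketch omits. The function name is hypothetical.

```python
def bi_predict(pred_l0, pred_l1):
    """Average the L0 and L1 predictions sample by sample, rounding halves up."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred_l0, pred_l1)]


print(bi_predict([[10, 11]], [[20, 20]]))
```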

Fig. 3 is an explanatory diagram depicting an example of a CTU partition of a frame t and an example of a CU partition of an eighth CTU (CTU8) included in the frame t in the case where the spatial resolution of the frame is Common Intermediate Format (CIF) and the CTU size is 64.
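The CTU grid for this example follows from ceiling division of the picture size by the CTU size. A minimal sketch, assuming CTUs that overhang the right or bottom picture edge are still counted as partial CTUs; the function name is hypothetical.

```python
def ctu_origins(pic_width, pic_height, ctu_size):
    """Top-left corners of all CTUs of a picture, in raster-scan order."""
    cols = -(-pic_width // ctu_size)   # ceiling division: partial CTUs count
    rows = -(-pic_height // ctu_size)
    return [(c * ctu_size, r * ctu_size) for r in range(rows) for c in range(cols)]


origins = ctu_origins(352, 288, 64)  # CIF picture, 64 x 64 CTUs
print(len(origins), origins[7])
```

A CIF picture (352 × 288) with 64 × 64 CTUs has 6 CTU columns and 5 CTU rows, 30 CTUs in total; counting from 1, the eighth CTU in raster order is `origins[7]`, whose top-left corner is (64, 64).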

Fig. 4 is an explanatory diagram depicting the quadtree structure corresponding to the example CU partition of CTU8. The quadtree structure of each CTU, i.e., its CU partition shape, is signaled by the cu_split_flag syntax described in NPL 1 (called split_cu_flag in NPL 1).
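The quadtree signaling can be illustrated by a recursive parser that turns a sequence of cu_split_flag values into CU rectangles. This is a sketch, not NPL 1's parsing process: it assumes flags arrive in depth-first Z order (upper-left, upper-right, lower-left, lower-right), that no flag is sent for a CU already at the smallest CU size, and an 8 × 8 SCU; the function name is hypothetical.

```python
def parse_cu_quadtree(x, y, size, split_flags, min_cu=8):
    """Expand cu_split_flag values (an iterator) into a list of (x, y, size) CUs."""
    # No flag is read once the CU has reached the smallest CU size.
    if size > min_cu and next(split_flags):
        half = size // 2
        cus = []
        for dy in (0, half):          # upper row of quadrants first
            for dx in (0, half):      # left quadrant before right
                cus += parse_cu_quadtree(x + dx, y + dy, half, split_flags, min_cu)
        return cus
    return [(x, y, size)]


flags = iter([1, 1, 0, 0, 0, 0, 0, 0, 0])
for cu in parse_cu_quadtree(0, 0, 64, flags):
    print(cu)
```

With these flags, a 64 × 64 CTU splits once, its upper-left 32 × 32 quadrant splits again, and the result is four 16 × 16 CUs followed by three 32 × 32 CUs.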

This completes the description of intra prediction, inter prediction, and signaling of the CTUs and CUs.

The structure and operation of the video encoding apparatus that receives each CU of each frame of digitized video as an input image and outputs a bitstream according to this exemplary embodiment will be described below with reference to fig. 5. Fig. 5 is a block diagram depicting an exemplary embodiment of a video encoding device.

The video encoding apparatus depicted in fig. 5 includes a transformer/quantizer 101, an entropy encoder 102, an inverse quantizer/transformer 103, a buffer 104, a predictor 105, and a multiplexer 106.

The predictor 105 determines, for each CTU, the cu_split_flag syntax values that define the CU partition shape minimizing the encoding cost.

The predictor 105 then determines, for each CU, the pred_mode_flag syntax value that selects intra prediction or inter prediction, the inter_affine_flag syntax value indicating whether the inter CU uses block-based affine transform motion compensated prediction, the intra prediction direction, the prediction direction of motion compensated prediction for the block to be processed, and the motion vector that minimize the encoding cost. The predictor 105 includes a block-based affine transform motion compensated prediction controller 1050. The prediction direction of motion compensated prediction for the block to be processed is hereinafter simply referred to as the "prediction direction".

The predictor 105 generates a prediction signal corresponding to the input image signal of each CU based on the determined cu_split_flag syntax value, pred_mode_flag syntax value, inter_affine_flag syntax value, intra prediction direction, motion vector, and the like. The prediction signal is generated by the intra prediction or inter prediction described above.

Inter prediction is normal motion compensated prediction when inter_affine_flag is 0, and block-based affine transform motion compensated prediction otherwise (that is, when inter_affine_flag is 1).

The transformer/quantizer 101 frequency-transforms a prediction error image obtained by subtracting the prediction signal from the input image signal.

The transformer/quantizer 101 further quantizes the frequency-transformed prediction error image (frequency transform coefficients). The quantized frequency transform coefficients are hereinafter referred to as "transform quantization values".

The entropy encoder 102 entropy-encodes the cu_split_flag syntax value, the pred_mode_flag syntax value, the inter_affine_flag syntax value, the difference information of the intra prediction direction and the difference information of the motion vector determined by the predictor 105, and the transform quantization values.

The inverse quantizer/inverse transformer 103 inversely quantizes the transform quantization values. The inverse quantizer/inverse transformer 103 further applies the inverse frequency transform to the frequency transform coefficients obtained by the inverse quantization. The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the buffer 104. The buffer 104 stores the reconstructed image.

The multiplexer 106 multiplexes the entropy-encoded data supplied from the entropy encoder 102 and outputs the result as a bitstream.
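The encode and reconstruct loop formed by the transformer/quantizer 101, the inverse quantizer/inverse transformer 103, and the buffer 104 can be sketched as follows. To stay short, the frequency transform is replaced by the identity and quantization is a uniform scalar quantizer with a hypothetical step `qstep`; both are simplifying assumptions, so the sketch shows only the data flow, not HEVC's actual transform or quantizer.

```python
def quantize(residual, qstep):
    """Uniform scalar quantization, rounding half away from zero."""
    level = (abs(residual) + qstep // 2) // qstep
    return level if residual >= 0 else -level


def encode_reconstruct(block, pred, qstep):
    """Return (transform quantization values, reconstructed block) for one block.

    The frequency transform is replaced by the identity (an assumption), so
    the values here are just quantized prediction-error samples.
    """
    levels, recon = [], []
    for row_in, row_pred in zip(block, pred):
        row_levels = [quantize(a - p, qstep) for a, p in zip(row_in, row_pred)]
        levels.append(row_levels)  # these would go to the entropy encoder 102
        # inverse quantization, then add the prediction signal back (buffer 104)
        recon.append([p + lv * qstep for p, lv in zip(row_pred, row_levels)])
    return levels, recon


levels, recon = encode_reconstruct([[110, 96]], [[100, 100]], 4)
print(levels, recon)
```

The encoder stores the reconstruction built from the quantized values, not the original input, so that its reference pictures match the ones the decoder will build.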

The bitstream includes the picture size, the prediction direction determined by the predictor 105, and the difference between the motion vectors determined by the predictor 105 (in particular, the difference between the motion vectors of the control points in the block).

The operation of the block-based affine transform motion compensation prediction controller 1050 will be described below.

Fig. 6 is a block diagram depicting an example of the structure of the block-based affine transform motion compensation prediction controller 1050. In the example depicted in fig. 6, the block-based affine transform motion compensation prediction controller 1050 includes a control point motion vector setting unit 1051 and a sub-block motion vector derivation unit 1052 with an added control function.

Fig. 7 is an explanatory diagram depicting how a unidirectional motion vector is set at each control point (circles in (B) in fig. 7) of the block to be processed depicted in fig. 23 (see (A) in fig. 7), and how the motion vector of each sub-block is derived as the motion vector field of the block to be processed (see (C) in fig. 7).

As with the control point motion vector setting unit 5051 in fig. 24, the control point motion vector setting unit 1051 sets the two input motion vectors as the motion vectors of the upper-left and upper-right control points ($v_{TL}$ and $v_{TR}$ in (B) in fig. 7).

The motion vector at position $(x, y)$, where $0 \le x \le w-1$ and $0 \le y \le h-1$, in the block to be processed is expressed by the foregoing equations (1) and (2).

The operation of the block-based affine transform motion compensation prediction controller 1050 will be described below with reference to the flowchart in fig. 8.

As in the control point motion vector setting unit 5051 in fig. 24, the control point motion vector setting unit 1051 assigns an externally input motion vector to a control point of a block to be processed (step S1001). The sub-block motion vector derivation unit 1052 that adds a control function determines whether the image size is larger than a predetermined size (step S1003). The predetermined size is, for example, a 4K size (picWidth ═ 4096 (or 3840), picHeight ═ 2160) or an 8K size (picWidth ═ 7680, picHeight ═ 4320), and can be appropriately set by a user depending on the performance of the video encoding apparatus and the like.

In the case where the image size is larger than the predetermined size, the sub-block motion vector derivation unit 1052 that adds a control function sets the sub-block size to 8 × 8 pixels, which is larger than the 4 × 4 pixel size depicted in fig. 24. That is, the sub-block motion vector derivation unit 1052 that adds a control function sets S to 8 (step S1004).

In the case where the image size is not larger than the predetermined size, the sub-block motion vector derivation unit 1052 which adds a control function sets the sub-block size to be the same as the 4 × 4 pixel size depicted in fig. 24. That is, the sub-block motion vector derivation unit 1052 that adds a control function sets S to 4 (step S1005).

As in the sub-block motion vector derivation unit 5052 in fig. 24, the sub-block motion vector derivation unit 1052 that adds a control function calculates, for each S × S sub-block, the motion vector at the center position of the sub-block from the expression for the motion vector at each position in the block to be processed (equations (1) and (2)), and sets the calculated motion vector as the sub-block motion vector (step S1002).
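The flow of fig. 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: equations (1) and (2) are not reproduced in this section, so the 4-parameter affine model commonly used for block-based affine prediction (v_TL at (0, 0), v_TR at (w, 0)) is assumed; "larger than the predetermined size" is interpreted as having more samples than a 3840 × 2160 picture; and the sub-block center is taken as (sx + S/2, sy + S/2). All function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[float, float]  # (horizontal, vertical) displacement in pixels

# Hypothetical threshold: "larger than the predetermined size" is read as
# "more samples than a 3840 x 2160 picture".
PREDETERMINED_AREA = 3840 * 2160


def affine_mv(v_tl: MV, v_tr: MV, w: int, x: float, y: float) -> MV:
    """MV at position (x, y), assuming the 4-parameter affine model.

    v_tl is the MV of the upper-left control point (0, 0) and v_tr the MV of
    the upper-right control point (w, 0).
    """
    # a and b are the two affine parameters derived from the control points.
    a = (v_tr[0] - v_tl[0]) / w
    b = (v_tr[1] - v_tl[1]) / w
    return (a * x - b * y + v_tl[0], b * x + a * y + v_tl[1])


def choose_subblock_size(pic_width: int, pic_height: int) -> int:
    """Steps S1003 to S1005: S = 8 for large pictures, otherwise S = 4."""
    return 8 if pic_width * pic_height > PREDETERMINED_AREA else 4


def derive_subblock_mvs(v_tl: MV, v_tr: MV, w: int, h: int,
                        pic_width: int, pic_height: int) -> List[List[MV]]:
    """Step S1002: one MV per S x S sub-block, evaluated at the sub-block center."""
    s = choose_subblock_size(pic_width, pic_height)
    return [[affine_mv(v_tl, v_tr, w, sx + s / 2, sy + s / 2)
             for sx in range(0, w, s)]
            for sy in range(0, h, s)]


# A 16 x 16 block yields a 4 x 4 MV field in an HD picture and 2 x 2 in an 8K picture.
hd = derive_subblock_mvs((0.0, 0.0), (2.0, 1.0), 16, 16, 1920, 1080)
uhd = derive_subblock_mvs((0.0, 0.0), (2.0, 1.0), 16, 16, 7680, 4320)
print(len(hd), len(hd[0]), len(uhd), len(uhd[0]))  # 4 4 2 2
```

Evaluating the assumed model at (w, 0) reproduces v_TR, and equal control-point vectors give a purely translational field, which is a quick way to sanity-check whichever form equations (1) and (2) actually take.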

As described above, the predictor 105 generates a prediction signal of the input image signal for each CU based on the determined motion vector and the like.

In the case where the image size is larger than the predetermined size, the number of motion vectors of the block-based affine transform motion compensation prediction for the block to be processed in the video encoding apparatus according to this exemplary embodiment is smaller than in the conventional video encoding apparatus, as can be seen by comparing the number of L0-direction sub-block motion vectors in (C) in fig. 24 with that in (C) in fig. 7. In the example in fig. 7, the number of motion vectors is reduced to 1/4, because each 8 × 8 sub-block covers the area of four 4 × 4 sub-blocks. When the size of the image to be encoded is larger than the predetermined size, the video encoding apparatus according to this exemplary embodiment can thus reduce the amount of memory access related to the reference picture compared with a video encoding apparatus using a conventional block-based affine transform motion compensation prediction controller.
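The 1/4 figure can be checked by counting sub-block motion vectors directly. A minimal sketch, assuming (as in figs. 7 and 24) one motion vector per sub-block and per prediction direction, and block dimensions that are multiples of 8; count_subblock_mvs is a hypothetical helper.

```python
from fractions import Fraction


def count_subblock_mvs(w: int, h: int, s: int, num_directions: int = 1) -> int:
    """Number of sub-block MVs in a w x h block split into S x S sub-blocks."""
    return (w // s) * (h // s) * num_directions


for w, h in [(16, 16), (32, 16), (64, 64)]:
    n4 = count_subblock_mvs(w, h, 4)
    n8 = count_subblock_mvs(w, h, 8)
    print(f"{w}x{h}: {n4} -> {n8} MVs (ratio {Fraction(n8, n4)})")
```

The ratio is (4/8)² = 1/4 regardless of the block size, since doubling S halves the number of sub-blocks along each axis.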

Exemplary embodiment 2

The structure and operation of a video decoding apparatus that receives a bitstream as input from a video encoding apparatus or the like and outputs decoded video frames will be described below with reference to fig. 9. The video decoding apparatus according to this exemplary embodiment corresponds to the video encoding apparatus according to exemplary embodiment 1. That is, the video decoding apparatus according to this exemplary embodiment performs control for memory access amount reduction by a method common to the video encoding apparatus according to exemplary embodiment 1.

The video decoding apparatus according to the exemplary embodiment includes a demultiplexer 201, an entropy decoder 202, an inverse quantizer/inverse transformer 203, a predictor 204, and a buffer 205.

The demultiplexer 201 demultiplexes the input bitstream to extract an entropy-encoded video bitstream.

The entropy decoder 202 entropy-decodes the video bitstream. Specifically, it entropy-decodes the coding parameters and the transformed quantized values and supplies them to the inverse quantizer/inverse transformer 203 and the predictor 204.

The entropy decoder 202 also supplies cu_split_flag, pred_mode_flag, inter_affine_flag, the intra prediction direction, and the motion vector to the predictor 204.

The inverse quantizer/inverse transformer 203 inversely quantizes the transformed quantized value. The inverse quantizer/inverse transformer 203 further performs inverse frequency transform on the frequency transform coefficient obtained by the inverse quantization.

After the inverse frequency transform, the predictor 204 generates a prediction signal from the reconstructed image stored in the buffer 205, based on the entropy-decoded cu_split_flag, pred_mode_flag, inter_affine_flag, intra prediction direction, and motion vector. The prediction signal is generated by the aforementioned intra prediction or inter prediction.

Inter prediction is normal motion compensation prediction when inter_affine_flag is 0, and block-based affine transform motion compensation prediction otherwise (i.e., when inter_affine_flag is 1).

The predictor 204 includes a block-based affine transform motion compensation prediction controller 2040. As in the block-based affine transform motion compensation prediction controller 1050 in the video encoding apparatus according to exemplary embodiment 1, the block-based affine transform motion compensation prediction controller 2040 sets a motion vector at each control point and then determines the sub-block size depending on whether the image size is larger than the predetermined size. The block-based affine transform motion compensation prediction controller 2040 then calculates, for each sub-block, the motion vector at the center position of the sub-block from the expression for the motion vector at each position in the block to be processed, and sets the calculated motion vector as the sub-block motion vector. Specifically, the block-based affine transform motion compensation prediction controller 2040 includes blocks that operate in the same manner as the control point motion vector setting unit 1051 and the sub-block motion vector derivation unit 1052 that adds a control function.

After the prediction signal is generated, the prediction signal supplied from the predictor 204 is added to a reconstructed prediction error image obtained by the inverse frequency transform by the inverse quantizer/inverse transformer 203, and the result is supplied to the buffer 205 as a reconstructed image.

The reconstructed image stored in the buffer 205 is then output as a decoded image (decoded video).

In the case where the image size is larger than the predetermined size, the number of motion vectors of the block-based affine transform motion compensation prediction for the block to be processed in the video decoding apparatus according to this exemplary embodiment is smaller than in the conventional video decoding apparatus, as can be seen by comparing the number of L0-direction sub-block motion vectors in (C) in fig. 24 with that in (C) in fig. 7. In the example in fig. 7, the number of motion vectors is reduced to 1/4, because each 8 × 8 sub-block covers the area of four 4 × 4 sub-blocks. When the size of the image to be decoded is larger than the predetermined size, the video decoding apparatus according to this exemplary embodiment can thus reduce the amount of memory access related to the reference picture compared with a video decoding apparatus using a conventional block-based affine transform motion compensation prediction controller.

Exemplary embodiment 3

In the video encoding apparatus according to exemplary embodiment 1 and the video decoding apparatus according to exemplary embodiment 2, in the case where it is determined that the amount of memory access related to the reference picture is large, the block-based affine transform motion compensation prediction controllers 1050 and 2040 increase the sub-block size to reduce the amount of memory access.

Instead of increasing the sub-block size, the amount of memory access can also be reduced by changing the sub-block motion vector to an integer vector (i.e., changing the pixel position specified by the motion vector to an integer position), as depicted in fig. 10. Because the pixel positions become integer positions, the fractional-pixel interpolation process is omitted, and the amount of memory access is reduced by the amount that the interpolation process would require.
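The saving can be estimated with a simple model of the reference samples read per sub-block. This sketch assumes an 8-tap interpolation filter (the tap count is not given in this section) and ignores cache and memory-burst effects; reference_samples is a hypothetical helper.

```python
def reference_samples(s: int, frac_x: bool, frac_y: bool, taps: int = 8) -> int:
    """Reference samples read for one S x S sub-block in one prediction direction.

    A fractional MV component needs (taps - 1) extra samples along that axis for
    interpolation; an integer component reads exactly S samples along that axis.
    """
    width = s + (taps - 1 if frac_x else 0)
    height = s + (taps - 1 if frac_y else 0)
    return width * height


for s in (4, 8):
    frac = reference_samples(s, True, True)
    integer = reference_samples(s, False, False)
    print(f"S={s}: fractional {frac} ({frac / s**2:.2f}/pixel), integer {integer} (1.00/pixel)")
```

Under this model both measures in the text lower the per-pixel read cost: a larger S spreads the (taps - 1) margin over more pixels, and an integer motion vector removes the margin altogether.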

Fig. 10 is an explanatory diagram depicting a state in which, in the video encoding apparatus and the corresponding video decoding apparatus according to exemplary embodiment 3, a unidirectional motion vector is set at each control point (circles in (B) in fig. 10) of the block to be processed depicted in fig. 23 (see (A) in fig. 10), and the motion vector of each sub-block is derived as the motion vector field of the block to be processed (see (C) in fig. 10).

The video encoding apparatus and the corresponding video decoding apparatus according to exemplary embodiment 3 may have the same general structures as those depicted in fig. 5 and 9.

The operation of the block-based affine transform motion compensation prediction controller 1050 in the video encoding apparatus according to exemplary embodiment 3 will be described below with reference to the flowchart in fig. 11. The block-based affine transform motion compensation prediction controller 2040 in the video decoding apparatus operates in the same manner as the block-based affine transform motion compensation prediction controller 1050.

As in the control point motion vector setting unit 5051 in fig. 24, the control point motion vector setting unit 1051 assigns an externally input motion vector to a control point of a block to be processed (step S1001). As in the sub-block motion vector derivation unit 5052 in fig. 24, the sub-block motion vector derivation unit 1052 that adds a control function calculates a motion vector at the center position of the sub-block for each sub-block, and sets the calculated motion vector as a sub-block motion vector (step S1002). The motion vector is a vector of fractional precision.

The sub-block motion vector derivation unit 1052 which adds a control function then determines whether the image size is larger than a predetermined size (step S1003). In the case where the image size is not larger than the predetermined size, the process ends. In this case, the motion vector v is kept as a vector of fractional precision.

In the case where the image size is larger than the predetermined size, the sub-block motion vector derivation unit 1052 which adds a control function rounds the motion vector v of each sub-block to a vector of integer precision (step S2001).

The motion vector v_INT of integer precision is expressed by the following formula.

v_INT(x) = floor(v(x), prec)

v_INT(y) = floor(v(y), prec)   (3)

In this formula, floor(a, b) is a function that returns, among the multiples of b, the multiple closest to the variable a. "prec" denotes the pixel precision of the motion vector; for example, when the motion vector precision is 1/16 pixel, prec is 16.
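Formula (3) can be sketched as follows, assuming motion vectors stored as integers in 1/prec-pixel units. floor_mv follows the definition above (the multiple of b closest to a); the tie-breaking rule is not stated, so rounding half upward is an assumption, and if floor is meant in the strict sense (the largest multiple of b not exceeding a), its body would become (a // b) * b. The names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]  # components in 1/prec-pixel units


def floor_mv(a: int, b: int) -> int:
    """Multiple of b closest to a; ties are rounded toward +infinity (assumption)."""
    return ((a + b // 2) // b) * b


def to_integer_mv(v: MV, prec: int = 16, apply_integer: bool = True) -> MV:
    """Formula (3): v_INT = (floor(v(x), prec), floor(v(y), prec)).

    apply_integer models the outcome of step S1003; when it is False the
    sub-block MV keeps its fractional precision.
    """
    if not apply_integer:
        return v
    return (floor_mv(v[0], prec), floor_mv(v[1], prec))


# (37, -9) in 1/16-pel units is about (2.31, -0.56) pixels.
print(to_integer_mv((37, -9)))  # (32, -16), i.e. (2, -1) whole pixels
```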

The predictor 105 (in the video decoding apparatus, the predictor 204) generates a prediction signal of the input image signal for each CU based on the determined motion vector or the like.

Exemplary embodiment 4

The amount of memory access can also be reduced by forcing the motion vector of the block to be processed in bi-directional prediction to be unidirectional instead of increasing the sub-block size.

Fig. 12 is an explanatory diagram depicting an example of a positional relationship among a reference picture, a picture to be processed, and a block to be processed in bidirectional prediction.

Fig. 13 is an explanatory diagram for comparison between typical block-based affine transform motion compensation prediction and exemplary embodiment 4. Specifically, fig. 13 depicts a state in which a typical block-based affine transform motion compensation prediction controller (including the control point motion vector setting unit 5051 and the sub-block motion vector derivation unit 5052 depicted in fig. 24) sets a motion vector of each direction at each control point (circles in (B) in fig. 13) of the block to be processed depicted in fig. 12 (see (A) in fig. 13), and derives the motion vector of each sub-block as the motion vector field of the block to be processed (see (C) in fig. 13).

Fig. 14 is an explanatory diagram depicting a state in which the block-based affine transform motion compensation prediction controller 1050 in the video encoding apparatus according to exemplary embodiment 4 sets a motion vector of each direction at each control point (circles in (B) in fig. 14) of the block to be processed depicted in fig. 12 (see (A) in fig. 14), and derives the motion vector of each sub-block as the motion vector field of the block to be processed (see (C) in fig. 14).

The video encoding apparatus and the corresponding video decoding apparatus according to exemplary embodiment 4 may have the same general structures as those depicted in fig. 5 and 9.

The operation of the block-based affine transform motion compensation prediction controller 1050 in the video encoding apparatus according to exemplary embodiment 4 will be described below with reference to the flowchart in fig. 15. The block-based affine transform motion compensation prediction controller 2040 in the video decoding apparatus operates in the same manner as the block-based affine transform motion compensation prediction controller 1050.

As in the control point motion vector setting unit 5051 in fig. 24, the control point motion vector setting unit 1051 assigns an externally input motion vector to a control point of a block to be processed (step S1001). As in the sub-block motion vector derivation unit 5052 in fig. 24, the sub-block motion vector derivation unit 1052 that adds a control function calculates a motion vector at the center position of the sub-block for each sub-block, and sets the calculated motion vector as a sub-block motion vector (step S1002).

In the case where the image size is larger than the predetermined size, the sub-block motion vector derivation unit 1052 which adds a control function disables the sub-block motion vector in the L1 direction to restrict the motion vector v of each sub-block to one direction (step S2002).
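Step S2002 can be sketched as below. A minimal sketch with hypothetical names: restrict stands for the outcome of the image-size check, and keep selects which list survives, since the text also allows disabling the L0 direction instead of L1.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]


def restrict_to_one_direction(mv_l0: Optional[MV], mv_l1: Optional[MV],
                              restrict: bool, keep: str = "L0"
                              ) -> Tuple[Optional[MV], Optional[MV]]:
    """Drop one direction of a bi-predicted sub-block MV pair.

    Unidirectional sub-blocks (one of the MVs is None) are returned unchanged,
    as is everything when the restriction is not applied.
    """
    if not restrict or mv_l0 is None or mv_l1 is None:
        return mv_l0, mv_l1
    if keep == "L0":
        return mv_l0, None  # L1 disabled
    return None, mv_l1  # L0 disabled
```

Because every bi-predicted sub-block loses one of its two motion vectors, the motion vector count of a bi-predicted block is halved, which is the 1/2 reduction between figs. 13 and 14.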

The predictor 105 (in the video decoding apparatus, the predictor 204) generates a prediction signal of the input image signal for each CU based on the determined motion vector or the like.

The sub-block motion vector derivation unit 1052 that adds a control function may disable the sub-block motion vector in the L0 direction instead of the one in the L1 direction. In addition, the video encoding apparatus may multiplex a syntax element carrying information about the prediction direction to be disabled into the bitstream, and the video decoding apparatus may extract this syntax element from the bitstream and disable the motion vector in that prediction direction.

The number of motion vectors of the block-based affine transformation motion compensation prediction for the block to be processed in the video encoding apparatus and the video decoding apparatus according to this exemplary embodiment is less than the number of motion vectors of the block-based affine transformation motion compensation prediction in the conventional video encoding apparatus and the video decoding apparatus, as can be understood from the difference (specifically, 1/2) between the number of motion vectors of the sub-block in (C) in fig. 13 and the number of motion vectors of the sub-block in (C) in fig. 14. In the case where the size of an image subject to encoding is larger than a predetermined size, the video encoding apparatus and the video decoding apparatus according to the exemplary embodiment can thus reduce the amount of memory access related to a reference picture, as compared with the video encoding process and the video decoding process using the conventional block-based affine transformation motion compensation prediction controller.

As is clear from the above description, for unidirectionally predicted blocks, that is, all blocks of P pictures and the blocks of B pictures that do not use bidirectional prediction, the number of motion vectors of the block-based affine transform motion compensation prediction in this exemplary embodiment is the same as with typical block-based affine transform motion compensation prediction. The control in this exemplary embodiment can therefore be limited to blocks that use bidirectional prediction.

Exemplary embodiment 5

In the video encoding apparatus according to exemplary embodiment 1 and the video decoding apparatus according to exemplary embodiment 2, the block-based affine transform motion compensation prediction controllers 1050 and 2040 determine whether the amount of memory access related to the reference picture is large based on the image size, and in a case where it is determined that the amount of memory access related to the reference picture is large, increase the sub-block size to reduce the amount of memory access.

Instead of performing the determination based on the image size, the block-based affine transform motion compensation prediction controllers 1050 and 2040 may control the sub-block size S, used in common by both, based on a syntax element. That is, the multiplexer 106 in the video encoding apparatus may multiplex a log2_affine_subblock_size_minus2 syntax element indicating the sub-block size S into the bitstream, and the demultiplexer 201 in the video decoding apparatus may extract this syntax element from the bitstream and decode it to obtain the sub-block size S, which is then used by the predictor 204.

The relationship between the value of the log2_affine_subblock_size_minus2 syntax element and the sub-block size S is expressed by the following formula.

S = 1 << (log2_affine_subblock_size_minus2 + 2)   (4)

In this formula, << denotes a left bit-shift operation.
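Formula (4), together with the inverse mapping an encoder would need before multiplexing the syntax element, can be sketched as follows. The function names are hypothetical, and the non-negativity and power-of-two checks are assumptions; the permitted range of log2_affine_subblock_size_minus2 is not given here.

```python
def subblock_size_from_syntax(log2_affine_subblock_size_minus2: int) -> int:
    """Formula (4): S = 1 << (log2_affine_subblock_size_minus2 + 2)."""
    if log2_affine_subblock_size_minus2 < 0:
        raise ValueError("syntax value must be non-negative")
    return 1 << (log2_affine_subblock_size_minus2 + 2)


def syntax_from_subblock_size(s: int) -> int:
    """Inverse mapping for the encoder; S must be a power of two no smaller than 4."""
    if s < 4 or s & (s - 1):
        raise ValueError("S must be a power of two no smaller than 4")
    return s.bit_length() - 3  # log2(S) - 2


for value in range(3):
    print(value, subblock_size_from_syntax(value))
```

The "minus2" offset means the smallest codable sub-block is 4 × 4, the conventional size in fig. 24, so a value of 1 selects the 8 × 8 sub-blocks of exemplary embodiment 1.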

The operation of the block-based affine transform motion compensation prediction controller 1050 performing the above-described control in the video encoding apparatus according to exemplary embodiment 5 will be described below with reference to the flowchart in fig. 16. The block-based affine transform motion compensation prediction controller 2040 in the video decoding apparatus operates in the same manner as the block-based affine transform motion compensation prediction controller 1050.

The sub-block motion vector derivation unit 1052 that adds a control function determines the sub-block size S from the value of the log2_affine_subblock_size_minus2 syntax element using relational formula (4) (step S2003).

As in the sub-block motion vector derivation unit 5052 in fig. 24, the sub-block motion vector derivation unit 1052 that adds a control function calculates a motion vector at the center position of the sub-block for each sub-block, and sets the calculated motion vector as the sub-block motion vector (step S1002). In this exemplary embodiment, the sub-block motion vector derivation unit 1052 that adds a control function calculates a sub-block motion vector for each sub-block of the sub-block size S determined in step S2003.

The predictor 105 (in the video decoding apparatus, the predictor 204) generates a prediction signal of the input image signal for each CU based on the determined motion vector or the like.

The video encoding apparatus and the corresponding video decoding apparatus according to exemplary embodiment 5 may have the same general structures as those depicted in fig. 5 and 9.

In this exemplary embodiment, the image size determination process is unnecessary, so that the structures of the block-based affine transformation motion compensation prediction controllers 1050 and 2040 can be simplified.

Exemplary embodiment 6

In the video encoding apparatus and the video decoding apparatus according to exemplary embodiment 3, the block-based affine transform motion compensation prediction controllers 1050 and 2040 determine whether the amount of memory access related to the reference picture is large based on the image size, and in the case where it is determined that the amount of memory access related to the reference picture is large, change the sub-block motion vector to an integer vector to reduce the amount of memory access.

Alternatively, the block-based affine transform motion compensation prediction controllers 1050 and 2040 may determine whether to change the sub-block motion vectors to integer vectors based on a syntax indicating whether to change the motion vectors to integer vectors.

That is, the multiplexer 106 in the video encoding apparatus may multiplex an enable_affine_sub_integer_mv_flag syntax element indicating whether integer precision is applied (i.e., whether integer precision is enabled) into the bitstream, and the demultiplexer 201 in the video decoding apparatus may extract this syntax element from the bitstream and decode it to obtain the information, which is then used by the predictor 204.

In the case where the enable_affine_sub_integer_mv_flag value is 1, integer precision is applied (integer precision is enabled). Otherwise (i.e., when the enable_affine_sub_integer_mv_flag value is 0), integer precision is not applied (integer precision is disabled).

The operation of the block-based affine transform motion compensation prediction controller 1050 performing the above-described control in the video encoding apparatus according to exemplary embodiment 6 will be described below with reference to the flowchart in fig. 17. The block-based affine transform motion compensation prediction controller 2040 in the video decoding apparatus operates in the same manner as the block-based affine transform motion compensation prediction controller 1050.

The sub-block motion vector derivation unit 1052 that adds a control function determines from enable_affine_sub_integer_mv_flag whether to make the sub-block motion vector an integer vector (i.e., whether integer precision is enabled) (step S3001). In the case where integer precision is not enabled, the process ends.

In the case where integer precision is enabled, the sub-block motion vector derivation unit 1052 that adds a control function rounds the motion vector v of each sub-block to a vector of integer precision (step S2001). The motion vector v of integer precision is expressed by the aforementioned formula (3).

The predictor 105 (in the video decoding apparatus, the predictor 204) generates a prediction signal of the input image signal for each CU based on the determined motion vector or the like.

The video encoding apparatus and the corresponding video decoding apparatus according to exemplary embodiment 6 may have the same general structures as those depicted in fig. 5 and 9.

Exemplary embodiment 7

In the video encoding device and the video decoding device according to exemplary embodiment 4, the block-based affine transform motion compensation prediction controllers 1050 and 2040 determine whether the amount of memory access relating to a reference picture is large based on the image size, and in the case where it is determined that the amount of memory access relating to a reference picture is large, the motion vector of the block to be processed in bidirectional prediction is forcibly set to a unidirectional motion vector to reduce the amount of memory access.

Alternatively, the block-based affine transform motion compensation prediction controllers 1050 and 2040 may determine whether to force the motion vector of the block to be processed in bidirectional prediction to be a unidirectional motion vector based on a syntax element indicating whether to make the motion vector unidirectional.

That is, the multiplexer 106 in the video encoding apparatus may multiplex a disable_affine_subblock_bipred_mv_flag syntax element indicating whether to force the motion vector to be unidirectional (i.e., whether the change to unidirectional prediction is enabled) into the bitstream, and the demultiplexer 201 in the video decoding apparatus may extract this syntax element from the bitstream and decode it to obtain the information, which is then used by the predictor 204.

In the case where the disable_affine_subblock_bipred_mv_flag value is 1, the forced change to unidirectional prediction is not performed (the change to unidirectional prediction is disabled). Otherwise (i.e., when the disable_affine_subblock_bipred_mv_flag value is 0), the forced change to unidirectional prediction is performed (the change to unidirectional prediction is enabled).

The operation of the block-based affine transform motion compensation prediction controller 1050 performing the above-described control in the video encoding apparatus according to exemplary embodiment 7 will be described below with reference to the flowchart in fig. 18. The block-based affine transform motion compensation prediction controller 2040 in the video decoding apparatus operates in the same manner as the block-based affine transform motion compensation prediction controller 1050.

The sub-block motion vector derivation unit 1052 that adds a control function determines from disable_affine_subblock_bipred_mv_flag whether to make the sub-block motion vector unidirectional (i.e., whether the change to unidirectional prediction is enabled) (step S4001). In the case where the change to unidirectional prediction is not enabled, the process ends.

In the case where the change to unidirectional prediction is enabled, the sub-block motion vector derivation unit 1052 that adds a control function disables the sub-block motion vector in the L1 direction to restrict the motion vector v of each sub-block to one direction (step S2002).
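The flow of fig. 18 can be sketched as follows, with hypothetical names and following the flag semantics stated above: a value of 0 enables the forced change to unidirectional prediction, and a value of 1 leaves the sub-block motion vectors as derived. As in exemplary embodiment 4, the L1 direction is the one dropped.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]


def apply_bipred_control(mv_l0: Optional[MV], mv_l1: Optional[MV],
                         disable_affine_subblock_bipred_mv_flag: int
                         ) -> Tuple[Optional[MV], Optional[MV]]:
    """Step S4001, then the L1-disabling step when the change is enabled."""
    if disable_affine_subblock_bipred_mv_flag == 1:
        return mv_l0, mv_l1  # change to unidirectional prediction is disabled
    if mv_l0 is not None and mv_l1 is not None:
        return mv_l0, None  # bi-predicted sub-block: L1 MV disabled
    return mv_l0, mv_l1  # already unidirectional
```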

The predictor 105 (in the video decoding apparatus, the predictor 204) generates a prediction signal of the input image signal for each CU based on the determined motion vector or the like.

The video encoding apparatus and the corresponding video decoding apparatus according to exemplary embodiment 7 may have the same general structures as those depicted in fig. 5 and 9.

As in exemplary embodiment 4, the sub-block motion vector derivation unit 1052 that adds a control function may disable the sub-block motion vector in the L0 direction instead of disabling the sub-block motion vector in the L1 direction. In addition, the video encoding apparatus may multiplex a syntax of information about a prediction direction to be disabled into the bitstream, and the video decoding apparatus may extract the syntax of the information from the bitstream and disable the motion vector in the prediction direction.

As described above, in the block-based affine transform motion compensation prediction in each of the foregoing exemplary embodiments, the sub-block motion vector derivation unit that adds a control function determines whether the amount of memory access related to the reference picture is large, and in the case where it is determined that the amount of memory access is large, derives the sub-block motion vector so as to reduce the amount of memory access related to the reference picture.

Whether the amount of memory access related to the reference picture is large is determined using the image size, the prediction direction (the prediction direction of the motion compensated prediction for the block to be processed), or the difference between the motion vectors of the control points in the block to be processed.

Further, at least one of the limitation of the number of motion vectors and the reduction of the motion vector precision is used to reduce the amount of memory access related to the reference picture, as described below.

Limitation of the number of motion vectors: increasing the sub-block size, setting the prediction direction to unidirectional, or a combination thereof.

Reduction of the motion vector precision: rounding the motion vector of each sub-block to a motion vector of integer precision.

The foregoing exemplary embodiments may be used singly, or two or more exemplary embodiments may be combined where appropriate.

Specifically, although in the video encoding device and the video decoding device according to each of the foregoing exemplary embodiments, the determination whether the amount of memory access is large is performed using the image size, the prediction direction of the block to be processed, or the difference between the motion vectors of the control points in the block to be processed, any combination of these three elements may be used in the determination.

Although in the video encoding apparatus and the video decoding apparatus according to each of the foregoing exemplary embodiments, the reduction in the amount of memory access is performed by increasing the sub-block size, making the sub-block motion vector an integer vector, or restricting the sub-block motion vector to one direction, any combination of these three methods may be used.
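The combinations described in the two preceding paragraphs can be sketched as a configurable controller. Everything here is illustrative: the configuration fields, the thresholds, and the rule that memory access is judged large when any enabled criterion fires are assumptions, not values or rules taken from this description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AffineControlConfig:
    # Criteria for judging that reference-picture memory access is large.
    use_image_size: bool = True
    use_prediction_direction: bool = False
    use_cp_mv_difference: bool = False
    # Reduction methods applied when memory access is judged large.
    enlarge_subblock: bool = True
    integer_mv: bool = False
    force_unidirectional: bool = False
    # Hypothetical thresholds.
    max_area: int = 3840 * 2160  # samples per picture
    max_cp_mv_diff: int = 64  # control-point MV difference in 1/16-pel units


def memory_access_is_large(cfg: AffineControlConfig, pic_w: int, pic_h: int,
                           is_bipred: bool, cp_mv_diff: int) -> bool:
    """True when any enabled criterion indicates a large memory access."""
    checks = []
    if cfg.use_image_size:
        checks.append(pic_w * pic_h > cfg.max_area)
    if cfg.use_prediction_direction:
        checks.append(is_bipred)
    if cfg.use_cp_mv_difference:
        checks.append(cp_mv_diff > cfg.max_cp_mv_diff)
    return any(checks)


def control_settings(cfg: AffineControlConfig, pic_w: int, pic_h: int,
                     is_bipred: bool, cp_mv_diff: int) -> Tuple[int, bool, bool]:
    """Return (sub-block size S, round MVs to integer?, restrict to one direction?)."""
    large = memory_access_is_large(cfg, pic_w, pic_h, is_bipred, cp_mv_diff)
    s = 8 if large and cfg.enlarge_subblock else 4
    return s, large and cfg.integer_mv, large and cfg.force_unidirectional and is_bipred


print(control_settings(AffineControlConfig(), 7680, 4320, False, 0))  # (8, False, False)
```

Keeping the criteria and the reduction methods as independent switches mirrors the statement that any combination of either may be used.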

Exemplary embodiment 8

Fig. 19 is a block diagram depicting an example of the structure of a video system. The video encoding apparatus 100 in the video system 400 is a video encoding apparatus according to any one of the foregoing exemplary embodiments, or a video encoding apparatus combining two or more of them. The video decoding apparatus 200 in the video system 400 is a video decoding apparatus according to any one of the foregoing exemplary embodiments, or a video decoding apparatus combining two or more of them. The video encoding device 100 and the video decoding device 200 are communicatively connected via a transmission path 300 (a wireless or wired transmission path).

In the exemplary embodiment, the video encoding apparatus 100 and the video decoding apparatus 200 reduce the amount of memory access by a common method. This ensures high interconnectivity between the video encoding device 100 and the video decoding device 200.

For example, in the case where the video encoding apparatus 100 and the video decoding apparatus 200 are configured according to the foregoing exemplary embodiment 5, values of the log2_affine_subblock_size_minus2 syntax element corresponding to each image size are specified, as shown in Table 1. The video system 400 then sets the specified value corresponding to the image size in the video encoding apparatus 100, thereby ensuring interconnectivity between the video encoding apparatus 100 and the video decoding apparatus 200 and making service and operation more efficient.

[ Table 1]

For example, in the case where the video encoding apparatus 100 and the video decoding apparatus 200 are configured according to the foregoing exemplary embodiment 6, a value of the enable_affine_sub_integer_mv_flag syntax element corresponding to each image size is specified, as shown in Table 2. The video system 400 then sets the specified value corresponding to the image size in the video encoding apparatus 100, thereby ensuring interconnectivity between the video encoding apparatus 100 and the video decoding apparatus 200 and making service and operation more efficient.

[ Table 2]

For example, in the case where the video encoding apparatus 100 and the video decoding apparatus 200 are configured according to the foregoing exemplary embodiment 7, the value of disable_affine_subblock_bipred_mv_flag corresponding to each image size is specified, as shown in Table 3. The video system 400 then sets the specified value corresponding to the image size in the video encoding apparatus 100, thereby ensuring interconnectivity between the video encoding apparatus 100 and the video decoding apparatus 200 and making service and operation more efficient.

[ Table 3]

Each of the foregoing exemplary embodiments may be implemented by hardware or a computer program.

The information processing system depicted in fig. 20 includes a processor 1001, a program memory 1002, a storage medium 1003 for storing video data, and a storage medium 1004 for storing a bitstream. The storage medium 1003 and the storage medium 1004 may be separate storage media or storage areas included in the same storage medium. A magnetic storage medium such as a hard disk may be used as the storage medium.

In the information processing system depicted in fig. 20, a program for realizing the functions of the blocks depicted in fig. 5 (except for the buffer blocks) or the blocks depicted in fig. 9 (except for the buffer blocks) is stored in the program memory 1002. The processor 1001 realizes the functions of the video encoding apparatus or the video decoding apparatus according to the foregoing exemplary embodiments by executing a process according to a program stored in the program memory 1002.

In the video system 400 depicted in fig. 19, each of the video encoding apparatus 100 and the video decoding apparatus 200 may be implemented by the information processing system depicted in fig. 20.

Fig. 21 is a block diagram depicting the main components of the video encoding apparatus. As depicted in fig. 21, the video encoding apparatus 10 includes a block-based affine transform motion compensation prediction control unit 11 (corresponding to the block-based affine transform motion compensation prediction controller 1050 in the exemplary embodiment) for controlling at least one of a block size, a prediction direction, and a motion vector precision of sub-blocks in a block subjected to block-based affine transform motion compensation prediction, using encoding parameters supplied from the outside.

The term "external" means external to the block-based affine transform motion compensation prediction control unit 11. Examples of the encoding parameters supplied from the outside include a picture size set outside the block-based affine transform motion compensation prediction control unit 11, a prediction direction determined by a prediction unit (e.g., the predictor 105 in fig. 5), and a difference between motion vectors determined by the prediction unit (in particular, a difference between the motion vectors of the control points in a block).
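To make the role of these external parameters concrete, the following minimal sketch maps them to the three controls. The decision rules, thresholds, and names are all hypothetical; the source says only that these parameters may be used, not how. In this sketch, a large picture enlarges the sub-blocks, bi-prediction on a large picture is restricted to one direction, and a large control-point motion vector difference triggers integer rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalParams:
    pic_width: int
    pic_height: int
    is_bi_pred: bool     # prediction direction chosen by the predictor
    max_cpmv_diff: int   # largest control-point MV difference (units assumed 1/16 sample)


@dataclass(frozen=True)
class AffineControl:
    enlarge_sub_block: bool
    restrict_to_uni: bool
    round_to_integer: bool


# HYPOTHETICAL thresholds, not taken from the source.
LARGE_PICTURE_SAMPLES = 3840 * 2160
LARGE_CPMV_DIFF = 64


def decide_control(p: ExternalParams) -> AffineControl:
    """Hypothetical encoder-side policy for the three controls."""
    large_picture = p.pic_width * p.pic_height >= LARGE_PICTURE_SAMPLES
    return AffineControl(
        enlarge_sub_block=large_picture,
        restrict_to_uni=large_picture and p.is_bi_pred,
        round_to_integer=p.max_cpmv_diff > LARGE_CPMV_DIFF,
    )


if __name__ == "__main__":
    print(decide_control(ExternalParams(1920, 1080, True, 10)))
    print(decide_control(ExternalParams(3840, 2160, True, 100)))
```

Returning a small immutable record keeps the decision separate from the sub-block derivation itself, so the same policy can be reused wherever the controls are applied.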

Fig. 22 is a block diagram depicting the main components of the video decoding apparatus. As depicted in fig. 22, the video decoding apparatus 20 includes a block-based affine transform motion compensation prediction control unit 21 (corresponding to the block-based affine transform motion compensation prediction controller 2040 in the exemplary embodiment) for controlling at least one of the block size, the prediction direction, and the motion vector precision of sub-blocks in a block subjected to block-based affine transform motion compensation prediction using at least encoding parameters extracted from a bitstream.

Examples of the encoding parameters used on the decoding side for block-based affine transform motion compensation prediction include the picture size, the prediction direction determined by the prediction unit of the encoder (e.g., the predictor 105 in fig. 5), and the difference between motion vectors determined by that prediction unit (in particular, the difference between the motion vectors of the control points in a block), all of which are included in the bitstream.
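On the decoding side, the control must be derived only from values carried in the bitstream, such as the decoded control-point motion vectors. The following minimal sketch computes a control-point motion vector difference and uses it to choose the sub-block size. The source does not define the difference metric, so the metric used here is an assumption: the largest absolute component difference relative to the first control point. The threshold, the two sizes, and the function names are also hypothetical.

```python
from typing import Sequence, Tuple

MV = Tuple[int, int]


def max_cpmv_difference(cpmvs: Sequence[MV]) -> int:
    """Largest absolute component difference between control point 0 and the others.

    ASSUMPTION: this particular metric is illustrative; the source only
    refers to "a difference between motion vectors of control points".
    """
    v0x, v0y = cpmvs[0]
    return max((max(abs(vx - v0x), abs(vy - v0y)) for vx, vy in cpmvs[1:]),
               default=0)


def decoder_sub_block_size(cpmvs: Sequence[MV], threshold: int = 64,
                           small: int = 4, large: int = 8) -> int:
    """Hypothetical rule: use larger sub-blocks when the CPMVs diverge strongly."""
    return large if max_cpmv_difference(cpmvs) > threshold else small


if __name__ == "__main__":
    print(decoder_sub_block_size([(0, 0), (16, -8)]))
    print(decoder_sub_block_size([(0, 0), (100, 0), (0, -70)]))
```

The decoder's threshold and metric must match the encoder's exactly. Otherwise, the two sides would derive different sub-block grids and the reconstruction would drift.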

All or part of the foregoing exemplary embodiments may also be described as in the following supplementary notes, but the present invention is not limited to the following configurations.

(Supplementary note 1) A video encoding apparatus that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video encoding apparatus comprising: block-based affine transform motion compensation prediction control means for controlling, using encoding parameters supplied from the outside, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 2) The video encoding apparatus according to supplementary note 1, wherein the block-based affine transform motion compensation prediction control means increases the block size of the sub-block when controlling the block size of the sub-block, restricts the prediction direction to one direction when controlling the prediction direction, and rounds the motion vector of the sub-block to an integer-precision motion vector when controlling the motion vector precision.

(Supplementary note 3) A video decoding apparatus that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video decoding apparatus comprising: block-based affine transform motion compensation prediction control means for controlling, using at least encoding parameters extracted from a bitstream, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 4) The video decoding apparatus according to supplementary note 3, wherein the block-based affine transform motion compensation prediction control means increases the block size of the sub-block when controlling the block size of the sub-block, restricts the prediction direction to one direction when controlling the prediction direction, and rounds the motion vector of the sub-block to an integer-precision motion vector when controlling the motion vector precision.

(Supplementary note 5) A video encoding method of performing video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video encoding method comprising: controlling, using supplied encoding parameters, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 6) The video encoding method according to supplementary note 5, wherein: when the block size of the sub-block is controlled, the block size of the sub-block is increased; when the prediction direction is controlled, the prediction direction is restricted to one direction; and when the motion vector precision is controlled, the motion vector of the sub-block is rounded to an integer-precision motion vector.

(Supplementary note 7) A video decoding method of performing video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video decoding method comprising: controlling, using at least encoding parameters extracted from a bitstream, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 8) The video decoding method according to supplementary note 7, wherein: when the block size of the sub-block is controlled, the block size of the sub-block is increased; when the prediction direction is controlled, the prediction direction is restricted to one direction; and when the motion vector precision is controlled, the motion vector of the sub-block is rounded to an integer-precision motion vector.

(Supplementary note 9) A video encoding program executed in a video encoding apparatus that performs video encoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video encoding program causing a computer to control, using supplied encoding parameters, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 10) The video encoding program according to supplementary note 9, causing the computer to execute processes of: increasing the block size of the sub-block when controlling the block size of the sub-block; restricting the prediction direction to one direction when controlling the prediction direction; and rounding the motion vector of the sub-block to an integer-precision motion vector when controlling the motion vector precision.

(Supplementary note 11) A video decoding program executed in a video decoding apparatus that performs video decoding using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video decoding program causing a computer to control, using at least encoding parameters extracted from a bitstream, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 12) The video decoding program according to supplementary note 11, causing the computer to execute processes of: increasing the block size of the sub-block when controlling the block size of the sub-block; restricting the prediction direction to one direction when controlling the prediction direction; and rounding the motion vector of the sub-block to an integer-precision motion vector when controlling the motion vector precision.

(Supplementary note 13) A video system using a block-based affine transform motion compensation prediction technique including a process of calculating a motion vector of each sub-block using motion vectors of control points in the block, the video system comprising: a video encoding apparatus that performs video encoding using the block-based affine transform motion compensation prediction technique; and a video decoding apparatus that performs video decoding using the block-based affine transform motion compensation prediction technique, wherein the video encoding apparatus includes encoding-side block-based affine transform motion compensation prediction control means for controlling, using encoding parameters supplied in the video system, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction, and wherein the video decoding apparatus includes decoding-side block-based affine transform motion compensation prediction control means for controlling, using at least encoding parameters extracted from a bitstream received from the video encoding apparatus, at least one of a block size, a prediction direction, and a motion vector precision of a sub-block in a block subjected to block-based affine transform motion compensation prediction.

(Supplementary note 14) The video system according to supplementary note 13, wherein each of the encoding-side and decoding-side block-based affine transform motion compensation prediction control means increases the block size of the sub-block when controlling the block size of the sub-block, restricts the prediction direction to one direction when controlling the prediction direction, and rounds the motion vector of the sub-block to an integer-precision motion vector when controlling the motion vector precision.

(Supplementary note 15) A video encoding program for implementing the video encoding method according to supplementary note 5 or 6.

(Supplementary note 16) A video decoding program for implementing the video decoding method according to supplementary note 7 or 8.
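Every supplementary note above rests on the same step: calculating the motion vector of each sub-block from the motion vectors of the control points. This section does not restate the model, so the following minimal sketch assumes the widely used two-control-point (4-parameter) affine model of the JVET Joint Exploration Model. The model is evaluated at each sub-block center using floating-point arithmetic rather than the fixed-point shifts a real codec would use. The function name and parameters are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def affine_sub_block_mvs(v0: MV, v1: MV, block_w: int, block_h: int,
                         sb: int) -> List[List[MV]]:
    """Derive one MV per sub-block from two control-point MVs (4-parameter model).

    v0 is the MV of the top-left control point, v1 that of the top-right one.
    Each sub-block MV is evaluated at the sub-block center (x, y), measured
    from the top-left corner of the block:
        vx = a*x - b*y + v0x,  vy = b*x + a*y + v0y
    with a = (v1x - v0x) / w and b = (v1y - v0y) / w.
    """
    a = (v1[0] - v0[0]) / block_w  # zoom term
    b = (v1[1] - v0[1]) / block_w  # rotation term
    rows: List[List[MV]] = []
    for y0 in range(0, block_h, sb):
        row: List[MV] = []
        for x0 in range(0, block_w, sb):
            x = x0 + sb / 2
            y = y0 + sb / 2
            row.append((a * x - b * y + v0[0], b * x + a * y + v0[1]))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Pure zoom: the MV grows with distance from the top-left corner.
    for row in affine_sub_block_mvs((0, 0), (16, 0), 16, 16, 8):
        print(row)
```

The sub-block size `sb` is the parameter that the block-size control acts on. Doubling it from 4 to 8 cuts the number of derived motion vectors for a 16x16 block from 16 to 4, and the prediction-direction and precision controls then operate on the vectors this function returns.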

The present application claims priority based on Japanese Patent Application No. 2017-193503, filed on October 3, 2017, the entire disclosure of which is incorporated herein.

Although the present invention has been described with reference to the foregoing exemplary embodiments, the present invention is not limited to them. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the present invention.

List of reference numerals

10 video encoding apparatus

11 block-based affine transform motion compensation prediction control unit

20 video decoding apparatus

21 block-based affine transform motion compensation prediction control unit

100 video encoding apparatus

101 transformer/quantizer

102 entropy coder

103 inverse quantizer/inverse transformer

104 buffer

105 predictor

106 multiplexer

200 video decoding apparatus

201 demultiplexer

202 entropy decoder

203 inverse quantizer/inverse transformer

204 predictor

205 buffer

300 transmission path

400 video system

1001 processor

1002 program memory

1003 storage medium

1004 storage medium

1050 block-based affine transform motion compensation prediction controller

1051 control point motion vector setting unit

1052 sub-block motion vector derivation unit with added control function

2040 block-based affine transform motion compensation prediction controller
