Encoding device, decoding device, encoding method, and decoding method


Note: This technique, "Encoding device, decoding device, encoding method, and decoding method", was created by 远间正真, 西孝启, 安倍清史, and 加藤祐介 on 2020-03-25. Its main content is as follows: An encoding device (100) includes: a circuit; and a memory connected to the circuit. The circuit, in operation, encodes an image. In the encoding of the image, coefficient information of the image is binarized; whether to apply arithmetic coding to the binarized data sequence obtained by binarizing the coefficient information is controlled; and a bit sequence containing the binarized data sequence, to which arithmetic coding is or is not applied, is output. In the binarization of the coefficient information, when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied, the coefficient information is binarized according to a 1st syntax structure; when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied, the coefficient information is binarized according to a 2nd syntax structure different from the 1st syntax structure; and when arithmetic coding is not applied to the binarized data sequence, the coefficient information is binarized according to the 2nd syntax structure.

1. An encoding device,

comprising:

a circuit; and

a memory connected to the circuit;

wherein the circuit, in operation, encodes an image;

in the above-described encoding of the image,

binarizing the coefficient information of the image;

controlling whether to apply arithmetic coding to a binarized data sequence obtained by binarizing the coefficient information;

outputting a bit sequence including the binarized data sequence to which arithmetic coding is applied or to which arithmetic coding is not applied;

in the binarization of the above-described coefficient information,

when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied, binarizing the coefficient information according to a 1st syntax structure;

when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied, binarizing the coefficient information according to a 2nd syntax structure different from the 1st syntax structure; and

when arithmetic coding is not applied to the binarized data sequence, binarizing the coefficient information according to the 2nd syntax structure.

2. The encoding device according to claim 1, wherein

the predetermined condition is that the orthogonal transform process is skipped when the coefficient information is derived from the prediction residual of the image.

3. The encoding device according to claim 1, wherein

the predetermined condition is that, in a region of the image including the processing target block, the number of syntax elements encoded in a mode other than the bypass mode of CABAC (context-based adaptive binary arithmetic coding) is equal to or greater than a threshold value.

4. The encoding device according to any one of claims 1 to 3,

the bit sequence indicates, in a sequence parameter set, a picture parameter set, or a slice header, whether application of arithmetic coding is enabled.

5. The encoding device according to any one of claims 1 to 4,

the circuit collectively switches whether arithmetic coding is applied in units each including 1 or more slices or 1 or more pictures.

6. A decoding device,

comprising:

a circuit; and

a memory connected to the circuit;

wherein the circuit, in operation, decodes an image;

in the above-described decoding of the image,

acquiring a bit sequence including a binarized data sequence in which the coefficient information of the image is binarized;

controlling whether to apply arithmetic decoding to the binarized data sequence;

inversely binarizing the binarized data sequence to which arithmetic decoding is applied or not applied;

in the above-described inverse binarization of the binarized data sequence,

when arithmetic decoding is applied to the binarized data sequence and a predetermined condition is not satisfied, inverse binarizing the binarized data sequence according to a 1st syntax structure;

when arithmetic decoding is applied to the binarized data sequence and the predetermined condition is satisfied, inverse binarizing the binarized data sequence according to a 2nd syntax structure different from the 1st syntax structure; and

when arithmetic decoding is not applied to the binarized data sequence, inverse binarizing the binarized data sequence according to the 2nd syntax structure.

7. The decoding device according to claim 6, wherein

the predetermined condition is that the inverse orthogonal transform process is skipped when deriving the prediction residual of the image from the coefficient information.

8. The decoding device according to claim 6, wherein

the predetermined condition is that, in a region of the image including the processing target block, the number of syntax elements decoded in a mode other than the bypass mode of CABAC (context-based adaptive binary arithmetic coding) is equal to or greater than a threshold value.

9. The decoding apparatus according to any one of claims 6 to 8,

the bit sequence indicates, in a sequence parameter set, a picture parameter set, or a slice header, whether application of arithmetic decoding is enabled.

10. The decoding apparatus according to any one of claims 6 to 9,

the circuit collectively switches whether arithmetic decoding is applied in units each including 1 or more slices or 1 or more pictures.

11. An encoding method comprising:

encoding an image;

in the above-described encoding of the image,

binarizing the coefficient information of the image;

controlling whether to apply arithmetic coding to a binarized data sequence obtained by binarizing the coefficient information;

outputting a bit sequence including the binarized data sequence to which arithmetic coding is applied or to which arithmetic coding is not applied;

in the binarization of the above-described coefficient information,

when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied, binarizing the coefficient information according to a 1st syntax structure;

when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied, binarizing the coefficient information according to a 2nd syntax structure different from the 1st syntax structure; and

when arithmetic coding is not applied to the binarized data sequence, binarizing the coefficient information according to the 2nd syntax structure.

12. A decoding method comprising:

decoding an image;

in the above-described decoding of the image,

acquiring a bit sequence including a binarized data sequence in which the coefficient information of the image is binarized;

controlling whether to apply arithmetic decoding to the binarized data sequence;

inversely binarizing the binarized data sequence to which arithmetic decoding is applied or not applied;

in the above-described inverse binarization of the binarized data sequence,

when arithmetic decoding is applied to the binarized data sequence and a predetermined condition is not satisfied, inverse binarizing the binarized data sequence according to a 1st syntax structure;

when arithmetic decoding is applied to the binarized data sequence and the predetermined condition is satisfied, inverse binarizing the binarized data sequence according to a 2nd syntax structure different from the 1st syntax structure; and

when arithmetic decoding is not applied to the binarized data sequence, inverse binarizing the binarized data sequence according to the 2nd syntax structure.

Technical Field

The present invention relates to video encoding, and for example, to a system, a component, a method, and the like in encoding and decoding of a moving image.

Background

Video coding technology has progressed from H.261 and MPEG-1 to H.264/AVC (Advanced Video Coding), MPEG-LA, H.265/HEVC (High Efficiency Video Coding), and H.266/VVC (Versatile Video Coding). With this progress, there is a constant need to provide improvements and optimizations in video coding technology in order to handle the ever-increasing amount of digital video data in a wide variety of applications.

Non-patent document 1 is an example of a conventional standard relating to the above-described video encoding technique.

Documents of the prior art

Non-patent document

Non-patent document 1: H.265 (ISO/IEC 23008-2 HEVC)/HEVC (High Efficiency Video Coding)

Disclosure of Invention

Problems to be solved by the invention

With regard to the above-described encoding method, it is desirable to propose a new method for improving encoding efficiency, improving image quality, reducing processing amount, reducing circuit scale, and appropriately selecting elements and operations such as a filter, a block, a size, a motion vector, a reference picture, a reference block, and the like.

The present invention provides a configuration or a method that can contribute to 1 or more of, for example, improvement in coding efficiency, improvement in image quality, reduction in processing amount, reduction in circuit scale, improvement in processing speed, and appropriate selection of elements or operations. The present invention may include configurations and methods that can contribute to benefits other than those described above.

Means for solving the problems

For example, an encoding device according to an aspect of the present invention includes: a circuit; and a memory connected to the circuit; the circuit, in operation, encodes an image; in the encoding of the image, coefficient information of the image is binarized; whether to apply arithmetic coding to the binarized data sequence obtained by binarizing the coefficient information is controlled; a bit sequence including the binarized data sequence, to which arithmetic coding is applied or is not applied, is output; in the binarization of the coefficient information, the coefficient information is binarized according to a 1st syntax structure when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied; the coefficient information is binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied; and the coefficient information is binarized according to the 2nd syntax structure when arithmetic coding is not applied to the binarized data sequence.

Some implementations of the embodiments of the present invention can improve the encoding efficiency, simplify the encoding/decoding process, increase the encoding/decoding processing speed, and efficiently select appropriate components and operations used for encoding and decoding, such as appropriate filters, block sizes, motion vectors, reference pictures, and reference blocks.

Further advantages and effects of a solution according to the invention will become clear from the description and the accompanying drawings. These advantages and/or effects can be obtained by the features described in the several embodiments and the description and drawings, respectively, but not necessarily all of them need to be provided in order to obtain 1 or more of the advantages and/or effects.

These general and specific aspects may be implemented by a system, a method, an integrated circuit, a computer program, a recording medium, or any combination thereof.

Effects of the invention

The configuration or method according to an aspect of the present invention can contribute to, for example, 1 or more of improvement in coding efficiency, improvement in image quality, reduction in processing amount, reduction in circuit scale, improvement in processing speed, and appropriate selection of elements or operations. Further, the configuration and the method according to one embodiment of the present invention may contribute to advantages other than those described above.

Drawings

Fig. 1 is a block diagram showing a functional configuration of an encoding device according to an embodiment.

Fig. 2 is a flowchart showing an example of the overall encoding process performed by the encoding device.

Fig. 3 is a conceptual diagram illustrating an example of block division.

Fig. 4A is a conceptual diagram illustrating an example of a slice configuration.

Fig. 4B is a conceptual diagram illustrating an example of the structure of a tile.

Fig. 5A is a table showing transformation basis functions corresponding to various transformation types.

Fig. 5B is a conceptual diagram illustrating an example of SVT (Spatially Varying Transform).

Fig. 6A is a conceptual diagram illustrating an example of the shape of a filter used in the ALF (adaptive loop filter).

Fig. 6B is a conceptual diagram illustrating another example of the shape of the filter used in the ALF.

Fig. 6C is a conceptual diagram illustrating another example of the shape of the filter used in the ALF.

Fig. 7 is a block diagram showing an example of a detailed configuration of a loop filter unit that functions as a DBF (deblocking filter).

Fig. 8 is a conceptual diagram illustrating an example of a deblocking filter having filter characteristics symmetrical with respect to a block boundary.

Fig. 9 is a conceptual diagram for explaining a block boundary for performing deblocking filter processing.

Fig. 10 is a conceptual diagram showing an example of the Bs value.

Fig. 11 is a flowchart showing an example of processing performed by the prediction processing unit of the encoding apparatus.

Fig. 12 is a flowchart showing another example of the processing performed by the prediction processing unit of the encoding apparatus.

Fig. 13 is a flowchart showing another example of the processing performed by the prediction processing unit of the encoding apparatus.

Fig. 14 is a conceptual diagram illustrating an example of 67 intra prediction modes of intra prediction according to the embodiment.

Fig. 15 is a flowchart showing an example of a basic process flow of inter prediction.

Fig. 16 is a flowchart showing an example of motion vector derivation.

Fig. 17 is a flowchart showing another example of motion vector derivation.

Fig. 18 is a flowchart showing another example of motion vector derivation.

Fig. 19 is a flowchart showing an example of inter prediction in the normal inter mode.

Fig. 20 is a flowchart showing an example of inter prediction in the merge mode.

Fig. 21 is a conceptual diagram for explaining an example of motion vector derivation processing in the merge mode.

Fig. 22 is a flowchart showing an example of a frame rate up conversion process.

Fig. 23 is a conceptual diagram for explaining an example of pattern matching (bidirectional matching) between 2 blocks along a motion trajectory.

Fig. 24 is a conceptual diagram for explaining an example of pattern matching (template matching) between a template in a current picture and a block in a reference picture.

Fig. 25A is a conceptual diagram for explaining an example of deriving a motion vector in units of sub-blocks based on motion vectors of a plurality of adjacent blocks.

Fig. 25B is a conceptual diagram for explaining an example of deriving a motion vector for each sub-block in the affine mode having 3 control points.

Fig. 26A is a conceptual diagram for explaining the affine merging mode.

Fig. 26B is a conceptual diagram for explaining the affine merging mode with 2 control points.

Fig. 26C is a conceptual diagram for explaining the affine merging mode with 3 control points.

Fig. 27 is a flowchart showing an example of the processing in the affine merging mode.

Fig. 28A is a conceptual diagram for explaining the affine inter mode having 2 control points.

Fig. 28B is a conceptual diagram for explaining the affine inter mode having 3 control points.

Fig. 29 is a flowchart showing an example of processing in the affine inter mode.

FIG. 30A is a conceptual diagram for explaining an affine inter mode in which a current block has 3 control points and a neighboring block has 2 control points.

FIG. 30B is a conceptual diagram for explaining an affine inter mode in which a current block has 2 control points and a neighboring block has 3 control points.

Fig. 31A is a flowchart showing a merge mode including DMVR (decoder motion vector refinement).

Fig. 31B is a conceptual diagram for explaining an example of DMVR processing.

Fig. 32 is a flowchart showing an example of generation of a prediction image.

Fig. 33 is a flowchart showing another example of generation of a prediction image.

Fig. 34 is a flowchart showing another example of generation of a prediction image.

Fig. 35 is a flowchart for explaining an example of the prediction image correction processing by the OBMC (overlapped block motion compensation) processing.

Fig. 36 is a conceptual diagram for explaining an example of the predicted image correction processing by the OBMC processing.

Fig. 37 is a conceptual diagram for explaining generation of a predicted image of 2 triangles.

Fig. 38 is a conceptual diagram for explaining a model assuming constant-velocity linear motion.

Fig. 39 is a conceptual diagram for explaining an example of a predicted image generation method using luminance correction processing by LIC (local illumination compensation) processing.

Fig. 40 is a block diagram showing an implementation example of the encoding device.

Fig. 41 is a block diagram showing a functional configuration of a decoding device according to the embodiment.

Fig. 42 is a flowchart showing an example of the overall decoding process performed by the decoding apparatus.

Fig. 43 is a flowchart showing an example of processing performed by the prediction processing unit of the decoding apparatus.

Fig. 44 is a flowchart showing another example of the processing performed by the prediction processing unit of the decoding apparatus.

Fig. 45 is a flowchart showing an example of inter prediction in the normal inter mode in the decoding apparatus.

Fig. 46 is a block diagram showing an implementation example of the decoding device.

Fig. 47 is a block diagram showing a detailed functional configuration of an entropy encoding unit of the encoding device according to the embodiment.

Fig. 48 is a block diagram showing a detailed functional configuration of an entropy decoding unit of the decoding device according to the embodiment.

Fig. 49 is a flowchart showing an example of the 1st operation of the entropy encoding unit of the encoding device according to the embodiment.

Fig. 50 is a flowchart showing an example of the 2nd operation of the entropy encoding unit of the encoding device according to the embodiment.

Fig. 51 is a flowchart showing an operation of the encoding device according to the embodiment.

Fig. 52 is a flowchart showing a specific example of the encoding operation according to the embodiment.

Fig. 53 is a flowchart showing a specific example of the binarization operation according to the embodiment.

Fig. 54 is a flowchart showing an operation of the decoding device according to the embodiment.

Fig. 55 is a flowchart showing a specific example of the decoding operation according to the embodiment.

Fig. 56 is a flowchart showing a specific example of the inverse binarization operation according to the embodiment.

Fig. 57 is a block diagram showing the overall configuration of a content providing system that realizes a content distribution service.

Fig. 58 is a conceptual diagram illustrating an example of an encoding structure in hierarchical encoding.

Fig. 59 is a conceptual diagram illustrating an example of an encoding structure in hierarchical encoding.

Fig. 60 is a conceptual diagram illustrating an example of a display screen of a web page.

Fig. 61 is a conceptual diagram illustrating an example of a display screen of a web page.

Fig. 62 is a block diagram showing an example of a smartphone.

Fig. 63 is a block diagram showing a configuration example of the smartphone.

Detailed Description

For example, by applying arithmetic coding to a binarized data sequence obtained by binarizing coefficient information such as prediction residual coefficients, frequency transform coefficients, and quantization coefficients, the amount of code can be reduced. On the other hand, this may increase the processing delay. Therefore, a mode in which arithmetic coding is skipped has been studied; in this mode, an increase in processing delay is suppressed.

Further, the syntax structure in the case of skipping arithmetic coding may be different from the syntax structure in the case of applying arithmetic coding. Thereby, an increase in the code amount in the case of skipping arithmetic coding is likely to be suppressed.

However, when the syntax structure in the case of skipping arithmetic coding is different from the syntax structure in the case of applying arithmetic coding, the circuit scale may increase.

Therefore, for example, an encoding device according to an aspect of the present invention includes: a circuit; and a memory connected to the circuit; the circuit, in operation, encodes an image; in the encoding of the image, coefficient information of the image is binarized; whether to apply arithmetic coding to the binarized data sequence obtained by binarizing the coefficient information is controlled; a bit sequence including the binarized data sequence, to which arithmetic coding is applied or is not applied, is output; in the binarization of the coefficient information, the coefficient information is binarized according to a 1st syntax structure when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied; the coefficient information is binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied; and the coefficient information is binarized according to the 2nd syntax structure when arithmetic coding is not applied to the binarized data sequence.

This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress a processing delay while suppressing an increase in the circuit scale.
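As an illustration of the control described above, the following is a minimal sketch in Python, assuming hypothetical helper functions binarize_with_syntax_1 and binarize_with_syntax_2 that stand in for the 1st and 2nd syntax structures; it is not the normative encoder logic.

    def binarize_coefficients(coeffs, arithmetic_coding_enabled, condition_met):
        # Choose the syntax structure used to binarize the coefficient information.
        # The 2nd syntax structure is shared between the "condition satisfied" case
        # and the "arithmetic coding skipped" case, so only two binarizers are needed.
        if arithmetic_coding_enabled and not condition_met:
            return binarize_with_syntax_1(coeffs)  # 1st syntax structure
        return binarize_with_syntax_2(coeffs)      # 2nd syntax structure (shared)

    def binarize_with_syntax_1(coeffs):
        # Placeholder binarizer (e.g. significance flags, greater-than flags, remainders).
        return ["{:b}".format(abs(c)) for c in coeffs]

    def binarize_with_syntax_2(coeffs):
        # Placeholder binarizer (e.g. fixed-length bins better suited to bypass coding).
        return ["{:08b}".format(abs(c)) for c in coeffs]

Sharing the 2nd syntax structure between the two lower branches is what keeps the number of binarization paths, and hence the circuit scale, small.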

For example, the predetermined condition is that the orthogonal transform process is skipped when the coefficient information is derived from the prediction residual of the image.

This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition that orthogonal transform processing is skipped is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress a processing delay while suppressing an increase in the circuit scale.

The predetermined condition is, for example, a condition that the number of syntax elements subjected to encoding processing in a mode different from the bypass mode in accordance with CABAC (Context-based Adaptive Binary Arithmetic Coding) in an area including the processing target block in the image is equal to or greater than a threshold value.

This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition that the number of syntaxes of non-bypass CABAC is equal to or greater than a threshold is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.
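The condition can be pictured as a simple counter check; the sketch below assumes a hypothetical per-region statistics object and an arbitrary threshold value.

    REGULAR_BIN_THRESHOLD = 1024  # hypothetical budget of non-bypass CABAC syntax elements

    def second_syntax_condition_met(region_stats):
        # region_stats.regular_coded_count: syntax elements already encoded with
        # context models (i.e. not in the bypass mode) in the region that contains
        # the current processing target block.
        return region_stats.regular_coded_count >= REGULAR_BIN_THRESHOLD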

Further, the above-described bit sequence indicates, for example, whether or not the application of arithmetic coding is enabled in a sequence parameter set, a picture parameter set, or a slice header.

Thus, the encoding apparatus can switch whether or not the application of arithmetic coding is enabled in the sequence parameter set, the picture parameter set, or the slice header. Therefore, the encoding device can suppress frequent switching of whether or not arithmetic coding is applied, such as switching for each data type. This can suppress an increase in the code amount and suppress processing delay.
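A sketch of such header-level signaling is shown below, with a hypothetical bit-writer/bit-reader API; the actual header syntax is defined by the bitstream specification.

    def write_header(writer, arithmetic_coding_enabled):
        # Written once per sequence parameter set, picture parameter set, or slice header.
        writer.put_bit(1 if arithmetic_coding_enabled else 0)

    def read_header(reader):
        # The decoder recovers the same switch before parsing the coefficient data.
        return reader.get_bit() == 1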

Further, for example, the above-described circuit collectively switches whether or not arithmetic coding is applied in units each including 1 or more slices or 1 or more pictures.

Thus, the encoding device can collectively switch whether or not arithmetic coding is applied in large units. Therefore, the encoding device can suppress frequent switching of whether or not arithmetic coding is applied, such as switching for each data type. This can suppress an increase in the code amount and suppress processing delay.

For example, a decoding device according to an aspect of the present invention includes: a circuit; and a memory connected to the circuit; the circuit, in operation, decodes an image; in the decoding of the image, a bit sequence including a binarized data sequence in which coefficient information of the image is binarized is acquired; whether to apply arithmetic decoding to the binarized data sequence is controlled; the binarized data sequence, to which arithmetic decoding is applied or is not applied, is inverse binarized; in the inverse binarization of the binarized data sequence, the binarized data sequence is inverse binarized according to a 1st syntax structure when arithmetic decoding is applied to the binarized data sequence and a predetermined condition is not satisfied; the binarized data sequence is inverse binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic decoding is applied to the binarized data sequence and the predetermined condition is satisfied; and the binarized data sequence is inverse binarized according to the 2nd syntax structure when arithmetic decoding is not applied to the binarized data sequence.

This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.
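Mirroring the encoder-side sketch above, the decoder-side selection of the inverse binarization path can be pictured as follows (hypothetical helper names, placeholder inverse binarizers):

    def inverse_binarize_coefficients(bins, arithmetic_decoding_enabled, condition_met):
        if arithmetic_decoding_enabled and not condition_met:
            return inverse_binarize_syntax_1(bins)  # 1st syntax structure
        # 2nd syntax structure, shared with the "arithmetic decoding skipped" case
        return inverse_binarize_syntax_2(bins)

    def inverse_binarize_syntax_1(bins):
        return [int(b, 2) for b in bins]  # placeholder inverse binarizer

    def inverse_binarize_syntax_2(bins):
        return [int(b, 2) for b in bins]  # placeholder inverse binarizer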

For example, the predetermined condition is a condition under which the inverse orthogonal transform process is skipped when deriving the prediction residual of the image from the coefficient information.

This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition that the inverse orthogonal transform process is skipped is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

For example, the predetermined condition is a condition that the number of syntax elements subjected to decoding processing in a mode different from the bypass mode by CABAC (Context-based Adaptive Binary Arithmetic Coding) is equal to or greater than a threshold value in an area including the processing target block in the image.

This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition that the number of syntaxes of non-bypass CABAC is equal to or greater than a threshold is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress a processing delay while suppressing an increase in the circuit scale.

Further, the above-described bit sequence indicates, for example, whether or not the application of arithmetic decoding is enabled in a sequence parameter set, a picture parameter set, or a slice header.

Thus, the decoding apparatus can switch whether or not the application of arithmetic decoding is enabled in the sequence parameter set, the picture parameter set, or the slice header. Therefore, the decoding device can suppress frequent switching of whether or not arithmetic decoding is applied, such as switching for each data type. This can suppress an increase in the amount of code and suppress processing delay.

Further, for example, the above-described circuit collectively switches whether or not arithmetic decoding is applied in units each including 1 or more slices or 1 or more pictures.

Thus, the decoding device can collectively switch whether or not arithmetic decoding is applied in large units. Therefore, the decoding device can suppress frequent switching of whether or not arithmetic decoding is applied, such as switching for each data type. This can suppress an increase in the amount of code and suppress processing delay.

For example, in an encoding method according to an aspect of the present invention, an image is encoded; in the encoding of the image, coefficient information of the image is binarized; whether to apply arithmetic coding to the binarized data sequence obtained by binarizing the coefficient information is controlled; a bit sequence including the binarized data sequence, to which arithmetic coding is applied or is not applied, is output; in the binarization of the coefficient information, the coefficient information is binarized according to a 1st syntax structure when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied; the coefficient information is binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied; and the coefficient information is binarized according to the 2nd syntax structure when arithmetic coding is not applied to the binarized data sequence.

This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress a processing delay while suppressing an increase in the circuit scale.

For example, in a decoding method according to an aspect of the present invention, an image is decoded; in the decoding of the image, a bit sequence including a binarized data sequence in which coefficient information of the image is binarized is acquired; whether to apply arithmetic decoding to the binarized data sequence is controlled; the binarized data sequence, to which arithmetic decoding is applied or is not applied, is inverse binarized; in the inverse binarization of the binarized data sequence, the binarized data sequence is inverse binarized according to a 1st syntax structure when arithmetic decoding is applied to the binarized data sequence and a predetermined condition is not satisfied; the binarized data sequence is inverse binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic decoding is applied to the binarized data sequence and the predetermined condition is satisfied; and the binarized data sequence is inverse binarized according to the 2nd syntax structure when arithmetic decoding is not applied to the binarized data sequence.

This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress a processing delay while suppressing an increase in the circuit scale.

For example, an encoding device according to an aspect of the present invention includes a partition unit, an intra prediction unit, an inter prediction unit, a prediction control unit, a transform unit, a quantization unit, an entropy encoding unit, and a loop filter unit.

The dividing unit divides a picture to be encoded constituting the moving image into a plurality of blocks. The intra prediction unit performs intra prediction for generating the predicted image of the block to be encoded in the picture to be encoded using a reference image in the picture to be encoded. The inter prediction unit performs inter prediction for generating the predicted image of the block to be encoded using a reference image in a reference picture different from the picture to be encoded.

The prediction control unit controls intra prediction by the intra prediction unit and inter prediction by the inter prediction unit. The transformation unit transforms a prediction residual signal between the prediction image generated by the intra prediction unit or the inter prediction unit and the image of the block to be encoded, and generates a transformation coefficient signal of the block to be encoded. The quantization unit quantizes the transform coefficient signal. The entropy encoding unit encodes the quantized transform coefficient signal. The loop filter unit applies filtering to the block to be encoded.

For example, the entropy encoding unit, in operation, encodes an image; in the encoding of the image, coefficient information of the image is binarized; whether to apply arithmetic coding to the binarized data sequence obtained by binarizing the coefficient information is controlled; a bit sequence including the binarized data sequence, to which arithmetic coding is applied or is not applied, is output; in the binarization of the coefficient information, the coefficient information is binarized according to a 1st syntax structure when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied; the coefficient information is binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied; and the coefficient information is binarized according to the 2nd syntax structure when arithmetic coding is not applied to the binarized data sequence.

For example, a decoding device according to an aspect of the present invention is a decoding device that decodes a moving image using a predicted image, and includes an entropy decoding unit, an inverse quantization unit, an inverse transformation unit, an intra prediction unit, an inter prediction unit, a prediction control unit, an addition unit (reconstruction unit), and a loop filter unit.

The entropy decoding unit decodes a quantized transform coefficient signal of a decoding target block in a decoding target picture constituting the moving image. The inverse quantization unit inversely quantizes the quantized transform coefficient signal. The inverse transform unit inverse-transforms the transform coefficient signal to obtain a prediction residual signal of the decoding target block.

The intra prediction unit performs intra prediction for generating the predicted image of the decoding target block using a reference image in the decoding target picture. The inter prediction unit performs inter prediction for generating the predicted image of the decoding target block using a reference image in a reference picture different from the decoding target picture. The prediction control unit controls intra prediction by the intra prediction unit and inter prediction by the inter prediction unit.

The adder adds the prediction image generated by the intra prediction unit or the inter prediction unit to the prediction residual signal, thereby reconstructing an image of the block to be decoded. The loop filter unit applies filtering to the decoding target block.

For example, the entropy decoding unit, in operation, decodes an image; in the decoding of the image, a bit sequence including a binarized data sequence in which coefficient information of the image is binarized is acquired; whether to apply arithmetic decoding to the binarized data sequence is controlled; the binarized data sequence, to which arithmetic decoding is applied or is not applied, is inverse binarized; in the inverse binarization of the binarized data sequence, the binarized data sequence is inverse binarized according to a 1st syntax structure when arithmetic decoding is applied to the binarized data sequence and a predetermined condition is not satisfied; the binarized data sequence is inverse binarized according to a 2nd syntax structure different from the 1st syntax structure when arithmetic decoding is applied to the binarized data sequence and the predetermined condition is satisfied; and the binarized data sequence is inverse binarized according to the 2nd syntax structure when arithmetic decoding is not applied to the binarized data sequence.

Further, these general or specific aspects may be implemented by a system, an apparatus, a method, an integrated circuit, a computer program, or a non-volatile recording medium such as a computer-readable CD-ROM, or may be implemented by any combination of a system, an apparatus, a method, an integrated circuit, a computer program, and a recording medium.

Hereinafter, embodiments will be described in detail with reference to the drawings. The embodiments described below are all illustrative or specific examples. The numerical values, shapes, materials, constituent elements, arrangement and connection forms of constituent elements, steps, order of steps, and the like shown in the following embodiments are examples, and do not limit the scope of the invention.

Hereinafter, embodiments of an encoding device and a decoding device will be described. The embodiments are examples of an encoding device and a decoding device to which the processing and/or configuration described in each aspect of the present invention can be applied. The processing and/or configuration may also be implemented in an encoding device and a decoding device different from those of the embodiments. For example, regarding the processing and/or configuration applied to the embodiments, any of the following may be performed.

(1) Any one of the plurality of components of the encoding device or the decoding device according to the embodiments described in the respective aspects of the present invention may be replaced with another component described in any one of the respective aspects of the present invention or may be combined with the other component.

(2) In the encoding device or the decoding device according to the embodiment, any change such as addition, replacement, deletion, or the like of a function or a process performed by a part of the plurality of components of the encoding device or the decoding device may be performed. For example, a certain function or process may be replaced with or combined with another function or process described in a certain one of the embodiments of the present invention.

(3) In the method performed by the encoding device or the decoding device according to the embodiment, some of the plurality of processes included in the method may be optionally changed such as addition, replacement, or deletion. For example, a certain process in the method may be replaced with another process described in a certain one of the embodiments of the present invention or combined with the other process.

(4) The present invention is not limited to the above-described embodiments, and various modifications and variations can be made without departing from the spirit and scope of the present invention.

(5) The constituent elements having a part of the functions of the encoding device or the decoding device of the embodiment or the constituent elements having a part of the processes of the encoding device or the decoding device of the embodiment may be combined with or replaced by the constituent elements described in any of the aspects of the present invention, the constituent elements having a part of the functions described in any of the aspects of the present invention, or the constituent elements performing a part of the processes described in any of the aspects of the present invention.

(6) In the method implemented by the encoding device or the decoding device of the embodiment, any of a plurality of processes included in the method may be replaced with the process described in any of the technical aspects of the present invention, or a process similar to the process, or a combination thereof.

(7) Some of the plurality of processes included in the method performed by the encoding device or the decoding device according to the embodiment may be combined with the process described in any one of the aspects of the present invention.

(8) The embodiments of the processing and/or configuration described in the embodiments of the present invention are not limited to the encoding device or the decoding device of the embodiments. For example, the processing and/or configuration may be implemented in a device used for a purpose different from the moving image encoding or the moving image decoding disclosed in the embodiment.

[ coding apparatus ]

First, the coding apparatus according to the embodiment will be described. Fig. 1 is a block diagram showing a functional configuration of an encoding device 100 according to an embodiment. The encoding apparatus 100 is a moving image encoding apparatus that encodes a moving image in units of blocks.

As shown in fig. 1, the encoding apparatus 100 is an apparatus for encoding an image in units of blocks, and includes a division unit 102, a subtraction unit 104, a transformation unit 106, a quantization unit 108, an entropy encoding unit 110, an inverse quantization unit 112, an inverse transformation unit 114, an addition unit 116, a block memory 118, a loop filter unit 120, a frame memory 122, an intra prediction unit 124, an inter prediction unit 126, and a prediction control unit 128.

The encoding device 100 is implemented by, for example, a general-purpose processor and a memory. In this case, when the software program stored in the memory is executed by the processor, the processor functions as the dividing unit 102, the subtracting unit 104, the transforming unit 106, the quantizing unit 108, the entropy encoding unit 110, the inverse quantizing unit 112, the inverse transforming unit 114, the adding unit 116, the loop filtering unit 120, the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128. The encoding device 100 may be implemented as 1 or more dedicated electronic circuits corresponding to the division unit 102, the subtraction unit 104, the transformation unit 106, the quantization unit 108, the entropy encoding unit 110, the inverse quantization unit 112, the inverse transformation unit 114, the addition unit 116, the loop filter unit 120, the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128.

Hereinafter, each of the components included in the coding apparatus 100 will be described after describing the flow of the overall processing of the coding apparatus 100.

[ Overall flow of encoding processing ]

Fig. 2 is a flowchart showing an example of the overall encoding process performed by the encoding apparatus 100.

First, the dividing unit 102 of the encoding device 100 divides each picture included in an input image, which is a moving image, into a plurality of fixed-size blocks (e.g., 128 × 128 pixels) (step Sa_1). Then, the dividing unit 102 selects a division pattern (also referred to as a block shape) for each fixed-size block (step Sa_2). That is, the dividing unit 102 further divides the fixed-size block into a plurality of blocks constituting the selected division pattern. Then, encoding apparatus 100 performs the processing of steps Sa_3 to Sa_9 for each of the plurality of blocks (i.e., for each block to be encoded).

That is, the prediction processing unit, which is configured by all or a part of the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128, generates a prediction signal (also referred to as a prediction block) of the block to be encoded (also referred to as a current block) (step Sa_3).

Next, the subtraction unit 104 generates the difference between the block to be encoded and the prediction block as a prediction residual (also referred to as a difference block) (step Sa_4).

Next, the transform unit 106 and the quantization unit 108 transform and quantize the difference block to generate a plurality of quantized coefficients (step Sa_5). A block composed of a plurality of quantized coefficients is also referred to as a coefficient block.

Next, the entropy encoding unit 110 encodes (specifically, entropy encodes) the coefficient block and the prediction parameters used for generating the prediction signal, thereby generating an encoded signal (step Sa_6). The encoded signal is also referred to as an encoded bitstream, a compressed bitstream, or a stream.

Next, the inverse quantization unit 112 and the inverse transform unit 114 inverse-quantize and inverse-transform the coefficient block to restore a plurality of prediction residuals (i.e., a difference block) (step Sa_7).

Next, the adder 116 adds the prediction block to the restored difference block to reconstruct the current block as a reconstructed image (also referred to as a reconstructed block or a decoded image block) (step Sa_8). Thereby, a reconstructed image is generated.

When the reconstructed image is generated, the loop filter unit 120 filters the reconstructed image as necessary (step Sa_9).

Then, encoding apparatus 100 determines whether or not encoding of the entire picture is completed (step Sa_10), and if it is determined that encoding is not completed (No in step Sa_10), repeats the processing from step Sa_2.

In the above example, the encoding device 100 selects 1 division pattern for a block of a fixed size and encodes each block according to that pattern, but each block may instead be encoded according to each of a plurality of division patterns. In this case, the encoding apparatus 100 may, for example, evaluate a cost for each of the plurality of division patterns and select, as the encoded signal to be output, the encoded signal obtained by encoding with the division pattern of the smallest cost.

As shown in the figure, the processing of steps Sa_1 to Sa_10 is performed sequentially by encoding apparatus 100. Alternatively, some of these processes may be performed in parallel, or their order may be changed.
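The per-block loop of steps Sa_3 to Sa_9 can be summarized as follows; the component objects are hypothetical stand-ins for the units of encoding device 100, and block samples are assumed to support element-wise arithmetic (e.g. numpy arrays).

    def encode_blocks(blocks, enc):
        for block in blocks:
            pred = enc.predict(block)                                 # Sa_3: intra/inter prediction
            residual = block.samples - pred                           # Sa_4: difference block
            coeffs = enc.quantize(enc.transform(residual))            # Sa_5: transform + quantization
            enc.entropy_encode(coeffs, enc.prediction_params(block))  # Sa_6: entropy coding
            rec_residual = enc.inverse_transform(enc.inverse_quantize(coeffs))  # Sa_7
            reconstruction = pred + rec_residual                      # Sa_8: reconstruction
            enc.loop_filter_if_needed(block, reconstruction)          # Sa_9: optional filtering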

[ division part ]

The dividing unit 102 divides each picture included in the input moving image into a plurality of blocks, and outputs each block to the subtracting unit 104. For example, the dividing section 102 first divides the picture into blocks of a fixed size (e.g., 128 × 128). Other fixed block sizes may also be used. This fixed-size block may be referred to as a Coding Tree Unit (CTU). The dividing unit 102 divides each fixed-size block into blocks of variable size (e.g., 64 × 64 or less) based on, for example, recursive quadtree (quadtree) and/or binary tree (binary tree) block division. That is, the dividing section 102 selects the division pattern. A variable-size block may be referred to as a Coding Unit (CU), a Prediction Unit (PU), or a Transform Unit (TU). In the various processing examples, it is not necessary to distinguish between CU, PU, and TU, and some or all of the blocks in a picture may be processed in units of CU, PU, or TU.

Fig. 3 is a conceptual diagram illustrating an example of block division according to the embodiment. In fig. 3, a solid line indicates a block boundary formed by the quad-tree block division, and a dotted line indicates a block boundary formed by the binary-tree block division.

Here, the block 10 is a square block of 128 × 128 pixels (128 × 128 block). The 128 × 128 block 10 is first divided into 4 square 64 × 64 blocks (quad-tree block division).

The upper left 64 × 64 block is vertically divided into 2 rectangular 32 × 64 blocks, and the left 32 × 64 block is further vertically divided into 2 rectangular 16 × 64 blocks (binary tree block division). As a result, the upper-left 64 × 64 block is divided into two 16 × 64 blocks 11 and 12, and a 32 × 64 block 13.

The upper right 64 × 64 block is horizontally divided into 2 rectangular 64 × 32 blocks 14 and 15 (binary tree block division).

The lower left 64 × 64 block is divided into 4 square 32 × 32 blocks (quad-tree block division). The upper left block and the lower right block of the 4 32 × 32 blocks are further divided. The upper-left 32 × 32 block is vertically divided into 2 rectangular 16 × 32 blocks, and the right 16 × 32 block is further horizontally divided into two 16 × 16 blocks (binary tree block division). The lower-right 32 × 32 block is horizontally divided into two 32 × 16 blocks (binary tree block division). As a result, the lower left 64 × 64 block is divided into a 16 × 32 block 16, two 16 × 16 blocks 17 and 18, two 32 × 32 blocks 19 and 20, and two 32 × 16 blocks 21 and 22.

The lower right 64 × 64 block 23 is not divided.

As described above, in fig. 3, the block 10 is divided into 13 variable-size blocks 11 to 23 by the recursive quadtree and binary tree block division. Such partitioning is sometimes referred to as QTBT (quad-tree plus binary tree) partitioning.

In fig. 3, 1 block is divided into 4 or 2 blocks (quad tree or binary tree block division), but the division is not limited to these. For example, 1 block may be divided into 3 blocks (ternary tree block division). There are cases where a partition including such a ternary tree block partition is referred to as an MBT (multi type tree) partition.
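A recursive sketch of this kind of splitting is shown below: each block is either kept whole, split into 4 by a quadtree split, or split into 2 by a horizontal or vertical binary split. The split decision function is a hypothetical placeholder; a real encoder would decide splits, for example, by rate-distortion cost.

    def split_block(x, y, w, h, decide_split):
        # decide_split returns one of "none", "quad", "hor", "ver"
        mode = decide_split(x, y, w, h)
        if mode == "none":
            return [(x, y, w, h)]
        if mode == "quad":
            hw, hh = w // 2, h // 2
            corners = [(x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)]
            return [leaf for cx, cy in corners
                    for leaf in split_block(cx, cy, hw, hh, decide_split)]
        if mode == "hor":  # split into two w x (h/2) blocks
            return (split_block(x, y, w, h // 2, decide_split) +
                    split_block(x, y + h // 2, w, h // 2, decide_split))
        # "ver": split into two (w/2) x h blocks
        return (split_block(x, y, w // 2, h, decide_split) +
                split_block(x + w // 2, y, w // 2, h, decide_split))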

[ slice/tile of picture ]

In order to decode a picture in parallel, the picture may be configured in units of slices or units of tiles. A picture in units of slices or units of tiles may be configured by the dividing unit 102.

A slice is a unit of basic coding constituting a picture. A picture is composed of, for example, 1 or more slices. Further, a slice is composed of 1 or more consecutive CTUs (Coding Tree units).

Fig. 4A is a conceptual diagram illustrating an example of a slice configuration. For example, a picture includes 11 × 8 CTUs and is divided into 4 slices (slices 1 to 4). Slice 1 consists of 16 CTUs, slice 2 consists of 21 CTUs, slice 3 consists of 29 CTUs, and slice 4 consists of 22 CTUs. Here, each CTU within the picture belongs to one of the slices. The shape of a slice is a shape obtained by dividing the picture in the horizontal direction. The boundary of a slice does not need to coincide with a picture edge, and may be any of the CTU boundaries within the picture. The processing order (encoding order or decoding order) of CTUs in a slice is, for example, raster scan order. In addition, a slice includes header information and encoded data. The header information may describe characteristics of the slice, such as the CTU address of the start of the slice and the slice type.

A tile is a unit of a rectangular area constituting a picture. The tiles may also be assigned a number called TileId in raster scan order.

Fig. 4B is a conceptual diagram illustrating an example of the structure of a tile. For example, a picture includes 11 × 8 CTUs and is divided into 4 tiles of rectangular areas (tiles 1 to 4). When tiles are used, the processing order of the CTUs is changed compared to the case where tiles are not used. When tiles are not used, the CTUs in a picture are processed in raster scan order. When tiles are used, at least 1 CTU is processed in raster scan order in each of the plurality of tiles. For example, as shown in fig. 4B, the processing order of the plurality of CTUs included in tile 1 is from the left end of row 1 of tile 1 toward the right end of row 1 of tile 1, and then from the left end of row 2 of tile 1 toward the right end of row 2 of tile 1.

In addition, 1 tile may include 1 or more slices, and 1 slice may include 1 or more tiles.
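The difference in CTU processing order can be illustrated as follows; the tile rectangles used in the example are assumptions roughly corresponding to Fig. 4B, with all dimensions in CTU units.

    def ctu_order_without_tiles(pic_w, pic_h):
        # Raster scan over the whole picture.
        return [(x, y) for y in range(pic_h) for x in range(pic_w)]

    def ctu_order_with_tiles(tiles):
        # tiles: list of (x0, y0, w, h) rectangles in TileId (raster) order.
        order = []
        for x0, y0, w, h in tiles:
            order += [(x0 + x, y0 + y) for y in range(h) for x in range(w)]
        return order

    # 11x8-CTU picture divided into 4 rectangular tiles (illustrative sizes).
    example_tiles = [(0, 0, 6, 4), (6, 0, 5, 4), (0, 4, 6, 4), (6, 4, 5, 4)]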

[ subtracting section ]

The subtracting unit 104 subtracts a prediction signal (prediction samples input from the prediction control unit 128 described below) from an original signal (original samples) in units of blocks input from the dividing unit 102. That is, the subtraction unit 104 calculates a prediction error (also referred to as a residual) of a block to be encoded (hereinafter, referred to as a current block). Then, the subtraction unit 104 outputs the calculated prediction error (residual) to the conversion unit 106.

The original signal is an input signal to the encoding apparatus 100, and is a signal (for example, a luminance (luma) signal and 2 color difference (chroma) signals) representing an image of each picture constituting a moving image. Hereinafter, a signal representing an image may be referred to as a sample.

[ converting part ]

The transform unit 106 transforms the prediction error in the spatial domain into a transform coefficient in the frequency domain, and outputs the transform coefficient to the quantization unit 108. Specifically, the transform unit 106 performs, for example, a predetermined Discrete Cosine Transform (DCT) or a predetermined Discrete Sine Transform (DST) on the prediction error in the spatial domain. The predetermined DCT or DST may be set in advance.

The transform unit 106 may adaptively select a transform type from among a plurality of transform types, and transform the prediction error into a transform coefficient using a transform basis function (transform basis function) corresponding to the selected transform type. Such a transform may be referred to as EMT (explicit multiple core transform) or AMT (adaptive multiple transform).

The plurality of transform types includes, for example, DCT-II, DCT-V, DCT-VIII, DST-I, and DST-VII. Fig. 5A is a table showing transform basis functions corresponding to these example transform types. In fig. 5A, N denotes the number of input pixels. The selection of a transform type from among these multiple transform types may depend on, for example, the type of prediction (intra prediction and inter prediction) or the intra prediction mode.
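As a concrete, non-normative illustration, the DCT-II row of Fig. 5A applied to a length-N residual vector can be written in direct matrix form, where N is the number of input pixels as in the table:

    import math

    def dct2_matrix(N):
        # Orthonormal DCT-II basis: row k, column n.
        m = []
        for k in range(N):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            m.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                      for n in range(N)])
        return m

    def forward_transform(residual):
        N = len(residual)
        basis = dct2_matrix(N)
        return [sum(basis[k][n] * residual[n] for n in range(N)) for k in range(N)]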

Information indicating whether such an EMT or AMT is applied (e.g., referred to as an EMT flag or an AMT flag) and information indicating the selected transform type are typically signaled at the CU level. The signaling of such information is not necessarily limited to the CU level, and may be at another level (for example, a sequence level, a picture level, a slice level, a tile level, or a CTU level).

The transform unit 106 may transform the transform coefficient (transform result) again. Such a retransform may be called an AST (adaptive secondary transform) or NSST (non-separable secondary transform). For example, the transform unit 106 performs re-transform on each sub-block (for example, 4 × 4 sub-blocks) included in a block of transform coefficients corresponding to an intra prediction error. The information indicating whether NSST is applied and the information on the transformation matrix used for NSST are typically signaled at the CU level. The signaling of such information is not necessarily limited to the CU level, and may be at another level (for example, a sequence level, a picture level, a slice level, a tile level, or a CTU level).

The transform unit 106 may apply a separable transform and a non-separable transform. The separable transform is a method of performing transformation a plurality of times by separating the input along each direction according to the number of dimensions, and the non-separable transform is a method of performing transformation collectively by regarding 2 or more dimensions as one dimension when the input is multidimensional.

For example, as an example of the non-separable transform, a 4 × 4 block is input, the block is regarded as one array having 16 elements, and the array is subjected to transform processing using a 16 × 16 transform matrix.

In a further example of the non-separable transform, a 4 × 4 input block is regarded as one array having 16 elements, and the array may then be subjected to a transform that performs Givens rotation on the array a plurality of times (Hypercube Givens Transform).
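The 4 × 4 non-separable example above amounts to flattening the block into a 16-element vector and applying a single 16 × 16 matrix, as in this sketch (the identity matrix stands in for an actual transform matrix):

    def non_separable_transform(block_4x4, matrix_16x16):
        v = [s for row in block_4x4 for s in row]  # flatten 4x4 block to a length-16 vector
        out = [sum(matrix_16x16[i][j] * v[j] for j in range(16)) for i in range(16)]
        return [out[i * 4:(i + 1) * 4] for i in range(4)]  # reshape back to 4x4

    identity_16x16 = [[1 if i == j else 0 for j in range(16)] for i in range(16)]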

In the transform unit 106, the type of basis to be transformed into the frequency domain may be switched according to the region within the CU. One example is SVT (Spatially Varying Transform). In SVT, as shown in Fig. 5B, the CU is divided into two halves in the horizontal or vertical direction, and only one of the regions is transformed into the frequency domain. The type of transform basis may be set per region, and, for example, DST7 and DCT8 are used. In this example, only one of the 2 regions in the CU is transformed and the other is not transformed, but both of the 2 regions may be transformed. The division method is not limited to division into 2 and may be division into 4, and information indicating the division may be separately encoded and signaled in the same manner as the CU division. SVT may also be called SBT (Sub-block Transform).

[ quantifying section ]

The quantization unit 108 quantizes the transform coefficient output from the transform unit 106. Specifically, the quantization unit 108 scans the transform coefficient of the current block in a predetermined scan order and quantizes the transform coefficient based on a Quantization Parameter (QP) corresponding to the scanned transform coefficient. The quantization unit 108 outputs the quantized transform coefficient (hereinafter, referred to as a quantization coefficient) of the current block to the entropy coding unit 110 and the inverse quantization unit 112. The predetermined scanning order may be set in advance.

The prescribed scan order is an order for quantization/inverse quantization of transform coefficients. For example, the predetermined scanning order may be defined in ascending order of frequency (order from low frequency to high frequency) or descending order of frequency (order from high frequency to low frequency).

The Quantization Parameter (QP) is a parameter that defines a quantization step size (quantization width). For example, if the value of the quantization parameter increases, the quantization step size also increases. That is, if the value of the quantization parameter increases, the quantization error increases.
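As a rough illustration of how the quantization parameter controls the quantization step, the sketch below assumes an HEVC-style relation Qstep = 2^((QP − 4)/6); that relation is an assumption made here only for illustration, and the sketch simply shows that a larger QP gives a larger step and hence a larger quantization error.

# Minimal sketch of scalar quantization controlled by a quantization
# parameter.  The HEVC-style relation Qstep = 2^((QP - 4) / 6) is assumed
# purely for illustration.
def quant_step(qp):
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff, qp):
    step = quant_step(qp)
    return int(round(coeff / step))          # larger QP -> larger step -> larger error

def dequantize(level, qp):
    return level * quant_step(qp)

level = quantize(100.0, qp=32)
recon = dequantize(level, qp=32)             # differs from 100.0 by the quantization error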

Further, a quantization matrix may be used for quantization. For example, a plurality of quantization matrices may be used in accordance with the frequency transform size such as 4 × 4 or 8 × 8, the prediction mode such as intra prediction or inter prediction, and the pixel component such as luminance or color difference. Quantization is the digitization of values sampled at predetermined intervals in association with predetermined levels; in this field, other expressions such as rounding or scaling may also be used, and rounding or scaling may be employed. The predetermined intervals and levels may be set in advance.

As methods of using a quantization matrix, there are a method of using a quantization matrix directly set on the encoding apparatus side and a method of using a default quantization matrix (default matrix). On the encoding device side, a quantization matrix corresponding to the feature of the image can be set by directly setting the quantization matrix. However, in this case, there is a disadvantage that the code amount increases by the encoding of the quantization matrix.

On the other hand, there is also a method of quantizing the coefficients of the high-frequency components and the coefficients of the low-frequency components in the same manner without using a quantization matrix. This method is equivalent to a method using a quantization matrix (flat matrix) in which all coefficients have the same value.

The quantization matrix may also be specified by, for example, SPS (Sequence Parameter Set) or PPS (Picture Parameter Set). The SPS includes parameters used for sequences, and the PPS includes parameters used for pictures. There are cases where SPS and PPS are referred to simply as parameter sets.

[ entropy encoding part ]

The entropy encoding unit 110 generates an encoded signal (encoded bit stream) based on the quantized coefficients input from the quantization unit 108. Specifically, the entropy encoding unit 110 binarizes the quantization coefficient, for example, arithmetically encodes the binary signal, and outputs a compressed bit stream or sequence.
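As a purely illustrative sketch of binarization, the following shows a 0th-order Exp-Golomb binarization of a non-negative level; the binarization actually used for the quantized coefficients (e.g., in CABAC) is more elaborate, so this only shows the idea of turning a level into a bin string before arithmetic coding.

# Minimal sketch of one possible binarization (0th-order Exp-Golomb) for a
# non-negative coefficient level; only an illustration of producing bins.
def exp_golomb_bins(value):
    assert value >= 0
    code = value + 1
    num_bits = code.bit_length()
    return '0' * (num_bits - 1) + format(code, 'b')

print(exp_golomb_bins(0))   # '1'
print(exp_golomb_bins(3))   # '00100'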

[ inverse quantization part ]

The inverse quantization unit 112 inversely quantizes the quantization coefficient input from the quantization unit 108. Specifically, the inverse quantization unit 112 inversely quantizes the quantized coefficients of the current block in a predetermined scanning order. Then, the inverse quantization unit 112 outputs the inverse-quantized transform coefficient of the current block to the inverse transform unit 114. The predetermined scanning order may be set in advance.

[ inverse transformation section ]

The inverse transform unit 114 restores the prediction error (residual) by inversely transforming the transform coefficient input from the inverse quantization unit 112. Specifically, the inverse transform unit 114 performs inverse transform corresponding to the transform performed by the transform unit 106 on the transform coefficient, thereby restoring the prediction error of the current block. The inverse transform unit 114 outputs the restored prediction error to the addition unit 116.

The restored prediction error usually loses information by quantization, and therefore does not match the prediction error calculated by the subtraction unit 104. That is, the reconstructed prediction error usually includes a quantization error.

[ addition section ]

The adder 116 adds the prediction error input from the inverse transform unit 114 to the prediction sample input from the prediction control unit 128, thereby reconstructing the current block. The adder 116 outputs the reconstructed block to the block memory 118 and the loop filter unit 120. The reconstructed block is also sometimes referred to as a local decoded block.

[ Block memory ]

The block memory 118 is a storage unit for storing a block to be referred to in intra prediction and a block in a picture to be encoded (i.e., a current picture), for example. Specifically, the block memory 118 stores the reconstructed block output from the adder 116.

[ frame memory ]

The frame memory 122 is a storage unit for storing reference pictures used for inter-frame prediction, and may be referred to as a frame buffer. Specifically, the frame memory 122 stores the reconstructed block filtered by the loop filter unit 120.

[ Loop Filter Unit ]

The loop filter unit 120 applies loop filtering to the block reconstructed by the adder unit 116, and outputs the filtered reconstructed block to the frame memory 122. The in-loop filter is a filter (in-loop filter) used in the coding loop, and includes, for example, a deblocking filter (DF or DBF), a Sample Adaptive Offset (SAO), an Adaptive Loop Filter (ALF), and the like.

In the ALF, a least square error filter for removing coding distortion is applied, and for example, 1 filter selected from a plurality of filters based on the direction and activity (activity) of a local gradient (gradient) is applied for each 2 × 2 sub-block in a current block.

Specifically, the sub-blocks (e.g., 2 × 2 sub-blocks) are first classified into a plurality of classes (e.g., 15 or 25 classes). The sub-blocks are classified based on the direction of the gradient and the activity. For example, the classification value C (e.g., C = 5D + A) is calculated using the direction value D (e.g., 0 to 2 or 0 to 4) of the gradient and the activity value A (e.g., 0 to 4) of the gradient. Then, the sub-blocks are classified into the plurality of classes based on the classification value C.

The direction value D of the gradient is derived, for example, by comparing the gradients in a plurality of directions (e.g., horizontal, vertical, and 2 diagonal directions). The activity value A of the gradient is derived by, for example, adding the gradients in the plurality of directions and quantizing the addition result.

Based on the result of such classification, a filter for a sub-block is decided from among a plurality of filters.
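The classification step above can be sketched as follows; the helper below only combines a given direction value D and activity value A into the class index C = 5D + A and picks one filter per class, while the derivation of D and A from the gradients is left out.

# Minimal sketch of the ALF sub-block classification described above:
# the class index C is formed from the direction value D and the quantized
# activity value A as C = 5 * D + A, and one filter is chosen per class.
def alf_class(direction_d, activity_a):
    assert 0 <= direction_d <= 4 and 0 <= activity_a <= 4
    return 5 * direction_d + activity_a          # up to 25 classes

filters = [f"filter_{i}" for i in range(25)]     # one placeholder filter per class
chosen = filters[alf_class(direction_d=2, activity_a=3)]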

As the shape of the filter used in the ALF, for example, a circularly symmetric shape is adopted. Fig. 6A to 6C are diagrams showing a plurality of examples of the shape of a filter used in the ALF. Fig. 6A shows a 5 × 5 diamond shaped filter, fig. 6B shows a 7 × 7 diamond shaped filter, and fig. 6C shows a 9 × 9 diamond shaped filter. The information representing the shape of the filter is typically signaled at the picture level. The signaling of the information indicating the shape of the filter is not necessarily limited to the picture level, and may be at another level (for example, the sequence level, slice level, tile level, CTU level, or CU level).

The turning on/off of the ALF may also be determined at the picture level or CU level, for example. For example, it is also possible to determine whether or not to apply ALF at the CU level for luminance, and at the picture level for color difference. The information indicating the on/off of the ALF is typically signaled at the picture level or CU level. The signaling of the information indicating the turning on/off of the ALF is not necessarily limited to the picture level or the CU level, and may be at another level (for example, the sequence level, the slice level, the tile level, or the CTU level).

The coefficient sets of the selectable plurality of filters (e.g., up to 15 or 25 filters) are typically signaled at the picture level. The signaling of the coefficient set is not necessarily limited to the picture level, and may be at other levels (for example, sequence level, slice level, tile level, CTU level, CU level, or subblock level).

[ Loop Filter Unit > deblocking Filter ]

In the deblocking filter, the loop filter unit 120 performs filter processing on a block boundary of a reconstructed image to reduce distortion occurring at the block boundary.

Fig. 7 is a block diagram showing an example of a detailed configuration of the loop filter unit 120 functioning as a deblocking filter.

The loop filter unit 120 includes a boundary determination unit 1201, a filter determination unit 1203, a filter processing unit 1205, a processing determination unit 1208, a filter characteristic determination unit 1207, and switches 1202, 1204, and 1206.

The boundary determination unit 1201 determines whether or not a pixel subjected to deblocking filtering (i.e., a target pixel) exists in the vicinity of a block boundary. Then, the boundary determination unit 1201 outputs the determination result to the switch 1202 and the process determination unit 1208.

When the boundary determination unit 1201 determines that the target pixel is present near the block boundary, the switch 1202 outputs the image before the filtering process to the switch 1204. On the other hand, when the boundary determination unit 1201 determines that the target pixel is not present near the block boundary, the switch 1202 outputs the image before the filtering process to the switch 1206.

The filter determination unit 1203 determines whether or not to perform deblocking filter processing on the target pixel, based on the pixel values of at least 1 peripheral pixel located in the periphery of the target pixel. The filter determination unit 1203 then outputs the determination result to the switch 1204 and the processing determination unit 1208.

When the filter determination unit 1203 determines that the deblocking filter process is to be performed on the target pixel, the switch 1204 outputs the image before the filter process acquired via the switch 1202 to the filter processor 1205. On the other hand, when the filter determination unit 1203 determines that the deblocking filter process is not to be performed on the target pixel, the switch 1204 outputs the image before the filter process acquired via the switch 1202 to the switch 1206.

When the image before the filter processing is acquired via the switches 1202 and 1204, the filter processing unit 1205 performs the deblocking filter processing having the filter characteristic determined by the filter characteristic determining unit 1207 on the target pixel. The filter processor 1205 outputs the pixel after the filter processing to the switch 1206.

The switch 1206 selectively outputs the pixels that have not been subjected to the deblocking filtering process and the pixels that have been subjected to the deblocking filtering process by the filtering process unit 1205, in accordance with the control performed by the process determination unit 1208.

The processing determination unit 1208 controls the switch 1206 based on the determination results of the boundary determination unit 1201 and the filter determination unit 1203, respectively. That is, the processing determination unit 1208 outputs the pixel subjected to the deblocking filtering processing from the switch 1206 when the boundary determination unit 1201 determines that the target pixel is present near the block boundary and the filter determination unit 1203 determines that the deblocking filtering processing is performed on the target pixel. In addition to the above, the processing determination unit 1208 outputs the pixels that have not been subjected to the deblocking filter processing from the switch 1206. By repeating such output of the pixels, the filtered image is output from the switch 1206.

Fig. 8 is a conceptual diagram illustrating an example of a deblocking filter having filter characteristics symmetrical with respect to a block boundary.

In the deblocking filter processing, for example, 1 of 2 deblocking filters having different characteristics, namely a strong filter and a weak filter, is selected using pixel values and the quantization parameter. In the strong filter, as shown in Fig. 8, when pixels p0 to p2 and pixels q0 to q2 exist across a block boundary, the pixel values of pixels q0 to q2 are changed to pixel values q'0 to q'2 by performing the operations shown in the following expressions, for example.

q’0=(p1+2×p0+2×q0+2×q1+q2+4)/8

q’1=(p0+q0+q1+q2+2)/4

q’2=(p0+q0+q1+3×q2+2×q3+4)/8

In the above expressions, p0 to p2 and q0 to q2 are the pixel values of pixels p0 to p2 and pixels q0 to q2, respectively. Further, q3 is the pixel value of the pixel q3 adjacent to pixel q2 on the side opposite to the block boundary. On the right-hand side of the above expressions, the coefficients by which the pixel values of the respective pixels are multiplied in the deblocking filter processing are the filter coefficients.

Further, in the deblocking filter processing, clipping processing may be performed so that the calculated pixel values do not change beyond a threshold. In this clipping processing, the pixel values calculated by the above expressions are clipped to "the pre-calculation pixel value ± 2 × the threshold" using a threshold determined based on the quantization parameter. This can prevent excessive smoothing.
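A minimal sketch of the strong-filter expressions and the clipping described above, for the q side of the boundary, is shown below; the threshold tc is assumed to be supplied from the quantization parameter, as described in the text.

# Minimal sketch of the strong-filter equations above for the q side of the
# boundary, followed by clipping to +/- 2 * tc around the original value.
def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def strong_filter_q(p0, p1, q0, q1, q2, q3, tc):
    q0_f = (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) // 8
    q1_f = (p0 + q0 + q1 + q2 + 2) // 4
    q2_f = (p0 + q0 + q1 + 3 * q2 + 2 * q3 + 4) // 8
    return (clip3(q0 - 2 * tc, q0 + 2 * tc, q0_f),
            clip3(q1 - 2 * tc, q1 + 2 * tc, q1_f),
            clip3(q2 - 2 * tc, q2 + 2 * tc, q2_f))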

Fig. 9 is a conceptual diagram for explaining a block boundary to which deblocking filtering processing is performed. Fig. 10 is a conceptual diagram showing an example of the Bs value.

The block boundary to which the deblocking filtering process is applied is, for example, a boundary of a PU (Prediction Unit) or a TU (Transform Unit) of an 8 × 8 pixel block as shown in fig. 9. The deblocking filtering process may be performed in units of 4 lines or 4 columns. First, for the block P and the block Q shown in fig. 9, Bs (Boundary Strength) values are determined as shown in fig. 10.

Based on the Bs value in Fig. 10, it is determined whether or not deblocking filter processing of different strength is performed even for block boundaries belonging to the same image. Deblocking filter processing for color difference signals is performed when the Bs value is 2. Deblocking filter processing for luminance signals is performed when the Bs value is 1 or more and a predetermined condition is satisfied. The predetermined condition may be set in advance. The determination condition for the Bs value is not limited to that shown in Fig. 10 and may be determined based on other parameters.

[ prediction processing units (Intra prediction unit, inter prediction unit, prediction control unit) ]

Fig. 11 is a flowchart showing an example of processing performed by the prediction processing unit of the encoding device 100. The prediction processing unit is configured by all or a part of the components of the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128.

The prediction processing unit generates a prediction image of the current block (step Sb _ 1). The prediction image is also called a prediction signal or a prediction block. The prediction signal includes, for example, an intra prediction signal or an inter prediction signal. Specifically, the prediction processing unit generates a predicted image of the current block using a reconstructed image that has been obtained by performing generation of a predicted block, generation of a difference block, generation of a coefficient block, restoration of the difference block, and generation of a decoded image block.

The reconstructed image may be, for example, an image of a reference picture, or an image of an encoded block within a current picture, which is a picture including the current block. The coded blocks within the current picture are for example neighbouring blocks of the current block.

Fig. 12 is a flowchart showing another example of the processing performed by the prediction processing unit of the encoding device 100.

The prediction processing unit generates a prediction image using the 1st scheme (step Sc_1a), a prediction image using the 2nd scheme (step Sc_1b), and a prediction image using the 3rd scheme (step Sc_1c). The 1st, 2nd, and 3rd schemes are mutually different schemes for generating a prediction image, and each may be, for example, an inter prediction scheme, an intra prediction scheme, or another prediction scheme. In these prediction schemes, the above-described reconstructed image may be used.

Next, the prediction processing unit selects 1 of the plurality of prediction images generated in steps Sc_1a, Sc_1b, and Sc_1c (step Sc_2). This selection of the prediction image, that is, the selection of the scheme or mode for obtaining the final prediction image, may be performed based on a cost calculated for each generated prediction image. Alternatively, the prediction image may be selected based on a parameter used in the encoding process. The encoding device 100 may encode information for identifying the selected prediction image, scheme, or mode into an encoded signal (also referred to as an encoded bit stream). The information may be, for example, a flag. Thus, the decoding apparatus can generate a prediction image in accordance with the scheme or mode selected by the encoding device 100 based on the information. In the example shown in Fig. 12, the prediction processing unit generates a prediction image in each scheme, and then selects one of the prediction images. However, before generating the prediction images, the prediction processing unit may select a scheme or mode based on a parameter used in the above-described encoding process, and generate a prediction image in accordance with that scheme or mode.

For example, the 1 st and 2 nd aspects may be intra prediction and inter prediction, respectively, and the prediction processing unit may select a final prediction image for the current block based on prediction images generated according to these prediction aspects.

Fig. 13 is a flowchart showing another example of the processing performed by the prediction processing unit of the encoding device 100.

First, the prediction processing unit generates a prediction image by intra prediction (step Sd _1a), and generates a prediction image by inter prediction (step Sd _1 b). The prediction image generated by intra prediction is also referred to as an intra prediction image, and the prediction image generated by inter prediction is also referred to as an inter prediction image.

Next, the prediction processing unit evaluates each of the intra prediction image and the inter prediction image (step Sd_2). A cost may be used in this evaluation. That is, the prediction processing unit calculates the cost C of each of the intra prediction image and the inter prediction image. The cost C can be calculated by a formula of the R-D optimization model, for example, C = D + λ × R. In this formula, D is the encoding distortion of the prediction image, and is represented by, for example, the sum of absolute differences between the pixel values of the current block and the pixel values of the prediction image. R is the generated code amount of the prediction image, specifically the code amount necessary for encoding the motion information and the like used to generate the prediction image. λ is, for example, a Lagrange undetermined multiplier.

Next, the prediction processing unit selects, as the final prediction image of the current block, the prediction image for which the smallest cost C was calculated, from among the intra prediction image and the inter prediction image (step Sd_3). That is, the scheme or mode used to generate the prediction image of the current block is selected.
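The selection in steps Sd_2 and Sd_3 can be sketched as follows; the distortion and rate values are placeholders, and the sketch only shows choosing the prediction image with the smallest cost C = D + λ × R.

# Minimal sketch of choosing between the intra and inter prediction images
# with the rate-distortion cost C = D + lambda * R described above.
# D and R are placeholders for the encoder's actual measurements.
def rd_cost(distortion_d, rate_r, lam):
    return distortion_d + lam * rate_r

candidates = {
    "intra": rd_cost(distortion_d=1200, rate_r=40, lam=10.0),
    "inter": rd_cost(distortion_d=900,  rate_r=80, lam=10.0),
}
best_mode = min(candidates, key=candidates.get)   # mode with the smallest cost C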

[ Intra prediction Unit ]

The intra prediction unit 124 performs intra prediction (also referred to as intra-picture prediction) of the current block with reference to a block within the current picture stored in the block memory 118, thereby generating a prediction signal (intra prediction signal). Specifically, the intra prediction unit 124 performs intra prediction by referring to samples (for example, luminance values and color difference values) of blocks adjacent to the current block, generates an intra prediction signal, and outputs the intra prediction signal to the prediction control unit 128.

For example, the intra prediction unit 124 performs intra prediction using 1 of a plurality of predetermined intra prediction modes. The plurality of intra prediction modes typically includes 1 or more non-directional prediction modes and a plurality of directional prediction modes. The predetermined plurality of modes may be defined in advance.

The 1 or more non-directional prediction modes include a Planar prediction mode and a DC prediction mode, for example, which are specified by the h.265/HEVC specification.

The plurality of directional prediction modes includes, for example, a 33-directional prediction mode specified by the h.265/HEVC specification. The plurality of directional prediction modes may include prediction modes in 32 directions (65 directional prediction modes in total) in addition to 33 directions. Fig. 14 is a conceptual diagram showing all 67 intra prediction modes (2 non-directional prediction modes and 65 directional prediction modes) that can be used for intra prediction. The solid arrows indicate 33 directions defined by the h.265/HEVC specification, and the dashed arrows indicate additional 32 directions (the 2 non-directional prediction modes are not shown in fig. 14).

In various processing examples, the luminance block may be referred to for intra prediction of the color difference block. That is, the color difference component of the current block may also be predicted based on the luminance component of the current block. Such intra-frame prediction is sometimes referred to as CCLM (cross-component linear model) prediction. Such an intra prediction mode (for example, referred to as a CCLM mode) of a color difference block that refers to a luminance block may be added as 1 intra prediction mode of the color difference block.
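As a hedged illustration of a cross-component linear model, the sketch below predicts the chroma block from the collocated reconstructed luma as a × luma + b; fitting a and b by least squares over neighboring reconstructed samples is an assumption made here for simplicity, and the luma downsampling needed for 4:2:0 content is omitted.

# Minimal sketch of CCLM-style prediction: chroma is predicted from the
# collocated reconstructed luma as pred_c = a * luma + b.  Fitting a and b
# by least squares over neighboring samples is an illustrative assumption.
import numpy as np

def cclm_params(neigh_luma, neigh_chroma):
    a, b = np.polyfit(neigh_luma, neigh_chroma, 1)   # least-squares line
    return a, b

def cclm_predict(rec_luma_block, a, b):
    return a * rec_luma_block + b

neigh_luma = np.array([100., 110., 120., 130.])
neigh_chroma = np.array([60., 64., 68., 72.])
a, b = cclm_params(neigh_luma, neigh_chroma)
pred_chroma = cclm_predict(np.full((4, 4), 115.0), a, b)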

The intra prediction unit 124 may correct the intra-predicted pixel value based on the gradient of the reference pixel in the horizontal/vertical direction. The intra prediction with such correction is sometimes called PDPC (position dependent intra prediction combination). Information indicating the presence or absence of application of PDPC (for example, referred to as a PDPC flag) is usually signaled on a CU level. The signaling of the information is not necessarily limited to the CU level, and may be at another level (for example, a sequence level, a picture level, a slice level, a tile level, or a CTU level).

[ interframe prediction part ]

The inter prediction unit 126 performs inter prediction (also referred to as inter-picture prediction) of the current block by referring to a reference picture that is stored in the frame memory 122 and is different from the current picture, thereby generating a prediction signal (inter prediction signal). Inter prediction is performed in units of a current block or a current sub-block (e.g., a 4 × 4 block) within the current block. For example, the inter prediction unit 126 performs motion search (motion estimation) for the current block or the current sub-block within the reference picture, and finds the reference block or sub-block that best matches the current block or current sub-block. The inter prediction unit 126 then obtains motion information (for example, a motion vector) that compensates for the motion or change from the reference block or sub-block to the current block or sub-block. The inter prediction unit 126 performs motion compensation (or motion prediction) based on the motion information, and generates an inter prediction signal of the current block or sub-block. The inter prediction unit 126 outputs the generated inter prediction signal to the prediction control unit 128.

The motion information used for motion compensation may be signaled as an inter prediction signal in various forms. For example, the motion vector may also be signaled. As another example, the difference between the motion vector and the predicted motion vector (motion vector predictor) may be signaled.

[ basic flow of inter-frame prediction ]

Fig. 15 is a flowchart showing an example of a basic flow of inter prediction.

The inter prediction unit 126 first generates a prediction image (steps Se _1 to Se _ 3). Next, the subtraction unit 104 generates a difference between the current block and the predicted image as a prediction residual (step Se _ 4).

Here, in generating the prediction image, the inter prediction unit 126 generates the prediction image by determining the motion vector (MV) of the current block (steps Se_1 and Se_2) and performing motion compensation (step Se_3). In determining the MV, the inter prediction unit 126 determines the MV by selecting a candidate motion vector (candidate MV) (step Se_1) and deriving the MV (step Se_2). The selection of candidate MVs is performed, for example, by selecting at least 1 candidate MV from a candidate MV list. In deriving the MV, the inter prediction unit 126 may select at least 1 candidate MV from among the at least 1 candidate MV and determine the selected at least 1 candidate MV as the MV of the current block. Alternatively, the inter prediction unit 126 may determine the MV of the current block by searching, for each of the selected at least 1 candidate MV, the region of the reference picture indicated by that candidate MV. This search of a region of the reference picture may be referred to as motion search (motion estimation).

In the above example, the steps Se _1 to Se _3 are performed by the inter prediction unit 126, but the processing of the step Se _1, the step Se _2, and the like may be performed by other components included in the encoding device 100.

[ procedure for deriving motion vector ]

Fig. 16 is a flowchart showing an example of motion vector derivation.

The inter prediction unit 126 derives the MV of the current block in a mode in which motion information (e.g., MV) is encoded. In this case, for example, the motion information is coded and signaled as a prediction parameter. That is, the encoded motion information is contained in the encoded signal (also referred to as encoded bitstream).

Alternatively, the inter prediction unit 126 derives the MV in a mode in which motion information is not encoded. In this case, motion information is not included in the encoded signal.

Here, the MV derivation modes include a normal inter mode, a merge mode, a FRUC mode, an affine mode, and the like, which will be described later. Among these modes, the modes in which motion information is encoded include the normal inter mode, the merge mode, and the affine mode (specifically, the affine inter mode and the affine merge mode). The motion information may include not only the MV but also predicted motion vector selection information described later. A mode in which motion information is not encoded is, for example, the FRUC mode. The inter prediction unit 126 selects a mode for deriving the MV of the current block from among these multiple modes, and derives the MV of the current block using the selected mode.

Fig. 17 is a flowchart showing another example of motion vector derivation.

In the mode in which the difference MV is encoded, the inter prediction section 126 derives the MV of the current block. In this case, the difference MV is coded and signaled as a prediction parameter, for example. That is, the encoded difference MV is contained in the encoded signal. The difference MV is the difference between the MV of the current block and its predicted MV.

Alternatively, the inter prediction unit 126 derives the MV in a mode in which the difference MV is not encoded. In this case, the coded difference MV is not included in the coded signal.

Here, as described above, the MV derivation modes include the normal inter mode, the merge mode, the FRUC mode, the affine mode, and the like, which will be described later. Among these modes, the mode for encoding the difference MV includes a normal inter-frame mode, an affine mode (specifically, an affine inter-frame mode), and the like. Note that the modes in which the difference MV is not encoded include a FRUC mode, a merge mode, and an affine mode (specifically, an affine merge mode). The inter prediction unit 126 selects a mode for deriving the MV of the current block from among these multiple modes, and derives the MV of the current block using the selected mode.

[ procedure for deriving motion vector ]

Fig. 18 is a flowchart showing another example of motion vector derivation. The inter prediction mode, which is a mode for deriving MVs, includes a plurality of modes, and is roughly divided into a mode in which a difference MV is encoded and a mode in which a difference motion vector is not encoded. The modes in which the difference MV is not encoded include a merge mode, a FRUC mode, and an affine mode (specifically, an affine merge mode). As will be described in detail later, the merge mode is a mode in which the MV of the current block is derived by selecting a motion vector from neighboring encoded blocks, and the FRUC mode is a mode in which the MV of the current block is derived by searching among encoded regions. The affine mode is a mode in which the motion vectors of the respective sub-blocks constituting the current block are derived by assuming affine transformation, and the MV of the current block is derived.

Specifically, as shown in the figure, when the inter prediction mode information indicates 0 (0 in Sf _1), the inter prediction unit 126 derives the motion vector (Sf _2) in the merge mode. When the inter prediction mode information indicates 1 (1 in Sf _1), the inter prediction unit 126 derives a motion vector (Sf _3) in the FRUC mode. When the inter-prediction mode information indicates 2 (2 in Sf _1), the inter-prediction unit 126 derives the motion vector (Sf _4) by the affine mode (specifically, the affine merge mode). When the inter prediction mode information indicates 3 (3 in Sf _1), the inter prediction unit 126 derives a motion vector (Sf _5) from a mode in which the difference MV is encoded (for example, a normal inter mode).

[ MV derivation > common interframe mode ]

The normal inter mode is an inter prediction mode in which the MV of the current block is derived from a region of a reference picture indicated by a candidate MV, based on a block similar to the image of the current block. In the normal inter mode, the difference MV is encoded.

Fig. 19 is a flowchart showing an example of inter prediction in the normal inter mode.

The inter prediction unit 126 first acquires a plurality of candidate MVs for the current block based on information such as MVs of a plurality of encoded blocks located temporally or spatially around the current block (step Sg _ 1). That is, the inter prediction unit 126 creates a candidate MV list.

Next, the inter prediction unit 126 extracts N (N is an integer equal to or greater than 2) candidate MVs from the plurality of candidate MVs obtained in step Sg _1 as predicted motion vector candidates (also referred to as predicted MV candidates) in a predetermined order of priority (step Sg _ 2). The priority order may be set in advance for each of the N candidate MVs.

Next, the inter prediction unit 126 selects 1 predicted motion vector candidate from the N predicted motion vector candidates as a predicted motion vector (also referred to as a predicted MV) of the current block (step Sg _ 3). At this time, the inter prediction unit 126 encodes predicted motion vector selection information for identifying the selected predicted motion vector into the stream. The stream is the above-described encoded signal or encoded bit stream.

Next, the inter prediction unit 126 refers to the encoded reference picture to derive the MV of the current block (step Sg _ 4). At this time, the inter prediction unit 126 also encodes the difference between the derived MV and the predicted motion vector as a difference MV into the stream. In addition, the encoded reference picture is a picture composed of a plurality of blocks reconstructed after encoding.

Finally, the inter prediction unit 126 performs motion compensation on the current block using the derived MV and the encoded reference picture, thereby generating a prediction image of the current block (step Sg _ 5). The predicted image is the inter prediction signal described above.

Information indicating an inter prediction mode (in the above example, a normal inter mode) used for generating a prediction image, which is included in the encoded signal, is encoded as a prediction parameter, for example.

In addition, the candidate MV list may be used in common with lists used in other modes. Further, the processing on the candidate MV list may also be applied to the processing on the list used in other modes. The processing on the candidate MV list is, for example, extraction or selection of a candidate MV from the candidate MV list, rearrangement of a candidate MV, deletion of a candidate MV, or the like.
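A minimal sketch of the difference-MV handling in the normal inter mode is given below; choosing the predicted MV that minimizes the absolute difference is a simplification used only for illustration, since the text leaves the selection criterion open.

# Minimal sketch of difference-MV handling in the normal inter mode: the
# encoder signals the chosen predicted MV and the difference MV, and the MV
# is reconstructed as predicted MV + difference MV.
def encode_mv(mv, mvp_candidates):
    # pick the predictor giving the cheapest difference (sum of absolute components)
    idx = min(range(len(mvp_candidates)),
              key=lambda i: abs(mv[0] - mvp_candidates[i][0]) + abs(mv[1] - mvp_candidates[i][1]))
    mvp = mvp_candidates[idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    return idx, mvd                          # both are written to the stream

def decode_mv(idx, mvd, mvp_candidates):
    mvp = mvp_candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

candidates = [(4, -2), (0, 0)]
idx, mvd = encode_mv((5, -1), candidates)
assert decode_mv(idx, mvd, candidates) == (5, -1)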

[ MV derivation > merge mode ]

The merge mode is an inter prediction mode that derives a candidate MV from the list of candidate MVs as the MV for the current block.

Fig. 20 is a flowchart showing an example of inter prediction by the merge mode.

The inter prediction unit 126 first obtains a plurality of candidate MVs for the current block based on information such as MVs of a plurality of encoded blocks located temporally or spatially around the current block (step Sh _ 1). That is, the inter prediction unit 126 creates a candidate MV list.

Next, the inter prediction unit 126 derives the MV of the current block by selecting 1 candidate MV from the plurality of candidate MVs obtained in step Sh _1 (step Sh _ 2). At this time, the inter prediction unit 126 encodes MV selection information for identifying the selected candidate MV into the stream.

Finally, the inter prediction unit 126 performs motion compensation on the current block using the derived MV and the encoded reference picture, thereby generating a prediction image of the current block (step Sh _ 3).

Information indicating an inter prediction mode (in the above example, the merge mode) used for generating a prediction image, which is included in the encoded signal, is encoded as a prediction parameter, for example.

Fig. 21 is a conceptual diagram for explaining an example of motion vector derivation processing of a current picture in the merge mode.

First, a predicted MV list in which candidates of predicted MVs are registered is generated. The candidates for the prediction MV include a spatial neighboring prediction MV which is an MV owned by a plurality of coded blocks located in the spatial periphery of the target block, a temporal neighboring prediction MV which is an MV owned by a block in the vicinity of the position of the target block in the coded reference picture, a combined prediction MV which is an MV generated by combining the spatial neighboring prediction MV and the MV value of the temporal neighboring prediction MV, and a zero prediction MV which is an MV having a value of zero.

Next, 1 predicted MV is selected from the plurality of predicted MVs registered in the predicted MV list, and the MV of the target block is determined.

Further, the variable length encoding unit encodes the prediction MV by describing, in the stream, merge _ idx which is a signal indicating which prediction MV has been selected.
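The list construction and selection above can be sketched as follows; duplicate handling, the list size, and the way the combined predicted MV is formed are all simplified assumptions for illustration.

# Minimal sketch of assembling a predicted-MV (merge candidate) list in the
# order described above (spatial, temporal, combined, zero MV) and selecting
# one entry by merge_idx.
def build_merge_list(spatial_mvs, temporal_mv, max_size=6):
    candidates = []
    for mv in spatial_mvs:                       # spatially neighbouring predicted MVs
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    if temporal_mv is not None and temporal_mv not in candidates:
        candidates.append(temporal_mv)           # temporally neighbouring predicted MV
    if len(candidates) >= 2:                     # one simple "combined" predicted MV
        combined = (candidates[0][0], candidates[1][1])
        if combined not in candidates:
            candidates.append(combined)
    if len(candidates) < max_size:
        candidates.append((0, 0))                # zero predicted MV
    return candidates[:max_size]

merge_list = build_merge_list([(3, 1), (3, 1), (-2, 0)], (1, 1))
merge_idx = 2
selected_mv = merge_list[merge_idx]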

The predicted MVs registered in the predicted MV list described with reference to Fig. 21 are examples; the number of predicted MVs may differ from the number in the figure, some of the types of predicted MVs in the figure may be omitted, or predicted MVs of types other than those in the figure may be added.

The final MV may be determined by performing DMVR (decoder motion vector refinement) processing, which will be described later, using the MV of the target block derived in the merge mode.

The predicted MV candidates are the above-described candidate MVs, and the predicted MV list is the above-described candidate MV list. The candidate MV list may also be referred to as a candidate list. Also, merge _ idx is MV selection information.

[ MV derivation > FRUC mode ]

Motion information may be derived on the decoding apparatus side without being signaled from the encoding apparatus side. For example, the merge mode defined by the H.265/HEVC specification may be used. Further, for example, motion information may be derived by performing motion search on the decoding apparatus side. In an embodiment, the decoding apparatus side performs motion search without using the pixel values of the current block.

Here, a mode in which the decoding apparatus side performs motion search will be described. The mode in which the motion search is performed on the decoding apparatus side may be referred to as a PMMVD (pattern matched motion vector derivation) mode or a FRUC (frame rate up-conversion) mode.

Fig. 22 shows an example of FRUC processing in the form of a flowchart. First, a list of a plurality of candidates each having a predicted Motion Vector (MV) (i.e., a candidate MV list, which may be common to a merge list) is generated with reference to a motion vector of an encoded block spatially or temporally adjacent to a current block (step Si _ 1). Next, the best candidate MV is selected from among the plurality of candidate MVs registered in the candidate MV list (step Si _ 2). For example, the evaluation value of each of the candidate MVs included in the candidate MV list is calculated, and 1 candidate MV is selected based on the evaluation values. And, based on the selected candidate motion vector, a motion vector for the current block is derived (step Si _ 4). Specifically, for example, the motion vector of the selected candidate (best candidate MV) is derived as it is as a motion vector for the current block. Furthermore, the motion vector for the current block may also be derived, for example, by pattern matching in a peripheral region of a position within the reference picture corresponding to the selected candidate motion vector. That is, pattern matching using a reference picture and search for an evaluation value may be performed for a region around the optimal candidate MV, and if there is an MV having a better evaluation value, the optimal candidate MV may be updated to the MV and set as the final MV of the current block. The processing to update to the MV having a better evaluation value may not be performed.

Finally, the inter prediction unit 126 performs motion compensation on the current block using the derived MV and the encoded reference picture, thereby generating a prediction image of the current block (step Si _ 5).

When the processing is performed in units of sub-blocks, the same processing may be performed.

The evaluation value may be calculated by various methods. For example, a reconstructed image of a region in the reference picture corresponding to the motion vector is compared with a reconstructed image of a predetermined region (this region may be, for example, a region of another reference picture or a region of an adjacent block in the current picture, as described below). The predetermined region may be set in advance.

Further, the difference between the pixel values of the 2 reconstructed images may be calculated and used for the evaluation value of the motion vector. In addition, the evaluation value may be calculated using information other than the difference.
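A minimal sketch of ranking candidate MVs by such an evaluation value is shown below, using the sum of absolute differences as the difference measure; fetch_ref_region is a hypothetical helper, assumed to return the reconstructed reference region indicated by a candidate MV.

# Minimal sketch of the evaluation value used to rank candidate MVs: the
# difference between the reconstructed reference region pointed to by the
# candidate MV and a reconstructed predetermined region, measured here as a
# sum of absolute differences (other measures may also be used).
import numpy as np

def evaluation_cost(ref_region, predetermined_region):
    # smaller difference -> better evaluation value
    return np.abs(ref_region.astype(np.int64) - predetermined_region.astype(np.int64)).sum()

def select_best_candidate(candidate_mvs, fetch_ref_region, predetermined_region):
    # fetch_ref_region(mv) is a hypothetical helper returning the region of the
    # reference picture indicated by mv
    return min(candidate_mvs,
               key=lambda mv: evaluation_cost(fetch_ref_region(mv), predetermined_region))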

Next, an example of pattern matching will be described in detail. First, 1 candidate MV included in a candidate MV list (e.g., a merge list) is selected as a starting point for the search by pattern matching. For example, the 1st pattern matching or the 2nd pattern matching may be used as the pattern matching. The 1st pattern matching and the 2nd pattern matching may be referred to as bidirectional matching (bilateral matching) and template matching, respectively.

[ MV derivation > FRUC > bidirectional matching ]

In the 1 st pattern matching, pattern matching is performed between 2 blocks along a motion track (motion track) of the current block within different 2 reference pictures. Therefore, in the 1 st pattern matching, as a predetermined region used for the calculation of the evaluation value of the candidate described above, a region in another reference picture along the motion trajectory of the current block is used. The predetermined region may be set in advance.

Fig. 23 is a conceptual diagram for explaining an example of the 1st pattern matching (bidirectional matching) between 2 blocks in 2 reference pictures along a motion trajectory. As shown in Fig. 23, in the 1st pattern matching, 2 motion vectors (MV0, MV1) are derived by searching for the best-matching pair among pairs of 2 blocks which are within the 2 different reference pictures (Ref0, Ref1) and which lie along the motion trajectory of the current block (Cur block). Specifically, for the current block, the difference between the reconstructed image at the position in the 1st encoded reference picture (Ref0) specified by the candidate MV and the reconstructed image at the position in the 2nd encoded reference picture (Ref1) specified by a symmetric MV obtained by scaling the candidate MV by the display time interval is derived, and the evaluation value is calculated using the obtained difference value. The candidate MV having the best evaluation value among the plurality of candidate MVs can be selected as the final MV, which can bring about a better result.

Under the assumption of a continuous motion trajectory, the motion vectors (MV0, MV1) indicating the 2 reference blocks are proportional with respect to the temporal distance (TD0, TD1) between the current picture (Cur Pic) and the 2 reference pictures (Ref0, Ref 1). For example, when the current picture is temporally located between 2 reference pictures and the temporal distances from the current picture to the 2 reference pictures are equal, a mirror-symmetric bidirectional motion vector is derived by the 1 st pattern matching.
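Under the continuous-motion-trajectory assumption above, the paired MV can be obtained by scaling the candidate MV by the ratio of the signed temporal distances; the sketch below uses picture order counts as the temporal distances, which is an assumption made here for illustration.

# Minimal sketch of the scaling used in bidirectional (bilateral) matching:
# the MV toward Ref1 is the candidate MV toward Ref0 scaled by the ratio of
# the signed temporal distances TD1/TD0 (taken here as POC differences).
def scale_mv(mv0, poc_cur, poc_ref0, poc_ref1):
    td0 = poc_cur - poc_ref0
    td1 = poc_cur - poc_ref1
    scale = td1 / td0
    return (mv0[0] * scale, mv0[1] * scale)

# Current picture midway between the two references gives a mirror-symmetric pair.
mv1 = scale_mv((8, -4), poc_cur=4, poc_ref0=2, poc_ref1=6)   # -> (-8.0, 4.0)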

[ MV derivation > FRUC > template matching ]

In the 2 nd pattern matching (template matching), pattern matching is performed between a template within the current picture (a block adjacent to the current block within the current picture (e.g., an upper and/or left adjacent block)) and a block within the reference picture. Thus, in the 2 nd pattern matching, as a predetermined region used for calculation of the evaluation value of the candidate described above, a block adjacent to the current block within the current picture is used.

Fig. 24 is a conceptual diagram for explaining an example of pattern matching (template matching) between a template in the current picture and a block in a reference picture. As shown in Fig. 24, in the 2nd pattern matching, the motion vector of the current block is derived by searching the reference picture (Ref0) for the block that best matches the block adjacent to the current block (Cur block) in the current picture (Cur Pic). Specifically, the difference between the reconstructed image of the encoded regions adjacent to the left and/or above the current block and the reconstructed image at the equivalent position in the encoded reference picture (Ref0) specified by the candidate MV is derived, and the evaluation value is calculated using the obtained difference value; the candidate MV having the best evaluation value among the plurality of candidate MVs can then be selected as the best candidate MV.

Information indicating whether such FRUC mode is applied (for example, referred to as a FRUC flag) may also be signaled on the CU level. In addition, in the case where the FRUC mode is applied (for example, in the case where the FRUC flag is true), information indicating the applicable pattern matching method (1 st pattern matching or 2 nd pattern matching) may be signaled at the CU level. The signaling of such information is not necessarily limited to the CU level, and may be at another level (for example, a sequence level, a picture level, a slice level, a tile level, a CTU level, or a subblock level).

[ MV derivation > affine model ]

Next, an affine mode in which motion vectors are derived in sub-block units based on motion vectors of a plurality of adjacent blocks will be described. This mode may be referred to as an affine motion compensation prediction (affine motion compensation prediction) mode.

Fig. 25A is a conceptual diagram for explaining an example of deriving motion vectors in units of sub-blocks based on the motion vectors of a plurality of adjacent blocks. In Fig. 25A, the current block includes 16 4 × 4 sub-blocks. Here, the motion vector v0 of the upper-left corner control point of the current block is derived based on the motion vectors of the adjacent blocks, and likewise, the motion vector v1 of the upper-right corner control point of the current block is derived based on the motion vectors of the adjacent sub-blocks. Then, by projecting the 2 motion vectors v0 and v1 according to the following expression (1A), the motion vector (vx, vy) of each sub-block in the current block may be derived.

[ numerical formula 1]

Here, x and y represent the horizontal position and the vertical position of the subblock, respectively, and w represents a predetermined weight coefficient. The predetermined weight coefficient may be determined in advance.

Information representing such affine patterns (e.g., referred to as affine flags) may also be signaled at the CU level. The signaling of the information indicating the affine pattern is not necessarily limited to the CU level, and may be at another level (for example, a sequence level, a picture level, a slice level, a tile level, a CTU level, or a subblock level).

In addition, such affine patterns may include several patterns in which the methods of deriving the motion vectors of the upper left and upper right corner control points are different. For example, among the affine modes, there are 2 modes of an affine inter-frame (also referred to as affine normal inter-frame) mode and an affine merge mode.

[ MV derivation > affine model ]

Fig. 25B is a conceptual diagram for explaining an example of deriving a motion vector for each sub-block in the affine mode having 3 control points. In Fig. 25B, the current block includes 16 4 × 4 sub-blocks. Here, the motion vector v0 of the upper-left corner control point of the current block is derived based on the motion vectors of the adjacent blocks, likewise, the motion vector v1 of the upper-right corner control point of the current block is derived based on the motion vectors of the adjacent blocks, and the motion vector v2 of the lower-left corner control point of the current block is derived based on the motion vectors of the adjacent blocks. Then, by projecting the 3 motion vectors v0, v1, and v2 according to the following expression (1B), the motion vector (vx, vy) of each sub-block in the current block may be derived.

[ numerical formula 2]

Here, x and y denote a horizontal position and a vertical position of the center of the sub-block, respectively, w denotes a width of the current block, and h denotes a height of the current block.
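Expressions (1A) and (1B) themselves are not reproduced in this text, so the sketch below uses the commonly used 4-parameter (2 control points) and 6-parameter (3 control points) affine models as an assumption; (x, y) is the sub-block position, w and h are the block width and height, and using the block width as the weighting factor in the 2-control-point case is part of that assumption.

# Minimal sketch of deriving per-sub-block motion vectors from the control-
# point motion vectors, using the widely used 4-parameter and 6-parameter
# affine models (assumed here for illustration).
def affine_mv_2cp(v0, v1, x, y, w):
    vx = v0[0] + (v1[0] - v0[0]) * x / w - (v1[1] - v0[1]) * y / w
    vy = v0[1] + (v1[1] - v0[1]) * x / w + (v1[0] - v0[0]) * y / w
    return (vx, vy)

def affine_mv_3cp(v0, v1, v2, x, y, w, h):
    vx = v0[0] + (v1[0] - v0[0]) * x / w + (v2[0] - v0[0]) * y / h
    vy = v0[1] + (v1[1] - v0[1]) * x / w + (v2[1] - v0[1]) * y / h
    return (vx, vy)

# MV of the sub-block at position (4, 8) in a 16x16 block with 2 control points
sub_mv = affine_mv_2cp(v0=(2.0, 0.0), v1=(4.0, 1.0), x=4, y=8, w=16)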

Affine modes with different numbers of control points (e.g., 2 and 3) may be switched and signaled at the CU level. Information indicating the number of control points of the affine mode used at the CU level may also be signaled at another level (for example, the sequence level, picture level, slice level, tile level, CTU level, or sub-block level).

In addition, the affine mode having 3 control points may include several modes in which the derivation methods of the motion vectors of the upper-left, upper-right, and lower-left corner control points are different. For example, among the affine modes, there are 2 modes of an affine inter-frame (also referred to as affine normal inter-frame) mode and an affine merge mode.

[ MV derivation > affine merging mode ]

Fig. 26A, 26B, and 26C are conceptual diagrams for explaining the affine merging mode.

In the affine merge mode, as shown in Fig. 26A, for example, the predicted motion vectors of the respective control points of the current block are calculated based on a plurality of motion vectors corresponding to a block encoded in the affine mode among the encoded block A (left), block B (top), block C (top right), block D (bottom left), and block E (top left) adjacent to the current block. Specifically, the encoded block A (left), block B (top), block C (top right), block D (bottom left), and block E (top left) are examined in this order, and the first valid block encoded in the affine mode is determined. The predicted motion vectors of the control points of the current block are calculated based on the plurality of motion vectors corresponding to the determined block.

For example, as shown in Fig. 26B, when the block A adjacent to the left of the current block has been encoded in the affine mode having 2 control points, motion vectors v3 and v4 projected at the positions of the upper-left corner and the upper-right corner of the encoded block including block A are derived. Then, based on the derived motion vectors v3 and v4, the predicted motion vector v0 of the upper-left corner control point of the current block and the predicted motion vector v1 of the upper-right corner control point are calculated.

For example, as shown in Fig. 26C, when the block A adjacent to the left of the current block has been encoded in the affine mode having 3 control points, motion vectors v3, v4, and v5 projected at the positions of the upper-left corner, the upper-right corner, and the lower-left corner of the encoded block including block A are derived. Then, based on the derived motion vectors v3, v4, and v5, the predicted motion vector v0 of the upper-left corner control point of the current block, the predicted motion vector v1 of the upper-right corner control point, and the predicted motion vector v2 of the lower-left corner control point are calculated.

The predicted motion vector derivation method may be used to derive a predicted motion vector for each control point of the current block in step Sj _1 of fig. 29, which will be described later.

Fig. 27 is a flowchart showing an example of the affine merging mode.

In the affine merge mode, as shown in the figure, first, the inter prediction unit 126 derives the prediction MVs of the control points of the current block (step Sk _ 1). The control points are points at the upper left and upper right corners of the current block as shown in fig. 25A, or points at the upper left, upper right, and lower left corners of the current block as shown in fig. 25B.

That is, as shown in Fig. 26A, the inter prediction unit 126 examines the encoded block A (left), block B (top), block C (top right), block D (bottom left), and block E (top left) in that order, and determines the first valid block encoded in the affine mode.

When block A has been determined and block A has 2 control points, the inter prediction unit 126, as shown in Fig. 26B, calculates the motion vector v0 of the upper-left corner control point of the current block and the motion vector v1 of the upper-right corner control point from the motion vectors v3 and v4 of the upper-left corner and the upper-right corner of the encoded block including block A. For example, the inter prediction unit 126 projects the motion vectors v3 and v4 of the upper-left corner and the upper-right corner of the encoded block onto the current block, thereby calculating the predicted motion vector v0 of the upper-left corner control point of the current block and the predicted motion vector v1 of the upper-right corner control point.

Alternatively, when block A has been determined and block A has 3 control points, the inter prediction unit 126, as shown in Fig. 26C, calculates the motion vector v0 of the upper-left corner control point of the current block, the motion vector v1 of the upper-right corner control point, and the motion vector v2 of the lower-left corner control point from the motion vectors v3, v4, and v5 of the upper-left corner, the upper-right corner, and the lower-left corner of the encoded block including block A. For example, the inter prediction unit 126 projects the motion vectors v3, v4, and v5 of the upper-left corner, the upper-right corner, and the lower-left corner of the encoded block onto the current block, thereby calculating the predicted motion vector v0 of the upper-left corner control point of the current block, the predicted motion vector v1 of the upper-right corner control point, and the predicted motion vector v2 of the lower-left corner control point.

Next, the inter prediction unit 126 performs motion compensation on each of the plurality of sub-blocks included in the current block. That is, for each of the plurality of sub-blocks, the inter prediction unit 126 calculates the motion vector of the sub-block as an affine MV using the 2 predicted motion vectors v0 and v1 and the above expression (1A), or using the 3 predicted motion vectors v0, v1, and v2 and the above expression (1B) (step Sk_2). Then, the inter prediction unit 126 performs motion compensation on the sub-block using those affine MVs and the encoded reference picture (step Sk_3). As a result, the current block is motion-compensated, and a prediction image of the current block is generated.

[ MV derivation > affine inter mode ]

Fig. 28A is a conceptual diagram for explaining the affine inter mode having 2 control points.

In this affine inter mode, as shown in Fig. 28A, a motion vector selected from the motion vectors of the encoded blocks A, B, and C adjacent to the current block is used as the predicted motion vector v0 of the upper-left corner control point of the current block. Likewise, a motion vector selected from the motion vectors of the encoded blocks D and E adjacent to the current block is used as the predicted motion vector v1 of the upper-right corner control point of the current block.

Fig. 28B is a conceptual diagram for explaining the affine inter mode having 3 control points.

In this affine inter mode, as shown in Fig. 28B, a motion vector selected from the motion vectors of the encoded blocks A, B, and C adjacent to the current block is used as the predicted motion vector v0 of the upper-left corner control point of the current block. Likewise, a motion vector selected from the motion vectors of the encoded blocks D and E adjacent to the current block is used as the predicted motion vector v1 of the upper-right corner control point of the current block. Further, a motion vector selected from the motion vectors of the encoded blocks F and G adjacent to the current block is used as the predicted motion vector v2 of the lower-left corner control point of the current block.

Fig. 29 is a flowchart showing an example of the affine inter mode.

As shown in the figure, in the affine inter mode, the inter prediction unit 126 first derives the predicted MVs (v0, v1) or (v0, v1, v2) of the 2 or 3 control points of the current block (step Sj _ 1). The control points are points at the upper left corner, the upper right corner, or the lower left corner of the current block, as shown in fig. 25A or fig. 25B.

That is, the inter prediction unit 126 selects a motion vector of one of the encoded blocks in the vicinity of each control point of the current block shown in fig. 28A or fig. 28B, and thereby derives the predicted motion vectors (v0, v1) or (v0, v1, v2) of the control points of the current block. At this time, the inter prediction unit 126 encodes, into the stream, predicted motion vector selection information for identifying the selected 2 motion vectors.

For example, the inter prediction unit 126 may determine, by cost evaluation or the like, from which of the encoded blocks adjacent to the current block the motion vector selected as the predicted motion vector of the control point is taken, and may describe, in the bit stream, a flag indicating which predicted motion vector has been selected.

Next, the inter prediction unit 126 performs motion search (steps Sj _3 and Sj _ 4) while updating the predicted motion vectors selected or derived in step Sj _1 (step Sj _ 2). That is, the inter prediction unit 126 calculates, as an affine MV, the motion vector of each sub-block corresponding to the updated predicted motion vectors, using the above expression (1A) or expression (1B) (step Sj _ 3). Then, the inter prediction unit 126 performs motion compensation on each sub-block using the affine MVs and the encoded reference picture (step Sj _ 4). As a result, the inter prediction unit 126 determines, in the motion search loop, the predicted motion vector with the smallest cost, for example, as the motion vector of the control point (step Sj _ 5). At this time, the inter prediction unit 126 further encodes, into the stream, the difference between the determined MV and each predicted motion vector as a difference MV.

Finally, the inter prediction unit 126 performs motion compensation on the current block using the determined MV and the encoded reference picture, thereby generating a predicted image of the current block (step Sj _ 6).

[ MV derivation > affine inter mode ]

When affine modes with different numbers of control points (for example, 2 and 3) are switched and signaled at the CU level, the number of control points may differ between the encoded block and the current block. Fig. 30A and 30B are conceptual diagrams for explaining a method of deriving the predicted motion vectors of the control points when the numbers of control points in the encoded block and the current block differ.

For example, as shown in fig. 30A, when the current block has 3 control points at the upper left corner, the upper right corner, and the lower left corner, and the block a adjacent to the left of the current block has been encoded in the affine mode having 2 control points, the motion vectors v3 and v4 projected at the positions of the upper left corner and the upper right corner of the encoded block including the block a are derived. Then, the predicted motion vector v0 of the control point at the upper left corner and the predicted motion vector v1 of the control point at the upper right corner of the current block are calculated from the derived motion vectors v3 and v4. Further, the predicted motion vector v2 of the control point at the lower left corner is calculated from the derived motion vectors v0 and v1.

For example, as shown in fig. 30B, when the current block has 2 control points at the upper left corner and the upper right corner, and the block a adjacent to the left of the current block has been encoded in the affine mode having 3 control points, the motion vectors v3, v4, and v5 projected at the positions of the upper left corner, the upper right corner, and the lower left corner of the encoded block including the block a are derived. Then, the predicted motion vector v0 of the control point at the upper left corner and the predicted motion vector v1 of the control point at the upper right corner of the current block are calculated from the derived motion vectors v3, v4, and v5.

This predicted motion vector derivation method may also be used to derive the predicted motion vectors of the control points of the current block in step Sj _1 of fig. 29.

[ MV derivation > DMVR ]

Fig. 31A is a flowchart showing a relationship between the merge mode and the DMVR.

The inter prediction section 126 derives a motion vector of the current block in the merge mode (step Sl _ 1). Next, the inter prediction unit 126 determines whether or not to perform motion search, which is search for a motion vector (step Sl _ 2). Here, if it is determined that the motion search is not performed (no in step Sl _2), the inter prediction unit 126 determines the motion vector derived in step Sl _1 as the final motion vector for the current block (step Sl _ 4). That is, in this case, the motion vector of the current block is decided in the merge mode.

On the other hand, if it is determined in step Sl _2 that motion search is to be performed (yes in step Sl _ 2), the inter prediction unit 126 derives the final motion vector for the current block by searching the peripheral region of the reference picture indicated by the motion vector derived in step Sl _1 (step Sl _ 3). That is, in this case, the motion vector of the current block is determined with the DMVR.

Fig. 31B is a conceptual diagram for explaining an example of DMVR processing for determining an MV.

First, the best MVP set for the current block (for example, in the merge mode) is set as the candidate MV. Then, according to the candidate MV (L0), a reference pixel is specified from the 1 st reference picture (L0), which is an encoded picture in the L0 direction. Similarly, according to the candidate MV (L1), a reference pixel is specified from the 2 nd reference picture (L1), which is an encoded picture in the L1 direction. A template is generated by taking the average of these reference pixels.

Next, using the template, the peripheral regions of the candidate MVs in the 1 st reference picture (L0) and the 2 nd reference picture (L1) are searched, and the MV with the lowest cost is determined as the final MV. The cost value may be calculated using, for example, the difference between each pixel value of the template and each pixel value of the search region, the candidate MV value, and the like.
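
As an illustration of this template-matching refinement, the following Python sketch builds the template by averaging the two reference blocks and then searches a small integer-pel window around each candidate position using a SAD cost. The search window, integer-pel accuracy, SAD cost, and all names are assumptions for illustration only.

```python
import numpy as np

# Illustrative DMVR-style refinement: template = average of the L0/L1 reference
# blocks, then search around each candidate position for the lowest SAD cost.
# Integer-pel search and the SAD cost are simplifying assumptions.

def block(ref, top, left, size):
    return ref[top:top + size, left:left + size].astype(np.int64)

def dmvr_refine(ref0, ref1, pos0, pos1, size=8, radius=2):
    template = (block(ref0, *pos0, size) + block(ref1, *pos1, size) + 1) >> 1

    def search(ref, pos):
        best_off, best_cost = (0, 0), None
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                cand = block(ref, pos[0] + dy, pos[1] + dx, size)
                cost = int(np.abs(template - cand).sum())  # SAD against the template
                if best_cost is None or cost < best_cost:
                    best_off, best_cost = (dy, dx), cost
        return best_off

    # Offsets that refine the candidate MV in the L0 and L1 reference pictures.
    return search(ref0, pos0), search(ref1, pos1)

ref0 = np.random.randint(0, 256, (64, 64))
ref1 = np.random.randint(0, 256, (64, 64))
off0, off1 = dmvr_refine(ref0, ref1, (10, 10), (12, 8))
```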

Typically, the configuration and operation of the processing described here are basically common to the encoding device and the decoding device described later.

Even if the processing example itself described here is not used, any processing may be used as long as the processing can extract the final MV by searching the periphery of the candidate MV.

[ motion Compensation > BIO/OBMC ]

In motion compensation, there is a mode in which a prediction image is generated and corrected. This mode is, for example, BIO and OBMC described later.

Fig. 32 is a flowchart showing an example of generation of a prediction image.

The inter prediction unit 126 generates a prediction image (step Sm _1), and corrects the prediction image in, for example, one of the modes described above (step Sm _ 2).

Fig. 33 is a flowchart showing another example of generation of a prediction image.

The inter prediction unit 126 determines a motion vector of the current block (step Sn _ 1). Next, the inter prediction unit 126 generates a prediction image (step Sn _2), and determines whether or not to perform correction processing (step Sn _ 3). Here, if it is determined that the correction process is performed (yes in step Sn _3), the inter prediction unit 126 corrects the predicted image to generate a final predicted image (step Sn _ 4). On the other hand, if the inter prediction unit 126 determines not to perform the correction processing (no in step Sn _3), it outputs the predicted image as the final predicted image without correction (step Sn _ 5).

In addition, there is a mode of correcting the luminance when generating a prediction image in motion compensation. This mode is, for example, LIC described later.

Fig. 34 is a flowchart showing another example of generation of a prediction image.

The inter prediction unit 126 derives a motion vector of the current block (step So _ 1). Next, the inter prediction unit 126 determines whether or not to perform the luminance correction processing (step So _ 2). Here, if it is determined that the brightness correction process is performed (yes in step So _2), the inter-frame prediction unit 126 generates a prediction image while performing the brightness correction (step So _ 3). That is, a prediction image is generated by LIC. On the other hand, if the inter-frame prediction unit 126 determines not to perform the luminance correction process (no in step So _2), it generates a prediction image by normal motion compensation without performing the luminance correction (step So _ 4).

[ motion Compensation > OBMC ]

The inter prediction signal may also be generated using not only motion information of the current block obtained through motion search but also motion information of neighboring blocks. Specifically, the inter prediction signal may be generated in units of sub blocks within the current block by adding a prediction signal based on motion information obtained by motion search (in the reference picture) and a prediction signal based on motion information of an adjacent block (in the current picture) by weighting. Such inter-frame prediction (motion compensation) may be referred to as OBMC (overlapped block motion compensation).

In the OBMC mode, information indicating the size of a sub-block used for OBMC (for example, referred to as an OBMC block size) may be signaled at a sequence level. Further, information indicating whether or not the OBMC mode is applied (for example, referred to as an OBMC flag) may be signaled on the CU level. The level of signaling of such information is not necessarily limited to the sequence level and CU level, and may be other levels (for example, picture level, slice level, tile level, CTU level, or subblock level).

An example of the OBMC mode will be described more specifically. Fig. 35 and 36 are a flowchart and a conceptual diagram for explaining an outline of the predicted image correction processing by the OBMC processing.

First, as shown in fig. 36, a prediction image (Pred) based on normal motion compensation is acquired using the motion vector (MV) assigned to the processing target (current) block. In fig. 36, the arrow "MV" points to a reference picture and indicates what the current block of the current picture refers to in order to obtain the prediction image.

Next, the motion vector (MV _ L) derived for the encoded left neighboring block is applied to the block to be encoded (reused), and a predicted image (Pred _ L) is obtained. The motion vector (MV _ L) is represented by an arrow "MV _ L" indicating a reference picture from the current block. Then, the 1 st correction of the predicted image is performed by superimposing the 2 predicted images Pred and Pred _ L. This has the effect of blending the boundaries between adjacent blocks.

Similarly, the motion vector (MV _ U) derived for the encoded upper neighboring block is applied to (reused for) the block to be encoded, and a prediction image (Pred _ U) is obtained. The motion vector (MV _ U) is represented by the arrow "MV _ U" pointing from the current block to a reference picture. Then, the 2 nd correction of the prediction image is performed by superimposing the prediction image Pred _ U on the prediction image after the 1 st correction (for example, Pred and Pred _ L). This has the effect of blending the boundaries between adjacent blocks. The prediction image obtained by the 2 nd correction is the final prediction image of the current block, in which the boundaries with the adjacent blocks have been blended (smoothed).
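
A minimal sketch of this two-pass superposition is shown below. The 4-sample overlap width and the blending weights are illustrative assumptions and are not taken from this description.

```python
import numpy as np

# Two-pass OBMC-style blending: Pred is the normal prediction, Pred_L / Pred_U
# reuse the left / upper neighbour's MV.  Overlap width and weights are
# illustrative assumptions.

def obmc_blend(pred, pred_l, pred_u, overlap=4):
    out = pred.astype(np.float64).copy()
    w = [0.75, 0.8125, 0.875, 0.9375]          # weight given to 'out' per line

    # 1st correction: blend with Pred_L near the left boundary (columns).
    for i in range(overlap):
        out[:, i] = w[i] * out[:, i] + (1 - w[i]) * pred_l[:, i]

    # 2nd correction: blend with Pred_U near the upper boundary (rows).
    for i in range(overlap):
        out[i, :] = w[i] * out[i, :] + (1 - w[i]) * pred_u[i, :]

    return np.rint(out).astype(pred.dtype)

pred   = np.full((8, 8), 100, dtype=np.int32)  # normal motion-compensated prediction
pred_l = np.full((8, 8), 120, dtype=np.int32)  # prediction reusing MV_L
pred_u = np.full((8, 8),  80, dtype=np.int32)  # prediction reusing MV_U
final  = obmc_blend(pred, pred_l, pred_u)
```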

In addition, although the above example is a 2-pass correction method using the left and upper adjacent blocks, the correction method may be a correction method using 3 or more passes that also use the right and/or lower adjacent blocks.

The region to be superimposed may be not the entire pixel region of the block but only a partial region near the block boundary.

Here, the OBMC prediction image correction processing for obtaining 1 prediction image Pred by superimposing the additional prediction images Pred _ L and Pred _ U, based on 1 reference picture, has been described. However, when the prediction image is corrected based on a plurality of reference pictures, the same processing may be applied to each of the plurality of reference pictures. In such a case, after corrected prediction images are obtained from the respective reference pictures by performing OBMC image correction based on the plurality of reference pictures, the obtained plurality of corrected prediction images are further superimposed to obtain the final prediction image.

In OBMC, the unit of the target block may be a prediction block unit or a sub-block unit obtained by dividing a prediction block into further blocks.

As a method of determining whether or not to apply the OBMC processing, there is, for example, a method using an OBMC _ flag, which is a signal indicating whether or not to apply the OBMC processing. As a specific example, the encoding device may determine whether or not the target block belongs to a region with complex motion. If the block belongs to a region with complex motion, the encoding device sets the value 1 as the OBMC _ flag and performs encoding by applying the OBMC processing; if the block does not belong to such a region, the encoding device sets the value 0 as the OBMC _ flag and encodes the block without applying the OBMC processing. On the other hand, the decoding device decodes the OBMC _ flag described in the stream (for example, a compressed sequence), and performs decoding while switching whether or not to apply the OBMC processing according to the decoded value.

In the above example, the inter prediction unit 126 generates 1 rectangular prediction image for a rectangular current block. However, the inter prediction unit 126 may generate a plurality of prediction images having different shapes from a rectangle for the rectangular current block, and combine the prediction images to generate a final rectangular prediction image. The shape other than rectangular may be triangular, for example.

Fig. 37 is a conceptual diagram for explaining generation of a predicted image of 2 triangles.

The inter prediction unit 126 performs motion compensation on the 1 st partition of the triangle in the current block using the 1 st MV of the 1 st partition, thereby generating a triangle prediction image. Similarly, the inter prediction unit 126 performs motion compensation on the 2 nd partition of the triangle in the current block using the 2 nd MV of the 2 nd partition, thereby generating a triangular prediction image. The inter prediction unit 126 combines these prediction images to generate a prediction image having the same rectangular shape as the current block.
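
The combination of the two triangular predictions can be sketched as follows. The hard split along the diagonal is an assumption made for simplicity; an actual implementation would typically blend the two predictions with weights near the diagonal boundary.

```python
import numpy as np

# Combine two triangular predictions into one rectangular prediction image.
# The hard diagonal split is an assumption; real codecs usually blend along
# the boundary.

def combine_triangle_predictions(pred1, pred2):
    h, w = pred1.shape
    out = pred2.copy()
    for y in range(h):
        for x in range(w):
            if x * h < y * w:          # lower-left triangle -> 1st partition
                out[y, x] = pred1[y, x]
    return out

pred1 = np.full((8, 8),  90)           # prediction using the 1st partition's MV
pred2 = np.full((8, 8), 110)           # prediction using the 2nd partition's MV
rect_pred = combine_triangle_predictions(pred1, pred2)
```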

In the example shown in fig. 37, the 1 st and 2 nd partitions are triangular, but may be trapezoidal or may have different shapes. Further, in the example shown in fig. 37, the current block is constituted by 2 partitions, but may be constituted by 3 or more partitions.

The 1 st partition and the 2 nd partition may overlap. That is, the 1 st partition and the 2 nd partition may include the same pixel region. In this case, the prediction image of the current block may be generated using the prediction image in the 1 st partition and the prediction image in the 2 nd partition.

In this example, the prediction images of both of the 2 partitions are generated by inter prediction, but the prediction image of at least 1 partition may be generated by intra prediction.

[ motion Compensation > BIO ]

Next, a method of deriving a motion vector will be described. First, a mode of deriving a motion vector based on a model assuming constant-velocity linear motion will be described. This mode may be called a BIO (bi-directional optical flow) mode.

Fig. 38 is a conceptual diagram for explaining a model assuming constant-velocity linear motion. In fig. 38, (vx, vy) represents a velocity vector, and τ 0 and τ 1 represent temporal distances between the current picture (Cur Pic) and 2 reference pictures (Ref0 and Ref1), respectively. (MVx0, MVy0) represents a motion vector corresponding to the reference picture Ref0, and (MVx1, MVy1) represents a motion vector corresponding to the reference picture Ref 1.

At this time, under the assumption of constant-velocity linear motion represented by the velocity vector (vx, vy), (MVx0, MVy0) and (MVx1, MVy1) are expressed as (vxτ0, vyτ0) and (−vxτ1, −vyτ1), respectively, and the following optical flow equation (2) may be employed.

[ numerical formula 3]

∂I(k)/∂t + vx·∂I(k)/∂x + vy·∂I(k)/∂y = 0 … (2)

Here, I(k) denotes the luminance value of the motion-compensated reference image k (k = 0 or 1). This optical flow equation expresses that the sum of (i) the temporal derivative of the luminance value, (ii) the product of the horizontal velocity and the horizontal component of the spatial gradient of the reference image, and (iii) the product of the vertical velocity and the vertical component of the spatial gradient of the reference image is equal to zero. The motion vector of each block obtained from the merge list or the like may be corrected in pixel units based on a combination of this optical flow equation and Hermite interpolation.
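
As a rough illustration of how this equation can be used, the following sketch estimates a single (vx, vy) for a block by least squares over all pixels. The gradient computation and the floating-point least-squares solve are assumptions; the actual BIO derivation uses fixed-point arithmetic and applies a per-pixel correction to the prediction.

```python
import numpy as np

# Estimate (vx, vy) for a block from the optical flow equation
#   dI/dt + vx * dI/dx + vy * dI/dy = 0
# by least squares over all pixels.  i0 and i1 are the two motion-compensated
# reference blocks; the gradients and the solver are simplifying assumptions.

def estimate_flow(i0, i1):
    i0 = np.asarray(i0, dtype=np.float64)
    i1 = np.asarray(i1, dtype=np.float64)
    it = i1 - i0                                            # temporal difference
    gx = (np.gradient(i0, axis=1) + np.gradient(i1, axis=1)) / 2.0
    gy = (np.gradient(i0, axis=0) + np.gradient(i1, axis=0)) / 2.0

    a = np.stack([gx.ravel(), gy.ravel()], axis=1)
    b = -it.ravel()
    (vx, vy), *_ = np.linalg.lstsq(a, b, rcond=None)        # minimise ||a·v + it||
    return vx, vy

i0 = np.outer(np.arange(8.0), np.ones(8)) * 10.0            # vertical luminance ramp
i1 = i0 + 5.0                                                # same ramp, offset in value
vx, vy = estimate_flow(i0, i1)                               # vy is about -0.5 here
```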

Further, the motion vector may be derived on the decoding apparatus side by a method different from the derivation of the motion vector based on the model assuming the constant velocity linear motion. For example, the motion vector may be derived in sub-block units based on the motion vectors of a plurality of adjacent blocks.

[ motion Compensation > LIC ]

Next, an example of a mode for generating a prediction image (prediction) by LIC (local illumination compensation) processing will be described.

Fig. 39 is a conceptual diagram for explaining an example of a predicted image generation method using luminance correction processing by LIC processing.

First, an MV is derived from a reference picture, which is an encoded picture, and the reference image corresponding to the current block is acquired.

Next, for the current block, information indicating how the luminance value has changed between the reference picture and the current picture is extracted. This extraction is performed based on the luminance pixel values of the encoded left neighboring reference region (peripheral reference region) and the encoded upper neighboring reference region (peripheral reference region) in the current picture, and the luminance pixel values at the equivalent positions within the reference picture specified by the derived MV. Then, a luminance correction parameter is calculated using the information indicating how the luminance value has changed.

A prediction image for the current block is generated by performing luminance correction processing, using the above-described luminance correction parameter, on the reference image in the reference picture specified by the MV.

The shape of the peripheral reference region in fig. 39 is an example, and other shapes may be used.

Although the process of generating the predicted image from 1 reference picture is described here, the same applies to the case of generating the predicted image from a plurality of reference pictures, and the predicted image may be generated by performing the luminance correction process on the reference images acquired from the respective reference pictures in the same manner as described above.

As a method of determining whether or not to apply the LIC processing, there is, for example, a method using a LIC _ flag, which is a signal indicating whether or not to apply the LIC processing. As a specific example, the encoding device determines whether or not the current block belongs to a region in which a luminance change has occurred. If the block belongs to such a region, the encoding device sets the value 1 as the LIC _ flag and performs encoding by applying the LIC processing; if the block does not belong to such a region, the encoding device sets the value 0 as the LIC _ flag and performs encoding without applying the LIC processing. On the other hand, the decoding device may decode the LIC _ flag described in the stream and perform decoding while switching whether or not to apply the LIC processing according to the decoded value.

As another method of determining whether or not to apply the LIC processing, there is, for example, a method of determining this according to whether or not the LIC processing was applied to a peripheral block. As a specific example, when the current block is in the merge mode, it is determined whether or not the neighboring encoded block selected at the time of MV derivation in the merge mode processing was encoded by applying the LIC processing. Encoding is then performed while switching whether or not to apply the LIC processing according to the result. In this example, the same processing is applied to the processing on the decoding device side.

An outline of the LIC processing (luminance correction processing) has been described above with reference to fig. 39; the details are described below.

First, the inter prediction unit 126 derives a motion vector for acquiring a reference image corresponding to a block to be encoded, from a reference picture that is an already encoded picture.

Next, for the encoding target block, the inter prediction unit 126 extracts information indicating how the luminance value has changed between the reference picture and the encoding target picture, using the luminance pixel values of the encoded peripheral reference regions adjacent on the left and above and the luminance pixel values at the equivalent positions in the reference picture specified by the motion vector, and calculates the luminance correction parameters. For example, let p0 be the luminance pixel value of a certain pixel in the peripheral reference region in the encoding target picture, and let p1 be the luminance pixel value of the pixel at the same position in the peripheral reference region in the reference picture. The inter prediction unit 126 calculates, as the luminance correction parameters, the coefficients A and B that optimize A × p1 + B = p0 for the plurality of pixels in the peripheral reference region.

Next, the inter prediction unit 126 generates a prediction image for the encoding target block by performing the luminance correction processing on the reference image in the reference picture specified by the motion vector, using the luminance correction parameters. For example, let p2 be the luminance pixel value in the reference image, and let p3 be the luminance pixel value of the prediction image after the luminance correction processing. The inter prediction unit 126 generates the prediction image after the luminance correction processing by calculating A × p2 + B = p3 for each pixel in the reference image.
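
The derivation and application of the luminance correction parameters A and B can be sketched as follows. The least-squares fit is only one possible way to "optimize" A × p1 + B = p0; an actual implementation may use a simplified integer derivation.

```python
import numpy as np

# Fit p0 ≈ A * p1 + B over the peripheral reference regions (least squares),
# then apply p3 = A * p2 + B to the reference image.  The least-squares fit is
# an assumption; simplified integer derivations are also possible.

def derive_lic_params(p0, p1):
    """p0: luminance samples of the peripheral reference region in the
    encoding target picture; p1: samples at the same positions in the
    peripheral reference region of the reference picture."""
    p0 = np.asarray(p0, dtype=np.float64)
    p1 = np.asarray(p1, dtype=np.float64)
    design = np.stack([p1, np.ones_like(p1)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, p0, rcond=None)
    return a, b

def apply_lic(p2, a, b):
    """Luminance-correct the reference image p2 (p3 = A * p2 + B)."""
    return a * np.asarray(p2, dtype=np.float64) + b

a, b = derive_lic_params([110, 120, 130, 140], [100, 109, 118, 127])
p3 = apply_lic([[100, 104], [108, 112]], a, b)
```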

The shape of the peripheral reference region in fig. 39 is an example, and other shapes may be used. Further, a part of the peripheral reference region shown in fig. 39 may be used. For example, a region including a predetermined number of pixels spaced apart from each of the upper adjacent pixel and the left adjacent pixel may be used as the peripheral reference region. The peripheral reference region is not limited to a region adjacent to the encoding target block, and may be a region not adjacent to the encoding target block. A predetermined number of pixels may be set in advance.

In the example shown in fig. 39, the peripheral reference region in the reference picture is the region specified, using the motion vector of the encoding target picture, from the peripheral reference region in the encoding target picture; however, it may be a region specified by another motion vector. For example, the other motion vector may be the motion vector of the peripheral reference region in the encoding target picture.

Although the operation of the encoding device 100 is described here, the operation of the decoding device 200 is typically the same.

In addition, the LIC processing may be applied not only to luminance but also to color difference. In this case, the correction parameters may be derived for Y, Cb and Cr, respectively, or a common correction parameter may be used for any of them.

Moreover, the LIC processing may be applied in units of sub-blocks. For example, the correction parameter may be derived using the reference region around the current subblock and the reference region around the reference subblock in the reference picture specified by the MV of the current subblock.

[ prediction control section ]

The prediction control unit 128 selects either one of the intra prediction signal (the signal output from the intra prediction unit 124) and the inter prediction signal (the signal output from the inter prediction unit 126), and outputs the selected signal to the subtraction unit 104 and the addition unit 116 as a prediction signal.

As shown in fig. 1, in various examples of the encoding apparatus, the prediction control unit 128 may output the prediction parameters input to the entropy encoding unit 110. The entropy encoding unit 110 may generate an encoded bit stream (or sequence) based on the prediction parameter input from the prediction control unit 128 and the quantization coefficient input from the quantization unit 108. The prediction parameters may also be used in the decoding apparatus. The decoding apparatus may receive and decode the coded bit stream, and perform the same processing as the prediction processing performed by the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128. The prediction parameters may include a selection prediction signal (for example, a motion vector, a prediction type, or a prediction mode used in the intra prediction unit 124 or the inter prediction unit 126), or an arbitrary index, flag, or value that is based on or indicates a prediction process performed by the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128.

[ example of mounting of encoder ]

Fig. 40 is a block diagram showing a mounting example of the encoding device 100. The encoding device 100 includes a processor a1 and a memory a 2. For example, a plurality of components of the encoding device 100 shown in fig. 1 are mounted via the processor a1 and the memory a2 shown in fig. 40.

The processor a1 is a circuit that performs information processing and is accessible to the memory a 2. For example, the processor a1 is a dedicated or general-purpose electronic circuit that encodes moving images. The processor a1 may be a CPU. The processor a1 may be an aggregate of a plurality of electronic circuits. For example, the processor a1 may function as a plurality of components among the plurality of components of the encoding device 100 shown in fig. 1 and the like.

The memory a2 is a dedicated or general-purpose memory that stores information for the processor a1 to encode moving pictures. The memory a2 may also be an electronic circuit, and may also be connected to the processor a 1. Additionally, memory a2 may also be included in processor a 1. The memory a2 may be an aggregate of a plurality of electronic circuits. The memory a2 may be a magnetic disk, an optical disk, or the like, and may be represented as a storage device, a recording medium, or the like. The memory a2 may be a nonvolatile memory or a volatile memory.

For example, memory a2 may store coded moving pictures or bit sequences corresponding to coded moving pictures. In addition, the memory a2 may store a program for the processor a1 to encode a moving image.

For example, the memory a2 may function as a component for storing information among a plurality of components of the encoding device 100 shown in fig. 1 and the like. For example, the memory a2 may also function as the block memory 118 and the frame memory 122 shown in fig. 1. More specifically, in the memory a2, a reconstructed block, a reconstructed picture, and the like may be stored.

In addition, the encoding device 100 may not be equipped with all of the plurality of components shown in fig. 1 and the like, or may not perform all of the plurality of processes described above. Some of the plurality of components shown in fig. 1 and the like may be included in another device, and some of the plurality of processes described above may be executed by another device.

[ decoding device ]

Next, a decoding apparatus that can decode, for example, the encoded signal (encoded bit stream) output from the above-described encoding apparatus 100 will be described. Fig. 41 is a block diagram showing a functional configuration of decoding apparatus 200 according to the embodiment. The decoding apparatus 200 is a moving picture decoding apparatus that decodes a moving picture in units of blocks.

As shown in fig. 41, the decoding device 200 includes an entropy decoding unit 202, an inverse quantization unit 204, an inverse transformation unit 206, an addition unit 208, a block memory 210, a loop filtering unit 212, a frame memory 214, an intra prediction unit 216, an inter prediction unit 218, and a prediction control unit 220.

The decoding apparatus 200 is realized by, for example, a general-purpose processor and a memory. In this case, when the software program stored in the memory is executed by the processor, the processor functions as the entropy decoding unit 202, the inverse quantization unit 204, the inverse transformation unit 206, the addition unit 208, the loop filtering unit 212, the intra prediction unit 216, the inter prediction unit 218, and the prediction control unit 220. The decoding device 200 may be realized as 1 or more dedicated electronic circuits corresponding to the entropy decoding unit 202, the inverse quantization unit 204, the inverse transform unit 206, the addition unit 208, the loop filter unit 212, the intra prediction unit 216, the inter prediction unit 218, and the prediction control unit 220.

Hereinafter, each of the components included in the decoding device 200 will be described after the flow of the overall processing of the decoding device 200 is described.

[ Overall flow of decoding processing ]

Fig. 42 is a flowchart showing an example of the overall decoding process performed by decoding apparatus 200.

First, the entropy decoding unit 202 of the decoding device 200 determines a partition pattern of a block of a fixed size (for example, 128 × 128 pixels) (step Sp _ 1). The division pattern is a division pattern selected by the encoding apparatus 100. Then, decoding apparatus 200 performs the processing of steps Sp _2 to Sp _6 for each of the plurality of blocks constituting the partition pattern.

That is, the entropy decoding unit 202 decodes (specifically, entropy decodes) the encoded quantized coefficient and prediction parameter of the decoding target block (also referred to as a current block) (step Sp _ 2).

Next, the inverse quantization unit 204 and the inverse transform unit 206 inverse-quantize and inverse-transform the plurality of quantized coefficients, thereby restoring the plurality of prediction residuals (i.e., the difference blocks) (step Sp _ 3).

Next, the prediction processing unit, which is composed of all or a part of the intra prediction unit 216, the inter prediction unit 218, and the prediction control unit 220, generates a prediction signal (also referred to as a prediction block) of the current block (step Sp _ 4).

Next, the addition unit 208 reconstructs the current block into a reconstructed image (also referred to as a decoded image block) by adding the prediction block to the difference block (step Sp _ 5).

Then, if the reconstructed image is generated, the loop filter unit 212 performs filtering on the reconstructed image (step Sp _ 6).

Then, the decoding device 200 determines whether or not the decoding of the entire picture is completed (step Sp _7), and if it is determined that the decoding is not completed (no in step Sp _7), repeats the processing from step Sp _ 1.

As shown in the figure, the processing of steps Sp _1 to Sp _7 is performed sequentially by the decoding device 200. Alternatively, a plurality of some of these processes may be performed in parallel, or the order may be replaced.

[ entropy decoding section ]

The entropy decoding unit 202 entropy-decodes the encoded bit stream. Specifically, the entropy decoding unit 202 arithmetically decodes the encoded bit stream into a binary signal, for example. Then, the entropy decoding unit 202 debinarizes the binary signal (converts it into multi-level values). The entropy decoding unit 202 outputs the quantized coefficients to the inverse quantization unit 204 in units of blocks. The entropy decoding unit 202 may output the prediction parameters included in the encoded bit stream (see fig. 1) to the intra prediction unit 216, the inter prediction unit 218, and the prediction control unit 220 according to the embodiment. The intra prediction unit 216, the inter prediction unit 218, and the prediction control unit 220 can perform the same prediction processing as the processing performed by the intra prediction unit 124, the inter prediction unit 126, and the prediction control unit 128 on the encoding device side.

[ inverse quantization part ]

The inverse quantization unit 204 inversely quantizes the quantization coefficient of the decoding target block (hereinafter referred to as the current block) input from the entropy decoding unit 202. Specifically, the inverse quantization unit 204 inversely quantizes the quantization coefficients of the current block based on the quantization parameters corresponding to the quantization coefficients, respectively. Then, the inverse quantization unit 204 outputs the inverse-quantized coefficient (i.e., transform coefficient) of the current block to the inverse transform unit 206.

[ inverse transformation section ]

The inverse transform unit 206 restores the prediction error by inversely transforming the transform coefficient input from the inverse quantization unit 204.

For example, in the case where the information read out from the encoded bitstream indicates that EMT or AMT is applied (e.g., the AMT flag is true), the inverse transform section 206 inversely transforms the transform coefficient of the current block based on the read out information indicating the transform type.

For example, when the information read out from the encoded bit stream indicates that NSST is applied, the inverse transform unit 206 applies inverse retransformation to the transform coefficients.

[ addition section ]

The addition section 208 reconstructs the current block by adding the prediction error, which is an input from the inverse transform section 206, to the prediction sample, which is an input from the prediction control section 220. The adder 208 then outputs the reconstructed block to the block memory 210 and the loop filter 212.

[ Block memory ]

The block memory 210 is a storage unit for storing a block in a picture to be decoded (hereinafter, referred to as a current picture) referred to for intra prediction. Specifically, the block memory 210 stores the reconstructed block output from the adder 208.

[ Loop Filter Unit ]

The loop filter unit 212 applies loop filtering to the block reconstructed by the adder unit 208, and outputs the filtered reconstructed block to the frame memory 214, a display device, and the like.

When the information indicating ALF on/off read from the encoded bit stream indicates that ALF is on, the loop filter unit 212 selects 1 filter from among a plurality of filters based on the direction and activity of the local gradients, and applies the selected filter to the reconstructed block.

[ frame memory ]

The frame memory 214 is a storage unit for storing reference pictures used for inter-frame prediction, and may be referred to as a frame buffer. Specifically, the frame memory 214 stores the reconstructed block filtered by the loop filter unit 212.

[ prediction processing units (Intra prediction unit, inter prediction unit, prediction control unit) ]

Fig. 43 is a flowchart showing an example of processing performed by the prediction processing unit of the decoding apparatus 200. The prediction processing unit is configured by all or a part of the components of the intra prediction unit 216, the inter prediction unit 218, and the prediction control unit 220.

The prediction processing unit generates a prediction image of the current block (step Sq _ 1). The prediction image is also called a prediction signal or a prediction block. The prediction signal includes, for example, an intra prediction signal or an inter prediction signal. Specifically, the prediction processing unit generates a prediction image of the current block using a reconstructed image obtained by performing generation of a prediction block, generation of a difference block, generation of a coefficient block, restoration of the difference block, and generation of a decoded image block.

The reconstructed image may be, for example, an image of a reference picture, or an image of a decoded block within a current picture that is a picture including the current block. The decoded blocks within the current picture are, for example, neighboring blocks of the current block.

Fig. 44 is a flowchart showing another example of the processing performed by the prediction processing unit of the decoding device 200.

The prediction processing unit determines the method or mode for generating the prediction image (step Sr _ 1). This method or mode may be determined based on, for example, a prediction parameter or the like.

When the 1 st method is determined as the mode for generating the prediction image, the prediction processing unit generates the prediction image in accordance with the 1 st method (step Sr _2 a). When the 2 nd method is determined as the mode for generating the prediction image, the prediction processing unit generates the prediction image in accordance with the 2 nd method (step Sr _2 b). When the 3 rd method is determined as the mode for generating the prediction image, the prediction processing unit generates the prediction image in accordance with the 3 rd method (step Sr _2 c).

The 1 st, 2 nd, and 3 rd methods are mutually different methods for generating a prediction image, and may be, for example, inter prediction, intra prediction, or other prediction methods. In these prediction methods, the above-described reconstructed image may be used.

[ Intra prediction Unit ]

The intra prediction unit 216 generates a prediction signal (intra prediction signal) by performing intra prediction with reference to a block in the current picture stored in the block memory 210 based on the intra prediction mode read from the coded bit stream. Specifically, the intra prediction unit 216 performs intra prediction by referring to samples (for example, luminance values and color difference values) of blocks adjacent to the current block, generates an intra prediction signal, and outputs the intra prediction signal to the prediction control unit 220.

In addition, when the intra prediction mode of the reference luminance block is selected in the intra prediction of the color difference block, the intra prediction unit 216 may predict the color difference component of the current block based on the luminance component of the current block.

In addition, when the information read out from the encoded bit stream indicates the application of PDPC, the intra prediction unit 216 corrects the pixel value after intra prediction based on the gradient of the reference pixel in the horizontal/vertical direction.

[ interframe prediction part ]

The inter prediction unit 218 predicts the current block with reference to the reference picture stored in the frame memory 214. Prediction is performed in units of a current block or a subblock (e.g., a 4 × 4 block) within the current block. For example, the inter prediction unit 218 performs motion compensation using motion information (e.g., a motion vector) read from the encoded bitstream (e.g., the prediction parameters output from the entropy decoding unit 202), generates an inter prediction signal of the current block or sub-block, and outputs the inter prediction signal to the prediction control unit 220.

When the information read from the encoded bit stream indicates that the OBMC mode is applied, the inter prediction unit 218 generates an inter prediction signal using not only the motion information of the current block obtained by the motion search but also the motion information of the neighboring block.

When the information read from the coded bit stream indicates that the FRUC mode is applied, the inter-frame prediction unit 218 performs motion search by a pattern matching method (bidirectional matching or template matching) read from the coded bit stream, thereby deriving motion information. Then, the inter prediction unit 218 performs motion compensation (prediction) using the derived motion information.

When the BIO mode is applied, the inter-frame prediction unit 218 derives a motion vector based on a model assuming constant-velocity linear motion. Further, in the case where the information read out from the encoded bitstream indicates that the affine motion compensation prediction mode is applied, the inter prediction section 218 derives a motion vector in sub-block units based on the motion vectors of a plurality of adjacent blocks.

[ MV derivation > common interframe mode ]

When the information read from the coded bit stream indicates that the normal inter mode is applied, the inter prediction unit 218 derives an MV based on the information read from the coded bit stream, and performs motion compensation (prediction) using the MV.

Fig. 45 is a flowchart showing an example of inter prediction in the normal inter mode in decoding apparatus 200.

The inter prediction unit 218 of the decoding device 200 performs motion compensation on each block. The inter prediction unit 218 acquires a plurality of candidate MVs for the current block based on information such as MVs of a plurality of decoded blocks located in the vicinity of the current block temporally or spatially (step Ss _ 1). That is, the inter prediction unit 218 creates a candidate MV list.

Next, the inter prediction unit 218 extracts N (N is an integer equal to or greater than 2) candidate MVs from among the plurality of candidate MVs acquired at step Ss _1 as predicted motion vector candidates (also referred to as predicted MV candidates) in accordance with a predetermined priority order (step Ss _ 2). The priority order may be set in advance for each of the N predicted MV candidates.

Next, the inter prediction unit 218 decodes the predicted motion vector selection information from the input stream (i.e., the encoded bit stream), and selects 1 predicted MV candidate from the N predicted MV candidates as the predicted motion vector (also referred to as predicted MV) of the current block using the decoded predicted motion vector selection information (step Ss _ 3).

Next, the inter prediction unit 218 decodes the difference MV from the input stream, and adds the difference value, which is the decoded difference MV, to the selected prediction motion vector to derive the MV of the current block (step Ss _ 4).

Finally, the inter prediction unit 218 performs motion compensation on the current block using the derived MV and the decoded reference picture, thereby generating a prediction image of the current block (step Ss _ 5).
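
Steps Ss _1 to Ss _4 can be summarized with the following sketch. The de-duplication rule, the priority order, and all names are assumptions for illustration; they are not normative.

```python
# Illustrative sketch of steps Ss _1 to Ss _4: build a candidate MV list from
# neighbouring decoded blocks, keep the first N candidates as predicted MV
# candidates, select one with the decoded selection information, and add the
# decoded difference MV.  Priority order and names are assumptions.

def derive_normal_inter_mv(neighbour_mvs, n, pred_mv_index, mvd):
    # Step Ss _1: candidate MV list (duplicates removed, order preserved).
    candidates = list(dict.fromkeys(neighbour_mvs))
    # Step Ss _2: N predicted MV candidates according to a fixed priority.
    predictors = candidates[:n]
    # Step Ss _3: predicted MV selected by the decoded selection information.
    pred_mv = predictors[pred_mv_index]
    # Step Ss _4: MV of the current block = predicted MV + difference MV.
    return (pred_mv[0] + mvd[0], pred_mv[1] + mvd[1])

mv = derive_normal_inter_mv([(2, 1), (2, 1), (0, 3), (4, -1)],
                            n=2, pred_mv_index=1, mvd=(1, 2))   # -> (1, 5)
```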

[ prediction control section ]

The prediction control unit 220 selects either one of the intra prediction signal and the inter prediction signal, and outputs the selected signal to the adder 208 as a prediction signal. In general, the configurations, functions, and processes of the prediction control unit 220, the intra prediction unit 216, and the inter prediction unit 218 on the decoding apparatus side may correspond to the configurations, functions, and processes of the prediction control unit 128, the intra prediction unit 124, and the inter prediction unit 126 on the encoding apparatus side.

[ mounting example of decoding device ]

Fig. 46 is a block diagram showing an example of mounting the decoding apparatus 200. The decoding device 200 includes a processor b1 and a memory b 2. For example, a plurality of components of the decoding device 200 shown in fig. 41 are mounted via the processor b1 and the memory b2 shown in fig. 46.

The processor b1 is a circuit that performs information processing and is accessible to the memory b 2. For example, the processor b1 is a dedicated or general-purpose electronic circuit that decodes an encoded moving image (i.e., an encoded bitstream). The processor b1 may be a CPU. The processor b1 may be an aggregate of a plurality of electronic circuits. For example, the processor b1 may function as a plurality of components among the plurality of components of the decoding apparatus 200 shown in fig. 41 and the like.

The memory b2 is a dedicated or general purpose memory that stores information used by the processor b1 to decode an encoded bitstream. The memory b2 may be an electronic circuit or may be connected to the processor b 1. In addition, the memory b2 may also be included in the processor b 1. The memory b2 may be an aggregate of a plurality of electronic circuits. The memory b2 may be a magnetic disk, an optical disk, or the like, and may be represented as a storage device, a recording medium, or the like. The memory b2 may be a nonvolatile memory or a volatile memory.

For example, the memory b2 may store a moving picture or a coded bit stream. In addition, in the memory b2, a program for the processor b1 to decode the encoded bitstream may be stored.

For example, the memory b2 may function as a component for storing information among a plurality of components of the decoding device 200 shown in fig. 41 and the like. Specifically, the memory b2 may function as the block memory 210 and the frame memory 214 shown in fig. 41. More specifically, in the memory b2, a reconstructed block, a reconstructed picture, and the like may be stored.

In the decoding device 200, all of the plurality of components shown in fig. 41 and the like may not be mounted, and all of the plurality of processes described above may not be performed. Some of the plurality of components shown in fig. 41 and the like may be included in another device, and some of the plurality of processes described above may be executed by another device.

[ definitions of terms ]

The terms may be defined as follows, for example.

A picture is an array of luminance samples in monochrome format, or an array of luminance samples and 2 corresponding arrays of color difference samples in 4:2:0, 4:2:2, or 4:4:4 color format. A picture may be a frame or a field.

A frame is composed of a top field generated from the even sample rows 0, 2, 4, … and a bottom field generated from the odd sample rows 1, 3, 5, ….

A slice is an integer number of coding tree units contained in 1 independent slice and all dependent slices (if any) that precede the next independent slice (if any) in the same access unit.

A tile is a rectangular region of a plurality of coding tree blocks within a particular tile column and a particular tile row in a picture. A tile may be a rectangular region of the frame intended to be able to be decoded and encoded independently, although loop filtering may still be applied across the edges of the tile.

A block is an M × N (N rows and M columns) array of a plurality of samples, or an M × N array of a plurality of transform coefficients. A block may be a square or rectangular region of a plurality of pixels consisting of 1 luminance matrix and 2 color difference matrices.

A CTU (coding tree unit) may be a coding tree block of the luminance samples of a picture having 3 sample arrays, together with 2 corresponding coding tree blocks of the color difference samples. Alternatively, a CTU may be a coding tree block of the samples of a picture that is a monochrome picture or a picture coded using 3 separate color planes and syntax structures used for coding the samples.

A super block may be a square block of 64 × 64 pixels that constitutes 1 or 2 blocks of mode information, or that is recursively divided into 4 blocks of 32 × 32 pixels, each of which can be divided further.

[ details of entropy encoding section of encoding apparatus ]

In the present embodiment, the CABAC skip mode can be applied. The CABAC skip mode may also be expressed as an arithmetic coding skip mode or an arithmetic decoding skip mode.

Fig. 47 is a block diagram showing a detailed functional configuration of the entropy encoding unit 110 of the encoding device 100 according to the present embodiment. The entropy encoding unit 110 generates a bit sequence by applying variable length coding to coefficient information of an image, and outputs the generated bit sequence. This bit sequence corresponds to the image to be encoded and is also referred to as an encoded signal, an encoded bit stream, or an encoded bit sequence.

In the example of fig. 47, the entropy encoding unit 110 includes a binarization unit 132, a switching unit 134, an intermediate buffer 136, an arithmetic encoding unit 138, a switching unit 140, and a multiplexing unit 142. The entropy encoding unit 110 generates a bit sequence, outputs the generated bit sequence to the output buffer 144, and stores it there. The bit sequence stored in the output buffer 144 is output from the output buffer 144 as appropriate. The entropy encoding unit 110 may also include the output buffer 144.

[ binarization section of entropy encoding section ]

The binarization section 132 binarizes the coefficient and the like. Specifically, the binarization section 132 converts the quantized frequency transform coefficients and the like into a data series of values expressed by, for example, 0 or 1, and outputs the obtained data series. Hereinafter, this data sequence is also referred to as a binary data sequence. Further, the binarization by the binarization section 132 is basically binarization for arithmetic coding, more specifically binarization for binary arithmetic coding. That is, the binarizing unit 132 basically derives a binarized data sequence of image information in accordance with binarization for arithmetic coding.

Examples of binarization methods include unary binarization, truncated unary binarization, combined unary/k-th order exponential Golomb binarization, fixed-length binarization, table reference, and the like.

For example, entropy Coding by a Context-based Adaptive Binary Arithmetic Coding method (Context-based Adaptive Binary Arithmetic Coding) is performed by binarization by the binarizing unit 132 and Arithmetic Coding by the Arithmetic Coding unit 138. The context adaptive binary arithmetic coding scheme is also called CABAC. The binarization by the binarization section 132 may also be expressed as binarization for a context-adaptive binary arithmetic coding scheme.

[ switching part of entropy coding part ]

The switching units 134 and 140 operate in conjunction with each other to switch whether or not to apply arithmetic coding to the binarized data sequence in accordance with mode information. For example, the switching units 134 and 140 switch whether or not to apply arithmetic coding to the binarized data sequence in accordance with mode information given from the outside of the encoding device 100. The mode information may be given as an instruction from a user or a higher-level system.

For example, the mode information indicates whether the CABAC skip mode is valid or invalid, i.e., whether the CABAC skip mode is applied. Further, for example, arithmetic coding is applied to the binarized data sequence when the CABAC skip mode is invalid, and arithmetic coding is not applied to the binarized data sequence when the CABAC skip mode is valid.

Specifically, when the CABAC skip mode is invalid, the switching unit 134 outputs the binarized data sequence output from the binarizing unit 132 to the intermediate buffer 136, and stores the binarized data sequence in the intermediate buffer 136. The arithmetic coding unit 138 applies arithmetic coding to the binarized data sequence stored in the intermediate buffer 136 and outputs the binarized data sequence to which the arithmetic coding has been applied. The switching section 140 outputs the binarized data sequence output from the arithmetic coding section 138 to the multiplexing section 142.

On the other hand, when the CABAC skip mode is valid, the switching section 134 outputs the binarized data sequence output from the binarizing section 132 to the switching section 140 as it is. The switching unit 140 then outputs the binary data sequence output from the switching unit 134 to the multiplexing unit 142. That is, arithmetic coding is bypassed (bypass). In order to avoid confusion with bypass arithmetic coding, which is one form of arithmetic coding, arithmetic coding bypass may be expressed as arithmetic coding skipping.
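
The behaviour of the switching units 134 and 140 can be summarized with the following schematic sketch. The arithmetic-coder call is a placeholder; only the switching logic reflects the description above.

```python
# Schematic sketch of the CABAC skip switching (switching units 134 and 140):
# when the skip mode is valid the binarized data sequence bypasses arithmetic
# coding, otherwise it is buffered and arithmetically coded.

def entropy_encode(binarized_bits, cabac_skip_valid, arithmetic_encode):
    if cabac_skip_valid:
        # CABAC skip mode valid: output the binarized data sequence as-is.
        return list(binarized_bits)
    # CABAC skip mode invalid: store in the intermediate buffer, then encode.
    intermediate_buffer = list(binarized_bits)
    return arithmetic_encode(intermediate_buffer)

# Usage with a dummy arithmetic coder (placeholder only).
bits = [1, 0, 1, 1, 0]
skipped = entropy_encode(bits, True, lambda b: b)     # bits pass through unchanged
```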

Information indicating whether the CABAC skip mode is valid or invalid is input from the outside of the encoding apparatus 100 as an instruction of a user or an instruction of a higher-level system, for example.

[ intermediate buffer of entropy coding part ]

The intermediate buffer 136 is a storage unit for storing the binarized data sequence, and is also called an intermediate memory. A delay occurs in the arithmetic coding performed by the arithmetic encoding unit 138, and the amount of delay fluctuates depending on the content of the binarized data sequence. The intermediate buffer 136 absorbs this fluctuation of the delay amount so that the subsequent processing is performed smoothly. Inputting data to a storage unit such as the intermediate buffer 136 corresponds to storing the data in the storage unit, and outputting data from the storage unit corresponds to reading the data from the storage unit.

[ arithmetic coding section of entropy coding section ]

The arithmetic coding unit 138 performs arithmetic coding. Specifically, the arithmetic coding unit 138 reads the binarized data sequence stored in the intermediate buffer 136 and applies arithmetic coding to the binarized data sequence. The arithmetic coding unit 138 may apply arithmetic coding corresponding to the context adaptive binary arithmetic coding scheme to the binarized data sequence.

For example, the arithmetic coding unit 138 selects an occurrence probability of a value according to a context such as a data type, performs arithmetic coding according to the selected occurrence probability, and updates the occurrence probability according to the result of the arithmetic coding. That is, the arithmetic coding unit 138 performs arithmetic coding according to the variable occurrence probability. Arithmetic coding according to a variable probability of occurrence is also referred to as context-adaptive arithmetic coding.

The arithmetic coding unit 138 may perform arithmetic coding with a fixed occurrence probability for a specific data type or the like. Specifically, the arithmetic coding unit 138 may perform arithmetic coding with an occurrence probability of 50% as the occurrence probability of 0 or 1. Arithmetic coding according to a fixed occurrence probability is also called bypass arithmetic coding.
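
The difference between the context-adaptive probability model and the bypass model can be illustrated with the following toy sketch. The exponential update rule is an assumption for illustration and is not the actual CABAC state machine.

```python
# Toy sketch: a context-adaptive probability estimate adapts to the coded bins,
# while bypass coding keeps a fixed 50% occurrence probability.  The update
# rule below is illustrative, not the actual CABAC state machine.

class ContextModel:
    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one          # estimated probability that the next bin is 1
        self.rate = rate            # adaptation speed

    def update(self, bin_value):
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)

BYPASS_P_ONE = 0.5                  # bypass arithmetic coding: fixed probability

ctx = ContextModel()
for b in [1, 1, 0, 1, 1, 1]:
    ctx.update(b)                   # p_one drifts towards the observed statistics
```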

[ multiplexing section of entropy coding section ]

The multiplexing unit 142 multiplexes the mode information indicating whether the CABAC skip mode is valid or invalid and the binarized data sequence that is arithmetically encoded or not arithmetically encoded, and generates a bit sequence including the mode information and the binarized data sequence.

The multiplexer 142 outputs the bit sequence to the output buffer 144, and stores the bit sequence in the output buffer 144. The bit sequence stored in the output buffer 144 is output from the output buffer 144 as appropriate. That is, the multiplexer 142 outputs the bit sequence via the output buffer 144.

For example, mode information indicating whether the CABAC skip mode is valid or invalid may be included in the bit sequence as a parameter of an upper order. Specifically, the mode information may be included in an SPS (sequence parameter set) of the bit sequence, may be included in a PPS (picture parameter set) of the bit sequence, or may be included in a slice header of the bit sequence. The mode information contained in the bit sequence is represented by 1 or more bits.

The binarized data sequence may be included in the slice data. Here, the binarized data sequence may be a binarized data sequence to which arithmetic coding is applied or a binarized data sequence to which arithmetic coding is not applied.

Further, the mode information included in the bit sequence may also be expressed as application information indicating whether arithmetic coding is applied to the binarized data sequence included in the bit sequence. In other words, the mode information may be included in the bit sequence as application information indicating whether arithmetic coding is applied to the binarized data sequence. The application information may indicate whether the bit sequence contains a binarized data sequence to which arithmetic coding is applied or a binarized data sequence to which arithmetic coding is not applied.

Further, when the mode information indicating whether the CABAC skip mode is valid or invalid is exchanged between the transmission and reception apparatuses by a higher-level system or is predetermined, the mode information need not be included in the bit sequence. That is, in this case, multiplexing of the mode information need not be performed.

[ output buffer ]

The output Buffer 144 is a storage unit for storing a bit sequence, and is also called a CPB (Coded Picture Buffer) or an output memory. The bit sequence obtained by encoding the image information by the encoding apparatus 100 is stored in the output buffer 144. The bit sequence stored in the output buffer 144 is appropriately output and multiplexed with, for example, an encoded audio signal.

[ CABAC skip mode of encoding processing ]

For example, in a system in which processing with low delay is desired, the CABAC skip mode is set to be active. This makes it possible to generate a bit sequence without performing arithmetic coding processing and buffering control processing for the arithmetic coding processing, and to perform coding processing with lower delay.

The processing block configuration described with reference to fig. 47 is an example, and other processing block configurations may be used.

[ details of the entropy decoding unit of the decoding apparatus ]

Fig. 48 is a block diagram showing a detailed functional configuration of the entropy decoding unit 202 of the decoding device 200 according to the present embodiment. The entropy decoding unit 202 derives a coefficient and the like by entropy-decoding a bit sequence input via the input buffer 232. This bit sequence is, for example, a bit sequence generated by the encoding device 100 shown in fig. 47, and may have the above-described data structure.

In the example of fig. 48, the entropy decoding unit 202 includes a separating unit 234, a switching unit 236, an arithmetic decoding unit 238, an intermediate buffer 240, a switching unit 242, and an inverse binarization unit 244. The entropy decoding section 202 may also include an input buffer 232.

[ input buffer ]

The input buffer 232 is a storage unit for storing a bit sequence, and is also called a CPB or an input memory. The bit sequence to be decoded by the decoding device 200 is separated from, for example, an encoded audio signal and stored in the input buffer 232. The decoding apparatus 200 then reads the bit sequence stored in the input buffer 232 and decodes it.

[ separation section of entropy decoding section ]

The separating unit 234 acquires the bit sequence from the input buffer 232, separates the mode information and the binarized data sequence from the bit sequence, and outputs the mode information and the binarized data sequence. That is, the separating unit 234 acquires a bit sequence including the mode information and the binarized data sequence via the input buffer 232, and outputs the mode information and the binarized data sequence included in the bit sequence. The binarized data sequence may be a binarized data sequence to which arithmetic coding is applied or a binarized data sequence to which arithmetic coding is not applied.

As described above, the mode information may be represented as application information indicating whether or not arithmetic coding is applied to the binarized data sequence included in the bit sequence. Further, when the mode information is exchanged by the higher-level system, is set in advance, or the like, the mode information may not be included in the bit sequence. In this case, the mode information need not be separated and output. The mode information may instead be given as an instruction from the outside of the decoding apparatus 200, specifically, from the user, the higher-level system, or the like.

Further, when the mode information indicating whether the CABAC skip mode is valid or invalid is exchanged between the transmission and reception apparatuses by a higher-level system or is set in advance, the mode information may not be included in the bit sequence. That is, in this case, only the binarized data sequence which is arithmetically encoded or not arithmetically encoded may be output without performing the separation and output of the mode information. The mode information may be given as an instruction from the outside of the decoding apparatus 200, specifically, from the user, the upper system, or the like.

[ switching part of entropy decoding part ]

The switching units 236 and 242 operate in conjunction with each other in accordance with the mode information obtained from the separating unit 234 and the like, and switch whether or not arithmetic decoding is applied to the binarized data sequence. For example, arithmetic decoding is applied to the binarized data sequence when the CABAC skip mode is invalid, and arithmetic decoding is not applied to the binarized data sequence when the CABAC skip mode is valid.

Specifically, when the CABAC skip mode is invalid, the switching unit 236 outputs the binarized data sequence output from the separating unit 234 to the arithmetic decoding unit 238. The arithmetic decoding unit 238 applies arithmetic decoding to the binarized data sequence, and outputs the binarized data sequence to which arithmetic decoding is applied, thereby storing the binarized data sequence to which arithmetic decoding is applied in the intermediate buffer 240.

The switching unit 242 appropriately acquires the binarized data sequence stored in the intermediate buffer 240, and outputs the binarized data sequence acquired from the intermediate buffer 240 to the inverse binarizing unit 244.

On the other hand, when the CABAC skip mode is valid, the switching unit 236 outputs the binarized data sequence output from the separating unit 234 to the switching unit 242 as it is. The switching unit 242 then outputs the binarized data sequence output from the switching unit 236 to the inverse binarizing unit 244. That is, arithmetic decoding is bypassed. To avoid confusion with bypass arithmetic decoding, which is one form of arithmetic decoding, this bypassing of arithmetic decoding may also be expressed as skipping of arithmetic decoding.
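The joint behavior of the switching units 236 and 242 can be summarized by the following sketch, under the assumption of an arithmetic_decoder object exposing a decode method; the names are illustrative only.

```python
def route_binarized_data(binarized_data, cabac_skip_valid, arithmetic_decoder):
    """Sketch of the switching performed by the switching units 236 and 242."""
    if cabac_skip_valid:
        # Arithmetic decoding is skipped; the data goes to inverse binarization as it is.
        return binarized_data
    # Arithmetic decoding is applied; the result corresponds to the original
    # binarized data sequence before arithmetic coding was applied.
    return arithmetic_decoder.decode(binarized_data)
```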

[ arithmetic decoding unit of entropy decoding unit ]

The arithmetic decoding unit 238 performs arithmetic decoding. Specifically, the arithmetic decoding section 238 applies arithmetic decoding to the binarized data sequence to which the arithmetic coding has been applied, and outputs the binarized data sequence to which the arithmetic decoding has been applied, thereby storing the binarized data sequence to which the arithmetic decoding has been applied in the intermediate buffer 240. The binarized data sequence to which the arithmetic decoding is applied corresponds to the original binarized data sequence to which the arithmetic coding is not applied. The arithmetic decoding unit 238 may apply arithmetic decoding corresponding to the context adaptive binary arithmetic coding scheme to the binarized data sequence.

For example, the arithmetic decoding unit 238 selects the occurrence probability of the value according to the context such as the data type, performs arithmetic decoding according to the selected occurrence probability, and updates the occurrence probability according to the result of the arithmetic decoding. That is, the arithmetic decoding unit 238 performs arithmetic decoding according to the variable occurrence probability. Arithmetic decoding according to a variable occurrence probability is also referred to as context-adaptive arithmetic decoding.

The arithmetic decoding unit 238 may perform arithmetic decoding on a specific data type or the like according to a fixed occurrence probability. Specifically, the arithmetic decoding unit 238 may perform arithmetic decoding with an occurrence probability of 50% as the occurrence probability of 0 or 1. Arithmetic decoding performed according to a fixed occurrence probability is also referred to as bypass arithmetic decoding.

[ intermediate buffer of entropy decoding section ]

The intermediate buffer 240 is a storage unit for storing the binary data sequence after arithmetic decoding, and is also referred to as an intermediate memory. In the arithmetic decoding performed by the arithmetic decoding unit 238, a delay occurs. Further, the delay amount fluctuates according to the content of the binarized data sequence. The fluctuation of the delay amount is absorbed by the intermediate buffer 240, and the subsequent processing is smoothly performed.

[ inverse binarization section of entropy decoding section ]

The inverse binarization section 244 derives quantized coefficients and the like by inverse binarizing the binarized data sequence. Specifically, the inverse binarization section 244 converts the binarized data sequence, whose values are expressed by 0 and 1, for example, into quantized frequency transform coefficients or the like, and outputs the quantized frequency transform coefficients or the like to the inverse quantization section 204. The inverse binarization performed by the inverse binarization section 244 is basically inverse binarization corresponding to binarization for arithmetic coding, more specifically, inverse binarization corresponding to binarization for binary arithmetic coding.

Further, entropy decoding of the context adaptive binary arithmetic coding method is performed by, for example, the arithmetic decoding by the arithmetic decoding unit 238 and the inverse binarization by the inverse binarization section 244. That is, the inverse binarization section 244 may perform inverse binarization according to a context-adaptive binary arithmetic coding scheme. Inverse binarization is also referred to as multi-valuing.

[ CABAC skip mode of decoding processing ]

For example, in a system in which processing with low delay is desired, the CABAC skip mode is set to be active. This makes it possible to decode the bit sequence without performing arithmetic decoding processing and buffering control processing for the arithmetic decoding processing, and to perform decoding processing with a lower delay.

The processing block configuration described with reference to fig. 48 is an example, and other processing block configurations may be used.

[ presence or absence of application of arithmetic coding and arithmetic decoding ]

The encoding device 100 and the decoding device 200 of the present embodiment are particularly useful for a real-time communication system or the like that requires encoding and decoding in a short time. Specifically, the encoding device 100 and the decoding device 200 are useful for a video conference system, an electronic mirror, and the like. For example, in these system environments, the CABAC skip mode is set to be active.

Further, the application information basically indicates, inclusively in units each including 1 or more slices or 1 or more pictures, whether arithmetic coding is applied to the binarized data sequence included in the bit sequence. That is, the presence or absence of application of arithmetic coding is switched inclusively in units each including 1 or more slices or 1 or more pictures.

However, the switching of the presence or absence of application of arithmetic coding may be performed in finer units. For example, arithmetic coding and arithmetic decoding may be skipped only for a specific data type. More specifically, skipping of arithmetic coding and arithmetic decoding may be used in place of bypass arithmetic coding and bypass arithmetic decoding.

Further, for example, switching among context arithmetic coding, bypass arithmetic coding, and skipping of arithmetic coding may be performed. Similarly, switching among context arithmetic decoding, bypass arithmetic decoding, and skipping of arithmetic decoding may be performed.

The application information indicating whether to apply arithmetic coding to the binarized data sequence may be expressed by a flag of 1 bit or in other forms. For example, by adding information indicating that arithmetic coding is applied to the binarized data sequence to the bit sequence, the bit sequence can contain the added information as application information. Alternatively, by adding information indicating that arithmetic coding is not applied to the binarized data sequence to the bit sequence, the bit sequence can contain the added information as application information.

The application information may be included in the bit sequence as information common to other information. For example, when the information indicating the picture type is included in the bit sequence and whether or not the arithmetic coding is applied is switched according to the picture type, the information indicating the picture type may be the application information.

[ switching of syntax Structure ]

The code amount may differ greatly depending on whether arithmetic coding and arithmetic decoding are applied. In particular, the amount of information in the coefficient information of an image is large. Thus, when arithmetic coding and arithmetic decoding are not used for the coefficient information, the code amount is likely to become very large.

Therefore, for example, the binarization section 132 of the encoding device 100 binarizes the coefficient information in different binarization forms depending on whether arithmetic coding is applied. Likewise, the inverse binarization section 244 of the decoding apparatus 200 inverse binarizes the coefficient information in different inverse binarization forms depending on whether arithmetic coding is applied.

The binarization section 132 of the encoding device 100 may be given mode information in the same manner as the switching sections 134 and 140 of the encoding device 100. The binarization section 132 of the encoding device 100 may acquire the given mode information and switch the binarization format of the coefficient information according to the mode information.

Similarly, the inverse binarization section 244 of the decoding apparatus 200 may be given mode information in the same manner as the switching sections 236 and 242 of the decoding apparatus 200. The inverse binarization section 244 of the decoding device 200 may acquire the given mode information and switch the inverse binarization form of the coefficient information according to the mode information.

For example, for the coefficient information, different syntax structures may be applied depending on whether the CABAC skip mode is valid or invalid. However, the syntax structure used when the CABAC skip mode is valid may be the same as the syntax structure used when the CABAC skip mode is invalid and a predetermined condition is satisfied. This suppresses an increase in the circuit scale.
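As a purely illustrative sketch of what two such syntax structures could look like for a single coefficient level, the following Python code contrasts a structure with more syntax elements per coefficient, which suits context arithmetic coding, with a structure with fewer, more fixed-length elements; the element split, the unary and 8-bit formats, and the function names are assumptions for illustration and are not the syntax structures actually defined for the encoding device 100.

```python
def unary(value):
    return [1] * value + [0]

def fixed_length(value, nbits):
    return [(value >> (nbits - 1 - i)) & 1 for i in range(nbits)]

def binarize_level_syntax1(level):
    """Illustrative 1st syntax structure: more syntax elements per coefficient
    (significance flag, greater-than-1 flag, remainder, sign)."""
    bins = [1 if level != 0 else 0]            # sig_flag
    if level != 0:
        mag = abs(level)
        bins.append(1 if mag > 1 else 0)       # gt1_flag
        if mag > 1:
            bins.extend(unary(mag - 2))        # remainder (unary, for simplicity)
        bins.append(0 if level > 0 else 1)     # sign
    return bins

def binarize_level_syntax2(level):
    """Illustrative 2nd syntax structure: fewer syntax elements with a bounded
    bin count, which keeps the worst-case code amount small when arithmetic
    coding is skipped."""
    bins = fixed_length(abs(level), 8)         # 8-bit magnitude (assumed value range)
    bins.append(0 if level >= 0 else 1)        # sign
    return bins
```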

[ 1 st example of coefficient encoding method ]

Fig. 49 is a diagram for explaining example 1 of the coefficient encoding process according to the present embodiment. For example, the entropy encoding unit 110 of the encoding device 100 performs the operation shown in fig. 49.

In the processing loop (S101 to S108) for each TU (orthogonal transform unit), first, the entropy encoding unit 110 determines whether the CABAC skip mode is valid (S102). The CABAC skip mode being active means that a CABAC skip mode for skipping CABAC processing is selected as the operation mode.

If the CABAC skip mode is not valid (no in S102), the entropy encoding unit 110 determines whether the orthogonal transform skip mode is valid (S103). The fact that the orthogonal transform skip mode is active means that the orthogonal transform skip mode in which the orthogonal transform process is skipped is selected as the operation mode.

If both of the above 2 determinations are false (no in S102 and no in S103), the entropy encoding unit 110 binarizes the coefficients in accordance with the 1st syntax structure (S104). On the other hand, when at least one of the 2 determinations is true (yes in S102 or yes in S103), the entropy encoding unit 110 binarizes the coefficients in accordance with the 2nd syntax structure (S105).

Next, the entropy encoding unit 110 determines again whether the CABAC skip mode is valid (S106). When the CABAC skip mode is not valid (no in S106), the entropy encoding unit 110 performs CABAC processing on the binarized data sequence in which the coefficients are binarized, and generates a bit sequence (S107). On the other hand, when the CABAC skip mode is valid (yes in S106), the entropy encoding unit 110 outputs the binarized data sequence in which the coefficients are binarized, as it is, as the bit sequence.
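Under the assumption of the illustrative binarizers sketched above and of a CABAC engine exposing an encode(bins) method, the per-TU flow of fig. 49 can be summarized as follows; the function and parameter names are not taken from this description.

```python
def encode_tu_coefficients_example1(coeffs, cabac_skip, transform_skip, cabac_engine):
    """Sketch of the decision flow of fig. 49 for one TU."""
    bins = []
    if cabac_skip or transform_skip:                 # S102 / S103
        for c in coeffs:
            bins.extend(binarize_level_syntax2(c))   # 2nd syntax structure (S105)
    else:
        for c in coeffs:
            bins.extend(binarize_level_syntax1(c))   # 1st syntax structure (S104)

    if cabac_skip:                                   # S106
        return bins                                  # the binarized data becomes the bit sequence
    return cabac_engine.encode(bins)                 # S107: CABAC processing generates the bit sequence
```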

The 1st syntax structure may be a syntax structure that assumes CABAC processing. Specifically, the 1st syntax structure may be a syntax structure having a larger number of binarized syntax elements than the 2nd syntax structure and a higher correlation between the values of the syntax elements. Conversely, the 2nd syntax structure may be a syntax structure having a lower correlation between the values of the syntax elements and a smaller number of binarized syntax elements than the 1st syntax structure.

Here, the coefficient may be a quantized coefficient obtained by orthogonally transforming a prediction residual coefficient of an image and then quantizing the transformed prediction residual coefficient. Alternatively, the coefficients may be quantized coefficients obtained by quantizing only the prediction residual coefficients of the image without performing orthogonal transform. Alternatively, the coefficients may be prediction residual coefficients obtained by performing neither orthogonal transform nor quantization.

Here, a loop per TU is used, but the loop may instead be a loop per CU, per CTU, or per sub-TU obtained by subdividing a TU, for example.

This processing flow is an example; a part of the described processing may be omitted, or processing or condition determinations not described here may be added.

Although the processing flow of the encoding device 100 has been described above, the processing flow described here can be applied to the decoding device 200 by inverting the content of the processing (for example, changing the binarization to inverse binarization and changing the encoding to decoding).

[ Effect of example 1 of coefficient encoding method ]

By using the method described with reference to fig. 49, binarization can be performed in accordance with a common syntax structure both when CABAC processing is not performed and when orthogonal transform processing is not performed. Therefore, it is possible to suppress an increase in the code amount of the finally generated bit sequence and to reduce the possibility of processing delay, while suppressing an increase in the circuit scale.

[ 2 nd example of coefficient encoding method ]

Fig. 50 is a diagram for explaining example 2 of the coefficient encoding processing according to the present embodiment. For example, the entropy encoding unit 110 of the encoding device 100 performs the operation shown in fig. 50.

In this example, the processing loop (S201 to S210) for each TU includes processing loops (S202 to S207) for each sub-TU obtained by subdividing the TU. In the processing loop (S202 to S207) for each sub TU, first, the entropy encoding unit 110 determines whether the CABAC skip mode is valid (S203).

When the CABAC skip mode is not valid (no in S203), the entropy encoding unit 110 performs the following determination. Specifically, the entropy encoding unit 110 determines whether, among the syntax elements already encoded in the processing target TU, the number of syntax elements subjected to non-bypass CABAC processing is equal to or greater than a threshold value (S204).

If both of the above 2 determinations are false (no in S203 and no in S204), the entropy encoding unit 110 binarizes the coefficients in accordance with the 1st syntax structure (S205). On the other hand, if at least one of the 2 determinations is true (yes in S203 or yes in S204), the entropy encoding unit 110 binarizes the coefficients in accordance with the 2nd syntax structure (S206).

Next, the entropy encoding unit 110 determines again whether the CABAC skip mode is valid (S208). When the CABAC skip mode is not valid (no in S208), the entropy encoding unit 110 performs CABAC processing on the binarized data sequence in which the coefficients are binarized, and generates a bit sequence (S209). On the other hand, when the CABAC skip mode is valid (yes in S208), the entropy encoding unit 110 outputs the binarized data sequence in which the coefficients are binarized, as it is, as the bit sequence.
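A corresponding sketch of the per-TU flow of fig. 50 follows; counting each produced bin as one syntax element is a simplification of the non-bypass CABAC syntax count, and the threshold is left as a parameter because its value is not specified here.

```python
def encode_tu_coefficients_example2(sub_tus, cabac_skip, cabac_engine, threshold):
    """Sketch of the decision flow of fig. 50 for one TU split into sub-TUs."""
    non_bypass_count = 0   # syntax elements already coded with non-bypass CABAC in this TU
    bins = []
    for sub_tu in sub_tus:                                   # S202 to S207
        if cabac_skip or non_bypass_count >= threshold:      # S203 / S204
            for c in sub_tu:
                bins.extend(binarize_level_syntax2(c))       # 2nd syntax structure (S206)
        else:
            for c in sub_tu:
                part = binarize_level_syntax1(c)             # 1st syntax structure (S205)
                bins.extend(part)
                non_bypass_count += len(part)                # simplified bookkeeping
    if cabac_skip:                                           # S208
        return bins
    return cabac_engine.encode(bins)                         # S209: CABAC processing
```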

The 1st syntax structure may be a syntax structure that assumes CABAC processing. Specifically, the 1st syntax structure may be a syntax structure having a larger number of binarized syntax elements than the 2nd syntax structure and a higher correlation between the values of the syntax elements. Conversely, the 2nd syntax structure may be a syntax structure having a lower correlation between the values of the syntax elements and a smaller number of binarized syntax elements than the 1st syntax structure.

Here, the coefficient may be a quantized coefficient obtained by orthogonally transforming a prediction residual coefficient of an image and then quantizing the transformed prediction residual coefficient. Alternatively, the coefficients may be quantized coefficients obtained by quantizing only the prediction residual coefficients of the image without performing orthogonal transform. Alternatively, the coefficients may be prediction residual coefficients obtained by performing neither orthogonal transform nor quantization.

Here, a loop per TU is used, but the loop may instead be a loop per CU, per CTU, or the like. Likewise, a loop per sub-TU is used here, but the loop may instead be a loop per coefficient, per encoded syntax element, or the like.

This processing flow is an example; a part of the described processing may be omitted, or processing or condition determinations not described here may be added.

Although the processing flow of the encoding device 100 has been described above, the processing flow described here can be applied to the decoding device 200 by inverting the content of the processing (for example, changing the binarization to inverse binarization and changing the encoding to decoding).

[ Effect of example 2 of coefficient encoding method ]

By using the method described with reference to fig. 50, binarization can be performed in accordance with a common syntax structure both when CABAC processing is not performed and when the number of syntax elements subjected to non-bypass CABAC processing is equal to or greater than a threshold value. Therefore, it is possible to suppress an increase in the code amount of the finally generated bit sequence and to reduce the possibility of processing delay, while suppressing an increase in the circuit scale.

[ variation of coefficient encoding method ]

The method described with reference to fig. 49 corresponds to an example in which binarization is performed in accordance with a common syntax structure when the CABAC processing skip mode is active and when the orthogonal transform skip mode is active. The method described with reference to fig. 50 corresponds to an example in which binarization is performed in accordance with a common syntax structure when the CABAC processing skip mode is active and when the number of syntaxes subjected to the non-bypass CABAC processing is equal to or greater than a threshold value.

However, the predetermined conditions for using the common syntax structure are not limited to these. When other predetermined conditions are satisfied and when the CABAC processing skip mode is enabled, binarization may be performed according to a common syntax structure. In addition, predetermined conditions may be set in advance.

In addition, the syntax structure does not have to be completely common. Only a part of the syntax structure may be common, and the other part may differ depending on the respective conditions. For example, within a common syntax structure, some syntax elements may differ depending on the conditions.

In addition, switching is not limited to 2 types of syntax structures; switching among 3 or more types of syntax structures may be performed by combining conditions. Specifically, a 3rd syntax structure may be used in addition to the 1st syntax structure and the 2nd syntax structure described above.

For example, in the above example, when the CABAC processing skip mode is invalid and the condition for using the 2nd syntax structure is not satisfied, the 1st syntax structure is used. However, in this case, the 3rd syntax structure may be used if the condition for using the 3rd syntax structure is also satisfied, and the 1st syntax structure may be used if the condition for using the 3rd syntax structure is not satisfied.

That is, in the present embodiment, when the CABAC processing skip mode is invalid and the condition for using the 2nd syntax structure is not satisfied, the 1st syntax structure is not necessarily always used, and use of the 3rd syntax structure may be allowed.

[ typical example of composition and treatment ]

A typical example of the configuration and processing of the encoding device 100 and the decoding device 200 described above is shown below.

Fig. 51 is a flowchart showing the operation of the encoding device 100. For example, the encoding device 100 includes a circuit and a memory connected to the circuit. The circuits and memories included in the coding device 100 may correspond to the processor a1 and the memory a2 shown in fig. 40. The circuit of the encoding device 100 performs the operation shown in fig. 51. Specifically, the circuit of the encoding device 100 encodes an image in operation (S301). The circuit of the encoding apparatus 100 may encode an image for each block.

Fig. 52 is a flowchart showing a specific example of the encoding operation (S301) shown in fig. 51. For example, the circuit of the encoding device 100 performs the operation shown in fig. 52 in encoding an image (S301).

Specifically, the circuit of the encoding device 100 binarizes the coefficient information of the image (S311). Also, the circuit of the encoding device 100 controls whether to apply arithmetic coding to the binarized data sequence in which the coefficient information is binarized (S312).

That is, the circuit of the encoding device 100 determines whether to apply arithmetic coding to the binarized data sequence in which the coefficient information is binarized. Here, in a case where it is determined that the arithmetic coding is applied, the circuit of the encoding device 100 applies the arithmetic coding to the binarized data sequence. On the other hand, in the case where it is determined that the arithmetic coding is not applied, the circuit of the encoding device 100 does not apply the arithmetic coding to the binarized data sequence.

The circuit of the encoding device 100 outputs a bit sequence including a binarized data sequence to which arithmetic coding is applied or not applied (S313).

For example, when it is determined that arithmetic coding is applied and arithmetic coding is applied to the binarized data sequence, the circuit of the encoding apparatus 100 outputs a bit sequence including the binarized data sequence to which arithmetic coding is applied. On the other hand, in the case where it is determined that arithmetic coding is not applied and arithmetic coding is not applied to the binarized data sequence, the circuit of the encoding apparatus 100 outputs a bit sequence including the binarized data sequence to which arithmetic coding is not applied.

Fig. 53 is a flowchart showing a specific example of the binarization operation (S311) shown in fig. 52. For example, the circuit of the encoding device 100 performs the operation shown in fig. 53 in the binarization of the coefficient information (S311).

Specifically, the circuit of the encoding device 100 binarizes the coefficient information in accordance with the 1st syntax structure or the 2nd syntax structure, according to whether arithmetic coding is applied to the binarized data sequence (S321) and whether a predetermined condition is satisfied (S322). For example, when arithmetic coding is applied to the binarized data sequence and the predetermined condition is not satisfied (yes in S321 and no in S322), the circuit of the encoding apparatus 100 binarizes the coefficient information in accordance with the 1st syntax structure (S323).

When arithmetic coding is applied to the binarized data sequence and the predetermined condition is satisfied (yes in S321 and yes in S322), the circuit of the encoding apparatus 100 binarizes the coefficient information in accordance with the 2nd syntax structure (S324). Here, the 2nd syntax structure is different from the 1st syntax structure. Further, when arithmetic coding is not applied to the binarized data sequence (no in S321), the circuit of the encoding device 100 binarizes the coefficient information in accordance with the 2nd syntax structure (S324).

This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

For example, the predetermined condition may be a condition under which the orthogonal transform process is skipped when the coefficient information is derived from the prediction residual of the image. This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition that orthogonal transform processing is skipped is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

For example, the predetermined condition may be a condition that the number of syntax elements subjected to encoding processing in a mode different from the bypass mode according to CABAC in a region including the processing target block in the image is equal to or greater than a threshold value.

This makes it possible to make the syntax structure common when arithmetic coding is not applied and when a predetermined condition that the number of syntaxes of non-bypass CABAC is equal to or greater than a threshold is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

Further, the bit sequence may indicate whether the application of arithmetic coding is valid in a sequence parameter set, a picture parameter set, or a slice header, for example.

Thus, the encoding device 100 can switch whether or not the application of arithmetic coding is valid in the sequence parameter set, the picture parameter set, or the slice header. Therefore, the encoding device 100 can suppress frequent switching such as switching for each data type as to whether or not arithmetic coding is applied. This can suppress an increase in the code amount and suppress processing delay.

For example, the circuit of the encoding device 100 may switch whether or not arithmetic coding is applied, in units including 1 or more slices or 1 or more pictures, inclusively. This enables the encoding device 100 to inclusively switch whether or not arithmetic coding is applied in a large unit. Therefore, the encoding device 100 can suppress frequent switching such as switching for each data type as to whether or not arithmetic coding is applied. This can suppress an increase in the code amount and suppress processing delay.

The above-described operation performed by the circuit of the encoding apparatus 100 may be performed by the entropy encoding unit 110 of the encoding apparatus 100.

Fig. 54 is a flowchart showing the operation of decoding apparatus 200. For example, the decoding device 200 includes a circuit and a memory connected to the circuit. The circuits and memories included in the decoding device 200 may correspond to the processor b1 and the memory b2 shown in fig. 46. The circuit of decoding apparatus 200 performs the operation shown in fig. 54. Specifically, the circuit of the decoding device 200 decodes the image in operation (S401). The circuit of the decoding apparatus 200 may decode the image for each block.

Fig. 55 is a flowchart showing a specific example of the decoding operation (S401) shown in fig. 54. For example, the circuit of the decoding device 200 performs the operation shown in fig. 55 in decoding the image (S401).

Specifically, the circuit of the decoding device 200 acquires a bit sequence including a binarized data sequence in which the coefficient information of the image is binarized (S411). Also, the circuit of the decoding apparatus 200 controls whether or not arithmetic decoding is applied to the binarized data sequence (S412).

That is, the circuit of the decoding device 200 determines whether to apply arithmetic decoding to the binarized data sequence included in the bit sequence. Here, in the case where it is determined that the arithmetic decoding is applied, the circuit of the decoding apparatus 200 applies the arithmetic decoding to the binarized data sequence. On the other hand, in the case where it is determined that the arithmetic decoding is not applied, the circuit of the decoding apparatus 200 does not apply the arithmetic decoding to the binarized data sequence.

Also, the circuit of the decoding apparatus 200 inversely binarizes the binarized data sequence to which arithmetic decoding is applied or not applied (S413).

For example, when it is determined that arithmetic decoding is applied and arithmetic decoding is applied to the binarized data sequence, the circuit of the decoding apparatus 200 inversely binarizes the binarized data sequence to which arithmetic decoding is applied. On the other hand, in the case where it is determined that the arithmetic decoding is not applied and the arithmetic decoding is not applied to the binarized data sequence, the circuit of the decoding apparatus 200 inversely binarizes the binarized data sequence to which the arithmetic decoding is not applied.

Fig. 56 is a flowchart showing a specific example of the inverse binarization operation (S413) shown in fig. 55. For example, the circuit of the decoding device 200 performs the operation shown in fig. 56 in the inverse binarization of the binarized data sequence (S413).

Specifically, the circuit of the decoding device 200 inversely binarizes the binarized data sequence in accordance with the 1st syntax structure or the 2nd syntax structure, according to whether arithmetic decoding is applied to the binarized data sequence (S421) and whether a predetermined condition is satisfied (S422). For example, when arithmetic decoding is applied to the binarized data sequence and the predetermined condition is not satisfied (yes in S421 and no in S422), the circuit of the decoding apparatus 200 inversely binarizes the binarized data sequence in accordance with the 1st syntax structure (S423).

When arithmetic decoding is applied to the binarized data sequence and the predetermined condition is satisfied (yes in S421 and yes in S422), the circuit of the decoding device 200 inversely binarizes the binarized data sequence in accordance with the 2nd syntax structure (S424). Here, the 2nd syntax structure is different from the 1st syntax structure. Further, when arithmetic decoding is not applied to the binarized data sequence (no in S421), the circuit of the decoding apparatus 200 inversely binarizes the binarized data sequence in accordance with the 2nd syntax structure (S424).
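On the decoding side, the selection of fig. 56 can be sketched as follows, reusing the illustrative level formats assumed in the earlier sketches; the inverse binarization shown here undoes those assumed formats only and is not the syntax actually defined for the decoding device 200.

```python
def inverse_binarize_level(bins, use_syntax2):
    """Undo one level of the illustrative binarizations; returns (level, bins consumed)."""
    if use_syntax2:
        mag = 0
        for i in range(8):                      # 8-bit magnitude (assumed value range)
            mag = (mag << 1) | bins[i]
        return (-mag if bins[8] else mag), 9
    if bins[0] == 0:                            # sig_flag
        return 0, 1
    pos, mag = 1, 1
    if bins[pos] == 1:                          # gt1_flag
        pos += 1
        while bins[pos] == 1:                   # unary remainder
            mag += 1
            pos += 1
        mag += 1
        pos += 1
    else:
        pos += 1
    return (-mag if bins[pos] else mag), pos + 1


def inverse_binarize_coefficients(bins, arithmetic_decoding_applied, condition_satisfied):
    """Selection of fig. 56 (S421 to S424) applied to a whole binarized data sequence."""
    use_syntax2 = (not arithmetic_decoding_applied) or condition_satisfied
    levels, pos = [], 0
    while pos < len(bins):
        level, consumed = inverse_binarize_level(bins[pos:], use_syntax2)
        levels.append(level)
        pos += consumed
    return levels
```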

This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

For example, the predetermined condition may be a condition under which the inverse orthogonal transform process is skipped when deriving the prediction residual of the image from the coefficient information. This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition that the inverse orthogonal transform process is skipped is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

For example, the predetermined condition may be a condition that the number of syntax elements for performing decoding processing in a mode different from the bypass mode according to CABAC in a region including the processing target block in the image is equal to or greater than a threshold value.

This makes it possible to make the syntax structure common when arithmetic decoding is not applied and when a predetermined condition that the number of syntaxes of non-bypass CABAC is equal to or greater than a threshold is satisfied. Thus, it is possible to suppress an increase in the amount of code and suppress processing delay while suppressing an increase in the circuit scale.

Further, the bit sequence may indicate whether or not the application of arithmetic decoding is valid in a sequence parameter set, a picture parameter set, or a slice header, for example.

Thus, the decoding apparatus 200 can switch whether or not the application of arithmetic decoding is valid in the sequence parameter set, the picture parameter set, or the slice header. Therefore, the decoding apparatus 200 can suppress frequent switching such as switching according to the data type, depending on whether arithmetic decoding is applied. This can suppress an increase in the code amount and suppress processing delay.

For example, the circuit of the decoding device 200 may switch whether or not arithmetic decoding is applied, in units including 1 or more slices or 1 or more pictures, inclusively. This enables the decoding device 200 to inclusively switch whether or not arithmetic decoding is applied in a large unit. Therefore, the decoding device 200 can suppress frequent switching such as switching for each data type depending on whether arithmetic decoding is applied. This can suppress an increase in the code amount and suppress processing delay.

The above-described operations performed by the circuit of the decoding apparatus 200 may be performed by the entropy decoding unit 202 of the decoding apparatus 200.

[ other examples ]

The encoding device 100 and the decoding device 200 of each of the above examples may be used as an image encoding device and an image decoding device, respectively, or may be used as a moving image encoding device and a moving image decoding device.

Further, the encoding device 100 and the decoding device 200 may perform only a part of the above-described operations, and the other device may perform the other operation. The encoding device 100 and the decoding device 200 may include only some of the above-described plurality of components, and other devices may include other components.

At least a part of the above examples may be used as an encoding method or a decoding method, may be used as a binarization method or an inverse binarization method, or may be used as another method.

In the above examples, the coefficient information of the image is subjected to the processing such as binarization, inverse binarization, encoding, and decoding, but the processing is not limited to the coefficient information, and the processing may be performed on image information including other information of the image.

Each component may be configured by dedicated hardware, or may be realized by executing a software program suitable for each component. Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory.

Specifically, each of the encoding device 100 and the decoding device 200 may include a Processing circuit (Processing circuit) and a Storage device (Storage) electrically connected to the Processing circuit and accessible from the Processing circuit. For example, the processing circuit corresponds to the processor a1 or b1, and the storage device corresponds to the memory a2 or b 2.

The processing circuit includes at least one of dedicated hardware and a program execution unit, and executes processing using the storage device. In addition, in the case where the processing circuit includes a program execution unit, the storage device stores a software program executed by the program execution unit.

Here, software for realizing the above-described encoding device 100, decoding device 200, and the like is a program as follows.

For example, the program may also cause a computer to execute an encoding method that encodes an image; in the encoding of the image, the coefficient information of the image is binarized; controlling whether to apply arithmetic coding to the binarized data sequence in which the coefficient information is binarized; outputting a bit sequence including the binarized data sequence to which arithmetic coding is applied or to which arithmetic coding is not applied; in the binarization of the coefficient information, the coefficient information is binarized in accordance with a 1st syntax structure when arithmetic coding is applied to the binarized data sequence and a predetermined condition is not satisfied; applying arithmetic coding to the binarized data sequence and, when the predetermined condition is satisfied, binarizing the coefficient information in a 2nd syntax structure different from the 1st syntax structure; the coefficient information is binarized according to the 2nd syntax structure without applying arithmetic coding to the binarized data sequence.

Further, for example, the program may cause a computer to execute a decoding method of decoding an image; in the decoding of the image, acquiring a bit sequence including a binarized data sequence in which coefficient information of the image is binarized; controlling whether to apply arithmetic decoding to the binarized data sequence; inversely binarizing the binarized data sequence to which arithmetic decoding is applied or not applied; in the inverse binarization of the binarized data sequence, when arithmetic decoding is applied to the binarized data sequence and a predetermined condition is not satisfied, the binarized data sequence is inverse binarized according to a 1st syntax structure; applying arithmetic decoding to the binarized data sequence and, when the predetermined condition is satisfied, inverse binarizing the binarized data sequence in accordance with a 2nd syntax structure different from the 1st syntax structure; the binarized data sequence is inverse binarized according to the 2nd syntax structure without applying arithmetic decoding to the binarized data sequence.

As described above, each component may be a circuit. These circuits may constitute 1 circuit as a whole, or may be different circuits. Each component may be realized by a general-purpose processor or may be realized by a dedicated processor.

The processing executed by a specific component may be executed by another component. The order of executing the processes may be changed, or a plurality of processes may be executed in parallel. The encoding and decoding device may include the encoding device 100 and the decoding device 200.

The ordinal numbers such as 1st and 2nd used in the description may be replaced as appropriate. Further, an ordinal number may be newly given to, or removed from, a constituent element.

Although the technical solutions of the encoding device 100 and the decoding device 200 have been described above based on a plurality of examples, the technical solutions of the encoding device 100 and the decoding device 200 are not limited to these examples. Forms obtained by applying modifications conceivable by those skilled in the art to the examples, and forms constructed by combining components of different examples, may also be included, as long as they do not depart from the gist of the present invention.

The present invention may be implemented by combining at least 1 or more of the technical means disclosed herein with at least a part of the other technical means of the present invention. Further, a part of the processing, a part of the configuration of the apparatus, a part of the syntax, and the like described in the flowcharts of 1 or more of the technical means disclosed herein may be combined with other technical means to be implemented.

[ implementation and application ]

In each of the above embodiments, the functional blocks can typically be realized by an MPU (micro processing unit), a memory, and the like. The processing of each functional block may be realized by a program execution unit, such as a processor, reading out and executing software (a program) recorded in a recording medium such as a ROM. The software may be distributed. The software may be recorded in a recording medium such as a semiconductor memory. Each functional block may also be realized by hardware (a dedicated circuit). Various combinations of hardware and software may be employed.

The processing described in each embodiment may be realized by centralized processing using a single apparatus (system), or may be realized by distributed processing using a plurality of apparatuses. The processor that executes the program may be single or plural. That is, the collective processing may be performed or the distributed processing may be performed.

The technical solution of the present invention is not limited to the above embodiment, and various modifications can be made, and they are also included in the technical solution of the present invention.

Further, an application example of the moving image encoding method (image encoding method) or the moving image decoding method (image decoding method) described in each of the above embodiments and various systems for implementing the application example will be described. Such a system may be characterized by having an image encoding device using an image encoding method, an image decoding device using an image decoding method, or an image encoding/decoding device including both devices. Other configurations of such a system can be changed as appropriate depending on the case.

[ use example ]

Fig. 57 is a diagram showing the overall configuration of a content providing system ex100 suitable for realizing a content distribution service. The area in which the communication service is provided is divided into cells of a desired size, and base stations ex106, ex107, ex108, ex109, and ex110, which are fixed wireless stations in the illustrated example, are installed in the respective cells.

In the content providing system ex100, devices such as a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, and a smart phone ex115 are connected to the internet ex101 via the internet service provider ex102, the communication network ex104, and the base stations ex106 to ex 110. The content providing system ex100 may be connected by combining some of the above-described devices. In various implementations, the devices may be directly or indirectly connected to each other via a telephone network, short-range wireless, or the like, without passing through the base stations ex106 to ex 110. Furthermore, the streaming server ex103 may be connected to the respective devices such as the computer ex111, the game machine ex112, the camera ex113, the home appliance ex114, and the smartphone ex115 via the internet ex101 and the like. The streaming server ex103 is connected to a terminal or the like in a hot spot in the airplane ex117 via the satellite ex 116.

Instead of the base stations ex106 to ex110, a wireless access point, a hot spot, or the like may be used. The streaming server ex103 may be directly connected to the communication network ex104 without via the internet ex101 or the internet service provider ex102, or may be directly connected to the airplane ex117 without via the satellite ex 116.

The camera ex113 is a device such as a digital camera capable of taking still images and moving images. The smartphone ex115 is a smartphone, a mobile phone, a PHS (Personal Handyphone System), or the like compatible with a mobile communication system called 2G, 3G, 3.9G, 4G, or, in the future, 5G.

The home appliance ex114 is a refrigerator, a device included in a household fuel cell cogeneration (cogeneration) system, or the like.

In the content providing system ex100, a terminal having a camera function is connected to the streaming server ex103 via the base station ex106 or the like, thereby enabling live distribution or the like. In live distribution, the terminals (such as the terminals in the computer ex111, the game machine ex112, the camera ex113, the home appliance ex114, the smart phone ex115, and the airplane ex 117) may perform the encoding process described in the above embodiments on the still image or moving image content photographed by the user using the terminals, may multiplex video data obtained by encoding with audio data obtained by encoding audio corresponding to the video, and may transmit the obtained data to the streaming server ex 103. That is, each terminal functions as an image encoding device according to an aspect of the present invention.

On the other hand, the streaming server ex103 performs streaming distribution of the content data transmitted to the client having the request. The client is a terminal or the like in the computer ex111, the game machine ex112, the camera ex113, the home appliance ex114, the smart phone ex115, or the airplane ex117, which can decode the data subjected to the encoding processing. Each device that receives the distributed data may decode and reproduce the received data. That is, each device may function as an image decoding apparatus according to an aspect of the present invention.

[ distributed processing ]

The streaming server ex103 may be a plurality of servers or a plurality of computers, and may distribute data by distributed processing or recording. For example, the streaming server ex103 may be implemented by a CDN (content Delivery Network), and content Delivery is implemented by a Network that connects a plurality of edge servers distributed in the world and edge servers. In a CDN, edge servers that are physically close may be dynamically allocated according to clients. Furthermore, by caching and distributing the content to the edge server, latency can be reduced. Further, when some type of error occurs or when the communication state changes due to an increase in traffic or the like, the processing can be distributed by a plurality of edge servers, or the distribution can be continued by switching the distribution subject to another edge server or bypassing the network in which the failure has occurred.

Further, not only the distributed processing of the distribution itself, but also the encoding processing of the photographed data may be performed by each terminal, may be performed on the server side, or may be shared between them. As an example, the encoding process is generally performed in two processing loops. In the 1st loop, the complexity or code amount of the image is detected in frame or scene units. In the 2nd loop, processing for improving the coding efficiency while maintaining the image quality is performed. For example, the terminal performs the 1st encoding process and the server side that receives the content performs the 2nd encoding process, thereby reducing the processing load on each terminal and improving the quality and efficiency of the content. In this case, if there is a request to receive and decode in substantially real time, the data first encoded by the terminal can also be received and reproduced by another terminal, so more flexible real-time distribution is possible.

As another example, the camera ex113 or the like extracts feature amounts (amounts of features or characteristics) from an image, compresses the feature amount data as metadata, and transmits it to the server. The server, for example, determines the importance of an object from the feature amounts, switches the quantization accuracy and the like accordingly, and performs compression in accordance with the meaning of the image (or the importance of the content). The feature amount data is particularly effective for improving the accuracy and efficiency of motion vector prediction at the time of recompression in the server. Further, the terminal may perform simple coding such as VLC (variable length coding), and the server may perform coding with a large processing load such as CABAC (context adaptive binary arithmetic coding).

As still another example, in a stadium, a shopping mall, a factory, or the like, there is a case where a plurality of terminals photograph a plurality of pieces of video data of substantially the same scene. In this case, a plurality of terminals that have performed image capturing and other terminals and servers that have not performed image capturing are used as necessary, and the encoding process is assigned and distributed in units of GOPs (Group of pictures), pictures, tiles that divide pictures, or the like. Thus, delay can be reduced and real-time performance can be improved.

Further, since the plurality of pieces of video data show substantially the same scene, the server may manage and/or give instructions so that the pieces of video data photographed by the respective terminals can refer to each other. The server may also receive encoded data from each terminal, change the reference relationship among the plurality of pieces of data, or correct or replace a picture and re-encode it. In this way, a stream in which the quality and efficiency of each piece of data are improved can be generated.

Further, the server may perform transcoding for changing the encoding method of the video data and then distribute the video data. For example, the server may convert the MPEG encoding system into VP (for example, VP9), or may convert h.264 into h.265.

In this way, the encoding process can be performed by a terminal or by 1 or more servers. Therefore, in the following description, "server" or "terminal" is used as the entity that performs the processing, but a part or all of the processing performed by the server may be performed by the terminal, or a part or all of the processing performed by the terminal may be performed by the server. The same applies to the decoding process.

[3D, Multi-Angle ]

There are increasing cases where different scenes photographed by a plurality of terminals such as the camera ex113 and the smartphone ex115, which are substantially synchronized with each other, or images or videos of the same scene photographed from different angles, are combined and used. The images photographed by the respective terminals may be merged based on the relative positional relationship between the terminals acquired separately, or based on areas where feature points included in the images coincide with each other.

The server may, instead of encoding a two-dimensional moving image, encode a still image automatically or at a time designated by the user based on scene analysis of the moving image or the like, and transmit the encoded still image to the receiving terminal. When the relative positional relationship between the photographing terminals can be acquired, the server can generate the three-dimensional shape of the scene based not only on a two-dimensional moving image but also on videos of the same scene photographed from different angles. The server may separately encode three-dimensional data generated by a point cloud or the like, or may select or reconstruct, based on the result of recognizing or tracking a person or an object using the three-dimensional data, images from among the images photographed by the plurality of terminals, and generate the image to be transmitted to the receiving terminal.

In this way, the user can enjoy a scene by arbitrarily selecting each video corresponding to each photographing terminal, and can also enjoy the contents of a video from which a selected viewpoint is cut out from three-dimensional data reconstructed using a plurality of images or videos. Further, it is also possible to collect audio from a plurality of different angles together with video, and the server may multiplex the audio from a specific angle or space with the corresponding video and transmit the multiplexed video and audio.

In recent years, contents that link the real world with the virtual world, such as Virtual Reality (VR) and Augmented Reality (AR), have become widespread. In the case of a VR image, the server creates viewpoint images for the right eye and the left eye, and may perform encoding that allows reference between the viewpoint images by Multi-View Coding (MVC) or the like, or may encode the viewpoint images as separate streams without mutual reference. When the separate streams are decoded, they can be reproduced in synchronization with each other according to the viewpoint of the user so as to reproduce a virtual three-dimensional space.

In the case of an AR image, the server may superimpose virtual object information existing in a virtual space onto camera information of the real space, based on the three-dimensional position or the movement of the user's viewpoint. The decoding device may acquire or hold the virtual object information and the three-dimensional data, generate a two-dimensional image in accordance with the movement of the user's viewpoint, and create the superimposed data by smoothly connecting them. Alternatively, the decoding device may transmit the movement of the user's viewpoint to the server in addition to the request for the virtual object information, and the server may create the superimposed data in accordance with the received viewpoint movement from the three-dimensional data held in the server, encode the superimposed data, and distribute it to the decoding device. The superimposed data typically has an α value indicating transparency in addition to RGB; the server may, for example, set the α value of the portion other than the object created from the three-dimensional data to 0 and encode that portion in a transparent state. Alternatively, the server may, like a chroma key, set predetermined RGB values as a background and generate data in which the portion other than the object has the background color.
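A minimal sketch, in Python, of the two compositing options described above (alpha channel versus chroma key) and of the decoder-side blending; the RGBA layout, the key color, and the mask representation are assumptions made only for illustration.

```python
import numpy as np

def render_overlay(object_mask, object_rgb):
    """Build RGBA superimposition data: alpha = 0 (fully transparent) outside
    the object generated from the three-dimensional data."""
    h, w = object_mask.shape
    rgba = np.zeros((h, w, 4), dtype=np.uint8)
    rgba[..., :3] = object_rgb
    rgba[..., 3] = np.where(object_mask, 255, 0)   # transparent background
    return rgba

def render_overlay_chroma_key(object_mask, object_rgb, key=(0, 255, 0)):
    """Alternative: paint everything outside the object with a fixed
    chroma-key color instead of carrying an alpha channel."""
    rgb = object_rgb.copy()
    rgb[~object_mask] = key
    return rgb

def composite(camera_rgb, overlay_rgba):
    """Decoder side: blend the overlay onto the real-space camera image."""
    alpha = overlay_rgba[..., 3:4].astype(np.float32) / 255.0
    out = overlay_rgba[..., :3] * alpha + camera_rgb * (1.0 - alpha)
    return out.astype(np.uint8)
```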

Similarly, the decoding process of the distributed data may be performed by the client (for example, a terminal), may be performed by the server, or may be shared between them. As one example, a certain terminal may transmit a reception request to a server, another terminal may receive the content corresponding to the request, perform the decoding process, and transmit the decoded signal to a device having a display. By distributing the processing and selecting appropriate content regardless of the performance of the communicable terminal itself, data of good image quality can be reproduced. As another example, image data of a large size may be received by a TV or the like while a partial area, such as a tile into which the picture is divided, is decoded and displayed on a personal terminal of a viewer. This makes it possible to share the overall picture while confirming, at hand and in more detail, the area each viewer is in charge of or an area to be checked.

In a situation where a plurality of indoor and outdoor short-distance, medium-distance, or long-distance wireless connections can be used, content may be received seamlessly using a distribution system standard such as MPEG-DASH. The user can thus switch devices in real time while freely selecting a decoding device or a display device, such as the user's own terminal or a display installed indoors or outdoors. Decoding can also be performed while switching the terminal that decodes and the terminal that displays, based on the user's own position information or the like. This makes it possible, while the user moves to a destination, to map and display information on part of a wall surface or floor surface of a nearby building in which a displayable device is embedded. The bit rate of the received data can also be switched based on how easily the encoded data can be accessed on the network, for example whether the encoded data is cached on a server that the receiving terminal can access in a short time, or copied to an edge server of a content distribution service.

[ hierarchical encoding ]

The switching of contents will be described using, as shown in fig. 58, a scalable (hierarchical) stream that is compression-encoded by applying the moving picture encoding method described in each of the above embodiments. The server may hold, as individual streams, a plurality of streams having the same content at different qualities, or may switch the content by exploiting the characteristics of a temporally and spatially scalable stream obtained by hierarchical encoding as shown in the figure. That is, the decoding side can freely switch between decoding low-resolution content and decoding high-resolution content by determining which layer to decode based on intrinsic factors such as its own performance and extrinsic factors such as the state of the communication band. For example, when the user wants to continue viewing, after going home, on a device such as an internet TV, a video that was being viewed on the smartphone ex115 while on the move, the device only needs to decode the same stream up to a different layer, so the load on the server side can be reduced.
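The decode-side decision of "which layer to decode" can be illustrated with a small Python sketch; the layer description fields, thresholds, and bitrates below are hypothetical values chosen only to show how intrinsic (device capability) and extrinsic (communication band) factors might be combined.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    layer_id: int
    width: int
    height: int
    bitrate_kbps: int   # rate needed to receive up to and including this layer

def choose_layer(layers, max_decode_pixels, available_kbps):
    """Pick the highest layer that both the device (intrinsic factor) and the
    current communication band (extrinsic factor) can handle."""
    best = layers[0]    # base layer assumed always decodable
    for layer in sorted(layers, key=lambda l: l.layer_id):
        if (layer.width * layer.height <= max_decode_pixels
                and layer.bitrate_kbps <= available_kbps):
            best = layer
    return best

# A smartphone on a congested network falls back to a lower layer, while an
# internet TV on a home network decodes an enhancement layer of the same stream.
layers = [Layer(0, 960, 540, 1500), Layer(1, 1920, 1080, 6000), Layer(2, 3840, 2160, 16000)]
print(choose_layer(layers, max_decode_pixels=1920 * 1080, available_kbps=8000).layer_id)  # -> 1
```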

Further, in addition to the configuration described above, in which pictures are coded layer by layer and scalability is realized in an extension layer above the base layer, the extension layer may include meta information such as statistical information based on the picture. The decoding side may generate high-quality content by applying super-resolution processing to the picture of the base layer based on the meta information. Super-resolution may increase the SN ratio while maintaining the resolution, and/or enlarge the resolution. The meta information includes, for example, information for specifying linear or nonlinear filter coefficients used in the super-resolution processing, or information for specifying parameter values in the filter processing, machine learning, or least-squares operations used in the super-resolution processing.
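As a rough sketch of how signalled filter coefficients could drive super-resolution of a base-layer picture, the Python below upsamples a picture and applies a linear kernel taken from meta information; the meta-information layout (a flat coefficient list plus a scale factor) and the nearest-neighbour upsampler are assumptions, not the normative process.

```python
import numpy as np

def super_resolve(base_picture, meta):
    """Upscale the base-layer picture and filter it with linear coefficients
    carried as meta information (assumed layout: square kernel + scale)."""
    scale = meta["scale"]
    kernel = np.asarray(meta["filter_coeffs"], dtype=np.float32)
    k = int(np.sqrt(kernel.size))
    kernel = kernel.reshape(k, k)

    # Nearest-neighbour upsampling as a stand-in for the real upsampler.
    up = np.repeat(np.repeat(base_picture, scale, axis=0), scale, axis=1).astype(np.float32)

    # Plain 2-D convolution with the signalled coefficients.
    pad = k // 2
    padded = np.pad(up, pad, mode="edge")
    out = np.zeros_like(up)
    h, w = up.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = (padded[y:y+k, x:x+k] * kernel).sum()
    return np.clip(out, 0, 255).astype(np.uint8)

meta = {"scale": 2, "filter_coeffs": [0, -1, 0, -1, 5, -1, 0, -1, 0]}  # hypothetical sharpening kernel
```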

Alternatively, a configuration may be provided in which a picture is divided into tiles or the like according to the meaning of an object or the like within an image. The decoding side decodes only a portion of the area by selecting the tile for decoding. Further, by storing the attributes of the object (person, car, ball, etc.) and the position within the video (coordinate position in the same image, etc.) as meta information, the decoding side can specify the position of a desired object based on the meta information and determine a tile including the object. For example, as shown in fig. 59, the meta information may be stored using a data storage structure different from pixel data such as an SEI (supplemental enhancement information) message in HEVC. The meta information indicates, for example, the position, size, color, or the like of the main object.
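The tile-selection step described above can be sketched as follows; the meta-information fields (a pixel-domain bounding box) and the uniform tile grid are assumptions used only to show how a decoder might map an object position onto the tiles that must be decoded.

```python
def tiles_for_object(obj_meta, pic_width, pic_height, tile_cols, tile_rows):
    """Return the (row, col) indices of tiles that intersect the object
    described by meta information carrying a bounding box in pixels."""
    tile_w = pic_width // tile_cols
    tile_h = pic_height // tile_rows
    x0, y0, w, h = obj_meta["x"], obj_meta["y"], obj_meta["width"], obj_meta["height"]
    col0, col1 = x0 // tile_w, (x0 + w - 1) // tile_w
    row0, row1 = y0 // tile_h, (y0 + h - 1) // tile_h
    return [(r, c) for r in range(row0, min(row1, tile_rows - 1) + 1)
                   for c in range(col0, min(col1, tile_cols - 1) + 1)]

# A 3840x2160 picture split into a 4x4 tile grid; a "person" object near the centre.
print(tiles_for_object({"x": 1800, "y": 900, "width": 300, "height": 600}, 3840, 2160, 4, 4))
```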

The meta information may be stored in units of a plurality of pictures, such as a stream, a sequence, or a random access unit. The decoding side can acquire the time when a specific person appears in the video, and the like, and can specify the picture in which the object exists by matching the information on the picture unit and the time information, and can determine the position of the object in the picture.

[ optimization of Web Page ]

Fig. 60 is a diagram showing an example of a display screen of a web page on the computer ex111 or the like. Fig. 61 is a diagram showing an example of a display screen of a web page on the smartphone ex115 or the like. As shown in fig. 60 and 61, a web page sometimes includes a plurality of link images that are links to image content, and their visibility differs depending on the viewing device. When a plurality of link images are visible on the screen, the display device (decoding device) may display, as the link image, a still image or an I picture included in each content, may display a video such as a gif animation using a plurality of still images or I pictures, or may receive only the base layer and decode and display the video, until the user explicitly selects a link image, until the link image approaches the vicinity of the center of the screen, or until the entire link image enters the screen.

When a link image is selected by the user, the display device decodes the link image while giving the base layer the highest priority, for example. If the HTML constituting the web page contains information indicating that the content is scalable, the display device may decode up to the extension layer. Further, in order to ensure real-time performance before selection or when the communication band is very limited, the display device can reduce the delay between the decoding time and the display time of the leading picture (the delay from the start of decoding of the content to the start of display) by decoding and displaying only forward-referenced pictures (I pictures, P pictures, and B pictures referenced only in the forward direction). The display device may also deliberately ignore the reference relationship of pictures, coarsely decode all B pictures and P pictures as if they were forward-referenced, and then perform normal decoding as the number of received pictures increases over time.
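A minimal sketch of the picture-selection decision described above; the picture-record fields and the "real-time margin" flag are hypothetical and stand in for whatever state the display device actually tracks.

```python
def pictures_to_decode(pictures, realtime_margin_ok):
    """Before the user selects the link image (or when the band is tight),
    decode only forward-referenced pictures; otherwise decode normally."""
    if realtime_margin_ok:
        return pictures                       # normal decoding of every picture
    return [p for p in pictures
            if p["type"] in ("I", "P") or (p["type"] == "B" and p["forward_ref_only"])]

gop = [
    {"poc": 0, "type": "I", "forward_ref_only": True},
    {"poc": 1, "type": "B", "forward_ref_only": False},
    {"poc": 2, "type": "B", "forward_ref_only": True},
    {"poc": 3, "type": "P", "forward_ref_only": True},
]
print([p["poc"] for p in pictures_to_decode(gop, realtime_margin_ok=False)])  # -> [0, 2, 3]
```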

[ automatic traveling ]

In addition, when transmitting and receiving still images or video data such as two-dimensional or three-dimensional map information for automatic travel or travel support of a vehicle, the receiving terminal may receive weather or construction information as meta information in addition to image data belonging to 1 or more layers, and decode the information in association with the received information. The meta information may belong to a layer, or may be simply multiplexed with the image data.

In this case, since the vehicle, drone, airplane, or the like that includes the receiving terminal is moving, the receiving terminal can perform seamless reception and decoding while switching among the base stations ex106 to ex110 by transmitting its own position information. The receiving terminal can also dynamically switch how much meta information it receives or how frequently it updates the map information, according to the user's selection, the user's situation, and/or the state of the communication band.

In the content providing system ex100, the client can receive, decode, and reproduce encoded information transmitted by the user in real time.

[ distribution of personal content ]

In addition, the content providing system ex100 can distribute not only high-quality, long content provided by a video distribution provider but also low-quality, short content provided by an individual, by unicast or multicast. It is conceivable that such personal content will increase in the future. In order to make the personal content better, the server may perform the encoding process after performing an editing process. This can be realized, for example, with the following configuration.

At the time of photographing, the server performs recognition processing, such as detection of a photographing error, scene search, meaning analysis, and object detection, based on the original image data or the encoded data. Then, based on the recognition result, the server manually or automatically corrects out-of-focus shots, camera shake, and the like, deletes scenes of lower importance, such as scenes darker than other pictures or out of focus, emphasizes the edge of an object, adjusts color tones, and performs other editing. The server encodes the edited data based on the editing result. It is also known that the audience rating decreases if the shooting time is too long; depending on the shooting time, the server may therefore automatically clip, based on the image processing result, not only the low-importance scenes described above but also scenes with little motion, so that the content fits within a specific time range. Alternatively, the server may generate and encode a digest based on the result of the meaning analysis of the scenes.
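A small Python sketch of the trimming decision described above: each scene carries an importance score and a motion score (assumed to come from the recognition and image-processing steps), and the least important, least dynamic scenes are dropped until the content fits a target duration. The field names and scoring are illustrative assumptions.

```python
def trim_to_duration(scenes, target_seconds):
    """Keep the most important / most dynamic scenes until the total duration
    fits within target_seconds, then restore the original scene order."""
    ranked = sorted(scenes, key=lambda s: (s["importance"], s["motion"]), reverse=True)
    result, total = [], 0.0
    for scene in ranked:
        if total + scene["duration"] <= target_seconds:
            result.append(scene)
            total += scene["duration"]
    return sorted(result, key=lambda s: s["start"])

scenes = [
    {"start": 0,  "duration": 30, "importance": 0.9, "motion": 0.7},
    {"start": 30, "duration": 60, "importance": 0.2, "motion": 0.1},  # dark, static scene
    {"start": 90, "duration": 45, "importance": 0.8, "motion": 0.5},
]
print([s["start"] for s in trim_to_duration(scenes, target_seconds=90)])  # -> [0, 90]
```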

Personal content may contain, as it is, material that infringes a copyright, an author's moral right, a portrait right, or the like, and the shared range may exceed the intended range, which is inconvenient for the individual. Thus, for example, the server may forcibly change the face of a person in the periphery of the screen, the inside of a house, or the like into an out-of-focus image before encoding. The server may also recognize whether the face of a person different from a person registered in advance appears in the image to be encoded and, if so, apply processing such as mosaicking to the face portion. Alternatively, as pre-processing or post-processing of the encoding, the user may specify, from the viewpoint of copyright or the like, a person or a background region to be processed, and the server may replace the specified region with another video or blur its focus. If the target is a person, the person can be tracked in the moving image and the image of the person's face portion can be replaced.
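The mosaicking step for unregistered faces could look like the Python sketch below; the face-detection output format and the registered-identity matching are assumptions, since the text does not specify how detection or identification is performed.

```python
import numpy as np

def mosaic_region(frame, box, block=16):
    """Pixelate one rectangular region (x, y, w, h) of an RGB frame in place."""
    x, y, w, h = box
    region = frame[y:y+h, x:x+w]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            blk = region[by:by+block, bx:bx+block]
            blk[...] = blk.reshape(-1, 3).mean(axis=0).astype(frame.dtype)
    return frame

def anonymize(frame, detected_faces, registered_ids):
    """Mosaic every detected face whose identity is not registered in advance.
    `detected_faces` is assumed to be a list of (person_id, bounding_box)."""
    for person_id, box in detected_faces:
        if person_id not in registered_ids:
            mosaic_region(frame, box)
    return frame
```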

Since there is a strong demand to view personal content, which tends to have a small data size, in real time, the decoding device may, depending on the bandwidth, first receive the base layer with the highest priority, decode it, and reproduce it. The decoding device may receive the enhancement layer in the meantime and, when the video is reproduced two or more times, such as when reproduction is looped, reproduce a high-quality video that includes the enhancement layer. If the stream is hierarchically encoded in this way, it is possible to provide an experience in which the moving image, though rough when not selected or at the beginning of viewing, gradually becomes smoother and of better quality. Besides hierarchical coding, the same experience can be provided if the coarse stream reproduced the first time and a second stream encoded with reference to the first moving image are configured as a single stream.

[ other practical examples ]

These encoding and decoding processes are generally performed by the LSI ex500 provided in each terminal. The LSI (large scale integration circuit) ex500 (see fig. 57) may be configured as a single chip or as a plurality of chips. Software for encoding or decoding video may be stored on some kind of recording medium (such as a CD-ROM, a flexible disk, or a hard disk) readable by the computer ex111 or the like, and the encoding and decoding processes may be performed using that software. Further, when the smartphone ex115 is equipped with a camera, moving picture data acquired by the camera may be transmitted. The moving picture data in this case may be data encoded by the LSI ex500 of the smartphone ex115.

Alternatively, the LSI ex500 may be configured to download and activate application software. In this case, the terminal first determines whether it supports the encoding method of the content or has the capability to execute the specific service. When the terminal does not support the encoding method of the content or does not have the capability to execute the specific service, the terminal downloads a codec or application software and then acquires and reproduces the content.

In addition, not only in the content providing system ex100 via the internet ex101 but also in a digital broadcasting system, at least one of the moving picture encoding device (image encoding device) and the moving picture decoding device (image decoding device) of the above embodiments may be incorporated. Since multiplexed data in which video and audio are multiplexed is carried on broadcast radio waves such as satellite waves and transmitted and received, there is a difference in that a digital broadcasting system is suited to multicast whereas the configuration of the content providing system ex100 facilitates unicast; however, similar applications are possible for the encoding process and the decoding process.

[ hardware constitution ]

Fig. 62 is a diagram showing further details of the smartphone ex115 shown in fig. 57. Fig. 63 is a diagram showing a configuration example of the smartphone ex115. The smartphone ex115 includes an antenna ex450 for transmitting and receiving radio waves to and from the base station ex110, a camera unit ex465 capable of capturing video and still images, and a display unit ex458 such as a liquid crystal display that displays decoded data such as video captured by the camera unit ex465 and video received via the antenna ex450. The smartphone ex115 further includes an operation unit ex466 such as an operation panel, an audio output unit ex457 such as a speaker for outputting audio or sound, an audio input unit ex456 such as a microphone for inputting audio, a memory unit ex467 capable of storing encoded or decoded data of captured video or still images, recorded audio, received video or still images, mail, and the like, and a slot unit ex464 serving as an interface with a SIM ex468 for identifying the user and authenticating access to a network and various data.

The main control unit ex460, which comprehensively controls the display unit ex458, the operation unit ex466, and the like, is connected, via the synchronous bus ex470, to the power supply circuit unit ex461, the operation input control unit ex462, the video signal processing unit ex455, the camera interface unit ex463, the display control unit ex459, the modulation/demodulation unit ex452, the multiplexing/demultiplexing unit ex453, the audio signal processing unit ex454, the slot unit ex464, and the memory unit ex467.

When the power key is turned on by a user operation, the power supply circuit unit ex461 supplies power to each unit from the battery pack, thereby activating the smartphone ex115 into an operable state.

The smartphone ex115 performs processing such as calls and data communication under the control of the main control unit ex460, which has a CPU, ROM, RAM, and the like. During a call, an audio signal picked up by the audio input unit ex456 is converted into a digital audio signal by the audio signal processing unit ex454, subjected to spread spectrum processing by the modulation/demodulation unit ex452 and to digital-to-analog conversion and frequency conversion by the transmission/reception unit ex451, and the resulting signal is transmitted via the antenna ex450. Received data is amplified, subjected to frequency conversion and analog-to-digital conversion, despread by the modulation/demodulation unit ex452, converted into an analog audio signal by the audio signal processing unit ex454, and then output from the audio output unit ex457. In data communication, text, still image, or video data is sent out under the control of the main control unit ex460 via the operation input control unit ex462 based on an operation of the operation unit ex466 or the like of the main body, and similar transmission and reception processing is performed. In the data communication mode, when transmitting video, still images, or video and audio, the video signal processing unit ex455 compression-encodes the video signal stored in the memory unit ex467 or the video signal input from the camera unit ex465 by the moving picture encoding method described in each of the above embodiments, and sends the encoded video data to the multiplexing/demultiplexing unit ex453. The audio signal processing unit ex454 encodes the audio signal picked up by the audio input unit ex456 while the camera unit ex465 is capturing the video or still images, and sends the encoded audio data to the multiplexing/demultiplexing unit ex453. The multiplexing/demultiplexing unit ex453 multiplexes the encoded video data and the encoded audio data in a predetermined manner, modulation and conversion processing is performed by the modulation/demodulation unit (modulation/demodulation circuit unit) ex452 and the transmission/reception unit ex451, and the data is transmitted via the antenna ex450. The predetermined manner may be determined in advance.

When, for example, an e-mail with video and/or audio attached, a video linked from a web page, or the like is received, the multiplexing/demultiplexing unit ex453, in order to decode the multiplexed data received via the antenna ex450, demultiplexes the multiplexed data into a bit stream of video data and a bit stream of audio data, supplies the encoded video data to the video signal processing unit ex455 via the synchronous bus ex470, and supplies the encoded audio data to the audio signal processing unit ex454. The video signal processing unit ex455 decodes the video signal by a moving picture decoding method corresponding to the moving picture encoding method described in each of the above embodiments, and the video or still image included in the linked moving picture file is displayed on the display unit ex458 via the display control unit ex459. The audio signal processing unit ex454 decodes the audio signal, and the audio is output from the audio output unit ex457. As real-time streaming has become widespread, there may be situations where reproduction of sound is socially inappropriate depending on the user's circumstances. Therefore, as an initial value, a configuration in which only the video data is reproduced without reproducing the audio signal is preferable, and the audio may be reproduced in synchronization only when the user performs an operation such as clicking the video data.

Note that, although the smartphone ex115 has been described as an example, other types of terminals are also conceivable: a transmission/reception terminal having both an encoder and a decoder, a transmission terminal having only an encoder, and a reception terminal having only a decoder. In the digital broadcasting system, the explanation assumed that multiplexed data in which audio data is multiplexed with video data is received and transmitted; however, character data or the like related to the video may also be multiplexed in addition to the audio data, and the video data itself, rather than multiplexed data, may be received or transmitted.

Further, although the main control unit ex460 including the CPU has been described as controlling the encoding and decoding processes, many terminals also include a GPU. Accordingly, a configuration may be used in which a large area is processed at once by exploiting the performance of the GPU, using a memory shared by the CPU and the GPU or a memory whose addresses are managed so that it can be used in common. This shortens the encoding time, ensures real-time performance, and realizes low delay. In particular, it is efficient to perform the processes of motion estimation, deblocking filtering, SAO (Sample Adaptive Offset), and transform/quantization together, in units of pictures or the like, on the GPU instead of the CPU.

Industrial applicability

The present invention can be used in, for example, television sets, digital video recorders, car navigation systems, mobile phones, digital cameras, digital video cameras, video conference systems, electronic mirrors, and the like.

Description of the reference symbols

100 encoding device

102 division unit

104 subtraction unit

106 transformation unit

108 quantization unit

110 entropy coding unit

112, 204 inverse quantization unit

114, 206 inverse transformation unit

116, 208 addition unit

118, 210 block memory

120, 212 loop filter unit

122, 214 frame memory

124, 216 intra prediction unit

126, 218 inter prediction unit

128, 220 prediction control unit

132 binarization unit

134, 140, 236, 242 switching unit

136, 240 intermediate buffer

138 arithmetic coding unit

142 multiplexing unit

144 output buffer

200 decoding device

202 entropy decoding unit

232 input buffer

234 separation unit

238 arithmetic decoding unit

244 inverse binarization unit

1201 boundary determination unit

1202, 1204, 1206 switch

1203 filter determination unit

1205 filter processing unit

1207 filter characteristic determination unit

1208 processing determination unit

a1, b1 processor

a2, b2 memory
