Image coding method, image decoding method and device for processing image segmentation

Document No.: 1958200  Publication date: 2021-12-10

Note: This technique, "Image coding method, image decoding method and device for processing image segmentation" (处理图片分割的影像编码方法、影像解码方法及其装置), was devised by 林和燮 and 林柾润 on 2020-03-23. Its main content is as follows: the video decoding method of an embodiment of the invention comprises the following steps: decoding sub-picture information for processing a plurality of sub-pictures that comprise the tiles or slices of a picture into which a video is divided and that cover specific regions of the picture; and identifying the plurality of sub-pictures based on the sub-picture information and decoding each tile or slice constituting a sub-picture, wherein the sub-picture information includes level information indicating the processing levels corresponding to the plurality of sub-pictures.

1. An image decoding method is characterized in that,

the method comprises the following steps:

decoding sub-picture information for processing a plurality of sub-pictures that comprise the tiles or slices of a picture into which a video is divided and that cover specific regions of the picture; and

identifying the plurality of sub-pictures based on the sub-picture information, and decoding each tile or slice constituting a sub-picture,

wherein the sub-picture information includes level information indicating a processing level corresponding to the plurality of sub-pictures.

2. The image decoding method according to claim 1,

the level information is used for a processing suitability test of a bitstream containing the sub-pictures.

3. The image decoding method according to claim 2,

the processing suitability test of the bitstream determines the processing steps to apply to the bitstream according to the level information and an environment variable.

4. The image decoding method according to claim 2,

the level information is determined according to the number of tiles included in a sub-picture group containing the sub-picture.

5. The image decoding method according to claim 2,

the level information is transmitted in a supplemental enhancement information message corresponding to the bitstream.

6. The image decoding method according to claim 2,

the level information includes layer unit information indicating the maximum or minimum level at which the tiles included in each sub-picture can be processed.

7. The image decoding method according to claim 6,

the layer unit information indicates that the tiles in the sub-picture are divided according to the steps corresponding to the layer unit information and that the divided tiles are processed separately.

8. The image decoding method according to claim 2,

whether a sub-picture within an encoded sub-picture group can be decoded is variably determined by a suitability test procedure based on the level information in the decoding apparatus that decodes the sub-picture.

9. The image decoding method according to claim 8,

the suitability test procedure includes a process of determining whether to process a sub-picture, or which processing steps to apply, based on the level information and an environment variable of the decoding apparatus.

10. The image decoding method of claim 9,

the environment variable of the decoding apparatus is determined according to at least one of a decoding environment variable, a system environment variable, a network variable, and a user viewpoint variable.

11. The image decoding method according to claim 1,

the tile or slice includes a plurality of coding tree units that are basic units for partitioning the picture,

the coding tree unit is divided into one or more coding units that are basic units for performing inter prediction or intra prediction,

the coding unit is partitioned into at least one of a quadtree, a binary tree, or a ternary tree structure,

the sub-picture comprises a specific rectangular region within the picture formed by a contiguous arrangement of tiles or slices.

12. An image decoding apparatus is characterized in that,

the method comprises the following steps:

a picture dividing unit that decodes sub-picture information for processing a plurality of sub-pictures that comprise the tiles or slices of a picture into which a video is divided and that cover specific regions of the picture; and

a decoding processing unit that identifies the plurality of sub-pictures based on the sub-picture information and decodes each tile or slice constituting a sub-picture,

wherein the sub-picture information includes level information indicating a processing level corresponding to the plurality of sub-pictures.

Technical Field

The present invention relates to image encoding and decoding, and more particularly, to a method of performing prediction and transform by dividing a video picture (Picture) into a plurality of regions.

Background

In video compression methods, one picture (Picture) is divided into a plurality of regions of predetermined size and encoded. To improve compression efficiency, inter prediction (inter prediction) and intra prediction (intra prediction) techniques are used to remove redundancy between and within pictures.

Here, a residual signal (residual signal) is generated by intra prediction and inter prediction because encoding the residual signal instead of the original signal reduces the amount of data and thus improves the compression rate, and the better the prediction, the smaller the values of the residual signal.

The intra prediction method predicts the data of the current block using the pixels surrounding it. The difference between the actual values and the predicted values is referred to as the residual signal block. In the case of High Efficiency Video Coding (HEVC), the intra prediction method was expanded from the nine prediction modes used in the existing H.264/AVC to 35 more finely grained prediction modes.

In the inter prediction method, the current block is compared with blocks in neighboring pictures to find the most similar block. The position information (Vx, Vy) of the block found is referred to as a motion vector. The difference between the pixel values within the current block and those of the prediction block indicated by the motion vector is referred to as a residual signal (motion-compensated residual block).

Thus, as intra prediction and inter prediction become more refined, the amount of data in the residual signal decreases, but the amount of computation required to process the video increases greatly.

In particular, the increased complexity of determining the intra-picture division structure for image encoding and decoding makes pipelines and the like difficult to implement, and the conventional block division method, together with the size and shape of the blocks it produces, is not suitable for encoding high-resolution video.

In addition, supporting virtual reality such as 360VR video has recently required preprocessing multiple high-resolution videos and projecting the fused ultra-high-resolution video in real time, and the current block-structured prediction, transform, and quantization procedures may not be efficient for such ultra-high-resolution video processing.

Disclosure of Invention

Technical problem

The present invention has been made to solve the above problems, and an object of the present invention is to provide an image processing method that is suitable for encoding and decoding ultra-high-resolution images and that can perform effective video segmentation, together with image decoding and encoding methods using the same.

Technical scheme

The video decoding method for solving the above problems includes the steps of: decoding sub-picture information for processing a plurality of sub-pictures that comprise the tiles or slices of a picture into which a video is divided and that cover specific regions of the picture; and identifying the plurality of sub-pictures based on the sub-picture information and decoding each tile or slice constituting a sub-picture, wherein the sub-picture information includes level information indicating the processing levels corresponding to the plurality of sub-pictures.

The video decoding apparatus for solving the above problems includes: a picture dividing unit that decodes sub-picture information for processing a plurality of sub-pictures that comprise the tiles or slices of a picture into which a video is divided and that cover specific regions of the picture; and a decoding processing unit that identifies the plurality of sub-pictures based on the sub-picture information, which includes level information indicating the processing levels corresponding to the plurality of sub-pictures, and decodes the tiles or slices constituting each sub-picture.
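By way of illustration, the following Python sketch shows how such level information might drive a suitability test that variably determines, per sub-picture, whether decoding proceeds; the class and function names (SubPicture, DecoderEnv, can_decode) and the comparison rule are illustrative assumptions, not part of the specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubPicture:
    sub_pic_id: int
    level_idc: int              # signaled processing level for this sub-picture
    tile_ids: List[int] = field(default_factory=list)  # tiles/slices it covers

@dataclass
class DecoderEnv:
    max_level_idc: int          # highest level this decoding environment supports
    # further environment variables (system, network, user viewpoint)
    # could be folded in here as additional fields

def can_decode(sub_pic: SubPicture, env: DecoderEnv) -> bool:
    """Suitability test: decode the sub-picture only if its signaled
    level does not exceed what the decoding environment supports."""
    return sub_pic.level_idc <= env.max_level_idc

def select_subpictures(sub_pics, env):
    """Variably determine, per sub-picture, whether decoding proceeds."""
    return [sp for sp in sub_pics if can_decode(sp, env)]
```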

Effects of the invention

According to the embodiments of the present invention, the encoding and decoding efficiency of high-resolution images can be improved through more effective picture segmentation and parallel processing.

In particular, each of the divided sub-pictures can have its own conditions and form, and appropriate sub-picture information corresponding to each is indicated, so that adaptive and efficient video decoding can be performed according to the performance and environment of the decoding apparatus.

Drawings

Fig. 1 is a block diagram illustrating a structure of an image encoding apparatus according to an embodiment of the present invention.

Fig. 2 to 5 are diagrams for explaining a first embodiment of a method of processing an image by dividing the image into blocks.

FIG. 6 is a block diagram illustrating an embodiment of a method for performing inter prediction in a video coding device.

Fig. 7 is a block diagram illustrating a structure of an image decoding apparatus according to an embodiment of the present invention.

FIG. 8 is a block diagram illustrating an embodiment of a method of performing inter-prediction in a video decoding device.

Fig. 9 is a diagram for explaining a second embodiment of a method of dividing a video into blocks.

Fig. 10 is a diagram showing an example of a syntax (syntax) structure used for processing to divide a video into blocks.

Fig. 11 is a diagram for explaining a third embodiment of a method of dividing a video into blocks.

Fig. 12 is a diagram for explaining an embodiment of a method of dividing a coding unit in a binary tree structure to constitute a conversion unit.

Fig. 13 is a diagram for explaining a fourth embodiment of a method of processing an image by dividing the image into blocks.

Fig. 14 to 16 are diagrams for explaining another embodiment of a method of processing an image by dividing the image into block units.

Fig. 17 and 18 are diagrams for explaining an embodiment of a method of determining a division structure of a conversion unit by performing Rate Distortion Optimization (RDO).

Fig. 19 is a diagram for explaining a composite segmented structure of another embodiment of the present invention.

Fig. 20 is a flowchart for explaining an encoding procedure of tile group information of the embodiment of the present invention.

Fig. 21 to 25 are diagrams for explaining an illustration of tile groups and tile group information of the embodiment of the present invention.

Fig. 26 is a flowchart for explaining a tile group information-based decoding process of an embodiment of the present invention.

FIG. 27 is a diagram for explaining an initialization process of tile group headers of the embodiment of the present invention.

Fig. 28 is a diagram for explaining variable parallel processing based on parallel layers and units according to an embodiment of the present invention.

Fig. 29 is a diagram for explaining a case of mapping between tile group information and user viewpoint information according to an embodiment of the present invention.

Fig. 30 is a diagram illustrating a syntax of tile group header information of an embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be specifically described below with reference to the drawings. In describing the embodiments of the present specification, when it is judged that the detailed description of the related known structure or function will obscure the gist of the present specification, the detailed description thereof will be omitted.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. Note that, in the present invention, "including" a specific element does not exclude other elements, but means that additional elements may be included within the scope of implementation of the present invention or of its technical idea.

The terms first, second, and the like are used when describing various components, but the components are not limited by the terms. The term is used for the purpose of distinguishing only one constituent element from other constituent elements. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component without departing from the scope of the claims of the present invention.

Further, the components shown in the embodiments of the present invention are shown independently to indicate different characteristic functions; this does not mean that each component is constituted by separate hardware or by a single software unit. That is, each component is listed separately for convenience of explanation, and at least two components may be combined into one component, or one component may be divided into a plurality of components that each perform part of the function.

Further, some components may not be essential components that perform the essential functions of the present invention, but merely optional components that improve its performance. The present invention can be implemented with only the components essential for realizing its essence, excluding those used merely to improve performance, and a structure including only the essential components, excluding the optional performance-improving components, also falls within the scope of the claims of the present invention.

Fig. 1 is a block diagram showing the configuration of an image encoding apparatus according to an embodiment of the present invention. The image encoding apparatus 10 includes: a picture dividing unit 110, a transform unit 120, a quantization unit 130, a scanning unit 131, an entropy encoding unit 140, an intra prediction unit 150, an inter prediction unit 160, an inverse quantization unit 135, an inverse transform unit 125, a post-processing unit 170, a picture storage unit 180, a subtraction unit 190, and an addition unit 195.

Referring to fig. 1, a picture dividing unit 110 analyzes an input video signal, divides a picture into coding units, determines a prediction mode, and determines the size of the prediction unit for each coding unit.

The picture dividing unit 110 transmits the prediction unit to be encoded to the intra prediction unit 150 or the inter prediction unit 160 according to the prediction mode (or prediction method). Then, the picture dividing unit 110 transmits the prediction unit to be encoded to the subtraction unit 190.

Here, a picture (Picture) of a video is composed of a plurality of tiles or slices, and each tile or slice is divided into a plurality of Coding Tree Units (CTUs), the basic units of picture division.

Also, in embodiments of the present invention, the plurality of tiles or slices constitute one or more tile or slice groups, and such groups constitute sub-pictures that divide the picture into rectangular regions. A parallel processing procedure for sub-pictures based on tile or slice groups is executed, which is described further below and sketched next.
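As a minimal illustration of such sub-picture-level parallelism, the Python sketch below decodes independent sub-pictures concurrently; the function names and the use of a thread pool are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def decode_tile(tile):
    """Placeholder for entropy decoding, inverse transform, and
    reconstruction of a single tile or slice."""
    pass

def decode_subpicture(sub_pic_tiles):
    # Tiles within one sub-picture are decoded in order here; they could
    # themselves be parallelized when they are mutually independent.
    for tile in sub_pic_tiles:
        decode_tile(tile)

def decode_picture(sub_pictures):
    # Each sub-picture covers its own rectangular region of the picture,
    # so the sub-pictures can be decoded in parallel.
    with ThreadPoolExecutor() as pool:
        list(pool.map(decode_subpicture, sub_pictures))
```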

The Coding tree Unit is divided into one or more Coding Units (CUs) that are basic units for performing inter prediction (inter prediction) or intra prediction (intra prediction).

A Coding Unit (CU) is partitioned into one or more Prediction Units (PUs), which are basic units for performing Prediction.

In this case, the encoding apparatus 10 determines either inter prediction or intra prediction as the prediction method for each of the divided Coding Units (CUs), but may generate a prediction block differently for each Prediction Unit (PU).

In addition, a Coding Unit (CU) is divided into one or more Transform Units (TUs), which are the basic units on which the transform is performed on residual blocks (residual blocks).

In this case, the picture dividing unit 110 transmits the image data to the subtraction unit 190 in the block units (e.g., Prediction Unit (PU) or Transform Unit (TU)) divided as described above.

Referring to fig. 2, a Coding Tree Unit (CTU) having a maximum size of 256 × 256 pixels is divided in a quadtree (quad tree) structure into four Coding Units (CUs) having a square shape.

The four square Coding Units (CUs) can each be re-divided in the quadtree structure, and, as described above, the depth (Depth) of a Coding Unit (CU) divided in the quadtree structure has an integer value from 0 to 3.
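The following Python sketch enumerates such a quadtree division down to depth 3; for simplicity it splits every node, whereas a real encoder decides per node (e.g., by rate-distortion optimization), and the function name and parameters are illustrative:

```python
def split_quadtree(x, y, size, depth=0, max_depth=3, min_size=32):
    """Enumerate the blocks of a square CTU (e.g. 256x256) divided in a
    quadtree: each split yields four square sub-blocks one depth deeper."""
    blocks = [(x, y, size, depth)]
    if depth < max_depth and size > min_size:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                blocks += split_quadtree(x + dx, y + dy, half,
                                         depth + 1, max_depth, min_size)
    return blocks

# Example: a 256x256 CTU fully split down to depth 3 (32x32 leaves).
all_blocks = split_quadtree(0, 0, 256)
```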

A Coding Unit (CU) is partitioned into one or more Prediction Units (PUs) according to prediction modes.

For the intra prediction mode, when the size of the Coding Unit (CU) is 2Nx2N, the Prediction Unit (PU) has a size of 2Nx2N shown in (a) of fig. 3 or NxN shown in (b) of fig. 3.

In addition, for the inter prediction mode, when the size of the Coding Unit (CU) is 2Nx2N, the Prediction Unit (PU) has any one of 2Nx2N shown in fig. 4 (a), 2NxN shown in fig. 4 (b), Nx2N shown in fig. 4 (c), NxN shown in fig. 4 (d), 2NxnU shown in fig. 4 (e), 2NxnD shown in fig. 4 (f), nLx2N shown in fig. 4 (g), and nRx2N shown in fig. 4 (h).

Referring to fig. 5, a Coding Unit (CU) is divided in the quadtree (quad tree) structure into four square Transform Units (TUs).

The four square Transform Units (TUs) can each be re-divided in the quadtree structure, and, as described above, the depth (Depth) of a Transform Unit (TU) divided in the quadtree structure has an integer value from 0 to 3.

Here, when a Coding Unit (CU) is in the inter prediction mode, the Prediction Units (PUs) and Transform Units (TUs) divided from that Coding Unit (CU) have mutually independent division structures.

When a Coding Unit (CU) is in the intra prediction mode, a Transform Unit (TU) divided from that Coding Unit (CU) cannot be larger than the size of its Prediction Unit (PU).

Also, a Transform Unit (TU) divided as described above has a maximum size of 64x64 pixels.

The transform unit 120 transforms the residual block, which is the residual signal between the original block of the input Prediction Unit (PU) and the prediction block generated by the intra prediction unit 150 or the inter prediction unit 160, with the Transform Unit (TU) as the basic unit.

In the transform process, different transform matrices are determined according to the prediction mode (intra or inter); since the residual signal of intra prediction has directionality that depends on the intra prediction mode, the transform matrix is adaptively determined according to the intra prediction mode.

The transform unit is transformed by two (horizontal and vertical) one-dimensional transform matrices; in the case of inter prediction, for example, a single predetermined transform matrix is used.

In the case of intra prediction, when the intra prediction mode is horizontal, the residual block is more likely to have directionality in the vertical direction, so a DCT (discrete cosine transform)-based integer matrix is applied in the vertical direction and a DST (discrete sine transform)- or KLT (Karhunen-Loève transform)-based integer matrix is applied in the horizontal direction. When the intra prediction mode is vertical, the DST- or KLT-based integer matrix is applied in the vertical direction and the DCT-based integer matrix is applied in the horizontal direction.

Also, in the DC mode, a DCT-based integer matrix is applied in both directions.

Also, for intra prediction, the transform matrix can be adaptively determined based on the size of the Transform Unit (TU), as in the sketch below.
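By way of illustration, the following Python sketch selects transform kernels according to the rules just described; the function name pick_transforms and the string labels are illustrative assumptions, not part of the specification:

```python
def pick_transforms(pred_type, intra_mode=None):
    """Return the (horizontal, vertical) transform kernels according to
    the rules above; 'DCT'/'DST' stand in for the integer matrices."""
    if pred_type == "inter":
        return ("DCT", "DCT")        # a single predetermined matrix pair
    if intra_mode == "horizontal":   # residual tends to vary vertically
        return ("DST", "DCT")        # DST/KLT horizontally, DCT vertically
    if intra_mode == "vertical":     # residual tends to vary horizontally
        return ("DCT", "DST")        # DCT horizontally, DST/KLT vertically
    return ("DCT", "DCT")            # DC mode: DCT-based in both directions
```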

The quantization unit 130 determines a quantization step size for quantizing the coefficients of the residual block transformed by the transform matrix, and the quantization step size is determined for each quantization unit having at least a predetermined size.

A quantization unit has a size of 8x8 or 16x16, and the quantization unit 130 quantizes the coefficients of the transform block using a quantization matrix determined according to the quantization step size and the prediction mode.

The quantization unit 130 uses the quantization step sizes of the quantization units adjacent to the current quantization unit as the quantization step size predictor of the current quantization unit.

The quantization unit 130 searches the left quantization unit, the upper quantization unit, and the upper-left quantization unit of the current quantization unit in order, and generates the quantization step size predictor of the current quantization unit using one or two valid quantization step sizes.

For example, the first valid quantization step size retrieved in this order may be determined as the predictor, or the average of the two valid quantization step sizes retrieved in this order; if only one quantization step size is valid, it is determined as the predictor.

When the quantization step size predictor is determined, the quantization unit 130 transmits the difference between the quantization step size of the current quantization unit and the predictor to the entropy encoding unit 140.

In addition, the left, upper, and upper-left coding units of the current coding unit may all be absent, while a coding unit preceding it in the coding order may exist within the maximum coding unit.

Therefore, the quantization step sizes of the quantization units adjacent to the current coding unit and of the quantization unit immediately preceding it in the coding order within the maximum coding unit can be used as candidates.

In this case, priority is given in the order of 1) the left quantization unit of the current coding unit, 2) the upper quantization unit of the current coding unit, 3) the upper-left quantization unit of the current coding unit, and 4) the quantization unit immediately preceding in the coding order. The order can be switched, and the upper-left quantization unit can be omitted. A sketch of this derivation follows.
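As a minimal sketch of the predictor derivation above, assuming missing neighbors are passed as None and that the first two valid candidates are averaged (one of the variants the text allows); the function and parameter names are illustrative:

```python
def predict_quant_step(left, above, above_left, prev_in_order):
    """Derive the quantization step size predictor from neighboring
    quantization units, taking candidates in the priority order above
    and using the first one or two valid values."""
    candidates = [left, above, above_left, prev_in_order]
    valid = [q for q in candidates if q is not None][:2]
    if not valid:
        return None                    # no predictor available
    return sum(valid) // len(valid)    # single valid value, or mean of two
```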

The transform block quantized as described above is transmitted to the inverse quantization unit 135 and the scanning unit 131.

The scanning unit 131 scans the coefficients of the quantized transform block and converts them into one-dimensional quantization coefficients; since the coefficient distribution of the quantized transform block depends on the intra prediction mode, the scanning method is determined according to the intra prediction mode.

In addition, the coefficient scanning method may be determined differently depending on the size of the transform unit, and the scan pattern may differ according to the directional intra prediction mode.

When the quantized coefficients are divided into a plurality of subsets (sub-sets), the same scan pattern is applied to the quantized coefficients within each subset, and a zigzag scan or a diagonal scan can be applied as the scan pattern between subsets.

In addition, the scan pattern preferably proceeds forward from the main subset containing the DC coefficient to the remaining subsets, but the reverse is also possible.

Further, the scan pattern between subsets can be set in the same manner as the scan pattern of the quantized coefficients within a subset, and the scan pattern between subsets can be determined according to the intra prediction mode, as sketched below.
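The following Python sketch illustrates subset-based scanning with the same diagonal pattern applied inside each 4x4 subset and between the subsets; the helper names and the 4x4 subset size are illustrative assumptions:

```python
def diagonal_scan(n):
    """Scan order for an n x n block along anti-diagonals
    (one of the patterns mentioned for subset scanning)."""
    order = []
    for s in range(2 * n - 1):
        for y in range(n):
            x = s - y
            if 0 <= x < n:
                order.append((x, y))
    return order

def scan_coefficients(block, subset_size=4):
    """Scan a quantized transform block subset by subset: the same
    pattern is used inside each subset and between the subsets."""
    n = len(block)
    coeffs = []
    for sx, sy in diagonal_scan(n // subset_size):
        for x, y in diagonal_scan(subset_size):
            coeffs.append(block[sy * subset_size + y][sx * subset_size + x])
    return coeffs
```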

Further, the encoding device 10 includes in the bitstream information indicating the position of the last non-zero quantized coefficient within the Transform Unit (TU) and the position of the last non-zero quantized coefficient within each subset, and transmits it to the decoding device 20.

The inverse quantization unit 135 inverse-quantizes the quantized coefficients described above, and the inverse transform unit 125 performs the inverse transform on a Transform Unit (TU) basis to restore the inverse-quantized transform coefficients into a residual block in the spatial domain.

The adder 195 combines the residual block restored by the inverse transform unit 125 with the prediction block received from the intra prediction unit 150 or the inter prediction unit 160 to generate a restored block.

And, the post-processing unit 170 performs: a deblocking (deblocking) filtering process for removing blocking artifacts occurring in the restored picture; a Sample Adaptive Offset (SAO) application process for compensating, in pixel units, the difference from the original video; and an Adaptive Loop Filtering (ALF) process for compensating the difference from the original video in coding units.

The deblocking filtering process is applied to the boundaries of Prediction Units (PUs) or Transform Units (TUs) having at least a preset size.

For example, the deblocking filtering process includes the steps of: determining the boundary (boundary) to be filtered; determining the boundary filtering strength (boundary filtering strength) applicable to the boundary; determining whether the deblocking filter is to be applied; and, when it is determined that the deblocking filter is to be applied, selecting the filter to apply to the boundary.

Further, whether the deblocking filter is applied is determined by i) whether the boundary filtering strength is greater than 0 and ii) whether the value indicating the degree of change of the pixel values at the boundary of the two blocks (P block and Q block) adjacent to the boundary to be filtered is smaller than a first reference value determined by the quantization parameter.

Preferably, there are at least two filters. When the absolute value of the difference between the two pixels at the block boundary is equal to or greater than a second reference value, the filter performing relatively weak filtering is selected.

The second reference value is determined by the quantization parameter and the boundary filtering strength.
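A minimal sketch of this decision logic follows, assuming the variation measures of the P and Q blocks are summed against the first reference value (beta) and that the second reference value acts as a tc-like threshold; all names (bs, p_variation, q_variation, edge_gap) are illustrative:

```python
def deblocking_decision(bs, p_variation, q_variation, beta, edge_gap, tc):
    """Decide whether and how to filter one block boundary.
    bs: boundary filtering strength; beta: first reference value derived
    from the quantization parameter; tc: second reference value."""
    if bs <= 0:
        return None                        # condition i) fails: no filtering
    if p_variation + q_variation >= beta:  # condition ii) fails: skip
        return None
    # A large step across the edge selects the relatively weak filter.
    return "weak" if abs(edge_gap) >= tc else "strong"
```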

Also, the Sample Adaptive Offset (SAO) application process reduces the distortion (distortion) between the pixels in the video to which the deblocking filter has been applied and the original pixels, and whether to perform the SAO application process is determined in picture or slice units.

A picture or slice is divided into a plurality of offset regions, and for each offset region an offset type is determined from among a preset number (e.g., four) of edge offset types and two band offset types.

For example, when the offset type is an edge offset type, the edge type to which each pixel belongs is determined based on the distribution of the two pixel values adjacent to the current pixel, and the corresponding offset is applied, as in the sketch below.
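The following Python sketch classifies a pixel against its two neighbors along the chosen edge direction, assuming the HEVC-style category rule, which the description above resembles; the category numbering is an illustrative assumption:

```python
def edge_offset_category(left, cur, right):
    """Classify a pixel against its two neighbors along the edge
    direction; each non-zero category gets its own signaled offset."""
    if cur < left and cur < right:
        return 1          # local minimum
    if (cur < left and cur == right) or (cur == left and cur < right):
        return 2          # edge falling toward one side
    if (cur > left and cur == right) or (cur == left and cur > right):
        return 3          # edge rising toward one side
    if cur > left and cur > right:
        return 4          # local maximum
    return 0              # none of the above: no offset applied
```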

The Adaptive Loop Filtering (ALF) process performs filtering based on a value obtained by comparing the restored image, after the deblocking filtering process or the adaptive offset application process, with the original image.

The picture storage unit 180 receives the post-processed image data from the post-processing unit 170, and restores and stores the image in picture (Picture) units, where a picture is a frame or a field.

The inter prediction unit 160 performs motion estimation using at least one reference picture stored in the picture storage unit 180, and determines the reference picture index identifying the reference picture and the motion vector.

In this case, based on the determined reference picture index and motion vector, the prediction block corresponding to the prediction unit to be encoded is extracted from the reference picture used for motion estimation among the plurality of reference pictures stored in the picture storage unit 180.

The intra prediction unit 150 performs intra prediction encoding using reconstructed pixel values of the inside of the picture including the current prediction unit.

The intra prediction unit 150 receives the input current prediction unit to be prediction-encoded, and performs intra prediction by selecting one of a preset number of intra prediction modes according to the size of the current block.

The intra prediction unit 150 adaptively filters the reference pixels in order to generate the intra prediction block, and, when some reference pixels are unavailable, generates them using the available reference pixels.

The entropy encoding unit 140 entropy encodes the quantized coefficients quantized by the quantization unit 130, the intra prediction information received from the intra prediction unit 150, the motion information received from the inter prediction unit 160, and the like.

Fig. 6 is a block diagram showing an embodiment of a structure for performing inter prediction in the encoding apparatus 10. The illustrated inter prediction encoder includes a motion information determination unit 161, a motion information encoding mode determination unit 162, a motion information encoding unit 163, a prediction block generation unit 164, a residual block generation unit 165, a residual block encoding unit 166, and a multiplexer 167.

Referring to fig. 6, the motion information determination unit 161 determines the motion information of the current block; the motion information includes a reference picture index and a motion vector, and the reference picture index indicates one of the previously encoded and restored pictures.

When the current block is bidirectionally predictive-encoded, a reference picture index indicating one of the reference pictures of list 0 (L0) and a reference picture index indicating one of the reference pictures of list 1 (L1) are included.

Also, in the case of bidirectionally predictive-encoding the current block, an index indicating one or two of the reference pictures of the combined list (LC) generated by combining list 0 and list 1 may be included.

The motion vector, in pixel (integer) units or sub-pixel units, indicates the position of the prediction block within the picture indicated by the reference picture index.

For example, the motion vector has a precision of 1/2, 1/4, 1/8, or 1/16 pixel; when the motion vector is not in integer units, the prediction block is generated from pixels in integer units.
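By way of illustration, the sketch below splits a motion vector stored in sub-pixel units into an integer displacement and a fractional phase; the quarter-pel assumption (shift of 2) and the function name are illustrative:

```python
def mv_to_reference_position(mv_x, mv_y, precision_shift=2):
    """Split a motion vector stored in sub-pixel units (here assumed
    quarter-pel, shift=2) into an integer displacement plus the
    fractional phase that selects the interpolation filter."""
    frac_mask = (1 << precision_shift) - 1
    int_x, frac_x = mv_x >> precision_shift, mv_x & frac_mask
    int_y, frac_y = mv_y >> precision_shift, mv_y & frac_mask
    needs_interpolation = (frac_x != 0) or (frac_y != 0)
    return (int_x, int_y), (frac_x, frac_y), needs_interpolation
```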

The motion information encoding mode determination unit 162 determines the encoding mode of the motion information of the current block; the encoding mode is exemplified by any one of the skip mode, the merge mode, and the AMVP (advanced motion vector prediction) mode.

The skip mode is applied when a skip candidate having the same motion information as the current block exists, the Prediction Unit (PU), i.e., the current block, has the same size as the Coding Unit (CU), and the residual signal is 0.

The merge mode is applied when a merge candidate having the same motion information as the current block exists; it is applied when a residual signal exists, regardless of whether the current block size is the same as or different from the Coding Unit (CU). The merge candidates and the skip candidates can be the same.

The AMVP mode is applied when the skip mode and the merge mode are not applied, and the AMVP candidate having the motion vector most similar to that of the current block is selected as the AMVP predictor. The selection among the three modes is sketched below.
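A minimal sketch of this mode selection, with the candidate checks stubbed out as boolean inputs; the function and parameter names are illustrative assumptions:

```python
def choose_motion_coding_mode(pu_size, cu_size, residual_is_zero,
                              has_merge_candidate):
    """Pick skip / merge / AMVP following the conditions above."""
    if pu_size == cu_size and has_merge_candidate and residual_is_zero:
        return "skip"    # same motion as a candidate, zero residual
    if has_merge_candidate:
        return "merge"   # reuse candidate motion, code the residual
    return "amvp"        # code reference index + motion vector difference
```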

However, besides the illustrated methods, the coding mode may also include more subdivided motion-compensated prediction coding modes determined adaptively. The adaptively determined motion-compensated prediction mode includes not only the AMVP mode, merge mode, and skip mode mentioned above, but also at least one of the FRUC (Frame Rate Up-Conversion) mode, BIO (Bi-directional Optical Flow) mode, AMP (Affine Motion Prediction) mode, OBMC (Overlapped Block Motion Compensation) mode, DMVR (Decoder-side Motion Vector Refinement) mode, ATMVP (Alternative Temporal Motion Vector Prediction) mode, STMVP (Spatial-Temporal Motion Vector Prediction) mode, and LIC (Local Illumination Compensation) mode, which have been proposed as new motion-compensated prediction modes, and the mode to be applied is determined according to predetermined conditions.

The motion information encoding unit 163 encodes the motion information in accordance with the mode determined by the motion information encoding mode determination unit 162.

For example, the motion information encoding unit 163 performs the merge motion vector encoding process when the motion information encoding mode is the skip mode or the merge mode, and performs the AMVP encoding process when the motion information encoding mode is the AMVP mode.

The prediction block generation unit 164 generates a prediction block using the motion information of the current block; when the motion vector is in integer units, it copies the block at the position indicated by the motion vector within the picture indicated by the reference picture index, thereby generating the prediction block of the current block.

When the motion vector is not in integer units, the prediction block generation unit 164 generates the pixels of the prediction block from integer-unit pixels within the picture indicated by the reference picture index.

In this case, the prediction pixels are generated using an 8-tap interpolation filter for luminance pixels and a 4-tap interpolation filter for chrominance pixels.

The residual block generation unit 165 generates a residual block using the current block and the prediction block of the current block; when the size of the current block is 2Nx2N, the residual block is generated using the current block and a prediction block of size 2Nx2N corresponding to the current block.

In addition, when the size of the current block used for prediction is 2NxN or Nx2N, a prediction block is obtained for each of the two 2NxN blocks constituting the 2Nx2N block, and a final prediction block of size 2Nx2N is then generated using the two 2NxN prediction blocks.

Also, a residual block of size 2Nx2N can be generated using the 2Nx2N prediction block; to resolve the discontinuity at the boundary between the two 2NxN prediction blocks, overlap smoothing is applied to the pixels at the boundary.

The residual block encoding unit 166 divides the residual block into one or more Transform Units (TUs), and performs transform coding, quantization, and entropy coding on each Transform Unit (TU).

The residual block encoding unit 166 transforms the residual block generated by the inter prediction method using an integer-based transform matrix, which is an integer-based DCT matrix.

The residual block encoding unit 166 quantizes the coefficients of the residual block transformed by the transform matrix using a quantization matrix determined by the quantization parameter.

The quantization parameter is determined for Coding Units (CUs) of at least a preset size; when the current Coding Unit (CU) is smaller than the preset size, only the quantization parameter of the first Coding Unit (CU) in coding order within the preset size is encoded, and the quantization parameters of the remaining Coding Units (CUs), being the same, are not encoded.

And, the coefficients of the transform block are quantized using a quantization matrix determined according to the quantization parameter and the prediction mode.

The quantization parameter determined for the Coding Unit (CU) of the preset size or larger is predictive-coded using the quantization parameter of the Coding Unit (CU) adjacent to the current Coding Unit (CU).

The quantization parameter predictor of the current Coding Unit (CU) is generated by searching the left Coding Unit (CU) and the upper Coding Unit (CU) of the current Coding Unit (CU) in order and using one or two valid quantization parameters.

For example, the first valid quantization parameter retrieved in this order is determined as the quantization parameter predictor; alternatively, the first valid quantization parameter retrieved in the order of the left Coding Unit (CU) and then the Coding Unit (CU) immediately preceding in coding order may be determined as the predictor.

The coefficients of the quantized transform block are scanned and converted into one-dimensional quantization coefficients, and the scanning method is set differently according to the entropy coding mode.

For example, in the case of encoding by CABAC, the inter-prediction-encoded quantization coefficients are scanned in one preset manner (a zigzag or diagonal raster scan), and in the case of encoding by CAVLC, they are scanned in a manner different from the above.

For example, a zigzag scan may be used in the inter case, while in the intra case the coefficient scanning method may be determined according to the intra prediction mode; it may also be determined differently according to the size of the transform unit.

In addition, the scan pattern differs according to the directional intra prediction mode, and the quantized coefficients are scanned in the reverse of the scanning order.

The multiplexer 167 multiplexes the motion information encoded by the motion information encoding unit 163 and the residual signal encoded by the residual block encoding unit 166.

Depending on the encoding mode, the motion information includes only an index indicating the predictor in the case of skip or merge, and includes the reference picture index, the differential motion vector, and the AMVP index of the current block in the case of AMVP.

Next, an embodiment of the operation of the intra prediction unit 150 shown in fig. 1 will be specifically described.

First, the intra prediction unit 150 receives the prediction mode information and the size of the Prediction Unit (PU) from the picture dividing unit 110, and reads the reference pixel from the picture storage unit 180 in order to determine the intra prediction mode of the Prediction Unit (PU).

The intra prediction unit 150 checks whether there are reference pixels that cannot be used for determining the intra prediction mode of the current block, and determines whether to generate reference pixels.

When the current block is located at the upper boundary of the current picture, the pixels adjacent to the upper side of the current block are not defined, and when the current block is located at the left boundary of the current picture, the pixels adjacent to the left side are not defined; such pixels are determined to be unavailable pixels.

Further, even when the current block is at a slice boundary, if the pixels adjacent to the upper or left side of the slice are not pixels that were encoded and restored first, they can be determined to be unavailable pixels.

As described above, when there are no pixels adjacent to the left or upper side of the current block, or no pixels that have already been encoded and restored, the intra prediction mode of the current block can be determined using only the available pixels.

In addition, the available reference pixels of the current block may be used to generate reference pixels at the unavailable positions; for example, when the pixels of the upper block are unavailable, the upper pixels may be generated using some or all of the left pixels, and vice versa.

That is, when no usable reference pixel exists in the preset direction, the reference pixel at the closest position in the opposite direction is copied to generate the reference pixel.

In addition, even when the upper or left pixels of the current block exist, they may be determined to be unusable reference pixels according to the encoding mode of the block to which they belong.

For example, when the block to which a reference pixel adjacent to the upper side of the current block belongs is a block restored by inter coding, those pixels may be determined to be unavailable.

In this case, usable reference pixels can be generated using pixels belonging to adjacent blocks of the current block that were restored by intra coding, and the encoding device 10 transmits to the decoding device 20 the information that usable reference pixels are determined according to the encoding mode.

The intra prediction unit 150 determines the intra prediction mode of the current block using the reference pixels, and the number of intra prediction modes allowable for the current block differs according to the size of the block.

For example, in the case where the size of the current block is 8x8, 16x16, 32x32, there are 34 intra prediction modes, and in the case where the size of the current block is 4x4, there are 17 intra prediction modes.

The 34 or 17 intra prediction modes are composed of at least one non-directional mode (non-directional mode) and a plurality of directional modes (directional modes).

The one or more non-directional modes are DC modes and/or planar (planar) modes. When the DC mode and the planar mode are included as the non-directional mode, there are 35 intra prediction modes regardless of the size of the current block.

For this case, two non-directional modes (DC mode and planar mode) and 33 directional modes are included.

For the planar mode, the prediction block of the current block is generated using at least one pixel value at the bottom-right (bottom-right) of the current block (or a prediction value of that pixel value, hereinafter referred to as the first reference value) and the reference pixels.

The configuration of the video decoding apparatus according to an embodiment of the present invention can be derived from the configuration of the video encoding apparatus 10 described with reference to figs. 1 to 6; for example, the video is decoded by performing the inverse of the video encoding method described with reference to figs. 1 to 6.

Fig. 7 is a block diagram showing the structure of a video decoding apparatus according to an embodiment of the present invention. The decoding apparatus 20 includes: an entropy decoding unit 210, an inverse quantization/inverse transform unit 220, an adder 270, a post-processing unit 250, a picture storage unit 260, an intra prediction unit 230, a motion compensation prediction unit 240, and an intra/inter changeover switch 280.

The entropy decoding unit 210 receives and decodes the bitstream encoded by the video encoding apparatus 10, separates it into an intra prediction mode index, motion information, a quantization coefficient sequence, and the like, and transmits the decoded motion information to the motion compensation prediction unit 240.

The entropy decoding unit 210 transmits the intra prediction mode index to the intra prediction unit 230 and the inverse quantization/inverse transform unit 220, and transmits the quantization coefficient sequence to the inverse quantization/inverse transform unit 220.

The inverse quantization/inverse transform unit 220 converts the quantization coefficient sequence into a two-dimensional array of inverse-quantized coefficients; one of a plurality of scan patterns is selected for the conversion, based on, for example, the prediction mode of the current block (i.e., intra prediction or inter prediction) and its intra prediction mode.

The inverse quantization/inverse transform unit 220 applies a quantization matrix selected from a plurality of quantization matrices to the two-dimensional array of inverse-quantized coefficients to restore the quantized coefficients.

In addition, mutually different quantization matrices are applied according to the size of the current block to be restored, and for blocks of the same size the quantization matrix is selected based on at least one of the prediction mode and the intra prediction mode of the current block.

The inverse quantization/inverse transform unit 220 inverse-transforms the restored quantized coefficients to restore the residual block, with the inverse transform process performed with the Transform Unit (TU) as the basic unit.

The adder 270 combines the residual block restored by the inverse quantization/inverse transform unit 220 with the prediction block generated by the intra prediction unit 230 or the motion compensation prediction unit 240 to restore the video block.

The post-processing unit 250 performs post-processing on the restored video generated by the adder 270, reducing, through filtering and the like, blocking artifacts and other video losses caused by the quantization process.

The picture storage 260 is a frame memory for storing the local decoded picture on which the post-processing of filtering is performed by the post-processing unit 250.

The intra prediction unit 230 restores the intra prediction mode of the current block based on the intra prediction mode index received from the entropy decoding unit 210, and generates a prediction block according to the restored intra prediction mode.

The motion compensation prediction unit 240 generates the prediction block of the current block from the picture stored in the picture storage unit 260 based on the motion vector information, and when motion compensation of fractional-pixel precision is applied, generates the prediction block by applying the selected interpolation filter.

The intra/inter changeover switch 280 supplies the prediction block generated by either the intra prediction unit 230 or the motion compensation prediction unit 240 to the adder 270 based on the encoding mode.

Fig. 8 is a block diagram of an embodiment of a structure for performing inter prediction at video decoding device 20, the inter prediction decoder comprising: a multiplexer 241, a motion information encoding mode determination unit 242, a merge mode motion information decoding unit 243, an AMVP mode motion information decoding unit 244, a selection mode motion information decoding unit 248, a prediction block generating unit 245, a residual block decoding unit 246, and a restoration block generating unit 247.

Referring to fig. 8, the multiplexer 241 demultiplexes the encoded motion information and the encoded residual signal from the received bitstream, transmits the demultiplexed motion information to the motion information coding mode determination unit 242, and transmits the demultiplexed residual signal to the residual block decoding unit 246.

The motion information coding mode determination unit 242 determines the motion information coding mode of the current block; when the skip flag (skip_flag) of the received bitstream has a value of 1, it determines that the motion information of the current block was coded in the skip coding mode.

The motion information coding mode determination unit 242 determines that the motion information coding mode of the current block is coded in the merge mode when the skip flag of the received bitstream has a value of 0 and the motion information received from the multiplexer 241 has only the merge index.

The motion information coding mode determination unit 242 determines that the motion information coding mode of the current block is coded in the AMVP mode when the skip flag of the received bitstream has a value of 0 and the motion information received from the multiplexer 241 has the reference picture index, the differential motion vector, and the AMVP index.
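The determination rules in the preceding paragraphs can be sketched as follows in Python; the field names (skip_flag, merge_index, ref_idx, mvd, amvp_index) are illustrative stand-ins for the parsed bitstream elements:

```python
def parse_motion_coding_mode(skip_flag, motion_fields):
    """Recover the mode the encoder used from what is present in the
    bitstream: skip_flag == 1 -> skip; only a merge index -> merge;
    reference index + MVD + AMVP index -> AMVP."""
    if skip_flag == 1:
        return "skip"
    fields = set(motion_fields)
    if fields == {"merge_index"}:
        return "merge"
    if {"ref_idx", "mvd", "amvp_index"} <= fields:
        return "amvp"
    return "selected"   # one of the extended modes described below
```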

The merge mode motion information decoding unit 243 is activated when the motion information coding mode of the current block is determined to be the skip or merge mode by the motion information coding mode determination unit 242, and the AMVP mode motion information decoding unit 244 is activated when the motion information coding mode of the current block is determined to be the AMVP mode by the motion information coding mode determination unit 242.

The selected-mode motion information decoding unit 248 performs decoding of motion information in a prediction mode selected from among the motion-compensated prediction modes other than the AMVP mode, merge mode, and skip mode. The selected prediction modes include motion prediction modes more sophisticated than the AMVP mode, and the mode is adaptively determined according to predetermined conditions such as block size and block partition information, the presence of signaling information, block position, and the like. The selected prediction modes include, for example, at least one of the FRUC (Frame Rate Up-Conversion) mode, BIO (Bi-directional Optical Flow) mode, AMP (Affine Motion Prediction) mode, OBMC (Overlapped Block Motion Compensation) mode, DMVR (Decoder-side Motion Vector Refinement) mode, ATMVP (Alternative Temporal Motion Vector Prediction) mode, STMVP (Spatial-Temporal Motion Vector Prediction) mode, and LIC (Local Illumination Compensation) mode.

The prediction block generation unit 245 generates a prediction block of the current block using the motion information restored by the merge mode motion information decoding unit 243 or the AMVP mode motion information decoding unit 244.

When the motion vector is in integer units, the block at the position indicated by the motion vector within the picture indicated by the reference picture index is copied to generate the prediction block of the current block.

When the motion vector is not in integer units, the pixels of the prediction block are generated from the integer-unit pixels in the picture indicated by the reference picture index; in this case, an 8-tap interpolation filter is used for luminance pixels and a 4-tap interpolation filter is used for chrominance pixels to generate the prediction pixels.

The residual block decoding unit 246 entropy-decodes the residual signal and inverse-scans the entropy-decoded coefficients to generate a two-dimensional quantized coefficient block; the inverse scanning method differs according to the entropy decoding method.

For example, in the case of decoding based on context-based adaptive binary arithmetic coding (CABAC), a raster inverse scan in the diagonal direction is applied, and in the case of decoding based on context-adaptive variable length coding (CAVLC), a zigzag inverse scan is applied. The inverse scan method may also be determined differently according to the size of the prediction block.

The residual block decoding unit 246 inverse-quantizes the coefficient block generated as described above using an inverse quantization matrix, and restores the quantization parameter to derive the quantization matrix. Here, the quantization step size is restored for coding units of at least a preset size.

The residual block decoding unit 246 inverse-transforms the inverse-quantized coefficient block to restore the residual block.

The restoration block generation unit 247 adds the prediction block generated by the prediction block generation unit 245 and the residual block generated by the residual block decoding unit 246 to generate the restoration block.

An embodiment of a process of restoring the current block by intra prediction is described below with reference to fig. 7 again.

First, the intra prediction mode of the current block is decoded from the received bitstream; to this end, the entropy decoding unit 210 restores the first intra prediction mode index of the current block with reference to one of a plurality of intra prediction mode tables.

The plurality of intra prediction mode tables are shared by the encoding device 10 and the decoding device 20, and one of them, selected according to the distribution of the intra prediction modes of the plurality of blocks adjacent to the current block, is applied.

For example, when the intra prediction mode of the left block of the current block is the same as the intra prediction mode of the upper block of the current block, the first intra prediction mode table is applied to restore the first intra prediction mode index of the current block; when they are different, the second intra prediction mode table is applied.

As another example, when the intra prediction modes of the upper block and the left block of the current block are both directional prediction modes (directional intra prediction modes), the first intra prediction mode table is applied to restore the first intra prediction mode index of the current block if the directions of the two modes are within a predetermined angle of each other, and the second intra prediction mode table is applied if they are not.

The entropy decoding part 210 transfers the first intra prediction mode index of the restored current block to the intra prediction part 230.

Upon receiving the first intra prediction mode index, the intra prediction unit 230 determines the most probable mode of the current block as the intra prediction mode of the current block when the index has the minimum value (i.e., 0).

When the index has a value other than 0, the intra prediction unit 230 compares the index indicating the most probable mode of the current block with the first intra prediction mode index; when, as a result of the comparison, the first intra prediction mode index is not smaller than the index indicating the most probable mode, it determines as the intra prediction mode of the current block the mode corresponding to a second intra prediction mode index obtained by adding 1 to the first intra prediction mode index, and otherwise determines the mode corresponding to the first intra prediction mode index, as sketched below.
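A minimal sketch of this index comparison; the function name and the assumption that modes and indices are plain integers are illustrative:

```python
def decode_intra_mode(first_index, mpm_index):
    """Reconstruct the intra prediction mode from the restored first
    intra prediction mode index and the index of the most probable
    mode, following the comparison described above."""
    if first_index == 0:
        return mpm_index          # minimum index: use the MPM itself
    if first_index >= mpm_index:
        return first_index + 1    # second index = first index + 1
    return first_index
```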

The intra prediction mode allowable for the current block is composed of at least one non-directional mode (non-directional mode) and a plurality of directional modes (directional modes).

The one or more non-directional modes are DC modes and/or planar (planar) modes. And, either one of the DC mode and the planar mode is adaptively included in the allowable intra prediction mode set.

For this purpose, information specifying a non-directional mode contained in the allowable intra prediction mode set is contained in a picture header or a slice header.

Next, the intra prediction unit 230 reads the reference pixels from the picture storage unit 260 and determines whether or not there is a reference pixel that cannot be used in order to generate an intra prediction block.

This determination may be made, using the decoded intra prediction mode of the current block, based on whether the reference pixels needed to generate the intra prediction block are present.

Next, when reference pixels need to be generated, the intra prediction unit 230 generates reference pixels at the unavailable positions using available reference pixels that were restored earlier.

The definition of the unavailable reference pixels and the method of generating the reference pixels are the same as those of the intra prediction unit 150 in fig. 1, but the reference pixels used when generating the intra prediction block can be selectively restored according to the intra prediction mode of decoding the current block.

The intra prediction unit 230 determines whether to apply a filter to the reference pixels in order to generate the intra prediction block of the current block, based on the decoded intra prediction mode and the size of the current prediction block.

The problem of deblocking artifacts becomes greater as the size of a block becomes larger, and thus the number of prediction modes for filtering reference pixels increases as the size of the block becomes larger.

In the case where it is determined that the filter needs to be applied to the reference pixel, the intra prediction unit 230 filters the reference pixel using the filter.

Two or more filters may also be adaptively applied according to the magnitude of the step difference between the reference pixels. Preferably, the filter coefficients of the filter are symmetric.

The two or more filters can be adaptively applied according to the size of the current block, and when a filter is applied, a filter having a narrow bandwidth is applied to a block having a small size, and a filter having a wide bandwidth is applied to a block having a large size.

In the DC mode, the prediction block is generated from the average value of the reference pixels, so no filter need be applied. Likewise, no filter need be applied to the reference pixels in the vertical mode, where the video has correlation (correlation) in the vertical direction, or in the horizontal mode, where the video has correlation in the horizontal direction.

Thus, since whether filtering applies is correlated with the intra prediction mode of the current block, the reference pixels are adaptively filtered based on the intra prediction mode of the current block and the size of the prediction block.

Next, the intra prediction unit 230 generates a prediction block using the reference pixels or the filtered reference pixels according to the restored intra prediction mode, and the generation of the prediction block is the same as the operation of the encoding device 10, and thus, a detailed description thereof will be omitted.

The intra prediction unit 230 determines whether or not to filter the generated prediction block, which is determined using information included in a slice header or a coding unit header or according to an intra prediction mode of the current block.

When it is determined that the generated prediction block is to be filtered, the intra prediction unit 230 generates a new pixel by filtering a pixel at a specific position of the generated prediction block using an available reference pixel adjacent to the current block.

For example, in the DC mode, the prediction pixels adjacent to a reference pixel, among all prediction pixels, are filtered using the reference pixels adjacent to them.

Therefore, the prediction pixels are filtered using one or two reference pixels according to the positions of the prediction pixels, and the filtering of the prediction pixels in the DC mode is applied to prediction blocks of all sizes.

In the vertical mode, among the prediction pixels of the prediction block, the prediction pixels adjacent to the left reference pixels are modified using a reference pixel other than the upper pixels used to generate the prediction block.

Similarly, among the prediction pixels generated in the horizontal mode, the prediction pixels adjacent to the upper reference pixels are modified using a reference pixel other than the left pixels used to generate the prediction block.

The current block is restored using the predicted block of the current block restored in the above-described manner and the residual block of the decoded current block.

Fig. 9 is a diagram for explaining a second embodiment of a method of processing an image by dividing the image into blocks.

Referring to fig. 9, a Coding Tree Unit (CTU) having a maximum size of 256x256 pixels is first divided in a quad tree (quad tree) structure into four square Coding Units (CUs).

Here, at least one of the coding units divided in the quad-tree structure may be divided in a binary tree (binary tree) structure into two rectangular Coding Units (CUs).

At least one of the coding units divided into the quadtree structure may be divided into the quadtree structure, or may be divided into four Coding Units (CUs) having a square shape again.

At least one of the coding units subdivided into the binary tree structure is subdivided into two Coding Units (CUs) having a square or rectangular shape.

At least one of the coding units subdivided into the quadtree structure may be subdivided into a quadtree structure or a binary tree structure, or may be subdivided into Coding Units (CUs) having a square or rectangular shape.

As described above, a coding unit divided in the binary tree structure is used for prediction and transform without being further divided. In this case, the binary-divided coding unit includes a Coding Block (CB), which is the block unit in which encoding/decoding is actually performed, and the syntax corresponding to that coding block. That is, the sizes of the Prediction Unit (PU) and the Transform Unit (TU) belonging to the Coding Block (CB) are the same as the size of the corresponding Coding Block (CB), as shown in fig. 9.

As described above, the coding unit divided into the quadtree structure is divided into one or more Prediction Units (PUs) using the method described with reference to fig. 3 and 4.

Also, as described above, the coding unit divided in the quadtree structure is divided into one or more transform units (TUs) having a maximum size of 64x64 pixels using the method described with reference to fig. 5.

Fig. 10 is a diagram showing an example of a syntax (syntax) structure used for processing in order to divide an image into block units.

Referring to fig. 10 and fig. 9, a block structure according to an embodiment of the present invention is determined by split_cu_flag, which indicates whether quad-tree splitting is performed, and binary_split_flag, which indicates whether binary-tree splitting is performed.

For example, whether the Coding Unit (CU) is divided as described above is indicated by split_cu_flag. For a coding unit that is binary-divided after quad-tree division, binary_split_flag indicating whether binary division is performed and a syntax indicating the direction of the division are determined. Two methods of signaling the direction of a binary division are exemplified: decoding a plurality of syntax elements, as shown by binary_split_hor and binary_split_ver, or decoding a single syntax element and its signal value, as shown by binary_split_mode, and performing the division in the Horizontal (0) or Vertical (1) direction according to that value.

As still another embodiment of the present invention, the depth of a Coding Unit (CU) divided using a binary tree is indicated by a binary tree depth syntax (binary_depth).
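A minimal C++ sketch of how a decoder might walk this composite split syntax follows; the BitReader type, the presence conditions of the flags, and the function names are illustrative assumptions, not the normative parsing process.

#include <cstddef>
#include <vector>

// Toy bitstream reader; returns 0 past the end so recursion terminates.
struct BitReader {
    std::vector<int> bits;
    std::size_t pos = 0;
    int readBit()   { return pos < bits.size() ? bits[pos++] : 0; }
    bool readFlag() { return readBit() != 0; }
};

void parseCodingTree(BitReader& br, int x, int y, int w, int h, int binaryDepth) {
    if (w == h && br.readFlag()) {             // split_cu_flag: quad-tree split
        int hw = w / 2, hh = h / 2;
        parseCodingTree(br, x,      y,      hw, hh, 0);
        parseCodingTree(br, x + hw, y,      hw, hh, 0);
        parseCodingTree(br, x,      y + hh, hw, hh, 0);
        parseCodingTree(br, x + hw, y + hh, hw, hh, 0);
        return;
    }
    if (br.readFlag()) {                       // binary_split_flag
        int dir = br.readBit();                // binary_split_mode: 0 = Horizontal, 1 = Vertical
        if (dir == 0) {                        // horizontal: height is split
            parseCodingTree(br, x, y,         w, h / 2, binaryDepth + 1);
            parseCodingTree(br, x, y + h / 2, w, h / 2, binaryDepth + 1);
        } else {                               // vertical: width is split
            parseCodingTree(br, x,         y, w / 2, h, binaryDepth + 1);
            parseCodingTree(br, x + w / 2, y, w / 2, h, binaryDepth + 1);
        }
        return;
    }
    // Leaf CU: per fig. 9, the PU and TU share this block's size;
    // binaryDepth corresponds to the binary_depth syntax mentioned above.
}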

The methods described with reference to fig. 1 to 8 are applied to blocks (e.g., Coding Units (CUs), Prediction Units (PUs), and Transform Units (TUs)) divided by the methods described with reference to fig. 9 and 10, and encoding and decoding of a video are performed.

Next, still another embodiment of a method of dividing a Coding Unit (CU) into one or more transform units (TUs) will be described with reference to fig. 11 to 16.

According to an embodiment of the present invention, a Coding Unit (CU) is divided in a binary tree structure into transform units (TUs), the basic units in which the residual block is transformed.

For example, referring to fig. 11, at least one of the rectangular coding blocks (CU0, CU1), divided in a binary tree structure and having a size of Nx2N or 2NxN, is divided again in a binary tree structure into square transform units (TU0, TU1) having a size of NxN.

As described above, the block-based image encoding method performs prediction, transform, quantization, and entropy encoding steps.

In the prediction step, a prediction signal is generated by referring to the block currently being encoded and the previously encoded video or the neighboring video, and a difference signal with respect to the current block is calculated.

In the transform step, the difference signal is input and transformed using various transform functions; the transformed signal is separated into DC coefficients and AC coefficients, achieving energy compaction (Energy compaction) and improving the coding efficiency.

In the quantization step, the Transform coefficients (Transform coefficients) are input and quantized, and entropy coding is then performed on the quantized signal, thereby encoding the video.

In addition, the video decoding method is performed in the reverse order of the encoding process described above, and image quality distortion occurs in the quantization step.

As a method for improving coding efficiency and reducing image quality distortion, the size and shape of the transform unit (TU) and the type of transform function applied are varied according to the distribution of the difference signal input in the transform step and the characteristics of the image.

For example, when a block similar to the current block is found in the prediction step through block-based motion estimation using a cost (cost) measure such as SAD (Sum of Absolute Differences) or MSE (Mean Square Error), the distribution of the difference signal takes various forms according to the characteristics of the image.

Thereby, efficient encoding is performed by selectively determining the size or shape of the transform unit (TU) based on the distribution of the various difference signals and then performing the transform.

Referring to fig. 12, in an arbitrary coding unit (CUx), when a difference signal occurs as shown in fig. 12 (a), the corresponding coding unit (CUx) is divided in a binary tree structure into two transform units (TUs), as shown in fig. 12 (b), thereby performing an efficient transform.

For example, since the DC value generally represents the average of the input signal, when a difference signal such as that shown in fig. 12 (a) is received as the input of the transform process, dividing the coding unit (CUx) into two transform units (TUs) allows the DC values to be represented effectively.

Referring to fig. 13, a square coding unit (CU0) having a size of 2Nx2N is divided in a binary tree structure into rectangular transform units (TU0, TU1) having a size of Nx2N or 2NxN.

According to still another embodiment of the present invention, the step of dividing the Coding Unit (CU) in the binary tree structure may be repeated two or more times, dividing it into a plurality of transform units (TUs), as described above.

Referring to fig. 14, a rectangular coding block (CB1) having a size of Nx2N is divided in a binary tree structure, the resulting block of size NxN is divided again in a binary tree structure into a rectangular block of size N/2xN or NxN/2, and that block is divided once more in a binary tree structure into square transform units (TU1, TU2, TU4, TU5) of size N/2xN/2.

Referring to fig. 15, a square coding unit (CU0) having a size of 2Nx2N is divided in a binary tree structure, the resulting block of size Nx2N is divided again in a binary tree structure into a square block of size NxN, and that block is then divided in a binary tree structure into rectangular transform units (TU1, TU2) of size N/2xN.

Referring to fig. 16, a rectangular coding unit (CU0) having a size of 2NxN is divided in a binary tree structure, and the resulting block of size NxN is divided again in a quadtree structure into square transform units (TU1, TU2, TU3, TU4) of size N/2xN/2.

The methods described with reference to fig. 1 to 8 are applied to the blocks (e.g., Coding Units (CUs), Prediction Units (PUs), and transform units (TUs)) divided by the methods described with reference to fig. 11 to 16, thereby performing encoding and decoding of a video.

Next, an embodiment of a method for determining a block division structure by the encoding device 10 of the present invention will be described.

The picture dividing unit 110 provided in the video encoding apparatus 10 performs RDO (Rate Distortion Optimization) in a preset order and determines the division structures of the Coding Unit (CU), Prediction Unit (PU), and transform unit (TU) that can be divided as described above.

For example, to determine the block division structure, the picture division part 110 performs RDO-Q (Rate Distortion Optimization-Quantization) and determines the optimal block division structure in terms of bit rate (bitrate) and distortion (distortion).

Referring to fig. 17, when the Coding Unit (CU) has a size of 2Nx2N pixels, RDO is performed in the order of the prediction unit (PU) partition structures of 2Nx2N pixels shown in fig. 17 (a), NxN pixels shown in fig. 17 (b), Nx2N pixels shown in fig. 17 (c), and 2NxN pixels shown in fig. 17 (d), to determine the optimal partition structure of the prediction unit (PU).

Referring to fig. 18, when the Coding Unit (CU) has a size of Nx2N or 2NxN pixels, the optimal partition structure of the prediction unit (PU) is determined by sequentially performing RDO on the prediction unit (PU) partition structures of Nx2N (or 2NxN) pixels shown in fig. 18 (a), NxN (or NxN/2) and NxN pixels shown in fig. 18 (b), N/2xN/2, N/2xN, and NxN pixels shown in fig. 18 (c), N/2xN/2, N/2xN, and NxN pixels shown in fig. 18 (d), and N/2xN pixels shown in fig. 18 (e).

In the above, the block division method of the present invention has been described taking RDO (Rate Distortion Optimization) as the example for determining the block division structure, but the picture division part 110 may instead determine the block division structure using SAD (Sum of Absolute Differences) or MSE (Mean Square Error) to reduce complexity while maintaining suitable efficiency.
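As a rough illustration of the RDO decision described above, the C++ sketch below compares candidate partition structures by the usual cost D + lambda*R; the structure and function names and the simple cost model are assumptions for illustration and abstract away the RDO-Q details.

#include <limits>
#include <vector>

struct Candidate { int partitionMode; double distortion; double rateBits; };

// Returns the partition mode (e.g., 2Nx2N, NxN, Nx2N, 2NxN of fig. 17)
// whose rate-distortion cost is smallest, evaluated in the preset order.
int choosePartition(const std::vector<Candidate>& cands, double lambda) {
    double bestCost = std::numeric_limits<double>::max();
    int bestMode = -1;
    for (const Candidate& c : cands) {
        double cost = c.distortion + lambda * c.rateBits;  // D + lambda*R
        if (cost < bestCost) { bestCost = cost; bestMode = c.partitionMode; }
    }
    return bestMode;
}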

According to an embodiment of the present invention, whether Adaptive Loop Filtering (ALF) is applied is determined in units of the Coding Units (CUs), Prediction Units (PUs), or transform units (TUs) partitioned as described above.

For example, whether or not Adaptive Loop Filtering (ALF) is applied is determined by a Coding Unit (CU) unit, and the size or coefficient of the loop filter applied differs according to the Coding Unit (CU).

In this case, information indicating whether the Adaptive Loop Filter (ALF) is applied or not is included in the slice header.

In the case of color difference signals, whether Adaptive Loop Filtering (ALF) is applied can be determined on a picture-by-picture basis, and the loop filter may have a rectangular shape, unlike the filter used for the luminance signal.

Also, whether Adaptive Loop Filtering (ALF) is applied is determined per slice. Therefore, information indicating whether Adaptive Loop Filtering (ALF) is applied to the current slice is included in the slice header or picture header.

In case the current slice is adapted for adaptive loop filtering, the slice header or picture header additionally contains information of the filter length in the horizontal and/or vertical direction of the luminance component used in the adaptive loop filtering process.

The slice header or the picture header includes information indicating the number of filter components, and when the number of filter components is 2 or more, the filter coefficients may be encoded by using a prediction method.

Therefore, the slice header or the picture header includes information showing whether the filter coefficient is encoded by the prediction method, and the predicted filter coefficient is included in the case of using the prediction method.

In this case, information indicating whether each color difference component is filtered is included in a slice header or a picture header, and the indications for Cr and Cb are jointly encoded (i.e., multiplexed coding) in order to reduce the number of bits.

At this time, for the color difference components, the case where neither Cr nor Cb is filtered occurs most frequently, so in order to reduce complexity, entropy encoding is performed by assigning the smallest index to the case where neither Cr nor Cb is filtered.

When both Cr and Cb are filtered, entropy encoding is performed by assigning the largest index.
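A small sketch of this joint index assignment follows; the text fixes only the smallest index (neither filtered) and the largest index (both filtered), so the two middle assignments below are assumptions for illustration.

// Joint (multiplexed) index for the Cr/Cb ALF filtering flags.
int jointChromaAlfIndex(bool filterCr, bool filterCb) {
    if (!filterCr && !filterCb) return 0;  // most frequent case: smallest index
    if ( filterCr &&  filterCb) return 3;  // both filtered: largest index
    return filterCb ? 1 : 2;               // single-component cases (assumed order)
}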

Fig. 19 to 29 are diagrams for explaining a composite segmented structure according to another embodiment of the present invention.

For example, referring to fig. 19, division in the binary tree structure produces a Coding Unit (CU) shaped as a rectangle whose horizontal length W is longer than its vertical length H, as shown in fig. 19 (A), or as a rectangle whose vertical length H is longer than its horizontal length W, as shown in fig. 19 (B). In a coding unit elongated in a specific direction, information is more likely to be concentrated in the boundary regions at the left and right, or upper and lower, edges than in the middle region.

Therefore, in order to perform more precise and efficient encoding and decoding, the encoding apparatus 10 according to the embodiment of the present invention divides a coding unit elongated in a specific direction in a ternary tree (ternary tree) or triple tree (triple tree) structure, which separates the edge regions more readily than quadtree and binary tree division.

For example, fig. 19 (A) shows that, when the coding unit to be divided is horizontally elongated, it is ternary-divided into a first region at the left edge with horizontal length W/8 and vertical length H/4, a second region with horizontal length W/8×6 and vertical length H/4, and a third region at the right edge with horizontal length W/8 and vertical length H/4.

Fig. 19 (B) shows that, when the coding unit to be divided is vertically elongated, it is divided into a first region at the upper edge with horizontal length W/4 and vertical length H/8, a second region in the middle with horizontal length W/4 and vertical length H/8, and a third region at the lower edge with horizontal length W/4 and vertical length H/8.

The encoding device 10 according to the embodiment of the present invention processes the division of the ternary tree structure through the picture dividing unit 110. Accordingly, the picture dividing unit 110 determines the division of the quadtree and binary tree structures in accordance with the coding efficiency, and additionally determines the finer subdivision scheme based on the quadtree structure.

Here, the ternary tree division may be processed for all coding units without restriction. However, as described above, it is preferable, in view of encoding and decoding efficiency, to allow the ternary tree structure only for coding units meeting specific conditions.

Also, while the ternary tree structure could in principle ternary-divide the coding tree unit in various ways, it is preferable to allow only optimally defined shapes, considering encoding/decoding complexity and the transmission bandwidth of the signaling.

Therefore, when determining the division of the current coding unit, the picture dividing unit 110 determines whether the current coding unit is divided in a specifically shaped ternary tree structure only when preset conditions are met. Also, when the ternary tree is allowed as described above, the division ratio of the binary tree is not limited to 1:1 but may be extended to 3:1, 1:3, and the like. Accordingly, the division structure of a coding unit of the embodiment of the present invention includes a composite tree structure subdivided in a quadtree, binary tree, or ternary tree structure according to ratio.

For example, the picture dividing unit 110 determines a composite division structure of the division target coding unit based on the division table.

According to the embodiment of the present invention, the picture dividing section 110 processes quadtree division corresponding to the maximum block size (e.g., 128x128 or 256x256 pixels), and performs a composite division process in which at least one of binary tree and ternary tree division is processed at the terminal nodes of the quadtree division.

In particular, according to an embodiment of the present invention, the picture division part 110 determines, according to the division table, one of the binary tree divisions, i.e., the first binary division (BINARY1) or the second binary division (BINARY2), or one of the ternary tree divisions, i.e., the first ternary division (TRI1) or the second ternary division (TRI2), corresponding to the characteristics and size of the current block.

Here, the first binary division corresponds to a vertical or horizontal division with a ratio of N:N, the second binary division corresponds to a vertical or horizontal division with a ratio of 3N:N or N:3N, and the root CU of each binary division is divided into CU0 and CU1 of the sizes shown in the division table.

The first ternary division corresponds to a vertical or horizontal division with a ratio of N:2N:N, the second ternary division corresponds to a vertical or horizontal division with a ratio of N:6N:N, and the root CU of each ternary division is divided into CU0, CU1, and CU2 of the sizes shown in the division table.
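A sketch of the sub-CU sizes implied by these ratios follows (vertical case; the horizontal case swaps width and height). The ratios come directly from the text; the type and function names are illustrative.

#include <vector>

enum SplitType { BINARY1, BINARY2, TRI1, TRI2 };

// Returns the widths of the sub-CUs for a vertical split of a W-wide CU.
std::vector<int> verticalSplitWidths(SplitType t, int W) {
    switch (t) {
        case BINARY1: return { W / 2, W / 2 };            // N:N (1:1)
        case BINARY2: return { 3 * W / 4, W / 4 };        // 3N:N (or N:3N mirrored)
        case TRI1:    return { W / 4, W / 2, W / 4 };     // N:2N:N (1:2:1)
        case TRI2:    return { W / 8, 6 * W / 8, W / 8 }; // N:6N:N (1:6:1)
    }
    return {};
}
// E.g., splitting the height of a 32x64 CU by TRI2 yields {8, 48, 8},
// i.e., the 32X8, 32X48, 32X8 sub-CUs discussed below.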

In addition, the picture dividing part 110 of the embodiment of the present invention sets the maximum and minimum coding unit sizes to which the first binary division, the second binary division, the first ternary division, and the second ternary division each apply.

Since performing encoding and decoding on a block below the minimum size, for example one whose horizontal or vertical dimension is 2 pixels or fewer, is inefficient in terms of complexity, the division table according to the embodiment of the present invention defines in advance the division structures allowed for each coding unit size.

Thus, the picture dividing unit 110 prevents in advance divisions that would produce blocks below the minimum size, for example a horizontal or vertical dimension of 2 pixels (i.e., less than 4); it determines in advance, from the size of the block to be divided, whether the first binary division, the second binary division, the first ternary division, or the second ternary division is allowed, checks the rate-distortion-optimization performance of the allowed division structures, and determines the optimal division structure.

For example, when the maximum-size root coding unit CU0 is divided vertically, the binary division structure divides it into CU0 and CU1 at a ratio of 1:1, 3:1, or 1:3, and the ternary division structure divides it into CU0, CU1, and CU2 at a ratio of 1:2:1 or 1:6:1.

The allowed vertical division structures are restrictively determined according to the size of the coding unit to be divided. For example, the vertical division structures of the 64X64 and 32X32 coding units allow all of the first binary, second binary, first ternary, and second ternary divisions, but the second ternary division is restricted and not possible in the vertical division structure of the 16X16 coding unit. The vertical division structure of the 8X8 coding unit may restrictively allow only the first binary division. In this way, division into blocks below the minimum size, which causes complexity, is prevented in advance.

Likewise, when the maximum-size root coding unit CU0 is divided horizontally, the binary division structure divides it into CU0 and CU1 at a ratio of 1:1, 3:1, or 1:3, and the ternary division structure divides it into CU0, CU1, and CU2 at a ratio of 1:2:1 or 1:6:1.

The allowed horizontal division structures are restrictively determined according to the size of the coding unit to be divided. For example, the horizontal division structures of the 64X64 and 32X32 coding units allow all of the first binary, second binary, first ternary, and second ternary divisions, but the second ternary division is restricted and not possible in the horizontal division structure of the 16X16 coding unit. The horizontal division structure of the 8X8 coding unit may restrictively allow only the first binary division. In this way, division into blocks below the minimum size, which causes complexity, is prevented in advance.
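The size-dependent allowance rules just quoted can be condensed into a small lookup, as sketched below; a real division table would also carry the per-CTU-group and per-slice conditions mentioned later, so this encodes only the example in the text (SplitType as in the earlier sketch).

enum SplitType { BINARY1, BINARY2, TRI1, TRI2 };

// cuSize: the square coding unit's side length in pixels.
bool isSplitAllowed(SplitType t, int cuSize) {
    if (cuSize >= 32) return true;       // 64X64, 32X32: all four splits allowed
    if (cuSize == 16) return t != TRI2;  // 16X16: second ternary split disallowed
    if (cuSize == 8)  return t == BINARY1; // 8X8: only the first binary split
    return false;                        // below the minimum size: no split
}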

The picture dividing unit 110 performs horizontal division processing on the vertically divided coding units in the first binary division or the second binary division, or performs horizontal division processing in the first ternary division or the second ternary division, according to the division table.

For example, for a coding unit vertically divided at 32X64, the picture divider 110 divides it into CU0 and CU1 of 32X32 according to the first binary division, or CU0 and CU1 of 32X48 and 32X16 according to the second binary division, or CU0, CU1, and CU2 of 32X32, 32X16, and 32X16 according to the first ternary division, or CU0, CU1, and CU2 of 32X8, 32X48, and 32X8 according to the second ternary division.

The picture dividing unit 110 performs vertical division processing on the horizontally divided coding units in the first binary division or the second binary division, or performs vertical division processing in the first ternary division or the second ternary division.

For example, corresponding to a coding unit horizontally divided at 32X16, the picture dividing unit 110 divides it into CU0 and CU1 of 16X16 according to the first binary division, or CU0 and CU1 of 24X16 and 8X16 according to the second binary division, or CU0, CU1, and CU2 of 8X16, 16X16, and 8X16 according to the first ternary division, or CU0, CU1, and CU2 of 4X16, 24X16, and 4X16 according to the second ternary division.

The allowed division structures are determined differently for the vertical and horizontal directions per condition unit, depending on the CTU size, the CTU group unit, and the slice unit; the CU division ratios and applicable size information for the first binary, second binary, first ternary, and second ternary divisions are defined by the division table or preset by condition information.

The coding unit to be divided is divided into equal horizontal or vertical parts. However, equal division is an inefficient prediction method when a region with high prediction values exists only in part of the boundary region. Thus, the picture dividing unit 110 according to the embodiment of the present invention allows, per condition unit, non-uniform division at a fixed ratio as shown in fig. 20 (C).

For example, when the binary equal division is 1:1, the non-equal division may be determined at asymmetric binary (Asymmetric Binary) ratios of (1/3, 2/3), (1/4, 3/4), (2/5, 3/5), (3/8, 5/8), or (1/5, 4/5). For a ternary equal division of 1:2:1, the non-equal division may variably determine its ratio, for example as 1:6:1.

The picture dividing unit 110 according to the embodiment of the present invention basically divides a picture into a plurality of Coding Tree Units (CTUs), which contain the coding units serving as prediction units, and the plurality of coding tree units constitute tile units or slice units. For example, a picture may be divided into a plurality of tiles each covering a rectangular area, arranged in one or more columns, in one or more rows, or in one or more columns and one or more rows. The picture may be divided into tiles of the same uniform size, or into tiles of different sizes, based on the horizontal and vertical dimensions of the picture.

In general, according to standard syntax such as HEVC, when a picture is divided into tiles or slices, high-level syntax is assigned and encoded as the header information so that each region can be processed independently of other tiles or slices. With this high-level syntax, parallel processing can be performed on a tile or slice basis.

However, the current tile or slice coding scheme depends only on the coding conditions of the encoding apparatus and does not consider the performance and environment of the decoding apparatus. For example, even if the decoding apparatus has more processors or threads than the encoding apparatus, that capability cannot be exploited.

In particular, the unidirectional division structure and tile determination procedure depend on the encoding device, even for current video that requires partial decoding based on ultra-high resolution and user-viewpoint tracking, such as recent 360-degree virtual reality video, and this ultimately causes degradation of the overall encoding and decoding performance.

According to the embodiment of the present invention for solving this problem, the picture dividing section 110 classifies the tiles of the divided picture into independent tiles or dependent tiles, determined within a tile or tile group as described above, assigns to each tile attribute information indicating whether it is encoded and decoded independently of or dependently on other tiles, and constructs the corresponding header information.

The picture dividing unit 110 according to the embodiment of the present invention divides the picture into tile groups or sub-pictures in which a plurality of tiles are arranged contiguously, according to the positions and attributes of the tiles, encodes the configuration information corresponding to each tile group or to each sub-picture included in a sub-picture group, and transmits it to the decoding device 20, thereby enabling independent or dependent processing of the tiles included in each tile group or of the sub-pictures.

Thus, the tile group or sub-picture is not limited by its name; in substance, it is one or more rectangular regions, composed of tiles or slices, that divide the picture. Therefore, although the divided region according to the embodiment of the present invention is mainly described under the name of a tile group, it is a rectangular region of the divided picture and can also be referred to as a sub-picture (subpicture) region including one or more tiles or slices. The independent or dependent processing of each sub-picture can also be determined from the signaling of the configuration information of the sub-picture group containing the sub-picture. Therefore, the technical structure described below for the tile group is also applicable to the sub-picture.

Here, independent means that a tile is processed as an independent picture, regardless of other tiles, tile groups, or sub-pictures, throughout the encoding and decoding processes including intra prediction, inter prediction, transform, quantization, entropy coding, and filtering. This does not mean that every encoding and decoding process is performed completely independently for each tile; information of other tiles may still be selectively used for encoding and decoding in inter prediction or loop filtering.

Dependent indicates a case where the encoding or decoding information of another tile is required in the encoding and decoding processes including intra prediction, inter prediction, transform, quantization, entropy coding, and filtering. This does not mean that every encoding and decoding process depends completely on other tiles; some processes may still be handled independently.

As described above, a tile group indicates a specific area within a picture formed by contiguously arranged tiles, and the picture dividing unit 110 according to the embodiment of the present invention determines the tile group structure under the coding conditions and generates the tile group information, enabling parallel decoding that is more efficient for the environment and performance of the decoding apparatus 20.

As described above, the tile group information includes sub-group configuration information corresponding to a sub-picture or a sub-picture group.

In this regard, description will be first made on tile group information to be processed and specified by the picture dividing unit 110.

FIG. 20 is a flowchart for explaining the process of encoding tile group information of the embodiment of the present invention.

Referring to fig. 20, the encoding device 10 according to the embodiment of the present invention divides a picture into a plurality of tile regions by the picture dividing unit 110 (S1001), and configures one or more tile groups or sub-pictures based on coding characteristic information of the divided tiles (S1003).

The encoding device 10 generates tile group information or sub-picture information corresponding to each tile group by the picture dividing unit 110 (S1005), encodes the generated tile group information or sub-picture information (S1007), and transmits the encoded tile group information or sub-picture information to the decoding device 20.

Here, the tile group information or the sub-picture information is header information for each tile group or sub-picture, and the header information is in the form of high-level syntax and is included in picture header information of the coded video bitstream. The header information may be in a form of a higher level syntax, and may be transmitted as Supplemental Enhancement Information (SEI) information included in the encoded video bitstream.

More specifically, the tile group or sub-picture information according to the embodiment of the present invention includes identification information for each tile group or sub-picture, and each tile group or sub-picture includes picture configuration information that enables efficient parallel decoding processing, either locally or independently.

For example, each tile group (or sub-picture) may correspond to a user viewpoint (PERSPECTIVE), correspond to a projection direction of a 360-degree video, or be configured according to a specific position. Accordingly, the tile group (or sub-picture) information includes the characteristic information of each tile group, the decoding or reference priority information corresponding to the tiles included in the tile group, or information on whether parallelization is possible, so that the decoding device 20 can perform variable and efficient video decoding.

The tile group or sub-picture information is updated in units of the Group Of Pictures (GOP) to which each picture belongs, and thus the tile group information is configured or initialized in accordance with the period of NAL (Network Abstraction Layer) units.

Also, level information is proposed as specific tile group (or sub-picture) information of an embodiment of the present invention. The level information can indicate coding dependency or independence between the tiles within each tile group (or sub-picture), or between different tile groups (or sub-pictures), separately from the slices, and is used for the processing suitability determination in the decoding apparatus 20 according to the value assigned by the level information. That is, as described above, the decoding device 20 executes the processing suitability test on the bitstream, including the parallelization step for each tile group or sub-picture, according to its performance and environment, and determines the processing suitability of each layer of the tile groups (or sub-pictures) included in the bitstream according to the level information.

For example, the number of tiles included in the tile group (or sub-picture group), the CPB size, the bit rate, the presence or absence of independent tiles (or independent sub-pictures) within the tile group (or sub-picture group), whether all tiles are independent, or whether all tiles are dependent, and the like, may be indicated by the level information.

For example, in the case of a 360-degree virtual reality video, a high level can be determined for the tile group or sub-picture group corresponding to the user VIEW PORT (VIEW PORT), which requires high-quality decoding according to the intention of the video creator or content provider, and first-level information can be assigned to that tile group or sub-picture group. In this case, the decoding apparatus 20 tests the processing suitability corresponding to the first-level information of the tile group (sub-picture) according to its performance and environment, assigns the maximum available capability, and independently processes each tile or sub-picture within the corresponding tile group or sub-picture group in parallel.

And, second-level information is assigned to a tile group at a relatively lower level than the first level. In this case, decoding apparatus 20 tests the processing suitability corresponding to the tile group (sub-picture) level information according to its performance and environment, preferentially processes in parallel the tiles designated as independent, then processes the remaining dependent tiles, delivering an intermediate degree of performance. Here, the independent tile preferentially processed in parallel is the first tile among the tiles included within the tile group, and the dependent tiles are those whose encoding and decoding depend on the first tile.

In addition, third-level information is allocated to a tile group at a level lower than the second level. In this case, the decoding apparatus 20 tests the processing suitability corresponding to the tile group (sub-picture) level information, and performs the decoding of the dependent tiles within the current tile group using the tile decoding information processed in another tile group.
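The three cases above can be summarized in a small decision sketch; the policy names, the Environment fields, and the exact conditions are illustrative assumptions, since the text leaves the concrete environment variables to the decoding apparatus.

enum class Policy { FullParallel, IndependentFirst, ReuseOtherGroups };

// Simplified stand-in for the decoder's environment variables
// (system, network, user-viewpoint, ...).
struct Environment { int threads; bool inViewport; };

Policy decideProcessing(int groupLevel, const Environment& env) {
    if (groupLevel == 1 && env.threads > 1 && env.inViewport)
        return Policy::FullParallel;      // level 1: decode every tile independently in parallel
    if (groupLevel == 2)
        return Policy::IndependentFirst;  // level 2: independent tiles first, then dependent ones
    return Policy::ReuseOtherGroups;      // level 3: reuse decoding info from other tile groups
}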

In addition, the level information of the embodiment of the present invention includes parallel layer information indicating whether the tiles or sub-pictures within a tile group or sub-picture group can be parallelized. The parallel layer information includes maximum or minimum parallel layer unit information indicating the steps in which the tiles or sub-pictures within each tile group or sub-picture group are processed in parallel.

Thus, the parallel layer unit information indicates that the tiles constituting a tile group (sub-picture) are divided into parallel layers according to the steps corresponding to the parallel layer unit information and are processed independently in parallel.

For example, a parallel layer divided at the step corresponding to the parallel layer unit information includes at least one independent tile, and the decoding device 20 variably determines, by a parallelization determining procedure based on the parallel layer information, whether the tiles within an encoded tile group are decoded in parallel.

Here, the parallelization determining program includes a program for determining parallelization steps for each tile group based on the level information, the parallel layer information, and an environment variable of the decoding apparatus, the environment variable being determined according to at least one of a system environment variable, a network variable, and a user viewpoint (PERSPECTIVE) variable of the decoding apparatus 20.

Thus, the decoding apparatus 20 performs an efficient early decoding setup based on the environment variables through its own suitability determination procedure based on the tile group or sub-picture level information, and performs optimal partial decoding and parallel decoding based on that setup; in particular, the overall encoding and decoding performance can be improved for viewpoint-dependent, ultra-high-resolution video such as 360-degree video.

Fig. 21 to 25 are diagrams for explaining illustration of tile groups and tile group information of the embodiment of the present invention.

Fig. 21 shows a partition tree structure partitioned into a plurality of tile groups according to an embodiment of the present invention, and fig. 22 is an exemplary diagram of the partitioned tile groups.

Referring to fig. 21, the tile group information presents the tile groups and the tile configuration information within a picture based on a tree structure, and an arbitrary picture 300 is composed of a plurality of tile groups according to the tile group information.

In order to effectively present and encode the configuration information of the tile group, the tile group information includes the configuration information of the tile group in the picture according to a tree structure such as a binary tree, a ternary tree or a quadtree.

For example, in the tree structure, the components are represented by a root node, parent nodes, and child nodes; corresponding to the root node, which represents the picture, the parent-node information corresponding to each tile group and the child-node information corresponding to each tile are specified in the tile group header information.

For example, the tile group header information specifies, as tile group information corresponding to a picture node (root node), intra-picture tile group number information, coding characteristic information to which the intra-picture tile groups are commonly applied, and the like.

In the tile group header information, the group level information and parallel layer information corresponding to each tile group are specified at the tile group node (parent node), together with the number of tiles in the tile group, the size information of the internal tiles, the coding characteristic information of the internal tiles, and so on; unlike the coding information commonly applied or transmitted at the root node, coding information that differs between tile groups is carried here.

In addition, the tile group header information includes coding-specific information of each tile, corresponding to the tile node (child node or terminal node), together with the coding information shared within each tile group node. Thus, the encoding device 10 can encode each tile in the same tile group under different coding conditions and include that information in the tile group header information.

In addition, for parallelization processing, tile group header information includes parallel layer information indicating a tile group of a layer unit capable of performing parallelization, and the decoding device 20 performs selective adaptive parallel decoding according to decoding conditions with reference thereto.

To this end, each tile is distinguished as an independent tile or a dependent tile as described above. The encoding device 10 encodes the independence and dependency information for each tile group and tile and transmits it to the decoding device 20, and the decoding device 20 determines whether parallelization between the tiles is possible based on this information and adaptively executes the most suitable high-speed parallel processing according to its environment variables and performance.

Referring to fig. 22, one or more tile groups are included in one picture, and when the picture is divided into two or more tile groups, the number of tile groups is restrictively defined according to a rule defined in advance such as a multiple of 2 or a multiple of 4.

The tile group information includes information on the number of partitions of the tile group, and is encoded by specifying a tile group header.

For example, for the case where the number of tile groups is defined as T, a value converted by log2(T) is encoded in the tile group header.

In addition, each tile group includes one or more tiles, which are determined to be processed in parallel in steps according to the parallel layer information. For example, the parallel layer information may be presented as depth information (D) within a tile group, with the number of tiles N within the tile group given exponentially as N = 2^D.

For example, the parallel layer information is specified by a value of 0 or more, and whether or not the tile corresponding to the parallel layer information is divided and parallelized is variably determined.

For example, referring to fig. 22, one tile group 1 is not divided into two or more specific tiles, and the parallel layer information of tile group 1 may be allocated to 0. At this time, the tile group 1 is composed of one tile, and the corresponding tile has the same size as the tile group 1.

In addition, tile group 2 includes two tiles, and in this case, the parallel layer information of the tile group may be allocated to 1.

And, tile group 3 contains four tiles, and the parallel layer information of the tile group can be allocated to 2 at this time.

And, tile group 4 contains 8 tiles, and in this case, the parallel layer information of the tile group may be allocated to 3.
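The signaling arithmetic of these examples is summarized below: the tile group count T is coded as log2(T), and the tile count N in a group follows from the parallel layer depth D as N = 2^D (so tile groups 1 to 4 above, with D = 0, 1, 2, 3, hold 1, 2, 4, and 8 tiles). The helper names are illustrative.

#include <cmath>

// Value written to the tile group header for T tile groups (T a power of two).
int encodeGroupCount(int T) { return static_cast<int>(std::log2(T)); }

// Number of tiles implied by parallel layer depth D: N = 2^D.
int tilesFromDepth(int D)   { return 1 << D; }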

In the present embodiment, the same description applies with the tile group regarded as a sub-picture group and the tiles as the respective sub-pictures, and the level information and parallel layer information corresponding to the sub-pictures included in the sub-picture group are determined accordingly.

In addition, fig. 23 shows various tile coupling structures of the tile group proposed in the present invention.

The tile group of the embodiment of the present invention includes one tile or a plurality of tiles specified in multiples of 2 or multiples of 4; the picture dividing section 110 divides the tiles in various ways and allocates a tile group composed of the divided tiles.

For example, as shown in fig. 23, when the first tile group 20, the second tile group 21, the third tile group 22, and the fourth tile group 23 are allocated, the first tile group 20 includes tiles in which the Width (lateral length) of a specific tile is fixed and the Height (longitudinal length) is divided variably; the longitudinal lengths of the tiles may be subdivided at a ratio such as 1:2 or 1:3, so that the tiles within the tile group are adjusted to different sizes.

Also, as shown for the second tile group 21, the tile group 21 includes tiles in which the Height (longitudinal length) of a specific tile is fixed and the Width (lateral length) is divided variably. In this case, the lateral length may be further divided at ratios such as 1:2 or 1:3, so that the tiles within the tile group are adjusted to different sizes.

In addition, the third tile group 22 is constituted by one tile, and in this case, the size and shape of the tile (22-1) are the same as those of the tile group 22.

The fourth tile group 23 is an example in which a plurality of tiles 23-1, 23-2, 23-3, and 23-4 are formed with the same size; each tile 23-1, 23-2, 23-3, 23-4 is a vertically oriented rectangle, and together they form one tile group 23. When the tiles are formed with the same size, a tile group may also be formed as a horizontally oriented rectangle.

Thus, encoding apparatus 10 configures the shape, number, and the like of tiles in a tile group differently for each tile group, and tile group information indicating the tile group information is encoded by a tile group header and signaled to decoding apparatus 20.

Thus, in the decoding device 20, the shape, number, and so on of the tiles within each tile group are determined per tile group based on the tile group information. With the shape and number information of the tiles within the tile group, the partial decoding or parallelization process of the decoding apparatus 20 is performed efficiently. Here, the level information relates to the performance and suitability tests, and the number of tiles included in the sub-picture group corresponding to the tile group information can also be used in determining the level information.

For example, in the case of the first tile group 20, the decoding apparatus 20 derives the horizontal length from the size of the first tile group 20 at a ratio of 1:1, and then derives the vertical length by acquiring the ratio information within the tile group information (1:2, 1:3, etc.) or separately signaled size information. Here, the horizontal or vertical size information of each tile may also be converted and transmitted directly to the decoding device 20.

For example, in the case of the second tile group 21, the decoding apparatus 20 derives the vertical length from the size of the second tile group 21 at a ratio of 1:1, and determines the tile structure information of the tile group using the horizontal length information additionally signaled in the tile group information.

As shown for the fourth tile group 23, the decoding apparatus 20 performs vertical/horizontal division from the overall size of the tile group and determines the tile structure information of the tile group by deriving the horizontal and vertical sizes of the specific tiles.

In addition, fig. 24 shows a specific process of a tile boundary region in the tile group-based encoding process of the embodiment of the present invention.

First, the encoding device 10 of the embodiment of the present invention indicates, per tile group, the tiles included in the tile group information by expressing the size of each tile as numerical values such as height (M) and width (N), by encoding values derived from those numbers, such as multiples or log-converted values, into the header information, or by having the sizes derived from the number of CTUs within each tile.

For example, preferably the size of each tile is at minimum one CTU unit and at maximum one tile group unit, and within that range the size of a tile is defined as a multiple of 4 or 8 in both the lateral and longitudinal lengths.

Also, according to an embodiment of the present invention, misalignment with CTU units may occur depending on the size settings of the tiles and tile groups. For example, the boundaries (42, 44) between the tiles included in a tile group and the respective CTUs normally coincide, but division (41, 43) in which a tile or tile group does not coincide with the inter-CTU boundaries can occur at picture boundary regions and the like.

Thus, for tiles whose inter-tile-group boundary or tile boundary line coincides with the CTU boundary, the encoding device 10 classifies each such tile as an independent tile so that the decoding device 20 can allocate it to the parallelization process and perform parallel decoding.

However, when the boundary of the tile group or the tile boundary does not coincide with the CTU boundary, the encoding device 10 classifies the tile that includes the start position, i.e., the Left-Top (x, y) position, of the boundary-region CTU as an independent tile, and classifies the tiles adjacent to it as dependent tiles.

In this case, the decoding apparatus 20 should first decode the independent tile and then decode the dependent tiles, thereby preventing image quality degradation.
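A sketch of this boundary rule follows: a tile whose edges align with the CTU grid is classified as independent, while a tile cut off the grid is handled through the dependent path described above. The Tile struct and ctuSize parameter are illustrative assumptions.

struct Tile { int x, y, w, h; bool independent; };

// Marks a tile independent when all four of its edges fall on CTU boundaries.
void classifyTile(Tile& t, int ctuSize) {
    bool aligned = (t.x % ctuSize == 0) && (t.y % ctuSize == 0) &&
                   (t.w % ctuSize == 0) && (t.h % ctuSize == 0);
    // Aligned tiles can be handed to the parallel decoding path directly;
    // for a misaligned boundary, the tile owning the boundary CTU's Left-Top
    // position stays independent and its neighbours are marked dependent
    // and decoded afterwards.
    t.independent = aligned;
}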

Fig. 25 shows a form in which an arbitrary picture is divided into various tile groups 30, 31, 32, each composed of tiles; according to the embodiment of the present invention, a tile group identifier, parallel layer information, and tile group level information are assigned to each tile group, and the corresponding tile group information is signaled to the decoding apparatus 20.

Each tile group 30, 31, 32 includes a varying number of tiles, such as two, four, or three, and the encoding device 10 determines and encodes a tile group header describing the coding characteristics of each tile group, including coding characteristic information such as the on/off information of various coding tools (OMAF, ALF, SAO, etc.) for the tiles within the tile group, and transmits it to the decoding device 20.

The tile group header information is structured such that the subsequent tile group header information is derived from the tile group header information corresponding to the first tile group in the picture.

For example, the encoding device 10 encodes the header information of each subsequent tile group as a sequential update of the header information of the first tile group in a picture, using option values, difference signals, Offset (Offset) information, and the like, applied relative to the first tile group's header. Thus, the headers of all tile groups within a picture are obtained by sequential updates.

The encoding device 10 may process the coding characteristic information applied to each tile group so that the decoding device 20 derives it from the header information of the first tile group 30 in the picture, using option difference values, difference signals, or Offset (Offset) information for the header information of the other tile groups 31 and 32 in the picture.

Further, the encoding apparatus 10 encodes the starting picture of a Group of Pictures (GOP), an IDR (Instantaneous Decoding Refresh) picture, such that the tile group header information of the other pictures in the GOP containing the IDR picture is updated with reference to the tile group header information of the IDR picture. This is explained further below.

Fig. 26 is a flowchart for explaining a tile group information-based decoding process of an embodiment of the present invention.

Referring to fig. 26, first, the decoding apparatus 20 decodes tile group header information (S2001), and acquires tile group information based on the header information (S2003).

Decoding apparatus 20 identifies the one or more tiles (tiles) that divide a picture of the video based on the tile group information, and combines the tiles to compose the plurality of tile groups.

Here, the decoding apparatus 20 derives the tile configuration information within a tile group using the upper-left and lower-right CTU (coding tree unit) position information that is additionally transmitted by the encoding apparatus 10 or derived.

Furthermore, to improve coding efficiency, the encoding device 10 can have the decoding device 20 derive the tile configuration information within a tile group using only the lower-right CTU information.

For example, the lower-right CTU position information is signaled to the decoding apparatus 20 by a separate flag value. The decoding apparatus 20 can also derive the Row and Column configuration information of a tile by regarding the CTU located in the last row and column of the tile as the lower-right CTU.

In this case, the upper-left CTU information is defined using the CTU position at which the tile starts. In this way, the decoding apparatus 20 derives the configuration information of the tiles in a tile group using the upper-left and lower-right CTU position information; when there are multiple tiles, the same method is applied by taking the CTU at the boundary of the preceding tile as the start CTU and deriving the position of the lower-right CTU corresponding to that start CTU, thereby deriving the configuration information of the tiles in the tile group.
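
A minimal sketch, assuming raster-scan CTU addressing, of how a tile's column/row span could be recovered from its upper-left and lower-right CTU addresses:

    def tile_span(top_left_addr, bottom_right_addr, pic_width_in_ctus):
        x0, y0 = top_left_addr % pic_width_in_ctus, top_left_addr // pic_width_in_ctus
        x1, y1 = bottom_right_addr % pic_width_in_ctus, bottom_right_addr // pic_width_in_ctus
        return x1 - x0 + 1, y1 - y0 + 1  # (columns, rows) of CTUs in the tile

    # In a picture 10 CTUs wide, a tile whose upper-left CTU address is 0 and
    # whose lower-right CTU address is 21 spans 2 columns by 3 rows.
    print(tile_span(0, 21, pic_width_in_ctus=10))  # (2, 3)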

The decoding device 20 determines the characteristic information of each tile group and tile using the tile group information (S2005), allocates a parallel process to each tile group/tile according to the determined characteristic information and the environment variables (S2007), and performs decoding of the tiles with the allocated parallel processes (S2009).

More specifically, the tile group information includes tile group header information containing the structure information of each tile group, the dependency or independence information between the tiles within the tile group, the tile group level information, and the parallel layer information.

Each tile making up the plurality of tile groups is selectively decoded in parallel based on the characteristic information determined from the tile group information and on an environment variable of the decoding apparatus determined according to at least one of a system environment variable, a network variable, and a user viewpoint (PERSPECTIVE) variable.

In particular, the tile group information is signaled from the encoding apparatus 10 as tile group header information, stored and managed in the decoding apparatus 20, and initialized or updated according to the Group of Pictures (GOP) identification information of the picture it belongs to.

FIG. 27 is a diagram for explaining an initialization process of tile group headers of the embodiment of the present invention.

Referring to fig. 27, when the decoding apparatus 20 acquires and processes a video bitstream including a plurality of GOPs, tile group headers can be shared and updated within one GOP, while mutually different GOPs are initialized with different tile group headers.

For example, given a GOP 0 including N pictures, the POC 0 picture of GOP 0 is coded as an IDR (Instantaneous Decoding Refresh) picture, and the decoding device 20 stores the six tile group headers corresponding to picture POC 0. Each tile group header is identified by its own tile group ID (identification number).

In this case, when each tile group header includes, as unique information, the position or address information of the upper-left CTU and of the lower-right CTU of the tile group in each picture, the decoding apparatus 20 specifies the configuration of the tile group using this position information.

More specifically, the tile group header (N1101) of POC 5 contains header information for a total of six tile groups, each identified by its own ID.

The tile group header information of the pictures included in a specific GOP is then processed either by acquiring it independently or by deriving it dependently from previously decoded header information.

For example, referring to fig. 27, in order to derive the tile group information (N1101) of POC 5, the decoding apparatus 20 first directly uses the header information decoded from the first tile group header (N1100) of POC 0.

To acquire the configuration information of the tiles in each tile group, the decoding apparatus 20 determines the size and shape information of tile group N1102 using the upper-left and lower-right CTU position information of tile group N1102.

In the same way, the decoding apparatus 20 can acquire the tile group information of the second tile group (N1103) of POC 5.

In addition, for the tile group header (N1110) of POC 8, the decoding apparatus 20 refers to already-decoded tile group header information.

For example, when POC 0 and POC 8 belong to the same group of pictures, the composition of the tile groups within the picture remains the same; therefore, the tile group composition information of POC 8 (the number of tile groups, the upper-left CTU position information of each tile group, and the lower-right CTU position information of each tile group) is also derived from the tile group header information of the IDR picture (POC 0).

The POC 8 tile group header (N1110) derives its basic tile group configuration information and coding condition information from the POC 5 tile group header (N1101) and, using Offset information separately signaled from the encoding apparatus 10, specifies the filtering conditions (ALF/SAO/De-blocking filter) for the POC 8 tile groups, specifies a different Delta QP for each tile group, or specifies and applies inter-picture predictive coding tools (such as OMAF/Affine), thereby performing selective adaptive decoding per tile group.

Likewise, the POC N-1 tile group header (N1120) is derived from the decoded POC 8 tile group header (N1110) or the POC 5 tile group header (N1101): according to the decoding order or the network unit type, the decoding apparatus 20 derives only the difference information to be updated relative to the already-decoded tile group header information, and thereby obtains the specific coding conditions of each tile group of the POC N-1 picture.

In addition, the IDR picture POC M of another GOP 1 is decoded independently, and a tile group header (N1130) constituting POC M is specified separately for it. Thus, mutually different GOPs can have mutually different tile group structures.
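
The GOP-scoped header lifecycle described above could be organized as in the following hedged sketch, where the storage layout and method names are assumptions for illustration: headers are initialized at the IDR picture, updated by deltas within the GOP, and re-initialized at the next IDR.

    class TileGroupHeaderStore:
        def __init__(self):
            self.headers = {}                   # tile_group_id -> header fields

        def init_from_idr(self, idr_headers):
            """Called at POC 0 / POC M: a new GOP starts from a clean header set."""
            self.headers = {h["id"]: dict(h) for h in idr_headers}

        def update(self, tile_group_id, deltas):
            """Within the GOP (e.g. POC 5, POC 8): apply only the difference info."""
            self.headers[tile_group_id].update(deltas)
            return self.headers[tile_group_id]

    store = TileGroupHeaderStore()
    store.init_from_idr([{"id": i, "qp": 26} for i in range(6)])  # six headers at POC 0
    store.update(0, {"qp": 28})                                   # delta signaled at POC 5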

Also, the decoding apparatus 20 performs a partial parallel decoding process using the tile group level information and the tile group type information.

For example, as described above, each tile group is specified by its tile group level: a tile group whose level is 0 is classified as an independent tile group, and one whose level is 1 as a dependent tile group.

The tile group information also includes tile group type characteristic information, and the tile group type is classified as I, B, or P type.

First, when an I-type tile group is designated at the independent tile group level, decoding by intra prediction must be performed within the tile group, without reference to other tile groups.

In contrast, when an I-type tile group is designated at the dependent tile group level, the video of the corresponding tile group is decoded by intra prediction, but the decoding may refer to adjacent independent tile groups.

A tile group indicated as B or P type performs video decoding by intra-picture or inter-picture prediction, and its reference area and range are determined according to the tile group level.

First, for a B- or P-type tile group designated at the independent tile group level, the decoding apparatus 20 restricts intra-picture prediction decoding to the range within the independent tile group.

When a B- or P-type tile group is designated at the dependent tile group level, the decoding apparatus 20 may, when performing inter-picture prediction, decode with reference to the decoding information of the tile groups defined as independent tile groups in already-decoded pictures.

For example, when a tile group designated at the dependent tile group level is of B or P type, the decoding apparatus 20 decodes it with reference to the decoding information of adjacent, already-decoded tile groups for intra prediction, and sets the reference area from already-decoded independent tile groups and performs motion compensation for inter prediction.

For example, suppose N1102 of POC 5 and N1121 of POC N-1 are independent tile groups that were decoded before POC 8. When the tile group N1111 to be decoded is an independent tile group, that is, its tile group level is 0 and its type is B Type, the decoding apparatus 20 performs inter-picture prediction decoding of N1111 with reference to N1102 and N1121. In contrast, since N1104 of POC 5 is a non-independent (dependent) tile group, the decoding apparatus 20 cannot decode N1111 with reference to N1104.

When N1112 of POC 8 has a tile group level of 1 and a type of B or P, the decoding apparatus 20 treats tile group N1112 as a non-independent (dependent) tile group, and the reference area for intra-prediction decoding is limited to adjacent, already-decoded tile groups. For example, if N1111 has already been decoded, the decoding device 20 performs intra-prediction decoding of N1112 with reference to N1111. When performing inter-picture prediction decoding, the decoding apparatus 20 refers to N1102, whose tile group level is 0, in the already-decoded region, and performs inter-picture prediction decoding of N1112.

Thus, in the present invention, the reference structure for prediction is limited or specified within a picture or between pictures by using the tile group level and tile group type information, which enables efficient tile-group-unit partial decoding of an arbitrary picture.
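
These reference restrictions can be summarized in a small rule check; the sketch below uses the convention stated earlier (tile group level 0 = independent, 1 = dependent), and the flag names are illustrative assumptions:

    def can_reference(current_level, cand, same_picture):
        """cand: candidate reference tile group with 'level', 'decoded',
        'adjacent' fields. same_picture=True means intra-picture prediction."""
        if same_picture:
            if current_level == 0:
                return False          # independent group: stays inside itself
            return cand["decoded"] and cand["adjacent"]  # dependent: adjacent decoded groups
        # inter-picture prediction: only already-decoded independent (level 0) groups
        return cand["decoded"] and cand["level"] == 0

    # N1111 (level 0, B type) may reference decoded independent groups N1102/N1121:
    print(can_reference(0, {"level": 0, "decoded": True, "adjacent": False}, False))  # True
    # ...but not the dependent group N1104:
    print(can_reference(0, {"level": 1, "decoded": True, "adjacent": False}, False))  # False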

Fig. 28 is a diagram for explaining variable parallel processing based on parallel layer units according to an embodiment of the present invention.

Referring to fig. 28, when a picture (N200) is divided into four tile groups N205, N215, N225, and N235, tile group N205 is composed of two sub-tiles N201 and N202, N215 is composed of two tiles N211 and N212, and N235 is composed of a single tile N231. In the case of tile group N225, eight tiles N221 to N224 and N226 to N229 constitute one tile group.

Thus, a tile group is made up of one or more tiles. The size and coding characteristics of the tiles may be determined in various ways, either signaled by explicitly coding the characteristics and configuration information of the tiles within the tile group, or derived indirectly at the decoding device 20.

As an example of explicit signaling, the tile group header may specify the size information (Width/Height) of the sub-tiles in the tile group, the first CTU position of each sub-tile, the number of tile columns and rows, the number of CTUs per tile, and so on, or may signal the structure information of a tile using the positions of its first CTU (Top-Left) and last CTU (Bottom-Right).

As an example of indirect derivation, the decoder may determine whether a tile is split according to whether the coding information of the first CTU of each tile depends on the previously decoded coding information.

Also, according to an embodiment of the present invention, the tile group level information and the tile parallel layer information are used in combination so that, as shown in fig. 28, efficient parallel decoding is performed to maximize the performance of the decoding apparatus 20 when decoding video encoded with tile group and tile partitioning.

For example, the encoding device 10 divides an arbitrary picture into a plurality of tile groups according to the characteristics of the video, the intent of the video creator or service provider, and the like; it removes the prediction and reference dependencies between the tile groups and initializes the coded data shared with neighboring blocks within a tile, such as adaptive coding data, the motion vector predictor buffer (MVP Buffer), and the prediction compensation list.

The encoding device 10 performs encoding so that, in the decoding device 20, each of the plurality of tile groups can be assigned to one parallel processing process and decoded. For example, the encoding apparatus 10 may allocate a single core or thread (Thread) for single-process encoding, or allocate four processes to N205, N215, N225, and N235, respectively, to encode in parallel.

According to this encoding processing information, the encoding device 10 transmits to the decoding device 20, as tile group information, the tile group level information and the parallel layer information that allow the decoding device 20 to variably allocate parallel processes.

Thus, the decoding apparatus 20 basically reproduces the parallel processing performed by the encoding apparatus 10, and performs more specific, additional parallel processing or partial decoding processing according to its own performance, environment variables, and the like.

Examples of the encoding processing information are the picture projection format (Projection format) information of a 360-degree video, the viewpoint (Viewport) information of the video, and so on, and the encoding apparatus 10 maps the picture region information for which a video producer or service provider intends parallel decoding or partial decoding to the tile group information, according to the importance of a specific tile group's picture, region of interest (ROI) information, and the like.

For example, when the target video to be encoded is an omnidirectional video such as a 360-degree video, a specific view of the input video is mapped to a tile group within the picture, and parallel decoding or partial decoding is performed in the decoding device 20.

The corresponding parallel layer information indicates the minimum or maximum parallel layer step with which the tiles within the tile group can be allocated to additional parallel processing processes.

Thus, using the parallel layer information, the decoding apparatus 20 determines whether to allocate multiple processes/threads corresponding to the number of sub-tiles within one tile group, and, using the tile group level information, individually determines whether independent or dependent decoding is possible for any tile group or for the tiles within it, thereby performing partial decoding at the user viewpoint.

For example, the decoding device 20 determines stepwise, according to the parallel layer information, whether to additionally run parallel processes partitioned in layer units for the tiles within a tile group.

For example, when the parallel layer unit information is 0, the decoding device 20 cannot run additional parallel processes for the tiles within the tile group. In this case, a tile group level containing only dependent tiles can be allocated within the tile group.

Conversely, when the parallel layer value is 1, the decoding apparatus 20 runs one additional parallel process for a particular tile within the tile group. In this case, a tile group level consisting of only independent tiles, or one containing both independent and dependent tiles, is allocated to the tiles constituting the tile group.

In addition, the decoding device 20 classifies the main or partial image according to the tile group level, or determines the presence of independent and dependent tiles among the tiles within a tile group.

For example, when the tile group level is set to 0, all tiles of the tile group are indicated as independent tiles, and the decoding device 20 classifies the group as a main-image tile group.

When the tile group level is set to 1, the first tile of the tile group is indicated as an independent tile and the remaining tiles as dependent tiles, and the decoding device 20 classifies and processes the group as a non-main-image tile group.

When the tile group level is set to 2, all tiles in the tile group are designated as dependent tiles, and the decoding device 20 performs decoding with reference to the decoding information of already-decoded neighboring tile groups or neighboring tiles.
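
These three level values and the parallel layer gate can be expressed compactly; the following sketch is a simplification (for parallel layer values above 0 it simply enables per-tile threads) with assumed names:

    def tile_roles(tile_group_level, num_tiles):
        if tile_group_level == 0:
            return ["independent"] * num_tiles                        # main-image group
        if tile_group_level == 1:
            return ["independent"] + ["dependent"] * (num_tiles - 1)  # non-main image
        return ["dependent"] * num_tiles                              # level 2

    def thread_budget(parallel_layer, num_tiles):
        # layer 0: no additional parallelism inside the group; otherwise per-tile
        return 1 if parallel_layer == 0 else num_tiles

    print(tile_roles(1, 4), thread_budget(1, 4))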

Thus, the decoding apparatus 20 allocates parallel threads (Threads) to the tiles constituting a tile group in parallel layer units, and performs high-speed parallel decoding in frame units.

The decoding apparatus 20 checks the tile group level, determines the tile group corresponding to the main video or main view according to the system environment, the network situation, changes in the user's viewpoint, and so on, and when the decoder determines a region that can be partially decoded, independently performs intra-picture partial decoding corresponding to that specific view.

For example, referring to fig. 28, the decoding apparatus 20 may assign two parallel decoding processes (cores) corresponding to N205 to N201 and N202, eight processes to N221 to N224 and N226 to N229, two processes to N211 and N212, and one process to N231 in N235, thereby assigning at most 13 processes and performing parallel decoding. Alternatively, according to its performance and environment variables, the decoding apparatus 20 may perform parallel decoding using a total of four processes by allocating one process to each of the four tile groups N205, N215, N225, and N235, or may perform single-process (Single Core) decoding of the entire picture using one process.
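
The process-count arithmetic of this example is easy to verify (tile counts per group taken from fig. 28):

    tiles_per_group = {"N205": 2, "N215": 2, "N225": 8, "N235": 1}
    max_processes = sum(tiles_per_group.values())  # 13: one process per tile
    mid_processes = len(tiles_per_group)           # 4: one process per tile group
    min_processes = 1                              # single-core decoding of the picture
    print(max_processes, mid_processes, min_processes)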

For example, when the tile group level of the specific tile group N205 is set high, the decoding device 20 classifies the corresponding portion as a main video; when N215 is set low, it classifies that portion as a non-main video, identifies N211 among the tiles of N215 as an independent tile, and identifies N212 as a dependent tile.

As described above, the decoding device 20 performs local high-speed parallel decoding by linking tile groups with an omnidirectional image such as a 360-degree image, in particular by classifying only some of the tile groups in a picture as the main image according to the intent of the content creator or the behavior of the user (visual response, motion response).

For example, assuming that N225 is the front video, N205 is the left-right video (N201 (right), N202 (left)), and N215 and N235 are the diagonal or rear videos, the decoding device 20 may selectively perform partial decoding of only the front video N225 and the left-right video N205 according to a change in the user's viewpoint or the like, may perform parallel decoding over the four tile groups N205, N215, N225, and N235, or may perform single-process decoding or four-process parallel decoding using one or four cores at the decoder.

Thus, efficient decoding is performed by changing the allocation of one or more parallel processing processes based on the parallel layer information of the tile group and the tile group level information.

Fig. 29 is a diagram for explaining a case of mapping between tile group information and user viewpoint information according to an embodiment of the present invention.

Referring to fig. 29, a picture is composed of five tile groups N301, N302, N303, N304, and N305, which correspond to viewports (VIEW PORT), i.e., the viewpoint information of the video.

For example, a tile group corresponds to one or more viewports. As shown in fig. 29, N301 is mapped to the Center viewport, N302 to the Left viewport, N303 to the Right viewport, N304 to the Top-Right (Right Top) and Bottom-Right (Right Bottom) viewport images, and N305 to the Top-Left (Left Top) and Bottom-Left (Left Bottom) viewport images.

Multiple viewports can also be mapped onto one tile group, in which case each viewport can be mapped onto a separate tile in the tile group. For example, N304 and N305 are each tile groups consisting of two viewport tiles (N304 consists of N306 and N307, and N305 consists of N308 and N309).

Thus, tiles N306 and N307 of tile group N304 are mapped to viewports of two mutually different viewpoints (upper right and lower right).

The decoding device 20 uses the tile group information to process partial expansion and transformation of the image. The tile group header includes mapping information corresponding to the viewport or view information of the video, as well as scale information and rotation conversion (90-degree/180-degree/270-degree) information indicating whether partial decoding of the user view is performed and whether pixel expansion of the processed video is applied.

Using the rotation conversion and scale information, the decoding device 20 performs pixel adjustment and image rotation (Rotation) of the video in the tile group, and decodes and outputs the resulting image.
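
A hedged sketch of the viewport-to-tile-group lookup implied by fig. 29 follows; the mapping container and the tile-level assignment order are illustrative assumptions:

    VIEWPORT_MAP = {                   # viewport -> (tile group, tile or None)
        "center":       ("N301", None),
        "left":         ("N302", None),
        "right":        ("N303", None),
        "right_top":    ("N304", "N306"),
        "right_bottom": ("N304", "N307"),
        "left_top":     ("N305", "N308"),
        "left_bottom":  ("N305", "N309"),
    }

    def decode_targets(active_viewports):
        """Tile groups/tiles to partially decode for the user's current viewports."""
        return {VIEWPORT_MAP[vp] for vp in active_viewports}

    print(decode_targets({"center", "right_top"}))  # {('N301', None), ('N304', 'N306')}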

Fig. 30 is a diagram illustrating a syntax of tile group header information of an embodiment of the present invention.

Referring to fig. 30, the tile group header information includes at least one of tile group address information, number-of-tiles-in-tile-group information, parallel layer information, tile group level information, tile group type information, tile group QP delta information, tile group QP offset information, tile group SAO information, and tile group ALF information.

First, the tile group address (Tile_group_address) information indicates the address of the first tile in a tile group when an arbitrary picture is composed of a plurality of tile groups. The boundary of the tile group, or of the first tile within it, is derived from the position (Address) of the upper-left first CTU and the position of the lower-right CTU.

The single tile flag (Single_tile_per_tile_group_flag) is flag information for confirming the tile configuration within a tile group: when its value is 0 or False, the tile group in the picture is composed of a plurality of tiles; conversely, when its value is 1 or True, the corresponding tile group consists of a single tile.

The parallel layer information is indicated as tile group scalability (Tile_group_scalability) information, which specifies the minimum or maximum parallel process unit that can be allocated to the tiles within a tile group. The number of threads allocated to the tiles within a tile group is adjusted by this value.

The tile group level information (Tile_group_level) indicates the presence of independent and dependent tiles within the tile group: it shows whether the tiles within the group are all independent tiles, a mix of independent and non-independent (dependent) tiles, or all non-independent (dependent) tiles.

The tile group type information (Tile_group_type) classifies the tile group characteristic as I, B, or P type, and determines the coding methods and restrictions, such as the prediction method and prediction mode, used to code the corresponding tile group according to each type.

Further, the coding information includes, for example, tile group QP delta information, tile group QP offset information, tile group SAO information, and tile group ALF information, and the on/off information of the various coding tools within the tile group is separately specified as flag information in the tile group header. The decoding device 20 derives coding tool (Tool) information for all tiles within the tile group currently being decoded, or derives coding information for a portion of the tiles within the group through a separate algorithmic process.
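
For illustration only, a parsing sketch following the element order listed for fig. 30; the bit-level descriptors (ue/se/u(1)), the conditional single-tile branch placement, and the toy reader are assumptions, not the actual syntax definition:

    class BitReader:
        """Toy stand-in for a bitstream reader; real entropy coding is omitted."""
        def __init__(self, values): self._vals = iter(values)
        def ue(self): return next(self._vals)    # unsigned Exp-Golomb (stubbed)
        def se(self): return next(self._vals)    # signed Exp-Golomb (stubbed)
        def u(self, n): return next(self._vals)  # n-bit fixed-length (stubbed)

    def parse_tile_group_header(r):
        h = {"tile_group_address": r.ue(),
             "single_tile_per_tile_group_flag": r.u(1)}
        if not h["single_tile_per_tile_group_flag"]:
            h["num_tiles_in_tile_group"] = r.ue()
            h["tile_group_scalability"] = r.ue()  # parallel layer information
            h["tile_group_level"] = r.ue()
        h["tile_group_type"] = r.ue()             # I / B / P
        h["tile_group_qp_delta"] = r.se()
        h["tile_group_qp_offset"] = r.se()
        h["tile_group_sao_flag"] = r.u(1)
        h["tile_group_alf_flag"] = r.u(1)
        return h

    print(parse_tile_group_header(BitReader([3, 0, 4, 1, 1, 1, -2, 0, 1, 0])))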

The method of the present invention may be implemented as a program executed on a computer and stored in a computer-readable recording medium such as a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device, or may be implemented in the form of a carrier wave (for example, transmission via a network).

The computer-readable recording medium may be distributed over computer systems connected via a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the method can be easily inferred by programmers skilled in the art to which the present invention pertains.

While the preferred embodiments of the present invention have been shown and described, the present invention is not limited to the specific embodiments described above, and various modifications can be made by those skilled in the art, within the scope of the claims, without departing from the gist of the present invention.
