Apparatus and method for video encoding or decoding

Document No.: 1643429 Publication date: 2019-12-20

Reading note: This technique, "Apparatus and method for video encoding or decoding", was created by 林晶娟, 孙世勋, 李善英, and 申在燮 on 2018-03-02. Abstract: The present disclosure relates to video encoding or decoding that divides a picture into multiple tiles to encode video efficiently. In one aspect of the present disclosure, a video encoding method for encoding a picture partitioned into a plurality of tiles includes the steps of: encoding first information indicating whether to merge some of the plurality of tiles; when the first information is encoded to indicate tile merging, generating one or more merged tiles by merging some of the plurality of tiles, each merged tile being defined as one tile; encoding second information indicating, among the plurality of tiles, the tiles merged into each of the merged tiles; and encoding each of the merged tiles as one tile without limitation on encoding dependencies between the tiles merged into each merged tile.

1. A video encoding method for encoding a picture partitioned into a plurality of tiles, the method comprising the steps of:

encoding first information indicating whether to merge some of the plurality of tiles;

when the first information is encoded to indicate tile merging, generating one or more merged tiles by merging some of the plurality of tiles, each of the merged tiles being defined as one tile;

encoding second information indicating tiles, among the plurality of tiles, merged into each of the merged tiles; and

encoding each of the merged tiles into one tile without limitation on encoding dependencies between tiles merged into each of the merged tiles.

2. The method of claim 1, wherein the encoding dependencies comprise intra prediction dependencies between tiles merged into each of the merged tiles.

3. The method of claim 1, wherein, for each of the merged tiles, the second information comprises identification information of a starting tile and an ending tile among the tiles merged into each of the merged tiles.

4. The method of claim 1, wherein, for each of the merged tiles, the second information comprises location information about a starting tile and an ending tile among tiles merged into each of the merged tiles.

5. The method of claim 1, wherein, for each tile of the plurality of tiles, the second information comprises information indicating whether the tile is merged into one of the merged tiles.

6. The method of claim 5, wherein the second information further comprises, for each tile of the plurality of tiles that is merged, an index of the merged tile to which the tile belongs.

7. The method of claim 1, further comprising the step of:

encoding third information indicating a number of the one or more generated merged tiles.

8. A video decoding method for decoding a picture partitioned into a plurality of tiles, the method comprising the steps of:

decoding, from a bitstream, first information indicating whether to merge some of the plurality of tiles;

when the decoded first information indicates tile merging, decoding, from the bitstream, second information indicating the tiles to be merged among the plurality of tiles;

generating one or more merged tiles by merging tiles indicated by the second information, each of the merged tiles being defined as one tile; and

decoding each of the merged tiles into one tile without limitation on decoding dependencies between tiles merged into each of the merged tiles.

9. The method of claim 8, wherein the decoding dependencies comprise intra prediction dependencies between tiles merged into each of the merged tiles.

10. The method of claim 8, wherein, for each of the merged tiles, the second information comprises identification information of a starting tile and an ending tile among the tiles merged into each of the merged tiles.

11. The method of claim 8, wherein, for each of the merged tiles, the second information includes location information about a starting tile and an ending tile among tiles merged into each of the merged tiles.

12. The method of claim 8, wherein the second information comprises information indicating whether each tile of the plurality of tiles is merged.

13. The method of claim 12, wherein the second information further comprises, for each tile of the plurality of tiles that is merged, an index of the merged tile to which the tile belongs.

14. The method of claim 8, further comprising the step of:

decoding, from the bitstream, third information indicating a number of the one or more generated merged tiles.

Technical Field

The present disclosure relates to video encoding or decoding for partitioning a picture into multiple tiles in order to efficiently encode video.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

In recent years, video size, resolution, and frame rate have been gradually increasing, and demand for video content such as games and 360-degree video (hereinafter "360° video") is growing beyond the existing 2D natural images produced by cameras.

A 360° video is an image captured in multiple directions using a plurality of cameras. To compress and transmit the captured scenes, the images output from the cameras are stitched into one 2D image. The stitched image is compressed and sent to a decoding apparatus, which decodes the compressed image and then maps the decoded image onto a 3D space for reproduction. A 360° video may be converted into one of several projection formats before being encoded. Examples of projection formats include Equirectangular Projection, Cube Map Projection, and Truncated Square Pyramid Projection.

In the case of 360° video, the image displayed on the screen must change as the user's view changes, so conventional compression technology based on 2D video is limited in how much it can improve encoding efficiency. To improve the encoding and decoding efficiency of 360° video, the video needs to be encoded and decoded by setting a region of interest (ROI) in the projected 360° video according to the user's viewing angle and coding the ROI at an image quality different from that of the other regions. This requires a structure (e.g., a tile structure) that can divide a picture into a plurality of regions and encode and decode each region independently. However, the picture partitioning structure of conventional compression techniques is not flexible enough for setting an ROI.

Disclosure of Invention

Technical problem

Accordingly, the present disclosure has been made in view of the above problems, and it is an object of the present disclosure to provide a method and apparatus for video encoding and decoding that solve the problem of quality discontinuities at tile boundaries, thereby improving encoding efficiency.

Technical scheme

According to an aspect of the present disclosure, there is provided a video encoding method for encoding a picture partitioned into a plurality of tiles, the method including the steps of: encoding first information indicating whether to merge some of the plurality of tiles; generating one or more merged tiles by merging some of the plurality of tiles when the first information is encoded to indicate tile merging, each of the merged tiles being defined as one tile; encoding second information indicating tiles, among the plurality of tiles, merged into each merged tile; and encoding each of the merged tiles into one tile without limiting encoding dependencies between tiles merged into each merged tile.

According to another aspect of the present disclosure, there is provided a video decoding method for decoding a picture partitioned into a plurality of tiles, the method including the steps of: decoding, from a bitstream, first information indicating whether to merge some of the plurality of tiles; when the decoded first information indicates tile merging, decoding, from the bitstream, second information indicating the tiles to be merged among the plurality of tiles; generating one or more merged tiles by merging the tiles indicated by the second information, each of the merged tiles being defined as one tile; and decoding each of the merged tiles as one tile without limiting decoding dependencies between the tiles merged into each merged tile.

Drawings

Fig. 1 is an exemplary diagram of a picture that is divided into a plurality of tiles and encoded.

Fig. 2 is another example diagram of a picture that is divided into a plurality of tiles and encoded.

Fig. 3 is a block diagram of a video encoding apparatus according to an embodiment of the present disclosure.

Fig. 4 is a flowchart illustrating an operation of a video encoding apparatus according to an embodiment of the present disclosure.

Fig. 5 is an exemplary diagram of marking identification information of tiles on a picture including a plurality of tiles, some of which are merged.

Fig. 6 is an exemplary diagram of marking location information of tiles on a picture including a plurality of tiles, some of which are merged.

Fig. 7 is an exemplary diagram of marking information on whether each tile is merged or not on a picture including a plurality of tiles, some of which are merged.

Fig. 8 is an exemplary diagram of marking identification information of tiles on a picture including a plurality of merged tiles.

Fig. 9 is an exemplary diagram of marking location information of tiles on a picture including a plurality of merged tiles.

Fig. 10 is an example diagram of marking information on whether each tile is merged and an index of the merged tile on a picture including a plurality of merged tiles.

Fig. 11 is another exemplary diagram of marking information on whether each tile is merged and an index of the merged tile on a picture including a plurality of merged tiles.

Figs. 12a, 12b, and 12c illustrate exemplary merged-tile scenarios for projection formats of 360° video.

Fig. 13 is a block diagram of a video decoding apparatus according to an embodiment of the present disclosure.

Fig. 14 is a flowchart illustrating an operation of a video decoding apparatus according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Note that, where reference numerals are added to the constituent elements of the drawings, the same reference numerals denote the same elements even when the elements appear in different drawings. Further, in the following description, detailed descriptions of known functions and configurations incorporated herein are omitted where they would obscure the subject matter of the present disclosure.

Fig. 1 is an exemplary diagram of a picture that is divided into a plurality of tiles and encoded.

Fig. 2 is another example diagram of a picture that is divided into a plurality of tiles and encoded.

In the High Efficiency Video Coding (HEVC) standard, a picture may be partitioned into multiple tiles, which are rectangular regions. A picture may be partitioned into one or more tile columns, one or more tile rows, or both. The picture may be divided evenly into tiles of the same size, or into tiles of different sizes by specifying the widths of the tile columns and the heights of the tile rows. In either case, every tile row must contain the same number of tiles, and likewise every tile column.
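As a concrete illustration, the mapping from a CTU position to the tile column and row containing it can be sketched as follows. This is a minimal Python sketch; the function name and the example column widths and row heights are illustrative, not taken from the standard:

```python
def ctu_to_tile(ctu_col, ctu_row, col_widths, row_heights):
    """Map a CTU coordinate to the (tile_col, tile_row) pair containing it.

    col_widths / row_heights are the tile column widths and tile row
    heights in CTUs (in HEVC these come from the PPS tile syntax)."""
    tile_col = 0
    x = col_widths[0]
    while ctu_col >= x:          # walk right until the CTU column fits
        tile_col += 1
        x += col_widths[tile_col]
    tile_row = 0
    y = row_heights[0]
    while ctu_row >= y:          # walk down until the CTU row fits
        tile_row += 1
        y += row_heights[tile_row]
    return tile_col, tile_row

# A picture 10 CTUs wide and 6 CTUs high, split into 3 tile columns
# (widths 4, 3, 3) and 2 tile rows (heights 3, 3).
print(ctu_to_tile(0, 0, [4, 3, 3], [3, 3]))  # -> (0, 0)
print(ctu_to_tile(7, 4, [4, 3, 3], [3, 3]))  # -> (2, 1)
```

Because the grid is a full Cartesian product of columns and rows, the column and row lookups are independent, which is exactly the regularity that the merged-tile structure of this disclosure relaxes.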

When a picture is partitioned into multiple tiles, each tile may be encoded and decoded independently of the other tiles. Here, "independently" means that all encoding and decoding processes of each tile, including intra prediction, inter prediction, transform, quantization, entropy coding, and filtering, can be performed independently of the encoding and decoding processes of the other tiles. This does not mean, however, that all encoding and decoding processes are performed completely independently for every tile: in inter prediction or loop filtering, a tile may selectively be encoded and decoded using information about other tiles.

An example of the high-level tile syntax is shown in Table 1.

[ Table 1]

pic_parameter_set_rbsp( ) {
  tiles_enabled_flag
  if( tiles_enabled_flag ) {
    num_tile_columns_minus1
    num_tile_rows_minus1
    uniform_spacing_flag
    if( !uniform_spacing_flag ) {
      for( i = 0; i < num_tile_columns_minus1; i++ )
        column_width_minus1[ i ]
      for( i = 0; i < num_tile_rows_minus1; i++ )
        row_height_minus1[ i ]
    }
    loop_filter_across_tiles_enabled_flag
  }
}

Table 1 shows tiles_enabled_flag, a flag in the Picture Parameter Set (PPS) indicating whether the tile function is on or off, together with several syntax elements that specify the tile sizes when the flag is on. num_tile_columns_minus1 carries the number of tile columns in the picture minus 1, num_tile_rows_minus1 carries the number of tile rows minus 1, and uniform_spacing_flag indicates whether the picture is divided into tiles uniformly along the horizontal and vertical axes. When the picture is not divided uniformly (uniform_spacing_flag off), the width of each tile column (column_width_minus1) and the height of each tile row (row_height_minus1) are additionally transmitted. Finally, a flag (loop_filter_across_tiles_enabled_flag) indicating whether to apply the loop filter across tile boundaries is transmitted. Fig. 1 shows an example of a picture divided into tiles of the same size when uniform_spacing_flag is on, and Fig. 2 shows an example of a picture divided into tiles of different sizes when uniform_spacing_flag is off.
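When uniform_spacing_flag is on, HEVC derives the tile column widths (and, analogously, the row heights) so that the tile boundaries land on evenly spaced fractions of the picture size in CTUs; sizes therefore differ by at most one CTU. A sketch of that derivation (function name is illustrative):

```python
def uniform_tile_sizes(pic_size_in_ctus, num_tiles):
    """Uniform tile column widths (or row heights) in CTUs.

    Follows the HEVC-style derivation for uniform_spacing_flag == 1:
    the i-th boundary is placed at floor(i * pic_size / num_tiles)."""
    return [(i + 1) * pic_size_in_ctus // num_tiles
            - i * pic_size_in_ctus // num_tiles
            for i in range(num_tiles)]

# A picture 10 CTUs wide split into 3 tile columns.
print(uniform_tile_sizes(10, 3))  # -> [3, 3, 4]
```

The sizes always sum to the picture size, so no width or height needs to be signalled in this mode; only the non-uniform case transmits column_width_minus1 and row_height_minus1.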

When an ROI is set, such a conventional tile structure is limited in its ability to process the ROI and the region outside the ROI according to the characteristics of each region. The ROI should be encoded at higher image quality than the other regions. However, when the ROI is split across multiple tiles, image quality may be degraded by discontinuities at the tile boundaries inside it. The ROI is therefore preferably set as a single tile rather than divided into multiple tiles. The region outside the ROI is preferably set as many small tiles so that it can be processed flexibly as the viewing angle changes, even at some cost in image quality.

In the present disclosure, an irregular pattern of tile structures is defined. Unlike a conventional tile structure, in which each row of a picture must have the same number of tiles and each column must have the same number of tiles, tiles can thus be configured flexibly. Accordingly, the method and apparatus for video encoding and decoding proposed in the present disclosure may eliminate quality discontinuities at tile boundaries within the ROI and improve encoding efficiency.

As used herein, the term "tile" denotes a region partitioned from a picture. Tiles may be used as a tool for encoding and decoding each region independently (although tiles may selectively have dependencies in inter prediction or loop filtering). In encoding processes other than inter prediction and loop filtering, such as intra prediction, dependencies between a tile and other tiles are restricted. In the present disclosure, the term "tile" may be replaced with other terms (e.g., region, area) having the same meaning.

Fig. 3 is a block diagram of a video encoding apparatus according to an embodiment of the present disclosure.

The encoding apparatus includes a block divider 310, a predictor 320, a subtractor 330, a transformer 340, a quantizer 345, an encoder 350, an inverse quantizer 360, an inverse transformer 365, an adder 370, a filter unit 380, and a memory 390. Each element of the encoding apparatus may be implemented as a hardware chip or as software, with a microprocessor executing the software functions corresponding to the respective elements.

Block divider 310 divides each picture constituting a video into a plurality of tiles. The block divider 310 then divides each tile into a plurality of Coding Tree Units (CTUs), which are then recursively divided using a tree structure. In the tree structure, a leaf node is a Coding Unit (CU) which is a basic unit of coding. A Quadtree (QT) structure that divides a node into four child nodes, or a quadtree plus binary tree (QTBT) structure that combines the QT structure and a Binary Tree (BT) structure that divides a node into two child nodes, may be used as the tree structure.
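The recursive QT partitioning described above can be sketched as follows. The should_split predicate is a hypothetical stand-in for the encoder's rate-distortion decision, and the function name is illustrative:

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block QT-style into four child nodes.

    Returns the leaf nodes as (x, y, size) tuples; in the tree structure
    each leaf corresponds to a coding unit (CU)."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf node: a CU
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_split(x + dx, y + dy, half,
                                     min_size, should_split)
    return leaves

# Split a 64x64 CTU once; only the top-left 32x32 child splits again.
leaves = quadtree_split(
    0, 0, 64, 8,
    lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
print(len(leaves))  # -> 7 CUs (four 16x16 plus three 32x32)
```

A QTBT structure would extend the same recursion with binary splits at the QT leaves; the sketch above covers only the QT part.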

In the present disclosure, the block divider 310 defines an irregular or flexible tile structure by partitioning a picture into multiple tiles and merging some of the tiles to generate one or more merged tiles. Each merged tile is defined as one tile. Details will be described later with reference to the other drawings.

The predictor 320 generates a prediction block by predicting the current block. The predictor 320 includes an intra predictor 322 and an inter predictor 324. Here, the current block, which is an encoded basic unit corresponding to a leaf node in the tree structure, refers to a CU to be currently encoded. Alternatively, the current block may be one of a plurality of sub-blocks into which the CU is divided.

The intra predictor 322 predicts pixels in the current block using pixels (reference samples) located around the current block in the current picture including the current block. There are a plurality of intra prediction modes according to the prediction direction, and adjacent pixels and calculation formulas to be used are defined differently according to each prediction mode.
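As a simple instance of an intra prediction mode, DC prediction fills the block with the average of the neighbouring reference samples. Below is a minimal sketch, simplified relative to HEVC (which, for small luma blocks, additionally filters the predicted block edges); names are illustrative:

```python
def dc_predict(top, left):
    """DC intra prediction: fill an NxN block with the rounded mean of
    the reconstructed neighbouring samples above and to the left."""
    refs = list(top) + list(left)
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # rounded average
    n = len(top)
    return [[dc] * n for _ in range(n)]

# 4x4 block with reference samples from the row above and column left.
pred = dc_predict(top=[100, 102, 104, 106], left=[98, 98, 100, 100])
print(pred[0][0])  # -> 101
```

Directional (angular) modes differ only in how the reference samples are projected into the block; the set of available reference samples is the same, which is why intra prediction dependencies across tile boundaries matter for the merged-tile scheme.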

The inter predictor 324 searches a previously encoded and decoded reference picture for the block most similar to the current block, and generates a prediction block for the current block using the found block. The inter predictor then generates a motion vector corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture. Motion information, including information on the reference picture used to predict the current block and information on the motion vector, is encoded by the encoder 350 and transmitted to the video decoding apparatus.
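The search described above can be illustrated with a brute-force integer-pel motion search over a small window. This is a didactic sketch only; practical encoders use faster search patterns and sub-pel refinement, and the function names are illustrative:

```python
def sad(block, ref, bx, by):
    """Sum of absolute differences between block and the co-sized region
    of ref whose top-left corner is (bx, by)."""
    return sum(abs(block[j][i] - ref[by + j][bx + i])
               for j in range(len(block)) for i in range(len(block[0])))

def full_search(block, ref, cx, cy, radius):
    """Exhaustive search around (cx, cy); returns the motion vector
    (dx, dy) with the lowest SAD."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x <= len(ref[0]) - w and 0 <= y <= len(ref) - h:
                cost = sad(block, ref, x, y)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1]

# Synthetic 16x16 reference picture; the 4x4 block truly sits at (5, 3).
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
block = [row[5:9] for row in ref[3:7]]
print(full_search(block, ref, 4, 4, 2))  # -> (1, -1), a perfect match
```

The returned (dx, dy) pair plays the role of the motion vector that, together with the reference picture index, is entropy-coded by the encoder 350.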

The subtractor 330 subtracts the prediction block generated by the intra predictor 322 or the inter predictor 324 from the current block to generate a residual block.

The transformer 340 transforms a residual signal in a residual block having a pixel value in a spatial domain into a transform coefficient in a frequency domain. The transformer 340 may transform the residual signal in the residual block by using the size of the current block as a transform unit, or may divide the residual block into a plurality of smaller sub-blocks and transform the residual signal in a transform unit corresponding to the size of the sub-blocks. Various methods are possible to divide the residual block into smaller sub-blocks. For example, the residual block may be partitioned into sub-blocks of the same predefined size, or may be partitioned in a Quadtree (QT) manner with the residual block as a root node.

The quantizer 345 quantizes the transform coefficient output from the transformer 340, and outputs the quantized transform coefficient to the encoder 350.

The encoder 350 encodes the quantized transform coefficients using a coding scheme such as CABAC to generate a bitstream. The encoder 350 encodes merge information for defining an irregular or flexible tile structure by merging some of the tiles into which a picture is divided, thereby allowing a video decoding apparatus to define the same tile structure as the video encoding apparatus. The merge information includes first information indicating whether to merge some of the plurality of tiles and second information indicating, among the plurality of tiles, the tiles to be merged into each merged tile. Third information indicating the number of merged tiles may further be included in the merge information. Syntax elements related to the merge information may be configured at predetermined positions in one or more of a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), Supplemental Enhancement Information (SEI), and a slice header.

The encoder 350 encodes information on the size of the CTU located at the uppermost layer of the tree structure and partition information for block partitioning from the CTU, so that the video decoding apparatus can partition blocks in the same manner as the video encoding apparatus. For example, in the case of QT partitioning, QT partition information indicating whether a block of an upper layer is partitioned into four blocks of a lower layer is encoded. In the case of BT partitioning, the encoder 350 encodes, starting from a block corresponding to a leaf node of the QT, BT partition information indicating whether each block is divided into two blocks and indicating the partition type.

The encoder 350 encodes information regarding a prediction type indicating whether the current block is encoded by intra prediction or inter prediction, and encodes the intra prediction information or the inter prediction information according to the prediction type.

The inverse quantizer 360 inversely quantizes the quantized transform coefficient output from the quantizer 345 to generate a transform coefficient. The inverse transformer 365 transforms the transform coefficients output from the inverse quantizer 360 from the frequency domain to the spatial domain to reconstruct the residual block.

The adder 370 adds the reconstructed residual block to the prediction block generated by the predictor 320 to reconstruct the current block. When intra prediction of the next block is performed in order, pixels in the reconstructed current block are used as reference samples.

The filter unit 380 applies deblocking filtering to the boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block encoding/decoding, and stores the result in the memory 390. Once all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in subsequent pictures to be encoded.

Hereinafter, a video encoding method for defining an irregular or flexible tile structure by merging some tiles of a plurality of tiles to generate one or more merged tiles will be described in detail.

Fig. 4 is a flowchart illustrating an operation of a video encoding apparatus for encoding a picture divided into a plurality of tiles.

The video encoding apparatus encodes first information indicating whether to merge some of the plurality of tiles (S410). For example, a flag merge_tile_enabled_flag indicating whether tiles are merged may be used as the first information. When some tiles are merged, merge_tile_enabled_flag may be encoded as On; when there is no tile to merge, merge_tile_enabled_flag may be encoded as Off.

When the first information indicating whether to merge tiles is encoded to indicate the merging of tiles, the video encoding apparatus generates one or more merged tiles by merging some tiles of the plurality of tiles (S420). Each generated merged tile is defined as a tile. In other words, the tiles to be merged into each merged tile are not simply grouped while maintaining their characteristics prior to merging, but are merged into a single tile. For example, merging may be performed in a manner that eliminates restrictions on coding dependencies between tiles merged into each merged tile.

After generating the one or more merged tiles, the video encoding apparatus encodes second information indicating tiles, among the plurality of tiles, merged into each merged tile (S430). For each merged tile, the second information may include: i) identification information of a start tile and an end tile among tiles merged into each merged tile, ii) position information on the start tile and the end tile among tiles merged into each merged tile, or iii) information indicating whether each of the plurality of tiles is merged. The second information may include: iv) index information about each of the one or more merged tiles into which the tile is merged. Specific examples of the second information will be described later with reference to other drawings.
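One way to picture the merge information as a whole (first, second, and third information together) is the following sketch, which serialises the "start/end identification" variant i) into a flat symbol list. The function name and the flat-list container are illustrative only; the actual bitstream would entropy-code these values at the syntax positions described above:

```python
def encode_merge_info(num_tiles, merged_tiles):
    """Serialise tile-merge information as a flat symbol list.

    merged_tiles is a list of (start_tile, end_tile) index pairs, i.e.
    the identification-information variant of the second information."""
    syms = [1 if merged_tiles else 0]       # first information (merge flag)
    if merged_tiles:
        syms.append(len(merged_tiles))      # third information (count)
        for start, end in merged_tiles:     # second information (per merge)
            syms += [start, end]
    return syms

# A picture with 12 tiles; tiles 0-2 form one merged tile (e.g. the ROI)
# and tiles 4-5 form another.
print(encode_merge_info(12, [(0, 2), (4, 5)]))  # -> [1, 2, 0, 2, 4, 5]
print(encode_merge_info(12, []))                # -> [0]
```

Variants ii) to iv) would change only the per-merged-tile payload (positions, per-tile merge flags, or merged-tile indices) while the surrounding first and third information stay the same.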

The video encoding device may additionally encode third information indicating a number of the one or more merged tiles generated.

The video encoding device encodes each of the merged tiles as one tile without limitation on encoding dependencies between tiles merged into each of the merged tiles (S440). Here, the encoding dependencies may include intra prediction dependencies between tiles merged into each merged tile. That is, the limitation of intra prediction dependencies between tiles merged into the same merged tile is eliminated.

Hereinafter, exemplary syntax elements for merging tiles will be described with reference to fig. 5 to 12.
