Picture tile attributes signaled using loops over tiles

Document No.: 639641 Publication date: 2021-05-11

Abstract: This technology, "Picture tile attributes signaled using loops over tiles", was created by Rickard Sjöberg, Mitra Damghanian, and Martin Pettersson on 2019-10-01. Methods and apparatus are provided for signaling per-tile attributes using a compact syntax when encoding a picture comprising a plurality of tiles into a bitstream. These embodiments use a loop over the tiles to signal the attribute values for each tile. The tile attributes may be, for example, in the form of a set of tile syntax elements (one syntax element per tile attribute), or, for example, in the form of a set of flags that enable or disable the use of tile attributes. These embodiments give the encoder the freedom to assign tile attribute values to each tile or to any subset of tiles in a picture, and to signal the attribute values through a compact syntax using one or more loops over the tiles.

1. A method (100) of encoding a picture, the method (100) comprising:

dividing (102) a picture into a plurality of tiles;

associating (104) one or more tile attributes with one or more tiles, each tile attribute comprising a syntax element or flag indicating applicability of the tile attribute;

assigning (106) one or more tile attributes to a tile or subset of tiles; and

signaling (110) the picture and the tile attributes in a bitstream, wherein the tile attributes are signaled using one or more loops (108) performed over the tiles.

2. The method (100) of claim 1, wherein the tile attributes include one or more of delta_QP, deblocking strength, and tile boundary handling flags.

3. The method (100) according to any one of claims 1-2, wherein:

associating (104) one or more tile attributes with one or more tiles comprises: defining one or more tile attribute sets, each tile attribute set having a unique set identifier; and

signaling (110) tile attributes using one or more loops (108) performed over the tiles comprises: signaling, for each tile, the set identifier of the tile attribute set to be applied to that tile.

4. The method (100) of any of claims 1-2, wherein tile attribute values are signaled in a picture parameter set and the loop over tiles is signaled in a tile or a slice header.

5. The method (100) of any of claims 1-2, wherein the tile attributes are signaled directly in a slice header.

6. The method (100) of any of claims 1-2, wherein signaling (110) tile attributes using one or more loops (108) performed over the tiles comprises at least one of:

signaling an initial set of tile attributes and a set of flags for each tile, the flags indicating whether the tile attributes are enabled or disabled for that tile; and

signaling an initial set of tile attributes followed by a set of copy flags for each tile, wherein each flag specifies, for that tile, whether a tile attribute value should be copied from the initial set of tile attributes or explicitly signaled.

7. The method (100) according to any one of claims 1 to 6, wherein delta_QP is signaled for each tile, and a final initial tile or block QP is calculated as the sum of the reference QP value from the sequence parameter set or picture parameter set, the tile delta_QP, and an optional delta_QP signaled for the block.

8. The method (100) of claim 1, wherein:

associating (106) one or more tile attributes with one or more tiles comprises:

if all the tiles have the same attribute values, creating one attribute set identified by a set_id;

if at least two tiles in a picture have different tile attribute values, creating at least two sets of tile attributes and assigning a unique set_id to each of the at least two sets of tile attributes; and

if the number of attributes in the at least two attribute sets is different, signaling tile attributes using one or more loops performed over the tiles comprises: signaling, in the bitstream, the number of tile attribute sets, the number of tile attributes, the value of each tile attribute in each tile attribute set, and the tile_attribute_set_id of each tile.

9. The method (100) of claim 1, wherein

signaling (110) tile attributes using one or more loops (108) performed over the tiles comprises: placing the loop over the tiles in the picture header or the slice header.

10. The method (100) of claim 9, wherein the loop over the tiles is placed in a tile header, and wherein a tile header of a subsequent tile is optional.

11. The method (100) of claim 9, wherein signaling (110) tile attributes using one or more loops (108) performed over the tiles comprises: signaling the attributes directly in the slice header, rather than in a parameter set, by:

signaling one or more codewords specifying how many tiles are in a slice and the spatial location of the tiles; and

signaling at least one tile attribute value for each tile of the slice.

12. The method (100) of claim 1, wherein signaling (110) tile attributes using one or more loops (108) performed over the tiles comprises: signaling a set of tile attributes, followed by a set of flags for each tile that indicate usage of the tile attributes in the set for the associated tile.

13. The method (100) of claim 12, wherein the signaled set of flags for each tile specifies whether each initial tile attribute value is used or turned off for that tile.

14. The method (100) of claim 12, wherein signaling (110) tile attributes using one or more loops (108) performed over the tiles comprises at least one of:

signaling a first state and a second state for each tile attribute, and signaling a set of flags for each tile, one flag for each tile attribute, wherein each flag indicates use of the tile attribute in either the first state or the second state; and

signaling a set of tile attributes and, for each tile and each tile attribute in the set, signaling a flag indicating whether an initial value of the tile attribute should be used or overwritten and, if the flag indicates that the initial tile attribute value should be overwritten, signaling a codeword specifying the new value of the tile attribute.

15. The method (100) of claim 2, further comprising:

determining a reference QP value refQP from a sequence parameter set or a picture parameter set;

determining one or more delta_QP values for each tile; and

wherein signaling tile attributes using one or more loops performed over the tiles comprises: signaling at least the delta_QP value.

16. A picture encoder, comprising:

processing circuitry; and

a memory containing instructions executable by the processing circuitry, whereby the encoder is configured to:

divide (102) a picture into a plurality of tiles;

associate (104) one or more tile attributes with one or more tiles, each tile attribute comprising a syntax element or flag indicating applicability of the tile attribute;

assign (106) one or more tile attributes to a tile or subset of tiles; and

signal (110) the picture and the tile attributes in a bitstream, wherein the tile attributes are signaled using one or more loops (108) performed over the tiles.

17. The picture encoder according to claim 16, wherein the memory further contains instructions executable by the processing circuitry, whereby the encoder is configured to perform the method (100) according to any one of claims 1 to 15.

18. A method of decoding a picture, the method comprising:

receiving a bitstream comprising a picture divided into a plurality of tiles, and one or more tile attributes associated with one or more tiles, each tile attribute comprising a syntax element or flag indicating applicability of the tile attribute; and

applying the tile attributes to the tiles using one or more loops performed over the tiles.

19. The method of claim 18, wherein the tile attributes include one or more of delta_QP, deblocking strength, and tile boundary handling flags.

20. The method of any one of claims 18 to 19, wherein:

the tile attributes associated with the one or more tiles include one or more sets of tile attributes, each set of tile attributes having a unique set identifier; and

wherein receiving tile attributes associated with one or more tiles comprises: receiving, for each tile, the set identifier of the tile attribute set to be applied to that tile.

21. The method of any of claims 18 to 20, wherein the attribute values are received in a picture parameter set and the loop over tiles is received in a tile or a slice header.

22. The method of any of claims 18 to 19, wherein the tile attributes are received directly in a slice header.

23. The method of any of claims 18 to 19, wherein receiving tile attributes associated with one or more tiles comprises at least one of:

receiving an initial set of tile attributes and a set of flags for each tile, the flags indicating whether the tile attributes are enabled or disabled for that tile; and

receiving an initial set of tile attributes followed by a set of copy flags for each tile, wherein each flag specifies, for that tile, whether a tile attribute value should be copied from the initial set of tile attributes or is explicitly signaled.

24. The method of any one of claims 18 to 19, wherein delta_QP is received for each tile and the final initial tile or block QP is calculated as the sum of the reference QP value from the sequence parameter set or picture parameter set, the tile delta_QP, and an optional delta_QP received for the block.

25. The method of claim 20, wherein receiving tile attributes associated with one or more tiles comprises:

parsing the number of tile attribute sets;

for each tile attribute set, parsing the tile attributes in that set, storing a value (A), and assigning the value (A) to a set id;

for a tile (T) in a picture, parsing the tile_attribute_set_id codeword and identifying a tile attribute set using the value (B) of tile_attribute_set_id; and

decoding the tile (T) using the value (A) stored and assigned to the set id corresponding to the value (B).

26. The method of claim 25, wherein the number of attributes in at least two attribute sets is different, and wherein

receiving tile attributes using one or more loops performed over the tiles comprises: receiving, in the bitstream, the number of tile attribute sets, the number of tile attributes, the value of each tile attribute in each tile attribute set, and the tile_attribute_set_id of each tile.

27. The method of claim 20, wherein receiving tile attributes associated with one or more tiles comprises:

determining a number (N) of tile attribute sets from one or more codewords in the bitstream, where N > 1;

determining, for each tile attribute set, a number of tile attribute values (M) from codewords in the bitstream, where M > 0 and the total number of attribute values is equal to NxM;

assigning a unique set id value (V) to each tile attribute set;

decoding the slice header and determining the number of tiles in the slice from a codeword in the parameter set or a codeword in the slice header;

for each tile (T) in the slice, determining a tile set id (I) from a codeword in the slice header and storing the set id (I) value in a list L;

decoding a particular tile (P) of the slice and determining a tile number of the tile P;

using the tile number as an index into the list L to determine the tile set id (I) for the tile P;

using the tile set id (I) to select the tile attribute set whose assigned set id value (V) is equal to the tile set id (I); and

using the attribute values of the selected tile attribute set in the decoding of the tile P.

28. The method of claim 27, wherein the set id value (V) of the first set of attributes in decoding order is 0 and the set id value (V) of the second set of attributes in decoding order is 1.

29. The method of claim 20, wherein the loop over the tiles is placed in a tile header, and wherein a tile header of a subsequent tile is optional.

30. The method of claim 20, wherein receiving tile attributes associated with one or more tiles comprises:

decoding the slice header and determining a number (N) of tiles in the picture or portion of the picture from a codeword in the parameter set or a codeword in the slice header, where N > 1;

creating a list L of size N, wherein each entry in the list includes at least one tile attribute value decoded from the slice header;

decoding a particular tile (P) of the slice and determining a tile number of the tile P;

using the tile number as an index into the list L to select a tile attribute value for decoding the tile P; and

using the selected attribute values in the decoding process of the tile P.

31. The method of claim 30, further comprising:

receiving one or more codewords specifying how many tiles there are in a slice and the spatial location of the tiles; and

receiving at least one tile attribute value for each tile of the slice.

32. The method of claim 18, wherein receiving tile attributes using one or more loops performed over the tiles comprises: receiving a set of tile attributes, and then receiving a set of flags for each tile, the flags indicating usage of the tile attributes in the set for the associated tile.

33. The method of claim 32 wherein the received set of flags for each tile specifies whether each initial tile attribute value is used or turned off for that tile.

34. The method of claim 32, wherein receiving tile attributes using one or more loops performed over the tiles comprises at least one of:

receiving a first state and a second state for each tile attribute, and receiving a set of flags for each tile, one flag for each tile attribute, wherein each flag indicates use of the tile attribute in either the first state or the second state; and

receiving a set of tile attributes and, for each tile and each tile attribute in the set, receiving a flag indicating whether an initial value of the tile attribute should be used or overwritten and, if the flag indicates that the initial tile attribute value should be overwritten, receiving a codeword specifying a new value for the tile attribute.

35. The method of claim 18, further comprising:

receiving a reference QP value (refQP) in a sequence parameter set or a picture parameter set;

receiving one or more delta_QP values for each tile; and

calculating the QP value for a block in a tile as the sum of the reference QP and all delta_QP values for the tile.

36. A picture decoder, comprising:

processing circuitry; and

a memory containing instructions executable by the processing circuitry, whereby the decoder is configured to:

apply the tile attributes to the tiles using one or more loops performed over the tiles.

37. The picture decoder according to claim 36, wherein the memory further contains instructions executable by the processing circuitry whereby the decoder is configured to perform the method according to any one of claims 18 to 35.
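By way of non-limiting illustration, the decoder-side lookup recited in claims 25 to 28 might be sketched as follows. The function names and the plain Python lists standing in for decoded codewords are assumptions made for this sketch; entropy decoding and the actual bitstream syntax are not modeled.

```python
from typing import Dict, List


def build_attribute_sets(values: List[int], n_sets: int, m_values: int) -> Dict[int, List[int]]:
    """Split N x M decoded attribute values into N sets.

    Set ids are assigned in decoding order (0 for the first set, 1 for the
    second, and so on), as in claim 28.
    """
    if len(values) != n_sets * m_values:
        raise ValueError("expected N x M attribute values")
    return {v: values[v * m_values:(v + 1) * m_values] for v in range(n_sets)}


def attributes_for_tile(sets: Dict[int, List[int]],
                        set_id_per_tile: List[int],
                        tile_number: int) -> List[int]:
    """Use the tile number as an index into list L, then select the set."""
    set_id = set_id_per_tile[tile_number]  # list L from the loop over tiles
    return sets[set_id]


# Hypothetical example: two sets of two attribute values, four tiles.
sets = build_attribute_sets([3, 1, -2, 0], n_sets=2, m_values=2)
tile_set_ids = [0, 1, 1, 0]  # one set id per tile, decoded in a loop over tiles
```

In this sketch, tile number 2 maps to set id 1, so the tile would be decoded with the second attribute set.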

Technical Field

The present disclosure relates generally to video encoding and decoding techniques and, more particularly, to encoding and decoding parametric attributes for each tile by looping over some or all of the tiles in an image.

Background

High Efficiency Video Coding (HEVC) is a block-based video codec standardized by both ITU-T and the Moving Picture Experts Group (MPEG) that uses both spatial and temporal prediction. Spatial prediction reduces spatial redundancy and is implemented using intra (I) prediction from within the same frame of the current picture. Temporal prediction reduces temporal redundancy and is implemented using inter (P) or bi-directional inter (B) prediction at the block level using previously decoded reference pictures. Regardless of the type of prediction, the difference between the original pixel data and the predicted pixel data (referred to as the "residual") is transformed to the frequency domain and quantized. The level of quantization is determined by a Quantization Parameter (QP), which controls the trade-off between bitrate and video quality.

The transformed and quantized residual is then entropy coded together with the necessary prediction parameters before being sent to the decoder. The prediction parameters, which are also entropy coded, include a prediction mode and a motion vector. Upon receipt, the decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the residual. The decoder then reconstructs the image by adding the residual to an intra prediction or an inter prediction.
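As an illustration only, the role of the QP can be sketched with the commonly cited approximation that the quantization step size doubles for every increase of 6 in QP. This relation and the function names below are not taken from the present disclosure; the sketch uses plain floating-point scalar quantization rather than the integer arithmetic of a real codec.

```python
def qstep(qp: int) -> float:
    """Approximate quantization step size: doubles every 6 QP steps (assumption)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (lossy)."""
    step = qstep(qp)
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, qp):
    """Reconstruct approximate coefficients from the integer levels."""
    step = qstep(qp)
    return [level * step for level in levels]
```

A higher QP gives a larger step, hence fewer distinct levels to code (lower bitrate) and a larger reconstruction error (lower quality).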

Both MPEG and ITU-T are working on a successor to HEVC within the Joint Video Experts Team (JVET). The video codec is named Versatile Video Coding (VVC).

Disclosure of Invention

Embodiments of the present disclosure use looping over tiles to signal per-tile attribute values for a picture. The tile attributes may be, for example, in the form of a set of tile syntax elements (one syntax element per tile attribute), or, for example, in the form of a set of flags to enable or disable the use of tile attributes. These embodiments provide freedom for the encoder to assign tile attribute values for each tile or any subset of tiles in a picture, and signal the attribute values through a compact syntax using the loop(s) that are performed on the tiles.

One embodiment relates to a method of encoding a picture. The picture is divided into a plurality of tiles. One or more tile attributes are associated with one or more tiles. Each tile attribute includes a syntax element or flag indicating the applicability of the tile attribute. One or more tile attributes are assigned to a tile or subset of tiles. The picture and the tile attributes are signaled in a bitstream, with the tile attributes signaled using one or more loops performed over the tiles.

Another embodiment relates to a method of decoding a picture. A bitstream is received that includes a picture divided into a plurality of tiles and one or more tile attributes associated with one or more tiles. Each tile attribute includes a syntax element or flag indicating the applicability of the tile attribute. The tile attributes are applied to the tiles using one or more loops performed over the tiles.
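By way of non-limiting illustration, one of the flag-based variants (an initial attribute set followed, for each tile and each attribute, by a flag and an optional new value) might be sketched as below. The symbol names such as `override_flag` and the representation of the bitstream as a list of (name, value) pairs are assumptions made for the sketch and do not reflect an actual syntax.

```python
def signal_tile_attributes(initial, per_tile):
    """Encoder side: emit symbols for the initial set, then loop over tiles."""
    symbols = [("num_attributes", len(initial))]
    for value in initial:
        symbols.append(("initial_value", value))
    for tile_values in per_tile:  # the loop over tiles
        for init, val in zip(initial, tile_values):
            if val == init:
                symbols.append(("override_flag", 0))  # reuse the initial value
            else:
                symbols.append(("override_flag", 1))  # explicit value follows
                symbols.append(("new_value", val))
    return symbols


def parse_tile_attributes(symbols, num_tiles):
    """Decoder side: mirror the same loops to recover per-tile values."""
    it = iter(symbols)
    _, count = next(it)
    initial = [next(it)[1] for _ in range(count)]
    tiles = []
    for _ in range(num_tiles):
        values = []
        for init in initial:
            _, flag = next(it)
            values.append(next(it)[1] if flag else init)
        tiles.append(values)
    return initial, tiles


# Hypothetical example: two attributes, three tiles.
initial = [0, 1]
per_tile = [[0, 1], [4, 1], [0, 0]]
symbols = signal_tile_attributes(initial, per_tile)
```

Tiles that keep the initial values cost only one flag per attribute, which is where the compactness of this kind of syntax would come from.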

Drawings

Fig. 1 shows an example of using QTBT to divide CTUs into CUs.

Fig. 2 illustrates an example of tile partitioning.

Figs. 3A-3B illustrate example tile structures having high resolution tiles and low resolution tiles.

Fig. 4 shows an example of a tile structure that is not supported in HEVC.

Fig. 5 shows two examples of frame packing (horizontal and vertical).

Fig. 6 illustrates decoding an example bitstream into a decoded picture according to one embodiment of the present disclosure.

Fig. 7 is a flowchart of a method of encoding a picture.

Detailed Description

Quadtree and binary tree (QTBT) structure

As previously mentioned, HEVC is a block-based video codec standardized by ITU-T and MPEG that uses both temporal and spatial prediction. HEVC uses a block structure in which each top-level coding block, i.e., the largest block in a coding block partition, referred to herein as a Coding Tree Unit (CTU), is partitioned by a Quadtree (QT) structure. This partitioning results in coding block partitions, referred to herein as Coding Units (CUs). A CU may then be further recursively partitioned into smaller equal-sized CUs using the quadtree structure, down to a block size of 8 x 8.
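As a simplified illustration of the recursive quadtree partitioning described above, the following sketch splits a square CTU into CUs. The `should_split` callback is a hypothetical stand-in for the encoder's mode decision, and only the quadtree part is modeled.

```python
def quadtree_partition(x, y, size, should_split, min_size=8):
    """Return a list of (x, y, size) CUs covering the block at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                # Recurse into the four equal-sized quadrants.
                cus.extend(quadtree_partition(x + dx, y + dy, half,
                                              should_split, min_size))
        return cus
    return [(x, y, size)]  # leaf: this block becomes a CU


# Hypothetical decision rule: keep splitting only the top-left block.
cus = quadtree_partition(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
```

With this rule a 64 x 64 CTU yields three 32 x 32 CUs, three 16 x 16 CUs, and four 8 x 8 CUs.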

The block structure in the current version of VVC differs from that in HEVC. Specifically, the block structure in VVC is referred to as the quadtree plus binary tree plus ternary tree block structure (QTBT + TT). CUs in QTBT + TT may be square or rectangular. As in HEVC, the Coding Tree Unit (CTU) is first partitioned by a quadtree structure. The resulting blocks are then further partitioned by a binary tree structure, in the vertical or horizontal direction and into equal-sized partitions, to form coding blocks (also referred to as CUs). Thus, the blocks may be square or rectangular.

The encoder may set the depth of the quadtree and of the binary tree in the bitstream. Fig. 1 shows an example of partitioning a CTU using QTBT + TT. TT allows a CU to be divided into three partitions instead of two equal-sized partitions. This increases the likelihood of finding a block structure that better fits the content structure in the picture.

Context Adaptive Binary Arithmetic Coding (CABAC)

Context Adaptive Binary Arithmetic Coding (CABAC) is an entropy coding tool used in HEVC and VVC. CABAC encodes binary symbols, which keeps complexity low and allows the probabilities of the more frequently used bits of a symbol to be modeled. Because coding modes are usually strongly correlated locally, the probability models are selected adaptively based on the local context.

Slices

The slice concept in HEVC divides a picture into independently coded slices, where each slice is read in raster scan order in units of CTUs. Different coding types may be used for slices of the same picture. For example, a slice may be an I-slice, a P-slice, or a B-slice. However, the main purpose of slices is to enable resynchronization in case of data loss.

Tiles

The HEVC video coding standard also includes a tool called "tiles" that divides a picture into rectangular, spatially independent regions. Using tiles, a picture in HEVC may be partitioned into rows and columns of samples, where any given tile is located at the intersection of a given row and a given column. Fig. 2 shows an example of tile partitioning using four (4) tile rows and five (5) tile columns, resulting in a total of twenty (20) tiles for the picture. As shown in Fig. 2, tiles in HEVC are always aligned with CTU boundaries.

The tile structure is signaled in a Picture Parameter Set (PPS) by specifying the heights of the rows and the widths of the columns. Individual rows and columns may have different sizes, but the partitioning always spans the entire picture, from left to right and from top to bottom, respectively.
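By way of illustration, a row/column tile grid of this kind can be sketched as follows. The integer-division rule for equal spacing mirrors the uniform-spacing derivation of the HEVC specification as understood here; the function names are hypothetical and all sizes are in CTU units.

```python
def uniform_sizes(total_ctus, count):
    """Spread total_ctus as evenly as possible over count columns (or rows)."""
    return [((i + 1) * total_ctus) // count - (i * total_ctus) // count
            for i in range(count)]


def tile_index_of_ctu(ctu_x, ctu_y, col_widths, row_heights):
    """Return the raster-order index of the tile containing CTU (ctu_x, ctu_y)."""
    def find(pos, sizes):
        edge = 0
        for idx, size in enumerate(sizes):
            edge += size
            if pos < edge:
                return idx
        raise ValueError("CTU position outside the picture")

    return find(ctu_y, row_heights) * len(col_widths) + find(ctu_x, col_widths)


# Hypothetical picture of 22 x 12 CTUs split into 5 tile columns and 4 tile rows.
col_widths = uniform_sizes(22, 5)
row_heights = uniform_sizes(12, 4)
```

Because every tile sits at the intersection of one row and one column, two short lists are enough to describe all twenty tiles of such a grid.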

Table 1 lists the PPS syntax used to specify the tile structure in HEVC. As shown in Table 1, a flag (tiles_enabled_flag) indicates whether tiles are used. If tiles_enabled_flag is set, the number of tile columns (num_tile_columns_minus1) and the number of tile rows (num_tile_rows_minus1) are specified. The uniform_spacing_flag specifies whether the column widths and row heights are explicitly signaled, or whether a predefined method of spacing the tile boundaries equally is used. If explicit signaling is indicated, the column widths are signaled one by one, followed by the row heights. Column width and row height information is signaled in CTU units. Finally, the flag loop_filter_across_tiles_enabled_flag specifies whether in-loop filtering across tile boundaries is turned on or off for all tile boundaries in the picture.

TABLE 1-Tile syntax in HEVC
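The parsing order described for Table 1 might be sketched as follows. The element names `column_width_minus1` and `row_height_minus1`, the descriptors (one-bit flags and unsigned Exp-Golomb codes), and the rule that the last column width and row height are not transmitted are taken from the HEVC specification as recalled here, not from the text above; the `BitReader` class operating on a string of '0'/'1' characters is a hypothetical helper.

```python
class BitReader:
    """Minimal reader over a string of '0'/'1' characters (illustration only)."""

    def __init__(self, bits: str):
        self.bits, self.pos = bits, 0

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer."""
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)


def parse_pps_tiles(r: BitReader) -> dict:
    """Parse the tile-related part of a PPS in the order described above."""
    t = {"tiles_enabled_flag": r.u(1)}
    if t["tiles_enabled_flag"]:
        t["num_tile_columns_minus1"] = r.ue()
        t["num_tile_rows_minus1"] = r.ue()
        t["uniform_spacing_flag"] = r.u(1)
        if not t["uniform_spacing_flag"]:
            # Explicit sizes in CTUs; the last column and row are implied.
            t["column_width_minus1"] = [r.ue() for _ in range(t["num_tile_columns_minus1"])]
            t["row_height_minus1"] = [r.ue() for _ in range(t["num_tile_rows_minus1"])]
        t["loop_filter_across_tiles_enabled_flag"] = r.u(1)
    return t


# Hypothetical bit string: 3 columns, 2 rows, explicit sizes, loop filter on.
example = parse_pps_tiles(BitReader("1" + "011" + "010" + "0" + "00100" + "00101" + "011" + "1"))
```

Note that only a single loop-filter flag is present for the whole picture, which is the kind of picture-level tile attribute discussed later in this document.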

As with slices, there are no decoding dependencies between tiles of the same picture. This includes intra prediction, context selection, and motion vector prediction. One exception is that loop filtering dependencies are generally allowed between tiles. These dependencies can, however, be disabled by setting loop_filter_across_tiles_enabled_flag appropriately.

In contrast to slices, tiles do not require as much header data. The header overhead for each tile consists of the signaling of bitstream offsets, which are present in the slice header and indicate the starting points of all tiles in the picture. The decoder decodes the starting points to enable splitting of the encoded picture into encoded tiles in order to distribute them for parallel decoding. In HEVC, it is mandatory to include the bitstream offsets in the slice header when tiles are enabled. However, the combination of tiles and slices is restricted in HEVC. Specifically, either all CTUs in a tile belong to the same slice, or all CTUs in a slice belong to the same tile.
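As a minimal sketch of how such offsets allow an encoded picture to be split for parallel decoding, assume that the slice header yields the byte size of every tile substream except the last. The function name and the byte-size representation are assumptions for this illustration.

```python
def split_tiles(payload: bytes, substream_sizes):
    """Split slice data into per-tile substreams.

    substream_sizes holds the byte sizes of all substreams except the last,
    which extends to the end of the payload.
    """
    tiles, start = [], 0
    for size in substream_sizes:
        tiles.append(payload[start:start + size])
        start += size
    tiles.append(payload[start:])  # the last tile takes the remaining bytes
    return tiles


# Hypothetical 8-byte payload carrying three tiles.
parts = split_tiles(b"abcdefgh", [3, 2])
```

Each resulting substream could then be handed to a separate decoding thread.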

The bitstream offsets may also allow tiles to be extracted and stitched, recombining tiles into an output stream. This requires some encoder-side constraints to make the tiles temporally independent. One constraint restricts the motion vectors so that motion compensation for a tile uses only samples included in spatially co-located tiles of previous pictures. Another constraint restricts Temporal Motion Vector Prediction (TMVP) so that this process is also temporally independent. For complete independence, the deblocking of boundaries between tiles and the Sample Adaptive Offset (SAO) filtering between tiles must be disabled via the loop_filter_across_tiles_enabled_flag described previously. However, disabling deblocking may introduce visible lines between tiles. Thus, some implementations disable deblocking while others do not. Motion Constrained Tile Sets (MCTS) are a feature in HEVC for signaling the encoder-side constraints for temporally independent tile sets. A tile set in an MCTS covers one or more tiles of the picture.

Tiles are sometimes used for 360-degree video intended for consumption using a Head Mounted Display (HMD) device. The field of view of today's HMD devices is limited to around 20% of the full sphere. This means that the user consumes only about 20% of the entire 360-degree video. Typically, the full 360-degree video sphere is made available to the HMD device, which then crops out the portion to be rendered for the user. This portion (i.e., the part of the sphere that the user sees) is referred to as the viewport. A well-known resource optimization is to make the HMD video system aware of the user's head movement and viewing direction so that fewer resources are spent processing video samples that are not rendered to the user. The resource may be, for example, bandwidth from the server to the client or the decoding capability of the device. For future HMD devices with a larger field of view than is currently possible, non-uniform resource allocation will still be beneficial. In particular, the human visual system requires a higher image quality in the central visual area (about 18° horizontal view) and accepts a lower image quality in the surrounding area (about 120° or more for a comfortable horizontal view). Thus, allocating more resources to the central visual area than to the surrounding areas helps meet the demands of the human visual system.

Optimizing resources for a region of interest (ROI) is another use case for tiles. The ROI may be specified in the content or extracted by a method such as eye tracking. One way to reduce the required resources using head movement is to use tiles. The method first encodes a video sequence multiple times using tiles. The tile partition structure is the same in all encodings; however, the video sequence is encoded at different video qualities. This results in at least one high quality encoding and one low quality encoding of the video sequence. This means that, for each tile at a particular point in time, there is at least one high quality tile representation and at least one low quality tile representation. The difference between a high quality tile and a low quality tile may be that the high quality tile is encoded at a higher bitrate than the low quality tile, or that the high quality tile has a higher resolution than the low quality tile.
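By way of illustration, choosing a tile representation per tile from two encodings that share the same tile partition might be sketched as follows. The function name, the set of viewport tile indices, and the lists of per-tile representations are assumptions for this sketch.

```python
def select_tiles(num_tiles, viewport_tiles, high_quality, low_quality):
    """Pick the high quality representation for tiles inside the viewport.

    high_quality and low_quality are lists indexed by tile number, one entry
    per tile, produced by two encodings with identical tile partitioning.
    """
    return [high_quality[t] if t in viewport_tiles else low_quality[t]
            for t in range(num_tiles)]


# Hypothetical example with four tiles, where tiles 1 and 2 are in the viewport.
chosen = select_tiles(4, {1, 2}, ["H0", "H1", "H2", "H3"], ["L0", "L1", "L2", "L3"])
```

The selection can only be mixed freely in this way because the tiles are independent of one another in both space and time.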

Figs. 3A-3B illustrate an example in which high quality tiles have a higher resolution than low quality tiles. In particular, Fig. 3A shows a picture that has just been decoded by the decoder. In this context, a picture as output by the decoder (i.e., a decoded picture) is a picture shown in the compressed domain. In this example, all 8 tiles A-H in the compressed domain have equal spatial size. Tiles A-H are then scaled and spatially arranged after the picture is decoded but before the picture is rendered. A picture that has been prepared for rendering (i.e., after scaling and rearrangement) is a picture shown in the output domain. The output domain seen in Fig. 3B shows the picture as it is rendered or presented to the user.

As shown in fig. 3A-3B, tiles B, D, F and H are high quality tiles because they have higher resolution in the output domain. However, tiles A, C, E and G are low resolution tiles because the scaling step reduces the actual resolution.

In addition to illustrating how tiles have different resolutions, FIGS. 3A-3B also illustrate that tiles in the compressed domain (FIG. 3A) need not be spatially ordered in the same manner as they are ordered in the output domain (FIG. 3B). Given that tiles are independent in both space and time, the spatial placement of tiles in the compressed domain is not critical.

A number of factors increase the bit cost when tiles are enabled in HEVC. First, prediction across tiles is disabled, which means that motion vectors and intra modes are not predicted across tiles. The use of tiles also disables Quantization Parameter (QP) prediction and context selection across tiles. Second, CABAC is initialized for each tile, which hampers CABAC adaptation. Third, a bitstream offset must be signaled for each tile. Fourth, the tile partitioning structure needs to be specified in the PPS. Finally, CABAC is flushed after each tile and the encoded data must be byte aligned.

Tiles are useful; however, some issues need to be addressed. For example, in its current form, HEVC restricts tiles so that tile rows and columns span the entire picture. This limits the flexibility of tiles. For example, Fig. 4 shows an example of a tile structure that is not supported by the current version of HEVC.

In JVET-K0260, the concept of flexible tiles is presented. In this proposal, a picture can be divided into tiles in a more flexible way than by only defining the number of tile rows and tile columns (as in HEVC). JVET-K0260 proposes the tile syntax listed in Table 2 below to express flexible tiles.

TABLE 2 Flexible tiles syntax from JVET-K0260

where:

number_of_tiles_in_picture_minus2 plus 2 specifies the number of tiles in a picture;

subtile_width_minus1 plus 1 specifies the width of a subtile unit in units of coding tree units;

subtile_height_minus1 plus 1 specifies the height of a subtile unit in units of coding tree units;

use_previous_tile_size_flag equal to 1 specifies that the size of the current tile is equal to the size of the previous tile. use_previous_tile_size_flag equal to 0 specifies that the size of the current tile is not equal to the size of the previous tile. When not present, the value of use_previous_tile_size_flag is inferred to be equal to 0;

tile_width_minus1[ i ] plus 1 specifies the width of the i-th tile in units of subtile units; and

tile_height_minus1[ i ] plus 1 specifies the height of the i-th tile in units of subtile units.
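As a hedged illustration of the semantics listed above, the following sketch derives tile sizes in CTU units from already decoded syntax element values. The order in which the elements appear in the bitstream is not modeled, and the function name and the dictionary representation of each tile are assumptions.

```python
def tile_sizes_in_ctus(subtile_width_minus1, subtile_height_minus1, tiles):
    """Return a (width, height) pair in CTUs for each tile, in signaling order.

    tiles is a list of dicts; each holds either use_previous_tile_size_flag
    equal to 1, or tile_width_minus1 and tile_height_minus1 in subtile units.
    """
    sub_w = subtile_width_minus1 + 1
    sub_h = subtile_height_minus1 + 1
    sizes = []
    for tile in tiles:
        # The flag is inferred to be 0 when it is not present.
        if tile.get("use_previous_tile_size_flag", 0):
            if not sizes:
                raise ValueError("the first tile has no previous tile size")
            sizes.append(sizes[-1])
        else:
            sizes.append(((tile["tile_width_minus1"] + 1) * sub_w,
                          (tile["tile_height_minus1"] + 1) * sub_h))
    return sizes


# Hypothetical example: subtile units of 2 x 2 CTUs and three tiles.
sizes = tile_sizes_in_ctus(1, 1, [
    {"tile_width_minus1": 1, "tile_height_minus1": 0},
    {"use_previous_tile_size_flag": 1},
    {"use_previous_tile_size_flag": 0, "tile_width_minus1": 0, "tile_height_minus1": 2},
])
```

Reusing the previous tile size costs a single flag, so runs of equally sized tiles stay cheap to signal.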

Stereo video and texture deepening

Stereoscopic video is known in the art. In stereoscopic video, each eye receives a separate view, with the viewpoint of the second view slightly shifted compared to the first view. Stereoscopic video is typically packaged into a frame using side-by-side packaging or top-down packaging, as depicted in fig. 5. One drawback of stereoscopic video is that if the user is moving, the immersive experience is degraded because the viewpoint of the stereoscopic video is fixed.

In order to be able to at least partially look around the object, i.e. slightly offset the viewport, a combination of texture and depth information may be used, wherein further views may be synthesized from the texture and depth information. Texture plus depth (sometimes referred to as 2D plus depth) is also typically frame packed within a picture in a side-by-side, top-and-bottom, or some other manner.

MPEG is currently working on several activities for immersive video that will be published in the MPEG-I set of standards. One activity concerns 3 degrees of freedom (3DoF), also known as 360° video, where the user can look in all directions of a sphere using a Head Mounted Display (HMD). As with stereoscopic video, the viewpoint is fixed.

The disadvantages of the prior art

HEVC uses the tile tool to divide a picture into independent regions. Tile boundaries break the parsing and spatial prediction dependencies so that tiles can be processed independently of other tiles. However, in HEVC, tiles are constrained in both their geometry and their attributes. The geometry of tiles is constrained by a tile grid given in rows and columns, and tile attributes are not defined for each tile, but for all tiles in a picture or all tiles in a slice. One example of such a tile attribute is delta-QP, which is set for all tiles in a slice; another is MCTS, which is set for a set of tiles in a picture. In HEVC, the initial QP for each tile is set by a delta-QP codeword (slice_qp_delta) in the slice header. The HEVC specification specifies that the initial QP value (tile QP) to be used for each tile is set to 26 + init_qp_minus26 + slice_qp_delta, where init_qp_minus26 is a codeword in the PPS. Thus, all tiles in a slice share the same initial QP value, and in HEVC it is not possible to have different initial QP values for tiles belonging to the same slice. Other examples of tile attributes are any of the slice header syntax elements in HEVC.

The flexible tile concept removes the row and column constraints from tiles and allows for more flexible picture partitioning, enabling partitioning of a picture into rectangular tiles without overlap. However, tile attributes are still constrained because they are not defined for each tile, but are defined for all tiles in a picture or slice at the Picture Parameter Set (PPS) level or slice level. One way to convey different attribute values for each tile is to use one slice per tile. However, this increases the bit cost and may not be suitable for a flexible tile structure such as that shown in Fig. 4. This solution also does not work for tile attributes defined at the picture level.

Features of some embodiments

Embodiments of the present disclosure address these issues by providing a means for signaling tile attribute values per tile using a compact syntax. These embodiments use a loop over the tiles to signal the attribute values for each tile. The tile attributes may be, for example, in the form of a set of tile syntax elements (one syntax element per tile attribute), or, for example, in the form of a set of flags to enable or disable the use of tile attributes. These embodiments provide freedom for the encoder to assign tile attribute values for each tile or any subset of tiles in a picture, and signal the attribute values through a compact syntax using the loop(s) that are performed on the tiles.

Tile attributes supported by embodiments of the present disclosure may include, for example, but are not limited to: delta_QP, deblocking strength, and tile boundary handling flags (e.g., for MCTS and loop filtering).

In one embodiment, sets of tile attributes are defined, where each set has a particular set_id. A loop over the tiles in the Picture Parameter Set (PPS), slice header, picture header, tile group header, or tile header signals the appropriate set_id for each tile.

In one embodiment, the number of tile attributes in each set of tile attributes is explicitly signaled.

In one embodiment, tile attribute values are given in the PPS and the loop over the tiles is placed in the tile header or slice header.

In one embodiment, the tile attributes are not signaled in the parameter set, but rather are signaled directly in the slice header.

In one embodiment, an initial set of tile attributes is signaled, followed by a set of flags for each tile that indicate whether each tile attribute is overwritten for that tile. If a tile attribute is not overwritten, the corresponding value in the initial set of tile attributes is used for the current tile as a fallback.

In one embodiment, an initial set of tile attributes is signaled followed by a set of copy flags for each tile, where each flag specifies whether a tile attribute value should be copied from the tile attribute list or explicitly signaled for a particular tile.

In one embodiment, delta_QP is signaled for each tile, and the final tile or block QP is calculated as the sum of a reference QP value decoded from the sequence parameter set or picture parameter set, the tile delta_QP, and an optional delta_QP signaled for the block.

General description of the invention

Embodiments of the present disclosure introduce the concept of signaling attribute values per tile in a compact manner by looping over the tiles at the PPS, picture header, slice header, or tile header level. Tile attributes may also be referred to as tile properties or tile parameters.

One example of a tile attribute to be defined for each tile according to an embodiment is delta_QP. A delta_QP value may be defined as the difference between a reference Quantization Parameter (QP) value and a quantization parameter value for a tile. The reference QP value may be a QP value signaled for a parameter set, a picture, a slice, etc. The QP value for a tile may be an initial QP value for the tile, such as a QP value for a first block in the tile, or a QP value for a first block in a prediction tile.

The delta_QP value may also be defined as the difference between the QP values of the previous tile and the current tile. In one example, if a tile region defined in a picture has texture content and another tile region in the picture has depth content, it may be beneficial to define delta_QP for each tile, as different tiles may want to use different QP values. One tile may be encoded using a high QP value and another tile may be encoded using a low QP value.

Deblocking parameters are another example of tile attributes. In HEVC, the strength of the deblocking filter may be adjusted by the encoder on a picture and slice basis. According to embodiments of the present disclosure, a deblocking parameter, such as a deblocking strength, may be provided for each tile and thus may be adapted to the content of each tile.

In HEVC, similar to slice boundaries, tile boundaries break the parsing and spatial prediction dependencies such that tiles can be processed independently, but the loop filters (deblocking and SAO) can optionally still cross tile boundaries to prevent tile boundary artifacts. This function is controlled by the loop_filter_across_tiles_enabled_flag syntax element in the PPS. Setting this function for each tile is another example of a per-tile attribute according to embodiments disclosed herein. In an embodiment, in the case where some tiles in a slice in a picture are independent in their content, while some other tiles are dependent, loop_filter_across_tiles_enabled_flag may be set for each tile, meaning that the flag is disabled for tiles with independent content and enabled for tiles with dependent content.

Motion constraints as defined for Motion Constrained Tile Sets (MCTS) in HEVC are another example of tile attributes that may be defined for each tile according to embodiments disclosed herein. An MCTS flag set equal to 1 prohibits the use of motion vectors across tile boundaries. In HEVC, MCTS is set at the PPS level and applies to all tiles in a picture. As one of the tile attributes according to an embodiment, a per-tile definition of motion constraints allows motion prediction across some tile boundaries and prohibits motion prediction across other tile boundaries, which may be useful in applications where dependent and independent tile content is mixed. Another possible tile attribute signals which MCTS the tile belongs to.

Of course, the tile attributes are not limited to the above examples.

In all embodiments disclosed below, it is assumed that the number of tiles in a picture is given by the tile structure. For example, in HEVC, the number of tiles is given by (num_tile_columns_minus1 + 1) × (num_tile_rows_minus1 + 1). In the flexible tile syntax given in Table 2, this number is given by number_of_tiles_in_picture_minus2 + 2.
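As a minimal sketch of the two expressions above, the tile count can be derived as follows. The function names are hypothetical and chosen only for illustration; the parameter names follow the syntax elements quoted in the text.

```python
def num_tiles_hevc(num_tile_columns_minus1: int, num_tile_rows_minus1: int) -> int:
    """Tile count for an HEVC-style tile grid of columns and rows."""
    return (num_tile_columns_minus1 + 1) * (num_tile_rows_minus1 + 1)


def num_tiles_flexible(number_of_tiles_in_picture_minus2: int) -> int:
    """Tile count for the flexible tile syntax of Table 2."""
    return number_of_tiles_in_picture_minus2 + 2


# A 4x2 tile grid in HEVC and a flexible layout with four tiles.
grid_tiles = num_tiles_hevc(3, 1)
flexible_tiles = num_tiles_flexible(2)
```

The loops over the tiles in the embodiments below iterate exactly this many times when they are placed at the picture level.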

Embodiment 1 - Tile attribute set ID

In one embodiment, the tile attribute values are signaled for each tile using loops over the tiles in the PPS. At least one set of tile attributes is defined in the PPS, where each set contains at least one syntax element related to a tile attribute. In another part of the PPS, the indices of the tile attribute sets to be used for each tile are signaled in a loop over the tiles.

The following example encoder steps may be applied to the construction and signaling of tile properties:

1. If all tiles have the same attribute values, one tile attribute set with one set_id is created; if there are at least two tiles in the picture with different tile attribute values, at least two tile attribute sets are created and each of them is assigned a unique set_id.

2. For each tile in the picture, the tile_attribute_set_id value is set to the set_id value of the relevant tile attribute set.

3. The following is signaled in the bitstream: the number of tile attribute sets, the value of each tile attribute in each tile attribute set, and the tile_attribute_set_id of each tile.
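The encoder steps above can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary layout, and the representation of a tile's attributes as a tuple are assumptions, and entropy coding of the resulting values is left out.

```python
def build_tile_attribute_sets(per_tile_attributes):
    """Group identical per-tile attribute tuples into sets with unique ids.

    per_tile_attributes: one tuple of attribute values per tile, in tile order.
    """
    sets = []            # sets[set_id] -> attribute tuple
    set_id_of = {}       # attribute tuple -> set_id
    tile_attribute_set_id = []
    for attrs in per_tile_attributes:
        if attrs not in set_id_of:
            # First tile with these values: create a new set with the next id.
            set_id_of[attrs] = len(sets)
            sets.append(attrs)
        tile_attribute_set_id.append(set_id_of[attrs])
    return {
        "number_of_tile_attribute_sets_minus1": len(sets) - 1,
        "tile_attribute_sets": sets,
        "tile_attribute_set_id": tile_attribute_set_id,
    }


# Four tiles with a single attribute each (hypothetical delta_QP values).
signaled = build_tile_attribute_sets([(3,), (3,), (-2,), (3,)])
```

With these inputs, only two attribute sets are signaled, and each tile costs one set id in the loop over the tiles.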

The following example decoder steps may be applied to extract tile attribute values from the bitstream of the current embodiment and use them during decoding:

1. The number of tile attribute sets is parsed.

2. For each tile attribute set, the tile attributes in that set are parsed, and the parsed values (A) are stored and associated with the set id.

3. For each tile (T) in the picture, the tile_attribute_set_id codeword is parsed, and its value (B) is used to identify the tile attribute set.

4. The tile (T) is decoded using the values (A) stored for the set id that corresponds to the value (B).
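The decoder steps above might look like the following sketch. It assumes that the syntax elements have already been entropy decoded into a flat list of integers, that each set holds a single attribute, and that the elements appear in the order: set count, attribute values, per-tile set ids. The function name and this ordering are assumptions made for illustration.

```python
def parse_tile_attribute_set_ids(values, num_tiles):
    """Return the attribute value to use for each tile, in tile order."""
    it = iter(values)
    # Step 1: number of tile attribute sets.
    num_sets = next(it) + 1  # number_of_tile_attribute_sets_minus1 plus 1
    # Step 2: one attribute value (A) per set; the list index is the set id.
    attribute_of_set = [next(it) for _ in range(num_sets)]
    # Step 3: loop over the tiles and read tile_attribute_set_id (B).
    per_tile = []
    for _ in range(num_tiles):
        set_id = next(it)
        if not 0 <= set_id < num_sets:
            raise ValueError(f"tile_attribute_set_id {set_id} out of range")
        # Step 4: the tile is decoded with the values stored for that set id.
        per_tile.append(attribute_of_set[set_id])
    return per_tile


# Two sets (values 3 and -2) followed by the set ids of four tiles.
tile_values = parse_tile_attribute_set_ids([1, 3, -2, 0, 0, 1, 0], num_tiles=4)
```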

An example syntax table and semantic description of Embodiment 1 on top of the HEVC video coding specification is provided in Table 3.

TABLE 3 Tile attribute set ID

Where:

number_of_tile_attribute_sets_minus1 plus 1 specifies the number of attribute sets in the PPS;

tile_attribute[i] specifies the value of the tile attribute in the ith tile attribute set; and

tile_attribute_set_id[i] specifies the tile attribute set id to be used for the ith tile. The value of tile_attribute_set_id[i] should be in the range of 0 to number_of_tile_attribute_sets_minus1, inclusive.

In the syntax example above, only one tile attribute in the set of tile attributes is signaled using UVLC. In general, there may be more than one tile attribute in each set of tile attributes, and each tile attribute may be signaled using UVLC, fixed length codes, or flags. If more than one tile attribute exists in the set of tile attributes, the number of tile attributes in the set may be signaled to the decoder using a codeword.
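For readers unfamiliar with UVLC, the sketch below shows the unsigned 0th-order Exp-Golomb code that HEVC uses for its ue(v) syntax elements. Treating the UVLC mentioned above as this code is an assumption, and bit strings are used instead of a real bit writer to keep the example short. Signed attributes such as a delta QP would need a signed mapping, which is not shown.

```python
def ue_encode(value: int) -> str:
    """Encode a non-negative integer as a 0th-order Exp-Golomb bit string."""
    if value < 0:
        raise ValueError("ue(v) only codes non-negative values")
    code = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(code) - 1) + code    # prefix of leading zeros, then the code


def ue_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":        # count the leading zero bits
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


# Round trip of three small set ids packed back to back.
stream = "".join(ue_encode(v) for v in [0, 3, 1])
decoded, position = [], 0
for _ in range(3):
    value, position = ue_decode(stream, position)
    decoded.append(value)
```

Small values get short codewords, which suits set ids when only a few tile attribute sets are defined.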

Embodiment 2 - Signaling the number of attributes

In a second embodiment, the number of attributes in each set may be different. In this case, the number of tile attributes in each attribute set may be signaled to the decoder along with the attribute values. The loop over the tiles may be placed in the PPS, in the slice header, in the picture header, or in the tile header. On the decoder side, the number of tile attribute sets, the number of attributes in each set, and the attributes in each set are decoded. Finally, tile_attribute_set_id is decoded to set the appropriate attribute values for each tile.
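A sketch of the decoder side of this embodiment is given below. As before, it assumes already entropy-decoded integers in a flat list, and the element order (set count; then, per set, an attribute count followed by the values; then the per-tile set ids) is an assumption, as is the function name.

```python
def parse_variable_size_sets(values, num_tiles):
    """Return the list of attribute values to use for each tile."""
    it = iter(values)
    num_sets = next(it) + 1                # number_of_tile_attribute_sets_minus1 plus 1
    sets = []
    for _ in range(num_sets):
        num_attrs = next(it) + 1           # number_of_tile_attributes_in_set_minus1 plus 1
        sets.append([next(it) for _ in range(num_attrs)])
    # Loop over the tiles: one tile_attribute_set_id per tile.
    set_ids = [next(it) for _ in range(num_tiles)]
    return [sets[set_id] for set_id in set_ids]


# Set 0 holds one attribute, set 1 holds two; tile 0 uses set 1, tile 1 uses set 0.
per_tile_attributes = parse_variable_size_sets([1, 0, 5, 1, 7, 1, 1, 0], num_tiles=2)
```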

An example syntax table and semantic description of this embodiment on top of the HEVC video coding specification is given in Table 4.

TABLE 4 Signaling the number of attributes

Where:

number_of_tile_attribute_sets_minus1 plus 1 specifies the number of attribute sets for a tile;

number_of_tile_attributes_in_set_minus1 plus 1 specifies the number of attributes in each tile attribute set;

tile_attribute[i, j] specifies the value of the jth tile attribute in the ith set of tile attributes; and

tile_attribute_set_id[i] specifies the tile attribute set id to be used for the ith tile. The value of tile_attribute_set_id[i] should be in the range of 0 to number_of_tile_attribute_sets_minus1, inclusive.

Embodiment 3 - Attribute values in the PPS and loop in the segment header

In a third embodiment, the loop over the tiles is placed in the picture header, the tile header, or the segment header. In the case where the loop over the tiles is placed in the tile header, the tile header of a subsequent tile may be optional. At least one set of tile attributes is defined, wherein each set contains at least one syntax element related to a tile attribute. In a tile or segment header, a loop over the tiles signals an index of the tile attribute set to be used for each tile.

If each tile has a tile header and the loop over the tiles is placed in the tile header, the loop covers only the current tile associated with that tile header, i.e., the loop is entered only once.

A subset of the following example decoder steps may be used in this embodiment:

1. The decoder determines a number (N) of tile attribute sets from one or more codewords in the bitstream, where N > 1.

2. For each tile attribute set, the decoder determines a number of tile attribute values (M) from codewords in the bitstream, where M > 0 and the total number of attribute values is equal to N × M.

3. The decoder assigns a unique set id value (V) to each tile attribute set. Optionally, the set id value (V) of the first attribute set in decoding order is 0, the set id value (V) of the second attribute set in decoding order is 1, and so on.

4. The decoder decodes the segment header and determines the number of tiles in the segment from a codeword in the parameter set or a codeword in the segment header.

5. For each tile (T) in the segment, the decoder determines the tile set id (i) from a codeword in the segment header and stores the set id (i) value in a list L.

6. The decoder decodes a particular tile (P) of the segment and determines the tile number of the tile P.

7. The decoder uses the tile number as an index in the list L to determine the tile set id (i) of the tile P.

8. The tile set id (i) is used to select the tile attribute set whose assigned set id value (V) is equal to the tile set id (i).

9. The decoder uses the attribute values of the selected tile attribute set in the decoding process for tile P.

Here, a segment is a complete picture or a part of a picture. A segment includes a segment header and encoded video data representing a portion of a picture. The segment header includes syntax elements whose values are used to decode the encoded video data. A segment may include multiple tiles, and this particular type of segment may be referred to as a tile group. A tile is one example type of segment and a tile group is another example type of segment. Embodiments are not limited to these two exemplary types.

The above decoder steps can be explained by the example shown in Fig. 6. The bitstream (20) is decoded into a decoded picture (30) comprising four tiles (31, 32, 33, 34). The tile structure is conveyed to the decoder in a picture parameter set (not shown in the figure). Assume in this example that the decoder decodes the number of attribute sets to be 3. The decoder then decodes the three tile attribute sets and assigns the set id values 0, 1, and 2 to them.

The picture is divided into two segments. One segment includes tiles 31 and 32 and the other segment includes tiles 33 and 34. The first segment includes a segment header 10 and encoded tiles 11 and 12 for tiles 31 and 32, respectively. The second segment includes a segment header 13 and encoded tiles 14 and 15 for tiles 33 and 34, respectively.

When the decoder decodes the segment header 13, the data in the header 13 specifies that the segment includes two tiles and that they are the third and fourth tiles in the picture. In the segment header 13, there are two tile set id values, one for tile 33 and one for tile 34. When the decoder decodes the tile data 14, it uses the set of tile attributes in the PPS that matches the tile set id value for tile 33. When the decoder decodes the tile data 15, it uses the set of tile attributes in the PPS that matches the tile set id value for tile 34.
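The lookup described in the decoder steps and in the Fig. 6 example can be sketched as below. The attribute values and the set ids carried in header 13 are hypothetical, since the source does not give them, and the function name is chosen only for illustration. Tile numbers are counted from 0 within the segment.

```python
# Tile attribute sets decoded from the PPS; the list index is the set id
# (0 for the first set in decoding order, 1 for the second, and so on).
# The delta_qp values are hypothetical and not taken from the source.
pps_tile_attribute_sets = [
    {"delta_qp": 0},
    {"delta_qp": -3},
    {"delta_qp": 5},
]


def select_tile_attributes(attribute_sets, header_set_ids, tile_number):
    """Use the tile number as an index into list L, then pick the matching set."""
    set_id = header_set_ids[tile_number]
    if not 0 <= set_id < len(attribute_sets):
        raise ValueError(f"set id {set_id} does not match any attribute set")
    return attribute_sets[set_id]


# Hypothetical list L decoded from header 13: one set id for tile 33, one for tile 34.
header_13_set_ids = [2, 0]
attrs_tile_33 = select_tile_attributes(pps_tile_attribute_sets, header_13_set_ids, 0)
attrs_tile_34 = select_tile_attributes(pps_tile_attribute_sets, header_13_set_ids, 1)
```

Because the attribute values stay in the PPS, the header only carries one small index per tile.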

An example syntax table and semantic description of this embodiment on top of the HEVC video coding specification is given in Table 5. In this example syntax table, it is assumed that the loop over the tiles is placed in the slice segment header.

TABLE 5 Loop in the segment header

Where:

number_of_tile_attribute_sets_minus1 plus 1 specifies the number of attribute sets in the PPS;

tile_attribute[i] gives the value of the tile attribute in the ith tile attribute set;

number_of_tiles_in_slice specifies the number of tiles in a slice; and

tile_attribute_set_id[i] specifies which set of tile attributes should be assigned to the ith tile. The value of tile_attribute_set_id[i] should be in the range of 0 to number_of_tile_attribute_sets_minus1, inclusive.

Embodiment 4 - Attribute signaling in the segment header

In this embodiment, the attributes are not signaled in a parameter set. Instead, the attribute values are signaled directly in the segment header. The segment header is as defined in Embodiment 3. In the segment header, there are one or more codewords that specify how many tiles there are in the segment and the spatial location of the tiles. At least one tile attribute value is then signaled for each tile of the segment.

A subset of the following example decoder steps may be used in this embodiment:

1. The decoder decodes the segment header and determines a number of tiles (N) in the picture or portion of the picture from a codeword in the parameter set or a codeword in the segment header, where N > 1.

2. The decoder creates a list L of size N, where each entry in the list includes at least one tile attribute value. The attribute values are decoded from the segment header.

3. The decoder decodes a particular tile (P) of the segment and determines the tile number of the tile P.

4. The decoder uses the tile number as an index in the list L to select the tile attribute value to be used for decoding the tile P.

5. The decoder uses the selected attribute values in the decoding process of tile P.
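These steps can be sketched as follows, assuming one attribute per tile and already entropy-decoded header values in the order slice_address, number_of_tiles_in_slice, then one attribute value per tile. The ordering and the function name are assumptions made for illustration.

```python
def parse_attributes_in_header(values):
    """Return (slice_address, list L of per-tile attribute values)."""
    it = iter(values)
    slice_address = next(it)
    num_tiles = next(it)                   # number_of_tiles_in_slice (N)
    # List L of size N, filled directly from the header.
    attribute_list = [next(it) for _ in range(num_tiles)]
    return slice_address, attribute_list


# A segment starting at address 2 with two tiles (hypothetical values).
address, L = parse_attributes_in_header([2, 2, 4, -1])
# The tile number of tile P (here 1) indexes the list L.
attribute_for_tile_p = L[1]
```

Compared with Embodiment 3, there is no indirection through a set id, at the cost of repeating equal values for tiles that share them.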

An example syntax table and semantic description of this embodiment on top of the HEVC video coding specification is given in Table 6.

TABLE 6 Attribute signaling in the segment header

Where:

slice_address specifies the spatial position of the slice;

number_of_tiles_in_slice specifies the number of tiles in a slice. Together with the slice_address value and tile partitioning information derived from other codewords, the decoder can derive the spatial location of each tile in the slice; and

tile_attribute[i] gives the value of the tile attribute for the ith tile in the slice.

Embodiment 5 - Attribute flags

In this embodiment, a set of tile attributes is signaled, and then a set of flags is signaled for each tile to determine the usage of the tile attributes in the set of tile attributes.

In a second variation of this embodiment, the set of flags of each tile signaled specifies whether to use or turn off use of each initial tile attribute value for that tile.

In a third variation of this embodiment, two states, e.g., state 1 and state 2, for each tile attribute are signaled. Then, a set of flags is signaled for each tile, one flag per tile attribute, where each flag specifies the use of a tile attribute having state 1 or state 2.

In a fourth variation of this embodiment, a set of tile attributes is signaled, and then for each tile and each attribute, a flag specifies: the initial value of the tile attribute will be used (e.g., flag 0) or should be overwritten (e.g., flag 1). In the case of an overwrite, the flag is followed by a codeword specifying the new value of the tile attribute.

Table 7 gives an example syntax table and semantic description of this embodiment based on the HEVC specification:

TABLE 7 Attribute flags

Where:

number_of_tile_attributes specifies the number of tile attributes for a tile in a picture;

tile_attribute[i] specifies the value of the ith tile attribute in the tile attribute list; and

tile_attribute_override_flag[i, j] equal to 0 specifies that the ith tile attribute is to be used for the jth tile. tile_attribute_override_flag[i, j] equal to 1 specifies that the ith tile attribute is to be overwritten at the jth tile.

In yet another variation of this embodiment, instead of using the overwrite flag as in the example of table 7, a flag is used to determine: for a particular tile, the attribute values should be copied from the tile attribute list or explicitly signaled. The difference from the previous example is that the attribute list remains static.

This is illustrated below with syntax and semantics on top of HEVC as an example.

TABLE 8 Attribute flags

Where:

number_of_tile_attributes specifies the number of tile attributes for a tile in a picture;

tile_attribute_in_list[i] specifies the value of the ith tile attribute in the tile attribute list;

tile_attribute_copy_flag[i, j] equal to 1 specifies that the ith tile attribute of the jth tile is copied from the tile attribute list. tile_attribute_copy_flag[i, j] equal to 0 specifies that the ith tile attribute value of the jth tile is explicitly signaled; and

tile_attribute[i, j] specifies the value of the ith tile attribute for the jth tile. If tile_attribute_copy_flag[i, j] is equal to 1, tile_attribute[i, j] is set equal to tile_attribute_in_list[i].
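The copy-flag semantics can be sketched as below. The sketch assumes already entropy-decoded values in a flat list, with the loop over the tiles as the outer loop and the loop over the attributes as the inner loop. That loop order and the function name are assumptions, since the syntax table itself is not reproduced here.

```python
def parse_copy_flags(values, num_tiles):
    """Return tile_attribute[i][j]: the ith attribute of the jth tile."""
    it = iter(values)
    num_attrs = next(it)                               # number_of_tile_attributes
    in_list = [next(it) for _ in range(num_attrs)]     # tile_attribute_in_list[i]
    tile_attribute = [[None] * num_tiles for _ in range(num_attrs)]
    for j in range(num_tiles):
        for i in range(num_attrs):
            copy_flag = next(it)                       # tile_attribute_copy_flag[i, j]
            if copy_flag == 1:
                # Copy from the list, which itself is never modified.
                tile_attribute[i][j] = in_list[i]
            else:
                # The value is explicitly signaled right after the flag.
                tile_attribute[i][j] = next(it)
    return tile_attribute


# Two attributes with list values 10 and 0; tile 0 copies both,
# tile 1 signals attribute 0 explicitly (12) and copies attribute 1.
copied = parse_copy_flags([2, 10, 0, 1, 1, 0, 12, 1], num_tiles=2)
```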

Embodiment 6 - Delta QP

As previously mentioned, one attribute that may be used in embodiments disclosed herein is delta_QP. In one embodiment, the decoder determines a reference QP value (refQP) from a sequence parameter set or a picture parameter set. Then, using any of the previously described embodiments, a delta QP value is signaled for each tile. Looking at one tile T, let the corresponding delta QP value be deltaQP1. Optionally, there may be a deltaQP2 signaled for the first block of tile T. Then, the QP value to be used for the first block becomes refQP + deltaQP1 + deltaQP2. If the block does not have a deltaQP2 (e.g., because the first block does not contain any non-zero transform coefficients), the QP value for the first block becomes refQP + deltaQP1, which is also referred to as the initial QP value for the block.
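The derivation in the paragraph above can be written as a short sketch. The function name is hypothetical and the numeric values are chosen only for illustration.

```python
from typing import Optional


def first_block_qp(ref_qp: int, delta_qp1: int, delta_qp2: Optional[int] = None) -> int:
    """QP of the first block in a tile: refQP + deltaQP1 (+ deltaQP2 if present)."""
    initial_qp = ref_qp + delta_qp1        # initial QP value, set per tile
    if delta_qp2 is None:
        # No block-level delta, e.g. no non-zero transform coefficients.
        return initial_qp
    return initial_qp + delta_qp2


qp_without_block_delta = first_block_qp(30, -4)
qp_with_block_delta = first_block_qp(30, -4, 2)
```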

The QP value assigned to the first block is used to scale the decoded transform coefficients. It can also be used for a deblocking process of sample values near a boundary between the first block and the adjacent block. According to this embodiment, the possibility of signaling deltaQP per tile is useful for splicing tiles originating from different bitstreams into one output bitstream. Tile T1 may have been encoded into bitstream B1 using QP value Q1, and tile T2 may have been encoded into bitstream B2 using QP value Q2. If T1 and T2 were spliced into one output bitstream without the possibility of setting tile QP, then the QP values for T1 and T2 may not be set correctly in the output stream. By making delta_QP part of the tile attributes and using one of the previously described embodiments, the correct QP value can be set for all output tiles by simply changing the value in the header. This is important because changing values in a video coding layer requires rewriting large amounts of data because the video coding layer can be coded using arithmetic coding (e.g., CABAC).

Note that in HEVC, block delta QP is signaled only for blocks that contain at least one non-zero transform coefficient. This means that if the first block in tile T1 and the first block in T2 do not contain any non-zero coefficients, it is not possible to assign the correct QP value for tiles T1 and T2 if these two blocks are stitched together into one picture unless a slice header is inserted.

As previously described, the HEVC specification specifies that the initial QP value (tile QP) to be used for each tile is set to 26 + init_qp_minus26 + slice_qp_delta. One implementation of this embodiment on top of HEVC would be to set the initial QP value to be used for the tiles in the slice to:

QP = 26 + init_qp_minus26 + slice_qp_delta + tile_qp_delta

or

QP = 26 + init_qp_minus26 + tile_qp_delta

where tile_qp_delta is conveyed in a codeword sent for each tile, so that the value of tile_qp_delta may differ between the tiles of a picture even when multiple tiles belong to the same slice or segment. tile_qp_delta is signaled as a tile attribute, using a loop over the tiles, according to one of the embodiments described above.

Alternatively, the QP to be used for the first block is set to QP if the first block does not contain any transform coefficients, or to QP + dQP if the first block does contain transform coefficients, where dQP is a delta QP syntax element sent in the video coding layer.
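A small numeric sketch of the two HEVC-based formulas follows. The function name and the example values are hypothetical. The variant without slice_qp_delta is modeled here by leaving that term at 0, which is numerically equivalent to omitting it.

```python
def initial_tile_qp(init_qp_minus26: int, tile_qp_delta: int, slice_qp_delta: int = 0) -> int:
    """Initial QP of a tile: 26 + init_qp_minus26 + slice_qp_delta + tile_qp_delta."""
    return 26 + init_qp_minus26 + slice_qp_delta + tile_qp_delta


# Two tiles in the same slice get different initial QPs through tile_qp_delta.
qp_tile_a = initial_tile_qp(init_qp_minus26=4, tile_qp_delta=-3, slice_qp_delta=2)
qp_tile_b = initial_tile_qp(init_qp_minus26=4, tile_qp_delta=6, slice_qp_delta=2)
# Variant without the slice-level delta.
qp_tile_c = initial_tile_qp(init_qp_minus26=4, tile_qp_delta=-3)
```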

Encoding method

Fig. 7 depicts steps in a method 100 of encoding a picture. The picture is divided into a plurality of tiles (block 102). One or more tile attributes are associated with one or more tiles (block 104). Each tile attribute includes a syntax element or flag indicating the applicability of the tile attribute. One or more tile attributes are assigned to a tile or subset of tiles (block 106). The picture and the tile attributes are signaled in a bitstream (block 110), wherein the tile attributes are signaled using one or more loops over the tiles (block 108).

Advantages of the invention

Embodiments of the present disclosure enable more flexible assignment of tile attribute values than is possible in the prior art. Multiple use cases, such as 360 degree video and 2D plus depth formats, benefit from this flexible per-tile attribute assignment because the attribute values of different tiles in a picture can be customized according to the requirements of each use case and then signaled in a compact manner using the proposed loop over the tiles.

In embodiments disclosed herein, different tile attribute values may be assigned to individual tiles using only a tile partitioning tool. This is in contrast to HEVC, which must use tile and slice partitioning tools together for the same purpose.

The benefits of the embodiments disclosed herein are greater for use cases where different attribute values are adapted to apply to different tiles in the same picture, based on the nature of the content (e.g., texture and depth information) or the properties of the content in the different tiles, such as quantization or noise levels, or the different requirements of the different tiles in terms of post-processing to suppress compression artifacts. In such applications the possibility of signaling the attribute values of each tile is very beneficial.

Some embodiments contemplated herein are described more fully with reference to the accompanying drawings. However, other embodiments are also within the scope of the subject matter disclosed herein. The disclosed subject matter should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided as examples to convey the scope of the subject matter to those skilled in the art.

Although the inventive concepts are described herein primarily as methods, and are conveyed to those skilled in the art using software pseudo code, embodiments of the disclosure may be implemented as a process or method; as an encoding or decoding apparatus; as a transitory or non-transitory computer-readable medium containing instructions operable to cause a processing circuit to perform a particular process or method; or as a computer program product operable to cause a processing circuit to perform a particular process or method.
