Attribute layer and indication improvements in point cloud coding

Document No.: 739781; Publication date: 2021-04-20

Reading note: This technology, Attribute layer and indication improvements in point cloud coding, was designed and created by Ye-Kui Wang, Fnu Hendry, and Vladyslav Zakharchenko on 2019-09-10. The main content is as follows: The invention discloses a video coding mechanism. The mechanism includes receiving a codestream comprising a plurality of sequences of coded point cloud coding (PCC) frames. The plurality of sequences of coded PCC frames represents a plurality of PCC attributes, including geometry, texture, and one or more of reflectivity, transparency, and normal. Each coded PCC frame is represented by one or more PCC network abstraction layer (NAL) units. The mechanism further includes parsing the codestream to obtain, for each PCC attribute, an indication of one of a plurality of video codecs used to decode the corresponding PCC attribute. The mechanism further includes decoding the codestream according to the indicated video codec of each PCC attribute.

1. A method implemented by a video decoder, the method comprising:

receiving, by a receiver of the decoder, a codestream comprising a plurality of sequences of coded Point Cloud Coding (PCC) frames, wherein the plurality of sequences of coded PCC frames represent a plurality of PCC attributes, the PCC attributes including geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC Network Abstraction Layer (NAL) units;

parsing, by a processor of the decoder, the codestream to obtain, for each PCC attribute, an indication of one of a plurality of video codecs (codecs) used for decoding the corresponding PCC attribute; and

decoding, by the processor, the codestream according to the video codec indicated for each PCC attribute.

2. The method of claim 1, wherein each sequence of Point Cloud Coding (PCC) frames is associated with a sequence-level data unit comprising a sequence-level parameter, and wherein the sequence-level data unit comprises a first syntax element indicating that a first attribute is coded by a first video codec and that a second attribute is coded by a second video codec.

3. The method according to claim 1 or 2, wherein the first syntax element is an identified_codec_for_attribute element contained in a frame group header in the codestream.

4. The method of any of claims 1-3, wherein the first attribute is organized into a plurality of streams, and wherein a second syntax element represents stream membership of a data unit of the codestream associated with the first attribute.

5. The method of any of claims 1-4, wherein the first attribute is organized into a plurality of layers, and wherein a third syntax element represents a layer membership of a data unit of the codestream associated with the first attribute.

6. The method according to any of claims 1 to 5, wherein the second syntax element is a num_streams_for_attribute element contained in a frame group header in the codestream, and the third syntax element is a num_layers_for_attribute element contained in the frame group header in the codestream.

7. The method of any of claims 1-6, wherein a fourth syntax element indicates that a first layer of the plurality of layers contains data associated with an irregular point cloud.

8. The method of any of claims 1-7, wherein the fourth syntax element is a regular_points_flag element included in a frame header in the codestream.

9. The method of any of claims 1-8, wherein the codestream is decoded into a decoded sequence of PCC frames, the method further comprising forwarding, by the processor, the decoded sequence of PCC frames to a display for presentation.

10. A method implemented in a video encoder, the method comprising:

encoding, by a processor of the encoder, a plurality of Point Cloud Coding (PCC) attributes of a sequence of PCC frames into a codestream using a plurality of codecs, wherein the plurality of PCC attributes includes geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC Network Abstraction Layer (NAL) units;

encoding, by the processor, for each PCC attribute, an indication of the one of the video codecs used to code the corresponding PCC attribute; and

transmitting, by a transmitter of the encoder, the codestream to a decoder.

11. The method of claim 10, wherein the sequence of PCC frames is associated with a sequence-level data unit comprising a sequence-level parameter, and wherein the sequence-level data unit comprises a first syntax element indicating that a first PCC attribute is coded by a first video codec and that a second PCC attribute is coded by a second video codec.

12. The method according to claim 10 or 11, wherein the first syntax element is an identified_codec_for_attribute element contained in a frame group header in the codestream.

13. The method of any of claims 10 to 12, wherein the first attribute is organized into a plurality of streams, and wherein a second syntax element represents stream membership of a data unit of the codestream associated with the first attribute.

14. The method of any of claims 10 to 13, wherein the first attribute is organized into a plurality of layers, and wherein a third syntax element represents a layer membership of a data unit of the codestream associated with the first attribute.

15. The method according to any of claims 10 to 14, wherein the second syntax element is a num_streams_for_attribute element contained in a frame group header in the codestream, and the third syntax element is a num_layers_for_attribute element contained in the frame group header in the codestream.

16. The method of any of claims 10-15, wherein a fourth syntax element indicates that a first layer of the plurality of layers contains data associated with an irregular point cloud.

17. The method of any of claims 10-16, wherein the fourth syntax element is a regular_points_flag element included in a frame header in the codestream.

18. A video coding device, comprising:

a processor, a receiver coupled with the processor, and a transmitter coupled with the processor, wherein the processor, receiver, and transmitter are configured to perform the method of any of claims 1-17.

19. A non-transitory computer readable medium comprising a computer program product for use with a video coding apparatus; the computer program product comprises computer executable instructions stored in the non-transitory computer readable medium; the computer-executable instructions, when executed by a processor, cause the video coding apparatus to perform the method of any of claims 1-17.

20. An encoder, comprising:

a first attribute encoding module and a second attribute encoding module for encoding a plurality of PCC attributes of a sequence of Point Cloud Coding (PCC) frames into a codestream using a plurality of codecs, wherein the plurality of PCC attributes includes geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC Network Abstraction Layer (NAL) units;

a syntax encoding module for encoding, for each PCC attribute, an indication of the one of the video codecs used to code the corresponding PCC attribute; and

a sending module for sending the codestream to a decoder.

21. The encoder according to claim 20, characterized in that the encoder is further adapted to perform the method according to any of the claims 10 to 17.

22. A decoder, comprising:

a receiving module configured to receive a codestream comprising a plurality of sequences of coded Point Cloud Coding (PCC) frames, wherein the plurality of sequences of coded PCC frames represent a plurality of PCC attributes, the PCC attributes including geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC Network Abstraction Layer (NAL) units;

a parsing module configured to parse the codestream to obtain, for each PCC attribute, an indication of one of a plurality of video codecs used for decoding the corresponding PCC attribute; and

a decoding module configured to decode the codestream according to the video codec indicated for each PCC attribute.

23. The decoder according to claim 22, characterized in that the decoder is further adapted to perform the method according to any of claims 1 to 9.

Technical Field

The present invention relates generally to video coding, and in particular to coding of video attributes of Point Cloud Coding (PCC) video frames.

Background

Even a relatively short video can require a large amount of data to describe, which may cause difficulties when the data is streamed or otherwise transmitted over a communication network with limited bandwidth capacity. Therefore, video data is typically compressed before being transmitted over modern telecommunication networks. The size of a video may also be an issue when it is stored on a storage device, since memory resources may be limited. Video compression devices typically use software and/or hardware at the source side to encode the video data for transmission or storage, thereby reducing the amount of data needed to represent digital video images. The compressed data is then received at the destination side by a video decompression device that decodes the video data. With limited network resources and ever-increasing demand for higher video quality, improved compression and decompression techniques that increase the compression ratio with little impact on image quality are desirable.

Disclosure of Invention

In one embodiment, the invention includes a method implemented by a video decoder. The method comprises: receiving, by a receiver, a codestream comprising a plurality of sequences of coded Point Cloud Coding (PCC) frames, wherein the plurality of sequences of coded PCC frames represent a plurality of PCC attributes including geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC Network Abstraction Layer (NAL) units. The method further comprises: parsing, by a processor, the codestream to obtain, for each PCC attribute, an indication of one of a plurality of video codecs (codecs) used to decode the corresponding PCC attribute. The method further comprises: decoding, by the processor, the codestream according to the video codec indicated for each PCC attribute. In some video coding systems, a single codec is used to encode an entire sequence of PCC frames. A PCC frame may include a plurality of PCC attributes, and some video codecs may encode certain PCC attributes more efficiently than other video codecs. This embodiment allows different video codecs to encode different PCC attributes of the same sequence of PCC frames. The present embodiments also provide various syntax elements to support coding flexibility when a PCC frame in a sequence employs multiple PCC attributes (e.g., three or more). By providing more attributes, the encoder can encode more complex PCC frames, and the decoder can decode and therefore display more complex PCC frames. Furthermore, by allowing different codecs to be used for different attributes, the optimal codec can be selected for each attribute. This may reduce processor resource usage at both the encoder and the decoder. In addition, this may improve compression and coding efficiency, thereby reducing memory usage and network resource usage when transmitting the codestream between an encoder and a decoder.
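As a rough illustration of this decoder-side flow, the following Python sketch dispatches each attribute substream to the codec indicated for it. The header layout, attribute names, and stub decode functions are assumptions for illustration only; they are not taken from the PCC specification.

```python
# A minimal sketch, assuming a parsed header that maps each PCC attribute
# to a codec identifier. All names below are hypothetical, not PCC syntax.

def decode_pcc_bitstream(attribute_codecs, attribute_substreams, decoders):
    """Decode each PCC attribute with the video codec indicated for it.

    attribute_codecs: dict attribute name -> codec identifier, as parsed
        from the sequence-level data unit (e.g., a frame group header).
    attribute_substreams: dict attribute name -> coded bytes.
    decoders: dict codec identifier -> decode function.
    """
    decoded = {}
    for attribute, codec_id in attribute_codecs.items():
        decode = decoders[codec_id]  # select the indicated codec
        decoded[attribute] = decode(attribute_substreams[attribute])
    return decoded

# Stub decoders stand in for real HEVC/AVC implementations.
decoders = {"hevc": lambda b: ("hevc-decoded", len(b)),
            "avc": lambda b: ("avc-decoded", len(b))}
codecs = {"geometry": "hevc", "texture": "avc", "reflectivity": "hevc"}
streams = {"geometry": b"\x00" * 8, "texture": b"\x01" * 8,
           "reflectivity": b"\x02" * 4}
print(decode_pcc_bitstream(codecs, streams, decoders))
```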

Optionally, in any of the above aspects, there is provided another implementation of the aspect: each sequence of PCC frames is associated with a sequence-level data unit that includes a sequence-level parameter, wherein the sequence-level data unit includes a first syntax element indicating that a first attribute is coded by a first video codec and that a second attribute is coded by a second video codec.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first syntax element is an identified_codec_for_attribute element contained in a frame group header in the codestream.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first attribute is organized into a plurality of streams, and a second syntax element represents stream membership of a data unit of the codestream associated with the first attribute.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first attribute is organized into a plurality of layers, and a third syntax element represents a layer membership of a data unit of the codestream associated with the first attribute.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the second syntax element is a num_streams_for_attribute element included in a frame group header in the codestream, and the third syntax element is a num_layers_for_attribute element included in the frame group header in the codestream.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: a fourth syntax element represents that a first layer of the plurality of layers includes data associated with an irregular point cloud.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the fourth syntax element is a regular_points_flag element included in a frame group header in the codestream.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the codestream is decoded into a decoded sequence of PCC frames, and the method further includes the processor forwarding the decoded sequence of PCC frames to a display for presentation.

In one embodiment, the invention includes a method implemented in a video encoder. The method comprises: encoding, by a processor, a plurality of PCC attributes of a sequence of PCC frames into a codestream using a plurality of codecs, wherein the plurality of PCC attributes includes geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC NAL units. The method further comprises: encoding, by the processor, for each PCC attribute, an indication of the one of the video codecs used to code the corresponding PCC attribute. The method further comprises: transmitting, by a transmitter, the codestream to a decoder. In some video coding systems, a single codec is used to encode an entire sequence of PCC frames. A PCC frame may include a plurality of PCC attributes, and some video codecs may encode certain PCC attributes more efficiently than other video codecs. This embodiment allows different video codecs to encode different PCC attributes of the same sequence of PCC frames. The present embodiments also provide various syntax elements to support coding flexibility when a PCC frame in a sequence employs multiple PCC attributes (e.g., three or more). By providing more attributes, the encoder can encode more complex PCC frames, and the decoder can decode and therefore display more complex PCC frames. Furthermore, by allowing different codecs to be used for different attributes, the optimal codec can be selected for each attribute. This may reduce processor resource usage at both the encoder and the decoder. In addition, this may improve compression and coding efficiency, thereby reducing memory usage and network resource usage when transmitting the codestream between an encoder and a decoder.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the sequence of PCC frames is associated with a sequence-level data unit that includes a sequence-level parameter, wherein the sequence-level data unit includes a first syntax element indicating that a first PCC attribute is coded by a first video codec and that a second PCC attribute is coded by a second video codec.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first syntax element is an identified_codec_for_attribute element contained in a frame group header in the codestream.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first attribute is organized into a plurality of streams, and a second syntax element represents stream membership of a data unit of the codestream associated with the first attribute.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first attribute is organized into a plurality of layers, and a third syntax element represents a layer membership of a data unit of the codestream associated with the first attribute.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the second syntax element is a num_streams_for_attribute element included in a frame group header in the codestream, and the third syntax element is a num_layers_for_attribute element included in the frame group header in the codestream.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: a fourth syntax element represents that a first layer of the plurality of layers includes data associated with an irregular point cloud.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the fourth syntax element is a regular_points_flag element included in a frame group header in the codestream.

In one embodiment, the invention includes a video coding apparatus comprising: a processor, a receiver coupled to the processor, and a transmitter coupled to the processor, wherein the processor, receiver, and transmitter are configured to perform the method according to any of the above aspects.

In one embodiment, the invention includes a non-transitory computer readable medium comprising a computer program product for use with a video coding apparatus; the computer program product comprises computer executable instructions stored in the non-transitory computer readable medium; the computer-executable instructions, when executed by a processor, cause the video coding apparatus to perform a method according to any one of the above aspects.

In one embodiment, the present invention includes an encoder comprising a first attribute encoding module and a second attribute encoding module for encoding a plurality of PCC attributes of a sequence of PCC frames into a codestream using a plurality of codecs, wherein the plurality of PCC attributes includes geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC NAL units. The encoder further includes a syntax encoding module for encoding, for each PCC attribute, an indication of the one of the video codecs used to code the corresponding PCC attribute. The encoder further includes a sending module for sending the codestream to a decoder.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the encoder is further configured to perform the method according to any of the above aspects.

In one embodiment, the present invention includes a decoder comprising a receiving module for receiving a codestream comprising a plurality of sequences of coded PCC frames, wherein the plurality of sequences of coded PCC frames represent a plurality of PCC attributes, the PCC attributes including geometry, texture, and one or more of reflectivity, transparency, and normal, and each coded PCC frame is represented by one or more PCC NAL units. The decoder further includes a parsing module for parsing the codestream to obtain, for each PCC attribute, an indication of one of a plurality of video codecs used for decoding the corresponding PCC attribute. The decoder further includes a decoding module for decoding the codestream according to the video codec indicated for each PCC attribute.

Optionally, in any of the above aspects, there is provided another implementation of the aspect: the decoder is also configured to perform the method according to any of the above aspects.

For the sake of clarity, any of the above-described embodiments may be combined with any one or more of the other embodiments described above to create new embodiments within the scope of the invention.

These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

Drawings

For a more complete understanding of the present invention, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

FIG. 1 is a flow diagram of an exemplary method of coding a video signal;

FIG. 2 is a schematic diagram of an exemplary encoding and decoding (codec) system for video coding;

FIG. 3 is a schematic diagram of an exemplary video encoder;

FIG. 4 is a schematic diagram of an exemplary video decoder;

FIG. 5 is an example of point cloud media that may be coded according to a PCC mechanism;

FIG. 6 is an example of data segmentation and packing of a point cloud media frame;

FIG. 7 is a schematic diagram of an exemplary PCC video stream with an extended set of attributes;

FIG. 8 is a schematic diagram of an exemplary mechanism for encoding PCC attributes using multiple codecs;

FIG. 9 is a schematic diagram of an example of attribute layers;

FIG. 10 is a schematic diagram of an example of attribute streams;

FIG. 11 is a flow diagram of an exemplary method for encoding a PCC video sequence using multiple codecs;

FIG. 12 is a flow diagram of an exemplary method of decoding a PCC video sequence using multiple codecs;

FIG. 13 is a schematic diagram of an exemplary video coding apparatus;

FIG. 14 is a schematic diagram of an exemplary system for coding PCC video sequences using multiple codecs;

FIG. 15 is a flow diagram of another exemplary method of encoding a PCC video sequence using multiple codecs;

FIG. 16 is a flow diagram of another exemplary method of decoding a PCC video sequence using multiple codecs.

Detailed Description

It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or not yet in existence. The invention should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

Many video compression techniques can be employed to reduce video file size with minimal loss of data. For example, video compression techniques may include performing spatial (e.g., intra) prediction and/or temporal (e.g., inter) prediction to reduce or remove data redundancy in a video sequence. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as tree blocks (treeblocks), Coding Tree Blocks (CTBs), Coding Tree Units (CTUs), Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are coded using spatial prediction with respect to reference samples in neighboring blocks within the same picture. Video blocks in an inter-coded (P or B) slice of a picture may be coded using spatial prediction with respect to reference samples in neighboring blocks within the same picture, or using temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame. Spatial or temporal prediction produces a prediction block that represents an image block. Residual data represents the pixel differences between the original image block and the prediction block. Accordingly, an inter-coded block is coded according to a motion vector that points to a block of reference samples forming the prediction block and residual data representing the difference between the coded block and the prediction block. An intra-coded block is coded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients that may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve further compression. Such video compression techniques are discussed in more detail below.

To ensure that encoded video can be correctly decoded, video is encoded and decoded according to corresponding video coding standards. These include International Telecommunication Union (ITU) standardization sector (ITU-T) H.261, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group (MPEG)-1 part 2, ITU-T H.262 or ISO/IEC MPEG-2 part 2, ITU-T H.263, ISO/IEC MPEG-4 part 2, Advanced Video Coding (AVC) (also known as ITU-T H.264 or ISO/IEC MPEG-4 part 10), and High Efficiency Video Coding (HEVC) (also known as ITU-T H.265 or MPEG-H part 2). AVC includes Scalable Video Coding (SVC), Multiview Video Coding (MVC), multiview plus depth video coding (MVC+D), and three-dimensional (3D) AVC (3D-AVC) extensions. HEVC includes Scalable HEVC (SHVC), multi-view HEVC (MV-HEVC), and 3D HEVC (3D-HEVC) extensions. The Joint Video Experts Team (JVET) of ITU-T and ISO/IEC has begun developing a video coding standard referred to as Versatile Video Coding (VVC). VVC is described in a Working Draft (WD), which includes JVET-K1001-v4 and JVET-K1002-v1.

PCC is a mechanism for encoding video of 3D objects. A point cloud is a collection of data points in 3D space. These data points include parameters that determine, for example, a position in space and a color. Point clouds may be used in various applications such as real-time 3D immersive telepresence, virtual reality (VR) viewing of content with interactive parallax, 3D free-viewpoint sports replay broadcasting, geographic information systems, cultural heritage, autonomous navigation based on large-scale 3D dynamic maps, and automotive applications. The ISO/IEC MPEG codec for PCC can operate on lossless and/or lossy compressed point cloud data with high coding efficiency and robustness to network environments. Using such a codec, point clouds can be processed as a form of computer data, stored on various storage media, transmitted and received over networks, and distributed over broadcast channels. PCC coding environments are divided into PCC category 1, PCC category 2, and PCC category 3. The present invention is directed to PCC category 2, which is related to MPEG output documents N17534 and N17533. The design of the PCC category 2 codec aims to leverage other video codecs to compress the geometry and texture information of a dynamic point cloud by compressing the point cloud data as a collection of different video sequences. For example, two video sequences, one representing the geometry information of the point cloud data and another representing the texture information, may be generated and compressed using one or more video codecs. Additional metadata that supports interpreting the video sequences (e.g., an occupancy map and auxiliary patch information) may also be generated and compressed separately.
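To make the category 2 approach concrete, the sketch below orthographically projects a colored point cloud onto a single plane, producing one geometry (depth) image and one texture (color) image that ordinary 2D video codecs could then compress. This single-plane projection is a deliberate simplification of the patch-based, multi-direction projection used by actual PCC category 2 codecs.

```python
import numpy as np

def project_to_maps(points, colors, size=64):
    """Project points (N,3) with colors (N,3) onto the XY plane.

    Returns a depth (geometry) image and a color (texture) image.
    Only the nearest point per pixel is kept, a simplification of the
    patch-based projection used by actual PCC category 2 codecs.
    """
    depth = np.full((size, size), np.inf)
    texture = np.zeros((size, size, 3), dtype=np.uint8)
    # Normalize x/y coordinates onto the image grid.
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    px = ((xy - lo) / (hi - lo + 1e-9) * (size - 1)).astype(int)
    for (u, v), z, c in zip(px, points[:, 2], colors):
        if z < depth[v, u]:            # keep the nearest point per pixel
            depth[v, u] = z
            texture[v, u] = c
    depth[np.isinf(depth)] = 0         # unoccupied pixels
    return depth, texture

rng = np.random.default_rng(0)
pts = rng.random((500, 3))
cols = (rng.random((500, 3)) * 255).astype(np.uint8)
geometry_img, texture_img = project_to_maps(pts, cols)
print(geometry_img.shape, texture_img.shape)
```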

A PCC system may support a geometry PCC attribute that includes position data and a texture PCC attribute that includes color data. However, some video applications may involve other types of data, such as reflectivity, transparency, and normal vectors. Some of these data types may be coded more efficiently by certain codecs than by others. However, a PCC system may require that the entire PCC stream be encoded by the same codec, and hence that all PCC attributes be encoded by the same codec. Further, the PCC attributes may be partitioned into multiple layers. The layers may then be combined and/or coded into one or more PCC attribute streams. For example, the layers of an attribute may be coded according to a time-interleaved coding scheme in which a first layer is coded in PCC access units (AUs) whose picture output order is an even value and a second layer is coded in PCC AUs whose picture output order is an odd value; a sketch of this mapping follows below. Since there may be zero to four streams per attribute, and such streams may have various layer combinations, correctly identifying streams and layers can be challenging. However, a PCC system may be unable to determine how many layers are coded or combined in a given PCC codestream. Furthermore, a PCC system may have no mechanism to indicate the manner in which the layers are combined and/or to indicate the correspondence between the layers and the PCC attribute streams. Finally, PCC video data is coded using patches. For example, a three-dimensional (3D) PCC object may be represented as a set of two-dimensional (2D) patches. This allows PCC to be used in conjunction with video codecs designed for encoding 2D video frames. However, in some cases, some points in the point cloud may not be captured by any patch. For example, isolated points in 3D space may be difficult to code as part of a patch. In such cases, the only meaningful patch is a one-pixel-by-one-pixel patch containing a single point, which can significantly increase the signaling overhead when many such points exist. Instead, an irregular point cloud may be used, which is a special patch containing a number of isolated points. For irregular point cloud patches, the attributes may be indicated using a different method than for other patch types. However, a PCC system may be unable to indicate that a PCC attribute layer carries irregular point cloud points/patches.
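The time-interleaved two-layer scheme mentioned above can be sketched as follows; the only point illustrated is the parity mapping between picture output order and layer membership (the frame labels are hypothetical):

```python
def layer_of_access_unit(output_order_index):
    """Map a PCC access unit to an attribute layer under the
    time-interleaved scheme described above: even output order
    carries layer 0, odd output order carries layer 1."""
    return output_order_index % 2

def interleave_layers(layer0_frames, layer1_frames):
    """Merge two attribute layers into one coded stream, alternating
    frames so that output-order parity identifies the layer."""
    stream = []
    for f0, f1 in zip(layer0_frames, layer1_frames):
        stream.extend([f0, f1])
    return stream

stream = interleave_layers(["L0_f0", "L0_f1"], ["L1_f0", "L1_f1"])
for i, frame in enumerate(stream):
    print(i, frame, "-> layer", layer_of_access_unit(i))
```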

Disclosed herein are mechanisms for improving PCC by addressing the problems noted above. In one embodiment, the PCC system may use different codecs to code different PCC attributes. In particular, a separate syntax element may be used to identify the video codec for each attribute. In another embodiment, the PCC system explicitly indicates the number of layers that are coded and/or combined to represent each PCC attribute stream. Further, the PCC system may use syntax elements to indicate the manner in which PCC attribute layers are coded and/or combined in a PCC attribute stream. Further, the PCC system may use one or more syntax elements to indicate a layer index for the layer associated with each data unit of the corresponding PCC attribute stream. In yet another embodiment, a flag may be used for each PCC attribute layer to indicate whether the PCC attribute layer carries any irregular point cloud points. These embodiments may be used alone or in combination. Together, they allow PCC systems to use more sophisticated coding mechanisms in a manner that the decoder can recognize and therefore decode. These and other examples are described in detail below.
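The following sketch parses a toy frame group header carrying the four syntax elements named above. The byte layout is invented for illustration; the actual PCC syntax, descriptor types, and entropy coding differ.

```python
import struct

def parse_frame_group_header(data, num_attributes):
    """Parse a toy frame group header: for each attribute, one byte of
    codec id, one byte of stream count, one byte of layer count, then
    one regular_points_flag byte per layer. Layout is illustrative only."""
    offset = 0
    attributes = []
    for _ in range(num_attributes):
        codec_id, num_streams, num_layers = struct.unpack_from("BBB", data, offset)
        offset += 3
        flags = list(struct.unpack_from(f"{num_layers}B", data, offset))
        offset += num_layers
        attributes.append({
            "identified_codec_for_attribute": codec_id,
            "num_streams_for_attribute": num_streams,
            "num_layers_for_attribute": num_layers,
            "regular_points_flag": flags,  # 0 marks a layer carrying irregular points
        })
    return attributes

# Two attributes: geometry (codec 0, 1 stream, 2 regular layers) and
# texture (codec 1, 1 stream, 1 layer flagged as irregular).
header_bytes = bytes([0, 1, 2, 1, 1, 1, 1, 1, 0])
for attr in parse_frame_group_header(header_bytes, 2):
    print(attr)
```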

FIG. 1 is a flow chart of an exemplary operating method 100 for coding a video signal. Specifically, a video signal is encoded at an encoder. The encoding process compresses the video signal by employing various mechanisms to reduce the video file size. A smaller file size allows the compressed video file to be transmitted to a user while reducing the associated bandwidth overhead. A decoder then decodes the compressed video file to reconstruct the original video signal for display to an end user. The decoding process generally mirrors the encoding process, so that the video signal reconstructed at the decoder matches the video signal at the encoder side.

In step 101, the video signal is input into the encoder. For example, the video signal may be an uncompressed video file stored in memory. As another example, the video file may be captured by a video capture device (e.g., a video camera) and encoded to support live streaming of the video. The video file may include both an audio component and a video component. The video component comprises a series of image frames that, when viewed in sequence, give a visual impression of motion. These frames include pixels that are expressed in terms of light, referred to herein as luminance components (or luminance samples), and pixels that are expressed in terms of color, referred to as chrominance components (or color samples). In some examples, the frames may also include depth values to support three-dimensional viewing.
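For reference, an RGB pixel can be converted into one luminance sample and two chrominance samples using, for example, the BT.601 full-range conversion matrix; a minimal sketch:

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 full-range RGB -> (luma Y, chroma Cb, chroma Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))  # a pure red pixel
```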

In step 103, the video is divided into blocks. Segmentation involves subdividing the pixels in each frame into square and/or rectangular blocks for compression. For example, in High Efficiency Video Coding (HEVC), also referred to as h.265 and MPEG-H part 2, a frame may first be divided into Coding Tree Units (CTUs), which are blocks of a predefined size (e.g., 64 pixels by 64 pixels). These CTUs include luma samples and chroma samples. The CTUs may be partitioned into blocks using a coding tree and then repeatedly subdivided until a configuration is obtained that supports further coding. For example, the luminance component of a frame may be subdivided until each block includes relatively uniform luminance values. In addition, the chroma components of a frame may be subdivided until each block includes a relatively uniform color value. Therefore, the segmentation mechanism differs according to the content of the video frame.
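A minimal sketch of the uniformity-driven subdivision described above, using a quadtree split of a 64x64 luma block; the variance threshold and minimum block size are illustrative assumptions, not values from any standard.

```python
import numpy as np

def quadtree_split(block, min_size=8, var_threshold=100.0, origin=(0, 0)):
    """Recursively split a square pixel block until each leaf is
    relatively uniform (low variance) or reaches the minimum size.
    Returns a list of (y, x, size) leaf blocks."""
    h, w = block.shape
    y0, x0 = origin
    if h <= min_size or block.var() <= var_threshold:
        return [(y0, x0, h)]
    half = h // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            leaves += quadtree_split(sub, min_size, var_threshold,
                                     (y0 + dy, x0 + dx))
    return leaves

rng = np.random.default_rng(1)
ctu = np.zeros((64, 64))
ctu[32:, 32:] = rng.normal(128, 30, (32, 32))  # one non-uniform quadrant
print(len(quadtree_split(ctu)), "leaf blocks")
```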

In step 105, the image blocks partitioned in step 103 are compressed using various compression mechanisms. For example, inter prediction and/or intra prediction may be employed. Inter prediction exploits the fact that objects in a common scene tend to appear in successive frames. Accordingly, a block depicting an object in a reference frame need not be repeatedly described in adjacent frames. Specifically, an object, such as a table, may remain in a constant position over multiple frames. Hence, the table is described once, and adjacent frames can refer back to the reference frame. Pattern-matching mechanisms may be employed to match objects over multiple frames. Further, a moving object may be represented across multiple frames, for example due to object movement or camera movement. As a particular example, a video may show a car moving across the screen over multiple frames. Motion vectors may be employed to describe such movement. A motion vector is a two-dimensional vector that provides an offset from the coordinates of an object in a frame to the coordinates of the object in a reference frame. As such, inter prediction can encode an image block in a current frame as a set of motion vectors indicating the offset between the image block in the current frame and a corresponding block in a reference frame.
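A minimal full-search sketch of the motion-vector idea, under stated simplifications (a tiny search window and a plain sum-of-absolute-differences metric):

```python
import numpy as np

def motion_search(ref, cur_block, top_left, search_range=4):
    """Full search for the motion vector minimizing the sum of absolute
    differences (SAD) between cur_block and a same-sized block in the
    reference frame, within +/- search_range of top_left."""
    y0, x0 = top_left
    bh, bw = cur_block.shape
    best_mv, best_sad = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue
            sad = np.abs(ref[y:y + bh, x:x + bw] - cur_block).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

rng = np.random.default_rng(2)
ref = rng.integers(0, 256, (32, 32)).astype(np.int64)
cur_block = ref[10:18, 12:20]          # content displaced by (2, 4) from (8, 8)
print(motion_search(ref, cur_block, (8, 8)))  # -> ((2, 4), 0)
```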

Intra-prediction is used to encode blocks in a common frame. Intra prediction exploits the fact that luminance and chrominance components tend to cluster in a frame. For example, a patch of green in a portion of a tree tends to be positioned adjacent to similar patches of green. Intra prediction employs multiple directional prediction modes (e.g., thirty-three in HEVC), a planar mode, and a Direct Current (DC) mode. The directional modes indicate that the samples of the current block are similar/identical to the samples of a neighboring block in the corresponding direction. Planar mode indicates that a series of blocks along a row/column (e.g., a plane) can be interpolated from the neighboring blocks at the edges of the row. Planar mode, in effect, indicates a smooth transition of light/color across a row/column with a relatively constant slope in the changing values. DC mode is employed for boundary smoothing and indicates that a block is similar/identical to the average of the samples of all the neighboring blocks associated with the angular directions of the directional prediction modes. Accordingly, intra-prediction blocks can represent image blocks as various relational prediction mode values instead of actual values, and inter-prediction blocks can represent image blocks as motion vector values instead of actual values. In either case, the prediction block may not exactly represent the image block. Any differences are stored in a residual block, and transforms may be applied to the residual block to further compress the file.
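A minimal sketch of DC-mode intra prediction and the residual it leaves, with neighbor handling simplified to the row above and the column to the left of the block:

```python
import numpy as np

def dc_intra_predict(above, left, size):
    """DC mode: predict every sample as the mean of the reconstructed
    neighbors above and to the left of the current block."""
    dc = np.concatenate([above, left]).mean()
    return np.full((size, size), dc)

above = np.array([100, 102, 101, 99], dtype=float)  # row above the block
left = np.array([98, 101, 100, 103], dtype=float)   # column left of the block
original = np.full((4, 4), 100.0)
prediction = dc_intra_predict(above, left, 4)
residual = original - prediction                    # what actually gets coded
print(prediction[0, 0], residual.sum())             # 100.5, -8.0
```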

In step 107, various filtering techniques may be applied. In HEVC, the filters are applied according to an in-loop filtering scheme. The block-based prediction discussed above can create blocky images at the decoder side. Further, a block-based prediction scheme may encode a block and then reconstruct the encoded block for later use as a reference block. The in-loop filtering scheme iteratively applies a noise suppression filter, a deblocking filter, an adaptive loop filter, and a Sample Adaptive Offset (SAO) filter to the blocks/frames. These filters mitigate blocking artifacts so that the encoded file can be accurately reconstructed. They also mitigate artifacts in the reconstructed reference blocks, making the artifacts less likely to create additional artifacts in subsequent blocks that are encoded based on the reconstructed reference blocks.

Once the video signal has been partitioned, compressed, and filtered, the resulting data is encoded into a codestream in step 109. The codestream includes the data discussed above as well as any signaling data desired to support proper video signal reconstruction at the decoder. Such data may include, for example, partition data, prediction data, residual blocks, and various flags providing coding instructions to the decoder. The codestream may be stored in memory for transmission toward a decoder upon request. The codestream may also be broadcast and/or multicast toward a plurality of decoders. Creation of the codestream is an iterative process. Accordingly, steps 101, 103, 105, 107, and 109 may occur continuously and/or simultaneously over many frames and blocks. The order shown in FIG. 1 is presented for clarity and ease of description, and is not intended to limit the video coding process to a particular order.

In step 111, the decoder receives the codestream and begins the decoding process. Specifically, the decoder employs an entropy decoding scheme to convert the codestream into corresponding syntax data and video data. The decoder uses the syntax data in the codestream to determine the partitions for the frames in step 111. The partitioning should match the results of the block partitioning in step 103. The entropy encoding/decoding employed in step 111 is now described. The encoder makes many choices during the compression process, such as selecting a block partitioning scheme from several possible choices based on the spatial positioning of values in the input image(s). Signaling the exact choices may employ a large number of binary symbols (bins). As used herein, a bin is a binary value that is treated as a variable (e.g., a bit value that may vary depending on context). Entropy coding allows the encoder to discard any options that are clearly not viable for a particular case, leaving a set of allowable options. Each allowable option is then assigned a code word. The length of the code word depends on the number of allowable options (e.g., one bin for two options, two bins for three to four options, etc.). The encoder then encodes the code word for the selected option. This scheme shrinks the code words, because each code word is only as large as needed to uniquely indicate a selection from the small subset of allowable options, as opposed to uniquely indicating a selection from a potentially large set of all possible options. The decoder then decodes the selection by determining the set of allowable options in a manner similar to the encoder. By determining the set of allowable options, the decoder can read the code word and determine the selection made by the encoder.
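The code-word-size argument can be made concrete: with n remaining options, a fixed-length code word needs ceil(log2(n)) bins, so pruning clearly unsuitable options directly shortens the code word. A small sketch:

```python
import math

def bins_needed(num_options):
    """Binary symbols required for a fixed-length code word over
    num_options equally likely choices."""
    return max(1, math.ceil(math.log2(num_options)))

all_options = 64           # every possible choice, e.g., partitioning schemes
allowable_options = 3      # choices left after context rules out the rest
print(bins_needed(all_options), "bins without pruning")     # 6
print(bins_needed(allowable_options), "bins with pruning")  # 2
```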

In step 113, the decoder performs block decoding. Specifically, the decoder performs an inverse transform, generating a residual block. The decoder then reconstructs the image block from the partition using the residual block and the corresponding prediction block. The prediction blocks may include intra-prediction blocks and inter-prediction blocks generated by the encoder in step 105. The reconstructed image block is then placed in a frame of the reconstructed video signal according to the segmentation data determined in step 111. The syntax for step 113 may also be indicated in the codestream by entropy coding as described above.

In step 115, filtering is performed on the frames of the reconstructed video signal in a manner similar to step 107 on the encoder side. For example, noise suppression filters, deblocking filters, adaptive loop filters, and SAO filters may be applied to the frames to remove blocking artifacts. Once the frame has been filtered, the video signal may be output to a display for viewing by an end user in step 117.

FIG. 2 is a schematic diagram of an exemplary encoding and decoding (codec) system 200 for video coding. Specifically, the codec system 200 provides functionality to support implementation of the operating method 100. The codec system 200 is generalized to describe components employed in both an encoder and a decoder. The codec system 200 receives a video signal and partitions the video signal as described in connection with steps 101 and 103 of the operating method 100, which results in a partitioned video signal 201. When acting as an encoder, the codec system 200 then compresses the partitioned video signal 201 into a coded bitstream, as described in connection with steps 105, 107, and 109 of method 100. When acting as a decoder, the codec system 200 generates an output video signal from the codestream, as described in connection with steps 111, 113, 115, and 117 of operating method 100. The codec system 200 includes a general decoder control component 211, a transform scaling and quantization component 213, an intra estimation component 215, an intra prediction component 217, a motion compensation component 219, a motion estimation component 221, a scaling and inverse transform component 229, a filter control analysis component 227, an in-loop filter component 225, a decoded picture buffer component 223, and a header formatting and Context Adaptive Binary Arithmetic Coding (CABAC) component 231. These components are coupled as shown. In FIG. 2, black lines indicate the movement of data to be encoded/decoded, while dashed lines indicate the movement of control data that controls the operation of other components. The components of the codec system 200 may all be present in the encoder. The decoder may include a subset of the components of the codec system 200. For example, the decoder may include an intra prediction component 217, a motion compensation component 219, a scaling and inverse transform component 229, an in-loop filter component 225, and a decoded picture buffer component 223. These components are described below.

The partitioned video signal 201 is a captured video sequence that has been partitioned into blocks of pixels by a coding tree. The coding tree subdivides blocks of pixels into smaller blocks of pixels using various partitioning modes. These blocks may then be further subdivided into smaller blocks. The blocks may be referred to as nodes on the coding tree. Larger parent nodes are divided into smaller child nodes. The number of times a node is subdivided is referred to as the depth of the node/coding tree. In some cases, the divided blocks may be included in a Coding Unit (CU). For example, a CU may be a sub-portion of a CTU including a luma block, one or more red chroma (Cr) blocks, and one or more blue chroma (Cb) blocks, together with the syntax instructions corresponding to the CU. Partitioning modes may include Binary Trees (BT), Ternary Trees (TT), and Quaternary Trees (QT) for partitioning a node into two, three, or four child nodes of different shapes, depending on the partitioning mode used. The partitioned video signal 201 is forwarded to the general decoder control component 211, the transform scaling and quantization component 213, the intra estimation component 215, the filter control analysis component 227, and the motion estimation component 221 for compression.

The general decoder control component 211 is configured to make decisions related to coding the pictures of the video sequence into the bitstream according to application constraints. For example, the general decoder control component 211 manages optimization of the code rate/codestream size relative to the reconstruction quality. Such decisions may be made based on storage space/bandwidth availability and image resolution requests. The general decoder control component 211 also manages buffer utilization in light of transmission speed to mitigate buffer underrun and overrun issues. To manage these issues, the general decoder control component 211 manages partitioning, prediction, and filtering by the other components. For example, the general decoder control component 211 may dynamically increase compression complexity to increase resolution and bandwidth usage, or decrease compression complexity to decrease resolution and bandwidth usage. Hence, the general decoder control component 211 controls the other components of the codec system 200 to balance video signal reconstruction quality against code rate. The general decoder control component 211 creates control data that is used to control the operation of the other components. The control data is also forwarded to the header formatting and CABAC component 231 to be encoded in the codestream, to indicate the parameters used by the decoder for decoding.

The partitioned video signal 201 is also sent to the motion estimation component 221 and the motion compensation component 219 for inter prediction. A frame or slice of the partitioned video signal 201 may be divided into multiple video blocks. The motion estimation component 221 and the motion compensation component 219 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. The codec system 200 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

The motion estimation component 221 and the motion compensation component 219 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by the motion estimation component 221, is the process of generating motion vectors, which estimate the motion of video blocks. A motion vector, for example, may indicate the displacement of a coded object relative to a prediction block. A prediction block is a block that closely matches the block to be coded in terms of pixel differences. A prediction block may also be referred to as a reference block. Such pixel differences may be determined by Sum of Absolute Difference (SAD), Sum of Squared Difference (SSD), or other difference metrics. HEVC employs several coded objects, including CTUs, Coding Tree Blocks (CTBs), and CUs. For example, a CTU may be divided into CTBs, which may then be divided into CBs for inclusion in CUs. A CU may be encoded as a Prediction Unit (PU) containing prediction data and/or a Transform Unit (TU) containing transformed residual data for the CU. The motion estimation component 221 generates motion vectors, PUs, and TUs by performing a rate-distortion analysis as part of a rate-distortion optimization process. For example, the motion estimation component 221 may determine multiple reference blocks, multiple motion vectors, etc. for a current block/frame, and may select the reference block, motion vector, etc. having the best rate-distortion characteristics. The best rate-distortion characteristics balance the quality of the video reconstruction (e.g., the amount of data loss due to compression) against coding efficiency (e.g., the size of the final encoding).

In some examples, the codec system 200 may calculate values for sub-integer pixel positions of reference pictures stored in the decoded picture buffer component 223. For example, the video codec system 200 may interpolate one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference image. Thus, the motion estimation component 221 can perform a motion search with respect to integer pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision. Motion estimation component 221 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the locations of the PUs to locations of prediction blocks of a reference picture. The motion estimation component 221 outputs the calculated motion vector as motion data to the header formatting and CABAC component 231 for encoding, and as motion data to the motion compensation component 219.

The motion compensation performed by motion compensation component 219 may include retrieving or generating a prediction block based on the motion vector determined by motion estimation component 221. Also, in some examples, motion estimation component 221 and motion compensation component 219 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation component 219 may locate the prediction block to which the motion vector points. The pixel values of the predicted block are then subtracted from the pixel values of the current video block being coded to obtain pixel differences, forming a residual video block. In general, motion estimation component 221 performs motion estimation with respect to the luma component, and motion compensation component 219 uses the motion vectors calculated from the luma component for the chroma component and the luma component. The prediction block and the residual block are forwarded to the transform scaling and quantization component 213.

The partitioned video signal 201 is also sent to an intra estimation component 215 and an intra prediction component 217. As with the motion estimation component 221 and the motion compensation component 219, the intra estimation component 215 and the intra prediction component 217 may be highly integrated, but are illustrated separately for conceptual purposes. The intra estimation component 215 and the intra prediction component 217 intra-predict the current block relative to blocks in the current frame, as an alternative to the inter prediction performed between frames by the motion estimation component 221 and the motion compensation component 219 as described above. In particular, the intra estimation component 215 determines an intra prediction mode to use to encode the current block. In some examples, the intra estimation component 215 selects an appropriate intra prediction mode to encode the current block from multiple tested intra prediction modes. The selected intra prediction mode is then forwarded to the header formatting and CABAC component 231 for encoding.

For example, the intra estimation component 215 performs rate distortion analysis on various tested intra prediction modes, calculates rate distortion values, and selects the intra prediction mode with the best rate distortion characteristics among the tested modes. Rate-distortion analysis is generally used to determine the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as to determine the code rate (e.g., number of bits) used to produce the encoded block. Intra estimation component 215 calculates ratios from the distortion and rate of various encoded blocks to determine the intra prediction mode that results in the best rate-distortion value for the block. In addition, the intra estimation component 215 may be used to code depth blocks in a depth image using a Depth Modeling Mode (DMM) according to rate-distortion optimization (RDO).

When implemented at an encoder, the intra prediction component 217 may generate a residual block from the prediction block according to the intra prediction mode determined by the intra estimation component 215 or, when implemented at a decoder, read the residual block from the codestream. The residual block includes the difference in values between the prediction block and the original block, represented as a matrix. The residual block is then forwarded to the transform scaling and quantization component 213. The intra estimation component 215 and the intra prediction component 217 may operate on both the luma component and the chroma components.

The transform scaling and quantization component 213 is used to further compress the residual block. The transform scaling and quantization component 213 performs a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), or the like, or a conceptually similar transform on the residual block, thereby generating a video block comprising residual transform coefficient values. Wavelet transforms, integer transforms, subband transforms, or other types of transforms may also be used. The transform may convert the residual information from the pixel domain to a transform domain (e.g., frequency domain). The transform scaling and quantization component 213 is also used to scale the transform residual information according to frequency, etc. Such scaling involves applying a scaling factor to the residual information in order to quantize different frequency information at different granularities, which may affect the final visual quality of the reconstructed video. The transform scaling and quantization component 213 is also used to quantize the transform coefficients to further reduce the code rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting the quantization parameter. In some examples, the transform scaling and quantization component 213 may then perform a scan of a matrix comprising quantized transform coefficients. The quantized transform coefficients are forwarded to the header formatting and CABAC component 231 for encoding in the bitstream.
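A minimal sketch of this transform-and-quantize step on a residual block, using an orthonormal DCT-II and a single uniform quantization step size. Real codecs use integer transform approximations and frequency-dependent scaling lists; this illustrates the principle only.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2 / n)
    m[0, :] = np.sqrt(1 / n)
    return m

def transform_quantize(residual, qstep):
    """Forward 2D DCT of a residual block, then uniform quantization."""
    d = dct2_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T
    return np.round(coeffs / qstep).astype(int)

def dequantize_inverse(levels, qstep):
    """Inverse of transform_quantize: rescale, then inverse 2D DCT."""
    d = dct2_matrix(levels.shape[0])
    return d.T @ (levels * qstep) @ d

residual = np.outer(np.arange(4), np.ones(4))  # a smooth residual compresses well
levels = transform_quantize(residual, qstep=2.0)
print(levels)                                  # mostly zeros after quantization
print(np.abs(dequantize_inverse(levels, 2.0) - residual).max())  # small error
```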

The scaling and inverse transform component 229 performs the inverse operations of the transform scaling and quantization component 213 to support motion estimation. The scaling and inverse transform component 229 performs inverse scaling, inverse transformation, and/or inverse quantization to reconstruct the residual block in the pixel domain, e.g., for subsequent use as a reference block. The reference block may become a prediction block for another current block. Motion estimation component 221 and/or motion compensation component 219 may add the residual block back to the corresponding prediction block to compute a reference block for motion estimation of a subsequent block/frame. Filters are applied to the reconstructed reference block to reduce artifacts generated during scaling, quantization and transformation. These artifacts may render the prediction inaccurate (and create additional artifacts) when predicting subsequent blocks.

The filter control analysis component 227 and the in-loop filter component 225 apply the filters to the residual blocks and/or to reconstructed image blocks. For example, the transformed residual block from the scaling and inverse transform component 229 may be combined with a corresponding prediction block from the intra prediction component 217 and/or the motion compensation component 219 to reconstruct the original image block. The filters may then be applied to the reconstructed image block. In some examples, the filters may instead be applied to the residual blocks. As with other components in FIG. 2, the filter control analysis component 227 and the in-loop filter component 225 are highly integrated and may be implemented together, but are depicted separately for conceptual purposes. Filters applied to the reconstructed reference blocks are applied to particular spatial regions and include multiple parameters to adjust how such filters are applied. The filter control analysis component 227 analyzes the reconstructed reference blocks to determine where such filters should be applied and sets the corresponding parameters. Such data is forwarded to the header formatting and CABAC component 231 as filter control data for encoding. The in-loop filter component 225 applies such filters based on the filter control data. The filters may include a deblocking filter, a noise suppression filter, an SAO filter, and an adaptive loop filter. Such filters may be applied in the spatial/pixel domain (e.g., on a reconstructed block of pixels) or in the frequency domain, depending on the example.

When operating as an encoder, the filtered reconstructed image blocks, residual blocks, and/or prediction blocks are stored in the decoded picture buffer component 223 for subsequent use in motion estimation, as described above. When operating as a decoder, the decoded picture buffer component 223 stores the filtered reconstructed blocks and forwards them to a display as part of the output video signal. The decoded picture buffer component 223 may be any storage device capable of storing prediction blocks, residual blocks, and/or reconstructed image blocks.

The header formatting and CABAC component 231 receives data from the various components in the codec system 200 and encodes these data into a coded codestream for transmission to a decoder. In particular, the header formatting and CABAC component 231 generates various headers to encode control data (e.g., overall control data and filter control data). Furthermore, prediction data (including intra prediction and motion data) and residual data in the form of quantized transform coefficient data are encoded into the codestream. The final codestream includes all the information needed by the decoder to reconstruct the original segmented video signal 201. This information may also include an intra prediction mode index table (also referred to as a codeword mapping table), definitions of coding contexts for various blocks, an indication of the most probable intra prediction modes, an indication of partition information, and so forth. These data may be encoded using entropy coding. For example, Context Adaptive Variable Length Coding (CAVLC), CABAC, syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or other entropy coding techniques may be employed to encode the above information. After entropy encoding, the coded bitstream may be sent to another device (e.g., a video decoder) or archived for subsequent transmission or retrieval.

Fig. 3 is a block diagram of an exemplary video encoder 300. The video encoder 300 may be used to implement the encoding function of the codec system 200 and/or perform steps 101, 103, 105, 107, and/or 109 of the method of operation 100. The encoder 300 partitions the input video signal, resulting in a segmented video signal 301 that is substantially similar to the segmented video signal 201. The components in the encoder 300 then compress and encode the segmented video signal 301 into a bitstream.

In particular, the segmented video signal 301 is forwarded to an intra prediction component 317 for intra prediction. Intra prediction component 317 may be substantially similar to intra estimation component 215 and intra prediction component 217. The segmented video signal 301 is also forwarded to a motion compensation component 321 for inter prediction based on reference blocks in the decoded picture buffer component 323. Motion compensation component 321 may be substantially similar to motion estimation component 221 and motion compensation component 219. The prediction blocks and the residual blocks from the intra prediction component 317 and the motion compensation component 321 are forwarded to a transform and quantization component 313 for transforming and quantizing the residual blocks. The transform and quantization component 313 may be substantially similar to the transform scaling and quantization component 213. The transformed and quantized residual blocks and the corresponding prediction blocks (along with associated control data) are forwarded to an entropy encoding component 331 for encoding into the bitstream. The entropy encoding component 331 may be substantially similar to the header formatting and CABAC component 231.

The transformed quantized residual block and/or the corresponding prediction block is also forwarded from the transform and quantization component 313 to the inverse transform and quantization component 329 for reconstruction as a reference block for use by the motion compensation component 321. Inverse transform and quantization component 329 may be substantially similar to scaling and inverse transform component 229. According to an example, the in-loop filter in the in-loop filter component 325 is also applied to the residual block and/or the reconstructed reference block. In-loop filter component 325 may be substantially similar to filter control analysis component 227 and in-loop filter component 225. In-loop filter component 325 may include multiple filters as described in connection with in-loop filter component 225. The filtered block is then stored in the decoded picture buffer component 323 as a reference block for use by the motion compensation component 321. Decoded picture buffer component 323 can be substantially similar to decoded picture buffer component 223.

Fig. 4 is a block diagram of an example video decoder 400. The video decoder 400 may be used to implement the decoding function of the codec system 200 and/or perform steps 111, 113, 115 and/or 117 of the operating method 100. The decoder 400 receives the code stream from the encoder 300 or the like and generates a reconstructed output video signal from the code stream for display to an end user.

The codestream is received by entropy decoding component 433. Entropy decoding component 433 is used to perform the inverse of entropy coding schemes such as CAVLC, CABAC, SBAC, PIPE coding, or other entropy coding techniques. For example, entropy decoding component 433 may use header information to provide context for parsing other data encoded in the codestream as codewords. The decoded information includes any information required for decoding the video signal, such as overall control data, filter control data, partition information, motion data, prediction data, and quantized transform coefficients of the residual blocks. The quantized transform coefficients are forwarded to an inverse transform and quantization component 429 for reconstruction into residual blocks. Inverse transform and quantization component 429 may be similar to inverse transform and quantization component 329.

The reconstructed residual blocks and/or the prediction blocks are forwarded to an intra prediction component 417 for reconstruction into image blocks according to intra prediction operations. The intra prediction component 417 may be similar to the intra estimation component 215 and the intra prediction component 217. In particular, intra prediction component 417 uses the prediction mode to locate a reference block in the frame and applies the residual block to the result to reconstruct an intra-predicted image block. The reconstructed intra-predicted image blocks and/or residual blocks and corresponding inter prediction data are forwarded to the decoded picture buffer component 423 via the in-loop filter component 425. The decoded picture buffer component 423 and the in-loop filter component 425 may be substantially similar to the decoded picture buffer component 223 and the in-loop filter component 225, respectively. The in-loop filter component 425 filters the reconstructed image blocks, residual blocks, and/or prediction blocks. This information is stored in the decoded picture buffer component 423. The reconstructed image blocks from decoded picture buffer component 423 are forwarded to motion compensation component 421 for inter prediction. The motion compensation component 421 may be substantially similar to the motion estimation component 221 and/or the motion compensation component 219. Specifically, the motion compensation component 421 generates a prediction block using the motion vector of a reference block and applies the residual block to the result to reconstruct an image block. The resulting reconstructed blocks may also be forwarded to the decoded picture buffer component 423 through the in-loop filter component 425. The decoded picture buffer component 423 continues to store additional reconstructed image blocks. These reconstructed image blocks may be assembled into frames via the partition information. The frames may also be placed in a sequence. The sequence is output to a display as a reconstructed output video signal.

Fig. 5 is an example of a point cloud media 500 that may be coded according to a PCC mechanism. A point cloud is a collection of data points in space. The point cloud may be generated by a 3D scanner that measures a large number of points on the outer surfaces of objects around the scanner. The point cloud may be described in terms of a geometry attribute, a texture attribute, a reflectivity attribute, a transparency attribute, a normal attribute, and the like. As part of the method 100, each attribute may be coded by a codec, such as the video codec system 200, the encoder 300, and/or the decoder 400. Specifically, each attribute of a PCC frame may be encoded separately at the encoding end and decoded and recombined at the decoding end to recreate the PCC frame.

The point cloud media 500 includes three bounding boxes 502, 504, and 506. Each of the bounding boxes 502, 504, and 506 represents a portion or segment of a 3D image from the current frame. Although the bounding boxes 502, 504, and 506 contain 3D images of people, in actual practice, other objects may be included in the bounding boxes. Each bounding box 502, 504, and 506 includes an x-axis, a y-axis, and a z-axis, representing the number of pixels occupied by the 3D image in the x, y, and z directions, respectively. For example, the x-axis and y-axis depict about 400 pixels (e.g., about 0-400 pixels), while the z-axis depicts about 1000 pixels (e.g., about 0-1000 pixels).

Each of the bounding boxes 502, 504, and 506 contains one or more slices 508, represented in FIG. 5 by cubes or boxes. Each slice 508 contains a portion of the overall object within one of the bounding boxes 502, 504, or 506 and may be described or represented by slice information. For example, the slice information may include two-dimensional (2D) and/or three-dimensional (3D) coordinates that describe the location of the slice 508 within the bounding box 502, 504, or 506. The slice information may also include other parameters. For example, the slice information may include a normal-axis parameter or the like inherited from the reference slice information for the current slice information. That is, one or more parameters of the slice information of the current frame may be inherited from the slice information of a reference frame. Further, the current frame may inherit one or more metadata portions of the reference frame (e.g., slice rotation, scale parameters, material identifiers, etc.). Slice 508 is interchangeably referred to herein as a 3D slice or slice data unit. A list of slices 508 may be generated for each bounding box 502, 504, or 506 and stored in a slice buffer in descending order from the largest to the smallest slice. The slices may then be encoded by an encoder and/or decoded by a decoder.
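A minimal sketch of building such a slice buffer follows; the Slice fields and the area-based size metric are illustrative assumptions, not the disclosed slice information syntax.

from dataclasses import dataclass

@dataclass
class Slice:                      # hypothetical slice record
    size_u: int                   # 2D extent, in pixels
    size_v: int
    normal_axis: int = 0          # e.g., a parameter inheritable from reference slice info

def build_slice_buffer(slices):
    # Order slices from largest to smallest before coding.
    return sorted(slices, key=lambda s: s.size_u * s.size_v, reverse=True)

slices = [Slice(16, 16), Slice(128, 64), Slice(32, 8)]
print([s.size_u * s.size_v for s in build_slice_buffer(slices)])  # [8192, 256, 256]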

The slices 508 may describe various attributes of the point cloud media 500. Specifically, the position of each pixel along the x, y, and z axes is the geometry of that pixel. A slice 508 containing the positions of all pixels in the current frame may be coded to capture the geometry attribute of the current frame of the point cloud media 500. Further, each pixel may include color values in the red, green, blue (RGB) and/or luminance and chrominance (YUV) spectra. A slice 508 containing the colors of all pixels in the current frame may be coded to capture the texture attribute of the current frame of the point cloud media 500.

Further, each pixel may (or may not) have a certain reflectivity. Reflectivity is the amount of light (e.g., colored light) projected from a pixel onto adjacent pixels. Shiny objects have high reflectivity and therefore spread their light/color from the corresponding pixels onto other nearby pixels. In contrast, matte objects with little or no reflectivity may not affect the color/light level of nearby pixels. A slice 508 containing the reflectivity of all pixels in the current frame may be coded to capture the reflectivity attribute of the current frame of the point cloud media 500. Some pixels may also be partially to fully transparent (e.g., glass, transparent plastic, etc.). Transparency is the amount of light/color of a neighboring pixel that can pass through the current pixel. A slice 508 containing the transparency levels of all pixels in the current frame may be coded to capture the transparency attribute of the current frame of the point cloud media 500. Further, the points of the point cloud media may form a surface. A surface may be associated with a normal vector, which is a vector perpendicular to the surface. Normal vectors may be used to describe object motion and/or interaction. Thus, in some cases, a user may wish to encode the normal vectors of a surface to support additional functionality. A slice 508 containing the normal vectors of one or more surfaces in the current frame may be coded to capture the normal attribute of the current frame of the point cloud media 500.

According to an example, the geometry, texture, reflectivity, transparency, and normal attributes may contain data describing some or all of the data points in the point cloud media 500. For example, the reflectivity, transparency, and normal attributes are optional, and thus, even within the same codestream, these attributes may appear alone or in combination for some examples of point cloud media 500 but not for others. Thus, the number of slices 508 and the number of attributes may vary from frame to frame and from video to video depending on the subject being captured, the video settings, and so forth.

Fig. 6 is an example of data segmentation and packing of a point cloud media frame 600. In particular, the example of FIG. 6 depicts a 2D representation of the slices 508 of the point cloud media 500. The point cloud media frame 600 includes a bounding box 602 corresponding to the current frame in the video sequence. Unlike the bounding boxes 502, 504, and 506 of FIG. 5, which are 3D, the bounding box 602 is 2D. As shown, the bounding box 602 includes a number of slices 604. Slice 604 is interchangeably referred to herein as a 2D slice or a slice data unit. In general, the slices 604 in FIG. 6 are image representations of the content of the bounding box 504 of FIG. 5. Thus, the 3D image in the bounding box 504 in FIG. 5 is projected onto the bounding box 602 through the slices 604. The portion of the bounding box 602 that does not contain a slice 604 is referred to as empty space 606. The empty space 606 may also be referred to as void space, empty samples, etc.

In view of the above, it should be noted that video-based point cloud compression (PCC) codecs are based on the segmentation of 3D point cloud data (e.g., the slices 508 of FIG. 5) into 2D slices (e.g., the slices 604 of FIG. 6). Indeed, the coding methods and processes described above may be advantageously used in various types of technologies, such as immersive six degrees of freedom (6DoF) media, dynamic augmented reality/virtual reality (AR/VR) objects, cultural heritage, geographic information systems (GIS), computer-aided design (CAD), autonomous navigation, and so on.

The location of each slice (e.g., one of the slices 604 of FIG. 6) within a bounding box (e.g., bounding box 602) may be determined solely by the size of the slice. For example, the largest slice 604 in FIG. 6 is first projected onto the bounding box 602 starting from the top-left corner (0, 0). After the largest slice 604 is projected onto the bounding box 602, the next largest slice 604 is projected (i.e., packed) onto the bounding box 602, and so on until the smallest slice 604 is projected onto the bounding box 602. Again, this process considers only the size of each slice 604. In some cases, a smaller slice 604 may occupy the space between larger slices and may therefore end up closer to the top-left corner of the bounding box 602 than some larger slices 604. During encoding, this process may be repeated for each relevant attribute until the slices of each attribute in the frame are encoded into one or more corresponding attribute streams. The group of data units from the attribute streams used to recreate a single frame may then be stored in the codestream in a PCC access unit (AU). On the decoder side, these attribute streams are taken from the PCC AU and decoded to recreate the slices 604. Such slices 604 may then be combined to recreate the PCC media. Accordingly, the point cloud media frame 600 may be coded by a codec, such as the video codec system 200, the encoder 300, and/or the decoder 400, as part of the method 100 to compress the point cloud media 500 for transmission.
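The largest-first placement just described can be sketched as a greedy packer that scans for the first free raster position. The occupancy-grid representation and first-fit scan below are simplifying assumptions; a real PCC encoder applies more elaborate placement rules.

def first_fit(occupied, su, sv, width, height):
    # Return the first raster-scan position where an (su x sv) slice fits.
    for v in range(height - sv + 1):
        for u in range(width - su + 1):
            if all(not occupied[v + dv][u + du]
                   for dv in range(sv) for du in range(su)):
                return u, v
    return None

def pack_slices(sizes, width, height):
    # Place slices largest-first; sizes is a list of (size_u, size_v) extents.
    occupied = [[False] * width for _ in range(height)]
    placements = []
    for su, sv in sorted(sizes, key=lambda s: s[0] * s[1], reverse=True):
        pos = first_fit(occupied, su, sv, width, height)
        if pos is None:
            continue  # no room; a real packer would grow the bounding box
        u, v = pos
        for dv in range(sv):
            for du in range(su):
                occupied[v + dv][u + du] = True
        placements.append(((su, sv), (u, v)))
    return placements

print(pack_slices([(4, 4), (8, 2), (2, 2)], width=8, height=8))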

Fig. 7 is a schematic diagram of an exemplary PCC video stream 700 with an extended set of attributes. For example, the PCC video stream 700 may be created when encoding point cloud media frames 600 from the point cloud media 500 using the video codec system 200, the encoder 300, and/or the decoder 400, etc., according to the method 100.

PCC video stream 700 includes a sequence of PCC AUs 710. A PCC AU 710 includes data sufficient to reconstruct a single PCC frame. Within a PCC AU 710, the data is carried in NAL units 720. NAL units 720 are packet-sized data containers. For example, the size of a single NAL unit 720 is typically designed to allow simple network transmission. A NAL unit 720 may contain a header indicating the type of the NAL unit 720 and a payload containing the associated video data. The PCC video stream 700 is designed for an extended attribute set and therefore contains several attribute-specific NAL units 720.

The PCC video stream 700 may include a group of frames (GOF) header 721, an auxiliary information frame 722, an occupancy map frame 723, a geometry NAL unit 724, a texture NAL unit 725, a reflectivity NAL unit 726, a transparency NAL unit 727, and a normal NAL unit 728, each of which is a type of NAL unit 720. The GOF header 721 includes various syntax elements that describe the corresponding PCC AU 710, the frames related to the corresponding PCC AU 710, and/or other NAL units 720 in the PCC AU 710. According to an example, a PCC AU 710 may or may not contain a single GOF header 721. The auxiliary information frame 722 may contain metadata associated with the frame, such as information associated with the slices used to encode the attributes. The occupancy map frame 723 may contain other metadata related to the frame, such as an occupancy map indicating the areas of the frame occupied by data versus the empty areas of the frame. The remaining NAL units 720 contain the attribute data of the PCC AU 710. Specifically, the geometry NAL unit 724, texture NAL unit 725, reflectivity NAL unit 726, transparency NAL unit 727, and normal NAL unit 728 contain a geometry attribute, a texture attribute, a reflectivity attribute, a transparency attribute, and a normal attribute, respectively.

As described above, attributes may be organized as streams. For example, there may be 0 to 4 streams per attribute. A stream may comprise a logically separate portion of the PCC video data. For example, the attributes of different objects may be encoded into multiple attribute streams of the same type (e.g., a first geometry stream for a first 3D bounding box, a second geometry stream for a second 3D bounding box, etc.). In another example, attributes associated with different frames may be encoded into multiple attribute streams (e.g., a transparency attribute stream for even frames and a transparency attribute stream for odd frames). In yet another example, slices may be placed in layers to represent a 3D object. The separate layers may then be included in separate streams (e.g., a first texture attribute stream for the top layer, a second texture attribute stream for the second layer, etc.). Regardless of the example, a PCC AU 710 can contain zero, one, or multiple NAL units for the corresponding attribute.

The present invention may increase flexibility in coding the various attributes (e.g., as included in geometry NAL unit 724, texture NAL unit 725, reflectivity NAL unit 726, transparency NAL unit 727, and/or normal NAL unit 728). In a first example, different PCC attributes may be coded using different codecs. In one particular example, the geometry of the PCC video may be coded into the geometry NAL units 724 using a first codec, while the reflectivity of the PCC video is coded into the reflectivity NAL units 726 using a second codec. In another example, up to five codecs may be used when coding the PCC video (e.g., one codec per attribute). The one or more codecs used for the one or more attributes may then be indicated by one or more syntax elements in the PCC video stream 700 (e.g., in the GOF header 721).

Further, as described above, PCC attributes may use various combinations of layers and/or flows. Thus, one or more syntax elements (e.g., in GOF header 721) may be used to indicate the layer and/or stream combination used by the encoder when encoding each attribute in order to allow the decoder to determine the layer and/or stream combination for each attribute when decoding. Furthermore, syntax elements (e.g., in GOF header 721) may be used to indicate a mode for coding and/or combining PCC attribute layers in a PCC attribute stream. Further, one or more syntax elements (e.g., in GOF header 721) may be used to represent the layer index of the layer associated with each NAL unit 720 corresponding to the PCC attribute stream. For example, the GOF header 721 may be used to indicate the number of layers and streams associated with the geometry attributes, the arrangement of such layers and streams, and the layer index of each geometry NAL unit 724, so that a decoder may assign each geometry NAL unit 724 to an appropriate layer when decoding PCC frames.

Finally, a flag (e.g., in GOF header 721) may indicate whether any PCC attribute layer contains any irregular point cloud points. An irregular point cloud is a collection of one or more data points that are not contiguous with neighboring data points and therefore cannot be represented in a 2D slice, such as slice 604. Instead, the points are represented as part of an irregular cloud of points that contains coordinates and/or translation parameters associated with the irregular point cloud points. Since the irregular point clouds are represented using a data structure other than a 2D slice, the flag allows the decoder to correctly identify the presence of the irregular point clouds and select the appropriate mechanism to decode the data.

The following is an exemplary mechanism to implement the above aspects. Definition: a video NAL unit is a PCC NAL unit with PccNalUnitType equal to GMTRY_NALU, TEXTURE_NALU, REFLECT_NALU, TRANSP_NALU, or NORMAL_NALU.

Codestream format: this clause specifies the relationship between the NAL unit stream and the byte stream, either of which is referred to as a codestream. The codestream may have two formats: the NAL unit stream format or the byte stream format. The NAL unit stream format is conceptually the more basic type and comprises a sequence of syntax structures called PCC NAL units. This sequence is ordered in decoding order. The decoding order (and content) of the PCC NAL units in the NAL unit stream is constrained. The byte stream format may be constructed from the NAL unit stream format by ordering the NAL units in decoding order and prefixing each NAL unit with a start code prefix and zero or more zero-valued bytes to form a stream of bytes. The NAL unit stream format can be extracted from the byte stream format by searching the byte stream for the locations of the unique start code prefix pattern. The byte stream format is similar to the formats used in HEVC and AVC.

The PCC NAL unit header syntax may be implemented as described in table 1 below.

TABLE 1 PCC NAL unit header syntax

pcc_nal_unit_header() {                                   Descriptor
    forbidden_zero_bit                                    f(1)
    pcc_nal_unit_type_plus1                               u(5)
    pcc_stream_id                                         u(2)
}
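Since the three fields in Table 1 total exactly one byte, the header can be packed and parsed with plain bit operations. The sketch below is illustrative; the minus-one mapping to PccNalUnitType follows equation (7-1) given later.

def pack_pcc_nal_unit_header(pcc_nal_unit_type, pcc_stream_id):
    # forbidden_zero_bit f(1) | pcc_nal_unit_type_plus1 u(5) | pcc_stream_id u(2)
    assert 0 <= pcc_nal_unit_type <= 30 and 0 <= pcc_stream_id <= 3
    pcc_nal_unit_type_plus1 = pcc_nal_unit_type + 1
    return bytes([(0 << 7) | (pcc_nal_unit_type_plus1 << 2) | pcc_stream_id])

def parse_pcc_nal_unit_header(byte0):
    assert byte0 >> 7 == 0                               # forbidden_zero_bit
    pcc_nal_unit_type_plus1 = (byte0 >> 2) & 0x1F
    pcc_stream_id = byte0 & 0x03
    return pcc_nal_unit_type_plus1 - 1, pcc_stream_id    # (PccNalUnitType, stream id)

hdr = pack_pcc_nal_unit_header(pcc_nal_unit_type=3, pcc_stream_id=1)
print(parse_pcc_nal_unit_header(hdr[0]))                 # -> (3, 1)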

A frame group header Raw Byte Sequence Payload (RBSP) syntax can be implemented as described in table 2 below.

Table 2 frame group header RBSP syntax

The PCC profile and level syntax may be implemented as described in table 3 below.

TABLE 3 PCC profile and level syntax

PCC NAL unit header semantics may be implemented as follows: forbidden_zero_bit shall be set to zero. pcc_nal_unit_type_plus1 minus 1 specifies the value of the variable PccNalUnitType, which specifies the type of RBSP data structure contained in the PCC NAL unit, as shown in Table 4. The variable PccNalUnitType is derived as follows:

PccNalUnitType = pcc_nal_unit_type_plus1 - 1    (7-1)

PCC NAL units with PccNalUnitType in the range of UNSPEC25..UNSPEC30 (inclusive), for which semantics are not specified, shall not affect the decoding process specified herein. It should be noted that PCC NAL unit types in the UNSPEC25..UNSPEC30 range may be used as determined by the application. No decoding process for these values of PccNalUnitType is specified in the present invention. Since different applications may use these PCC NAL unit types for different purposes, particular attention should be paid when designing encoders that generate PCC NAL units with these PccNalUnitType values and when designing decoders that interpret the contents of PCC NAL units with these PccNalUnitType values. The present invention does not define any management of these values. These PccNalUnitType values may only be suitable for use in contexts where conflicting uses (e.g., different definitions of the meaning of the PCC NAL unit content for the same PccNalUnitType value) are unimportant, or not possible, or are managed, for example defined or managed in the controlling application or transport specification, or by controlling the environment in which codestreams are distributed.

For purposes other than determining the amount of data in the PCC AUs of the codestream, decoders may ignore (remove from the codestream and discard) the contents of all PCC NAL units that use reserved values of PccNalUnitType. This requirement allows compatible extensions of the present invention to be defined in the future.

TABLE 4 PCC NAL unit type code

The identified video codec (e.g., HEVC or AVC) is indicated in the frame group header NAL unit, which is present in the first PCC AU of each cloud point stream (CPS). pcc_stream_id specifies the PCC stream identifier (ID) of the PCC NAL unit. When PccNalUnitType is equal to GOF_HEADER, AUX_INFO, or OCP_MAP, the value of pcc_stream_id is set to 0. The value of pcc_stream_id may be limited to less than 4 when defining a set of one or more PCC profiles and levels.

The order of PCC NAL units and their association with PCC AUs are described below. A PCC AU includes zero or one frame group header NAL unit, one auxiliary information frame NAL unit, one occupancy map frame NAL unit, and the video AUs carrying the data units of the PCC attributes (e.g., the geometry, texture, reflectivity, transparency, or normal attributes). Let video_au(i, j) denote the video AU with pcc_stream_id equal to j for the PCC attribute with attribute type attribute_type[i]. The video AUs present in a PCC AU are arranged in the following order. If attributes_first_ordering_flag is equal to 1, then for any two video AUs video_au(i1, j1) and video_au(i2, j2) present in the PCC AU, the following applies: if i1 is less than i2, video_au(i1, j1) shall precede video_au(i2, j2), regardless of the values of j1 and j2; otherwise, if i1 is equal to i2 and j1 is greater than j2, video_au(i1, j1) shall follow video_au(i2, j2).

Otherwise (i.e., attributes_first_ordering_flag is equal to 0), for any two video AUs video_au(i1, j1) and video_au(i2, j2) present in the PCC AU, the following applies: if j1 is less than j2, video_au(i1, j1) shall precede video_au(i2, j2), regardless of the values of i1 and i2; otherwise, if j1 is equal to j2 and i1 is greater than i2, video_au(i1, j1) shall follow video_au(i2, j2). The above order of video AUs leads to the following result. If attributes_first_ordering_flag is equal to 1, the order of the video AUs (when present) in the PCC AU is as follows (in the listed order), where all PCC NAL units of each particular PCC attribute (when present) are consecutive in decoding order without interleaving with PCC NAL units of other PCC attributes:

video_au(0, 0), video_au(0, 1), ..., video_au(0, num_streams_for_attribute[0]),
video_au(1, 0), video_au(1, 1), ..., video_au(1, num_streams_for_attribute[1]),
...
video_au(num_attributes – 1, 0), video_au(num_attributes – 1, 1), ..., video_au(num_attributes – 1, num_streams_for_attribute[num_attributes – 1]).

Otherwise (attributes_first_ordering_flag is equal to 0), the order of the video AUs (when present) in the PCC AU is as follows (in the listed order), where all PCC NAL units of each particular pcc_stream_id value (when present) are consecutive in decoding order without interleaving with PCC NAL units of other pcc_stream_id values:

video_au(0, 0), video_au(1, 0), ..., video_au(num_attributes – 1, 0),
video_au(0, 1), video_au(1, 1), ..., video_au(num_attributes – 1, 1),
...
video_au(0, num_streams_for_attribute[0]), video_au(1, num_streams_for_attribute[1]), ..., video_au(num_attributes – 1, num_streams_for_attribute[num_attributes – 1]).
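Both orderings amount to a pair of nested loops whose nesting is selected by attributes_first_ordering_flag. A sketch follows; it assumes stream indices run from 0 up to a per-attribute count, which is an illustrative simplification of the listings above.

def video_au_order(streams_per_attribute, attributes_first):
    # Yield (attribute index i, stream id j) in video AU decoding order.
    num_attributes = len(streams_per_attribute)
    if attributes_first:                      # attributes_first_ordering_flag == 1
        for i in range(num_attributes):
            for j in range(streams_per_attribute[i]):
                yield i, j
    else:                                     # group by pcc_stream_id instead
        for j in range(max(streams_per_attribute)):
            for i in range(num_attributes):
                if j < streams_per_attribute[i]:
                    yield i, j

print(list(video_au_order([2, 1], attributes_first=True)))   # [(0, 0), (0, 1), (1, 0)]
print(list(video_au_order([2, 1], attributes_first=False)))  # [(0, 0), (1, 0), (0, 1)]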

The association of NAL units with a video AU and the order of NAL units within a video AU are specified in the specification of the identified video codec (e.g., HEVC or AVC). The identified video codec is indicated in the frame group header NAL unit, which is present in the first PCC AU of each CPS.

The first PCC AU of each CPS starts with a frame group header NAL unit, and each frame group header NAL unit indicates the start of a new PCC AU.

Other PCC AUs start with an auxiliary information frame NAL unit. In other words, an auxiliary information frame NAL unit starts a new PCC AU when it is not preceded by a frame group header NAL unit.

The frame group header RBSP semantics are as follows: num_attributes specifies the maximum number of PCC attributes (e.g., geometry, texture, etc.) that can be carried in the CPS. It should be noted that the value of num_attributes may be limited to be less than or equal to 5 when defining a set of one or more PCC profiles and levels. attributes_first_ordering_flag, when set to 1, indicates that, in a PCC AU, all PCC NAL units of each particular PCC attribute (when present) are consecutive in decoding order, without interleaving with PCC NAL units of other PCC attributes. attributes_first_ordering_flag, when set to 0, indicates that, in a PCC AU, all PCC NAL units of each particular pcc_stream_id value (when present) are consecutive in decoding order, without interleaving with PCC NAL units of other pcc_stream_id values. attribute_type[i] specifies the PCC attribute type of the ith PCC attribute. Table 5 below provides the interpretation of the different PCC attribute types. When defining a set of one or more PCC profiles and levels, the values of attribute_type[0] and attribute_type[1] may be limited to be equal to 0 and 1, respectively.

TABLE 5 Description of attribute_type[i]

identified_codec_for_attribute[i] identifies the video codec used to code the ith PCC attribute, as shown in Table 6 below.

TABLE 6 Description of identified_codec_for_attribute[i]

num_streams_for_attribute[i] specifies the maximum number of PCC streams for the ith PCC attribute. It should be noted that the value of num_streams_for_attribute[i] may be limited to be less than or equal to 4 when defining a set of one or more PCC profiles and levels. num_layers_for_attribute[i] specifies the number of attribute layers for the ith PCC attribute. It should be noted that the value of num_layers_for_attribute[i] may be limited to be less than or equal to 4 when defining a set of one or more PCC profiles and levels. max_attribute_layer_idx[i][j] specifies the maximum value of the attribute layer index for the PCC stream with pcc_stream_id equal to j for the ith PCC attribute. The value of max_attribute_layer_idx[i][j] shall be less than the value of num_layers_for_attribute[i]. attribute_layers_combination_mode[i][j] specifies the attribute layer combination mode for the attribute layers carried in the PCC stream with pcc_stream_id equal to j for the ith PCC attribute. The interpretation of the different values of attribute_layers_combination_mode[i][j] is provided in Table 7 below.

TABLE 7 Description of attribute_layers_combination_mode[i][j]

When attribute_layers_combination_mode[i][j] is present and equal to 0, the variable attrLayerIdx[i][j] is derived as follows, where attrLayerIdx[i][j] specifies the attribute layer index of the attribute layer, for the ith PCC attribute, carried in the PCC stream with pcc_stream_id equal to j, whose PCC NAL units are carried in the video AU with picture order count value equal to PicOrderCntVal, as specified in the specification of the identified video codec:

tmpVal = PicOrderCntVal % num_streams_for_attribute[i]
if (j == 0)
    attrLayerIdx[i][j] = tmpVal    (7-2)
else
    attrLayerIdx[i][j] = max_attribute_layer_idx[i][j – 1] + 1 + tmpVal
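A direct transcription of derivation (7-2) with illustrative values may make the stream-to-layer mapping concrete; the numbers below are assumptions for demonstration only.

def attr_layer_idx(pic_order_cnt_val, num_streams_for_attribute_i,
                   max_attr_layer_idx_prev_stream, j):
    # Derivation (7-2): attribute layer index carried in a given video AU.
    tmp_val = pic_order_cnt_val % num_streams_for_attribute_i
    if j == 0:
        return tmp_val
    return max_attr_layer_idx_prev_stream + 1 + tmp_val  # max_attribute_layer_idx[i][j-1]

# Example: 2 streams for attribute i; AU with PicOrderCntVal = 5 in stream j = 1,
# where stream 0 carries layers up to index 1:
print(attr_layer_idx(5, 2, max_attr_layer_idx_prev_stream=1, j=1))  # -> 3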

regular_points_flag[i][j], when equal to 1, indicates that, for the ith PCC attribute, the attribute layer with layer index equal to j carries the regular points of the point cloud signal. regular_points_flag[i][j], when set to 0, indicates that, for the ith PCC attribute, the attribute layer with layer index equal to j carries the irregular points of the point cloud signal. It should be noted that the value of regular_points_flag[i][j] may be limited to 0 when defining a set of one or more PCC profiles and levels. frame_width specifies the frame width, in pixels, of the geometry and texture videos. The frame width shall be a multiple of the occupancy resolution. frame_height specifies the frame height, in pixels, of the geometry and texture videos. The frame height shall be a multiple of the occupancy resolution. occupancy_resolution specifies the horizontal and vertical resolution, in pixels, at which slices are packed in the geometry and texture videos. occupancy_resolution shall be an even multiple of the occupancy precision. radius_to_smoothing specifies the radius used to detect neighbors for smoothing. The value of radius_to_smoothing shall range from 0 to 255 (inclusive).

neighbor_count_smoothing specifies the maximum number of neighbors used for smoothing. The value of neighbor_count_smoothing shall range from 0 to 255 (inclusive). radius2_boundary_detection specifies the boundary point detection radius. The value of radius2_boundary_detection shall range from 0 to 255 (inclusive). threshold_smoothing specifies the smoothing threshold. The value of threshold_smoothing shall range from 0 to 255 (inclusive). lossless_geometry specifies lossless geometry coding. The value of lossless_geometry, when equal to 1, indicates that the point cloud geometry information is coded losslessly. The value of lossless_geometry, when equal to 0, indicates that the point cloud geometry information is coded in a lossy manner. lossless_texture specifies lossless texture coding. The value of lossless_texture, when equal to 1, indicates that the point cloud texture information is coded losslessly. The value of lossless_texture, when equal to 0, indicates that the point cloud texture information is coded in a lossy manner. lossless_geometry_444 specifies whether the 4:2:0 or the 4:4:4 video format is used for the geometry frames. The value of lossless_geometry_444, when equal to 1, indicates that the geometry video is coded in 4:4:4 format. The value of lossless_geometry_444, when equal to 0, indicates that the geometry video is coded in 4:2:0 format.

absolute _ d1_ coding represents the way in which geometric layers other than the layer closest to the projection plane are decoded. absolute _ d1_ coding, when equal to 1, indicates the decoding of the actual geometry values of the geometry layers, except for the layer closest to the projection plane. absolute _ d1_ coding, when equal to 0, indicates that the geometry layer other than the layer closest to the projection plane is differentially decoded. bin _ arithmetric _ coding indicates whether binary arithmetic coding is used. The value of bin _ arithmetric _ coding, when equal to 1, indicates that binary arithmetic coding is used for all syntax elements. The value of bin _ arithmetric _ coding, when equal to 0, indicates that non-binary arithmetic coding is used for some syntax elements. A gof _ header _ extension _ flag, when equal to 0, indicates that there is no gof _ header _ extension _ data _ flag syntax element in the frame group header RBSP syntax structure. A gof _ header _ extension _ flag, when equal to 1, indicates the presence of a gof _ header _ extension _ data _ flag syntax element in the frame group header RBSP syntax structure. The decoder may ignore all data following the value 1 of gof _ header _ extension _ flag in the frame group header NAL unit. The gof _ header _ extension _ data _ flag may have any value, and the presence and value of this flag does not affect decoder consistency. The decoder may ignore all gof _ header _ extension _ data _ flag syntax elements.

The PCC profile and level semantics are as follows: pcc_profile_idc indicates the profile to which the CPS conforms. pcc_pl_reserved_zero_19bits shall be equal to 0 in codestreams conforming to this version of the present invention. Other values of pcc_pl_reserved_zero_19bits are reserved for future use by ISO/IEC. Decoders may ignore the value of pcc_pl_reserved_zero_19bits. pcc_level_idc indicates the level to which the CPS conforms. hevc_ptl_12bytes_attribute[i] may be equal to the value of the 12 bytes from general_profile_idc to general_level_idc (inclusive) in the active SPS when an HEVC conforming decoder decodes an HEVC codestream of the PCC attribute type attribute_type[i], extracted as specified by the sub-codestream extraction process. avc_pl_3bytes_attribute[i] may be equal to the value of the 3 bytes from profile_idc to level_idc (inclusive) in the active SPS when an AVC conforming decoder decodes an AVC codestream of the PCC attribute type attribute_type[i], extracted as specified by the sub-codestream extraction process.

The sub-codestream extraction process is as follows: the inputs to this process are a PCC codestream inBitstream, a target PCC attribute type targetAttType, and a target PCC stream ID value targetStreamId. The output of this process is a sub-codestream. It is a requirement of codestream conformance for the input codestream that any output sub-codestream that is the output of the process specified in this clause, given a conforming PCC codestream inBitstream, any value of targetAttType representing a PCC attribute present in inBitstream, and any value of targetStreamId less than or equal to the maximum PCC stream ID value among the PCC streams of attribute type targetAttType present in inBitstream, shall be a conforming video codestream according to the specification of the video codec identified for attribute type targetAttType.

The output sub-codestream is derived by the following ordered steps, depending on the value of targetAttType. If targetAttType is equal to ATTR_GEOMETRY, all PCC NAL units with PccNalUnitType not equal to GMTRY_NALU or with pcc_stream_id not equal to targetStreamId are removed. Otherwise, if targetAttType is equal to ATTR_TEXTURE, all PCC NAL units with PccNalUnitType not equal to TEXTURE_NALU or with pcc_stream_id not equal to targetStreamId are removed. Otherwise, if targetAttType is equal to ATTR_REFLECT, all PCC NAL units with PccNalUnitType not equal to REFLECT_NALU or with pcc_stream_id not equal to targetStreamId are removed. Otherwise, if targetAttType is equal to ATTR_TRANSP, all PCC NAL units with PccNalUnitType not equal to TRANSP_NALU or with pcc_stream_id not equal to targetStreamId are removed. Otherwise, if targetAttType is equal to ATTR_NORMAL, all PCC NAL units with PccNalUnitType not equal to NORMAL_NALU or with pcc_stream_id not equal to targetStreamId are removed. The first byte (the PCC NAL unit header) of each remaining PCC NAL unit may also be removed.
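A sketch of this extraction follows, modeling each PCC NAL unit as a (header byte, payload) pair; the numeric type codes are illustrative placeholders, since the contents of Table 4 are not reproduced here.

from enum import IntEnum

class PccNalUnitType(IntEnum):                # illustrative values, not Table 4
    GMTRY_NALU = 3
    TEXTURE_NALU = 4
    REFLECT_NALU = 5
    TRANSP_NALU = 6
    NORMAL_NALU = 7

ATTR_TO_NALU = {                              # targetAttType -> retained NAL type
    "ATTR_GEOMETRY": PccNalUnitType.GMTRY_NALU,
    "ATTR_TEXTURE": PccNalUnitType.TEXTURE_NALU,
    "ATTR_REFLECT": PccNalUnitType.REFLECT_NALU,
    "ATTR_TRANSP": PccNalUnitType.TRANSP_NALU,
    "ATTR_NORMAL": PccNalUnitType.NORMAL_NALU,
}

def extract_sub_codestream(in_bitstream, target_att_type, target_stream_id):
    # Keep only video NAL units of the target attribute and stream,
    # then strip the one-byte PCC NAL unit header from each.
    kept_type = ATTR_TO_NALU[target_att_type]
    out = []
    for header_byte, payload in in_bitstream:
        nal_type = ((header_byte >> 2) & 0x1F) - 1   # PccNalUnitType, per (7-1)
        stream_id = header_byte & 0x03               # pcc_stream_id
        if nal_type == kept_type and stream_id == target_stream_id:
            out.append(payload)                      # first byte already removed
    return out

units = [(((PccNalUnitType.GMTRY_NALU + 1) << 2) | 0, b"geo-0"),
         (((PccNalUnitType.TEXTURE_NALU + 1) << 2) | 0, b"tex-0")]
print(extract_sub_codestream(units, "ATTR_GEOMETRY", 0))  # [b'geo-0']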

In an alternative embodiment of the first set of methods described above, the PCC NAL unit header is designed to use more bits for pcc_stream_id, which allows more than four streams per attribute. In this case, one more byte is added to the PCC NAL unit header.

Fig. 8 is a schematic diagram of an exemplary mechanism 800 for encoding PCC attributes 841 and 842 using multiple codecs 843 and 844. For example, mechanism 800 may be used to encode and/or decode attributes of PCC video stream 700. Thus, the mechanism 800 may be used to encode and/or decode point cloud media frames 600 from point cloud media 500. Thus, encoder 300 may create a code stream from the PCC sequence using mechanism 800, and decoder 400 may reconstruct the PCC sequence from the code stream using mechanism 800. Thus, mechanism 800 may be used by codec system 200 and may further be used to support method 100.

Mechanism 800 may be applied to multiple PCC attributes 841 and 842. For example, PCC attributes 841 and 842 may be any two attributes selected from the group consisting of a geometry attribute, a texture attribute, a reflectivity attribute, a transparency attribute, and a normal attribute. As shown in FIG. 8, mechanism 800 depicts a left-to-right encoding process and a right-to-left decoding process. Codecs 843 and 844 may be any two codecs, such as HEVC, AVC, VVC, etc., or any versions thereof. When encoding certain PCC attributes 841 and 842, certain codecs 843 and 844, or versions thereof, may be more efficient than others. In this example, codec 843 is used to encode attribute 841, and codec 844 is used to encode attribute 842. The results of these encodings are combined to create a PCC video stream 845 that contains PCC attributes 841 and 842. On the decoder side, codec 843 is used to decode attribute 841 and codec 844 is used to decode attribute 842. The decoded attributes 841 and 842 may then be recombined to generate a decoded version of the PCC video stream 845.

A benefit of employing mechanism 800 is that the most efficient codecs 843 and 844 can be selected for the corresponding attributes 841 and 842. Mechanism 800 is not limited to two attributes 841 and 842 and two codecs 843 and 844. For example, each attribute (geometry, texture, reflectivity, transparency, and normal) may be encoded by a separate codec. To ensure that the correct codecs 843 and 844 can be selected to decode the corresponding attributes 841 and 842, the encoder can indicate the codecs 843 and 844 and their correspondence to the attributes 841 and 842. For example, the encoder may add one or more syntax elements in the GOF header to indicate the correspondence of codecs to attributes. The decoder may then read the relevant syntax, select the correct codecs 843 and 844 for the attributes 841 and 842, and decode the PCC video stream 845. In one particular example, the codecs 843 and 844 used for the attributes 841 and 842, respectively, may be indicated using an identified_codec_for_attribute syntax element.
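A sketch of the encoder-side bookkeeping for this per-attribute codec selection follows; the ToyCodec class and the attribute-to-codec mapping are stand-ins for real HEVC/AVC implementations and are assumptions for illustration only.

class ToyCodec:                                   # stand-in for a real video codec
    def __init__(self, codec_id):
        self.codec_id = codec_id
    def encode(self, frame):
        return bytes([self.codec_id]) + bytes(frame)

def encode_attributes(frames_by_attribute, codec_for_attribute):
    # Encode each PCC attribute with its own codec and record the choice so that
    # identified_codec_for_attribute[i] can be written to the GOF header.
    sub_streams, identified_codec_for_attribute = {}, {}
    for i, (attribute, frames) in enumerate(frames_by_attribute.items()):
        codec = codec_for_attribute[attribute]
        sub_streams[attribute] = [codec.encode(f) for f in frames]
        identified_codec_for_attribute[i] = codec.codec_id
    return sub_streams, identified_codec_for_attribute

streams, codec_ids = encode_attributes(
    {"geometry": [[1, 2]], "reflectivity": [[3]]},
    {"geometry": ToyCodec(0), "reflectivity": ToyCodec(1)})  # e.g., 0 = HEVC, 1 = AVC
print(codec_ids)                                  # {0: 0, 1: 1}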

FIG. 9 is a diagram 900 of an example of attribute layers 931, 932, 933, and 934. For example, attribute layers 931, 932, 933, and 934 may be used to carry the attributes of the PCC video stream 700. Thus, the layers 931, 932, 933, and 934 may be used when encoding and/or decoding a point cloud media frame 600 from a point cloud media 500. Accordingly, the encoder 300 may create a codestream from a PCC sequence using the layers 931, 932, 933, and 934, and the decoder 400 may reconstruct the PCC sequence from the codestream using the layers 931, 932, 933, and 934. Thus, the layers 931, 932, 933, and 934 can be used by the codec system 200 and can also be used to support the method 100. Further, the attribute layers 931, 932, 933, and 934 can be used to carry one or more of the attributes 841 and 842.

Attribute layers 931, 932, 933, and 934 are data packets associated with an attribute that can be stored and/or modified independently of other data packets associated with the same attribute. Accordingly, each attribute layer 931, 932, 933, and/or 934 may be changed and/or represented without affecting the remaining attribute layers 931, 932, 933, and/or 934. In some examples, attribute layers 931, 932, 933, and/or 934 may be visually represented on top of each other, as shown in FIG. 9. For example, a coarse texture covering the entire object may be stored in attribute layer 931, while more detailed textures are included in attribute layers 932, 933, and/or 934. In another example, attribute layers 931 and/or 932 may be applied to odd frames and attribute layers 933 and/or 934 may be applied to even frames. This allows some layers to be omitted in response to a change in frame rate. Each attribute may have 0 to 4 attribute layers 931, 932, 933, and/or 934. To indicate the configuration used, the encoder may use a syntax element, such as num_layers_for_attribute[i], in the sequence-level data (e.g., the GOF header). The decoder may read the syntax element and determine the number of attribute layers 931, 932, 933, and/or 934 for each attribute. Additional syntax elements, such as attribute_layers_combination_mode[i][j] and attrLayerIdx[i][j], may also be used to indicate the combination of attribute layers used in the PCC video stream and the index of each layer used by the corresponding attribute, respectively.

In yet another example, some attribute layers (e.g., attribute layers 931, 932, and 933) may carry data related to regular slices, while other attribute layers (e.g., attribute layer 934) carry data related to irregular point clouds. This may be useful because irregular point clouds may be described using different data than regular point cloud slices. To indicate that a particular layer carries data associated with an irregular point cloud, the encoder may encode another syntax element in the sequence-level data. In one specific example, a regular_points_flag in the GOF header may be used to indicate whether an attribute layer carries at least one irregular point cloud point. The decoder can then read the syntax element and decode the corresponding attribute layers accordingly.

Fig. 10 is a schematic diagram 1000 of an example of attribute streams 1031, 1032, 1033, and 1034. For example, attribute streams 1031, 1032, 1033, and 1034 may be used to carry the attributes of the PCC video stream 700. Accordingly, the attribute streams 1031, 1032, 1033, and 1034 may be used when encoding and/or decoding a point cloud media frame 600 from the point cloud media 500. Accordingly, the encoder 300 may create a codestream from a PCC sequence using the attribute streams 1031, 1032, 1033, and 1034, and the decoder 400 may reconstruct the PCC sequence from the codestream using the attribute streams 1031, 1032, 1033, and 1034. Thus, the attribute streams 1031, 1032, 1033, and 1034 may be used by the codec system 200 and may also be used to support the method 100. Further, the attribute streams 1031, 1032, 1033, and 1034 may be used to carry one or more of the attributes 841 and 842. In addition, the attribute layers 931, 932, 933, and 934 can be carried in the attribute streams 1031, 1032, 1033, and 1034.

The attribute streams 1031, 1032, 1033, and 1034 are time-varying sequences of attribute data. Specifically, the attribute streams 1031, 1032, 1033, and 1034 are sub-streams of the PCC video stream. Each attribute stream 1031, 1032, 1033, and 1034 carries a sequence of attribute-specific NAL units and thus serves as a storage and/or transport data structure. Each attribute stream 1031, 1032, 1033, and 1034 may carry the data of one or more attribute layers 931, 932, 933, and 934. For example, attribute stream 1031 may carry attribute layers 931 and 932, while attribute stream 1032 carries attribute layers 933 and 934 (attribute streams 1033 and 1034 being omitted). In another example, each attribute stream 1031, 1032, 1033, and 1034 carries a single corresponding attribute layer 931, 932, 933, and 934. In other examples, some attribute streams 1031, 1032, 1033, and 1034 carry multiple attribute layers 931, 932, 933, and 934, while other attribute streams 1031, 1032, 1033, and 1034 carry a single attribute layer 931, 932, 933, or 934 or are omitted. It can be seen that the attribute streams 1031, 1032, 1033, and 1034 and the attribute layers 931, 932, 933, and 934 can be arranged in many combinations and permutations. Thus, the encoder may use a syntax element (e.g., num_streams_for_attribute) in the sequence-level data (e.g., in the GOF header) to indicate the number of attribute streams 1031, 1032, 1033, and 1034 used to encode each attribute. The decoder may then use this information (e.g., in combination with the attribute layer information) to decode the attribute streams 1031, 1032, 1033, and 1034 to reconstruct the PCC sequence.
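On the receiving side, the grouping of video NAL units into per-attribute, per-stream sub-streams before decoding can be sketched as below; the triple representation of a NAL unit is an assumption for illustration.

from collections import defaultdict

def demux_attribute_streams(pcc_nal_units):
    # Group video NAL units into sub-streams keyed by (attribute type, pcc_stream_id).
    # pcc_nal_units: iterable of (pcc_nal_unit_type, pcc_stream_id, payload) triples.
    streams = defaultdict(list)
    for nal_type, stream_id, payload in pcc_nal_units:
        streams[(nal_type, stream_id)].append(payload)
    return streams

units = [("TEXTURE_NALU", 0, b"tex-s0-au0"), ("TEXTURE_NALU", 1, b"tex-s1-au0"),
         ("GMTRY_NALU", 0, b"geo-s0-au0"), ("TEXTURE_NALU", 0, b"tex-s0-au1")]
for key, payloads in demux_attribute_streams(units).items():
    print(key, payloads)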

Fig. 11 is a flow diagram of an example method 1100 for encoding a PCC video sequence using multiple codecs. For example, method 1100 may organize data into a codestream using the attribute layers 931, 932, 933, and 934 and/or the attribute streams 1031, 1032, 1033, and 1034 in accordance with mechanism 800. Further, method 1100 may signal the mechanisms used for encoding the attributes in a GOF header. Further, the method 1100 may generate the PCC video stream 700 by encoding point cloud media frames 600 from the point cloud media 500. Further, the codec system 200 and/or the encoder 300 may use the method 1100 when performing the encoding steps of the method 100.

Method 1100 may begin when an encoder receives a sequence of PCC frames comprising point cloud media. For example, the encoder may determine to encode such frames in response to receiving a user command. In method 1100, the encoder may determine that a first attribute should be encoded by a first codec and a second attribute should be encoded by a second codec. This decision may be made according to predetermined conditions and/or user input, for example when the first codec is more efficient for the first attribute and the second codec is more efficient for the second attribute. Thus, in step 1101, the encoder encodes the first attribute of the sequence of PCC frames into the codestream using the first codec. Further, in step 1103, the encoder encodes the second attribute of the sequence of PCC frames into the codestream using a second codec different from the first codec.

In step 1105, the encoder encodes various syntax elements into the codestream along with the encoded video data. For example, the syntax elements may be coded into sequence-level data units containing sequence-level parameters in order to indicate to the decoder the decisions made during encoding, so that the PCC frames can be reconstructed correctly. In particular, the encoder may encode the sequence-level data unit to include a first syntax element indicating that the first attribute is coded by the first codec and that the second attribute is coded by the second codec. In one particular example, the PCC frames may include a plurality of attributes, including the first attribute and the second attribute. Further, the plurality of attributes of the PCC frames may include one or more of geometry, texture, reflectivity, transparency, and normal. In addition, the first syntax element may be an identified_codec_for_attribute element contained in a GOF header in the codestream.

In some examples, the first attribute may be organized into multiple streams. In this case, a second syntax element may be used to indicate the stream membership of a data unit of the codestream associated with the first attribute. In some examples, the first attribute may also be organized into multiple layers. In this case, a third syntax element may indicate the layer membership of a data unit of the codestream associated with the first attribute. In one specific example, the second syntax element may be a num_streams_for_attribute element and the third syntax element may be a num_layers_for_attribute element, each of which may be included in a frame group header in the codestream. In yet another example, a fourth syntax element may be used to indicate that a first layer of the plurality of layers contains data associated with an irregular point cloud. In one specific example, the fourth syntax element may be a regular_points_flag element contained in a frame group header in the codestream.
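A sketch of serializing these sequence-level elements follows; the field ordering and the flat (name, value) representation are illustrative assumptions and do not reproduce the normative Table 2 layout or bit widths.

def write_gof_header_fields(attributes):
    # attributes: list of dicts with the per-attribute elements named above.
    # Returns a flat list of (syntax_element, value) pairs in write order.
    fields = [("num_attributes", len(attributes))]
    for i, a in enumerate(attributes):
        fields += [
            (f"attribute_type[{i}]", a["type"]),
            (f"identified_codec_for_attribute[{i}]", a["codec"]),
            (f"num_streams_for_attribute[{i}]", a["num_streams"]),
            (f"num_layers_for_attribute[{i}]", a["num_layers"]),
        ]
        for j in range(a["num_layers"]):
            # regular_points_flag is 0 for layers carrying irregular points.
            fields.append((f"regular_points_flag[{i}][{j}]",
                           int(j not in a.get("irregular_layers", set()))))
    return fields

geometry = {"type": 0, "codec": 0, "num_streams": 1, "num_layers": 2,
            "irregular_layers": {1}}
for name, value in write_gof_header_fields([geometry]):
    print(name, "=", value)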

By including such information in the sequence-level data, the decoder has sufficient information to decode the PCC video sequence. Thus, in step 1107, the encoder may transmit the codestream so that a decoded sequence of PCC frames can be generated from the first attribute coded by the first codec and the second attribute coded by the second codec, as well as the other attributes and/or syntax elements described herein.

Fig. 12 is a flow diagram of an example method 1200 for decoding a PCC video sequence using multiple codecs. For example, method 1200 may read data from a codestream using the attribute layers 931, 932, 933, and 934 and/or the attribute streams 1031, 1032, 1033, and 1034 in accordance with mechanism 800. Furthermore, method 1200 may determine the mechanisms for decoding the attributes by reading the GOF header. Further, the method 1200 may read the PCC video stream 700 in order to reconstruct the point cloud media frames 600 and the point cloud media 500. Further, the codec system 200 and/or the decoder 400 may use the method 1200 when performing the decoding steps of the method 100.

Method 1200 may begin when a decoder receives a code stream comprising a sequence of PCC frames in step 1201. The decoder may then parse the codestream or portions thereof in step 1205. For example, the decoder may parse the code stream to obtain sequence-level data units containing sequence-level parameters. The sequence-level data unit may include various syntax elements that describe the encoding process. Thus, the decoder may parse the video data from the codestream and use the syntax elements to determine the correct process to decode the video data.

For example, the sequence-level data unit may include a first syntax element indicating that the first attribute is coded by a first codec and that the second attribute is coded by a second codec. In one particular example, the PCC frames may include a plurality of attributes, including the first attribute and the second attribute. Further, the plurality of attributes of the PCC frames may include one or more of geometry, texture, reflectivity, transparency, and normal. In addition, the first syntax element may be an identified_codec_for_attribute element contained in a GOF header in the codestream.

In some examples, the first attribute may be organized into a plurality of streams. In this case, a second syntax element may be used to indicate the stream membership of a data unit of the codestream associated with the first attribute. In some examples, the first attribute may also be organized into multiple layers. In this case, a third syntax element may indicate the layer membership of a data unit of the codestream associated with the first attribute. In one specific example, the second syntax element may be a num_streams_for_attribute element and the third syntax element may be a num_layers_for_attribute element, each of which may be included in a frame group header in the codestream. In yet another example, a fourth syntax element may be used to indicate that a first layer of the plurality of layers contains data associated with an irregular point cloud. In one specific example, the fourth syntax element may be a regular_points_flag element contained in a frame group header in the codestream.
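A minimal decoder-side counterpart, assuming the frame group header has already been parsed into a dictionary, might map each attribute to its identified decoder as follows; the registry values are stand-ins for real codec implementations.

def pick_decoders(gof_header, decoder_registry):
    # Map each attribute index to the decoder identified for it.
    # decoder_registry: identified_codec_for_attribute value -> decoder object.
    decoders = {}
    for i in range(gof_header["num_attributes"]):
        codec_id = gof_header["identified_codec_for_attribute"][i]
        decoders[i] = decoder_registry[codec_id]
    return decoders

header = {"num_attributes": 2, "identified_codec_for_attribute": {0: 0, 1: 1}}
registry = {0: "hevc-decoder", 1: "avc-decoder"}   # stand-ins for real decoders
print(pick_decoders(header, registry))             # {0: 'hevc-decoder', 1: 'avc-decoder'}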

Thus, in step 1207, the decoder may decode the first attribute with the first codec and decode the second attribute with the second codec to generate a decoded sequence of PCC frames. The decoder may also utilize the other attributes and/or syntax elements described herein when determining the appropriate mechanisms to employ for decoding the various attributes of the PCC video sequence with the corresponding codecs.

Fig. 13 is a schematic diagram of an exemplary video coding apparatus 1300. The video coding device 1300 is suitable for implementing the disclosed examples/embodiments described herein. The video coding device 1300 comprises a downstream port 1320, an upstream port 1350, and/or a transceiver unit (Tx/Rx) 1310 comprising a transmitter and/or receiver for transmitting data upstream and/or downstream over a network. The video coding apparatus 1300 further includes: a processor 1330 including a Central Processing Unit (CPU) and/or logic for processing data; and a memory 1332 for storing data. The video coding device 1300 may also include electrical, optical-to-electrical (OE), electrical-to-optical (EO), and/or wireless communication components coupled to the upstream port 1350 and/or the downstream port 1320 for data communication over electrical, optical, or wireless communication networks. The video coding device 1300 may also include an input/output (I/O) device 1360 for transmitting data to and from a user. The I/O devices 1360 may include output devices such as a display for displaying video data, speakers for outputting audio data, and so forth. The I/O devices 1360 may also include input devices such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.

Processor 1330 is implemented in hardware and software. Processor 1330 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and digital signal processors (DSPs). Processor 1330 communicates with the downstream port 1320, Tx/Rx 1310, the upstream port 1350, and the memory 1332. Processor 1330 includes a coding module 1314. The coding module 1314 implements the embodiments disclosed above, such as methods 100, 1100, 1200, 1500, and 1600 and mechanism 800, which may employ point cloud media 500, point cloud media frames 600, and/or PCC video stream 700, and/or streams 1031-1034 coded over layers 931-934. The coding module 1314 may also implement any other methods/mechanisms described herein. In addition, the coding module 1314 may implement the codec system 200, the encoder 300, and/or the decoder 400. For example, the coding module 1314 may use an extended set of attributes for PCC with multiple streams and layers, and may indicate the use of such an attribute set in the sequence-level data to support decoding. Accordingly, the coding module 1314 provides the video coding device 1300 with additional functionality and/or flexibility when coding PCC video data. Thus, the coding module 1314 improves the functionality of the video coding device 1300 and addresses problems specific to the field of video coding. Further, the coding module 1314 may transform the video coding device 1300 into a different state. Alternatively, the coding module 1314 may be implemented as instructions stored in the memory 1332 and executed by the processor 1330 (e.g., as a computer program product stored on a non-transitory medium).

The memory 1332 includes one or more types of memory, such as magnetic disks, tape drives, solid-state drives, read-only memory (ROM), random access memory (RAM), flash memory, ternary content-addressable memory (TCAM), static random-access memory (SRAM), and so forth. The memory 1332 may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.

Fig. 14 is a schematic diagram of an exemplary system 1400 for coding PCC video sequences using multiple codecs. The system 1400 comprises a video encoder 1402. The video encoder 1402 comprises a first attribute encoding module 1401 for encoding a first attribute of a sequence of PCC frames into a codestream using a first codec. The video encoder 1402 further comprises a second attribute encoding module 1403 for encoding a second attribute of the sequence of PCC frames into the codestream using a second codec different from the first codec. The video encoder 1402 also includes a syntax encoding module 1405 for encoding into the codestream a sequence-level data unit containing sequence-level parameters, wherein the sequence-level data unit includes a first syntax element indicating that the first attribute is coded by the first codec and that the second attribute is coded by the second codec. The video encoder 1402 further comprises a sending module 1407 for sending the codestream to support generating a decoded sequence of PCC frames according to the first attribute coded by the first codec and the second attribute coded by the second codec. The modules of the video encoder 1402 may also be used to perform any of the steps/items described above with respect to methods 1100 and/or 1500.
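For illustration, the following sketch shows an encoder-side counterpart to the parsing example above: the syntax encoding module writing one codec indication per attribute into a sequence-level data unit. The bit writer and the eight-bit field width are assumptions mirroring the earlier sketch, not the normative syntax.

```cpp
// Encoder-side sketch: writing one identified_codec_for_attribute entry per
// attribute into a sequence-level data unit.
#include <cstdint>
#include <vector>

struct BitWriter {
  std::vector<uint8_t> bytes;
  int bit = 0;  // next free bit within the last byte
  void WriteBits(uint32_t v, int n) {
    for (int i = n - 1; i >= 0; --i) {
      if (bit == 0) bytes.push_back(0);
      bytes.back() |= ((v >> i) & 1u) << (7 - bit);
      bit = (bit + 1) % 8;
    }
  }
};

// codec_id_per_attribute[k] carries the codec chosen for attribute k, e.g.
// {1, 0} for geometry coded with codec 1 and texture coded with codec 0.
void WriteCodecIndications(BitWriter& bw,
                           const std::vector<uint32_t>& codec_id_per_attribute) {
  for (uint32_t id : codec_id_per_attribute) bw.WriteBits(id, 8);
}
```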

The system 1400 further comprises a video decoder 1410. The video decoder 1410 comprises a receiving module 1411 for receiving a codestream comprising a sequence of PCC frames. The video decoder 1410 further includes a parsing module 1413 for parsing the codestream to obtain a sequence-level data unit containing sequence-level parameters, wherein the sequence-level data unit includes a first syntax element indicating that a first attribute of the PCC frames is coded by a first codec and that a second attribute of the PCC frames is coded by a second codec. The video decoder 1410 further includes a decoding module 1415 for decoding the first attribute with the first codec and the second attribute with the second codec to generate a decoded sequence of PCC frames. The modules of the video decoder 1410 may also be used to perform any of the steps/items described above with respect to methods 1200 and/or 1600.

Fig. 15 is a flow diagram of another exemplary method 1500 for encoding a PCC video sequence using multiple codecs. For example, method 1500 may simultaneously use attribute layers 931, 932, 933, and 934 and/or streams 1031, 1032, 1033, and 1034 to organize data into a codestream in accordance with mechanism 800. Further, method 1500 may indicate the mechanism used for encoding the attributes in a GOF header. Further, method 1500 may generate the PCC video stream 700 by encoding the point cloud media frames 600 of the point cloud media 500. Further, method 1500 may be employed by the codec system 200 and/or the encoder 300 when performing the encoding steps of method 100.

At step 1501, a plurality of PCC attributes are encoded into the codestream as part of a sequence of PCC frames. The PCC attributes are encoded using a plurality of codecs. The PCC attributes include geometry and texture, and further include one or more of reflectivity, transparency, and normal. Each coded PCC frame is represented by one or more PCC NAL units. At step 1503, an indication is encoded for each PCC attribute, the indication identifying the video codec used to code the corresponding PCC attribute. At step 1505, the codestream is sent to a decoder.
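Since each coded PCC frame is represented by one or more PCC NAL units, a decoder must be able to associate each unit with an attribute, stream, and layer. The following sketch shows one hypothetical way such membership could be carried and used to group payloads for codec dispatch; the header layout is an assumption for illustration, and the normative NAL unit syntax is defined by the PCC specification.

```cpp
// Hypothetical PCC NAL unit carrying attribute/stream/layer membership, and
// a helper that groups payloads so each group can be routed to the codec
// indicated for its attribute.
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct PccNalUnit {
  uint8_t attribute_id;          // e.g., 0 = geometry, 1 = texture
  uint8_t stream_id;             // index below num_streams_for_attribute
  uint8_t layer_id;              // index below num_layers_for_attribute
  std::vector<uint8_t> payload;  // coded video payload of this unit
};

using Membership = std::tuple<uint8_t, uint8_t, uint8_t>;

// Concatenates payloads per (attribute, stream, layer) for codec dispatch.
std::map<Membership, std::vector<uint8_t>> GroupByMembership(
    const std::vector<PccNalUnit>& units) {
  std::map<Membership, std::vector<uint8_t>> out;
  for (const PccNalUnit& u : units) {
    auto& buf = out[Membership{u.attribute_id, u.stream_id, u.layer_id}];
    buf.insert(buf.end(), u.payload.begin(), u.payload.end());
  }
  return out;
}
```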

Fig. 16 is a flow diagram of another exemplary method 1600 for decoding a PCC video sequence using multiple codecs. For example, method 1600 may simultaneously use attribute layers 931, 932, 933, and 934 and/or streams 1031, 1032, 1033, and 1034 to read data from the codestream in accordance with mechanism 800. Further, method 1600 may determine the mechanism for decoding the attributes by reading the GOF header. In addition, method 1600 may read the PCC video stream 700 in order to reconstruct the point cloud media frames 600 and the point cloud media 500. Further, method 1600 may be employed by the codec system 200 and/or the decoder 400 when performing the decoding steps of method 100.

At step 1601, a codestream is received. The codestream includes a plurality of coded sequences of PCC frames. The coded sequences of PCC frames represent a plurality of PCC attributes. The PCC attributes include geometry and texture, and further include one or more of reflectivity, transparency, and normal. Each coded PCC frame is represented by one or more PCC NAL units. At step 1603, the codestream is parsed to obtain, for each PCC attribute, an indication of the video codec used to code the corresponding PCC attribute. At step 1605, the codestream is decoded according to the indicated video codec for each PCC attribute.

A first component is directly coupled to a second component when there are no intervening components, other than wires, traces, or other media, between the first component and the second component. A first component is indirectly coupled to a second component when there are intervening components, other than wires, traces, or other media, between the first component and the second component. The term "coupled" and variations thereof include both direct coupling and indirect coupling. Unless otherwise indicated, use of the term "about" means a range including ±10% of the subsequent number.

While several embodiments of the present invention have been provided, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the present invention. The present examples are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein. For example, various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.

Moreover, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
