Method and apparatus for using direct coding in point cloud compression

Document No.: 1205571 Publication date: 2020-09-01

Abstract: This technology, "Method and apparatus for using direct coding in point cloud compression", was devised by D. Flynn and S. Lasserre on 2019-01-10. It describes methods and apparatus for coding a point cloud using a direct coding mode that codes the coordinates of points within the sub-volume associated with a current node, rather than an occupancy pattern for its child nodes. Eligibility to use direct coding is based on occupancy data from another node. If the node is eligible, a flag is represented in the bitstream to signal whether direct coding is applied to points in the sub-volume.

1. A method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the method comprising:

traversing the tree to entropy encode an occupancy pattern of a set of child nodes, wherein an occupancy pattern indicates which of the sub-volumes associated with respective child nodes contain at least one point, and wherein the traversing comprises:

for a current node associated with a sub-volume,

determining, based on occupancy data from another node, that a point within the sub-volume associated with the current node is eligible for direct coding;

determining that direct coding is to be applied based on a number of points within the sub-volume associated with the current node;

inserting a flag in the bitstream indicating that the direct coding is to be applied in association with the current node; and

entropy encoding the location coordinate data for at least some of the points within the sub-volume.

2. The method of claim 1 or claim 11, wherein determining that the current node is eligible for direct coding comprises determining eligibility based on an occupancy pattern of a parent node of the current node.

3. The method of claim 2, wherein the occupancy pattern of the parent node comprises whether sibling nodes of the current node are occupied.

4. The method of claim 1 or claim 11, wherein determining that the current node is eligible for direct coding comprises determining eligibility based on occupancy states of neighboring nodes of the current node.

5. The method of claim 4, wherein the neighboring node is a node associated with a respective sub-volume that shares at least one face with the sub-volume associated with the current node.

6. The method of claim 4, wherein the neighboring node is a node associated with a respective sub-volume that shares at least one edge with the sub-volume associated with the current node.

7. The method of claim 4, wherein the neighboring node is a node associated with a respective sub-volume that shares at least one vertex with the sub-volume associated with the current node.

8. The method of any of claims 1-7 or 11, wherein determining that the current node is eligible for direct coding comprises determining an eligibility based on occupancy data of a grandparent node of the current node.

9. The method of claim 8, wherein the occupancy data of the grandparent node comprises whether sibling nodes of a parent node of the current node are occupied.

10. An encoder for encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the encoder comprising:

a processor;

a memory; and

an encoding application comprising instructions executable by the processor, the instructions, when executed, causing the processor to perform the method of any of claims 1 to 9.

11. A method of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the method comprising:

traversing the tree to entropy decode occupancy patterns of a set of child nodes, wherein an occupancy pattern indicates which of the sub-volumes associated with a respective child node contain at least one point, and wherein the traversing comprises:

for a current node associated with a sub-volume,

determining, based on occupancy data from another node, that a point within the sub-volume associated with the current node is eligible for direct coding;

decoding a flag from the bitstream, wherein the decoded flag indicates that direct coding is used for the current node; and

entropy decoding position coordinate data for a point within the sub-volume based on the decoded flag.

12. A decoder for decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the decoder comprising:

a processor;

a memory; and

a decoding application comprising instructions executable by the processor, the instructions, when executed, causing the processor to perform the method of claim 11 or any one of claims 2 to 9 when dependent on claim 11.

13. A non-transitory processor-readable medium storing processor-executable instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-9 or 11.

14. A computer readable signal containing program instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1 to 9 or 11.

Technical Field

The present application relates generally to point cloud compression, and in particular to methods and apparatus for improving compression of point clouds using inferred direct coding of sparsely populated volumes.

Background

Data compression is used in communication and computer networks to efficiently store, transmit, and reproduce information. Increasingly, there is interest in the representation of three-dimensional objects or spaces, which may involve large data sets, for which efficient and effective compression would be very useful and of high value. In some cases, a three-dimensional object or space may be represented using a point cloud, which is a set of points, each having a three-coordinate location (X, Y, Z), and in some cases, other attributes, such as color data (e.g., brightness and chromaticity), transparency, reflectivity, normal vectors, and the like. The point cloud may be static (a snapshot of a fixed object or environment/object at a single point in time) or dynamic (a chronological sequence of point clouds).

Example applications of point clouds include terrain and mapping applications. Autonomous vehicles and other machine vision applications may rely on point cloud sensor data in the form of a 3D scan of an environment, such as from a LiDAR scanner. Virtual reality simulation may rely on point clouds.

It will be appreciated that point clouds may involve a large amount of data, and it is important to compress (encode) and decode this data quickly and accurately. Accordingly, it would be advantageous to provide methods and apparatus that are capable of more efficiently and/or effectively compressing point cloud data.

Drawings

Reference will now be made, by way of example, to the accompanying drawings, which illustrate example embodiments of the present application, and in which:

FIG. 1 shows a simplified block diagram of an example point cloud encoder;

FIG. 2 shows a simplified block diagram of an example point cloud decoder;

FIG. 3 illustrates an example partial sub-volume and associated tree structure for coding;

FIG. 4 illustrates the recursive splitting and coding of an octree;

FIG. 5 illustrates, in flow diagram form, an example method for encoding a point cloud;

FIG. 6 illustrates a portion of an example octree;

FIG. 7 illustrates an example of adjacent sub-volumes;

FIG. 8 illustrates, in flow diagram form, an example method for decoding a bitstream of compressed point cloud data;

FIG. 9 shows an example simplified block diagram of an encoder; and

FIG. 10 shows an example simplified block diagram of a decoder.

Like reference numbers may be used in different figures to denote like components.

Detailed Description

Methods of encoding and decoding point clouds, and encoders and decoders for encoding and decoding point clouds, are described. In general, this application describes methods and apparatus for coding point clouds using a direct coding mode to code the coordinates of a point within a sub-volume associated with a current node, rather than an occupancy pattern for its child nodes. Eligibility to use direct coding is based on occupancy data from another node. The other node is a previously coded node, such as a sibling node of the current node or a neighboring node of the current node. The other node may be a plurality of neighboring nodes. If the current node is eligible, a flag is represented in the bitstream to signal whether direct coding is applied to points in the sub-volume.

In one aspect, the present application describes a method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains the points of the point cloud. The method includes traversing the tree to entropy encode an occupancy pattern for a set of child nodes, wherein the occupancy pattern indicates which of the sub-volumes associated with the respective child nodes contain at least one point. The traversing includes, for a current node associated with a sub-volume: determining, based on occupancy data from another node, that a point within the sub-volume associated with the current node is eligible for direct coding; determining that direct coding is to be applied based on a number of points within the sub-volume associated with the current node; inserting a flag in the bitstream indicating that direct coding is to be applied in association with the current node; and entropy encoding the location coordinate data for at least some of the points within the sub-volume.

In another aspect, the present application describes a method of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud. The method includes traversing the tree to entropy decode occupancy patterns of a set of child nodes, wherein an occupancy pattern indicates which of the sub-volumes associated with the respective child nodes contain at least one point. The traversing includes, for a current node associated with a sub-volume: determining, based on occupancy data from another node, that a point within the sub-volume associated with the current node is eligible for direct coding; decoding a flag from the bitstream, wherein the decoded flag indicates that direct coding is used for the current node; and entropy decoding the position coordinate data of the point within the sub-volume based on the decoded flag.

In some implementations, determining that the current node is eligible for direct coding includes determining eligibility based on occupancy data of a parent node of the current node. In some cases, the occupancy data of the parent node includes whether sibling nodes of the current node are occupied.

In some implementations, determining that the current node is eligible for direct coding includes determining eligibility based on occupancy states of neighboring nodes of the current node. In some examples, the neighboring node is a node associated with a respective sub-volume that shares at least one face with the sub-volume associated with the current node. In some other examples, the neighboring node is a node associated with a respective sub-volume that shares at least one edge with the sub-volume associated with the current node. In other examples, the neighboring node is a node associated with a respective sub-volume that shares at least one vertex with the sub-volume associated with the current node.
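
The three kinds of neighbor described above can be told apart by a simple count. The following is a minimal sketch, assuming equal-sized cubic sub-volumes on a regular grid; the function name `neighbour_kind` and the offset convention are illustrative and not taken from the application. A neighbor is labeled by the highest-dimensional element it shares with the current sub-volume.

```python
from itertools import product


def neighbour_kind(offset):
    """Classify a same-size neighbouring cube by its grid offset.

    `offset` is (dx, dy, dz) with each component in {-1, 0, 1}.
    One non-zero component means a shared face, two a shared edge
    (but no face), three a shared vertex only.
    """
    nonzero = sum(1 for d in offset if d != 0)
    return {0: "self", 1: "face", 2: "edge", 3: "vertex"}[nonzero]


# Tally the 26 neighbours of a cube (plus the cube itself).
counts = {}
for off in product((-1, 0, 1), repeat=3):
    kind = neighbour_kind(off)
    counts[kind] = counts.get(kind, 0) + 1
```

Under these assumptions a cube has 6 face-sharing, 12 edge-sharing and 8 vertex-sharing neighbors.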

In some implementations, determining that the current node is eligible for direct coding includes determining the eligibility based on occupancy data of grandparent nodes of the current node. In some cases, the occupancy data of the grandparent node includes whether sibling nodes of a parent node of the current node are occupied.

In another aspect, the present application describes an encoder and decoder configured to implement such encoding and decoding methods.

In another aspect, the present application describes a non-transitory computer-readable medium storing computer-executable program instructions that, when executed, cause one or more processors to perform the described encoding and/or decoding methods.

In another aspect, the present application describes a computer-readable signal containing program instructions which, when executed by a computer, cause the computer to perform the described encoding and/or decoding method.

Other aspects and features of the present application will become apparent to those ordinarily skilled in the art upon review of the following description of examples in conjunction with the accompanying figures.

In the following description, the terms "node" and "sub-volume" may at times be used interchangeably. It should be understood that a node is associated with a sub-volume. A node is a particular point on the tree, which may be an internal node or a leaf node. A sub-volume is the bounded physical space that the node represents. The term "volume" may be used to refer to the largest bounded space defined to contain the point cloud. The volume is recursively split into sub-volumes in order to build the tree structure of interconnected nodes used to code the point cloud data.

In this application, the term "and/or" is intended to cover all possible combinations and sub-combinations of the listed elements, including any one of the listed elements alone, any sub-combination, or all of the elements, without necessarily excluding additional elements.

In this application, the phrase "at least one of ... or ..." is intended to cover any one or more of the listed elements, including any one of the listed elements alone, any sub-combination, or all of the elements, without necessarily excluding any additional elements and without necessarily requiring all of the elements.

A point cloud is a set of points in a three-dimensional coordinate system. These points are generally intended to represent the outer surface of one or more objects. Each point has a location (position) in a three-dimensional coordinate system. The position may be represented by three coordinates (X, Y, Z), which may be cartesian or any other coordinate system. These points may have other associated attributes such as color, and in some cases may also be three-component values such as R, G, B or Y, Cb, Cr. Other relevant attributes may include transparency, reflectivity, normal vector, etc., depending on the desired application of the point cloud data.

The point cloud may be static or dynamic. For example, a detailed scan or mapping of an object or terrain may be static point cloud data. LiDAR-based scanning of an environment for machine vision purposes may be dynamic in that the point cloud changes (at least potentially) over time, e.g., with each successive scan of a volume. A dynamic point cloud is therefore a chronological sequence of point clouds.

Point cloud data can be used in many applications, including conservation (scanning of historical or cultural objects), mapping, machine vision (such as autonomous or semi-autonomous cars), and virtual reality systems, to name a few examples. Dynamic point cloud data for machine vision and similar applications may be quite different from static point cloud data for conservation purposes. For example, automotive vision typically involves relatively low resolution, colorless, highly dynamic point clouds acquired by a LiDAR (or similar) sensor at a high capture frequency. Such point clouds are not intended for human consumption or viewing, but rather for machine object detection/classification in a decision-making process. For example, a typical LiDAR frame contains tens of thousands of points, whereas a high-quality virtual reality application requires millions of points. It is expected that higher resolution data will be required over time as computing speeds increase and new applications emerge.

While point cloud data is useful, the lack of efficient and effective compression (i.e., encoding and decoding processes) may hamper adoption and deployment.

One of the more common mechanisms for coding point cloud data is to use a tree-based structure. In a tree-based structure, the bounding three-dimensional volume of the point cloud is recursively split into sub-volumes. The nodes of the tree correspond to sub-volumes. The decision whether to split a sub-volume further may be based on the resolution of the tree and/or on whether any points are contained in the sub-volume. A leaf node may have an occupancy flag that indicates whether its associated sub-volume contains a point. A split flag may signal whether a node has child nodes (i.e., whether the current volume has been further split into sub-volumes). In some cases these flags may be entropy coded, and in some cases predictive coding may be used.

One commonly used tree structure is an octree. In this structure, the volumes/sub-volumes are all cubes, and each split of a sub-volume results in eight additional sub-volumes/sub-cubes. Another commonly used tree structure is the KD-tree, in which a volume (cube or cuboid) is recursively divided into two parts by a plane orthogonal to one of the axes. Octree is a special case of a KD-tree whose volume is divided by three planes, each plane orthogonal to one of the three axes. Both examples relate to a cube or rectangular cuboid; however, the present application is not limited to such a tree structure, and in some applications the volumes and sub-volumes may have other shapes. The volume is not necessarily divided into two sub-volumes (KD-trees) or eight sub-volumes (octree), and the division of the volume may involve other divisions, including division into non-rectangular shapes or involving non-adjacent sub-volumes.

For ease of explanation, the present application may refer to octrees, as octrees are popular candidate tree structures for automotive applications, but it should be understood that the methods and apparatus described herein may be implemented using other tree structures.

Reference is now made to FIG. 1, which shows a simplified block diagram of a point cloud encoder 10 in accordance with aspects of the present application. The point cloud encoder 10 includes a tree building module 12 that receives point cloud data and produces a tree (in this example, an octree) representing the geometry of the volumetric space containing the point cloud and indicating the location or position of points of the point cloud within that geometry.

A basic process for creating an octree to code a point cloud may include:

1. Starting from a bounding volume (cube) containing the point cloud in a coordinate system

2. Splitting the volume into 8 subvolumes (eight subcubes)

3. For each sub-volume, if the sub-volume is empty, the sub-volume is marked with 0, and if there is at least one point in it, the sub-volume is marked with 1.

4. Repeating (2) for all sub-volumes labeled 1 to split the sub-volumes until a maximum split depth is reached

5. For all leaf sub-volumes (sub-cubes) at the maximum depth, the leaf cube is marked with 1 if it is non-empty; otherwise the leaf cube is marked with 0

The above process may be described as an occupancy-equals-splitting process, in which splitting implies occupancy, subject to the constraint that there is a maximum depth or resolution beyond which no further splitting occurs. In this case, a single flag signals whether a node is split and hence whether it is occupied by at least one point, and vice versa. At the maximum depth, the flag signals occupancy with no further splitting possible.
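
The construction steps above can be sketched as a short recursive routine. This is only an illustration under stated assumptions: points are integer `(x, y, z)` tuples inside a cube whose side is a power of two, the maximum depth is reached at side length 1, and the child scan order (x as the most significant bit of the child index) is an arbitrary choice made here. The name `build_octree` and the dictionary layout are hypothetical.

```python
def build_octree(points, origin=(0, 0, 0), size=8):
    """Recursively split an occupied cube and record the 0/1 marks of its children."""
    node = {"origin": origin, "size": size, "marks": None, "children": {}}
    if size == 1:
        # Maximum depth reached: an occupied leaf, no further split.
        return node
    half = size // 2
    buckets = {}
    for p in points:
        # Assumed scan order: x is the most significant bit of the child index.
        idx = 0
        for axis in range(3):
            idx = (idx << 1) | (1 if p[axis] >= origin[axis] + half else 0)
        buckets.setdefault(idx, []).append(p)
    # Step 3: mark each of the eight sub-volumes with 1 (occupied) or 0 (empty).
    node["marks"] = [1 if i in buckets else 0 for i in range(8)]
    # Step 4: only sub-volumes marked 1 are split further.
    for idx, bucket in buckets.items():
        child_origin = tuple(
            origin[a] + half * ((idx >> (2 - a)) & 1) for a in range(3)
        )
        node["children"][idx] = build_octree(bucket, child_origin, half)
    return node


root = build_octree([(0, 0, 0), (7, 7, 7), (7, 6, 7)])
```

Empty sub-volumes are never visited again, which is what makes the single flag serve as both a split flag and an occupancy flag.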

In some implementations, splitting and occupying are independent, such that a node may be occupied and may or may not be split. There are two variants of this implementation:

1. Split-then-occupied. A signal flag indicates whether a node is split. If it is split, the node must contain a point; that is, splitting implies occupancy. Otherwise, if the node is not split, a further occupancy flag signals whether the node contains at least one point. Hence, when a node is not further split (i.e., it is a leaf node), the leaf node must have an associated occupancy flag to indicate whether it contains any points.

2. Occupied-then-split. A single flag indicates whether the node is occupied. If it is not occupied, no splitting occurs. If it is occupied, a split flag is coded to indicate whether the node is further split.
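
The two variants differ only in which flag comes first and which flag is conditional. A minimal sketch follows; the function names and the convention that 1 means "split" or "occupied" are assumptions made for illustration.

```python
def flags_split_then_occupied(is_split, is_occupied):
    """Variant 1: the split flag first, then an occupancy flag only for unsplit nodes."""
    if is_split:
        if not is_occupied:
            raise ValueError("a split node is occupied by definition")
        return [1]  # splitting implies occupancy, so no second flag
    return [0, 1 if is_occupied else 0]


def flags_occupied_then_split(is_occupied, is_split):
    """Variant 2: the occupancy flag first, then a split flag only for occupied nodes."""
    if not is_occupied:
        if is_split:
            raise ValueError("an unoccupied node is never split")
        return [0]
    return [1, 1 if is_split else 0]
```

In both variants an occupied, unsplit node costs two flags; the variants differ in whether split nodes or empty nodes get the one-flag case.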

Irrespective of which of the above processes is used to build the tree, the tree may be traversed in a predefined order (breadth-first or depth-first, and in accordance with a scan pattern/order within each divided sub-volume) to produce a sequence of bits from the flags (occupancy and/or split flags). This may be termed the serialization or binarization of the tree. As shown in FIG. 1, in this example, the point cloud encoder 10 includes a binarizer 14 for binarizing the octree to produce a bitstream of binarized data representing the tree.

The bit sequence may then be encoded using an entropy encoder 16 to produce a compressed bitstream. The entropy encoder 16 may encode the sequence of bits using a context model 18 that specifies probabilities for coding bits based on a context determination by the entropy encoder 16. The context model 18 may be adaptively updated after each bit or defined set of bits is coded. In some cases, the entropy encoder 16 may be a binary arithmetic encoder. In some implementations, the binary arithmetic encoder may employ Context Adaptive Binary Arithmetic Coding (CABAC). In some implementations, coders other than arithmetic coders may be used.

In some cases, the entropy encoder 16 may not be a binary coder, but may instead operate on non-binary data. The octree data output from the tree building module 12 may not be evaluated in binary form, but may instead be encoded as non-binary data. For example, in the case of an octree, the eight flags (e.g., occupancy flags) within a sub-volume, taken in scan order, may be considered as a number with 2^8 - 1 possible values (e.g., an integer having a value between 1 and 255, since the value 0 is not possible for a split sub-volume; that is, the sub-volume would not have been split had it been entirely unoccupied). In some implementations, this number may be encoded by the entropy encoder using a multi-symbol arithmetic coder. Within a sub-volume, e.g., a cube, the sequence of flags that defines this integer may be termed a "pattern".
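
The mapping between eight occupancy flags and a pattern integer can be sketched as follows. The bit order (first flag in scan order becomes the most significant bit) is an assumption, since the text leaves the scan order open; the function names are illustrative.

```python
def pattern_from_flags(flags):
    """Pack eight occupancy flags, in scan order, into an integer in [1, 255]."""
    if len(flags) != 8:
        raise ValueError("an octree node has exactly eight children")
    value = 0
    for flag in flags:
        value = (value << 1) | (1 if flag else 0)
    if value == 0:
        # A split node always has at least one occupied child.
        raise ValueError("pattern 0 cannot occur for a split node")
    return value


def flags_from_pattern(pattern):
    """Inverse mapping: unpack a pattern integer into eight 0/1 flags."""
    if not 1 <= pattern <= 255:
        raise ValueError("pattern must lie in [1, 255]")
    return [(pattern >> (7 - i)) & 1 for i in range(8)]
```

For instance, a node whose first and last children (in scan order) are the only occupied ones maps to the pattern 129.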

As with video or image coding, point cloud coding may include a predictive operation in which efforts are made to predict the pattern of the sub-volume and the residual from the prediction is coded, rather than encoding the pattern itself. The prediction may be spatial (depending on previously coded sub-volumes in the same point cloud) or may be temporal (depending on previously coded point clouds in a time-ordered sequence of point clouds).

A block diagram of an example point cloud decoder 50 corresponding to the encoder 10 is shown in FIG. 2. The point cloud decoder 50 includes an entropy decoder 52 that uses the same context model 54 as used by the encoder 10. The entropy decoder 52 receives an input bitstream of compressed data and entropy decodes the data to produce an output sequence of decompressed bits. The sequence is then converted to reconstructed point cloud data by a tree reconstructor 56. The tree reconstructor 56 rebuilds the tree structure from the decompressed data and from knowledge of the scan order in which the tree data was binarized. The tree reconstructor 56 is thus able to reconstruct the location of the points of the point cloud (subject to the resolution of the tree coding).

An example partial sub-volume 100 is shown in FIG. 3. In this example, a slice of the sub-volume 100 is shown in two dimensions for ease of illustration, and the size of the sub-volume 100 is 16 x 16. It will be noted that the sub-volume has been divided into four 8 x 8 sub-squares, and two of the 8 x 8 sub-squares have been further subdivided into 4 x 4 sub-squares, three of which are further divided into 2 x 2 sub-squares, and one of the 2 x 2 sub-squares is then divided into 1 x 1 squares. The 1 x 1 squares are the maximum depth of the tree and represent the finest resolution for positional point data. Points of the point cloud are shown as dots in the figure.

The structure of the tree 102 is shown to the right of the sub-volume 100. To the right of the tree 102 are shown the sequence of split flags 104 and the corresponding sequence of occupancy flags 106, obtained in a predefined breadth-first scan order. It will be observed that in this illustrative example there is an occupancy flag for each sub-volume (node) that is not split, i.e., that has an associated split flag set to zero. These sequences may be entropy encoded.

Another example, which uses the occupancy-equals-splitting condition, is shown in FIG. 4. FIG. 4 illustrates the recursive splitting and coding of an octree 150. Only a portion of the octree 150 is shown. A FIFO 152 is shown as processing the nodes for splitting, to illustrate the breadth-first nature of the present process. The FIFO 152 outputs an occupied node 154 that was queued in the FIFO 152 for further splitting after processing of its parent node 156. The tree builder splits the sub-volume associated with the occupied node 154 into eight sub-volumes (cubes) and determines their occupancy. The occupancy may be indicated by an occupancy flag for each sub-volume. In a prescribed scan order, the flags may be referred to as the occupancy pattern for the node 154. The pattern may be specified by the integer representing the sequence of occupancy flags associated with the sub-volumes in the predefined scan order. In the case of an octree, the pattern is an integer in the range [1, 255].

The entropy encoder then encodes the pattern using a non-binary arithmetic encoder based on probabilities specified by the context model. In this example, the probabilities may be a pattern distribution that is based on an initial distribution model and is adaptively updated. In one implementation, the pattern distribution is effectively a counter of the number of times each pattern (an integer from 1 to 255) has been encountered during coding. The pattern distribution may be updated after each sub-volume is coded. The pattern distribution may be normalized as needed, since it is the relative frequency of the patterns, not the absolute count, that matters for the probability estimate.
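
A counter-based pattern distribution of this kind can be sketched as below. This is not the codec's actual context model: the uniform initial distribution (one pseudo-count per pattern), the halving rule used for normalization, and the class name `PatternModel` are all assumptions chosen to keep the example small.

```python
class PatternModel:
    """Adaptive frequency counts over the 255 possible occupancy patterns."""

    def __init__(self):
        # Index 0 is unused because pattern 0 cannot occur.
        # Assumed initial model: one pseudo-count for every pattern.
        self.counts = [0] + [1] * 255
        self.total = 255

    def probability(self, pattern):
        """Relative frequency used as the probability estimate."""
        return self.counts[pattern] / self.total

    def update(self, pattern):
        """Called after a sub-volume's pattern has been coded."""
        self.counts[pattern] += 1
        self.total += 1

    def normalise(self, limit=1 << 16):
        """Halve all counts once the total grows past `limit`.

        Only relative frequencies matter, so scaling changes the
        estimates very little while keeping the counters bounded.
        """
        if self.total > limit:
            self.counts = [0] + [max(1, c >> 1) for c in self.counts[1:]]
            self.total = sum(self.counts)


model = PatternModel()
before = model.probability(129)
model.update(129)
after = model.probability(129)
```

Because encoder and decoder apply the same updates in the same order, both sides hold identical estimates without any side information.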

Based on this pattern, those child nodes that are occupied (e.g., have a flag of 1) are then pushed into FIFO 152 for further splitting in turn (provided that these nodes are not the maximum depth of the tree).
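
The FIFO-driven traversal can be sketched as follows, producing the sequence of occupancy patterns in breadth-first order. The assumptions are the same as for any such toy example: integer points inside a cube with power-of-two side rooted at the origin, maximum depth at side length 1, and a scan order in which x is the most significant bit of the child index. The function names are illustrative.

```python
from collections import deque


def child_index(point, origin, half):
    """Index 0..7 of the child cube containing `point` (assumed scan order)."""
    idx = 0
    for axis in range(3):
        idx = (idx << 1) | (1 if point[axis] >= origin[axis] + half else 0)
    return idx


def breadth_first_patterns(points, size):
    """Return the occupancy patterns of all split nodes in breadth-first order."""
    patterns = []
    fifo = deque([((0, 0, 0), size, list(points))])
    while fifo:
        origin, side, pts = fifo.popleft()
        if side == 1:
            continue  # maximum depth: nothing further to split
        half = side // 2
        buckets = [[] for _ in range(8)]
        for p in pts:
            buckets[child_index(p, origin, half)].append(p)
        pattern = 0
        for idx, bucket in enumerate(buckets):
            pattern = (pattern << 1) | (1 if bucket else 0)
            if bucket:
                # Occupied children are queued for further splitting.
                child_origin = tuple(
                    origin[a] + half * ((idx >> (2 - a)) & 1) for a in range(3)
                )
                fifo.append((child_origin, half, bucket))
        patterns.append(pattern)
    return patterns
```

With two points at opposite corners of a 4 x 4 x 4 cube, the root pattern has its first and last bits set, and each of the two occupied children then contributes a pattern with a single occupied child.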

An octree representation, or more generally any tree representation, is efficient at representing points with spatial correlation because trees tend to factorize the higher-order bits of the point coordinates. For an octree, each level of depth refines the coordinates of points within a sub-volume by one bit for each component, at a cost of eight bits per refinement. Further compression is obtained by entropy coding the split information (i.e., the patterns) associated with each tree node. This further compression is possible because the pattern distribution is not uniform; the non-uniformity is another consequence of the correlation.
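
The one-bit-per-component refinement can be made concrete: the child chosen at each depth level is formed from one bit of each coordinate, most significant bit first. The sketch below assumes non-negative integer coordinates of `depth` bits and a child index laid out as (x bit, y bit, z bit); both function names are hypothetical.

```python
def child_path(point, depth):
    """Child indices from the root down to the leaf containing `point`."""
    x, y, z = point
    path = []
    for level in range(depth - 1, -1, -1):
        # One bit of each coordinate selects one of eight children.
        idx = (((x >> level) & 1) << 2) | (((y >> level) & 1) << 1) | ((z >> level) & 1)
        path.append(idx)
    return path


def point_from_path(path):
    """Rebuild the coordinates by appending one bit per component per level."""
    x = y = z = 0
    for idx in path:
        x = (x << 1) | ((idx >> 2) & 1)
        y = (y << 1) | ((idx >> 1) & 1)
        z = (z << 1) | (idx & 1)
    return (x, y, z)
```

Points that are close together share a long common prefix of this path, which is why the tree represents spatially correlated points compactly.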

However, one of the problems with compressing point cloud data in a tree structure is that it does not necessarily deal well with isolated points. Recursively splitting a sub-volume and locating the points within the split sub-volumes involves computational burden and time, and signaling the recursive splitting of sub-volumes to pinpoint the location of one or a few isolated points can be costly in terms of bandwidth/memory storage as well as computational time and resources. Furthermore, isolated points "pollute" the distribution of patterns, inducing many patterns with only one occupied child, which changes the balance of the distribution and penalizes the coding of other patterns.

Thus, in one aspect, the present application proposes the direct coding of the location information of isolated points. The direct coding of the location of a point (e.g., its coordinates within a volume or sub-volume) may be termed the Direct Coding Mode (DCM).

Using DCM for all points is extremely inefficient. One option is to use a dedicated flag to signal for each occupied node whether DCM will be used for any point within that node; however, this option may result in excessive signaling overhead, resulting in poor compression performance.

Thus, according to another aspect of the present application, the eligibility to use DCM for an occupied node is determined based on occupancy information from other nodes. If the occupied node is eligible to use DCM, a flag is inserted in the bitstream to signal whether DCM is applied.

Reference is now made to FIG. 5, which illustrates, in flow diagram form, an example method 200 of encoding a point cloud. The method 200 in this example involves recursive splitting of occupied nodes (sub-volumes) and a breadth-first traversal of the tree for coding.

In operation 204, for a current occupied node, i.e., a node of the tree whose associated sub-volume is occupied by at least one point, it is determined whether the sub-volume is eligible for DCM. If it is not, the node is split and coded according to the usual tree coding process in operation 206. That is, in at least this example, the sub-volume is split into child sub-volumes, as shown in operation 216, and the occupancy pattern for those child sub-volumes is entropy encoded in operation 218. Any of those child sub-volumes that is occupied by at least one point is buffered (placed in the FIFO buffer) for further splitting/encoding in operation 220. Although not explicitly shown, it should be understood that the method 200 incorporates a stopping condition, such as a maximum tree depth, after which the method does not split sub-volumes further.

If the node is evaluated in operation 204 and is determined to be eligible for DCM, the number of points contained in the sub-volume is evaluated against a threshold in operation 208. If the number of points in the sub-volume is less than the threshold, DCM is used. If the number of points is equal to or greater than the threshold, DCM is not used. The threshold is preset and may be hard-coded or user-determined. It may be communicated from the encoder to the decoder in header information. The threshold may be 2 or greater. It will be appreciated that the evaluation may instead be modified so that DCM is enabled if the number of points is less than or equal to the threshold, in which case the threshold is set one lower to achieve the same result. In any case, if DCM is not to be used, then in operation 210 the DCM flag is set to negative (which in some implementations may be signaled as a value of 0) and output in the bitstream to inform the decoder that DCM is not used in that sub-volume. The method 200 then proceeds to operation 206 to split and encode the sub-volume in the usual manner.
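
The decision in operations 204 to 212 can be sketched as a small function. The flag values (1 for DCM used, 0 for not used) follow the "in some implementations" wording above; the function name, the return shape and the default threshold of 2 are assumptions for illustration.

```python
def dcm_decision(eligible, num_points, threshold=2):
    """Return (flag_bits, use_dcm) for an occupied node.

    No flag is written for a node that is not eligible, because the decoder
    can derive the eligibility itself from previously coded occupancy data.
    """
    if not eligible:
        return [], False
    use_dcm = num_points < threshold  # strictly below the threshold
    return [1 if use_dcm else 0], use_dcm
```

With the default threshold of 2, only eligible nodes holding a single point are coded with DCM; every other eligible node costs one flag bit and then follows the usual split-and-code path.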

If DCM is to be used, then in operation 212 the DCM flag is set to positive (which in some implementations may be a value of 1), and in operation 214 at least some points within the sub-volume are encoded by encoding their coordinate positions within the sub-volume. In some implementations, this may include encoding X, Y and Z Cartesian coordinate positions relative to a corner of the sub-volume. For example, the corner may be the vertex of the sub-volume closest to the origin of the coordinate system. Depending on the implementation, various techniques for encoding the coordinates may be applied, including prediction operations, differential coding, and so on.

Operation 214 is described above as encoding at least some points, rather than all points, because in some possible implementations a rate-distortion (RD) optimization procedure may be applied to evaluate whether the rate cost of DCM coding a point exceeds the distortion cost of not coding the point. Note that if such an RD optimization evaluation affects whether the parent node is still "occupied," the RD optimization may need to be performed earlier in the coding process and/or the process may involve two coding passes.

Once the node has been encoded, either using DCM or using regular encoding of the occupancy pattern, the method 200 retrieves the next occupied node/sub-volume from the FIFO buffer, as shown in operation 222, and loops back to evaluate whether that node/sub-volume is eligible for DCM. As described above, the stopping condition will eventually halt further subdivision of the sub-volumes, and all nodes in the FIFO buffer will have been processed.
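The flow of method 200 can be sketched as follows. This is a minimal illustration under stated assumptions, not the coding process of the application: it emits symbolic tokens rather than an entropy-coded bitstream, it assumes eligibility means "no sibling is occupied", and the function name `encode_octree_dcm`, the token names, and the child-index bit order are all hypothetical.

```python
from collections import deque

def encode_octree_dcm(points, depth, threshold=2):
    """Breadth-first octree coding with a direct coding mode (DCM).

    points: integer (x, y, z) tuples in [0, 2**depth).
    Returns symbolic (name, value) tokens instead of an entropy-coded bitstream.
    Assumption: a node is eligible for DCM when none of its siblings is occupied.
    """
    tokens = []
    # FIFO entries: (origin, log2 of size, points inside, sibling occupancy bits)
    fifo = deque([((0, 0, 0), depth, list(points), None)])
    while fifo:                                   # operation 222: next node
        origin, log2_size, pts, siblings = fifo.popleft()
        if log2_size == 0:
            continue                              # stopping condition: unit sub-volume
        if siblings == 0:                         # operation 204: eligible?
            if len(pts) < threshold:              # operation 208: point count test
                tokens.append(("dcm_flag", 1))    # operation 212
                if threshold > 2:                 # more than one point is possible
                    tokens.append(("num_points_minus1", len(pts) - 1))
                for p in pts:                     # operation 214: relative coordinates
                    rel = tuple(c - o for c, o in zip(p, origin))
                    tokens.append(("dcm_point", rel))
                continue
            tokens.append(("dcm_flag", 0))        # operation 210
        # Operations 206/216/218: split and code the occupancy pattern.
        half = 1 << (log2_size - 1)
        children = {}
        for p in pts:
            bits = [int(c - o >= half) for c, o in zip(p, origin)]
            idx = (bits[0] << 2) | (bits[1] << 1) | bits[2]
            children.setdefault(idx, []).append(p)
        pattern = sum(1 << idx for idx in children)
        tokens.append(("occupancy", pattern))
        for idx in sorted(children):              # operation 220: buffer occupied children
            step = ((idx >> 2) & 1, (idx >> 1) & 1, idx & 1)
            child_origin = tuple(o + half * s for o, s in zip(origin, step))
            fifo.append((child_origin, log2_size - 1, children[idx],
                         pattern & ~(1 << idx)))
    return tokens
```

For a single point (5, 2, 7) in an 8x8x8 volume, this sketch codes one occupancy pattern at the root and then switches to DCM, so the two remaining tree levels are never split.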

The eligibility evaluation in operation 204 is based on occupancy data for previously coded nodes. This allows both the encoder and the decoder to independently make the same determination of eligibility. For the discussion of eligibility that follows, reference will be made to FIG. 6, which schematically illustrates a partial octree 300 including a current node 302. The current node 302 is an occupied node and is being evaluated for coding: either it is further split and its occupancy pattern is coded, or its points are DCM coded. The current node 302 is one of eight child nodes of the parent node 304, and the parent node 304 is a child node of the grandparent node 306.

In one embodiment, eligibility may be based on the parent pattern. That is, the eligibility criterion may be based on the occupancy states of the seven other nodes that are children of the parent node 304 (i.e., the sibling nodes 307 of the current node 302). For example, if no sibling node 307 is occupied, this may indicate that the current node 302 is isolated and a good candidate for possible DCM coding.
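A minimal sketch of this sibling test, assuming the parent's occupancy pattern is held as an 8-bit mask in which bit i is set when child i is occupied (the bit layout and both function names are illustrative assumptions):

```python
def sibling_occupancy(parent_pattern, child_index):
    """Return the parent pattern with the current node's own bit cleared."""
    return parent_pattern & 0xFF & ~(1 << child_index)

def eligible_by_parent_pattern(parent_pattern, child_index):
    """True when none of the seven siblings of the current node is occupied."""
    if not (parent_pattern >> child_index) & 1:
        raise ValueError("the current node must itself be occupied")
    return sibling_occupancy(parent_pattern, child_index) == 0
```

Because the test only reads the parent's pattern, the encoder and decoder can both evaluate it as soon as that pattern has been coded.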

In another embodiment, eligibility may be based on neighboring nodes (sub-volumes). In some embodiments, two nodes are neighbors if they are associated with respective sub-volumes that share at least one face. In a broader definition, two nodes are neighbors if they share at least one edge. In a still broader definition, two nodes are neighbors if they share at least one vertex. FIG. 7 shows a set of neighbors around a current node, where neighbors are defined as nodes that share a face. In this example, the nodes/sub-volumes are cubes, and the cube at the center of the image has six neighbors, one for each face.

In an octree, it will be appreciated that the neighbors of the current node 302 will include three sibling nodes 307. They will also include three nodes that do not have the same parent node 304. Thus, some neighboring nodes will be available because they are sibling nodes, but others may or may not be available depending on whether those nodes were previously coded. Special processing may be applied to handle the missing neighbors. In some implementations, a missing neighbor may be assumed to be occupied or may be assumed to be unoccupied, depending on whether the process is to be biased towards DCM. It should be appreciated that the neighbor definition may be extended to include neighboring nodes based on shared edges or shared vertices, so as to include additional neighboring sub-volumes in the evaluation.
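The face-neighbor test, including the handling of missing neighbors, might be sketched as below. The dictionary `coded_occupancy`, the flag `assume_missing_occupied`, and the function name are assumptions made for illustration; neighbors outside the volume are simply treated as missing.

```python
# Offsets to the six face-sharing neighbors of a cubic sub-volume.
FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def eligible_by_face_neighbors(position, coded_occupancy, assume_missing_occupied=True):
    """Return True when no face-sharing neighbor is (or is assumed to be) occupied.

    position: integer (x, y, z) index of the current node at its tree level.
    coded_occupancy: dict mapping positions at the same level to True/False for
        nodes whose occupancy is already known to both encoder and decoder.
    assume_missing_occupied: True biases the decision away from DCM,
        False biases it towards DCM.
    """
    x, y, z = position
    for dx, dy, dz in FACE_OFFSETS:
        occupied = coded_occupancy.get((x + dx, y + dy, z + dz))
        if occupied is None:          # neighbor not yet coded, or outside the volume
            occupied = assume_missing_occupied
        if occupied:
            return False
    return True
```

Extending `FACE_OFFSETS` with the twelve edge offsets or the eight vertex offsets gives the broader neighbor definitions without changing the rest of the function.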

It will also be appreciated that evaluating the immediate surrounding neighborhood of the current node 302 based on the occupancy states of neighboring nodes may be a more accurate evaluation of isolation than evaluating the occupancy states of the sibling nodes, three of which only share an edge with the current node and one of which only shares a vertex. However, evaluating the occupancy states of sibling nodes has the advantage of modularity, since all relevant data for the evaluation is part of the parent node 304, which means that its implementation has a smaller memory footprint, whereas evaluating the occupancy states of neighbors involves buffering tree occupancy data in case it is needed for the eligibility evaluation of future neighboring nodes.

In some cases, the above two criteria may both be applied, or one of the two may be selected. For example, if the neighbors are available, eligibility may be based on the neighboring nodes; however, if one or more neighbors are not available because they belong to nodes that have not yet been coded, then the eligibility evaluation may revert to an analysis based on the sibling nodes 307 (parent pattern).

In yet another embodiment, eligibility may alternatively or additionally be based on the grandparent pattern. In other words, eligibility may be based on the occupancy states of the uncle nodes 308, which are the sibling nodes of the parent node 304. If the parent node 304 is the only occupied node in the grandparent pattern, this may be a strong indication that the node's points are isolated.

In other implementations, additional or alternative evaluations may be incorporated into the qualification evaluation. For example, the evaluation may look at the occupancy state of the neighbor nodes of the parent node or the grandparent node.

In some implementations, two or more of the above criteria for evaluating local occupancy states may be used in combination.

Determining eligibility based on any one or combination of the above evaluations may be based on all surrounding nodes being unoccupied. In some cases, it may be based on no more than a threshold number of surrounding nodes being occupied. In some embodiments, the threshold may be set to one. Other thresholds for determining eligibility may be selected depending on how aggressively DCM is to be used.
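As an illustration of combining criteria with a threshold, the sketch below counts occupied siblings (parent pattern) and occupied uncles (grandparent pattern) and treats the node as eligible when that count does not exceed `max_occupied`. The 8-bit pattern layout, the function names, and the choice to simply add the two counts are assumptions, not a rule stated in the text.

```python
def count_occupied_others(pattern, own_index):
    """Count occupied children in an 8-bit pattern, excluding the child at own_index."""
    return bin(pattern & 0xFF & ~(1 << own_index)).count("1")

def eligible_combined(parent_pattern, child_index,
                      grandparent_pattern, parent_index, max_occupied=0):
    """Eligibility from siblings plus uncles, with a tolerance threshold.

    max_occupied=0 requires every surrounding node to be unoccupied;
    max_occupied=1 tolerates a single occupied sibling or uncle.
    """
    occupied = (count_occupied_others(parent_pattern, child_index)
                + count_occupied_others(grandparent_pattern, parent_index))
    return occupied <= max_occupied
```

Raising `max_occupied` makes more nodes eligible, which increases how often the DCM flag has to be signaled.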

In the above example, the decision to actually use DCM (as opposed to whether the node is merely eligible for it) is based on the number of points in the node. As described above, the threshold number of points for deciding between DCM and conventional splitting and occupancy pattern coding may be chosen for a particular implementation. The threshold may be 1 point, 2 points, or any other number suitable for a particular application. In some cases, DCM is permitted only when a single point is present. In this case, there is no need to signal the number of coded points in the bitstream, since a positive DCM flag effectively tells the decoder that a single point is DCM coded in the bitstream. If more than one point can be coded, the bitstream contains the number of points. The number of points (if coded in the bitstream) may be signaled using any suitable coding mechanism.

The coordinates may also be coded using any suitable coding mechanism. For example, in some implementations, they may be directly bypass coded. In some implementations, entropy coding may be used. In some implementations, the three coordinates may be coded independently. The number of bits required for the coordinates depends on the resolution of the tree and on the depth of the sub-volume to which DCM is applied. For example, if the current sub-volume is of size 2^D, then D bits can be used for each coordinate.
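The bit count can be made concrete with a small bypass-coding sketch: each of the three relative coordinates in a sub-volume of size 2^D is written as exactly D raw bits, most significant bit first. The function names, the bit order, and the list-of-bits representation are illustrative assumptions.

```python
def write_coords_bypass(point, log2_size):
    """Write relative (x, y, z) as 3 * log2_size raw bits, MSB first per coordinate."""
    bits = []
    for c in point:
        if not 0 <= c < (1 << log2_size):
            raise ValueError("coordinate lies outside the sub-volume")
        bits.extend((c >> i) & 1 for i in reversed(range(log2_size)))
    return bits

def read_coords_bypass(bits, log2_size):
    """Inverse of write_coords_bypass: rebuild (x, y, z) from 3 * log2_size bits."""
    coords = []
    for axis in range(3):
        value = 0
        for b in bits[axis * log2_size:(axis + 1) * log2_size]:
            value = (value << 1) | b
        coords.append(value)
    return tuple(coords)
```

With D = 2, the point (1, 2, 3) costs 6 bits in total, independent of how many tree levels would otherwise have been split to reach it.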

Referring now to FIG. 8, which illustrates, in flow diagram form, one example method 400 for decoding a bitstream of encoded point cloud data.

In operation 402, the decoder evaluates whether the current occupied node of the point cloud data tree is eligible for DCM. The decoder uses the same eligibility determination as the encoder. Typically, the eligibility determination is based on occupancy data from sibling nodes or neighboring nodes, examples of which have been described above.

If the node is not eligible, the decoder splits the node and entropy decodes the occupancy pattern in operation 404, and then pushes any occupied child nodes into the FIFO buffer in operation 406. However, if the node is eligible for DCM, the decoder decodes the DCM flag in operation 408. The decoded DCM flag indicates whether DCM was actually used to encode the points in the current node, as shown in operation 410. In this example, a DCM flag value of 1 corresponds to DCM being used, and a flag value of 0 corresponds to DCM not being used. If the DCM flag indicates that DCM is not used, the method 400 proceeds to operations 404 and 406 to decode the occupancy pattern as usual. If the DCM flag indicates that DCM is used, the decoder decodes the coordinate data of the point or points in the node in operation 414.

If the encoder and decoder are configured to use DCM where there may be more than one point per node, the decoder decodes the number of points in operation 412. It should be appreciated that the value may be encoded as the number of points minus one, since it is known that the value must be one or more. Once the decoder knows the number of encoded points, the coordinate data for each point is decoded in operation 414.

After the decoder has decoded either the occupancy pattern or the point coordinate data, then in operation 416 it retrieves the next occupied node from the FIFO buffer and returns to operation 402 to evaluate whether that node is eligible for DCM.
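The decoder flow of method 400 can be sketched in the same spirit. The sketch consumes symbolic (name, value) tokens rather than an entropy-coded bitstream and assumes that eligibility means "no sibling is occupied"; the function name `decode_octree_dcm`, the token names, and the child-index bit order are hypothetical. Points that are never DCM coded are reconstructed when the tree reaches unit-size sub-volumes.

```python
from collections import deque

def decode_octree_dcm(tokens, depth, threshold=2):
    """Rebuild points from symbolic tokens: occupancy, dcm_flag,
    num_points_minus1 and dcm_point. threshold must match the encoder's."""
    stream = iter(tokens)
    points = []
    # FIFO entries: (origin, log2 of size, sibling occupancy bits)
    fifo = deque([((0, 0, 0), depth, None)])
    while fifo:                                    # operation 416: next node
        origin, log2_size, siblings = fifo.popleft()
        if log2_size == 0:
            points.append(origin)                  # unit sub-volume holds a point
            continue
        if siblings == 0:                          # operation 402: eligible?
            _, flag = next(stream)                 # operation 408: DCM flag
            if flag == 1:                          # operation 410
                count = 1
                if threshold > 2:                  # operation 412: count minus one
                    count = next(stream)[1] + 1
                for _ in range(count):             # operation 414: coordinates
                    _, rel = next(stream)
                    points.append(tuple(o + r for o, r in zip(origin, rel)))
                continue
        _, pattern = next(stream)                  # operation 404: occupancy pattern
        half = 1 << (log2_size - 1)
        for idx in range(8):                       # operation 406: push occupied children
            if (pattern >> idx) & 1:
                step = ((idx >> 2) & 1, (idx >> 1) & 1, idx & 1)
                child_origin = tuple(o + half * s for o, s in zip(origin, step))
                fifo.append((child_origin, log2_size - 1, pattern & ~(1 << idx)))
    return points
```

The decoder never receives the eligibility decision explicitly: it recomputes it from the occupancy patterns it has already decoded, which is why the flag only needs to be present for eligible nodes.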

Example implementations of the above approach have been shown to produce a negligible change in compression performance while significantly reducing coding complexity. Complexity is reduced because DCM is much less complex than octree splitting. The tree is naturally "pruned" by DCM coding, and in some tests the coding time is reduced by about 40%. It was found in testing that more significant compression performance improvements (measured in bits per point) can be achieved when the neighbors of the parent (uncle nodes) are incorporated into the eligibility determination.

Some of the examples above are based on a tree coding process that uses an occupancy-then-split principle. The DCM coding process may also be applied, with suitable modifications, to tree coding processes that rely on a split-then-occupancy principle.

In one example implementation, in addition to coding the coordinate data of points, DCM coding may also provide the coordinate data of a sub-volume at a level above the lowest resolution of the tree, after which tree coding proceeds as usual. This may be referred to as "skip depth" DCM. In such an implementation, a minimum number of tree depth levels to skip may be preset, and the actual number of levels to skip may be signaled in the bitstream after the DCM flag is asserted. After the number of depth levels is signaled, the coordinates of the sub-volume at which tree coding resumes are encoded.

Referring now to FIG. 9, which shows a simplified block diagram of an example embodiment of an encoder 1100. The encoder 1100 includes a processor 1102, a memory 1104, and an encoding application 1106. The encoding application 1106 may include a computer program or application stored in the memory 1104 and containing instructions that, when executed, cause the processor 1102 to perform operations such as those described herein. For example, the encoding application 1106 may encode and output a bitstream encoded according to the processes described herein. It should be appreciated that the encoding application 1106 may be stored on a non-transitory computer readable medium such as an optical disc, a flash memory device, a random access memory, a hard drive, and so forth. When the instructions are executed, the processor 1102 performs the operations and functions specified in the instructions so as to operate as a special purpose processor that implements the process(es) described above. In some examples, such a processor may be referred to as "processor circuitry".

Referring now also to FIG. 10, which shows a simplified block diagram of an example embodiment of a decoder 1200. The decoder 1200 includes a processor 1202, a memory 1204, and a decoding application 1206. The decoding application 1206 may comprise a computer program or application stored in the memory 1204 and containing instructions that, when executed, cause the processor 1202 to perform operations such as those described herein. It should be appreciated that the decoding application 1206 may be stored on a computer-readable medium, such as an optical disc, a flash memory device, a random access memory, a hard drive, or the like. When the instructions are executed, the processor 1202 performs the operations and functions specified in the instructions so as to operate as a special purpose processor that implements the process(es) described above. In some examples, such a processor may be referred to as "processor circuitry".

It should be understood that a decoder and/or encoder according to the present application may be implemented in many computing devices, including but not limited to servers, suitably programmed general purpose computers, machine vision systems, and mobile devices. The decoder or encoder may be implemented by software containing instructions for configuring one or more processors to perform the functions described herein. The software instructions may be stored on any suitable non-transitory computer readable memory, including CD, RAM, ROM, flash memory, etc.

It will be appreciated that the decoders and/or encoders described herein, as well as the modules, routines, processes, threads, or other software components implementing the described methods/processes for configuring an encoder or decoder, may be implemented using standard computer programming techniques and languages. The application is not limited to a particular processor, computer language, computer programming convention, data structure, or other such implementation detail. Those skilled in the art will recognize that the described processes may be implemented as part of computer executable code stored in volatile or non-volatile memory, as part of an Application Specific Integrated Chip (ASIC), and so forth.

The present application also provides a computer readable signal encoding data generated by application of an encoding process according to the present application.

Certain adaptations and modifications of the described embodiments can be made. The embodiments discussed above are therefore to be considered in all respects as illustrative and not restrictive.
