Method and apparatus for entropy coding of point clouds

Document No.: 1205572 Publication date: 2020-09-01

Abstract: Method and apparatus for entropy coding of point clouds, by S. Lasserre and D. Flynn, created 2019-01-10. Methods and devices for encoding a point cloud. A current node associated with a sub-volume is split into further sub-volumes, each further sub-volume corresponding to a child node of the current node, and, at the encoder, an occupancy pattern is determined for the current node based on the occupancy states of the child nodes. A probability distribution is selected from a plurality of probability distributions based on occupancy data for a plurality of nodes neighboring the current node. The encoder entropy encodes the occupancy pattern based on the selected probability distribution to produce encoded data for the bitstream, and updates the selected probability distribution. The decoder makes the same selection based on occupancy data for neighboring nodes and entropy decodes the bitstream to reconstruct the occupancy pattern.

1. A method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes that have parent-child relationships and that represent geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the method comprising:

for a current node associated with a sub-volume split into further sub-volumes, each further sub-volume corresponding to a child node of the current node,

determining an occupancy pattern for the current node based on the occupancy states of the child nodes;

selecting a probability distribution from a plurality of probability distributions, wherein the selection is based on occupancy data for a plurality of nodes neighboring the current node;

entropy encoding the occupancy pattern based on the selected probability distribution to produce encoded data for the bitstream; and

updating the selected probability distribution based on the occupancy pattern.

2. The method of claim 1 or claim 10, wherein the occupancy data comprises an occupancy pattern for a parent node of the current node, and wherein the plurality of nodes comprises siblings that share the same parent node as the current node.

3. The method of claim 1 or claim 10, wherein the neighboring node is a node associated with a respective sub-volume that shares at least one face with a sub-volume associated with the current node.

4. The method of claim 1 or claim 10, wherein the neighboring node is a node associated with a respective sub-volume that shares at least one edge with a sub-volume associated with the current node.

5. The method of claim 1 or claim 10, wherein the neighboring node is a node associated with a respective sub-volume that shares at least one vertex with a sub-volume associated with the current node.

6. The method of any of claims 1-5 or 10, wherein the plurality of probability distributions includes a respective distribution associated with each of a plurality of occupancy patterns of the plurality of nodes adjacent to the current node.

7. The method of claim 6, wherein the respective distributions comprise distributions associated with occupancy patterns comprising full occupancy, horizontal orientation, vertical orientation, and sparse fill.

8. The method of claim 1, further comprising: determining that none of the plurality of nodes adjacent to the current node are occupied, determining that more than one of the child nodes are occupied, and encoding a flag indicating that more than one of the child nodes are occupied.

9. An encoder for encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the encoder comprising:

a processor;

a memory; and

an encoding application comprising instructions executable by the processor, which when executed, cause the processor to perform the method of claim 1 or any one of claims 2 to 8 when dependent on claim 1.

10. A method of decoding a bit stream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and which represent geometry of a volumetric space, the geometry of the volumetric space being recursively split into sub-volumes and containing points of the point cloud, the method comprising:

for a current node associated with a sub-volume split into further sub-volumes, each further sub-volume corresponding to a child node of the current node,

selecting a probability distribution from a plurality of probability distributions, wherein the selection is based on occupancy data for a plurality of nodes neighboring the current node;

entropy decoding the bitstream based on the selected probability distribution to generate a reconstructed occupancy pattern for the current node, the reconstructed occupancy pattern signaling occupancy of the child nodes; and

updating the selected probability distribution based on the reconstructed occupancy pattern.

11. The method of claim 10, further comprising: determining that none of the plurality of nodes adjacent to the current node are occupied, and in response, decoding a flag indicating that more than one of the child nodes are occupied.

12. A decoder for decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and which represent geometry of a volumetric space, the geometry of the volumetric space being recursively split into sub-volumes and containing points of the point cloud, the decoder comprising:

a processor;

a memory; and

a decoding application comprising instructions executable by the processor, which when executed, cause the processor to perform the method of claim 10, claim 11 when dependent on claim 10, or any one of claims 2 to 8.

13. A non-transitory processor-readable medium storing processor-executable instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-8 or 10-11.

14. A computer readable signal containing program instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1 to 8 or 10 to 11.

Technical Field

The present application relates generally to point cloud compression and in particular to methods and apparatus for improved entropy encoding of point clouds.

Background

Data compression is used in communications and computer networking to store, transmit, and reproduce information efficiently. There is increasing interest in representations of three-dimensional objects or spaces, which can involve large data sets and for which efficient and effective compression would be highly useful. In some cases, a three-dimensional object or space may be represented using a point cloud, which is a collection of points, each having a three-coordinate location (X, Y, Z) and, in some cases, other attributes such as color data (e.g., luminance and chrominance), transparency, reflectivity, normal vectors, and the like. The point cloud may be static (a stationary object, or a snapshot of an environment/object at a single point in time) or dynamic (a chronological sequence of point clouds).

Example applications of point clouds include terrain and mapping applications. Autonomous vehicles and other machine vision applications may rely on point cloud sensor data in the form of an environmental 3D scan, such as from a laser radar (LiDAR) scanner. Virtual reality simulation may rely on point clouds.

It should be appreciated that point clouds may involve a large amount of data, and it is important to compress (encode and decode) this data quickly and accurately. Accordingly, it would be advantageous to provide methods and apparatus for compressing point cloud data more efficiently and/or effectively.

Drawings

Reference will now be made, by way of example, to the accompanying drawings, which illustrate example embodiments of the present application, and in which:

FIG. 1 shows a simplified block diagram of an example point cloud encoder;

FIG. 2 shows a simplified block diagram of an example point cloud decoder;

FIG. 3 illustrates an example partial sub-volume and associated tree structure for encoding;

FIG. 4 illustrates recursive splitting and encoding of octrees;

FIG. 5 shows an example scan pattern within an example cube from an octree;

FIG. 6 illustrates an example occupancy pattern within an example cube;

FIG. 7 illustrates, in flow chart form, an example method for encoding a point cloud;

FIG. 8 illustrates a portion of an example octree;

FIG. 9 illustrates an example of neighboring sub-volumes;

FIG. 10 illustrates an example neighbor configuration showing occupancy between neighboring nodes;

FIG. 11 schematically shows one illustrative embodiment of a point cloud entropy encoding process using a parent-pattern-dependent context;

FIG. 12 shows an illustrative embodiment of a process for point cloud entropy encoding using a context dependent on a neighbor configuration;

FIG. 13 illustrates, in flow chart form, an example method for decoding a bitstream of compressed point cloud data;

FIG. 14 shows an example simplified block diagram of an encoder;

FIG. 15 shows an example simplified block diagram of a decoder;

FIG. 16 shows an example Cartesian coordinate system and example rotations about axes;

FIG. 17 illustrates categories of invariance of neighbor configurations under one or several iterations of rotation about the Z-axis;

FIG. 18 illustrates the categories of invariance for vertically rotated neighbor configurations;

FIG. 19 illustrates the categories of invariance for both rotation and reflection; and

FIG. 20 shows the categories for invariance under three rotations and reflections.

Like reference numerals may have been used in different figures to designate like components.

Detailed Description

Methods of encoding and decoding point clouds, and encoders and decoders for encoding and decoding point clouds, are described. A current node associated with a sub-volume is split into further sub-volumes, each further sub-volume corresponding to a child node of the current node, and, at the encoder, an occupancy pattern is determined for the current node based on the occupancy states of the child nodes. A probability distribution is selected from a plurality of probability distributions based on occupancy data for a plurality of nodes neighboring the current node. The encoder entropy encodes the occupancy pattern based on the selected probability distribution to produce encoded data for the bitstream and updates the selected probability distribution. The decoder makes the same selection based on occupancy data for neighboring nodes and entropy decodes the bitstream to reconstruct the occupancy pattern.

In one aspect, the present application describes a method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud. The method comprises the following steps: for a current node associated with a sub-volume split into further sub-volumes, each further sub-volume corresponding to a child node of the current node, determining an occupancy pattern for the current node based on the occupancy states of the child nodes; selecting a probability distribution from a plurality of probability distributions, wherein the selection is based on occupancy data for a plurality of nodes neighboring a current node; entropy encoding the occupancy pattern based on the selected probability distribution to produce encoded data for the bitstream; and updating the selected probability distribution based on the occupancy pattern.

In another aspect, the present application describes a method of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud. The method comprises the following steps: for a current node associated with a sub-volume split into further sub-volumes, each further sub-volume corresponding to a child node of the current node, selecting a probability distribution from a plurality of probability distributions, wherein the selecting is based on occupancy data for a plurality of nodes neighboring the current node; entropy decoding the bitstream based on the selected probability distribution to generate a reconstructed occupancy pattern for the current node, the reconstructed occupancy pattern signaling occupancy of the child nodes; and updating the selected probability distribution based on the reconstructed occupancy pattern.

In some implementations, the occupancy data is an occupancy pattern for a parent node of the current node, and the plurality of nodes includes siblings that share the same parent node as the current node.

In some implementations, the neighboring nodes are nodes associated with respective sub-volumes that share at least one face with the sub-volume associated with the current node. In further implementations, the neighboring nodes are nodes associated with respective sub-volumes that share at least one edge with the sub-volume associated with the current node. In yet another implementation, the neighboring nodes are nodes associated with respective sub-volumes that share at least one vertex with the sub-volume associated with the current node.

In some implementations, the plurality of probability distributions includes a respective distribution associated with each of a plurality of occupancy patterns of a plurality of nodes adjacent to the current node. In some cases, the respective distributions include distributions associated with occupancy patterns including full occupancy, horizontal orientation, vertical orientation, and sparse filling.

In some implementations, the encoding method further includes: determining that none of a plurality of nodes adjacent to a current node are occupied; determining that more than one of the child nodes is occupied; and encoding a flag indicating that more than one of the child nodes is occupied.

In further aspects, the present application describes encoders and decoders configured to implement such encoding and decoding methods.

In yet another aspect, the present application describes a non-transitory computer-readable medium storing computer-executable program instructions that, when executed, cause one or more processors to perform the described encoding and/or decoding methods.

In yet another aspect, the present application describes a computer-readable signal containing program instructions that, when executed by a computer, cause the computer to perform the described encoding and/or decoding method.

Other aspects and features of the present application will become apparent to those ordinarily skilled in the art upon review of the following description of examples in conjunction with the accompanying figures.

In the description that follows, the terms "node" and "sub-volume" are sometimes used interchangeably. It should be understood that a node is associated with a sub-volume. A node is a particular point on the tree, which may be an internal node or a leaf node. A sub-volume is the bounded physical space that the node represents. The term "volume" may be used to refer to the largest bounded space defined to contain the point cloud. To build a tree structure of interconnected nodes for encoding the point cloud data, the volume is recursively divided into sub-volumes.

In this application, the term "and/or" is intended to cover all possible combinations and subcombinations of the listed elements, including any one, any subcombination, or all elements listed individually, and not necessarily to exclude additional elements.

In this application, the phrase "at least one of … or …" is intended to cover any one or more of the listed elements, including any element listed individually, any subcombination, or all of the elements, without necessarily excluding any additional elements and without necessarily requiring all of the elements.

A point cloud is a collection of points in a three-dimensional coordinate system. The points are generally intended to represent the outer surface of one or more objects. Each point has a location (position) in the three-dimensional coordinate system. The position may be represented by three coordinates (X, Y, Z), which may be Cartesian or in any other coordinate system. The points may have other associated attributes such as color, which in some cases may itself be a three-component value such as R, G, B or Y, Cb, Cr. Other associated attributes may include transparency, reflectivity, normal vector, etc., depending on the desired application of the point cloud data.

A point cloud may be static or dynamic. For example, a detailed scan or mapping of an object or terrain may be static point cloud data. LiDAR-based scanning of an environment for machine vision purposes may be dynamic, since the point cloud (at least potentially) changes over time, e.g., with each successive scan of a volume. A dynamic point cloud is thus a chronological sequence of point clouds.

Point cloud data may be used in many applications, including conservation (scanning of historical or cultural objects), mapping, machine vision (such as autonomous or semi-autonomous cars), and virtual reality systems, to name a few examples. Dynamic point cloud data for machine vision and similar applications can be very different from static point cloud data for conservation purposes. For example, automotive vision typically involves relatively low-resolution, colorless, highly dynamic point clouds obtained by LiDAR (or similar) sensors with a high capture frequency. Such point clouds are not intended for human consumption or viewing, but rather for machine object detection/classification in a decision-making process. For example, a typical LiDAR frame contains on the order of tens of thousands of points, whereas high-quality virtual reality applications require millions of points. It is expected that, as computing speeds increase and new applications emerge, there will be demand for higher-resolution data over time.

While point cloud data is useful, the lack of efficient and effective compression (i.e., encoding and decoding processes) may prevent adoption and deployment.

One of the more common mechanisms for encoding point cloud data is through the use of tree-based structures. In a tree-based structure, the bounding three-dimensional volume of the point cloud is recursively divided into sub-volumes. The nodes of the tree correspond to sub-volumes. Whether to further divide the sub-volume may be determined based on the resolution of the tree and/or whether any points are contained in the sub-volume. A leaf node may have an occupancy flag that indicates whether its associated sub-volume contains a point. The split flag may signal whether the node has child nodes (i.e., whether the current volume has been further split into sub-volumes). In some cases, these flags may be entropy encoded, and in some cases, predictive coding may be used.

A common tree structure is an octree. In this structure, the volumes/sub-volumes are all cubes, and each split of a sub-volume results in eight further sub-volumes/sub-cubes. Another commonly used tree structure is the KD-tree, in which a volume (cube or cuboid) is recursively divided into two parts by a plane orthogonal to one of the axes. An octree is a special case of a KD-tree in which a volume is divided by three planes, each orthogonal to one of the three axes. Both of these examples relate to cubes or cuboids; however, the present application is not limited to such tree structures, and in some applications the volumes and sub-volumes may have other shapes. A volume need not be partitioned into two sub-volumes (KD-tree) or eight sub-volumes (octree); other partitionings are possible, including partitioning into non-rectangular shapes or partitioning that involves non-adjacent sub-volumes.

For ease of explanation and because octrees are popular candidate tree structures for automotive applications, this application may refer to octrees, but it should be understood that other tree structures may be used to implement the methods and apparatus described herein.

Referring now to FIG. 1, a simplified block diagram of a point cloud encoder 10 in accordance with aspects of the present application is shown. The point cloud encoder 10 includes a tree construction module 12 for receiving the point cloud data and generating a tree (in this example, an octree) that represents the geometry of the volumetric space containing the point cloud and indicates the location or position of points from the point cloud in that geometry.

A basic process for creating an octree for encoding a point cloud may include:

1. Start with a bounding volume (cube) containing the point cloud in the coordinate system.

2. Split the volume into 8 sub-volumes (eight subcubes).

3. For each sub-volume, mark the sub-volume with 0 if it is empty and with 1 if it contains at least one point.

4. For all sub-volumes marked with 1, repeat (2) to split those sub-volumes until the maximum split depth is reached.

5. For all leaf sub-volumes (subcubes) at the maximum depth, mark the leaf with 1 if it is non-empty and with 0 otherwise.

The above process may be described as an occupancy-equals-splitting process, in which splitting implies occupancy, subject to the constraint that there is a maximum depth or resolution beyond which no further splitting occurs. In this case, a single flag signals whether a node is split and hence whether it is occupied by at least one point, and vice versa. At the maximum depth, the flag signals occupancy and no further splitting is possible.
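The five steps above can be sketched as a short recursive routine. This is a minimal illustration rather than the encoder of FIG. 1: the function names, the nested-dictionary layout, the integer coordinates, and the mapping from sub-cube position to child index (x as the most significant bit, then y, then z) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int, int]


def child_index(p: Point, origin: Point, half: int) -> int:
    """Index 0..7 of the sub-cube containing p (assumed layout: x, y, z bits)."""
    ix = 1 if p[0] >= origin[0] + half else 0
    iy = 1 if p[1] >= origin[1] + half else 0
    iz = 1 if p[2] >= origin[2] + half else 0
    return (ix << 2) | (iy << 1) | iz


def build_octree(points: List[Point], origin: Point, size: int, levels: int) -> Dict:
    """Split a cube of edge `size` (a power of two) `levels` times.

    Returns the eight 0/1 flags of this node and, for each occupied child
    that is split further, a nested node under "children".
    """
    half = size // 2
    buckets: List[List[Point]] = [[] for _ in range(8)]
    for p in points:
        buckets[child_index(p, origin, half)].append(p)

    # Step 3: mark each sub-volume 0 (empty) or 1 (at least one point).
    flags = [1 if bucket else 0 for bucket in buckets]

    children: Dict[int, Dict] = {}
    if levels > 1:  # Step 4: only occupied sub-volumes are split again.
        for i, bucket in enumerate(buckets):
            if bucket:
                child_origin = (
                    origin[0] + half * ((i >> 2) & 1),
                    origin[1] + half * ((i >> 1) & 1),
                    origin[2] + half * (i & 1),
                )
                children[i] = build_octree(bucket, child_origin, half, levels - 1)
    # Step 5: when levels == 1 the flags describe leaf sub-volumes.
    return {"flags": flags, "children": children}


tree = build_octree([(0, 0, 0), (3, 3, 3)], (0, 0, 0), 4, 2)
print(tree["flags"])
```

Because empty sub-volumes are never split, the recursion visits only occupied nodes, which is what lets a single flag carry both the split and the occupancy information.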

In some implementations, splitting and occupancy are partially independent, such that a node may be occupied and may or may not be split. There are two variants of this implementation:

1. Split-then-occupied. A flag signals whether the node is split. If it is split, the node must contain a point, i.e., splitting implies occupancy. Otherwise, if the node is not split, an additional occupancy flag signals whether the node contains at least one point. Thus, when a node is not split further, i.e., it is a leaf node, the leaf node must have an associated occupancy flag to indicate whether it contains any points.

2. Occupied-then-split. A single flag indicates whether the node is occupied. If it is not occupied, no splitting occurs. If it is occupied, a split flag is encoded to indicate whether the node is split further.
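The difference between the two variants is simply which flag comes first and which flag is conditional on it. A minimal sketch, assuming hypothetical helper names and a labeled-tuple output format chosen only for readability:

```python
from typing import List, Tuple

Flag = Tuple[str, int]


def flags_split_then_occupied(split: bool, occupied: bool) -> List[Flag]:
    """Variant 1: the split flag comes first; occupancy is sent only for unsplit nodes."""
    if split:
        # A split node is necessarily occupied, so no occupancy flag is needed.
        return [("split", 1)]
    return [("split", 0), ("occupied", int(occupied))]


def flags_occupied_then_split(split: bool, occupied: bool) -> List[Flag]:
    """Variant 2: the occupancy flag comes first; split is sent only for occupied nodes."""
    if not occupied:
        # An unoccupied node is never split, so no split flag is needed.
        return [("occupied", 0)]
    return [("occupied", 1), ("split", int(split))]


print(flags_split_then_occupied(split=False, occupied=True))
print(flags_occupied_then_split(split=False, occupied=True))
```

In both variants an occupied, unsplit node costs two flags; the variants differ in whether split nodes (variant 1) or empty nodes (variant 2) are the ones signaled with a single flag.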

Regardless of which of the processes described above is used to construct the tree, the tree may be traversed in a predefined order (breadth-first or depth-first, and according to the scan pattern/order within each partitioned sub-volume) to generate a bit sequence from the flags (occupancy and/or split flags). This may be referred to as serialization or binarization of the tree. As shown in FIG. 1, in this example, the point cloud encoder 10 includes a binarizer 14 for binarizing the octree to produce a bitstream of binarized data representing the tree.

The bit sequence may then be encoded using an entropy encoder 16 to produce a compressed bit stream. The entropy encoder 16 may encode the sequence of bits using a context model 18, which context model 18 specifies probabilities for encoding the bits based on a context determination by the entropy encoder 16. The context model 18 may be adaptively updated after each bit or defined set of bits is encoded. In some cases, the entropy encoder 16 may be a binary arithmetic encoder. In some implementations, the binary arithmetic encoder may employ Context Adaptive Binary Arithmetic Coding (CABAC). In some implementations, encoders other than arithmetic encoders may be used.

In some cases, the entropy encoder 16 may not be a binary encoder, but may operate on non-binary data. The octree data output by the tree construction module 12 may not be binarized, but may instead be encoded as non-binary data. For example, in the case of an octree, the eight flags (e.g., occupancy flags) within a sub-volume, taken in their scan order, can be considered an eight-bit number that takes one of 2⁸ − 1 values (e.g., an integer with a value between 1 and 255, since a value of 0 is not possible for a split sub-volume; that is, a completely unoccupied sub-volume would not have been split). In some implementations, this number may be encoded by the entropy encoder using a multi-symbol arithmetic encoder. Within a sub-volume (e.g., a cube), the sequence of flags defining this integer may be referred to as a "pattern".

Like video or image coding, point cloud coding may include a predictive operation in which efforts are made to predict the pattern for a sub-volume and the residuals from the prediction are encoded, rather than the pattern itself. The prediction may be spatial (depending on previously encoded sub-volumes in the same point cloud) or temporal (depending on previously encoded point clouds in a chronological sequence of point clouds).

A block diagram of an example point cloud decoder 50 corresponding to the encoder 10 is shown in FIG. 2. The point cloud decoder 50 includes an entropy decoder 52 that uses the same context model 54 as used by the encoder 10. The entropy decoder 52 receives an input bitstream of compressed data and entropy decodes the data to produce an output sequence of decompressed bits. The sequence is then converted to reconstructed point cloud data by a tree reconstructor 56. The tree reconstructor 56 reconstructs the tree structure from the decompressed data and knowledge of the scan order in which the tree data was binarized. Thus, the tree reconstructor 56 is able to reconstruct the locations of the points from the point cloud (subject to the resolution of the tree coding).
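The role of the tree reconstructor can be illustrated with a small sketch that turns a breadth-first sequence of occupancy patterns back into point positions. It assumes the occupancy-equals-splitting variant, patterns stored as integers 1..255 with bit i describing child i, a child layout of x, y, z bits, and a root cube at the origin whose edge is a power of two; none of these conventions is prescribed by the text, and the function name is hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]


def reconstruct_points(patterns: Iterable[int], size: int) -> List[Point]:
    """Rebuild leaf positions from occupancy patterns given in breadth-first order."""
    stream = iter(patterns)
    fifo = deque([((0, 0, 0), size)])  # (origin, edge length) of nodes to process
    points: List[Point] = []
    while fifo:
        origin, edge = fifo.popleft()
        if edge == 1:
            # Maximum depth reached: the node itself is a reconstructed point.
            points.append(origin)
            continue
        pattern = next(stream)  # one pattern per occupied, split node
        half = edge // 2
        for i in range(8):
            if (pattern >> i) & 1:
                child_origin = (
                    origin[0] + half * ((i >> 2) & 1),
                    origin[1] + half * ((i >> 1) & 1),
                    origin[2] + half * (i & 1),
                )
                fifo.append((child_origin, half))
    return points


print(reconstruct_points([129, 1, 128], size=4))
```

The decoder never needs explicit coordinates in the bitstream: each position follows from the path of occupied children, which is why the reconstruction is exact only up to the resolution of the tree.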

An example partial sub-volume 100 is shown in FIG. 3. In this example, for ease of illustration, a slice of the sub-volume 100 is shown in two dimensions, and the sub-volume 100 is 16x16 in size. It should be noted that the sub-volume has been divided into four 8x8 sub-squares, two of which have been further subdivided into 4x4 sub-squares, three of which have been further divided into 2x2 sub-squares, with one of the 2x2 sub-squares then further divided into 1x1 squares. The 1x1 squares are at the maximum depth of the tree and represent the highest resolution for point location data. Points from the point cloud are shown as dots in the figure.

The structure of the tree 102 is shown to the right of the sub-volume 100. The sequence of split flags 104 and the corresponding sequence of occupancy flags 106, obtained in a predefined breadth-first scan order, are shown to the right of the tree 102. It will be observed that in this illustrative example there is an occupancy flag for each unsplit sub-volume (node), i.e., each node whose associated split flag is set to zero. These sequences may be entropy encoded.

Another example, which employs an occupancy ≡ split condition, is shown in FIG. 4. FIG. 4 illustrates the recursive splitting and encoding of an octree 150. Only a portion of the octree 150 is shown. A FIFO 152 is shown as processing the nodes for splitting, to illustrate the breadth-first nature of the present process. The FIFO 152 outputs an occupied node 154 that was queued in the FIFO 152 for further splitting after the processing of its parent node 156. The tree builder splits the sub-volume associated with the occupied node 154 into eight sub-volumes (cubes) and determines their occupancy. Occupancy may be indicated by an occupancy flag for each sub-volume. Taken in the prescribed scan order, the flags may be referred to as the occupancy pattern for the node 154. The pattern may be specified by an integer representing the sequence of occupancy flags associated with the sub-volumes in the predefined scan order. In the case of an octree, the pattern is an integer in the range [1, 255].

The entropy encoder then encodes the pattern using a non-binary arithmetic encoder based on the probabilities specified by the context model. In this example, the probabilities may be a pattern distribution that is based on an initial distribution model and is adaptively updated. In one implementation, the pattern distribution is effectively a counter of the number of times each pattern (an integer from 1 to 255) has been encountered during encoding. After each sub-volume is encoded, the pattern distribution may be updated. The pattern distribution can be normalized as needed, because it is the relative frequency of the patterns, not the absolute count, that matters for the probability estimate.
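A counter-based pattern distribution of the kind described here can be sketched in a few lines. The class name, the uniform initial count of 1, and the halving rule used for renormalization are assumptions for illustration; the text only says that the distribution starts from some initial model and may be normalized as needed.

```python
class PatternDistribution:
    """Adaptive frequency table over the 255 possible occupancy patterns."""

    def __init__(self, initial_count: int = 1, max_total: int = 1 << 16) -> None:
        # counts[k - 1] holds the number of times pattern k has been seen,
        # starting from an (assumed) equiprobable initial model.
        self.counts = [initial_count] * 255
        self.total = initial_count * 255
        self.max_total = max_total

    def probability(self, pattern: int) -> float:
        """Relative frequency of `pattern` (an integer in 1..255)."""
        return self.counts[pattern - 1] / self.total

    def update(self, pattern: int) -> None:
        """Record one more occurrence of `pattern` and renormalize if needed."""
        self.counts[pattern - 1] += 1
        self.total += 1
        if self.total > self.max_total:
            # Hypothetical renormalization: halve every count, keeping each
            # at least 1 so that no pattern becomes impossible to encode.
            self.counts = [max(1, c // 2) for c in self.counts]
            self.total = sum(self.counts)


dist = PatternDistribution()
for _ in range(3):
    dist.update(85)
print(dist.probability(85))
```

Halving preserves the ratios between counts approximately, which is all the arithmetic coder needs, while keeping the totals bounded and letting recent statistics weigh more than old ones.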

Based on the pattern, those child nodes that are occupied (e.g., have a flag of 1) are then pushed into the FIFO 152 to be split further in turn (assuming the nodes are not at the maximum depth of the tree).
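The FIFO-driven traversal can be sketched as follows. This is an illustrative reading of the process around FIG. 4, not the tree builder itself: the function name, the integer coordinates, the root cube at the origin with a power-of-two edge, and the convention that bit i of the pattern corresponds to child i (x, y, z bit layout) are assumptions.

```python
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def occupancy_patterns(points: Sequence[Point], size: int) -> List[int]:
    """Return the occupancy pattern of every split node in breadth-first order."""
    fifo = deque([((0, 0, 0), size, list(points))])
    patterns: List[int] = []
    while fifo:
        origin, edge, pts = fifo.popleft()
        if edge == 1:
            continue  # maximum depth: nothing left to split
        half = edge // 2
        buckets: List[List[Point]] = [[] for _ in range(8)]
        for x, y, z in pts:
            i = (
                (int(x >= origin[0] + half) << 2)
                | (int(y >= origin[1] + half) << 1)
                | int(z >= origin[2] + half)
            )
            buckets[i].append((x, y, z))
        # The pattern is the integer whose bit i is the occupancy flag of child i.
        patterns.append(sum(1 << i for i, bucket in enumerate(buckets) if bucket))
        for i, bucket in enumerate(buckets):
            if bucket:  # only occupied children are queued for further splitting
                child_origin = (
                    origin[0] + half * ((i >> 2) & 1),
                    origin[1] + half * ((i >> 1) & 1),
                    origin[2] + half * (i & 1),
                )
                fifo.append((child_origin, half, bucket))
    return patterns


print(occupancy_patterns([(0, 0, 0), (3, 3, 3)], size=4))
```

Using a first-in, first-out queue is what makes the order breadth-first: every node at one depth is processed before any node at the next depth, so a decoder that follows the same queue discipline consumes the patterns in the same order.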

Referring now to FIG. 5, an example cube 180 from an octree is shown. The cube 180 is subdivided into eight subcubes. The scan order used to read the flags results in an eight-bit string that can be read in binary form as an integer in the range [1, 255]. Based on the scan order and the resulting bit position of each subcube's flag in the string, the subcubes have the values shown in FIG. 5. The scan order may be any ordering of the subcubes, provided that both the encoder and the decoder use the same scan order.

By way of example, FIG. 6 shows a cube 200 in which the four "front" subcubes are occupied. Since the occupied subcubes are the cubes with values 1, 4, 16, and 64, this corresponds to pattern 1 + 4 + 16 + 64 = 85. The integer pattern number specifies the occupancy pattern of the subcubes.
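The mapping between the eight flags and the pattern number is a plain binary weighting. A minimal sketch, assuming that the flag at scan position i carries weight 2^i (the actual assignment of weights to subcubes is whatever FIG. 5 defines) and using hypothetical function names:

```python
from typing import List, Sequence


def pattern_from_flags(flags: Sequence[int]) -> int:
    """Combine eight 0/1 occupancy flags into one integer; flags[i] weighs 2**i."""
    if len(flags) != 8:
        raise ValueError("an octree node has exactly eight child flags")
    return sum(bit << i for i, bit in enumerate(flags))


def flags_from_pattern(pattern: int) -> List[int]:
    """Inverse mapping: recover the eight flags from the pattern number."""
    return [(pattern >> i) & 1 for i in range(8)]


# Subcubes with values 1, 4, 16 and 64 occupied, as in the example above.
print(pattern_from_flags([1, 0, 1, 0, 1, 0, 1, 0]))  # 85
```

Any other scan order would simply permute the weights, so the same physical occupancy would map to a different integer; this is why encoder and decoder must agree on the order.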

An octree representation, or more generally any tree representation, is efficient at representing points with spatial correlation because trees tend to factorize the higher-order bits of the point coordinates. For an octree, each level of depth refines the coordinates of the points within a sub-volume by one bit for each component, at a cost of eight bits per refinement. Further compression is obtained by entropy encoding the split information (i.e., the pattern) associated with each tree node. This further compression is possible because the pattern distribution is not uniform; the non-uniformity is another consequence of the correlation.
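The statement that each level refines every coordinate by one bit can be made concrete: for integer coordinates, the child chosen at each depth is formed from one bit of x, y, and z, taken from the most significant bit downward. The sketch below assumes non-negative integer coordinates, a cube edge of 2^depth, and the x, y, z bit layout for the child index; the function name is hypothetical.

```python
from typing import List


def child_path(x: int, y: int, z: int, depth: int) -> List[int]:
    """Child indices visited from the root down to the leaf containing (x, y, z)."""
    path = []
    for level in range(depth - 1, -1, -1):  # most significant bit first
        bx = (x >> level) & 1
        by = (y >> level) & 1
        bz = (z >> level) & 1
        path.append((bx << 2) | (by << 1) | bz)
    return path


# x = 0b10, y = 0b01, z = 0b00 in a 4x4x4 volume (depth 2).
print(child_path(2, 1, 0, depth=2))  # [4, 2]
```

Points that are close together share their high-order bits and therefore share the upper part of this path, which is the sense in which the tree factors out common coordinate bits.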

One potential inefficiency in current systems is that the pattern distribution (e.g., the histogram of pattern numbers seen in previously encoded nodes of the tree) is developed over the course of encoding the point cloud. In some cases, the pattern distribution may be initialized as equiprobable, or to some other predetermined distribution; but using a single pattern distribution means that the context model does not take into account or exploit local geometric correlation.

Thus, according to one aspect of the present application, the encoder and decoder each maintain more than one pattern distribution (e.g., a set of probabilities associated with occupancy patterns) and select the pattern distribution to be used in coding the occupancy pattern of a particular node based on some occupancy information from previously encoded nodes in the vicinity of the particular node. In one example implementation, the occupancy information is obtained from the occupancy pattern of the parent node of the particular node. In another example implementation, the occupancy information is obtained from one or more nodes neighboring the particular node.

Referring now to FIG. 7, an example method 200 of encoding a point cloud is shown in flowchart form. The method 200 in this example involves recursive splitting of occupied nodes (sub-volumes) and a breadth-first traversal of the tree for coding.

In operation 202, the encoder determines an occupancy pattern for the current node. The current node is an occupied node that has been split into eight child nodes, each corresponding to a respective sub-cube. The occupancy pattern for the current node specifies the occupancy of the eight child nodes in scan order. As mentioned above, such an occupancy pattern may be indicated using an integer between 1 and 255, e.g., an eight-bit binary string.

In operation 204, the encoder selects a probability distribution from the set of probability distributions. The selection of the probability distribution is based on occupancy information from nearby previously encoded nodes (i.e., at least one node that is a neighbor of the current node). In some embodiments, two nodes are neighbors if they are associated with respective sub-volumes that share at least one face. In a broader definition, nodes are neighbors if they share at least one edge. In a still broader definition, two nodes are neighbors if they share at least one vertex. The parent pattern, i.e., the occupancy pattern of the node of which the current node is a child, provides occupancy data for the current node and its seven siblings. In some implementations, the occupancy information is the parent pattern. In some implementations, the occupancy information is a set of neighbor nodes that includes nodes at the same depth level of the tree as the current node but having different parent nodes. In some cases, combinations thereof are possible. For example, the set of neighbor nodes may include some sibling nodes and some non-sibling nodes.

Once the probability distribution is selected, the encoder then entropy encodes the occupancy pattern for the current node using the selected probability distribution, as indicated by operation 206. Then, in operation 208, it updates the selected probability distribution based on the occupancy pattern, e.g., it may increment a count corresponding to the occupancy pattern. In operation 210, the encoder evaluates whether there are more nodes to encode and, if so, returns to operation 202 to encode the next node.
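Operations 204 to 210 can be sketched as a loop over nodes. This is a minimal illustration, not the described encoder: the class and function names are hypothetical, the count-based model with uniform initial counts is one possible choice, and the arithmetic coder is replaced by an ideal code-length estimate (-log2 of the model probability) so that the example stays self-contained.

```python
import math


class AdaptiveDistribution:
    """Count-based model over the 255 non-empty occupancy patterns (assumed form)."""

    def __init__(self):
        self.counts = [0] + [1] * 255   # index 0 (empty node) is never coded
        self.total = 255

    def probability(self, pattern):
        return self.counts[pattern] / self.total

    def update(self, pattern):
        self.counts[pattern] += 1       # e.g. increment the count for the pattern
        self.total += 1


def estimate_bits(nodes, num_contexts):
    """nodes: iterable of (context_index, pattern) pairs in coding order."""
    models = [AdaptiveDistribution() for _ in range(num_contexts)]
    bits = 0.0
    for context, pattern in nodes:
        model = models[context]                          # operation 204: select
        bits += -math.log2(model.probability(pattern))   # operation 206: coding cost
        model.update(pattern)                            # operation 208: update
    return bits


# Two nodes with the same context and pattern: the second one is cheaper.
print(estimate_bits([(0, 85), (0, 85)], num_contexts=1))
```

Because the selection depends only on previously coded data and the update depends only on the coded pattern, a decoder running the same loop keeps its models identical to the encoder's.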

The probability distribution selection in operation 204 is based on occupancy data for nearby previously encoded nodes. This allows both the encoder and the decoder to make the same selection independently. For the following discussion of probability distribution selection, reference will be made to FIG. 8, which schematically illustrates a partial octree 300 including a current node 302. The current node 302 is an occupied node that is being evaluated for coding. The current node 302 is one of eight children of the parent node 306, and the parent node 306 is in turn a child of a grandparent node (not shown). The current node 302 is divided into eight child nodes 304. The occupancy pattern for the current node 302 is based on the occupancy of the child nodes 304. For example, as illustrated, the occupancy pattern may be 00110010, i.e., pattern 50, using the convention that black dots are occupied nodes.

The current node 302 has sibling nodes 308 that share the same parent node 306. The parent pattern is the occupancy pattern for the parent node 306, which as illustrated would be 00110000, i.e., pattern 48. The parent pattern may be used as a basis for selecting a suitable probability distribution for entropy coding the occupancy pattern for the current node.

FIG. 9 illustrates a set of neighbors around a current node, where neighbors are defined as nodes sharing a face. In this example, the nodes/sub-volumes are cubes, and the cube at the center of the image has six neighbors, one for each face. In an octree, it should be understood that the neighbors of the current node will include three sibling nodes. They will also include three nodes that do not have the same parent. Thus, some neighboring nodes will be available because they are siblings, but some neighboring nodes may or may not be available, depending on whether those nodes have been previously encoded. Special handling may be applied to deal with missing neighbors. In some implementations, a missing neighbor may be presumed to be occupied or may be presumed to be unoccupied. It should be appreciated that the neighbor definition may be broadened to include neighboring nodes based on a shared edge or a shared vertex, so as to include additional neighboring sub-volumes in the assessment.

It should also be appreciated that assessing the immediate surrounding neighborhood of the current node 302 based on the occupancy states of neighboring nodes may be a more accurate assessment of isolation than assessing the occupancy states of siblings, where three would share only one edge and one would share only one vertex (in the case of an octree). However, assessing sibling occupancy states has the advantage of being modular in that all relevant data for assessment is part of the parent node, meaning it is implemented with a small memory footprint, whereas assessing neighbor occupancy states involves buffering tree occupancy data in the case where tree occupancy data is needed to qualify future neighboring nodes.

The occupancy of the neighbors may be read in a scan order that effectively assigns a value to each neighbor, much as described above with respect to the occupancy pattern. As illustrated, the neighbor nodes effectively take on the values 1, 2, 4, 8, 16, or 32, and thus there are 64 (0 to 63) possible neighbor occupancy configurations. This value may be referred to herein as the "neighbor configuration". As an example, FIG. 10 illustrates neighbor configuration 15, where neighbors 1, 2, 4, and 8 are occupied and neighbors 16 and 32 are empty.
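A neighbor configuration can be computed as a weighted sum over the occupied face neighbors. The sketch below is illustrative: the assignment of the values 1 to 32 to named directions is an assumption (the text only gives the values), chosen so that the pairs (1, 2), (4, 8), and (16, 32) are opposite faces, and the treatment of unavailable neighbors as unoccupied is one of the two options the description mentions.

```python
# Hypothetical assignment of values to the six face-sharing neighbors.
NEIGHBOR_VALUES = {
    "left": 1, "right": 2,
    "front": 4, "back": 8,
    "below": 16, "above": 32,
}


def neighbor_configuration(occupied, unavailable=()):
    """Return NC in 0..63 from the names of occupied neighbors.

    Neighbors listed in `unavailable` (not yet coded) are treated as unoccupied
    here; presuming them occupied would be the alternative.
    """
    usable = set(occupied) - set(unavailable)
    return sum(NEIGHBOR_VALUES[name] for name in usable)


print(neighbor_configuration(["left", "right", "front", "back"]))   # 15
```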

In some cases, the two criteria above (parent pattern and neighbor configuration) may both be applied, or a selection may be made between the two. For example, if the neighbors are available, then the probability distribution selection may be based on the neighboring nodes; otherwise, if one or more neighbors are unavailable because they belong to nodes that have not yet been encoded, the probability distribution selection can revert to a sibling-based analysis (parent pattern).

In yet another embodiment, the probability distribution selection may alternatively or additionally be based on the grandparent pattern. In other words, the probability distribution selection may be based on the occupancy states of the nodes that are siblings of the parent node 306.

In yet another implementation, additional or alternative assessments may be taken into account in the selection of the probability distribution. For example, the probability distribution selection may look at the occupancy state of the neighbor nodes of the parent node or the neighbor nodes of the grandparent node.

Any two or more of the above criteria for assessing local occupancy states may be used in combination in some implementations.

In some embodiments, the number of probability distributions may be equal to the number of possible occupancy outcomes of the selection criterion. In other words, in the case of the parent pattern for an octree, there will be 255 probability distributions. In the case of the neighbor configuration, if neighbors are defined as sharing a face, there will be 64 probability distributions. However, it should be appreciated that too many distributions may result in slow adaptation due to scarcity of data, i.e., context dilution. Thus, in some embodiments, similar patterns may be grouped so as to use the same probability distribution. For example, separate distributions may be used for patterns corresponding to fully occupied, vertically oriented, horizontally oriented, and mostly empty cases, and then all other cases. This may reduce the number of probability distributions to about five. It should be understood that different groupings of patterns may be formed to result in a different number of probability distributions.

Referring now to FIG. 11, one illustrative embodiment of a process 400 for point cloud entropy encoding using a parent-pattern-dependent context is schematically shown. In this example, the current node 402 has been split into eight child nodes, and its occupancy pattern 404 is to be encoded using a non-binary entropy encoder 406. The non-binary entropy encoder 406 uses a probability distribution selected from one of six possible probability distributions 408. The selection is based on the parent pattern, i.e., on occupancy information from the parent node of the current node 402. The parent pattern is identified by an integer between 1 and 255.

The choice of probability distribution may be made by a decision tree that assesses whether the pattern corresponds to a full node (e.g., pattern 255), a horizontal structure (e.g., pattern 170 or 85; assuming the Z-axis is vertical), a vertical structure (e.g., pattern 3, 12, 48, or 192), a sparsely populated distribution (e.g., pattern 1, 2, 4, 8, 16, 32, 64, or 128; i.e., no sibling node is occupied), a semi-sparsely populated distribution (total number of occupied nodes among the current node and its sibling nodes ≤ 3), and all other cases. The patterns indicated for the different categories are examples only. For example, the "horizontal" category may include patterns involving two or three occupied cubes on the same horizontal level. The "vertical" category may include patterns involving three or four occupied cubes in a wall-like arrangement. It should also be understood that finer gradations may be used. For example, the "horizontal" category may be further subdivided into horizontal in the upper part of the cube and horizontal in the lower part of the cube, with a different probability distribution for each. Other groupings of occupancy patterns having some correlation may be made and assigned to corresponding probability distributions. Further discussion regarding the grouping of patterns in the context of neighbor configurations, and invariance between neighbor configurations, is set out further below.
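Such a decision tree can be sketched as below. This is an assumption-laden illustration rather than the described classifier: the names are hypothetical, the "horizontal" and "vertical" sets contain only the example patterns listed in the text, and the order in which the tests are applied is an assumed choice.

```python
FULL, HORIZONTAL, VERTICAL, SPARSE, SEMI_SPARSE, OTHER = range(6)

# Only the example patterns named in the text; a real grouping may be larger.
HORIZONTAL_PATTERNS = {170, 85}
VERTICAL_PATTERNS = {3, 12, 48, 192}


def select_distribution_index(parent_pattern):
    """Map a parent pattern (1..255) to one of six probability distributions."""
    if not 1 <= parent_pattern <= 255:
        raise ValueError("parent pattern must be in 1..255")
    occupied = bin(parent_pattern).count("1")   # current node plus occupied siblings
    if parent_pattern == 255:
        return FULL
    if parent_pattern in HORIZONTAL_PATTERNS:
        return HORIZONTAL
    if parent_pattern in VERTICAL_PATTERNS:
        return VERTICAL
    if occupied == 1:
        return SPARSE                           # no sibling is occupied
    if occupied <= 3:
        return SEMI_SPARSE
    return OTHER


print([select_distribution_index(p) for p in (255, 85, 48, 16, 7, 120)])
# [0, 1, 2, 3, 4, 5]
```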

Fig. 12 shows an illustrative embodiment of a process 500 for point cloud entropy encoding using a context dependent on a neighbor configuration. This example assumes the definition of the neighbors and neighbor configuration numbers used above in connection with fig. 9. This example also assumes that each neighbor configuration has a dedicated probability distribution, which means that there are 64 different probability distributions. The current node 502 has an occupancy pattern 504 to be encoded. The probability distribution is selected based on the neighbors of the current node 502. That is, the neighbor configuration NC in [0, 63] is found and used to select the associated probability distribution.

It should be appreciated that in some embodiments, neighbor configurations may be grouped such that more than one neighbor configuration uses the same probability distribution based on similarities in patterns. In some embodiments, the process may use different arrangements of neighbors to contextualize (select) the distribution. Additional neighbors may be added, such as eight neighbors diagonally adjacent on all three axes, or twelve neighbors diagonally adjacent on two axes. Embodiments avoiding specific neighbors, such as neighbors that introduce additional dependencies in depth-first scanning or only dependencies on specific axes, may also be used to reduce the codec state of large trees.

In this example, the case where NC = 0 is handled in a specific manner. If no neighbor is occupied, this may indicate that the current node 502 is isolated. Thus, the process 500 further checks how many of the child nodes of the current node 502 are occupied. If only one child node is occupied, then a flag is encoded to indicate that a single child node is occupied, and the index of that node is encoded using 3 bits. If more than one child node is occupied, then the process 500 encodes the occupancy pattern using the NC = 0 probability distribution.
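The NC = 0 branch can be sketched as a function that lists what would be written for an isolated node. The symbol names are hypothetical, and coding the flag as 0 when more than one child is occupied is an assumption; the text only states that a flag signals the single-child case.

```python
def plan_isolated_node(pattern):
    """Return the symbols to code for a node whose neighbor configuration is 0."""
    if not 1 <= pattern <= 255:
        raise ValueError("occupancy pattern must be in 1..255")
    single_child = pattern & (pattern - 1) == 0     # exactly one bit set
    if single_child:
        index = pattern.bit_length() - 1            # position of the set bit, 0..7
        return [("single_child_flag", 1),
                ("child_index_3bits", format(index, "03b"))]
    # Assumed: flag value 0, then the pattern coded with the NC = 0 distribution.
    return [("single_child_flag", 0),
            ("pattern_with_nc0_distribution", pattern)]


print(plan_isolated_node(16))   # single child at index 4, coded as '100'
print(plan_isolated_node(50))   # several children: entropy-code the pattern
```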

Referring now to fig. 13, an example method 600 for decoding a bitstream of encoded point cloud data is shown in flow chart form.

In operation 602, the decoder selects one of the probability distributions based on occupancy information from one or more nodes in the vicinity of the current node. As described above, the occupancy information may be the parent pattern from the parent node of the current node, i.e., the occupancy of the current node and its siblings, or it may be the occupancy of the neighboring nodes of the current node, which may include some siblings. Other or additional occupancy information may be used in some implementations.

Once the probability distribution is selected, the decoder entropy decodes a portion of the bitstream using the selected probability distribution to reconstruct the occupancy pattern for the current node in operation 604. This pattern is part of the decoder's reconstruction of the tree to recover the encoded point cloud data. Once the point cloud data is decoded, it may be output from the decoder for use, such as for rendering views, segmentation/classification, or other applications.

In operation 606, the decoder updates the probability distribution based on the reconstructed occupancy pattern, and then, if there are additional nodes to decode, it moves to the next node in the buffer and returns to operation 602.

Example implementations of the above-described methods have been shown to provide compression improvements with a negligible increase in coding complexity. Although neighbor-based selection has higher computational complexity and memory usage, it shows better compression performance than parent-pattern-based selection. In some tests, the relative improvement in bits per point is between 4% and 20% over the MPEG point cloud test model. It has been noted that initializing the probability distributions based on a distribution derived from test data results in improved performance compared to initializing with a uniform distribution.

Some of the above examples are based on a tree coding process that uses a non-binary coder to signal the occupancy pattern. It should be appreciated that the process may be modified to use a context-adaptive binary arithmetic coder (CABAC). The occupancy pattern may be binarized into a cascade of binary information or into a pattern index.

In a variation of the neighbor-based probability distribution selection, the number of distributions may be reduced by exploiting the symmetry of the neighborhood. By permuting the neighborhood or permuting the pattern distribution, structurally similar configurations related by a symmetry can reuse the same distribution.

As an example, consider the eight corner configurations NC ∈ {21, 22, 25, 26, 37, 38, 41, 42}, each of which is a symmetric counterpart of a single corner neighbor pattern. These NC values are likely to be highly correlated with particular, but different, node patterns. These correlated patterns are likely to follow the same symmetries as the neighbor patterns. For example, a method may be implemented that reuses a single distribution to represent multiple instances of NC by permuting the probabilities of that distribution.

The encoder derives a pattern number for the node based on the occupancy of its child nodes. For each child node i = 0, ..., 7, a bit $c_i$ is defined that corresponds to its occupancy. The pattern number is found as $pn = \sum_{i=0}^{7} 2^{i} c_i$. The encoder selects the distribution and a permutation function according to the neighbor configuration. The encoder reorders the probabilities contained within the distribution according to the permutation function and then arithmetically encodes the pattern number using the permuted distribution. The updates made by the arithmetic coder to the probabilities of the permuted distribution are mapped back to the original distribution by the inverse permutation function.

The corresponding decoder first selects the same distribution and permutation function according to the neighbor configuration. The permuted distribution, which is used by the arithmetic decoder to entropy decode the pattern number, is generated in the same manner as in the encoder. Then the bits making up the pattern number are each assigned to the corresponding child: $c_i = \lfloor pn / 2^{i} \rfloor \bmod 2$.

It should be noted that the same permutation can be achieved without reordering the distribution data itself, by instead introducing a level of indirection and using the permutation function to permute the lookup of a given index in the distribution.
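The indirection can be sketched with a toy four-entry table (a real pattern distribution would have 255 entries). The function names are hypothetical, and the convention that entry i of the permuted distribution equals entry permutation[i] of the stored one is an assumption made for the illustration.

```python
def permuted_lookup(distribution, permutation, index):
    """Entry `index` of the permuted distribution, read without copying any data."""
    return distribution[permutation[index]]


def permuted_update(counts, permutation, index, step=1):
    """Apply an update made at a permuted index to the stored (original) counts."""
    counts[permutation[index]] += step


stored = [0.1, 0.2, 0.3, 0.4]        # toy distribution
perm = [2, 0, 3, 1]                  # toy permutation function as a table

# Reading through the indirection gives the same values as materializing
# the reordered table would.
print([permuted_lookup(stored, perm, i) for i in range(4)])   # [0.3, 0.1, 0.4, 0.2]

counts = [1, 1, 1, 1]
permuted_update(counts, perm, 0)     # lands on stored index perm[0] = 2
print(counts)                        # [1, 1, 2, 1]
```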

Alternative embodiments consider permutations of the patterns themselves rather than of the distributions, allowing the shuffling to be performed before or after entropy encoding/decoding, respectively. Such an approach may be more amenable to efficient implementation through bitwise shuffle operations. In this case, neither the encoder nor the decoder reorders the distribution; instead, the calculation of the pattern number is modified to $pn = \sum_{i=0}^{7} 2^{i} c_{f(i)}$, where f(i) is the permutation function. One such function, f(i) = ({4, 7, 6, 5, 0, 3, 2, 1})[i], allows the single distribution for NC = 22 to be used for NC = 41.
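The pattern-level permutation can be sketched as a bit shuffle. The table below is the example f(i) quoted above; the function names are hypothetical. Under this reading, bit i of the permuted pattern number is the occupancy bit of child f(i), and a decoder would apply the inverse table to the decoded pattern number to recover the child occupancies.

```python
F_EXAMPLE = (4, 7, 6, 5, 0, 3, 2, 1)   # f(i) from the text


def permute_pattern(pattern, f):
    """Return the sum over i of 2**i * c_f(i), where c_j is bit j of `pattern`."""
    permuted = 0
    for i in range(8):
        permuted |= ((pattern >> f[i]) & 1) << i
    return permuted


def invert(f):
    """Inverse permutation table, as needed on the decoder side."""
    inverse = [0] * len(f)
    for i, fi in enumerate(f):
        inverse[fi] = i
    return tuple(inverse)


p = 0b00110010                                      # pattern 50
q = permute_pattern(p, F_EXAMPLE)
print(permute_pattern(q, invert(F_EXAMPLE)) == p)   # True: the shuffle is reversible
print(permute_pattern(1, F_EXAMPLE))                # 16: child 0 moves to bit 4
```

This particular table swaps 0 with 4, 1 with 7, 2 with 6, and 3 with 5, so it is its own inverse.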

The method of deriving the desired permutation may be based on rotational symmetries of the neighbor configurations, or may be based on reflections along particular axes. Furthermore, the permutation need not permute all positions according to, for example, the symmetry; a partial permutation may be used instead. For example, when mapping NC = 22 to NC = 41, the positions on the axis of symmetry may be left unpermuted, giving the mapping {0, 7, 2, 5, 4, 3, 6, 1}, where positions 0, 2, 4, and 6 are not permuted. In other embodiments, only the pair 1 and 7 is swapped.

In the following, an example embodiment based on rotational symmetries and reflections is provided for the specific case of an octree in which six neighbors share a face with the current cube. Without loss of generality, the Z-axis extends vertically relative to the direction in which the figure is viewed, as shown in FIG. 16. The relative position of a neighbor such as "above" (respectively "below") should then be understood as lying in the increasing (respectively decreasing) direction along the Z-axis. The same remark applies to left/right along the X-axis and front/back along the Y-axis.

FIG. 16 shows three rotations 2102, 2104, and 2106 about the Z, Y, and X axes, respectively. Each of the three rotations is by 90 degrees, i.e., a quarter turn about the respective axis.

FIG. 17 shows the classes of invariance of neighbor configurations under one or several iterations of the rotation 2102 about the Z-axis. This invariance reflects the same statistical behavior of the point cloud geometry along any direction in the XY plane. This is particularly true for the use case of a car moving on the earth's surface, which is locally approximated by the XY plane. A horizontal configuration is a given occupancy of the four neighbors located to the left, right, front, and back of the current cube, independent of the occupancy of the above neighbor (2202) and the below neighbor (2204). Under rotation 2102, the four horizontal configurations 2206, 2208, 2210, and 2212 belong to the same class of invariance. Similarly, the two configurations 2214 and 2216 belong to the same class of invariance. There are only six classes of invariance (grouped under the set of classes 2218) under the rotation 2102.

A vertical configuration is a given occupancy of the two neighbors 2202 and 2204, independent of the occupancy of the four neighbors located to the left, right, front, and back of the current cube. As shown in FIG. 18, there are four possible vertical configurations. Thus, if invariance with respect to the rotation 2102 about the Z-axis is assumed, there are 6 × 4 = 24 possible configurations.

The reflection 2108 along the Z-axis is shown in FIG. 16. The vertical configurations 2302 and 2304 depicted in FIG. 18 belong to the same class of invariance under reflection 2108. There are three classes of invariance (grouped under the set of classes 2306) under reflection 2108. Invariance under reflection 2108 means that, in terms of point cloud geometry statistics, the behavior in the upward and downward directions is substantially the same. For a car moving on a road, this is an accurate assumption.

If one assumes invariance under both rotation 2102 and reflection 2108, there are 18 classes of invariance, resulting from the product of the two sets 2218 and 2306. These 18 classes are shown in FIG. 19.

Applying additional invariance under two additional rotations 2104 and 2106, the two configurations 2401 and 2402 belong to the same class of invariance. Further, two configurations 2411 and 2412, two configurations 2421 and 2422, three configurations 2431, 2432 and 2433, two configurations 2441 and 2442, two configurations 2451 and 2452, and the last two configurations 2461 and 2462 belong to the same category. Thus, invariance under three rotations (2102, 2104, and 2106) and reflection 2108 results in 10 classes of invariance, as shown on FIG. 20.

From the examples provided above, depending on which invariances under the three rotations and the reflection are assumed, the number of effective neighbor configurations is 64, 24, 18, or 10. In tests, compression performance improved with 18 configurations, because the underlying invariance assumptions (rotation about and reflection along the Z-axis) were in practice well satisfied and therefore did not degrade the accuracy of the distributions, while the smaller number of distributions to update (18 versus 64) gave faster convergence. However, with 10 (versus 18) configurations, a non-negligible performance degradation is observed, because the XY axes on the one hand and the Z-axis on the other hand behave differently statistically. In applications where the number of configurations must be reduced to meet memory footprint and/or complexity constraints, a setup with 10 configurations may nevertheless be of interest.
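The counts 64, 24, 18, and 10 can be checked by enumerating the 64 face-neighbor configurations and grouping those that map onto one another under the chosen transformations. The sketch below is an illustration under assumptions: neighbors are represented by unit direction vectors, the function names are hypothetical, and the sign conventions of the quarter turns are arbitrary (they do not affect the counts).

```python
from itertools import product

# The six face neighbors as unit offsets along X, Y and Z.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def rot_z(v):      # quarter turn about the Z-axis (cf. rotation 2102)
    x, y, z = v
    return (-y, x, z)


def rot_y(v):      # quarter turn about the Y-axis (cf. rotation 2104)
    x, y, z = v
    return (z, y, -x)


def rot_x(v):      # quarter turn about the X-axis (cf. rotation 2106)
    x, y, z = v
    return (x, -z, y)


def reflect_z(v):  # reflection along the Z-axis (cf. reflection 2108)
    x, y, z = v
    return (x, y, -z)


def count_classes(transforms):
    """Number of invariance classes among the 64 configurations."""
    configs = [frozenset(d for d, bit in zip(DIRECTIONS, bits) if bit)
               for bits in product((0, 1), repeat=6)]
    seen, classes = set(), 0
    for config in configs:
        if config in seen:
            continue
        classes += 1                      # new class: collect everything reachable
        seen.add(config)
        stack = [config]
        while stack:
            current = stack.pop()
            for t in transforms:
                image = frozenset(t(d) for d in current)
                if image not in seen:
                    seen.add(image)
                    stack.append(image)
    return classes


print(count_classes([]),                                  # 64
      count_classes([rot_z]),                             # 24
      count_classes([rot_z, reflect_z]),                  # 18
      count_classes([rot_z, rot_y, rot_x, reflect_z]))    # 10
```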

Prior to entropy coding, the pattern undergoes the same transformation, i.e., the rotations and reflection, that the neighbor configuration undergoes in order to be mapped to its invariance class. This preserves the statistical consistency between the invariant neighbor configuration and the coded pattern.

It should also be understood that, during traversal of the tree, a child node will have certain neighboring nodes at the same tree depth that have been previously visited and may be causally used as dependencies. For these same-level neighbors, the same-level neighbors themselves may be used instead of consulting the collocated neighbors of the parent. Since the same-level neighbors have half the dimension of the parent, one configuration considers a neighbor to be occupied if any of the four directly adjacent neighboring child nodes (i.e., the four child nodes sharing a face with the current node) is occupied.

Referring now to FIG. 14, a simplified block diagram of an example embodiment of an encoder 1100 is shown. The encoder 1100 includes a processor 1102, a memory 1104, and an encoding application 1106. The encoding application 1106 may include a computer program or application stored in the memory 1104 and containing instructions that, when executed, cause the processor 1102 to perform operations such as those described herein. For example, the encoding application 1106 may encode a bitstream and output an encoded bitstream according to the processes described herein. It should be appreciated that the encoding application 1106 may be stored on a non-transitory computer readable medium such as a compact disk, flash memory device, random access memory, hard drive, or the like. When the instructions are executed, the processor 1102 performs the operations and functions specified in the instructions so as to operate as a special-purpose processor that implements the process(es) described above. In some examples, such a processor may be referred to as "processor circuitry".

Reference is now also made to FIG. 15, which shows a simplified block diagram of an example embodiment of a decoder 1200. The decoder 1200 includes a processor 1202, a memory 1204, and a decoding application 1206. The decoding application 1206 may comprise a computer program or application stored in the memory 1204 and containing instructions that, when executed, cause the processor 1202 to perform operations such as those described herein. It should be appreciated that the decoding application 1206 may be stored on a computer-readable medium such as a compact disk, a flash memory device, a random access memory, a hard drive, and the like. When the instructions are executed, the processor 1202 performs the operations and functions specified in the instructions so as to operate as a special-purpose processor that implements the described process(es). In some examples, such a processor may be referred to as "processor circuitry".

It should be understood that a decoder and/or encoder according to the present application may be implemented in many computing devices, including but not limited to servers, appropriately programmed general-purpose computers, machine vision systems, and mobile devices. The decoder or encoder may be implemented by software containing instructions for configuring one or more processors to perform the functions described herein. The software instructions may be stored on any suitable non-transitory computer readable memory, including CD, RAM, ROM, flash memory, etc.

It will be appreciated that the decoders and/or encoders described herein, as well as the modules, routines, processes, threads, or other software components implementing the methods/processes described for configuring an encoder or decoder, may be implemented using standard computer programming techniques and languages. The present application is not limited to particular processors, computer languages, computer programming conventions, data structures, or other such implementation details. Those skilled in the art will appreciate that the described processes may be implemented as part of computer-executable code stored in volatile or non-volatile memory, as part of an Application Specific Integrated Chip (ASIC), etc.

The present application also provides a computer readable signal encoding data generated by application of an encoding process according to the present application.

Certain adaptations and modifications of the described embodiments can be made. The embodiments discussed above are therefore to be considered in all respects as illustrative and not restrictive.
