Method and apparatus for binary entropy encoding and decoding of point clouds

Document No.: 639583; published 2021-05-11.

Summary: This technology, "Method and apparatus for binary entropy encoding and decoding of point clouds", was created by S. Lasserre on 2019-10-02. Its main content is as follows. Methods and apparatus for encoding or decoding point clouds. A bit sequence representing the occupancy pattern of the sub-volumes of a volume is coded using entropy coding. For a current sub-volume, the probabilities of a respective entropy codec used to entropy code the occupancy pattern may be selected based on occupancy data of a plurality of neighboring sub-volumes of the current sub-volume and on occupancy data of the sub-volumes into which the neighboring sub-volumes are subdivided.

1. A method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes with parent-child relationships and the plurality of nodes representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the method comprising:

for a current node associated with a sub-volume split into other sub-volumes, each other sub-volume corresponding to a child node of the current node,

determining an occupancy pattern for the current node based on the occupancy states of the child nodes;

selecting one or more probabilities associated with a respective entropy codec used to entropy encode the occupancy pattern, wherein the selection is based on occupancy data of a plurality of neighboring nodes to the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes; and

entropy encoding the occupancy pattern based on the selected one or more probabilities using the associated one or more entropy codecs to produce encoded data for the bitstream.

2. A method of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the method comprising:

for a current node associated with a sub-volume split into other sub-volumes, each other sub-volume corresponding to a child node of the current node,

selecting one or more probabilities associated with a respective entropy codec used for entropy decoding occupancy patterns, wherein the selection is based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes; and

entropy decoding the bitstream based on the selected one or more probabilities using one or more associated entropy codecs to produce a reconstructed occupancy pattern for the current node that represents occupancy of the child nodes.

3. The method of claim 1 or 2, wherein selecting the one or more probabilities is based on a neighbor configuration determined based on an occupancy state of each of the neighboring nodes of the current node.

4. The method of claim 3, wherein if occupancy data of a neighboring node of the current node indicates that the neighboring node is occupied and occupancy data of child nodes of the neighboring node indicates that at least one of the neighboring node's occupied child nodes is adjacent to the current node, then the neighboring node is considered occupied for purposes of determining the neighbor configuration.

5. The method of claim 3 or 4, wherein if occupancy data of a neighboring node of the current node indicates that the neighboring node is occupied and occupancy data of child nodes of the neighboring node indicates that none of the occupied child nodes of the neighboring node is adjacent to the current node, then the neighboring node is considered unoccupied for purposes of determining the neighbor configuration.

6. The method of any of claims 3-5, wherein if a neighboring node of the current node has not been coded, then the neighboring node is considered occupied for purposes of determining the neighbor configuration.

7. The method of any of the preceding claims, wherein the neighboring nodes of the current node are those nodes in the tree structure that are at the same depth as the current node and whose associated sub-volumes intersect the sub-volume of the current node.

8. The method of claim 4 or any one of claims 5 to 7 when dependent on claim 4, wherein the child nodes adjacent to the current node are those nodes that are one depth below the current node in the tree structure and whose associated sub-volumes intersect the sub-volume of the current node.

9. The method of any preceding claim, wherein the occupancy data of the plurality of neighboring nodes comprises an occupancy state of each of the plurality of neighboring nodes.

10. The method according to any of the preceding claims, wherein the tree structure represents an octree.

11. The method of claim 2 or any one of claims 3 to 10 when dependent on claim 2, further comprising decoding a flag from the bitstream, the flag indicating that the one or more probabilities associated with a respective entropy codec used for entropy decoding the occupancy pattern should be selected based on the occupancy data of the plurality of neighboring nodes of the current node and the occupancy data of the child node of at least one of the plurality of neighboring nodes.

12. An encoder for encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes with parent-child relationships and the plurality of nodes representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the encoder comprising:

a processor;

a memory; and

a coding application containing instructions executable by the processor, which when executed cause the processor to perform the method of claim 1 or any one of claims 3 to 10 when dependent on claim 1.

13. A decoder for decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes having parent-child relationships and the plurality of nodes representing a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud, the decoder comprising:

a processor;

a memory; and

a decoding application containing instructions executable by the processor, which when executed cause the processor to perform the method of claim 2 or any of claims 3 to 11 when dependent on claim 2.

14. A non-transitory processor-readable medium storing processor-executable instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-11.

15. A computer readable signal containing program instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 11.

Technical Field

The present application relates generally to point cloud compression and, in particular, to methods and apparatus for binary entropy coding of point clouds.

Background

Data compression is used in communications and computer networking to store, transmit, and reproduce information efficiently. There is increasing interest in representations of three-dimensional objects or spaces, which can involve large data sets and for which efficient and effective compression would be highly useful and valuable. In some cases, a three-dimensional object or space may be represented using a point cloud, which is a set of points each having a three-dimensional coordinate location (X, Y, Z) and, in some cases, other attributes such as color data (e.g., luminance and chrominance), transparency, reflectivity, normal vector, and so forth. The point cloud may be static (a still object or a snapshot of an environment/object at a single point in time) or dynamic (a time-ordered sequence of point clouds).

Example applications for point clouds include topography and mapping applications. Autonomous vehicles and other machine vision applications may rely on point cloud sensor data in the form of a 3D scan of an environment, such as from a LiDAR (laser radar) scanner. Virtual reality simulation may also rely on point clouds.

It should be appreciated that point clouds can involve a large amount of data, and compressing (encoding and decoding) this data quickly and accurately is of great interest. Accordingly, it would be advantageous to provide methods and apparatus for more efficiently and/or effectively compressing point cloud data. Moreover, it would be advantageous to find methods and apparatus for coding point clouds that can be implemented using context-adaptive binary entropy coding without having to manage an excessive number of contexts.

Drawings

Reference will now be made, by way of example, to the accompanying drawings, which illustrate example embodiments of the present application, and in which:

FIG. 1 shows a simplified block diagram of an example point cloud encoder;

FIG. 2 shows a simplified block diagram of an example point cloud decoder;

FIG. 3 illustrates an example of partial sub-volumes and the associated tree structure for coding;

FIG. 4 illustrates recursive splitting and coding of octrees;

FIG. 5 shows an example scan pattern within an example cube from an octree;

FIG. 6 illustrates an example occupancy pattern within an example cube;

FIG. 7 illustrates, in flow diagram form, an example method for encoding a point cloud;

FIG. 8 illustrates a portion of an example octree;

FIG. 9 illustrates an example of contiguous sub-volumes;

FIG. 10 illustrates an example neighbor configuration showing occupancy between neighboring nodes;

FIG. 11 diagrammatically illustrates one illustrative embodiment of a process for point cloud entropy encoding using a context that depends on the parent pattern;

FIG. 12 shows an illustrative embodiment of a process for point cloud entropy encoding using a context dependent on a neighbor configuration;

FIG. 13 illustrates, in flow diagram form, one example method for decoding a bitstream of compressed point cloud data;

FIG. 14 shows an example simplified block diagram of an encoder;

FIG. 15 shows an example simplified block diagram of a decoder;

FIG. 16 illustrates an example Cartesian coordinate system and example rotations and/or reflections about an axis;

FIG. 17 illustrates the invariance classes of neighbor configurations under one or several iterations of rotation about the Z-axis;

FIG. 18 illustrates invariance categories for neighbor configurations for vertical reflections;

FIG. 19 illustrates the invariance categories for both rotation and reflection;

FIG. 20 illustrates the invariance categories for three rotations and reflections;

FIG. 21 illustrates the equivalence between non-binary coding and cascaded binary coding of occupancy patterns;

FIG. 22 illustrates, in flowchart form, an example method for encoding and decoding occupancy patterns in a tree-based point cloud coder using binary coding;

FIG. 23 shows a simplified block diagram of portions of an example encoder;

FIG. 24 graphically illustrates an example context reduction operation based on neighbor screening;

FIG. 25 illustrates another example context reduction operation based on neighbor screening;

FIG. 26 illustrates, in flow diagram form, one example of a method for binary coding occupancy patterns using combined context reduction;

FIG. 27 illustrates an example of neighboring sub-volumes, some of which have already been coded;

FIG. 28 illustrates an example of a neighboring sub-volume and those of its sub-volumes that have already been coded;

FIG. 29 shows an example of sub-volume occupancy in adjacent sub-volumes;

FIG. 30 illustrates, in flow chart form, a method of encoding an occupancy pattern of a current node based at least in part on occupancy data of a child node of at least one of a plurality of neighboring nodes;

FIG. 31 illustrates, in flow chart form, a method of decoding an occupancy pattern of a current node based at least in part on occupancy data of a child node of at least one of a plurality of neighboring nodes;

FIG. 32 illustrates, in flow chart form, a method of determining a neighbor configuration based on the sub-volumes of adjacent sub-volumes;

FIG. 33 illustrates another example of sub-volume occupancy in adjacent sub-volumes; and

FIG. 34 illustrates yet another example of sub-volume occupancy in adjacent sub-volumes.

Similar reference numerals may have been used in different figures to denote similar components.

Detailed Description

Methods of encoding and decoding point clouds and encoders and decoders for encoding and decoding point clouds are described. An entropy codec (e.g., a binary entropy codec) may be used to code a bit sequence representing the occupancy pattern of the sub-volumes of a volume. The probabilities associated with the respective entropy codecs used to entropy code the occupancy patterns may be selected based on occupancy data of the sub-volumes neighboring the current sub-volume and, further, on occupancy data of the sub-volumes of at least one of those neighboring sub-volumes.

In examples useful for understanding the present application, the context may be based on the neighbor configuration and on the partial sequence of previously coded bits of the bit sequence. A determination may be made as to whether a context reduction operation is to be applied; if so, the operation reduces the number of available contexts. Example context reduction operations include: reducing neighbor configurations based on screening by the sub-volumes associated with previously coded bits, special handling of empty neighbor configurations, and statistics-based context merging. The reduction may be applied before coding, and during coding a determination may be made as to whether the condition for using the reduced context set is satisfied.

In one aspect, the present application provides a method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes that have parent-child relationships and that represent a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud. The method includes, for a current node associated with a sub-volume split into other sub-volumes, each other sub-volume corresponding to a child node of the current node, determining an occupancy pattern for the current node based on the occupancy states of the child nodes. The method further includes selecting one or more probabilities associated with a respective entropy codec used to entropy encode the occupancy pattern, wherein the selection is based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes. The method further includes entropy encoding the occupancy pattern based on the selected one or more probabilities, using the associated one or more entropy codecs, to produce encoded data for the bitstream.
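To make the occupancy-pattern step concrete, here is a minimal Python sketch for an octree node. It assumes a hypothetical scan order in which the child at half-coordinates (x, y, z) has index 4*x + 2*y + z, and that child i maps to bit i of the pattern; the bit ordering used by an actual encoder may differ.

```python
from typing import Sequence


def child_index(x: int, y: int, z: int) -> int:
    """Hypothetical scan order: index = 4*x + 2*y + z for x, y, z in {0, 1}."""
    return (x << 2) | (y << 1) | z


def occupancy_pattern(child_occupied: Sequence[bool]) -> int:
    """Pack the eight child occupancy states of a node into an 8-bit pattern.

    Assumed convention: child i (in scan order) sets bit i of the result.
    """
    if len(child_occupied) != 8:
        raise ValueError("an octree node has exactly eight children")
    pattern = 0
    for i, occupied in enumerate(child_occupied):
        if occupied:
            pattern |= 1 << i  # mark child i as occupied
    return pattern
```

For example, a node whose first and last children (in scan order) are occupied yields the pattern 0b10000001, i.e. 129.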

In another aspect, the present application provides a method of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes that have parent-child relationships and that represent a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud. The method includes, for a current node associated with a sub-volume split into other sub-volumes, each other sub-volume corresponding to a child node of the current node, selecting one or more probabilities associated with a respective entropy codec for entropy decoding an occupancy pattern, wherein the selection is based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes. The method further includes entropy decoding the bitstream based on the selected one or more probabilities, using one or more associated entropy codecs, to produce a reconstructed occupancy pattern for the current node that represents occupancy of the child nodes.
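On the decoder side, the reconstructed occupancy pattern tells which child nodes exist and must be processed at the next depth. A minimal sketch, assuming a hypothetical convention in which bit i of the 8-bit pattern corresponds to child i in scan order:

```python
from typing import List


def occupied_children(pattern: int) -> List[int]:
    """Return the scan-order indices of the occupied children encoded in an
    8-bit occupancy pattern (assumed convention: bit i <-> child i)."""
    if not 0 <= pattern <= 0xFF:
        raise ValueError("occupancy pattern must fit in 8 bits")
    return [i for i in range(8) if (pattern >> i) & 1]
```

Under this convention, `occupied_children(0b00010110)` returns `[1, 2, 4]`.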

In some embodiments, selecting the one or more probabilities may be based on a neighbor configuration determined based on the occupancy state of each of the neighboring nodes of the current node.

In some embodiments, a neighboring node of the current node may be considered occupied for the purpose of determining the neighbor configuration if the neighboring node's occupancy data indicates that it is occupied and the occupancy data of its child nodes indicates that at least one of its occupied child nodes is adjacent to the current node.

In some embodiments, a neighboring node of the current node may be considered unoccupied for the purpose of determining the neighbor configuration if the neighboring node's occupancy data indicates that it is occupied but the occupancy data of its child nodes indicates that none of its occupied child nodes is adjacent to the current node. This may correspond to (intentionally/artificially) setting the occupancy bit of the neighboring node to zero when determining the neighbor configuration.

In some embodiments, a neighboring node of the current node may be considered occupied for the purpose of determining the neighbor configuration if its occupancy data indicates that it is occupied and the neighboring node has not yet been coded. When a neighboring node has not yet been coded, the decoder does not yet have information about the occupancy of its child nodes, so that information cannot be used to decide whether to treat the neighboring node as occupied or unoccupied for the purpose of determining the neighbor configuration.

In some embodiments, if the occupancy data of a neighboring node of the current node indicates that it is unoccupied, then the neighboring node may be considered unoccupied for the purpose of determining the neighbor configuration.
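The rules in the preceding paragraphs amount to an "effective occupancy" test per neighboring node. The following Python sketch is an illustration under stated assumptions, not the method's normative form: it takes the six face neighbors in a hypothetical fixed order (bit k of the configuration for neighbor k), and describes each neighbor with a hypothetical `NeighborInfo` record holding its occupancy state, whether its child occupancy is already known, and whether any of its occupied children is adjacent to the current node.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NeighborInfo:
    occupied: bool                         # occupancy state of the neighboring node
    children_known: bool                   # True once the neighbor's own pattern has been coded
    adjacent_child_occupied: bool = False  # any occupied child touching the current node?


def effective_occupancy(n: NeighborInfo) -> bool:
    """Occupancy of a neighbor as used for the neighbor configuration."""
    if not n.occupied:
        return False                       # unoccupied neighbors stay unoccupied
    if not n.children_known:
        return True                        # not yet coded: no child data, keep it occupied
    return n.adjacent_child_occupied       # occupied, but only counts if a child touches us


def neighbor_configuration(neighbors: Sequence[NeighborInfo]) -> int:
    """Bit k is set when neighbor k (hypothetical ordering) is effectively occupied."""
    config = 0
    for k, n in enumerate(neighbors):
        if effective_occupancy(n):
            config |= 1 << k
    return config
```

Setting a bit to zero for an occupied neighbor whose occupied children all lie away from the current node is exactly the artificial zeroing described above.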

In some implementations, the neighboring nodes of the current node may be those nodes that are at the same depth in the tree structure as the current node and whose associated sub-volumes intersect the sub-volume of the current node.

In some implementations, the child nodes adjacent to the current node may be those nodes that are one depth below the current node in the tree structure and whose associated sub-volumes intersect the sub-volume of the current node.
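For face neighbors, the children of a neighboring node that touch the current node can be listed directly. The sketch below assumes a hypothetical child indexing (index = 4*x + 2*y + z) and interprets "intersect" as "share a face"; edge- or vertex-sharing neighbors would need additional cases.

```python
from typing import List


def adjacent_children_of_neighbor(axis: int, direction: int) -> List[int]:
    """Children of a face neighbor whose sub-volumes share a face with the
    current node. axis: 0=x, 1=y, 2=z; direction: +1 if the neighbor lies on
    the positive side of the current node, -1 otherwise.
    Hypothetical child indexing: index = 4*x + 2*y + z.
    """
    if axis not in (0, 1, 2) or direction not in (-1, 1):
        raise ValueError("invalid axis or direction")
    # A neighbor on the positive side touches the current node with its
    # lower half (coordinate 0) along that axis, and vice versa.
    touching = 0 if direction == 1 else 1
    shift = 2 - axis  # x is bit 2, y is bit 1, z is bit 0 of the index
    return [i for i in range(8) if ((i >> shift) & 1) == touching]


def has_adjacent_occupied_child(neighbor_pattern: int, axis: int, direction: int) -> bool:
    """True if any occupied child of the neighbor (bit i <-> child i) touches the current node."""
    return any((neighbor_pattern >> i) & 1
               for i in adjacent_children_of_neighbor(axis, direction))
```

For instance, a neighbor on the +x side whose occupied children are all in its upper x half (pattern 0b11110000) has no occupied child adjacent to the current node.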

In some embodiments, the occupancy data of the plurality of neighboring nodes may include an occupancy state of each of the plurality of neighboring nodes.

In some embodiments, the tree structure may represent an octree.

In some embodiments, the encoding method may further include: encoding a flag indicating that one or more probabilities associated with a respective entropy codec used to entropy encode the occupancy pattern have been selected based on occupancy data of a plurality of neighboring nodes to the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes.

In some embodiments, the decoding method may further include: decoding a flag indicating that one or more probabilities associated with a respective entropy codec used to entropy decode the occupancy pattern should be selected based on occupancy data of a plurality of neighboring nodes to the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes.

In another aspect, the present application provides a method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being defined in a tree structure having a plurality of nodes that have parent-child relationships and that represent a geometry of a volumetric space, the volumetric space being recursively split into sub-volumes and containing points of the point cloud. Occupancy of the sub-volumes of a volume is indicated using a bit sequence, each bit of which indicates, in a scanning order, the occupancy of a respective sub-volume within the volume, and the volume has a plurality of neighboring volumes whose occupancy pattern is a neighbor configuration. The method includes: for at least one bit of the bit sequence of the volume, determining that a context reduction condition is satisfied and, on that basis, selecting a reduced set of contexts containing fewer contexts than the product of the number of neighbor configurations and the number of previously coded bits in the sequence; selecting, from the reduced set of contexts, a context for coding the at least one bit based on the occupancy states of at least some of the neighboring volumes and on at least one previously coded bit of the bit sequence; entropy encoding the at least one bit based on the selected context using a binary entropy encoder to produce encoded data for the bitstream; and updating the selected context.

In another aspect, the present application provides a method of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud, the point cloud being defined in a tree structure having a plurality of nodes that have parent-child relationships and that represent a geometry of a volumetric space that is recursively split into sub-volumes and contains points of the point cloud. Occupancy of the sub-volumes of a volume is indicated using a bit sequence, each bit of which indicates, in scan order, the occupancy of a respective sub-volume within the volume, and the volume has a plurality of neighboring volumes whose occupancy pattern is a neighbor configuration. The decoding method includes: for at least one bit of the bit sequence of the volume, determining that a context reduction condition is satisfied and, on that basis, selecting a reduced set of contexts containing fewer contexts than the product of the number of neighbor configurations and the number of previously coded bits in the sequence; selecting, from the reduced set of contexts, a context for coding the at least one bit based on the occupancy states of at least some of the neighboring volumes and on at least one previously decoded bit of the bit sequence; entropy decoding the at least one bit based on the selected context using a binary entropy decoder to produce a reconstructed bit from the bitstream; and updating the selected context.
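A minimal sketch of the per-bit coding loop shared by these encoding and decoding methods: each of the eight occupancy bits is coded in scan order with a context chosen from the neighbor configuration and the previously coded bits, and the selected context is updated afterwards. The frequency-count probability model, the `ctx_map` callback and the `full_map` function are hypothetical, and the arithmetic coder is replaced by the ideal code length -log2(p) to keep the example self-contained. A decoder would run the same context selection using the bits it has already decoded.

```python
import math
from typing import Callable, Dict


class AdaptiveBinaryContext:
    """Adaptive probability estimate for one binary context (simple counts; an assumption)."""

    def __init__(self) -> None:
        self.counts = [1, 1]  # start from p(1) = 0.5

    def p_one(self) -> float:
        return self.counts[1] / (self.counts[0] + self.counts[1])

    def cost_bits(self, bit: int) -> float:
        p = self.p_one() if bit else 1.0 - self.p_one()
        return -math.log2(p)  # ideal code length of this bit

    def update(self, bit: int) -> None:
        self.counts[bit] += 1


# (neighbor configuration, bit position j, previously coded bits) -> context id
ContextMap = Callable[[int, int, int], int]


def full_map(nc: int, j: int, prev: int) -> int:
    """Unreduced map: one context per (configuration, position, previous bits)."""
    return (nc * 8 + j) * 256 + prev


def code_pattern(pattern: int, neighbor_config: int, ctx_map: ContextMap,
                 contexts: Dict[int, AdaptiveBinaryContext]) -> float:
    """Code an 8-bit occupancy pattern bit by bit; return its ideal length in bits."""
    total = 0.0
    prev = 0  # bits of this pattern coded so far (bit i <-> child i)
    for j in range(8):
        bit = (pattern >> j) & 1
        ctx = contexts.setdefault(ctx_map(neighbor_config, j, prev), AdaptiveBinaryContext())
        total += ctx.cost_bits(bit)
        ctx.update(bit)  # update the selected context
        prev |= bit << j
    return total
```

A reduced context set plugs into the same loop simply as a `ctx_map` that returns fewer distinct ids, so more bits share, and train, each context.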

In some embodiments, the context reduction condition may include determining that one or more previously coded occupancy bits are associated with one or more respective sub-volumes positioned between the sub-volume associated with the at least one bit and one or more of the neighboring volumes. In some cases, this may include determining that the four sub-volumes associated with previously coded bits share a face with a particular neighboring volume.

In some embodiments, the context reduction condition may include determining that at least four bits of the bit sequence have been previously coded.
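One way to express the screening condition in code: when coding child j, a face neighbor is screened once all four children of the current node that share a face with it precede j in scan order, and its bit can then be dropped from the neighbor configuration. The sketch assumes a hypothetical child index 4*x + 2*y + z scanned in increasing order and a hypothetical neighbor ordering `NEIGHBORS`; under this order the first screening occurs at j = 4, which matches the at-least-four-bits condition.

```python
from typing import List, Tuple

# Hypothetical ordering of the six face neighbors: bit k of the neighbor
# configuration corresponds to NEIGHBORS[k] = (axis, direction).
NEIGHBORS: List[Tuple[int, int]] = [(0, -1), (0, 1), (1, -1), (1, 1), (2, -1), (2, 1)]


def children_facing(axis: int, direction: int) -> List[int]:
    """Children of the *current* node (index 4*x + 2*y + z) that share a face
    with the neighbor on the given side."""
    touching = 1 if direction == 1 else 0
    shift = 2 - axis
    return [i for i in range(8) if ((i >> shift) & 1) == touching]


def screened_mask(j: int) -> int:
    """Neighbor-configuration bits whose neighbor is screened when coding child j:
    all four children facing that neighbor have already been coded."""
    mask = 0
    for k, (axis, direction) in enumerate(NEIGHBORS):
        if all(c < j for c in children_facing(axis, direction)):
            mask |= 1 << k
    return mask


def reduced_neighbor_configuration(config: int, j: int) -> int:
    """Drop screened neighbors from the configuration used to pick the context."""
    return config & ~screened_mask(j)
```

With every neighbor occupied (configuration 63), coding the last child keeps only the three positive-side neighbors, giving 42.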

In some implementations, determining that the context reduction condition is satisfied may include determining that the occupancy pattern of the neighboring volumes indicates that all of the neighboring volumes are unoccupied. In some of those cases, the selected reduced set of contexts may include a number of contexts corresponding to the number of previously coded bits in the bit sequence, and selecting a context may optionally include selecting the context based on the sum of the previously coded bits in the bit sequence.
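A minimal sketch of the empty-neighbor case, assuming that bit i of `prev_bits` holds the i-th previously coded bit (a hypothetical convention). It offers the two options described: one context per bit position, or a context chosen by the sum of the previously coded bits. The function name is illustrative only.

```python
def empty_neighbor_context(j: int, prev_bits: int, use_sum: bool = False) -> int:
    """Context index for bit j of the occupancy pattern when every neighbor is unoccupied."""
    if not 0 <= j < 8:
        raise ValueError("bit position must be in 0..7")
    if use_sum:
        # Sum of the j bits already coded (only the low j bits are meaningful).
        return bin(prev_bits & ((1 << j) - 1)).count("1")
    return j  # one context per number of previously coded bits
```

Either way, at most eight contexts are needed for the empty configuration instead of one per pattern of previously coded bits.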

In some embodiments, the context reduction condition may include determining that at least a threshold number of bits of the bit sequence have been previously coded, and the reduced context set may include a lookup table that maps each possible combination of neighbor configuration and pattern of previously coded bits in the bit sequence to a smaller set of contexts. In some examples, the lookup table may be generated by iteratively grouping the available contexts into a plurality of classes, merging a pair of available contexts whenever a distance measure between them is less than a threshold. Each class then corresponds to a respective context in the smaller set, whereas the available contexts include one context for each possible combination of neighbor configuration and pattern of previously coded bits in the bit sequence.
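The iterative grouping could be sketched as greedy agglomerative merging. The distance measure (absolute difference of smoothed probability-of-one estimates), the `(zeros, ones)` training statistics and the function name `build_context_lut` are all assumptions made for illustration; the text above does not specify them.

```python
from typing import Dict, List, Tuple

Cls = Tuple[List[int], int, int]  # (member context ids, zero count, one count)


def build_context_lut(stats: Dict[int, Tuple[int, int]], threshold: float) -> Dict[int, int]:
    """Map each original context id to a reduced context id.

    stats: original context id -> (count of 0s, count of 1s) from training data.
    The closest pair of classes is merged while its distance is below threshold.
    """
    classes: List[Cls] = [([cid], z, o) for cid, (z, o) in sorted(stats.items())]

    def p_one(c: Cls) -> float:
        _, z, o = c
        return (o + 0.5) / (z + o + 1.0)  # smoothed estimate of p(1)

    while len(classes) > 1:
        best = None  # (distance, index a, index b) of the closest pair
        for a in range(len(classes)):
            for b in range(a + 1, len(classes)):
                d = abs(p_one(classes[a]) - p_one(classes[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= threshold:
            break  # no remaining pair is close enough to merge
        ma, za, oa = classes[a]
        mb, zb, ob = classes[b]
        classes[a] = (ma + mb, za + zb, oa + ob)  # pool statistics of both classes
        del classes[b]  # b > a, so index a stays valid

    return {cid: k for k, (members, _, _) in enumerate(classes) for cid in members}
```

Because the table is built offline, the coder only performs a lookup per bit; the cost of the quadratic search is paid once.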

In some embodiments, at least some of the neighboring volumes share at least one face with the volume.

In another aspect, the present application describes an encoder and decoder configured to implement such encoding and decoding methods.

In yet another aspect, the present application describes a non-transitory computer-readable medium storing computer-executable program instructions that, when executed, cause one or more processors to perform the described encoding and/or decoding methods.

In yet another aspect, the present application describes a computer-readable signal containing program instructions that, when executed by a computer, cause the computer to perform the described encoding and/or decoding method.

The present application also describes computer-implemented applications, including terrain applications, mapping applications, automotive industry applications, autonomous driving applications, virtual reality applications, and cultural heritage applications, among others. These computer-implemented applications include receiving a data stream or data file, unpacking the data stream or data file to obtain a bitstream of compressed point cloud data, and decoding the bitstream as described in the above aspects and their embodiments. Thus, these computer-implemented applications make use of point cloud compression techniques in accordance with the aspects and embodiments described throughout this application.

Methods of encoding and decoding point clouds and encoders and decoders for encoding and decoding point clouds are also described. In some embodiments, a receiving unit receives multiplexed data obtained by multiplexing coded point cloud data with other coded data types (such as metadata, images, video, audio, and/or graphics). The receiving unit includes a demultiplexing unit for separating the multiplexed data into the coded point cloud data and the other coded data, and at least one decoding unit (or decoder) for decoding the coded point cloud data. In some other embodiments, a transmitting unit transmits multiplexed data obtained by multiplexing coded point cloud data with other coded data types (such as metadata, images, video, audio, and/or graphics). The transmitting unit includes at least one encoding unit (or encoder) for encoding the point cloud data and a multiplexing unit for combining the coded point cloud data and the other coded data into the multiplexed data.

Other aspects and features of the present application will become apparent to those ordinarily skilled in the art upon review of the following description of examples in conjunction with the accompanying figures.

Any feature described with respect to one aspect or embodiment of the invention may also be used with respect to one or more other aspects/embodiments. These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described herein.

Sometimes, in the following description, the terms "node", "volume", and "sub-volume" may be used interchangeably. It should be appreciated that the nodes are associated with volumes or sub-volumes. A node is a particular point on the tree that may be an internal node or a leaf node. A volume or sub-volume is a bounded physical space represented by nodes. In some cases, the term "volume" may be used to refer to the largest bounded space defined to contain a point cloud. The volume may be recursively divided into sub-volumes for the purpose of building a tree structure of interconnected nodes to encode the point cloud data.

In this application, the term "and/or" is intended to cover all possible combinations and sub-combinations of the listed elements, including any one of the listed elements alone, any sub-combination, or all of the elements, without necessarily excluding additional elements.

In this application, the phrase "at least one of ... or ..." is intended to cover any one or more of the listed elements, including any one of the listed elements alone, any sub-combination, or all of the elements, without necessarily excluding any additional elements and without necessarily requiring all of the elements.

A point cloud is a set of points in a three-dimensional coordinate system. The points are generally intended to represent the exterior surface of one or more objects. Each point has a location (position) in the three-dimensional coordinate system. The position may be represented by three coordinates (X, Y, Z) in a Cartesian or any other coordinate system. The points may have other associated attributes (such as color), which in some cases may also be three-component values, such as R, G, B or Y, Cb, Cr. Other associated attributes may include transparency, reflectivity, normal vector, etc., depending on the intended application of the point cloud data.

A point cloud may be static or dynamic. For example, a detailed scan or mapping of an object or terrain may be static point cloud data. LiDAR-based environmental scans for machine vision purposes may be dynamic in that the point cloud changes (at least potentially) over time, for example with each successive scan of a volume. A dynamic point cloud is therefore a time-ordered sequence of point clouds.

Point cloud data may be used in a number of applications, including conservation (scanning of historical or cultural objects), mapping, machine vision (such as autonomous or semi-autonomous cars), and virtual reality systems, to give some examples. Dynamic point cloud data for applications such as machine vision can be quite different from static point cloud data for conservation purposes. Automotive vision, for example, typically involves relatively low-resolution, colorless, highly dynamic point clouds obtained by LiDAR (or similar) sensors at a high capture frequency. Such point clouds are not intended for human consumption or viewing, but rather for machine object detection/classification in a decision process. As an example, a typical LiDAR frame contains tens of thousands of points, whereas high-quality virtual reality applications require millions of points. Higher-resolution data is expected to be needed over time as operating speeds increase and new applications are found.

While point cloud data is useful, a lack of efficient and effective compression (i.e., encoding and decoding processes) may hamper its adoption and deployment. One particular challenge in coding point clouds, which does not arise with other data compression such as audio or video, is coding the geometry of the point cloud. Point clouds tend to be sparsely populated, which makes efficiently coding the locations of the points more challenging.

One of the more common mechanisms for coding point cloud data is a tree-based structure. In a tree-based structure, a bounded three-dimensional volume containing the point cloud is recursively divided into sub-volumes, and the nodes of the tree correspond to sub-volumes. Whether to further split a sub-volume may be determined based on the resolution of the tree and/or whether the sub-volume contains any points. A leaf node may have an occupancy flag indicating whether its associated sub-volume contains a point. A split flag may indicate whether a node has child nodes (i.e., whether the current volume has been further split into sub-volumes). These flags may be entropy coded and, in some cases, predictively coded.

A commonly used tree structure is the octree. In this structure, the volumes/sub-volumes are all cubes, and each split of a sub-volume results in eight further sub-volumes/sub-cubes. Another commonly used tree structure is the KD-tree, in which a volume (cube or cuboid) is recursively divided into two parts by a plane orthogonal to one of the axes. An octree is a special case of a KD-tree in which a volume is divided by three planes, each orthogonal to one of the three axes. Both examples involve cubes or cuboids; however, the present application is not limited to such tree structures, and in some applications the volumes and sub-volumes may have other shapes. The volume need not be divided into two sub-volumes (KD-tree) or eight sub-volumes (octree); other partitionings may be used, including division into non-rectangular shapes or into non-adjacent sub-volumes.

For ease of explanation, and because octrees are popular candidate tree structures for automotive applications, this application may refer to octrees, but it should be understood that the methods and apparatus described herein may be implemented using other tree structures.

Referring now to fig. 1, a simplified block diagram of a point cloud encoder 10 is shown, according to an aspect of the present application. Point cloud encoder 10 includes a tree building module 12 for receiving point cloud data and generating a tree (in this example, an octree) that represents the geometry of the volumetric space containing the point cloud and indicates the locations or positions of points from the point cloud in the geometry.

A basic process for creating an octree to code a point cloud may include:

1. Start from a bounding volume (cube) containing the point cloud in a coordinate system.

2. Split the volume into 8 sub-volumes (eight sub-cubes).

3. For each sub-volume, mark the sub-volume with 0 if it is empty, or with 1 if it contains at least one point.

4. Repeat step 2 for all sub-volumes marked 1, splitting them until a maximum split depth is reached.

5. For all leaf volumes (sub-cubes) at the maximum depth, mark the leaf cube with 1 if it is not empty, and with 0 otherwise.
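The five steps above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the text: point coordinates are non-negative integers smaller than 2**max_depth, and the hypothetical `child_index` helper uses a scan order in which the x, y and z bits become bits 0, 1 and 2 of the child index.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int, int]


def child_index(p: Point, bit: int) -> int:
    """Index (0..7) of the sub-cube holding p at the level that uses coordinate bit `bit`.

    Assumed scan order: x -> index bit 0, y -> index bit 1, z -> index bit 2.
    """
    x, y, z = p
    return ((x >> bit) & 1) | (((y >> bit) & 1) << 1) | (((z >> bit) & 1) << 2)


def build_octree(points: List[Point], max_depth: int) -> List[List[List[int]]]:
    """Occupancy == split octree: per depth level, the eight 0/1 flags of every split node.

    Points are integer coordinates in [0, 2**max_depth).
    """
    levels: List[List[List[int]]] = []
    nodes: List[List[Point]] = [list(points)]       # step 1: the bounding cube
    for depth in range(max_depth):
        bit = max_depth - 1 - depth                 # most significant bit first
        level_flags: List[List[int]] = []
        next_nodes: List[List[Point]] = []
        for node_points in nodes:
            children: Dict[int, List[Point]] = {}
            for p in node_points:                   # step 2: split into 8 sub-cubes
                children.setdefault(child_index(p, bit), []).append(p)
            # steps 3 and 5: 1 if the sub-cube holds at least one point, else 0
            level_flags.append([1 if i in children else 0 for i in range(8)])
            # step 4: only sub-cubes flagged 1 are split at the next depth
            next_nodes.extend(children[i] for i in range(8) if i in children)
        levels.append(level_flags)
        nodes = next_nodes
    return levels
```

For two points at opposite corners of a 4 × 4 × 4 cube, `build_octree([(0, 0, 0), (3, 3, 3)], 2)` returns one root node with the flags of children 0 and 7 set, followed by two leaf-level nodes, each with a single flag set.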

The above process may be described as an "occupancy equals split" process, in which splitting implies occupancy, subject to a maximum depth or resolution beyond which no further splitting occurs. In this case, a single flag indicates whether a node is split, and hence whether it is occupied by at least one point, and vice versa. At the maximum depth, where no further splitting is possible, the flag indicates occupancy.

In some embodiments, splitting and occupancy are independent, such that an occupied node may or may not be split. There are two variations of this embodiment:

1. Occupancy after splitting. A split flag indicates whether the node is split. If it is split, the node must contain a point; i.e., splitting implies occupancy. Otherwise, if the node is not split, a further occupancy flag indicates whether the node contains at least one point. Thus, when a node is not split further (i.e., it is a leaf node), the leaf node must have an associated occupancy flag indicating whether it contains any points.

2. Splitting after occupancy. An occupancy flag indicates whether the node is occupied. If it is not occupied, no splitting occurs. If it is occupied, a split flag is coded to indicate whether the node is split further.
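The two variants differ only in which flags are sent for a given node. A minimal sketch, with hypothetical function names, of the flags each variant emits for a single node:

```python
from typing import List


def flags_split_then_occupancy(occupied: bool, split: bool) -> List[int]:
    """Variant 1: split flag first; an occupancy flag only for nodes that are not split."""
    if split:
        if not occupied:
            raise ValueError("a split node must contain at least one point")
        return [1]                      # splitting implies occupancy
    return [0, int(occupied)]           # leaf: split flag 0, then occupancy flag


def flags_occupancy_then_split(occupied: bool, split: bool) -> List[int]:
    """Variant 2: occupancy flag first; a split flag only for occupied nodes."""
    if not occupied:
        if split:
            raise ValueError("an unoccupied node is never split")
        return [0]                      # no split flag for an empty node
    return [1, int(split)]
```

For example, an empty leaf costs two flags ([0, 0]) under variant 1 but one flag ([0]) under variant 2, while a split node costs one flag under variant 1 and two under variant 2.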

Regardless of which of the above processes is used to build the tree, the tree may be traversed in a predefined order (breadth-first or depth-first, and according to a scan pattern/order within each split sub-volume) to generate a bit sequence from the flags (occupancy and/or split flags). This may be referred to as serialization or binarization of the tree. As shown in fig. 1, in this example, the point cloud encoder 10 includes a binarizer 14 for binarizing the octree to produce a bitstream of binarized data representing the tree.

The bit sequence may then be encoded using an entropy encoder 16 to produce a compressed bit stream. The entropy encoder 16 may encode the sequence of bits using a context model 18 that specifies probabilities for encoding and decoding the bits based on a context determination by the entropy encoder 16. The context model 18 may be adaptively updated after each bit or defined set of bits is coded. In some cases, the entropy encoder 16 may be a binary arithmetic encoder. In some embodiments, the binary arithmetic encoder may employ Context Adaptive Binary Arithmetic Coding (CABAC). In some embodiments, codecs other than arithmetic codecs may be used.

In some cases, the entropy encoder 16 may not be a binary coder but may instead operate on non-binary data. The octree data output by the tree building module 12 need not be evaluated in binary form and may instead be encoded as non-binary data. For example, in the case of an octree, the eight flags (e.g., occupancy flags) within a sub-volume, taken in scan order, may be treated as an 8-bit number, i.e., an integer with a value between 1 and 2⁸ − 1 = 255 (a value of 0 is not possible for a split sub-volume, since a completely unoccupied sub-volume would not be split). In some implementations, this number may be encoded by an entropy encoder using a multi-symbol arithmetic coder. The sequence of flags within a sub-volume (e.g., a cube) that defines the integer may be referred to as a "pattern".

As with video or image codecs, point cloud codecs may include predictive operations in which an effort is made to predict the pattern of a sub-volume. The prediction may be spatial (dependent on previously coded sub-volumes in the same point cloud) or temporal (dependent on previously coded point clouds in a time-ordered sequence of point clouds).

A block diagram of an example point cloud decoder 50 corresponding to the encoder 10 is shown in fig. 2. The point cloud decoder 50 includes an entropy decoder 52 that uses a context model 54 that is the same as the one used by the encoder 10. The entropy decoder 52 receives an input bitstream of compressed data and entropy decodes the data to produce an output sequence of decompressed bits. The sequence is then converted into reconstructed point cloud data by a tree reconstructor 56, which rebuilds the tree structure from the decompressed data and knowledge of the scan order in which the tree data was binarized. The tree reconstructor 56 is thus able to reconstruct the locations of the points of the point cloud (subject to the resolution of the tree coding).

An example partial sub-volume 100 is shown in fig. 3. In this example, the sub-volume 100 is shown in two dimensions for ease of illustration, and its size is 16 × 16. The sub-volume has been divided into four 8 × 8 sub-squares; two of these have been further subdivided into 4 × 4 sub-squares, three of the 4 × 4 sub-squares have been further divided into 2 × 2 sub-squares, and one of the 2 × 2 sub-squares has been divided into 1 × 1 squares. The 1 × 1 squares are at the maximum depth of the tree and represent the highest resolution for point locations. Points of the point cloud are shown as dots in the figure.

The structure of the tree 102 is shown to the right of the sub-volume 100. To the right of the tree 102 are shown a sequence of split flags 104 and a corresponding sequence of occupancy flags 106, obtained in a predefined breadth-first scan order. In this illustrative example, there is an occupancy flag for each sub-volume (node) that is not split (i.e., whose associated split flag is set to zero). These sequences may be entropy encoded.

Another example employing the occupancy ≡ split condition is shown in fig. 4, which illustrates the recursive splitting and coding of an octree 150. Only a portion of octree 150 is shown. A FIFO 152 is shown processing nodes for splitting, to illustrate the breadth-first nature of the process. The FIFO 152 outputs an occupied node 154 that was queued in the FIFO 152 for further splitting after its parent node 156 was processed. The tree builder splits the sub-volume associated with the occupied node 154 into eight sub-volumes (cubes) and determines their occupancy. Occupancy may be indicated by an occupancy flag for each sub-volume. Taken in the prescribed scan order, the flags may be referred to as the occupancy pattern of node 154. The pattern may be specified by an integer representing the sequence of occupancy flags associated with the sub-volumes in the predefined scan order. In the case of an octree, the pattern is an integer in the range [1, 255].

The entropy encoder then encodes the pattern using a non-binary arithmetic encoder based on probabilities specified by the context model. In this example, the probabilities may be based on a pattern distribution that starts from an initial distribution model and is adaptively updated. In one embodiment, the pattern distribution is in effect a set of counters recording the number of times each pattern (integer from 1 to 255) has been encountered during coding. The pattern distribution may be updated after each sub-volume is coded. Since the probability estimates depend on the relative frequencies of the patterns rather than on the absolute counts, the pattern distribution can be normalized as needed.

Based on the pattern, the child nodes that are occupied (e.g., that have flag = 1) are then pushed into the FIFO 152 to be split in turn (provided they are not at the maximum depth of the tree).

Referring now to fig. 5, an example cube 180 from an octree is shown. The cube 180 is subdivided into eight sub-cubes. The scan order used to read the flags produces an eight-bit string that can be read in binary as an integer in [1, 255]. Based on the scan order and the resulting bit position of each sub-cube's flag in the string, the sub-cubes have the values shown in fig. 5. The scan order may be any sequence of the sub-cubes, provided that the encoder and decoder both use the same scan order.

By way of example, fig. 6 shows the cube 180 in which the four "front" sub-cubes are occupied. Since the occupied sub-cubes have the values 1, 4, 16, and 64, this corresponds to pattern 1 + 4 + 16 + 64 = 85. The integer pattern number specifies the occupancy pattern of the sub-cubes.
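The mapping between flags and pattern number can be sketched as below, assuming (as a labeling convention for this sketch only) that `flags[i]` is the flag of the sub-cube whose value in fig. 5 is 2**i:

```python
from typing import List


def pattern_number(flags: List[int]) -> int:
    """Pattern number from the 8 occupancy flags; flags[i] belongs to the sub-cube of value 2**i."""
    if len(flags) != 8:
        raise ValueError("an octree node has exactly eight children")
    return sum(bit << i for i, bit in enumerate(flags))


def flags_from_pattern(pattern: int) -> List[int]:
    """Inverse mapping used by a decoder: bit i of the pattern is the flag of sub-cube i."""
    return [(pattern >> i) & 1 for i in range(8)]


front = [1, 0, 1, 0, 1, 0, 1, 0]   # sub-cubes with values 1, 4, 16, 64 occupied
print(pattern_number(front))       # 85
```

The complementary "back" half, with values 2, 8, 32, and 128, gives pattern 170.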

Because trees tend to factor out the higher-order bits of the point coordinates, an octree representation, or more generally any tree representation, is efficient at representing points with spatial correlation. For an octree, each depth level refines the coordinates of the points within a sub-volume by one bit per component, at a cost of eight bits (one pattern) per refinement. Further compression is obtained by entropy coding the split information (i.e., the patterns) associated with each tree node. This further compression is possible because the pattern distribution is not uniform, the non-uniformity being another consequence of the correlation.
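To illustrate the one-bit-per-level refinement, the following sketch maps a point to the child index chosen at each depth and back. The helper names and the x/y/z-to-bit scan order are hypothetical. Points whose high-order coordinate bits agree share a path prefix, i.e., ancestors in the tree, which is the factoring referred to above.

```python
from typing import List, Tuple


def path_of_point(x: int, y: int, z: int, depth: int) -> List[int]:
    """Child index chosen at each depth; level d consumes bit (depth-1-d) of every coordinate."""
    path = []
    for level in range(depth):
        b = depth - 1 - level
        path.append(((x >> b) & 1) | (((y >> b) & 1) << 1) | (((z >> b) & 1) << 2))
    return path


def point_from_path(path: List[int]) -> Tuple[int, int, int]:
    """Rebuild coordinates: each level appends one bit to x, y and z."""
    x = y = z = 0
    for idx in path:
        x = (x << 1) | (idx & 1)
        y = (y << 1) | ((idx >> 1) & 1)
        z = (z << 1) | ((idx >> 2) & 1)
    return x, y, z
```

For example, (4, 0, 0) and (5, 1, 1) at depth 3 differ only in their lowest bits, so their paths share the first two child indices.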

One potential inefficiency in current systems is that the pattern distribution (e.g., the histogram of pattern numbers seen in previously coded nodes of the tree) is built up during the coding of the point cloud. In some cases, the pattern distribution may be initialized as equiprobable, or to some other predetermined distribution; but using a single pattern distribution means that the context model does not take into account or exploit local geometric dependencies.

In European patent application No. 18305037.6, the applicant describes methods and devices for selecting among available pattern distributions for coding the occupancy pattern of a particular node, based on occupancy information from previously coded nodes in the vicinity of that node. In one example embodiment, the occupancy information is obtained from the occupancy pattern of the parent of the particular node. In another example embodiment, the occupancy information is obtained from one or more nodes neighboring the particular node. The content of European patent application No. 18305037.6 is incorporated herein by reference.

Referring now to fig. 7, an example method 200 of encoding a point cloud is shown in flowchart form. In this example, the method 200 involves recursive splitting of occupied nodes (sub-volumes) and a breadth-first traversal of the tree for coding.

In operation 202, the encoder determines an occupancy pattern of the current node. The current node is an occupied node that has been split into eight child nodes, each corresponding to a respective subcube. The occupancy pattern of the current node specifies the occupancy of eight child nodes in scan order. As described above, an integer between 1 and 255 (e.g., an eight-bit binary string) may be used to indicate the occupancy pattern.

In operation 204, the encoder selects a probability distribution from a set of probability distributions. The selection is based on occupancy information from nearby previously coded nodes (i.e., at least one node that is a neighbor of the current node). In some embodiments, two nodes are neighbors if they are associated with respective sub-volumes that share at least one face. Under a broader definition, nodes are neighbors if they share at least one edge. Under a still broader definition, two nodes are neighbors if they share at least one vertex. The parent pattern, i.e., the occupancy pattern of the node of which the current node is a child, provides occupancy data for the current node and its seven sibling nodes. In some implementations, the occupancy information is the parent pattern. In some implementations, the occupancy information is occupancy data for a set of neighbor nodes that includes nodes at the same tree depth as the current node but with different parent nodes. In some cases, combinations of these are possible; for example, the set of neighbor nodes may include some sibling nodes and some non-sibling nodes.
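The three neighbor definitions can be compared by enumerating offsets between equal-size cubes: a neighbor shares a face if it differs in exactly one coordinate, an edge if it differs in at most two, and a vertex if it differs in at most three. A small sketch (the function name is hypothetical):

```python
from itertools import product
from typing import List, Tuple


def neighbor_offsets(kind: str) -> List[Tuple[int, int, int]]:
    """Offsets (dx, dy, dz) of equal-size cubes sharing at least a face, an edge or a vertex."""
    # A face neighbor differs in exactly one coordinate; edge and vertex
    # neighbors may differ in up to two and three coordinates respectively.
    max_nonzero = {"face": 1, "edge": 2, "vertex": 3}[kind]
    return [d for d in product((-1, 0, 1), repeat=3)
            if 0 < sum(c != 0 for c in d) <= max_nonzero]


print([len(neighbor_offsets(k)) for k in ("face", "edge", "vertex")])  # [6, 18, 26]
```

Each broader set contains the narrower one, since sharing a face implies sharing an edge and a vertex; the extra 12 edge-sharing and 8 vertex-sharing cubes are the diagonally adjacent neighbors.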

As indicated by operation 206, once the probability distribution has been selected, the encoder entropy encodes the occupancy pattern of the current node using the selected probability distribution. The encoder then updates the selected probability distribution based on the occupancy pattern in operation 208; for example, the encoder may increment a count corresponding to the occupancy pattern. In operation 210, the encoder evaluates whether there are further nodes to be coded and, if so, returns to operation 202 to code the next node.
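Operations 202 to 210 can be sketched as a loop over nodes in coding order. In this hypothetical sketch the context is assumed to be the parent pattern, each context has its own adaptive count-based distribution over patterns 1 to 255, and the ideal code length -log2(p) stands in for the output of a real arithmetic coder:

```python
import math
from collections import defaultdict
from typing import Iterable, Tuple


def encode_patterns(nodes: Iterable[Tuple[int, int]]) -> float:
    """Ideal cost in bits of coding (context, pattern) pairs, in coding order.

    context: occupancy information from previously coded nodes (assumed here to be
    the parent pattern). pattern: occupancy pattern of the current node, 1..255.
    """
    # One adaptive distribution per context: counts indexed by pattern (index 0 unused).
    distributions = defaultdict(lambda: [0] + [1] * 255)
    total_bits = 0.0
    for context, pattern in nodes:                    # 202: pattern of the current node
        counts = distributions[context]               # 204: select a distribution
        total_bits += -math.log2(counts[pattern] / sum(counts))  # 206: ideal coding cost
        counts[pattern] += 1                          # 208: update the selected distribution
    return total_bits                                 # 210: stop when no nodes remain
```

Because the selected context depends only on previously coded data, a decoder running the same loop selects, and updates, the same distributions.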

The probability distribution selection in operation 204 is based on occupancy data of nearby previously coded nodes. This allows the encoder and decoder to make the same selection independently. For the following discussion of probability distribution selection, reference is made to fig. 8, which illustrates a partial octree 300 including a current node 302. The current node 302 is an occupied node being evaluated for coding. The current node 302 is one of eight children of a parent node 306, which in turn is a child of a grandparent node (not shown). The current node 302 is divided into eight child nodes 304, and its occupancy pattern is based on the occupancy of the child nodes 304. For example, as illustrated, with black dots denoting occupied nodes, the occupancy pattern may be 00110010, i.e., pattern 50.

The current node 302 has a sibling node 308 with the same parent node 306. The parent pattern is the occupancy pattern of the parent node 306, which as illustrated would be 00110000, i.e., pattern 48. The parent pattern may serve as a basis for selecting an appropriate probability distribution for entropy encoding the occupancy pattern of the current node.

Fig. 9 illustrates a set of neighbors around a current node, where neighbors are defined as nodes that share a face. In this example, the nodes/sub-volumes are cubes, and the cube at the center of the image has six neighbors, one for each face. In an octree, the neighbors of the current node will include three sibling nodes and three nodes that do not have the same parent node. Thus, occupancy data for some of the neighbors will be available because they are siblings, but occupancy data for the other neighbors may or may not be available, depending on whether those nodes have already been coded. Special treatment may be applied to handle missing neighbors; in some embodiments, a missing neighbor may be assumed to be occupied, or may be assumed to be unoccupied. The neighbor definition may be extended to include nodes that share an edge or a vertex, so that additional neighboring sub-volumes are included in the evaluation.
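Which face neighbors are siblings can be checked with integer node coordinates at a single depth, where the parent of node (x, y, z) is (x >> 1, y >> 1, z >> 1). This coordinate convention is an assumption of the sketch below (hypothetical names), which splits the six face neighbors accordingly; for any node, exactly one neighbor per axis shares its parent:

```python
from typing import List, Tuple

Node = Tuple[int, int, int]
FACE_OFFSETS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def face_neighbors_split(x: int, y: int, z: int) -> Tuple[List[Node], List[Node]]:
    """Split the six face neighbors of node (x, y, z) into siblings and non-siblings."""
    parent = (x >> 1, y >> 1, z >> 1)
    siblings: List[Node] = []
    others: List[Node] = []
    for dx, dy, dz in FACE_OFFSETS:
        n = (x + dx, y + dy, z + dz)
        same_parent = (n[0] >> 1, n[1] >> 1, n[2] >> 1) == parent
        (siblings if same_parent else others).append(n)
    return siblings, others
```

For node (0, 0, 0), the neighbor (1, 0, 0) is a sibling while (-1, 0, 0) belongs to another parent (and here lies outside the bounding volume, a case the "missing neighbor" handling covers).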

The foregoing process looks at the occupancy of nearby nodes in an attempt to determine the likelihood that the current node 302 is occupied, so as to select more appropriate context(s) and use more accurate probabilities for entropy coding the occupancy data of the current node 302. The occupancy states of neighboring nodes that share a face with the current node 302 may give a more accurate assessment of whether the current node 302 is likely to be isolated than the occupancy states of its sibling nodes, since in an octree, of the seven siblings, three share a face, three share only an edge, and one shares only a vertex with the current node. However, evaluating the occupancy states of siblings has the advantage of modularity: all the data relevant to the evaluation is part of the parent node, which means a small memory footprint in implementation, whereas evaluating neighbor occupancy states involves buffering tree occupancy data until it is no longer needed for determining neighbor occupancy states when coding future nearby nodes.

The occupancy of the neighbors may be read in a scan order that effectively assigns a value to each neighbor, much as described above for the occupancy pattern. As illustrated, the neighbors effectively take the values 1, 2, 4, 8, 16, and 32, so there are 64 (0 to 63) possible neighbor occupancy configurations. This value may be referred to herein as the "neighbor configuration". As an example, fig. 10 illustrates neighbor configuration 15, in which neighbors 1, 2, 4, and 8 are occupied while neighbors 16 and 32 are empty.
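Computing the neighbor configuration amounts to OR-ing one bit per occupied neighbor. The assignment of the values 1 to 32 to particular neighbors below is hypothetical (the text leaves it to fig. 9), and neighbors not yet known to be occupied are treated as unoccupied, one of the options mentioned for missing neighbors:

```python
from typing import Set, Tuple

# Hypothetical assignment of the values 1, 2, 4, 8, 16, 32 to the six face neighbors.
NEIGHBOR_OFFSETS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def neighbor_configuration(occupied: Set[Tuple[int, int, int]], x: int, y: int, z: int) -> int:
    """NC in [0, 63]: neighbor i contributes 2**i if its node is known to be occupied."""
    nc = 0
    for i, (dx, dy, dz) in enumerate(NEIGHBOR_OFFSETS):
        if (x + dx, y + dy, z + dz) in occupied:
            nc |= 1 << i
    return nc


occupied = {(0, 1, 1), (2, 1, 1), (1, 0, 1), (1, 2, 1)}
print(neighbor_configuration(occupied, 1, 1, 1))  # 15
```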

In some cases, both of the above criteria (parent pattern and neighbor configuration) may be applied together, or a choice may be made between them. For example, if the neighbors are available, the probability distribution selection may be based on the neighboring nodes; however, if one or more of the neighbors are unavailable because they belong to nodes that have not yet been coded, the probability distribution selection may fall back to a sibling-based analysis (the parent pattern).

In yet another embodiment, the probability distribution selection may alternatively or additionally be based on the grandparent pattern. In other words, the selection may be based on the occupancy states of the parent node 306 and its sibling nodes.

In yet another embodiment, additional or alternative evaluations may be taken into account in the selection of the probability distribution. For example, the probability distribution selection may look at the occupancy states of the neighbor nodes of the parent node or the neighbor nodes of the grandparent node.

Any two or more of the above criteria for evaluating local occupancy states may be used in combination in some implementations.

In the case of a non-binary entropy coder, the occupancy data of the current node may be coded using the selected probability distribution. The probability distribution contains one probability for each possible occupancy pattern of the current node. For example, when coding the occupancy pattern of an octree, there are 2⁸ − 1 = 255 possible patterns, which means that each probability distribution comprises 255 probabilities. In some embodiments, the number of probability distributions may equal the number of possible outcomes of the selection criterion, i.e., of the neighbor, sibling, and/or parent occupancy data used. For example, when the parent pattern of an octree is used as the selection criterion, there will be 255 probability distributions, each containing 255 probabilities. In the case of the neighbor configuration, with neighbors defined as sharing a face, there will be 64 probability distributions, each containing 255 probabilities.

Too many distributions may result in slow adaptation due to insufficient data (i.e., context dilution). Thus, in some embodiments, similar patterns may be grouped so that they use the same probability distribution. For example, separate distributions may be used for patterns corresponding to full occupancy, vertical orientation, horizontal orientation, mostly empty, and all other cases. This may reduce the number of probability distributions to about five. Different groupings of patterns may be formed, resulting in different numbers of probability distributions.

Referring now to fig. 11, one illustrative embodiment of a process 400 for point cloud entropy encoding using parent-pattern-dependent contexts is illustrated. In this example, the current node 402 has been split into eight child nodes, and the occupancy pattern 404 of the current node will be encoded using a non-binary entropy encoder 406. The non-binary entropy encoder 406 uses a probability distribution selected from six possible probability distributions 408. The selection is based on the parent pattern, i.e., on occupancy information from the parent node of the current node 402. The parent pattern is identified by an integer between 1 and 255.

The selection of the probability distribution may be made by a decision tree that evaluates whether the pattern corresponds to a full node (e.g., pattern 255), a horizontal structure (e.g., pattern 170 or 85, assuming the Z-axis is vertical), a vertical structure (e.g., pattern 3, 12, 48, or 192), a sparsely populated distribution (e.g., pattern 1, 2, 4, 8, 16, 32, 64, or 128; i.e., none of the sibling nodes is occupied), a semi-sparsely populated distribution (the total number of occupied nodes among the current node and its siblings is ≤ 3), or any other case. The patterns indicated for the different categories are merely examples. For example, the "horizontal" category may include patterns involving two or three occupied cubes at the same horizontal level, and the "vertical" category may include patterns involving three or four occupied cubes arranged in a wall-like manner. Finer gradations may also be used; for example, the "horizontal" category may be further subdivided into horizontal structures in the upper portion of the cube and in the lower portion of the cube, with a different probability distribution for each. Other groupings of correlated occupancy patterns may be formed and assigned to corresponding probability distributions. Invariance of pattern groupings in the context of neighbor configurations is discussed further below.
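The decision tree can be sketched as a chain of tests returning one of six distribution indices. The example pattern sets come from the text; the order in which they are tested, and the function name, are assumptions of this sketch (for instance, pattern 3 is classified as vertical before the semi-sparse test is reached):

```python
def parent_pattern_class(pattern: int) -> int:
    """Map a parent pattern (1..255) to one of six probability-distribution indices."""
    if not 1 <= pattern <= 255:
        raise ValueError("octree patterns lie in [1, 255]")
    occupied = bin(pattern).count("1")
    if pattern == 255:
        return 0                      # full node
    if pattern in (85, 170):
        return 1                      # horizontal structure (Z-axis vertical)
    if pattern in (3, 12, 48, 192):
        return 2                      # vertical structure
    if occupied == 1:
        return 3                      # sparse: no sibling occupied
    if occupied <= 3:
        return 4                      # semi-sparse: at most three occupied
    return 5                          # all other cases
```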

Fig. 12 shows an illustrative embodiment of a process 500 for point cloud entropy encoding using contexts that depend on the neighbor configuration. This example assumes the definitions of neighbors and neighbor configuration numbers used above in connection with fig. 9. It also assumes that each neighbor configuration has a dedicated probability distribution, i.e., there are 64 different probability distributions. The current node 502 has an occupancy pattern 504 to be encoded. The probability distribution is selected based on the nodes neighboring the current node 502; that is, the neighbor configuration NC ∈ [0, 63] is determined and used to select the associated probability distribution.

In some embodiments, neighbor configurations may be grouped such that, based on similarities in their patterns, more than one neighbor configuration uses the same probability distribution. In some embodiments, the process may use a different arrangement of neighbors for the contextual selection of the distribution. Additional neighbors may be added, such as the eight neighbors that are diagonally adjacent along all three axes, or the twelve neighbors that are diagonally adjacent along two axes. Embodiments that avoid particular neighbors may also be used, for example to avoid neighbors that introduce additional dependencies in depth-first scans, or to introduce dependencies only along particular axes so as to reduce the coding state for large trees.

In this example, the case NC = 0 is handled in a particular way. If there are no occupied neighbors, this may indicate that the current node 502 is isolated. Process 500 therefore further checks the number of occupied child nodes of the current node 502. If only one child node is occupied (i.e., the number occupied (NO) is equal to 1), a flag is encoded indicating that a single child node is occupied, and the index of that child node is coded using 3 bits. If more than one child node is occupied, process 500 uses the NC = 0 probability distribution to code the occupancy pattern.
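The special handling of NC = 0 can be sketched as a function returning the symbols to be coded. The symbol names are hypothetical, and sending the flag with value 0 before the full pattern when more than one child is occupied is an assumption; the text only states that the NC = 0 distribution is used in that case:

```python
from typing import List, Tuple


def code_isolated_node(pattern: int) -> List[Tuple[str, int]]:
    """Symbols emitted for a node whose neighbor configuration NC is 0 (hypothetical names)."""
    if not 1 <= pattern <= 255:
        raise ValueError("octree patterns lie in [1, 255]")
    occupied = [i for i in range(8) if (pattern >> i) & 1]
    if len(occupied) == 1:
        # NO == 1: flag = 1, then the child index as 3 bits, most significant first.
        index = occupied[0]
        return [("single_child_flag", 1)] + [("index_bit", (index >> b) & 1) for b in (2, 1, 0)]
    # NO > 1 (assumed signaling): flag = 0, then the pattern coded with the NC = 0 distribution.
    return [("single_child_flag", 0), ("pattern_nc0", pattern)]
```

For example, pattern 32 (only child 5 occupied) yields the flag followed by the bits 1, 0, 1.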

Referring now to fig. 13, an example method 600 for decoding a bitstream of encoded point cloud data is shown in flow chart form.

In operation 602, the decoder selects one of the probability distributions based on occupancy information from one or more nodes near the current node. As described above, the occupancy information may be the parent pattern of the current node (i.e., the occupancy of the current node and its siblings), or it may be the occupancy of neighboring nodes of the current node, which may include some of the sibling nodes. Other or additional occupancy information may be used in some embodiments.

Once the probability distribution has been selected, the decoder entropy decodes a portion of the bitstream using the selected probability distribution to reconstruct the occupancy pattern of the current node in operation 604. The occupancy pattern is used by the decoder to reconstruct the tree in order to reconstruct the encoded point cloud data. Once the point cloud data is decoded, it may be output from the decoder for use, such as for rendering views, segmentation/classification, or other applications.

In operation 606, the decoder updates the probability distribution based on the reconstructed occupancy pattern, and then if there are other nodes to decode, it moves to the next node in the buffer and returns to operation 602.

Example embodiments of the above-described methods have been shown to improve compression with a negligible increase in coding complexity. Although neighbor-based selection has higher computational complexity and greater memory usage, it exhibits better compression performance than parent-pattern-based selection. In some tests, the relative improvement in bits per point over the MPEG point cloud test model was between 4% and 20%. It has also been noted that initializing the probability distributions from distributions derived using test data improves performance compared with initializing them to a uniform distribution.

Some of the above examples are based on a tree coding process that uses a non-binary coder to code the occupancy pattern. New developments using binary entropy coders are presented further below.

In one variation of neighbor-based probability distribution selection, the number of distributions may be reduced by exploiting the symmetry of the neighborhood. Structurally similar configurations related by a symmetry can reuse the same distribution by permuting the neighborhood or permuting the pattern distribution. In other words, neighbor configurations that can use the same pattern distribution may be grouped into classes. A class containing more than one neighbor configuration may be referred to herein as a "neighbor configuration", because one of these neighbor configurations effectively subsumes the others by reflection or permutation of those other configurations.

As an example, consider the eight corner patterns NC ∈ {21, 22, 25, 26, 37, 38, 41, 42}, each of which is a symmetric variant of a corner neighbor pattern. These values of NC may correlate well with particular, but different, patterns of the current node, and those correlated patterns may follow the same symmetry as the neighbor patterns. As an example, a single distribution may be reused to represent several cases of NC by permuting the probabilities of the distribution.

The encoder derives a pattern number for the node based on the occupancy of its child nodes. The encoder selects a distribution and a permutation function according to the neighbor configuration. The encoder reorders the probabilities contained in the distribution according to the permutation function and then arithmetically encodes the pattern number using the permuted distribution. Updates to the probabilities of the permuted distribution by the arithmetic coder are mapped back to the original distribution by the inverse permutation function.

The corresponding decoder first selects the same distribution and permutation function according to the neighbor configuration. The permuted distribution is generated in the same manner as in the encoder and is used by the arithmetic decoder to entropy decode the pattern number. The bits of the pattern number are then assigned to the corresponding child nodes.

The same permutation may also be implemented without reordering the data of the distribution itself, by introducing a level of indirection and using the permutation function to permute the look-up of a given index in the distribution.
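Explicit reordering and the indirection alternative can be contrasted in a short sketch. The permutation `sigma` of child positions is hypothetical (a reflection that swaps neighboring bit positions), since the text does not give the permutation relating particular neighbor configurations; both approaches read, and update, the same entry of the shared distribution:

```python
from typing import List, Sequence


def permute_pattern(pattern: int, sigma: Sequence[int]) -> int:
    """Pattern index after permuting child positions: bit i takes child sigma[i]'s flag."""
    return sum(((pattern >> sigma[i]) & 1) << i for i in range(8))


def permuted_distribution(shared: List[int], sigma: Sequence[int]) -> List[int]:
    """Explicit reordering: entry p of the result is the shared count at permute_pattern(p)."""
    return [shared[permute_pattern(p, sigma)] for p in range(256)]


def code_through_indirection(shared: List[int], sigma: Sequence[int], pattern: int) -> float:
    """No reordering: look up, then update, the shared count through the permutation."""
    idx = permute_pattern(pattern, sigma)
    probability = shared[idx] / sum(shared)
    shared[idx] += 1          # the update lands directly in the shared distribution
    return probability
```

With explicit reordering, an update to entry p of the permuted copy must be written back to index `permute_pattern(p, sigma)` of the shared distribution; the indirection version avoids the copy and the write-back.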

Alternative embodiments consider permutations of the patterns themselves rather than of the distributions, allowing reordering before entropy encoding or after entropy decoding, respectively. This approach may be better suited to efficient implementation through bitwise reordering operations. In this case, neither the encoder nor the decoder reorders the distribution; instead, the computation of the pattern number is modified to

$$P = \sum_{i=0}^{7} 2^{i}\, c_{\sigma(i)},$$

where $c_i$ is the occupancy state of the $i$-th child and $\sigma$ is the permutation function. One such example permutation function allows the distribution for NC = 22 to be used for NC = 41. The decoder may use the permutation function to derive the occupancy states of the child nodes from the decoded pattern number, via $c_{\sigma(i)} = \lfloor P / 2^{i} \rfloor \bmod 2$.
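The modified pattern numbering and its inverse can be sketched directly from the formula. The permutation `sigma` used in testing is hypothetical; with the identity permutation, the ordinary pattern number is obtained:

```python
from typing import List, Sequence


def encode_pattern_number(c: Sequence[int], sigma: Sequence[int]) -> int:
    """P = sum_i 2**i * c[sigma(i)]: permute the child flags while forming the number."""
    return sum(c[sigma[i]] << i for i in range(8))


def decode_child_flags(p: int, sigma: Sequence[int]) -> List[int]:
    """Decoder side: bit i of P is the occupancy state of child sigma(i)."""
    c = [0] * 8
    for i in range(8):
        c[sigma[i]] = (p >> i) & 1
    return c
```

Only the bit-gathering step changes, so the same adaptive distribution can be indexed directly by P for every neighbor configuration that shares it.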

The required permutation may be derived from the rotational symmetry of the neighbor configuration or from reflections along a particular axis. Moreover, the permutation need not permute all positions according to, for example, a symmetry; partial permutations may be used instead. For example, when using the distribution of NC 22 for NC 41, the positions on the axis of symmetry may be left unchanged, resulting in a mapping in which positions 0, 2, 4 and 6 are not permuted. In other embodiments, only the pair of positions 1 and 7 is transposed.
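The permuted pattern numbering $N = \sum_i 2^{\sigma(i)} c_i$ and its inversion at the decoder can be sketched as below. The permutation `SIGMA` is a hypothetical partial permutation chosen only to illustrate the shape described above (positions 0, 2, 4 and 6 fixed, the pairs 1/7 and 3/5 swapped); it is not the mapping used for NC 22 and NC 41, which is not reproduced in the text. The function names are also hypothetical.

```python
from typing import List

# Hypothetical partial permutation for illustration only: positions 0, 2, 4 and 6
# stay in place, while the pairs (1, 7) and (3, 5) are swapped.
SIGMA = [0, 7, 2, 5, 4, 3, 6, 1]


def pattern_number(occupancy: List[int], sigma: List[int]) -> int:
    """Encoder side: N = sum_i 2**sigma(i) * c_i, with each c_i in {0, 1}."""
    return sum(c << sigma[i] for i, c in enumerate(occupancy))


def occupancy_from_pattern(n: int, sigma: List[int]) -> List[int]:
    """Decoder side: the occupancy of child i is bit sigma(i) of the decoded pattern number."""
    return [(n >> sigma[i]) & 1 for i in range(len(sigma))]
```

With the identity permutation `list(range(8))`, this reduces to the conventional numbering in which child i maps to bit i; because σ is a bijection, the decoder recovers every occupancy pattern exactly.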

An example embodiment based on rotational symmetry and reflection is provided below for the special case of an octree with six neighbors that share a face with the current cube. Without loss of generality, the Z axis is taken to be vertical with respect to the viewing direction of the figure, as shown in fig. 16. Relative positions of neighbors such as "above" (respectively "below") are then to be understood as the increasing (respectively decreasing) Z direction along the Z axis. The same applies to left/right along the X axis and front/back along the Y axis.

Fig. 16 shows three rotations 2102, 2104, and 2106 about the Z, Y, and X axes, respectively. Each rotation is 90 degrees, i.e., a quarter turn about its axis.

Fig. 17 shows the invariance classes of neighbor configurations under one or more applications of rotation 2102 about the Z axis. This invariance reflects the same statistical behavior of the point cloud geometry along any direction in the XY plane. This holds particularly for the use case of a car moving on the Earth's surface, locally approximated by the XY plane. A horizontal configuration is a given occupancy of the four neighbors located to the left, right, front, and back of the current cube, independent of the occupancy of the upper neighbor 2202 and the lower neighbor 2204. Under rotation 2102, the four horizontal configurations 2206, 2208, 2210, and 2212 belong to the same invariance class. Similarly, the two configurations 2214 and 2216 belong to the same invariance class. There are only six invariance classes under rotation 2102 (grouped in class set 2218).

A vertical configuration is a given occupancy of the two neighbors 2202 and 2204, independent of the occupancy of the four neighbors located to the left, right, front, and back of the current cube. As shown in fig. 18, there are four possible vertical configurations. Thus, when invariance under rotation 2102 about the Z axis is assumed, there are 6 × 4 = 24 possible configurations.

The reflection 2108 along the Z axis is shown in fig. 16. The vertical configurations 2302 and 2304 depicted in fig. 18 belong to the same invariance class under reflection 2108. There are three invariance classes under reflection 2108 (grouped in class set 2306). Invariance under reflection 2108 means that, in terms of point cloud geometry statistics, the behavior in the upward and downward directions is substantially the same. This is an accurate assumption for a car moving on a road.

If invariance under both rotation 2102 and reflection 2108 is assumed, there are 6 × 3 = 18 invariance classes, resulting from the product of the two class sets 2218 and 2306. These 18 classes are shown in fig. 19.

When invariance is additionally assumed under the two other rotations 2104 and 2106, the two configurations 2401 and 2402 belong to the same invariance class. Likewise, the two configurations 2411 and 2412, the two configurations 2421 and 2422, the three configurations 2431, 2432 and 2433, the two configurations 2441 and 2442, the two configurations 2451 and 2452, and the last two configurations 2461 and 2462 each belong to a common class. Thus, invariance under the three rotations (2102, 2104, and 2106) and reflection 2108 results in 10 invariance classes, as shown in fig. 20.

According to the example provided above, the number of effective neighbor configurations (i.e., the number of classes into which the 64 neighbor configurations are grouped) is 64, 24, 18, or 10, depending on which invariances under the three rotations and the reflection are assumed.
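The class counts 64, 24, 18 and 10 can be checked by brute force: generate the group of face permutations spanned by the chosen rotations and reflection, then count the orbits of the 64 six-bit neighbor configurations. This is a minimal sketch; the face labeling and the permutation tuples below are assumptions made for the example (derived from quarter turns of the coordinate axes), not labels taken from the figures.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Perm = Tuple[int, ...]

# Assumed face labels for the six face-sharing neighbors:
# 0: +X (right), 1: -X (left), 2: +Y, 3: -Y, 4: +Z (above), 5: -Z (below).
# Each tuple maps face f to the face it is carried onto by the transformation.
ROT_Z = (2, 3, 1, 0, 4, 5)   # quarter turn about Z (rotation 2102)
ROT_Y = (5, 4, 2, 3, 0, 1)   # quarter turn about Y (rotation 2104)
ROT_X = (0, 1, 4, 5, 3, 2)   # quarter turn about X (rotation 2106)
REFL_Z = (0, 1, 2, 3, 5, 4)  # swap above/below (reflection 2108)


def compose(p: Perm, q: Perm) -> Perm:
    """Apply q first, then p."""
    return tuple(p[q[f]] for f in range(6))


def generated_group(gens: Iterable[Perm]) -> Set[Perm]:
    """All face permutations generated by gens (a finite group, so products suffice)."""
    gens = list(gens)
    identity = tuple(range(6))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            hg = compose(h, g)
            if hg not in group:
                group.add(hg)
                frontier.append(hg)
    return group


def transform(perm: Perm, config: Tuple[int, ...]) -> Tuple[int, ...]:
    """Move each neighbor's occupancy bit to the face it is rotated/reflected onto."""
    out = [0] * 6
    for f in range(6):
        out[perm[f]] = config[f]
    return tuple(out)


def count_classes(gens: Iterable[Perm]) -> int:
    """Number of invariance classes among the 64 neighbor configurations."""
    group = generated_group(gens)
    canonical = {min(transform(g, c) for g in group)
                 for c in product((0, 1), repeat=6)}
    return len(canonical)
```

Calling `count_classes` with no generators, with `[ROT_Z]`, with `[ROT_Z, REFL_Z]`, and with all four transformations yields 64, 24, 18 and 10 classes, respectively. Taking the minimum transformed tuple as the class representative is one convenient canonical choice; any fixed choice per class works.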

Before entropy coding, the occupancy pattern undergoes the same transformation (i.e., rotation and/or reflection) that maps the neighbor configuration onto its invariance class. This preserves the statistical consistency between the invariant neighbor configuration and the coded pattern.

It should also be understood that, during traversal of the tree, a child node will have certain neighboring nodes at the same tree depth that have already been visited and can be used causally as dependencies. For these same-level neighbors (i.e., at the same level as the child node), the same-level neighbors themselves may be consulted instead of the collocated neighbors at the parent level. Since the same-level neighbors are half the size of the parent-level neighbors, one configuration accounts for the occupancy of the four directly adjacent neighboring child nodes (i.e., the four child nodes sharing a face with the current node) and treats the neighbor as occupied if any of them is occupied. Thus, as described in more detail below, the neighbor configuration of the current node may be determined based on occupancy data of the neighboring nodes of the current node and further based on occupancy data of the child nodes of at least one of those neighboring nodes. Accordingly, the one or more probabilities associated with the respective entropy codec used for entropy coding (e.g., binary entropy coding) the occupancy pattern of the current node may be selected based not only on the occupancy data of a plurality of same-level neighboring nodes of the current node, but also on the occupancy data of the child nodes of at least one (possibly all) of those neighboring nodes.

Referring now to fig. 27, a current node 4000 (i.e., its associated sub-volume, the current sub-volume) and its six neighbors 4010, 4020, 4030, 4040, 4050 and 4060 are shown. In the present octree example, the neighbors of the current node may be those nodes (at the same level or depth of the tree) whose associated volumes share a face with the current volume. Other definitions of neighboring nodes are also possible. For example, the neighbors of the current node may be those same-level nodes whose associated volumes share an edge (or a vertex) with the current volume. In general, regardless of the structure of the tree, neighboring nodes may be those same-level nodes whose associated volumes intersect the current volume.

In the context of the present application, volumes (nodes) that intersect each other are understood to be neighboring volumes (nodes). Thus, the terms "intersects with" and "is adjacent to" may be considered synonymous in this application.

It is noted that the terms "volume" and "sub-volume" may be used somewhat interchangeably, in the sense that each sub-volume is itself a volume that may be subdivided into sub-volumes. In any case, the volume/sub-volume relationships are unambiguously determined by the parent-child relationships between the nodes/volumes involved.

Assume that the nodes are scanned breadth-first, in increasing X order, then in increasing Y order, and finally in increasing Z order. In that case, the three neighbors with the lowest X coordinate (neighbor 4010), the lowest Y coordinate (neighbor 4030), and the lowest Z coordinate (neighbor 4050) have already been coded. Thus, if any of these three neighbors is occupied, the configuration of its occupied sub-volumes is already known. Although this example uses increasing X, then Y, then Z order, other breadth-first scan orders may be used for this purpose.
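The claim that the three lower-coordinate face neighbors are already coded can be checked for several orderings of the nodes within one tree level. The sketch below assumes integer, non-negative node coordinates at the current level and compares two lexicographic orders with a Morton (bit-interleaved) order; `morton_key` and `already_coded_offsets` are hypothetical helpers. Each order is monotone in every coordinate, so a neighbor at a lower X, Y or Z position always precedes the current node.

```python
from typing import Callable, List, Tuple

Coord = Tuple[int, int, int]
FACE_OFFSETS: List[Coord] = [(-1, 0, 0), (1, 0, 0), (0, -1, 0),
                             (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def morton_key(p: Coord, bits: int = 10) -> int:
    """Interleave the coordinate bits (x most significant within each triple).

    Assumes 0 <= x, y, z < 2**bits.
    """
    x, y, z = p
    code = 0
    for b in range(bits):
        code |= (((x >> b) & 1) << (3 * b + 2)
                 | ((y >> b) & 1) << (3 * b + 1)
                 | ((z >> b) & 1) << (3 * b))
    return code


def already_coded_offsets(current: Coord,
                          key: Callable[[Coord], object]) -> List[Coord]:
    """Face-neighbor offsets visited before `current` when one level is sorted by `key`."""
    result = []
    for d in FACE_OFFSETS:
        neighbor = (current[0] + d[0], current[1] + d[1], current[2] + d[2])
        if key(neighbor) < key(current):
            result.append(d)
    return result
```

For a node at `(5, 5, 5)`, all three keys (`lambda p: p`, `lambda p: (p[2], p[1], p[0])` and `morton_key`) report the offsets `(-1, 0, 0)`, `(0, -1, 0)` and `(0, 0, -1)`, matching neighbors 4010, 4030 and 4050.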

Referring now to fig. 28, an example current volume is shown in which all three already-coded neighbors (neighbors 4010, 4030, and 4050) are occupied. The occupied sub-volumes of neighbor 4010 are sub-volumes 4011, 4012, and 4013; the occupied sub-volumes of neighbor 4030 are sub-volumes 4031, 4032 and 4033; and the occupied sub-volumes of neighbor 4050 are sub-volumes 4051 and 4052. In this example all three already-coded neighbors are occupied, but it should be understood that in general only two or one of them, or even none, may actually be occupied.

Knowledge of the occupied sub-volumes of already-coded occupied neighbors can be used to refine the occupancy state of those neighbors when computing the neighbor occupancy configuration. Referring now to fig. 29(a), neighbor 4010 has occupied sub-volumes 4014 and 4015, neither of which shares a face with the current volume 4000. In this case, it may be advantageous to treat neighbor 4010 as unoccupied when computing the neighbor occupancy configuration. In fig. 29(b), at least one of the sub-volumes 4016 and 4017 of neighbor 4010 shares a face with the current volume 4000. In this case, neighbor 4010 is considered occupied when computing the neighbor occupancy configuration.

Referring now to fig. 30, an example method 4100 of encoding a point cloud to generate a bitstream of compressed point cloud data is shown in flow chart form. The point cloud is defined in a tree structure (e.g., an octree) having a plurality of nodes that have parent-child relationships and that represent the geometry of a volumetric space that is recursively split into sub-volumes and contains the points of the point cloud. The operations of method 4100 described below are performed for each current node associated with a (sub-)volume that is split into further sub-volumes, each corresponding to a child node of the current node. In operation 4110, an occupancy pattern of the current node is determined based on the occupancy states of the child nodes. In operation 4120, one or more probabilities (e.g., contexts) associated with a respective entropy codec used to entropy encode the occupancy pattern are selected. The selection is based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one (possibly all) of the plurality of neighboring nodes. In operation 4130, the occupancy pattern is entropy encoded based on the selected one or more probabilities using the associated one or more entropy codecs to produce encoded data for the bitstream.

In some implementations, the method 4100 may also include an operation (not shown in fig. 30) of updating the one or more selected probabilities based on the occupancy pattern.

A non-binary entropy codec may be used to entropy code the occupancy pattern of the current node. In this case, selecting the one or more probabilities in operation 4120 of method 4100 may correspond to, or involve, selecting a probability distribution (and an associated non-binary entropy codec) for entropy coding the occupancy pattern. Updating the one or more selected probabilities may then correspond to, or involve, updating the selected probability distribution.

On the other hand, as described in more detail below, a cascade of one or more binary entropy codecs may be used to entropy code the occupancy pattern of the current node. In that case, operation 4120 of method 4100 may involve selecting, for each bit of the bit sequence representing the occupancy pattern, a respective probability (and, correspondingly, an associated entropy codec) for coding that bit. The selection of the probability may be based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one (possibly all) of the plurality of neighboring nodes. The probability may further be selected based on the partial sequence of bits of the bit sequence that have already been coded. Expressed in terms of contexts, operation 4120 of method 4100 involves selecting, for each bit of the bit sequence, a context for entropy coding that bit based on the occupancy data of the plurality of neighboring nodes of the current node and the occupancy data of a child node of at least one (possibly all) of those neighboring nodes, and possibly also on the already-coded partial sequence of bits. In some implementations, the context may then be updated based on the occupancy pattern.
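Per-bit context selection with adaptive probabilities can be sketched as follows. This is a minimal illustration, not the described codec: each context holds simple frequency counts as a stand-in for a CABAC-style state, the ideal code length (−log₂ p) replaces an actual arithmetic coder, and the context key `(i, nc, partial)` is a hypothetical layout in which `nc` is the (possibly modified) neighbor configuration computed elsewhere and `partial` packs the already-coded bits b₀…b_{i−1}. A decoder would select the same contexts and apply the same updates.

```python
import math
from typing import Dict, List, Tuple


class AdaptiveBit:
    """Count-based probability estimate for one binary context (stand-in for a CABAC state)."""

    def __init__(self) -> None:
        self.counts = [1, 1]  # assumed initial counts: p(0) = p(1) = 1/2

    def prob(self, bit: int) -> float:
        return self.counts[bit] / (self.counts[0] + self.counts[1])

    def update(self, bit: int) -> None:
        self.counts[bit] += 1


def code_pattern(bits: List[int], nc: int,
                 contexts: Dict[Tuple[int, int, int], AdaptiveBit]) -> float:
    """Return the ideal code length (in bits) of an occupancy pattern, updating contexts.

    The context of bit i is keyed by (i, nc, partial), where partial packs b_0..b_{i-1}.
    """
    cost = 0.0
    partial = 0
    for i, bit in enumerate(bits):
        ctx = contexts.setdefault((i, nc, partial), AdaptiveBit())
        cost -= math.log2(ctx.prob(bit))
        ctx.update(bit)
        partial = (partial << 1) | bit
    return cost
```

Coding a new pattern costs 8 bits with fresh contexts; coding the same pattern again under the same neighbor configuration costs 8·log₂(3/2) ≈ 4.68 bits, because each selected context has been updated toward the observed bit.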

Referring now to fig. 31, an example method 4200 of decoding a bitstream of compressed point cloud data to produce a reconstructed point cloud is shown in flow chart form. The point cloud is defined in a tree structure (e.g., an octree) having a plurality of nodes that have parent-child relationships and that represent the geometry of a volumetric space that is recursively split into sub-volumes and contains the points of the point cloud. The operations of method 4200 described below are performed for each current node associated with a sub-volume that is split into further sub-volumes, each corresponding to a child node of the current node. In operation 4210, one or more probabilities associated with a respective entropy codec used to entropy decode the occupancy pattern are selected. The selection is based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one of the plurality of neighboring nodes. In operation 4220, the bitstream is entropy decoded based on the selected one or more probabilities using the associated one or more entropy codecs to generate a reconstructed occupancy pattern for the current node that represents the occupancy of the child nodes. In some embodiments, method 4200 may further include an operation (not shown in fig. 31) of updating the one or more selected probabilities based on the reconstructed occupancy pattern.

A non-binary entropy codec may be used to entropy decode the occupancy pattern of the current node. In this case, selecting the one or more probabilities in operation 4210 of method 4200 may correspond to, or involve, selecting a probability distribution (and an associated non-binary entropy codec) for entropy decoding the occupancy pattern. Updating the one or more selected probabilities may then correspond to, or involve, updating the selected probability distribution.

On the other hand, a cascade of one or more binary entropy codecs may be used to entropy decode the occupancy pattern of the current node. Then, in the same manner as for encoding, operation 4210 of method 4200 may involve selecting, for each bit of the bit sequence representing the occupancy pattern, a respective probability (and, correspondingly, an associated entropy codec) for decoding that bit. The selection of the probability may be based on occupancy data of a plurality of neighboring nodes of the current node and occupancy data of a child node of at least one (possibly all) of the plurality of neighboring nodes. The probability may further be selected based on the partial sequence of bits of the bit sequence that have already been decoded. Expressed in terms of contexts, operation 4210 of method 4200 involves selecting, for each bit of the bit sequence, a context for entropy decoding that bit based on the occupancy data of the plurality of neighboring nodes of the current node and the occupancy data of a child node of at least one (possibly all) of those neighboring nodes, and possibly also on the already-decoded partial sequence of bits. In some embodiments, the context may then be updated based on the reconstructed occupancy pattern.

In some embodiments of methods 4100 and 4200, the respective selections in operations 4120 and 4210 may be based on the neighbor configuration. As described above, the neighbor configuration may be determined based on occupancy data of the same-level neighboring nodes of the current node. Further, the occupancy data of the child nodes of at least one (and possibly all) of the plurality of neighboring nodes may be used to adapt the computation of the neighbor configuration. In particular, the occupancy data of the child nodes of a given neighboring node may be used to determine whether that neighboring node should be considered occupied for the purpose of computing the neighbor configuration. One example of such an adaptation of the neighbor configuration computation is described with reference to fig. 32.

Referring now to fig. 32, an example method 4300 for determining the occupancy of a neighbor (neighboring node) when computing the neighbor configuration is shown in flow chart form. The method is performed for a current volume to determine the neighbor configuration of that volume. In operation 4310, a neighbor of the current volume is selected. In operation 4330, the occupancy of the selected neighbor is checked. If the neighbor is unoccupied ("no" in operation 4330), the method proceeds to operation 4340 and the selected neighbor is considered unoccupied (e.g., its occupancy bit is zero) when computing the neighbor occupancy configuration; that is, its actual occupancy is used for determining the neighbor configuration. The method then proceeds to operation 4320. If the selected neighbor is occupied ("yes" in operation 4330), it is checked in operation 4350 whether the neighbor has already been coded. If it has not ("no" in operation 4350), the method proceeds to operation 4360 and the neighbor is considered occupied when computing the neighbor occupancy configuration; again, its actual occupancy is used. The method then proceeds to operation 4320. If the selected neighbor has already been coded ("yes" in operation 4350), it is checked in operation 4370 whether at least one of its occupied sub-volumes shares a face with the current volume or, more generally, intersects the current volume. If so ("yes" in operation 4370), the method proceeds to operation 4360 and the neighbor is considered occupied when computing the neighbor configuration, so its actual occupancy is used here as well. Otherwise ("no" in operation 4370), the method proceeds to operation 4340 and the neighbor is considered unoccupied when computing the neighbor configuration; that is, its occupancy bit is deliberately (artificially) set to zero. The method then proceeds to operation 4320.

In operation 4320, it is checked whether any neighbor of the current volume has not yet been selected. If so ("yes" in operation 4320), the method returns to operation 4310 to select the next neighbor. Once all neighbors have been processed ("no" in operation 4320), the neighbor configuration is computed in operation 4380 from the respective occupancies (e.g., occupancy bits) of the neighbors as decided in operations 4340 and 4360. This computation may proceed as described above, but using the neighbor occupancies determined in operations 4340 and 4360. Operation 4340 can thus be said to modify the occupancy relative to a direct determination based only on the occupancy data of the respective neighbor. Likewise, the resulting neighbor configuration is modified relative to a direct determination that does not take into account the occupancy data of the sub-volumes of already-coded neighbors.
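The decision flow of method 4300 for the six face neighbors can be sketched as below. It is a minimal illustration under assumptions: the `FaceNeighbor` record, the child ordering `child = 4*x + 2*y + z`, the packing order of the occupancy bits, and the `adapt` parameter (mirroring the activation flag discussed for sparse point clouds) are all hypothetical. For a face neighbor on the lower side of the current node, the children sharing the common face are those in the neighbor's upper half along that axis, and vice versa.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceNeighbor:
    axis: int            # 0 = X, 1 = Y, 2 = Z
    direction: int       # -1: neighbor on the lower side of the current node, +1: upper side
    occupied: bool
    coded: bool          # True if the neighbor's child occupancy is already known
    child_mask: int = 0  # 8-bit child occupancy (meaningful only when coded)


def child_coord(child: int, axis: int) -> int:
    """Local coordinate (0 or 1) of a child along an axis; assumes child = 4*x + 2*y + z."""
    return (child >> (2 - axis)) & 1


def has_child_touching_current(n: FaceNeighbor) -> bool:
    """Operation 4370 for face neighbors: does an occupied child share a face with the current volume?"""
    facing = 1 if n.direction < 0 else 0  # half of the neighbor adjacent to the shared face
    return any((n.child_mask >> c) & 1 and child_coord(c, n.axis) == facing
               for c in range(8))


def effective_occupancy(n: FaceNeighbor, adapt: bool = True) -> int:
    if not n.occupied:
        return 0                          # operation 4340: actually unoccupied
    if not n.coded or not adapt:
        return 1                          # operation 4360: occupied, children unknown or ignored
    return 1 if has_child_touching_current(n) else 0  # 4370, then 4360 or 4340


def neighbor_configuration(neighbors: List[FaceNeighbor], adapt: bool = True) -> int:
    """Operation 4380: pack the (possibly modified) occupancy bits into one integer."""
    nc = 0
    for k, n in enumerate(neighbors):
        nc |= effective_occupancy(n, adapt) << k
    return nc
```

In a usage example where the already-coded −X neighbor has occupied children only in its far half (`child_mask = 0x0F`), that neighbor contributes a zero bit when `adapt` is true and a one bit when it is false; the resulting integer would then be mapped to its invariance class before context selection.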

Unless a deactivation flag in the bitstream indicates that the original neighbor configuration should be used, the (modified) neighbor configuration determined from the modified neighbor occupancies may be used to select the one or more probabilities in operation 4120 of method 4100 and in operation 4210 of method 4200, respectively. This is described in more detail below.

Methods 4100, 4200 and 4300 have been shown to provide compression gains of more than 1% on point cloud geometry, relative to determining the neighbor configuration directly without considering the sub-volumes of already-coded occupied neighbors.

It should be understood that the methods described above are not limited to neighbors (or sub-volumes of neighbors) that share a face with the current volume. For example, the neighbors of the current volume may be all same-level volumes that share a face or an edge with the current volume. The criterion in operation 4370 of method 4300 would then be replaced by checking whether the neighbor has an occupied sub-volume that shares a face or an edge with the current volume. An example of this neighbor definition is illustrated in fig. 33. As another example, the neighbors of the current volume may be all same-level volumes that share a face, an edge, or a vertex with the current volume. The criterion in operation 4370 of method 4300 would then be replaced by checking whether the neighbor has an occupied sub-volume that shares a face, an edge, or a vertex with the current volume. An example of this neighbor definition is illustrated in fig. 34.

In general, the neighbors of the current volume may be all same-level volumes that intersect the current volume. Moreover, regardless of how the neighbors of the current volume are defined, the criterion in operation 4370 of method 4300 may be replaced by checking whether the neighbor has an occupied sub-volume that intersects the current volume. In other words, methods 4100, 4200, and 4300 may be applied to any tree whose nodes have associated volumes, where the neighboring nodes of the current node are defined as the nodes at the same depth (level) as the current node whose associated volumes have a non-empty intersection with the current volume. For example, the intersection may be a face, an edge, a vertex, or any non-empty set of points. An occupied neighbor that has already been coded is considered occupied when computing the neighbor occupancy configuration if and only if at least one of its occupied child nodes has an associated volume with a non-empty intersection with the current volume.
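The generalized criterion (an occupied child of the neighbor must have a non-empty intersection with the current volume) can be sketched with closed axis-aligned cubes, which covers face, edge and vertex neighbors with a single test. The sketch assumes nodes are given by integer origins with an even edge length `size`, and that the child index is `4*x + 2*y + z`; both conventions and the function names are hypothetical.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def closed_boxes_intersect(a_min: Coord, a_size: int, b_min: Coord, b_size: int) -> bool:
    """True if two closed axis-aligned cubes share at least one point (face, edge, vertex or more)."""
    return all(a_min[k] <= b_min[k] + b_size and b_min[k] <= a_min[k] + a_size
               for k in range(3))


def coded_neighbor_counts_as_occupied(current: Coord, neighbor: Coord, size: int,
                                      child_mask: int) -> bool:
    """Generalized operation 4370: an already-coded occupied neighbor counts as occupied
    if and only if one of its occupied children intersects the current volume.

    current/neighbor are integer origins of same-level nodes of edge length `size` (even);
    the child index is assumed to be 4*x + 2*y + z.
    """
    half = size // 2
    for child in range(8):
        if (child_mask >> child) & 1:
            offset = ((child >> 2) & 1, (child >> 1) & 1, child & 1)
            child_min = tuple(neighbor[k] + offset[k] * half for k in range(3))
            if closed_boxes_intersect(child_min, half, current, size):
                return True
    return False
```

For a current node at `(2, 2, 2)` with `size = 2`, the edge neighbor at `(0, 0, 2)` counts as occupied if child 6 (offset `(1, 1, 0)`) is occupied, because that child touches the shared edge, but not if only child 2 (offset `(0, 1, 0)`) is occupied.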

Referring now to fig. 33, a neighboring volume 4070 is shown sharing an edge with the current volume 4000. In fig. 33(a), neighbor 4070 has occupied sub-volumes 4071 and 4072, neither of which shares an edge with the current volume 4000. In this case, neighbor 4070 is considered unoccupied when computing the neighbor occupancy configuration. In fig. 33(b), at least one of the sub-volumes 4073 and 4074 of neighbor 4070 shares an edge with the current volume 4000. In this case, neighbor 4070 is considered occupied when computing the neighbor occupancy configuration.

Referring now to fig. 34, a neighboring volume 4080 is shown sharing a vertex with the current volume 4000. In fig. 34(a), neighbor 4080 has occupied sub-volumes 4081 and 4082, neither of which shares a vertex with the current volume 4000. In this case, neighbor 4080 is considered unoccupied when computing the neighbor occupancy configuration. In fig. 34(b), at least one of the sub-volumes 4083 and 4084 of neighbor 4080 shares a vertex with the current volume 4000. In this case, neighbor 4080 is considered occupied when computing the neighbor occupancy configuration.

It has been observed that methods 4100, 4200 and 4300 provide compression gains of more than 1% on dense, virtual-reality-oriented point clouds, i.e., a reduction of more than 1% in the size of the compressed bitstream. This is a notable gain given the simplicity of the methods.

However, on sparse point clouds, such as those captured by a lidar mounted on a moving vehicle, these methods may show little or no gain (or even a slight loss for extremely sparse point clouds). It may therefore be advantageous to add to the bitstream a flag indicating activation (flag value 1) or deactivation (flag value 0) of the adaptation of the neighbors' occupancy for the current volume. Deactivation means that a neighbor is considered occupied or unoccupied in the computation of the neighbor occupancy configuration regardless of the positions of its occupied child nodes.

Entropy coding tree occupancy patterns using binary coding

Some of the above techniques for coding tree occupancy using neighbor occupancy information are detailed in European patent application No. 18305037.6. The embodiments described there focus on non-binary entropy coding of occupancy patterns, where the pattern distribution is selected based on neighbor occupancy information. In some cases, however, a binary codec may be more efficient in terms of hardware implementation. Moreover, on-the-fly updating of many probabilities may require fast memory access and computation at the heart of the arithmetic codec. It may therefore be advantageous to find methods and apparatuses for entropy coding occupancy patterns with a binary arithmetic codec, provided this can be done without significantly degrading compression performance and without an excessive number of contexts to track.

The use of binary codecs instead of a non-binary codec is reflected in the chain rule of conditional entropy:

$$H(X_1, X_2 \mid Y) = H(X_1 \mid Y) + H(X_2 \mid Y, X_1)$$

where X = (X₁, X₂) is the non-binary information to be coded and Y is the context used for coding, i.e., the neighbor configuration or the selected pattern distribution. To convert the non-binary coding of X into binary coding, the information (X₁, X₂) is split into X₁ and X₂, which can be coded separately without increasing the entropy. To do so, one of the two must be coded conditionally on the other; here, X₂ is coded conditionally on X₁. This extends to n bits of information in X. For example, for n = 3:

$$H(X_1, X_2, X_3 \mid Y) = H(X_1 \mid Y) + H(X_2 \mid Y, X_1) + H(X_3 \mid Y, X_1, X_2)$$
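The two-variable chain rule can be checked numerically on a small joint distribution of (Y, X₁, X₂). The probabilities below are arbitrary illustrative values, not data from the document; conditional entropy is computed as H(target | given) = H(given, target) − H(given).

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Outcome = Tuple[int, int, int]  # (y, x1, x2)

# Arbitrary illustrative joint distribution over binary Y, X1, X2 (sums to 1).
JOINT: Dict[Outcome, float] = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}


def joint_entropy(joint: Dict[Outcome, float], idx: Sequence[int]) -> float:
    """Entropy (in bits) of the marginal over the variables at positions idx."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_entropy(joint: Dict[Outcome, float], target: Sequence[int],
                 given: Sequence[int]) -> float:
    """H(target | given) = H(given, target) - H(given)."""
    return (joint_entropy(joint, tuple(given) + tuple(target))
            - joint_entropy(joint, tuple(given)))


lhs = cond_entropy(JOINT, (1, 2), (0,))                                       # H(X1, X2 | Y)
rhs = cond_entropy(JOINT, (1,), (0,)) + cond_entropy(JOINT, (2,), (0, 1))     # H(X1|Y) + H(X2|Y,X1)
```

`lhs` and `rhs` agree to floating-point precision, which is what allows the pattern bits to be coded one at a time, each conditioned on the context and the previously coded bits, without an entropy penalty.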

It should be understood that as the occupancy pattern (i.e., the bit sequence X) gets longer, there are more conditions for coding the later bits in the sequence. For a binary codec (e.g., CABAC), this means that the number of contexts to be tracked and managed grows dramatically. Taking an octree as an example, the occupancy pattern is an eight-bit sequence b = b₀…b₇, which can be split into eight binary information bits b₀, …, b₇. The codec may use the neighbor configuration N (or NC) to determine the context. Assuming that the neighbor configurations can be reduced to 10 effective neighbor configurations by grouping them into invariance classes, as described above, N is an integer in {0, 1, 2, …, 9}. For brevity, an "invariance class of neighbor configurations" may sometimes be referred to herein simply as a "neighbor configuration", but it should be appreciated that this reduced number of neighbor configurations results from grouping the neighbor configurations into classes according to invariance.

Fig. 21 illustrates splitting an eight-bit pattern into eight separate bits for binary entropy coding. Note that the first bit of the sequence is coded based on the neighbor configuration only, so there are ten contexts available in total. The next bit is coded based on the neighbor configuration and the previously coded bit b₀, giving 20 available contexts: the product of 10 from N and 2 from b₀. The final bit b₇ is entropy coded using a context selected from 1280 available contexts: the product of 10 from N and 128 from the partial pattern given by the previously coded bits b₀, …, b₆. That is, for each bit, the number of contexts (i.e., possible combinations of conditions/dependencies) is the product of the defined number of neighbor configurations (10 in this example, from grouping the 64 neighbor configurations into classes) and the number of possible partial patterns formed by the ordered sequence of the n−1 previously coded bits, which is 2^(n−1) for the n-th bit.

Thus, in total there are 10 × (1 + 2 + … + 128) = 10 × 255 = 2550 contexts to maintain for binary coding of the occupancy pattern. This is a large number of contexts to track, and the relative scarcity of data per context can lead to poor performance due to context dilution, especially for the later bits in the sequence.
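The context counts above can be reproduced with a short computation, together with one possible way to lay all 2550 contexts out in a flat table. The flat-index layout in `context_index` (contexts of bit b₀ first, then b₁, and so on) is a hypothetical choice for illustration; the text does not prescribe a storage layout.

```python
from typing import List

NUM_NC = 10        # invariance classes of neighbor configurations
PATTERN_BITS = 8   # octree: one bit per child


def contexts_for_bit(i: int, num_nc: int = NUM_NC) -> int:
    """Contexts available for bit b_i: num_nc times 2**i partial patterns b_0..b_{i-1}."""
    return num_nc * 2 ** i


def context_index(nc: int, prefix: List[int], num_nc: int = NUM_NC) -> int:
    """Flat index of the context for bit b_i, where i = len(prefix) and prefix = b_0..b_{i-1}."""
    i = len(prefix)
    base = sum(contexts_for_bit(j, num_nc) for j in range(i))  # contexts of earlier bits
    partial = 0
    for b in prefix:
        partial = (partial << 1) | b
    return base + nc * 2 ** i + partial
```

The per-bit counts are 10, 20, 40, …, 1280, their sum is 2550, and the last context (N = 9, all previous bits set) lands at index 2549.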

Thus, in one aspect, the present application discloses an encoder and a decoder that determine whether a context set can be reduced and, if so, apply a context reduction operation to obtain a smaller set of available contexts for entropy coding at least part of an occupancy pattern with a binary codec. In another aspect, the present application also discloses an encoder and a decoder that apply one or more rounds of state reduction using the same context reduction operation in order to select contexts efficiently from a fixed number of contexts. In some embodiments, context reduction is applied a priori when generating look-up tables of contexts and/or algorithmic conditions, which the encoder or decoder then uses when selecting the appropriate context. The reduction is based on testable conditions that the encoder and decoder evaluate to determine which look-up table to use, or how to index into it, to obtain the selected context.

Referring now to fig. 22, an example method 3000 for coding occupancy patterns in a tree-based point cloud codec using binary coding is shown in flow chart form. Method 3000 may be implemented by an encoder or a decoder. In an encoder the coding operation is encoding, and in a decoder it is decoding. The coding is context-based entropy coding.

The example method 3000 is used for entropy coding the occupancy pattern (i.e., bit sequence) of a particular node/volume. The occupancy pattern represents the occupancy states of the child nodes (sub-volumes) of the node/volume. In the case of an octree, there are eight child nodes/sub-volumes. In operation 3002, a neighbor configuration is determined. The neighbor configuration is the occupancy state of one or more volumes adjacent to the volume whose occupancy pattern is to be coded. As discussed above, there are various possible embodiments for determining the neighbor configuration. In some examples, there are 10 neighbor configurations, and the neighbor configuration of the current volume is identified based on the occupancy of the six volumes that share a face with the current volume.

In operation 3004, an index i of the child nodes of the current volume is set to 0. Then, in operation 3006, it is evaluated whether context reduction is possible. Different possible context reduction operations are discussed in more detail below. Whether context reduction is possible may be evaluated based on, for example, which bit (e.g., index value) of the bit sequence is being coded. In some cases, context reduction may be possible for later bits in the sequence but not for the first few bits. Whether context reduction is possible may also be evaluated based on, for example, the neighbor configuration, since some neighbor configurations may allow simplification. In some implementations, additional factors may be used. For example, an upper limit Bo may be specified as the maximum number of contexts that the binary codec may use to code a bit; if the initial number of contexts for coding a bit is higher than Bo, context reduction is applied (otherwise it is not) so that the number of contexts after reduction is at most Bo. Such a limit Bo may be defined in the encoder and/or decoder specification to ensure that a software or hardware implementation capable of handling Bo contexts will always be able to encode and/or decode a point cloud without overflowing the number of contexts. Knowing the limit Bo in advance also makes it possible to anticipate the complexity and memory footprint of the binary entropy codec, which facilitates hardware design. Typical values of Bo range from ten to several hundred.
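The bound-based test of operation 3006 can be sketched as follows, using the unreduced context count for bit bᵢ (number of neighbor configurations times 2ⁱ partial patterns). This covers only the Bo criterion; the position- and neighbor-configuration-based criteria, and the reduction operations themselves, are not modeled. The function names and the example value Bo = 200 are hypothetical.

```python
from typing import List


def unreduced_context_count(i: int, num_nc: int = 10) -> int:
    """Contexts for bit b_i before reduction: neighbor configurations x 2**i partial patterns."""
    return num_nc * 2 ** i


def needs_reduction(i: int, bo: int, num_nc: int = 10) -> bool:
    """Bo criterion for operation 3006: reduce only if the unreduced count exceeds Bo."""
    return unreduced_context_count(i, num_nc) > bo


def bits_needing_reduction(bo: int, n_bits: int = 8, num_nc: int = 10) -> List[int]:
    """Indices of the pattern bits whose context sets would have to be reduced."""
    return [i for i in range(n_bits) if needs_reduction(i, bo, num_nc)]
```

With 10 neighbor configurations and a hypothetical bound Bo = 200, bits b₀ to b₄ (at most 160 contexts) are coded without reduction, while bits b₅, b₆ and b₇ (320, 640 and 1280 contexts) would need a reduction bringing them down to at most 200 contexts each.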

If context reduction is determined to be possible, then in operation 3008 a context reduction operation is applied. The context reduction operation reduces the set of available contexts to a smaller set containing fewer contexts in total. Recall that, since a context may depend on the partial pattern of previously coded bits in the bit sequence, the number of available contexts may depend in part on the position of the bit in the sequence, i.e., its index. In some embodiments, prior to reduction, the number of contexts available in the set may be the number of neighbor configurations multiplied by the number of possible partial patterns of previously coded bits. For the bit at index i (where i ranges from 0 to n), the number of partial patterns is given by 2ⁱ.
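To make these counts concrete, the following Python sketch computes, per bit, the number of contexts available before reduction (neighbor configurations times the 2ⁱ partial patterns) and tests it against a limit Bo. It is a minimal illustration only; the function names and the value chosen for Bo are assumptions, not values taken from the text.

```python
def contexts_for_bit(num_neighbor_configs: int, i: int) -> int:
    """Contexts available for bit b_i before reduction: each neighbor
    configuration is paired with each of the 2**i partial patterns
    formed by the previously coded bits b_0..b_{i-1}."""
    return num_neighbor_configs * 2 ** i


def needs_reduction(num_contexts: int, bo: int) -> bool:
    """Context reduction is applied only when the initial count exceeds Bo."""
    return num_contexts > bo


if __name__ == "__main__":
    BO = 300  # hypothetical limit; the text only gives a typical range
    for i in range(8):
        n = contexts_for_bit(10, i)
        print(f"b_{i}: {n:4d} contexts, reduce: {needs_reduction(n, BO)}")
    print("total:", sum(contexts_for_bit(10, i) for i in range(8)))
```

With 10 neighbor configurations, the eighth bit alone has 10 × 2⁷ = 1280 contexts, and the eight bits together have 2550.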

As mentioned above, in some embodiments the context reduction operation is performed prior to coding, and the resulting reduced context set is the context set available to the encoder and decoder during the coding operation. Whether a reduced context set is used during coding may be determined by first evaluating one or more conditions that correspond to the conditions evaluated in operation 3006 to determine whether context reduction is possible. For example, in the case of a particular neighbor configuration that allows the use of a reduced context set, the encoder and/or decoder may first determine whether the neighbor configuration condition is satisfied and, if it is, use the corresponding reduced context set.

In operation 3010, the context for bit b_i is determined based on the neighbor configuration and the partial pattern of previously coded bits in the bit sequence, i.e., the context for bit b_i is selected from the set (or reduced set, if any) of available contexts. The current bit is then entropy coded by the entropy codec using the selected context in operation 3012.

In operation 3014, if the index i indicates that the bit currently being coded is the last bit in the sequence (i.e., i equals i_max), then the coding process proceeds to the next node. Otherwise, the index i is incremented in operation 3016, and the process returns to operation 3006.
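The per-bit loop of method 3000 can be sketched in Python as follows. This is a hedged illustration, not the described implementation: the callables `select_context`, `can_reduce`, `reduce_context` and `code_bit` are hypothetical stand-ins for the codec's actual context logic and binary entropy coder, and context reduction is modeled as mapping a full context to its representative in the reduced set, which amounts to selecting from the reduced set.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Context = Hashable


def code_occupancy_pattern(
    bits: Sequence[int],
    neighbor_config: int,
    select_context: Callable[[int, int, Tuple[int, ...]], Context],
    can_reduce: Callable[[int, int], bool],
    reduce_context: Callable[[Context], Context],
    code_bit: Callable[[int, Context], None],
) -> None:
    """Walk the bits of one occupancy pattern (operations 3004 to 3016)."""
    for i, bit in enumerate(bits):                        # 3004: i = 0; 3016: i += 1
        partial = tuple(bits[:i])                         # previously coded b_0..b_{i-1}
        ctx = select_context(neighbor_config, i, partial)  # 3010
        if can_reduce(neighbor_config, i):                # 3006
            ctx = reduce_context(ctx)                     # 3008: map into reduced set
        code_bit(bit, ctx)                                # 3012
    # 3014: after the last bit, the caller moves on to the next node.


if __name__ == "__main__":
    log: List[Tuple[int, Context]] = []
    code_occupancy_pattern(
        bits=[1, 0, 0, 1, 0, 0, 0, 1],
        neighbor_config=0,
        select_context=lambda n, i, p: (n, i, p),
        can_reduce=lambda n, i: n == 0,                     # e.g., empty neighborhood
        reduce_context=lambda c: (c[0], c[1], sum(c[2])),   # order-free count
        code_bit=lambda b, c: log.append((b, c)),           # stand-in for a real coder
    )
    print(log[3])  # (1, (0, 3, 1))
```

Passing the context logic in as functions keeps the loop identical for encoder and decoder; only `code_bit` would differ between the two.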

It should be appreciated that in some embodiments, context selection may not depend on the neighbor configuration. In some cases, it may rely only on the partial pattern (if any) of previously coded bits in the sequence.

A simplified block diagram of a portion of an example encoder 3100 is illustrated in fig. 23. In this illustration, it is understood that the occupancy patterns 3102 are obtained as the corresponding volumes are divided into child nodes and cycled through a FIFO buffer 3104 that holds the geometry of the point cloud. The coding of an occupancy pattern 3102 is illustrated as involving a cascade of binary coders 3106, one binary coder for each bit of the pattern. Between at least some of the binary coders 3106 are context reduction operations 3108, which reduce the available contexts to a smaller set of available contexts.

Although fig. 23 illustrates a series of binary coders 3106, in some embodiments only one binary coder is used. Where more than one coder is used, the coders may be (partially) parallelized. Because the context of a bit depends on the preceding bits in the bit sequence, the coding of a pattern cannot necessarily be fully parallelized, but using cascaded binary coders for the patterns can improve pipelining and achieve a degree of parallelization and speed-up.

Context reduction operations

The above examples propose that the coding process includes a context reduction operation for at least one bit of the occupancy pattern in order to reduce the set of available contexts to a smaller set. In this sense, a "context reduction operation" may be understood as identifying and merging contexts that are deemed repetitive or redundant for a particular bit b_i. As mentioned above, the reduced context set may be determined prior to coding and provided to the encoder and decoder, which determine whether to use the reduced context set based on the same conditions, described below, that are used for reducing the context set.

Neighbor configuration reduction by screening/masking

A first example context reduction operation involves reducing the number of neighbor configurations based on screening/masking. In principle, the neighbor configuration brings the occupancy states of neighboring volumes into the context selection process, on the basis that the neighboring volumes help indicate whether the current volume or sub-volume is likely to be occupied. When coding the bits associated with sub-volumes of the current volume, the bits already coded for other sub-volumes of the current volume are also taken into account in context selection, and information from these nearby sub-volumes may be more important and informative than the occupancy of a neighboring volume that lies on the other side of such a sub-volume from the current sub-volume. In this sense, the previously coded bits are associated with sub-volumes that "screen" or "mask" the neighboring volume. This may mean that, in such cases, the occupancy of the neighboring volume can be disregarded, since the relevance of its occupancy state is subsumed by the occupancy states of the sub-volumes lying between the current sub-volume and the neighboring volume, which allows the number of neighbor configurations to be reduced.

Referring now to FIG. 24, an example context reduction operation based on neighbor screening is graphically illustrated. The example relates to coding the occupancy pattern of a volume 3200. The occupancy pattern represents the occupancy states of the eight sub-volumes within the volume 3200. In this example, the four sub-volumes in the upper half of the volume 3200 have already been coded, so their occupancy states are known. The bit of the occupancy pattern being coded is associated with a fifth sub-volume 3204, which is located in the lower half of the volume 3200 below the four previously coded sub-volumes.

In this example, coding includes determining the context based on the neighbor configuration. Ten neighbor configurations 3202 are shown. The volume 3200 containing the fifth sub-volume 3204 to be coded is shown in light gray. The neighbor configurations 3202 are based on the occupancy states of the volumes that are adjacent to the volume 3200 and share a face with it. The adjacent volumes include a top adjacent volume 3206.

In this example, the number of neighbor configurations may be reduced from 10 to 7 by ignoring the top adjacent volume 3206 in at least some of the configurations. As shown in fig. 24, three of the four configurations that include the top adjacent volume 3206 may be classified under equivalent configurations that do not include it, thereby reducing the number of neighbor configurations to 7 in total. It may still be advantageous to keep the configuration showing all six neighboring volumes, since there is no existing 5-volume neighbor configuration into which the 6-volume configuration could be merged (the only 5-element configuration has been eliminated); removing the top adjacent volume from it would merely create a new 5-element neighbor configuration, with no overall reduction in contexts.

In this example, the top neighboring volume 3206 may be eliminated from the neighbor configuration because the context determination for encoding the occupancy bits associated with the fifth sub-volume 3204 will have considered the occupancy states of the four previously encoded sub-volumes directly above it, which better indicate the likelihood and directionality of occupancy of the fifth sub-volume than the occupancy states of the more distant top neighboring volume 3206.

The above example, in which the previously coded sub-volumes screen or mask the top adjacent volume 3206 when coding the occupancy bit of the fifth sub-volume 3204, is merely one example. Depending on the coding order within the volume 3200, a variety of other screening/masking scenarios may arise and be used to reduce the available neighbor configurations.

Referring now to fig. 25, a second example of screening/masking is shown. In this example, the occupancy pattern of the volume 3200 has been almost completely coded. The sub-volume to be coded is the eighth sub-volume, which is hidden from view at the bottom back corner in the figure. In this case, the occupancy states of all seven other sub-volumes have been coded, including in particular the sub-volumes along the top (which, as above, reduces the neighbor configurations to seven in total) and those along the right and front sides. Thus, in addition to screening the top adjacent volume, the sub-volumes with previously coded occupancy bits screen the front adjacent volume 3210 and the right adjacent volume 3212. This may allow the neighbor configurations to be reduced from seven to five in total, as illustrated.

It will be appreciated that the two foregoing examples of screening are illustrative and that, in some cases, different configurations may be merged to address different screening scenarios. Context reduction based on masking/screening by previously coded sub-volumes is general and not limited to these two examples; however, it cannot be applied to the first sub-volume to be coded, since at least one previously coded occupancy bit associated with a nearby sub-volume is needed for any masking/screening.

It should also be appreciated that the degree of masking/screening that justifies neighbor configuration reduction may differ between embodiments. In both of the above examples, all four sub-volumes sharing a face with the neighboring volume have been coded before that neighboring volume is treated as masked/screened and removed from the neighbor configuration. In other examples, partial masking/screening may be sufficient, for example by one to three previously coded sub-volumes of the shared face.
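The screening rule can be sketched in Python. This is a minimal illustration under several assumptions not stated in the text: child i of the node is given hypothetical coordinates (x, y, z) taken from the bits of i, children are coded in increasing index order, the six face-adjacent neighbors are labeled by axis and side rather than top/front/right, and neighbor occupancy is represented as the raw six-bit pattern (64 combinations) rather than the ten neighbor configurations of the figures, so the counts differ from the 10 → 7 → 5 reductions above. The `threshold` parameter models full (4) or partial (1 to 3) screening.

```python
from itertools import product

AXES = ("x", "y", "z")
# Each face-adjacent neighbor is identified by (axis, side), e.g. ("x", 0).
FACES = [(axis, side) for axis in AXES for side in (0, 1)]


def child_coords(i):
    """Hypothetical layout: child i sits at (x, y, z) = bits 2, 1, 0 of i."""
    return ((i >> 2) & 1, (i >> 1) & 1, i & 1)


def children_on_face(face):
    """The four children of the node that touch the given face."""
    axis, side = face
    k = AXES.index(axis)
    return [i for i in range(8) if child_coords(i)[k] == side]


def screened_faces(current, threshold=4):
    """Faces whose neighbor counts as screened when coding child `current`,
    assuming children 0..current-1 have already been coded."""
    coded = set(range(current))
    return {
        face
        for face in FACES
        if sum(1 for c in children_on_face(face) if c in coded) >= threshold
    }


def masked_configuration(neighbor_occ, current, threshold=4):
    """Drop the occupancy of screened neighbors; neighbor_occ maps face -> bool."""
    hidden = screened_faces(current, threshold)
    return tuple(sorted((f, occ) for f, occ in neighbor_occ.items() if f not in hidden))


def count_configurations(current, threshold=4):
    """Number of distinct masked raw neighbor patterns when coding `current`."""
    configs = {
        masked_configuration(dict(zip(FACES, bits)), current, threshold)
        for bits in product((False, True), repeat=6)
    }
    return len(configs)


if __name__ == "__main__":
    for i in (0, 4, 7):
        print(i, sorted(screened_faces(i)), count_configurations(i))
```

With full screening, coding child 4 hides one face (64 → 32 raw patterns) and coding child 7 hides three faces (64 → 8), mirroring the one-face and three-face cases of figs. 24 and 25. Child 0 hides nothing, consistent with the observation that screening cannot apply to the first sub-volume.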

Context reduction through special case handling

There are certain situations in which context reduction can occur without loss of useful information. In the example context determination process described above, the context for coding an occupancy bit is based on the neighbor configuration (i.e., the occupancy pattern of the volumes adjacent to the current volume) and on the partial pattern given by the occupancy of the previously coded sub-volumes in the current volume. The latter results in 2⁷ = 128 contexts to be tracked for the eighth bit of the occupancy-pattern bit sequence. Even with the neighbor configurations reduced to a total of five, this means 5 × 128 = 640 contexts to be tracked for that bit.

The number of contexts is large because the previously coded bits of the bit sequence have an order, and that order is taken into account when evaluating the context. However, in some cases, the order may not carry useful information. For example, in the empty neighbor configuration (i.e., N₁₀ = 0), the points within the volume may be assumed to be sparsely populated, meaning that they do not have a directionality strong enough to justify tracking separate contexts for different occupancy patterns of the sibling sub-volumes. With an empty neighborhood there is no local orientation or topology of the point cloud, which means that the 2ʲ conditions based on the j previously coded bits of the bit sequence can be reduced to j + 1 conditions. That is, the context for coding a bit of the bit sequence is still based on the previously coded bits, but on their sum rather than on their ordered pattern. In other words, the entropy in this particular case can be expressed as:

$$H(b \mid n) \approx H(b_0 \mid 0) + H(b_1 \mid 0, b_0) + H(b_2 \mid 0, b_0 + b_1) + \dots + H(b_7 \mid 0, b_0 + b_1 + \dots + b_6)$$

In some embodiments, similar observations may be made with respect to the full neighbor configuration. In some examples, the full neighbor configuration lacks directionality, which means that the order of the previously coded bits need not be considered in determining the context. In some examples, the context reduction operation may be applied to only some of the bits in the bit sequence, such as some of the later bits. In some cases, applying this context reduction operation to later bits may be conditioned on determining that the earlier bits, associated with previously coded sub-volumes, are also all occupied.
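The order-free rule can be illustrated with a small Python sketch that keys the context on the sum of the previously coded bits when the neighbor configuration is empty, and on their ordered pattern otherwise. It assumes, as in the text, that N₁₀ = 0 denotes the empty configuration; which index (if any) denotes the full configuration is not stated, so passing it through the hypothetical `order_free_configs` parameter is an assumption.

```python
from itertools import product


def context_key(neighbor_config, i, partial, order_free_configs=frozenset({0})):
    """Context key for bit b_i given the previously coded bits `partial`."""
    if neighbor_config in order_free_configs:
        # Order-free: only b_0 + ... + b_{i-1} matters (j + 1 cases).
        return (neighbor_config, i, sum(partial))
    # Ordered: the full partial pattern matters (2**j cases).
    return (neighbor_config, i, tuple(partial))


def count_contexts(neighbor_config, i, order_free_configs=frozenset({0})):
    """Distinct contexts for bit b_i over all 2**i possible partial patterns."""
    return len({
        context_key(neighbor_config, i, p, order_free_configs)
        for p in product((0, 1), repeat=i)
    })


if __name__ == "__main__":
    print(count_contexts(0, 7), count_contexts(5, 7))  # 8 128
    print(sum(count_contexts(0, i) for i in range(8)))  # 36
```

For the eighth bit, the empty configuration thus needs 8 contexts instead of 128, and 36 contexts across all eight bits instead of 255.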

Statistical-based context reduction

Statistical analysis can be used to reduce contexts by determining which contexts lead to approximately the same statistical behavior and then merging those contexts. This analysis can be performed a priori using test data to develop a reduced set of contexts, which is then provided to both the encoder and the decoder. In some cases, the analysis may be performed on the current point cloud using two-pass coding to develop a custom reduced set of contexts for the particular point cloud data. In some such cases, the mapping from the non-reduced context set to the custom reduced context set may be signaled to the decoder using dedicated syntax coded into the bitstream.

Two contexts can be compared using the concept of a "distance". A first context c gives a probability p that bit b equals zero, and a second context c′ gives a probability p′ that bit b equals zero. The distance between c and c′ is given by:

$$d(c, c') = \left| p \log_2 p - p' \log_2 p' \right| + \left| (1-p) \log_2 (1-p) - (1-p') \log_2 (1-p') \right|$$

Using this measure of similarity (distance), the contexts can then be grouped by a process such as the following:

1. Start from M₁ contexts and fix a threshold level ε.

2. For a given context, regroup into a class all contexts whose distance from the given context is less than the threshold level ε.

3. Repeat step 2 for the contexts not yet regrouped until every context has been placed into a class.

4. Label the M₂ classes from 1 to M₂: this yields a brute-force reduction function BR: {1, 2, …, M₁} → {1, 2, …, M₂}, where M₁ ≥ M₂.
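The distance and the grouping steps above can be sketched in Python. This is a minimal, assumption-laden illustration: each context is summarized by a single probability that the bit equals zero, labels are 0-based rather than running from 1 to M₂, and the first not-yet-grouped context is used as the seed of each new class.

```python
import math


def xlog2x(x: float) -> float:
    """x * log2(x), with the usual convention 0 * log2(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log2(x)


def context_distance(p: float, q: float) -> float:
    """d(c, c') for contexts whose probabilities of a zero bit are p and q."""
    return abs(xlog2x(p) - xlog2x(q)) + abs(xlog2x(1 - p) - xlog2x(1 - q))


def build_reduction_map(probs, eps):
    """Greedy grouping (steps 1 to 4): returns labels[k] = class of context k."""
    labels = [None] * len(probs)
    next_label = 0
    for seed in range(len(probs)):
        if labels[seed] is not None:
            continue  # already placed in a class
        labels[seed] = next_label
        for other in range(seed + 1, len(probs)):
            if labels[other] is None and context_distance(probs[seed], probs[other]) < eps:
                labels[other] = next_label
        next_label += 1  # M2 equals the final value of next_label
    return labels


if __name__ == "__main__":
    print(build_reduction_map([0.1, 0.11, 0.5, 0.52, 0.9], eps=0.1))  # [0, 0, 1, 1, 2]
```

Once built, the list acts as the lookup table described below: a context index k is replaced by `labels[k]` during coding. Because each class is seeded greedily, the grouping can depend on the order in which contexts are visited.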

A brute force reduction function for mapping a context set to a smaller context set may be stored in memory for application by the encoder/decoder as a context reduction operation during codec. The mapping may be stored as a lookup table or other data structure. For example, a brute force reduction function may be applied only to later bits in a bit sequence (pattern).

Combinations and subcombinations of context reduction operations

Three example context reduction operations are described above. Each of them may be applied separately and independently in some embodiments, and any two or more of them may be combined in some embodiments. Additional context reduction operations may be implemented alone or in combination with any one or more of the context reduction operations described above.

FIG. 26 illustrates, in flow diagram form, one example of a method 3300 of binary coding of occupancy patterns involving combined context reduction. Given a neighbor configuration N₁₀ taking one of ten values in {0, 1, 2, …, 9}, method 3300 codes an 8-bit binary pattern b₀, b₁, …, b₇. The first condition evaluated is whether the neighbor configuration is empty, i.e., N₁₀ = 0. If the neighbor configuration is empty, the bits are coded without reference to their order, as indicated by reference numeral 3302. Otherwise, the bits are coded in the normal manner up to bit b₄, at which point the encoder and decoder begin applying brute-force context reduction functions BR_i, which reduce the number of contexts by mapping the context set defined by the neighbor configuration and the partial pattern of previously coded bits to a smaller context set with substantially similar statistics.

In this example, the last two bits, b₆ and b₇, are coded using neighbor configurations reduced on the basis of masking/screening.
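A possible shape for the context selection of method 3300 is sketched below in Python. The key layout and the `br_luts` argument (one hypothetical lookup table per bit from b₄ onward, mapping a neighbor configuration and partial pattern to a reduced context index) are assumptions for illustration; the actual contents of the BR_i tables are not given in the text.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical LUT layout: br_luts[i][(n10, partial_pattern)] -> reduced index.
BrLuts = Dict[int, Dict[Tuple[int, Tuple[int, ...]], int]]


def select_context_3300(n10: int, i: int, prev_bits: Sequence[int], br_luts: BrLuts) -> Hashable:
    """Return a context key for bit b_i following the branches of method 3300."""
    partial = tuple(prev_bits)
    if n10 == 0:
        # Empty neighborhood (3302): order-free, only the count of occupied siblings.
        return ("empty", i, sum(partial))
    if i < 4:
        # b_0..b_3: normal mode, full neighbor configuration and ordered pattern.
        return ("normal", i, n10, partial)
    # From b_4 onward: the precomputed reduction function BR_i, stored as a LUT.
    # Any masking-based reduction for b_6 and b_7 is assumed folded into the LUT.
    return ("reduced", i, br_luts[i][(n10, partial)])


if __name__ == "__main__":
    luts: BrLuts = {4: {(3, (1, 0, 0, 1)): 17}}  # toy table with a single entry
    print(select_context_3300(0, 5, (1, 0, 1, 0, 0), luts))  # ('empty', 5, 2)
    print(select_context_3300(3, 4, (1, 0, 0, 1), luts))     # ('reduced', 4, 17)
```

A real table would contain an entry for every non-empty configuration and every partial pattern of the given length.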

The reduction functions may be implemented as look-up tables (LUTs) to reduce the size of the context set. In a practical implementation, all reductions are accounted for by the reduction functions (i.e., LUTs), each taking a context as input and providing the reduced context as output. In this example embodiment, the total number of contexts has been reduced from 2550 to 576, with the reduction functions BR_i producing 70, 106, 110 and 119 contexts, respectively.
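The stated totals can be checked with simple arithmetic. The unreduced count is 10 × (1 + 2 + … + 128) = 2550. For 576, the text gives no breakdown; the sketch below shows one decomposition that reproduces it, under the assumption that the empty configuration is coded order-free for all eight bits, the nine other configurations use full contexts for b₀ to b₃, and BR_i covers b₄ to b₇ for those nine configurations.

```python
def total_contexts_unreduced(num_configs: int = 10, num_bits: int = 8) -> int:
    """Every configuration paired with every partial pattern of every bit."""
    return num_configs * sum(2 ** i for i in range(num_bits))


def total_contexts_method_3300(br_sizes=(70, 106, 110, 119)) -> int:
    """Assumed decomposition of the reduced total (see lead-in)."""
    nonempty_configs = 9
    normal = nonempty_configs * sum(2 ** i for i in range(4))  # b0..b3: 9 x 15 = 135
    empty = sum(i + 1 for i in range(8))                       # order-free: 1 + ... + 8 = 36
    reduced = sum(br_sizes)                                     # BR_i for b4..b7: 405
    return normal + empty + reduced


if __name__ == "__main__":
    print(total_contexts_unreduced(), total_contexts_method_3300())  # 2550 576
```

That 135 + 36 + 405 lands exactly on 576 suggests the decomposition is plausible, but it remains an inference from the numbers rather than a statement from the text.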

Context selection in a system with a fixed number of contexts

Each of the previously described context reduction operations may be further used in a compression system having a static (fixed) minimum number of contexts. In such a design, for a given symbol in an 8-bit binary pattern, one or more reduction operations are applied to determine a context probability model for encoding or decoding the symbol.

Impact on compression performance

Using 10 neighbor configurations with non-binary coding provides compression gains over the current implementation of the MPEG test model for point cloud coding. However, the cascaded binary coding proposed above, using 10 neighbor configurations and 2550 contexts, yields an even greater improvement in compression efficiency. Even when context reduction (such as the three techniques detailed above) brings the total down to 576 contexts, binary coding compresses slightly better than the non-binary implementation and much better than the test model. This observation has been consistent across different test point cloud data.

Referring now to fig. 14, a simplified block diagram of an example embodiment of an encoder 1100 is shown. The encoder 1100 includes a processor 1102, a memory 1104, and an encoding application 1106. The encoding application 1106 may include a computer program or application stored in the memory 1104 and containing instructions that, when executed, cause the processor 1102 to perform operations such as those described herein. For example, the encoding application 1106 may encode a bitstream and output an encoded bitstream according to the processes described herein. It should be appreciated that the encoding application 1106 may be stored on a non-transitory computer-readable medium such as an optical disc, a flash memory device, random access memory, a hard drive, and the like. When executing the instructions, the processor 1102 performs the operations and functions specified in the instructions so as to operate as a special-purpose processor that implements the described process(es). In some examples, this processor may be referred to as "processor circuitry".

Reference is now also made to fig. 15, which shows a simplified block diagram of an example embodiment of a decoder 1200. The decoder 1200 includes a processor 1202, a memory 1204, and a decoding application 1206. The decoding application 1206 may comprise a computer program or application stored in the memory 1204 and containing instructions that, when executed, cause the processor 1202 to perform operations such as those described herein. It is to be appreciated that the decoding application 1206 can be stored on a computer-readable medium, such as an optical disc, a flash memory device, random access memory, a hard drive, and so forth. When executing the instructions, the processor 1202 performs the operations and functions specified in the instructions so as to operate as a special-purpose processor implementing the described process(es). In some examples, this processor may be referred to as "processor circuitry".

It should be appreciated that a decoder and/or encoder in accordance with the present application may be implemented in a number of computing devices, including but not limited to servers, appropriately programmed general purpose computers, machine vision systems, and mobile devices. The decoder or encoder may be implemented by software containing instructions for configuring one or more processors to perform the functions described herein. The software instructions may be stored in any suitable non-transitory computer readable memory, including CD, RAM, ROM, flash memory, etc.

It will be appreciated that the decoders and/or encoders described herein, as well as the modules, routines, processes, threads or other software components implementing the described methods/processes for configuring an encoder or decoder, may be implemented using standard computer programming techniques and languages. The application is not limited to a particular processor, computer language, computer programming convention, data structure, or other such implementation details. Those skilled in the art will appreciate that the described processes may be implemented as part of computer-executable code stored in volatile or non-volatile memory, as part of an Application Specific Integrated Chip (ASIC), etc.

The present application also provides a computer readable signal encoding data generated by applying an encoding process according to the present application.

Certain adaptations and modifications of the described embodiments can be made. The embodiments discussed above are therefore to be considered in all respects as illustrative and not restrictive.
