Method and apparatus for inter-frame point cloud attribute encoding and decoding


Invented by Sehoon Yea, Arash Vosoughi, and Shan Liu (2020-03-20).

ABSTRACT

An inter-frame point cloud attribute encoding method includes calculating a motion estimation uncertainty value based on geometric data associated with a point cloud, and identifying at least one inter-frame nearest neighbor point cloud sample corresponding to the point cloud in response to determining that the motion estimation uncertainty value is less than a threshold. The method also includes ranking at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value, and extracting at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

1. A method of point cloud attribute coding, performed by at least one processor, the method comprising:

calculating a motion estimation uncertainty value based on geometric data associated with a point cloud;

in response to determining that the motion estimation uncertainty value is less than a threshold, identifying at least one inter-frame nearest neighbor point cloud sample corresponding to the point cloud;

ranking at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value; and

extracting at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

2. The method of claim 1, further comprising:

based on determining that the motion estimation uncertainty value exceeds the threshold, ceasing identification of the at least one inter-frame nearest neighbor point cloud sample.

3. The method of claim 1, wherein identifying the at least one inter-frame nearest neighbor point cloud sample comprises:

determining that at least one candidate nearest neighbor point cloud sample is an inter-frame sample;

determining a maximum allowed number of frames corresponding to offsets between the point cloud and the candidate nearest neighbor point cloud sample;

calculating a degree of change between the point cloud and the candidate nearest neighbor point cloud sample; and

selecting at least one sample of the candidate inter-frame nearest neighbor point cloud samples having the smallest offset and the lowest degree of change.

4. The method of claim 3, wherein identifying the at least one inter-frame nearest neighbor point cloud sample further comprises: reordering the selected nearest neighbor point cloud samples based on the motion estimation uncertainty value.

5. The method of claim 1, wherein the temporal candidate point with the shortest temporal distance to the point cloud is ranked first.

6. The method of claim 1, wherein inter-frame point cloud samples are removed from a weighted average based on determining that the motion estimation uncertainty value is above the threshold.

7. The method of claim 1, further comprising: calculating a distance-weighted average of an attribute value associated with a first point cloud, an attribute value associated with the at least one inter-frame nearest neighbor point cloud sample, and an attribute value associated with at least one intra-frame nearest neighbor sample value.

8. An apparatus for point cloud attribute coding, the apparatus comprising:

at least one memory for storing computer program code; and

at least one processor configured to access the at least one memory and operate in accordance with the computer program code, the computer program code comprising:

computing code for causing the at least one processor to compute a motion estimation uncertainty value based on geometric data associated with a point cloud;

identifying code for causing the at least one processor to identify at least one inter-frame nearest neighbor point cloud sample to which the point cloud corresponds in response to determining that the motion estimation uncertainty value is less than a threshold;

ordering code for causing the at least one processor to order at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value; and

extracting code for causing the at least one processor to extract at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

9. The apparatus of claim 8, further comprising:

stopping code for causing the at least one processor to stop identification of the at least one inter-frame nearest neighbor point cloud sample based on determining that the motion estimation uncertainty value exceeds the threshold.

10. The apparatus of claim 8, wherein the identifying code comprises:

first determining code for causing the at least one processor to determine that at least one candidate nearest neighbor point cloud sample is an inter-frame sample;

second determining code for causing the at least one processor to determine a maximum allowed number of frames corresponding to offsets between the point cloud and the candidate nearest neighbor point cloud samples;

computing code for causing the at least one processor to compute a degree of change between the point cloud and the candidate nearest neighbor point cloud sample; and

selecting code for causing the at least one processor to select at least one of the candidate inter-frame nearest neighbor point cloud samples having the minimum offset and the minimum degree of change.

11. The apparatus of claim 10, wherein the identifying code further comprises:

reordering code for causing the at least one processor to reorder the selected nearest neighbor point cloud samples based on the motion estimation uncertainty value.

12. The apparatus of claim 8, wherein the temporal candidate point with the shortest temporal distance to the point cloud is ranked first.

13. The apparatus of claim 8, wherein inter-frame point cloud samples are removed from a weighted average based on a determination that the motion estimation uncertainty value is above the threshold.

14. The apparatus of claim 8, further comprising:

computing code for causing the at least one processor to compute a distance weighted average of an attribute value associated with a first point cloud, an attribute value associated with the at least one inter-frame nearest neighbor point cloud, and an attribute value associated with at least one intra-frame nearest neighbor sample value.

15. A non-transitory computer-readable storage medium storing instructions that cause at least one processor to:

calculating a motion estimation uncertainty value based on geometric data associated with a point cloud;

in response to determining that the motion estimation uncertainty value is less than a threshold, identifying at least one inter-frame nearest neighbor point cloud sample to which the point cloud corresponds;

ranking at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample; and

extracting at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

16. The computer-readable storage medium of claim 15, wherein the instructions further cause the at least one processor to:

based on determining that the motion estimation uncertainty value exceeds the threshold, ceasing identification of the at least one inter-frame nearest neighbor point cloud sample.

17. The computer-readable storage medium of claim 15, wherein the instructions that cause the at least one processor to identify the at least one inter-frame nearest neighbor point cloud sample comprise instructions that cause at least one processor to:

determining that at least one candidate nearest neighbor point cloud sample is an inter-frame sample;

determining a maximum allowed number of frames corresponding to offsets between the point cloud and the candidate nearest neighbor point cloud sample;

calculating a degree of change between the point cloud and the candidate nearest neighbor point cloud sample; and

selecting at least one sample of the candidate inter-frame nearest neighbor point cloud samples having the smallest offset and the lowest degree of change.

18. The computer-readable storage medium of claim 17, wherein the instructions that cause the at least one processor to identify the at least one inter-frame nearest neighbor point cloud sample further comprise instructions that cause the at least one processor to reorder selected nearest neighbor point cloud samples based on the motion estimation uncertainty value.

19. The computer-readable storage medium of claim 15, wherein the temporal candidate point having the shortest temporal distance from the point cloud is ranked first.

20. The computer-readable storage medium of claim 15, wherein inter-frame point cloud samples are removed from a weighted average based on determining that the motion estimation uncertainty value is above the threshold.

FIELD

Methods and apparatus of embodiments relate to geometry-based point cloud compression (G-PCC), and in particular, to methods and apparatus for inter-frame point cloud attribute encoding and decoding.

BACKGROUND

Advanced three-dimensional (3D) representations of the world enable more immersive interaction and communication, and also enable machines to understand, interpret, and navigate the world. The 3D point cloud has emerged as a representative format for conveying such information. A number of use cases associated with point cloud data have been identified, and corresponding requirements for point cloud representation and compression have been developed.

A point cloud is a set of points in 3D space, each with associated attributes, such as color, material properties, and the like. A point cloud can be used to reconstruct an object or scene as a collection of such points. The points may be captured using multiple cameras and depth sensors in various environments, and a point cloud may consist of thousands to billions of points, allowing it to realistically represent the reconstructed scene.

Compression technologies are needed to reduce the amount of data required to represent a point cloud. Techniques for lossy compression of point clouds are therefore needed for use in real-time communications and six degrees of freedom (6DoF) virtual reality. In addition, techniques for lossless point cloud compression are needed in contexts such as dynamic mapping for autonomous driving and cultural heritage applications. The Moving Picture Experts Group (MPEG) has begun working on a compression standard addressing the compression of geometry and attributes (which may include, for example, color and reflectance), scalable/progressive coding, coding of sequences of point clouds captured over time, and random access to subsets of a point cloud.

FIG. 1A is a schematic diagram of a method for generating multiple levels of detail (LoD) in G-PCC.

As shown in fig. 1A, in the current G-PCC attribute codec, an LoD (i.e., a group) for each 3D point (e.g., P0-P9) is generated based on the distance of each 3D point, and the attribute values of the 3D points in each LoD are then encoded by applying prediction in the LoD-based order 110 rather than the original order 105. For example, an attribute value of the 3D point P2 is predicted by calculating a distance-based weighted average of the 3D points P0, P5, and P4, which precede the 3D point P2 in the encoding or decoding order.

The current anchor method in G-PCC proceeds as follows.

First, the variability of the neighborhood of a 3D point is computed to check how different the neighboring attribute values are. If the variability is below a threshold, the attribute value of the current point i is predicted by a distance-based linear interpolation over its nearest neighbors, i.e., a distance-based weighted average prediction. Let $(P_j)_{j \in 0 \ldots k-1}$ be the k nearest neighbor points of the current point i, let $(\tilde{a}_j)_{j \in 0 \ldots k-1}$ be their decoded/reconstructed attribute values, and let $(\delta_j)_{j \in 0 \ldots k-1}$ be their distances to the current point i. The predicted attribute value $\hat{a}_i$ is given by:

$$\hat{a}_i = \operatorname{Round}\left(\frac{\sum_{j \in 0 \ldots k-1} \tilde{a}_j / \delta_j^2}{\sum_{j \in 0 \ldots k-1} 1 / \delta_j^2}\right)$$

Note that the geometric locations of all the points are already known when the attributes are encoded. In addition, at both the encoder and the decoder, the neighboring points, together with their reconstructed attribute values, are available in a k-dimensional (k-d) tree structure that is used to perform the nearest neighbor search for each point in the same way.
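For illustration, this anchor prediction can be sketched in Python. This is a minimal sketch rather than the normative G-PCC procedure: it assumes scipy's cKDTree for the nearest neighbor search and uses the inverse-squared-distance weights and rounding from the formula above.

```python
import numpy as np
from scipy.spatial import cKDTree

def predict_attribute(points, attrs, current_idx, k=3):
    """Predict one point's attribute as the inverse-squared-distance
    weighted average of its k nearest neighbors (formula above)."""
    # A real codec searches only previously decoded points; here we
    # simply exclude the current point for illustration.
    mask = np.arange(len(points)) != current_idx
    tree = cKDTree(points[mask])                   # k-d tree over neighbors
    dists, idx = tree.query(points[current_idx], k=k)
    w = 1.0 / np.maximum(dists, 1e-9) ** 2         # 1 / delta_j^2, zero-guarded
    return np.round(np.sum(w * attrs[mask][idx]) / np.sum(w))

pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [2, 2, 2]], float)
att = np.array([10.0, 12.0, 11.0, 13.0, 40.0])
print(predict_attribute(pts, att, current_idx=3))  # weighted average of neighbors
```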

Second, if the variability is above the threshold, a rate-distortion optimized (RDO) predictor selection is performed. When the LoD is generated, multiple candidate predictors are created based on the result of the nearest neighbor search. For example, when the attribute value of the 3D point P2 is encoded using prediction, predictor index 0 is assigned to the distance-based weighted average of the 3D points P0, P5, and P4. Predictor index 1 is then assigned to the nearest neighbor point P4, and predictor indices 2 and 3 are assigned to the next nearest neighbor points P5 and P0, respectively, as shown in Table 1 below.

Table 1: samples of candidate predictors for attribute coding

Predictor index    Prediction value
0                  Distance-based weighted average
1                  P4 (1st nearest point)
2                  P5 (2nd nearest point)
3                  P0 (3rd nearest point)

After the candidate predictors are created, the best predictor is selected through a rate-distortion optimization process, and the selected predictor index is mapped to a truncated unary (TU) code whose bins are arithmetically encoded. Note that in Table 1, shorter TU codes are assigned to smaller predictor indices.

The maximum number of candidate predictors, MaxNumCand, is defined and encoded into the attribute header. In the current implementation, MaxNumCand is set equal to numberOfNearestNeighborsInPrediction + 1 and is used in encoding and decoding the predictor index with truncated unary binarization.
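As an illustration of the binarization, the hypothetical helper below maps a predictor index to its truncated unary bin string for a given MaxNumCand; in the codec, the resulting bins would then be arithmetically coded. The function is a sketch, not the normative binarizer.

```python
def truncated_unary(index, max_num_cand):
    """Truncated unary code: `index` ones followed by a terminating zero,
    except that the largest index (max_num_cand - 1) drops the zero."""
    assert 0 <= index < max_num_cand
    bins = [1] * index
    if index < max_num_cand - 1:
        bins.append(0)
    return bins

# With MaxNumCand = 4 (three nearest neighbors + 1 for the average),
# smaller predictor indices receive shorter codes, as noted above:
for i in range(4):
    print(i, truncated_unary(i, 4))  # 0->[0], 1->[1,0], 2->[1,1,0], 3->[1,1,1]
```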

The lifting transform (lifting transform) for attribute codec in G-PCC is built on top of the above-mentioned predictive transform. The main difference between the prediction scheme and the lifting scheme is the introduction of the update operator.

FIG. 1B is an architectural diagram of prediction/update (P/U) lifting in G-PCC. To facilitate the prediction and update steps in the lifting, the signal must be split into two sets with high correlation at each stage of the decomposition. In the lifting scheme of G-PCC, the splitting leverages the LoD structure, in which such high correlation is expected between levels and each level is constructed by nearest neighbor search, organizing the non-uniform point cloud into structured data. The P/U decomposition step at level N produces a detail signal D(N-1) and an approximation signal A(N-1), and A(N-1) is further decomposed into D(N-2) and A(N-2). This step is repeated until the base-layer approximation signal A(1) is obtained.

Consequently, instead of encoding the input attribute signal itself, which consists of LOD(N), …, LOD(1), the lifting scheme encodes D(N-1), …, D(1), and A(1). Note that applying efficient P/U steps typically yields sparse subband "coefficients" in D(N-1), …, D(1), providing a transform coding gain advantage.
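To make the decomposition concrete, the sketch below applies a generic one-dimensional P/U lifting step. G-PCC splits across LoD levels built by nearest neighbor search rather than by even/odd index, so this is a structural analogy under simplified assumptions (linear predictor, boundary wrap-around), not the G-PCC lifting itself.

```python
import numpy as np

def pu_lifting_step(signal):
    """One P/U decomposition level: split, predict details, update approximation."""
    even, odd = signal[0::2], signal[1::2]
    pred = (0.5 * (even + np.roll(even, -1)))[: len(odd)]  # P: linear predictor
    detail = odd - pred                                    # D: prediction residual
    approx = even.copy()
    approx[: len(detail)] += 0.25 * detail                 # U: smooth the evens
    return approx, detail

def decompose(signal, levels):
    """Repeat the P/U step until the base-layer approximation A(1) remains."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        approx, d = pu_lifting_step(approx)
        details.append(d)              # D(N-1), D(N-2), ..., D(1)
    return approx, details             # A(1) plus the detail subbands

a1, ds = decompose([4.0, 4.1, 4.0, 4.2, 8.0, 8.1, 8.0, 8.2], levels=2)
print(a1, [d.round(3) for d in ds])    # prints A(1) and the detail subbands
```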

Currently, the distance-based weighted mean prediction for the prediction transform described above is used as an anchor in G-PCC for the prediction step in the lifting mechanism.

In prediction and boosting of attribute codecs in G-PCC, the availability of neighbor attribute samples is important for compression efficiency, as more neighbor attribute samples may provide better prediction. Without enough neighbors for prediction, compression efficiency can suffer.


SUMMARY

According to various embodiments, an inter-frame point cloud attribute encoding method is performed by at least one processor and includes calculating a motion estimation uncertainty value based on geometric data associated with a point cloud, and identifying at least one inter-frame nearest neighbor point cloud sample to which the point cloud corresponds in response to determining that the motion estimation uncertainty value is less than a threshold. The method also includes ranking at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value and extracting at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

According to various embodiments, an apparatus for inter-frame point cloud attribute encoding includes at least one memory for storing computer program code, and at least one processor for accessing the at least one memory and operating in accordance with the computer program code. The computer program code includes computing code for causing the at least one processor to compute a motion estimation uncertainty value based on geometric data associated with the point cloud, and identifying code for causing the at least one processor to identify at least one inter-frame nearest neighbor point cloud sample corresponding to the point cloud in response to determining that the motion estimation uncertainty value is less than a threshold. The computer program code also includes sorting code to cause the at least one processor to sort at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value, and extracting code to cause the at least one processor to extract at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

According to various embodiments, a non-transitory computer-readable storage medium stores instructions that cause at least one processor to calculate a motion estimation uncertainty value based on geometric data associated with a point cloud and, in response to determining that the motion estimation uncertainty value is less than a threshold, identify at least one inter-frame nearest neighbor point cloud to which the point cloud corresponds. The instructions also cause the at least one processor to rank at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud and extract at least one sample attribute value from the at least one temporal candidate point, wherein the at least one sample attribute value corresponds to the geometric data.

Drawings

FIG. 1A is a schematic of the process for LoD formation in G-PCC.

FIG. 1B is a schematic diagram of the architecture of P/U boosting in G-PCC.

Fig. 2 is a block diagram of a communication system of some embodiments.

Fig. 3 is a schematic deployment of a G-PCC compressor and a G-PCC decompressor in an environment, in accordance with some embodiments.

Fig. 4 is a functional block diagram of a G-PCC compressor of some embodiments.

Fig. 5 is a functional block diagram of a G-PCC decompressor of some embodiments.

Fig. 6 is a flow diagram of an inter-frame point cloud attribute encoding method of some embodiments.

Fig. 7 is a block diagram of an apparatus for inter-frame point cloud attribute encoding of some embodiments.

FIG. 8 is a schematic diagram of a computer system suitable for implementing some embodiments.

Detailed Description

Embodiments described herein provide methods and apparatus for inter-frame point cloud attribute coding. In particular, in addition to attribute values within the same point cloud frame, attribute values from other point cloud frames at different time instants are used for prediction in G-PCC. The methods and apparatus may be used to improve the prediction step in differential pulse code modulation (DPCM, also referred to as the predictive transform) or the prediction step in lifting (also referred to as the lifting transform) in G-PCC. The methods and apparatus for spatio-temporal prediction can also be used in any codec with a similar structure. They may improve prediction performance, especially when point cloud samples are sparse in the current frame, by providing sample attribute values from corresponding locations in other frames.

Fig. 2 is a block diagram of a communication system 200 of various embodiments. The communication system 200 may include at least two terminals 210 and 220 interconnected via a network 250. For unidirectional transmission of data, the first terminal 210 may encode point cloud data at a local location for transmission to the second terminal 220 via the network 250. The second terminal 220 may receive the encoded point cloud data of the first terminal 210 from the network 250, decode the encoded point cloud data, and display the decoded point cloud data. Unidirectional data transmission may be common in media service applications and the like.

Fig. 2 further illustrates a second pair of terminals 230 and 240, the second pair of terminals 230 and 240 being configured to support bi-directional transmission of encoded point cloud data, such as during a video conference. For bi-directional transmission of data, each terminal 230 or 240 may encode point cloud data captured at a local location for transmission to another terminal via the network 250. Each terminal 230 or 240 may also receive encoded point cloud data transmitted by another terminal, may decode the encoded point cloud data, and may display the decoded point cloud data at a local display device.

In fig. 2, the terminals 210, 220, 230, and 240 may be depicted as servers, personal computers, and smart phones, although the principles of the embodiments are not so limited. Embodiments may be used in laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network 250 represents any number of networks that convey the encoded point cloud data among the terminals, including, for example, wired and/or wireless communication networks. The communication network 250 may exchange data over circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the internet. For purposes of the present description, the architecture and topology of the network 250 are not important to the operation of the embodiments unless explained below.

Fig. 3 is a schematic diagram of a deployment of G-PCC compressor 303 and G-PCC decompressor 310 in an environment, in accordance with various embodiments. The subject matter may be equally applicable to other applications that support point clouds, including, for example, video conferencing, digital television, applications that store compressed point cloud data on digital media, including CDs, DVDs, memory sticks, and the like.

The streaming media system 300 may include a capture subsystem 313. The capture subsystem 313 may include a point cloud source 301, such as a digital camera, that creates, for example, uncompressed point cloud data 302. The point cloud data 302, which has a high data volume, may be processed by a G-PCC compressor 303 coupled to the point cloud source 301. The G-PCC compressor 303 may include hardware, software, or a combination thereof to implement or embody aspects of the subject matter described in detail below. The encoded point cloud data 304, which has a lower data volume, may be stored on the streaming server 305 for future use. One or more streaming clients 306 and 308 may access the streaming server 305 to retrieve copies 307 and 309 of the encoded point cloud data 304. The client 306 may include a G-PCC decompressor 310. The G-PCC decompressor 310 decodes an incoming copy 307 of the encoded point cloud data and creates outgoing point cloud data 311. The outgoing point cloud data 311 may be rendered on a display 312 or other rendering device (not shown). In some streaming media systems, the encoded point cloud data 304, 307, and 309 may be encoded according to a video coding/compression standard. Examples of such standards include those developed by MPEG for G-PCC.

Fig. 4 is a functional block diagram of the G-PCC compressor 303 of various embodiments.

As shown in fig. 4, the G-PCC compressor 303 includes a quantizer 405, a point removal module 410, an octree encoder 415, an attribute transfer module 420, a LoD generator 425, a prediction module 430, a quantizer 435, and an arithmetic encoder 440.

The quantizer 405 receives the locations of points in the input point cloud. The position may be in (x, y, z) coordinates. Quantizer 405 further quantizes the received position using, for example, a scaling algorithm and/or a shifting algorithm.

The point removal module 410 receives the quantized positions from the quantizer 405 and removes or filters out duplicate positions from the received quantized positions.

Octree encoder 415 receives the filtered locations from the point removal module 410 and encodes the received filtered locations using an octree encoding algorithm to generate occupancy symbols that represent an octree of the input point cloud. The bounding box of the octree corresponding to the input point cloud may be any 3D shape, such as a cube.

Octree encoder 415 further reorders the received filtered positions based on the encoding of the filtered positions.

The attribute transfer module 420 receives attributes of points in the input point cloud. The attributes may include, for example, a color or RGB value and/or a reflectance of each point. The attribute transfer module 420 further receives the reordered locations from the octree encoder 415.

The attribute transfer module 420 further updates the received attributes based on the received reordered locations. For example, the attribute transfer module 420 may execute one or more preprocessing algorithms on the received attributes, including, for example, weighting and averaging the received attributes and interpolating additional attribute values from the received attributes. The attribute transfer module 420 further transfers the updated attributes to the prediction module 430.

The LoD generator 425 receives the reordered locations from the octree encoder 415 and obtains the LoD for each point corresponding to the received reordered locations. Each LoD may be considered a set of points and may be obtained based on the distance of each point. For example, as shown in fig. 1A, points P0, P5, P4, and P2 may be in LoD0 of LoD, points P0, P5, P4, P2, P1, P6, and P3 may be in LoD1 of LoD, and points P0, P5, P4, P2, P1, P6, P3, P9, P8, and P7 may be in LoD2 of LoD.

The prediction module 430 receives the transmitted attributes from the attribute transmission module 420 and the obtained LoD for each point from the LoD generator 425. The prediction module 430 obtains prediction residuals (values) of the received attributes respectively by performing a prediction algorithm on the received attributes in an order based on the received LoD of each point. The prediction algorithm may include any of a variety of prediction algorithms, such as interpolation, weighted average computation, nearest neighbor algorithm, and Rate Distortion Optimization (RDO).

For example, as shown in fig. 1A, the prediction residuals of the received attributes of the points P0, P5, P4, and P2 included in LoD0 may be obtained first, before the prediction residuals of the received attributes of the points P1, P6, P3, P9, P8, and P7 included in LoD1 and LoD2 are obtained. The prediction residual of the received attribute of the point P2 may be obtained by calculating a distance-based weighted average of the points P0, P5, and P4.

The quantizer 435 receives the obtained prediction residual from the prediction module 430 and quantizes the received prediction residual using, for example, a scaling algorithm and/or a shifting algorithm.

The arithmetic encoder 440 receives the occupancy symbols from the octree encoder 415 and the quantized prediction residual from the quantizer 435. The arithmetic encoder 440 performs arithmetic encoding on the received occupancy symbol and the quantized prediction residual to obtain a compressed bitstream. The arithmetic coding may include any of various entropy coding algorithms, such as context-adaptive binary arithmetic coding.
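The scaling/shifting quantization attributed to quantizer 405 and quantizer 435, and inverted by inverse quantizers 515 and 525 described below, can be sketched as follows. The step size, offset, and rounding rule are placeholder assumptions, not values from the standard.

```python
import numpy as np

def quantize(values, scale, offset=0.0):
    """Scaling/shifting quantization: shift, scale, round to integers."""
    return np.round((np.asarray(values, dtype=float) - offset) / scale).astype(int)

def dequantize(q, scale, offset=0.0):
    """Inverse scaling/shifting, as in the inverse quantizers."""
    return q * scale + offset

positions = np.array([[0.12, 3.47, 2.20]])
q = quantize(positions, scale=0.05)   # quantized positions fed to the octree
print(q, dequantize(q, scale=0.05))   # reconstruction is lossy by up to scale/2
```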

Fig. 5 is a functional block diagram of G-PCC decompressor 310 of various embodiments.

As shown in fig. 5, the G-PCC decompressor 310 includes an arithmetic decoder 505, an octree decoder 510, an inverse quantizer 515, a LoD generator 520, an inverse quantizer 525, and an inverse prediction module 530.

The arithmetic decoder 505 receives the compressed bitstream from the G-PCC compressor 303 and performs arithmetic decoding on the received compressed bitstream to obtain the occupancy symbols and the quantized prediction residuals. The arithmetic decoding may include any of various entropy decoding algorithms, such as context-adaptive binary arithmetic decoding.

The octree decoder 510 receives the obtained occupancy symbols from the arithmetic decoder 505 and decodes the received occupancy symbols using an octree decoding algorithm, resulting in quantized positions.

The inverse quantizer 515 receives the quantized locations from the octree decoder 510 and inverse quantizes the received quantized locations using, for example, a scaling algorithm and/or a shifting algorithm to obtain reconstructed locations of points in the input point cloud.

The LoD generator 520 receives the quantized positions from the octree decoder 510, and acquires the LoD of each point corresponding to the received quantized positions.

The inverse quantizer 525 receives the obtained quantized prediction residual and inverse quantizes the received quantized prediction residual using, for example, a scaling algorithm and/or a shifting algorithm to obtain a reconstructed prediction residual.

The inverse prediction module 530 receives the obtained reconstructed prediction residual from the inverse quantizer 525 and the obtained LoD for each point from the LoD generator 520. The inverse prediction module 530 obtains reconstruction properties of the received reconstructed prediction residuals by applying a prediction algorithm to the received reconstructed prediction residuals in order of the received LoD of each point. The prediction algorithm may include any of a variety of prediction algorithms, such as interpolation, weighted average calculation, nearest neighbor algorithm, and RDO. The reconstructed attribute is a reconstructed attribute of a point within the input point cloud.

Methods and apparatus for inter-frame point cloud attribute encoding are described in detail below. The methods and apparatus may be implemented in the G-PCC compressor 303 described above, specifically in the prediction module 430. The methods and apparatus may also be implemented in the G-PCC decompressor 310, specifically in the inverse prediction module 530.

Motion estimation and compensation

In some embodiments, geometry-based or joint geometry/attribute-based global/local motion estimation may be performed.

In detail, in the context of point cloud compression, when performing attribute codec, geometric information, such as the location of the point cloud, is known. Motion estimation may be performed using or in conjunction with this information to compensate for any local or global motion present in the current and reference frames.

Because performing motion estimation on sparse point cloud data may be difficult or unreliable, a motion estimation uncertainty metric, me_uncertainty, may be obtained as a byproduct of the motion estimation. For example, the motion estimation uncertainty metric may be based on the number of candidate target samples having similar motion matching scores, or on a threshold test of such scores. Each motion matching score may be obtained by a mechanism such as a block matching process.

The motion estimation uncertainty metric me_uncertainty, for example when greater than a predetermined threshold, may be used to disable or enable inter prediction, or may be used in determining scaling or weighting factors in the prediction.
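One plausible realization of this metric, counting candidate targets whose matching score nearly ties the best one, is sketched below; the scoring convention (lower is better), tolerance, and threshold are illustrative assumptions rather than the normative definition.

```python
import numpy as np

def me_uncertainty(match_scores, rel_tol=0.05):
    """Fraction of candidate targets whose block-matching score is within
    rel_tol of the best (lowest) score; many near-ties = ambiguous motion."""
    scores = np.asarray(match_scores, dtype=float)
    best = scores.min()
    near_ties = int(np.sum(scores <= best * (1.0 + rel_tol)))
    return (near_ties - 1) / max(len(scores) - 1, 1)

def inter_prediction_enabled(match_scores, threshold=0.5):
    """Disable inter prediction when the motion estimate is too ambiguous."""
    return me_uncertainty(match_scores) < threshold

print(inter_prediction_enabled([10.0, 30.0, 40.0]))        # True: clear winner
print(inter_prediction_enabled([10.0, 10.1, 10.2, 10.3]))  # False: many near-ties
```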

Modified nearest neighbor search

In some embodiments, prediction with nearest neighbor point cloud samples, performed in a manner similar to G-PCC, may consider neighboring samples from other frames as additional candidates.

The G-PCC design generates multiple LoD layers of the point cloud in the following manner. First, the original point cloud and the reference point cloud are sorted using a Morton code.

The original point cloud is then sampled sequentially from the top LoD layer to the bottom LoD layer according to the sampling distance. A nearest neighbor search is then performed for each point belonging to an LoD, and a neighbor list is built for each point, with geometrically closer samples appearing in the top portion of the list.
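A toy version of this construction, Morton ordering followed by distance-thresholded subsampling into LoD refinement layers, is sketched below; the bit depth, sampling distances, and brute-force distance checks are illustrative simplifications of the actual G-PCC procedure.

```python
import numpy as np

def morton3(p, bits=10):
    """Interleave the bits of non-negative integer coords into a Morton code."""
    code = 0
    for b in range(bits):
        for axis in range(3):
            code |= ((int(p[axis]) >> b) & 1) << (3 * b + axis)
    return code

def build_lods(points, dists=(4.0, 2.0, 0.0)):
    """Top-down LoD sampling: a point joins the current refinement layer only
    if it is at least `d` away from every point retained so far."""
    order = sorted(range(len(points)), key=lambda i: morton3(points[i]))
    retained, layers = [], []
    for d in dists:                                    # coarsest layer first
        layer = []
        for i in order:
            if i in retained or any(
                    np.linalg.norm(points[i] - points[j]) < d for j in retained):
                continue
            retained.append(i)
            layer.append(i)
        layers.append(layer)
    return layers                                      # LoD refinement layers

pts = np.random.default_rng(0).integers(0, 16, size=(12, 3)).astype(float)
print(build_lods(pts))
```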

In some embodiments, the following definitions further facilitate building a nearest neighbor list that includes inter-frame point cloud samples. A flag interframe is defined to indicate whether a nearest neighbor sample is an intra-frame or an inter-frame sample. A variable framenum is defined to indicate the frame number, or the offset in picture order count (POC), relative to the current point cloud frame. A maximum number of inter-frame nearest neighbor samples, maxminn, is defined.

Further, a notion of distance is defined for use whenever a new candidate point cloud sample is compared with the candidate point cloud samples already in the list. In the context of this hybrid intra/inter prediction, variables called temporal-to-spatial scale (TSScale) and temporal-to-spatial offset (TSOffset) are introduced to reflect the degree of change in attribute values that can occur between frames. If these values are large, the probability that the attribute values have changed is high, owing to temporal distance, possible fast motion, scene changes, and so on. In that case, proximity in 3D coordinates correlates only weakly with similarity of attribute values.

In some embodiments, TSScale may be used to scale up or down the relative proximity, in terms of spatial distance, of temporal candidates during the nearest neighbor search and in its later use for prediction, while TSOffset may be used to add an offset to the 3D coordinates of points in the reference frame when that frame is hypothetically "merged" with the current frame in order to select among mixed intra- and inter-frame cloud samples. This "merging" treats the inter-frame samples as samples in the current frame, in which case multiple candidate prediction samples may exist at the same 3D location.
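The hypothetical helper below shows one way TSScale and TSOffset could enter the candidate comparison when intra- and inter-frame samples are merged into a single list; the combination rule mirrors the weight formula given later in this description and is an assumption, not a normative definition.

```python
import numpy as np

def effective_sq_distance(p, cand):
    """Squared spatial distance, scaled and offset for inter-frame candidates.
    `cand` keys: pos, interframe (bool), ts_scale, ts_offset (all assumed)."""
    d2 = float(np.sum((np.asarray(p) - np.asarray(cand["pos"])) ** 2))
    if cand["interframe"]:
        return cand["ts_scale"] * d2 + cand["ts_offset"]  # temporal penalty
    return d2                        # intra-frame: TSScale = 1, TSOffset = 0

cands = [
    {"pos": [1, 0, 0], "interframe": False, "ts_scale": 1.0, "ts_offset": 0.0},
    {"pos": [0, 0, 0], "interframe": True,  "ts_scale": 2.0, "ts_offset": 0.5},
]
# The inter candidate sits at the query point itself, but its effective
# distance is TSOffset = 0.5; it still beats the intra candidate at d2 = 1.0.
cands.sort(key=lambda c: effective_sq_distance([0, 0, 0], c))
print([c["interframe"] for c in cands])  # [True, False]
```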

In some embodiments, an option is added to reorder the candidate nearest neighbor samples. The encoder (G-PCC compressor 303) may signal how the candidate list is constructed, or may reflect the temporal distance in the weighted average or weight calculation of the inter-frame candidate point cloud samples, based on the confidence of the motion estimation (i.e., me_uncertainty).

Application to the G-PCC predictive transform of attributes

1. RDO index coding and decoding

The above embodiments may be applied to the RDO-based predictor selection described above. Specifically, provision is made for assigning a higher priority to temporal candidates (interframe = 1) under certain conditions. In some embodiments, temporal candidates are placed earlier in the nearest neighbor list (i.e., assigned higher priority) when the motion estimation uncertainty metric me_uncertainty is low, and vice versa. This includes removing a temporal candidate from the nearest neighbor list when me_uncertainty is above a predetermined threshold. When there are multiple inter-frame candidates, candidate point cloud samples with a closer temporal distance (as indicated by framenum) are placed earlier in the list.
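A sketch of this priority rule: temporal candidates are promoted when me_uncertainty is low, dropped entirely when it exceeds a threshold, and ordered by framenum among themselves. The record layout and threshold value are illustrative assumptions.

```python
def order_candidates(cands, me_unc, unc_threshold=0.5):
    """Order the nearest-neighbor list for RDO predictor selection.
    Each candidate: interframe (bool), framenum (temporal offset), sq_dist."""
    if me_unc > unc_threshold:
        # Unreliable motion: remove temporal candidates from the list.
        return sorted((c for c in cands if not c["interframe"]),
                      key=lambda c: c["sq_dist"])
    # Reliable motion: temporal candidates first, smaller framenum first.
    return sorted(cands, key=lambda c: (not c["interframe"],
                                        abs(c["framenum"]), c["sq_dist"]))

cands = [
    {"interframe": False, "framenum": 0, "sq_dist": 1.0},
    {"interframe": True,  "framenum": 2, "sq_dist": 0.8},
    {"interframe": True,  "framenum": 1, "sq_dist": 0.9},
]
print([c["framenum"] for c in order_candidates(cands, me_unc=0.1)])  # [1, 2, 0]
print([c["framenum"] for c in order_candidates(cands, me_unc=0.9)])  # [0]
```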

The RDO encoder (G-PCC compressor 303) may track the inter-frame selections and perform adaptive switching of the index order. In addition, the number of inter-frame candidates, maxminn, may be adaptively changed according to the above conditions.

2. Distance-based mean prediction

The above embodiments may be applied to the distance-weighted mean prediction described above. Specifically, when the motion estimation uncertainty metric me_uncertainty is high, the inter-frame candidates are not included in the weighted average.

In various embodiments, the distance-weighted mean prediction $\hat{a}$ of the attribute value of the current point cloud sample is defined using the inter- and intra-frame nearest neighbor sample values $a_n$ ($n = 1, \ldots, N$) as:

$$\hat{a} = \frac{\sum_{n=1}^{N} w_n a_n}{\sum_{n=1}^{N} w_n}$$

where the weight of the n-th sample is:

$$w_n = \frac{1}{\text{TSScale}_n \cdot \lVert p - p_n \rVert^2 + \text{TSOffset}_n}$$

where p is the location of the current point cloud sample with attribute a, and $p_n$ is the location of the n-th neighboring sample with the corresponding attribute value $a_n$. The parameters TSScale and TSOffset are assigned as described above for inter-frame nearest neighbors; for intra-frame nearest neighbors, TSScale is set to 1 and TSOffset is set to 0.
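Transcribing this weighted mean directly into Python, under the assumption that each weight is the reciprocal of the scaled-and-offset squared distance as in the formula above; the neighbor record layout and the small epsilon guard are illustrative.

```python
import numpy as np

def weighted_mean_prediction(p, neighbors):
    """Distance-weighted mean over mixed intra/inter neighbors.
    Each neighbor: pos, attr, ts_scale, ts_offset (TSScale = 1,
    TSOffset = 0 for intra-frame neighbors, per the text above)."""
    num = den = 0.0
    for nb in neighbors:
        d2 = float(np.sum((np.asarray(p) - np.asarray(nb["pos"])) ** 2))
        w = 1.0 / (nb["ts_scale"] * d2 + nb["ts_offset"] + 1e-9)  # w_n
        num += w * nb["attr"]
        den += w
    return num / den

nbrs = [
    {"pos": [1, 0, 0], "attr": 10.0, "ts_scale": 1.0, "ts_offset": 0.0},  # intra
    {"pos": [0, 1, 0], "attr": 12.0, "ts_scale": 2.0, "ts_offset": 0.5},  # inter
]
print(weighted_mean_prediction([0, 0, 0], nbrs))  # inter neighbor weighs less
```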

Fig. 6 is a flow diagram of a method 600 of inter-frame point cloud attribute encoding, according to embodiments. In some embodiments, one or more of the process blocks of fig. 6 may be performed by G-PCC decompressor 310. In some embodiments, one or more of the process blocks of fig. 6 may be performed by another device or group of devices (e.g., G-PCC compressor 303) separate from or including G-PCC decompressor 310.

Referring to fig. 6, in a first block 610, the method 600 includes calculating a motion estimation uncertainty value based on geometric data associated with a point cloud.

In a second block 620, the method 600 includes, in response to determining that the motion estimation uncertainty value is less than the threshold, identifying at least one inter-frame nearest neighbor point cloud sample to which the point cloud corresponds.

In a third block 630, the method 600 includes sorting the at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value.

In a fourth block 640, the method 600 includes extracting at least one sample property value from at least one temporal candidate point, wherein the at least one sample property value corresponds to geometric data.

The method may further include, based on determining that the motion estimation uncertainty value exceeds the threshold, ceasing identification of the at least one inter-frame nearest neighbor point cloud sample.

Identifying the at least one inter-frame nearest neighbor point cloud sample may further include: determining that at least one candidate nearest neighbor point cloud sample is an inter-frame sample; determining a maximum allowed number of frames corresponding to an offset between the point cloud and the candidate nearest neighbor point cloud sample; calculating a degree of change between the point cloud and the candidate nearest neighbor point cloud sample; and selecting at least one of the candidate inter-frame nearest neighbor point cloud samples having the minimum offset and the minimum degree of change.

Identifying at least one inter-frame nearest neighbor point cloud sample may further comprise: reordering the selected nearest neighbor point cloud samples based on the motion estimation uncertainty value.

The method may further include ranking first the temporal candidate point having the shortest temporal distance from the point cloud.

The method may further include removing the inter-frame point cloud samples from the weighted average based on determining that the motion estimation uncertainty value is above the threshold.

The method may further include calculating a distance-weighted average of an attribute value associated with a first point cloud, an attribute value associated with the at least one inter-frame nearest neighbor point cloud sample, and an attribute value associated with at least one intra-frame nearest neighbor sample value.
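The flow of blocks 610 through 640, together with the early exit on high uncertainty, can be summarized by the following illustrative driver; every callable passed in is a hypothetical stand-in for the corresponding module described above, not part of the method itself.

```python
def inter_attribute_coding(geometry, threshold, estimate_uncertainty,
                           find_inter_nn, rank_candidates, extract_attributes):
    """Sketch of method 600; the four callables are hypothetical stand-ins."""
    me_unc = estimate_uncertainty(geometry)        # block 610
    if me_unc >= threshold:
        return None                                # cease identification
    samples = find_inter_nn(geometry)              # block 620
    candidates = rank_candidates(samples, me_unc)  # block 630
    return extract_attributes(candidates)          # block 640

# Toy usage with stub callables:
result = inter_attribute_coding(
    geometry=[[0, 0, 0]], threshold=0.5,
    estimate_uncertainty=lambda g: 0.1,
    find_inter_nn=lambda g: ["nn1", "nn0"],
    rank_candidates=lambda s, u: sorted(s),
    extract_attributes=lambda c: {"attrs": c})
print(result)
```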

Although fig. 6 shows example blocks of the method 600, in some implementations, the method 600 may include more blocks, fewer blocks, different blocks, or a different arrangement of blocks than those depicted in fig. 6. Additionally or alternatively, two or more blocks of method 600 may be performed in parallel.

Further, the proposed method may be implemented by hardware modules or processing circuits (e.g. one or more processors or one or more integrated circuits). For example, at least one processor executes a program stored in a non-transitory computer readable medium to implement at least one of the proposed methods.

Fig. 7 is a schematic block diagram of an apparatus 700 for inter-frame point cloud attribute coding of some embodiments.

Referring to FIG. 7, apparatus 700 includes computation code 710, identification code 720, sorting code 730, and extraction code 740.

The computing code 710 is for causing the at least one processor to compute a motion estimation uncertainty value based on geometric data associated with a point cloud.

The identifying code 720 is for causing the at least one processor to identify at least one inter-frame nearest neighbor point cloud sample to which the point cloud corresponds in response to determining that the motion estimation uncertainty value is less than the threshold.

The ranking code 730 is for causing the at least one processor to rank at least one temporal candidate point associated with the identified at least one inter-frame nearest neighbor point cloud sample based on the motion estimation uncertainty value.

The extraction code 740 is configured to cause the at least one processor to extract at least one sample property value from the at least one temporal candidate point, wherein the at least one sample property value corresponds to the geometric data.

The computing code 710 may be further for causing the at least one processor to compute a distance-weighted average of an attribute value associated with a first point cloud, an attribute value associated with the at least one inter-frame nearest neighbor point cloud, and an attribute value associated with at least one intra-frame nearest neighbor sample value.

The computing code 710 may be further for causing the at least one processor to remove the interframe point cloud samples from the weighted average based on determining that the motion estimation uncertainty value is above the threshold.

The identifying code 720 may be further for causing the at least one processor to: determine that at least one candidate nearest neighbor point cloud sample is an inter-frame sample; determine a maximum allowed number of frames corresponding to an offset between the point cloud and the candidate nearest neighbor point cloud sample; calculate a degree of change between the point cloud and the candidate nearest neighbor point cloud sample; and select at least one of the candidate inter-frame nearest neighbor point cloud samples having the minimum offset and the minimum degree of change.

The identifying code 720 may be further for causing the at least one processor to reorder the selected nearest neighbor point cloud samples based on the motion estimation uncertainty value.

The identifying code 720 may be further for causing the at least one processor to cease identifying the at least one inter-frame nearest neighbor point cloud sample based on determining that the motion estimation uncertainty value exceeds the threshold.

The ranking code 730 may be further for causing the at least one processor to rank first the temporal candidate point having the shortest temporal distance from the point cloud.

FIG. 8 is a schematic diagram of a computer system 800 suitable for implementing various embodiments.

The computer software may be encoded using any suitable machine code or computer language and may employ assembly, compilation, linking or similar mechanisms to generate instruction code. These instruction codes may be directly executed by a computer Central Processing Unit (CPU), a Graphics Processing Unit (GPU), etc. or executed through code interpretation, microcode execution, etc.

The instructions may be executed in various types of computers or computer components, including, for example, personal computers, tablets, servers, smart phones, gaming devices, internet of things devices, and so forth.

The components of computer system 800 illustrated in FIG. 8 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present application. Nor should the configuration of components be interpreted as having any dependency on, or requirement for, any one component or combination of components illustrated in the exemplary computer system 800.

The computer system 800 may include some human interface input devices. A human interface input device may be responsive to one or more human users' inputs, such as tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), or olfactory input (not shown). The human interface devices may also be used to capture media information that is not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sounds), images (e.g., scanned images, photographic images taken with a still-image camera), and video (e.g., two-dimensional video, and three-dimensional video including stereoscopic video).

The human interface input device may include one or more of the following (only one of each shown): keyboard 801, mouse 802, touch pad 803, touch screen 810, joystick 805, microphone 806, scanner 807, camera 808.

The computer system 800 may also include a number of human interface output devices. The human interface output devices may stimulate one or more human users' senses, for example through tactile output, sound, light, and smell/taste. The human interface output devices may include tactile output devices (e.g., tactile feedback from the touch screen 810 or joystick 805, although tactile feedback devices that are not input devices may also be present). Such devices may be, for example, audio output devices (e.g., speakers 809, headphones (not shown)), visual output devices (e.g., screens 810, including CRT, LCD, plasma, and OLED screens, each with or without touch-screen input capability and each with or without tactile feedback capability, some of which are capable of two-dimensional visual output or more-than-three-dimensional output by means such as stereoscopic output, coupled to the system bus 848 by a graphics adapter 850), virtual reality glasses (not shown), holographic displays and smoke tanks (not shown), and printers (not shown).

The computer system 800 may also include human-accessible storage devices and their associated media, such as optical media 821 including a CD/DVD ROM/RW drive 820 with CD/DVD or similar media, thumb drives 822, removable hard drives or solid state drives 823, legacy magnetic media such as magnetic tapes and floppy disks (not shown), specialized ROM/ASIC/PLD-based devices such as security dongles (not shown), and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium," as used in connection with the subject matter disclosed herein, does not encompass transmission media, carrier waves, or other transitory signals.

The computer system 800 may also include an interface to which one or more communication networks 855 may be connected. The communication network 855 may be, for example, a wireless network, a wired network, an optical network. The communications network 855 may also be a local area network, a wide area network, a metropolitan area network, a vehicle networking and industrial network, a real-time network, a delay tolerant network, and the like. Examples of communication networks 855 include local area networks (e.g., ethernet, wireless LAN, cellular networks including GSM, 3G, 4G, 5G, LTE, etc.), TV wired or wireless wide area digital networks (including cable TV, satellite TV, and terrestrial broadcast TV), car networking and industrial networks (including CAN bus), and so forth. Some communication networks 855 typically require external network interface adapters connected to some general purpose data port or peripheral bus 849 (e.g., a USB port of computer system 800); other networks are typically integrated within the core of computer system 800 by connecting to a system bus as described below (e.g., network interface 854 includes an ethernet interface integrated within a PC computer system or a cellular network interface integrated within a smart phone computer system). Computer system 800 may communicate with other entities using any of networks 855. The communication may be a one-way communication, e.g. receive only (e.g. broadcast TV), one-way transmit only (e.g. from CAN-bus to some CAN-bus devices). The communication may also be two-way, such as with other computer systems using a local or wide area digital network. Each of the network 855 and the network interface 854 described above may employ certain protocols and protocol stacks.

The aforementioned human interface device, human accessible storage device, and network interface 854 may be connected to the core 840 of the computer system 800.

The core 840 may include one or more central processing units (CPUs) 841, graphics processing units (GPUs) 842, special-purpose programmable processing units in the form of field programmable gate arrays (FPGAs) 843, hardware accelerators 844 for specific tasks, and the like. These devices, along with read-only memory (ROM) 845, random access memory (RAM) 846, and internal mass storage 847 such as internal non-user-accessible hard drives and SSDs, may be connected to the system bus 848. In some computer systems, the system bus 848 may be accessible in the form of one or more physical plugs, enabling extension with additional CPUs, GPUs, and the like. Peripheral devices may be connected either directly to the core's system bus 848 or through the peripheral bus 849. Architectures for the peripheral bus include PCI, USB, and the like.

The CPU 841, GPU 842, FPGA 843, and hardware accelerator 844 may execute instructions that, in combination, may constitute the aforementioned computer code. The computer code may be stored in ROM 845 or RAM 846. Intermediate data may also be stored in RAM 846, and permanent data may be stored in, for example, an internal mass storage device 847. Fast storage and retrieval of any storage device may be achieved through the use of cache devices that may be closely associated with one or more of CPU 841, GPU 842, mass storage 847, ROM 845, RAM 846, and the like.

Computer code may be stored on the computer readable medium for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, the functionality of the computer system 800 having the illustrated architecture, and in particular the core 840, may result from a processor (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. The computer-readable media may be the media associated with the user-accessible mass storage described above, as well as certain non-transitory storage of the core 840, such as the core-internal mass storage 847 or the ROM 845. Software implementing the embodiments disclosed herein may be stored in such devices and executed by the core 840. The computer-readable medium may include one or more memory devices or chips, as needed. The software may cause the core 840, and in particular the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform the processes described herein, or portions thereof, including defining data structures stored in the RAM 846 and modifying those data structures according to the software-defined processes. Additionally or alternatively, the functionality of the computer system may be provided by circuitry (e.g., the accelerator 844) that is hardwired or otherwise embodied with logic. The circuitry, when in operation, may be used in place of or in combination with software to perform the processes, or portions of the processes, described herein. Where appropriate, references to software may encompass logic, and vice versa. Where appropriate, references to a computer-readable medium may encompass a circuit storing software for execution (e.g., an integrated circuit (IC)), a circuit embodying logic for execution, or both. The present disclosure encompasses any suitable combination of hardware and software.

While this disclosure has described certain non-limiting embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles disclosed herein and are thus within the spirit and scope of the present disclosure.
