Method and device for encoding and decoding interframe point cloud attributes

Publication No.: 55012 · Published: 2021-09-28

Note: This technology, 帧间点云属性编解码的方法和装置 (Method and device for encoding and decoding inter-frame point cloud attributes), was created by Sehoon Yea (芮世薰), Arash Vosoughi (阿拉什·沃索基), and Shan Liu (刘杉) on 2020-03-20. Abstract: A method of encoding and decoding an inter-frame point cloud attribute, performed by at least one processor, includes: obtaining, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to the ratio of the number of first point cloud samples in the target frame to the number of point cloud samples in the target frame, wherein the first point cloud samples respectively have corresponding second point cloud samples in an inter-frame reference frame. The method further includes determining whether the obtained motion estimation unreliability metric is greater than a preset threshold; skipping motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and performing the motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

1. A method for encoding and decoding an inter-frame point cloud attribute, performed by at least one processor, comprising:

obtaining, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples respectively have corresponding second point cloud samples in an inter-frame reference frame;

determining whether the obtained motion estimation unreliability metric is greater than a preset threshold;

skipping motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and

performing the motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

2. The method of claim 1, wherein a larger value of the motion estimation unreliability metric indicates a larger uncertainty of the motion estimation, and wherein a smaller value of the motion estimation unreliability metric indicates a smaller uncertainty of the motion estimation.

3. The method of claim 1, further comprising: obtaining the second point cloud sample using a nearest neighbor search algorithm.

4. The method of claim 1, further comprising: based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold, skipping prediction of an attribute of a point in one of a plurality of point cloud samples included in the target frame.

5. The method of claim 1, further comprising: performing a prediction of an attribute of a point in one of a plurality of point cloud samples included in the motion-compensated target frame.

6. The method of claim 5, wherein the attribute comprises one or both of a color value and a reflectance value of the point.

7. The method of claim 5, wherein the performing the prediction comprises performing a predictive transform or a lifting transform on the attribute.

8. An apparatus for encoding and decoding an inter-frame point cloud attribute, comprising:

at least one memory for storing computer program code; and

at least one processor configured to access the at least one memory and operate in accordance with the computer program code, the computer program code comprising:

obtaining code for causing the at least one processor to obtain, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples each have a corresponding second point cloud sample in an inter-frame reference frame;

identifying code for causing the at least one processor to determine whether the obtained motion estimation unreliability metric is greater than a preset threshold;

skipping code for causing the at least one processor to skip motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and

executing code for causing the at least one processor to perform motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

9. The apparatus of claim 8, wherein a larger value of the motion estimation unreliability metric indicates a larger uncertainty of the motion estimation, and a smaller value of the motion estimation unreliability metric indicates a smaller uncertainty of the motion estimation.

10. The apparatus of claim 8, wherein the obtaining code is further configured to cause the at least one processor to obtain the second point cloud sample using a nearest neighbor search algorithm.

11. The apparatus of claim 8, wherein the skipping code is further for causing the at least one processor to skip prediction of an attribute of a point in one of a plurality of point cloud samples included in the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold.

12. The apparatus of claim 8, wherein the executing code is further configured to cause the at least one processor to perform prediction of an attribute of a point in one of a plurality of point cloud samples included in the target frame.

13. The apparatus of claim 12, wherein the attribute comprises one or both of a color value and a reflectance value of the point.

14. The apparatus of claim 12, wherein the executing code is further configured to cause the at least one processor to perform a predictive transform or a lifting transform on the attribute.

15. A non-transitory computer-readable storage medium storing instructions that, when executed, cause at least one processor to:

obtain, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples respectively have corresponding second point cloud samples in an inter-frame reference frame;

determine whether the obtained motion estimation unreliability metric is greater than a preset threshold;

skip motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and

perform the motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

16. The non-transitory computer-readable storage medium of claim 15, wherein a larger value of the motion estimation unreliability metric indicates a greater uncertainty in the motion estimation, and a smaller value of the motion estimation unreliability metric indicates a smaller uncertainty of the motion estimation.

17. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further cause the at least one processor to obtain the second point cloud sample using a nearest neighbor search algorithm.

18. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further cause the at least one processor to skip predicting an attribute of a point in one of a plurality of point cloud samples included in the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold.

19. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further cause the at least one processor to perform prediction of an attribute of a point in one of a plurality of point cloud samples included in the target frame.

20. The non-transitory computer readable storage medium of claim 19, wherein the attribute comprises one or both of a color value and a reflectance value of the point.

FIELD

Methods and apparatus of embodiments relate to geometry-based point cloud compression (G-PCC), and in particular, to methods and apparatus for inter-frame point cloud attribute coding and decoding.

BACKGROUND

Advanced three-dimensional (3D) representations of the world enable more immersive forms of interaction and communication, and also allow machines to understand, interpret, and navigate the world. The 3D point cloud has emerged as a representative format for such information. A number of use cases associated with point cloud data have been identified, and corresponding requirements for point cloud representation and compression have been developed.

A point cloud is a set of points in 3D space, each with associated attributes such as color, material properties, and the like. A point cloud can be used to reconstruct an object or a scene as a composition of such points. The points can be captured using multiple cameras and depth sensors in various environments, and a point cloud may consist of thousands up to billions of points, allowing it to realistically represent the reconstructed scene.

Compression technologies are needed to reduce the amount of data required to represent a point cloud. Techniques for lossy compression of point clouds are therefore needed for use in real-time communications and six degrees of freedom (6 DoF) virtual reality. In addition, techniques for lossless point cloud compression are needed in the context of dynamic mapping for autonomous driving, cultural heritage applications, and the like. The Moving Picture Experts Group (MPEG) has begun working on a compression standard addressing geometry and attributes (which may include, for example, color and reflectance), scalable/progressive coding, coding of point cloud sequences captured over time, and random access to subsets of a point cloud.

FIG. 1A is a schematic diagram of a method for generating multiple levels of detail (LoD) in G-PCC.

As shown in FIG. 1A, in the current G-PCC attribute coding, the LoD (i.e., group) of each 3D point (e.g., P0-P9) is generated based on the distance of each 3D point, and the attribute values of the 3D points in each LoD are then encoded by applying prediction in the LoD-based order 110, instead of the original order 105. For example, an attribute value of the 3D point P2 is predicted by calculating a distance-based weighted average of the 3D points P0, P5, and P4, which precede P2 in encoding or decoding order.

The current anchor method in G-PCC proceeds as follows.

First, the variability of a neighborhood of a 3D point is calculated to check how different the neighboring values are; if the variability is lower than a threshold, a distance-based weighted average prediction is performed: the attribute values $(a_i)_{i \in 0 \ldots k-1}$ are predicted using a linear interpolation process based on the distances of the nearest neighbors of the current point $i$. Let $\mathcal{N}_i$ be the set of the $k$ nearest neighbors of the current point $i$, let $(\tilde{a}_j)_{j \in \mathcal{N}_i}$ be their decoded/reconstructed attribute values, and let $(\delta_j)_{j \in \mathcal{N}_i}$ be their distances to the current point $i$. The predicted attribute value $\hat{a}_i$ is then given by:

$$\hat{a}_i = \operatorname{Round}\!\left(\frac{\sum_{j \in \mathcal{N}_i} \frac{1}{\delta_j^2}\,\tilde{a}_j}{\sum_{j \in \mathcal{N}_i} \frac{1}{\delta_j^2}}\right)$$

Note that when the attributes are encoded, the geometric locations of all the points are already known. In addition, the neighbor points, together with their reconstructed attribute values, are available at both the encoder and the decoder, and a k-dimensional (k-d) tree structure is used to support the nearest neighbor search for each point in the same manner.
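For illustration, the anchor prediction above can be sketched in a few lines of NumPy (non-normative; function and variable names are illustrative, and duplicate positions are assumed to have been removed so that all distances are nonzero):

```python
import numpy as np

def predict_attribute(neighbor_pos, neighbor_attr, current_pos, k=3):
    # Squared Euclidean distances from the current point to each decoded candidate.
    d2 = np.sum((neighbor_pos - current_pos) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]     # indices of the k nearest neighbors
    w = 1.0 / d2[nearest]            # inverse squared-distance weights 1/delta^2
    # Weighted mean of the reconstructed neighbor attributes, then rounding.
    return np.round((w[:, None] * neighbor_attr[nearest]).sum(axis=0) / w.sum())
```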

Second, if the variability is higher than the threshold, a rate-distortion optimized (RDO) predictor selection is performed. When the LoD is generated, multiple predictor candidates are created based on the result of the nearest neighbor search. For example, when the attribute value of the 3D point P2 is encoded using prediction, the predictor index corresponding to the distance-based weighted average of the 3D points P0, P5, and P4 is set to 0, the predictor index corresponding to the nearest neighbor point P4 is set to 1, and the predictor indices corresponding to the next nearest neighbor points P5 and P0 are set to 2 and 3, respectively, as shown in Table 1 below.

Table 1: samples of candidate predictors for attribute coding

Predictor index Prediction value
0 Mean value of
1 P4 (1 st nearest point)
2 P5 (2 nd nearest point)
3 P0 (3 rd nearest point)

After the predictor candidates are created, the best predictor is selected through a rate-distortion optimization procedure, and the selected predictor index is then mapped to a truncated unary (TU) code, the bins of which are arithmetically encoded. Note that, per Table 1, a shorter TU code is assigned to a smaller predictor index.

The maximum number of predictor candidates, MaxNumCand, is defined and encoded into the attribute header. In the current implementation, MaxNumCand is set equal to numberOfNearestNeighborsInPrediction + 1 and is used in encoding and decoding the predictor index with truncated unary binarization.
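For illustration, a non-normative sketch of truncated unary binarization consistent with the description above (the resulting bins would subsequently be arithmetically encoded; the normative G-PCC binarization may differ in detail):

```python
def tu_binarize(index, max_num_cand):
    # `index` ones, then a terminating zero unless the maximum value is reached,
    # so smaller predictor indices receive shorter codewords.
    assert 0 <= index < max_num_cand
    bits = [1] * index
    if index < max_num_cand - 1:
        bits.append(0)
    return bits

# With MaxNumCand = 4: index 0 -> [0], index 1 -> [1, 0], index 3 -> [1, 1, 1]
```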

The lifting transform (lifting transform) for attribute codec in G-PCC is built on top of the above-mentioned predictive transform. The main difference between the prediction scheme and the lifting scheme is the introduction of the update operator.

FIG. 1B is an architectural diagram of Prediction/Update (P/U) lifting in G-PCC. To facilitate the prediction and update steps in lifting, the signal must be split into two sets of high correlation at each stage of the decomposition. In the lifting scheme of G-PCC, the splitting is performed by leveraging the LoD structure, in which such high correlation is expected between levels and each level is constructed by nearest neighbor search, organizing non-uniform point clouds into structured data. The P/U decomposition step at level N produces a detail signal D(N-1) and an approximation signal A(N-1), and A(N-1) is then further decomposed into D(N-2) and A(N-2). This step is applied repeatedly until the base-layer approximation signal A(1) is obtained.

Finally, instead of encoding the input attribute signal itself, which consists of LOD(N), …, LOD(1), the lifting scheme encodes D(N-1), …, D(1), A(1). Note that applying efficient P/U steps typically leads to sparse subband "coefficients" in D(N-1), …, D(1), which provides a transform coding gain.
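For illustration, one forward P/U lifting level is sketched below on a one-dimensional signal. G-PCC splits by LoD with nearest-neighbor-based prediction rather than by even/odd position, and uses its own prediction and update weights, so the trivial predictor and the 0.5 update weight here are assumptions:

```python
import numpy as np

def pu_lift_forward(a):
    even = a[0::2].astype(float)        # plays the role of the approximation A(N-1)
    odd = a[1::2].astype(float)         # the split-off set to be predicted
    detail = odd - even[:len(odd)]      # P step: detail signal D(N-1)
    even[:len(detail)] += 0.5 * detail  # U step: feed the detail back into A(N-1)
    return even, detail                 # recurse on A(N-1) to get D(N-2), ..., A(1)
```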

Currently, the distance-based weighted mean prediction for the prediction transform described above is used as an anchor in G-PCC for the prediction step in the lifting mechanism.

In both the predicting and lifting schemes for attribute coding in G-PCC, the availability of neighboring attribute samples is important for compression efficiency, since more neighboring attribute samples can provide better prediction. If there are not enough neighbors to predict from, compression efficiency suffers.

SUMMARY

According to various embodiments, a method of inter-frame point cloud attribute coding is performed by at least one processor and includes obtaining, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples respectively have corresponding second point cloud samples in an inter-frame reference frame. The method further includes determining whether the obtained motion estimation unreliability metric is greater than a preset threshold; skipping motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and performing the motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

In various embodiments, an apparatus for encoding and decoding an inter-frame point cloud attribute includes at least one memory for storing computer program code and at least one processor for accessing the at least one memory and operating in accordance with the computer program code. The computer program code includes obtaining code for causing the at least one processor to obtain, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples each have a corresponding second point cloud sample in an inter-frame reference frame. The computer program code further includes identifying code for causing the at least one processor to determine whether the obtained motion estimation unreliability metric is greater than a preset threshold; skipping code for causing the at least one processor to skip motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and executing code for causing the at least one processor to perform motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

A non-transitory computer-readable storage medium storing instructions that cause at least one processor to obtain, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples each have a second point cloud sample in an inter-frame reference frame. The instructions further cause the at least one processor to determine whether the obtained motion estimation unreliability metric is greater than a preset threshold; skipping motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than the preset threshold; and performing the motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

DRAWINGS

FIG. 1A is a schematic diagram of the process of LoD generation in G-PCC.

FIG. 1B is a schematic diagram of the architecture of P/U lifting in G-PCC.

Fig. 2 is a block diagram of a communication system of some embodiments.

Fig. 3 is a schematic diagram of the deployment of a G-PCC compressor and a G-PCC decompressor in an environment, in accordance with some embodiments.

Fig. 4 is a functional block diagram of a G-PCC compressor of some embodiments.

Fig. 5 is a functional block diagram of a G-PCC decompressor of some embodiments.

Fig. 6 is a flow diagram of an inter-frame point cloud attribute encoding method of some embodiments.

Fig. 7 is a block diagram of an apparatus for inter-frame point cloud attribute encoding of some embodiments.

FIG. 8 is a schematic diagram of a computer system suitable for implementing some embodiments.

DETAILED DESCRIPTION

Embodiments described herein provide a method and apparatus for inter-frame point cloud attribute coding. In particular, in addition to using attribute values within the same point cloud frame, attribute values from other point cloud frames at different time instants are used for prediction in G-PCC. The method and apparatus may be used to improve prediction (also referred to as predictive transform) in Differential Pulse Code Modulation (DPCM) or lifting prediction step (also referred to as lifting transform) in G-PCC. The method and apparatus for spatio-temporal prediction can also be used for any codec with a similar structure. The method and apparatus may improve prediction performance, especially when point cloud samples are sparse in the current frame, by providing sample attribute values from corresponding locations in other frames.

Fig. 2 is a block diagram of a communication system 200 of various embodiments. The communication system 200 may include at least two terminals 210 and 220 interconnected via a network 250. For unidirectional transmission of data, the first terminal 210 may encode point cloud data at a local location for transmission to the second terminal 220 via the network 250. The second terminal 220 may receive the encoded point cloud data of the first terminal 210 from the network 250, decode the encoded point cloud data, and display the decoded point cloud data. Unidirectional data transmission may be common in media service applications and the like.

Fig. 2 further illustrates a second pair of terminals 230 and 240, the second pair of terminals 230 and 240 being configured to support bi-directional transmission of encoded point cloud data, such as during a video conference. For bi-directional transmission of data, each terminal 230 or 240 may encode point cloud data captured at a local location for transmission to another terminal via the network 250. Each terminal 230 or 240 may also receive encoded point cloud data transmitted by another terminal, may decode the encoded point cloud data, and may display the decoded point cloud data at a local display device.

In FIG. 2, the terminals 210, 220, 230, and 240 may be illustrated as servers, personal computers, and smart phones, although the principles of the embodiments are not limited thereto. Embodiments may be used in laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network 250 represents any number of networks that convey the encoded point cloud data among the terminals, including, for example, wired and/or wireless communication networks. The communication network 250 may exchange data over circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the internet. For purposes of the present description, the architecture and topology of the network 250 are immaterial to the operation of the embodiments, unless explained below.

Fig. 3 is a schematic diagram of the deployment of a G-PCC compressor 303 and a G-PCC decompressor 310 in an environment, in accordance with various embodiments. The disclosed subject matter can be equally applicable to other point cloud enabled applications, including, for example, video conferencing, digital television, and storing compressed point cloud data on digital media including CDs, DVDs, memory sticks, and the like.

The streaming system 300 may include a capture subsystem 313. The capture subsystem 313 may include a point cloud source 301, for example a digital camera, that creates, for example, uncompressed point cloud data 302. The point cloud data 302, which has a higher data volume, may be processed by the G-PCC compressor 303 coupled to the point cloud source 301. The G-PCC compressor 303 may include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded point cloud data 304, which has a lower data volume, may be stored on a streaming server 305 for future use. One or more streaming clients 306 and 308 may access the streaming server 305 to retrieve copies 307 and 309 of the encoded point cloud data 304. The client 306 may include a G-PCC decompressor 310, which decodes an incoming copy 307 of the encoded point cloud data and creates outgoing point cloud data 311 that may be rendered on a display 312 or other rendering device (not shown). In some streaming systems, the encoded point cloud data 304, 307, and 309 may be encoded according to video coding/compression standards. Examples of such standards include those being developed by MPEG for G-PCC.

Fig. 4 is a functional block diagram of the G-PCC compressor 303 of various embodiments.

As shown in fig. 4, the G-PCC compressor 303 includes a quantizer 405, a point removal module 410, an octree encoder 415, an attribute transfer module 420, a LoD generator 425, a prediction module 430, a quantizer 435, and an arithmetic encoder 440.

The quantizer 405 receives the locations of points in the input point cloud. The position may be in (x, y, z) coordinates. Quantizer 405 further quantizes the received position using, for example, a scaling algorithm and/or a shifting algorithm.

The point removal module 410 receives the quantized positions from the quantizer 405 and removes or filters out duplicate positions from the received quantized positions.

The octree encoder 415 receives the filtered positions from the point removal module 410 and encodes the received filtered positions using an octree encoding algorithm to generate occupancy symbols representing an octree of the input point cloud. The bounding box of the octree corresponding to the input point cloud may be any 3D shape, for example, a cube.

Octree encoder 415 further reorders the received filtered positions based on the encoding of the filtered positions.

The attribute transfer module 420 receives attributes of points in the input point cloud. The attributes may include, for example, a color or RGB value and/or a reflectance of each point. The attribute transfer module 420 further receives the reordered positions from the octree encoder 415.

The attribute transfer module 420 further updates the received attributes based on the received reordered positions. For example, the attribute transfer module 420 may perform one or more preprocessing algorithms on the received attributes, including, for example, weighting and averaging the received attributes and interpolating additional attributes from the received attributes. The attribute transfer module 420 further transfers the updated attributes to the prediction module 430.

The LoD generator 425 receives the reordered locations from the octree encoder 415 and obtains the LoD for each point corresponding to the received reordered locations. Each LoD may be considered a set of points and may be obtained based on the distance of each point. For example, as shown in fig. 1A, points P0, P5, P4, and P2 may be in LoD0 of LoD, points P0, P5, P4, P2, P1, P6, and P3 may be in LoD1 of LoD, and points P0, P5, P4, P2, P1, P6, P3, P9, P8, and P7 may be in LoD2 of LoD.

The prediction module 430 receives the transmitted attributes from the attribute transmission module 420 and the obtained LoD for each point from the LoD generator 425. The prediction module 430 obtains prediction residuals (values) of the received attributes respectively by performing a prediction algorithm on the received attributes in an order based on the received LoD of each point. The prediction algorithm may include any of a variety of prediction algorithms, such as interpolation, weighted average computation, nearest neighbor algorithm, and Rate Distortion Optimization (RDO).

For example, as shown in FIG. 1A, the prediction residuals of the received attributes of the points P0, P5, P4, and P2 included in LoD0 may be obtained first, before the prediction residuals of the received attributes of the points P1, P6, P3, P9, P8, and P7 included in LoD1 and LoD2 are obtained. The prediction residual of the received attribute of the point P2 may be obtained by calculating a distance-based weighted average of the points P0, P5, and P4.

The quantizer 435 receives the obtained prediction residual from the prediction module 430 and quantizes the received prediction residual using, for example, a scaling algorithm and/or a shifting algorithm.

The arithmetic encoder 440 receives the occupancy symbols from the octree encoder 415 and the quantized prediction residual from the quantizer 435. The arithmetic encoder 440 performs arithmetic encoding on the received occupancy symbol and the quantized prediction residual to obtain a compressed bitstream. The arithmetic coding may include any of various entropy coding algorithms, such as context-adaptive binary arithmetic coding.

Fig. 5 is a functional block diagram of G-PCC decompressor 310 of various embodiments.

As shown in fig. 5, the G-PCC decompressor 310 includes an arithmetic decoder 505, an octree decoder 510, an inverse quantizer 515, a LoD generator 520, an inverse quantizer 525, and an inverse prediction module 530.

The arithmetic decoder 505 receives the compressed bitstream from the G-PCC compressor 303 and performs arithmetic decoding on the received compressed bitstream to obtain the occupancy symbols and the quantized prediction residuals. The arithmetic decoding may include any of various entropy decoding algorithms, such as context-adaptive binary arithmetic decoding.

The octree decoder 510 receives the obtained occupancy symbols from the arithmetic decoder 505 and decodes the received occupancy symbols using an octree decoding algorithm, resulting in quantized positions.

The inverse quantizer 515 receives the quantized locations from the octree decoder 510 and inverse quantizes the received quantized locations using, for example, a scaling algorithm and/or a shifting algorithm to obtain reconstructed locations of points in the input point cloud.

The LoD generator 520 receives the quantized positions from the octree decoder 510, and acquires the LoD of each point corresponding to the received quantized positions.

The inverse quantizer 525 receives the obtained quantized prediction residual and inverse quantizes the received quantized prediction residual using, for example, a scaling algorithm and/or a shifting algorithm to obtain a reconstructed prediction residual.

The inverse prediction module 530 receives the obtained reconstructed prediction residual from the inverse quantizer 525 and the obtained LoD for each point from the LoD generator 520. The inverse prediction module 530 obtains reconstruction properties of the received reconstructed prediction residuals by applying a prediction algorithm to the received reconstructed prediction residuals in order of the received LoD of each point. The prediction algorithm may include any of a variety of prediction algorithms, such as interpolation, weighted average calculation, nearest neighbor algorithm, and RDO. The reconstructed attribute is a reconstructed attribute of a point within the input point cloud.

Methods and apparatuses for inter-frame point cloud attribute coding are described in detail below. They may be implemented in the G-PCC compressor 303 described above, namely in the prediction module 430, and may likewise be implemented in the G-PCC decompressor 310, namely in the inverse prediction module 530.

Motion estimation and compensation

In some embodiments, geometry-based or joint geometry/attribute-based global/local motion estimation may be performed.

In detail, in the context of point cloud compression, when performing attribute codec, geometric information, such as the location of the point cloud, is known. Motion estimation may be performed using or in conjunction with this information to compensate for any local or global motion present in the current and reference frames.

Because performing motion estimation on sparse point cloud data may be difficult or unreliable, a motion estimation uncertainty metric, me_uncertainty, may be obtained as a by-product of the motion estimation. For example, the metric may be based on the number of candidate target samples having similar motion matching scores, or on a threshold test of such scores. Each motion matching score may be obtained by a mechanism such as a block matching process.

The motion estimation uncertainty metric me_uncertainty, for example when greater than a preset threshold, may be used to disable/enable inter prediction, or may be used in determining scaling or weighting factors in the prediction.

Modified nearest neighbor search

In some embodiments, nearest-neighbor-based prediction similar to that in G-PCC may consider neighboring samples from other frames as additional candidates.

The G-PCC design generates multiple LoD layers of the point cloud in the following manner. First, the original point cloud and the reference point cloud are sorted using a Morton code.

Then, the original point cloud is sampled sequentially from the top LoD layer to the bottom LoD layer according to the sampling distance, and a nearest neighbor search is performed for each point belonging to each LoD. A neighbor list is then built for each point, with geometrically closer samples appearing at the top of the list.
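For illustration, a non-normative, quadratic-time sketch of this construction (the halving of the sampling distance per layer and all names are assumptions; practical implementations subsample along the Morton order far more efficiently):

```python
import numpy as np

def morton_key(p, bits=10):
    # Interleave the bits of the quantized (x, y, z) coordinates.
    key = 0
    for b in range(bits):
        for axis in range(3):
            key |= ((int(p[axis]) >> b) & 1) << (3 * b + axis)
    return key

def build_lods(points, base_dist, num_lods):
    order = sorted(range(len(points)), key=lambda i: morton_key(points[i]))
    lods, kept = [], []
    dist = base_dist * 2 ** (num_lods - 1)
    for _ in range(num_lods):
        for i in order:
            # Keep a point only if it is far enough from the points already kept.
            if i not in kept and all(
                    np.linalg.norm(points[i] - points[j]) >= dist for j in kept):
                kept.append(i)
        lods.append(list(kept))  # each LoD layer also contains the coarser layers
        dist /= 2
    return lods
```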

In some embodiments, the following definitions further facilitate building a nearest neighbor list that includes inter-frame point cloud samples. A flag, interframe, is defined to indicate whether a nearest neighbor sample is intra-frame or inter-frame. A variable, framenum, is defined to indicate a frame number or an offset in Picture Order Count (POC) from the current point cloud frame. A maximum number of inter-frame nearest neighbor samples, maxminn, is defined.

Further, a notion of distance is defined for comparing a new candidate point cloud sample with the candidate point cloud samples already in the list. In the context of this intra/inter hybrid prediction, variables called Temporal-to-Spatial Scale (TSScale) and Temporal-to-Spatial Offset (TSOffset) are introduced to reflect the degree of change in attribute values that is possible between frames. If these values are large, the probability of a change in attribute values is high, owing to temporal distance, possible fast motion, scene changes, and the like. In that case, proximity in 3D coordinates correlates only weakly with similarity of attribute values.

In some embodiments, TSScale may be used to scale up/down the relative proximity, in time, of the spatial distance during the nearest neighbor search and its later use in prediction, while TSOffset may be used to add an offset to the 3D coordinates of points in the reference frame when the frames are hypothetically "merged" to select among the mixed intra and inter point cloud samples. This "merging" treats the inter samples as samples in the current frame, in which case there may be multiple candidate prediction samples at the same 3D location.
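For illustration, one plausible reading of how TSScale and TSOffset enter the candidate ranking is sketched below; the exact normative combination is not spelled out above, so this form is an assumption:

```python
import numpy as np

def candidate_distance(p, p_n, is_inter, ts_scale=1.0, ts_offset=0.0):
    # Inter candidates are hypothetically "merged" into the current frame by
    # offsetting their coordinates, and their distance is scaled by TSScale;
    # intra candidates use TSScale = 1 and TSOffset = 0.
    if is_inter:
        return ts_scale * np.linalg.norm(p - (p_n + ts_offset))
    return np.linalg.norm(p - p_n)
```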

In some embodiments, an option is added to reorder the candidate nearest neighbor samples. The encoder (G-PCC compressor 303) may signal the way the candidate list is constructed, or may reflect the temporal distance in the weighted average or weight calculation of the inter-frame candidate point cloud samples, based on the confidence of the motion estimation (i.e., me_uncertainty).

In various embodiments, a method of calculating the motion estimation unreliability metric me_uncertainty includes calculating a value inversely proportional to a ratio, namely the ratio of the number of point cloud samples in the current target frame that have (i.e., match or correspond to) at least one inter-frame reference point cloud sample from another frame to the total number of point cloud samples in the current target frame. The inter-frame reference point cloud samples may be found by the modified inter-frame nearest neighbor search proposed above. The smaller the ratio, the more uncertain the motion estimation. This is because, as an output of the proposed modified nearest neighbor search, the temporal prediction candidate points that qualify under the spatio-temporal distance metric described in this disclosure will be included in the nearest neighbor lists. Having many such candidate points in the lists results in a large ratio and means that the current target frame and the reference frame are reasonably well matched, either through explicit motion compensation or owing to the static, motionless nature of the scene and objects.
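For illustration, a minimal sketch of this metric (the constant of proportionality, 1 here, is an assumption):

```python
def me_uncertainty(num_matched, num_total):
    # Value inversely proportional to the ratio of target-frame samples that
    # have at least one inter-frame reference sample.
    ratio = num_matched / num_total if num_total else 0.0
    return 1.0 / ratio if ratio else float('inf')
```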

In one example, the encoder and the decoder (G-PCC decompressor 310) may agree to skip motion compensation for the current or target frame based on the motion estimation unreliability metric being greater than a preset threshold. This is possible because the modified nearest neighbor search is performed at both the encoder and the decoder, so no additional signaling overhead is required; this holds in particular for G-PCC. The process can adaptively designate the portions of the entire point cloud data on which motion compensation is performed prior to the prediction/lifting transform.

Application to the G-PCC predictive transform of attributes

1. RDO index coding and decoding

The above embodiments may be applied to the RDO-based predictor selection described above. Specifically, a provision is made for assigning a higher priority to temporal candidates (interframe = 1) under certain conditions. In some embodiments, the temporal candidates are assigned a higher priority, by being added earlier to the nearest neighbor list, when the motion estimation uncertainty metric me_uncertainty is lower, and vice versa. This includes removing a temporal candidate from the nearest neighbor list when me_uncertainty is above a preset threshold. When there are multiple inter-frame candidates, the candidate point cloud samples with a closer temporal distance (as indicated by framenum) are placed earlier in the list.
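For illustration, a non-normative sketch of this ordering rule (candidates are modeled as (distance, is_inter, framenum) tuples; the all-or-nothing handling of temporal candidates at the threshold is a simplification of the graded priority described above):

```python
def order_candidates(cands, me_uncertainty, threshold, max_inter_nn):
    intra = sorted((c for c in cands if not c[1]), key=lambda c: c[0])
    if me_uncertainty > threshold:
        return intra                      # unreliable motion: drop temporal candidates
    inter = sorted((c for c in cands if c[1]),
                   key=lambda c: (c[2], c[0]))[:max_inter_nn]  # temporally closer first
    return inter + intra                  # low uncertainty: temporal candidates earlier
```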

The encoder (G-PCC compressor 303) and the decoder (G-PCC decompressor 310) may track the inter-frame selections and perform the adaptive index-order switching in a synchronized manner. In addition, the number maxminn of inter-frame candidates may be adaptively changed according to the above conditions.

2. Distance-based mean prediction

The above embodiments may also be applied to the distance-weighted average prediction described above. Specifically, when the motion estimation uncertainty metric me_uncertainty is high, the inter-frame candidates are not included in the weighted average.

In various embodiments, the inter and intra nearest neighbor sample values $(\tilde{a}_n)_{n=1,\ldots,N}$ are used to define the distance-weighted average prediction $\hat{a}$ of the attribute value of the current point cloud sample:

$$\hat{a} = \frac{\sum_{n=1}^{N} w_n \tilde{a}_n}{\sum_{n=1}^{N} w_n}$$

Here, the weight of the $n$-th sample can be determined as:

$$w_n = \frac{1}{\big(\mathrm{TSScale}_n \cdot \lVert p - (p_n + \mathrm{TSOffset}_n) \rVert\big)^2}$$

where $p$ is the location of the current point cloud sample with attribute $a$, and $p_n$ is the location of the $n$-th neighboring sample with corresponding attribute value $\tilde{a}_n$. The parameters TSScale and TSOffset are assigned as described above for inter-frame nearest neighbors; for intra-frame nearest neighbors, TSScale is set to 1 and TSOffset is set to 0, so that $w_n$ reduces to the inverse squared distance used in the intra anchor.
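For illustration, a direct, non-normative transcription of this weighted mean (names are illustrative; intra-frame neighbors are passed with ts_scale = 1 and ts_offset = 0):

```python
import numpy as np

def weighted_mean_prediction(p, neighbors):
    # `neighbors` holds (p_n, a_n, ts_scale, ts_offset) tuples; intra-frame
    # neighbors use ts_scale = 1 and ts_offset = 0 (a zero vector).
    num = den = 0.0
    for p_n, a_n, s, o in neighbors:
        d = s * np.linalg.norm(p - (p_n + o))  # effective spatio-temporal distance
        w = 1.0 / (d * d)                      # inverse squared-distance weight
        num += w * a_n
        den += w
    return num / den
```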

Fig. 6 is a flow diagram of a method 600 of inter-frame point cloud attribute encoding, according to embodiments. In some embodiments, one or more of the process blocks of fig. 6 may be performed by G-PCC decompressor 310. In some embodiments, one or more of the process blocks of fig. 6 may be performed by another device or group of devices (e.g., G-PCC compressor 303) separate from or including G-PCC decompressor 310.

Referring to fig. 6, in a first block 610, the method 600 includes obtaining, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples respectively have second point cloud samples in an inter-frame reference frame.

In a second block 620, the method 600 includes determining whether the obtained motion estimation unreliability metric is greater than a preset threshold.

In a third block 630, the method 600 includes skipping motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than a preset threshold.

In a fourth block 640, the method 600 includes performing motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to a preset threshold.
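For illustration, the flow of blocks 610-640 can be sketched as follows; count_matched, motion_compensate, and predict_attributes are hypothetical callables standing in for the nearest neighbor matching, motion compensation, and attribute prediction described above:

```python
def inter_frame_attribute_step(target, reference, threshold,
                               count_matched, motion_compensate,
                               predict_attributes):
    ratio = count_matched(target, reference) / len(target)   # block 610
    me_unreliability = 1.0 / ratio if ratio else float('inf')
    if me_unreliability > threshold:                         # blocks 620, 630
        return predict_attributes(target, use_inter=False)   # motion compensation skipped
    compensated = motion_compensate(target, reference)       # block 640
    return predict_attributes(compensated, use_inter=True)
```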

A larger value of the motion estimation unreliability metric may indicate a larger uncertainty of the motion estimation, whereas a smaller value of the motion estimation unreliability metric may indicate a smaller uncertainty of the motion estimation.

The method may also include obtaining a plurality of second point cloud samples using a nearest neighbor search algorithm.

The method may also include skipping prediction of an attribute of a point in one of the plurality of point cloud samples included in the target frame based on the obtained motion estimation unreliability metric being determined to be greater than a preset threshold.

The method may also include performing a prediction of an attribute of a point in one of a plurality of point cloud samples included in the motion-compensated target frame.

The attribute may include one or both of a color value and a reflectance value for the point.

Performing the prediction may include performing a predictive transform or a lifting transform on the property.

Although fig. 6 shows example blocks of the method 600, in some implementations, the method 600 may include more blocks, fewer blocks, different blocks, or a different arrangement of blocks than those depicted in fig. 6. Additionally or alternatively, two or more blocks of method 600 may be performed in parallel.

Further, the proposed methods may be implemented by hardware modules or processing circuitry (e.g., one or more processors or one or more integrated circuits). For example, at least one processor may execute a program stored in a non-transitory computer-readable medium to perform one or more of the proposed methods.

Fig. 7 is a schematic block diagram of an apparatus 700 for inter-frame point cloud attribute coding of some embodiments.

Referring to FIG. 7, the apparatus 700 includes obtaining code 710, identifying code 720, skipping code 730, and executing code 740.

The obtaining code 710 is configured to cause the at least one processor to obtain, as a motion estimation unreliability metric for motion estimation of a target frame, a value inversely proportional to a ratio of a number of first point cloud samples in the target frame to a number of point cloud samples in the target frame, wherein the first point cloud samples respectively have second point cloud samples of an inter-frame reference frame.

The identifying code 720 is for causing the at least one processor to determine whether the obtained motion estimation unreliability metric is greater than a preset threshold.

The skipping code 730 is for causing the at least one processor to skip motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be greater than a preset threshold.

The executing code 740 is for causing the at least one processor to perform motion compensation of the target frame based on the obtained motion estimation unreliability metric being determined to be less than or equal to the preset threshold.

A larger value of the motion estimation unreliability metric may indicate a larger uncertainty of the motion estimation, whereas a smaller value of the motion estimation unreliability metric may indicate a smaller uncertainty of the motion estimation.

The obtaining code 710 may also be for causing the at least one processor to obtain a plurality of second point cloud samples using a nearest neighbor search algorithm.

The skipping code 730 may also be for causing the at least one processor to skip predicting an attribute of a point in one of the plurality of point cloud samples included in the target frame based on the obtained motion estimation unreliability metric being determined to be greater than a preset threshold.

The executing code 740 may also be for causing the at least one processor to perform prediction of an attribute of a point in one of a plurality of point cloud samples included in the motion-compensated target frame.

The attribute may include one or both of a color value and a reflectance value of the dot.

The executing code 740 may also be for causing the at least one processor to perform a predictive transform or a lifting transform on the attribute.

FIG. 8 is a schematic diagram of a computer system 800 suitable for implementing various embodiments.

The computer software may be coded using any suitable machine code or computer language, and may be subjected to assembly, compilation, linking, or similar mechanisms to create code comprising instructions. These instructions may be executed directly by a computer central processing unit (CPU), a graphics processing unit (GPU), and the like, or through code interpretation, microcode execution, and so on.

The instructions may be executed on various types of computers or computer components, including, for example, personal computers, tablet computers, servers, smart phones, gaming devices, internet-of-things devices, and the like.

The components of computer system 800 illustrated in FIG. 8 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing the embodiments of the present application. Neither should the configuration of components be interpreted as having any dependency on, or requirement relating to, any one component or combination of components illustrated in the non-limiting embodiment of computer system 800.

The computer system 800 may include certain human interface input devices. A human interface input device may be responsive to one or more human user inputs, such as tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), and olfactory input (not shown). The human interface devices may also be used to capture media information that is not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photographic images obtained from a still-image camera), and video (e.g., two-dimensional video, and three-dimensional video including stereoscopic video).

The human interface input device may include one or more of the following (only one of each shown): keyboard 801, mouse 802, touch pad 803, touch screen 810, joystick 805, microphone 806, scanner 807, camera 808.

The computer system 800 may also include a number of human interface output devices. The human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. The human interface output devices may include tactile output devices (e.g., tactile feedback via the touch screen 810 or joystick 805, though tactile feedback devices that are not input devices may also exist), audio output devices (e.g., speakers 809, headphones (not shown)), visual output devices (e.g., screens 810, coupled to the system bus 848 through a graphics adapter 850, including CRT, LCD, plasma, and OLED screens, each with or without touch-screen input capability and with or without tactile feedback capability, some of which are capable of two-dimensional visual output or more-than-three-dimensional output by means such as stereographic output), virtual reality glasses (not shown), holographic displays and smoke tanks (not shown), and printers (not shown).

The computer system 800 may also include human-accessible storage devices and their associated media, such as optical media 821 including a CD/DVD ROM/RW drive 820 with CD/DVD or similar media, a thumb drive 822, a removable hard drive or solid state drive 823, legacy magnetic media such as magnetic tapes and floppy disks (not shown), specialized ROM/ASIC/PLD-based devices such as security dongles (not shown), and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium" in connection with the subject matter disclosed herein does not include a transmission media, carrier wave, or other transitory signal.

The computer system 800 may also include an interface to one or more communication networks 855. A communication network 855 may be, for example, a wireless network, a wired network, or an optical network. A communication network 855 may further be a local area network, a wide area network, a metropolitan area network, a vehicular or industrial network, a real-time network, a delay-tolerant network, and so on. Examples of communication networks 855 include local area networks (e.g., Ethernet, wireless LANs, and cellular networks including GSM, 3G, 4G, 5G, and LTE), TV wired or wireless wide-area digital networks (including cable TV, satellite TV, and terrestrial broadcast TV), and vehicular and industrial networks (including the CAN bus). Some communication networks 855 typically require external network interface adapters attached to certain general-purpose data ports or peripheral buses 849 (e.g., a USB port of computer system 800); others are typically integrated into the core of computer system 800 by attachment to a system bus as described below (e.g., a network interface 854 including an Ethernet interface integrated into a PC computer system, or a cellular network interface integrated into a smartphone computer system). Using any of these networks 855, computer system 800 can communicate with other entities. Such communication may be unidirectional receive-only (e.g., broadcast TV), unidirectional send-only (e.g., from the CAN bus to certain CAN bus devices), or bidirectional, for example to other computer systems using local or wide area digital networks. Certain protocols and protocol stacks may be used on each of the networks 855 and network interfaces 854 described above.

The aforementioned human interface devices, human-accessible storage devices, and network interfaces 854 may be attached to the core 840 of the computer system 800.

The core 840 may include one or more central processing units (CPUs) 841, graphics processing units (GPUs) 842, special-purpose programmable processing units in the form of field programmable gate arrays (FPGAs) 843, hardware accelerators 844 for certain tasks, and so forth. These devices, along with read-only memory (ROM) 845, random access memory (RAM) 846, and internal mass storage 847 such as internal non-user-accessible hard drives and SSDs, may be connected via the system bus 848. In some computer systems, the system bus 848 may be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus 848 or through the peripheral bus 849. Architectures for a peripheral bus include PCI, USB, and the like.

The CPU 841, GPU 842, FPGA 843, and hardware accelerator 844 may execute certain instructions that, in combination, can make up the aforementioned computer code. That computer code may be stored in the ROM 845 or RAM 846. Transitional data may also be stored in the RAM 846, whereas permanent data may be stored, for example, in the internal mass storage 847. Fast storage and retrieval to any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more of the CPU 841, GPU 842, mass storage 847, ROM 845, RAM 846, and the like.

Computer code may be stored on the computer readable medium for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, the functionality of the computer system 800 having the architecture described above, and specifically the core 840, may be provided as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be the media associated with the user-accessible mass storage described above, as well as certain storage of the core 840 that is of a non-transitory nature, such as the core-internal mass storage 847 or ROM 845. Software implementing the embodiments disclosed herein may be stored in such devices and executed by the core 840. The computer-readable medium may include one or more memory devices or chips, as needed. The software may cause the core 840, and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute the processes described herein, or portions thereof, including defining data structures stored in the RAM 846 and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the functionality of the computer system may be provided by circuitry (e.g., the accelerator 844) that is hardwired or otherwise embodied with logic, and that may operate in place of or together with software to execute the processes described herein, or portions thereof. Where appropriate, reference to software can encompass logic, and vice versa. Where appropriate, reference to a computer-readable medium can encompass a circuit storing software for execution (e.g., an integrated circuit (IC)), a circuit embodying logic for execution, or both. The present disclosure encompasses any suitable combination of hardware and software.

While this disclosure has described several non-limiting embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.
