Depth map processing method, video reconstruction method and related device

Publication No.: 142521    Publication date: 2021-10-22

Description: This technology, "Depth map processing method, video reconstruction method and related device", was created by 盛骁杰 and 魏开进 on 2020-04-20. Abstract: A depth map processing method, a video reconstruction method and a related device are provided. The method includes: acquiring a depth map to be processed from the image combination of a current video frame of a multi-angle free view, where the image combination comprises multiple groups of synchronized texture maps and depth maps, captured from multiple angles, with correspondences between them; acquiring a video frame sequence of a preset temporal window containing the current video frame; obtaining a window filter coefficient value for each video frame in the sequence, the window filter coefficient value being generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to pixel confidence; and filtering the pixels at corresponding positions in the depth map to be processed in a preset filtering manner based on the window filter coefficient values of the respective video frames, to obtain the filtered depth values of those pixels. This scheme improves the stability of the depth map in the time domain.

1. A depth map processing method, comprising:

acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free view, wherein the image combination of the current video frame of the multi-angle free view comprises multiple groups of synchronized texture maps and depth maps captured from multiple angles, with correspondences between the texture maps and the depth maps;

acquiring a video frame sequence of a preset temporal window that contains the current video frame;

obtaining a window filter coefficient value corresponding to each video frame in the video frame sequence, the window filter coefficient value being generated from weight values of at least two dimensions, which include a first filter coefficient weight value corresponding to pixel confidence, obtained as follows: obtaining confidence values of pixels at corresponding positions in the depth map to be processed and in each second depth map, and determining the first filter coefficient weight value corresponding to the confidence values, wherein the second depth map is the depth map, in each video frame of the video frame sequence, that has the same view angle as the depth map to be processed;

and filtering the pixels at corresponding positions in the depth map to be processed in a preset filtering manner based on the window filter coefficient values of the respective video frames, to obtain the filtered depth values of the pixels at those positions in the depth map to be processed.

2. The depth map processing method according to claim 1, wherein the weight values used to generate the window filter coefficient values further comprise at least one of: a second filter coefficient weight value corresponding to frame distance, and a third filter coefficient weight value corresponding to pixel similarity; the second filter coefficient weight value and the third filter coefficient weight value are obtained as follows:

acquiring the frame distance between each video frame in the video frame sequence and the current video frame, and determining the second filter coefficient weight value corresponding to the frame distance;

and obtaining similarity values of pixels at corresponding positions in the texture map corresponding to each second depth map and in the texture map corresponding to the depth map to be processed, and determining the third filter coefficient weight value corresponding to the similarity values.

3. The depth map processing method according to claim 1 or 2, wherein the obtaining the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map comprises at least one of:

acquiring the depth map to be processed and the depth map within a preset visual angle range around the visual angle corresponding to each second depth map to obtain a third depth map corresponding to the visual angle, and determining the confidence values of pixels corresponding to the positions in the depth map to be processed and each second depth map based on the third depth map corresponding to each visual angle;

and determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and the second depth maps on the basis of the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and the second depth maps and the pixels in the peripheral preset region in the depth map where the pixels are located.

4. The depth map processing method according to claim 3, wherein determining the confidence value of the pixel corresponding to the position in the depth map to be processed and the position in each second depth map based on the third depth map of each corresponding view comprises:

acquiring a texture map corresponding to the depth map to be processed and texture maps corresponding to the second depth maps, and mapping texture values at corresponding positions in the texture map corresponding to the depth map to be processed and the texture maps corresponding to the second depth maps to corresponding positions in texture maps corresponding to the third depth maps of corresponding view angles according to depth values of pixels at corresponding positions in the depth map to be processed and the second depth maps to obtain mapping texture values corresponding to the third depth maps of corresponding view angles;

and respectively matching the mapping texture value with the actual texture value of the corresponding position in the texture map corresponding to the third depth map of each corresponding view angle, and determining the confidence value of the pixel corresponding to the position in the depth map to be processed and each second depth map based on the distribution interval of the matching degree of the texture value corresponding to the third depth map of each corresponding view angle.

5. The depth map processing method according to claim 3, wherein determining the confidence value of the pixel corresponding to the position in the depth map to be processed and the position in each second depth map based on the third depth map of each corresponding view comprises:

mapping pixels corresponding to the positions in the to-be-processed depth map and the second depth maps to third depth maps of corresponding view angles to obtain mapping depth values of the pixels corresponding to the positions in the third depth maps of the corresponding view angles;

and matching the mapping depth value of the pixel at the corresponding position in the third depth map of each corresponding view angle with the actual depth value of the pixel at the corresponding position in the third depth map of each corresponding view angle, and determining the confidence value of the pixel at the corresponding position in the depth map to be processed and each second depth map based on the distribution interval of the matching degree of the depth value corresponding to the third depth map of each corresponding view angle.

6. The depth map processing method according to claim 3, wherein determining the confidence value of the pixel corresponding to the position in each of the to-be-processed depth map and the second depth map based on the third depth map of each corresponding view comprises:

respectively acquiring the depth values of the pixels at corresponding positions in the to-be-processed depth map and in each second depth map, mapping those pixels to the corresponding pixel positions in the third depth map of the corresponding view angle according to the depth values, then acquiring the depth values at those pixel positions in the third depth map of the corresponding view angle and mapping them back to the to-be-processed depth map and each second depth map, thereby obtaining, in the to-be-processed depth map and each second depth map, the mapped pixel positions back-projected from the third depth map of the corresponding view angle;

and respectively calculating the pixel distance between the actual pixel position of the pixel at the corresponding position in the to-be-processed depth map and each second depth map and the mapped pixel position back-projected from the third depth map of the corresponding view angle, and determining the confidence values of the pixels at corresponding positions in the to-be-processed depth map and each second depth map based on the distribution interval of the calculated pixel distances.

7. The depth map processing method according to claim 3, wherein the determining the confidence value of the pixel corresponding to the position in the depth map to be processed and the second depth map based on the spatial consistency between the pixel corresponding to the position in the depth map to be processed and the pixel in the surrounding preset region in the depth map where the pixel is located comprises at least one of:

matching pixels corresponding to positions in the depth map to be processed and in each second depth map with depth values of pixels in a peripheral preset area in the depth map where the pixels are located, and respectively determining confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map based on the matching degree of the depth values and the number of the pixels of which the matching degree meets a preset pixel matching degree threshold;

and matching the weighted average values of the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the pixels in the peripheral preset area in the depth map where the pixels are located, and respectively determining the confidence values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps based on the matching degrees of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the corresponding weighted average values.

8. The depth map processing method according to claim 2, wherein the filtering, based on the window filter coefficient value corresponding to each video frame, the depth value of the pixel corresponding to the position in the depth map to be processed according to a preset filtering manner to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed, includes:

taking the product of the first filter coefficient weight value and at least one of the second filter coefficient weight value and the third filter coefficient weight value, or a weighted average value as a window filter coefficient value corresponding to each video frame;

and calculating the weighted average value of the products of the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the window filter coefficient values corresponding to the video frames to obtain the filtered depth values of the pixels corresponding to the positions in the to-be-processed depth map.

9. The depth map processing method according to claim 1 or 2, wherein the current video frame is located at an intermediate position of the sequence of video frames.

10. A depth map processing method, comprising:

acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free visual angle, wherein the image combination of the current video frame of the multi-angle free visual angle comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations;

acquiring a video frame sequence of a preset window on a time domain containing the current video frame;

obtaining a window filter coefficient value corresponding to each video frame in the sequence of video frames, the window filter coefficient value being generated by weight values for at least two dimensions, including: obtaining a first filter coefficient weight value corresponding to the pixel confidence coefficient by adopting the following method: acquiring the depth map to be processed and the depth map within a preset visual angle range around the visual angle corresponding to each second depth map to obtain a third depth map corresponding to the visual angle, and determining the confidence values of pixels corresponding to the positions in the depth map to be processed and each second depth map based on the third depth map corresponding to each visual angle; determining a first filter coefficient weight value corresponding to the confidence value;

and filtering the pixels corresponding to the positions in the depth map to be processed according to a preset filtering mode based on the corresponding window filtering coefficient values of the video frames to obtain the filtered depth values of the pixels corresponding to the positions in the depth map to be processed.

11. A method for video reconstruction, comprising:

acquiring image combinations of video frames of a multi-angle free visual angle, parameter data corresponding to the image combinations of the video frames and virtual viewpoint position information based on user interaction, wherein the image combinations of the video frames comprise a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations;

obtaining a filtered depth map by using the depth map processing method of any one of claims 1 to 9;

selecting a texture map and a filtered depth map of a corresponding group in the image combination of the video frame at the moment of user interaction according to the virtual viewpoint position information and parameter data corresponding to the image combination of the video frame and a preset rule;

and based on the virtual viewpoint position information and parameter data corresponding to texture maps and depth maps of corresponding groups in image combinations of the video frames at the user interaction time, performing combined rendering on the texture maps and the filtered depth maps of the corresponding groups in the image combinations of the video frames at the selected user interaction time to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

12. A depth map processing apparatus, comprising:

the depth map acquisition unit is suitable for acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free visual angle, wherein the image combination of the current video frame of the multi-angle free visual angle comprises a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations;

a frame sequence obtaining unit, adapted to obtain a video frame sequence including a preset window in a time domain of the current video frame;

a window filter coefficient value obtaining unit adapted to obtain a window filter coefficient value corresponding to each video frame in the sequence of video frames, the window filter coefficient value being generated by weight values of at least two dimensions, including: the window filter coefficient value obtaining unit includes: a first filtering coefficient weight value obtaining subunit, adapted to obtain confidence values of pixels corresponding to positions in the to-be-processed depth map and in each second depth map, and determine a first filtering coefficient weight value corresponding to the confidence value, where: the second depth map is a depth map with the same visual angle as the depth map to be processed in each video frame of the video frame sequence;

and the filtering unit is suitable for filtering the pixels corresponding to the positions in the depth map to be processed according to a preset filtering mode based on the window filtering coefficient values corresponding to the video frames to obtain the depth values of the pixels corresponding to the positions in the depth map to be processed after filtering.

13. The depth map processing apparatus according to claim 12, wherein the window filter coefficient value obtaining unit further includes at least one of:

a second filtering coefficient weight value obtaining subunit, adapted to obtain a frame distance between each video frame in the sequence of video frames and the current video frame, and determine a second filtering coefficient weight value corresponding to the frame distance;

and the third filtering coefficient weight value obtaining subunit is suitable for obtaining similarity values of corresponding pixels in the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and determining a third filtering coefficient weight value corresponding to the similarity value.

14. A video reconstruction system, comprising:

an acquisition module, adapted to acquire an image combination of a video frame of a multi-angle free view, parameter data corresponding to the image combination of the video frame, and virtual viewpoint position information based on user interaction, wherein the image combination of the video frame comprises multiple groups of synchronized texture maps and depth maps, captured from multiple angles, with correspondences between them;

a filtering module adapted to filter a depth map in the video frame;

the selection module is suitable for selecting a texture map and a filtered depth map of a corresponding group in the image combination of the video frame at the moment of user interaction according to the virtual viewpoint position information and parameter data corresponding to the image combination of the video frame and a preset rule;

the image reconstruction module is suitable for performing combined rendering on the texture map and the filtered depth map of the corresponding group in the image combination of the video frame at the selected user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the texture map and the depth map of the corresponding group in the image combination of the video frame at the user interaction moment to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment;

wherein the filtering module comprises:

the depth map acquisition unit is suitable for acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free visual angle, wherein the image combination of the current video frame of the multi-angle free visual angle comprises a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations;

a frame sequence obtaining unit, adapted to obtain a video frame sequence including a preset window in a time domain of the current video frame;

a window filter coefficient value obtaining unit adapted to obtain a window filter coefficient value corresponding to each video frame in the sequence of video frames, the window filter coefficient value being generated by weight values of at least two dimensions, including: the window filter coefficient value obtaining unit includes: a first filtering coefficient weight value obtaining subunit, adapted to obtain confidence values of pixels corresponding to positions in the to-be-processed depth map and in each second depth map, and determine a first filtering coefficient weight value corresponding to the confidence value, where: the second depth map is a depth map with the same visual angle as the depth map to be processed in each video frame of the video frame sequence;

and the filtering unit is suitable for filtering the pixels corresponding to the positions in the depth map to be processed according to a preset filtering mode based on the window filtering coefficient values corresponding to the video frames to obtain the depth values of the pixels corresponding to the positions in the depth map to be processed after filtering.

15. The video reconstruction system of claim 14, wherein the window filter coefficient value obtaining unit further comprises at least one of:

a second filtering coefficient weight value obtaining subunit, adapted to obtain a frame distance between each video frame in the sequence of video frames and the current video frame, and determine a second filtering coefficient weight value corresponding to the frame distance;

and the third filtering coefficient weight value obtaining subunit is suitable for obtaining similarity values of corresponding pixels in the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and determining a third filtering coefficient weight value corresponding to the similarity value.

16. An electronic device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 9 or the method of claim 10 or 11.

17. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the method of any one of claims 1 to 9, or of the method of claim 10 or 11.

Technical Field

The embodiment of the specification relates to the technical field of video processing, in particular to a depth map processing method, a video reconstruction method and a related device.

Background

The six-degrees-of-freedom (6DoF) technology provides a viewing experience with a high degree of freedom: during viewing, the user can adjust the viewing angle through interactive operations and watch from any desired free viewpoint.

In a wide range of scenes, such as sports events, achieving high-degree-of-freedom viewing through Depth Image Based Rendering (DIBR) technology is a solution with great potential and feasibility. Compared with point cloud reconstruction schemes, which suffer from deficiencies in the quality and stability of the reconstructed viewpoint, DIBR can bring the quality of the reconstructed viewpoint close to that of the originally captured viewpoints.

In the DIBR scheme, the stability of the depth map in the temporal domain has a significant impact on the quality of the final reconstructed image.

Disclosure of Invention

In view of this, in order to improve the stability of the depth map in the time domain, an aspect of the embodiments of the present specification provides a depth map processing method and a related apparatus.

In order to improve the image quality of a reconstructed video, another aspect of the embodiments of the present specification provides a video reconstruction method and a related apparatus.

First, an embodiment of the present specification provides a depth map processing method, including:

acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free view, wherein the image combination of the current video frame of the multi-angle free view comprises multiple groups of synchronized texture maps and depth maps captured from multiple angles, with correspondences between the texture maps and the depth maps;

acquiring a video frame sequence of a preset temporal window that contains the current video frame;

obtaining a window filter coefficient value corresponding to each video frame in the video frame sequence, the window filter coefficient value being generated from weight values of at least two dimensions, which include a first filter coefficient weight value corresponding to pixel confidence, obtained as follows: obtaining confidence values of pixels at corresponding positions in the depth map to be processed and in each second depth map, and determining the first filter coefficient weight value corresponding to the confidence values, wherein the second depth map is the depth map, in each video frame of the video frame sequence, that has the same view angle as the depth map to be processed;

and filtering the pixels at corresponding positions in the depth map to be processed in a preset filtering manner based on the window filter coefficient values of the respective video frames, to obtain the filtered depth values of the pixels at those positions in the depth map to be processed.
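
As an illustration only, the following Python/NumPy sketch shows one way the per-pixel temporal filtering described above could be organized. The function and variable names are hypothetical, and the per-pixel window filter coefficient values are assumed to have been combined beforehand (for example from the weight values sketched further below).

```python
import numpy as np

def temporal_filter_depth(window_depths, window_weights, center_index):
    """Filter the depth map of the current frame over a temporal window.

    window_depths  : list of HxW second depth maps (same view angle as the
                     depth map to be processed, one per frame of the preset
                     window; the entry at center_index is the depth map to
                     be processed itself).
    window_weights : list of HxW per-pixel window filter coefficient values,
                     e.g. derived from the pixel-confidence weight and,
                     optionally, frame-distance and similarity weights.
    Returns the filtered depth map for the current frame.
    """
    depths = np.stack(window_depths).astype(np.float64)    # K x H x W
    weights = np.stack(window_weights).astype(np.float64)  # K x H x W
    weight_sum = weights.sum(axis=0)
    weighted = (weights * depths).sum(axis=0)
    # Weighted average over the window; where all weights are zero, keep the
    # original depth value of the depth map to be processed.
    return np.where(weight_sum > 0,
                    weighted / np.maximum(weight_sum, 1e-12),
                    depths[center_index])
```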

Optionally, the weight values used to generate the window filter coefficient values further include at least one of: a second filter coefficient weight value corresponding to frame distance, and a third filter coefficient weight value corresponding to pixel similarity; the second filter coefficient weight value and the third filter coefficient weight value are obtained as follows:

acquiring the frame distance between each video frame in the video frame sequence and the current video frame, and determining the second filter coefficient weight value corresponding to the frame distance;

and obtaining similarity values of pixels at corresponding positions in the texture map corresponding to each second depth map and in the texture map corresponding to the depth map to be processed, and determining the third filter coefficient weight value corresponding to the similarity values.
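
A minimal sketch of how the optional second and third weight values might be computed. The Gaussian-style mappings and the parameters sigma_t and sigma_c are assumptions for illustration, not values defined by the embodiments.

```python
import numpy as np

def frame_distance_weight(frame_index, current_index, sigma_t=1.5):
    """Second filter coefficient weight: decreases as the frame distance
    between a frame of the window and the current frame grows."""
    d = abs(frame_index - current_index)
    return float(np.exp(-(d * d) / (2.0 * sigma_t ** 2)))

def pixel_similarity_weight(texture_of_second, texture_of_current, sigma_c=10.0):
    """Third filter coefficient weight: per-pixel similarity between the
    texture map corresponding to a second depth map and the texture map
    corresponding to the depth map to be processed (a simple grey-level
    difference is used here as the similarity measure)."""
    diff = texture_of_second.astype(np.float64) - texture_of_current.astype(np.float64)
    return np.exp(-(diff * diff) / (2.0 * sigma_c ** 2))
```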

optionally, the obtaining the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map includes at least one of:

acquiring the depth map to be processed and the depth map within a preset visual angle range around the visual angle corresponding to each second depth map to obtain a third depth map corresponding to the visual angle, and determining the confidence values of pixels corresponding to the positions in the depth map to be processed and each second depth map based on the third depth map corresponding to each visual angle;

and determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and the second depth maps on the basis of the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and the second depth maps and the pixels in the peripheral preset region in the depth map where the pixels are located.

Optionally, the determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and in the second depth maps based on the third depth maps of the respective corresponding viewing angles includes:

acquiring a texture map corresponding to the depth map to be processed and texture maps corresponding to the second depth maps, and mapping texture values at corresponding positions in the texture map corresponding to the depth map to be processed and the texture maps corresponding to the second depth maps to corresponding positions in texture maps corresponding to the third depth maps of corresponding view angles according to depth values of pixels at corresponding positions in the depth map to be processed and the second depth maps to obtain mapping texture values corresponding to the third depth maps of corresponding view angles;

and respectively matching the mapping texture value with the actual texture value of the corresponding position in the texture map corresponding to the third depth map of each corresponding view angle, and determining the confidence value of the pixel corresponding to the position in the depth map to be processed and each second depth map based on the distribution interval of the matching degree of the texture value corresponding to the third depth map of each corresponding view angle.
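
A sketch of the texture-matching step above, assuming the texture values have already been mapped into the surrounding ("third") view using the depth values and camera parameters; the interval bounds and confidence levels are illustrative assumptions.

```python
import numpy as np

def texture_match_confidence(mapped_texture, actual_texture, valid_mask,
                             intervals=(5.0, 15.0), levels=(1.0, 0.5, 0.0)):
    """Assign per-pixel confidence from the matching degree between the
    mapped texture values and the actual texture values in the texture map
    corresponding to a third depth map: the smaller the error interval the
    matching degree falls into, the higher the confidence."""
    err = np.abs(mapped_texture.astype(np.float64) -
                 actual_texture.astype(np.float64))
    if err.ndim == 3:                     # average over colour channels
        err = err.mean(axis=2)
    conf = np.full(err.shape, levels[2])
    conf[err <= intervals[1]] = levels[1]
    conf[err <= intervals[0]] = levels[0]
    # Pixels that could not be mapped into the third view get the lowest level.
    return np.where(valid_mask, conf, levels[2])
```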

Optionally, the determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and in the second depth maps based on the third depth maps of the respective corresponding viewing angles includes:

mapping pixels corresponding to the positions in the to-be-processed depth map and the second depth maps to third depth maps of corresponding view angles to obtain mapping depth values of the pixels corresponding to the positions in the third depth maps of the corresponding view angles;

and matching the mapping depth value of the pixel at the corresponding position in the third depth map of each corresponding view angle with the actual depth value of the pixel at the corresponding position in the third depth map of each corresponding view angle, and determining the confidence value of the pixel at the corresponding position in the depth map to be processed and each second depth map based on the distribution interval of the matching degree of the depth value corresponding to the third depth map of each corresponding view angle.

Optionally, the determining the confidence value of the pixel corresponding to the position in the depth map to be processed and each second depth map based on the third depth map of each corresponding view includes:

respectively acquiring the depth values of the pixels at corresponding positions in the to-be-processed depth map and in each second depth map, mapping those pixels to the corresponding pixel positions in the third depth map of the corresponding view angle according to the depth values, then acquiring the depth values at those pixel positions in the third depth map of the corresponding view angle and mapping them back to the to-be-processed depth map and each second depth map, thereby obtaining, in the to-be-processed depth map and each second depth map, the mapped pixel positions back-projected from the third depth map of the corresponding view angle;

and respectively calculating the pixel distance between the actual pixel position of the pixel at the corresponding position in the to-be-processed depth map and each second depth map and the mapped pixel position back-projected from the third depth map of the corresponding view angle, and determining the confidence values of the pixels at corresponding positions in the to-be-processed depth map and each second depth map based on the distribution interval of the calculated pixel distances.
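
The back-and-forth mapping described above amounts to a round-trip reprojection check. The sketch below computes the pixel distance for a single pixel, assuming pinhole cameras with intrinsic matrix K and world-to-camera extrinsics [R|t]; mapping the distance to a confidence value by interval is shown afterwards. All names and thresholds are illustrative, not taken from the embodiments.

```python
import numpy as np

def round_trip_pixel_distance(u, v, depth, K_src, RT_src, K_ref, RT_ref, depth_ref):
    """Project pixel (u, v) of a source depth map into a third ('reference')
    view using its depth value, read the depth found there, map it back, and
    return the distance between the original and back-mapped pixel position."""
    def back_project(u, v, d, K, RT):
        # Pixel + depth -> world point (camera model: p_cam = R p_world + t).
        p_cam = d * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
        R, t = RT[:, :3], RT[:, 3]
        return R.T @ (p_cam - t)

    def project(p_world, K, RT):
        # World point -> pixel position in the given camera.
        R, t = RT[:, :3], RT[:, 3]
        p_cam = R @ p_world + t
        uvw = K @ p_cam
        return uvw[0] / uvw[2], uvw[1] / uvw[2]

    u_r, v_r = project(back_project(u, v, depth, K_src, RT_src), K_ref, RT_ref)
    ui, vi = int(round(u_r)), int(round(v_r))
    h, w = depth_ref.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return np.inf                      # falls outside the third view
    u_b, v_b = project(back_project(u_r, v_r, depth_ref[vi, ui], K_ref, RT_ref),
                       K_src, RT_src)
    return float(np.hypot(u_b - u, v_b - v))

def distance_to_confidence(dist, intervals=(1.0, 3.0), levels=(1.0, 0.5, 0.0)):
    """Map the round-trip pixel distance to a confidence value by interval."""
    if dist <= intervals[0]:
        return levels[0]
    if dist <= intervals[1]:
        return levels[1]
    return levels[2]
```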

Optionally, the determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and the second depth maps based on the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and the second depth maps and the pixels in the peripheral preset region in the depth map where the pixels are located includes at least one of:

matching pixels corresponding to positions in the depth map to be processed and in each second depth map with depth values of pixels in a peripheral preset area in the depth map where the pixels are located, and respectively determining confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map based on the matching degree of the depth values and the number of the pixels of which the matching degree meets a preset pixel matching degree threshold;

and matching the weighted average values of the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the pixels in the peripheral preset area in the depth map where the pixels are located, and respectively determining the confidence values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps based on the matching degrees of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the corresponding weighted average values.
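
A sketch of the first spatial-consistency option above: a pixel's confidence is derived from how many pixels in the surrounding preset region have a depth value that matches its own within a tolerance. The window radius, tolerance and minimum matching ratio are assumed values.

```python
import numpy as np

def spatial_consistency_confidence(depth_map, radius=2, rel_tol=0.05, min_ratio=0.5):
    """Per-pixel confidence from spatial consistency with the surrounding
    preset region: count the neighbours whose depth matches the centre pixel
    within a relative tolerance and turn the matching ratio into a confidence."""
    h, w = depth_map.shape
    centre = depth_map.astype(np.float64)
    padded = np.pad(centre, radius, mode='edge')
    matches = np.zeros((h, w), dtype=np.int32)
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[radius + dy: radius + dy + h,
                               radius + dx: radius + dx + w]
            matches += np.abs(neighbour - centre) <= rel_tol * np.maximum(centre, 1e-6)
            total += 1
    ratio = matches / total
    # Pixels whose matching ratio is below the preset threshold are not trusted.
    return np.where(ratio >= min_ratio, ratio, 0.0)
```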

Optionally, the filtering, based on the window filter coefficient value corresponding to each video frame, the depth value of the pixel corresponding to the position in the depth map to be processed according to a preset filtering manner, to obtain the filtered depth value of the pixel corresponding to the position of the depth map to be processed, includes:

taking the product of the first filter coefficient weight value and at least one of the second filter coefficient weight value and the third filter coefficient weight value, or a weighted average value as a window filter coefficient value corresponding to each video frame;

and calculating the weighted average value of the products of the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the window filter coefficient values corresponding to the video frames to obtain the filtered depth values of the pixels corresponding to the positions in the to-be-processed depth map.
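
In formula form, with illustrative symbols not taken from the embodiments: let D_k(p) be the depth value at position p in the second depth map of the k-th frame of the window (the depth map to be processed itself for the current frame), and let w1_k(p), w2_k and w3_k(p) be the confidence, frame-distance and similarity weight values. One choice of window filter coefficient (the product of the weights) and of the filtered depth value (the weighted average of the depth values) is then:

```latex
w_k(p) = w^{(1)}_k(p)\, w^{(2)}_k\, w^{(3)}_k(p), \qquad
\hat{D}(p) = \frac{\sum_{k=1}^{K} w_k(p)\, D_k(p)}{\sum_{k=1}^{K} w_k(p)}
```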

Optionally, the current video frame is located at an intermediate position of the sequence of video frames.

An embodiment of the present specification provides another depth map processing method, including:

acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free visual angle, wherein the image combination of the current video frame of the multi-angle free visual angle comprises a plurality of angle-synchronous texture maps and depth maps with corresponding relations;

acquiring a video frame sequence of a preset window on a time domain containing the current video frame;

obtaining a window filter coefficient value corresponding to each video frame in the sequence of video frames, the window filter coefficient value being generated by weight values for at least two dimensions, including: obtaining a first filter coefficient weight value corresponding to the pixel confidence coefficient by adopting the following method: acquiring the depth map to be processed and the depth map within a preset visual angle range around the visual angle corresponding to each second depth map to obtain a third depth map corresponding to the visual angle, and determining the confidence values of pixels corresponding to the positions in the depth map to be processed and each second depth map based on the third depth map corresponding to each visual angle; determining a first filter coefficient weight value corresponding to the confidence value;

and filtering the pixels corresponding to the positions in the depth map to be processed according to a preset filtering mode based on the corresponding window filtering coefficient values of the video frames to obtain the filtered depth values of the pixels corresponding to the positions in the depth map to be processed.

An embodiment of the present specification further provides a video reconstruction method, including:

acquiring image combinations of video frames of a multi-angle free visual angle, parameter data corresponding to the image combinations of the video frames and virtual viewpoint position information based on user interaction, wherein the image combinations of the video frames comprise a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations;

obtaining a filtered depth map by using the depth map processing method of any one of the embodiments;

selecting a texture map and a filtered depth map of a corresponding group in the image combination of the video frame at the moment of user interaction according to the virtual viewpoint position information and parameter data corresponding to the image combination of the video frame and a preset rule;

and based on the virtual viewpoint position information and parameter data corresponding to texture maps and depth maps of corresponding groups in image combinations of the video frames at the user interaction time, performing combined rendering on the texture maps and the filtered depth maps of the corresponding groups in the image combinations of the video frames at the selected user interaction time to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.

An embodiment of the present specification further provides a depth map processing apparatus, including:

the depth map acquisition unit is suitable for acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free visual angle, wherein the image combination of the current video frame of the multi-angle free visual angle comprises a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations;

a frame sequence obtaining unit, adapted to obtain a video frame sequence including a preset window in a time domain of the current video frame;

a window filter coefficient value obtaining unit adapted to obtain a window filter coefficient value corresponding to each video frame in the sequence of video frames, the window filter coefficient value being generated by weight values of at least two dimensions, including: the window filter coefficient value obtaining unit includes: a first filtering coefficient weight value obtaining subunit, adapted to obtain confidence values of pixels corresponding to positions in the to-be-processed depth map and in each second depth map, and determine a first filtering coefficient weight value corresponding to the confidence value, where: the second depth map is a depth map with the same visual angle as the depth map to be processed in each video frame of the video frame sequence;

and the filtering unit is suitable for filtering the pixels corresponding to the positions in the depth map to be processed according to a preset filtering mode based on the window filtering coefficient values corresponding to the video frames to obtain the depth values of the pixels corresponding to the positions in the depth map to be processed after filtering.

Optionally, the window filter coefficient value obtaining unit further includes at least one of:

a second filtering coefficient weight value obtaining subunit, adapted to obtain a frame distance between each video frame in the sequence of video frames and the current video frame, and determine a second filtering coefficient weight value corresponding to the frame distance;

and the third filtering coefficient weight value obtaining subunit is suitable for obtaining similarity values of corresponding pixels in the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and determining a third filtering coefficient weight value corresponding to the similarity value.

An embodiment of the present specification further provides a video reconstruction system, including:

an acquisition module, adapted to acquire an image combination of a video frame of a multi-angle free view, parameter data corresponding to the image combination of the video frame, and virtual viewpoint position information based on user interaction, wherein the image combination of the video frame comprises multiple groups of synchronized texture maps and depth maps, captured from multiple angles, with correspondences between them;

a filtering module adapted to filter a depth map in the video frame;

the selection module is suitable for selecting a texture map and a filtered depth map of a corresponding group in the image combination of the video frame at the moment of user interaction according to the virtual viewpoint position information and parameter data corresponding to the image combination of the video frame and a preset rule;

the image reconstruction module is suitable for performing combined rendering on the texture map and the filtered depth map of the corresponding group in the image combination of the video frame at the selected user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the texture map and the depth map of the corresponding group in the image combination of the video frame at the user interaction moment to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment;

wherein the filtering module comprises:

the depth map acquisition unit is suitable for acquiring a depth map to be processed from an image combination of a current video frame of a multi-angle free visual angle, wherein the image combination of the current video frame of the multi-angle free visual angle comprises a plurality of angle-synchronous groups of texture maps and depth maps with corresponding relations;

a frame sequence obtaining unit, adapted to obtain a video frame sequence including a preset window in a time domain of the current video frame;

a window filter coefficient value obtaining unit adapted to obtain a window filter coefficient value corresponding to each video frame in the sequence of video frames, the window filter coefficient value being generated by weight values of at least two dimensions, including: the window filter coefficient value obtaining unit includes: a first filtering coefficient weight value obtaining subunit, adapted to obtain confidence values of pixels corresponding to positions in the to-be-processed depth map and in each second depth map, and determine a first filtering coefficient weight value corresponding to the confidence value, where: the second depth map is a depth map with the same visual angle as the depth map to be processed in each video frame of the video frame sequence;

and the filtering unit is suitable for filtering the pixels corresponding to the positions in the depth map to be processed according to a preset filtering mode based on the window filtering coefficient values corresponding to the video frames to obtain the depth values of the pixels corresponding to the positions in the depth map to be processed after filtering.

Optionally, the window filter coefficient value obtaining unit further includes at least one of:

a second filtering coefficient weight value obtaining subunit, adapted to obtain a frame distance between each video frame in the sequence of video frames and the current video frame, and determine a second filtering coefficient weight value corresponding to the frame distance;

and the third filtering coefficient weight value obtaining subunit is suitable for obtaining similarity values of corresponding pixels in the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and determining a third filtering coefficient weight value corresponding to the similarity value.

The present specification further provides an electronic device, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the method according to any one of the foregoing embodiments.

The present specification also provides a computer readable storage medium, on which computer instructions are stored, and the computer instructions execute the steps of the method of any one of the foregoing embodiments when executed.

Compared with the prior art, the technical scheme of the embodiment of the specification has the following beneficial effects:

With the depth map processing scheme of the embodiments of the present specification, a depth map to be processed is obtained from the image combination of the current video frame of a multi-angle free view and is filtered in the time domain. For the depth map in each video frame of the video frame sequence of a preset temporal window that has the same view angle as the depth map to be processed, namely the second depth map, the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map are obtained, a first filter coefficient weight value corresponding to the confidence values is determined, and the window filter coefficient values are generated based on this first filter coefficient weight value. The pixels at corresponding positions in the depth map to be processed are then filtered in a preset filtering manner based on the window filter coefficient values of the respective video frames, yielding the filtered depth values of those pixels. This prevents unreliable depth values in the depth map to be processed and in the second depth maps from affecting the filtering result, and therefore improves the stability of the depth map in the time domain.

With the video reconstruction scheme of the embodiments of the present specification, the depth maps in the video frames are filtered in the time domain. For the depth map in each video frame of the video frame sequence of the preset temporal window that has the same view angle as the depth map to be processed, namely the second depth map, the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map are obtained, the first filter coefficient weight value corresponding to the confidence values is determined, and this weight value contributes to the window filter coefficient values. This prevents unreliable depth values in the depth map to be processed and in the second depth maps from affecting the filtering result, thereby improving the stability of the depth map in the time domain and, in turn, the image quality of the reconstructed video.

Drawings

Fig. 1 is a schematic diagram of an architecture of a data processing system in a specific application scenario to which the depth map processing method according to the embodiment of the present disclosure is applied.

Fig. 2 is a diagram illustrating a multi-angle freeview data generation process in an embodiment of the present specification.

Fig. 3 is a schematic diagram of a user side processing 6DoF video data in an embodiment of the present specification.

Fig. 4 is a schematic diagram of an input and an output of a video reconstruction system in an embodiment of the present disclosure.

Fig. 5 is a flowchart of a depth map processing method in an embodiment of the present disclosure.

Fig. 6 is a schematic diagram of a sequence of video frames in an application scenario in an embodiment of the present specification.

Fig. 7 is a flowchart of a method for obtaining confidence values of pixels corresponding to positions in the depth map to be processed and in each second depth map in an embodiment of the present specification.

Fig. 8 is a flowchart of another method for obtaining confidence values of pixels corresponding to positions in the depth map to be processed and in each second depth map in the embodiment of the present specification.

Fig. 9 is a flowchart of another method for obtaining confidence values of pixels corresponding to positions in the depth map to be processed and in each second depth map in the embodiment of the present specification.

Fig. 10 is a schematic diagram of a scene in an embodiment of the present specification, where a confidence value of a pixel corresponding to a position in a depth map to be processed and in each second depth map is determined.

Fig. 11 is a flowchart of a video reconstruction method in an embodiment of the present disclosure.

Fig. 12 is a schematic structural diagram of a depth map processing apparatus in an embodiment of the present specification.

Fig. 13 is a schematic structural diagram of a video reconstruction system in an embodiment of the present disclosure.

Fig. 14 is a schematic structural diagram of an electronic device in an embodiment of the present specification.

Detailed Description

In a traditional video playback scenario, for example a broadcast of a sports event, a user can usually only watch the event from a single viewpoint position and cannot freely switch viewpoints to watch the action from different positions, so the user cannot experience the feeling of moving around the scene while watching.

The six-degrees-of-freedom (6DoF) technology can provide a viewing experience with a high degree of freedom: during viewing, the user can adjust the viewing angle of the video through interactive means and watch from any desired free viewpoint, which greatly improves the viewing experience.

To realize 6DoF scenes, techniques such as Free-D replay and depth-map-based DIBR are available. The Free-D replay technology expresses a 6DoF image by acquiring point cloud data of the scene through multi-angle shooting and reconstructing the 6DoF image or video from the point cloud data. The depth-map-based 6DoF video generation method, by contrast, reconstructs the 6DoF image or video by jointly rendering the texture maps and depth maps of the corresponding groups in the image combination of the video frame at the moment of user interaction, based on the virtual viewpoint position and the parameter data corresponding to those texture maps and depth maps.

Compared with point cloud reconstruction schemes, which suffer from deficiencies in the quality and stability of the reconstructed viewpoint, DIBR can bring the quality of the reconstructed viewpoint close to that of the originally captured viewpoints. To further improve the quality of the reconstructed viewpoint, the depth map is filtered in the time domain during the DIBR process, thereby improving the temporal stability of the depth maps used for reconstruction.

However, the inventors found that in some cases the quality of the depth map generated after filtering is degraded instead. Further research and experiments showed that the depth values of the pixels in the depth maps participating in temporal filtering are not always reliable, and adding unreliable depth values to the filtering can degrade the quality of the filtered depth map that is finally generated.

Fig. 1 is a schematic structural diagram of a data processing system in a specific application scenario, showing the deployment of the data processing system at a basketball game. The data processing system 10 includes an acquisition array 11 composed of multiple acquisition devices, a data processing device 12, a cloud server cluster 13, a play control device 14, a play terminal 15, and an interaction terminal 16. With the data processing system 10, multi-angle free-view video can be reconstructed and watched by users with low latency.

Referring to fig. 1, the basketball stand on the left is taken as the core viewpoint, the core viewpoint is taken as the circle center, and a sector area in the same plane as the core viewpoint is taken as the preset multi-angle free-view range. The acquisition devices in the acquisition array 11 can be arranged fan-wise at different positions of the on-site acquisition area according to this preset multi-angle free-view range, and can synchronously acquire video data streams from their respective angles in real time.

In order not to affect the operation of the acquisition device, the data processing device 12 may be located in a field non-acquisition area, which may be regarded as a field server. The data processing device 12 may send a stream pulling instruction to each acquisition device in the acquisition array 11 through a wireless local area network, and each acquisition device in the acquisition array 11 transmits an obtained video data stream to the data processing device 12 in real time based on the stream pulling instruction sent by the data processing device 12.

When the data processing device 12 receives a video frame capture instruction, it captures the frame images of the synchronized video frames at the specified frame time from the multiple received video data streams, and uploads the obtained synchronized video frames at the specified frame time to the cloud server cluster 13.

Correspondingly, the cloud server cluster 13 uses the received frame images of multiple synchronous video frames as an image combination, determines parameter data corresponding to the image combination and depth data of each frame image in the image combination, and performs frame image reconstruction on a preset virtual viewpoint path based on the parameter data corresponding to the image combination, pixel data and depth data of a preset frame image in the image combination to obtain corresponding multi-angle free view video data, where the multi-angle free view video data may include: multi-angle free view spatial data and multi-angle free view temporal data of frame images ordered according to frame time.

In an implementation, the cloud server cluster 13 may store the pixel data and the depth data of the image combination in the following manner:

generating a stitched image corresponding to a frame time based on the pixel data and the depth data of the image combination, wherein the stitched image comprises a first field and a second field, the first field containing the pixel data of a preset frame image in the image combination and the second field containing the depth data of the preset frame image in the image combination. The obtained stitched image and the corresponding parameter data can be stored in a data file; when the stitched image or the parameter data is needed, it can be read from the corresponding storage space according to the corresponding storage address in the header file of the data file.
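
As an illustration of the two-field layout described above (an assumed vertical arrangement; the actual layout in a deployment may differ), a stitched image could be assembled as follows:

```python
import numpy as np

def build_stitched_image(texture_maps, depth_maps):
    """Assemble the stitched image for one frame time: the first field holds
    the pixel (texture) data of the preset frame images in the image
    combination, the second field holds their depth data. Depth maps are
    expanded to three channels so both fields share one layout."""
    first_field = np.concatenate(texture_maps, axis=1)          # H x (N*W) x 3
    second_field = np.concatenate(
        [np.repeat(d[:, :, None], 3, axis=2) for d in depth_maps], axis=1)
    return np.concatenate([first_field, second_field], axis=0)  # fields stacked vertically
```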

Then, the play control device 14 may insert the received multi-angle free-view video data into the data stream to be played, and the play terminal 15 receives the data stream to be played from the play control device 14 and plays it in real time. The play control device 14 may be a manual play control device or a virtual play control device. In a specific implementation, a dedicated server capable of automatically switching video streams can be set up as a virtual play control device to control the data sources. A broadcast-directing control device, such as a director's console, can be used as the play control device in the embodiments of the present specification.

When the image reconstruction instruction from the interactive terminal 16 is received by the cloud server cluster 13, the stitched image of the preset frame image in the corresponding image combination and the parameter data corresponding to the corresponding image combination may be extracted and transmitted to the interactive terminal 16.

The interactive terminal 16 determines the interactive frame time information based on a trigger operation and sends an image reconstruction instruction containing the interactive frame time information to the cloud server cluster 13. It then receives, from the cloud server cluster 13, the stitched image of the preset frame images in the image combination corresponding to the interactive frame time and the corresponding parameter data, determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data and depth data in the stitched image and the corresponding parameter data according to a preset rule, performs combined rendering on the selected pixel data and depth data, reconstructs the multi-angle free-view video data corresponding to the virtual viewpoint position at the interactive frame time, and plays it.

Generally speaking, the entities in a video are not completely still. For example, when the data processing system is used during a basketball game, the entities captured by the acquisition array, such as the players, the basketball and the referees, are mostly in motion. Accordingly, both the texture data and the depth data in the image combinations of the captured video frames vary continuously over time.

In order to improve the quality of the generated multi-angle free-view video images, the cloud server cluster 13 may perform temporal filtering on the depth maps used for generating the multi-angle free-view video. For example, for a depth map to be processed, corresponding filter coefficients may be set for temporal filtering based on the similarity between the texture map of the depth map to be processed and the texture maps of the depth maps that have the same view angle as the depth map to be processed in the time domain.

The inventor finds that the pixel values actually acquired in the depth map to be processed, or in a depth map having the same view angle as the depth map to be processed, may be erroneous, for example because some entities are occluded under that view angle. The depth values of the pixels participating in the filtering may therefore be unreliable themselves, and adding these unreliable depth values to the filtering may degrade the quality of the final depth map generated after filtering. In view of the above problem, in the depth map processing scheme in the embodiments of the present specification, for the depth map to be processed in the current video frame of a video frame sequence of a preset window in the time domain, and for the depth map in each video frame that has the same view angle as the depth map to be processed, the confidence values of the pixels at the corresponding positions are considered, and a filter coefficient weight value corresponding to the confidence value is added to the window filter coefficient value. In this way, unreliable depth values in the depth map to be processed and in the depth maps with the same view angle in the other video frames of the preset window can be prevented from affecting the filtering result, so that the stability of the depth map in the time domain can be improved.

In order to make those skilled in the art better understand and implement the depth map processing scheme and the video reconstruction scheme in the embodiments of the present specification, a brief description will be given below to the principle of obtaining 6DoF video based on DIBR.

Firstly, video data or image data can be acquired by the acquisition devices, and depth map calculation is performed. This mainly comprises three steps: Multi-Camera Video Capturing, Camera Parameter Estimation (calculation of the camera intrinsic and extrinsic parameters), and Depth Map Calculation. For multi-camera acquisition, it is desirable that the videos acquired by the various cameras be frame-level aligned. Referring to fig. 2, a Texture Image 21 may be obtained by video capture with multiple cameras (step S21); by the camera parameter calculation (step S22), the Camera Parameters 22, that is, the parameter data in the following text, including the camera intrinsic parameter data and the camera extrinsic parameter data, can be obtained; by the depth map calculation (step S23), the Depth Map 23 can be obtained. After the above three steps are completed, the texture maps collected from the multiple cameras, all the camera parameters and the depth map of each camera are obtained. These three portions of data may be referred to as the data files in the multi-angle free-view video data, and may also be referred to as 6-degree-of-freedom video data (6DoF video data). With the 6DoF video data, the user terminal can generate a virtual viewpoint according to a virtual 6-Degree-of-Freedom (DoF) position, thereby providing a 6DoF video experience.

The 6DoF video data, as well as indicative data which may also be referred to as metadata (Metadata), may be compressed and transmitted to the user side. The user side can obtain its 6DoF expression, namely the 6DoF video data and the metadata, from the received data, and 6DoF rendering is then carried out on the user side.

The metadata may be used to describe the data schema of the 6DoF video data, and may specifically include: stitching pattern metadata, indicating the storage rules of the pixel data and depth data of the multiple images in the stitched image; edge protection metadata (padding pattern metadata), which may be used to indicate the way edge protection is performed in the stitched image; and other metadata. The metadata may be stored in the header file.

With reference to fig. 3, in addition to the 6DoF video data (including the camera parameters 31, the texture and depth maps 32, and the metadata 33), there is interactive behavior data 34 at the user end. With these data, 6DoF rendering based on Depth Image-Based Rendering (DIBR) (step S30) may be used to generate an image of a virtual viewpoint at the specific 6DoF position 35 generated according to the user behavior, that is, to determine, according to a user instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction.

In the video reconstruction system or the DIBR application software adopted in the embodiments of the present specification, the camera parameters, the texture map, the depth map, and the 6DoF position of the virtual camera may be received as inputs, and the generated texture map and the depth map at the virtual 6DoF position may be output at the same time. The 6DoF position of the virtual camera is the aforementioned 6DoF position determined from the user behavior. The DIBR application software may be software that implements virtual viewpoint-based image reconstruction in the embodiments of the present specification.

In a DIBR software employed in an embodiment of the present specification, with reference to fig. 4 in combination, DIBR software 40 may receive as input camera parameters 41, a texture map 42, a depth map 43, and 6DoF position data 44 of a virtual camera, may generate a texture map and a depth map at a virtual 6DoF position through a step of generating a texture map S41 and a step of generating a depth map S42, and output the generated texture map and depth map at the same time.

The input depth map may be processed, e.g. filtered in the temporal domain, before generating the texture map and the depth map at the virtual 6DoF positions.

A depth map processing method that can improve the stability of a depth map in the time domain, which is adopted in the embodiments of the present specification, is described in detail below with reference to the accompanying drawings.

Referring to the flowchart of the depth map processing method shown in fig. 5, the following steps may be specifically adopted to perform filtering processing on the depth map:

S51, obtaining the depth map to be processed from the image combination of the current video frame of the multi-angle free view, wherein the image combination of the current video frame of the multi-angle free view comprises multiple sets of angle-synchronized texture maps and depth maps having corresponding relationships.

S52, obtaining a video frame sequence of a preset window in the time domain that contains the current video frame.

In a specific implementation, after the current video frame including the depth map to be processed is obtained, a video frame sequence of a preset temporal window containing the current video frame may be obtained. As shown in the schematic diagram of the video frame sequence in fig. 6, let the T-th frame in the video sequence be the current video frame, let the preset window size D in the time domain be 2N+1 frames, and let the current video frame be located at the middle position of the video frame sequence captured by the preset window; a video frame sequence of 2N+1 frames from the (T-N)-th frame to the (T+N)-th frame can then be obtained.

It is to be understood that, in a specific implementation, the current video frame may not be located in the middle of the video frame sequence of the preset window.

It should be noted that the size of the preset window in the time domain may be set according to the filtering precision requirement, the processing resource budget and experience. In an embodiment of this specification, the window size D is 5 video frames, that is, 2N+1 = 5 and N = 2; in other embodiments of this specification, N may also be 3 or 4, or other values may be selected, and the value may be determined according to the final filtering effect. The size of the preset window in the time domain can also be adjusted according to the position of the current frame in the whole video stream.

In a specific implementation, for the depth maps in the first N video frames of the entire video stream, no filtering process may be performed; that is, filtering is performed from the (N+1)-th frame onward, and in this specification T is greater than N.
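As a minimal sketch of the window selection described above (Python; the handling of the first N frames and of the stream tail are assumptions for illustration):

```python
def temporal_window(frame_index_T, total_frames, N=2):
    """Return the indices of the up to 2N+1 video frames in the preset
    temporal window around frame T. Frames earlier than the (N+1)-th are
    not filtered, and the window is clipped at the end of the stream."""
    if frame_index_T < N:
        return None  # first N frames: no temporal filtering
    start = frame_index_T - N
    end = min(frame_index_T + N, total_frames - 1)
    return list(range(start, end + 1))

# Usage: window size D = 2N+1 = 5 around frame T = 10 of a 100-frame stream
print(temporal_window(10, 100))  # [8, 9, 10, 11, 12]
print(temporal_window(1, 100))   # None: inside the first N frames
```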

S53, obtaining a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to the pixel confidence.

In a specific implementation, the first filter coefficient weight value may be obtained as follows:

obtaining confidence values of pixels corresponding to positions in the depth map to be processed and each second depth map, and determining a first filter coefficient weight value corresponding to the confidence values, wherein: the second depth map is a depth map with the same view angle as the depth map to be processed in each video frame of the video frame sequence.

In a specific implementation, there are various methods for evaluating the confidence of the pixels in the depth map to be processed and in each second depth map and obtaining the confidence values of the pixels at the corresponding positions in the depth map to be processed and in each second depth map.

For example, the depth map to be processed in each video frame in the preset window and the depth map in the preset view angle range around the view angle corresponding to each second depth map may be acquired to obtain a third depth map of the corresponding view angle, and the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map may be determined based on the third depth map of each corresponding view angle.

For another example, the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map may be determined based on the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and in each second depth map and the pixels in the peripheral preset region in the depth map where the pixels are located.

In the following, how to specifically acquire the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map will be described in detail through specific application scenarios.

In the embodiments of the present specification, the correspondence between the confidence value and the first filter coefficient weight value may be preset. The larger the confidence value c is, the larger the corresponding first filter coefficient weight value Weight_c is; the smaller the confidence value c is, the smaller the corresponding first filter coefficient weight value Weight_c is; that is, the two are positively correlated.
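The correspondence between the confidence value and the first filter coefficient weight value can take many forms; the following sketch (Python) assumes either a user-supplied look-up table or a simple linear, positively correlated mapping, neither of which is mandated by this specification.

```python
def first_weight_from_confidence(c, table=None):
    """Map a pixel confidence value c in [0, 1] to a first filter
    coefficient weight Weight_c. A look-up table (e.g. for the discrete
    confidence values 0 / 0.5 / 1 discussed later) can be supplied;
    otherwise a linear mapping is used. Both mappings are illustrative."""
    if table is not None:
        return table[c]
    return max(0.0, min(1.0, float(c)))

# Usage with the discrete confidence values used in the later examples
weight_c = first_weight_from_confidence(0.5, table={0: 0.0, 0.5: 0.5, 1: 1.0})
```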

S54, filtering the pixels at corresponding positions in the depth map to be processed according to a preset filtering mode based on the window filter coefficient values corresponding to the video frames, so as to obtain the filtered depth values of the pixels at corresponding positions in the depth map to be processed.

By adopting the scheme of this embodiment, for the depth maps that have the same view angle as the depth map to be processed in each video frame of the video frame sequence of the preset window in the time domain, namely the second depth maps, the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map are acquired, the first filter coefficient weight value corresponding to the confidence value is determined, and the window filter coefficient value is generated based on the first filter coefficient weight value. The pixels at corresponding positions in the depth map to be processed are then filtered according to the preset filtering mode based on the window filter coefficient values corresponding to the video frames, so as to obtain the filtered depth values. In this way, the influence of unreliable depth values introduced from the depth map to be processed and from each second depth map on the filtering result can be avoided, and the stability of the depth map in the time domain can be improved.

In this embodiment, in order to improve the stability of the depth map in the time domain, as described in the previous embodiment, the window filter coefficient value is generated by at least two dimensional weight values, and one of the dimensional weight values is a first filter coefficient weight value corresponding to the pixel confidence. For a better understanding and realization of the embodiments of the present specification by those skilled in the art, the weighting values selected for the other dimensions generating the window filter coefficient values are exemplified below by specific embodiments.

It will be appreciated that, besides the example dimension weight values below, the window filter coefficient value may be generated from the first filter coefficient weight value together with filter coefficient weight values of one or more other dimensions, or from the first filter coefficient weight value, the filter coefficient weight value of at least one of the following dimensions, and filter coefficient weight values of further dimensions.

Example dimensions one: second filter coefficient weight value corresponding to frame distance

Specifically, a frame distance between each video frame in the video frame sequence and the current video frame is obtained, and a second filter coefficient weight value corresponding to the frame distance is determined.

In particular implementations, the frame distance may be expressed in terms of a difference in frame position in the sequence of video frames, or in terms of a time interval between corresponding video frames in the sequence of video frames. Since the frames in the frame sequence are usually distributed at equal intervals, the difference between the frame positions in the video frame sequence is chosen for the convenience of operation. With continued reference to FIG. 6, for example, the frame distance between the T-1 th and T +1 th frames and the current video frame (Tth frame) is 1 frame, the frame distance between the T-2 th and T +2 th frames and the current video frame (Tth frame) is 2 frames, and so on, and the frame distance between the T-N th and T + N th frames and the current video frame (Tth frame) is N frames.

In a specific implementation, the correspondence between the frame distance d and the corresponding second filter coefficient weight value Weight_d may be preset. The smaller the frame distance d is, the larger the corresponding second filter coefficient weight value Weight_d is; the larger the frame distance d is, the smaller the corresponding second filter coefficient weight value Weight_d is; that is, the two are inversely correlated.
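A sketch of one possible inverse correlation between the frame distance and the second filter coefficient weight (Python); the Gaussian fall-off and the sigma value are assumptions, not a function prescribed by this specification.

```python
import math

def second_weight_from_frame_distance(d, sigma=2.0):
    """Second filter coefficient weight Weight_d for a frame distance d
    (in frames): the weight decreases as the distance grows."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

# Usage: weights for frames at distance 0..N from the current frame, N = 2
weights_d = [second_weight_from_frame_distance(d) for d in range(3)]
```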

Example dimension two: third filter coefficient weight value corresponding to pixel similarity

Specifically, the similarity value of the corresponding pixel in the texture map corresponding to each second depth map and the corresponding pixel in the texture map corresponding to the to-be-processed depth map may be obtained, and a third filter coefficient weight value corresponding to the similarity value may be determined.

Continuing with fig. 6, for example, the depth map T_M corresponding to the view angle M in the T-th frame (the current video frame) is taken as the depth map to be processed. The depth maps corresponding to the view angle M in the window [T-N, T+N], namely (T-N)_M, ..., (T-2)_M, (T-1)_M, (T+1)_M, (T+2)_M, ..., (T+N)_M in the (T-N)-th, ..., (T-2)-th, (T-1)-th, (T+1)-th, (T+2)-th, ..., (T+N)-th frames, are the depth maps with the same view angle as the depth map to be processed T_M; that is, in each video frame other than the current video frame T in the video frame sequence, the depth map corresponding to the depth map to be processed T_M is a second depth map.

With continued reference to fig. 6, for the depth map T_M of view angle M in the T-th frame, the pixel at any position (x1, y1) in the corresponding texture map is, for convenience of description, referred to as the first pixel, and its texture value is expressed as Color(x1, y1). The color Color(x1', y1') of the pixel corresponding to the first pixel position in the texture map corresponding to each second depth map can be obtained, the similarity value s of Color(x1', y1') relative to Color(x1, y1) of the first pixel can then be computed, and the third filter coefficient weight value Weight_s corresponding to the similarity value s can be determined.

In a specific implementation, a correspondence between the similarity value s and the corresponding third filter coefficient weight value Weight_s may be preset. The larger the similarity value s is, the smaller the corresponding third filter coefficient weight value Weight_s is; the smaller the similarity value s is, the larger the corresponding third filter coefficient weight value Weight_s is; that is, the two are inversely correlated.

Regarding how to generate the window filter coefficient value in this embodiment of the present disclosure, if, in addition to the first filter coefficient weight value corresponding to the pixel confidence, the filter coefficient weight values of the two example dimensions are considered at the same time, namely the second filter coefficient weight value corresponding to the frame distance and the third filter coefficient weight value corresponding to the pixel similarity, then in some embodiments of the present disclosure the product of the first filter coefficient weight value Weight_i_c, the second filter coefficient weight value Weight_i_d and the third filter coefficient weight value Weight_i_s may be used as the window filter coefficient value Weight_i corresponding to each video frame, namely: Weight_i = Weight_i_c * Weight_i_d * Weight_i_s (i = T-N to T+N). Then, a weighted average of the products of the depth values of the pixels at corresponding positions in the depth map to be processed and in each second depth map and the window filter coefficient values corresponding to the video frames may be calculated, so as to obtain the filtered depth values of the pixels at corresponding positions in the depth map to be processed.

It is to be understood that, in an implementation, a product of the first filter coefficient weight value and one of the second filter coefficient weight value and the third filter coefficient weight value, or a weighted average value may also be used as the window filter coefficient value corresponding to each video frame. And then, calculating a weighted average value of products of the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the window filter coefficient values corresponding to the video frames to obtain the filtered depth values of the pixels corresponding to the positions in the to-be-processed depth map.

Continuing with fig. 6, the pixel at any position (x2, y2) in the depth map to be processed T_M of view angle M in the T-th frame is, for convenience of description, referred to as the second pixel, and its depth value is denoted Depth_T^M(x2, y2). The following formula can be adopted for the filtering processing to obtain the filtered depth value of the second pixel:

Depth'_T^M(x2, y2) = ( Σ_{i=T-N..T+N} Weight_i * Depth_i^M(x2, y2) ) / ( Σ_{i=T-N..T+N} Weight_i )

wherein the pixels corresponding to the second pixel position in each depth map are all represented by (x2, y2), and the view angle identifier and the frame in which each depth map is located are distinguished by the superscript and the subscript of Depth_i^M(x2, y2), respectively.

It is understood that, in a specific implementation, the manner of obtaining the window filter coefficient value is not limited to the above embodiment. For example, the window filter coefficient value may also be obtained from the first filter coefficient weight value Weight_i_c, the second filter coefficient weight value Weight_i_d and the third filter coefficient weight value Weight_i_s by taking their arithmetic mean or weighted mean, or by other weight distribution means.
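A minimal per-pixel sketch (Python with NumPy) of the product-based combination and the weighted average described above; the fallback behaviour when all weights are zero is an assumption for illustration.

```python
import numpy as np

def temporal_filter_pixel(depths, weights_c, weights_d, weights_s):
    """Filter the depth value of one pixel position (x2, y2) over the 2N+1
    frames of the temporal window. `depths` holds the depth values of that
    pixel in the depth map to be processed and in each second depth map;
    the three arrays hold Weight_i_c, Weight_i_d and Weight_i_s for
    i = T-N .. T+N. The product combination is used here; arithmetic or
    weighted means are equally possible."""
    w = np.asarray(weights_c) * np.asarray(weights_d) * np.asarray(weights_s)
    if w.sum() == 0:
        return depths[len(depths) // 2]  # assumption: fall back to the unfiltered value
    return float(np.dot(w, np.asarray(depths, dtype=np.float64)) / w.sum())

# Usage: a 5-frame window (N = 2); the value 300 is an unreliable outlier
filtered = temporal_filter_pixel(
    depths=[120, 122, 121, 300, 123],
    weights_c=[1.0, 1.0, 1.0, 0.0, 1.0],  # zero confidence suppresses the outlier
    weights_d=[0.6, 0.9, 1.0, 0.9, 0.6],
    weights_s=[0.8, 0.9, 1.0, 0.2, 0.9],
)
```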

In order to increase the filtering speed, the above embodiment may be adopted to perform filtering processing on each pixel in the depth map to be processed in parallel, or perform filtering processing on a plurality of pixels in batches.

By adopting the depth map processing method in the above embodiments, in the process of filtering the depth map to be processed, not only the frame distance between each video frame in the preset temporal window and the current video frame, and/or the similarity of the texture values of the pixels at corresponding positions in the texture maps corresponding to the depth map to be processed and to each depth map with the same view angle (i.e. each second depth map), are considered, but also the confidences of the corresponding pixels in the depth map to be processed and in the second depth maps are considered. The confidence values of the corresponding pixels are added to the window filter coefficient weight values, so that the influence on the filtering result caused by introducing unreliable depth values (whether from the depth map to be processed or from the second depth maps) in the time domain is avoided, and the stability of the depth map in the time domain can be improved.

In order to make the person skilled in the art better understand and implement the embodiments of the present specification, how to obtain the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map is described in detail below by some specific embodiments.

In a first mode, the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map are determined based on the depth maps within a preset view angle range around the view angles corresponding to the depth map to be processed and to each second depth map.

Specifically, the depth map to be processed in each video frame and the depth map within a preset view angle range around a view angle corresponding to each second depth map may be acquired to obtain a third depth map corresponding to the view angle, and based on the third depth map corresponding to each view angle, the confidence values of the pixels at the corresponding positions in the depth map to be processed and in each second depth map may be determined.

Referring to fig. 6, for the depth map to be processed with view angle M in the current video frame, the view angle of the second depth map in each video frame is also M. In a specific implementation, the depth maps within a preset view angle range [M-K, M+K] around the view angle M corresponding to each depth map of view angle M (including the depth map to be processed and each second depth map) may be obtained, and may be referred to as third depth maps for convenience of description. For example, the range may extend 15°, 30°, 45°, 60° or the like to both sides with the view angle M as the center. It is understood that these values are merely exemplary and are not used to limit the scope of the present invention; the specific values are related to the viewpoint density of the image combination of each video frame: the higher the viewpoint density is, the smaller the value range may be, and the lower the viewpoint density is, the larger the value range may correspondingly be.

In a specific implementation, the view angle range may also be determined from the spatial distribution of the viewpoints corresponding to the image combination. For example, for texture maps synchronously acquired by 40 acquisition devices arranged in an arc and the corresponding depth maps obtained from them, M and K may represent acquisition device positions: if M represents the view angle of the 10th acquisition device from the left and K is 3, the confidence value of the pixel at the corresponding position in the depth map corresponding to the 10th acquisition device may be determined based on the depth maps corresponding to the view angles of the 7th to 9th and the 11th to 13th acquisition devices from the left.

It should be noted that the value range of the preset view angle range may not take the view angle of the to-be-processed depth map as the center, and the specific value may be determined according to the spatial position relationship corresponding to the depth map in each video frame. For example, one or more depth maps closest to the corresponding viewpoint in each video frame may be selected for determining the confidence of the pixels in the depth map to be processed and each second depth map.
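A small sketch (Python) of selecting the neighbouring view indices whose depth maps serve as third depth maps; 0-based camera indexing and clipping at the ends of the arc are assumptions for illustration.

```python
def third_view_indices(view_m, num_views, k=3):
    """Return the indices of the neighbouring views in [M-K, M+K]
    (excluding view M itself) whose depth maps serve as third depth maps.
    The arc of cameras is assumed to be indexed 0 .. num_views-1."""
    lo = max(0, view_m - k)
    hi = min(num_views - 1, view_m + k)
    return [v for v in range(lo, hi + 1) if v != view_m]

# Usage: the 10th of 40 arc-mounted cameras (index 9), K = 3
print(third_view_indices(9, 40))  # [6, 7, 8, 10, 11, 12]
```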

In a second mode, the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map are determined based on the spatial consistency between those pixels and the pixels in a surrounding preset region of the depth map in which they are located.

It should be noted that, for the first mode, different implementation forms can be provided, and the following two examples are provided for illustration, and in the specific implementation, either one of the modes can be used alone, or both of the modes can be used in combination, or any one or combination of the modes can be further used in combination with other modes, and the examples in the present specification are only used for better understanding and implementing the present invention for those skilled in the art, and are not used for limiting the protection scope of the present invention.

An example of the first method is: determining confidence of pixels based on matching differences of texture maps

Referring to the flowchart of the method for obtaining the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map shown in fig. 7, the following steps may be specifically adopted:

S71, acquiring the texture map corresponding to the depth map to be processed and the texture maps corresponding to the second depth maps.

As described in the foregoing embodiments, since the image combination of a multi-angle free-view video frame includes multiple sets of angle-synchronized texture maps and depth maps having corresponding relationships, the texture map of the corresponding view angle can be obtained from the image combination of the current video frame and of each video frame containing a second depth map in the video frame sequence of the preset window. Referring to fig. 6, the texture maps corresponding to all depth maps of view angle M in the (T-N)-th to (T+N)-th video frames may be obtained, respectively.

S72, according to the depth values of the pixels corresponding to the positions in the depth map to be processed and the second depth maps, mapping the texture values at the corresponding positions in the texture map corresponding to the depth map to be processed and the texture map corresponding to the second depth maps to the corresponding positions in the texture map corresponding to the third depth map at each corresponding view angle, so as to obtain the mapping texture values corresponding to the third depth map at each corresponding view angle.

In a specific implementation, based on a spatial position relationship of an image combination of different views in the image combination of each video frame, according to depth values of pixels at corresponding positions in the to-be-processed depth map and each second depth map, texture values at corresponding positions in a texture map corresponding to the to-be-processed depth map and a texture map corresponding to each second depth map are respectively mapped to corresponding positions in a texture map corresponding to a third depth map of each corresponding view, so as to obtain a mapped texture value corresponding to the third depth map of each corresponding view.

Continuing with fig. 6, for example, the texture values of the texture map corresponding to the depth map of view angle M in the (T-N)-th video frame can be respectively mapped, according to the depth values of the pixels at the corresponding positions, to the corresponding positions in the texture maps corresponding to the third depth maps whose view angles lie within the range [M-K, M+K] in the same video frame, so as to obtain the mapping texture values Color'(x, y) corresponding to the third depth maps of these views.

and S73, matching the mapping texture values with the actual texture values of the corresponding positions in the texture map corresponding to the third depth maps of the corresponding view angles respectively, and determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and the second depth maps based on the distribution intervals of the matching degrees of the texture values corresponding to the third depth maps of the corresponding view angles.

If the matching degree of the texture values corresponding to the third depth maps of the corresponding view angles is higher, it indicates that the difference of the corresponding texture maps is smaller, the reliability of the depth values of the pixels in the corresponding depth map to be processed and the second depth maps is higher, and accordingly, the confidence value is higher.

There may be various embodiments for how to determine the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map based on the distribution intervals of the matching degrees of the texture values corresponding to the third depth map of each corresponding view angle.

In a specific implementation, the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map may be determined comprehensively based on the matching degree of the texture values corresponding to the third depth map of each corresponding view and the number of the third depth maps satisfying the corresponding matching degree. For example, when the number of the matching degrees larger than the preset first matching degree threshold is larger than the preset first number threshold, the corresponding confidence value is set to be 1, otherwise, the corresponding confidence value is set to be 0. Similarly, in a specific implementation, the correspondence between the threshold of the degree of matching of the texture value, the threshold of the number of third depth maps satisfying the corresponding threshold of the degree of matching, and the confidence value may also be set in a gradient manner.

As can be seen from the above, the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map may be binary values, that is, 0 or 1, or may be set to any value in [0,1] or set discrete values.

For convenience of description, let the actual texture value at the corresponding position in the texture map corresponding to each third depth map be Color_1(x, y). The mapping texture value Color'(x, y) corresponding to each third depth map of each video frame may then be matched with the actual texture value Color_1(x, y) at the corresponding position in the texture map corresponding to that third depth map; for example, the first matching degree threshold may be set to 80%.

In an embodiment of the present specification, the third depth maps corresponding to the depth map to be processed and to each second depth map in each video frame are the third depth maps within a range of 30 degrees on both sides of the view angle of the depth map to be processed or of the corresponding second depth map, and there are third depth maps of three view angles within this range. Considering the possible occlusion of view angles, for example: if the number of third depth maps satisfying the preset first matching degree threshold is greater than or equal to 2, the confidence value of the pixel at the corresponding position in the second depth map of the video frame is determined to be 1; if the number is 0, the confidence value is determined to be 0; and if the number is 1, the confidence value is determined to be 0.5. For the depth map to be processed, third depth maps of three view angles likewise exist, and the confidence value of the pixel at the corresponding position in the depth map to be processed is determined under the same conditions as for the pixels at corresponding positions in the second depth maps.
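A sketch (Python) of mapping the number of matching third depth maps to a confidence value, following the example gradient above; the thresholds and the returned values are configuration choices, not fixed by this specification.

```python
def confidence_from_match_count(num_matching):
    """Map the number of third depth maps whose matching degree exceeds the
    preset threshold to a confidence value: >= 2 matches -> 1,
    exactly 1 match -> 0.5, no match -> 0."""
    if num_matching >= 2:
        return 1.0
    if num_matching == 1:
        return 0.5
    return 0.0

# Usage: two of the three third depth maps agree with the mapped texture
print(confidence_from_match_count(2))  # 1.0
```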

Example two of the first method: determining confidence of pixels based on consistency of depth maps

The confidence of a pixel can be determined based on the consistency of the depth maps in two implementation manners, according to the mapping direction between the depth maps. In one manner, the depth values of the pixels at corresponding positions in the depth map to be processed and in each second depth map are mapped to the third depth maps of the corresponding view angles, and the mapped depth values of the pixels at corresponding positions in the third depth maps are respectively matched with the actual depth values of the pixels at those positions. In the other manner, the depth values of the corresponding pixels in the acquired third depth maps are mapped back to the corresponding positions in the depth map to be processed and in each second depth map, and the mapped results are then matched with the actual values at the corresponding positions in the depth map to be processed and in each second depth map. The following is described in detail with specific application scenarios.

Referring to the flowchart of the method for obtaining the confidence values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps shown in fig. 8, the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps may be mapped to the third depth maps of the respective corresponding views, and the mapped depth values of the pixels corresponding to the positions in the third depth maps of the respective corresponding views may be respectively matched with the actual depth values of the pixels corresponding to the positions, which may specifically adopt the following steps:

S81, mapping the depth values of the pixels at corresponding positions in the depth map to be processed and in each second depth map to the third depth maps of the corresponding view angles, so as to obtain the mapped depth values of the pixels at corresponding positions in the third depth maps of each corresponding view angle.

As described in the foregoing embodiments, since the image combination of a multi-angle free-view video frame includes multiple sets of angle-synchronized texture maps and depth maps having corresponding relationships, any video frame includes the depth maps of multiple view angles. According to a preset spatial position relationship, the depth values of the pixels at corresponding positions in the depth map to be processed and in each second depth map in each video frame of the preset window can be mapped to the third depth maps of the corresponding view angles, so as to obtain the mapped depth values of the pixels at corresponding positions in the third depth maps. Referring to fig. 6, the third depth maps within the preset view angle range [M-K, M+K] corresponding to the depth map of view angle M (including the depth map to be processed and each second depth map) in each video frame within the window from the (T-N)-th frame to the (T+N)-th frame may be obtained, and the mapping of pixel depth values between the different depth maps within the preset view angle range is completed within the same frame; that is, the depth values of the pixels at corresponding positions in the depth map of view angle M in each frame of the preset window are mapped into the third depth maps of the other view angles within [M-K, M+K] of that frame.
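The mapping of a depth pixel from one view angle into a third depth map can be illustrated with a generic pinhole back-project/re-project step (Python with NumPy). The intrinsic/extrinsic conventions and the example values below are assumptions for illustration, not the exact spatial position relationship used by the cloud server cluster.

```python
import numpy as np

def map_depth_to_view(u, v, depth, K_src, RT_src, K_dst, RT_dst):
    """Map one pixel (u, v) with depth value `depth` from a source view to
    a destination view. K_* are 3x3 intrinsic matrices, RT_* are 3x4 [R|t]
    extrinsics (world -> camera). Returns the destination pixel position
    and the mapped depth value."""
    # Back-project to the source camera frame, then to world coordinates.
    cam_src = depth * np.linalg.inv(K_src) @ np.array([u, v, 1.0])
    R_s, t_s = RT_src[:, :3], RT_src[:, 3]
    world = np.linalg.inv(R_s) @ (cam_src - t_s)
    # Project into the destination camera.
    R_d, t_d = RT_dst[:, :3], RT_dst[:, 3]
    cam_dst = R_d @ world + t_d
    uvw = K_dst @ cam_dst
    return uvw[0] / uvw[2], uvw[1] / uvw[2], cam_dst[2]

# Usage with identity extrinsics and a small baseline along x (assumed values)
K = np.array([[1000.0, 0, 960], [0, 1000.0, 540], [0, 0, 1]])
RT_a = np.hstack([np.eye(3), np.zeros((3, 1))])
RT_b = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
print(map_depth_to_view(960, 540, 5.0, K, RT_a, K, RT_b))
```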

S82, matching the mapped depth value of the pixel at the corresponding position in the third depth map of each corresponding view with the actual depth value of the pixel at the corresponding position in the third depth map of each corresponding view, and determining the confidence values of the pixels at the corresponding positions in the depth map to be processed and in each second depth map based on the distribution interval of the matching degree of the depth values corresponding to the third depth map of each corresponding view.

If the matching degree of the depth values corresponding to the third depth maps of the corresponding view angles is higher, it indicates that the difference of the corresponding depth maps is smaller, the reliability of the depth value of the pixel in the to-be-processed depth map or the pixel in the second depth map is higher, and accordingly, the confidence value is higher.

There may be various embodiments for how to determine the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map based on the distribution intervals of the matching degrees of the depth values corresponding to the third depth maps of each corresponding view angle.

In a specific implementation, the confidence values of the pixels corresponding to the positions in the to-be-processed depth map and in each second depth map may be determined comprehensively based on the matching degree of the depth values corresponding to the third depth map of each corresponding view and the number of the third depth maps satisfying the corresponding matching degree. For example, when the number of the matching degrees greater than a preset second matching degree threshold (as a specific example, the second matching degree threshold is 80% or 70%) is greater than a preset second number threshold, the corresponding confidence value is set to be 1, otherwise, the corresponding confidence value is set to be 0. Similarly, in a specific implementation, the matching degree threshold of the depth value, the number threshold of the third depth maps satisfying the corresponding matching degree threshold, and the corresponding relationship of the confidence value may also be set in a gradient manner.

As can be seen from the above, the confidence values of the pixels corresponding to the positions in the depth map to be processed and in each second depth map may be binary values, that is, 0 or 1, or may be set to any value in [0,1] or set discrete values.

In an embodiment of the present specification, the third depth maps corresponding to the depth map to be processed or to a second depth map in each video frame are the third depth maps within a range of 30 degrees on both sides of the view angle of the depth map to be processed or of that second depth map, and there are third depth maps of three view angles within this range. Considering the possible occlusion of view angles, for example: if the number of third depth maps satisfying the preset second matching degree threshold is greater than or equal to 2, the confidence value of the pixel at the corresponding position in the depth map to be processed or in the second depth map of the video frame is determined to be 1; if the number is 0, the confidence value is determined to be 0; and if the number is 1, the confidence value is determined to be 0.5.

Referring to the flowchart of the method for obtaining the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map shown in fig. 9, the depth values of the pixels at corresponding positions in the acquired third depth maps may be mapped back to the corresponding positions in the depth map to be processed and in each second depth map, the mapped pixel positions may be compared with the actual pixel positions of the corresponding pixels, and the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map may be determined according to the distance between the two. The method may specifically include the following steps:

S91, respectively acquiring the depth values of the pixels at corresponding positions in the depth map to be processed and in each second depth map, mapping these pixels to the corresponding pixel positions in the third depth maps of the corresponding view angles according to the depth values, and then, according to the depth values at those corresponding pixel positions in the third depth maps, mapping them back to the corresponding pixel positions in the depth map to be processed and in each second depth map, so as to obtain the mapped pixel positions, in the depth map to be processed and in each second depth map, contributed by the third depth maps of each corresponding view angle.

As described in the foregoing embodiments, an image combination of a multi-angle free-view video frame includes multiple sets of angle-synchronized texture maps and depth maps having corresponding relationships, and any video frame includes the depth maps of multiple view angles. The depth values of the pixels at corresponding positions in the depth map of view angle M (including the depth map to be processed and each second depth map) in each video frame of the window from the (T-N)-th frame to the (T+N)-th frame may be obtained and, according to these depth values, mapped to the corresponding pixel positions in the third depth maps within the preset view angle range [M-K, M+K], so as to obtain the depth values at the corresponding pixel positions in those third depth maps. Then, according to the preset spatial position relationship, the depth values of the corresponding pixels in the third depth maps in each video frame of the window are mapped back to the corresponding pixel positions in the depth map to be processed and in each second depth map, and the mapped pixel positions, in the depth map to be processed and in each second depth map, contributed by the third depth maps of each corresponding view angle are obtained.

Referring to fig. 6, the depth values of the pixels at corresponding positions in the depth map to be processed of view angle M and in each second depth map in each video frame within the window from the (T-N)-th frame to the (T+N)-th frame can be obtained. According to these depth values, the pixels are mapped to the corresponding pixel positions in the corresponding third depth maps within the preset view angle range [M-K, M+K], and the depth values at those corresponding pixel positions in the third depth maps of the different view angles within the same frame are obtained. Then, according to the preset spatial position relationship, the depth values at the corresponding pixel positions in the third depth maps of each video frame are mapped back to the corresponding pixel positions in the depth map to be processed or in the second depth map of the same video frame, so as to obtain the corresponding mapped pixel positions, in the depth map to be processed and in each second depth map, contributed by the third depth maps of each corresponding view angle. For example, according to the depth values at the corresponding pixel positions in the third depth maps whose view angles lie within [M-K, M+K] in the (T-N)-th frame (such as the third depth maps of view angles M-2, M-1, M+2 in that frame), these pixels are mapped back to the depth map of view angle M in the (T-N)-th frame, so that the corresponding mapped pixel positions in the depth map of view angle M in the (T-N)-th frame can be obtained.

S92, calculating the pixel distances between the actual pixel positions of the pixels at corresponding positions in the depth map to be processed and in each second depth map and the mapped pixel positions obtained by mapping back from the third depth maps of the corresponding view angles, and determining the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map based on the distribution intervals of the calculated pixel distances.

If the pixel distance is smaller, the reliability of the depth value of the pixel at the corresponding position in the corresponding to-be-processed depth map or in the second depth map is higher, and accordingly, the confidence value is higher.

There may be various embodiments for how to determine the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map based on the distribution intervals of the pixel distances corresponding to the third depth maps of each corresponding view angle.

In a specific implementation, the confidence values of the pixels at corresponding positions in the depth map to be processed and in each second depth map may be determined comprehensively based on the pixel distances corresponding to the third depth maps of each corresponding view angle and the number of third depth maps falling into the corresponding distance threshold interval. For example, it may be set that when the number of third depth maps whose pixel distance is less than a preset distance threshold d0 is greater than a preset third number threshold, the corresponding confidence value is set to 1; otherwise, the corresponding confidence value is set to 0. Similarly, in a specific implementation, the correspondence between the distance threshold, the number threshold of third depth maps satisfying the respective distance threshold, and the confidence value may also be set in a gradient manner.

As can be seen from the above, the confidence values of the corresponding pixels in the depth map to be processed and in each second depth map may be binary values, that is, 0 or 1, or may be set to any value within [0,1] or set discrete values.

In an embodiment of the present specification, the third depth maps corresponding to the depth map to be processed and to each second depth map in each video frame are the third depth maps within a range of 30 degrees on both sides of the view angle of the depth map to be processed or of the corresponding second depth map, and there are third depth maps of three view angles within this range. Considering the possible occlusion of view angles, for example: if the number of third depth maps whose pixel distance is less than the preset first distance threshold is greater than or equal to 2, the confidence value of the pixel at the corresponding position in the depth map to be processed or in the second depth map of the video frame is determined to be 1; if the number is 0, the confidence value is determined to be 0; and if the number is 1, the confidence value is determined to be 0.5.
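A sketch (Python with NumPy) of the pixel-distance check: given the original pixel position and the positions obtained by mapping through each third depth map and back, the number of views returning within the distance threshold is turned into a confidence value. The threshold and the gradient follow the example above and are assumptions.

```python
import numpy as np

def confidence_from_round_trip(src_xy, round_trip_xy_per_view, dist_threshold=2.0):
    """Count how many third views map the pixel back to within
    `dist_threshold` pixels of its original position, and map that count
    to a confidence value (>= 2 -> 1, exactly 1 -> 0.5, 0 -> 0)."""
    src = np.asarray(src_xy, dtype=np.float64)
    n_close = sum(
        np.linalg.norm(np.asarray(p, dtype=np.float64) - src) < dist_threshold
        for p in round_trip_xy_per_view
    )
    if n_close >= 2:
        return 1.0
    if n_close == 1:
        return 0.5
    return 0.0

# Usage: three third views; two return close to the original position
print(confidence_from_round_trip((400, 300),
                                 [(400.6, 300.2), (401.1, 299.7), (412.0, 305.0)]))  # 1.0
```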

Specific implementation examples are given above for determining the confidence of the pixel based on the matching difference of the texture map and the confidence of the pixel based on the consistency of the depth map, respectively. In particular implementations, the two may also be combined to determine the confidence of the pixel. Specific combinations of some embodiments are given below, and it is understood that the following examples are not intended to limit the scope of the present invention.

Combination mode one: taking the product of the confidence of the pixel determined based on the matching difference of the texture maps and the confidence of the pixel determined based on the consistency of the depth maps as the confidence of the pixel at the corresponding position in the depth map to be processed and in each second depth map, which can be expressed by the following formula:

Weight_c=Weight_c_texture*Weight_c_depth;

wherein Weight_c represents the confidence of the pixel at the corresponding position in the depth map to be processed and in each second depth map, Weight_c_texture represents the confidence of the pixel determined based on the matching difference of the texture maps, and Weight_c_depth represents the confidence of the pixel determined based on the consistency of the depth maps.

Combination mode two: taking the weighted sum of the confidence of the pixel determined based on the matching difference of the texture maps and the confidence of the pixel determined based on the consistency of the depth maps as the confidence of the pixel at the corresponding position in the depth map to be processed and in each second depth map, which can be expressed by the following formula:

Weight_c=a*Weight_c_texture+b*Weight_c_depth;

wherein Weight_c represents the confidence of the pixel at the corresponding position in the depth map to be processed and in each second depth map, Weight_c_texture represents the confidence of the pixel determined based on the matching difference of the texture maps, Weight_c_depth represents the confidence of the pixel determined based on the consistency of the depth maps, a is the weighting coefficient of the confidence determined based on the matching difference of the texture maps, and b is the weighting coefficient of the confidence determined based on the consistency of the depth maps.
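Both combination modes can be sketched in a few lines (Python); the example weighting coefficients a and b are arbitrary values for illustration.

```python
def combine_confidences(c_texture, c_depth, mode="product", a=0.5, b=0.5):
    """Combine the texture-matching-based and depth-consistency-based
    confidences into Weight_c, using either the product (combination mode
    one) or the weighted sum (combination mode two)."""
    if mode == "product":
        return c_texture * c_depth
    return a * c_texture + b * c_depth

# Usage
print(combine_confidences(1.0, 0.5))                            # 0.5
print(combine_confidences(1.0, 0.5, mode="sum", a=0.7, b=0.3))  # 0.85
```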

While the above illustrates the manner in which pixel confidence is determined in the first manner, two implementation examples of determining pixel confidence in the second manner are given next:

example one: and respectively matching the pixels corresponding to the positions in the depth map to be processed and the second depth maps with the depth values of the pixels in the preset area around the depth map where the pixels are located, and respectively determining the confidence values of the pixels corresponding to the positions in the depth map to be processed and the second depth maps on the basis of the matching degree of the depth values and the number of the pixels of which the matching degree meets the threshold value of the matching degree of the preset pixels.

Referring to any second depth map Px shown in fig. 10, the pixel Pixel(x1', y1') in Px corresponding to any pixel position in the depth map to be processed is taken as the pixel whose confidence is to be determined. The depth value of Pixel(x1', y1') may be respectively matched with the depth value of each pixel in a preset region R around Pixel(x1', y1') in the second depth map Px. For example, if the matching degrees of 5 of the 8 pixels in the preset region R are greater than a preset pixel matching degree threshold of 60%, the confidence of Pixel(x1', y1') in the second depth map Px may be determined to be 0.8.

In a specific implementation, the preset region may be a circle, a rectangle, or an irregular shape; the specific shape is not limited, as long as it surrounds the pixel whose confidence is to be determined, and the size of the preset region may be set according to experience.

Example two: and matching the weighted average values of the depth values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the pixels in the peripheral preset area in the depth map where the pixels are located, and respectively determining the confidence values of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps based on the matching degrees of the pixels corresponding to the positions in the to-be-processed depth map and the second depth maps and the weighted average values.

With continued reference to fig. 10, in an embodiment of the present specification, the depth values of the pixels in the preset region R around Pixel(x1', y1') are first weighted and averaged, and similarity matching is then performed; for example, if the matching degree between the weighted average and the depth value of Pixel(x1', y1') is greater than 50%, the confidence of Pixel(x1', y1') in the second depth map Px may be determined to be 1.
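A sketch (Python with NumPy) of the average-based spatial consistency check of example two; the unweighted 3x3 window, the relative matching measure and the 0/1 mapping are assumptions for illustration.

```python
import numpy as np

def spatial_confidence(depth_map, x, y, radius=1, match_threshold=0.5):
    """Estimate the confidence of the pixel at (x, y) by comparing its depth
    value against the average depth of its surrounding preset region."""
    h, w = depth_map.shape
    ys = slice(max(0, y - radius), min(h, y + radius + 1))
    xs = slice(max(0, x - radius), min(w, x + radius + 1))
    region = depth_map[ys, xs].astype(np.float64)
    centre = float(depth_map[y, x])
    # Average of the surrounding pixels, excluding the centre pixel itself.
    avg = (region.sum() - centre) / (region.size - 1)
    match_degree = 1.0 - abs(centre - avg) / max(avg, 1e-6)
    return 1.0 if match_degree > match_threshold else 0.0

# Usage on a small synthetic depth map: an outlier pixel gets confidence 0
dm = np.full((5, 5), 100.0)
dm[2, 2] = 220.0
print(spatial_confidence(dm, 2, 2))  # 0.0
print(spatial_confidence(dm, 1, 1))  # 1.0
```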

Various ways in which the confidence of the pixels at corresponding positions in the depth map to be processed and in each second depth map can be determined are given above. In a specific implementation, at least two of these may be used in combination. By adding the first filter coefficient weight value corresponding to the confidence values of the corresponding pixels in the depth map to be processed and in each second depth map to the window filter coefficient value, and filtering the depth values of the corresponding pixels in the depth map to be processed according to the preset filtering mode to obtain the filtered depth values, the introduction of unreliable depth values in the depth map to be processed and in each second depth map can be prevented from affecting the filtering result, so that the stability of the depth map in the time domain can be improved.

By adopting the depth map processing method of the above embodiments, filtering the depth map in the time domain can improve the image quality of video reconstruction. To help those skilled in the art better understand and implement this, video reconstruction is described below by way of an embodiment.

Referring to the flowchart of the video reconstruction method shown in fig. 11, the method may specifically include the following steps:

S111, acquiring an image combination of a video frame of a multi-angle free visual angle, parameter data corresponding to the image combination of the video frame, and virtual viewpoint position information based on user interaction, where the image combination of the video frame includes multiple angle-synchronized groups of texture maps and depth maps having a corresponding relationship.

S112, filtering the depth map in the time domain.

In a specific implementation, the depth map processing method in the embodiments of the present specification may be used to perform the filtering; for specific details, reference may be made to the description of the foregoing embodiments, which will not be repeated herein.

S113, selecting, based on the virtual viewpoint position information and the parameter data corresponding to the image combination of the video frame, and according to a preset rule, a texture map and a filtered depth map of a corresponding group in the image combination of the video frame at the user interaction time.

S114, performing, based on the virtual viewpoint position information and the parameter data corresponding to the texture maps and depth maps of the corresponding groups in the image combination of the video frame at the user interaction time, combined rendering on the selected texture maps and filtered depth maps of the corresponding groups, so as to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time.
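As a rough sketch of how steps S111 to S114 might chain together, the following outline uses hypothetical placeholder functions (load_frame_combination, temporal_filter_depth, select_groups, dibr_render) that merely stand in for the operations described above and are not defined by this specification.

```python
def reconstruct_view(frame_time, virtual_viewpoint):
    # S111: image combination, its parameter data, and the user-interaction viewpoint
    texture_maps, depth_maps, params = load_frame_combination(frame_time)

    # S112: temporal filtering of each depth map (depth map processing method above)
    filtered_depths = [temporal_filter_depth(d, frame_time) for d in depth_maps]

    # S113: select the groups relevant to the virtual viewpoint by a preset rule,
    # e.g. the capture angles closest to the virtual viewpoint
    groups = select_groups(virtual_viewpoint, params, texture_maps, filtered_depths)

    # S114: combined rendering of the selected texture maps and filtered depth maps
    return dibr_render(groups, virtual_viewpoint, params)
```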

By adopting the above video reconstruction method, the depth map in the video frame is filtered in the time domain. For the depth maps that have the same view angle as the depth map to be processed, namely the second depth maps, in the video frames of the preset temporal window, the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map are acquired, the first filter coefficient weight value corresponding to the confidence values is determined, and this weight value is added to the window filter coefficient value. Unreliable depth values in the depth map to be processed and in each second depth map are thereby prevented from influencing the filtering result, so the stability of the depth map in the time domain can be improved, and with it the image quality of the reconstructed video.

The embodiments of the present disclosure also provide a specific apparatus and system capable of implementing the method of the foregoing embodiments, which are described below by way of specific embodiments with reference to the accompanying drawings.

The embodiment of the present specification provides a depth map processing apparatus, which may perform filtering processing on a depth map in the time domain. Referring to the schematic structural diagram of the depth map processing apparatus shown in fig. 12, the depth map processing apparatus 120 may include:

a depth map acquiring unit 121, adapted to acquire a depth map to be processed from an image combination of a current video frame of a multi-angle free view, where the image combination of the current video frame of the multi-angle free view includes multiple angle-synchronized groups of texture maps and depth maps having a corresponding relationship;

a frame sequence obtaining unit 122, adapted to obtain a video frame sequence of a preset temporal window containing the current video frame;

a window filter coefficient value obtaining unit 123, adapted to obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions; the window filter coefficient value obtaining unit 123 includes: a first filter coefficient weight value obtaining subunit 1231, adapted to obtain confidence values of pixels corresponding in position in the depth map to be processed and in each second depth map, and to determine a first filter coefficient weight value corresponding to the confidence values, where the second depth map is the depth map, in each video frame of the video frame sequence, having the same view angle as the depth map to be processed;

a filtering unit 124, adapted to filter, based on the window filter coefficient values corresponding to the respective video frames, the pixels at corresponding positions in the depth map to be processed in a preset filtering manner, so as to obtain the filtered depth values of the pixels at corresponding positions in the depth map to be processed.

In a specific implementation, the first filter coefficient weight value obtaining subunit 1231 may include at least one of the following confidence value determining subunits:

a first confidence value determining component 12311, adapted to acquire depth maps within a preset view angle range around the view angles corresponding to the depth map to be processed and to each second depth map, so as to obtain third depth maps of corresponding view angles, and to determine, based on the third depth maps of the corresponding view angles, the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map;

a second confidence value determining component 12312, adapted to determine the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map, based on the spatial consistency between each such pixel and the pixels in a preset surrounding region of the depth map in which it is located.

In an embodiment of the present specification, the first confidence value determining component 12311 is adapted to acquire the texture map corresponding to the depth map to be processed and the texture maps corresponding to the second depth maps, and to map, according to the depth values of the pixels corresponding in position in the depth map to be processed and in each second depth map, the texture values at the corresponding positions in these texture maps to the corresponding positions in the texture map corresponding to the third depth map of each corresponding view angle, so as to obtain mapped texture values corresponding to the third depth map of each corresponding view angle. The mapped texture values are then matched against the actual texture values at the corresponding positions in the texture map corresponding to the third depth map of each corresponding view angle, and the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map are determined based on the distribution interval of the texture value matching degrees corresponding to the third depth map of each corresponding view angle.
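A possible sketch of this texture-consistency check is given below. It assumes single-channel 8-bit texture maps, a hypothetical helper reproject_pixel that maps a pixel with a known depth from one calibrated view into another, and an illustrative mapping from matching-degree intervals to confidence values; none of these choices are prescribed by the embodiment.

```python
import numpy as np

def texture_consistency_confidence(depth_src, tex_src, tex_third,
                                   cam_src, cam_third, reproject_pixel,
                                   bins=(0.9, 0.7)):
    """Map each source pixel into the third view with its depth, compare the
    carried texture value with the actual texture value found there, and turn
    the matching degree into a confidence via illustrative intervals."""
    h, w = depth_src.shape
    conf = np.zeros((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            u, v = reproject_pixel(x, y, depth_src[y, x], cam_src, cam_third)
            ui, vi = int(round(u)), int(round(v))
            if not (0 <= vi < h and 0 <= ui < w):
                continue  # projects outside the third view; leave confidence at 0
            degree = 1.0 - abs(float(tex_src[y, x]) - float(tex_third[vi, ui])) / 255.0
            # illustrative distribution intervals of the matching degree
            conf[y, x] = 1.0 if degree >= bins[0] else (0.5 if degree >= bins[1] else 0.1)
    return conf
```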

In another embodiment of the present specification, the first confidence value determining component 12311 is adapted to map the pixels at corresponding positions in the depth map to be processed and in each second depth map to the third depth map of each corresponding view angle, so as to obtain mapped depth values of the pixels at corresponding positions in the third depth map of each corresponding view angle. The mapped depth values are then matched against the actual depth values of the pixels at the corresponding positions in the third depth map of each corresponding view angle, and the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map are determined based on the distribution interval of the depth value matching degrees corresponding to the third depth map of each corresponding view angle.

In yet another embodiment of the present specification, the first confidence value determining component 12311 is adapted to acquire the depth values of the pixels corresponding in position in the depth map to be processed and in each second depth map, map these pixels, according to their depth values, to the corresponding pixel positions in the third depth map of the corresponding view angle, and then map them back to the depth map to be processed and to each second depth map according to the depth values at those pixel positions in the third depth map of the corresponding view angle, so as to obtain the mapped-back pixel positions of the third depth map of each corresponding view angle in the depth map to be processed and in each second depth map. The pixel distances between the actual pixel positions of the pixels at corresponding positions in the depth map to be processed and in each second depth map and the mapped-back pixel positions obtained from the third depth map of the corresponding view angle are then calculated, and the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map are determined based on the distribution interval of the calculated pixel distances.
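The forward-and-back projection check of this embodiment can be sketched similarly. The reproject_pixel helper is the same hypothetical placeholder as above, and the distance intervals used to assign confidence values are assumptions for illustration.

```python
import numpy as np

def reprojection_distance_confidence(depth_src, depth_third,
                                     cam_src, cam_third, reproject_pixel,
                                     near=1.0, far=3.0):
    """Project each pixel into the third view with its own depth, project it
    back with the depth recorded in the third view, and derive a confidence
    from the distance between the original and mapped-back pixel positions."""
    h, w = depth_src.shape
    conf = np.zeros((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            u, v = reproject_pixel(x, y, depth_src[y, x], cam_src, cam_third)
            ui, vi = int(round(u)), int(round(v))
            if not (0 <= vi < h and 0 <= ui < w):
                continue
            xb, yb = reproject_pixel(ui, vi, depth_third[vi, ui], cam_third, cam_src)
            dist = float(np.hypot(xb - x, yb - y))
            # illustrative distance intervals: small round-trip error -> high confidence
            conf[y, x] = 1.0 if dist <= near else (0.5 if dist <= far else 0.1)
    return conf
```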

In an embodiment of the present specification, the second confidence value determining component 12312 is adapted to match the pixels corresponding in position in the depth map to be processed and in each second depth map against the depth values of the pixels in a preset surrounding region of the depth map in which they are located, and to determine the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map based on the matching degree of the depth values and the number of pixels whose matching degree satisfies the preset pixel matching degree threshold.

In another embodiment of the present specification, the second confidence value determining component 12312 is adapted to match the pixels corresponding in position in the depth map to be processed and in each second depth map against the weighted average of the depth values of the pixels in a preset surrounding region of the depth map in which they are located, and to determine the confidence values of the pixels corresponding in position in the depth map to be processed and in each second depth map based on the matching degree between each such pixel and the corresponding weighted average.

In a specific implementation, the weight values of the window filter coefficients may further include at least one of: a second filter coefficient weight value corresponding to the frame distance and a third filter coefficient weight value corresponding to the pixel similarity.

Accordingly, the window filter coefficient value obtaining unit 123 may further include at least one of:

a second filter coefficient weight value obtaining subunit 1232, adapted to obtain the frame distance between each video frame in the video frame sequence and the current video frame, and to determine a second filter coefficient weight value corresponding to the frame distance;

a third filter coefficient weight value obtaining subunit 1233, adapted to obtain the similarity values of corresponding pixels in the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and to determine a third filter coefficient weight value corresponding to the similarity values.
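By way of illustration, the second and third filter coefficient weight values might be computed as follows for a single pixel position. The Gaussian falloffs, the parameters sigma_t and sigma_c, and the use of single-channel texture values are assumptions, since the specification does not prescribe a particular weighting function.

```python
import numpy as np

def frame_distance_weight(frame_index, current_index, sigma_t=2.0):
    """Second filter coefficient weight value: frames closer in time to the
    current frame receive a larger weight (Gaussian falloff is illustrative)."""
    return float(np.exp(-((frame_index - current_index) ** 2) / (2.0 * sigma_t ** 2)))

def pixel_similarity_weight(tex_current, tex_other, x, y, sigma_c=10.0):
    """Third filter coefficient weight value: pixels whose texture values are
    closer to the current frame's texture value receive a larger weight."""
    diff = float(tex_other[y, x]) - float(tex_current[y, x])
    return float(np.exp(-(diff ** 2) / (2.0 * sigma_c ** 2)))
```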

In an embodiment of the present specification, the filtering unit 124 is adapted to take, as the window filter coefficient value corresponding to each video frame, the product or a weighted average of the first filter coefficient weight value and at least one of the second filter coefficient weight value and the third filter coefficient weight value; and to calculate the weighted average of the products of the depth values of the pixels corresponding in position in the depth map to be processed and in each second depth map and the window filter coefficient values corresponding to the respective video frames, so as to obtain the filtered depth value of the pixel at the corresponding position in the depth map to be processed.
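Putting the pieces together, a minimal sketch of the temporal filtering performed by the filtering unit for one pixel position is shown below. Combining the three weight values by a product is one of the combinations named above; the fallback to the unfiltered depth when all coefficients vanish is an added assumption.

```python
import numpy as np

def temporal_filter_pixel(depth_values, conf_w, dist_w, sim_w, current_index):
    """Filtered depth of one pixel position across the temporal window.

    depth_values: depth of the pixel in each frame of the window (the depth
                  map to be processed and each second depth map)
    conf_w, dist_w, sim_w: first, second and third filter coefficient weight
                  values for each frame; their product is used here as the
                  window filter coefficient value
    current_index: index of the current video frame within the window
    """
    depths = np.asarray(depth_values, dtype=np.float64)
    coeffs = np.asarray(conf_w) * np.asarray(dist_w) * np.asarray(sim_w)
    if coeffs.sum() <= 0:
        return float(depths[current_index])  # fall back to the unfiltered depth
    # weighted average of the depth values with the window filter coefficient values
    return float(np.dot(depths, coeffs) / coeffs.sum())
```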

The embodiment of the specification further provides a video reconstruction system; performing video reconstruction with this system can improve the image quality of the reconstructed video. Referring to the schematic structural diagram of the video reconstruction system shown in fig. 13, the video reconstruction system 130 includes: an acquisition module 131, a filtering module 132, a selection module 133, and an image reconstruction module 134, wherein:

the obtaining module 131 is adapted to obtain an image combination of a video frame of a multi-angle free view, parameter data corresponding to the image combination of the video frame, and virtual viewpoint position information based on user interaction, where the image combination of the video frame includes multiple angle-synchronized groups of texture maps and depth maps having a corresponding relationship;

the filtering module 132 is adapted to filter the depth map in the video frame;

the selecting module 133 is adapted to select, according to the virtual viewpoint position information and parameter data corresponding to the image combination of the video frame, a texture map and a filtered depth map of a corresponding group in the image combination of the video frame at the user interaction time according to a preset rule;

the image reconstruction module 134 is adapted to perform, based on the virtual viewpoint position information and the parameter data corresponding to the texture maps and depth maps of the corresponding groups in the image combination of the video frame at the user interaction time, combined rendering on the selected texture maps and filtered depth maps of the corresponding groups, so as to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction time;

wherein the filtering module 132 may include:

a depth map acquiring unit 1321, adapted to acquire a depth map to be processed from an image combination of a current video frame of a multi-angle free view, where the image combination of the current video frame of the multi-angle free view includes multiple angle-synchronized groups of texture maps and depth maps having a corresponding relationship;

a frame sequence obtaining unit 1322, adapted to obtain a video frame sequence of a preset temporal window containing the current video frame;

a window filter coefficient value obtaining unit 1323, adapted to obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions; the window filter coefficient value obtaining unit 1323 includes: a first filter coefficient weight value obtaining subunit 13231, adapted to obtain confidence values of pixels corresponding in position in the depth map to be processed and in each second depth map, and to determine a first filter coefficient weight value corresponding to the confidence values, where the second depth map is the depth map, in each video frame of the video frame sequence, having the same view angle as the depth map to be processed;

a filtering unit 1324, adapted to filter, based on the window filter coefficient values corresponding to the respective video frames, the pixels at corresponding positions in the depth map to be processed in a preset filtering manner, so as to obtain the filtered depth values of the pixels at corresponding positions in the depth map to be processed.

In a specific implementation, the window filter coefficient value obtaining unit 1323 further includes at least one of:

a second filter coefficient weight value obtaining subunit 13232, adapted to obtain the frame distance between each video frame in the video frame sequence and the current video frame, and to determine a second filter coefficient weight value corresponding to the frame distance;

a third filter coefficient weight value obtaining subunit 13233, adapted to obtain the similarity values of corresponding pixels in the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and to determine a third filter coefficient weight value corresponding to the similarity values.

For a specific implementation of the filtering module 132, reference may be made to fig. 12: the depth map processing apparatus shown in fig. 12 may be used as the filtering module 132 to perform temporal filtering, and reference may be made to the depth map processing apparatus and the depth map processing method in the foregoing embodiments. It should be noted that the depth map processing apparatus may be implemented by corresponding software, hardware, or a combination of software and hardware. The calculation of each filter coefficient weight value may be implemented by one or more CPUs, by one or more GPUs, or by CPUs and GPUs in cooperation; the CPUs may communicate with one or more GPU chips or GPU modules to control each GPU chip or GPU module to perform the filtering processing of the depth map.

Referring to the structural schematic diagram of the electronic device shown in fig. 14, the electronic device 140 may include a memory 141 and a processor 142, where the memory 141 stores computer instructions executable on the processor 142, and the processor 142, when executing the computer instructions, may perform the steps of the depth map processing method or of the video reconstruction method according to any of the foregoing embodiments. For specific steps, reference may be made to the description of the foregoing embodiments, which are not repeated herein.

It should be noted that the processor 142 may specifically include a CPU chip 1421 formed by one or more CPU cores, or may include a GPU chip 1422, or a chip module composed of the CPU chip 1421 and the GPU chip 1422. The processor 142 and the memory 141 may communicate with each other via a bus or the like, and the chips may communicate with each other via corresponding communication interfaces.

The embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of the depth map processing method or of the video reconstruction method according to any of the foregoing embodiments may be performed. For specific steps, reference may be made to the description of the foregoing embodiments, which are not repeated herein.

For a better understanding and implementation by those skilled in the art, a specific application is described below by way of example with reference to the specific application scenario shown in fig. 1.

The cloud server cluster 13 may first perform temporal filtering on the depth map by using the embodiment of the present specification, and then perform image reconstruction based on the texture map and the filtered depth map of the corresponding group in the image combination of the video frame, so as to obtain a reconstructed multi-angle free view image.

In a specific implementation, the cloud server cluster 13 may include: a first cloud server 131, a second cloud server 132, a third cloud server 133, and a fourth cloud server 134. The first cloud server 131 may be configured to determine parameter data corresponding to the image combination; the second cloud server 132 may be configured to determine depth data of each frame image in the image combination; the third cloud server 133 may perform frame image reconstruction on a preset virtual viewpoint path by using a depth image based rendering (DIBR) algorithm, based on the parameter data corresponding to the image combination, the pixel data of the image combination, and the depth data; the fourth cloud server 134 may be configured to generate a multi-angle free-view video, where the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view temporal data of frame images ordered according to frame time.
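As a much simplified illustration of the forward-warping step at the heart of a DIBR-style reconstruction, the following sketch again relies on the hypothetical reproject_pixel helper and assumes that smaller depth values are nearer to the camera; it is only an illustrative sketch, not the actual algorithm run by the third cloud server 133.

```python
import numpy as np

def dibr_forward_warp(texture, depth, cam_src, cam_virtual, reproject_pixel):
    """Warp one texture map into the virtual viewpoint using its depth map,
    keeping the nearest surface at each target pixel (a simple z-buffer)."""
    h, w = depth.shape
    out = np.zeros_like(texture)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            u, v = reproject_pixel(x, y, depth[y, x], cam_src, cam_virtual)
            ui, vi = int(round(u)), int(round(v))
            if 0 <= vi < h and 0 <= ui < w and depth[y, x] < zbuf[vi, ui]:
                zbuf[vi, ui] = depth[y, x]   # nearest-surface test
                out[vi, ui] = texture[y, x]
    return out
```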

It can be understood that the first cloud server 131, the second cloud server 132, the third cloud server 133, and the fourth cloud server 134 may also be a server group composed of a server array or a server sub-cluster, and the embodiments of the present disclosure are not limited thereto.

As a specific example, the second cloud server 132 may obtain a depth map from the image combination of a current video frame of a multi-angle free view as the depth map to be processed. The second cloud server 132 may perform temporal filtering on the depth map to be processed by using the solution of the foregoing embodiments of the present specification, so as to improve the stability of the depth map in the time domain, and then perform video reconstruction with the temporally filtered depth map, so as to improve the image quality of video reconstruction whether the result is played on the playing terminal 15 or on the interactive terminal 16.

In embodiments of the present description, the acquisition devices may also be located in a ceiling area of a basketball venue, on a basketball stand, or the like. The acquisition devices may be distributed along a straight line, a fan shape, an arc, a circle, a matrix, or an irregular shape. The specific arrangement may be set according to one or more factors such as the specific field environment, the number of acquisition devices, the characteristics of the acquisition devices, and the imaging effect requirements. The acquisition device may be any device having an image capturing function, such as an ordinary camera, a mobile phone, or a professional camera.

In some embodiments of the present specification, as shown in fig. 1, each acquisition device in the acquisition array 11 may transmit the obtained video data stream to the data processing device 12 in real time through a switch 17 or a local area network.

It can be understood that the data processing device 12 may be disposed in a non-acquisition field area or in the cloud according to the specific scenario, and that the server (cluster) and the play control device may be disposed in a non-acquisition field area, in the cloud, or at the terminal access side according to the specific scenario; this is not intended to limit the specific implementation and protection scope of the present invention. For specific implementations, operation principles, specific actions and effects of each device, apparatus or system in the embodiments of the present description, reference may be made to the specific descriptions of the corresponding method embodiments.

It can be understood that the above embodiments are applicable to live broadcast scenes, but are not limited thereto. The solutions in the embodiments of the present disclosure for video or image acquisition, data processing of video data streams, and image generation on the server side may also be applicable to the playing requirements of non-live scenes, such as recording, rebroadcasting, and other scenes with less stringent latency requirements.

Although the embodiments of the present invention are disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected by one skilled in the art without departing from the spirit and scope of the embodiments of the invention as defined in the appended claims.
